Compare commits

..

29 Commits

Author SHA1 Message Date
Ben 7e6d4b982e fix(dashboard-auth): share /api/* public allowlist between legacy and OAuth gates
Two parallel public-path allowlists drifted: _PUBLIC_API_PATHS in
hermes_cli/web_server.py (legacy _SESSION_TOKEN middleware) and
_GATE_PUBLIC_PREFIXES in hermes_cli/dashboard_auth/middleware.py
(OAuth gate). The legacy list included /api/status (documented as a
non-sensitive read-only liveness target); the OAuth gate's list did not.

Effect: every wildcard-subdomain agent surfaced as STARTING/down to the
portal even though the dashboard was serving correctly. Nous account
service (src/server/agents/fly-provider.ts
getInstanceRuntimeStatus) fetches ``/api/status`` without a cookie
as its sole liveness probe; the OAuth gate's 401 looked identical to
'agent dead' on the portal side.

Fix: lift the allowlist into hermes_cli/dashboard_auth/public_paths.py
and have both middlewares import it. _path_is_public now consults
the shared frozenset first, then falls back to the gate's
auth-bootstrap/static prefix list. Future additions to the public list
hit both gates automatically.

Endpoint inventory (verified safe to remain public):

* /api/status            — version, gateway state, active session count,
                           auth-gate shape. Portal liveness probe target.
* /api/config/defaults   — config-defaults feed for the SPA's Config page
* /api/config/schema     — config schema for the SPA's Config page
* /api/model/info        — model catalogue metadata (context windows)
* /api/dashboard/themes  — theme manifests for the skin engine
* /api/dashboard/plugins — plugin manifests for the dashboard

No user data, no session content, no secrets. Same shape an external
monitoring agent would hit on /healthz.

Tests:

* New: test_gated_status_is_public (regression guard with the NAS
  fly-provider.ts liveness-probe rationale spelled out in the docstring)
* New: test_other_public_api_paths_are_public_under_gate (parametrised
  over the rest of PUBLIC_API_PATHS — proves 401 / 302-to-login is
  never the response)
* New: docker integration check #3 in
  test_dashboard_oauth_gate_engaged_by_default — /api/status
  remains 200 under the gate AND reports auth_required=True so the
  portal can distinguish modes
* Updated: test_full_login_round_trip_unlocks_gated_api now probes
  /api/sessions instead of /api/status (status is public, so it
  can no longer distinguish 'logged in' from 'gate accidentally
  disabled')
* Updated: TestApi401Envelope (the no-cookie / invalid-cookie /
  dead-cookie tests) probes /api/sessions for the same reason
* Updated: docker integration check #2 in
  test_dashboard_oauth_gate_engaged_by_default probes
  /api/sessions to prove the gate is intercepting
* Removed: dead _login() helper in
  test_dashboard_auth_status_endpoint.py (no longer needed since
  /api/status is reachable cold)

Companion to docs/handover/hermes-agent-dashboard-s6-insecure-fix.md
(the --insecure flag fix that shipped earlier).
2026-05-29 12:03:49 +10:00
Ben 3c6e70aef1 docs(docker): document new persist-across-processes contract and orphan reaper (#20561)
Updates the Docker Backend section of the user-guide configuration page
to match the actual behavior shipped in PR #33645. Pre-PR the docs
claimed "container is stopped and removed on shutdown," which was
never quite true for the documented happy path and is now actively
wrong: in default mode the container survives across Hermes processes
so background processes (npm watchers, dev servers, long-running
pytest) carry over the way the "ONE long-lived container shared
across sessions" promise requires.

Changes to `website/docs/user-guide/configuration.md`:

* Reworked the intro paragraph at the top of the Docker Backend
  section to describe the actual cross-process reuse contract.
* Expanded the YAML example with the new keys
  `docker_persist_across_processes` and `docker_orphan_reaper`, plus
  the pre-existing-but-undocumented `docker_env`, `timeout`, and
  `lifetime_seconds`.  Clarified the `container_persistent` comment
  to disambiguate from `docker_persist_across_processes`.
* Added a `docker_env` vs `docker_forward_env` explainer (one
  injects literal KEY=value, the other forwards values from the
  host/.env — easy to confuse).
* Replaced the one-line "Container lifecycle" paragraph with a full
  subsection covering:
    - the three labels Hermes tags every container with
      (hermes-agent, hermes-task-id, hermes-profile)
    - the label-probe reuse mechanism on startup
    - a teardown-trigger table with four rows for every situation
      that destroys the container in default mode
    - edge cases (OOM kill, profile switching)
* Added an "Environment variable overrides" table covering all
  TERMINAL_* env vars relevant to the Docker backend, including the
  previously-undocumented `TERMINAL_DOCKER_ENV` and
  `HERMES_DOCKER_BINARY`.

Changes to `website/docs/user-guide/docker.md`:

* Extended the cross-link admonition (around l.227) so the
  Hermes-in-Docker page points at the new terminal-backend keys
  (`docker_env`, `docker_persist_across_processes`,
  `docker_orphan_reaper`) alongside the ones already mentioned.

No code changes.  Behavior already covered by tests added in earlier
commits on this branch (#33645 commits 1-5).

Refs #20561
2026-05-29 11:49:54 +10:00
Ben 2f0f03c40d fix(docker): cleanup_vm() default honors persist mode (don't kill container on session close)
Commit 4 made cleanup_vm() default to force_remove=True, which was wrong:
cleanup_vm() is called from AIAgent.close() (TUI session close at
tui_gateway/server.py:2991, gateway session teardown at gateway/run.py:3569)
and from per-turn cleanup (agent/chat_completion_helpers.py:1517). All
three are session-lifecycle events that should honor persist mode, not
explicit user-initiated teardown.

Ben reported the symptom: container shared between multiple TUI sessions
(good) but killed as soon as any session closed (bad). With force_remove=True
as the default, every `session.close` JSON-RPC tore down the container.

The fix is to flip cleanup_vm()'s force_remove default back to False.
The kwarg still exists for future explicit-teardown paths (`/reset`-style
flows, "destroy my sandbox" commands) that haven't been wired up yet.

Two new unit tests pin the behavior:

* `test_cleanup_vm_default_honors_persist_mode` — asserts
  `cleanup_vm(task_id)` does neither docker stop nor docker rm on a
  persist-mode container (the regression Ben caught).
* `test_cleanup_vm_force_remove_tears_down_persist_container` —
  asserts the kwarg still flows through the runtime-signature-inspection
  plumbing to the backend's cleanup().

E2E verified against real Docker (in addition to all 17 existing checks):

  ✓ Default cleanup_vm() leaves persist-mode container running
  ✓ cleanup_vm(force_remove=True) removed the container

Refs #20561
2026-05-29 11:49:54 +10:00
Ben 5c2170a7c6 fix(docker): persist-mode cleanup is no-op; add force_remove kwarg (#20561)
The first iteration of this PR did docker stop on every cleanup in
persist mode (only skipping docker rm). Ben caught this as
contradicting the documented "ONE long-lived container shared across
sessions" semantics: stopping the container on every Hermes /quit kills
any background processes inside (npm watchers, pytest watchers,
long-running scripts) — exactly the case persist mode is supposed to
protect.

This commit splits the cleanup paths cleanly:

* **Persist mode (default)** — cleanup() is a NO-OP for the
  container. Container stays running, processes survive, next Hermes
  process attaches via the existing label probe in ~ms instead of
  waiting for docker start. Resource reclamation happens via the
  orphan reaper at next startup (2 × lifetime_seconds threshold), which
  covers the SIGKILL / OOM / abandoned-laptop cases.
* **Opt-out mode (persist_across_processes=False)** — unchanged:
  docker stop + docker rm -f on cleanup as before.
* **Explicit teardown** — new cleanup(force_remove=True) kwarg
  overrides persist mode and tears the container down unconditionally.
  cleanup_vm(task_id) now defaults to force_remove=True since
  it's the user-driven reset path (called from AIAgent.close(),
  /reset-style flows, and the idle reaper's per-turn cleanup).

The idle reaper in _cleanup_inactive_envs calls env.cleanup()
directly with no kwargs, so idle persist-mode envs are no-op'd — the
container survives the in-process pop and the next tool call re-probes
via labels. No state leak: _container_id is still cleared on the
in-process handle.

E2E verified against real Docker:

  ✓ Container is still running after cleanup()
  ✓ Background process (sleep loop) survived cleanup()
  ✓ Filesystem state preserved across cleanup()
  ✓ In-process container_id cleared (next __init__ will re-probe)
  ✓ Background process visible from reused env (no docker start happened)
  ✓ force_remove=True removed the container even in persist mode
  ✓ cleanup_vm() removed the container (defaults to force_remove=True)

Test changes:

* Replaces `test_cleanup_with_persist_only_stops_no_rm` with
  `test_cleanup_with_persist_is_noop_for_container` — asserts neither
  stop nor rm runs in persist mode, and the in-process handle is
  cleared so re-probe works.
* Adds `test_cleanup_force_remove_stops_and_rms_even_in_persist_mode`
  — covers the new kwarg.
* Updates `test_cleanup_uses_subprocess_run_not_detached_shell` and
  `test_wait_for_cleanup_after_cleanup_returns_true` to pass
  `force_remove=True` so they actually exercise the docker code path
  (default no-op would trivially pass).

cleanup_vm() forwards `force_remove` only to backends whose cleanup()
accepts the kwarg (currently just DockerEnvironment) via runtime
signature inspection — Modal/Daytona/SSH `cleanup()` signatures are
unchanged.

Refs #20561
2026-05-29 11:49:54 +10:00
Ben d77d877665 fix(docker): startup orphan reaper for crashed-process containers
The cleanup-fix in the previous commit handles the graceful-exit leak: a
Hermes process that runs ``atexit`` will now actually wait on the docker
stop/rm worker thread, so containers either survive (persist mode) or are
fully removed (opt-out mode) by the time the interpreter exits.

But ``atexit`` doesn't fire on SIGKILL, OOM-kill, or terminal-window
close. Containers from those exits stay parked with no surviving Python
process to reuse or remove them, so they accumulate until the operator
intervenes with ``docker rm -f``. The cleanup-fix doesn't help this class
— there's no live cleanup() to fix.

This commit adds the safety net: a startup orphan reaper that runs once
per Hermes process and removes long-Exited hermes-labeled containers
that the prior commit couldn't reach.

Implementation:

* New ``reap_orphan_containers()`` in ``tools/environments/docker.py``.
  Filters: ``label=hermes-agent=1`` + ``status=exited`` + (optional)
  ``label=hermes-profile=<current>``. Per-container ``docker inspect``
  parses ``State.FinishedAt`` (with nanosecond-precision trimming for
  Python's microsecond-bound ``fromisoformat``); containers older than
  the threshold get ``docker rm -f``'d. The ``status=exited`` filter is
  load-bearing — a running container may belong to a sibling Hermes
  process whose reuse path will pick it up; killing it would crash the
  sibling mid-command. Single-container failures are logged and the
  sweep continues to the next candidate.

* New ``_maybe_reap_docker_orphans()`` helper in
  ``tools/terminal_tool.py``. Wired into ``_create_environment()`` for
  ``env_type == "docker"``. Gated by:

    - ``terminal.docker_orphan_reaper: true`` (default; opt-out for
      operators running multiple Hermes processes in the same profile
      who don't trust the conservative defaults)
    - ``_docker_orphan_reaper_ran`` module flag with double-checked
      locking — parallel subagents and RL rollouts don't trigger N
      concurrent docker ps storms
    - Age threshold = ``2 × TERMINAL_LIFETIME_SECONDS`` with a 60s floor
      (so ``TERMINAL_LIFETIME_SECONDS=0`` doesn't race the user's own
      setup)
    - Profile scoping — a research profile NEVER reaps the default
      profile's stragglers
    - Exception swallow — a janitor failure must never block container
      creation

* New config ``terminal.docker_orphan_reaper`` wired through all four
  config-bridge sites (cli.py, gateway/run.py, hermes_cli/config.py,
  tests/conftest.py) and pinned by
  ``test_docker_orphan_reaper_is_bridged_everywhere``.

Coverage:

* 9 new unit tests in test_docker_environment.py — happy path, recent-
  container sparing, profile scoping, unparseable-timestamp safety,
  docker-ps-failure handling, partial-failure continuation, nanosecond
  timestamp parsing, zero-value FinishedAt rejection.
* 6 new integration tests in test_docker_orphan_reaper_integration.py
  — once-per-process gate, disable-flag respected, lifetime doubling
  with 60s floor, current-profile filter wiring, exception swallow.
* 1 new bridge-invariant regression test.

Closes #20561 (combined with the two prior commits on this branch).
2026-05-29 11:49:54 +10:00
Ben ac8e238bc8 fix(docker): reuse containers across processes + fix cleanup leaks
The Docker backend docs claim "Single persistent container — ONE long-
lived container shared across sessions, /new, /reset, and delegate_task
subagents. Stopped/removed on shutdown." In practice the code only
honored that contract within a single Python process via the in-memory
\`_active_environments[task_id]\` cache. Every \`hermes chat\` invocation
spawned a fresh \`hermes-<hex>\` container; older containers piled up in
\`Exited\` state and accumulated until manual \`docker rm\` (issue #20561).

Three root causes, all addressed by this commit:

1. No cross-process container discovery.
2. \`cleanup()\` used fire-and-forget \`subprocess.Popen("... &", shell=True)\`
   which raced with parent-process exit — when Python exited promptly the
   detached shell child got killed mid-\`docker stop\`, leaving stopped
   containers behind.
3. The \`docker rm\` step in cleanup was gated on \`not self._persistent\`
   (the bind-mount-persistence flag). Default config sets
   \`container_persistent: true\`, so the default happy path skipped \`rm\`
   entirely — even when the user explicitly didn't want cross-process
   reuse, containers leaked.

Fix:

* Add \`DockerEnvironment.__init__(persist_across_processes=True)\`. When
  true, init probes
  \`docker ps -a --filter label=hermes-agent=1
                  --filter label=hermes-task-id=<task>
                  --filter label=hermes-profile=<profile>\`
  and reuses a matching container (running → attach; stopped →
  \`docker start\` → attach; \`docker start\` failure → fall through to a
  fresh \`docker run\`). Multiple matches prefer the running one, with the
  stragglers left for the orphan reaper (next commit) to clean up.

* Rewrite \`cleanup()\`. Uses \`subprocess.run(..., timeout=30)\` on a
  daemon \`threading.Thread\`, not the racy \`Popen(... &)\`. The
  \`_persistent\` guard is dropped on the \`rm\` step — \`rm\` now runs
  whenever \`persist_across_processes\` is false, regardless of the
  bind-mount-persistence setting. The leak class is gone in all
  combinations.

* Add \`wait_for_cleanup(timeout)\`. \`tools/terminal_tool.py\`'s atexit
  hook calls this on every active env, blocking up to 15s for the
  cleanup thread before interpreter exit. Without this, \`hermes /quit\`
  raced the daemon-thread teardown and dropped the stop/rm work.

* New config \`terminal.docker_persist_across_processes\` (default
  \`true\` — restores the documented contract). Set \`false\` for hard
  per-process isolation. Wired through all four config-bridge sites
  (cli.py env_mappings, gateway/run.py _terminal_env_map,
  hermes_cli/config.py _config_to_env_sync, tests/conftest.py env-strip
  list); regression-pinned by
  \`test_docker_persist_across_processes_is_bridged_everywhere\` matching
  the existing pattern for docker_run_as_host_user / docker_env.

Reuse intentionally does NOT compare image / mounts / resources — only
the labels. Operators changing those settings should set
\`docker_persist_across_processes: false\` (or \`docker rm -f\` the
labeled container) to force a fresh start. This keeps the probe cheap
and the failure mode obvious.

Coverage: 12 new unit tests in tests/tools/test_docker_environment.py
covering reuse paths (running, stopped, fallback, opt-out, duplicate
preference) and cleanup behavior (persist-mode no-rm, opt-out always-rm,
no-Popen, wait_for_cleanup semantics, partial-init safety). Plus one
config-bridge regression pin.

Refs #20561
2026-05-29 11:49:54 +10:00
Ben 8d129d013b fix(docker): tag containers with hermes-agent labels for identification
Issue #20561 (Docker containers accumulate) needs a way to identify
hermes-created containers from the outside — both for the orphan reaper
(a follow-up commit) and for operators triaging `docker ps -a | grep
hermes-` after a SIGKILL leaves stragglers. The previous `hermes-<hex>`
name prefix was the only signal, which broke down under cross-process
reuse (planned) and against any custom `--name` someone might pass via
`docker_extra_args`.

This commit adds three labels at `docker run` time:

  --label hermes-agent=1                # global sweep target
  --label hermes-task-id=<sanitized>    # per-task reuse key
  --label hermes-profile=<sanitized>    # per-profile isolation key

Values are sanitized to `[A-Za-z0-9_.-]` and truncated to 63 chars so the
label round-trips cleanly through `docker ps --filter label=key=value`.
Empty or non-string inputs collapse to "unknown" rather than producing
an unqueryable empty value.

No behavior change: the labels are pure metadata. The follow-up commits
in this PR (cleanup-fix + orphan reaper) are what use them.

Refs #20561
2026-05-29 11:49:54 +10:00
Teknium 300140e006 test(tui_gateway): stop reloading server module in fixture teardown (#34217)
tui_gateway.server registers two atexit hooks at module load time:
ThreadPoolExecutor shutdown (line 170) and _shutdown_sessions (line 336).
Three test files reloaded the module on each fixture teardown to reset
per-test state. Each reload re-runs module-level code, including the
atexit registrations — duplicates accumulate across the test session.

At pytest interpreter shutdown the duplicated atexit hooks race the
stderr buffer flush:

    Fatal Python error: _enter_buffered_busy: could not acquire lock
    for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown,
    possibly due to daemon threads

pytest reports 'tests passed but the slice exited non-zero', and the
shard turns red on CI. Surfaced today on PR #34193's test slice 1
(204 files, 3572 tests passed, then Fatal Python error during exit).

Fix: drop importlib.reload(mod) from the three fixtures that have it.
Per-test reset is handled by clearing the mutable session dicts
(_sessions, _pending, _answers). _methods is also no longer cleared —
it's populated at module import time and would only be re-populated by
a reload, so clearing it without reload broke session.resume /
command.dispatch / slash.exec method registration across tests.

Affected fixtures:
- tests/tui_gateway/test_goal_command.py
- tests/tui_gateway/test_protocol.py
- tests/tui_gateway/test_review_summary_callback.py

The second reload in test_protocol.py at line 211 (reload of
tui_gateway.transport) is preserved — transport.py has no atexit hooks
or threads, so reload is safe there.

Tests: 84/84 in tests/tui_gateway/ pass cleanly with exit code 0; no
Fatal Python error at interpreter shutdown.
2026-05-28 18:16:54 -07:00
Teknium e71a2bd11b chore: release v0.15.1 (2026.5.29) (#34222) 2026-05-28 18:11:49 -07:00
Teknium 769ee86cd2 feat(kanban): attach images referenced in task bodies to worker vision (#34210)
Kanban workers now scan the task body for local image paths and
http(s) image URLs and attach them to the worker's first user turn —
matching the CLI/gateway behaviour for inbound images. Before, a
user pasting `/home/me/screenshot.png` or `https://example.com/img.png`
into a kanban task description had it sent to the model as plain
text and the pixels were never seen.

How it works:
* agent/image_routing.py gains extract_image_refs(text) → (paths, urls)
  that mirrors gateway/platforms/base.py:extract_local_files (absolute /
  ~-relative paths, image extensions only, ignores fenced/inline code).
* build_native_content_parts() accepts an optional image_urls= kwarg
  and emits passthrough image_url parts for remote URLs alongside the
  base64 data: URLs used for local paths.
* cli.py (single-query/quiet branch — the path every dispatcher-spawned
  worker takes) detects HERMES_KANBAN_TASK, reads the task body via
  kanban_db.get_task, runs extract_image_refs, and threads the results
  into the existing image-routing decision (native vs text). Best-effort:
  enrichment failures never block worker startup.

Tested:
* tests/agent/test_image_routing.py — 22 new tests for extract_image_refs
  and URL pass-through in build_native_content_parts.
* tests/hermes_cli/test_kanban_worker_image_extraction.py — 10 new tests
  driving real kanban_db round-trip (create task → read body → extract
  refs → build parts).
* E2E: created a fake kanban task with a body referencing both a local
  PNG and an https URL; verified the worker pipeline produces a
  multimodal user turn with 1 text part + 2 image_url parts (data URL
  for the local file, passthrough URL for the remote).
2026-05-28 17:50:42 -07:00
Ben 1b1e30510a test(docker): repair dashboard tests broken by the insecure-opt-in fix
The Docker integration test job started failing on main after
fb5125362 ("docker: opt in to dashboard --insecure via env var").
Two distinct failures, both fallout from that change being more
behaviour-changing than the existing test harness anticipated.

Failure 1 — test_dashboard_port_override (silent regression in an
already-existing test)
The test starts the container with just HERMES_DASHBOARD=1, defaults
to host=0.0.0.0, no HERMES_DASHBOARD_OAUTH_CLIENT_ID, no
HERMES_DASHBOARD_INSECURE. Pre-fix that combination got --insecure
auto-injected by the s6 run script (anything non-loopback was
implicitly insecure), so the OAuth gate stayed off and start_server
bound the port. Post-fix the gate engages, no provider is
registered, and start_server raises SystemExit before binding —
under s6 the dashboard goes into a restart loop and the test's
/proc/net/tcp poll finds nothing.

Same silent regression was masking three sibling tests
(test_dashboard_slot_reports_up_when_enabled, test_dashboard_opt_in_starts,
test_dashboard_restarts_after_crash) — they all only sample pgrep
or s6-svstat and so caught the supervised process mid-restart
loop, appearing to pass while the dashboard was actually never
reaching a healthy state.

Fix: pin HERMES_DASHBOARD_INSECURE=1 on every test that enables
the dashboard but doesn't itself exercise the auth gate. Each
pinned site carries an inline comment pointing back to
test_dashboard_slot_reports_up_when_enabled for the full
rationale.

Failure 2 — test_dashboard_oauth_gate_engages_on_non_loopback_bind
(bug in the test I added in fb5125362)
The probe used urllib.request.urlopen() against /api/status. Under
the now-engaged OAuth gate /api/status no longer answers
unauthenticated callers (the gate middleware runs upstream of the
legacy _SESSION_TOKEN allowlist and 401s anything without a valid
session cookie). urlopen() raises HTTPError on the 401, the wrapper
treated that as "not ready yet", and the poll loop hit
timeout.

Fix: split the probe into a generic _http_probe() helper that
returns (status_code, body) for any HTTP response — including 401,
which IS the gate-engaged success signal. The helper feeds a
multi-line Python program over stdin via a POSIX heredoc so the
try/except branch reads naturally; far less fragile than the
earlier semicolon-laden -c one-liner.

The OAuth-gate test now verifies two independent observable
consequences of the gate being on:

  1. GET /api/auth/providers (publicly reachable through the gate
     so the login page can bootstrap) returns 200 with `nous` in
     the provider list — proves the bundled provider registered.
  2. GET /api/status returns 401 — proves the OAuth gate runs
     upstream of the legacy public-paths allowlist and is
     actively intercepting unauthenticated callers.

The insecure-opt-out test still hits /api/status, but now
asserts status_code == 200 first (proves the gate is bypassed)
before parsing the JSON for auth_required: false (proves the
gate-state flag is also correctly off).

Verified locally end-to-end against a fresh image build on a
real Docker daemon: all 41 tests under tests/docker/ pass in
2m38s, including the two formerly-failing dashboard tests and
the three sibling tests that were passing by accident.
2026-05-29 10:30:52 +10:00
Teknium f3acdd94fe Merge pull request #30698 from NousResearch/refactor/use-ds-primitives
refactor(web): consume DS primitives, remove local component copies
2026-05-28 17:29:28 -07:00
Teknium 78a54d2c00 fix(skills-page): source pills and category sidebar collapsed to All only (#34194)
Regression from PR #33809 (lazy-fetch refactor). The `sources` and
`categoryEntries` useMemo blocks were derived from `allSkillsLocal`
but had empty/incomplete deps arrays — so they computed once at mount
when the catalog was still `[]`, then never recomputed when the fetch
resolved.

Symptom: live site shows only the "All 87,639" source button and
"All Skills 87,639" category — no per-source pills (ClawHub, skills.sh,
LobeHub, etc.) and no category breakdown. Filtering by source/category
is unusable.

Fix: add `allSkillsLocal` to both deps arrays so they recompute when
data arrives. Local build green on en + zh-Hans.
2026-05-28 17:11:40 -07:00
Ben e7c99651fb fix(mcp): resolve bare npx/npm/node against /usr/local/bin
When the Hermes Docker image runs an stdio MCP server configured with an
explicit env.PATH that omits /usr/local/bin (a common pattern when users
hand-author PATH for sandboxing), the MCP env-filter passes that narrow
PATH straight through to the subprocess. _resolve_stdio_command's
fallback for bare 'npx' / 'npm' / 'node' commands only checked
$HERMES_HOME/node/bin/ and ~/.local/bin/, so execvp() failed with
'[Errno 2] No such file or directory: npx' on every Node-based stdio
MCP server (Railway, Anthropic, GitHub Copilot, etc.).

The naive workaround — symlink /usr/local/bin/npx into the user's PATH —
fails one layer deeper because npx's shebang re-execs /usr/bin/env node
and node also lives at /usr/local/bin/node.

Fix: add /usr/local/bin/<cmd> as a third candidate in the fallback list.
This is the canonical install location for Node on:
  - Linux from-source builds
  - the upstream node:bookworm-slim image, which the Hermes Docker
    image copies node + npm + corepack from since #4977 (the Node 22 LTS
    refactor that exposed this)
  - macOS Homebrew on Intel

Because the resolver already calls _prepend_path(resolved_env, command_dir)
after locating the command, /usr/local/bin gets prepended to the env's
PATH automatically, which also fixes the second-layer shebang failure
(npx-cli.js can now find node).

Scope is intentionally narrow: the fix activates only when the bare
command isn't otherwise locatable through the user's PATH. Users who
explicitly narrowed PATH for a non-Node MCP server see no change in
behavior.

Tested:
  - tests/tools/test_mcp_tool_issue_948.py: new test
    test_resolve_stdio_command_falls_back_to_usr_local_bin (mirrors the
    existing hermes-node-bin fallback test)
  - Full MCP test suite: 254/254 pass across 7 test files
  - E2E against a freshly-built Docker image: reproduced the original
    failure mode (env.PATH=/opt/data/bin:/usr/bin:/bin), confirmed the
    resolver returns /usr/local/bin/npx and prepends /usr/local/bin to
    PATH; subprocess.run of the resolved command prints '10.9.8' and
    exits 0 with empty stderr
  - Negative E2E on the host (where Node is already on PATH via mise):
    resolver still hits the mise install dir, /usr/local/bin candidate
    is not consulted, PATH is unchanged
2026-05-29 10:05:42 +10:00
Ben fb51253620 docker: opt in to dashboard --insecure via env var, never derive from bind host
The s6 dashboard run script flipped `--insecure` on whenever
`HERMES_DASHBOARD_HOST` was anything other than 127.0.0.1 / localhost.
That comment ("the dashboard refuses otherwise") predates the OAuth
auth gate: back when it was written, `start_server` would SystemExit
on any non-loopback bind, so the run script's `--insecure` was the
only way to make in-container deployments work at all.

The gate has since been replaced by `should_require_auth(host,
allow_public)`, which engages the OAuth flow when a
`DashboardAuthProvider` is registered (the bundled `dashboard_auth/nous`
provider auto-registers on `HERMES_DASHBOARD_OAUTH_CLIENT_ID`) and
fails closed with a specific operator-facing error when none is. The
host-derived `--insecure` ran upstream of all that and silently
disabled the gate on every container-deployed dashboard.

Most visible under the portal's wildcard-subdomain rollout: every Fly
machine binds 0.0.0.0 so the edge can reach Flycast, every machine
boots with the correct `HERMES_DASHBOARD_OAUTH_CLIENT_ID`, the nous
provider registers — and `/api/status` still returns
`{"auth_required": false, "auth_providers": ["nous"]}` because the
run script disabled the gate before `start_server` ever saw the
request. The dashboard SPA was served to anyone, no `/login` redirect,
no OAuth challenge.

Fix: derive `--insecure` from an explicit opt-in env var,
`HERMES_DASHBOARD_INSECURE` (truthy values matching the rest of the
s6 boolean envs: 1, true, TRUE, True, yes, YES, Yes). Operators on
trusted LANs behind a reverse proxy without the OAuth contract
(the existing `docker-compose.windows.yml` use case) opt in
explicitly; portal-managed agent deployments leave it unset and let
the gate engage.

`docker-compose.windows.yml` already passes `--insecure` on the
`command:` array directly (line 38), so it doesn't depend on the s6
auto-injection. No compose-file change required.

Tests:
* `tests/test_docker_home_override_scripts.py` — extends the existing
  static-text guard with a regression assertion that the legacy
  host-derived case-statement is gone and the new env-var opt-in is
  present (locks against accidental revert).
* `tests/docker/test_dashboard.py` — adds two Docker-in-Docker tests
  exercising the actual `/api/status` round-trip:
  - 0.0.0.0 bind + `HERMES_DASHBOARD_OAUTH_CLIENT_ID` → gate engaged
  - 0.0.0.0 bind + `HERMES_DASHBOARD_INSECURE=1` → gate disabled

Docs:
* `website/docs/user-guide/docker.md` + zh-Hans i18n — adds the new
  env var to the table, replaces the stale prose ("the entrypoint
  no longer auto-enables insecure mode" — which until this PR was
  flat-out wrong) with an accurate description of the gate's
  trigger conditions and the explicit opt-out.

shellcheck clean. Python static-text test passes locally. Behavioural
test will run against any future image build (CI's Docker harness).
2026-05-29 09:56:40 +10:00
Evo ef009a987a docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE from #33583 (#33751)
* docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (en)

* docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (en)

* docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (zh)

* docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (zh)
2026-05-29 09:44:53 +10:00
BROCCOLO1D 130396c658 ci(docker): avoid gha cache on arm64 PR builds 2026-05-29 09:43:48 +10:00
Austin Pickett a5c1f925b5 fix(web): stop /api/auth/me 401 from triggering a reload loop
In loopback mode the dashboard's identity probe (/api/auth/me) returns
401 by design — AuthWidget swallows it and renders nothing. But the
probe routed through fetchJSON, whose loopback 401 handler treats a 401
as a rotated session token and full-page-reloads to pick up a fresh one.
That reload is guarded by a one-shot sessionStorage flag which every
*successful* request clears, so with auth/me reliably 401ing and the
other dashboard calls (status/config/sessions) reliably succeeding, the
guard never sticks and the page reload-loops indefinitely (the "boot
flash").

Add an allowUnauthorized option to fetchJSON that skips only the loopback
stale-token reload (the 401 still throws so AuthWidget can catch it, and
the gated-mode login_url envelope redirect is unaffected), and use it for
getAuthMe.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 16:58:42 -04:00
kshitij 11d93096b3 Merge pull request #34097 from kshitijk4poor/salvage/memori-trace-messages
feat: expose completed-turn message context to memory providers (salvage #28065)
2026-05-28 13:56:07 -07:00
kshitijk4poor d464d08a5f chore: add devwdave to AUTHOR_MAP
Maps both commit emails (david@memorilabs.ai, dave@devwdave.com) used on
#28065 to the devwdave GitHub account so the contributor audit in
scripts/release.py passes.
2026-05-29 02:16:43 +05:30
Dave Heritage 5a95fb2e14 feat: expose completed-turn message context to memory providers
Adds an optional `messages` keyword to the `MemoryProvider.sync_turn`
contract so external/community memory plugins can receive the OpenAI-style
conversation message list for the completed turn — including assistant tool
calls and tool result content — not just the final assistant text.

Dispatch uses signature inspection (`_provider_sync_accepts_messages`): only
providers that declare a `messages` parameter (or `**kwargs`) receive it; all
existing in-tree providers keep their legacy text-only signature and are
called unchanged. No structured-trace envelope is added to core — providers
reconstruct whatever they need from the standard message list.

Also documents Memori as a standalone community memory provider.

Salvaged from #28065 — rebased onto current main.

Co-authored-by: Dave Heritage <david@memorilabs.ai>
2026-05-29 02:16:43 +05:30
Austin Pickett 0acb7f4583 fix(nix): update hermes-web npmDepsHash for @nous-research/ui 0.18.2
The web/package-lock.json changed when bumping @nous-research/ui to
0.18.2, so the fetchNpmDeps fixed-output hash in nix/web.nix was stale.
Update it to the hash prefetch-npm-deps computes for the new lockfile.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 16:24:01 -04:00
Austin Pickett a3cd974ee7 chore(web): bump @nous-research/ui to 0.18.2
Picks up the deferred GPU-tier detection fix (design-language) that
stops the synchronous WebGL probe from blocking first paint, which was
causing a boot-time flash in the dashboard backdrop.

nix/web.nix npmDepsHash is a placeholder here and is corrected in the
follow-up commit using the hash reported by the Nix CI job.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 16:20:14 -04:00
Teknium ea5a6c216b ci(deploy): allow workflow_dispatch to also trigger Vercel deploy (#34081)
Today's three skills-index PRs (#33748, #33809, #34025) merged to main
but the live Vercel-hosted docs site didn't pick them up — Vercel is
fired by the deploy-vercel job, which was gated on release events only.
Out-of-band main commits between releases couldn't reach Vercel without
cutting a tag.

Widen the gate to also include workflow_dispatch so 'gh workflow run
deploy-site.yml' can ship pending main changes to Vercel on demand.
Release-tag behavior is unchanged.
2026-05-28 13:17:58 -07:00
kshitijk4poor 4df62d239e docs(hindsight): correct recall_types scope — tool path is also narrowed
The original change's description and README claimed the per-call
hindsight_recall tool was unaffected by the new observation-only default.
That is inaccurate: hindsight_recall reads the same self._recall_types
instance attribute as the auto-recall prefetch path, and RECALL_SCHEMA
exposes no per-call types argument, so the model cannot override it.
Narrowing the default narrows BOTH paths.

Corrects the README behavior-change note, the config-table row, and the
get_config_schema description to reflect that recall_types applies to
both auto-recall and the hindsight_recall tool.
2026-05-28 13:07:20 -07:00
Nicolò Boschi 490b3e76b1 feat(hindsight): default recall_types to observation only
Auto-recall used to surface every fact type Hindsight had on the
session — `world`, `experience`, and `observation`. That triple-ships
the same underlying signal in three different framings: observations
are the concrete events the user said/did/asked, while world and
experience facts are aggregate summaries Hindsight derives from those
exact observations. Including all three burns most of
`recall_max_tokens` on rephrasings, crowds out events the model
actually needs to see, and produces effective duplicates in the
prompt — observations themselves are deduplicated by construction
so observation-only recall is denser per token and closer to
conversational ground truth.

Change
------
- Default `_recall_types = ["observation"]` (was `None`, which
  delegated to server-side "return everything").
- `initialize()` now treats a missing `recall_types` config the same
  way; also accepts comma-separated strings for parity with `recall_tags`.
- An explicit `recall_types=[]` config falls back to the default rather
  than disabling the filter (would silently widen recall vs. the new
  default).
- Added to `get_config_schema()` so it's discoverable via `hermes config`.

Per-call `hindsight_recall` tool invocations are unaffected — they
already only forward `types` when the caller passes the argument.

Docs / migration
----------------
plugins/memory/hindsight/README.md grows a "Behavior change" callout
explaining the why (no-duplicates, information-efficient) and how to
restore the legacy broad recall:

    "recall_types": "observation,world,experience"   # or a JSON list

in `~/.hermes/hindsight/config.json`.

Tests
-----
- `test_default_values` updated for the new default.
- New cases: explicit list override, CSV string accepted, empty list
  falls back to default (not "wider than default").
2026-05-28 13:07:20 -07:00
Austin Pickett 102eb4adc0 fix(nix): update hermes-web npmDepsHash for bumped @nous-research/ui
The web/package-lock.json changed when bumping @nous-research/ui to 0.18.0,
so the fetchNpmDeps fixed-output hash in nix/web.nix was stale and the nix
build failed. Update it to the hash prefetch-npm-deps computes for the new
lockfile.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 14:27:08 -04:00
Austin Pickett c661fefa08 Merge remote-tracking branch 'origin/main' into refactor/use-ds-primitives
Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#	web/src/components/BottomPickSheet.tsx
#	web/src/components/SidebarFooter.tsx
#	web/src/components/ui/card.tsx
#	web/src/components/ui/confirm-dialog.tsx
#	web/src/pages/ChatPage.tsx
2026-05-28 14:20:49 -04:00
Austin Pickett c9e5a9bb08 refactor(web): consume DS primitives, remove local component copies
Replace locally-forked UI components and hooks with their newly
promoted counterparts from @nous-research/ui:

Deleted local components (now in DS):
- components/ui/input.tsx, label.tsx, separator.tsx, card.tsx,
  confirm-dialog.tsx
- components/Toast.tsx, BottomPickSheet.tsx, NouiTypography.tsx
- hooks/useToast.ts, useModalBehavior.ts, useBelowBreakpoint.ts,
  useConfirmDelete.ts

Import updates across 25 files to use DS deep imports:
- @nous-research/ui/ui/components/{input,label,separator,card,
  confirm-dialog,toast,bottom-sheet}
- @nous-research/ui/ui/components/typography (replaces NouiTypography)
- @nous-research/ui/hooks/{use-toast,use-modal-behavior,
  use-below-breakpoint,use-confirm-delete}

Requires design-language >= feat/promote-hermes-web-primitives.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-22 21:57:59 -04:00
95 changed files with 6074 additions and 1049 deletions
+6 -1
View File
@@ -22,7 +22,12 @@ concurrency:
jobs:
deploy-vercel:
if: github.event_name == 'release'
# Triggered automatically on release publish (production cuts) and
# manually via `gh workflow run deploy-site.yml` when an out-of-band
# main commit needs to ship live before the next release tag — e.g.
# a skills-index PR that doesn't touch website/** paths and so
# doesn't auto-deploy via the deploy-docs path.
if: github.event_name == 'release' || github.event_name == 'workflow_dispatch'
runs-on: ubuntu-latest
steps:
- name: Trigger Vercel Deploy
+20 -4
View File
@@ -196,10 +196,26 @@ jobs:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
# Build once, load into the local daemon for smoke testing. Cached
# to gha with a per-arch scope; the push step below reuses every
# layer from this build.
- name: Build image (arm64, smoke test)
# Build once, load into the local daemon for smoke testing. PR arm64
# builds deliberately avoid the gha cache: cold-cache arm64 builds can
# outlive GitHub's short-lived Azure cache SAS token, then fail while
# reading or writing cache blobs before the smoke test can run.
- name: Build image (arm64, smoke test, uncached PR)
if: github.event_name == 'pull_request'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
file: Dockerfile
load: true
platforms: linux/arm64
tags: ${{ env.IMAGE_NAME }}:test
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
# Main/release builds still use the per-arch gha cache so the digest
# push below can reuse layers from this smoke-test build.
- name: Build image (arm64, smoke test, cached publish)
if: github.event_name != 'pull_request'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
+110
View File
@@ -0,0 +1,110 @@
# Hermes Agent v0.15.1 (v2026.5.29)
**Release Date:** May 29, 2026
**Since v0.15.0:** 28 commits · 21 merged PRs · hotfix release · 9 contributors
> **The Patch Release.** A same-day hotfix for v0.15.0. Headline fix: the dashboard infinite-reload loop that hit anyone running v0.15.0 in loopback mode (Docker, hosted Hermes, fresh installs). A handful of other v0.15.0 follow-ups go along for the ride — kanban worker SIGTERM, `/model` picker unification, `/yolo` session bypass, the full 19,932-entry skills.sh catalog, `.md` media delivery restoration, gateway probe-stepdown safety, web-URL redaction passthrough, kanban worker vision on referenced images, hindsight observation-default. Docker users get an explicit `--insecure` opt-in env var (no more bind-host inference), MCP server bare-command PATH resolution, and arm64 PR-build cache fixes.
---
## ✨ Highlights
- **Dashboard 401 reload loop fixed** — In loopback mode the dashboard's identity probe (`/api/auth/me`) returns 401 by design, but v0.15.0's stale-token reload guard treated every 401 as a rotated session token and full-page-reloaded to pick up a fresh one. Every successful sibling call cleared the one-shot reload guard, so the page reload-looped forever (Firefox: "Navigated to /sessions" storm; Chrome: React re-render storm). Fix adds an `allowUnauthorized` opt-out to `fetchJSON` that skips only the loopback stale-token reload — 401 still throws so `AuthWidget` swallows it, gated-mode `login_url` redirects are unaffected. Closes [#34206](https://github.com/NousResearch/hermes-agent/issues/34206), [#34202](https://github.com/NousResearch/hermes-agent/issues/34202). ([#30698](https://github.com/NousResearch/hermes-agent/pull/30698) — @austinpickett)
- **Docker dashboard `--insecure` is now an explicit env opt-in, never derived from bind host** — Previously the Docker entrypoint inferred `--insecure` when the dashboard bound to a non-loopback host. That conflated "I want LAN access" with "I want to disable the same-origin guard." The fix splits them: bind host is bind host, and disabling the dashboard's loopback auth requires an explicit `HERMES_DASHBOARD_INSECURE=1`. Existing setups that genuinely wanted insecure binding must now set the env var. ([#34188](https://github.com/NousResearch/hermes-agent/pull/34188), [#34204](https://github.com/NousResearch/hermes-agent/pull/34204) — @benbarclay)
- **MCP bare command resolution under Docker** — MCP servers configured with bare commands (`npx`, `npm`, `node`) now resolve against `/usr/local/bin` so they actually launch inside the Docker image where those binaries live. v0.15.0 left these failing silently in containers when the agent's effective PATH didn't include the Node toolchain location. ([#34186](https://github.com/NousResearch/hermes-agent/pull/34186) — @benbarclay)
- **Skills page sidebar / source pills restored** — A stale `useMemo` dependency in the new dashboard skills page collapsed the source pills and category sidebar to "All" only. Fixed; both surfaces now reflect the live catalog state. ([#34194](https://github.com/NousResearch/hermes-agent/pull/34194))
- **Kanban worker can be killed again** — `SIGTERM` on a kanban worker was being absorbed by an intermediate process and the worker stayed running. Closes [#28181](https://github.com/NousResearch/hermes-agent/issues/28181). ([#34045](https://github.com/NousResearch/hermes-agent/pull/34045))
- **Full skills.sh catalog (858 → 19,932 entries)** — The skills hub page was pulling a partial paginated catalog. The fetch now walks the sitemap, so all 19,932 skills.sh entries surface in the picker instead of just the first 858. ([#34025](https://github.com/NousResearch/hermes-agent/pull/34025))
---
## 🐛 Bug Fixes
### Dashboard / Web
- **`/api/auth/me` 401 no longer triggers reload loop** in loopback mode — ([#30698](https://github.com/NousResearch/hermes-agent/pull/30698) — @austinpickett)
- **Skills page source pills + category sidebar restored** — stale `useMemo` dep ([#34194](https://github.com/NousResearch/hermes-agent/pull/34194))
### Docker
- **`--insecure` is now explicit opt-in via env var**, not derived from bind host ([#34188](https://github.com/NousResearch/hermes-agent/pull/34188) — @benbarclay)
- **Dashboard test suite repaired** to match the insecure-opt-in fix ([#34204](https://github.com/NousResearch/hermes-agent/pull/34204) — @benbarclay)
- **arm64 PR builds skip the GHA cache** to avoid cache-thrash on cross-arch builders ([#33704](https://github.com/NousResearch/hermes-agent/pull/33704) — @BROCCOLO1D)
### MCP
- **Bare `npx`/`npm`/`node` resolve against `/usr/local/bin`** for Docker compatibility ([#34186](https://github.com/NousResearch/hermes-agent/pull/34186) — @benbarclay)
### Kanban
- **Worker SIGTERM actually terminates the process** ([#34045](https://github.com/NousResearch/hermes-agent/pull/34045))
- **Workers receive images referenced in task bodies** for vision-capable models ([#34210](https://github.com/NousResearch/hermes-agent/pull/34210))
### Gateway
- **`.md` files deliver again** — media-delivery validation defaults to denylist-only instead of an overly-narrow allowlist ([#34022](https://github.com/NousResearch/hermes-agent/pull/34022))
- **Probe stepdown safety** — on a context-overflow without an explicit provider context limit, the agent no longer steps down to a smaller model based on an unknown ceiling (salvage of [#33673](https://github.com/NousResearch/hermes-agent/pull/33673)) ([#33826](https://github.com/NousResearch/hermes-agent/pull/33826))
### CLI
- **`/yolo` mid-session enables the per-session bypass** instead of just toggling the env var (which the running agent had already snapshotted) ([#33931](https://github.com/NousResearch/hermes-agent/pull/33931) — @kshitijk4poor)
- **`/model` and `hermes model` show the same list**, plus disk cache for picker startup ([#33867](https://github.com/NousResearch/hermes-agent/pull/33867))
### Skills
- **Full skills.sh catalog via sitemap** — 858 → 19,932 entries ([#34025](https://github.com/NousResearch/hermes-agent/pull/34025))
### Redaction
- **Web URLs pass through unchanged** — the redactor was eating query parameters that looked credential-shaped ([#34029](https://github.com/NousResearch/hermes-agent/pull/34029))
---
## ✨ Small Features
- **Hindsight default narrowed to observation-only** for `recall_types` — tool path is also narrowed ([#34079](https://github.com/NousResearch/hermes-agent/pull/34079) — @nicoloboschi, follow-up [#34091](https://github.com/NousResearch/hermes-agent/pull/4df62d239e38bf8c212a595721c9c01e176f6c3a) — @kshitijk4poor)
- **Memory providers receive completed-turn message context** — salvage of [#28065](https://github.com/NousResearch/hermes-agent/pull/28065) ([#34097](https://github.com/NousResearch/hermes-agent/pull/34097) — @kshitijk4poor, credit to @devwdave)
---
## 📚 Documentation
- **`--no-supervise` / `HERMES_GATEWAY_NO_SUPERVISE` documented** in the reference docs (follow-up to [#33583](https://github.com/NousResearch/hermes-agent/pull/33583)) ([#33751](https://github.com/NousResearch/hermes-agent/pull/33751) — @r266-tech)
---
## 🛠️ Infrastructure
- **Vercel deploy workflow accepts `workflow_dispatch`** so docs deploys can be manually triggered ([#34081](https://github.com/NousResearch/hermes-agent/pull/34081))
- **`@nous-research/ui` bumped to 0.18.2** (Nix `npmDepsHash` also updated to match) ([#34193](https://github.com/NousResearch/hermes-agent/pull/34193) follow-ups — @austinpickett)
---
## 👥 Contributors
### Core
- @teknium1
### Community
- @austinpickett — dashboard 401 reload-loop fix (the headline), `@nous-research/ui` bump, Nix `npmDepsHash` updates
- @benbarclay — Docker `--insecure` opt-in, MCP bare-command resolution, dashboard test repair
- @kshitijk4poor`/yolo` session bypass, completed-turn memory context salvage, hindsight follow-up docs
- @nicoloboschi — hindsight `recall_types` observation default
- @BROCCOLO1D — arm64 PR build cache fix
- @r266-tech — `--no-supervise` reference docs
- @yangguangjin — probe stepdown safety (salvage of @yanghd's #33673)
- @devwdave — completed-turn memory context (credited via salvage)
- @andrewhosf — co-author
### Issue Reporters (the 401 loop)
- @routesmith ([#34206](https://github.com/NousResearch/hermes-agent/issues/34206))
- @beeaton ([#34202](https://github.com/NousResearch/hermes-agent/issues/34202))
---
**Full Changelog**: [v2026.5.28...v2026.5.29](https://github.com/NousResearch/hermes-agent/compare/v2026.5.28...v2026.5.29)
+2 -2
View File
@@ -1,7 +1,7 @@
{
"id": "hermes-agent",
"name": "Hermes Agent",
"version": "0.15.0",
"version": "0.15.1",
"description": "Self-improving open-source AI agent by Nous Research with ACP editor integration, persistent memory, skills, and rich tool support.",
"repository": "https://github.com/NousResearch/hermes-agent",
"website": "https://hermes-agent.nousresearch.com/docs/user-guide/features/acp",
@@ -9,7 +9,7 @@
"license": "MIT",
"distribution": {
"uvx": {
"package": "hermes-agent[acp]==0.15.0",
"package": "hermes-agent[acp]==0.15.1",
"args": ["hermes-acp"]
}
}
+1
View File
@@ -4561,6 +4561,7 @@ def run_conversation(
original_user_message=original_user_message,
final_response=final_response,
interrupted=interrupted,
messages=messages,
)
# Background memory/skill review — runs AFTER the response is delivered
+134 -14
View File
@@ -37,6 +37,8 @@ from __future__ import annotations
import base64
import logging
import mimetypes
import os
import re
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
@@ -46,6 +48,102 @@ logger = logging.getLogger(__name__)
_VALID_MODES = frozenset({"auto", "native", "text"})
# Image extensions used by extract_image_refs(). Kept tight on purpose — we
# only auto-attach things the model can actually see. Documents/archives are
# excluded because the gateway's broader extract_local_files() also routes
# them differently (send_document), and we don't want to attach a PDF as a
# vision part.
_IMAGE_EXTS = (
".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tiff", ".tif", ".heic",
)
_IMAGE_EXT_PATTERN = "|".join(e.lstrip(".") for e in _IMAGE_EXTS)
# Absolute / home-relative local image path. Matches the same shape gateway's
# extract_local_files() uses: anchors to ``~/`` or ``/``, ignores matches inside
# URLs (the ``(?<![/:\w.])`` lookbehind), and case-insensitive on the extension.
_LOCAL_IMAGE_PATH_RE = re.compile(
r"(?<![/:\w.])(?:~/|/)(?:[\w.\-]+/)*[\w.\-]+\.(?:" + _IMAGE_EXT_PATTERN + r")\b",
re.IGNORECASE,
)
# http(s) URL ending in an image extension (optionally followed by a
# query string). Case-insensitive on the extension. Strict ``http(s)://``
# scheme so we don't accidentally grab ``file://`` URLs or other shapes.
_IMAGE_URL_RE = re.compile(
r"https?://[^\s<>\"']+?\.(?:" + _IMAGE_EXT_PATTERN + r")(?:\?[^\s<>\"']*)?",
re.IGNORECASE,
)
def extract_image_refs(text: str) -> Tuple[List[str], List[str]]:
"""Scan free-form text for image references the model should see.
Returns ``(local_paths, urls)``:
* ``local_paths`` — absolute (``/``) or home-relative (``~/``) paths
whose suffix is an image extension AND whose expanded form exists
on disk as a file. Order-preserving, deduplicated.
* ``urls`` — ``http(s)://…`` URLs whose path ends in an image
extension (a ``?query`` is allowed after the extension).
Order-preserving, deduplicated.
Matches inside fenced code blocks (``` ``` ```) and inline backticks
(`` `…` ``) are skipped so that snippets pasted into a task body for
reference aren't mistaken for live attachments. This mirrors the
behaviour of ``gateway.platforms.base.BaseAdapter.extract_local_files``.
Local paths are validated against the filesystem; URLs are not
(the provider fetches them at request time).
"""
if not isinstance(text, str) or not text:
return [], []
# Build spans covered by fenced code blocks and inline code so we can
# ignore references the author embedded purely as example text.
code_spans: list[tuple[int, int]] = []
for m in re.finditer(r"```[^\n]*\n.*?```", text, re.DOTALL):
code_spans.append((m.start(), m.end()))
for m in re.finditer(r"`[^`\n]+`", text):
code_spans.append((m.start(), m.end()))
def _in_code(pos: int) -> bool:
return any(s <= pos < e for s, e in code_spans)
local_paths: list[str] = []
seen_paths: set[str] = set()
for match in _LOCAL_IMAGE_PATH_RE.finditer(text):
if _in_code(match.start()):
continue
raw = match.group(0)
expanded = os.path.expanduser(raw)
try:
if not os.path.isfile(expanded):
continue
except OSError:
# ENAMETOOLONG / EINVAL on pathological inputs — skip rather than crash.
continue
if expanded in seen_paths:
continue
seen_paths.add(expanded)
local_paths.append(expanded)
urls: list[str] = []
seen_urls: set[str] = set()
for match in _IMAGE_URL_RE.finditer(text):
if _in_code(match.start()):
continue
url = match.group(0)
# Strip trailing punctuation that's almost certainly prose, not part
# of the URL (e.g. "see https://x.com/a.png." or "/a.png)").
url = url.rstrip(".,;:!?)]>")
if url in seen_urls:
continue
seen_urls.add(url)
urls.append(url)
return local_paths, urls
# Strict YAML/JSON boolean coercion for capability overrides.
#
# ``bool("false")`` is True in Python because non-empty strings are truthy, so
@@ -320,20 +418,29 @@ def _file_to_data_url(path: Path) -> Optional[str]:
def build_native_content_parts(
user_text: str,
image_paths: List[str],
image_urls: Optional[List[str]] = None,
) -> Tuple[List[Dict[str, Any]], List[str]]:
"""Build an OpenAI-style ``content`` list for a user turn.
Shape:
[{"type": "text", "text": "...\\n\\n[Image attached at: /local/path]"},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
{"type": "image_url", "image_url": {"url": "https://example.com/a.png"}},
...]
The local path of each successfully attached image is appended to the
text part as ``[Image attached at: <path>]``. The model still sees the
pixels via the ``image_url`` part (full native vision); the path note
just gives it a string handle so MCP/skill tools that take an image
path or URL argument can be invoked on the same image without an
extra round-trip. This parallels the text-mode hint produced by
Local paths are read from disk and embedded as base64 ``data:`` URLs.
Remote URLs (``http(s)://``) are passed through verbatim — the provider
fetches them server-side. The model still sees the pixels either way.
For each successfully attached image, a hint is appended to the text
part:
* local path → ``[Image attached at: <path>]``
* URL → ``[Image attached: <url>]``
The hint gives the model a string handle so MCP/skill tools that take
an image path or URL argument can be invoked on the same image without
an extra round-trip. This parallels the text-mode hint produced by
``Runner._enrich_message_with_vision`` (``vision_analyze using image_url:
<path>``) so behaviour is consistent across both image input modes.
@@ -342,12 +449,14 @@ def build_native_content_parts(
ceiling), the agent's retry loop transparently shrinks and retries
once — see ``run_agent._try_shrink_image_parts_in_messages``.
Returns (content_parts, skipped_paths). Skipped paths are files that
couldn't be read from disk and are NOT advertised in the path hints.
Returns (content_parts, skipped). Skipped entries are local paths
that couldn't be read from disk; URLs are never skipped (they're
not validated here).
"""
skipped: List[str] = []
image_parts: List[Dict[str, Any]] = []
attached_paths: List[str] = []
attached_urls: List[str] = []
for raw_path in image_paths:
p = Path(raw_path)
@@ -364,16 +473,26 @@ def build_native_content_parts(
})
attached_paths.append(str(raw_path))
for url in image_urls or []:
url = (url or "").strip()
if not url:
continue
image_parts.append({
"type": "image_url",
"image_url": {"url": url},
})
attached_urls.append(url)
text = (user_text or "").strip()
# If at least one image attached, build a single text part that combines
# the user's caption (or a neutral default) with one path hint per image.
if attached_paths:
# the user's caption (or a neutral default) with one hint per image.
if attached_paths or attached_urls:
base_text = text or "What do you see in this image?"
path_hints = "\n".join(
f"[Image attached at: {p}]" for p in attached_paths
)
combined_text = f"{base_text}\n\n{path_hints}"
hint_lines: List[str] = []
hint_lines.extend(f"[Image attached at: {p}]" for p in attached_paths)
hint_lines.extend(f"[Image attached: {u}]" for u in attached_urls)
combined_text = f"{base_text}\n\n" + "\n".join(hint_lines)
parts: List[Dict[str, Any]] = [{"type": "text", "text": combined_text}]
parts.extend(image_parts)
return parts, skipped
@@ -388,4 +507,5 @@ def build_native_content_parts(
__all__ = [
"decide_image_input_mode",
"build_native_content_parts",
"extract_image_refs",
]
+33 -2
View File
@@ -368,11 +368,42 @@ class MemoryManager:
# -- Sync ----------------------------------------------------------------
def sync_all(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
@staticmethod
def _provider_sync_accepts_messages(provider: MemoryProvider) -> bool:
"""Return whether sync_turn accepts a messages keyword."""
try:
signature = inspect.signature(provider.sync_turn)
except (TypeError, ValueError):
return True
params = list(signature.parameters.values())
if any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params):
return True
return "messages" in signature.parameters
def sync_all(
self,
user_content: str,
assistant_content: str,
*,
session_id: str = "",
messages: Optional[List[Dict[str, Any]]] = None,
) -> None:
"""Sync a completed turn to all providers."""
for provider in self._providers:
try:
provider.sync_turn(user_content, assistant_content, session_id=session_id)
if messages is not None and self._provider_sync_accepts_messages(provider):
provider.sync_turn(
user_content,
assistant_content,
session_id=session_id,
messages=messages,
)
else:
provider.sync_turn(
user_content,
assistant_content,
session_id=session_id,
)
except Exception as e:
logger.warning(
"Memory provider '%s' sync_turn failed: %s",
+12 -1
View File
@@ -112,11 +112,22 @@ class MemoryProvider(ABC):
that do background prefetching should override this.
"""
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
def sync_turn(
self,
user_content: str,
assistant_content: str,
*,
session_id: str = "",
messages: Optional[List[Dict[str, Any]]] = None,
) -> None:
"""Persist a completed turn to the backend.
Called after each turn. Should be non-blocking — queue for
background processing if the backend has latency.
``messages`` is the OpenAI-style conversation message list as of the
completed turn, including any assistant tool calls and tool results.
Providers that do not need raw turn context can ignore it.
"""
@abstractmethod
+52 -6
View File
@@ -576,6 +576,8 @@ def load_cli_config() -> Dict[str, Any]:
"docker_env": "TERMINAL_DOCKER_ENV",
"docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
"docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
"docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
"docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
"sandbox_dir": "TERMINAL_SANDBOX_DIR",
# Persistent shell (non-local backends)
"persistent_shell": "TERMINAL_PERSISTENT_SHELL",
@@ -15125,13 +15127,50 @@ def main(
# Handle single query mode
if query or image:
query, single_query_images = _collect_query_images(query, image)
# Kanban workers spawn with ``hermes chat -q "work kanban task <id>"``;
# the actual task description lives in the task body. Mirror the
# gateway/CLI behaviour for inbound images by scanning the body for
# local image paths and http(s) image URLs and attaching them to the
# worker's first turn. Without this, users who paste a screenshot
# path or URL into a kanban task body never get it routed to the
# model's vision input.
single_query_image_urls: list[str] = []
_kanban_task_id = os.environ.get("HERMES_KANBAN_TASK", "").strip()
if _kanban_task_id:
try:
from hermes_cli import kanban_db as _kb
from agent.image_routing import extract_image_refs as _extract_refs
_conn = _kb.connect()
try:
_task = _kb.get_task(_conn, _kanban_task_id)
finally:
try:
_conn.close()
except Exception:
pass
_body = getattr(_task, "body", "") if _task is not None else ""
if _body:
_kb_paths, _kb_urls = _extract_refs(_body)
if _kb_paths:
# Dedupe against any --image the user already passed.
_seen = {str(p) for p in single_query_images}
for _p in _kb_paths:
if _p not in _seen:
_seen.add(_p)
single_query_images.append(Path(_p))
if _kb_urls:
single_query_image_urls.extend(_kb_urls)
except Exception as _exc:
# Best-effort enrichment; never block worker startup on it.
logger.debug("kanban image-ref extraction failed: %s", _exc)
if quiet:
# Quiet mode: suppress banner, spinner, tool previews.
# Only print the final response and parseable session info.
cli.tool_progress_mode = "off"
if cli._ensure_runtime_credentials():
effective_query: Any = query
if single_query_images:
if single_query_images or single_query_image_urls:
# Honour the same image-routing decision used by the
# interactive path. With a vision-capable model (incl.
# custom-provider models declared via
@@ -15160,19 +15199,26 @@ def main(
_parts, _skipped = _build_parts(
query if isinstance(query, str) else "",
[str(p) for p in single_query_images],
image_urls=list(single_query_image_urls) or None,
)
if any(p.get("type") == "image_url" for p in _parts):
effective_query = _parts
else:
# All images unreadable — text fallback.
# ``_preprocess_images_with_vision`` only knows
# about local files; URLs would be lost there,
# so keep the original query text intact when
# only URLs were supplied.
if single_query_images:
effective_query = cli._preprocess_images_with_vision(
query, single_query_images, announce=False,
)
except Exception:
if single_query_images:
effective_query = cli._preprocess_images_with_vision(
query, single_query_images, announce=False,
)
except Exception:
effective_query = cli._preprocess_images_with_vision(
query, single_query_images, announce=False,
)
else:
elif single_query_images:
effective_query = cli._preprocess_images_with_vision(
query,
single_query_images,
+14 -6
View File
@@ -30,13 +30,21 @@ cd /opt/data
dash_host="${HERMES_DASHBOARD_HOST:-0.0.0.0}"
dash_port="${HERMES_DASHBOARD_PORT:-9119}"
# Binding to anything other than localhost requires --insecure — the
# dashboard refuses otherwise because it exposes API keys. Inside a
# container this is the expected deployment.
# `--insecure` is opt-in via HERMES_DASHBOARD_INSECURE. The dashboard's
# OAuth auth gate engages automatically on non-loopback binds when a
# DashboardAuthProvider is registered (e.g. the bundled dashboard_auth/nous
# provider, which auto-registers when HERMES_DASHBOARD_OAUTH_CLIENT_ID is
# set). If no provider is registered, start_server fails closed with a
# specific operator-facing error.
#
# This used to derive --insecure from the bind host ("anything non-loopback
# implies insecure"), but that predates the OAuth gate and silently
# disabled it on every container-deployed dashboard. The gate is now the
# authority; operators on trusted LANs / behind a reverse proxy without
# the OAuth contract opt in explicitly.
insecure=""
case "$dash_host" in
127.0.0.1|localhost) ;;
*) insecure="--insecure" ;;
case "${HERMES_DASHBOARD_INSECURE:-}" in
1|true|TRUE|True|yes|YES|Yes) insecure="--insecure" ;;
esac
# shellcheck disable=SC2086 # word-splitting of $insecure is intentional
+2
View File
@@ -831,6 +831,8 @@ if _config_path.exists():
"docker_env": "TERMINAL_DOCKER_ENV",
"docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
"docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
"docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
"docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
"sandbox_dir": "TERMINAL_SANDBOX_DIR",
"persistent_shell": "TERMINAL_PERSISTENT_SHELL",
}
+2 -2
View File
@@ -14,8 +14,8 @@ Provides subcommands for:
import os
import sys
__version__ = "0.15.0"
__release_date__ = "2026.5.28"
__version__ = "0.15.1"
__release_date__ = "2026.5.29"
def _ensure_utf8():
+2
View File
@@ -5551,6 +5551,8 @@ def set_config_value(key: str, value: str):
"terminal.daytona_image": "TERMINAL_DAYTONA_IMAGE",
"terminal.docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
"terminal.docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
"terminal.docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
"terminal.docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
"terminal.docker_env": "TERMINAL_DOCKER_ENV",
# terminal.cwd intentionally excluded — CLI resolves at runtime,
# gateway bridges it in gateway/run.py. Persisting to .env causes
+20 -1
View File
@@ -26,10 +26,15 @@ from hermes_cli.dashboard_auth import list_providers
from hermes_cli.dashboard_auth.audit import AuditEvent, audit_log
from hermes_cli.dashboard_auth.base import ProviderError
from hermes_cli.dashboard_auth.cookies import read_session_cookies
from hermes_cli.dashboard_auth.public_paths import PUBLIC_API_PATHS
_log = logging.getLogger(__name__)
# Paths that bypass the auth gate. Order matters: prefix match.
# Prefixes that bypass the auth gate. Match via ``path == prefix`` or
# ``path.startswith(prefix)`` — so ``/assets/`` (with trailing slash)
# matches ``/assets/foo.css`` but not ``/assetsleak``. Auth-bootstrap
# (login page, OAuth round trip, provider listing) and static asset
# mounts go here.
_GATE_PUBLIC_PREFIXES: tuple[str, ...] = (
"/auth/login",
"/auth/callback",
@@ -45,6 +50,20 @@ _GATE_PUBLIC_PREFIXES: tuple[str, ...] = (
def _path_is_public(path: str) -> bool:
"""True if ``path`` bypasses the OAuth auth gate.
Two sources of public-ness:
* :data:`PUBLIC_API_PATHS` — the shared ``/api/*`` allowlist that
the legacy ``_SESSION_TOKEN`` middleware also honours. Matched
exactly (no prefix expansion) so adding ``/api/status`` doesn't
accidentally expose ``/api/status/secret-extension``.
* :data:`_GATE_PUBLIC_PREFIXES` — auth-bootstrap routes and static
mounts. Prefix-matched so ``/assets/foo.css`` lights up via
``/assets/``.
"""
if path in PUBLIC_API_PATHS:
return True
return any(
path == prefix or path.startswith(prefix)
for prefix in _GATE_PUBLIC_PREFIXES
+49
View File
@@ -0,0 +1,49 @@
"""Shared allowlist of ``/api/*`` paths that bypass dashboard auth.
Two middlewares enforce dashboard auth and previously kept independent
copies of this list:
* ``hermes_cli.web_server.auth_middleware`` — loopback / ``--insecure``
mode, gates on the ephemeral ``_SESSION_TOKEN``.
* ``hermes_cli.dashboard_auth.middleware.gated_auth_middleware`` —
non-loopback mode, gates on the OAuth session cookie.
When the lists drifted, ``/api/status`` ended up public under the legacy
gate but 401'd under the OAuth gate. That broke the portal's wildcard
liveness probe (``nous-account-service`` ``fly-provider.ts``
``getInstanceRuntimeStatus``), which fetches ``/api/status`` without a
cookie as its sole signal of "agent dashboard is alive": every healthy
wildcard-subdomain agent surfaced as STARTING/down in the portal UI even
though the dashboard was serving correctly.
Centralising the allowlist here so both middlewares import the same
frozenset prevents the next drift. Keep this list minimal — only truly
non-sensitive, read-only endpoints belong here. As a sanity check, every
entry should be safe to expose to:
* external uptime probes (Pingdom, Better Stack, NAS),
* the dashboard SPA before the user has logged in,
* anyone who happens to ``curl`` the hostname.
If a new endpoint doesn't pass all three tests, it should be gated and
the SPA should bootstrap it after login instead.
"""
from __future__ import annotations
PUBLIC_API_PATHS: frozenset[str] = frozenset({
# Liveness probe target. Returns version, gateway state, active
# session count, and the dashboard auth-gate shape. No bodies, no
# session content, no secrets. Documented as the portal's wildcard
# liveness probe in
# ``docs/agent-dashboard-public-url-contract.md`` (NAS side).
"/api/status",
# Read-only config-defaults / schema feeds for the SPA's Config page.
"/api/config/defaults",
"/api/config/schema",
# Read-only model metadata (context windows, etc.) — same shape as
# provider catalogs already exposed on the public internet.
"/api/model/info",
# Read-only theme + plugin manifests for the dashboard skin engine.
"/api/dashboard/themes",
"/api/dashboard/plugins",
})
-38
View File
@@ -13391,11 +13391,6 @@ Examples:
"--yes", "-y", action="store_true", help="Skip confirmation"
)
sessions_subparsers.add_parser(
"optimize",
help="Reclaim disk space: merge FTS5 segments + VACUUM (no data change)",
)
sessions_subparsers.add_parser("stats", help="Show session store statistics")
sessions_rename = sessions_subparsers.add_parser(
@@ -13568,39 +13563,6 @@ Examples:
relaunch(["--resume", selected_id])
return # won't reach here after execvp
elif action == "optimize":
db_path = db.db_path
before_mb = (
os.path.getsize(db_path) / (1024 * 1024)
if db_path.exists()
else 0.0
)
print("Optimizing session store (FTS merge + VACUUM)…")
try:
# vacuum() merges FTS5 segments (optimize_fts) then VACUUMs.
# Probe the index count first for the summary line.
n = sum(
1
for t in db._FTS_TABLES
if db._fts_table_exists(t)
)
db.vacuum()
except Exception as e:
print(f"Error: optimization failed: {e}")
db.close()
return
after_mb = (
os.path.getsize(db_path) / (1024 * 1024)
if db_path.exists()
else 0.0
)
saved = before_mb - after_mb
print(f"Optimized {n} FTS index(es).")
print(
f"Database size: {before_mb:.1f} MB -> {after_mb:.1f} MB "
f"(reclaimed {saved:.1f} MB)"
)
elif action == "stats":
total = db.session_count()
msgs = db.message_count()
+13 -10
View File
@@ -110,17 +110,20 @@ app.add_middleware(
# ---------------------------------------------------------------------------
# Endpoints that do NOT require the session token. Everything else under
# /api/ is gated by the auth middleware below. Keep this list minimal —
# only truly non-sensitive, read-only endpoints belong here.
# /api/ is gated by the auth middleware below.
#
# This list is defined in ``hermes_cli.dashboard_auth.public_paths`` so the
# OAuth gate middleware can honour the same allowlist — keeping the two
# gates in lockstep avoids drift like the wildcard-subdomain regression
# where ``/api/status`` was public under the legacy gate but 401'd under
# the OAuth gate (breaking the portal's liveness probe).
#
# Keep the upstream list minimal — only truly non-sensitive, read-only
# endpoints belong there.
# ---------------------------------------------------------------------------
_PUBLIC_API_PATHS: frozenset = frozenset({
"/api/status",
"/api/config/defaults",
"/api/config/schema",
"/api/model/info",
"/api/dashboard/themes",
"/api/dashboard/plugins",
})
from hermes_cli.dashboard_auth.public_paths import (
PUBLIC_API_PATHS as _PUBLIC_API_PATHS,
)
def _has_valid_session_token(request: Request) -> bool:
-62
View File
@@ -3116,58 +3116,6 @@ class SessionDB:
# ── Space reclamation ──
# FTS5 virtual tables whose b-tree segments we merge on optimize. The
# trigram table is created lazily / may be disabled, so we probe before
# touching it (see optimize_fts).
_FTS_TABLES = ("messages_fts", "messages_fts_trigram")
def _fts_table_exists(self, name: str) -> bool:
"""True if an FTS5 virtual table is queryable in this DB."""
try:
self._conn.execute(f"SELECT 1 FROM {name} LIMIT 0")
return True
except sqlite3.OperationalError:
return False
def optimize_fts(self) -> int:
"""Merge fragmented FTS5 b-tree segments into one per index.
FTS5 indexes grow as a series of incremental segments one per
``INSERT`` batch driven by the message triggers. Over tens of
thousands of messages these segments accumulate, which both bloats
the ``*_data`` shadow tables and slows ``MATCH`` queries that must
scan every segment. The special ``'optimize'`` command rewrites each
index as a single merged segment.
This is purely a maintenance operation it changes neither search
results nor ``snippet()`` output, only on-disk layout and query
speed. It is complementary to VACUUM: ``optimize`` compacts the FTS
index internally, then VACUUM returns the freed pages to the OS.
Skips any FTS table that does not exist (e.g. the trigram index when
disabled via ``HERMES_DISABLE_FTS_TRIGRAM`` or not yet created), so
it is safe to call unconditionally.
Returns the number of FTS indexes that were optimized.
"""
optimized = 0
with self._lock:
for tbl in self._FTS_TABLES:
if not self._fts_table_exists(tbl):
continue
try:
# The column name in the INSERT must match the table name
# for FTS5 special commands.
self._conn.execute(
f"INSERT INTO {tbl}({tbl}) VALUES('optimize')"
)
optimized += 1
except sqlite3.OperationalError as exc:
logger.warning(
"FTS optimize failed for %s: %s", tbl, exc
)
return optimized
def vacuum(self) -> None:
"""Run VACUUM to reclaim disk space after large deletes.
@@ -3181,17 +3129,7 @@ class SessionDB:
exclusive lock, so callers must ensure no other writers are
active. Safe to call at startup before the gateway/CLI starts
serving traffic.
FTS5 segments are merged first via :meth:`optimize_fts` so the
subsequent VACUUM reclaims the pages freed by the merge. This is a
layout-only optimization search results are unchanged.
"""
# Merge FTS5 segments before VACUUM so the freed pages are returned
# to the OS in the same pass. optimize_fts() manages its own lock.
try:
self.optimize_fts()
except Exception as exc:
logger.warning("FTS optimize before VACUUM failed: %s", exc)
# VACUUM cannot be executed inside a transaction.
with self._lock:
# Best-effort WAL checkpoint first, then VACUUM.
+1 -1
View File
@@ -4,7 +4,7 @@ let
src = ../web;
npmDeps = pkgs.fetchNpmDeps {
inherit src;
hash = "sha256-6qhGuifHVtCeep1SiQdCUxBMr7UGhYpdMTvXhrQu/zA=";
hash = "sha256-HV0aISBVjwbGqDj8qQynSxGFrrZDzuYAW3D3lB/x3zo=";
};
npm = hermesNpmLib.mkNpmPassthru { folder = "web"; attr = "web"; pname = "hermes-web"; };
+9
View File
@@ -75,8 +75,17 @@ Config file: `~/.hermes/hindsight/config.json`
| `recall_prompt_preamble` | — | Custom preamble for recalled memories in context |
| `recall_tags` | — | Tags to filter when searching memories |
| `recall_tags_match` | `any` | Tag matching mode: `any` / `all` / `any_strict` / `all_strict` |
| `recall_types` | `observation` | Fact types surfaced by recall (both auto-recall and the `hindsight_recall` tool). Comma-separated string or JSON list. **Default narrowed to `observation` only** (see "Behavior change" below). Set to `observation,world,experience` to also include raw facts. |
| `auto_recall` | `true` | Automatically recall memories before each turn |
> **Behavior change — `recall_types` defaults to `observation` only.**
>
> Previously recall returned all three fact types. It now returns only observations.
>
> Per [Hindsight's docs](https://hindsight.vectorize.io/developer/observations), observations are the **consolidated** knowledge layer Hindsight builds on top of raw facts: deduplicated beliefs grounded in evidence, refined as new facts arrive, with proof counts and freshness signals. Raw `world` / `experience` facts are the individual supporting evidence that feeds them. For per-turn context injection, observations are denser per token and avoid feeding the model multiple raw facts that one observation already summarizes.
>
> Restore the broad recall with `"recall_types": "observation,world,experience"` (string or JSON list) in `~/.hermes/hindsight/config.json`. This applies to **both** auto-recall and the `hindsight_recall` tool — both read the same `recall_types` setting (the tool schema has no per-call `types` argument), so narrowing the default narrows both paths.
### Retain
| Key | Default | Description |
+21 -2
View File
@@ -579,7 +579,15 @@ class HindsightMemoryProvider(MemoryProvider):
# Recall controls
self._auto_recall = True
self._recall_max_tokens = 4096
self._recall_types: list[str] | None = None
# Default to observation-only recall. Observations are Hindsight's
# consolidated knowledge layer — deduplicated, evidence-grounded
# beliefs built from many raw facts, with proof counts and
# freshness signals (see hindsight.vectorize.io/developer/observations).
# Including raw world/experience facts re-ships the supporting
# evidence that observations already summarize, burning the
# `recall_max_tokens` budget. Users can restore the broader
# recall via the `recall_types` config key.
self._recall_types: list[str] = ["observation"]
self._recall_prompt_preamble = ""
self._recall_max_input_chars = 800
@@ -856,6 +864,7 @@ class HindsightMemoryProvider(MemoryProvider):
{"key": "retain_assistant_prefix", "description": "Label used before assistant turns in retained transcripts", "default": "Assistant"},
{"key": "recall_tags", "description": "Tags to filter when searching memories (comma-separated)", "default": ""},
{"key": "recall_tags_match", "description": "Tag matching mode for recall", "default": "any", "choices": ["any", "all", "any_strict", "all_strict"]},
{"key": "recall_types", "description": "Fact types to surface on recall — applies to both auto-recall and the hindsight_recall tool (comma-separated or list). Defaults to observation-only — observations are Hindsight's consolidated, deduplicated, evidence-grounded knowledge layer; raw world/experience facts are the supporting evidence observations already summarize. Set to e.g. 'observation,world,experience' to also include raw facts.", "default": "observation"},
{"key": "auto_recall", "description": "Automatically recall memories before each turn", "default": True},
{"key": "auto_retain", "description": "Automatically retain conversation turns", "default": True},
{"key": "retain_every_n_turns", "description": "Retain every N turns (1 = every turn)", "default": 1},
@@ -1187,7 +1196,17 @@ class HindsightMemoryProvider(MemoryProvider):
# Recall controls
self._auto_recall = self._config.get("auto_recall", True)
self._recall_max_tokens = int(self._config.get("recall_max_tokens", 4096))
self._recall_types = self._config.get("recall_types") or None
# Default narrows recall to observation-only; pass an explicit
# `recall_types` list in config.json to broaden (e.g. include
# "world" / "experience") or to disable the filter entirely.
configured_types = self._config.get("recall_types")
if configured_types is None:
self._recall_types = ["observation"]
elif isinstance(configured_types, str):
# Allow comma-separated strings for parity with recall_tags.
self._recall_types = [t.strip() for t in configured_types.split(",") if t.strip()]
else:
self._recall_types = list(configured_types) or ["observation"]
self._recall_prompt_preamble = self._config.get("recall_prompt_preamble", "")
self._recall_max_input_chars = int(self._config.get("recall_max_input_chars", 800))
self._retain_async = self._config.get("retain_async", True)
+1 -1
View File
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "hermes-agent"
version = "0.15.0"
version = "0.15.1"
description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
readme = "README.md"
requires-python = ">=3.11"
+7 -2
View File
@@ -2302,6 +2302,7 @@ class AIAgent:
original_user_message: Any,
final_response: Any,
interrupted: bool,
messages: list | None = None,
) -> None:
"""Mirror a completed turn into external memory providers.
@@ -2334,9 +2335,13 @@ class AIAgent:
if not (self._memory_manager and final_response and original_user_message):
return
try:
sync_kwargs = {"session_id": self.session_id or ""}
if messages is not None:
sync_kwargs["messages"] = messages
self._memory_manager.sync_all(
original_user_message, final_response,
session_id=self.session_id or "",
original_user_message,
final_response,
**sync_kwargs,
)
self._memory_manager.queue_prefetch_all(
original_user_message,
+2
View File
@@ -101,6 +101,8 @@ AUTHOR_MAP = {
"kronexoi13@gmail.com": "kronexoi",
"hua.zhong@kingsmith.com": "vgocoder",
"hermes@marian.local": "Schrotti77",
"david@memorilabs.ai": "devwdave",
"dave@devwdave.com": "devwdave",
"1920071390@campus.ouj.ac.jp": "zapabob",
"gaia@gaia.local": "jfuenmayor",
"jiahuigu@users.noreply.github.com": "Jiahui-Gu",
+188
View File
@@ -16,6 +16,7 @@ from agent.image_routing import (
_supports_vision_override,
build_native_content_parts,
decide_image_input_mode,
extract_image_refs,
)
@@ -449,3 +450,190 @@ class TestLargeImageHandling:
assert len(parts) == 2
assert parts[0]["type"] == "text"
assert parts[1]["type"] == "image_url"
# ─── extract_image_refs ──────────────────────────────────────────────────────
class TestExtractImageRefs:
"""Scan task body / inbound text for image paths and URLs (kanban worker
enrichment, issue raised May 2026)."""
def test_empty_or_none_returns_empty(self):
assert extract_image_refs("") == ([], [])
assert extract_image_refs(None) == ([], []) # type: ignore[arg-type]
def test_finds_absolute_path(self, tmp_path: Path):
img = tmp_path / "screenshot.png"
img.write_bytes(_png_bytes())
body = f"Look at {img} and tell me what's wrong."
paths, urls = extract_image_refs(body)
assert paths == [str(img)]
assert urls == []
def test_finds_home_relative_path(self, tmp_path: Path, monkeypatch):
# Simulate ~/foo.png by pointing HOME at tmp_path and creating the file
monkeypatch.setenv("HOME", str(tmp_path))
img = tmp_path / "foo.png"
img.write_bytes(_png_bytes())
paths, urls = extract_image_refs("see ~/foo.png please")
assert paths == [str(img)]
assert urls == []
def test_skips_nonexistent_paths(self, tmp_path: Path):
# Path-shaped but no file on disk → skipped.
body = f"What's at {tmp_path}/never_created.png ?"
paths, urls = extract_image_refs(body)
assert paths == []
assert urls == []
def test_finds_http_image_url(self):
body = "Check out https://example.com/photos/cat.png — cute right?"
paths, urls = extract_image_refs(body)
assert paths == []
assert urls == ["https://example.com/photos/cat.png"]
def test_finds_https_url_with_query_string(self):
body = "Diagram: https://cdn.example.com/img.jpeg?size=large&v=2 here"
paths, urls = extract_image_refs(body)
assert urls == ["https://cdn.example.com/img.jpeg?size=large&v=2"]
def test_url_trailing_punctuation_stripped(self):
# Prose punctuation right after the URL must not be part of the URL.
body = "See https://example.com/a.png."
paths, urls = extract_image_refs(body)
assert urls == ["https://example.com/a.png"]
def test_ignores_non_image_urls(self):
body = "See https://example.com/page.html and https://x.com/y.pdf"
paths, urls = extract_image_refs(body)
assert urls == []
def test_dedupes_paths_and_urls(self, tmp_path: Path):
img = tmp_path / "dup.png"
img.write_bytes(_png_bytes())
body = (
f"First {img} then again {img}. "
"Also https://example.com/x.png and https://example.com/x.png again."
)
paths, urls = extract_image_refs(body)
assert paths == [str(img)]
assert urls == ["https://example.com/x.png"]
def test_ignores_paths_in_fenced_code_block(self, tmp_path: Path):
img = tmp_path / "real.png"
img.write_bytes(_png_bytes())
body = (
"Outside the block, attach this:\n"
f"{img}\n"
"But not these examples:\n"
"```\n"
f"some_other_image: /tmp/example.png\n"
f"url: https://example.com/example.png\n"
"```\n"
)
paths, urls = extract_image_refs(body)
assert paths == [str(img)]
assert urls == []
def test_ignores_paths_in_inline_code(self, tmp_path: Path):
img = tmp_path / "real.jpg"
img.write_bytes(_png_bytes())
body = (
f"Attach {img}, but ignore the example "
"`https://example.com/skip.png` in backticks."
)
paths, urls = extract_image_refs(body)
assert paths == [str(img)]
assert urls == []
def test_does_not_match_paths_inside_urls(self, tmp_path: Path):
# The lookbehind in the regex prevents matching the path-portion of
# a URL as a local path. Only the URL should be detected.
body = "Just the URL: https://example.com/some/dir/image.png"
paths, urls = extract_image_refs(body)
assert paths == []
assert urls == ["https://example.com/some/dir/image.png"]
def test_mixed_paths_and_urls(self, tmp_path: Path):
img = tmp_path / "local.png"
img.write_bytes(_png_bytes())
body = (
f"Compare local {img} against the design at "
"https://example.com/design/v2.png — does it match?"
)
paths, urls = extract_image_refs(body)
assert paths == [str(img)]
assert urls == ["https://example.com/design/v2.png"]
def test_case_insensitive_extension(self, tmp_path: Path):
img = tmp_path / "shouty.PNG"
img.write_bytes(_png_bytes())
body = f"see {img}"
paths, urls = extract_image_refs(body)
assert paths == [str(img)]
# ─── build_native_content_parts with URLs ────────────────────────────────────
class TestBuildNativeContentPartsURLs:
"""URL pass-through support added so kanban task bodies (and other
inbound surfaces) can route remote image URLs straight to the model."""
def test_url_only_no_local_paths(self):
parts, skipped = build_native_content_parts(
"what is this?",
[],
image_urls=["https://example.com/diagram.png"],
)
assert skipped == []
assert len(parts) == 2
assert parts[0]["type"] == "text"
assert "[Image attached: https://example.com/diagram.png]" in parts[0]["text"]
assert parts[0]["text"].startswith("what is this?")
assert parts[1] == {
"type": "image_url",
"image_url": {"url": "https://example.com/diagram.png"},
}
def test_mixed_path_and_url(self, tmp_path: Path):
img = tmp_path / "local.png"
img.write_bytes(_png_bytes())
parts, skipped = build_native_content_parts(
"compare these",
[str(img)],
image_urls=["https://example.com/remote.jpg"],
)
assert skipped == []
# 1 text + 2 image parts (local data URL first, then remote URL).
image_parts = [p for p in parts if p.get("type") == "image_url"]
assert len(image_parts) == 2
assert image_parts[0]["image_url"]["url"].startswith("data:image/png;base64,")
assert image_parts[1]["image_url"]["url"] == "https://example.com/remote.jpg"
text = parts[0]["text"]
assert "[Image attached at:" in text
assert "[Image attached: https://example.com/remote.jpg]" in text
def test_empty_url_list_is_no_op(self, tmp_path: Path):
img = tmp_path / "x.png"
img.write_bytes(_png_bytes())
# image_urls=[] should behave the same as not passing it at all.
parts_no_urls, _ = build_native_content_parts("hi", [str(img)])
parts_empty_urls, _ = build_native_content_parts("hi", [str(img)], image_urls=[])
assert parts_no_urls == parts_empty_urls
def test_blank_url_strings_are_dropped(self):
parts, _ = build_native_content_parts(
"x", [], image_urls=["", " ", "https://example.com/a.png"]
)
image_parts = [p for p in parts if p.get("type") == "image_url"]
assert len(image_parts) == 1
assert image_parts[0]["image_url"]["url"] == "https://example.com/a.png"
def test_url_only_inserts_default_prompt_when_text_empty(self):
parts, _ = build_native_content_parts(
"", [], image_urls=["https://example.com/a.png"]
)
assert parts[0]["type"] == "text"
assert parts[0]["text"].startswith("What do you see in this image?")
+29
View File
@@ -84,6 +84,13 @@ class MetadataMemoryProvider(FakeMemoryProvider):
self.memory_writes.append((action, target, content, metadata or {}))
class MessagesMemoryProvider(FakeMemoryProvider):
"""Provider that opts into completed-turn message context."""
def sync_turn(self, user_content, assistant_content, *, session_id="", messages=None):
self.synced_turns.append((user_content, assistant_content, session_id, messages))
# ---------------------------------------------------------------------------
# MemoryProvider ABC tests
# ---------------------------------------------------------------------------
@@ -236,6 +243,28 @@ class TestMemoryManager:
assert p1.synced_turns == [("user msg", "assistant msg")]
assert p2.synced_turns == [("user msg", "assistant msg")]
def test_sync_all_passes_messages_to_opted_in_provider(self):
mgr = MemoryManager()
p = MessagesMemoryProvider("external")
mgr.add_provider(p)
messages = [
{"role": "assistant", "tool_calls": [{"id": "call-1"}]},
{"role": "tool", "tool_call_id": "call-1", "content": "ok"},
]
mgr.sync_all("user msg", "assistant msg", session_id="sess-1", messages=messages)
assert p.synced_turns == [("user msg", "assistant msg", "sess-1", messages)]
def test_sync_all_omits_messages_for_legacy_provider(self):
mgr = MemoryManager()
p = FakeMemoryProvider("external")
mgr.add_provider(p)
mgr.sync_all("user msg", "assistant msg", messages=[{"role": "tool"}])
assert p.synced_turns == [("user msg", "assistant msg")]
def test_sync_failure_doesnt_block_others(self):
"""If one provider's sync fails, others still run."""
mgr = MemoryManager()
+2
View File
@@ -227,6 +227,8 @@ _HERMES_BEHAVIORAL_VARS = frozenset({
"TERMINAL_CONTAINER_DISK",
"TERMINAL_CONTAINER_MEMORY",
"TERMINAL_CONTAINER_PERSISTENT",
"TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
"TERMINAL_DOCKER_ORPHAN_REAPER",
"TERMINAL_DOCKER_RUN_AS_HOST_USER",
"BROWSER_CDP_URL",
"CAMOFOX_URL",
+215 -3
View File
@@ -12,6 +12,7 @@ the realistic runtime context. See the conftest module docstring.
"""
from __future__ import annotations
import json
import subprocess
import time
@@ -87,7 +88,15 @@ def test_dashboard_slot_reports_up_when_enabled(
"""Symmetry: with HERMES_DASHBOARD=1, s6-svstat reports the slot as up."""
subprocess.run(
["docker", "run", "-d", "--name", container_name,
"-e", "HERMES_DASHBOARD=1", built_image, "sleep", "120"],
"-e", "HERMES_DASHBOARD=1",
# The default dashboard host is 0.0.0.0, which now engages the
# OAuth auth gate. Without a provider registered (no
# HERMES_DASHBOARD_OAUTH_CLIENT_ID in this test env), start_server
# would fail closed and the slot would never come up. Pin the
# explicit insecure opt-in to keep this test focused on the s6
# supervision contract, not the auth gate.
"-e", "HERMES_DASHBOARD_INSECURE=1",
built_image, "sleep", "120"],
check=True, capture_output=True, timeout=30,
)
# uvicorn takes a moment to bind; poll svstat.
@@ -112,7 +121,12 @@ def test_dashboard_opt_in_starts(
"""With HERMES_DASHBOARD=1, a dashboard process should be visible."""
subprocess.run(
["docker", "run", "-d", "--name", container_name,
"-e", "HERMES_DASHBOARD=1", built_image, "sleep", "120"],
"-e", "HERMES_DASHBOARD=1",
# Default bind is 0.0.0.0; pin insecure opt-in so the auth gate
# doesn't fail-closed before the process can come up. See
# test_dashboard_slot_reports_up_when_enabled for the full rationale.
"-e", "HERMES_DASHBOARD_INSECURE=1",
built_image, "sleep", "120"],
check=True, capture_output=True, timeout=30,
)
# Poll for the dashboard subprocess to appear — the entrypoint
@@ -131,6 +145,10 @@ def test_dashboard_port_override(
subprocess.run(
["docker", "run", "-d", "--name", container_name,
"-e", "HERMES_DASHBOARD=1", "-e", "HERMES_DASHBOARD_PORT=9120",
# Default bind is 0.0.0.0; pin insecure opt-in so the auth gate
# doesn't fail-closed before the port is bound. See
# test_dashboard_slot_reports_up_when_enabled for the full rationale.
"-e", "HERMES_DASHBOARD_INSECURE=1",
built_image, "sleep", "120"],
check=True, capture_output=True, timeout=30,
)
@@ -160,7 +178,13 @@ def test_dashboard_restarts_after_crash(
"""
subprocess.run(
["docker", "run", "-d", "--name", container_name,
"-e", "HERMES_DASHBOARD=1", built_image, "sleep", "120"],
"-e", "HERMES_DASHBOARD=1",
# Default bind is 0.0.0.0; pin insecure opt-in so the auth gate
# doesn't fail-closed before the supervised dashboard can come up.
# See test_dashboard_slot_reports_up_when_enabled for the full
# rationale.
"-e", "HERMES_DASHBOARD_INSECURE=1",
built_image, "sleep", "120"],
check=True, capture_output=True, timeout=30,
)
# Wait for the first dashboard to come up.
@@ -201,3 +225,191 @@ def test_dashboard_restarts_after_crash(
raise AssertionError(
f"Dashboard not restarted after kill (first_pid={first_pid})"
)
# ---------------------------------------------------------------------------
# OAuth auth-gate behaviour — regression guard for the dashboard-insecure
# auto-injection bug. Pre-fix, the s6 run script appended `--insecure`
# whenever `HERMES_DASHBOARD_HOST` was non-loopback, silently disabling
# the OAuth gate on every container-deployed dashboard. The matching
# static-text guard lives in tests/test_docker_home_override_scripts.py;
# this is the behavioural end-to-end check.
# ---------------------------------------------------------------------------
def _http_probe(
container: str,
path: str,
*,
deadline_s: float = 60.0,
) -> tuple[int, str]:
"""Poll ``http://127.0.0.1:9119<path>`` from inside the container.
Returns ``(status_code, body)`` as soon as the dashboard answers any
HTTP response 200, 401, 503, anything. The image doesn't ship
``curl`` but the venv's stdlib ``urllib`` is good enough; we use a
proper ``try``/``except`` to intercept ``HTTPError`` because
``urlopen`` raises on 4xx/5xx, and we treat those as legitimate
responses (the OAuth gate's 401 IS the success signal for the
gate-engaged test).
Connection errors (uvicorn still starting, fail-closed exited) keep
the poll loop running until ``deadline_s`` elapses.
The probe Python program is fed over stdin (``python -``) rather
than ``python -c`` so we can use proper multi-line syntax with
``try``/``except`` blocks without escaping hell.
Raises ``AssertionError`` on timeout.
"""
py_program = f"""\
import urllib.request, urllib.error
req = urllib.request.Request("http://127.0.0.1:9119{path}")
try:
r = urllib.request.urlopen(req, timeout=5)
print(r.status)
print(r.read().decode(), end="")
except urllib.error.HTTPError as h:
print(h.code)
print(h.read().decode(), end="")
"""
# Feed the program over stdin via a heredoc so docker_exec_sh's
# single bash string stays clean. The 'PY' delimiter is quoted to
# disable shell expansion inside the heredoc body.
probe = (
"/opt/hermes/.venv/bin/python - <<'PY'\n"
f"{py_program}"
"PY"
)
end = time.monotonic() + deadline_s
last_err = ""
while time.monotonic() < end:
r = docker_exec_sh(container, probe, timeout=10)
if r.returncode == 0 and r.stdout.strip():
lines = r.stdout.split("\n", 1)
try:
status = int(lines[0].strip())
body = lines[1] if len(lines) > 1 else ""
return status, body
except (ValueError, IndexError) as exc:
last_err = f"parse: {exc!r} / stdout={r.stdout!r}"
else:
last_err = f"rc={r.returncode} stderr={r.stderr!r}"
time.sleep(0.5)
raise AssertionError(
f"Probe of {path} never returned HTTP within {deadline_s}s; "
f"last error: {last_err}"
)
def test_dashboard_oauth_gate_engages_on_non_loopback_bind(
built_image: str, container_name: str,
) -> None:
"""The s6 dashboard run script must NOT auto-add ``--insecure`` when the
dashboard binds to ``0.0.0.0``. The OAuth auth gate engages on its own
when a ``DashboardAuthProvider`` is registered (the bundled nous
provider activates whenever ``HERMES_DASHBOARD_OAUTH_CLIENT_ID`` is
set).
Regression guard for the wildcard-subdomain rollout where every
portal-provisioned agent binds ``0.0.0.0`` and relies on the OAuth
gate to authenticate browser callers. Before this fix, the run script
flipped ``--insecure`` on for any non-loopback bind, which routed
``start_server`` straight back into the legacy ``allow_public=True``
branch and disabled the gate every time.
We verify two independent observable consequences of the gate being
on:
1. ``/api/auth/providers`` (publicly reachable through the gate so
the login page can bootstrap) returns 200 with ``nous`` in the
provider list proves the bundled provider registered.
2. ``/api/sessions`` (a gated route under both the legacy
``_SESSION_TOKEN`` middleware and the OAuth gate) returns 401
to an unauthenticated caller proves the OAuth gate is actively
intercepting browser traffic. We deliberately probe a gated route
here rather than ``/api/status``: status sits in the shared
``PUBLIC_API_PATHS`` allowlist (portal liveness probe target) and
responds 200 without a cookie under both gates, so it cannot
distinguish "gate on" from "gate off".
"""
subprocess.run(
["docker", "run", "-d", "--name", container_name,
"-e", "HERMES_DASHBOARD=1",
"-e", "HERMES_DASHBOARD_HOST=0.0.0.0",
"-e", "HERMES_DASHBOARD_OAUTH_CLIENT_ID=agent:test-instance",
built_image, "sleep", "120"],
check=True, capture_output=True, timeout=30,
)
# (1) Provider registry visible via the public bootstrap endpoint.
status_code, body = _http_probe(container_name, "/api/auth/providers")
assert status_code == 200, (
f"/api/auth/providers should return 200 when a provider is "
f"registered; got {status_code} body={body!r}"
)
payload = json.loads(body)
provider_names = [p.get("name") for p in payload.get("providers", [])]
assert "nous" in provider_names, (
"Bundled dashboard_auth/nous provider should register when "
f"HERMES_DASHBOARD_OAUTH_CLIENT_ID is set. Got: {payload!r}"
)
# (2) A gated route (``/api/sessions``) returns 401 to an
# unauthenticated caller — the OAuth gate is intercepting.
status_code, body = _http_probe(container_name, "/api/sessions")
assert status_code == 401, (
"OAuth gate must intercept gated /api/* routes on 0.0.0.0 bind "
"when a provider is registered and HERMES_DASHBOARD_INSECURE "
f"is unset. Got: status={status_code} body={body!r}"
)
# (3) ``/api/status`` remains 200 under the gate — it's in the shared
# ``PUBLIC_API_PATHS`` allowlist so NAS's wildcard-subdomain
# liveness probe (``fly-provider.ts`` ``getInstanceRuntimeStatus``)
# can reach it without a cookie. Regression guard: this allowlist
# drifted once already and surfaced every healthy agent as
# STARTING/down in the portal UI.
status_code, body = _http_probe(container_name, "/api/status")
assert status_code == 200, (
"/api/status must remain publicly reachable under the OAuth gate "
"— the portal uses it as the wildcard-subdomain liveness probe. "
f"Got: status={status_code} body={body!r}"
)
status = json.loads(body)
assert status.get("auth_required") is True, (
"/api/status must report auth_required=True when the OAuth gate "
f"is engaged so the SPA/portal can distinguish modes. Got: {status!r}"
)
def test_dashboard_insecure_env_var_opts_out_of_gate(
built_image: str, container_name: str,
) -> None:
"""``HERMES_DASHBOARD_INSECURE=1`` re-enables the legacy no-gate mode
for operators running on trusted LANs behind a reverse proxy without
the OAuth contract. Same opt-out shape as the rest of the s6 boolean
envs (``HERMES_DASHBOARD``, ``HERMES_DASHBOARD_TUI``).
With the gate off, ``/api/status`` (a public endpoint under the
legacy ``_SESSION_TOKEN`` middleware) returns 200 with the
``auth_required: false`` body proves the gate is bypassed.
"""
subprocess.run(
["docker", "run", "-d", "--name", container_name,
"-e", "HERMES_DASHBOARD=1",
"-e", "HERMES_DASHBOARD_HOST=0.0.0.0",
"-e", "HERMES_DASHBOARD_INSECURE=1",
built_image, "sleep", "120"],
check=True, capture_output=True, timeout=30,
)
status_code, body = _http_probe(container_name, "/api/status")
assert status_code == 200, (
f"/api/status should return 200 with the auth gate disabled; "
f"got {status_code} body={body!r}"
)
status = json.loads(body)
assert status.get("auth_required") is False, (
"HERMES_DASHBOARD_INSECURE=1 must disable the auth gate (explicit "
f"opt-in for trusted-LAN deployments). Got: {status!r}"
)
@@ -131,8 +131,13 @@ class TestRefreshTokenCookieDeprecation:
class TestApi401Envelope:
# NOTE: probe a gated route (``/api/sessions``) here rather than
# ``/api/status`` — status is in the shared ``PUBLIC_API_PATHS``
# allowlist (portal liveness probe) so it would 200 even without a
# cookie and never exercise the 401-envelope code path.
def test_no_cookie_returns_unauthenticated_envelope(self, gated_app):
r = gated_app.get("/api/status")
r = gated_app.get("/api/sessions")
assert r.status_code == 401
body = r.json()
assert body["error"] == "unauthenticated"
@@ -141,7 +146,7 @@ class TestApi401Envelope:
def test_invalid_cookie_returns_session_expired_envelope(self, gated_app):
gated_app.cookies.set(SESSION_AT_COOKIE, "garbage")
r = gated_app.get("/api/status")
r = gated_app.get("/api/sessions")
assert r.status_code == 401
body = r.json()
assert body["error"] == "session_expired"
@@ -151,7 +156,7 @@ class TestApi401Envelope:
"""Dead-cookie cleanup — Phase 6 requirement so the browser
doesn't keep replaying the stale token on every request."""
gated_app.cookies.set(SESSION_AT_COOKIE, "garbage")
r = gated_app.get("/api/status")
r = gated_app.get("/api/sessions")
set_cookies = r.headers.get_list("set-cookie")
assert any(
c.startswith(f"{SESSION_AT_COOKIE}=") and "Max-Age=0" in c
@@ -56,10 +56,61 @@ def gated_app():
# ---------------------------------------------------------------------------
def test_gated_status_now_requires_auth(gated_app):
"""When gate is on, /api/status is NOT public — login bootstrap uses /api/auth/providers."""
def test_gated_status_is_public(gated_app):
"""``/api/status`` MUST be public under the OAuth gate.
Regression guard for the wildcard-subdomain rollout: NAS
(``fly-provider.ts`` ``getInstanceRuntimeStatus``) hits
``/api/status`` without a cookie as its sole liveness probe. A 401
here surfaces every healthy agent as STARTING/down in the portal
UI. The endpoint returns only version + gateway/auth-gate metadata
(no user data, no session content), so it stays in the shared
``PUBLIC_API_PATHS`` allowlist under both the legacy ``_SESSION_TOKEN``
gate and the OAuth gate.
The body also reports the gate's shape (``auth_required``,
``auth_providers``) so the SPA's StatusPage and external monitors
can distinguish loopback / gated / no-providers without a separate
round trip.
"""
r = gated_app.get("/api/status")
assert r.status_code == 401
assert r.status_code == 200, (
f"Expected 200, got {r.status_code}: {r.text}"
)
body = r.json()
assert body["auth_required"] is True
assert "version" in body
assert "gateway_state" in body
@pytest.mark.parametrize("path", [
"/api/config/defaults",
"/api/config/schema",
"/api/model/info",
"/api/dashboard/themes",
"/api/dashboard/plugins",
])
def test_other_public_api_paths_are_public_under_gate(gated_app, path):
"""The remaining ``PUBLIC_API_PATHS`` entries must also bypass the
gate. They're documented as non-sensitive read-only endpoints that
the SPA pre-loads before login (themes, config schema, model
metadata). A 401 / 302-to-login here would block the dashboard
shell from rendering pre-auth.
Accept any non-auth-failure status: 200 when the route succeeds,
or any route-specific error (e.g. 400 / 404 / 500 from a missing
dependency) but NEVER 401, and NEVER a 302 to ``/login``.
"""
r = gated_app.get(path, follow_redirects=False)
assert r.status_code != 401, (
f"{path} returned 401 under the OAuth gate — should be public"
)
if r.status_code == 302:
location = r.headers.get("location", "")
assert "/login" not in location, (
f"{path} redirected to {location} — should be public, "
"not bounced to /login"
)
def test_gated_html_redirects_to_login(gated_app):
@@ -98,7 +149,7 @@ def test_gated_static_asset_path_is_public(gated_app):
# ---------------------------------------------------------------------------
def test_full_login_round_trip_unlocks_api_status(gated_app):
def test_full_login_round_trip_unlocks_gated_api(gated_app):
# 1) Click "Sign in with Stub IdP" — /auth/login redirects to the stub
# with a PKCE cookie on the response.
r1 = gated_app.get("/auth/login?provider=stub", follow_redirects=False)
@@ -128,11 +179,16 @@ def test_full_login_round_trip_unlocks_api_status(gated_app):
assert any("hermes_session_at" in c for c in set_cookies)
assert any("hermes_session_rt" in c for c in set_cookies)
# 3) /api/status now succeeds because we're authenticated.
r3 = gated_app.get("/api/status")
assert r3.status_code == 200
body = r3.json()
assert "version" in body
# 3) A gated API route (``/api/sessions``) now succeeds because we
# have a valid session cookie. (We deliberately don't probe
# ``/api/status`` here — it's in the shared PUBLIC_API_PATHS
# allowlist and would 200 even without a login, so it can't
# distinguish "logged in" from "gate accidentally disabled".)
r3 = gated_app.get("/api/sessions")
assert r3.status_code == 200, (
f"Expected 200 for /api/sessions post-login, got {r3.status_code}: "
f"{r3.text}"
)
def test_login_unknown_provider_returns_404(gated_app):
@@ -59,19 +59,11 @@ def loopback_client():
web_server.app.state.auth_required = prev_required
def _login(client: TestClient) -> None:
"""Drive the stub OAuth round trip so the gated client is authed."""
r1 = client.get("/auth/login?provider=stub", follow_redirects=False)
assert r1.status_code == 302
state = r1.headers["location"].split("state=")[1]
r2 = client.get(
f"/auth/callback?code=stub_code&state={state}", follow_redirects=False
)
assert r2.status_code == 302
def test_status_reports_auth_required_in_gated_mode(gated_client):
_login(gated_client)
# No ``_login()`` call — ``/api/status`` is in the shared
# ``PUBLIC_API_PATHS`` allowlist precisely so external probes (and
# the SPA's pre-login bootstrap) can read the gate's shape without
# a cookie. Hit it cold.
r = gated_client.get("/api/status")
assert r.status_code == 200
body = r.json()
@@ -0,0 +1,238 @@
"""Worker-side image enrichment for kanban tasks.
When a kanban task body contains a local image path or an ``http(s)://``
image URL, the worker must surface that image to the model on its first
user turn matching the CLI/gateway behaviour for inbound images.
The dispatcher spawns the worker as
``hermes -p <profile> chat -q "work kanban task <id>"``. The task body
itself never appears in argv; the worker has to read it from the kanban
DB during startup. These tests cover the round-trip:
task body kanban_db.get_task extract_image_refs
build_native_content_parts multimodal user turn
"""
from __future__ import annotations
import base64
from pathlib import Path
import pytest
from hermes_cli import kanban_db as kb
from agent.image_routing import (
build_native_content_parts,
extract_image_refs,
)
# Tiny 1×1 transparent PNG used to back any path the tests stick into a
# task body. extract_image_refs validates the path exists on disk, so the
# byte content has to be a real readable file (any image bytes will do).
_PNG = base64.b64decode(
"iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR4nGNgYGBgAAAABQABpfZFQAAAAABJRU5ErkJggg=="
)
@pytest.fixture
def kanban_home(tmp_path: Path, monkeypatch):
"""Isolated HERMES_HOME with a fresh kanban DB for each test."""
home = tmp_path / ".hermes"
home.mkdir()
monkeypatch.setenv("HERMES_HOME", str(home))
monkeypatch.setattr(Path, "home", lambda: tmp_path)
kb.init_db()
return home
def _add_task_with_body(body: str, *, title: str = "Look at this") -> str:
conn = kb.connect()
try:
task_id = kb.create_task(
conn,
title=title,
body=body,
assignee="worker-a",
tenant=None,
)
finally:
conn.close()
return task_id
def _read_body(task_id: str) -> str:
conn = kb.connect()
try:
task = kb.get_task(conn, task_id)
return (task.body if task is not None else "") or ""
finally:
conn.close()
class TestExtractFromTaskBody:
"""Read a real kanban task body and run it through extract_image_refs."""
def test_local_path_in_body_round_trips(self, kanban_home, tmp_path):
img = tmp_path / "screenshot.png"
img.write_bytes(_PNG)
tid = _add_task_with_body(
f"Please review the screenshot at {img} and confirm "
"the alignment is right."
)
body = _read_body(tid)
paths, urls = extract_image_refs(body)
assert paths == [str(img)]
assert urls == []
def test_url_in_body_round_trips(self, kanban_home):
tid = _add_task_with_body(
"The design lives at https://example.com/mock/v3.png — "
"make the implementation match it."
)
body = _read_body(tid)
paths, urls = extract_image_refs(body)
assert paths == []
assert urls == ["https://example.com/mock/v3.png"]
def test_mixed_path_and_url_in_body(self, kanban_home, tmp_path):
img = tmp_path / "current.png"
img.write_bytes(_PNG)
tid = _add_task_with_body(
f"Compare the current screenshot {img} against the design at "
"https://example.com/target.png and write a diff."
)
body = _read_body(tid)
paths, urls = extract_image_refs(body)
assert paths == [str(img)]
assert urls == ["https://example.com/target.png"]
def test_body_without_images_yields_nothing(self, kanban_home):
tid = _add_task_with_body(
"Refactor the auth module to use the new session helper."
)
body = _read_body(tid)
paths, urls = extract_image_refs(body)
assert paths == []
assert urls == []
def test_empty_body_is_safe(self, kanban_home):
tid = _add_task_with_body("")
body = _read_body(tid)
paths, urls = extract_image_refs(body)
assert paths == []
assert urls == []
class TestBuildPartsFromTaskBody:
"""Verify the full pipeline produces a multimodal user turn."""
def test_local_path_becomes_native_image_part(self, kanban_home, tmp_path):
img = tmp_path / "design.png"
img.write_bytes(_PNG)
tid = _add_task_with_body(f"Check out {img} — what's broken?")
body = _read_body(tid)
paths, urls = extract_image_refs(body)
# Mirrors the cli.py wiring: pass the worker's literal -q argument
# (the dispatcher uses ``"work kanban task <id>"``) plus the
# extracted refs through build_native_content_parts.
parts, skipped = build_native_content_parts(
f"work kanban task {tid}",
paths,
image_urls=urls or None,
)
assert skipped == []
# text part + one image_url part
assert len(parts) == 2
assert parts[0]["type"] == "text"
assert parts[0]["text"].startswith(f"work kanban task {tid}")
assert f"[Image attached at: {img}]" in parts[0]["text"]
assert parts[1]["type"] == "image_url"
assert parts[1]["image_url"]["url"].startswith("data:image/png;base64,")
def test_url_becomes_image_url_part(self, kanban_home):
tid = _add_task_with_body(
"Reference: https://example.com/target.jpg — match it."
)
body = _read_body(tid)
paths, urls = extract_image_refs(body)
parts, skipped = build_native_content_parts(
f"work kanban task {tid}",
paths,
image_urls=urls or None,
)
assert skipped == []
assert len(parts) == 2
assert parts[0]["type"] == "text"
assert "[Image attached: https://example.com/target.jpg]" in parts[0]["text"]
assert parts[1] == {
"type": "image_url",
"image_url": {"url": "https://example.com/target.jpg"},
}
def test_body_with_both_yields_two_image_parts(self, kanban_home, tmp_path):
img = tmp_path / "local.png"
img.write_bytes(_PNG)
tid = _add_task_with_body(
f"Diff {img} vs https://example.com/target.png — explain it."
)
body = _read_body(tid)
paths, urls = extract_image_refs(body)
parts, skipped = build_native_content_parts(
f"work kanban task {tid}",
paths,
image_urls=urls or None,
)
assert skipped == []
image_parts = [p for p in parts if p.get("type") == "image_url"]
assert len(image_parts) == 2
# Local file is embedded as a data URL; remote URL passes through.
assert image_parts[0]["image_url"]["url"].startswith("data:image/png;base64,")
assert image_parts[1]["image_url"]["url"] == "https://example.com/target.png"
def test_body_with_no_images_leaves_query_untouched(self, kanban_home):
tid = _add_task_with_body(
"Rewrite the README intro paragraph to focus on use cases."
)
body = _read_body(tid)
paths, urls = extract_image_refs(body)
parts, skipped = build_native_content_parts(
f"work kanban task {tid}",
paths,
image_urls=urls or None,
)
# No images → plain text-only return (single part, no list mutation).
assert skipped == []
assert len(parts) == 1
assert parts[0]["type"] == "text"
assert parts[0]["text"] == f"work kanban task {tid}"
def test_code_block_example_is_not_attached(self, kanban_home, tmp_path):
# Only the real image outside the fenced code block should attach.
real = tmp_path / "real.png"
real.write_bytes(_PNG)
tid = _add_task_with_body(
f"Real screenshot:\n{real}\n\n"
"Example we DON'T want attached:\n"
"```\n"
"image: /tmp/example_only.png\n"
"url: https://example.com/example.png\n"
"```\n"
)
body = _read_body(tid)
paths, urls = extract_image_refs(body)
assert paths == [str(real)]
assert urls == []
@@ -197,10 +197,32 @@ class TestConfig:
assert provider._recall_max_input_chars == 800
assert provider._tags is None
assert provider._recall_tags is None
# Default recall narrowed to observation-only; world/experience are
# aggregate facts that often crowd out concrete-event signal during
# auto-recall. Users opt back in via the recall_types config key.
assert provider._recall_types == ["observation"]
assert provider._bank_mission == ""
assert provider._bank_retain_mission is None
assert provider._retain_context == "conversation between Hermes Agent and the User"
def test_recall_types_default_is_observation_only(self, provider):
"""Auto-recall must filter to observation by default."""
assert provider._recall_types == ["observation"]
def test_recall_types_explicit_list_overrides_default(self, provider_with_config):
p = provider_with_config(recall_types=["world", "experience", "observation"])
assert p._recall_types == ["world", "experience", "observation"]
def test_recall_types_csv_string_accepted(self, provider_with_config):
"""For parity with recall_tags, comma-separated strings work too."""
p = provider_with_config(recall_types="observation, world")
assert p._recall_types == ["observation", "world"]
def test_recall_types_empty_list_falls_back_to_default(self, provider_with_config):
"""An empty list shouldn't disable the filter (would be wider than default)."""
p = provider_with_config(recall_types=[])
assert p._recall_types == ["observation"]
def test_custom_config_values(self, provider_with_config):
p = provider_with_config(
retain_tags=["tag1", "tag2"],
@@ -91,6 +91,45 @@ class TestSyncExternalMemoryForTurn:
session_id="test_session_001",
)
def test_completed_turn_syncs_messages_when_present(self):
agent = _bare_agent()
messages = [
{
"role": "assistant",
"content": None,
"tool_calls": [
{
"id": "call-1",
"type": "function",
"function": {
"name": "terminal",
"arguments": "{\"command\":\"pytest\"}",
},
}
],
},
{
"role": "tool",
"name": "terminal",
"tool_call_id": "call-1",
"content": "final Hermes-processed output",
}
]
agent._sync_external_memory_for_turn(
original_user_message="run tests",
final_response="tests passed",
interrupted=False,
messages=messages,
)
agent._memory_manager.sync_all.assert_called_once_with(
"run tests",
"tests passed",
session_id="test_session_001",
messages=messages,
)
# --- Edge cases (pre-existing behaviour preserved) ------------------
def test_no_final_response_skips(self):
@@ -13,3 +13,36 @@ def test_dashboard_run_resets_home_before_dropping_privileges() -> None:
assert "#!/command/with-contenv sh" in text
assert "export HOME=/opt/data" in text
assert "exec s6-setuidgid hermes hermes dashboard" in text
def test_dashboard_run_does_not_derive_insecure_from_bind_host() -> None:
"""The s6 dashboard run script MUST NOT auto-add ``--insecure`` based on
``HERMES_DASHBOARD_HOST``. Doing so disables the OAuth auth gate on
every non-loopback bind even when an auth provider is registered
the exact regression that exposed every wildcard-subdomain agent
dashboard publicly until early 2026.
The opt-in is now explicit: ``HERMES_DASHBOARD_INSECURE=1`` (truthy).
The auth gate is the authority on whether non-loopback binds are safe.
"""
text = DASHBOARD_RUN.read_text(encoding="utf-8")
# No legacy host-derived flip.
assert '127.0.0.1|localhost' not in text, (
"Run script still derives --insecure from the bind host. The gate "
"is the authority now — opt in via HERMES_DASHBOARD_INSECURE instead."
)
assert 'case "$dash_host" in' not in text, (
"Legacy host-derived --insecure case-statement is back."
)
# New opt-in env var present.
assert "HERMES_DASHBOARD_INSECURE" in text, (
"Explicit HERMES_DASHBOARD_INSECURE opt-in is missing."
)
# Truthy values aligned with the rest of the s6 scripts
# (HERMES_DASHBOARD, HERMES_DASHBOARD_TUI).
for truthy in ("1", "true", "TRUE", "True", "yes", "YES", "Yes"):
assert truthy in text, (
f"HERMES_DASHBOARD_INSECURE should accept truthy value {truthy!r}"
)
-58
View File
@@ -2679,64 +2679,6 @@ class TestVacuum:
db.vacuum()
class TestOptimizeFts:
def test_optimize_returns_index_count(self, db):
"""A fresh DB has both FTS indexes; optimize merges both."""
db.create_session(session_id="s1", source="cli")
db.append_message(session_id="s1", role="user", content="hello world")
assert db.optimize_fts() == 2
def test_optimize_preserves_search_and_snippet(self, db):
"""Optimize is layout-only: MATCH results + snippets are unchanged."""
db.create_session(session_id="s1", source="cli")
for i in range(50):
db.append_message(
session_id="s1",
role="user",
content=f"needle alpha bravo charlie message {i}",
)
before = db.search_messages("needle")
n = db.optimize_fts()
assert n == 2
after = db.search_messages("needle")
assert len(after) == len(before)
assert len(after) > 0
# Snippet must still be populated (would be empty/None if the FTS
# content shadow were lost during optimize).
assert all(row.get("snippet") for row in after)
# IDs and snippets are identical before/after — pure layout change.
assert [r["id"] for r in after] == [r["id"] for r in before]
assert [r["snippet"] for r in after] == [r["snippet"] for r in before]
def test_optimize_skips_missing_trigram_table(self, db):
"""When the trigram index is absent, optimize handles only the porter
index and does not raise."""
db.create_session(session_id="s1", source="cli")
db.append_message(session_id="s1", role="user", content="hello")
# Drop the trigram table + triggers to simulate a disabled/absent index.
with db._lock:
for trig in (
"messages_fts_trigram_insert",
"messages_fts_trigram_delete",
"messages_fts_trigram_update",
):
db._conn.execute(f"DROP TRIGGER IF EXISTS {trig}")
db._conn.execute("DROP TABLE IF EXISTS messages_fts_trigram")
assert db._fts_table_exists("messages_fts_trigram") is False
assert db._fts_table_exists("messages_fts") is True
# Only the porter index remains -> 1 optimized, no error.
assert db.optimize_fts() == 1
def test_optimize_idempotent(self, db):
"""Running optimize twice is safe (second pass is a no-op merge)."""
db.create_session(session_id="s1", source="cli")
db.append_message(session_id="s1", role="user", content="repeat me")
assert db.optimize_fts() == 2
assert db.optimize_fts() == 2
# Search still works after repeated optimization.
assert len(db.search_messages("repeat")) == 1
class TestAutoMaintenance:
def _make_old_ended(self, db, sid: str, days_old: int = 100):
"""Create a session that is ended and was started `days_old` days ago."""
+866 -12
View File
@@ -203,25 +203,43 @@ def test_auto_mount_replaces_persistent_workspace_bind(monkeypatch, tmp_path):
def test_non_persistent_cleanup_removes_container(monkeypatch):
"""When persistent=false, cleanup() must schedule docker stop + rm."""
"""When persist_across_processes=false, cleanup() must docker stop AND
docker rm so containers don't leak across hermes processes.
Updated for issue #20561: the previous implementation used fire-and-forget
``subprocess.Popen("... &", shell=True)`` which raced with parent exit;
the new implementation uses ``subprocess.run`` on a daemon thread with
bounded timeouts. See test_cleanup_with_persist_disabled_stops_and_rms
for the full behavior contract.
"""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
calls = _mock_subprocess_run(monkeypatch)
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
# Run the worker thread synchronously so assertions can observe its work.
import threading
monkeypatch.setattr(threading, "Thread", _FakeThread)
popen_cmds = []
monkeypatch.setattr(
docker_env.subprocess, "Popen",
lambda cmd, **kw: (popen_cmds.append(cmd), type("P", (), {"poll": lambda s: 0, "wait": lambda s, **k: None, "returncode": 0, "stdout": iter([]), "stdin": None})())[1],
env = docker_env.DockerEnvironment(
image="python:3.11", cwd="/root", timeout=60,
task_id="ephemeral-task", persistent_filesystem=False,
persist_across_processes=False,
)
env = _make_dummy_env(persistent_filesystem=False, task_id="ephemeral-task")
assert env._container_id
container_id = env._container_id
assert container_id
# Capture cleanup-time docker calls (everything before this was init).
cleanup_calls = []
real_run = docker_env.subprocess.run
def _capture(cmd, **kw):
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kw))
return real_run(cmd, **kw)
monkeypatch.setattr(docker_env.subprocess, "run", _capture)
env.cleanup()
# Should have stop and rm calls via Popen
stop_cmds = [c for c in popen_cmds if container_id in str(c) and "stop" in str(c)]
assert len(stop_cmds) >= 1, f"cleanup() should schedule docker stop for {container_id}"
stops = [c for c in cleanup_calls if isinstance(c[0], list) and c[0][1:2] == ["stop"]]
assert stops, f"cleanup() should docker stop {container_id}; got {cleanup_calls}"
class _FakePopen:
@@ -514,3 +532,839 @@ def test_run_as_host_user_warns_and_skips_when_no_posix_ids(monkeypatch, caplog)
"does not expose POSIX uid/gid" in rec.getMessage()
for rec in caplog.records
), "expected a warning when POSIX ids are unavailable"
# ── Docker labels (issue #20561) ──────────────────────────────────
def _run_args_from_calls(calls):
"""Pull the argv list passed to the first ``docker run`` invocation."""
run_calls = [
c for c in calls
if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"
]
assert run_calls, "docker run should have been called"
return run_calls[0][0]
def _labels_in_run_args(run_args):
"""Return the set of ``key=value`` strings passed via ``--label``."""
return {
run_args[i + 1]
for i, flag in enumerate(run_args[:-1])
if flag == "--label"
}
def test_run_command_tags_hermes_agent_label(monkeypatch):
"""Every container hermes-agent starts must carry the hermes-agent=1 label
so the orphan reaper (and external operators) can identify them with a
single ``docker ps --filter label=hermes-agent=1`` call. Regression test
for issue #20561 — without the label there is no global sweep target."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
calls = _mock_subprocess_run(monkeypatch)
_make_dummy_env(task_id="my-task")
labels = _labels_in_run_args(_run_args_from_calls(calls))
assert "hermes-agent=1" in labels, (
f"hermes-agent=1 label missing; got labels: {sorted(labels)}"
)
def test_run_command_tags_task_and_profile_labels(monkeypatch):
"""task_id and the active profile name are surfaced as labels so future
cross-process reuse logic can filter to a specific (task, profile) pair
without parsing container names. Profile resolution uses the helper that
returns ``"default"`` for the root Hermes home."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "research-bot")
calls = _mock_subprocess_run(monkeypatch)
_make_dummy_env(task_id="kanban-42")
labels = _labels_in_run_args(_run_args_from_calls(calls))
assert "hermes-task-id=kanban-42" in labels, (
f"hermes-task-id=kanban-42 missing; got: {sorted(labels)}"
)
assert "hermes-profile=research-bot" in labels, (
f"hermes-profile=research-bot missing; got: {sorted(labels)}"
)
def test_label_sanitizer_rejects_invalid_characters():
"""Docker label values must be alnum + ``_.-`` and ≤63 chars. Profile or
task names containing slashes, colons, or unicode would otherwise emit
invalid labels that round-trip badly through ``docker ps --filter``."""
assert docker_env._sanitize_label_value("plain-name_1.0") == "plain-name_1.0"
assert docker_env._sanitize_label_value("with/slash") == "with_slash"
assert docker_env._sanitize_label_value("with:colon") == "with_colon"
assert docker_env._sanitize_label_value("emoji-😀-here") == "emoji-_-here"
# Empty / non-string inputs must collapse to a queryable token, not "".
assert docker_env._sanitize_label_value("") == "unknown"
assert docker_env._sanitize_label_value(None) == "unknown" # type: ignore[arg-type]
# >63 chars must truncate, not error.
long_value = "x" * 100
assert len(docker_env._sanitize_label_value(long_value)) == 63
def test_run_command_sanitizes_unsafe_task_id(monkeypatch):
"""A task_id containing characters Docker rejects in label values must be
sanitized before reaching ``docker run --label``; otherwise the daemon
refuses the run with an inscrutable error and the agent's first command
blows up."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
calls = _mock_subprocess_run(monkeypatch)
_make_dummy_env(task_id="task/with:weird*chars")
labels = _labels_in_run_args(_run_args_from_calls(calls))
# Each non-OK character becomes an underscore; the safe chars survive.
assert "hermes-task-id=task_with_weird_chars" in labels, (
f"sanitized task-id label missing; got: {sorted(labels)}"
)
def test_labels_attribute_populated_after_init(monkeypatch):
"""``self._labels`` must be set to the same key/value pairs that went onto
docker run, so subsequent reuse / reaper paths can match without re-running
the sanitizer or re-importing the profile module."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
env = _make_dummy_env(task_id="abc")
assert env._labels == {
"hermes-agent": "1",
"hermes-task-id": "abc",
"hermes-profile": "default",
}
# ── Cross-process container reuse (issue #20561) ──────────────────
def _mock_subprocess_run_with_reuse(monkeypatch, ps_state: str | None,
start_succeeds: bool = True):
"""Reuse-aware subprocess.run mock.
``ps_state`` controls what ``docker ps -a --filter ...`` returns:
* ``None`` no match (empty stdout). Forces a fresh ``docker run``.
* ``"running"`` / ``"exited"`` / ... emit ``CID\\tSTATE`` so the reuse
path picks it up. ``"running"`` skips ``docker start``; other states
trigger ``docker start`` (which can be forced to fail via
``start_succeeds=False``).
Returns the captured call list so the test can verify which docker
commands actually ran.
"""
calls = []
def _run(cmd, **kwargs):
calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
if isinstance(cmd, list) and len(cmd) >= 2:
sub = cmd[1]
if sub == "version":
return subprocess.CompletedProcess(cmd, 0, stdout="Docker version", stderr="")
if sub == "ps":
if ps_state is None:
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
return subprocess.CompletedProcess(
cmd, 0, stdout=f"reused-cid\t{ps_state}\n", stderr="",
)
if sub == "start":
if not start_succeeds:
# Real subprocess.run with check=True raises on non-zero exit;
# mirror that so the production code's except clause fires.
raise subprocess.CalledProcessError(1, cmd, output="", stderr="no such container")
return subprocess.CompletedProcess(cmd, 0, stdout="reused-cid\n", stderr="")
if sub == "run":
return subprocess.CompletedProcess(cmd, 0, stdout="fresh-cid\n", stderr="")
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
monkeypatch.setattr(docker_env.subprocess, "run", _run)
return calls
def test_reuse_attaches_to_running_container_without_docker_run(monkeypatch):
"""When a labeled container is already ``running``, the reuse probe
must pick it up and skip ``docker run`` entirely. Regression for the
issue #20561 root cause: every Hermes process spawning a new container
despite docs claiming "ONE long-lived container shared across sessions"."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="running")
env = _make_dummy_env(task_id="reuse-test")
# The reuse path must populate _container_id from the ps probe output.
assert env._container_id == "reused-cid", (
f"expected reused container id, got {env._container_id!r}"
)
# And it must NOT have run `docker run`.
run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
assert not run_invocations, (
f"docker run should be skipped on reuse, got: {run_invocations}"
)
# And it must have NOT issued a `docker start` for an already-running container.
start_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "start"]
assert not start_invocations, (
f"docker start should be skipped when container already running, got: {start_invocations}"
)
def test_reuse_starts_stopped_container_before_attaching(monkeypatch):
"""A labeled container in ``exited`` state must be restarted via
``docker start`` before the new Hermes process uses it. Without this
step, ``docker exec`` against a stopped container errors out and the
first agent command fails opaquely."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="exited")
env = _make_dummy_env(task_id="reuse-stopped")
assert env._container_id == "reused-cid"
start_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "start"]
assert start_invocations, "expected docker start for exited container"
run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
assert not run_invocations, "should not docker run when reusing an exited container"
def test_reuse_falls_back_to_fresh_run_when_start_fails(monkeypatch):
"""If ``docker start`` on the matched container fails (container was
removed between probe and start, daemon paused, etc.), the code must
silently fall through to a fresh ``docker run`` rather than leaving the
user with a broken environment. Defensive recovery the probe is best-
effort, not authoritative."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
calls = _mock_subprocess_run_with_reuse(
monkeypatch, ps_state="exited", start_succeeds=False,
)
env = _make_dummy_env(task_id="reuse-broken-start")
# docker start should be attempted then fail; code falls through to run.
assert env._container_id == "fresh-cid", (
f"expected fresh container id after fallback, got {env._container_id!r}"
)
run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
assert run_invocations, "fallback to fresh docker run must happen on start failure"
def test_no_reuse_when_persist_across_processes_disabled(monkeypatch):
"""Opt-out path: ``persist_across_processes=False`` skips the ps probe
entirely and always starts a fresh container, matching the pre-fix
behavior for users who want hard per-process isolation."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
# ps_state=running would trigger reuse if the probe ran — assert it doesn't.
calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="running")
env = docker_env.DockerEnvironment(
image="python:3.11", cwd="/root", timeout=60,
task_id="no-reuse", persist_across_processes=False,
)
# Must NOT have issued docker ps (the probe is gated by the flag).
ps_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "ps"]
assert not ps_invocations, (
f"docker ps probe should be skipped when persist_across_processes=False, got: {ps_invocations}"
)
# Should have started a fresh container.
assert env._container_id == "fresh-cid"
def test_find_reusable_container_prefers_running_over_stopped(monkeypatch):
"""When the probe returns multiple matches (shouldn't normally happen,
but can after a crash leaves stale duplicates), a ``running`` container
is preferred over any stopped one. The duplicate gets reaped later by
the orphan reaper; we don't try to be heroic about it here."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
def _run(cmd, **kwargs):
if isinstance(cmd, list) and len(cmd) >= 2:
if cmd[1] == "version":
return subprocess.CompletedProcess(cmd, 0, stdout="ok", stderr="")
if cmd[1] == "ps":
# Two matches: stopped first, running second.
return subprocess.CompletedProcess(
cmd, 0,
stdout="stopped-cid\texited\nrunning-cid\trunning\n",
stderr="",
)
return subprocess.CompletedProcess(cmd, 0, stdout="fresh-cid\n", stderr="")
monkeypatch.setattr(docker_env.subprocess, "run", _run)
env = _make_dummy_env(task_id="dup-match")
assert env._container_id == "running-cid", (
f"running container should win over stopped duplicate, got {env._container_id!r}"
)
# ── Cleanup correctness (issue #20561) ────────────────────────────
class _FakeThread:
"""Stand-in for threading.Thread that captures target/args and calls
target() synchronously when .start() runs, so cleanup behavior is
observable without actually backgrounding subprocess calls."""
def __init__(self, target=None, daemon=None, name=None):
self._target = target
self.daemon = daemon
self.name = name
self._done = False
def start(self):
if self._target is not None:
self._target()
self._done = True
def is_alive(self):
return not self._done
def join(self, timeout=None):
self._done = True
def _install_fake_thread(monkeypatch):
import threading
monkeypatch.setattr(threading, "Thread", _FakeThread)
def test_cleanup_with_persist_is_noop_for_container(monkeypatch):
"""``persist_across_processes=True`` (default) cleanup must NEITHER stop
NOR remove the container the docs promise "ONE long-lived container
shared across sessions", and any docker stop would kill background
processes inside the container (npm watchers, pytest watchers, etc.).
Resource reclamation in this mode happens via the orphan reaper on next
Hermes startup, not on graceful exit. Issue #20561 — the first iteration
of this PR did docker stop here, which Ben caught as contradicting the
"ONE long-lived container" semantics."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
_install_fake_thread(monkeypatch)
env = _make_dummy_env(task_id="cleanup-persist", persistent_filesystem=False)
# Default persist_across_processes=True.
container_id = env._container_id
assert container_id
cleanup_calls = []
real_run = docker_env.subprocess.run
def _capturing_run(cmd, **kwargs):
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
return real_run(cmd, **kwargs)
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
env.cleanup()
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
assert not stops, (
f"docker stop must NOT be called when persist_across_processes=True; "
f"container has to stay running so background processes survive. "
f"Got: {stops}"
)
assert not rms, (
f"docker rm must NOT be called when persist_across_processes=True; "
f"reuse would be impossible. Got: {rms}"
)
# The in-process handle must still be cleared so the next __init__
# re-probes via labels (and reuses the still-running container).
assert env._container_id is None, (
"in-process container_id should be cleared even in no-op cleanup"
)
def test_cleanup_force_remove_stops_and_rms_even_in_persist_mode(monkeypatch):
"""``cleanup(force_remove=True)`` must stop AND rm the container even
when ``persist_across_processes=True``. This is the explicit-teardown
path for ``/reset``, ``cleanup_vm(task_id, force_remove=True)``, and any
future caller that wants a guaranteed fresh container.
Without this kwarg, callers in persist mode would have no way to force a
fresh container without also flipping the global config too coarse for
a per-task reset.
"""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
_install_fake_thread(monkeypatch)
env = _make_dummy_env(task_id="cleanup-force", persistent_filesystem=False)
assert env._container_id
cleanup_calls = []
real_run = docker_env.subprocess.run
def _capturing_run(cmd, **kwargs):
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
return real_run(cmd, **kwargs)
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
env.cleanup(force_remove=True)
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
assert stops, f"force_remove must docker stop; got: {cleanup_calls}"
assert rms, f"force_remove must docker rm; got: {cleanup_calls}"
def test_cleanup_vm_default_honors_persist_mode(monkeypatch):
"""``cleanup_vm(task_id)`` without ``force_remove=True`` must be a no-op
for a persist-mode container.
Regression for the bug Ben caught after commit 4: ``AIAgent.close()``
(which is called from ``tui_gateway/server.py`` on session.close, from
``gateway/run.py`` on per-session teardown, and from per-turn cleanup)
calls ``cleanup_vm(task_id)``. If that defaulted to ``force_remove=True``
we'd tear down the container on every TUI session close, defeating the
"ONE long-lived container shared across sessions" contract.
"""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
_install_fake_thread(monkeypatch)
from tools import terminal_tool
env = _make_dummy_env(task_id="session-close-test")
container_id = env._container_id
terminal_tool._active_environments["session-close-test"] = env
cleanup_calls = []
real_run = docker_env.subprocess.run
def _capturing_run(cmd, **kwargs):
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
return real_run(cmd, **kwargs)
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
try:
terminal_tool.cleanup_vm("session-close-test")
finally:
terminal_tool._active_environments.pop("session-close-test", None)
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
assert not stops, (
f"cleanup_vm() default must not docker stop a persist-mode container; "
f"got: {stops}"
)
assert not rms, (
f"cleanup_vm() default must not docker rm a persist-mode container; "
f"got: {rms}"
)
def test_cleanup_vm_force_remove_tears_down_persist_container(monkeypatch):
"""``cleanup_vm(task_id, force_remove=True)`` tears down a persist-mode
container the explicit-teardown path for ``/reset``-style flows.
Also pins the runtime-signature-inspection plumbing: the kwarg must
actually flow through ``cleanup_vm`` into the backend's ``cleanup()``.
"""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
_install_fake_thread(monkeypatch)
from tools import terminal_tool
env = _make_dummy_env(task_id="explicit-teardown-test")
terminal_tool._active_environments["explicit-teardown-test"] = env
cleanup_calls = []
real_run = docker_env.subprocess.run
def _capturing_run(cmd, **kwargs):
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
return real_run(cmd, **kwargs)
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
try:
terminal_tool.cleanup_vm("explicit-teardown-test", force_remove=True)
finally:
terminal_tool._active_environments.pop("explicit-teardown-test", None)
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
assert stops, f"force_remove must reach docker stop; got: {cleanup_calls}"
assert rms, f"force_remove must reach docker rm; got: {cleanup_calls}"
def test_cleanup_with_persist_disabled_stops_and_rms(monkeypatch):
"""``persist_across_processes=False`` cleanup must docker stop AND docker
rm so containers don't leak. Crucially, this runs regardless of the
``persistent_filesystem`` setting the original code only rm'd when
``not self._persistent``, which meant the default-on ``container_persistent:
true`` users (the documented happy path) leaked Exited containers forever.
Issue #20561 root-cause fix."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
_install_fake_thread(monkeypatch)
# Note: persistent_filesystem=True (the prior-leak scenario) + the new
# cross-process toggle OFF must still result in a clean rm.
env = docker_env.DockerEnvironment(
image="python:3.11", cwd="/root", timeout=60,
task_id="cleanup-no-persist", persistent_filesystem=True,
persist_across_processes=False,
)
cleanup_calls = []
real_run = docker_env.subprocess.run
def _capturing_run(cmd, **kwargs):
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
return real_run(cmd, **kwargs)
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
env.cleanup()
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
assert stops, "expected docker stop"
assert rms, (
"docker rm MUST run when persist_across_processes=False, even with "
"persistent_filesystem=True — that gating was the leak source in #20561."
)
def test_cleanup_uses_subprocess_run_not_detached_shell(monkeypatch):
"""The pre-fix code used ``subprocess.Popen("... &", shell=True)`` which
raced with parent-process exit and silently dropped cleanup work. The
new code must use ``subprocess.run`` with bounded ``timeout=`` so the
work actually completes within the process lifetime.
Asserts cleanup never reaches into shell-mode Popen. Uses
``force_remove=True`` so cleanup actually issues docker calls the
default persist-mode path is now a no-op (commit 4) and would trivially
pass this assertion without exercising the docker code at all.
"""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
_install_fake_thread(monkeypatch)
def _forbidden_popen(*args, **kwargs):
raise AssertionError(
f"cleanup must not use subprocess.Popen anymore (issue #20561); "
f"got args={args} kwargs={kwargs}"
)
monkeypatch.setattr(docker_env.subprocess, "Popen", _forbidden_popen)
env = _make_dummy_env(task_id="no-popen-cleanup")
env.cleanup(force_remove=True) # must not raise
def test_wait_for_cleanup_returns_true_when_no_thread_started():
"""``wait_for_cleanup`` must be a no-op when ``cleanup`` was never called
(or the env has no live cleanup thread) atexit calls it unconditionally
across all active envs, so a False return would falsely flag healthy
shutdowns."""
env = docker_env.DockerEnvironment.__new__(docker_env.DockerEnvironment)
# No _cleanup_thread set — simulates an env that was never cleanup()'d.
assert env.wait_for_cleanup(timeout=1.0) is True
def test_wait_for_cleanup_after_cleanup_returns_true(monkeypatch):
"""End-to-end: cleanup() starts a thread, wait_for_cleanup() joins it
and reports completion. Atexit relies on this contract to ensure docker
stop/rm actually finishes before the Python interpreter exits.
Uses ``force_remove=True`` so cleanup actually starts a worker thread
the default persist-mode cleanup is a no-op (commit 4) and never spawns
a thread, so the trivial "no thread" branch of wait_for_cleanup is
already covered by the previous test.
"""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
_install_fake_thread(monkeypatch)
env = _make_dummy_env(task_id="wait-test")
env.cleanup(force_remove=True)
assert env.wait_for_cleanup(timeout=5.0) is True
def test_cleanup_on_env_with_no_container_id_does_not_raise(monkeypatch):
"""A DockerEnvironment whose ``__init__`` failed before the container_id
was set (image-pull error, docker daemon down) should still be safe to
cleanup() the post-creation failure path in callers always tries.
Without this guard the daemon-down case used to NameError on the cleanup
branch."""
env = docker_env.DockerEnvironment.__new__(docker_env.DockerEnvironment)
env._container_id = None
env._persistent = False
env._workspace_dir = None
env._home_dir = None
# No exception expected.
env.cleanup()
# ── Orphan reaper (issue #20561) ──────────────────────────────────
def _now_iso(offset_seconds: int = 0) -> str:
"""Return an RFC3339 timestamp ``offset_seconds`` in the past."""
import datetime
t = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(seconds=offset_seconds)
# Format like Docker emits — with nanoseconds-style trailing digits.
return t.isoformat().replace("+00:00", ".123456789Z")
def _reaper_run_mock(monkeypatch, ps_ids: list[str], inspect_responses: dict[str, str],
rm_succeeds: bool = True):
"""Build a subprocess.run mock for reaper tests.
* ``ps_ids`` what ``docker ps -a --filter ... --format '{{.ID}}'`` returns
* ``inspect_responses[cid]`` what ``docker inspect ... FinishedAt`` returns
for each cid; ``""`` means "field unset".
* ``rm_succeeds`` whether ``docker rm -f`` returns 0.
Captures every call so tests can assert which containers were rm'd.
"""
calls = []
def _run(cmd, **kwargs):
calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
if not isinstance(cmd, list) or len(cmd) < 2:
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
sub = cmd[1]
if sub == "ps":
return subprocess.CompletedProcess(
cmd, 0, stdout="\n".join(ps_ids) + ("\n" if ps_ids else ""), stderr="",
)
if sub == "inspect":
# cmd is [docker, inspect, --format, '{{.State.FinishedAt}}', cid]
cid = cmd[-1]
return subprocess.CompletedProcess(
cmd, 0, stdout=inspect_responses.get(cid, "") + "\n", stderr="",
)
if sub == "rm":
return subprocess.CompletedProcess(
cmd, 0 if rm_succeeds else 1,
stdout="", stderr="" if rm_succeeds else "no such container",
)
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
monkeypatch.setattr(docker_env.subprocess, "run", _run)
return calls
def test_reap_orphan_returns_zero_when_no_matches(monkeypatch):
"""No labeled containers → no rm calls, returns 0. Establishes the
happy-path baseline for the orphan reaper (issue #20561)."""
calls = _reaper_run_mock(monkeypatch, ps_ids=[], inspect_responses={})
removed = docker_env.reap_orphan_containers(
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
)
assert removed == 0
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
assert not rms, "no rm calls expected when ps returns empty"
def test_reap_orphan_removes_stale_exited_container(monkeypatch):
"""An Exited container older than max_age_seconds must be removed.
This is the core repair path for issue #20561 — without the reaper,
SIGKILL'd Hermes processes leak containers permanently."""
old = _now_iso(offset_seconds=900) # 15 minutes ago
calls = _reaper_run_mock(
monkeypatch, ps_ids=["old-cid"], inspect_responses={"old-cid": old},
)
removed = docker_env.reap_orphan_containers(
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
)
assert removed == 1
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
assert len(rms) == 1
assert "old-cid" in rms[0][0], f"expected rm of old-cid, got {rms[0][0]}"
def test_reap_orphan_spares_recently_exited_container(monkeypatch):
"""A container exited within max_age_seconds must NOT be reaped — that
container belongs to a Hermes process that just finished and may be
about to be replaced. Conservative window prevents racing sibling
processes."""
recent = _now_iso(offset_seconds=60) # 1 minute ago
calls = _reaper_run_mock(
monkeypatch, ps_ids=["recent-cid"], inspect_responses={"recent-cid": recent},
)
removed = docker_env.reap_orphan_containers(
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
)
assert removed == 0
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
assert not rms, f"recent container must not be reaped, got rm calls: {rms}"
def test_reap_orphan_scopes_to_profile_filter_via_label(monkeypatch):
"""The reaper must pass ``--filter label=hermes-profile=<profile>`` to
docker ps so it never sweeps another profile's containers. A research
profile must not tear down the default profile's stragglers."""
calls = _reaper_run_mock(monkeypatch, ps_ids=[], inspect_responses={})
docker_env.reap_orphan_containers(
max_age_seconds=600, profile_filter="research-bot", docker_exe="/usr/bin/docker",
)
ps_calls = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["ps"]]
assert ps_calls, "expected at least one docker ps call"
flat = " ".join(ps_calls[0][0])
assert "label=hermes-profile=research-bot" in flat, (
f"profile filter not applied to docker ps; got args: {ps_calls[0][0]}"
)
assert "label=hermes-agent=1" in flat, (
f"hermes-agent label filter must also be applied; got: {ps_calls[0][0]}"
)
assert "status=exited" in flat, (
"must filter to exited containers only — running containers may "
"belong to a sibling Hermes process and must NEVER be reaped"
)
def test_reap_orphan_skips_container_with_unparseable_finished_at(monkeypatch):
"""If docker inspect returns the zero-value ``0001-01-01T00:00:00Z`` (no
FinishedAt yet) or an unparseable timestamp, the reaper must leave the
container alone. Defensive never reap a container whose age we can't
determine."""
calls = _reaper_run_mock(
monkeypatch,
ps_ids=["never-finished", "garbage-ts"],
inspect_responses={
"never-finished": "0001-01-01T00:00:00Z",
"garbage-ts": "not-a-timestamp",
},
)
removed = docker_env.reap_orphan_containers(
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
)
assert removed == 0
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
assert not rms, (
f"reaper must NOT remove containers with unparseable FinishedAt; got: {rms}"
)
def test_reap_orphan_handles_docker_ps_failure_gracefully(monkeypatch):
"""If docker ps itself fails (daemon down, permission denied), the
reaper returns 0 without crashing. The reaper is best-effort plumbing,
not a critical path it must never block container creation."""
def _failing_ps(cmd, **kwargs):
if isinstance(cmd, list) and len(cmd) >= 2 and cmd[1] == "ps":
return subprocess.CompletedProcess(cmd, 1, stdout="", stderr="Cannot connect to daemon")
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
monkeypatch.setattr(docker_env.subprocess, "run", _failing_ps)
# Must not raise
removed = docker_env.reap_orphan_containers(
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
)
assert removed == 0
def test_reap_orphan_continues_after_individual_rm_failure(monkeypatch):
"""If ``docker rm -f`` fails on one container (already removed by a
concurrent process, container locked, etc.), the reaper must log and
continue to the next candidate rather than aborting the whole sweep."""
old = _now_iso(offset_seconds=900)
rm_calls = []
def _run(cmd, **kwargs):
if not isinstance(cmd, list) or len(cmd) < 2:
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
sub = cmd[1]
if sub == "ps":
return subprocess.CompletedProcess(
cmd, 0, stdout="cid-a\ncid-b\ncid-c\n", stderr="",
)
if sub == "inspect":
return subprocess.CompletedProcess(cmd, 0, stdout=old + "\n", stderr="")
if sub == "rm":
rm_calls.append(cmd[-1])
# cid-b fails; cid-a and cid-c succeed.
if cmd[-1] == "cid-b":
return subprocess.CompletedProcess(cmd, 1, stdout="", stderr="no such container")
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
monkeypatch.setattr(docker_env.subprocess, "run", _run)
removed = docker_env.reap_orphan_containers(
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
)
# All three were attempted, two succeeded.
assert removed == 2
assert set(rm_calls) == {"cid-a", "cid-b", "cid-c"}, (
f"reaper must attempt all candidates even when one fails; got: {rm_calls}"
)
def test_container_finished_at_parses_nanosecond_timestamp(monkeypatch):
"""Docker emits FinishedAt with nanosecond precision (RFC3339 with up to
9 fractional digits), but Python's fromisoformat caps at microseconds.
The helper must trim the extra digits without raising otherwise every
candidate gets skipped and the reaper does nothing."""
def _run(cmd, **kwargs):
return subprocess.CompletedProcess(
cmd, 0,
stdout="2026-05-28T13:45:00.123456789Z\n",
stderr="",
)
monkeypatch.setattr(docker_env.subprocess, "run", _run)
result = docker_env._container_finished_at("/usr/bin/docker", "test-cid")
assert result is not None, "must parse RFC3339 with nanoseconds"
import datetime
assert result.tzinfo == datetime.timezone.utc
assert result.year == 2026 and result.month == 5 and result.day == 28
def test_container_finished_at_returns_none_on_zero_value():
"""Docker's zero-value ``0001-01-01T00:00:00Z`` (never finished) must
map to None so the reaper treats the container as unreapable."""
# Direct test of the parsing helper — no subprocess needed since the
# check happens after the inspect call returns.
import subprocess as _subprocess
class _MockRun:
def __init__(self, stdout):
self.returncode = 0
self.stdout = stdout
self.stderr = ""
import unittest.mock
with unittest.mock.patch.object(
docker_env.subprocess, "run", return_value=_MockRun("0001-01-01T00:00:00Z\n"),
):
result = docker_env._container_finished_at("/usr/bin/docker", "never-finished")
assert result is None
@@ -0,0 +1,139 @@
"""Integration tests for the docker orphan-reaper wiring in terminal_tool.
The reaper itself is unit-tested in tests/tools/test_docker_environment.py
under the "Orphan reaper" section. These tests cover the terminal_tool-side
gates: once-per-process behavior, the disable flag, and the
``lifetime_seconds`` doubling that determines the reaper's age threshold.
Issue #20561 — without these gates, parallel subagents would each fire the
reaper on container creation, and the ``terminal.docker_orphan_reaper: false``
opt-out would silently do nothing.
"""
import os
from unittest.mock import patch
import tools.terminal_tool as terminal_tool
def _reset_reaper_gate():
"""Clear the once-per-process flag between tests."""
terminal_tool._docker_orphan_reaper_ran = False
def test_maybe_reap_runs_once_per_process(monkeypatch):
"""The reaper sweep must run at most once per Python interpreter.
Parallel subagents that each call _create_environment(env_type='docker')
would otherwise fire N concurrent docker ps + inspect storms against the
daemon and waste 510s of startup."""
_reset_reaper_gate()
call_count = {"reap": 0}
def _fake_reap(**kwargs):
call_count["reap"] += 1
return 0
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
config = {"docker_orphan_reaper": True}
terminal_tool._maybe_reap_docker_orphans(config)
terminal_tool._maybe_reap_docker_orphans(config)
terminal_tool._maybe_reap_docker_orphans(config)
assert call_count["reap"] == 1, (
f"reaper must run exactly once per process; got {call_count['reap']} calls"
)
def test_maybe_reap_respects_disable_flag(monkeypatch):
"""``terminal.docker_orphan_reaper: false`` (via container_config) must
skip the sweep entirely no docker ps, no inspect, no rm. The escape
hatch for operators running multiple Hermes processes in the same
profile."""
_reset_reaper_gate()
call_count = {"reap": 0}
def _fake_reap(**kwargs):
call_count["reap"] += 1
return 0
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": False})
assert call_count["reap"] == 0, "disabled reaper must not run any docker calls"
# The once-per-process gate must NOT be tripped when the reaper is
# disabled — that would prevent a subsequent toggle to true from working.
assert terminal_tool._docker_orphan_reaper_ran is False
def test_maybe_reap_doubles_lifetime_for_max_age(monkeypatch):
"""The reaper's age threshold is ``2 × lifetime_seconds`` (with a 60s
floor). Generous default gives sibling Hermes processes ample grace
to be replaced without their just-exited containers being yanked."""
_reset_reaper_gate()
captured_args = {}
def _fake_reap(**kwargs):
captured_args.update(kwargs)
return 0
monkeypatch.setenv("TERMINAL_LIFETIME_SECONDS", "300")
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
assert captured_args.get("max_age_seconds") == 600, (
f"expected 2 × 300 = 600, got {captured_args.get('max_age_seconds')}"
)
def test_maybe_reap_floors_at_60_seconds(monkeypatch):
"""A user pinning TERMINAL_LIFETIME_SECONDS=0 (or any value <30) would
otherwise get an effective age threshold of zero, which would race the
user's own just-started container creation. Floor at 60s × 2 = 120s."""
_reset_reaper_gate()
captured_args = {}
def _fake_reap(**kwargs):
captured_args.update(kwargs)
return 0
monkeypatch.setenv("TERMINAL_LIFETIME_SECONDS", "0")
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
assert captured_args.get("max_age_seconds") == 120, (
f"expected floored 60 × 2 = 120, got {captured_args.get('max_age_seconds')}"
)
def test_maybe_reap_passes_current_profile_as_filter(monkeypatch):
"""The reaper must be scoped to the current Hermes profile — a research
profile must NEVER reap default's containers. Verifies the
profile-filter wiring."""
_reset_reaper_gate()
captured_args = {}
def _fake_reap(**kwargs):
captured_args.update(kwargs)
return 0
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap), \
patch("tools.environments.docker._get_active_profile_name", return_value="research-bot"):
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
assert captured_args.get("profile_filter") == "research-bot", (
f"expected profile_filter='research-bot', got {captured_args.get('profile_filter')!r}"
)
def test_maybe_reap_swallows_exceptions(monkeypatch):
"""A reaper crash (docker daemon down, parse error in helper) must NOT
block env creation. The reaper is best-effort plumbing, not a critical
path; failures get logged at debug level and execution continues."""
_reset_reaper_gate()
def _exploding_reap(**kwargs):
raise RuntimeError("docker daemon ate the cat")
with patch("tools.environments.docker.reap_orphan_containers", _exploding_reap):
# Must not raise
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
+33
View File
@@ -34,6 +34,39 @@ def test_resolve_stdio_command_falls_back_to_hermes_node_bin(tmp_path):
assert env["PATH"].split(os.pathsep)[0] == str(node_bin)
def test_resolve_stdio_command_falls_back_to_usr_local_bin():
"""When ``npx`` isn't on the filtered PATH and isn't under ``$HERMES_HOME/node/bin``
or ``~/.local/bin``, the resolver should still locate it at ``/usr/local/bin/npx``.
This is the canonical install location for Node on Linux from-source builds,
the upstream ``node:bookworm-slim`` image (which the Hermes Docker image
copies ``node + npm + corepack`` from since #4977), and macOS Homebrew on
Intel. Without this candidate, MCP servers run with an ``env.PATH`` that
omits ``/usr/local/bin`` (common when users hand-author PATH for sandboxing)
fail with ENOENT at ``execvp``.
"""
target = os.path.join(os.sep, "usr", "local", "bin", "npx")
# Pretend ONLY the /usr/local/bin/npx candidate exists and is executable —
# the other candidates ($HERMES_HOME/node/bin/npx and ~/.local/bin/npx)
# should fail isfile() and the resolver must fall through to /usr/local/bin.
def _fake_isfile(path):
return path == target
def _fake_access(path, _mode):
return path == target
with patch("tools.mcp_tool.shutil.which", return_value=None), \
patch("tools.mcp_tool.os.path.isfile", side_effect=_fake_isfile), \
patch("tools.mcp_tool.os.access", side_effect=_fake_access):
command, env = _resolve_stdio_command("npx", {"PATH": "/opt/data/bin:/usr/bin:/bin"})
assert command == target
# /usr/local/bin must be prepended so npx's shebang (`/usr/bin/env node`)
# can find node in the same directory.
assert env["PATH"].split(os.pathsep)[0] == os.path.dirname(target)
def test_resolve_stdio_command_respects_explicit_empty_path():
seen_paths = []
@@ -224,3 +224,39 @@ def test_docker_env_is_bridged_everywhere():
assert "docker_env" in _gateway_env_map_keys()
assert "docker_env" in _save_config_env_sync_keys()
assert "TERMINAL_DOCKER_ENV" in _terminal_tool_env_var_names()
def test_docker_persist_across_processes_is_bridged_everywhere():
"""Regression pin for the cross-process container reuse toggle.
``terminal.docker_persist_across_processes`` (issue #20561) controls
whether ``DockerEnvironment.__init__`` probes for and reuses an existing
labeled container at startup, and whether ``cleanup()`` removes the
container on Hermes exit or just stops it (keeping it for the next
process). Same four-bridge invariant as docker_run_as_host_user /
docker_env / docker_mount_cwd_to_workspace drift between any of the
four sites means ``terminal.docker_persist_across_processes: false`` in
config.yaml silently does nothing for that entry point, leaving the
user unable to opt out of the documented "ONE long-lived container
shared across sessions" behavior.
"""
assert "docker_persist_across_processes" in _cli_env_map_keys()
assert "docker_persist_across_processes" in _gateway_env_map_keys()
assert "docker_persist_across_processes" in _save_config_env_sync_keys()
assert "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES" in _terminal_tool_env_var_names()
def test_docker_orphan_reaper_is_bridged_everywhere():
"""Regression pin for the startup orphan reaper toggle (issue #20561).
``terminal.docker_orphan_reaper`` controls whether Hermes sweeps stale
Exited containers from prior SIGKILL'd processes at startup. Same
four-site bridge invariant drift means
``terminal.docker_orphan_reaper: false`` silently does nothing for one
entry point, and the reaper either runs when the operator disabled it
or fails to run when they enabled it.
"""
assert "docker_orphan_reaper" in _cli_env_map_keys()
assert "docker_orphan_reaper" in _gateway_env_map_keys()
assert "docker_orphan_reaper" in _save_config_env_sync_keys()
assert "TERMINAL_DOCKER_ORPHAN_REAPER" in _terminal_tool_env_var_names()
+8 -2
View File
@@ -44,11 +44,17 @@ def server(hermes_home):
):
mod = importlib.import_module("tui_gateway.server")
yield mod
# Reset module-level session state without re-importing. importlib.reload
# would re-register the module's atexit hooks (ThreadPoolExecutor
# shutdown, _shutdown_sessions); the duplicates race the stderr
# buffer at interpreter shutdown and surface as Fatal Python error:
# _enter_buffered_busy. Clearing the per-session dicts gives the
# next test a clean slate; _methods is NOT cleared because it's
# populated at module import time and re-registration only happens
# via reload (which we don't do).
mod._sessions.clear()
mod._pending.clear()
mod._answers.clear()
mod._methods.clear()
importlib.reload(mod)
@pytest.fixture()
+8 -2
View File
@@ -30,11 +30,17 @@ def server():
import importlib
mod = importlib.import_module("tui_gateway.server")
yield mod
# Reset module-level session state without re-importing. importlib.reload
# would re-register the module's atexit hooks (ThreadPoolExecutor
# shutdown, _shutdown_sessions); the duplicates race the stderr
# buffer at interpreter shutdown and surface as Fatal Python error:
# _enter_buffered_busy. Clearing the per-session dicts gives the
# next test a clean slate; _methods is NOT cleared because it's
# populated at module import time and re-registration only happens
# via reload (which we don't do).
mod._sessions.clear()
mod._pending.clear()
mod._answers.clear()
mod._methods.clear()
importlib.reload(mod)
@pytest.fixture()
@@ -34,11 +34,17 @@ def server():
mod = importlib.import_module("tui_gateway.server")
yield mod
# Reset module-level session state without re-importing. importlib.reload
# would re-register the module's atexit hooks (ThreadPoolExecutor
# shutdown, _shutdown_sessions); the duplicates race the stderr
# buffer at interpreter shutdown and surface as Fatal Python error:
# _enter_buffered_busy. Clearing the per-session dicts gives the
# next test a clean slate; _methods is NOT cleared because it's
# populated at module import time and re-registration only happens
# via reload (which we don't do).
mod._sessions.clear()
mod._pending.clear()
mod._answers.clear()
mod._methods.clear()
importlib.reload(mod)
def test_init_session_attaches_background_review_callback(server, monkeypatch):
+427 -40
View File
@@ -98,6 +98,167 @@ def _load_hermes_env_vars() -> dict[str, str]:
return {}
# Docker label values must match [a-zA-Z0-9_.-] and stay ≤63 chars to round-trip
# safely through `docker ps --filter label=key=value`. Profile and task names
# can technically contain other characters; sanitize defensively.
_LABEL_VALUE_OK_RE = re.compile(r"[^A-Za-z0-9_.-]")
def _sanitize_label_value(value: str) -> str:
"""Coerce *value* into a Docker label-safe form (alnum + ``_.-``, ≤63 chars).
Empty or all-invalid inputs collapse to ``"unknown"`` so the resulting
label is always queryable. Used at container-create time; never round-trip
a sanitized value back into application logic.
"""
if not isinstance(value, str) or not value:
return "unknown"
cleaned = _LABEL_VALUE_OK_RE.sub("_", value)
cleaned = cleaned[:63] or "unknown"
return cleaned
def _get_active_profile_name() -> str:
"""Return the active Hermes profile name, or ``"default"`` on any error.
Resolved at container-create time so a single container is permanently
tagged with the profile that created it. Profile switches inside the
same process don't retroactively relabel running containers.
"""
try:
from hermes_cli.profiles import get_active_profile_name
return get_active_profile_name() or "default"
except Exception:
return "default"
def reap_orphan_containers(
*,
max_age_seconds: int = 600,
profile_filter: str | None = None,
docker_exe: str | None = None,
) -> int:
"""Remove stale hermes-tagged containers left behind by prior processes.
Targets containers that match all of:
* ``label=hermes-agent=1`` (created by this codebase)
* ``status=exited`` (running containers are NEVER reaped they may
belong to a sibling Hermes process whose reuse path will pick them
up; killing them would crash the sibling mid-command)
* (optional) ``label=hermes-profile=<profile_filter>`` (sweep only the
caller's profile by default; a hermes process in profile A must not
tear down profile B's containers)
* ``State.FinishedAt`` older than *max_age_seconds* ago (so a sibling
process that just exited and is about to be replaced doesn't get
its container yanked out from under it)
Returns the number of containers removed. Best-effort: any failure
(docker daemon unreachable, slow inspect, parse error) is logged at
debug level and the function returns whatever it managed before the
failure. Safe to call repeatedly; idempotent.
Issue #20561 — this is the safety net for SIGKILL / OOM / crashed
terminal exits that bypass the ``atexit`` cleanup hook. Without it,
even with the cleanup-fix in the prior commit, a hard-killed Hermes
process leaves its container behind permanently because there's no
subsequent Hermes process scheduled to reuse that exact (task, profile)
pair.
"""
docker = docker_exe or find_docker() or "docker"
filters = ["--filter", "label=hermes-agent=1", "--filter", "status=exited"]
if profile_filter:
filters.extend(["--filter", f"label=hermes-profile={_sanitize_label_value(profile_filter)}"])
try:
listing = subprocess.run(
[docker, "ps", "-a", *filters, "--format", "{{.ID}}"],
capture_output=True, text=True, timeout=15, check=False,
)
except (subprocess.TimeoutExpired, OSError) as e:
logger.debug("orphan reaper docker ps failed: %s", e)
return 0
if listing.returncode != 0:
logger.debug(
"orphan reaper docker ps returned %d: %s",
listing.returncode, listing.stderr.strip(),
)
return 0
candidate_ids = [ln.strip() for ln in listing.stdout.splitlines() if ln.strip()]
if not candidate_ids:
return 0
# Inspect each candidate to get FinishedAt; reap only those exited
# long enough ago. Doing this per-container (rather than bulk inspect)
# keeps the failure blast radius to one container at a time.
import datetime
now = datetime.datetime.now(datetime.timezone.utc)
removed = 0
for cid in candidate_ids:
finished_at = _container_finished_at(docker, cid)
if finished_at is None:
# Couldn't determine age — be conservative and leave it alone.
continue
age = (now - finished_at).total_seconds()
if age < max_age_seconds:
continue
try:
result = subprocess.run(
[docker, "rm", "-f", cid],
capture_output=True, text=True, timeout=30,
)
if result.returncode == 0:
removed += 1
logger.info(
"Reaped orphan container %s (exited %d seconds ago)",
cid[:12], int(age),
)
else:
logger.debug(
"docker rm -f %s failed: %s",
cid[:12], result.stderr.strip(),
)
except (subprocess.TimeoutExpired, OSError) as e:
logger.debug("orphan reaper docker rm %s failed: %s", cid[:12], e)
return removed
def _container_finished_at(docker_exe: str, container_id: str):
"""Parse ``docker inspect`` FinishedAt for *container_id*.
Returns a timezone-aware datetime, or ``None`` if the field is missing,
unparseable, or the zero-value ``0001-01-01T00:00:00Z`` Docker emits
for never-finished containers. ``None`` means "don't reap" the caller
leaves the container alone.
"""
try:
result = subprocess.run(
[docker_exe, "inspect", "--format", "{{.State.FinishedAt}}", container_id],
capture_output=True, text=True, timeout=10, check=False,
)
except (subprocess.TimeoutExpired, OSError) as e:
logger.debug("orphan reaper docker inspect %s failed: %s", container_id[:12], e)
return None
if result.returncode != 0:
return None
raw = result.stdout.strip()
if not raw or raw.startswith("0001-01-01"):
return None
# Docker emits RFC3339 with nanoseconds (e.g. "2026-05-28T13:45:00.123456789Z").
# Python's fromisoformat handles microseconds but not nanoseconds; trim.
import re as _re
raw = _re.sub(r"(\.\d{6})\d+", r"\1", raw)
raw = raw.replace("Z", "+00:00")
try:
import datetime
return datetime.datetime.fromisoformat(raw)
except ValueError as e:
logger.debug("could not parse FinishedAt %r for %s: %s", raw, container_id[:12], e)
return None
def find_docker() -> Optional[str]:
"""Locate the docker (or podman) CLI binary.
@@ -304,15 +465,18 @@ class DockerEnvironment(BaseEnvironment):
auto_mount_cwd: bool = False,
run_as_host_user: bool = False,
extra_args: list = None,
persist_across_processes: bool = True,
):
if cwd == "~":
cwd = "/root"
super().__init__(cwd=cwd, timeout=timeout)
self._persistent = persistent_filesystem
self._persist_across_processes = persist_across_processes
self._task_id = task_id
self._forward_env = _normalize_forward_env_names(forward_env)
self._env = _normalize_env_dict(env)
self._container_id: Optional[str] = None
self._labels: dict[str, str] = {}
logger.info(f"DockerEnvironment volumes: {volumes}")
# Ensure volumes is a list (config.yaml could be malformed)
if volumes is not None and not isinstance(volumes, list):
@@ -506,25 +670,88 @@ class DockerEnvironment(BaseEnvironment):
# Start the container directly via `docker run -d`.
container_name = f"hermes-{uuid.uuid4().hex[:8]}"
run_cmd = [
self._docker_exe, "run", "-d",
"--init", # tini/catatonit as PID 1 — reaps zombie children
"--name", container_name,
"-w", cwd,
*all_run_args,
image,
"sleep", "infinity", # no fixed lifetime — idle reaper handles cleanup
# Labels make hermes-created containers identifiable to:
# * the orphan reaper (`hermes-agent=1` for the global sweep filter)
# * future cross-process reuse (`hermes-task-id`, `hermes-profile`)
# * operators running `docker ps --filter label=hermes-agent=1`
# Values are limited to the safe character set defined by
# _sanitize_label_value(); the active Hermes profile is captured at
# container-start time and never changes for the container's lifetime.
profile_name = _sanitize_label_value(_get_active_profile_name())
task_label = _sanitize_label_value(task_id)
label_args = [
"--label", "hermes-agent=1",
"--label", f"hermes-task-id={task_label}",
"--label", f"hermes-profile={profile_name}",
]
logger.debug(f"Starting container: {' '.join(run_cmd)}")
result = subprocess.run(
run_cmd,
capture_output=True,
text=True,
timeout=120, # image pull may take a while
check=True,
)
self._container_id = result.stdout.strip()
logger.info(f"Started container {container_name} ({self._container_id[:12]})")
self._labels = {
"hermes-agent": "1",
"hermes-task-id": task_label,
"hermes-profile": profile_name,
}
# Cross-process container reuse (issue #20561 — docs claim "ONE long-lived
# container shared across sessions"). If a prior Hermes process
# already started a container for this (task_id, profile) and it
# still exists, attach to it instead of starting a fresh one. This
# restores the documented contract; opt out via
# ``terminal.docker_persist_across_processes: false``.
#
# Reuse matches on labels only — we deliberately do NOT compare image
# / mounts / resources. Operators who need a fresh container after
# changing those settings should set ``docker_persist_across_processes:
# false`` (or run ``docker rm -f`` against the labeled container) to
# force a clean start.
reused = False
if persist_across_processes:
existing = self._find_reusable_container(task_label, profile_name)
if existing is not None:
container_id, state = existing
self._container_id = container_id
if state != "running":
try:
subprocess.run(
[self._docker_exe, "start", container_id],
capture_output=True,
text=True,
timeout=30,
check=True,
)
except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
logger.warning(
"Failed to start existing container %s (state=%s): "
"%s — falling back to a fresh container.",
container_id[:12], state, e,
)
self._container_id = None
if self._container_id:
logger.info(
"Reusing container %s (task=%s, profile=%s, prior state=%s)",
container_id[:12], task_label, profile_name, state,
)
reused = True
if not reused:
run_cmd = [
self._docker_exe, "run", "-d",
"--init", # tini/catatonit as PID 1 — reaps zombie children
"--name", container_name,
*label_args,
"-w", cwd,
*all_run_args,
image,
"sleep", "infinity", # no fixed lifetime — idle reaper handles cleanup
]
logger.debug(f"Starting container: {' '.join(run_cmd)}")
result = subprocess.run(
run_cmd,
capture_output=True,
text=True,
timeout=120, # image pull may take a while
check=True,
)
self._container_id = result.stdout.strip()
logger.info(f"Started container {container_name} ({self._container_id[:12]})")
# Build the init-time env forwarding args (used only by init_session
# to inject host env vars into the snapshot; subsequent commands get
@@ -629,31 +856,191 @@ class DockerEnvironment(BaseEnvironment):
logger.debug("Docker --storage-opt support: %s", _storage_opt_ok)
return _storage_opt_ok
def cleanup(self):
"""Stop and remove the container. Bind-mount dirs persist if persistent=True."""
if self._container_id:
try:
# Stop in background so cleanup doesn't block
stop_cmd = (
f"(timeout 60 {self._docker_exe} stop {self._container_id} || "
f"{self._docker_exe} rm -f {self._container_id}) >/dev/null 2>&1 &"
)
subprocess.Popen(stop_cmd, shell=True)
except Exception as e:
logger.warning("Failed to stop container %s: %s", self._container_id, e)
def _find_reusable_container(self, task_label: str, profile_label: str) -> Optional[tuple[str, str]]:
"""Look for an existing container labeled for this (task, profile).
Returns ``(container_id, state)`` on hit, ``None`` on miss / on any
failure (including ``docker ps`` itself failing). State is one of the
values Docker reports via ``{{.State}}`` e.g. ``running``, ``exited``,
``created``, ``paused``, ``restarting``, ``dead``. The caller decides
whether the state warrants ``docker start`` before reuse.
Restricted to the docker-stored label set this class creates; never
matches containers that happened to be named ``hermes-*`` but were
started by some other tool.
"""
try:
result = subprocess.run(
[
self._docker_exe, "ps", "-a",
"--filter", "label=hermes-agent=1",
"--filter", f"label=hermes-task-id={task_label}",
"--filter", f"label=hermes-profile={profile_label}",
"--format", "{{.ID}}\t{{.State}}",
],
capture_output=True,
text=True,
timeout=10,
check=False,
)
except (subprocess.TimeoutExpired, OSError) as e:
logger.debug("docker ps probe failed: %s — will start a fresh container", e)
return None
if result.returncode != 0:
logger.debug(
"docker ps probe returned %d: %s — will start a fresh container",
result.returncode, result.stderr.strip(),
)
return None
lines = [ln.strip() for ln in result.stdout.splitlines() if ln.strip()]
if not lines:
return None
# Multiple matches are unusual (one (task, profile) should produce one
# container) but can happen if a previous Hermes process crashed
# mid-cleanup. Prefer a running one if present; otherwise pick the
# first listed. Stale duplicates get reaped by the orphan-reaper in a
# follow-up commit; we don't try to be heroic about them here.
running = None
first = None
for ln in lines:
parts = ln.split("\t", 1)
if len(parts) != 2:
continue
cid, state = parts[0], parts[1].lower()
if first is None:
first = (cid, state)
if state == "running" and running is None:
running = (cid, state)
return running or first
def cleanup(self, *, force_remove: bool = False):
"""Tear down the container according to persist mode and *force_remove*.
Persist-mode (``persist_across_processes=True``, the default) leaves the
container **running** untouched. The docs promise "ONE long-lived
container shared across sessions" and stopping it on every Hermes exit
breaks that promise:
* Background processes inside the container (``npm run dev``, watchers,
long-running pytest) get killed every time the user runs ``/quit``.
* Every reuse requires ``docker start`` + waiting for the container to
come back up, adding 12s to the first tool call of the new session.
* The user-visible difference between "ONE long-lived container" and
"a new container that happens to share state" is exactly this:
processes survive in the former, die in the latter.
Resource reclamation for the persist-mode case lives in the
``reap_orphan_containers()`` path (see issue #20561 commit 3): if no
Hermes process touches a labeled container for ``2 × lifetime_seconds``
it gets ``docker rm -f``'d at the next Hermes startup. That covers the
SIGKILL / OOM / abandoned-laptop cases without us needing to stop the
container on every graceful exit.
Opt-out mode (``persist_across_processes=False``) still does
``docker stop`` + ``docker rm -f`` on every cleanup, matching the
pre-PR behavior for users who explicitly want per-process isolation.
``force_remove=True`` overrides persist mode and always tears the
container down (``docker stop`` + ``docker rm -f``). This is the
explicit-teardown path for ``/reset``, ``cleanup_vm(task_id)``-driven
resets, or any caller that wants a guaranteed fresh container on next
``DockerEnvironment(task_id=...)``. No current caller passes
``force_remove=True``; the parameter is here so the explicit-teardown
semantics can be wired up later without changing this method's
signature.
Cleanup runs on a daemon thread with bounded ``subprocess.run`` calls
(not the racy ``Popen(... &)`` pattern from before PR #33645). The
atexit hook in ``tools/terminal_tool.py`` waits up to 15s for the
thread to finish before the interpreter exits, so ``docker stop`` /
``docker rm`` actually completes when we do trigger it.
"""
container_id = self._container_id
if not container_id:
# Still drop the bind-mount dirs if any were allocated and we're
# NOT in persist mode (persist mode preserves them).
if not self._persistent:
# Also schedule removal (stop only leaves it as stopped)
try:
subprocess.Popen(
f"sleep 3 && {self._docker_exe} rm -f {self._container_id} >/dev/null 2>&1 &",
shell=True,
)
except Exception:
pass
self._container_id = None
for d in (self._workspace_dir, self._home_dir):
if d:
shutil.rmtree(d, ignore_errors=True)
return
if not self._persistent:
# Decide what to actually do. Three cases:
#
# force_remove=True → stop + rm (explicit teardown)
# persist_across_processes=True → no-op (leave container running)
# persist_across_processes=False → stop + rm (per-process isolation)
#
# The persist-mode no-op is the issue-#20561 contract: the container
# outlives Hermes processes, processes inside it stay alive, and
# reuse on next startup is instant.
if force_remove:
should_stop = True
should_remove = True
elif self._persist_across_processes:
# No-op for the container. Drop the in-process handle so a fresh
# __init__ will re-probe via labels (and find the running
# container) instead of trying to reuse a stale Python reference.
self._container_id = None
return
else:
should_stop = True
should_remove = True
# Capture state needed by the worker before we null out the attrs —
# the worker thread can outlive ``self``.
docker_exe = self._docker_exe
log_id = container_id[:12]
def _do_cleanup() -> None:
if should_stop:
try:
subprocess.run(
[docker_exe, "stop", "-t", "10", container_id],
capture_output=True, timeout=30,
)
except (subprocess.TimeoutExpired, OSError) as e:
logger.warning("docker stop %s timed out / failed: %s", log_id, e)
if should_remove:
try:
subprocess.run(
[docker_exe, "rm", "-f", container_id],
capture_output=True, timeout=30,
)
except (subprocess.TimeoutExpired, OSError) as e:
logger.warning("docker rm -f %s failed: %s", log_id, e)
# Daemon thread: doesn't block interpreter exit (atexit returns
# promptly), but unlike the old ``Popen(... &)`` shell trick the
# Python-level join semantics let the thread actually run to
# completion if the interpreter is still alive. atexit registers
# ``_atexit_cleanup`` in terminal_tool.py which waits up to ~60s for
# outstanding cleanups, so most exits complete the work cleanly.
import threading
t = threading.Thread(target=_do_cleanup, daemon=True, name=f"hermes-cleanup-{log_id}")
t.start()
self._cleanup_thread = t
self._container_id = None
# Bind-mount dir teardown only runs when we actually removed the
# container (the dirs are the container's filesystem state; keeping
# them around with no container would orphan the data on disk).
if should_remove and not self._persistent:
for d in (self._workspace_dir, self._home_dir):
if d:
shutil.rmtree(d, ignore_errors=True)
def wait_for_cleanup(self, timeout: float = 30.0) -> bool:
"""Block up to *timeout* seconds for the cleanup worker thread.
Returns ``True`` if the thread finished (or no thread was started),
``False`` on timeout. The atexit hook in terminal_tool.py calls this
on every active environment so docker stop/rm actually completes
before the Python process exits without this, ``hermes /quit``
races the interpreter shutdown and leaves stopped containers behind.
"""
thread = getattr(self, "_cleanup_thread", None)
if thread is None or not thread.is_alive():
return True
thread.join(timeout=timeout)
return not thread.is_alive()
+11
View File
@@ -422,6 +422,17 @@ def _resolve_stdio_command(command: str, env: dict) -> tuple[str, dict]:
candidates = [
os.path.join(hermes_home, "node", "bin", resolved_command),
os.path.join(os.path.expanduser("~"), ".local", "bin", resolved_command),
# /usr/local/bin is the canonical install location for Node on
# Linux from-source builds, the upstream node:bookworm-slim
# image (which the Hermes Docker image copies node + npm +
# corepack from since #4977), and macOS Homebrew on Intel.
# Without this candidate, any MCP server configured with an
# env.PATH that omits /usr/local/bin (a common pattern when
# users hand-author PATH for sandboxing) fails with ENOENT
# at execvp, and a naive symlink workaround into the user's
# PATH only fails one layer deeper because npx's shebang
# re-execs /usr/bin/env node which needs the same directory.
os.path.join(os.sep, "usr", "local", "bin", resolved_command),
]
for candidate in candidates:
if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
+143 -3
View File
@@ -861,6 +861,78 @@ _creation_locks_lock = threading.Lock() # Protects _creation_locks dict itself
_cleanup_thread = None
_cleanup_running = False
# Once-per-process guard for the docker orphan reaper (issue #20561).
# Set when _maybe_reap_docker_orphans first runs; concurrent _create_environment
# calls for parallel subagents won't re-trigger the sweep.
_docker_orphan_reaper_ran = False
_docker_orphan_reaper_lock = threading.Lock()
def _maybe_reap_docker_orphans(container_config: Dict[str, Any]) -> None:
"""Run the docker orphan reaper once per process, if enabled.
Sweeps long-Exited containers labeled ``hermes-agent=1`` for the current
profile that match the issue #20561 leak class — containers left behind
by Hermes processes that exited without firing ``atexit`` (SIGKILL,
OOM, terminal-window-close). The reaper is conservative by default:
only Exited containers older than ``2 × lifetime_seconds`` and scoped to
the current profile.
Gates:
* ``terminal.docker_orphan_reaper: false`` disables it entirely (the
operator opted out usually because they're running multiple
Hermes processes in the same profile and don't trust the
conservative defaults).
* ``_docker_orphan_reaper_ran`` flag sweep runs once per Python
interpreter, not on every subagent / RL-rollout / parallel
``terminal()`` call.
"""
global _docker_orphan_reaper_ran
if not container_config.get("docker_orphan_reaper", True):
return
# Cheap double-checked-locking: read without the lock, take the lock
# only on first run, recheck inside.
if _docker_orphan_reaper_ran:
return
with _docker_orphan_reaper_lock:
if _docker_orphan_reaper_ran:
return
_docker_orphan_reaper_ran = True
# 2 × lifetime_seconds gives sibling Hermes processes a generous grace
# window. Floor at 60s so an operator with TERMINAL_LIFETIME_SECONDS=0
# doesn't get an instant-reap that races their own setup.
# ``container_config`` only carries container_* keys, so read
# lifetime_seconds from the env var the rest of the module uses.
try:
lifetime = int(os.getenv("TERMINAL_LIFETIME_SECONDS", "300"))
except (TypeError, ValueError):
lifetime = 300
lifetime = max(60, lifetime)
max_age = lifetime * 2
try:
from tools.environments.docker import (
reap_orphan_containers, _get_active_profile_name,
)
except ImportError:
return
try:
profile = _get_active_profile_name()
removed = reap_orphan_containers(
max_age_seconds=max_age, profile_filter=profile,
)
if removed:
logger.info(
"Docker orphan reaper removed %d stale container(s) for profile %s",
removed, profile,
)
except Exception as e:
# Never fail the env-creation path because of a janitor problem.
logger.debug("Docker orphan reaper raised: %s", e)
# Per-task environment overrides registry.
# Allows environments (e.g., TerminalBench2Env) to specify a custom Docker/Modal
# image for a specific task_id BEFORE the agent loop starts. When the terminal or
@@ -1024,6 +1096,22 @@ def _get_env_config() -> Dict[str, Any]:
"docker_env": _parse_env_var("TERMINAL_DOCKER_ENV", "{}", json.loads, "valid JSON"),
"docker_run_as_host_user": os.getenv("TERMINAL_DOCKER_RUN_AS_HOST_USER", "false").lower() in {"true", "1", "yes"},
"docker_extra_args": _parse_env_var("TERMINAL_DOCKER_EXTRA_ARGS", "[]", json.loads, "valid JSON"),
# Cross-process container reuse (issue #20561). The docs claim
# "ONE long-lived container shared across sessions" — this toggle
# makes that real by probing for a labeled container at startup and
# attaching to it instead of always starting a fresh one. Set to
# ``false`` for hard per-process isolation (no reuse, container is
# removed on exit).
"docker_persist_across_processes": os.getenv(
"TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES", "true"
).lower() in {"true", "1", "yes"},
# Startup orphan reaper for hermes-tagged containers left behind by
# crashed / SIGKILL'd previous processes that bypassed atexit.
# Conservative: only sweeps Exited containers older than 2× the
# idle-reap window AND scoped to the current profile. Issue #20561.
"docker_orphan_reaper": os.getenv(
"TERMINAL_DOCKER_ORPHAN_REAPER", "true"
).lower() in {"true", "1", "yes"},
}
@@ -1072,6 +1160,13 @@ def _create_environment(env_type: str, image: str, cwd: str, timeout: int,
return _LocalEnvironment(cwd=cwd, timeout=timeout)
elif env_type == "docker":
# One-shot orphan reaper: clean up labeled containers left behind by
# prior Hermes processes that hit SIGKILL / OOM / a closed terminal
# before the atexit cleanup hook could run. Gated to once per
# process so concurrent _create_environment calls (parallel
# subagents, RL benchmarks) don't run the reaper N times.
# Disable via ``terminal.docker_orphan_reaper: false`` (issue #20561).
_maybe_reap_docker_orphans(cc)
return _DockerEnvironment(
image=image, cwd=cwd, timeout=timeout,
cpu=cpu, memory=memory, disk=disk,
@@ -1083,6 +1178,7 @@ def _create_environment(env_type: str, image: str, cwd: str, timeout: int,
env=docker_env,
run_as_host_user=cc.get("docker_run_as_host_user", False),
extra_args=docker_extra_args,
persist_across_processes=cc.get("docker_persist_across_processes", True),
)
elif env_type == "singularity":
@@ -1330,8 +1426,27 @@ def cleanup_all_environments():
return cleaned
def cleanup_vm(task_id: str):
"""Manually clean up a specific environment by task_id."""
def cleanup_vm(task_id: str, *, force_remove: bool = False):
"""Manually clean up a specific environment by task_id.
*force_remove* (default False) is forwarded to backends that accept it
currently only ``DockerEnvironment``. The default of False matches
session-lifecycle semantics: this function is called from
``AIAgent.close()`` (TUI session close, gateway session teardown) and the
per-turn cleanup branch for non-persistent envs, both of which should
honor the user's persist-mode preference. Stopping the container here
would defeat the "ONE long-lived container shared across sessions"
contract exactly the bug Ben reported when the container was killed
on every TUI session close.
Pass ``force_remove=True`` for actual user-initiated teardown
(e.g. ``/reset``-style flows that haven't been wired yet, or future
"destroy my sandbox" commands).
The idle reaper passes the env through ``env.cleanup()`` directly (not
via this function), so persist-mode idle envs are similarly no-op'd —
only the orphan reaper at next startup reclaims them.
"""
# Remove from tracking dicts while holding the lock, but defer the
# actual (potentially slow) env.cleanup() call to outside the lock
# so other tool calls aren't blocked.
@@ -1356,7 +1471,14 @@ def cleanup_vm(task_id: str):
try:
if hasattr(env, 'cleanup'):
env.cleanup()
# Pass force_remove only if the env's cleanup() accepts it
# (DockerEnvironment after issue #20561; other backends don't).
import inspect
sig = inspect.signature(env.cleanup)
if "force_remove" in sig.parameters:
env.cleanup(force_remove=force_remove)
else:
env.cleanup()
elif hasattr(env, 'stop'):
env.stop()
elif hasattr(env, 'terminate'):
@@ -1378,7 +1500,23 @@ def _atexit_cleanup():
if _active_environments:
count = len(_active_environments)
logger.info("Shutting down %d remaining sandbox(es)...", count)
# Snapshot the env objects BEFORE cleanup_all_environments empties
# the dict; we need them to wait on docker cleanup threads after the
# registry has been cleared.
envs_to_wait = list(_active_environments.values())
cleanup_all_environments()
# Block briefly so docker stop/rm actually completes before the
# interpreter exits. Issue #20561 — without this join, the daemon
# cleanup threads were getting torn down mid-`docker stop`, leaving
# Exited containers piled up on the host.
for env in envs_to_wait:
wait_fn = getattr(env, "wait_for_cleanup", None)
if wait_fn is None:
continue
try:
wait_fn(timeout=15.0)
except Exception as e: # never block shutdown on a bad backend
logger.debug("wait_for_cleanup raised on exit: %s", e)
atexit.register(_atexit_cleanup)
@@ -1746,6 +1884,8 @@ def terminal_tool(
"docker_env": config.get("docker_env", {}),
"docker_run_as_host_user": config.get("docker_run_as_host_user", False),
"docker_extra_args": config.get("docker_extra_args", []),
"docker_persist_across_processes": config.get("docker_persist_across_processes", True),
"docker_orphan_reaper": config.get("docker_orphan_reaper", True),
}
local_config = None
Generated
+1 -1
View File
@@ -1589,7 +1589,7 @@ wheels = [
[[package]]
name = "hermes-agent"
version = "0.15.0"
version = "0.15.1"
source = { editable = "." }
dependencies = [
{ name = "croniter" },
+2786 -6
View File
File diff suppressed because it is too large Load Diff
+1 -1
View File
@@ -10,7 +10,7 @@
"preview": "vite preview"
},
"dependencies": {
"@nous-research/ui": "0.16.0",
"@nous-research/ui": "0.18.2",
"@observablehq/plot": "^0.6.17",
"@react-three/fiber": "^9.6.0",
"@tailwindcss/vite": "^4.2.1",
+2 -2
View File
@@ -50,12 +50,12 @@ import {
import { Button } from "@nous-research/ui/ui/components/button";
import { SelectionSwitcher } from "@nous-research/ui/ui/components/selection-switcher";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { Typography } from "@/components/NouiTypography";
import { Typography } from "@nous-research/ui/ui/components/typography/index";
import { cn } from "@/lib/utils";
import { Backdrop } from "@/components/Backdrop";
import { SidebarFooter } from "@/components/SidebarFooter";
import { SidebarStatusStrip, gatewayLine } from "@/components/SidebarStatusStrip";
import { useBelowBreakpoint } from "@/hooks/useBelowBreakpoint";
import { useBelowBreakpoint } from "@nous-research/ui/hooks/use-below-breakpoint";
import { useSidebarStatus } from "@/hooks/useSidebarStatus";
import { AuthWidget } from "@/components/AuthWidget";
import { PageHeaderProvider } from "@/contexts/PageHeaderProvider";
+2 -2
View File
@@ -1,7 +1,7 @@
import { Select, SelectOption } from "@nous-research/ui/ui/components/select";
import { Switch } from "@nous-research/ui/ui/components/switch";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { Input } from "@nous-research/ui/ui/components/input";
import { Label } from "@nous-research/ui/ui/components/label";
function FieldHint({ schema, schemaKey }: { schema: Record<string, unknown>; schemaKey: string }) {
const keyPath = schemaKey.includes(".") ? schemaKey : "";
-225
View File
@@ -1,225 +0,0 @@
import {
type PointerEvent as ReactPointerEvent,
type ReactNode,
useEffect,
useRef,
useState,
} from "react";
import { createPortal } from "react-dom";
import { Typography } from "@/components/NouiTypography";
import { cn, themedBody } from "@/lib/utils";
const CLOSE_DRAG_MIN_PX = 72;
const CLOSE_DRAG_RATIO = 0.18;
const SHEET_TRANSITION_MS = 280;
/**
* Mobile-first picker shell: fixed backdrop + bottom sheet, portaled to `body`
* so nested overflow/transform in the sidebar cannot clip menus (theme /
* language switchers). Open/close uses slide + fade; teardown is delayed until
* the exit animation finishes so animations can complete.
*
* Drag the header/handle downward to dismiss (skipped when reduced motion is on).
*/
export function BottomPickSheet({
backdropDismissLabel = "Dismiss",
children,
onClose,
open,
title,
}: BottomPickSheetProps) {
const [renderPortal, setRenderPortal] = useState(open);
const [entered, setEntered] = useState(false);
const [dragOffsetPx, setDragOffsetPx] = useState(0);
const [dragActive, setDragActive] = useState(false);
const closeTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);
const sheetRef = useRef<HTMLDivElement>(null);
const dragTrackingRef = useRef(false);
const dragStartYRef = useRef(0);
const dragOffsetRef = useRef(0);
const reducedMotion =
typeof window !== "undefined" &&
window.matchMedia("(prefers-reduced-motion: reduce)").matches;
const syncDragPx = (next: number) => {
dragOffsetRef.current = next;
setDragOffsetPx(next);
};
useEffect(() => {
if (closeTimerRef.current) {
clearTimeout(closeTimerRef.current);
closeTimerRef.current = null;
}
const ms = reducedMotion ? 0 : SHEET_TRANSITION_MS;
let openRafId = 0;
let exitRafId = 0;
if (open) {
openRafId = requestAnimationFrame(() => {
dragTrackingRef.current = false;
dragOffsetRef.current = 0;
setDragActive(false);
setDragOffsetPx(0);
setRenderPortal(true);
requestAnimationFrame(() => {
requestAnimationFrame(() => setEntered(true));
});
});
} else {
exitRafId = requestAnimationFrame(() => {
dragTrackingRef.current = false;
setDragActive(false);
setEntered(false);
closeTimerRef.current = window.setTimeout(() => {
dragOffsetRef.current = 0;
setDragOffsetPx(0);
setRenderPortal(false);
closeTimerRef.current = null;
}, ms);
});
}
return () => {
cancelAnimationFrame(openRafId);
cancelAnimationFrame(exitRafId);
if (closeTimerRef.current) {
clearTimeout(closeTimerRef.current);
closeTimerRef.current = null;
}
};
}, [open, reducedMotion]);
useEffect(() => {
if (!renderPortal) return;
const prev = document.body.style.overflow;
document.body.style.overflow = "hidden";
return () => {
document.body.style.overflow = prev;
};
}, [renderPortal]);
if (!renderPortal || typeof document === "undefined") return null;
const durationClass = reducedMotion ? "duration-0" : "duration-[280ms]";
const draggingVisual = dragActive || dragOffsetPx > 0;
const onDragPointerDown = (e: ReactPointerEvent<HTMLDivElement>) => {
if (reducedMotion || !entered) return;
if (e.pointerType === "mouse" && e.button !== 0) return;
dragTrackingRef.current = true;
setDragActive(true);
dragStartYRef.current = e.clientY;
syncDragPx(0);
e.currentTarget.setPointerCapture(e.pointerId);
};
const onDragPointerMove = (e: ReactPointerEvent<HTMLDivElement>) => {
if (!dragTrackingRef.current) return;
const dy = e.clientY - dragStartYRef.current;
const next = Math.max(0, dy);
const sheetH = sheetRef.current?.offsetHeight ?? 560;
syncDragPx(Math.min(next, sheetH));
};
const endDrag = (e: ReactPointerEvent<HTMLDivElement>) => {
if (!dragTrackingRef.current) return;
dragTrackingRef.current = false;
setDragActive(false);
try {
e.currentTarget.releasePointerCapture(e.pointerId);
} catch {
/* already released */
}
const sheetH = sheetRef.current?.offsetHeight ?? 560;
const threshold = Math.max(CLOSE_DRAG_MIN_PX, sheetH * CLOSE_DRAG_RATIO);
const d = dragOffsetRef.current;
if (d >= threshold) {
onClose();
return;
}
syncDragPx(0);
};
return createPortal(
<div className="fixed inset-0 z-[200] flex flex-col justify-end">
<button
type="button"
aria-label={backdropDismissLabel}
className={cn(
"absolute inset-0 bg-black/55 backdrop-blur-[2px]",
"transition-opacity ease-out motion-reduce:transition-none",
durationClass,
entered ? "opacity-100" : "opacity-0",
)}
onClick={onClose}
/>
<div
aria-label={title}
aria-modal="true"
ref={sheetRef}
className={cn(
themedBody,
"relative flex max-h-[85dvh] min-h-0 flex-col rounded-t-xl border border-current/20",
"bg-background-base/98 pb-[max(1rem,env(safe-area-inset-bottom))]",
"shadow-[0_-12px_40px_-8px_rgba(0,0,0,0.55)] backdrop-blur-md",
"ease-out motion-reduce:transition-none transform-gpu",
draggingVisual ? "transition-none" : cn("transition-transform", durationClass),
entered ? "translate-y-0" : "translate-y-full",
)}
role="dialog"
style={
entered && dragOffsetPx > 0
? { transform: `translateY(${dragOffsetPx}px)` }
: undefined
}
>
<div
className={cn(
"flex shrink-0 flex-col gap-2 border-b border-current/15 px-4 pb-3 pt-2",
"touch-none select-none",
reducedMotion ? "cursor-default" : "cursor-grab active:cursor-grabbing",
)}
onPointerCancel={endDrag}
onPointerDown={onDragPointerDown}
onPointerMove={onDragPointerMove}
onPointerUp={endDrag}
>
<div
aria-hidden
className="mx-auto h-1 w-10 shrink-0 rounded-full bg-current/20"
/>
<Typography
mondwest
className="text-display text-xs tracking-[0.12em] text-text-tertiary"
>
{title}
</Typography>
</div>
<div className="min-h-0 flex-1 overflow-y-auto overscroll-contain">
{children}
</div>
</div>
</div>,
document.body,
);
}
interface BottomPickSheetProps {
backdropDismissLabel?: string;
children: ReactNode;
onClose: () => void;
open: boolean;
title: string;
}
+1 -1
View File
@@ -25,7 +25,7 @@
import { Button } from "@nous-research/ui/ui/components/button";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { Card } from "@/components/ui/card";
import { Card } from "@nous-research/ui/ui/components/card";
import { ModelPickerDialog } from "@/components/ModelPickerDialog";
import { ToolCall, type ToolEntry } from "@/components/ToolCall";
+1 -1
View File
@@ -1,4 +1,4 @@
import { ConfirmDialog } from "@/components/ui/confirm-dialog";
import { ConfirmDialog } from "@nous-research/ui/ui/components/confirm-dialog";
import { useI18n } from "@/i18n";
export function DeleteConfirmDialog({
+5 -5
View File
@@ -2,9 +2,9 @@ import { useState, useRef, useEffect } from "react";
import { createPortal } from "react-dom";
import { Check } from "lucide-react";
import { Button } from "@nous-research/ui/ui/components/button";
import { BottomPickSheet } from "@/components/BottomPickSheet";
import { Typography } from "@/components/NouiTypography";
import { useBelowBreakpoint } from "@/hooks/useBelowBreakpoint";
import { BottomSheet } from "@nous-research/ui/ui/components/bottom-sheet";
import { Typography } from "@nous-research/ui/ui/components/typography/index";
import { useBelowBreakpoint } from "@nous-research/ui/hooks/use-below-breakpoint";
import { useI18n } from "@/i18n/context";
import { LOCALE_META } from "@/i18n";
import type { Locale } from "@/i18n";
@@ -87,7 +87,7 @@ export function LanguageSwitcher({ collapsed = false, dropUp = false }: Language
</Button>
{useMobileSheet && (
<BottomPickSheet
<BottomSheet
backdropDismissLabel={t.common.close}
onClose={() => setOpen(false)}
open={open}
@@ -101,7 +101,7 @@ export function LanguageSwitcher({ collapsed = false, dropUp = false }: Language
setOpen={setOpen}
/>
</div>
</BottomPickSheet>
</BottomSheet>
)}
{open && !useMobileSheet && (() => {
+2 -2
View File
@@ -2,8 +2,8 @@ import { Button } from "@nous-research/ui/ui/components/button";
import { Checkbox } from "@nous-research/ui/ui/components/checkbox";
import { ListItem } from "@nous-research/ui/ui/components/list-item";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { Label } from "@/components/ui/label";
import { Input } from "@/components/ui/input";
import { Input } from "@nous-research/ui/ui/components/input";
import { Label } from "@nous-research/ui/ui/components/label";
import type { GatewayClient } from "@/lib/gatewayClient";
import { Check, Search, X } from "lucide-react";
import { useEffect, useMemo, useRef, useState } from "react";
-63
View File
@@ -1,63 +0,0 @@
import { forwardRef, type ElementType, type HTMLAttributes, type ReactNode } from "react";
import { cn } from "@/lib/utils";
type TypographyProps = HTMLAttributes<HTMLElement> & {
as?: ElementType;
children?: ReactNode;
compressed?: boolean;
courier?: boolean;
expanded?: boolean;
mondwest?: boolean;
mono?: boolean;
sans?: boolean;
variant?: "sm" | "md" | "lg" | "xl";
};
const variantClasses: Record<NonNullable<TypographyProps["variant"]>, string> = {
sm: "leading-[1.4] text-[.9375rem] tracking-[0.1875rem]",
md: "text-[2.625rem] leading-[1] tracking-[0.0525rem]",
lg: "text-[2.625rem] leading-[1] tracking-[0.0525rem]",
xl: "text-[4.5rem] leading-[1] tracking-[0.135rem]",
};
export const Typography = forwardRef<HTMLElement, TypographyProps>(function Typography(
{
as: Component = "span",
className,
compressed,
courier,
expanded,
mondwest,
mono,
sans,
variant,
...props
},
ref,
) {
const hasFontVariant = compressed || courier || expanded || mondwest || mono || sans;
return (
<Component
className={cn(
compressed && "font-compressed",
courier && "font-courier",
expanded && "font-expanded",
mondwest && "font-mondwest tracking-[0.1875rem]",
mono && "font-mono",
(!hasFontVariant || sans) && "font-sans",
variant && variantClasses[variant],
className,
)}
ref={ref}
{...props}
/>
);
});
export const H2 = forwardRef<HTMLHeadingElement, Omit<TypographyProps, "as">>(function H2(
{ className, variant = "lg", ...props },
ref,
) {
return <Typography as="h2" className={cn("font-bold", className)} variant={variant} ref={ref} {...props} />;
});
+2 -2
View File
@@ -3,9 +3,9 @@ import { ExternalLink, X, Check } from "lucide-react";
import { Button } from "@nous-research/ui/ui/components/button";
import { CopyButton } from "@nous-research/ui/ui/components/command-block";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { H2 } from "@/components/NouiTypography";
import { H2 } from "@nous-research/ui/ui/components/typography/h2";
import { api, type OAuthProvider, type OAuthStartResponse } from "@/lib/api";
import { Input } from "@/components/ui/input";
import { Input } from "@nous-research/ui/ui/components/input";
import { useI18n } from "@/i18n";
import { cn, themedBody } from "@/lib/utils";
+2 -2
View File
@@ -16,9 +16,9 @@ import {
CardDescription,
CardHeader,
CardTitle,
} from "@/components/ui/card";
} from "@nous-research/ui/ui/components/card";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { ConfirmDialog } from "@/components/ui/confirm-dialog";
import { ConfirmDialog } from "@nous-research/ui/ui/components/confirm-dialog";
import { OAuthLoginModal } from "@/components/OAuthLoginModal";
import { useI18n } from "@/i18n";
+1 -1
View File
@@ -2,7 +2,7 @@ import { AlertTriangle, Radio, Wifi, WifiOff } from "lucide-react";
import type { PlatformStatus } from "@/lib/api";
import { isoTimeAgo } from "@/lib/utils";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
import { useI18n } from "@/i18n";
export function PlatformsCard({ platforms }: PlatformsCardProps) {
+1 -1
View File
@@ -1,4 +1,4 @@
import { Typography } from "@/components/NouiTypography";
import { Typography } from "@nous-research/ui/ui/components/typography/index";
import type { StatusResponse } from "@/lib/api";
import { cn } from "@/lib/utils";
import { useI18n } from "@/i18n";
+5 -5
View File
@@ -3,9 +3,9 @@ import { createPortal } from "react-dom";
import { Palette, Check } from "lucide-react";
import { Button } from "@nous-research/ui/ui/components/button";
import { ListItem } from "@nous-research/ui/ui/components/list-item";
import { BottomPickSheet } from "@/components/BottomPickSheet";
import { Typography } from "@/components/NouiTypography";
import { useBelowBreakpoint } from "@/hooks/useBelowBreakpoint";
import { BottomSheet } from "@nous-research/ui/ui/components/bottom-sheet";
import { Typography } from "@nous-research/ui/ui/components/typography/index";
import { useBelowBreakpoint } from "@nous-research/ui/hooks/use-below-breakpoint";
import { BUILTIN_THEMES, useTheme } from "@/themes";
import type { DashboardTheme, ThemeListEntry } from "@/themes";
import { useI18n } from "@/i18n";
@@ -91,7 +91,7 @@ export function ThemeSwitcher({ collapsed = false, dropUp = false }: ThemeSwitch
</Button>
{useMobileSheet && (
<BottomPickSheet
<BottomSheet
backdropDismissLabel={t.common.close}
onClose={close}
open={open}
@@ -105,7 +105,7 @@ export function ThemeSwitcher({ collapsed = false, dropUp = false }: ThemeSwitch
themeName={themeName}
/>
</div>
</BottomPickSheet>
</BottomSheet>
)}
{open && !useMobileSheet && (() => {
-40
View File
@@ -1,40 +0,0 @@
import { useEffect, useState } from "react";
import { createPortal } from "react-dom";
export function Toast({ toast }: { toast: { message: string; type: "success" | "error" } | null }) {
const [visible, setVisible] = useState(false);
const [current, setCurrent] = useState(toast);
useEffect(() => {
if (toast) {
setCurrent(toast);
setVisible(true);
} else {
setVisible(false);
const timer = setTimeout(() => setCurrent(null), 200);
return () => clearTimeout(timer);
}
}, [toast]);
if (!current) return null;
// Portal to document.body so the toast escapes any ancestor stacking context
// (e.g. <main> has `relative z-2`, which would trap z-50 below the header's z-40).
return createPortal(
<div
role="status"
aria-live="polite"
className={`fixed top-16 right-4 z-50 border px-4 py-2.5 font-courier text-xs tracking-wider uppercase backdrop-blur-sm ${
current.type === "success"
? "bg-success/15 text-success border-success/30"
: "bg-destructive/15 text-destructive border-destructive/30"
}`}
style={{
animation: visible ? "toast-in 200ms ease-out forwards" : "toast-out 200ms ease-in forwards",
}}
>
{current.message}
</div>,
document.body,
);
}
-63
View File
@@ -1,63 +0,0 @@
import { cn, themedBody } from "@/lib/utils";
/**
* Themed card primitive. Themes can restyle every card without touching
* call sites by setting CSS vars under the `card` component-style bucket:
*
* componentStyles:
* card:
* clipPath: "polygon(10px 0, 100% 0, 100% calc(100% - 10px), calc(100% - 10px) 100%, 0 100%, 0 10px)"
* border: "1px solid var(--color-ring)"
* background: "linear-gradient(180deg, var(--color-card) 0%, transparent 100%)"
* boxShadow: "0 0 0 1px var(--color-ring) inset, 0 0 24px -8px var(--warm-glow)"
*
* All properties are optional vars that aren't set compute to their
* CSS initial value, so the default shadcn-y card keeps looking normal
* for themes that don't override anything.
*/
const CARD_STYLE: React.CSSProperties = {
clipPath: "var(--component-card-clip-path)",
borderImage: "var(--component-card-border-image)",
background: "var(--component-card-background)",
boxShadow: "var(--component-card-box-shadow)",
};
export function Card({ className, style, ...props }: React.HTMLAttributes<HTMLDivElement>) {
return (
<div
className={cn(
"border border-border bg-card/80 text-card-foreground w-full",
themedBody,
className,
)}
style={{ ...CARD_STYLE, ...style }}
{...props}
/>
);
}
export function CardHeader({ className, ...props }: React.HTMLAttributes<HTMLDivElement>) {
return <div className={cn("flex flex-col gap-1.5 p-4 border-b border-border", className)} {...props} />;
}
export function CardTitle({ className, ...props }: React.HTMLAttributes<HTMLHeadingElement>) {
return (
<h3
className={cn(
"font-mondwest text-display text-sm tracking-[0.12em] text-text-primary",
className,
)}
{...props}
/>
);
}
export function CardDescription({ className, ...props }: React.HTMLAttributes<HTMLParagraphElement>) {
return (
<p className={cn("font-mondwest normal-case text-xs text-muted-foreground", className)} {...props} />
);
}
export function CardContent({ className, ...props }: React.HTMLAttributes<HTMLDivElement>) {
return <div className={cn("p-4", className)} {...props} />;
}
-137
View File
@@ -1,137 +0,0 @@
import { useEffect, useRef } from "react";
import { createPortal } from "react-dom";
import { AlertTriangle } from "lucide-react";
import { Button } from "@nous-research/ui/ui/components/button";
import { cn, themedBody } from "@/lib/utils";
export function ConfirmDialog({
cancelLabel = "Cancel",
confirmLabel = "Confirm",
description,
destructive = false,
loading = false,
onCancel,
onConfirm,
open,
title,
}: ConfirmDialogProps) {
const dialogRef = useRef<HTMLDivElement>(null);
// Focus the confirm button when opened; trap ESC to cancel.
useEffect(() => {
if (!open) return;
const prevActive = document.activeElement as HTMLElement | null;
dialogRef.current
?.querySelector<HTMLButtonElement>("[data-confirm]")
?.focus();
const onKey = (e: KeyboardEvent) => {
if (e.key === "Escape") {
e.preventDefault();
onCancel();
}
};
document.addEventListener("keydown", onKey);
const prevOverflow = document.body.style.overflow;
document.body.style.overflow = "hidden";
return () => {
document.removeEventListener("keydown", onKey);
document.body.style.overflow = prevOverflow;
prevActive?.focus?.();
};
}, [open, onCancel]);
if (!open) return null;
return createPortal(
<div
role="dialog"
aria-modal="true"
aria-labelledby="confirm-dialog-title"
aria-describedby={description ? "confirm-dialog-desc" : undefined}
onClick={(e) => {
if (e.target === e.currentTarget) onCancel();
}}
className={cn(
"fixed inset-0 z-50 flex items-center justify-center",
"bg-black/60 backdrop-blur-sm",
"animate-[fade-in_150ms_ease-out]",
)}
>
<div
ref={dialogRef}
className={cn(
themedBody,
"relative w-full max-w-md mx-4",
"border border-border bg-card shadow-lg",
"animate-[dialog-in_180ms_ease-out]",
)}
>
<div className="flex items-start gap-3 p-4 border-b border-border">
{destructive && (
<div
aria-hidden
className="mt-0.5 shrink-0 text-destructive"
>
<AlertTriangle className="h-4 w-4" />
</div>
)}
<div className="flex-1 min-w-0 flex flex-col gap-1">
<h2
id="confirm-dialog-title"
className="font-mondwest text-display text-sm font-bold tracking-[0.12em] blend-lighter"
>
{title}
</h2>
{description && (
<p
id="confirm-dialog-desc"
className="font-mondwest normal-case text-xs text-muted-foreground leading-relaxed"
>
{description}
</p>
)}
</div>
</div>
<div className="flex items-center justify-end gap-2 p-3">
<Button
type="button"
outlined
onClick={onCancel}
disabled={loading}
>
{cancelLabel}
</Button>
<Button
data-confirm
type="button"
destructive={destructive}
onClick={onConfirm}
disabled={loading}
>
{loading ? "…" : confirmLabel}
</Button>
</div>
</div>
</div>,
document.body,
);
}
interface ConfirmDialogProps {
cancelLabel?: string;
confirmLabel?: string;
description?: string;
destructive?: boolean;
loading?: boolean;
onCancel: () => void;
onConfirm: () => void;
open: boolean;
title: string;
}
-16
View File
@@ -1,16 +0,0 @@
import { cn } from "@/lib/utils";
export function Input({ className, ...props }: React.InputHTMLAttributes<HTMLInputElement>) {
return (
<input
className={cn(
"flex h-9 w-full border border-border bg-background/40 px-3 py-1 font-courier text-sm transition-colors",
"placeholder:text-muted-foreground",
"focus-visible:outline-none focus-visible:ring-1 focus-visible:ring-foreground/30 focus-visible:border-foreground/25",
"disabled:cursor-not-allowed disabled:opacity-50",
className,
)}
{...props}
/>
);
}
-13
View File
@@ -1,13 +0,0 @@
import { cn } from "@/lib/utils";
export function Label({ className, ...props }: React.LabelHTMLAttributes<HTMLLabelElement>) {
return (
<label
className={cn(
"font-mondwest text-xs tracking-[0.1em] uppercase leading-none peer-disabled:cursor-not-allowed peer-disabled:opacity-70",
className,
)}
{...props}
/>
);
}
-19
View File
@@ -1,19 +0,0 @@
import { cn } from "@/lib/utils";
export function Separator({
className,
orientation = "horizontal",
...props
}: React.HTMLAttributes<HTMLDivElement> & { orientation?: "horizontal" | "vertical" }) {
return (
<div
role="separator"
className={cn(
"shrink-0 bg-border",
orientation === "horizontal" ? "h-px w-full" : "h-full w-px",
className,
)}
{...props}
/>
);
}
+1 -1
View File
@@ -1,7 +1,7 @@
import { useCallback, useEffect, useState } from "react";
import { api } from "@/lib/api";
import type { ActionStatusResponse } from "@/lib/api";
import { Toast } from "@/components/Toast";
import { Toast } from "@nous-research/ui/ui/components/toast";
import { useI18n } from "@/i18n";
import {
SystemActionsContext,
-19
View File
@@ -1,19 +0,0 @@
import { useEffect, useState } from "react";
/** True when viewport width is strictly below `px` (matches Tailwind `min-width: px`). */
export function useBelowBreakpoint(px: number) {
const query = `(max-width: ${px - 1}px)`;
const [matches, setMatches] = useState(() =>
typeof window !== "undefined" ? window.matchMedia(query).matches : false,
);
useEffect(() => {
const mql = window.matchMedia(query);
const sync = () => setMatches(mql.matches);
sync();
mql.addEventListener("change", sync);
return () => mql.removeEventListener("change", sync);
}, [query]);
return matches;
}
-41
View File
@@ -1,41 +0,0 @@
import { useCallback, useState } from "react";
export function useConfirmDelete<TId>({
onDelete,
}: {
onDelete: (id: TId) => Promise<void>;
}) {
const [pendingId, setPendingId] = useState<TId | null>(null);
const [isDeleting, setIsDeleting] = useState(false);
const requestDelete = useCallback((id: TId) => {
setPendingId(id);
}, []);
const cancel = useCallback(() => {
if (!isDeleting) setPendingId(null);
}, [isDeleting]);
const confirm = useCallback(async () => {
if (pendingId === null) return;
const id = pendingId;
setIsDeleting(true);
try {
await onDelete(id);
setPendingId(null);
} catch {
// Dialog stays open; caller can surface errors in onDelete before rethrowing
} finally {
setIsDeleting(false);
}
}, [pendingId, onDelete]);
return {
cancel,
confirm,
isDeleting,
isOpen: pendingId !== null,
pendingId,
requestDelete,
} as const;
}
-15
View File
@@ -1,15 +0,0 @@
import { useCallback, useState } from "react";
export function useToast(duration = 3000) {
const [toast, setToast] = useState<{ message: string; type: "success" | "error" } | null>(null);
const showToast = useCallback(
(message: string, type: "success" | "error") => {
setToast({ message, type });
setTimeout(() => setToast(null), duration);
},
[duration],
);
return { toast, showToast };
}
+27 -3
View File
@@ -41,7 +41,11 @@ function setSessionHeader(headers: Headers, token: string): void {
}
}
export async function fetchJSON<T>(url: string, init?: RequestInit): Promise<T> {
export async function fetchJSON<T>(
url: string,
init?: RequestInit,
options?: FetchJSONOptions,
): Promise<T> {
// Inject the session token into all /api/ requests.
const headers = new Headers(init?.headers);
const token = window.__HERMES_SESSION_TOKEN__;
@@ -100,7 +104,7 @@ export async function fetchJSON<T>(url: string, init?: RequestInit): Promise<T>
// that reload once on the first stale-token 401 — gated mode is
// handled above, so reaching here in gated mode means a real
// middleware failure that should not reload-loop.
if (!window.__HERMES_AUTH_REQUIRED__) {
if (!window.__HERMES_AUTH_REQUIRED__ && !options?.allowUnauthorized) {
let alreadyReloaded = false;
try {
alreadyReloaded =
@@ -198,8 +202,19 @@ export const api = {
* still exists but is never useful there (no Session, no cookie). The
* AuthWidget component swallows 401s from this call: if the gate isn't
* engaged, /api/auth/me returns 401 and the widget renders nothing.
*
* ``allowUnauthorized`` is load-bearing: in loopback mode this endpoint
* 401s by design, and fetchJSON's default loopback behaviour treats a
* 401 as a rotated session token and full-page-reloads to pick up a
* fresh one. Because every *other* dashboard request succeeds (and so
* clears the one-shot reload guard), that turns this expected 401 into
* an infinite reload loop. Opting out keeps the 401 a plain throw the
* widget can catch.
*/
getAuthMe: () => fetchJSON<AuthMeResponse>("/api/auth/me"),
getAuthMe: () =>
fetchJSON<AuthMeResponse>("/api/auth/me", undefined, {
allowUnauthorized: true,
}),
logout: () =>
fetch(`${BASE}/auth/logout`, {
method: "POST",
@@ -514,6 +529,15 @@ export interface ActionResponse {
pid: number;
}
/** Per-call overrides for {@link fetchJSON}. */
interface FetchJSONOptions {
/** When true, a 401 response is surfaced as a normal thrown error rather
* than triggering the loopback stale-token page reload. Use for probes
* whose 401 is an expected signal (e.g. /api/auth/me in non-gated mode)
* rather than evidence of a rotated session token. */
allowUnauthorized?: boolean;
}
export interface ActionStatusResponse {
exit_code: number | null;
lines: string[];
+1 -1
View File
@@ -20,7 +20,7 @@ import { timeAgo } from "@/lib/utils";
import { Button } from "@nous-research/ui/ui/components/button";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { Stats } from "@nous-research/ui/ui/components/stats";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { usePageHeader } from "@/contexts/usePageHeader";
import { useI18n } from "@/i18n";
+1 -1
View File
@@ -23,7 +23,7 @@ import { WebglAddon } from "@xterm/addon-webgl";
import { Terminal } from "@xterm/xterm";
import "@xterm/xterm/css/xterm.css";
import { Button } from "@nous-research/ui/ui/components/button";
import { Typography } from "@/components/NouiTypography";
import { Typography } from "@nous-research/ui/ui/components/typography/index";
import { HERMES_BASE_PATH, buildWsAuthParam } from "@/lib/api";
import { cn } from "@/lib/utils";
import { Copy, PanelRight, X } from "lucide-react";
+5 -5
View File
@@ -38,15 +38,15 @@ import {
} from "lucide-react";
import { api } from "@/lib/api";
import { getNestedValue, setNestedValue } from "@/lib/nested";
import { useToast } from "@/hooks/useToast";
import { Toast } from "@/components/Toast";
import { useToast } from "@nous-research/ui/hooks/use-toast";
import { Toast } from "@nous-research/ui/ui/components/toast";
import { AutoField } from "@/components/AutoField";
import { Button } from "@nous-research/ui/ui/components/button";
import { ListItem } from "@nous-research/ui/ui/components/list-item";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import { ConfirmDialog } from "@/components/ui/confirm-dialog";
import { Input } from "@/components/ui/input";
import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
import { ConfirmDialog } from "@nous-research/ui/ui/components/confirm-dialog";
import { Input } from "@nous-research/ui/ui/components/input";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { useI18n } from "@/i18n";
import { usePageHeader } from "@/contexts/usePageHeader";
+7 -7
View File
@@ -4,17 +4,17 @@ import { Badge } from "@nous-research/ui/ui/components/badge";
import { Button } from "@nous-research/ui/ui/components/button";
import { Select, SelectOption } from "@nous-research/ui/ui/components/select";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { H2 } from "@/components/NouiTypography";
import { H2 } from "@nous-research/ui/ui/components/typography/h2";
import { api } from "@/lib/api";
import type { CronJob, ProfileInfo } from "@/lib/api";
import { DeleteConfirmDialog } from "@/components/DeleteConfirmDialog";
import { useToast } from "@/hooks/useToast";
import { useConfirmDelete } from "@/hooks/useConfirmDelete";
import { useToast } from "@nous-research/ui/hooks/use-toast";
import { useConfirmDelete } from "@nous-research/ui/hooks/use-confirm-delete";
import { useModalBehavior } from "@/hooks/useModalBehavior";
import { Toast } from "@/components/Toast";
import { Card, CardContent } from "@/components/ui/card";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { Toast } from "@nous-research/ui/ui/components/toast";
import { Card, CardContent } from "@nous-research/ui/ui/components/card";
import { Input } from "@nous-research/ui/ui/components/input";
import { Label } from "@nous-research/ui/ui/components/label";
import { useI18n } from "@/i18n";
import { usePageHeader } from "@/contexts/usePageHeader";
import { PluginSlot } from "@/plugins";
+6 -6
View File
@@ -17,9 +17,9 @@ import {
import { api } from "@/lib/api";
import type { EnvVarInfo } from "@/lib/api";
import { DeleteConfirmDialog } from "@/components/DeleteConfirmDialog";
import { Toast } from "@/components/Toast";
import { useConfirmDelete } from "@/hooks/useConfirmDelete";
import { useToast } from "@/hooks/useToast";
import { Toast } from "@nous-research/ui/ui/components/toast";
import { useConfirmDelete } from "@nous-research/ui/hooks/use-confirm-delete";
import { useToast } from "@nous-research/ui/hooks/use-toast";
import { OAuthProvidersCard } from "@/components/OAuthProvidersCard";
import { Button } from "@nous-research/ui/ui/components/button";
import { ListItem } from "@nous-research/ui/ui/components/list-item";
@@ -30,10 +30,10 @@ import {
CardDescription,
CardHeader,
CardTitle,
} from "@/components/ui/card";
} from "@nous-research/ui/ui/components/card";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { Input } from "@nous-research/ui/ui/components/input";
import { Label } from "@nous-research/ui/ui/components/label";
import { useI18n } from "@/i18n";
import { usePageHeader } from "@/contexts/usePageHeader";
import { PluginSlot } from "@/plugins";
+2 -2
View File
@@ -12,8 +12,8 @@ import { Button } from "@nous-research/ui/ui/components/button";
import { FilterGroup, Segmented } from "@nous-research/ui/ui/components/segmented";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { Switch } from "@nous-research/ui/ui/components/switch";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import { Label } from "@/components/ui/label";
import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
import { Label } from "@nous-research/ui/ui/components/label";
import { useI18n } from "@/i18n";
import { usePageHeader } from "@/contexts/usePageHeader";
import { PluginSlot } from "@/plugins";
+2 -2
View File
@@ -24,9 +24,9 @@ import { formatTokenCount } from "@/lib/format";
import { Button } from "@nous-research/ui/ui/components/button";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { Stats } from "@nous-research/ui/ui/components/stats";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { ConfirmDialog } from "@/components/ui/confirm-dialog";
import { ConfirmDialog } from "@nous-research/ui/ui/components/confirm-dialog";
import { useModalBehavior } from "@/hooks/useModalBehavior";
import { usePageHeader } from "@/contexts/usePageHeader";
import { useI18n } from "@/i18n";
+6 -6
View File
@@ -10,12 +10,12 @@ import { Select, SelectOption } from "@nous-research/ui/ui/components/select";
import { Switch } from "@nous-research/ui/ui/components/switch";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { CommandBlock } from "@nous-research/ui/ui/components/command-block";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import { ConfirmDialog } from "@/components/ui/confirm-dialog";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { useToast } from "@/hooks/useToast";
import { Toast } from "@/components/Toast";
import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
import { ConfirmDialog } from "@nous-research/ui/ui/components/confirm-dialog";
import { Input } from "@nous-research/ui/ui/components/input";
import { Label } from "@nous-research/ui/ui/components/label";
import { useToast } from "@nous-research/ui/hooks/use-toast";
import { Toast } from "@nous-research/ui/ui/components/toast";
import { useI18n } from "@/i18n";
import { PluginSlot } from "@/plugins";
import { cn } from "@/lib/utils";
+7 -7
View File
@@ -14,19 +14,19 @@ import {
X,
} from "lucide-react";
import spinners from "unicode-animations";
import { H2 } from "@/components/NouiTypography";
import { H2 } from "@nous-research/ui/ui/components/typography/h2";
import { api } from "@/lib/api";
import type { ProfileInfo } from "@/lib/api";
import { DeleteConfirmDialog } from "@/components/DeleteConfirmDialog";
import { useToast } from "@/hooks/useToast";
import { useConfirmDelete } from "@/hooks/useConfirmDelete";
import { useToast } from "@nous-research/ui/hooks/use-toast";
import { useConfirmDelete } from "@nous-research/ui/hooks/use-confirm-delete";
import { useModalBehavior } from "@/hooks/useModalBehavior";
import { Toast } from "@/components/Toast";
import { Card, CardContent } from "@/components/ui/card";
import { Toast } from "@nous-research/ui/ui/components/toast";
import { Card, CardContent } from "@nous-research/ui/ui/components/card";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { Button } from "@nous-research/ui/ui/components/button";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { Input } from "@nous-research/ui/ui/components/input";
import { Label } from "@nous-research/ui/ui/components/label";
import { Checkbox } from "@nous-research/ui/ui/components/checkbox";
import { useI18n } from "@/i18n";
import { usePageHeader } from "@/contexts/usePageHeader";
+5 -5
View File
@@ -34,18 +34,18 @@ import type {
import { timeAgo } from "@/lib/utils";
import { Markdown } from "@/components/Markdown";
import { PlatformsCard } from "@/components/PlatformsCard";
import { Toast } from "@/components/Toast";
import { Toast } from "@nous-research/ui/ui/components/toast";
import { Button } from "@nous-research/ui/ui/components/button";
import { ListItem } from "@nous-research/ui/ui/components/list-item";
import { Segmented } from "@nous-research/ui/ui/components/segmented";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
import { DeleteConfirmDialog } from "@/components/DeleteConfirmDialog";
import { useConfirmDelete } from "@/hooks/useConfirmDelete";
import { Input } from "@/components/ui/input";
import { useConfirmDelete } from "@nous-research/ui/hooks/use-confirm-delete";
import { Input } from "@nous-research/ui/ui/components/input";
import { useSystemActions } from "@/contexts/useSystemActions";
import { useToast } from "@/hooks/useToast";
import { useToast } from "@nous-research/ui/hooks/use-toast";
import { useI18n } from "@/i18n";
import { usePageHeader } from "@/contexts/usePageHeader";
import { PluginSlot } from "@/plugins";
+4 -4
View File
@@ -17,16 +17,16 @@ import {
} from "lucide-react";
import { api } from "@/lib/api";
import type { SkillInfo, ToolsetInfo } from "@/lib/api";
import { useToast } from "@/hooks/useToast";
import { Toast } from "@/components/Toast";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import { useToast } from "@nous-research/ui/hooks/use-toast";
import { Toast } from "@nous-research/ui/ui/components/toast";
import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { Button } from "@nous-research/ui/ui/components/button";
import { ListItem } from "@nous-research/ui/ui/components/list-item";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { Switch } from "@nous-research/ui/ui/components/switch";
import { cn } from "@/lib/utils";
import { Input } from "@/components/ui/input";
import { Input } from "@nous-research/ui/ui/components/input";
import { useI18n } from "@/i18n";
import { usePageHeader } from "@/contexts/usePageHeader";
import { PluginSlot } from "@/plugins";
+4 -4
View File
@@ -23,10 +23,10 @@ import { Badge } from "@nous-research/ui/ui/components/badge";
import { Button } from "@nous-research/ui/ui/components/button";
import { Checkbox } from "@nous-research/ui/ui/components/checkbox";
import { Select, SelectOption } from "@nous-research/ui/ui/components/select";
import { Card, CardHeader, CardTitle, CardContent } from "@/components/ui/card";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { Separator } from "@/components/ui/separator";
import { Card, CardHeader, CardTitle, CardContent } from "@nous-research/ui/ui/components/card";
import { Input } from "@nous-research/ui/ui/components/input";
import { Label } from "@nous-research/ui/ui/components/label";
import { Separator } from "@nous-research/ui/ui/components/separator";
import { Tabs, TabsList, TabsTrigger } from "@nous-research/ui/ui/components/tabs";
import { useI18n } from "@/i18n";
import { registerSlot, PluginSlot } from "./slots";
@@ -154,10 +154,10 @@ hooks:
**`sync_turn()` MUST be non-blocking.** If your backend has latency (API calls, LLM processing), run the work in a daemon thread:
```python
def sync_turn(self, user_content, assistant_content):
def sync_turn(self, user_content, assistant_content, *, session_id="", messages=None):
def _sync():
try:
self._api.ingest(user_content, assistant_content)
self._api.ingest(user_content, assistant_content, session_id=session_id, messages=messages)
except Exception as e:
logger.warning("Sync failed: %s", e)
@@ -167,6 +167,16 @@ def sync_turn(self, user_content, assistant_content):
self._sync_thread.start()
```
`messages` is optional OpenAI-style conversation context as of the completed
turn. When present, it includes user/assistant messages, assistant tool calls,
and tool result messages. Providers that do not need raw turn context can omit
the `messages` parameter; Hermes will continue calling them with the legacy
signature.
Cloud providers should document what parts of `messages` are sent off-device.
Tool calls and tool results may contain file paths, command output, or other
workspace data.
## Profile Isolation
All storage paths **must** use the `hermes_home` kwarg from `initialize()`, not hardcoded `~/.hermes`:
+1
View File
@@ -227,6 +227,7 @@ Options:
| Option | Description |
|--------|-------------|
| `--all` | On `start` / `restart` / `stop`: act on **every profile's** gateway, not just the active `HERMES_HOME`. Useful if you run multiple profiles side-by-side and want to restart them all after `hermes update`. |
| `--no-supervise` | On `run`: inside the s6-overlay Docker image, opt out of auto-supervision and use pre-s6 foreground semantics — gateway runs as the container's main process with no auto-restart. No-op outside the s6 image. Equivalent to setting `HERMES_GATEWAY_NO_SUPERVISE=1`. |
:::tip WSL users
Use `hermes gateway run` instead of `hermes gateway start` — WSL's systemd support is unreliable. Wrap it in tmux for persistence: `tmux new -s hermes 'hermes gateway run'`. See [WSL FAQ](/reference/faq#wsl-gateway-keeps-disconnecting-or-hermes-gateway-start-fails) for details.
@@ -518,6 +518,7 @@ Advanced per-platform knobs for throttling the outbound message batcher. Most us
| `HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT` | Per-platform connect timeout during gateway startup (seconds). |
| `HERMES_GATEWAY_BUSY_INPUT_MODE` | Default gateway busy-input behavior: `queue`, `steer`, or `interrupt`. Can be overridden per chat with `/busy`. |
| `HERMES_GATEWAY_BUSY_ACK_ENABLED` | Whether the gateway sends an acknowledgment message (⚡/⏳/⏩) when a user sends input while the agent is busy (default: `true`). Set to `false` to suppress these messages entirely — the input is still queued/steered/interrupts as normal, only the chat reply is silenced. Bridged from `display.busy_ack_enabled` in `config.yaml`. |
| `HERMES_GATEWAY_NO_SUPERVISE` | Inside the s6-overlay Docker image, opt out of auto-supervision when running `hermes gateway run` and use pre-s6 foreground semantics (no auto-restart, gateway is the container's main process). Truthy values: `1`, `true`, `yes`. Equivalent to the `--no-supervise` CLI flag. No-op outside the s6 image. |
| `HERMES_FILE_MUTATION_VERIFIER` | Enable the per-turn file-mutation verifier footer (default: `true`). When enabled, Hermes appends an advisory listing any `write_file` / `patch` calls that failed during the turn and were not superseded by a successful write. Set to `0`, `false`, `no`, or `off` to suppress. Mirrors `display.file_mutation_verifier` in `config.yaml`; the env var wins when set. |
| `HERMES_CRON_TIMEOUT` | Inactivity timeout for cron job agent runs in seconds (default: `600`). The agent can run indefinitely while actively calling tools or receiving stream tokens — this only triggers when idle. Set to `0` for unlimited. |
| `HERMES_CRON_SCRIPT_TIMEOUT` | Timeout for pre-run scripts attached to cron jobs in seconds (default: `120`). Override for scripts that need longer execution (e.g., randomized delays for anti-bot timing). Also configurable via `cron.script_timeout_seconds` in `config.yaml`. |
+65 -4
View File
@@ -130,7 +130,7 @@ The agent has the same filesystem access as your user account. Use `hermes tools
Runs commands inside a Docker container with security hardening (all capabilities dropped, no privilege escalation, PID limits).
**Single persistent container, not per-command.** Hermes starts ONE long-lived container on first use and routes every terminal, file, and `execute_code` call through `docker exec` into that same container — across sessions, `/new`, `/reset`, and `delegate_task` subagents — for the lifetime of the Hermes process. Working-directory changes, installed packages, and files in `/workspace` carry over from one tool call to the next, just like a local shell. The container is stopped and removed on shutdown. See **Container lifecycle** below for details.
**Single persistent container, shared across Hermes processes.** Hermes starts ONE long-lived container on first use and routes every terminal, file, and `execute_code` call through `docker exec` into that same container — across sessions, `/new`, `/reset`, and `delegate_task` subagents. Working-directory changes, installed packages, files in `/workspace`, and **background processes** all carry over from one tool call to the next, and from one Hermes process to the next. When you close a TUI session, run `/quit`, or start a new `hermes` invocation, the container keeps running and the next Hermes process reuses it via a labeled lookup. See **Container lifecycle** below for the exact teardown rules.
```yaml
terminal:
@@ -138,8 +138,11 @@ terminal:
docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
docker_mount_cwd_to_workspace: false # Mount launch dir into /workspace
docker_run_as_host_user: false # See "Running container as host user" below
docker_forward_env: # Env vars to forward into container
docker_forward_env: # Host env vars to forward into container
- "GITHUB_TOKEN"
docker_env: # Literal env vars to inject (KEY=value)
DEBUG: "1"
PYTHONUNBUFFERED: "1"
docker_volumes: # Host directory mounts
- "/home/user/projects:/workspace/projects"
- "/home/user/data:/data:ro" # :ro for read-only
@@ -151,14 +154,49 @@ terminal:
container_cpu: 1 # CPU cores (0 = unlimited)
container_memory: 5120 # MB (0 = unlimited)
container_disk: 51200 # MB (requires overlay2 on XFS+pquota)
container_persistent: true # Persist /workspace and /root across sessions
container_persistent: true # Persist /workspace and /root bind-mount dirs
# Cross-process container reuse (defaults match the "one long-lived
# container shared across sessions" contract — see Container lifecycle).
docker_persist_across_processes: true # Reuse container across Hermes restarts
docker_orphan_reaper: true # Sweep abandoned Exited containers at startup
# Cross-backend lifecycle settings (apply to docker as well)
timeout: 180 # Per-command timeout in seconds
lifetime_seconds: 300 # Idle-reaper window; also feeds 2× orphan-reaper threshold
```
**`docker_env`** vs **`docker_forward_env`**: the former injects literal `KEY=value` pairs you specify in the config (the values live in your `config.yaml` or are passed as a JSON dict via `TERMINAL_DOCKER_ENV='{"DEBUG":"1"}'`). The latter forwards values from your shell or `~/.hermes/.env`, so the actual secret never appears in the config file. Use `docker_forward_env` for tokens and `docker_env` for static knobs the container needs.
**`terminal.docker_extra_args`** (also overridable via `TERMINAL_DOCKER_EXTRA_ARGS='["--gpus=all"]'`) lets you pass arbitrary `docker run` flags that Hermes doesn't surface as first-class keys — `--gpus`, `--network`, `--add-host`, alternative `--security-opt` overrides, etc. Each entry must be a string; the list is appended last to the assembled `docker run` invocation so it can override Hermes' defaults if needed. Use sparingly — flags that conflict with the sandbox hardening (capability drops, `--user`, the workspace bind mount) will silently weaken isolation.
**Requirements:** Docker Desktop or Docker Engine installed and running. Hermes probes `$PATH` plus common macOS install locations (`/usr/local/bin/docker`, `/opt/homebrew/bin/docker`, Docker Desktop app bundle). Podman is supported out of the box: set `HERMES_DOCKER_BINARY=podman` (or the full path) to force it when both are installed.
**Container lifecycle:** Hermes reuses a single long-lived container (`docker run -d ... sleep 2h`) for every terminal and file-tool call, across sessions, `/new`, `/reset`, and `delegate_task` subagents, for the lifetime of the Hermes process. Commands run via `docker exec` with a login shell, so working-directory changes, installed packages, and files in `/workspace` all persist from one tool call to the next. The container is stopped and removed on Hermes shutdown (or when the idle-sweep reclaims it).
#### Container lifecycle
Every Hermes-managed container is tagged with three labels so subsequent processes (and the orphan reaper) can identify it:
- `hermes-agent=1` — marks it as Hermes-managed
- `hermes-task-id=<sanitized task_id>` — keys the per-task reuse probe
- `hermes-profile=<sanitized profile name>` — scopes reuse and reaping to the active Hermes profile
On startup, Hermes runs `docker ps --filter label=hermes-task-id=<id> --filter label=hermes-profile=<profile>` and **attaches to the existing container** when it finds one. If the container is `exited` (e.g. after a Docker daemon restart), it's `docker start`'d and reused — filesystem state and any installed packages survive, but in-container background processes do not.
When a Hermes process exits — `/quit`, closing a TUI session, gateway shutdown, even SIGKILL — the cleanup path is a **no-op for the container in default mode**. The container keeps running. The next Hermes process attaches to it in milliseconds via the label probe. This is the behavior the "one long-lived container shared across sessions" contract requires: it's the only way background processes (npm watchers, dev servers, long-running pytest) survive across sessions.
**The container is only torn down (stopped and `docker rm -f`'d) in these cases:**
| Trigger | When it fires |
|---|---|
| `docker_persist_across_processes: false` | Explicit per-process isolation. Every `cleanup()` does `stop` + `rm -f`. Matches pre-issue-#20561 behavior. |
| Idle reaper (`lifetime_seconds`, default 300s) | Only when the env is `persist_across_processes=false`. Persist-mode envs are no-op'd; container survives the idle sweep. |
| Orphan reaper at next startup | Sweeps **Exited** hermes-labeled containers older than `2 × lifetime_seconds` (default 600s = 10 min), scoped to the current profile. **Running containers are never touched** — sibling-process safety. Set `docker_orphan_reaper: false` to disable. |
| Direct user action | `docker rm -f`, `docker system prune`, Docker Desktop restart. We don't set `--restart=always`, so a host reboot leaves the container `Exited` (its CoW layer survives and gets reused on next startup, but bg processes are gone). |
Edge cases worth knowing:
- **OOM kill of in-container PID 1** transitions the container to `Exited`. Next reuse will `docker start` it; filesystem state survives, bg processes do not.
- **Switching profiles** isolates containers from each other — a container labeled `hermes-profile=work` is invisible to a Hermes process running under `hermes-profile=research`. The orphan reaper is profile-scoped too, so cross-profile containers don't get reaped accidentally, but they also won't get cleaned up automatically until you start Hermes again under their original profile.
Parallel subagents spawned via `delegate_task(tasks=[...])` share this one container — concurrent `cd`, env mutations, and writes to the same path will collide. If a subagent needs an isolated sandbox, it must register a per-task image override via `register_task_env_overrides()`, which RL and benchmark environments (TerminalBench2, HermesSweEnv, etc.) do automatically for their per-task Docker images.
@@ -170,6 +208,29 @@ Parallel subagents spawned via `delegate_task(tasks=[...])` share this one conta
**Credential forwarding:** Env vars listed in `docker_forward_env` are resolved from your shell environment first, then `~/.hermes/.env`. Skills can also declare `required_environment_variables` which are merged automatically.
#### Environment variable overrides
Every key under `terminal:` has an env-var override of the form `TERMINAL_<KEY_UPPERCASE>`. The most useful ones for the Docker backend:
| Env var | Maps to | Notes |
|---|---|---|
| `TERMINAL_DOCKER_IMAGE` | `docker_image` | Base image |
| `TERMINAL_DOCKER_FORWARD_ENV` | `docker_forward_env` | JSON array: `'["GITHUB_TOKEN","OPENAI_API_KEY"]'` |
| `TERMINAL_DOCKER_ENV` | `docker_env` | JSON dict: `'{"DEBUG":"1"}'` |
| `TERMINAL_DOCKER_VOLUMES` | `docker_volumes` | JSON array of `"host:container[:ro]"` strings |
| `TERMINAL_DOCKER_EXTRA_ARGS` | `docker_extra_args` | JSON array |
| `TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE` | `docker_mount_cwd_to_workspace` | `true` / `false` |
| `TERMINAL_DOCKER_RUN_AS_HOST_USER` | `docker_run_as_host_user` | `true` / `false` |
| `TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES` | `docker_persist_across_processes` | `true` / `false` — default `true` |
| `TERMINAL_DOCKER_ORPHAN_REAPER` | `docker_orphan_reaper` | `true` / `false` — default `true` |
| `TERMINAL_CONTAINER_CPU` | `container_cpu` | CPU cores |
| `TERMINAL_CONTAINER_MEMORY` | `container_memory` | MB |
| `TERMINAL_CONTAINER_DISK` | `container_disk` | MB |
| `TERMINAL_CONTAINER_PERSISTENT` | `container_persistent` | `true` / `false` — controls the bind-mount workspace dirs, distinct from `docker_persist_across_processes` |
| `TERMINAL_LIFETIME_SECONDS` | `lifetime_seconds` | Idle reaper window |
| `TERMINAL_TIMEOUT` | `timeout` | Per-command timeout |
| `HERMES_DOCKER_BINARY` | _none_ | Force a specific docker/podman binary path |
### SSH Backend
Runs commands on a remote server over SSH. Uses ControlMaster for connection reuse (5-minute idle keepalive). Persistent shell is enabled by default — state (cwd, env vars) survives across commands.
+23 -2
View File
@@ -101,8 +101,29 @@ The entrypoint starts `hermes dashboard` in the background (running as the non-r
| `HERMES_DASHBOARD_HOST` | Bind address for the dashboard HTTP server | `127.0.0.1` |
| `HERMES_DASHBOARD_PORT` | Port for the dashboard HTTP server | `9119` |
| `HERMES_DASHBOARD_TUI` | Set to `1` to expose the in-browser Chat tab (embedded `hermes --tui` via PTY/WebSocket) | *(unset)* |
| `HERMES_DASHBOARD_INSECURE` | Set to `1` (or `true` / `yes`) to bind without the OAuth auth gate. Only use on trusted networks behind a reverse proxy without the OAuth contract — the dashboard exposes API keys and session data | *(unset — gate enforced when a `DashboardAuthProvider` is registered)* |
By default, the dashboard stays on loopback to avoid exposing the unauthenticated web surface over the network. To publish it intentionally, set `HERMES_DASHBOARD_HOST=0.0.0.0` and configure your own trusted network boundary/reverse proxy. In that case you must explicitly add `--insecure` behavior by passing host/flags in your command path (the entrypoint no longer auto-enables insecure mode).
By default, the dashboard stays on loopback (`127.0.0.1`) to avoid exposing
the web surface over the network. To publish it intentionally, set
`HERMES_DASHBOARD_HOST=0.0.0.0`. The dashboard's OAuth auth gate engages
automatically whenever:
1. The bind host is non-loopback, **and**
2. A `DashboardAuthProvider` plugin is registered.
The bundled `dashboard_auth/nous` provider activates whenever
`HERMES_DASHBOARD_OAUTH_CLIENT_ID` is set (see
[Web Dashboard → Authentication](features/web-dashboard.md)). With the
gate engaged, browser callers are redirected to the configured portal's
OAuth flow before they can reach any protected route.
If no provider is registered and the bind is non-loopback, the dashboard
**fails closed at startup** with a specific error pointing at the
missing env var. To opt out of the gate explicitly — for a trusted-LAN
deployment behind your own reverse proxy without the OAuth contract —
set `HERMES_DASHBOARD_INSECURE=1`. This re-enables the legacy "no auth,
loud warning" mode and is the only path that disables the gate; the bind
host does not implicitly determine `--insecure` anymore.
:::note
The dashboard runs as a supervised s6 service inside the container. If
@@ -228,7 +249,7 @@ docker run -it --rm \
Direct `-e` flags override values from `.env`. This is useful for CI/CD or secrets-manager integrations where you don't want keys on disk.
:::note Looking for Docker as the **terminal backend**?
This page covers running Hermes itself inside Docker. If you want Hermes to execute the agent's `terminal` / `execute_code` calls inside a Docker sandbox container (one persistent container per Hermes process), that's a separate config block — `terminal.backend: docker` plus `terminal.docker_image`, `terminal.docker_volumes`, `terminal.docker_forward_env`, `terminal.docker_run_as_host_user`, and `terminal.docker_extra_args`. See [Configuration → Docker Backend](configuration.md#docker-backend) for the full set.
This page covers running Hermes itself inside Docker. If you want Hermes to execute the agent's `terminal` / `execute_code` calls inside a Docker sandbox container (one long-lived container shared across Hermes processes — see issue #20561), that's a separate config block — `terminal.backend: docker` plus `terminal.docker_image`, `terminal.docker_volumes`, `terminal.docker_forward_env`, `terminal.docker_env`, `terminal.docker_run_as_host_user`, `terminal.docker_extra_args`, `terminal.docker_persist_across_processes`, and `terminal.docker_orphan_reaper`. See [Configuration → Docker Backend](configuration.md#docker-backend) for the full set including container-lifecycle rules.
:::
## Docker Compose example
@@ -520,6 +520,27 @@ echo 'SUPERMEMORY_API_KEY=***' >> ~/.hermes/.env
**Support:** [Discord](https://supermemory.link/discord) · [support@supermemory.com](mailto:support@supermemory.com)
### Memori
Structured long-term memory using Memori Cloud, with background completed-turn capture, tool-aware turn context, and explicit recall tools for facts, summaries, quota, signup, and feedback.
| | |
|---|---|
| **Best for** | Agent-controlled recall with structured project and session attribution |
| **Requires** | `pip install hermes-memori` + `hermes-memori install` + [Memori API key](https://app.memorilabs.ai/signup) |
| **Data storage** | Memori Cloud |
| **Cost** | Memori pricing |
**Tools:** `memori_recall` (search long-term memory), `memori_recall_summary` (summarized context), `memori_quota` (usage/quota), `memori_signup` (request signup email), `memori_feedback` (send integration feedback)
**Setup:**
```bash
pip install hermes-memori
hermes-memori install
hermes config set memory.provider memori
hermes memory setup
```
---
## Provider Comparison
@@ -534,6 +555,7 @@ echo 'SUPERMEMORY_API_KEY=***' >> ~/.hermes/.env
| **RetainDB** | Cloud | $20/mo | 5 | `requests` | Delta compression |
| **ByteRover** | Local/Cloud | Free/Paid | 3 | `brv` CLI | Pre-compression extraction |
| **Supermemory** | Cloud | Paid | 4 | `supermemory` | Context fencing + session graph ingest + multi-container |
| **Memori** | Cloud | Free/Paid | 5 | `hermes-memori` | Tool-aware memory + structured recall |
## Profile Isolation
@@ -224,6 +224,7 @@ hermes gateway <subcommand>
| 选项 | 说明 |
|--------|-------------|
| `--all` | 在 `start` / `restart` / `stop` 时:对**每个 profile** 的 gateway 执行操作,而不仅限于活跃的 `HERMES_HOME`。当你并行运行多个 profile 并希望在 `hermes update` 后全部重启时很有用。 |
| `--no-supervise` | 在 `run` 时:在 s6-overlay Docker 镜像内部,跳过 s6 自动监管,退回到 pre-s6 前台语义——gateway 作为容器主进程运行,无自动重启。在 s6 镜像之外为空操作。等同于设置 `HERMES_GATEWAY_NO_SUPERVISE=1`。 |
:::tip WSL 用户
使用 `hermes gateway run` 而非 `hermes gateway start`——WSL 的 systemd 支持不稳定。用 tmux 包裹以保持持久运行:`tmux new -s hermes 'hermes gateway run'`。详见 [WSL FAQ](/reference/faq#wsl-gateway-keeps-disconnecting-or-hermes-gateway-start-fails)。
@@ -518,6 +518,7 @@ Graph 事件(Teams 会议、日历、聊天等)的入站变更通知监听
| `HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT` | gateway 启动期间每个平台的连接超时(秒)。 |
| `HERMES_GATEWAY_BUSY_INPUT_MODE` | 默认 gateway 繁忙输入行为:`queue``steer``interrupt`。可通过 `/busy` 按聊天覆盖。 |
| `HERMES_GATEWAY_BUSY_ACK_ENABLED` | gateway 是否在用户 agent 繁忙时发送确认消息(⚡/⏳/⏩)(默认:`true`)。设为 `false` 可完全抑制这些消息——输入仍会正常排队/引导/中断,只是聊天回复被静默。从 `config.yaml` 中的 `display.busy_ack_enabled` 桥接。 |
| `HERMES_GATEWAY_NO_SUPERVISE` | 在 s6-overlay Docker 镜像内部运行 `hermes gateway run` 时跳过 s6 自动监管,退回到 pre-s6 前台语义(无自动重启,gateway 作为容器主进程)。真值:`1``true``yes`。等同于 `--no-supervise` CLI 标志。在 s6 镜像之外为空操作。 |
| `HERMES_FILE_MUTATION_VERIFIER` | 启用每轮文件变更验证器页脚(默认:`true`)。启用后,Hermes 附加一个建议列表,列出本轮中失败且未被成功写入覆盖的 `write_file`/`patch` 调用。设为 `0``false``no``off` 可抑制。镜像 `config.yaml` 中的 `display.file_mutation_verifier`;设置时环境变量优先。 |
| `HERMES_CRON_TIMEOUT` | cron 任务 agent 运行的不活动超时(秒,默认:`600`)。agent 在主动调用工具或接收流 token 时可无限运行——仅在空闲时触发。设为 `0` 表示无限制。 |
| `HERMES_CRON_SCRIPT_TIMEOUT` | cron 任务附加的预运行脚本超时(秒,默认:`120`)。对需要更长执行时间的脚本(例如随机延迟的反机器人计时)可增大此值。也可通过 `config.yaml` 中的 `cron.script_timeout_seconds` 配置。 |
@@ -80,8 +80,28 @@ docker run -d \
| `HERMES_DASHBOARD_HOST` | dashboard HTTP 服务器的绑定地址 | `127.0.0.1` |
| `HERMES_DASHBOARD_PORT` | dashboard HTTP 服务器的端口 | `9119` |
| `HERMES_DASHBOARD_TUI` | 设为 `1` 以启用浏览器内 Chat 标签页(通过 PTY/WebSocket 嵌入 `hermes --tui` | *(未设置)* |
| `HERMES_DASHBOARD_INSECURE` | 设为 `1`(或 `true` / `yes`)以在不启用 OAuth 鉴权门控的情况下绑定。仅在可信网络(且通过没有 OAuth 契约的反向代理时)使用——dashboard 会暴露 API 密钥与会话数据 | *(未设置——当注册了 `DashboardAuthProvider` 时启用门控)* |
默认情况下,dashboard 保持在回环地址,以避免将未经身份验证的 Web 界面暴露到网络。若要有意发布,请设置 `HERMES_DASHBOARD_HOST=0.0.0.0` 并配置你自己的可信网络边界/反向代理。在这种情况下,你必须通过命令路径中的 host/flags 显式添加 `--insecure` 行为(入口点不再自动启用不安全模式)。
默认情况下,dashboard 保持在回环地址`127.0.0.1`,以避免将
Web 界面暴露到网络。若要有意发布,请设置
`HERMES_DASHBOARD_HOST=0.0.0.0`。当以下两项同时满足时,
dashboard 的 OAuth 鉴权门控会自动启用:
1. 绑定地址为非回环地址,**且**
2. 注册了一个 `DashboardAuthProvider` 插件。
捆绑的 `dashboard_auth/nous` 提供者会在设置
`HERMES_DASHBOARD_OAUTH_CLIENT_ID` 时自动激活(参见
[Web Dashboard → 鉴权](features/web-dashboard.md))。门控启用后,
浏览器调用方会先被重定向到所配置门户的 OAuth 流,然后才能
访问任何受保护路由。
如果未注册提供者且绑定为非回环地址,dashboard **会在启动时
失败关闭**,并给出指向缺失环境变量的具体错误信息。要显式
退出门控——用于不使用 OAuth 契约、通过你自己的反向代理部署
在可信局域网中的场景——请设置 `HERMES_DASHBOARD_INSECURE=1`
这会恢复旧的“无鉴权,但发出告警”模式,也是唯一可以禁用门控的
路径;绑定地址不再隐式决定 `--insecure`
:::note
dashboard 在容器内作为受监管的 s6 服务运行。如果
+2 -2
View File
@@ -500,7 +500,7 @@ export default function SkillsDashboard() {
const sources = useMemo(() => {
const set = new Set(allSkillsLocal.map((s) => s.source));
return SOURCE_ORDER.filter((s) => s === "all" || set.has(s));
}, []);
}, [allSkillsLocal]);
const categoryEntries = useMemo(() => {
const pool =
@@ -523,7 +523,7 @@ export default function SkillsDashboard() {
return Array.from(map.entries())
.sort((a, b) => b[1].count - a[1].count)
.map(([key, { label, count }]) => ({ key, label, count }));
}, [sourceFilter]);
}, [sourceFilter, allSkillsLocal]);
const filtered = useMemo(() => {
const q = debouncedSearch.toLowerCase().trim();