Compare commits

...

10 Commits

Author SHA1 Message Date
Teknium 3b6347af15 feat(kanban): default_assignee fallback + per-profile concurrency cap (#27145, #21582) (#34244)
Two related dispatcher behaviors that have been missing for a while.

## kanban.default_assignee (#27145)

Reporter (@agarzon): dashboard creates a task without an assignee, task
parks in 'ready' forever even though the operator's intent ('default')
is perfectly clear. The dispatcher already had a 'skipped_unassigned'
bucket but no fallback routing — users had to manually type 'default'
in the assignee field every time.

Behavior: when 'kanban.default_assignee' is set in config.yaml, the
dispatcher applies that assignee to any unassigned ready task before
deciding whether to spawn. The row is mutated (assignee column + an
'assigned' event with source='kanban.default_assignee' for the audit
trail). Empty/whitespace config value = no fallback, preserving the
existing skipped_unassigned behavior.

Dry-run mode reports what WOULD happen via the new
'auto_assigned_default' bucket on DispatchResult, but does NOT mutate
the DB — operators using 'hermes kanban dispatch --dry-run' see the
routing decision before committing.

## kanban.max_in_progress_per_profile (#21582)

Reporter (@edwardchenchen, @simlu, 4 reactions): fan-out workloads
saturate one profile's local model / API quota / browser pool while
other profiles sit idle. The existing global 'max_in_progress' caps
total workers but doesn't balance across profiles.

Behavior: when 'kanban.max_in_progress_per_profile' is set to a
positive int, the dispatcher tracks per-assignee running counts (one
query at tick start) and refuses to spawn for any assignee already at
the cap. Tasks blocked this way go to a new
'skipped_per_profile_capped' bucket on DispatchResult as
(task_id, assignee, current_running_count) tuples — NOT an
operator-actionable failure, just 'try again next tick when the
profile has capacity'.

Pre-existing 'running' tasks count against the cap (verified via
regression test). The cap respects dry_run mode by incrementing
its in-memory counter on each would-be spawn so dry_run reports
the same balanced subset that a real tick would.

Invalid cap values (0, negative, non-int, None) are treated as 'no
cap', preserving the existing behavior. Backward-compatible for
installs that don't set the config.

## Surfaces

- 'hermes kanban dispatch' CLI now prints 'Auto-assigned to
  kanban.default_assignee=X: ...' and 'Deferred (X at per-profile cap,
  N running): ...' lines, plus matching JSON keys in --json output.
- Gateway dispatcher logs the configured values at startup
  ('default_assignee=X', 'max_in_progress_per_profile=N').
- 'kanban.max_in_progress_per_profile' added to DEFAULT_CONFIG with
  inline docs.

## Validation

- tests/hermes_cli/test_kanban_default_assignee.py (6 cases): no-cap
  baseline, auto-assign + DB mutation, dry-run reports without
  mutating, whitespace treated as None, explicit assignees untouched,
  DispatchResult field schema.
- tests/hermes_cli/test_kanban_per_profile_cap.py (9 cases including
  4 parametrized): no-cap baseline, balanced 2-profile fan-out,
  pre-existing running counts against cap, invalid cap values
  (0/-1/'abc'/None), capped tasks dispatched on next tick after
  running task completes, DispatchResult field schema.
- Broader kanban suite: 464/464 pass (was 449 baseline; +15 new
  regression tests across both features).

## Credit

#27145 — Jimmy Johansson reported the dispatcher skipped-unassigned
gap; @agarzon scoped the simpler 'honor kanban.default_assignee' fix
that matches the existing config knob.
#21582 — @edwardchenchen filed the per-profile cap ask after hitting
model 429s on fan-out research projects; @simlu confirmed the same
pain on local-model setups.
2026-05-28 19:02:55 -07:00
Ben 42612aa350 docs(docker): refresh user-guide page for s6-overlay reality
The page was last meaningfully rewritten in the pre-s6 (tini) era and had
drifted on five points that no longer matched the image:

1. "Running the dashboard" claimed the entrypoint backgrounds
   `hermes dashboard` and prefixes its output with `[dashboard]`. That
   was the pre-s6 entrypoint.sh path; under s6 the dashboard is a
   supervised s6-rc service (`docker/s6-rc.d/dashboard/run`) with no
   sed-prefix pipeline. Rewrote the section accordingly.

2. The default for `HERMES_DASHBOARD_HOST` was documented as
   `127.0.0.1`. The s6 run script defaults it to `0.0.0.0`
   (`dash_host="${HERMES_DASHBOARD_HOST:-0.0.0.0}"`). Fixed the table
   and the surrounding prose.

3. Multi-profile was documented as "not recommended in Docker — run
   one container per profile." That advice was load-bearing when
   there was no in-container supervisor, but the s6 architecture
   explicitly adds per-profile gateway supervision: each profile
   created via `hermes profile create <name>` gets a slot under
   `/run/service/gateway-<name>/`, the `02-reconcile-profiles`
   cont-init script restores them across `docker restart` from
   `gateway_state.json`, and `hermes gateway start/stop/restart` is
   intercepted by `_dispatch_via_service_manager_if_s6` to route
   through `s6-svc`. Pivoted the section to "one container, many
   supervised profile gateways" as the default, with a comparison
   table and a "When you DO want a separate container" escape
   hatch for the genuine resource-isolation / network-segmentation
   cases.

4. The Compose example trailer also claimed `[dashboard]` log
   prefixing. Replaced with the actual log routing.

5. Added a new "Where the logs go" section covering all four log
   surfaces: per-profile gateways (tee'd to `docker logs` AND
   `${HERMES_HOME}/logs/gateways/<profile>/current` since PR
   b34532319), dashboard (`docker logs`, no prefix), boot reconciler
   (`container-boot.log`), and `hermes logs`. The gateway-mode and
   Compose sections cross-reference this rather than each carrying
   their own routing prose.

Added a new "docker exec automatically drops to the hermes user"
subsection under "What the Dockerfile does", next to the existing
Privilege model warning. Documents the `/opt/hermes/bin/hermes` shim
(landed via the docker-exec privilege-drop work) — operators don't
need to remember `--user hermes` for `docker exec hermes login`,
`docker exec hermes profile create …`, etc. The historical footgun
(`auth.json` written as `root:root`, supervised gateway then can't
read its own auth file) is mentioned only as context for what the
fail-loud `exit 126` is protecting against, not as a problem the
reader needs to solve. The `HERMES_DOCKER_EXEC_AS_ROOT=1` opt-out is
documented for diagnostic sessions.

The "Permission denied" troubleshooting subsection now carries a
single-line pointer to the new section instead of duplicating it.

The `--insecure` framing reflects PR #fb5125362 (opt-in via
`HERMES_DASHBOARD_INSECURE`, not derived from bind host): the OAuth
gate is the authority, the bind host alone never implies
`--insecure`, and opting out is an explicit security trade-off.

Anchors verified resolve. i18n zh-Hans mirror left for the
translation flow to catch up.
2026-05-29 11:55:01 +10:00
Ben 3c6e70aef1 docs(docker): document new persist-across-processes contract and orphan reaper (#20561)
Updates the Docker Backend section of the user-guide configuration page
to match the actual behavior shipped in PR #33645. Pre-PR the docs
claimed "container is stopped and removed on shutdown," which was
never quite true for the documented happy path and is now actively
wrong: in default mode the container survives across Hermes processes
so background processes (npm watchers, dev servers, long-running
pytest) carry over the way the "ONE long-lived container shared
across sessions" promise requires.

Changes to `website/docs/user-guide/configuration.md`:

* Reworked the intro paragraph at the top of the Docker Backend
  section to describe the actual cross-process reuse contract.
* Expanded the YAML example with the new keys
  `docker_persist_across_processes` and `docker_orphan_reaper`, plus
  the pre-existing-but-undocumented `docker_env`, `timeout`, and
  `lifetime_seconds`.  Clarified the `container_persistent` comment
  to disambiguate from `docker_persist_across_processes`.
* Added a `docker_env` vs `docker_forward_env` explainer (one
  injects literal KEY=value, the other forwards values from the
  host/.env — easy to confuse).
* Replaced the one-line "Container lifecycle" paragraph with a full
  subsection covering:
    - the three labels Hermes tags every container with
      (hermes-agent, hermes-task-id, hermes-profile)
    - the label-probe reuse mechanism on startup
    - a teardown-trigger table with four rows for every situation
      that destroys the container in default mode
    - edge cases (OOM kill, profile switching)
* Added an "Environment variable overrides" table covering all
  TERMINAL_* env vars relevant to the Docker backend, including the
  previously-undocumented `TERMINAL_DOCKER_ENV` and
  `HERMES_DOCKER_BINARY`.

Changes to `website/docs/user-guide/docker.md`:

* Extended the cross-link admonition (around l.227) so the
  Hermes-in-Docker page points at the new terminal-backend keys
  (`docker_env`, `docker_persist_across_processes`,
  `docker_orphan_reaper`) alongside the ones already mentioned.

No code changes.  Behavior already covered by tests added in earlier
commits on this branch (#33645 commits 1-5).

Refs #20561
2026-05-29 11:49:54 +10:00
Ben 2f0f03c40d fix(docker): cleanup_vm() default honors persist mode (don't kill container on session close)
Commit 4 made cleanup_vm() default to force_remove=True, which was wrong:
cleanup_vm() is called from AIAgent.close() (TUI session close at
tui_gateway/server.py:2991, gateway session teardown at gateway/run.py:3569)
and from per-turn cleanup (agent/chat_completion_helpers.py:1517). All
three are session-lifecycle events that should honor persist mode, not
explicit user-initiated teardown.

Ben reported the symptom: container shared between multiple TUI sessions
(good) but killed as soon as any session closed (bad). With force_remove=True
as the default, every `session.close` JSON-RPC tore down the container.

The fix is to flip cleanup_vm()'s force_remove default back to False.
The kwarg still exists for future explicit-teardown paths (`/reset`-style
flows, "destroy my sandbox" commands) that haven't been wired up yet.

Two new unit tests pin the behavior:

* `test_cleanup_vm_default_honors_persist_mode` — asserts
  `cleanup_vm(task_id)` does neither docker stop nor docker rm on a
  persist-mode container (the regression Ben caught).
* `test_cleanup_vm_force_remove_tears_down_persist_container` —
  asserts the kwarg still flows through the runtime-signature-inspection
  plumbing to the backend's cleanup().

E2E verified against real Docker (in addition to all 17 existing checks):

  ✓ Default cleanup_vm() leaves persist-mode container running
  ✓ cleanup_vm(force_remove=True) removed the container

Refs #20561
2026-05-29 11:49:54 +10:00
Ben 5c2170a7c6 fix(docker): persist-mode cleanup is no-op; add force_remove kwarg (#20561)
The first iteration of this PR did docker stop on every cleanup in
persist mode (only skipping docker rm). Ben caught this as
contradicting the documented "ONE long-lived container shared across
sessions" semantics: stopping the container on every Hermes /quit kills
any background processes inside (npm watchers, pytest watchers,
long-running scripts) — exactly the case persist mode is supposed to
protect.

This commit splits the cleanup paths cleanly:

* **Persist mode (default)** — cleanup() is a NO-OP for the
  container. Container stays running, processes survive, next Hermes
  process attaches via the existing label probe in ~ms instead of
  waiting for docker start. Resource reclamation happens via the
  orphan reaper at next startup (2 × lifetime_seconds threshold), which
  covers the SIGKILL / OOM / abandoned-laptop cases.
* **Opt-out mode (persist_across_processes=False)** — unchanged:
  docker stop + docker rm -f on cleanup as before.
* **Explicit teardown** — new cleanup(force_remove=True) kwarg
  overrides persist mode and tears the container down unconditionally.
  cleanup_vm(task_id) now defaults to force_remove=True since
  it's the user-driven reset path (called from AIAgent.close(),
  /reset-style flows, and the idle reaper's per-turn cleanup).

The idle reaper in _cleanup_inactive_envs calls env.cleanup()
directly with no kwargs, so idle persist-mode envs are no-op'd — the
container survives the in-process pop and the next tool call re-probes
via labels. No state leak: _container_id is still cleared on the
in-process handle.

E2E verified against real Docker:

  ✓ Container is still running after cleanup()
  ✓ Background process (sleep loop) survived cleanup()
  ✓ Filesystem state preserved across cleanup()
  ✓ In-process container_id cleared (next __init__ will re-probe)
  ✓ Background process visible from reused env (no docker start happened)
  ✓ force_remove=True removed the container even in persist mode
  ✓ cleanup_vm() removed the container (defaults to force_remove=True)

Test changes:

* Replaces `test_cleanup_with_persist_only_stops_no_rm` with
  `test_cleanup_with_persist_is_noop_for_container` — asserts neither
  stop nor rm runs in persist mode, and the in-process handle is
  cleared so re-probe works.
* Adds `test_cleanup_force_remove_stops_and_rms_even_in_persist_mode`
  — covers the new kwarg.
* Updates `test_cleanup_uses_subprocess_run_not_detached_shell` and
  `test_wait_for_cleanup_after_cleanup_returns_true` to pass
  `force_remove=True` so they actually exercise the docker code path
  (default no-op would trivially pass).

cleanup_vm() forwards `force_remove` only to backends whose cleanup()
accepts the kwarg (currently just DockerEnvironment) via runtime
signature inspection — Modal/Daytona/SSH `cleanup()` signatures are
unchanged.

Refs #20561
2026-05-29 11:49:54 +10:00
Ben d77d877665 fix(docker): startup orphan reaper for crashed-process containers
The cleanup-fix in the previous commit handles the graceful-exit leak: a
Hermes process that runs ``atexit`` will now actually wait on the docker
stop/rm worker thread, so containers either survive (persist mode) or are
fully removed (opt-out mode) by the time the interpreter exits.

But ``atexit`` doesn't fire on SIGKILL, OOM-kill, or terminal-window
close. Containers from those exits stay parked with no surviving Python
process to reuse or remove them, so they accumulate until the operator
intervenes with ``docker rm -f``. The cleanup-fix doesn't help this class
— there's no live cleanup() to fix.

This commit adds the safety net: a startup orphan reaper that runs once
per Hermes process and removes long-Exited hermes-labeled containers
that the prior commit couldn't reach.

Implementation:

* New ``reap_orphan_containers()`` in ``tools/environments/docker.py``.
  Filters: ``label=hermes-agent=1`` + ``status=exited`` + (optional)
  ``label=hermes-profile=<current>``. Per-container ``docker inspect``
  parses ``State.FinishedAt`` (with nanosecond-precision trimming for
  Python's microsecond-bound ``fromisoformat``); containers older than
  the threshold get ``docker rm -f``'d. The ``status=exited`` filter is
  load-bearing — a running container may belong to a sibling Hermes
  process whose reuse path will pick it up; killing it would crash the
  sibling mid-command. Single-container failures are logged and the
  sweep continues to the next candidate.

* New ``_maybe_reap_docker_orphans()`` helper in
  ``tools/terminal_tool.py``. Wired into ``_create_environment()`` for
  ``env_type == "docker"``. Gated by:

    - ``terminal.docker_orphan_reaper: true`` (default; opt-out for
      operators running multiple Hermes processes in the same profile
      who don't trust the conservative defaults)
    - ``_docker_orphan_reaper_ran`` module flag with double-checked
      locking — parallel subagents and RL rollouts don't trigger N
      concurrent docker ps storms
    - Age threshold = ``2 × TERMINAL_LIFETIME_SECONDS`` with a 60s floor
      (so ``TERMINAL_LIFETIME_SECONDS=0`` doesn't race the user's own
      setup)
    - Profile scoping — a research profile NEVER reaps the default
      profile's stragglers
    - Exception swallow — a janitor failure must never block container
      creation

* New config ``terminal.docker_orphan_reaper`` wired through all four
  config-bridge sites (cli.py, gateway/run.py, hermes_cli/config.py,
  tests/conftest.py) and pinned by
  ``test_docker_orphan_reaper_is_bridged_everywhere``.

Coverage:

* 9 new unit tests in test_docker_environment.py — happy path, recent-
  container sparing, profile scoping, unparseable-timestamp safety,
  docker-ps-failure handling, partial-failure continuation, nanosecond
  timestamp parsing, zero-value FinishedAt rejection.
* 6 new integration tests in test_docker_orphan_reaper_integration.py
  — once-per-process gate, disable-flag respected, lifetime doubling
  with 60s floor, current-profile filter wiring, exception swallow.
* 1 new bridge-invariant regression test.

Closes #20561 (combined with the two prior commits on this branch).
2026-05-29 11:49:54 +10:00
Ben ac8e238bc8 fix(docker): reuse containers across processes + fix cleanup leaks
The Docker backend docs claim "Single persistent container — ONE long-
lived container shared across sessions, /new, /reset, and delegate_task
subagents. Stopped/removed on shutdown." In practice the code only
honored that contract within a single Python process via the in-memory
\`_active_environments[task_id]\` cache. Every \`hermes chat\` invocation
spawned a fresh \`hermes-<hex>\` container; older containers piled up in
\`Exited\` state and accumulated until manual \`docker rm\` (issue #20561).

Three root causes, all addressed by this commit:

1. No cross-process container discovery.
2. \`cleanup()\` used fire-and-forget \`subprocess.Popen("... &", shell=True)\`
   which raced with parent-process exit — when Python exited promptly the
   detached shell child got killed mid-\`docker stop\`, leaving stopped
   containers behind.
3. The \`docker rm\` step in cleanup was gated on \`not self._persistent\`
   (the bind-mount-persistence flag). Default config sets
   \`container_persistent: true\`, so the default happy path skipped \`rm\`
   entirely — even when the user explicitly didn't want cross-process
   reuse, containers leaked.

Fix:

* Add \`DockerEnvironment.__init__(persist_across_processes=True)\`. When
  true, init probes
  \`docker ps -a --filter label=hermes-agent=1
                  --filter label=hermes-task-id=<task>
                  --filter label=hermes-profile=<profile>\`
  and reuses a matching container (running → attach; stopped →
  \`docker start\` → attach; \`docker start\` failure → fall through to a
  fresh \`docker run\`). Multiple matches prefer the running one, with the
  stragglers left for the orphan reaper (next commit) to clean up.

* Rewrite \`cleanup()\`. Uses \`subprocess.run(..., timeout=30)\` on a
  daemon \`threading.Thread\`, not the racy \`Popen(... &)\`. The
  \`_persistent\` guard is dropped on the \`rm\` step — \`rm\` now runs
  whenever \`persist_across_processes\` is false, regardless of the
  bind-mount-persistence setting. The leak class is gone in all
  combinations.

* Add \`wait_for_cleanup(timeout)\`. \`tools/terminal_tool.py\`'s atexit
  hook calls this on every active env, blocking up to 15s for the
  cleanup thread before interpreter exit. Without this, \`hermes /quit\`
  raced the daemon-thread teardown and dropped the stop/rm work.

* New config \`terminal.docker_persist_across_processes\` (default
  \`true\` — restores the documented contract). Set \`false\` for hard
  per-process isolation. Wired through all four config-bridge sites
  (cli.py env_mappings, gateway/run.py _terminal_env_map,
  hermes_cli/config.py _config_to_env_sync, tests/conftest.py env-strip
  list); regression-pinned by
  \`test_docker_persist_across_processes_is_bridged_everywhere\` matching
  the existing pattern for docker_run_as_host_user / docker_env.

Reuse intentionally does NOT compare image / mounts / resources — only
the labels. Operators changing those settings should set
\`docker_persist_across_processes: false\` (or \`docker rm -f\` the
labeled container) to force a fresh start. This keeps the probe cheap
and the failure mode obvious.

Coverage: 12 new unit tests in tests/tools/test_docker_environment.py
covering reuse paths (running, stopped, fallback, opt-out, duplicate
preference) and cleanup behavior (persist-mode no-rm, opt-out always-rm,
no-Popen, wait_for_cleanup semantics, partial-init safety). Plus one
config-bridge regression pin.

Refs #20561
2026-05-29 11:49:54 +10:00
Ben 8d129d013b fix(docker): tag containers with hermes-agent labels for identification
Issue #20561 (Docker containers accumulate) needs a way to identify
hermes-created containers from the outside — both for the orphan reaper
(a follow-up commit) and for operators triaging `docker ps -a | grep
hermes-` after a SIGKILL leaves stragglers. The previous `hermes-<hex>`
name prefix was the only signal, which broke down under cross-process
reuse (planned) and against any custom `--name` someone might pass via
`docker_extra_args`.

This commit adds three labels at `docker run` time:

  --label hermes-agent=1                # global sweep target
  --label hermes-task-id=<sanitized>    # per-task reuse key
  --label hermes-profile=<sanitized>    # per-profile isolation key

Values are sanitized to `[A-Za-z0-9_.-]` and truncated to 63 chars so the
label round-trips cleanly through `docker ps --filter label=key=value`.
Empty or non-string inputs collapse to "unknown" rather than producing
an unqueryable empty value.

No behavior change: the labels are pure metadata. The follow-up commits
in this PR (cleanup-fix + orphan reaper) are what use them.

Refs #20561
2026-05-29 11:49:54 +10:00
Teknium 300140e006 test(tui_gateway): stop reloading server module in fixture teardown (#34217)
tui_gateway.server registers two atexit hooks at module load time:
ThreadPoolExecutor shutdown (line 170) and _shutdown_sessions (line 336).
Three test files reloaded the module on each fixture teardown to reset
per-test state. Each reload re-runs module-level code, including the
atexit registrations — duplicates accumulate across the test session.

At pytest interpreter shutdown the duplicated atexit hooks race the
stderr buffer flush:

    Fatal Python error: _enter_buffered_busy: could not acquire lock
    for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown,
    possibly due to daemon threads

pytest reports 'tests passed but the slice exited non-zero', and the
shard turns red on CI. Surfaced today on PR #34193's test slice 1
(204 files, 3572 tests passed, then Fatal Python error during exit).

Fix: drop importlib.reload(mod) from the three fixtures that have it.
Per-test reset is handled by clearing the mutable session dicts
(_sessions, _pending, _answers). _methods is also no longer cleared —
it's populated at module import time and would only be re-populated by
a reload, so clearing it without reload broke session.resume /
command.dispatch / slash.exec method registration across tests.

Affected fixtures:
- tests/tui_gateway/test_goal_command.py
- tests/tui_gateway/test_protocol.py
- tests/tui_gateway/test_review_summary_callback.py

The second reload in test_protocol.py at line 211 (reload of
tui_gateway.transport) is preserved — transport.py has no atexit hooks
or threads, so reload is safe there.

Tests: 84/84 in tests/tui_gateway/ pass cleanly with exit code 0; no
Fatal Python error at interpreter shutdown.
2026-05-28 18:16:54 -07:00
Teknium e71a2bd11b chore: release v0.15.1 (2026.5.29) (#34222) 2026-05-28 18:11:49 -07:00
23 changed files with 2460 additions and 147 deletions
+110
View File
@@ -0,0 +1,110 @@
# Hermes Agent v0.15.1 (v2026.5.29)
**Release Date:** May 29, 2026
**Since v0.15.0:** 28 commits · 21 merged PRs · hotfix release · 9 contributors
> **The Patch Release.** A same-day hotfix for v0.15.0. Headline fix: the dashboard infinite-reload loop that hit anyone running v0.15.0 in loopback mode (Docker, hosted Hermes, fresh installs). A handful of other v0.15.0 follow-ups go along for the ride — kanban worker SIGTERM, `/model` picker unification, `/yolo` session bypass, the full 19,932-entry skills.sh catalog, `.md` media delivery restoration, gateway probe-stepdown safety, web-URL redaction passthrough, kanban worker vision on referenced images, hindsight observation-default. Docker users get an explicit `--insecure` opt-in env var (no more bind-host inference), MCP server bare-command PATH resolution, and arm64 PR-build cache fixes.
---
## ✨ Highlights
- **Dashboard 401 reload loop fixed** — In loopback mode the dashboard's identity probe (`/api/auth/me`) returns 401 by design, but v0.15.0's stale-token reload guard treated every 401 as a rotated session token and full-page-reloaded to pick up a fresh one. Every successful sibling call cleared the one-shot reload guard, so the page reload-looped forever (Firefox: "Navigated to /sessions" storm; Chrome: React re-render storm). Fix adds an `allowUnauthorized` opt-out to `fetchJSON` that skips only the loopback stale-token reload — 401 still throws so `AuthWidget` swallows it, gated-mode `login_url` redirects are unaffected. Closes [#34206](https://github.com/NousResearch/hermes-agent/issues/34206), [#34202](https://github.com/NousResearch/hermes-agent/issues/34202). ([#30698](https://github.com/NousResearch/hermes-agent/pull/30698) — @austinpickett)
- **Docker dashboard `--insecure` is now an explicit env opt-in, never derived from bind host** — Previously the Docker entrypoint inferred `--insecure` when the dashboard bound to a non-loopback host. That conflated "I want LAN access" with "I want to disable the same-origin guard." The fix splits them: bind host is bind host, and disabling the dashboard's loopback auth requires an explicit `HERMES_DASHBOARD_INSECURE=1`. Existing setups that genuinely wanted insecure binding must now set the env var. ([#34188](https://github.com/NousResearch/hermes-agent/pull/34188), [#34204](https://github.com/NousResearch/hermes-agent/pull/34204) — @benbarclay)
- **MCP bare command resolution under Docker** — MCP servers configured with bare commands (`npx`, `npm`, `node`) now resolve against `/usr/local/bin` so they actually launch inside the Docker image where those binaries live. v0.15.0 left these failing silently in containers when the agent's effective PATH didn't include the Node toolchain location. ([#34186](https://github.com/NousResearch/hermes-agent/pull/34186) — @benbarclay)
- **Skills page sidebar / source pills restored** — A stale `useMemo` dependency in the new dashboard skills page collapsed the source pills and category sidebar to "All" only. Fixed; both surfaces now reflect the live catalog state. ([#34194](https://github.com/NousResearch/hermes-agent/pull/34194))
- **Kanban worker can be killed again** — `SIGTERM` on a kanban worker was being absorbed by an intermediate process and the worker stayed running. Closes [#28181](https://github.com/NousResearch/hermes-agent/issues/28181). ([#34045](https://github.com/NousResearch/hermes-agent/pull/34045))
- **Full skills.sh catalog (858 → 19,932 entries)** — The skills hub page was pulling a partial paginated catalog. The fetch now walks the sitemap, so all 19,932 skills.sh entries surface in the picker instead of just the first 858. ([#34025](https://github.com/NousResearch/hermes-agent/pull/34025))
---
## 🐛 Bug Fixes
### Dashboard / Web
- **`/api/auth/me` 401 no longer triggers reload loop** in loopback mode — ([#30698](https://github.com/NousResearch/hermes-agent/pull/30698) — @austinpickett)
- **Skills page source pills + category sidebar restored** — stale `useMemo` dep ([#34194](https://github.com/NousResearch/hermes-agent/pull/34194))
### Docker
- **`--insecure` is now explicit opt-in via env var**, not derived from bind host ([#34188](https://github.com/NousResearch/hermes-agent/pull/34188) — @benbarclay)
- **Dashboard test suite repaired** to match the insecure-opt-in fix ([#34204](https://github.com/NousResearch/hermes-agent/pull/34204) — @benbarclay)
- **arm64 PR builds skip the GHA cache** to avoid cache-thrash on cross-arch builders ([#33704](https://github.com/NousResearch/hermes-agent/pull/33704) — @BROCCOLO1D)
### MCP
- **Bare `npx`/`npm`/`node` resolve against `/usr/local/bin`** for Docker compatibility ([#34186](https://github.com/NousResearch/hermes-agent/pull/34186) — @benbarclay)
### Kanban
- **Worker SIGTERM actually terminates the process** ([#34045](https://github.com/NousResearch/hermes-agent/pull/34045))
- **Workers receive images referenced in task bodies** for vision-capable models ([#34210](https://github.com/NousResearch/hermes-agent/pull/34210))
### Gateway
- **`.md` files deliver again** — media-delivery validation defaults to denylist-only instead of an overly-narrow allowlist ([#34022](https://github.com/NousResearch/hermes-agent/pull/34022))
- **Probe stepdown safety** — on a context-overflow without an explicit provider context limit, the agent no longer steps down to a smaller model based on an unknown ceiling (salvage of [#33673](https://github.com/NousResearch/hermes-agent/pull/33673)) ([#33826](https://github.com/NousResearch/hermes-agent/pull/33826))
### CLI
- **`/yolo` mid-session enables the per-session bypass** instead of just toggling the env var (which the running agent had already snapshotted) ([#33931](https://github.com/NousResearch/hermes-agent/pull/33931) — @kshitijk4poor)
- **`/model` and `hermes model` show the same list**, plus disk cache for picker startup ([#33867](https://github.com/NousResearch/hermes-agent/pull/33867))
### Skills
- **Full skills.sh catalog via sitemap** — 858 → 19,932 entries ([#34025](https://github.com/NousResearch/hermes-agent/pull/34025))
### Redaction
- **Web URLs pass through unchanged** — the redactor was eating query parameters that looked credential-shaped ([#34029](https://github.com/NousResearch/hermes-agent/pull/34029))
---
## ✨ Small Features
- **Hindsight default narrowed to observation-only** for `recall_types` — tool path is also narrowed ([#34079](https://github.com/NousResearch/hermes-agent/pull/34079) — @nicoloboschi, follow-up [#34091](https://github.com/NousResearch/hermes-agent/pull/4df62d239e38bf8c212a595721c9c01e176f6c3a) — @kshitijk4poor)
- **Memory providers receive completed-turn message context** — salvage of [#28065](https://github.com/NousResearch/hermes-agent/pull/28065) ([#34097](https://github.com/NousResearch/hermes-agent/pull/34097) — @kshitijk4poor, credit to @devwdave)
---
## 📚 Documentation
- **`--no-supervise` / `HERMES_GATEWAY_NO_SUPERVISE` documented** in the reference docs (follow-up to [#33583](https://github.com/NousResearch/hermes-agent/pull/33583)) ([#33751](https://github.com/NousResearch/hermes-agent/pull/33751) — @r266-tech)
---
## 🛠️ Infrastructure
- **Vercel deploy workflow accepts `workflow_dispatch`** so docs deploys can be manually triggered ([#34081](https://github.com/NousResearch/hermes-agent/pull/34081))
- **`@nous-research/ui` bumped to 0.18.2** (Nix `npmDepsHash` also updated to match) ([#34193](https://github.com/NousResearch/hermes-agent/pull/34193) follow-ups — @austinpickett)
---
## 👥 Contributors
### Core
- @teknium1
### Community
- @austinpickett — dashboard 401 reload-loop fix (the headline), `@nous-research/ui` bump, Nix `npmDepsHash` updates
- @benbarclay — Docker `--insecure` opt-in, MCP bare-command resolution, dashboard test repair
- @kshitijk4poor`/yolo` session bypass, completed-turn memory context salvage, hindsight follow-up docs
- @nicoloboschi — hindsight `recall_types` observation default
- @BROCCOLO1D — arm64 PR build cache fix
- @r266-tech — `--no-supervise` reference docs
- @yangguangjin — probe stepdown safety (salvage of @yanghd's #33673)
- @devwdave — completed-turn memory context (credited via salvage)
- @andrewhosf — co-author
### Issue Reporters (the 401 loop)
- @routesmith ([#34206](https://github.com/NousResearch/hermes-agent/issues/34206))
- @beeaton ([#34202](https://github.com/NousResearch/hermes-agent/issues/34202))
---
**Full Changelog**: [v2026.5.28...v2026.5.29](https://github.com/NousResearch/hermes-agent/compare/v2026.5.28...v2026.5.29)
+2 -2
View File
@@ -1,7 +1,7 @@
{
"id": "hermes-agent",
"name": "Hermes Agent",
"version": "0.15.0",
"version": "0.15.1",
"description": "Self-improving open-source AI agent by Nous Research with ACP editor integration, persistent memory, skills, and rich tool support.",
"repository": "https://github.com/NousResearch/hermes-agent",
"website": "https://hermes-agent.nousresearch.com/docs/user-guide/features/acp",
@@ -9,7 +9,7 @@
"license": "MIT",
"distribution": {
"uvx": {
"package": "hermes-agent[acp]==0.15.0",
"package": "hermes-agent[acp]==0.15.1",
"args": ["hermes-acp"]
}
}
+2
View File
@@ -576,6 +576,8 @@ def load_cli_config() -> Dict[str, Any]:
"docker_env": "TERMINAL_DOCKER_ENV",
"docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
"docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
"docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
"docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
"sandbox_dir": "TERMINAL_SANDBOX_DIR",
# Persistent shell (non-local backends)
"persistent_shell": "TERMINAL_PERSISTENT_SHELL",
+47
View File
@@ -831,6 +831,8 @@ if _config_path.exists():
"docker_env": "TERMINAL_DOCKER_ENV",
"docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
"docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
"docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
"docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
"sandbox_dir": "TERMINAL_SANDBOX_DIR",
"persistent_shell": "TERMINAL_PERSISTENT_SHELL",
}
@@ -5418,6 +5420,49 @@ class GatewayRunner:
)
stale_timeout_seconds = 0
# Read kanban.default_assignee — fallback profile for tasks
# created without an explicit assignee (e.g. via the dashboard).
# When set, the dispatcher applies it to unassigned ready tasks
# instead of skipping them indefinitely (#27145). Empty string
# (the schema default) means "no fallback, keep skipping" —
# backward-compatible with existing installs.
default_assignee = (kanban_cfg.get("default_assignee") or "").strip() or None
if default_assignee:
logger.info(
"kanban dispatcher: default_assignee=%r (unassigned ready tasks "
"will route to this profile)",
default_assignee,
)
# Read kanban.max_in_progress_per_profile — per-profile concurrency
# cap (#21582). When set, no single profile gets more than N
# workers running at once, even if the global max_in_progress
# would allow it. Prevents one profile's local model / API quota
# / browser pool from being overwhelmed by a fan-out.
raw_per_profile = kanban_cfg.get("max_in_progress_per_profile", None)
max_in_progress_per_profile = None
if raw_per_profile is not None:
try:
max_in_progress_per_profile = int(raw_per_profile)
except (TypeError, ValueError):
logger.warning(
"kanban dispatcher: invalid kanban.max_in_progress_per_profile=%r; ignoring",
raw_per_profile,
)
max_in_progress_per_profile = None
else:
if max_in_progress_per_profile < 1:
logger.warning(
"kanban dispatcher: kanban.max_in_progress_per_profile=%r is below 1; ignoring",
raw_per_profile,
)
max_in_progress_per_profile = None
else:
logger.info(
"kanban dispatcher: max_in_progress_per_profile=%d",
max_in_progress_per_profile,
)
# Initial delay so the gateway finishes wiring adapters before the
# dispatcher spawns workers (those workers may hit gateway notify
# subscriptions etc.). Matches the notifier watcher's delay.
@@ -5509,6 +5554,8 @@ class GatewayRunner:
max_in_progress=max_in_progress,
failure_limit=failure_limit,
stale_timeout_seconds=stale_timeout_seconds,
default_assignee=default_assignee,
max_in_progress_per_profile=max_in_progress_per_profile,
)
except sqlite3.DatabaseError as exc:
if _is_corrupt_board_db_error(exc):
+2 -2
View File
@@ -14,8 +14,8 @@ Provides subcommands for:
import os
import sys
__version__ = "0.15.0"
__release_date__ = "2026.5.28"
__version__ = "0.15.1"
__release_date__ = "2026.5.29"
def _ensure_utf8():
+11
View File
@@ -1726,6 +1726,15 @@ DEFAULT_CONFIG = {
# assignee to any installed profile. When unset, falls back to the
# default profile. A task never ends up with assignee=None.
"default_assignee": "",
# Per-profile concurrency cap (#21582). When set to a positive int,
# no single profile can have more than N workers running at once,
# even if the global max_in_progress / max_spawn caps would allow
# it. Tasks blocked this way defer to the next dispatcher tick.
# Unset (None) means "no per-profile cap" — backward-compatible
# with existing installs. Useful for fan-out workflows that would
# otherwise saturate one profile's local model / API quota /
# browser pool while leaving other profiles idle.
"max_in_progress_per_profile": None,
# When true, the kanban dispatcher auto-runs the decomposer on
# tasks that land in Triage (every dispatcher tick). When false,
# decomposition is manual via `hermes kanban decompose <id>` or
@@ -5551,6 +5560,8 @@ def set_config_value(key: str, value: str):
"terminal.daytona_image": "TERMINAL_DAYTONA_IMAGE",
"terminal.docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
"terminal.docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
"terminal.docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
"terminal.docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
"terminal.docker_env": "TERMINAL_DOCKER_ENV",
# terminal.cwd intentionally excluded — CLI resolves at runtime,
# gateway bridges it in gateway/run.py. Persisting to .env causes
+38
View File
@@ -2087,12 +2087,35 @@ def _cmd_tail(args: argparse.Namespace) -> int:
def _cmd_dispatch(args: argparse.Namespace) -> int:
# Honour kanban.default_assignee as the fallback for unassigned ready
# tasks (#27145) and kanban.max_in_progress_per_profile as the
# per-profile concurrency cap (#21582). Same semantics as the
# gateway dispatch path.
try:
from hermes_cli.config import load_config
_cfg = load_config()
_kanban_cfg = _cfg.get("kanban", {}) if isinstance(_cfg, dict) else {}
default_assignee = (_kanban_cfg.get("default_assignee") or "").strip() or None
_raw_per_profile = _kanban_cfg.get("max_in_progress_per_profile", None)
try:
max_in_progress_per_profile = (
int(_raw_per_profile) if _raw_per_profile is not None else None
)
if max_in_progress_per_profile is not None and max_in_progress_per_profile < 1:
max_in_progress_per_profile = None
except (TypeError, ValueError):
max_in_progress_per_profile = None
except Exception:
default_assignee = None
max_in_progress_per_profile = None
with kb.connect_closing() as conn:
res = kb.dispatch_once(
conn,
dry_run=args.dry_run,
max_spawn=args.max,
failure_limit=getattr(args, "failure_limit", kb.DEFAULT_SPAWN_FAILURE_LIMIT),
default_assignee=default_assignee,
max_in_progress_per_profile=max_in_progress_per_profile,
)
if getattr(args, "json", False):
print(json.dumps({
@@ -2108,6 +2131,11 @@ def _cmd_dispatch(args: argparse.Namespace) -> int:
],
"skipped_unassigned": res.skipped_unassigned,
"skipped_nonspawnable": res.skipped_nonspawnable,
"skipped_per_profile_capped": [
{"task_id": tid, "assignee": who, "current": current}
for (tid, who, current) in res.skipped_per_profile_capped
],
"auto_assigned_default": res.auto_assigned_default,
}, indent=2))
return 0
print(f"Reclaimed: {res.reclaimed}")
@@ -2128,8 +2156,18 @@ def _cmd_dispatch(args: argparse.Namespace) -> int:
for tid, who, ws in res.spawned:
tag = " (dry)" if args.dry_run else ""
print(f" - {tid} -> {who} @ {ws or '-'}{tag}")
if res.auto_assigned_default:
print(
f"Auto-assigned to kanban.default_assignee={default_assignee!r}: "
f"{', '.join(res.auto_assigned_default)}"
)
if res.skipped_unassigned:
print(f"Skipped (unassigned): {', '.join(res.skipped_unassigned)}")
if res.skipped_per_profile_capped:
for tid, who, current in res.skipped_per_profile_capped:
print(
f"Deferred ({who} at per-profile cap, {current} running): {tid}"
)
if res.skipped_nonspawnable:
print(
f"Skipped (non-spawnable assignee — terminal lane, OK): "
+126 -5
View File
@@ -4289,6 +4289,12 @@ class DispatchResult:
skipped_unassigned: list[str] = field(default_factory=list)
"""Ready task ids skipped because they have no assignee at all.
Operator-actionable usually a misfiled task waiting for routing."""
auto_assigned_default: list[str] = field(default_factory=list)
"""Task ids that were unassigned in the DB and had
``kanban.default_assignee`` applied this tick before spawning (#27145).
Surfaces the auto-assignment to telemetry / CLI / dashboard so the
operator can see when the dispatcher is acting on the fallback rule
rather than on explicit per-task assignments."""
skipped_nonspawnable: list[str] = field(default_factory=list)
"""Ready task ids skipped because their assignee names a control-plane
lane (a Claude Code terminal like ``orion-cc``) rather than a Hermes
@@ -4296,6 +4302,14 @@ class DispatchResult:
operator-actionable failure. Tracked separately so health telemetry
can distinguish "real stuck" (nothing spawned but spawnable work
available) from "correctly idle" (nothing spawnable in the queue)."""
skipped_per_profile_capped: list[tuple[str, str, int]] = field(default_factory=list)
"""Tasks deferred this tick because their assignee is already at
``kanban.max_in_progress_per_profile`` (#21582). Each entry is
``(task_id, assignee, current_running_count)``. NOT an
operator-actionable failure the task will be picked up on a
subsequent tick when the assignee has capacity. Separate bucket so
telemetry / dashboards can show "this profile is busy" vs
"task is genuinely stuck"."""
crashed: list[str] = field(default_factory=list)
"""Task ids reclaimed because their worker PID disappeared."""
auto_blocked: list[str] = field(default_factory=list)
@@ -5342,6 +5356,8 @@ def dispatch_once(
failure_limit: int = DEFAULT_SPAWN_FAILURE_LIMIT,
stale_timeout_seconds: int = 0,
board: Optional[str] = None,
default_assignee: Optional[str] = None,
max_in_progress_per_profile: Optional[int] = None,
) -> DispatchResult:
"""Run one dispatcher tick.
@@ -5427,12 +5443,89 @@ def dispatch_once(
if max_spawn is None or max_spawn > remaining:
max_spawn = remaining
spawned = 0
# Per-profile concurrency cap (#21582): when set, track how many
# workers each assignee already has in flight, and refuse to spawn
# when this would push that assignee past the cap. Prevents
# fan-out workloads from melting a single profile's local model /
# API quota / browser pool while leaving other profiles idle.
# Tasks blocked this way go to skipped_per_profile_capped (not
# skipped_unassigned — the operator-actionable signal is different:
# "this profile is busy, try again later" not "this needs routing").
_per_profile_cap = max_in_progress_per_profile if (
isinstance(max_in_progress_per_profile, int)
and max_in_progress_per_profile > 0
) else None
_per_profile_running: dict[str, int] = {}
if _per_profile_cap is not None:
for prow in conn.execute(
"SELECT assignee, COUNT(*) AS n FROM tasks "
"WHERE status = 'running' AND assignee IS NOT NULL "
"GROUP BY assignee"
):
_per_profile_running[prow["assignee"]] = int(prow["n"])
# Normalize default_assignee once: empty/whitespace string → None so the
# rest of the loop can use ``if default_assignee:`` as a single check.
# We also resolve profile_exists once here for the same reason.
_default_assignee = (default_assignee or "").strip() or None
_default_assignee_resolved = False
if _default_assignee:
try:
from hermes_cli.profiles import profile_exists as _pe
_default_assignee_resolved = bool(_pe(_default_assignee))
except Exception:
# Profiles module not importable (test stubs, exotic envs).
# Trust the operator's config and try the assignment; the
# downstream profile_exists check on the assigned row will
# bucket it as nonspawnable if the profile genuinely isn't
# there, with the existing diagnostic.
_default_assignee_resolved = True
for row in ready_rows:
if max_spawn is not None and running_count + spawned >= max_spawn:
break
if not row["assignee"]:
result.skipped_unassigned.append(row["id"])
continue
row_assignee = row["assignee"]
if not row_assignee:
# Honour kanban.default_assignee: when the dispatcher hits an
# unassigned ready task and an operator-configured fallback
# exists, persist the assignment and proceed. This removes the
# dashboard footgun where a task created without an assignee
# parks in 'ready' forever even though the operator's intent
# ("default") was perfectly clear (#27145). Mutating the row
# (not just the in-memory view) keeps diagnostics and the
# board state consistent: the task is now legitimately owned
# by ``kanban.default_assignee``, not "unassigned but secretly
# routed".
if _default_assignee and _default_assignee_resolved:
# Dry-run: show what WOULD happen (auto-assign + spawn) without
# mutating the DB. Real run: mutate the row + emit the
# 'assigned' event so the board state matches what just happened.
if not dry_run:
try:
with write_txn(conn):
conn.execute(
"UPDATE tasks SET assignee = ? WHERE id = ? "
"AND (assignee IS NULL OR assignee = '')",
(_default_assignee, row["id"]),
)
_append_event(
conn, row["id"], "assigned",
{
"assignee": _default_assignee,
"source": "kanban.default_assignee",
},
)
except Exception:
_log.debug(
"kanban dispatch: failed to apply default_assignee=%r "
"to task %s",
_default_assignee, row["id"], exc_info=True,
)
result.skipped_unassigned.append(row["id"])
continue
row_assignee = _default_assignee
result.auto_assigned_default.append(row["id"])
else:
result.skipped_unassigned.append(row["id"])
continue
# Skip ready tasks whose assignee is not a real Hermes profile.
# `_default_spawn` invokes ``hermes -p <assignee>`` which fails
# with "Profile 'X' does not exist" when the assignee names a
@@ -5447,7 +5540,7 @@ def dispatch_once(
from hermes_cli.profiles import profile_exists # local import: avoids cycle
except Exception:
profile_exists = None # type: ignore[assignment]
if profile_exists is not None and not profile_exists(row["assignee"]):
if profile_exists is not None and not profile_exists(row_assignee):
# Bucket separately from skipped_unassigned: the operator
# cannot fix this by assigning a profile (the assignee IS the
# intended owner — a terminal lane). Health telemetry uses
@@ -5456,6 +5549,19 @@ def dispatch_once(
# of human-pulled work.
result.skipped_nonspawnable.append(row["id"])
continue
# Per-profile concurrency cap (#21582): even if there's global
# headroom, refuse to spawn for an assignee that's already at
# its in-flight cap. Prevents one profile's local model / API
# quota / browser pool from being overwhelmed by a fan-out
# while the global max_in_progress / max_spawn caps still allow
# work on OTHER profiles.
if _per_profile_cap is not None:
current = _per_profile_running.get(row_assignee, 0)
if current >= _per_profile_cap:
result.skipped_per_profile_capped.append(
(row["id"], row_assignee, current)
)
continue
# Respawn guard: refuse to re-spawn when useful work is already
# in-flight/recent, or when the last failure is a deterministic
# blocker (quota / auth). The guard defers the spawn this tick so
@@ -5478,7 +5584,15 @@ def dispatch_once(
)
continue
if dry_run:
result.spawned.append((row["id"], row["assignee"], ""))
result.spawned.append((row["id"], row_assignee, ""))
# Increment per-profile counter even in dry_run so the cap
# check sees the would-be spawn on subsequent iterations.
# Without this, dry_run reports every task as spawnable and
# under-reports the capped subset (#21582).
if _per_profile_cap is not None and row_assignee:
_per_profile_running[row_assignee] = (
_per_profile_running.get(row_assignee, 0) + 1
)
continue
claimed = claim_task(conn, row["id"], ttl_seconds=ttl_seconds)
if claimed is None:
@@ -5521,6 +5635,13 @@ def dispatch_once(
# complete_task).
result.spawned.append((claimed.id, claimed.assignee or "", str(workspace)))
spawned += 1
# Track the new in-flight count for this profile so later
# iterations in this same tick respect the per-profile cap
# (#21582). Subsequent ticks re-query from the DB.
if _per_profile_cap is not None and claimed.assignee:
_per_profile_running[claimed.assignee] = (
_per_profile_running.get(claimed.assignee, 0) + 1
)
except Exception as exc:
auto = _record_spawn_failure(
conn, claimed.id, str(exc),
+1 -1
View File
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "hermes-agent"
version = "0.15.0"
version = "0.15.1"
description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
readme = "README.md"
requires-python = ">=3.11"
+2
View File
@@ -227,6 +227,8 @@ _HERMES_BEHAVIORAL_VARS = frozenset({
"TERMINAL_CONTAINER_DISK",
"TERMINAL_CONTAINER_MEMORY",
"TERMINAL_CONTAINER_PERSISTENT",
"TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
"TERMINAL_DOCKER_ORPHAN_REAPER",
"TERMINAL_DOCKER_RUN_AS_HOST_USER",
"BROWSER_CDP_URL",
"CAMOFOX_URL",
@@ -0,0 +1,154 @@
"""Regression tests for #27145 — kanban.default_assignee for unassigned ready tasks.
When the dispatcher hits an unassigned ready task and ``kanban.default_assignee``
is set, the dispatcher applies the assignment and spawns. Without the config,
the task is skipped (existing behavior preserved).
"""
from __future__ import annotations
import json
import os
import sys
import tempfile
import pytest
@pytest.fixture()
def isolated_kanban_home(monkeypatch):
"""Spin up a fresh HERMES_HOME with a clean kanban DB."""
test_home = tempfile.mkdtemp(prefix="kanban_default_assignee_test_")
monkeypatch.setenv("HERMES_HOME", test_home)
# Force-reimport so the fresh HERMES_HOME is picked up.
for mod in list(sys.modules.keys()):
if mod.startswith("hermes_cli") or mod.startswith("hermes_state") or mod == "hermes_constants":
del sys.modules[mod]
from hermes_cli import kanban_db
yield kanban_db, test_home
# Cleanup is best-effort; tempfile dir survives but pytest isolation
# gives each test its own monkeypatched HERMES_HOME so no cross-test
# contamination.
def _fake_spawn(*args, **kwargs):
"""Stand-in for the real worker spawn — returns a fake PID."""
return 12345
def test_unassigned_task_skipped_without_default_assignee(isolated_kanban_home):
"""Baseline: with no default_assignee, an unassigned ready task is
skipped via the existing `skipped_unassigned` bucket and the DB row
is untouched."""
kb, _home = isolated_kanban_home
with kb.connect_closing() as conn:
kb.create_board(slug="default", name="Test")
task_id = kb.create_task(conn, title="t1", assignee=None)
with kb.connect_closing() as conn:
res = kb.dispatch_once(conn, spawn_fn=_fake_spawn, dry_run=False)
assert res.skipped_unassigned == [task_id]
assert not res.auto_assigned_default
assert not res.spawned
with kb.connect_closing() as conn:
row = conn.execute("SELECT assignee FROM tasks WHERE id = ?", (task_id,)).fetchone()
assert row["assignee"] is None
def test_unassigned_task_auto_assigned_with_default_assignee(isolated_kanban_home):
"""Core #27145 contract: with default_assignee set, an unassigned ready
task gets the assignment applied and dispatched on the same tick. The
DB row is mutated (assignee column + an 'assigned' event)."""
kb, _home = isolated_kanban_home
with kb.connect_closing() as conn:
kb.create_board(slug="default", name="Test")
task_id = kb.create_task(conn, title="t1", assignee=None)
with kb.connect_closing() as conn:
res = kb.dispatch_once(
conn, spawn_fn=_fake_spawn, dry_run=False,
default_assignee="default",
)
assert res.auto_assigned_default == [task_id]
assert not res.skipped_unassigned
assert len(res.spawned) == 1
assert res.spawned[0][0] == task_id
assert res.spawned[0][1] == "default"
with kb.connect_closing() as conn:
row = conn.execute("SELECT assignee FROM tasks WHERE id = ?", (task_id,)).fetchone()
assert row["assignee"] == "default"
# 'assigned' event emitted for the audit trail
with kb.connect_closing() as conn:
evs = list(conn.execute(
"SELECT kind, payload FROM task_events WHERE task_id = ? AND kind = 'assigned'",
(task_id,),
))
assert len(evs) == 1
payload = json.loads(evs[0][1])
assert payload["assignee"] == "default"
assert payload["source"] == "kanban.default_assignee"
def test_dry_run_with_default_assignee_reports_without_mutating(isolated_kanban_home):
"""Dry-run mode: reports what WOULD happen (task in auto_assigned_default,
spawn entry) but does NOT mutate the DB. Operators using
`hermes kanban dispatch --dry-run` see the routing decision before
committing."""
kb, _home = isolated_kanban_home
with kb.connect_closing() as conn:
kb.create_board(slug="default", name="Test")
task_id = kb.create_task(conn, title="t1", assignee=None)
with kb.connect_closing() as conn:
res = kb.dispatch_once(
conn, spawn_fn=_fake_spawn, dry_run=True,
default_assignee="default",
)
assert res.auto_assigned_default == [task_id]
assert len(res.spawned) == 1
with kb.connect_closing() as conn:
row = conn.execute("SELECT assignee FROM tasks WHERE id = ?", (task_id,)).fetchone()
# DB unchanged — dry_run did not commit the assignment.
assert row["assignee"] is None
def test_whitespace_default_assignee_treated_as_none(isolated_kanban_home):
"""Empty / whitespace-only default_assignee values must be treated as
'no fallback set' so a misconfigured kanban.default_assignee=' '
doesn't surprise operators by silently routing unassigned tasks."""
kb, _home = isolated_kanban_home
with kb.connect_closing() as conn:
kb.create_board(slug="default", name="Test")
task_id = kb.create_task(conn, title="t1", assignee=None)
with kb.connect_closing() as conn:
res = kb.dispatch_once(
conn, spawn_fn=_fake_spawn, dry_run=False,
default_assignee=" ",
)
assert task_id in res.skipped_unassigned
assert not res.auto_assigned_default
def test_explicitly_assigned_task_untouched_by_default_assignee(isolated_kanban_home):
"""A task with an explicit assignee must NOT be touched by the
default_assignee logic — that fallback only applies to genuinely
unassigned rows."""
kb, _home = isolated_kanban_home
with kb.connect_closing() as conn:
kb.create_board(slug="default", name="Test")
task_id = kb.create_task(conn, title="t1", assignee="default")
with kb.connect_closing() as conn:
res = kb.dispatch_once(
conn, spawn_fn=_fake_spawn, dry_run=False,
default_assignee="someother",
)
assert task_id not in res.auto_assigned_default
assert any(s[0] == task_id and s[1] == "default" for s in res.spawned)
def test_dispatch_result_has_auto_assigned_default_field():
"""Schema-level invariant: DispatchResult exposes the
auto_assigned_default field so CLI / dashboard / gateway can surface
the new routing decisions."""
from hermes_cli.kanban_db import DispatchResult
r = DispatchResult()
assert hasattr(r, "auto_assigned_default")
assert r.auto_assigned_default == []
@@ -0,0 +1,167 @@
"""Regression tests for #21582 — per-profile concurrency cap in dispatcher.
When ``kanban.max_in_progress_per_profile`` is set, no single profile
gets more than N workers running at once even if the global
``max_in_progress`` cap would allow it. Prevents one profile's local
model / API quota / browser pool from being overwhelmed by a fan-out.
"""
from __future__ import annotations
import os
import sys
import tempfile
import pytest
@pytest.fixture()
def isolated_kanban_home_with_profiles(monkeypatch):
"""Spin up a fresh HERMES_HOME with kanban DB + alpha/beta profiles."""
test_home = tempfile.mkdtemp(prefix="kanban_per_profile_cap_test_")
for prof in ("alpha", "beta", "default"):
os.makedirs(os.path.join(test_home, "profiles", prof), exist_ok=True)
monkeypatch.setenv("HERMES_HOME", test_home)
for mod in list(sys.modules.keys()):
if mod.startswith("hermes_cli") or mod.startswith("hermes_state") or mod == "hermes_constants":
del sys.modules[mod]
from hermes_cli import kanban_db
yield kanban_db
def _fake_spawn(*args, **kwargs):
return 12345
def test_no_cap_all_tasks_dispatched(isolated_kanban_home_with_profiles):
"""Baseline: with no per-profile cap, all ready tasks dispatch."""
kb = isolated_kanban_home_with_profiles
with kb.connect_closing() as conn:
kb.create_board(slug="default", name="Test")
for i in range(5):
kb.create_task(conn, title=f"a{i}", assignee="alpha")
for i in range(3):
kb.create_task(conn, title=f"b{i}", assignee="beta")
with kb.connect_closing() as conn:
res = kb.dispatch_once(conn, spawn_fn=_fake_spawn, dry_run=True)
assert len(res.spawned) == 8
assert not res.skipped_per_profile_capped
def test_cap_2_balances_two_profiles(isolated_kanban_home_with_profiles):
"""With cap=2: 2 alpha + 2 beta dispatched; remaining 3 alpha + 1 beta
deferred to skipped_per_profile_capped."""
kb = isolated_kanban_home_with_profiles
with kb.connect_closing() as conn:
kb.create_board(slug="default", name="Test")
for i in range(5):
kb.create_task(conn, title=f"a{i}", assignee="alpha")
for i in range(3):
kb.create_task(conn, title=f"b{i}", assignee="beta")
with kb.connect_closing() as conn:
res = kb.dispatch_once(
conn, spawn_fn=_fake_spawn, dry_run=True,
max_in_progress_per_profile=2,
)
spawn_assignees = [s[1] for s in res.spawned]
capped_assignees = [c[1] for c in res.skipped_per_profile_capped]
assert spawn_assignees.count("alpha") == 2
assert spawn_assignees.count("beta") == 2
assert capped_assignees.count("alpha") == 3
assert capped_assignees.count("beta") == 1
def test_pre_existing_running_counts_against_cap(isolated_kanban_home_with_profiles):
"""A task already in 'running' status when dispatch_once starts counts
toward the per-profile cap. With 1 alpha pre-running and cap=1, NO new
alpha tasks should spawn; beta is independent so 1 beta spawns."""
kb = isolated_kanban_home_with_profiles
with kb.connect_closing() as conn:
kb.create_board(slug="default", name="Test")
running_alpha = kb.create_task(conn, title="running alpha", assignee="alpha")
with kb.write_txn(conn):
conn.execute(
"UPDATE tasks SET status = 'running', claim_lock = 'test:1' WHERE id = ?",
(running_alpha,),
)
for i in range(2):
kb.create_task(conn, title=f"a{i}", assignee="alpha")
for i in range(2):
kb.create_task(conn, title=f"b{i}", assignee="beta")
with kb.connect_closing() as conn:
res = kb.dispatch_once(
conn, spawn_fn=_fake_spawn, dry_run=True,
max_in_progress_per_profile=1,
)
spawn_assignees = [s[1] for s in res.spawned]
capped_assignees = [c[1] for c in res.skipped_per_profile_capped]
assert spawn_assignees.count("alpha") == 0
assert spawn_assignees.count("beta") == 1
assert capped_assignees.count("alpha") == 2
assert capped_assignees.count("beta") == 1
@pytest.mark.parametrize("cap", [0, -1, "abc", None])
def test_invalid_cap_treated_as_no_cap(isolated_kanban_home_with_profiles, cap):
"""Cap values that don't represent a positive int should be treated as
'no cap' — silently falling through rather than crashing the dispatcher."""
kb = isolated_kanban_home_with_profiles
with kb.connect_closing() as conn:
kb.create_board(slug="default", name="Test")
for i in range(3):
kb.create_task(conn, title=f"a{i}", assignee="alpha")
with kb.connect_closing() as conn:
res = kb.dispatch_once(
conn, spawn_fn=_fake_spawn, dry_run=True,
max_in_progress_per_profile=cap,
)
assert not res.skipped_per_profile_capped
assert len(res.spawned) == 3
def test_capped_tasks_dispatched_on_subsequent_tick(isolated_kanban_home_with_profiles):
"""A task deferred this tick because its profile was at cap should be
eligible for dispatch on the next tick (after running tasks complete).
This verifies the cap is per-tick state, not a permanent block."""
kb = isolated_kanban_home_with_profiles
with kb.connect_closing() as conn:
kb.create_board(slug="default", name="Test")
ids = [kb.create_task(conn, title=f"a{i}", assignee="alpha") for i in range(3)]
# First tick: cap=1, only 1 alpha dispatched
with kb.connect_closing() as conn:
res1 = kb.dispatch_once(
conn, spawn_fn=_fake_spawn, dry_run=False,
max_in_progress_per_profile=1,
)
assert len(res1.spawned) == 1
assert len(res1.skipped_per_profile_capped) == 2
# Simulate the running task completing — set it back to done so the
# 'running' count drops
spawned_id = res1.spawned[0][0]
with kb.connect_closing() as conn:
with kb.write_txn(conn):
conn.execute(
"UPDATE tasks SET status = 'done', claim_lock = NULL WHERE id = ?",
(spawned_id,),
)
# Second tick: 1 more alpha should now dispatch
with kb.connect_closing() as conn:
res2 = kb.dispatch_once(
conn, spawn_fn=_fake_spawn, dry_run=False,
max_in_progress_per_profile=1,
)
assert len(res2.spawned) == 1
assert len(res2.skipped_per_profile_capped) == 1
assert res2.spawned[0][0] != spawned_id # different task this time
def test_dispatch_result_has_skipped_per_profile_capped_field():
"""Schema-level invariant: DispatchResult exposes the
skipped_per_profile_capped field as a list of
(task_id, assignee, current_running) tuples."""
from hermes_cli.kanban_db import DispatchResult
r = DispatchResult()
assert hasattr(r, "skipped_per_profile_capped")
assert r.skipped_per_profile_capped == []
+866 -12
View File
@@ -203,25 +203,43 @@ def test_auto_mount_replaces_persistent_workspace_bind(monkeypatch, tmp_path):
def test_non_persistent_cleanup_removes_container(monkeypatch):
"""When persistent=false, cleanup() must schedule docker stop + rm."""
"""When persist_across_processes=false, cleanup() must docker stop AND
docker rm so containers don't leak across hermes processes.
Updated for issue #20561: the previous implementation used fire-and-forget
``subprocess.Popen("... &", shell=True)`` which raced with parent exit;
the new implementation uses ``subprocess.run`` on a daemon thread with
bounded timeouts. See test_cleanup_with_persist_disabled_stops_and_rms
for the full behavior contract.
"""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
calls = _mock_subprocess_run(monkeypatch)
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
# Run the worker thread synchronously so assertions can observe its work.
import threading
monkeypatch.setattr(threading, "Thread", _FakeThread)
popen_cmds = []
monkeypatch.setattr(
docker_env.subprocess, "Popen",
lambda cmd, **kw: (popen_cmds.append(cmd), type("P", (), {"poll": lambda s: 0, "wait": lambda s, **k: None, "returncode": 0, "stdout": iter([]), "stdin": None})())[1],
env = docker_env.DockerEnvironment(
image="python:3.11", cwd="/root", timeout=60,
task_id="ephemeral-task", persistent_filesystem=False,
persist_across_processes=False,
)
env = _make_dummy_env(persistent_filesystem=False, task_id="ephemeral-task")
assert env._container_id
container_id = env._container_id
assert container_id
# Capture cleanup-time docker calls (everything before this was init).
cleanup_calls = []
real_run = docker_env.subprocess.run
def _capture(cmd, **kw):
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kw))
return real_run(cmd, **kw)
monkeypatch.setattr(docker_env.subprocess, "run", _capture)
env.cleanup()
# Should have stop and rm calls via Popen
stop_cmds = [c for c in popen_cmds if container_id in str(c) and "stop" in str(c)]
assert len(stop_cmds) >= 1, f"cleanup() should schedule docker stop for {container_id}"
stops = [c for c in cleanup_calls if isinstance(c[0], list) and c[0][1:2] == ["stop"]]
assert stops, f"cleanup() should docker stop {container_id}; got {cleanup_calls}"
class _FakePopen:
@@ -514,3 +532,839 @@ def test_run_as_host_user_warns_and_skips_when_no_posix_ids(monkeypatch, caplog)
"does not expose POSIX uid/gid" in rec.getMessage()
for rec in caplog.records
), "expected a warning when POSIX ids are unavailable"
# ── Docker labels (issue #20561) ──────────────────────────────────
def _run_args_from_calls(calls):
"""Pull the argv list passed to the first ``docker run`` invocation."""
run_calls = [
c for c in calls
if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"
]
assert run_calls, "docker run should have been called"
return run_calls[0][0]
def _labels_in_run_args(run_args):
"""Return the set of ``key=value`` strings passed via ``--label``."""
return {
run_args[i + 1]
for i, flag in enumerate(run_args[:-1])
if flag == "--label"
}
def test_run_command_tags_hermes_agent_label(monkeypatch):
"""Every container hermes-agent starts must carry the hermes-agent=1 label
so the orphan reaper (and external operators) can identify them with a
single ``docker ps --filter label=hermes-agent=1`` call. Regression test
for issue #20561 — without the label there is no global sweep target."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
calls = _mock_subprocess_run(monkeypatch)
_make_dummy_env(task_id="my-task")
labels = _labels_in_run_args(_run_args_from_calls(calls))
assert "hermes-agent=1" in labels, (
f"hermes-agent=1 label missing; got labels: {sorted(labels)}"
)
def test_run_command_tags_task_and_profile_labels(monkeypatch):
"""task_id and the active profile name are surfaced as labels so future
cross-process reuse logic can filter to a specific (task, profile) pair
without parsing container names. Profile resolution uses the helper that
returns ``"default"`` for the root Hermes home."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "research-bot")
calls = _mock_subprocess_run(monkeypatch)
_make_dummy_env(task_id="kanban-42")
labels = _labels_in_run_args(_run_args_from_calls(calls))
assert "hermes-task-id=kanban-42" in labels, (
f"hermes-task-id=kanban-42 missing; got: {sorted(labels)}"
)
assert "hermes-profile=research-bot" in labels, (
f"hermes-profile=research-bot missing; got: {sorted(labels)}"
)
def test_label_sanitizer_rejects_invalid_characters():
"""Docker label values must be alnum + ``_.-`` and ≤63 chars. Profile or
task names containing slashes, colons, or unicode would otherwise emit
invalid labels that round-trip badly through ``docker ps --filter``."""
assert docker_env._sanitize_label_value("plain-name_1.0") == "plain-name_1.0"
assert docker_env._sanitize_label_value("with/slash") == "with_slash"
assert docker_env._sanitize_label_value("with:colon") == "with_colon"
assert docker_env._sanitize_label_value("emoji-😀-here") == "emoji-_-here"
# Empty / non-string inputs must collapse to a queryable token, not "".
assert docker_env._sanitize_label_value("") == "unknown"
assert docker_env._sanitize_label_value(None) == "unknown" # type: ignore[arg-type]
# >63 chars must truncate, not error.
long_value = "x" * 100
assert len(docker_env._sanitize_label_value(long_value)) == 63
def test_run_command_sanitizes_unsafe_task_id(monkeypatch):
"""A task_id containing characters Docker rejects in label values must be
sanitized before reaching ``docker run --label``; otherwise the daemon
refuses the run with an inscrutable error and the agent's first command
blows up."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
calls = _mock_subprocess_run(monkeypatch)
_make_dummy_env(task_id="task/with:weird*chars")
labels = _labels_in_run_args(_run_args_from_calls(calls))
# Each non-OK character becomes an underscore; the safe chars survive.
assert "hermes-task-id=task_with_weird_chars" in labels, (
f"sanitized task-id label missing; got: {sorted(labels)}"
)
def test_labels_attribute_populated_after_init(monkeypatch):
"""``self._labels`` must be set to the same key/value pairs that went onto
docker run, so subsequent reuse / reaper paths can match without re-running
the sanitizer or re-importing the profile module."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
env = _make_dummy_env(task_id="abc")
assert env._labels == {
"hermes-agent": "1",
"hermes-task-id": "abc",
"hermes-profile": "default",
}
# ── Cross-process container reuse (issue #20561) ──────────────────
def _mock_subprocess_run_with_reuse(monkeypatch, ps_state: str | None,
start_succeeds: bool = True):
"""Reuse-aware subprocess.run mock.
``ps_state`` controls what ``docker ps -a --filter ...`` returns:
* ``None`` → no match (empty stdout). Forces a fresh ``docker run``.
* ``"running"`` / ``"exited"`` / ... → emit ``CID\\tSTATE`` so the reuse
path picks it up. ``"running"`` skips ``docker start``; other states
trigger ``docker start`` (which can be forced to fail via
``start_succeeds=False``).
Returns the captured call list so the test can verify which docker
commands actually ran.
"""
calls = []
def _run(cmd, **kwargs):
calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
if isinstance(cmd, list) and len(cmd) >= 2:
sub = cmd[1]
if sub == "version":
return subprocess.CompletedProcess(cmd, 0, stdout="Docker version", stderr="")
if sub == "ps":
if ps_state is None:
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
return subprocess.CompletedProcess(
cmd, 0, stdout=f"reused-cid\t{ps_state}\n", stderr="",
)
if sub == "start":
if not start_succeeds:
# Real subprocess.run with check=True raises on non-zero exit;
# mirror that so the production code's except clause fires.
raise subprocess.CalledProcessError(1, cmd, output="", stderr="no such container")
return subprocess.CompletedProcess(cmd, 0, stdout="reused-cid\n", stderr="")
if sub == "run":
return subprocess.CompletedProcess(cmd, 0, stdout="fresh-cid\n", stderr="")
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
monkeypatch.setattr(docker_env.subprocess, "run", _run)
return calls
def test_reuse_attaches_to_running_container_without_docker_run(monkeypatch):
"""When a labeled container is already ``running``, the reuse probe
must pick it up and skip ``docker run`` entirely. Regression for the
issue #20561 root cause: every Hermes process spawning a new container
despite docs claiming "ONE long-lived container shared across sessions"."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="running")
env = _make_dummy_env(task_id="reuse-test")
# The reuse path must populate _container_id from the ps probe output.
assert env._container_id == "reused-cid", (
f"expected reused container id, got {env._container_id!r}"
)
# And it must NOT have run `docker run`.
run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
assert not run_invocations, (
f"docker run should be skipped on reuse, got: {run_invocations}"
)
# And it must have NOT issued a `docker start` for an already-running container.
start_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "start"]
assert not start_invocations, (
f"docker start should be skipped when container already running, got: {start_invocations}"
)
def test_reuse_starts_stopped_container_before_attaching(monkeypatch):
"""A labeled container in ``exited`` state must be restarted via
``docker start`` before the new Hermes process uses it. Without this
step, ``docker exec`` against a stopped container errors out and the
first agent command fails opaquely."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="exited")
env = _make_dummy_env(task_id="reuse-stopped")
assert env._container_id == "reused-cid"
start_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "start"]
assert start_invocations, "expected docker start for exited container"
run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
assert not run_invocations, "should not docker run when reusing an exited container"
def test_reuse_falls_back_to_fresh_run_when_start_fails(monkeypatch):
"""If ``docker start`` on the matched container fails (container was
removed between probe and start, daemon paused, etc.), the code must
silently fall through to a fresh ``docker run`` rather than leaving the
user with a broken environment. Defensive recovery — the probe is best-
effort, not authoritative."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
calls = _mock_subprocess_run_with_reuse(
monkeypatch, ps_state="exited", start_succeeds=False,
)
env = _make_dummy_env(task_id="reuse-broken-start")
# docker start should be attempted then fail; code falls through to run.
assert env._container_id == "fresh-cid", (
f"expected fresh container id after fallback, got {env._container_id!r}"
)
run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
assert run_invocations, "fallback to fresh docker run must happen on start failure"
def test_no_reuse_when_persist_across_processes_disabled(monkeypatch):
"""Opt-out path: ``persist_across_processes=False`` skips the ps probe
entirely and always starts a fresh container, matching the pre-fix
behavior for users who want hard per-process isolation."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
# ps_state=running would trigger reuse if the probe ran — assert it doesn't.
calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="running")
env = docker_env.DockerEnvironment(
image="python:3.11", cwd="/root", timeout=60,
task_id="no-reuse", persist_across_processes=False,
)
# Must NOT have issued docker ps (the probe is gated by the flag).
ps_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "ps"]
assert not ps_invocations, (
f"docker ps probe should be skipped when persist_across_processes=False, got: {ps_invocations}"
)
# Should have started a fresh container.
assert env._container_id == "fresh-cid"
def test_find_reusable_container_prefers_running_over_stopped(monkeypatch):
"""When the probe returns multiple matches (shouldn't normally happen,
but can after a crash leaves stale duplicates), a ``running`` container
is preferred over any stopped one. The duplicate gets reaped later by
the orphan reaper; we don't try to be heroic about it here."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
def _run(cmd, **kwargs):
if isinstance(cmd, list) and len(cmd) >= 2:
if cmd[1] == "version":
return subprocess.CompletedProcess(cmd, 0, stdout="ok", stderr="")
if cmd[1] == "ps":
# Two matches: stopped first, running second.
return subprocess.CompletedProcess(
cmd, 0,
stdout="stopped-cid\texited\nrunning-cid\trunning\n",
stderr="",
)
return subprocess.CompletedProcess(cmd, 0, stdout="fresh-cid\n", stderr="")
monkeypatch.setattr(docker_env.subprocess, "run", _run)
env = _make_dummy_env(task_id="dup-match")
assert env._container_id == "running-cid", (
f"running container should win over stopped duplicate, got {env._container_id!r}"
)
# ── Cleanup correctness (issue #20561) ────────────────────────────
class _FakeThread:
"""Stand-in for threading.Thread that captures target/args and calls
target() synchronously when .start() runs, so cleanup behavior is
observable without actually backgrounding subprocess calls."""
def __init__(self, target=None, daemon=None, name=None):
self._target = target
self.daemon = daemon
self.name = name
self._done = False
def start(self):
if self._target is not None:
self._target()
self._done = True
def is_alive(self):
return not self._done
def join(self, timeout=None):
self._done = True
def _install_fake_thread(monkeypatch):
import threading
monkeypatch.setattr(threading, "Thread", _FakeThread)
def test_cleanup_with_persist_is_noop_for_container(monkeypatch):
"""``persist_across_processes=True`` (default) cleanup must NEITHER stop
NOR remove the container — the docs promise "ONE long-lived container
shared across sessions", and any docker stop would kill background
processes inside the container (npm watchers, pytest watchers, etc.).
Resource reclamation in this mode happens via the orphan reaper on next
Hermes startup, not on graceful exit. Issue #20561 — the first iteration
of this PR did docker stop here, which Ben caught as contradicting the
"ONE long-lived container" semantics."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
_install_fake_thread(monkeypatch)
env = _make_dummy_env(task_id="cleanup-persist", persistent_filesystem=False)
# Default persist_across_processes=True.
container_id = env._container_id
assert container_id
cleanup_calls = []
real_run = docker_env.subprocess.run
def _capturing_run(cmd, **kwargs):
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
return real_run(cmd, **kwargs)
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
env.cleanup()
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
assert not stops, (
f"docker stop must NOT be called when persist_across_processes=True; "
f"container has to stay running so background processes survive. "
f"Got: {stops}"
)
assert not rms, (
f"docker rm must NOT be called when persist_across_processes=True; "
f"reuse would be impossible. Got: {rms}"
)
# The in-process handle must still be cleared so the next __init__
# re-probes via labels (and reuses the still-running container).
assert env._container_id is None, (
"in-process container_id should be cleared even in no-op cleanup"
)
def test_cleanup_force_remove_stops_and_rms_even_in_persist_mode(monkeypatch):
"""``cleanup(force_remove=True)`` must stop AND rm the container even
when ``persist_across_processes=True``. This is the explicit-teardown
path for ``/reset``, ``cleanup_vm(task_id, force_remove=True)``, and any
future caller that wants a guaranteed fresh container.
Without this kwarg, callers in persist mode would have no way to force a
fresh container without also flipping the global config — too coarse for
a per-task reset.
"""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
_install_fake_thread(monkeypatch)
env = _make_dummy_env(task_id="cleanup-force", persistent_filesystem=False)
assert env._container_id
cleanup_calls = []
real_run = docker_env.subprocess.run
def _capturing_run(cmd, **kwargs):
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
return real_run(cmd, **kwargs)
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
env.cleanup(force_remove=True)
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
assert stops, f"force_remove must docker stop; got: {cleanup_calls}"
assert rms, f"force_remove must docker rm; got: {cleanup_calls}"
def test_cleanup_vm_default_honors_persist_mode(monkeypatch):
"""``cleanup_vm(task_id)`` without ``force_remove=True`` must be a no-op
for a persist-mode container.
Regression for the bug Ben caught after commit 4: ``AIAgent.close()``
(which is called from ``tui_gateway/server.py`` on session.close, from
``gateway/run.py`` on per-session teardown, and from per-turn cleanup)
calls ``cleanup_vm(task_id)``. If that defaulted to ``force_remove=True``
we'd tear down the container on every TUI session close, defeating the
"ONE long-lived container shared across sessions" contract.
"""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
_install_fake_thread(monkeypatch)
from tools import terminal_tool
env = _make_dummy_env(task_id="session-close-test")
container_id = env._container_id
terminal_tool._active_environments["session-close-test"] = env
cleanup_calls = []
real_run = docker_env.subprocess.run
def _capturing_run(cmd, **kwargs):
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
return real_run(cmd, **kwargs)
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
try:
terminal_tool.cleanup_vm("session-close-test")
finally:
terminal_tool._active_environments.pop("session-close-test", None)
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
assert not stops, (
f"cleanup_vm() default must not docker stop a persist-mode container; "
f"got: {stops}"
)
assert not rms, (
f"cleanup_vm() default must not docker rm a persist-mode container; "
f"got: {rms}"
)
def test_cleanup_vm_force_remove_tears_down_persist_container(monkeypatch):
"""``cleanup_vm(task_id, force_remove=True)`` tears down a persist-mode
container — the explicit-teardown path for ``/reset``-style flows.
Also pins the runtime-signature-inspection plumbing: the kwarg must
actually flow through ``cleanup_vm`` into the backend's ``cleanup()``.
"""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
_install_fake_thread(monkeypatch)
from tools import terminal_tool
env = _make_dummy_env(task_id="explicit-teardown-test")
terminal_tool._active_environments["explicit-teardown-test"] = env
cleanup_calls = []
real_run = docker_env.subprocess.run
def _capturing_run(cmd, **kwargs):
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
return real_run(cmd, **kwargs)
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
try:
terminal_tool.cleanup_vm("explicit-teardown-test", force_remove=True)
finally:
terminal_tool._active_environments.pop("explicit-teardown-test", None)
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
assert stops, f"force_remove must reach docker stop; got: {cleanup_calls}"
assert rms, f"force_remove must reach docker rm; got: {cleanup_calls}"
def test_cleanup_with_persist_disabled_stops_and_rms(monkeypatch):
"""``persist_across_processes=False`` cleanup must docker stop AND docker
rm so containers don't leak. Crucially, this runs regardless of the
``persistent_filesystem`` setting — the original code only rm'd when
``not self._persistent``, which meant the default-on ``container_persistent:
true`` users (the documented happy path) leaked Exited containers forever.
Issue #20561 root-cause fix."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
_install_fake_thread(monkeypatch)
# Note: persistent_filesystem=True (the prior-leak scenario) + the new
# cross-process toggle OFF must still result in a clean rm.
env = docker_env.DockerEnvironment(
image="python:3.11", cwd="/root", timeout=60,
task_id="cleanup-no-persist", persistent_filesystem=True,
persist_across_processes=False,
)
cleanup_calls = []
real_run = docker_env.subprocess.run
def _capturing_run(cmd, **kwargs):
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
return real_run(cmd, **kwargs)
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
env.cleanup()
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
assert stops, "expected docker stop"
assert rms, (
"docker rm MUST run when persist_across_processes=False, even with "
"persistent_filesystem=True — that gating was the leak source in #20561."
)
def test_cleanup_uses_subprocess_run_not_detached_shell(monkeypatch):
"""The pre-fix code used ``subprocess.Popen("... &", shell=True)`` which
raced with parent-process exit and silently dropped cleanup work. The
new code must use ``subprocess.run`` with bounded ``timeout=`` so the
work actually completes within the process lifetime.
Asserts cleanup never reaches into shell-mode Popen. Uses
``force_remove=True`` so cleanup actually issues docker calls — the
default persist-mode path is now a no-op (commit 4) and would trivially
pass this assertion without exercising the docker code at all.
"""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
_install_fake_thread(monkeypatch)
def _forbidden_popen(*args, **kwargs):
raise AssertionError(
f"cleanup must not use subprocess.Popen anymore (issue #20561); "
f"got args={args} kwargs={kwargs}"
)
monkeypatch.setattr(docker_env.subprocess, "Popen", _forbidden_popen)
env = _make_dummy_env(task_id="no-popen-cleanup")
env.cleanup(force_remove=True) # must not raise
def test_wait_for_cleanup_returns_true_when_no_thread_started():
"""``wait_for_cleanup`` must be a no-op when ``cleanup`` was never called
(or the env has no live cleanup thread) — atexit calls it unconditionally
across all active envs, so a False return would falsely flag healthy
shutdowns."""
env = docker_env.DockerEnvironment.__new__(docker_env.DockerEnvironment)
# No _cleanup_thread set — simulates an env that was never cleanup()'d.
assert env.wait_for_cleanup(timeout=1.0) is True
def test_wait_for_cleanup_after_cleanup_returns_true(monkeypatch):
"""End-to-end: cleanup() starts a thread, wait_for_cleanup() joins it
and reports completion. Atexit relies on this contract to ensure docker
stop/rm actually finishes before the Python interpreter exits.
Uses ``force_remove=True`` so cleanup actually starts a worker thread —
the default persist-mode cleanup is a no-op (commit 4) and never spawns
a thread, so the trivial "no thread" branch of wait_for_cleanup is
already covered by the previous test.
"""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
_mock_subprocess_run(monkeypatch)
_install_fake_thread(monkeypatch)
env = _make_dummy_env(task_id="wait-test")
env.cleanup(force_remove=True)
assert env.wait_for_cleanup(timeout=5.0) is True
def test_cleanup_on_env_with_no_container_id_does_not_raise(monkeypatch):
"""A DockerEnvironment whose ``__init__`` failed before the container_id
was set (image-pull error, docker daemon down) should still be safe to
cleanup() — the post-creation failure path in callers always tries.
Without this guard the daemon-down case used to NameError on the cleanup
branch."""
env = docker_env.DockerEnvironment.__new__(docker_env.DockerEnvironment)
env._container_id = None
env._persistent = False
env._workspace_dir = None
env._home_dir = None
# No exception expected.
env.cleanup()
# ── Orphan reaper (issue #20561) ──────────────────────────────────
def _now_iso(offset_seconds: int = 0) -> str:
"""Return an RFC3339 timestamp ``offset_seconds`` in the past."""
import datetime
t = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(seconds=offset_seconds)
# Format like Docker emits — with nanoseconds-style trailing digits.
return t.isoformat().replace("+00:00", ".123456789Z")
def _reaper_run_mock(monkeypatch, ps_ids: list[str], inspect_responses: dict[str, str],
rm_succeeds: bool = True):
"""Build a subprocess.run mock for reaper tests.
* ``ps_ids`` — what ``docker ps -a --filter ... --format '{{.ID}}'`` returns
* ``inspect_responses[cid]`` — what ``docker inspect ... FinishedAt`` returns
for each cid; ``""`` means "field unset".
* ``rm_succeeds`` — whether ``docker rm -f`` returns 0.
Captures every call so tests can assert which containers were rm'd.
"""
calls = []
def _run(cmd, **kwargs):
calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
if not isinstance(cmd, list) or len(cmd) < 2:
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
sub = cmd[1]
if sub == "ps":
return subprocess.CompletedProcess(
cmd, 0, stdout="\n".join(ps_ids) + ("\n" if ps_ids else ""), stderr="",
)
if sub == "inspect":
# cmd is [docker, inspect, --format, '{{.State.FinishedAt}}', cid]
cid = cmd[-1]
return subprocess.CompletedProcess(
cmd, 0, stdout=inspect_responses.get(cid, "") + "\n", stderr="",
)
if sub == "rm":
return subprocess.CompletedProcess(
cmd, 0 if rm_succeeds else 1,
stdout="", stderr="" if rm_succeeds else "no such container",
)
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
monkeypatch.setattr(docker_env.subprocess, "run", _run)
return calls
def test_reap_orphan_returns_zero_when_no_matches(monkeypatch):
"""No labeled containers → no rm calls, returns 0. Establishes the
happy-path baseline for the orphan reaper (issue #20561)."""
calls = _reaper_run_mock(monkeypatch, ps_ids=[], inspect_responses={})
removed = docker_env.reap_orphan_containers(
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
)
assert removed == 0
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
assert not rms, "no rm calls expected when ps returns empty"
def test_reap_orphan_removes_stale_exited_container(monkeypatch):
"""An Exited container older than max_age_seconds must be removed.
This is the core repair path for issue #20561 — without the reaper,
SIGKILL'd Hermes processes leak containers permanently."""
old = _now_iso(offset_seconds=900) # 15 minutes ago
calls = _reaper_run_mock(
monkeypatch, ps_ids=["old-cid"], inspect_responses={"old-cid": old},
)
removed = docker_env.reap_orphan_containers(
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
)
assert removed == 1
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
assert len(rms) == 1
assert "old-cid" in rms[0][0], f"expected rm of old-cid, got {rms[0][0]}"
def test_reap_orphan_spares_recently_exited_container(monkeypatch):
"""A container exited within max_age_seconds must NOT be reaped — that
container belongs to a Hermes process that just finished and may be
about to be replaced. Conservative window prevents racing sibling
processes."""
recent = _now_iso(offset_seconds=60) # 1 minute ago
calls = _reaper_run_mock(
monkeypatch, ps_ids=["recent-cid"], inspect_responses={"recent-cid": recent},
)
removed = docker_env.reap_orphan_containers(
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
)
assert removed == 0
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
assert not rms, f"recent container must not be reaped, got rm calls: {rms}"
def test_reap_orphan_scopes_to_profile_filter_via_label(monkeypatch):
"""The reaper must pass ``--filter label=hermes-profile=<profile>`` to
docker ps so it never sweeps another profile's containers. A research
profile must not tear down the default profile's stragglers."""
calls = _reaper_run_mock(monkeypatch, ps_ids=[], inspect_responses={})
docker_env.reap_orphan_containers(
max_age_seconds=600, profile_filter="research-bot", docker_exe="/usr/bin/docker",
)
ps_calls = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["ps"]]
assert ps_calls, "expected at least one docker ps call"
flat = " ".join(ps_calls[0][0])
assert "label=hermes-profile=research-bot" in flat, (
f"profile filter not applied to docker ps; got args: {ps_calls[0][0]}"
)
assert "label=hermes-agent=1" in flat, (
f"hermes-agent label filter must also be applied; got: {ps_calls[0][0]}"
)
assert "status=exited" in flat, (
"must filter to exited containers only — running containers may "
"belong to a sibling Hermes process and must NEVER be reaped"
)
def test_reap_orphan_skips_container_with_unparseable_finished_at(monkeypatch):
"""If docker inspect returns the zero-value ``0001-01-01T00:00:00Z`` (no
FinishedAt yet) or an unparseable timestamp, the reaper must leave the
container alone. Defensive — never reap a container whose age we can't
determine."""
calls = _reaper_run_mock(
monkeypatch,
ps_ids=["never-finished", "garbage-ts"],
inspect_responses={
"never-finished": "0001-01-01T00:00:00Z",
"garbage-ts": "not-a-timestamp",
},
)
removed = docker_env.reap_orphan_containers(
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
)
assert removed == 0
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
assert not rms, (
f"reaper must NOT remove containers with unparseable FinishedAt; got: {rms}"
)
def test_reap_orphan_handles_docker_ps_failure_gracefully(monkeypatch):
"""If docker ps itself fails (daemon down, permission denied), the
reaper returns 0 without crashing. The reaper is best-effort plumbing,
not a critical path — it must never block container creation."""
def _failing_ps(cmd, **kwargs):
if isinstance(cmd, list) and len(cmd) >= 2 and cmd[1] == "ps":
return subprocess.CompletedProcess(cmd, 1, stdout="", stderr="Cannot connect to daemon")
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
monkeypatch.setattr(docker_env.subprocess, "run", _failing_ps)
# Must not raise
removed = docker_env.reap_orphan_containers(
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
)
assert removed == 0
def test_reap_orphan_continues_after_individual_rm_failure(monkeypatch):
"""If ``docker rm -f`` fails on one container (already removed by a
concurrent process, container locked, etc.), the reaper must log and
continue to the next candidate rather than aborting the whole sweep."""
old = _now_iso(offset_seconds=900)
rm_calls = []
def _run(cmd, **kwargs):
if not isinstance(cmd, list) or len(cmd) < 2:
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
sub = cmd[1]
if sub == "ps":
return subprocess.CompletedProcess(
cmd, 0, stdout="cid-a\ncid-b\ncid-c\n", stderr="",
)
if sub == "inspect":
return subprocess.CompletedProcess(cmd, 0, stdout=old + "\n", stderr="")
if sub == "rm":
rm_calls.append(cmd[-1])
# cid-b fails; cid-a and cid-c succeed.
if cmd[-1] == "cid-b":
return subprocess.CompletedProcess(cmd, 1, stdout="", stderr="no such container")
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
monkeypatch.setattr(docker_env.subprocess, "run", _run)
removed = docker_env.reap_orphan_containers(
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
)
# All three were attempted, two succeeded.
assert removed == 2
assert set(rm_calls) == {"cid-a", "cid-b", "cid-c"}, (
f"reaper must attempt all candidates even when one fails; got: {rm_calls}"
)
def test_container_finished_at_parses_nanosecond_timestamp(monkeypatch):
"""Docker emits FinishedAt with nanosecond precision (RFC3339 with up to
9 fractional digits), but Python's fromisoformat caps at microseconds.
The helper must trim the extra digits without raising — otherwise every
candidate gets skipped and the reaper does nothing."""
def _run(cmd, **kwargs):
return subprocess.CompletedProcess(
cmd, 0,
stdout="2026-05-28T13:45:00.123456789Z\n",
stderr="",
)
monkeypatch.setattr(docker_env.subprocess, "run", _run)
result = docker_env._container_finished_at("/usr/bin/docker", "test-cid")
assert result is not None, "must parse RFC3339 with nanoseconds"
import datetime
assert result.tzinfo == datetime.timezone.utc
assert result.year == 2026 and result.month == 5 and result.day == 28
def test_container_finished_at_returns_none_on_zero_value():
"""Docker's zero-value ``0001-01-01T00:00:00Z`` (never finished) must
map to None so the reaper treats the container as unreapable."""
# Direct test of the parsing helper — no subprocess needed since the
# check happens after the inspect call returns.
import subprocess as _subprocess
class _MockRun:
def __init__(self, stdout):
self.returncode = 0
self.stdout = stdout
self.stderr = ""
import unittest.mock
with unittest.mock.patch.object(
docker_env.subprocess, "run", return_value=_MockRun("0001-01-01T00:00:00Z\n"),
):
result = docker_env._container_finished_at("/usr/bin/docker", "never-finished")
assert result is None
@@ -0,0 +1,139 @@
"""Integration tests for the docker orphan-reaper wiring in terminal_tool.
The reaper itself is unit-tested in tests/tools/test_docker_environment.py
under the "Orphan reaper" section. These tests cover the terminal_tool-side
gates: once-per-process behavior, the disable flag, and the
``lifetime_seconds`` doubling that determines the reaper's age threshold.
Issue #20561 — without these gates, parallel subagents would each fire the
reaper on container creation, and the ``terminal.docker_orphan_reaper: false``
opt-out would silently do nothing.
"""
import os
from unittest.mock import patch
import tools.terminal_tool as terminal_tool
def _reset_reaper_gate():
"""Clear the once-per-process flag between tests."""
terminal_tool._docker_orphan_reaper_ran = False
def test_maybe_reap_runs_once_per_process(monkeypatch):
"""The reaper sweep must run at most once per Python interpreter.
Parallel subagents that each call _create_environment(env_type='docker')
would otherwise fire N concurrent docker ps + inspect storms against the
daemon and waste 510s of startup."""
_reset_reaper_gate()
call_count = {"reap": 0}
def _fake_reap(**kwargs):
call_count["reap"] += 1
return 0
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
config = {"docker_orphan_reaper": True}
terminal_tool._maybe_reap_docker_orphans(config)
terminal_tool._maybe_reap_docker_orphans(config)
terminal_tool._maybe_reap_docker_orphans(config)
assert call_count["reap"] == 1, (
f"reaper must run exactly once per process; got {call_count['reap']} calls"
)
def test_maybe_reap_respects_disable_flag(monkeypatch):
"""``terminal.docker_orphan_reaper: false`` (via container_config) must
skip the sweep entirely — no docker ps, no inspect, no rm. The escape
hatch for operators running multiple Hermes processes in the same
profile."""
_reset_reaper_gate()
call_count = {"reap": 0}
def _fake_reap(**kwargs):
call_count["reap"] += 1
return 0
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": False})
assert call_count["reap"] == 0, "disabled reaper must not run any docker calls"
# The once-per-process gate must NOT be tripped when the reaper is
# disabled — that would prevent a subsequent toggle to true from working.
assert terminal_tool._docker_orphan_reaper_ran is False
def test_maybe_reap_doubles_lifetime_for_max_age(monkeypatch):
"""The reaper's age threshold is ``2 × lifetime_seconds`` (with a 60s
floor). Generous default — gives sibling Hermes processes ample grace
to be replaced without their just-exited containers being yanked."""
_reset_reaper_gate()
captured_args = {}
def _fake_reap(**kwargs):
captured_args.update(kwargs)
return 0
monkeypatch.setenv("TERMINAL_LIFETIME_SECONDS", "300")
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
assert captured_args.get("max_age_seconds") == 600, (
f"expected 2 × 300 = 600, got {captured_args.get('max_age_seconds')}"
)
def test_maybe_reap_floors_at_60_seconds(monkeypatch):
"""A user pinning TERMINAL_LIFETIME_SECONDS=0 (or any value <30) would
otherwise get an effective age threshold of zero, which would race the
user's own just-started container creation. Floor at 60s × 2 = 120s."""
_reset_reaper_gate()
captured_args = {}
def _fake_reap(**kwargs):
captured_args.update(kwargs)
return 0
monkeypatch.setenv("TERMINAL_LIFETIME_SECONDS", "0")
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
assert captured_args.get("max_age_seconds") == 120, (
f"expected floored 60 × 2 = 120, got {captured_args.get('max_age_seconds')}"
)
def test_maybe_reap_passes_current_profile_as_filter(monkeypatch):
"""The reaper must be scoped to the current Hermes profile — a research
profile must NEVER reap default's containers. Verifies the
profile-filter wiring."""
_reset_reaper_gate()
captured_args = {}
def _fake_reap(**kwargs):
captured_args.update(kwargs)
return 0
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap), \
patch("tools.environments.docker._get_active_profile_name", return_value="research-bot"):
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
assert captured_args.get("profile_filter") == "research-bot", (
f"expected profile_filter='research-bot', got {captured_args.get('profile_filter')!r}"
)
def test_maybe_reap_swallows_exceptions(monkeypatch):
"""A reaper crash (docker daemon down, parse error in helper) must NOT
block env creation. The reaper is best-effort plumbing, not a critical
path; failures get logged at debug level and execution continues."""
_reset_reaper_gate()
def _exploding_reap(**kwargs):
raise RuntimeError("docker daemon ate the cat")
with patch("tools.environments.docker.reap_orphan_containers", _exploding_reap):
# Must not raise
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
@@ -224,3 +224,39 @@ def test_docker_env_is_bridged_everywhere():
assert "docker_env" in _gateway_env_map_keys()
assert "docker_env" in _save_config_env_sync_keys()
assert "TERMINAL_DOCKER_ENV" in _terminal_tool_env_var_names()
def test_docker_persist_across_processes_is_bridged_everywhere():
"""Regression pin for the cross-process container reuse toggle.
``terminal.docker_persist_across_processes`` (issue #20561) controls
whether ``DockerEnvironment.__init__`` probes for and reuses an existing
labeled container at startup, and whether ``cleanup()`` removes the
container on Hermes exit or just stops it (keeping it for the next
process). Same four-bridge invariant as docker_run_as_host_user /
docker_env / docker_mount_cwd_to_workspace — drift between any of the
four sites means ``terminal.docker_persist_across_processes: false`` in
config.yaml silently does nothing for that entry point, leaving the
user unable to opt out of the documented "ONE long-lived container
shared across sessions" behavior.
"""
assert "docker_persist_across_processes" in _cli_env_map_keys()
assert "docker_persist_across_processes" in _gateway_env_map_keys()
assert "docker_persist_across_processes" in _save_config_env_sync_keys()
assert "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES" in _terminal_tool_env_var_names()
def test_docker_orphan_reaper_is_bridged_everywhere():
"""Regression pin for the startup orphan reaper toggle (issue #20561).
``terminal.docker_orphan_reaper`` controls whether Hermes sweeps stale
Exited containers from prior SIGKILL'd processes at startup. Same
four-site bridge invariant — drift means
``terminal.docker_orphan_reaper: false`` silently does nothing for one
entry point, and the reaper either runs when the operator disabled it
or fails to run when they enabled it.
"""
assert "docker_orphan_reaper" in _cli_env_map_keys()
assert "docker_orphan_reaper" in _gateway_env_map_keys()
assert "docker_orphan_reaper" in _save_config_env_sync_keys()
assert "TERMINAL_DOCKER_ORPHAN_REAPER" in _terminal_tool_env_var_names()
+8 -2
View File
@@ -44,11 +44,17 @@ def server(hermes_home):
):
mod = importlib.import_module("tui_gateway.server")
yield mod
# Reset module-level session state without re-importing. importlib.reload
# would re-register the module's atexit hooks (ThreadPoolExecutor
# shutdown, _shutdown_sessions); the duplicates race the stderr
# buffer at interpreter shutdown and surface as Fatal Python error:
# _enter_buffered_busy. Clearing the per-session dicts gives the
# next test a clean slate; _methods is NOT cleared because it's
# populated at module import time and re-registration only happens
# via reload (which we don't do).
mod._sessions.clear()
mod._pending.clear()
mod._answers.clear()
mod._methods.clear()
importlib.reload(mod)
@pytest.fixture()
+8 -2
View File
@@ -30,11 +30,17 @@ def server():
import importlib
mod = importlib.import_module("tui_gateway.server")
yield mod
# Reset module-level session state without re-importing. importlib.reload
# would re-register the module's atexit hooks (ThreadPoolExecutor
# shutdown, _shutdown_sessions); the duplicates race the stderr
# buffer at interpreter shutdown and surface as Fatal Python error:
# _enter_buffered_busy. Clearing the per-session dicts gives the
# next test a clean slate; _methods is NOT cleared because it's
# populated at module import time and re-registration only happens
# via reload (which we don't do).
mod._sessions.clear()
mod._pending.clear()
mod._answers.clear()
mod._methods.clear()
importlib.reload(mod)
@pytest.fixture()
@@ -34,11 +34,17 @@ def server():
mod = importlib.import_module("tui_gateway.server")
yield mod
# Reset module-level session state without re-importing. importlib.reload
# would re-register the module's atexit hooks (ThreadPoolExecutor
# shutdown, _shutdown_sessions); the duplicates race the stderr
# buffer at interpreter shutdown and surface as Fatal Python error:
# _enter_buffered_busy. Clearing the per-session dicts gives the
# next test a clean slate; _methods is NOT cleared because it's
# populated at module import time and re-registration only happens
# via reload (which we don't do).
mod._sessions.clear()
mod._pending.clear()
mod._answers.clear()
mod._methods.clear()
importlib.reload(mod)
def test_init_session_attaches_background_review_callback(server, monkeypatch):
+427 -40
View File
@@ -98,6 +98,167 @@ def _load_hermes_env_vars() -> dict[str, str]:
return {}
# Docker label values must match [a-zA-Z0-9_.-] and stay ≤63 chars to round-trip
# safely through `docker ps --filter label=key=value`. Profile and task names
# can technically contain other characters; sanitize defensively.
_LABEL_VALUE_OK_RE = re.compile(r"[^A-Za-z0-9_.-]")
def _sanitize_label_value(value: str) -> str:
"""Coerce *value* into a Docker label-safe form (alnum + ``_.-``, ≤63 chars).
Empty or all-invalid inputs collapse to ``"unknown"`` so the resulting
label is always queryable. Used at container-create time; never round-trip
a sanitized value back into application logic.
"""
if not isinstance(value, str) or not value:
return "unknown"
cleaned = _LABEL_VALUE_OK_RE.sub("_", value)
cleaned = cleaned[:63] or "unknown"
return cleaned
def _get_active_profile_name() -> str:
"""Return the active Hermes profile name, or ``"default"`` on any error.
Resolved at container-create time so a single container is permanently
tagged with the profile that created it. Profile switches inside the
same process don't retroactively relabel running containers.
"""
try:
from hermes_cli.profiles import get_active_profile_name
return get_active_profile_name() or "default"
except Exception:
return "default"
def reap_orphan_containers(
*,
max_age_seconds: int = 600,
profile_filter: str | None = None,
docker_exe: str | None = None,
) -> int:
"""Remove stale hermes-tagged containers left behind by prior processes.
Targets containers that match all of:
* ``label=hermes-agent=1`` (created by this codebase)
* ``status=exited`` (running containers are NEVER reaped — they may
belong to a sibling Hermes process whose reuse path will pick them
up; killing them would crash the sibling mid-command)
* (optional) ``label=hermes-profile=<profile_filter>`` (sweep only the
caller's profile by default; a hermes process in profile A must not
tear down profile B's containers)
* ``State.FinishedAt`` older than *max_age_seconds* ago (so a sibling
process that just exited and is about to be replaced doesn't get
its container yanked out from under it)
Returns the number of containers removed. Best-effort: any failure
(docker daemon unreachable, slow inspect, parse error) is logged at
debug level and the function returns whatever it managed before the
failure. Safe to call repeatedly; idempotent.
Issue #20561 — this is the safety net for SIGKILL / OOM / crashed
terminal exits that bypass the ``atexit`` cleanup hook. Without it,
even with the cleanup-fix in the prior commit, a hard-killed Hermes
process leaves its container behind permanently because there's no
subsequent Hermes process scheduled to reuse that exact (task, profile)
pair.
"""
docker = docker_exe or find_docker() or "docker"
filters = ["--filter", "label=hermes-agent=1", "--filter", "status=exited"]
if profile_filter:
filters.extend(["--filter", f"label=hermes-profile={_sanitize_label_value(profile_filter)}"])
try:
listing = subprocess.run(
[docker, "ps", "-a", *filters, "--format", "{{.ID}}"],
capture_output=True, text=True, timeout=15, check=False,
)
except (subprocess.TimeoutExpired, OSError) as e:
logger.debug("orphan reaper docker ps failed: %s", e)
return 0
if listing.returncode != 0:
logger.debug(
"orphan reaper docker ps returned %d: %s",
listing.returncode, listing.stderr.strip(),
)
return 0
candidate_ids = [ln.strip() for ln in listing.stdout.splitlines() if ln.strip()]
if not candidate_ids:
return 0
# Inspect each candidate to get FinishedAt; reap only those exited
# long enough ago. Doing this per-container (rather than bulk inspect)
# keeps the failure blast radius to one container at a time.
import datetime
now = datetime.datetime.now(datetime.timezone.utc)
removed = 0
for cid in candidate_ids:
finished_at = _container_finished_at(docker, cid)
if finished_at is None:
# Couldn't determine age — be conservative and leave it alone.
continue
age = (now - finished_at).total_seconds()
if age < max_age_seconds:
continue
try:
result = subprocess.run(
[docker, "rm", "-f", cid],
capture_output=True, text=True, timeout=30,
)
if result.returncode == 0:
removed += 1
logger.info(
"Reaped orphan container %s (exited %d seconds ago)",
cid[:12], int(age),
)
else:
logger.debug(
"docker rm -f %s failed: %s",
cid[:12], result.stderr.strip(),
)
except (subprocess.TimeoutExpired, OSError) as e:
logger.debug("orphan reaper docker rm %s failed: %s", cid[:12], e)
return removed
def _container_finished_at(docker_exe: str, container_id: str):
"""Parse ``docker inspect`` FinishedAt for *container_id*.
Returns a timezone-aware datetime, or ``None`` if the field is missing,
unparseable, or the zero-value ``0001-01-01T00:00:00Z`` Docker emits
for never-finished containers. ``None`` means "don't reap" — the caller
leaves the container alone.
"""
try:
result = subprocess.run(
[docker_exe, "inspect", "--format", "{{.State.FinishedAt}}", container_id],
capture_output=True, text=True, timeout=10, check=False,
)
except (subprocess.TimeoutExpired, OSError) as e:
logger.debug("orphan reaper docker inspect %s failed: %s", container_id[:12], e)
return None
if result.returncode != 0:
return None
raw = result.stdout.strip()
if not raw or raw.startswith("0001-01-01"):
return None
# Docker emits RFC3339 with nanoseconds (e.g. "2026-05-28T13:45:00.123456789Z").
# Python's fromisoformat handles microseconds but not nanoseconds; trim.
import re as _re
raw = _re.sub(r"(\.\d{6})\d+", r"\1", raw)
raw = raw.replace("Z", "+00:00")
try:
import datetime
return datetime.datetime.fromisoformat(raw)
except ValueError as e:
logger.debug("could not parse FinishedAt %r for %s: %s", raw, container_id[:12], e)
return None
def find_docker() -> Optional[str]:
"""Locate the docker (or podman) CLI binary.
@@ -304,15 +465,18 @@ class DockerEnvironment(BaseEnvironment):
auto_mount_cwd: bool = False,
run_as_host_user: bool = False,
extra_args: list = None,
persist_across_processes: bool = True,
):
if cwd == "~":
cwd = "/root"
super().__init__(cwd=cwd, timeout=timeout)
self._persistent = persistent_filesystem
self._persist_across_processes = persist_across_processes
self._task_id = task_id
self._forward_env = _normalize_forward_env_names(forward_env)
self._env = _normalize_env_dict(env)
self._container_id: Optional[str] = None
self._labels: dict[str, str] = {}
logger.info(f"DockerEnvironment volumes: {volumes}")
# Ensure volumes is a list (config.yaml could be malformed)
if volumes is not None and not isinstance(volumes, list):
@@ -506,25 +670,88 @@ class DockerEnvironment(BaseEnvironment):
# Start the container directly via `docker run -d`.
container_name = f"hermes-{uuid.uuid4().hex[:8]}"
run_cmd = [
self._docker_exe, "run", "-d",
"--init", # tini/catatonit as PID 1 — reaps zombie children
"--name", container_name,
"-w", cwd,
*all_run_args,
image,
"sleep", "infinity", # no fixed lifetime — idle reaper handles cleanup
# Labels make hermes-created containers identifiable to:
# * the orphan reaper (`hermes-agent=1` for the global sweep filter)
# * future cross-process reuse (`hermes-task-id`, `hermes-profile`)
# * operators running `docker ps --filter label=hermes-agent=1`
# Values are limited to the safe character set defined by
# _sanitize_label_value(); the active Hermes profile is captured at
# container-start time and never changes for the container's lifetime.
profile_name = _sanitize_label_value(_get_active_profile_name())
task_label = _sanitize_label_value(task_id)
label_args = [
"--label", "hermes-agent=1",
"--label", f"hermes-task-id={task_label}",
"--label", f"hermes-profile={profile_name}",
]
logger.debug(f"Starting container: {' '.join(run_cmd)}")
result = subprocess.run(
run_cmd,
capture_output=True,
text=True,
timeout=120, # image pull may take a while
check=True,
)
self._container_id = result.stdout.strip()
logger.info(f"Started container {container_name} ({self._container_id[:12]})")
self._labels = {
"hermes-agent": "1",
"hermes-task-id": task_label,
"hermes-profile": profile_name,
}
# Cross-process container reuse (issue #20561 — docs claim "ONE long-lived
# container shared across sessions"). If a prior Hermes process
# already started a container for this (task_id, profile) and it
# still exists, attach to it instead of starting a fresh one. This
# restores the documented contract; opt out via
# ``terminal.docker_persist_across_processes: false``.
#
# Reuse matches on labels only — we deliberately do NOT compare image
# / mounts / resources. Operators who need a fresh container after
# changing those settings should set ``docker_persist_across_processes:
# false`` (or run ``docker rm -f`` against the labeled container) to
# force a clean start.
reused = False
if persist_across_processes:
existing = self._find_reusable_container(task_label, profile_name)
if existing is not None:
container_id, state = existing
self._container_id = container_id
if state != "running":
try:
subprocess.run(
[self._docker_exe, "start", container_id],
capture_output=True,
text=True,
timeout=30,
check=True,
)
except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
logger.warning(
"Failed to start existing container %s (state=%s): "
"%s — falling back to a fresh container.",
container_id[:12], state, e,
)
self._container_id = None
if self._container_id:
logger.info(
"Reusing container %s (task=%s, profile=%s, prior state=%s)",
container_id[:12], task_label, profile_name, state,
)
reused = True
if not reused:
run_cmd = [
self._docker_exe, "run", "-d",
"--init", # tini/catatonit as PID 1 — reaps zombie children
"--name", container_name,
*label_args,
"-w", cwd,
*all_run_args,
image,
"sleep", "infinity", # no fixed lifetime — idle reaper handles cleanup
]
logger.debug(f"Starting container: {' '.join(run_cmd)}")
result = subprocess.run(
run_cmd,
capture_output=True,
text=True,
timeout=120, # image pull may take a while
check=True,
)
self._container_id = result.stdout.strip()
logger.info(f"Started container {container_name} ({self._container_id[:12]})")
# Build the init-time env forwarding args (used only by init_session
# to inject host env vars into the snapshot; subsequent commands get
@@ -629,31 +856,191 @@ class DockerEnvironment(BaseEnvironment):
logger.debug("Docker --storage-opt support: %s", _storage_opt_ok)
return _storage_opt_ok
def cleanup(self):
"""Stop and remove the container. Bind-mount dirs persist if persistent=True."""
if self._container_id:
try:
# Stop in background so cleanup doesn't block
stop_cmd = (
f"(timeout 60 {self._docker_exe} stop {self._container_id} || "
f"{self._docker_exe} rm -f {self._container_id}) >/dev/null 2>&1 &"
)
subprocess.Popen(stop_cmd, shell=True)
except Exception as e:
logger.warning("Failed to stop container %s: %s", self._container_id, e)
def _find_reusable_container(self, task_label: str, profile_label: str) -> Optional[tuple[str, str]]:
"""Look for an existing container labeled for this (task, profile).
Returns ``(container_id, state)`` on hit, ``None`` on miss / on any
failure (including ``docker ps`` itself failing). State is one of the
values Docker reports via ``{{.State}}`` — e.g. ``running``, ``exited``,
``created``, ``paused``, ``restarting``, ``dead``. The caller decides
whether the state warrants ``docker start`` before reuse.
Restricted to the docker-stored label set this class creates; never
matches containers that happened to be named ``hermes-*`` but were
started by some other tool.
"""
try:
result = subprocess.run(
[
self._docker_exe, "ps", "-a",
"--filter", "label=hermes-agent=1",
"--filter", f"label=hermes-task-id={task_label}",
"--filter", f"label=hermes-profile={profile_label}",
"--format", "{{.ID}}\t{{.State}}",
],
capture_output=True,
text=True,
timeout=10,
check=False,
)
except (subprocess.TimeoutExpired, OSError) as e:
logger.debug("docker ps probe failed: %s — will start a fresh container", e)
return None
if result.returncode != 0:
logger.debug(
"docker ps probe returned %d: %s — will start a fresh container",
result.returncode, result.stderr.strip(),
)
return None
lines = [ln.strip() for ln in result.stdout.splitlines() if ln.strip()]
if not lines:
return None
# Multiple matches are unusual (one (task, profile) should produce one
# container) but can happen if a previous Hermes process crashed
# mid-cleanup. Prefer a running one if present; otherwise pick the
# first listed. Stale duplicates get reaped by the orphan-reaper in a
# follow-up commit; we don't try to be heroic about them here.
running = None
first = None
for ln in lines:
parts = ln.split("\t", 1)
if len(parts) != 2:
continue
cid, state = parts[0], parts[1].lower()
if first is None:
first = (cid, state)
if state == "running" and running is None:
running = (cid, state)
return running or first
def cleanup(self, *, force_remove: bool = False):
"""Tear down the container according to persist mode and *force_remove*.
Persist-mode (``persist_across_processes=True``, the default) leaves the
container **running** untouched. The docs promise "ONE long-lived
container shared across sessions" and stopping it on every Hermes exit
breaks that promise:
* Background processes inside the container (``npm run dev``, watchers,
long-running pytest) get killed every time the user runs ``/quit``.
* Every reuse requires ``docker start`` + waiting for the container to
come back up, adding 12s to the first tool call of the new session.
* The user-visible difference between "ONE long-lived container" and
"a new container that happens to share state" is exactly this:
processes survive in the former, die in the latter.
Resource reclamation for the persist-mode case lives in the
``reap_orphan_containers()`` path (see issue #20561 commit 3): if no
Hermes process touches a labeled container for ``2 × lifetime_seconds``
it gets ``docker rm -f``'d at the next Hermes startup. That covers the
SIGKILL / OOM / abandoned-laptop cases without us needing to stop the
container on every graceful exit.
Opt-out mode (``persist_across_processes=False``) still does
``docker stop`` + ``docker rm -f`` on every cleanup, matching the
pre-PR behavior for users who explicitly want per-process isolation.
``force_remove=True`` overrides persist mode and always tears the
container down (``docker stop`` + ``docker rm -f``). This is the
explicit-teardown path for ``/reset``, ``cleanup_vm(task_id)``-driven
resets, or any caller that wants a guaranteed fresh container on next
``DockerEnvironment(task_id=...)``. No current caller passes
``force_remove=True``; the parameter is here so the explicit-teardown
semantics can be wired up later without changing this method's
signature.
Cleanup runs on a daemon thread with bounded ``subprocess.run`` calls
(not the racy ``Popen(... &)`` pattern from before PR #33645). The
atexit hook in ``tools/terminal_tool.py`` waits up to 15s for the
thread to finish before the interpreter exits, so ``docker stop`` /
``docker rm`` actually completes when we do trigger it.
"""
container_id = self._container_id
if not container_id:
# Still drop the bind-mount dirs if any were allocated and we're
# NOT in persist mode (persist mode preserves them).
if not self._persistent:
# Also schedule removal (stop only leaves it as stopped)
try:
subprocess.Popen(
f"sleep 3 && {self._docker_exe} rm -f {self._container_id} >/dev/null 2>&1 &",
shell=True,
)
except Exception:
pass
self._container_id = None
for d in (self._workspace_dir, self._home_dir):
if d:
shutil.rmtree(d, ignore_errors=True)
return
if not self._persistent:
# Decide what to actually do. Three cases:
#
# force_remove=True → stop + rm (explicit teardown)
# persist_across_processes=True → no-op (leave container running)
# persist_across_processes=False → stop + rm (per-process isolation)
#
# The persist-mode no-op is the issue-#20561 contract: the container
# outlives Hermes processes, processes inside it stay alive, and
# reuse on next startup is instant.
if force_remove:
should_stop = True
should_remove = True
elif self._persist_across_processes:
# No-op for the container. Drop the in-process handle so a fresh
# __init__ will re-probe via labels (and find the running
# container) instead of trying to reuse a stale Python reference.
self._container_id = None
return
else:
should_stop = True
should_remove = True
# Capture state needed by the worker before we null out the attrs —
# the worker thread can outlive ``self``.
docker_exe = self._docker_exe
log_id = container_id[:12]
def _do_cleanup() -> None:
if should_stop:
try:
subprocess.run(
[docker_exe, "stop", "-t", "10", container_id],
capture_output=True, timeout=30,
)
except (subprocess.TimeoutExpired, OSError) as e:
logger.warning("docker stop %s timed out / failed: %s", log_id, e)
if should_remove:
try:
subprocess.run(
[docker_exe, "rm", "-f", container_id],
capture_output=True, timeout=30,
)
except (subprocess.TimeoutExpired, OSError) as e:
logger.warning("docker rm -f %s failed: %s", log_id, e)
# Daemon thread: doesn't block interpreter exit (atexit returns
# promptly), but unlike the old ``Popen(... &)`` shell trick the
# Python-level join semantics let the thread actually run to
# completion if the interpreter is still alive. atexit registers
# ``_atexit_cleanup`` in terminal_tool.py which waits up to ~60s for
# outstanding cleanups, so most exits complete the work cleanly.
import threading
t = threading.Thread(target=_do_cleanup, daemon=True, name=f"hermes-cleanup-{log_id}")
t.start()
self._cleanup_thread = t
self._container_id = None
# Bind-mount dir teardown only runs when we actually removed the
# container (the dirs are the container's filesystem state; keeping
# them around with no container would orphan the data on disk).
if should_remove and not self._persistent:
for d in (self._workspace_dir, self._home_dir):
if d:
shutil.rmtree(d, ignore_errors=True)
def wait_for_cleanup(self, timeout: float = 30.0) -> bool:
"""Block up to *timeout* seconds for the cleanup worker thread.
Returns ``True`` if the thread finished (or no thread was started),
``False`` on timeout. The atexit hook in terminal_tool.py calls this
on every active environment so docker stop/rm actually completes
before the Python process exits — without this, ``hermes /quit``
races the interpreter shutdown and leaves stopped containers behind.
"""
thread = getattr(self, "_cleanup_thread", None)
if thread is None or not thread.is_alive():
return True
thread.join(timeout=timeout)
return not thread.is_alive()
+143 -3
View File
@@ -861,6 +861,78 @@ _creation_locks_lock = threading.Lock() # Protects _creation_locks dict itself
_cleanup_thread = None
_cleanup_running = False
# Once-per-process guard for the docker orphan reaper (issue #20561).
# Set when _maybe_reap_docker_orphans first runs; concurrent _create_environment
# calls for parallel subagents won't re-trigger the sweep.
_docker_orphan_reaper_ran = False
_docker_orphan_reaper_lock = threading.Lock()
def _maybe_reap_docker_orphans(container_config: Dict[str, Any]) -> None:
"""Run the docker orphan reaper once per process, if enabled.
Sweeps long-Exited containers labeled ``hermes-agent=1`` for the current
profile that match the issue #20561 leak class — containers left behind
by Hermes processes that exited without firing ``atexit`` (SIGKILL,
OOM, terminal-window-close). The reaper is conservative by default:
only Exited containers older than ``2 × lifetime_seconds`` and scoped to
the current profile.
Gates:
* ``terminal.docker_orphan_reaper: false`` disables it entirely (the
operator opted out — usually because they're running multiple
Hermes processes in the same profile and don't trust the
conservative defaults).
* ``_docker_orphan_reaper_ran`` flag — sweep runs once per Python
interpreter, not on every subagent / RL-rollout / parallel
``terminal()`` call.
"""
global _docker_orphan_reaper_ran
if not container_config.get("docker_orphan_reaper", True):
return
# Cheap double-checked-locking: read without the lock, take the lock
# only on first run, recheck inside.
if _docker_orphan_reaper_ran:
return
with _docker_orphan_reaper_lock:
if _docker_orphan_reaper_ran:
return
_docker_orphan_reaper_ran = True
# 2 × lifetime_seconds gives sibling Hermes processes a generous grace
# window. Floor at 60s so an operator with TERMINAL_LIFETIME_SECONDS=0
# doesn't get an instant-reap that races their own setup.
# ``container_config`` only carries container_* keys, so read
# lifetime_seconds from the env var the rest of the module uses.
try:
lifetime = int(os.getenv("TERMINAL_LIFETIME_SECONDS", "300"))
except (TypeError, ValueError):
lifetime = 300
lifetime = max(60, lifetime)
max_age = lifetime * 2
try:
from tools.environments.docker import (
reap_orphan_containers, _get_active_profile_name,
)
except ImportError:
return
try:
profile = _get_active_profile_name()
removed = reap_orphan_containers(
max_age_seconds=max_age, profile_filter=profile,
)
if removed:
logger.info(
"Docker orphan reaper removed %d stale container(s) for profile %s",
removed, profile,
)
except Exception as e:
# Never fail the env-creation path because of a janitor problem.
logger.debug("Docker orphan reaper raised: %s", e)
# Per-task environment overrides registry.
# Allows environments (e.g., TerminalBench2Env) to specify a custom Docker/Modal
# image for a specific task_id BEFORE the agent loop starts. When the terminal or
@@ -1024,6 +1096,22 @@ def _get_env_config() -> Dict[str, Any]:
"docker_env": _parse_env_var("TERMINAL_DOCKER_ENV", "{}", json.loads, "valid JSON"),
"docker_run_as_host_user": os.getenv("TERMINAL_DOCKER_RUN_AS_HOST_USER", "false").lower() in {"true", "1", "yes"},
"docker_extra_args": _parse_env_var("TERMINAL_DOCKER_EXTRA_ARGS", "[]", json.loads, "valid JSON"),
# Cross-process container reuse (issue #20561). The docs claim
# "ONE long-lived container shared across sessions" — this toggle
# makes that real by probing for a labeled container at startup and
# attaching to it instead of always starting a fresh one. Set to
# ``false`` for hard per-process isolation (no reuse, container is
# removed on exit).
"docker_persist_across_processes": os.getenv(
"TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES", "true"
).lower() in {"true", "1", "yes"},
# Startup orphan reaper for hermes-tagged containers left behind by
# crashed / SIGKILL'd previous processes that bypassed atexit.
# Conservative: only sweeps Exited containers older than 2× the
# idle-reap window AND scoped to the current profile. Issue #20561.
"docker_orphan_reaper": os.getenv(
"TERMINAL_DOCKER_ORPHAN_REAPER", "true"
).lower() in {"true", "1", "yes"},
}
@@ -1072,6 +1160,13 @@ def _create_environment(env_type: str, image: str, cwd: str, timeout: int,
return _LocalEnvironment(cwd=cwd, timeout=timeout)
elif env_type == "docker":
# One-shot orphan reaper: clean up labeled containers left behind by
# prior Hermes processes that hit SIGKILL / OOM / a closed terminal
# before the atexit cleanup hook could run. Gated to once per
# process so concurrent _create_environment calls (parallel
# subagents, RL benchmarks) don't run the reaper N times.
# Disable via ``terminal.docker_orphan_reaper: false`` (issue #20561).
_maybe_reap_docker_orphans(cc)
return _DockerEnvironment(
image=image, cwd=cwd, timeout=timeout,
cpu=cpu, memory=memory, disk=disk,
@@ -1083,6 +1178,7 @@ def _create_environment(env_type: str, image: str, cwd: str, timeout: int,
env=docker_env,
run_as_host_user=cc.get("docker_run_as_host_user", False),
extra_args=docker_extra_args,
persist_across_processes=cc.get("docker_persist_across_processes", True),
)
elif env_type == "singularity":
@@ -1330,8 +1426,27 @@ def cleanup_all_environments():
return cleaned
def cleanup_vm(task_id: str):
"""Manually clean up a specific environment by task_id."""
def cleanup_vm(task_id: str, *, force_remove: bool = False):
"""Manually clean up a specific environment by task_id.
*force_remove* (default False) is forwarded to backends that accept it
— currently only ``DockerEnvironment``. The default of False matches
session-lifecycle semantics: this function is called from
``AIAgent.close()`` (TUI session close, gateway session teardown) and the
per-turn cleanup branch for non-persistent envs, both of which should
honor the user's persist-mode preference. Stopping the container here
would defeat the "ONE long-lived container shared across sessions"
contract — exactly the bug Ben reported when the container was killed
on every TUI session close.
Pass ``force_remove=True`` for actual user-initiated teardown
(e.g. ``/reset``-style flows that haven't been wired yet, or future
"destroy my sandbox" commands).
The idle reaper passes the env through ``env.cleanup()`` directly (not
via this function), so persist-mode idle envs are similarly no-op'd —
only the orphan reaper at next startup reclaims them.
"""
# Remove from tracking dicts while holding the lock, but defer the
# actual (potentially slow) env.cleanup() call to outside the lock
# so other tool calls aren't blocked.
@@ -1356,7 +1471,14 @@ def cleanup_vm(task_id: str):
try:
if hasattr(env, 'cleanup'):
env.cleanup()
# Pass force_remove only if the env's cleanup() accepts it
# (DockerEnvironment after issue #20561; other backends don't).
import inspect
sig = inspect.signature(env.cleanup)
if "force_remove" in sig.parameters:
env.cleanup(force_remove=force_remove)
else:
env.cleanup()
elif hasattr(env, 'stop'):
env.stop()
elif hasattr(env, 'terminate'):
@@ -1378,7 +1500,23 @@ def _atexit_cleanup():
if _active_environments:
count = len(_active_environments)
logger.info("Shutting down %d remaining sandbox(es)...", count)
# Snapshot the env objects BEFORE cleanup_all_environments empties
# the dict; we need them to wait on docker cleanup threads after the
# registry has been cleared.
envs_to_wait = list(_active_environments.values())
cleanup_all_environments()
# Block briefly so docker stop/rm actually completes before the
# interpreter exits. Issue #20561 — without this join, the daemon
# cleanup threads were getting torn down mid-`docker stop`, leaving
# Exited containers piled up on the host.
for env in envs_to_wait:
wait_fn = getattr(env, "wait_for_cleanup", None)
if wait_fn is None:
continue
try:
wait_fn(timeout=15.0)
except Exception as e: # never block shutdown on a bad backend
logger.debug("wait_for_cleanup raised on exit: %s", e)
atexit.register(_atexit_cleanup)
@@ -1746,6 +1884,8 @@ def terminal_tool(
"docker_env": config.get("docker_env", {}),
"docker_run_as_host_user": config.get("docker_run_as_host_user", False),
"docker_extra_args": config.get("docker_extra_args", []),
"docker_persist_across_processes": config.get("docker_persist_across_processes", True),
"docker_orphan_reaper": config.get("docker_orphan_reaper", True),
}
local_config = None
Generated
+1 -1
View File
@@ -1589,7 +1589,7 @@ wheels = [
[[package]]
name = "hermes-agent"
version = "0.15.0"
version = "0.15.1"
source = { editable = "." }
dependencies = [
{ name = "croniter" },
+65 -4
View File
@@ -130,7 +130,7 @@ The agent has the same filesystem access as your user account. Use `hermes tools
Runs commands inside a Docker container with security hardening (all capabilities dropped, no privilege escalation, PID limits).
**Single persistent container, not per-command.** Hermes starts ONE long-lived container on first use and routes every terminal, file, and `execute_code` call through `docker exec` into that same container — across sessions, `/new`, `/reset`, and `delegate_task` subagents — for the lifetime of the Hermes process. Working-directory changes, installed packages, and files in `/workspace` carry over from one tool call to the next, just like a local shell. The container is stopped and removed on shutdown. See **Container lifecycle** below for details.
**Single persistent container, shared across Hermes processes.** Hermes starts ONE long-lived container on first use and routes every terminal, file, and `execute_code` call through `docker exec` into that same container — across sessions, `/new`, `/reset`, and `delegate_task` subagents. Working-directory changes, installed packages, files in `/workspace`, and **background processes** all carry over from one tool call to the next, and from one Hermes process to the next. When you close a TUI session, run `/quit`, or start a new `hermes` invocation, the container keeps running and the next Hermes process reuses it via a labeled lookup. See **Container lifecycle** below for the exact teardown rules.
```yaml
terminal:
@@ -138,8 +138,11 @@ terminal:
docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
docker_mount_cwd_to_workspace: false # Mount launch dir into /workspace
docker_run_as_host_user: false # See "Running container as host user" below
docker_forward_env: # Env vars to forward into container
docker_forward_env: # Host env vars to forward into container
- "GITHUB_TOKEN"
docker_env: # Literal env vars to inject (KEY=value)
DEBUG: "1"
PYTHONUNBUFFERED: "1"
docker_volumes: # Host directory mounts
- "/home/user/projects:/workspace/projects"
- "/home/user/data:/data:ro" # :ro for read-only
@@ -151,14 +154,49 @@ terminal:
container_cpu: 1 # CPU cores (0 = unlimited)
container_memory: 5120 # MB (0 = unlimited)
container_disk: 51200 # MB (requires overlay2 on XFS+pquota)
container_persistent: true # Persist /workspace and /root across sessions
container_persistent: true # Persist /workspace and /root bind-mount dirs
# Cross-process container reuse (defaults match the "one long-lived
# container shared across sessions" contract — see Container lifecycle).
docker_persist_across_processes: true # Reuse container across Hermes restarts
docker_orphan_reaper: true # Sweep abandoned Exited containers at startup
# Cross-backend lifecycle settings (apply to docker as well)
timeout: 180 # Per-command timeout in seconds
lifetime_seconds: 300 # Idle-reaper window; also feeds 2× orphan-reaper threshold
```
**`docker_env`** vs **`docker_forward_env`**: the former injects literal `KEY=value` pairs you specify in the config (the values live in your `config.yaml` or are passed as a JSON dict via `TERMINAL_DOCKER_ENV='{"DEBUG":"1"}'`). The latter forwards values from your shell or `~/.hermes/.env`, so the actual secret never appears in the config file. Use `docker_forward_env` for tokens and `docker_env` for static knobs the container needs.
**`terminal.docker_extra_args`** (also overridable via `TERMINAL_DOCKER_EXTRA_ARGS='["--gpus=all"]'`) lets you pass arbitrary `docker run` flags that Hermes doesn't surface as first-class keys — `--gpus`, `--network`, `--add-host`, alternative `--security-opt` overrides, etc. Each entry must be a string; the list is appended last to the assembled `docker run` invocation so it can override Hermes' defaults if needed. Use sparingly — flags that conflict with the sandbox hardening (capability drops, `--user`, the workspace bind mount) will silently weaken isolation.
**Requirements:** Docker Desktop or Docker Engine installed and running. Hermes probes `$PATH` plus common macOS install locations (`/usr/local/bin/docker`, `/opt/homebrew/bin/docker`, Docker Desktop app bundle). Podman is supported out of the box: set `HERMES_DOCKER_BINARY=podman` (or the full path) to force it when both are installed.
**Container lifecycle:** Hermes reuses a single long-lived container (`docker run -d ... sleep 2h`) for every terminal and file-tool call, across sessions, `/new`, `/reset`, and `delegate_task` subagents, for the lifetime of the Hermes process. Commands run via `docker exec` with a login shell, so working-directory changes, installed packages, and files in `/workspace` all persist from one tool call to the next. The container is stopped and removed on Hermes shutdown (or when the idle-sweep reclaims it).
#### Container lifecycle
Every Hermes-managed container is tagged with three labels so subsequent processes (and the orphan reaper) can identify it:
- `hermes-agent=1` — marks it as Hermes-managed
- `hermes-task-id=<sanitized task_id>` — keys the per-task reuse probe
- `hermes-profile=<sanitized profile name>` — scopes reuse and reaping to the active Hermes profile
On startup, Hermes runs `docker ps --filter label=hermes-task-id=<id> --filter label=hermes-profile=<profile>` and **attaches to the existing container** when it finds one. If the container is `exited` (e.g. after a Docker daemon restart), it's `docker start`'d and reused — filesystem state and any installed packages survive, but in-container background processes do not.
When a Hermes process exits — `/quit`, closing a TUI session, gateway shutdown, even SIGKILL — the cleanup path is a **no-op for the container in default mode**. The container keeps running. The next Hermes process attaches to it in milliseconds via the label probe. This is the behavior the "one long-lived container shared across sessions" contract requires: it's the only way background processes (npm watchers, dev servers, long-running pytest) survive across sessions.
**The container is only torn down (stopped and `docker rm -f`'d) in these cases:**
| Trigger | When it fires |
|---|---|
| `docker_persist_across_processes: false` | Explicit per-process isolation. Every `cleanup()` does `stop` + `rm -f`. Matches pre-issue-#20561 behavior. |
| Idle reaper (`lifetime_seconds`, default 300s) | Only when the env is `persist_across_processes=false`. Persist-mode envs are no-op'd; container survives the idle sweep. |
| Orphan reaper at next startup | Sweeps **Exited** hermes-labeled containers older than `2 × lifetime_seconds` (default 600s = 10 min), scoped to the current profile. **Running containers are never touched** — sibling-process safety. Set `docker_orphan_reaper: false` to disable. |
| Direct user action | `docker rm -f`, `docker system prune`, Docker Desktop restart. We don't set `--restart=always`, so a host reboot leaves the container `Exited` (its CoW layer survives and gets reused on next startup, but bg processes are gone). |
Edge cases worth knowing:
- **OOM kill of in-container PID 1** transitions the container to `Exited`. Next reuse will `docker start` it; filesystem state survives, bg processes do not.
- **Switching profiles** isolates containers from each other — a container labeled `hermes-profile=work` is invisible to a Hermes process running under `hermes-profile=research`. The orphan reaper is profile-scoped too, so cross-profile containers don't get reaped accidentally, but they also won't get cleaned up automatically until you start Hermes again under their original profile.
Parallel subagents spawned via `delegate_task(tasks=[...])` share this one container — concurrent `cd`, env mutations, and writes to the same path will collide. If a subagent needs an isolated sandbox, it must register a per-task image override via `register_task_env_overrides()`, which RL and benchmark environments (TerminalBench2, HermesSweEnv, etc.) do automatically for their per-task Docker images.
@@ -170,6 +208,29 @@ Parallel subagents spawned via `delegate_task(tasks=[...])` share this one conta
**Credential forwarding:** Env vars listed in `docker_forward_env` are resolved from your shell environment first, then `~/.hermes/.env`. Skills can also declare `required_environment_variables` which are merged automatically.
#### Environment variable overrides
Every key under `terminal:` has an env-var override of the form `TERMINAL_<KEY_UPPERCASE>`. The most useful ones for the Docker backend:
| Env var | Maps to | Notes |
|---|---|---|
| `TERMINAL_DOCKER_IMAGE` | `docker_image` | Base image |
| `TERMINAL_DOCKER_FORWARD_ENV` | `docker_forward_env` | JSON array: `'["GITHUB_TOKEN","OPENAI_API_KEY"]'` |
| `TERMINAL_DOCKER_ENV` | `docker_env` | JSON dict: `'{"DEBUG":"1"}'` |
| `TERMINAL_DOCKER_VOLUMES` | `docker_volumes` | JSON array of `"host:container[:ro]"` strings |
| `TERMINAL_DOCKER_EXTRA_ARGS` | `docker_extra_args` | JSON array |
| `TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE` | `docker_mount_cwd_to_workspace` | `true` / `false` |
| `TERMINAL_DOCKER_RUN_AS_HOST_USER` | `docker_run_as_host_user` | `true` / `false` |
| `TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES` | `docker_persist_across_processes` | `true` / `false` — default `true` |
| `TERMINAL_DOCKER_ORPHAN_REAPER` | `docker_orphan_reaper` | `true` / `false` — default `true` |
| `TERMINAL_CONTAINER_CPU` | `container_cpu` | CPU cores |
| `TERMINAL_CONTAINER_MEMORY` | `container_memory` | MB |
| `TERMINAL_CONTAINER_DISK` | `container_disk` | MB |
| `TERMINAL_CONTAINER_PERSISTENT` | `container_persistent` | `true` / `false` — controls the bind-mount workspace dirs, distinct from `docker_persist_across_processes` |
| `TERMINAL_LIFETIME_SECONDS` | `lifetime_seconds` | Idle reaper window |
| `TERMINAL_TIMEOUT` | `timeout` | Per-command timeout |
| `HERMES_DOCKER_BINARY` | _none_ | Force a specific docker/podman binary path |
### SSH Backend
Runs commands on a remote server over SSH. Uses ControlMaster for connection reuse (5-minute idle keepalive). Persistent shell is enabled by default — state (cwd, env vars) survives across commands.
+97 -71
View File
@@ -54,12 +54,7 @@ This behavior applies to the s6-based image only. Earlier (tini-based) images st
:::
:::note Where gateway logs go
Inside the s6 image, the supervised gateway's output is tee'd to two destinations:
- **`docker logs <container>`** — every line in real time (raw, no extra prefix). This is the same stream you'd get from a foreground gateway, so existing `docker logs --follow` / `--timestamps` / log-shipper integrations work unchanged.
- **`${HERMES_HOME}/logs/gateways/<profile>/current`** (mapped to `~/.hermes/logs/gateways/<profile>/current` on the host via the volume mount) — rotated, with an ISO 8601 timestamp prepended per line. Rotation is 10 archives × 1 MB each, so it can't fill the disk. This is what `hermes logs` reads and what survives container restarts.
The per-profile reconciler keeps a separate audit log at `${HERMES_HOME}/logs/container-boot.log` — one line per profile per container boot, recording whether each gateway was restored to its prior state.
See the [Where the logs go](#where-the-logs-go) section below for the full routing map (per-profile gateways, dashboard, boot reconciler, container-wide `docker logs`).
:::
Note: the API server is gated on `API_SERVER_ENABLED=true`. To expose it beyond `127.0.0.1` inside the container, also set `API_SERVER_HOST=0.0.0.0` and an `API_SERVER_KEY` (minimum 8 characters — generate one with `openssl rand -hex 32`). Example:
@@ -81,7 +76,7 @@ Opening any port on an internet facing machine is a security risk. You should no
## Running the dashboard
The built-in web dashboard runs as an optional side-process inside the same container as the gateway. Set `HERMES_DASHBOARD=1` to run the dashboard on container loopback (`127.0.0.1`) by default:
The built-in web dashboard runs as a supervised s6-rc service alongside the gateway in the same container. Set `HERMES_DASHBOARD=1` to bring it up:
```sh
docker run -d \
@@ -89,54 +84,38 @@ docker run -d \
--restart unless-stopped \
-v ~/.hermes:/opt/data \
-p 8642:8642 \
-p 9119:9119 \
-e HERMES_DASHBOARD=1 \
nousresearch/hermes-agent gateway run
```
The entrypoint starts `hermes dashboard` in the background (running as the non-root `hermes` user) before `exec`-ing the main command. Dashboard output is prefixed with `[dashboard]` in `docker logs` so it's easy to separate from gateway logs.
The dashboard is supervised by s6 — if it crashes, `s6-supervise` restarts it automatically after a short backoff. Dashboard stdout/stderr is forwarded to `docker logs <container>` (no prefix; the gateway's own output now lives in a per-profile s6-log file — see [Where the logs go](#where-the-logs-go) below — so the two streams don't clash).
| Environment variable | Description | Default |
|---------------------|-------------|---------|
| `HERMES_DASHBOARD` | Set to `1` (or `true` / `yes`) to launch the dashboard alongside the main command | *(unset — dashboard not started)* |
| `HERMES_DASHBOARD_HOST` | Bind address for the dashboard HTTP server | `127.0.0.1` |
| `HERMES_DASHBOARD` | Set to `1` (or `true` / `yes`) to enable the supervised dashboard service | *(unset — service is registered but stays down)* |
| `HERMES_DASHBOARD_HOST` | Bind address for the dashboard HTTP server | `0.0.0.0` |
| `HERMES_DASHBOARD_PORT` | Port for the dashboard HTTP server | `9119` |
| `HERMES_DASHBOARD_TUI` | Set to `1` to expose the in-browser Chat tab (embedded `hermes --tui` via PTY/WebSocket) | *(unset)* |
| `HERMES_DASHBOARD_INSECURE` | Set to `1` (or `true` / `yes`) to bind without the OAuth auth gate. Only use on trusted networks behind a reverse proxy without the OAuth contract — the dashboard exposes API keys and session data | *(unset — gate enforced when a `DashboardAuthProvider` is registered)* |
By default, the dashboard stays on loopback (`127.0.0.1`) to avoid exposing
the web surface over the network. To publish it intentionally, set
`HERMES_DASHBOARD_HOST=0.0.0.0`. The dashboard's OAuth auth gate engages
automatically whenever:
The dashboard inside the container defaults to binding `0.0.0.0` — without it, the published `-p 9119:9119` port would not be reachable from the host. To restrict the bind to container loopback (for sidecar / reverse-proxy setups), set `HERMES_DASHBOARD_HOST=127.0.0.1`.
1. The bind host is non-loopback, **and**
The dashboard's OAuth auth gate engages automatically when both of the following are true:
1. The bind host is non-loopback (e.g. the default `0.0.0.0` inside the container), **and**
2. A `DashboardAuthProvider` plugin is registered.
The bundled `dashboard_auth/nous` provider activates whenever
`HERMES_DASHBOARD_OAUTH_CLIENT_ID` is set (see
[Web Dashboard → Authentication](features/web-dashboard.md)). With the
gate engaged, browser callers are redirected to the configured portal's
OAuth flow before they can reach any protected route.
The bundled `dashboard_auth/nous` provider activates whenever `HERMES_DASHBOARD_OAUTH_CLIENT_ID` is set (see [Web Dashboard → Authentication](features/web-dashboard.md)). With the gate engaged, browser callers are redirected to the configured portal's OAuth flow before they can reach any protected route.
If no provider is registered and the bind is non-loopback, the dashboard
**fails closed at startup** with a specific error pointing at the
missing env var. To opt out of the gate explicitly — for a trusted-LAN
deployment behind your own reverse proxy without the OAuth contract —
set `HERMES_DASHBOARD_INSECURE=1`. This re-enables the legacy "no auth,
loud warning" mode and is the only path that disables the gate; the bind
host does not implicitly determine `--insecure` anymore.
If no provider is registered and the bind is non-loopback, the dashboard **fails closed at startup** with a specific error pointing at the missing env var. To opt out of the gate explicitly — for a trusted-LAN deployment behind your own reverse proxy without the OAuth contract — set `HERMES_DASHBOARD_INSECURE=1`. This is the **only** path that disables the gate; the bind host alone never implies `--insecure` (it used to, but that predated the OAuth gate and silently disabled it on every container-deployed dashboard).
:::note
The dashboard runs as a supervised s6 service inside the container. If
the dashboard process crashes, s6-overlay restarts it automatically
after a short backoff — you'll see a new PID without needing to
restart the container. Logs and crash output are visible via
`docker logs <container>` (s6 forwards service stdout/stderr there).
Running the dashboard as a separate container is not supported: its
gateway-liveness detection requires a shared PID namespace with the
gateway process.
:::warning `HERMES_DASHBOARD_INSECURE=1` exposes API keys
Opting out of the OAuth gate serves the dashboard's API surface (including model keys and session data) to anyone who can reach the published port. Only enable it when you have your own auth layer in front, or on a trusted LAN you fully control.
:::
Running the dashboard as a separate container is not supported: its gateway-liveness detection requires a shared PID namespace with the gateway process.
## Running interactively (CLI chat)
To open an interactive chat session against a running data directory:
@@ -179,37 +158,60 @@ Never run two Hermes **gateway** containers against the same data directory simu
## Multi-profile support
Hermes supports [multiple profiles](../reference/profile-commands.md) — separate `~/.hermes/` directories that let you run independent agents (different SOUL, skills, memory, sessions, credentials) from a single installation. **When running under Docker, using Hermes' built-in multi-profile feature is not recommended.**
Hermes supports [multiple profiles](../reference/profile-commands.md) — separate `~/.hermes/` subdirectories that let you run independent agents (different SOUL, skills, memory, sessions, credentials) from a single installation. **Inside the official Docker image, the s6 supervision tree treats each profile as a first-class supervised service**, so the recommended deployment is **one container hosting all profiles**.
Instead, the recommended pattern is **one container per profile**, with each container bind-mounting its own host directory as `/opt/data`:
Each profile created with `hermes profile create <name>` gets:
- A dedicated s6 service slot at `/run/service/gateway-<name>/`, registered dynamically by the runtime — no container rebuild required.
- Auto-restart on crash, backoff-managed by `s6-supervise`.
- Per-profile rotated logs at `${HERMES_HOME}/logs/gateways/<name>/current` (10 archives × 1 MB each).
- State persistence across container restarts: the boot-time reconciler reads `gateway_state.json` from each profile directory and brings the slot back up only for profiles whose last recorded state was `running`. Stopped profiles stay stopped.
The lifecycle commands you'd run on the host work the same way from inside the container:
```sh
# Work profile
docker run -d \
--name hermes-work \
--restart unless-stopped \
-v ~/.hermes-work:/opt/data \
-p 8642:8642 \
nousresearch/hermes-agent gateway run
# Create a profile — registers the gateway-<name> s6 slot.
docker exec hermes hermes profile create coder
# Personal profile
docker run -d \
--name hermes-personal \
--restart unless-stopped \
-v ~/.hermes-personal:/opt/data \
-p 8643:8642 \
nousresearch/hermes-agent gateway run
# Start / stop / restart — dispatches s6-svc; the gateway lifecycle survives docker restart.
docker exec hermes hermes -p coder gateway start
docker exec hermes hermes -p coder gateway stop
docker exec hermes hermes -p coder gateway restart
# Status — reports `Manager: s6 (container supervisor)` inside the container.
docker exec hermes hermes -p coder gateway status
# Remove a profile — tears down the s6 slot too.
docker exec hermes hermes profile delete coder
```
Why separate containers over profiles in Docker:
Under the hood, `hermes gateway start/stop/restart` inside the container is intercepted and routed to `s6-svc` against the right service directory; you don't need to learn the s6 commands directly. For raw supervisor state, use `/command/s6-svstat /run/service/gateway-<name>` (note `/command/` is on PATH only for processes spawned by the supervision tree — when calling from `docker exec`, pass the absolute path).
- **Isolation** — each container has its own filesystem, process table, and resource limits. A crash, dependency change, or runaway session in one profile can't affect another.
- **Independent lifecycle** — upgrade, restart, pause, or roll back each agent separately (`docker restart hermes-work` leaves `hermes-personal` untouched).
- **Clean port and network separation** — each gateway binds its own host port; there's no risk of cross-talk between chat platforms or API servers.
- **Simpler mental model** — the container *is* the profile. Backups, migrations, and permissions all follow the bind-mounted directory, with no extra `--profile` flags to remember.
- **Avoids concurrent-write risk** — the warning above about never running two gateways against the same data directory still applies to profiles within a single container.
### Why one container with many profiles, not many containers
In Docker Compose, this just means declaring one service per profile with distinct `container_name`, `volumes`, and `ports`:
Before the s6 migration, "one container per profile" was the recommended pattern because there was no in-container supervisor to manage multiple gateways. With s6 as PID 1, that's no longer necessary, and the single-container layout is simpler in almost every dimension:
| | One container, many profiles | One container per profile |
|---|---|---|
| Disk overhead | One image, one bundled venv, one Playwright cache | N images / N caches |
| Memory overhead | Shared Python interpreter cache, shared node_modules | Duplicated per container |
| Profile creation | `docker exec ... hermes profile create <name>` (seconds) | New `docker run` invocation + port allocation + bind-mount config |
| Per-profile crash recovery | `s6-supervise` auto-restart | Docker's `--restart unless-stopped` (slower, kills sibling work) |
| Logs | Per-profile rotated file via `s6-log`, plus container-boot audit log | `docker logs <name>` per container — no built-in rotation |
| Backup | One `~/.hermes` directory | N directories to coordinate |
The default profile (`default`) is always registered on first boot, so a fresh container ships with one supervised gateway out of the box. Additional profiles are pure runtime adds.
### When you DO want a separate container
Profile-in-container is the default. Run a separate container per profile only when you have a specific reason:
- **Resource isolation per workload** — e.g. a runaway browser-tool session in profile A shouldn't be able to OOM profile B. Containers give you `--memory` / `--cpus` per profile.
- **Independent image pinning** — different upstream image tags per workload.
- **Network segmentation** — distinct Docker networks per profile (e.g. one customer-facing, one internal).
- **Compliance / blast radius** — distinct credentials never share an OS-level process tree.
In those cases, declare one service per profile with distinct `container_name`, `volumes`, and `ports`:
```yaml
services:
@@ -234,6 +236,24 @@ services:
- ~/.hermes-personal:/opt/data
```
The warning from [Persistent volumes](#persistent-volumes) still applies: never point two containers at the same `~/.hermes` directory simultaneously. The s6 supervisor inside each container manages its own profile set; cross-container sharing of a data volume corrupts session files and memory stores.
## Where the logs go
The s6 container has four distinct log surfaces, and "why isn't my gateway showing anything in `docker logs`" is a common surprise. Cheatsheet:
| Source | Where it lands | How to read it |
|---|---|---|
| **Per-profile gateway** (`hermes gateway run` and per-profile gateways under s6) | Tee'd to two places: `docker logs <container>` (real time, no extra prefix) **and** `${HERMES_HOME}/logs/gateways/<profile>/current` (rotated, ISO-8601 timestamped, 10 archives × 1 MB each) | `docker logs -f hermes` or `tail -F ~/.hermes/logs/gateways/default/current` on the host |
| **Dashboard** (when `HERMES_DASHBOARD=1`) | `docker logs <container>` (no prefix) | `docker logs -f hermes` — interleaved with gateway lines |
| **Boot reconciler** (records which profile gateways were restored on each container start) | `${HERMES_HOME}/logs/container-boot.log` (append-only audit log) | `tail -F ~/.hermes/logs/container-boot.log` |
| **Generic Hermes logs** (`agent.log`, `errors.log`) | `${HERMES_HOME}/logs/` (profile-aware) | `docker exec hermes hermes logs --follow [--level WARNING] [--session <id>]` |
Two practical consequences worth knowing:
- The file copy at `logs/gateways/<profile>/current` is what survives container restarts. `docker logs` only retains output from the current container's lifetime (and is wiped on `docker rm`); the rotated files persist on the bind-mounted volume.
- The boot reconciler's audit line shape is `<iso-timestamp> profile=<name> prior_state=<state> action=<registered|started>`, so a quick `grep profile=coder ~/.hermes/logs/container-boot.log` reveals when a given profile was last restored and whether s6 auto-started it.
## Environment variable forwarding
API keys are read from `/opt/data/.env` inside the container. You can also pass environment variables directly:
@@ -249,7 +269,7 @@ docker run -it --rm \
Direct `-e` flags override values from `.env`. This is useful for CI/CD or secrets-manager integrations where you don't want keys on disk.
:::note Looking for Docker as the **terminal backend**?
This page covers running Hermes itself inside Docker. If you want Hermes to execute the agent's `terminal` / `execute_code` calls inside a Docker sandbox container (one persistent container per Hermes process), that's a separate config block — `terminal.backend: docker` plus `terminal.docker_image`, `terminal.docker_volumes`, `terminal.docker_forward_env`, `terminal.docker_run_as_host_user`, and `terminal.docker_extra_args`. See [Configuration → Docker Backend](configuration.md#docker-backend) for the full set.
This page covers running Hermes itself inside Docker. If you want Hermes to execute the agent's `terminal` / `execute_code` calls inside a Docker sandbox container (one long-lived container shared across Hermes processes — see issue #20561), that's a separate config block — `terminal.backend: docker` plus `terminal.docker_image`, `terminal.docker_volumes`, `terminal.docker_forward_env`, `terminal.docker_env`, `terminal.docker_run_as_host_user`, `terminal.docker_extra_args`, `terminal.docker_persist_across_processes`, and `terminal.docker_orphan_reaper`. See [Configuration → Docker Backend](configuration.md#docker-backend) for the full set including container-lifecycle rules.
:::
## Docker Compose example
@@ -281,7 +301,7 @@ services:
cpus: "2.0"
```
Start with `docker compose up -d` and view logs with `docker compose logs -f`. Dashboard output is prefixed with `[dashboard]` so it's easy to filter from gateway logs.
Start with `docker compose up -d` and view logs with `docker compose logs -f`. The supervised gateway's stdout is also tee'd to `${HERMES_HOME}/logs/gateways/<profile>/current` on the volume — see [Where the logs go](#where-the-logs-go) for the full routing map.
## Optional: Linux desktop audio bridge
@@ -415,24 +435,28 @@ The container ENTRYPOINT is now `/init` (s6-overlay), not `/usr/bin/tini`. All f
Do not override the image entrypoint unless you keep `/init` (or, equivalently, the legacy `docker/entrypoint.sh` shim that forwards to the stage2 hook) in the command chain. s6-overlay's `/init` runs as root so it can chown the volume on first boot, then drops to the `hermes` user via `s6-setuidgid` for every supervised service AND for the main program. Starting `hermes gateway run` as root inside the official image is refused by default because it can leave root-owned files in `/opt/data` and break later dashboard or gateway starts. Set `HERMES_ALLOW_ROOT_GATEWAY=1` only when you intentionally accept that risk.
:::
### Per-profile gateway supervision
### `docker exec` automatically drops to the `hermes` user
Inside the container, each profile created with `hermes profile create <name>` automatically gets an s6-supervised gateway service registered at `/run/service/gateway-<name>/`. The lifecycle commands you'd run on the host work the same way:
`docker exec hermes <cmd>` defaults to running as root inside the container, but the image ships a thin shim at `/opt/hermes/bin/hermes` (earliest on PATH) that detects root callers and transparently re-execs through `s6-setuidgid hermes`. So `docker exec hermes login`, `docker exec hermes profile create …`, `docker exec hermes setup`, etc. all write files owned by UID 10000 — i.e. readable by the supervised gateway — with no extra `--user` flag needed. Non-root callers (the supervised processes themselves, `docker exec --user hermes`, kanban subagents inside the container) hit a short-circuit that exec's the venv binary directly, so there's no overhead on the hot paths.
If you specifically need a `docker exec` that retains root semantics (diagnostic sessions, inspecting root-only state, files outside `/opt/data` that root happens to own), opt out per invocation:
```sh
hermes profile create coder # registers gateway-coder s6 slot
hermes -p coder gateway start # s6-svc -u → supervised gateway
hermes -p coder gateway stop # s6-svc -d → service down
hermes -p coder gateway restart # s6-svc -t → SIGTERM the supervisor
hermes profile delete coder # tears down the s6 slot
docker exec -e HERMES_DOCKER_EXEC_AS_ROOT=1 hermes <cmd>
```
The shim accepts `1` / `true` / `yes` (case-insensitive). Anything else — including typos like `=0` — falls through to the drop, so silent opt-outs aren't possible. If `s6-setuidgid` isn't available (custom builds that stripped s6-overlay), the shim refuses to run as root and exits 126 instead, surfacing the broken privilege model loudly rather than regressing to the historical footgun where `docker exec hermes login` would write `auth.json` as `root:root` and break the supervised gateway's auth on every chat platform message.
### Per-profile gateway supervision
Each profile created with `hermes profile create <name>` automatically gets an s6-supervised gateway service registered at `/run/service/gateway-<name>/`, with state-persistent auto-restart across container restarts. See [Multi-profile support](#multi-profile-support) above for the user-facing workflow and the lifecycle commands.
**Supervision benefits over the pre-s6 image:**
- Gateway crashes are auto-restarted by `s6-supervise` after a ~1s backoff.
- Dashboard crashes are auto-restarted (set `HERMES_DASHBOARD=1` to start it).
- Dashboard, when enabled with `HERMES_DASHBOARD=1`, is supervised on the same supervision tree and gets the same auto-restart treatment.
- `docker restart` preserves running gateways: the cont-init reconciler reads `$HERMES_HOME/profiles/<name>/gateway_state.json` and brings the slot back up if the last recorded state was `running`. Stopped gateways stay stopped.
- Per-profile gateway logs persist under `$HERMES_HOME/logs/gateways/<profile>/current` (rotated by `s6-log`), and the reconciler's actions are appended to `$HERMES_HOME/logs/container-boot.log` per boot.
- Per-profile gateway logs persist under `$HERMES_HOME/logs/gateways/<profile>/current` (rotated by `s6-log`), and the reconciler's actions are appended to `$HERMES_HOME/logs/container-boot.log` per boot. See [Where the logs go](#where-the-logs-go) for the full routing map.
`hermes status` inside the container reports `Manager: s6 (container supervisor)`. Use `/command/s6-svstat /run/service/gateway-<name>` for the raw supervisor view (note `/command/` is on PATH for supervision-tree processes only; pass the absolute path when calling from `docker exec`).
@@ -692,6 +716,8 @@ The container's stage2 hook drops privileges to the non-root `hermes` user (UID
chmod -R 755 ~/.hermes
```
`docker exec hermes <cmd>` automatically drops to UID 10000 too — see [`docker exec` automatically drops to the `hermes` user](#docker-exec-automatically-drops-to-the-hermes-user) for details and the per-invocation opt-out.
### Browser tools not working
Playwright needs shared memory. Add `--shm-size=1g` to your Docker run command: