Compare commits
10 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 3b6347af15 | |||
| 42612aa350 | |||
| 3c6e70aef1 | |||
| 2f0f03c40d | |||
| 5c2170a7c6 | |||
| d77d877665 | |||
| ac8e238bc8 | |||
| 8d129d013b | |||
| 300140e006 | |||
| e71a2bd11b |
@@ -0,0 +1,110 @@
|
||||
# Hermes Agent v0.15.1 (v2026.5.29)
|
||||
|
||||
**Release Date:** May 29, 2026
|
||||
**Since v0.15.0:** 28 commits · 21 merged PRs · hotfix release · 9 contributors
|
||||
|
||||
> **The Patch Release.** A same-day hotfix for v0.15.0. Headline fix: the dashboard infinite-reload loop that hit anyone running v0.15.0 in loopback mode (Docker, hosted Hermes, fresh installs). A handful of other v0.15.0 follow-ups go along for the ride — kanban worker SIGTERM, `/model` picker unification, `/yolo` session bypass, the full 19,932-entry skills.sh catalog, `.md` media delivery restoration, gateway probe-stepdown safety, web-URL redaction passthrough, kanban worker vision on referenced images, hindsight observation-default. Docker users get an explicit `--insecure` opt-in env var (no more bind-host inference), MCP server bare-command PATH resolution, and arm64 PR-build cache fixes.
|
||||
|
||||
---
|
||||
|
||||
## ✨ Highlights
|
||||
|
||||
- **Dashboard 401 reload loop fixed** — In loopback mode the dashboard's identity probe (`/api/auth/me`) returns 401 by design, but v0.15.0's stale-token reload guard treated every 401 as a rotated session token and full-page-reloaded to pick up a fresh one. Every successful sibling call cleared the one-shot reload guard, so the page reload-looped forever (Firefox: "Navigated to /sessions" storm; Chrome: React re-render storm). Fix adds an `allowUnauthorized` opt-out to `fetchJSON` that skips only the loopback stale-token reload — 401 still throws so `AuthWidget` swallows it, gated-mode `login_url` redirects are unaffected. Closes [#34206](https://github.com/NousResearch/hermes-agent/issues/34206), [#34202](https://github.com/NousResearch/hermes-agent/issues/34202). ([#30698](https://github.com/NousResearch/hermes-agent/pull/30698) — @austinpickett)
|
||||
|
||||
- **Docker dashboard `--insecure` is now an explicit env opt-in, never derived from bind host** — Previously the Docker entrypoint inferred `--insecure` when the dashboard bound to a non-loopback host. That conflated "I want LAN access" with "I want to disable the same-origin guard." The fix splits them: bind host is bind host, and disabling the dashboard's loopback auth requires an explicit `HERMES_DASHBOARD_INSECURE=1`. Existing setups that genuinely wanted insecure binding must now set the env var. ([#34188](https://github.com/NousResearch/hermes-agent/pull/34188), [#34204](https://github.com/NousResearch/hermes-agent/pull/34204) — @benbarclay)
|
||||
|
||||
- **MCP bare command resolution under Docker** — MCP servers configured with bare commands (`npx`, `npm`, `node`) now resolve against `/usr/local/bin` so they actually launch inside the Docker image where those binaries live. v0.15.0 left these failing silently in containers when the agent's effective PATH didn't include the Node toolchain location. ([#34186](https://github.com/NousResearch/hermes-agent/pull/34186) — @benbarclay)
|
||||
|
||||
- **Skills page sidebar / source pills restored** — A stale `useMemo` dependency in the new dashboard skills page collapsed the source pills and category sidebar to "All" only. Fixed; both surfaces now reflect the live catalog state. ([#34194](https://github.com/NousResearch/hermes-agent/pull/34194))
|
||||
|
||||
- **Kanban worker can be killed again** — `SIGTERM` on a kanban worker was being absorbed by an intermediate process and the worker stayed running. Closes [#28181](https://github.com/NousResearch/hermes-agent/issues/28181). ([#34045](https://github.com/NousResearch/hermes-agent/pull/34045))
|
||||
|
||||
- **Full skills.sh catalog (858 → 19,932 entries)** — The skills hub page was pulling a partial paginated catalog. The fetch now walks the sitemap, so all 19,932 skills.sh entries surface in the picker instead of just the first 858. ([#34025](https://github.com/NousResearch/hermes-agent/pull/34025))
|
||||
|
||||
---
|
||||
|
||||
## 🐛 Bug Fixes
|
||||
|
||||
### Dashboard / Web
|
||||
|
||||
- **`/api/auth/me` 401 no longer triggers reload loop** in loopback mode — ([#30698](https://github.com/NousResearch/hermes-agent/pull/30698) — @austinpickett)
|
||||
- **Skills page source pills + category sidebar restored** — stale `useMemo` dep ([#34194](https://github.com/NousResearch/hermes-agent/pull/34194))
|
||||
|
||||
### Docker
|
||||
|
||||
- **`--insecure` is now explicit opt-in via env var**, not derived from bind host ([#34188](https://github.com/NousResearch/hermes-agent/pull/34188) — @benbarclay)
|
||||
- **Dashboard test suite repaired** to match the insecure-opt-in fix ([#34204](https://github.com/NousResearch/hermes-agent/pull/34204) — @benbarclay)
|
||||
- **arm64 PR builds skip the GHA cache** to avoid cache-thrash on cross-arch builders ([#33704](https://github.com/NousResearch/hermes-agent/pull/33704) — @BROCCOLO1D)
|
||||
|
||||
### MCP
|
||||
|
||||
- **Bare `npx`/`npm`/`node` resolve against `/usr/local/bin`** for Docker compatibility ([#34186](https://github.com/NousResearch/hermes-agent/pull/34186) — @benbarclay)
|
||||
|
||||
### Kanban
|
||||
|
||||
- **Worker SIGTERM actually terminates the process** ([#34045](https://github.com/NousResearch/hermes-agent/pull/34045))
|
||||
- **Workers receive images referenced in task bodies** for vision-capable models ([#34210](https://github.com/NousResearch/hermes-agent/pull/34210))
|
||||
|
||||
### Gateway
|
||||
|
||||
- **`.md` files deliver again** — media-delivery validation defaults to denylist-only instead of an overly-narrow allowlist ([#34022](https://github.com/NousResearch/hermes-agent/pull/34022))
|
||||
- **Probe stepdown safety** — on a context-overflow without an explicit provider context limit, the agent no longer steps down to a smaller model based on an unknown ceiling (salvage of [#33673](https://github.com/NousResearch/hermes-agent/pull/33673)) ([#33826](https://github.com/NousResearch/hermes-agent/pull/33826))
|
||||
|
||||
### CLI
|
||||
|
||||
- **`/yolo` mid-session enables the per-session bypass** instead of just toggling the env var (which the running agent had already snapshotted) ([#33931](https://github.com/NousResearch/hermes-agent/pull/33931) — @kshitijk4poor)
|
||||
- **`/model` and `hermes model` show the same list**, plus disk cache for picker startup ([#33867](https://github.com/NousResearch/hermes-agent/pull/33867))
|
||||
|
||||
### Skills
|
||||
|
||||
- **Full skills.sh catalog via sitemap** — 858 → 19,932 entries ([#34025](https://github.com/NousResearch/hermes-agent/pull/34025))
|
||||
|
||||
### Redaction
|
||||
|
||||
- **Web URLs pass through unchanged** — the redactor was eating query parameters that looked credential-shaped ([#34029](https://github.com/NousResearch/hermes-agent/pull/34029))
|
||||
|
||||
---
|
||||
|
||||
## ✨ Small Features
|
||||
|
||||
- **Hindsight default narrowed to observation-only** for `recall_types` — tool path is also narrowed ([#34079](https://github.com/NousResearch/hermes-agent/pull/34079) — @nicoloboschi, follow-up [#34091](https://github.com/NousResearch/hermes-agent/pull/4df62d239e38bf8c212a595721c9c01e176f6c3a) — @kshitijk4poor)
|
||||
- **Memory providers receive completed-turn message context** — salvage of [#28065](https://github.com/NousResearch/hermes-agent/pull/28065) ([#34097](https://github.com/NousResearch/hermes-agent/pull/34097) — @kshitijk4poor, credit to @devwdave)
|
||||
|
||||
---
|
||||
|
||||
## 📚 Documentation
|
||||
|
||||
- **`--no-supervise` / `HERMES_GATEWAY_NO_SUPERVISE` documented** in the reference docs (follow-up to [#33583](https://github.com/NousResearch/hermes-agent/pull/33583)) ([#33751](https://github.com/NousResearch/hermes-agent/pull/33751) — @r266-tech)
|
||||
|
||||
---
|
||||
|
||||
## 🛠️ Infrastructure
|
||||
|
||||
- **Vercel deploy workflow accepts `workflow_dispatch`** so docs deploys can be manually triggered ([#34081](https://github.com/NousResearch/hermes-agent/pull/34081))
|
||||
- **`@nous-research/ui` bumped to 0.18.2** (Nix `npmDepsHash` also updated to match) ([#34193](https://github.com/NousResearch/hermes-agent/pull/34193) follow-ups — @austinpickett)
|
||||
|
||||
---
|
||||
|
||||
## 👥 Contributors
|
||||
|
||||
### Core
|
||||
- @teknium1
|
||||
|
||||
### Community
|
||||
- @austinpickett — dashboard 401 reload-loop fix (the headline), `@nous-research/ui` bump, Nix `npmDepsHash` updates
|
||||
- @benbarclay — Docker `--insecure` opt-in, MCP bare-command resolution, dashboard test repair
|
||||
- @kshitijk4poor — `/yolo` session bypass, completed-turn memory context salvage, hindsight follow-up docs
|
||||
- @nicoloboschi — hindsight `recall_types` observation default
|
||||
- @BROCCOLO1D — arm64 PR build cache fix
|
||||
- @r266-tech — `--no-supervise` reference docs
|
||||
- @yangguangjin — probe stepdown safety (salvage of @yanghd's #33673)
|
||||
- @devwdave — completed-turn memory context (credited via salvage)
|
||||
- @andrewhosf — co-author
|
||||
|
||||
### Issue Reporters (the 401 loop)
|
||||
- @routesmith ([#34206](https://github.com/NousResearch/hermes-agent/issues/34206))
|
||||
- @beeaton ([#34202](https://github.com/NousResearch/hermes-agent/issues/34202))
|
||||
|
||||
---
|
||||
|
||||
**Full Changelog**: [v2026.5.28...v2026.5.29](https://github.com/NousResearch/hermes-agent/compare/v2026.5.28...v2026.5.29)
|
||||
@@ -1,7 +1,7 @@
|
||||
{
|
||||
"id": "hermes-agent",
|
||||
"name": "Hermes Agent",
|
||||
"version": "0.15.0",
|
||||
"version": "0.15.1",
|
||||
"description": "Self-improving open-source AI agent by Nous Research with ACP editor integration, persistent memory, skills, and rich tool support.",
|
||||
"repository": "https://github.com/NousResearch/hermes-agent",
|
||||
"website": "https://hermes-agent.nousresearch.com/docs/user-guide/features/acp",
|
||||
@@ -9,7 +9,7 @@
|
||||
"license": "MIT",
|
||||
"distribution": {
|
||||
"uvx": {
|
||||
"package": "hermes-agent[acp]==0.15.0",
|
||||
"package": "hermes-agent[acp]==0.15.1",
|
||||
"args": ["hermes-acp"]
|
||||
}
|
||||
}
|
||||
|
||||
@@ -576,6 +576,8 @@ def load_cli_config() -> Dict[str, Any]:
|
||||
"docker_env": "TERMINAL_DOCKER_ENV",
|
||||
"docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
|
||||
"docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
|
||||
"docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
|
||||
"docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
|
||||
"sandbox_dir": "TERMINAL_SANDBOX_DIR",
|
||||
# Persistent shell (non-local backends)
|
||||
"persistent_shell": "TERMINAL_PERSISTENT_SHELL",
|
||||
|
||||
@@ -831,6 +831,8 @@ if _config_path.exists():
|
||||
"docker_env": "TERMINAL_DOCKER_ENV",
|
||||
"docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
|
||||
"docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
|
||||
"docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
|
||||
"docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
|
||||
"sandbox_dir": "TERMINAL_SANDBOX_DIR",
|
||||
"persistent_shell": "TERMINAL_PERSISTENT_SHELL",
|
||||
}
|
||||
@@ -5418,6 +5420,49 @@ class GatewayRunner:
|
||||
)
|
||||
stale_timeout_seconds = 0
|
||||
|
||||
# Read kanban.default_assignee — fallback profile for tasks
|
||||
# created without an explicit assignee (e.g. via the dashboard).
|
||||
# When set, the dispatcher applies it to unassigned ready tasks
|
||||
# instead of skipping them indefinitely (#27145). Empty string
|
||||
# (the schema default) means "no fallback, keep skipping" —
|
||||
# backward-compatible with existing installs.
|
||||
default_assignee = (kanban_cfg.get("default_assignee") or "").strip() or None
|
||||
if default_assignee:
|
||||
logger.info(
|
||||
"kanban dispatcher: default_assignee=%r (unassigned ready tasks "
|
||||
"will route to this profile)",
|
||||
default_assignee,
|
||||
)
|
||||
|
||||
# Read kanban.max_in_progress_per_profile — per-profile concurrency
|
||||
# cap (#21582). When set, no single profile gets more than N
|
||||
# workers running at once, even if the global max_in_progress
|
||||
# would allow it. Prevents one profile's local model / API quota
|
||||
# / browser pool from being overwhelmed by a fan-out.
|
||||
raw_per_profile = kanban_cfg.get("max_in_progress_per_profile", None)
|
||||
max_in_progress_per_profile = None
|
||||
if raw_per_profile is not None:
|
||||
try:
|
||||
max_in_progress_per_profile = int(raw_per_profile)
|
||||
except (TypeError, ValueError):
|
||||
logger.warning(
|
||||
"kanban dispatcher: invalid kanban.max_in_progress_per_profile=%r; ignoring",
|
||||
raw_per_profile,
|
||||
)
|
||||
max_in_progress_per_profile = None
|
||||
else:
|
||||
if max_in_progress_per_profile < 1:
|
||||
logger.warning(
|
||||
"kanban dispatcher: kanban.max_in_progress_per_profile=%r is below 1; ignoring",
|
||||
raw_per_profile,
|
||||
)
|
||||
max_in_progress_per_profile = None
|
||||
else:
|
||||
logger.info(
|
||||
"kanban dispatcher: max_in_progress_per_profile=%d",
|
||||
max_in_progress_per_profile,
|
||||
)
|
||||
|
||||
# Initial delay so the gateway finishes wiring adapters before the
|
||||
# dispatcher spawns workers (those workers may hit gateway notify
|
||||
# subscriptions etc.). Matches the notifier watcher's delay.
|
||||
@@ -5509,6 +5554,8 @@ class GatewayRunner:
|
||||
max_in_progress=max_in_progress,
|
||||
failure_limit=failure_limit,
|
||||
stale_timeout_seconds=stale_timeout_seconds,
|
||||
default_assignee=default_assignee,
|
||||
max_in_progress_per_profile=max_in_progress_per_profile,
|
||||
)
|
||||
except sqlite3.DatabaseError as exc:
|
||||
if _is_corrupt_board_db_error(exc):
|
||||
|
||||
@@ -14,8 +14,8 @@ Provides subcommands for:
|
||||
import os
|
||||
import sys
|
||||
|
||||
__version__ = "0.15.0"
|
||||
__release_date__ = "2026.5.28"
|
||||
__version__ = "0.15.1"
|
||||
__release_date__ = "2026.5.29"
|
||||
|
||||
|
||||
def _ensure_utf8():
|
||||
|
||||
@@ -1726,6 +1726,15 @@ DEFAULT_CONFIG = {
|
||||
# assignee to any installed profile. When unset, falls back to the
|
||||
# default profile. A task never ends up with assignee=None.
|
||||
"default_assignee": "",
|
||||
# Per-profile concurrency cap (#21582). When set to a positive int,
|
||||
# no single profile can have more than N workers running at once,
|
||||
# even if the global max_in_progress / max_spawn caps would allow
|
||||
# it. Tasks blocked this way defer to the next dispatcher tick.
|
||||
# Unset (None) means "no per-profile cap" — backward-compatible
|
||||
# with existing installs. Useful for fan-out workflows that would
|
||||
# otherwise saturate one profile's local model / API quota /
|
||||
# browser pool while leaving other profiles idle.
|
||||
"max_in_progress_per_profile": None,
|
||||
# When true, the kanban dispatcher auto-runs the decomposer on
|
||||
# tasks that land in Triage (every dispatcher tick). When false,
|
||||
# decomposition is manual via `hermes kanban decompose <id>` or
|
||||
@@ -5551,6 +5560,8 @@ def set_config_value(key: str, value: str):
|
||||
"terminal.daytona_image": "TERMINAL_DAYTONA_IMAGE",
|
||||
"terminal.docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
|
||||
"terminal.docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
|
||||
"terminal.docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
|
||||
"terminal.docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
|
||||
"terminal.docker_env": "TERMINAL_DOCKER_ENV",
|
||||
# terminal.cwd intentionally excluded — CLI resolves at runtime,
|
||||
# gateway bridges it in gateway/run.py. Persisting to .env causes
|
||||
|
||||
@@ -2087,12 +2087,35 @@ def _cmd_tail(args: argparse.Namespace) -> int:
|
||||
|
||||
|
||||
def _cmd_dispatch(args: argparse.Namespace) -> int:
|
||||
# Honour kanban.default_assignee as the fallback for unassigned ready
|
||||
# tasks (#27145) and kanban.max_in_progress_per_profile as the
|
||||
# per-profile concurrency cap (#21582). Same semantics as the
|
||||
# gateway dispatch path.
|
||||
try:
|
||||
from hermes_cli.config import load_config
|
||||
_cfg = load_config()
|
||||
_kanban_cfg = _cfg.get("kanban", {}) if isinstance(_cfg, dict) else {}
|
||||
default_assignee = (_kanban_cfg.get("default_assignee") or "").strip() or None
|
||||
_raw_per_profile = _kanban_cfg.get("max_in_progress_per_profile", None)
|
||||
try:
|
||||
max_in_progress_per_profile = (
|
||||
int(_raw_per_profile) if _raw_per_profile is not None else None
|
||||
)
|
||||
if max_in_progress_per_profile is not None and max_in_progress_per_profile < 1:
|
||||
max_in_progress_per_profile = None
|
||||
except (TypeError, ValueError):
|
||||
max_in_progress_per_profile = None
|
||||
except Exception:
|
||||
default_assignee = None
|
||||
max_in_progress_per_profile = None
|
||||
with kb.connect_closing() as conn:
|
||||
res = kb.dispatch_once(
|
||||
conn,
|
||||
dry_run=args.dry_run,
|
||||
max_spawn=args.max,
|
||||
failure_limit=getattr(args, "failure_limit", kb.DEFAULT_SPAWN_FAILURE_LIMIT),
|
||||
default_assignee=default_assignee,
|
||||
max_in_progress_per_profile=max_in_progress_per_profile,
|
||||
)
|
||||
if getattr(args, "json", False):
|
||||
print(json.dumps({
|
||||
@@ -2108,6 +2131,11 @@ def _cmd_dispatch(args: argparse.Namespace) -> int:
|
||||
],
|
||||
"skipped_unassigned": res.skipped_unassigned,
|
||||
"skipped_nonspawnable": res.skipped_nonspawnable,
|
||||
"skipped_per_profile_capped": [
|
||||
{"task_id": tid, "assignee": who, "current": current}
|
||||
for (tid, who, current) in res.skipped_per_profile_capped
|
||||
],
|
||||
"auto_assigned_default": res.auto_assigned_default,
|
||||
}, indent=2))
|
||||
return 0
|
||||
print(f"Reclaimed: {res.reclaimed}")
|
||||
@@ -2128,8 +2156,18 @@ def _cmd_dispatch(args: argparse.Namespace) -> int:
|
||||
for tid, who, ws in res.spawned:
|
||||
tag = " (dry)" if args.dry_run else ""
|
||||
print(f" - {tid} -> {who} @ {ws or '-'}{tag}")
|
||||
if res.auto_assigned_default:
|
||||
print(
|
||||
f"Auto-assigned to kanban.default_assignee={default_assignee!r}: "
|
||||
f"{', '.join(res.auto_assigned_default)}"
|
||||
)
|
||||
if res.skipped_unassigned:
|
||||
print(f"Skipped (unassigned): {', '.join(res.skipped_unassigned)}")
|
||||
if res.skipped_per_profile_capped:
|
||||
for tid, who, current in res.skipped_per_profile_capped:
|
||||
print(
|
||||
f"Deferred ({who} at per-profile cap, {current} running): {tid}"
|
||||
)
|
||||
if res.skipped_nonspawnable:
|
||||
print(
|
||||
f"Skipped (non-spawnable assignee — terminal lane, OK): "
|
||||
|
||||
+126
-5
@@ -4289,6 +4289,12 @@ class DispatchResult:
|
||||
skipped_unassigned: list[str] = field(default_factory=list)
|
||||
"""Ready task ids skipped because they have no assignee at all.
|
||||
Operator-actionable — usually a misfiled task waiting for routing."""
|
||||
auto_assigned_default: list[str] = field(default_factory=list)
|
||||
"""Task ids that were unassigned in the DB and had
|
||||
``kanban.default_assignee`` applied this tick before spawning (#27145).
|
||||
Surfaces the auto-assignment to telemetry / CLI / dashboard so the
|
||||
operator can see when the dispatcher is acting on the fallback rule
|
||||
rather than on explicit per-task assignments."""
|
||||
skipped_nonspawnable: list[str] = field(default_factory=list)
|
||||
"""Ready task ids skipped because their assignee names a control-plane
|
||||
lane (a Claude Code terminal like ``orion-cc``) rather than a Hermes
|
||||
@@ -4296,6 +4302,14 @@ class DispatchResult:
|
||||
operator-actionable failure. Tracked separately so health telemetry
|
||||
can distinguish "real stuck" (nothing spawned but spawnable work
|
||||
available) from "correctly idle" (nothing spawnable in the queue)."""
|
||||
skipped_per_profile_capped: list[tuple[str, str, int]] = field(default_factory=list)
|
||||
"""Tasks deferred this tick because their assignee is already at
|
||||
``kanban.max_in_progress_per_profile`` (#21582). Each entry is
|
||||
``(task_id, assignee, current_running_count)``. NOT an
|
||||
operator-actionable failure — the task will be picked up on a
|
||||
subsequent tick when the assignee has capacity. Separate bucket so
|
||||
telemetry / dashboards can show "this profile is busy" vs
|
||||
"task is genuinely stuck"."""
|
||||
crashed: list[str] = field(default_factory=list)
|
||||
"""Task ids reclaimed because their worker PID disappeared."""
|
||||
auto_blocked: list[str] = field(default_factory=list)
|
||||
@@ -5342,6 +5356,8 @@ def dispatch_once(
|
||||
failure_limit: int = DEFAULT_SPAWN_FAILURE_LIMIT,
|
||||
stale_timeout_seconds: int = 0,
|
||||
board: Optional[str] = None,
|
||||
default_assignee: Optional[str] = None,
|
||||
max_in_progress_per_profile: Optional[int] = None,
|
||||
) -> DispatchResult:
|
||||
"""Run one dispatcher tick.
|
||||
|
||||
@@ -5427,12 +5443,89 @@ def dispatch_once(
|
||||
if max_spawn is None or max_spawn > remaining:
|
||||
max_spawn = remaining
|
||||
spawned = 0
|
||||
# Per-profile concurrency cap (#21582): when set, track how many
|
||||
# workers each assignee already has in flight, and refuse to spawn
|
||||
# when this would push that assignee past the cap. Prevents
|
||||
# fan-out workloads from melting a single profile's local model /
|
||||
# API quota / browser pool while leaving other profiles idle.
|
||||
# Tasks blocked this way go to skipped_per_profile_capped (not
|
||||
# skipped_unassigned — the operator-actionable signal is different:
|
||||
# "this profile is busy, try again later" not "this needs routing").
|
||||
_per_profile_cap = max_in_progress_per_profile if (
|
||||
isinstance(max_in_progress_per_profile, int)
|
||||
and max_in_progress_per_profile > 0
|
||||
) else None
|
||||
_per_profile_running: dict[str, int] = {}
|
||||
if _per_profile_cap is not None:
|
||||
for prow in conn.execute(
|
||||
"SELECT assignee, COUNT(*) AS n FROM tasks "
|
||||
"WHERE status = 'running' AND assignee IS NOT NULL "
|
||||
"GROUP BY assignee"
|
||||
):
|
||||
_per_profile_running[prow["assignee"]] = int(prow["n"])
|
||||
# Normalize default_assignee once: empty/whitespace string → None so the
|
||||
# rest of the loop can use ``if default_assignee:`` as a single check.
|
||||
# We also resolve profile_exists once here for the same reason.
|
||||
_default_assignee = (default_assignee or "").strip() or None
|
||||
_default_assignee_resolved = False
|
||||
if _default_assignee:
|
||||
try:
|
||||
from hermes_cli.profiles import profile_exists as _pe
|
||||
_default_assignee_resolved = bool(_pe(_default_assignee))
|
||||
except Exception:
|
||||
# Profiles module not importable (test stubs, exotic envs).
|
||||
# Trust the operator's config and try the assignment; the
|
||||
# downstream profile_exists check on the assigned row will
|
||||
# bucket it as nonspawnable if the profile genuinely isn't
|
||||
# there, with the existing diagnostic.
|
||||
_default_assignee_resolved = True
|
||||
for row in ready_rows:
|
||||
if max_spawn is not None and running_count + spawned >= max_spawn:
|
||||
break
|
||||
if not row["assignee"]:
|
||||
result.skipped_unassigned.append(row["id"])
|
||||
continue
|
||||
row_assignee = row["assignee"]
|
||||
if not row_assignee:
|
||||
# Honour kanban.default_assignee: when the dispatcher hits an
|
||||
# unassigned ready task and an operator-configured fallback
|
||||
# exists, persist the assignment and proceed. This removes the
|
||||
# dashboard footgun where a task created without an assignee
|
||||
# parks in 'ready' forever even though the operator's intent
|
||||
# ("default") was perfectly clear (#27145). Mutating the row
|
||||
# (not just the in-memory view) keeps diagnostics and the
|
||||
# board state consistent: the task is now legitimately owned
|
||||
# by ``kanban.default_assignee``, not "unassigned but secretly
|
||||
# routed".
|
||||
if _default_assignee and _default_assignee_resolved:
|
||||
# Dry-run: show what WOULD happen (auto-assign + spawn) without
|
||||
# mutating the DB. Real run: mutate the row + emit the
|
||||
# 'assigned' event so the board state matches what just happened.
|
||||
if not dry_run:
|
||||
try:
|
||||
with write_txn(conn):
|
||||
conn.execute(
|
||||
"UPDATE tasks SET assignee = ? WHERE id = ? "
|
||||
"AND (assignee IS NULL OR assignee = '')",
|
||||
(_default_assignee, row["id"]),
|
||||
)
|
||||
_append_event(
|
||||
conn, row["id"], "assigned",
|
||||
{
|
||||
"assignee": _default_assignee,
|
||||
"source": "kanban.default_assignee",
|
||||
},
|
||||
)
|
||||
except Exception:
|
||||
_log.debug(
|
||||
"kanban dispatch: failed to apply default_assignee=%r "
|
||||
"to task %s",
|
||||
_default_assignee, row["id"], exc_info=True,
|
||||
)
|
||||
result.skipped_unassigned.append(row["id"])
|
||||
continue
|
||||
row_assignee = _default_assignee
|
||||
result.auto_assigned_default.append(row["id"])
|
||||
else:
|
||||
result.skipped_unassigned.append(row["id"])
|
||||
continue
|
||||
# Skip ready tasks whose assignee is not a real Hermes profile.
|
||||
# `_default_spawn` invokes ``hermes -p <assignee>`` which fails
|
||||
# with "Profile 'X' does not exist" when the assignee names a
|
||||
@@ -5447,7 +5540,7 @@ def dispatch_once(
|
||||
from hermes_cli.profiles import profile_exists # local import: avoids cycle
|
||||
except Exception:
|
||||
profile_exists = None # type: ignore[assignment]
|
||||
if profile_exists is not None and not profile_exists(row["assignee"]):
|
||||
if profile_exists is not None and not profile_exists(row_assignee):
|
||||
# Bucket separately from skipped_unassigned: the operator
|
||||
# cannot fix this by assigning a profile (the assignee IS the
|
||||
# intended owner — a terminal lane). Health telemetry uses
|
||||
@@ -5456,6 +5549,19 @@ def dispatch_once(
|
||||
# of human-pulled work.
|
||||
result.skipped_nonspawnable.append(row["id"])
|
||||
continue
|
||||
# Per-profile concurrency cap (#21582): even if there's global
|
||||
# headroom, refuse to spawn for an assignee that's already at
|
||||
# its in-flight cap. Prevents one profile's local model / API
|
||||
# quota / browser pool from being overwhelmed by a fan-out
|
||||
# while the global max_in_progress / max_spawn caps still allow
|
||||
# work on OTHER profiles.
|
||||
if _per_profile_cap is not None:
|
||||
current = _per_profile_running.get(row_assignee, 0)
|
||||
if current >= _per_profile_cap:
|
||||
result.skipped_per_profile_capped.append(
|
||||
(row["id"], row_assignee, current)
|
||||
)
|
||||
continue
|
||||
# Respawn guard: refuse to re-spawn when useful work is already
|
||||
# in-flight/recent, or when the last failure is a deterministic
|
||||
# blocker (quota / auth). The guard defers the spawn this tick so
|
||||
@@ -5478,7 +5584,15 @@ def dispatch_once(
|
||||
)
|
||||
continue
|
||||
if dry_run:
|
||||
result.spawned.append((row["id"], row["assignee"], ""))
|
||||
result.spawned.append((row["id"], row_assignee, ""))
|
||||
# Increment per-profile counter even in dry_run so the cap
|
||||
# check sees the would-be spawn on subsequent iterations.
|
||||
# Without this, dry_run reports every task as spawnable and
|
||||
# under-reports the capped subset (#21582).
|
||||
if _per_profile_cap is not None and row_assignee:
|
||||
_per_profile_running[row_assignee] = (
|
||||
_per_profile_running.get(row_assignee, 0) + 1
|
||||
)
|
||||
continue
|
||||
claimed = claim_task(conn, row["id"], ttl_seconds=ttl_seconds)
|
||||
if claimed is None:
|
||||
@@ -5521,6 +5635,13 @@ def dispatch_once(
|
||||
# complete_task).
|
||||
result.spawned.append((claimed.id, claimed.assignee or "", str(workspace)))
|
||||
spawned += 1
|
||||
# Track the new in-flight count for this profile so later
|
||||
# iterations in this same tick respect the per-profile cap
|
||||
# (#21582). Subsequent ticks re-query from the DB.
|
||||
if _per_profile_cap is not None and claimed.assignee:
|
||||
_per_profile_running[claimed.assignee] = (
|
||||
_per_profile_running.get(claimed.assignee, 0) + 1
|
||||
)
|
||||
except Exception as exc:
|
||||
auto = _record_spawn_failure(
|
||||
conn, claimed.id, str(exc),
|
||||
|
||||
+1
-1
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
|
||||
|
||||
[project]
|
||||
name = "hermes-agent"
|
||||
version = "0.15.0"
|
||||
version = "0.15.1"
|
||||
description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
|
||||
readme = "README.md"
|
||||
requires-python = ">=3.11"
|
||||
|
||||
@@ -227,6 +227,8 @@ _HERMES_BEHAVIORAL_VARS = frozenset({
|
||||
"TERMINAL_CONTAINER_DISK",
|
||||
"TERMINAL_CONTAINER_MEMORY",
|
||||
"TERMINAL_CONTAINER_PERSISTENT",
|
||||
"TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
|
||||
"TERMINAL_DOCKER_ORPHAN_REAPER",
|
||||
"TERMINAL_DOCKER_RUN_AS_HOST_USER",
|
||||
"BROWSER_CDP_URL",
|
||||
"CAMOFOX_URL",
|
||||
|
||||
@@ -0,0 +1,154 @@
|
||||
"""Regression tests for #27145 — kanban.default_assignee for unassigned ready tasks.
|
||||
|
||||
When the dispatcher hits an unassigned ready task and ``kanban.default_assignee``
|
||||
is set, the dispatcher applies the assignment and spawns. Without the config,
|
||||
the task is skipped (existing behavior preserved).
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
import tempfile
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
def isolated_kanban_home(monkeypatch):
|
||||
"""Spin up a fresh HERMES_HOME with a clean kanban DB."""
|
||||
test_home = tempfile.mkdtemp(prefix="kanban_default_assignee_test_")
|
||||
monkeypatch.setenv("HERMES_HOME", test_home)
|
||||
# Force-reimport so the fresh HERMES_HOME is picked up.
|
||||
for mod in list(sys.modules.keys()):
|
||||
if mod.startswith("hermes_cli") or mod.startswith("hermes_state") or mod == "hermes_constants":
|
||||
del sys.modules[mod]
|
||||
from hermes_cli import kanban_db
|
||||
yield kanban_db, test_home
|
||||
# Cleanup is best-effort; tempfile dir survives but pytest isolation
|
||||
# gives each test its own monkeypatched HERMES_HOME so no cross-test
|
||||
# contamination.
|
||||
|
||||
|
||||
def _fake_spawn(*args, **kwargs):
|
||||
"""Stand-in for the real worker spawn — returns a fake PID."""
|
||||
return 12345
|
||||
|
||||
|
||||
def test_unassigned_task_skipped_without_default_assignee(isolated_kanban_home):
|
||||
"""Baseline: with no default_assignee, an unassigned ready task is
|
||||
skipped via the existing `skipped_unassigned` bucket and the DB row
|
||||
is untouched."""
|
||||
kb, _home = isolated_kanban_home
|
||||
with kb.connect_closing() as conn:
|
||||
kb.create_board(slug="default", name="Test")
|
||||
task_id = kb.create_task(conn, title="t1", assignee=None)
|
||||
with kb.connect_closing() as conn:
|
||||
res = kb.dispatch_once(conn, spawn_fn=_fake_spawn, dry_run=False)
|
||||
assert res.skipped_unassigned == [task_id]
|
||||
assert not res.auto_assigned_default
|
||||
assert not res.spawned
|
||||
with kb.connect_closing() as conn:
|
||||
row = conn.execute("SELECT assignee FROM tasks WHERE id = ?", (task_id,)).fetchone()
|
||||
assert row["assignee"] is None
|
||||
|
||||
|
||||
def test_unassigned_task_auto_assigned_with_default_assignee(isolated_kanban_home):
|
||||
"""Core #27145 contract: with default_assignee set, an unassigned ready
|
||||
task gets the assignment applied and dispatched on the same tick. The
|
||||
DB row is mutated (assignee column + an 'assigned' event)."""
|
||||
kb, _home = isolated_kanban_home
|
||||
with kb.connect_closing() as conn:
|
||||
kb.create_board(slug="default", name="Test")
|
||||
task_id = kb.create_task(conn, title="t1", assignee=None)
|
||||
with kb.connect_closing() as conn:
|
||||
res = kb.dispatch_once(
|
||||
conn, spawn_fn=_fake_spawn, dry_run=False,
|
||||
default_assignee="default",
|
||||
)
|
||||
assert res.auto_assigned_default == [task_id]
|
||||
assert not res.skipped_unassigned
|
||||
assert len(res.spawned) == 1
|
||||
assert res.spawned[0][0] == task_id
|
||||
assert res.spawned[0][1] == "default"
|
||||
|
||||
with kb.connect_closing() as conn:
|
||||
row = conn.execute("SELECT assignee FROM tasks WHERE id = ?", (task_id,)).fetchone()
|
||||
assert row["assignee"] == "default"
|
||||
|
||||
# 'assigned' event emitted for the audit trail
|
||||
with kb.connect_closing() as conn:
|
||||
evs = list(conn.execute(
|
||||
"SELECT kind, payload FROM task_events WHERE task_id = ? AND kind = 'assigned'",
|
||||
(task_id,),
|
||||
))
|
||||
assert len(evs) == 1
|
||||
payload = json.loads(evs[0][1])
|
||||
assert payload["assignee"] == "default"
|
||||
assert payload["source"] == "kanban.default_assignee"
|
||||
|
||||
|
||||
def test_dry_run_with_default_assignee_reports_without_mutating(isolated_kanban_home):
|
||||
"""Dry-run mode: reports what WOULD happen (task in auto_assigned_default,
|
||||
spawn entry) but does NOT mutate the DB. Operators using
|
||||
`hermes kanban dispatch --dry-run` see the routing decision before
|
||||
committing."""
|
||||
kb, _home = isolated_kanban_home
|
||||
with kb.connect_closing() as conn:
|
||||
kb.create_board(slug="default", name="Test")
|
||||
task_id = kb.create_task(conn, title="t1", assignee=None)
|
||||
with kb.connect_closing() as conn:
|
||||
res = kb.dispatch_once(
|
||||
conn, spawn_fn=_fake_spawn, dry_run=True,
|
||||
default_assignee="default",
|
||||
)
|
||||
assert res.auto_assigned_default == [task_id]
|
||||
assert len(res.spawned) == 1
|
||||
with kb.connect_closing() as conn:
|
||||
row = conn.execute("SELECT assignee FROM tasks WHERE id = ?", (task_id,)).fetchone()
|
||||
# DB unchanged — dry_run did not commit the assignment.
|
||||
assert row["assignee"] is None
|
||||
|
||||
|
||||
def test_whitespace_default_assignee_treated_as_none(isolated_kanban_home):
|
||||
"""Empty / whitespace-only default_assignee values must be treated as
|
||||
'no fallback set' so a misconfigured kanban.default_assignee=' '
|
||||
doesn't surprise operators by silently routing unassigned tasks."""
|
||||
kb, _home = isolated_kanban_home
|
||||
with kb.connect_closing() as conn:
|
||||
kb.create_board(slug="default", name="Test")
|
||||
task_id = kb.create_task(conn, title="t1", assignee=None)
|
||||
with kb.connect_closing() as conn:
|
||||
res = kb.dispatch_once(
|
||||
conn, spawn_fn=_fake_spawn, dry_run=False,
|
||||
default_assignee=" ",
|
||||
)
|
||||
assert task_id in res.skipped_unassigned
|
||||
assert not res.auto_assigned_default
|
||||
|
||||
|
||||
def test_explicitly_assigned_task_untouched_by_default_assignee(isolated_kanban_home):
|
||||
"""A task with an explicit assignee must NOT be touched by the
|
||||
default_assignee logic — that fallback only applies to genuinely
|
||||
unassigned rows."""
|
||||
kb, _home = isolated_kanban_home
|
||||
with kb.connect_closing() as conn:
|
||||
kb.create_board(slug="default", name="Test")
|
||||
task_id = kb.create_task(conn, title="t1", assignee="default")
|
||||
with kb.connect_closing() as conn:
|
||||
res = kb.dispatch_once(
|
||||
conn, spawn_fn=_fake_spawn, dry_run=False,
|
||||
default_assignee="someother",
|
||||
)
|
||||
assert task_id not in res.auto_assigned_default
|
||||
assert any(s[0] == task_id and s[1] == "default" for s in res.spawned)
|
||||
|
||||
|
||||
def test_dispatch_result_has_auto_assigned_default_field():
|
||||
"""Schema-level invariant: DispatchResult exposes the
|
||||
auto_assigned_default field so CLI / dashboard / gateway can surface
|
||||
the new routing decisions."""
|
||||
from hermes_cli.kanban_db import DispatchResult
|
||||
r = DispatchResult()
|
||||
assert hasattr(r, "auto_assigned_default")
|
||||
assert r.auto_assigned_default == []
|
||||
@@ -0,0 +1,167 @@
|
||||
"""Regression tests for #21582 — per-profile concurrency cap in dispatcher.
|
||||
|
||||
When ``kanban.max_in_progress_per_profile`` is set, no single profile
|
||||
gets more than N workers running at once even if the global
|
||||
``max_in_progress`` cap would allow it. Prevents one profile's local
|
||||
model / API quota / browser pool from being overwhelmed by a fan-out.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
import sys
|
||||
import tempfile
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
def isolated_kanban_home_with_profiles(monkeypatch):
|
||||
"""Spin up a fresh HERMES_HOME with kanban DB + alpha/beta profiles."""
|
||||
test_home = tempfile.mkdtemp(prefix="kanban_per_profile_cap_test_")
|
||||
for prof in ("alpha", "beta", "default"):
|
||||
os.makedirs(os.path.join(test_home, "profiles", prof), exist_ok=True)
|
||||
monkeypatch.setenv("HERMES_HOME", test_home)
|
||||
for mod in list(sys.modules.keys()):
|
||||
if mod.startswith("hermes_cli") or mod.startswith("hermes_state") or mod == "hermes_constants":
|
||||
del sys.modules[mod]
|
||||
from hermes_cli import kanban_db
|
||||
yield kanban_db
|
||||
|
||||
|
||||
def _fake_spawn(*args, **kwargs):
|
||||
return 12345
|
||||
|
||||
|
||||
def test_no_cap_all_tasks_dispatched(isolated_kanban_home_with_profiles):
|
||||
"""Baseline: with no per-profile cap, all ready tasks dispatch."""
|
||||
kb = isolated_kanban_home_with_profiles
|
||||
with kb.connect_closing() as conn:
|
||||
kb.create_board(slug="default", name="Test")
|
||||
for i in range(5):
|
||||
kb.create_task(conn, title=f"a{i}", assignee="alpha")
|
||||
for i in range(3):
|
||||
kb.create_task(conn, title=f"b{i}", assignee="beta")
|
||||
with kb.connect_closing() as conn:
|
||||
res = kb.dispatch_once(conn, spawn_fn=_fake_spawn, dry_run=True)
|
||||
assert len(res.spawned) == 8
|
||||
assert not res.skipped_per_profile_capped
|
||||
|
||||
|
||||
def test_cap_2_balances_two_profiles(isolated_kanban_home_with_profiles):
|
||||
"""With cap=2: 2 alpha + 2 beta dispatched; remaining 3 alpha + 1 beta
|
||||
deferred to skipped_per_profile_capped."""
|
||||
kb = isolated_kanban_home_with_profiles
|
||||
with kb.connect_closing() as conn:
|
||||
kb.create_board(slug="default", name="Test")
|
||||
for i in range(5):
|
||||
kb.create_task(conn, title=f"a{i}", assignee="alpha")
|
||||
for i in range(3):
|
||||
kb.create_task(conn, title=f"b{i}", assignee="beta")
|
||||
with kb.connect_closing() as conn:
|
||||
res = kb.dispatch_once(
|
||||
conn, spawn_fn=_fake_spawn, dry_run=True,
|
||||
max_in_progress_per_profile=2,
|
||||
)
|
||||
spawn_assignees = [s[1] for s in res.spawned]
|
||||
capped_assignees = [c[1] for c in res.skipped_per_profile_capped]
|
||||
assert spawn_assignees.count("alpha") == 2
|
||||
assert spawn_assignees.count("beta") == 2
|
||||
assert capped_assignees.count("alpha") == 3
|
||||
assert capped_assignees.count("beta") == 1
|
||||
|
||||
|
||||
def test_pre_existing_running_counts_against_cap(isolated_kanban_home_with_profiles):
|
||||
"""A task already in 'running' status when dispatch_once starts counts
|
||||
toward the per-profile cap. With 1 alpha pre-running and cap=1, NO new
|
||||
alpha tasks should spawn; beta is independent so 1 beta spawns."""
|
||||
kb = isolated_kanban_home_with_profiles
|
||||
with kb.connect_closing() as conn:
|
||||
kb.create_board(slug="default", name="Test")
|
||||
running_alpha = kb.create_task(conn, title="running alpha", assignee="alpha")
|
||||
with kb.write_txn(conn):
|
||||
conn.execute(
|
||||
"UPDATE tasks SET status = 'running', claim_lock = 'test:1' WHERE id = ?",
|
||||
(running_alpha,),
|
||||
)
|
||||
for i in range(2):
|
||||
kb.create_task(conn, title=f"a{i}", assignee="alpha")
|
||||
for i in range(2):
|
||||
kb.create_task(conn, title=f"b{i}", assignee="beta")
|
||||
with kb.connect_closing() as conn:
|
||||
res = kb.dispatch_once(
|
||||
conn, spawn_fn=_fake_spawn, dry_run=True,
|
||||
max_in_progress_per_profile=1,
|
||||
)
|
||||
spawn_assignees = [s[1] for s in res.spawned]
|
||||
capped_assignees = [c[1] for c in res.skipped_per_profile_capped]
|
||||
assert spawn_assignees.count("alpha") == 0
|
||||
assert spawn_assignees.count("beta") == 1
|
||||
assert capped_assignees.count("alpha") == 2
|
||||
assert capped_assignees.count("beta") == 1
|
||||
|
||||
|
||||
@pytest.mark.parametrize("cap", [0, -1, "abc", None])
|
||||
def test_invalid_cap_treated_as_no_cap(isolated_kanban_home_with_profiles, cap):
|
||||
"""Cap values that don't represent a positive int should be treated as
|
||||
'no cap' — silently falling through rather than crashing the dispatcher."""
|
||||
kb = isolated_kanban_home_with_profiles
|
||||
with kb.connect_closing() as conn:
|
||||
kb.create_board(slug="default", name="Test")
|
||||
for i in range(3):
|
||||
kb.create_task(conn, title=f"a{i}", assignee="alpha")
|
||||
with kb.connect_closing() as conn:
|
||||
res = kb.dispatch_once(
|
||||
conn, spawn_fn=_fake_spawn, dry_run=True,
|
||||
max_in_progress_per_profile=cap,
|
||||
)
|
||||
assert not res.skipped_per_profile_capped
|
||||
assert len(res.spawned) == 3
|
||||
|
||||
|
||||
def test_capped_tasks_dispatched_on_subsequent_tick(isolated_kanban_home_with_profiles):
|
||||
"""A task deferred this tick because its profile was at cap should be
|
||||
eligible for dispatch on the next tick (after running tasks complete).
|
||||
This verifies the cap is per-tick state, not a permanent block."""
|
||||
kb = isolated_kanban_home_with_profiles
|
||||
with kb.connect_closing() as conn:
|
||||
kb.create_board(slug="default", name="Test")
|
||||
ids = [kb.create_task(conn, title=f"a{i}", assignee="alpha") for i in range(3)]
|
||||
|
||||
# First tick: cap=1, only 1 alpha dispatched
|
||||
with kb.connect_closing() as conn:
|
||||
res1 = kb.dispatch_once(
|
||||
conn, spawn_fn=_fake_spawn, dry_run=False,
|
||||
max_in_progress_per_profile=1,
|
||||
)
|
||||
assert len(res1.spawned) == 1
|
||||
assert len(res1.skipped_per_profile_capped) == 2
|
||||
|
||||
# Simulate the running task completing — set it back to done so the
|
||||
# 'running' count drops
|
||||
spawned_id = res1.spawned[0][0]
|
||||
with kb.connect_closing() as conn:
|
||||
with kb.write_txn(conn):
|
||||
conn.execute(
|
||||
"UPDATE tasks SET status = 'done', claim_lock = NULL WHERE id = ?",
|
||||
(spawned_id,),
|
||||
)
|
||||
|
||||
# Second tick: 1 more alpha should now dispatch
|
||||
with kb.connect_closing() as conn:
|
||||
res2 = kb.dispatch_once(
|
||||
conn, spawn_fn=_fake_spawn, dry_run=False,
|
||||
max_in_progress_per_profile=1,
|
||||
)
|
||||
assert len(res2.spawned) == 1
|
||||
assert len(res2.skipped_per_profile_capped) == 1
|
||||
assert res2.spawned[0][0] != spawned_id # different task this time
|
||||
|
||||
|
||||
def test_dispatch_result_has_skipped_per_profile_capped_field():
|
||||
"""Schema-level invariant: DispatchResult exposes the
|
||||
skipped_per_profile_capped field as a list of
|
||||
(task_id, assignee, current_running) tuples."""
|
||||
from hermes_cli.kanban_db import DispatchResult
|
||||
r = DispatchResult()
|
||||
assert hasattr(r, "skipped_per_profile_capped")
|
||||
assert r.skipped_per_profile_capped == []
|
||||
@@ -203,25 +203,43 @@ def test_auto_mount_replaces_persistent_workspace_bind(monkeypatch, tmp_path):
|
||||
|
||||
|
||||
def test_non_persistent_cleanup_removes_container(monkeypatch):
|
||||
"""When persistent=false, cleanup() must schedule docker stop + rm."""
|
||||
"""When persist_across_processes=false, cleanup() must docker stop AND
|
||||
docker rm so containers don't leak across hermes processes.
|
||||
|
||||
Updated for issue #20561: the previous implementation used fire-and-forget
|
||||
``subprocess.Popen("... &", shell=True)`` which raced with parent exit;
|
||||
the new implementation uses ``subprocess.run`` on a daemon thread with
|
||||
bounded timeouts. See test_cleanup_with_persist_disabled_stops_and_rms
|
||||
for the full behavior contract.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
calls = _mock_subprocess_run(monkeypatch)
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
# Run the worker thread synchronously so assertions can observe its work.
|
||||
import threading
|
||||
monkeypatch.setattr(threading, "Thread", _FakeThread)
|
||||
|
||||
popen_cmds = []
|
||||
monkeypatch.setattr(
|
||||
docker_env.subprocess, "Popen",
|
||||
lambda cmd, **kw: (popen_cmds.append(cmd), type("P", (), {"poll": lambda s: 0, "wait": lambda s, **k: None, "returncode": 0, "stdout": iter([]), "stdin": None})())[1],
|
||||
env = docker_env.DockerEnvironment(
|
||||
image="python:3.11", cwd="/root", timeout=60,
|
||||
task_id="ephemeral-task", persistent_filesystem=False,
|
||||
persist_across_processes=False,
|
||||
)
|
||||
|
||||
env = _make_dummy_env(persistent_filesystem=False, task_id="ephemeral-task")
|
||||
assert env._container_id
|
||||
container_id = env._container_id
|
||||
assert container_id
|
||||
|
||||
# Capture cleanup-time docker calls (everything before this was init).
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capture(cmd, **kw):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kw))
|
||||
return real_run(cmd, **kw)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capture)
|
||||
env.cleanup()
|
||||
|
||||
# Should have stop and rm calls via Popen
|
||||
stop_cmds = [c for c in popen_cmds if container_id in str(c) and "stop" in str(c)]
|
||||
assert len(stop_cmds) >= 1, f"cleanup() should schedule docker stop for {container_id}"
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and c[0][1:2] == ["stop"]]
|
||||
assert stops, f"cleanup() should docker stop {container_id}; got {cleanup_calls}"
|
||||
|
||||
|
||||
class _FakePopen:
|
||||
@@ -514,3 +532,839 @@ def test_run_as_host_user_warns_and_skips_when_no_posix_ids(monkeypatch, caplog)
|
||||
"does not expose POSIX uid/gid" in rec.getMessage()
|
||||
for rec in caplog.records
|
||||
), "expected a warning when POSIX ids are unavailable"
|
||||
|
||||
|
||||
# ── Docker labels (issue #20561) ──────────────────────────────────
|
||||
|
||||
|
||||
def _run_args_from_calls(calls):
|
||||
"""Pull the argv list passed to the first ``docker run`` invocation."""
|
||||
run_calls = [
|
||||
c for c in calls
|
||||
if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"
|
||||
]
|
||||
assert run_calls, "docker run should have been called"
|
||||
return run_calls[0][0]
|
||||
|
||||
|
||||
def _labels_in_run_args(run_args):
|
||||
"""Return the set of ``key=value`` strings passed via ``--label``."""
|
||||
return {
|
||||
run_args[i + 1]
|
||||
for i, flag in enumerate(run_args[:-1])
|
||||
if flag == "--label"
|
||||
}
|
||||
|
||||
|
||||
def test_run_command_tags_hermes_agent_label(monkeypatch):
|
||||
"""Every container hermes-agent starts must carry the hermes-agent=1 label
|
||||
so the orphan reaper (and external operators) can identify them with a
|
||||
single ``docker ps --filter label=hermes-agent=1`` call. Regression test
|
||||
for issue #20561 — without the label there is no global sweep target."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
calls = _mock_subprocess_run(monkeypatch)
|
||||
|
||||
_make_dummy_env(task_id="my-task")
|
||||
|
||||
labels = _labels_in_run_args(_run_args_from_calls(calls))
|
||||
assert "hermes-agent=1" in labels, (
|
||||
f"hermes-agent=1 label missing; got labels: {sorted(labels)}"
|
||||
)
|
||||
|
||||
|
||||
def test_run_command_tags_task_and_profile_labels(monkeypatch):
|
||||
"""task_id and the active profile name are surfaced as labels so future
|
||||
cross-process reuse logic can filter to a specific (task, profile) pair
|
||||
without parsing container names. Profile resolution uses the helper that
|
||||
returns ``"default"`` for the root Hermes home."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "research-bot")
|
||||
calls = _mock_subprocess_run(monkeypatch)
|
||||
|
||||
_make_dummy_env(task_id="kanban-42")
|
||||
|
||||
labels = _labels_in_run_args(_run_args_from_calls(calls))
|
||||
assert "hermes-task-id=kanban-42" in labels, (
|
||||
f"hermes-task-id=kanban-42 missing; got: {sorted(labels)}"
|
||||
)
|
||||
assert "hermes-profile=research-bot" in labels, (
|
||||
f"hermes-profile=research-bot missing; got: {sorted(labels)}"
|
||||
)
|
||||
|
||||
|
||||
def test_label_sanitizer_rejects_invalid_characters():
|
||||
"""Docker label values must be alnum + ``_.-`` and ≤63 chars. Profile or
|
||||
task names containing slashes, colons, or unicode would otherwise emit
|
||||
invalid labels that round-trip badly through ``docker ps --filter``."""
|
||||
assert docker_env._sanitize_label_value("plain-name_1.0") == "plain-name_1.0"
|
||||
assert docker_env._sanitize_label_value("with/slash") == "with_slash"
|
||||
assert docker_env._sanitize_label_value("with:colon") == "with_colon"
|
||||
assert docker_env._sanitize_label_value("emoji-😀-here") == "emoji-_-here"
|
||||
# Empty / non-string inputs must collapse to a queryable token, not "".
|
||||
assert docker_env._sanitize_label_value("") == "unknown"
|
||||
assert docker_env._sanitize_label_value(None) == "unknown" # type: ignore[arg-type]
|
||||
# >63 chars must truncate, not error.
|
||||
long_value = "x" * 100
|
||||
assert len(docker_env._sanitize_label_value(long_value)) == 63
|
||||
|
||||
|
||||
def test_run_command_sanitizes_unsafe_task_id(monkeypatch):
|
||||
"""A task_id containing characters Docker rejects in label values must be
|
||||
sanitized before reaching ``docker run --label``; otherwise the daemon
|
||||
refuses the run with an inscrutable error and the agent's first command
|
||||
blows up."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
calls = _mock_subprocess_run(monkeypatch)
|
||||
|
||||
_make_dummy_env(task_id="task/with:weird*chars")
|
||||
|
||||
labels = _labels_in_run_args(_run_args_from_calls(calls))
|
||||
# Each non-OK character becomes an underscore; the safe chars survive.
|
||||
assert "hermes-task-id=task_with_weird_chars" in labels, (
|
||||
f"sanitized task-id label missing; got: {sorted(labels)}"
|
||||
)
|
||||
|
||||
|
||||
def test_labels_attribute_populated_after_init(monkeypatch):
|
||||
"""``self._labels`` must be set to the same key/value pairs that went onto
|
||||
docker run, so subsequent reuse / reaper paths can match without re-running
|
||||
the sanitizer or re-importing the profile module."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
|
||||
env = _make_dummy_env(task_id="abc")
|
||||
|
||||
assert env._labels == {
|
||||
"hermes-agent": "1",
|
||||
"hermes-task-id": "abc",
|
||||
"hermes-profile": "default",
|
||||
}
|
||||
|
||||
|
||||
# ── Cross-process container reuse (issue #20561) ──────────────────
|
||||
|
||||
|
||||
def _mock_subprocess_run_with_reuse(monkeypatch, ps_state: str | None,
|
||||
start_succeeds: bool = True):
|
||||
"""Reuse-aware subprocess.run mock.
|
||||
|
||||
``ps_state`` controls what ``docker ps -a --filter ...`` returns:
|
||||
* ``None`` → no match (empty stdout). Forces a fresh ``docker run``.
|
||||
* ``"running"`` / ``"exited"`` / ... → emit ``CID\\tSTATE`` so the reuse
|
||||
path picks it up. ``"running"`` skips ``docker start``; other states
|
||||
trigger ``docker start`` (which can be forced to fail via
|
||||
``start_succeeds=False``).
|
||||
|
||||
Returns the captured call list so the test can verify which docker
|
||||
commands actually ran.
|
||||
"""
|
||||
calls = []
|
||||
|
||||
def _run(cmd, **kwargs):
|
||||
calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
if isinstance(cmd, list) and len(cmd) >= 2:
|
||||
sub = cmd[1]
|
||||
if sub == "version":
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="Docker version", stderr="")
|
||||
if sub == "ps":
|
||||
if ps_state is None:
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0, stdout=f"reused-cid\t{ps_state}\n", stderr="",
|
||||
)
|
||||
if sub == "start":
|
||||
if not start_succeeds:
|
||||
# Real subprocess.run with check=True raises on non-zero exit;
|
||||
# mirror that so the production code's except clause fires.
|
||||
raise subprocess.CalledProcessError(1, cmd, output="", stderr="no such container")
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="reused-cid\n", stderr="")
|
||||
if sub == "run":
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="fresh-cid\n", stderr="")
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _run)
|
||||
return calls
|
||||
|
||||
|
||||
def test_reuse_attaches_to_running_container_without_docker_run(monkeypatch):
|
||||
"""When a labeled container is already ``running``, the reuse probe
|
||||
must pick it up and skip ``docker run`` entirely. Regression for the
|
||||
issue #20561 root cause: every Hermes process spawning a new container
|
||||
despite docs claiming "ONE long-lived container shared across sessions"."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="running")
|
||||
|
||||
env = _make_dummy_env(task_id="reuse-test")
|
||||
|
||||
# The reuse path must populate _container_id from the ps probe output.
|
||||
assert env._container_id == "reused-cid", (
|
||||
f"expected reused container id, got {env._container_id!r}"
|
||||
)
|
||||
# And it must NOT have run `docker run`.
|
||||
run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
|
||||
assert not run_invocations, (
|
||||
f"docker run should be skipped on reuse, got: {run_invocations}"
|
||||
)
|
||||
# And it must have NOT issued a `docker start` for an already-running container.
|
||||
start_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "start"]
|
||||
assert not start_invocations, (
|
||||
f"docker start should be skipped when container already running, got: {start_invocations}"
|
||||
)
|
||||
|
||||
|
||||
def test_reuse_starts_stopped_container_before_attaching(monkeypatch):
|
||||
"""A labeled container in ``exited`` state must be restarted via
|
||||
``docker start`` before the new Hermes process uses it. Without this
|
||||
step, ``docker exec`` against a stopped container errors out and the
|
||||
first agent command fails opaquely."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="exited")
|
||||
|
||||
env = _make_dummy_env(task_id="reuse-stopped")
|
||||
|
||||
assert env._container_id == "reused-cid"
|
||||
start_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "start"]
|
||||
assert start_invocations, "expected docker start for exited container"
|
||||
run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
|
||||
assert not run_invocations, "should not docker run when reusing an exited container"
|
||||
|
||||
|
||||
def test_reuse_falls_back_to_fresh_run_when_start_fails(monkeypatch):
|
||||
"""If ``docker start`` on the matched container fails (container was
|
||||
removed between probe and start, daemon paused, etc.), the code must
|
||||
silently fall through to a fresh ``docker run`` rather than leaving the
|
||||
user with a broken environment. Defensive recovery — the probe is best-
|
||||
effort, not authoritative."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
calls = _mock_subprocess_run_with_reuse(
|
||||
monkeypatch, ps_state="exited", start_succeeds=False,
|
||||
)
|
||||
|
||||
env = _make_dummy_env(task_id="reuse-broken-start")
|
||||
|
||||
# docker start should be attempted then fail; code falls through to run.
|
||||
assert env._container_id == "fresh-cid", (
|
||||
f"expected fresh container id after fallback, got {env._container_id!r}"
|
||||
)
|
||||
run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
|
||||
assert run_invocations, "fallback to fresh docker run must happen on start failure"
|
||||
|
||||
|
||||
def test_no_reuse_when_persist_across_processes_disabled(monkeypatch):
|
||||
"""Opt-out path: ``persist_across_processes=False`` skips the ps probe
|
||||
entirely and always starts a fresh container, matching the pre-fix
|
||||
behavior for users who want hard per-process isolation."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
# ps_state=running would trigger reuse if the probe ran — assert it doesn't.
|
||||
calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="running")
|
||||
|
||||
env = docker_env.DockerEnvironment(
|
||||
image="python:3.11", cwd="/root", timeout=60,
|
||||
task_id="no-reuse", persist_across_processes=False,
|
||||
)
|
||||
|
||||
# Must NOT have issued docker ps (the probe is gated by the flag).
|
||||
ps_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "ps"]
|
||||
assert not ps_invocations, (
|
||||
f"docker ps probe should be skipped when persist_across_processes=False, got: {ps_invocations}"
|
||||
)
|
||||
# Should have started a fresh container.
|
||||
assert env._container_id == "fresh-cid"
|
||||
|
||||
|
||||
def test_find_reusable_container_prefers_running_over_stopped(monkeypatch):
|
||||
"""When the probe returns multiple matches (shouldn't normally happen,
|
||||
but can after a crash leaves stale duplicates), a ``running`` container
|
||||
is preferred over any stopped one. The duplicate gets reaped later by
|
||||
the orphan reaper; we don't try to be heroic about it here."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
|
||||
def _run(cmd, **kwargs):
|
||||
if isinstance(cmd, list) and len(cmd) >= 2:
|
||||
if cmd[1] == "version":
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="ok", stderr="")
|
||||
if cmd[1] == "ps":
|
||||
# Two matches: stopped first, running second.
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0,
|
||||
stdout="stopped-cid\texited\nrunning-cid\trunning\n",
|
||||
stderr="",
|
||||
)
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="fresh-cid\n", stderr="")
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _run)
|
||||
|
||||
env = _make_dummy_env(task_id="dup-match")
|
||||
assert env._container_id == "running-cid", (
|
||||
f"running container should win over stopped duplicate, got {env._container_id!r}"
|
||||
)
|
||||
|
||||
|
||||
# ── Cleanup correctness (issue #20561) ────────────────────────────
|
||||
|
||||
|
||||
class _FakeThread:
|
||||
"""Stand-in for threading.Thread that captures target/args and calls
|
||||
target() synchronously when .start() runs, so cleanup behavior is
|
||||
observable without actually backgrounding subprocess calls."""
|
||||
|
||||
def __init__(self, target=None, daemon=None, name=None):
|
||||
self._target = target
|
||||
self.daemon = daemon
|
||||
self.name = name
|
||||
self._done = False
|
||||
|
||||
def start(self):
|
||||
if self._target is not None:
|
||||
self._target()
|
||||
self._done = True
|
||||
|
||||
def is_alive(self):
|
||||
return not self._done
|
||||
|
||||
def join(self, timeout=None):
|
||||
self._done = True
|
||||
|
||||
|
||||
def _install_fake_thread(monkeypatch):
|
||||
import threading
|
||||
monkeypatch.setattr(threading, "Thread", _FakeThread)
|
||||
|
||||
|
||||
def test_cleanup_with_persist_is_noop_for_container(monkeypatch):
|
||||
"""``persist_across_processes=True`` (default) cleanup must NEITHER stop
|
||||
NOR remove the container — the docs promise "ONE long-lived container
|
||||
shared across sessions", and any docker stop would kill background
|
||||
processes inside the container (npm watchers, pytest watchers, etc.).
|
||||
|
||||
Resource reclamation in this mode happens via the orphan reaper on next
|
||||
Hermes startup, not on graceful exit. Issue #20561 — the first iteration
|
||||
of this PR did docker stop here, which Ben caught as contradicting the
|
||||
"ONE long-lived container" semantics."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
env = _make_dummy_env(task_id="cleanup-persist", persistent_filesystem=False)
|
||||
# Default persist_across_processes=True.
|
||||
container_id = env._container_id
|
||||
assert container_id
|
||||
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capturing_run(cmd, **kwargs):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
return real_run(cmd, **kwargs)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
|
||||
|
||||
env.cleanup()
|
||||
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
|
||||
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
|
||||
assert not stops, (
|
||||
f"docker stop must NOT be called when persist_across_processes=True; "
|
||||
f"container has to stay running so background processes survive. "
|
||||
f"Got: {stops}"
|
||||
)
|
||||
assert not rms, (
|
||||
f"docker rm must NOT be called when persist_across_processes=True; "
|
||||
f"reuse would be impossible. Got: {rms}"
|
||||
)
|
||||
# The in-process handle must still be cleared so the next __init__
|
||||
# re-probes via labels (and reuses the still-running container).
|
||||
assert env._container_id is None, (
|
||||
"in-process container_id should be cleared even in no-op cleanup"
|
||||
)
|
||||
|
||||
|
||||
def test_cleanup_force_remove_stops_and_rms_even_in_persist_mode(monkeypatch):
|
||||
"""``cleanup(force_remove=True)`` must stop AND rm the container even
|
||||
when ``persist_across_processes=True``. This is the explicit-teardown
|
||||
path for ``/reset``, ``cleanup_vm(task_id, force_remove=True)``, and any
|
||||
future caller that wants a guaranteed fresh container.
|
||||
|
||||
Without this kwarg, callers in persist mode would have no way to force a
|
||||
fresh container without also flipping the global config — too coarse for
|
||||
a per-task reset.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
env = _make_dummy_env(task_id="cleanup-force", persistent_filesystem=False)
|
||||
assert env._container_id
|
||||
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capturing_run(cmd, **kwargs):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
return real_run(cmd, **kwargs)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
|
||||
|
||||
env.cleanup(force_remove=True)
|
||||
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
|
||||
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
|
||||
assert stops, f"force_remove must docker stop; got: {cleanup_calls}"
|
||||
assert rms, f"force_remove must docker rm; got: {cleanup_calls}"
|
||||
|
||||
|
||||
def test_cleanup_vm_default_honors_persist_mode(monkeypatch):
|
||||
"""``cleanup_vm(task_id)`` without ``force_remove=True`` must be a no-op
|
||||
for a persist-mode container.
|
||||
|
||||
Regression for the bug Ben caught after commit 4: ``AIAgent.close()``
|
||||
(which is called from ``tui_gateway/server.py`` on session.close, from
|
||||
``gateway/run.py`` on per-session teardown, and from per-turn cleanup)
|
||||
calls ``cleanup_vm(task_id)``. If that defaulted to ``force_remove=True``
|
||||
we'd tear down the container on every TUI session close, defeating the
|
||||
"ONE long-lived container shared across sessions" contract.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
from tools import terminal_tool
|
||||
|
||||
env = _make_dummy_env(task_id="session-close-test")
|
||||
container_id = env._container_id
|
||||
terminal_tool._active_environments["session-close-test"] = env
|
||||
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capturing_run(cmd, **kwargs):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
return real_run(cmd, **kwargs)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
|
||||
|
||||
try:
|
||||
terminal_tool.cleanup_vm("session-close-test")
|
||||
finally:
|
||||
terminal_tool._active_environments.pop("session-close-test", None)
|
||||
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
|
||||
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
|
||||
assert not stops, (
|
||||
f"cleanup_vm() default must not docker stop a persist-mode container; "
|
||||
f"got: {stops}"
|
||||
)
|
||||
assert not rms, (
|
||||
f"cleanup_vm() default must not docker rm a persist-mode container; "
|
||||
f"got: {rms}"
|
||||
)
|
||||
|
||||
|
||||
def test_cleanup_vm_force_remove_tears_down_persist_container(monkeypatch):
|
||||
"""``cleanup_vm(task_id, force_remove=True)`` tears down a persist-mode
|
||||
container — the explicit-teardown path for ``/reset``-style flows.
|
||||
|
||||
Also pins the runtime-signature-inspection plumbing: the kwarg must
|
||||
actually flow through ``cleanup_vm`` into the backend's ``cleanup()``.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
from tools import terminal_tool
|
||||
|
||||
env = _make_dummy_env(task_id="explicit-teardown-test")
|
||||
terminal_tool._active_environments["explicit-teardown-test"] = env
|
||||
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capturing_run(cmd, **kwargs):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
return real_run(cmd, **kwargs)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
|
||||
|
||||
try:
|
||||
terminal_tool.cleanup_vm("explicit-teardown-test", force_remove=True)
|
||||
finally:
|
||||
terminal_tool._active_environments.pop("explicit-teardown-test", None)
|
||||
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
|
||||
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
|
||||
assert stops, f"force_remove must reach docker stop; got: {cleanup_calls}"
|
||||
assert rms, f"force_remove must reach docker rm; got: {cleanup_calls}"
|
||||
|
||||
|
||||
def test_cleanup_with_persist_disabled_stops_and_rms(monkeypatch):
|
||||
"""``persist_across_processes=False`` cleanup must docker stop AND docker
|
||||
rm so containers don't leak. Crucially, this runs regardless of the
|
||||
``persistent_filesystem`` setting — the original code only rm'd when
|
||||
``not self._persistent``, which meant the default-on ``container_persistent:
|
||||
true`` users (the documented happy path) leaked Exited containers forever.
|
||||
Issue #20561 root-cause fix."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
# Note: persistent_filesystem=True (the prior-leak scenario) + the new
|
||||
# cross-process toggle OFF must still result in a clean rm.
|
||||
env = docker_env.DockerEnvironment(
|
||||
image="python:3.11", cwd="/root", timeout=60,
|
||||
task_id="cleanup-no-persist", persistent_filesystem=True,
|
||||
persist_across_processes=False,
|
||||
)
|
||||
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capturing_run(cmd, **kwargs):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
return real_run(cmd, **kwargs)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
|
||||
|
||||
env.cleanup()
|
||||
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
|
||||
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
|
||||
assert stops, "expected docker stop"
|
||||
assert rms, (
|
||||
"docker rm MUST run when persist_across_processes=False, even with "
|
||||
"persistent_filesystem=True — that gating was the leak source in #20561."
|
||||
)
|
||||
|
||||
|
||||
def test_cleanup_uses_subprocess_run_not_detached_shell(monkeypatch):
|
||||
"""The pre-fix code used ``subprocess.Popen("... &", shell=True)`` which
|
||||
raced with parent-process exit and silently dropped cleanup work. The
|
||||
new code must use ``subprocess.run`` with bounded ``timeout=`` so the
|
||||
work actually completes within the process lifetime.
|
||||
|
||||
Asserts cleanup never reaches into shell-mode Popen. Uses
|
||||
``force_remove=True`` so cleanup actually issues docker calls — the
|
||||
default persist-mode path is now a no-op (commit 4) and would trivially
|
||||
pass this assertion without exercising the docker code at all.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
def _forbidden_popen(*args, **kwargs):
|
||||
raise AssertionError(
|
||||
f"cleanup must not use subprocess.Popen anymore (issue #20561); "
|
||||
f"got args={args} kwargs={kwargs}"
|
||||
)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "Popen", _forbidden_popen)
|
||||
|
||||
env = _make_dummy_env(task_id="no-popen-cleanup")
|
||||
env.cleanup(force_remove=True) # must not raise
|
||||
|
||||
|
||||
def test_wait_for_cleanup_returns_true_when_no_thread_started():
|
||||
"""``wait_for_cleanup`` must be a no-op when ``cleanup`` was never called
|
||||
(or the env has no live cleanup thread) — atexit calls it unconditionally
|
||||
across all active envs, so a False return would falsely flag healthy
|
||||
shutdowns."""
|
||||
env = docker_env.DockerEnvironment.__new__(docker_env.DockerEnvironment)
|
||||
# No _cleanup_thread set — simulates an env that was never cleanup()'d.
|
||||
assert env.wait_for_cleanup(timeout=1.0) is True
|
||||
|
||||
|
||||
def test_wait_for_cleanup_after_cleanup_returns_true(monkeypatch):
|
||||
"""End-to-end: cleanup() starts a thread, wait_for_cleanup() joins it
|
||||
and reports completion. Atexit relies on this contract to ensure docker
|
||||
stop/rm actually finishes before the Python interpreter exits.
|
||||
|
||||
Uses ``force_remove=True`` so cleanup actually starts a worker thread —
|
||||
the default persist-mode cleanup is a no-op (commit 4) and never spawns
|
||||
a thread, so the trivial "no thread" branch of wait_for_cleanup is
|
||||
already covered by the previous test.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
env = _make_dummy_env(task_id="wait-test")
|
||||
env.cleanup(force_remove=True)
|
||||
assert env.wait_for_cleanup(timeout=5.0) is True
|
||||
|
||||
|
||||
def test_cleanup_on_env_with_no_container_id_does_not_raise(monkeypatch):
|
||||
"""A DockerEnvironment whose ``__init__`` failed before the container_id
|
||||
was set (image-pull error, docker daemon down) should still be safe to
|
||||
cleanup() — the post-creation failure path in callers always tries.
|
||||
Without this guard the daemon-down case used to NameError on the cleanup
|
||||
branch."""
|
||||
env = docker_env.DockerEnvironment.__new__(docker_env.DockerEnvironment)
|
||||
env._container_id = None
|
||||
env._persistent = False
|
||||
env._workspace_dir = None
|
||||
env._home_dir = None
|
||||
# No exception expected.
|
||||
env.cleanup()
|
||||
|
||||
|
||||
# ── Orphan reaper (issue #20561) ──────────────────────────────────
|
||||
|
||||
|
||||
def _now_iso(offset_seconds: int = 0) -> str:
|
||||
"""Return an RFC3339 timestamp ``offset_seconds`` in the past."""
|
||||
import datetime
|
||||
t = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(seconds=offset_seconds)
|
||||
# Format like Docker emits — with nanoseconds-style trailing digits.
|
||||
return t.isoformat().replace("+00:00", ".123456789Z")
|
||||
|
||||
|
||||
def _reaper_run_mock(monkeypatch, ps_ids: list[str], inspect_responses: dict[str, str],
|
||||
rm_succeeds: bool = True):
|
||||
"""Build a subprocess.run mock for reaper tests.
|
||||
|
||||
* ``ps_ids`` — what ``docker ps -a --filter ... --format '{{.ID}}'`` returns
|
||||
* ``inspect_responses[cid]`` — what ``docker inspect ... FinishedAt`` returns
|
||||
for each cid; ``""`` means "field unset".
|
||||
* ``rm_succeeds`` — whether ``docker rm -f`` returns 0.
|
||||
|
||||
Captures every call so tests can assert which containers were rm'd.
|
||||
"""
|
||||
calls = []
|
||||
|
||||
def _run(cmd, **kwargs):
|
||||
calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
if not isinstance(cmd, list) or len(cmd) < 2:
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
sub = cmd[1]
|
||||
if sub == "ps":
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0, stdout="\n".join(ps_ids) + ("\n" if ps_ids else ""), stderr="",
|
||||
)
|
||||
if sub == "inspect":
|
||||
# cmd is [docker, inspect, --format, '{{.State.FinishedAt}}', cid]
|
||||
cid = cmd[-1]
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0, stdout=inspect_responses.get(cid, "") + "\n", stderr="",
|
||||
)
|
||||
if sub == "rm":
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0 if rm_succeeds else 1,
|
||||
stdout="", stderr="" if rm_succeeds else "no such container",
|
||||
)
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _run)
|
||||
return calls
|
||||
|
||||
|
||||
def test_reap_orphan_returns_zero_when_no_matches(monkeypatch):
|
||||
"""No labeled containers → no rm calls, returns 0. Establishes the
|
||||
happy-path baseline for the orphan reaper (issue #20561)."""
|
||||
calls = _reaper_run_mock(monkeypatch, ps_ids=[], inspect_responses={})
|
||||
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
assert removed == 0
|
||||
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
|
||||
assert not rms, "no rm calls expected when ps returns empty"
|
||||
|
||||
|
||||
def test_reap_orphan_removes_stale_exited_container(monkeypatch):
|
||||
"""An Exited container older than max_age_seconds must be removed.
|
||||
This is the core repair path for issue #20561 — without the reaper,
|
||||
SIGKILL'd Hermes processes leak containers permanently."""
|
||||
old = _now_iso(offset_seconds=900) # 15 minutes ago
|
||||
calls = _reaper_run_mock(
|
||||
monkeypatch, ps_ids=["old-cid"], inspect_responses={"old-cid": old},
|
||||
)
|
||||
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
assert removed == 1
|
||||
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
|
||||
assert len(rms) == 1
|
||||
assert "old-cid" in rms[0][0], f"expected rm of old-cid, got {rms[0][0]}"
|
||||
|
||||
|
||||
def test_reap_orphan_spares_recently_exited_container(monkeypatch):
|
||||
"""A container exited within max_age_seconds must NOT be reaped — that
|
||||
container belongs to a Hermes process that just finished and may be
|
||||
about to be replaced. Conservative window prevents racing sibling
|
||||
processes."""
|
||||
recent = _now_iso(offset_seconds=60) # 1 minute ago
|
||||
calls = _reaper_run_mock(
|
||||
monkeypatch, ps_ids=["recent-cid"], inspect_responses={"recent-cid": recent},
|
||||
)
|
||||
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
assert removed == 0
|
||||
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
|
||||
assert not rms, f"recent container must not be reaped, got rm calls: {rms}"
|
||||
|
||||
|
||||
def test_reap_orphan_scopes_to_profile_filter_via_label(monkeypatch):
|
||||
"""The reaper must pass ``--filter label=hermes-profile=<profile>`` to
|
||||
docker ps so it never sweeps another profile's containers. A research
|
||||
profile must not tear down the default profile's stragglers."""
|
||||
calls = _reaper_run_mock(monkeypatch, ps_ids=[], inspect_responses={})
|
||||
|
||||
docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="research-bot", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
ps_calls = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["ps"]]
|
||||
assert ps_calls, "expected at least one docker ps call"
|
||||
flat = " ".join(ps_calls[0][0])
|
||||
assert "label=hermes-profile=research-bot" in flat, (
|
||||
f"profile filter not applied to docker ps; got args: {ps_calls[0][0]}"
|
||||
)
|
||||
assert "label=hermes-agent=1" in flat, (
|
||||
f"hermes-agent label filter must also be applied; got: {ps_calls[0][0]}"
|
||||
)
|
||||
assert "status=exited" in flat, (
|
||||
"must filter to exited containers only — running containers may "
|
||||
"belong to a sibling Hermes process and must NEVER be reaped"
|
||||
)
|
||||
|
||||
|
||||
def test_reap_orphan_skips_container_with_unparseable_finished_at(monkeypatch):
|
||||
"""If docker inspect returns the zero-value ``0001-01-01T00:00:00Z`` (no
|
||||
FinishedAt yet) or an unparseable timestamp, the reaper must leave the
|
||||
container alone. Defensive — never reap a container whose age we can't
|
||||
determine."""
|
||||
calls = _reaper_run_mock(
|
||||
monkeypatch,
|
||||
ps_ids=["never-finished", "garbage-ts"],
|
||||
inspect_responses={
|
||||
"never-finished": "0001-01-01T00:00:00Z",
|
||||
"garbage-ts": "not-a-timestamp",
|
||||
},
|
||||
)
|
||||
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
assert removed == 0
|
||||
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
|
||||
assert not rms, (
|
||||
f"reaper must NOT remove containers with unparseable FinishedAt; got: {rms}"
|
||||
)
|
||||
|
||||
|
||||
def test_reap_orphan_handles_docker_ps_failure_gracefully(monkeypatch):
|
||||
"""If docker ps itself fails (daemon down, permission denied), the
|
||||
reaper returns 0 without crashing. The reaper is best-effort plumbing,
|
||||
not a critical path — it must never block container creation."""
|
||||
def _failing_ps(cmd, **kwargs):
|
||||
if isinstance(cmd, list) and len(cmd) >= 2 and cmd[1] == "ps":
|
||||
return subprocess.CompletedProcess(cmd, 1, stdout="", stderr="Cannot connect to daemon")
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _failing_ps)
|
||||
|
||||
# Must not raise
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
assert removed == 0
|
||||
|
||||
|
||||
def test_reap_orphan_continues_after_individual_rm_failure(monkeypatch):
|
||||
"""If ``docker rm -f`` fails on one container (already removed by a
|
||||
concurrent process, container locked, etc.), the reaper must log and
|
||||
continue to the next candidate rather than aborting the whole sweep."""
|
||||
old = _now_iso(offset_seconds=900)
|
||||
rm_calls = []
|
||||
|
||||
def _run(cmd, **kwargs):
|
||||
if not isinstance(cmd, list) or len(cmd) < 2:
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
sub = cmd[1]
|
||||
if sub == "ps":
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0, stdout="cid-a\ncid-b\ncid-c\n", stderr="",
|
||||
)
|
||||
if sub == "inspect":
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout=old + "\n", stderr="")
|
||||
if sub == "rm":
|
||||
rm_calls.append(cmd[-1])
|
||||
# cid-b fails; cid-a and cid-c succeed.
|
||||
if cmd[-1] == "cid-b":
|
||||
return subprocess.CompletedProcess(cmd, 1, stdout="", stderr="no such container")
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _run)
|
||||
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
# All three were attempted, two succeeded.
|
||||
assert removed == 2
|
||||
assert set(rm_calls) == {"cid-a", "cid-b", "cid-c"}, (
|
||||
f"reaper must attempt all candidates even when one fails; got: {rm_calls}"
|
||||
)
|
||||
|
||||
|
||||
def test_container_finished_at_parses_nanosecond_timestamp(monkeypatch):
|
||||
"""Docker emits FinishedAt with nanosecond precision (RFC3339 with up to
|
||||
9 fractional digits), but Python's fromisoformat caps at microseconds.
|
||||
The helper must trim the extra digits without raising — otherwise every
|
||||
candidate gets skipped and the reaper does nothing."""
|
||||
|
||||
def _run(cmd, **kwargs):
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0,
|
||||
stdout="2026-05-28T13:45:00.123456789Z\n",
|
||||
stderr="",
|
||||
)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _run)
|
||||
|
||||
result = docker_env._container_finished_at("/usr/bin/docker", "test-cid")
|
||||
assert result is not None, "must parse RFC3339 with nanoseconds"
|
||||
import datetime
|
||||
assert result.tzinfo == datetime.timezone.utc
|
||||
assert result.year == 2026 and result.month == 5 and result.day == 28
|
||||
|
||||
|
||||
def test_container_finished_at_returns_none_on_zero_value():
|
||||
"""Docker's zero-value ``0001-01-01T00:00:00Z`` (never finished) must
|
||||
map to None so the reaper treats the container as unreapable."""
|
||||
# Direct test of the parsing helper — no subprocess needed since the
|
||||
# check happens after the inspect call returns.
|
||||
import subprocess as _subprocess
|
||||
|
||||
class _MockRun:
|
||||
def __init__(self, stdout):
|
||||
self.returncode = 0
|
||||
self.stdout = stdout
|
||||
self.stderr = ""
|
||||
|
||||
import unittest.mock
|
||||
with unittest.mock.patch.object(
|
||||
docker_env.subprocess, "run", return_value=_MockRun("0001-01-01T00:00:00Z\n"),
|
||||
):
|
||||
result = docker_env._container_finished_at("/usr/bin/docker", "never-finished")
|
||||
assert result is None
|
||||
|
||||
@@ -0,0 +1,139 @@
|
||||
"""Integration tests for the docker orphan-reaper wiring in terminal_tool.
|
||||
|
||||
The reaper itself is unit-tested in tests/tools/test_docker_environment.py
|
||||
under the "Orphan reaper" section. These tests cover the terminal_tool-side
|
||||
gates: once-per-process behavior, the disable flag, and the
|
||||
``lifetime_seconds`` doubling that determines the reaper's age threshold.
|
||||
|
||||
Issue #20561 — without these gates, parallel subagents would each fire the
|
||||
reaper on container creation, and the ``terminal.docker_orphan_reaper: false``
|
||||
opt-out would silently do nothing.
|
||||
"""
|
||||
|
||||
import os
|
||||
from unittest.mock import patch
|
||||
|
||||
import tools.terminal_tool as terminal_tool
|
||||
|
||||
|
||||
def _reset_reaper_gate():
|
||||
"""Clear the once-per-process flag between tests."""
|
||||
terminal_tool._docker_orphan_reaper_ran = False
|
||||
|
||||
|
||||
def test_maybe_reap_runs_once_per_process(monkeypatch):
|
||||
"""The reaper sweep must run at most once per Python interpreter.
|
||||
Parallel subagents that each call _create_environment(env_type='docker')
|
||||
would otherwise fire N concurrent docker ps + inspect storms against the
|
||||
daemon and waste 5–10s of startup."""
|
||||
_reset_reaper_gate()
|
||||
call_count = {"reap": 0}
|
||||
|
||||
def _fake_reap(**kwargs):
|
||||
call_count["reap"] += 1
|
||||
return 0
|
||||
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
|
||||
config = {"docker_orphan_reaper": True}
|
||||
terminal_tool._maybe_reap_docker_orphans(config)
|
||||
terminal_tool._maybe_reap_docker_orphans(config)
|
||||
terminal_tool._maybe_reap_docker_orphans(config)
|
||||
|
||||
assert call_count["reap"] == 1, (
|
||||
f"reaper must run exactly once per process; got {call_count['reap']} calls"
|
||||
)
|
||||
|
||||
|
||||
def test_maybe_reap_respects_disable_flag(monkeypatch):
|
||||
"""``terminal.docker_orphan_reaper: false`` (via container_config) must
|
||||
skip the sweep entirely — no docker ps, no inspect, no rm. The escape
|
||||
hatch for operators running multiple Hermes processes in the same
|
||||
profile."""
|
||||
_reset_reaper_gate()
|
||||
call_count = {"reap": 0}
|
||||
|
||||
def _fake_reap(**kwargs):
|
||||
call_count["reap"] += 1
|
||||
return 0
|
||||
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
|
||||
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": False})
|
||||
|
||||
assert call_count["reap"] == 0, "disabled reaper must not run any docker calls"
|
||||
# The once-per-process gate must NOT be tripped when the reaper is
|
||||
# disabled — that would prevent a subsequent toggle to true from working.
|
||||
assert terminal_tool._docker_orphan_reaper_ran is False
|
||||
|
||||
|
||||
def test_maybe_reap_doubles_lifetime_for_max_age(monkeypatch):
|
||||
"""The reaper's age threshold is ``2 × lifetime_seconds`` (with a 60s
|
||||
floor). Generous default — gives sibling Hermes processes ample grace
|
||||
to be replaced without their just-exited containers being yanked."""
|
||||
_reset_reaper_gate()
|
||||
captured_args = {}
|
||||
|
||||
def _fake_reap(**kwargs):
|
||||
captured_args.update(kwargs)
|
||||
return 0
|
||||
|
||||
monkeypatch.setenv("TERMINAL_LIFETIME_SECONDS", "300")
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
|
||||
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
|
||||
|
||||
assert captured_args.get("max_age_seconds") == 600, (
|
||||
f"expected 2 × 300 = 600, got {captured_args.get('max_age_seconds')}"
|
||||
)
|
||||
|
||||
|
||||
def test_maybe_reap_floors_at_60_seconds(monkeypatch):
|
||||
"""A user pinning TERMINAL_LIFETIME_SECONDS=0 (or any value <30) would
|
||||
otherwise get an effective age threshold of zero, which would race the
|
||||
user's own just-started container creation. Floor at 60s × 2 = 120s."""
|
||||
_reset_reaper_gate()
|
||||
captured_args = {}
|
||||
|
||||
def _fake_reap(**kwargs):
|
||||
captured_args.update(kwargs)
|
||||
return 0
|
||||
|
||||
monkeypatch.setenv("TERMINAL_LIFETIME_SECONDS", "0")
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
|
||||
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
|
||||
|
||||
assert captured_args.get("max_age_seconds") == 120, (
|
||||
f"expected floored 60 × 2 = 120, got {captured_args.get('max_age_seconds')}"
|
||||
)
|
||||
|
||||
|
||||
def test_maybe_reap_passes_current_profile_as_filter(monkeypatch):
|
||||
"""The reaper must be scoped to the current Hermes profile — a research
|
||||
profile must NEVER reap default's containers. Verifies the
|
||||
profile-filter wiring."""
|
||||
_reset_reaper_gate()
|
||||
captured_args = {}
|
||||
|
||||
def _fake_reap(**kwargs):
|
||||
captured_args.update(kwargs)
|
||||
return 0
|
||||
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap), \
|
||||
patch("tools.environments.docker._get_active_profile_name", return_value="research-bot"):
|
||||
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
|
||||
|
||||
assert captured_args.get("profile_filter") == "research-bot", (
|
||||
f"expected profile_filter='research-bot', got {captured_args.get('profile_filter')!r}"
|
||||
)
|
||||
|
||||
|
||||
def test_maybe_reap_swallows_exceptions(monkeypatch):
|
||||
"""A reaper crash (docker daemon down, parse error in helper) must NOT
|
||||
block env creation. The reaper is best-effort plumbing, not a critical
|
||||
path; failures get logged at debug level and execution continues."""
|
||||
_reset_reaper_gate()
|
||||
|
||||
def _exploding_reap(**kwargs):
|
||||
raise RuntimeError("docker daemon ate the cat")
|
||||
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _exploding_reap):
|
||||
# Must not raise
|
||||
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
|
||||
@@ -224,3 +224,39 @@ def test_docker_env_is_bridged_everywhere():
|
||||
assert "docker_env" in _gateway_env_map_keys()
|
||||
assert "docker_env" in _save_config_env_sync_keys()
|
||||
assert "TERMINAL_DOCKER_ENV" in _terminal_tool_env_var_names()
|
||||
|
||||
|
||||
def test_docker_persist_across_processes_is_bridged_everywhere():
|
||||
"""Regression pin for the cross-process container reuse toggle.
|
||||
|
||||
``terminal.docker_persist_across_processes`` (issue #20561) controls
|
||||
whether ``DockerEnvironment.__init__`` probes for and reuses an existing
|
||||
labeled container at startup, and whether ``cleanup()`` removes the
|
||||
container on Hermes exit or just stops it (keeping it for the next
|
||||
process). Same four-bridge invariant as docker_run_as_host_user /
|
||||
docker_env / docker_mount_cwd_to_workspace — drift between any of the
|
||||
four sites means ``terminal.docker_persist_across_processes: false`` in
|
||||
config.yaml silently does nothing for that entry point, leaving the
|
||||
user unable to opt out of the documented "ONE long-lived container
|
||||
shared across sessions" behavior.
|
||||
"""
|
||||
assert "docker_persist_across_processes" in _cli_env_map_keys()
|
||||
assert "docker_persist_across_processes" in _gateway_env_map_keys()
|
||||
assert "docker_persist_across_processes" in _save_config_env_sync_keys()
|
||||
assert "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES" in _terminal_tool_env_var_names()
|
||||
|
||||
|
||||
def test_docker_orphan_reaper_is_bridged_everywhere():
|
||||
"""Regression pin for the startup orphan reaper toggle (issue #20561).
|
||||
|
||||
``terminal.docker_orphan_reaper`` controls whether Hermes sweeps stale
|
||||
Exited containers from prior SIGKILL'd processes at startup. Same
|
||||
four-site bridge invariant — drift means
|
||||
``terminal.docker_orphan_reaper: false`` silently does nothing for one
|
||||
entry point, and the reaper either runs when the operator disabled it
|
||||
or fails to run when they enabled it.
|
||||
"""
|
||||
assert "docker_orphan_reaper" in _cli_env_map_keys()
|
||||
assert "docker_orphan_reaper" in _gateway_env_map_keys()
|
||||
assert "docker_orphan_reaper" in _save_config_env_sync_keys()
|
||||
assert "TERMINAL_DOCKER_ORPHAN_REAPER" in _terminal_tool_env_var_names()
|
||||
|
||||
@@ -44,11 +44,17 @@ def server(hermes_home):
|
||||
):
|
||||
mod = importlib.import_module("tui_gateway.server")
|
||||
yield mod
|
||||
# Reset module-level session state without re-importing. importlib.reload
|
||||
# would re-register the module's atexit hooks (ThreadPoolExecutor
|
||||
# shutdown, _shutdown_sessions); the duplicates race the stderr
|
||||
# buffer at interpreter shutdown and surface as Fatal Python error:
|
||||
# _enter_buffered_busy. Clearing the per-session dicts gives the
|
||||
# next test a clean slate; _methods is NOT cleared because it's
|
||||
# populated at module import time and re-registration only happens
|
||||
# via reload (which we don't do).
|
||||
mod._sessions.clear()
|
||||
mod._pending.clear()
|
||||
mod._answers.clear()
|
||||
mod._methods.clear()
|
||||
importlib.reload(mod)
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
|
||||
@@ -30,11 +30,17 @@ def server():
|
||||
import importlib
|
||||
mod = importlib.import_module("tui_gateway.server")
|
||||
yield mod
|
||||
# Reset module-level session state without re-importing. importlib.reload
|
||||
# would re-register the module's atexit hooks (ThreadPoolExecutor
|
||||
# shutdown, _shutdown_sessions); the duplicates race the stderr
|
||||
# buffer at interpreter shutdown and surface as Fatal Python error:
|
||||
# _enter_buffered_busy. Clearing the per-session dicts gives the
|
||||
# next test a clean slate; _methods is NOT cleared because it's
|
||||
# populated at module import time and re-registration only happens
|
||||
# via reload (which we don't do).
|
||||
mod._sessions.clear()
|
||||
mod._pending.clear()
|
||||
mod._answers.clear()
|
||||
mod._methods.clear()
|
||||
importlib.reload(mod)
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
|
||||
@@ -34,11 +34,17 @@ def server():
|
||||
|
||||
mod = importlib.import_module("tui_gateway.server")
|
||||
yield mod
|
||||
# Reset module-level session state without re-importing. importlib.reload
|
||||
# would re-register the module's atexit hooks (ThreadPoolExecutor
|
||||
# shutdown, _shutdown_sessions); the duplicates race the stderr
|
||||
# buffer at interpreter shutdown and surface as Fatal Python error:
|
||||
# _enter_buffered_busy. Clearing the per-session dicts gives the
|
||||
# next test a clean slate; _methods is NOT cleared because it's
|
||||
# populated at module import time and re-registration only happens
|
||||
# via reload (which we don't do).
|
||||
mod._sessions.clear()
|
||||
mod._pending.clear()
|
||||
mod._answers.clear()
|
||||
mod._methods.clear()
|
||||
importlib.reload(mod)
|
||||
|
||||
|
||||
def test_init_session_attaches_background_review_callback(server, monkeypatch):
|
||||
|
||||
+427
-40
@@ -98,6 +98,167 @@ def _load_hermes_env_vars() -> dict[str, str]:
|
||||
return {}
|
||||
|
||||
|
||||
# Docker label values must match [a-zA-Z0-9_.-] and stay ≤63 chars to round-trip
|
||||
# safely through `docker ps --filter label=key=value`. Profile and task names
|
||||
# can technically contain other characters; sanitize defensively.
|
||||
_LABEL_VALUE_OK_RE = re.compile(r"[^A-Za-z0-9_.-]")
|
||||
|
||||
|
||||
def _sanitize_label_value(value: str) -> str:
|
||||
"""Coerce *value* into a Docker label-safe form (alnum + ``_.-``, ≤63 chars).
|
||||
|
||||
Empty or all-invalid inputs collapse to ``"unknown"`` so the resulting
|
||||
label is always queryable. Used at container-create time; never round-trip
|
||||
a sanitized value back into application logic.
|
||||
"""
|
||||
if not isinstance(value, str) or not value:
|
||||
return "unknown"
|
||||
cleaned = _LABEL_VALUE_OK_RE.sub("_", value)
|
||||
cleaned = cleaned[:63] or "unknown"
|
||||
return cleaned
|
||||
|
||||
|
||||
def _get_active_profile_name() -> str:
|
||||
"""Return the active Hermes profile name, or ``"default"`` on any error.
|
||||
|
||||
Resolved at container-create time so a single container is permanently
|
||||
tagged with the profile that created it. Profile switches inside the
|
||||
same process don't retroactively relabel running containers.
|
||||
"""
|
||||
try:
|
||||
from hermes_cli.profiles import get_active_profile_name
|
||||
|
||||
return get_active_profile_name() or "default"
|
||||
except Exception:
|
||||
return "default"
|
||||
|
||||
|
||||
def reap_orphan_containers(
|
||||
*,
|
||||
max_age_seconds: int = 600,
|
||||
profile_filter: str | None = None,
|
||||
docker_exe: str | None = None,
|
||||
) -> int:
|
||||
"""Remove stale hermes-tagged containers left behind by prior processes.
|
||||
|
||||
Targets containers that match all of:
|
||||
|
||||
* ``label=hermes-agent=1`` (created by this codebase)
|
||||
* ``status=exited`` (running containers are NEVER reaped — they may
|
||||
belong to a sibling Hermes process whose reuse path will pick them
|
||||
up; killing them would crash the sibling mid-command)
|
||||
* (optional) ``label=hermes-profile=<profile_filter>`` (sweep only the
|
||||
caller's profile by default; a hermes process in profile A must not
|
||||
tear down profile B's containers)
|
||||
* ``State.FinishedAt`` older than *max_age_seconds* ago (so a sibling
|
||||
process that just exited and is about to be replaced doesn't get
|
||||
its container yanked out from under it)
|
||||
|
||||
Returns the number of containers removed. Best-effort: any failure
|
||||
(docker daemon unreachable, slow inspect, parse error) is logged at
|
||||
debug level and the function returns whatever it managed before the
|
||||
failure. Safe to call repeatedly; idempotent.
|
||||
|
||||
Issue #20561 — this is the safety net for SIGKILL / OOM / crashed
|
||||
terminal exits that bypass the ``atexit`` cleanup hook. Without it,
|
||||
even with the cleanup-fix in the prior commit, a hard-killed Hermes
|
||||
process leaves its container behind permanently because there's no
|
||||
subsequent Hermes process scheduled to reuse that exact (task, profile)
|
||||
pair.
|
||||
"""
|
||||
docker = docker_exe or find_docker() or "docker"
|
||||
filters = ["--filter", "label=hermes-agent=1", "--filter", "status=exited"]
|
||||
if profile_filter:
|
||||
filters.extend(["--filter", f"label=hermes-profile={_sanitize_label_value(profile_filter)}"])
|
||||
|
||||
try:
|
||||
listing = subprocess.run(
|
||||
[docker, "ps", "-a", *filters, "--format", "{{.ID}}"],
|
||||
capture_output=True, text=True, timeout=15, check=False,
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.debug("orphan reaper docker ps failed: %s", e)
|
||||
return 0
|
||||
if listing.returncode != 0:
|
||||
logger.debug(
|
||||
"orphan reaper docker ps returned %d: %s",
|
||||
listing.returncode, listing.stderr.strip(),
|
||||
)
|
||||
return 0
|
||||
|
||||
candidate_ids = [ln.strip() for ln in listing.stdout.splitlines() if ln.strip()]
|
||||
if not candidate_ids:
|
||||
return 0
|
||||
|
||||
# Inspect each candidate to get FinishedAt; reap only those exited
|
||||
# long enough ago. Doing this per-container (rather than bulk inspect)
|
||||
# keeps the failure blast radius to one container at a time.
|
||||
import datetime
|
||||
now = datetime.datetime.now(datetime.timezone.utc)
|
||||
removed = 0
|
||||
for cid in candidate_ids:
|
||||
finished_at = _container_finished_at(docker, cid)
|
||||
if finished_at is None:
|
||||
# Couldn't determine age — be conservative and leave it alone.
|
||||
continue
|
||||
age = (now - finished_at).total_seconds()
|
||||
if age < max_age_seconds:
|
||||
continue
|
||||
try:
|
||||
result = subprocess.run(
|
||||
[docker, "rm", "-f", cid],
|
||||
capture_output=True, text=True, timeout=30,
|
||||
)
|
||||
if result.returncode == 0:
|
||||
removed += 1
|
||||
logger.info(
|
||||
"Reaped orphan container %s (exited %d seconds ago)",
|
||||
cid[:12], int(age),
|
||||
)
|
||||
else:
|
||||
logger.debug(
|
||||
"docker rm -f %s failed: %s",
|
||||
cid[:12], result.stderr.strip(),
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.debug("orphan reaper docker rm %s failed: %s", cid[:12], e)
|
||||
return removed
|
||||
|
||||
|
||||
def _container_finished_at(docker_exe: str, container_id: str):
|
||||
"""Parse ``docker inspect`` FinishedAt for *container_id*.
|
||||
|
||||
Returns a timezone-aware datetime, or ``None`` if the field is missing,
|
||||
unparseable, or the zero-value ``0001-01-01T00:00:00Z`` Docker emits
|
||||
for never-finished containers. ``None`` means "don't reap" — the caller
|
||||
leaves the container alone.
|
||||
"""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
[docker_exe, "inspect", "--format", "{{.State.FinishedAt}}", container_id],
|
||||
capture_output=True, text=True, timeout=10, check=False,
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.debug("orphan reaper docker inspect %s failed: %s", container_id[:12], e)
|
||||
return None
|
||||
if result.returncode != 0:
|
||||
return None
|
||||
raw = result.stdout.strip()
|
||||
if not raw or raw.startswith("0001-01-01"):
|
||||
return None
|
||||
# Docker emits RFC3339 with nanoseconds (e.g. "2026-05-28T13:45:00.123456789Z").
|
||||
# Python's fromisoformat handles microseconds but not nanoseconds; trim.
|
||||
import re as _re
|
||||
raw = _re.sub(r"(\.\d{6})\d+", r"\1", raw)
|
||||
raw = raw.replace("Z", "+00:00")
|
||||
try:
|
||||
import datetime
|
||||
return datetime.datetime.fromisoformat(raw)
|
||||
except ValueError as e:
|
||||
logger.debug("could not parse FinishedAt %r for %s: %s", raw, container_id[:12], e)
|
||||
return None
|
||||
|
||||
|
||||
def find_docker() -> Optional[str]:
|
||||
"""Locate the docker (or podman) CLI binary.
|
||||
|
||||
@@ -304,15 +465,18 @@ class DockerEnvironment(BaseEnvironment):
|
||||
auto_mount_cwd: bool = False,
|
||||
run_as_host_user: bool = False,
|
||||
extra_args: list = None,
|
||||
persist_across_processes: bool = True,
|
||||
):
|
||||
if cwd == "~":
|
||||
cwd = "/root"
|
||||
super().__init__(cwd=cwd, timeout=timeout)
|
||||
self._persistent = persistent_filesystem
|
||||
self._persist_across_processes = persist_across_processes
|
||||
self._task_id = task_id
|
||||
self._forward_env = _normalize_forward_env_names(forward_env)
|
||||
self._env = _normalize_env_dict(env)
|
||||
self._container_id: Optional[str] = None
|
||||
self._labels: dict[str, str] = {}
|
||||
logger.info(f"DockerEnvironment volumes: {volumes}")
|
||||
# Ensure volumes is a list (config.yaml could be malformed)
|
||||
if volumes is not None and not isinstance(volumes, list):
|
||||
@@ -506,25 +670,88 @@ class DockerEnvironment(BaseEnvironment):
|
||||
|
||||
# Start the container directly via `docker run -d`.
|
||||
container_name = f"hermes-{uuid.uuid4().hex[:8]}"
|
||||
run_cmd = [
|
||||
self._docker_exe, "run", "-d",
|
||||
"--init", # tini/catatonit as PID 1 — reaps zombie children
|
||||
"--name", container_name,
|
||||
"-w", cwd,
|
||||
*all_run_args,
|
||||
image,
|
||||
"sleep", "infinity", # no fixed lifetime — idle reaper handles cleanup
|
||||
# Labels make hermes-created containers identifiable to:
|
||||
# * the orphan reaper (`hermes-agent=1` for the global sweep filter)
|
||||
# * future cross-process reuse (`hermes-task-id`, `hermes-profile`)
|
||||
# * operators running `docker ps --filter label=hermes-agent=1`
|
||||
# Values are limited to the safe character set defined by
|
||||
# _sanitize_label_value(); the active Hermes profile is captured at
|
||||
# container-start time and never changes for the container's lifetime.
|
||||
profile_name = _sanitize_label_value(_get_active_profile_name())
|
||||
task_label = _sanitize_label_value(task_id)
|
||||
label_args = [
|
||||
"--label", "hermes-agent=1",
|
||||
"--label", f"hermes-task-id={task_label}",
|
||||
"--label", f"hermes-profile={profile_name}",
|
||||
]
|
||||
logger.debug(f"Starting container: {' '.join(run_cmd)}")
|
||||
result = subprocess.run(
|
||||
run_cmd,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=120, # image pull may take a while
|
||||
check=True,
|
||||
)
|
||||
self._container_id = result.stdout.strip()
|
||||
logger.info(f"Started container {container_name} ({self._container_id[:12]})")
|
||||
self._labels = {
|
||||
"hermes-agent": "1",
|
||||
"hermes-task-id": task_label,
|
||||
"hermes-profile": profile_name,
|
||||
}
|
||||
|
||||
# Cross-process container reuse (issue #20561 — docs claim "ONE long-lived
|
||||
# container shared across sessions"). If a prior Hermes process
|
||||
# already started a container for this (task_id, profile) and it
|
||||
# still exists, attach to it instead of starting a fresh one. This
|
||||
# restores the documented contract; opt out via
|
||||
# ``terminal.docker_persist_across_processes: false``.
|
||||
#
|
||||
# Reuse matches on labels only — we deliberately do NOT compare image
|
||||
# / mounts / resources. Operators who need a fresh container after
|
||||
# changing those settings should set ``docker_persist_across_processes:
|
||||
# false`` (or run ``docker rm -f`` against the labeled container) to
|
||||
# force a clean start.
|
||||
reused = False
|
||||
if persist_across_processes:
|
||||
existing = self._find_reusable_container(task_label, profile_name)
|
||||
if existing is not None:
|
||||
container_id, state = existing
|
||||
self._container_id = container_id
|
||||
if state != "running":
|
||||
try:
|
||||
subprocess.run(
|
||||
[self._docker_exe, "start", container_id],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=30,
|
||||
check=True,
|
||||
)
|
||||
except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
|
||||
logger.warning(
|
||||
"Failed to start existing container %s (state=%s): "
|
||||
"%s — falling back to a fresh container.",
|
||||
container_id[:12], state, e,
|
||||
)
|
||||
self._container_id = None
|
||||
if self._container_id:
|
||||
logger.info(
|
||||
"Reusing container %s (task=%s, profile=%s, prior state=%s)",
|
||||
container_id[:12], task_label, profile_name, state,
|
||||
)
|
||||
reused = True
|
||||
|
||||
if not reused:
|
||||
run_cmd = [
|
||||
self._docker_exe, "run", "-d",
|
||||
"--init", # tini/catatonit as PID 1 — reaps zombie children
|
||||
"--name", container_name,
|
||||
*label_args,
|
||||
"-w", cwd,
|
||||
*all_run_args,
|
||||
image,
|
||||
"sleep", "infinity", # no fixed lifetime — idle reaper handles cleanup
|
||||
]
|
||||
logger.debug(f"Starting container: {' '.join(run_cmd)}")
|
||||
result = subprocess.run(
|
||||
run_cmd,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=120, # image pull may take a while
|
||||
check=True,
|
||||
)
|
||||
self._container_id = result.stdout.strip()
|
||||
logger.info(f"Started container {container_name} ({self._container_id[:12]})")
|
||||
|
||||
# Build the init-time env forwarding args (used only by init_session
|
||||
# to inject host env vars into the snapshot; subsequent commands get
|
||||
@@ -629,31 +856,191 @@ class DockerEnvironment(BaseEnvironment):
|
||||
logger.debug("Docker --storage-opt support: %s", _storage_opt_ok)
|
||||
return _storage_opt_ok
|
||||
|
||||
def cleanup(self):
|
||||
"""Stop and remove the container. Bind-mount dirs persist if persistent=True."""
|
||||
if self._container_id:
|
||||
try:
|
||||
# Stop in background so cleanup doesn't block
|
||||
stop_cmd = (
|
||||
f"(timeout 60 {self._docker_exe} stop {self._container_id} || "
|
||||
f"{self._docker_exe} rm -f {self._container_id}) >/dev/null 2>&1 &"
|
||||
)
|
||||
subprocess.Popen(stop_cmd, shell=True)
|
||||
except Exception as e:
|
||||
logger.warning("Failed to stop container %s: %s", self._container_id, e)
|
||||
def _find_reusable_container(self, task_label: str, profile_label: str) -> Optional[tuple[str, str]]:
|
||||
"""Look for an existing container labeled for this (task, profile).
|
||||
|
||||
Returns ``(container_id, state)`` on hit, ``None`` on miss / on any
|
||||
failure (including ``docker ps`` itself failing). State is one of the
|
||||
values Docker reports via ``{{.State}}`` — e.g. ``running``, ``exited``,
|
||||
``created``, ``paused``, ``restarting``, ``dead``. The caller decides
|
||||
whether the state warrants ``docker start`` before reuse.
|
||||
|
||||
Restricted to the docker-stored label set this class creates; never
|
||||
matches containers that happened to be named ``hermes-*`` but were
|
||||
started by some other tool.
|
||||
"""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
[
|
||||
self._docker_exe, "ps", "-a",
|
||||
"--filter", "label=hermes-agent=1",
|
||||
"--filter", f"label=hermes-task-id={task_label}",
|
||||
"--filter", f"label=hermes-profile={profile_label}",
|
||||
"--format", "{{.ID}}\t{{.State}}",
|
||||
],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=10,
|
||||
check=False,
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.debug("docker ps probe failed: %s — will start a fresh container", e)
|
||||
return None
|
||||
if result.returncode != 0:
|
||||
logger.debug(
|
||||
"docker ps probe returned %d: %s — will start a fresh container",
|
||||
result.returncode, result.stderr.strip(),
|
||||
)
|
||||
return None
|
||||
lines = [ln.strip() for ln in result.stdout.splitlines() if ln.strip()]
|
||||
if not lines:
|
||||
return None
|
||||
# Multiple matches are unusual (one (task, profile) should produce one
|
||||
# container) but can happen if a previous Hermes process crashed
|
||||
# mid-cleanup. Prefer a running one if present; otherwise pick the
|
||||
# first listed. Stale duplicates get reaped by the orphan-reaper in a
|
||||
# follow-up commit; we don't try to be heroic about them here.
|
||||
running = None
|
||||
first = None
|
||||
for ln in lines:
|
||||
parts = ln.split("\t", 1)
|
||||
if len(parts) != 2:
|
||||
continue
|
||||
cid, state = parts[0], parts[1].lower()
|
||||
if first is None:
|
||||
first = (cid, state)
|
||||
if state == "running" and running is None:
|
||||
running = (cid, state)
|
||||
return running or first
|
||||
|
||||
def cleanup(self, *, force_remove: bool = False):
|
||||
"""Tear down the container according to persist mode and *force_remove*.
|
||||
|
||||
Persist-mode (``persist_across_processes=True``, the default) leaves the
|
||||
container **running** untouched. The docs promise "ONE long-lived
|
||||
container shared across sessions" and stopping it on every Hermes exit
|
||||
breaks that promise:
|
||||
|
||||
* Background processes inside the container (``npm run dev``, watchers,
|
||||
long-running pytest) get killed every time the user runs ``/quit``.
|
||||
* Every reuse requires ``docker start`` + waiting for the container to
|
||||
come back up, adding 1–2s to the first tool call of the new session.
|
||||
* The user-visible difference between "ONE long-lived container" and
|
||||
"a new container that happens to share state" is exactly this:
|
||||
processes survive in the former, die in the latter.
|
||||
|
||||
Resource reclamation for the persist-mode case lives in the
|
||||
``reap_orphan_containers()`` path (see issue #20561 commit 3): if no
|
||||
Hermes process touches a labeled container for ``2 × lifetime_seconds``
|
||||
it gets ``docker rm -f``'d at the next Hermes startup. That covers the
|
||||
SIGKILL / OOM / abandoned-laptop cases without us needing to stop the
|
||||
container on every graceful exit.
|
||||
|
||||
Opt-out mode (``persist_across_processes=False``) still does
|
||||
``docker stop`` + ``docker rm -f`` on every cleanup, matching the
|
||||
pre-PR behavior for users who explicitly want per-process isolation.
|
||||
|
||||
``force_remove=True`` overrides persist mode and always tears the
|
||||
container down (``docker stop`` + ``docker rm -f``). This is the
|
||||
explicit-teardown path for ``/reset``, ``cleanup_vm(task_id)``-driven
|
||||
resets, or any caller that wants a guaranteed fresh container on next
|
||||
``DockerEnvironment(task_id=...)``. No current caller passes
|
||||
``force_remove=True``; the parameter is here so the explicit-teardown
|
||||
semantics can be wired up later without changing this method's
|
||||
signature.
|
||||
|
||||
Cleanup runs on a daemon thread with bounded ``subprocess.run`` calls
|
||||
(not the racy ``Popen(... &)`` pattern from before PR #33645). The
|
||||
atexit hook in ``tools/terminal_tool.py`` waits up to 15s for the
|
||||
thread to finish before the interpreter exits, so ``docker stop`` /
|
||||
``docker rm`` actually completes when we do trigger it.
|
||||
"""
|
||||
container_id = self._container_id
|
||||
if not container_id:
|
||||
# Still drop the bind-mount dirs if any were allocated and we're
|
||||
# NOT in persist mode (persist mode preserves them).
|
||||
if not self._persistent:
|
||||
# Also schedule removal (stop only leaves it as stopped)
|
||||
try:
|
||||
subprocess.Popen(
|
||||
f"sleep 3 && {self._docker_exe} rm -f {self._container_id} >/dev/null 2>&1 &",
|
||||
shell=True,
|
||||
)
|
||||
except Exception:
|
||||
pass
|
||||
self._container_id = None
|
||||
for d in (self._workspace_dir, self._home_dir):
|
||||
if d:
|
||||
shutil.rmtree(d, ignore_errors=True)
|
||||
return
|
||||
|
||||
if not self._persistent:
|
||||
# Decide what to actually do. Three cases:
|
||||
#
|
||||
# force_remove=True → stop + rm (explicit teardown)
|
||||
# persist_across_processes=True → no-op (leave container running)
|
||||
# persist_across_processes=False → stop + rm (per-process isolation)
|
||||
#
|
||||
# The persist-mode no-op is the issue-#20561 contract: the container
|
||||
# outlives Hermes processes, processes inside it stay alive, and
|
||||
# reuse on next startup is instant.
|
||||
if force_remove:
|
||||
should_stop = True
|
||||
should_remove = True
|
||||
elif self._persist_across_processes:
|
||||
# No-op for the container. Drop the in-process handle so a fresh
|
||||
# __init__ will re-probe via labels (and find the running
|
||||
# container) instead of trying to reuse a stale Python reference.
|
||||
self._container_id = None
|
||||
return
|
||||
else:
|
||||
should_stop = True
|
||||
should_remove = True
|
||||
|
||||
# Capture state needed by the worker before we null out the attrs —
|
||||
# the worker thread can outlive ``self``.
|
||||
docker_exe = self._docker_exe
|
||||
log_id = container_id[:12]
|
||||
|
||||
def _do_cleanup() -> None:
|
||||
if should_stop:
|
||||
try:
|
||||
subprocess.run(
|
||||
[docker_exe, "stop", "-t", "10", container_id],
|
||||
capture_output=True, timeout=30,
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.warning("docker stop %s timed out / failed: %s", log_id, e)
|
||||
if should_remove:
|
||||
try:
|
||||
subprocess.run(
|
||||
[docker_exe, "rm", "-f", container_id],
|
||||
capture_output=True, timeout=30,
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.warning("docker rm -f %s failed: %s", log_id, e)
|
||||
|
||||
# Daemon thread: doesn't block interpreter exit (atexit returns
|
||||
# promptly), but unlike the old ``Popen(... &)`` shell trick the
|
||||
# Python-level join semantics let the thread actually run to
|
||||
# completion if the interpreter is still alive. atexit registers
|
||||
# ``_atexit_cleanup`` in terminal_tool.py which waits up to ~60s for
|
||||
# outstanding cleanups, so most exits complete the work cleanly.
|
||||
import threading
|
||||
t = threading.Thread(target=_do_cleanup, daemon=True, name=f"hermes-cleanup-{log_id}")
|
||||
t.start()
|
||||
self._cleanup_thread = t
|
||||
self._container_id = None
|
||||
|
||||
# Bind-mount dir teardown only runs when we actually removed the
|
||||
# container (the dirs are the container's filesystem state; keeping
|
||||
# them around with no container would orphan the data on disk).
|
||||
if should_remove and not self._persistent:
|
||||
for d in (self._workspace_dir, self._home_dir):
|
||||
if d:
|
||||
shutil.rmtree(d, ignore_errors=True)
|
||||
|
||||
def wait_for_cleanup(self, timeout: float = 30.0) -> bool:
|
||||
"""Block up to *timeout* seconds for the cleanup worker thread.
|
||||
|
||||
Returns ``True`` if the thread finished (or no thread was started),
|
||||
``False`` on timeout. The atexit hook in terminal_tool.py calls this
|
||||
on every active environment so docker stop/rm actually completes
|
||||
before the Python process exits — without this, ``hermes /quit``
|
||||
races the interpreter shutdown and leaves stopped containers behind.
|
||||
"""
|
||||
thread = getattr(self, "_cleanup_thread", None)
|
||||
if thread is None or not thread.is_alive():
|
||||
return True
|
||||
thread.join(timeout=timeout)
|
||||
return not thread.is_alive()
|
||||
|
||||
+143
-3
@@ -861,6 +861,78 @@ _creation_locks_lock = threading.Lock() # Protects _creation_locks dict itself
|
||||
_cleanup_thread = None
|
||||
_cleanup_running = False
|
||||
|
||||
# Once-per-process guard for the docker orphan reaper (issue #20561).
|
||||
# Set when _maybe_reap_docker_orphans first runs; concurrent _create_environment
|
||||
# calls for parallel subagents won't re-trigger the sweep.
|
||||
_docker_orphan_reaper_ran = False
|
||||
_docker_orphan_reaper_lock = threading.Lock()
|
||||
|
||||
|
||||
def _maybe_reap_docker_orphans(container_config: Dict[str, Any]) -> None:
|
||||
"""Run the docker orphan reaper once per process, if enabled.
|
||||
|
||||
Sweeps long-Exited containers labeled ``hermes-agent=1`` for the current
|
||||
profile that match the issue #20561 leak class — containers left behind
|
||||
by Hermes processes that exited without firing ``atexit`` (SIGKILL,
|
||||
OOM, terminal-window-close). The reaper is conservative by default:
|
||||
only Exited containers older than ``2 × lifetime_seconds`` and scoped to
|
||||
the current profile.
|
||||
|
||||
Gates:
|
||||
|
||||
* ``terminal.docker_orphan_reaper: false`` disables it entirely (the
|
||||
operator opted out — usually because they're running multiple
|
||||
Hermes processes in the same profile and don't trust the
|
||||
conservative defaults).
|
||||
* ``_docker_orphan_reaper_ran`` flag — sweep runs once per Python
|
||||
interpreter, not on every subagent / RL-rollout / parallel
|
||||
``terminal()`` call.
|
||||
"""
|
||||
global _docker_orphan_reaper_ran
|
||||
if not container_config.get("docker_orphan_reaper", True):
|
||||
return
|
||||
# Cheap double-checked-locking: read without the lock, take the lock
|
||||
# only on first run, recheck inside.
|
||||
if _docker_orphan_reaper_ran:
|
||||
return
|
||||
with _docker_orphan_reaper_lock:
|
||||
if _docker_orphan_reaper_ran:
|
||||
return
|
||||
_docker_orphan_reaper_ran = True
|
||||
|
||||
# 2 × lifetime_seconds gives sibling Hermes processes a generous grace
|
||||
# window. Floor at 60s so an operator with TERMINAL_LIFETIME_SECONDS=0
|
||||
# doesn't get an instant-reap that races their own setup.
|
||||
# ``container_config`` only carries container_* keys, so read
|
||||
# lifetime_seconds from the env var the rest of the module uses.
|
||||
try:
|
||||
lifetime = int(os.getenv("TERMINAL_LIFETIME_SECONDS", "300"))
|
||||
except (TypeError, ValueError):
|
||||
lifetime = 300
|
||||
lifetime = max(60, lifetime)
|
||||
max_age = lifetime * 2
|
||||
|
||||
try:
|
||||
from tools.environments.docker import (
|
||||
reap_orphan_containers, _get_active_profile_name,
|
||||
)
|
||||
except ImportError:
|
||||
return
|
||||
try:
|
||||
profile = _get_active_profile_name()
|
||||
removed = reap_orphan_containers(
|
||||
max_age_seconds=max_age, profile_filter=profile,
|
||||
)
|
||||
if removed:
|
||||
logger.info(
|
||||
"Docker orphan reaper removed %d stale container(s) for profile %s",
|
||||
removed, profile,
|
||||
)
|
||||
except Exception as e:
|
||||
# Never fail the env-creation path because of a janitor problem.
|
||||
logger.debug("Docker orphan reaper raised: %s", e)
|
||||
|
||||
|
||||
# Per-task environment overrides registry.
|
||||
# Allows environments (e.g., TerminalBench2Env) to specify a custom Docker/Modal
|
||||
# image for a specific task_id BEFORE the agent loop starts. When the terminal or
|
||||
@@ -1024,6 +1096,22 @@ def _get_env_config() -> Dict[str, Any]:
|
||||
"docker_env": _parse_env_var("TERMINAL_DOCKER_ENV", "{}", json.loads, "valid JSON"),
|
||||
"docker_run_as_host_user": os.getenv("TERMINAL_DOCKER_RUN_AS_HOST_USER", "false").lower() in {"true", "1", "yes"},
|
||||
"docker_extra_args": _parse_env_var("TERMINAL_DOCKER_EXTRA_ARGS", "[]", json.loads, "valid JSON"),
|
||||
# Cross-process container reuse (issue #20561). The docs claim
|
||||
# "ONE long-lived container shared across sessions" — this toggle
|
||||
# makes that real by probing for a labeled container at startup and
|
||||
# attaching to it instead of always starting a fresh one. Set to
|
||||
# ``false`` for hard per-process isolation (no reuse, container is
|
||||
# removed on exit).
|
||||
"docker_persist_across_processes": os.getenv(
|
||||
"TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES", "true"
|
||||
).lower() in {"true", "1", "yes"},
|
||||
# Startup orphan reaper for hermes-tagged containers left behind by
|
||||
# crashed / SIGKILL'd previous processes that bypassed atexit.
|
||||
# Conservative: only sweeps Exited containers older than 2× the
|
||||
# idle-reap window AND scoped to the current profile. Issue #20561.
|
||||
"docker_orphan_reaper": os.getenv(
|
||||
"TERMINAL_DOCKER_ORPHAN_REAPER", "true"
|
||||
).lower() in {"true", "1", "yes"},
|
||||
}
|
||||
|
||||
|
||||
@@ -1072,6 +1160,13 @@ def _create_environment(env_type: str, image: str, cwd: str, timeout: int,
|
||||
return _LocalEnvironment(cwd=cwd, timeout=timeout)
|
||||
|
||||
elif env_type == "docker":
|
||||
# One-shot orphan reaper: clean up labeled containers left behind by
|
||||
# prior Hermes processes that hit SIGKILL / OOM / a closed terminal
|
||||
# before the atexit cleanup hook could run. Gated to once per
|
||||
# process so concurrent _create_environment calls (parallel
|
||||
# subagents, RL benchmarks) don't run the reaper N times.
|
||||
# Disable via ``terminal.docker_orphan_reaper: false`` (issue #20561).
|
||||
_maybe_reap_docker_orphans(cc)
|
||||
return _DockerEnvironment(
|
||||
image=image, cwd=cwd, timeout=timeout,
|
||||
cpu=cpu, memory=memory, disk=disk,
|
||||
@@ -1083,6 +1178,7 @@ def _create_environment(env_type: str, image: str, cwd: str, timeout: int,
|
||||
env=docker_env,
|
||||
run_as_host_user=cc.get("docker_run_as_host_user", False),
|
||||
extra_args=docker_extra_args,
|
||||
persist_across_processes=cc.get("docker_persist_across_processes", True),
|
||||
)
|
||||
|
||||
elif env_type == "singularity":
|
||||
@@ -1330,8 +1426,27 @@ def cleanup_all_environments():
|
||||
return cleaned
|
||||
|
||||
|
||||
def cleanup_vm(task_id: str):
|
||||
"""Manually clean up a specific environment by task_id."""
|
||||
def cleanup_vm(task_id: str, *, force_remove: bool = False):
|
||||
"""Manually clean up a specific environment by task_id.
|
||||
|
||||
*force_remove* (default False) is forwarded to backends that accept it
|
||||
— currently only ``DockerEnvironment``. The default of False matches
|
||||
session-lifecycle semantics: this function is called from
|
||||
``AIAgent.close()`` (TUI session close, gateway session teardown) and the
|
||||
per-turn cleanup branch for non-persistent envs, both of which should
|
||||
honor the user's persist-mode preference. Stopping the container here
|
||||
would defeat the "ONE long-lived container shared across sessions"
|
||||
contract — exactly the bug Ben reported when the container was killed
|
||||
on every TUI session close.
|
||||
|
||||
Pass ``force_remove=True`` for actual user-initiated teardown
|
||||
(e.g. ``/reset``-style flows that haven't been wired yet, or future
|
||||
"destroy my sandbox" commands).
|
||||
|
||||
The idle reaper passes the env through ``env.cleanup()`` directly (not
|
||||
via this function), so persist-mode idle envs are similarly no-op'd —
|
||||
only the orphan reaper at next startup reclaims them.
|
||||
"""
|
||||
# Remove from tracking dicts while holding the lock, but defer the
|
||||
# actual (potentially slow) env.cleanup() call to outside the lock
|
||||
# so other tool calls aren't blocked.
|
||||
@@ -1356,7 +1471,14 @@ def cleanup_vm(task_id: str):
|
||||
|
||||
try:
|
||||
if hasattr(env, 'cleanup'):
|
||||
env.cleanup()
|
||||
# Pass force_remove only if the env's cleanup() accepts it
|
||||
# (DockerEnvironment after issue #20561; other backends don't).
|
||||
import inspect
|
||||
sig = inspect.signature(env.cleanup)
|
||||
if "force_remove" in sig.parameters:
|
||||
env.cleanup(force_remove=force_remove)
|
||||
else:
|
||||
env.cleanup()
|
||||
elif hasattr(env, 'stop'):
|
||||
env.stop()
|
||||
elif hasattr(env, 'terminate'):
|
||||
@@ -1378,7 +1500,23 @@ def _atexit_cleanup():
|
||||
if _active_environments:
|
||||
count = len(_active_environments)
|
||||
logger.info("Shutting down %d remaining sandbox(es)...", count)
|
||||
# Snapshot the env objects BEFORE cleanup_all_environments empties
|
||||
# the dict; we need them to wait on docker cleanup threads after the
|
||||
# registry has been cleared.
|
||||
envs_to_wait = list(_active_environments.values())
|
||||
cleanup_all_environments()
|
||||
# Block briefly so docker stop/rm actually completes before the
|
||||
# interpreter exits. Issue #20561 — without this join, the daemon
|
||||
# cleanup threads were getting torn down mid-`docker stop`, leaving
|
||||
# Exited containers piled up on the host.
|
||||
for env in envs_to_wait:
|
||||
wait_fn = getattr(env, "wait_for_cleanup", None)
|
||||
if wait_fn is None:
|
||||
continue
|
||||
try:
|
||||
wait_fn(timeout=15.0)
|
||||
except Exception as e: # never block shutdown on a bad backend
|
||||
logger.debug("wait_for_cleanup raised on exit: %s", e)
|
||||
|
||||
atexit.register(_atexit_cleanup)
|
||||
|
||||
@@ -1746,6 +1884,8 @@ def terminal_tool(
|
||||
"docker_env": config.get("docker_env", {}),
|
||||
"docker_run_as_host_user": config.get("docker_run_as_host_user", False),
|
||||
"docker_extra_args": config.get("docker_extra_args", []),
|
||||
"docker_persist_across_processes": config.get("docker_persist_across_processes", True),
|
||||
"docker_orphan_reaper": config.get("docker_orphan_reaper", True),
|
||||
}
|
||||
|
||||
local_config = None
|
||||
|
||||
@@ -1589,7 +1589,7 @@ wheels = [
|
||||
|
||||
[[package]]
|
||||
name = "hermes-agent"
|
||||
version = "0.15.0"
|
||||
version = "0.15.1"
|
||||
source = { editable = "." }
|
||||
dependencies = [
|
||||
{ name = "croniter" },
|
||||
|
||||
@@ -130,7 +130,7 @@ The agent has the same filesystem access as your user account. Use `hermes tools
|
||||
|
||||
Runs commands inside a Docker container with security hardening (all capabilities dropped, no privilege escalation, PID limits).
|
||||
|
||||
**Single persistent container, not per-command.** Hermes starts ONE long-lived container on first use and routes every terminal, file, and `execute_code` call through `docker exec` into that same container — across sessions, `/new`, `/reset`, and `delegate_task` subagents — for the lifetime of the Hermes process. Working-directory changes, installed packages, and files in `/workspace` carry over from one tool call to the next, just like a local shell. The container is stopped and removed on shutdown. See **Container lifecycle** below for details.
|
||||
**Single persistent container, shared across Hermes processes.** Hermes starts ONE long-lived container on first use and routes every terminal, file, and `execute_code` call through `docker exec` into that same container — across sessions, `/new`, `/reset`, and `delegate_task` subagents. Working-directory changes, installed packages, files in `/workspace`, and **background processes** all carry over from one tool call to the next, and from one Hermes process to the next. When you close a TUI session, run `/quit`, or start a new `hermes` invocation, the container keeps running and the next Hermes process reuses it via a labeled lookup. See **Container lifecycle** below for the exact teardown rules.
|
||||
|
||||
```yaml
|
||||
terminal:
|
||||
@@ -138,8 +138,11 @@ terminal:
|
||||
docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
|
||||
docker_mount_cwd_to_workspace: false # Mount launch dir into /workspace
|
||||
docker_run_as_host_user: false # See "Running container as host user" below
|
||||
docker_forward_env: # Env vars to forward into container
|
||||
docker_forward_env: # Host env vars to forward into container
|
||||
- "GITHUB_TOKEN"
|
||||
docker_env: # Literal env vars to inject (KEY=value)
|
||||
DEBUG: "1"
|
||||
PYTHONUNBUFFERED: "1"
|
||||
docker_volumes: # Host directory mounts
|
||||
- "/home/user/projects:/workspace/projects"
|
||||
- "/home/user/data:/data:ro" # :ro for read-only
|
||||
@@ -151,14 +154,49 @@ terminal:
|
||||
container_cpu: 1 # CPU cores (0 = unlimited)
|
||||
container_memory: 5120 # MB (0 = unlimited)
|
||||
container_disk: 51200 # MB (requires overlay2 on XFS+pquota)
|
||||
container_persistent: true # Persist /workspace and /root across sessions
|
||||
container_persistent: true # Persist /workspace and /root bind-mount dirs
|
||||
|
||||
# Cross-process container reuse (defaults match the "one long-lived
|
||||
# container shared across sessions" contract — see Container lifecycle).
|
||||
docker_persist_across_processes: true # Reuse container across Hermes restarts
|
||||
docker_orphan_reaper: true # Sweep abandoned Exited containers at startup
|
||||
|
||||
# Cross-backend lifecycle settings (apply to docker as well)
|
||||
timeout: 180 # Per-command timeout in seconds
|
||||
lifetime_seconds: 300 # Idle-reaper window; also feeds 2× orphan-reaper threshold
|
||||
```
|
||||
|
||||
**`docker_env`** vs **`docker_forward_env`**: the former injects literal `KEY=value` pairs you specify in the config (the values live in your `config.yaml` or are passed as a JSON dict via `TERMINAL_DOCKER_ENV='{"DEBUG":"1"}'`). The latter forwards values from your shell or `~/.hermes/.env`, so the actual secret never appears in the config file. Use `docker_forward_env` for tokens and `docker_env` for static knobs the container needs.
|
||||
|
||||
**`terminal.docker_extra_args`** (also overridable via `TERMINAL_DOCKER_EXTRA_ARGS='["--gpus=all"]'`) lets you pass arbitrary `docker run` flags that Hermes doesn't surface as first-class keys — `--gpus`, `--network`, `--add-host`, alternative `--security-opt` overrides, etc. Each entry must be a string; the list is appended last to the assembled `docker run` invocation so it can override Hermes' defaults if needed. Use sparingly — flags that conflict with the sandbox hardening (capability drops, `--user`, the workspace bind mount) will silently weaken isolation.
|
||||
|
||||
**Requirements:** Docker Desktop or Docker Engine installed and running. Hermes probes `$PATH` plus common macOS install locations (`/usr/local/bin/docker`, `/opt/homebrew/bin/docker`, Docker Desktop app bundle). Podman is supported out of the box: set `HERMES_DOCKER_BINARY=podman` (or the full path) to force it when both are installed.
|
||||
|
||||
**Container lifecycle:** Hermes reuses a single long-lived container (`docker run -d ... sleep 2h`) for every terminal and file-tool call, across sessions, `/new`, `/reset`, and `delegate_task` subagents, for the lifetime of the Hermes process. Commands run via `docker exec` with a login shell, so working-directory changes, installed packages, and files in `/workspace` all persist from one tool call to the next. The container is stopped and removed on Hermes shutdown (or when the idle-sweep reclaims it).
|
||||
#### Container lifecycle
|
||||
|
||||
Every Hermes-managed container is tagged with three labels so subsequent processes (and the orphan reaper) can identify it:
|
||||
|
||||
- `hermes-agent=1` — marks it as Hermes-managed
|
||||
- `hermes-task-id=<sanitized task_id>` — keys the per-task reuse probe
|
||||
- `hermes-profile=<sanitized profile name>` — scopes reuse and reaping to the active Hermes profile
|
||||
|
||||
On startup, Hermes runs `docker ps --filter label=hermes-task-id=<id> --filter label=hermes-profile=<profile>` and **attaches to the existing container** when it finds one. If the container is `exited` (e.g. after a Docker daemon restart), it's `docker start`'d and reused — filesystem state and any installed packages survive, but in-container background processes do not.
|
||||
|
||||
When a Hermes process exits — `/quit`, closing a TUI session, gateway shutdown, even SIGKILL — the cleanup path is a **no-op for the container in default mode**. The container keeps running. The next Hermes process attaches to it in milliseconds via the label probe. This is the behavior the "one long-lived container shared across sessions" contract requires: it's the only way background processes (npm watchers, dev servers, long-running pytest) survive across sessions.
|
||||
|
||||
**The container is only torn down (stopped and `docker rm -f`'d) in these cases:**
|
||||
|
||||
| Trigger | When it fires |
|
||||
|---|---|
|
||||
| `docker_persist_across_processes: false` | Explicit per-process isolation. Every `cleanup()` does `stop` + `rm -f`. Matches pre-issue-#20561 behavior. |
|
||||
| Idle reaper (`lifetime_seconds`, default 300s) | Only when the env is `persist_across_processes=false`. Persist-mode envs are no-op'd; container survives the idle sweep. |
|
||||
| Orphan reaper at next startup | Sweeps **Exited** hermes-labeled containers older than `2 × lifetime_seconds` (default 600s = 10 min), scoped to the current profile. **Running containers are never touched** — sibling-process safety. Set `docker_orphan_reaper: false` to disable. |
|
||||
| Direct user action | `docker rm -f`, `docker system prune`, Docker Desktop restart. We don't set `--restart=always`, so a host reboot leaves the container `Exited` (its CoW layer survives and gets reused on next startup, but bg processes are gone). |
|
||||
|
||||
Edge cases worth knowing:
|
||||
|
||||
- **OOM kill of in-container PID 1** transitions the container to `Exited`. Next reuse will `docker start` it; filesystem state survives, bg processes do not.
|
||||
- **Switching profiles** isolates containers from each other — a container labeled `hermes-profile=work` is invisible to a Hermes process running under `hermes-profile=research`. The orphan reaper is profile-scoped too, so cross-profile containers don't get reaped accidentally, but they also won't get cleaned up automatically until you start Hermes again under their original profile.
|
||||
|
||||
Parallel subagents spawned via `delegate_task(tasks=[...])` share this one container — concurrent `cd`, env mutations, and writes to the same path will collide. If a subagent needs an isolated sandbox, it must register a per-task image override via `register_task_env_overrides()`, which RL and benchmark environments (TerminalBench2, HermesSweEnv, etc.) do automatically for their per-task Docker images.
|
||||
|
||||
@@ -170,6 +208,29 @@ Parallel subagents spawned via `delegate_task(tasks=[...])` share this one conta
|
||||
|
||||
**Credential forwarding:** Env vars listed in `docker_forward_env` are resolved from your shell environment first, then `~/.hermes/.env`. Skills can also declare `required_environment_variables` which are merged automatically.
|
||||
|
||||
#### Environment variable overrides
|
||||
|
||||
Every key under `terminal:` has an env-var override of the form `TERMINAL_<KEY_UPPERCASE>`. The most useful ones for the Docker backend:
|
||||
|
||||
| Env var | Maps to | Notes |
|
||||
|---|---|---|
|
||||
| `TERMINAL_DOCKER_IMAGE` | `docker_image` | Base image |
|
||||
| `TERMINAL_DOCKER_FORWARD_ENV` | `docker_forward_env` | JSON array: `'["GITHUB_TOKEN","OPENAI_API_KEY"]'` |
|
||||
| `TERMINAL_DOCKER_ENV` | `docker_env` | JSON dict: `'{"DEBUG":"1"}'` |
|
||||
| `TERMINAL_DOCKER_VOLUMES` | `docker_volumes` | JSON array of `"host:container[:ro]"` strings |
|
||||
| `TERMINAL_DOCKER_EXTRA_ARGS` | `docker_extra_args` | JSON array |
|
||||
| `TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE` | `docker_mount_cwd_to_workspace` | `true` / `false` |
|
||||
| `TERMINAL_DOCKER_RUN_AS_HOST_USER` | `docker_run_as_host_user` | `true` / `false` |
|
||||
| `TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES` | `docker_persist_across_processes` | `true` / `false` — default `true` |
|
||||
| `TERMINAL_DOCKER_ORPHAN_REAPER` | `docker_orphan_reaper` | `true` / `false` — default `true` |
|
||||
| `TERMINAL_CONTAINER_CPU` | `container_cpu` | CPU cores |
|
||||
| `TERMINAL_CONTAINER_MEMORY` | `container_memory` | MB |
|
||||
| `TERMINAL_CONTAINER_DISK` | `container_disk` | MB |
|
||||
| `TERMINAL_CONTAINER_PERSISTENT` | `container_persistent` | `true` / `false` — controls the bind-mount workspace dirs, distinct from `docker_persist_across_processes` |
|
||||
| `TERMINAL_LIFETIME_SECONDS` | `lifetime_seconds` | Idle reaper window |
|
||||
| `TERMINAL_TIMEOUT` | `timeout` | Per-command timeout |
|
||||
| `HERMES_DOCKER_BINARY` | _none_ | Force a specific docker/podman binary path |
|
||||
|
||||
### SSH Backend
|
||||
|
||||
Runs commands on a remote server over SSH. Uses ControlMaster for connection reuse (5-minute idle keepalive). Persistent shell is enabled by default — state (cwd, env vars) survives across commands.
|
||||
|
||||
@@ -54,12 +54,7 @@ This behavior applies to the s6-based image only. Earlier (tini-based) images st
|
||||
:::
|
||||
|
||||
:::note Where gateway logs go
|
||||
Inside the s6 image, the supervised gateway's output is tee'd to two destinations:
|
||||
|
||||
- **`docker logs <container>`** — every line in real time (raw, no extra prefix). This is the same stream you'd get from a foreground gateway, so existing `docker logs --follow` / `--timestamps` / log-shipper integrations work unchanged.
|
||||
- **`${HERMES_HOME}/logs/gateways/<profile>/current`** (mapped to `~/.hermes/logs/gateways/<profile>/current` on the host via the volume mount) — rotated, with an ISO 8601 timestamp prepended per line. Rotation is 10 archives × 1 MB each, so it can't fill the disk. This is what `hermes logs` reads and what survives container restarts.
|
||||
|
||||
The per-profile reconciler keeps a separate audit log at `${HERMES_HOME}/logs/container-boot.log` — one line per profile per container boot, recording whether each gateway was restored to its prior state.
|
||||
See the [Where the logs go](#where-the-logs-go) section below for the full routing map (per-profile gateways, dashboard, boot reconciler, container-wide `docker logs`).
|
||||
:::
|
||||
|
||||
Note: the API server is gated on `API_SERVER_ENABLED=true`. To expose it beyond `127.0.0.1` inside the container, also set `API_SERVER_HOST=0.0.0.0` and an `API_SERVER_KEY` (minimum 8 characters — generate one with `openssl rand -hex 32`). Example:
|
||||
@@ -81,7 +76,7 @@ Opening any port on an internet facing machine is a security risk. You should no
|
||||
|
||||
## Running the dashboard
|
||||
|
||||
The built-in web dashboard runs as an optional side-process inside the same container as the gateway. Set `HERMES_DASHBOARD=1` to run the dashboard on container loopback (`127.0.0.1`) by default:
|
||||
The built-in web dashboard runs as a supervised s6-rc service alongside the gateway in the same container. Set `HERMES_DASHBOARD=1` to bring it up:
|
||||
|
||||
```sh
|
||||
docker run -d \
|
||||
@@ -89,54 +84,38 @@ docker run -d \
|
||||
--restart unless-stopped \
|
||||
-v ~/.hermes:/opt/data \
|
||||
-p 8642:8642 \
|
||||
-p 9119:9119 \
|
||||
-e HERMES_DASHBOARD=1 \
|
||||
nousresearch/hermes-agent gateway run
|
||||
```
|
||||
|
||||
The entrypoint starts `hermes dashboard` in the background (running as the non-root `hermes` user) before `exec`-ing the main command. Dashboard output is prefixed with `[dashboard]` in `docker logs` so it's easy to separate from gateway logs.
|
||||
The dashboard is supervised by s6 — if it crashes, `s6-supervise` restarts it automatically after a short backoff. Dashboard stdout/stderr is forwarded to `docker logs <container>` (no prefix; the gateway's own output now lives in a per-profile s6-log file — see [Where the logs go](#where-the-logs-go) below — so the two streams don't clash).
|
||||
|
||||
| Environment variable | Description | Default |
|
||||
|---------------------|-------------|---------|
|
||||
| `HERMES_DASHBOARD` | Set to `1` (or `true` / `yes`) to launch the dashboard alongside the main command | *(unset — dashboard not started)* |
|
||||
| `HERMES_DASHBOARD_HOST` | Bind address for the dashboard HTTP server | `127.0.0.1` |
|
||||
| `HERMES_DASHBOARD` | Set to `1` (or `true` / `yes`) to enable the supervised dashboard service | *(unset — service is registered but stays down)* |
|
||||
| `HERMES_DASHBOARD_HOST` | Bind address for the dashboard HTTP server | `0.0.0.0` |
|
||||
| `HERMES_DASHBOARD_PORT` | Port for the dashboard HTTP server | `9119` |
|
||||
| `HERMES_DASHBOARD_TUI` | Set to `1` to expose the in-browser Chat tab (embedded `hermes --tui` via PTY/WebSocket) | *(unset)* |
|
||||
| `HERMES_DASHBOARD_INSECURE` | Set to `1` (or `true` / `yes`) to bind without the OAuth auth gate. Only use on trusted networks behind a reverse proxy without the OAuth contract — the dashboard exposes API keys and session data | *(unset — gate enforced when a `DashboardAuthProvider` is registered)* |
|
||||
|
||||
By default, the dashboard stays on loopback (`127.0.0.1`) to avoid exposing
|
||||
the web surface over the network. To publish it intentionally, set
|
||||
`HERMES_DASHBOARD_HOST=0.0.0.0`. The dashboard's OAuth auth gate engages
|
||||
automatically whenever:
|
||||
The dashboard inside the container defaults to binding `0.0.0.0` — without it, the published `-p 9119:9119` port would not be reachable from the host. To restrict the bind to container loopback (for sidecar / reverse-proxy setups), set `HERMES_DASHBOARD_HOST=127.0.0.1`.
|
||||
|
||||
1. The bind host is non-loopback, **and**
|
||||
The dashboard's OAuth auth gate engages automatically when both of the following are true:
|
||||
|
||||
1. The bind host is non-loopback (e.g. the default `0.0.0.0` inside the container), **and**
|
||||
2. A `DashboardAuthProvider` plugin is registered.
|
||||
|
||||
The bundled `dashboard_auth/nous` provider activates whenever
|
||||
`HERMES_DASHBOARD_OAUTH_CLIENT_ID` is set (see
|
||||
[Web Dashboard → Authentication](features/web-dashboard.md)). With the
|
||||
gate engaged, browser callers are redirected to the configured portal's
|
||||
OAuth flow before they can reach any protected route.
|
||||
The bundled `dashboard_auth/nous` provider activates whenever `HERMES_DASHBOARD_OAUTH_CLIENT_ID` is set (see [Web Dashboard → Authentication](features/web-dashboard.md)). With the gate engaged, browser callers are redirected to the configured portal's OAuth flow before they can reach any protected route.
|
||||
|
||||
If no provider is registered and the bind is non-loopback, the dashboard
|
||||
**fails closed at startup** with a specific error pointing at the
|
||||
missing env var. To opt out of the gate explicitly — for a trusted-LAN
|
||||
deployment behind your own reverse proxy without the OAuth contract —
|
||||
set `HERMES_DASHBOARD_INSECURE=1`. This re-enables the legacy "no auth,
|
||||
loud warning" mode and is the only path that disables the gate; the bind
|
||||
host does not implicitly determine `--insecure` anymore.
|
||||
If no provider is registered and the bind is non-loopback, the dashboard **fails closed at startup** with a specific error pointing at the missing env var. To opt out of the gate explicitly — for a trusted-LAN deployment behind your own reverse proxy without the OAuth contract — set `HERMES_DASHBOARD_INSECURE=1`. This is the **only** path that disables the gate; the bind host alone never implies `--insecure` (it used to, but that predated the OAuth gate and silently disabled it on every container-deployed dashboard).
|
||||
|
||||
:::note
|
||||
The dashboard runs as a supervised s6 service inside the container. If
|
||||
the dashboard process crashes, s6-overlay restarts it automatically
|
||||
after a short backoff — you'll see a new PID without needing to
|
||||
restart the container. Logs and crash output are visible via
|
||||
`docker logs <container>` (s6 forwards service stdout/stderr there).
|
||||
|
||||
Running the dashboard as a separate container is not supported: its
|
||||
gateway-liveness detection requires a shared PID namespace with the
|
||||
gateway process.
|
||||
:::warning `HERMES_DASHBOARD_INSECURE=1` exposes API keys
|
||||
Opting out of the OAuth gate serves the dashboard's API surface (including model keys and session data) to anyone who can reach the published port. Only enable it when you have your own auth layer in front, or on a trusted LAN you fully control.
|
||||
:::
|
||||
|
||||
Running the dashboard as a separate container is not supported: its gateway-liveness detection requires a shared PID namespace with the gateway process.
|
||||
|
||||
## Running interactively (CLI chat)
|
||||
|
||||
To open an interactive chat session against a running data directory:
|
||||
@@ -179,37 +158,60 @@ Never run two Hermes **gateway** containers against the same data directory simu
|
||||
|
||||
## Multi-profile support
|
||||
|
||||
Hermes supports [multiple profiles](../reference/profile-commands.md) — separate `~/.hermes/` directories that let you run independent agents (different SOUL, skills, memory, sessions, credentials) from a single installation. **When running under Docker, using Hermes' built-in multi-profile feature is not recommended.**
|
||||
Hermes supports [multiple profiles](../reference/profile-commands.md) — separate `~/.hermes/` subdirectories that let you run independent agents (different SOUL, skills, memory, sessions, credentials) from a single installation. **Inside the official Docker image, the s6 supervision tree treats each profile as a first-class supervised service**, so the recommended deployment is **one container hosting all profiles**.
|
||||
|
||||
Instead, the recommended pattern is **one container per profile**, with each container bind-mounting its own host directory as `/opt/data`:
|
||||
Each profile created with `hermes profile create <name>` gets:
|
||||
|
||||
- A dedicated s6 service slot at `/run/service/gateway-<name>/`, registered dynamically by the runtime — no container rebuild required.
|
||||
- Auto-restart on crash, backoff-managed by `s6-supervise`.
|
||||
- Per-profile rotated logs at `${HERMES_HOME}/logs/gateways/<name>/current` (10 archives × 1 MB each).
|
||||
- State persistence across container restarts: the boot-time reconciler reads `gateway_state.json` from each profile directory and brings the slot back up only for profiles whose last recorded state was `running`. Stopped profiles stay stopped.
|
||||
|
||||
The lifecycle commands you'd run on the host work the same way from inside the container:
|
||||
|
||||
```sh
|
||||
# Work profile
|
||||
docker run -d \
|
||||
--name hermes-work \
|
||||
--restart unless-stopped \
|
||||
-v ~/.hermes-work:/opt/data \
|
||||
-p 8642:8642 \
|
||||
nousresearch/hermes-agent gateway run
|
||||
# Create a profile — registers the gateway-<name> s6 slot.
|
||||
docker exec hermes hermes profile create coder
|
||||
|
||||
# Personal profile
|
||||
docker run -d \
|
||||
--name hermes-personal \
|
||||
--restart unless-stopped \
|
||||
-v ~/.hermes-personal:/opt/data \
|
||||
-p 8643:8642 \
|
||||
nousresearch/hermes-agent gateway run
|
||||
# Start / stop / restart — dispatches s6-svc; the gateway lifecycle survives docker restart.
|
||||
docker exec hermes hermes -p coder gateway start
|
||||
docker exec hermes hermes -p coder gateway stop
|
||||
docker exec hermes hermes -p coder gateway restart
|
||||
|
||||
# Status — reports `Manager: s6 (container supervisor)` inside the container.
|
||||
docker exec hermes hermes -p coder gateway status
|
||||
|
||||
# Remove a profile — tears down the s6 slot too.
|
||||
docker exec hermes hermes profile delete coder
|
||||
```
|
||||
|
||||
Why separate containers over profiles in Docker:
|
||||
Under the hood, `hermes gateway start/stop/restart` inside the container is intercepted and routed to `s6-svc` against the right service directory; you don't need to learn the s6 commands directly. For raw supervisor state, use `/command/s6-svstat /run/service/gateway-<name>` (note `/command/` is on PATH only for processes spawned by the supervision tree — when calling from `docker exec`, pass the absolute path).
|
||||
|
||||
- **Isolation** — each container has its own filesystem, process table, and resource limits. A crash, dependency change, or runaway session in one profile can't affect another.
|
||||
- **Independent lifecycle** — upgrade, restart, pause, or roll back each agent separately (`docker restart hermes-work` leaves `hermes-personal` untouched).
|
||||
- **Clean port and network separation** — each gateway binds its own host port; there's no risk of cross-talk between chat platforms or API servers.
|
||||
- **Simpler mental model** — the container *is* the profile. Backups, migrations, and permissions all follow the bind-mounted directory, with no extra `--profile` flags to remember.
|
||||
- **Avoids concurrent-write risk** — the warning above about never running two gateways against the same data directory still applies to profiles within a single container.
|
||||
### Why one container with many profiles, not many containers
|
||||
|
||||
In Docker Compose, this just means declaring one service per profile with distinct `container_name`, `volumes`, and `ports`:
|
||||
Before the s6 migration, "one container per profile" was the recommended pattern because there was no in-container supervisor to manage multiple gateways. With s6 as PID 1, that's no longer necessary, and the single-container layout is simpler in almost every dimension:
|
||||
|
||||
| | One container, many profiles | One container per profile |
|
||||
|---|---|---|
|
||||
| Disk overhead | One image, one bundled venv, one Playwright cache | N images / N caches |
|
||||
| Memory overhead | Shared Python interpreter cache, shared node_modules | Duplicated per container |
|
||||
| Profile creation | `docker exec ... hermes profile create <name>` (seconds) | New `docker run` invocation + port allocation + bind-mount config |
|
||||
| Per-profile crash recovery | `s6-supervise` auto-restart | Docker's `--restart unless-stopped` (slower, kills sibling work) |
|
||||
| Logs | Per-profile rotated file via `s6-log`, plus container-boot audit log | `docker logs <name>` per container — no built-in rotation |
|
||||
| Backup | One `~/.hermes` directory | N directories to coordinate |
|
||||
|
||||
The default profile (`default`) is always registered on first boot, so a fresh container ships with one supervised gateway out of the box. Additional profiles are pure runtime adds.
|
||||
|
||||
### When you DO want a separate container
|
||||
|
||||
Profile-in-container is the default. Run a separate container per profile only when you have a specific reason:
|
||||
|
||||
- **Resource isolation per workload** — e.g. a runaway browser-tool session in profile A shouldn't be able to OOM profile B. Containers give you `--memory` / `--cpus` per profile.
|
||||
- **Independent image pinning** — different upstream image tags per workload.
|
||||
- **Network segmentation** — distinct Docker networks per profile (e.g. one customer-facing, one internal).
|
||||
- **Compliance / blast radius** — distinct credentials never share an OS-level process tree.
|
||||
|
||||
In those cases, declare one service per profile with distinct `container_name`, `volumes`, and `ports`:
|
||||
|
||||
```yaml
|
||||
services:
|
||||
@@ -234,6 +236,24 @@ services:
|
||||
- ~/.hermes-personal:/opt/data
|
||||
```
|
||||
|
||||
The warning from [Persistent volumes](#persistent-volumes) still applies: never point two containers at the same `~/.hermes` directory simultaneously. The s6 supervisor inside each container manages its own profile set; cross-container sharing of a data volume corrupts session files and memory stores.
|
||||
|
||||
## Where the logs go
|
||||
|
||||
The s6 container has four distinct log surfaces, and "why isn't my gateway showing anything in `docker logs`" is a common surprise. Cheatsheet:
|
||||
|
||||
| Source | Where it lands | How to read it |
|
||||
|---|---|---|
|
||||
| **Per-profile gateway** (`hermes gateway run` and per-profile gateways under s6) | Tee'd to two places: `docker logs <container>` (real time, no extra prefix) **and** `${HERMES_HOME}/logs/gateways/<profile>/current` (rotated, ISO-8601 timestamped, 10 archives × 1 MB each) | `docker logs -f hermes` or `tail -F ~/.hermes/logs/gateways/default/current` on the host |
|
||||
| **Dashboard** (when `HERMES_DASHBOARD=1`) | `docker logs <container>` (no prefix) | `docker logs -f hermes` — interleaved with gateway lines |
|
||||
| **Boot reconciler** (records which profile gateways were restored on each container start) | `${HERMES_HOME}/logs/container-boot.log` (append-only audit log) | `tail -F ~/.hermes/logs/container-boot.log` |
|
||||
| **Generic Hermes logs** (`agent.log`, `errors.log`) | `${HERMES_HOME}/logs/` (profile-aware) | `docker exec hermes hermes logs --follow [--level WARNING] [--session <id>]` |
|
||||
|
||||
Two practical consequences worth knowing:
|
||||
|
||||
- The file copy at `logs/gateways/<profile>/current` is what survives container restarts. `docker logs` only retains output from the current container's lifetime (and is wiped on `docker rm`); the rotated files persist on the bind-mounted volume.
|
||||
- The boot reconciler's audit line shape is `<iso-timestamp> profile=<name> prior_state=<state> action=<registered|started>`, so a quick `grep profile=coder ~/.hermes/logs/container-boot.log` reveals when a given profile was last restored and whether s6 auto-started it.
|
||||
|
||||
## Environment variable forwarding
|
||||
|
||||
API keys are read from `/opt/data/.env` inside the container. You can also pass environment variables directly:
|
||||
@@ -249,7 +269,7 @@ docker run -it --rm \
|
||||
Direct `-e` flags override values from `.env`. This is useful for CI/CD or secrets-manager integrations where you don't want keys on disk.
|
||||
|
||||
:::note Looking for Docker as the **terminal backend**?
|
||||
This page covers running Hermes itself inside Docker. If you want Hermes to execute the agent's `terminal` / `execute_code` calls inside a Docker sandbox container (one persistent container per Hermes process), that's a separate config block — `terminal.backend: docker` plus `terminal.docker_image`, `terminal.docker_volumes`, `terminal.docker_forward_env`, `terminal.docker_run_as_host_user`, and `terminal.docker_extra_args`. See [Configuration → Docker Backend](configuration.md#docker-backend) for the full set.
|
||||
This page covers running Hermes itself inside Docker. If you want Hermes to execute the agent's `terminal` / `execute_code` calls inside a Docker sandbox container (one long-lived container shared across Hermes processes — see issue #20561), that's a separate config block — `terminal.backend: docker` plus `terminal.docker_image`, `terminal.docker_volumes`, `terminal.docker_forward_env`, `terminal.docker_env`, `terminal.docker_run_as_host_user`, `terminal.docker_extra_args`, `terminal.docker_persist_across_processes`, and `terminal.docker_orphan_reaper`. See [Configuration → Docker Backend](configuration.md#docker-backend) for the full set including container-lifecycle rules.
|
||||
:::
|
||||
|
||||
## Docker Compose example
|
||||
@@ -281,7 +301,7 @@ services:
|
||||
cpus: "2.0"
|
||||
```
|
||||
|
||||
Start with `docker compose up -d` and view logs with `docker compose logs -f`. Dashboard output is prefixed with `[dashboard]` so it's easy to filter from gateway logs.
|
||||
Start with `docker compose up -d` and view logs with `docker compose logs -f`. The supervised gateway's stdout is also tee'd to `${HERMES_HOME}/logs/gateways/<profile>/current` on the volume — see [Where the logs go](#where-the-logs-go) for the full routing map.
|
||||
|
||||
## Optional: Linux desktop audio bridge
|
||||
|
||||
@@ -415,24 +435,28 @@ The container ENTRYPOINT is now `/init` (s6-overlay), not `/usr/bin/tini`. All f
|
||||
Do not override the image entrypoint unless you keep `/init` (or, equivalently, the legacy `docker/entrypoint.sh` shim that forwards to the stage2 hook) in the command chain. s6-overlay's `/init` runs as root so it can chown the volume on first boot, then drops to the `hermes` user via `s6-setuidgid` for every supervised service AND for the main program. Starting `hermes gateway run` as root inside the official image is refused by default because it can leave root-owned files in `/opt/data` and break later dashboard or gateway starts. Set `HERMES_ALLOW_ROOT_GATEWAY=1` only when you intentionally accept that risk.
|
||||
:::
|
||||
|
||||
### Per-profile gateway supervision
|
||||
### `docker exec` automatically drops to the `hermes` user
|
||||
|
||||
Inside the container, each profile created with `hermes profile create <name>` automatically gets an s6-supervised gateway service registered at `/run/service/gateway-<name>/`. The lifecycle commands you'd run on the host work the same way:
|
||||
`docker exec hermes <cmd>` defaults to running as root inside the container, but the image ships a thin shim at `/opt/hermes/bin/hermes` (earliest on PATH) that detects root callers and transparently re-execs through `s6-setuidgid hermes`. So `docker exec hermes login`, `docker exec hermes profile create …`, `docker exec hermes setup`, etc. all write files owned by UID 10000 — i.e. readable by the supervised gateway — with no extra `--user` flag needed. Non-root callers (the supervised processes themselves, `docker exec --user hermes`, kanban subagents inside the container) hit a short-circuit that exec's the venv binary directly, so there's no overhead on the hot paths.
|
||||
|
||||
If you specifically need a `docker exec` that retains root semantics (diagnostic sessions, inspecting root-only state, files outside `/opt/data` that root happens to own), opt out per invocation:
|
||||
|
||||
```sh
|
||||
hermes profile create coder # registers gateway-coder s6 slot
|
||||
hermes -p coder gateway start # s6-svc -u → supervised gateway
|
||||
hermes -p coder gateway stop # s6-svc -d → service down
|
||||
hermes -p coder gateway restart # s6-svc -t → SIGTERM the supervisor
|
||||
hermes profile delete coder # tears down the s6 slot
|
||||
docker exec -e HERMES_DOCKER_EXEC_AS_ROOT=1 hermes <cmd>
|
||||
```
|
||||
|
||||
The shim accepts `1` / `true` / `yes` (case-insensitive). Anything else — including typos like `=0` — falls through to the drop, so silent opt-outs aren't possible. If `s6-setuidgid` isn't available (custom builds that stripped s6-overlay), the shim refuses to run as root and exits 126 instead, surfacing the broken privilege model loudly rather than regressing to the historical footgun where `docker exec hermes login` would write `auth.json` as `root:root` and break the supervised gateway's auth on every chat platform message.
|
||||
|
||||
### Per-profile gateway supervision
|
||||
|
||||
Each profile created with `hermes profile create <name>` automatically gets an s6-supervised gateway service registered at `/run/service/gateway-<name>/`, with state-persistent auto-restart across container restarts. See [Multi-profile support](#multi-profile-support) above for the user-facing workflow and the lifecycle commands.
|
||||
|
||||
**Supervision benefits over the pre-s6 image:**
|
||||
|
||||
- Gateway crashes are auto-restarted by `s6-supervise` after a ~1s backoff.
|
||||
- Dashboard crashes are auto-restarted (set `HERMES_DASHBOARD=1` to start it).
|
||||
- Dashboard, when enabled with `HERMES_DASHBOARD=1`, is supervised on the same supervision tree and gets the same auto-restart treatment.
|
||||
- `docker restart` preserves running gateways: the cont-init reconciler reads `$HERMES_HOME/profiles/<name>/gateway_state.json` and brings the slot back up if the last recorded state was `running`. Stopped gateways stay stopped.
|
||||
- Per-profile gateway logs persist under `$HERMES_HOME/logs/gateways/<profile>/current` (rotated by `s6-log`), and the reconciler's actions are appended to `$HERMES_HOME/logs/container-boot.log` per boot.
|
||||
- Per-profile gateway logs persist under `$HERMES_HOME/logs/gateways/<profile>/current` (rotated by `s6-log`), and the reconciler's actions are appended to `$HERMES_HOME/logs/container-boot.log` per boot. See [Where the logs go](#where-the-logs-go) for the full routing map.
|
||||
|
||||
`hermes status` inside the container reports `Manager: s6 (container supervisor)`. Use `/command/s6-svstat /run/service/gateway-<name>` for the raw supervisor view (note `/command/` is on PATH for supervision-tree processes only; pass the absolute path when calling from `docker exec`).
|
||||
|
||||
@@ -692,6 +716,8 @@ The container's stage2 hook drops privileges to the non-root `hermes` user (UID
|
||||
chmod -R 755 ~/.hermes
|
||||
```
|
||||
|
||||
`docker exec hermes <cmd>` automatically drops to UID 10000 too — see [`docker exec` automatically drops to the `hermes` user](#docker-exec-automatically-drops-to-the-hermes-user) for details and the per-invocation opt-out.
|
||||
|
||||
### Browser tools not working
|
||||
|
||||
Playwright needs shared memory. Add `--shm-size=1g` to your Docker run command:
|
||||
|
||||
Reference in New Issue
Block a user