Compare commits
9 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 7e6d4b982e | |||
| 3c6e70aef1 | |||
| 2f0f03c40d | |||
| 5c2170a7c6 | |||
| d77d877665 | |||
| ac8e238bc8 | |||
| 8d129d013b | |||
| 300140e006 | |||
| e71a2bd11b |
@@ -0,0 +1,110 @@
|
||||
# Hermes Agent v0.15.1 (v2026.5.29)
|
||||
|
||||
**Release Date:** May 29, 2026
|
||||
**Since v0.15.0:** 28 commits · 21 merged PRs · hotfix release · 9 contributors
|
||||
|
||||
> **The Patch Release.** A same-day hotfix for v0.15.0. Headline fix: the dashboard infinite-reload loop that hit anyone running v0.15.0 in loopback mode (Docker, hosted Hermes, fresh installs). A handful of other v0.15.0 follow-ups go along for the ride — kanban worker SIGTERM, `/model` picker unification, `/yolo` session bypass, the full 19,932-entry skills.sh catalog, `.md` media delivery restoration, gateway probe-stepdown safety, web-URL redaction passthrough, kanban worker vision on referenced images, hindsight observation-default. Docker users get an explicit `--insecure` opt-in env var (no more bind-host inference), MCP server bare-command PATH resolution, and arm64 PR-build cache fixes.
|
||||
|
||||
---
|
||||
|
||||
## ✨ Highlights
|
||||
|
||||
- **Dashboard 401 reload loop fixed** — In loopback mode the dashboard's identity probe (`/api/auth/me`) returns 401 by design, but v0.15.0's stale-token reload guard treated every 401 as a rotated session token and full-page-reloaded to pick up a fresh one. Every successful sibling call cleared the one-shot reload guard, so the page reload-looped forever (Firefox: "Navigated to /sessions" storm; Chrome: React re-render storm). Fix adds an `allowUnauthorized` opt-out to `fetchJSON` that skips only the loopback stale-token reload — 401 still throws so `AuthWidget` swallows it, gated-mode `login_url` redirects are unaffected. Closes [#34206](https://github.com/NousResearch/hermes-agent/issues/34206), [#34202](https://github.com/NousResearch/hermes-agent/issues/34202). ([#30698](https://github.com/NousResearch/hermes-agent/pull/30698) — @austinpickett)
|
||||
|
||||
- **Docker dashboard `--insecure` is now an explicit env opt-in, never derived from bind host** — Previously the Docker entrypoint inferred `--insecure` when the dashboard bound to a non-loopback host. That conflated "I want LAN access" with "I want to disable the same-origin guard." The fix splits them: bind host is bind host, and disabling the dashboard's loopback auth requires an explicit `HERMES_DASHBOARD_INSECURE=1`. Existing setups that genuinely wanted insecure binding must now set the env var. ([#34188](https://github.com/NousResearch/hermes-agent/pull/34188), [#34204](https://github.com/NousResearch/hermes-agent/pull/34204) — @benbarclay)
|
||||
|
||||
- **MCP bare command resolution under Docker** — MCP servers configured with bare commands (`npx`, `npm`, `node`) now resolve against `/usr/local/bin` so they actually launch inside the Docker image where those binaries live. v0.15.0 left these failing silently in containers when the agent's effective PATH didn't include the Node toolchain location. ([#34186](https://github.com/NousResearch/hermes-agent/pull/34186) — @benbarclay)
|
||||
|
||||
- **Skills page sidebar / source pills restored** — A stale `useMemo` dependency in the new dashboard skills page collapsed the source pills and category sidebar to "All" only. Fixed; both surfaces now reflect the live catalog state. ([#34194](https://github.com/NousResearch/hermes-agent/pull/34194))
|
||||
|
||||
- **Kanban worker can be killed again** — `SIGTERM` on a kanban worker was being absorbed by an intermediate process and the worker stayed running. Closes [#28181](https://github.com/NousResearch/hermes-agent/issues/28181). ([#34045](https://github.com/NousResearch/hermes-agent/pull/34045))
|
||||
|
||||
- **Full skills.sh catalog (858 → 19,932 entries)** — The skills hub page was pulling a partial paginated catalog. The fetch now walks the sitemap, so all 19,932 skills.sh entries surface in the picker instead of just the first 858. ([#34025](https://github.com/NousResearch/hermes-agent/pull/34025))
|
||||
|
||||
---
|
||||
|
||||
## 🐛 Bug Fixes
|
||||
|
||||
### Dashboard / Web
|
||||
|
||||
- **`/api/auth/me` 401 no longer triggers reload loop** in loopback mode — ([#30698](https://github.com/NousResearch/hermes-agent/pull/30698) — @austinpickett)
|
||||
- **Skills page source pills + category sidebar restored** — stale `useMemo` dep ([#34194](https://github.com/NousResearch/hermes-agent/pull/34194))
|
||||
|
||||
### Docker
|
||||
|
||||
- **`--insecure` is now explicit opt-in via env var**, not derived from bind host ([#34188](https://github.com/NousResearch/hermes-agent/pull/34188) — @benbarclay)
|
||||
- **Dashboard test suite repaired** to match the insecure-opt-in fix ([#34204](https://github.com/NousResearch/hermes-agent/pull/34204) — @benbarclay)
|
||||
- **arm64 PR builds skip the GHA cache** to avoid cache-thrash on cross-arch builders ([#33704](https://github.com/NousResearch/hermes-agent/pull/33704) — @BROCCOLO1D)
|
||||
|
||||
### MCP
|
||||
|
||||
- **Bare `npx`/`npm`/`node` resolve against `/usr/local/bin`** for Docker compatibility ([#34186](https://github.com/NousResearch/hermes-agent/pull/34186) — @benbarclay)
|
||||
|
||||
### Kanban
|
||||
|
||||
- **Worker SIGTERM actually terminates the process** ([#34045](https://github.com/NousResearch/hermes-agent/pull/34045))
|
||||
- **Workers receive images referenced in task bodies** for vision-capable models ([#34210](https://github.com/NousResearch/hermes-agent/pull/34210))
|
||||
|
||||
### Gateway
|
||||
|
||||
- **`.md` files deliver again** — media-delivery validation defaults to denylist-only instead of an overly-narrow allowlist ([#34022](https://github.com/NousResearch/hermes-agent/pull/34022))
|
||||
- **Probe stepdown safety** — on a context-overflow without an explicit provider context limit, the agent no longer steps down to a smaller model based on an unknown ceiling (salvage of [#33673](https://github.com/NousResearch/hermes-agent/pull/33673)) ([#33826](https://github.com/NousResearch/hermes-agent/pull/33826))
|
||||
|
||||
### CLI
|
||||
|
||||
- **`/yolo` mid-session enables the per-session bypass** instead of just toggling the env var (which the running agent had already snapshotted) ([#33931](https://github.com/NousResearch/hermes-agent/pull/33931) — @kshitijk4poor)
|
||||
- **`/model` and `hermes model` show the same list**, plus disk cache for picker startup ([#33867](https://github.com/NousResearch/hermes-agent/pull/33867))
|
||||
|
||||
### Skills
|
||||
|
||||
- **Full skills.sh catalog via sitemap** — 858 → 19,932 entries ([#34025](https://github.com/NousResearch/hermes-agent/pull/34025))
|
||||
|
||||
### Redaction
|
||||
|
||||
- **Web URLs pass through unchanged** — the redactor was eating query parameters that looked credential-shaped ([#34029](https://github.com/NousResearch/hermes-agent/pull/34029))
|
||||
|
||||
---
|
||||
|
||||
## ✨ Small Features
|
||||
|
||||
- **Hindsight default narrowed to observation-only** for `recall_types` — tool path is also narrowed ([#34079](https://github.com/NousResearch/hermes-agent/pull/34079) — @nicoloboschi, follow-up [#34091](https://github.com/NousResearch/hermes-agent/pull/4df62d239e38bf8c212a595721c9c01e176f6c3a) — @kshitijk4poor)
|
||||
- **Memory providers receive completed-turn message context** — salvage of [#28065](https://github.com/NousResearch/hermes-agent/pull/28065) ([#34097](https://github.com/NousResearch/hermes-agent/pull/34097) — @kshitijk4poor, credit to @devwdave)
|
||||
|
||||
---
|
||||
|
||||
## 📚 Documentation
|
||||
|
||||
- **`--no-supervise` / `HERMES_GATEWAY_NO_SUPERVISE` documented** in the reference docs (follow-up to [#33583](https://github.com/NousResearch/hermes-agent/pull/33583)) ([#33751](https://github.com/NousResearch/hermes-agent/pull/33751) — @r266-tech)
|
||||
|
||||
---
|
||||
|
||||
## 🛠️ Infrastructure
|
||||
|
||||
- **Vercel deploy workflow accepts `workflow_dispatch`** so docs deploys can be manually triggered ([#34081](https://github.com/NousResearch/hermes-agent/pull/34081))
|
||||
- **`@nous-research/ui` bumped to 0.18.2** (Nix `npmDepsHash` also updated to match) ([#34193](https://github.com/NousResearch/hermes-agent/pull/34193) follow-ups — @austinpickett)
|
||||
|
||||
---
|
||||
|
||||
## 👥 Contributors
|
||||
|
||||
### Core
|
||||
- @teknium1
|
||||
|
||||
### Community
|
||||
- @austinpickett — dashboard 401 reload-loop fix (the headline), `@nous-research/ui` bump, Nix `npmDepsHash` updates
|
||||
- @benbarclay — Docker `--insecure` opt-in, MCP bare-command resolution, dashboard test repair
|
||||
- @kshitijk4poor — `/yolo` session bypass, completed-turn memory context salvage, hindsight follow-up docs
|
||||
- @nicoloboschi — hindsight `recall_types` observation default
|
||||
- @BROCCOLO1D — arm64 PR build cache fix
|
||||
- @r266-tech — `--no-supervise` reference docs
|
||||
- @yangguangjin — probe stepdown safety (salvage of @yanghd's #33673)
|
||||
- @devwdave — completed-turn memory context (credited via salvage)
|
||||
- @andrewhosf — co-author
|
||||
|
||||
### Issue Reporters (the 401 loop)
|
||||
- @routesmith ([#34206](https://github.com/NousResearch/hermes-agent/issues/34206))
|
||||
- @beeaton ([#34202](https://github.com/NousResearch/hermes-agent/issues/34202))
|
||||
|
||||
---
|
||||
|
||||
**Full Changelog**: [v2026.5.28...v2026.5.29](https://github.com/NousResearch/hermes-agent/compare/v2026.5.28...v2026.5.29)
|
||||
@@ -1,7 +1,7 @@
|
||||
{
|
||||
"id": "hermes-agent",
|
||||
"name": "Hermes Agent",
|
||||
"version": "0.15.0",
|
||||
"version": "0.15.1",
|
||||
"description": "Self-improving open-source AI agent by Nous Research with ACP editor integration, persistent memory, skills, and rich tool support.",
|
||||
"repository": "https://github.com/NousResearch/hermes-agent",
|
||||
"website": "https://hermes-agent.nousresearch.com/docs/user-guide/features/acp",
|
||||
@@ -9,7 +9,7 @@
|
||||
"license": "MIT",
|
||||
"distribution": {
|
||||
"uvx": {
|
||||
"package": "hermes-agent[acp]==0.15.0",
|
||||
"package": "hermes-agent[acp]==0.15.1",
|
||||
"args": ["hermes-acp"]
|
||||
}
|
||||
}
|
||||
|
||||
@@ -576,6 +576,8 @@ def load_cli_config() -> Dict[str, Any]:
|
||||
"docker_env": "TERMINAL_DOCKER_ENV",
|
||||
"docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
|
||||
"docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
|
||||
"docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
|
||||
"docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
|
||||
"sandbox_dir": "TERMINAL_SANDBOX_DIR",
|
||||
# Persistent shell (non-local backends)
|
||||
"persistent_shell": "TERMINAL_PERSISTENT_SHELL",
|
||||
|
||||
@@ -831,6 +831,8 @@ if _config_path.exists():
|
||||
"docker_env": "TERMINAL_DOCKER_ENV",
|
||||
"docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
|
||||
"docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
|
||||
"docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
|
||||
"docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
|
||||
"sandbox_dir": "TERMINAL_SANDBOX_DIR",
|
||||
"persistent_shell": "TERMINAL_PERSISTENT_SHELL",
|
||||
}
|
||||
|
||||
@@ -14,8 +14,8 @@ Provides subcommands for:
|
||||
import os
|
||||
import sys
|
||||
|
||||
__version__ = "0.15.0"
|
||||
__release_date__ = "2026.5.28"
|
||||
__version__ = "0.15.1"
|
||||
__release_date__ = "2026.5.29"
|
||||
|
||||
|
||||
def _ensure_utf8():
|
||||
|
||||
@@ -5551,6 +5551,8 @@ def set_config_value(key: str, value: str):
|
||||
"terminal.daytona_image": "TERMINAL_DAYTONA_IMAGE",
|
||||
"terminal.docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
|
||||
"terminal.docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
|
||||
"terminal.docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
|
||||
"terminal.docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
|
||||
"terminal.docker_env": "TERMINAL_DOCKER_ENV",
|
||||
# terminal.cwd intentionally excluded — CLI resolves at runtime,
|
||||
# gateway bridges it in gateway/run.py. Persisting to .env causes
|
||||
|
||||
@@ -26,10 +26,15 @@ from hermes_cli.dashboard_auth import list_providers
|
||||
from hermes_cli.dashboard_auth.audit import AuditEvent, audit_log
|
||||
from hermes_cli.dashboard_auth.base import ProviderError
|
||||
from hermes_cli.dashboard_auth.cookies import read_session_cookies
|
||||
from hermes_cli.dashboard_auth.public_paths import PUBLIC_API_PATHS
|
||||
|
||||
_log = logging.getLogger(__name__)
|
||||
|
||||
# Paths that bypass the auth gate. Order matters: prefix match.
|
||||
# Prefixes that bypass the auth gate. Match via ``path == prefix`` or
|
||||
# ``path.startswith(prefix)`` — so ``/assets/`` (with trailing slash)
|
||||
# matches ``/assets/foo.css`` but not ``/assetsleak``. Auth-bootstrap
|
||||
# (login page, OAuth round trip, provider listing) and static asset
|
||||
# mounts go here.
|
||||
_GATE_PUBLIC_PREFIXES: tuple[str, ...] = (
|
||||
"/auth/login",
|
||||
"/auth/callback",
|
||||
@@ -45,6 +50,20 @@ _GATE_PUBLIC_PREFIXES: tuple[str, ...] = (
|
||||
|
||||
|
||||
def _path_is_public(path: str) -> bool:
|
||||
"""True if ``path`` bypasses the OAuth auth gate.
|
||||
|
||||
Two sources of public-ness:
|
||||
|
||||
* :data:`PUBLIC_API_PATHS` — the shared ``/api/*`` allowlist that
|
||||
the legacy ``_SESSION_TOKEN`` middleware also honours. Matched
|
||||
exactly (no prefix expansion) so adding ``/api/status`` doesn't
|
||||
accidentally expose ``/api/status/secret-extension``.
|
||||
* :data:`_GATE_PUBLIC_PREFIXES` — auth-bootstrap routes and static
|
||||
mounts. Prefix-matched so ``/assets/foo.css`` lights up via
|
||||
``/assets/``.
|
||||
"""
|
||||
if path in PUBLIC_API_PATHS:
|
||||
return True
|
||||
return any(
|
||||
path == prefix or path.startswith(prefix)
|
||||
for prefix in _GATE_PUBLIC_PREFIXES
|
||||
|
||||
@@ -0,0 +1,49 @@
|
||||
"""Shared allowlist of ``/api/*`` paths that bypass dashboard auth.
|
||||
|
||||
Two middlewares enforce dashboard auth and previously kept independent
|
||||
copies of this list:
|
||||
|
||||
* ``hermes_cli.web_server.auth_middleware`` — loopback / ``--insecure``
|
||||
mode, gates on the ephemeral ``_SESSION_TOKEN``.
|
||||
* ``hermes_cli.dashboard_auth.middleware.gated_auth_middleware`` —
|
||||
non-loopback mode, gates on the OAuth session cookie.
|
||||
|
||||
When the lists drifted, ``/api/status`` ended up public under the legacy
|
||||
gate but 401'd under the OAuth gate. That broke the portal's wildcard
|
||||
liveness probe (``nous-account-service`` ``fly-provider.ts``
|
||||
``getInstanceRuntimeStatus``), which fetches ``/api/status`` without a
|
||||
cookie as its sole signal of "agent dashboard is alive": every healthy
|
||||
wildcard-subdomain agent surfaced as STARTING/down in the portal UI even
|
||||
though the dashboard was serving correctly.
|
||||
|
||||
Centralising the allowlist here so both middlewares import the same
|
||||
frozenset prevents the next drift. Keep this list minimal — only truly
|
||||
non-sensitive, read-only endpoints belong here. As a sanity check, every
|
||||
entry should be safe to expose to:
|
||||
|
||||
* external uptime probes (Pingdom, Better Stack, NAS),
|
||||
* the dashboard SPA before the user has logged in,
|
||||
* anyone who happens to ``curl`` the hostname.
|
||||
|
||||
If a new endpoint doesn't pass all three tests, it should be gated and
|
||||
the SPA should bootstrap it after login instead.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
PUBLIC_API_PATHS: frozenset[str] = frozenset({
|
||||
# Liveness probe target. Returns version, gateway state, active
|
||||
# session count, and the dashboard auth-gate shape. No bodies, no
|
||||
# session content, no secrets. Documented as the portal's wildcard
|
||||
# liveness probe in
|
||||
# ``docs/agent-dashboard-public-url-contract.md`` (NAS side).
|
||||
"/api/status",
|
||||
# Read-only config-defaults / schema feeds for the SPA's Config page.
|
||||
"/api/config/defaults",
|
||||
"/api/config/schema",
|
||||
# Read-only model metadata (context windows, etc.) — same shape as
|
||||
# provider catalogs already exposed on the public internet.
|
||||
"/api/model/info",
|
||||
# Read-only theme + plugin manifests for the dashboard skin engine.
|
||||
"/api/dashboard/themes",
|
||||
"/api/dashboard/plugins",
|
||||
})
|
||||
+13
-10
@@ -110,17 +110,20 @@ app.add_middleware(
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Endpoints that do NOT require the session token. Everything else under
|
||||
# /api/ is gated by the auth middleware below. Keep this list minimal —
|
||||
# only truly non-sensitive, read-only endpoints belong here.
|
||||
# /api/ is gated by the auth middleware below.
|
||||
#
|
||||
# This list is defined in ``hermes_cli.dashboard_auth.public_paths`` so the
|
||||
# OAuth gate middleware can honour the same allowlist — keeping the two
|
||||
# gates in lockstep avoids drift like the wildcard-subdomain regression
|
||||
# where ``/api/status`` was public under the legacy gate but 401'd under
|
||||
# the OAuth gate (breaking the portal's liveness probe).
|
||||
#
|
||||
# Keep the upstream list minimal — only truly non-sensitive, read-only
|
||||
# endpoints belong there.
|
||||
# ---------------------------------------------------------------------------
|
||||
_PUBLIC_API_PATHS: frozenset = frozenset({
|
||||
"/api/status",
|
||||
"/api/config/defaults",
|
||||
"/api/config/schema",
|
||||
"/api/model/info",
|
||||
"/api/dashboard/themes",
|
||||
"/api/dashboard/plugins",
|
||||
})
|
||||
from hermes_cli.dashboard_auth.public_paths import (
|
||||
PUBLIC_API_PATHS as _PUBLIC_API_PATHS,
|
||||
)
|
||||
|
||||
|
||||
def _has_valid_session_token(request: Request) -> bool:
|
||||
|
||||
+1
-1
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
|
||||
|
||||
[project]
|
||||
name = "hermes-agent"
|
||||
version = "0.15.0"
|
||||
version = "0.15.1"
|
||||
description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
|
||||
readme = "README.md"
|
||||
requires-python = ">=3.11"
|
||||
|
||||
@@ -227,6 +227,8 @@ _HERMES_BEHAVIORAL_VARS = frozenset({
|
||||
"TERMINAL_CONTAINER_DISK",
|
||||
"TERMINAL_CONTAINER_MEMORY",
|
||||
"TERMINAL_CONTAINER_PERSISTENT",
|
||||
"TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
|
||||
"TERMINAL_DOCKER_ORPHAN_REAPER",
|
||||
"TERMINAL_DOCKER_RUN_AS_HOST_USER",
|
||||
"BROWSER_CDP_URL",
|
||||
"CAMOFOX_URL",
|
||||
|
||||
@@ -324,10 +324,14 @@ def test_dashboard_oauth_gate_engages_on_non_loopback_bind(
|
||||
1. ``/api/auth/providers`` (publicly reachable through the gate so
|
||||
the login page can bootstrap) returns 200 with ``nous`` in the
|
||||
provider list — proves the bundled provider registered.
|
||||
2. ``/api/status`` (a public endpoint under the legacy
|
||||
``_SESSION_TOKEN`` middleware) returns 401 — proves the OAuth gate
|
||||
runs upstream of the legacy public list and is actively
|
||||
intercepting unauthenticated callers.
|
||||
2. ``/api/sessions`` (a gated route under both the legacy
|
||||
``_SESSION_TOKEN`` middleware and the OAuth gate) returns 401
|
||||
to an unauthenticated caller — proves the OAuth gate is actively
|
||||
intercepting browser traffic. We deliberately probe a gated route
|
||||
here rather than ``/api/status``: status sits in the shared
|
||||
``PUBLIC_API_PATHS`` allowlist (portal liveness probe target) and
|
||||
responds 200 without a cookie under both gates, so it cannot
|
||||
distinguish "gate on" from "gate off".
|
||||
"""
|
||||
subprocess.run(
|
||||
["docker", "run", "-d", "--name", container_name,
|
||||
@@ -351,14 +355,32 @@ def test_dashboard_oauth_gate_engages_on_non_loopback_bind(
|
||||
f"HERMES_DASHBOARD_OAUTH_CLIENT_ID is set. Got: {payload!r}"
|
||||
)
|
||||
|
||||
# (2) /api/status is gated by the OAuth middleware → unauthenticated
|
||||
# callers get 401, not the legacy public 200 JSON.
|
||||
status_code, body = _http_probe(container_name, "/api/status")
|
||||
# (2) A gated route (``/api/sessions``) returns 401 to an
|
||||
# unauthenticated caller — the OAuth gate is intercepting.
|
||||
status_code, body = _http_probe(container_name, "/api/sessions")
|
||||
assert status_code == 401, (
|
||||
"OAuth gate must intercept /api/status on 0.0.0.0 bind when a "
|
||||
"provider is registered and HERMES_DASHBOARD_INSECURE is unset. "
|
||||
"OAuth gate must intercept gated /api/* routes on 0.0.0.0 bind "
|
||||
"when a provider is registered and HERMES_DASHBOARD_INSECURE "
|
||||
f"is unset. Got: status={status_code} body={body!r}"
|
||||
)
|
||||
|
||||
# (3) ``/api/status`` remains 200 under the gate — it's in the shared
|
||||
# ``PUBLIC_API_PATHS`` allowlist so NAS's wildcard-subdomain
|
||||
# liveness probe (``fly-provider.ts`` ``getInstanceRuntimeStatus``)
|
||||
# can reach it without a cookie. Regression guard: this allowlist
|
||||
# drifted once already and surfaced every healthy agent as
|
||||
# STARTING/down in the portal UI.
|
||||
status_code, body = _http_probe(container_name, "/api/status")
|
||||
assert status_code == 200, (
|
||||
"/api/status must remain publicly reachable under the OAuth gate "
|
||||
"— the portal uses it as the wildcard-subdomain liveness probe. "
|
||||
f"Got: status={status_code} body={body!r}"
|
||||
)
|
||||
status = json.loads(body)
|
||||
assert status.get("auth_required") is True, (
|
||||
"/api/status must report auth_required=True when the OAuth gate "
|
||||
f"is engaged so the SPA/portal can distinguish modes. Got: {status!r}"
|
||||
)
|
||||
|
||||
|
||||
def test_dashboard_insecure_env_var_opts_out_of_gate(
|
||||
|
||||
@@ -131,8 +131,13 @@ class TestRefreshTokenCookieDeprecation:
|
||||
|
||||
|
||||
class TestApi401Envelope:
|
||||
# NOTE: probe a gated route (``/api/sessions``) here rather than
|
||||
# ``/api/status`` — status is in the shared ``PUBLIC_API_PATHS``
|
||||
# allowlist (portal liveness probe) so it would 200 even without a
|
||||
# cookie and never exercise the 401-envelope code path.
|
||||
|
||||
def test_no_cookie_returns_unauthenticated_envelope(self, gated_app):
|
||||
r = gated_app.get("/api/status")
|
||||
r = gated_app.get("/api/sessions")
|
||||
assert r.status_code == 401
|
||||
body = r.json()
|
||||
assert body["error"] == "unauthenticated"
|
||||
@@ -141,7 +146,7 @@ class TestApi401Envelope:
|
||||
|
||||
def test_invalid_cookie_returns_session_expired_envelope(self, gated_app):
|
||||
gated_app.cookies.set(SESSION_AT_COOKIE, "garbage")
|
||||
r = gated_app.get("/api/status")
|
||||
r = gated_app.get("/api/sessions")
|
||||
assert r.status_code == 401
|
||||
body = r.json()
|
||||
assert body["error"] == "session_expired"
|
||||
@@ -151,7 +156,7 @@ class TestApi401Envelope:
|
||||
"""Dead-cookie cleanup — Phase 6 requirement so the browser
|
||||
doesn't keep replaying the stale token on every request."""
|
||||
gated_app.cookies.set(SESSION_AT_COOKIE, "garbage")
|
||||
r = gated_app.get("/api/status")
|
||||
r = gated_app.get("/api/sessions")
|
||||
set_cookies = r.headers.get_list("set-cookie")
|
||||
assert any(
|
||||
c.startswith(f"{SESSION_AT_COOKIE}=") and "Max-Age=0" in c
|
||||
|
||||
@@ -56,10 +56,61 @@ def gated_app():
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_gated_status_now_requires_auth(gated_app):
|
||||
"""When gate is on, /api/status is NOT public — login bootstrap uses /api/auth/providers."""
|
||||
def test_gated_status_is_public(gated_app):
|
||||
"""``/api/status`` MUST be public under the OAuth gate.
|
||||
|
||||
Regression guard for the wildcard-subdomain rollout: NAS
|
||||
(``fly-provider.ts`` ``getInstanceRuntimeStatus``) hits
|
||||
``/api/status`` without a cookie as its sole liveness probe. A 401
|
||||
here surfaces every healthy agent as STARTING/down in the portal
|
||||
UI. The endpoint returns only version + gateway/auth-gate metadata
|
||||
(no user data, no session content), so it stays in the shared
|
||||
``PUBLIC_API_PATHS`` allowlist under both the legacy ``_SESSION_TOKEN``
|
||||
gate and the OAuth gate.
|
||||
|
||||
The body also reports the gate's shape (``auth_required``,
|
||||
``auth_providers``) so the SPA's StatusPage and external monitors
|
||||
can distinguish loopback / gated / no-providers without a separate
|
||||
round trip.
|
||||
"""
|
||||
r = gated_app.get("/api/status")
|
||||
assert r.status_code == 401
|
||||
assert r.status_code == 200, (
|
||||
f"Expected 200, got {r.status_code}: {r.text}"
|
||||
)
|
||||
body = r.json()
|
||||
assert body["auth_required"] is True
|
||||
assert "version" in body
|
||||
assert "gateway_state" in body
|
||||
|
||||
|
||||
@pytest.mark.parametrize("path", [
|
||||
"/api/config/defaults",
|
||||
"/api/config/schema",
|
||||
"/api/model/info",
|
||||
"/api/dashboard/themes",
|
||||
"/api/dashboard/plugins",
|
||||
])
|
||||
def test_other_public_api_paths_are_public_under_gate(gated_app, path):
|
||||
"""The remaining ``PUBLIC_API_PATHS`` entries must also bypass the
|
||||
gate. They're documented as non-sensitive read-only endpoints that
|
||||
the SPA pre-loads before login (themes, config schema, model
|
||||
metadata). A 401 / 302-to-login here would block the dashboard
|
||||
shell from rendering pre-auth.
|
||||
|
||||
Accept any non-auth-failure status: 200 when the route succeeds,
|
||||
or any route-specific error (e.g. 400 / 404 / 500 from a missing
|
||||
dependency) — but NEVER 401, and NEVER a 302 to ``/login``.
|
||||
"""
|
||||
r = gated_app.get(path, follow_redirects=False)
|
||||
assert r.status_code != 401, (
|
||||
f"{path} returned 401 under the OAuth gate — should be public"
|
||||
)
|
||||
if r.status_code == 302:
|
||||
location = r.headers.get("location", "")
|
||||
assert "/login" not in location, (
|
||||
f"{path} redirected to {location} — should be public, "
|
||||
"not bounced to /login"
|
||||
)
|
||||
|
||||
|
||||
def test_gated_html_redirects_to_login(gated_app):
|
||||
@@ -98,7 +149,7 @@ def test_gated_static_asset_path_is_public(gated_app):
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_full_login_round_trip_unlocks_api_status(gated_app):
|
||||
def test_full_login_round_trip_unlocks_gated_api(gated_app):
|
||||
# 1) Click "Sign in with Stub IdP" — /auth/login redirects to the stub
|
||||
# with a PKCE cookie on the response.
|
||||
r1 = gated_app.get("/auth/login?provider=stub", follow_redirects=False)
|
||||
@@ -128,11 +179,16 @@ def test_full_login_round_trip_unlocks_api_status(gated_app):
|
||||
assert any("hermes_session_at" in c for c in set_cookies)
|
||||
assert any("hermes_session_rt" in c for c in set_cookies)
|
||||
|
||||
# 3) /api/status now succeeds because we're authenticated.
|
||||
r3 = gated_app.get("/api/status")
|
||||
assert r3.status_code == 200
|
||||
body = r3.json()
|
||||
assert "version" in body
|
||||
# 3) A gated API route (``/api/sessions``) now succeeds because we
|
||||
# have a valid session cookie. (We deliberately don't probe
|
||||
# ``/api/status`` here — it's in the shared PUBLIC_API_PATHS
|
||||
# allowlist and would 200 even without a login, so it can't
|
||||
# distinguish "logged in" from "gate accidentally disabled".)
|
||||
r3 = gated_app.get("/api/sessions")
|
||||
assert r3.status_code == 200, (
|
||||
f"Expected 200 for /api/sessions post-login, got {r3.status_code}: "
|
||||
f"{r3.text}"
|
||||
)
|
||||
|
||||
|
||||
def test_login_unknown_provider_returns_404(gated_app):
|
||||
|
||||
@@ -59,19 +59,11 @@ def loopback_client():
|
||||
web_server.app.state.auth_required = prev_required
|
||||
|
||||
|
||||
def _login(client: TestClient) -> None:
|
||||
"""Drive the stub OAuth round trip so the gated client is authed."""
|
||||
r1 = client.get("/auth/login?provider=stub", follow_redirects=False)
|
||||
assert r1.status_code == 302
|
||||
state = r1.headers["location"].split("state=")[1]
|
||||
r2 = client.get(
|
||||
f"/auth/callback?code=stub_code&state={state}", follow_redirects=False
|
||||
)
|
||||
assert r2.status_code == 302
|
||||
|
||||
|
||||
def test_status_reports_auth_required_in_gated_mode(gated_client):
|
||||
_login(gated_client)
|
||||
# No ``_login()`` call — ``/api/status`` is in the shared
|
||||
# ``PUBLIC_API_PATHS`` allowlist precisely so external probes (and
|
||||
# the SPA's pre-login bootstrap) can read the gate's shape without
|
||||
# a cookie. Hit it cold.
|
||||
r = gated_client.get("/api/status")
|
||||
assert r.status_code == 200
|
||||
body = r.json()
|
||||
|
||||
@@ -203,25 +203,43 @@ def test_auto_mount_replaces_persistent_workspace_bind(monkeypatch, tmp_path):
|
||||
|
||||
|
||||
def test_non_persistent_cleanup_removes_container(monkeypatch):
|
||||
"""When persistent=false, cleanup() must schedule docker stop + rm."""
|
||||
"""When persist_across_processes=false, cleanup() must docker stop AND
|
||||
docker rm so containers don't leak across hermes processes.
|
||||
|
||||
Updated for issue #20561: the previous implementation used fire-and-forget
|
||||
``subprocess.Popen("... &", shell=True)`` which raced with parent exit;
|
||||
the new implementation uses ``subprocess.run`` on a daemon thread with
|
||||
bounded timeouts. See test_cleanup_with_persist_disabled_stops_and_rms
|
||||
for the full behavior contract.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
calls = _mock_subprocess_run(monkeypatch)
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
# Run the worker thread synchronously so assertions can observe its work.
|
||||
import threading
|
||||
monkeypatch.setattr(threading, "Thread", _FakeThread)
|
||||
|
||||
popen_cmds = []
|
||||
monkeypatch.setattr(
|
||||
docker_env.subprocess, "Popen",
|
||||
lambda cmd, **kw: (popen_cmds.append(cmd), type("P", (), {"poll": lambda s: 0, "wait": lambda s, **k: None, "returncode": 0, "stdout": iter([]), "stdin": None})())[1],
|
||||
env = docker_env.DockerEnvironment(
|
||||
image="python:3.11", cwd="/root", timeout=60,
|
||||
task_id="ephemeral-task", persistent_filesystem=False,
|
||||
persist_across_processes=False,
|
||||
)
|
||||
|
||||
env = _make_dummy_env(persistent_filesystem=False, task_id="ephemeral-task")
|
||||
assert env._container_id
|
||||
container_id = env._container_id
|
||||
assert container_id
|
||||
|
||||
# Capture cleanup-time docker calls (everything before this was init).
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capture(cmd, **kw):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kw))
|
||||
return real_run(cmd, **kw)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capture)
|
||||
env.cleanup()
|
||||
|
||||
# Should have stop and rm calls via Popen
|
||||
stop_cmds = [c for c in popen_cmds if container_id in str(c) and "stop" in str(c)]
|
||||
assert len(stop_cmds) >= 1, f"cleanup() should schedule docker stop for {container_id}"
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and c[0][1:2] == ["stop"]]
|
||||
assert stops, f"cleanup() should docker stop {container_id}; got {cleanup_calls}"
|
||||
|
||||
|
||||
class _FakePopen:
|
||||
@@ -514,3 +532,839 @@ def test_run_as_host_user_warns_and_skips_when_no_posix_ids(monkeypatch, caplog)
|
||||
"does not expose POSIX uid/gid" in rec.getMessage()
|
||||
for rec in caplog.records
|
||||
), "expected a warning when POSIX ids are unavailable"
|
||||
|
||||
|
||||
# ── Docker labels (issue #20561) ──────────────────────────────────
|
||||
|
||||
|
||||
def _run_args_from_calls(calls):
|
||||
"""Pull the argv list passed to the first ``docker run`` invocation."""
|
||||
run_calls = [
|
||||
c for c in calls
|
||||
if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"
|
||||
]
|
||||
assert run_calls, "docker run should have been called"
|
||||
return run_calls[0][0]
|
||||
|
||||
|
||||
def _labels_in_run_args(run_args):
|
||||
"""Return the set of ``key=value`` strings passed via ``--label``."""
|
||||
return {
|
||||
run_args[i + 1]
|
||||
for i, flag in enumerate(run_args[:-1])
|
||||
if flag == "--label"
|
||||
}
|
||||
|
||||
|
||||
def test_run_command_tags_hermes_agent_label(monkeypatch):
|
||||
"""Every container hermes-agent starts must carry the hermes-agent=1 label
|
||||
so the orphan reaper (and external operators) can identify them with a
|
||||
single ``docker ps --filter label=hermes-agent=1`` call. Regression test
|
||||
for issue #20561 — without the label there is no global sweep target."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
calls = _mock_subprocess_run(monkeypatch)
|
||||
|
||||
_make_dummy_env(task_id="my-task")
|
||||
|
||||
labels = _labels_in_run_args(_run_args_from_calls(calls))
|
||||
assert "hermes-agent=1" in labels, (
|
||||
f"hermes-agent=1 label missing; got labels: {sorted(labels)}"
|
||||
)
|
||||
|
||||
|
||||
def test_run_command_tags_task_and_profile_labels(monkeypatch):
|
||||
"""task_id and the active profile name are surfaced as labels so future
|
||||
cross-process reuse logic can filter to a specific (task, profile) pair
|
||||
without parsing container names. Profile resolution uses the helper that
|
||||
returns ``"default"`` for the root Hermes home."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "research-bot")
|
||||
calls = _mock_subprocess_run(monkeypatch)
|
||||
|
||||
_make_dummy_env(task_id="kanban-42")
|
||||
|
||||
labels = _labels_in_run_args(_run_args_from_calls(calls))
|
||||
assert "hermes-task-id=kanban-42" in labels, (
|
||||
f"hermes-task-id=kanban-42 missing; got: {sorted(labels)}"
|
||||
)
|
||||
assert "hermes-profile=research-bot" in labels, (
|
||||
f"hermes-profile=research-bot missing; got: {sorted(labels)}"
|
||||
)
|
||||
|
||||
|
||||
def test_label_sanitizer_rejects_invalid_characters():
|
||||
"""Docker label values must be alnum + ``_.-`` and ≤63 chars. Profile or
|
||||
task names containing slashes, colons, or unicode would otherwise emit
|
||||
invalid labels that round-trip badly through ``docker ps --filter``."""
|
||||
assert docker_env._sanitize_label_value("plain-name_1.0") == "plain-name_1.0"
|
||||
assert docker_env._sanitize_label_value("with/slash") == "with_slash"
|
||||
assert docker_env._sanitize_label_value("with:colon") == "with_colon"
|
||||
assert docker_env._sanitize_label_value("emoji-😀-here") == "emoji-_-here"
|
||||
# Empty / non-string inputs must collapse to a queryable token, not "".
|
||||
assert docker_env._sanitize_label_value("") == "unknown"
|
||||
assert docker_env._sanitize_label_value(None) == "unknown" # type: ignore[arg-type]
|
||||
# >63 chars must truncate, not error.
|
||||
long_value = "x" * 100
|
||||
assert len(docker_env._sanitize_label_value(long_value)) == 63
|
||||
|
||||
|
||||
def test_run_command_sanitizes_unsafe_task_id(monkeypatch):
|
||||
"""A task_id containing characters Docker rejects in label values must be
|
||||
sanitized before reaching ``docker run --label``; otherwise the daemon
|
||||
refuses the run with an inscrutable error and the agent's first command
|
||||
blows up."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
calls = _mock_subprocess_run(monkeypatch)
|
||||
|
||||
_make_dummy_env(task_id="task/with:weird*chars")
|
||||
|
||||
labels = _labels_in_run_args(_run_args_from_calls(calls))
|
||||
# Each non-OK character becomes an underscore; the safe chars survive.
|
||||
assert "hermes-task-id=task_with_weird_chars" in labels, (
|
||||
f"sanitized task-id label missing; got: {sorted(labels)}"
|
||||
)
|
||||
|
||||
|
||||
def test_labels_attribute_populated_after_init(monkeypatch):
|
||||
"""``self._labels`` must be set to the same key/value pairs that went onto
|
||||
docker run, so subsequent reuse / reaper paths can match without re-running
|
||||
the sanitizer or re-importing the profile module."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
|
||||
env = _make_dummy_env(task_id="abc")
|
||||
|
||||
assert env._labels == {
|
||||
"hermes-agent": "1",
|
||||
"hermes-task-id": "abc",
|
||||
"hermes-profile": "default",
|
||||
}
|
||||
|
||||
|
||||
# ── Cross-process container reuse (issue #20561) ──────────────────
|
||||
|
||||
|
||||
def _mock_subprocess_run_with_reuse(monkeypatch, ps_state: str | None,
|
||||
start_succeeds: bool = True):
|
||||
"""Reuse-aware subprocess.run mock.
|
||||
|
||||
``ps_state`` controls what ``docker ps -a --filter ...`` returns:
|
||||
* ``None`` → no match (empty stdout). Forces a fresh ``docker run``.
|
||||
* ``"running"`` / ``"exited"`` / ... → emit ``CID\\tSTATE`` so the reuse
|
||||
path picks it up. ``"running"`` skips ``docker start``; other states
|
||||
trigger ``docker start`` (which can be forced to fail via
|
||||
``start_succeeds=False``).
|
||||
|
||||
Returns the captured call list so the test can verify which docker
|
||||
commands actually ran.
|
||||
"""
|
||||
calls = []
|
||||
|
||||
def _run(cmd, **kwargs):
|
||||
calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
if isinstance(cmd, list) and len(cmd) >= 2:
|
||||
sub = cmd[1]
|
||||
if sub == "version":
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="Docker version", stderr="")
|
||||
if sub == "ps":
|
||||
if ps_state is None:
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0, stdout=f"reused-cid\t{ps_state}\n", stderr="",
|
||||
)
|
||||
if sub == "start":
|
||||
if not start_succeeds:
|
||||
# Real subprocess.run with check=True raises on non-zero exit;
|
||||
# mirror that so the production code's except clause fires.
|
||||
raise subprocess.CalledProcessError(1, cmd, output="", stderr="no such container")
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="reused-cid\n", stderr="")
|
||||
if sub == "run":
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="fresh-cid\n", stderr="")
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _run)
|
||||
return calls
|
||||
|
||||
|
||||
def test_reuse_attaches_to_running_container_without_docker_run(monkeypatch):
|
||||
"""When a labeled container is already ``running``, the reuse probe
|
||||
must pick it up and skip ``docker run`` entirely. Regression for the
|
||||
issue #20561 root cause: every Hermes process spawning a new container
|
||||
despite docs claiming "ONE long-lived container shared across sessions"."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="running")
|
||||
|
||||
env = _make_dummy_env(task_id="reuse-test")
|
||||
|
||||
# The reuse path must populate _container_id from the ps probe output.
|
||||
assert env._container_id == "reused-cid", (
|
||||
f"expected reused container id, got {env._container_id!r}"
|
||||
)
|
||||
# And it must NOT have run `docker run`.
|
||||
run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
|
||||
assert not run_invocations, (
|
||||
f"docker run should be skipped on reuse, got: {run_invocations}"
|
||||
)
|
||||
# And it must have NOT issued a `docker start` for an already-running container.
|
||||
start_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "start"]
|
||||
assert not start_invocations, (
|
||||
f"docker start should be skipped when container already running, got: {start_invocations}"
|
||||
)
|
||||
|
||||
|
||||
def test_reuse_starts_stopped_container_before_attaching(monkeypatch):
|
||||
"""A labeled container in ``exited`` state must be restarted via
|
||||
``docker start`` before the new Hermes process uses it. Without this
|
||||
step, ``docker exec`` against a stopped container errors out and the
|
||||
first agent command fails opaquely."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="exited")
|
||||
|
||||
env = _make_dummy_env(task_id="reuse-stopped")
|
||||
|
||||
assert env._container_id == "reused-cid"
|
||||
start_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "start"]
|
||||
assert start_invocations, "expected docker start for exited container"
|
||||
run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
|
||||
assert not run_invocations, "should not docker run when reusing an exited container"
|
||||
|
||||
|
||||
def test_reuse_falls_back_to_fresh_run_when_start_fails(monkeypatch):
|
||||
"""If ``docker start`` on the matched container fails (container was
|
||||
removed between probe and start, daemon paused, etc.), the code must
|
||||
silently fall through to a fresh ``docker run`` rather than leaving the
|
||||
user with a broken environment. Defensive recovery — the probe is best-
|
||||
effort, not authoritative."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
calls = _mock_subprocess_run_with_reuse(
|
||||
monkeypatch, ps_state="exited", start_succeeds=False,
|
||||
)
|
||||
|
||||
env = _make_dummy_env(task_id="reuse-broken-start")
|
||||
|
||||
# docker start should be attempted then fail; code falls through to run.
|
||||
assert env._container_id == "fresh-cid", (
|
||||
f"expected fresh container id after fallback, got {env._container_id!r}"
|
||||
)
|
||||
run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
|
||||
assert run_invocations, "fallback to fresh docker run must happen on start failure"
|
||||
|
||||
|
||||
def test_no_reuse_when_persist_across_processes_disabled(monkeypatch):
|
||||
"""Opt-out path: ``persist_across_processes=False`` skips the ps probe
|
||||
entirely and always starts a fresh container, matching the pre-fix
|
||||
behavior for users who want hard per-process isolation."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
# ps_state=running would trigger reuse if the probe ran — assert it doesn't.
|
||||
calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="running")
|
||||
|
||||
env = docker_env.DockerEnvironment(
|
||||
image="python:3.11", cwd="/root", timeout=60,
|
||||
task_id="no-reuse", persist_across_processes=False,
|
||||
)
|
||||
|
||||
# Must NOT have issued docker ps (the probe is gated by the flag).
|
||||
ps_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "ps"]
|
||||
assert not ps_invocations, (
|
||||
f"docker ps probe should be skipped when persist_across_processes=False, got: {ps_invocations}"
|
||||
)
|
||||
# Should have started a fresh container.
|
||||
assert env._container_id == "fresh-cid"
|
||||
|
||||
|
||||
def test_find_reusable_container_prefers_running_over_stopped(monkeypatch):
|
||||
"""When the probe returns multiple matches (shouldn't normally happen,
|
||||
but can after a crash leaves stale duplicates), a ``running`` container
|
||||
is preferred over any stopped one. The duplicate gets reaped later by
|
||||
the orphan reaper; we don't try to be heroic about it here."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
|
||||
def _run(cmd, **kwargs):
|
||||
if isinstance(cmd, list) and len(cmd) >= 2:
|
||||
if cmd[1] == "version":
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="ok", stderr="")
|
||||
if cmd[1] == "ps":
|
||||
# Two matches: stopped first, running second.
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0,
|
||||
stdout="stopped-cid\texited\nrunning-cid\trunning\n",
|
||||
stderr="",
|
||||
)
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="fresh-cid\n", stderr="")
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _run)
|
||||
|
||||
env = _make_dummy_env(task_id="dup-match")
|
||||
assert env._container_id == "running-cid", (
|
||||
f"running container should win over stopped duplicate, got {env._container_id!r}"
|
||||
)
|
||||
|
||||
|
||||
# ── Cleanup correctness (issue #20561) ────────────────────────────
|
||||
|
||||
|
||||
class _FakeThread:
|
||||
"""Stand-in for threading.Thread that captures target/args and calls
|
||||
target() synchronously when .start() runs, so cleanup behavior is
|
||||
observable without actually backgrounding subprocess calls."""
|
||||
|
||||
def __init__(self, target=None, daemon=None, name=None):
|
||||
self._target = target
|
||||
self.daemon = daemon
|
||||
self.name = name
|
||||
self._done = False
|
||||
|
||||
def start(self):
|
||||
if self._target is not None:
|
||||
self._target()
|
||||
self._done = True
|
||||
|
||||
def is_alive(self):
|
||||
return not self._done
|
||||
|
||||
def join(self, timeout=None):
|
||||
self._done = True
|
||||
|
||||
|
||||
def _install_fake_thread(monkeypatch):
|
||||
import threading
|
||||
monkeypatch.setattr(threading, "Thread", _FakeThread)
|
||||
|
||||
|
||||
def test_cleanup_with_persist_is_noop_for_container(monkeypatch):
|
||||
"""``persist_across_processes=True`` (default) cleanup must NEITHER stop
|
||||
NOR remove the container — the docs promise "ONE long-lived container
|
||||
shared across sessions", and any docker stop would kill background
|
||||
processes inside the container (npm watchers, pytest watchers, etc.).
|
||||
|
||||
Resource reclamation in this mode happens via the orphan reaper on next
|
||||
Hermes startup, not on graceful exit. Issue #20561 — the first iteration
|
||||
of this PR did docker stop here, which Ben caught as contradicting the
|
||||
"ONE long-lived container" semantics."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
env = _make_dummy_env(task_id="cleanup-persist", persistent_filesystem=False)
|
||||
# Default persist_across_processes=True.
|
||||
container_id = env._container_id
|
||||
assert container_id
|
||||
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capturing_run(cmd, **kwargs):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
return real_run(cmd, **kwargs)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
|
||||
|
||||
env.cleanup()
|
||||
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
|
||||
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
|
||||
assert not stops, (
|
||||
f"docker stop must NOT be called when persist_across_processes=True; "
|
||||
f"container has to stay running so background processes survive. "
|
||||
f"Got: {stops}"
|
||||
)
|
||||
assert not rms, (
|
||||
f"docker rm must NOT be called when persist_across_processes=True; "
|
||||
f"reuse would be impossible. Got: {rms}"
|
||||
)
|
||||
# The in-process handle must still be cleared so the next __init__
|
||||
# re-probes via labels (and reuses the still-running container).
|
||||
assert env._container_id is None, (
|
||||
"in-process container_id should be cleared even in no-op cleanup"
|
||||
)
|
||||
|
||||
|
||||
def test_cleanup_force_remove_stops_and_rms_even_in_persist_mode(monkeypatch):
|
||||
"""``cleanup(force_remove=True)`` must stop AND rm the container even
|
||||
when ``persist_across_processes=True``. This is the explicit-teardown
|
||||
path for ``/reset``, ``cleanup_vm(task_id, force_remove=True)``, and any
|
||||
future caller that wants a guaranteed fresh container.
|
||||
|
||||
Without this kwarg, callers in persist mode would have no way to force a
|
||||
fresh container without also flipping the global config — too coarse for
|
||||
a per-task reset.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
env = _make_dummy_env(task_id="cleanup-force", persistent_filesystem=False)
|
||||
assert env._container_id
|
||||
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capturing_run(cmd, **kwargs):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
return real_run(cmd, **kwargs)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
|
||||
|
||||
env.cleanup(force_remove=True)
|
||||
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
|
||||
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
|
||||
assert stops, f"force_remove must docker stop; got: {cleanup_calls}"
|
||||
assert rms, f"force_remove must docker rm; got: {cleanup_calls}"
|
||||
|
||||
|
||||
def test_cleanup_vm_default_honors_persist_mode(monkeypatch):
|
||||
"""``cleanup_vm(task_id)`` without ``force_remove=True`` must be a no-op
|
||||
for a persist-mode container.
|
||||
|
||||
Regression for the bug Ben caught after commit 4: ``AIAgent.close()``
|
||||
(which is called from ``tui_gateway/server.py`` on session.close, from
|
||||
``gateway/run.py`` on per-session teardown, and from per-turn cleanup)
|
||||
calls ``cleanup_vm(task_id)``. If that defaulted to ``force_remove=True``
|
||||
we'd tear down the container on every TUI session close, defeating the
|
||||
"ONE long-lived container shared across sessions" contract.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
from tools import terminal_tool
|
||||
|
||||
env = _make_dummy_env(task_id="session-close-test")
|
||||
container_id = env._container_id
|
||||
terminal_tool._active_environments["session-close-test"] = env
|
||||
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capturing_run(cmd, **kwargs):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
return real_run(cmd, **kwargs)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
|
||||
|
||||
try:
|
||||
terminal_tool.cleanup_vm("session-close-test")
|
||||
finally:
|
||||
terminal_tool._active_environments.pop("session-close-test", None)
|
||||
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
|
||||
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
|
||||
assert not stops, (
|
||||
f"cleanup_vm() default must not docker stop a persist-mode container; "
|
||||
f"got: {stops}"
|
||||
)
|
||||
assert not rms, (
|
||||
f"cleanup_vm() default must not docker rm a persist-mode container; "
|
||||
f"got: {rms}"
|
||||
)
|
||||
|
||||
|
||||
def test_cleanup_vm_force_remove_tears_down_persist_container(monkeypatch):
|
||||
"""``cleanup_vm(task_id, force_remove=True)`` tears down a persist-mode
|
||||
container — the explicit-teardown path for ``/reset``-style flows.
|
||||
|
||||
Also pins the runtime-signature-inspection plumbing: the kwarg must
|
||||
actually flow through ``cleanup_vm`` into the backend's ``cleanup()``.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
from tools import terminal_tool
|
||||
|
||||
env = _make_dummy_env(task_id="explicit-teardown-test")
|
||||
terminal_tool._active_environments["explicit-teardown-test"] = env
|
||||
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capturing_run(cmd, **kwargs):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
return real_run(cmd, **kwargs)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
|
||||
|
||||
try:
|
||||
terminal_tool.cleanup_vm("explicit-teardown-test", force_remove=True)
|
||||
finally:
|
||||
terminal_tool._active_environments.pop("explicit-teardown-test", None)
|
||||
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
|
||||
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
|
||||
assert stops, f"force_remove must reach docker stop; got: {cleanup_calls}"
|
||||
assert rms, f"force_remove must reach docker rm; got: {cleanup_calls}"
|
||||
|
||||
|
||||
def test_cleanup_with_persist_disabled_stops_and_rms(monkeypatch):
|
||||
"""``persist_across_processes=False`` cleanup must docker stop AND docker
|
||||
rm so containers don't leak. Crucially, this runs regardless of the
|
||||
``persistent_filesystem`` setting — the original code only rm'd when
|
||||
``not self._persistent``, which meant the default-on ``container_persistent:
|
||||
true`` users (the documented happy path) leaked Exited containers forever.
|
||||
Issue #20561 root-cause fix."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
# Note: persistent_filesystem=True (the prior-leak scenario) + the new
|
||||
# cross-process toggle OFF must still result in a clean rm.
|
||||
env = docker_env.DockerEnvironment(
|
||||
image="python:3.11", cwd="/root", timeout=60,
|
||||
task_id="cleanup-no-persist", persistent_filesystem=True,
|
||||
persist_across_processes=False,
|
||||
)
|
||||
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capturing_run(cmd, **kwargs):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
return real_run(cmd, **kwargs)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
|
||||
|
||||
env.cleanup()
|
||||
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
|
||||
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
|
||||
assert stops, "expected docker stop"
|
||||
assert rms, (
|
||||
"docker rm MUST run when persist_across_processes=False, even with "
|
||||
"persistent_filesystem=True — that gating was the leak source in #20561."
|
||||
)
|
||||
|
||||
|
||||
def test_cleanup_uses_subprocess_run_not_detached_shell(monkeypatch):
|
||||
"""The pre-fix code used ``subprocess.Popen("... &", shell=True)`` which
|
||||
raced with parent-process exit and silently dropped cleanup work. The
|
||||
new code must use ``subprocess.run`` with bounded ``timeout=`` so the
|
||||
work actually completes within the process lifetime.
|
||||
|
||||
Asserts cleanup never reaches into shell-mode Popen. Uses
|
||||
``force_remove=True`` so cleanup actually issues docker calls — the
|
||||
default persist-mode path is now a no-op (commit 4) and would trivially
|
||||
pass this assertion without exercising the docker code at all.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
def _forbidden_popen(*args, **kwargs):
|
||||
raise AssertionError(
|
||||
f"cleanup must not use subprocess.Popen anymore (issue #20561); "
|
||||
f"got args={args} kwargs={kwargs}"
|
||||
)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "Popen", _forbidden_popen)
|
||||
|
||||
env = _make_dummy_env(task_id="no-popen-cleanup")
|
||||
env.cleanup(force_remove=True) # must not raise
|
||||
|
||||
|
||||
def test_wait_for_cleanup_returns_true_when_no_thread_started():
|
||||
"""``wait_for_cleanup`` must be a no-op when ``cleanup`` was never called
|
||||
(or the env has no live cleanup thread) — atexit calls it unconditionally
|
||||
across all active envs, so a False return would falsely flag healthy
|
||||
shutdowns."""
|
||||
env = docker_env.DockerEnvironment.__new__(docker_env.DockerEnvironment)
|
||||
# No _cleanup_thread set — simulates an env that was never cleanup()'d.
|
||||
assert env.wait_for_cleanup(timeout=1.0) is True
|
||||
|
||||
|
||||
def test_wait_for_cleanup_after_cleanup_returns_true(monkeypatch):
|
||||
"""End-to-end: cleanup() starts a thread, wait_for_cleanup() joins it
|
||||
and reports completion. Atexit relies on this contract to ensure docker
|
||||
stop/rm actually finishes before the Python interpreter exits.
|
||||
|
||||
Uses ``force_remove=True`` so cleanup actually starts a worker thread —
|
||||
the default persist-mode cleanup is a no-op (commit 4) and never spawns
|
||||
a thread, so the trivial "no thread" branch of wait_for_cleanup is
|
||||
already covered by the previous test.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
env = _make_dummy_env(task_id="wait-test")
|
||||
env.cleanup(force_remove=True)
|
||||
assert env.wait_for_cleanup(timeout=5.0) is True
|
||||
|
||||
|
||||
def test_cleanup_on_env_with_no_container_id_does_not_raise(monkeypatch):
|
||||
"""A DockerEnvironment whose ``__init__`` failed before the container_id
|
||||
was set (image-pull error, docker daemon down) should still be safe to
|
||||
cleanup() — the post-creation failure path in callers always tries.
|
||||
Without this guard the daemon-down case used to NameError on the cleanup
|
||||
branch."""
|
||||
env = docker_env.DockerEnvironment.__new__(docker_env.DockerEnvironment)
|
||||
env._container_id = None
|
||||
env._persistent = False
|
||||
env._workspace_dir = None
|
||||
env._home_dir = None
|
||||
# No exception expected.
|
||||
env.cleanup()
|
||||
|
||||
|
||||
# ── Orphan reaper (issue #20561) ──────────────────────────────────
|
||||
|
||||
|
||||
def _now_iso(offset_seconds: int = 0) -> str:
|
||||
"""Return an RFC3339 timestamp ``offset_seconds`` in the past."""
|
||||
import datetime
|
||||
t = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(seconds=offset_seconds)
|
||||
# Format like Docker emits — with nanoseconds-style trailing digits.
|
||||
return t.isoformat().replace("+00:00", ".123456789Z")
|
||||
|
||||
|
||||
def _reaper_run_mock(monkeypatch, ps_ids: list[str], inspect_responses: dict[str, str],
|
||||
rm_succeeds: bool = True):
|
||||
"""Build a subprocess.run mock for reaper tests.
|
||||
|
||||
* ``ps_ids`` — what ``docker ps -a --filter ... --format '{{.ID}}'`` returns
|
||||
* ``inspect_responses[cid]`` — what ``docker inspect ... FinishedAt`` returns
|
||||
for each cid; ``""`` means "field unset".
|
||||
* ``rm_succeeds`` — whether ``docker rm -f`` returns 0.
|
||||
|
||||
Captures every call so tests can assert which containers were rm'd.
|
||||
"""
|
||||
calls = []
|
||||
|
||||
def _run(cmd, **kwargs):
|
||||
calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
if not isinstance(cmd, list) or len(cmd) < 2:
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
sub = cmd[1]
|
||||
if sub == "ps":
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0, stdout="\n".join(ps_ids) + ("\n" if ps_ids else ""), stderr="",
|
||||
)
|
||||
if sub == "inspect":
|
||||
# cmd is [docker, inspect, --format, '{{.State.FinishedAt}}', cid]
|
||||
cid = cmd[-1]
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0, stdout=inspect_responses.get(cid, "") + "\n", stderr="",
|
||||
)
|
||||
if sub == "rm":
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0 if rm_succeeds else 1,
|
||||
stdout="", stderr="" if rm_succeeds else "no such container",
|
||||
)
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _run)
|
||||
return calls
|
||||
|
||||
|
||||
def test_reap_orphan_returns_zero_when_no_matches(monkeypatch):
|
||||
"""No labeled containers → no rm calls, returns 0. Establishes the
|
||||
happy-path baseline for the orphan reaper (issue #20561)."""
|
||||
calls = _reaper_run_mock(monkeypatch, ps_ids=[], inspect_responses={})
|
||||
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
assert removed == 0
|
||||
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
|
||||
assert not rms, "no rm calls expected when ps returns empty"
|
||||
|
||||
|
||||
def test_reap_orphan_removes_stale_exited_container(monkeypatch):
|
||||
"""An Exited container older than max_age_seconds must be removed.
|
||||
This is the core repair path for issue #20561 — without the reaper,
|
||||
SIGKILL'd Hermes processes leak containers permanently."""
|
||||
old = _now_iso(offset_seconds=900) # 15 minutes ago
|
||||
calls = _reaper_run_mock(
|
||||
monkeypatch, ps_ids=["old-cid"], inspect_responses={"old-cid": old},
|
||||
)
|
||||
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
assert removed == 1
|
||||
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
|
||||
assert len(rms) == 1
|
||||
assert "old-cid" in rms[0][0], f"expected rm of old-cid, got {rms[0][0]}"
|
||||
|
||||
|
||||
def test_reap_orphan_spares_recently_exited_container(monkeypatch):
|
||||
"""A container exited within max_age_seconds must NOT be reaped — that
|
||||
container belongs to a Hermes process that just finished and may be
|
||||
about to be replaced. Conservative window prevents racing sibling
|
||||
processes."""
|
||||
recent = _now_iso(offset_seconds=60) # 1 minute ago
|
||||
calls = _reaper_run_mock(
|
||||
monkeypatch, ps_ids=["recent-cid"], inspect_responses={"recent-cid": recent},
|
||||
)
|
||||
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
assert removed == 0
|
||||
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
|
||||
assert not rms, f"recent container must not be reaped, got rm calls: {rms}"
|
||||
|
||||
|
||||
def test_reap_orphan_scopes_to_profile_filter_via_label(monkeypatch):
|
||||
"""The reaper must pass ``--filter label=hermes-profile=<profile>`` to
|
||||
docker ps so it never sweeps another profile's containers. A research
|
||||
profile must not tear down the default profile's stragglers."""
|
||||
calls = _reaper_run_mock(monkeypatch, ps_ids=[], inspect_responses={})
|
||||
|
||||
docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="research-bot", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
ps_calls = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["ps"]]
|
||||
assert ps_calls, "expected at least one docker ps call"
|
||||
flat = " ".join(ps_calls[0][0])
|
||||
assert "label=hermes-profile=research-bot" in flat, (
|
||||
f"profile filter not applied to docker ps; got args: {ps_calls[0][0]}"
|
||||
)
|
||||
assert "label=hermes-agent=1" in flat, (
|
||||
f"hermes-agent label filter must also be applied; got: {ps_calls[0][0]}"
|
||||
)
|
||||
assert "status=exited" in flat, (
|
||||
"must filter to exited containers only — running containers may "
|
||||
"belong to a sibling Hermes process and must NEVER be reaped"
|
||||
)
|
||||
|
||||
|
||||
def test_reap_orphan_skips_container_with_unparseable_finished_at(monkeypatch):
|
||||
"""If docker inspect returns the zero-value ``0001-01-01T00:00:00Z`` (no
|
||||
FinishedAt yet) or an unparseable timestamp, the reaper must leave the
|
||||
container alone. Defensive — never reap a container whose age we can't
|
||||
determine."""
|
||||
calls = _reaper_run_mock(
|
||||
monkeypatch,
|
||||
ps_ids=["never-finished", "garbage-ts"],
|
||||
inspect_responses={
|
||||
"never-finished": "0001-01-01T00:00:00Z",
|
||||
"garbage-ts": "not-a-timestamp",
|
||||
},
|
||||
)
|
||||
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
assert removed == 0
|
||||
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
|
||||
assert not rms, (
|
||||
f"reaper must NOT remove containers with unparseable FinishedAt; got: {rms}"
|
||||
)
|
||||
|
||||
|
||||
def test_reap_orphan_handles_docker_ps_failure_gracefully(monkeypatch):
|
||||
"""If docker ps itself fails (daemon down, permission denied), the
|
||||
reaper returns 0 without crashing. The reaper is best-effort plumbing,
|
||||
not a critical path — it must never block container creation."""
|
||||
def _failing_ps(cmd, **kwargs):
|
||||
if isinstance(cmd, list) and len(cmd) >= 2 and cmd[1] == "ps":
|
||||
return subprocess.CompletedProcess(cmd, 1, stdout="", stderr="Cannot connect to daemon")
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _failing_ps)
|
||||
|
||||
# Must not raise
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
assert removed == 0
|
||||
|
||||
|
||||
def test_reap_orphan_continues_after_individual_rm_failure(monkeypatch):
|
||||
"""If ``docker rm -f`` fails on one container (already removed by a
|
||||
concurrent process, container locked, etc.), the reaper must log and
|
||||
continue to the next candidate rather than aborting the whole sweep."""
|
||||
old = _now_iso(offset_seconds=900)
|
||||
rm_calls = []
|
||||
|
||||
def _run(cmd, **kwargs):
|
||||
if not isinstance(cmd, list) or len(cmd) < 2:
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
sub = cmd[1]
|
||||
if sub == "ps":
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0, stdout="cid-a\ncid-b\ncid-c\n", stderr="",
|
||||
)
|
||||
if sub == "inspect":
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout=old + "\n", stderr="")
|
||||
if sub == "rm":
|
||||
rm_calls.append(cmd[-1])
|
||||
# cid-b fails; cid-a and cid-c succeed.
|
||||
if cmd[-1] == "cid-b":
|
||||
return subprocess.CompletedProcess(cmd, 1, stdout="", stderr="no such container")
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _run)
|
||||
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
# All three were attempted, two succeeded.
|
||||
assert removed == 2
|
||||
assert set(rm_calls) == {"cid-a", "cid-b", "cid-c"}, (
|
||||
f"reaper must attempt all candidates even when one fails; got: {rm_calls}"
|
||||
)
|
||||
|
||||
|
||||
def test_container_finished_at_parses_nanosecond_timestamp(monkeypatch):
|
||||
"""Docker emits FinishedAt with nanosecond precision (RFC3339 with up to
|
||||
9 fractional digits), but Python's fromisoformat caps at microseconds.
|
||||
The helper must trim the extra digits without raising — otherwise every
|
||||
candidate gets skipped and the reaper does nothing."""
|
||||
|
||||
def _run(cmd, **kwargs):
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0,
|
||||
stdout="2026-05-28T13:45:00.123456789Z\n",
|
||||
stderr="",
|
||||
)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _run)
|
||||
|
||||
result = docker_env._container_finished_at("/usr/bin/docker", "test-cid")
|
||||
assert result is not None, "must parse RFC3339 with nanoseconds"
|
||||
import datetime
|
||||
assert result.tzinfo == datetime.timezone.utc
|
||||
assert result.year == 2026 and result.month == 5 and result.day == 28
|
||||
|
||||
|
||||
def test_container_finished_at_returns_none_on_zero_value():
|
||||
"""Docker's zero-value ``0001-01-01T00:00:00Z`` (never finished) must
|
||||
map to None so the reaper treats the container as unreapable."""
|
||||
# Direct test of the parsing helper — no subprocess needed since the
|
||||
# check happens after the inspect call returns.
|
||||
import subprocess as _subprocess
|
||||
|
||||
class _MockRun:
|
||||
def __init__(self, stdout):
|
||||
self.returncode = 0
|
||||
self.stdout = stdout
|
||||
self.stderr = ""
|
||||
|
||||
import unittest.mock
|
||||
with unittest.mock.patch.object(
|
||||
docker_env.subprocess, "run", return_value=_MockRun("0001-01-01T00:00:00Z\n"),
|
||||
):
|
||||
result = docker_env._container_finished_at("/usr/bin/docker", "never-finished")
|
||||
assert result is None
|
||||
|
||||
@@ -0,0 +1,139 @@
|
||||
"""Integration tests for the docker orphan-reaper wiring in terminal_tool.
|
||||
|
||||
The reaper itself is unit-tested in tests/tools/test_docker_environment.py
|
||||
under the "Orphan reaper" section. These tests cover the terminal_tool-side
|
||||
gates: once-per-process behavior, the disable flag, and the
|
||||
``lifetime_seconds`` doubling that determines the reaper's age threshold.
|
||||
|
||||
Issue #20561 — without these gates, parallel subagents would each fire the
|
||||
reaper on container creation, and the ``terminal.docker_orphan_reaper: false``
|
||||
opt-out would silently do nothing.
|
||||
"""
|
||||
|
||||
import os
|
||||
from unittest.mock import patch
|
||||
|
||||
import tools.terminal_tool as terminal_tool
|
||||
|
||||
|
||||
def _reset_reaper_gate():
|
||||
"""Clear the once-per-process flag between tests."""
|
||||
terminal_tool._docker_orphan_reaper_ran = False
|
||||
|
||||
|
||||
def test_maybe_reap_runs_once_per_process(monkeypatch):
|
||||
"""The reaper sweep must run at most once per Python interpreter.
|
||||
Parallel subagents that each call _create_environment(env_type='docker')
|
||||
would otherwise fire N concurrent docker ps + inspect storms against the
|
||||
daemon and waste 5–10s of startup."""
|
||||
_reset_reaper_gate()
|
||||
call_count = {"reap": 0}
|
||||
|
||||
def _fake_reap(**kwargs):
|
||||
call_count["reap"] += 1
|
||||
return 0
|
||||
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
|
||||
config = {"docker_orphan_reaper": True}
|
||||
terminal_tool._maybe_reap_docker_orphans(config)
|
||||
terminal_tool._maybe_reap_docker_orphans(config)
|
||||
terminal_tool._maybe_reap_docker_orphans(config)
|
||||
|
||||
assert call_count["reap"] == 1, (
|
||||
f"reaper must run exactly once per process; got {call_count['reap']} calls"
|
||||
)
|
||||
|
||||
|
||||
def test_maybe_reap_respects_disable_flag(monkeypatch):
|
||||
"""``terminal.docker_orphan_reaper: false`` (via container_config) must
|
||||
skip the sweep entirely — no docker ps, no inspect, no rm. The escape
|
||||
hatch for operators running multiple Hermes processes in the same
|
||||
profile."""
|
||||
_reset_reaper_gate()
|
||||
call_count = {"reap": 0}
|
||||
|
||||
def _fake_reap(**kwargs):
|
||||
call_count["reap"] += 1
|
||||
return 0
|
||||
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
|
||||
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": False})
|
||||
|
||||
assert call_count["reap"] == 0, "disabled reaper must not run any docker calls"
|
||||
# The once-per-process gate must NOT be tripped when the reaper is
|
||||
# disabled — that would prevent a subsequent toggle to true from working.
|
||||
assert terminal_tool._docker_orphan_reaper_ran is False
|
||||
|
||||
|
||||
def test_maybe_reap_doubles_lifetime_for_max_age(monkeypatch):
|
||||
"""The reaper's age threshold is ``2 × lifetime_seconds`` (with a 60s
|
||||
floor). Generous default — gives sibling Hermes processes ample grace
|
||||
to be replaced without their just-exited containers being yanked."""
|
||||
_reset_reaper_gate()
|
||||
captured_args = {}
|
||||
|
||||
def _fake_reap(**kwargs):
|
||||
captured_args.update(kwargs)
|
||||
return 0
|
||||
|
||||
monkeypatch.setenv("TERMINAL_LIFETIME_SECONDS", "300")
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
|
||||
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
|
||||
|
||||
assert captured_args.get("max_age_seconds") == 600, (
|
||||
f"expected 2 × 300 = 600, got {captured_args.get('max_age_seconds')}"
|
||||
)
|
||||
|
||||
|
||||
def test_maybe_reap_floors_at_60_seconds(monkeypatch):
|
||||
"""A user pinning TERMINAL_LIFETIME_SECONDS=0 (or any value <30) would
|
||||
otherwise get an effective age threshold of zero, which would race the
|
||||
user's own just-started container creation. Floor at 60s × 2 = 120s."""
|
||||
_reset_reaper_gate()
|
||||
captured_args = {}
|
||||
|
||||
def _fake_reap(**kwargs):
|
||||
captured_args.update(kwargs)
|
||||
return 0
|
||||
|
||||
monkeypatch.setenv("TERMINAL_LIFETIME_SECONDS", "0")
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
|
||||
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
|
||||
|
||||
assert captured_args.get("max_age_seconds") == 120, (
|
||||
f"expected floored 60 × 2 = 120, got {captured_args.get('max_age_seconds')}"
|
||||
)
|
||||
|
||||
|
||||
def test_maybe_reap_passes_current_profile_as_filter(monkeypatch):
|
||||
"""The reaper must be scoped to the current Hermes profile — a research
|
||||
profile must NEVER reap default's containers. Verifies the
|
||||
profile-filter wiring."""
|
||||
_reset_reaper_gate()
|
||||
captured_args = {}
|
||||
|
||||
def _fake_reap(**kwargs):
|
||||
captured_args.update(kwargs)
|
||||
return 0
|
||||
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap), \
|
||||
patch("tools.environments.docker._get_active_profile_name", return_value="research-bot"):
|
||||
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
|
||||
|
||||
assert captured_args.get("profile_filter") == "research-bot", (
|
||||
f"expected profile_filter='research-bot', got {captured_args.get('profile_filter')!r}"
|
||||
)
|
||||
|
||||
|
||||
def test_maybe_reap_swallows_exceptions(monkeypatch):
|
||||
"""A reaper crash (docker daemon down, parse error in helper) must NOT
|
||||
block env creation. The reaper is best-effort plumbing, not a critical
|
||||
path; failures get logged at debug level and execution continues."""
|
||||
_reset_reaper_gate()
|
||||
|
||||
def _exploding_reap(**kwargs):
|
||||
raise RuntimeError("docker daemon ate the cat")
|
||||
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _exploding_reap):
|
||||
# Must not raise
|
||||
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
|
||||
@@ -224,3 +224,39 @@ def test_docker_env_is_bridged_everywhere():
|
||||
assert "docker_env" in _gateway_env_map_keys()
|
||||
assert "docker_env" in _save_config_env_sync_keys()
|
||||
assert "TERMINAL_DOCKER_ENV" in _terminal_tool_env_var_names()
|
||||
|
||||
|
||||
def test_docker_persist_across_processes_is_bridged_everywhere():
|
||||
"""Regression pin for the cross-process container reuse toggle.
|
||||
|
||||
``terminal.docker_persist_across_processes`` (issue #20561) controls
|
||||
whether ``DockerEnvironment.__init__`` probes for and reuses an existing
|
||||
labeled container at startup, and whether ``cleanup()`` removes the
|
||||
container on Hermes exit or just stops it (keeping it for the next
|
||||
process). Same four-bridge invariant as docker_run_as_host_user /
|
||||
docker_env / docker_mount_cwd_to_workspace — drift between any of the
|
||||
four sites means ``terminal.docker_persist_across_processes: false`` in
|
||||
config.yaml silently does nothing for that entry point, leaving the
|
||||
user unable to opt out of the documented "ONE long-lived container
|
||||
shared across sessions" behavior.
|
||||
"""
|
||||
assert "docker_persist_across_processes" in _cli_env_map_keys()
|
||||
assert "docker_persist_across_processes" in _gateway_env_map_keys()
|
||||
assert "docker_persist_across_processes" in _save_config_env_sync_keys()
|
||||
assert "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES" in _terminal_tool_env_var_names()
|
||||
|
||||
|
||||
def test_docker_orphan_reaper_is_bridged_everywhere():
|
||||
"""Regression pin for the startup orphan reaper toggle (issue #20561).
|
||||
|
||||
``terminal.docker_orphan_reaper`` controls whether Hermes sweeps stale
|
||||
Exited containers from prior SIGKILL'd processes at startup. Same
|
||||
four-site bridge invariant — drift means
|
||||
``terminal.docker_orphan_reaper: false`` silently does nothing for one
|
||||
entry point, and the reaper either runs when the operator disabled it
|
||||
or fails to run when they enabled it.
|
||||
"""
|
||||
assert "docker_orphan_reaper" in _cli_env_map_keys()
|
||||
assert "docker_orphan_reaper" in _gateway_env_map_keys()
|
||||
assert "docker_orphan_reaper" in _save_config_env_sync_keys()
|
||||
assert "TERMINAL_DOCKER_ORPHAN_REAPER" in _terminal_tool_env_var_names()
|
||||
|
||||
@@ -44,11 +44,17 @@ def server(hermes_home):
|
||||
):
|
||||
mod = importlib.import_module("tui_gateway.server")
|
||||
yield mod
|
||||
# Reset module-level session state without re-importing. importlib.reload
|
||||
# would re-register the module's atexit hooks (ThreadPoolExecutor
|
||||
# shutdown, _shutdown_sessions); the duplicates race the stderr
|
||||
# buffer at interpreter shutdown and surface as Fatal Python error:
|
||||
# _enter_buffered_busy. Clearing the per-session dicts gives the
|
||||
# next test a clean slate; _methods is NOT cleared because it's
|
||||
# populated at module import time and re-registration only happens
|
||||
# via reload (which we don't do).
|
||||
mod._sessions.clear()
|
||||
mod._pending.clear()
|
||||
mod._answers.clear()
|
||||
mod._methods.clear()
|
||||
importlib.reload(mod)
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
|
||||
@@ -30,11 +30,17 @@ def server():
|
||||
import importlib
|
||||
mod = importlib.import_module("tui_gateway.server")
|
||||
yield mod
|
||||
# Reset module-level session state without re-importing. importlib.reload
|
||||
# would re-register the module's atexit hooks (ThreadPoolExecutor
|
||||
# shutdown, _shutdown_sessions); the duplicates race the stderr
|
||||
# buffer at interpreter shutdown and surface as Fatal Python error:
|
||||
# _enter_buffered_busy. Clearing the per-session dicts gives the
|
||||
# next test a clean slate; _methods is NOT cleared because it's
|
||||
# populated at module import time and re-registration only happens
|
||||
# via reload (which we don't do).
|
||||
mod._sessions.clear()
|
||||
mod._pending.clear()
|
||||
mod._answers.clear()
|
||||
mod._methods.clear()
|
||||
importlib.reload(mod)
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
|
||||
@@ -34,11 +34,17 @@ def server():
|
||||
|
||||
mod = importlib.import_module("tui_gateway.server")
|
||||
yield mod
|
||||
# Reset module-level session state without re-importing. importlib.reload
|
||||
# would re-register the module's atexit hooks (ThreadPoolExecutor
|
||||
# shutdown, _shutdown_sessions); the duplicates race the stderr
|
||||
# buffer at interpreter shutdown and surface as Fatal Python error:
|
||||
# _enter_buffered_busy. Clearing the per-session dicts gives the
|
||||
# next test a clean slate; _methods is NOT cleared because it's
|
||||
# populated at module import time and re-registration only happens
|
||||
# via reload (which we don't do).
|
||||
mod._sessions.clear()
|
||||
mod._pending.clear()
|
||||
mod._answers.clear()
|
||||
mod._methods.clear()
|
||||
importlib.reload(mod)
|
||||
|
||||
|
||||
def test_init_session_attaches_background_review_callback(server, monkeypatch):
|
||||
|
||||
+427
-40
@@ -98,6 +98,167 @@ def _load_hermes_env_vars() -> dict[str, str]:
|
||||
return {}
|
||||
|
||||
|
||||
# Docker label values must match [a-zA-Z0-9_.-] and stay ≤63 chars to round-trip
|
||||
# safely through `docker ps --filter label=key=value`. Profile and task names
|
||||
# can technically contain other characters; sanitize defensively.
|
||||
_LABEL_VALUE_OK_RE = re.compile(r"[^A-Za-z0-9_.-]")
|
||||
|
||||
|
||||
def _sanitize_label_value(value: str) -> str:
|
||||
"""Coerce *value* into a Docker label-safe form (alnum + ``_.-``, ≤63 chars).
|
||||
|
||||
Empty or all-invalid inputs collapse to ``"unknown"`` so the resulting
|
||||
label is always queryable. Used at container-create time; never round-trip
|
||||
a sanitized value back into application logic.
|
||||
"""
|
||||
if not isinstance(value, str) or not value:
|
||||
return "unknown"
|
||||
cleaned = _LABEL_VALUE_OK_RE.sub("_", value)
|
||||
cleaned = cleaned[:63] or "unknown"
|
||||
return cleaned
|
||||
|
||||
|
||||
def _get_active_profile_name() -> str:
|
||||
"""Return the active Hermes profile name, or ``"default"`` on any error.
|
||||
|
||||
Resolved at container-create time so a single container is permanently
|
||||
tagged with the profile that created it. Profile switches inside the
|
||||
same process don't retroactively relabel running containers.
|
||||
"""
|
||||
try:
|
||||
from hermes_cli.profiles import get_active_profile_name
|
||||
|
||||
return get_active_profile_name() or "default"
|
||||
except Exception:
|
||||
return "default"
|
||||
|
||||
|
||||
def reap_orphan_containers(
|
||||
*,
|
||||
max_age_seconds: int = 600,
|
||||
profile_filter: str | None = None,
|
||||
docker_exe: str | None = None,
|
||||
) -> int:
|
||||
"""Remove stale hermes-tagged containers left behind by prior processes.
|
||||
|
||||
Targets containers that match all of:
|
||||
|
||||
* ``label=hermes-agent=1`` (created by this codebase)
|
||||
* ``status=exited`` (running containers are NEVER reaped — they may
|
||||
belong to a sibling Hermes process whose reuse path will pick them
|
||||
up; killing them would crash the sibling mid-command)
|
||||
* (optional) ``label=hermes-profile=<profile_filter>`` (sweep only the
|
||||
caller's profile by default; a hermes process in profile A must not
|
||||
tear down profile B's containers)
|
||||
* ``State.FinishedAt`` older than *max_age_seconds* ago (so a sibling
|
||||
process that just exited and is about to be replaced doesn't get
|
||||
its container yanked out from under it)
|
||||
|
||||
Returns the number of containers removed. Best-effort: any failure
|
||||
(docker daemon unreachable, slow inspect, parse error) is logged at
|
||||
debug level and the function returns whatever it managed before the
|
||||
failure. Safe to call repeatedly; idempotent.
|
||||
|
||||
Issue #20561 — this is the safety net for SIGKILL / OOM / crashed
|
||||
terminal exits that bypass the ``atexit`` cleanup hook. Without it,
|
||||
even with the cleanup-fix in the prior commit, a hard-killed Hermes
|
||||
process leaves its container behind permanently because there's no
|
||||
subsequent Hermes process scheduled to reuse that exact (task, profile)
|
||||
pair.
|
||||
"""
|
||||
docker = docker_exe or find_docker() or "docker"
|
||||
filters = ["--filter", "label=hermes-agent=1", "--filter", "status=exited"]
|
||||
if profile_filter:
|
||||
filters.extend(["--filter", f"label=hermes-profile={_sanitize_label_value(profile_filter)}"])
|
||||
|
||||
try:
|
||||
listing = subprocess.run(
|
||||
[docker, "ps", "-a", *filters, "--format", "{{.ID}}"],
|
||||
capture_output=True, text=True, timeout=15, check=False,
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.debug("orphan reaper docker ps failed: %s", e)
|
||||
return 0
|
||||
if listing.returncode != 0:
|
||||
logger.debug(
|
||||
"orphan reaper docker ps returned %d: %s",
|
||||
listing.returncode, listing.stderr.strip(),
|
||||
)
|
||||
return 0
|
||||
|
||||
candidate_ids = [ln.strip() for ln in listing.stdout.splitlines() if ln.strip()]
|
||||
if not candidate_ids:
|
||||
return 0
|
||||
|
||||
# Inspect each candidate to get FinishedAt; reap only those exited
|
||||
# long enough ago. Doing this per-container (rather than bulk inspect)
|
||||
# keeps the failure blast radius to one container at a time.
|
||||
import datetime
|
||||
now = datetime.datetime.now(datetime.timezone.utc)
|
||||
removed = 0
|
||||
for cid in candidate_ids:
|
||||
finished_at = _container_finished_at(docker, cid)
|
||||
if finished_at is None:
|
||||
# Couldn't determine age — be conservative and leave it alone.
|
||||
continue
|
||||
age = (now - finished_at).total_seconds()
|
||||
if age < max_age_seconds:
|
||||
continue
|
||||
try:
|
||||
result = subprocess.run(
|
||||
[docker, "rm", "-f", cid],
|
||||
capture_output=True, text=True, timeout=30,
|
||||
)
|
||||
if result.returncode == 0:
|
||||
removed += 1
|
||||
logger.info(
|
||||
"Reaped orphan container %s (exited %d seconds ago)",
|
||||
cid[:12], int(age),
|
||||
)
|
||||
else:
|
||||
logger.debug(
|
||||
"docker rm -f %s failed: %s",
|
||||
cid[:12], result.stderr.strip(),
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.debug("orphan reaper docker rm %s failed: %s", cid[:12], e)
|
||||
return removed
|
||||
|
||||
|
||||
def _container_finished_at(docker_exe: str, container_id: str):
|
||||
"""Parse ``docker inspect`` FinishedAt for *container_id*.
|
||||
|
||||
Returns a timezone-aware datetime, or ``None`` if the field is missing,
|
||||
unparseable, or the zero-value ``0001-01-01T00:00:00Z`` Docker emits
|
||||
for never-finished containers. ``None`` means "don't reap" — the caller
|
||||
leaves the container alone.
|
||||
"""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
[docker_exe, "inspect", "--format", "{{.State.FinishedAt}}", container_id],
|
||||
capture_output=True, text=True, timeout=10, check=False,
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.debug("orphan reaper docker inspect %s failed: %s", container_id[:12], e)
|
||||
return None
|
||||
if result.returncode != 0:
|
||||
return None
|
||||
raw = result.stdout.strip()
|
||||
if not raw or raw.startswith("0001-01-01"):
|
||||
return None
|
||||
# Docker emits RFC3339 with nanoseconds (e.g. "2026-05-28T13:45:00.123456789Z").
|
||||
# Python's fromisoformat handles microseconds but not nanoseconds; trim.
|
||||
import re as _re
|
||||
raw = _re.sub(r"(\.\d{6})\d+", r"\1", raw)
|
||||
raw = raw.replace("Z", "+00:00")
|
||||
try:
|
||||
import datetime
|
||||
return datetime.datetime.fromisoformat(raw)
|
||||
except ValueError as e:
|
||||
logger.debug("could not parse FinishedAt %r for %s: %s", raw, container_id[:12], e)
|
||||
return None
|
||||
|
||||
|
||||
def find_docker() -> Optional[str]:
|
||||
"""Locate the docker (or podman) CLI binary.
|
||||
|
||||
@@ -304,15 +465,18 @@ class DockerEnvironment(BaseEnvironment):
|
||||
auto_mount_cwd: bool = False,
|
||||
run_as_host_user: bool = False,
|
||||
extra_args: list = None,
|
||||
persist_across_processes: bool = True,
|
||||
):
|
||||
if cwd == "~":
|
||||
cwd = "/root"
|
||||
super().__init__(cwd=cwd, timeout=timeout)
|
||||
self._persistent = persistent_filesystem
|
||||
self._persist_across_processes = persist_across_processes
|
||||
self._task_id = task_id
|
||||
self._forward_env = _normalize_forward_env_names(forward_env)
|
||||
self._env = _normalize_env_dict(env)
|
||||
self._container_id: Optional[str] = None
|
||||
self._labels: dict[str, str] = {}
|
||||
logger.info(f"DockerEnvironment volumes: {volumes}")
|
||||
# Ensure volumes is a list (config.yaml could be malformed)
|
||||
if volumes is not None and not isinstance(volumes, list):
|
||||
@@ -506,25 +670,88 @@ class DockerEnvironment(BaseEnvironment):
|
||||
|
||||
# Start the container directly via `docker run -d`.
|
||||
container_name = f"hermes-{uuid.uuid4().hex[:8]}"
|
||||
run_cmd = [
|
||||
self._docker_exe, "run", "-d",
|
||||
"--init", # tini/catatonit as PID 1 — reaps zombie children
|
||||
"--name", container_name,
|
||||
"-w", cwd,
|
||||
*all_run_args,
|
||||
image,
|
||||
"sleep", "infinity", # no fixed lifetime — idle reaper handles cleanup
|
||||
# Labels make hermes-created containers identifiable to:
|
||||
# * the orphan reaper (`hermes-agent=1` for the global sweep filter)
|
||||
# * future cross-process reuse (`hermes-task-id`, `hermes-profile`)
|
||||
# * operators running `docker ps --filter label=hermes-agent=1`
|
||||
# Values are limited to the safe character set defined by
|
||||
# _sanitize_label_value(); the active Hermes profile is captured at
|
||||
# container-start time and never changes for the container's lifetime.
|
||||
profile_name = _sanitize_label_value(_get_active_profile_name())
|
||||
task_label = _sanitize_label_value(task_id)
|
||||
label_args = [
|
||||
"--label", "hermes-agent=1",
|
||||
"--label", f"hermes-task-id={task_label}",
|
||||
"--label", f"hermes-profile={profile_name}",
|
||||
]
|
||||
logger.debug(f"Starting container: {' '.join(run_cmd)}")
|
||||
result = subprocess.run(
|
||||
run_cmd,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=120, # image pull may take a while
|
||||
check=True,
|
||||
)
|
||||
self._container_id = result.stdout.strip()
|
||||
logger.info(f"Started container {container_name} ({self._container_id[:12]})")
|
||||
self._labels = {
|
||||
"hermes-agent": "1",
|
||||
"hermes-task-id": task_label,
|
||||
"hermes-profile": profile_name,
|
||||
}
|
||||
|
||||
# Cross-process container reuse (issue #20561 — docs claim "ONE long-lived
|
||||
# container shared across sessions"). If a prior Hermes process
|
||||
# already started a container for this (task_id, profile) and it
|
||||
# still exists, attach to it instead of starting a fresh one. This
|
||||
# restores the documented contract; opt out via
|
||||
# ``terminal.docker_persist_across_processes: false``.
|
||||
#
|
||||
# Reuse matches on labels only — we deliberately do NOT compare image
|
||||
# / mounts / resources. Operators who need a fresh container after
|
||||
# changing those settings should set ``docker_persist_across_processes:
|
||||
# false`` (or run ``docker rm -f`` against the labeled container) to
|
||||
# force a clean start.
|
||||
reused = False
|
||||
if persist_across_processes:
|
||||
existing = self._find_reusable_container(task_label, profile_name)
|
||||
if existing is not None:
|
||||
container_id, state = existing
|
||||
self._container_id = container_id
|
||||
if state != "running":
|
||||
try:
|
||||
subprocess.run(
|
||||
[self._docker_exe, "start", container_id],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=30,
|
||||
check=True,
|
||||
)
|
||||
except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
|
||||
logger.warning(
|
||||
"Failed to start existing container %s (state=%s): "
|
||||
"%s — falling back to a fresh container.",
|
||||
container_id[:12], state, e,
|
||||
)
|
||||
self._container_id = None
|
||||
if self._container_id:
|
||||
logger.info(
|
||||
"Reusing container %s (task=%s, profile=%s, prior state=%s)",
|
||||
container_id[:12], task_label, profile_name, state,
|
||||
)
|
||||
reused = True
|
||||
|
||||
if not reused:
|
||||
run_cmd = [
|
||||
self._docker_exe, "run", "-d",
|
||||
"--init", # tini/catatonit as PID 1 — reaps zombie children
|
||||
"--name", container_name,
|
||||
*label_args,
|
||||
"-w", cwd,
|
||||
*all_run_args,
|
||||
image,
|
||||
"sleep", "infinity", # no fixed lifetime — idle reaper handles cleanup
|
||||
]
|
||||
logger.debug(f"Starting container: {' '.join(run_cmd)}")
|
||||
result = subprocess.run(
|
||||
run_cmd,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=120, # image pull may take a while
|
||||
check=True,
|
||||
)
|
||||
self._container_id = result.stdout.strip()
|
||||
logger.info(f"Started container {container_name} ({self._container_id[:12]})")
|
||||
|
||||
# Build the init-time env forwarding args (used only by init_session
|
||||
# to inject host env vars into the snapshot; subsequent commands get
|
||||
@@ -629,31 +856,191 @@ class DockerEnvironment(BaseEnvironment):
|
||||
logger.debug("Docker --storage-opt support: %s", _storage_opt_ok)
|
||||
return _storage_opt_ok
|
||||
|
||||
def cleanup(self):
|
||||
"""Stop and remove the container. Bind-mount dirs persist if persistent=True."""
|
||||
if self._container_id:
|
||||
try:
|
||||
# Stop in background so cleanup doesn't block
|
||||
stop_cmd = (
|
||||
f"(timeout 60 {self._docker_exe} stop {self._container_id} || "
|
||||
f"{self._docker_exe} rm -f {self._container_id}) >/dev/null 2>&1 &"
|
||||
)
|
||||
subprocess.Popen(stop_cmd, shell=True)
|
||||
except Exception as e:
|
||||
logger.warning("Failed to stop container %s: %s", self._container_id, e)
|
||||
def _find_reusable_container(self, task_label: str, profile_label: str) -> Optional[tuple[str, str]]:
|
||||
"""Look for an existing container labeled for this (task, profile).
|
||||
|
||||
Returns ``(container_id, state)`` on hit, ``None`` on miss / on any
|
||||
failure (including ``docker ps`` itself failing). State is one of the
|
||||
values Docker reports via ``{{.State}}`` — e.g. ``running``, ``exited``,
|
||||
``created``, ``paused``, ``restarting``, ``dead``. The caller decides
|
||||
whether the state warrants ``docker start`` before reuse.
|
||||
|
||||
Restricted to the docker-stored label set this class creates; never
|
||||
matches containers that happened to be named ``hermes-*`` but were
|
||||
started by some other tool.
|
||||
"""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
[
|
||||
self._docker_exe, "ps", "-a",
|
||||
"--filter", "label=hermes-agent=1",
|
||||
"--filter", f"label=hermes-task-id={task_label}",
|
||||
"--filter", f"label=hermes-profile={profile_label}",
|
||||
"--format", "{{.ID}}\t{{.State}}",
|
||||
],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=10,
|
||||
check=False,
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.debug("docker ps probe failed: %s — will start a fresh container", e)
|
||||
return None
|
||||
if result.returncode != 0:
|
||||
logger.debug(
|
||||
"docker ps probe returned %d: %s — will start a fresh container",
|
||||
result.returncode, result.stderr.strip(),
|
||||
)
|
||||
return None
|
||||
lines = [ln.strip() for ln in result.stdout.splitlines() if ln.strip()]
|
||||
if not lines:
|
||||
return None
|
||||
# Multiple matches are unusual (one (task, profile) should produce one
|
||||
# container) but can happen if a previous Hermes process crashed
|
||||
# mid-cleanup. Prefer a running one if present; otherwise pick the
|
||||
# first listed. Stale duplicates get reaped by the orphan-reaper in a
|
||||
# follow-up commit; we don't try to be heroic about them here.
|
||||
running = None
|
||||
first = None
|
||||
for ln in lines:
|
||||
parts = ln.split("\t", 1)
|
||||
if len(parts) != 2:
|
||||
continue
|
||||
cid, state = parts[0], parts[1].lower()
|
||||
if first is None:
|
||||
first = (cid, state)
|
||||
if state == "running" and running is None:
|
||||
running = (cid, state)
|
||||
return running or first
|
||||
|
||||
def cleanup(self, *, force_remove: bool = False):
|
||||
"""Tear down the container according to persist mode and *force_remove*.
|
||||
|
||||
Persist-mode (``persist_across_processes=True``, the default) leaves the
|
||||
container **running** untouched. The docs promise "ONE long-lived
|
||||
container shared across sessions" and stopping it on every Hermes exit
|
||||
breaks that promise:
|
||||
|
||||
* Background processes inside the container (``npm run dev``, watchers,
|
||||
long-running pytest) get killed every time the user runs ``/quit``.
|
||||
* Every reuse requires ``docker start`` + waiting for the container to
|
||||
come back up, adding 1–2s to the first tool call of the new session.
|
||||
* The user-visible difference between "ONE long-lived container" and
|
||||
"a new container that happens to share state" is exactly this:
|
||||
processes survive in the former, die in the latter.
|
||||
|
||||
Resource reclamation for the persist-mode case lives in the
|
||||
``reap_orphan_containers()`` path (see issue #20561 commit 3): if no
|
||||
Hermes process touches a labeled container for ``2 × lifetime_seconds``
|
||||
it gets ``docker rm -f``'d at the next Hermes startup. That covers the
|
||||
SIGKILL / OOM / abandoned-laptop cases without us needing to stop the
|
||||
container on every graceful exit.
|
||||
|
||||
Opt-out mode (``persist_across_processes=False``) still does
|
||||
``docker stop`` + ``docker rm -f`` on every cleanup, matching the
|
||||
pre-PR behavior for users who explicitly want per-process isolation.
|
||||
|
||||
``force_remove=True`` overrides persist mode and always tears the
|
||||
container down (``docker stop`` + ``docker rm -f``). This is the
|
||||
explicit-teardown path for ``/reset``, ``cleanup_vm(task_id)``-driven
|
||||
resets, or any caller that wants a guaranteed fresh container on next
|
||||
``DockerEnvironment(task_id=...)``. No current caller passes
|
||||
``force_remove=True``; the parameter is here so the explicit-teardown
|
||||
semantics can be wired up later without changing this method's
|
||||
signature.
|
||||
|
||||
Cleanup runs on a daemon thread with bounded ``subprocess.run`` calls
|
||||
(not the racy ``Popen(... &)`` pattern from before PR #33645). The
|
||||
atexit hook in ``tools/terminal_tool.py`` waits up to 15s for the
|
||||
thread to finish before the interpreter exits, so ``docker stop`` /
|
||||
``docker rm`` actually completes when we do trigger it.
|
||||
"""
|
||||
container_id = self._container_id
|
||||
if not container_id:
|
||||
# Still drop the bind-mount dirs if any were allocated and we're
|
||||
# NOT in persist mode (persist mode preserves them).
|
||||
if not self._persistent:
|
||||
# Also schedule removal (stop only leaves it as stopped)
|
||||
try:
|
||||
subprocess.Popen(
|
||||
f"sleep 3 && {self._docker_exe} rm -f {self._container_id} >/dev/null 2>&1 &",
|
||||
shell=True,
|
||||
)
|
||||
except Exception:
|
||||
pass
|
||||
self._container_id = None
|
||||
for d in (self._workspace_dir, self._home_dir):
|
||||
if d:
|
||||
shutil.rmtree(d, ignore_errors=True)
|
||||
return
|
||||
|
||||
if not self._persistent:
|
||||
# Decide what to actually do. Three cases:
|
||||
#
|
||||
# force_remove=True → stop + rm (explicit teardown)
|
||||
# persist_across_processes=True → no-op (leave container running)
|
||||
# persist_across_processes=False → stop + rm (per-process isolation)
|
||||
#
|
||||
# The persist-mode no-op is the issue-#20561 contract: the container
|
||||
# outlives Hermes processes, processes inside it stay alive, and
|
||||
# reuse on next startup is instant.
|
||||
if force_remove:
|
||||
should_stop = True
|
||||
should_remove = True
|
||||
elif self._persist_across_processes:
|
||||
# No-op for the container. Drop the in-process handle so a fresh
|
||||
# __init__ will re-probe via labels (and find the running
|
||||
# container) instead of trying to reuse a stale Python reference.
|
||||
self._container_id = None
|
||||
return
|
||||
else:
|
||||
should_stop = True
|
||||
should_remove = True
|
||||
|
||||
# Capture state needed by the worker before we null out the attrs —
|
||||
# the worker thread can outlive ``self``.
|
||||
docker_exe = self._docker_exe
|
||||
log_id = container_id[:12]
|
||||
|
||||
def _do_cleanup() -> None:
|
||||
if should_stop:
|
||||
try:
|
||||
subprocess.run(
|
||||
[docker_exe, "stop", "-t", "10", container_id],
|
||||
capture_output=True, timeout=30,
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.warning("docker stop %s timed out / failed: %s", log_id, e)
|
||||
if should_remove:
|
||||
try:
|
||||
subprocess.run(
|
||||
[docker_exe, "rm", "-f", container_id],
|
||||
capture_output=True, timeout=30,
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.warning("docker rm -f %s failed: %s", log_id, e)
|
||||
|
||||
# Daemon thread: doesn't block interpreter exit (atexit returns
|
||||
# promptly), but unlike the old ``Popen(... &)`` shell trick the
|
||||
# Python-level join semantics let the thread actually run to
|
||||
# completion if the interpreter is still alive. atexit registers
|
||||
# ``_atexit_cleanup`` in terminal_tool.py which waits up to ~60s for
|
||||
# outstanding cleanups, so most exits complete the work cleanly.
|
||||
import threading
|
||||
t = threading.Thread(target=_do_cleanup, daemon=True, name=f"hermes-cleanup-{log_id}")
|
||||
t.start()
|
||||
self._cleanup_thread = t
|
||||
self._container_id = None
|
||||
|
||||
# Bind-mount dir teardown only runs when we actually removed the
|
||||
# container (the dirs are the container's filesystem state; keeping
|
||||
# them around with no container would orphan the data on disk).
|
||||
if should_remove and not self._persistent:
|
||||
for d in (self._workspace_dir, self._home_dir):
|
||||
if d:
|
||||
shutil.rmtree(d, ignore_errors=True)
|
||||
|
||||
def wait_for_cleanup(self, timeout: float = 30.0) -> bool:
|
||||
"""Block up to *timeout* seconds for the cleanup worker thread.
|
||||
|
||||
Returns ``True`` if the thread finished (or no thread was started),
|
||||
``False`` on timeout. The atexit hook in terminal_tool.py calls this
|
||||
on every active environment so docker stop/rm actually completes
|
||||
before the Python process exits — without this, ``hermes /quit``
|
||||
races the interpreter shutdown and leaves stopped containers behind.
|
||||
"""
|
||||
thread = getattr(self, "_cleanup_thread", None)
|
||||
if thread is None or not thread.is_alive():
|
||||
return True
|
||||
thread.join(timeout=timeout)
|
||||
return not thread.is_alive()
|
||||
|
||||
+143
-3
@@ -861,6 +861,78 @@ _creation_locks_lock = threading.Lock() # Protects _creation_locks dict itself
|
||||
_cleanup_thread = None
|
||||
_cleanup_running = False
|
||||
|
||||
# Once-per-process guard for the docker orphan reaper (issue #20561).
|
||||
# Set when _maybe_reap_docker_orphans first runs; concurrent _create_environment
|
||||
# calls for parallel subagents won't re-trigger the sweep.
|
||||
_docker_orphan_reaper_ran = False
|
||||
_docker_orphan_reaper_lock = threading.Lock()
|
||||
|
||||
|
||||
def _maybe_reap_docker_orphans(container_config: Dict[str, Any]) -> None:
|
||||
"""Run the docker orphan reaper once per process, if enabled.
|
||||
|
||||
Sweeps long-Exited containers labeled ``hermes-agent=1`` for the current
|
||||
profile that match the issue #20561 leak class — containers left behind
|
||||
by Hermes processes that exited without firing ``atexit`` (SIGKILL,
|
||||
OOM, terminal-window-close). The reaper is conservative by default:
|
||||
only Exited containers older than ``2 × lifetime_seconds`` and scoped to
|
||||
the current profile.
|
||||
|
||||
Gates:
|
||||
|
||||
* ``terminal.docker_orphan_reaper: false`` disables it entirely (the
|
||||
operator opted out — usually because they're running multiple
|
||||
Hermes processes in the same profile and don't trust the
|
||||
conservative defaults).
|
||||
* ``_docker_orphan_reaper_ran`` flag — sweep runs once per Python
|
||||
interpreter, not on every subagent / RL-rollout / parallel
|
||||
``terminal()`` call.
|
||||
"""
|
||||
global _docker_orphan_reaper_ran
|
||||
if not container_config.get("docker_orphan_reaper", True):
|
||||
return
|
||||
# Cheap double-checked-locking: read without the lock, take the lock
|
||||
# only on first run, recheck inside.
|
||||
if _docker_orphan_reaper_ran:
|
||||
return
|
||||
with _docker_orphan_reaper_lock:
|
||||
if _docker_orphan_reaper_ran:
|
||||
return
|
||||
_docker_orphan_reaper_ran = True
|
||||
|
||||
# 2 × lifetime_seconds gives sibling Hermes processes a generous grace
|
||||
# window. Floor at 60s so an operator with TERMINAL_LIFETIME_SECONDS=0
|
||||
# doesn't get an instant-reap that races their own setup.
|
||||
# ``container_config`` only carries container_* keys, so read
|
||||
# lifetime_seconds from the env var the rest of the module uses.
|
||||
try:
|
||||
lifetime = int(os.getenv("TERMINAL_LIFETIME_SECONDS", "300"))
|
||||
except (TypeError, ValueError):
|
||||
lifetime = 300
|
||||
lifetime = max(60, lifetime)
|
||||
max_age = lifetime * 2
|
||||
|
||||
try:
|
||||
from tools.environments.docker import (
|
||||
reap_orphan_containers, _get_active_profile_name,
|
||||
)
|
||||
except ImportError:
|
||||
return
|
||||
try:
|
||||
profile = _get_active_profile_name()
|
||||
removed = reap_orphan_containers(
|
||||
max_age_seconds=max_age, profile_filter=profile,
|
||||
)
|
||||
if removed:
|
||||
logger.info(
|
||||
"Docker orphan reaper removed %d stale container(s) for profile %s",
|
||||
removed, profile,
|
||||
)
|
||||
except Exception as e:
|
||||
# Never fail the env-creation path because of a janitor problem.
|
||||
logger.debug("Docker orphan reaper raised: %s", e)
|
||||
|
||||
|
||||
# Per-task environment overrides registry.
|
||||
# Allows environments (e.g., TerminalBench2Env) to specify a custom Docker/Modal
|
||||
# image for a specific task_id BEFORE the agent loop starts. When the terminal or
|
||||
@@ -1024,6 +1096,22 @@ def _get_env_config() -> Dict[str, Any]:
|
||||
"docker_env": _parse_env_var("TERMINAL_DOCKER_ENV", "{}", json.loads, "valid JSON"),
|
||||
"docker_run_as_host_user": os.getenv("TERMINAL_DOCKER_RUN_AS_HOST_USER", "false").lower() in {"true", "1", "yes"},
|
||||
"docker_extra_args": _parse_env_var("TERMINAL_DOCKER_EXTRA_ARGS", "[]", json.loads, "valid JSON"),
|
||||
# Cross-process container reuse (issue #20561). The docs claim
|
||||
# "ONE long-lived container shared across sessions" — this toggle
|
||||
# makes that real by probing for a labeled container at startup and
|
||||
# attaching to it instead of always starting a fresh one. Set to
|
||||
# ``false`` for hard per-process isolation (no reuse, container is
|
||||
# removed on exit).
|
||||
"docker_persist_across_processes": os.getenv(
|
||||
"TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES", "true"
|
||||
).lower() in {"true", "1", "yes"},
|
||||
# Startup orphan reaper for hermes-tagged containers left behind by
|
||||
# crashed / SIGKILL'd previous processes that bypassed atexit.
|
||||
# Conservative: only sweeps Exited containers older than 2× the
|
||||
# idle-reap window AND scoped to the current profile. Issue #20561.
|
||||
"docker_orphan_reaper": os.getenv(
|
||||
"TERMINAL_DOCKER_ORPHAN_REAPER", "true"
|
||||
).lower() in {"true", "1", "yes"},
|
||||
}
|
||||
|
||||
|
||||
@@ -1072,6 +1160,13 @@ def _create_environment(env_type: str, image: str, cwd: str, timeout: int,
|
||||
return _LocalEnvironment(cwd=cwd, timeout=timeout)
|
||||
|
||||
elif env_type == "docker":
|
||||
# One-shot orphan reaper: clean up labeled containers left behind by
|
||||
# prior Hermes processes that hit SIGKILL / OOM / a closed terminal
|
||||
# before the atexit cleanup hook could run. Gated to once per
|
||||
# process so concurrent _create_environment calls (parallel
|
||||
# subagents, RL benchmarks) don't run the reaper N times.
|
||||
# Disable via ``terminal.docker_orphan_reaper: false`` (issue #20561).
|
||||
_maybe_reap_docker_orphans(cc)
|
||||
return _DockerEnvironment(
|
||||
image=image, cwd=cwd, timeout=timeout,
|
||||
cpu=cpu, memory=memory, disk=disk,
|
||||
@@ -1083,6 +1178,7 @@ def _create_environment(env_type: str, image: str, cwd: str, timeout: int,
|
||||
env=docker_env,
|
||||
run_as_host_user=cc.get("docker_run_as_host_user", False),
|
||||
extra_args=docker_extra_args,
|
||||
persist_across_processes=cc.get("docker_persist_across_processes", True),
|
||||
)
|
||||
|
||||
elif env_type == "singularity":
|
||||
@@ -1330,8 +1426,27 @@ def cleanup_all_environments():
|
||||
return cleaned
|
||||
|
||||
|
||||
def cleanup_vm(task_id: str):
|
||||
"""Manually clean up a specific environment by task_id."""
|
||||
def cleanup_vm(task_id: str, *, force_remove: bool = False):
|
||||
"""Manually clean up a specific environment by task_id.
|
||||
|
||||
*force_remove* (default False) is forwarded to backends that accept it
|
||||
— currently only ``DockerEnvironment``. The default of False matches
|
||||
session-lifecycle semantics: this function is called from
|
||||
``AIAgent.close()`` (TUI session close, gateway session teardown) and the
|
||||
per-turn cleanup branch for non-persistent envs, both of which should
|
||||
honor the user's persist-mode preference. Stopping the container here
|
||||
would defeat the "ONE long-lived container shared across sessions"
|
||||
contract — exactly the bug Ben reported when the container was killed
|
||||
on every TUI session close.
|
||||
|
||||
Pass ``force_remove=True`` for actual user-initiated teardown
|
||||
(e.g. ``/reset``-style flows that haven't been wired yet, or future
|
||||
"destroy my sandbox" commands).
|
||||
|
||||
The idle reaper passes the env through ``env.cleanup()`` directly (not
|
||||
via this function), so persist-mode idle envs are similarly no-op'd —
|
||||
only the orphan reaper at next startup reclaims them.
|
||||
"""
|
||||
# Remove from tracking dicts while holding the lock, but defer the
|
||||
# actual (potentially slow) env.cleanup() call to outside the lock
|
||||
# so other tool calls aren't blocked.
|
||||
@@ -1356,7 +1471,14 @@ def cleanup_vm(task_id: str):
|
||||
|
||||
try:
|
||||
if hasattr(env, 'cleanup'):
|
||||
env.cleanup()
|
||||
# Pass force_remove only if the env's cleanup() accepts it
|
||||
# (DockerEnvironment after issue #20561; other backends don't).
|
||||
import inspect
|
||||
sig = inspect.signature(env.cleanup)
|
||||
if "force_remove" in sig.parameters:
|
||||
env.cleanup(force_remove=force_remove)
|
||||
else:
|
||||
env.cleanup()
|
||||
elif hasattr(env, 'stop'):
|
||||
env.stop()
|
||||
elif hasattr(env, 'terminate'):
|
||||
@@ -1378,7 +1500,23 @@ def _atexit_cleanup():
|
||||
if _active_environments:
|
||||
count = len(_active_environments)
|
||||
logger.info("Shutting down %d remaining sandbox(es)...", count)
|
||||
# Snapshot the env objects BEFORE cleanup_all_environments empties
|
||||
# the dict; we need them to wait on docker cleanup threads after the
|
||||
# registry has been cleared.
|
||||
envs_to_wait = list(_active_environments.values())
|
||||
cleanup_all_environments()
|
||||
# Block briefly so docker stop/rm actually completes before the
|
||||
# interpreter exits. Issue #20561 — without this join, the daemon
|
||||
# cleanup threads were getting torn down mid-`docker stop`, leaving
|
||||
# Exited containers piled up on the host.
|
||||
for env in envs_to_wait:
|
||||
wait_fn = getattr(env, "wait_for_cleanup", None)
|
||||
if wait_fn is None:
|
||||
continue
|
||||
try:
|
||||
wait_fn(timeout=15.0)
|
||||
except Exception as e: # never block shutdown on a bad backend
|
||||
logger.debug("wait_for_cleanup raised on exit: %s", e)
|
||||
|
||||
atexit.register(_atexit_cleanup)
|
||||
|
||||
@@ -1746,6 +1884,8 @@ def terminal_tool(
|
||||
"docker_env": config.get("docker_env", {}),
|
||||
"docker_run_as_host_user": config.get("docker_run_as_host_user", False),
|
||||
"docker_extra_args": config.get("docker_extra_args", []),
|
||||
"docker_persist_across_processes": config.get("docker_persist_across_processes", True),
|
||||
"docker_orphan_reaper": config.get("docker_orphan_reaper", True),
|
||||
}
|
||||
|
||||
local_config = None
|
||||
|
||||
@@ -1589,7 +1589,7 @@ wheels = [
|
||||
|
||||
[[package]]
|
||||
name = "hermes-agent"
|
||||
version = "0.15.0"
|
||||
version = "0.15.1"
|
||||
source = { editable = "." }
|
||||
dependencies = [
|
||||
{ name = "croniter" },
|
||||
|
||||
@@ -130,7 +130,7 @@ The agent has the same filesystem access as your user account. Use `hermes tools
|
||||
|
||||
Runs commands inside a Docker container with security hardening (all capabilities dropped, no privilege escalation, PID limits).
|
||||
|
||||
**Single persistent container, not per-command.** Hermes starts ONE long-lived container on first use and routes every terminal, file, and `execute_code` call through `docker exec` into that same container — across sessions, `/new`, `/reset`, and `delegate_task` subagents — for the lifetime of the Hermes process. Working-directory changes, installed packages, and files in `/workspace` carry over from one tool call to the next, just like a local shell. The container is stopped and removed on shutdown. See **Container lifecycle** below for details.
|
||||
**Single persistent container, shared across Hermes processes.** Hermes starts ONE long-lived container on first use and routes every terminal, file, and `execute_code` call through `docker exec` into that same container — across sessions, `/new`, `/reset`, and `delegate_task` subagents. Working-directory changes, installed packages, files in `/workspace`, and **background processes** all carry over from one tool call to the next, and from one Hermes process to the next. When you close a TUI session, run `/quit`, or start a new `hermes` invocation, the container keeps running and the next Hermes process reuses it via a labeled lookup. See **Container lifecycle** below for the exact teardown rules.
|
||||
|
||||
```yaml
|
||||
terminal:
|
||||
@@ -138,8 +138,11 @@ terminal:
|
||||
docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
|
||||
docker_mount_cwd_to_workspace: false # Mount launch dir into /workspace
|
||||
docker_run_as_host_user: false # See "Running container as host user" below
|
||||
docker_forward_env: # Env vars to forward into container
|
||||
docker_forward_env: # Host env vars to forward into container
|
||||
- "GITHUB_TOKEN"
|
||||
docker_env: # Literal env vars to inject (KEY=value)
|
||||
DEBUG: "1"
|
||||
PYTHONUNBUFFERED: "1"
|
||||
docker_volumes: # Host directory mounts
|
||||
- "/home/user/projects:/workspace/projects"
|
||||
- "/home/user/data:/data:ro" # :ro for read-only
|
||||
@@ -151,14 +154,49 @@ terminal:
|
||||
container_cpu: 1 # CPU cores (0 = unlimited)
|
||||
container_memory: 5120 # MB (0 = unlimited)
|
||||
container_disk: 51200 # MB (requires overlay2 on XFS+pquota)
|
||||
container_persistent: true # Persist /workspace and /root across sessions
|
||||
container_persistent: true # Persist /workspace and /root bind-mount dirs
|
||||
|
||||
# Cross-process container reuse (defaults match the "one long-lived
|
||||
# container shared across sessions" contract — see Container lifecycle).
|
||||
docker_persist_across_processes: true # Reuse container across Hermes restarts
|
||||
docker_orphan_reaper: true # Sweep abandoned Exited containers at startup
|
||||
|
||||
# Cross-backend lifecycle settings (apply to docker as well)
|
||||
timeout: 180 # Per-command timeout in seconds
|
||||
lifetime_seconds: 300 # Idle-reaper window; also feeds 2× orphan-reaper threshold
|
||||
```
|
||||
|
||||
**`docker_env`** vs **`docker_forward_env`**: the former injects literal `KEY=value` pairs you specify in the config (the values live in your `config.yaml` or are passed as a JSON dict via `TERMINAL_DOCKER_ENV='{"DEBUG":"1"}'`). The latter forwards values from your shell or `~/.hermes/.env`, so the actual secret never appears in the config file. Use `docker_forward_env` for tokens and `docker_env` for static knobs the container needs.
|
||||
|
||||
**`terminal.docker_extra_args`** (also overridable via `TERMINAL_DOCKER_EXTRA_ARGS='["--gpus=all"]'`) lets you pass arbitrary `docker run` flags that Hermes doesn't surface as first-class keys — `--gpus`, `--network`, `--add-host`, alternative `--security-opt` overrides, etc. Each entry must be a string; the list is appended last to the assembled `docker run` invocation so it can override Hermes' defaults if needed. Use sparingly — flags that conflict with the sandbox hardening (capability drops, `--user`, the workspace bind mount) will silently weaken isolation.
|
||||
|
||||
**Requirements:** Docker Desktop or Docker Engine installed and running. Hermes probes `$PATH` plus common macOS install locations (`/usr/local/bin/docker`, `/opt/homebrew/bin/docker`, Docker Desktop app bundle). Podman is supported out of the box: set `HERMES_DOCKER_BINARY=podman` (or the full path) to force it when both are installed.
|
||||
|
||||
**Container lifecycle:** Hermes reuses a single long-lived container (`docker run -d ... sleep 2h`) for every terminal and file-tool call, across sessions, `/new`, `/reset`, and `delegate_task` subagents, for the lifetime of the Hermes process. Commands run via `docker exec` with a login shell, so working-directory changes, installed packages, and files in `/workspace` all persist from one tool call to the next. The container is stopped and removed on Hermes shutdown (or when the idle-sweep reclaims it).
|
||||
#### Container lifecycle
|
||||
|
||||
Every Hermes-managed container is tagged with three labels so subsequent processes (and the orphan reaper) can identify it:
|
||||
|
||||
- `hermes-agent=1` — marks it as Hermes-managed
|
||||
- `hermes-task-id=<sanitized task_id>` — keys the per-task reuse probe
|
||||
- `hermes-profile=<sanitized profile name>` — scopes reuse and reaping to the active Hermes profile
|
||||
|
||||
On startup, Hermes runs `docker ps --filter label=hermes-task-id=<id> --filter label=hermes-profile=<profile>` and **attaches to the existing container** when it finds one. If the container is `exited` (e.g. after a Docker daemon restart), it's `docker start`'d and reused — filesystem state and any installed packages survive, but in-container background processes do not.
|
||||
|
||||
When a Hermes process exits — `/quit`, closing a TUI session, gateway shutdown, even SIGKILL — the cleanup path is a **no-op for the container in default mode**. The container keeps running. The next Hermes process attaches to it in milliseconds via the label probe. This is the behavior the "one long-lived container shared across sessions" contract requires: it's the only way background processes (npm watchers, dev servers, long-running pytest) survive across sessions.
|
||||
|
||||
**The container is only torn down (stopped and `docker rm -f`'d) in these cases:**
|
||||
|
||||
| Trigger | When it fires |
|
||||
|---|---|
|
||||
| `docker_persist_across_processes: false` | Explicit per-process isolation. Every `cleanup()` does `stop` + `rm -f`. Matches pre-issue-#20561 behavior. |
|
||||
| Idle reaper (`lifetime_seconds`, default 300s) | Only when the env is `persist_across_processes=false`. Persist-mode envs are no-op'd; container survives the idle sweep. |
|
||||
| Orphan reaper at next startup | Sweeps **Exited** hermes-labeled containers older than `2 × lifetime_seconds` (default 600s = 10 min), scoped to the current profile. **Running containers are never touched** — sibling-process safety. Set `docker_orphan_reaper: false` to disable. |
|
||||
| Direct user action | `docker rm -f`, `docker system prune`, Docker Desktop restart. We don't set `--restart=always`, so a host reboot leaves the container `Exited` (its CoW layer survives and gets reused on next startup, but bg processes are gone). |
|
||||
|
||||
Edge cases worth knowing:
|
||||
|
||||
- **OOM kill of in-container PID 1** transitions the container to `Exited`. Next reuse will `docker start` it; filesystem state survives, bg processes do not.
|
||||
- **Switching profiles** isolates containers from each other — a container labeled `hermes-profile=work` is invisible to a Hermes process running under `hermes-profile=research`. The orphan reaper is profile-scoped too, so cross-profile containers don't get reaped accidentally, but they also won't get cleaned up automatically until you start Hermes again under their original profile.
|
||||
|
||||
Parallel subagents spawned via `delegate_task(tasks=[...])` share this one container — concurrent `cd`, env mutations, and writes to the same path will collide. If a subagent needs an isolated sandbox, it must register a per-task image override via `register_task_env_overrides()`, which RL and benchmark environments (TerminalBench2, HermesSweEnv, etc.) do automatically for their per-task Docker images.
|
||||
|
||||
@@ -170,6 +208,29 @@ Parallel subagents spawned via `delegate_task(tasks=[...])` share this one conta
|
||||
|
||||
**Credential forwarding:** Env vars listed in `docker_forward_env` are resolved from your shell environment first, then `~/.hermes/.env`. Skills can also declare `required_environment_variables` which are merged automatically.
|
||||
|
||||
#### Environment variable overrides
|
||||
|
||||
Every key under `terminal:` has an env-var override of the form `TERMINAL_<KEY_UPPERCASE>`. The most useful ones for the Docker backend:
|
||||
|
||||
| Env var | Maps to | Notes |
|
||||
|---|---|---|
|
||||
| `TERMINAL_DOCKER_IMAGE` | `docker_image` | Base image |
|
||||
| `TERMINAL_DOCKER_FORWARD_ENV` | `docker_forward_env` | JSON array: `'["GITHUB_TOKEN","OPENAI_API_KEY"]'` |
|
||||
| `TERMINAL_DOCKER_ENV` | `docker_env` | JSON dict: `'{"DEBUG":"1"}'` |
|
||||
| `TERMINAL_DOCKER_VOLUMES` | `docker_volumes` | JSON array of `"host:container[:ro]"` strings |
|
||||
| `TERMINAL_DOCKER_EXTRA_ARGS` | `docker_extra_args` | JSON array |
|
||||
| `TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE` | `docker_mount_cwd_to_workspace` | `true` / `false` |
|
||||
| `TERMINAL_DOCKER_RUN_AS_HOST_USER` | `docker_run_as_host_user` | `true` / `false` |
|
||||
| `TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES` | `docker_persist_across_processes` | `true` / `false` — default `true` |
|
||||
| `TERMINAL_DOCKER_ORPHAN_REAPER` | `docker_orphan_reaper` | `true` / `false` — default `true` |
|
||||
| `TERMINAL_CONTAINER_CPU` | `container_cpu` | CPU cores |
|
||||
| `TERMINAL_CONTAINER_MEMORY` | `container_memory` | MB |
|
||||
| `TERMINAL_CONTAINER_DISK` | `container_disk` | MB |
|
||||
| `TERMINAL_CONTAINER_PERSISTENT` | `container_persistent` | `true` / `false` — controls the bind-mount workspace dirs, distinct from `docker_persist_across_processes` |
|
||||
| `TERMINAL_LIFETIME_SECONDS` | `lifetime_seconds` | Idle reaper window |
|
||||
| `TERMINAL_TIMEOUT` | `timeout` | Per-command timeout |
|
||||
| `HERMES_DOCKER_BINARY` | _none_ | Force a specific docker/podman binary path |
|
||||
|
||||
### SSH Backend
|
||||
|
||||
Runs commands on a remote server over SSH. Uses ControlMaster for connection reuse (5-minute idle keepalive). Persistent shell is enabled by default — state (cwd, env vars) survives across commands.
|
||||
|
||||
@@ -249,7 +249,7 @@ docker run -it --rm \
|
||||
Direct `-e` flags override values from `.env`. This is useful for CI/CD or secrets-manager integrations where you don't want keys on disk.
|
||||
|
||||
:::note Looking for Docker as the **terminal backend**?
|
||||
This page covers running Hermes itself inside Docker. If you want Hermes to execute the agent's `terminal` / `execute_code` calls inside a Docker sandbox container (one persistent container per Hermes process), that's a separate config block — `terminal.backend: docker` plus `terminal.docker_image`, `terminal.docker_volumes`, `terminal.docker_forward_env`, `terminal.docker_run_as_host_user`, and `terminal.docker_extra_args`. See [Configuration → Docker Backend](configuration.md#docker-backend) for the full set.
|
||||
This page covers running Hermes itself inside Docker. If you want Hermes to execute the agent's `terminal` / `execute_code` calls inside a Docker sandbox container (one long-lived container shared across Hermes processes — see issue #20561), that's a separate config block — `terminal.backend: docker` plus `terminal.docker_image`, `terminal.docker_volumes`, `terminal.docker_forward_env`, `terminal.docker_env`, `terminal.docker_run_as_host_user`, `terminal.docker_extra_args`, `terminal.docker_persist_across_processes`, and `terminal.docker_orphan_reaper`. See [Configuration → Docker Backend](configuration.md#docker-backend) for the full set including container-lifecycle rules.
|
||||
:::
|
||||
|
||||
## Docker Compose example
|
||||
|
||||
Reference in New Issue
Block a user