Compare commits
20 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 7e6d4b982e | |||
| 3c6e70aef1 | |||
| 2f0f03c40d | |||
| 5c2170a7c6 | |||
| d77d877665 | |||
| ac8e238bc8 | |||
| 8d129d013b | |||
| 300140e006 | |||
| e71a2bd11b | |||
| 769ee86cd2 | |||
| 1b1e30510a | |||
| f3acdd94fe | |||
| 78a54d2c00 | |||
| e7c99651fb | |||
| a5c1f925b5 | |||
| 0acb7f4583 | |||
| a3cd974ee7 | |||
| 102eb4adc0 | |||
| c661fefa08 | |||
| c9e5a9bb08 |
@@ -0,0 +1,110 @@
|
||||
# Hermes Agent v0.15.1 (v2026.5.29)
|
||||
|
||||
**Release Date:** May 29, 2026
|
||||
**Since v0.15.0:** 28 commits · 21 merged PRs · hotfix release · 9 contributors
|
||||
|
||||
> **The Patch Release.** A same-day hotfix for v0.15.0. Headline fix: the dashboard infinite-reload loop that hit anyone running v0.15.0 in loopback mode (Docker, hosted Hermes, fresh installs). A handful of other v0.15.0 follow-ups go along for the ride — kanban worker SIGTERM, `/model` picker unification, `/yolo` session bypass, the full 19,932-entry skills.sh catalog, `.md` media delivery restoration, gateway probe-stepdown safety, web-URL redaction passthrough, kanban worker vision on referenced images, hindsight observation-default. Docker users get an explicit `--insecure` opt-in env var (no more bind-host inference), MCP server bare-command PATH resolution, and arm64 PR-build cache fixes.
|
||||
|
||||
---
|
||||
|
||||
## ✨ Highlights
|
||||
|
||||
- **Dashboard 401 reload loop fixed** — In loopback mode the dashboard's identity probe (`/api/auth/me`) returns 401 by design, but v0.15.0's stale-token reload guard treated every 401 as a rotated session token and full-page-reloaded to pick up a fresh one. Every successful sibling call cleared the one-shot reload guard, so the page reload-looped forever (Firefox: "Navigated to /sessions" storm; Chrome: React re-render storm). Fix adds an `allowUnauthorized` opt-out to `fetchJSON` that skips only the loopback stale-token reload — 401 still throws so `AuthWidget` swallows it, gated-mode `login_url` redirects are unaffected. Closes [#34206](https://github.com/NousResearch/hermes-agent/issues/34206), [#34202](https://github.com/NousResearch/hermes-agent/issues/34202). ([#30698](https://github.com/NousResearch/hermes-agent/pull/30698) — @austinpickett)
|
||||
|
||||
- **Docker dashboard `--insecure` is now an explicit env opt-in, never derived from bind host** — Previously the Docker entrypoint inferred `--insecure` when the dashboard bound to a non-loopback host. That conflated "I want LAN access" with "I want to disable the same-origin guard." The fix splits them: bind host is bind host, and disabling the dashboard's loopback auth requires an explicit `HERMES_DASHBOARD_INSECURE=1`. Existing setups that genuinely wanted insecure binding must now set the env var. ([#34188](https://github.com/NousResearch/hermes-agent/pull/34188), [#34204](https://github.com/NousResearch/hermes-agent/pull/34204) — @benbarclay)
|
||||
|
||||
- **MCP bare command resolution under Docker** — MCP servers configured with bare commands (`npx`, `npm`, `node`) now resolve against `/usr/local/bin` so they actually launch inside the Docker image where those binaries live. v0.15.0 left these failing silently in containers when the agent's effective PATH didn't include the Node toolchain location. ([#34186](https://github.com/NousResearch/hermes-agent/pull/34186) — @benbarclay)
|
||||
|
||||
- **Skills page sidebar / source pills restored** — A stale `useMemo` dependency in the new dashboard skills page collapsed the source pills and category sidebar to "All" only. Fixed; both surfaces now reflect the live catalog state. ([#34194](https://github.com/NousResearch/hermes-agent/pull/34194))
|
||||
|
||||
- **Kanban worker can be killed again** — `SIGTERM` on a kanban worker was being absorbed by an intermediate process and the worker stayed running. Closes [#28181](https://github.com/NousResearch/hermes-agent/issues/28181). ([#34045](https://github.com/NousResearch/hermes-agent/pull/34045))
|
||||
|
||||
- **Full skills.sh catalog (858 → 19,932 entries)** — The skills hub page was pulling a partial paginated catalog. The fetch now walks the sitemap, so all 19,932 skills.sh entries surface in the picker instead of just the first 858. ([#34025](https://github.com/NousResearch/hermes-agent/pull/34025))
|
||||
|
||||
---
|
||||
|
||||
## 🐛 Bug Fixes
|
||||
|
||||
### Dashboard / Web
|
||||
|
||||
- **`/api/auth/me` 401 no longer triggers reload loop** in loopback mode — ([#30698](https://github.com/NousResearch/hermes-agent/pull/30698) — @austinpickett)
|
||||
- **Skills page source pills + category sidebar restored** — stale `useMemo` dep ([#34194](https://github.com/NousResearch/hermes-agent/pull/34194))
|
||||
|
||||
### Docker
|
||||
|
||||
- **`--insecure` is now explicit opt-in via env var**, not derived from bind host ([#34188](https://github.com/NousResearch/hermes-agent/pull/34188) — @benbarclay)
|
||||
- **Dashboard test suite repaired** to match the insecure-opt-in fix ([#34204](https://github.com/NousResearch/hermes-agent/pull/34204) — @benbarclay)
|
||||
- **arm64 PR builds skip the GHA cache** to avoid cache-thrash on cross-arch builders ([#33704](https://github.com/NousResearch/hermes-agent/pull/33704) — @BROCCOLO1D)
|
||||
|
||||
### MCP
|
||||
|
||||
- **Bare `npx`/`npm`/`node` resolve against `/usr/local/bin`** for Docker compatibility ([#34186](https://github.com/NousResearch/hermes-agent/pull/34186) — @benbarclay)
|
||||
|
||||
### Kanban
|
||||
|
||||
- **Worker SIGTERM actually terminates the process** ([#34045](https://github.com/NousResearch/hermes-agent/pull/34045))
|
||||
- **Workers receive images referenced in task bodies** for vision-capable models ([#34210](https://github.com/NousResearch/hermes-agent/pull/34210))
|
||||
|
||||
### Gateway
|
||||
|
||||
- **`.md` files deliver again** — media-delivery validation defaults to denylist-only instead of an overly-narrow allowlist ([#34022](https://github.com/NousResearch/hermes-agent/pull/34022))
|
||||
- **Probe stepdown safety** — on a context-overflow without an explicit provider context limit, the agent no longer steps down to a smaller model based on an unknown ceiling (salvage of [#33673](https://github.com/NousResearch/hermes-agent/pull/33673)) ([#33826](https://github.com/NousResearch/hermes-agent/pull/33826))
|
||||
|
||||
### CLI
|
||||
|
||||
- **`/yolo` mid-session enables the per-session bypass** instead of just toggling the env var (which the running agent had already snapshotted) ([#33931](https://github.com/NousResearch/hermes-agent/pull/33931) — @kshitijk4poor)
|
||||
- **`/model` and `hermes model` show the same list**, plus disk cache for picker startup ([#33867](https://github.com/NousResearch/hermes-agent/pull/33867))
|
||||
|
||||
### Skills
|
||||
|
||||
- **Full skills.sh catalog via sitemap** — 858 → 19,932 entries ([#34025](https://github.com/NousResearch/hermes-agent/pull/34025))
|
||||
|
||||
### Redaction
|
||||
|
||||
- **Web URLs pass through unchanged** — the redactor was eating query parameters that looked credential-shaped ([#34029](https://github.com/NousResearch/hermes-agent/pull/34029))
|
||||
|
||||
---
|
||||
|
||||
## ✨ Small Features
|
||||
|
||||
- **Hindsight default narrowed to observation-only** for `recall_types` — tool path is also narrowed ([#34079](https://github.com/NousResearch/hermes-agent/pull/34079) — @nicoloboschi, follow-up [#34091](https://github.com/NousResearch/hermes-agent/pull/4df62d239e38bf8c212a595721c9c01e176f6c3a) — @kshitijk4poor)
|
||||
- **Memory providers receive completed-turn message context** — salvage of [#28065](https://github.com/NousResearch/hermes-agent/pull/28065) ([#34097](https://github.com/NousResearch/hermes-agent/pull/34097) — @kshitijk4poor, credit to @devwdave)
|
||||
|
||||
---
|
||||
|
||||
## 📚 Documentation
|
||||
|
||||
- **`--no-supervise` / `HERMES_GATEWAY_NO_SUPERVISE` documented** in the reference docs (follow-up to [#33583](https://github.com/NousResearch/hermes-agent/pull/33583)) ([#33751](https://github.com/NousResearch/hermes-agent/pull/33751) — @r266-tech)
|
||||
|
||||
---
|
||||
|
||||
## 🛠️ Infrastructure
|
||||
|
||||
- **Vercel deploy workflow accepts `workflow_dispatch`** so docs deploys can be manually triggered ([#34081](https://github.com/NousResearch/hermes-agent/pull/34081))
|
||||
- **`@nous-research/ui` bumped to 0.18.2** (Nix `npmDepsHash` also updated to match) ([#34193](https://github.com/NousResearch/hermes-agent/pull/34193) follow-ups — @austinpickett)
|
||||
|
||||
---
|
||||
|
||||
## 👥 Contributors
|
||||
|
||||
### Core
|
||||
- @teknium1
|
||||
|
||||
### Community
|
||||
- @austinpickett — dashboard 401 reload-loop fix (the headline), `@nous-research/ui` bump, Nix `npmDepsHash` updates
|
||||
- @benbarclay — Docker `--insecure` opt-in, MCP bare-command resolution, dashboard test repair
|
||||
- @kshitijk4poor — `/yolo` session bypass, completed-turn memory context salvage, hindsight follow-up docs
|
||||
- @nicoloboschi — hindsight `recall_types` observation default
|
||||
- @BROCCOLO1D — arm64 PR build cache fix
|
||||
- @r266-tech — `--no-supervise` reference docs
|
||||
- @yangguangjin — probe stepdown safety (salvage of @yanghd's #33673)
|
||||
- @devwdave — completed-turn memory context (credited via salvage)
|
||||
- @andrewhosf — co-author
|
||||
|
||||
### Issue Reporters (the 401 loop)
|
||||
- @routesmith ([#34206](https://github.com/NousResearch/hermes-agent/issues/34206))
|
||||
- @beeaton ([#34202](https://github.com/NousResearch/hermes-agent/issues/34202))
|
||||
|
||||
---
|
||||
|
||||
**Full Changelog**: [v2026.5.28...v2026.5.29](https://github.com/NousResearch/hermes-agent/compare/v2026.5.28...v2026.5.29)
|
||||
@@ -1,7 +1,7 @@
|
||||
{
|
||||
"id": "hermes-agent",
|
||||
"name": "Hermes Agent",
|
||||
"version": "0.15.0",
|
||||
"version": "0.15.1",
|
||||
"description": "Self-improving open-source AI agent by Nous Research with ACP editor integration, persistent memory, skills, and rich tool support.",
|
||||
"repository": "https://github.com/NousResearch/hermes-agent",
|
||||
"website": "https://hermes-agent.nousresearch.com/docs/user-guide/features/acp",
|
||||
@@ -9,7 +9,7 @@
|
||||
"license": "MIT",
|
||||
"distribution": {
|
||||
"uvx": {
|
||||
"package": "hermes-agent[acp]==0.15.0",
|
||||
"package": "hermes-agent[acp]==0.15.1",
|
||||
"args": ["hermes-acp"]
|
||||
}
|
||||
}
|
||||
|
||||
+134
-14
@@ -37,6 +37,8 @@ from __future__ import annotations
|
||||
import base64
|
||||
import logging
|
||||
import mimetypes
|
||||
import os
|
||||
import re
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional, Tuple
|
||||
|
||||
@@ -46,6 +48,102 @@ logger = logging.getLogger(__name__)
|
||||
_VALID_MODES = frozenset({"auto", "native", "text"})
|
||||
|
||||
|
||||
# Image extensions used by extract_image_refs(). Kept tight on purpose — we
|
||||
# only auto-attach things the model can actually see. Documents/archives are
|
||||
# excluded because the gateway's broader extract_local_files() also routes
|
||||
# them differently (send_document), and we don't want to attach a PDF as a
|
||||
# vision part.
|
||||
_IMAGE_EXTS = (
|
||||
".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tiff", ".tif", ".heic",
|
||||
)
|
||||
_IMAGE_EXT_PATTERN = "|".join(e.lstrip(".") for e in _IMAGE_EXTS)
|
||||
|
||||
# Absolute / home-relative local image path. Matches the same shape gateway's
|
||||
# extract_local_files() uses: anchors to ``~/`` or ``/``, ignores matches inside
|
||||
# URLs (the ``(?<![/:\w.])`` lookbehind), and case-insensitive on the extension.
|
||||
_LOCAL_IMAGE_PATH_RE = re.compile(
|
||||
r"(?<![/:\w.])(?:~/|/)(?:[\w.\-]+/)*[\w.\-]+\.(?:" + _IMAGE_EXT_PATTERN + r")\b",
|
||||
re.IGNORECASE,
|
||||
)
|
||||
|
||||
# http(s) URL ending in an image extension (optionally followed by a
|
||||
# query string). Case-insensitive on the extension. Strict ``http(s)://``
|
||||
# scheme so we don't accidentally grab ``file://`` URLs or other shapes.
|
||||
_IMAGE_URL_RE = re.compile(
|
||||
r"https?://[^\s<>\"']+?\.(?:" + _IMAGE_EXT_PATTERN + r")(?:\?[^\s<>\"']*)?",
|
||||
re.IGNORECASE,
|
||||
)
|
||||
|
||||
|
||||
def extract_image_refs(text: str) -> Tuple[List[str], List[str]]:
|
||||
"""Scan free-form text for image references the model should see.
|
||||
|
||||
Returns ``(local_paths, urls)``:
|
||||
|
||||
* ``local_paths`` — absolute (``/``) or home-relative (``~/``) paths
|
||||
whose suffix is an image extension AND whose expanded form exists
|
||||
on disk as a file. Order-preserving, deduplicated.
|
||||
* ``urls`` — ``http(s)://…`` URLs whose path ends in an image
|
||||
extension (a ``?query`` is allowed after the extension).
|
||||
Order-preserving, deduplicated.
|
||||
|
||||
Matches inside fenced code blocks (``` ``` ```) and inline backticks
|
||||
(`` `…` ``) are skipped so that snippets pasted into a task body for
|
||||
reference aren't mistaken for live attachments. This mirrors the
|
||||
behaviour of ``gateway.platforms.base.BaseAdapter.extract_local_files``.
|
||||
|
||||
Local paths are validated against the filesystem; URLs are not
|
||||
(the provider fetches them at request time).
|
||||
"""
|
||||
if not isinstance(text, str) or not text:
|
||||
return [], []
|
||||
|
||||
# Build spans covered by fenced code blocks and inline code so we can
|
||||
# ignore references the author embedded purely as example text.
|
||||
code_spans: list[tuple[int, int]] = []
|
||||
for m in re.finditer(r"```[^\n]*\n.*?```", text, re.DOTALL):
|
||||
code_spans.append((m.start(), m.end()))
|
||||
for m in re.finditer(r"`[^`\n]+`", text):
|
||||
code_spans.append((m.start(), m.end()))
|
||||
|
||||
def _in_code(pos: int) -> bool:
|
||||
return any(s <= pos < e for s, e in code_spans)
|
||||
|
||||
local_paths: list[str] = []
|
||||
seen_paths: set[str] = set()
|
||||
for match in _LOCAL_IMAGE_PATH_RE.finditer(text):
|
||||
if _in_code(match.start()):
|
||||
continue
|
||||
raw = match.group(0)
|
||||
expanded = os.path.expanduser(raw)
|
||||
try:
|
||||
if not os.path.isfile(expanded):
|
||||
continue
|
||||
except OSError:
|
||||
# ENAMETOOLONG / EINVAL on pathological inputs — skip rather than crash.
|
||||
continue
|
||||
if expanded in seen_paths:
|
||||
continue
|
||||
seen_paths.add(expanded)
|
||||
local_paths.append(expanded)
|
||||
|
||||
urls: list[str] = []
|
||||
seen_urls: set[str] = set()
|
||||
for match in _IMAGE_URL_RE.finditer(text):
|
||||
if _in_code(match.start()):
|
||||
continue
|
||||
url = match.group(0)
|
||||
# Strip trailing punctuation that's almost certainly prose, not part
|
||||
# of the URL (e.g. "see https://x.com/a.png." or "/a.png)").
|
||||
url = url.rstrip(".,;:!?)]>")
|
||||
if url in seen_urls:
|
||||
continue
|
||||
seen_urls.add(url)
|
||||
urls.append(url)
|
||||
|
||||
return local_paths, urls
|
||||
|
||||
|
||||
# Strict YAML/JSON boolean coercion for capability overrides.
|
||||
#
|
||||
# ``bool("false")`` is True in Python because non-empty strings are truthy, so
|
||||
@@ -320,20 +418,29 @@ def _file_to_data_url(path: Path) -> Optional[str]:
|
||||
def build_native_content_parts(
|
||||
user_text: str,
|
||||
image_paths: List[str],
|
||||
image_urls: Optional[List[str]] = None,
|
||||
) -> Tuple[List[Dict[str, Any]], List[str]]:
|
||||
"""Build an OpenAI-style ``content`` list for a user turn.
|
||||
|
||||
Shape:
|
||||
[{"type": "text", "text": "...\\n\\n[Image attached at: /local/path]"},
|
||||
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
|
||||
{"type": "image_url", "image_url": {"url": "https://example.com/a.png"}},
|
||||
...]
|
||||
|
||||
The local path of each successfully attached image is appended to the
|
||||
text part as ``[Image attached at: <path>]``. The model still sees the
|
||||
pixels via the ``image_url`` part (full native vision); the path note
|
||||
just gives it a string handle so MCP/skill tools that take an image
|
||||
path or URL argument can be invoked on the same image without an
|
||||
extra round-trip. This parallels the text-mode hint produced by
|
||||
Local paths are read from disk and embedded as base64 ``data:`` URLs.
|
||||
Remote URLs (``http(s)://``) are passed through verbatim — the provider
|
||||
fetches them server-side. The model still sees the pixels either way.
|
||||
|
||||
For each successfully attached image, a hint is appended to the text
|
||||
part:
|
||||
|
||||
* local path → ``[Image attached at: <path>]``
|
||||
* URL → ``[Image attached: <url>]``
|
||||
|
||||
The hint gives the model a string handle so MCP/skill tools that take
|
||||
an image path or URL argument can be invoked on the same image without
|
||||
an extra round-trip. This parallels the text-mode hint produced by
|
||||
``Runner._enrich_message_with_vision`` (``vision_analyze using image_url:
|
||||
<path>``) so behaviour is consistent across both image input modes.
|
||||
|
||||
@@ -342,12 +449,14 @@ def build_native_content_parts(
|
||||
ceiling), the agent's retry loop transparently shrinks and retries
|
||||
once — see ``run_agent._try_shrink_image_parts_in_messages``.
|
||||
|
||||
Returns (content_parts, skipped_paths). Skipped paths are files that
|
||||
couldn't be read from disk and are NOT advertised in the path hints.
|
||||
Returns (content_parts, skipped). Skipped entries are local paths
|
||||
that couldn't be read from disk; URLs are never skipped (they're
|
||||
not validated here).
|
||||
"""
|
||||
skipped: List[str] = []
|
||||
image_parts: List[Dict[str, Any]] = []
|
||||
attached_paths: List[str] = []
|
||||
attached_urls: List[str] = []
|
||||
|
||||
for raw_path in image_paths:
|
||||
p = Path(raw_path)
|
||||
@@ -364,16 +473,26 @@ def build_native_content_parts(
|
||||
})
|
||||
attached_paths.append(str(raw_path))
|
||||
|
||||
for url in image_urls or []:
|
||||
url = (url or "").strip()
|
||||
if not url:
|
||||
continue
|
||||
image_parts.append({
|
||||
"type": "image_url",
|
||||
"image_url": {"url": url},
|
||||
})
|
||||
attached_urls.append(url)
|
||||
|
||||
text = (user_text or "").strip()
|
||||
|
||||
# If at least one image attached, build a single text part that combines
|
||||
# the user's caption (or a neutral default) with one path hint per image.
|
||||
if attached_paths:
|
||||
# the user's caption (or a neutral default) with one hint per image.
|
||||
if attached_paths or attached_urls:
|
||||
base_text = text or "What do you see in this image?"
|
||||
path_hints = "\n".join(
|
||||
f"[Image attached at: {p}]" for p in attached_paths
|
||||
)
|
||||
combined_text = f"{base_text}\n\n{path_hints}"
|
||||
hint_lines: List[str] = []
|
||||
hint_lines.extend(f"[Image attached at: {p}]" for p in attached_paths)
|
||||
hint_lines.extend(f"[Image attached: {u}]" for u in attached_urls)
|
||||
combined_text = f"{base_text}\n\n" + "\n".join(hint_lines)
|
||||
parts: List[Dict[str, Any]] = [{"type": "text", "text": combined_text}]
|
||||
parts.extend(image_parts)
|
||||
return parts, skipped
|
||||
@@ -388,4 +507,5 @@ def build_native_content_parts(
|
||||
__all__ = [
|
||||
"decide_image_input_mode",
|
||||
"build_native_content_parts",
|
||||
"extract_image_refs",
|
||||
]
|
||||
|
||||
@@ -576,6 +576,8 @@ def load_cli_config() -> Dict[str, Any]:
|
||||
"docker_env": "TERMINAL_DOCKER_ENV",
|
||||
"docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
|
||||
"docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
|
||||
"docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
|
||||
"docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
|
||||
"sandbox_dir": "TERMINAL_SANDBOX_DIR",
|
||||
# Persistent shell (non-local backends)
|
||||
"persistent_shell": "TERMINAL_PERSISTENT_SHELL",
|
||||
@@ -15125,13 +15127,50 @@ def main(
|
||||
# Handle single query mode
|
||||
if query or image:
|
||||
query, single_query_images = _collect_query_images(query, image)
|
||||
# Kanban workers spawn with ``hermes chat -q "work kanban task <id>"``;
|
||||
# the actual task description lives in the task body. Mirror the
|
||||
# gateway/CLI behaviour for inbound images by scanning the body for
|
||||
# local image paths and http(s) image URLs and attaching them to the
|
||||
# worker's first turn. Without this, users who paste a screenshot
|
||||
# path or URL into a kanban task body never get it routed to the
|
||||
# model's vision input.
|
||||
single_query_image_urls: list[str] = []
|
||||
_kanban_task_id = os.environ.get("HERMES_KANBAN_TASK", "").strip()
|
||||
if _kanban_task_id:
|
||||
try:
|
||||
from hermes_cli import kanban_db as _kb
|
||||
from agent.image_routing import extract_image_refs as _extract_refs
|
||||
|
||||
_conn = _kb.connect()
|
||||
try:
|
||||
_task = _kb.get_task(_conn, _kanban_task_id)
|
||||
finally:
|
||||
try:
|
||||
_conn.close()
|
||||
except Exception:
|
||||
pass
|
||||
_body = getattr(_task, "body", "") if _task is not None else ""
|
||||
if _body:
|
||||
_kb_paths, _kb_urls = _extract_refs(_body)
|
||||
if _kb_paths:
|
||||
# Dedupe against any --image the user already passed.
|
||||
_seen = {str(p) for p in single_query_images}
|
||||
for _p in _kb_paths:
|
||||
if _p not in _seen:
|
||||
_seen.add(_p)
|
||||
single_query_images.append(Path(_p))
|
||||
if _kb_urls:
|
||||
single_query_image_urls.extend(_kb_urls)
|
||||
except Exception as _exc:
|
||||
# Best-effort enrichment; never block worker startup on it.
|
||||
logger.debug("kanban image-ref extraction failed: %s", _exc)
|
||||
if quiet:
|
||||
# Quiet mode: suppress banner, spinner, tool previews.
|
||||
# Only print the final response and parseable session info.
|
||||
cli.tool_progress_mode = "off"
|
||||
if cli._ensure_runtime_credentials():
|
||||
effective_query: Any = query
|
||||
if single_query_images:
|
||||
if single_query_images or single_query_image_urls:
|
||||
# Honour the same image-routing decision used by the
|
||||
# interactive path. With a vision-capable model (incl.
|
||||
# custom-provider models declared via
|
||||
@@ -15160,19 +15199,26 @@ def main(
|
||||
_parts, _skipped = _build_parts(
|
||||
query if isinstance(query, str) else "",
|
||||
[str(p) for p in single_query_images],
|
||||
image_urls=list(single_query_image_urls) or None,
|
||||
)
|
||||
if any(p.get("type") == "image_url" for p in _parts):
|
||||
effective_query = _parts
|
||||
else:
|
||||
# All images unreadable — text fallback.
|
||||
# ``_preprocess_images_with_vision`` only knows
|
||||
# about local files; URLs would be lost there,
|
||||
# so keep the original query text intact when
|
||||
# only URLs were supplied.
|
||||
if single_query_images:
|
||||
effective_query = cli._preprocess_images_with_vision(
|
||||
query, single_query_images, announce=False,
|
||||
)
|
||||
except Exception:
|
||||
if single_query_images:
|
||||
effective_query = cli._preprocess_images_with_vision(
|
||||
query, single_query_images, announce=False,
|
||||
)
|
||||
except Exception:
|
||||
effective_query = cli._preprocess_images_with_vision(
|
||||
query, single_query_images, announce=False,
|
||||
)
|
||||
else:
|
||||
elif single_query_images:
|
||||
effective_query = cli._preprocess_images_with_vision(
|
||||
query,
|
||||
single_query_images,
|
||||
|
||||
@@ -831,6 +831,8 @@ if _config_path.exists():
|
||||
"docker_env": "TERMINAL_DOCKER_ENV",
|
||||
"docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
|
||||
"docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
|
||||
"docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
|
||||
"docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
|
||||
"sandbox_dir": "TERMINAL_SANDBOX_DIR",
|
||||
"persistent_shell": "TERMINAL_PERSISTENT_SHELL",
|
||||
}
|
||||
|
||||
@@ -14,8 +14,8 @@ Provides subcommands for:
|
||||
import os
|
||||
import sys
|
||||
|
||||
__version__ = "0.15.0"
|
||||
__release_date__ = "2026.5.28"
|
||||
__version__ = "0.15.1"
|
||||
__release_date__ = "2026.5.29"
|
||||
|
||||
|
||||
def _ensure_utf8():
|
||||
|
||||
@@ -5551,6 +5551,8 @@ def set_config_value(key: str, value: str):
|
||||
"terminal.daytona_image": "TERMINAL_DAYTONA_IMAGE",
|
||||
"terminal.docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
|
||||
"terminal.docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
|
||||
"terminal.docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
|
||||
"terminal.docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
|
||||
"terminal.docker_env": "TERMINAL_DOCKER_ENV",
|
||||
# terminal.cwd intentionally excluded — CLI resolves at runtime,
|
||||
# gateway bridges it in gateway/run.py. Persisting to .env causes
|
||||
|
||||
@@ -26,10 +26,15 @@ from hermes_cli.dashboard_auth import list_providers
|
||||
from hermes_cli.dashboard_auth.audit import AuditEvent, audit_log
|
||||
from hermes_cli.dashboard_auth.base import ProviderError
|
||||
from hermes_cli.dashboard_auth.cookies import read_session_cookies
|
||||
from hermes_cli.dashboard_auth.public_paths import PUBLIC_API_PATHS
|
||||
|
||||
_log = logging.getLogger(__name__)
|
||||
|
||||
# Paths that bypass the auth gate. Order matters: prefix match.
|
||||
# Prefixes that bypass the auth gate. Match via ``path == prefix`` or
|
||||
# ``path.startswith(prefix)`` — so ``/assets/`` (with trailing slash)
|
||||
# matches ``/assets/foo.css`` but not ``/assetsleak``. Auth-bootstrap
|
||||
# (login page, OAuth round trip, provider listing) and static asset
|
||||
# mounts go here.
|
||||
_GATE_PUBLIC_PREFIXES: tuple[str, ...] = (
|
||||
"/auth/login",
|
||||
"/auth/callback",
|
||||
@@ -45,6 +50,20 @@ _GATE_PUBLIC_PREFIXES: tuple[str, ...] = (
|
||||
|
||||
|
||||
def _path_is_public(path: str) -> bool:
|
||||
"""True if ``path`` bypasses the OAuth auth gate.
|
||||
|
||||
Two sources of public-ness:
|
||||
|
||||
* :data:`PUBLIC_API_PATHS` — the shared ``/api/*`` allowlist that
|
||||
the legacy ``_SESSION_TOKEN`` middleware also honours. Matched
|
||||
exactly (no prefix expansion) so adding ``/api/status`` doesn't
|
||||
accidentally expose ``/api/status/secret-extension``.
|
||||
* :data:`_GATE_PUBLIC_PREFIXES` — auth-bootstrap routes and static
|
||||
mounts. Prefix-matched so ``/assets/foo.css`` lights up via
|
||||
``/assets/``.
|
||||
"""
|
||||
if path in PUBLIC_API_PATHS:
|
||||
return True
|
||||
return any(
|
||||
path == prefix or path.startswith(prefix)
|
||||
for prefix in _GATE_PUBLIC_PREFIXES
|
||||
|
||||
@@ -0,0 +1,49 @@
|
||||
"""Shared allowlist of ``/api/*`` paths that bypass dashboard auth.
|
||||
|
||||
Two middlewares enforce dashboard auth and previously kept independent
|
||||
copies of this list:
|
||||
|
||||
* ``hermes_cli.web_server.auth_middleware`` — loopback / ``--insecure``
|
||||
mode, gates on the ephemeral ``_SESSION_TOKEN``.
|
||||
* ``hermes_cli.dashboard_auth.middleware.gated_auth_middleware`` —
|
||||
non-loopback mode, gates on the OAuth session cookie.
|
||||
|
||||
When the lists drifted, ``/api/status`` ended up public under the legacy
|
||||
gate but 401'd under the OAuth gate. That broke the portal's wildcard
|
||||
liveness probe (``nous-account-service`` ``fly-provider.ts``
|
||||
``getInstanceRuntimeStatus``), which fetches ``/api/status`` without a
|
||||
cookie as its sole signal of "agent dashboard is alive": every healthy
|
||||
wildcard-subdomain agent surfaced as STARTING/down in the portal UI even
|
||||
though the dashboard was serving correctly.
|
||||
|
||||
Centralising the allowlist here so both middlewares import the same
|
||||
frozenset prevents the next drift. Keep this list minimal — only truly
|
||||
non-sensitive, read-only endpoints belong here. As a sanity check, every
|
||||
entry should be safe to expose to:
|
||||
|
||||
* external uptime probes (Pingdom, Better Stack, NAS),
|
||||
* the dashboard SPA before the user has logged in,
|
||||
* anyone who happens to ``curl`` the hostname.
|
||||
|
||||
If a new endpoint doesn't pass all three tests, it should be gated and
|
||||
the SPA should bootstrap it after login instead.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
PUBLIC_API_PATHS: frozenset[str] = frozenset({
|
||||
# Liveness probe target. Returns version, gateway state, active
|
||||
# session count, and the dashboard auth-gate shape. No bodies, no
|
||||
# session content, no secrets. Documented as the portal's wildcard
|
||||
# liveness probe in
|
||||
# ``docs/agent-dashboard-public-url-contract.md`` (NAS side).
|
||||
"/api/status",
|
||||
# Read-only config-defaults / schema feeds for the SPA's Config page.
|
||||
"/api/config/defaults",
|
||||
"/api/config/schema",
|
||||
# Read-only model metadata (context windows, etc.) — same shape as
|
||||
# provider catalogs already exposed on the public internet.
|
||||
"/api/model/info",
|
||||
# Read-only theme + plugin manifests for the dashboard skin engine.
|
||||
"/api/dashboard/themes",
|
||||
"/api/dashboard/plugins",
|
||||
})
|
||||
+13
-10
@@ -110,17 +110,20 @@ app.add_middleware(
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Endpoints that do NOT require the session token. Everything else under
|
||||
# /api/ is gated by the auth middleware below. Keep this list minimal —
|
||||
# only truly non-sensitive, read-only endpoints belong here.
|
||||
# /api/ is gated by the auth middleware below.
|
||||
#
|
||||
# This list is defined in ``hermes_cli.dashboard_auth.public_paths`` so the
|
||||
# OAuth gate middleware can honour the same allowlist — keeping the two
|
||||
# gates in lockstep avoids drift like the wildcard-subdomain regression
|
||||
# where ``/api/status`` was public under the legacy gate but 401'd under
|
||||
# the OAuth gate (breaking the portal's liveness probe).
|
||||
#
|
||||
# Keep the upstream list minimal — only truly non-sensitive, read-only
|
||||
# endpoints belong there.
|
||||
# ---------------------------------------------------------------------------
|
||||
_PUBLIC_API_PATHS: frozenset = frozenset({
|
||||
"/api/status",
|
||||
"/api/config/defaults",
|
||||
"/api/config/schema",
|
||||
"/api/model/info",
|
||||
"/api/dashboard/themes",
|
||||
"/api/dashboard/plugins",
|
||||
})
|
||||
from hermes_cli.dashboard_auth.public_paths import (
|
||||
PUBLIC_API_PATHS as _PUBLIC_API_PATHS,
|
||||
)
|
||||
|
||||
|
||||
def _has_valid_session_token(request: Request) -> bool:
|
||||
|
||||
+1
-1
@@ -4,7 +4,7 @@ let
|
||||
src = ../web;
|
||||
npmDeps = pkgs.fetchNpmDeps {
|
||||
inherit src;
|
||||
hash = "sha256-6qhGuifHVtCeep1SiQdCUxBMr7UGhYpdMTvXhrQu/zA=";
|
||||
hash = "sha256-HV0aISBVjwbGqDj8qQynSxGFrrZDzuYAW3D3lB/x3zo=";
|
||||
};
|
||||
|
||||
npm = hermesNpmLib.mkNpmPassthru { folder = "web"; attr = "web"; pname = "hermes-web"; };
|
||||
|
||||
+1
-1
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
|
||||
|
||||
[project]
|
||||
name = "hermes-agent"
|
||||
version = "0.15.0"
|
||||
version = "0.15.1"
|
||||
description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
|
||||
readme = "README.md"
|
||||
requires-python = ">=3.11"
|
||||
|
||||
@@ -16,6 +16,7 @@ from agent.image_routing import (
|
||||
_supports_vision_override,
|
||||
build_native_content_parts,
|
||||
decide_image_input_mode,
|
||||
extract_image_refs,
|
||||
)
|
||||
|
||||
|
||||
@@ -449,3 +450,190 @@ class TestLargeImageHandling:
|
||||
assert len(parts) == 2
|
||||
assert parts[0]["type"] == "text"
|
||||
assert parts[1]["type"] == "image_url"
|
||||
|
||||
|
||||
# ─── extract_image_refs ──────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestExtractImageRefs:
|
||||
"""Scan task body / inbound text for image paths and URLs (kanban worker
|
||||
enrichment, issue raised May 2026)."""
|
||||
|
||||
def test_empty_or_none_returns_empty(self):
|
||||
assert extract_image_refs("") == ([], [])
|
||||
assert extract_image_refs(None) == ([], []) # type: ignore[arg-type]
|
||||
|
||||
def test_finds_absolute_path(self, tmp_path: Path):
|
||||
img = tmp_path / "screenshot.png"
|
||||
img.write_bytes(_png_bytes())
|
||||
body = f"Look at {img} and tell me what's wrong."
|
||||
paths, urls = extract_image_refs(body)
|
||||
assert paths == [str(img)]
|
||||
assert urls == []
|
||||
|
||||
def test_finds_home_relative_path(self, tmp_path: Path, monkeypatch):
|
||||
# Simulate ~/foo.png by pointing HOME at tmp_path and creating the file
|
||||
monkeypatch.setenv("HOME", str(tmp_path))
|
||||
img = tmp_path / "foo.png"
|
||||
img.write_bytes(_png_bytes())
|
||||
paths, urls = extract_image_refs("see ~/foo.png please")
|
||||
assert paths == [str(img)]
|
||||
assert urls == []
|
||||
|
||||
def test_skips_nonexistent_paths(self, tmp_path: Path):
|
||||
# Path-shaped but no file on disk → skipped.
|
||||
body = f"What's at {tmp_path}/never_created.png ?"
|
||||
paths, urls = extract_image_refs(body)
|
||||
assert paths == []
|
||||
assert urls == []
|
||||
|
||||
def test_finds_http_image_url(self):
|
||||
body = "Check out https://example.com/photos/cat.png — cute right?"
|
||||
paths, urls = extract_image_refs(body)
|
||||
assert paths == []
|
||||
assert urls == ["https://example.com/photos/cat.png"]
|
||||
|
||||
def test_finds_https_url_with_query_string(self):
|
||||
body = "Diagram: https://cdn.example.com/img.jpeg?size=large&v=2 here"
|
||||
paths, urls = extract_image_refs(body)
|
||||
assert urls == ["https://cdn.example.com/img.jpeg?size=large&v=2"]
|
||||
|
||||
def test_url_trailing_punctuation_stripped(self):
|
||||
# Prose punctuation right after the URL must not be part of the URL.
|
||||
body = "See https://example.com/a.png."
|
||||
paths, urls = extract_image_refs(body)
|
||||
assert urls == ["https://example.com/a.png"]
|
||||
|
||||
def test_ignores_non_image_urls(self):
|
||||
body = "See https://example.com/page.html and https://x.com/y.pdf"
|
||||
paths, urls = extract_image_refs(body)
|
||||
assert urls == []
|
||||
|
||||
def test_dedupes_paths_and_urls(self, tmp_path: Path):
|
||||
img = tmp_path / "dup.png"
|
||||
img.write_bytes(_png_bytes())
|
||||
body = (
|
||||
f"First {img} then again {img}. "
|
||||
"Also https://example.com/x.png and https://example.com/x.png again."
|
||||
)
|
||||
paths, urls = extract_image_refs(body)
|
||||
assert paths == [str(img)]
|
||||
assert urls == ["https://example.com/x.png"]
|
||||
|
||||
def test_ignores_paths_in_fenced_code_block(self, tmp_path: Path):
|
||||
img = tmp_path / "real.png"
|
||||
img.write_bytes(_png_bytes())
|
||||
body = (
|
||||
"Outside the block, attach this:\n"
|
||||
f"{img}\n"
|
||||
"But not these examples:\n"
|
||||
"```\n"
|
||||
f"some_other_image: /tmp/example.png\n"
|
||||
f"url: https://example.com/example.png\n"
|
||||
"```\n"
|
||||
)
|
||||
paths, urls = extract_image_refs(body)
|
||||
assert paths == [str(img)]
|
||||
assert urls == []
|
||||
|
||||
def test_ignores_paths_in_inline_code(self, tmp_path: Path):
|
||||
img = tmp_path / "real.jpg"
|
||||
img.write_bytes(_png_bytes())
|
||||
body = (
|
||||
f"Attach {img}, but ignore the example "
|
||||
"`https://example.com/skip.png` in backticks."
|
||||
)
|
||||
paths, urls = extract_image_refs(body)
|
||||
assert paths == [str(img)]
|
||||
assert urls == []
|
||||
|
||||
def test_does_not_match_paths_inside_urls(self, tmp_path: Path):
|
||||
# The lookbehind in the regex prevents matching the path-portion of
|
||||
# a URL as a local path. Only the URL should be detected.
|
||||
body = "Just the URL: https://example.com/some/dir/image.png"
|
||||
paths, urls = extract_image_refs(body)
|
||||
assert paths == []
|
||||
assert urls == ["https://example.com/some/dir/image.png"]
|
||||
|
||||
def test_mixed_paths_and_urls(self, tmp_path: Path):
|
||||
img = tmp_path / "local.png"
|
||||
img.write_bytes(_png_bytes())
|
||||
body = (
|
||||
f"Compare local {img} against the design at "
|
||||
"https://example.com/design/v2.png — does it match?"
|
||||
)
|
||||
paths, urls = extract_image_refs(body)
|
||||
assert paths == [str(img)]
|
||||
assert urls == ["https://example.com/design/v2.png"]
|
||||
|
||||
def test_case_insensitive_extension(self, tmp_path: Path):
|
||||
img = tmp_path / "shouty.PNG"
|
||||
img.write_bytes(_png_bytes())
|
||||
body = f"see {img}"
|
||||
paths, urls = extract_image_refs(body)
|
||||
assert paths == [str(img)]
|
||||
|
||||
|
||||
# ─── build_native_content_parts with URLs ────────────────────────────────────
|
||||
|
||||
|
||||
class TestBuildNativeContentPartsURLs:
|
||||
"""URL pass-through support added so kanban task bodies (and other
|
||||
inbound surfaces) can route remote image URLs straight to the model."""
|
||||
|
||||
def test_url_only_no_local_paths(self):
|
||||
parts, skipped = build_native_content_parts(
|
||||
"what is this?",
|
||||
[],
|
||||
image_urls=["https://example.com/diagram.png"],
|
||||
)
|
||||
assert skipped == []
|
||||
assert len(parts) == 2
|
||||
assert parts[0]["type"] == "text"
|
||||
assert "[Image attached: https://example.com/diagram.png]" in parts[0]["text"]
|
||||
assert parts[0]["text"].startswith("what is this?")
|
||||
assert parts[1] == {
|
||||
"type": "image_url",
|
||||
"image_url": {"url": "https://example.com/diagram.png"},
|
||||
}
|
||||
|
||||
def test_mixed_path_and_url(self, tmp_path: Path):
|
||||
img = tmp_path / "local.png"
|
||||
img.write_bytes(_png_bytes())
|
||||
parts, skipped = build_native_content_parts(
|
||||
"compare these",
|
||||
[str(img)],
|
||||
image_urls=["https://example.com/remote.jpg"],
|
||||
)
|
||||
assert skipped == []
|
||||
# 1 text + 2 image parts (local data URL first, then remote URL).
|
||||
image_parts = [p for p in parts if p.get("type") == "image_url"]
|
||||
assert len(image_parts) == 2
|
||||
assert image_parts[0]["image_url"]["url"].startswith("data:image/png;base64,")
|
||||
assert image_parts[1]["image_url"]["url"] == "https://example.com/remote.jpg"
|
||||
text = parts[0]["text"]
|
||||
assert "[Image attached at:" in text
|
||||
assert "[Image attached: https://example.com/remote.jpg]" in text
|
||||
|
||||
def test_empty_url_list_is_no_op(self, tmp_path: Path):
|
||||
img = tmp_path / "x.png"
|
||||
img.write_bytes(_png_bytes())
|
||||
# image_urls=[] should behave the same as not passing it at all.
|
||||
parts_no_urls, _ = build_native_content_parts("hi", [str(img)])
|
||||
parts_empty_urls, _ = build_native_content_parts("hi", [str(img)], image_urls=[])
|
||||
assert parts_no_urls == parts_empty_urls
|
||||
|
||||
def test_blank_url_strings_are_dropped(self):
|
||||
parts, _ = build_native_content_parts(
|
||||
"x", [], image_urls=["", " ", "https://example.com/a.png"]
|
||||
)
|
||||
image_parts = [p for p in parts if p.get("type") == "image_url"]
|
||||
assert len(image_parts) == 1
|
||||
assert image_parts[0]["image_url"]["url"] == "https://example.com/a.png"
|
||||
|
||||
def test_url_only_inserts_default_prompt_when_text_empty(self):
|
||||
parts, _ = build_native_content_parts(
|
||||
"", [], image_urls=["https://example.com/a.png"]
|
||||
)
|
||||
assert parts[0]["type"] == "text"
|
||||
assert parts[0]["text"].startswith("What do you see in this image?")
|
||||
|
||||
@@ -227,6 +227,8 @@ _HERMES_BEHAVIORAL_VARS = frozenset({
|
||||
"TERMINAL_CONTAINER_DISK",
|
||||
"TERMINAL_CONTAINER_MEMORY",
|
||||
"TERMINAL_CONTAINER_PERSISTENT",
|
||||
"TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
|
||||
"TERMINAL_DOCKER_ORPHAN_REAPER",
|
||||
"TERMINAL_DOCKER_RUN_AS_HOST_USER",
|
||||
"BROWSER_CDP_URL",
|
||||
"CAMOFOX_URL",
|
||||
|
||||
+134
-26
@@ -88,7 +88,15 @@ def test_dashboard_slot_reports_up_when_enabled(
|
||||
"""Symmetry: with HERMES_DASHBOARD=1, s6-svstat reports the slot as up."""
|
||||
subprocess.run(
|
||||
["docker", "run", "-d", "--name", container_name,
|
||||
"-e", "HERMES_DASHBOARD=1", built_image, "sleep", "120"],
|
||||
"-e", "HERMES_DASHBOARD=1",
|
||||
# The default dashboard host is 0.0.0.0, which now engages the
|
||||
# OAuth auth gate. Without a provider registered (no
|
||||
# HERMES_DASHBOARD_OAUTH_CLIENT_ID in this test env), start_server
|
||||
# would fail closed and the slot would never come up. Pin the
|
||||
# explicit insecure opt-in to keep this test focused on the s6
|
||||
# supervision contract, not the auth gate.
|
||||
"-e", "HERMES_DASHBOARD_INSECURE=1",
|
||||
built_image, "sleep", "120"],
|
||||
check=True, capture_output=True, timeout=30,
|
||||
)
|
||||
# uvicorn takes a moment to bind; poll svstat.
|
||||
@@ -113,7 +121,12 @@ def test_dashboard_opt_in_starts(
|
||||
"""With HERMES_DASHBOARD=1, a dashboard process should be visible."""
|
||||
subprocess.run(
|
||||
["docker", "run", "-d", "--name", container_name,
|
||||
"-e", "HERMES_DASHBOARD=1", built_image, "sleep", "120"],
|
||||
"-e", "HERMES_DASHBOARD=1",
|
||||
# Default bind is 0.0.0.0; pin insecure opt-in so the auth gate
|
||||
# doesn't fail-closed before the process can come up. See
|
||||
# test_dashboard_slot_reports_up_when_enabled for the full rationale.
|
||||
"-e", "HERMES_DASHBOARD_INSECURE=1",
|
||||
built_image, "sleep", "120"],
|
||||
check=True, capture_output=True, timeout=30,
|
||||
)
|
||||
# Poll for the dashboard subprocess to appear — the entrypoint
|
||||
@@ -132,6 +145,10 @@ def test_dashboard_port_override(
|
||||
subprocess.run(
|
||||
["docker", "run", "-d", "--name", container_name,
|
||||
"-e", "HERMES_DASHBOARD=1", "-e", "HERMES_DASHBOARD_PORT=9120",
|
||||
# Default bind is 0.0.0.0; pin insecure opt-in so the auth gate
|
||||
# doesn't fail-closed before the port is bound. See
|
||||
# test_dashboard_slot_reports_up_when_enabled for the full rationale.
|
||||
"-e", "HERMES_DASHBOARD_INSECURE=1",
|
||||
built_image, "sleep", "120"],
|
||||
check=True, capture_output=True, timeout=30,
|
||||
)
|
||||
@@ -161,7 +178,13 @@ def test_dashboard_restarts_after_crash(
|
||||
"""
|
||||
subprocess.run(
|
||||
["docker", "run", "-d", "--name", container_name,
|
||||
"-e", "HERMES_DASHBOARD=1", built_image, "sleep", "120"],
|
||||
"-e", "HERMES_DASHBOARD=1",
|
||||
# Default bind is 0.0.0.0; pin insecure opt-in so the auth gate
|
||||
# doesn't fail-closed before the supervised dashboard can come up.
|
||||
# See test_dashboard_slot_reports_up_when_enabled for the full
|
||||
# rationale.
|
||||
"-e", "HERMES_DASHBOARD_INSECURE=1",
|
||||
built_image, "sleep", "120"],
|
||||
check=True, capture_output=True, timeout=30,
|
||||
)
|
||||
# Wait for the first dashboard to come up.
|
||||
@@ -214,36 +237,67 @@ def test_dashboard_restarts_after_crash(
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _fetch_api_status(container: str, *, deadline_s: float = 60.0) -> dict:
|
||||
"""Poll ``/api/status`` from inside the container via the venv python.
|
||||
def _http_probe(
|
||||
container: str,
|
||||
path: str,
|
||||
*,
|
||||
deadline_s: float = 60.0,
|
||||
) -> tuple[int, str]:
|
||||
"""Poll ``http://127.0.0.1:9119<path>`` from inside the container.
|
||||
|
||||
The dashboard binds to ``HERMES_DASHBOARD_HOST`` (typically ``0.0.0.0``)
|
||||
so loopback inside the container works. The image doesn't ship
|
||||
``curl`` but Python's stdlib ``urllib`` is good enough.
|
||||
Returns ``(status_code, body)`` as soon as the dashboard answers any
|
||||
HTTP response — 200, 401, 503, anything. The image doesn't ship
|
||||
``curl`` but the venv's stdlib ``urllib`` is good enough; we use a
|
||||
proper ``try``/``except`` to intercept ``HTTPError`` because
|
||||
``urlopen`` raises on 4xx/5xx, and we treat those as legitimate
|
||||
responses (the OAuth gate's 401 IS the success signal for the
|
||||
gate-engaged test).
|
||||
|
||||
Returns the decoded JSON dict on success; raises AssertionError on
|
||||
timeout.
|
||||
Connection errors (uvicorn still starting, fail-closed exited) keep
|
||||
the poll loop running until ``deadline_s`` elapses.
|
||||
|
||||
The probe Python program is fed over stdin (``python -``) rather
|
||||
than ``python -c`` so we can use proper multi-line syntax with
|
||||
``try``/``except`` blocks without escaping hell.
|
||||
|
||||
Raises ``AssertionError`` on timeout.
|
||||
"""
|
||||
py_program = f"""\
|
||||
import urllib.request, urllib.error
|
||||
req = urllib.request.Request("http://127.0.0.1:9119{path}")
|
||||
try:
|
||||
r = urllib.request.urlopen(req, timeout=5)
|
||||
print(r.status)
|
||||
print(r.read().decode(), end="")
|
||||
except urllib.error.HTTPError as h:
|
||||
print(h.code)
|
||||
print(h.read().decode(), end="")
|
||||
"""
|
||||
# Feed the program over stdin via a heredoc so docker_exec_sh's
|
||||
# single bash string stays clean. The 'PY' delimiter is quoted to
|
||||
# disable shell expansion inside the heredoc body.
|
||||
probe = (
|
||||
"/opt/hermes/.venv/bin/python -c "
|
||||
"'import json,urllib.request as u;"
|
||||
"print(u.urlopen(\"http://127.0.0.1:9119/api/status\",timeout=5)"
|
||||
".read().decode())'"
|
||||
"/opt/hermes/.venv/bin/python - <<'PY'\n"
|
||||
f"{py_program}"
|
||||
"PY"
|
||||
)
|
||||
end = time.monotonic() + deadline_s
|
||||
last_err = ""
|
||||
while time.monotonic() < end:
|
||||
r = docker_exec_sh(container, probe, timeout=10)
|
||||
if r.returncode == 0 and r.stdout.strip():
|
||||
lines = r.stdout.split("\n", 1)
|
||||
try:
|
||||
return json.loads(r.stdout)
|
||||
except (ValueError, json.JSONDecodeError) as exc: # noqa: F841
|
||||
last_err = f"json parse: {exc!r} / stdout={r.stdout!r}"
|
||||
status = int(lines[0].strip())
|
||||
body = lines[1] if len(lines) > 1 else ""
|
||||
return status, body
|
||||
except (ValueError, IndexError) as exc:
|
||||
last_err = f"parse: {exc!r} / stdout={r.stdout!r}"
|
||||
else:
|
||||
last_err = f"rc={r.returncode} stderr={r.stderr!r}"
|
||||
time.sleep(0.5)
|
||||
raise AssertionError(
|
||||
f"/api/status never returned valid JSON within {deadline_s}s; "
|
||||
f"Probe of {path} never returned HTTP within {deadline_s}s; "
|
||||
f"last error: {last_err}"
|
||||
)
|
||||
|
||||
@@ -263,6 +317,21 @@ def test_dashboard_oauth_gate_engages_on_non_loopback_bind(
|
||||
flipped ``--insecure`` on for any non-loopback bind, which routed
|
||||
``start_server`` straight back into the legacy ``allow_public=True``
|
||||
branch and disabled the gate every time.
|
||||
|
||||
We verify two independent observable consequences of the gate being
|
||||
on:
|
||||
|
||||
1. ``/api/auth/providers`` (publicly reachable through the gate so
|
||||
the login page can bootstrap) returns 200 with ``nous`` in the
|
||||
provider list — proves the bundled provider registered.
|
||||
2. ``/api/sessions`` (a gated route under both the legacy
|
||||
``_SESSION_TOKEN`` middleware and the OAuth gate) returns 401
|
||||
to an unauthenticated caller — proves the OAuth gate is actively
|
||||
intercepting browser traffic. We deliberately probe a gated route
|
||||
here rather than ``/api/status``: status sits in the shared
|
||||
``PUBLIC_API_PATHS`` allowlist (portal liveness probe target) and
|
||||
responds 200 without a cookie under both gates, so it cannot
|
||||
distinguish "gate on" from "gate off".
|
||||
"""
|
||||
subprocess.run(
|
||||
["docker", "run", "-d", "--name", container_name,
|
||||
@@ -272,15 +341,45 @@ def test_dashboard_oauth_gate_engages_on_non_loopback_bind(
|
||||
built_image, "sleep", "120"],
|
||||
check=True, capture_output=True, timeout=30,
|
||||
)
|
||||
status = _fetch_api_status(container_name)
|
||||
assert status.get("auth_required") is True, (
|
||||
"OAuth gate must be engaged on 0.0.0.0 bind when a provider is "
|
||||
"registered and HERMES_DASHBOARD_INSECURE is unset. Got: "
|
||||
f"{status!r}"
|
||||
|
||||
# (1) Provider registry visible via the public bootstrap endpoint.
|
||||
status_code, body = _http_probe(container_name, "/api/auth/providers")
|
||||
assert status_code == 200, (
|
||||
f"/api/auth/providers should return 200 when a provider is "
|
||||
f"registered; got {status_code} body={body!r}"
|
||||
)
|
||||
assert "nous" in status.get("auth_providers", []), (
|
||||
payload = json.loads(body)
|
||||
provider_names = [p.get("name") for p in payload.get("providers", [])]
|
||||
assert "nous" in provider_names, (
|
||||
"Bundled dashboard_auth/nous provider should register when "
|
||||
f"HERMES_DASHBOARD_OAUTH_CLIENT_ID is set. Got: {status!r}"
|
||||
f"HERMES_DASHBOARD_OAUTH_CLIENT_ID is set. Got: {payload!r}"
|
||||
)
|
||||
|
||||
# (2) A gated route (``/api/sessions``) returns 401 to an
|
||||
# unauthenticated caller — the OAuth gate is intercepting.
|
||||
status_code, body = _http_probe(container_name, "/api/sessions")
|
||||
assert status_code == 401, (
|
||||
"OAuth gate must intercept gated /api/* routes on 0.0.0.0 bind "
|
||||
"when a provider is registered and HERMES_DASHBOARD_INSECURE "
|
||||
f"is unset. Got: status={status_code} body={body!r}"
|
||||
)
|
||||
|
||||
# (3) ``/api/status`` remains 200 under the gate — it's in the shared
|
||||
# ``PUBLIC_API_PATHS`` allowlist so NAS's wildcard-subdomain
|
||||
# liveness probe (``fly-provider.ts`` ``getInstanceRuntimeStatus``)
|
||||
# can reach it without a cookie. Regression guard: this allowlist
|
||||
# drifted once already and surfaced every healthy agent as
|
||||
# STARTING/down in the portal UI.
|
||||
status_code, body = _http_probe(container_name, "/api/status")
|
||||
assert status_code == 200, (
|
||||
"/api/status must remain publicly reachable under the OAuth gate "
|
||||
"— the portal uses it as the wildcard-subdomain liveness probe. "
|
||||
f"Got: status={status_code} body={body!r}"
|
||||
)
|
||||
status = json.loads(body)
|
||||
assert status.get("auth_required") is True, (
|
||||
"/api/status must report auth_required=True when the OAuth gate "
|
||||
f"is engaged so the SPA/portal can distinguish modes. Got: {status!r}"
|
||||
)
|
||||
|
||||
|
||||
@@ -291,6 +390,10 @@ def test_dashboard_insecure_env_var_opts_out_of_gate(
|
||||
for operators running on trusted LANs behind a reverse proxy without
|
||||
the OAuth contract. Same opt-out shape as the rest of the s6 boolean
|
||||
envs (``HERMES_DASHBOARD``, ``HERMES_DASHBOARD_TUI``).
|
||||
|
||||
With the gate off, ``/api/status`` (a public endpoint under the
|
||||
legacy ``_SESSION_TOKEN`` middleware) returns 200 with the
|
||||
``auth_required: false`` body — proves the gate is bypassed.
|
||||
"""
|
||||
subprocess.run(
|
||||
["docker", "run", "-d", "--name", container_name,
|
||||
@@ -300,7 +403,12 @@ def test_dashboard_insecure_env_var_opts_out_of_gate(
|
||||
built_image, "sleep", "120"],
|
||||
check=True, capture_output=True, timeout=30,
|
||||
)
|
||||
status = _fetch_api_status(container_name)
|
||||
status_code, body = _http_probe(container_name, "/api/status")
|
||||
assert status_code == 200, (
|
||||
f"/api/status should return 200 with the auth gate disabled; "
|
||||
f"got {status_code} body={body!r}"
|
||||
)
|
||||
status = json.loads(body)
|
||||
assert status.get("auth_required") is False, (
|
||||
"HERMES_DASHBOARD_INSECURE=1 must disable the auth gate (explicit "
|
||||
f"opt-in for trusted-LAN deployments). Got: {status!r}"
|
||||
|
||||
@@ -131,8 +131,13 @@ class TestRefreshTokenCookieDeprecation:
|
||||
|
||||
|
||||
class TestApi401Envelope:
|
||||
# NOTE: probe a gated route (``/api/sessions``) here rather than
|
||||
# ``/api/status`` — status is in the shared ``PUBLIC_API_PATHS``
|
||||
# allowlist (portal liveness probe) so it would 200 even without a
|
||||
# cookie and never exercise the 401-envelope code path.
|
||||
|
||||
def test_no_cookie_returns_unauthenticated_envelope(self, gated_app):
|
||||
r = gated_app.get("/api/status")
|
||||
r = gated_app.get("/api/sessions")
|
||||
assert r.status_code == 401
|
||||
body = r.json()
|
||||
assert body["error"] == "unauthenticated"
|
||||
@@ -141,7 +146,7 @@ class TestApi401Envelope:
|
||||
|
||||
def test_invalid_cookie_returns_session_expired_envelope(self, gated_app):
|
||||
gated_app.cookies.set(SESSION_AT_COOKIE, "garbage")
|
||||
r = gated_app.get("/api/status")
|
||||
r = gated_app.get("/api/sessions")
|
||||
assert r.status_code == 401
|
||||
body = r.json()
|
||||
assert body["error"] == "session_expired"
|
||||
@@ -151,7 +156,7 @@ class TestApi401Envelope:
|
||||
"""Dead-cookie cleanup — Phase 6 requirement so the browser
|
||||
doesn't keep replaying the stale token on every request."""
|
||||
gated_app.cookies.set(SESSION_AT_COOKIE, "garbage")
|
||||
r = gated_app.get("/api/status")
|
||||
r = gated_app.get("/api/sessions")
|
||||
set_cookies = r.headers.get_list("set-cookie")
|
||||
assert any(
|
||||
c.startswith(f"{SESSION_AT_COOKIE}=") and "Max-Age=0" in c
|
||||
|
||||
@@ -56,10 +56,61 @@ def gated_app():
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_gated_status_now_requires_auth(gated_app):
|
||||
"""When gate is on, /api/status is NOT public — login bootstrap uses /api/auth/providers."""
|
||||
def test_gated_status_is_public(gated_app):
|
||||
"""``/api/status`` MUST be public under the OAuth gate.
|
||||
|
||||
Regression guard for the wildcard-subdomain rollout: NAS
|
||||
(``fly-provider.ts`` ``getInstanceRuntimeStatus``) hits
|
||||
``/api/status`` without a cookie as its sole liveness probe. A 401
|
||||
here surfaces every healthy agent as STARTING/down in the portal
|
||||
UI. The endpoint returns only version + gateway/auth-gate metadata
|
||||
(no user data, no session content), so it stays in the shared
|
||||
``PUBLIC_API_PATHS`` allowlist under both the legacy ``_SESSION_TOKEN``
|
||||
gate and the OAuth gate.
|
||||
|
||||
The body also reports the gate's shape (``auth_required``,
|
||||
``auth_providers``) so the SPA's StatusPage and external monitors
|
||||
can distinguish loopback / gated / no-providers without a separate
|
||||
round trip.
|
||||
"""
|
||||
r = gated_app.get("/api/status")
|
||||
assert r.status_code == 401
|
||||
assert r.status_code == 200, (
|
||||
f"Expected 200, got {r.status_code}: {r.text}"
|
||||
)
|
||||
body = r.json()
|
||||
assert body["auth_required"] is True
|
||||
assert "version" in body
|
||||
assert "gateway_state" in body
|
||||
|
||||
|
||||
@pytest.mark.parametrize("path", [
|
||||
"/api/config/defaults",
|
||||
"/api/config/schema",
|
||||
"/api/model/info",
|
||||
"/api/dashboard/themes",
|
||||
"/api/dashboard/plugins",
|
||||
])
|
||||
def test_other_public_api_paths_are_public_under_gate(gated_app, path):
|
||||
"""The remaining ``PUBLIC_API_PATHS`` entries must also bypass the
|
||||
gate. They're documented as non-sensitive read-only endpoints that
|
||||
the SPA pre-loads before login (themes, config schema, model
|
||||
metadata). A 401 / 302-to-login here would block the dashboard
|
||||
shell from rendering pre-auth.
|
||||
|
||||
Accept any non-auth-failure status: 200 when the route succeeds,
|
||||
or any route-specific error (e.g. 400 / 404 / 500 from a missing
|
||||
dependency) — but NEVER 401, and NEVER a 302 to ``/login``.
|
||||
"""
|
||||
r = gated_app.get(path, follow_redirects=False)
|
||||
assert r.status_code != 401, (
|
||||
f"{path} returned 401 under the OAuth gate — should be public"
|
||||
)
|
||||
if r.status_code == 302:
|
||||
location = r.headers.get("location", "")
|
||||
assert "/login" not in location, (
|
||||
f"{path} redirected to {location} — should be public, "
|
||||
"not bounced to /login"
|
||||
)
|
||||
|
||||
|
||||
def test_gated_html_redirects_to_login(gated_app):
|
||||
@@ -98,7 +149,7 @@ def test_gated_static_asset_path_is_public(gated_app):
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_full_login_round_trip_unlocks_api_status(gated_app):
|
||||
def test_full_login_round_trip_unlocks_gated_api(gated_app):
|
||||
# 1) Click "Sign in with Stub IdP" — /auth/login redirects to the stub
|
||||
# with a PKCE cookie on the response.
|
||||
r1 = gated_app.get("/auth/login?provider=stub", follow_redirects=False)
|
||||
@@ -128,11 +179,16 @@ def test_full_login_round_trip_unlocks_api_status(gated_app):
|
||||
assert any("hermes_session_at" in c for c in set_cookies)
|
||||
assert any("hermes_session_rt" in c for c in set_cookies)
|
||||
|
||||
# 3) /api/status now succeeds because we're authenticated.
|
||||
r3 = gated_app.get("/api/status")
|
||||
assert r3.status_code == 200
|
||||
body = r3.json()
|
||||
assert "version" in body
|
||||
# 3) A gated API route (``/api/sessions``) now succeeds because we
|
||||
# have a valid session cookie. (We deliberately don't probe
|
||||
# ``/api/status`` here — it's in the shared PUBLIC_API_PATHS
|
||||
# allowlist and would 200 even without a login, so it can't
|
||||
# distinguish "logged in" from "gate accidentally disabled".)
|
||||
r3 = gated_app.get("/api/sessions")
|
||||
assert r3.status_code == 200, (
|
||||
f"Expected 200 for /api/sessions post-login, got {r3.status_code}: "
|
||||
f"{r3.text}"
|
||||
)
|
||||
|
||||
|
||||
def test_login_unknown_provider_returns_404(gated_app):
|
||||
|
||||
@@ -59,19 +59,11 @@ def loopback_client():
|
||||
web_server.app.state.auth_required = prev_required
|
||||
|
||||
|
||||
def _login(client: TestClient) -> None:
|
||||
"""Drive the stub OAuth round trip so the gated client is authed."""
|
||||
r1 = client.get("/auth/login?provider=stub", follow_redirects=False)
|
||||
assert r1.status_code == 302
|
||||
state = r1.headers["location"].split("state=")[1]
|
||||
r2 = client.get(
|
||||
f"/auth/callback?code=stub_code&state={state}", follow_redirects=False
|
||||
)
|
||||
assert r2.status_code == 302
|
||||
|
||||
|
||||
def test_status_reports_auth_required_in_gated_mode(gated_client):
|
||||
_login(gated_client)
|
||||
# No ``_login()`` call — ``/api/status`` is in the shared
|
||||
# ``PUBLIC_API_PATHS`` allowlist precisely so external probes (and
|
||||
# the SPA's pre-login bootstrap) can read the gate's shape without
|
||||
# a cookie. Hit it cold.
|
||||
r = gated_client.get("/api/status")
|
||||
assert r.status_code == 200
|
||||
body = r.json()
|
||||
|
||||
@@ -0,0 +1,238 @@
|
||||
"""Worker-side image enrichment for kanban tasks.
|
||||
|
||||
When a kanban task body contains a local image path or an ``http(s)://``
|
||||
image URL, the worker must surface that image to the model on its first
|
||||
user turn — matching the CLI/gateway behaviour for inbound images.
|
||||
|
||||
The dispatcher spawns the worker as
|
||||
``hermes -p <profile> chat -q "work kanban task <id>"``. The task body
|
||||
itself never appears in argv; the worker has to read it from the kanban
|
||||
DB during startup. These tests cover the round-trip:
|
||||
|
||||
task body → kanban_db.get_task → extract_image_refs →
|
||||
build_native_content_parts → multimodal user turn
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import base64
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
from hermes_cli import kanban_db as kb
|
||||
from agent.image_routing import (
|
||||
build_native_content_parts,
|
||||
extract_image_refs,
|
||||
)
|
||||
|
||||
|
||||
# Tiny 1×1 transparent PNG used to back any path the tests stick into a
|
||||
# task body. extract_image_refs validates the path exists on disk, so the
|
||||
# byte content has to be a real readable file (any image bytes will do).
|
||||
_PNG = base64.b64decode(
|
||||
"iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR4nGNgYGBgAAAABQABpfZFQAAAAABJRU5ErkJggg=="
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def kanban_home(tmp_path: Path, monkeypatch):
|
||||
"""Isolated HERMES_HOME with a fresh kanban DB for each test."""
|
||||
home = tmp_path / ".hermes"
|
||||
home.mkdir()
|
||||
monkeypatch.setenv("HERMES_HOME", str(home))
|
||||
monkeypatch.setattr(Path, "home", lambda: tmp_path)
|
||||
kb.init_db()
|
||||
return home
|
||||
|
||||
|
||||
def _add_task_with_body(body: str, *, title: str = "Look at this") -> str:
|
||||
conn = kb.connect()
|
||||
try:
|
||||
task_id = kb.create_task(
|
||||
conn,
|
||||
title=title,
|
||||
body=body,
|
||||
assignee="worker-a",
|
||||
tenant=None,
|
||||
)
|
||||
finally:
|
||||
conn.close()
|
||||
return task_id
|
||||
|
||||
|
||||
def _read_body(task_id: str) -> str:
|
||||
conn = kb.connect()
|
||||
try:
|
||||
task = kb.get_task(conn, task_id)
|
||||
return (task.body if task is not None else "") or ""
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
class TestExtractFromTaskBody:
|
||||
"""Read a real kanban task body and run it through extract_image_refs."""
|
||||
|
||||
def test_local_path_in_body_round_trips(self, kanban_home, tmp_path):
|
||||
img = tmp_path / "screenshot.png"
|
||||
img.write_bytes(_PNG)
|
||||
tid = _add_task_with_body(
|
||||
f"Please review the screenshot at {img} and confirm "
|
||||
"the alignment is right."
|
||||
)
|
||||
|
||||
body = _read_body(tid)
|
||||
paths, urls = extract_image_refs(body)
|
||||
assert paths == [str(img)]
|
||||
assert urls == []
|
||||
|
||||
def test_url_in_body_round_trips(self, kanban_home):
|
||||
tid = _add_task_with_body(
|
||||
"The design lives at https://example.com/mock/v3.png — "
|
||||
"make the implementation match it."
|
||||
)
|
||||
|
||||
body = _read_body(tid)
|
||||
paths, urls = extract_image_refs(body)
|
||||
assert paths == []
|
||||
assert urls == ["https://example.com/mock/v3.png"]
|
||||
|
||||
def test_mixed_path_and_url_in_body(self, kanban_home, tmp_path):
|
||||
img = tmp_path / "current.png"
|
||||
img.write_bytes(_PNG)
|
||||
tid = _add_task_with_body(
|
||||
f"Compare the current screenshot {img} against the design at "
|
||||
"https://example.com/target.png and write a diff."
|
||||
)
|
||||
|
||||
body = _read_body(tid)
|
||||
paths, urls = extract_image_refs(body)
|
||||
assert paths == [str(img)]
|
||||
assert urls == ["https://example.com/target.png"]
|
||||
|
||||
def test_body_without_images_yields_nothing(self, kanban_home):
|
||||
tid = _add_task_with_body(
|
||||
"Refactor the auth module to use the new session helper."
|
||||
)
|
||||
|
||||
body = _read_body(tid)
|
||||
paths, urls = extract_image_refs(body)
|
||||
assert paths == []
|
||||
assert urls == []
|
||||
|
||||
def test_empty_body_is_safe(self, kanban_home):
|
||||
tid = _add_task_with_body("")
|
||||
|
||||
body = _read_body(tid)
|
||||
paths, urls = extract_image_refs(body)
|
||||
assert paths == []
|
||||
assert urls == []
|
||||
|
||||
|
||||
class TestBuildPartsFromTaskBody:
|
||||
"""Verify the full pipeline produces a multimodal user turn."""
|
||||
|
||||
def test_local_path_becomes_native_image_part(self, kanban_home, tmp_path):
|
||||
img = tmp_path / "design.png"
|
||||
img.write_bytes(_PNG)
|
||||
tid = _add_task_with_body(f"Check out {img} — what's broken?")
|
||||
body = _read_body(tid)
|
||||
paths, urls = extract_image_refs(body)
|
||||
|
||||
# Mirrors the cli.py wiring: pass the worker's literal -q argument
|
||||
# (the dispatcher uses ``"work kanban task <id>"``) plus the
|
||||
# extracted refs through build_native_content_parts.
|
||||
parts, skipped = build_native_content_parts(
|
||||
f"work kanban task {tid}",
|
||||
paths,
|
||||
image_urls=urls or None,
|
||||
)
|
||||
|
||||
assert skipped == []
|
||||
# text part + one image_url part
|
||||
assert len(parts) == 2
|
||||
assert parts[0]["type"] == "text"
|
||||
assert parts[0]["text"].startswith(f"work kanban task {tid}")
|
||||
assert f"[Image attached at: {img}]" in parts[0]["text"]
|
||||
assert parts[1]["type"] == "image_url"
|
||||
assert parts[1]["image_url"]["url"].startswith("data:image/png;base64,")
|
||||
|
||||
def test_url_becomes_image_url_part(self, kanban_home):
|
||||
tid = _add_task_with_body(
|
||||
"Reference: https://example.com/target.jpg — match it."
|
||||
)
|
||||
body = _read_body(tid)
|
||||
paths, urls = extract_image_refs(body)
|
||||
|
||||
parts, skipped = build_native_content_parts(
|
||||
f"work kanban task {tid}",
|
||||
paths,
|
||||
image_urls=urls or None,
|
||||
)
|
||||
|
||||
assert skipped == []
|
||||
assert len(parts) == 2
|
||||
assert parts[0]["type"] == "text"
|
||||
assert "[Image attached: https://example.com/target.jpg]" in parts[0]["text"]
|
||||
assert parts[1] == {
|
||||
"type": "image_url",
|
||||
"image_url": {"url": "https://example.com/target.jpg"},
|
||||
}
|
||||
|
||||
def test_body_with_both_yields_two_image_parts(self, kanban_home, tmp_path):
|
||||
img = tmp_path / "local.png"
|
||||
img.write_bytes(_PNG)
|
||||
tid = _add_task_with_body(
|
||||
f"Diff {img} vs https://example.com/target.png — explain it."
|
||||
)
|
||||
body = _read_body(tid)
|
||||
paths, urls = extract_image_refs(body)
|
||||
|
||||
parts, skipped = build_native_content_parts(
|
||||
f"work kanban task {tid}",
|
||||
paths,
|
||||
image_urls=urls or None,
|
||||
)
|
||||
|
||||
assert skipped == []
|
||||
image_parts = [p for p in parts if p.get("type") == "image_url"]
|
||||
assert len(image_parts) == 2
|
||||
# Local file is embedded as a data URL; remote URL passes through.
|
||||
assert image_parts[0]["image_url"]["url"].startswith("data:image/png;base64,")
|
||||
assert image_parts[1]["image_url"]["url"] == "https://example.com/target.png"
|
||||
|
||||
def test_body_with_no_images_leaves_query_untouched(self, kanban_home):
|
||||
tid = _add_task_with_body(
|
||||
"Rewrite the README intro paragraph to focus on use cases."
|
||||
)
|
||||
body = _read_body(tid)
|
||||
paths, urls = extract_image_refs(body)
|
||||
|
||||
parts, skipped = build_native_content_parts(
|
||||
f"work kanban task {tid}",
|
||||
paths,
|
||||
image_urls=urls or None,
|
||||
)
|
||||
|
||||
# No images → plain text-only return (single part, no list mutation).
|
||||
assert skipped == []
|
||||
assert len(parts) == 1
|
||||
assert parts[0]["type"] == "text"
|
||||
assert parts[0]["text"] == f"work kanban task {tid}"
|
||||
|
||||
def test_code_block_example_is_not_attached(self, kanban_home, tmp_path):
|
||||
# Only the real image outside the fenced code block should attach.
|
||||
real = tmp_path / "real.png"
|
||||
real.write_bytes(_PNG)
|
||||
tid = _add_task_with_body(
|
||||
f"Real screenshot:\n{real}\n\n"
|
||||
"Example we DON'T want attached:\n"
|
||||
"```\n"
|
||||
"image: /tmp/example_only.png\n"
|
||||
"url: https://example.com/example.png\n"
|
||||
"```\n"
|
||||
)
|
||||
body = _read_body(tid)
|
||||
paths, urls = extract_image_refs(body)
|
||||
|
||||
assert paths == [str(real)]
|
||||
assert urls == []
|
||||
@@ -203,25 +203,43 @@ def test_auto_mount_replaces_persistent_workspace_bind(monkeypatch, tmp_path):
|
||||
|
||||
|
||||
def test_non_persistent_cleanup_removes_container(monkeypatch):
|
||||
"""When persistent=false, cleanup() must schedule docker stop + rm."""
|
||||
"""When persist_across_processes=false, cleanup() must docker stop AND
|
||||
docker rm so containers don't leak across hermes processes.
|
||||
|
||||
Updated for issue #20561: the previous implementation used fire-and-forget
|
||||
``subprocess.Popen("... &", shell=True)`` which raced with parent exit;
|
||||
the new implementation uses ``subprocess.run`` on a daemon thread with
|
||||
bounded timeouts. See test_cleanup_with_persist_disabled_stops_and_rms
|
||||
for the full behavior contract.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
calls = _mock_subprocess_run(monkeypatch)
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
# Run the worker thread synchronously so assertions can observe its work.
|
||||
import threading
|
||||
monkeypatch.setattr(threading, "Thread", _FakeThread)
|
||||
|
||||
popen_cmds = []
|
||||
monkeypatch.setattr(
|
||||
docker_env.subprocess, "Popen",
|
||||
lambda cmd, **kw: (popen_cmds.append(cmd), type("P", (), {"poll": lambda s: 0, "wait": lambda s, **k: None, "returncode": 0, "stdout": iter([]), "stdin": None})())[1],
|
||||
env = docker_env.DockerEnvironment(
|
||||
image="python:3.11", cwd="/root", timeout=60,
|
||||
task_id="ephemeral-task", persistent_filesystem=False,
|
||||
persist_across_processes=False,
|
||||
)
|
||||
|
||||
env = _make_dummy_env(persistent_filesystem=False, task_id="ephemeral-task")
|
||||
assert env._container_id
|
||||
container_id = env._container_id
|
||||
assert container_id
|
||||
|
||||
# Capture cleanup-time docker calls (everything before this was init).
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capture(cmd, **kw):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kw))
|
||||
return real_run(cmd, **kw)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capture)
|
||||
env.cleanup()
|
||||
|
||||
# Should have stop and rm calls via Popen
|
||||
stop_cmds = [c for c in popen_cmds if container_id in str(c) and "stop" in str(c)]
|
||||
assert len(stop_cmds) >= 1, f"cleanup() should schedule docker stop for {container_id}"
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and c[0][1:2] == ["stop"]]
|
||||
assert stops, f"cleanup() should docker stop {container_id}; got {cleanup_calls}"
|
||||
|
||||
|
||||
class _FakePopen:
|
||||
@@ -514,3 +532,839 @@ def test_run_as_host_user_warns_and_skips_when_no_posix_ids(monkeypatch, caplog)
|
||||
"does not expose POSIX uid/gid" in rec.getMessage()
|
||||
for rec in caplog.records
|
||||
), "expected a warning when POSIX ids are unavailable"
|
||||
|
||||
|
||||
# ── Docker labels (issue #20561) ──────────────────────────────────
|
||||
|
||||
|
||||
def _run_args_from_calls(calls):
|
||||
"""Pull the argv list passed to the first ``docker run`` invocation."""
|
||||
run_calls = [
|
||||
c for c in calls
|
||||
if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"
|
||||
]
|
||||
assert run_calls, "docker run should have been called"
|
||||
return run_calls[0][0]
|
||||
|
||||
|
||||
def _labels_in_run_args(run_args):
|
||||
"""Return the set of ``key=value`` strings passed via ``--label``."""
|
||||
return {
|
||||
run_args[i + 1]
|
||||
for i, flag in enumerate(run_args[:-1])
|
||||
if flag == "--label"
|
||||
}
|
||||
|
||||
|
||||
def test_run_command_tags_hermes_agent_label(monkeypatch):
|
||||
"""Every container hermes-agent starts must carry the hermes-agent=1 label
|
||||
so the orphan reaper (and external operators) can identify them with a
|
||||
single ``docker ps --filter label=hermes-agent=1`` call. Regression test
|
||||
for issue #20561 — without the label there is no global sweep target."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
calls = _mock_subprocess_run(monkeypatch)
|
||||
|
||||
_make_dummy_env(task_id="my-task")
|
||||
|
||||
labels = _labels_in_run_args(_run_args_from_calls(calls))
|
||||
assert "hermes-agent=1" in labels, (
|
||||
f"hermes-agent=1 label missing; got labels: {sorted(labels)}"
|
||||
)
|
||||
|
||||
|
||||
def test_run_command_tags_task_and_profile_labels(monkeypatch):
|
||||
"""task_id and the active profile name are surfaced as labels so future
|
||||
cross-process reuse logic can filter to a specific (task, profile) pair
|
||||
without parsing container names. Profile resolution uses the helper that
|
||||
returns ``"default"`` for the root Hermes home."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "research-bot")
|
||||
calls = _mock_subprocess_run(monkeypatch)
|
||||
|
||||
_make_dummy_env(task_id="kanban-42")
|
||||
|
||||
labels = _labels_in_run_args(_run_args_from_calls(calls))
|
||||
assert "hermes-task-id=kanban-42" in labels, (
|
||||
f"hermes-task-id=kanban-42 missing; got: {sorted(labels)}"
|
||||
)
|
||||
assert "hermes-profile=research-bot" in labels, (
|
||||
f"hermes-profile=research-bot missing; got: {sorted(labels)}"
|
||||
)
|
||||
|
||||
|
||||
def test_label_sanitizer_rejects_invalid_characters():
|
||||
"""Docker label values must be alnum + ``_.-`` and ≤63 chars. Profile or
|
||||
task names containing slashes, colons, or unicode would otherwise emit
|
||||
invalid labels that round-trip badly through ``docker ps --filter``."""
|
||||
assert docker_env._sanitize_label_value("plain-name_1.0") == "plain-name_1.0"
|
||||
assert docker_env._sanitize_label_value("with/slash") == "with_slash"
|
||||
assert docker_env._sanitize_label_value("with:colon") == "with_colon"
|
||||
assert docker_env._sanitize_label_value("emoji-😀-here") == "emoji-_-here"
|
||||
# Empty / non-string inputs must collapse to a queryable token, not "".
|
||||
assert docker_env._sanitize_label_value("") == "unknown"
|
||||
assert docker_env._sanitize_label_value(None) == "unknown" # type: ignore[arg-type]
|
||||
# >63 chars must truncate, not error.
|
||||
long_value = "x" * 100
|
||||
assert len(docker_env._sanitize_label_value(long_value)) == 63
|
||||
|
||||
|
||||
def test_run_command_sanitizes_unsafe_task_id(monkeypatch):
|
||||
"""A task_id containing characters Docker rejects in label values must be
|
||||
sanitized before reaching ``docker run --label``; otherwise the daemon
|
||||
refuses the run with an inscrutable error and the agent's first command
|
||||
blows up."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
calls = _mock_subprocess_run(monkeypatch)
|
||||
|
||||
_make_dummy_env(task_id="task/with:weird*chars")
|
||||
|
||||
labels = _labels_in_run_args(_run_args_from_calls(calls))
|
||||
# Each non-OK character becomes an underscore; the safe chars survive.
|
||||
assert "hermes-task-id=task_with_weird_chars" in labels, (
|
||||
f"sanitized task-id label missing; got: {sorted(labels)}"
|
||||
)
|
||||
|
||||
|
||||
def test_labels_attribute_populated_after_init(monkeypatch):
|
||||
"""``self._labels`` must be set to the same key/value pairs that went onto
|
||||
docker run, so subsequent reuse / reaper paths can match without re-running
|
||||
the sanitizer or re-importing the profile module."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
|
||||
env = _make_dummy_env(task_id="abc")
|
||||
|
||||
assert env._labels == {
|
||||
"hermes-agent": "1",
|
||||
"hermes-task-id": "abc",
|
||||
"hermes-profile": "default",
|
||||
}
|
||||
|
||||
|
||||
# ── Cross-process container reuse (issue #20561) ──────────────────
|
||||
|
||||
|
||||
def _mock_subprocess_run_with_reuse(monkeypatch, ps_state: str | None,
|
||||
start_succeeds: bool = True):
|
||||
"""Reuse-aware subprocess.run mock.
|
||||
|
||||
``ps_state`` controls what ``docker ps -a --filter ...`` returns:
|
||||
* ``None`` → no match (empty stdout). Forces a fresh ``docker run``.
|
||||
* ``"running"`` / ``"exited"`` / ... → emit ``CID\\tSTATE`` so the reuse
|
||||
path picks it up. ``"running"`` skips ``docker start``; other states
|
||||
trigger ``docker start`` (which can be forced to fail via
|
||||
``start_succeeds=False``).
|
||||
|
||||
Returns the captured call list so the test can verify which docker
|
||||
commands actually ran.
|
||||
"""
|
||||
calls = []
|
||||
|
||||
def _run(cmd, **kwargs):
|
||||
calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
if isinstance(cmd, list) and len(cmd) >= 2:
|
||||
sub = cmd[1]
|
||||
if sub == "version":
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="Docker version", stderr="")
|
||||
if sub == "ps":
|
||||
if ps_state is None:
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0, stdout=f"reused-cid\t{ps_state}\n", stderr="",
|
||||
)
|
||||
if sub == "start":
|
||||
if not start_succeeds:
|
||||
# Real subprocess.run with check=True raises on non-zero exit;
|
||||
# mirror that so the production code's except clause fires.
|
||||
raise subprocess.CalledProcessError(1, cmd, output="", stderr="no such container")
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="reused-cid\n", stderr="")
|
||||
if sub == "run":
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="fresh-cid\n", stderr="")
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _run)
|
||||
return calls
|
||||
|
||||
|
||||
def test_reuse_attaches_to_running_container_without_docker_run(monkeypatch):
|
||||
"""When a labeled container is already ``running``, the reuse probe
|
||||
must pick it up and skip ``docker run`` entirely. Regression for the
|
||||
issue #20561 root cause: every Hermes process spawning a new container
|
||||
despite docs claiming "ONE long-lived container shared across sessions"."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="running")
|
||||
|
||||
env = _make_dummy_env(task_id="reuse-test")
|
||||
|
||||
# The reuse path must populate _container_id from the ps probe output.
|
||||
assert env._container_id == "reused-cid", (
|
||||
f"expected reused container id, got {env._container_id!r}"
|
||||
)
|
||||
# And it must NOT have run `docker run`.
|
||||
run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
|
||||
assert not run_invocations, (
|
||||
f"docker run should be skipped on reuse, got: {run_invocations}"
|
||||
)
|
||||
# And it must have NOT issued a `docker start` for an already-running container.
|
||||
start_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "start"]
|
||||
assert not start_invocations, (
|
||||
f"docker start should be skipped when container already running, got: {start_invocations}"
|
||||
)
|
||||
|
||||
|
||||
def test_reuse_starts_stopped_container_before_attaching(monkeypatch):
|
||||
"""A labeled container in ``exited`` state must be restarted via
|
||||
``docker start`` before the new Hermes process uses it. Without this
|
||||
step, ``docker exec`` against a stopped container errors out and the
|
||||
first agent command fails opaquely."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="exited")
|
||||
|
||||
env = _make_dummy_env(task_id="reuse-stopped")
|
||||
|
||||
assert env._container_id == "reused-cid"
|
||||
start_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "start"]
|
||||
assert start_invocations, "expected docker start for exited container"
|
||||
run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
|
||||
assert not run_invocations, "should not docker run when reusing an exited container"
|
||||
|
||||
|
||||
def test_reuse_falls_back_to_fresh_run_when_start_fails(monkeypatch):
|
||||
"""If ``docker start`` on the matched container fails (container was
|
||||
removed between probe and start, daemon paused, etc.), the code must
|
||||
silently fall through to a fresh ``docker run`` rather than leaving the
|
||||
user with a broken environment. Defensive recovery — the probe is best-
|
||||
effort, not authoritative."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
calls = _mock_subprocess_run_with_reuse(
|
||||
monkeypatch, ps_state="exited", start_succeeds=False,
|
||||
)
|
||||
|
||||
env = _make_dummy_env(task_id="reuse-broken-start")
|
||||
|
||||
# docker start should be attempted then fail; code falls through to run.
|
||||
assert env._container_id == "fresh-cid", (
|
||||
f"expected fresh container id after fallback, got {env._container_id!r}"
|
||||
)
|
||||
run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
|
||||
assert run_invocations, "fallback to fresh docker run must happen on start failure"
|
||||
|
||||
|
||||
def test_no_reuse_when_persist_across_processes_disabled(monkeypatch):
|
||||
"""Opt-out path: ``persist_across_processes=False`` skips the ps probe
|
||||
entirely and always starts a fresh container, matching the pre-fix
|
||||
behavior for users who want hard per-process isolation."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
# ps_state=running would trigger reuse if the probe ran — assert it doesn't.
|
||||
calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="running")
|
||||
|
||||
env = docker_env.DockerEnvironment(
|
||||
image="python:3.11", cwd="/root", timeout=60,
|
||||
task_id="no-reuse", persist_across_processes=False,
|
||||
)
|
||||
|
||||
# Must NOT have issued docker ps (the probe is gated by the flag).
|
||||
ps_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "ps"]
|
||||
assert not ps_invocations, (
|
||||
f"docker ps probe should be skipped when persist_across_processes=False, got: {ps_invocations}"
|
||||
)
|
||||
# Should have started a fresh container.
|
||||
assert env._container_id == "fresh-cid"
|
||||
|
||||
|
||||
def test_find_reusable_container_prefers_running_over_stopped(monkeypatch):
|
||||
"""When the probe returns multiple matches (shouldn't normally happen,
|
||||
but can after a crash leaves stale duplicates), a ``running`` container
|
||||
is preferred over any stopped one. The duplicate gets reaped later by
|
||||
the orphan reaper; we don't try to be heroic about it here."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
|
||||
def _run(cmd, **kwargs):
|
||||
if isinstance(cmd, list) and len(cmd) >= 2:
|
||||
if cmd[1] == "version":
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="ok", stderr="")
|
||||
if cmd[1] == "ps":
|
||||
# Two matches: stopped first, running second.
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0,
|
||||
stdout="stopped-cid\texited\nrunning-cid\trunning\n",
|
||||
stderr="",
|
||||
)
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="fresh-cid\n", stderr="")
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _run)
|
||||
|
||||
env = _make_dummy_env(task_id="dup-match")
|
||||
assert env._container_id == "running-cid", (
|
||||
f"running container should win over stopped duplicate, got {env._container_id!r}"
|
||||
)
|
||||
|
||||
|
||||
# ── Cleanup correctness (issue #20561) ────────────────────────────
|
||||
|
||||
|
||||
class _FakeThread:
|
||||
"""Stand-in for threading.Thread that captures target/args and calls
|
||||
target() synchronously when .start() runs, so cleanup behavior is
|
||||
observable without actually backgrounding subprocess calls."""
|
||||
|
||||
def __init__(self, target=None, daemon=None, name=None):
|
||||
self._target = target
|
||||
self.daemon = daemon
|
||||
self.name = name
|
||||
self._done = False
|
||||
|
||||
def start(self):
|
||||
if self._target is not None:
|
||||
self._target()
|
||||
self._done = True
|
||||
|
||||
def is_alive(self):
|
||||
return not self._done
|
||||
|
||||
def join(self, timeout=None):
|
||||
self._done = True
|
||||
|
||||
|
||||
def _install_fake_thread(monkeypatch):
|
||||
import threading
|
||||
monkeypatch.setattr(threading, "Thread", _FakeThread)
|
||||
|
||||
|
||||
def test_cleanup_with_persist_is_noop_for_container(monkeypatch):
|
||||
"""``persist_across_processes=True`` (default) cleanup must NEITHER stop
|
||||
NOR remove the container — the docs promise "ONE long-lived container
|
||||
shared across sessions", and any docker stop would kill background
|
||||
processes inside the container (npm watchers, pytest watchers, etc.).
|
||||
|
||||
Resource reclamation in this mode happens via the orphan reaper on next
|
||||
Hermes startup, not on graceful exit. Issue #20561 — the first iteration
|
||||
of this PR did docker stop here, which Ben caught as contradicting the
|
||||
"ONE long-lived container" semantics."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
env = _make_dummy_env(task_id="cleanup-persist", persistent_filesystem=False)
|
||||
# Default persist_across_processes=True.
|
||||
container_id = env._container_id
|
||||
assert container_id
|
||||
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capturing_run(cmd, **kwargs):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
return real_run(cmd, **kwargs)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
|
||||
|
||||
env.cleanup()
|
||||
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
|
||||
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
|
||||
assert not stops, (
|
||||
f"docker stop must NOT be called when persist_across_processes=True; "
|
||||
f"container has to stay running so background processes survive. "
|
||||
f"Got: {stops}"
|
||||
)
|
||||
assert not rms, (
|
||||
f"docker rm must NOT be called when persist_across_processes=True; "
|
||||
f"reuse would be impossible. Got: {rms}"
|
||||
)
|
||||
# The in-process handle must still be cleared so the next __init__
|
||||
# re-probes via labels (and reuses the still-running container).
|
||||
assert env._container_id is None, (
|
||||
"in-process container_id should be cleared even in no-op cleanup"
|
||||
)
|
||||
|
||||
|
||||
def test_cleanup_force_remove_stops_and_rms_even_in_persist_mode(monkeypatch):
|
||||
"""``cleanup(force_remove=True)`` must stop AND rm the container even
|
||||
when ``persist_across_processes=True``. This is the explicit-teardown
|
||||
path for ``/reset``, ``cleanup_vm(task_id, force_remove=True)``, and any
|
||||
future caller that wants a guaranteed fresh container.
|
||||
|
||||
Without this kwarg, callers in persist mode would have no way to force a
|
||||
fresh container without also flipping the global config — too coarse for
|
||||
a per-task reset.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
env = _make_dummy_env(task_id="cleanup-force", persistent_filesystem=False)
|
||||
assert env._container_id
|
||||
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capturing_run(cmd, **kwargs):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
return real_run(cmd, **kwargs)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
|
||||
|
||||
env.cleanup(force_remove=True)
|
||||
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
|
||||
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
|
||||
assert stops, f"force_remove must docker stop; got: {cleanup_calls}"
|
||||
assert rms, f"force_remove must docker rm; got: {cleanup_calls}"
|
||||
|
||||
|
||||
def test_cleanup_vm_default_honors_persist_mode(monkeypatch):
|
||||
"""``cleanup_vm(task_id)`` without ``force_remove=True`` must be a no-op
|
||||
for a persist-mode container.
|
||||
|
||||
Regression for the bug Ben caught after commit 4: ``AIAgent.close()``
|
||||
(which is called from ``tui_gateway/server.py`` on session.close, from
|
||||
``gateway/run.py`` on per-session teardown, and from per-turn cleanup)
|
||||
calls ``cleanup_vm(task_id)``. If that defaulted to ``force_remove=True``
|
||||
we'd tear down the container on every TUI session close, defeating the
|
||||
"ONE long-lived container shared across sessions" contract.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
from tools import terminal_tool
|
||||
|
||||
env = _make_dummy_env(task_id="session-close-test")
|
||||
container_id = env._container_id
|
||||
terminal_tool._active_environments["session-close-test"] = env
|
||||
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capturing_run(cmd, **kwargs):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
return real_run(cmd, **kwargs)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
|
||||
|
||||
try:
|
||||
terminal_tool.cleanup_vm("session-close-test")
|
||||
finally:
|
||||
terminal_tool._active_environments.pop("session-close-test", None)
|
||||
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
|
||||
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
|
||||
assert not stops, (
|
||||
f"cleanup_vm() default must not docker stop a persist-mode container; "
|
||||
f"got: {stops}"
|
||||
)
|
||||
assert not rms, (
|
||||
f"cleanup_vm() default must not docker rm a persist-mode container; "
|
||||
f"got: {rms}"
|
||||
)
|
||||
|
||||
|
||||
def test_cleanup_vm_force_remove_tears_down_persist_container(monkeypatch):
|
||||
"""``cleanup_vm(task_id, force_remove=True)`` tears down a persist-mode
|
||||
container — the explicit-teardown path for ``/reset``-style flows.
|
||||
|
||||
Also pins the runtime-signature-inspection plumbing: the kwarg must
|
||||
actually flow through ``cleanup_vm`` into the backend's ``cleanup()``.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
from tools import terminal_tool
|
||||
|
||||
env = _make_dummy_env(task_id="explicit-teardown-test")
|
||||
terminal_tool._active_environments["explicit-teardown-test"] = env
|
||||
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capturing_run(cmd, **kwargs):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
return real_run(cmd, **kwargs)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
|
||||
|
||||
try:
|
||||
terminal_tool.cleanup_vm("explicit-teardown-test", force_remove=True)
|
||||
finally:
|
||||
terminal_tool._active_environments.pop("explicit-teardown-test", None)
|
||||
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
|
||||
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
|
||||
assert stops, f"force_remove must reach docker stop; got: {cleanup_calls}"
|
||||
assert rms, f"force_remove must reach docker rm; got: {cleanup_calls}"
|
||||
|
||||
|
||||
def test_cleanup_with_persist_disabled_stops_and_rms(monkeypatch):
|
||||
"""``persist_across_processes=False`` cleanup must docker stop AND docker
|
||||
rm so containers don't leak. Crucially, this runs regardless of the
|
||||
``persistent_filesystem`` setting — the original code only rm'd when
|
||||
``not self._persistent``, which meant the default-on ``container_persistent:
|
||||
true`` users (the documented happy path) leaked Exited containers forever.
|
||||
Issue #20561 root-cause fix."""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
# Note: persistent_filesystem=True (the prior-leak scenario) + the new
|
||||
# cross-process toggle OFF must still result in a clean rm.
|
||||
env = docker_env.DockerEnvironment(
|
||||
image="python:3.11", cwd="/root", timeout=60,
|
||||
task_id="cleanup-no-persist", persistent_filesystem=True,
|
||||
persist_across_processes=False,
|
||||
)
|
||||
|
||||
cleanup_calls = []
|
||||
real_run = docker_env.subprocess.run
|
||||
|
||||
def _capturing_run(cmd, **kwargs):
|
||||
cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
return real_run(cmd, **kwargs)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
|
||||
|
||||
env.cleanup()
|
||||
|
||||
stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
|
||||
rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
|
||||
assert stops, "expected docker stop"
|
||||
assert rms, (
|
||||
"docker rm MUST run when persist_across_processes=False, even with "
|
||||
"persistent_filesystem=True — that gating was the leak source in #20561."
|
||||
)
|
||||
|
||||
|
||||
def test_cleanup_uses_subprocess_run_not_detached_shell(monkeypatch):
|
||||
"""The pre-fix code used ``subprocess.Popen("... &", shell=True)`` which
|
||||
raced with parent-process exit and silently dropped cleanup work. The
|
||||
new code must use ``subprocess.run`` with bounded ``timeout=`` so the
|
||||
work actually completes within the process lifetime.
|
||||
|
||||
Asserts cleanup never reaches into shell-mode Popen. Uses
|
||||
``force_remove=True`` so cleanup actually issues docker calls — the
|
||||
default persist-mode path is now a no-op (commit 4) and would trivially
|
||||
pass this assertion without exercising the docker code at all.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
def _forbidden_popen(*args, **kwargs):
|
||||
raise AssertionError(
|
||||
f"cleanup must not use subprocess.Popen anymore (issue #20561); "
|
||||
f"got args={args} kwargs={kwargs}"
|
||||
)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "Popen", _forbidden_popen)
|
||||
|
||||
env = _make_dummy_env(task_id="no-popen-cleanup")
|
||||
env.cleanup(force_remove=True) # must not raise
|
||||
|
||||
|
||||
def test_wait_for_cleanup_returns_true_when_no_thread_started():
|
||||
"""``wait_for_cleanup`` must be a no-op when ``cleanup`` was never called
|
||||
(or the env has no live cleanup thread) — atexit calls it unconditionally
|
||||
across all active envs, so a False return would falsely flag healthy
|
||||
shutdowns."""
|
||||
env = docker_env.DockerEnvironment.__new__(docker_env.DockerEnvironment)
|
||||
# No _cleanup_thread set — simulates an env that was never cleanup()'d.
|
||||
assert env.wait_for_cleanup(timeout=1.0) is True
|
||||
|
||||
|
||||
def test_wait_for_cleanup_after_cleanup_returns_true(monkeypatch):
|
||||
"""End-to-end: cleanup() starts a thread, wait_for_cleanup() joins it
|
||||
and reports completion. Atexit relies on this contract to ensure docker
|
||||
stop/rm actually finishes before the Python interpreter exits.
|
||||
|
||||
Uses ``force_remove=True`` so cleanup actually starts a worker thread —
|
||||
the default persist-mode cleanup is a no-op (commit 4) and never spawns
|
||||
a thread, so the trivial "no thread" branch of wait_for_cleanup is
|
||||
already covered by the previous test.
|
||||
"""
|
||||
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
|
||||
monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
|
||||
_mock_subprocess_run(monkeypatch)
|
||||
_install_fake_thread(monkeypatch)
|
||||
|
||||
env = _make_dummy_env(task_id="wait-test")
|
||||
env.cleanup(force_remove=True)
|
||||
assert env.wait_for_cleanup(timeout=5.0) is True
|
||||
|
||||
|
||||
def test_cleanup_on_env_with_no_container_id_does_not_raise(monkeypatch):
|
||||
"""A DockerEnvironment whose ``__init__`` failed before the container_id
|
||||
was set (image-pull error, docker daemon down) should still be safe to
|
||||
cleanup() — the post-creation failure path in callers always tries.
|
||||
Without this guard the daemon-down case used to NameError on the cleanup
|
||||
branch."""
|
||||
env = docker_env.DockerEnvironment.__new__(docker_env.DockerEnvironment)
|
||||
env._container_id = None
|
||||
env._persistent = False
|
||||
env._workspace_dir = None
|
||||
env._home_dir = None
|
||||
# No exception expected.
|
||||
env.cleanup()
|
||||
|
||||
|
||||
# ── Orphan reaper (issue #20561) ──────────────────────────────────
|
||||
|
||||
|
||||
def _now_iso(offset_seconds: int = 0) -> str:
|
||||
"""Return an RFC3339 timestamp ``offset_seconds`` in the past."""
|
||||
import datetime
|
||||
t = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(seconds=offset_seconds)
|
||||
# Format like Docker emits — with nanoseconds-style trailing digits.
|
||||
return t.isoformat().replace("+00:00", ".123456789Z")
|
||||
|
||||
|
||||
def _reaper_run_mock(monkeypatch, ps_ids: list[str], inspect_responses: dict[str, str],
|
||||
rm_succeeds: bool = True):
|
||||
"""Build a subprocess.run mock for reaper tests.
|
||||
|
||||
* ``ps_ids`` — what ``docker ps -a --filter ... --format '{{.ID}}'`` returns
|
||||
* ``inspect_responses[cid]`` — what ``docker inspect ... FinishedAt`` returns
|
||||
for each cid; ``""`` means "field unset".
|
||||
* ``rm_succeeds`` — whether ``docker rm -f`` returns 0.
|
||||
|
||||
Captures every call so tests can assert which containers were rm'd.
|
||||
"""
|
||||
calls = []
|
||||
|
||||
def _run(cmd, **kwargs):
|
||||
calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
|
||||
if not isinstance(cmd, list) or len(cmd) < 2:
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
sub = cmd[1]
|
||||
if sub == "ps":
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0, stdout="\n".join(ps_ids) + ("\n" if ps_ids else ""), stderr="",
|
||||
)
|
||||
if sub == "inspect":
|
||||
# cmd is [docker, inspect, --format, '{{.State.FinishedAt}}', cid]
|
||||
cid = cmd[-1]
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0, stdout=inspect_responses.get(cid, "") + "\n", stderr="",
|
||||
)
|
||||
if sub == "rm":
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0 if rm_succeeds else 1,
|
||||
stdout="", stderr="" if rm_succeeds else "no such container",
|
||||
)
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _run)
|
||||
return calls
|
||||
|
||||
|
||||
def test_reap_orphan_returns_zero_when_no_matches(monkeypatch):
|
||||
"""No labeled containers → no rm calls, returns 0. Establishes the
|
||||
happy-path baseline for the orphan reaper (issue #20561)."""
|
||||
calls = _reaper_run_mock(monkeypatch, ps_ids=[], inspect_responses={})
|
||||
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
assert removed == 0
|
||||
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
|
||||
assert not rms, "no rm calls expected when ps returns empty"
|
||||
|
||||
|
||||
def test_reap_orphan_removes_stale_exited_container(monkeypatch):
|
||||
"""An Exited container older than max_age_seconds must be removed.
|
||||
This is the core repair path for issue #20561 — without the reaper,
|
||||
SIGKILL'd Hermes processes leak containers permanently."""
|
||||
old = _now_iso(offset_seconds=900) # 15 minutes ago
|
||||
calls = _reaper_run_mock(
|
||||
monkeypatch, ps_ids=["old-cid"], inspect_responses={"old-cid": old},
|
||||
)
|
||||
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
assert removed == 1
|
||||
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
|
||||
assert len(rms) == 1
|
||||
assert "old-cid" in rms[0][0], f"expected rm of old-cid, got {rms[0][0]}"
|
||||
|
||||
|
||||
def test_reap_orphan_spares_recently_exited_container(monkeypatch):
|
||||
"""A container exited within max_age_seconds must NOT be reaped — that
|
||||
container belongs to a Hermes process that just finished and may be
|
||||
about to be replaced. Conservative window prevents racing sibling
|
||||
processes."""
|
||||
recent = _now_iso(offset_seconds=60) # 1 minute ago
|
||||
calls = _reaper_run_mock(
|
||||
monkeypatch, ps_ids=["recent-cid"], inspect_responses={"recent-cid": recent},
|
||||
)
|
||||
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
assert removed == 0
|
||||
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
|
||||
assert not rms, f"recent container must not be reaped, got rm calls: {rms}"
|
||||
|
||||
|
||||
def test_reap_orphan_scopes_to_profile_filter_via_label(monkeypatch):
|
||||
"""The reaper must pass ``--filter label=hermes-profile=<profile>`` to
|
||||
docker ps so it never sweeps another profile's containers. A research
|
||||
profile must not tear down the default profile's stragglers."""
|
||||
calls = _reaper_run_mock(monkeypatch, ps_ids=[], inspect_responses={})
|
||||
|
||||
docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="research-bot", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
ps_calls = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["ps"]]
|
||||
assert ps_calls, "expected at least one docker ps call"
|
||||
flat = " ".join(ps_calls[0][0])
|
||||
assert "label=hermes-profile=research-bot" in flat, (
|
||||
f"profile filter not applied to docker ps; got args: {ps_calls[0][0]}"
|
||||
)
|
||||
assert "label=hermes-agent=1" in flat, (
|
||||
f"hermes-agent label filter must also be applied; got: {ps_calls[0][0]}"
|
||||
)
|
||||
assert "status=exited" in flat, (
|
||||
"must filter to exited containers only — running containers may "
|
||||
"belong to a sibling Hermes process and must NEVER be reaped"
|
||||
)
|
||||
|
||||
|
||||
def test_reap_orphan_skips_container_with_unparseable_finished_at(monkeypatch):
|
||||
"""If docker inspect returns the zero-value ``0001-01-01T00:00:00Z`` (no
|
||||
FinishedAt yet) or an unparseable timestamp, the reaper must leave the
|
||||
container alone. Defensive — never reap a container whose age we can't
|
||||
determine."""
|
||||
calls = _reaper_run_mock(
|
||||
monkeypatch,
|
||||
ps_ids=["never-finished", "garbage-ts"],
|
||||
inspect_responses={
|
||||
"never-finished": "0001-01-01T00:00:00Z",
|
||||
"garbage-ts": "not-a-timestamp",
|
||||
},
|
||||
)
|
||||
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
assert removed == 0
|
||||
rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
|
||||
assert not rms, (
|
||||
f"reaper must NOT remove containers with unparseable FinishedAt; got: {rms}"
|
||||
)
|
||||
|
||||
|
||||
def test_reap_orphan_handles_docker_ps_failure_gracefully(monkeypatch):
|
||||
"""If docker ps itself fails (daemon down, permission denied), the
|
||||
reaper returns 0 without crashing. The reaper is best-effort plumbing,
|
||||
not a critical path — it must never block container creation."""
|
||||
def _failing_ps(cmd, **kwargs):
|
||||
if isinstance(cmd, list) and len(cmd) >= 2 and cmd[1] == "ps":
|
||||
return subprocess.CompletedProcess(cmd, 1, stdout="", stderr="Cannot connect to daemon")
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _failing_ps)
|
||||
|
||||
# Must not raise
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
assert removed == 0
|
||||
|
||||
|
||||
def test_reap_orphan_continues_after_individual_rm_failure(monkeypatch):
|
||||
"""If ``docker rm -f`` fails on one container (already removed by a
|
||||
concurrent process, container locked, etc.), the reaper must log and
|
||||
continue to the next candidate rather than aborting the whole sweep."""
|
||||
old = _now_iso(offset_seconds=900)
|
||||
rm_calls = []
|
||||
|
||||
def _run(cmd, **kwargs):
|
||||
if not isinstance(cmd, list) or len(cmd) < 2:
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
sub = cmd[1]
|
||||
if sub == "ps":
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0, stdout="cid-a\ncid-b\ncid-c\n", stderr="",
|
||||
)
|
||||
if sub == "inspect":
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout=old + "\n", stderr="")
|
||||
if sub == "rm":
|
||||
rm_calls.append(cmd[-1])
|
||||
# cid-b fails; cid-a and cid-c succeed.
|
||||
if cmd[-1] == "cid-b":
|
||||
return subprocess.CompletedProcess(cmd, 1, stdout="", stderr="no such container")
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _run)
|
||||
|
||||
removed = docker_env.reap_orphan_containers(
|
||||
max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
|
||||
)
|
||||
|
||||
# All three were attempted, two succeeded.
|
||||
assert removed == 2
|
||||
assert set(rm_calls) == {"cid-a", "cid-b", "cid-c"}, (
|
||||
f"reaper must attempt all candidates even when one fails; got: {rm_calls}"
|
||||
)
|
||||
|
||||
|
||||
def test_container_finished_at_parses_nanosecond_timestamp(monkeypatch):
|
||||
"""Docker emits FinishedAt with nanosecond precision (RFC3339 with up to
|
||||
9 fractional digits), but Python's fromisoformat caps at microseconds.
|
||||
The helper must trim the extra digits without raising — otherwise every
|
||||
candidate gets skipped and the reaper does nothing."""
|
||||
|
||||
def _run(cmd, **kwargs):
|
||||
return subprocess.CompletedProcess(
|
||||
cmd, 0,
|
||||
stdout="2026-05-28T13:45:00.123456789Z\n",
|
||||
stderr="",
|
||||
)
|
||||
|
||||
monkeypatch.setattr(docker_env.subprocess, "run", _run)
|
||||
|
||||
result = docker_env._container_finished_at("/usr/bin/docker", "test-cid")
|
||||
assert result is not None, "must parse RFC3339 with nanoseconds"
|
||||
import datetime
|
||||
assert result.tzinfo == datetime.timezone.utc
|
||||
assert result.year == 2026 and result.month == 5 and result.day == 28
|
||||
|
||||
|
||||
def test_container_finished_at_returns_none_on_zero_value():
|
||||
"""Docker's zero-value ``0001-01-01T00:00:00Z`` (never finished) must
|
||||
map to None so the reaper treats the container as unreapable."""
|
||||
# Direct test of the parsing helper — no subprocess needed since the
|
||||
# check happens after the inspect call returns.
|
||||
import subprocess as _subprocess
|
||||
|
||||
class _MockRun:
|
||||
def __init__(self, stdout):
|
||||
self.returncode = 0
|
||||
self.stdout = stdout
|
||||
self.stderr = ""
|
||||
|
||||
import unittest.mock
|
||||
with unittest.mock.patch.object(
|
||||
docker_env.subprocess, "run", return_value=_MockRun("0001-01-01T00:00:00Z\n"),
|
||||
):
|
||||
result = docker_env._container_finished_at("/usr/bin/docker", "never-finished")
|
||||
assert result is None
|
||||
|
||||
@@ -0,0 +1,139 @@
|
||||
"""Integration tests for the docker orphan-reaper wiring in terminal_tool.
|
||||
|
||||
The reaper itself is unit-tested in tests/tools/test_docker_environment.py
|
||||
under the "Orphan reaper" section. These tests cover the terminal_tool-side
|
||||
gates: once-per-process behavior, the disable flag, and the
|
||||
``lifetime_seconds`` doubling that determines the reaper's age threshold.
|
||||
|
||||
Issue #20561 — without these gates, parallel subagents would each fire the
|
||||
reaper on container creation, and the ``terminal.docker_orphan_reaper: false``
|
||||
opt-out would silently do nothing.
|
||||
"""
|
||||
|
||||
import os
|
||||
from unittest.mock import patch
|
||||
|
||||
import tools.terminal_tool as terminal_tool
|
||||
|
||||
|
||||
def _reset_reaper_gate():
|
||||
"""Clear the once-per-process flag between tests."""
|
||||
terminal_tool._docker_orphan_reaper_ran = False
|
||||
|
||||
|
||||
def test_maybe_reap_runs_once_per_process(monkeypatch):
|
||||
"""The reaper sweep must run at most once per Python interpreter.
|
||||
Parallel subagents that each call _create_environment(env_type='docker')
|
||||
would otherwise fire N concurrent docker ps + inspect storms against the
|
||||
daemon and waste 5–10s of startup."""
|
||||
_reset_reaper_gate()
|
||||
call_count = {"reap": 0}
|
||||
|
||||
def _fake_reap(**kwargs):
|
||||
call_count["reap"] += 1
|
||||
return 0
|
||||
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
|
||||
config = {"docker_orphan_reaper": True}
|
||||
terminal_tool._maybe_reap_docker_orphans(config)
|
||||
terminal_tool._maybe_reap_docker_orphans(config)
|
||||
terminal_tool._maybe_reap_docker_orphans(config)
|
||||
|
||||
assert call_count["reap"] == 1, (
|
||||
f"reaper must run exactly once per process; got {call_count['reap']} calls"
|
||||
)
|
||||
|
||||
|
||||
def test_maybe_reap_respects_disable_flag(monkeypatch):
|
||||
"""``terminal.docker_orphan_reaper: false`` (via container_config) must
|
||||
skip the sweep entirely — no docker ps, no inspect, no rm. The escape
|
||||
hatch for operators running multiple Hermes processes in the same
|
||||
profile."""
|
||||
_reset_reaper_gate()
|
||||
call_count = {"reap": 0}
|
||||
|
||||
def _fake_reap(**kwargs):
|
||||
call_count["reap"] += 1
|
||||
return 0
|
||||
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
|
||||
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": False})
|
||||
|
||||
assert call_count["reap"] == 0, "disabled reaper must not run any docker calls"
|
||||
# The once-per-process gate must NOT be tripped when the reaper is
|
||||
# disabled — that would prevent a subsequent toggle to true from working.
|
||||
assert terminal_tool._docker_orphan_reaper_ran is False
|
||||
|
||||
|
||||
def test_maybe_reap_doubles_lifetime_for_max_age(monkeypatch):
|
||||
"""The reaper's age threshold is ``2 × lifetime_seconds`` (with a 60s
|
||||
floor). Generous default — gives sibling Hermes processes ample grace
|
||||
to be replaced without their just-exited containers being yanked."""
|
||||
_reset_reaper_gate()
|
||||
captured_args = {}
|
||||
|
||||
def _fake_reap(**kwargs):
|
||||
captured_args.update(kwargs)
|
||||
return 0
|
||||
|
||||
monkeypatch.setenv("TERMINAL_LIFETIME_SECONDS", "300")
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
|
||||
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
|
||||
|
||||
assert captured_args.get("max_age_seconds") == 600, (
|
||||
f"expected 2 × 300 = 600, got {captured_args.get('max_age_seconds')}"
|
||||
)
|
||||
|
||||
|
||||
def test_maybe_reap_floors_at_60_seconds(monkeypatch):
|
||||
"""A user pinning TERMINAL_LIFETIME_SECONDS=0 (or any value <30) would
|
||||
otherwise get an effective age threshold of zero, which would race the
|
||||
user's own just-started container creation. Floor at 60s × 2 = 120s."""
|
||||
_reset_reaper_gate()
|
||||
captured_args = {}
|
||||
|
||||
def _fake_reap(**kwargs):
|
||||
captured_args.update(kwargs)
|
||||
return 0
|
||||
|
||||
monkeypatch.setenv("TERMINAL_LIFETIME_SECONDS", "0")
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
|
||||
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
|
||||
|
||||
assert captured_args.get("max_age_seconds") == 120, (
|
||||
f"expected floored 60 × 2 = 120, got {captured_args.get('max_age_seconds')}"
|
||||
)
|
||||
|
||||
|
||||
def test_maybe_reap_passes_current_profile_as_filter(monkeypatch):
|
||||
"""The reaper must be scoped to the current Hermes profile — a research
|
||||
profile must NEVER reap default's containers. Verifies the
|
||||
profile-filter wiring."""
|
||||
_reset_reaper_gate()
|
||||
captured_args = {}
|
||||
|
||||
def _fake_reap(**kwargs):
|
||||
captured_args.update(kwargs)
|
||||
return 0
|
||||
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _fake_reap), \
|
||||
patch("tools.environments.docker._get_active_profile_name", return_value="research-bot"):
|
||||
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
|
||||
|
||||
assert captured_args.get("profile_filter") == "research-bot", (
|
||||
f"expected profile_filter='research-bot', got {captured_args.get('profile_filter')!r}"
|
||||
)
|
||||
|
||||
|
||||
def test_maybe_reap_swallows_exceptions(monkeypatch):
|
||||
"""A reaper crash (docker daemon down, parse error in helper) must NOT
|
||||
block env creation. The reaper is best-effort plumbing, not a critical
|
||||
path; failures get logged at debug level and execution continues."""
|
||||
_reset_reaper_gate()
|
||||
|
||||
def _exploding_reap(**kwargs):
|
||||
raise RuntimeError("docker daemon ate the cat")
|
||||
|
||||
with patch("tools.environments.docker.reap_orphan_containers", _exploding_reap):
|
||||
# Must not raise
|
||||
terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
|
||||
@@ -34,6 +34,39 @@ def test_resolve_stdio_command_falls_back_to_hermes_node_bin(tmp_path):
|
||||
assert env["PATH"].split(os.pathsep)[0] == str(node_bin)
|
||||
|
||||
|
||||
def test_resolve_stdio_command_falls_back_to_usr_local_bin():
|
||||
"""When ``npx`` isn't on the filtered PATH and isn't under ``$HERMES_HOME/node/bin``
|
||||
or ``~/.local/bin``, the resolver should still locate it at ``/usr/local/bin/npx``.
|
||||
|
||||
This is the canonical install location for Node on Linux from-source builds,
|
||||
the upstream ``node:bookworm-slim`` image (which the Hermes Docker image
|
||||
copies ``node + npm + corepack`` from since #4977), and macOS Homebrew on
|
||||
Intel. Without this candidate, MCP servers run with an ``env.PATH`` that
|
||||
omits ``/usr/local/bin`` (common when users hand-author PATH for sandboxing)
|
||||
fail with ENOENT at ``execvp``.
|
||||
"""
|
||||
target = os.path.join(os.sep, "usr", "local", "bin", "npx")
|
||||
|
||||
# Pretend ONLY the /usr/local/bin/npx candidate exists and is executable —
|
||||
# the other candidates ($HERMES_HOME/node/bin/npx and ~/.local/bin/npx)
|
||||
# should fail isfile() and the resolver must fall through to /usr/local/bin.
|
||||
def _fake_isfile(path):
|
||||
return path == target
|
||||
|
||||
def _fake_access(path, _mode):
|
||||
return path == target
|
||||
|
||||
with patch("tools.mcp_tool.shutil.which", return_value=None), \
|
||||
patch("tools.mcp_tool.os.path.isfile", side_effect=_fake_isfile), \
|
||||
patch("tools.mcp_tool.os.access", side_effect=_fake_access):
|
||||
command, env = _resolve_stdio_command("npx", {"PATH": "/opt/data/bin:/usr/bin:/bin"})
|
||||
|
||||
assert command == target
|
||||
# /usr/local/bin must be prepended so npx's shebang (`/usr/bin/env node`)
|
||||
# can find node in the same directory.
|
||||
assert env["PATH"].split(os.pathsep)[0] == os.path.dirname(target)
|
||||
|
||||
|
||||
def test_resolve_stdio_command_respects_explicit_empty_path():
|
||||
seen_paths = []
|
||||
|
||||
|
||||
@@ -224,3 +224,39 @@ def test_docker_env_is_bridged_everywhere():
|
||||
assert "docker_env" in _gateway_env_map_keys()
|
||||
assert "docker_env" in _save_config_env_sync_keys()
|
||||
assert "TERMINAL_DOCKER_ENV" in _terminal_tool_env_var_names()
|
||||
|
||||
|
||||
def test_docker_persist_across_processes_is_bridged_everywhere():
|
||||
"""Regression pin for the cross-process container reuse toggle.
|
||||
|
||||
``terminal.docker_persist_across_processes`` (issue #20561) controls
|
||||
whether ``DockerEnvironment.__init__`` probes for and reuses an existing
|
||||
labeled container at startup, and whether ``cleanup()`` removes the
|
||||
container on Hermes exit or just stops it (keeping it for the next
|
||||
process). Same four-bridge invariant as docker_run_as_host_user /
|
||||
docker_env / docker_mount_cwd_to_workspace — drift between any of the
|
||||
four sites means ``terminal.docker_persist_across_processes: false`` in
|
||||
config.yaml silently does nothing for that entry point, leaving the
|
||||
user unable to opt out of the documented "ONE long-lived container
|
||||
shared across sessions" behavior.
|
||||
"""
|
||||
assert "docker_persist_across_processes" in _cli_env_map_keys()
|
||||
assert "docker_persist_across_processes" in _gateway_env_map_keys()
|
||||
assert "docker_persist_across_processes" in _save_config_env_sync_keys()
|
||||
assert "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES" in _terminal_tool_env_var_names()
|
||||
|
||||
|
||||
def test_docker_orphan_reaper_is_bridged_everywhere():
|
||||
"""Regression pin for the startup orphan reaper toggle (issue #20561).
|
||||
|
||||
``terminal.docker_orphan_reaper`` controls whether Hermes sweeps stale
|
||||
Exited containers from prior SIGKILL'd processes at startup. Same
|
||||
four-site bridge invariant — drift means
|
||||
``terminal.docker_orphan_reaper: false`` silently does nothing for one
|
||||
entry point, and the reaper either runs when the operator disabled it
|
||||
or fails to run when they enabled it.
|
||||
"""
|
||||
assert "docker_orphan_reaper" in _cli_env_map_keys()
|
||||
assert "docker_orphan_reaper" in _gateway_env_map_keys()
|
||||
assert "docker_orphan_reaper" in _save_config_env_sync_keys()
|
||||
assert "TERMINAL_DOCKER_ORPHAN_REAPER" in _terminal_tool_env_var_names()
|
||||
|
||||
@@ -44,11 +44,17 @@ def server(hermes_home):
|
||||
):
|
||||
mod = importlib.import_module("tui_gateway.server")
|
||||
yield mod
|
||||
# Reset module-level session state without re-importing. importlib.reload
|
||||
# would re-register the module's atexit hooks (ThreadPoolExecutor
|
||||
# shutdown, _shutdown_sessions); the duplicates race the stderr
|
||||
# buffer at interpreter shutdown and surface as Fatal Python error:
|
||||
# _enter_buffered_busy. Clearing the per-session dicts gives the
|
||||
# next test a clean slate; _methods is NOT cleared because it's
|
||||
# populated at module import time and re-registration only happens
|
||||
# via reload (which we don't do).
|
||||
mod._sessions.clear()
|
||||
mod._pending.clear()
|
||||
mod._answers.clear()
|
||||
mod._methods.clear()
|
||||
importlib.reload(mod)
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
|
||||
@@ -30,11 +30,17 @@ def server():
|
||||
import importlib
|
||||
mod = importlib.import_module("tui_gateway.server")
|
||||
yield mod
|
||||
# Reset module-level session state without re-importing. importlib.reload
|
||||
# would re-register the module's atexit hooks (ThreadPoolExecutor
|
||||
# shutdown, _shutdown_sessions); the duplicates race the stderr
|
||||
# buffer at interpreter shutdown and surface as Fatal Python error:
|
||||
# _enter_buffered_busy. Clearing the per-session dicts gives the
|
||||
# next test a clean slate; _methods is NOT cleared because it's
|
||||
# populated at module import time and re-registration only happens
|
||||
# via reload (which we don't do).
|
||||
mod._sessions.clear()
|
||||
mod._pending.clear()
|
||||
mod._answers.clear()
|
||||
mod._methods.clear()
|
||||
importlib.reload(mod)
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
|
||||
@@ -34,11 +34,17 @@ def server():
|
||||
|
||||
mod = importlib.import_module("tui_gateway.server")
|
||||
yield mod
|
||||
# Reset module-level session state without re-importing. importlib.reload
|
||||
# would re-register the module's atexit hooks (ThreadPoolExecutor
|
||||
# shutdown, _shutdown_sessions); the duplicates race the stderr
|
||||
# buffer at interpreter shutdown and surface as Fatal Python error:
|
||||
# _enter_buffered_busy. Clearing the per-session dicts gives the
|
||||
# next test a clean slate; _methods is NOT cleared because it's
|
||||
# populated at module import time and re-registration only happens
|
||||
# via reload (which we don't do).
|
||||
mod._sessions.clear()
|
||||
mod._pending.clear()
|
||||
mod._answers.clear()
|
||||
mod._methods.clear()
|
||||
importlib.reload(mod)
|
||||
|
||||
|
||||
def test_init_session_attaches_background_review_callback(server, monkeypatch):
|
||||
|
||||
+427
-40
@@ -98,6 +98,167 @@ def _load_hermes_env_vars() -> dict[str, str]:
|
||||
return {}
|
||||
|
||||
|
||||
# Docker label values must match [a-zA-Z0-9_.-] and stay ≤63 chars to round-trip
|
||||
# safely through `docker ps --filter label=key=value`. Profile and task names
|
||||
# can technically contain other characters; sanitize defensively.
|
||||
_LABEL_VALUE_OK_RE = re.compile(r"[^A-Za-z0-9_.-]")
|
||||
|
||||
|
||||
def _sanitize_label_value(value: str) -> str:
|
||||
"""Coerce *value* into a Docker label-safe form (alnum + ``_.-``, ≤63 chars).
|
||||
|
||||
Empty or all-invalid inputs collapse to ``"unknown"`` so the resulting
|
||||
label is always queryable. Used at container-create time; never round-trip
|
||||
a sanitized value back into application logic.
|
||||
"""
|
||||
if not isinstance(value, str) or not value:
|
||||
return "unknown"
|
||||
cleaned = _LABEL_VALUE_OK_RE.sub("_", value)
|
||||
cleaned = cleaned[:63] or "unknown"
|
||||
return cleaned
|
||||
|
||||
|
||||
def _get_active_profile_name() -> str:
|
||||
"""Return the active Hermes profile name, or ``"default"`` on any error.
|
||||
|
||||
Resolved at container-create time so a single container is permanently
|
||||
tagged with the profile that created it. Profile switches inside the
|
||||
same process don't retroactively relabel running containers.
|
||||
"""
|
||||
try:
|
||||
from hermes_cli.profiles import get_active_profile_name
|
||||
|
||||
return get_active_profile_name() or "default"
|
||||
except Exception:
|
||||
return "default"
|
||||
|
||||
|
||||
def reap_orphan_containers(
|
||||
*,
|
||||
max_age_seconds: int = 600,
|
||||
profile_filter: str | None = None,
|
||||
docker_exe: str | None = None,
|
||||
) -> int:
|
||||
"""Remove stale hermes-tagged containers left behind by prior processes.
|
||||
|
||||
Targets containers that match all of:
|
||||
|
||||
* ``label=hermes-agent=1`` (created by this codebase)
|
||||
* ``status=exited`` (running containers are NEVER reaped — they may
|
||||
belong to a sibling Hermes process whose reuse path will pick them
|
||||
up; killing them would crash the sibling mid-command)
|
||||
* (optional) ``label=hermes-profile=<profile_filter>`` (sweep only the
|
||||
caller's profile by default; a hermes process in profile A must not
|
||||
tear down profile B's containers)
|
||||
* ``State.FinishedAt`` older than *max_age_seconds* ago (so a sibling
|
||||
process that just exited and is about to be replaced doesn't get
|
||||
its container yanked out from under it)
|
||||
|
||||
Returns the number of containers removed. Best-effort: any failure
|
||||
(docker daemon unreachable, slow inspect, parse error) is logged at
|
||||
debug level and the function returns whatever it managed before the
|
||||
failure. Safe to call repeatedly; idempotent.
|
||||
|
||||
Issue #20561 — this is the safety net for SIGKILL / OOM / crashed
|
||||
terminal exits that bypass the ``atexit`` cleanup hook. Without it,
|
||||
even with the cleanup-fix in the prior commit, a hard-killed Hermes
|
||||
process leaves its container behind permanently because there's no
|
||||
subsequent Hermes process scheduled to reuse that exact (task, profile)
|
||||
pair.
|
||||
"""
|
||||
docker = docker_exe or find_docker() or "docker"
|
||||
filters = ["--filter", "label=hermes-agent=1", "--filter", "status=exited"]
|
||||
if profile_filter:
|
||||
filters.extend(["--filter", f"label=hermes-profile={_sanitize_label_value(profile_filter)}"])
|
||||
|
||||
try:
|
||||
listing = subprocess.run(
|
||||
[docker, "ps", "-a", *filters, "--format", "{{.ID}}"],
|
||||
capture_output=True, text=True, timeout=15, check=False,
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.debug("orphan reaper docker ps failed: %s", e)
|
||||
return 0
|
||||
if listing.returncode != 0:
|
||||
logger.debug(
|
||||
"orphan reaper docker ps returned %d: %s",
|
||||
listing.returncode, listing.stderr.strip(),
|
||||
)
|
||||
return 0
|
||||
|
||||
candidate_ids = [ln.strip() for ln in listing.stdout.splitlines() if ln.strip()]
|
||||
if not candidate_ids:
|
||||
return 0
|
||||
|
||||
# Inspect each candidate to get FinishedAt; reap only those exited
|
||||
# long enough ago. Doing this per-container (rather than bulk inspect)
|
||||
# keeps the failure blast radius to one container at a time.
|
||||
import datetime
|
||||
now = datetime.datetime.now(datetime.timezone.utc)
|
||||
removed = 0
|
||||
for cid in candidate_ids:
|
||||
finished_at = _container_finished_at(docker, cid)
|
||||
if finished_at is None:
|
||||
# Couldn't determine age — be conservative and leave it alone.
|
||||
continue
|
||||
age = (now - finished_at).total_seconds()
|
||||
if age < max_age_seconds:
|
||||
continue
|
||||
try:
|
||||
result = subprocess.run(
|
||||
[docker, "rm", "-f", cid],
|
||||
capture_output=True, text=True, timeout=30,
|
||||
)
|
||||
if result.returncode == 0:
|
||||
removed += 1
|
||||
logger.info(
|
||||
"Reaped orphan container %s (exited %d seconds ago)",
|
||||
cid[:12], int(age),
|
||||
)
|
||||
else:
|
||||
logger.debug(
|
||||
"docker rm -f %s failed: %s",
|
||||
cid[:12], result.stderr.strip(),
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.debug("orphan reaper docker rm %s failed: %s", cid[:12], e)
|
||||
return removed
|
||||
|
||||
|
||||
def _container_finished_at(docker_exe: str, container_id: str):
|
||||
"""Parse ``docker inspect`` FinishedAt for *container_id*.
|
||||
|
||||
Returns a timezone-aware datetime, or ``None`` if the field is missing,
|
||||
unparseable, or the zero-value ``0001-01-01T00:00:00Z`` Docker emits
|
||||
for never-finished containers. ``None`` means "don't reap" — the caller
|
||||
leaves the container alone.
|
||||
"""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
[docker_exe, "inspect", "--format", "{{.State.FinishedAt}}", container_id],
|
||||
capture_output=True, text=True, timeout=10, check=False,
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.debug("orphan reaper docker inspect %s failed: %s", container_id[:12], e)
|
||||
return None
|
||||
if result.returncode != 0:
|
||||
return None
|
||||
raw = result.stdout.strip()
|
||||
if not raw or raw.startswith("0001-01-01"):
|
||||
return None
|
||||
# Docker emits RFC3339 with nanoseconds (e.g. "2026-05-28T13:45:00.123456789Z").
|
||||
# Python's fromisoformat handles microseconds but not nanoseconds; trim.
|
||||
import re as _re
|
||||
raw = _re.sub(r"(\.\d{6})\d+", r"\1", raw)
|
||||
raw = raw.replace("Z", "+00:00")
|
||||
try:
|
||||
import datetime
|
||||
return datetime.datetime.fromisoformat(raw)
|
||||
except ValueError as e:
|
||||
logger.debug("could not parse FinishedAt %r for %s: %s", raw, container_id[:12], e)
|
||||
return None
|
||||
|
||||
|
||||
def find_docker() -> Optional[str]:
|
||||
"""Locate the docker (or podman) CLI binary.
|
||||
|
||||
@@ -304,15 +465,18 @@ class DockerEnvironment(BaseEnvironment):
|
||||
auto_mount_cwd: bool = False,
|
||||
run_as_host_user: bool = False,
|
||||
extra_args: list = None,
|
||||
persist_across_processes: bool = True,
|
||||
):
|
||||
if cwd == "~":
|
||||
cwd = "/root"
|
||||
super().__init__(cwd=cwd, timeout=timeout)
|
||||
self._persistent = persistent_filesystem
|
||||
self._persist_across_processes = persist_across_processes
|
||||
self._task_id = task_id
|
||||
self._forward_env = _normalize_forward_env_names(forward_env)
|
||||
self._env = _normalize_env_dict(env)
|
||||
self._container_id: Optional[str] = None
|
||||
self._labels: dict[str, str] = {}
|
||||
logger.info(f"DockerEnvironment volumes: {volumes}")
|
||||
# Ensure volumes is a list (config.yaml could be malformed)
|
||||
if volumes is not None and not isinstance(volumes, list):
|
||||
@@ -506,25 +670,88 @@ class DockerEnvironment(BaseEnvironment):
|
||||
|
||||
# Start the container directly via `docker run -d`.
|
||||
container_name = f"hermes-{uuid.uuid4().hex[:8]}"
|
||||
run_cmd = [
|
||||
self._docker_exe, "run", "-d",
|
||||
"--init", # tini/catatonit as PID 1 — reaps zombie children
|
||||
"--name", container_name,
|
||||
"-w", cwd,
|
||||
*all_run_args,
|
||||
image,
|
||||
"sleep", "infinity", # no fixed lifetime — idle reaper handles cleanup
|
||||
# Labels make hermes-created containers identifiable to:
|
||||
# * the orphan reaper (`hermes-agent=1` for the global sweep filter)
|
||||
# * future cross-process reuse (`hermes-task-id`, `hermes-profile`)
|
||||
# * operators running `docker ps --filter label=hermes-agent=1`
|
||||
# Values are limited to the safe character set defined by
|
||||
# _sanitize_label_value(); the active Hermes profile is captured at
|
||||
# container-start time and never changes for the container's lifetime.
|
||||
profile_name = _sanitize_label_value(_get_active_profile_name())
|
||||
task_label = _sanitize_label_value(task_id)
|
||||
label_args = [
|
||||
"--label", "hermes-agent=1",
|
||||
"--label", f"hermes-task-id={task_label}",
|
||||
"--label", f"hermes-profile={profile_name}",
|
||||
]
|
||||
logger.debug(f"Starting container: {' '.join(run_cmd)}")
|
||||
result = subprocess.run(
|
||||
run_cmd,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=120, # image pull may take a while
|
||||
check=True,
|
||||
)
|
||||
self._container_id = result.stdout.strip()
|
||||
logger.info(f"Started container {container_name} ({self._container_id[:12]})")
|
||||
self._labels = {
|
||||
"hermes-agent": "1",
|
||||
"hermes-task-id": task_label,
|
||||
"hermes-profile": profile_name,
|
||||
}
|
||||
|
||||
# Cross-process container reuse (issue #20561 — docs claim "ONE long-lived
|
||||
# container shared across sessions"). If a prior Hermes process
|
||||
# already started a container for this (task_id, profile) and it
|
||||
# still exists, attach to it instead of starting a fresh one. This
|
||||
# restores the documented contract; opt out via
|
||||
# ``terminal.docker_persist_across_processes: false``.
|
||||
#
|
||||
# Reuse matches on labels only — we deliberately do NOT compare image
|
||||
# / mounts / resources. Operators who need a fresh container after
|
||||
# changing those settings should set ``docker_persist_across_processes:
|
||||
# false`` (or run ``docker rm -f`` against the labeled container) to
|
||||
# force a clean start.
|
||||
reused = False
|
||||
if persist_across_processes:
|
||||
existing = self._find_reusable_container(task_label, profile_name)
|
||||
if existing is not None:
|
||||
container_id, state = existing
|
||||
self._container_id = container_id
|
||||
if state != "running":
|
||||
try:
|
||||
subprocess.run(
|
||||
[self._docker_exe, "start", container_id],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=30,
|
||||
check=True,
|
||||
)
|
||||
except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
|
||||
logger.warning(
|
||||
"Failed to start existing container %s (state=%s): "
|
||||
"%s — falling back to a fresh container.",
|
||||
container_id[:12], state, e,
|
||||
)
|
||||
self._container_id = None
|
||||
if self._container_id:
|
||||
logger.info(
|
||||
"Reusing container %s (task=%s, profile=%s, prior state=%s)",
|
||||
container_id[:12], task_label, profile_name, state,
|
||||
)
|
||||
reused = True
|
||||
|
||||
if not reused:
|
||||
run_cmd = [
|
||||
self._docker_exe, "run", "-d",
|
||||
"--init", # tini/catatonit as PID 1 — reaps zombie children
|
||||
"--name", container_name,
|
||||
*label_args,
|
||||
"-w", cwd,
|
||||
*all_run_args,
|
||||
image,
|
||||
"sleep", "infinity", # no fixed lifetime — idle reaper handles cleanup
|
||||
]
|
||||
logger.debug(f"Starting container: {' '.join(run_cmd)}")
|
||||
result = subprocess.run(
|
||||
run_cmd,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=120, # image pull may take a while
|
||||
check=True,
|
||||
)
|
||||
self._container_id = result.stdout.strip()
|
||||
logger.info(f"Started container {container_name} ({self._container_id[:12]})")
|
||||
|
||||
# Build the init-time env forwarding args (used only by init_session
|
||||
# to inject host env vars into the snapshot; subsequent commands get
|
||||
@@ -629,31 +856,191 @@ class DockerEnvironment(BaseEnvironment):
|
||||
logger.debug("Docker --storage-opt support: %s", _storage_opt_ok)
|
||||
return _storage_opt_ok
|
||||
|
||||
def cleanup(self):
|
||||
"""Stop and remove the container. Bind-mount dirs persist if persistent=True."""
|
||||
if self._container_id:
|
||||
try:
|
||||
# Stop in background so cleanup doesn't block
|
||||
stop_cmd = (
|
||||
f"(timeout 60 {self._docker_exe} stop {self._container_id} || "
|
||||
f"{self._docker_exe} rm -f {self._container_id}) >/dev/null 2>&1 &"
|
||||
)
|
||||
subprocess.Popen(stop_cmd, shell=True)
|
||||
except Exception as e:
|
||||
logger.warning("Failed to stop container %s: %s", self._container_id, e)
|
||||
def _find_reusable_container(self, task_label: str, profile_label: str) -> Optional[tuple[str, str]]:
|
||||
"""Look for an existing container labeled for this (task, profile).
|
||||
|
||||
Returns ``(container_id, state)`` on hit, ``None`` on miss / on any
|
||||
failure (including ``docker ps`` itself failing). State is one of the
|
||||
values Docker reports via ``{{.State}}`` — e.g. ``running``, ``exited``,
|
||||
``created``, ``paused``, ``restarting``, ``dead``. The caller decides
|
||||
whether the state warrants ``docker start`` before reuse.
|
||||
|
||||
Restricted to the docker-stored label set this class creates; never
|
||||
matches containers that happened to be named ``hermes-*`` but were
|
||||
started by some other tool.
|
||||
"""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
[
|
||||
self._docker_exe, "ps", "-a",
|
||||
"--filter", "label=hermes-agent=1",
|
||||
"--filter", f"label=hermes-task-id={task_label}",
|
||||
"--filter", f"label=hermes-profile={profile_label}",
|
||||
"--format", "{{.ID}}\t{{.State}}",
|
||||
],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=10,
|
||||
check=False,
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.debug("docker ps probe failed: %s — will start a fresh container", e)
|
||||
return None
|
||||
if result.returncode != 0:
|
||||
logger.debug(
|
||||
"docker ps probe returned %d: %s — will start a fresh container",
|
||||
result.returncode, result.stderr.strip(),
|
||||
)
|
||||
return None
|
||||
lines = [ln.strip() for ln in result.stdout.splitlines() if ln.strip()]
|
||||
if not lines:
|
||||
return None
|
||||
# Multiple matches are unusual (one (task, profile) should produce one
|
||||
# container) but can happen if a previous Hermes process crashed
|
||||
# mid-cleanup. Prefer a running one if present; otherwise pick the
|
||||
# first listed. Stale duplicates get reaped by the orphan-reaper in a
|
||||
# follow-up commit; we don't try to be heroic about them here.
|
||||
running = None
|
||||
first = None
|
||||
for ln in lines:
|
||||
parts = ln.split("\t", 1)
|
||||
if len(parts) != 2:
|
||||
continue
|
||||
cid, state = parts[0], parts[1].lower()
|
||||
if first is None:
|
||||
first = (cid, state)
|
||||
if state == "running" and running is None:
|
||||
running = (cid, state)
|
||||
return running or first
|
||||
|
||||
def cleanup(self, *, force_remove: bool = False):
|
||||
"""Tear down the container according to persist mode and *force_remove*.
|
||||
|
||||
Persist-mode (``persist_across_processes=True``, the default) leaves the
|
||||
container **running** untouched. The docs promise "ONE long-lived
|
||||
container shared across sessions" and stopping it on every Hermes exit
|
||||
breaks that promise:
|
||||
|
||||
* Background processes inside the container (``npm run dev``, watchers,
|
||||
long-running pytest) get killed every time the user runs ``/quit``.
|
||||
* Every reuse requires ``docker start`` + waiting for the container to
|
||||
come back up, adding 1–2s to the first tool call of the new session.
|
||||
* The user-visible difference between "ONE long-lived container" and
|
||||
"a new container that happens to share state" is exactly this:
|
||||
processes survive in the former, die in the latter.
|
||||
|
||||
Resource reclamation for the persist-mode case lives in the
|
||||
``reap_orphan_containers()`` path (see issue #20561 commit 3): if no
|
||||
Hermes process touches a labeled container for ``2 × lifetime_seconds``
|
||||
it gets ``docker rm -f``'d at the next Hermes startup. That covers the
|
||||
SIGKILL / OOM / abandoned-laptop cases without us needing to stop the
|
||||
container on every graceful exit.
|
||||
|
||||
Opt-out mode (``persist_across_processes=False``) still does
|
||||
``docker stop`` + ``docker rm -f`` on every cleanup, matching the
|
||||
pre-PR behavior for users who explicitly want per-process isolation.
|
||||
|
||||
``force_remove=True`` overrides persist mode and always tears the
|
||||
container down (``docker stop`` + ``docker rm -f``). This is the
|
||||
explicit-teardown path for ``/reset``, ``cleanup_vm(task_id)``-driven
|
||||
resets, or any caller that wants a guaranteed fresh container on next
|
||||
``DockerEnvironment(task_id=...)``. No current caller passes
|
||||
``force_remove=True``; the parameter is here so the explicit-teardown
|
||||
semantics can be wired up later without changing this method's
|
||||
signature.
|
||||
|
||||
Cleanup runs on a daemon thread with bounded ``subprocess.run`` calls
|
||||
(not the racy ``Popen(... &)`` pattern from before PR #33645). The
|
||||
atexit hook in ``tools/terminal_tool.py`` waits up to 15s for the
|
||||
thread to finish before the interpreter exits, so ``docker stop`` /
|
||||
``docker rm`` actually completes when we do trigger it.
|
||||
"""
|
||||
container_id = self._container_id
|
||||
if not container_id:
|
||||
# Still drop the bind-mount dirs if any were allocated and we're
|
||||
# NOT in persist mode (persist mode preserves them).
|
||||
if not self._persistent:
|
||||
# Also schedule removal (stop only leaves it as stopped)
|
||||
try:
|
||||
subprocess.Popen(
|
||||
f"sleep 3 && {self._docker_exe} rm -f {self._container_id} >/dev/null 2>&1 &",
|
||||
shell=True,
|
||||
)
|
||||
except Exception:
|
||||
pass
|
||||
self._container_id = None
|
||||
for d in (self._workspace_dir, self._home_dir):
|
||||
if d:
|
||||
shutil.rmtree(d, ignore_errors=True)
|
||||
return
|
||||
|
||||
if not self._persistent:
|
||||
# Decide what to actually do. Three cases:
|
||||
#
|
||||
# force_remove=True → stop + rm (explicit teardown)
|
||||
# persist_across_processes=True → no-op (leave container running)
|
||||
# persist_across_processes=False → stop + rm (per-process isolation)
|
||||
#
|
||||
# The persist-mode no-op is the issue-#20561 contract: the container
|
||||
# outlives Hermes processes, processes inside it stay alive, and
|
||||
# reuse on next startup is instant.
|
||||
if force_remove:
|
||||
should_stop = True
|
||||
should_remove = True
|
||||
elif self._persist_across_processes:
|
||||
# No-op for the container. Drop the in-process handle so a fresh
|
||||
# __init__ will re-probe via labels (and find the running
|
||||
# container) instead of trying to reuse a stale Python reference.
|
||||
self._container_id = None
|
||||
return
|
||||
else:
|
||||
should_stop = True
|
||||
should_remove = True
|
||||
|
||||
# Capture state needed by the worker before we null out the attrs —
|
||||
# the worker thread can outlive ``self``.
|
||||
docker_exe = self._docker_exe
|
||||
log_id = container_id[:12]
|
||||
|
||||
def _do_cleanup() -> None:
|
||||
if should_stop:
|
||||
try:
|
||||
subprocess.run(
|
||||
[docker_exe, "stop", "-t", "10", container_id],
|
||||
capture_output=True, timeout=30,
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.warning("docker stop %s timed out / failed: %s", log_id, e)
|
||||
if should_remove:
|
||||
try:
|
||||
subprocess.run(
|
||||
[docker_exe, "rm", "-f", container_id],
|
||||
capture_output=True, timeout=30,
|
||||
)
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
logger.warning("docker rm -f %s failed: %s", log_id, e)
|
||||
|
||||
# Daemon thread: doesn't block interpreter exit (atexit returns
|
||||
# promptly), but unlike the old ``Popen(... &)`` shell trick the
|
||||
# Python-level join semantics let the thread actually run to
|
||||
# completion if the interpreter is still alive. atexit registers
|
||||
# ``_atexit_cleanup`` in terminal_tool.py which waits up to ~60s for
|
||||
# outstanding cleanups, so most exits complete the work cleanly.
|
||||
import threading
|
||||
t = threading.Thread(target=_do_cleanup, daemon=True, name=f"hermes-cleanup-{log_id}")
|
||||
t.start()
|
||||
self._cleanup_thread = t
|
||||
self._container_id = None
|
||||
|
||||
# Bind-mount dir teardown only runs when we actually removed the
|
||||
# container (the dirs are the container's filesystem state; keeping
|
||||
# them around with no container would orphan the data on disk).
|
||||
if should_remove and not self._persistent:
|
||||
for d in (self._workspace_dir, self._home_dir):
|
||||
if d:
|
||||
shutil.rmtree(d, ignore_errors=True)
|
||||
|
||||
def wait_for_cleanup(self, timeout: float = 30.0) -> bool:
|
||||
"""Block up to *timeout* seconds for the cleanup worker thread.
|
||||
|
||||
Returns ``True`` if the thread finished (or no thread was started),
|
||||
``False`` on timeout. The atexit hook in terminal_tool.py calls this
|
||||
on every active environment so docker stop/rm actually completes
|
||||
before the Python process exits — without this, ``hermes /quit``
|
||||
races the interpreter shutdown and leaves stopped containers behind.
|
||||
"""
|
||||
thread = getattr(self, "_cleanup_thread", None)
|
||||
if thread is None or not thread.is_alive():
|
||||
return True
|
||||
thread.join(timeout=timeout)
|
||||
return not thread.is_alive()
|
||||
|
||||
@@ -422,6 +422,17 @@ def _resolve_stdio_command(command: str, env: dict) -> tuple[str, dict]:
|
||||
candidates = [
|
||||
os.path.join(hermes_home, "node", "bin", resolved_command),
|
||||
os.path.join(os.path.expanduser("~"), ".local", "bin", resolved_command),
|
||||
# /usr/local/bin is the canonical install location for Node on
|
||||
# Linux from-source builds, the upstream node:bookworm-slim
|
||||
# image (which the Hermes Docker image copies node + npm +
|
||||
# corepack from since #4977), and macOS Homebrew on Intel.
|
||||
# Without this candidate, any MCP server configured with an
|
||||
# env.PATH that omits /usr/local/bin (a common pattern when
|
||||
# users hand-author PATH for sandboxing) fails with ENOENT
|
||||
# at execvp, and a naive symlink workaround into the user's
|
||||
# PATH only fails one layer deeper because npx's shebang
|
||||
# re-execs /usr/bin/env node which needs the same directory.
|
||||
os.path.join(os.sep, "usr", "local", "bin", resolved_command),
|
||||
]
|
||||
for candidate in candidates:
|
||||
if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
|
||||
|
||||
+143
-3
@@ -861,6 +861,78 @@ _creation_locks_lock = threading.Lock() # Protects _creation_locks dict itself
|
||||
_cleanup_thread = None
|
||||
_cleanup_running = False
|
||||
|
||||
# Once-per-process guard for the docker orphan reaper (issue #20561).
|
||||
# Set when _maybe_reap_docker_orphans first runs; concurrent _create_environment
|
||||
# calls for parallel subagents won't re-trigger the sweep.
|
||||
_docker_orphan_reaper_ran = False
|
||||
_docker_orphan_reaper_lock = threading.Lock()
|
||||
|
||||
|
||||
def _maybe_reap_docker_orphans(container_config: Dict[str, Any]) -> None:
|
||||
"""Run the docker orphan reaper once per process, if enabled.
|
||||
|
||||
Sweeps long-Exited containers labeled ``hermes-agent=1`` for the current
|
||||
profile that match the issue #20561 leak class — containers left behind
|
||||
by Hermes processes that exited without firing ``atexit`` (SIGKILL,
|
||||
OOM, terminal-window-close). The reaper is conservative by default:
|
||||
only Exited containers older than ``2 × lifetime_seconds`` and scoped to
|
||||
the current profile.
|
||||
|
||||
Gates:
|
||||
|
||||
* ``terminal.docker_orphan_reaper: false`` disables it entirely (the
|
||||
operator opted out — usually because they're running multiple
|
||||
Hermes processes in the same profile and don't trust the
|
||||
conservative defaults).
|
||||
* ``_docker_orphan_reaper_ran`` flag — sweep runs once per Python
|
||||
interpreter, not on every subagent / RL-rollout / parallel
|
||||
``terminal()`` call.
|
||||
"""
|
||||
global _docker_orphan_reaper_ran
|
||||
if not container_config.get("docker_orphan_reaper", True):
|
||||
return
|
||||
# Cheap double-checked-locking: read without the lock, take the lock
|
||||
# only on first run, recheck inside.
|
||||
if _docker_orphan_reaper_ran:
|
||||
return
|
||||
with _docker_orphan_reaper_lock:
|
||||
if _docker_orphan_reaper_ran:
|
||||
return
|
||||
_docker_orphan_reaper_ran = True
|
||||
|
||||
# 2 × lifetime_seconds gives sibling Hermes processes a generous grace
|
||||
# window. Floor at 60s so an operator with TERMINAL_LIFETIME_SECONDS=0
|
||||
# doesn't get an instant-reap that races their own setup.
|
||||
# ``container_config`` only carries container_* keys, so read
|
||||
# lifetime_seconds from the env var the rest of the module uses.
|
||||
try:
|
||||
lifetime = int(os.getenv("TERMINAL_LIFETIME_SECONDS", "300"))
|
||||
except (TypeError, ValueError):
|
||||
lifetime = 300
|
||||
lifetime = max(60, lifetime)
|
||||
max_age = lifetime * 2
|
||||
|
||||
try:
|
||||
from tools.environments.docker import (
|
||||
reap_orphan_containers, _get_active_profile_name,
|
||||
)
|
||||
except ImportError:
|
||||
return
|
||||
try:
|
||||
profile = _get_active_profile_name()
|
||||
removed = reap_orphan_containers(
|
||||
max_age_seconds=max_age, profile_filter=profile,
|
||||
)
|
||||
if removed:
|
||||
logger.info(
|
||||
"Docker orphan reaper removed %d stale container(s) for profile %s",
|
||||
removed, profile,
|
||||
)
|
||||
except Exception as e:
|
||||
# Never fail the env-creation path because of a janitor problem.
|
||||
logger.debug("Docker orphan reaper raised: %s", e)
|
||||
|
||||
|
||||
# Per-task environment overrides registry.
|
||||
# Allows environments (e.g., TerminalBench2Env) to specify a custom Docker/Modal
|
||||
# image for a specific task_id BEFORE the agent loop starts. When the terminal or
|
||||
@@ -1024,6 +1096,22 @@ def _get_env_config() -> Dict[str, Any]:
|
||||
"docker_env": _parse_env_var("TERMINAL_DOCKER_ENV", "{}", json.loads, "valid JSON"),
|
||||
"docker_run_as_host_user": os.getenv("TERMINAL_DOCKER_RUN_AS_HOST_USER", "false").lower() in {"true", "1", "yes"},
|
||||
"docker_extra_args": _parse_env_var("TERMINAL_DOCKER_EXTRA_ARGS", "[]", json.loads, "valid JSON"),
|
||||
# Cross-process container reuse (issue #20561). The docs claim
|
||||
# "ONE long-lived container shared across sessions" — this toggle
|
||||
# makes that real by probing for a labeled container at startup and
|
||||
# attaching to it instead of always starting a fresh one. Set to
|
||||
# ``false`` for hard per-process isolation (no reuse, container is
|
||||
# removed on exit).
|
||||
"docker_persist_across_processes": os.getenv(
|
||||
"TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES", "true"
|
||||
).lower() in {"true", "1", "yes"},
|
||||
# Startup orphan reaper for hermes-tagged containers left behind by
|
||||
# crashed / SIGKILL'd previous processes that bypassed atexit.
|
||||
# Conservative: only sweeps Exited containers older than 2× the
|
||||
# idle-reap window AND scoped to the current profile. Issue #20561.
|
||||
"docker_orphan_reaper": os.getenv(
|
||||
"TERMINAL_DOCKER_ORPHAN_REAPER", "true"
|
||||
).lower() in {"true", "1", "yes"},
|
||||
}
|
||||
|
||||
|
||||
@@ -1072,6 +1160,13 @@ def _create_environment(env_type: str, image: str, cwd: str, timeout: int,
|
||||
return _LocalEnvironment(cwd=cwd, timeout=timeout)
|
||||
|
||||
elif env_type == "docker":
|
||||
# One-shot orphan reaper: clean up labeled containers left behind by
|
||||
# prior Hermes processes that hit SIGKILL / OOM / a closed terminal
|
||||
# before the atexit cleanup hook could run. Gated to once per
|
||||
# process so concurrent _create_environment calls (parallel
|
||||
# subagents, RL benchmarks) don't run the reaper N times.
|
||||
# Disable via ``terminal.docker_orphan_reaper: false`` (issue #20561).
|
||||
_maybe_reap_docker_orphans(cc)
|
||||
return _DockerEnvironment(
|
||||
image=image, cwd=cwd, timeout=timeout,
|
||||
cpu=cpu, memory=memory, disk=disk,
|
||||
@@ -1083,6 +1178,7 @@ def _create_environment(env_type: str, image: str, cwd: str, timeout: int,
|
||||
env=docker_env,
|
||||
run_as_host_user=cc.get("docker_run_as_host_user", False),
|
||||
extra_args=docker_extra_args,
|
||||
persist_across_processes=cc.get("docker_persist_across_processes", True),
|
||||
)
|
||||
|
||||
elif env_type == "singularity":
|
||||
@@ -1330,8 +1426,27 @@ def cleanup_all_environments():
|
||||
return cleaned
|
||||
|
||||
|
||||
def cleanup_vm(task_id: str):
|
||||
"""Manually clean up a specific environment by task_id."""
|
||||
def cleanup_vm(task_id: str, *, force_remove: bool = False):
|
||||
"""Manually clean up a specific environment by task_id.
|
||||
|
||||
*force_remove* (default False) is forwarded to backends that accept it
|
||||
— currently only ``DockerEnvironment``. The default of False matches
|
||||
session-lifecycle semantics: this function is called from
|
||||
``AIAgent.close()`` (TUI session close, gateway session teardown) and the
|
||||
per-turn cleanup branch for non-persistent envs, both of which should
|
||||
honor the user's persist-mode preference. Stopping the container here
|
||||
would defeat the "ONE long-lived container shared across sessions"
|
||||
contract — exactly the bug Ben reported when the container was killed
|
||||
on every TUI session close.
|
||||
|
||||
Pass ``force_remove=True`` for actual user-initiated teardown
|
||||
(e.g. ``/reset``-style flows that haven't been wired yet, or future
|
||||
"destroy my sandbox" commands).
|
||||
|
||||
The idle reaper passes the env through ``env.cleanup()`` directly (not
|
||||
via this function), so persist-mode idle envs are similarly no-op'd —
|
||||
only the orphan reaper at next startup reclaims them.
|
||||
"""
|
||||
# Remove from tracking dicts while holding the lock, but defer the
|
||||
# actual (potentially slow) env.cleanup() call to outside the lock
|
||||
# so other tool calls aren't blocked.
|
||||
@@ -1356,7 +1471,14 @@ def cleanup_vm(task_id: str):
|
||||
|
||||
try:
|
||||
if hasattr(env, 'cleanup'):
|
||||
env.cleanup()
|
||||
# Pass force_remove only if the env's cleanup() accepts it
|
||||
# (DockerEnvironment after issue #20561; other backends don't).
|
||||
import inspect
|
||||
sig = inspect.signature(env.cleanup)
|
||||
if "force_remove" in sig.parameters:
|
||||
env.cleanup(force_remove=force_remove)
|
||||
else:
|
||||
env.cleanup()
|
||||
elif hasattr(env, 'stop'):
|
||||
env.stop()
|
||||
elif hasattr(env, 'terminate'):
|
||||
@@ -1378,7 +1500,23 @@ def _atexit_cleanup():
|
||||
if _active_environments:
|
||||
count = len(_active_environments)
|
||||
logger.info("Shutting down %d remaining sandbox(es)...", count)
|
||||
# Snapshot the env objects BEFORE cleanup_all_environments empties
|
||||
# the dict; we need them to wait on docker cleanup threads after the
|
||||
# registry has been cleared.
|
||||
envs_to_wait = list(_active_environments.values())
|
||||
cleanup_all_environments()
|
||||
# Block briefly so docker stop/rm actually completes before the
|
||||
# interpreter exits. Issue #20561 — without this join, the daemon
|
||||
# cleanup threads were getting torn down mid-`docker stop`, leaving
|
||||
# Exited containers piled up on the host.
|
||||
for env in envs_to_wait:
|
||||
wait_fn = getattr(env, "wait_for_cleanup", None)
|
||||
if wait_fn is None:
|
||||
continue
|
||||
try:
|
||||
wait_fn(timeout=15.0)
|
||||
except Exception as e: # never block shutdown on a bad backend
|
||||
logger.debug("wait_for_cleanup raised on exit: %s", e)
|
||||
|
||||
atexit.register(_atexit_cleanup)
|
||||
|
||||
@@ -1746,6 +1884,8 @@ def terminal_tool(
|
||||
"docker_env": config.get("docker_env", {}),
|
||||
"docker_run_as_host_user": config.get("docker_run_as_host_user", False),
|
||||
"docker_extra_args": config.get("docker_extra_args", []),
|
||||
"docker_persist_across_processes": config.get("docker_persist_across_processes", True),
|
||||
"docker_orphan_reaper": config.get("docker_orphan_reaper", True),
|
||||
}
|
||||
|
||||
local_config = None
|
||||
|
||||
@@ -1589,7 +1589,7 @@ wheels = [
|
||||
|
||||
[[package]]
|
||||
name = "hermes-agent"
|
||||
version = "0.15.0"
|
||||
version = "0.15.1"
|
||||
source = { editable = "." }
|
||||
dependencies = [
|
||||
{ name = "croniter" },
|
||||
|
||||
Generated
+2786
-6
File diff suppressed because it is too large
Load Diff
+1
-1
@@ -10,7 +10,7 @@
|
||||
"preview": "vite preview"
|
||||
},
|
||||
"dependencies": {
|
||||
"@nous-research/ui": "0.16.0",
|
||||
"@nous-research/ui": "0.18.2",
|
||||
"@observablehq/plot": "^0.6.17",
|
||||
"@react-three/fiber": "^9.6.0",
|
||||
"@tailwindcss/vite": "^4.2.1",
|
||||
|
||||
+2
-2
@@ -50,12 +50,12 @@ import {
|
||||
import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { SelectionSwitcher } from "@nous-research/ui/ui/components/selection-switcher";
|
||||
import { Spinner } from "@nous-research/ui/ui/components/spinner";
|
||||
import { Typography } from "@/components/NouiTypography";
|
||||
import { Typography } from "@nous-research/ui/ui/components/typography/index";
|
||||
import { cn } from "@/lib/utils";
|
||||
import { Backdrop } from "@/components/Backdrop";
|
||||
import { SidebarFooter } from "@/components/SidebarFooter";
|
||||
import { SidebarStatusStrip, gatewayLine } from "@/components/SidebarStatusStrip";
|
||||
import { useBelowBreakpoint } from "@/hooks/useBelowBreakpoint";
|
||||
import { useBelowBreakpoint } from "@nous-research/ui/hooks/use-below-breakpoint";
|
||||
import { useSidebarStatus } from "@/hooks/useSidebarStatus";
|
||||
import { AuthWidget } from "@/components/AuthWidget";
|
||||
import { PageHeaderProvider } from "@/contexts/PageHeaderProvider";
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
import { Select, SelectOption } from "@nous-research/ui/ui/components/select";
|
||||
import { Switch } from "@nous-research/ui/ui/components/switch";
|
||||
import { Input } from "@/components/ui/input";
|
||||
import { Label } from "@/components/ui/label";
|
||||
import { Input } from "@nous-research/ui/ui/components/input";
|
||||
import { Label } from "@nous-research/ui/ui/components/label";
|
||||
|
||||
function FieldHint({ schema, schemaKey }: { schema: Record<string, unknown>; schemaKey: string }) {
|
||||
const keyPath = schemaKey.includes(".") ? schemaKey : "";
|
||||
|
||||
@@ -1,225 +0,0 @@
|
||||
import {
|
||||
type PointerEvent as ReactPointerEvent,
|
||||
type ReactNode,
|
||||
useEffect,
|
||||
useRef,
|
||||
useState,
|
||||
} from "react";
|
||||
import { createPortal } from "react-dom";
|
||||
import { Typography } from "@/components/NouiTypography";
|
||||
import { cn, themedBody } from "@/lib/utils";
|
||||
|
||||
const CLOSE_DRAG_MIN_PX = 72;
|
||||
const CLOSE_DRAG_RATIO = 0.18;
|
||||
const SHEET_TRANSITION_MS = 280;
|
||||
|
||||
/**
|
||||
* Mobile-first picker shell: fixed backdrop + bottom sheet, portaled to `body`
|
||||
* so nested overflow/transform in the sidebar cannot clip menus (theme /
|
||||
* language switchers). Open/close uses slide + fade; teardown is delayed until
|
||||
* the exit animation finishes so animations can complete.
|
||||
*
|
||||
* Drag the header/handle downward to dismiss (skipped when reduced motion is on).
|
||||
*/
|
||||
export function BottomPickSheet({
|
||||
backdropDismissLabel = "Dismiss",
|
||||
children,
|
||||
onClose,
|
||||
open,
|
||||
title,
|
||||
}: BottomPickSheetProps) {
|
||||
const [renderPortal, setRenderPortal] = useState(open);
|
||||
const [entered, setEntered] = useState(false);
|
||||
const [dragOffsetPx, setDragOffsetPx] = useState(0);
|
||||
const [dragActive, setDragActive] = useState(false);
|
||||
|
||||
const closeTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);
|
||||
const sheetRef = useRef<HTMLDivElement>(null);
|
||||
const dragTrackingRef = useRef(false);
|
||||
const dragStartYRef = useRef(0);
|
||||
const dragOffsetRef = useRef(0);
|
||||
|
||||
const reducedMotion =
|
||||
typeof window !== "undefined" &&
|
||||
window.matchMedia("(prefers-reduced-motion: reduce)").matches;
|
||||
|
||||
const syncDragPx = (next: number) => {
|
||||
dragOffsetRef.current = next;
|
||||
setDragOffsetPx(next);
|
||||
};
|
||||
|
||||
useEffect(() => {
|
||||
if (closeTimerRef.current) {
|
||||
clearTimeout(closeTimerRef.current);
|
||||
closeTimerRef.current = null;
|
||||
}
|
||||
|
||||
const ms = reducedMotion ? 0 : SHEET_TRANSITION_MS;
|
||||
|
||||
let openRafId = 0;
|
||||
let exitRafId = 0;
|
||||
|
||||
if (open) {
|
||||
openRafId = requestAnimationFrame(() => {
|
||||
dragTrackingRef.current = false;
|
||||
dragOffsetRef.current = 0;
|
||||
setDragActive(false);
|
||||
setDragOffsetPx(0);
|
||||
setRenderPortal(true);
|
||||
requestAnimationFrame(() => {
|
||||
requestAnimationFrame(() => setEntered(true));
|
||||
});
|
||||
});
|
||||
} else {
|
||||
exitRafId = requestAnimationFrame(() => {
|
||||
dragTrackingRef.current = false;
|
||||
setDragActive(false);
|
||||
setEntered(false);
|
||||
closeTimerRef.current = window.setTimeout(() => {
|
||||
dragOffsetRef.current = 0;
|
||||
setDragOffsetPx(0);
|
||||
setRenderPortal(false);
|
||||
closeTimerRef.current = null;
|
||||
}, ms);
|
||||
});
|
||||
}
|
||||
|
||||
return () => {
|
||||
cancelAnimationFrame(openRafId);
|
||||
cancelAnimationFrame(exitRafId);
|
||||
if (closeTimerRef.current) {
|
||||
clearTimeout(closeTimerRef.current);
|
||||
closeTimerRef.current = null;
|
||||
}
|
||||
};
|
||||
}, [open, reducedMotion]);
|
||||
|
||||
useEffect(() => {
|
||||
if (!renderPortal) return;
|
||||
const prev = document.body.style.overflow;
|
||||
document.body.style.overflow = "hidden";
|
||||
return () => {
|
||||
document.body.style.overflow = prev;
|
||||
};
|
||||
}, [renderPortal]);
|
||||
|
||||
if (!renderPortal || typeof document === "undefined") return null;
|
||||
|
||||
const durationClass = reducedMotion ? "duration-0" : "duration-[280ms]";
|
||||
|
||||
const draggingVisual = dragActive || dragOffsetPx > 0;
|
||||
|
||||
const onDragPointerDown = (e: ReactPointerEvent<HTMLDivElement>) => {
|
||||
if (reducedMotion || !entered) return;
|
||||
if (e.pointerType === "mouse" && e.button !== 0) return;
|
||||
|
||||
dragTrackingRef.current = true;
|
||||
setDragActive(true);
|
||||
dragStartYRef.current = e.clientY;
|
||||
syncDragPx(0);
|
||||
e.currentTarget.setPointerCapture(e.pointerId);
|
||||
};
|
||||
|
||||
const onDragPointerMove = (e: ReactPointerEvent<HTMLDivElement>) => {
|
||||
if (!dragTrackingRef.current) return;
|
||||
const dy = e.clientY - dragStartYRef.current;
|
||||
const next = Math.max(0, dy);
|
||||
const sheetH = sheetRef.current?.offsetHeight ?? 560;
|
||||
syncDragPx(Math.min(next, sheetH));
|
||||
};
|
||||
|
||||
const endDrag = (e: ReactPointerEvent<HTMLDivElement>) => {
|
||||
if (!dragTrackingRef.current) return;
|
||||
dragTrackingRef.current = false;
|
||||
setDragActive(false);
|
||||
try {
|
||||
e.currentTarget.releasePointerCapture(e.pointerId);
|
||||
} catch {
|
||||
/* already released */
|
||||
}
|
||||
|
||||
const sheetH = sheetRef.current?.offsetHeight ?? 560;
|
||||
const threshold = Math.max(CLOSE_DRAG_MIN_PX, sheetH * CLOSE_DRAG_RATIO);
|
||||
const d = dragOffsetRef.current;
|
||||
|
||||
if (d >= threshold) {
|
||||
onClose();
|
||||
return;
|
||||
}
|
||||
syncDragPx(0);
|
||||
};
|
||||
|
||||
return createPortal(
|
||||
<div className="fixed inset-0 z-[200] flex flex-col justify-end">
|
||||
<button
|
||||
type="button"
|
||||
aria-label={backdropDismissLabel}
|
||||
className={cn(
|
||||
"absolute inset-0 bg-black/55 backdrop-blur-[2px]",
|
||||
"transition-opacity ease-out motion-reduce:transition-none",
|
||||
durationClass,
|
||||
entered ? "opacity-100" : "opacity-0",
|
||||
)}
|
||||
onClick={onClose}
|
||||
/>
|
||||
|
||||
<div
|
||||
aria-label={title}
|
||||
aria-modal="true"
|
||||
ref={sheetRef}
|
||||
className={cn(
|
||||
themedBody,
|
||||
"relative flex max-h-[85dvh] min-h-0 flex-col rounded-t-xl border border-current/20",
|
||||
"bg-background-base/98 pb-[max(1rem,env(safe-area-inset-bottom))]",
|
||||
"shadow-[0_-12px_40px_-8px_rgba(0,0,0,0.55)] backdrop-blur-md",
|
||||
"ease-out motion-reduce:transition-none transform-gpu",
|
||||
draggingVisual ? "transition-none" : cn("transition-transform", durationClass),
|
||||
entered ? "translate-y-0" : "translate-y-full",
|
||||
)}
|
||||
role="dialog"
|
||||
style={
|
||||
entered && dragOffsetPx > 0
|
||||
? { transform: `translateY(${dragOffsetPx}px)` }
|
||||
: undefined
|
||||
}
|
||||
>
|
||||
<div
|
||||
className={cn(
|
||||
"flex shrink-0 flex-col gap-2 border-b border-current/15 px-4 pb-3 pt-2",
|
||||
"touch-none select-none",
|
||||
reducedMotion ? "cursor-default" : "cursor-grab active:cursor-grabbing",
|
||||
)}
|
||||
onPointerCancel={endDrag}
|
||||
onPointerDown={onDragPointerDown}
|
||||
onPointerMove={onDragPointerMove}
|
||||
onPointerUp={endDrag}
|
||||
>
|
||||
<div
|
||||
aria-hidden
|
||||
className="mx-auto h-1 w-10 shrink-0 rounded-full bg-current/20"
|
||||
/>
|
||||
|
||||
<Typography
|
||||
mondwest
|
||||
className="text-display text-xs tracking-[0.12em] text-text-tertiary"
|
||||
>
|
||||
{title}
|
||||
</Typography>
|
||||
</div>
|
||||
|
||||
<div className="min-h-0 flex-1 overflow-y-auto overscroll-contain">
|
||||
{children}
|
||||
</div>
|
||||
</div>
|
||||
</div>,
|
||||
document.body,
|
||||
);
|
||||
}
|
||||
|
||||
interface BottomPickSheetProps {
|
||||
backdropDismissLabel?: string;
|
||||
children: ReactNode;
|
||||
onClose: () => void;
|
||||
open: boolean;
|
||||
title: string;
|
||||
}
|
||||
@@ -25,7 +25,7 @@
|
||||
|
||||
import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { Badge } from "@nous-research/ui/ui/components/badge";
|
||||
import { Card } from "@/components/ui/card";
|
||||
import { Card } from "@nous-research/ui/ui/components/card";
|
||||
|
||||
import { ModelPickerDialog } from "@/components/ModelPickerDialog";
|
||||
import { ToolCall, type ToolEntry } from "@/components/ToolCall";
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
import { ConfirmDialog } from "@/components/ui/confirm-dialog";
|
||||
import { ConfirmDialog } from "@nous-research/ui/ui/components/confirm-dialog";
|
||||
import { useI18n } from "@/i18n";
|
||||
|
||||
export function DeleteConfirmDialog({
|
||||
|
||||
@@ -2,9 +2,9 @@ import { useState, useRef, useEffect } from "react";
|
||||
import { createPortal } from "react-dom";
|
||||
import { Check } from "lucide-react";
|
||||
import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { BottomPickSheet } from "@/components/BottomPickSheet";
|
||||
import { Typography } from "@/components/NouiTypography";
|
||||
import { useBelowBreakpoint } from "@/hooks/useBelowBreakpoint";
|
||||
import { BottomSheet } from "@nous-research/ui/ui/components/bottom-sheet";
|
||||
import { Typography } from "@nous-research/ui/ui/components/typography/index";
|
||||
import { useBelowBreakpoint } from "@nous-research/ui/hooks/use-below-breakpoint";
|
||||
import { useI18n } from "@/i18n/context";
|
||||
import { LOCALE_META } from "@/i18n";
|
||||
import type { Locale } from "@/i18n";
|
||||
@@ -87,7 +87,7 @@ export function LanguageSwitcher({ collapsed = false, dropUp = false }: Language
|
||||
</Button>
|
||||
|
||||
{useMobileSheet && (
|
||||
<BottomPickSheet
|
||||
<BottomSheet
|
||||
backdropDismissLabel={t.common.close}
|
||||
onClose={() => setOpen(false)}
|
||||
open={open}
|
||||
@@ -101,7 +101,7 @@ export function LanguageSwitcher({ collapsed = false, dropUp = false }: Language
|
||||
setOpen={setOpen}
|
||||
/>
|
||||
</div>
|
||||
</BottomPickSheet>
|
||||
</BottomSheet>
|
||||
)}
|
||||
|
||||
{open && !useMobileSheet && (() => {
|
||||
|
||||
@@ -2,8 +2,8 @@ import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { Checkbox } from "@nous-research/ui/ui/components/checkbox";
|
||||
import { ListItem } from "@nous-research/ui/ui/components/list-item";
|
||||
import { Spinner } from "@nous-research/ui/ui/components/spinner";
|
||||
import { Label } from "@/components/ui/label";
|
||||
import { Input } from "@/components/ui/input";
|
||||
import { Input } from "@nous-research/ui/ui/components/input";
|
||||
import { Label } from "@nous-research/ui/ui/components/label";
|
||||
import type { GatewayClient } from "@/lib/gatewayClient";
|
||||
import { Check, Search, X } from "lucide-react";
|
||||
import { useEffect, useMemo, useRef, useState } from "react";
|
||||
|
||||
@@ -1,63 +0,0 @@
|
||||
import { forwardRef, type ElementType, type HTMLAttributes, type ReactNode } from "react";
|
||||
import { cn } from "@/lib/utils";
|
||||
|
||||
type TypographyProps = HTMLAttributes<HTMLElement> & {
|
||||
as?: ElementType;
|
||||
children?: ReactNode;
|
||||
compressed?: boolean;
|
||||
courier?: boolean;
|
||||
expanded?: boolean;
|
||||
mondwest?: boolean;
|
||||
mono?: boolean;
|
||||
sans?: boolean;
|
||||
variant?: "sm" | "md" | "lg" | "xl";
|
||||
};
|
||||
|
||||
const variantClasses: Record<NonNullable<TypographyProps["variant"]>, string> = {
|
||||
sm: "leading-[1.4] text-[.9375rem] tracking-[0.1875rem]",
|
||||
md: "text-[2.625rem] leading-[1] tracking-[0.0525rem]",
|
||||
lg: "text-[2.625rem] leading-[1] tracking-[0.0525rem]",
|
||||
xl: "text-[4.5rem] leading-[1] tracking-[0.135rem]",
|
||||
};
|
||||
|
||||
export const Typography = forwardRef<HTMLElement, TypographyProps>(function Typography(
|
||||
{
|
||||
as: Component = "span",
|
||||
className,
|
||||
compressed,
|
||||
courier,
|
||||
expanded,
|
||||
mondwest,
|
||||
mono,
|
||||
sans,
|
||||
variant,
|
||||
...props
|
||||
},
|
||||
ref,
|
||||
) {
|
||||
const hasFontVariant = compressed || courier || expanded || mondwest || mono || sans;
|
||||
|
||||
return (
|
||||
<Component
|
||||
className={cn(
|
||||
compressed && "font-compressed",
|
||||
courier && "font-courier",
|
||||
expanded && "font-expanded",
|
||||
mondwest && "font-mondwest tracking-[0.1875rem]",
|
||||
mono && "font-mono",
|
||||
(!hasFontVariant || sans) && "font-sans",
|
||||
variant && variantClasses[variant],
|
||||
className,
|
||||
)}
|
||||
ref={ref}
|
||||
{...props}
|
||||
/>
|
||||
);
|
||||
});
|
||||
|
||||
export const H2 = forwardRef<HTMLHeadingElement, Omit<TypographyProps, "as">>(function H2(
|
||||
{ className, variant = "lg", ...props },
|
||||
ref,
|
||||
) {
|
||||
return <Typography as="h2" className={cn("font-bold", className)} variant={variant} ref={ref} {...props} />;
|
||||
});
|
||||
@@ -3,9 +3,9 @@ import { ExternalLink, X, Check } from "lucide-react";
|
||||
import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { CopyButton } from "@nous-research/ui/ui/components/command-block";
|
||||
import { Spinner } from "@nous-research/ui/ui/components/spinner";
|
||||
import { H2 } from "@/components/NouiTypography";
|
||||
import { H2 } from "@nous-research/ui/ui/components/typography/h2";
|
||||
import { api, type OAuthProvider, type OAuthStartResponse } from "@/lib/api";
|
||||
import { Input } from "@/components/ui/input";
|
||||
import { Input } from "@nous-research/ui/ui/components/input";
|
||||
import { useI18n } from "@/i18n";
|
||||
import { cn, themedBody } from "@/lib/utils";
|
||||
|
||||
|
||||
@@ -16,9 +16,9 @@ import {
|
||||
CardDescription,
|
||||
CardHeader,
|
||||
CardTitle,
|
||||
} from "@/components/ui/card";
|
||||
} from "@nous-research/ui/ui/components/card";
|
||||
import { Badge } from "@nous-research/ui/ui/components/badge";
|
||||
import { ConfirmDialog } from "@/components/ui/confirm-dialog";
|
||||
import { ConfirmDialog } from "@nous-research/ui/ui/components/confirm-dialog";
|
||||
import { OAuthLoginModal } from "@/components/OAuthLoginModal";
|
||||
import { useI18n } from "@/i18n";
|
||||
|
||||
|
||||
@@ -2,7 +2,7 @@ import { AlertTriangle, Radio, Wifi, WifiOff } from "lucide-react";
|
||||
import type { PlatformStatus } from "@/lib/api";
|
||||
import { isoTimeAgo } from "@/lib/utils";
|
||||
import { Badge } from "@nous-research/ui/ui/components/badge";
|
||||
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
|
||||
import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
|
||||
import { useI18n } from "@/i18n";
|
||||
|
||||
export function PlatformsCard({ platforms }: PlatformsCardProps) {
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
import { Typography } from "@/components/NouiTypography";
|
||||
import { Typography } from "@nous-research/ui/ui/components/typography/index";
|
||||
import type { StatusResponse } from "@/lib/api";
|
||||
import { cn } from "@/lib/utils";
|
||||
import { useI18n } from "@/i18n";
|
||||
|
||||
@@ -3,9 +3,9 @@ import { createPortal } from "react-dom";
|
||||
import { Palette, Check } from "lucide-react";
|
||||
import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { ListItem } from "@nous-research/ui/ui/components/list-item";
|
||||
import { BottomPickSheet } from "@/components/BottomPickSheet";
|
||||
import { Typography } from "@/components/NouiTypography";
|
||||
import { useBelowBreakpoint } from "@/hooks/useBelowBreakpoint";
|
||||
import { BottomSheet } from "@nous-research/ui/ui/components/bottom-sheet";
|
||||
import { Typography } from "@nous-research/ui/ui/components/typography/index";
|
||||
import { useBelowBreakpoint } from "@nous-research/ui/hooks/use-below-breakpoint";
|
||||
import { BUILTIN_THEMES, useTheme } from "@/themes";
|
||||
import type { DashboardTheme, ThemeListEntry } from "@/themes";
|
||||
import { useI18n } from "@/i18n";
|
||||
@@ -91,7 +91,7 @@ export function ThemeSwitcher({ collapsed = false, dropUp = false }: ThemeSwitch
|
||||
</Button>
|
||||
|
||||
{useMobileSheet && (
|
||||
<BottomPickSheet
|
||||
<BottomSheet
|
||||
backdropDismissLabel={t.common.close}
|
||||
onClose={close}
|
||||
open={open}
|
||||
@@ -105,7 +105,7 @@ export function ThemeSwitcher({ collapsed = false, dropUp = false }: ThemeSwitch
|
||||
themeName={themeName}
|
||||
/>
|
||||
</div>
|
||||
</BottomPickSheet>
|
||||
</BottomSheet>
|
||||
)}
|
||||
|
||||
{open && !useMobileSheet && (() => {
|
||||
|
||||
@@ -1,40 +0,0 @@
|
||||
import { useEffect, useState } from "react";
|
||||
import { createPortal } from "react-dom";
|
||||
|
||||
export function Toast({ toast }: { toast: { message: string; type: "success" | "error" } | null }) {
|
||||
const [visible, setVisible] = useState(false);
|
||||
const [current, setCurrent] = useState(toast);
|
||||
|
||||
useEffect(() => {
|
||||
if (toast) {
|
||||
setCurrent(toast);
|
||||
setVisible(true);
|
||||
} else {
|
||||
setVisible(false);
|
||||
const timer = setTimeout(() => setCurrent(null), 200);
|
||||
return () => clearTimeout(timer);
|
||||
}
|
||||
}, [toast]);
|
||||
|
||||
if (!current) return null;
|
||||
|
||||
// Portal to document.body so the toast escapes any ancestor stacking context
|
||||
// (e.g. <main> has `relative z-2`, which would trap z-50 below the header's z-40).
|
||||
return createPortal(
|
||||
<div
|
||||
role="status"
|
||||
aria-live="polite"
|
||||
className={`fixed top-16 right-4 z-50 border px-4 py-2.5 font-courier text-xs tracking-wider uppercase backdrop-blur-sm ${
|
||||
current.type === "success"
|
||||
? "bg-success/15 text-success border-success/30"
|
||||
: "bg-destructive/15 text-destructive border-destructive/30"
|
||||
}`}
|
||||
style={{
|
||||
animation: visible ? "toast-in 200ms ease-out forwards" : "toast-out 200ms ease-in forwards",
|
||||
}}
|
||||
>
|
||||
{current.message}
|
||||
</div>,
|
||||
document.body,
|
||||
);
|
||||
}
|
||||
@@ -1,63 +0,0 @@
|
||||
import { cn, themedBody } from "@/lib/utils";
|
||||
|
||||
/**
|
||||
* Themed card primitive. Themes can restyle every card without touching
|
||||
* call sites by setting CSS vars under the `card` component-style bucket:
|
||||
*
|
||||
* componentStyles:
|
||||
* card:
|
||||
* clipPath: "polygon(10px 0, 100% 0, 100% calc(100% - 10px), calc(100% - 10px) 100%, 0 100%, 0 10px)"
|
||||
* border: "1px solid var(--color-ring)"
|
||||
* background: "linear-gradient(180deg, var(--color-card) 0%, transparent 100%)"
|
||||
* boxShadow: "0 0 0 1px var(--color-ring) inset, 0 0 24px -8px var(--warm-glow)"
|
||||
*
|
||||
* All properties are optional — vars that aren't set compute to their
|
||||
* CSS initial value, so the default shadcn-y card keeps looking normal
|
||||
* for themes that don't override anything.
|
||||
*/
|
||||
const CARD_STYLE: React.CSSProperties = {
|
||||
clipPath: "var(--component-card-clip-path)",
|
||||
borderImage: "var(--component-card-border-image)",
|
||||
background: "var(--component-card-background)",
|
||||
boxShadow: "var(--component-card-box-shadow)",
|
||||
};
|
||||
|
||||
export function Card({ className, style, ...props }: React.HTMLAttributes<HTMLDivElement>) {
|
||||
return (
|
||||
<div
|
||||
className={cn(
|
||||
"border border-border bg-card/80 text-card-foreground w-full",
|
||||
themedBody,
|
||||
className,
|
||||
)}
|
||||
style={{ ...CARD_STYLE, ...style }}
|
||||
{...props}
|
||||
/>
|
||||
);
|
||||
}
|
||||
|
||||
export function CardHeader({ className, ...props }: React.HTMLAttributes<HTMLDivElement>) {
|
||||
return <div className={cn("flex flex-col gap-1.5 p-4 border-b border-border", className)} {...props} />;
|
||||
}
|
||||
|
||||
export function CardTitle({ className, ...props }: React.HTMLAttributes<HTMLHeadingElement>) {
|
||||
return (
|
||||
<h3
|
||||
className={cn(
|
||||
"font-mondwest text-display text-sm tracking-[0.12em] text-text-primary",
|
||||
className,
|
||||
)}
|
||||
{...props}
|
||||
/>
|
||||
);
|
||||
}
|
||||
|
||||
export function CardDescription({ className, ...props }: React.HTMLAttributes<HTMLParagraphElement>) {
|
||||
return (
|
||||
<p className={cn("font-mondwest normal-case text-xs text-muted-foreground", className)} {...props} />
|
||||
);
|
||||
}
|
||||
|
||||
export function CardContent({ className, ...props }: React.HTMLAttributes<HTMLDivElement>) {
|
||||
return <div className={cn("p-4", className)} {...props} />;
|
||||
}
|
||||
@@ -1,137 +0,0 @@
|
||||
import { useEffect, useRef } from "react";
|
||||
import { createPortal } from "react-dom";
|
||||
import { AlertTriangle } from "lucide-react";
|
||||
import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { cn, themedBody } from "@/lib/utils";
|
||||
|
||||
export function ConfirmDialog({
|
||||
cancelLabel = "Cancel",
|
||||
confirmLabel = "Confirm",
|
||||
description,
|
||||
destructive = false,
|
||||
loading = false,
|
||||
onCancel,
|
||||
onConfirm,
|
||||
open,
|
||||
title,
|
||||
}: ConfirmDialogProps) {
|
||||
const dialogRef = useRef<HTMLDivElement>(null);
|
||||
|
||||
// Focus the confirm button when opened; trap ESC to cancel.
|
||||
useEffect(() => {
|
||||
if (!open) return;
|
||||
|
||||
const prevActive = document.activeElement as HTMLElement | null;
|
||||
dialogRef.current
|
||||
?.querySelector<HTMLButtonElement>("[data-confirm]")
|
||||
?.focus();
|
||||
|
||||
const onKey = (e: KeyboardEvent) => {
|
||||
if (e.key === "Escape") {
|
||||
e.preventDefault();
|
||||
onCancel();
|
||||
}
|
||||
};
|
||||
|
||||
document.addEventListener("keydown", onKey);
|
||||
const prevOverflow = document.body.style.overflow;
|
||||
document.body.style.overflow = "hidden";
|
||||
|
||||
return () => {
|
||||
document.removeEventListener("keydown", onKey);
|
||||
document.body.style.overflow = prevOverflow;
|
||||
prevActive?.focus?.();
|
||||
};
|
||||
}, [open, onCancel]);
|
||||
|
||||
if (!open) return null;
|
||||
|
||||
return createPortal(
|
||||
<div
|
||||
role="dialog"
|
||||
aria-modal="true"
|
||||
aria-labelledby="confirm-dialog-title"
|
||||
aria-describedby={description ? "confirm-dialog-desc" : undefined}
|
||||
onClick={(e) => {
|
||||
if (e.target === e.currentTarget) onCancel();
|
||||
}}
|
||||
className={cn(
|
||||
"fixed inset-0 z-50 flex items-center justify-center",
|
||||
"bg-black/60 backdrop-blur-sm",
|
||||
"animate-[fade-in_150ms_ease-out]",
|
||||
)}
|
||||
>
|
||||
<div
|
||||
ref={dialogRef}
|
||||
className={cn(
|
||||
themedBody,
|
||||
"relative w-full max-w-md mx-4",
|
||||
"border border-border bg-card shadow-lg",
|
||||
"animate-[dialog-in_180ms_ease-out]",
|
||||
)}
|
||||
>
|
||||
<div className="flex items-start gap-3 p-4 border-b border-border">
|
||||
{destructive && (
|
||||
<div
|
||||
aria-hidden
|
||||
className="mt-0.5 shrink-0 text-destructive"
|
||||
>
|
||||
<AlertTriangle className="h-4 w-4" />
|
||||
</div>
|
||||
)}
|
||||
|
||||
<div className="flex-1 min-w-0 flex flex-col gap-1">
|
||||
<h2
|
||||
id="confirm-dialog-title"
|
||||
className="font-mondwest text-display text-sm font-bold tracking-[0.12em] blend-lighter"
|
||||
>
|
||||
{title}
|
||||
</h2>
|
||||
|
||||
{description && (
|
||||
<p
|
||||
id="confirm-dialog-desc"
|
||||
className="font-mondwest normal-case text-xs text-muted-foreground leading-relaxed"
|
||||
>
|
||||
{description}
|
||||
</p>
|
||||
)}
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div className="flex items-center justify-end gap-2 p-3">
|
||||
<Button
|
||||
type="button"
|
||||
outlined
|
||||
onClick={onCancel}
|
||||
disabled={loading}
|
||||
>
|
||||
{cancelLabel}
|
||||
</Button>
|
||||
<Button
|
||||
data-confirm
|
||||
type="button"
|
||||
destructive={destructive}
|
||||
onClick={onConfirm}
|
||||
disabled={loading}
|
||||
>
|
||||
{loading ? "…" : confirmLabel}
|
||||
</Button>
|
||||
</div>
|
||||
</div>
|
||||
</div>,
|
||||
document.body,
|
||||
);
|
||||
}
|
||||
|
||||
interface ConfirmDialogProps {
|
||||
cancelLabel?: string;
|
||||
confirmLabel?: string;
|
||||
description?: string;
|
||||
destructive?: boolean;
|
||||
loading?: boolean;
|
||||
onCancel: () => void;
|
||||
onConfirm: () => void;
|
||||
open: boolean;
|
||||
title: string;
|
||||
}
|
||||
@@ -1,16 +0,0 @@
|
||||
import { cn } from "@/lib/utils";
|
||||
|
||||
export function Input({ className, ...props }: React.InputHTMLAttributes<HTMLInputElement>) {
|
||||
return (
|
||||
<input
|
||||
className={cn(
|
||||
"flex h-9 w-full border border-border bg-background/40 px-3 py-1 font-courier text-sm transition-colors",
|
||||
"placeholder:text-muted-foreground",
|
||||
"focus-visible:outline-none focus-visible:ring-1 focus-visible:ring-foreground/30 focus-visible:border-foreground/25",
|
||||
"disabled:cursor-not-allowed disabled:opacity-50",
|
||||
className,
|
||||
)}
|
||||
{...props}
|
||||
/>
|
||||
);
|
||||
}
|
||||
@@ -1,13 +0,0 @@
|
||||
import { cn } from "@/lib/utils";
|
||||
|
||||
export function Label({ className, ...props }: React.LabelHTMLAttributes<HTMLLabelElement>) {
|
||||
return (
|
||||
<label
|
||||
className={cn(
|
||||
"font-mondwest text-xs tracking-[0.1em] uppercase leading-none peer-disabled:cursor-not-allowed peer-disabled:opacity-70",
|
||||
className,
|
||||
)}
|
||||
{...props}
|
||||
/>
|
||||
);
|
||||
}
|
||||
@@ -1,19 +0,0 @@
|
||||
import { cn } from "@/lib/utils";
|
||||
|
||||
export function Separator({
|
||||
className,
|
||||
orientation = "horizontal",
|
||||
...props
|
||||
}: React.HTMLAttributes<HTMLDivElement> & { orientation?: "horizontal" | "vertical" }) {
|
||||
return (
|
||||
<div
|
||||
role="separator"
|
||||
className={cn(
|
||||
"shrink-0 bg-border",
|
||||
orientation === "horizontal" ? "h-px w-full" : "h-full w-px",
|
||||
className,
|
||||
)}
|
||||
{...props}
|
||||
/>
|
||||
);
|
||||
}
|
||||
@@ -1,7 +1,7 @@
|
||||
import { useCallback, useEffect, useState } from "react";
|
||||
import { api } from "@/lib/api";
|
||||
import type { ActionStatusResponse } from "@/lib/api";
|
||||
import { Toast } from "@/components/Toast";
|
||||
import { Toast } from "@nous-research/ui/ui/components/toast";
|
||||
import { useI18n } from "@/i18n";
|
||||
import {
|
||||
SystemActionsContext,
|
||||
|
||||
@@ -1,19 +0,0 @@
|
||||
import { useEffect, useState } from "react";
|
||||
|
||||
/** True when viewport width is strictly below `px` (matches Tailwind `min-width: px`). */
|
||||
export function useBelowBreakpoint(px: number) {
|
||||
const query = `(max-width: ${px - 1}px)`;
|
||||
const [matches, setMatches] = useState(() =>
|
||||
typeof window !== "undefined" ? window.matchMedia(query).matches : false,
|
||||
);
|
||||
|
||||
useEffect(() => {
|
||||
const mql = window.matchMedia(query);
|
||||
const sync = () => setMatches(mql.matches);
|
||||
sync();
|
||||
mql.addEventListener("change", sync);
|
||||
return () => mql.removeEventListener("change", sync);
|
||||
}, [query]);
|
||||
|
||||
return matches;
|
||||
}
|
||||
@@ -1,41 +0,0 @@
|
||||
import { useCallback, useState } from "react";
|
||||
|
||||
export function useConfirmDelete<TId>({
|
||||
onDelete,
|
||||
}: {
|
||||
onDelete: (id: TId) => Promise<void>;
|
||||
}) {
|
||||
const [pendingId, setPendingId] = useState<TId | null>(null);
|
||||
const [isDeleting, setIsDeleting] = useState(false);
|
||||
|
||||
const requestDelete = useCallback((id: TId) => {
|
||||
setPendingId(id);
|
||||
}, []);
|
||||
|
||||
const cancel = useCallback(() => {
|
||||
if (!isDeleting) setPendingId(null);
|
||||
}, [isDeleting]);
|
||||
|
||||
const confirm = useCallback(async () => {
|
||||
if (pendingId === null) return;
|
||||
const id = pendingId;
|
||||
setIsDeleting(true);
|
||||
try {
|
||||
await onDelete(id);
|
||||
setPendingId(null);
|
||||
} catch {
|
||||
// Dialog stays open; caller can surface errors in onDelete before rethrowing
|
||||
} finally {
|
||||
setIsDeleting(false);
|
||||
}
|
||||
}, [pendingId, onDelete]);
|
||||
|
||||
return {
|
||||
cancel,
|
||||
confirm,
|
||||
isDeleting,
|
||||
isOpen: pendingId !== null,
|
||||
pendingId,
|
||||
requestDelete,
|
||||
} as const;
|
||||
}
|
||||
@@ -1,15 +0,0 @@
|
||||
import { useCallback, useState } from "react";
|
||||
|
||||
export function useToast(duration = 3000) {
|
||||
const [toast, setToast] = useState<{ message: string; type: "success" | "error" } | null>(null);
|
||||
|
||||
const showToast = useCallback(
|
||||
(message: string, type: "success" | "error") => {
|
||||
setToast({ message, type });
|
||||
setTimeout(() => setToast(null), duration);
|
||||
},
|
||||
[duration],
|
||||
);
|
||||
|
||||
return { toast, showToast };
|
||||
}
|
||||
+27
-3
@@ -41,7 +41,11 @@ function setSessionHeader(headers: Headers, token: string): void {
|
||||
}
|
||||
}
|
||||
|
||||
export async function fetchJSON<T>(url: string, init?: RequestInit): Promise<T> {
|
||||
export async function fetchJSON<T>(
|
||||
url: string,
|
||||
init?: RequestInit,
|
||||
options?: FetchJSONOptions,
|
||||
): Promise<T> {
|
||||
// Inject the session token into all /api/ requests.
|
||||
const headers = new Headers(init?.headers);
|
||||
const token = window.__HERMES_SESSION_TOKEN__;
|
||||
@@ -100,7 +104,7 @@ export async function fetchJSON<T>(url: string, init?: RequestInit): Promise<T>
|
||||
// that reload once on the first stale-token 401 — gated mode is
|
||||
// handled above, so reaching here in gated mode means a real
|
||||
// middleware failure that should not reload-loop.
|
||||
if (!window.__HERMES_AUTH_REQUIRED__) {
|
||||
if (!window.__HERMES_AUTH_REQUIRED__ && !options?.allowUnauthorized) {
|
||||
let alreadyReloaded = false;
|
||||
try {
|
||||
alreadyReloaded =
|
||||
@@ -198,8 +202,19 @@ export const api = {
|
||||
* still exists but is never useful there (no Session, no cookie). The
|
||||
* AuthWidget component swallows 401s from this call: if the gate isn't
|
||||
* engaged, /api/auth/me returns 401 and the widget renders nothing.
|
||||
*
|
||||
* ``allowUnauthorized`` is load-bearing: in loopback mode this endpoint
|
||||
* 401s by design, and fetchJSON's default loopback behaviour treats a
|
||||
* 401 as a rotated session token and full-page-reloads to pick up a
|
||||
* fresh one. Because every *other* dashboard request succeeds (and so
|
||||
* clears the one-shot reload guard), that turns this expected 401 into
|
||||
* an infinite reload loop. Opting out keeps the 401 a plain throw the
|
||||
* widget can catch.
|
||||
*/
|
||||
getAuthMe: () => fetchJSON<AuthMeResponse>("/api/auth/me"),
|
||||
getAuthMe: () =>
|
||||
fetchJSON<AuthMeResponse>("/api/auth/me", undefined, {
|
||||
allowUnauthorized: true,
|
||||
}),
|
||||
logout: () =>
|
||||
fetch(`${BASE}/auth/logout`, {
|
||||
method: "POST",
|
||||
@@ -514,6 +529,15 @@ export interface ActionResponse {
|
||||
pid: number;
|
||||
}
|
||||
|
||||
/** Per-call overrides for {@link fetchJSON}. */
|
||||
interface FetchJSONOptions {
|
||||
/** When true, a 401 response is surfaced as a normal thrown error rather
|
||||
* than triggering the loopback stale-token page reload. Use for probes
|
||||
* whose 401 is an expected signal (e.g. /api/auth/me in non-gated mode)
|
||||
* rather than evidence of a rotated session token. */
|
||||
allowUnauthorized?: boolean;
|
||||
}
|
||||
|
||||
export interface ActionStatusResponse {
|
||||
exit_code: number | null;
|
||||
lines: string[];
|
||||
|
||||
@@ -20,7 +20,7 @@ import { timeAgo } from "@/lib/utils";
|
||||
import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { Spinner } from "@nous-research/ui/ui/components/spinner";
|
||||
import { Stats } from "@nous-research/ui/ui/components/stats";
|
||||
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
|
||||
import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
|
||||
import { Badge } from "@nous-research/ui/ui/components/badge";
|
||||
import { usePageHeader } from "@/contexts/usePageHeader";
|
||||
import { useI18n } from "@/i18n";
|
||||
|
||||
@@ -23,7 +23,7 @@ import { WebglAddon } from "@xterm/addon-webgl";
|
||||
import { Terminal } from "@xterm/xterm";
|
||||
import "@xterm/xterm/css/xterm.css";
|
||||
import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { Typography } from "@/components/NouiTypography";
|
||||
import { Typography } from "@nous-research/ui/ui/components/typography/index";
|
||||
import { HERMES_BASE_PATH, buildWsAuthParam } from "@/lib/api";
|
||||
import { cn } from "@/lib/utils";
|
||||
import { Copy, PanelRight, X } from "lucide-react";
|
||||
|
||||
@@ -38,15 +38,15 @@ import {
|
||||
} from "lucide-react";
|
||||
import { api } from "@/lib/api";
|
||||
import { getNestedValue, setNestedValue } from "@/lib/nested";
|
||||
import { useToast } from "@/hooks/useToast";
|
||||
import { Toast } from "@/components/Toast";
|
||||
import { useToast } from "@nous-research/ui/hooks/use-toast";
|
||||
import { Toast } from "@nous-research/ui/ui/components/toast";
|
||||
import { AutoField } from "@/components/AutoField";
|
||||
import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { ListItem } from "@nous-research/ui/ui/components/list-item";
|
||||
import { Spinner } from "@nous-research/ui/ui/components/spinner";
|
||||
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
|
||||
import { ConfirmDialog } from "@/components/ui/confirm-dialog";
|
||||
import { Input } from "@/components/ui/input";
|
||||
import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
|
||||
import { ConfirmDialog } from "@nous-research/ui/ui/components/confirm-dialog";
|
||||
import { Input } from "@nous-research/ui/ui/components/input";
|
||||
import { Badge } from "@nous-research/ui/ui/components/badge";
|
||||
import { useI18n } from "@/i18n";
|
||||
import { usePageHeader } from "@/contexts/usePageHeader";
|
||||
|
||||
@@ -4,17 +4,17 @@ import { Badge } from "@nous-research/ui/ui/components/badge";
|
||||
import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { Select, SelectOption } from "@nous-research/ui/ui/components/select";
|
||||
import { Spinner } from "@nous-research/ui/ui/components/spinner";
|
||||
import { H2 } from "@/components/NouiTypography";
|
||||
import { H2 } from "@nous-research/ui/ui/components/typography/h2";
|
||||
import { api } from "@/lib/api";
|
||||
import type { CronJob, ProfileInfo } from "@/lib/api";
|
||||
import { DeleteConfirmDialog } from "@/components/DeleteConfirmDialog";
|
||||
import { useToast } from "@/hooks/useToast";
|
||||
import { useConfirmDelete } from "@/hooks/useConfirmDelete";
|
||||
import { useToast } from "@nous-research/ui/hooks/use-toast";
|
||||
import { useConfirmDelete } from "@nous-research/ui/hooks/use-confirm-delete";
|
||||
import { useModalBehavior } from "@/hooks/useModalBehavior";
|
||||
import { Toast } from "@/components/Toast";
|
||||
import { Card, CardContent } from "@/components/ui/card";
|
||||
import { Input } from "@/components/ui/input";
|
||||
import { Label } from "@/components/ui/label";
|
||||
import { Toast } from "@nous-research/ui/ui/components/toast";
|
||||
import { Card, CardContent } from "@nous-research/ui/ui/components/card";
|
||||
import { Input } from "@nous-research/ui/ui/components/input";
|
||||
import { Label } from "@nous-research/ui/ui/components/label";
|
||||
import { useI18n } from "@/i18n";
|
||||
import { usePageHeader } from "@/contexts/usePageHeader";
|
||||
import { PluginSlot } from "@/plugins";
|
||||
|
||||
@@ -17,9 +17,9 @@ import {
|
||||
import { api } from "@/lib/api";
|
||||
import type { EnvVarInfo } from "@/lib/api";
|
||||
import { DeleteConfirmDialog } from "@/components/DeleteConfirmDialog";
|
||||
import { Toast } from "@/components/Toast";
|
||||
import { useConfirmDelete } from "@/hooks/useConfirmDelete";
|
||||
import { useToast } from "@/hooks/useToast";
|
||||
import { Toast } from "@nous-research/ui/ui/components/toast";
|
||||
import { useConfirmDelete } from "@nous-research/ui/hooks/use-confirm-delete";
|
||||
import { useToast } from "@nous-research/ui/hooks/use-toast";
|
||||
import { OAuthProvidersCard } from "@/components/OAuthProvidersCard";
|
||||
import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { ListItem } from "@nous-research/ui/ui/components/list-item";
|
||||
@@ -30,10 +30,10 @@ import {
|
||||
CardDescription,
|
||||
CardHeader,
|
||||
CardTitle,
|
||||
} from "@/components/ui/card";
|
||||
} from "@nous-research/ui/ui/components/card";
|
||||
import { Badge } from "@nous-research/ui/ui/components/badge";
|
||||
import { Input } from "@/components/ui/input";
|
||||
import { Label } from "@/components/ui/label";
|
||||
import { Input } from "@nous-research/ui/ui/components/input";
|
||||
import { Label } from "@nous-research/ui/ui/components/label";
|
||||
import { useI18n } from "@/i18n";
|
||||
import { usePageHeader } from "@/contexts/usePageHeader";
|
||||
import { PluginSlot } from "@/plugins";
|
||||
|
||||
@@ -12,8 +12,8 @@ import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { FilterGroup, Segmented } from "@nous-research/ui/ui/components/segmented";
|
||||
import { Spinner } from "@nous-research/ui/ui/components/spinner";
|
||||
import { Switch } from "@nous-research/ui/ui/components/switch";
|
||||
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
|
||||
import { Label } from "@/components/ui/label";
|
||||
import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
|
||||
import { Label } from "@nous-research/ui/ui/components/label";
|
||||
import { useI18n } from "@/i18n";
|
||||
import { usePageHeader } from "@/contexts/usePageHeader";
|
||||
import { PluginSlot } from "@/plugins";
|
||||
|
||||
@@ -24,9 +24,9 @@ import { formatTokenCount } from "@/lib/format";
|
||||
import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { Spinner } from "@nous-research/ui/ui/components/spinner";
|
||||
import { Stats } from "@nous-research/ui/ui/components/stats";
|
||||
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
|
||||
import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
|
||||
import { Badge } from "@nous-research/ui/ui/components/badge";
|
||||
import { ConfirmDialog } from "@/components/ui/confirm-dialog";
|
||||
import { ConfirmDialog } from "@nous-research/ui/ui/components/confirm-dialog";
|
||||
import { useModalBehavior } from "@/hooks/useModalBehavior";
|
||||
import { usePageHeader } from "@/contexts/usePageHeader";
|
||||
import { useI18n } from "@/i18n";
|
||||
|
||||
@@ -10,12 +10,12 @@ import { Select, SelectOption } from "@nous-research/ui/ui/components/select";
|
||||
import { Switch } from "@nous-research/ui/ui/components/switch";
|
||||
import { Spinner } from "@nous-research/ui/ui/components/spinner";
|
||||
import { CommandBlock } from "@nous-research/ui/ui/components/command-block";
|
||||
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
|
||||
import { ConfirmDialog } from "@/components/ui/confirm-dialog";
|
||||
import { Input } from "@/components/ui/input";
|
||||
import { Label } from "@/components/ui/label";
|
||||
import { useToast } from "@/hooks/useToast";
|
||||
import { Toast } from "@/components/Toast";
|
||||
import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
|
||||
import { ConfirmDialog } from "@nous-research/ui/ui/components/confirm-dialog";
|
||||
import { Input } from "@nous-research/ui/ui/components/input";
|
||||
import { Label } from "@nous-research/ui/ui/components/label";
|
||||
import { useToast } from "@nous-research/ui/hooks/use-toast";
|
||||
import { Toast } from "@nous-research/ui/ui/components/toast";
|
||||
import { useI18n } from "@/i18n";
|
||||
import { PluginSlot } from "@/plugins";
|
||||
import { cn } from "@/lib/utils";
|
||||
|
||||
@@ -14,19 +14,19 @@ import {
|
||||
X,
|
||||
} from "lucide-react";
|
||||
import spinners from "unicode-animations";
|
||||
import { H2 } from "@/components/NouiTypography";
|
||||
import { H2 } from "@nous-research/ui/ui/components/typography/h2";
|
||||
import { api } from "@/lib/api";
|
||||
import type { ProfileInfo } from "@/lib/api";
|
||||
import { DeleteConfirmDialog } from "@/components/DeleteConfirmDialog";
|
||||
import { useToast } from "@/hooks/useToast";
|
||||
import { useConfirmDelete } from "@/hooks/useConfirmDelete";
|
||||
import { useToast } from "@nous-research/ui/hooks/use-toast";
|
||||
import { useConfirmDelete } from "@nous-research/ui/hooks/use-confirm-delete";
|
||||
import { useModalBehavior } from "@/hooks/useModalBehavior";
|
||||
import { Toast } from "@/components/Toast";
|
||||
import { Card, CardContent } from "@/components/ui/card";
|
||||
import { Toast } from "@nous-research/ui/ui/components/toast";
|
||||
import { Card, CardContent } from "@nous-research/ui/ui/components/card";
|
||||
import { Badge } from "@nous-research/ui/ui/components/badge";
|
||||
import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { Input } from "@/components/ui/input";
|
||||
import { Label } from "@/components/ui/label";
|
||||
import { Input } from "@nous-research/ui/ui/components/input";
|
||||
import { Label } from "@nous-research/ui/ui/components/label";
|
||||
import { Checkbox } from "@nous-research/ui/ui/components/checkbox";
|
||||
import { useI18n } from "@/i18n";
|
||||
import { usePageHeader } from "@/contexts/usePageHeader";
|
||||
|
||||
@@ -34,18 +34,18 @@ import type {
|
||||
import { timeAgo } from "@/lib/utils";
|
||||
import { Markdown } from "@/components/Markdown";
|
||||
import { PlatformsCard } from "@/components/PlatformsCard";
|
||||
import { Toast } from "@/components/Toast";
|
||||
import { Toast } from "@nous-research/ui/ui/components/toast";
|
||||
import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { ListItem } from "@nous-research/ui/ui/components/list-item";
|
||||
import { Segmented } from "@nous-research/ui/ui/components/segmented";
|
||||
import { Spinner } from "@nous-research/ui/ui/components/spinner";
|
||||
import { Badge } from "@nous-research/ui/ui/components/badge";
|
||||
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
|
||||
import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
|
||||
import { DeleteConfirmDialog } from "@/components/DeleteConfirmDialog";
|
||||
import { useConfirmDelete } from "@/hooks/useConfirmDelete";
|
||||
import { Input } from "@/components/ui/input";
|
||||
import { useConfirmDelete } from "@nous-research/ui/hooks/use-confirm-delete";
|
||||
import { Input } from "@nous-research/ui/ui/components/input";
|
||||
import { useSystemActions } from "@/contexts/useSystemActions";
|
||||
import { useToast } from "@/hooks/useToast";
|
||||
import { useToast } from "@nous-research/ui/hooks/use-toast";
|
||||
import { useI18n } from "@/i18n";
|
||||
import { usePageHeader } from "@/contexts/usePageHeader";
|
||||
import { PluginSlot } from "@/plugins";
|
||||
|
||||
@@ -17,16 +17,16 @@ import {
|
||||
} from "lucide-react";
|
||||
import { api } from "@/lib/api";
|
||||
import type { SkillInfo, ToolsetInfo } from "@/lib/api";
|
||||
import { useToast } from "@/hooks/useToast";
|
||||
import { Toast } from "@/components/Toast";
|
||||
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
|
||||
import { useToast } from "@nous-research/ui/hooks/use-toast";
|
||||
import { Toast } from "@nous-research/ui/ui/components/toast";
|
||||
import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
|
||||
import { Badge } from "@nous-research/ui/ui/components/badge";
|
||||
import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { ListItem } from "@nous-research/ui/ui/components/list-item";
|
||||
import { Spinner } from "@nous-research/ui/ui/components/spinner";
|
||||
import { Switch } from "@nous-research/ui/ui/components/switch";
|
||||
import { cn } from "@/lib/utils";
|
||||
import { Input } from "@/components/ui/input";
|
||||
import { Input } from "@nous-research/ui/ui/components/input";
|
||||
import { useI18n } from "@/i18n";
|
||||
import { usePageHeader } from "@/contexts/usePageHeader";
|
||||
import { PluginSlot } from "@/plugins";
|
||||
|
||||
@@ -23,10 +23,10 @@ import { Badge } from "@nous-research/ui/ui/components/badge";
|
||||
import { Button } from "@nous-research/ui/ui/components/button";
|
||||
import { Checkbox } from "@nous-research/ui/ui/components/checkbox";
|
||||
import { Select, SelectOption } from "@nous-research/ui/ui/components/select";
|
||||
import { Card, CardHeader, CardTitle, CardContent } from "@/components/ui/card";
|
||||
import { Input } from "@/components/ui/input";
|
||||
import { Label } from "@/components/ui/label";
|
||||
import { Separator } from "@/components/ui/separator";
|
||||
import { Card, CardHeader, CardTitle, CardContent } from "@nous-research/ui/ui/components/card";
|
||||
import { Input } from "@nous-research/ui/ui/components/input";
|
||||
import { Label } from "@nous-research/ui/ui/components/label";
|
||||
import { Separator } from "@nous-research/ui/ui/components/separator";
|
||||
import { Tabs, TabsList, TabsTrigger } from "@nous-research/ui/ui/components/tabs";
|
||||
import { useI18n } from "@/i18n";
|
||||
import { registerSlot, PluginSlot } from "./slots";
|
||||
|
||||
@@ -130,7 +130,7 @@ The agent has the same filesystem access as your user account. Use `hermes tools
|
||||
|
||||
Runs commands inside a Docker container with security hardening (all capabilities dropped, no privilege escalation, PID limits).
|
||||
|
||||
**Single persistent container, not per-command.** Hermes starts ONE long-lived container on first use and routes every terminal, file, and `execute_code` call through `docker exec` into that same container — across sessions, `/new`, `/reset`, and `delegate_task` subagents — for the lifetime of the Hermes process. Working-directory changes, installed packages, and files in `/workspace` carry over from one tool call to the next, just like a local shell. The container is stopped and removed on shutdown. See **Container lifecycle** below for details.
|
||||
**Single persistent container, shared across Hermes processes.** Hermes starts ONE long-lived container on first use and routes every terminal, file, and `execute_code` call through `docker exec` into that same container — across sessions, `/new`, `/reset`, and `delegate_task` subagents. Working-directory changes, installed packages, files in `/workspace`, and **background processes** all carry over from one tool call to the next, and from one Hermes process to the next. When you close a TUI session, run `/quit`, or start a new `hermes` invocation, the container keeps running and the next Hermes process reuses it via a labeled lookup. See **Container lifecycle** below for the exact teardown rules.
|
||||
|
||||
```yaml
|
||||
terminal:
|
||||
@@ -138,8 +138,11 @@ terminal:
|
||||
docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
|
||||
docker_mount_cwd_to_workspace: false # Mount launch dir into /workspace
|
||||
docker_run_as_host_user: false # See "Running container as host user" below
|
||||
docker_forward_env: # Env vars to forward into container
|
||||
docker_forward_env: # Host env vars to forward into container
|
||||
- "GITHUB_TOKEN"
|
||||
docker_env: # Literal env vars to inject (KEY=value)
|
||||
DEBUG: "1"
|
||||
PYTHONUNBUFFERED: "1"
|
||||
docker_volumes: # Host directory mounts
|
||||
- "/home/user/projects:/workspace/projects"
|
||||
- "/home/user/data:/data:ro" # :ro for read-only
|
||||
@@ -151,14 +154,49 @@ terminal:
|
||||
container_cpu: 1 # CPU cores (0 = unlimited)
|
||||
container_memory: 5120 # MB (0 = unlimited)
|
||||
container_disk: 51200 # MB (requires overlay2 on XFS+pquota)
|
||||
container_persistent: true # Persist /workspace and /root across sessions
|
||||
container_persistent: true # Persist /workspace and /root bind-mount dirs
|
||||
|
||||
# Cross-process container reuse (defaults match the "one long-lived
|
||||
# container shared across sessions" contract — see Container lifecycle).
|
||||
docker_persist_across_processes: true # Reuse container across Hermes restarts
|
||||
docker_orphan_reaper: true # Sweep abandoned Exited containers at startup
|
||||
|
||||
# Cross-backend lifecycle settings (apply to docker as well)
|
||||
timeout: 180 # Per-command timeout in seconds
|
||||
lifetime_seconds: 300 # Idle-reaper window; also feeds 2× orphan-reaper threshold
|
||||
```
|
||||
|
||||
**`docker_env`** vs **`docker_forward_env`**: the former injects literal `KEY=value` pairs you specify in the config (the values live in your `config.yaml` or are passed as a JSON dict via `TERMINAL_DOCKER_ENV='{"DEBUG":"1"}'`). The latter forwards values from your shell or `~/.hermes/.env`, so the actual secret never appears in the config file. Use `docker_forward_env` for tokens and `docker_env` for static knobs the container needs.
|
||||
|
||||
**`terminal.docker_extra_args`** (also overridable via `TERMINAL_DOCKER_EXTRA_ARGS='["--gpus=all"]'`) lets you pass arbitrary `docker run` flags that Hermes doesn't surface as first-class keys — `--gpus`, `--network`, `--add-host`, alternative `--security-opt` overrides, etc. Each entry must be a string; the list is appended last to the assembled `docker run` invocation so it can override Hermes' defaults if needed. Use sparingly — flags that conflict with the sandbox hardening (capability drops, `--user`, the workspace bind mount) will silently weaken isolation.
|
||||
|
||||
**Requirements:** Docker Desktop or Docker Engine installed and running. Hermes probes `$PATH` plus common macOS install locations (`/usr/local/bin/docker`, `/opt/homebrew/bin/docker`, Docker Desktop app bundle). Podman is supported out of the box: set `HERMES_DOCKER_BINARY=podman` (or the full path) to force it when both are installed.
|
||||
|
||||
**Container lifecycle:** Hermes reuses a single long-lived container (`docker run -d ... sleep 2h`) for every terminal and file-tool call, across sessions, `/new`, `/reset`, and `delegate_task` subagents, for the lifetime of the Hermes process. Commands run via `docker exec` with a login shell, so working-directory changes, installed packages, and files in `/workspace` all persist from one tool call to the next. The container is stopped and removed on Hermes shutdown (or when the idle-sweep reclaims it).
|
||||
#### Container lifecycle
|
||||
|
||||
Every Hermes-managed container is tagged with three labels so subsequent processes (and the orphan reaper) can identify it:
|
||||
|
||||
- `hermes-agent=1` — marks it as Hermes-managed
|
||||
- `hermes-task-id=<sanitized task_id>` — keys the per-task reuse probe
|
||||
- `hermes-profile=<sanitized profile name>` — scopes reuse and reaping to the active Hermes profile
|
||||
|
||||
On startup, Hermes runs `docker ps --filter label=hermes-task-id=<id> --filter label=hermes-profile=<profile>` and **attaches to the existing container** when it finds one. If the container is `exited` (e.g. after a Docker daemon restart), it's `docker start`'d and reused — filesystem state and any installed packages survive, but in-container background processes do not.
|
||||
|
||||
When a Hermes process exits — `/quit`, closing a TUI session, gateway shutdown, even SIGKILL — the cleanup path is a **no-op for the container in default mode**. The container keeps running. The next Hermes process attaches to it in milliseconds via the label probe. This is the behavior the "one long-lived container shared across sessions" contract requires: it's the only way background processes (npm watchers, dev servers, long-running pytest) survive across sessions.
|
||||
|
||||
**The container is only torn down (stopped and `docker rm -f`'d) in these cases:**
|
||||
|
||||
| Trigger | When it fires |
|
||||
|---|---|
|
||||
| `docker_persist_across_processes: false` | Explicit per-process isolation. Every `cleanup()` does `stop` + `rm -f`. Matches pre-issue-#20561 behavior. |
|
||||
| Idle reaper (`lifetime_seconds`, default 300s) | Only when the env is `persist_across_processes=false`. Persist-mode envs are no-op'd; container survives the idle sweep. |
|
||||
| Orphan reaper at next startup | Sweeps **Exited** hermes-labeled containers older than `2 × lifetime_seconds` (default 600s = 10 min), scoped to the current profile. **Running containers are never touched** — sibling-process safety. Set `docker_orphan_reaper: false` to disable. |
|
||||
| Direct user action | `docker rm -f`, `docker system prune`, Docker Desktop restart. We don't set `--restart=always`, so a host reboot leaves the container `Exited` (its CoW layer survives and gets reused on next startup, but bg processes are gone). |
|
||||
|
||||
Edge cases worth knowing:
|
||||
|
||||
- **OOM kill of in-container PID 1** transitions the container to `Exited`. Next reuse will `docker start` it; filesystem state survives, bg processes do not.
|
||||
- **Switching profiles** isolates containers from each other — a container labeled `hermes-profile=work` is invisible to a Hermes process running under `hermes-profile=research`. The orphan reaper is profile-scoped too, so cross-profile containers don't get reaped accidentally, but they also won't get cleaned up automatically until you start Hermes again under their original profile.
|
||||
|
||||
Parallel subagents spawned via `delegate_task(tasks=[...])` share this one container — concurrent `cd`, env mutations, and writes to the same path will collide. If a subagent needs an isolated sandbox, it must register a per-task image override via `register_task_env_overrides()`, which RL and benchmark environments (TerminalBench2, HermesSweEnv, etc.) do automatically for their per-task Docker images.
|
||||
|
||||
@@ -170,6 +208,29 @@ Parallel subagents spawned via `delegate_task(tasks=[...])` share this one conta
|
||||
|
||||
**Credential forwarding:** Env vars listed in `docker_forward_env` are resolved from your shell environment first, then `~/.hermes/.env`. Skills can also declare `required_environment_variables` which are merged automatically.
|
||||
|
||||
#### Environment variable overrides
|
||||
|
||||
Every key under `terminal:` has an env-var override of the form `TERMINAL_<KEY_UPPERCASE>`. The most useful ones for the Docker backend:
|
||||
|
||||
| Env var | Maps to | Notes |
|
||||
|---|---|---|
|
||||
| `TERMINAL_DOCKER_IMAGE` | `docker_image` | Base image |
|
||||
| `TERMINAL_DOCKER_FORWARD_ENV` | `docker_forward_env` | JSON array: `'["GITHUB_TOKEN","OPENAI_API_KEY"]'` |
|
||||
| `TERMINAL_DOCKER_ENV` | `docker_env` | JSON dict: `'{"DEBUG":"1"}'` |
|
||||
| `TERMINAL_DOCKER_VOLUMES` | `docker_volumes` | JSON array of `"host:container[:ro]"` strings |
|
||||
| `TERMINAL_DOCKER_EXTRA_ARGS` | `docker_extra_args` | JSON array |
|
||||
| `TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE` | `docker_mount_cwd_to_workspace` | `true` / `false` |
|
||||
| `TERMINAL_DOCKER_RUN_AS_HOST_USER` | `docker_run_as_host_user` | `true` / `false` |
|
||||
| `TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES` | `docker_persist_across_processes` | `true` / `false` — default `true` |
|
||||
| `TERMINAL_DOCKER_ORPHAN_REAPER` | `docker_orphan_reaper` | `true` / `false` — default `true` |
|
||||
| `TERMINAL_CONTAINER_CPU` | `container_cpu` | CPU cores |
|
||||
| `TERMINAL_CONTAINER_MEMORY` | `container_memory` | MB |
|
||||
| `TERMINAL_CONTAINER_DISK` | `container_disk` | MB |
|
||||
| `TERMINAL_CONTAINER_PERSISTENT` | `container_persistent` | `true` / `false` — controls the bind-mount workspace dirs, distinct from `docker_persist_across_processes` |
|
||||
| `TERMINAL_LIFETIME_SECONDS` | `lifetime_seconds` | Idle reaper window |
|
||||
| `TERMINAL_TIMEOUT` | `timeout` | Per-command timeout |
|
||||
| `HERMES_DOCKER_BINARY` | _none_ | Force a specific docker/podman binary path |
|
||||
|
||||
### SSH Backend
|
||||
|
||||
Runs commands on a remote server over SSH. Uses ControlMaster for connection reuse (5-minute idle keepalive). Persistent shell is enabled by default — state (cwd, env vars) survives across commands.
|
||||
|
||||
@@ -249,7 +249,7 @@ docker run -it --rm \
|
||||
Direct `-e` flags override values from `.env`. This is useful for CI/CD or secrets-manager integrations where you don't want keys on disk.
|
||||
|
||||
:::note Looking for Docker as the **terminal backend**?
|
||||
This page covers running Hermes itself inside Docker. If you want Hermes to execute the agent's `terminal` / `execute_code` calls inside a Docker sandbox container (one persistent container per Hermes process), that's a separate config block — `terminal.backend: docker` plus `terminal.docker_image`, `terminal.docker_volumes`, `terminal.docker_forward_env`, `terminal.docker_run_as_host_user`, and `terminal.docker_extra_args`. See [Configuration → Docker Backend](configuration.md#docker-backend) for the full set.
|
||||
This page covers running Hermes itself inside Docker. If you want Hermes to execute the agent's `terminal` / `execute_code` calls inside a Docker sandbox container (one long-lived container shared across Hermes processes — see issue #20561), that's a separate config block — `terminal.backend: docker` plus `terminal.docker_image`, `terminal.docker_volumes`, `terminal.docker_forward_env`, `terminal.docker_env`, `terminal.docker_run_as_host_user`, `terminal.docker_extra_args`, `terminal.docker_persist_across_processes`, and `terminal.docker_orphan_reaper`. See [Configuration → Docker Backend](configuration.md#docker-backend) for the full set including container-lifecycle rules.
|
||||
:::
|
||||
|
||||
## Docker Compose example
|
||||
|
||||
@@ -500,7 +500,7 @@ export default function SkillsDashboard() {
|
||||
const sources = useMemo(() => {
|
||||
const set = new Set(allSkillsLocal.map((s) => s.source));
|
||||
return SOURCE_ORDER.filter((s) => s === "all" || set.has(s));
|
||||
}, []);
|
||||
}, [allSkillsLocal]);
|
||||
|
||||
const categoryEntries = useMemo(() => {
|
||||
const pool =
|
||||
@@ -523,7 +523,7 @@ export default function SkillsDashboard() {
|
||||
return Array.from(map.entries())
|
||||
.sort((a, b) => b[1].count - a[1].count)
|
||||
.map(([key, { label, count }]) => ({ key, label, count }));
|
||||
}, [sourceFilter]);
|
||||
}, [sourceFilter, allSkillsLocal]);
|
||||
|
||||
const filtered = useMemo(() => {
|
||||
const q = debouncedSearch.toLowerCase().trim();
|
||||
|
||||
Reference in New Issue
Block a user