Compare commits

..

8 Commits

Author SHA1 Message Date
alt-glitch
4511322f56 Merge origin/main into sid/persistent-backend
Resolve conflict in local.py: keep refactored _make_run_env helper
over inline _sanitize_subprocess_env logic.
2026-03-15 21:08:11 +05:30
alt-glitch
e266530c7d add different polling intervals for ssh and local backends. ssh has a
longer roundtrip
2026-03-15 02:54:32 +05:30
alt-glitch
879b7d3fbf fix(tests): update mock stdout in env blocklist tests
The fake_popen mock used iter([]) for proc.stdout which doesn't
support .close(). Use MagicMock with __iter__ instead, since
_drain_stdout now calls proc.stdout.close() in its finally block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 02:48:05 +05:30
alt-glitch
9f36483bf4 refactor: deduplicate execute/cleanup, merge init, clean up helpers
- Merge _init_persistent_shell + _start_persistent_shell into single method
- Move execute() dispatcher and cleanup() into PersistentShellMixin
  so LocalEnvironment and SSHEnvironment inherit them
- Remove broad except Exception wrappers from _execute_oneshot in both backends
- Replace try/except with os.path.exists checks in local _read_temp_files
  and _cleanup_temp_files
- Remove redundant bash -c from SSH oneshot (SSH already runs in a shell)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 02:39:56 +05:30
alt-glitch
7be314c456 pass configs to file_tools for r+w over ssh.
pass TERM env.
default to ~ to in local and ssh backends.
ssh backend.
2026-03-15 02:26:39 +05:30
balyan.sid@gmail.com
9001b34146 simplify docstrings, fix some bugs 2026-03-15 01:20:42 +05:30
balyan.sid@gmail.com
861202b56c wip: add persistent shell to ssh and local terminal backends 2026-03-15 01:20:42 +05:30
balyan.sid@gmail.com
9d63dcc3f9 add persistent ssh backend 2026-03-15 01:19:38 +05:30
18 changed files with 908 additions and 2768 deletions

View File

@@ -1,193 +0,0 @@
---
name: vercel-obs
description: Investigate Vercel-deployed apps by collecting runtime logs or configuring a drain to a local receiver, correlating the data with the current codebase, and producing bug-focused observability reports.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
hermes:
tags: [vercel, observability, logging, debugging, production]
related_skills: [native-mcp]
---
# Vercel Obs
Use this skill when the current app is deployed to Vercel and the user wants a code-aware observability pass over recent runtime logs or a temporary drain-backed capture session.
## Prerequisites
- `vercel` CLI installed and logged in
- Repo linked to a Vercel project, or the user can provide a project id/name
- For one-shot live drain capture: `cloudflared` or `ngrok` installed locally
- Drain support is plan-dependent; if drains are unavailable, fall back to runtime-log analysis
## Helper Script
Installed path:
```bash
python ~/.hermes/skills/observability/vercel-observability-loop/scripts/vercel_observability.py
```
Read `references/vercel.md` if you need the current Vercel constraints or API assumptions.
## Workflow
### 1. Preflight
Always start with:
```bash
python ~/.hermes/skills/observability/vercel-observability-loop/scripts/vercel_observability.py preflight
```
This checks for:
- linked Vercel project metadata
- CLI availability and version
- current Vercel login state
- whether `vercel api` is available for drain operations
If the repo is not linked or the CLI is not authenticated, stop and explain the blocker.
### 2. Immediate Runtime Analysis
Use runtime logs first so the user gets signal immediately:
```bash
python ~/.hermes/skills/observability/vercel-observability-loop/scripts/vercel_observability.py collect-runtime --since 30m
python ~/.hermes/skills/observability/vercel-observability-loop/scripts/vercel_observability.py analyze --since 30m --report-path .hermes/observability/reports/runtime-report.md
```
This path is the default fallback when drain setup is not possible.
### 3. One-Shot Live Session
For a single prompt workflow, prefer the built-in orchestration command:
```bash
python ~/.hermes/skills/observability/vercel-observability-loop/scripts/vercel_observability.py live-session \
--minutes 10 \
--environment production \
--report-path .hermes/observability/reports/live-session.md
```
This command will:
- start the local drain receiver
- launch a tunnel with `cloudflared` or `ngrok`
- create a temporary Vercel drain against the linked project
- collect logs for the requested window
- delete the drain
- stop the tunnel and receiver
- analyze only the rows captured during that session
- write a report
Useful flags:
- `--project-id prj_123` if the repo is not linked
- `--scope team_slug` for team-scoped Vercel access
- `--source serverless --source edge-function` to narrow the capture
- `--tunnel cloudflared` to force a provider
- `--name-prefix hermes-incident` to change the temporary drain name prefix
If the tunnel binary is missing or the drain cannot be created, the script should still clean up the local receiver before exiting with an error.
### 4. Local Receiver for Manual Live Drain Capture
Use the manual steps below only when you need fine-grained control.
Start the receiver in the background:
```bash
python ~/.hermes/skills/observability/vercel-observability-loop/scripts/vercel_observability.py serve --port 4319 --secret YOUR_SHARED_SECRET
```
Run it through Hermes background process support so it stays alive.
The receiver writes to:
- `.hermes/observability/logs.sqlite3`
- `.hermes/observability/raw/`
### 5. Expose the Receiver
If the receiver is only listening on localhost, expose it with a tunnel before creating the drain.
Preferred manual pattern:
```bash
cloudflared tunnel --url http://127.0.0.1:4319 --no-autoupdate
```
Parse the public HTTPS URL from the tunnel output. If no tunnel is available, explain that Vercel cannot deliver drains to a private localhost endpoint.
### 6. Create or Reuse the Drain
Once you have a public URL:
```bash
python ~/.hermes/skills/observability/vercel-observability-loop/scripts/vercel_observability.py ensure-drain \
--name hermes-observability \
--target-url https://example.trycloudflare.com \
--project-id prj_123 \
--secret YOUR_SHARED_SECRET \
--source static \
--source serverless \
--source edge-function
```
If the project id is omitted, the script tries `.vercel/project.json`.
For teardown:
```bash
python ~/.hermes/skills/observability/vercel-observability-loop/scripts/vercel_observability.py delete-drain --drain-id d_123
```
### 7. Analyze and Report
Generate a report after enough logs have arrived:
```bash
python ~/.hermes/skills/observability/vercel-observability-loop/scripts/vercel_observability.py analyze \
--since 2h \
--sample-limit 20 \
--report-path .hermes/observability/reports/observability-report.md
```
The report should prioritize:
- bug candidates
- noisy or superfluous logs
- missing context in error logs
- concrete fix proposals tied back to likely files in the repo
## Output Expectations
When using this skill, produce:
1. A short status summary of what mode you used: runtime only or drain-backed
2. The report path
3. The highest-signal findings first
4. Concrete next steps, including drain cleanup if you created one
## Hermes Prompt Patterns
Use prompts like:
```text
/vercel-obs Run a 10 minute live observability session for this repo: start live collection, set up the tunnel, create a temporary drain, collect logs, clean everything up, analyze the captured data, and write the report to .hermes/observability/reports/live-session.md.
```
```text
/vercel-obs Run preflight first, then execute a 5 minute live session against production using only serverless and edge-function logs. Summarize the top bug candidates in chat and save the full report under .hermes/observability/reports/incident-review.md.
```
## Guardrails
- Prefer read-only investigation unless the user explicitly asks for fixes
- Redact obvious secrets and tokens in reports
- Keep time windows narrow by default
- Use sampling for high-volume logs
- If drain creation fails, surface the Vercel API error and fall back to runtime-log analysis

View File

@@ -1,40 +0,0 @@
# Vercel Notes
These notes are here so the skill can stay short.
## Current Assumptions
- `vercel logs --json` is the structured runtime-log path for `v1`
- `vercel api` is used for drain CRUD operations
- The drain endpoints are under `/v1/drains`
- The receiver must be publicly reachable for Vercel to deliver drain traffic
- Drain signature verification uses HMAC-SHA1 over the raw request body and compares against `x-vercel-signature`
- For the one-shot `live-session` flow, tunnel setup is automated with `cloudflared` first and `ngrok` as fallback when available
## Practical Defaults
- Use runtime logs first for immediate signal
- Use drains only for live capture or longer windows
- Store normalized logs locally in SQLite for `v1`
- Use the repo root as the code-correlation root
## Useful CLI Commands
```bash
vercel whoami
vercel logs --json --since 30m
vercel api /v1/drains
vercel api /v1/drains -X POST --input payload.json
vercel api /v1/drains/{id} -X DELETE
```
## Suggested Sources
Reasonable first-pass source sets for a general web app:
- `serverless`
- `edge-function`
- `edge-middleware`
- `static`
Tune the sources down if the project is noisy.

View File

@@ -1,444 +0,0 @@
from __future__ import annotations
import importlib.util
import json
import sqlite3
import sys
from pathlib import Path
SCRIPT_PATH = (
Path(__file__).resolve().parents[2]
/ "optional-skills"
/ "observability"
/ "vercel-observability-loop"
/ "scripts"
/ "vercel_observability.py"
)
def load_module():
spec = importlib.util.spec_from_file_location("vercel_observability_skill", SCRIPT_PATH)
module = importlib.util.module_from_spec(spec)
assert spec.loader is not None
sys.modules[spec.name] = module
spec.loader.exec_module(module)
return module
def test_preflight_reads_vercel_linked_project(tmp_path: Path, monkeypatch):
mod = load_module()
project_dir = tmp_path / "app"
(project_dir / ".vercel").mkdir(parents=True)
(project_dir / ".vercel" / "project.json").write_text(
json.dumps(
{
"projectId": "prj_123",
"orgId": "team_456",
"projectName": "demo-app",
}
),
encoding="utf-8",
)
(project_dir / "vercel.json").write_text("{}", encoding="utf-8")
monkeypatch.setattr(mod.shutil, "which", lambda name: "/opt/homebrew/bin/vercel")
def fake_run(cmd, **kwargs):
joined = " ".join(cmd)
if joined == "vercel --version":
return mod.subprocess.CompletedProcess(cmd, 0, stdout="Vercel CLI 50.31.0\n", stderr="")
if joined == "vercel --help":
return mod.subprocess.CompletedProcess(cmd, 0, stdout="Commands:\n api\n", stderr="")
if joined == "vercel whoami --no-color --non-interactive":
return mod.subprocess.CompletedProcess(cmd, 0, stdout="rewbs\n", stderr="")
raise AssertionError(f"Unexpected command: {cmd}")
monkeypatch.setattr(mod, "run_command", fake_run)
result = mod.run_preflight(project_dir)
assert result["project"]["linked"] is True
assert result["project"]["project_id"] == "prj_123"
assert result["cli"]["logged_in"] is True
assert result["recommended_mode"] == "runtime-or-drain"
def test_collect_runtime_logs_persists_json_lines(tmp_path: Path, monkeypatch):
mod = load_module()
runtime_paths = mod.resolve_paths(tmp_path / ".hermes" / "observability")
sample_lines = "\n".join(
[
json.dumps(
{
"timestamp": "2026-03-16T01:02:03Z",
"level": "error",
"message": "Database timeout for /api/orders",
"path": "/api/orders",
"statusCode": 500,
"requestId": "req_123",
"source": "serverless",
}
),
json.dumps(
{
"timestamp": "2026-03-16T01:04:05Z",
"level": "info",
"message": "render home page",
"path": "/",
"statusCode": 200,
"requestId": "req_456",
"source": "edge-function",
}
),
]
)
def fake_run(cmd, **kwargs):
return mod.subprocess.CompletedProcess(cmd, 0, stdout=sample_lines, stderr="")
monkeypatch.setattr(mod, "run_command", fake_run)
result = mod.collect_runtime_logs(
cwd=tmp_path,
base_dir=runtime_paths["state_dir"],
since="30m",
until=None,
project=None,
environment=None,
level=None,
source=None,
limit=100,
search=None,
request_id=None,
status_code=None,
)
assert result["success"] is True
assert result["stored"] == 2
conn = sqlite3.connect(runtime_paths["db_path"])
try:
count = conn.execute("SELECT COUNT(*) FROM log_events").fetchone()[0]
finally:
conn.close()
assert count == 2
def test_verify_signature_uses_hmac_sha1():
mod = load_module()
body = b'{"message":"hello"}'
secret = "shared-secret"
signature = mod.hmac.new(secret.encode("utf-8"), body, "sha1").hexdigest()
assert mod.verify_signature(body, signature, secret) is True
assert mod.verify_signature(body, "bad-signature", secret) is False
def test_build_drain_payload_includes_self_served_source():
mod = load_module()
payload = mod.build_drain_payload(
name="hermes-observability",
target_url="https://example.trycloudflare.com",
project_id="prj_123",
sources=["serverless", "static"],
headers={"X-Test": "1"},
secret="secret",
delivery_format="json",
)
assert payload["name"] == "hermes-observability"
assert payload["projectIds"] == ["prj_123"]
assert payload["source"] == {"kind": "self-served"}
assert payload["sources"] == ["serverless", "static"]
assert payload["headers"]["X-Test"] == "1"
def test_normalize_log_record_handles_vercel_millisecond_timestamps_and_empty_messages():
mod = load_module()
record = mod.normalize_log_record(
{
"id": "4chxq-1773620260046-25aa70eb0443",
"timestamp": 1773620260046,
"deploymentId": "dpl_123",
"projectId": "prj_123",
"level": "info",
"message": "",
"source": "serverless",
"domain": "portal.nousresearch.com",
"requestMethod": "POST",
"requestPath": "/refresh",
"responseStatusCode": 0,
"environment": "production",
"traceId": "",
},
"runtime",
)
assert record["observed_at"] == "2026-03-16T00:17:40.046000Z"
assert record["path"] == "/refresh"
assert record["host"] == "portal.nousresearch.com"
assert record["status_code"] == 0
assert record["message"] == "POST /refresh -> 0 serverless"
assert record["request_id"] == "4chxq-1773620260046-25aa70eb0443"
def test_analyze_rows_flags_noisy_and_missing_context(tmp_path: Path):
mod = load_module()
repo_root = tmp_path / "repo"
(repo_root / "app" / "api").mkdir(parents=True)
(repo_root / "app" / "api" / "orders.ts").write_text("export function handler() {}", encoding="utf-8")
rows = []
for index in range(12):
rows.append(
{
"fingerprint": "noise",
"origin": "runtime",
"source": "edge-function",
"level": "info",
"status_code": 200,
"request_id": f"req_{index}",
"deployment_id": None,
"environment": "preview",
"path": "/",
"host": None,
"message": "Rendered landing page",
"raw_json": "{}",
}
)
rows.append(
{
"fingerprint": "bug",
"origin": "runtime",
"source": "serverless",
"level": "error",
"status_code": 500,
"request_id": None,
"deployment_id": None,
"environment": "production",
"path": "/api/orders",
"host": None,
"message": "Internal Server Error",
"raw_json": "{}",
}
)
analysis = mod.analyze_rows(rows, repo_root, sample_limit=3)
assert analysis["summary"]["bug_candidates"] >= 1
assert analysis["summary"]["noisy_log_candidates"] >= 1
assert analysis["summary"]["missing_context_candidates"] >= 1
assert any("orders.ts" in ",".join(item["likely_files"]) for item in analysis["bug_candidates"])
def test_live_session_runs_end_to_end_and_scopes_analysis(tmp_path: Path, monkeypatch):
mod = load_module()
runtime_paths = mod.resolve_paths(tmp_path / ".hermes" / "observability")
calls: dict[str, object] = {}
monkeypatch.setattr(
mod,
"run_preflight",
lambda cwd: {
"success": True,
"cli": {"installed": True, "logged_in": True, "api_supported": True},
"project": {"project_id": "prj_123"},
},
)
monkeypatch.setattr(
mod,
"start_receiver_background",
lambda **kwargs: {
"success": True,
"server": object(),
"thread": object(),
"startup": {
"listening": "http://127.0.0.1:4319",
"port": 4319,
"db_path": str(runtime_paths["db_path"]),
"raw_dir": str(runtime_paths["raw_dir"]),
},
},
)
monkeypatch.setattr(
mod,
"start_tunnel",
lambda **kwargs: {
"success": True,
"provider": "cloudflared",
"public_url": "https://demo.trycloudflare.com",
"command": ["cloudflared", "tunnel"],
"process": object(),
"reader_thread": object(),
},
)
monkeypatch.setattr(
mod,
"ensure_drain",
lambda **kwargs: {
"success": True,
"action": "create",
"response": {"json": {"id": "drn_123"}},
},
)
row_ids = iter([10, 16])
monkeypatch.setattr(mod, "get_max_row_id", lambda db_path: next(row_ids))
monkeypatch.setattr(mod.time, "sleep", lambda seconds: calls.setdefault("slept", seconds))
monkeypatch.setattr(
mod,
"delete_drain",
lambda **kwargs: {"success": True, "deleted": kwargs["drain_id"]},
)
monkeypatch.setattr(mod, "stop_tunnel", lambda process, reader_thread: {"success": True, "status": "stopped"})
monkeypatch.setattr(mod, "stop_receiver_background", lambda server, thread: {"success": True, "status": "stopped"})
def fake_analyze_database(**kwargs):
calls["analyze_kwargs"] = kwargs
return {
"success": True,
"report_path": str(kwargs["report_path"]),
"analysis": {
"summary": {
"records": 2,
"clusters": 1,
"bug_candidates": 1,
"noisy_log_candidates": 0,
"missing_context_candidates": 0,
}
},
}
monkeypatch.setattr(mod, "analyze_database", fake_analyze_database)
result = mod.run_live_session(
cwd=tmp_path,
base_dir=runtime_paths["state_dir"],
minutes=0.05,
bind="127.0.0.1",
port=4319,
secret="shared-secret",
name_prefix="session",
project_id=None,
scope=None,
sources=["serverless"],
headers=None,
delivery_format="json",
tunnel="auto",
tunnel_timeout=10.0,
environment="production",
limit=250,
sample_limit=15,
report_path=None,
)
assert result["success"] is True
assert calls["slept"] == 3.0
assert result["session"]["drain_id"] == "drn_123"
assert result["session"]["drain_name"].startswith("session-")
assert result["cleanup"]["drain"]["deleted"] == "drn_123"
analyze_kwargs = calls["analyze_kwargs"]
assert analyze_kwargs["origins"] == ["drain"]
assert analyze_kwargs["min_row_id"] == 10
assert analyze_kwargs["max_row_id"] == 16
assert analyze_kwargs["environment"] == "production"
assert analyze_kwargs["limit"] == 250
assert analyze_kwargs["sample_limit"] == 15
assert analyze_kwargs["report_path"].name.startswith("live-session-")
def test_live_session_cleans_up_receiver_and_tunnel_when_drain_creation_fails(tmp_path: Path, monkeypatch):
mod = load_module()
runtime_paths = mod.resolve_paths(tmp_path / ".hermes" / "observability")
calls = {"stop_tunnel": 0, "stop_receiver": 0, "analyze": 0}
monkeypatch.setattr(
mod,
"run_preflight",
lambda cwd: {
"success": True,
"cli": {"installed": True, "logged_in": True, "api_supported": True},
"project": {"project_id": "prj_123"},
},
)
monkeypatch.setattr(
mod,
"start_receiver_background",
lambda **kwargs: {
"success": True,
"server": object(),
"thread": object(),
"startup": {
"listening": "http://127.0.0.1:4319",
"port": 4319,
"db_path": str(runtime_paths["db_path"]),
"raw_dir": str(runtime_paths["raw_dir"]),
},
},
)
monkeypatch.setattr(
mod,
"start_tunnel",
lambda **kwargs: {
"success": True,
"provider": "cloudflared",
"public_url": "https://demo.trycloudflare.com",
"command": ["cloudflared", "tunnel"],
"process": object(),
"reader_thread": object(),
},
)
monkeypatch.setattr(
mod,
"ensure_drain",
lambda **kwargs: {"success": False, "phase": "create", "response": {"stderr": "boom"}},
)
monkeypatch.setattr(mod, "get_max_row_id", lambda db_path: 0)
monkeypatch.setattr(
mod,
"stop_tunnel",
lambda process, reader_thread: calls.__setitem__("stop_tunnel", calls["stop_tunnel"] + 1) or {"success": True},
)
monkeypatch.setattr(
mod,
"stop_receiver_background",
lambda server, thread: calls.__setitem__("stop_receiver", calls["stop_receiver"] + 1) or {"success": True},
)
monkeypatch.setattr(mod, "delete_drain", lambda **kwargs: (_ for _ in ()).throw(AssertionError("delete_drain should not be called")))
monkeypatch.setattr(
mod,
"analyze_database",
lambda **kwargs: calls.__setitem__("analyze", calls["analyze"] + 1) or {"success": True},
)
result = mod.run_live_session(
cwd=tmp_path,
base_dir=runtime_paths["state_dir"],
minutes=0.01,
bind="127.0.0.1",
port=4319,
secret="shared-secret",
name_prefix="session",
project_id=None,
scope=None,
sources=["serverless"],
headers=None,
delivery_format="json",
tunnel="auto",
tunnel_timeout=10.0,
environment=None,
limit=None,
sample_limit=20,
report_path=None,
)
assert result["success"] is False
assert result["phase"] == "ensure-drain"
assert calls["stop_tunnel"] == 1
assert calls["stop_receiver"] == 1
assert calls["analyze"] == 0

View File

@@ -68,22 +68,6 @@ class TestAtomicJsonWrite:
tmp_files = [f for f in tmp_path.iterdir() if ".tmp" in f.name]
assert len(tmp_files) == 0
def test_cleans_up_temp_file_on_baseexception(self, tmp_path):
class SimulatedAbort(BaseException):
pass
target = tmp_path / "data.json"
original = {"preserved": True}
target.write_text(json.dumps(original), encoding="utf-8")
with patch("utils.json.dump", side_effect=SimulatedAbort):
with pytest.raises(SimulatedAbort):
atomic_json_write(target, {"new": True})
tmp_files = [f for f in tmp_path.iterdir() if ".tmp" in f.name]
assert len(tmp_files) == 0
assert json.loads(target.read_text(encoding="utf-8")) == original
def test_accepts_string_path(self, tmp_path):
target = str(tmp_path / "string_path.json")
atomic_json_write(target, {"string": True})

View File

@@ -1,44 +0,0 @@
"""Tests for utils.atomic_yaml_write — crash-safe YAML file writes."""
from pathlib import Path
from unittest.mock import patch
import pytest
import yaml
from utils import atomic_yaml_write
class TestAtomicYamlWrite:
def test_writes_valid_yaml(self, tmp_path):
target = tmp_path / "data.yaml"
data = {"key": "value", "nested": {"a": 1}}
atomic_yaml_write(target, data)
assert yaml.safe_load(target.read_text(encoding="utf-8")) == data
def test_cleans_up_temp_file_on_baseexception(self, tmp_path):
class SimulatedAbort(BaseException):
pass
target = tmp_path / "data.yaml"
original = {"preserved": True}
target.write_text(yaml.safe_dump(original), encoding="utf-8")
with patch("utils.yaml.dump", side_effect=SimulatedAbort):
with pytest.raises(SimulatedAbort):
atomic_yaml_write(target, {"new": True})
tmp_files = [f for f in tmp_path.iterdir() if ".tmp" in f.name]
assert len(tmp_files) == 0
assert yaml.safe_load(target.read_text(encoding="utf-8")) == original
def test_appends_extra_content(self, tmp_path):
target = tmp_path / "data.yaml"
atomic_yaml_write(target, {"key": "value"}, extra_content="\n# comment\n")
text = target.read_text(encoding="utf-8")
assert "key: value" in text
assert "# comment" in text

View File

@@ -1,10 +1,8 @@
"""Tests for tools/checkpoint_manager.py — CheckpointManager."""
import logging
import os
import json
import shutil
import subprocess
import pytest
from pathlib import Path
from unittest.mock import patch
@@ -145,12 +143,6 @@ class TestTakeCheckpoint:
result = mgr.ensure_checkpoint(str(work_dir), "initial")
assert result is True
def test_successful_checkpoint_does_not_log_expected_diff_exit(self, mgr, work_dir, caplog):
with caplog.at_level(logging.ERROR, logger="tools.checkpoint_manager"):
result = mgr.ensure_checkpoint(str(work_dir), "initial")
assert result is True
assert not any("diff --cached --quiet" in r.getMessage() for r in caplog.records)
def test_dedup_same_turn(self, mgr, work_dir):
r1 = mgr.ensure_checkpoint(str(work_dir), "first")
r2 = mgr.ensure_checkpoint(str(work_dir), "second")
@@ -383,26 +375,6 @@ class TestErrorResilience:
result = mgr.ensure_checkpoint(str(work_dir), "test")
assert result is False
def test_run_git_allows_expected_nonzero_without_error_log(self, tmp_path, caplog):
completed = subprocess.CompletedProcess(
args=["git", "diff", "--cached", "--quiet"],
returncode=1,
stdout="",
stderr="",
)
with patch("tools.checkpoint_manager.subprocess.run", return_value=completed):
with caplog.at_level(logging.ERROR, logger="tools.checkpoint_manager"):
ok, stdout, stderr = _run_git(
["diff", "--cached", "--quiet"],
tmp_path / "shadow",
str(tmp_path / "work"),
allowed_returncodes={1},
)
assert ok is False
assert stdout == ""
assert stderr == ""
assert not caplog.records
def test_checkpoint_failure_does_not_raise(self, mgr, work_dir, monkeypatch):
"""Checkpoint failures should never raise — they're silently logged."""
def broken_run_git(*args, **kwargs):

View File

@@ -5,7 +5,6 @@ handling without requiring a running terminal environment.
"""
import json
import logging
from unittest.mock import MagicMock, patch
from tools.file_tools import (
@@ -88,26 +87,13 @@ class TestWriteFileHandler:
mock_ops.write_file.assert_called_once_with("/tmp/out.txt", "hello world!\n")
@patch("tools.file_tools._get_file_ops")
def test_permission_error_returns_error_json_without_error_log(self, mock_get, caplog):
def test_exception_returns_error_json(self, mock_get):
mock_get.side_effect = PermissionError("read-only filesystem")
from tools.file_tools import write_file_tool
with caplog.at_level(logging.DEBUG, logger="tools.file_tools"):
result = json.loads(write_file_tool("/tmp/out.txt", "data"))
result = json.loads(write_file_tool("/tmp/out.txt", "data"))
assert "error" in result
assert "read-only" in result["error"]
assert any("write_file expected denial" in r.getMessage() for r in caplog.records)
assert not any(r.levelno >= logging.ERROR for r in caplog.records)
@patch("tools.file_tools._get_file_ops")
def test_unexpected_exception_still_logs_error(self, mock_get, caplog):
mock_get.side_effect = RuntimeError("boom")
from tools.file_tools import write_file_tool
with caplog.at_level(logging.ERROR, logger="tools.file_tools"):
result = json.loads(write_file_tool("/tmp/out.txt", "data"))
assert result["error"] == "boom"
assert any("write_file error" in r.getMessage() for r in caplog.records)
class TestPatchHandler:

View File

@@ -26,8 +26,7 @@ def _make_fake_popen(captured: dict):
proc = MagicMock()
proc.poll.return_value = 0
proc.returncode = 0
proc.stdout = iter([])
proc.stdout.close = lambda: None
proc.stdout = MagicMock(__iter__=lambda s: iter([]), __next__=lambda s: (_ for _ in ()).throw(StopIteration))
proc.stdin = MagicMock()
return proc
return fake_popen

View File

@@ -0,0 +1,152 @@
"""Tests for the local persistent shell backend."""
import glob as glob_mod
import pytest
from tools.environments.local import LocalEnvironment
from tools.environments.persistent_shell import PersistentShellMixin
class TestLocalConfig:
def test_local_persistent_default_false(self, monkeypatch):
monkeypatch.delenv("TERMINAL_LOCAL_PERSISTENT", raising=False)
from tools.terminal_tool import _get_env_config
assert _get_env_config()["local_persistent"] is False
def test_local_persistent_true(self, monkeypatch):
monkeypatch.setenv("TERMINAL_LOCAL_PERSISTENT", "true")
from tools.terminal_tool import _get_env_config
assert _get_env_config()["local_persistent"] is True
def test_local_persistent_yes(self, monkeypatch):
monkeypatch.setenv("TERMINAL_LOCAL_PERSISTENT", "yes")
from tools.terminal_tool import _get_env_config
assert _get_env_config()["local_persistent"] is True
class TestMergeOutput:
def test_stdout_only(self):
assert PersistentShellMixin._merge_output("out", "") == "out"
def test_stderr_only(self):
assert PersistentShellMixin._merge_output("", "err") == "err"
def test_both(self):
assert PersistentShellMixin._merge_output("out", "err") == "out\nerr"
def test_empty(self):
assert PersistentShellMixin._merge_output("", "") == ""
def test_strips_trailing_newlines(self):
assert PersistentShellMixin._merge_output("out\n\n", "err\n") == "out\nerr"
class TestLocalOneShotRegression:
def test_echo(self):
env = LocalEnvironment(persistent=False)
r = env.execute("echo hello")
assert r["returncode"] == 0
assert "hello" in r["output"]
env.cleanup()
def test_exit_code(self):
env = LocalEnvironment(persistent=False)
r = env.execute("exit 42")
assert r["returncode"] == 42
env.cleanup()
def test_state_does_not_persist(self):
env = LocalEnvironment(persistent=False)
env.execute("export HERMES_ONESHOT_LOCAL=yes")
r = env.execute("echo $HERMES_ONESHOT_LOCAL")
assert r["output"].strip() == ""
env.cleanup()
class TestLocalPersistent:
@pytest.fixture
def env(self):
e = LocalEnvironment(persistent=True)
yield e
e.cleanup()
def test_echo(self, env):
r = env.execute("echo hello-persistent")
assert r["returncode"] == 0
assert "hello-persistent" in r["output"]
def test_env_var_persists(self, env):
env.execute("export HERMES_LOCAL_PERSIST_TEST=works")
r = env.execute("echo $HERMES_LOCAL_PERSIST_TEST")
assert r["output"].strip() == "works"
def test_cwd_persists(self, env):
env.execute("cd /tmp")
r = env.execute("pwd")
assert r["output"].strip() == "/tmp"
def test_exit_code(self, env):
r = env.execute("(exit 42)")
assert r["returncode"] == 42
def test_stderr(self, env):
r = env.execute("echo oops >&2")
assert r["returncode"] == 0
assert "oops" in r["output"]
def test_multiline_output(self, env):
r = env.execute("echo a; echo b; echo c")
lines = r["output"].strip().splitlines()
assert lines == ["a", "b", "c"]
def test_timeout_then_recovery(self, env):
r = env.execute("sleep 999", timeout=2)
assert r["returncode"] in (124, 130)
r = env.execute("echo alive")
assert r["returncode"] == 0
assert "alive" in r["output"]
def test_large_output(self, env):
r = env.execute("seq 1 1000")
assert r["returncode"] == 0
lines = r["output"].strip().splitlines()
assert len(lines) == 1000
assert lines[0] == "1"
assert lines[-1] == "1000"
def test_shell_variable_persists(self, env):
env.execute("MY_LOCAL_VAR=hello123")
r = env.execute("echo $MY_LOCAL_VAR")
assert r["output"].strip() == "hello123"
def test_cleanup_removes_temp_files(self, env):
env.execute("echo warmup")
prefix = env._temp_prefix
assert len(glob_mod.glob(f"{prefix}-*")) > 0
env.cleanup()
remaining = glob_mod.glob(f"{prefix}-*")
assert remaining == []
def test_state_does_not_leak_between_instances(self):
env1 = LocalEnvironment(persistent=True)
env2 = LocalEnvironment(persistent=True)
try:
env1.execute("export LEAK_TEST=from_env1")
r = env2.execute("echo $LEAK_TEST")
assert r["output"].strip() == ""
finally:
env1.cleanup()
env2.cleanup()
def test_special_characters_in_command(self, env):
r = env.execute("echo 'hello world'")
assert r["output"].strip() == "hello world"
def test_pipe_command(self, env):
r = env.execute("echo hello | tr 'h' 'H'")
assert r["output"].strip() == "Hello"
def test_multiple_commands_semicolon(self, env):
r = env.execute("X=42; echo $X")
assert r["output"].strip() == "42"

View File

@@ -0,0 +1,167 @@
"""Tests for the SSH remote execution environment backend."""
import json
import os
import subprocess
from unittest.mock import MagicMock
import pytest
from tools.environments.ssh import SSHEnvironment
_SSH_HOST = os.getenv("TERMINAL_SSH_HOST", "")
_SSH_USER = os.getenv("TERMINAL_SSH_USER", "")
_SSH_PORT = int(os.getenv("TERMINAL_SSH_PORT", "22"))
_SSH_KEY = os.getenv("TERMINAL_SSH_KEY", "")
_has_ssh = bool(_SSH_HOST and _SSH_USER)
requires_ssh = pytest.mark.skipif(
not _has_ssh,
reason="TERMINAL_SSH_HOST / TERMINAL_SSH_USER not set",
)
def _run(command, task_id="ssh_test", **kwargs):
from tools.terminal_tool import terminal_tool
return json.loads(terminal_tool(command, task_id=task_id, **kwargs))
def _cleanup(task_id="ssh_test"):
from tools.terminal_tool import cleanup_vm
cleanup_vm(task_id)
class TestBuildSSHCommand:
@pytest.fixture(autouse=True)
def _mock_connection(self, monkeypatch):
monkeypatch.setattr("tools.environments.ssh.subprocess.run",
lambda *a, **k: subprocess.CompletedProcess([], 0))
monkeypatch.setattr("tools.environments.ssh.subprocess.Popen",
lambda *a, **k: MagicMock(stdout=iter([]),
stderr=iter([]),
stdin=MagicMock()))
monkeypatch.setattr("tools.environments.ssh.time.sleep", lambda _: None)
def test_base_flags(self):
env = SSHEnvironment(host="h", user="u")
cmd = " ".join(env._build_ssh_command())
for flag in ("ControlMaster=auto", "ControlPersist=300",
"BatchMode=yes", "StrictHostKeyChecking=accept-new"):
assert flag in cmd
def test_custom_port(self):
env = SSHEnvironment(host="h", user="u", port=2222)
cmd = env._build_ssh_command()
assert "-p" in cmd and "2222" in cmd
def test_key_path(self):
env = SSHEnvironment(host="h", user="u", key_path="/k")
cmd = env._build_ssh_command()
assert "-i" in cmd and "/k" in cmd
def test_user_host_suffix(self):
env = SSHEnvironment(host="h", user="u")
assert env._build_ssh_command()[-1] == "u@h"
class TestTerminalToolConfig:
def test_ssh_persistent_default_false(self, monkeypatch):
monkeypatch.delenv("TERMINAL_SSH_PERSISTENT", raising=False)
from tools.terminal_tool import _get_env_config
assert _get_env_config()["ssh_persistent"] is False
def test_ssh_persistent_true(self, monkeypatch):
monkeypatch.setenv("TERMINAL_SSH_PERSISTENT", "true")
from tools.terminal_tool import _get_env_config
assert _get_env_config()["ssh_persistent"] is True
def _setup_ssh_env(monkeypatch, persistent: bool):
monkeypatch.setenv("TERMINAL_ENV", "ssh")
monkeypatch.setenv("TERMINAL_SSH_HOST", _SSH_HOST)
monkeypatch.setenv("TERMINAL_SSH_USER", _SSH_USER)
monkeypatch.setenv("TERMINAL_SSH_PERSISTENT", "true" if persistent else "false")
if _SSH_PORT != 22:
monkeypatch.setenv("TERMINAL_SSH_PORT", str(_SSH_PORT))
if _SSH_KEY:
monkeypatch.setenv("TERMINAL_SSH_KEY", _SSH_KEY)
@requires_ssh
class TestOneShotSSH:
@pytest.fixture(autouse=True)
def _setup(self, monkeypatch):
_setup_ssh_env(monkeypatch, persistent=False)
yield
_cleanup()
def test_echo(self):
r = _run("echo hello")
assert r["exit_code"] == 0
assert "hello" in r["output"]
def test_exit_code(self):
r = _run("exit 42")
assert r["exit_code"] == 42
def test_state_does_not_persist(self):
_run("export HERMES_ONESHOT_TEST=yes")
r = _run("echo $HERMES_ONESHOT_TEST")
assert r["output"].strip() == ""
@requires_ssh
class TestPersistentSSH:
@pytest.fixture(autouse=True)
def _setup(self, monkeypatch):
_setup_ssh_env(monkeypatch, persistent=True)
yield
_cleanup()
def test_echo(self):
r = _run("echo hello-persistent")
assert r["exit_code"] == 0
assert "hello-persistent" in r["output"]
def test_env_var_persists(self):
_run("export HERMES_PERSIST_TEST=works")
r = _run("echo $HERMES_PERSIST_TEST")
assert r["output"].strip() == "works"
def test_cwd_persists(self):
_run("cd /tmp")
r = _run("pwd")
assert r["output"].strip() == "/tmp"
def test_exit_code(self):
r = _run("(exit 42)")
assert r["exit_code"] == 42
def test_stderr(self):
r = _run("echo oops >&2")
assert r["exit_code"] == 0
assert "oops" in r["output"]
def test_multiline_output(self):
r = _run("echo a; echo b; echo c")
lines = r["output"].strip().splitlines()
assert lines == ["a", "b", "c"]
def test_timeout_then_recovery(self):
r = _run("sleep 999", timeout=2)
assert r["exit_code"] == 124
r = _run("echo alive")
assert r["exit_code"] == 0
assert "alive" in r["output"]
def test_large_output(self):
r = _run("seq 1 1000")
assert r["exit_code"] == 0
lines = r["output"].strip().splitlines()
assert len(lines) == 1000
assert lines[0] == "1"
assert lines[-1] == "1000"

View File

@@ -92,17 +92,10 @@ def _run_git(
shadow_repo: Path,
working_dir: str,
timeout: int = _GIT_TIMEOUT,
allowed_returncodes: Optional[Set[int]] = None,
) -> tuple:
"""Run a git command against the shadow repo. Returns (ok, stdout, stderr).
``allowed_returncodes`` suppresses error logging for known/expected non-zero
exits while preserving the normal ``ok = (returncode == 0)`` contract.
Example: ``git diff --cached --quiet`` returns 1 when changes exist.
"""
"""Run a git command against the shadow repo. Returns (ok, stdout, stderr)."""
env = _git_env(shadow_repo, working_dir)
cmd = ["git"] + list(args)
allowed_returncodes = allowed_returncodes or set()
try:
result = subprocess.run(
cmd,
@@ -115,7 +108,7 @@ def _run_git(
ok = result.returncode == 0
stdout = result.stdout.strip()
stderr = result.stderr.strip()
if not ok and result.returncode not in allowed_returncodes:
if not ok:
logger.error(
"Git command failed: %s (rc=%d) stderr=%s",
" ".join(cmd), result.returncode, stderr,
@@ -388,10 +381,7 @@ class CheckpointManager:
# Check if there's anything to commit
ok_diff, diff_out, _ = _run_git(
["diff", "--cached", "--quiet"],
shadow,
working_dir,
allowed_returncodes={1},
["diff", "--cached", "--quiet"], shadow, working_dir,
)
if ok_diff:
# No changes to commit

View File

@@ -1,5 +1,6 @@
"""Local execution environment with interrupt support and non-blocking I/O."""
import glob
import os
import platform
import shutil
@@ -11,6 +12,8 @@ import time
_IS_WINDOWS = platform.system() == "Windows"
from tools.environments.base import BaseEnvironment
from tools.environments.persistent_shell import PersistentShellMixin
from tools.interrupt import is_interrupted
# Unique marker to isolate real command output from shell init/exit noise.
# printf (no trailing newline) keeps the boundaries clean for splitting.
@@ -244,6 +247,25 @@ def _clean_shell_noise(output: str) -> str:
return result
_SANE_PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
def _make_run_env(env: dict) -> dict:
"""Build a run environment with a sane PATH and provider-var stripping."""
merged = dict(os.environ | env)
run_env = {}
for k, v in merged.items():
if k.startswith(_HERMES_PROVIDER_ENV_FORCE_PREFIX):
real_key = k[len(_HERMES_PROVIDER_ENV_FORCE_PREFIX):]
run_env[real_key] = v
elif k not in _HERMES_PROVIDER_ENV_BLOCKLIST:
run_env[k] = v
existing_path = run_env.get("PATH", "")
if "/usr/bin" not in existing_path.split(":"):
run_env["PATH"] = f"{existing_path}:{_SANE_PATH}" if existing_path else _SANE_PATH
return run_env
def _extract_fenced_output(raw: str) -> str:
"""Extract real command output from between fence markers.
@@ -268,7 +290,7 @@ def _extract_fenced_output(raw: str) -> str:
return raw[start:last]
class LocalEnvironment(BaseEnvironment):
class LocalEnvironment(PersistentShellMixin, BaseEnvironment):
"""Run commands directly on the host machine.
Features:
@@ -277,24 +299,66 @@ class LocalEnvironment(BaseEnvironment):
- stdin_data support for piping content (bypasses ARG_MAX limits)
- sudo -S transform via SUDO_PASSWORD env var
- Uses interactive login shell so full user env is available
- Optional persistent shell mode (cwd/env vars survive across calls)
"""
def __init__(self, cwd: str = "", timeout: int = 60, env: dict = None):
def __init__(self, cwd: str = "", timeout: int = 60, env: dict = None,
persistent: bool = False):
super().__init__(cwd=cwd or os.getcwd(), timeout=timeout, env=env)
self.persistent = persistent
if self.persistent:
self._init_persistent_shell()
def execute(self, command: str, cwd: str = "", *,
timeout: int | None = None,
stdin_data: str | None = None) -> dict:
from tools.terminal_tool import _interrupt_event
@property
def _temp_prefix(self) -> str:
return f"/tmp/hermes-local-{self._session_id}"
def _spawn_shell_process(self) -> subprocess.Popen:
user_shell = _find_bash()
run_env = _make_run_env(self.env)
return subprocess.Popen(
[user_shell, "-l"],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.DEVNULL,
text=True,
env=run_env,
preexec_fn=None if _IS_WINDOWS else os.setsid,
)
def _read_temp_files(self, *paths: str) -> list[str]:
results = []
for path in paths:
if os.path.exists(path):
with open(path) as f:
results.append(f.read())
else:
results.append("")
return results
def _kill_shell_children(self):
if self._shell_pid is None:
return
try:
subprocess.run(
["pkill", "-P", str(self._shell_pid)],
capture_output=True, timeout=5,
)
except (subprocess.TimeoutExpired, FileNotFoundError):
pass
def _cleanup_temp_files(self):
for f in glob.glob(f"{self._temp_prefix}-*"):
if os.path.exists(f):
os.remove(f)
def _execute_oneshot(self, command: str, cwd: str = "", *,
timeout: int | None = None,
stdin_data: str | None = None) -> dict:
work_dir = cwd or self.cwd or os.getcwd()
effective_timeout = timeout or self.timeout
exec_command, sudo_stdin = self._prepare_command(command)
# Merge the sudo password (if any) with caller-supplied stdin_data.
# sudo -S reads exactly one line (the password) then passes the rest
# of stdin to the child, so prepending is safe even when stdin_data
# is also present.
if sudo_stdin is not None and stdin_data is not None:
effective_stdin = sudo_stdin + stdin_data
elif sudo_stdin is not None:
@@ -302,110 +366,87 @@ class LocalEnvironment(BaseEnvironment):
else:
effective_stdin = stdin_data
try:
# The fence wrapper uses bash syntax (semicolons, $?, printf).
# Always use bash for the wrapper — NOT $SHELL which could be
# fish, zsh, or another shell with incompatible syntax.
# The -lic flags source rc files so tools like nvm/pyenv work.
user_shell = _find_bash()
# Wrap with output fences so we can later extract the real
# command output and discard shell init/exit noise.
fenced_cmd = (
f"printf '{_OUTPUT_FENCE}';"
f" {exec_command};"
f" __hermes_rc=$?;"
f" printf '{_OUTPUT_FENCE}';"
f" exit $__hermes_rc"
)
# Ensure PATH always includes standard dirs — systemd services
# and some terminal multiplexers inherit a minimal PATH.
_SANE_PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# Strip Hermes-managed provider/tool/gateway vars so external CLIs
# are not silently misrouted or handed Hermes secrets. Callers that
# truly need a blocked var can opt in by prefixing the key with
# _HERMES_FORCE_ in self.env (e.g. _HERMES_FORCE_OPENAI_API_KEY).
run_env = _sanitize_subprocess_env(os.environ, self.env)
existing_path = run_env.get("PATH", "")
if "/usr/bin" not in existing_path.split(":"):
run_env["PATH"] = f"{existing_path}:{_SANE_PATH}" if existing_path else _SANE_PATH
user_shell = _find_bash()
fenced_cmd = (
f"printf '{_OUTPUT_FENCE}';"
f" {exec_command};"
f" __hermes_rc=$?;"
f" printf '{_OUTPUT_FENCE}';"
f" exit $__hermes_rc"
)
run_env = _make_run_env(self.env)
proc = subprocess.Popen(
[user_shell, "-lic", fenced_cmd],
text=True,
cwd=work_dir,
env=run_env,
encoding="utf-8",
errors="replace",
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
stdin=subprocess.PIPE if effective_stdin is not None else subprocess.DEVNULL,
preexec_fn=None if _IS_WINDOWS else os.setsid,
)
proc = subprocess.Popen(
[user_shell, "-lic", fenced_cmd],
text=True,
cwd=work_dir,
env=run_env,
encoding="utf-8",
errors="replace",
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
stdin=subprocess.PIPE if effective_stdin is not None else subprocess.DEVNULL,
preexec_fn=None if _IS_WINDOWS else os.setsid,
)
if effective_stdin is not None:
def _write_stdin():
try:
proc.stdin.write(effective_stdin)
proc.stdin.close()
except (BrokenPipeError, OSError):
pass
threading.Thread(target=_write_stdin, daemon=True).start()
_output_chunks: list[str] = []
def _drain_stdout():
if effective_stdin is not None:
def _write_stdin():
try:
for line in proc.stdout:
_output_chunks.append(line)
except ValueError:
proc.stdin.write(effective_stdin)
proc.stdin.close()
except (BrokenPipeError, OSError):
pass
finally:
try:
proc.stdout.close()
except Exception:
pass
threading.Thread(target=_write_stdin, daemon=True).start()
reader = threading.Thread(target=_drain_stdout, daemon=True)
reader.start()
deadline = time.monotonic() + effective_timeout
_output_chunks: list[str] = []
while proc.poll() is None:
if _interrupt_event.is_set():
try:
if _IS_WINDOWS:
proc.terminate()
else:
pgid = os.getpgid(proc.pid)
os.killpg(pgid, signal.SIGTERM)
try:
proc.wait(timeout=1.0)
except subprocess.TimeoutExpired:
os.killpg(pgid, signal.SIGKILL)
except (ProcessLookupError, PermissionError):
proc.kill()
reader.join(timeout=2)
return {
"output": "".join(_output_chunks) + "\n[Command interrupted — user sent a new message]",
"returncode": 130,
}
if time.monotonic() > deadline:
try:
if _IS_WINDOWS:
proc.terminate()
else:
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
except (ProcessLookupError, PermissionError):
proc.kill()
reader.join(timeout=2)
return self._timeout_result(effective_timeout)
time.sleep(0.2)
def _drain_stdout():
try:
for line in proc.stdout:
_output_chunks.append(line)
except ValueError:
pass
finally:
try:
proc.stdout.close()
except Exception:
pass
reader.join(timeout=5)
output = _extract_fenced_output("".join(_output_chunks))
return {"output": output, "returncode": proc.returncode}
reader = threading.Thread(target=_drain_stdout, daemon=True)
reader.start()
deadline = time.monotonic() + effective_timeout
except Exception as e:
return {"output": f"Execution error: {str(e)}", "returncode": 1}
while proc.poll() is None:
if is_interrupted():
try:
if _IS_WINDOWS:
proc.terminate()
else:
pgid = os.getpgid(proc.pid)
os.killpg(pgid, signal.SIGTERM)
try:
proc.wait(timeout=1.0)
except subprocess.TimeoutExpired:
os.killpg(pgid, signal.SIGKILL)
except (ProcessLookupError, PermissionError):
proc.kill()
reader.join(timeout=2)
return {
"output": "".join(_output_chunks) + "\n[Command interrupted — user sent a new message]",
"returncode": 130,
}
if time.monotonic() > deadline:
try:
if _IS_WINDOWS:
proc.terminate()
else:
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
except (ProcessLookupError, PermissionError):
proc.kill()
reader.join(timeout=2)
return self._timeout_result(effective_timeout)
time.sleep(0.2)
def cleanup(self):
pass
reader.join(timeout=5)
output = _extract_fenced_output("".join(_output_chunks))
return {"output": output, "returncode": proc.returncode}

View File

@@ -0,0 +1,272 @@
"""Persistent shell mixin: file-based IPC protocol for long-lived bash shells."""
import logging
import shlex
import subprocess
import threading
import time
import uuid
from abc import abstractmethod
from tools.interrupt import is_interrupted
logger = logging.getLogger(__name__)
class PersistentShellMixin:
"""Mixin that adds persistent shell capability to any BaseEnvironment.
Subclasses must implement ``_spawn_shell_process()``, ``_read_temp_files()``,
``_kill_shell_children()``, ``_execute_oneshot()``, and ``_cleanup_temp_files()``.
"""
persistent: bool
@abstractmethod
def _spawn_shell_process(self) -> subprocess.Popen: ...
@abstractmethod
def _read_temp_files(self, *paths: str) -> list[str]: ...
@abstractmethod
def _kill_shell_children(self): ...
@abstractmethod
def _execute_oneshot(self, command: str, cwd: str, *,
timeout: int | None = None,
stdin_data: str | None = None) -> dict: ...
@abstractmethod
def _cleanup_temp_files(self): ...
_session_id: str = ""
_poll_interval: float = 0.01
@property
def _temp_prefix(self) -> str:
return f"/tmp/hermes-persistent-{self._session_id}"
# ------------------------------------------------------------------
# Lifecycle
# ------------------------------------------------------------------
def _init_persistent_shell(self):
self._shell_lock = threading.Lock()
self._shell_proc: subprocess.Popen | None = None
self._shell_alive: bool = False
self._shell_pid: int | None = None
self._session_id = uuid.uuid4().hex[:12]
p = self._temp_prefix
self._pshell_stdout = f"{p}-stdout"
self._pshell_stderr = f"{p}-stderr"
self._pshell_status = f"{p}-status"
self._pshell_cwd = f"{p}-cwd"
self._pshell_pid_file = f"{p}-pid"
self._shell_proc = self._spawn_shell_process()
self._shell_alive = True
self._drain_thread = threading.Thread(
target=self._drain_shell_output, daemon=True,
)
self._drain_thread.start()
init_script = (
f"export TERM=${{TERM:-dumb}}\n"
f"touch {self._pshell_stdout} {self._pshell_stderr} "
f"{self._pshell_status} {self._pshell_cwd} {self._pshell_pid_file}\n"
f"echo $$ > {self._pshell_pid_file}\n"
f"pwd > {self._pshell_cwd}\n"
)
self._send_to_shell(init_script)
deadline = time.monotonic() + 3.0
while time.monotonic() < deadline:
pid_str = self._read_temp_files(self._pshell_pid_file)[0].strip()
if pid_str.isdigit():
self._shell_pid = int(pid_str)
break
time.sleep(0.05)
else:
logger.warning("Could not read persistent shell PID")
self._shell_pid = None
if self._shell_pid:
logger.info(
"Persistent shell started (session=%s, pid=%d)",
self._session_id, self._shell_pid,
)
reported_cwd = self._read_temp_files(self._pshell_cwd)[0].strip()
if reported_cwd:
self.cwd = reported_cwd
def _cleanup_persistent_shell(self):
if self._shell_proc is None:
return
if self._session_id:
self._cleanup_temp_files()
try:
self._shell_proc.stdin.close()
except Exception:
pass
try:
self._shell_proc.terminate()
self._shell_proc.wait(timeout=3)
except subprocess.TimeoutExpired:
self._shell_proc.kill()
self._shell_alive = False
self._shell_proc = None
if hasattr(self, "_drain_thread") and self._drain_thread.is_alive():
self._drain_thread.join(timeout=1.0)
# ------------------------------------------------------------------
# execute() / cleanup() — shared dispatcher, subclasses inherit
# ------------------------------------------------------------------
def execute(self, command: str, cwd: str = "", *,
timeout: int | None = None,
stdin_data: str | None = None) -> dict:
if self.persistent:
return self._execute_persistent(
command, cwd, timeout=timeout, stdin_data=stdin_data,
)
return self._execute_oneshot(
command, cwd, timeout=timeout, stdin_data=stdin_data,
)
def cleanup(self):
if self.persistent:
self._cleanup_persistent_shell()
# ------------------------------------------------------------------
# Shell I/O
# ------------------------------------------------------------------
def _drain_shell_output(self):
try:
for _ in self._shell_proc.stdout:
pass
except Exception:
pass
self._shell_alive = False
def _send_to_shell(self, text: str):
if not self._shell_alive or self._shell_proc is None:
return
try:
self._shell_proc.stdin.write(text)
self._shell_proc.stdin.flush()
except (BrokenPipeError, OSError):
self._shell_alive = False
def _read_persistent_output(self) -> tuple[str, int, str]:
stdout, stderr, status_raw, cwd = self._read_temp_files(
self._pshell_stdout, self._pshell_stderr,
self._pshell_status, self._pshell_cwd,
)
output = self._merge_output(stdout, stderr)
status = status_raw.strip()
if ":" in status:
status = status.split(":", 1)[1]
try:
exit_code = int(status.strip())
except ValueError:
exit_code = 1
return output, exit_code, cwd.strip()
# ------------------------------------------------------------------
# Execution
# ------------------------------------------------------------------
def _execute_persistent(self, command: str, cwd: str, *,
timeout: int | None = None,
stdin_data: str | None = None) -> dict:
if not self._shell_alive:
logger.info("Persistent shell died, restarting...")
self._init_persistent_shell()
exec_command, sudo_stdin = self._prepare_command(command)
effective_timeout = timeout or self.timeout
if stdin_data or sudo_stdin:
return self._execute_oneshot(
command, cwd, timeout=timeout, stdin_data=stdin_data,
)
with self._shell_lock:
return self._execute_persistent_locked(
exec_command, cwd, effective_timeout,
)
def _execute_persistent_locked(self, command: str, cwd: str,
timeout: int) -> dict:
work_dir = cwd or self.cwd
cmd_id = uuid.uuid4().hex[:8]
truncate = (
f": > {self._pshell_stdout}\n"
f": > {self._pshell_stderr}\n"
f": > {self._pshell_status}\n"
)
self._send_to_shell(truncate)
escaped = command.replace("'", "'\\''")
ipc_script = (
f"cd {shlex.quote(work_dir)}\n"
f"eval '{escaped}' < /dev/null > {self._pshell_stdout} 2> {self._pshell_stderr}\n"
f"__EC=$?\n"
f"pwd > {self._pshell_cwd}\n"
f"echo {cmd_id}:$__EC > {self._pshell_status}\n"
)
self._send_to_shell(ipc_script)
deadline = time.monotonic() + timeout
poll_interval = self._poll_interval
while True:
if is_interrupted():
self._kill_shell_children()
output, _, _ = self._read_persistent_output()
return {
"output": output + "\n[Command interrupted]",
"returncode": 130,
}
if time.monotonic() > deadline:
self._kill_shell_children()
output, _, _ = self._read_persistent_output()
if output:
return {
"output": output + f"\n[Command timed out after {timeout}s]",
"returncode": 124,
}
return self._timeout_result(timeout)
if not self._shell_alive:
return {
"output": "Persistent shell died during execution",
"returncode": 1,
}
status_content = self._read_temp_files(self._pshell_status)[0].strip()
if status_content.startswith(cmd_id + ":"):
break
time.sleep(poll_interval)
output, exit_code, new_cwd = self._read_persistent_output()
if new_cwd:
self.cwd = new_cwd
return {"output": output, "returncode": exit_code}
@staticmethod
def _merge_output(stdout: str, stderr: str) -> str:
parts = []
if stdout.strip():
parts.append(stdout.rstrip("\n"))
if stderr.strip():
parts.append(stderr.rstrip("\n"))
return "\n".join(parts)

View File

@@ -8,12 +8,13 @@ import time
from pathlib import Path
from tools.environments.base import BaseEnvironment
from tools.environments.persistent_shell import PersistentShellMixin
from tools.interrupt import is_interrupted
logger = logging.getLogger(__name__)
class SSHEnvironment(BaseEnvironment):
class SSHEnvironment(PersistentShellMixin, BaseEnvironment):
"""Run commands on a remote machine over SSH.
Uses SSH ControlMaster for connection persistence so subsequent
@@ -22,22 +23,33 @@ class SSHEnvironment(BaseEnvironment):
Foreground commands are interruptible: the local ssh process is killed
and a remote kill is attempted over the ControlMaster socket.
When ``persistent=True``, a single long-lived bash shell is kept alive
over SSH and state (cwd, env vars, shell variables) persists across
``execute()`` calls. Output capture uses file-based IPC on the remote
host (stdout/stderr/exit-code written to temp files, polled via fast
ControlMaster one-shot reads).
"""
def __init__(self, host: str, user: str, cwd: str = "~",
timeout: int = 60, port: int = 22, key_path: str = ""):
timeout: int = 60, port: int = 22, key_path: str = "",
persistent: bool = False):
super().__init__(cwd=cwd, timeout=timeout)
self.host = host
self.user = user
self.port = port
self.key_path = key_path
self.persistent = persistent
self.control_dir = Path(tempfile.gettempdir()) / "hermes-ssh"
self.control_dir.mkdir(parents=True, exist_ok=True)
self.control_socket = self.control_dir / f"{user}@{host}:{port}.sock"
self._establish_connection()
def _build_ssh_command(self, extra_args: list = None) -> list:
if self.persistent:
self._init_persistent_shell()
def _build_ssh_command(self, extra_args: list | None = None) -> list:
cmd = ["ssh"]
cmd.extend(["-o", f"ControlPath={self.control_socket}"])
cmd.extend(["-o", "ControlMaster=auto"])
@@ -65,15 +77,76 @@ class SSHEnvironment(BaseEnvironment):
except subprocess.TimeoutExpired:
raise RuntimeError(f"SSH connection to {self.user}@{self.host} timed out")
def execute(self, command: str, cwd: str = "", *,
timeout: int | None = None,
stdin_data: str | None = None) -> dict:
_poll_interval: float = 0.15
@property
def _temp_prefix(self) -> str:
return f"/tmp/hermes-ssh-{self._session_id}"
def _spawn_shell_process(self) -> subprocess.Popen:
cmd = self._build_ssh_command()
cmd.append("bash -l")
return subprocess.Popen(
cmd,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.DEVNULL,
text=True,
)
def _read_temp_files(self, *paths: str) -> list[str]:
if len(paths) == 1:
cmd = self._build_ssh_command()
cmd.append(f"cat {paths[0]} 2>/dev/null")
try:
result = subprocess.run(
cmd, capture_output=True, text=True, timeout=10,
)
return [result.stdout]
except (subprocess.TimeoutExpired, OSError):
return [""]
delim = f"__HERMES_SEP_{self._session_id}__"
script = "; ".join(
f"cat {p} 2>/dev/null; echo '{delim}'" for p in paths
)
cmd = self._build_ssh_command()
cmd.append(script)
try:
result = subprocess.run(
cmd, capture_output=True, text=True, timeout=10,
)
parts = result.stdout.split(delim + "\n")
return [parts[i] if i < len(parts) else "" for i in range(len(paths))]
except (subprocess.TimeoutExpired, OSError):
return [""] * len(paths)
def _kill_shell_children(self):
if self._shell_pid is None:
return
cmd = self._build_ssh_command()
cmd.append(f"pkill -P {self._shell_pid} 2>/dev/null; true")
try:
subprocess.run(cmd, capture_output=True, timeout=5)
except (subprocess.TimeoutExpired, OSError):
pass
def _cleanup_temp_files(self):
cmd = self._build_ssh_command()
cmd.append(f"rm -f {self._temp_prefix}-*")
try:
subprocess.run(cmd, capture_output=True, timeout=5)
except (subprocess.TimeoutExpired, OSError):
pass
def _execute_oneshot(self, command: str, cwd: str = "", *,
timeout: int | None = None,
stdin_data: str | None = None) -> dict:
work_dir = cwd or self.cwd
exec_command, sudo_stdin = self._prepare_command(command)
wrapped = f'cd {work_dir} && {exec_command}'
effective_timeout = timeout or self.timeout
# Merge sudo password (if any) with caller-supplied stdin_data.
if sudo_stdin is not None and stdin_data is not None:
effective_stdin = sudo_stdin + stdin_data
elif sudo_stdin is not None:
@@ -82,66 +155,60 @@ class SSHEnvironment(BaseEnvironment):
effective_stdin = stdin_data
cmd = self._build_ssh_command()
cmd.extend(["bash", "-c", wrapped])
cmd.append(wrapped)
try:
kwargs = self._build_run_kwargs(timeout, effective_stdin)
# Remove timeout from kwargs -- we handle it in the poll loop
kwargs.pop("timeout", None)
kwargs = self._build_run_kwargs(timeout, effective_stdin)
kwargs.pop("timeout", None)
_output_chunks = []
proc = subprocess.Popen(
cmd,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
stdin=subprocess.PIPE if effective_stdin else subprocess.DEVNULL,
text=True,
)
_output_chunks = []
if effective_stdin:
try:
proc.stdin.write(effective_stdin)
proc.stdin.close()
except (BrokenPipeError, OSError):
pass
proc = subprocess.Popen(
cmd,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
stdin=subprocess.PIPE if effective_stdin else subprocess.DEVNULL,
text=True,
)
def _drain():
try:
for line in proc.stdout:
_output_chunks.append(line)
except Exception:
pass
if effective_stdin:
reader = threading.Thread(target=_drain, daemon=True)
reader.start()
deadline = time.monotonic() + effective_timeout
while proc.poll() is None:
if is_interrupted():
proc.terminate()
try:
proc.stdin.write(effective_stdin)
proc.stdin.close()
except Exception:
pass
def _drain():
try:
for line in proc.stdout:
_output_chunks.append(line)
except Exception:
pass
reader = threading.Thread(target=_drain, daemon=True)
reader.start()
deadline = time.monotonic() + effective_timeout
while proc.poll() is None:
if is_interrupted():
proc.terminate()
try:
proc.wait(timeout=1)
except subprocess.TimeoutExpired:
proc.kill()
reader.join(timeout=2)
return {
"output": "".join(_output_chunks) + "\n[Command interrupted]",
"returncode": 130,
}
if time.monotonic() > deadline:
proc.wait(timeout=1)
except subprocess.TimeoutExpired:
proc.kill()
reader.join(timeout=2)
return self._timeout_result(effective_timeout)
time.sleep(0.2)
reader.join(timeout=2)
return {
"output": "".join(_output_chunks) + "\n[Command interrupted]",
"returncode": 130,
}
if time.monotonic() > deadline:
proc.kill()
reader.join(timeout=2)
return self._timeout_result(effective_timeout)
time.sleep(0.2)
reader.join(timeout=5)
return {"output": "".join(_output_chunks), "returncode": proc.returncode}
except Exception as e:
return {"output": f"SSH execution error: {str(e)}", "returncode": 1}
reader.join(timeout=5)
return {"output": "".join(_output_chunks), "returncode": proc.returncode}
def cleanup(self):
super().cleanup()
if self.control_socket.exists():
try:
cmd = ["ssh", "-o", f"ControlPath={self.control_socket}",

View File

@@ -1,7 +1,6 @@
#!/usr/bin/env python3
"""File Tools Module - LLM agent file manipulation tools."""
import errno
import json
import logging
import os
@@ -12,18 +11,6 @@ from agent.redact import redact_sensitive_text
logger = logging.getLogger(__name__)
_EXPECTED_WRITE_ERRNOS = {errno.EACCES, errno.EPERM, errno.EROFS}
def _is_expected_write_exception(exc: Exception) -> bool:
"""Return True for expected write denials that should not hit error logs."""
if isinstance(exc, PermissionError):
return True
if isinstance(exc, OSError) and exc.errno in _EXPECTED_WRITE_ERRNOS:
return True
return False
_file_ops_lock = threading.Lock()
_file_ops_cache: dict = {}
@@ -114,12 +101,31 @@ def _get_file_ops(task_id: str = "default") -> ShellFileOperations:
"container_persistent": config.get("container_persistent", True),
"docker_volumes": config.get("docker_volumes", []),
}
ssh_config = None
if env_type == "ssh":
ssh_config = {
"host": config.get("ssh_host", ""),
"user": config.get("ssh_user", ""),
"port": config.get("ssh_port", 22),
"key": config.get("ssh_key", ""),
"persistent": config.get("ssh_persistent", False),
}
local_config = None
if env_type == "local":
local_config = {
"persistent": config.get("local_persistent", False),
}
terminal_env = _create_environment(
env_type=env_type,
image=image,
cwd=cwd,
timeout=config["timeout"],
ssh_config=ssh_config,
container_config=container_config,
local_config=local_config,
task_id=task_id,
)
@@ -251,10 +257,7 @@ def write_file_tool(path: str, content: str, task_id: str = "default") -> str:
result = file_ops.write_file(path, content)
return json.dumps(result.to_dict(), ensure_ascii=False)
except Exception as e:
if _is_expected_write_exception(e):
logger.debug("write_file expected denial: %s: %s", type(e).__name__, e)
else:
logger.error("write_file error: %s: %s", type(e).__name__, e, exc_info=True)
logger.error("write_file error: %s: %s", type(e).__name__, e)
return json.dumps({"error": str(e)}, ensure_ascii=False)

View File

@@ -471,6 +471,8 @@ def _get_env_config() -> Dict[str, Any]:
# is running inside the container/remote).
if env_type == "local":
default_cwd = os.getcwd()
elif env_type == "ssh":
default_cwd = "~"
else:
default_cwd = "/root"
@@ -503,6 +505,8 @@ def _get_env_config() -> Dict[str, Any]:
"ssh_user": os.getenv("TERMINAL_SSH_USER", ""),
"ssh_port": _parse_env_var("TERMINAL_SSH_PORT", "22"),
"ssh_key": os.getenv("TERMINAL_SSH_KEY", ""),
"ssh_persistent": os.getenv("TERMINAL_SSH_PERSISTENT", "false").lower() in ("true", "1", "yes"),
"local_persistent": os.getenv("TERMINAL_LOCAL_PERSISTENT", "false").lower() in ("true", "1", "yes"),
# Container resource config (applies to docker, singularity, modal, daytona -- ignored for local/ssh)
"container_cpu": _parse_env_var("TERMINAL_CONTAINER_CPU", "1", float, "number"),
"container_memory": _parse_env_var("TERMINAL_CONTAINER_MEMORY", "5120"), # MB (default 5GB)
@@ -514,6 +518,7 @@ def _get_env_config() -> Dict[str, Any]:
def _create_environment(env_type: str, image: str, cwd: str, timeout: int,
ssh_config: dict = None, container_config: dict = None,
local_config: dict = None,
task_id: str = "default"):
"""
Create an execution environment from mini-swe-agent.
@@ -538,7 +543,9 @@ def _create_environment(env_type: str, image: str, cwd: str, timeout: int,
volumes = cc.get("docker_volumes", [])
if env_type == "local":
return _LocalEnvironment(cwd=cwd, timeout=timeout)
lc = local_config or {}
return _LocalEnvironment(cwd=cwd, timeout=timeout,
persistent=lc.get("persistent", False))
elif env_type == "docker":
return _DockerEnvironment(
@@ -594,6 +601,7 @@ def _create_environment(env_type: str, image: str, cwd: str, timeout: int,
key_path=ssh_config.get("key", ""),
cwd=cwd,
timeout=timeout,
persistent=ssh_config.get("persistent", False),
)
else:
@@ -923,6 +931,7 @@ def terminal_tool(
"user": config.get("ssh_user", ""),
"port": config.get("ssh_port", 22),
"key": config.get("ssh_key", ""),
"persistent": config.get("ssh_persistent", False),
}
container_config = None
@@ -935,6 +944,12 @@ def terminal_tool(
"docker_volumes": config.get("docker_volumes", []),
}
local_config = None
if env_type == "local":
local_config = {
"persistent": config.get("local_persistent", False),
}
new_env = _create_environment(
env_type=env_type,
image=image,
@@ -942,6 +957,7 @@ def terminal_tool(
timeout=effective_timeout,
ssh_config=ssh_config,
container_config=container_config,
local_config=local_config,
task_id=effective_task_id,
)
except ImportError as e:

View File

@@ -50,8 +50,6 @@ def atomic_json_write(
os.fsync(f.fileno())
os.replace(tmp_path, path)
except BaseException:
# Intentionally catch BaseException so temp-file cleanup still runs for
# KeyboardInterrupt/SystemExit before re-raising the original signal.
try:
os.unlink(tmp_path)
except OSError:
@@ -98,8 +96,6 @@ def atomic_yaml_write(
os.fsync(f.fileno())
os.replace(tmp_path, path)
except BaseException:
# Match atomic_json_write: cleanup must also happen for process-level
# interruptions before we re-raise them.
try:
os.unlink(tmp_path)
except OSError: