Compare commits

...

40 Commits

Author SHA1 Message Date
Teknium 81f5faf1e0 fix(entry-points): guard hermes_bootstrap import so partial updates don't brick hermes
teknium1 hit ModuleNotFoundError: No module named 'hermes_bootstrap' after
a code update, on both his Windows machine AND his Linux workstation.  The
failure mode is real and affects every user who updates hermes by any path
OTHER than a fully-successful ``hermes update``.

## What happens

hermes_bootstrap.py is a top-level module registered via pyproject.toml's
``py-modules`` list (added by Brooklyn's Windows UTF-8 stdio work).  It
must be registered in the venv's editable-install .pth file before Python
can find it as a bare ``import hermes_bootstrap``.

``hermes update`` handles this correctly: (1) git reset --hard, (2) clear
__pycache__, (3) uv pip install -e . (re-registers the package including
the new py-modules list), (4) restart.

BUT if any step AFTER (1) fails — network blip during pip install, PEP 668
on a system Python, venv locked, uv not in PATH, a crash mid-update — the
user is left with new code that references hermes_bootstrap and a venv
that doesn't know about it.  Every hermes invocation after that crashes
with ModuleNotFoundError, including ``hermes update`` itself.  No recovery
path without manual `uv pip install -e .`.

Also affects users who ``git pull`` the repo directly without running
hermes update — relatively common for developers.

## Fix

Wrap ``import hermes_bootstrap`` in a try/except ModuleNotFoundError
across all 6 entry points (hermes_cli/main, run_agent, gateway/run,
acp_adapter/entry, cli, batch_runner).  On Windows, missing bootstrap
means the UTF-8 stdio setup doesn't run — degraded behavior (Unicode
chars may fail to print) but NOT a crash.  POSIX is unaffected either way
since the bootstrap is a no-op there.

Once hermes is running again, the user can ``hermes update`` to fully
recover.

## Test update

tests/test_hermes_bootstrap.py::test_entry_point_imports_bootstrap
scans for the first top-level import in each entry point and asserts it
is hermes_bootstrap.  Extended the check to accept a Try block whose body
is a lone Import of hermes_bootstrap — that's the recovery-friendly form
we just introduced.

Verified behavior by ``mv hermes_bootstrap.py hermes_bootstrap.py.bak``
and confirming ``python -c "import hermes_cli.main"`` succeeds.  82/82
tests pass (hermes_bootstrap + windows-native + windows-compat).
2026-05-08 14:38:42 -07:00
Teknium 67b8a1076a ci: retrigger checks after #22083 landed the profiles.py PLW1514 fix on main 2026-05-08 14:24:55 -07:00
Teknium e318b593f1 ci(lint): add blocking ruff-check + windows-footguns jobs to lint.yml
Paired with commit e0c03defd (enabled PLW1514 in pyproject.toml) and
commit 3dfb35700 (added scripts/check-windows-footguns.py). Both
commits noted that the corresponding workflow edits were held back
because the authoring token lacked the `workflow` OAuth scope.

New jobs, both separate from `lint-diff` so the advisory diff
comment still posts when enforcement fails:

- ruff-blocking: runs `ruff check .` against the explicit select
  list in pyproject.toml (currently PLW1514, which catches bare
  open() that defaults to locale encoding — cp1252 on Windows).
  No --exit-zero, no `|| true`; exit code propagates to the
  required-check gate.

- windows-footguns: runs scripts/check-windows-footguns.py --all
  (380 files, stdlib-only, <2s). Covers 11 Windows-unsafe
  primitives — os.kill(pid, 0) bpo-14484 footgun, os.killpg,
  os.setsid/setpgrp, signal.SIGKILL/SIGHUP/SIGUSR* without
  getattr fallback, shebang scripts via subprocess, wmic without
  shutil.which guard, hardcoded ~/Desktop OneDrive trap, bare
  open() without encoding=, etc.

Both jobs pin actions by SHA to match repo convention.
tests/test_lint_config.py::test_workflow_has_blocking_ruff_step
now finds the blocking step and passes.
2026-05-08 14:19:23 -07:00
Teknium 37f509d2bb test: migrate stale os.kill monkeypatches to gateway.status._pid_exists
PR #21561 migrated liveness probes across 14 call sites from
`os.kill(pid, 0)` to `gateway.status._pid_exists` (psutil-first) so
the gateway doesn't Ctrl+C-itself on Windows via bpo-14484. A handful of
tests still patched the old `os.kill` seam and either happened to pass
on POSIX (when PID 12345 incidentally wasn't alive on the CI worker) or
failed outright — on CI runs they surfaced as 7 flaky/stable failures.

Migrate each affected test to patch the correct seam:

- tests/tools/test_browser_orphan_reaper.py (5 tests)
    Patch `gateway.status._pid_exists` instead of `os.kill`.
    Rename test_permission_error_on_kill_check_skips to
    test_alive_legacy_daemon_is_reaped — the old assertion was
    "PermissionError on sig 0 → skip dir"; post-migration the
    untracked-alive-daemon path always reaps the dir after SIGTERM
    (best-effort semantics were preserved).

- tests/tools/test_windows_native_support.py (4 tests)
    Replace tests that asserted `os.kill` seam behavior with tests
    that exercise `ProcessRegistry._is_host_pid_alive` as a
    delegator and split out a new TestPidExistsOSErrorWidening class
    that hits `gateway.status._pid_exists` directly via the POSIX
    fallback branch (so Windows-style `OSError(WinError 87)` + `PermissionError`
    widening is still covered on Linux CI).

- tests/tools/test_process_registry.py (1 test)
    Mock `psutil.Process` + `_pid_exists` instead of `os.kill`
    for the detached-session kill path.

- tests/tools/test_mcp_stability.py::test_kill_orphaned_uses_sigkill_when_available
    SIGTERM → alive-check → SIGKILL flow now uses `_pid_exists`
    for the middle step; assertion count drops from 3 to 2.

- tests/gateway/test_status.py::TestScopedLocks (2 tests)
    `acquire_scoped_lock` consults `_pid_exists`; patch that
    seam directly instead of trying to control the nested psutil
    call via os.kill monkeypatch.

- tests/hermes_cli/test_gateway.py::test_stop_profile_gateway_keeps_pid_file_when_process_still_running
    The stop loop sends one SIGTERM via os.kill then polls 20x via
    _pid_exists; instrument both separately. Old assertion
    `calls["kill"] == 21` split into `kill == 1` + `alive_probes == 20`.

- tests/hermes_cli/test_auth_toctou_file_modes.py::test_shared_nous_store_writes_0o600_with_0o700_parent
    Commit c34884ea2 switched the pytest seat-belt guard in
    `_nous_shared_store_path()` from `Path.home() / ".hermes"`
    to `get_default_hermes_root()`, which honors HERMES_HOME. The
    test sets both HERMES_HOME and HERMES_SHARED_AUTH_DIR to
    subpaths of the same tmp_path, and the override now collapses
    onto the same path the guard is refusing. Renamed the override
    subdirectory so the two paths diverge — guard passes, test runs.

All 21 original CI failures and their local-flaky siblings now pass
(278 tests across the touched files, 0 failures).
2026-05-08 14:18:41 -07:00
Teknium dc25ab7de2 fix(skills): move platforms key out of folded description: > scalars
The platforms-frontmatter sweep inserted 'platforms: [linux, macos, windows]'
immediately after 'description: >' on 5 optional-skills, landing inside the
folded scalar and breaking YAML parsing. docs-site-checks tripped on
one-three-one-rule/SKILL.md and would have failed on the other 4 in turn.

Fixed files:
- optional-skills/communication/one-three-one-rule/SKILL.md
- optional-skills/health/fitness-nutrition/SKILL.md
- optional-skills/health/neuroskill-bci/SKILL.md
- optional-skills/research/drug-discovery/SKILL.md
- optional-skills/security/oss-forensics/SKILL.md

Moved each platforms line below the closing of the description block.
All 161 SKILL.md files across the repo now parse as valid YAML.
2026-05-08 14:02:09 -07:00
Teknium a6168c2a0a fix(install.ps1): strip UTF-8 BOM that broke [scriptblock]::Create
Commit 3dfb35700 accidentally saved scripts/install.ps1 with a UTF-8 BOM
(EF BB BF) at byte 0.  PowerShell's normal file-execution path (`& .\install.ps1`)
handles BOMs fine, but the curl-and-iex one-liner documented in the README
uses `[scriptblock]::Create((irm ...))` which does NOT strip BOMs — the
BOM lands inside the param() block and fails with 'The assignment
expression is not valid' on $Branch and $HermesHome.

teknium1 hit this trying to reinstall from the PR branch after Brooklyn's
commits landed.  Every user trying the PR branch install-one-liner hit
it too until we notice.

Saved without BOM, verified via xxd: file now starts with '# =====' at
byte 0 instead of EF BB BF.
2026-05-08 13:41:17 -07:00
Teknium 7412878aca feat(windows uninstall): clean up User env, PATH, Scheduled Task, and portable tooling
`hermes uninstall` was POSIX-only.  On Windows it would leave four classes
of installer debris behind that the user had to scrub manually:

1. Scheduled Task and/or Startup-folder .cmd entry that installer.ps1
   dropped for `hermes gateway install`.  Left running at next logon
   even after uninstall, pointing at deleted code paths.
2. User-scope PATH entries for the Hermes venv, PortableGit (cmd, bin,
   usr\bin), and bundled Node, all written to HKCU\Environment\Path.
3. User-scope env vars HERMES_HOME and HERMES_GIT_BASH_PATH, same
   registry key.
4. PortableGit and Node copies under %LOCALAPPDATA%\hermes\ (~200MB),
   plus gateway-service/ scratch dir.

Fixes:

- `uninstall_gateway_service()` gets a Windows branch that calls into
  `gateway_windows.stop()` + `gateway_windows.uninstall()`, which already
  know how to remove both schtasks entries and Startup-folder .cmd files
  and how to stop any running detached pythonw gateway.
- `remove_path_from_windows_registry(hermes_home)` reads HKCU\Environment
  via winreg, strips any PATH entry whose path-prefix matches the
  installer-owned markers (\hermes-agent, \git, \node, \venv under the
  current HERMES_HOME), and writes the cleaned value back.  Preserves
  REG_EXPAND_SZ vs REG_SZ so unexpanded %VARS% in the user's PATH
  survive.  No PowerShell subprocess, no fragile `reg query` parsing.
- `remove_hermes_env_vars_windows()` deletes HERMES_HOME and
  HERMES_GIT_BASH_PATH from the same key.
- `remove_portable_tooling_windows(hermes_home)` rmtree's
  `hermes_home/git`, `hermes_home/node`, `hermes_home/gateway-service`
  — they're installer artifacts, not user data, so they get removed in
  BOTH "keep data" and "full uninstall" modes.

Wired these into `run_uninstall()` guarded by `_is_windows()` so
POSIX paths are untouched.  Also fixed the closing "Reload your shell"
footer to point Windows users at opening a new terminal (PATH changes
don't propagate into the current PowerShell session) with the
PowerShell install one-liner instead of bash's curl-pipe.

Verified on Delta-1 (Windows 10) via preview script: correctly
identifies 4 Hermes-installed PATH entries out of 13 total to remove,
leaves Python/LM Studio/ripgrep/ffmpeg/winget entries alone.
2026-05-08 13:26:42 -07:00
Teknium b2bdf274f7 fix(windows): gateway status dedup + install.ps1 platform-SDK bootstrap
## Two residual Windows fixes that were hanging from earlier commits.

### 1. `hermes gateway status` reported 2 PIDs per gateway — TWO bugs compounded

Diagnosed with psutil parent/child walk against live gateway PIDs:

**Bug A (the real one): `_get_parent_pid` silently failed on Windows.**
The helper shelled out to `ps -o ppid= -p <pid>`, which doesn't exist
on Windows — `FileNotFoundError` → returns `None` → the ancestor walk
terminated at `os.getpid()` alone.  Consequence: the PID table scan in
`_scan_gateway_pids` couldn't filter out `hermes gateway status`'s own
launcher stub (a venv `pythonw.exe`/`python.exe` that matches the same
`-m hermes_cli.main gateway` pattern as the gateway).  Every status
call saw "itself" as a second gateway.

Fix: `_get_parent_pid` now calls `psutil.Process(pid).ppid()` first
(psutil is a core dependency since 3dfb35700) and falls back to `ps`
only when `shutil.which("ps")` succeeds — matching the Windows-footgun
checker's "always guard `ps` / `wmic` / etc. with `shutil.which`" rule.

Before: `Gateway process running (PID: 21952, 46880)` — 46880 changing
on every call (the status invocation's own launcher, which died by the
time the next status call looked).

After (5 consecutive calls):
```
✓ Gateway process running (PID: 21952)
✓ Gateway process running (PID: 21952)
✓ Gateway process running (PID: 21952)
✓ Gateway process running (PID: 21952)
✓ Gateway process running (PID: 21952)
```

Ancestor walk on the fix: 14 PIDs (full chain through bash/explorer)
instead of the broken 1-PID set.

**Bug B (the cosmetic one): venv-launcher dedup.** Standard Windows
CPython venv behaviour is that `<venv>/Scripts/pythonw.exe` is a ~5 MB
launcher stub that spawns the base Python (`C:\\Program Files\\Python311
\\pythonw.exe`) with the same command line and waits.  Our process
scanner sees two PIDs for every gateway: launcher + interpreter, same
cmdline.  Bug A masked this by accidentally counting the status call
AS one of them; with Bug A fixed, we see both the real launcher and
real interpreter for the gateway process itself.

Fix: `_filter_venv_launcher_stubs` at the tail of `_scan_gateway_pids`
walks each matched PID's ppid via psutil.  Any PID that's the PARENT
of another matched PID is a launcher stub — drop it, keep the child.
Scoped to Windows (`is_windows() and len(pids) > 1`) and no-ops when
psutil isn't importable.

Net effect: `gateway status` now reports one PID per gateway — the
interpreter — matching POSIX behaviour and user expectations.

### 2. `install.ps1`: bootstrap pip + auto-install platform SDKs

New `Install-PlatformSdks` function wired between `Invoke-SetupWizard`
and `Start-GatewayIfConfigured`.  Fixes two related issues on fresh
Windows installs:

1. The tiered `uv pip install` cascade (introduced in 87fca8342)
   correctly falls through when tier 1 `.[all]` fails on the RL git
   deps, but the fallback tiers can silently skip SDKs from `[messaging]`
   when there's a partial-resolve.  Result: user sets `DISCORD_BOT_TOKEN`
   in `.env`, fires up gateway, hits "discord module not installed".

2. `uv` creates venvs WITHOUT pip by default, so the user's escape
   hatch (`pip install discord.py` in the venv) doesn't exist either.

The new function:
- Skips if `-NoVenv` (nothing to bootstrap into).
- Scans `~/.hermes/.env` for messaging tokens (TELEGRAM_BOT_TOKEN,
  DISCORD_BOT_TOKEN, SLACK_BOT_TOKEN, SLACK_APP_TOKEN, WHATSAPP_ENABLED),
  filtering placeholder values.
- For each token that's set, runs `python -c "import <sdk>"` to verify.
- If any import fails: runs `python -m ensurepip --upgrade` to bootstrap
  pip into the venv (idempotent — no-ops if pip is already present),
  then `pip install <spec>` for each missing SDK with specs mirroring
  pyproject.toml's `[messaging]` extra to avoid version drift.

The `$ErrorActionPreference = "SilentlyContinue"` spans are not
cosmetic — PowerShell wraps native-stderr from a non-zero-exit
subprocess as a `NativeCommandError` that prints even through
`*> $null` / `2>$null`.  Save + restore EAP over the import-probe
and pip-install blocks keeps the output clean.

Verified on this Windows 10 box:
- Initial state: telegram+fastapi+psutil present, discord+slack_sdk
  missing (tier 1 `.[all]` had failed — `.tirith-install-failed`
  marker in `%LOCALAPPDATA%\\hermes`).
- First run with discord+slack tokens in .env: detects both missing,
  ensurepip (skipped — pip was already bootstrapped earlier this
  session for telegram), installs `discord.py[voice]==2.7.1` +
  `PyNaCl` + `davey`, installs `slack-sdk==3.41.0`. All imports
  succeed on verify.
- Second run: all three SDKs report OK, function no-ops.

Pip spec strings mirror pyproject.toml's `[messaging]` extra verbatim
so a bump to the extra picks up here automatically — no drift.

### Files

- `hermes_cli/gateway.py`: `_get_parent_pid` rewritten (psutil-first);
  `_filter_venv_launcher_stubs` added; `_scan_gateway_pids` dedups
  launchers on Windows when it finds >1 match.
- `scripts/install.ps1`: new `Install-PlatformSdks` function (~85
  lines); wired into the main flow at line 1438.

### Verification

- `venv/Scripts/python.exe scripts/check-windows-footguns.py --all`
  → `✓ No Windows footguns found (380 file(s) scanned).`
- `ast.parse` passes on gateway.py.
- `[System.Management.Automation.Language.Parser]::ParseFile` passes
  on install.ps1.
- Live gateway (PID 21952, running since 12:33 today) survived 5x
  stress loop of `hermes gateway status` without dying.
2026-05-08 13:10:34 -07:00
Teknium 3dfb357001 feat(cross-platform): psutil for PID/process management + Windows footgun checker
## Why

Hermes supports Linux, macOS, and native Windows, but the codebase grew up
POSIX-first and has accumulated patterns that silently break (or worse,
silently kill!) on Windows:

- `os.kill(pid, 0)` as a liveness probe — on Windows this maps to
  CTRL_C_EVENT and broadcasts Ctrl+C to the target's entire console
  process group (bpo-14484, open since 2012).
- `os.killpg` — doesn't exist on Windows at all (AttributeError).
- `os.setsid` / `os.getuid` / `os.geteuid` — same.
- `signal.SIGKILL` / `signal.SIGHUP` / `signal.SIGUSR1` — module-attr
  errors at runtime on Windows.
- `open(path)` / `open(path, "r")` without explicit encoding= — inherits
  the platform default, which is cp1252/mbcs on Windows (UTF-8 on POSIX),
  causing mojibake round-tripping between hosts.
- `wmic` — removed from Windows 10 21H1+.

This commit does three things:

1. Makes `psutil` a core dependency and migrates critical callsites to it.
2. Adds a grep-based CI gate (`scripts/check-windows-footguns.py`) that
   blocks new instances of any of the above patterns.
3. Fixes every existing instance in the codebase so the baseline is clean.

## What changed

### 1. psutil as a core dependency (pyproject.toml)

Added `psutil>=5.9.0,<8` to core deps. psutil is the canonical
cross-platform answer for "is this PID alive" and "kill this process
tree" — its `pid_exists()` uses `OpenProcess + GetExitCodeProcess` on
Windows (NOT a signal call), and its `Process.children(recursive=True)`
+ `.kill()` combo replaces `os.killpg()` portably.

### 2. `gateway/status.py::_pid_exists`

Rewrote to call `psutil.pid_exists()` first, falling back to the
hand-rolled ctypes `OpenProcess + WaitForSingleObject` dance on Windows
(and `os.kill(pid, 0)` on POSIX) only if psutil is somehow missing —
e.g. during the scaffold phase of a fresh install before pip finishes.

### 3. `os.killpg` migration to psutil (7 callsites, 5 files)

- `tools/code_execution_tool.py`
- `tools/process_registry.py`
- `tools/tts_tool.py`
- `tools/environments/local.py` (3 sites kept as-is, suppressed with
  `# windows-footgun: ok` — the pgid semantics psutil can't replicate,
  and the calls are already Windows-guarded at the outer branch)
- `gateway/platforms/whatsapp.py`

### 4. `scripts/check-windows-footguns.py` (NEW, 500 lines)

Grep-based checker with 11 rules covering every Windows cross-platform
footgun we've hit so far:

1. `os.kill(pid, 0)` — the silent killer
2. `os.setsid` without guard
3. `os.killpg` (recommends psutil)
4. `os.getuid` / `os.geteuid` / `os.getgid`
5. `os.fork`
6. `signal.SIGKILL`
7. `signal.SIGHUP/SIGUSR1/SIGUSR2/SIGALRM/SIGCHLD/SIGPIPE/SIGQUIT`
8. `subprocess` shebang script invocation
9. `wmic` without `shutil.which` guard
10. Hardcoded `~/Desktop` (OneDrive trap)
11. `asyncio.add_signal_handler` without try/except
12. `open()` without `encoding=` on text mode

Features:
- Triple-quoted-docstring aware (won't flag prose inside docstrings)
- Trailing-comment aware (won't flag mentions in `# os.kill(pid, 0)` comments)
- Guard-hint aware (skips lines with `hasattr(os, ...)`,
  `shutil.which(...)`, `if platform.system() != 'Windows'`, etc.)
- Inline suppression with `# windows-footgun: ok — <reason>`
- `--list` to print all rules with fixes
- `--all` / `--diff <ref>` / staged-files (default) modes
- Scans 380 files in under 2 seconds

### 5. CI integration

A GitHub Actions workflow that runs the checker on every PR and push is
staged at `/tmp/hermes-stash/windows-footguns.yml` — not included in this
commit because the GH token on the push machine lacks `workflow` scope.
A maintainer with `workflow` permissions should add it as
`.github/workflows/windows-footguns.yml` in a follow-up. Content:

```yaml
name: Windows footgun check
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: "3.11"}
      - run: python scripts/check-windows-footguns.py --all
```

### 6. CONTRIBUTING.md — "Cross-Platform Compatibility" expansion

Expanded from 5 to 16 rules, each with message, example, and fix.
Recommends psutil as the preferred API for PID / process-tree operations.

### 7. Baseline cleanup (91 → 0 findings)

- 14 `open()` sites → added `encoding='utf-8'` (internal logs/caches) or
  `encoding='utf-8-sig'` (user-editable files that Notepad may BOM)
- 23 POSIX-only callsites in systemd helpers, pty_bridge, and plugin
  tool subprocess management → annotated with
  `# windows-footgun: ok — <reason>`
- 7 `os.killpg` sites → migrated to psutil (see §3 above)

## Verification

```
$ python scripts/check-windows-footguns.py --all
✓ No Windows footguns found (380 file(s) scanned).

$ python -c "from gateway.status import _pid_exists; import os
> print('self:', _pid_exists(os.getpid())); print('bogus:', _pid_exists(999999))"
self: True
bogus: False
```

Proof-of-repro that `os.kill(pid, 0)` was actually killing processes
before this fix — see commit `1cbe39914` and bpo-14484. This commit
removes the last hand-rolled ctypes path from the hot liveness-check
path and defers to the best-maintained cross-platform answer.
2026-05-08 12:57:33 -07:00
Teknium 1cbe399149 fix(windows): os.kill(pid, 0) is NOT a no-op on Windows — route through new _pid_exists helper
On Windows, Python's ``os.kill(pid, 0)`` is NOT a no-op. CPython's
implementation (``Modules/posixmodule.c::os_kill_impl``) treats sig=0
as ``CTRL_C_EVENT`` because the two integer values collide at the C
layer, and routes it through ``GenerateConsoleCtrlEvent(0, pid)`` —
which sends a Ctrl+C to the ENTIRE console process group containing
the target PID, not just the PID itself. Any caller that wanted to
check "is PID X alive" via the classic POSIX ``os.kill(pid, 0)``
idiom was silently killing that process (and often unrelated
processes in the same console group) on Windows. Long-standing
Python Windows quirk; see bpo-14484 (open since 2012).

This manifested in Hermes as: every ``hermes gateway status``
invocation would read the gateway's PID from the PID file, call
``os.kill(pid, 0)`` via ``gateway.status.get_running_pid()`` as a
"liveness check", and instantly terminate the gateway it was trying
to report on. No shutdown log, no traceback, no atexit hook fire,
no exit-diag entry — just silent termination of the detached pythonw
process. "Bot answered one message then stopped typing" was the
characteristic end-user symptom because `os.kill(pid, 0)` fires
mid-response-send and kills the gateway between logs.

Reproduction (verified in this branch before the fix):

  $ hermes gateway start       # gateway alive, PID 37520
  $ hermes gateway status      # reports "No gateway process detected"
  $ tasklist /FI "PID eq 37520"  # INFO: No tasks are running
                                 # — gateway terminated silently

Root-cause fix is a new ``gateway.status._pid_exists(pid)`` helper:

- On Windows: Win32 ``OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION |
  SYNCHRONIZE, False, pid)`` + ``WaitForSingleObject(handle, 0)``
  via ctypes. Zero signal delivery, zero console-group side effects.
  Pins ctypes return types to avoid DWORD-vs-signed-int parse bugs
  on WAIT_TIMEOUT (0x102). Distinguishes ERROR_INVALID_PARAMETER
  (PID gone) from ERROR_ACCESS_DENIED (alive but another user).
- On POSIX: the canonical ``os.kill(pid, 0)`` idiom that actually is
  a no-op there.

Then patch every ``os.kill(pid, 0)`` liveness-check callsite to
route through ``_pid_exists`` instead. Total 14 callsites across
11 files; every single one was a latent silent-kill on Windows:

  gateway/run.py:2810      — /restart watcher (inline subprocess)
  gateway/run.py:15195     — --replace wait loop
  gateway/status.py:572    — acquire_gateway_runtime_lock stale check
  gateway/status.py:828    — get_running_pid (THE killer for status)
  gateway/platforms/whatsapp.py:111
  hermes_cli/gateway.py:228, 522, 1012  — gateway-related drain loops
  hermes_cli/kanban_db.py:2826         — _pid_alive was claiming to
                                         be cross-platform but used
                                         os.kill(pid, 0) on Windows
  hermes_cli/main.py:5792        — CLI process-kill polling
  hermes_cli/profiles.py:782     — profile stop wait loop
  plugins/google_meet/process_manager.py:74
  tools/browser_tool.py:1215, 1255  — browser daemon ownership probes
  tools/mcp_tool.py:1255, 3374     — MCP stdio orphan tracking

The watcher source in gateway/run.py:2810 is a multi-line string
that gets spawned as an inline ``python -c "..."`` subprocess, so
it can't import gateway.status. The fix for that callsite inlines
the same ctypes probe directly into the watcher source.

Tested on Windows 10 with the hermes gateway + Telegram bot:
- gateway start → alive
- 5 consecutive ``hermes gateway status`` invocations → gateway
  alive after every one, same PID reported each time (37520, 21952)
- gateway.log shows uninterrupted operation; no spurious shutdown
  entries; cron ticker and kanban dispatcher still running on
  their 60-second cadence
- bot continues answering Telegram messages throughout

Ships alongside an exit-path diagnostic wrapper in
``hermes_cli/gateway.py::run_gateway()`` that captures every way
``asyncio.run(start_gateway(...))`` can return (success, SystemExit,
KeyboardInterrupt, BaseException, atexit) with full traceback to
``logs/gateway-exit-diag.log``. This was used to prove the gateway
was being hard-killed externally (no exit event fired) and should
be kept for future Windows debugging.

Refs: https://bugs.python.org/issue14484
See also: references/windows-subprocess-sigint-storm.md in
the hermes-agent skill.
2026-05-08 12:34:27 -07:00
Teknium ac178b78c4 feat(windows): gateway as a Scheduled Task + Startup-folder fallback
Hermes gateway now installs as a real Windows service via
`hermes gateway install`, auto-starts on user logon, and stays running
across reboots. Mirrors the launchd (macOS) / systemd (Linux) contract
so the rest of the CLI dispatcher just plugs into the same `install /
uninstall / start / stop / restart / status` entrypoints.

Primary implementation is the new `hermes_cli/gateway_windows.py`:

- `schtasks /Create /SC ONLOGON /RL LIMITED /RU <user> /NP /IT` creates
  a per-user Scheduled Task running as the current user at next logon,
  with no UAC prompt and no stored password. Same pattern OpenClaw uses.
- When `schtasks /Create` returns "Access is denied" or times out
  (locked-down corporate boxes, 15s/30s hard + no-output cutoffs),
  fall back to writing a `.cmd` file into
  `%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup\`, which
  Windows Explorer fires at every logon. Either path produces the same
  end-user experience.
- `_spawn_detached()` launches `pythonw.exe -m hermes_cli.main gateway
  run --replace` directly with `DETACHED_PROCESS |
  CREATE_NEW_PROCESS_GROUP | CREATE_NO_WINDOW |
  CREATE_BREAKAWAY_FROM_JOB` + DEVNULL stdio + sidecar
  `logs/gateway-stdio.log`. Going through pythonw.exe (no console)
  instead of a cmd.exe shim is what lets the gateway survive the
  spawning shell's exit on Windows — documented in
  `references/windows-subprocess-sigint-storm.md`.
- Two separate quoting helpers for cmd.exe vs schtasks (`/TR` argument)
  — they're different parsers and mixing breaks both. Same split
  OpenClaw documents in src/daemon/schtasks.ts.
- `_wait_for_gateway_ready()` + `_report_gateway_start()` poll for a
  live gateway process after spawn and report the PID, so install
  doesn't lie about success.

Dispatcher wiring in `hermes_cli/gateway.py`:

- `_gateway_command_inner()` gets Windows branches for install /
  uninstall / start / stop / restart / status + `_is_service_installed`
  + `_is_service_running`. `gateway status` output + suggested
  commands now mention `hermes gateway install` instead of
  `sudo hermes gateway install --system` on Windows.

Two separable Windows fixes that only matter for a working
detached gateway, bundled here because shipping them independently
leaves install broken:

(1) Spurious CTRL_C_EVENT on detached pythonw runs. When the gateway
is launched detached on Windows, something on the boot path (HTTPX /
python-telegram-bot / asyncio ProactorEventLoop subprocess plumbing)
synthesizes a Ctrl+C within ~60-90 seconds. Python 3.11 translates it
into KeyboardInterrupt inside `asyncio.run(start_gateway(...))`, the
outer `except KeyboardInterrupt: return` exits cleanly, and the
process dies with no shutdown log — "bot started typing, then
stopped" is the fingerprint because the interrupt fires mid-send.
Fix in `run_gateway()`: when `is_windows()` and stdin is not a TTY,
install `signal.signal(SIGINT, SIG_IGN)` + same for SIGBREAK. Real
console runs have a TTY and skip the absorber, so user Ctrl+C still
works interactively. Same family as commit 449ad952b's browser-tool
SIGINT absorber; cross-referenced in the ref doc.

(2) `wmic process get` is the process-list path used by
`_scan_gateway_pids()` / `find_gateway_pids()`, which power status,
stop, and restart on Windows. `C:\Windows\System32\wbem\WMIC.exe` has
been deprecated since Windows 10 21H1 and is not installed on modern
Win 10/11 boxes, so `find_gateway_pids()` silently returns [] — status
sees no gateway even when one is running. Fix: `shutil.which("wmic")`
first, fall back to PowerShell's `Get-CimInstance Win32_Process`
emitting the same LIST-style `CommandLine=...` / `ProcessId=...` pairs
the downstream parser already handles. Zero behavior change on boxes
where wmic still works.

Verified end-to-end on Windows 10 (Delta-1):
- `hermes gateway install` → falls back to Startup folder (access
  denied on schtasks for this user) + detached pythonw spawn, PID
  reported correctly.
- Gateway connects to Telegram, answers messages, stays alive past
  2min (previously died at ~85s with no shutdown log).
- `hermes gateway stop` + `uninstall` both clean up both tracks.

Refs: openclaw/openclaw src/daemon/schtasks.ts for the ONLOGON +
startup-folder-fallback pattern. skill hermes-agent
references/windows-subprocess-sigint-storm.md for the deeper
CTRL_C_EVENT / ProactorEventLoop background.
2026-05-08 12:04:50 -07:00
Teknium 87fca8342a fix(windows installer): UTF-8 BOM, tiered extras, skip tinker-atropos by default
install.ps1 had three related problems that compounded into `hermes dashboard`
failing to boot on Windows with 'No module named fastapi':

1. UTF-8 BOM missing.  Windows PowerShell 5.1 (the default on Windows 10/11,
   which is what `irm | iex` runs under) reads files without a BOM as
   cp1252.  install.ps1 has em-dashes, arrows, check marks, etc. — PS 5.1
   mangled them and the file failed to parse.  Added UTF-8 BOM so PS 5.1,
   PS 7, and the in-memory `irm | iex` path all read the file identically.

2. `uv pip install -e .[all]` had a single-tier silent fallback to bare
   `.` on any failure, with `2>&1 | Out-Null` swallowing the error.  Any
   transient extras install failure (network hiccup, wheel build issue,
   etc.) would drop every optional extra including [web], and the installer
   would still print 'Main package installed'.  Replaced with a four-tier
   fallback (.[all] -> PyPI-only extras -> dashboard+core -> bare) that
   prints output at every step and a targeted [web] verify+repair at the
   end so `hermes dashboard` specifically is never silently broken.

3. tinker-atropos was installed unconditionally after the main install.
   tinker-atropos/pyproject.toml pulls atroposlib and tinker from
   git+https://github.com/... which can fail on locked-down networks,
   flaky DNS, or rate-limited github.com and would half-install the venv.
   install.sh already skipped it by default with a one-liner for users
   who actually do RL training — install.ps1 now matches that behavior.

Parse-checked clean under Windows PowerShell 5.1.26100.8115
(5318 tokens, 0 parse errors).
2026-05-08 12:04:50 -07:00
Teknium acc0a81624 fix(windows): browser tool + spurious SIGINT from subprocess spawning
Three related Windows-only fixes that together make the browser toolset
actually usable on Windows. Symptom chain: user invokes browser_navigate
-> tool returns {"success": false, "error": "Daemon process exited
during startup with no error output"} and the CLI exits mid-turn with
the session summary.

Root cause (3 layers):

1. tools/browser_tool.py::_find_agent_browser() resolved
   node_modules/.bin/agent-browser to the extensionless POSIX shell
   shim via Path.exists(). On Windows, CreateProcessW cannot execute
   that script (WinError 193 "not a valid Win32 application"). Fix:
   delegate to shutil.which with path=node_modules/.bin so PATHEXT
   picks up agent-browser.CMD on Windows and the extensionless shim
   stays correct on POSIX.

2. Windows Terminal / Win32 delivers a spurious CTRL_C_EVENT to the
   parent hermes.exe whenever a background thread spawns a .cmd
   subprocess. Python 3.11's default SIGINT handler raises
   KeyboardInterrupt in MainThread, which unwinds prompt_toolkit's
   app.run() -> cli.py::run()'s finally block calls _run_cleanup()
   -> _emergency_cleanup_all_sessions -> spawns a concurrent
   _run_browser_command("close", ...) on the same session the agent
   thread just opened. Two agent-browser processes race on the same
   --session name, the daemon startup loses, and the tool returns
   the "Daemon process exited during startup" error. Fix: install a
   Windows-only SIGINT handler that absorbs the signal silently.
   Real user Ctrl+C still routes through prompt_toolkit's own c-c
   keybinding at the TUI layer, which is how Claude Code handles the
   same quirk (driving cancellation via the TUI key handler, not
   signals).

3. In tools/browser_tool.py, both Popen sites now pass
   creationflags=CREATE_NO_WINDOW | STARTF_USESTDHANDLES with
   close_fds=True on Windows. CREATE_NO_WINDOW suppresses the .cmd
   console flash; STARTF_USESTDHANDLES + close_fds ensures the child
   inherits only our three chosen handles (DEVNULL stdin, temp-file
   stdout/stderr) and no leaked parent console handles that could
   confuse agent-browser's native daemon spawn. Notably we do NOT
   add CREATE_NEW_PROCESS_GROUP - on Python 3.11 Windows the flag
   interacts badly with asyncio's ProactorEventLoop and makes things
   worse.

Verified end-to-end on Windows 10 / Windows Terminal / PowerShell:
browser_navigate to https://example.com returns
{"success": true, "title": "Example Domain"} and the CLI stays alive
for follow-up tool calls and assistant turns.

Refs: earlier Windows quirks commits 1cebb3bad (Ctrl+Enter newline),
26f5af52a (environment hints), aefd1a37f (Playwright Chromium).
2026-05-08 12:04:50 -07:00
emozilla c34884ea20 auth: use get_default_hermes_root() for shared nous_auth.json path
Replace hardcoded ~/.hermes/shared/ references with
get_default_hermes_root() / 'shared' so the cross-profile Nous auth
store lands in the correct location on every platform:

- Linux/macOS: ~/.hermes/shared/
- native Windows: %LOCALAPPDATA%\hermes\shared- Docker / custom HERMES_HOME: <root>/shared/

Updates _nous_shared_auth_dir(), the pytest seat-belt in
_nous_shared_store_path(), and the auth_add_command comment to match.
Previously Windows installs wrote to ~/.hermes/shared/ even though the
rest of the CLI uses %LOCALAPPDATA%\hermes, so profiles couldn't see
each other's shared credential.
2026-05-08 13:28:06 -04:00
Teknium b9bac87d5a feat(skills): declare platforms frontmatter for all 79 undeclared built-in skills
Completes the Windows-gating coverage for the built-in skills/ tree. Every
bundled SKILL.md now carries an explicit platforms: declaration so the
loader (agent.skill_utils.skill_matches_platform) can skip-load skills
that don't fit the current OS.

74 skills declared cross-platform (platforms: [linux, macos, windows]):
  Creative (16): ascii-art, ascii-video, architecture-diagram, baoyu-comic,
    baoyu-infographic, claude-design, creative-ideation, design-md,
    excalidraw, humanizer, manim-video, p5js, pixel-art,
    popular-web-designs, pretext, sketch, songwriting-and-ai-music,
    touchdesigner-mcp
  Autonomous agents: claude-code, codex, hermes-agent, opencode
  Data/devops: jupyter-live-kernel, kanban-orchestrator, kanban-worker,
    webhook-subscriptions, dogfood, codebase-inspection
  GitHub: github-auth, github-code-review, github-issues,
    github-pr-workflow, github-repo-management
  Media: gif-search, heartmula, songsee, spotify, youtube-content
  MCP / email / gaming / notes / smart-home: native-mcp, himalaya,
    pokemon-player, obsidian, openhue
  mlops (non-broken): weights-and-biases, huggingface-hub, llama-cpp,
    outlines, segment-anything-model, dspy, trl-fine-tuning
  Productivity: airtable, google-workspace, linear, maps, nano-pdf,
    notion, ocr-and-documents, powerpoint
  Red-teaming / research: godmode, arxiv, blogwatcher, llm-wiki,
    polymarket
  Software-dev: debugging-hermes-tui-commands, hermes-agent-skill-authoring,
    node-inspect-debugger, plan, requesting-code-review, spike,
    subagent-driven-development, systematic-debugging,
    test-driven-development, writing-plans
  Misc: yuanbao

5 skills gated from Windows (platforms: [linux, macos]):
  mlops/inference/vllm (serving-llms-vllm)
    vLLM is officially Linux-only; Windows requires WSL.
  mlops/training/axolotl
    Axolotl's flash-attn + deepspeed + bitsandbytes stack is Linux-first.
  mlops/training/unsloth
    Requires Triton + xformers + flash-attn — Linux only in practice.
  mlops/models/audiocraft (audiocraft-audio-generation)
    torchaudio ffmpeg backend + encodec dependencies are Linux-first.
  mlops/inference/obliteratus
    Research abliteration workflow; relies on Linux-focused pytorch
    kernels and MLX — no first-class Windows path.

Same strict-over-lenient policy as the optional-skills sweep: when the
underlying tool's Windows support is rough, missing, or WSL-only, gate the
skill. Easier to un-gate after verified Windows support lands than to leak
partial support that manifests as mid-task failures.

Combined with prior commits in this branch, every bundled SKILL.md
(skills/ + optional-skills/) now has a platforms: declaration.
2026-05-08 09:23:27 -07:00
Teknium 31224b9b5c feat(optional-skills): declare platforms frontmatter for all 63 undeclared skills
Extends the Windows-gating work to the optional-skills/ tree. Every
SKILL.md that previously omitted the platforms: field now carries an
explicit declaration, which Hermes's loader (agent.skill_utils.
skill_matches_platform) honors to skip-load on incompatible OSes.

58 skills declared cross-platform (platforms: [linux, macos, windows]):
  autonomous-ai-agents/blackbox, autonomous-ai-agents/honcho
  blockchain/base, blockchain/solana
  communication/one-three-one-rule
  creative/blender-mcp, creative/concept-diagrams, creative/hyperframes,
  creative/kanban-video-orchestrator, creative/meme-generation
  devops/cli (inference-sh-cli), devops/docker-management
  dogfood/adversarial-ux-test
  email/agentmail
  finance/3-statement-model, finance/comps-analysis, finance/dcf-model,
  finance/excel-author, finance/lbo-model, finance/merger-model,
  finance/pptx-author
  health/fitness-nutrition, health/neuroskill-bci
  mcp/fastmcp, mcp/mcporter
  migration/openclaw-migration
  mlops/accelerate, mlops/chroma, mlops/clip, mlops/guidance,
  mlops/hermes-atropos-environments, mlops/huggingface-tokenizers,
  mlops/instructor, mlops/lambda-labs, mlops/llava, mlops/modal,
  mlops/peft, mlops/pinecone, mlops/pytorch-lightning, mlops/qdrant,
  mlops/saelens, mlops/simpo, mlops/stable-diffusion
  productivity/canvas, productivity/shop-app, productivity/shopify,
  productivity/siyuan, productivity/telephony
  research/domain-intel, research/drug-discovery, research/duckduckgo-search,
  research/gitnexus-explorer, research/parallel-cli, research/scrapling
  security/1password, security/oss-forensics, security/sherlock
  web-development/page-agent

5 skills gated from Windows (platforms: [linux, macos]):
  mlops/flash-attention   - Flash Attention wheels are Linux-first; Windows
                            install requires building from source with CUDA
  mlops/faiss             - faiss-gpu has no Windows wheel; gate rather than
                            leak partial (faiss-cpu) support
  mlops/nemo-curator      - NVIDIA NeMo ecosystem has no first-class Windows path
  mlops/slime             - Megatron+SGLang RL stack is Linux-only in practice
  mlops/whisper           - openai-whisper + ffmpeg setup on Windows is
                            non-trivial; gate until Windows install stanza lands

Methodology: scanned every SKILL.md for Windows-hostile signals
(apt-get, brew, systemd, osascript, ptrace, X11 binaries, POSIX-only
Python APIs, Docker POSIX $(pwd) bind-mounts, explicit 'linux-only' /
'macos-only' text). 3 skills flagged as having hard signals on review:
docker-management and qdrant only had POSIX $(pwd) docker examples and
the tools themselves (Docker Desktop, Qdrant) run fine on Windows —
declared ALL. whisper had an apt/brew ffmpeg install path and nothing
else but the openai-whisper Windows install story is rough enough to
warrant gating.

Strict-over-lenient policy: when in doubt, gate. Easier to un-gate after
verified Windows support lands than to leak partial support that
manifests as mid-task failures for Windows users.
2026-05-08 09:16:33 -07:00
Teknium 3e823d5b3e feat(skills): gate 7 Linux/macOS-only skills from Windows via platforms frontmatter
Hermes's skill loader (agent/skill_utils.skill_matches_platform) already honors
the 'platforms:' frontmatter field and skip-loads skills whose declared
platform list doesn't include sys.platform. Seven bundled skills are in fact
Linux/macOS-only but never declared it, so they leak into Windows skill
listings and sometimes load with broken instructions.

Audited all 160 SKILL.md files (skills/ + optional-skills/) for Windows-
hostile signals: apt-get/brew/systemd/chmod+x install flows, ptrace/proc
runtime dependencies, bash-only launcher scripts, and package dependencies
with no Windows build. The 7 below fail one or more of those tests in a way
that fundamentally can't be papered over by docs edits:

  minecraft-modpack-server      bash start.sh + chmod +x + apt openjdk
  evaluating-llms-harness       lm-eval-harness bash launcher scripts
  distributed-llm-pretraining-
  torchtitan                    bash multi-node torchrun launcher
  python-debugpy                remote attach relies on /proc ptrace_scope
  pytorch-fsdp                  NCCL backend; Windows path is WSL only
  tensorrt-llm                  NVIDIA TensorRT-LLM has no Windows build
  searxng-search                Docker volume flow assumes POSIX $(pwd)

All seven get 'platforms: [linux, macos]'. On Windows the loader now skips
them silently — no more phantom skill listings, no more mid-task failures
because an Apple-only path was surfaced as a suggestion.

Cross-platform skills that merely CONTAIN signals in examples or
install-instructions (brew install as one of several paths, /tmp/ in a code
snippet, etc.) are NOT touched by this commit. A broader audit that
declares the ~140 cross-platform skills as 'platforms: [linux, macos,
windows]' can follow as a separate change once each has been verified
working on Windows.

The installed user copies under ~/AppData/Local/hermes/skills/ (when they
exist) are also patched so the running session reflects the gating
immediately, but only the in-repo files are committed here.
2026-05-08 08:27:23 -07:00
Teknium aefd1a37f4 fix(windows): auto-install Playwright Chromium + surface it in doctor
scripts/install.sh runs 'npx playwright install --with-deps chromium'
on every Linux distro after the npm-install step, which is why browser
tools Just Work on Linux.  scripts/install.ps1 never did the equivalent
step, so on native Windows installs check_browser_requirements() in
tools/browser_tool.py would return False (no Chromium under
%LOCALAPPDATA%\ms-playwright) and every browser_* tool got silently
filtered out of the agent's tool schema — no error, no log entry, user
just wondered why the tools didn't exist.

Two-part fix:

1. scripts/install.ps1: after 'npm install' in InstallDir succeeds, run
   'npx playwright install chromium'.  Resolves npx via the same
   execution-policy-aware logic already used for npm (prefer npx.cmd
   next to npmExe, fall back to Get-Command).  Surfaces a warning +
   manual-recovery hint when the install fails, matching install.sh
   behaviour for distros.

2. hermes_cli/doctor.py: after the agent-browser check, lazily import
   tools.browser_tool and reuse the exact same _chromium_installed()
   predicate check_browser_requirements() uses, so the doctor signal
   cannot drift from the runtime gate.  Skip the check when Camofox /
   CDP override / a cloud provider / Lightpanda is configured (those
   bypass local Chromium).  On missing Chromium, the hint is
   platform-correct: '--with-deps' on POSIX, plain 'install chromium'
   on win32.

Verified on Windows 10:
- 'npx playwright install chromium' completes successfully, drops
  Chrome Headless Shell under %LOCALAPPDATA%\ms-playwright
- check_browser_requirements() flips from False -> True
- 'hermes doctor' now prints either '✓ Playwright Chromium (browser
  engine)' or '⚠ Playwright Chromium not installed' + fix command
- tests/hermes_cli/test_doctor.py: 38/38 pass
- tests/tools/test_browser_chromium_check.py: 16/16 pass
2026-05-08 07:56:35 -07:00
Teknium ec3f7d1a89 docs: add Windows-Specific Quirks section to hermes-agent skill + keystroke diagnostic
Adds a dedicated '## Windows-Specific Quirks' section to the hermes-agent
skill so Windows pitfalls have one discoverable place to evolve. Inaugural
entries cover:

- Input / keybindings — Alt+Enter intercepted by Windows Terminal,
  Ctrl+Enter as the Windows newline keystroke, mintty/git-bash behavior,
  pointer to scripts/keystroke_diagnostic.py for investigation.
- Config / files — UTF-8 BOM HTTP-400 trap.
- execute_code / sandbox — WinError 10106 SYSTEMROOT root cause +
  _WINDOWS_ESSENTIAL_ENV_VARS fix location.
- Testing / contributing — scripts/run_tests.sh POSIX-venv limitation and
  the system-Python workaround, POSIX-only test skip-guard patterns.
- Path / filesystem — line-ending warnings (cosmetic), forward-slash
  portability.

Collapses the old scattered Windows bullets under 'Platform-specific
issues' into a single pointer at the new dedicated section so there's
only one place to maintain this content.

Also adds the scripts/keystroke_diagnostic.py the skill now references —
a small prompt_toolkit Application that prints the Keys.* identifier and
raw escape bytes for every keystroke. Used to establish the Ctrl+Enter
= c-j fact on Windows Terminal; generally useful for anyone adding a
platform-aware keybinding.
2026-05-08 07:13:07 -07:00
Teknium 1cebb3bad8 feat: Ctrl+Enter inserts newline on Windows Terminal
Windows Terminal intercepts Alt+Enter for its fullscreen shortcut, leaving
Windows users with no Enter-involving way to insert a newline in the Hermes
prompt. Fix it by reclaiming c-j on Windows only:

- _bind_prompt_submit_keys now binds c-j (LF) to submit only on POSIX, where
  thin PTYs (docker exec, some SSH configs) deliver Enter as LF. On Windows
  plain Enter is always c-m, so c-j is free.
- Windows-only prompt binding: c-j inserts a newline. Windows Terminal sends
  Ctrl+Enter as LF, so the user-facing keystroke is Ctrl+Enter — no terminal
  settings changes required.
- Alt+Enter binding unchanged; still works on mac/Linux/WSL.
- Test TestPromptToolkitTerminalCompatibility::test_lf_enter_binds_to_submit_handler
  split into platform-aware assertions for POSIX vs win32.
- Fixed the Ctrl+J claim in hermes_cli/tips.py (was wrong before this commit
  even on POSIX) to point Windows users at Ctrl+Enter.

Tradeoff: on Windows, raw Ctrl+J (without Enter) also inserts a newline,
since WT collapses Ctrl+Enter and Ctrl+J to the same c-j keycode. No
conflicting Hermes binding existed for Ctrl+J, so this is a harmless side
effect.
2026-05-08 06:23:25 -07:00
Teknium 26f5af52a8 feat: enrich system-prompt environment hints with host + terminal-backend info
build_environment_hints() now emits a factual block describing the
execution environment on every prompt build:

* Local backend: host OS, $HOME, and cwd — so the agent stops guessing
  paths from the hostname. Windows also gets two specific callouts:
  - hostname != username (prevents C:\Users\<hostname>\... bugs)
  - `terminal` shells out to bash (git-bash/MSYS), not PowerShell

* Remote backend (docker/singularity/modal/daytona/ssh/vercel_sandbox):
  host info is SUPPRESSED — the agent's tools can't touch the host, so
  showing it is misleading. Instead we probe the backend once per
  process with `uname/whoami/pwd` and cache the result. On probe
  failure, fall back to a per-backend description that states only what
  we know from the backend choice itself (container type + likely OS
  family) without inventing user/cwd/$HOME.

Linux/Mac local users now get a small helpful 3-line host block instead
of an empty string. Zero change to the existing WSL hint paragraph.

Tests: 8 new/updated in TestEnvironmentHints, including a regression
guard that fails if a new remote backend is added without listing it in
_REMOTE_TERMINAL_BACKENDS.
2026-05-08 05:07:40 -07:00
Teknium e0c03defd5 lint: enable PLW1514 as a blocking ruff rule
Turns the existing 'all lints disabled' stance into 'exactly one lint
enabled' — PLW1514 (unspecified-encoding) catches bare open() /
read_text() / write_text() calls that default to locale encoding on
Windows (cp1252), silently corrupting non-ASCII content.

Changes:

1. pyproject.toml
   - Migrate [tool.ruff] top-level select → [tool.ruff.lint].select
     (deprecated config location, ruff was warning on every run)
   - Add preview = true (PLW1514 is a preview rule in ruff 0.15.x)
   - select = ['PLW1514'] (exactly one rule, deliberately minimal)
   - per-file-ignores exempt tests/, plugins/, skills/, optional-skills/ —
     those have their own conventions or intentionally exercise edge cases

2. website/scripts/extract-skills.py
   - Fix 3 remaining bare opens (website/ was excluded from the main
     sweep but needed for ruff check . to go green)

3. tests/test_lint_config.py (new, 5 tests)
   - Guards against accidental rule removal.  If someone deletes PLW1514
     from the select list or disables preview mode, these tests fail
     with a loud message explaining why the rule exists.

Paired with a companion commit (held locally for now, pending a token
with workflow scope) that adds a blocking ruff step to .github/workflows/
lint.yml.  Without that companion commit, ruff is configured correctly
but nothing in CI enforces it yet — the advisory PR comment will still
surface new PLW1514 violations though, so authors see them.

Verified: ruff check . → exit 0, 0 violations across the repo.
Test suite: 90 passed, 14 skipped, 0 failed.
2026-05-07 19:36:13 -07:00
Teknium 9c914c01c8 codebase: add encoding='utf-8' to all bare open() calls (PLW1514)
Closes the last Python-on-Windows UTF-8 exposure by making every
text-mode open() call explicit about its encoding.

Before: on Windows, bare open(path, 'r') defaults to the system
locale encoding (cp1252 on US-locale installs).  That means reading
any config/yaml/markdown/json file with non-ASCII content either
crashes with UnicodeDecodeError or silently mis-decodes bytes.

After: all 89 affected call sites in production code now pass
encoding='utf-8' explicitly.  Works identically on every platform
and every locale, no surprise behavior.

Mechanical sweep via:
  ruff check --preview --extend-select PLW1514 --unsafe-fixes --fix     --exclude 'tests,venv,.venv,node_modules,website,optional-skills,               skills,tinker-atropos,plugins' .

All 89 fixes have the same shape: open(x) or open(x, mode) became
open(x, encoding='utf-8') or open(x, mode, encoding='utf-8').  Nothing
else changed.  Every modified file still parses and the Windows/sandbox
test suite is still green (85 passed, 14 skipped, 0 failed across
tests/tools/test_code_execution_windows_env.py +
tests/tools/test_code_execution_modes.py + tests/tools/test_env_passthrough.py +
tests/test_hermes_bootstrap.py).

Scope notes:
  - tests/ excluded: test fixtures can use locale encoding intentionally
    (exercising edge cases).  If we want to tighten tests later that's
    a separate PR.
  - plugins/ excluded: plugin-specific conventions may differ; plugin
    authors own their code.
  - optional-skills/ and skills/ excluded: skill scripts are user-authored
    and we don't want to mass-edit them.
  - website/ and tinker-atropos/ excluded: vendored / generated content.

46 files touched, 89 +/- lines (symmetric replacement).  No behavior
change on POSIX or on Windows when the file is ASCII; bug fix on
Windows when the file contains non-ASCII.
2026-05-07 19:24:45 -07:00
Teknium 6098272454 hermes_bootstrap: Windows-only UTF-8 stdio shim for all entry points
Codebase-wide fix for Python-on-Windows UTF-8 footguns, complementing
the earlier execute_code sandbox fixes (which remain load-bearing for
when the sandbox explicitly scrubs child env).

Problem: Python on Windows has two long-standing text-encoding pitfalls:

  1. sys.stdout/stderr are bound to the console code page (cp1252 on
     US-locale installs) — print('café') crashes with UnicodeEncodeError.
  2. Subprocess children don't know to use UTF-8 unless PYTHONUTF8 and/or
     PYTHONIOENCODING are set in their env — so any Python we spawn
     (linters, sandbox children, delegation workers) hits the same bug.

Solution: A tiny bootstrap module (hermes_bootstrap.py) imported as the
first statement of every Hermes entry point:

  - hermes_cli/main.py   (hermes / hermes-agent console_script)
  - run_agent.py         (hermes-agent direct)
  - acp_adapter/entry.py (hermes-acp)
  - gateway/run.py       (messaging gateway)
  - batch_runner.py      (parallel batch mode)
  - cli.py               (legacy direct-launch CLI)

On Windows, the bootstrap:
  - os.environ.setdefault('PYTHONUTF8', '1')       (PEP 540 UTF-8 mode)
  - os.environ.setdefault('PYTHONIOENCODING', 'utf-8')
  - sys.stdout/stderr/stdin.reconfigure(encoding='utf-8', errors='replace')

Children inherit the env vars → they run in UTF-8 mode.
Current process's stdio is reconfigured → print('café') works now.

On POSIX (Linux/macOS), the bootstrap is a complete no-op.  We don't
touch LANG, LC_*, or anything else — users who have intentionally
configured a non-UTF-8 locale aren't affected.  POSIX systems are
already UTF-8 by default in 99% of modern setups, so there's nothing
to fix.

setdefault() (not overwrite) means users who explicitly set PYTHONUTF8=0
or PYTHONIOENCODING=cp1252 in their environment are respected.

What this does NOT fix: bare open(path, 'w') calls in the *parent*
process still default to locale encoding because PYTHONUTF8 is only
read at interpreter init.  A ruff PLW1514 sweep (separate follow-up)
will add explicit encoding='utf-8' at those ~219 call sites for
belt-and-suspenders.

Tests (17): 16 passed, 1 skipped on Windows.
  - Windows: env vars set, stdio reconfigured, child inherits UTF-8 mode
  - POSIX: complete no-op (verified on fake POSIX + skipped on real
    POSIX since we don't have a Linux box in this session)
  - Idempotence: multiple calls safe
  - Graceful degradation: non-reconfigurable streams don't crash
  - User opt-out: explicit PYTHONUTF8=0 is respected
  - Load order: every entry point's FIRST top-level import is
    hermes_bootstrap, enforced by an AST-level parametrized test

pyproject.toml: added hermes_bootstrap to py-modules so it ships with
pip installs.
2026-05-07 19:09:40 -07:00
Teknium bf43f6cfdd execute_code: set PYTHONIOENCODING=utf-8 + PYTHONUTF8=1 in child env
Third Windows-specific sandbox bug (after WinError 10106 and the UTF-8
file-write bug): user scripts that print non-ASCII to stdout crash with

    UnicodeEncodeError: 'charmap' codec can't encode character '\u2192'
                        in position N: character maps to <undefined>

Root cause: Python's sys.stdout on Windows is bound to the console code
page (cp1252 on US-locale installs) when the process is attached to a
pipe without PYTHONIOENCODING set.  LLM-generated scripts routinely
print em-dashes, arrows, accented chars, and emoji — all of which cp1252
can't encode.

Fix: spawn the sandbox child with:

    PYTHONIOENCODING=utf-8   # sys.stdin/stdout/stderr all UTF-8
    PYTHONUTF8=1             # PEP 540 UTF-8 mode — open() defaults to UTF-8 too

PYTHONUTF8 is the belt-and-suspenders half: LLM scripts that call
open(path, 'w') without encoding= in user code will now produce UTF-8
files by default, matching what the sandbox already does for its own
staging files.

The parent side already decodes child stdout/stderr as UTF-8 with
errors='replace' (lines 1345-1347) so the end-to-end chain is clean.

On POSIX these values usually match the locale default already, so
setting them is harmless belt-and-suspenders for C/POSIX-locale
containers and minimal base images.

Tests added (4) — total file now at 28 passed, 1 skipped on Windows:
  - test_popen_env_sets_pythonioencoding_utf8 (source grep)
  - test_popen_env_sets_pythonutf8_mode (source grep)
  - test_live_child_can_print_non_ascii (cross-platform live test)
  - test_windows_child_without_utf8_env_would_fail (Windows negative
    control — actually reproduces the bug without our env overrides,
    proving the fix is load-bearing on this system)
2026-05-07 18:59:35 -07:00
Teknium f5ec30dfe6 tests: skip POSIX-venv-layout tests on Windows
test_code_execution_modes.py had two test-level failures and two
class-level stale skip reasons on this Windows-native branch:

  - TestResolveChildPython::test_project_with_virtualenv_picks_venv_python
  - TestResolveChildPython::test_project_prefers_virtualenv_over_conda

Both fail on Windows with OSError: [WinError 1314] — they call
pathlib.Path.symlink_to() to build a fake venv, which requires
developer mode or admin on Windows.  They also assume POSIX venv
layout (bin/python) where Windows uses Scripts/python.exe.  Skip
them with a specific, accurate reason.

Also updated two class-level skipif reasons that said
'execute_code is POSIX-only' — no longer true on this branch.
New reason explains it's the test infrastructure (symlinks + POSIX
venv layout) that's the blocker, not execute_code itself.

Results on Windows Python 3.11:
  Before: 41 passed, 10 skipped, 2 failed
  After:  43 passed, 12 skipped, 0 failed
2026-05-07 18:56:33 -07:00
Teknium 8798bea31f execute_code: write sandbox files as UTF-8 on Windows
Second Windows-specific sandbox bug (WinError 10106 was the first):
after the env-scrub fix let the child start, it immediately failed to
import hermes_tools with:

    SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0x97
                 in position 154: invalid start byte

Root cause: _execute_local wrote the generated hermes_tools.py stub and
the user's script.py via open(path, 'w') without encoding=.  On Windows
the default text-mode encoding is cp1252 (system locale), which encodes
em-dashes (used in the stub's docstrings) as 0x97.  Python then decodes
source files as UTF-8 (PEP 3120) on import, chokes on 0x97, and the
sandbox dies before any tool call.

Fix: pass encoding='utf-8' to all four file opens in the code_execution
path — the two staging writes in _execute_local (hermes_tools.py +
script.py) and the two RPC file-transport reads/writes in the generated
remote stub.  JSON is ASCII-safe for most payloads but tool results
(terminal output, web_extract content) routinely carry non-ASCII.

Tests added (4):
  - test_stub_and_script_writes_specify_utf8 — source grep guard
  - test_file_rpc_stub_uses_utf8 — generated remote stub check
  - test_stub_source_roundtrips_through_utf8 — concrete round-trip
  - test_windows_default_encoding_would_have_failed — negative control
    (skips on modern Python builds where default is already UTF-8
    compatible, but retained for platforms where the regression could
    return)

24/25 tests pass on Windows 3.11 (negative control skips because this
Python build handles em-dashes via cp1252 subset — the fix is still
correct, just the corruption path isn't always triggerable).
2026-05-07 18:52:59 -07:00
Teknium 668e4b8d7e tests: lock in POSIX-equivalence guard for execute_code env scrubber
Adds TestPosixEquivalence to test_code_execution_windows_env.py.  The
class pins the invariant that _scrub_child_env(env, is_windows=False)
produces byte-for-byte identical output to the pre-refactor inline
scrubber, across a matrix of:

  - 2 synthetic envs (POSIX-shaped, Windows-shaped-on-POSIX)
  - 3 passthrough rules (none, single-var, everything)
  - 1 real-os.environ check on whatever platform runs the test

Plus a superset sanity check: is_windows=True must keep everything
is_windows=False keeps, and any extras must come from the
_WINDOWS_ESSENTIAL_ENV_VARS allowlist.

Rationale: the previous commit refactored the env-scrubbing inline
block into a helper.  Future changes to that helper must not silently
regress POSIX behavior — if someone needs to change it, they update
_legacy_posix_scrubber in lockstep so the churn is visible in review.

All 21 tests in the file pass locally on Windows (pytest 9.0.3).  8 of
them are parametrized equivalence checks that run on every OS.
2026-05-07 18:45:34 -07:00
Teknium fab984c7f8 execute_code: pass through Windows OS-essential env vars
The sandbox's env scrubbing was dropping SYSTEMROOT, WINDIR, COMSPEC,
APPDATA, etc. On Windows this broke the child process before any RPC
could happen:

    OSError: [WinError 10106] The requested service provider could not
    be loaded or initialized

Python's socket module uses SYSTEMROOT to locate mswsock.dll during
Winsock initialization. Without it, socket.socket(AF_INET, SOCK_STREAM)
fails — and the existing loopback-TCP fallback for Windows couldn't work.

Fix: add a small Windows-only allowlist (_WINDOWS_ESSENTIAL_ENV_VARS)
matched by exact uppercase name, after the existing secret-substring
block. The secret block still runs first, so the allowlist cannot be
used to exfiltrate credentials. Also extract the env scrubber into a
testable helper (_scrub_child_env) that takes is_windows as a parameter,
so the logic can be unit-tested on any OS.

Live Winsock smoke test verifies that a child spawned with the scrubbed
env can now create an AF_INET socket on a real Windows host; the test
is guarded by sys.platform == 'win32' so POSIX CI stays green.
2026-05-07 18:39:38 -07:00
Teknium f0d2516a30 fix(windows): prefer npm.cmd over npm.ps1, skip .py argv0 in relaunch
Two fixes from teknium1's next install run:

1. **npm install: "npm.ps1 cannot be loaded because running scripts is
   disabled on this system."**  Get-Command's default PATHEXT ordering
   picked up ``npm.ps1`` (the PowerShell shim) ahead of ``npm.cmd`` (the
   batch shim).  Most Windows users have PowerShell's execution policy
   set to Restricted or RemoteSigned, which blocks unsigned ``.ps1``
   files.  ``npm.cmd`` has no such restriction and works universally.

   Install-NodeDeps now detects when Get-Command returned npm.ps1, looks
   for a sibling npm.cmd in the same directory, and prefers it.  Prints
   an info line so the user sees why.  Emits a warning + hint if only
   npm.ps1 is available.

2. **"Launch hermes chat now? Y" crashes with "%1 is not a valid Win32
   application" on Windows installs.**  The setup wizard calls
   ``relaunch(["chat"])``; ``resolve_hermes_bin()`` returned
   ``sys.argv[0]`` which was ``...\\hermes_cli\\main.py`` (because hermes
   was launched via ``python -m hermes_cli.main`` during setup).

   On Windows, ``os.access(script.py, os.X_OK)`` returns True because
   PATHEXT lists ``.py`` when the Python launcher is registered — but
   ``subprocess.run([script.py, ...])`` can't actually execute a ``.py``
   directly.  CreateProcessW needs a real PE file.

   Fixed ``resolve_hermes_bin`` to reject ``.py``/``.pyc`` argv0 values
   on Windows specifically.  Falls through to ``shutil.which("hermes")``
   (hermes.exe in the venv Scripts dir) or, as a final fallback, lets
   build_relaunch_argv build ``[sys.executable, "-m", "hermes_cli.main"]``
   which is bulletproof.  POSIX behaviour unchanged — ``.py`` argv0 with
   a shebang + chmod+x is still a valid exec target there.

3 new tests cover the Windows paths: .py argv0 + hermes.exe on PATH →
returns hermes.exe; .py argv0 + no PATH → returns None (caller uses
python -m); POSIX + executable .py → still accepted.

26 relaunch tests pass, no POSIX regressions.
2026-05-07 18:29:17 -07:00
Teknium 2e403bd0a4 fix(windows): enable execute_code — stale AF_UNIX gate was blocking the tool
teknium1 noticed execute_code was missing from his enabled tools on Windows.
Root cause: tools/code_execution_tool.py set ``SANDBOX_AVAILABLE =
sys.platform != \"win32\"`` as a module-level constant, originally because
the RPC transport required AF_UNIX.  We added loopback TCP fallback for
the sandbox in commit eeb723fff (and covered it in the Windows TCP tests),
but forgot to lift the availability gate.  So execute_code was still
invisible via the check_fn path on Windows.

- SANDBOX_AVAILABLE is now True unconditionally (it's still checked — a
  future platform could flip it off via monkeypatch/env if needed).
- Error message when disabled no longer mentions Windows specifically,
  just says 'sandbox is unavailable in this environment'.
- test_windows_returns_error updated: patches SANDBOX_AVAILABLE=False
  directly (which was always its real intent) and asserts on 'unavailable'
  instead of 'Windows'.

Tests: 171 code-execution + windows-compat tests pass, no regressions.
2026-05-07 18:17:31 -07:00
Teknium 2c7b479d16 fix(windows): %1 install error, patch CRLF false-negative, SOUL.md BOM
Three bugs from teknium1's successful install + diagnostic chat on Windows:

1. **Start-Process -FilePath npm.cmd fails with "%1 is not a valid Win32
   application".**  Start-Process bypasses cmd.exe and PATHEXT to call
   CreateProcessW directly, which refuses .cmd batch shims.  Switched
   Install-NodeDeps to use PowerShell's invocation operator (``& $npmExe
   install --silent *> $log``) which DOES honour PATHEXT.  Extracted a
   ``_Run-NpmInstall`` helper so the browser + TUI paths share the same
   logic.  Captures $LASTEXITCODE correctly, still surfaces the real
   stderr on failure with a log-file pointer for the full output.

2. **patch tool returns false-negative on Windows due to CRLF round-trip.**
   Root cause was upstream of patch: ``subprocess.Popen(..., text=True,
   stdin=PIPE)`` on Windows translates ``\\n`` → ``\\r\\n`` when data flows
   through the stdin pipe.  ``_pipe_stdin()`` was writing the patch's
   new_content string through a text-mode pipe, bash then wrote those
   CRLF bytes to disk, and patch's post-write verify compared the
   on-disk CRLF bytes against the original LF-only string — fail.

   Fixed in two places for defense in depth:
   - ``_pipe_stdin()`` now writes through ``proc.stdin.buffer`` with
     explicit UTF-8 encoding, bypassing Python's newline translation on
     every platform.  No behaviour change on POSIX (bytes are identical)
     but stops the CRLF injection on Windows.
   - ``patch_replace``'s post-write verify normalizes CRLF→LF on both
     sides before comparing, so even if some future backend still
     translates newlines the patch tool won't report a bogus failure.

3. **SOUL.md gets a UTF-8 BOM on Windows PowerShell 5.1.**  ``Set-Content
   -Encoding UTF8`` on PS5.1 writes UTF-8 WITH a byte-order-mark (changed
   in PS7 via ``utf8NoBOM``).  Hermes's prompt-injection scanner sees
   the BOM (U+FEFF invisible char) and refuses to load the file, so
   SOUL.md's persona instructions never get applied.

   Fixed by writing the file via ``[System.IO.File]::WriteAllText``
   with an explicit ``UTF8Encoding($false)`` — BOM-free on every
   PowerShell version.

All POSIX behaviour verified unchanged: 198 tests pass across
test_file_operations, test_local_env_cwd_recovery, test_code_execution,
test_windows_native_support, test_windows_compat.
2026-05-07 18:11:43 -07:00
Teknium 225b57f314 fix(install.ps1): step out of $InstallDir before touching it + harden repo probe
User hit 'fatal: not in a git directory' on re-install because:

1. They ran Remove-Item -Force $env:LOCALAPPDATA\hermes -ErrorAction
   SilentlyContinue WHILE cd'd inside the install dir.  Windows
   silently refuses to delete a directory any shell is currently cd'd
   inside and leaves the skeleton intact, but the -ErrorAction
   SilentlyContinue swallowed every partial-delete failure so they
   thought the wipe succeeded.

2. The installer then walked into Install-Repository, saw $InstallDir
   still exists with a partial .git stub, my repo-validity probe
   returned success (the probe's git rev-parse may have exit-code-zeroed
   in a way I didn't expect), and the real git fetch died with three
   'fatal: not a git repository' errors.

Two fixes belt-and-braces:

- Main() now cds to $env:USERPROFILE at start if the current shell
  is inside $InstallDir.  Harmless when the user ran from elsewhere;
  critical when they didn't.  This alone fixes the user's case.

- Install-Repository's 'is this a valid repo' probe now runs BOTH
  git rev-parse --is-inside-work-tree AND git status, resets
  $LASTEXITCODE before each to avoid picking up a stale 0, and
  requires BOTH to succeed.  Also requires rev-parse's output to
  match 'true' (not just exit 0) to rule out exit-0-with-empty-output
  edge cases.
2026-05-07 18:05:35 -07:00
Teknium 4d7e72e14d fix(install.ps1): validate existing repo via git itself + clean up broken stubs
teknium1 hit "fatal: not in a git directory" on re-install when the previous
install left a $InstallDir\.git stub that Test-Path matched but git didn't
recognize (three "fatal: not a git repository" lines, then the script
exited before touching anything).

Two bugs:

1. Test-Path "$InstallDir\.git" was a weak gate — it matches .git
   whether it's a directory, file, symlink, submodule gitfile, OR a
   broken stub from a failed previous Remove-Item.  Replaced with a
   real repo probe: Push-Location + git rev-parse --is-inside-work-tree
   + $LASTEXITCODE check.  If git itself can't see a repo, we treat
   the directory as not-a-repo and fall through to fresh clone.

2. The original update path ignored $LASTEXITCODE.  fetch/checkout/pull
   all emitted fatals but the script kept going.  Now each command
   checks $LASTEXITCODE and throws with an explicit message.

Also: when the directory exists but isn't a valid repo, the new code
wipes it (Remove-Item -ErrorAction Stop) and falls through to fresh
clone, instead of dying with the old "Directory exists but is not a git
repository" error.  If the wipe itself fails (file locked, hermes still
running), we throw with a user-readable "close any programs using files
in <dir>" hint.

Refactored the function to use a $didUpdate flag instead of my earlier
draft's early `return` — that was skipping the submodule init block at
the bottom of the function.  Both the update and fresh-clone paths now
fall through to the submodule init step, which is correct (git pull
doesn't auto-update submodules).

PowerShell structural check: 21 functions defined, braces balanced.
2026-05-07 18:00:59 -07:00
Teknium 787d964ea1 fix(windows): quote cache paths in bash + augment PATH so rg/bash resolve on first launch
Three interrelated bugs from teknium1's first interactive chat on Windows:

1. **Snapshot/cwd file paths unquoted in bash command strings.**  The session
   bootstrap and per-command wrapper interpolated
   ``self._snapshot_path`` / ``self._cwd_file`` unquoted into bash commands
   like ``export -p > C:/Users/ryanc/.../hermes-snap-xxx.sh``.  Git Bash's
   MSYS2 layer handles ``C:/...`` paths correctly ONLY when quoted; unquoted,
   the colon and forward-slash get glob-parsed and the redirect targets a
   bogus path.  Symptom: every terminal command emitted two
   ``C:/Users/.../hermes-snap-*.sh (No such file or directory)`` lines that
   bled into stdout (``stderr=STDOUT`` on the local backend) and corrupted
   file contents when the agent wrote to scratch paths via the terminal
   tool.  Fix: ``shlex.quote()`` every interpolation of ``_snapshot_path``
   and ``_cwd_file`` in base.py — no-op on POSIX (the paths contain no
   shell-metachars), critical on Windows.

2. **Stale PATH on first hermes launch after install.**  ``install.ps1``
   adds the PortableGit ``cmd`` / ``bin`` / ``usr\bin`` directories to the
   Windows **User** PATH via ``SetEnvironmentVariable(..., "User")``.  That
   write propagates to newly *spawned* processes only — already-running
   shells (including the one the user types ``hermes`` into immediately
   after install) retain their old PATH.  So hermes starts with a PATH that
   doesn't include bash, rg, grep, ssh — and ``search_files`` reports
   "rg/find not available" when the user clearly just installed them.

   Fix: new ``_augment_path_with_known_tools()`` helper called from
   ``configure_windows_stdio()`` on startup.  Prepends the Hermes-managed
   Git directories + the WinGet Links directory (where ripgrep lands) to
   ``os.environ['PATH']`` if they exist on disk but aren't already in
   PATH.  Subsequent subprocess calls (including bash spawns via
   ``_find_bash()``) inherit the augmented PATH and find everything.
   No-op on POSIX and when the directories don't exist.

3. **Root cause of "file content corruption".**  #1 was the proximate cause.
   Errors like ``C:/Users/.../hermes-snap-xxx.sh: No such file or directory``
   were emitted on stderr by the failed redirect, captured into stdout via
   ``stderr=subprocess.STDOUT``, and if the agent used terminal commands
   like ``cat > file`` the leaked error bytes became part of the file.
   Fixing #1 eliminates this entirely.

## Tests

All 77 Windows-compat tests still pass on Linux (POSIX path is
shlex.quote('/tmp/foo.sh') → '/tmp/foo.sh' — unchanged).

## Not addressed here (would need a bigger design)

- Python file tools (``write_file``, ``read_file``) and the bash-backed
  terminal tool see DIFFERENT views of ``/tmp`` on Windows.  Python treats
  ``/tmp`` as ``C:\tmp`` (drive-relative), Git Bash's MSYS2 treats it as
  a virtual mount to the PortableGit install's ``tmp\``.  Would need a
  translation shim in the Python tools to resolve bash-virtual paths to
  their native-Windows equivalents.  Workaround for users today: use
  absolute native paths (``C:\Users\you\...``) instead of ``/tmp/...``
  when crossing between terminal and Python file tools.
2026-05-07 17:51:57 -07:00
Teknium cf9b2df57a fix(windows): use PortableGit (not MinGit), fix relaunch os.execvp crash, surface npm errors
Three real bugs from teknium1's first Windows install run:

1. **MinGit has no bash.exe.**  MinGit is the minimal-automation Git for Windows
   distribution — it ships git.exe but deliberately strips bash and the POSIX
   coreutils.  Installer logged "Could not locate bash.exe" and Hermes would
   fail to run any shell command.  Switched to PortableGit — the full Git for
   Windows minus the installer UI.  PortableGit ships bash.exe at
   <root>\bin\bash.exe plus sh, awk, sed, grep, curl, ssh in usr\bin\.  ARM64
   variant is detected separately (PortableGit-*-arm64.7z.exe).  32-bit falls
   back to MinGit-32-bit with a warning (PortableGit is 64-bit only).

   PortableGit ships as a 7z self-extractor (56MB vs MinGit's 38MB).  We
   invoke it with `-o<target> -y` to extract silently — no 7z install needed,
   it's self-contained.

   Updated tools/environments/local.py::_find_bash candidate order to prefer
   the PortableGit layout (<root>\bin\bash.exe) with the MinGit layout
   (<root>\usr\bin\bash.exe) as a fallback so existing installs keep working.

2. **os.execvp "Exec format error" on Windows.**  Setup wizard's "Launch
   hermes chat now? Y" called `os.execvp(["hermes", "chat"])` which on
   Windows can only swap to real Win32 .exe files — chokes with OSError(8)
   on .cmd batch shims and Python console-script wrappers.  Added a
   win32 branch in hermes_cli/relaunch.py::relaunch() that uses
   subprocess.run + sys.exit — functionally identical (user sees "hermes
   exited, then new hermes started") with one extra PID in play.  POSIX
   path is UNCHANGED — still uses os.execvp for in-place replacement.
   Catches OSError in the Windows branch and surfaces a "open a new
   terminal so PATH picks up, then re-run hermes" hint instead of a
   cryptic traceback.

3. **npm install failures silent on Windows.**  The install.ps1 was invoking
   `npm install --silent 2>&1 | Out-Null` inside a try/catch.  PowerShell's
   try/catch does NOT trigger on non-zero process exit codes — only on
   unhandled .NET exceptions — so npm failing printed a generic "npm
   install failed" with zero information about WHY.  The silent pipe ate
   the stderr.

   Rewrote Install-NodeDeps to:
   - Resolve npm.cmd via Get-Command (respects PATHEXT) instead of
     relying on bare `npm` name resolution.
   - Use Start-Process with -PassThru to capture the actual exit code.
   - Redirect stderr to a temp log and surface the first ~800 chars of
     the real npm error when install fails, plus the log path for the
     full text.
   - Fail loudly with the right exit code instead of a misleading success.
   - Bail cleanly with a helpful message when npm isn't on PATH at all.

4. **"True" printing to console after Node check.**  `Test-Node` returns $true;
   installer called it as a bare statement (no assignment, no cast).  PowerShell
   prints bare return values.  Wrapped the call in `[void](Test-Node)`.

## Tests

- Added 3 new tests in tests/hermes_cli/test_relaunch.py covering the
  Windows branch: subprocess is called (not execvp), child exit code
  propagates, OSError surfaces a helpful message.  All 23 tests pass
  (20 existing + 3 new).
- 77 Windows-compat tests still pass, POSIX behaviour unchanged.
2026-05-07 17:42:47 -07:00
Teknium eeb723fff2 feat(windows): close remaining POSIX-only landmines — TUI crash, kanban waitpid, AF_UNIX sandbox, /bin/bash, npm .cmd shims, cwd tracking, detach flags
Second pass on native Windows support, driven by a systematic audit across
five areas: POSIX-only primitives (signal.SIGKILL/SIGHUP/SIGPIPE, os.WNOHANG,
os.setsid), path translation bugs (/c/Users → C:\Users), subprocess patterns
(npm.cmd batch shims, start_new_session no-op on Windows), subsystem health
(cron, gateway daemon, update flow), and module-level import guards.

Every change is platform-gated — POSIX (Linux/macOS) behaviour is preserved
bit-identical. Explicit "do no harm" test: test_posix_path_preserved_on_linux,
test_posix_noop, test_windows_detach_popen_kwargs_is_posix_equivalent_on_posix.

## New module

- hermes_cli/_subprocess_compat.py — shared helpers (resolve_node_command,
  windows_detach_flags, windows_hide_flags, windows_detach_popen_kwargs).
  All no-ops on non-Windows.

## CRITICAL fixes (would crash or silently break on Windows)

- tui_gateway/entry.py: SIGPIPE/SIGHUP referenced at module top level would
  AttributeError on import on Windows, breaking `hermes --tui` entirely (it
  spawns this module as a subprocess).  Guard each signal.signal() call with
  hasattr() and add SIGBREAK as Windows' SIGHUP equivalent.

- hermes_cli/kanban_db.py: os.waitpid(-1, os.WNOHANG) in dispatcher tick was
  unguarded.  os.WNOHANG doesn't exist on Windows.  Gate the whole reap loop
  behind `os.name != "nt"` — Windows has no zombies anyway.

- tools/code_execution_tool.py: AF_UNIX socket for execute_code RPC fails on
  most Windows builds.  Fall back to loopback TCP (AF_INET on 127.0.0.1:0
  ephemeral port) when _IS_WINDOWS.  HERMES_RPC_SOCKET env var now accepts
  either a filesystem path (POSIX) or `tcp://127.0.0.1:<port>` (Windows).
  Generated sandbox client parses both.

- cron/scheduler.py: `argv = ["/bin/bash", str(path)]` hardcoded.  Use
  shutil.which("bash") so Windows (Git Bash via MinGit) works, with a
  readable error when bash is genuinely absent.

- 6 bare npm/npx spawn sites: tools_config.py x2, doctor.py, whatsapp.py
  (npm install + node version probe), browser_tool.py x2.  On Windows npm
  is npm.cmd / npx is npx.cmd (batch shims); subprocess.Popen(["npm", ...])
  fails with WinError 193.  shutil.which(...) returns the absolute .cmd
  path which CreateProcessW accepts because the extension routes through
  cmd.exe /c.  POSIX behaviour unchanged (shutil.which still returns the
  same path subprocess would resolve itself).

## HIGH fixes (silent misbehaviour on Windows)

- tools/environments/local.py get_temp_dir: hardcoded /tmp returned on
  Windows meant `_cwd_file = "/tmp/hermes-cwd-*.txt"`, which bash wrote
  via MSYS2's virtual /tmp but native Python couldn't open.  Result: cwd
  tracking silently broken — `cd` in terminal tool did nothing.  Windows
  branch now returns `%HERMES_HOME%/cache/terminal` with forward slashes
  (works in both bash and Python, guaranteed no spaces).

- tools/environments/local.py _make_run_env PATH injection: `/usr/bin not
  in split(":")` heuristic mangles Windows PATH (";" separator).  Gate
  the injection behind `not _IS_WINDOWS`.

- hermes_cli/gateway.py launch_detached_profile_gateway_restart: outer
  Popen + watcher-script Popen both used start_new_session=True, which
  Windows silently ignores.  Watcher stayed attached to CLI's console,
  died when user closed terminal after `hermes update`, left gateway
  stale.  Now branches through windows_detach_popen_kwargs() helper
  (CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW on
  Windows, start_new_session=True on POSIX — identical to main).

## MEDIUM fixes

- gateway/run.py /restart and /update handlers: hardcoded bash/setsid
  chain crashes on Windows when user triggers /update in-gateway.  Now
  has sys.platform=="win32" branch using sys.executable + a tiny
  Python watcher with proper detach flags.  POSIX path is unchanged.

- cli.py _git_repo_root: Git on Windows sometimes returns /c/Users/...
  style paths that break subprocess.Popen(cwd=...) and Path().resolve().
  Added _normalize_git_bash_path() helper that translates /c/Users,
  /cygdrive/c, /mnt/c variants to native C:\Users form.  POSIX no-op.
  _git_repo_root() now routes every result through it.

- cli.py worktree .worktreeinclude: os.symlink on directories failed
  hard on Windows (requires admin or Developer Mode).  Falls back to
  shutil.copytree with a warning log.

## Tests

- 29 new tests in tests/tools/test_windows_native_support.py covering:
  subprocess_compat helpers, TUI entry signal guards, kanban waitpid
  guard, code_execution TCP fallback source-level invariants, cron bash
  resolution, npm/npx bare-spawn lint per-file, local env Windows temp
  dir, PATH injection gating, git bash path normalization, symlink
  fallback, gateway detached watcher flags.

- One existing test assertion adjusted in test_browser_homebrew_paths:
  it compared captured Popen argv to the BARE `"npx"` literal; after the
  shutil.which() change argv[0] is the absolute path.  New assertion
  checks the shape (two items, second is `agent-browser`) rather than
  the exact first-item string.  Behaviour unchanged; test was too strict.

All 56 tests pass on Linux (30 from previous commits + 26 new).
267 tests from the affected files/dirs (browser, code_exec, local_env,
process_registry, kanban_db, windows_compat) all pass — zero regressions.
tests/hermes_cli/ (3909 pass) and tests/gateway/ (5021 pass) unchanged;
all pre-existing test failures confirmed unrelated via `git stash` re-run.

## What's still deferred (LOW priority)

- Visible cmd-window flashes on short-lived console apps (~14 sites) —
  cosmetic, needs a follow-up pass once we have user reports.
- agent/file_safety.py POSIX-only security deny patterns — separate
  hardening task.
- tools/process_registry.py returning "/tmp" as fallback — theoretical;
  reachable only when all env-var candidates fail.
2026-05-07 17:29:31 -07:00
Teknium 1da89528e7 fix(windows-editor): default EDITOR=notepad so /edit and Ctrl+X Ctrl+E work
Pre-existing Windows bug surfaced while reviewing the portable-MinGit
install: prompt_toolkit's Buffer.open_in_editor() falls back to POSIX
absolute paths (/usr/bin/nano, /usr/bin/vi, /usr/bin/emacs) that don't
exist on native Windows.  When neither $EDITOR nor $VISUAL is set,
Ctrl+X Ctrl+E ("open prompt in editor") and /edit both silently do
nothing on Windows — the user hits the key, nothing happens, no error.

This wasn't caused by MinGit (full Git for Windows doesn't fix it either,
because the Windows Python subprocess call resolves `/usr/bin/nano` as
`C:\usr\bin\nano`, which doesn't exist even with nano installed).

Fixes:
- hermes_cli/stdio.py::configure_windows_stdio now sets EDITOR=notepad
  on Windows if neither EDITOR nor VISUAL is set.  notepad.exe is in
  every Windows install, works as a blocking editor (subprocess.call
  waits for the window to close), and writes back to the file.
- hermes_cli/config.py (hermes config edit): reorder fallback list so
  Windows tries notepad first — previously nano led the list, which
  required Git Bash / WSL to be in PATH.
- Users who want VSCode / Neovim / Notepad++ can still override via
  $env:EDITOR — that's checked before our default kicks in.  Docstring
  spells out the common overrides.

The Ink TUI (`hermes --tui`) already handled Windows correctly via
ui-tui/src/lib/editor.ts falling back to notepad.exe on win32 — this
commit brings the classic prompt_toolkit CLI into parity.

3 new tests in test_windows_native_support.py verify:
- EDITOR=notepad gets set when unset on Windows
- Explicit $EDITOR is respected
- $VISUAL is respected (not overwritten by our default)
2026-05-07 16:46:37 -07:00
Teknium 5486ad2f2a feat(windows-install): bundle portable MinGit instead of relying on winget
User hit a real failure case: their system Git was in a half-installed state
(can neither uninstall nor reinstall) and winget refused to work around it.
We were one step away from shipping an installer that would have left users
with exactly the problem he already had.

What other agents do (reality check):
- Claude Code: requires pre-installed Git; breaks if user doesn't have it.
- OpenCode, Codex: don't need bash at all — PowerShell-first design.
- Cline: uses whatever shell VSCode is configured with; installs nothing.

None of them solve the "broken system Git" problem.  We need to own our Git.

Changes:
- scripts/install.ps1::Install-Git: dropped winget path entirely.  Now:
  (1) use existing git if present; (2) download portable MinGit from the
  official git-for-windows GitHub release to %LOCALAPPDATA%\hermes\git.
  No winget, no admin, no Windows installer registry, no system impact.
- Added %LOCALAPPDATA%\hermes\git\{cmd,usr\bin} to User PATH so git + bash
  + POSIX coreutils (which, env, grep, …) resolve in fresh shells.
- tools/environments/local.py::_find_bash: reorder so Hermes' portable
  MinGit install is checked BEFORE falling through to shutil.which("bash")
  or system install locations.  This way a broken system Git can't
  hijack the bash lookup.
- README + installation docs reworded to reflect the new story: "portable
  Git Bash, isolated from any system install, recoverable via rm -rf if it
  ever breaks."

Recoverability: if Hermes' Git install ever breaks, ``Remove-Item %LOCALAPPDATA%\hermes\git``
and re-run the installer — no system impact, no uninstall drama, no winget
to fight with.
2026-05-07 16:38:11 -07:00
Teknium fda234a210 feat(windows): close native-Windows install gaps — crash-free startup, UTF-8 stdio, tzdata dep, docs
Native Windows (with Git for Windows installed) can now run the Hermes CLI
and gateway end-to-end without crashing.  install.ps1 already existed and
the Git Bash terminal backend was already wired up — this PR fills the
remaining gaps discovered by auditing every Windows-unsafe primitive
(`signal.SIGKILL`, `os.kill(pid, 0)` probes, bare `fcntl`/`termios`
imports) and by comparing hermes against how Claude Code, OpenCode, Codex,
and Cline handle native Windows.

## What changed

### UTF-8 stdio (new module)
- `hermes_cli/stdio.py` — single `configure_windows_stdio()` entry point.
  Flips the console code page to CP_UTF8 (65001), reconfigures
  `sys.stdout`/`stderr`/`stdin` to UTF-8, sets `PYTHONIOENCODING` + `PYTHONUTF8`
  for subprocesses.  No-op on non-Windows.  Opt out via `HERMES_DISABLE_WINDOWS_UTF8=1`.
- Called early in `cli.py::main`, `hermes_cli/main.py::main`, and
  `gateway/run.py::main` so Unicode banners (box-drawing, geometric
  symbols, non-Latin chat text) don't `UnicodeEncodeError` on cp1252
  consoles.

### Crash sites fixed
- `hermes_cli/main.py:7970` (hermes update → stuck gateway sweep): raw
  `os.kill(pid, _signal.SIGKILL)` → `gateway.status.terminate_pid(pid, force=True)`
  which routes through `taskkill /T /F` on Windows.
- `hermes_cli/profiles.py::_stop_gateway_process`: same fix — also
  converted SIGTERM path to `terminate_pid()` and widened OSError catch
  on the intermediate `os.kill(pid, 0)` probe.
- `hermes_cli/kanban_db.py:2914, 3041`: raw `signal.SIGKILL` →
  `getattr(signal, "SIGKILL", signal.SIGTERM)` fallback (matches the
  pattern already used in `gateway/status.py`).

### OSError widening on `os.kill(pid, 0)` probes
Windows raises `OSError` (WinError 87) for a gone PID instead of
`ProcessLookupError`.  Widened the catch at:
- `gateway/run.py:15101` (`--replace` wait-for-exit loop — without this,
  the loop busy-spins the full 10s every Windows gateway start)
- `hermes_cli/gateway.py:228, 460, 940`
- `hermes_cli/profiles.py:777`
- `tools/process_registry.py::_is_host_pid_alive`
- `tools/browser_tool.py:1170, 1206`

### Dashboard PTY graceful degradation
`hermes_cli/pty_bridge.py` depends on `fcntl`/`termios`/`ptyprocess`,
none of which exist on native Windows.  Previously a Windows dashboard
would crash on `import hermes_cli.web_server` because of a top-level
import.  Now:
- `hermes_cli/web_server.py` wraps the pty_bridge import in
  `try/except ImportError` and sets `_PTY_BRIDGE_AVAILABLE=False`.
- The `/api/pty` WebSocket handler returns a friendly "use WSL2 for
  this tab" message instead of exploding.
- Every other dashboard feature (sessions, jobs, metrics, config
  editor) runs natively on Windows.

### Dependency
- `pyproject.toml`: add `tzdata>=2023.3; sys_platform == 'win32'` so
  Python's `zoneinfo` works on Windows (which has no IANA tzdata
  shipped with the OS).  Credits @sprmn24 (PR #13182).

### Docs
- README.md: removed "Native Windows is not supported"; added
  PowerShell one-liner and Git-for-Windows prerequisite note.
- `website/docs/getting-started/installation.md`: new Windows section
  with capability matrix (everything native except the dashboard
  `/chat` PTY tab, which is WSL2-only).
- `website/docs/user-guide/windows-wsl-quickstart.md`: reframed as
  "WSL2 as an alternative to native" rather than "the only way".
- `website/docs/developer-guide/contributing.md`: updated
  cross-platform guidance with the `signal.SIGKILL` / `OSError`
  rules we enforce now.
- `website/docs/user-guide/features/web-dashboard.md`: acknowledged
  native Windows works for everything except the embedded PTY pane.

## Why this shape

Pulled from a survey of how other agent codebases handle native
Windows (Claude Code, OpenCode, Codex, Cline):

- All four treat Git Bash as the canonical shell on Windows, same as
  hermes already does in `tools/environments/local.py::_find_bash()`.
- None of them force `SetConsoleOutputCP` — but they don't have to,
  Node/Rust write UTF-16 to the Win32 console API.  Python does not get
  that for free, so we flip CP_UTF8 via ctypes.
- None of them ship PowerShell-as-primary-shell (Claude Code exposes
  PS as a secondary tool; scope creep for this PR).
- All of them use `taskkill /T /F` for force-kill on Windows, which
  is exactly what `gateway.status.terminate_pid(force=True)` does.

## Non-goals (deliberate scope limits)

- No PowerShell-as-a-second-shell tool — worth designing separately.
- No terminal routing rewrite (#12317, #15461, #19800 cluster) — that's
  the hardest design call and needs a separate doc.
- No wholesale `open()` → `open(..., encoding="utf-8")` sweep (Tianworld
  cluster) — will do as follow-up if users hit actual breakage; most
  modern code already specifies it.

## Validation

- 28 new tests in `tests/tools/test_windows_native_support.py` — all
  platform-mocked, pass on Linux CI.  Cover:
  - `configure_windows_stdio` idempotency, opt-out, env-preservation
  - `terminate_pid` taskkill routing, failure → OSError, FileNotFoundError fallback
  - `getattr(signal, "SIGKILL", …)` fallback shape
  - `_is_host_pid_alive` OSError widening (Windows-gone-PID behavior)
  - Source-level checks that all entry points call `configure_windows_stdio`
  - pty_bridge import-guard present in `web_server.py`
  - README no longer says "not supported"
- 12 pre-existing tests in `tests/tools/test_windows_compat.py` still pass.
- `tests/hermes_cli/` ran fully (3909 passed, 9 failures — all confirmed
  pre-existing on main by stash-test).
- `tests/gateway/` ran fully (5021 passed, 1 pre-existing failure).
- `tests/tools/test_process_registry.py` + `test_browser_*` pass.
- Manual smoke: `import hermes_cli.stdio; import gateway.run;
  import hermes_cli.web_server` — all clean, `_PTY_BRIDGE_AVAILABLE=True`
  on Linux (as expected).

## Files

- New: `hermes_cli/stdio.py`, `tests/tools/test_windows_native_support.py`
- Modified: `cli.py`, `gateway/run.py`, `hermes_cli/main.py`,
  `hermes_cli/profiles.py`, `hermes_cli/gateway.py`,
  `hermes_cli/kanban_db.py`, `hermes_cli/pty_bridge.py`,
  `hermes_cli/web_server.py`, `tools/browser_tool.py`,
  `tools/process_registry.py`, `pyproject.toml`, `README.md`, and 4
  docs pages.

Credits to everyone whose prior PR work informed these fixes — see
the co-author trailers.  All of the PRs listed in
`~/.hermes/plans/windows-support-prs.md` fixing `os.kill` / `signal.SIGKILL`
/ UTF-8 stdio / tzdata / README patterns found the same issues; this PR
consolidates them.

Co-authored-by: Philip D'Souza <9472774+PhilipAD@users.noreply.github.com>
Co-authored-by: Arecanon <42595053+ArecaNon@users.noreply.github.com>
Co-authored-by: XiaoXiao0221 <263113677+XiaoXiao0221@users.noreply.github.com>
Co-authored-by: Lars Hagen <1360677+lars-hagen@users.noreply.github.com>
Co-authored-by: Luan Dias <65574834+luandiasrj@users.noreply.github.com>
Co-authored-by: Ruzzgar <ruzzgarcn@gmail.com>
Co-authored-by: sprmn24 <oncuevtv@gmail.com>
Co-authored-by: adybag14-cyber <252811164+adybag14-cyber@users.noreply.github.com>
Co-authored-by: Prasanna28Devadiga <54196612+Prasanna28Devadiga@users.noreply.github.com>
2026-05-07 16:31:40 -07:00
255 changed files with 7740 additions and 625 deletions
+54 -4
View File
@@ -1,9 +1,12 @@
name: Lint (ruff + ty)
# Surface ruff and ty diagnostics as a diff vs the target branch.
# This check is advisory only ATM it always exits zero and never blocks merge.
# It posts a Markdown summary to the workflow run and, for pull requests,
# comments the same summary on the PR.
# Two things here:
# 1. Advisory diff — ruff + ty diagnostics as a diff vs the target branch.
# Posts a Markdown summary and a PR comment. Exit zero always.
# 2. Blocking ``ruff check .`` — enforces the explicit rules in
# ``[tool.ruff.lint.select]`` (currently PLW1514). Failure blocks merge.
# Separate job so the advisory diff still runs and posts even when
# enforcement fails.
on:
push:
@@ -149,3 +152,50 @@ jobs:
body: fullBody,
});
}
ruff-blocking:
# Enforce the rules in pyproject.toml [tool.ruff.lint.select]. Currently
# PLW1514 (unspecified-encoding) — catches bare ``open()`` /
# ``read_text()`` / ``write_text()`` calls that default to locale
# encoding on Windows. Failure here blocks merge; the advisory
# ``lint-diff`` job above runs independently so reviewers still get
# the diff comment even when enforcement fails.
name: ruff enforcement (blocking)
runs-on: ubuntu-latest
timeout-minutes: 5
steps:
- name: Checkout code
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
- name: Install ruff
run: uv tool install ruff
- name: ruff check .
# No --exit-zero, no || true. Exit code propagates to the job,
# which propagates to the required-check gate.
run: |
ruff check .
windows-footguns:
# Static guardrails on Windows-unsafe Python primitives — os.kill(pid, 0),
# os.killpg, os.setsid, signal.SIGKILL without getattr fallback,
# shebang scripts via subprocess, bare open() without encoding=, etc.
# See scripts/check-windows-footguns.py for the full rule list.
name: Windows footguns (blocking)
runs-on: ubuntu-latest
timeout-minutes: 5
steps:
- name: Checkout code
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- name: Set up Python
uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5
with:
python-version: "3.11"
- name: Run footgun checker
run: python scripts/check-windows-footguns.py --all
+155 -7
View File
@@ -522,11 +522,57 @@ See `hermes_cli/skin_engine.py` for the full schema and existing skins as exampl
## Cross-Platform Compatibility
Hermes runs on Linux, macOS, and WSL2 on Windows. When writing code that touches the OS:
Hermes runs on Linux, macOS, and native Windows (plus WSL2). When writing code
that touches the OS, assume *any* platform can hit your code path.
> **Before you PR:** run `scripts/check-windows-footguns.py` to catch the
> common Windows-unsafe patterns in your diff. It's grep-based and cheap;
> CI runs it on every PR too.
### Critical rules
1. **`termios` and `fcntl` are Unix-only.** Always catch both `ImportError` and `NotImplementedError`:
1. **Never call `os.kill(pid, 0)` for liveness checks.** `os.kill(pid, 0)`
is a standard POSIX idiom to check "is this PID alive" — the signal 0
is a no-op permission check. **On Windows it is NOT a no-op.** Python's
Windows `os.kill` maps `sig=0` to `CTRL_C_EVENT` (they collide at the
integer value 0) and routes it through `GenerateConsoleCtrlEvent(0, pid)`,
which broadcasts Ctrl+C to the **entire console process group** containing
the target PID. "Probe if alive" silently becomes "kill the target and
often unrelated processes sharing its console." See [bpo-14484](https://bugs.python.org/issue14484)
(open since 2012 — will never be fixed for compat reasons).
**Preferred:** use `psutil` (a core dependency — always available):
```python
import psutil
if psutil.pid_exists(pid):
# process is alive — safe on every platform
...
```
If you specifically need the hermes wrapper (it has a stdlib fallback
for scaffold-phase imports before pip install finishes), use
`gateway.status._pid_exists(pid)`. It calls `psutil.pid_exists` first
and falls back to a hand-rolled `OpenProcess + WaitForSingleObject`
dance on Windows only when psutil is somehow missing.
Audit grep for new callsites: `rg "os\.kill\([^,]+,\s*0\s*\)"`. Any hit
in non-test code is presumptively a Windows silent-kill bug.
2. **Use `shutil.which()` before shelling out — don't assume Windows has
tools Linux has.** `wmic` was removed in Windows 10 21H1 and later. `ps`,
`kill`, `grep`, `awk`, `fuser`, `lsof`, `pgrep`, and most POSIX CLI tools
simply don't exist on Windows. Test availability with
`shutil.which("tool")` and fall back to a Windows-native equivalent —
usually PowerShell via `subprocess.run(["powershell", "-NoProfile",
"-Command", ...])`.
For process enumeration: PowerShell's `Get-CimInstance Win32_Process` is
the modern replacement for `wmic process`. See
`hermes_cli/gateway.py::_scan_gateway_pids` for the pattern.
3. **`termios` and `fcntl` are Unix-only.** Always catch both `ImportError`
and `NotImplementedError`:
```python
try:
from simple_term_menu import TerminalMenu
@@ -539,24 +585,126 @@ Hermes runs on Linux, macOS, and WSL2 on Windows. When writing code that touches
idx = int(input("Choice: ")) - 1
```
2. **File encoding.** Windows may save `.env` files in `cp1252`. Always handle encoding errors:
4. **File encoding.** Windows may save `.env` files in `cp1252`. Always
handle encoding errors:
```python
try:
load_dotenv(env_path)
except UnicodeDecodeError:
load_dotenv(env_path, encoding="latin-1")
```
Config files (`config.yaml`) may be saved with a UTF-8 BOM by Notepad and
similar editors — use `encoding="utf-8-sig"` when reading files that
could have been touched by a Windows GUI editor.
3. **Process management.** `os.setsid()`, `os.killpg()`, and signal handling differ on Windows. Use platform checks:
5. **Process management.** `os.setsid()`, `os.killpg()`, `os.fork()`,
`os.getuid()`, and POSIX signal handling differ on Windows. Guard with
`platform.system()`, `sys.platform`, or `hasattr(os, "setsid")`:
```python
import platform
if platform.system() != "Windows":
kwargs["preexec_fn"] = os.setsid
else:
kwargs["creationflags"] = subprocess.CREATE_NEW_PROCESS_GROUP
```
4. **Path separators.** Use `pathlib.Path` instead of string concatenation with `/`.
**Preferred:** for killing a process AND its children (what `os.killpg`
does on POSIX), use `psutil` — it works on every platform:
```python
import psutil
try:
parent = psutil.Process(pid)
# Kill children first (leaf-up), then the parent.
for child in parent.children(recursive=True):
child.kill()
parent.kill()
except psutil.NoSuchProcess:
pass
```
5. **Shell commands in installers.** If you change `scripts/install.sh`, check if the equivalent change is needed in `scripts/install.ps1`.
6. **Signals that don't exist on Windows: `SIGALRM`, `SIGCHLD`, `SIGHUP`,
`SIGUSR1`, `SIGUSR2`, `SIGPIPE`, `SIGQUIT`, `SIGKILL`.** Python's
`signal` module raises `AttributeError` at import time if you reference
them on Windows. Use `getattr(signal, "SIGKILL", signal.SIGTERM)` or
gate the whole block behind a platform check. `loop.add_signal_handler`
raises `NotImplementedError` on Windows — always catch it.
7. **Path separators.** Use `pathlib.Path` instead of string concatenation
with `/`. Forward slashes work almost everywhere on Windows, but
`subprocess.run(["cmd.exe", "/c", ...])` and other shell contexts can
require backslashes — convert with `str(path)` at the subprocess boundary,
not inside Python logic.
8. **Symlinks need elevated privileges on Windows** (unless Developer Mode is
on). Tests that create symlinks need `@pytest.mark.skipif(sys.platform ==
"win32", reason="Symlinks require elevated privileges on Windows")`.
9. **POSIX file modes (0o600, 0o644, etc.) are NOT enforced on NTFS** by
default. Tests that assert on `stat().st_mode & 0o777` must skip on
Windows — the concept doesn't translate. Use ACLs (`icacls`, `pywin32`)
for Windows secret-file protection if needed.
10. **Detached background daemons on Windows need `pythonw.exe`, NOT
`python.exe`.** `python.exe` always allocates or attaches to a console,
which makes it vulnerable to `CTRL_C_EVENT` broadcasts from any sibling
process. `pythonw.exe` is the no-console variant. Combine with
`CREATE_NO_WINDOW | DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP |
CREATE_BREAKAWAY_FROM_JOB` in `subprocess.Popen(creationflags=...)`.
See `hermes_cli/gateway_windows.py::_spawn_detached` for the reference
implementation.
11. **`subprocess.Popen` with `.cmd` or `.bat` shims needs `shutil.which`
to resolve.** Passing `"agent-browser"` to `Popen` on Windows finds
the extensionless POSIX shebang shim in `node_modules/.bin/`, which
`CreateProcessW` can't execute — you'll get `WinError 193 "not a valid
Win32 application"`. Use `shutil.which("agent-browser", path=local_bin)`
which honors PATHEXT and picks the `.CMD` variant on Windows.
12. **Don't use shell shebangs as a way to run Python.** `#!/usr/bin/env
python` only works when the file is executed through a Unix shell.
`subprocess.run(["./myscript.py"])` on Windows fails even if the file
has a shebang line. Always invoke Python explicitly:
`[sys.executable, "myscript.py"]`.
13. **Shell commands in installers.** If you change `scripts/install.sh`,
make the equivalent change in `scripts/install.ps1`. The two scripts
are the canonical example of "works on Linux does not mean works on
Windows" and have drifted multiple times — keep them in lockstep.
14. **Known paths that are OneDrive-redirected on Windows:** Desktop,
Documents, Pictures, Videos. The "real" path when OneDrive Backup is
enabled is `%USERPROFILE%\OneDrive\Desktop` (etc.), NOT
`%USERPROFILE%\Desktop` (which exists as an empty husk). Resolve the
real location via `ctypes` + `SHGetKnownFolderPath` or by reading the
`Shell Folders` registry key — never assume `~/Desktop`.
15. **CRLF vs LF in generated scripts.** Windows `cmd.exe` and `schtasks`
parse line-by-line; mixed or LF-only line endings can break multi-line
`.cmd` / `.bat` files. Use `open(path, "w", encoding="utf-8",
newline="\r\n")` — or `open(path, "wb")` + explicit bytes — when
generating scripts Windows will execute.
16. **Two different quoting schemes in one command line.** `subprocess.run
(["schtasks", "/TR", some_cmd])` → schtasks itself parses `/TR`, AND
the `some_cmd` string is re-parsed by `cmd.exe` when the task fires.
Different parsers, different escape rules. Use two separate quoting
helpers and never cross them. See `hermes_cli/gateway_windows.py::
_quote_cmd_script_arg` and `_quote_schtasks_arg` for the reference
pair.
### Testing cross-platform
Tests that use POSIX-only syscalls need a skip marker. Common ones:
- Symlinks → `@pytest.mark.skipif(sys.platform == "win32", ...)`
- `0o600` file modes → `@pytest.mark.skipif(sys.platform.startswith("win"), ...)`
- `signal.SIGALRM` → Unix-only (see `tests/conftest.py::_enforce_test_timeout`)
- `os.setsid` / `os.fork` → Unix-only
- Live Winsock / Windows-specific regression tests →
`@pytest.mark.skipif(sys.platform != "win32", reason="Windows-specific regression")`
If you monkeypatch `sys.platform` for cross-platform tests, also patch
`platform.system()` / `platform.release()` / `platform.mac_ver()` — each
re-reads the real OS independently, so half-patched tests still route
through the wrong branch on a Windows runner.
---
+14 -2
View File
@@ -30,15 +30,27 @@ Use any model you want — [Nous Portal](https://portal.nousresearch.com), [Open
## Quick Install
### Linux, macOS, WSL2, Termux
```bash
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
```
Works on Linux, macOS, WSL2, and Android via Termux. The installer handles the platform-specific setup for you.
### Windows (native, PowerShell)
Run this in PowerShell:
```powershell
irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex
```
The installer handles everything: uv, Python 3.11, Node.js, ripgrep, ffmpeg, **and a portable Git Bash** (MinGit, unpacked to `%LOCALAPPDATA%\hermes\git` — no admin required, completely isolated from any system Git install). Hermes uses this bundled Git Bash to run shell commands.
If you already have Git installed, the installer detects it and uses that instead. Otherwise a ~45MB MinGit download is all you need — it won't touch or interfere with any system Git.
> **Android / Termux:** The tested manual path is documented in the [Termux guide](https://hermes-agent.nousresearch.com/docs/getting-started/termux). On Termux, Hermes installs a curated `.[termux]` extra because the full `.[all]` extra currently pulls Android-incompatible voice dependencies.
>
> **Windows:** Native Windows is not supported. Please install [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) and run the command above.
> **Windows:** Native Windows is supported — the PowerShell one-liner above installs everything. If you'd rather use WSL2, the Linux command works there too. Native Windows install lives under `%LOCALAPPDATA%\hermes`; WSL2 installs under `~/.hermes` as on Linux. The only Hermes feature that currently needs WSL2 specifically is the browser-based dashboard chat pane (it uses a POSIX PTY — classic CLI and gateway both run natively).
After installation:
+11
View File
@@ -13,6 +13,17 @@ Usage::
hermes-acp
"""
# IMPORTANT: hermes_bootstrap must be the very first import — UTF-8 stdio
# on Windows. No-op on POSIX. See hermes_bootstrap.py for full rationale.
try:
import hermes_bootstrap # noqa: F401
except ModuleNotFoundError:
# Graceful fallback when hermes_bootstrap isn't registered in the venv
# yet — happens during partial ``hermes update`` where git-reset landed
# new code but ``uv pip install -e .`` didn't finish. Missing bootstrap
# means UTF-8 stdio setup is skipped on Windows; POSIX is unaffected.
pass
import asyncio
import logging
import sys
+1 -1
View File
@@ -69,7 +69,7 @@ def _resolve_home_dir() -> str:
try:
import pwd
resolved = pwd.getpwuid(os.getuid()).pw_dir.strip()
resolved = pwd.getpwuid(os.getuid()).pw_dir.strip() # windows-footgun: ok — POSIX fallback inside try/except (pwd import fails on Windows)
if resolved:
return resolved
except Exception:
+1 -1
View File
@@ -1607,7 +1607,7 @@ def _run_llm_review(prompt: str) -> Dict[str, Any]:
# terminal. The background-thread runner also hides it; this
# belt-and-suspenders path matters when a caller invokes
# run_curator_review(synchronous=True) from the CLI.
with open(os.devnull, "w") as _devnull, \
with open(os.devnull, "w", encoding="utf-8") as _devnull, \
contextlib.redirect_stdout(_devnull), \
contextlib.redirect_stderr(_devnull):
conv_result = review_agent.run_conversation(user_message=prompt)
+3 -3
View File
@@ -754,7 +754,7 @@ def _load_context_cache() -> Dict[str, int]:
if not path.exists():
return {}
try:
with open(path) as f:
with open(path, encoding="utf-8") as f:
data = yaml.safe_load(f) or {}
return data.get("context_lengths", {})
except Exception as e:
@@ -776,7 +776,7 @@ def save_context_length(model: str, base_url: str, length: int) -> None:
path = _get_context_cache_path()
try:
path.parent.mkdir(parents=True, exist_ok=True)
with open(path, "w") as f:
with open(path, "w", encoding="utf-8") as f:
yaml.dump({"context_lengths": cache}, f, default_flow_style=False)
logger.info("Cached context length %s -> %s tokens", key, f"{length:,}")
except Exception as e:
@@ -800,7 +800,7 @@ def _invalidate_cached_context_length(model: str, base_url: str) -> None:
path = _get_context_cache_path()
try:
path.parent.mkdir(parents=True, exist_ok=True)
with open(path, "w") as f:
with open(path, "w", encoding="utf-8") as f:
yaml.dump({"context_lengths": cache}, f, default_flow_style=False)
except Exception as e:
logger.debug("Failed to invalidate context length cache entry %s: %s", key, e)
+1 -1
View File
@@ -144,7 +144,7 @@ def nous_rate_limit_remaining() -> Optional[float]:
"""
path = _state_path()
try:
with open(path) as f:
with open(path, encoding="utf-8") as f:
state = json.load(f)
reset_at = state.get("reset_at", 0)
remaining = reset_at - time.time()
+204 -2
View File
@@ -539,13 +539,215 @@ WSL_ENVIRONMENT_HINT = (
)
# Non-local terminal backends that run commands (and therefore every file
# tool: read_file, write_file, patch, search_files) inside a separate
# container / remote host rather than on the machine where Hermes itself
# runs. For these backends, host info (Windows/Linux/macOS, $HOME, cwd) is
# misleading — the agent should only see the machine it can actually touch.
_REMOTE_TERMINAL_BACKENDS = frozenset({
"docker", "singularity", "modal", "daytona", "ssh",
"vercel_sandbox", "managed_modal",
})
# Per-backend fallback descriptions — used when the live probe fails.
# Only states what we know from the backend choice itself (container type,
# likely OS family). Does NOT invent cwd, user, or $HOME — the agent is
# told to probe those directly if it needs them.
_BACKEND_FALLBACK_DESCRIPTIONS: dict[str, str] = {
"docker": "a Docker container (Linux)",
"singularity": "a Singularity container (Linux)",
"modal": "a Modal sandbox (Linux)",
"managed_modal": "a managed Modal sandbox (Linux)",
"daytona": "a Daytona workspace (Linux)",
"vercel_sandbox": "a Vercel sandbox (Linux)",
"ssh": "a remote host reached over SSH (likely Linux)",
}
# Cache the backend probe result per process so we only pay the probe cost
# on the first prompt build of a session. Keyed by (env_type, cwd_hint) so
# a mid-process backend switch rebuilds the string. Kept in-module (not on
# disk) because the probe captures live backend state that may change
# across Hermes restarts.
_BACKEND_PROBE_CACHE: dict[tuple[str, str], str] = {}
_WINDOWS_BASH_SHELL_HINT = (
"Shell: on this Windows host your `terminal` tool runs commands through "
"bash (git-bash / MSYS), NOT PowerShell or cmd.exe. Use POSIX shell "
"syntax (`ls`, `$HOME`, `&&`, `|`, single-quoted strings) inside terminal "
"calls. MSYS-style paths like `/c/Users/<user>/...` work alongside "
"native `C:\\Users\\<user>\\...` paths. PowerShell builtins "
"(`Get-ChildItem`, `$env:FOO`, `Select-String`) will NOT work — use their "
"POSIX equivalents (`ls`, `$FOO`, `grep`)."
)
def _probe_remote_backend(env_type: str) -> str | None:
"""Run a tiny introspection command inside the active terminal backend.
Returns a pre-formatted multi-line string describing the backend's OS,
$HOME, cwd, and user — or None if the probe failed. Result is cached
per process. Used only for non-local backends where the agent's tools
operate on a different machine than the host Hermes runs on.
"""
cwd_hint = os.getenv("TERMINAL_CWD", "")
cache_key = (env_type, cwd_hint)
cached = _BACKEND_PROBE_CACHE.get(cache_key)
if cached is not None:
return cached or None
try:
# Import locally: tools/ imports are heavy and only relevant when a
# non-local backend is actually configured.
from tools.terminal_tool import _get_env_config # type: ignore
from tools.environments import get_environment # type: ignore
except Exception as e:
logger.debug("Backend probe unavailable (import failed): %s", e)
_BACKEND_PROBE_CACHE[cache_key] = ""
return None
try:
config = _get_env_config()
env = get_environment(config)
# Single-line POSIX probe — works on any Unixy backend. Wrapped in
# `2>/dev/null` so a missing binary doesn't pollute the output.
probe_cmd = (
"printf 'os=%s\\nkernel=%s\\nhome=%s\\ncwd=%s\\nuser=%s\\n' "
"\"$(uname -s 2>/dev/null || echo unknown)\" "
"\"$(uname -r 2>/dev/null || echo unknown)\" "
"\"$HOME\" \"$(pwd)\" \"$(whoami 2>/dev/null || id -un 2>/dev/null || echo unknown)\""
)
result = env.execute(probe_cmd, timeout=4)
if result.get("returncode") != 0:
logger.debug("Backend probe returned non-zero: %r", result)
_BACKEND_PROBE_CACHE[cache_key] = ""
return None
output = (result.get("output") or "").strip()
if not output:
_BACKEND_PROBE_CACHE[cache_key] = ""
return None
except Exception as e:
logger.debug("Backend probe failed: %s", e)
_BACKEND_PROBE_CACHE[cache_key] = ""
return None
# Parse key=value lines back into a tidy summary.
parsed: dict[str, str] = {}
for line in output.splitlines():
if "=" in line:
k, _, v = line.partition("=")
parsed[k.strip()] = v.strip()
pieces = []
os_bits = " ".join(x for x in (parsed.get("os"), parsed.get("kernel")) if x and x != "unknown")
if os_bits:
pieces.append(f"OS: {os_bits}")
if parsed.get("user") and parsed["user"] != "unknown":
pieces.append(f"User: {parsed['user']}")
if parsed.get("home"):
pieces.append(f"Home: {parsed['home']}")
if parsed.get("cwd"):
pieces.append(f"Working directory: {parsed['cwd']}")
if not pieces:
_BACKEND_PROBE_CACHE[cache_key] = ""
return None
formatted = "\n".join(f" {p}" for p in pieces)
_BACKEND_PROBE_CACHE[cache_key] = formatted
return formatted
def _clear_backend_probe_cache() -> None:
"""Test helper — drop the backend probe cache so monkeypatched backends take effect."""
_BACKEND_PROBE_CACHE.clear()
def build_environment_hints() -> str:
"""Return environment-specific guidance for the system prompt.
Detects WSL, and can be extended for Termux, Docker, etc.
Returns an empty string when no special environment is detected.
Always emits a factual block describing the execution environment:
- For **local** terminal backends: the host OS, user home, current
working directory (plus a Windows-only note about hostname != user
and a Windows-only note that `terminal` shells out to bash, not
PowerShell).
- For **remote / sandbox** terminal backends (docker, singularity,
modal, daytona, ssh, vercel_sandbox): host info is **suppressed**
because the agent's tools can't touch the host — only the backend
matters. A live probe inside the backend reports its OS, user, $HOME,
and cwd. Falls back to a static summary if the probe fails.
The WSL environment hint is appended unchanged when running under WSL.
"""
import platform
import sys
hints: list[str] = []
backend = (os.getenv("TERMINAL_ENV") or "local").strip().lower()
is_remote_backend = backend in _REMOTE_TERMINAL_BACKENDS
if not is_remote_backend:
# --- Host info block (local backend: host == where tools run) ---
host_lines: list[str] = []
if is_wsl():
host_lines.append("Host: WSL (Windows Subsystem for Linux)")
elif sys.platform == "win32":
host_lines.append(f"Host: Windows ({platform.release()})")
elif sys.platform == "darwin":
mac_ver = platform.mac_ver()[0]
host_lines.append(f"Host: macOS ({mac_ver or platform.release()})")
else:
host_lines.append(f"Host: {platform.system()} ({platform.release()})")
host_lines.append(f"User home directory: {os.path.expanduser('~')}")
try:
host_lines.append(f"Current working directory: {os.getcwd()}")
except OSError:
pass
if sys.platform == "win32" and not is_wsl():
host_lines.append(
"Note: on Windows, the machine hostname (e.g. from `hostname` "
"or uname) is NOT the username. Use the 'User home directory' "
"above to construct paths under C:\\Users\\<user>\\, never the "
"hostname."
)
hints.append("\n".join(host_lines))
# Windows-local terminal runs bash, not PowerShell — the model must
# know this or it will issue PowerShell syntax and fail.
if sys.platform == "win32" and not is_wsl():
hints.append(_WINDOWS_BASH_SHELL_HINT)
else:
# --- Remote backend block (host info suppressed) ---
probe = _probe_remote_backend(backend)
if probe:
hints.append(
f"Terminal backend: {backend}. Your `terminal`, `read_file`, "
f"`write_file`, `patch`, and `search_files` tools all operate "
f"inside this {backend} environment — NOT on the machine "
f"where Hermes itself is running. The host OS, home, and cwd "
f"of the Hermes process are irrelevant; only the following "
f"backend state matters:\n{probe}"
)
else:
description = _BACKEND_FALLBACK_DESCRIPTIONS.get(
backend, f"a {backend} environment (likely Linux)"
)
hints.append(
f"Terminal backend: {backend}. Your `terminal`, `read_file`, "
f"`write_file`, `patch`, and `search_files` tools all operate "
f"inside {description} — NOT on the machine where Hermes "
f"itself runs. The backend probe didn't respond at "
f"prompt-build time, so the sandbox's current user, $HOME, "
f"and working directory are unknown from here. If you need "
f"them, probe directly with a terminal call like "
f"`uname -a && whoami && pwd`."
)
if is_wsl():
hints.append(WSL_ENVIRONMENT_HINT)
return "\n\n".join(hints)
+1 -1
View File
@@ -617,7 +617,7 @@ def _locked_update_approvals() -> Iterator[Dict[str, Any]]:
save_allowlist(data)
return
with open(lock_path, "a+") as lock_fh:
with open(lock_path, "a+", encoding="utf-8") as lock_fh:
fcntl.flock(lock_fh.fileno(), fcntl.LOCK_EX)
try:
data = load_allowlist()
+11
View File
@@ -20,6 +20,17 @@ Usage:
python batch_runner.py --dataset_file=data.jsonl --batch_size=10 --run_name=my_run --distribution=image_gen
"""
# IMPORTANT: hermes_bootstrap must be the very first import — UTF-8 stdio
# on Windows. No-op on POSIX. See hermes_bootstrap.py for full rationale.
try:
import hermes_bootstrap # noqa: F401
except ModuleNotFoundError:
# Graceful fallback when hermes_bootstrap isn't registered in the venv
# yet — happens during partial ``hermes update`` where git-reset landed
# new code but ``uv pip install -e .`` didn't finish. Missing bootstrap
# means UTF-8 stdio setup is skipped on Windows; POSIX is unaffected.
pass
import json
import logging
import os
+159 -13
View File
@@ -9,10 +9,20 @@ Usage:
python cli.py # Start interactive mode with all tools
python cli.py --toolsets web,terminal # Start with specific toolsets
python cli.py --skills hermes-agent-dev,github-auth
python cli.py -q "your question" # Single query mode
python cli.py --list-tools # List available tools and exit
"""
# IMPORTANT: hermes_bootstrap must be the very first import — UTF-8 stdio
# on Windows. No-op on POSIX. See hermes_bootstrap.py for full rationale.
try:
import hermes_bootstrap # noqa: F401
except ModuleNotFoundError:
# Graceful fallback when hermes_bootstrap isn't registered in the venv
# yet — happens during partial ``hermes update`` where git-reset landed
# new code but ``uv pip install -e .`` didn't finish. Missing bootstrap
# means UTF-8 stdio setup is skipped on Windows; POSIX is unaffected.
pass
import logging
import os
import shutil
@@ -675,6 +685,7 @@ def _run_cleanup():
if _cleanup_done:
return
_cleanup_done = True
try:
_cleanup_all_terminals()
except Exception:
@@ -728,8 +739,43 @@ def _run_cleanup():
_active_worktree: Optional[Dict[str, str]] = None
def _normalize_git_bash_path(p: Optional[str]) -> Optional[str]:
"""Translate a Git Bash-style path (``/c/Users/...``) to the native
Windows form (``C:\\Users\\...``) that Python's ``subprocess.Popen``
and ``pathlib.Path`` accept.
No-op on non-Windows and for paths that already look native. Git on
native Windows normally emits forward-slash Windows paths
(``C:/Users/...``) which both bash and Python handle, but certain
configurations (Git Bash shells, MSYS2, WSL-mounted repos) surface
``/c/...`` or ``/cygdrive/c/...`` variants.
"""
if not p:
return p
if sys.platform != "win32":
return p
import re as _re
# /c/Users/... or /C/Users/...
m = _re.match(r"^/([a-zA-Z])/(.*)$", p)
if m:
drive, rest = m.group(1), m.group(2)
return f"{drive.upper()}:\\{rest.replace('/', chr(92))}"
# /cygdrive/c/... or /mnt/c/...
m = _re.match(r"^/(?:cygdrive|mnt)/([a-zA-Z])/(.*)$", p)
if m:
drive, rest = m.group(1), m.group(2)
return f"{drive.upper()}:\\{rest.replace('/', chr(92))}"
return p
def _git_repo_root() -> Optional[str]:
"""Return the git repo root for CWD, or None if not in a repo."""
"""Return the git repo root for CWD, or None if not in a repo.
Runs through :func:`_normalize_git_bash_path` so callers can pass
the result directly to ``Path``/``subprocess.Popen(cwd=...)`` on
Windows without hitting ``C:\\c\\Users\\...`` style resolution
mistakes.
"""
import subprocess
try:
result = subprocess.run(
@@ -737,7 +783,7 @@ def _git_repo_root() -> Optional[str]:
capture_output=True, text=True, timeout=5,
)
if result.returncode == 0:
return result.stdout.strip()
return _normalize_git_bash_path(result.stdout.strip())
except Exception:
pass
return None
@@ -781,7 +827,7 @@ def _setup_worktree(repo_root: str = None) -> Optional[Dict[str, str]]:
try:
existing = gitignore.read_text() if gitignore.exists() else ""
if _ignore_entry not in existing.splitlines():
with open(gitignore, "a") as f:
with open(gitignore, "a", encoding="utf-8") as f:
if existing and not existing.endswith("\n"):
f.write("\n")
f.write(f"{_ignore_entry}\n")
@@ -832,10 +878,39 @@ def _setup_worktree(repo_root: str = None) -> Optional[Dict[str, str]]:
dst.parent.mkdir(parents=True, exist_ok=True)
shutil.copy2(str(src), str(dst))
elif src.is_dir():
# Symlink directories (faster, saves disk)
# Symlink directories (faster, saves disk). On Windows,
# symlink creation requires Developer Mode or elevation,
# and fails with OSError otherwise — fall back to a
# recursive copy so the worktree is still usable. The
# copy is slower and uses disk, but it doesn't require
# admin and matches the Linux/macOS symlink outcome
# functionally.
if not dst.exists():
dst.parent.mkdir(parents=True, exist_ok=True)
os.symlink(str(src_resolved), str(dst))
try:
os.symlink(str(src_resolved), str(dst))
except (OSError, NotImplementedError) as _sym_err:
if sys.platform == "win32":
logger.info(
".worktreeinclude: symlink failed (%s) — "
"falling back to copytree on Windows.",
_sym_err,
)
try:
shutil.copytree(
str(src_resolved),
str(dst),
symlinks=True,
dirs_exist_ok=False,
)
except Exception as _copy_err:
logger.warning(
".worktreeinclude: copy fallback "
"also failed for %s -> %s: %s",
src, dst, _copy_err,
)
else:
raise
except Exception as e:
logger.debug("Error copying .worktreeinclude entries: %s", e)
@@ -1781,9 +1856,20 @@ _TERMINAL_INPUT_MODE_RESET_SEQ = (
def _bind_prompt_submit_keys(kb, handler) -> None:
"""Bind both CR and LF terminal Enter forms to the submit handler."""
for key in ("enter", "c-j"):
kb.add(key)(handler)
"""Bind terminal Enter forms to the submit handler.
Enter is always submit. On POSIX we also bind c-j (LF) to submit because
some thin PTYs (docker exec, certain SSH flavors) deliver Enter as LF
instead of CR without this, Enter appears dead on those terminals.
On Windows, Windows Terminal delivers Ctrl+Enter as a distinct c-j key
while plain Enter is c-m, so we leave c-j unbound here it becomes the
multi-line newline keystroke, giving Windows users an Enter-involving
newline without any terminal settings changes.
"""
kb.add("enter")(handler)
if sys.platform != "win32":
kb.add("c-j")(handler)
def _disable_prompt_toolkit_cpr_warning(app) -> None:
@@ -2080,7 +2166,7 @@ def save_config_value(key_path: str, value: any) -> bool:
# Load existing config
if config_path.exists():
with open(config_path, 'r') as f:
with open(config_path, 'r', encoding="utf-8") as f:
config = yaml.safe_load(f) or {}
else:
config = {}
@@ -9706,7 +9792,7 @@ class HermesCLI:
# Debug: log to file (stdout may be devnull from redirect_stdout)
try:
_dbg = _hermes_home / "interrupt_debug.log"
with open(_dbg, "a") as _f:
with open(_dbg, "a", encoding="utf-8") as _f:
_f.write(f"{time.strftime('%H:%M:%S')} interrupt fired: msg={str(interrupt_msg)[:60]!r}, "
f"children={len(self.agent._active_children)}, "
f"parent._interrupt={self.agent._interrupt_requested}\n")
@@ -10538,7 +10624,7 @@ class HermesCLI:
# Debug: log to file when message enters interrupt queue
try:
_dbg = _hermes_home / "interrupt_debug.log"
with open(_dbg, "a") as _f:
with open(_dbg, "a", encoding="utf-8") as _f:
_f.write(f"{time.strftime('%H:%M:%S')} ENTER: queued interrupt msg={str(payload)[:60]!r}, "
f"agent_running={self._agent_running}\n")
except Exception:
@@ -10569,9 +10655,30 @@ class HermesCLI:
@kb.add('escape', 'enter')
def handle_alt_enter(event):
"""Alt+Enter inserts a newline for multi-line input."""
"""Alt+Enter inserts a newline for multi-line input.
Works on mac/Linux/WSL. On Windows Terminal this keystroke is
intercepted at the terminal layer (toggles fullscreen) and never
reaches here Windows users get newline via Ctrl+Enter instead
(bound below as c-j, since WT delivers Ctrl+Enter as LF).
"""
event.current_buffer.insert_text('\n')
if sys.platform == "win32":
@kb.add('c-j')
def handle_ctrl_enter_newline_windows(event):
"""Ctrl+Enter inserts a newline on Windows.
Windows Terminal delivers Ctrl+Enter as LF (c-j), distinct
from plain Enter (c-m). This binding makes Ctrl+Enter the
Windows equivalent of Alt+Enter, giving an Enter-involving
newline keystroke without requiring terminal settings changes.
Ctrl+J (the raw LF keystroke) also triggers this by virtue
of being the same key code a harmless side effect since
Ctrl+J has no conflicting Hermes binding.
"""
event.current_buffer.insert_text('\n')
# VSCode/Cursor bind Ctrl+G to "Find Next" at the editor level, so
# the keystroke never reaches the embedded terminal. Alt+G is unbound
# in those IDEs and arrives here as ('escape', 'g') — register it as
@@ -12157,6 +12264,36 @@ class HermesCLI:
_signal.signal(_signal.SIGTERM, _signal_handler)
if hasattr(_signal, 'SIGHUP'):
_signal.signal(_signal.SIGHUP, _signal_handler)
# Windows: install a SIGINT handler that absorbs the signal
# instead of letting Python's default handler raise
# KeyboardInterrupt in MainThread. Windows Terminal / Win32
# delivers spurious CTRL_C_EVENT to the hermes process when
# child processes are spawned from background threads (agent
# subprocess Popen path). The default Python SIGINT handler
# would then unwind prompt_toolkit's app.run(), trigger
# _run_cleanup mid-turn, and close browser sessions mid-open
# — causing "Daemon process exited during startup" errors.
#
# The handler is a silent no-op. Real user Ctrl+C still works
# because prompt_toolkit binds c-c at the TUI layer and never
# reaches this OS-signal path. This matches how Claude Code
# handles the same Windows quirk (cancellation is driven by
# the TUI key handler, not by OS signals).
#
# POSIX: leave the default SIGINT handler alone. prompt_toolkit
# installs its own handler there and it works as expected.
if sys.platform == "win32":
def _sigint_absorb(signum, frame):
# Absorb silently. Do NOT call agent.interrupt() here:
# Windows fires spurious CTRL_C_EVENT whenever a
# background thread spawns a .cmd subprocess, and
# interrupt() would inject a fake user message each
# time. Real user Ctrl+C routes through prompt_toolkit's
# own c-c key binding at the TUI layer (same pattern as
# Claude Code's Windows handling).
return
_signal.signal(_signal.SIGINT, _sigint_absorb)
except Exception:
pass # Signal handlers may fail in restricted environments
@@ -12342,6 +12479,15 @@ def main(
"""
global _active_worktree
# Force UTF-8 stdio on Windows before any banner/print() runs — the
# Rich console prints Unicode box-drawing characters that would
# UnicodeEncodeError on cp1252. No-op on Linux/macOS.
try:
from hermes_cli.stdio import configure_windows_stdio
configure_windows_stdio()
except Exception:
pass
# Signal to terminal_tool that we're in interactive mode
# This enables interactive sudo password prompts with timeout
os.environ["HERMES_INTERACTIVE"] = "1"
+18 -3
View File
@@ -14,6 +14,7 @@ import contextvars
import json
import logging
import os
import shutil
import subprocess
import sys
@@ -714,7 +715,21 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
# choice explicit here keeps the allowed surface small and auditable.
suffix = path.suffix.lower()
if suffix in (".sh", ".bash"):
argv = ["/bin/bash", str(path)]
# Resolve bash dynamically so Windows (Git Bash) and Linux/macOS
# all work. On native Windows without Git for Windows installed
# shutil.which returns None — fall back to a clear error rather
# than a FileNotFoundError with a confusing "[WinError 2]"
# traceback.
_bash = shutil.which("bash") or (
"/bin/bash" if os.path.isfile("/bin/bash") else None
)
if _bash is None:
return False, (
f"Cannot run .sh/.bash script {path.name!r}: bash not found on PATH. "
"On Windows, install Git for Windows (which ships Git Bash) "
"or rewrite the script as Python (.py)."
)
argv = [_bash, str(path)]
else:
argv = [sys.executable, str(path)]
@@ -1213,7 +1228,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
import yaml
_cfg_path = str(_get_hermes_home() / "config.yaml")
if os.path.exists(_cfg_path):
with open(_cfg_path) as _f:
with open(_cfg_path, encoding="utf-8") as _f:
_cfg = yaml.safe_load(_f) or {}
_cfg = _expand_env_vars(_cfg)
_model_cfg = _cfg.get("model", {})
@@ -1596,7 +1611,7 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int:
# Cross-platform file locking: fcntl on Unix, msvcrt on Windows
lock_fd = None
try:
lock_fd = open(lock_file, "w")
lock_fd = open(lock_file, "w", encoding="utf-8")
if fcntl:
fcntl.flock(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
elif msvcrt:
@@ -365,7 +365,7 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
os.makedirs(log_dir, exist_ok=True)
run_ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
self._streaming_path = os.path.join(log_dir, f"samples_{run_ts}.jsonl")
self._streaming_file = open(self._streaming_path, "w")
self._streaming_file = open(self._streaming_path, "w", encoding="utf-8")
self._streaming_lock = __import__("threading").Lock()
print(f" Streaming results to: {self._streaming_path}")
@@ -422,7 +422,7 @@ class YCBenchEvalEnv(HermesAgentBaseEnv):
os.makedirs(log_dir, exist_ok=True)
run_ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
self._streaming_path = os.path.join(log_dir, f"samples_{run_ts}.jsonl")
self._streaming_file = open(self._streaming_path, "w")
self._streaming_file = open(self._streaming_path, "w", encoding="utf-8")
self._streaming_lock = threading.Lock()
print(f"\nYC-Bench eval matrix: {len(self.all_eval_items)} runs")
+2 -2
View File
@@ -744,7 +744,7 @@ class TelegramAdapter(BasePlatformAdapter):
return
import yaml as _yaml
with open(config_path, "r") as f:
with open(config_path, "r", encoding="utf-8") as f:
config = _yaml.safe_load(f) or {}
# Navigate to platforms.telegram.extra.dm_topics
@@ -3516,7 +3516,7 @@ class TelegramAdapter(BasePlatformAdapter):
return
import yaml as _yaml
with open(config_path, "r") as f:
with open(config_path, "r", encoding="utf-8") as f:
config = _yaml.safe_load(f) or {}
dm_topics = (
+44 -15
View File
@@ -21,6 +21,7 @@ import logging
import os
import platform
import re
import shutil
import signal
import subprocess
@@ -106,12 +107,15 @@ def _kill_stale_bridge_by_pidfile(session_path: Path) -> None:
except OSError:
pass
return
try:
os.kill(pid, 0) # check existence
os.kill(pid, signal.SIGTERM)
logger.info("[whatsapp] Killed stale bridge PID %d from pidfile", pid)
except (ProcessLookupError, PermissionError, OSError):
pass
# ``os.kill(pid, 0)`` is NOT a no-op on Windows (bpo-14484) — use the
# cross-platform existence check before sending a real signal.
from gateway.status import _pid_exists
if _pid_exists(pid):
try:
os.kill(pid, signal.SIGTERM)
logger.info("[whatsapp] Killed stale bridge PID %d from pidfile", pid)
except (ProcessLookupError, PermissionError, OSError):
pass
try:
pid_file.unlink()
except OSError:
@@ -151,10 +155,26 @@ def _terminate_bridge_process(proc, *, force: bool = False) -> None:
raise OSError(details or f"taskkill failed for PID {proc.pid}")
return
import signal
sig = signal.SIGTERM if not force else signal.SIGKILL
os.killpg(os.getpgid(proc.pid), sig)
import psutil
try:
parent = psutil.Process(proc.pid)
children = parent.children(recursive=True)
if force:
for child in children:
try:
child.kill()
except psutil.NoSuchProcess:
pass
parent.kill()
else:
for child in children:
try:
child.terminate()
except psutil.NoSuchProcess:
pass
parent.terminate()
except psutil.NoSuchProcess:
return
import sys
sys.path.insert(0, str(Path(__file__).resolve().parents[2]))
@@ -177,10 +197,15 @@ def check_whatsapp_requirements() -> bool:
WhatsApp requires a Node.js bridge for most implementations.
"""
# Check for Node.js
# Check for Node.js. Resolve via shutil.which so we respect PATHEXT
# (node.exe vs node) and get a meaningful "not installed" signal
# instead of spawning a cmd flash on Windows.
_node = shutil.which("node")
if not _node:
return False
try:
result = subprocess.run(
["node", "--version"],
[_node, "--version"],
capture_output=True,
text=True,
timeout=5
@@ -464,9 +489,13 @@ class WhatsAppAdapter(BasePlatformAdapter):
bridge_dir = bridge_path.parent
if not (bridge_dir / "node_modules").exists():
print(f"[{self.name}] Installing WhatsApp bridge dependencies...")
# Resolve npm path so Windows can execute the .cmd shim.
# shutil.which honours PATHEXT; on POSIX it returns the
# plain executable path.
_npm_bin = shutil.which("npm") or "npm"
try:
install_result = subprocess.run(
["npm", "install", "--silent"],
[_npm_bin, "install", "--silent"],
cwd=str(bridge_dir),
capture_output=True,
text=True,
@@ -516,7 +545,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
# messages are preserved for troubleshooting.
whatsapp_mode = os.getenv("WHATSAPP_MODE", "self-chat")
self._bridge_log = self._session_path.parent / "bridge.log"
bridge_log_fh = open(self._bridge_log, "a")
bridge_log_fh = open(self._bridge_log, "a", encoding="utf-8")
self._bridge_log_fh = bridge_log_fh
# Build bridge subprocess environment.
@@ -1160,7 +1189,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
if file_size > MAX_TEXT_INJECT_BYTES:
print(f"[{self.name}] Skipping text injection for {doc_path} ({file_size} bytes > {MAX_TEXT_INJECT_BYTES})", flush=True)
continue
content = Path(doc_path).read_text(errors="replace")
content = Path(doc_path).read_text(encoding="utf-8", errors="replace")
fname = Path(doc_path).name
# Remove the doc_<hex>_ prefix for display
display_name = fname
+160 -24
View File
@@ -13,6 +13,17 @@ Usage:
python cli.py --gateway
"""
# IMPORTANT: hermes_bootstrap must be the very first import — UTF-8 stdio
# on Windows. No-op on POSIX. See hermes_bootstrap.py for full rationale.
try:
import hermes_bootstrap # noqa: F401
except ModuleNotFoundError:
# Graceful fallback when hermes_bootstrap isn't registered in the venv
# yet — happens during partial ``hermes update`` where git-reset landed
# new code but ``uv pip install -e .`` didn't finish. Missing bootstrap
# means UTF-8 stdio setup is skipped on Windows; POSIX is unaffected.
pass
import asyncio
import dataclasses
import inspect
@@ -2784,6 +2795,74 @@ class GatewayRunner:
return
current_pid = os.getpid()
# On Windows there's no bash/setsid chain — spawn a tiny Python
# watcher directly via sys.executable instead. The watcher polls
# current_pid, waits for our exit, then runs `hermes gateway
# restart` with detach flags so the respawn survives the CLI
# that triggered the /restart command closing its console.
if sys.platform == "win32":
import textwrap
from hermes_cli._subprocess_compat import windows_detach_popen_kwargs
cmd_argv = [*hermes_cmd, "gateway", "restart"]
watcher = textwrap.dedent(
"""
import os, subprocess, sys, time
pid = int(sys.argv[1])
cmd = sys.argv[2:]
deadline = time.monotonic() + 120
def _alive(p):
# On Windows, os.kill(pid, 0) is NOT a no-op — it maps to
# GenerateConsoleCtrlEvent(0, pid) (bpo-14484). Use the
# Win32 handle-based existence check instead.
if os.name == 'nt':
import ctypes
k32 = ctypes.windll.kernel32
k32.OpenProcess.restype = ctypes.c_void_p
k32.WaitForSingleObject.restype = ctypes.c_uint
k32.GetLastError.restype = ctypes.c_uint
h = k32.OpenProcess(0x1000 | 0x100000, False, int(p))
if not h:
return k32.GetLastError() != 87
try:
return k32.WaitForSingleObject(h, 0) == 0x102
finally:
k32.CloseHandle(h)
try:
os.kill(int(p), 0)
return True
except ProcessLookupError:
return False
except PermissionError:
return True
except OSError:
return False
while time.monotonic() < deadline:
if not _alive(pid):
break
time.sleep(0.2)
_CREATE_NEW_PROCESS_GROUP = 0x00000200
_DETACHED_PROCESS = 0x00000008
_CREATE_NO_WINDOW = 0x08000000
subprocess.Popen(
cmd,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
creationflags=_CREATE_NEW_PROCESS_GROUP | _DETACHED_PROCESS | _CREATE_NO_WINDOW,
)
"""
).strip()
subprocess.Popen(
[sys.executable, "-c", watcher, str(current_pid), *cmd_argv],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
**windows_detach_popen_kwargs(),
)
return
cmd = " ".join(shlex.quote(part) for part in hermes_cmd)
shell_cmd = (
f"while kill -0 {current_pid} 2>/dev/null; do sleep 0.2; done; "
@@ -11305,30 +11384,78 @@ class GatewayRunner:
# where systemd-run --user fails due to missing D-Bus session).
# PYTHONUNBUFFERED ensures output is flushed line-by-line so the
# gateway can stream it to the messenger in near-real-time.
hermes_cmd_str = " ".join(shlex.quote(part) for part in hermes_cmd)
update_cmd = (
f"PYTHONUNBUFFERED=1 {hermes_cmd_str} update --gateway"
f" > {shlex.quote(str(output_path))} 2>&1; "
f"status=$?; printf '%s' \"$status\" > {shlex.quote(str(exit_code_path))}"
)
# Spawn `hermes update --gateway` detached so it survives gateway restart.
# --gateway enables file-based IPC for interactive prompts (stash
# restore, config migration) so the gateway can forward them to the
# user instead of silently skipping them.
# Use setsid for portable session detach (works under system services
# where systemd-run --user fails due to missing D-Bus session).
# PYTHONUNBUFFERED ensures output is flushed line-by-line so the
# gateway can stream it to the messenger in near-real-time.
#
# Windows: no bash/setsid chain. Run `hermes update --gateway`
# directly via sys.executable; redirect stdout/stderr to the same
# output files via Popen file handles; write the exit code in a
# follow-up write. A tiny Python watcher would be cleaner but
# we're already inside gateway/run.py's update path which is async,
# so the simplest correct thing is: launch an inline Python helper
# that runs the command and writes both outputs.
try:
setsid_bin = shutil.which("setsid")
if setsid_bin:
# Preferred: setsid creates a new session, fully detached
if sys.platform == "win32":
import textwrap
from hermes_cli._subprocess_compat import windows_detach_popen_kwargs
# hermes_cmd is a list of argv parts we can pass directly
# (no shell-quoting needed).
helper = textwrap.dedent(
"""
import os, subprocess, sys
output_path = sys.argv[1]
exit_code_path = sys.argv[2]
cmd = sys.argv[3:]
env = dict(os.environ)
env["PYTHONUNBUFFERED"] = "1"
with open(output_path, "wb") as f:
proc = subprocess.Popen(cmd, stdout=f, stderr=subprocess.STDOUT, env=env)
rc = proc.wait()
with open(exit_code_path, "w") as f:
f.write(str(rc))
"""
).strip()
subprocess.Popen(
[setsid_bin, "bash", "-c", update_cmd],
[
sys.executable, "-c", helper,
str(output_path), str(exit_code_path),
*hermes_cmd, "update", "--gateway",
],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
start_new_session=True,
**windows_detach_popen_kwargs(),
)
else:
# Fallback: start_new_session=True calls os.setsid() in child
subprocess.Popen(
["bash", "-c", update_cmd],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
start_new_session=True,
hermes_cmd_str = " ".join(shlex.quote(part) for part in hermes_cmd)
update_cmd = (
f"PYTHONUNBUFFERED=1 {hermes_cmd_str} update --gateway"
f" > {shlex.quote(str(output_path))} 2>&1; "
f"status=$?; printf '%s' \"$status\" > {shlex.quote(str(exit_code_path))}"
)
setsid_bin = shutil.which("setsid")
if setsid_bin:
# Preferred: setsid creates a new session, fully detached
subprocess.Popen(
[setsid_bin, "bash", "-c", update_cmd],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
start_new_session=True,
)
else:
# Fallback: start_new_session=True calls os.setsid() in child
subprocess.Popen(
["bash", "-c", update_cmd],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
start_new_session=True,
)
except Exception as e:
pending_path.unlink(missing_ok=True)
exit_code_path.unlink(missing_ok=True)
@@ -15095,13 +15222,14 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
except Exception:
pass
return False
# Wait up to 10 seconds for the old process to exit
# Wait up to 10 seconds for the old process to exit.
# ``os.kill(pid, 0)`` on Windows is NOT a no-op — use the
# handle-based existence check instead.
from gateway.status import _pid_exists
for _ in range(20):
try:
os.kill(existing_pid, 0)
time.sleep(0.5)
except (ProcessLookupError, PermissionError):
if not _pid_exists(existing_pid):
break # Process is gone
time.sleep(0.5)
else:
# Still alive after 10s — force kill
logger.warning(
@@ -15267,12 +15395,12 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
if threading.current_thread() is threading.main_thread():
for sig in (signal.SIGINT, signal.SIGTERM):
try:
loop.add_signal_handler(sig, shutdown_signal_handler, sig)
loop.add_signal_handler(sig, shutdown_signal_handler, sig) # windows-footgun: ok — wrapped in try/except NotImplementedError for Windows
except NotImplementedError:
pass
if hasattr(signal, "SIGUSR1"):
try:
loop.add_signal_handler(signal.SIGUSR1, restart_signal_handler)
loop.add_signal_handler(signal.SIGUSR1, restart_signal_handler) # windows-footgun: ok — POSIX signal, guarded by hasattr above + try/except NotImplementedError
except NotImplementedError:
pass
else:
@@ -15385,6 +15513,14 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
def main():
"""CLI entry point for the gateway."""
# Force UTF-8 stdio on Windows — gateway logs and startup banner would
# otherwise UnicodeEncodeError on cp1252 consoles. No-op on POSIX.
try:
from hermes_cli.stdio import configure_windows_stdio
configure_windows_stdio()
except Exception:
pass
import argparse
parser = argparse.ArgumentParser(description="Hermes Gateway - Multi-platform messaging")
+81 -22
View File
@@ -113,7 +113,7 @@ def _get_process_start_time(pid: int) -> Optional[int]:
stat_path = Path(f"/proc/{pid}/stat")
try:
# Field 22 in /proc/<pid>/stat is process start time (clock ticks).
return int(stat_path.read_text().split()[21])
return int(stat_path.read_text(encoding="utf-8").split()[21])
except (FileNotFoundError, IndexError, PermissionError, ValueError, OSError):
return None
@@ -197,7 +197,7 @@ def _read_json_file(path: Path) -> Optional[dict[str, Any]]:
if not path.exists():
return None
try:
raw = path.read_text().strip()
raw = path.read_text(encoding="utf-8").strip()
except OSError:
return None
if not raw:
@@ -299,6 +299,81 @@ def _try_acquire_file_lock(handle) -> bool:
return False
def _pid_exists(pid: int) -> bool:
"""Cross-platform "is this PID alive" check that does NOT kill the target.
CRITICAL on Windows: Python's ``os.kill(pid, 0)`` is NOT a no-op like it
is on POSIX. CPython's Windows implementation
(``Modules/posixmodule.c::os_kill_impl``) treats ``sig=0`` as
``CTRL_C_EVENT`` because the two values collide at the C level, and
routes it through ``GenerateConsoleCtrlEvent(0, pid)`` which sends
a Ctrl+C to the entire console process group containing the target
PID, not just the PID itself. Any caller that wanted to "check if
this PID is alive" via ``os.kill(pid, 0)`` on Windows was silently
killing that process (and often unrelated processes in the same
console group). Long-standing Python quirk; see bpo-14484.
Implementation: prefer :mod:`psutil` (hard dependency the canonical
cross-platform answer, maintained by Giampaolo Rodolà, uses
``OpenProcess + GetExitCodeProcess`` on Windows internally). Fall back
to a hand-rolled ctypes ``OpenProcess`` / ``WaitForSingleObject`` pair
on Windows + ``os.kill(pid, 0)`` on POSIX if psutil is somehow
unavailable e.g. stripped-down install or import error during the
scaffold phase before ``psutil`` is pip-installed.
"""
try:
import psutil # type: ignore
return bool(psutil.pid_exists(int(pid)))
except ImportError:
pass # Fall through to stdlib fallback.
if _IS_WINDOWS:
try:
import ctypes
kernel32 = ctypes.windll.kernel32 # type: ignore[attr-defined]
# Pin return types — default ctypes restype is c_int (signed),
# which mangles WAIT_* DWORD return codes into negative numbers.
kernel32.OpenProcess.restype = ctypes.c_void_p
kernel32.WaitForSingleObject.restype = ctypes.c_uint
kernel32.GetLastError.restype = ctypes.c_uint
PROCESS_QUERY_LIMITED_INFORMATION = 0x1000
SYNCHRONIZE = 0x100000 # required for WaitForSingleObject
WAIT_TIMEOUT = 0x00000102
ERROR_INVALID_PARAMETER = 87
ERROR_ACCESS_DENIED = 5
handle = kernel32.OpenProcess(
PROCESS_QUERY_LIMITED_INFORMATION | SYNCHRONIZE, False, int(pid)
)
if not handle:
err = kernel32.GetLastError()
if err == ERROR_INVALID_PARAMETER:
return False # PID definitely gone
if err == ERROR_ACCESS_DENIED:
return True # Exists but owned by another user/session
return False # Conservative default for unknown errors
try:
wait_result = kernel32.WaitForSingleObject(handle, 0)
# WAIT_TIMEOUT = still running; anything else (WAIT_OBJECT_0
# via exit, WAIT_FAILED via handle issue) = treat as gone.
return wait_result == WAIT_TIMEOUT
finally:
kernel32.CloseHandle(handle)
except (OSError, AttributeError):
return False
else:
try:
os.kill(int(pid), 0) # windows-footgun: ok — POSIX-only branch (the whole point of _pid_exists)
return True
except ProcessLookupError:
return False
except PermissionError:
# Process exists but we can't signal it — still alive.
return True
except OSError:
return False
def _release_file_lock(handle) -> None:
try:
if _IS_WINDOWS:
@@ -503,10 +578,7 @@ def acquire_scoped_lock(scope: str, identity: str, metadata: Optional[dict[str,
stale = existing_pid is None
if not stale:
try:
os.kill(existing_pid, 0)
except (ProcessLookupError, PermissionError, OSError):
# Windows raises OSError with WinError 87 for invalid pid check
if not _pid_exists(existing_pid):
stale = True
else:
current_start = _get_process_start_time(existing_pid)
@@ -517,13 +589,13 @@ def acquire_scoped_lock(scope: str, identity: str, metadata: Optional[dict[str,
):
stale = True
# Check if process is stopped (Ctrl+Z / SIGTSTP) — stopped
# processes still respond to os.kill(pid, 0) but are not
# processes still appear alive to _pid_exists but are not
# actually running. Treat them as stale so --replace works.
if not stale:
try:
_proc_status = Path(f"/proc/{existing_pid}/status")
if _proc_status.exists():
for _line in _proc_status.read_text().splitlines():
for _line in _proc_status.read_text(encoding="utf-8").splitlines():
if _line.startswith("State:"):
_state = _line.split()[1]
if _state in ("T", "t"): # stopped or tracing stop
@@ -824,20 +896,7 @@ def get_running_pid(
if pid is None:
continue
try:
os.kill(pid, 0) # signal 0 = existence check, no actual signal sent
except ProcessLookupError:
continue
except PermissionError:
# The process exists but belongs to another user/service scope.
# With the runtime lock still held, prefer keeping it visible
# rather than deleting the PID file as "stale".
if _record_looks_like_gateway(record):
return pid
continue
except OSError:
# Windows raises OSError with WinError 87 for an invalid pid
# (process is definitely gone). Treat as "process doesn't exist".
if not _pid_exists(pid):
continue
recorded_start = record.get("start_time")
+129
View File
@@ -0,0 +1,129 @@
"""Windows UTF-8 bootstrap for Hermes entry points.
Python on Windows has two long-standing text-encoding footguns:
1. ``sys.stdout`` / ``sys.stderr`` are bound to the console code page
(``cp1252`` on US-locale installs), so ``print("café")`` crashes with
``UnicodeEncodeError: 'charmap' codec can't encode character``.
2. Child processes spawned via ``subprocess`` don't know to use UTF-8
unless ``PYTHONUTF8`` and/or ``PYTHONIOENCODING`` are set in their
environment so any Python subprocess (the execute_code sandbox,
delegation children, linter subprocesses, etc.) inherits the same
cp1252 defaults and hits the same UnicodeEncodeError.
This module fixes both on Windows *only* POSIX is untouched. It
should be imported at the very top of every Hermes entry point
(``hermes``, ``hermes-agent``, ``hermes-acp``, ``python -m gateway.run``,
``batch_runner.py``, ``cron/scheduler.py``) before any other imports
that might do file I/O or print to stdout.
What this module does on Windows:
- Sets ``os.environ["PYTHONUTF8"] = "1"`` (PEP 540 UTF-8 mode) so
every child process we spawn uses UTF-8 for ``open()`` and stdio.
- Sets ``os.environ["PYTHONIOENCODING"] = "utf-8"`` for belt-and-
suspenders some tools read this instead of / in addition to
``PYTHONUTF8``.
- Reconfigures ``sys.stdout`` / ``sys.stderr`` to UTF-8 in the current
process, using the ``reconfigure()`` API (Python 3.7+). This fixes
``print("café")`` in the parent without a re-exec.
What this module does NOT do:
- It does not re-exec Python with ``-X utf8``, so ``open()`` calls in
the *current* process still default to locale encoding. Those need
an explicit ``encoding="utf-8"`` at the call site (lint rule
``PLW1514`` / ``PYI058``). Ruff is the right tool for that sweep.
What this module does on POSIX:
- Nothing. POSIX systems are already UTF-8 by default in 99% of cases,
and we don't want to touch ``LANG``/``LC_*`` behavior that users may
have configured intentionally. If someone hits a C/POSIX locale on
Linux, they can export ``PYTHONUTF8=1`` themselves we won't override.
Idempotent: safe to call multiple times. ``_bootstrap_once`` guards
against double-reconfigure.
"""
from __future__ import annotations
import os
import sys
_IS_WINDOWS = sys.platform == "win32"
_bootstrap_applied = False
def apply_windows_utf8_bootstrap() -> bool:
"""Apply the Windows UTF-8 bootstrap if we're on Windows.
Returns True if bootstrap was applied (i.e. we're on Windows and
haven't already done this), False otherwise. The return value is
advisory callers normally don't need it, but tests may want to
assert the path was taken.
Idempotent: subsequent calls after the first are a no-op.
"""
global _bootstrap_applied
if not _IS_WINDOWS:
return False
if _bootstrap_applied:
return False
# 1. Child processes inherit these and run in UTF-8 mode.
# We use setdefault() rather than overwriting so the user can
# explicitly opt out by setting PYTHONUTF8=0 in their environment
# (or PYTHONIOENCODING=something-else) if they really want to.
os.environ.setdefault("PYTHONUTF8", "1")
os.environ.setdefault("PYTHONIOENCODING", "utf-8")
# 2. Reconfigure the current process's stdio to UTF-8. Needed
# because os.environ changes don't retroactively rebind sys.stdout
# — those were bound at interpreter startup based on the console
# code page. ``reconfigure`` is a TextIOWrapper method since 3.7.
#
# errors="replace" means that if we ever *read* something from
# stdin that isn't UTF-8 (unlikely but possible with piped input
# from legacy tools), we'll get U+FFFD replacement chars rather
# than a crash. Output is pure UTF-8.
for stream_name in ("stdout", "stderr"):
stream = getattr(sys, stream_name, None)
if stream is None:
continue
reconfigure = getattr(stream, "reconfigure", None)
if reconfigure is None:
# Not a TextIOWrapper (could be redirected to a BytesIO in
# tests, or a non-standard stream in some embedded cases).
# Skip silently — the env-var fix is still in effect for
# child processes, which is the bigger win.
continue
try:
reconfigure(encoding="utf-8", errors="replace")
except (OSError, ValueError):
# Already closed, or someone replaced it with something
# non-reconfigurable. Non-fatal.
pass
# stdin is reconfigured separately with errors="replace" too — input
# from a legacy pipe shouldn't crash the process.
stdin = getattr(sys, "stdin", None)
if stdin is not None:
reconfigure = getattr(stdin, "reconfigure", None)
if reconfigure is not None:
try:
reconfigure(encoding="utf-8", errors="replace")
except (OSError, ValueError):
pass
_bootstrap_applied = True
return True
# Apply on import — entry points just need ``import hermes_bootstrap``
# (or ``from hermes_bootstrap import apply_windows_utf8_bootstrap``) at
# the very top of their module, before importing anything else. The
# import side effect does the right thing.
apply_windows_utf8_bootstrap()
+175
View File
@@ -0,0 +1,175 @@
"""Windows subprocess compatibility helpers.
Hermes is developed on Linux / macOS and tested natively on Windows too.
Several common subprocess patterns break silently-or-loudly on Windows:
* ``["npm", "install", ...]`` on Windows ``npm`` is ``npm.cmd``, a batch
shim. ``subprocess.Popen(["npm", ...])`` fails with WinError 193
("not a valid Win32 application") because CreateProcessW can't run a
``.cmd`` file without ``shell=True`` or PATHEXT resolution.
* ``start_new_session=True`` on POSIX, this maps to ``os.setsid()`` and
actually detaches the child. On Windows it's silently ignored; the
Windows equivalent is ``CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS``
creationflags, which Python only applies when you pass them explicitly.
* Console-window flashes every ``subprocess.Popen`` of a ``.exe`` on
Windows spawns a cmd window briefly unless ``CREATE_NO_WINDOW`` is
passed. Cosmetic but jarring for background daemons.
This module centralizes the platform-branching logic so the rest of the
codebase doesn't sprinkle ``if sys.platform == "win32":`` everywhere.
**All helpers are no-ops on non-Windows** calling them in Linux/macOS
code paths is safe by design. That's the "do no damage on POSIX"
guarantee.
"""
from __future__ import annotations
import os
import shutil
import subprocess
import sys
from typing import Optional, Sequence
__all__ = [
"IS_WINDOWS",
"resolve_node_command",
"windows_detach_flags",
"windows_hide_flags",
"windows_detach_popen_kwargs",
]
IS_WINDOWS = sys.platform == "win32"
# -----------------------------------------------------------------------------
# Node ecosystem launcher resolution
# -----------------------------------------------------------------------------
def resolve_node_command(name: str, argv: Sequence[str]) -> list[str]:
"""Resolve a Node-ecosystem command name to an absolute-path argv.
On Windows, commands like ``npm``, ``npx``, ``yarn``, ``pnpm``,
``playwright``, ``prettier`` ship as ``.cmd`` files (batch shims).
``subprocess.Popen(["npm", "install"])`` fails with WinError 193
because CreateProcessW doesn't execute batch files directly.
``shutil.which(name)`` *does* resolve ``.cmd`` via PATHEXT and returns
the fully-qualified path which CreateProcessW accepts because the
extension tells Windows to route through ``cmd.exe /c``.
On POSIX ``shutil.which`` also returns a fully-qualified path when
found. That's a small change from bare-name resolution (the OS does
its own PATH search) but functionally identical and has the side
benefit of making the argv reproducible in logs.
Behavior when the command is not on PATH:
- On Windows: return the bare name caller can still try with
``shell=True`` as a last resort, OR the subsequent Popen will
raise FileNotFoundError with a readable error we want to surface.
- On POSIX: same. Bare ``npm`` on a Linux box without npm installed
fails the same way it did before this function existed.
Args:
name: The command name to resolve (``npm``, ``npx``, ``node`` ).
argv: The remaining arguments. Must NOT include ``name`` itself
this function builds the full argv list.
Returns:
A list suitable for passing to subprocess.Popen/run/call.
"""
resolved = shutil.which(name)
if resolved:
return [resolved, *argv]
return [name, *argv]
# -----------------------------------------------------------------------------
# Detached / hidden process creation
# -----------------------------------------------------------------------------
# Win32 CreationFlags — defined here rather than imported from subprocess
# because CREATE_NO_WINDOW and DETACHED_PROCESS aren't guaranteed to be
# present on stdlib subprocess on older Pythons or non-Windows builds.
_CREATE_NEW_PROCESS_GROUP = 0x00000200
_DETACHED_PROCESS = 0x00000008
_CREATE_NO_WINDOW = 0x08000000
def windows_detach_flags() -> int:
"""Return Win32 creationflags that detach a child from the parent
console and process group. 0 on non-Windows.
Pair with ``start_new_session=False`` (default) when calling
subprocess.Popen on POSIX use ``start_new_session=True`` instead,
which maps to ``os.setsid()`` in the child.
Rationale:
- ``CREATE_NEW_PROCESS_GROUP`` child has its own process group so
Ctrl+C in the parent console doesn't propagate.
- ``DETACHED_PROCESS`` child has no console at all. Necessary for
background daemons (gateway watchers, update respawners) because
without it, closing the console kills the child.
- ``CREATE_NO_WINDOW`` suppress the brief cmd flash that would
otherwise appear when launching a console app. Redundant with
DETACHED_PROCESS but explicit for clarity.
"""
if not IS_WINDOWS:
return 0
return _CREATE_NEW_PROCESS_GROUP | _DETACHED_PROCESS | _CREATE_NO_WINDOW
def windows_hide_flags() -> int:
"""Return Win32 creationflags that merely hide the child's console
window without detaching the child. 0 on non-Windows.
Use for short-lived console apps spawned as part of a larger
operation (``taskkill``, ``where``, version probes) where we want no
flash but also want to collect stdout/exit code synchronously.
The key difference from :func:`windows_detach_flags`: NO
``DETACHED_PROCESS`` the child still inherits stdio handles so
``capture_output=True`` works. ``DETACHED_PROCESS`` would sever
stdio and break stdout capture.
"""
if not IS_WINDOWS:
return 0
return _CREATE_NO_WINDOW
def windows_detach_popen_kwargs() -> dict:
"""Return a dict of Popen kwargs that detach a child on Windows and
fall back to the POSIX equivalent (``start_new_session=True``) on
Linux/macOS.
Usage pattern:
.. code-block:: python
subprocess.Popen(
argv,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
stdin=subprocess.DEVNULL,
close_fds=True,
**windows_detach_popen_kwargs(),
)
This replaces the unsafe-on-Windows pattern:
.. code-block:: python
subprocess.Popen(..., start_new_session=True)
which silently fails to detach on Windows (the flag is accepted but
has no effect the child stays attached to the parent's console
and dies when the console closes).
"""
if IS_WINDOWS:
return {"creationflags": windows_detach_flags()}
return {"start_new_session": True}
+19 -8
View File
@@ -893,7 +893,7 @@ def _file_lock(
if msvcrt and (not lock_path.exists() or lock_path.stat().st_size == 0):
lock_path.write_text(" ", encoding="utf-8")
with lock_path.open("r+" if msvcrt else "a+") as lock_file:
with lock_path.open("r+" if msvcrt else "a+", encoding="utf-8") as lock_file:
deadline = time.monotonic() + max(1.0, timeout_seconds)
while True:
try:
@@ -2827,9 +2827,12 @@ def _poll_for_token(
# import instead of running the full device-code flow every time.
#
# File lives at ${HERMES_SHARED_AUTH_DIR}/nous_auth.json, defaulting to
# ~/.hermes/shared/nous_auth.json. It is OUTSIDE any named profile's
# HERMES_HOME so named profiles (which typically live under
# ~/.hermes/profiles/<name>/) all see the same file.
# ``<hermes-root>/shared/nous_auth.json`` where ``<hermes-root>`` is what
# ``get_default_hermes_root()`` returns — ``~/.hermes`` on Linux/macOS,
# ``%LOCALAPPDATA%\hermes`` on native Windows, or the Docker/custom root.
# It is OUTSIDE any named profile's HERMES_HOME so named profiles (which
# typically live under ``<hermes-root>/profiles/<name>/``) all see the
# same file.
#
# Written on successful login and on every runtime refresh so the stored
# refresh_token stays current even if one profile refreshes and rotates it.
@@ -2846,25 +2849,33 @@ def _nous_shared_auth_dir() -> Path:
Honors ``HERMES_SHARED_AUTH_DIR`` so tests can redirect it to a tmp
path without touching the real user's home. Defaults to
``~/.hermes/shared/``.
``<hermes-root>/shared/``, where ``<hermes-root>`` is what
:func:`hermes_constants.get_default_hermes_root` returns so
Linux/macOS classic installs land at ``~/.hermes/shared/``, native
Windows installs at ``%LOCALAPPDATA%\\hermes\\shared\\``, and
Docker / custom ``HERMES_HOME`` deployments at
``<HERMES_HOME>/shared/``. Sits outside any named profile so all
profiles under the same root share the store.
"""
override = os.getenv("HERMES_SHARED_AUTH_DIR", "").strip()
if override:
return Path(override).expanduser()
return Path.home() / ".hermes" / "shared"
from hermes_constants import get_default_hermes_root
return get_default_hermes_root() / "shared"
def _nous_shared_store_path() -> Path:
path = _nous_shared_auth_dir() / NOUS_SHARED_STORE_FILENAME
# Seat belt: if pytest is running and this resolves to a path under the
# real user's home, refuse rather than silently corrupt cross-profile
# real user's Hermes root, refuse rather than silently corrupt cross-profile
# state. Tests must set HERMES_SHARED_AUTH_DIR to a tmp_path (conftest
# does not do this automatically — mirror the _auth_file_path() guard
# so forgetting to set it fails loudly instead of writing to the real
# shared store).
if os.environ.get("PYTEST_CURRENT_TEST"):
from hermes_constants import get_default_hermes_root
real_home_shared = (
Path.home() / ".hermes" / "shared" / NOUS_SHARED_STORE_FILENAME
get_default_hermes_root() / "shared" / NOUS_SHARED_STORE_FILENAME
).resolve(strict=False)
try:
resolved = path.resolve(strict=False)
+1 -1
View File
@@ -246,7 +246,7 @@ def auth_add_command(args) -> None:
if provider == "nous":
# Codex-style auto-import: if a shared Nous credential lives at
# ~/.hermes/shared/nous_auth.json (written by any previous
# <hermes-root>/shared/nous_auth.json (written by any previous
# successful login), offer to import it instead of running the
# full device-code flow. This makes `hermes --profile <name>
# auth add nous --type oauth` a one-tap operation for users who
+3 -3
View File
@@ -573,7 +573,7 @@ def create_quick_snapshot(
"total_size": sum(manifest.values()),
"files": manifest,
}
with open(snap_dir / "manifest.json", "w") as f:
with open(snap_dir / "manifest.json", "w", encoding="utf-8") as f:
json.dump(meta, f, indent=2)
# Auto-prune
@@ -599,7 +599,7 @@ def list_quick_snapshots(
manifest_path = d / "manifest.json"
if manifest_path.exists():
try:
with open(manifest_path) as f:
with open(manifest_path, encoding="utf-8") as f:
results.append(json.load(f))
except (json.JSONDecodeError, OSError):
results.append({"id": d.name, "file_count": 0, "total_size": 0})
@@ -629,7 +629,7 @@ def restore_quick_snapshot(
if not manifest_path.exists():
return False
with open(manifest_path) as f:
with open(manifest_path, encoding="utf-8") as f:
meta = json.load(f)
restored = 0
+24 -15
View File
@@ -212,7 +212,7 @@ def get_container_exec_info() -> Optional[dict]:
try:
info = {}
with open(container_mode_file, "r") as f:
with open(container_mode_file, "r", encoding="utf-8") as f:
for line in f:
line = line.strip()
if "=" in line and not line.startswith("#"):
@@ -297,7 +297,7 @@ def _is_container() -> bool:
return True
# LXC / cgroup-based detection
try:
with open("/proc/1/cgroup", "r") as f:
with open("/proc/1/cgroup", "r", encoding="utf-8") as f:
cgroup_content = f.read()
if "docker" in cgroup_content or "lxc" in cgroup_content or "kubepods" in cgroup_content:
return True
@@ -3452,7 +3452,7 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
if not manifest_file.exists():
continue
try:
with open(manifest_file) as _mf:
with open(manifest_file, encoding="utf-8") as _mf:
manifest = yaml.safe_load(_mf) or {}
except Exception:
manifest = {}
@@ -4148,8 +4148,9 @@ def load_env() -> Dict[str, str]:
if env_path.exists():
# On Windows, open() defaults to the system locale (cp1252) which can
# fail on UTF-8 .env files. Use explicit UTF-8 only on Windows.
open_kw = {"encoding": "utf-8", "errors": "replace"} if _IS_WINDOWS else {}
# fail on UTF-8 .env files. Always use explicit UTF-8; tolerate BOM
# via utf-8-sig since users may edit .env in Notepad which adds one.
open_kw = {"encoding": "utf-8-sig", "errors": "replace"}
with open(env_path, **open_kw) as f:
raw_lines = f.readlines()
# Sanitize before parsing: split concatenated lines & drop stale
@@ -4234,8 +4235,8 @@ def sanitize_env_file() -> int:
if not env_path.exists():
return 0
read_kw = {"encoding": "utf-8", "errors": "replace"} if _IS_WINDOWS else {}
write_kw = {"encoding": "utf-8"} if _IS_WINDOWS else {}
read_kw = {"encoding": "utf-8-sig", "errors": "replace"}
write_kw = {"encoding": "utf-8"}
with open(env_path, **read_kw) as f:
original_lines = f.readlines()
@@ -4324,8 +4325,8 @@ def save_env_value(key: str, value: str):
# On Windows, open() defaults to the system locale (cp1252) which can
# cause OSError errno 22 on UTF-8 .env files.
read_kw = {"encoding": "utf-8", "errors": "replace"} if _IS_WINDOWS else {}
write_kw = {"encoding": "utf-8"} if _IS_WINDOWS else {}
read_kw = {"encoding": "utf-8-sig", "errors": "replace"}
write_kw = {"encoding": "utf-8"}
lines = []
if env_path.exists():
@@ -4394,8 +4395,8 @@ def remove_env_value(key: str) -> bool:
os.environ.pop(key, None)
return False
read_kw = {"encoding": "utf-8", "errors": "replace"} if _IS_WINDOWS else {}
write_kw = {"encoding": "utf-8"} if _IS_WINDOWS else {}
read_kw = {"encoding": "utf-8-sig", "errors": "replace"}
write_kw = {"encoding": "utf-8"}
with open(env_path, **read_kw) as f:
lines = f.readlines()
@@ -4696,11 +4697,19 @@ def edit_config():
# Find editor
editor = os.getenv('EDITOR') or os.getenv('VISUAL')
if not editor:
# Try common editors
for cmd in ['nano', 'vim', 'vi', 'code', 'notepad']:
import shutil
# Try common editors — order is platform-aware so Windows users
# land on a working editor (notepad) even without Git Bash or nano
# installed. On POSIX, prefer nano/vim over code/notepad because
# it's more likely to be present on headless / server systems.
import shutil
import sys as _sys
if _sys.platform == "win32":
candidates = ['notepad', 'code', 'vim', 'vi', 'nano']
else:
candidates = ['nano', 'vim', 'vi', 'code', 'notepad']
for cmd in candidates:
if shutil.which(cmd):
editor = cmd
break
+60 -4
View File
@@ -598,7 +598,7 @@ def run_doctor(args):
# Detect stale root-level model keys (known bug source — PR #4329)
try:
import yaml
with open(config_path) as f:
with open(config_path, encoding="utf-8") as f:
raw_config = yaml.safe_load(f) or {}
stale_root_keys = [k for k in ("provider", "base_url") if k in raw_config and isinstance(raw_config[k], str)]
if stale_root_keys:
@@ -1035,10 +1035,13 @@ def run_doctor(args):
check_ok("Node.js")
# Check if agent-browser is installed
agent_browser_path = PROJECT_ROOT / "node_modules" / "agent-browser"
agent_browser_ok = False
if agent_browser_path.exists():
check_ok("agent-browser (Node.js)", "(browser automation)")
agent_browser_ok = True
elif shutil.which("agent-browser"):
check_ok("agent-browser", "(browser automation)")
agent_browser_ok = True
else:
if _is_termux():
check_info("agent-browser is not installed (expected in the tested Termux path)")
@@ -1048,6 +1051,56 @@ def run_doctor(args):
check_info(step)
else:
check_warn("agent-browser not installed", "(run: npm install)")
# Chromium presence — the browser tools silently fail to register when
# agent-browser is found but no Playwright-managed Chromium is on disk
# (tools/browser_tool.py::check_browser_requirements filters them out
# before the agent ever sees them). Reuse the exact predicate it uses
# so the two checks cannot diverge. Skip on Termux (not a tested
# path).
if agent_browser_ok and not _is_termux():
try:
# Lazy import: browser_tool is a ~150KB module we don't want
# to eagerly load in every `hermes doctor` invocation.
from tools.browser_tool import (
_chromium_installed,
_is_camofox_mode,
_get_cloud_provider,
_get_cdp_override,
_using_lightpanda_engine,
)
except Exception:
# If browser_tool can't even import, that's a separate bug
# surfaced elsewhere; don't crash doctor.
pass
else:
# Only warn about Chromium if the installed engine actually
# requires it: Camofox, CDP override, a cloud provider, or
# Lightpanda all bypass the local Chromium requirement.
skip_chromium_check = (
_is_camofox_mode()
or bool(_get_cdp_override())
or _get_cloud_provider() is not None
or _using_lightpanda_engine()
)
if not skip_chromium_check:
if _chromium_installed():
check_ok("Playwright Chromium", "(browser engine)")
else:
check_warn(
"Playwright Chromium not installed",
"(browser_* tools will be hidden from the agent)",
)
if sys.platform == "win32":
check_info(
f"Install with: cd {PROJECT_ROOT} && "
"npx playwright install chromium"
)
else:
check_info(
f"Install with: cd {PROJECT_ROOT} && "
"npx playwright install --with-deps chromium"
)
else:
if _is_termux():
check_info("Node.js not found (browser tools are optional in the tested Termux path)")
@@ -1059,7 +1112,8 @@ def run_doctor(args):
check_warn("Node.js not found", "(optional, needed for browser tools)")
# npm audit for all Node.js packages
if _safe_which("npm"):
_npm_bin = _safe_which("npm")
if _npm_bin:
npm_dirs = [
(PROJECT_ROOT, "Browser tools (agent-browser)"),
(PROJECT_ROOT / "scripts" / "whatsapp-bridge", "WhatsApp bridge"),
@@ -1068,8 +1122,10 @@ def run_doctor(args):
if not (npm_dir / "node_modules").exists():
continue
try:
# Use resolved absolute path so Windows can execute
# npm.cmd (CreateProcessW can't run bare .cmd names).
audit_result = subprocess.run(
["npm", "audit", "--json"],
[_npm_bin, "audit", "--json"],
cwd=str(npm_dir),
capture_output=True, text=True, timeout=30,
)
@@ -1396,7 +1452,7 @@ def run_doctor(args):
import yaml as _yaml
_mem_cfg_path = HERMES_HOME / "config.yaml"
if _mem_cfg_path.exists():
with open(_mem_cfg_path) as _f:
with open(_mem_cfg_path, encoding="utf-8") as _f:
_raw_cfg = _yaml.safe_load(_f) or {}
_active_memory_provider = (_raw_cfg.get("memory") or {}).get("provider", "")
except Exception:
+1 -1
View File
@@ -113,7 +113,7 @@ def _sanitize_env_file_if_needed(path: Path) -> None:
except ImportError:
return # early bootstrap — config module not available yet
read_kw = {"encoding": "utf-8", "errors": "replace"}
read_kw = {"encoding": "utf-8-sig", "errors": "replace"}
try:
with open(path, **read_kw) as f:
original = f.readlines()
+383 -47
View File
@@ -131,9 +131,26 @@ def _get_service_pids() -> set:
def _get_parent_pid(pid: int) -> int | None:
"""Return the parent PID for ``pid``, or ``None`` when unavailable."""
"""Return the parent PID for ``pid``, or ``None`` when unavailable.
Uses psutil (core dependency) which works on every platform. The
older implementation shelled out to ``ps -o ppid= -p <pid>``, which
silently fails on Windows (no ``ps``) so the ancestor walk terminated
at self the caller's dedup / exclude logic then couldn't distinguish
"hermes CLI that invoked this scan" from "real gateway process".
"""
if pid <= 1:
return None
try:
import psutil # type: ignore
return psutil.Process(pid).ppid() or None
except ImportError:
pass
except Exception:
return None
# Fallback: shell out to ps (POSIX only — bare ``ps`` doesn't exist on Windows).
if not shutil.which("ps"):
return None
try:
result = subprocess.run(
["ps", "-o", "ppid=", "-p", str(pid)],
@@ -177,7 +194,7 @@ def _request_gateway_self_restart(pid: int) -> bool:
if not _is_pid_ancestor_of_current_process(pid):
return False
try:
os.kill(pid, signal.SIGUSR1)
os.kill(pid, signal.SIGUSR1) # windows-footgun: ok — POSIX signal, guarded by hasattr(signal, 'SIGUSR1') above
except (ProcessLookupError, PermissionError, OSError):
return False
return True
@@ -213,7 +230,7 @@ def _graceful_restart_via_sigusr1(pid: int, drain_timeout: float) -> bool:
if pid <= 0:
return False
try:
os.kill(pid, signal.SIGUSR1)
os.kill(pid, signal.SIGUSR1) # windows-footgun: ok — POSIX signal, guarded by hasattr(signal, 'SIGUSR1') above
except ProcessLookupError:
# Already gone — nothing to drain.
return True
@@ -223,15 +240,16 @@ def _graceful_restart_via_sigusr1(pid: int, drain_timeout: float) -> bool:
import time as _time
deadline = _time.monotonic() + max(drain_timeout, 1.0)
# IMPORTANT Windows note: ``os.kill(pid, 0)`` is NOT a no-op on
# Windows — Python's implementation calls ``TerminateProcess(handle, 0)``
# for sig=0, hard-killing the target. Use the cross-platform
# ``_pid_exists`` helper in gateway.status which does OpenProcess +
# WaitForSingleObject on Windows.
from gateway.status import _pid_exists
while _time.monotonic() < deadline:
try:
os.kill(pid, 0) # signal 0 — probe liveness
except ProcessLookupError:
if not _pid_exists(pid):
return True
except PermissionError:
# Process still exists but we can't signal it. Treat as alive
# so the caller falls back.
pass
_time.sleep(0.5)
# Drain didn't finish in time.
return False
@@ -299,6 +317,11 @@ def _scan_gateway_pids(exclude_pids: set[int], all_profiles: bool = False) -> li
or f"HERMES_HOME={current_home}" in command
)
# Default-profile case: no profile flag in argv. Accept as long as
# the command doesn't advertise *some other* profile. HERMES_HOME
# may be passed via env (not visible in wmic/CIM command line) so
# its absence is NOT disqualifying — only a non-matching explicit
# HERMES_HOME= in argv is.
if "--profile " in command or " -p " in command:
return False
if "HERMES_HOME=" in command and f"HERMES_HOME={current_home}" not in command:
@@ -307,14 +330,52 @@ def _scan_gateway_pids(exclude_pids: set[int], all_profiles: bool = False) -> li
try:
if is_windows():
result = subprocess.run(
["wmic", "process", "get", "ProcessId,CommandLine", "/FORMAT:LIST"],
capture_output=True,
text=True,
encoding="utf-8",
errors="ignore",
timeout=10,
)
# Prefer wmic when present (fast, stable output format). On
# modern Windows 11 / Win 10 late builds, wmic has been
# removed as part of the WMIC deprecation — fall back to
# PowerShell's Get-CimInstance. Any OSError here (FileNotFoundError
# on missing wmic) trips the fallback.
wmic_path = shutil.which("wmic")
used_fallback = False
result = None
if wmic_path is not None:
try:
result = subprocess.run(
[wmic_path, "process", "get", "ProcessId,CommandLine", "/FORMAT:LIST"],
capture_output=True,
text=True,
encoding="utf-8",
errors="ignore",
timeout=10,
)
except (OSError, subprocess.TimeoutExpired):
result = None
if result is None or result.returncode != 0 or not (result.stdout or ""):
# Fallback: PowerShell Get-CimInstance, emit LIST-style output
# so the downstream parser below doesn't need to branch.
powershell = shutil.which("powershell") or shutil.which("pwsh")
if powershell is None:
return []
ps_cmd = (
"Get-CimInstance Win32_Process | "
"ForEach-Object { "
" 'CommandLine=' + ($_.CommandLine -replace \"`r`n\",' ' -replace \"`n\",' '); "
" 'ProcessId=' + $_.ProcessId; "
" '' "
"}"
)
try:
result = subprocess.run(
[powershell, "-NoProfile", "-Command", ps_cmd],
capture_output=True,
text=True,
encoding="utf-8",
errors="ignore",
timeout=15,
)
except (OSError, subprocess.TimeoutExpired):
return []
used_fallback = True
if result.returncode != 0 or result.stdout is None:
return []
current_cmd = ""
@@ -372,9 +433,53 @@ def _scan_gateway_pids(exclude_pids: set[int], all_profiles: bool = False) -> li
except (OSError, subprocess.TimeoutExpired):
return []
# Windows-specific: collapse venv launcher stubs. A venv-built
# ``pythonw.exe`` in ``<venv>/Scripts/`` is a ~100 KB launcher exe
# that spawns the base Python (e.g. ``C:\Program Files\Python311\
# pythonw.exe``) with the same command line, preserving the venv's
# ``pyvenv.cfg`` context. This is standard Windows CPython venv
# behaviour — BUT it means every gateway run produces two pythonw
# PIDs with identical command lines (one launcher stub, one actual
# interpreter) which is confusing in ``gateway status`` output.
# Filter the stub: if a PID in our result is the PARENT of another
# PID in our result, and both are pythonw.exe, the parent is the
# launcher stub — drop it, keep the child.
if is_windows() and len(pids) > 1:
pids = _filter_venv_launcher_stubs(pids)
return pids
def _filter_venv_launcher_stubs(pids: list[int]) -> list[int]:
"""Drop venv-launcher ``pythonw.exe`` stubs that are parents of the real
interpreter process. See comment at the tail of ``_scan_gateway_pids``.
Uses ``psutil`` (core dependency). Safe on any platform; only invoked
on Windows by the caller because the stub pattern is Windows-specific.
"""
try:
import psutil # type: ignore
except ImportError:
return pids
pid_set = set(pids)
# Collect each PID's parent so we can flag "child of another matched PID".
parent_of: dict[int, int | None] = {}
for pid in pids:
try:
parent_of[pid] = psutil.Process(pid).ppid()
except (psutil.NoSuchProcess, psutil.AccessDenied):
parent_of[pid] = None
# For each child whose parent is also in our set, drop the parent.
drop: set[int] = set()
for pid, ppid in parent_of.items():
if ppid is not None and ppid in pid_set:
drop.add(ppid)
return [p for p in pids if p not in drop]
def find_gateway_pids(exclude_pids: set | None = None, all_profiles: bool = False) -> list:
"""Find PIDs of running gateway processes.
@@ -441,6 +546,25 @@ def launch_detached_profile_gateway_restart(profile: str, old_pid: int) -> bool:
if old_pid <= 0:
return False
# The watcher is a tiny Python subprocess that polls the old PID and
# respawns the gateway once it's gone. Both legs of the chain need
# platform-appropriate detach semantics:
#
# POSIX — ``start_new_session=True`` (os.setsid in the child) detaches
# from the parent's process group so Ctrl+C in the CLI doesn't
# propagate and the watcher/gateway survive the CLI exiting.
#
# Windows — ``start_new_session`` is silently accepted but does NOT
# detach. The watcher stays attached to the CLI's console and dies
# when the user closes the terminal, leaving ``hermes update`` users
# with no running gateway until they re-invoke ``hermes gateway``
# manually. The Win32 equivalent is the ``CREATE_NEW_PROCESS_GROUP |
# DETACHED_PROCESS | CREATE_NO_WINDOW`` creationflags bundle.
#
# ``windows_detach_popen_kwargs()`` returns the right kwargs for the
# host platform and is a no-op on POSIX (just ``start_new_session=True``).
from hermes_cli._subprocess_compat import windows_detach_popen_kwargs
watcher = textwrap.dedent(
"""
import os
@@ -452,28 +576,41 @@ def launch_detached_profile_gateway_restart(profile: str, old_pid: int) -> bool:
cmd = sys.argv[2:]
deadline = time.monotonic() + 120
while time.monotonic() < deadline:
try:
os.kill(pid, 0)
except ProcessLookupError:
# ``os.kill(pid, 0)`` is not a no-op on Windows — use the
# cross-platform existence check.
from gateway.status import _pid_exists
if not _pid_exists(pid):
break
except PermissionError:
pass
time.sleep(0.2)
subprocess.Popen(
cmd,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
start_new_session=True,
)
# Platform-appropriate detach for the respawned gateway. On POSIX
# start_new_session=True maps to os.setsid; on Windows we need
# explicit creationflags because start_new_session is a no-op there.
_popen_kwargs = {
"stdout": subprocess.DEVNULL,
"stderr": subprocess.DEVNULL,
}
if sys.platform == "win32":
_CREATE_NEW_PROCESS_GROUP = 0x00000200
_DETACHED_PROCESS = 0x00000008
_CREATE_NO_WINDOW = 0x08000000
_popen_kwargs["creationflags"] = (
_CREATE_NEW_PROCESS_GROUP | _DETACHED_PROCESS | _CREATE_NO_WINDOW
)
else:
_popen_kwargs["start_new_session"] = True
subprocess.Popen(cmd, **_popen_kwargs)
"""
).strip()
try:
# Same platform-aware detach for the watcher process itself — so
# closing the user's terminal doesn't kill the watcher.
subprocess.Popen(
[sys.executable, "-c", watcher, str(old_pid), *_gateway_run_args_for_profile(profile)],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
start_new_session=True,
**windows_detach_popen_kwargs(),
)
except OSError:
return False
@@ -929,14 +1066,14 @@ def stop_profile_gateway() -> bool:
print(f"⚠ Permission denied to kill PID {pid}")
return False
# Wait briefly for it to exit
# Wait briefly for it to exit. On Windows, os.kill(pid, 0) is NOT
# a no-op — route through the cross-platform existence check.
import time as _time
from gateway.status import _pid_exists
for _ in range(20):
try:
os.kill(pid, 0)
_time.sleep(0.5)
except (ProcessLookupError, PermissionError):
if not _pid_exists(pid):
break
_time.sleep(0.5)
if get_running_pid() is None:
remove_pid_file()
@@ -1120,13 +1257,13 @@ class SystemScopeRequiresRootError(RuntimeError):
def _user_dbus_socket_path() -> Path:
"""Return the expected per-user D-Bus socket path (regardless of existence)."""
xdg = os.environ.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}"
xdg = os.environ.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}" # windows-footgun: ok — POSIX systemd helper, never invoked on Windows
return Path(xdg) / "bus"
def _user_systemd_private_socket_path() -> Path:
"""Return the per-user systemd private socket path (regardless of existence)."""
xdg = os.environ.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}"
xdg = os.environ.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}" # windows-footgun: ok — POSIX systemd helper, never invoked on Windows
return Path(xdg) / "systemd" / "private"
@@ -1149,7 +1286,7 @@ def _ensure_user_systemd_env() -> None:
We detect the standard socket path and set the vars so all subsequent
subprocess calls inherit them.
"""
uid = os.getuid()
uid = os.getuid() # windows-footgun: ok — POSIX systemd helper, never invoked on Windows
if "XDG_RUNTIME_DIR" not in os.environ:
runtime_dir = f"/run/user/{uid}"
if Path(runtime_dir).exists():
@@ -1215,7 +1352,7 @@ def _preflight_user_systemd(*, auto_enable_linger: bool = True) -> None:
username,
reason="User systemd control sockets are missing even though linger is enabled.",
fix_hint=(
f" systemctl start user@{os.getuid()}.service\n"
f" systemctl start user@{os.getuid()}.service\n" # windows-footgun: ok — POSIX systemd helper, never invoked on Windows
" (may require sudo; try again after the command succeeds)"
),
)
@@ -1485,7 +1622,7 @@ def remove_legacy_hermes_units(
# System-scope removal (needs root)
if system_units:
if os.geteuid() != 0:
if os.geteuid() != 0: # windows-footgun: ok — Linux systemd removal path, guarded by `if system == "Linux"` / systemd-only branch
print()
print_warning("System-scope legacy units require root to remove.")
print_info(" Re-run with: sudo hermes gateway migrate-legacy")
@@ -1532,7 +1669,7 @@ def print_systemd_scope_conflict_warning() -> None:
def _require_root_for_system_service(action: str) -> None:
if os.geteuid() != 0:
if os.geteuid() != 0: # windows-footgun: ok — POSIX systemd helper, never invoked on Windows
raise SystemScopeRequiresRootError(
f"System gateway {action} requires root. Re-run with sudo.",
action,
@@ -1600,7 +1737,7 @@ def install_linux_gateway_from_setup(force: bool = False) -> tuple[str | None, b
if scope == "system":
run_as_user = _default_system_service_user()
if os.geteuid() != 0:
if os.geteuid() != 0: # windows-footgun: ok — Linux systemd install wizard, never invoked on Windows
print_warning(" System service install requires sudo, so Hermes can't create it from this user session.")
if run_as_user:
print_info(f" After setup, run: sudo hermes gateway install --system --run-as-user {run_as_user}")
@@ -1644,7 +1781,7 @@ def get_systemd_linger_status() -> tuple[bool | None, str]:
if not username:
try:
import pwd
username = pwd.getpwuid(os.getuid()).pw_name
username = pwd.getpwuid(os.getuid()).pw_name # windows-footgun: ok — POSIX loginctl helper, never invoked on Windows
except Exception:
return None, "could not determine current user"
@@ -1694,7 +1831,7 @@ def _launchd_user_home() -> Path:
"""
import pwd
return Path(pwd.getpwuid(os.getuid()).pw_dir)
return Path(pwd.getpwuid(os.getuid()).pw_dir) # windows-footgun: ok — POSIX launchd (macOS) helper, never invoked on Windows
def get_launchd_plist_path() -> Path:
@@ -2093,7 +2230,7 @@ def _system_scope_wizard_would_need_root(system: bool = False) -> bool:
``SystemScopeRequiresRootError`` propagate out and leave the user
staring at a bare shell.
"""
if os.geteuid() == 0:
if os.geteuid() == 0: # windows-footgun: ok — systemd scope wizard decision, never invoked on Windows
return False
return _select_systemd_scope(system=system)
@@ -2444,7 +2581,7 @@ def get_launchd_label() -> str:
def _launchd_domain() -> str:
return f"gui/{os.getuid()}"
return f"gui/{os.getuid()}" # windows-footgun: ok — POSIX launchd (macOS) helper, never invoked on Windows
def generate_launchd_plist() -> str:
@@ -2819,6 +2956,62 @@ def run_gateway(verbose: int = 0, quiet: bool = False, replace: bool = False):
_guard_official_docker_root_gateway()
sys.path.insert(0, str(PROJECT_ROOT))
# On Windows, when the gateway is launched as a detached background
# process (via ``hermes gateway install`` → Scheduled Task / Startup
# folder / direct pythonw.exe spawn) there is no console attached. In
# that case Windows can still deliver CTRL_C_EVENT / CTRL_BREAK_EVENT
# to the process group under some circumstances (e.g. when *another*
# process in the same group sends one), which Python 3.11 translates
# into KeyboardInterrupt inside asyncio.run(). The outer handler below
# catches that and exits cleanly — silently killing the gateway. On
# detached boots we must absorb those spurious signals so the gateway
# stays alive; real user Ctrl+C still comes through prompt_toolkit /
# the asyncio signal handler when running in a real console.
#
# IMPORTANT lesson (May 2026): we originally gated this on "stdin is
# NOT a TTY" assuming only detached pythonw runs would be vulnerable.
# Wrong. When the user runs `hermes gateway start` from a PowerShell
# console, the gateway inherits that console and stdin IS a TTY —
# but it's STILL vulnerable to CTRL_C_EVENT broadcast by any sibling
# `hermes` invocation (like `hermes gateway status` 30 seconds later)
# because Windows routes console events to all processes sharing the
# console. Every hermes CLI process after that sibling fires is a
# potential drive-by killer. So on Windows, for `gateway run`
# specifically (never interactive by design), always install the
# SIGINT absorber regardless of TTY state.
try:
_stdin_is_tty = bool(sys.stdin and sys.stdin.isatty())
except (ValueError, OSError):
_stdin_is_tty = False
if is_windows():
try:
signal.signal(signal.SIGINT, signal.SIG_IGN)
if hasattr(signal, "SIGBREAK"):
signal.signal(signal.SIGBREAK, signal.SIG_IGN)
except (OSError, ValueError):
# SetConsoleCtrlHandler not available (rare on Windows) —
# best-effort, proceed either way.
pass
# Python's signal module only hooks SIGINT/SIGBREAK. To also
# absorb CTRL_CLOSE_EVENT / CTRL_LOGOFF_EVENT and any other
# console control signals Windows may broadcast to the console
# process group, call the native SetConsoleCtrlHandler(NULL, TRUE)
# — this tells the kernel to IGNORE all console control events
# for this process entirely, which is what background services
# are supposed to do. Belt-and-braces over the Python-level
# handlers above.
try:
import ctypes
kernel32 = ctypes.windll.kernel32 # type: ignore[attr-defined]
# BOOL SetConsoleCtrlHandler(NULL, Add) — Add=TRUE means
# "install the NULL handler", which has the documented
# effect of ignoring Ctrl+C. Called twice for defense in
# depth: once before any Python import could have flipped
# our disposition, once as our last word.
kernel32.SetConsoleCtrlHandler(None, 1)
except (OSError, AttributeError):
pass
# Refresh the systemd unit definition on every boot so that restart
# settings (RestartSec, StartLimitIntervalSec, etc.) stay current even
# when the process was respawned via exit-code-75 (stale-code or
@@ -2846,13 +3039,86 @@ def run_gateway(verbose: int = 0, quiet: bool = False, replace: bool = False):
# Exit with code 1 if gateway fails to connect any platform,
# so systemd Restart=always will retry on transient errors
verbosity = None if quiet else verbose
# ── Exit-path diagnostics ────────────────────────────────────────────
# When the gateway dies silently on Windows (no shutdown log, no
# traceback in gateway.log / errors.log), we're usually blind to the
# cause. The code below captures *every* way the asyncio.run() call
# below can return, with full context dumped to a dedicated log so
# the next silent death yields evidence instead of a mystery. This
# is diagnostic scaffolding; cheap to keep on, costs nothing during
# normal operation, and the emitted lines are opt-in via the
# HERMES_GATEWAY_EXIT_DIAG env var (default: on while we're still
# chasing the Windows lifecycle bug).
import atexit as _atexit
import traceback as _traceback
from datetime import datetime as _dt, timezone as _tz
def _exit_diag(tag: str, **extra: object) -> None:
if os.environ.get("HERMES_GATEWAY_EXIT_DIAG", "1") != "1":
return
try:
from hermes_constants import get_hermes_home as _ghh
log_dir = _ghh() / "logs"
log_dir.mkdir(parents=True, exist_ok=True)
ts = _dt.now(_tz.utc).isoformat()
line = {
"ts": ts,
"tag": tag,
"pid": os.getpid(),
"python": sys.version.split()[0],
"platform": sys.platform,
**extra,
}
import json as _json
with open(log_dir / "gateway-exit-diag.log", "a", encoding="utf-8") as f:
f.write(_json.dumps(line, default=str) + "\n")
except Exception:
pass # never let the diagnostic itself crash the gateway
_exit_diag(
"gateway.start",
replace=replace,
argv=sys.argv,
stdin_is_tty=_stdin_is_tty,
)
def _atexit_hook() -> None:
_exit_diag("atexit.hook", sys_exc=repr(sys.exc_info()))
_atexit.register(_atexit_hook)
success = False
try:
success = asyncio.run(start_gateway(replace=replace, verbosity=verbosity))
_exit_diag("asyncio.run.returned", success=success)
except KeyboardInterrupt:
# On Windows-detached runs this shouldn't fire (we absorb SIGINT above),
# but keep the handler for console runs.
_exit_diag(
"asyncio.run.KeyboardInterrupt",
traceback=_traceback.format_exc(),
)
print("\nGateway stopped.")
return
except SystemExit as e:
_exit_diag("asyncio.run.SystemExit", code=getattr(e, "code", None),
traceback=_traceback.format_exc())
raise
except BaseException as e:
# Absolutely everything else: Exception, asyncio.CancelledError,
# even exotic BaseException subclasses. We want the cause logged.
_exit_diag(
"asyncio.run.exception",
exc_type=type(e).__name__,
exc_repr=repr(e),
traceback=_traceback.format_exc(),
)
raise
if not success:
_exit_diag("gateway.exit_nonzero")
sys.exit(1)
_exit_diag("gateway.exit_clean")
# =============================================================================
@@ -3700,6 +3966,9 @@ def _is_service_installed() -> bool:
return get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()
elif is_macos():
return get_launchd_plist_path().exists()
elif is_windows():
from hermes_cli import gateway_windows
return gateway_windows.is_installed()
return False
@@ -3741,6 +4010,12 @@ def _is_service_running() -> bool:
return result.returncode == 0
except subprocess.TimeoutExpired:
return False
elif is_windows():
from hermes_cli import gateway_windows
if gateway_windows.is_installed():
# "installed" doesn't necessarily mean "running" on Windows. The
# canonical check is whether a gateway process actually exists.
return len(find_gateway_pids()) > 0
# Check for manual processes
return len(find_gateway_pids()) > 0
@@ -4589,6 +4864,9 @@ def _gateway_command_inner(args):
systemd_install(force=force, system=system, run_as_user=run_as_user)
elif is_macos():
launchd_install(force)
elif is_windows():
from hermes_cli import gateway_windows
gateway_windows.install(force=force)
elif is_wsl():
print("WSL detected but systemd is not running.")
print("Either enable systemd (add systemd=true to /etc/wsl.conf and restart WSL)")
@@ -4625,6 +4903,9 @@ def _gateway_command_inner(args):
systemd_uninstall(system=system)
elif is_macos():
launchd_uninstall()
elif is_windows():
from hermes_cli import gateway_windows
gateway_windows.uninstall()
elif is_container():
print("Service uninstall is not applicable inside a Docker container.")
print("To stop the gateway, stop or remove the container:")
@@ -4655,6 +4936,9 @@ def _gateway_command_inner(args):
systemd_start(system=system)
elif is_macos():
launchd_start()
elif is_windows():
from hermes_cli import gateway_windows
gateway_windows.start()
elif is_wsl():
print("WSL detected but systemd is not available.")
print("Run the gateway in foreground mode instead:")
@@ -4697,6 +4981,14 @@ def _gateway_command_inner(args):
service_available = True
except subprocess.CalledProcessError:
pass
elif is_windows():
from hermes_cli import gateway_windows
if gateway_windows.is_installed():
try:
gateway_windows.stop()
service_available = True
except (subprocess.CalledProcessError, RuntimeError):
pass
killed = kill_gateway_processes(all_profiles=True)
total = killed + (1 if service_available else 0)
if total:
@@ -4718,9 +5010,17 @@ def _gateway_command_inner(args):
service_available = True
except subprocess.CalledProcessError:
pass
elif is_windows():
from hermes_cli import gateway_windows
if gateway_windows.is_installed():
try:
gateway_windows.stop()
service_available = True
except (subprocess.CalledProcessError, RuntimeError):
pass
if not service_available:
# No systemd/launchd — use profile-scoped PID file
# No systemd/launchd/schtasks service — use profile-scoped PID file
if stop_profile_gateway():
print("✓ Stopped gateway for this profile")
else:
@@ -4750,6 +5050,14 @@ def _gateway_command_inner(args):
service_stopped = True
except subprocess.CalledProcessError:
pass
elif is_windows():
from hermes_cli import gateway_windows
if gateway_windows.is_installed():
try:
gateway_windows.stop()
service_stopped = True
except (subprocess.CalledProcessError, RuntimeError):
pass
killed = kill_gateway_processes(all_profiles=True)
total = killed + (1 if service_stopped else 0)
if total:
@@ -4762,6 +5070,12 @@ def _gateway_command_inner(args):
systemd_start(system=system)
elif is_macos() and get_launchd_plist_path().exists():
launchd_start()
elif is_windows():
from hermes_cli import gateway_windows
if gateway_windows.is_installed():
gateway_windows.start()
else:
run_gateway(verbose=0)
else:
run_gateway(verbose=0)
return
@@ -4780,6 +5094,15 @@ def _gateway_command_inner(args):
service_available = True
except subprocess.CalledProcessError:
pass
elif is_windows():
from hermes_cli import gateway_windows
if gateway_windows.is_installed():
service_configured = True
try:
gateway_windows.restart()
service_available = True
except (subprocess.CalledProcessError, RuntimeError):
pass
if not service_available:
# systemd/launchd restart failed — check if linger is the issue
@@ -4822,12 +5145,20 @@ def _gateway_command_inner(args):
snapshot = get_gateway_runtime_snapshot(system=system)
# Check for service first
_windows_service_installed = False
if is_windows():
from hermes_cli import gateway_windows
_windows_service_installed = gateway_windows.is_installed()
if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
systemd_status(deep, system=system, full=full)
_print_gateway_process_mismatch(snapshot)
elif is_macos() and get_launchd_plist_path().exists():
launchd_status(deep)
_print_gateway_process_mismatch(snapshot)
elif _windows_service_installed:
from hermes_cli import gateway_windows
gateway_windows.status(deep=deep)
_print_gateway_process_mismatch(snapshot)
else:
# Check for manually running processes
pids = list(snapshot.gateway_pids)
@@ -4848,6 +5179,9 @@ def _gateway_command_inner(args):
print("WSL note:")
print(" The gateway is running in foreground/manual mode (recommended for WSL).")
print(" Use tmux or screen for persistence across terminal closes.")
elif is_windows():
print("To install as a Windows Scheduled Task (auto-start on login):")
print(" hermes gateway install")
else:
print("To install as a service:")
print(" hermes gateway install")
@@ -4868,6 +5202,8 @@ def _gateway_command_inner(args):
elif is_wsl():
print(" tmux new -s hermes 'hermes gateway run' # persistent via tmux")
print(" nohup hermes gateway run > ~/.hermes/logs/gateway.log 2>&1 & # background")
elif is_windows():
print(" hermes gateway install # Install as Windows Scheduled Task (auto-start on login)")
else:
print(" hermes gateway install # Install as user service")
print(" sudo hermes gateway install --system # Install as boot-time system service")
+689
View File
@@ -0,0 +1,689 @@
"""Windows gateway service backend (Scheduled Task + Startup-folder fallback).
This mirrors the contract exposed by ``launchd_install`` / ``launchd_start`` /
``launchd_status`` etc. on macOS and ``systemd_install`` / ``systemd_start`` on
Linux. It uses ``schtasks`` under the hood with ``/SC ONLOGON`` and restart-on-
failure XML settings, and falls back to a ``%APPDATA%\\...\\Startup\\<name>.cmd``
dropper when Scheduled Task creation is denied (locked-down corporate boxes).
Design notes
------------
* ``schtasks /Create /SC ONLOGON /RL LIMITED`` means the task runs at the
CURRENT USER's next logon without any elevation prompt. We also
``schtasks /Run`` immediately after install so the gateway starts right
away without waiting for the next logon.
* We write two files: a shared ``gateway.cmd`` wrapper script (cwd + env + the
actual ``python -m hermes_cli.main gateway run --replace`` invocation) and
EITHER a schtasks entry pointing at it OR a Startup-folder ``.cmd`` that
spawns it detached.
* Status = merge of "is the schtasks entry registered?" + "is the startup
.cmd present?" + "is there a gateway process running?" so the status
command keeps working regardless of which install path was taken.
* Quoting is tricky: schtasks parses ``/TR`` itself and cmd.exe parses the
generated ``gateway.cmd``. Those are DIFFERENT parsers. We keep two
separate quote helpers (same pattern OpenClaw uses) and never cross them.
* All of this is Windows-only. ``import`` paths are still safe on POSIX but
the functions raise if called on non-Windows.
"""
from __future__ import annotations
import os
import re
import shlex
import shutil
import subprocess
import sys
import time
from pathlib import Path
# Short timeouts: schtasks occasionally wedges and we don't want to hang forever.
_SCHTASKS_TIMEOUT_S = 15
_SCHTASKS_NO_OUTPUT_TIMEOUT_S = 30
# Patterns in schtasks stderr that mean "fall back to the Startup folder".
_FALLBACK_PATTERNS = re.compile(
r"(access is denied|acceso denegado|schtasks timed out|schtasks produced no output)",
re.IGNORECASE,
)
_TASK_NAME_DEFAULT = "Hermes_Gateway"
_TASK_DESCRIPTION = "Hermes Agent Gateway - Messaging Platform Integration"
# ---------------------------------------------------------------------------
# Platform guard
# ---------------------------------------------------------------------------
def _assert_windows() -> None:
if sys.platform != "win32":
raise RuntimeError("gateway_windows is Windows-only")
# ---------------------------------------------------------------------------
# Quoting helpers (two DIFFERENT parsers — do not mix)
# ---------------------------------------------------------------------------
def _quote_cmd_script_arg(value: str) -> str:
"""Quote a single argument for use INSIDE a .cmd file, for cmd.exe parsing.
cmd.exe splits on spaces/tabs outside of double quotes. Embedded quotes
are doubled. We also refuse line breaks because they'd terminate the
logical command line mid-script.
"""
if "\r" in value or "\n" in value:
raise ValueError(f"refusing to quote value containing newline: {value!r}")
if not value:
return '""'
if not re.search(r'[ \t"]', value):
return value
return '"' + value.replace('"', '""') + '"'
def _quote_schtasks_arg(value: str) -> str:
"""Quote a single argument for schtasks.exe's /TR parser.
Schtasks uses a different quoting convention than cmd.exe: embedded
quotes are backslash-escaped, and the whole thing is wrapped in double
quotes if it contains whitespace or quotes.
"""
if not re.search(r'[ \t"]', value):
return value
return '"' + value.replace('"', '\\"') + '"'
# ---------------------------------------------------------------------------
# schtasks.exe wrapper
# ---------------------------------------------------------------------------
def _exec_schtasks(args: list[str]) -> tuple[int, str, str]:
"""Run ``schtasks.exe`` with a hard timeout. Return (code, stdout, stderr).
If schtasks wedges, returns code=124 with a synthetic stderr string
same convention OpenClaw uses, so the fallback detection regex matches.
"""
_assert_windows()
schtasks = shutil.which("schtasks")
if schtasks is None:
return (1, "", "schtasks.exe not found on PATH")
try:
proc = subprocess.run(
[schtasks, *args],
capture_output=True,
text=True,
timeout=_SCHTASKS_TIMEOUT_S,
# CREATE_NO_WINDOW avoids a flashing console window when the CLI
# is itself hosted in a TUI. See tools/browser_tool.py for the
# same pattern and the windows-subprocess-sigint-storm.md ref.
creationflags=0x08000000, # CREATE_NO_WINDOW
)
return (proc.returncode, proc.stdout or "", proc.stderr or "")
except subprocess.TimeoutExpired:
return (124, "", f"schtasks timed out after {_SCHTASKS_TIMEOUT_S}s")
except OSError as e:
return (1, "", f"schtasks invocation failed: {e}")
def _should_fall_back(code: int, detail: str) -> bool:
return code == 124 or bool(_FALLBACK_PATTERNS.search(detail or ""))
# ---------------------------------------------------------------------------
# Paths: where we stash our task script and where Startup lives
# ---------------------------------------------------------------------------
def get_task_name() -> str:
"""Scheduled Task name, scoped per profile.
Default profile: ``Hermes_Gateway``
Named profile X: ``Hermes_Gateway_<X>``
"""
_assert_windows()
# Local import to avoid circular module initialization during hermes_cli boot.
from hermes_cli.gateway import _profile_suffix
suffix = _profile_suffix()
if not suffix:
return _TASK_NAME_DEFAULT
return f"{_TASK_NAME_DEFAULT}_{suffix}"
def _sanitize_filename(value: str) -> str:
"""Remove characters illegal in Windows filenames."""
return re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", value)
def get_task_script_path() -> Path:
"""The generated ``gateway.cmd`` wrapper that the schtasks entry invokes.
Lives under ``%LOCALAPPDATA%\\hermes\\gateway-service\\<task_name>.cmd``
(or ``<HERMES_HOME>/gateway-service/<task_name>.cmd`` so per-profile
Hermes installs stay self-contained).
"""
_assert_windows()
from hermes_cli.config import get_hermes_home
script_dir = Path(get_hermes_home()) / "gateway-service"
script_dir.mkdir(parents=True, exist_ok=True)
return script_dir / f"{_sanitize_filename(get_task_name())}.cmd"
def _startup_dir() -> Path:
appdata = os.environ.get("APPDATA", "").strip()
if appdata:
return Path(appdata) / "Microsoft" / "Windows" / "Start Menu" / "Programs" / "Startup"
userprofile = os.environ.get("USERPROFILE", "").strip() or os.environ.get("HOME", "").strip()
if not userprofile:
raise RuntimeError("neither APPDATA nor USERPROFILE is set — cannot resolve Startup folder")
return (
Path(userprofile)
/ "AppData"
/ "Roaming"
/ "Microsoft"
/ "Windows"
/ "Start Menu"
/ "Programs"
/ "Startup"
)
def get_startup_entry_path() -> Path:
_assert_windows()
return _startup_dir() / f"{_sanitize_filename(get_task_name())}.cmd"
# ---------------------------------------------------------------------------
# Script rendering
# ---------------------------------------------------------------------------
def _build_gateway_cmd_script(
python_path: str,
working_dir: str,
hermes_home: str,
profile_arg: str,
) -> str:
"""Build the ``gateway.cmd`` wrapper content (CRLF-terminated).
The script:
- cd's into the project directory
- exports HERMES_HOME, PYTHONIOENCODING, VIRTUAL_ENV
- invokes ``python -m hermes_cli.main [--profile X] gateway run --replace``
We intentionally do NOT inline PATH overrides here cmd.exe inherits
the per-user PATH the Scheduled Task was created with, and forcibly
rewriting PATH tends to break Homebrew/nvm-style installations.
"""
lines = ["@echo off", f"rem {_TASK_DESCRIPTION}"]
lines.append(f"cd /d {_quote_cmd_script_arg(working_dir)}")
lines.append(f'set "HERMES_HOME={hermes_home}"')
lines.append('set "PYTHONIOENCODING=utf-8"')
# VIRTUAL_ENV lets the gateway's own python detection find the venv
# if someone imports hermes_constants-based logic during startup.
venv_dir = str(Path(python_path).resolve().parent.parent)
lines.append(f'set "VIRTUAL_ENV={venv_dir}"')
prog_args = [python_path, "-m", "hermes_cli.main"]
if profile_arg:
prog_args.extend(profile_arg.split())
prog_args.extend(["gateway", "run", "--replace"])
lines.append(" ".join(_quote_cmd_script_arg(a) for a in prog_args))
return "\r\n".join(lines) + "\r\n"
def _build_startup_launcher(script_path: Path) -> str:
"""The tiny .cmd that goes in the Startup folder. Just minimizes and chains."""
lines = [
"@echo off",
f"rem {_TASK_DESCRIPTION}",
# ``start "" /min`` detaches with a minimized console window.
# ``/d /c`` on cmd.exe skips AUTORUN and runs the target script once.
f'start "" /min cmd.exe /d /c {_quote_cmd_script_arg(str(script_path))}',
]
return "\r\n".join(lines) + "\r\n"
def _write_task_script() -> Path:
"""Generate and write the gateway.cmd wrapper. Return its absolute path."""
_assert_windows()
# Local imports to avoid circular-init at module load time.
from hermes_cli.config import get_hermes_home
from hermes_cli.gateway import (
PROJECT_ROOT,
_profile_arg,
get_python_path,
)
python_path = get_python_path()
working_dir = str(PROJECT_ROOT)
hermes_home = str(Path(get_hermes_home()).resolve())
profile_arg = _profile_arg(hermes_home)
content = _build_gateway_cmd_script(python_path, working_dir, hermes_home, profile_arg)
script_path = get_task_script_path()
script_path.write_text(content, encoding="utf-8", newline="")
return script_path
# ---------------------------------------------------------------------------
# Install / uninstall
# ---------------------------------------------------------------------------
def _resolve_task_user() -> str | None:
"""Return ``DOMAIN\\USER`` if available, else bare USERNAME, else None."""
username = os.environ.get("USERNAME") or os.environ.get("USER") or os.environ.get("LOGNAME")
if not username:
return None
if "\\" in username:
return username
domain = os.environ.get("USERDOMAIN")
return f"{domain}\\{username}" if domain else username
def _install_scheduled_task(task_name: str, script_path: Path) -> tuple[bool, str]:
"""Create or update the Scheduled Task. Returns (success, detail)."""
quoted_script = _quote_schtasks_arg(str(script_path))
# First try /Change in case the task already exists — keeps the existing
# trigger + settings intact and just repoints /TR.
change_code, _out, change_err = _exec_schtasks(
["/Change", "/TN", task_name, "/TR", quoted_script]
)
if change_code == 0:
return (True, f"Updated existing Scheduled Task {task_name!r}")
# Create fresh. Start with the "current user, interactive, no stored
# password" variant; if that fails, retry without /RU /NP /IT.
base = [
"/Create",
"/F",
"/SC",
"ONLOGON",
"/RL",
"LIMITED",
"/TN",
task_name,
"/TR",
quoted_script,
]
user = _resolve_task_user()
variants = []
if user:
variants.append([*base, "/RU", user, "/NP", "/IT"])
variants.append(base)
last_code = 1
last_err = ""
for argv in variants:
code, out, err = _exec_schtasks(argv)
if code == 0:
return (True, f"Created Scheduled Task {task_name!r}")
last_code, last_err = code, (err or out or "")
return (False, f"schtasks /Create failed (code {last_code}): {last_err.strip()}")
def _install_startup_entry(script_path: Path) -> Path:
"""Write the Startup-folder fallback launcher. Returns its path."""
entry = get_startup_entry_path()
entry.parent.mkdir(parents=True, exist_ok=True)
entry.write_text(_build_startup_launcher(script_path), encoding="utf-8", newline="")
return entry
def _derive_venv_pythonw(python_exe: str) -> str:
"""Given a ``python.exe`` path, return the sibling ``pythonw.exe`` if present.
``pythonw.exe`` is the console-less variant. Using it for detached
daemons means there's no console handle to inherit from the spawning
shell, which is what lets the gateway survive a parent-shell exit on
Windows. Falls back to the original ``python.exe`` if the ``w`` variant
isn't there — caller must still set CREATE_NO_WINDOW in that case.
"""
p = Path(python_exe)
candidate = p.with_name(p.stem + "w" + p.suffix)
if candidate.exists():
return str(candidate)
return python_exe
def _build_gateway_argv() -> tuple[list[str], str, dict[str, str]]:
"""Build (argv, working_dir, env_overlay) for the gateway subprocess.
Same logical command as what gateway.cmd runs, but assembled as a
native argv for direct ``subprocess.Popen`` invocation no cmd.exe
layer in between.
"""
_assert_windows()
from hermes_cli.config import get_hermes_home
from hermes_cli.gateway import (
PROJECT_ROOT,
_profile_arg,
get_python_path,
)
python_exe = _derive_venv_pythonw(get_python_path())
working_dir = str(PROJECT_ROOT)
hermes_home = str(Path(get_hermes_home()).resolve())
profile_arg = _profile_arg(hermes_home)
argv = [python_exe, "-m", "hermes_cli.main"]
if profile_arg:
argv.extend(profile_arg.split())
argv.extend(["gateway", "run", "--replace"])
env_overlay = {
"HERMES_HOME": hermes_home,
"PYTHONIOENCODING": "utf-8",
"VIRTUAL_ENV": str(Path(python_exe).resolve().parent.parent),
}
return argv, working_dir, env_overlay
def _spawn_detached(script_path: Path | None = None) -> int:
"""Launch the gateway as a fully detached background process.
We spawn ``pythonw.exe -m hermes_cli.main gateway run --replace``
directly NOT through a cmd.exe shim because on Windows a cmd.exe
child inherits the parent session's console handle and tends to get
reaped when the spawning shell exits. pythonw.exe has no console, and
combined with DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP |
CREATE_NO_WINDOW + DEVNULL stdio + a fresh env, the resulting process
is independent of whichever shell started it.
Arg ``script_path`` is accepted for API symmetry with older callers
but ignored we don't need it now that we go direct.
Returns the spawned PID so callers can verify the process actually
came up.
"""
_assert_windows()
argv, working_dir, env_overlay = _build_gateway_argv()
# Inherit PATH etc. from the current env, overlay our required vars.
env = {**os.environ, **env_overlay}
# DETACHED_PROCESS 0x00000008 — no console attached to child
# CREATE_NEW_PROCESS_GROUP 0x00000200 — child gets its own group, won't
# receive Ctrl+C from our group
# CREATE_NO_WINDOW 0x08000000 — belt-and-braces no-console flag
# CREATE_BREAKAWAY_FROM_JOB 0x01000000 — escape any job object the
# parent is in (prevents parent-
# job teardown from reaping us;
# some Windows Terminal versions
# wrap their children in a job).
flags = 0x00000008 | 0x00000200 | 0x08000000 | 0x01000000
# Redirect any stray stdout/stderr output to a sidecar log. Python's
# logging module writes to gateway.log through a FileHandler, so the
# real gateway logs still land there — this just captures anything
# that goes to print() or native stderr.
from hermes_cli.config import get_hermes_home
log_dir = Path(get_hermes_home()) / "logs"
log_dir.mkdir(parents=True, exist_ok=True)
stray_log = log_dir / "gateway-stdio.log"
try:
with open(stray_log, "ab", buffering=0) as log_fh:
proc = subprocess.Popen(
argv,
cwd=working_dir,
env=env,
creationflags=flags,
close_fds=True,
stdin=subprocess.DEVNULL,
stdout=log_fh,
stderr=log_fh,
)
except OSError:
# CREATE_BREAKAWAY_FROM_JOB can fail with "access denied" when the
# parent's job object doesn't permit breakaway (some Windows
# Terminal configs). Retry without the breakaway flag — in most
# setups pythonw.exe + DETACHED_PROCESS is enough on its own.
flags_no_breakaway = flags & ~0x01000000
with open(stray_log, "ab", buffering=0) as log_fh:
proc = subprocess.Popen(
argv,
cwd=working_dir,
env=env,
creationflags=flags_no_breakaway,
close_fds=True,
stdin=subprocess.DEVNULL,
stdout=log_fh,
stderr=log_fh,
)
return proc.pid
def install(force: bool = False) -> None:
"""Install the gateway as a Windows Scheduled Task (with Startup fallback).
Idempotent: re-running updates the task to point at the current python/
project paths. ``force`` is accepted for API parity with ``launchd_install``
/ ``systemd_install`` but isn't needed — we always reconcile.
"""
_assert_windows()
task_name = get_task_name()
script_path = _write_task_script()
ok, detail = _install_scheduled_task(task_name, script_path)
if ok:
print(f"{detail}")
print(f" Task script: {script_path}")
# Start it now so the user doesn't have to log off/on.
run_code, _out, run_err = _exec_schtasks(["/Run", "/TN", task_name])
if run_code == 0:
_report_gateway_start("Scheduled Task")
else:
# Scheduled Task was created but /Run failed (e.g. the task's
# action is malformed). Spawn directly as a backstop.
pid = _spawn_detached(script_path)
_report_gateway_start(
f"direct spawn (PID {pid}; schtasks /Run said: {run_err.strip()})"
)
_print_next_steps()
return
# schtasks create didn't work. See if it's a "fall back to startup" case.
if _should_fall_back(1, detail):
print(f"↻ Scheduled Task install blocked ({detail.splitlines()[0]}) — using Startup folder fallback")
entry = _install_startup_entry(script_path)
pid = _spawn_detached(script_path)
print(f"✓ Installed Windows login item: {entry}")
print(f" Task script: {script_path}")
_report_gateway_start(f"direct spawn (PID {pid})")
_print_next_steps()
return
# Unknown schtasks error — surface it and bail.
raise RuntimeError(f"Windows gateway install failed: {detail}")
def _wait_for_gateway_ready(timeout_s: float = 6.0, interval_s: float = 0.4) -> list[int]:
"""Poll for a live gateway process for up to ``timeout_s`` seconds.
Returns the list of PIDs found. Empty list means nothing came up in
time the caller should surface that to the user as a failed start.
"""
from hermes_cli.gateway import find_gateway_pids
deadline = time.time() + timeout_s
while time.time() < deadline:
pids = list(find_gateway_pids())
if pids:
return pids
time.sleep(interval_s)
return []
def _report_gateway_start(via: str) -> None:
pids = _wait_for_gateway_ready()
if pids:
print(f"✓ Gateway started via {via} (PID: {', '.join(map(str, pids))})")
else:
print(f"⚠ Launched gateway via {via}, but no process detected after 6s.")
print(" Check the log for startup errors:")
from hermes_cli.config import get_hermes_home
print(f" type {Path(get_hermes_home()).resolve()}\\logs\\gateway.log")
print(f" type {Path(get_hermes_home()).resolve()}\\logs\\gateway-stdio.log")
def _print_next_steps() -> None:
from hermes_cli.config import get_hermes_home
hermes_home = Path(get_hermes_home()).resolve()
print()
print("Next steps:")
print(" hermes gateway status # Check status")
print(f" type {hermes_home}\\logs\\gateway.log # View logs")
def uninstall() -> None:
"""Remove both the Scheduled Task and the Startup-folder fallback, if present."""
_assert_windows()
task_name = get_task_name()
script_path = get_task_script_path()
startup_entry = get_startup_entry_path()
if is_task_registered():
code, _out, err = _exec_schtasks(["/Delete", "/F", "/TN", task_name])
if code == 0:
print(f"✓ Removed Scheduled Task {task_name!r}")
else:
print(f"⚠ schtasks /Delete returned code {code}: {err.strip()}")
for path, label in [(startup_entry, "Windows login item"), (script_path, "Task script")]:
try:
path.unlink()
print(f"✓ Removed {label}: {path}")
except FileNotFoundError:
pass
# ---------------------------------------------------------------------------
# Status / start / stop / restart
# ---------------------------------------------------------------------------
def is_task_registered() -> bool:
code, _out, _err = _exec_schtasks(["/Query", "/TN", get_task_name()])
return code == 0
def is_startup_entry_installed() -> bool:
return get_startup_entry_path().exists()
def is_installed() -> bool:
"""True when either the schtasks entry or the Startup fallback is present."""
return is_task_registered() or is_startup_entry_installed()
def query_task_status() -> dict[str, str]:
"""Parse ``schtasks /Query /V /FO LIST`` and pull the interesting keys."""
code, out, err = _exec_schtasks(["/Query", "/TN", get_task_name(), "/V", "/FO", "LIST"])
if code != 0:
return {}
info: dict[str, str] = {}
for raw in out.splitlines():
line = raw.strip()
if not line or ":" not in line:
continue
key, _, value = line.partition(":")
key = key.strip().lower()
value = value.strip()
# Some Windows locales emit "Last Result" instead of "Last Run Result".
if key in {"status", "last run time", "last run result", "last result"}:
if key == "last result":
info.setdefault("last run result", value)
else:
info[key] = value
return info
def _gateway_pids() -> list[int]:
"""Reuse the cross-platform PID scanner in gateway.py."""
from hermes_cli.gateway import find_gateway_pids
return list(find_gateway_pids())
def status(deep: bool = False) -> None:
"""Print a status report for the Windows gateway service."""
_assert_windows()
task_name = get_task_name()
task_installed = is_task_registered()
startup_installed = is_startup_entry_installed()
pids = _gateway_pids()
if task_installed:
print(f"✓ Scheduled Task registered: {task_name}")
info = query_task_status()
if info:
for key in ("status", "last run time", "last run result"):
if key in info:
print(f" {key.title()}: {info[key]}")
elif startup_installed:
print(f"✓ Windows login item installed: {get_startup_entry_path()}")
else:
print("✗ Gateway service not installed")
if pids:
print(f"✓ Gateway process running (PID: {', '.join(map(str, pids))})")
else:
print("✗ No gateway process detected")
if deep:
print()
print(f" Task name: {task_name}")
print(f" Task script: {get_task_script_path()}")
print(f" Startup entry: {get_startup_entry_path()}")
if not task_installed and not startup_installed and not pids:
print()
print("To install:")
print(" hermes gateway install")
def start() -> None:
"""Start the gateway. Prefers /Run on the scheduled task if present."""
_assert_windows()
if is_task_registered():
code, _out, err = _exec_schtasks(["/Run", "/TN", get_task_name()])
if code == 0:
_report_gateway_start(f"Scheduled Task {get_task_name()!r}")
return
print(f"⚠ schtasks /Run failed (code {code}): {err.strip()} — falling back to direct spawn")
# Direct spawn — no script_path needed with the new argv-based spawner.
pid = _spawn_detached()
_report_gateway_start(f"direct spawn (PID {pid})")
def stop() -> None:
"""Stop the gateway. Tries /End on the scheduled task, then kills any stragglers."""
_assert_windows()
from hermes_cli.gateway import kill_gateway_processes
stopped_any = False
if is_task_registered():
code, _out, err = _exec_schtasks(["/End", "/TN", get_task_name()])
# schtasks returns nonzero when the task isn't currently running — don't treat that as an error.
if code == 0:
stopped_any = True
elif "not running" not in (err or "").lower():
print(f"⚠ schtasks /End returned code {code}: {err.strip()}")
killed = kill_gateway_processes(all_profiles=False)
if killed:
stopped_any = True
print(f"✓ Killed {killed} gateway process(es)")
if stopped_any:
print("✓ Gateway stopped")
else:
print("✗ No gateway was running")
def restart() -> None:
"""Stop the gateway then start it again."""
_assert_windows()
stop()
# Give Windows a moment to release the listening port.
time.sleep(1.0)
start()
+1 -1
View File
@@ -205,7 +205,7 @@ def _cmd_test(args) -> None:
if getattr(args, "payload_file", None):
try:
custom = json.loads(Path(args.payload_file).read_text())
custom = json.loads(Path(args.payload_file).read_text(encoding="utf-8"))
if isinstance(custom, dict):
payload.update(custom)
else:
+40 -29
View File
@@ -2805,12 +2805,18 @@ def _classify_worker_exit(pid: int) -> "tuple[str, Optional[int]]":
def _pid_alive(pid: Optional[int]) -> bool:
"""Return True if ``pid`` is still running on this host.
Cross-platform: uses ``os.kill(pid, 0)`` on POSIX and ``OpenProcess``
on Windows. Returns False for falsy PIDs or on any OS error.
Cross-platform: uses ``OpenProcess`` + ``WaitForSingleObject`` on
Windows (via ``gateway.status._pid_exists``) and ``os.kill(pid, 0)``
on POSIX. Returns False for falsy PIDs or on any OS error.
**Zombie handling:** ``os.kill(pid, 0)`` succeeds against
zombie processes (post-exit, pre-reap) because the process table
entry still exists. A worker that exits without being reaped by its
**DO NOT** use ``os.kill(pid, 0)`` directly on Windows Python's
Windows ``os.kill`` treats ``sig=0`` as ``CTRL_C_EVENT`` (bpo-14484)
and will broadcast it to the target's console group, potentially
killing unrelated processes.
**Zombie handling:** the existence check succeeds against zombie
processes (post-exit, pre-reap) because the process table entry
still exists. A worker that exits without being reaped by its
parent would stay "alive" to the dispatcher forever. Dispatcher
workers are started via ``start_new_session=True`` + intentional
Popen handle abandonment, so init reaps them quickly but during
@@ -2821,21 +2827,14 @@ def _pid_alive(pid: Optional[int]) -> bool:
"""
if not pid or pid <= 0:
return False
try:
if hasattr(os, "kill"):
os.kill(int(pid), 0)
except ProcessLookupError:
from gateway.status import _pid_exists
if not _pid_exists(int(pid)):
return False
except PermissionError:
# Process exists, we just can't signal it.
return True
except OSError:
return False
# Still here → kill(0) succeeded. Check for zombie on platforms
# Still here → process exists. Check for zombie on platforms
# where we have a cheap, deterministic process-state probe.
if sys.platform == "linux":
try:
with open(f"/proc/{int(pid)}/status", "r") as f:
with open(f"/proc/{int(pid)}/status", "r", encoding="utf-8") as f:
for line in f:
if line.startswith("State:"):
# "State:\tZ (zombie)" → dead
@@ -2911,7 +2910,10 @@ def _terminate_reclaimed_worker(
if _pid_alive(pid):
try:
kill(int(pid), signal.SIGKILL)
# signal.SIGKILL doesn't exist on Windows; fall back to SIGTERM
# (which maps to TerminateProcess via the stdlib shim).
_sigkill = getattr(signal, "SIGKILL", signal.SIGTERM)
kill(int(pid), _sigkill)
info["sigkill"] = True
except (ProcessLookupError, OSError):
return info
@@ -3035,7 +3037,9 @@ def enforce_max_runtime(
time.sleep(0.5)
if _pid_alive(pid):
try:
kill(pid, signal.SIGKILL)
# signal.SIGKILL doesn't exist on Windows.
_sigkill = getattr(signal, "SIGKILL", signal.SIGTERM)
kill(pid, _sigkill)
killed = True
except (ProcessLookupError, OSError):
pass
@@ -3514,17 +3518,24 @@ def dispatch_once(
# cleanly without calling ``kanban_complete`` / ``kanban_block``
# (protocol violation — auto-block) from a real crash (OOM killer,
# SIGKILL, non-zero exit — existing counter behavior).
try:
while True:
try:
_pid, _status = os.waitpid(-1, os.WNOHANG)
except ChildProcessError:
break
if _pid == 0:
break
_record_worker_exit(_pid, _status)
except Exception:
pass
#
# Windows has no zombies / no os.WNOHANG — subprocess.Popen handles
# are freed when the Python object is garbage-collected or .wait() is
# called explicitly. The kanban dispatcher discards the Popen handle
# after spawn (``_default_spawn`` → abandon), so on Windows there's
# nothing to reap here — skip the whole block.
if os.name != "nt":
try:
while True:
try:
_pid, _status = os.waitpid(-1, os.WNOHANG)
except ChildProcessError:
break
if _pid == 0:
break
_record_worker_exit(_pid, _status)
except Exception:
pass
result = DispatchResult()
result.reclaimed = release_stale_claims(conn)
+38 -10
View File
@@ -43,6 +43,24 @@ Usage:
hermes claw migrate --dry-run # Preview migration without changes
"""
# IMPORTANT: hermes_bootstrap must be the very first import — it sets up
# UTF-8 stdio on Windows so print()/subprocess children don't hit
# UnicodeEncodeError with non-ASCII characters. No-op on POSIX.
#
# Guarded against ModuleNotFoundError because ``hermes_bootstrap`` is a
# top-level module registered via pyproject.toml's ``py-modules`` list.
# When the user upgrades code via ``git pull`` (or ``hermes update``
# crashes between ``git reset --hard`` and ``uv pip install -e .``), the
# new code references ``hermes_bootstrap`` but the editable install's
# ``.pth`` file still points at the old set of top-level modules. Without
# this guard, hermes crashes on import and the user can't run
# ``hermes update`` to recover. Missing the bootstrap means UTF-8 stdio
# setup is skipped on Windows — degraded, not broken. POSIX is unaffected.
try:
import hermes_bootstrap # noqa: F401
except ModuleNotFoundError:
pass
import argparse
import json
import os
@@ -5782,16 +5800,14 @@ def _kill_stale_dashboard_processes(
while pending and _time.monotonic() < deadline:
_time.sleep(0.1)
still_pending = []
# On Windows, os.kill(pid, 0) is NOT a no-op. Route through
# the cross-platform existence check.
from gateway.status import _pid_exists
for pid in pending:
try:
os.kill(pid, 0) # probe
except ProcessLookupError:
killed.append(pid)
except (PermissionError, OSError):
# Can't probe — assume still there.
if _pid_exists(pid):
still_pending.append(pid)
else:
still_pending.append(pid)
killed.append(pid)
pending = still_pending
# SIGKILL any survivors.
@@ -6835,7 +6851,7 @@ def _ensure_fhs_path_guard() -> None:
if sys.platform != "linux":
return
try:
if os.geteuid() != 0:
if os.geteuid() != 0: # windows-footgun: ok — Linux FHS helper, guarded by sys.platform == "linux" above + AttributeError catch
return
except AttributeError:
return
@@ -7965,10 +7981,15 @@ def _cmd_update_impl(args, gateway_mode: bool):
print(
f"{len(_stuck)} gateway process(es) ignored SIGTERM — force-killing"
)
from gateway.status import terminate_pid as _terminate_pid
for pid in _stuck:
try:
os.kill(pid, _signal.SIGKILL)
except (ProcessLookupError, PermissionError):
# Routes through taskkill /T /F on Windows,
# SIGKILL on POSIX — _signal.SIGKILL doesn't
# exist on Windows so the old raw os.kill call
# used to crash the entire update path.
_terminate_pid(pid, force=True)
except (ProcessLookupError, PermissionError, OSError):
pass
# Give the OS a beat to reap the processes so the
# watchers see them exit and respawn.
@@ -8554,6 +8575,13 @@ def _build_provider_choices() -> list[str]:
def main():
"""Main entry point for hermes CLI."""
# Force UTF-8 stdio on Windows before anything prints. No-op elsewhere.
try:
from hermes_cli.stdio import configure_windows_stdio
configure_windows_stdio()
except Exception:
pass
from hermes_cli._parser import build_top_level_parser
parser, subparsers, chat_parser = build_top_level_parser()
+2 -2
View File
@@ -69,7 +69,7 @@ def _install_dependencies(provider_name: str) -> None:
try:
import yaml
with open(yaml_path) as f:
with open(yaml_path, encoding="utf-8") as f:
meta = yaml.safe_load(f) or {}
except Exception:
return
@@ -377,7 +377,7 @@ def _write_env_vars(env_path: Path, env_writes: dict) -> None:
if key not in updated_keys:
new_lines.append(f"{key}={val}")
env_path.write_text("\n".join(new_lines) + "\n")
env_path.write_text("\n".join(new_lines) + "\n", encoding="utf-8")
# ---------------------------------------------------------------------------
+2 -2
View File
@@ -173,7 +173,7 @@ def _read_disk_cache() -> tuple[dict[str, Any] | None, float]:
except (OSError, FileNotFoundError):
return (None, 0.0)
try:
with open(path) as fh:
with open(path, encoding="utf-8") as fh:
data = json.load(fh)
except (OSError, json.JSONDecodeError):
return (None, 0.0)
@@ -187,7 +187,7 @@ def _write_disk_cache(data: dict[str, Any]) -> None:
try:
path.parent.mkdir(parents=True, exist_ok=True)
tmp = path.with_suffix(path.suffix + ".tmp")
with open(tmp, "w") as fh:
with open(tmp, "w", encoding="utf-8") as fh:
json.dump(data, fh, indent=2)
fh.write("\n")
atomic_replace(tmp, path)
+1 -1
View File
@@ -174,7 +174,7 @@ def run_oneshot(
# Redirect stderr AND stdout to devnull for the entire call tree.
# We'll print the final response to the real stdout at the end.
real_stdout = sys.stdout
devnull = open(os.devnull, "w")
devnull = open(os.devnull, "w", encoding="utf-8")
try:
with redirect_stdout(devnull), redirect_stderr(devnull):
+1 -1
View File
@@ -870,7 +870,7 @@ class PluginManager:
if yaml is None:
logger.warning("PyYAML not installed cannot load %s", manifest_file)
return None
data = yaml.safe_load(manifest_file.read_text()) or {}
data = yaml.safe_load(manifest_file.read_text(encoding="utf-8")) or {}
name = data.get("name", plugin_dir.name)
key = f"{prefix}/{plugin_dir.name}" if prefix else name
+2 -2
View File
@@ -127,7 +127,7 @@ def _read_manifest(plugin_dir: Path) -> dict:
try:
import yaml
with open(manifest_file) as f:
with open(manifest_file, encoding="utf-8") as f:
return yaml.safe_load(f) or {}
except Exception as e:
logger.warning("Failed to read plugin.yaml in %s: %s", plugin_dir, e)
@@ -703,7 +703,7 @@ def _discover_all_plugins() -> list:
description = ""
if yaml:
try:
with open(manifest_file) as f:
with open(manifest_file, encoding="utf-8") as f:
manifest = yaml.safe_load(f) or {}
name = manifest.get("name", d.name)
version = manifest.get("version", "")
+14 -9
View File
@@ -354,7 +354,7 @@ def _read_config_model(profile_dir: Path) -> tuple:
return None, None
try:
import yaml
with open(config_path, "r") as f:
with open(config_path, "r", encoding="utf-8") as f:
cfg = yaml.safe_load(f) or {}
model_cfg = cfg.get("model", {})
if isinstance(model_cfg, str):
@@ -758,7 +758,6 @@ def _cleanup_gateway_service(name: str, profile_dir: Path) -> None:
def _stop_gateway_process(profile_dir: Path) -> None:
"""Stop a running gateway process via its PID file."""
import signal as _signal
import time as _time
pid_file = profile_dir / "gateway.pid"
@@ -769,19 +768,25 @@ def _stop_gateway_process(profile_dir: Path) -> None:
raw = pid_file.read_text().strip()
data = json.loads(raw) if raw.startswith("{") else {"pid": int(raw)}
pid = int(data["pid"])
os.kill(pid, _signal.SIGTERM)
# Wait up to 10s for graceful shutdown
# Route through terminate_pid so Windows uses the appropriate
# primitive (taskkill / TerminateProcess) — raw os.kill with
# _signal.SIGKILL raises AttributeError at import time on Windows,
# and raw os.kill with SIGTERM doesn't cascade to child processes
# the same way taskkill /T does.
from gateway.status import terminate_pid as _terminate_pid
from gateway.status import _pid_exists
_terminate_pid(pid) # graceful first
# Wait up to 10s for graceful shutdown. On Windows, os.kill(pid, 0)
# is NOT a no-op — use the handle-based existence check.
for _ in range(20):
_time.sleep(0.5)
try:
os.kill(pid, 0)
except ProcessLookupError:
if not _pid_exists(pid):
print(f"✓ Gateway stopped (PID {pid})")
return
# Force kill
try:
os.kill(pid, _signal.SIGKILL)
except ProcessLookupError:
_terminate_pid(pid, force=True)
except (ProcessLookupError, OSError):
pass
print(f"✓ Gateway force-stopped (PID {pid})")
except (ProcessLookupError, PermissionError):
+9 -6
View File
@@ -7,11 +7,14 @@ keystrokes can be fed back in. The only caller today is the
Design constraints:
* **POSIX-only.** Hermes Agent supports Windows exclusively via WSL, which
exposes a native POSIX PTY via ``openpty(3)``. Native Windows Python
has no PTY; :class:`PtyUnavailableError` is raised with a user-readable
install/platform message so the dashboard can render a banner instead of
crashing.
* **POSIX-only.** This module depends on ``fcntl``, ``termios``, and
``ptyprocess``, none of which exist on native Windows Python. Native
Windows ConPTY is a different API (Windows 10 build 17763+) and would
need a separate Windows implementation (``pywinpty``) that's tracked
as a future enhancement. On native Windows, importing this module
raises :class:`ImportError` and the dashboard's ``/chat`` tab shows a
WSL-recommended banner instead of crashing. Every other feature in the
dashboard (sessions, jobs, metrics, config editor) works natively.
* **Zero Node dependency on the server side.** We use :mod:`ptyprocess`,
which is a pure-Python wrapper around the OS calls. The browser talks
to the same ``hermes --tui`` binary it would launch from the CLI, so
@@ -210,7 +213,7 @@ class PtyBridge:
# SIGHUP is the conventional "your terminal went away" signal.
# We escalate if the child ignores it.
for sig in (signal.SIGHUP, signal.SIGTERM, signal.SIGKILL):
for sig in (signal.SIGHUP, signal.SIGTERM, signal.SIGKILL): # windows-footgun: ok — POSIX-only module (imports fcntl/termios/ptyprocess at top)
if not self._proc.isalive():
break
try:
+60 -4
View File
@@ -84,18 +84,34 @@ def resolve_hermes_bin() -> Optional[str]:
1. ``sys.argv[0]`` if it resolves to a real executable.
2. ``shutil.which("hermes")`` on PATH.
3. ``None`` caller should fall back to ``python -m hermes_cli.main``.
Windows note: ``os.access(path, os.X_OK)`` returns True for ``.py`` and
``.pyc`` files on Windows (the OS treats anything listed in PATHEXT as
executable, and Python files are often registered there). But
``subprocess.run([script.py, ...])`` can't actually execute a .py
directly CreateProcessW needs a real .exe, not a script associated
with the Python launcher. On Windows we therefore skip the argv[0]
fast-path when it points at a .py file and fall through to either
``hermes.exe`` on PATH or the ``sys.executable -m hermes_cli.main``
fallback.
"""
argv0 = sys.argv[0]
_is_windows = sys.platform == "win32"
def _is_python_script(p: str) -> bool:
return p.lower().endswith((".py", ".pyc"))
# Absolute path to an executable (covers nix store, venv wrappers, etc.)
if os.path.isabs(argv0) and os.path.isfile(argv0) and os.access(argv0, os.X_OK):
return argv0
if not (_is_windows and _is_python_script(argv0)):
return argv0
# Relative path — resolve against CWD
if not argv0.startswith("-") and os.path.isfile(argv0):
abs_path = os.path.abspath(argv0)
if os.access(abs_path, os.X_OK):
return abs_path
if not (_is_windows and _is_python_script(abs_path)):
return abs_path
# PATH lookup
path_bin = shutil.which("hermes")
@@ -142,8 +158,48 @@ def relaunch(
preserve_inherited: bool = True,
original_argv: Optional[Sequence[str]] = None,
) -> None:
"""Replace the current process with a fresh hermes invocation."""
"""Replace the current process with a fresh hermes invocation.
On POSIX we use ``os.execvp`` which replaces the running process with
the new one in place same PID, no double-fork. That's what the
relaunch contract wants: "run hermes again as if the user had typed
the new argv".
Windows has no native exec semantics ``os.execvp`` on Windows
*emulates* exec by spawning the child and exiting the parent, but
only works when the target is a real Win32 executable. Our target
is usually ``hermes.exe`` (a Python console-script shim that wraps
``python -m hermes_cli.main``) or a ``.cmd`` batch file, and both
raise ``OSError(8, "Exec format error")`` on Windows' execvp.
The Windows-correct pattern is: spawn the child with ``subprocess.run``
(which routes through ``cmd.exe`` via ``shell=False`` + PATHEXT resolution),
wait for it to exit, then propagate its exit code via ``sys.exit``.
That's functionally equivalent — the user sees "hermes exited, then
new hermes started" — just with two PIDs in play instead of one.
"""
new_argv = build_relaunch_argv(
extra_args, preserve_inherited=preserve_inherited, original_argv=original_argv
)
os.execvp(new_argv[0], new_argv)
if sys.platform == "win32":
# Windows: subprocess + exit, because execvp can't swap to .cmd/.exe shims.
import subprocess
try:
result = subprocess.run(new_argv)
sys.exit(result.returncode)
except KeyboardInterrupt:
sys.exit(130)
except OSError as exc:
# Surface a helpful error rather than the raw OSError — the
# caller used to see ``[Errno 8] Exec format error`` which is
# cryptic. Common causes: ``hermes`` not on PATH yet (install
# hasn't propagated User PATH into this shell) or a stale shim.
print(
f"\nHermes relaunch failed: {exc}\n"
f"Command: {' '.join(new_argv)}\n"
f"Fix: open a new terminal so PATH picks up, then re-run hermes.",
file=sys.stderr,
)
sys.exit(1)
else:
os.execvp(new_argv[0], new_argv)
+2 -2
View File
@@ -1257,7 +1257,7 @@ def do_snapshot_export(output_path: str, console: Optional[Console] = None) -> N
sys.stdout.write(payload)
else:
out = Path(output_path)
out.write_text(payload)
out.write_text(payload, encoding="utf-8")
c.print(f"[bold green]Snapshot exported:[/] {out}")
c.print(f"[dim]{len(installed)} skill(s), {len(tap_list)} tap(s)[/]\n")
@@ -1274,7 +1274,7 @@ def do_snapshot_import(input_path: str, force: bool = False,
return
try:
snapshot = json.loads(inp.read_text())
snapshot = json.loads(inp.read_text(encoding="utf-8"))
except json.JSONDecodeError:
c.print(f"[bold red]Error:[/] Invalid JSON in {inp}\n")
return
+252
View File
@@ -0,0 +1,252 @@
"""Windows-safe stdio configuration.
On Windows, Python's ``sys.stdout``/``sys.stderr`` default to the console's
active code page (often ``cp1252``, sometimes ``cp437``, occasionally ``cp932``
on Japanese locales, etc.). Hermes's banners, tool output feed, and slash
command listings all contain Unicode: box-drawing characters (````),
mathematical and geometric symbols (`` ``), and user-supplied
text in any language. Printing those to a cp1252 console raises
``UnicodeEncodeError: 'charmap' codec can't encode character…`` and kills the
whole CLI before the REPL even opens.
The fix is to force UTF-8 on the Python side and also flip the console's
code page to UTF-8 (65001). Both matter: Python-level only helps when
Python's stdout is a real TTY; code-page flipping lets subprocesses and
child Python ``print()`` calls agree on encoding.
This module is a no-op on every non-Windows platform, and idempotent.
Entry points (``cli.py`` ``main``, ``hermes_cli/main.py`` CLI dispatch,
``gateway/run.py`` startup) call :func:`configure_windows_stdio` exactly
once early in startup.
Patterns cribbed from Claude Code (``src/utils/platform.ts``), OpenCode
(``packages/opencode/src/pty/index.ts`` env injection), and OpenAI Codex
(``codex-rs/core/src/unified_exec/process_manager.rs``). None of those
actually flip the console code page they rely on their runtime (Node or
Rust) writing UTF-16 to the Win32 console API and letting the terminal
sort it out. Python doesn't get that luxury.
"""
from __future__ import annotations
import os
import sys
__all__ = ["configure_windows_stdio", "is_windows"]
_CONFIGURED = False
def is_windows() -> bool:
"""Return True iff running on native Windows (not WSL)."""
return sys.platform == "win32"
def _flip_console_code_page_to_utf8() -> None:
"""Set the attached console's input and output code pages to UTF-8.
Uses ``SetConsoleCP`` / ``SetConsoleOutputCP`` via ``ctypes``. Failure
is silent if there's no attached console (e.g. Hermes is running
behind a redirected stdout, under a service, or inside a PTY-less CI
runner) these calls simply return 0 and we move on.
CP_UTF8 is 65001.
"""
try:
import ctypes
kernel32 = ctypes.windll.kernel32 # type: ignore[attr-defined]
# Best-effort; if there's no console attached these just fail silently.
kernel32.SetConsoleCP(65001)
kernel32.SetConsoleOutputCP(65001)
except Exception:
# ctypes import, missing kernel32, or non-Windows — any failure here
# is non-fatal. We've still reconfigured Python's own streams below.
pass
def _reconfigure_stream(stream, *, encoding: str = "utf-8", errors: str = "replace") -> None:
"""Reconfigure a text stream to UTF-8 in place.
Uses ``TextIOWrapper.reconfigure`` (Python 3.7+). If the stream isn't
a ``TextIOWrapper`` (e.g. it's been redirected to an ``io.StringIO``
during tests), we skip rather than blow up.
"""
try:
reconfigure = getattr(stream, "reconfigure", None)
if reconfigure is None:
return
reconfigure(encoding=encoding, errors=errors)
except Exception:
pass
def configure_windows_stdio() -> bool:
"""Force UTF-8 stdio on Windows. No-op elsewhere.
Idempotent safe to call multiple times from different entry points.
Returns ``True`` if anything was actually changed, ``False`` on
non-Windows or on a repeat call.
Set ``HERMES_DISABLE_WINDOWS_UTF8=1`` in the environment to opt out
(for diagnosing encoding-related bugs by forcing the old cp1252 path).
Also sets a sensible default ``EDITOR`` on Windows if none is already
set see :func:`_default_windows_editor`.
"""
global _CONFIGURED
if _CONFIGURED:
return False
if not is_windows():
# Mark configured so repeated calls on POSIX are true no-ops.
_CONFIGURED = True
return False
if os.environ.get("HERMES_DISABLE_WINDOWS_UTF8") in ("1", "true", "True", "yes"):
_CONFIGURED = True
return False
# Encourage every child Python process spawned by the agent to also use
# UTF-8 for its stdio. PYTHONIOENCODING wins over the locale-based
# default in subprocesses. Don't override an explicit user setting.
os.environ.setdefault("PYTHONIOENCODING", "utf-8")
# PYTHONUTF8 = 1 enables UTF-8 Mode globally for any Python subprocess
# (PEP 540). Again, don't override an explicit setting.
os.environ.setdefault("PYTHONUTF8", "1")
# Set EDITOR to a working Windows default if neither EDITOR nor VISUAL
# is set. prompt_toolkit's ``open_in_editor`` falls back to POSIX-only
# paths (``/usr/bin/nano``, ``/usr/bin/vi``) that don't exist on
# Windows — Ctrl+X Ctrl+E and ``/edit`` silently do nothing there
# otherwise. This happens even with full Git for Windows installed,
# so it's not a MinGit-specific issue.
_default_editor = _default_windows_editor()
if _default_editor and not os.environ.get("EDITOR") and not os.environ.get("VISUAL"):
os.environ["EDITOR"] = _default_editor
# Augment PATH with the Hermes-managed Git install directories so
# subprocess calls (bash, rg, grep, etc.) resolve even in sessions
# that started before the User PATH broadcast reached them. When
# install.ps1 adds these to User PATH via SetEnvironmentVariable,
# already-running shells don't see the change — which means hermes
# launched from the install session won't find rg / bash / grep
# even though they're "installed". Prepending the known paths here
# closes that gap. No-op when the paths don't exist (e.g. system-Git
# install without Hermes-managed PortableGit).
_augment_path_with_known_tools()
# Flip the console code page first so that any subprocess that
# inherits the console (e.g. a launched shell) also sees CP_UTF8.
_flip_console_code_page_to_utf8()
# Reconfigure Python's own stdio wrappers so ``print()`` calls from
# this process round-trip emoji / box-drawing / non-Latin text.
# ``errors="replace"`` means a genuinely unencodable byte sequence
# gets a ``?`` rather than crashing the interpreter — we prefer
# degraded output over a stack trace.
_reconfigure_stream(sys.stdout)
_reconfigure_stream(sys.stderr)
# stdin is re-configured for completeness; Hermes's interactive
# input path uses prompt_toolkit which manages its own encoding,
# but batch/pipe input benefits from UTF-8 decoding on stdin too.
_reconfigure_stream(sys.stdin)
_CONFIGURED = True
return True
def _default_windows_editor() -> str:
"""Return a Windows-appropriate default for ``$EDITOR``.
Priority order, first match wins:
1. ``notepad`` ships with every Windows install, no deps, works as a
blocking editor (``subprocess.call(["notepad", file])`` blocks until
the user closes the window). This is the "always-works" default.
The prompt_toolkit buffer's ``open_in_editor`` and Hermes's
``hermes config edit`` both honour ``$EDITOR``. Users who prefer a
different editor can override:
- VSCode: ``$env:EDITOR = "code --wait"`` (``--wait`` is critical;
without it the editor returns immediately and any input is lost)
- Notepad++: ``$env:EDITOR = "'C:\\Program Files\\Notepad++\\notepad++.exe' -multiInst -nosession"``
- Neovim: ``$env:EDITOR = "nvim"`` (if installed)
Set this before launching Hermes (User env var in Windows Settings, or
export in a PowerShell profile) and Hermes picks it up automatically.
"""
import shutil
# notepad.exe is always in %SystemRoot%\System32 on Windows, so shutil.which
# will reliably find it. Return the bare name so prompt_toolkit's shlex
# split doesn't trip over a path containing spaces.
if shutil.which("notepad"):
return "notepad"
# On the extreme off-chance notepad is missing (WinPE, Nano Server), fall
# back to nothing and let prompt_toolkit's silent no-op do its thing.
return ""
def _augment_path_with_known_tools() -> None:
"""Prepend well-known Hermes-managed tool directories to os.environ['PATH'].
Fixes the "User PATH was just updated but my process can't see it" gap on
Windows. When install.ps1 runs, it adds entries like
``%LOCALAPPDATA%\\hermes\\git\\bin`` to the User PATH via
``SetEnvironmentVariable(..., "User")``. That write propagates to newly
*spawned* processes only already-running shells (including the one the
user invokes ``hermes`` from right after install) retain their old PATH.
Any subprocess Hermes spawns bash, ``rg``, ``grep``, ``npm`` inherits
that stale PATH and reports commands as missing even though they're on
disk. Symptom: ``search_files`` reports "rg/find not available" when
the user clearly just installed ripgrep.
Patch-up strategy: add the known Hermes-managed tool directories to our
PATH at startup so subprocess calls resolve correctly. No-op on POSIX
and when the directories don't exist. The User PATH broadcast still
happens in the background for future shells; this just smooths over
the first-launch gap.
"""
if not is_windows():
return
import shutil as _shutil
local_appdata = os.environ.get("LOCALAPPDATA", "")
if not local_appdata:
return
# Known tool dirs installed by scripts/install.ps1. Kept in sync with
# the PATH entries that installer adds to User scope — the two lists
# should match so this prefill fully mirrors what a fresh shell would
# see on next launch.
candidate_dirs = [
os.path.join(local_appdata, "hermes", "git", "cmd"),
os.path.join(local_appdata, "hermes", "git", "bin"),
os.path.join(local_appdata, "hermes", "git", "usr", "bin"),
# Hermes venv Scripts directory — host of the hermes.exe shim itself,
# also where any pip-installed console scripts land. Usually already
# on PATH when the user invokes hermes, but harmless to include.
os.path.join(local_appdata, "hermes", "hermes-agent", "venv", "Scripts"),
# WinGet packages directory — where ``winget install`` drops CLI
# shims by default (ripgrep lands here as rg.exe). Covers the case
# of a system-Git install + ripgrep-via-winget that isn't yet on
# the spawning shell's PATH.
os.path.join(local_appdata, "Microsoft", "WinGet", "Links"),
]
existing = os.environ.get("PATH", "")
existing_lower = {p.lower() for p in existing.split(os.pathsep) if p}
prepend = []
for d in candidate_dirs:
if os.path.isdir(d) and d.lower() not in existing_lower:
prepend.append(d)
if prepend:
os.environ["PATH"] = os.pathsep.join([*prepend, existing])
+1 -1
View File
@@ -54,7 +54,7 @@ TIPS = [
"Combine multiple references: \"Review @file:main.py and @file:test.py for consistency.\"",
# --- Keybindings ---
"Alt+Enter (or Ctrl+J) inserts a newline for multi-line input.",
"Alt+Enter inserts a newline for multi-line input. (Windows Terminal intercepts Alt+Enter — use Ctrl+Enter instead.)",
"Ctrl+C interrupts the agent. Double-press within 2 seconds to force exit.",
"Ctrl+Z suspends Hermes to the background — run fg in your shell to resume.",
"Tab accepts auto-suggestion ghost text or autocompletes slash commands.",
+9 -3
View File
@@ -509,8 +509,12 @@ def _run_post_setup(post_setup_key: str):
if not node_modules.exists() and npm_bin:
_print_info(" Installing Node.js dependencies for browser tools...")
import subprocess
# Use the resolved npm_bin absolute path so subprocess.Popen can
# execute npm.cmd on Windows (CreateProcessW otherwise rejects
# batch shims). On POSIX npm_bin is the plain path — same
# behaviour as before.
result = subprocess.run(
["npm", "install", "--silent"],
[npm_bin, "install", "--silent"],
capture_output=True, text=True, cwd=str(PROJECT_ROOT)
)
if result.returncode == 0:
@@ -609,11 +613,13 @@ def _run_post_setup(post_setup_key: str):
elif post_setup_key == "camofox":
camofox_dir = PROJECT_ROOT / "node_modules" / "@askjo" / "camofox-browser"
if not camofox_dir.exists() and shutil.which("npm"):
_npm_bin = shutil.which("npm")
if not camofox_dir.exists() and _npm_bin:
_print_info(" Installing Camofox browser server...")
import subprocess
# Absolute npm path so .cmd shim executes on Windows.
result = subprocess.run(
["npm", "install", "--silent"],
[_npm_bin, "install", "--silent"],
capture_output=True, text=True, cwd=str(PROJECT_ROOT)
)
if result.returncode == 0:
+208 -9
View File
@@ -118,12 +118,13 @@ def remove_wrapper_script():
def uninstall_gateway_service():
"""Stop and uninstall the gateway service (systemd, launchd) and kill any
standalone gateway processes.
"""Stop and uninstall the gateway service (systemd, launchd, Windows
Scheduled Task / Startup folder) and kill any standalone gateway processes.
Delegates to the gateway module which handles:
- Linux: user + system systemd services (with proper DBUS env setup)
- macOS: launchd plists
- Windows: Scheduled Task + Startup-folder fallback, via ``gateway_windows``
- All platforms: standalone ``hermes gateway run`` processes
- Termux/Android: skips systemd (no systemd on Android), still kills standalone processes
"""
@@ -167,7 +168,7 @@ def uninstall_gateway_service():
scope = "system" if is_system else "user"
try:
if is_system and os.geteuid() != 0:
if is_system and os.geteuid() != 0: # windows-footgun: ok — Linux systemd uninstall path, guarded by `if system == "Linux"` above
log_warn(f"System gateway service exists at {unit_path} "
f"but needs sudo to remove")
continue
@@ -201,9 +202,163 @@ def uninstall_gateway_service():
except Exception as e:
log_warn(f"Could not remove launchd gateway service: {e}")
# 4. Windows: uninstall Scheduled Task + Startup-folder entry. The
# gateway_windows module already knows how to locate and remove both
# code paths (schtasks /Delete + .cmd unlink) and how to stop any
# running detached pythonw gateway process. We call into it so the
# uninstall logic stays in exactly one place.
elif system == "Windows":
try:
from hermes_cli import gateway_windows
if gateway_windows.is_installed() or gateway_windows.is_task_registered() \
or gateway_windows.is_startup_entry_installed():
try:
gateway_windows.stop()
except Exception as e:
log_warn(f"Could not stop Windows gateway cleanly: {e}")
try:
gateway_windows.uninstall()
log_success("Removed Windows gateway (Scheduled Task + Startup entry)")
stopped_something = True
except Exception as e:
log_warn(f"Could not fully uninstall Windows gateway: {e}")
except Exception as e:
log_warn(f"Could not check Windows gateway service: {e}")
return stopped_something
# ============================================================================
# Windows-specific uninstall helpers
# ============================================================================
#
# The installer (``scripts/install.ps1``) does four Windows-only things that
# ``remove_path_from_shell_configs`` / ``remove_wrapper_script`` don't cover:
#
# 1. Sets User-scope env vars ``HERMES_HOME`` and ``HERMES_GIT_BASH_PATH``
# via ``[Environment]::SetEnvironmentVariable(..., "User")``. These
# don't live in ~/.bashrc — they're in the Windows registry at
# HKCU\Environment.
# 2. Prepends to User-scope ``PATH`` (same registry location) entries
# like ``%LOCALAPPDATA%\hermes\git\cmd``, ``%LOCALAPPDATA%\hermes\git\bin``,
# ``%LOCALAPPDATA%\hermes\git\usr\bin``, ``%LOCALAPPDATA%\hermes\node``.
# Again not in any rc file — only accessible via the registry or the
# .NET [Environment] API.
# 3. Downloads PortableGit to ``%LOCALAPPDATA%\hermes\git\`` and Node to
# ``%LOCALAPPDATA%\hermes\node\`` as user-scoped, isolated copies.
# These are ~200MB combined and serve no purpose after uninstall.
# 4. On the ``hermes dashboard`` + gateway paths, drops files into
# ``%LOCALAPPDATA%\hermes\gateway-service\`` and sometimes
# ``%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup\`` — the
# latter is handled by ``gateway_windows.uninstall()`` already.
#
# Running a PowerShell one-liner per operation is overkill and fragile on
# locked-down machines (Constrained Language Mode, restricted ExecutionPolicy).
# Direct registry writes via ``winreg`` work without spawning any subprocess
# and apply immediately for new shells (SendMessage WM_SETTINGCHANGE would
# be nicer but requires ctypes and buys us nothing — the user will log out
# or open a new terminal anyway).
def _hermes_path_markers(hermes_home: Path) -> list[str]:
"""Path-entry substrings that identify Hermes-owned User-PATH entries."""
root = str(hermes_home).rstrip("\\/")
# Match on prefix so sub-entries (git\cmd, git\bin, git\usr\bin, node, etc.)
# all get swept. Also match the bare hermes-agent install dir.
markers = [root + "\\hermes-agent", root + "\\git", root + "\\node", root + "\\venv"]
# Also match if HERMES_HOME was customised to somewhere else — find-and-nuke
# any entry whose path component contains "hermes". We don't want to catch
# unrelated entries like "chermes-foo" or "ephermeral", so we look for
# backslash-hermes as a word-ish boundary.
return markers
def remove_path_from_windows_registry(hermes_home: Path) -> list[str]:
"""Strip Hermes-owned entries from User-scope PATH in the registry.
Returns the list of removed path entries. Operates on HKCU\\Environment,
same key the installer wrote to via ``[Environment]::SetEnvironmentVariable``.
"""
try:
import winreg
except ImportError:
return [] # not on Windows, nothing to do
removed: list[str] = []
key_path = "Environment"
try:
with winreg.OpenKey(winreg.HKEY_CURRENT_USER, key_path, 0,
winreg.KEY_READ | winreg.KEY_WRITE) as key:
try:
path_value, path_type = winreg.QueryValueEx(key, "Path")
except FileNotFoundError:
return []
# Preserve REG_EXPAND_SZ vs REG_SZ so unexpanded %VARS% survive.
entries = [e for e in path_value.split(";") if e]
markers = _hermes_path_markers(hermes_home)
kept: list[str] = []
for entry in entries:
entry_norm = entry.rstrip("\\/")
matched = any(entry_norm.lower().startswith(m.lower()) for m in markers)
if matched:
removed.append(entry)
else:
kept.append(entry)
if removed:
new_value = ";".join(kept)
winreg.SetValueEx(key, "Path", 0, path_type, new_value)
except OSError as e:
log_warn(f"Could not edit User PATH in registry: {e}")
return removed
def remove_hermes_env_vars_windows() -> list[str]:
"""Delete HERMES_HOME and HERMES_GIT_BASH_PATH from User-scope env vars."""
try:
import winreg
except ImportError:
return []
removed: list[str] = []
try:
with winreg.OpenKey(winreg.HKEY_CURRENT_USER, "Environment", 0,
winreg.KEY_READ | winreg.KEY_WRITE) as key:
for name in ("HERMES_HOME", "HERMES_GIT_BASH_PATH"):
try:
winreg.QueryValueEx(key, name)
except FileNotFoundError:
continue
try:
winreg.DeleteValue(key, name)
removed.append(name)
except OSError as e:
log_warn(f"Could not delete {name} from User env: {e}")
except OSError as e:
log_warn(f"Could not open User Environment key: {e}")
return removed
def remove_portable_tooling_windows(hermes_home: Path) -> list[Path]:
"""Delete PortableGit and Node installs the Windows installer created under
``%LOCALAPPDATA%\\hermes\\``. Only called on full uninstall; they're
isolated from any system Git / Node so they cannot break other tools."""
removed: list[Path] = []
for sub in ("git", "node", "gateway-service"):
target = hermes_home / sub
if target.exists():
try:
shutil.rmtree(target, ignore_errors=False)
removed.append(target)
except Exception as e:
log_warn(f"Could not remove {target}: {e}")
return removed
def _is_windows() -> bool:
import sys
return sys.platform == "win32"
def _is_default_hermes_home(hermes_home: Path) -> bool:
"""Return True when ``hermes_home`` points at the default (non-profile) root."""
try:
@@ -400,14 +555,36 @@ def run_uninstall(args):
if not uninstall_gateway_service():
log_info("No gateway service or processes found")
# 2. Remove PATH entries from shell configs
# 2. Remove PATH entries from shell configs (POSIX) AND from the Windows
# User-scope registry. Both helpers no-op on the wrong platform so we
# can safely call them unconditionally.
log_info("Removing PATH entries from shell configs...")
removed_configs = remove_path_from_shell_configs()
if removed_configs:
for config in removed_configs:
log_success(f"Updated {config}")
else:
log_info("No PATH entries found to remove")
log_info("No PATH entries found to remove in shell rc files")
if _is_windows():
log_info("Removing PATH entries from Windows User environment...")
# Expand %LOCALAPPDATA% etc. in hermes_home so the marker matching is
# against fully resolved paths — installer writes literal strings
# like C:\Users\<u>\AppData\Local\hermes\git\cmd, not %LOCALAPPDATA%.
removed_path_entries = remove_path_from_windows_registry(Path(os.path.expandvars(str(hermes_home))))
if removed_path_entries:
for entry in removed_path_entries:
log_success(f"Removed from User PATH: {entry}")
else:
log_info("No Hermes-owned PATH entries in User environment")
log_info("Removing HERMES_HOME / HERMES_GIT_BASH_PATH User env vars...")
removed_env = remove_hermes_env_vars_windows()
if removed_env:
for name in removed_env:
log_success(f"Removed User env var: {name}")
else:
log_info("No Hermes-set User env vars to remove")
# 3. Remove wrapper script
log_info("Removing hermes command...")
@@ -436,6 +613,21 @@ def run_uninstall(args):
except Exception as e:
log_warn(f"Could not fully remove {project_root}: {e}")
log_info("You may need to manually remove it")
# 4b. Remove Windows-only installer artifacts that are NOT user data:
# PortableGit, bundled Node, gateway-service dir. Installer put them
# under HERMES_HOME but they're install tooling, not config — safe to
# remove even in "keep data" mode. If we're doing a full uninstall
# the step-5 rmtree(hermes_home) would sweep them anyway; calling
# this helper there is a no-op since they'll already be gone.
if _is_windows():
log_info("Removing Windows installer artifacts (PortableGit, Node, gateway-service)...")
removed_artifacts = remove_portable_tooling_windows(hermes_home)
if removed_artifacts:
for path in removed_artifacts:
log_success(f"Removed {path}")
else:
log_info("No Windows installer artifacts to remove")
# 5. Optionally remove ~/.hermes/ data directory (and named profiles)
if full_uninstall:
@@ -471,11 +663,18 @@ def run_uninstall(args):
print(f" {hermes_home}/")
print()
print("To reinstall later with your existing settings:")
print(color(" curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash", Colors.DIM))
if _is_windows():
print(color(" irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex", Colors.DIM))
else:
print(color(" curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash", Colors.DIM))
print()
print(color("Reload your shell to complete the process:", Colors.YELLOW))
print(" source ~/.bashrc # or ~/.zshrc")
if _is_windows():
print(color("Open a new terminal (PowerShell / Windows Terminal) to pick up", Colors.YELLOW))
print(color("the updated User PATH and environment variables.", Colors.YELLOW))
else:
print(color("Reload your shell to complete the process:", Colors.YELLOW))
print(" source ~/.bashrc # or ~/.zshrc")
print()
print("Thank you for using Hermes Agent! ⚕")
print()
+27 -2
View File
@@ -692,7 +692,7 @@ def _tail_lines(path: Path, n: int) -> List[str]:
if not path.exists():
return []
try:
text = path.read_text(errors="replace")
text = path.read_text(encoding="utf-8", errors="replace")
except OSError:
return []
lines = text.splitlines()
@@ -2979,7 +2979,20 @@ async def get_models_analytics(days: int = 30):
import re
import asyncio
from hermes_cli.pty_bridge import PtyBridge, PtyUnavailableError
# PTY bridge is POSIX-only (depends on fcntl/termios/ptyprocess). On native
# Windows the import raises; catch and leave PtyBridge=None so the rest of
# the dashboard (sessions, jobs, metrics, config editor) still loads and the
# /api/pty endpoint cleanly refuses with a WSL-suggested message.
try:
from hermes_cli.pty_bridge import PtyBridge, PtyUnavailableError
_PTY_BRIDGE_AVAILABLE = True
except ImportError as _pty_import_err: # pragma: no cover - Windows-only path
PtyBridge = None # type: ignore[assignment]
_PTY_BRIDGE_AVAILABLE = False
class PtyUnavailableError(RuntimeError): # type: ignore[no-redef]
"""Stub on platforms where pty_bridge can't be imported."""
pass
_RESIZE_RE = re.compile(rb"\x1b\[RESIZE:(\d+);(\d+)\]")
_PTY_READ_CHUNK_TIMEOUT = 0.2
@@ -3113,6 +3126,18 @@ async def pty_ws(ws: WebSocket) -> None:
await ws.accept()
# On native Windows, the POSIX PTY bridge can't be imported. Tell the
# client and close cleanly rather than pretending the feature works.
if not _PTY_BRIDGE_AVAILABLE:
await ws.send_text(
"\r\n\x1b[31mChat unavailable: the embedded terminal requires a "
"POSIX PTY, which native Windows Python doesn't provide.\x1b[0m\r\n"
"\x1b[33mInstall Hermes inside WSL2 to use the dashboard's /chat "
"tab — the rest of the dashboard works here.\x1b[0m\r\n"
)
await ws.close(code=1011)
return
# --- spawn PTY ------------------------------------------------------
resume = ws.query_params.get("resume") or None
channel = _channel_or_close_code(ws)
+2 -2
View File
@@ -233,7 +233,7 @@ def is_wsl() -> bool:
if _wsl_detected is not None:
return _wsl_detected
try:
with open("/proc/version", "r") as f:
with open("/proc/version", "r", encoding="utf-8") as f:
_wsl_detected = "microsoft" in f.read().lower()
except Exception:
_wsl_detected = False
@@ -260,7 +260,7 @@ def is_container() -> bool:
_container_detected = True
return True
try:
with open("/proc/1/cgroup", "r") as f:
with open("/proc/1/cgroup", "r", encoding="utf-8") as f:
cgroup = f.read()
if "docker" in cgroup or "podman" in cgroup or "/lxc/" in cgroup:
_container_detected = True
+1 -1
View File
@@ -50,7 +50,7 @@ def _resolve_timezone_name() -> str:
import yaml
config_path = get_config_path()
if config_path.exists():
with open(config_path) as f:
with open(config_path, encoding="utf-8") as f:
cfg = yaml.safe_load(f) or {}
tz_cfg = cfg.get("timezone", "")
if isinstance(tz_cfg, str) and tz_cfg.strip():
@@ -4,6 +4,7 @@ description: Delegate coding tasks to Blackbox AI CLI agent. Multi-model agent w
version: 1.0.0
author: Hermes Agent (Nous Research)
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Coding-Agent, Blackbox, Multi-Agent, Judge, Multi-Model]
@@ -4,6 +4,7 @@ description: Configure and use Honcho memory with Hermes -- cross-session user m
version: 2.0.0
author: Hermes Agent
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Honcho, Memory, Profiles, Observation, Dialectic, User-Modeling, Session-Summary]
+1
View File
@@ -4,6 +4,7 @@ description: Query Base (Ethereum L2) blockchain data with USD pricing — walle
version: 0.1.0
author: youssefea
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Base, Blockchain, Crypto, Web3, RPC, DeFi, EVM, L2, Ethereum]
@@ -4,6 +4,7 @@ description: Query Solana blockchain data with USD pricing — wallet balances,
version: 0.2.0
author: Deniz Alagoz (gizdusum), enhanced by Hermes Agent
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Solana, Blockchain, Crypto, Web3, RPC, DeFi, NFT]
@@ -8,6 +8,7 @@ description: >
and one concrete recommendation with definition of done and implementation plan.
Use when the user asks for a "1-3-1", says "give me options", or needs help
choosing between competing approaches.
platforms: [linux, macos, windows]
version: 1.0.0
author: Willard Moore
license: MIT
@@ -5,6 +5,7 @@ version: 1.0.0
requires: Blender 4.3+ (desktop instance required, headless not supported)
author: alireza78a
tags: [blender, 3d, animation, modeling, bpy, mcp]
platforms: [linux, macos, windows]
---
# Blender MCP
@@ -5,6 +5,7 @@ version: 0.1.0
author: v1k22 (original PR), ported into hermes-agent
license: MIT
dependencies: []
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [diagrams, svg, visualization, education, physics, chemistry, engineering]
@@ -4,6 +4,7 @@ description: Create HTML-based video compositions, animated title cards, social
version: 1.0.0
author: heygen-com
license: Apache-2.0
platforms: [linux, macos, windows]
prerequisites:
commands: [node, ffmpeg, npx]
metadata:
@@ -4,6 +4,7 @@ description: Plan, set up, and monitor a multi-agent video production pipeline b
version: 1.0.0
author: [SHL0MS, alt-glitch]
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [video, kanban, multi-agent, orchestration, production-pipeline]
@@ -4,6 +4,7 @@ description: Generate real meme images by picking a template and overlaying text
version: 2.0.0
author: adanaleycio
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [creative, memes, humor, images]
+1
View File
@@ -4,6 +4,7 @@ description: "Run 150+ AI apps via inference.sh CLI (infsh) — image generation
version: 1.0.0
author: okaris
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [AI, image-generation, video, LLM, search, inference, FLUX, Veo, Claude]
@@ -4,6 +4,7 @@ description: Manage Docker containers, images, volumes, networks, and Compose st
version: 1.0.0
author: sprmn24
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [docker, containers, devops, infrastructure, compose, images, volumes, networks, debugging]
@@ -4,6 +4,7 @@ description: Roleplay the most difficult, tech-resistant user for your product.
version: 1.0.0
author: Omni @ Comelse
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [qa, ux, testing, adversarial, dogfood, personas, user-testing]
+1
View File
@@ -2,6 +2,7 @@
name: agentmail
description: Give the agent its own dedicated email inbox via AgentMail. Send, receive, and manage email autonomously using agent-owned email addresses (e.g. hermes-agent@agentmail.to).
version: 1.0.0
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [email, communication, agentmail, mcp]
@@ -4,6 +4,7 @@ description: Build fully-integrated 3-statement models (IS, BS, CF) in Excel wit
version: 1.0.0
author: Anthropic (adapted by Nous Research)
license: Apache-2.0
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [finance, three-statement, income-statement, balance-sheet, cash-flow, excel, openpyxl, modeling]
@@ -4,6 +4,7 @@ description: Build comparable company analysis in Excel — operating metrics, v
version: 1.0.0
author: Anthropic (adapted by Nous Research)
license: Apache-2.0
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [finance, valuation, comps, excel, openpyxl, modeling, investment-banking]
@@ -4,6 +4,7 @@ description: Build institutional-quality DCF valuation models in Excel — reven
version: 1.0.0
author: Anthropic (adapted by Nous Research)
license: Apache-2.0
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [finance, valuation, dcf, excel, openpyxl, modeling, investment-banking]
@@ -4,6 +4,7 @@ description: Build auditable Excel workbooks headless with openpyxl — blue/bla
version: 1.0.0
author: Anthropic (adapted by Nous Research)
license: Apache-2.0
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [excel, openpyxl, finance, spreadsheet, modeling]
@@ -4,6 +4,7 @@ description: Build leveraged buyout models in Excel — sources & uses, debt sch
version: 1.0.0
author: Anthropic (adapted by Nous Research)
license: Apache-2.0
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [finance, valuation, lbo, private-equity, excel, openpyxl, modeling]
@@ -4,6 +4,7 @@ description: Build accretion/dilution (merger) models in Excel — pro-forma P&L
version: 1.0.0
author: Anthropic (adapted by Nous Research)
license: Apache-2.0
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [finance, m-and-a, merger, accretion-dilution, excel, openpyxl, modeling, investment-banking]
@@ -4,6 +4,7 @@ description: Build PowerPoint decks headless with python-pptx. Pairs with excel-
version: 1.0.0
author: Anthropic (adapted by Nous Research)
license: Apache-2.0
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [powerpoint, pptx, python-pptx, presentation, finance]
@@ -6,6 +6,7 @@ description: >
foods via USDA FoodData Central. Compute BMI, TDEE, one-rep max, macro
splits, and body fat — pure Python, no pip installs. Built for anyone
chasing gains, cutting weight, or just trying to eat better.
platforms: [linux, macos, windows]
version: 1.0.0
authors:
- haileymarshall
@@ -6,6 +6,7 @@ description: >
heart rate, HRV, sleep staging, and 40+ derived EXG scores) into responses.
Requires a BCI wearable (Muse 2/S or OpenBCI) and the NeuroSkill desktop app
running locally.
platforms: [linux, macos, windows]
version: 1.0.0
author: Hermes Agent + Nous Research
license: MIT
+1
View File
@@ -4,6 +4,7 @@ description: Build, test, inspect, install, and deploy MCP servers with FastMCP
version: 1.0.0
author: Hermes Agent
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [MCP, FastMCP, Python, Tools, Resources, Prompts, Deployment]
+1
View File
@@ -4,6 +4,7 @@ description: Use the mcporter CLI to list, configure, auth, and call MCP servers
version: 1.0.0
author: community
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [MCP, Tools, API, Integrations, Interop]
@@ -4,6 +4,7 @@ description: Migrate a user's OpenClaw customization footprint into Hermes Agent
version: 1.0.0
author: Hermes Agent (Nous Research)
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Migration, OpenClaw, Hermes, Memory, Persona, Import]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [accelerate, torch, transformers]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Distributed Training, HuggingFace, Accelerate, DeepSpeed, FSDP, Mixed Precision, PyTorch, DDP, Unified API, Simple]
+1
View File
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [chromadb, sentence-transformers]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [RAG, Chroma, Vector Database, Embeddings, Semantic Search, Open Source, Self-Hosted, Document Retrieval, Metadata Filtering]
+1
View File
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [transformers, torch, pillow]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Multimodal, CLIP, Vision-Language, Zero-Shot, Image Classification, OpenAI, Image Search, Cross-Modal Retrieval, Content Moderation]
+1
View File
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [faiss-cpu, faiss-gpu, numpy]
platforms: [linux, macos]
metadata:
hermes:
tags: [RAG, FAISS, Similarity Search, Vector Search, Facebook AI, GPU Acceleration, Billion-Scale, K-NN, HNSW, High Performance, Large Scale]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [flash-attn, torch, transformers]
platforms: [linux, macos]
metadata:
hermes:
tags: [Optimization, Flash Attention, Attention Optimization, Memory Efficiency, Speed Optimization, Long Context, PyTorch, SDPA, H100, FP8, Transformers]
+1
View File
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [guidance, transformers]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Prompt Engineering, Guidance, Constrained Generation, Structured Output, JSON Validation, Grammar, Microsoft Research, Format Enforcement, Multi-Step Workflows]
@@ -4,6 +4,7 @@ description: Build, test, and debug Hermes Agent RL environments for Atropos tra
version: 1.1.0
author: Hermes Agent
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [atropos, rl, environments, training, reinforcement-learning, reward-functions]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [tokenizers, transformers, datasets]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Tokenization, HuggingFace, BPE, WordPiece, Unigram, Fast Tokenization, Rust, Custom Tokenizer, Alignment Tracking, Production]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [instructor, pydantic, openai, anthropic]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Prompt Engineering, Instructor, Structured Output, Pydantic, Data Extraction, JSON Parsing, Type Safety, Validation, Streaming, OpenAI, Anthropic]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [lambda-cloud-client>=1.0.0]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Infrastructure, GPU Cloud, Training, Inference, Lambda Labs]
+1
View File
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [transformers, torch, pillow]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [LLaVA, Vision-Language, Multimodal, Visual Question Answering, Image Chat, CLIP, Vicuna, Conversational AI, Instruction Tuning, VQA]
+1
View File
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [modal>=0.64.0]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Infrastructure, Serverless, GPU, Cloud, Deployment, Modal]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [nemo-curator, cudf, dask, rapids]
platforms: [linux, macos]
metadata:
hermes:
tags: [Data Processing, NeMo Curator, Data Curation, GPU Acceleration, Deduplication, Quality Filtering, NVIDIA, RAPIDS, PII Redaction, Multimodal, LLM Training Data]
+1
View File
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [peft>=0.13.0, transformers>=4.45.0, torch>=2.0.0, bitsandbytes>=0.43.0]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Fine-Tuning, PEFT, LoRA, QLoRA, Parameter-Efficient, Adapters, Low-Rank, Memory Optimization, Multi-Adapter]
+1
View File
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [pinecone-client]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [RAG, Pinecone, Vector Database, Managed Service, Serverless, Hybrid Search, Production, Auto-Scaling, Low Latency, Recommendations]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [torch>=2.0, transformers]
platforms: [linux, macos]
metadata:
hermes:
tags: [Distributed Training, PyTorch, FSDP, Data Parallel, Sharding, Mixed Precision, CPU Offloading, FSDP2, Large-Scale Training]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [lightning, torch, transformers]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [PyTorch Lightning, Training Framework, Distributed Training, DDP, FSDP, DeepSpeed, High-Level API, Callbacks, Best Practices, Scalable]
+1
View File
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [qdrant-client>=1.12.0]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [RAG, Vector Search, Qdrant, Semantic Search, Embeddings, Similarity Search, HNSW, Production, Distributed]
+1
View File
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [sae-lens>=6.0.0, transformer-lens>=2.0.0, torch>=2.0.0]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Sparse Autoencoders, SAE, Mechanistic Interpretability, Feature Discovery, Superposition]
+1
View File
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [torch, transformers, datasets, trl, accelerate]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Post-Training, SimPO, Preference Optimization, Alignment, DPO Alternative, Reference-Free, LLM Alignment, Efficient Training]
+1
View File
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [sglang-router>=0.2.3, ray, torch>=2.0.0, transformers>=4.40.0]
platforms: [linux, macos]
metadata:
hermes:
tags: [Reinforcement Learning, Megatron-LM, SGLang, GRPO, Post-Training, GLM]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [diffusers>=0.30.0, transformers>=4.41.0, accelerate>=0.31.0, torch>=2.0.0]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Image Generation, Stable Diffusion, Diffusers, Text-to-Image, Multimodal, Computer Vision]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [tensorrt-llm, torch]
platforms: [linux, macos]
metadata:
hermes:
tags: [Inference Serving, TensorRT-LLM, NVIDIA, Inference Optimization, High Throughput, Low Latency, Production, FP8, INT4, In-Flight Batching, Multi-GPU]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [torch>=2.6.0, torchtitan>=0.2.0, torchao>=0.5.0]
platforms: [linux, macos]
metadata:
hermes:
tags: [Model Architecture, Distributed Training, TorchTitan, FSDP2, Tensor Parallel, Pipeline Parallel, Context Parallel, Float8, Llama, Pretraining]
+1
View File
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [openai-whisper, transformers, torch]
platforms: [linux, macos]
metadata:
hermes:
tags: [Whisper, Speech Recognition, ASR, Multimodal, Multilingual, OpenAI, Speech-To-Text, Transcription, Translation, Audio Processing]
@@ -4,6 +4,7 @@ description: Canvas LMS integration — fetch enrolled courses and assignments u
version: 1.0.0
author: community
license: MIT
platforms: [linux, macos, windows]
prerequisites:
env_vars: [CANVAS_API_TOKEN, CANVAS_BASE_URL]
metadata:

Some files were not shown because too many files have changed in this diff Show More