Compare commits

21 Commits

Author SHA1 Message Date
Teknium 81f5faf1e0 fix(entry-points): guard hermes_bootstrap import so partial updates don't brick hermes
teknium1 hit ModuleNotFoundError: No module named 'hermes_bootstrap' after
a code update, on both his Windows machine AND his Linux workstation.  The
failure mode is real and affects every user who updates hermes by any path
OTHER than a fully-successful ``hermes update``.

## What happens

hermes_bootstrap.py is a top-level module registered via pyproject.toml's
``py-modules`` list (added by Brooklyn's Windows UTF-8 stdio work).  It
must be registered in the venv's editable-install .pth file before Python
can find it as a bare ``import hermes_bootstrap``.

``hermes update`` handles this correctly: (1) git reset --hard, (2) clear
__pycache__, (3) uv pip install -e . (re-registers the package including
the new py-modules list), (4) restart.

BUT if any step AFTER (1) fails — network blip during pip install, PEP 668
on a system Python, venv locked, uv not in PATH, a crash mid-update — the
user is left with new code that references hermes_bootstrap and a venv
that doesn't know about it.  Every hermes invocation after that crashes
with ModuleNotFoundError, including ``hermes update`` itself.  There is no
recovery path short of a manual `uv pip install -e .`.

Also affects users who ``git pull`` the repo directly without running
hermes update — relatively common for developers.

## Fix

Wrap ``import hermes_bootstrap`` in a try/except ModuleNotFoundError
across all 6 entry points (hermes_cli/main, run_agent, gateway/run,
acp_adapter/entry, cli, batch_runner).  On Windows, missing bootstrap
means the UTF-8 stdio setup doesn't run — degraded behavior (Unicode
chars may fail to print) but NOT a crash.  POSIX is unaffected either way
since the bootstrap is a no-op there.

Once hermes is running again, the user can ``hermes update`` to fully
recover.
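
A minimal sketch of the guarded import described above. The `None`
fallback and the `exc.name` check are illustrative assumptions, not
necessarily what the six entry points do:

```python
# Guarded bootstrap import (sketch). If the editable install was not
# refreshed, the module is absent and startup continues without it.
try:
    import hermes_bootstrap  # noqa: F401  (imported for its side effects)
except ModuleNotFoundError as exc:
    # Assumption: only swallow the error when hermes_bootstrap itself is
    # missing, not when one of its own imports is missing.
    if exc.name != "hermes_bootstrap":
        raise
    hermes_bootstrap = None  # degraded mode: no UTF-8 stdio setup on Windows
```

Catching ModuleNotFoundError rather than a bare ImportError keeps real
import-time bugs inside the bootstrap module visible.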

## Test update

tests/test_hermes_bootstrap.py::test_entry_point_imports_bootstrap
scans for the first top-level import in each entry point and asserts it
is hermes_bootstrap.  Extended the check to accept a Try block whose body
is a lone Import of hermes_bootstrap — that's the recovery-friendly form
we just introduced.
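
The extended check could look roughly like the sketch below. The function
names are hypothetical, and skipping ``from __future__`` imports is an
assumption; the real test may differ in both respects:

```python
import ast


def _imports_bootstrap(node):
    """True if node is a plain `import hermes_bootstrap` statement."""
    return isinstance(node, ast.Import) and any(
        alias.name == "hermes_bootstrap" for alias in node.names
    )


def first_import_is_bootstrap(source):
    """Check that the first top-level import is hermes_bootstrap.

    Accepts either a bare import or a try block whose body is a lone
    import of hermes_bootstrap.
    """
    for node in ast.parse(source).body:
        if isinstance(node, ast.ImportFrom) and node.module == "__future__":
            continue  # assumption: __future__ imports do not count
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            return _imports_bootstrap(node)
        if isinstance(node, ast.Try):
            return len(node.body) == 1 and _imports_bootstrap(node.body[0])
    return False


guarded = (
    '"""Entry point."""\n'
    "try:\n"
    "    import hermes_bootstrap\n"
    "except ModuleNotFoundError:\n"
    "    pass\n"
    "import os\n"
)
```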

Verified behavior by ``mv hermes_bootstrap.py hermes_bootstrap.py.bak``
and confirming ``python -c "import hermes_cli.main"`` succeeds.  82/82
tests pass (hermes_bootstrap + windows-native + windows-compat).
2026-05-08 14:38:42 -07:00
Teknium 67b8a1076a ci: retrigger checks after #22083 landed the profiles.py PLW1514 fix on main 2026-05-08 14:24:55 -07:00
Teknium e318b593f1 ci(lint): add blocking ruff-check + windows-footguns jobs to lint.yml
Paired with commit e0c03defd (enabled PLW1514 in pyproject.toml) and
commit 3dfb35700 (added scripts/check-windows-footguns.py). Both
commits noted that the corresponding workflow edits were held back
because the authoring token lacked the `workflow` OAuth scope.

New jobs, both separate from `lint-diff` so the advisory diff
comment still posts when enforcement fails:

- ruff-blocking: runs `ruff check .` against the explicit select
  list in pyproject.toml (currently PLW1514, which catches bare
  open() that defaults to locale encoding — cp1252 on Windows).
  No --exit-zero, no `|| true`; exit code propagates to the
  required-check gate.

- windows-footguns: runs scripts/check-windows-footguns.py --all
  (380 files, stdlib-only, <2s). Covers 11 Windows-unsafe
  primitives — os.kill(pid, 0) bpo-14484 footgun, os.killpg,
  os.setsid/setpgrp, signal.SIGKILL/SIGHUP/SIGUSR* without
  getattr fallback, shebang scripts via subprocess, wmic without
  shutil.which guard, hardcoded ~/Desktop OneDrive trap, bare
  open() without encoding=, etc.

Both jobs pin actions by SHA to match repo convention.
tests/test_lint_config.py::test_workflow_has_blocking_ruff_step
now finds the blocking step and passes.
2026-05-08 14:19:23 -07:00
Teknium 37f509d2bb test: migrate stale os.kill monkeypatches to gateway.status._pid_exists
PR #21561 migrated liveness probes across 14 call sites from
`os.kill(pid, 0)` to `gateway.status._pid_exists` (psutil-first) so
the gateway doesn't Ctrl+C-itself on Windows via bpo-14484. A handful of
tests still patched the old `os.kill` seam and either happened to pass
on POSIX (when PID 12345 incidentally wasn't alive on the CI worker) or
failed outright — on CI runs they surfaced as 7 flaky/stable failures.

Migrate each affected test to patch the correct seam:

- tests/tools/test_browser_orphan_reaper.py (5 tests)
    Patch `gateway.status._pid_exists` instead of `os.kill`.
    Rename test_permission_error_on_kill_check_skips to
    test_alive_legacy_daemon_is_reaped — the old assertion was
    "PermissionError on sig 0 → skip dir"; post-migration the
    untracked-alive-daemon path always reaps the dir after SIGTERM
    (best-effort semantics were preserved).

- tests/tools/test_windows_native_support.py (4 tests)
    Replace tests that asserted `os.kill` seam behavior with tests
    that exercise `ProcessRegistry._is_host_pid_alive` as a
    delegator and split out a new TestPidExistsOSErrorWidening class
    that hits `gateway.status._pid_exists` directly via the POSIX
    fallback branch (so Windows-style `OSError(WinError 87)` + `PermissionError`
    widening is still covered on Linux CI).

- tests/tools/test_process_registry.py (1 test)
    Mock `psutil.Process` + `_pid_exists` instead of `os.kill`
    for the detached-session kill path.

- tests/tools/test_mcp_stability.py::test_kill_orphaned_uses_sigkill_when_available
    SIGTERM → alive-check → SIGKILL flow now uses `_pid_exists`
    for the middle step; assertion count drops from 3 to 2.

- tests/gateway/test_status.py::TestScopedLocks (2 tests)
    `acquire_scoped_lock` consults `_pid_exists`; patch that
    seam directly instead of trying to control the nested psutil
    call via os.kill monkeypatch.

- tests/hermes_cli/test_gateway.py::test_stop_profile_gateway_keeps_pid_file_when_process_still_running
    The stop loop sends one SIGTERM via os.kill then polls 20x via
    _pid_exists; instrument both separately. Old assertion
    `calls["kill"] == 21` split into `kill == 1` + `alive_probes == 20`.

- tests/hermes_cli/test_auth_toctou_file_modes.py::test_shared_nous_store_writes_0o600_with_0o700_parent
    Commit c34884ea2 switched the pytest seat-belt guard in
    `_nous_shared_store_path()` from `Path.home() / ".hermes"`
    to `get_default_hermes_root()`, which honors HERMES_HOME. The
    test sets both HERMES_HOME and HERMES_SHARED_AUTH_DIR to
    subpaths of the same tmp_path, and the override now collapses
    onto the same path the guard is refusing. Renamed the override
    subdirectory so the two paths diverge — guard passes, test runs.

All 21 original CI failures and their local-flaky siblings now pass
(278 tests across the touched files, 0 failures).
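
To illustrate why the old combined call count had to be split, here is a
self-contained sketch with a hypothetical `stop_process` loop whose two
seams (the kill call and the liveness probe) are injected separately. It
is not the code under test, and it omits the sleep between polls:

```python
import signal


def stop_process(pid, kill, pid_exists, attempts=20):
    """Hypothetical stop loop: one SIGTERM, then poll the liveness probe."""
    kill(pid, signal.SIGTERM)
    for _ in range(attempts):
        if not pid_exists(pid):
            return True  # process exited
    return False  # still running after all probes


calls = {"kill": 0, "alive_probes": 0}


def fake_kill(pid, sig):
    calls["kill"] += 1


def fake_pid_exists(pid):
    calls["alive_probes"] += 1
    return True  # simulate a process that never exits


stopped = stop_process(12345, fake_kill, fake_pid_exists)
```

With the liveness probe no longer routed through `os.kill`, the kill seam
sees exactly one call and the probe seam sees the twenty polls.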
2026-05-08 14:18:41 -07:00
Teknium dc25ab7de2 fix(skills): move platforms key out of folded description: > scalars
The platforms-frontmatter sweep inserted 'platforms: [linux, macos, windows]'
immediately after 'description: >' on 5 optional-skills, landing inside the
folded scalar and breaking YAML parsing. docs-site-checks tripped on
one-three-one-rule/SKILL.md and would have failed on the other 4 in turn.

Fixed files:
- optional-skills/communication/one-three-one-rule/SKILL.md
- optional-skills/health/fitness-nutrition/SKILL.md
- optional-skills/health/neuroskill-bci/SKILL.md
- optional-skills/research/drug-discovery/SKILL.md
- optional-skills/security/oss-forensics/SKILL.md

Moved each platforms line below the closing of the description block.
All 161 SKILL.md files across the repo now parse as valid YAML.
2026-05-08 14:02:09 -07:00
Teknium a6168c2a0a fix(install.ps1): strip UTF-8 BOM that broke [scriptblock]::Create
Commit 3dfb35700 accidentally saved scripts/install.ps1 with a UTF-8 BOM
(EF BB BF) at byte 0.  PowerShell's normal file-execution path (`& .\install.ps1`)
handles BOMs fine, but the curl-and-iex one-liner documented in the README
uses `[scriptblock]::Create((irm ...))` which does NOT strip BOMs — the
BOM lands inside the param() block and fails with 'The assignment
expression is not valid' on $Branch and $HermesHome.

teknium1 hit this trying to reinstall from the PR branch after Brooklyn's
commits landed.  Every user trying the PR-branch install one-liner would
have hit it too until we noticed.

Saved without BOM, verified via xxd: file now starts with '# =====' at
byte 0 instead of EF BB BF.
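
A small sketch of the same check in Python, for anyone who wants to guard
against the BOM creeping back in. The helper name is hypothetical and not
part of the repo:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


cleaned = strip_utf8_bom(b"\xef\xbb\xbf# =====")
```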
2026-05-08 13:41:17 -07:00
Teknium 7412878aca feat(windows uninstall): clean up User env, PATH, Scheduled Task, and portable tooling
`hermes uninstall` was POSIX-only.  On Windows it would leave four classes
of installer debris behind that the user had to scrub manually:

1. Scheduled Task and/or Startup-folder .cmd entry that install.ps1
   dropped for `hermes gateway install`.  Left running at next logon
   even after uninstall, pointing at deleted code paths.
2. User-scope PATH entries for the Hermes venv, PortableGit (cmd, bin,
   usr\bin), and bundled Node, all written to HKCU\Environment\Path.
3. User-scope env vars HERMES_HOME and HERMES_GIT_BASH_PATH, same
   registry key.
4. PortableGit and Node copies under %LOCALAPPDATA%\hermes\ (~200MB),
   plus gateway-service/ scratch dir.

Fixes:

- `uninstall_gateway_service()` gets a Windows branch that calls into
  `gateway_windows.stop()` + `gateway_windows.uninstall()`, which already
  know how to remove both schtasks entries and Startup-folder .cmd files
  and how to stop any running detached pythonw gateway.
- `remove_path_from_windows_registry(hermes_home)` reads HKCU\Environment
  via winreg, strips any PATH entry whose path-prefix matches the
  installer-owned markers (\hermes-agent, \git, \node, \venv under the
  current HERMES_HOME), and writes the cleaned value back.  Preserves
  REG_EXPAND_SZ vs REG_SZ so unexpanded %VARS% in the user's PATH
  survive.  No PowerShell subprocess, no fragile `reg query` parsing.
- `remove_hermes_env_vars_windows()` deletes HERMES_HOME and
  HERMES_GIT_BASH_PATH from the same key.
- `remove_portable_tooling_windows(hermes_home)` rmtree's
  `hermes_home/git`, `hermes_home/node`, `hermes_home/gateway-service`
  — they're installer artifacts, not user data, so they get removed in
  BOTH "keep data" and "full uninstall" modes.
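
The PATH-filtering step can be sketched as a pure function, shown below.
This is an illustration under assumptions: the marker names are taken from
the description above, the function name is hypothetical, and the registry
read/write plus the REG_EXPAND_SZ handling are left out. `ntpath` is used
so the sketch behaves the same on any host:

```python
import ntpath

# Assumed from the commit message: directories the installer owns
# underneath HERMES_HOME.
MARKERS = ("hermes-agent", "git", "node", "venv")


def strip_installer_entries(path_value, hermes_home):
    """Drop PATH entries that live under an installer-owned directory."""
    home = ntpath.normcase(ntpath.normpath(hermes_home))
    prefixes = [ntpath.join(home, marker) for marker in MARKERS]
    kept = []
    for entry in path_value.split(";"):
        if not entry:
            continue  # skip empty segments such as a trailing ';'
        norm = ntpath.normcase(ntpath.normpath(entry))
        owned = any(norm == p or norm.startswith(p + "\\") for p in prefixes)
        if not owned:
            kept.append(entry)  # keep the original spelling, %VARS% included
    return ";".join(kept)


cleaned_path = strip_installer_entries(
    r"C:\Python311;C:\Users\me\AppData\Local\hermes\git\cmd;"
    r"%USERPROFILE%\bin;C:\Users\me\AppData\Local\hermes\venv\Scripts",
    r"C:\Users\me\AppData\Local\hermes",
)
```

Matching on a normalized prefix plus a separator avoids false positives
such as a sibling directory named `gitea`.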

Wired these into `run_uninstall()` guarded by `_is_windows()` so
POSIX paths are untouched.  Also fixed the closing "Reload your shell"
footer to point Windows users at opening a new terminal (PATH changes
don't propagate into the current PowerShell session) with the
PowerShell install one-liner instead of bash's curl-pipe.

Verified on Delta-1 (Windows 10) via preview script: it correctly
identifies 4 of the 13 PATH entries as Hermes-installed and marks them for
removal, leaving the Python/LM Studio/ripgrep/ffmpeg/winget entries alone.
2026-05-08 13:26:42 -07:00
Teknium b2bdf274f7 fix(windows): gateway status dedup + install.ps1 platform-SDK bootstrap
## Two residual Windows fixes that were hanging from earlier commits.

### 1. `hermes gateway status` reported 2 PIDs per gateway — TWO bugs compounded

Diagnosed with psutil parent/child walk against live gateway PIDs:

**Bug A (the real one): `_get_parent_pid` silently failed on Windows.**
The helper shelled out to `ps -o ppid= -p <pid>`, which doesn't exist
on Windows — `FileNotFoundError` → returns `None` → the ancestor walk
terminated at `os.getpid()` alone.  Consequence: the PID table scan in
`_scan_gateway_pids` couldn't filter out `hermes gateway status`'s own
launcher stub (a venv `pythonw.exe`/`python.exe` that matches the same
`-m hermes_cli.main gateway` pattern as the gateway).  Every status
call saw "itself" as a second gateway.

Fix: `_get_parent_pid` now calls `psutil.Process(pid).ppid()` first
(psutil is a core dependency since 3dfb35700) and falls back to `ps`
only when `shutil.which("ps")` succeeds — matching the Windows-footgun
checker's "always guard `ps` / `wmic` / etc. with `shutil.which`" rule.
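
A sketch of the psutil-first lookup with a guarded `ps` fallback. The
function name, timeout, and error handling here are assumptions rather than
the exact code in `hermes_cli/gateway.py`:

```python
import os
import shutil
import subprocess


def get_parent_pid(pid):
    """Return the parent PID, or None if it cannot be determined."""
    try:
        import psutil
    except ImportError:
        psutil = None

    if psutil is not None:
        try:
            return psutil.Process(pid).ppid()
        except psutil.Error:
            return None

    # Fallback: only shell out when `ps` is actually on PATH.
    ps = shutil.which("ps")
    if ps is None:
        return None
    try:
        result = subprocess.run(
            [ps, "-o", "ppid=", "-p", str(pid)],
            capture_output=True,
            text=True,
            timeout=5,
        )
        return int(result.stdout.strip())
    except (OSError, ValueError, subprocess.SubprocessError):
        return None


parent = get_parent_pid(os.getpid())
```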

Before: `Gateway process running (PID: 21952, 46880)` — 46880 changing
on every call (the status invocation's own launcher, which died by the
time the next status call looked).

After (5 consecutive calls):
```
✓ Gateway process running (PID: 21952)
✓ Gateway process running (PID: 21952)
✓ Gateway process running (PID: 21952)
✓ Gateway process running (PID: 21952)
✓ Gateway process running (PID: 21952)
```

Ancestor walk with the fix: 14 PIDs (full chain through bash/explorer)
instead of the broken 1-PID set.

**Bug B (the cosmetic one): venv-launcher dedup.** Standard Windows
CPython venv behaviour is that `<venv>/Scripts/pythonw.exe` is a ~5 MB
launcher stub that spawns the base Python
(`C:\Program Files\Python311\pythonw.exe`) with the same command line
and waits.  Our process
scanner sees two PIDs for every gateway: launcher + interpreter, same
cmdline.  Bug A masked this by accidentally counting the status call
AS one of them; with Bug A fixed, we see both the real launcher and
real interpreter for the gateway process itself.

Fix: `_filter_venv_launcher_stubs` at the tail of `_scan_gateway_pids`
walks each matched PID's ppid via psutil.  Any PID that's the PARENT
of another matched PID is a launcher stub — drop it, keep the child.
Scoped to Windows (`is_windows() and len(pids) > 1`) and no-ops when
psutil isn't importable.
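
The dedup rule itself is small enough to sketch as a pure function. Here
the parent lookup is passed in as a plain dict (an assumption made to keep
the example self-contained; the real helper walks ppids via psutil):

```python
def filter_launcher_stubs(pids, ppid_of):
    """Drop any matched PID that is the parent of another matched PID.

    pids:    matched gateway PIDs, in scan order
    ppid_of: mapping of pid -> parent pid (hypothetical stand-in for psutil)
    """
    matched = set(pids)
    launcher_stubs = {ppid_of.get(pid) for pid in matched} & matched
    return [pid for pid in pids if pid not in launcher_stubs]


# Hypothetical PIDs: 100 is the venv launcher stub, 200 the interpreter.
remaining = filter_launcher_stubs([100, 200], {200: 100, 100: 1})
```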

Net effect: `gateway status` now reports one PID per gateway — the
interpreter — matching POSIX behaviour and user expectations.

### 2. `install.ps1`: bootstrap pip + auto-install platform SDKs

New `Install-PlatformSdks` function wired between `Invoke-SetupWizard`
and `Start-GatewayIfConfigured`.  Fixes two related issues on fresh
Windows installs:

1. The tiered `uv pip install` cascade (introduced in 87fca8342)
   correctly falls through when tier 1 `.[all]` fails on the RL git
   deps, but the fallback tiers can silently skip SDKs from `[messaging]`
   when there's a partial-resolve.  Result: user sets `DISCORD_BOT_TOKEN`
   in `.env`, fires up gateway, hits "discord module not installed".

2. `uv` creates venvs WITHOUT pip by default, so the user's escape
   hatch (`pip install discord.py` in the venv) doesn't exist either.

The new function:
- Skips if `-NoVenv` (nothing to bootstrap into).
- Scans `~/.hermes/.env` for messaging tokens (TELEGRAM_BOT_TOKEN,
  DISCORD_BOT_TOKEN, SLACK_BOT_TOKEN, SLACK_APP_TOKEN, WHATSAPP_ENABLED),
  filtering placeholder values.
- For each token that's set, runs `python -c "import <sdk>"` to verify.
- If any import fails: runs `python -m ensurepip --upgrade` to bootstrap
  pip into the venv (idempotent — no-ops if pip is already present),
  then `pip install <spec>` for each missing SDK with specs mirroring
  pyproject.toml's `[messaging]` extra to avoid version drift.

The `$ErrorActionPreference = "SilentlyContinue"` spans are not
cosmetic — PowerShell wraps native-stderr from a non-zero-exit
subprocess as a `NativeCommandError` that prints even through
`*> $null` / `2>$null`.  Save + restore EAP over the import-probe
and pip-install blocks keeps the output clean.

Verified on this Windows 10 box:
- Initial state: telegram+fastapi+psutil present, discord+slack_sdk
  missing (tier 1 `.[all]` had failed — `.tirith-install-failed`
  marker in `%LOCALAPPDATA%\hermes`).
- First run with discord+slack tokens in .env: detects both missing,
  ensurepip (skipped — pip was already bootstrapped earlier this
  session for telegram), installs `discord.py[voice]==2.7.1` +
  `PyNaCl` + `davey`, installs `slack-sdk==3.41.0`. All imports
  succeed on verify.
- Second run: all three SDKs report OK, function no-ops.

Pip spec strings mirror pyproject.toml's `[messaging]` extra verbatim
so a bump to the extra picks up here automatically — no drift.

### Files

- `hermes_cli/gateway.py`: `_get_parent_pid` rewritten (psutil-first);
  `_filter_venv_launcher_stubs` added; `_scan_gateway_pids` dedups
  launchers on Windows when it finds >1 match.
- `scripts/install.ps1`: new `Install-PlatformSdks` function (~85
  lines); wired into the main flow at line 1438.

### Verification

- `venv/Scripts/python.exe scripts/check-windows-footguns.py --all`
  → `✓ No Windows footguns found (380 file(s) scanned).`
- `ast.parse` passes on gateway.py.
- `[System.Management.Automation.Language.Parser]::ParseFile` passes
  on install.ps1.
- Live gateway (PID 21952, running since 12:33 today) survived 5x
  stress loop of `hermes gateway status` without dying.
2026-05-08 13:10:34 -07:00
Teknium 3dfb357001 feat(cross-platform): psutil for PID/process management + Windows footgun checker
## Why

Hermes supports Linux, macOS, and native Windows, but the codebase grew up
POSIX-first and has accumulated patterns that silently break (or worse,
silently kill!) on Windows:

- `os.kill(pid, 0)` as a liveness probe — on Windows this maps to
  CTRL_C_EVENT and broadcasts Ctrl+C to the target's entire console
  process group (bpo-14484, open since 2012).
- `os.killpg` — doesn't exist on Windows at all (AttributeError).
- `os.setsid` / `os.getuid` / `os.geteuid` — same.
- `signal.SIGKILL` / `signal.SIGHUP` / `signal.SIGUSR1` — module-attr
  errors at runtime on Windows.
- `open(path)` / `open(path, "r")` without explicit encoding= — inherits
  the platform default, which is cp1252/mbcs on Windows (UTF-8 on POSIX),
  causing mojibake round-tripping between hosts.
- `wmic` (deprecated since Windows 10 21H1 and absent from many current
  Windows 10/11 installs).
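
As a hedged illustration of the signal-constant item above, a `getattr`
fallback keeps module import from failing on hosts where a POSIX-only
constant is missing. The constant names below are hypothetical:

```python
import signal

# SIGKILL is not defined on Windows; fall back to SIGTERM there.
KILL_SIGNAL = getattr(signal, "SIGKILL", signal.SIGTERM)

# SIGHUP has no Windows equivalent; callers must handle None.
HUP_SIGNAL = getattr(signal, "SIGHUP", None)
```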

This commit does three things:

1. Makes `psutil` a core dependency and migrates critical callsites to it.
2. Adds a grep-based CI gate (`scripts/check-windows-footguns.py`) that
   blocks new instances of any of the above patterns.
3. Fixes every existing instance in the codebase so the baseline is clean.

## What changed

### 1. psutil as a core dependency (pyproject.toml)

Added `psutil>=5.9.0,<8` to core deps. psutil is the canonical
cross-platform answer for "is this PID alive" and "kill this process
tree" — its `pid_exists()` uses `OpenProcess + GetExitCodeProcess` on
Windows (NOT a signal call), and its `Process.children(recursive=True)`
+ `.kill()` combo replaces `os.killpg()` portably.

### 2. `gateway/status.py::_pid_exists`

Rewrote to call `psutil.pid_exists()` first, falling back to the
hand-rolled ctypes `OpenProcess + WaitForSingleObject` dance on Windows
(and `os.kill(pid, 0)` on POSIX) only if psutil is somehow missing —
e.g. during the scaffold phase of a fresh install before pip finishes.

### 3. `os.killpg` migration to psutil (7 callsites, 5 files)

- `tools/code_execution_tool.py`
- `tools/process_registry.py`
- `tools/tts_tool.py`
- `tools/environments/local.py` (3 sites kept as-is, suppressed with
  `# windows-footgun: ok` — the pgid semantics psutil can't replicate,
  and the calls are already Windows-guarded at the outer branch)
- `gateway/platforms/whatsapp.py`

### 4. `scripts/check-windows-footguns.py` (NEW, 500 lines)

Grep-based checker with 11 rules covering every Windows cross-platform
footgun we've hit so far:

1. `os.kill(pid, 0)` — the silent killer
2. `os.setsid` without guard
3. `os.killpg` (recommends psutil)
4. `os.getuid` / `os.geteuid` / `os.getgid`
5. `os.fork`
6. `signal.SIGKILL`
7. `signal.SIGHUP/SIGUSR1/SIGUSR2/SIGALRM/SIGCHLD/SIGPIPE/SIGQUIT`
8. `subprocess` shebang script invocation
9. `wmic` without `shutil.which` guard
10. Hardcoded `~/Desktop` (OneDrive trap)
11. `asyncio.add_signal_handler` without try/except
12. `open()` without `encoding=` on text mode

Features:
- Triple-quoted-docstring aware (won't flag prose inside docstrings)
- Trailing-comment aware (won't flag mentions in `# os.kill(pid, 0)` comments)
- Guard-hint aware (skips lines with `hasattr(os, ...)`,
  `shutil.which(...)`, `if platform.system() != 'Windows'`, etc.)
- Inline suppression with `# windows-footgun: ok — <reason>`
- `--list` to print all rules with fixes
- `--all` / `--diff <ref>` / staged-files (default) modes
- Scans 380 files in under 2 seconds

### 5. CI integration

A GitHub Actions workflow that runs the checker on every PR and push is
staged at `/tmp/hermes-stash/windows-footguns.yml` — not included in this
commit because the GH token on the push machine lacks `workflow` scope.
A maintainer with `workflow` permissions should add it as
`.github/workflows/windows-footguns.yml` in a follow-up. Content:

```yaml
name: Windows footgun check
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: "3.11"}
      - run: python scripts/check-windows-footguns.py --all
```

### 6. CONTRIBUTING.md — "Cross-Platform Compatibility" expansion

Expanded from 5 to 16 rules, each with message, example, and fix.
Recommends psutil as the preferred API for PID / process-tree operations.

### 7. Baseline cleanup (91 → 0 findings)

- 14 `open()` sites → added `encoding='utf-8'` (internal logs/caches) or
  `encoding='utf-8-sig'` (user-editable files that Notepad may BOM)
- 23 POSIX-only callsites in systemd helpers, pty_bridge, and plugin
  tool subprocess management → annotated with
  `# windows-footgun: ok — <reason>`
- 7 `os.killpg` sites → migrated to psutil (see §3 above)
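
A short sketch of why `utf-8-sig` suits user-editable files: it strips a
leading BOM when one is present and reads BOM-less files unchanged. The
helper and file name are hypothetical:

```python
import os
import tempfile


def read_user_file(path):
    """Read a user-editable text file, tolerating a Notepad-style BOM."""
    with open(path, encoding="utf-8-sig") as fh:
        return fh.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.yaml")
    with open(path, "wb") as fh:
        fh.write(b"\xef\xbb\xbfname: demo\n")  # file saved with a BOM
    text = read_user_file(path)
```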

## Verification

```
$ python scripts/check-windows-footguns.py --all
✓ No Windows footguns found (380 file(s) scanned).

$ python -c "from gateway.status import _pid_exists; import os
> print('self:', _pid_exists(os.getpid())); print('bogus:', _pid_exists(999999))"
self: True
bogus: False
```

For proof-of-repro that `os.kill(pid, 0)` was actually killing processes
before this fix, see commit `1cbe39914` and bpo-14484. This commit
removes the last hand-rolled ctypes path from the hot liveness-check
path and defers to the best-maintained cross-platform answer.
2026-05-08 12:57:33 -07:00
Teknium 1cbe399149 fix(windows): os.kill(pid, 0) is NOT a no-op on Windows — route through new _pid_exists helper
On Windows, Python's ``os.kill(pid, 0)`` is NOT a no-op. CPython's
implementation (``Modules/posixmodule.c::os_kill_impl``) treats sig=0
as ``CTRL_C_EVENT`` because the two integer values collide at the C
layer, and routes it through ``GenerateConsoleCtrlEvent(0, pid)`` —
which sends a Ctrl+C to the ENTIRE console process group containing
the target PID, not just the PID itself. Any caller that wanted to
check "is PID X alive" via the classic POSIX ``os.kill(pid, 0)``
idiom was silently killing that process (and often unrelated
processes in the same console group) on Windows. Long-standing
Python Windows quirk; see bpo-14484 (open since 2012).

This manifested in Hermes as: every ``hermes gateway status``
invocation would read the gateway's PID from the PID file, call
``os.kill(pid, 0)`` via ``gateway.status.get_running_pid()`` as a
"liveness check", and instantly terminate the gateway it was trying
to report on. No shutdown log, no traceback, no atexit hook fire,
no exit-diag entry — just silent termination of the detached pythonw
process. "Bot answered one message then stopped typing" was the
characteristic end-user symptom because `os.kill(pid, 0)` fires
mid-response-send and kills the gateway between logs.

Reproduction (verified in this branch before the fix):

  $ hermes gateway start       # gateway alive, PID 37520
  $ hermes gateway status      # reports "No gateway process detected"
  $ tasklist /FI "PID eq 37520"  # INFO: No tasks are running
                                 # — gateway terminated silently

Root-cause fix is a new ``gateway.status._pid_exists(pid)`` helper:

- On Windows: Win32 ``OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION |
  SYNCHRONIZE, False, pid)`` + ``WaitForSingleObject(handle, 0)``
  via ctypes. Zero signal delivery, zero console-group side effects.
  Pins ctypes return types to avoid DWORD-vs-signed-int parse bugs
  on WAIT_TIMEOUT (0x102). Distinguishes ERROR_INVALID_PARAMETER
  (PID gone) from ERROR_ACCESS_DENIED (alive but another user).
- On POSIX: the canonical ``os.kill(pid, 0)`` idiom that actually is
  a no-op there.
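
A sketch of the two branches described above. It follows the commit
message (same access flags, pinned ctypes return types, access-denied
treated as alive) but is an illustration, not a copy of
``gateway.status._pid_exists``:

```python
import ctypes
import os
import sys


def pid_exists(pid):
    """Liveness probe that never delivers a signal on Windows."""
    if pid <= 0:
        return False

    if sys.platform == "win32":
        PROCESS_QUERY_LIMITED_INFORMATION = 0x1000
        SYNCHRONIZE = 0x00100000
        WAIT_TIMEOUT = 0x102
        ERROR_ACCESS_DENIED = 5

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        # Pin argument and return types so handles and DWORDs are not
        # truncated or read as signed ints.
        kernel32.OpenProcess.argtypes = [ctypes.c_uint32, ctypes.c_int, ctypes.c_uint32]
        kernel32.OpenProcess.restype = ctypes.c_void_p
        kernel32.WaitForSingleObject.argtypes = [ctypes.c_void_p, ctypes.c_uint32]
        kernel32.WaitForSingleObject.restype = ctypes.c_uint32
        kernel32.CloseHandle.argtypes = [ctypes.c_void_p]

        handle = kernel32.OpenProcess(
            PROCESS_QUERY_LIMITED_INFORMATION | SYNCHRONIZE, False, pid
        )
        if not handle:
            # Access denied means the process exists but belongs to
            # another user; anything else is treated as "gone".
            return ctypes.get_last_error() == ERROR_ACCESS_DENIED
        try:
            # WAIT_TIMEOUT means the process has not exited yet.
            return kernel32.WaitForSingleObject(handle, 0) == WAIT_TIMEOUT
        finally:
            kernel32.CloseHandle(handle)

    try:
        os.kill(pid, 0)  # a true no-op probe on POSIX
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # alive, but owned by another user
    return True


alive = pid_exists(os.getpid())
```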

Then patch every ``os.kill(pid, 0)`` liveness-check callsite to
route through ``_pid_exists`` instead. Total 14 callsites across
11 files; every single one was a latent silent-kill on Windows:

  gateway/run.py:2810      — /restart watcher (inline subprocess)
  gateway/run.py:15195     — --replace wait loop
  gateway/status.py:572    — acquire_gateway_runtime_lock stale check
  gateway/status.py:828    — get_running_pid (THE killer for status)
  gateway/platforms/whatsapp.py:111
  hermes_cli/gateway.py:228, 522, 1012  — gateway-related drain loops
  hermes_cli/kanban_db.py:2826         — _pid_alive was claiming to
                                         be cross-platform but used
                                         os.kill(pid, 0) on Windows
  hermes_cli/main.py:5792        — CLI process-kill polling
  hermes_cli/profiles.py:782     — profile stop wait loop
  plugins/google_meet/process_manager.py:74
  tools/browser_tool.py:1215, 1255  — browser daemon ownership probes
  tools/mcp_tool.py:1255, 3374     — MCP stdio orphan tracking

The watcher source in gateway/run.py:2810 is a multi-line string
that gets spawned as an inline ``python -c "..."`` subprocess, so
it can't import gateway.status. The fix for that callsite inlines
the same ctypes probe directly into the watcher source.

Tested on Windows 10 with the hermes gateway + Telegram bot:
- gateway start → alive
- 5 consecutive ``hermes gateway status`` invocations → gateway
  alive after every one, same PID reported each time (37520, 21952)
- gateway.log shows uninterrupted operation; no spurious shutdown
  entries; cron ticker and kanban dispatcher still running on
  their 60-second cadence
- bot continues answering Telegram messages throughout

Ships alongside an exit-path diagnostic wrapper in
``hermes_cli/gateway.py::run_gateway()`` that captures every way
``asyncio.run(start_gateway(...))`` can return (success, SystemExit,
KeyboardInterrupt, BaseException, atexit) with full traceback to
``logs/gateway-exit-diag.log``. This was used to prove the gateway
was being hard-killed externally (no exit event fired) and should
be kept for future Windows debugging.

Refs: https://bugs.python.org/issue14484
See also: references/windows-subprocess-sigint-storm.md in
the hermes-agent skill.
2026-05-08 12:34:27 -07:00
Teknium ac178b78c4 feat(windows): gateway as a Scheduled Task + Startup-folder fallback
Hermes gateway now installs as a real Windows service via
`hermes gateway install`, auto-starts on user logon, and stays running
across reboots. Mirrors the launchd (macOS) / systemd (Linux) contract
so the rest of the CLI dispatcher just plugs into the same `install /
uninstall / start / stop / restart / status` entrypoints.

Primary implementation is the new `hermes_cli/gateway_windows.py`:

- `schtasks /Create /SC ONLOGON /RL LIMITED /RU <user> /NP /IT` creates
  a per-user Scheduled Task running as the current user at next logon,
  with no UAC prompt and no stored password. Same pattern OpenClaw uses.
- When `schtasks /Create` returns "Access is denied" or times out
  (locked-down corporate boxes, 15s/30s hard + no-output cutoffs),
  fall back to writing a `.cmd` file into
  `%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup\`, which
  Windows Explorer fires at every logon. Either path produces the same
  end-user experience.
- `_spawn_detached()` launches `pythonw.exe -m hermes_cli.main gateway
  run --replace` directly with `DETACHED_PROCESS |
  CREATE_NEW_PROCESS_GROUP | CREATE_NO_WINDOW |
  CREATE_BREAKAWAY_FROM_JOB` + DEVNULL stdio + sidecar
  `logs/gateway-stdio.log`. Going through pythonw.exe (no console)
  instead of a cmd.exe shim is what lets the gateway survive the
  spawning shell's exit on Windows — documented in
  `references/windows-subprocess-sigint-storm.md`.
- Two separate quoting helpers for cmd.exe vs schtasks (`/TR` argument)
  — they're different parsers and mixing breaks both. Same split
  OpenClaw documents in src/daemon/schtasks.ts.
- `_wait_for_gateway_ready()` + `_report_gateway_start()` poll for a
  live gateway process after spawn and report the PID, so install
  doesn't lie about success.
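
The detached-spawn flags can be sketched as follows. The numeric values
are the documented Win32 process-creation flags, written as literals
because the `subprocess` constants only exist on Windows; the helper name
and the POSIX branch are assumptions, and the sidecar log file is omitted:

```python
import subprocess
import sys

# Win32 process creation flags (values from the Windows API).
DETACHED_PROCESS = 0x00000008
CREATE_NEW_PROCESS_GROUP = 0x00000200
CREATE_BREAKAWAY_FROM_JOB = 0x01000000
CREATE_NO_WINDOW = 0x08000000


def detached_popen_kwargs():
    """Build Popen keyword arguments for a process that outlives its parent."""
    kwargs = {
        "stdin": subprocess.DEVNULL,
        "stdout": subprocess.DEVNULL,
        "stderr": subprocess.DEVNULL,
    }
    if sys.platform == "win32":
        kwargs["creationflags"] = (
            DETACHED_PROCESS
            | CREATE_NEW_PROCESS_GROUP
            | CREATE_NO_WINDOW
            | CREATE_BREAKAWAY_FROM_JOB
        )
    else:
        # Assumption: the rough POSIX analogue is a new session.
        kwargs["start_new_session"] = True
    return kwargs


popen_kwargs = detached_popen_kwargs()
```

These kwargs would be passed to `subprocess.Popen([...], **popen_kwargs)`
with the pythonw.exe command line described above.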

Dispatcher wiring in `hermes_cli/gateway.py`:

- `_gateway_command_inner()` gets Windows branches for install /
  uninstall / start / stop / restart / status + `_is_service_installed`
  + `_is_service_running`. `gateway status` output + suggested
  commands now mention `hermes gateway install` instead of
  `sudo hermes gateway install --system` on Windows.

Two separable Windows fixes that only matter for a working
detached gateway, bundled here because shipping them independently
leaves install broken:

(1) Spurious CTRL_C_EVENT on detached pythonw runs. When the gateway
is launched detached on Windows, something on the boot path (HTTPX /
python-telegram-bot / asyncio ProactorEventLoop subprocess plumbing)
synthesizes a Ctrl+C within ~60-90 seconds. Python 3.11 translates it
into KeyboardInterrupt inside `asyncio.run(start_gateway(...))`, the
outer `except KeyboardInterrupt: return` exits cleanly, and the
process dies with no shutdown log — "bot started typing, then
stopped" is the fingerprint because the interrupt fires mid-send.
Fix in `run_gateway()`: when `is_windows()` and stdin is not a TTY,
install `signal.signal(SIGINT, SIG_IGN)` + same for SIGBREAK. Real
console runs have a TTY and skip the absorber, so user Ctrl+C still
works interactively. Same family as commit 449ad952b's browser-tool
SIGINT absorber; cross-referenced in the ref doc.
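
A minimal sketch of the absorber, assuming a hypothetical helper name and
a boolean return value for illustration; the real code in `run_gateway()`
may be structured differently:

```python
import signal
import sys


def install_sigint_absorber():
    """Ignore SIGINT/SIGBREAK on detached (non-TTY) Windows runs.

    Returns True if the handlers were installed.
    """
    if sys.platform != "win32":
        return False
    stdin = sys.stdin  # may be None under pythonw.exe
    if stdin is not None and stdin.isatty():
        return False  # interactive console: leave Ctrl+C working
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    sigbreak = getattr(signal, "SIGBREAK", None)  # Windows-only constant
    if sigbreak is not None:
        signal.signal(sigbreak, signal.SIG_IGN)
    return True


installed = install_sigint_absorber()
```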

(2) `wmic process get` is the process-list path used by
`_scan_gateway_pids()` / `find_gateway_pids()`, which power status,
stop, and restart on Windows. `C:\Windows\System32\wbem\WMIC.exe` has
been deprecated since Windows 10 21H1 and is not installed on modern
Win 10/11 boxes, so `find_gateway_pids()` silently returns [] — status
sees no gateway even when one is running. Fix: `shutil.which("wmic")`
first, fall back to PowerShell's `Get-CimInstance Win32_Process`
emitting the same LIST-style `CommandLine=...` / `ProcessId=...` pairs
the downstream parser already handles. Zero behavior change on boxes
where wmic still works.
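
The LIST-style output that both backends are expected to emit can be
parsed with something like the sketch below. The function name and the
sample text are hypothetical; only the `CommandLine=` / `ProcessId=` key
names come from the description above:

```python
def parse_process_list(text):
    """Parse `CommandLine=...` / `ProcessId=...` pairs into (pid, cmdline)."""
    processes = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # blank separator lines or unrelated output
        current[key] = value
        if "CommandLine" in current and "ProcessId" in current:
            if current["ProcessId"].isdigit():
                processes.append((int(current["ProcessId"]), current["CommandLine"]))
            current = {}
    return processes


sample = (
    "CommandLine=pythonw.exe -m hermes_cli.main gateway run --replace\n"
    "ProcessId=21952\n"
    "\n"
    "CommandLine=\n"
    "ProcessId=4\n"
)
parsed = parse_process_list(sample)
```

Splitting on the first `=` only keeps command lines that themselves
contain `=` (for example `--flag=value`) intact.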

Verified end-to-end on Windows 10 (Delta-1):
- `hermes gateway install` → falls back to Startup folder (access
  denied on schtasks for this user) + detached pythonw spawn, PID
  reported correctly.
- Gateway connects to Telegram, answers messages, stays alive past
  2min (previously died at ~85s with no shutdown log).
- `hermes gateway stop` + `uninstall` both clean up both tracks.

Refs: openclaw/openclaw src/daemon/schtasks.ts for the ONLOGON +
startup-folder-fallback pattern. skill hermes-agent
references/windows-subprocess-sigint-storm.md for the deeper
CTRL_C_EVENT / ProactorEventLoop background.
2026-05-08 12:04:50 -07:00
Teknium 87fca8342a fix(windows installer): UTF-8 BOM, tiered extras, skip tinker-atropos by default
install.ps1 had three related problems that compounded into `hermes dashboard`
failing to boot on Windows with 'No module named fastapi':

1. UTF-8 BOM missing.  Windows PowerShell 5.1 (the default on Windows 10/11,
   which is what `irm | iex` runs under) reads files without a BOM as
   cp1252.  install.ps1 has em-dashes, arrows, check marks, etc. — PS 5.1
   mangled them and the file failed to parse.  Added UTF-8 BOM so PS 5.1,
   PS 7, and the in-memory `irm | iex` path all read the file identically.

2. `uv pip install -e .[all]` had a single-tier silent fallback to bare
   `.` on any failure, with `2>&1 | Out-Null` swallowing the error.  Any
   transient extras install failure (network hiccup, wheel build issue,
   etc.) would drop every optional extra including [web], and the installer
   would still print 'Main package installed'.  Replaced with a four-tier
   fallback (.[all] -> PyPI-only extras -> dashboard+core -> bare) that
   prints output at every step and a targeted [web] verify+repair at the
   end so `hermes dashboard` specifically is never silently broken.

3. tinker-atropos was installed unconditionally after the main install.
   tinker-atropos/pyproject.toml pulls atroposlib and tinker from
   git+https://github.com/... which can fail on locked-down networks,
   flaky DNS, or rate-limited github.com and would half-install the venv.
   install.sh already skipped it by default with a one-liner for users
   who actually do RL training — install.ps1 now matches that behavior.
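
The tiered fallback in item 2 can be sketched in Python as below. This is an illustration only: the installer itself is PowerShell, and the tier names here (`.[all]`, `.[web]`, `.`) are placeholders rather than the real four tiers. The point it shows is that each attempt keeps its output visible and reports its exit code before moving to the next tier.

```python
import subprocess
import sys
from typing import Callable, Optional, Sequence

# Placeholder tier list, most complete first; the real extras names may differ.
INSTALL_TIERS = [".[all]", ".[web]", "."]


def _uv_install(target: str) -> int:
    """Run `uv pip install -e <target>` with output left visible."""
    return subprocess.run(["uv", "pip", "install", "-e", target]).returncode


def install_with_fallback(
    tiers: Sequence[str],
    run: Callable[[str], int] = _uv_install,
) -> Optional[str]:
    """Try each tier in order; return the first that installs, else None."""
    for target in tiers:
        print(f"Installing {target} ...")
        code = run(target)
        if code == 0:
            return target
        # Report the failure instead of swallowing it, then try the next tier.
        print(f"Install of {target} failed with exit code {code}", file=sys.stderr)
    return None
```

A caller can compare the returned tier with the first entry to decide whether a follow-up verify-and-repair step is needed.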

Parse-checked clean under Windows PowerShell 5.1.26100.8115
(5318 tokens, 0 parse errors).
2026-05-08 12:04:50 -07:00
Teknium acc0a81624 fix(windows): browser tool + spurious SIGINT from subprocess spawning
Three related Windows-only fixes that together make the browser toolset
actually usable on Windows. Symptom chain: user invokes browser_navigate
-> tool returns {"success": false, "error": "Daemon process exited
during startup with no error output"} and the CLI exits mid-turn with
the session summary.

Root cause (3 layers):

1. tools/browser_tool.py::_find_agent_browser() resolved
   node_modules/.bin/agent-browser to the extensionless POSIX shell
   shim via Path.exists(). On Windows, CreateProcessW cannot execute
   that script (WinError 193 "not a valid Win32 application"). Fix:
   delegate to shutil.which with path=node_modules/.bin so PATHEXT
   picks up agent-browser.CMD on Windows and the extensionless shim
   stays correct on POSIX.

2. Windows Terminal / Win32 delivers a spurious CTRL_C_EVENT to the
   parent hermes.exe whenever a background thread spawns a .cmd
   subprocess. Python 3.11's default SIGINT handler raises
   KeyboardInterrupt in MainThread, which unwinds prompt_toolkit's
   app.run() -> cli.py::run()'s finally block calls _run_cleanup()
   -> _emergency_cleanup_all_sessions -> spawns a concurrent
   _run_browser_command("close", ...) on the same session the agent
   thread just opened. Two agent-browser processes race on the same
   --session name, the daemon startup loses, and the tool returns
   the "Daemon process exited during startup" error. Fix: install a
   Windows-only SIGINT handler that absorbs the signal silently.
   Real user Ctrl+C still routes through prompt_toolkit's own c-c
   keybinding at the TUI layer, which is how Claude Code handles the
   same quirk (driving cancellation via the TUI key handler, not
   signals).

3. In tools/browser_tool.py, both Popen sites now pass
   creationflags=CREATE_NO_WINDOW | STARTF_USESTDHANDLES with
   close_fds=True on Windows. CREATE_NO_WINDOW suppresses the .cmd
   console flash; STARTF_USESTDHANDLES + close_fds ensures the child
   inherits only our three chosen handles (DEVNULL stdin, temp-file
   stdout/stderr) and no leaked parent console handles that could
   confuse agent-browser's native daemon spawn. Notably we do NOT
   add CREATE_NEW_PROCESS_GROUP - on Python 3.11 Windows the flag
   interacts badly with asyncio's ProactorEventLoop and makes things
   worse.

Verified end-to-end on Windows 10 / Windows Terminal / PowerShell:
browser_navigate to https://example.com returns
{"success": true, "title": "Example Domain"} and the CLI stays alive
for follow-up tool calls and assistant turns.

Refs: earlier Windows quirks commits 1cebb3bad (Ctrl+Enter newline),
26f5af52a (environment hints), aefd1a37f (Playwright Chromium).
2026-05-08 12:04:50 -07:00
emozilla c34884ea20 auth: use get_default_hermes_root() for shared nous_auth.json path
Replace hardcoded ~/.hermes/shared/ references with
get_default_hermes_root() / 'shared' so the cross-profile Nous auth
store lands in the correct location on every platform:

- Linux/macOS: ~/.hermes/shared/
- native Windows: %LOCALAPPDATA%\hermes\shared\
- Docker / custom HERMES_HOME: <root>/shared/

Updates _nous_shared_auth_dir(), the pytest seat-belt in
_nous_shared_store_path(), and the auth_add_command comment to match.
Previously Windows installs wrote to ~/.hermes/shared/ even though the
rest of the CLI uses %LOCALAPPDATA%\hermes, so profiles couldn't see
each other's shared credential.
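
A rough sketch of the path resolution listed above. `default_hermes_root` is a hypothetical stand-in for `get_default_hermes_root()`, whose actual implementation is not shown in this commit; in particular, the precedence of `HERMES_HOME` over the platform default is an assumption.

```python
from pathlib import Path
from typing import Mapping


def default_hermes_root(platform: str, env: Mapping[str, str], home: Path) -> Path:
    """Hypothetical stand-in for get_default_hermes_root()."""
    override = env.get("HERMES_HOME")
    if override:  # assumed: an explicit HERMES_HOME wins on every platform
        return Path(override)
    if platform == "win32" and env.get("LOCALAPPDATA"):
        return Path(env["LOCALAPPDATA"]) / "hermes"
    return home / ".hermes"


def shared_auth_dir(platform: str, env: Mapping[str, str], home: Path) -> Path:
    """Build the shared store location from the root, never from a literal."""
    return default_hermes_root(platform, env, home) / "shared"
```

Deriving the directory from the root in one place is what keeps the shared store next to the rest of the CLI's data on each platform.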
2026-05-08 13:28:06 -04:00
Teknium b9bac87d5a feat(skills): declare platforms frontmatter for all 79 undeclared built-in skills
Completes the Windows-gating coverage for the built-in skills/ tree. Every
bundled SKILL.md now carries an explicit platforms: declaration so the
loader (agent.skill_utils.skill_matches_platform) can skip-load skills
that don't fit the current OS.

74 skills declared cross-platform (platforms: [linux, macos, windows]):
  Creative (16): ascii-art, ascii-video, architecture-diagram, baoyu-comic,
    baoyu-infographic, claude-design, creative-ideation, design-md,
    excalidraw, humanizer, manim-video, p5js, pixel-art,
    popular-web-designs, pretext, sketch, songwriting-and-ai-music,
    touchdesigner-mcp
  Autonomous agents: claude-code, codex, hermes-agent, opencode
  Data/devops: jupyter-live-kernel, kanban-orchestrator, kanban-worker,
    webhook-subscriptions, dogfood, codebase-inspection
  GitHub: github-auth, github-code-review, github-issues,
    github-pr-workflow, github-repo-management
  Media: gif-search, heartmula, songsee, spotify, youtube-content
  MCP / email / gaming / notes / smart-home: native-mcp, himalaya,
    pokemon-player, obsidian, openhue
  mlops (non-broken): weights-and-biases, huggingface-hub, llama-cpp,
    outlines, segment-anything-model, dspy, trl-fine-tuning
  Productivity: airtable, google-workspace, linear, maps, nano-pdf,
    notion, ocr-and-documents, powerpoint
  Red-teaming / research: godmode, arxiv, blogwatcher, llm-wiki,
    polymarket
  Software-dev: debugging-hermes-tui-commands, hermes-agent-skill-authoring,
    node-inspect-debugger, plan, requesting-code-review, spike,
    subagent-driven-development, systematic-debugging,
    test-driven-development, writing-plans
  Misc: yuanbao

5 skills gated from Windows (platforms: [linux, macos]):
  mlops/inference/vllm (serving-llms-vllm)
    vLLM is officially Linux-only; Windows requires WSL.
  mlops/training/axolotl
    Axolotl's flash-attn + deepspeed + bitsandbytes stack is Linux-first.
  mlops/training/unsloth
    Requires Triton + xformers + flash-attn — Linux only in practice.
  mlops/models/audiocraft (audiocraft-audio-generation)
    torchaudio ffmpeg backend + encodec dependencies are Linux-first.
  mlops/inference/obliteratus
    Research abliteration workflow; relies on Linux-focused pytorch
    kernels and MLX — no first-class Windows path.

Same strict-over-lenient policy as the optional-skills sweep: when the
underlying tool's Windows support is rough, missing, or WSL-only, gate the
skill. Easier to un-gate after verified Windows support lands than to leak
partial support that manifests as mid-task failures.

Combined with prior commits in this branch, every bundled SKILL.md
(skills/ + optional-skills/) now has a platforms: declaration.
2026-05-08 09:23:27 -07:00
Teknium 31224b9b5c feat(optional-skills): declare platforms frontmatter for all 63 undeclared skills
Extends the Windows-gating work to the optional-skills/ tree. Every
SKILL.md that previously omitted the platforms: field now carries an
explicit declaration, which Hermes's loader (agent.skill_utils.
skill_matches_platform) honors to skip-load on incompatible OSes.

58 skills declared cross-platform (platforms: [linux, macos, windows]):
  autonomous-ai-agents/blackbox, autonomous-ai-agents/honcho
  blockchain/base, blockchain/solana
  communication/one-three-one-rule
  creative/blender-mcp, creative/concept-diagrams, creative/hyperframes,
  creative/kanban-video-orchestrator, creative/meme-generation
  devops/cli (inference-sh-cli), devops/docker-management
  dogfood/adversarial-ux-test
  email/agentmail
  finance/3-statement-model, finance/comps-analysis, finance/dcf-model,
  finance/excel-author, finance/lbo-model, finance/merger-model,
  finance/pptx-author
  health/fitness-nutrition, health/neuroskill-bci
  mcp/fastmcp, mcp/mcporter
  migration/openclaw-migration
  mlops/accelerate, mlops/chroma, mlops/clip, mlops/guidance,
  mlops/hermes-atropos-environments, mlops/huggingface-tokenizers,
  mlops/instructor, mlops/lambda-labs, mlops/llava, mlops/modal,
  mlops/peft, mlops/pinecone, mlops/pytorch-lightning, mlops/qdrant,
  mlops/saelens, mlops/simpo, mlops/stable-diffusion
  productivity/canvas, productivity/shop-app, productivity/shopify,
  productivity/siyuan, productivity/telephony
  research/domain-intel, research/drug-discovery, research/duckduckgo-search,
  research/gitnexus-explorer, research/parallel-cli, research/scrapling
  security/1password, security/oss-forensics, security/sherlock
  web-development/page-agent

5 skills gated from Windows (platforms: [linux, macos]):
  mlops/flash-attention   - Flash Attention wheels are Linux-first; Windows
                            install requires building from source with CUDA
  mlops/faiss             - faiss-gpu has no Windows wheel; gate rather than
                            leak partial (faiss-cpu) support
  mlops/nemo-curator      - NVIDIA NeMo ecosystem has no first-class Windows path
  mlops/slime             - Megatron+SGLang RL stack is Linux-only in practice
  mlops/whisper           - openai-whisper + ffmpeg setup on Windows is
                            non-trivial; gate until Windows install stanza lands

Methodology: scanned every SKILL.md for Windows-hostile signals
(apt-get, brew, systemd, osascript, ptrace, X11 binaries, POSIX-only
Python APIs, Docker POSIX $(pwd) bind-mounts, explicit 'linux-only' /
'macos-only' text). 3 skills flagged as having hard signals on review:
docker-management and qdrant only had POSIX $(pwd) docker examples and
the tools themselves (Docker Desktop, Qdrant) run fine on Windows —
declared ALL. whisper had an apt/brew ffmpeg install path and nothing
else but the openai-whisper Windows install story is rough enough to
warrant gating.

Strict-over-lenient policy: when in doubt, gate. Easier to un-gate after
verified Windows support lands than to leak partial support that
manifests as mid-task failures for Windows users.
2026-05-08 09:16:33 -07:00
Teknium 3e823d5b3e feat(skills): gate 7 Linux/macOS-only skills from Windows via platforms frontmatter
Hermes's skill loader (agent/skill_utils.skill_matches_platform) already honors
the 'platforms:' frontmatter field and skip-loads skills whose declared
platform list doesn't include sys.platform. Seven bundled skills are in fact
Linux/macOS-only but never declared it, so they leak into Windows skill
listings and sometimes load with broken instructions.
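
For readers unfamiliar with the gate, here is a minimal sketch of how a `platforms:` list might be matched against `sys.platform`. It is not the code in `agent/skill_utils`; the name mapping and the "undeclared means load everywhere" default are assumptions made for illustration.

```python
import sys
from typing import Iterable, Optional

# Assumed mapping from frontmatter names to sys.platform prefixes.
_PLATFORM_PREFIXES = {"linux": "linux", "macos": "darwin", "windows": "win32"}


def matches_platform(
    platforms: Optional[Iterable[str]],
    current: Optional[str] = None,
) -> bool:
    """Return True when a skill should load on the current platform."""
    current = sys.platform if current is None else current
    if not platforms:
        return True  # assumption: no declaration means no gating
    for name in platforms:
        key = str(name).strip().lower()
        if current.startswith(_PLATFORM_PREFIXES.get(key, key)):
            return True
    return False
```

With this shape, `platforms: [linux, macos]` is enough to keep a skill out of Windows listings without touching the skill body.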

Audited all 160 SKILL.md files (skills/ + optional-skills/) for Windows-
hostile signals: apt-get/brew/systemd/chmod+x install flows, ptrace/proc
runtime dependencies, bash-only launcher scripts, and package dependencies
with no Windows build. The 7 below fail one or more of those tests in a way
that fundamentally can't be papered over by docs edits:

  minecraft-modpack-server      bash start.sh + chmod +x + apt openjdk
  evaluating-llms-harness       lm-eval-harness bash launcher scripts
  distributed-llm-pretraining-
  torchtitan                    bash multi-node torchrun launcher
  python-debugpy                remote attach relies on /proc ptrace_scope
  pytorch-fsdp                  NCCL backend; Windows path is WSL only
  tensorrt-llm                  NVIDIA TensorRT-LLM has no Windows build
  searxng-search                Docker volume flow assumes POSIX $(pwd)

All seven get 'platforms: [linux, macos]'. On Windows the loader now skips
them silently — no more phantom skill listings, no more mid-task failures
because an Apple-only path was surfaced as a suggestion.

Cross-platform skills that merely CONTAIN signals in examples or
install-instructions (brew install as one of several paths, /tmp/ in a code
snippet, etc.) are NOT touched by this commit. A broader audit that
declares the ~140 cross-platform skills as 'platforms: [linux, macos,
windows]' can follow as a separate change once each has been verified
working on Windows.

The installed user copies under ~/AppData/Local/hermes/skills/ (when they
exist) are also patched so the running session reflects the gating
immediately, but only the in-repo files are committed here.
2026-05-08 08:27:23 -07:00
Teknium aefd1a37f4 fix(windows): auto-install Playwright Chromium + surface it in doctor
scripts/install.sh runs 'npx playwright install --with-deps chromium'
on every Linux distro after the npm-install step, which is why browser
tools Just Work on Linux.  scripts/install.ps1 never did the equivalent
step, so on native Windows installs check_browser_requirements() in
tools/browser_tool.py would return False (no Chromium under
%LOCALAPPDATA%\ms-playwright) and every browser_* tool got silently
filtered out of the agent's tool schema — no error, no log entry, user
just wondered why the tools didn't exist.

Two-part fix:

1. scripts/install.ps1: after 'npm install' in InstallDir succeeds, run
   'npx playwright install chromium'.  Resolves npx via the same
   execution-policy-aware logic already used for npm (prefer npx.cmd
   next to npmExe, fall back to Get-Command).  Surfaces a warning +
   manual-recovery hint when the install fails, matching install.sh
   behaviour for distros.

2. hermes_cli/doctor.py: after the agent-browser check, lazily import
   tools.browser_tool and reuse the exact same _chromium_installed()
   predicate check_browser_requirements() uses, so the doctor signal
   cannot drift from the runtime gate.  Skip the check when Camofox /
   CDP override / a cloud provider / Lightpanda is configured (those
   bypass local Chromium).  On missing Chromium, the hint is
   platform-correct: '--with-deps' on POSIX, plain 'install chromium'
   on win32.
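
The platform-correct hint from part 2 reduces to a small branch. The function name `chromium_install_hint` is hypothetical; the two command strings are the ones quoted in this commit message.

```python
import sys
from typing import Optional


def chromium_install_hint(platform: Optional[str] = None) -> str:
    """Return the recovery command to show when Chromium is missing."""
    platform = sys.platform if platform is None else platform
    if platform == "win32":
        # Plain install on Windows, as described above.
        return "npx playwright install chromium"
    # POSIX installs also pull system libraries.
    return "npx playwright install --with-deps chromium"
```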

Verified on Windows 10:
- 'npx playwright install chromium' completes successfully, drops
  Chrome Headless Shell under %LOCALAPPDATA%\ms-playwright
- check_browser_requirements() flips from False -> True
- 'hermes doctor' now prints either '✓ Playwright Chromium (browser
  engine)' or '⚠ Playwright Chromium not installed' + fix command
- tests/hermes_cli/test_doctor.py: 38/38 pass
- tests/tools/test_browser_chromium_check.py: 16/16 pass
2026-05-08 07:56:35 -07:00
Teknium ec3f7d1a89 docs: add Windows-Specific Quirks section to hermes-agent skill + keystroke diagnostic
Adds a dedicated '## Windows-Specific Quirks' section to the hermes-agent
skill so Windows pitfalls have one discoverable place to evolve. Inaugural
entries cover:

- Input / keybindings — Alt+Enter intercepted by Windows Terminal,
  Ctrl+Enter as the Windows newline keystroke, mintty/git-bash behavior,
  pointer to scripts/keystroke_diagnostic.py for investigation.
- Config / files — UTF-8 BOM HTTP-400 trap.
- execute_code / sandbox — WinError 10106 SYSTEMROOT root cause +
  _WINDOWS_ESSENTIAL_ENV_VARS fix location.
- Testing / contributing — scripts/run_tests.sh POSIX-venv limitation and
  the system-Python workaround, POSIX-only test skip-guard patterns.
- Path / filesystem — line-ending warnings (cosmetic), forward-slash
  portability.

Collapses the old scattered Windows bullets under 'Platform-specific
issues' into a single pointer at the new dedicated section so there's
only one place to maintain this content.

Also adds the scripts/keystroke_diagnostic.py the skill now references —
a small prompt_toolkit Application that prints the Keys.* identifier and
raw escape bytes for every keystroke. Used to establish the Ctrl+Enter
= c-j fact on Windows Terminal; generally useful for anyone adding a
platform-aware keybinding.
2026-05-08 07:13:07 -07:00
Teknium 1cebb3bad8 feat: Ctrl+Enter inserts newline on Windows Terminal
Windows Terminal intercepts Alt+Enter for its fullscreen shortcut, leaving
Windows users with no Enter-involving way to insert a newline in the Hermes
prompt. Fix it by reclaiming c-j on Windows only:

- _bind_prompt_submit_keys now binds c-j (LF) to submit only on POSIX, where
  thin PTYs (docker exec, some SSH configs) deliver Enter as LF. On Windows
  plain Enter is always c-m, so c-j is free.
- Windows-only prompt binding: c-j inserts a newline. Windows Terminal sends
  Ctrl+Enter as LF, so the user-facing keystroke is Ctrl+Enter — no terminal
  settings changes required.
- Alt+Enter binding unchanged; still works on mac/Linux/WSL.
- Test TestPromptToolkitTerminalCompatibility::test_lf_enter_binds_to_submit_handler
  split into platform-aware assertions for POSIX vs win32.
- Fixed the Ctrl+J claim in hermes_cli/tips.py (was wrong before this commit
  even on POSIX) to point Windows users at Ctrl+Enter.

Tradeoff: on Windows, raw Ctrl+J (without Enter) also inserts a newline,
since WT collapses Ctrl+Enter and Ctrl+J to the same c-j keycode. No
conflicting Hermes binding existed for Ctrl+J, so this is a harmless side
effect.
2026-05-08 06:23:25 -07:00
Teknium 26f5af52a8 feat: enrich system-prompt environment hints with host + terminal-backend info
build_environment_hints() now emits a factual block describing the
execution environment on every prompt build:

* Local backend: host OS, $HOME, and cwd — so the agent stops guessing
  paths from the hostname. Windows also gets two specific callouts:
  - hostname != username (prevents C:\Users\<hostname>\... bugs)
  - `terminal` shells out to bash (git-bash/MSYS), not PowerShell

* Remote backend (docker/singularity/modal/daytona/ssh/vercel_sandbox):
  host info is SUPPRESSED — the agent's tools can't touch the host, so
  showing it is misleading. Instead we probe the backend once per
  process with `uname/whoami/pwd` and cache the result. On probe
  failure, fall back to a per-backend description that states only what
  we know from the backend choice itself (container type + likely OS
  family) without inventing user/cwd/$HOME.

Linux/Mac local users now get a small helpful 3-line host block instead
of an empty string. Zero change to the existing WSL hint paragraph.

Tests: 8 new/updated in TestEnvironmentHints, including a regression
guard that fails if a new remote backend is added without listing it in
_REMOTE_TERMINAL_BACKENDS.
2026-05-08 05:07:40 -07:00
203 changed files with 3742 additions and 378 deletions
+54 -4
@@ -1,9 +1,12 @@
name: Lint (ruff + ty)
# Surface ruff and ty diagnostics as a diff vs the target branch.
# This check is advisory only ATM it always exits zero and never blocks merge.
# It posts a Markdown summary to the workflow run and, for pull requests,
# comments the same summary on the PR.
# Two things here:
# 1. Advisory diff — ruff + ty diagnostics as a diff vs the target branch.
# Posts a Markdown summary and a PR comment. Exit zero always.
# 2. Blocking ``ruff check .`` — enforces the explicit rules in
# ``[tool.ruff.lint.select]`` (currently PLW1514). Failure blocks merge.
# Separate job so the advisory diff still runs and posts even when
# enforcement fails.
on:
push:
@@ -149,3 +152,50 @@ jobs:
body: fullBody,
});
}
ruff-blocking:
# Enforce the rules in pyproject.toml [tool.ruff.lint.select]. Currently
# PLW1514 (unspecified-encoding) — catches bare ``open()`` /
# ``read_text()`` / ``write_text()`` calls that default to locale
# encoding on Windows. Failure here blocks merge; the advisory
# ``lint-diff`` job above runs independently so reviewers still get
# the diff comment even when enforcement fails.
name: ruff enforcement (blocking)
runs-on: ubuntu-latest
timeout-minutes: 5
steps:
- name: Checkout code
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
- name: Install ruff
run: uv tool install ruff
- name: ruff check .
# No --exit-zero, no || true. Exit code propagates to the job,
# which propagates to the required-check gate.
run: |
ruff check .
windows-footguns:
# Static guardrails on Windows-unsafe Python primitives — os.kill(pid, 0),
# os.killpg, os.setsid, signal.SIGKILL without getattr fallback,
# shebang scripts via subprocess, bare open() without encoding=, etc.
# See scripts/check-windows-footguns.py for the full rule list.
name: Windows footguns (blocking)
runs-on: ubuntu-latest
timeout-minutes: 5
steps:
- name: Checkout code
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- name: Set up Python
uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5
with:
python-version: "3.11"
- name: Run footgun checker
run: python scripts/check-windows-footguns.py --all
+155 -7
@@ -522,11 +522,57 @@ See `hermes_cli/skin_engine.py` for the full schema and existing skins as exampl
## Cross-Platform Compatibility
Hermes runs on Linux, macOS, and WSL2 on Windows. When writing code that touches the OS:
Hermes runs on Linux, macOS, and native Windows (plus WSL2). When writing code
that touches the OS, assume *any* platform can hit your code path.
> **Before you PR:** run `scripts/check-windows-footguns.py` to catch the
> common Windows-unsafe patterns in your diff. It's grep-based and cheap;
> CI runs it on every PR too.
### Critical rules
1. **`termios` and `fcntl` are Unix-only.** Always catch both `ImportError` and `NotImplementedError`:
1. **Never call `os.kill(pid, 0)` for liveness checks.** `os.kill(pid, 0)`
is a standard POSIX idiom to check "is this PID alive" — the signal 0
is a no-op permission check. **On Windows it is NOT a no-op.** Python's
Windows `os.kill` maps `sig=0` to `CTRL_C_EVENT` (they collide at the
integer value 0) and routes it through `GenerateConsoleCtrlEvent(0, pid)`,
which broadcasts Ctrl+C to the **entire console process group** containing
the target PID. "Probe if alive" silently becomes "kill the target and
often unrelated processes sharing its console." See [bpo-14484](https://bugs.python.org/issue14484)
(open since 2012 — will never be fixed for compat reasons).
**Preferred:** use `psutil` (a core dependency — always available):
```python
import psutil
if psutil.pid_exists(pid):
# process is alive — safe on every platform
...
```
If you specifically need the hermes wrapper (it has a stdlib fallback
for scaffold-phase imports before pip install finishes), use
`gateway.status._pid_exists(pid)`. It calls `psutil.pid_exists` first
and falls back to a hand-rolled `OpenProcess + WaitForSingleObject`
dance on Windows only when psutil is somehow missing.
Audit grep for new callsites: `rg "os\.kill\([^,]+,\s*0\s*\)"`. Any hit
in non-test code is presumptively a Windows silent-kill bug.
2. **Use `shutil.which()` before shelling out — don't assume Windows has
tools Linux has.** `wmic` is deprecated as of Windows 10 21H1 and absent from newer Windows installs. `ps`,
`kill`, `grep`, `awk`, `fuser`, `lsof`, `pgrep`, and most POSIX CLI tools
simply don't exist on Windows. Test availability with
`shutil.which("tool")` and fall back to a Windows-native equivalent —
usually PowerShell via `subprocess.run(["powershell", "-NoProfile",
"-Command", ...])`.
For process enumeration: PowerShell's `Get-CimInstance Win32_Process` is
the modern replacement for `wmic process`. See
`hermes_cli/gateway.py::_scan_gateway_pids` for the pattern.
3. **`termios` and `fcntl` are Unix-only.** Always catch both `ImportError`
and `NotImplementedError`:
```python
try:
from simple_term_menu import TerminalMenu
@@ -539,24 +585,126 @@ Hermes runs on Linux, macOS, and WSL2 on Windows. When writing code that touches
idx = int(input("Choice: ")) - 1
```
2. **File encoding.** Windows may save `.env` files in `cp1252`. Always handle encoding errors:
4. **File encoding.** Windows may save `.env` files in `cp1252`. Always
handle encoding errors:
```python
try:
load_dotenv(env_path)
except UnicodeDecodeError:
load_dotenv(env_path, encoding="latin-1")
```
Config files (`config.yaml`) may be saved with a UTF-8 BOM by Notepad and
similar editors — use `encoding="utf-8-sig"` when reading files that
could have been touched by a Windows GUI editor.
3. **Process management.** `os.setsid()`, `os.killpg()`, and signal handling differ on Windows. Use platform checks:
5. **Process management.** `os.setsid()`, `os.killpg()`, `os.fork()`,
`os.getuid()`, and POSIX signal handling differ on Windows. Guard with
`platform.system()`, `sys.platform`, or `hasattr(os, "setsid")`:
```python
import platform
if platform.system() != "Windows":
kwargs["preexec_fn"] = os.setsid
else:
kwargs["creationflags"] = subprocess.CREATE_NEW_PROCESS_GROUP
```
4. **Path separators.** Use `pathlib.Path` instead of string concatenation with `/`.
**Preferred:** for killing a process AND its children (what `os.killpg`
does on POSIX), use `psutil` — it works on every platform:
```python
import psutil
try:
parent = psutil.Process(pid)
# Kill children first (leaf-up), then the parent.
for child in parent.children(recursive=True):
child.kill()
parent.kill()
except psutil.NoSuchProcess:
pass
```
5. **Shell commands in installers.** If you change `scripts/install.sh`, check if the equivalent change is needed in `scripts/install.ps1`.
6. **Signals that don't exist on Windows: `SIGALRM`, `SIGCHLD`, `SIGHUP`,
`SIGUSR1`, `SIGUSR2`, `SIGPIPE`, `SIGQUIT`, `SIGKILL`.** Python's
`signal` module raises `AttributeError` as soon as you reference them on
Windows (at import time when the reference sits at module level). Use
`getattr(signal, "SIGKILL", signal.SIGTERM)` or
gate the whole block behind a platform check. `loop.add_signal_handler`
raises `NotImplementedError` on Windows — always catch it.
7. **Path separators.** Use `pathlib.Path` instead of string concatenation
with `/`. Forward slashes work almost everywhere on Windows, but
`subprocess.run(["cmd.exe", "/c", ...])` and other shell contexts can
require backslashes — convert with `str(path)` at the subprocess boundary,
not inside Python logic.
8. **Symlinks need elevated privileges on Windows** (unless Developer Mode is
on). Tests that create symlinks need `@pytest.mark.skipif(sys.platform ==
"win32", reason="Symlinks require elevated privileges on Windows")`.
9. **POSIX file modes (0o600, 0o644, etc.) are NOT enforced on NTFS** by
default. Tests that assert on `stat().st_mode & 0o777` must skip on
Windows — the concept doesn't translate. Use ACLs (`icacls`, `pywin32`)
for Windows secret-file protection if needed.
10. **Detached background daemons on Windows need `pythonw.exe`, NOT
`python.exe`.** `python.exe` always allocates or attaches to a console,
which makes it vulnerable to `CTRL_C_EVENT` broadcasts from any sibling
process. `pythonw.exe` is the no-console variant. Combine with
`CREATE_NO_WINDOW | DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP |
CREATE_BREAKAWAY_FROM_JOB` in `subprocess.Popen(creationflags=...)`.
See `hermes_cli/gateway_windows.py::_spawn_detached` for the reference
implementation.
11. **`subprocess.Popen` with `.cmd` or `.bat` shims needs `shutil.which`
to resolve.** Passing `"agent-browser"` to `Popen` on Windows finds
the extensionless POSIX shebang shim in `node_modules/.bin/`, which
`CreateProcessW` can't execute — you'll get `WinError 193 "not a valid
Win32 application"`. Use `shutil.which("agent-browser", path=local_bin)`
which honors PATHEXT and picks the `.CMD` variant on Windows.
12. **Don't use shell shebangs as a way to run Python.** `#!/usr/bin/env
python` only works when the file is executed through a Unix shell.
`subprocess.run(["./myscript.py"])` on Windows fails even if the file
has a shebang line. Always invoke Python explicitly:
`[sys.executable, "myscript.py"]`.
13. **Shell commands in installers.** If you change `scripts/install.sh`,
make the equivalent change in `scripts/install.ps1`. The two scripts
are the canonical example of "works on Linux does not mean works on
Windows" and have drifted multiple times — keep them in lockstep.
14. **Known paths that are OneDrive-redirected on Windows:** Desktop,
Documents, Pictures, Videos. The "real" path when OneDrive Backup is
enabled is `%USERPROFILE%\OneDrive\Desktop` (etc.), NOT
`%USERPROFILE%\Desktop` (which exists as an empty husk). Resolve the
real location via `ctypes` + `SHGetKnownFolderPath` or by reading the
`Shell Folders` registry key — never assume `~/Desktop`.
15. **CRLF vs LF in generated scripts.** Windows `cmd.exe` and `schtasks`
parse line-by-line; mixed or LF-only line endings can break multi-line
`.cmd` / `.bat` files. Use `open(path, "w", encoding="utf-8",
newline="\r\n")` — or `open(path, "wb")` + explicit bytes — when
generating scripts Windows will execute.
16. **Two different quoting schemes in one command line.** `subprocess.run
(["schtasks", "/TR", some_cmd])` → schtasks itself parses `/TR`, AND
the `some_cmd` string is re-parsed by `cmd.exe` when the task fires.
Different parsers, different escape rules. Use two separate quoting
helpers and never cross them. See `hermes_cli/gateway_windows.py::
_quote_cmd_script_arg` and `_quote_schtasks_arg` for the reference
pair.
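
As an illustration of the rule on signals that are missing on Windows, here is a minimal sketch assuming a hypothetical helper name (`install_stop_handler`). It combines the `getattr` fallback with the `NotImplementedError` guard around `loop.add_signal_handler`.

```python
import asyncio
import signal
from typing import Callable

# SIGKILL is absent from the signal module on Windows; fall back to SIGTERM.
KILL_SIGNAL = getattr(signal, "SIGKILL", signal.SIGTERM)


def install_stop_handler(
    loop: asyncio.AbstractEventLoop,
    callback: Callable[[], None],
) -> bool:
    """Register `callback` for SIGTERM; return False where unsupported."""
    try:
        loop.add_signal_handler(signal.SIGTERM, callback)
        return True
    except NotImplementedError:
        # Windows event loops do not implement add_signal_handler, so the
        # caller needs another shutdown path there.
        return False
```

Returning a boolean rather than raising keeps the platform branch at the call site explicit.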
### Testing cross-platform
Tests that use POSIX-only syscalls need a skip marker. Common ones:
- Symlinks → `@pytest.mark.skipif(sys.platform == "win32", ...)`
- `0o600` file modes → `@pytest.mark.skipif(sys.platform.startswith("win"), ...)`
- `signal.SIGALRM` → Unix-only (see `tests/conftest.py::_enforce_test_timeout`)
- `os.setsid` / `os.fork` → Unix-only
- Live Winsock / Windows-specific regression tests →
`@pytest.mark.skipif(sys.platform != "win32", reason="Windows-specific regression")`
If you monkeypatch `sys.platform` for cross-platform tests, also patch
`platform.system()` / `platform.release()` / `platform.mac_ver()` — each
re-reads the real OS independently, so half-patched tests still route
through the wrong branch on a Windows runner.
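
A small sketch of the paired patching described above, using only `unittest.mock`. The context-manager name `pretend_platform` is hypothetical, and it covers just `sys.platform` and `platform.system()`; `platform.release()` and `platform.mac_ver()` would need the same treatment in tests that read them.

```python
import platform
import sys
from contextlib import contextmanager
from typing import Iterator
from unittest import mock


@contextmanager
def pretend_platform(sys_platform: str, system_name: str) -> Iterator[None]:
    """Patch sys.platform and platform.system() together for one test."""
    with mock.patch.object(sys, "platform", sys_platform), \
         mock.patch.object(platform, "system", return_value=system_name):
        yield
```

Both patches are undone when the `with` block exits, so a test cannot leave the process half-patched.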
---
+8 -1
@@ -15,7 +15,14 @@ Usage::
# IMPORTANT: hermes_bootstrap must be the very first import — UTF-8 stdio
# on Windows. No-op on POSIX. See hermes_bootstrap.py for full rationale.
import hermes_bootstrap # noqa: F401
try:
import hermes_bootstrap # noqa: F401
except ModuleNotFoundError:
# Graceful fallback when hermes_bootstrap isn't registered in the venv
# yet — happens during partial ``hermes update`` where git-reset landed
# new code but ``uv pip install -e .`` didn't finish. Missing bootstrap
# means UTF-8 stdio setup is skipped on Windows; POSIX is unaffected.
pass
import asyncio
import logging
+1 -1
@@ -69,7 +69,7 @@ def _resolve_home_dir() -> str:
try:
import pwd
resolved = pwd.getpwuid(os.getuid()).pw_dir.strip()
resolved = pwd.getpwuid(os.getuid()).pw_dir.strip() # windows-footgun: ok — POSIX fallback inside try/except (pwd import fails on Windows)
if resolved:
return resolved
except Exception:
+204 -2
@@ -539,13 +539,215 @@ WSL_ENVIRONMENT_HINT = (
)
# Non-local terminal backends that run commands (and therefore every file
# tool: read_file, write_file, patch, search_files) inside a separate
# container / remote host rather than on the machine where Hermes itself
# runs. For these backends, host info (Windows/Linux/macOS, $HOME, cwd) is
# misleading — the agent should only see the machine it can actually touch.
_REMOTE_TERMINAL_BACKENDS = frozenset({
"docker", "singularity", "modal", "daytona", "ssh",
"vercel_sandbox", "managed_modal",
})
# Per-backend fallback descriptions — used when the live probe fails.
# Only states what we know from the backend choice itself (container type,
# likely OS family). Does NOT invent cwd, user, or $HOME — the agent is
# told to probe those directly if it needs them.
_BACKEND_FALLBACK_DESCRIPTIONS: dict[str, str] = {
"docker": "a Docker container (Linux)",
"singularity": "a Singularity container (Linux)",
"modal": "a Modal sandbox (Linux)",
"managed_modal": "a managed Modal sandbox (Linux)",
"daytona": "a Daytona workspace (Linux)",
"vercel_sandbox": "a Vercel sandbox (Linux)",
"ssh": "a remote host reached over SSH (likely Linux)",
}
# Cache the backend probe result per process so we only pay the probe cost
# on the first prompt build of a session. Keyed by (env_type, cwd_hint) so
# a mid-process backend switch rebuilds the string. Kept in-module (not on
# disk) because the probe captures live backend state that may change
# across Hermes restarts.
_BACKEND_PROBE_CACHE: dict[tuple[str, str], str] = {}
_WINDOWS_BASH_SHELL_HINT = (
"Shell: on this Windows host your `terminal` tool runs commands through "
"bash (git-bash / MSYS), NOT PowerShell or cmd.exe. Use POSIX shell "
"syntax (`ls`, `$HOME`, `&&`, `|`, single-quoted strings) inside terminal "
"calls. MSYS-style paths like `/c/Users/<user>/...` work alongside "
"native `C:\\Users\\<user>\\...` paths. PowerShell builtins "
"(`Get-ChildItem`, `$env:FOO`, `Select-String`) will NOT work — use their "
"POSIX equivalents (`ls`, `$FOO`, `grep`)."
)
def _probe_remote_backend(env_type: str) -> str | None:
"""Run a tiny introspection command inside the active terminal backend.
Returns a pre-formatted multi-line string describing the backend's OS,
$HOME, cwd, and user — or None if the probe failed. Result is cached
per process. Used only for non-local backends where the agent's tools
operate on a different machine than the host Hermes runs on.
"""
cwd_hint = os.getenv("TERMINAL_CWD", "")
cache_key = (env_type, cwd_hint)
cached = _BACKEND_PROBE_CACHE.get(cache_key)
if cached is not None:
return cached or None
try:
# Import locally: tools/ imports are heavy and only relevant when a
# non-local backend is actually configured.
from tools.terminal_tool import _get_env_config # type: ignore
from tools.environments import get_environment # type: ignore
except Exception as e:
logger.debug("Backend probe unavailable (import failed): %s", e)
_BACKEND_PROBE_CACHE[cache_key] = ""
return None
try:
config = _get_env_config()
env = get_environment(config)
# Single-line POSIX probe — works on any Unixy backend. Wrapped in
# `2>/dev/null` so a missing binary doesn't pollute the output.
probe_cmd = (
"printf 'os=%s\\nkernel=%s\\nhome=%s\\ncwd=%s\\nuser=%s\\n' "
"\"$(uname -s 2>/dev/null || echo unknown)\" "
"\"$(uname -r 2>/dev/null || echo unknown)\" "
"\"$HOME\" \"$(pwd)\" \"$(whoami 2>/dev/null || id -un 2>/dev/null || echo unknown)\""
)
result = env.execute(probe_cmd, timeout=4)
if result.get("returncode") != 0:
logger.debug("Backend probe returned non-zero: %r", result)
_BACKEND_PROBE_CACHE[cache_key] = ""
return None
output = (result.get("output") or "").strip()
if not output:
_BACKEND_PROBE_CACHE[cache_key] = ""
return None
except Exception as e:
logger.debug("Backend probe failed: %s", e)
_BACKEND_PROBE_CACHE[cache_key] = ""
return None
# Parse key=value lines back into a tidy summary.
parsed: dict[str, str] = {}
for line in output.splitlines():
if "=" in line:
k, _, v = line.partition("=")
parsed[k.strip()] = v.strip()
pieces = []
os_bits = " ".join(x for x in (parsed.get("os"), parsed.get("kernel")) if x and x != "unknown")
if os_bits:
pieces.append(f"OS: {os_bits}")
if parsed.get("user") and parsed["user"] != "unknown":
pieces.append(f"User: {parsed['user']}")
if parsed.get("home"):
pieces.append(f"Home: {parsed['home']}")
if parsed.get("cwd"):
pieces.append(f"Working directory: {parsed['cwd']}")
if not pieces:
_BACKEND_PROBE_CACHE[cache_key] = ""
return None
formatted = "\n".join(f" {p}" for p in pieces)
_BACKEND_PROBE_CACHE[cache_key] = formatted
return formatted
def _clear_backend_probe_cache() -> None:
"""Test helper — drop the backend probe cache so monkeypatched backends take effect."""
_BACKEND_PROBE_CACHE.clear()
def build_environment_hints() -> str:
"""Return environment-specific guidance for the system prompt.
Detects WSL, and can be extended for Termux, Docker, etc.
Returns an empty string when no special environment is detected.
Always emits a factual block describing the execution environment:
- For **local** terminal backends: the host OS, user home, current
working directory (plus a Windows-only note about hostname != user
and a Windows-only note that `terminal` shells out to bash, not
PowerShell).
- For **remote / sandbox** terminal backends (docker, singularity,
modal, managed_modal, daytona, ssh, vercel_sandbox): host info is **suppressed**
because the agent's tools can't touch the host — only the backend
matters. A live probe inside the backend reports its OS, user, $HOME,
and cwd. Falls back to a static summary if the probe fails.
The WSL environment hint is appended unchanged when running under WSL.
"""
import platform
import sys
hints: list[str] = []
backend = (os.getenv("TERMINAL_ENV") or "local").strip().lower()
is_remote_backend = backend in _REMOTE_TERMINAL_BACKENDS
if not is_remote_backend:
# --- Host info block (local backend: host == where tools run) ---
host_lines: list[str] = []
if is_wsl():
host_lines.append("Host: WSL (Windows Subsystem for Linux)")
elif sys.platform == "win32":
host_lines.append(f"Host: Windows ({platform.release()})")
elif sys.platform == "darwin":
mac_ver = platform.mac_ver()[0]
host_lines.append(f"Host: macOS ({mac_ver or platform.release()})")
else:
host_lines.append(f"Host: {platform.system()} ({platform.release()})")
host_lines.append(f"User home directory: {os.path.expanduser('~')}")
try:
host_lines.append(f"Current working directory: {os.getcwd()}")
except OSError:
pass
if sys.platform == "win32" and not is_wsl():
host_lines.append(
"Note: on Windows, the machine hostname (e.g. from `hostname` "
"or uname) is NOT the username. Use the 'User home directory' "
"above to construct paths under C:\\Users\\<user>\\, never the "
"hostname."
)
hints.append("\n".join(host_lines))
# Windows-local terminal runs bash, not PowerShell — the model must
# know this or it will issue PowerShell syntax and fail.
if sys.platform == "win32" and not is_wsl():
hints.append(_WINDOWS_BASH_SHELL_HINT)
else:
# --- Remote backend block (host info suppressed) ---
probe = _probe_remote_backend(backend)
if probe:
hints.append(
f"Terminal backend: {backend}. Your `terminal`, `read_file`, "
f"`write_file`, `patch`, and `search_files` tools all operate "
f"inside this {backend} environment — NOT on the machine "
f"where Hermes itself is running. The host OS, home, and cwd "
f"of the Hermes process are irrelevant; only the following "
f"backend state matters:\n{probe}"
)
else:
description = _BACKEND_FALLBACK_DESCRIPTIONS.get(
backend, f"a {backend} environment (likely Linux)"
)
hints.append(
f"Terminal backend: {backend}. Your `terminal`, `read_file`, "
f"`write_file`, `patch`, and `search_files` tools all operate "
f"inside {description} — NOT on the machine where Hermes "
f"itself runs. The backend probe didn't respond at "
f"prompt-build time, so the sandbox's current user, $HOME, "
f"and working directory are unknown from here. If you need "
f"them, probe directly with a terminal call like "
f"`uname -a && whoami && pwd`."
)
if is_wsl():
hints.append(WSL_ENVIRONMENT_HINT)
return "\n\n".join(hints)
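The parsing half of `_probe_remote_backend` can be illustrated in isolation. The following is a minimal sketch, not the module's code: `summarize_probe_output` is a hypothetical name, the two-space indent is an assumption, and the backend call and the per-process cache are deliberately left out.

```python
from typing import Optional


def summarize_probe_output(output: str) -> Optional[str]:
    """Hypothetical helper: turn `key=value` probe lines into an indented summary."""
    parsed = {}
    for line in output.splitlines():
        if "=" in line:
            # partition() splits on the first "=" only, so values may contain "=".
            key, _, value = line.partition("=")
            parsed[key.strip()] = value.strip()

    pieces = []
    os_bits = " ".join(
        x for x in (parsed.get("os"), parsed.get("kernel")) if x and x != "unknown"
    )
    if os_bits:
        pieces.append(f"OS: {os_bits}")
    user = parsed.get("user")
    if user and user != "unknown":
        pieces.append(f"User: {user}")
    if parsed.get("home"):
        pieces.append(f"Home: {parsed['home']}")
    if parsed.get("cwd"):
        pieces.append(f"Working directory: {parsed['cwd']}")
    if not pieces:
        return None  # nothing usable; the caller would fall back to a static description
    return "\n".join(f"  {p}" for p in pieces)


sample = "os=Linux\nkernel=6.1.0\nhome=/root\ncwd=/workspace\nuser=unknown\n"
print(summarize_probe_output(sample))
```

Fields reported as `unknown` are dropped rather than echoed, so the prompt only carries values the probe actually observed.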
+8 -1
@@ -22,7 +22,14 @@ Usage:
# IMPORTANT: hermes_bootstrap must be the very first import — UTF-8 stdio
# on Windows. No-op on POSIX. See hermes_bootstrap.py for full rationale.
import hermes_bootstrap # noqa: F401
try:
import hermes_bootstrap # noqa: F401
except ModuleNotFoundError:
# Graceful fallback when hermes_bootstrap isn't registered in the venv
# yet — happens during partial ``hermes update`` where git-reset landed
# new code but ``uv pip install -e .`` didn't finish. Missing bootstrap
# means UTF-8 stdio setup is skipped on Windows; POSIX is unaffected.
pass
import json
import logging
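The guarded-import pattern used at each entry point can be sketched generically. This is an illustration under stated assumptions, not code from the repository: `import_optional` is a hypothetical helper, and the entry points themselves use a plain `try`/`except` around the `import` statement instead.

```python
import importlib
from types import ModuleType
from typing import Optional


def import_optional(name: str) -> Optional[ModuleType]:
    """Import `name` if available; return None instead of crashing when it is missing."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        # Degraded mode: the caller continues without the optional setup step.
        return None


bootstrap = import_optional("hermes_bootstrap")
if bootstrap is None:
    print("bootstrap unavailable; continuing without it")
```

One caveat: `ModuleNotFoundError` is also raised when the module exists but one of its own imports is missing, so this pattern hides that case as well.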
+75 -5
@@ -14,7 +14,14 @@ Usage:
# IMPORTANT: hermes_bootstrap must be the very first import — UTF-8 stdio
# on Windows. No-op on POSIX. See hermes_bootstrap.py for full rationale.
import hermes_bootstrap # noqa: F401
try:
import hermes_bootstrap # noqa: F401
except ModuleNotFoundError:
# Graceful fallback when hermes_bootstrap isn't registered in the venv
# yet — happens during partial ``hermes update`` where git-reset landed
# new code but ``uv pip install -e .`` didn't finish. Missing bootstrap
# means UTF-8 stdio setup is skipped on Windows; POSIX is unaffected.
pass
import logging
import os
@@ -678,6 +685,7 @@ def _run_cleanup():
if _cleanup_done:
return
_cleanup_done = True
try:
_cleanup_all_terminals()
except Exception:
@@ -1848,9 +1856,20 @@ _TERMINAL_INPUT_MODE_RESET_SEQ = (
def _bind_prompt_submit_keys(kb, handler) -> None:
"""Bind both CR and LF terminal Enter forms to the submit handler."""
for key in ("enter", "c-j"):
kb.add(key)(handler)
"""Bind terminal Enter forms to the submit handler.
Enter is always submit. On POSIX we also bind c-j (LF) to submit because
some thin PTYs (docker exec, certain SSH flavors) deliver Enter as LF
instead of CR. Without this, Enter appears dead on those terminals.
On Windows, Windows Terminal delivers Ctrl+Enter as a distinct c-j key
while plain Enter is c-m, so we leave c-j unbound here; it becomes the
multi-line newline keystroke, giving Windows users an Enter-involving
newline without any terminal settings changes.
"""
kb.add("enter")(handler)
if sys.platform != "win32":
kb.add("c-j")(handler)
def _disable_prompt_toolkit_cpr_warning(app) -> None:
@@ -10636,9 +10655,30 @@ class HermesCLI:
@kb.add('escape', 'enter')
def handle_alt_enter(event):
"""Alt+Enter inserts a newline for multi-line input."""
"""Alt+Enter inserts a newline for multi-line input.
Works on mac/Linux/WSL. On Windows Terminal this keystroke is
intercepted at the terminal layer (toggles fullscreen) and never
reaches here; Windows users get a newline via Ctrl+Enter instead
(bound below as c-j, since WT delivers Ctrl+Enter as LF).
"""
event.current_buffer.insert_text('\n')
if sys.platform == "win32":
@kb.add('c-j')
def handle_ctrl_enter_newline_windows(event):
"""Ctrl+Enter inserts a newline on Windows.
Windows Terminal delivers Ctrl+Enter as LF (c-j), distinct
from plain Enter (c-m). This binding makes Ctrl+Enter the
Windows equivalent of Alt+Enter, giving an Enter-involving
newline keystroke without requiring terminal settings changes.
Ctrl+J (the raw LF keystroke) also triggers this by virtue
of being the same key code, a harmless side effect since
Ctrl+J has no conflicting Hermes binding.
"""
event.current_buffer.insert_text('\n')
# VSCode/Cursor bind Ctrl+G to "Find Next" at the editor level, so
# the keystroke never reaches the embedded terminal. Alt+G is unbound
# in those IDEs and arrives here as ('escape', 'g') — register it as
@@ -12224,6 +12264,36 @@ class HermesCLI:
_signal.signal(_signal.SIGTERM, _signal_handler)
if hasattr(_signal, 'SIGHUP'):
_signal.signal(_signal.SIGHUP, _signal_handler)
# Windows: install a SIGINT handler that absorbs the signal
# instead of letting Python's default handler raise
# KeyboardInterrupt in MainThread. Windows Terminal / Win32
# delivers spurious CTRL_C_EVENT to the hermes process when
# child processes are spawned from background threads (agent
# subprocess Popen path). The default Python SIGINT handler
# would then unwind prompt_toolkit's app.run(), trigger
# _run_cleanup mid-turn, and close browser sessions mid-open
# — causing "Daemon process exited during startup" errors.
#
# The handler is a silent no-op. Real user Ctrl+C still works
# because prompt_toolkit binds c-c at the TUI layer and never
# reaches this OS-signal path. This matches how Claude Code
# handles the same Windows quirk (cancellation is driven by
# the TUI key handler, not by OS signals).
#
# POSIX: leave the default SIGINT handler alone. prompt_toolkit
# installs its own handler there and it works as expected.
if sys.platform == "win32":
def _sigint_absorb(signum, frame):
# Absorb silently. Do NOT call agent.interrupt() here:
# Windows fires spurious CTRL_C_EVENT whenever a
# background thread spawns a .cmd subprocess, and
# interrupt() would inject a fake user message each
# time. Real user Ctrl+C routes through prompt_toolkit's
# own c-c key binding at the TUI layer (same pattern as
# Claude Code's Windows handling).
return
_signal.signal(_signal.SIGINT, _sigint_absorb)
except Exception:
pass # Signal handlers may fail in restricted environments
+29 -10
@@ -107,12 +107,15 @@ def _kill_stale_bridge_by_pidfile(session_path: Path) -> None:
except OSError:
pass
return
try:
os.kill(pid, 0) # check existence
os.kill(pid, signal.SIGTERM)
logger.info("[whatsapp] Killed stale bridge PID %d from pidfile", pid)
except (ProcessLookupError, PermissionError, OSError):
pass
# ``os.kill(pid, 0)`` is NOT a no-op on Windows (bpo-14484) — use the
# cross-platform existence check before sending a real signal.
from gateway.status import _pid_exists
if _pid_exists(pid):
try:
os.kill(pid, signal.SIGTERM)
logger.info("[whatsapp] Killed stale bridge PID %d from pidfile", pid)
except (ProcessLookupError, PermissionError, OSError):
pass
try:
pid_file.unlink()
except OSError:
@@ -152,10 +155,26 @@ def _terminate_bridge_process(proc, *, force: bool = False) -> None:
raise OSError(details or f"taskkill failed for PID {proc.pid}")
return
import signal
sig = signal.SIGTERM if not force else signal.SIGKILL
os.killpg(os.getpgid(proc.pid), sig)
import psutil
try:
parent = psutil.Process(proc.pid)
children = parent.children(recursive=True)
if force:
for child in children:
try:
child.kill()
except psutil.NoSuchProcess:
pass
parent.kill()
else:
for child in children:
try:
child.terminate()
except psutil.NoSuchProcess:
pass
parent.terminate()
except psutil.NoSuchProcess:
return
import sys
sys.path.insert(0, str(Path(__file__).resolve().parents[2]))
+45 -14
@@ -15,7 +15,14 @@ Usage:
# IMPORTANT: hermes_bootstrap must be the very first import — UTF-8 stdio
# on Windows. No-op on POSIX. See hermes_bootstrap.py for full rationale.
import hermes_bootstrap # noqa: F401
try:
import hermes_bootstrap # noqa: F401
except ModuleNotFoundError:
# Graceful fallback when hermes_bootstrap isn't registered in the venv
# yet — happens during partial ``hermes update`` where git-reset landed
# new code but ``uv pip install -e .`` didn't finish. Missing bootstrap
# means UTF-8 stdio setup is skipped on Windows; POSIX is unaffected.
pass
import asyncio
import dataclasses
@@ -2805,10 +2812,36 @@ class GatewayRunner:
pid = int(sys.argv[1])
cmd = sys.argv[2:]
deadline = time.monotonic() + 120
while time.monotonic() < deadline:
def _alive(p):
# On Windows, os.kill(pid, 0) is NOT a no-op — it maps to
# GenerateConsoleCtrlEvent(0, pid) (bpo-14484). Use the
# Win32 handle-based existence check instead.
if os.name == 'nt':
import ctypes
k32 = ctypes.windll.kernel32
k32.OpenProcess.restype = ctypes.c_void_p
k32.WaitForSingleObject.restype = ctypes.c_uint
k32.GetLastError.restype = ctypes.c_uint
h = k32.OpenProcess(0x1000 | 0x100000, False, int(p))
if not h:
return k32.GetLastError() != 87
try:
return k32.WaitForSingleObject(h, 0) == 0x102
finally:
k32.CloseHandle(h)
try:
os.kill(pid, 0)
except (ProcessLookupError, PermissionError, OSError):
os.kill(int(p), 0)
return True
except ProcessLookupError:
return False
except PermissionError:
return True
except OSError:
return False
while time.monotonic() < deadline:
if not _alive(pid):
break
time.sleep(0.2)
_CREATE_NEW_PROCESS_GROUP = 0x00000200
@@ -15189,16 +15222,14 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
except Exception:
pass
return False
# Wait up to 10 seconds for the old process to exit
# Wait up to 10 seconds for the old process to exit.
# ``os.kill(pid, 0)`` on Windows is NOT a no-op — use the
# handle-based existence check instead.
from gateway.status import _pid_exists
for _ in range(20):
try:
os.kill(existing_pid, 0)
time.sleep(0.5)
except (ProcessLookupError, PermissionError, OSError):
# OSError covers Windows' WinError 87 "invalid parameter"
# for an already-gone PID — without this the probe loop
# busy-spins for the full 10s on every --replace start.
if not _pid_exists(existing_pid):
break # Process is gone
time.sleep(0.5)
else:
# Still alive after 10s — force kill
logger.warning(
@@ -15364,12 +15395,12 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
if threading.current_thread() is threading.main_thread():
for sig in (signal.SIGINT, signal.SIGTERM):
try:
loop.add_signal_handler(sig, shutdown_signal_handler, sig)
loop.add_signal_handler(sig, shutdown_signal_handler, sig) # windows-footgun: ok — wrapped in try/except NotImplementedError for Windows
except NotImplementedError:
pass
if hasattr(signal, "SIGUSR1"):
try:
loop.add_signal_handler(signal.SIGUSR1, restart_signal_handler)
loop.add_signal_handler(signal.SIGUSR1, restart_signal_handler) # windows-footgun: ok — POSIX signal, guarded by hasattr above + try/except NotImplementedError
except NotImplementedError:
pass
else:
+78 -19
@@ -299,6 +299,81 @@ def _try_acquire_file_lock(handle) -> bool:
return False
def _pid_exists(pid: int) -> bool:
"""Cross-platform "is this PID alive" check that does NOT kill the target.
CRITICAL on Windows: Python's ``os.kill(pid, 0)`` is NOT a no-op like it
is on POSIX. CPython's Windows implementation
(``Modules/posixmodule.c::os_kill_impl``) treats ``sig=0`` as
``CTRL_C_EVENT`` because the two values collide at the C level, and
routes it through ``GenerateConsoleCtrlEvent(0, pid)`` — which sends
a Ctrl+C to the entire console process group containing the target
PID, not just the PID itself. Any caller that wanted to "check if
this PID is alive" via ``os.kill(pid, 0)`` on Windows was silently
killing that process (and often unrelated processes in the same
console group). Long-standing Python quirk; see bpo-14484.
Implementation: prefer :mod:`psutil` (hard dependency — the canonical
cross-platform answer, maintained by Giampaolo Rodolà, uses
``OpenProcess + GetExitCodeProcess`` on Windows internally). Fall back
to a hand-rolled ctypes ``OpenProcess`` / ``WaitForSingleObject`` pair
on Windows + ``os.kill(pid, 0)`` on POSIX if psutil is somehow
unavailable — e.g. stripped-down install or import error during the
scaffold phase before ``psutil`` is pip-installed.
"""
try:
import psutil # type: ignore
return bool(psutil.pid_exists(int(pid)))
except ImportError:
pass # Fall through to stdlib fallback.
if _IS_WINDOWS:
try:
import ctypes
kernel32 = ctypes.windll.kernel32 # type: ignore[attr-defined]
# Pin return types — default ctypes restype is c_int (signed),
# which mangles WAIT_* DWORD return codes into negative numbers.
kernel32.OpenProcess.restype = ctypes.c_void_p
kernel32.WaitForSingleObject.restype = ctypes.c_uint
kernel32.GetLastError.restype = ctypes.c_uint
PROCESS_QUERY_LIMITED_INFORMATION = 0x1000
SYNCHRONIZE = 0x100000 # required for WaitForSingleObject
WAIT_TIMEOUT = 0x00000102
ERROR_INVALID_PARAMETER = 87
ERROR_ACCESS_DENIED = 5
handle = kernel32.OpenProcess(
PROCESS_QUERY_LIMITED_INFORMATION | SYNCHRONIZE, False, int(pid)
)
if not handle:
err = kernel32.GetLastError()
if err == ERROR_INVALID_PARAMETER:
return False # PID definitely gone
if err == ERROR_ACCESS_DENIED:
return True # Exists but owned by another user/session
return False # Conservative default for unknown errors
try:
wait_result = kernel32.WaitForSingleObject(handle, 0)
# WAIT_TIMEOUT = still running; anything else (WAIT_OBJECT_0
# via exit, WAIT_FAILED via handle issue) = treat as gone.
return wait_result == WAIT_TIMEOUT
finally:
kernel32.CloseHandle(handle)
except (OSError, AttributeError):
return False
else:
try:
os.kill(int(pid), 0) # windows-footgun: ok — POSIX-only branch (the whole point of _pid_exists)
return True
except ProcessLookupError:
return False
except PermissionError:
# Process exists but we can't signal it — still alive.
return True
except OSError:
return False
def _release_file_lock(handle) -> None:
try:
if _IS_WINDOWS:
@@ -503,10 +578,7 @@ def acquire_scoped_lock(scope: str, identity: str, metadata: Optional[dict[str,
stale = existing_pid is None
if not stale:
try:
os.kill(existing_pid, 0)
except (ProcessLookupError, PermissionError, OSError):
# Windows raises OSError with WinError 87 for invalid pid check
if not _pid_exists(existing_pid):
stale = True
else:
current_start = _get_process_start_time(existing_pid)
@@ -517,7 +589,7 @@ def acquire_scoped_lock(scope: str, identity: str, metadata: Optional[dict[str,
):
stale = True
# Check if process is stopped (Ctrl+Z / SIGTSTP) — stopped
# processes still respond to os.kill(pid, 0) but are not
# processes still appear alive to _pid_exists but are not
# actually running. Treat them as stale so --replace works.
if not stale:
try:
@@ -824,20 +896,7 @@ def get_running_pid(
if pid is None:
continue
try:
os.kill(pid, 0) # signal 0 = existence check, no actual signal sent
except ProcessLookupError:
continue
except PermissionError:
# The process exists but belongs to another user/service scope.
# With the runtime lock still held, prefer keeping it visible
# rather than deleting the PID file as "stale".
if _record_looks_like_gateway(record):
return pid
continue
except OSError:
# Windows raises OSError with WinError 87 for an invalid pid
# (process is definitely gone). Treat as "process doesn't exist".
if not _pid_exists(pid):
continue
recorded_start = record.get("start_time")
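The POSIX branch of `_pid_exists` relies on signal 0 being a pure existence probe. A minimal sketch of that branch alone, assuming a POSIX host; `pid_alive_posix` is a hypothetical name, and the explicit Windows refusal is an addition for illustration:

```python
import os
import sys


def pid_alive_posix(pid: int) -> bool:
    """POSIX-only liveness probe; refuses to run on Windows."""
    if sys.platform == "win32":
        # On Windows, signal value 0 collides with CTRL_C_EVENT, so this is not a probe.
        raise RuntimeError("os.kill(pid, 0) is not a harmless probe on Windows")
    try:
        os.kill(pid, 0)  # signal 0: existence and permission check only, nothing is delivered
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    except OSError:
        return False  # conservative default for anything else
    return True


print(pid_alive_posix(os.getpid()))
```

The order of the `except` clauses matters: `ProcessLookupError` and `PermissionError` are subclasses of `OSError`, so the broad clause has to come last.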
+19 -8
@@ -893,7 +893,7 @@ def _file_lock(
if msvcrt and (not lock_path.exists() or lock_path.stat().st_size == 0):
lock_path.write_text(" ", encoding="utf-8")
with lock_path.open("r+" if msvcrt else "a+") as lock_file:
with lock_path.open("r+" if msvcrt else "a+", encoding="utf-8") as lock_file:
deadline = time.monotonic() + max(1.0, timeout_seconds)
while True:
try:
@@ -2827,9 +2827,12 @@ def _poll_for_token(
# import instead of running the full device-code flow every time.
#
# File lives at ${HERMES_SHARED_AUTH_DIR}/nous_auth.json, defaulting to
# ~/.hermes/shared/nous_auth.json. It is OUTSIDE any named profile's
# HERMES_HOME so named profiles (which typically live under
# ~/.hermes/profiles/<name>/) all see the same file.
# ``<hermes-root>/shared/nous_auth.json`` where ``<hermes-root>`` is what
# ``get_default_hermes_root()`` returns — ``~/.hermes`` on Linux/macOS,
# ``%LOCALAPPDATA%\hermes`` on native Windows, or the Docker/custom root.
# It is OUTSIDE any named profile's HERMES_HOME so named profiles (which
# typically live under ``<hermes-root>/profiles/<name>/``) all see the
# same file.
#
# Written on successful login and on every runtime refresh so the stored
# refresh_token stays current even if one profile refreshes and rotates it.
@@ -2846,25 +2849,33 @@ def _nous_shared_auth_dir() -> Path:
Honors ``HERMES_SHARED_AUTH_DIR`` so tests can redirect it to a tmp
path without touching the real user's home. Defaults to
``~/.hermes/shared/``.
``<hermes-root>/shared/``, where ``<hermes-root>`` is what
:func:`hermes_constants.get_default_hermes_root` returns — so
Linux/macOS classic installs land at ``~/.hermes/shared/``, native
Windows installs at ``%LOCALAPPDATA%\\hermes\\shared\\``, and
Docker / custom ``HERMES_HOME`` deployments at
``<HERMES_HOME>/shared/``. Sits outside any named profile so all
profiles under the same root share the store.
"""
override = os.getenv("HERMES_SHARED_AUTH_DIR", "").strip()
if override:
return Path(override).expanduser()
return Path.home() / ".hermes" / "shared"
from hermes_constants import get_default_hermes_root
return get_default_hermes_root() / "shared"
def _nous_shared_store_path() -> Path:
path = _nous_shared_auth_dir() / NOUS_SHARED_STORE_FILENAME
# Seat belt: if pytest is running and this resolves to a path under the
# real user's home, refuse rather than silently corrupt cross-profile
# real user's Hermes root, refuse rather than silently corrupt cross-profile
# state. Tests must set HERMES_SHARED_AUTH_DIR to a tmp_path (conftest
# does not do this automatically — mirror the _auth_file_path() guard
# so forgetting to set it fails loudly instead of writing to the real
# shared store).
if os.environ.get("PYTEST_CURRENT_TEST"):
from hermes_constants import get_default_hermes_root
real_home_shared = (
Path.home() / ".hermes" / "shared" / NOUS_SHARED_STORE_FILENAME
get_default_hermes_root() / "shared" / NOUS_SHARED_STORE_FILENAME
).resolve(strict=False)
try:
resolved = path.resolve(strict=False)
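The resolution order for the shared auth directory (explicit override first, then the Hermes root) can be sketched as a pure function. This is a hedged illustration: `resolve_shared_dir` and its `default_root` and `env` parameters are hypothetical, standing in for `get_default_hermes_root()` and `os.environ` so the logic can be exercised without touching a real home directory.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_shared_dir(default_root: Path, env: Mapping[str, str] = os.environ) -> Path:
    """Hypothetical stand-in for the override-then-default lookup."""
    override = env.get("HERMES_SHARED_AUTH_DIR", "").strip()
    if override:
        # An explicit override wins, with "~" expanded.
        return Path(override).expanduser()
    # Otherwise the store sits directly under the root, outside any profile directory.
    return default_root / "shared"


print(resolve_shared_dir(Path("/opt/hermes"), {}))
print(resolve_shared_dir(Path("/opt/hermes"), {"HERMES_SHARED_AUTH_DIR": "/tmp/auth"}))
```

Passing the environment in as a mapping is a design choice of the sketch; it lets a test redirect the store without mutating process-wide state.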
+1 -1
@@ -246,7 +246,7 @@ def auth_add_command(args) -> None:
if provider == "nous":
# Codex-style auto-import: if a shared Nous credential lives at
# ~/.hermes/shared/nous_auth.json (written by any previous
# <hermes-root>/shared/nous_auth.json (written by any previous
# successful login), offer to import it instead of running the
# full device-code flow. This makes `hermes --profile <name>
# auth add nous --type oauth` a one-tap operation for users who
+9 -8
@@ -4148,8 +4148,9 @@ def load_env() -> Dict[str, str]:
if env_path.exists():
# On Windows, open() defaults to the system locale (cp1252) which can
# fail on UTF-8 .env files. Use explicit UTF-8 only on Windows.
open_kw = {"encoding": "utf-8", "errors": "replace"} if _IS_WINDOWS else {}
# fail on UTF-8 .env files. Always use explicit UTF-8; tolerate BOM
# via utf-8-sig since users may edit .env in Notepad which adds one.
open_kw = {"encoding": "utf-8-sig", "errors": "replace"}
with open(env_path, **open_kw) as f:
raw_lines = f.readlines()
# Sanitize before parsing: split concatenated lines & drop stale
@@ -4234,8 +4235,8 @@ def sanitize_env_file() -> int:
if not env_path.exists():
return 0
read_kw = {"encoding": "utf-8", "errors": "replace"} if _IS_WINDOWS else {}
write_kw = {"encoding": "utf-8"} if _IS_WINDOWS else {}
read_kw = {"encoding": "utf-8-sig", "errors": "replace"}
write_kw = {"encoding": "utf-8"}
with open(env_path, **read_kw) as f:
original_lines = f.readlines()
@@ -4324,8 +4325,8 @@ def save_env_value(key: str, value: str):
# On Windows, open() defaults to the system locale (cp1252) which can
# cause OSError errno 22 on UTF-8 .env files.
read_kw = {"encoding": "utf-8", "errors": "replace"} if _IS_WINDOWS else {}
write_kw = {"encoding": "utf-8"} if _IS_WINDOWS else {}
read_kw = {"encoding": "utf-8-sig", "errors": "replace"}
write_kw = {"encoding": "utf-8"}
lines = []
if env_path.exists():
@@ -4394,8 +4395,8 @@ def remove_env_value(key: str) -> bool:
os.environ.pop(key, None)
return False
read_kw = {"encoding": "utf-8", "errors": "replace"} if _IS_WINDOWS else {}
write_kw = {"encoding": "utf-8"} if _IS_WINDOWS else {}
read_kw = {"encoding": "utf-8-sig", "errors": "replace"}
write_kw = {"encoding": "utf-8"}
with open(env_path, **read_kw) as f:
lines = f.readlines()
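The switch from `utf-8` to `utf-8-sig` on the read side matters because a byte-order mark otherwise becomes part of the first key. A small self-contained demonstration, assuming a `.env` file saved with a BOM; `read_env_lines` is a hypothetical helper, not a function in the codebase:

```python
import tempfile
from pathlib import Path


def read_env_lines(path):
    """Read a .env file, silently dropping a leading UTF-8 BOM if present."""
    with open(path, encoding="utf-8-sig", errors="replace") as f:
        return f.readlines()


with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    # Simulate an editor that saves UTF-8 with a BOM.
    env_path.write_bytes(b"\xef\xbb\xbfAPI_KEY=abc123\n")

    with open(env_path, encoding="utf-8") as f:
        print(repr(f.readline()))      # '\ufeffAPI_KEY=abc123\n' (BOM glued to the key)

    print(read_env_lines(env_path))    # ['API_KEY=abc123\n']
```

`utf-8-sig` reads BOM-less files identically to `utf-8`, which is why it is safe to use unconditionally for reads. Writes stay on plain `utf-8`, so a rewritten file does not gain a BOM.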
+53
@@ -1035,10 +1035,13 @@ def run_doctor(args):
check_ok("Node.js")
# Check if agent-browser is installed
agent_browser_path = PROJECT_ROOT / "node_modules" / "agent-browser"
agent_browser_ok = False
if agent_browser_path.exists():
check_ok("agent-browser (Node.js)", "(browser automation)")
agent_browser_ok = True
elif shutil.which("agent-browser"):
check_ok("agent-browser", "(browser automation)")
agent_browser_ok = True
else:
if _is_termux():
check_info("agent-browser is not installed (expected in the tested Termux path)")
@@ -1048,6 +1051,56 @@ def run_doctor(args):
check_info(step)
else:
check_warn("agent-browser not installed", "(run: npm install)")
# Chromium presence — the browser tools silently fail to register when
# agent-browser is found but no Playwright-managed Chromium is on disk
# (tools/browser_tool.py::check_browser_requirements filters them out
# before the agent ever sees them). Reuse the exact predicate it uses
# so the two checks cannot diverge. Skip on Termux (not a tested
# path).
if agent_browser_ok and not _is_termux():
try:
# Lazy import: browser_tool is a ~150KB module we don't want
# to eagerly load in every `hermes doctor` invocation.
from tools.browser_tool import (
_chromium_installed,
_is_camofox_mode,
_get_cloud_provider,
_get_cdp_override,
_using_lightpanda_engine,
)
except Exception:
# If browser_tool can't even import, that's a separate bug
# surfaced elsewhere; don't crash doctor.
pass
else:
# Only warn about Chromium if the installed engine actually
# requires it: Camofox, CDP override, a cloud provider, or
# Lightpanda all bypass the local Chromium requirement.
skip_chromium_check = (
_is_camofox_mode()
or bool(_get_cdp_override())
or _get_cloud_provider() is not None
or _using_lightpanda_engine()
)
if not skip_chromium_check:
if _chromium_installed():
check_ok("Playwright Chromium", "(browser engine)")
else:
check_warn(
"Playwright Chromium not installed",
"(browser_* tools will be hidden from the agent)",
)
if sys.platform == "win32":
check_info(
f"Install with: cd {PROJECT_ROOT} && "
"npx playwright install chromium"
)
else:
check_info(
f"Install with: cd {PROJECT_ROOT} && "
"npx playwright install --with-deps chromium"
)
else:
if _is_termux():
check_info("Node.js not found (browser tools are optional in the tested Termux path)")
+1 -1
@@ -113,7 +113,7 @@ def _sanitize_env_file_if_needed(path: Path) -> None:
except ImportError:
return # early bootstrap — config module not available yet
read_kw = {"encoding": "utf-8", "errors": "replace"}
read_kw = {"encoding": "utf-8-sig", "errors": "replace"}
try:
with open(path, **read_kw) as f:
original = f.readlines()
+343 -48
@@ -131,9 +131,26 @@ def _get_service_pids() -> set:
def _get_parent_pid(pid: int) -> int | None:
"""Return the parent PID for ``pid``, or ``None`` when unavailable."""
"""Return the parent PID for ``pid``, or ``None`` when unavailable.
Uses psutil (core dependency) which works on every platform. The
older implementation shelled out to ``ps -o ppid= -p <pid>``, which
silently fails on Windows (no ``ps``) so the ancestor walk terminated
at self; the caller's dedup / exclude logic then couldn't distinguish
"hermes CLI that invoked this scan" from "real gateway process".
"""
if pid <= 1:
return None
try:
import psutil # type: ignore
return psutil.Process(pid).ppid() or None
except ImportError:
pass
except Exception:
return None
# Fallback: shell out to ps (POSIX only — bare ``ps`` doesn't exist on Windows).
if not shutil.which("ps"):
return None
try:
result = subprocess.run(
["ps", "-o", "ppid=", "-p", str(pid)],
@@ -177,7 +194,7 @@ def _request_gateway_self_restart(pid: int) -> bool:
if not _is_pid_ancestor_of_current_process(pid):
return False
try:
os.kill(pid, signal.SIGUSR1)
os.kill(pid, signal.SIGUSR1) # windows-footgun: ok — POSIX signal, guarded by hasattr(signal, 'SIGUSR1') above
except (ProcessLookupError, PermissionError, OSError):
return False
return True
@@ -213,7 +230,7 @@ def _graceful_restart_via_sigusr1(pid: int, drain_timeout: float) -> bool:
if pid <= 0:
return False
try:
os.kill(pid, signal.SIGUSR1)
os.kill(pid, signal.SIGUSR1) # windows-footgun: ok — POSIX signal, guarded by hasattr(signal, 'SIGUSR1') above
except ProcessLookupError:
# Already gone — nothing to drain.
return True
@@ -223,18 +240,15 @@ def _graceful_restart_via_sigusr1(pid: int, drain_timeout: float) -> bool:
import time as _time
deadline = _time.monotonic() + max(drain_timeout, 1.0)
# IMPORTANT Windows note: ``os.kill(pid, 0)`` is NOT a no-op on
# Windows: CPython treats sig=0 as ``CTRL_C_EVENT`` and routes it through
# ``GenerateConsoleCtrlEvent(0, pid)`` (bpo-14484), sending Ctrl+C to the
# target's console process group. Use the cross-platform
# ``_pid_exists`` helper in gateway.status which does OpenProcess +
# WaitForSingleObject on Windows.
from gateway.status import _pid_exists
while _time.monotonic() < deadline:
try:
os.kill(pid, 0) # signal 0 — probe liveness
except ProcessLookupError:
return True
except PermissionError:
# Process still exists but we can't signal it. Treat as alive
# so the caller falls back.
pass
except OSError:
# Windows raises OSError (WinError 87 "invalid parameter") for
# a gone PID — treat the same as ProcessLookupError.
if not _pid_exists(pid):
return True
_time.sleep(0.5)
# Drain didn't finish in time.
@@ -303,6 +317,11 @@ def _scan_gateway_pids(exclude_pids: set[int], all_profiles: bool = False) -> li
or f"HERMES_HOME={current_home}" in command
)
# Default-profile case: no profile flag in argv. Accept as long as
# the command doesn't advertise *some other* profile. HERMES_HOME
# may be passed via env (not visible in wmic/CIM command line) so
# its absence is NOT disqualifying — only a non-matching explicit
# HERMES_HOME= in argv is.
if "--profile " in command or " -p " in command:
return False
if "HERMES_HOME=" in command and f"HERMES_HOME={current_home}" not in command:
@@ -311,14 +330,52 @@ def _scan_gateway_pids(exclude_pids: set[int], all_profiles: bool = False) -> li
try:
if is_windows():
result = subprocess.run(
["wmic", "process", "get", "ProcessId,CommandLine", "/FORMAT:LIST"],
capture_output=True,
text=True,
encoding="utf-8",
errors="ignore",
timeout=10,
)
# Prefer wmic when present (fast, stable output format). On
# modern Windows 11 / Win 10 late builds, wmic has been
# removed as part of the WMIC deprecation — fall back to
# PowerShell's Get-CimInstance. Any OSError here (FileNotFoundError
# on missing wmic) trips the fallback.
wmic_path = shutil.which("wmic")
used_fallback = False
result = None
if wmic_path is not None:
try:
result = subprocess.run(
[wmic_path, "process", "get", "ProcessId,CommandLine", "/FORMAT:LIST"],
capture_output=True,
text=True,
encoding="utf-8",
errors="ignore",
timeout=10,
)
except (OSError, subprocess.TimeoutExpired):
result = None
if result is None or result.returncode != 0 or not (result.stdout or ""):
# Fallback: PowerShell Get-CimInstance, emit LIST-style output
# so the downstream parser below doesn't need to branch.
powershell = shutil.which("powershell") or shutil.which("pwsh")
if powershell is None:
return []
ps_cmd = (
"Get-CimInstance Win32_Process | "
"ForEach-Object { "
" 'CommandLine=' + ($_.CommandLine -replace \"`r`n\",' ' -replace \"`n\",' '); "
" 'ProcessId=' + $_.ProcessId; "
" '' "
"}"
)
try:
result = subprocess.run(
[powershell, "-NoProfile", "-Command", ps_cmd],
capture_output=True,
text=True,
encoding="utf-8",
errors="ignore",
timeout=15,
)
except (OSError, subprocess.TimeoutExpired):
return []
used_fallback = True
if result.returncode != 0 or result.stdout is None:
return []
current_cmd = ""
@@ -376,9 +433,53 @@ def _scan_gateway_pids(exclude_pids: set[int], all_profiles: bool = False) -> li
except (OSError, subprocess.TimeoutExpired):
return []
# Windows-specific: collapse venv launcher stubs. A venv-built
# ``pythonw.exe`` in ``<venv>/Scripts/`` is a ~100 KB launcher exe
# that spawns the base Python (e.g. ``C:\Program Files\Python311\
# pythonw.exe``) with the same command line, preserving the venv's
# ``pyvenv.cfg`` context. This is standard Windows CPython venv
# behaviour — BUT it means every gateway run produces two pythonw
# PIDs with identical command lines (one launcher stub, one actual
# interpreter) which is confusing in ``gateway status`` output.
# Filter the stub: if a PID in our result is the PARENT of another
# PID in our result, and both are pythonw.exe, the parent is the
# launcher stub — drop it, keep the child.
if is_windows() and len(pids) > 1:
pids = _filter_venv_launcher_stubs(pids)
return pids
def _filter_venv_launcher_stubs(pids: list[int]) -> list[int]:
"""Drop venv-launcher ``pythonw.exe`` stubs that are parents of the real
interpreter process. See comment at the tail of ``_scan_gateway_pids``.
Uses ``psutil`` (core dependency). Safe on any platform; only invoked
on Windows by the caller because the stub pattern is Windows-specific.
"""
try:
import psutil # type: ignore
except ImportError:
return pids
pid_set = set(pids)
# Collect each PID's parent so we can flag "child of another matched PID".
parent_of: dict[int, int | None] = {}
for pid in pids:
try:
parent_of[pid] = psutil.Process(pid).ppid()
except (psutil.NoSuchProcess, psutil.AccessDenied):
parent_of[pid] = None
# For each child whose parent is also in our set, drop the parent.
drop: set[int] = set()
for pid, ppid in parent_of.items():
if ppid is not None and ppid in pid_set:
drop.add(ppid)
return [p for p in pids if p not in drop]
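The stub filter above reduces to a pure function once the parent lookup is separated from `psutil`. The following is a minimal sketch of that logic only; `drop_launcher_stubs`, the `parent_of` mapping, and the PIDs are hypothetical stand-ins for illustration, not part of this change.

```python
def drop_launcher_stubs(pids, parent_of):
    """Drop every PID that is the parent of another PID in the same list.

    ``parent_of`` maps pid -> parent pid (or None when the lookup failed).
    In the real helper this information comes from psutil; here it is
    passed in so the filtering rule can be exercised on any platform.
    """
    pid_set = set(pids)
    drop = set()
    for pid in pids:
        ppid = parent_of.get(pid)
        if ppid is not None and ppid in pid_set:
            drop.add(ppid)  # the parent is assumed to be the launcher stub
    return [p for p in pids if p not in drop]


# Hypothetical snapshot: 4100 is the venv stub, 4188 the real interpreter,
# 5200 an unrelated gateway started some other way.
parents = {4100: 900, 4188: 4100, 5200: 900}
print(drop_launcher_stubs([4100, 4188, 5200], parents))
```

With this rule a three-deep chain of matched PIDs drops both ancestors and keeps only the leaf, which may or may not be what a caller wants.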
def find_gateway_pids(exclude_pids: set | None = None, all_profiles: bool = False) -> list:
"""Find PIDs of running gateway processes.
@@ -475,14 +576,10 @@ def launch_detached_profile_gateway_restart(profile: str, old_pid: int) -> bool:
cmd = sys.argv[2:]
deadline = time.monotonic() + 120
while time.monotonic() < deadline:
try:
os.kill(pid, 0)
except ProcessLookupError:
break
except PermissionError:
pass
except OSError:
# Windows: gone PID raises OSError (WinError 87).
# ``os.kill(pid, 0)`` is not a no-op on Windows — use the
# cross-platform existence check.
from gateway.status import _pid_exists
if not _pid_exists(pid):
break
time.sleep(0.2)
@@ -969,15 +1066,14 @@ def stop_profile_gateway() -> bool:
print(f"⚠ Permission denied to kill PID {pid}")
return False
# Wait briefly for it to exit
# Wait briefly for it to exit. On Windows, os.kill(pid, 0) is NOT
# a no-op — route through the cross-platform existence check.
import time as _time
from gateway.status import _pid_exists
for _ in range(20):
try:
os.kill(pid, 0)
_time.sleep(0.5)
except (ProcessLookupError, PermissionError, OSError):
# OSError covers Windows' WinError 87 for gone PIDs.
if not _pid_exists(pid):
break
_time.sleep(0.5)
if get_running_pid() is None:
remove_pid_file()
@@ -1161,13 +1257,13 @@ class SystemScopeRequiresRootError(RuntimeError):
def _user_dbus_socket_path() -> Path:
"""Return the expected per-user D-Bus socket path (regardless of existence)."""
xdg = os.environ.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}"
xdg = os.environ.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}" # windows-footgun: ok — POSIX systemd helper, never invoked on Windows
return Path(xdg) / "bus"
def _user_systemd_private_socket_path() -> Path:
"""Return the per-user systemd private socket path (regardless of existence)."""
xdg = os.environ.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}"
xdg = os.environ.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}" # windows-footgun: ok — POSIX systemd helper, never invoked on Windows
return Path(xdg) / "systemd" / "private"
@@ -1190,7 +1286,7 @@ def _ensure_user_systemd_env() -> None:
We detect the standard socket path and set the vars so all subsequent
subprocess calls inherit them.
"""
uid = os.getuid()
uid = os.getuid() # windows-footgun: ok — POSIX systemd helper, never invoked on Windows
if "XDG_RUNTIME_DIR" not in os.environ:
runtime_dir = f"/run/user/{uid}"
if Path(runtime_dir).exists():
@@ -1256,7 +1352,7 @@ def _preflight_user_systemd(*, auto_enable_linger: bool = True) -> None:
username,
reason="User systemd control sockets are missing even though linger is enabled.",
fix_hint=(
f" systemctl start user@{os.getuid()}.service\n"
f" systemctl start user@{os.getuid()}.service\n" # windows-footgun: ok — POSIX systemd helper, never invoked on Windows
" (may require sudo; try again after the command succeeds)"
),
)
@@ -1526,7 +1622,7 @@ def remove_legacy_hermes_units(
# System-scope removal (needs root)
if system_units:
if os.geteuid() != 0:
if os.geteuid() != 0: # windows-footgun: ok — Linux systemd removal path, guarded by `if system == "Linux"` / systemd-only branch
print()
print_warning("System-scope legacy units require root to remove.")
print_info(" Re-run with: sudo hermes gateway migrate-legacy")
@@ -1573,7 +1669,7 @@ def print_systemd_scope_conflict_warning() -> None:
def _require_root_for_system_service(action: str) -> None:
if os.geteuid() != 0:
if os.geteuid() != 0: # windows-footgun: ok — POSIX systemd helper, never invoked on Windows
raise SystemScopeRequiresRootError(
f"System gateway {action} requires root. Re-run with sudo.",
action,
@@ -1641,7 +1737,7 @@ def install_linux_gateway_from_setup(force: bool = False) -> tuple[str | None, b
if scope == "system":
run_as_user = _default_system_service_user()
if os.geteuid() != 0:
if os.geteuid() != 0: # windows-footgun: ok — Linux systemd install wizard, never invoked on Windows
print_warning(" System service install requires sudo, so Hermes can't create it from this user session.")
if run_as_user:
print_info(f" After setup, run: sudo hermes gateway install --system --run-as-user {run_as_user}")
@@ -1685,7 +1781,7 @@ def get_systemd_linger_status() -> tuple[bool | None, str]:
if not username:
try:
import pwd
username = pwd.getpwuid(os.getuid()).pw_name
username = pwd.getpwuid(os.getuid()).pw_name # windows-footgun: ok — POSIX loginctl helper, never invoked on Windows
except Exception:
return None, "could not determine current user"
@@ -1735,7 +1831,7 @@ def _launchd_user_home() -> Path:
"""
import pwd
return Path(pwd.getpwuid(os.getuid()).pw_dir)
return Path(pwd.getpwuid(os.getuid()).pw_dir) # windows-footgun: ok — POSIX launchd (macOS) helper, never invoked on Windows
def get_launchd_plist_path() -> Path:
@@ -2134,7 +2230,7 @@ def _system_scope_wizard_would_need_root(system: bool = False) -> bool:
``SystemScopeRequiresRootError`` propagate out and leave the user
staring at a bare shell.
"""
if os.geteuid() == 0:
if os.geteuid() == 0: # windows-footgun: ok — systemd scope wizard decision, never invoked on Windows
return False
return _select_systemd_scope(system=system)
@@ -2485,7 +2581,7 @@ def get_launchd_label() -> str:
def _launchd_domain() -> str:
return f"gui/{os.getuid()}"
return f"gui/{os.getuid()}" # windows-footgun: ok — POSIX launchd (macOS) helper, never invoked on Windows
def generate_launchd_plist() -> str:
@@ -2860,6 +2956,62 @@ def run_gateway(verbose: int = 0, quiet: bool = False, replace: bool = False):
_guard_official_docker_root_gateway()
sys.path.insert(0, str(PROJECT_ROOT))
# On Windows, when the gateway is launched as a detached background
# process (via ``hermes gateway install`` → Scheduled Task / Startup
# folder / direct pythonw.exe spawn) there is no console attached. In
# that case Windows can still deliver CTRL_C_EVENT / CTRL_BREAK_EVENT
# to the process group under some circumstances (e.g. when *another*
# process in the same group sends one), which Python 3.11 translates
# into KeyboardInterrupt inside asyncio.run(). The outer handler below
# catches that and exits cleanly — silently killing the gateway. On
# detached boots we must absorb those spurious signals so the gateway
# stays alive; real user Ctrl+C still comes through prompt_toolkit /
# the asyncio signal handler when running in a real console.
#
# IMPORTANT lesson (May 2026): we originally gated this on "stdin is
# NOT a TTY" assuming only detached pythonw runs would be vulnerable.
# Wrong. When the user runs `hermes gateway start` from a PowerShell
# console, the gateway inherits that console and stdin IS a TTY —
# but it's STILL vulnerable to CTRL_C_EVENT broadcast by any sibling
# `hermes` invocation (like `hermes gateway status` 30 seconds later)
# because Windows routes console events to all processes sharing the
# console. Every hermes CLI process after that sibling fires is a
# potential drive-by killer. So on Windows, for `gateway run`
# specifically (never interactive by design), always install the
# SIGINT absorber regardless of TTY state.
try:
_stdin_is_tty = bool(sys.stdin and sys.stdin.isatty())
except (ValueError, OSError):
_stdin_is_tty = False
if is_windows():
try:
signal.signal(signal.SIGINT, signal.SIG_IGN)
if hasattr(signal, "SIGBREAK"):
signal.signal(signal.SIGBREAK, signal.SIG_IGN)
except (OSError, ValueError):
# SetConsoleCtrlHandler not available (rare on Windows) —
# best-effort, proceed either way.
pass
# Python's signal module only hooks SIGINT/SIGBREAK. As a
# belt-and-braces measure over the Python-level handlers above,
# also call the native SetConsoleCtrlHandler(NULL, TRUE), which
# tells Windows to ignore Ctrl+C for this process. Note that the
# documented effect of the NULL handler covers Ctrl+C only;
# CTRL_CLOSE_EVENT / CTRL_LOGOFF_EVENT are still delivered and
# would need a real handler routine to intercept.
try:
import ctypes
kernel32 = ctypes.windll.kernel32 # type: ignore[attr-defined]
# BOOL SetConsoleCtrlHandler(NULL, Add): Add=TRUE means
# "install the NULL handler", which has the documented
# effect of ignoring Ctrl+C. A single call is made here, as
# defense in depth on top of the signal.signal() calls
# above.
kernel32.SetConsoleCtrlHandler(None, 1)
except (OSError, AttributeError):
pass
# Refresh the systemd unit definition on every boot so that restart
# settings (RestartSec, StartLimitIntervalSec, etc.) stay current even
# when the process was respawned via exit-code-75 (stale-code or
@@ -2887,13 +3039,86 @@ def run_gateway(verbose: int = 0, quiet: bool = False, replace: bool = False):
# Exit with code 1 if gateway fails to connect any platform,
# so systemd Restart=always will retry on transient errors
verbosity = None if quiet else verbose
# ── Exit-path diagnostics ────────────────────────────────────────────
# When the gateway dies silently on Windows (no shutdown log, no
# traceback in gateway.log / errors.log), we're usually blind to the
# cause. The code below captures *every* way the asyncio.run() call
# can return, with full context dumped to a dedicated log so
# the next silent death yields evidence instead of a mystery. This
# is diagnostic scaffolding: cheap to keep on and negligible in cost
# during normal operation. The emitted lines are on by default while
# we're still chasing the Windows lifecycle bug; set the
# HERMES_GATEWAY_EXIT_DIAG env var to anything other than "1" to
# disable them.
import atexit as _atexit
import traceback as _traceback
from datetime import datetime as _dt, timezone as _tz
def _exit_diag(tag: str, **extra: object) -> None:
if os.environ.get("HERMES_GATEWAY_EXIT_DIAG", "1") != "1":
return
try:
from hermes_constants import get_hermes_home as _ghh
log_dir = _ghh() / "logs"
log_dir.mkdir(parents=True, exist_ok=True)
ts = _dt.now(_tz.utc).isoformat()
line = {
"ts": ts,
"tag": tag,
"pid": os.getpid(),
"python": sys.version.split()[0],
"platform": sys.platform,
**extra,
}
import json as _json
with open(log_dir / "gateway-exit-diag.log", "a", encoding="utf-8") as f:
f.write(_json.dumps(line, default=str) + "\n")
except Exception:
pass # never let the diagnostic itself crash the gateway
_exit_diag(
"gateway.start",
replace=replace,
argv=sys.argv,
stdin_is_tty=_stdin_is_tty,
)
def _atexit_hook() -> None:
_exit_diag("atexit.hook", sys_exc=repr(sys.exc_info()))
_atexit.register(_atexit_hook)
success = False
try:
success = asyncio.run(start_gateway(replace=replace, verbosity=verbosity))
_exit_diag("asyncio.run.returned", success=success)
except KeyboardInterrupt:
# On Windows this shouldn't fire (SIGINT is ignored above); the handler
# remains for console runs on other platforms.
_exit_diag(
"asyncio.run.KeyboardInterrupt",
traceback=_traceback.format_exc(),
)
print("\nGateway stopped.")
return
except SystemExit as e:
_exit_diag("asyncio.run.SystemExit", code=getattr(e, "code", None),
traceback=_traceback.format_exc())
raise
except BaseException as e:
# Absolutely everything else: Exception, asyncio.CancelledError,
# even exotic BaseException subclasses. We want the cause logged.
_exit_diag(
"asyncio.run.exception",
exc_type=type(e).__name__,
exc_repr=repr(e),
traceback=_traceback.format_exc(),
)
raise
if not success:
_exit_diag("gateway.exit_nonzero")
sys.exit(1)
_exit_diag("gateway.exit_clean")
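The exit-path diagnostics above amount to an append-only JSON-lines log. As a rough sketch of how such a log could be written and read back, assuming a caller-supplied log directory in place of `get_hermes_home()` (the `exit_diag` and `read_diag` names here are hypothetical helpers for illustration):

```python
import json
import os
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path

LOG_NAME = "gateway-exit-diag.log"  # same file name the diff uses


def exit_diag(log_dir: Path, tag: str, **extra: object) -> None:
    """Append one JSON record per event; never raise."""
    if os.environ.get("HERMES_GATEWAY_EXIT_DIAG", "1") != "1":
        return
    try:
        log_dir.mkdir(parents=True, exist_ok=True)
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "tag": tag,
            "pid": os.getpid(),
            "platform": sys.platform,
            **extra,
        }
        with open(log_dir / LOG_NAME, "a", encoding="utf-8") as f:
            f.write(json.dumps(record, default=str) + "\n")
    except Exception:
        pass  # diagnostics must not take the process down


def read_diag(log_dir: Path) -> list[dict]:
    """Parse the log back into a list of records (hypothetical reader)."""
    with open(log_dir / LOG_NAME, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        logs = Path(tmp) / "logs"
        exit_diag(logs, "gateway.start", replace=False)
        exit_diag(logs, "gateway.exit_clean")
        for rec in read_diag(logs):
            print(rec["tag"], rec["pid"])
```

One record per line keeps the file appendable from several exit paths without any locking or rewrite of earlier entries, and `default=str` lets non-JSON values such as exceptions or paths be logged as their string form.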
# =============================================================================
@@ -3741,6 +3966,9 @@ def _is_service_installed() -> bool:
return get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()
elif is_macos():
return get_launchd_plist_path().exists()
elif is_windows():
from hermes_cli import gateway_windows
return gateway_windows.is_installed()
return False
@@ -3782,6 +4010,12 @@ def _is_service_running() -> bool:
return result.returncode == 0
except subprocess.TimeoutExpired:
return False
elif is_windows():
from hermes_cli import gateway_windows
if gateway_windows.is_installed():
# "installed" doesn't necessarily mean "running" on Windows. The
# canonical check is whether a gateway process actually exists.
return len(find_gateway_pids()) > 0
# Check for manual processes
return len(find_gateway_pids()) > 0
@@ -4630,6 +4864,9 @@ def _gateway_command_inner(args):
systemd_install(force=force, system=system, run_as_user=run_as_user)
elif is_macos():
launchd_install(force)
elif is_windows():
from hermes_cli import gateway_windows
gateway_windows.install(force=force)
elif is_wsl():
print("WSL detected but systemd is not running.")
print("Either enable systemd (add systemd=true to /etc/wsl.conf and restart WSL)")
@@ -4666,6 +4903,9 @@ def _gateway_command_inner(args):
systemd_uninstall(system=system)
elif is_macos():
launchd_uninstall()
elif is_windows():
from hermes_cli import gateway_windows
gateway_windows.uninstall()
elif is_container():
print("Service uninstall is not applicable inside a Docker container.")
print("To stop the gateway, stop or remove the container:")
@@ -4696,6 +4936,9 @@ def _gateway_command_inner(args):
systemd_start(system=system)
elif is_macos():
launchd_start()
elif is_windows():
from hermes_cli import gateway_windows
gateway_windows.start()
elif is_wsl():
print("WSL detected but systemd is not available.")
print("Run the gateway in foreground mode instead:")
@@ -4738,6 +4981,14 @@ def _gateway_command_inner(args):
service_available = True
except subprocess.CalledProcessError:
pass
elif is_windows():
from hermes_cli import gateway_windows
if gateway_windows.is_installed():
try:
gateway_windows.stop()
service_available = True
except (subprocess.CalledProcessError, RuntimeError):
pass
killed = kill_gateway_processes(all_profiles=True)
total = killed + (1 if service_available else 0)
if total:
@@ -4759,9 +5010,17 @@ def _gateway_command_inner(args):
service_available = True
except subprocess.CalledProcessError:
pass
elif is_windows():
from hermes_cli import gateway_windows
if gateway_windows.is_installed():
try:
gateway_windows.stop()
service_available = True
except (subprocess.CalledProcessError, RuntimeError):
pass
if not service_available:
# No systemd/launchd — use profile-scoped PID file
# No systemd/launchd/schtasks service — use profile-scoped PID file
if stop_profile_gateway():
print("✓ Stopped gateway for this profile")
else:
@@ -4791,6 +5050,14 @@ def _gateway_command_inner(args):
service_stopped = True
except subprocess.CalledProcessError:
pass
elif is_windows():
from hermes_cli import gateway_windows
if gateway_windows.is_installed():
try:
gateway_windows.stop()
service_stopped = True
except (subprocess.CalledProcessError, RuntimeError):
pass
killed = kill_gateway_processes(all_profiles=True)
total = killed + (1 if service_stopped else 0)
if total:
@@ -4803,6 +5070,12 @@ def _gateway_command_inner(args):
systemd_start(system=system)
elif is_macos() and get_launchd_plist_path().exists():
launchd_start()
elif is_windows():
from hermes_cli import gateway_windows
if gateway_windows.is_installed():
gateway_windows.start()
else:
run_gateway(verbose=0)
else:
run_gateway(verbose=0)
return
@@ -4821,6 +5094,15 @@ def _gateway_command_inner(args):
service_available = True
except subprocess.CalledProcessError:
pass
elif is_windows():
from hermes_cli import gateway_windows
if gateway_windows.is_installed():
service_configured = True
try:
gateway_windows.restart()
service_available = True
except (subprocess.CalledProcessError, RuntimeError):
pass
if not service_available:
# systemd/launchd restart failed — check if linger is the issue
@@ -4863,12 +5145,20 @@ def _gateway_command_inner(args):
snapshot = get_gateway_runtime_snapshot(system=system)
# Check for service first
_windows_service_installed = False
if is_windows():
from hermes_cli import gateway_windows
_windows_service_installed = gateway_windows.is_installed()
if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
systemd_status(deep, system=system, full=full)
_print_gateway_process_mismatch(snapshot)
elif is_macos() and get_launchd_plist_path().exists():
launchd_status(deep)
_print_gateway_process_mismatch(snapshot)
elif _windows_service_installed:
from hermes_cli import gateway_windows
gateway_windows.status(deep=deep)
_print_gateway_process_mismatch(snapshot)
else:
# Check for manually running processes
pids = list(snapshot.gateway_pids)
@@ -4889,6 +5179,9 @@ def _gateway_command_inner(args):
print("WSL note:")
print(" The gateway is running in foreground/manual mode (recommended for WSL).")
print(" Use tmux or screen for persistence across terminal closes.")
elif is_windows():
print("To install as a Windows Scheduled Task (auto-start on login):")
print(" hermes gateway install")
else:
print("To install as a service:")
print(" hermes gateway install")
@@ -4909,6 +5202,8 @@ def _gateway_command_inner(args):
elif is_wsl():
print(" tmux new -s hermes 'hermes gateway run' # persistent via tmux")
print(" nohup hermes gateway run > ~/.hermes/logs/gateway.log 2>&1 & # background")
elif is_windows():
print(" hermes gateway install # Install as Windows Scheduled Task (auto-start on login)")
else:
print(" hermes gateway install # Install as user service")
print(" sudo hermes gateway install --system # Install as boot-time system service")
@@ -0,0 +1,689 @@
"""Windows gateway service backend (Scheduled Task + Startup-folder fallback).
This mirrors the contract exposed by ``launchd_install`` / ``launchd_start`` /
``launchd_status`` etc. on macOS and ``systemd_install`` / ``systemd_start`` on
Linux. It uses ``schtasks`` under the hood with ``/SC ONLOGON`` and restart-on-
failure XML settings, and falls back to a ``%APPDATA%\\...\\Startup\\<name>.cmd``
dropper when Scheduled Task creation is denied (locked-down corporate boxes).
Design notes
------------
* ``schtasks /Create /SC ONLOGON /RL LIMITED`` means the task runs at the
CURRENT USER's next logon without any elevation prompt. We also
``schtasks /Run`` immediately after install so the gateway starts right
away without waiting for the next logon.
* We write two files: a shared ``gateway.cmd`` wrapper script (cwd + env + the
actual ``python -m hermes_cli.main gateway run --replace`` invocation) and
EITHER a schtasks entry pointing at it OR a Startup-folder ``.cmd`` that
spawns it detached.
* Status = merge of "is the schtasks entry registered?" + "is the startup
.cmd present?" + "is there a gateway process running?" so the status
command keeps working regardless of which install path was taken.
* Quoting is tricky: schtasks parses ``/TR`` itself and cmd.exe parses the
generated ``gateway.cmd``. Those are DIFFERENT parsers. We keep two
separate quote helpers (same pattern OpenClaw uses) and never cross them.
* All of this is Windows-only. Importing the module is still safe on POSIX, but
the entry points raise ``RuntimeError`` if called on non-Windows.
"""
from __future__ import annotations
import os
import re
import shlex
import shutil
import subprocess
import sys
import time
from pathlib import Path
# Short timeouts: schtasks occasionally wedges and we don't want to hang forever.
_SCHTASKS_TIMEOUT_S = 15
_SCHTASKS_NO_OUTPUT_TIMEOUT_S = 30
# Patterns in schtasks stderr that mean "fall back to the Startup folder".
_FALLBACK_PATTERNS = re.compile(
r"(access is denied|acceso denegado|schtasks timed out|schtasks produced no output)",
re.IGNORECASE,
)
_TASK_NAME_DEFAULT = "Hermes_Gateway"
_TASK_DESCRIPTION = "Hermes Agent Gateway - Messaging Platform Integration"
# ---------------------------------------------------------------------------
# Platform guard
# ---------------------------------------------------------------------------
def _assert_windows() -> None:
if sys.platform != "win32":
raise RuntimeError("gateway_windows is Windows-only")
# ---------------------------------------------------------------------------
# Quoting helpers (two DIFFERENT parsers — do not mix)
# ---------------------------------------------------------------------------
def _quote_cmd_script_arg(value: str) -> str:
"""Quote a single argument for use INSIDE a .cmd file, for cmd.exe parsing.
cmd.exe splits on spaces/tabs outside of double quotes. Embedded quotes
are doubled. We also refuse line breaks because they'd terminate the
logical command line mid-script.
"""
if "\r" in value or "\n" in value:
raise ValueError(f"refusing to quote value containing newline: {value!r}")
if not value:
return '""'
if not re.search(r'[ \t"]', value):
return value
return '"' + value.replace('"', '""') + '"'
def _quote_schtasks_arg(value: str) -> str:
"""Quote a single argument for schtasks.exe's /TR parser.
Schtasks uses a different quoting convention than cmd.exe: embedded
quotes are backslash-escaped, and the whole thing is wrapped in double
quotes if it contains whitespace or quotes.
"""
if not re.search(r'[ \t"]', value):
return value
return '"' + value.replace('"', '\\"') + '"'
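To make the difference between the two quoting conventions concrete, here is a small standalone sketch that copies the two rules above into public-named functions (`quote_cmd_script_arg` and `quote_schtasks_arg` are illustrative copies, and the sample argument is made up) and runs them on the same input.

```python
import re


def quote_cmd_script_arg(value: str) -> str:
    """cmd.exe convention: wrap in double quotes, double any embedded quote."""
    if "\r" in value or "\n" in value:
        raise ValueError(f"refusing to quote value containing newline: {value!r}")
    if not value:
        return '""'
    if not re.search(r'[ \t"]', value):
        return value
    return '"' + value.replace('"', '""') + '"'


def quote_schtasks_arg(value: str) -> str:
    """schtasks /TR convention: wrap in double quotes, backslash-escape quotes."""
    if not re.search(r'[ \t"]', value):
        return value
    return '"' + value.replace('"', '\\"') + '"'


sample = 'say "hi" now'  # hypothetical argument with a space and quotes
print(quote_cmd_script_arg(sample))   # "say ""hi"" now"
print(quote_schtasks_arg(sample))     # "say \"hi\" now"
```

As written in the diff, the two helpers also differ on the empty string: the cmd.exe helper emits `""`, while the schtasks helper returns the empty string unchanged.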
# ---------------------------------------------------------------------------
# schtasks.exe wrapper
# ---------------------------------------------------------------------------
def _exec_schtasks(args: list[str]) -> tuple[int, str, str]:
"""Run ``schtasks.exe`` with a hard timeout. Return (code, stdout, stderr).
If schtasks wedges, returns code=124 with a synthetic stderr string. This is
the same convention OpenClaw uses, so the fallback detection regex matches.
"""
_assert_windows()
schtasks = shutil.which("schtasks")
if schtasks is None:
return (1, "", "schtasks.exe not found on PATH")
try:
proc = subprocess.run(
[schtasks, *args],
capture_output=True,
text=True,
timeout=_SCHTASKS_TIMEOUT_S,
# CREATE_NO_WINDOW avoids a flashing console window when the CLI
# is itself hosted in a TUI. See tools/browser_tool.py for the
# same pattern and the windows-subprocess-sigint-storm.md ref.
creationflags=0x08000000, # CREATE_NO_WINDOW
)
return (proc.returncode, proc.stdout or "", proc.stderr or "")
except subprocess.TimeoutExpired:
return (124, "", f"schtasks timed out after {_SCHTASKS_TIMEOUT_S}s")
except OSError as e:
return (1, "", f"schtasks invocation failed: {e}")
def _should_fall_back(code: int, detail: str) -> bool:
return code == 124 or bool(_FALLBACK_PATTERNS.search(detail or ""))
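The fallback decision can be checked in isolation. The sketch below copies the pattern and predicate from this file under public names; the sample stderr strings are invented for illustration and are not captured schtasks output.

```python
import re

FALLBACK_PATTERNS = re.compile(
    r"(access is denied|acceso denegado|schtasks timed out|schtasks produced no output)",
    re.IGNORECASE,
)


def should_fall_back(code: int, detail: str) -> bool:
    """True when the Startup-folder path should be tried instead of schtasks."""
    return code == 124 or bool(FALLBACK_PATTERNS.search(detail or ""))


# Illustrative inputs only.
print(should_fall_back(1, "ERROR: Access is denied."))        # True
print(should_fall_back(124, ""))                               # True
print(should_fall_back(1, "ERROR: some unrelated failure"))    # False
```

Because the match is on message text, a failure reported in a language other than the English and Spanish strings listed would not trigger the fallback unless the exit code is 124.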
# ---------------------------------------------------------------------------
# Paths: where we stash our task script and where Startup lives
# ---------------------------------------------------------------------------
def get_task_name() -> str:
"""Scheduled Task name, scoped per profile.
Default profile: ``Hermes_Gateway``
Named profile X: ``Hermes_Gateway_<X>``
"""
_assert_windows()
# Local import to avoid circular module initialization during hermes_cli boot.
from hermes_cli.gateway import _profile_suffix
suffix = _profile_suffix()
if not suffix:
return _TASK_NAME_DEFAULT
return f"{_TASK_NAME_DEFAULT}_{suffix}"
def _sanitize_filename(value: str) -> str:
"""Remove characters illegal in Windows filenames."""
return re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", value)
def get_task_script_path() -> Path:
"""The generated ``gateway.cmd`` wrapper that the schtasks entry invokes.
Lives under ``%LOCALAPPDATA%\\hermes\\gateway-service\\<task_name>.cmd``
(or ``<HERMES_HOME>/gateway-service/<task_name>.cmd`` so per-profile
Hermes installs stay self-contained).
"""
_assert_windows()
from hermes_cli.config import get_hermes_home
script_dir = Path(get_hermes_home()) / "gateway-service"
script_dir.mkdir(parents=True, exist_ok=True)
return script_dir / f"{_sanitize_filename(get_task_name())}.cmd"
def _startup_dir() -> Path:
appdata = os.environ.get("APPDATA", "").strip()
if appdata:
return Path(appdata) / "Microsoft" / "Windows" / "Start Menu" / "Programs" / "Startup"
userprofile = os.environ.get("USERPROFILE", "").strip() or os.environ.get("HOME", "").strip()
if not userprofile:
raise RuntimeError("neither APPDATA nor USERPROFILE is set — cannot resolve Startup folder")
return (
Path(userprofile)
/ "AppData"
/ "Roaming"
/ "Microsoft"
/ "Windows"
/ "Start Menu"
/ "Programs"
/ "Startup"
)
def get_startup_entry_path() -> Path:
_assert_windows()
return _startup_dir() / f"{_sanitize_filename(get_task_name())}.cmd"
# ---------------------------------------------------------------------------
# Script rendering
# ---------------------------------------------------------------------------
def _build_gateway_cmd_script(
python_path: str,
working_dir: str,
hermes_home: str,
profile_arg: str,
) -> str:
"""Build the ``gateway.cmd`` wrapper content (CRLF-terminated).
The script:
- cd's into the project directory
- exports HERMES_HOME, PYTHONIOENCODING, VIRTUAL_ENV
- invokes ``python -m hermes_cli.main [--profile X] gateway run --replace``
We intentionally do NOT inline PATH overrides here: cmd.exe inherits
the per-user PATH the Scheduled Task was created with, and forcibly
rewriting PATH tends to break Homebrew/nvm-style installations.
"""
lines = ["@echo off", f"rem {_TASK_DESCRIPTION}"]
lines.append(f"cd /d {_quote_cmd_script_arg(working_dir)}")
lines.append(f'set "HERMES_HOME={hermes_home}"')
lines.append('set "PYTHONIOENCODING=utf-8"')
# VIRTUAL_ENV lets the gateway's own python detection find the venv
# if someone imports hermes_constants-based logic during startup.
venv_dir = str(Path(python_path).resolve().parent.parent)
lines.append(f'set "VIRTUAL_ENV={venv_dir}"')
prog_args = [python_path, "-m", "hermes_cli.main"]
if profile_arg:
prog_args.extend(profile_arg.split())
prog_args.extend(["gateway", "run", "--replace"])
lines.append(" ".join(_quote_cmd_script_arg(a) for a in prog_args))
return "\r\n".join(lines) + "\r\n"
def _build_startup_launcher(script_path: Path) -> str:
"""The tiny .cmd that goes in the Startup folder. Just minimizes and chains."""
lines = [
"@echo off",
f"rem {_TASK_DESCRIPTION}",
# ``start "" /min`` detaches with a minimized console window.
# ``/d /c`` on cmd.exe skips AUTORUN and runs the target script once.
f'start "" /min cmd.exe /d /c {_quote_cmd_script_arg(str(script_path))}',
]
return "\r\n".join(lines) + "\r\n"
def _write_task_script() -> Path:
"""Generate and write the gateway.cmd wrapper. Return its absolute path."""
_assert_windows()
# Local imports to avoid circular-init at module load time.
from hermes_cli.config import get_hermes_home
from hermes_cli.gateway import (
PROJECT_ROOT,
_profile_arg,
get_python_path,
)
python_path = get_python_path()
working_dir = str(PROJECT_ROOT)
hermes_home = str(Path(get_hermes_home()).resolve())
profile_arg = _profile_arg(hermes_home)
content = _build_gateway_cmd_script(python_path, working_dir, hermes_home, profile_arg)
script_path = get_task_script_path()
script_path.write_text(content, encoding="utf-8", newline="")
return script_path
# ---------------------------------------------------------------------------
# Install / uninstall
# ---------------------------------------------------------------------------
def _resolve_task_user() -> str | None:
    """Return ``DOMAIN\\USER`` if available, else bare USERNAME, else None."""
    username = os.environ.get("USERNAME") or os.environ.get("USER") or os.environ.get("LOGNAME")
    if not username:
        return None
    if "\\" in username:
        return username
    domain = os.environ.get("USERDOMAIN")
    return f"{domain}\\{username}" if domain else username


def _install_scheduled_task(task_name: str, script_path: Path) -> tuple[bool, str]:
    """Create or update the Scheduled Task. Returns (success, detail)."""
    quoted_script = _quote_schtasks_arg(str(script_path))
    # First try /Change in case the task already exists — keeps the existing
    # trigger + settings intact and just repoints /TR.
    change_code, _out, change_err = _exec_schtasks(
        ["/Change", "/TN", task_name, "/TR", quoted_script]
    )
    if change_code == 0:
        return (True, f"Updated existing Scheduled Task {task_name!r}")
    # Create fresh. Start with the "current user, interactive, no stored
    # password" variant; if that fails, retry without /RU /NP /IT.
    base = [
        "/Create",
        "/F",
        "/SC",
        "ONLOGON",
        "/RL",
        "LIMITED",
        "/TN",
        task_name,
        "/TR",
        quoted_script,
    ]
    user = _resolve_task_user()
    variants = []
    if user:
        variants.append([*base, "/RU", user, "/NP", "/IT"])
    variants.append(base)
    last_code = 1
    last_err = ""
    for argv in variants:
        code, out, err = _exec_schtasks(argv)
        if code == 0:
            return (True, f"Created Scheduled Task {task_name!r}")
        last_code, last_err = code, (err or out or "")
    return (False, f"schtasks /Create failed (code {last_code}): {last_err.strip()}")


def _install_startup_entry(script_path: Path) -> Path:
    """Write the Startup-folder fallback launcher. Returns its path."""
    entry = get_startup_entry_path()
    entry.parent.mkdir(parents=True, exist_ok=True)
    entry.write_text(_build_startup_launcher(script_path), encoding="utf-8", newline="")
    return entry


def _derive_venv_pythonw(python_exe: str) -> str:
    """Given a ``python.exe`` path, return the sibling ``pythonw.exe`` if present.

    ``pythonw.exe`` is the console-less variant. Using it for detached
    daemons means there's no console handle to inherit from the spawning
    shell, which is what lets the gateway survive a parent-shell exit on
    Windows. Falls back to the original ``python.exe`` if the ``w`` variant
    isn't there — caller must still set CREATE_NO_WINDOW in that case.
    """
    p = Path(python_exe)
    candidate = p.with_name(p.stem + "w" + p.suffix)
    if candidate.exists():
        return str(candidate)
    return python_exe
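The sibling lookup in `_derive_venv_pythonw` can be tried on any platform with a throwaway directory. The following is a minimal sketch, not the project's code: the helper name `derive_console_less` is hypothetical, and empty placeholder files stand in for the real interpreters.

```python
import tempfile
from pathlib import Path


def derive_console_less(python_exe: str) -> str:
    """Return the sibling ``<stem>w<suffix>`` path if that file exists."""
    p = Path(python_exe)
    candidate = p.with_name(p.stem + "w" + p.suffix)
    return str(candidate) if candidate.exists() else python_exe


with tempfile.TemporaryDirectory() as tmp:
    scripts = Path(tmp) / "Scripts"
    scripts.mkdir()
    python_exe = scripts / "python.exe"
    python_exe.touch()

    # No pythonw.exe yet: the original path comes back unchanged.
    before = derive_console_less(str(python_exe))

    # Once the sibling exists, it is preferred.
    (scripts / "pythonw.exe").touch()
    after = derive_console_less(str(python_exe))

print(Path(before).name, Path(after).name)
```

Because the fallback returns the input unchanged, a caller cannot tell from the return value alone which variant it got; comparing the result with the input is one way to decide whether the extra no-console flag matters.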
def _build_gateway_argv() -> tuple[list[str], str, dict[str, str]]:
    """Build (argv, working_dir, env_overlay) for the gateway subprocess.

    Same logical command as what gateway.cmd runs, but assembled as a
    native argv for direct ``subprocess.Popen`` invocation, with no cmd.exe
    layer in between.
    """
    _assert_windows()
    from hermes_cli.config import get_hermes_home
    from hermes_cli.gateway import (
        PROJECT_ROOT,
        _profile_arg,
        get_python_path,
    )
    python_exe = _derive_venv_pythonw(get_python_path())
    working_dir = str(PROJECT_ROOT)
    hermes_home = str(Path(get_hermes_home()).resolve())
    profile_arg = _profile_arg(hermes_home)
    argv = [python_exe, "-m", "hermes_cli.main"]
    if profile_arg:
        argv.extend(profile_arg.split())
    argv.extend(["gateway", "run", "--replace"])
    env_overlay = {
        "HERMES_HOME": hermes_home,
        "PYTHONIOENCODING": "utf-8",
        "VIRTUAL_ENV": str(Path(python_exe).resolve().parent.parent),
    }
    return argv, working_dir, env_overlay


def _spawn_detached(script_path: Path | None = None) -> int:
    """Launch the gateway as a fully detached background process.

    We spawn ``pythonw.exe -m hermes_cli.main gateway run --replace``
    directly, NOT through a cmd.exe shim, because on Windows a cmd.exe
    child inherits the parent session's console handle and tends to get
    reaped when the spawning shell exits. pythonw.exe has no console, and
    combined with DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP |
    CREATE_NO_WINDOW + DEVNULL stdio + a fresh env, the resulting process
    is independent of whichever shell started it.

    Arg ``script_path`` is accepted for API symmetry with older callers
    but ignored; we don't need it now that we go direct.

    Returns the spawned PID so callers can verify the process actually
    came up.
    """
    _assert_windows()
    argv, working_dir, env_overlay = _build_gateway_argv()
    # Inherit PATH etc. from the current env, overlay our required vars.
    env = {**os.environ, **env_overlay}
    # DETACHED_PROCESS           0x00000008 — no console attached to child
    # CREATE_NEW_PROCESS_GROUP   0x00000200 — child gets its own group, won't
    #                                         receive Ctrl+C from our group
    # CREATE_NO_WINDOW           0x08000000 — belt-and-braces no-console flag
    # CREATE_BREAKAWAY_FROM_JOB  0x01000000 — escape any job object the
    #                                         parent is in (prevents parent-
    #                                         job teardown from reaping us;
    #                                         some Windows Terminal versions
    #                                         wrap their children in a job).
    flags = 0x00000008 | 0x00000200 | 0x08000000 | 0x01000000
    # Redirect any stray stdout/stderr output to a sidecar log. Python's
    # logging module writes to gateway.log through a FileHandler, so the
    # real gateway logs still land there — this just captures anything
    # that goes to print() or native stderr.
    from hermes_cli.config import get_hermes_home
    log_dir = Path(get_hermes_home()) / "logs"
    log_dir.mkdir(parents=True, exist_ok=True)
    stray_log = log_dir / "gateway-stdio.log"
    try:
        with open(stray_log, "ab", buffering=0) as log_fh:
            proc = subprocess.Popen(
                argv,
                cwd=working_dir,
                env=env,
                creationflags=flags,
                close_fds=True,
                stdin=subprocess.DEVNULL,
                stdout=log_fh,
                stderr=log_fh,
            )
    except OSError:
        # CREATE_BREAKAWAY_FROM_JOB can fail with "access denied" when the
        # parent's job object doesn't permit breakaway (some Windows
        # Terminal configs). Retry without the breakaway flag — in most
        # setups pythonw.exe + DETACHED_PROCESS is enough on its own.
        flags_no_breakaway = flags & ~0x01000000
        with open(stray_log, "ab", buffering=0) as log_fh:
            proc = subprocess.Popen(
                argv,
                cwd=working_dir,
                env=env,
                creationflags=flags_no_breakaway,
                close_fds=True,
                stdin=subprocess.DEVNULL,
                stdout=log_fh,
                stderr=log_fh,
            )
    return proc.pid
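The flag arithmetic in `_spawn_detached` is easier to follow with named constants. This sketch only reproduces the bitwise steps using the hex values from the comment block above; it spawns nothing and runs on any platform. On Windows, the standard `subprocess` module exposes constants with the same names, which could replace the literals.

```python
# Win32 process-creation flag values, copied from the comment block above.
DETACHED_PROCESS = 0x00000008
CREATE_NEW_PROCESS_GROUP = 0x00000200
CREATE_NO_WINDOW = 0x08000000
CREATE_BREAKAWAY_FROM_JOB = 0x01000000

# First attempt: all four flags OR-ed together.
flags = (
    DETACHED_PROCESS
    | CREATE_NEW_PROCESS_GROUP
    | CREATE_NO_WINDOW
    | CREATE_BREAKAWAY_FROM_JOB
)

# Retry path: clear only the breakaway bit, leave the others set.
flags_no_breakaway = flags & ~CREATE_BREAKAWAY_FROM_JOB

print(hex(flags))               # 0x9000208
print(hex(flags_no_breakaway))  # 0x8000208
```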
def install(force: bool = False) -> None:
    """Install the gateway as a Windows Scheduled Task (with Startup fallback).

    Idempotent: re-running updates the task to point at the current python/
    project paths. ``force`` is accepted for API parity with ``launchd_install``
    / ``systemd_install`` but isn't needed — we always reconcile.
    """
    _assert_windows()
    task_name = get_task_name()
    script_path = _write_task_script()
    ok, detail = _install_scheduled_task(task_name, script_path)
    if ok:
        print(f"{detail}")
        print(f" Task script: {script_path}")
        # Start it now so the user doesn't have to log off/on.
        run_code, _out, run_err = _exec_schtasks(["/Run", "/TN", task_name])
        if run_code == 0:
            _report_gateway_start("Scheduled Task")
        else:
            # Scheduled Task was created but /Run failed (e.g. the task's
            # action is malformed). Spawn directly as a backstop.
            pid = _spawn_detached(script_path)
            _report_gateway_start(
                f"direct spawn (PID {pid}; schtasks /Run said: {run_err.strip()})"
            )
        _print_next_steps()
        return
    # schtasks create didn't work. See if it's a "fall back to startup" case.
    if _should_fall_back(1, detail):
        print(f"↻ Scheduled Task install blocked ({detail.splitlines()[0]}) — using Startup folder fallback")
        entry = _install_startup_entry(script_path)
        pid = _spawn_detached(script_path)
        print(f"✓ Installed Windows login item: {entry}")
        print(f" Task script: {script_path}")
        _report_gateway_start(f"direct spawn (PID {pid})")
        _print_next_steps()
        return
    # Unknown schtasks error — surface it and bail.
    raise RuntimeError(f"Windows gateway install failed: {detail}")


def _wait_for_gateway_ready(timeout_s: float = 6.0, interval_s: float = 0.4) -> list[int]:
    """Poll for a live gateway process for up to ``timeout_s`` seconds.

    Returns the list of PIDs found. Empty list means nothing came up in
    time; the caller should surface that to the user as a failed start.
    """
    from hermes_cli.gateway import find_gateway_pids
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        pids = list(find_gateway_pids())
        if pids:
            return pids
        time.sleep(interval_s)
    return []
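The deadline-and-poll loop in `_wait_for_gateway_ready` can be sketched in isolation. In this hedged example, `poll_until` and `fake_probe` are hypothetical names, the probe is a stand-in for `find_gateway_pids`, and `time.monotonic()` is swapped in for the `time.time()` used above, so that a wall-clock adjustment cannot stretch or shorten the wait.

```python
import time
from typing import Callable


def poll_until(probe: Callable[[], list[int]], timeout_s: float = 6.0,
               interval_s: float = 0.4) -> list[int]:
    """Call ``probe`` until it returns a non-empty list or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        found = list(probe())
        if found:
            return found
        time.sleep(interval_s)
    return []


# Hypothetical probe: reports a PID only from the third call onward.
calls = {"n": 0}


def fake_probe() -> list[int]:
    calls["n"] += 1
    return [4242] if calls["n"] >= 3 else []


ready = poll_until(fake_probe, timeout_s=2.0, interval_s=0.01)
never = poll_until(lambda: [], timeout_s=0.05, interval_s=0.01)
print(ready, never)
```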
def _report_gateway_start(via: str) -> None:
    pids = _wait_for_gateway_ready()
    if pids:
        print(f"✓ Gateway started via {via} (PID: {', '.join(map(str, pids))})")
    else:
        print(f"⚠ Launched gateway via {via}, but no process detected after 6s.")
        print(" Check the log for startup errors:")
        from hermes_cli.config import get_hermes_home
        print(f" type {Path(get_hermes_home()).resolve()}\\logs\\gateway.log")
        print(f" type {Path(get_hermes_home()).resolve()}\\logs\\gateway-stdio.log")


def _print_next_steps() -> None:
    from hermes_cli.config import get_hermes_home
    hermes_home = Path(get_hermes_home()).resolve()
    print()
    print("Next steps:")
    print(" hermes gateway status # Check status")
    print(f" type {hermes_home}\\logs\\gateway.log # View logs")


def uninstall() -> None:
    """Remove both the Scheduled Task and the Startup-folder fallback, if present."""
    _assert_windows()
    task_name = get_task_name()
    script_path = get_task_script_path()
    startup_entry = get_startup_entry_path()
    if is_task_registered():
        code, _out, err = _exec_schtasks(["/Delete", "/F", "/TN", task_name])
        if code == 0:
            print(f"✓ Removed Scheduled Task {task_name!r}")
        else:
            print(f"⚠ schtasks /Delete returned code {code}: {err.strip()}")
    for path, label in [(startup_entry, "Windows login item"), (script_path, "Task script")]:
        try:
            path.unlink()
            print(f"✓ Removed {label}: {path}")
        except FileNotFoundError:
            pass


# ---------------------------------------------------------------------------
# Status / start / stop / restart
# ---------------------------------------------------------------------------
def is_task_registered() -> bool:
    code, _out, _err = _exec_schtasks(["/Query", "/TN", get_task_name()])
    return code == 0


def is_startup_entry_installed() -> bool:
    return get_startup_entry_path().exists()


def is_installed() -> bool:
    """True when either the schtasks entry or the Startup fallback is present."""
    return is_task_registered() or is_startup_entry_installed()


def query_task_status() -> dict[str, str]:
    """Parse ``schtasks /Query /V /FO LIST`` and pull the interesting keys."""
    code, out, err = _exec_schtasks(["/Query", "/TN", get_task_name(), "/V", "/FO", "LIST"])
    if code != 0:
        return {}
    info: dict[str, str] = {}
    for raw in out.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key = key.strip().lower()
        value = value.strip()
        # Some Windows locales emit "Last Result" instead of "Last Run Result".
        if key in {"status", "last run time", "last run result", "last result"}:
            if key == "last result":
                info.setdefault("last run result", value)
            else:
                info[key] = value
    return info
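The key/value parsing in `query_task_status` can be tried without a Windows machine. This is a sketch under assumptions: `SAMPLE` is made-up text shaped like `key: value` list output, not captured `schtasks` output, and `parse_list_output` is a hypothetical stand-alone copy of the loop above.

```python
# Made-up sample in "Key: value" list form (NOT real schtasks output).
SAMPLE = """\
HostName:      EXAMPLE-PC
TaskName:      \\ExampleTask
Status:        Ready
Last Run Time: 1/2/2025 9:15:00 AM
Last Result:   0
"""

WANTED = {"status", "last run time", "last run result", "last result"}


def parse_list_output(text: str) -> dict[str, str]:
    info: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        # partition() splits on the FIRST colon only, so the colons inside
        # a timestamp value stay in the value.
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key not in WANTED:
            continue
        if key == "last result":
            # Alias: store under the canonical key unless already present.
            info.setdefault("last run result", value)
        else:
            info[key] = value
    return info


parsed = parse_list_output(SAMPLE)
print(parsed)
```

Splitting on the first colon is what keeps `9:15:00 AM` intact; a plain `split(":")` would cut the timestamp into pieces.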
def _gateway_pids() -> list[int]:
    """Reuse the cross-platform PID scanner in gateway.py."""
    from hermes_cli.gateway import find_gateway_pids
    return list(find_gateway_pids())


def status(deep: bool = False) -> None:
    """Print a status report for the Windows gateway service."""
    _assert_windows()
    task_name = get_task_name()
    task_installed = is_task_registered()
    startup_installed = is_startup_entry_installed()
    pids = _gateway_pids()
    if task_installed:
        print(f"✓ Scheduled Task registered: {task_name}")
        info = query_task_status()
        if info:
            for key in ("status", "last run time", "last run result"):
                if key in info:
                    print(f" {key.title()}: {info[key]}")
    elif startup_installed:
        print(f"✓ Windows login item installed: {get_startup_entry_path()}")
    else:
        print("✗ Gateway service not installed")
    if pids:
        print(f"✓ Gateway process running (PID: {', '.join(map(str, pids))})")
    else:
        print("✗ No gateway process detected")
    if deep:
        print()
        print(f" Task name: {task_name}")
        print(f" Task script: {get_task_script_path()}")
        print(f" Startup entry: {get_startup_entry_path()}")
    if not task_installed and not startup_installed and not pids:
        print()
        print("To install:")
        print(" hermes gateway install")


def start() -> None:
    """Start the gateway. Prefers /Run on the scheduled task if present."""
    _assert_windows()
    if is_task_registered():
        code, _out, err = _exec_schtasks(["/Run", "/TN", get_task_name()])
        if code == 0:
            _report_gateway_start(f"Scheduled Task {get_task_name()!r}")
            return
        print(f"⚠ schtasks /Run failed (code {code}): {err.strip()} — falling back to direct spawn")
    # Direct spawn — no script_path needed with the new argv-based spawner.
    pid = _spawn_detached()
    _report_gateway_start(f"direct spawn (PID {pid})")


def stop() -> None:
    """Stop the gateway. Tries /End on the scheduled task, then kills any stragglers."""
    _assert_windows()
    from hermes_cli.gateway import kill_gateway_processes
    stopped_any = False
    if is_task_registered():
        code, _out, err = _exec_schtasks(["/End", "/TN", get_task_name()])
        # schtasks returns nonzero when the task isn't currently running — don't treat that as an error.
        if code == 0:
            stopped_any = True
        elif "not running" not in (err or "").lower():
            print(f"⚠ schtasks /End returned code {code}: {err.strip()}")
    killed = kill_gateway_processes(all_profiles=False)
    if killed:
        stopped_any = True
        print(f"✓ Killed {killed} gateway process(es)")
    if stopped_any:
        print("✓ Gateway stopped")
    else:
        print("✗ No gateway was running")


def restart() -> None:
    """Stop the gateway then start it again."""
    _assert_windows()
    stop()
    # Give Windows a moment to release the listening port.
    time.sleep(1.0)
    start()
+14 -15
@@ -2805,12 +2805,18 @@ def _classify_worker_exit(pid: int) -> "tuple[str, Optional[int]]":
 def _pid_alive(pid: Optional[int]) -> bool:
     """Return True if ``pid`` is still running on this host.
 
-    Cross-platform: uses ``os.kill(pid, 0)`` on POSIX and ``OpenProcess``
-    on Windows. Returns False for falsy PIDs or on any OS error.
+    Cross-platform: uses ``OpenProcess`` + ``WaitForSingleObject`` on
+    Windows (via ``gateway.status._pid_exists``) and ``os.kill(pid, 0)``
+    on POSIX. Returns False for falsy PIDs or on any OS error.
 
-    **Zombie handling:** ``os.kill(pid, 0)`` succeeds against
-    zombie processes (post-exit, pre-reap) because the process table
-    entry still exists. A worker that exits without being reaped by its
+    **DO NOT** use ``os.kill(pid, 0)`` directly on Windows: Python's
+    Windows ``os.kill`` treats ``sig=0`` as ``CTRL_C_EVENT`` (bpo-14484)
+    and will broadcast it to the target's console group, potentially
+    killing unrelated processes.
+
+    **Zombie handling:** the existence check succeeds against zombie
+    processes (post-exit, pre-reap) because the process table entry
+    still exists. A worker that exits without being reaped by its
     parent would stay "alive" to the dispatcher forever. Dispatcher
     workers are started via ``start_new_session=True`` + intentional
     Popen handle abandonment, so init reaps them quickly, but during
@@ -2821,17 +2827,10 @@ def _pid_alive(pid: Optional[int]) -> bool:
     """
     if not pid or pid <= 0:
         return False
-    try:
-        if hasattr(os, "kill"):
-            os.kill(int(pid), 0)
-    except ProcessLookupError:
+    from gateway.status import _pid_exists
+    if not _pid_exists(int(pid)):
         return False
-    except PermissionError:
-        # Process exists, we just can't signal it.
-        return True
-    except OSError:
-        return False
-    # Still here → kill(0) succeeded. Check for zombie on platforms
+    # Still here → process exists. Check for zombie on platforms
     # where we have a cheap, deterministic process-state probe.
     if sys.platform == "linux":
         try:
+20 -9
@@ -46,7 +46,20 @@ Usage:
 # IMPORTANT: hermes_bootstrap must be the very first import — it sets up
 # UTF-8 stdio on Windows so print()/subprocess children don't hit
 # UnicodeEncodeError with non-ASCII characters. No-op on POSIX.
-import hermes_bootstrap  # noqa: F401
+#
+# Guarded against ModuleNotFoundError because ``hermes_bootstrap`` is a
+# top-level module registered via pyproject.toml's ``py-modules`` list.
+# When the user upgrades code via ``git pull`` (or ``hermes update``
+# crashes between ``git reset --hard`` and ``uv pip install -e .``), the
+# new code references ``hermes_bootstrap`` but the editable install's
+# ``.pth`` file still points at the old set of top-level modules. Without
+# this guard, hermes crashes on import and the user can't run
+# ``hermes update`` to recover. Missing the bootstrap means UTF-8 stdio
+# setup is skipped on Windows — degraded, not broken. POSIX is unaffected.
+try:
+    import hermes_bootstrap  # noqa: F401
+except ModuleNotFoundError:
+    pass
 
 import argparse
 import json
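The guarded-import pattern from the hunk above can be tried in isolation. This is a sketch, not the commit's code: `hypothetical_bootstrap_module` is a made-up name chosen so the import fails, and the `exc.name` check is an extra refinement the commit does not include. It re-raises when the missing module is some dependency rather than the bootstrap module itself.

```python
MODULE = "hypothetical_bootstrap_module"  # made-up name for illustration

try:
    import hypothetical_bootstrap_module  # noqa: F401
    bootstrap_loaded = True
except ModuleNotFoundError as exc:
    # Only swallow the error when the bootstrap module itself is missing;
    # a missing dependency inside it would carry a different exc.name.
    if exc.name != MODULE:
        raise
    bootstrap_loaded = False

# Startup continues either way; only the optional setup was skipped.
print("bootstrap loaded:", bootstrap_loaded)
```

Catching `ModuleNotFoundError` rather than the broader `ImportError` keeps other import-time failures visible.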
@@ -5787,16 +5800,14 @@ def _kill_stale_dashboard_processes(
     while pending and _time.monotonic() < deadline:
         _time.sleep(0.1)
         still_pending = []
+        # On Windows, os.kill(pid, 0) is NOT a no-op. Route through
+        # the cross-platform existence check.
+        from gateway.status import _pid_exists
         for pid in pending:
-            try:
-                os.kill(pid, 0)  # probe
-            except ProcessLookupError:
-                killed.append(pid)
-            except (PermissionError, OSError):
-                # Can't probe — assume still there.
+            if _pid_exists(pid):
                 still_pending.append(pid)
             else:
-                still_pending.append(pid)
+                killed.append(pid)
         pending = still_pending
 
     # SIGKILL any survivors.
@@ -6840,7 +6851,7 @@ def _ensure_fhs_path_guard() -> None:
     if sys.platform != "linux":
         return
     try:
-        if os.geteuid() != 0:
+        if os.geteuid() != 0:  # windows-footgun: ok — Linux FHS helper, guarded by sys.platform == "linux" above + AttributeError catch
             return
     except AttributeError:
         return
+4 -6
@@ -774,15 +774,13 @@ def _stop_gateway_process(profile_dir: Path) -> None:
     # and raw os.kill with SIGTERM doesn't cascade to child processes
     # the same way taskkill /T does.
     from gateway.status import terminate_pid as _terminate_pid
+    from gateway.status import _pid_exists
     _terminate_pid(pid)  # graceful first
-    # Wait up to 10s for graceful shutdown
+    # Wait up to 10s for graceful shutdown. On Windows, os.kill(pid, 0)
+    # is NOT a no-op — use the handle-based existence check.
     for _ in range(20):
         _time.sleep(0.5)
-        try:
-            os.kill(pid, 0)
-        except (ProcessLookupError, OSError):
-            # OSError covers Windows' WinError 87 "invalid parameter"
-            # returned for an invalid/gone PID probe.
+        if not _pid_exists(pid):
             print(f"✓ Gateway stopped (PID {pid})")
             return
     # Force kill
+1 -1
@@ -213,7 +213,7 @@ class PtyBridge:
         # SIGHUP is the conventional "your terminal went away" signal.
         # We escalate if the child ignores it.
-        for sig in (signal.SIGHUP, signal.SIGTERM, signal.SIGKILL):
+        for sig in (signal.SIGHUP, signal.SIGTERM, signal.SIGKILL):  # windows-footgun: ok — POSIX-only module (imports fcntl/termios/ptyprocess at top)
             if not self._proc.isalive():
                 break
             try:
+1 -1
@@ -54,7 +54,7 @@ TIPS = [
     "Combine multiple references: \"Review @file:main.py and @file:test.py for consistency.\"",
 
     # --- Keybindings ---
-    "Alt+Enter (or Ctrl+J) inserts a newline for multi-line input.",
+    "Alt+Enter inserts a newline for multi-line input. (Windows Terminal intercepts Alt+Enter — use Ctrl+Enter instead.)",
     "Ctrl+C interrupts the agent. Double-press within 2 seconds to force exit.",
     "Ctrl+Z suspends Hermes to the background — run fg in your shell to resume.",
     "Tab accepts auto-suggestion ghost text or autocompletes slash commands.",
+208 -9
@@ -118,12 +118,13 @@ def remove_wrapper_script():
 def uninstall_gateway_service():
-    """Stop and uninstall the gateway service (systemd, launchd) and kill any
-    standalone gateway processes.
+    """Stop and uninstall the gateway service (systemd, launchd, Windows
+    Scheduled Task / Startup folder) and kill any standalone gateway processes.
 
     Delegates to the gateway module which handles:
     - Linux: user + system systemd services (with proper DBUS env setup)
     - macOS: launchd plists
+    - Windows: Scheduled Task + Startup-folder fallback, via ``gateway_windows``
     - All platforms: standalone ``hermes gateway run`` processes
     - Termux/Android: skips systemd (no systemd on Android), still kills standalone processes
     """
@@ -167,7 +168,7 @@ def uninstall_gateway_service():
             scope = "system" if is_system else "user"
             try:
-                if is_system and os.geteuid() != 0:
+                if is_system and os.geteuid() != 0:  # windows-footgun: ok — Linux systemd uninstall path, guarded by `if system == "Linux"` above
                     log_warn(f"System gateway service exists at {unit_path} "
                              f"but needs sudo to remove")
                     continue
@@ -201,9 +202,163 @@ def uninstall_gateway_service():
         except Exception as e:
             log_warn(f"Could not remove launchd gateway service: {e}")
 
+    # 4. Windows: uninstall Scheduled Task + Startup-folder entry. The
+    # gateway_windows module already knows how to locate and remove both
+    # code paths (schtasks /Delete + .cmd unlink) and how to stop any
+    # running detached pythonw gateway process. We call into it so the
+    # uninstall logic stays in exactly one place.
+    elif system == "Windows":
+        try:
+            from hermes_cli import gateway_windows
+            if gateway_windows.is_installed() or gateway_windows.is_task_registered() \
+                    or gateway_windows.is_startup_entry_installed():
+                try:
+                    gateway_windows.stop()
+                except Exception as e:
+                    log_warn(f"Could not stop Windows gateway cleanly: {e}")
+                try:
+                    gateway_windows.uninstall()
+                    log_success("Removed Windows gateway (Scheduled Task + Startup entry)")
+                    stopped_something = True
+                except Exception as e:
+                    log_warn(f"Could not fully uninstall Windows gateway: {e}")
+        except Exception as e:
+            log_warn(f"Could not check Windows gateway service: {e}")
+
     return stopped_something
+
+
+# ============================================================================
+# Windows-specific uninstall helpers
+# ============================================================================
+#
+# The installer (``scripts/install.ps1``) does four Windows-only things that
+# ``remove_path_from_shell_configs`` / ``remove_wrapper_script`` don't cover:
+#
+# 1. Sets User-scope env vars ``HERMES_HOME`` and ``HERMES_GIT_BASH_PATH``
+#    via ``[Environment]::SetEnvironmentVariable(..., "User")``. These
+#    don't live in ~/.bashrc — they're in the Windows registry at
+#    HKCU\Environment.
+# 2. Prepends to User-scope ``PATH`` (same registry location) entries
+#    like ``%LOCALAPPDATA%\hermes\git\cmd``, ``%LOCALAPPDATA%\hermes\git\bin``,
+#    ``%LOCALAPPDATA%\hermes\git\usr\bin``, ``%LOCALAPPDATA%\hermes\node``.
+#    Again not in any rc file — only accessible via the registry or the
+#    .NET [Environment] API.
+# 3. Downloads PortableGit to ``%LOCALAPPDATA%\hermes\git\`` and Node to
+#    ``%LOCALAPPDATA%\hermes\node\`` as user-scoped, isolated copies.
+#    These are ~200MB combined and serve no purpose after uninstall.
+# 4. On the ``hermes dashboard`` + gateway paths, drops files into
+#    ``%LOCALAPPDATA%\hermes\gateway-service\`` and sometimes
+#    ``%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup\`` — the
+#    latter is handled by ``gateway_windows.uninstall()`` already.
+#
+# Running a PowerShell one-liner per operation is overkill and fragile on
+# locked-down machines (Constrained Language Mode, restricted ExecutionPolicy).
+# Direct registry writes via ``winreg`` work without spawning any subprocess
+# and apply immediately for new shells (SendMessage WM_SETTINGCHANGE would
+# be nicer but requires ctypes and buys us nothing — the user will log out
+# or open a new terminal anyway).
+
+
+def _hermes_path_markers(hermes_home: Path) -> list[str]:
+    """Path-entry substrings that identify Hermes-owned User-PATH entries."""
+    root = str(hermes_home).rstrip("\\/")
+    # Match on prefix so sub-entries (git\cmd, git\bin, git\usr\bin, node, etc.)
+    # all get swept. Also match the bare hermes-agent install dir.
+    markers = [root + "\\hermes-agent", root + "\\git", root + "\\node", root + "\\venv"]
+    # Also match if HERMES_HOME was customised to somewhere else — find-and-nuke
+    # any entry whose path component contains "hermes". We don't want to catch
+    # unrelated entries like "chermes-foo" or "ephermeral", so we look for
+    # backslash-hermes as a word-ish boundary.
+    return markers
+
+
+def remove_path_from_windows_registry(hermes_home: Path) -> list[str]:
+    """Strip Hermes-owned entries from User-scope PATH in the registry.
+
+    Returns the list of removed path entries. Operates on HKCU\\Environment,
+    same key the installer wrote to via ``[Environment]::SetEnvironmentVariable``.
+    """
+    try:
+        import winreg
+    except ImportError:
+        return []  # not on Windows, nothing to do
+
+    removed: list[str] = []
+    key_path = "Environment"
+    try:
+        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, key_path, 0,
+                            winreg.KEY_READ | winreg.KEY_WRITE) as key:
+            try:
+                path_value, path_type = winreg.QueryValueEx(key, "Path")
+            except FileNotFoundError:
+                return []
+            # Preserve REG_EXPAND_SZ vs REG_SZ so unexpanded %VARS% survive.
+            entries = [e for e in path_value.split(";") if e]
+            markers = _hermes_path_markers(hermes_home)
+            kept: list[str] = []
+            for entry in entries:
+                entry_norm = entry.rstrip("\\/")
+                matched = any(entry_norm.lower().startswith(m.lower()) for m in markers)
+                if matched:
+                    removed.append(entry)
+                else:
+                    kept.append(entry)
+            if removed:
+                new_value = ";".join(kept)
+                winreg.SetValueEx(key, "Path", 0, path_type, new_value)
+    except OSError as e:
+        log_warn(f"Could not edit User PATH in registry: {e}")
+    return removed
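The filtering step inside `remove_path_from_windows_registry` does not need the registry to be understood. The sketch below isolates it as a pure function; `split_path_entries` is a hypothetical name, and the install root and PATH entries are invented example values.

```python
def split_path_entries(path_value: str, markers: list[str]) -> tuple[list[str], list[str]]:
    """Return (kept, removed) for a ';'-separated PATH string."""
    kept: list[str] = []
    removed: list[str] = []
    lowered = [m.lower() for m in markers]
    for entry in (e for e in path_value.split(";") if e):
        # Compare case-insensitively and ignore a trailing separator.
        norm = entry.rstrip("\\/").lower()
        if any(norm.startswith(m) for m in lowered):
            removed.append(entry)
        else:
            kept.append(entry)
    return kept, removed


root = r"C:\Users\example\AppData\Local\hermes"  # hypothetical install root
markers = [root + "\\git", root + "\\node"]
path_value = ";".join([
    r"C:\Windows\system32",
    root + r"\git\cmd",
    root.upper() + r"\NODE" + "\\",  # different case, trailing backslash
    r"C:\Tools\bin",
])

kept, removed = split_path_entries(path_value, markers)
print(";".join(kept))
print(removed)
```

A plain prefix test like this would also match a sibling directory such as `...\hermes\gitfoo`; whether that matters depends on what else lives under the install root.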
+def remove_hermes_env_vars_windows() -> list[str]:
+    """Delete HERMES_HOME and HERMES_GIT_BASH_PATH from User-scope env vars."""
+    try:
+        import winreg
+    except ImportError:
+        return []
+
+    removed: list[str] = []
+    try:
+        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, "Environment", 0,
+                            winreg.KEY_READ | winreg.KEY_WRITE) as key:
+            for name in ("HERMES_HOME", "HERMES_GIT_BASH_PATH"):
+                try:
+                    winreg.QueryValueEx(key, name)
+                except FileNotFoundError:
+                    continue
+                try:
+                    winreg.DeleteValue(key, name)
+                    removed.append(name)
+                except OSError as e:
+                    log_warn(f"Could not delete {name} from User env: {e}")
+    except OSError as e:
+        log_warn(f"Could not open User Environment key: {e}")
+    return removed
+
+
+def remove_portable_tooling_windows(hermes_home: Path) -> list[Path]:
+    """Delete PortableGit and Node installs the Windows installer created under
+    ``%LOCALAPPDATA%\\hermes\\``. Only called on full uninstall; they're
+    isolated from any system Git / Node so they cannot break other tools."""
+    removed: list[Path] = []
+    for sub in ("git", "node", "gateway-service"):
+        target = hermes_home / sub
+        if target.exists():
+            try:
+                shutil.rmtree(target, ignore_errors=False)
+                removed.append(target)
+            except Exception as e:
+                log_warn(f"Could not remove {target}: {e}")
+    return removed
+
+
+def _is_windows() -> bool:
+    import sys
+    return sys.platform == "win32"
 
 
 def _is_default_hermes_home(hermes_home: Path) -> bool:
     """Return True when ``hermes_home`` points at the default (non-profile) root."""
     try:
@@ -400,14 +555,36 @@ def run_uninstall(args):
     if not uninstall_gateway_service():
         log_info("No gateway service or processes found")
 
-    # 2. Remove PATH entries from shell configs
+    # 2. Remove PATH entries from shell configs (POSIX) AND from the Windows
+    # User-scope registry. Both helpers no-op on the wrong platform so we
+    # can safely call them unconditionally.
     log_info("Removing PATH entries from shell configs...")
     removed_configs = remove_path_from_shell_configs()
     if removed_configs:
         for config in removed_configs:
             log_success(f"Updated {config}")
     else:
-        log_info("No PATH entries found to remove")
+        log_info("No PATH entries found to remove in shell rc files")
+
+    if _is_windows():
+        log_info("Removing PATH entries from Windows User environment...")
+        # Expand %LOCALAPPDATA% etc. in hermes_home so the marker matching is
+        # against fully resolved paths — installer writes literal strings
+        # like C:\Users\<u>\AppData\Local\hermes\git\cmd, not %LOCALAPPDATA%.
+        removed_path_entries = remove_path_from_windows_registry(Path(os.path.expandvars(str(hermes_home))))
+        if removed_path_entries:
+            for entry in removed_path_entries:
+                log_success(f"Removed from User PATH: {entry}")
+        else:
+            log_info("No Hermes-owned PATH entries in User environment")
+
+        log_info("Removing HERMES_HOME / HERMES_GIT_BASH_PATH User env vars...")
+        removed_env = remove_hermes_env_vars_windows()
+        if removed_env:
+            for name in removed_env:
+                log_success(f"Removed User env var: {name}")
+        else:
+            log_info("No Hermes-set User env vars to remove")
 
     # 3. Remove wrapper script
     log_info("Removing hermes command...")
@@ -436,6 +613,21 @@ def run_uninstall(args):
         except Exception as e:
             log_warn(f"Could not fully remove {project_root}: {e}")
             log_info("You may need to manually remove it")
 
+    # 4b. Remove Windows-only installer artifacts that are NOT user data:
+    # PortableGit, bundled Node, gateway-service dir. Installer put them
+    # under HERMES_HOME but they're install tooling, not config — safe to
+    # remove even in "keep data" mode. If we're doing a full uninstall
+    # the step-5 rmtree(hermes_home) would sweep them anyway; calling
+    # this helper there is a no-op since they'll already be gone.
+    if _is_windows():
+        log_info("Removing Windows installer artifacts (PortableGit, Node, gateway-service)...")
+        removed_artifacts = remove_portable_tooling_windows(hermes_home)
+        if removed_artifacts:
+            for path in removed_artifacts:
+                log_success(f"Removed {path}")
+        else:
+            log_info("No Windows installer artifacts to remove")
+
     # 5. Optionally remove ~/.hermes/ data directory (and named profiles)
     if full_uninstall:
@@ -471,11 +663,18 @@ def run_uninstall(args):
     print(f" {hermes_home}/")
     print()
     print("To reinstall later with your existing settings:")
-    print(color(" curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash", Colors.DIM))
+    if _is_windows():
+        print(color(" irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex", Colors.DIM))
+    else:
+        print(color(" curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash", Colors.DIM))
     print()
-    print(color("Reload your shell to complete the process:", Colors.YELLOW))
-    print(" source ~/.bashrc # or ~/.zshrc")
+    if _is_windows():
+        print(color("Open a new terminal (PowerShell / Windows Terminal) to pick up", Colors.YELLOW))
+        print(color("the updated User PATH and environment variables.", Colors.YELLOW))
+    else:
+        print(color("Reload your shell to complete the process:", Colors.YELLOW))
+        print(" source ~/.bashrc # or ~/.zshrc")
     print()
     print("Thank you for using Hermes Agent! ⚕")
     print()
@@ -4,6 +4,7 @@ description: Delegate coding tasks to Blackbox AI CLI agent. Multi-model agent w
version: 1.0.0
author: Hermes Agent (Nous Research)
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Coding-Agent, Blackbox, Multi-Agent, Judge, Multi-Model]
@@ -4,6 +4,7 @@ description: Configure and use Honcho memory with Hermes -- cross-session user m
version: 2.0.0
author: Hermes Agent
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Honcho, Memory, Profiles, Observation, Dialectic, User-Modeling, Session-Summary]
+1
@@ -4,6 +4,7 @@ description: Query Base (Ethereum L2) blockchain data with USD pricing — walle
version: 0.1.0
author: youssefea
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Base, Blockchain, Crypto, Web3, RPC, DeFi, EVM, L2, Ethereum]
@@ -4,6 +4,7 @@ description: Query Solana blockchain data with USD pricing — wallet balances,
version: 0.2.0
author: Deniz Alagoz (gizdusum), enhanced by Hermes Agent
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Solana, Blockchain, Crypto, Web3, RPC, DeFi, NFT]
@@ -8,6 +8,7 @@ description: >
and one concrete recommendation with definition of done and implementation plan.
Use when the user asks for a "1-3-1", says "give me options", or needs help
choosing between competing approaches.
platforms: [linux, macos, windows]
version: 1.0.0
author: Willard Moore
license: MIT
@@ -5,6 +5,7 @@ version: 1.0.0
requires: Blender 4.3+ (desktop instance required, headless not supported)
author: alireza78a
tags: [blender, 3d, animation, modeling, bpy, mcp]
platforms: [linux, macos, windows]
---
# Blender MCP
@@ -5,6 +5,7 @@ version: 0.1.0
author: v1k22 (original PR), ported into hermes-agent
license: MIT
dependencies: []
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [diagrams, svg, visualization, education, physics, chemistry, engineering]
@@ -4,6 +4,7 @@ description: Create HTML-based video compositions, animated title cards, social
version: 1.0.0
author: heygen-com
license: Apache-2.0
platforms: [linux, macos, windows]
prerequisites:
commands: [node, ffmpeg, npx]
metadata:
@@ -4,6 +4,7 @@ description: Plan, set up, and monitor a multi-agent video production pipeline b
version: 1.0.0
author: [SHL0MS, alt-glitch]
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [video, kanban, multi-agent, orchestration, production-pipeline]
@@ -4,6 +4,7 @@ description: Generate real meme images by picking a template and overlaying text
version: 2.0.0
author: adanaleycio
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [creative, memes, humor, images]
+1
@@ -4,6 +4,7 @@ description: "Run 150+ AI apps via inference.sh CLI (infsh) — image generation
version: 1.0.0
author: okaris
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [AI, image-generation, video, LLM, search, inference, FLUX, Veo, Claude]
@@ -4,6 +4,7 @@ description: Manage Docker containers, images, volumes, networks, and Compose st
version: 1.0.0
author: sprmn24
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [docker, containers, devops, infrastructure, compose, images, volumes, networks, debugging]
@@ -4,6 +4,7 @@ description: Roleplay the most difficult, tech-resistant user for your product.
version: 1.0.0
author: Omni @ Comelse
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [qa, ux, testing, adversarial, dogfood, personas, user-testing]
+1
@@ -2,6 +2,7 @@
name: agentmail
description: Give the agent its own dedicated email inbox via AgentMail. Send, receive, and manage email autonomously using agent-owned email addresses (e.g. hermes-agent@agentmail.to).
version: 1.0.0
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [email, communication, agentmail, mcp]
@@ -4,6 +4,7 @@ description: Build fully-integrated 3-statement models (IS, BS, CF) in Excel wit
version: 1.0.0
author: Anthropic (adapted by Nous Research)
license: Apache-2.0
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [finance, three-statement, income-statement, balance-sheet, cash-flow, excel, openpyxl, modeling]
@@ -4,6 +4,7 @@ description: Build comparable company analysis in Excel — operating metrics, v
version: 1.0.0
author: Anthropic (adapted by Nous Research)
license: Apache-2.0
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [finance, valuation, comps, excel, openpyxl, modeling, investment-banking]
@@ -4,6 +4,7 @@ description: Build institutional-quality DCF valuation models in Excel — reven
version: 1.0.0
author: Anthropic (adapted by Nous Research)
license: Apache-2.0
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [finance, valuation, dcf, excel, openpyxl, modeling, investment-banking]
@@ -4,6 +4,7 @@ description: Build auditable Excel workbooks headless with openpyxl — blue/bla
version: 1.0.0
author: Anthropic (adapted by Nous Research)
license: Apache-2.0
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [excel, openpyxl, finance, spreadsheet, modeling]
@@ -4,6 +4,7 @@ description: Build leveraged buyout models in Excel — sources & uses, debt sch
version: 1.0.0
author: Anthropic (adapted by Nous Research)
license: Apache-2.0
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [finance, valuation, lbo, private-equity, excel, openpyxl, modeling]
@@ -4,6 +4,7 @@ description: Build accretion/dilution (merger) models in Excel — pro-forma P&L
version: 1.0.0
author: Anthropic (adapted by Nous Research)
license: Apache-2.0
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [finance, m-and-a, merger, accretion-dilution, excel, openpyxl, modeling, investment-banking]
@@ -4,6 +4,7 @@ description: Build PowerPoint decks headless with python-pptx. Pairs with excel-
version: 1.0.0
author: Anthropic (adapted by Nous Research)
license: Apache-2.0
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [powerpoint, pptx, python-pptx, presentation, finance]
@@ -6,6 +6,7 @@ description: >
foods via USDA FoodData Central. Compute BMI, TDEE, one-rep max, macro
splits, and body fat — pure Python, no pip installs. Built for anyone
chasing gains, cutting weight, or just trying to eat better.
platforms: [linux, macos, windows]
version: 1.0.0
authors:
- haileymarshall
@@ -6,6 +6,7 @@ description: >
heart rate, HRV, sleep staging, and 40+ derived EXG scores) into responses.
Requires a BCI wearable (Muse 2/S or OpenBCI) and the NeuroSkill desktop app
running locally.
platforms: [linux, macos, windows]
version: 1.0.0
author: Hermes Agent + Nous Research
license: MIT
@@ -4,6 +4,7 @@ description: Build, test, inspect, install, and deploy MCP servers with FastMCP
version: 1.0.0
author: Hermes Agent
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [MCP, FastMCP, Python, Tools, Resources, Prompts, Deployment]
@@ -4,6 +4,7 @@ description: Use the mcporter CLI to list, configure, auth, and call MCP servers
version: 1.0.0
author: community
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [MCP, Tools, API, Integrations, Interop]
@@ -4,6 +4,7 @@ description: Migrate a user's OpenClaw customization footprint into Hermes Agent
version: 1.0.0
author: Hermes Agent (Nous Research)
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Migration, OpenClaw, Hermes, Memory, Persona, Import]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [accelerate, torch, transformers]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Distributed Training, HuggingFace, Accelerate, DeepSpeed, FSDP, Mixed Precision, PyTorch, DDP, Unified API, Simple]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [chromadb, sentence-transformers]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [RAG, Chroma, Vector Database, Embeddings, Semantic Search, Open Source, Self-Hosted, Document Retrieval, Metadata Filtering]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [transformers, torch, pillow]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Multimodal, CLIP, Vision-Language, Zero-Shot, Image Classification, OpenAI, Image Search, Cross-Modal Retrieval, Content Moderation]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [faiss-cpu, faiss-gpu, numpy]
platforms: [linux, macos]
metadata:
hermes:
tags: [RAG, FAISS, Similarity Search, Vector Search, Facebook AI, GPU Acceleration, Billion-Scale, K-NN, HNSW, High Performance, Large Scale]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [flash-attn, torch, transformers]
platforms: [linux, macos]
metadata:
hermes:
tags: [Optimization, Flash Attention, Attention Optimization, Memory Efficiency, Speed Optimization, Long Context, PyTorch, SDPA, H100, FP8, Transformers]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [guidance, transformers]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Prompt Engineering, Guidance, Constrained Generation, Structured Output, JSON Validation, Grammar, Microsoft Research, Format Enforcement, Multi-Step Workflows]
@@ -4,6 +4,7 @@ description: Build, test, and debug Hermes Agent RL environments for Atropos tra
version: 1.1.0
author: Hermes Agent
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [atropos, rl, environments, training, reinforcement-learning, reward-functions]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [tokenizers, transformers, datasets]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Tokenization, HuggingFace, BPE, WordPiece, Unigram, Fast Tokenization, Rust, Custom Tokenizer, Alignment Tracking, Production]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [instructor, pydantic, openai, anthropic]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Prompt Engineering, Instructor, Structured Output, Pydantic, Data Extraction, JSON Parsing, Type Safety, Validation, Streaming, OpenAI, Anthropic]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [lambda-cloud-client>=1.0.0]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Infrastructure, GPU Cloud, Training, Inference, Lambda Labs]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [transformers, torch, pillow]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [LLaVA, Vision-Language, Multimodal, Visual Question Answering, Image Chat, CLIP, Vicuna, Conversational AI, Instruction Tuning, VQA]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [modal>=0.64.0]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Infrastructure, Serverless, GPU, Cloud, Deployment, Modal]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [nemo-curator, cudf, dask, rapids]
platforms: [linux, macos]
metadata:
hermes:
tags: [Data Processing, NeMo Curator, Data Curation, GPU Acceleration, Deduplication, Quality Filtering, NVIDIA, RAPIDS, PII Redaction, Multimodal, LLM Training Data]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [peft>=0.13.0, transformers>=4.45.0, torch>=2.0.0, bitsandbytes>=0.43.0]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Fine-Tuning, PEFT, LoRA, QLoRA, Parameter-Efficient, Adapters, Low-Rank, Memory Optimization, Multi-Adapter]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [pinecone-client]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [RAG, Pinecone, Vector Database, Managed Service, Serverless, Hybrid Search, Production, Auto-Scaling, Low Latency, Recommendations]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [torch>=2.0, transformers]
platforms: [linux, macos]
metadata:
hermes:
tags: [Distributed Training, PyTorch, FSDP, Data Parallel, Sharding, Mixed Precision, CPU Offloading, FSDP2, Large-Scale Training]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [lightning, torch, transformers]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [PyTorch Lightning, Training Framework, Distributed Training, DDP, FSDP, DeepSpeed, High-Level API, Callbacks, Best Practices, Scalable]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [qdrant-client>=1.12.0]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [RAG, Vector Search, Qdrant, Semantic Search, Embeddings, Similarity Search, HNSW, Production, Distributed]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [sae-lens>=6.0.0, transformer-lens>=2.0.0, torch>=2.0.0]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Sparse Autoencoders, SAE, Mechanistic Interpretability, Feature Discovery, Superposition]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [torch, transformers, datasets, trl, accelerate]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Post-Training, SimPO, Preference Optimization, Alignment, DPO Alternative, Reference-Free, LLM Alignment, Efficient Training]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [sglang-router>=0.2.3, ray, torch>=2.0.0, transformers>=4.40.0]
platforms: [linux, macos]
metadata:
hermes:
tags: [Reinforcement Learning, Megatron-LM, SGLang, GRPO, Post-Training, GLM]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [diffusers>=0.30.0, transformers>=4.41.0, accelerate>=0.31.0, torch>=2.0.0]
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Image Generation, Stable Diffusion, Diffusers, Text-to-Image, Multimodal, Computer Vision]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [tensorrt-llm, torch]
platforms: [linux, macos]
metadata:
hermes:
tags: [Inference Serving, TensorRT-LLM, NVIDIA, Inference Optimization, High Throughput, Low Latency, Production, FP8, INT4, In-Flight Batching, Multi-GPU]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [torch>=2.6.0, torchtitan>=0.2.0, torchao>=0.5.0]
platforms: [linux, macos]
metadata:
hermes:
tags: [Model Architecture, Distributed Training, TorchTitan, FSDP2, Tensor Parallel, Pipeline Parallel, Context Parallel, Float8, Llama, Pretraining]
@@ -5,6 +5,7 @@ version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [openai-whisper, transformers, torch]
platforms: [linux, macos]
metadata:
hermes:
tags: [Whisper, Speech Recognition, ASR, Multimodal, Multilingual, OpenAI, Speech-To-Text, Transcription, Translation, Audio Processing]
@@ -4,6 +4,7 @@ description: Canvas LMS integration — fetch enrolled courses and assignments u
version: 1.0.0
author: community
license: MIT
platforms: [linux, macos, windows]
prerequisites:
env_vars: [CANVAS_API_TOKEN, CANVAS_BASE_URL]
metadata:
@@ -4,6 +4,7 @@ description: "Shop.app: product search, order tracking, returns, reorder."
version: 0.0.28
author: community
license: MIT
platforms: [linux, macos, windows]
prerequisites:
commands: [curl]
metadata:
@@ -4,6 +4,7 @@ description: Shopify Admin & Storefront GraphQL APIs via curl. Products, orders,
version: 1.0.0
author: community
license: MIT
platforms: [linux, macos, windows]
prerequisites:
env_vars: [SHOPIFY_ACCESS_TOKEN, SHOPIFY_STORE_DOMAIN]
commands: [curl, jq]
@@ -4,6 +4,7 @@ description: SiYuan Note API for searching, reading, creating, and managing bloc
version: 1.0.0
author: FEUAZUR
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [SiYuan, Notes, Knowledge Base, PKM, API]
@@ -4,6 +4,7 @@ description: Give Hermes phone capabilities without core tool changes. Provision
version: 1.0.0
author: Nous Research
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [telephony, phone, sms, mms, voice, twilio, bland.ai, vapi, calling, texting]
@@ -1,6 +1,7 @@
---
name: domain-intel
description: Passive domain reconnaissance using Python stdlib. Subdomain discovery, SSL certificate inspection, WHOIS lookups, DNS records, domain availability checks, and bulk multi-domain analysis. No API keys required.
platforms: [linux, macos, windows]
---
# Domain Intelligence — Passive OSINT
@@ -7,6 +7,7 @@ description: >
OpenFDA, interpret ADMET profiles, and assist with lead optimization.
Use for medicinal chemistry questions, molecule property analysis, clinical
pharmacology, and open-science drug research.
platforms: [linux, macos, windows]
version: 1.0.0
author: bennytimz
license: MIT
@@ -4,6 +4,7 @@ description: Free web search via DuckDuckGo — text, news, images, videos. No A
version: 1.3.0
author: gamedevCloudy
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [search, duckduckgo, web-search, free, fallback]
@@ -4,6 +4,7 @@ description: Index a codebase with GitNexus and serve an interactive knowledge g
version: 1.0.0
author: Hermes Agent + Teknium
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [gitnexus, code-intelligence, knowledge-graph, visualization]
@@ -4,6 +4,7 @@ description: Optional vendor skill for Parallel CLI — agent-native web search,
version: 1.1.0
author: Hermes Agent
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Research, Web, Search, Deep-Research, Enrichment, CLI]
@@ -4,6 +4,7 @@ description: Web scraping with Scrapling - HTTP fetching, stealth browser automa
version: 1.0.0
author: FEUAZUR
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Web Scraping, Browser, Cloudflare, Stealth, Crawling, Spider]
@@ -4,6 +4,7 @@ description: Free meta-search via SearXNG — aggregates results from 70+ search
version: 1.0.0
author: hermes-agent
license: MIT
platforms: [linux, macos]
metadata:
hermes:
tags: [search, searxng, meta-search, self-hosted, free, fallback]
@@ -4,6 +4,7 @@ description: Set up and use 1Password CLI (op). Use when installing the CLI, ena
version: 1.0.0
author: arceus77-7, enhanced by Hermes Agent
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [security, secrets, 1password, op, cli]
@@ -5,6 +5,7 @@ description: |
Covers deleted commit recovery, force-push detection, IOC extraction, multi-source evidence
collection, hypothesis formation/validation, and structured forensic reporting.
Inspired by RAPTOR's 1800+ line OSS Forensics system.
platforms: [linux, macos, windows]
category: security
triggers:
- "investigate this repository"
@@ -4,6 +4,7 @@ description: OSINT username search across 400+ social networks. Hunt down social
version: 1.0.0
author: unmodeled-tyler
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [osint, security, username, social-media, reconnaissance]
@@ -4,6 +4,7 @@ description: Embed alibaba/page-agent into your own web application — a pure-J
version: 1.0.0
author: Hermes Agent
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [web, javascript, agent, browser, gui, alibaba, embed, copilot, saas]
@@ -54,7 +54,7 @@ def discover_context_engines() -> List[Tuple[str, str, bool]]:
if yaml_file.exists():
try:
import yaml
with open(yaml_file) as f:
with open(yaml_file, encoding="utf-8-sig") as f:
meta = yaml.safe_load(f) or {}
desc = meta.get("description", "")
except Exception:
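The switch to `utf-8-sig` for YAML reads matters when a file was saved with a byte-order mark, which some Windows editors do. The following is a minimal sketch of the difference, using only the standard library; the file name `plugin.yaml` and the helper `read_first_key` are hypothetical and not part of this change.

```python
import tempfile
from pathlib import Path


def read_first_key(path: Path, encoding: str) -> str:
    """Return the text before the first ':' on the first line of a file."""
    with open(path, encoding=encoding) as f:
        return f.readline().split(":", 1)[0]


with tempfile.TemporaryDirectory() as tmp:
    meta = Path(tmp) / "plugin.yaml"  # hypothetical file name
    # Simulate an editor that saves UTF-8 with a byte-order mark.
    meta.write_bytes(b"\xef\xbb\xbfdescription: demo\n")

    plain = read_first_key(meta, "utf-8")          # BOM stays in the text
    bom_aware = read_first_key(meta, "utf-8-sig")  # BOM is stripped

print(repr(plain))
print(repr(bom_aware))
```

With plain `utf-8` the first key carries a leading `\ufeff`, so a lookup such as `meta.get("description")` could miss it. `utf-8-sig` reads files with or without a BOM identically.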
@@ -90,7 +90,7 @@ def _log(message: str) -> None:
log_file = get_log_file()
log_file.parent.mkdir(parents=True, exist_ok=True)
ts = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
with open(log_file, "a") as f:
with open(log_file, "a", encoding="utf-8") as f:
f.write(f"[{ts}] {message}\n")
except OSError:
# Never let the audit log break the agent loop.
@@ -70,14 +70,11 @@ def _clear_active() -> None:
def _pid_alive(pid: int) -> bool:
try:
os.kill(pid, 0)
except ProcessLookupError:
return False
except PermissionError:
# Process exists but we can't signal it — treat as alive.
return True
return True
# ``os.kill(pid, 0)`` is NOT a no-op on Windows (bpo-14484) — it
# routes through GenerateConsoleCtrlEvent and can kill the target.
# Use the cross-platform existence check.
from gateway.status import _pid_exists
return _pid_exists(pid)
# ---------------------------------------------------------------------------
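The body of `gateway.status._pid_exists` is not part of this hunk. As a rough sketch of what a cross-platform existence check with a stdlib fallback could look like (the function names and the exact fallback strategy here are assumptions, not the project's implementation):

```python
import os
import sys


def _pid_exists_windows(pid: int) -> bool:
    """Stdlib-only Windows check via OpenProcess (hypothetical helper)."""
    import ctypes
    from ctypes import wintypes

    PROCESS_QUERY_LIMITED_INFORMATION = 0x1000
    ERROR_ACCESS_DENIED = 5
    STILL_ACTIVE = 259

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.OpenProcess.restype = wintypes.HANDLE
    kernel32.OpenProcess.argtypes = [wintypes.DWORD, wintypes.BOOL, wintypes.DWORD]
    kernel32.GetExitCodeProcess.argtypes = [wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
    kernel32.CloseHandle.argtypes = [wintypes.HANDLE]

    handle = kernel32.OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, False, pid)
    if not handle:
        # Access denied means the process exists but is not ours to query.
        return ctypes.get_last_error() == ERROR_ACCESS_DENIED
    try:
        code = wintypes.DWORD()
        ok = kernel32.GetExitCodeProcess(handle, ctypes.byref(code))
        return bool(ok) and code.value == STILL_ACTIVE
    finally:
        kernel32.CloseHandle(handle)


def pid_exists(pid: int) -> bool:
    """Best-effort check that a PID is alive, never signalling on Windows."""
    if pid <= 0:
        return False
    try:
        import psutil  # preferred when installed
    except ImportError:
        psutil = None
    if psutil is not None:
        return psutil.pid_exists(pid)
    if sys.platform == "win32":
        return _pid_exists_windows(pid)
    try:
        os.kill(pid, 0)  # POSIX only: signal 0 performs just the existence check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True
```

The point of the design is that `os.kill(pid, 0)` is reached only on POSIX, where signal 0 is a pure existence probe.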
@@ -313,7 +310,7 @@ def stop(*, reason: str = "requested") -> Dict[str, Any]:
time.sleep(0.5)
if _pid_alive(pid):
try:
os.kill(pid, signal.SIGKILL)
os.kill(pid, signal.SIGKILL) # windows-footgun: ok — POSIX-only plugin (google_meet registers no-op on Windows; see __init__.py)
except ProcessLookupError:
pass
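For code that is not confined to a POSIX-only plugin, the checker added later in this change recommends `getattr(signal, 'SIGKILL', signal.SIGTERM)`. A small sketch of that idiom follows; `KILL_SIGNAL` and `force_kill` are illustrative names, not functions from this repository.

```python
import os
import signal

# SIGKILL is POSIX-only; fall back to SIGTERM where the attribute is missing.
KILL_SIGNAL = getattr(signal, "SIGKILL", signal.SIGTERM)


def force_kill(pid: int) -> bool:
    """Forcefully stop a process; return False if it was already gone."""
    try:
        os.kill(pid, KILL_SIGNAL)
    except ProcessLookupError:
        return False
    return True
```

On Windows, `os.kill` with a signal other than `CTRL_C_EVENT` or `CTRL_BREAK_EVENT` terminates the process unconditionally, so the fallback still acts as a hard kill. Failures there may surface as other `OSError` subclasses, which this sketch does not try to classify.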
@@ -292,7 +292,7 @@ class RealtimeSpeaker:
return
self.processed_path.parent.mkdir(parents=True, exist_ok=True)
record = {"id": entry.get("id"), "text": entry.get("text", ""), "result": result}
with open(self.processed_path, "a") as fp:
with open(self.processed_path, "a", encoding="utf-8") as fp:
fp.write(json.dumps(record) + "\n")
# ── main loop ────────────────────────────────────────────────────────
@@ -135,7 +135,7 @@ def discover_memory_providers() -> List[Tuple[str, str, bool]]:
if yaml_file.exists():
try:
import yaml
with open(yaml_file) as f:
with open(yaml_file, encoding="utf-8-sig") as f:
meta = yaml.safe_load(f) or {}
desc = meta.get("description", "")
except Exception:
@@ -381,7 +381,7 @@ def discover_plugin_cli_commands() -> List[dict]:
if yaml_file.exists():
try:
import yaml
with open(yaml_file) as f:
with open(yaml_file, encoding="utf-8-sig") as f:
meta = yaml.safe_load(f) or {}
desc = meta.get("description", "")
if desc:
@@ -1215,7 +1215,7 @@ class HindsightMemoryProvider(MemoryProvider):
# would capture output from other threads.
import hindsight_embed.daemon_embed_manager as dem
from rich.console import Console
dem.console = Console(file=open(log_path, "a"), force_terminal=False)
dem.console = Console(file=open(log_path, "a", encoding="utf-8"), force_terminal=False)
client = self._get_client()
profile = self._config.get("profile", "hermes")
@@ -1231,15 +1231,15 @@ class HindsightMemoryProvider(MemoryProvider):
if config_changed:
profile_env = _materialize_embedded_profile_env(self._config)
if client._manager.is_running(profile):
with open(log_path, "a") as f:
with open(log_path, "a", encoding="utf-8") as f:
f.write("\n=== Config changed, restarting daemon ===\n")
client._manager.stop(profile)
client._ensure_started()
with open(log_path, "a") as f:
with open(log_path, "a", encoding="utf-8") as f:
f.write("\n=== Daemon started successfully ===\n")
except Exception as e:
with open(log_path, "a") as f:
with open(log_path, "a", encoding="utf-8") as f:
f.write(f"\n=== Daemon startup failed: {e} ===\n")
traceback.print_exc(file=f)
@@ -101,7 +101,7 @@ def _load_plugin_config() -> dict:
return {}
try:
import yaml
with open(config_path) as f:
with open(config_path, encoding="utf-8-sig") as f:
all_config = yaml.safe_load(f) or {}
return cfg_get(all_config, "plugins", "hermes-memory-store", default={}) or {}
except Exception:
@@ -136,11 +136,11 @@ class HolographicMemoryProvider(MemoryProvider):
import yaml
existing = {}
if config_path.exists():
with open(config_path) as f:
with open(config_path, encoding="utf-8-sig") as f:
existing = yaml.safe_load(f) or {}
existing.setdefault("plugins", {})
existing["plugins"]["hermes-memory-store"] = values
with open(config_path, "w") as f:
with open(config_path, "w", encoding="utf-8") as f:
yaml.dump(existing, f, default_flow_style=False)
except Exception:
pass
@@ -42,6 +42,12 @@ dependencies = [
# Python resolves automatically. No-op on Linux/macOS (which have
# /usr/share/zoneinfo). Credits: PR #13182 (@sprmn24).
"tzdata>=2023.3; sys_platform == 'win32'",
# Cross-platform process / PID management. `psutil` is the canonical
# answer for "is this PID alive" and process-tree walking across Linux,
# macOS and Windows. It replaces POSIX-only idioms like `os.kill(pid, 0)`
# (which is a silent killer on Windows — see CONTRIBUTING.md) and
# `os.killpg` (which doesn't exist on Windows).
"psutil>=5.9.0,<8",
]
[project.optional-dependencies]
@@ -22,7 +22,14 @@ Usage:
# IMPORTANT: hermes_bootstrap must be the very first import — UTF-8 stdio
# on Windows. No-op on POSIX. See hermes_bootstrap.py for full rationale.
import hermes_bootstrap # noqa: F401
try:
import hermes_bootstrap # noqa: F401
except ModuleNotFoundError:
# Graceful fallback when hermes_bootstrap isn't registered in the venv
# yet — happens during partial ``hermes update`` where git-reset landed
# new code but ``uv pip install -e .`` didn't finish. Missing bootstrap
# means UTF-8 stdio setup is skipped on Windows; POSIX is unaffected.
pass
import asyncio
import base64
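`hermes_bootstrap.py` itself is not shown in this diff. To make concrete what is lost when the guarded import falls through, here is a hedged sketch of a UTF-8 stdio setup of the kind the comment describes. The function name, its `platform` parameter, and the `errors="replace"` choice are assumptions made for illustration only.

```python
import sys


def ensure_utf8_stdio(platform: str = sys.platform) -> list[str]:
    """Reconfigure stdout/stderr to UTF-8 on Windows; return the streams touched."""
    if platform != "win32":
        return []  # treated as a no-op on POSIX, mirroring the comment above
    touched = []
    for name in ("stdout", "stderr"):
        stream = getattr(sys, name, None)
        # Replaced or redirected streams may lack reconfigure(); skip those.
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:
            reconfigure(encoding="utf-8", errors="replace")
            touched.append(name)
    return touched
```

When the import is skipped, the streams keep the console's default encoding, which matches the degraded but non-crashing behavior the commit message describes.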
@@ -0,0 +1,624 @@
#!/usr/bin/env python3
"""
Grep-based checker for Windows cross-platform footguns.
Flags common patterns that break silently on Windows. Run it before opening a PR:
it is cheap, fast, and catches regressions in a codebase that runs on three OSes.
Usage:
# Scan staged changes (default when run from a git checkout)
python scripts/check-windows-footguns.py
# Scan the full tree (full-repo audit)
python scripts/check-windows-footguns.py --all
# Scan a specific file or directory
python scripts/check-windows-footguns.py path/to/file.py path/to/dir/
# Scan only modified files vs. main
python scripts/check-windows-footguns.py --diff main
Exit status:
0 no Windows footguns found (or all matches suppressed)
1 at least one unsuppressed match
Suppress an intentional use (e.g. tests or platform-gated code) with:
os.kill(pid, 0) # windows-footgun: ok — only called on POSIX
"""
from __future__ import annotations
import argparse
import os
import re
import subprocess
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable
REPO_ROOT = Path(__file__).resolve().parent.parent
SUPPRESS_MARKER = re.compile(r"#\s*windows-footgun\s*:\s*ok\b", re.IGNORECASE)
# Line-level guard hints. If a line contains any of these tokens, we assume
# the programmer wrote the line in full awareness of the Windows pitfall —
# e.g. `if hasattr(os, 'setsid'): ... os.setsid()`, or the classic
# `getattr(signal, 'SIGKILL', signal.SIGTERM)`, or `shutil.which("wmic")`.
# False negatives are fine here — the inline `# windows-footgun: ok` marker
# is still the authoritative suppression. This is just to reduce the noise
# floor on obviously-guarded lines so the signal-to-noise stays useful.
GUARD_HINTS = (
"hasattr(os,",
"hasattr(signal,",
"getattr(os,",
"getattr(signal,",
"shutil.which(",
"if platform.system() != \"Windows\"",
"if platform.system() != 'Windows'",
"if sys.platform == \"win32\"",
"if sys.platform != \"win32\"",
"if sys.platform == 'win32'",
"if sys.platform != 'win32'",
"IS_WINDOWS",
"is_windows",
)
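The part of `scan_file` that consults these two suppression mechanisms is cut off in this view, so the following is only a sketch of how they could combine. `is_suppressed` is a hypothetical helper and `GUARD_HINTS` is abbreviated here.

```python
import re

SUPPRESS_MARKER = re.compile(r"#\s*windows-footgun\s*:\s*ok\b", re.IGNORECASE)
GUARD_HINTS = ("hasattr(os,", "getattr(signal,", "shutil.which(", "IS_WINDOWS")  # abbreviated


def is_suppressed(line: str) -> bool:
    """True if the line carries the inline marker or an obvious guard token."""
    if SUPPRESS_MARKER.search(line):
        return True
    return any(hint in line for hint in GUARD_HINTS)


checks = {
    "os.kill(pid, 0)  # windows-footgun: ok": is_suppressed("os.kill(pid, 0)  # windows-footgun: ok"),
    "marker, loose spacing": is_suppressed("os.kill(pid, 0)  # Windows-Footgun : OK, POSIX only"),
    "guard hint": is_suppressed('sig = getattr(signal, "SIGKILL", signal.SIGTERM)'),
    "unguarded": is_suppressed("os.kill(pid, 0)"),
    "okay is not ok": is_suppressed("os.kill(pid, 0)  # windows-footgun: okay"),
}
```

The trailing `\b` in the marker means `ok` must be a whole word, so a comment ending in `okay` does not suppress anything.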
# Dirs we never scan.
EXCLUDED_DIRS = {
".git",
"node_modules",
"venv",
".venv",
"__pycache__",
"build",
"dist",
".tox",
".mypy_cache",
".pytest_cache",
"site-packages",
"website/build",
"optional-skills", # external skills
}
# File suffixes we never scan (beyond the dirs above).
EXCLUDED_SUFFIXES = {
".pyc",
".pyo",
".so",
".dll",
".exe",
".png",
".jpg",
".gif",
".ico",
".svg",
".mp4",
".mp3",
".wav",
".pdf",
".zip",
".tar",
".gz",
".whl",
".lock",
".min.js",
".min.css",
}
# Files we never scan (self-referential — this script mentions the
# patterns it detects — and the CONTRIBUTING docs that list them).
EXCLUDED_FILES = {
"scripts/check-windows-footguns.py",
"CONTRIBUTING.md",
}
@dataclass
class Footgun:
"""A Windows cross-platform footgun pattern."""
name: str
pattern: re.Pattern
message: str
fix: str
# If set, matches in files/paths containing any of these substrings are
# silently ignored (e.g. tests that legitimately exercise the footgun
# behind a platform guard). Prefer `# windows-footgun: ok` inline
# suppression over this list; only use path_allowlist for whole files
# that are inherently tests of the footgun itself.
path_allowlist: tuple[str, ...] = ()
# Optional post-match predicate. Takes the re.Match and the full source
# line, and returns True if the match is a REAL footgun (not a false
# positive). Use this when the regex can't fully distinguish (e.g. open()
# where mode may contain "b" for binary, or the line may have `encoding=`
# elsewhere).
post_filter: "callable | None" = None
FOOTGUNS: list[Footgun] = [
Footgun(
name="open() without encoding= on text mode",
# Match builtins.open() specifically — NOT os.open(), .open()
# method calls (Path.open, tarfile.open, zf.open, webbrowser.open,
# Image.open, wave.open, etc), or `async def open()` method
# definitions. The pattern requires a start-of-identifier boundary
# before `open(` so `os.open`, `.open`, `def open` are all skipped.
# Note: Path.open() is ALSO affected by the encoding default, but
# rather than flagging all `.open(` (huge noise), we require an
# explicit builtins-style open() call. Path.open() is rare in the
# codebase compared to open() and can be audited separately.
pattern=re.compile(
r"""(?:^|[\s\(,;=])(?<![.\w])open\s*\(\s*[^,)]+\s*(?:,\s*['"](?P<mode>[^'"]*)['"])?"""
),
message=(
"open() without an explicit encoding= uses the platform default "
"(UTF-8 on POSIX, cp1252/mbcs on Windows) — files round-tripped "
"between hosts get mojibake. Always pass encoding='utf-8' for "
"text files, or use open(path, 'rb')/'wb' for binary."
),
fix=(
"open(path, 'r', encoding='utf-8') # or 'utf-8-sig' if the "
"file may have a BOM"
),
# Filter: only flag if mode is missing-or-text AND the line doesn't
# already pass encoding=. Skip binary mode (contains "b").
post_filter=lambda m, line: (
"b" not in (m.group("mode") or "")
and "encoding=" not in line
and "encoding =" not in line
# Skip `def open(` and `async def open(` (method definitions)
and not line.lstrip().startswith("def ")
and not line.lstrip().startswith("async def ")
# Skip open(path, **kwargs) patterns — encoding may be in the dict.
# Too expensive to trace; require the author to set encoding in
# the dict and trust them (or they can add a # windows-footgun: ok).
and "**" not in line
),
),
Footgun(
name="os.kill(pid, 0)",
pattern=re.compile(r"\bos\.kill\s*\(\s*[^,]+,\s*0\s*\)"),
message=(
"os.kill(pid, 0) is NOT a no-op on Windows — it sends "
"CTRL_C_EVENT to the target's console process group, "
"hard-killing the target and potentially unrelated siblings. "
"See bpo-14484."
),
fix=(
"Use psutil.pid_exists(pid) (psutil is a core dependency). "
"Or gateway.status._pid_exists(pid) for the hermes wrapper "
"with a stdlib fallback."
),
),
Footgun(
name="bare os.setsid",
pattern=re.compile(r"(?<!hasattr\()\bos\.setsid\b"),
message=(
"os.setsid does not exist on Windows and raises "
"AttributeError. Subprocesses that need detachment on "
"Windows use creationflags instead."
),
fix=(
"if platform.system() != 'Windows':\n"
" kwargs['preexec_fn'] = os.setsid\n"
"else:\n"
" kwargs['creationflags'] = subprocess.CREATE_NEW_PROCESS_GROUP"
),
),
Footgun(
name="bare os.killpg",
pattern=re.compile(r"\bos\.killpg\b"),
message="os.killpg does not exist on Windows.",
fix=(
"Use psutil for cross-platform process-tree kill:\n"
" p = psutil.Process(pid)\n"
" for c in p.children(recursive=True): c.kill()\n"
" p.kill()"
),
),
Footgun(
name="bare os.getuid / os.geteuid / os.getgid / os.getegid",
pattern=re.compile(r"\bos\.(?:getuid|geteuid|getgid|getegid)\b"),
message=(
"os.getuid / os.geteuid / os.getgid / os.getegid do not exist on "
"Windows; referencing them raises AttributeError (at import time "
"when the reference is at module level)."
),
fix=(
"Use getpass.getuser() for the username, or gate with "
"hasattr(os, 'getuid')."
),
),
Footgun(
name="bare os.fork",
pattern=re.compile(r"(?<!hasattr\()\bos\.fork\s*\("),
message="os.fork does not exist on Windows.",
fix=(
"Use subprocess.Popen for daemonization, or guard with "
"hasattr(os, 'fork') and a Windows fallback path."
),
),
Footgun(
name="bare signal.SIGKILL",
pattern=re.compile(r"\bsignal\.SIGKILL\b"),
message=(
"signal.SIGKILL does not exist on Windows; referencing it raises "
"AttributeError (at import time when the reference is at module level)."
),
fix="Use getattr(signal, 'SIGKILL', signal.SIGTERM).",
),
Footgun(
name="bare signal.SIGHUP / SIGUSR1 / SIGUSR2 / SIGALRM / SIGCHLD / SIGPIPE / SIGQUIT",
pattern=re.compile(
r"\bsignal\.(?:SIGHUP|SIGUSR1|SIGUSR2|SIGALRM|SIGCHLD|SIGPIPE|SIGQUIT)\b"
),
message=(
"These POSIX signals don't exist on Windows; referencing them "
"raises AttributeError (at import time when the reference is at "
"module level)."
),
fix=(
"Use getattr(signal, 'SIGXXX', None) and check for None "
"before using, or gate the whole block behind a platform check."
),
),
Footgun(
name="subprocess shebang script invocation",
pattern=re.compile(
r"subprocess\.(?:run|Popen|call|check_output|check_call)\s*\(\s*\[\s*['\"]\./"
),
message=(
"Running a script via './scriptname' doesn't work on Windows — "
"shebang lines aren't honored. CreateProcessW can't execute "
"bash/python scripts without an explicit interpreter."
),
fix="Use [sys.executable, 'scriptname.py', ...] explicitly.",
),
Footgun(
name="wmic invocation without shutil.which guard",
# Match wmic appearing as a subprocess argument — NOT the
# shutil.which("wmic") guard pattern itself. Looks for wmic in a
# list or as first arg of subprocess.run/Popen.
pattern=re.compile(
r"""(?:subprocess\.\w+\s*\(\s*\[\s*['"]wmic['"]|['"]wmic\.exe['"])"""
),
message=(
"wmic has been deprecated since Windows 10 21H1 and is missing "
"from newer Windows installs by default. Always gate with "
"shutil.which('wmic') and fall back to PowerShell "
"(Get-CimInstance Win32_Process)."
),
fix=(
"if shutil.which('wmic'):\n"
" ... wmic path ...\n"
"else:\n"
" subprocess.run(['powershell', '-NoProfile', '-Command',\n"
" 'Get-CimInstance Win32_Process | ...'])"
),
),
Footgun(
name="hardcoded ~/Desktop (OneDrive trap)",
pattern=re.compile(
r"""['"](?:~|~/|[A-Z]:[/\\]Users[/\\][^/\\'"]+[/\\])Desktop\b"""
),
message=(
"When OneDrive Backup is enabled on Windows, the real Desktop "
"is at %USERPROFILE%\\OneDrive\\Desktop, not %USERPROFILE%\\"
"Desktop (which exists as an empty husk)."
),
fix=(
"On Windows, resolve via ctypes + SHGetKnownFolderPath, or "
"read the Shell Folders registry key, or run PowerShell "
"[Environment]::GetFolderPath('Desktop')."
),
),
Footgun(
name="asyncio add_signal_handler without try/except",
pattern=re.compile(r"\.add_signal_handler\s*\("),
message=(
"loop.add_signal_handler raises NotImplementedError on "
"Windows — always wrap in try/except or gate with a "
"platform check."
),
fix=(
"try:\n"
" loop.add_signal_handler(sig, handler, sig)\n"
"except NotImplementedError:\n"
" pass # Windows asyncio doesn't support signal handlers"
),
),
]
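To see how the first entry's regex and `post_filter` interact, the sketch below copies that pattern and folds the filter conditions into one function. `flags_open` and the sample lines are illustrative and not part of the script.

```python
import re

# Pattern copied from the "open() without encoding=" entry above.
OPEN_CALL = re.compile(
    r"""(?:^|[\s\(,;=])(?<![.\w])open\s*\(\s*[^,)]+\s*(?:,\s*['"](?P<mode>[^'"]*)['"])?"""
)


def flags_open(line: str) -> bool:
    """Apply the regex, then the same conditions as the entry's post_filter."""
    m = OPEN_CALL.search(line)
    if m is None:
        return False
    return (
        "b" not in (m.group("mode") or "")
        and "encoding=" not in line
        and "encoding =" not in line
        and not line.lstrip().startswith(("def ", "async def "))
        and "**" not in line
    )


samples = [
    "with open(path) as f:",
    'open(path, "w")',
    'with open(path, "rb") as f:',
    'with open(path, encoding="utf-8") as f:',
    "fd = os.open(path, os.O_RDONLY)",
    "img = Image.open(path)",
    "def open(self, path):",
]
results = {line: flags_open(line) for line in samples}
for line, flagged in results.items():
    print(f"{'FLAG' if flagged else 'ok  '}  {line}")
```

Only the first two samples are flagged. Binary mode, an explicit `encoding=`, attribute-style `.open(` calls, and method definitions all pass.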
def should_scan_file(path: Path) -> bool:
"""Return True if this file is in scope for the checker."""
# Skip the excluded dirs
parts = set(path.parts)
if parts & EXCLUDED_DIRS:
return False
# Skip excluded suffixes
for suffix in EXCLUDED_SUFFIXES:
if str(path).endswith(suffix):
return False
# Skip self and docs that intentionally mention the patterns
rel = path.relative_to(REPO_ROOT).as_posix()
if rel in EXCLUDED_FILES:
return False
# Only scan Python sources: every pattern above is Python-specific.
if path.suffix in {".py", ".pyw", ".pyi"}:
return True
# Everything else (.md, .sh, .ps1, .yaml, ...) is skipped without being read.
return False
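The directory exclusion works by intersecting the path's components with `EXCLUDED_DIRS`. A short sketch of that check, using an abbreviated set and a hypothetical helper name, shows one consequence: an entry with a slash in it can never equal a single component.

```python
from pathlib import PurePosixPath

EXCLUDED_DIRS = {".git", "node_modules", "website/build"}  # abbreviated


def in_excluded_dir(path: PurePosixPath) -> bool:
    """True if any single path component is an excluded directory name."""
    return bool(set(path.parts) & EXCLUDED_DIRS)


hit = in_excluded_dir(PurePosixPath("ui/node_modules/pkg/setup.py"))
miss = in_excluded_dir(PurePosixPath("gateway/status.py"))
# "website/build" is two components, so the multi-segment entry never matches.
multi = in_excluded_dir(PurePosixPath("website/build/gen.py"))
```

In the full script the separate `"build"` entry still excludes that directory, so the `"website/build"` entry appears to be redundant rather than harmful.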
def iter_files(paths: Iterable[Path]) -> Iterable[Path]:
for p in paths:
if p.is_file():
if should_scan_file(p):
yield p
elif p.is_dir():
for root, dirs, files in os.walk(p):
# prune excluded dirs in-place for speed
dirs[:] = [d for d in dirs if d not in EXCLUDED_DIRS]
for fname in files:
fpath = Path(root) / fname
if should_scan_file(fpath):
yield fpath
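The in-place `dirs[:] = ...` assignment is what stops `os.walk` from descending into excluded directories; rebinding `dirs` to a new list would not. A self-contained sketch on a temporary tree, with a hypothetical helper name and an abbreviated exclusion set:

```python
import os
import tempfile
from pathlib import Path

EXCLUDED_DIRS = {"node_modules", "__pycache__"}  # abbreviated


def walk_python_files(root: Path) -> list[str]:
    """List .py files under root, pruning excluded directories during the walk."""
    found = []
    for dirpath, dirs, files in os.walk(root):
        # Slice assignment mutates the list os.walk is iterating from.
        dirs[:] = [d for d in dirs if d not in EXCLUDED_DIRS]
        for fname in files:
            if fname.endswith(".py"):
                found.append((Path(dirpath) / fname).relative_to(root).as_posix())
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    (root / "node_modules" / "dep").mkdir(parents=True)
    (root / "pkg" / "a.py").write_text("x = 1\n", encoding="utf-8")
    (root / "node_modules" / "dep" / "b.py").write_text("y = 2\n", encoding="utf-8")
    result = walk_python_files(root)

print(result)
```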
def _strip_code(line: str) -> str:
"""Return just the code portion of a line: drop a trailing comment, and
return an empty string for lines that are entirely a comment.
Heuristic only (we don't parse Python); good enough to avoid flagging
our own `# ``os.kill(pid, 0)`` is NOT a no-op` docstring-style comments.
"""
stripped = line.lstrip()
# Line starts with # — entirely a comment.
if stripped.startswith("#"):
return ""
# Remove trailing "# ..." inline comment. Naive — doesn't handle `#`
# inside strings — but on balance reduces noise far more than it adds.
hash_idx = _find_unquoted_hash(line)
if hash_idx is not None:
return line[:hash_idx]
return line
def _find_unquoted_hash(line: str) -> int | None:
"""Index of the first `#` not inside a single/double/triple-quoted string.
Simple state machine good enough for the 99% case of "code, then
optional trailing comment."
"""
i = 0
n = len(line)
in_s = False # single-quote string
in_d = False # double-quote string
while i < n:
c = line[i]
if c == "\\" and (in_s or in_d) and i + 1 < n:
i += 2
continue
if not in_d and c == "'":
in_s = not in_s
elif not in_s and c == '"':
in_d = not in_d
elif c == "#" and not in_s and not in_d:
return i
i += 1
return None


def scan_file(path: Path, footguns: list[Footgun]) -> list[tuple[int, str, Footgun]]:
    """Return a list of (line_number, line, footgun) for unsuppressed matches."""
    try:
        text = path.read_text(encoding="utf-8", errors="replace")
    except OSError:
        return []
    matches: list[tuple[int, str, Footgun]] = []
    # Track whether we're inside a triple-quoted string (docstring/raw block).
    # Simple state machine: handles both ''' and """, toggled by the FIRST
    # triple-quote we see; we don't try to handle nested or f-string cases.
    in_triple: str | None = None  # None, "'''", or '"""'
    for i, line in enumerate(text.splitlines(), start=1):
        # Update triple-quote state based on this line's occurrences.
        code_for_scan = line
        if in_triple:
            # We're inside a docstring, so skip the whole line's scan
            # unless the block closes here.
            if in_triple in line:
                # Find the closing delimiter; anything after it is real code.
                after = line.split(in_triple, 1)[1]
                in_triple = None
                code_for_scan = after
            else:
                continue
        # Now check for docstring-open in the (possibly after-triple) portion.
        # Scan for the first '''/""" in the current code_for_scan.
        for delim in ('"""', "'''"):
            if delim in code_for_scan:
                # Count occurrences: even count means single-line docstring,
                # odd means we've entered a multi-line one.
                count = code_for_scan.count(delim)
                if count % 2 == 1:
                    # Odd: we're now inside the triple-quoted block.
                    # Scan only the part BEFORE the opening delimiter.
                    before = code_for_scan.split(delim, 1)[0]
                    code_for_scan = before
                    in_triple = delim
                    break
                else:
                    # Even: the entire docstring fits on one line. Strip it
                    # from the scan text to avoid matching on prose.
                    parts = code_for_scan.split(delim)
                    # Keep the "outside" parts (every other chunk, starting
                    # with index 0) as code, drop the "inside" parts.
                    code_for_scan = "".join(parts[::2])
                    break
        if SUPPRESS_MARKER.search(line):
            continue
        # Skip if the line has an obvious guard, e.g. hasattr/getattr/
        # shutil.which or a platform check. False negatives are acceptable;
        # the inline suppression marker is the authoritative override.
        if any(hint in line for hint in GUARD_HINTS):
            continue
        code = _strip_code(code_for_scan)
        if not code.strip():
            continue
        for fg in footguns:
            if fg.path_allowlist and any(s in str(path) for s in fg.path_allowlist):
                continue
            match = fg.pattern.search(code)
            if not match:
                continue
            if fg.post_filter is not None:
                try:
                    if not fg.post_filter(match, line):
                        continue
                except (IndexError, AttributeError):
                    # Post-filter assumed a named group that isn't there; skip.
                    continue
            matches.append((i, line.rstrip(), fg))
    return matches


def get_staged_files() -> list[Path]:
    """Return paths staged in the current git index. Empty on non-git trees."""
    try:
        out = subprocess.check_output(
            ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR"],
            cwd=REPO_ROOT,
            stderr=subprocess.DEVNULL,
            text=True,
        )
    except (subprocess.CalledProcessError, FileNotFoundError):
        return []
    return [REPO_ROOT / f for f in out.splitlines() if f.strip()]


def get_diff_files(ref: str) -> list[Path]:
    """Return paths modified vs. the given git ref."""
    try:
        out = subprocess.check_output(
            ["git", "diff", f"{ref}...HEAD", "--name-only", "--diff-filter=ACMR"],
            cwd=REPO_ROOT,
            stderr=subprocess.DEVNULL,
            text=True,
        )
    except (subprocess.CalledProcessError, FileNotFoundError):
        return []
    return [REPO_ROOT / f for f in out.splitlines() if f.strip()]


def parse_args(argv: list[str]) -> argparse.Namespace:
    p = argparse.ArgumentParser(
        description="Flag Windows cross-platform footguns in Python code."
    )
    p.add_argument(
        "paths",
        nargs="*",
        type=Path,
        help="Specific files/dirs to scan (default: staged changes).",
    )
    p.add_argument(
        "--all",
        action="store_true",
        help="Scan the full repository (hermes_cli/, gateway/, tools/, cron/, etc.).",
    )
    p.add_argument(
        "--diff",
        metavar="REF",
        help="Scan files changed vs. the given git ref (e.g. --diff main).",
    )
    p.add_argument(
        "--list",
        action="store_true",
        help="List all known footgun rules and exit.",
    )
    return p.parse_args(argv)


def print_rules() -> None:
    print("Known Windows footguns checked by this script:\n")
    for i, fg in enumerate(FOOTGUNS, start=1):
        print(f"{i:2}. {fg.name}")
        print(f" {fg.message}")
        print(f" Fix: {fg.fix}")
        print()


def main(argv: list[str]) -> int:
    args = parse_args(argv)
    if args.list:
        print_rules()
        return 0
    if args.all:
        # Scan main Python packages + scripts
        roots = [
            REPO_ROOT / "hermes_cli",
            REPO_ROOT / "gateway",
            REPO_ROOT / "tools",
            REPO_ROOT / "cron",
            REPO_ROOT / "agent",
            REPO_ROOT / "plugins",
            REPO_ROOT / "scripts",
            REPO_ROOT / "acp_adapter",
            REPO_ROOT / "acp_registry",
        ]
        roots = [r for r in roots if r.exists()]
    elif args.diff:
        roots = get_diff_files(args.diff)
    elif args.paths:
        roots = [p.resolve() for p in args.paths]
    else:
        # Default: staged changes
        roots = get_staged_files()
        if not roots:
            print(
                "No staged files to scan. Pass --all for a full-repo scan, "
                "--diff <ref> for a range diff, or paths explicitly.",
                file=sys.stderr,
            )
            return 0
    total_matches = 0
    files_scanned = 0
    for path in iter_files(roots):
        files_scanned += 1
        matches = scan_file(path, FOOTGUNS)
        for lineno, line, fg in matches:
            rel = path.relative_to(REPO_ROOT).as_posix()
            print(f"{rel}:{lineno}: [{fg.name}]")
            print(f" {line.strip()}")
            print(f"{fg.message}")
            print(f" Fix: {fg.fix.splitlines()[0]}")
            print()
            total_matches += 1
    if total_matches:
        print(
            f"\n{total_matches} Windows footgun(s) found across "
            f"{files_scanned} file(s) scanned.",
            file=sys.stderr,
        )
        print(
            " If an individual match is a false positive or intentionally "
            "platform-gated, suppress it with `# windows-footgun: ok` on "
            "the same line.\n Run with --list to see all rules.",
            file=sys.stderr,
        )
        return 1
    print(
        f"✓ No Windows footguns found ({files_scanned} file(s) scanned)."
    )
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
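
Reviewer note: the per-line behaviour of the checker above (suppression marker, comment stripping, then the rule regex) can be tried in isolation with a sketch like the one below. It is a simplified stand-in, not the script itself. `SUPPRESS_MARKER` here is an assumed regex built from the `# windows-footgun: ok` text in the help message (the real definition is not shown in this part of the diff), `is_flagged` is a hypothetical helper, and the `GUARD_HINTS` and triple-quote handling are left out.

```python
import re
from typing import Optional

# Assumption: stand-in for the script's SUPPRESS_MARKER, which is defined
# outside the lines shown in this diff.
SUPPRESS_MARKER = re.compile(r"#\s*windows-footgun:\s*ok")

# Same pattern as the add_signal_handler rule in the FOOTGUNS list.
SIGNAL_HANDLER = re.compile(r"\.add_signal_handler\s*\(")


def find_unquoted_hash(line: str) -> Optional[int]:
    """Index of the first `#` outside single- or double-quoted strings."""
    in_s = in_d = False
    i = 0
    while i < len(line):
        c = line[i]
        if c == "\\" and (in_s or in_d) and i + 1 < len(line):
            i += 2  # skip the escaped character inside a string
            continue
        if not in_d and c == "'":
            in_s = not in_s
        elif not in_s and c == '"':
            in_d = not in_d
        elif c == "#" and not in_s and not in_d:
            return i
        i += 1
    return None


def is_flagged(line: str) -> bool:
    """Hypothetical helper: True for an unsuppressed match in the code part."""
    if SUPPRESS_MARKER.search(line):
        return False
    idx = find_unquoted_hash(line)
    code = line if idx is None else line[:idx]
    return bool(SIGNAL_HANDLER.search(code))


samples = [
    "loop.add_signal_handler(sig, stop)",
    "loop.add_signal_handler(sig, stop)  # windows-footgun: ok",
    "# loop.add_signal_handler(sig, stop)",
    'print("use .add_signal_handler( carefully")',
]
for s in samples:
    print(is_flagged(s), s)
```

The last sample is flagged even though the call only appears inside an ordinary string literal: only comments and triple-quoted blocks are stripped before matching, which is where the inline suppression marker is meant to be used.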
