Compare commits

..

14 Commits

Author SHA1 Message Date
kshitijk4poor 2af14bd401 fix: self-review findings — logging, create_task, get_running_loop, benchmark path
Self-review findings addressed:

- browser_tool.py: log swallowed supervisor error at DEBUG instead of bare
  'pass' (was silent, triggered F841 for unused 'exc' variable). Renamed
  to '_exc' to signal intentional discard.
- browser_tool.py: rename unused 'press_id' to '_press_id' in both normal
  and retry paths (mouseReleased-only wait is intentional; press_id is never
  used after send).
- browser_tool.py: get_event_loop() → get_running_loop() in 3 locations
  inside _cdp_resolve_session and _cdp_coordinate_click_async. Both are
  async functions and get_event_loop() is deprecated in async context in
  Python 3.10+.
- browser_supervisor.py: ensure_future → create_task in dispatch_mouse_click.
  create_task is the correct modern API when already inside a running
  coroutine; ensure_future is deprecated for coroutines in Python 3.10+.
  Also consistent with the rest of browser_supervisor.py which uses
  create_task exclusively everywhere else.
- scripts/benchmark_click_paths.py: replace hardcoded /private/tmp/hermes-
  coord-click sys.path hack with __file__-relative repo root detection so
  the script works from any checkout location.

27/27 tests pass.
2026-05-07 10:49:06 +05:30
kshitijk4poor aef97da6d4 perf: reuse supervisor's persistent WS for coordinate clicks (23x speedup)
The CDPSupervisor (browser_supervisor.py) already maintains a persistent
WebSocket connection per task_id for dialog detection and frame tracking.
After browser_navigate(), a supervisor is always running with an open WS.
Instead of opening a new connection per click, dispatch directly on it.

Changes:
- browser_supervisor.py: add CDPSupervisor.dispatch_mouse_click() — sync
  bridge onto the supervisor's asyncio loop via run_coroutine_threadsafe.
  Pipelines mousePressed + mouseReleased via asyncio.gather (Playwright
  Promise.all pattern), no serial round-trips.
- browser_tool.py: _cdp_coordinate_click() now checks
  SUPERVISOR_REGISTRY.get(task_id) first; falls back to per-click WS
  connect if no supervisor is running (e.g. raw CDP without navigate).

Dispatch priority (fastest first):
  1. Supervisor path  — zero WS connection cost (supervisor WS already open)
  2. Warm-cache path  — 1 WS open + 2 mouse events (session cached)
  3. Cold-cache path  — 1 WS open + getTargets + attachToTarget + 2 events
  4. agent-browser    — 3 subprocess IPC calls (no CDP endpoint configured)

Benchmark vs real Lightpanda WS at ws://127.0.0.1:63372/ (300 iterations):
  Baseline   (3 connections):          4.86ms mean
  Warm cache (1 conn + cache):         1.30ms mean   (3.74x)
  Supervisor (persistent WS):          0.20ms mean  (23.75x)
  Ref-click IPC baseline:              0.14ms mean  (parity)

The supervisor path is 1.5x ref-click (0.07ms overhead) — essentially
the cost of one cross-thread future dispatch.

27/27 tests pass (+3 new TestSupervisorPath tests).
2026-05-07 10:28:48 +05:30
kshitijk4poor 451c55bd9c perf: session ID caching + skip mousePressed ack (browser-harness/Playwright patterns)
Two additional optimizations from researching Playwright, Puppeteer, and
browser-harness source:

SESSION ID CACHING (browser-harness daemon pattern)
  Target.getTargets + Target.attachToTarget are stable across clicks on the
  same page. Cache the resolved session_id keyed by CDP endpoint URL.
  Subsequent clicks skip straight to mousePressed+mouseReleased with no
  session negotiation overhead.

  Self-healing: on 'Session with given id not found' (stale after navigation),
  the cache is invalidated and session resolution runs once before retrying.
  This matches the exact retry pattern from browser-harness's daemon.handle().

SKIP mousePressed ACK (Playwright Promise.all pattern)
  Browser processes CDP messages sequentially within a session. If
  mouseReleased is acknowledged, mousePressed was already processed.
  We skip waiting for the press ack entirely, saving one RTT. This is
  the same pattern as Playwright's Mouse.click() using Promise.all and
  Puppeteer's concurrent down+up dispatch.

COMPRESSION=NONE (Puppeteer NodeWebSocketTransport pattern)
  Small CDP messages (Input.dispatchMouseEvent payloads are ~80 bytes)
  don't benefit from per-message compression. Disable it explicitly.
  Puppeteer uses perMessageDeflate: false for the same reason.

Benchmark vs real Lightpanda WS (300 iterations):
  Baseline (3 connections):       3.28ms mean
  Optimized cold cache (1 conn):  1.17ms mean  (2.79x speedup)
  Optimized warm cache (1 conn):  1.17ms mean  (2.82x speedup)

The cold/warm delta is <0.01ms because getTargets+attachToTarget on an
already-open socket costs almost nothing on localhost — the dominant cost
is WS connection setup, which we eliminated in the previous commit.
The session cache still removes real work (2 CDP round-trips) and prevents
accumulating latency on remote/higher-latency CDP endpoints.

Tests: 24 passed (21 existing + 3 new session caching tests)
2026-05-07 10:17:29 +05:30
kshitijk4poor 0bfab1d361 perf: batch CDP click into single WS connection (2.4x speedup)
Replace the 3-separate-_cdp_call() approach (one WS connection per
message) with a single _cdp_coordinate_click_async() coroutine that
opens the WebSocket once and sequences all CDP messages on it:

  1. Target.getTargets
  2. Target.attachToTarget (if page target found)
  3. Input.dispatchMouseEvent (mousePressed)  } pipelined — both sent
  4. Input.dispatchMouseEvent (mouseReleased) } before awaiting either

Benchmark vs real Lightpanda WS at ws://127.0.0.1:63372/ (300 iters):

  Baseline  (current main, 3 connections): 3.14ms mean, 2.97ms median
  Optimized (this commit, 1 connection):   1.30ms mean, 1.11ms median
  Speedup: 2.42x mean, 2.68x median, 1.62x p95

The savings come entirely from eliminating 2 TCP+WS handshakes.
mousePressed + mouseReleased are pipelined on the same connection,
so they travel in the same network burst.

21/21 tests pass.
2026-05-07 10:04:38 +05:30
kshitijk4poor ff8c6f2d64 feat: add compositor-level coordinate click to browser_click
Add optional x/y parameters to browser_click for viewport-coordinate
clicking via CDP Input.dispatchMouseEvent. When coordinates are provided,
clicks are dispatched at the browser compositor level — Chrome does its own
hit-testing, bypassing DOM selectors entirely.

Use cases where ref-based click fails but coordinate click works:
- Cross-origin iframes (OOPIFs)
- Closed shadow DOM
- Canvas/WebGL elements
- Dynamic overlays where the snapshot may be stale

Implementation:
- CDP path (preferred): Input.dispatchMouseEvent via WebSocket
  (Target.getTargets + mousePressed + mouseReleased)
- agent-browser fallback: mouse move/down/up when no CDP endpoint available
- ref is no longer required — either ref OR x+y must be provided

Benchmark (real Lightpanda WS at ws://127.0.0.1:63372, 200 iterations):
  CDP coord click:         3.71ms mean (2.97ms median, 2.61ms min, 7.01ms p95)
  Single WS conn baseline: 1.57ms mean (cost per connection open+call)
  agent-browser IPC:       0.20ms mean per HTTP call

The 3.71ms per CDP click comes from 3 sequential fresh WS connections
(pre-existing architecture in browser_cdp_tool.py). A persistent WS
connection pool would bring this to ~3.1ms (just the 2 mouse events).
Both paths are well under the 100ms human perception threshold.

Files:
- tools/browser_tool.py: schema update (x/y, ref no longer required),
  _cdp_coordinate_click(), _coordinate_click_via_agent_browser(),
  updated browser_click() with validation and dispatch
- tests/tools/test_browser_coordinate_click.py: 21 tests covering
  validation, CDP path, fallback path, ref preservation, schema, registry
- scripts/benchmark_click_paths.py: real-browser latency benchmark
2026-05-07 09:57:44 +05:30
Teknium 49c3c2e0d3 docs(kanban): fix worker skill setup instructions too (#20960)
Follow-up to #20958. The worker skill section had the same stale
'hermes skills install devops/kanban-worker' command — kanban-worker
is also bundled, so that command fails with 'Could not fetch from any
source.'

Replace with bundled-skill verification + restore pattern, matching
the orchestrator section. Uses <your-worker-profile> placeholder since
assignees vary (researcher, writer, ops, linguist, reviewer, etc.)
rather than a single fixed 'worker' profile.
2026-05-06 18:40:30 -07:00
Gille 45cbf93899 docs(kanban): fix orchestrator skill setup instructions (#20958) 2026-05-06 18:14:30 -07:00
Teknium 5a3cadf6eb fix(discord): narrow rate-limit catch and move sync state under gateway/
Two follow-ups on top of helix4u's slash-command sync hardening:

- Only suppress exceptions that are actually Discord 429 rate limits
  (discord.RateLimited, HTTPException with status 429, or a clearly
  rate-limit-named duck type). Arbitrary failures that happen to expose
  a retry_after attribute now re-raise to the outer handler instead of
  silently swallowing a cooldown.
- Move the sync-state JSON under $HERMES_HOME/gateway/ so the home root
  stops collecting ad-hoc runtime files.

Added a test verifying unrelated exceptions don't get misclassified as
rate limits.
2026-05-06 18:12:35 -07:00
helix4u d797755a1c fix(gateway): wait for systemd restart readiness 2026-05-06 18:12:35 -07:00
Teknium 3cdbf334d5 fix(gateway): don't dead-end setup wizard when only system-scope unit is installed
The setup wizard dropped non-root users at a bare shell prompt when
trying to start a system-scope gateway service. Previously
_require_root_for_system_service called sys.exit(1), which the
wizard's `except Exception` guards cannot catch (SystemExit is a
BaseException). Users with a pre-existing /etc/systemd/system unit
(e.g. from an earlier `sudo hermes setup` run) hit this whenever
they re-ran `hermes setup` as a regular user.

- Convert _require_root_for_system_service to raise a typed
  SystemScopeRequiresRootError (RuntimeError subclass) instead of
  sys.exit(1). The direct CLI path (`hermes gateway install|start|stop|
  restart|uninstall` without sudo) still exits 1 cleanly via a new
  catch at the top of gateway_command, matching the existing
  UserSystemdUnavailableError pattern.
- Add _system_scope_wizard_would_need_root() pre-check and
  _print_system_scope_remediation() helper. Both setup wizards
  (hermes_cli/setup.py and hermes_cli/gateway.py::gateway_setup) now
  detect the dead-end before prompting and print actionable guidance:
  either `sudo systemctl start <service>` this time, or uninstall the
  system unit and install a per-user one.
- Defense-in-depth: all 5 wizard prompt sites also catch
  SystemScopeRequiresRootError and fall back to the remediation
  helper if the pre-check is bypassed (race, etc.).

Tests: 12 new tests in TestSystemScopeRequiresRootError,
TestSystemScopeWizardPreCheck, TestSystemScopeRemediationOutput, and
TestGatewayCommandCatchesSystemScopeError covering the exception
contract, pre-check matrix (root vs non-root, system-only vs
user-present vs none vs explicit system=True), remediation output
for each action, and the direct-CLI exit-1 path.
2026-05-06 15:58:02 -07:00
brooklyn! 04cf4788cc fix(tui): restore voice push-to-talk parity (#20897)
* fix(tui): restore classic CLI voice push-to-talk parity

(cherry picked from commit 93b9ae301b)

* fix(tui): harden voice push-to-talk stop flow

Address review feedback from PR #16189 by stopping the active recorder before background transcription, documenting single-shot voice capture, and covering the TUI gateway flags with regression tests.

* fix(tui): preserve silent voice strike tracking

Keep single-shot voice recording's no-speech counter alive across starts so the TUI can still emit the three-strikes auto-disable event, and bind the auto-restart state at module scope for type checking.

* fix(tui): clean up voice stop failure path

Address follow-up review by naming the TUI flow as single-shot push-to-talk and cancelling the recorder when forced stop cannot produce a WAV.

* fix(tui): report busy voice capture starts

Return explicit start state from the voice wrapper so the TUI gateway does not report recording while forced-stop transcription is still cleaning up.

* fix(tui): handle busy voice record responses

Apply the gateway busy status immediately in the TUI and route forced-stop voice events to the session that sent the stop request.

* fix(tui): clear voice recording on null response

Treat a null voice.record RPC result as a failed optimistic start so the REC badge cannot stick after gateway-side errors.

* fix(tui): count silent manual voice stops

Preserve single-shot voice no-speech strikes through forced stop transcription so empty push-to-talk captures still trigger the three-strikes guard.

---------

Co-authored-by: Montbra <montbra@gmail.com>
2026-05-06 15:49:59 -07:00
brooklyn! 5ccab51fa8 fix(tui): steady transcript scrollbar (#20917)
* fix(tui): steady transcript scrollbar

Keep the visible scrollbar tied to committed viewport position while virtual history can still prefetch against pending scroll targets, and preserve drag grab offset synchronously for native-feeling scrollbar drags.

* fix(tui): smooth precision wheel scroll

Replace the opt-scroll throttle with frame-sized coalescing so modifier wheel gestures stay line-precise without stepping.
2026-05-06 14:50:31 -07:00
ethernet 53a024994a Merge pull request #20890 from NousResearch/fix/docker-push
ci(docker): don't cancel overlapping builds, guard :latest
2026-05-06 17:38:21 -04:00
ethernet f4031df05d ci(docker): don't cancel overlapping builds, guard :latest
Switch top-level concurrency to cancel-in-progress=false so every push
to main gets its own SHA-tagged image published — no more discarded
builds when commits land back-to-back.

Guard the :latest tag with a second job that has its own concurrency
group with cancel-in-progress=true plus a git-ancestor check against
the revision label on the current :latest. Together these guarantee
:latest only ever moves forward in history: a slower run whose commit
isn't a descendant of the current :latest refuses to clobber it, and
a newer push mid-way through the move-latest job preempts the older
one before it can retag.

- Every main push publishes nousresearch/hermes-agent:sha-<commit>
  with an org.opencontainers.image.revision label embedded.
- move-latest job reads that label off :latest, runs merge-base
  --is-ancestor, and only retags (via buildx imagetools create,
  registry-side, no rebuild) if our commit strictly descends.
- fetch-depth bumped to 1000 so merge-base has the history it needs.
- Release tag flow unchanged (unique tag, no race).
2026-05-06 15:53:47 -04:00
26 changed files with 3198 additions and 205 deletions
+142 -3
View File
@@ -16,9 +16,13 @@ on:
permissions:
contents: read
# Top-level concurrency: do NOT cancel in-flight builds when a new push lands.
# Every commit deserves its own SHA-tagged image in the registry, and we guard
# the :latest tag in a separate job below (with its own concurrency group) so
# a slow run can't clobber :latest with older bits.
concurrency:
group: docker-${{ github.ref }}
cancel-in-progress: true
cancel-in-progress: false
jobs:
build-and-push:
@@ -26,11 +30,18 @@ jobs:
if: github.repository == 'NousResearch/hermes-agent'
runs-on: ubuntu-latest
timeout-minutes: 60
outputs:
pushed_sha_tag: ${{ steps.mark_pushed.outputs.pushed }}
steps:
- name: Checkout code
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
submodules: recursive
# Fetch enough history to run `git merge-base --is-ancestor` in the
# move-latest job. That job reuses this checkout via its own
# actions/checkout call, but commits reachable from main up to ~1000
# back are plenty for any realistic race window.
fetch-depth: 1000
- name: Set up QEMU
uses: docker/setup-qemu-action@c7c53464625b32c7a7e944ae62b3e17d2b600130 # v3
@@ -74,7 +85,12 @@ jobs:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Push multi-arch image (main branch)
# Always push a per-commit SHA tag on main. This is race-free because
# every commit has a unique SHA — concurrent runs can't clobber each
# other here. We also embed the git SHA as an OCI label so the
# move-latest job (below) can read it back off the registry's `:latest`.
- name: Push multi-arch image with SHA tag (main branch)
id: push_sha
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8 # v6
with:
@@ -82,10 +98,17 @@ jobs:
file: Dockerfile
push: true
platforms: linux/amd64,linux/arm64
tags: nousresearch/hermes-agent:latest
tags: nousresearch/hermes-agent:sha-${{ github.sha }}
labels: |
org.opencontainers.image.revision=${{ github.sha }}
cache-from: type=gha
cache-to: type=gha,mode=max
- name: Mark SHA tag pushed
id: mark_pushed
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
run: echo "pushed=true" >> "$GITHUB_OUTPUT"
- name: Push multi-arch image (release)
if: github.event_name == 'release'
uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8 # v6
@@ -97,3 +120,119 @@ jobs:
tags: nousresearch/hermes-agent:${{ github.event.release.tag_name }}
cache-from: type=gha
cache-to: type=gha,mode=max
# Second job: moves `:latest` to point at the SHA tag the first job pushed.
#
# Has its own concurrency group with `cancel-in-progress: true`, which
# gives us the serialization we need: if a newer push arrives while an
# older run is mid-way through this job, the older run is cancelled
# before it can clobber `:latest`. Combined with the ancestor check
# below, this means `:latest` only ever moves forward in git history.
move-latest:
if: |
github.repository == 'NousResearch/hermes-agent'
&& github.event_name == 'push'
&& github.ref == 'refs/heads/main'
&& needs.build-and-push.outputs.pushed_sha_tag == 'true'
needs: build-and-push
runs-on: ubuntu-latest
timeout-minutes: 10
concurrency:
group: docker-move-latest-${{ github.ref }}
cancel-in-progress: true
steps:
- name: Checkout code
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
fetch-depth: 1000
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
- name: Log in to Docker Hub
uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
# Read the git revision label off the current `:latest` manifest, then
# use `git merge-base --is-ancestor` to check whether our commit is a
# descendant of it. If `:latest` doesn't exist yet, or its label is
# missing, we treat that as "safe to publish". If another run already
# advanced `:latest` past us (or diverged), we skip and leave it alone.
- name: Decide whether to move :latest
id: latest_check
run: |
set -euo pipefail
image=nousresearch/hermes-agent
# Pull the JSON for the linux/amd64 sub-manifest's config and extract
# the OCI revision label with jq — Go template field access can't
# handle dots in map keys, so using json+jq is the robust route.
image_json=$(
docker buildx imagetools inspect "${image}:latest" \
--format '{{ json (index .Image "linux/amd64") }}' \
2>/dev/null || true
)
if [ -z "${image_json}" ]; then
echo "No existing :latest (or inspect failed) — safe to publish."
echo "push_latest=true" >> "$GITHUB_OUTPUT"
exit 0
fi
current_sha=$(
printf '%s' "${image_json}" \
| jq -r '.config.Labels."org.opencontainers.image.revision" // ""'
)
if [ -z "${current_sha}" ]; then
echo "Registry :latest has no revision label — safe to publish."
echo "push_latest=true" >> "$GITHUB_OUTPUT"
exit 0
fi
echo "Registry :latest is at ${current_sha}"
echo "This run is at ${GITHUB_SHA}"
if [ "${current_sha}" = "${GITHUB_SHA}" ]; then
echo ":latest already points at our SHA — nothing to do."
echo "push_latest=false" >> "$GITHUB_OUTPUT"
exit 0
fi
# Make sure we have the :latest commit locally for merge-base.
if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
git fetch --no-tags --prune origin \
"+refs/heads/main:refs/remotes/origin/main" \
|| true
fi
if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
echo "Registry :latest points at an unknown commit (${current_sha}); refusing to overwrite."
echo "push_latest=false" >> "$GITHUB_OUTPUT"
exit 0
fi
# Our SHA must be a descendant of the current :latest to be safe.
if git merge-base --is-ancestor "${current_sha}" "${GITHUB_SHA}"; then
echo "Our commit is a descendant of :latest — safe to advance."
echo "push_latest=true" >> "$GITHUB_OUTPUT"
else
echo "Another run advanced :latest past us (or diverged) — leaving it alone."
echo "push_latest=false" >> "$GITHUB_OUTPUT"
fi
# Retag the already-pushed SHA manifest as :latest. This is a registry-
# side operation — no rebuild, no layer re-push — so it's quick and
# atomic per-tag. The ancestor check above plus the cancel-in-progress
# concurrency on this job together guarantee we only ever move :latest
# forward in git history.
- name: Move :latest to this SHA
if: steps.latest_check.outputs.push_latest == 'true'
run: |
set -euo pipefail
image=nousresearch/hermes-agent
docker buildx imagetools create \
--tag "${image}:latest" \
"${image}:sha-${GITHUB_SHA}"
+3 -10
View File
@@ -48,16 +48,12 @@ jobs:
- name: Determine base ref
id: base
env:
PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
run: |
# For PRs, diff against the PR's pinned parent commit
# (github.event.pull_request.base.sha — snapshot at PR open time,
# so later pushes to main don't leak into the diff).
# For PRs, diff against the merge base with the target branch.
# For pushes to main, diff against the previous commit on main.
if [ "${{ github.event_name }}" = "pull_request" ]; then
BASE_SHA="${PR_BASE_SHA}"
BASE_REF="PR base (${BASE_SHA:0:7})"
BASE_SHA=$(git merge-base "origin/${{ github.base_ref }}" HEAD)
BASE_REF="origin/${{ github.base_ref }}"
else
BASE_SHA=$(git rev-parse HEAD~1 2>/dev/null || git rev-parse HEAD)
BASE_REF="HEAD~1"
@@ -121,9 +117,6 @@ jobs:
name: lint-reports
path: .lint-reports/
retention-days: 14
# .lint-reports/ is a dotfile-prefixed directory, and upload-artifact@v4
# skips hidden files by default (breaking change from v3). Opt back in.
include-hidden-files: true
- name: Post / update PR comment
if: github.event_name == 'pull_request'
+222 -13
View File
@@ -10,6 +10,8 @@ Uses discord.py library for:
"""
import asyncio
import hashlib
import json
import logging
import os
import struct
@@ -24,6 +26,10 @@ logger = logging.getLogger(__name__)
VALID_THREAD_AUTO_ARCHIVE_MINUTES = {60, 1440, 4320, 10080}
_DISCORD_COMMAND_SYNC_POLICIES = {"safe", "bulk", "off"}
_DISCORD_COMMAND_SYNC_STATE_SUBDIR = "gateway"
_DISCORD_COMMAND_SYNC_STATE_FILENAME = "discord_command_sync_state.json"
_DISCORD_COMMAND_SYNC_MUTATION_INTERVAL_SECONDS = 4.5
_DISCORD_COMMAND_SYNC_MAX_RATE_LIMIT_SLEEP_SECONDS = 30.0
try:
import discord
@@ -45,6 +51,7 @@ from gateway.config import Platform, PlatformConfig
import re
from gateway.platforms.helpers import MessageDeduplicator, ThreadParticipationTracker
from utils import atomic_json_write
from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
@@ -825,6 +832,167 @@ class DiscordAdapter(BasePlatformAdapter):
logger.info("[%s] Disconnected", self.name)
def _command_sync_state_path(self) -> _Path:
from hermes_constants import get_hermes_home
directory = get_hermes_home() / _DISCORD_COMMAND_SYNC_STATE_SUBDIR
try:
directory.mkdir(parents=True, exist_ok=True)
except Exception:
pass
return directory / _DISCORD_COMMAND_SYNC_STATE_FILENAME
def _read_command_sync_state(self) -> dict:
try:
path = self._command_sync_state_path()
if not path.exists():
return {}
data = json.loads(path.read_text(encoding="utf-8"))
except Exception:
return {}
return data if isinstance(data, dict) else {}
def _write_command_sync_state(self, state: dict) -> None:
atomic_json_write(
self._command_sync_state_path(),
state,
indent=None,
separators=(",", ":"),
)
def _command_sync_state_key(self, app_id: Any) -> str:
return str(app_id or "unknown")
def _desired_command_sync_fingerprint(self) -> str:
tree = self._client.tree if self._client else None
desired = []
if tree is not None:
desired = [
self._canonicalize_app_command_payload(command.to_dict(tree))
for command in tree.get_commands()
]
desired.sort(key=lambda item: (item.get("type", 1), item.get("name", "")))
payload = json.dumps(desired, sort_keys=True, separators=(",", ":"))
return hashlib.sha256(payload.encode("utf-8")).hexdigest()
def _command_sync_skip_reason(self, app_id: Any, fingerprint: str) -> Optional[str]:
entry = self._read_command_sync_state().get(self._command_sync_state_key(app_id))
if not isinstance(entry, dict):
return None
now = time.time()
retry_after_until = float(entry.get("retry_after_until") or 0)
if retry_after_until > now:
remaining = max(1, int(retry_after_until - now))
return f"Discord asked us to wait before syncing slash commands; retry in {remaining}s"
if entry.get("fingerprint") == fingerprint and entry.get("last_success_at"):
return "same slash-command fingerprint already synced"
return None
def _record_command_sync_attempt(self, app_id: Any, fingerprint: str) -> None:
state = self._read_command_sync_state()
state[self._command_sync_state_key(app_id)] = {
**(
state.get(self._command_sync_state_key(app_id))
if isinstance(state.get(self._command_sync_state_key(app_id)), dict)
else {}
),
"fingerprint": fingerprint,
"last_attempt_at": time.time(),
}
self._write_command_sync_state(state)
def _record_command_sync_rate_limit(self, app_id: Any, fingerprint: str, retry_after: float) -> None:
retry_after = max(1.0, float(retry_after))
state = self._read_command_sync_state()
state[self._command_sync_state_key(app_id)] = {
**(
state.get(self._command_sync_state_key(app_id))
if isinstance(state.get(self._command_sync_state_key(app_id)), dict)
else {}
),
"fingerprint": fingerprint,
"last_attempt_at": time.time(),
"retry_after_until": time.time() + retry_after,
"retry_after": retry_after,
}
self._write_command_sync_state(state)
def _record_command_sync_success(self, app_id: Any, fingerprint: str, summary: dict) -> None:
state = self._read_command_sync_state()
state[self._command_sync_state_key(app_id)] = {
"fingerprint": fingerprint,
"last_attempt_at": time.time(),
"last_success_at": time.time(),
"summary": summary,
}
self._write_command_sync_state(state)
@staticmethod
def _extract_discord_retry_after(exc: BaseException) -> Optional[float]:
value = getattr(exc, "retry_after", None)
if value is not None:
try:
return max(1.0, float(value))
except (TypeError, ValueError):
return None
response = getattr(exc, "response", None)
headers = getattr(response, "headers", None)
if headers:
for key in ("Retry-After", "X-RateLimit-Reset-After"):
try:
raw = headers.get(key)
except Exception:
raw = None
if raw is None:
continue
try:
return max(1.0, float(raw))
except (TypeError, ValueError):
continue
return None
@staticmethod
def _is_discord_rate_limit(exc: BaseException) -> bool:
"""True only for exceptions that look like Discord 429 rate limits.
Narrower than ``hasattr(exc, 'retry_after')``: discord.py's own
``RateLimited`` exception and any HTTPException with status 429
qualify. This prevents suppressing unrelated failures that happen
to expose a ``retry_after`` attribute."""
# discord.py emits RateLimited / HTTPException subclasses for 429s.
# Guard with isinstance-of-class so a mocked ``discord`` module
# (where attrs are MagicMocks, not types) doesn't trip isinstance.
if DISCORD_AVAILABLE and discord is not None:
for attr_name in ("RateLimited", "HTTPException"):
cls = getattr(discord, attr_name, None)
if not isinstance(cls, type):
continue
if isinstance(exc, cls):
if attr_name == "RateLimited":
return True
status = getattr(exc, "status", None)
if status == 429:
return True
# Fallback duck-type: something named like a rate-limit with a
# numeric retry_after. Covers mocked clients in tests and exotic
# transports, without swallowing arbitrary exceptions.
name = type(exc).__name__.lower()
if ("ratelimit" in name or "rate_limit" in name) and getattr(exc, "retry_after", None) is not None:
return True
response = getattr(exc, "response", None)
status = getattr(response, "status", None) or getattr(response, "status_code", None)
if status == 429:
return True
return False
def _command_sync_mutation_interval_seconds(self) -> float:
return _DISCORD_COMMAND_SYNC_MUTATION_INTERVAL_SECONDS
async def _sleep_between_command_sync_mutations(self) -> None:
interval = self._command_sync_mutation_interval_seconds()
if interval > 0:
await asyncio.sleep(interval)
async def _run_post_connect_initialization(self) -> None:
"""Finish non-critical startup work after Discord is connected."""
if not self._client:
@@ -840,14 +1008,46 @@ class DiscordAdapter(BasePlatformAdapter):
logger.info("[%s] Synced %d slash command(s) via bulk tree sync", self.name, len(synced))
return
# Discord's per-app command-management bucket is ~5 writes / 20 s,
# so a mass-prune-plus-upsert reconcile (e.g. 77 orphans + 30
# desired = 107 writes) takes several minutes of forced waits.
# A flat 30 s budget blew up reliably under bucket pressure and
# left slash commands broken for ~60 min until the bucket fully
# recovered. Use a wide ceiling; the cap still guards against a
# true hang. (#16713)
summary = await asyncio.wait_for(self._safe_sync_slash_commands(), timeout=600)
app_id = getattr(self._client, "application_id", None) or getattr(getattr(self._client, "user", None), "id", None)
fingerprint = self._desired_command_sync_fingerprint()
skip_reason = self._command_sync_skip_reason(app_id, fingerprint)
if skip_reason:
logger.info("[%s] Skipping Discord slash command sync: %s", self.name, skip_reason)
return
self._record_command_sync_attempt(app_id, fingerprint)
http = getattr(self._client, "http", None)
has_ratelimit_timeout = http is not None and hasattr(http, "max_ratelimit_timeout")
previous_ratelimit_timeout = getattr(http, "max_ratelimit_timeout", None) if has_ratelimit_timeout else None
if has_ratelimit_timeout:
http.max_ratelimit_timeout = _DISCORD_COMMAND_SYNC_MAX_RATE_LIMIT_SLEEP_SECONDS
try:
# Discord's per-app command-management bucket is small, and
# discord.py can otherwise sit inside one long retry sleep
# before surfacing the 429. Keep the whole sync bounded and
# persist Discord's retry-after when it refuses the batch.
summary = await asyncio.wait_for(self._safe_sync_slash_commands(), timeout=600)
except Exception as e:
if not self._is_discord_rate_limit(e):
raise
retry_after = self._extract_discord_retry_after(e)
if retry_after is None:
# Rate-limited but no retry-after signal — back off for a
# conservative default so we don't slam the bucket again.
retry_after = _DISCORD_COMMAND_SYNC_MAX_RATE_LIMIT_SLEEP_SECONDS
self._record_command_sync_rate_limit(app_id, fingerprint, retry_after)
logger.warning(
"[%s] Discord rate-limited slash command sync; retrying after %.0fs",
self.name,
retry_after,
)
return
finally:
if has_ratelimit_timeout:
http.max_ratelimit_timeout = previous_ratelimit_timeout
self._record_command_sync_success(app_id, fingerprint, summary)
logger.info(
"[%s] Safely reconciled %d slash command(s): unchanged=%d updated=%d recreated=%d created=%d deleted=%d",
self.name,
@@ -1009,11 +1209,20 @@ class DiscordAdapter(BasePlatformAdapter):
created = 0
deleted = 0
http = self._client.http
mutation_count = 0
async def mutate(call, *args):
nonlocal mutation_count
if mutation_count:
await self._sleep_between_command_sync_mutations()
result = await call(*args)
mutation_count += 1
return result
for key, desired in desired_by_key.items():
current = existing_by_key.pop(key, None)
if current is None:
await http.upsert_global_command(app_id, desired)
await mutate(http.upsert_global_command, app_id, desired)
created += 1
continue
@@ -1025,16 +1234,16 @@ class DiscordAdapter(BasePlatformAdapter):
continue
if self._patchable_app_command_payload(current_existing_payload) == self._patchable_app_command_payload(desired):
await http.delete_global_command(app_id, current.id)
await http.upsert_global_command(app_id, desired)
await mutate(http.delete_global_command, app_id, current.id)
await mutate(http.upsert_global_command, app_id, desired)
recreated += 1
continue
await http.edit_global_command(app_id, current.id, desired)
await mutate(http.edit_global_command, app_id, current.id, desired)
updated += 1
for current in existing_by_key.values():
await http.delete_global_command(app_id, current.id)
await mutate(http.delete_global_command, app_id, current.id)
deleted += 1
return {
+1 -1
View File
@@ -4216,7 +4216,7 @@ def _prompt_model_selection(
clear_screen=False,
title=effective_title,
)
idx: int | None = menu.show() # ty:ignore[invalid-assignment] - TerminalMenu.show() is always `int | None` when multi_select is False / not provided.
idx = menu.show()
from hermes_cli.curses_ui import flush_stdin
flush_stdin()
if idx is None:
+238 -37
View File
@@ -505,6 +505,7 @@ def _read_systemd_unit_properties(
"SubState",
"Result",
"ExecMainStatus",
"MainPID",
),
) -> dict[str, str]:
"""Return selected ``systemctl show`` properties for the gateway unit."""
@@ -538,6 +539,41 @@ def _read_systemd_unit_properties(
return parsed
def _systemd_main_pid_from_props(props: dict[str, str]) -> int | None:
try:
pid = int(props.get("MainPID", "0") or "0")
except (TypeError, ValueError):
return None
return pid if pid > 0 else None
def _systemd_main_pid(system: bool = False) -> int | None:
return _systemd_main_pid_from_props(_read_systemd_unit_properties(system=system))
def _read_gateway_runtime_status() -> dict | None:
try:
from gateway.status import read_runtime_status
state = read_runtime_status()
except Exception:
return None
return state if isinstance(state, dict) else None
def _gateway_runtime_status_for_pid(pid: int | None) -> dict | None:
if not pid:
return None
state = _read_gateway_runtime_status()
if not state:
return None
try:
state_pid = int(state.get("pid", 0) or 0)
except (TypeError, ValueError):
return None
return state if state_pid == pid else None
def _wait_for_systemd_service_restart(
*,
system: bool = False,
@@ -550,6 +586,7 @@ def _wait_for_systemd_service_restart(
svc = get_service_name()
scope_label = _service_scope_label(system).capitalize()
deadline = time.time() + timeout
printed_runtime_wait = False
while time.time() < deadline:
props = _read_systemd_unit_properties(system=system)
@@ -562,19 +599,32 @@ def _wait_for_systemd_service_restart(
new_pid = get_running_pid()
except Exception:
new_pid = None
if not new_pid:
new_pid = _systemd_main_pid_from_props(props)
if active_state == "active":
if new_pid and (previous_pid is None or new_pid != previous_pid):
print(f"{scope_label} service restarted (PID {new_pid})")
return True
if previous_pid is None:
print(f"{scope_label} service restarted")
return True
runtime_state = _gateway_runtime_status_for_pid(new_pid)
gateway_state = (runtime_state or {}).get("gateway_state")
if gateway_state == "running":
print(f"{scope_label} service restarted (PID {new_pid})")
return True
if gateway_state == "startup_failed":
reason = (runtime_state or {}).get("exit_reason") or "startup failed"
print(f"{scope_label} service process restarted (PID {new_pid}), but gateway startup failed: {reason}")
return False
if not printed_runtime_wait:
print(f"{scope_label} service process started (PID {new_pid}); waiting for gateway runtime...")
printed_runtime_wait = True
if active_state == "activating" and sub_state == "auto-restart":
time.sleep(1)
continue
if _systemd_unit_is_start_limited(props):
_print_systemd_start_limit_wait(system=system)
return False
time.sleep(2)
print(
@@ -585,6 +635,46 @@ def _wait_for_systemd_service_restart(
return False
def _systemd_unit_is_start_limited(props: dict[str, str]) -> bool:
result = props.get("Result", "").lower()
sub_state = props.get("SubState", "").lower()
return result == "start-limit-hit" or sub_state == "start-limit-hit"
def _systemd_error_indicates_start_limit(exc: subprocess.CalledProcessError) -> bool:
parts: list[str] = []
for attr in ("stderr", "stdout", "output"):
value = getattr(exc, attr, None)
if not value:
continue
if isinstance(value, bytes):
value = value.decode(errors="replace")
parts.append(str(value))
text = "\n".join(parts).lower()
return (
"start-limit-hit" in text
or "start request repeated too quickly" in text
or "start-limit" in text
)
def _systemd_service_is_start_limited(system: bool = False) -> bool:
return _systemd_unit_is_start_limited(_read_systemd_unit_properties(system=system))
def _print_systemd_start_limit_wait(system: bool = False) -> None:
svc = get_service_name()
scope_label = _service_scope_label(system).capitalize()
scope_flag = " --system" if system else ""
systemctl_prefix = "systemctl " if system else "systemctl --user "
journal_prefix = "journalctl " if system else "journalctl --user "
print(f"{scope_label} service is temporarily rate-limited by systemd.")
print(" systemd is refusing another immediate start after repeated exits.")
print(f" Wait for the start-limit window to expire, then run: {'sudo ' if system else ''}hermes gateway restart{scope_flag}")
print(f" Or clear the failed state manually: {systemctl_prefix}reset-failed {svc}")
print(f" Check logs: {journal_prefix}-u {svc} -l --since '5 min ago'")
def _recover_pending_systemd_restart(system: bool = False, previous_pid: int | None = None) -> bool:
"""Recover a planned service restart that is stuck in systemd state."""
props = _read_systemd_unit_properties(system=system)
@@ -967,6 +1057,27 @@ class UserSystemdUnavailableError(RuntimeError):
"""
class SystemScopeRequiresRootError(RuntimeError):
"""Raised when a system-scope gateway operation is attempted as non-root.
System-scope units live in ``/etc/systemd/system/`` and require root for
install / uninstall / start / stop / restart via ``systemctl``. The
previous behavior was ``sys.exit(1)`` which blew past the wizard's
``except Exception`` guards and dumped the user at a bare shell prompt
with no guidance. Raising a typed exception lets callers that can
recover (the setup wizard) print actionable remediation instead, while
``gateway_command`` still exits 1 with the same message for the direct
CLI path.
``args[0]`` carries the user-facing message, ``args[1]`` the action name.
``str(e)`` returns only the message (not the tuple repr) so format
strings like ``f"Failed: {e}"`` render cleanly.
"""
def __str__(self) -> str:
return self.args[0] if self.args else ""
def _user_dbus_socket_path() -> Path:
"""Return the expected per-user D-Bus socket path (regardless of existence)."""
xdg = os.environ.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}"
@@ -1382,8 +1493,10 @@ def print_systemd_scope_conflict_warning() -> None:
def _require_root_for_system_service(action: str) -> None:
if os.geteuid() != 0:
print(f"System gateway {action} requires root. Re-run with sudo.")
sys.exit(1)
raise SystemScopeRequiresRootError(
f"System gateway {action} requires root. Re-run with sudo.",
action,
)
def _system_service_identity(run_as_user: str | None = None) -> tuple[str, str, str]:
@@ -1930,6 +2043,47 @@ def _select_systemd_scope(system: bool = False) -> bool:
return get_systemd_unit_path(system=True).exists() and not get_systemd_unit_path(system=False).exists()
def _system_scope_wizard_would_need_root(system: bool = False) -> bool:
"""True when the setup wizard is about to trigger a system-scope operation
as a non-root user.
Replicates the decision ``_select_systemd_scope`` makes inside
``systemd_start`` / ``systemd_restart`` / ``systemd_stop`` so the wizard
can detect the dead-end BEFORE prompting, rather than letting
``SystemScopeRequiresRootError`` propagate out and leave the user
staring at a bare shell.
"""
if os.geteuid() == 0:
return False
return _select_systemd_scope(system=system)
def _print_system_scope_remediation(action: str) -> None:
"""Print actionable remediation when the wizard skips a system-scope
prompt because the user isn't root. Keeps the wizard flowing instead of
aborting.
"""
svc = get_service_name()
print_warning(
f"Gateway is installed as a system-wide service — "
f"{action} requires root."
)
print_info(" Options:")
print_info(f" 1. {action.capitalize()} it this time:")
if action == "start":
print_info(f" sudo systemctl start {svc}")
elif action == "stop":
print_info(f" sudo systemctl stop {svc}")
elif action == "restart":
print_info(f" sudo systemctl restart {svc}")
else:
print_info(f" sudo systemctl {action} {svc}")
print_info(" 2. Switch to a per-user service (recommended for personal use):")
print_info(" sudo hermes gateway uninstall --system")
print_info(" hermes gateway install")
print_info(" hermes gateway start")
def _get_restart_drain_timeout() -> float:
"""Return the configured gateway restart drain timeout in seconds."""
raw = os.getenv("HERMES_RESTART_DRAIN_TIMEOUT", "").strip()
@@ -2071,41 +2225,52 @@ def systemd_restart(system: bool = False):
refresh_systemd_unit_if_needed(system=system)
from gateway.status import get_running_pid
pid = get_running_pid()
if pid is not None and _request_gateway_self_restart(pid):
import time
pid = get_running_pid() or _systemd_main_pid(system=system)
if pid is not None:
scope_label = _service_scope_label(system).capitalize()
svc = get_service_name()
drain_timeout = _get_restart_drain_timeout()
# Phase 1: wait for old process to exit (drain + shutdown)
print(f"{scope_label} service draining active work...")
deadline = time.time() + 90
while time.time() < deadline:
try:
os.kill(pid, 0)
time.sleep(1)
except (ProcessLookupError, PermissionError):
break # old process is gone
else:
print(f"⚠ Old process (PID {pid}) still alive after 90s")
print(f"{scope_label} service restarting gracefully (PID {pid})...")
if _graceful_restart_via_sigusr1(pid, drain_timeout + 5):
# The gateway exits with code 75 for a planned service restart.
# RestartSec can otherwise delay the relaunch even though the
# operator asked for an immediate restart, so kick the unit once
# the old PID has exited and then wait for the replacement PID.
_run_systemctl(
["reset-failed", svc],
system=system,
check=False,
timeout=30,
)
_run_systemctl(
["restart", svc],
system=system,
check=False,
timeout=90,
)
if _wait_for_systemd_service_restart(system=system, previous_pid=pid):
return
if _systemd_service_is_start_limited(system=system):
return
# The gateway exits with code 75 for a planned service restart.
# systemd can sit in the RestartSec window or even wedge itself into a
# failed/rate-limited state if the operator asks for another restart in
# the middle of that handoff. Clear any stale failed state and kick the
# unit immediately so `hermes gateway restart` behaves idempotently.
print(
f"⚠ Graceful restart did not complete within {int(drain_timeout + 5)}s; "
"forcing a service restart..."
)
_run_systemctl(
["reset-failed", svc],
system=system,
check=False,
timeout=30,
)
_run_systemctl(
["start", svc],
system=system,
check=False,
timeout=90,
)
try:
_run_systemctl(["restart", svc], system=system, check=True, timeout=90)
except subprocess.CalledProcessError as exc:
if _systemd_error_indicates_start_limit(exc) or _systemd_service_is_start_limited(system=system):
_print_systemd_start_limit_wait(system=system)
return
raise
_wait_for_systemd_service_restart(system=system, previous_pid=pid)
return
@@ -2118,8 +2283,14 @@ def systemd_restart(system: bool = False):
check=False,
timeout=30,
)
_run_systemctl(["reload-or-restart", get_service_name()], system=system, check=True, timeout=90)
print(f"{_service_scope_label(system).capitalize()} service restarted")
try:
_run_systemctl(["restart", get_service_name()], system=system, check=True, timeout=90)
except subprocess.CalledProcessError as exc:
if _systemd_error_indicates_start_limit(exc) or _systemd_service_is_start_limited(system=system):
_print_systemd_start_limit_wait(system=system)
return
raise
_wait_for_systemd_service_restart(system=system, previous_pid=pid)
@@ -2191,6 +2362,10 @@ def systemd_status(deep: bool = False, system: bool = False, full: bool = False)
result_code = unit_props.get("Result", "")
if active_state == "activating" and sub_state == "auto-restart":
print(" ⏳ Restart pending: systemd is waiting to relaunch the gateway")
elif _systemd_unit_is_start_limited(unit_props):
print(" ⏳ Restart pending: systemd is temporarily rate-limiting starts")
print(f" Run after the start-limit window expires: {'sudo ' if system else ''}hermes gateway restart{scope_flag}")
print(f" Or clear it manually: systemctl {'--user ' if not system else ''}reset-failed {get_service_name()}")
elif active_state == "failed" and exec_main_status == str(GATEWAY_SERVICE_RESTART_EXIT_CODE):
print(" ⚠ Planned restart is stuck in systemd failed state (exit 75)")
print(f" Run: systemctl {'--user ' if not system else ''}reset-failed {get_service_name()} && {'sudo ' if system else ''}hermes gateway start{scope_flag}")
@@ -4115,7 +4290,9 @@ def gateway_setup():
print_success("Gateway service is installed and running.")
elif service_installed:
print_warning("Gateway service is installed but not running.")
if prompt_yes_no(" Start it now?", True):
if supports_systemd_services() and _system_scope_wizard_would_need_root():
_print_system_scope_remediation("start")
elif prompt_yes_no(" Start it now?", True):
try:
if supports_systemd_services():
systemd_start()
@@ -4125,6 +4302,12 @@ def gateway_setup():
print_error(" Failed to start — user systemd not reachable:")
for line in str(e).splitlines():
print(f" {line}")
except SystemScopeRequiresRootError as e:
# Defense in depth: the pre-check above should have caught
# this, but handle the race/edge case gracefully instead of
# letting the exception escape the wizard.
print_error(f" Failed to start: {e}")
_print_system_scope_remediation("start")
except subprocess.CalledProcessError as e:
print_error(f" Failed to start: {e}")
else:
@@ -4174,7 +4357,9 @@ def gateway_setup():
service_running = _is_service_running()
if service_running:
if prompt_yes_no(" Restart the gateway to pick up changes?", True):
if supports_systemd_services() and _system_scope_wizard_would_need_root():
_print_system_scope_remediation("restart")
elif prompt_yes_no(" Restart the gateway to pick up changes?", True):
try:
if supports_systemd_services():
systemd_restart()
@@ -4187,10 +4372,15 @@ def gateway_setup():
print_error(" Restart failed — user systemd not reachable:")
for line in str(e).splitlines():
print(f" {line}")
except SystemScopeRequiresRootError as e:
print_error(f" Restart failed: {e}")
_print_system_scope_remediation("restart")
except subprocess.CalledProcessError as e:
print_error(f" Restart failed: {e}")
elif service_installed:
if prompt_yes_no(" Start the gateway service?", True):
if supports_systemd_services() and _system_scope_wizard_would_need_root():
_print_system_scope_remediation("start")
elif prompt_yes_no(" Start the gateway service?", True):
try:
if supports_systemd_services():
systemd_start()
@@ -4200,6 +4390,9 @@ def gateway_setup():
print_error(" Start failed — user systemd not reachable:")
for line in str(e).splitlines():
print(f" {line}")
except SystemScopeRequiresRootError as e:
print_error(f" Start failed: {e}")
_print_system_scope_remediation("start")
except subprocess.CalledProcessError as e:
print_error(f" Start failed: {e}")
else:
@@ -4273,6 +4466,14 @@ def gateway_command(args):
for line in str(e).splitlines():
print(f" {line}")
sys.exit(1)
except SystemScopeRequiresRootError as e:
# The direct ``hermes gateway install|uninstall|start|stop|restart``
# path lands here when the user typed a system-scope action without
# sudo. Same exit code as before — just gives the wizard a way to
# intercept the same condition with friendlier guidance before the
# error is raised.
print(str(e))
sys.exit(1)
def _gateway_command_inner(args):
+4 -4
View File
@@ -3534,7 +3534,7 @@ def _remove_custom_provider(config):
clear_screen=False,
title="Select provider to remove:",
)
idx: int | None = menu.show() # ty:ignore[invalid-assignment] - TerminalMenu.show() is always `int | None` when multi_select is False / not provided.
idx = menu.show()
from hermes_cli.curses_ui import flush_stdin
flush_stdin()
@@ -3620,7 +3620,7 @@ def _model_flow_named_custom(config, provider_info):
clear_screen=False,
title=f"Select model from {name}:",
)
idx: int | None = menu.show() # ty:ignore[invalid-assignment] - TerminalMenu.show() is always `int | None` when multi_select is False / not provided.
idx = menu.show()
from hermes_cli.curses_ui import flush_stdin
flush_stdin()
@@ -3796,7 +3796,7 @@ def _prompt_reasoning_effort_selection(efforts, current_effort=""):
clear_screen=False,
title="Select reasoning effort:",
)
idx: int | None = menu.show() # ty:ignore[invalid-assignment] - TerminalMenu.show() is always `int | None` when multi_select is False / not provided.
idx = menu.show()
from hermes_cli.curses_ui import flush_stdin
flush_stdin()
@@ -7582,7 +7582,7 @@ def _cmd_update_impl(args, gateway_mode: bool):
# systemd units without SIGUSR1 wiring this wait just times out
# and we fall back to ``systemctl restart`` (the old behaviour).
try:
from gateway.restart import (
from hermes_constants import (
DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT as _DEFAULT_DRAIN,
)
except Exception:
+22 -2
View File
@@ -2462,6 +2462,9 @@ def setup_gateway(config: dict):
launchd_start,
launchd_restart,
UserSystemdUnavailableError,
SystemScopeRequiresRootError,
_system_scope_wizard_would_need_root,
_print_system_scope_remediation,
)
service_installed = _is_service_installed()
@@ -2479,7 +2482,9 @@ def setup_gateway(config: dict):
print()
if service_running:
if prompt_yes_no(" Restart the gateway to pick up changes?", True):
if supports_systemd and _system_scope_wizard_would_need_root():
_print_system_scope_remediation("restart")
elif prompt_yes_no(" Restart the gateway to pick up changes?", True):
try:
if supports_systemd:
systemd_restart()
@@ -2489,10 +2494,19 @@ def setup_gateway(config: dict):
print_error(" Restart failed — user systemd not reachable:")
for line in str(e).splitlines():
print(f" {line}")
except SystemScopeRequiresRootError as e:
# Defense in depth: the pre-check above should have
# caught this, but a race (unit file appearing mid-run)
# could still land here. Previously this exited the
# whole wizard via sys.exit(1).
print_error(f" Restart failed: {e}")
_print_system_scope_remediation("restart")
except Exception as e:
print_error(f" Restart failed: {e}")
elif service_installed:
if prompt_yes_no(" Start the gateway service?", True):
if supports_systemd and _system_scope_wizard_would_need_root():
_print_system_scope_remediation("start")
elif prompt_yes_no(" Start the gateway service?", True):
try:
if supports_systemd:
systemd_start()
@@ -2502,6 +2516,9 @@ def setup_gateway(config: dict):
print_error(" Start failed — user systemd not reachable:")
for line in str(e).splitlines():
print(f" {line}")
except SystemScopeRequiresRootError as e:
print_error(f" Start failed: {e}")
_print_system_scope_remediation("start")
except Exception as e:
print_error(f" Start failed: {e}")
elif supports_service_manager:
@@ -2529,6 +2546,9 @@ def setup_gateway(config: dict):
print_error(" Start failed — user systemd not reachable:")
for line in str(e).splitlines():
print(f" {line}")
except SystemScopeRequiresRootError as e:
print_error(f" Start failed: {e}")
_print_system_scope_remediation("start")
except Exception as e:
print_error(f" Start failed: {e}")
except Exception as e:
+148 -36
View File
@@ -281,6 +281,8 @@ _recorder_lock = threading.Lock()
# ── Continuous (VAD) state ───────────────────────────────────────────
_continuous_lock = threading.Lock()
_continuous_active = False
_continuous_stopping = False
_continuous_auto_restart: bool = True
_continuous_recorder: Any = None
# ── TTS-vs-STT feedback guard ────────────────────────────────────────
@@ -370,32 +372,43 @@ def start_continuous(
on_silent_limit: Optional[Callable[[], None]] = None,
silence_threshold: int = 200,
silence_duration: float = 3.0,
) -> None:
auto_restart: bool = True,
) -> bool:
"""Start a VAD-driven continuous recording loop.
The loop calls ``on_transcript(text)`` each time speech is detected and
transcribed successfully, then auto-restarts. After
``_CONTINUOUS_NO_SPEECH_LIMIT`` consecutive silent cycles (no speech
picked up at all) the loop stops itself and calls ``on_silent_limit``
so the UI can reflect "voice off". Idempotent calling while already
active is a no-op.
transcribed successfully. If ``auto_restart`` is True, it auto-restarts
for the next turn and resets the no-speech counter for that loop. If
``auto_restart`` is False, the first silence-triggered transcription ends
the loop and reports ``"idle"``; no-speech counts are retained across
starts so a push-to-talk caller can still enforce the three-strikes guard.
After ``_CONTINUOUS_NO_SPEECH_LIMIT`` consecutive silent cycles (no speech
picked up at all) the loop stops itself and calls ``on_silent_limit`` so the
UI can reflect "voice off". Returns False if a previous stop is still
transcribing/cleaning up; otherwise returns True. Idempotent calling while
already active is a successful no-op.
``on_status`` is called with ``"listening"`` / ``"transcribing"`` /
``"idle"`` so the UI can show a live indicator.
"""
global _continuous_active, _continuous_recorder
global _continuous_active, _continuous_recorder, _continuous_auto_restart
global _continuous_on_transcript, _continuous_on_status, _continuous_on_silent_limit
global _continuous_no_speech_count
with _continuous_lock:
if _continuous_active:
_debug("start_continuous: already active — no-op")
return
return True
if _continuous_stopping:
_debug("start_continuous: stop/transcribe in progress — busy")
return False
_continuous_active = True
_continuous_auto_restart = auto_restart
_continuous_on_transcript = on_transcript
_continuous_on_status = on_status
_continuous_on_silent_limit = on_silent_limit
_continuous_no_speech_count = 0
if auto_restart:
_continuous_no_speech_count = 0
if _continuous_recorder is None:
_continuous_recorder = create_audio_recorder()
@@ -428,15 +441,18 @@ def start_continuous(
except Exception:
pass
return True
def stop_continuous() -> None:
def stop_continuous(force_transcribe: bool = False) -> None:
"""Stop the active continuous loop and release the microphone.
Idempotent calling while not active is a no-op. Any in-flight
transcription completes but its result is discarded (the callback
checks ``_continuous_active`` before firing).
Idempotent calling while not active is a no-op. If ``force_transcribe`` is
True, the recorder stops synchronously, then transcription/cleanup runs on a
background thread before reporting ``"idle"``. Otherwise the buffer is
discarded.
"""
global _continuous_active, _continuous_on_transcript
global _continuous_active, _continuous_on_transcript, _continuous_stopping
global _continuous_on_status, _continuous_on_silent_limit
global _continuous_recorder, _continuous_no_speech_count
@@ -446,18 +462,98 @@ def stop_continuous() -> None:
_continuous_active = False
rec = _continuous_recorder
on_status = _continuous_on_status
on_transcript = _continuous_on_transcript
on_silent_limit = _continuous_on_silent_limit
auto_restart = _continuous_auto_restart
track_no_speech = force_transcribe and not auto_restart
_continuous_stopping = rec is not None
_continuous_on_transcript = None
_continuous_on_status = None
_continuous_on_silent_limit = None
_continuous_no_speech_count = 0
if not track_no_speech:
_continuous_no_speech_count = 0
if rec is not None:
try:
# cancel() (not stop()) discards buffered frames — the loop
# is over, we don't want to transcribe a half-captured turn.
rec.cancel()
except Exception as e:
logger.warning("failed to cancel recorder: %s", e)
if force_transcribe and on_transcript:
if on_status:
try:
on_status("transcribing")
except Exception:
pass
try:
wav_path = rec.stop()
except Exception as e:
logger.warning("failed to stop recorder: %s", e)
try:
rec.cancel()
except Exception as cancel_error:
logger.warning("failed to cancel recorder: %s", cancel_error)
wav_path = None
def _transcribe_and_cleanup():
global _continuous_no_speech_count, _continuous_stopping
transcript: Optional[str] = None
should_halt = False
try:
if wav_path:
try:
result = transcribe_recording(wav_path)
if result.get("success"):
text = (result.get("transcript") or "").strip()
if text and not is_whisper_hallucination(text):
transcript = text
finally:
if os.path.isfile(wav_path):
os.unlink(wav_path)
except Exception as e:
logger.warning("failed to stop/transcribe recorder: %s", e)
finally:
if transcript:
try:
on_transcript(transcript)
except Exception as e:
logger.warning("on_transcript callback raised: %s", e)
if track_no_speech:
with _continuous_lock:
if transcript:
_continuous_no_speech_count = 0
else:
_continuous_no_speech_count += 1
should_halt = (
_continuous_no_speech_count
>= _CONTINUOUS_NO_SPEECH_LIMIT
)
if should_halt:
_continuous_no_speech_count = 0
if should_halt and on_silent_limit:
try:
on_silent_limit()
except Exception:
pass
_play_beep(frequency=660, count=2)
with _continuous_lock:
_continuous_stopping = False
if on_status:
try:
on_status("idle")
except Exception:
pass
threading.Thread(target=_transcribe_and_cleanup, daemon=True).start()
return
else:
try:
# cancel() (not stop()) discards buffered frames — the loop
# is over, we don't want to transcribe a half-captured turn.
rec.cancel()
except Exception as e:
logger.warning("failed to cancel recorder: %s", e)
with _continuous_lock:
_continuous_stopping = False
# Audible "recording stopped" cue (CLI parity: same 660 Hz × 2 the
# silence-auto-stop path plays).
@@ -603,23 +699,39 @@ def _continuous_on_silence() -> None:
_debug("_continuous_on_silence: stopped while waiting for TTS")
return
# Restart for the next turn.
_debug(f"_continuous_on_silence: restarting loop (no_speech={no_speech})")
_play_beep(frequency=880, count=1)
try:
rec.start(on_silence_stop=_continuous_on_silence)
except Exception as e:
logger.error("failed to restart continuous recording: %s", e)
_debug(f"_continuous_on_silence: restart raised {type(e).__name__}: {e}")
if _continuous_auto_restart:
# Restart for the next turn.
_debug(f"_continuous_on_silence: restarting loop (no_speech={no_speech})")
_play_beep(frequency=880, count=1)
try:
rec.start(on_silence_stop=_continuous_on_silence)
except Exception as e:
logger.error("failed to restart continuous recording: %s", e)
_debug(f"_continuous_on_silence: restart raised {type(e).__name__}: {e}")
with _continuous_lock:
_continuous_active = False
if on_status:
try:
on_status("idle")
except Exception:
pass
return
if on_status:
try:
on_status("listening")
except Exception:
pass
else:
# Do not auto-restart. Clean up state and notify idle.
_debug("_continuous_on_silence: auto_restart=False, stopping loop")
with _continuous_lock:
_continuous_active = False
return
if on_status:
try:
on_status("listening")
except Exception:
pass
if on_status:
try:
on_status("idle")
except Exception:
pass
# ── TTS API ──────────────────────────────────────────────────────────
+296
View File
@@ -0,0 +1,296 @@
"""
Benchmark: Current main (3 separate WS connections) vs optimized (1 connection).
Compares the two CDP coordinate click implementations against a real
Lightpanda WebSocket at ws://127.0.0.1:63372/.
- Baseline (current main style): 3 separate _cdp_call() invocations, each
opening a fresh WS connection (Target.getTargets, mousePressed, mouseReleased)
- Optimized (this PR): single WS connection with all 4 messages pipelined
(getTargets + attachToTarget + mousePressed+mouseReleased in one burst)
Also measures the agent-browser HTTP IPC round-trip as a reference point
for how fast the existing ref-based click path is.
Usage:
python scripts/benchmark_click_paths.py
python scripts/benchmark_click_paths.py --iterations 300 --warmup 20
"""
from __future__ import annotations
import argparse
import asyncio
import json
import sys
import time
import urllib.request
from statistics import mean, median, stdev
from typing import List, Dict, Optional, Tuple
import os
# Add repo root to sys.path when running this script directly
_repo_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
if _repo_root not in sys.path:
sys.path.insert(0, _repo_root)
LIGHTPANDA_WS = "ws://127.0.0.1:63372/"
AGENT_BROWSER_PORT = 63371
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _stats(times_s: List[float]) -> Dict:
ms = [t * 1000 for t in times_s]
return {
"mean_ms": mean(ms),
"median_ms": median(ms),
"min_ms": min(ms),
"max_ms": max(ms),
"stdev_ms": stdev(ms) if len(ms) > 1 else 0.0,
"p95_ms": sorted(ms)[int(len(ms) * 0.95)],
}
def _bench(fn, warmup: int, n: int) -> Tuple[List[float], int]:
for _ in range(warmup):
fn()
times, errors = [], 0
for _ in range(n):
t0 = time.perf_counter()
try:
result = fn()
elapsed = time.perf_counter() - t0
if isinstance(result, str):
d = json.loads(result)
if not d.get("success"):
errors += 1
except Exception:
elapsed = time.perf_counter() - t0
errors += 1
times.append(elapsed)
return times, errors
def _row(label: str, stats: Dict, col_w: int = 9) -> None:
print(
f" {label:<46} "
f"{stats['mean_ms']:>{col_w}.2f} "
f"{stats['median_ms']:>{col_w}.2f} "
f"{stats['min_ms']:>{col_w}.2f} "
f"{stats['p95_ms']:>{col_w}.2f} "
f"{stats['max_ms']:>{col_w}.2f} ms"
)
# ---------------------------------------------------------------------------
# The "current main" approach — 3 separate _cdp_call() connections
# ---------------------------------------------------------------------------
def _baseline_cdp_click(endpoint: str, x: int, y: int, button: str = "left") -> str:
"""Replicate the previous 3-connection approach from the original PR."""
from tools.browser_cdp_tool import _cdp_call, _run_async
try:
targets_result = _run_async(_cdp_call(endpoint, "Target.getTargets", {}, None, 10.0))
page_target = None
for t in targets_result.get("targetInfos", []):
if t.get("type") == "page" and t.get("attached", True):
page_target = t["targetId"]
break
except Exception:
page_target = None
mouse_params = {"type": "", "x": x, "y": y, "button": button, "clickCount": 1}
try:
_run_async(_cdp_call(endpoint, "Input.dispatchMouseEvent",
{**mouse_params, "type": "mousePressed"}, page_target, 10.0))
_run_async(_cdp_call(endpoint, "Input.dispatchMouseEvent",
{**mouse_params, "type": "mouseReleased"}, page_target, 10.0))
except Exception as e:
return json.dumps({"success": False, "error": str(e)})
return json.dumps({"success": True, "clicked_at": {"x": x, "y": y}, "method": "baseline"})
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def run_benchmark(iterations: int = 300, warmup: int = 20) -> None:
print(f"\n{'=' * 78}")
print(f" browser_click Coordinate Click: Current Main vs Optimized (1-conn)")
print(f" Real Lightpanda WS: {LIGHTPANDA_WS}")
print(f"{'=' * 78}")
print(f" Iterations: {iterations} | Warmup: {warmup}")
# pre-flight
try:
with urllib.request.urlopen("http://127.0.0.1:63372/json/version", timeout=2) as r:
info = json.loads(r.read())
assert "webSocketDebuggerUrl" in info
print(f" ✓ Lightpanda CDP: {info.get('webSocketDebuggerUrl')}")
except Exception as e:
print(f" ✗ Lightpanda not reachable: {e}")
return
try:
with urllib.request.urlopen(f"http://127.0.0.1:{AGENT_BROWSER_PORT}/api/sessions", timeout=2) as r:
sessions = json.loads(r.read())
print(f" ✓ agent-browser: {len(sessions)} session(s)")
ab_ok = True
except Exception:
print(f" ⚠ agent-browser not reachable — ref-click IPC baseline skipped")
ab_ok = False
import importlib
import tools.browser_tool as bt
import tools.browser_cdp_tool as cdp_mod
importlib.reload(cdp_mod)
importlib.reload(bt)
bt._is_camofox_mode = lambda: False
_orig_resolve = cdp_mod._resolve_cdp_endpoint
# -----------------------------------------------------------------------
# 1. Baseline: current-main 3-connection approach
# -----------------------------------------------------------------------
print(f"\n [1/4] Baseline (current main — 3 separate WS connections per click)")
print(f" Warmup {warmup}, then {iterations} iterations...")
base_times, base_err = _bench(
lambda: _baseline_cdp_click(LIGHTPANDA_WS, 150, 200),
warmup, iterations,
)
base_stats = _stats(base_times)
print(f" Done — {base_err} errors, mean={base_stats['mean_ms']:.2f}ms")
# -----------------------------------------------------------------------
# 2. Optimized: single-connection — cold cache (session resolve included)
# -----------------------------------------------------------------------
print(f"\n [2/4] Optimized — cold cache (1 WS conn, includes getTargets+attachToTarget)")
print(f" {iterations} iterations, cache cleared before each...")
def _cold_click():
bt._CDP_SESSION_CACHE.clear()
return bt.browser_click(x=150.0, y=200.0, task_id="bench")
cdp_mod._resolve_cdp_endpoint = lambda: LIGHTPANDA_WS
# Temporarily null out supervisor registry so this test isolates path 2
import tools.browser_supervisor as sup_mod
_orig_registry_get = sup_mod.SUPERVISOR_REGISTRY.get
sup_mod.SUPERVISOR_REGISTRY.get = lambda tid: None
cold_times, cold_err = _bench(_cold_click, warmup=0, n=iterations)
cold_stats = _stats(cold_times)
print(f" Done — {cold_err} errors, mean={cold_stats['mean_ms']:.2f}ms")
# -----------------------------------------------------------------------
# 3. Optimized: warm cache (session cached — skips getTargets+attachToTarget)
# -----------------------------------------------------------------------
print(f"\n [3/4] Optimized — warm cache (1 WS conn, skips getTargets+attachToTarget)")
print(f" Warmup {warmup} (fills cache), then {iterations} iterations...")
bt._CDP_SESSION_CACHE.clear()
opt_times, opt_err = _bench(
lambda: bt.browser_click(x=150.0, y=200.0, task_id="bench"),
warmup, iterations,
)
sup_mod.SUPERVISOR_REGISTRY.get = _orig_registry_get
cdp_mod._resolve_cdp_endpoint = _orig_resolve
opt_stats = _stats(opt_times)
print(f" Done — {opt_err} errors, mean={opt_stats['mean_ms']:.2f}ms")
# -----------------------------------------------------------------------
# 4. Supervisor path: real CDPSupervisor with persistent WS
# -----------------------------------------------------------------------
print(f"\n [4/4] Supervisor path (persistent WS — zero per-click connection cost)")
print(f" Starting supervisor → {LIGHTPANDA_WS}...")
sup_stats = None
sup_err_count = 0
try:
supervisor = sup_mod.CDPSupervisor.__new__(sup_mod.CDPSupervisor)
# minimal init — we only need _loop, _ws, _page_session_id, _state_lock,
# _pending_calls, _next_call_id, _active, _stop_requested
# Use SUPERVISOR_REGISTRY.get_or_start for a fully initialized supervisor
TASK_ID = "bench-supervisor"
real_sup = sup_mod.SUPERVISOR_REGISTRY.get_or_start(TASK_ID, LIGHTPANDA_WS)
import time as _time
# Give supervisor time to connect and attach
for _ in range(20):
snap = real_sup.snapshot()
if snap.active:
break
_time.sleep(0.1)
if not real_sup.snapshot().active:
print(f" ⚠ Supervisor did not become active — skipping")
else:
print(f" ✓ Supervisor active, warmup {warmup}...")
def _sup_click():
real_sup.dispatch_mouse_click(150, 200)
return json.dumps({"success": True})
for _ in range(warmup):
_sup_click()
print(f" Running {iterations} iterations...")
sup_times, sup_err_count = _bench(_sup_click, warmup=0, n=iterations)
sup_stats = _stats(sup_times)
print(f" Done — {sup_err_count} errors, mean={sup_stats['mean_ms']:.2f}ms")
sup_mod.SUPERVISOR_REGISTRY.stop(TASK_ID)
except Exception as e:
print(f" ⚠ Supervisor benchmark failed: {e}")
# -----------------------------------------------------------------------
# Ref baseline
# -----------------------------------------------------------------------
if ab_ok:
print(f"\n [ref] agent-browser HTTP IPC (ref-click latency baseline)")
ab_times = []
for _ in range(warmup):
urllib.request.urlopen(f"http://127.0.0.1:{AGENT_BROWSER_PORT}/api/sessions", timeout=5).read()
for _ in range(iterations):
t0 = time.perf_counter()
urllib.request.urlopen(f"http://127.0.0.1:{AGENT_BROWSER_PORT}/api/sessions", timeout=5).read()
ab_times.append(time.perf_counter() - t0)
ab_stats = _stats(ab_times)
print(f" Done — mean={ab_stats['mean_ms']:.2f}ms")
# -----------------------------------------------------------------------
# Results
# -----------------------------------------------------------------------
col_w = 9
print(f"\n{'' * 82}")
print(f" {'Approach':<50} {'Mean':>{col_w}} {'Median':>{col_w}} {'Min':>{col_w}} {'p95':>{col_w}}")
print(f"{'' * 82}")
_row("Baseline (3 WS connections, sequential) ", base_stats, col_w)
_row("Optimized — cold cache (1 conn + negotiate) ", cold_stats, col_w)
_row("Optimized — warm cache (1 conn, skip resolve) ", opt_stats, col_w)
if sup_stats:
_row("Supervisor (persistent WS, zero conn cost) ", sup_stats, col_w)
if ab_ok:
_row("Ref-click IPC baseline (1 HTTP req) ", ab_stats, col_w)
print(f"{'' * 82}")
print(f"\n Speedups (mean vs baseline):")
print(f" Cold cache: {base_stats['mean_ms'] / cold_stats['mean_ms']:.2f}x ({base_stats['mean_ms'] - cold_stats['mean_ms']:.2f} ms saved)")
print(f" Warm cache: {base_stats['mean_ms'] / opt_stats['mean_ms']:.2f}x ({base_stats['mean_ms'] - opt_stats['mean_ms']:.2f} ms saved)")
if sup_stats:
print(f" Supervisor: {base_stats['mean_ms'] / sup_stats['mean_ms']:.2f}x ({base_stats['mean_ms'] - sup_stats['mean_ms']:.2f} ms saved)")
print(f" Warm→Supervisor additional gain: {opt_stats['mean_ms'] - sup_stats['mean_ms']:.2f} ms (WS conn eliminated)")
if ab_ok and sup_stats:
print(f" Supervisor vs ref-click: {sup_stats['mean_ms'] / ab_stats['mean_ms']:.1f}x (+{sup_stats['mean_ms'] - ab_stats['mean_ms']:.2f} ms)")
print(f"\n Optimization tiers in this PR:")
print(f" 1. Single WS connection — eliminates 2 TCP+WS handshakes")
print(f" 2. mouseReleased-only wait — skips redundant press ack (Playwright)")
print(f" 3. Session ID cache — skips getTargets+attachToTarget")
print(f" 4. Supervisor reuse (new) — eliminates the WS open entirely")
print(f" Active after browser_navigate; falls back to warm-cache path if absent.")
print()
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--iterations", type=int, default=300)
parser.add_argument("--warmup", type=int, default=20)
args = parser.parse_args()
run_benchmark(iterations=args.iterations, warmup=args.warmup)
+187
View File
@@ -1,4 +1,5 @@
import asyncio
import json
import sys
from types import SimpleNamespace
from unittest.mock import AsyncMock, MagicMock
@@ -70,6 +71,15 @@ import gateway.platforms.discord as discord_platform # noqa: E402
from gateway.platforms.discord import DiscordAdapter # noqa: E402
@pytest.fixture(autouse=True)
def _speed_up_command_sync_mutation_pacing(monkeypatch):
monkeypatch.setattr(
DiscordAdapter,
"_command_sync_mutation_interval_seconds",
lambda self: 0.0,
)
class FakeTree:
def __init__(self):
self.sync = AsyncMock(return_value=[])
@@ -536,6 +546,183 @@ async def test_post_connect_initialization_skips_sync_when_policy_off(monkeypatc
fake_tree.sync.assert_not_called()
@pytest.mark.asyncio
async def test_post_connect_initialization_skips_same_fingerprint_after_success(tmp_path, monkeypatch):
adapter = DiscordAdapter(PlatformConfig(enabled=True, token="test-token"))
monkeypatch.setattr("hermes_constants.get_hermes_home", lambda: tmp_path)
class _DesiredCommand:
def to_dict(self, tree):
return {
"name": "status",
"description": "Show Hermes status",
"type": 1,
"options": [],
}
fake_tree = SimpleNamespace(
get_commands=lambda: [_DesiredCommand()],
fetch_commands=AsyncMock(return_value=[]),
)
fake_http = SimpleNamespace(
upsert_global_command=AsyncMock(),
edit_global_command=AsyncMock(),
delete_global_command=AsyncMock(),
)
adapter._client = SimpleNamespace(
tree=fake_tree,
http=fake_http,
application_id=999,
user=SimpleNamespace(id=999),
)
await adapter._run_post_connect_initialization()
await adapter._run_post_connect_initialization()
fake_tree.fetch_commands.assert_awaited_once()
fake_http.upsert_global_command.assert_awaited_once()
@pytest.mark.asyncio
async def test_post_connect_initialization_respects_discord_retry_after(tmp_path, monkeypatch):
adapter = DiscordAdapter(PlatformConfig(enabled=True, token="test-token"))
monkeypatch.setattr("hermes_constants.get_hermes_home", lambda: tmp_path)
class _DesiredCommand:
def to_dict(self, tree):
return {
"name": "status",
"description": "Show Hermes status",
"type": 1,
"options": [],
}
adapter._client = SimpleNamespace(
tree=SimpleNamespace(get_commands=lambda: [_DesiredCommand()]),
application_id=999,
user=SimpleNamespace(id=999),
)
class _DiscordRateLimit(RuntimeError):
retry_after = 123.0
sync = AsyncMock(side_effect=_DiscordRateLimit("discord rate limited"))
monkeypatch.setattr(adapter, "_safe_sync_slash_commands", sync)
await adapter._run_post_connect_initialization()
await adapter._run_post_connect_initialization()
sync.assert_awaited_once()
state_path = (
tmp_path
/ discord_platform._DISCORD_COMMAND_SYNC_STATE_SUBDIR
/ discord_platform._DISCORD_COMMAND_SYNC_STATE_FILENAME
)
state = json.loads(state_path.read_text())
entry = state["999"]
assert entry["retry_after"] == 123.0
assert entry["retry_after_until"] > entry["last_attempt_at"]
@pytest.mark.asyncio
async def test_post_connect_initialization_reraises_non_rate_limit_exceptions(tmp_path, monkeypatch):
"""Arbitrary failures during sync must surface, not be swallowed as rate-limits."""
adapter = DiscordAdapter(PlatformConfig(enabled=True, token="test-token"))
monkeypatch.setattr("hermes_constants.get_hermes_home", lambda: tmp_path)
class _DesiredCommand:
def to_dict(self, tree):
return {"name": "status", "description": "Show Hermes status", "type": 1, "options": []}
adapter._client = SimpleNamespace(
tree=SimpleNamespace(get_commands=lambda: [_DesiredCommand()]),
application_id=4242,
user=SimpleNamespace(id=4242),
)
# Unrelated failure that happens to expose retry_after. Must NOT be
# caught by the rate-limit handler — it has nothing to do with 429s.
class _UnrelatedError(RuntimeError):
retry_after = 999.0
sync = AsyncMock(side_effect=_UnrelatedError("database is down"))
monkeypatch.setattr(adapter, "_safe_sync_slash_commands", sync)
# The outer _run_post_connect_initialization has a broad except Exception
# that logs defensively — so we assert on state NOT being written.
await adapter._run_post_connect_initialization()
sync.assert_awaited_once()
state_path = (
tmp_path
/ discord_platform._DISCORD_COMMAND_SYNC_STATE_SUBDIR
/ discord_platform._DISCORD_COMMAND_SYNC_STATE_FILENAME
)
state = json.loads(state_path.read_text()) if state_path.exists() else {}
entry = state.get("4242", {})
# Attempt was recorded before the sync call, but no rate-limit cooldown
# should have been persisted from the unrelated exception.
assert "retry_after_until" not in entry
assert "retry_after" not in entry
@pytest.mark.asyncio
async def test_safe_sync_slash_commands_paces_mutation_writes(monkeypatch):
adapter = DiscordAdapter(PlatformConfig(enabled=True, token="test-token"))
monkeypatch.setattr(
DiscordAdapter,
"_command_sync_mutation_interval_seconds",
lambda self: 1.25,
)
sleeps = []
async def fake_sleep(delay):
sleeps.append(delay)
monkeypatch.setattr(discord_platform.asyncio, "sleep", fake_sleep)
class _DesiredCommand:
def __init__(self, payload):
self._payload = payload
def to_dict(self, tree):
assert tree is not None
return dict(self._payload)
desired_one = {
"name": "status",
"description": "Show Hermes status",
"type": 1,
"options": [],
}
desired_two = {
"name": "debug",
"description": "Generate a debug report",
"type": 1,
"options": [],
}
fake_tree = SimpleNamespace(
get_commands=lambda: [_DesiredCommand(desired_one), _DesiredCommand(desired_two)],
fetch_commands=AsyncMock(return_value=[]),
)
fake_http = SimpleNamespace(
upsert_global_command=AsyncMock(),
edit_global_command=AsyncMock(),
delete_global_command=AsyncMock(),
)
adapter._client = SimpleNamespace(
tree=fake_tree,
http=fake_http,
application_id=999,
user=SimpleNamespace(id=999),
)
summary = await adapter._safe_sync_slash_commands()
assert summary["created"] == 2
assert fake_http.upsert_global_command.await_count == 2
assert sleeps == [1.25]
@pytest.mark.asyncio
async def test_safe_sync_reads_permission_attrs_from_existing_command():
"""Regression: AppCommand.to_dict() in discord.py does NOT include
+294 -33
View File
@@ -2,6 +2,7 @@
import os
import pwd
import subprocess
from pathlib import Path
from types import SimpleNamespace
@@ -90,6 +91,13 @@ class TestSystemdServiceRefresh:
monkeypatch.setattr(gateway_cli, "generate_systemd_unit", lambda system=False, run_as_user=None: "new unit\n")
calls = []
monkeypatch.setattr("gateway.status.get_running_pid", lambda: None)
monkeypatch.setattr(gateway_cli, "_recover_pending_systemd_restart", lambda system=False, previous_pid=None: False)
monkeypatch.setattr(
gateway_cli,
"_wait_for_systemd_service_restart",
lambda system=False, previous_pid=None: calls.append(("wait", system, previous_pid)) or True,
)
def fake_run(cmd, check=True, **kwargs):
calls.append(cmd)
@@ -100,11 +108,12 @@ class TestSystemdServiceRefresh:
gateway_cli.systemd_restart()
assert unit_path.read_text(encoding="utf-8") == "new unit\n"
assert calls[:4] == [
assert calls[:5] == [
["systemctl", "--user", "daemon-reload"],
["systemctl", "--user", "show", gateway_cli.get_service_name(), "--no-pager", "--property", "ActiveState,SubState,Result,ExecMainStatus"],
["systemctl", "--user", "show", gateway_cli.get_service_name(), "--no-pager", "--property", "ActiveState,SubState,Result,ExecMainStatus,MainPID"],
["systemctl", "--user", "reset-failed", gateway_cli.get_service_name()],
["systemctl", "--user", "reload-or-restart", gateway_cli.get_service_name()],
["systemctl", "--user", "restart", gateway_cli.get_service_name()],
("wait", False, None),
]
def test_systemd_stop_marks_running_gateway_as_planned_stop(self, monkeypatch):
@@ -611,62 +620,141 @@ class TestGatewayServiceDetection:
assert gateway_cli._is_service_running() is False
class TestGatewaySystemServiceRouting:
def test_systemd_restart_self_requests_graceful_restart_and_waits(self, monkeypatch, capsys):
def test_systemd_restart_gracefully_restarts_running_service_and_waits(self, monkeypatch, capsys):
calls = []
monkeypatch.setattr(gateway_cli, "_select_systemd_scope", lambda system=False: False)
monkeypatch.setattr(gateway_cli, "_require_service_installed", lambda action, system=False: None)
monkeypatch.setattr(gateway_cli, "refresh_systemd_unit_if_needed", lambda system=False: calls.append(("refresh", system)))
monkeypatch.setattr(gateway_cli, "_get_restart_drain_timeout", lambda: 12.0)
monkeypatch.setattr(
"gateway.status.get_running_pid",
lambda: 654,
)
monkeypatch.setattr(
gateway_cli,
"_request_gateway_self_restart",
lambda pid: calls.append(("self", pid)) or True,
"_graceful_restart_via_sigusr1",
lambda pid, timeout: calls.append(("graceful", pid, timeout)) or True,
)
# Simulate: old process dies immediately, new process becomes active
kill_call_count = [0]
def fake_kill(pid, sig):
kill_call_count[0] += 1
if kill_call_count[0] >= 2: # first call checks, second = dead
raise ProcessLookupError()
monkeypatch.setattr(os, "kill", fake_kill)
# Simulate systemctl reset-failed/start followed by an active unit
new_pid = [None]
# Simulate systemctl reset-failed/restart followed by an active unit.
# A plain start does not break systemd's auto-restart timer once the
# old gateway has exited with the planned restart code.
def fake_subprocess_run(cmd, **kwargs):
if "reset-failed" in cmd:
calls.append(("reset-failed", cmd))
return SimpleNamespace(stdout="", returncode=0)
if "start" in cmd:
calls.append(("start", cmd))
if "restart" in cmd:
calls.append(("restart", cmd))
return SimpleNamespace(stdout="", returncode=0)
if "show" in cmd:
new_pid[0] = 999
return SimpleNamespace(
stdout="ActiveState=active\nSubState=running\nResult=success\nExecMainStatus=0\n",
returncode=0,
)
raise AssertionError(f"Unexpected systemctl call: {cmd}")
monkeypatch.setattr(gateway_cli.subprocess, "run", fake_subprocess_run)
# get_running_pid returns new PID after restart
pid_calls = [0]
def fake_get_pid():
pid_calls[0] += 1
return 999 if pid_calls[0] > 1 else 654
monkeypatch.setattr("gateway.status.get_running_pid", fake_get_pid)
monkeypatch.setattr(
gateway_cli,
"_wait_for_systemd_service_restart",
lambda system=False, previous_pid=None: calls.append(("wait", system, previous_pid)) or True,
)
gateway_cli.systemd_restart()
assert ("self", 654) in calls
assert ("graceful", 654, 17.0) in calls
assert any(call[0] == "reset-failed" for call in calls)
assert any(call[0] == "start" for call in calls)
assert any(call[0] == "restart" for call in calls)
assert ("wait", False, 654) in calls
out = capsys.readouterr().out.lower()
assert "restarted" in out
assert "restarting gracefully" in out
def test_systemd_restart_uses_systemd_main_pid_when_pid_file_is_missing(self, monkeypatch, capsys):
calls = []
monkeypatch.setattr(gateway_cli, "_select_systemd_scope", lambda system=False: False)
monkeypatch.setattr(gateway_cli, "_require_service_installed", lambda action, system=False: None)
monkeypatch.setattr(gateway_cli, "refresh_systemd_unit_if_needed", lambda system=False: None)
monkeypatch.setattr(gateway_cli, "_get_restart_drain_timeout", lambda: 10.0)
monkeypatch.setattr("gateway.status.get_running_pid", lambda: None)
monkeypatch.setattr(
gateway_cli,
"_read_systemd_unit_properties",
lambda system=False: {
"ActiveState": "active",
"SubState": "running",
"Result": "success",
"ExecMainStatus": "0",
"MainPID": "777",
},
)
monkeypatch.setattr(
gateway_cli,
"_graceful_restart_via_sigusr1",
lambda pid, timeout: calls.append(("graceful", pid, timeout)) or True,
)
monkeypatch.setattr(gateway_cli, "_run_systemctl", lambda args, **kwargs: calls.append(args) or SimpleNamespace(stdout="", returncode=0))
monkeypatch.setattr(
gateway_cli,
"_wait_for_systemd_service_restart",
lambda system=False, previous_pid=None: calls.append(("wait", system, previous_pid)) or True,
)
gateway_cli.systemd_restart()
assert ("graceful", 777, 15.0) in calls
assert ("wait", False, 777) in calls
assert "restarting gracefully (pid 777)" in capsys.readouterr().out.lower()
def test_wait_for_systemd_restart_waits_for_runtime_running(self, monkeypatch, capsys):
monkeypatch.setattr(
gateway_cli,
"_read_systemd_unit_properties",
lambda system=False: {
"ActiveState": "active",
"SubState": "running",
"Result": "success",
"ExecMainStatus": "0",
"MainPID": "999",
},
)
monkeypatch.setattr("gateway.status.get_running_pid", lambda: None)
monkeypatch.setattr(
gateway_cli,
"_gateway_runtime_status_for_pid",
lambda pid: {"pid": pid, "gateway_state": "running"},
)
assert gateway_cli._wait_for_systemd_service_restart(previous_pid=777, timeout=0.1) is True
assert "restarted (pid 999)" in capsys.readouterr().out.lower()
def test_systemd_restart_reports_start_limit_hit(self, monkeypatch, capsys):
calls = []
monkeypatch.setattr(gateway_cli, "_select_systemd_scope", lambda system=False: False)
monkeypatch.setattr(gateway_cli, "_require_service_installed", lambda action, system=False: None)
monkeypatch.setattr(gateway_cli, "refresh_systemd_unit_if_needed", lambda system=False: None)
monkeypatch.setattr("gateway.status.get_running_pid", lambda: None)
monkeypatch.setattr(gateway_cli, "_recover_pending_systemd_restart", lambda system=False, previous_pid=None: False)
def fake_run_systemctl(args, **kwargs):
calls.append(args)
if args[0] == "show":
return SimpleNamespace(stdout="ActiveState=inactive\nSubState=dead\nResult=success\nExecMainStatus=0\nMainPID=0\n", stderr="", returncode=0)
if args[0] == "reset-failed":
return SimpleNamespace(stdout="", stderr="", returncode=0)
if args[0] == "restart":
raise subprocess.CalledProcessError(
1,
["systemctl", "--user", *args],
stderr="Job failed. See result 'start-limit-hit'.",
)
raise AssertionError(f"Unexpected args: {args}")
monkeypatch.setattr(gateway_cli, "_run_systemctl", fake_run_systemctl)
gateway_cli.systemd_restart()
assert ["restart", gateway_cli.get_service_name()] in calls
out = capsys.readouterr().out.lower()
assert "rate-limited by systemd" in out
assert "reset-failed" in out
def test_systemd_restart_recovers_failed_planned_restart(self, monkeypatch, capsys):
monkeypatch.setattr(gateway_cli, "_select_systemd_scope", lambda system=False: False)
@@ -711,6 +799,11 @@ class TestGatewaySystemServiceRouting:
"gateway.status.get_running_pid",
lambda: 999 if started["value"] else None,
)
monkeypatch.setattr(
gateway_cli,
"_gateway_runtime_status_for_pid",
lambda pid: {"pid": pid, "gateway_state": "running"},
)
gateway_cli.systemd_restart()
@@ -2177,3 +2270,171 @@ class TestSystemdInstallOffersLegacyRemoval:
assert prompt_called["count"] == 0
assert remove_called["invoked"] is False
class TestSystemScopeRequiresRootError:
"""Tests for the SystemScopeRequiresRootError replacement of sys.exit(1).
Before this change, ``_require_root_for_system_service`` called
``sys.exit(1)`` when non-root code tried a system-scope systemd
operation. The wizard's ``except Exception`` guards don't catch
``SystemExit`` (it's a ``BaseException`` subclass), so the user was
dumped at a bare shell prompt mid-setup. The fix raises a typed
exception instead, which the wizard intercepts and handles with
actionable remediation.
"""
def test_require_root_raises_when_non_root(self, monkeypatch):
monkeypatch.setattr(gateway_cli.os, "geteuid", lambda: 1000)
with pytest.raises(gateway_cli.SystemScopeRequiresRootError) as excinfo:
gateway_cli._require_root_for_system_service("start")
assert excinfo.value.args[0] == "System gateway start requires root. Re-run with sudo."
assert excinfo.value.args[1] == "start"
# str(e) renders only the message, not the tuple repr, so that
# wizard format strings like f"Failed: {e}" print cleanly.
assert str(excinfo.value) == "System gateway start requires root. Re-run with sudo."
assert f"Failed: {excinfo.value}" == "Failed: System gateway start requires root. Re-run with sudo."
def test_require_root_noop_when_root(self, monkeypatch):
monkeypatch.setattr(gateway_cli.os, "geteuid", lambda: 0)
# Should not raise, should not exit
gateway_cli._require_root_for_system_service("start")
def test_error_is_runtime_error_subclass(self):
"""Wizards use ``except Exception`` guards — the error must be a
``RuntimeError`` (catchable by ``Exception``), NOT a ``SystemExit``
(``BaseException``), so the wizard can recover from it.
"""
err = gateway_cli.SystemScopeRequiresRootError("msg", "start")
assert isinstance(err, RuntimeError)
assert isinstance(err, Exception)
assert not isinstance(err, SystemExit)
class TestSystemScopeWizardPreCheck:
"""Tests for _system_scope_wizard_would_need_root — the guard the
wizard uses to detect the dead-end BEFORE prompting the user to start
a service that will fail without sudo.
"""
@staticmethod
def _setup_units(tmp_path, monkeypatch, system_present: bool, user_present: bool):
sys_dir = tmp_path / "sys"
usr_dir = tmp_path / "usr"
sys_dir.mkdir()
usr_dir.mkdir()
if system_present:
(sys_dir / "hermes-gateway.service").write_text("[Unit]\n")
if user_present:
(usr_dir / "hermes-gateway.service").write_text("[Unit]\n")
monkeypatch.setattr(
gateway_cli,
"get_systemd_unit_path",
lambda system=False: (sys_dir if system else usr_dir) / "hermes-gateway.service",
)
def test_non_root_with_only_system_unit_returns_true(self, tmp_path, monkeypatch):
self._setup_units(tmp_path, monkeypatch, system_present=True, user_present=False)
monkeypatch.setattr(gateway_cli.os, "geteuid", lambda: 1000)
assert gateway_cli._system_scope_wizard_would_need_root() is True
def test_root_never_needs_root(self, tmp_path, monkeypatch):
self._setup_units(tmp_path, monkeypatch, system_present=True, user_present=False)
monkeypatch.setattr(gateway_cli.os, "geteuid", lambda: 0)
assert gateway_cli._system_scope_wizard_would_need_root() is False
def test_non_root_with_user_unit_present_returns_false(self, tmp_path, monkeypatch):
# User-scope unit present — user can start it themselves, no sudo needed.
self._setup_units(tmp_path, monkeypatch, system_present=True, user_present=True)
monkeypatch.setattr(gateway_cli.os, "geteuid", lambda: 1000)
assert gateway_cli._system_scope_wizard_would_need_root() is False
def test_non_root_with_no_units_returns_false(self, tmp_path, monkeypatch):
self._setup_units(tmp_path, monkeypatch, system_present=False, user_present=False)
monkeypatch.setattr(gateway_cli.os, "geteuid", lambda: 1000)
assert gateway_cli._system_scope_wizard_would_need_root() is False
def test_non_root_with_explicit_system_arg_returns_true(self, tmp_path, monkeypatch):
# Caller passed system=True explicitly (e.g. ``hermes gateway start --system``).
self._setup_units(tmp_path, monkeypatch, system_present=False, user_present=False)
monkeypatch.setattr(gateway_cli.os, "geteuid", lambda: 1000)
assert gateway_cli._system_scope_wizard_would_need_root(system=True) is True
class TestSystemScopeRemediationOutput:
"""Tests for _print_system_scope_remediation — the actionable guidance
shown when the wizard detects a system-scope-only setup as non-root.
"""
def test_start_remediation_mentions_sudo_systemctl_and_uninstall(self, capsys, monkeypatch):
monkeypatch.setattr(gateway_cli, "get_service_name", lambda: "hermes-gateway")
gateway_cli._print_system_scope_remediation("start")
out = capsys.readouterr().out
assert "system-wide service" in out
assert "start requires root" in out
assert "sudo systemctl start hermes-gateway" in out
assert "sudo hermes gateway uninstall --system" in out
assert "hermes gateway install" in out
def test_restart_remediation_uses_systemctl_restart(self, capsys, monkeypatch):
monkeypatch.setattr(gateway_cli, "get_service_name", lambda: "hermes-gateway")
gateway_cli._print_system_scope_remediation("restart")
out = capsys.readouterr().out
assert "restart requires root" in out
assert "sudo systemctl restart hermes-gateway" in out
def test_stop_remediation_uses_systemctl_stop(self, capsys, monkeypatch):
monkeypatch.setattr(gateway_cli, "get_service_name", lambda: "hermes-gateway")
gateway_cli._print_system_scope_remediation("stop")
out = capsys.readouterr().out
assert "stop requires root" in out
assert "sudo systemctl stop hermes-gateway" in out
class TestGatewayCommandCatchesSystemScopeError:
"""The direct CLI path (``hermes gateway start --system`` etc.) must
still exit 1 with a clean message when non-root. The top-level
``gateway_command`` catches ``SystemScopeRequiresRootError`` and
converts it back to ``sys.exit(1)``, preserving existing CLI behavior.
"""
def test_non_root_system_start_exits_one_with_clean_message(self, tmp_path, monkeypatch, capsys):
sys_dir = tmp_path / "sys"
usr_dir = tmp_path / "usr"
sys_dir.mkdir()
usr_dir.mkdir()
(sys_dir / "hermes-gateway.service").write_text("[Unit]\n")
monkeypatch.setattr(
gateway_cli,
"get_systemd_unit_path",
lambda system=False: (sys_dir if system else usr_dir) / "hermes-gateway.service",
)
monkeypatch.setattr(gateway_cli.os, "geteuid", lambda: 1000)
monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
monkeypatch.setattr(gateway_cli, "kill_gateway_processes", lambda **kw: 0)
args = SimpleNamespace(gateway_command="start", system=True, all=False)
with pytest.raises(SystemExit) as excinfo:
gateway_cli.gateway_command(args)
assert excinfo.value.code == 1
out = capsys.readouterr().out
# Renders the message, NOT the ``('msg', 'action')`` tuple repr
assert "System gateway start requires root. Re-run with sudo." in out
assert "('" not in out # no tuple repr leaking through
+218 -1
View File
@@ -309,6 +309,7 @@ class TestContinuousAPI:
# Isolate from any state left behind by other tests in the session.
monkeypatch.setattr(voice, "_continuous_active", False)
monkeypatch.setattr(voice, "_continuous_stopping", False, raising=False)
monkeypatch.setattr(voice, "_continuous_recorder", None)
assert voice.is_continuous_active() is False
@@ -343,11 +344,20 @@ class TestContinuousAPI:
monkeypatch.setattr(voice, "_continuous_recorder", FakeRecorder())
voice.start_continuous(on_transcript=lambda _t: None)
started = voice.start_continuous(on_transcript=lambda _t: None)
# The guard inside start_continuous short-circuits before rec.start()
assert started is True
assert called["n"] == 0
def test_start_returns_false_while_stopping(self, monkeypatch):
import hermes_cli.voice as voice
monkeypatch.setattr(voice, "_continuous_active", False)
monkeypatch.setattr(voice, "_continuous_stopping", True, raising=False)
assert voice.start_continuous(on_transcript=lambda _t: None) is False
class TestContinuousLoopSimulation:
"""End-to-end simulation of the VAD loop with a fake recorder.
@@ -368,6 +378,8 @@ class TestContinuousLoopSimulation:
monkeypatch.setattr(voice, "_continuous_on_transcript", None)
monkeypatch.setattr(voice, "_continuous_on_status", None)
monkeypatch.setattr(voice, "_continuous_on_silent_limit", None)
monkeypatch.setattr(voice, "_continuous_auto_restart", True, raising=False)
monkeypatch.setattr(voice, "_play_beep", lambda *_, **__: None)
class FakeRecorder:
_silence_threshold = 200
@@ -381,13 +393,20 @@ class TestContinuousLoopSimulation:
self.cancelled = 0
# Preset WAV path returned by stop()
self.next_stop_wav = "/tmp/fake.wav"
self.fail_stop = False
self.fail_next_start = False
def start(self, on_silence_stop=None):
if self.fail_next_start:
self.fail_next_start = False
raise RuntimeError("boom")
self.start_calls += 1
self.last_callback = on_silence_stop
self.is_recording = True
def stop(self):
if self.fail_stop:
raise RuntimeError("stop failed")
self.stopped += 1
self.is_recording = False
return self.next_stop_wav
@@ -433,6 +452,204 @@ class TestContinuousLoopSimulation:
voice.stop_continuous()
def test_auto_restart_false_stops_after_first_transcript(self, fake_recorder, monkeypatch):
import hermes_cli.voice as voice
monkeypatch.setattr(
voice,
"transcribe_recording",
lambda _p: {"success": True, "transcript": "single shot"},
)
monkeypatch.setattr(voice, "is_whisper_hallucination", lambda _t: False)
transcripts = []
statuses = []
voice.start_continuous(
on_transcript=lambda t: transcripts.append(t),
on_status=lambda s: statuses.append(s),
auto_restart=False,
)
fake_recorder.last_callback()
assert transcripts == ["single shot"]
assert fake_recorder.start_calls == 1
assert statuses == ["listening", "transcribing", "idle"]
assert voice.is_continuous_active() is False
def test_auto_restart_false_retains_silent_strikes_across_starts(
self, fake_recorder, monkeypatch
):
import hermes_cli.voice as voice
monkeypatch.setattr(
voice,
"transcribe_recording",
lambda _p: {"success": True, "transcript": ""},
)
monkeypatch.setattr(voice, "is_whisper_hallucination", lambda _t: False)
silent_limit_fired = []
for _ in range(3):
voice.start_continuous(
on_transcript=lambda _t: None,
on_silent_limit=lambda: silent_limit_fired.append(True),
auto_restart=False,
)
fake_recorder.last_callback()
assert silent_limit_fired == [True]
assert voice.is_continuous_active() is False
assert fake_recorder.start_calls == 3
def test_force_transcribe_stop_delivers_current_buffer(self, fake_recorder, monkeypatch):
import hermes_cli.voice as voice
class ImmediateThread:
def __init__(self, target, daemon=False):
self.target = target
def start(self):
self.target()
monkeypatch.setattr(voice.threading, "Thread", ImmediateThread)
monkeypatch.setattr(
voice,
"transcribe_recording",
lambda _p: {"success": True, "transcript": "manual stop"},
)
monkeypatch.setattr(voice, "is_whisper_hallucination", lambda _t: False)
transcripts = []
statuses = []
voice.start_continuous(
on_transcript=lambda t: transcripts.append(t),
on_status=lambda s: statuses.append(s),
)
voice.stop_continuous(force_transcribe=True)
assert fake_recorder.stopped == 1
assert transcripts == ["manual stop"]
assert statuses == ["listening", "transcribing", "idle"]
assert voice.is_continuous_active() is False
def test_force_transcribe_empty_single_shots_hit_silent_limit(
self, fake_recorder, monkeypatch
):
import hermes_cli.voice as voice
class ImmediateThread:
def __init__(self, target, daemon=False):
self.target = target
def start(self):
self.target()
monkeypatch.setattr(voice.threading, "Thread", ImmediateThread)
monkeypatch.setattr(
voice,
"transcribe_recording",
lambda _p: {"success": True, "transcript": ""},
)
monkeypatch.setattr(voice, "is_whisper_hallucination", lambda _t: False)
silent_limit_fired = []
for _ in range(3):
voice.start_continuous(
on_transcript=lambda _t: None,
on_silent_limit=lambda: silent_limit_fired.append(True),
auto_restart=False,
)
voice.stop_continuous(force_transcribe=True)
assert silent_limit_fired == [True]
assert fake_recorder.stopped == 3
assert voice._continuous_no_speech_count == 0
def test_force_transcribe_valid_single_shot_resets_silent_strikes(
self, fake_recorder, monkeypatch
):
import hermes_cli.voice as voice
class ImmediateThread:
def __init__(self, target, daemon=False):
self.target = target
def start(self):
self.target()
monkeypatch.setattr(voice.threading, "Thread", ImmediateThread)
monkeypatch.setattr(voice, "_continuous_no_speech_count", 2)
monkeypatch.setattr(
voice,
"transcribe_recording",
lambda _p: {"success": True, "transcript": "manual stop"},
)
monkeypatch.setattr(voice, "is_whisper_hallucination", lambda _t: False)
transcripts = []
silent_limit_fired = []
voice.start_continuous(
on_transcript=lambda t: transcripts.append(t),
on_silent_limit=lambda: silent_limit_fired.append(True),
auto_restart=False,
)
voice.stop_continuous(force_transcribe=True)
assert transcripts == ["manual stop"]
assert silent_limit_fired == []
assert voice._continuous_no_speech_count == 0
def test_force_transcribe_stop_failure_cancels_and_clears_stopping(
self, fake_recorder, monkeypatch
):
import hermes_cli.voice as voice
class ImmediateThread:
def __init__(self, target, daemon=False):
self.target = target
def start(self):
self.target()
monkeypatch.setattr(voice.threading, "Thread", ImmediateThread)
fake_recorder.fail_stop = True
statuses = []
voice.start_continuous(
on_transcript=lambda _t: None,
on_status=lambda s: statuses.append(s),
)
voice.stop_continuous(force_transcribe=True)
assert fake_recorder.cancelled == 1
assert statuses == ["listening", "transcribing", "idle"]
assert voice.is_continuous_active() is False
assert voice._continuous_stopping is False
def test_restart_failure_reports_idle(self, fake_recorder, monkeypatch):
import hermes_cli.voice as voice
monkeypatch.setattr(
voice,
"transcribe_recording",
lambda _p: {"success": True, "transcript": "hello world"},
)
monkeypatch.setattr(voice, "is_whisper_hallucination", lambda _t: False)
statuses = []
voice.start_continuous(on_transcript=lambda _t: None, on_status=statuses.append)
fake_recorder.fail_next_start = True
fake_recorder.last_callback()
assert statuses == ["listening", "transcribing", "idle"]
assert voice.is_continuous_active() is False
def test_silent_limit_halts_loop_after_three_strikes(self, fake_recorder, monkeypatch):
import hermes_cli.voice as voice
+75
View File
@@ -204,6 +204,7 @@ def test_voice_record_start_handles_non_dict_voice_cfg(monkeypatch):
assert resp["result"]["status"] == "recording"
assert captured["silence_threshold"] == 200
assert captured["silence_duration"] == 3.0
assert captured["auto_restart"] is False
# Round-12 Copilot review regression on #19835: ``bool`` is a subclass
# of ``int``, so the naive ``isinstance(threshold, (int, float))``
@@ -232,6 +233,80 @@ def test_voice_record_start_handles_non_dict_voice_cfg(monkeypatch):
assert (
captured["silence_duration"] == 3.0
), f"bool silence_duration leaked through for {bad_bool_cfg!r}"
assert captured["auto_restart"] is False
def test_voice_record_stop_forces_transcription(monkeypatch):
captured: dict = {}
def fake_stop_continuous(**kwargs):
captured.update(kwargs)
monkeypatch.setitem(
sys.modules,
"hermes_cli.voice",
types.SimpleNamespace(
start_continuous=lambda **_kwargs: None,
stop_continuous=fake_stop_continuous,
),
)
resp = server.dispatch(
{
"id": "voice-record-stop",
"method": "voice.record",
"params": {"action": "stop"},
}
)
assert resp["result"]["status"] == "stopped"
assert captured["force_transcribe"] is True
def test_voice_record_stop_updates_event_session_id(monkeypatch):
monkeypatch.setitem(
sys.modules,
"hermes_cli.voice",
types.SimpleNamespace(
start_continuous=lambda **_kwargs: True,
stop_continuous=lambda **_kwargs: None,
),
)
monkeypatch.setattr(server, "_voice_event_sid", "old-session")
resp = server.dispatch(
{
"id": "voice-record-stop-session",
"method": "voice.record",
"params": {"action": "stop", "session_id": "new-session"},
}
)
assert resp["result"]["status"] == "stopped"
assert server._voice_event_sid == "new-session"
def test_voice_record_start_reports_busy_when_stop_is_in_progress(monkeypatch):
monkeypatch.setitem(
sys.modules,
"hermes_cli.voice",
types.SimpleNamespace(
start_continuous=lambda **_kwargs: False,
stop_continuous=lambda **_kwargs: None,
),
)
monkeypatch.setenv("HERMES_VOICE", "1")
monkeypatch.setattr(server, "_load_cfg", lambda: {"voice": {}})
resp = server.dispatch(
{
"id": "voice-record-busy",
"method": "voice.record",
"params": {"action": "start"},
}
)
assert resp["result"]["status"] == "busy"
def test_voice_toggle_tts_branch_also_carries_record_key(monkeypatch):
@@ -0,0 +1,648 @@
"""Tests for compositor-level coordinate click (browser_click with x/y params).
Covers:
- Input validation (ref vs x/y mutually exclusive)
- CDP coordinate click path (via mock CDP server)
- agent-browser mouse fallback path
- Camofox passthrough still works with ref
"""
from __future__ import annotations
import asyncio
import json
import threading
from typing import Any, Dict, List
import pytest
import websockets
from websockets.asyncio.server import serve
# ---------------------------------------------------------------------------
# In-process CDP mock server (reused from test_browser_cdp_tool.py)
# ---------------------------------------------------------------------------
class _CDPServer:
"""Tiny CDP mock — replies to registered method handlers."""
def __init__(self) -> None:
self._handlers: Dict[str, Any] = {}
self._responses: List[Dict[str, Any]] = []
self._loop: asyncio.AbstractEventLoop | None = None
self._server: Any = None
self._thread: threading.Thread | None = None
self._host = "127.0.0.1"
self._port = 0
self._url: str = ""
def on(self, method: str, handler):
self._handlers[method] = handler
def start(self) -> str:
ready = threading.Event()
def _run() -> None:
self._loop = asyncio.new_event_loop()
asyncio.set_event_loop(self._loop)
async def _handler(ws):
try:
async for raw in ws:
msg = json.loads(raw)
call_id = msg.get("id")
method = msg.get("method", "")
params = msg.get("params", {}) or {}
session_id = msg.get("sessionId")
self._responses.append(msg)
fn = self._handlers.get(method)
if fn is None:
reply = {
"id": call_id,
"error": {"code": -32601, "message": f"No handler for {method}"},
}
else:
try:
result = fn(params, session_id)
reply = {"id": call_id, "result": result}
except Exception as exc:
reply = {"id": call_id, "error": {"code": -1, "message": str(exc)}}
if session_id:
reply["sessionId"] = session_id
await ws.send(json.dumps(reply))
except websockets.exceptions.ConnectionClosed:
pass
async def _serve() -> None:
self._server = await serve(_handler, self._host, 0)
sock = next(iter(self._server.sockets))
self._port = sock.getsockname()[1]
ready.set()
await self._server.wait_closed()
try:
self._loop.run_until_complete(_serve())
finally:
self._loop.close()
self._thread = threading.Thread(target=_run, daemon=True)
self._thread.start()
if not ready.wait(timeout=5.0):
raise RuntimeError("CDP mock server failed to start")
self._url = f"ws://{self._host}:{self._port}/devtools/browser/mock"
return self._url
def stop(self) -> None:
if self._loop and self._server:
self._loop.call_soon_threadsafe(self._server.close)
if self._thread:
self._thread.join(timeout=3.0)
def received(self) -> List[Dict[str, Any]]:
return list(self._responses)
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
@pytest.fixture
def cdp_server(monkeypatch):
"""Start a CDP mock and point browser_cdp_tool's resolver at it."""
server = _CDPServer()
ws_url = server.start()
import tools.browser_cdp_tool as cdp_mod
monkeypatch.setattr(cdp_mod, "_resolve_cdp_endpoint", lambda: ws_url)
# clear the session cache so each test starts fresh
from tools import browser_tool as _bt
_bt._CDP_SESSION_CACHE.clear()
try:
yield server
finally:
_bt._CDP_SESSION_CACHE.clear()
server.stop()
# ---------------------------------------------------------------------------
# Input validation
# ---------------------------------------------------------------------------
class TestClickInputValidation:
"""browser_click validates that exactly one of ref / (x,y) is provided."""
def test_neither_ref_nor_coords(self):
from tools.browser_tool import browser_click
result = json.loads(browser_click())
assert result["success"] is False
assert "ref" in result["error"].lower() or "x" in result["error"].lower()
def test_both_ref_and_coords(self):
from tools.browser_tool import browser_click
result = json.loads(browser_click(ref="@e1", x=100, y=200))
assert result["success"] is False
assert "not both" in result["error"].lower()
def test_x_without_y(self):
from tools.browser_tool import browser_click
result = json.loads(browser_click(x=100))
assert result["success"] is False
assert "both" in result["error"].lower()
def test_y_without_x(self):
from tools.browser_tool import browser_click
result = json.loads(browser_click(y=200))
assert result["success"] is False
assert "both" in result["error"].lower()
def test_empty_ref_treated_as_missing(self):
from tools.browser_tool import browser_click
result = json.loads(browser_click(ref=""))
assert result["success"] is False
assert "ref" in result["error"].lower() or "x" in result["error"].lower()
def test_non_numeric_coordinates(self):
from tools.browser_tool import browser_click
result = json.loads(browser_click(x="abc", y="def"))
assert result["success"] is False
assert "number" in result["error"].lower()
# ---------------------------------------------------------------------------
# CDP coordinate click (happy path via mock server)
# ---------------------------------------------------------------------------
class TestCDPCoordinateClick:
"""Coordinate clicks via CDP Input.dispatchMouseEvent."""
def test_cdp_click_dispatches_press_and_release(self, cdp_server):
from tools.browser_tool import browser_click
# Register handlers for the protocol calls
cdp_server.on(
"Target.getTargets",
lambda p, s: {
"targetInfos": [
{"targetId": "page-1", "type": "page", "attached": True, "url": "https://example.com"},
]
},
)
cdp_server.on(
"Target.attachToTarget",
lambda p, s: {"sessionId": f"sess-{p['targetId']}"},
)
cdp_server.on(
"Input.dispatchMouseEvent",
lambda p, s: {},
)
result = json.loads(browser_click(x=150, y=300))
assert result["success"] is True
assert result["clicked_at"] == {"x": 150, "y": 300}
assert result["method"] == "cdp_compositor"
# Verify the CDP calls: Target.getTargets, attach, mousePressed, attach, mouseReleased
calls = cdp_server.received()
methods = [c["method"] for c in calls]
assert "Target.getTargets" in methods
assert "Input.dispatchMouseEvent" in methods
# Find the mouse events
mouse_events = [c for c in calls if c["method"] == "Input.dispatchMouseEvent"]
assert len(mouse_events) == 2
assert mouse_events[0]["params"]["type"] == "mousePressed"
assert mouse_events[0]["params"]["x"] == 150
assert mouse_events[0]["params"]["y"] == 300
assert mouse_events[0]["params"]["button"] == "left"
assert mouse_events[1]["params"]["type"] == "mouseReleased"
def test_cdp_click_rounds_float_coordinates(self, cdp_server):
from tools.browser_tool import browser_click
cdp_server.on(
"Target.getTargets",
lambda p, s: {"targetInfos": [{"targetId": "p1", "type": "page", "attached": True, "url": "..."}]},
)
cdp_server.on("Target.attachToTarget", lambda p, s: {"sessionId": "s1"})
cdp_server.on("Input.dispatchMouseEvent", lambda p, s: {})
result = json.loads(browser_click(x=150.7, y=299.3))
assert result["success"] is True
assert result["clicked_at"] == {"x": 151, "y": 299}
def test_cdp_click_no_page_target_still_works(self, cdp_server):
"""When Target.getTargets returns no page targets, click proceeds without target_id."""
from tools.browser_tool import browser_click
cdp_server.on(
"Target.getTargets",
lambda p, s: {"targetInfos": [{"targetId": "sw1", "type": "service_worker"}]},
)
# No Target.attachToTarget needed — page_target is None so _cdp_call
# sends without attaching
cdp_server.on("Input.dispatchMouseEvent", lambda p, s: {})
result = json.loads(browser_click(x=50, y=50))
assert result["success"] is True
assert result["clicked_at"] == {"x": 50, "y": 50}
def test_cdp_dispatch_mouse_event_failure(self, cdp_server):
"""When Input.dispatchMouseEvent returns a CDP error, return failure."""
from tools.browser_tool import browser_click
cdp_server.on(
"Target.getTargets",
lambda p, s: {"targetInfos": [{"targetId": "p1", "type": "page", "attached": True, "url": "..."}]},
)
cdp_server.on("Target.attachToTarget", lambda p, s: {"sessionId": "s1"})
# No handler for Input.dispatchMouseEvent — server returns CDP error
result = json.loads(browser_click(x=100, y=200))
assert result["success"] is False
assert "CDP coordinate click failed" in result["error"]
# ---------------------------------------------------------------------------
# agent-browser mouse fallback
# ---------------------------------------------------------------------------
class TestAgentBrowserMouseFallback:
"""When no CDP endpoint is available, fall back to agent-browser mouse commands."""
def test_falls_back_to_agent_browser_mouse(self, monkeypatch):
from tools import browser_tool, browser_cdp_tool
# No CDP endpoint available
monkeypatch.setattr(browser_cdp_tool, "_resolve_cdp_endpoint", lambda: "")
# Mock _run_browser_command and _last_session_key
commands_sent = []
def mock_run_cmd(task_id, command, args=None, timeout=None):
commands_sent.append((command, args))
return {"success": True}
monkeypatch.setattr(browser_tool, "_run_browser_command", mock_run_cmd)
monkeypatch.setattr(browser_tool, "_last_session_key", lambda tid: tid)
result = json.loads(browser_tool.browser_click(x=200, y=400))
assert result["success"] is True
assert result["clicked_at"] == {"x": 200, "y": 400}
assert result["method"] == "agent_browser_mouse"
# Should have sent: mouse move, mouse down, mouse up
assert len(commands_sent) == 3
assert commands_sent[0] == ("mouse", ["move", "200", "400"])
assert commands_sent[1] == ("mouse", ["down"])
assert commands_sent[2] == ("mouse", ["up"])
def test_mouse_move_failure_returns_error(self, monkeypatch):
from tools import browser_tool, browser_cdp_tool
monkeypatch.setattr(browser_cdp_tool, "_resolve_cdp_endpoint", lambda: "")
def mock_run_cmd(task_id, command, args=None, timeout=None):
if args and args[0] == "move":
return {"success": False, "error": "mouse move not supported"}
return {"success": True}
monkeypatch.setattr(browser_tool, "_run_browser_command", mock_run_cmd)
monkeypatch.setattr(browser_tool, "_last_session_key", lambda tid: tid)
result = json.loads(browser_tool.browser_click(x=100, y=100))
assert result["success"] is False
assert "mouse move" in result["error"]
def test_mouse_down_failure_returns_error(self, monkeypatch):
from tools import browser_tool, browser_cdp_tool
monkeypatch.setattr(browser_cdp_tool, "_resolve_cdp_endpoint", lambda: "")
def mock_run_cmd(task_id, command, args=None, timeout=None):
if args and args[0] == "down":
return {"success": False, "error": "mouse down failed"}
return {"success": True}
monkeypatch.setattr(browser_tool, "_run_browser_command", mock_run_cmd)
monkeypatch.setattr(browser_tool, "_last_session_key", lambda tid: tid)
result = json.loads(browser_tool.browser_click(x=100, y=100))
assert result["success"] is False
assert "mouse down" in result["error"]
def test_mouse_up_failure_returns_error(self, monkeypatch):
from tools import browser_tool, browser_cdp_tool
monkeypatch.setattr(browser_cdp_tool, "_resolve_cdp_endpoint", lambda: "")
def mock_run_cmd(task_id, command, args=None, timeout=None):
if args and args[0] == "up":
return {"success": False, "error": "mouse up failed"}
return {"success": True}
monkeypatch.setattr(browser_tool, "_run_browser_command", mock_run_cmd)
monkeypatch.setattr(browser_tool, "_last_session_key", lambda tid: tid)
result = json.loads(browser_tool.browser_click(x=100, y=100))
assert result["success"] is False
assert "mouse up" in result["error"]
# ---------------------------------------------------------------------------
# Ref-based click unchanged
# ---------------------------------------------------------------------------
class TestRefClickPreserved:
"""Existing ref-based click behavior is unchanged."""
def test_ref_click_still_works(self, monkeypatch):
from tools import browser_tool
monkeypatch.setattr(browser_tool, "_is_camofox_mode", lambda: False)
monkeypatch.setattr(browser_tool, "_last_session_key", lambda tid: tid)
def mock_run_cmd(task_id, command, args=None, timeout=None):
return {"success": True}
monkeypatch.setattr(browser_tool, "_run_browser_command", mock_run_cmd)
result = json.loads(browser_tool.browser_click(ref="@e5"))
assert result["success"] is True
assert result["clicked"] == "@e5"
def test_ref_without_at_prefix_auto_added(self, monkeypatch):
from tools import browser_tool
monkeypatch.setattr(browser_tool, "_is_camofox_mode", lambda: False)
monkeypatch.setattr(browser_tool, "_last_session_key", lambda tid: tid)
clicked_refs = []
def mock_run_cmd(task_id, command, args=None, timeout=None):
clicked_refs.append(args)
return {"success": True}
monkeypatch.setattr(browser_tool, "_run_browser_command", mock_run_cmd)
browser_tool.browser_click(ref="e12")
assert clicked_refs[0] == ["@e12"]
# ---------------------------------------------------------------------------
# Schema check
# ---------------------------------------------------------------------------
class TestSchemaUpdated:
"""The tool schema reflects x/y params and ref is no longer required."""
def test_schema_has_x_y_properties(self):
from tools.browser_tool import _BROWSER_SCHEMA_MAP
schema = _BROWSER_SCHEMA_MAP["browser_click"]
props = schema["parameters"]["properties"]
assert "x" in props
assert "y" in props
assert props["x"]["type"] == "number"
assert props["y"]["type"] == "number"
def test_schema_no_required_fields(self):
from tools.browser_tool import _BROWSER_SCHEMA_MAP
schema = _BROWSER_SCHEMA_MAP["browser_click"]
# ref is no longer required — either ref or x+y
assert "required" not in schema["parameters"] or schema["parameters"]["required"] == []
def test_schema_ref_still_present(self):
from tools.browser_tool import _BROWSER_SCHEMA_MAP
schema = _BROWSER_SCHEMA_MAP["browser_click"]
assert "ref" in schema["parameters"]["properties"]
# ---------------------------------------------------------------------------
# Registry integration
# ---------------------------------------------------------------------------
class TestRegistryIntegration:
"""browser_click is registered with x/y params wired through."""
def test_dispatch_with_coordinates(self, monkeypatch, cdp_server):
from tools.registry import registry
cdp_server.on(
"Target.getTargets",
lambda p, s: {"targetInfos": [{"targetId": "p1", "type": "page", "attached": True, "url": "..."}]},
)
cdp_server.on("Target.attachToTarget", lambda p, s: {"sessionId": "s1"})
cdp_server.on("Input.dispatchMouseEvent", lambda p, s: {})
raw = registry.dispatch(
"browser_click", {"x": 42, "y": 84}, task_id="t1"
)
result = json.loads(raw)
assert result["success"] is True
assert result["clicked_at"] == {"x": 42, "y": 84}
def test_dispatch_with_ref(self, monkeypatch):
from tools import browser_tool
from tools.registry import registry
monkeypatch.setattr(browser_tool, "_is_camofox_mode", lambda: False)
monkeypatch.setattr(browser_tool, "_last_session_key", lambda tid: tid)
monkeypatch.setattr(
browser_tool, "_run_browser_command",
lambda tid, cmd, args=None, timeout=None: {"success": True},
)
raw = registry.dispatch("browser_click", {"ref": "@e3"}, task_id="t1")
result = json.loads(raw)
assert result["success"] is True
# ---------------------------------------------------------------------------
# Session caching
# ---------------------------------------------------------------------------
class TestSessionCaching:
"""Second click skips Target.getTargets + Target.attachToTarget."""
def test_second_click_skips_session_resolution(self, cdp_server, monkeypatch):
"""After first click the session_id is cached; second click goes straight
to mousePressed+mouseReleased without re-issuing getTargets/attachToTarget."""
from tools import browser_tool
import tools.browser_cdp_tool as cdp_mod
# clear cache
browser_tool._CDP_SESSION_CACHE.clear()
monkeypatch.setattr(cdp_mod, "_resolve_cdp_endpoint", lambda: cdp_server._url)
resolve_count = {"n": 0}
def _getTargets(p, s):
resolve_count["n"] += 1
return {"targetInfos": [{"targetId": "p1", "type": "page", "attached": True, "url": "..."}]}
cdp_server.on("Target.getTargets", _getTargets)
cdp_server.on("Target.attachToTarget", lambda p, s: {"sessionId": "sess-cached"})
cdp_server.on("Input.dispatchMouseEvent", lambda p, s: {})
# First click — must call getTargets
r1 = json.loads(browser_tool.browser_click(x=10.0, y=20.0))
assert r1["success"] is True
assert resolve_count["n"] == 1
# Second click — cache hit; getTargets must NOT be called again
r2 = json.loads(browser_tool.browser_click(x=30.0, y=40.0))
assert r2["success"] is True
assert resolve_count["n"] == 1, "session resolution was repeated despite warm cache"
def test_stale_session_triggers_reattach(self, cdp_server, monkeypatch):
"""If the browser returns 'Session with given id not found', the cache is
cleared and session resolution runs again before retrying the click."""
from tools import browser_tool
import tools.browser_cdp_tool as cdp_mod
browser_tool._CDP_SESSION_CACHE.clear()
monkeypatch.setattr(cdp_mod, "_resolve_cdp_endpoint", lambda: cdp_server._url)
call_count = {"mouse": 0, "resolve": 0}
def _getTargets(p, s):
call_count["resolve"] += 1
return {"targetInfos": [{"targetId": "px", "type": "page", "attached": True, "url": "..."}]}
def _dispatch(p, s):
call_count["mouse"] += 1
# First two mouse calls (with stale session) return an error;
# after re-resolve they should succeed
if call_count["mouse"] <= 2:
raise RuntimeError("Session with given id not found: stale-session-id")
return {}
cdp_server.on("Target.getTargets", _getTargets)
cdp_server.on("Target.attachToTarget", lambda p, s: {"sessionId": f"sess-{call_count['resolve']}"})
cdp_server.on("Input.dispatchMouseEvent", _dispatch)
# Seed cache with stale session to trigger the error path
browser_tool._CDP_SESSION_CACHE[cdp_server._url] = "stale-session-id"
r = json.loads(browser_tool.browser_click(x=50.0, y=60.0))
assert r["success"] is True
# Must have resolved the session once (after evicting stale entry)
assert call_count["resolve"] >= 1
def test_cache_cleared_on_endpoint_change(self, monkeypatch):
"""Cache is keyed per endpoint URL; different URL doesn't reuse cached session."""
from tools import browser_tool
browser_tool._CDP_SESSION_CACHE.clear()
browser_tool._CDP_SESSION_CACHE["ws://endpoint-a/"] = "sess-a"
# Endpoint B must not find endpoint A's session
assert browser_tool._CDP_SESSION_CACHE.get("ws://endpoint-b/") is None
# ---------------------------------------------------------------------------
# Supervisor path
# ---------------------------------------------------------------------------
class TestSupervisorPath:
"""When a CDPSupervisor is alive for the task_id, coordinate clicks use its
persistent WS connection zero per-click connection setup cost."""
def test_supervisor_path_used_when_supervisor_alive(self, monkeypatch):
"""browser_click delegates to the supervisor when one is registered."""
from tools import browser_tool
clicks = []
class _FakeSupervisor:
def dispatch_mouse_click(self, x, y, button="left", timeout=10.0):
clicks.append((x, y, button))
class _FakeRegistry:
def get(self, task_id):
return _FakeSupervisor()
import tools.browser_supervisor as bs_mod
monkeypatch.setattr(bs_mod, "SUPERVISOR_REGISTRY", _FakeRegistry())
result = json.loads(browser_tool.browser_click(x=77.0, y=88.0, task_id="t1"))
assert result["success"] is True
assert result["method"] == "cdp_supervisor"
assert result["clicked_at"] == {"x": 77, "y": 88}
assert clicks == [(77, 88, "left")]
def test_supervisor_error_falls_through_to_per_click(self, monkeypatch, cdp_server):
"""If dispatch_mouse_click raises, the per-click WS path is used instead."""
from tools import browser_tool
import tools.browser_supervisor as bs_mod
import tools.browser_cdp_tool as cdp_mod
browser_tool._CDP_SESSION_CACHE.clear()
monkeypatch.setattr(cdp_mod, "_resolve_cdp_endpoint", lambda: cdp_server._url)
class _BrokenSupervisor:
def dispatch_mouse_click(self, x, y, button="left", timeout=10.0):
raise RuntimeError("supervisor WS disconnected")
class _BrokenRegistry:
def get(self, task_id):
return _BrokenSupervisor()
monkeypatch.setattr(bs_mod, "SUPERVISOR_REGISTRY", _BrokenRegistry())
cdp_server.on("Target.getTargets", lambda p, s: {
"targetInfos": [{"targetId": "p1", "type": "page", "attached": True, "url": "..."}]
})
cdp_server.on("Target.attachToTarget", lambda p, s: {"sessionId": "s1"})
cdp_server.on("Input.dispatchMouseEvent", lambda p, s: {})
result = json.loads(browser_tool.browser_click(x=10.0, y=20.0, task_id="t2"))
assert result["success"] is True
# Should have fallen through to per-click path (cdp_compositor, not cdp_supervisor)
assert result["method"] == "cdp_compositor"
def test_no_supervisor_uses_per_click_path(self, monkeypatch, cdp_server):
"""When SUPERVISOR_REGISTRY.get() returns None, the per-click WS path runs."""
from tools import browser_tool
import tools.browser_supervisor as bs_mod
import tools.browser_cdp_tool as cdp_mod
browser_tool._CDP_SESSION_CACHE.clear()
monkeypatch.setattr(cdp_mod, "_resolve_cdp_endpoint", lambda: cdp_server._url)
class _EmptyRegistry:
def get(self, task_id):
return None
monkeypatch.setattr(bs_mod, "SUPERVISOR_REGISTRY", _EmptyRegistry())
cdp_server.on("Target.getTargets", lambda p, s: {
"targetInfos": [{"targetId": "p1", "type": "page", "attached": True, "url": "..."}]
})
cdp_server.on("Target.attachToTarget", lambda p, s: {"sessionId": "s1"})
cdp_server.on("Input.dispatchMouseEvent", lambda p, s: {})
result = json.loads(browser_tool.browser_click(x=5.0, y=6.0, task_id="t3"))
assert result["success"] is True
assert result["method"] == "cdp_compositor"
+51 -1
View File
@@ -457,7 +457,57 @@ class CDPSupervisor:
return {"ok": False, "error": f"{type(e).__name__}: {e}"}
return {"ok": True, "dialog": snapshot_copy.to_dict()}
# ── Supervisor loop internals ────────────────────────────────────────────
def dispatch_mouse_click(
self,
x: int,
y: int,
button: str = "left",
timeout: float = 10.0,
) -> None:
"""Dispatch a compositor-level click over the supervisor's live WS.
Uses the supervisor's already-connected WebSocket — zero connection
setup cost vs opening a fresh WS per click. mousePressed and
mouseReleased are both sent before awaiting either response
(pipelined), following the Playwright Promise.all pattern.
Raises RuntimeError if the supervisor is inactive or the click fails.
"""
loop = self._loop
if loop is None or not loop.is_running():
raise RuntimeError("supervisor loop is not running")
with self._state_lock:
if not self._active:
raise RuntimeError("supervisor is not active")
session_id = self._page_session_id
async def _do_click() -> None:
mouse_params = {"x": x, "y": y, "button": button, "clickCount": 1}
# Pipeline both events — send without awaiting press ack.
# Browser processes CDP messages in order; if mouseReleased is
# acked, mousePressed has already been applied.
press_fut = asyncio.create_task(
self._cdp("Input.dispatchMouseEvent",
{**mouse_params, "type": "mousePressed"},
session_id=session_id, timeout=timeout)
)
release_fut = asyncio.create_task(
self._cdp("Input.dispatchMouseEvent",
{**mouse_params, "type": "mouseReleased"},
session_id=session_id, timeout=timeout)
)
await asyncio.gather(press_fut, release_fut)
try:
fut = asyncio.run_coroutine_threadsafe(_do_click(), loop)
fut.result(timeout=timeout + 1)
except Exception as exc:
raise RuntimeError(
f"supervisor mouse click failed: {type(exc).__name__}: {exc}"
) from exc
def _thread_main(self) -> None:
"""Entry point for the supervisor's dedicated thread."""
+346 -6
View File
@@ -1317,16 +1317,23 @@ BROWSER_TOOL_SCHEMAS = [
},
{
"name": "browser_click",
"description": "Click on an element identified by its ref ID from the snapshot (e.g., '@e5'). The ref IDs are shown in square brackets in the snapshot output. Requires browser_navigate and browser_snapshot to be called first.",
"description": "Click on an element identified by its ref ID from the snapshot (e.g., '@e5'). The ref IDs are shown in square brackets in the snapshot output. Requires browser_navigate and browser_snapshot to be called first.\n\nAlternatively, click at exact viewport coordinates (x, y) using compositor-level input. This bypasses DOM selectors entirely — clicks pass through iframes, shadow DOM, cross-origin boundaries, and canvas elements. Use browser_vision with annotate=true to find coordinates, or browser_console to evaluate getBoundingClientRect(). Provide EITHER ref OR (x + y), not both.",
"parameters": {
"type": "object",
"properties": {
"ref": {
"type": "string",
"description": "The element reference from the snapshot (e.g., '@e5', '@e12')"
},
"x": {
"type": "number",
"description": "Viewport X coordinate for compositor-level click. Use with y instead of ref to click through iframes, shadow DOM, or canvas elements."
},
"y": {
"type": "number",
"description": "Viewport Y coordinate for compositor-level click. Use with x instead of ref."
}
},
"required": ["ref"]
}
}
},
{
@@ -2286,17 +2293,350 @@ def browser_snapshot(
return json.dumps(_copy_fallback_warning(response, result), ensure_ascii=False)
def browser_click(ref: str, task_id: Optional[str] = None) -> str:
# ---------------------------------------------------------------------------
# Session cache for CDP coordinate clicks
#
# Target.getTargets + Target.attachToTarget cost one round-trip each and
# their results (page targetId + session_id) are stable across clicks on
# the same page. We cache them keyed by CDP endpoint URL and invalidate
# automatically when the browser reports a stale session error.
#
# Pattern: browser-harness daemon keeps session_id on the daemon object and
# retries once on "Session with given id not found" to self-heal after
# navigation or crash. We replicate that here without a persistent daemon
# process by storing it in a module-level dict.
# ---------------------------------------------------------------------------
_CDP_SESSION_CACHE: dict[str, str] = {} # ws_url → cached session_id
async def _cdp_resolve_session(
ws: Any,
ws_url: str,
deadline: float,
msg_id_ref: list,
) -> Optional[str]:
"""Resolve (and cache) the page-scoped CDP session ID.
Sends Target.getTargets + Target.attachToTarget on *ws* and caches the
resulting session_id for future clicks. Returns None if no page target
is found (Input.dispatchMouseEvent will be sent at browser level, which
works for simple cases).
"""
Click on an element.
import asyncio as _asyncio
async def _send(method: str, params: dict, sid: Optional[str] = None) -> int:
msg_id_ref[0] += 1
call_id = msg_id_ref[0]
req: dict = {"id": call_id, "method": method, "params": params}
if sid:
req["sessionId"] = sid
await ws.send(json.dumps(req))
return call_id
async def _recv_until(call_id: int) -> dict:
while True:
remaining = deadline - _asyncio.get_running_loop().time()
if remaining <= 0:
raise TimeoutError(f"CDP timed out waiting for id={call_id}")
raw = await _asyncio.wait_for(ws.recv(), timeout=remaining)
msg = json.loads(raw)
if msg.get("id") == call_id:
if "error" in msg:
raise RuntimeError(f"CDP error: {msg['error']}")
return msg.get("result", {})
gt_id = await _send("Target.getTargets", {})
gt_result = await _recv_until(gt_id)
page_target_id: Optional[str] = None
for t in gt_result.get("targetInfos", []):
if t.get("type") == "page" and t.get("attached", True):
page_target_id = t["targetId"]
break
if not page_target_id:
return None
at_id = await _send("Target.attachToTarget",
{"targetId": page_target_id, "flatten": True})
at_result = await _recv_until(at_id)
session_id = at_result.get("sessionId") or None
if session_id:
_CDP_SESSION_CACHE[ws_url] = session_id
return session_id
async def _cdp_coordinate_click_async(
ws_url: str,
x: int,
y: int,
button: str,
timeout: float,
) -> None:
"""Perform a compositor-level click on a single persistent WS connection.
Optimizations vs the naïve 3-separate-connections approach:
1. **Single connection** one TCP+WS handshake for the entire click.
All CDP messages are sent on the same socket.
2. **Session ID caching** Target.getTargets + Target.attachToTarget are
only paid once per CDP endpoint. Subsequent clicks skip straight to
the two mouse events. Cache is invalidated automatically on stale-
session errors and re-resolved once (browser-harness self-heal pattern).
3. **mouseReleased-only wait** mousePressed and mouseReleased are both
fired before awaiting either response. Because the browser processes
CDP messages sequentially within a session, if mouseReleased is
acknowledged then mousePressed has already been processed. We only
wait for the release ack (Playwright / Puppeteer pattern), saving one
RTT on the common path.
"""
import asyncio as _asyncio
from tools.browser_cdp_tool import websockets as _ws
async with _ws.connect(
ws_url,
max_size=None,
open_timeout=timeout,
close_timeout=5,
ping_interval=None,
compression=None, # small CDP messages don't benefit from compression
) as ws:
deadline = _asyncio.get_running_loop().time() + timeout
msg_id_ref = [0] # mutable so nested helpers can increment
def _next_id() -> int:
msg_id_ref[0] += 1
return msg_id_ref[0]
async def _send_mouse(event_type: str, sid: Optional[str]) -> int:
call_id = _next_id()
req: dict = {
"id": call_id,
"method": "Input.dispatchMouseEvent",
"params": {"type": event_type, "x": x, "y": y,
"button": button, "clickCount": 1},
}
if sid:
req["sessionId"] = sid
await ws.send(json.dumps(req))
return call_id
async def _recv_until(call_id: int) -> dict:
while True:
remaining = deadline - _asyncio.get_running_loop().time()
if remaining <= 0:
raise TimeoutError(f"CDP timed out waiting for id={call_id}")
raw = await _asyncio.wait_for(ws.recv(), timeout=remaining)
msg = json.loads(raw)
if msg.get("id") == call_id:
if "error" in msg:
raise RuntimeError(f"CDP error: {msg['error']}")
return msg.get("result", {})
# --- resolve session (cached after first click) ---
session_id: Optional[str] = _CDP_SESSION_CACHE.get(ws_url)
if not session_id:
session_id = await _cdp_resolve_session(ws, ws_url, deadline, msg_id_ref)
# --- fire mousePressed + mouseReleased without awaiting press ack ---
# Both messages are sent before we await either response. The browser
# processes them in order, so waiting only for mouseReleased is enough.
_press_id = await _send_mouse("mousePressed", session_id)
release_id = await _send_mouse("mouseReleased", session_id)
try:
await _recv_until(release_id)
except RuntimeError as exc:
# Stale session (e.g. after navigation) — invalidate cache, retry once
if "Session with given id not found" in str(exc) and session_id:
_CDP_SESSION_CACHE.pop(ws_url, None)
session_id = await _cdp_resolve_session(ws, ws_url, deadline, msg_id_ref)
_press_id = await _send_mouse("mousePressed", session_id)
release_id = await _send_mouse("mouseReleased", session_id)
await _recv_until(release_id)
else:
raise
def _cdp_coordinate_click(
x: float,
y: float,
task_id: str,
button: str = "left",
) -> str:
"""Compositor-level click at viewport coordinates via CDP Input.dispatchMouseEvent.
Dispatch priority (fastest first):
1. **Supervisor path** if a CDPSupervisor is alive for this task_id, reuse
its already-connected WebSocket. Zero connection setup cost; the supervisor
thread owns a persistent WS that self-heals on navigation/crash.
2. **Per-click connect path** open a single WS, resolve session (cached),
pipeline mousePressed + mouseReleased, close.
3. **agent-browser fallback** when no CDP endpoint is configured at all.
"""
ix, iy = int(round(x)), int(round(y))
# --- path 1: reuse supervisor's live WS (zero connection overhead) ------
try:
from tools.browser_supervisor import SUPERVISOR_REGISTRY # type: ignore[import-not-found]
supervisor = SUPERVISOR_REGISTRY.get(task_id)
if supervisor is not None:
try:
supervisor.dispatch_mouse_click(ix, iy, button)
return json.dumps({
"success": True,
"clicked_at": {"x": ix, "y": iy},
"method": "cdp_supervisor",
}, ensure_ascii=False)
except Exception as _exc:
# Supervisor present but errored (WS disconnect, stale session, etc.)
# — fall through to per-click WS path.
logger.debug("supervisor coordinate click failed for task=%s, falling back: %s", task_id, _exc)
except ImportError:
pass
# --- path 2: per-click WS connect (with session cache) ------------------
try:
from tools.browser_cdp_tool import _run_async, _resolve_cdp_endpoint, _WS_AVAILABLE
except ImportError:
return json.dumps({
"success": False,
"error": "browser_cdp_tool not available — coordinate clicks require the CDP tool module.",
}, ensure_ascii=False)
if not _WS_AVAILABLE:
return json.dumps({
"success": False,
"error": "The 'websockets' package is required for coordinate clicks. Install with: pip install websockets",
}, ensure_ascii=False)
endpoint = _resolve_cdp_endpoint()
if not endpoint:
return _coordinate_click_via_agent_browser(x, y, task_id, button)
if not endpoint.startswith(("ws://", "wss://")):
return _coordinate_click_via_agent_browser(x, y, task_id, button)
try:
_run_async(_cdp_coordinate_click_async(endpoint, ix, iy, button, 10.0))
except Exception as exc:
return json.dumps({
"success": False,
"error": f"CDP coordinate click failed: {type(exc).__name__}: {exc}",
}, ensure_ascii=False)
return json.dumps({
"success": True,
"clicked_at": {"x": ix, "y": iy},
"method": "cdp_compositor",
}, ensure_ascii=False)
def _coordinate_click_via_agent_browser(
x: float,
y: float,
task_id: str,
button: str = "left",
) -> str:
"""Fallback: coordinate click via agent-browser mouse subcommands."""
effective_task_id = _last_session_key(task_id)
ix, iy = int(round(x)), int(round(y))
# agent-browser mouse move <x> <y> + mouse down + mouse up
move_result = _run_browser_command(effective_task_id, "mouse", ["move", str(ix), str(iy)])
if not move_result.get("success"):
return json.dumps({
"success": False,
"error": f"mouse move failed: {move_result.get('error', 'unknown')}",
}, ensure_ascii=False)
btn_arg = [] if button == "left" else [button]
down_result = _run_browser_command(effective_task_id, "mouse", ["down"] + btn_arg)
if not down_result.get("success"):
return json.dumps({
"success": False,
"error": f"mouse down failed: {down_result.get('error', 'unknown')}",
}, ensure_ascii=False)
up_result = _run_browser_command(effective_task_id, "mouse", ["up"] + btn_arg)
if not up_result.get("success"):
return json.dumps({
"success": False,
"error": f"mouse up failed: {up_result.get('error', 'unknown')}",
}, ensure_ascii=False)
return json.dumps({
"success": True,
"clicked_at": {"x": ix, "y": iy},
"method": "agent_browser_mouse",
}, ensure_ascii=False)
def browser_click(
ref: Optional[str] = None,
x: Optional[float] = None,
y: Optional[float] = None,
task_id: Optional[str] = None,
) -> str:
"""
Click on an element by ref ID, or at exact viewport coordinates.
Provide EITHER ``ref`` (selector-based, via agent-browser) OR ``x`` + ``y``
(compositor-level, via CDP Input.dispatchMouseEvent). Coordinate clicks
bypass DOM selectors entirely they pass through iframes, shadow DOM,
cross-origin boundaries, and canvas elements.
Args:
ref: Element reference (e.g., "@e5")
x: Viewport X coordinate for compositor-level click
y: Viewport Y coordinate for compositor-level click
task_id: Task identifier for session isolation
Returns:
JSON string with click result
"""
# --- Input validation ---------------------------------------------------
has_ref = ref is not None and str(ref).strip() != ""
has_coords = x is not None and y is not None
if has_ref and has_coords:
return json.dumps({
"success": False,
"error": "Provide either 'ref' or 'x'+'y', not both.",
}, ensure_ascii=False)
if (x is not None) != (y is not None):
return json.dumps({
"success": False,
"error": "Both 'x' and 'y' are required for coordinate clicks.",
}, ensure_ascii=False)
if not has_ref and not has_coords:
return json.dumps({
"success": False,
"error": "Provide either 'ref' (element reference) or 'x'+'y' (viewport coordinates).",
}, ensure_ascii=False)
# --- Coordinate-based click (compositor-level) --------------------------
if has_coords:
try:
fx, fy = float(x), float(y) # type: ignore[arg-type]
except (TypeError, ValueError):
return json.dumps({
"success": False,
"error": f"x and y must be numbers, got x={x!r} y={y!r}",
}, ensure_ascii=False)
return _cdp_coordinate_click(fx, fy, task_id or "default")
# --- Ref-based click (existing path) ------------------------------------
if not has_ref or ref is None:
# Defensive guard — validation above should ensure we never reach here
return json.dumps({
"success": False,
"error": "Internal error: expected ref parameter.",
}, ensure_ascii=False)
if _is_camofox_mode():
from tools.browser_camofox import camofox_click
return camofox_click(ref, task_id)
@@ -3413,7 +3753,7 @@ registry.register(
name="browser_click",
toolset="browser",
schema=_BROWSER_SCHEMA_MAP["browser_click"],
handler=lambda args, **kw: browser_click(ref=args.get("ref", ""), task_id=kw.get("task_id")),
handler=lambda args, **kw: browser_click(ref=args.get("ref"), x=args.get("x"), y=args.get("y"), task_id=kw.get("task_id")),
check_fn=check_browser_requirements,
emoji="👆",
)
+14 -9
View File
@@ -5619,14 +5619,13 @@ def _(rid, params: dict) -> dict:
@method("voice.record")
def _(rid, params: dict) -> dict:
"""VAD-driven continuous record loop, CLI-parity.
"""VAD-bounded push-to-talk capture, CLI-parity.
``start`` turns on a VAD loop that emits ``voice.transcript`` events
for each detected utterance and auto-restarts for the next turn.
``stop`` halts the loop (manual stop; matches cli.py's Ctrl+B-while-
recording branch clearing ``_voice_continuous``). Three consecutive
silent cycles stop the loop automatically and emit a
``voice.transcript`` with ``no_speech_limit=True``.
``start`` begins one VAD-bounded capture and emits ``voice.transcript``
after silence stops the recorder. ``stop`` forces transcription of the
active buffer, matching classic CLI push-to-talk. The voice wrapper retains
no-speech counts across single-shot starts, so three consecutive silent
captures emit ``voice.transcript`` with ``no_speech_limit=True``.
"""
action = params.get("action", "start")
@@ -5665,7 +5664,7 @@ def _(rid, params: dict) -> dict:
if isinstance(duration, (int, float)) and not isinstance(duration, bool)
else 3.0
)
start_continuous(
started = start_continuous(
on_transcript=lambda t: _voice_emit("voice.transcript", {"text": t}),
on_status=lambda s: _voice_emit("voice.status", {"state": s}),
on_silent_limit=lambda: _voice_emit(
@@ -5673,13 +5672,19 @@ def _(rid, params: dict) -> dict:
),
silence_threshold=safe_threshold,
silence_duration=safe_duration,
auto_restart=False,
)
if started is False:
return _ok(rid, {"status": "busy"})
return _ok(rid, {"status": "recording"})
# action == "stop"
with _voice_sid_lock:
_voice_event_sid = params.get("session_id") or _voice_event_sid
from hermes_cli.voice import stop_continuous
stop_continuous()
stop_continuous(force_transcribe=True)
return _ok(rid, {"status": "stopped"})
except ImportError:
return _err(
@@ -0,0 +1,44 @@
import { describe, expect, it } from 'vitest'
import { computePrecisionWheelStep, initPrecisionWheel } from '../lib/precisionWheel.js'
describe('precisionWheel', () => {
it('passes the first modifier-held wheel event', () => {
const s = initPrecisionWheel()
expect(computePrecisionWheelStep(s, 1, true, 1000)).toEqual({ active: true, entered: true, rows: 1 })
})
it('coalesces same-frame events without throttling line-by-line scroll', () => {
const s = initPrecisionWheel()
computePrecisionWheelStep(s, 1, true, 1000)
expect(computePrecisionWheelStep(s, 1, true, 1008).rows).toBe(0)
expect(computePrecisionWheelStep(s, 1, true, 1016).rows).toBe(1)
})
it('keeps queued momentum in precision mode briefly after modifier release', () => {
const s = initPrecisionWheel()
computePrecisionWheelStep(s, 1, true, 1000)
expect(computePrecisionWheelStep(s, 1, false, 1050)).toMatchObject({ active: true, rows: 1 })
})
it('leaves precision mode once modifier-free momentum goes idle', () => {
const s = initPrecisionWheel()
computePrecisionWheelStep(s, 1, true, 1000)
expect(computePrecisionWheelStep(s, 1, false, 1100)).toEqual({ active: false, entered: false, rows: 0 })
})
it('does not coalesce immediate reversals', () => {
const s = initPrecisionWheel()
computePrecisionWheelStep(s, 1, true, 1000)
expect(computePrecisionWheelStep(s, -1, true, 1008).rows).toBe(1)
})
})
@@ -0,0 +1,37 @@
import { describe, expect, it, vi } from 'vitest'
import { applyVoiceRecordResponse } from '../app/useInputHandlers.js'
describe('applyVoiceRecordResponse', () => {
it('reverts optimistic REC state when the gateway reports voice busy', () => {
const setProcessing = vi.fn()
const setRecording = vi.fn()
const sys = vi.fn()
applyVoiceRecordResponse({ status: 'busy' }, true, { setProcessing, setRecording }, sys)
expect(setRecording).toHaveBeenCalledWith(false)
expect(setProcessing).toHaveBeenCalledWith(true)
expect(sys).toHaveBeenCalledWith('voice: still transcribing; try again shortly')
})
it('keeps optimistic REC state for successful recording starts', () => {
const setProcessing = vi.fn()
const setRecording = vi.fn()
applyVoiceRecordResponse({ status: 'recording' }, true, { setProcessing, setRecording }, vi.fn())
expect(setRecording).not.toHaveBeenCalled()
expect(setProcessing).not.toHaveBeenCalled()
})
it('reverts optimistic REC state when the gateway returns null', () => {
const setProcessing = vi.fn()
const setRecording = vi.fn()
applyVoiceRecordResponse(null, true, { setProcessing, setRecording }, vi.fn())
expect(setRecording).toHaveBeenCalledWith(false)
expect(setProcessing).toHaveBeenCalledWith(false)
})
})
+32 -1
View File
@@ -1,6 +1,6 @@
import { describe, expect, it } from 'vitest'
import { getViewportSnapshot, viewportSnapshotKey } from '../lib/viewportStore.js'
import { getScrollbarSnapshot, getViewportSnapshot, scrollbarSnapshotKey, viewportSnapshotKey } from '../lib/viewportStore.js'
describe('viewportStore', () => {
it('normalizes absent scroll handles', () => {
@@ -51,4 +51,35 @@ describe('viewportStore', () => {
expect(snap.atBottom).toBe(true)
expect(snap.scrollHeight).toBe(20)
})
it('keeps scrollbar position tied to committed scrollTop, not pending target', () => {
const handle = {
getPendingDelta: () => 24,
getScrollHeight: () => 100,
getScrollTop: () => 10,
getViewportHeight: () => 20,
isSticky: () => false
}
const viewport = getViewportSnapshot(handle as any)
const scrollbar = getScrollbarSnapshot(handle as any)
expect(viewport.top).toBe(34)
expect(scrollbar).toEqual({
scrollHeight: 100,
top: 10,
viewportHeight: 20
})
expect(scrollbarSnapshotKey(scrollbar)).toBe('10:20:100')
})
it('clamps scrollbar position to committed scroll bounds', () => {
const handle = {
getScrollHeight: () => 30,
getScrollTop: () => 50,
getViewportHeight: () => 20
}
expect(getScrollbarSnapshot(handle as any).top).toBe(10)
})
})
+45 -38
View File
@@ -11,6 +11,7 @@ import type {
VoiceRecordResponse
} from '../gatewayTypes.js'
import { isAction, isCopyShortcut, isMac, isVoiceToggleKey } from '../lib/platform.js'
import { computePrecisionWheelStep, initPrecisionWheel } from '../lib/precisionWheel.js'
import { computeWheelStep, initWheelAccelForHost } from '../lib/wheelAccel.js'
import { getInputSelection } from './inputSelectionStore.js'
@@ -21,8 +22,26 @@ import { patchTurnState } from './turnStore.js'
import { getUiState } from './uiStore.js'
const isCtrl = (key: { ctrl: boolean }, ch: string, target: string) => key.ctrl && ch.toLowerCase() === target
const PRECISION_WHEEL_MIN_GAP_MS = 80
const PRECISION_WHEEL_STICKY_MS = 80
export function applyVoiceRecordResponse(
response: null | VoiceRecordResponse,
starting: boolean,
voice: Pick<InputHandlerContext['voice'], 'setProcessing' | 'setRecording'>,
sys: (text: string) => void
) {
if (!starting || response?.status === 'recording') {
return
}
voice.setRecording(false)
if (response?.status === 'busy') {
voice.setProcessing(true)
sys('voice: still transcribing; try again shortly')
} else {
voice.setProcessing(false)
}
}
export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
const { actions, composer, gateway, terminal, voice, wheelStep } = ctx
@@ -38,9 +57,7 @@ export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
// rows = wheelStep × accelMult. State mutates in place across renders.
const wheelAccelRef = useRef(initWheelAccelForHost())
const precisionWheelRef = useRef<{ active: boolean; dir: 0 | -1 | 1; lastEventAtMs: number; lastScrollAtMs: number }>(
{ active: false, dir: 0, lastEventAtMs: 0, lastScrollAtMs: 0 }
)
const precisionWheelRef = useRef(initPrecisionWheel())
useEffect(() => () => clearTimeout(scrollIdleTimer.current ?? undefined), [])
@@ -160,11 +177,12 @@ export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
}
}
// CLI parity: Ctrl+B toggles the VAD-driven continuous recording loop
// CLI parity: Ctrl+B toggles a VAD-bounded push-to-talk capture
// (NOT the voice-mode umbrella bit). The mode is enabled via /voice on;
// Ctrl+B while the mode is off sys-nudges the user. While the mode is
// on, the first press starts a continuous loop (gateway → start_continuous,
// VAD auto-stop → transcribe → auto-restart), a subsequent press stops it.
// on, the first press starts a single VAD-bounded capture
// (gateway -> start_continuous(auto_restart=false), VAD auto-stop ->
// transcribe -> idle), a subsequent press stops and transcribes it.
// The gateway publishes voice.status + voice.transcript events that
// createGatewayEventHandler turns into UI badges and composer injection.
const voiceRecordToggle = () => {
@@ -185,14 +203,17 @@ export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
voice.setProcessing(false)
}
gateway.rpc<VoiceRecordResponse>('voice.record', { action }).catch((e: Error) => {
// Revert optimistic UI on failure.
if (starting) {
voice.setRecording(false)
}
gateway
.rpc<VoiceRecordResponse>('voice.record', { action, session_id: getUiState().sid })
.then(r => applyVoiceRecordResponse(r, starting, voice, actions.sys))
.catch((e: Error) => {
// Revert optimistic UI on failure.
if (starting) {
voice.setRecording(false)
}
actions.sys(`voice error: ${e.message}`)
})
actions.sys(`voice error: ${e.message}`)
})
}
useInput((ch, key) => {
@@ -291,40 +312,26 @@ export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
if (key.wheelUp || key.wheelDown) {
const dir: -1 | 1 = key.wheelUp ? -1 : 1
const now = Date.now()
// Modifier-held wheel = precision mode: at most one wheelStep per short
// interval. Smooth mice / trackpads emit many raw wheel events for one
// intended line step, so raw 1:1 still moves too far.
// Modifier-held wheel = precision mode: one row per frame, no accel.
// Smooth mice / trackpads emit tiny same-frame bursts; coalesce those
// without the old 80ms throttle that made opt-scroll feel stepped.
// SGR/X10 mouse encoding only carries shift/meta/ctrl bits; Cmd on
// macOS is intercepted by the terminal, so we honor Option (meta) on
// Mac / Alt (meta) on Win+Linux / Ctrl as a portable fallback. Shift
// is reserved for selection extension.
const hasModifier = key.meta || key.ctrl
const precision = precisionWheelRef.current
// Keep precision active through the current wheel burst after the
// modifier is released. Otherwise a stream of queued/momentum wheel
// events can hand off mid-burst into the accelerated path and jump.
const precisionSticky = now - precision.lastEventAtMs < PRECISION_WHEEL_STICKY_MS
const precision = computePrecisionWheelStep(precisionWheelRef.current, dir, hasModifier, now)
if (hasModifier || precisionSticky) {
if (!precision.active) {
precision.active = true
if (precision.active) {
// Entering precision mode must discard any accelerated wheel state;
// otherwise the next normal wheel event inherits stale momentum.
if (precision.entered) {
wheelAccelRef.current = initWheelAccelForHost()
}
precision.lastEventAtMs = now
if (dir === precision.dir && now - precision.lastScrollAtMs < PRECISION_WHEEL_MIN_GAP_MS) {
return
}
precision.lastScrollAtMs = now
precision.dir = dir
return scrollTranscript(dir * wheelStep)
return precision.rows ? scrollTranscript(dir * wheelStep) : undefined
}
precision.active = false
// 0 = direction-flip bounce deferred; skip the no-op scroll.
const rows = computeWheelStep(wheelAccelRef.current, dir, now)
+11 -5
View File
@@ -1,6 +1,6 @@
import { Box, type ScrollBoxHandle, Text } from '@hermes/ink'
import { useStore } from '@nanostores/react'
import { type ReactNode, type RefObject, useEffect, useMemo, useState } from 'react'
import { type ReactNode, type RefObject, useEffect, useMemo, useRef, useState } from 'react'
import unicodeSpinners from 'unicode-animations'
import { $delegationState } from '../app/delegationStore.js'
@@ -13,7 +13,7 @@ import { fmtDuration } from '../domain/messages.js'
import { stickyPromptFromViewport } from '../domain/viewport.js'
import { buildSubagentTree, treeTotals, widthByDepth } from '../lib/subagentTree.js'
import { fmtK } from '../lib/text.js'
import { useViewportSnapshot } from '../lib/viewportStore.js'
import { useScrollbarSnapshot, useViewportSnapshot } from '../lib/viewportStore.js'
import type { Theme } from '../theme.js'
import type { Msg, Usage } from '../types.js'
@@ -377,7 +377,8 @@ export function StickyPromptTracker({ messages, offsets, scrollRef, onChange }:
export function TranscriptScrollbar({ scrollRef, t }: TranscriptScrollbarProps) {
const [hover, setHover] = useState(false)
const [grab, setGrab] = useState<number | null>(null)
const { scrollHeight: total, top: pos, viewportHeight: vp } = useViewportSnapshot(scrollRef)
const grabRef = useRef<number | null>(null)
const { scrollHeight: total, top: pos, viewportHeight: vp } = useScrollbarSnapshot(scrollRef)
if (!vp) {
return <Box width={1} />
@@ -405,15 +406,20 @@ export function TranscriptScrollbar({ scrollRef, t }: TranscriptScrollbarProps)
onMouseDown={(e: { localRow?: number }) => {
const row = Math.max(0, Math.min(vp - 1, e.localRow ?? 0))
const off = row >= thumbTop && row < thumbTop + thumb ? row - thumbTop : Math.floor(thumb / 2)
grabRef.current = off
setGrab(off)
jump(row, off)
}}
onMouseDrag={(e: { localRow?: number }) =>
jump(Math.max(0, Math.min(vp - 1, e.localRow ?? 0)), grab ?? Math.floor(thumb / 2))
jump(Math.max(0, Math.min(vp - 1, e.localRow ?? 0)), grabRef.current ?? Math.floor(thumb / 2))
}
onMouseEnter={() => setHover(true)}
onMouseLeave={() => setHover(false)}
onMouseUp={() => setGrab(null)}
onMouseUp={() => {
grabRef.current = null
setGrab(null)
}}
width={1}
>
{!scrollable ? (
+1 -1
View File
@@ -295,7 +295,7 @@ export interface VoiceToggleResponse {
}
export interface VoiceRecordResponse {
status?: string
status?: 'busy' | 'recording' | 'stopped'
text?: string
}
+48
View File
@@ -0,0 +1,48 @@
const PRECISION_WHEEL_FRAME_MS = 16
const PRECISION_WHEEL_STICKY_MS = 80
export type PrecisionWheelState = {
active: boolean
dir: 0 | -1 | 1
lastEventAtMs: number
lastScrollAtMs: number
}
export type PrecisionWheelStep = {
active: boolean
entered: boolean
rows: 0 | 1
}
export function initPrecisionWheel(): PrecisionWheelState {
return { active: false, dir: 0, lastEventAtMs: 0, lastScrollAtMs: 0 }
}
export function computePrecisionWheelStep(
state: PrecisionWheelState,
dir: -1 | 1,
hasModifier: boolean,
now: number
): PrecisionWheelStep {
const active = hasModifier || now - state.lastEventAtMs < PRECISION_WHEEL_STICKY_MS
if (!active) {
state.active = false
return { active: false, entered: false, rows: 0 }
}
const entered = !state.active
state.active = true
state.lastEventAtMs = now
if (dir === state.dir && now - state.lastScrollAtMs < PRECISION_WHEEL_FRAME_MS) {
return { active: true, entered, rows: 0 }
}
state.dir = dir
state.lastScrollAtMs = now
return { active: true, entered, rows: 1 }
}
+50
View File
@@ -11,6 +11,12 @@ export interface ViewportSnapshot {
viewportHeight: number
}
export interface ScrollbarSnapshot {
scrollHeight: number
top: number
viewportHeight: number
}
const EMPTY: ViewportSnapshot = {
atBottom: true,
bottom: 0,
@@ -20,6 +26,12 @@ const EMPTY: ViewportSnapshot = {
viewportHeight: 0
}
const EMPTY_SCROLLBAR: ScrollbarSnapshot = {
scrollHeight: 0,
top: 0,
viewportHeight: 0
}
export function getViewportSnapshot(s?: ScrollBoxHandle | null): ViewportSnapshot {
if (!s) {
return EMPTY
@@ -52,6 +64,26 @@ export function viewportSnapshotKey(v: ViewportSnapshot) {
return `${v.atBottom ? 1 : 0}:${Math.ceil(v.top / 8) * 8}:${v.viewportHeight}:${Math.ceil(v.scrollHeight / 8) * 8}:${v.pending}`
}
export function getScrollbarSnapshot(s?: ScrollBoxHandle | null): ScrollbarSnapshot {
if (!s) {
return EMPTY_SCROLLBAR
}
const viewportHeight = Math.max(0, s.getViewportHeight())
const scrollHeight = Math.max(viewportHeight, s.getScrollHeight())
const maxTop = Math.max(0, scrollHeight - viewportHeight)
return {
scrollHeight,
top: Math.max(0, Math.min(maxTop, s.getScrollTop())),
viewportHeight
}
}
export function scrollbarSnapshotKey(v: ScrollbarSnapshot) {
return `${v.top}:${v.viewportHeight}:${v.scrollHeight}`
}
export function useViewportSnapshot(scrollRef: RefObject<ScrollBoxHandle | null>): ViewportSnapshot {
const key = useSyncExternalStore(
useCallback((cb: () => void) => scrollRef.current?.subscribe(cb) ?? (() => {}), [scrollRef]),
@@ -72,3 +104,21 @@ export function useViewportSnapshot(scrollRef: RefObject<ScrollBoxHandle | null>
}
}, [key])
}
export function useScrollbarSnapshot(scrollRef: RefObject<ScrollBoxHandle | null>): ScrollbarSnapshot {
const key = useSyncExternalStore(
useCallback((cb: () => void) => scrollRef.current?.subscribe(cb) ?? (() => {}), [scrollRef]),
() => scrollbarSnapshotKey(getScrollbarSnapshot(scrollRef.current)),
() => scrollbarSnapshotKey(EMPTY_SCROLLBAR)
)
return useMemo(() => {
const [top = '0', viewportHeight = '0', scrollHeight = '0'] = key.split(':')
return {
scrollHeight: Number(scrollHeight),
top: Number(top),
viewportHeight: Number(viewportHeight)
}
}, [key])
}
+21 -4
View File
@@ -335,10 +335,19 @@ Any profile that should be able to work kanban tasks must load the `kanban-worke
3. Call `kanban_heartbeat(note="...")` every few minutes during long operations.
4. Complete with `kanban_complete(summary="...", metadata={...})`, or `kanban_block(reason="...")` if stuck.
Load it with (this one is **you**, installing into a profile — not a tool call):
`kanban-worker` is a bundled skill, synced into every profile during install and
update — there is no separate Skills Hub install step. Verify it is present in
whichever profile you use for kanban workers (`researcher`, `writer`, `ops`,
etc.):
```bash
hermes skills install devops/kanban-worker
hermes -p <your-worker-profile> skills list | grep kanban-worker
```
If the bundled copy is missing, restore it for that profile:
```bash
hermes -p <your-worker-profile> skills reset kanban-worker --restore
```
The dispatcher also auto-passes `--skills kanban-worker` when spawning every worker, so the worker always has the pattern library available even if a profile's default skills config doesn't include it.
@@ -403,10 +412,18 @@ kanban_complete(
)
```
Load it into your orchestrator profile:
`kanban-orchestrator` is a bundled skill. It is synced into each profile during
install and update, so there is no separate Skills Hub install step. Verify it is
present in your orchestrator profile:
```bash
hermes skills install devops/kanban-orchestrator
hermes -p orchestrator skills list | grep kanban-orchestrator
```
If the bundled copy is missing, restore it for that profile:
```bash
hermes -p orchestrator skills reset kanban-orchestrator --restore
```
For best results, pair it with a profile whose toolsets are restricted to board operations (`kanban`, `gateway`, `memory`) so the orchestrator literally cannot execute implementation tasks even if it tries.