fix: self-review findings — logging, create_task, get_running_loop, benchmark path

Self-review findings addressed: - browser_tool.py: log swallowed supervisor error at DEBUG instead of bare 'pass' (was silent, triggered F841 for unused 'exc' variable). Renamed to '_exc' to signal intentional discard. - browser_tool.py: rename unused 'press_id' to '_press_id' in both normal and retry paths (mouseReleased-only wait is intentional; press_id is never used after send). - browser_tool.py: get_event_loop() → get_running_loop() in 3 locations inside _cdp_resolve_session and _cdp_coordinate_click_async. Both are async functions and get_event_loop() is deprecated in async context in Python 3.10+. - browser_supervisor.py: ensure_future → create_task in dispatch_mouse_click. create_task is the correct modern API when already inside a running coroutine; ensure_future is deprecated for coroutines in Python 3.10+. Also consistent with the rest of browser_supervisor.py which uses create_task exclusively everywhere else. - scripts/benchmark_click_paths.py: replace hardcoded /private/tmp/hermes- coord-click sys.path hack with __file__-relative repo root detection so the script works from any checkout location. 27/27 tests pass.
perf: reuse supervisor's persistent WS for coordinate clicks (23x speedup)
2026-05-07 10:49:06 +05:30 · 2026-05-07 10:28:48 +05:30 · 2026-05-07 10:17:29 +05:30 · 2026-05-07 10:04:38 +05:30 · 2026-05-07 09:57:44 +05:30 · 2026-05-06 18:40:30 -07:00
26 changed files with 3198 additions and 205 deletions
@@ -16,9 +16,13 @@ on:
 permissions:
  contents: read

+# Top-level concurrency: do NOT cancel in-flight builds when a new push lands.
+# Every commit deserves its own SHA-tagged image in the registry, and we guard
+# the :latest tag in a separate job below (with its own concurrency group) so
+# a slow run can't clobber :latest with older bits.
 concurrency:
  group: docker-${{ github.ref }}
-  cancel-in-progress: true
+  cancel-in-progress: false

 jobs:
  build-and-push:
@@ -26,11 +30,18 @@ jobs:
    if: github.repository == 'NousResearch/hermes-agent'
    runs-on: ubuntu-latest
    timeout-minutes: 60
+    outputs:
+      pushed_sha_tag: ${{ steps.mark_pushed.outputs.pushed }}
    steps:
      - name: Checkout code
        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
        with:
          submodules: recursive
+          # Fetch enough history to run `git merge-base --is-ancestor` in the
+          # move-latest job.  That job reuses this checkout via its own
+          # actions/checkout call, but commits reachable from main up to ~1000
+          # back are plenty for any realistic race window.
+          fetch-depth: 1000

      - name: Set up QEMU
        uses: docker/setup-qemu-action@c7c53464625b32c7a7e944ae62b3e17d2b600130  # v3
@@ -74,7 +85,12 @@ jobs:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

-      - name: Push multi-arch image (main branch)
+      # Always push a per-commit SHA tag on main.  This is race-free because
+      # every commit has a unique SHA — concurrent runs can't clobber each
+      # other here.  We also embed the git SHA as an OCI label so the
+      # move-latest job (below) can read it back off the registry's `:latest`.
+      - name: Push multi-arch image with SHA tag (main branch)
+        id: push_sha
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8  # v6
        with:
@@ -82,10 +98,17 @@ jobs:
          file: Dockerfile
          push: true
          platforms: linux/amd64,linux/arm64
-          tags: nousresearch/hermes-agent:latest
+          tags: nousresearch/hermes-agent:sha-${{ github.sha }}
+          labels: |
+            org.opencontainers.image.revision=${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

+      - name: Mark SHA tag pushed
+        id: mark_pushed
+        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+        run: echo "pushed=true" >> "$GITHUB_OUTPUT"
+
      - name: Push multi-arch image (release)
        if: github.event_name == 'release'
        uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8  # v6
@@ -97,3 +120,119 @@ jobs:
          tags: nousresearch/hermes-agent:${{ github.event.release.tag_name }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
+
+  # Second job: moves `:latest` to point at the SHA tag the first job pushed.
+  #
+  # Has its own concurrency group with `cancel-in-progress: true`, which
+  # gives us the serialization we need: if a newer push arrives while an
+  # older run is mid-way through this job, the older run is cancelled
+  # before it can clobber `:latest`.  Combined with the ancestor check
+  # below, this means `:latest` only ever moves forward in git history.
+  move-latest:
+    if: |
+      github.repository == 'NousResearch/hermes-agent'
+      && github.event_name == 'push'
+      && github.ref == 'refs/heads/main'
+      && needs.build-and-push.outputs.pushed_sha_tag == 'true'
+    needs: build-and-push
+    runs-on: ubuntu-latest
+    timeout-minutes: 10
+    concurrency:
+      group: docker-move-latest-${{ github.ref }}
+      cancel-in-progress: true
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+        with:
+          fetch-depth: 1000
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f  # v3
+
+      - name: Log in to Docker Hub
+        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9  # v3
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+
+      # Read the git revision label off the current `:latest` manifest, then
+      # use `git merge-base --is-ancestor` to check whether our commit is a
+      # descendant of it.  If `:latest` doesn't exist yet, or its label is
+      # missing, we treat that as "safe to publish".  If another run already
+      # advanced `:latest` past us (or diverged), we skip and leave it alone.
+      - name: Decide whether to move :latest
+        id: latest_check
+        run: |
+          set -euo pipefail
+          image=nousresearch/hermes-agent
+
+          # Pull the JSON for the linux/amd64 sub-manifest's config and extract
+          # the OCI revision label with jq — Go template field access can't
+          # handle dots in map keys, so using json+jq is the robust route.
+          image_json=$(
+            docker buildx imagetools inspect "${image}:latest" \
+              --format '{{ json (index .Image "linux/amd64") }}' \
+              2>/dev/null || true
+          )
+
+          if [ -z "${image_json}" ]; then
+            echo "No existing :latest (or inspect failed) — safe to publish."
+            echo "push_latest=true" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+
+          current_sha=$(
+            printf '%s' "${image_json}" \
+              | jq -r '.config.Labels."org.opencontainers.image.revision" // ""'
+          )
+
+          if [ -z "${current_sha}" ]; then
+            echo "Registry :latest has no revision label — safe to publish."
+            echo "push_latest=true" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+
+          echo "Registry :latest is at ${current_sha}"
+          echo "This run is at      ${GITHUB_SHA}"
+
+          if [ "${current_sha}" = "${GITHUB_SHA}" ]; then
+            echo ":latest already points at our SHA — nothing to do."
+            echo "push_latest=false" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+
+          # Make sure we have the :latest commit locally for merge-base.
+          if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
+            git fetch --no-tags --prune origin \
+              "+refs/heads/main:refs/remotes/origin/main" \
+              || true
+          fi
+
+          if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
+            echo "Registry :latest points at an unknown commit (${current_sha}); refusing to overwrite."
+            echo "push_latest=false" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+
+          # Our SHA must be a descendant of the current :latest to be safe.
+          if git merge-base --is-ancestor "${current_sha}" "${GITHUB_SHA}"; then
+            echo "Our commit is a descendant of :latest — safe to advance."
+            echo "push_latest=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "Another run advanced :latest past us (or diverged) — leaving it alone."
+            echo "push_latest=false" >> "$GITHUB_OUTPUT"
+          fi
+
+      # Retag the already-pushed SHA manifest as :latest.  This is a registry-
+      # side operation — no rebuild, no layer re-push — so it's quick and
+      # atomic per-tag.  The ancestor check above plus the cancel-in-progress
+      # concurrency on this job together guarantee we only ever move :latest
+      # forward in git history.
+      - name: Move :latest to this SHA
+        if: steps.latest_check.outputs.push_latest == 'true'
+        run: |
+          set -euo pipefail
+          image=nousresearch/hermes-agent
+          docker buildx imagetools create \
+            --tag "${image}:latest" \
+            "${image}:sha-${GITHUB_SHA}"
@@ -48,16 +48,12 @@ jobs:

      - name: Determine base ref
        id: base
-        env:
-          PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
        run: |
-          # For PRs, diff against the PR's pinned parent commit
-          # (github.event.pull_request.base.sha — snapshot at PR open time,
-          # so later pushes to main don't leak into the diff).
+          # For PRs, diff against the merge base with the target branch.
          # For pushes to main, diff against the previous commit on main.
          if [ "${{ github.event_name }}" = "pull_request" ]; then
-            BASE_SHA="${PR_BASE_SHA}"
-            BASE_REF="PR base (${BASE_SHA:0:7})"
+            BASE_SHA=$(git merge-base "origin/${{ github.base_ref }}" HEAD)
+            BASE_REF="origin/${{ github.base_ref }}"
          else
            BASE_SHA=$(git rev-parse HEAD~1 2>/dev/null || git rev-parse HEAD)
            BASE_REF="HEAD~1"
@@ -121,9 +117,6 @@ jobs:
          name: lint-reports
          path: .lint-reports/
          retention-days: 14
-          # .lint-reports/ is a dotfile-prefixed directory, and upload-artifact@v4
-          # skips hidden files by default (breaking change from v3). Opt back in.
-          include-hidden-files: true

      - name: Post / update PR comment
        if: github.event_name == 'pull_request'
@@ -10,6 +10,8 @@ Uses discord.py library for:
 """

 import asyncio
+import hashlib
+import json
 import logging
 import os
 import struct
@@ -24,6 +26,10 @@ logger = logging.getLogger(__name__)

 VALID_THREAD_AUTO_ARCHIVE_MINUTES = {60, 1440, 4320, 10080}
 _DISCORD_COMMAND_SYNC_POLICIES = {"safe", "bulk", "off"}
+_DISCORD_COMMAND_SYNC_STATE_SUBDIR = "gateway"
+_DISCORD_COMMAND_SYNC_STATE_FILENAME = "discord_command_sync_state.json"
+_DISCORD_COMMAND_SYNC_MUTATION_INTERVAL_SECONDS = 4.5
+_DISCORD_COMMAND_SYNC_MAX_RATE_LIMIT_SLEEP_SECONDS = 30.0

 try:
    import discord
@@ -45,6 +51,7 @@ from gateway.config import Platform, PlatformConfig
 import re

 from gateway.platforms.helpers import MessageDeduplicator, ThreadParticipationTracker
+from utils import atomic_json_write
 from gateway.platforms.base import (
    BasePlatformAdapter,
    MessageEvent,
@@ -825,6 +832,167 @@ class DiscordAdapter(BasePlatformAdapter):

        logger.info("[%s] Disconnected", self.name)

+    def _command_sync_state_path(self) -> _Path:
+        from hermes_constants import get_hermes_home
+
+        directory = get_hermes_home() / _DISCORD_COMMAND_SYNC_STATE_SUBDIR
+        try:
+            directory.mkdir(parents=True, exist_ok=True)
+        except Exception:
+            pass
+        return directory / _DISCORD_COMMAND_SYNC_STATE_FILENAME
+
+    def _read_command_sync_state(self) -> dict:
+        try:
+            path = self._command_sync_state_path()
+            if not path.exists():
+                return {}
+            data = json.loads(path.read_text(encoding="utf-8"))
+        except Exception:
+            return {}
+        return data if isinstance(data, dict) else {}
+
+    def _write_command_sync_state(self, state: dict) -> None:
+        atomic_json_write(
+            self._command_sync_state_path(),
+            state,
+            indent=None,
+            separators=(",", ":"),
+        )
+
+    def _command_sync_state_key(self, app_id: Any) -> str:
+        return str(app_id or "unknown")
+
+    def _desired_command_sync_fingerprint(self) -> str:
+        tree = self._client.tree if self._client else None
+        desired = []
+        if tree is not None:
+            desired = [
+                self._canonicalize_app_command_payload(command.to_dict(tree))
+                for command in tree.get_commands()
+            ]
+        desired.sort(key=lambda item: (item.get("type", 1), item.get("name", "")))
+        payload = json.dumps(desired, sort_keys=True, separators=(",", ":"))
+        return hashlib.sha256(payload.encode("utf-8")).hexdigest()
+
+    def _command_sync_skip_reason(self, app_id: Any, fingerprint: str) -> Optional[str]:
+        entry = self._read_command_sync_state().get(self._command_sync_state_key(app_id))
+        if not isinstance(entry, dict):
+            return None
+        now = time.time()
+        retry_after_until = float(entry.get("retry_after_until") or 0)
+        if retry_after_until > now:
+            remaining = max(1, int(retry_after_until - now))
+            return f"Discord asked us to wait before syncing slash commands; retry in {remaining}s"
+        if entry.get("fingerprint") == fingerprint and entry.get("last_success_at"):
+            return "same slash-command fingerprint already synced"
+        return None
+
+    def _record_command_sync_attempt(self, app_id: Any, fingerprint: str) -> None:
+        state = self._read_command_sync_state()
+        state[self._command_sync_state_key(app_id)] = {
+            **(
+                state.get(self._command_sync_state_key(app_id))
+                if isinstance(state.get(self._command_sync_state_key(app_id)), dict)
+                else {}
+            ),
+            "fingerprint": fingerprint,
+            "last_attempt_at": time.time(),
+        }
+        self._write_command_sync_state(state)
+
+    def _record_command_sync_rate_limit(self, app_id: Any, fingerprint: str, retry_after: float) -> None:
+        retry_after = max(1.0, float(retry_after))
+        state = self._read_command_sync_state()
+        state[self._command_sync_state_key(app_id)] = {
+            **(
+                state.get(self._command_sync_state_key(app_id))
+                if isinstance(state.get(self._command_sync_state_key(app_id)), dict)
+                else {}
+            ),
+            "fingerprint": fingerprint,
+            "last_attempt_at": time.time(),
+            "retry_after_until": time.time() + retry_after,
+            "retry_after": retry_after,
+        }
+        self._write_command_sync_state(state)
+
+    def _record_command_sync_success(self, app_id: Any, fingerprint: str, summary: dict) -> None:
+        state = self._read_command_sync_state()
+        state[self._command_sync_state_key(app_id)] = {
+            "fingerprint": fingerprint,
+            "last_attempt_at": time.time(),
+            "last_success_at": time.time(),
+            "summary": summary,
+        }
+        self._write_command_sync_state(state)
+
+    @staticmethod
+    def _extract_discord_retry_after(exc: BaseException) -> Optional[float]:
+        value = getattr(exc, "retry_after", None)
+        if value is not None:
+            try:
+                return max(1.0, float(value))
+            except (TypeError, ValueError):
+                return None
+        response = getattr(exc, "response", None)
+        headers = getattr(response, "headers", None)
+        if headers:
+            for key in ("Retry-After", "X-RateLimit-Reset-After"):
+                try:
+                    raw = headers.get(key)
+                except Exception:
+                    raw = None
+                if raw is None:
+                    continue
+                try:
+                    return max(1.0, float(raw))
+                except (TypeError, ValueError):
+                    continue
+        return None
+
+    @staticmethod
+    def _is_discord_rate_limit(exc: BaseException) -> bool:
+        """True only for exceptions that look like Discord 429 rate limits.
+
+        Narrower than ``hasattr(exc, 'retry_after')``: discord.py's own
+        ``RateLimited`` exception and any HTTPException with status 429
+        qualify. This prevents suppressing unrelated failures that happen
+        to expose a ``retry_after`` attribute."""
+        # discord.py emits RateLimited / HTTPException subclasses for 429s.
+        # Guard with isinstance-of-class so a mocked ``discord`` module
+        # (where attrs are MagicMocks, not types) doesn't trip isinstance.
+        if DISCORD_AVAILABLE and discord is not None:
+            for attr_name in ("RateLimited", "HTTPException"):
+                cls = getattr(discord, attr_name, None)
+                if not isinstance(cls, type):
+                    continue
+                if isinstance(exc, cls):
+                    if attr_name == "RateLimited":
+                        return True
+                    status = getattr(exc, "status", None)
+                    if status == 429:
+                        return True
+        # Fallback duck-type: something named like a rate-limit with a
+        # numeric retry_after. Covers mocked clients in tests and exotic
+        # transports, without swallowing arbitrary exceptions.
+        name = type(exc).__name__.lower()
+        if ("ratelimit" in name or "rate_limit" in name) and getattr(exc, "retry_after", None) is not None:
+            return True
+        response = getattr(exc, "response", None)
+        status = getattr(response, "status", None) or getattr(response, "status_code", None)
+        if status == 429:
+            return True
+        return False
+
+    def _command_sync_mutation_interval_seconds(self) -> float:
+        return _DISCORD_COMMAND_SYNC_MUTATION_INTERVAL_SECONDS
+
+    async def _sleep_between_command_sync_mutations(self) -> None:
+        interval = self._command_sync_mutation_interval_seconds()
+        if interval > 0:
+            await asyncio.sleep(interval)
+
    async def _run_post_connect_initialization(self) -> None:
        """Finish non-critical startup work after Discord is connected."""
        if not self._client:
@@ -840,14 +1008,46 @@ class DiscordAdapter(BasePlatformAdapter):
                logger.info("[%s] Synced %d slash command(s) via bulk tree sync", self.name, len(synced))
                return

-            # Discord's per-app command-management bucket is ~5 writes / 20 s,
-            # so a mass-prune-plus-upsert reconcile (e.g. 77 orphans + 30
-            # desired = 107 writes) takes several minutes of forced waits.
-            # A flat 30 s budget blew up reliably under bucket pressure and
-            # left slash commands broken for ~60 min until the bucket fully
-            # recovered. Use a wide ceiling; the cap still guards against a
-            # true hang. (#16713)
-            summary = await asyncio.wait_for(self._safe_sync_slash_commands(), timeout=600)
+            app_id = getattr(self._client, "application_id", None) or getattr(getattr(self._client, "user", None), "id", None)
+            fingerprint = self._desired_command_sync_fingerprint()
+            skip_reason = self._command_sync_skip_reason(app_id, fingerprint)
+            if skip_reason:
+                logger.info("[%s] Skipping Discord slash command sync: %s", self.name, skip_reason)
+                return
+            self._record_command_sync_attempt(app_id, fingerprint)
+
+            http = getattr(self._client, "http", None)
+            has_ratelimit_timeout = http is not None and hasattr(http, "max_ratelimit_timeout")
+            previous_ratelimit_timeout = getattr(http, "max_ratelimit_timeout", None) if has_ratelimit_timeout else None
+            if has_ratelimit_timeout:
+                http.max_ratelimit_timeout = _DISCORD_COMMAND_SYNC_MAX_RATE_LIMIT_SLEEP_SECONDS
+
+            try:
+                # Discord's per-app command-management bucket is small, and
+                # discord.py can otherwise sit inside one long retry sleep
+                # before surfacing the 429. Keep the whole sync bounded and
+                # persist Discord's retry-after when it refuses the batch.
+                summary = await asyncio.wait_for(self._safe_sync_slash_commands(), timeout=600)
+            except Exception as e:
+                if not self._is_discord_rate_limit(e):
+                    raise
+                retry_after = self._extract_discord_retry_after(e)
+                if retry_after is None:
+                    # Rate-limited but no retry-after signal — back off for a
+                    # conservative default so we don't slam the bucket again.
+                    retry_after = _DISCORD_COMMAND_SYNC_MAX_RATE_LIMIT_SLEEP_SECONDS
+                self._record_command_sync_rate_limit(app_id, fingerprint, retry_after)
+                logger.warning(
+                    "[%s] Discord rate-limited slash command sync; retrying after %.0fs",
+                    self.name,
+                    retry_after,
+                )
+                return
+            finally:
+                if has_ratelimit_timeout:
+                    http.max_ratelimit_timeout = previous_ratelimit_timeout
+
+            self._record_command_sync_success(app_id, fingerprint, summary)
            logger.info(
                "[%s] Safely reconciled %d slash command(s): unchanged=%d updated=%d recreated=%d created=%d deleted=%d",
                self.name,
@@ -1009,11 +1209,20 @@ class DiscordAdapter(BasePlatformAdapter):
        created = 0
        deleted = 0
        http = self._client.http
+        mutation_count = 0
+
+        async def mutate(call, *args):
+            nonlocal mutation_count
+            if mutation_count:
+                await self._sleep_between_command_sync_mutations()
+            result = await call(*args)
+            mutation_count += 1
+            return result

        for key, desired in desired_by_key.items():
            current = existing_by_key.pop(key, None)
            if current is None:
-                await http.upsert_global_command(app_id, desired)
+                await mutate(http.upsert_global_command, app_id, desired)
                created += 1
                continue

@@ -1025,16 +1234,16 @@ class DiscordAdapter(BasePlatformAdapter):
                continue

            if self._patchable_app_command_payload(current_existing_payload) == self._patchable_app_command_payload(desired):
-                await http.delete_global_command(app_id, current.id)
-                await http.upsert_global_command(app_id, desired)
+                await mutate(http.delete_global_command, app_id, current.id)
+                await mutate(http.upsert_global_command, app_id, desired)
                recreated += 1
                continue

-            await http.edit_global_command(app_id, current.id, desired)
+            await mutate(http.edit_global_command, app_id, current.id, desired)
            updated += 1

        for current in existing_by_key.values():
-            await http.delete_global_command(app_id, current.id)
+            await mutate(http.delete_global_command, app_id, current.id)
            deleted += 1

        return {
@@ -4216,7 +4216,7 @@ def _prompt_model_selection(
            clear_screen=False,
            title=effective_title,
        )
-        idx: int | None = menu.show()  # ty:ignore[invalid-assignment] - TerminalMenu.show() is always `int | None` when multi_select is False / not provided.
+        idx = menu.show()
        from hermes_cli.curses_ui import flush_stdin
        flush_stdin()
        if idx is None:
@@ -505,6 +505,7 @@ def _read_systemd_unit_properties(
        "SubState",
        "Result",
        "ExecMainStatus",
+        "MainPID",
    ),
 ) -> dict[str, str]:
    """Return selected ``systemctl show`` properties for the gateway unit."""
@@ -538,6 +539,41 @@ def _read_systemd_unit_properties(
    return parsed


+def _systemd_main_pid_from_props(props: dict[str, str]) -> int | None:
+    try:
+        pid = int(props.get("MainPID", "0") or "0")
+    except (TypeError, ValueError):
+        return None
+    return pid if pid > 0 else None
+
+
+def _systemd_main_pid(system: bool = False) -> int | None:
+    return _systemd_main_pid_from_props(_read_systemd_unit_properties(system=system))
+
+
+def _read_gateway_runtime_status() -> dict | None:
+    try:
+        from gateway.status import read_runtime_status
+
+        state = read_runtime_status()
+    except Exception:
+        return None
+    return state if isinstance(state, dict) else None
+
+
+def _gateway_runtime_status_for_pid(pid: int | None) -> dict | None:
+    if not pid:
+        return None
+    state = _read_gateway_runtime_status()
+    if not state:
+        return None
+    try:
+        state_pid = int(state.get("pid", 0) or 0)
+    except (TypeError, ValueError):
+        return None
+    return state if state_pid == pid else None
+
+
 def _wait_for_systemd_service_restart(
    *,
    system: bool = False,
@@ -550,6 +586,7 @@ def _wait_for_systemd_service_restart(
    svc = get_service_name()
    scope_label = _service_scope_label(system).capitalize()
    deadline = time.time() + timeout
+    printed_runtime_wait = False

    while time.time() < deadline:
        props = _read_systemd_unit_properties(system=system)
@@ -562,19 +599,32 @@ def _wait_for_systemd_service_restart(
            new_pid = get_running_pid()
        except Exception:
            new_pid = None
+        if not new_pid:
+            new_pid = _systemd_main_pid_from_props(props)

        if active_state == "active":
            if new_pid and (previous_pid is None or new_pid != previous_pid):
-                print(f"✓ {scope_label} service restarted (PID {new_pid})")
-                return True
-            if previous_pid is None:
-                print(f"✓ {scope_label} service restarted")
-                return True
+                runtime_state = _gateway_runtime_status_for_pid(new_pid)
+                gateway_state = (runtime_state or {}).get("gateway_state")
+                if gateway_state == "running":
+                    print(f"✓ {scope_label} service restarted (PID {new_pid})")
+                    return True
+                if gateway_state == "startup_failed":
+                    reason = (runtime_state or {}).get("exit_reason") or "startup failed"
+                    print(f"⚠ {scope_label} service process restarted (PID {new_pid}), but gateway startup failed: {reason}")
+                    return False
+                if not printed_runtime_wait:
+                    print(f"⏳ {scope_label} service process started (PID {new_pid}); waiting for gateway runtime...")
+                    printed_runtime_wait = True

        if active_state == "activating" and sub_state == "auto-restart":
            time.sleep(1)
            continue

+        if _systemd_unit_is_start_limited(props):
+            _print_systemd_start_limit_wait(system=system)
+            return False
+
        time.sleep(2)

    print(
@@ -585,6 +635,46 @@ def _wait_for_systemd_service_restart(
    return False


+def _systemd_unit_is_start_limited(props: dict[str, str]) -> bool:
+    result = props.get("Result", "").lower()
+    sub_state = props.get("SubState", "").lower()
+    return result == "start-limit-hit" or sub_state == "start-limit-hit"
+
+
+def _systemd_error_indicates_start_limit(exc: subprocess.CalledProcessError) -> bool:
+    parts: list[str] = []
+    for attr in ("stderr", "stdout", "output"):
+        value = getattr(exc, attr, None)
+        if not value:
+            continue
+        if isinstance(value, bytes):
+            value = value.decode(errors="replace")
+        parts.append(str(value))
+    text = "\n".join(parts).lower()
+    return (
+        "start-limit-hit" in text
+        or "start request repeated too quickly" in text
+        or "start-limit" in text
+    )
+
+
+def _systemd_service_is_start_limited(system: bool = False) -> bool:
+    return _systemd_unit_is_start_limited(_read_systemd_unit_properties(system=system))
+
+
+def _print_systemd_start_limit_wait(system: bool = False) -> None:
+    svc = get_service_name()
+    scope_label = _service_scope_label(system).capitalize()
+    scope_flag = " --system" if system else ""
+    systemctl_prefix = "systemctl " if system else "systemctl --user "
+    journal_prefix = "journalctl " if system else "journalctl --user "
+    print(f"⏳ {scope_label} service is temporarily rate-limited by systemd.")
+    print("  systemd is refusing another immediate start after repeated exits.")
+    print(f"  Wait for the start-limit window to expire, then run: {'sudo ' if system else ''}hermes gateway restart{scope_flag}")
+    print(f"  Or clear the failed state manually: {systemctl_prefix}reset-failed {svc}")
+    print(f"  Check logs: {journal_prefix}-u {svc} -l --since '5 min ago'")
+
+
 def _recover_pending_systemd_restart(system: bool = False, previous_pid: int | None = None) -> bool:
    """Recover a planned service restart that is stuck in systemd state."""
    props = _read_systemd_unit_properties(system=system)
@@ -967,6 +1057,27 @@ class UserSystemdUnavailableError(RuntimeError):
    """


+class SystemScopeRequiresRootError(RuntimeError):
+    """Raised when a system-scope gateway operation is attempted as non-root.
+
+    System-scope units live in ``/etc/systemd/system/`` and require root for
+    install / uninstall / start / stop / restart via ``systemctl``. The
+    previous behavior was ``sys.exit(1)`` which blew past the wizard's
+    ``except Exception`` guards and dumped the user at a bare shell prompt
+    with no guidance. Raising a typed exception lets callers that can
+    recover (the setup wizard) print actionable remediation instead, while
+    ``gateway_command`` still exits 1 with the same message for the direct
+    CLI path.
+
+    ``args[0]`` carries the user-facing message, ``args[1]`` the action name.
+    ``str(e)`` returns only the message (not the tuple repr) so format
+    strings like ``f"Failed: {e}"`` render cleanly.
+    """
+
+    def __str__(self) -> str:
+        return self.args[0] if self.args else ""
+
+
 def _user_dbus_socket_path() -> Path:
    """Return the expected per-user D-Bus socket path (regardless of existence)."""
    xdg = os.environ.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}"
@@ -1382,8 +1493,10 @@ def print_systemd_scope_conflict_warning() -> None:

 def _require_root_for_system_service(action: str) -> None:
    if os.geteuid() != 0:
-        print(f"System gateway {action} requires root. Re-run with sudo.")
-        sys.exit(1)
+        raise SystemScopeRequiresRootError(
+            f"System gateway {action} requires root. Re-run with sudo.",
+            action,
+        )


 def _system_service_identity(run_as_user: str | None = None) -> tuple[str, str, str]:
@@ -1930,6 +2043,47 @@ def _select_systemd_scope(system: bool = False) -> bool:
    return get_systemd_unit_path(system=True).exists() and not get_systemd_unit_path(system=False).exists()


+def _system_scope_wizard_would_need_root(system: bool = False) -> bool:
+    """True when the setup wizard is about to trigger a system-scope operation
+    as a non-root user.
+
+    Replicates the decision ``_select_systemd_scope`` makes inside
+    ``systemd_start`` / ``systemd_restart`` / ``systemd_stop`` so the wizard
+    can detect the dead-end BEFORE prompting, rather than letting
+    ``SystemScopeRequiresRootError`` propagate out and leave the user
+    staring at a bare shell.
+    """
+    if os.geteuid() == 0:
+        return False
+    return _select_systemd_scope(system=system)
+
+
+def _print_system_scope_remediation(action: str) -> None:
+    """Print actionable remediation when the wizard skips a system-scope
+    prompt because the user isn't root. Keeps the wizard flowing instead of
+    aborting.
+    """
+    svc = get_service_name()
+    print_warning(
+        f"Gateway is installed as a system-wide service — "
+        f"{action} requires root."
+    )
+    print_info("  Options:")
+    print_info(f"    1. {action.capitalize()} it this time:")
+    if action == "start":
+        print_info(f"         sudo systemctl start {svc}")
+    elif action == "stop":
+        print_info(f"         sudo systemctl stop {svc}")
+    elif action == "restart":
+        print_info(f"         sudo systemctl restart {svc}")
+    else:
+        print_info(f"         sudo systemctl {action} {svc}")
+    print_info("    2. Switch to a per-user service (recommended for personal use):")
+    print_info("         sudo hermes gateway uninstall --system")
+    print_info("         hermes gateway install")
+    print_info("         hermes gateway start")
+
+
 def _get_restart_drain_timeout() -> float:
    """Return the configured gateway restart drain timeout in seconds."""
    raw = os.getenv("HERMES_RESTART_DRAIN_TIMEOUT", "").strip()
@@ -2071,41 +2225,52 @@ def systemd_restart(system: bool = False):
    refresh_systemd_unit_if_needed(system=system)
    from gateway.status import get_running_pid

-    pid = get_running_pid()
-    if pid is not None and _request_gateway_self_restart(pid):
-        import time
+    pid = get_running_pid() or _systemd_main_pid(system=system)
+    if pid is not None:
        scope_label = _service_scope_label(system).capitalize()
        svc = get_service_name()
+        drain_timeout = _get_restart_drain_timeout()

-        # Phase 1: wait for old process to exit (drain + shutdown)
-        print(f"⏳ {scope_label} service draining active work...")
-        deadline = time.time() + 90
-        while time.time() < deadline:
-            try:
-                os.kill(pid, 0)
-                time.sleep(1)
-            except (ProcessLookupError, PermissionError):
-                break  # old process is gone
-        else:
-            print(f"⚠ Old process (PID {pid}) still alive after 90s")
+        print(f"⏳ {scope_label} service restarting gracefully (PID {pid})...")
+        if _graceful_restart_via_sigusr1(pid, drain_timeout + 5):
+            # The gateway exits with code 75 for a planned service restart.
+            # RestartSec can otherwise delay the relaunch even though the
+            # operator asked for an immediate restart, so kick the unit once
+            # the old PID has exited and then wait for the replacement PID.
+            _run_systemctl(
+                ["reset-failed", svc],
+                system=system,
+                check=False,
+                timeout=30,
+            )
+            _run_systemctl(
+                ["restart", svc],
+                system=system,
+                check=False,
+                timeout=90,
+            )
+            if _wait_for_systemd_service_restart(system=system, previous_pid=pid):
+                return
+            if _systemd_service_is_start_limited(system=system):
+                return

-        # The gateway exits with code 75 for a planned service restart.
-        # systemd can sit in the RestartSec window or even wedge itself into a
-        # failed/rate-limited state if the operator asks for another restart in
-        # the middle of that handoff. Clear any stale failed state and kick the
-        # unit immediately so `hermes gateway restart` behaves idempotently.
+        print(
+            f"⚠ Graceful restart did not complete within {int(drain_timeout + 5)}s; "
+            "forcing a service restart..."
+        )
        _run_systemctl(
            ["reset-failed", svc],
            system=system,
            check=False,
            timeout=30,
        )
-        _run_systemctl(
-            ["start", svc],
-            system=system,
-            check=False,
-            timeout=90,
-        )
+        try:
+            _run_systemctl(["restart", svc], system=system, check=True, timeout=90)
+        except subprocess.CalledProcessError as exc:
+            if _systemd_error_indicates_start_limit(exc) or _systemd_service_is_start_limited(system=system):
+                _print_systemd_start_limit_wait(system=system)
+                return
+            raise
        _wait_for_systemd_service_restart(system=system, previous_pid=pid)
        return

@@ -2118,8 +2283,14 @@ def systemd_restart(system: bool = False):
        check=False,
        timeout=30,
    )
-    _run_systemctl(["reload-or-restart", get_service_name()], system=system, check=True, timeout=90)
-    print(f"✓ {_service_scope_label(system).capitalize()} service restarted")
+    try:
+        _run_systemctl(["restart", get_service_name()], system=system, check=True, timeout=90)
+    except subprocess.CalledProcessError as exc:
+        if _systemd_error_indicates_start_limit(exc) or _systemd_service_is_start_limited(system=system):
+            _print_systemd_start_limit_wait(system=system)
+            return
+        raise
+    _wait_for_systemd_service_restart(system=system, previous_pid=pid)



@@ -2191,6 +2362,10 @@ def systemd_status(deep: bool = False, system: bool = False, full: bool = False)
    result_code = unit_props.get("Result", "")
    if active_state == "activating" and sub_state == "auto-restart":
        print("  ⏳ Restart pending: systemd is waiting to relaunch the gateway")
+    elif _systemd_unit_is_start_limited(unit_props):
+        print("  ⏳ Restart pending: systemd is temporarily rate-limiting starts")
+        print(f"  Run after the start-limit window expires: {'sudo ' if system else ''}hermes gateway restart{scope_flag}")
+        print(f"  Or clear it manually: systemctl {'--user ' if not system else ''}reset-failed {get_service_name()}")
    elif active_state == "failed" and exec_main_status == str(GATEWAY_SERVICE_RESTART_EXIT_CODE):
        print("  ⚠ Planned restart is stuck in systemd failed state (exit 75)")
        print(f"  Run: systemctl {'--user ' if not system else ''}reset-failed {get_service_name()} && {'sudo ' if system else ''}hermes gateway start{scope_flag}")
@@ -4115,7 +4290,9 @@ def gateway_setup():
        print_success("Gateway service is installed and running.")
    elif service_installed:
        print_warning("Gateway service is installed but not running.")
-        if prompt_yes_no("  Start it now?", True):
+        if supports_systemd_services() and _system_scope_wizard_would_need_root():
+            _print_system_scope_remediation("start")
+        elif prompt_yes_no("  Start it now?", True):
            try:
                if supports_systemd_services():
                    systemd_start()
@@ -4125,6 +4302,12 @@ def gateway_setup():
                print_error("  Failed to start — user systemd not reachable:")
                for line in str(e).splitlines():
                    print(f"  {line}")
+            except SystemScopeRequiresRootError as e:
+                # Defense in depth: the pre-check above should have caught
+                # this, but handle the race/edge case gracefully instead of
+                # letting the exception escape the wizard.
+                print_error(f"  Failed to start: {e}")
+                _print_system_scope_remediation("start")
            except subprocess.CalledProcessError as e:
                print_error(f"  Failed to start: {e}")
    else:
@@ -4174,7 +4357,9 @@ def gateway_setup():
        service_running = _is_service_running()

        if service_running:
-            if prompt_yes_no("  Restart the gateway to pick up changes?", True):
+            if supports_systemd_services() and _system_scope_wizard_would_need_root():
+                _print_system_scope_remediation("restart")
+            elif prompt_yes_no("  Restart the gateway to pick up changes?", True):
                try:
                    if supports_systemd_services():
                        systemd_restart()
@@ -4187,10 +4372,15 @@ def gateway_setup():
                    print_error("  Restart failed — user systemd not reachable:")
                    for line in str(e).splitlines():
                        print(f"  {line}")
+                except SystemScopeRequiresRootError as e:
+                    print_error(f"  Restart failed: {e}")
+                    _print_system_scope_remediation("restart")
                except subprocess.CalledProcessError as e:
                    print_error(f"  Restart failed: {e}")
        elif service_installed:
-            if prompt_yes_no("  Start the gateway service?", True):
+            if supports_systemd_services() and _system_scope_wizard_would_need_root():
+                _print_system_scope_remediation("start")
+            elif prompt_yes_no("  Start the gateway service?", True):
                try:
                    if supports_systemd_services():
                        systemd_start()
@@ -4200,6 +4390,9 @@ def gateway_setup():
                    print_error("  Start failed — user systemd not reachable:")
                    for line in str(e).splitlines():
                        print(f"  {line}")
+                except SystemScopeRequiresRootError as e:
+                    print_error(f"  Start failed: {e}")
+                    _print_system_scope_remediation("start")
                except subprocess.CalledProcessError as e:
                    print_error(f"  Start failed: {e}")
        else:
@@ -4273,6 +4466,14 @@ def gateway_command(args):
        for line in str(e).splitlines():
            print(f"  {line}")
        sys.exit(1)
+    except SystemScopeRequiresRootError as e:
+        # The direct ``hermes gateway install|uninstall|start|stop|restart``
+        # path lands here when the user typed a system-scope action without
+        # sudo. Same exit code as before — just gives the wizard a way to
+        # intercept the same condition with friendlier guidance before the
+        # error is raised.
+        print(str(e))
+        sys.exit(1)


 def _gateway_command_inner(args):
@@ -3534,7 +3534,7 @@ def _remove_custom_provider(config):
            clear_screen=False,
            title="Select provider to remove:",
        )
-        idx: int | None = menu.show()  # ty:ignore[invalid-assignment] - TerminalMenu.show() is always `int | None` when multi_select is False / not provided.
+        idx = menu.show()
        from hermes_cli.curses_ui import flush_stdin

        flush_stdin()
@@ -3620,7 +3620,7 @@ def _model_flow_named_custom(config, provider_info):
                clear_screen=False,
                title=f"Select model from {name}:",
            )
-            idx: int | None = menu.show()  # ty:ignore[invalid-assignment] - TerminalMenu.show() is always `int | None` when multi_select is False / not provided.
+            idx = menu.show()
            from hermes_cli.curses_ui import flush_stdin

            flush_stdin()
@@ -3796,7 +3796,7 @@ def _prompt_reasoning_effort_selection(efforts, current_effort=""):
            clear_screen=False,
            title="Select reasoning effort:",
        )
-        idx: int | None = menu.show()  # ty:ignore[invalid-assignment] - TerminalMenu.show() is always `int | None` when multi_select is False / not provided.
+        idx = menu.show()
        from hermes_cli.curses_ui import flush_stdin

        flush_stdin()
@@ -7582,7 +7582,7 @@ def _cmd_update_impl(args, gateway_mode: bool):
            # systemd units without SIGUSR1 wiring this wait just times out
            # and we fall back to ``systemctl restart`` (the old behaviour).
            try:
-                from gateway.restart import (
+                from hermes_constants import (
                    DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT as _DEFAULT_DRAIN,
                )
            except Exception:
@@ -2462,6 +2462,9 @@ def setup_gateway(config: dict):
            launchd_start,
            launchd_restart,
            UserSystemdUnavailableError,
+            SystemScopeRequiresRootError,
+            _system_scope_wizard_would_need_root,
+            _print_system_scope_remediation,
        )

        service_installed = _is_service_installed()
@@ -2479,7 +2482,9 @@ def setup_gateway(config: dict):
            print()

        if service_running:
-            if prompt_yes_no("  Restart the gateway to pick up changes?", True):
+            if supports_systemd and _system_scope_wizard_would_need_root():
+                _print_system_scope_remediation("restart")
+            elif prompt_yes_no("  Restart the gateway to pick up changes?", True):
                try:
                    if supports_systemd:
                        systemd_restart()
@@ -2489,10 +2494,19 @@ def setup_gateway(config: dict):
                    print_error("  Restart failed — user systemd not reachable:")
                    for line in str(e).splitlines():
                        print(f"  {line}")
+                except SystemScopeRequiresRootError as e:
+                    # Defense in depth: the pre-check above should have
+                    # caught this, but a race (unit file appearing mid-run)
+                    # could still land here. Previously this exited the
+                    # whole wizard via sys.exit(1).
+                    print_error(f"  Restart failed: {e}")
+                    _print_system_scope_remediation("restart")
                except Exception as e:
                    print_error(f"  Restart failed: {e}")
        elif service_installed:
-            if prompt_yes_no("  Start the gateway service?", True):
+            if supports_systemd and _system_scope_wizard_would_need_root():
+                _print_system_scope_remediation("start")
+            elif prompt_yes_no("  Start the gateway service?", True):
                try:
                    if supports_systemd:
                        systemd_start()
@@ -2502,6 +2516,9 @@ def setup_gateway(config: dict):
                    print_error("  Start failed — user systemd not reachable:")
                    for line in str(e).splitlines():
                        print(f"  {line}")
+                except SystemScopeRequiresRootError as e:
+                    print_error(f"  Start failed: {e}")
+                    _print_system_scope_remediation("start")
                except Exception as e:
                    print_error(f"  Start failed: {e}")
        elif supports_service_manager:
@@ -2529,6 +2546,9 @@ def setup_gateway(config: dict):
                            print_error("  Start failed — user systemd not reachable:")
                            for line in str(e).splitlines():
                                print(f"  {line}")
+                        except SystemScopeRequiresRootError as e:
+                            print_error(f"  Start failed: {e}")
+                            _print_system_scope_remediation("start")
                        except Exception as e:
                            print_error(f"  Start failed: {e}")
                except Exception as e:
@@ -281,6 +281,8 @@ _recorder_lock = threading.Lock()
 # ── Continuous (VAD) state ───────────────────────────────────────────
 _continuous_lock = threading.Lock()
 _continuous_active = False
+_continuous_stopping = False
+_continuous_auto_restart: bool = True
 _continuous_recorder: Any = None

 # ── TTS-vs-STT feedback guard ────────────────────────────────────────
@@ -370,32 +372,43 @@ def start_continuous(
    on_silent_limit: Optional[Callable[[], None]] = None,
    silence_threshold: int = 200,
    silence_duration: float = 3.0,
-) -> None:
+    auto_restart: bool = True,
+) -> bool:
    """Start a VAD-driven continuous recording loop.

    The loop calls ``on_transcript(text)`` each time speech is detected and
-    transcribed successfully, then auto-restarts. After
-    ``_CONTINUOUS_NO_SPEECH_LIMIT`` consecutive silent cycles (no speech
-    picked up at all) the loop stops itself and calls ``on_silent_limit``
-    so the UI can reflect "voice off". Idempotent — calling while already
-    active is a no-op.
+    transcribed successfully. If ``auto_restart`` is True, it auto-restarts
+    for the next turn and resets the no-speech counter for that loop. If
+    ``auto_restart`` is False, the first silence-triggered transcription ends
+    the loop and reports ``"idle"``; no-speech counts are retained across
+    starts so a push-to-talk caller can still enforce the three-strikes guard.
+    After ``_CONTINUOUS_NO_SPEECH_LIMIT`` consecutive silent cycles (no speech
+    picked up at all) the loop stops itself and calls ``on_silent_limit`` so the
+    UI can reflect "voice off". Returns False if a previous stop is still
+    transcribing/cleaning up; otherwise returns True. Idempotent — calling while
+    already active is a successful no-op.

    ``on_status`` is called with ``"listening"`` / ``"transcribing"`` /
    ``"idle"`` so the UI can show a live indicator.
    """
-    global _continuous_active, _continuous_recorder
+    global _continuous_active, _continuous_recorder, _continuous_auto_restart
    global _continuous_on_transcript, _continuous_on_status, _continuous_on_silent_limit
    global _continuous_no_speech_count

    with _continuous_lock:
        if _continuous_active:
            _debug("start_continuous: already active — no-op")
-            return
+            return True
+        if _continuous_stopping:
+            _debug("start_continuous: stop/transcribe in progress — busy")
+            return False
        _continuous_active = True
+        _continuous_auto_restart = auto_restart
        _continuous_on_transcript = on_transcript
        _continuous_on_status = on_status
        _continuous_on_silent_limit = on_silent_limit
-        _continuous_no_speech_count = 0
+        if auto_restart:
+            _continuous_no_speech_count = 0

        if _continuous_recorder is None:
            _continuous_recorder = create_audio_recorder()
@@ -428,15 +441,18 @@ def start_continuous(
        except Exception:
            pass

+    return True

-def stop_continuous() -> None:
+
+def stop_continuous(force_transcribe: bool = False) -> None:
    """Stop the active continuous loop and release the microphone.

-    Idempotent — calling while not active is a no-op. Any in-flight
-    transcription completes but its result is discarded (the callback
-    checks ``_continuous_active`` before firing).
+    Idempotent — calling while not active is a no-op. If ``force_transcribe`` is
+    True, the recorder stops synchronously, then transcription/cleanup runs on a
+    background thread before reporting ``"idle"``. Otherwise the buffer is
+    discarded.
    """
-    global _continuous_active, _continuous_on_transcript
+    global _continuous_active, _continuous_on_transcript, _continuous_stopping
    global _continuous_on_status, _continuous_on_silent_limit
    global _continuous_recorder, _continuous_no_speech_count

@@ -446,18 +462,98 @@ def stop_continuous() -> None:
        _continuous_active = False
        rec = _continuous_recorder
        on_status = _continuous_on_status
+        on_transcript = _continuous_on_transcript
+        on_silent_limit = _continuous_on_silent_limit
+        auto_restart = _continuous_auto_restart
+        track_no_speech = force_transcribe and not auto_restart
+        _continuous_stopping = rec is not None
        _continuous_on_transcript = None
        _continuous_on_status = None
        _continuous_on_silent_limit = None
-        _continuous_no_speech_count = 0
+        if not track_no_speech:
+            _continuous_no_speech_count = 0

    if rec is not None:
-        try:
-            # cancel() (not stop()) discards buffered frames — the loop
-            # is over, we don't want to transcribe a half-captured turn.
-            rec.cancel()
-        except Exception as e:
-            logger.warning("failed to cancel recorder: %s", e)
+        if force_transcribe and on_transcript:
+            if on_status:
+                try:
+                    on_status("transcribing")
+                except Exception:
+                    pass
+            try:
+                wav_path = rec.stop()
+            except Exception as e:
+                logger.warning("failed to stop recorder: %s", e)
+                try:
+                    rec.cancel()
+                except Exception as cancel_error:
+                    logger.warning("failed to cancel recorder: %s", cancel_error)
+                wav_path = None
+
+            def _transcribe_and_cleanup():
+                global _continuous_no_speech_count, _continuous_stopping
+                transcript: Optional[str] = None
+                should_halt = False
+
+                try:
+                    if wav_path:
+                        try:
+                            result = transcribe_recording(wav_path)
+                            if result.get("success"):
+                                text = (result.get("transcript") or "").strip()
+                                if text and not is_whisper_hallucination(text):
+                                    transcript = text
+                        finally:
+                            if os.path.isfile(wav_path):
+                                os.unlink(wav_path)
+                except Exception as e:
+                    logger.warning("failed to stop/transcribe recorder: %s", e)
+                finally:
+                    if transcript:
+                        try:
+                            on_transcript(transcript)
+                        except Exception as e:
+                            logger.warning("on_transcript callback raised: %s", e)
+
+                    if track_no_speech:
+                        with _continuous_lock:
+                            if transcript:
+                                _continuous_no_speech_count = 0
+                            else:
+                                _continuous_no_speech_count += 1
+                                should_halt = (
+                                    _continuous_no_speech_count
+                                    >= _CONTINUOUS_NO_SPEECH_LIMIT
+                                )
+                                if should_halt:
+                                    _continuous_no_speech_count = 0
+                        if should_halt and on_silent_limit:
+                            try:
+                                on_silent_limit()
+                            except Exception:
+                                pass
+
+                    _play_beep(frequency=660, count=2)
+                    with _continuous_lock:
+                        _continuous_stopping = False
+                    if on_status:
+                        try:
+                            on_status("idle")
+                        except Exception:
+                            pass
+
+            threading.Thread(target=_transcribe_and_cleanup, daemon=True).start()
+            return
+        else:
+            try:
+                # cancel() (not stop()) discards buffered frames — the loop
+                # is over, we don't want to transcribe a half-captured turn.
+                rec.cancel()
+            except Exception as e:
+                logger.warning("failed to cancel recorder: %s", e)
+
+    with _continuous_lock:
+        _continuous_stopping = False

    # Audible "recording stopped" cue (CLI parity: same 660 Hz × 2 the
    # silence-auto-stop path plays).
@@ -603,23 +699,39 @@ def _continuous_on_silence() -> None:
                _debug("_continuous_on_silence: stopped while waiting for TTS")
                return

-    # Restart for the next turn.
-    _debug(f"_continuous_on_silence: restarting loop (no_speech={no_speech})")
-    _play_beep(frequency=880, count=1)
-    try:
-        rec.start(on_silence_stop=_continuous_on_silence)
-    except Exception as e:
-        logger.error("failed to restart continuous recording: %s", e)
-        _debug(f"_continuous_on_silence: restart raised {type(e).__name__}: {e}")
+    if _continuous_auto_restart:
+        # Restart for the next turn.
+        _debug(f"_continuous_on_silence: restarting loop (no_speech={no_speech})")
+        _play_beep(frequency=880, count=1)
+        try:
+            rec.start(on_silence_stop=_continuous_on_silence)
+        except Exception as e:
+            logger.error("failed to restart continuous recording: %s", e)
+            _debug(f"_continuous_on_silence: restart raised {type(e).__name__}: {e}")
+            with _continuous_lock:
+                _continuous_active = False
+            if on_status:
+                try:
+                    on_status("idle")
+                except Exception:
+                    pass
+            return
+
+        if on_status:
+            try:
+                on_status("listening")
+            except Exception:
+                pass
+    else:
+        # Do not auto-restart. Clean up state and notify idle.
+        _debug("_continuous_on_silence: auto_restart=False, stopping loop")
        with _continuous_lock:
            _continuous_active = False
-        return
-
-    if on_status:
-        try:
-            on_status("listening")
-        except Exception:
-            pass
+        if on_status:
+            try:
+                on_status("idle")
+            except Exception:
+                pass


 # ── TTS API ──────────────────────────────────────────────────────────
@@ -0,0 +1,296 @@
+"""
+Benchmark: Current main (3 separate WS connections) vs optimized (1 connection).
+
+Compares the two CDP coordinate click implementations against a real
+Lightpanda WebSocket at ws://127.0.0.1:63372/.
+
+  - Baseline (current main style): 3 separate _cdp_call() invocations, each
+    opening a fresh WS connection (Target.getTargets, mousePressed, mouseReleased)
+  - Optimized (this PR): single WS connection with all 4 messages pipelined
+    (getTargets + attachToTarget + mousePressed+mouseReleased in one burst)
+
+Also measures the agent-browser HTTP IPC round-trip as a reference point
+for how fast the existing ref-based click path is.
+
+Usage:
+    python scripts/benchmark_click_paths.py
+    python scripts/benchmark_click_paths.py --iterations 300 --warmup 20
+"""
+from __future__ import annotations
+
+import argparse
+import asyncio
+import json
+import sys
+import time
+import urllib.request
+from statistics import mean, median, stdev
+from typing import List, Dict, Optional, Tuple
+import os
+
+# Add repo root to sys.path when running this script directly
+_repo_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+if _repo_root not in sys.path:
+    sys.path.insert(0, _repo_root)
+
+LIGHTPANDA_WS = "ws://127.0.0.1:63372/"
+AGENT_BROWSER_PORT = 63371
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+def _stats(times_s: List[float]) -> Dict:
+    ms = [t * 1000 for t in times_s]
+    return {
+        "mean_ms":   mean(ms),
+        "median_ms": median(ms),
+        "min_ms":    min(ms),
+        "max_ms":    max(ms),
+        "stdev_ms":  stdev(ms) if len(ms) > 1 else 0.0,
+        "p95_ms":    sorted(ms)[int(len(ms) * 0.95)],
+    }
+
+
+def _bench(fn, warmup: int, n: int) -> Tuple[List[float], int]:
+    for _ in range(warmup):
+        fn()
+    times, errors = [], 0
+    for _ in range(n):
+        t0 = time.perf_counter()
+        try:
+            result = fn()
+            elapsed = time.perf_counter() - t0
+            if isinstance(result, str):
+                d = json.loads(result)
+                if not d.get("success"):
+                    errors += 1
+        except Exception:
+            elapsed = time.perf_counter() - t0
+            errors += 1
+        times.append(elapsed)
+    return times, errors
+
+
+def _row(label: str, stats: Dict, col_w: int = 9) -> None:
+    print(
+        f"  {label:<46}  "
+        f"{stats['mean_ms']:>{col_w}.2f}  "
+        f"{stats['median_ms']:>{col_w}.2f}  "
+        f"{stats['min_ms']:>{col_w}.2f}  "
+        f"{stats['p95_ms']:>{col_w}.2f}  "
+        f"{stats['max_ms']:>{col_w}.2f}  ms"
+    )
+
+
+# ---------------------------------------------------------------------------
+# The "current main" approach — 3 separate _cdp_call() connections
+# ---------------------------------------------------------------------------
+
+def _baseline_cdp_click(endpoint: str, x: int, y: int, button: str = "left") -> str:
+    """Replicate the previous 3-connection approach from the original PR."""
+    from tools.browser_cdp_tool import _cdp_call, _run_async
+
+    try:
+        targets_result = _run_async(_cdp_call(endpoint, "Target.getTargets", {}, None, 10.0))
+        page_target = None
+        for t in targets_result.get("targetInfos", []):
+            if t.get("type") == "page" and t.get("attached", True):
+                page_target = t["targetId"]
+                break
+    except Exception:
+        page_target = None
+
+    mouse_params = {"type": "", "x": x, "y": y, "button": button, "clickCount": 1}
+    try:
+        _run_async(_cdp_call(endpoint, "Input.dispatchMouseEvent",
+                             {**mouse_params, "type": "mousePressed"}, page_target, 10.0))
+        _run_async(_cdp_call(endpoint, "Input.dispatchMouseEvent",
+                             {**mouse_params, "type": "mouseReleased"}, page_target, 10.0))
+    except Exception as e:
+        return json.dumps({"success": False, "error": str(e)})
+    return json.dumps({"success": True, "clicked_at": {"x": x, "y": y}, "method": "baseline"})
+
+
+# ---------------------------------------------------------------------------
+# Main
+# ---------------------------------------------------------------------------
+
+def run_benchmark(iterations: int = 300, warmup: int = 20) -> None:
+    print(f"\n{'=' * 78}")
+    print(f"  browser_click Coordinate Click: Current Main vs Optimized (1-conn)")
+    print(f"  Real Lightpanda WS: {LIGHTPANDA_WS}")
+    print(f"{'=' * 78}")
+    print(f"  Iterations: {iterations}  |  Warmup: {warmup}")
+
+    # pre-flight
+    try:
+        with urllib.request.urlopen("http://127.0.0.1:63372/json/version", timeout=2) as r:
+            info = json.loads(r.read())
+            assert "webSocketDebuggerUrl" in info
+        print(f"  ✓ Lightpanda CDP: {info.get('webSocketDebuggerUrl')}")
+    except Exception as e:
+        print(f"  ✗ Lightpanda not reachable: {e}")
+        return
+
+    try:
+        with urllib.request.urlopen(f"http://127.0.0.1:{AGENT_BROWSER_PORT}/api/sessions", timeout=2) as r:
+            sessions = json.loads(r.read())
+        print(f"  ✓ agent-browser: {len(sessions)} session(s)")
+        ab_ok = True
+    except Exception:
+        print(f"  ⚠  agent-browser not reachable — ref-click IPC baseline skipped")
+        ab_ok = False
+
+    import importlib
+    import tools.browser_tool as bt
+    import tools.browser_cdp_tool as cdp_mod
+    importlib.reload(cdp_mod)
+    importlib.reload(bt)
+    bt._is_camofox_mode = lambda: False
+    _orig_resolve = cdp_mod._resolve_cdp_endpoint
+
+    # -----------------------------------------------------------------------
+    # 1. Baseline: current-main 3-connection approach
+    # -----------------------------------------------------------------------
+    print(f"\n  [1/4] Baseline (current main — 3 separate WS connections per click)")
+    print(f"        Warmup {warmup}, then {iterations} iterations...")
+
+    base_times, base_err = _bench(
+        lambda: _baseline_cdp_click(LIGHTPANDA_WS, 150, 200),
+        warmup, iterations,
+    )
+    base_stats = _stats(base_times)
+    print(f"        Done — {base_err} errors, mean={base_stats['mean_ms']:.2f}ms")
+
+    # -----------------------------------------------------------------------
+    # 2. Optimized: single-connection — cold cache (session resolve included)
+    # -----------------------------------------------------------------------
+    print(f"\n  [2/4] Optimized — cold cache (1 WS conn, includes getTargets+attachToTarget)")
+    print(f"        {iterations} iterations, cache cleared before each...")
+
+    def _cold_click():
+        bt._CDP_SESSION_CACHE.clear()
+        return bt.browser_click(x=150.0, y=200.0, task_id="bench")
+
+    cdp_mod._resolve_cdp_endpoint = lambda: LIGHTPANDA_WS
+    # Temporarily null out supervisor registry so this test isolates path 2
+    import tools.browser_supervisor as sup_mod
+    _orig_registry_get = sup_mod.SUPERVISOR_REGISTRY.get
+    sup_mod.SUPERVISOR_REGISTRY.get = lambda tid: None
+    cold_times, cold_err = _bench(_cold_click, warmup=0, n=iterations)
+    cold_stats = _stats(cold_times)
+    print(f"        Done — {cold_err} errors, mean={cold_stats['mean_ms']:.2f}ms")
+
+    # -----------------------------------------------------------------------
+    # 3. Optimized: warm cache (session cached — skips getTargets+attachToTarget)
+    # -----------------------------------------------------------------------
+    print(f"\n  [3/4] Optimized — warm cache (1 WS conn, skips getTargets+attachToTarget)")
+    print(f"        Warmup {warmup} (fills cache), then {iterations} iterations...")
+
+    bt._CDP_SESSION_CACHE.clear()
+    opt_times, opt_err = _bench(
+        lambda: bt.browser_click(x=150.0, y=200.0, task_id="bench"),
+        warmup, iterations,
+    )
+    sup_mod.SUPERVISOR_REGISTRY.get = _orig_registry_get
+    cdp_mod._resolve_cdp_endpoint = _orig_resolve
+    opt_stats = _stats(opt_times)
+    print(f"        Done — {opt_err} errors, mean={opt_stats['mean_ms']:.2f}ms")
+
+    # -----------------------------------------------------------------------
+    # 4. Supervisor path: real CDPSupervisor with persistent WS
+    # -----------------------------------------------------------------------
+    print(f"\n  [4/4] Supervisor path (persistent WS — zero per-click connection cost)")
+    print(f"        Starting supervisor → {LIGHTPANDA_WS}...")
+    sup_stats = None
+    sup_err_count = 0
+    try:
+        supervisor = sup_mod.CDPSupervisor.__new__(sup_mod.CDPSupervisor)
+        # minimal init — we only need _loop, _ws, _page_session_id, _state_lock,
+        # _pending_calls, _next_call_id, _active, _stop_requested
+        # Use SUPERVISOR_REGISTRY.get_or_start for a fully initialized supervisor
+        TASK_ID = "bench-supervisor"
+        real_sup = sup_mod.SUPERVISOR_REGISTRY.get_or_start(TASK_ID, LIGHTPANDA_WS)
+        import time as _time
+        # Give supervisor time to connect and attach
+        for _ in range(20):
+            snap = real_sup.snapshot()
+            if snap.active:
+                break
+            _time.sleep(0.1)
+
+        if not real_sup.snapshot().active:
+            print(f"        ⚠  Supervisor did not become active — skipping")
+        else:
+            print(f"        ✓ Supervisor active, warmup {warmup}...")
+            def _sup_click():
+                real_sup.dispatch_mouse_click(150, 200)
+                return json.dumps({"success": True})
+
+            for _ in range(warmup):
+                _sup_click()
+            print(f"        Running {iterations} iterations...")
+            sup_times, sup_err_count = _bench(_sup_click, warmup=0, n=iterations)
+            sup_stats = _stats(sup_times)
+            print(f"        Done — {sup_err_count} errors, mean={sup_stats['mean_ms']:.2f}ms")
+            sup_mod.SUPERVISOR_REGISTRY.stop(TASK_ID)
+    except Exception as e:
+        print(f"        ⚠  Supervisor benchmark failed: {e}")
+
+    # -----------------------------------------------------------------------
+    # Ref baseline
+    # -----------------------------------------------------------------------
+    if ab_ok:
+        print(f"\n  [ref] agent-browser HTTP IPC (ref-click latency baseline)")
+        ab_times = []
+        for _ in range(warmup):
+            urllib.request.urlopen(f"http://127.0.0.1:{AGENT_BROWSER_PORT}/api/sessions", timeout=5).read()
+        for _ in range(iterations):
+            t0 = time.perf_counter()
+            urllib.request.urlopen(f"http://127.0.0.1:{AGENT_BROWSER_PORT}/api/sessions", timeout=5).read()
+            ab_times.append(time.perf_counter() - t0)
+        ab_stats = _stats(ab_times)
+        print(f"        Done — mean={ab_stats['mean_ms']:.2f}ms")
+
+    # -----------------------------------------------------------------------
+    # Results
+    # -----------------------------------------------------------------------
+    col_w = 9
+    print(f"\n{'─' * 82}")
+    print(f"  {'Approach':<50}  {'Mean':>{col_w}}  {'Median':>{col_w}}  {'Min':>{col_w}}  {'p95':>{col_w}}")
+    print(f"{'─' * 82}")
+    _row("Baseline  (3 WS connections, sequential)         ", base_stats, col_w)
+    _row("Optimized — cold cache (1 conn + negotiate)      ", cold_stats, col_w)
+    _row("Optimized — warm cache (1 conn, skip resolve)    ", opt_stats,  col_w)
+    if sup_stats:
+        _row("Supervisor (persistent WS, zero conn cost)       ", sup_stats,  col_w)
+    if ab_ok:
+        _row("Ref-click IPC baseline (1 HTTP req)              ", ab_stats,  col_w)
+    print(f"{'─' * 82}")
+
+    print(f"\n  Speedups (mean vs baseline):")
+    print(f"    Cold cache:   {base_stats['mean_ms'] / cold_stats['mean_ms']:.2f}x  ({base_stats['mean_ms'] - cold_stats['mean_ms']:.2f} ms saved)")
+    print(f"    Warm cache:   {base_stats['mean_ms'] / opt_stats['mean_ms']:.2f}x  ({base_stats['mean_ms'] - opt_stats['mean_ms']:.2f} ms saved)")
+    if sup_stats:
+        print(f"    Supervisor:   {base_stats['mean_ms'] / sup_stats['mean_ms']:.2f}x  ({base_stats['mean_ms'] - sup_stats['mean_ms']:.2f} ms saved)")
+        print(f"    Warm→Supervisor additional gain: {opt_stats['mean_ms'] - sup_stats['mean_ms']:.2f} ms  (WS conn eliminated)")
+    if ab_ok and sup_stats:
+        print(f"    Supervisor vs ref-click: {sup_stats['mean_ms'] / ab_stats['mean_ms']:.1f}x  (+{sup_stats['mean_ms'] - ab_stats['mean_ms']:.2f} ms)")
+
+    print(f"\n  Optimization tiers in this PR:")
+    print(f"    1. Single WS connection       — eliminates 2 TCP+WS handshakes")
+    print(f"    2. mouseReleased-only wait     — skips redundant press ack (Playwright)")
+    print(f"    3. Session ID cache            — skips getTargets+attachToTarget")
+    print(f"    4. Supervisor reuse (new)      — eliminates the WS open entirely")
+    print(f"       Active after browser_navigate; falls back to warm-cache path if absent.")
+    print()
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--iterations", type=int, default=300)
+    parser.add_argument("--warmup", type=int, default=20)
+    args = parser.parse_args()
+    run_benchmark(iterations=args.iterations, warmup=args.warmup)
@@ -1,4 +1,5 @@
 import asyncio
+import json
 import sys
 from types import SimpleNamespace
 from unittest.mock import AsyncMock, MagicMock
@@ -70,6 +71,15 @@ import gateway.platforms.discord as discord_platform  # noqa: E402
 from gateway.platforms.discord import DiscordAdapter  # noqa: E402


+@pytest.fixture(autouse=True)
+def _speed_up_command_sync_mutation_pacing(monkeypatch):
+    monkeypatch.setattr(
+        DiscordAdapter,
+        "_command_sync_mutation_interval_seconds",
+        lambda self: 0.0,
+    )
+
+
 class FakeTree:
    def __init__(self):
        self.sync = AsyncMock(return_value=[])
@@ -536,6 +546,183 @@ async def test_post_connect_initialization_skips_sync_when_policy_off(monkeypatc
    fake_tree.sync.assert_not_called()


+@pytest.mark.asyncio
+async def test_post_connect_initialization_skips_same_fingerprint_after_success(tmp_path, monkeypatch):
+    adapter = DiscordAdapter(PlatformConfig(enabled=True, token="test-token"))
+    monkeypatch.setattr("hermes_constants.get_hermes_home", lambda: tmp_path)
+
+    class _DesiredCommand:
+        def to_dict(self, tree):
+            return {
+                "name": "status",
+                "description": "Show Hermes status",
+                "type": 1,
+                "options": [],
+            }
+
+    fake_tree = SimpleNamespace(
+        get_commands=lambda: [_DesiredCommand()],
+        fetch_commands=AsyncMock(return_value=[]),
+    )
+    fake_http = SimpleNamespace(
+        upsert_global_command=AsyncMock(),
+        edit_global_command=AsyncMock(),
+        delete_global_command=AsyncMock(),
+    )
+    adapter._client = SimpleNamespace(
+        tree=fake_tree,
+        http=fake_http,
+        application_id=999,
+        user=SimpleNamespace(id=999),
+    )
+
+    await adapter._run_post_connect_initialization()
+    await adapter._run_post_connect_initialization()
+
+    fake_tree.fetch_commands.assert_awaited_once()
+    fake_http.upsert_global_command.assert_awaited_once()
+
+
+@pytest.mark.asyncio
+async def test_post_connect_initialization_respects_discord_retry_after(tmp_path, monkeypatch):
+    adapter = DiscordAdapter(PlatformConfig(enabled=True, token="test-token"))
+    monkeypatch.setattr("hermes_constants.get_hermes_home", lambda: tmp_path)
+
+    class _DesiredCommand:
+        def to_dict(self, tree):
+            return {
+                "name": "status",
+                "description": "Show Hermes status",
+                "type": 1,
+                "options": [],
+            }
+
+    adapter._client = SimpleNamespace(
+        tree=SimpleNamespace(get_commands=lambda: [_DesiredCommand()]),
+        application_id=999,
+        user=SimpleNamespace(id=999),
+    )
+    class _DiscordRateLimit(RuntimeError):
+        retry_after = 123.0
+
+    sync = AsyncMock(side_effect=_DiscordRateLimit("discord rate limited"))
+    monkeypatch.setattr(adapter, "_safe_sync_slash_commands", sync)
+
+    await adapter._run_post_connect_initialization()
+    await adapter._run_post_connect_initialization()
+
+    sync.assert_awaited_once()
+    state_path = (
+        tmp_path
+        / discord_platform._DISCORD_COMMAND_SYNC_STATE_SUBDIR
+        / discord_platform._DISCORD_COMMAND_SYNC_STATE_FILENAME
+    )
+    state = json.loads(state_path.read_text())
+    entry = state["999"]
+    assert entry["retry_after"] == 123.0
+    assert entry["retry_after_until"] > entry["last_attempt_at"]
+
+
+@pytest.mark.asyncio
+async def test_post_connect_initialization_reraises_non_rate_limit_exceptions(tmp_path, monkeypatch):
+    """Arbitrary failures during sync must surface, not be swallowed as rate-limits."""
+    adapter = DiscordAdapter(PlatformConfig(enabled=True, token="test-token"))
+    monkeypatch.setattr("hermes_constants.get_hermes_home", lambda: tmp_path)
+
+    class _DesiredCommand:
+        def to_dict(self, tree):
+            return {"name": "status", "description": "Show Hermes status", "type": 1, "options": []}
+
+    adapter._client = SimpleNamespace(
+        tree=SimpleNamespace(get_commands=lambda: [_DesiredCommand()]),
+        application_id=4242,
+        user=SimpleNamespace(id=4242),
+    )
+
+    # Unrelated failure that happens to expose retry_after. Must NOT be
+    # caught by the rate-limit handler — it has nothing to do with 429s.
+    class _UnrelatedError(RuntimeError):
+        retry_after = 999.0
+
+    sync = AsyncMock(side_effect=_UnrelatedError("database is down"))
+    monkeypatch.setattr(adapter, "_safe_sync_slash_commands", sync)
+
+    # The outer _run_post_connect_initialization has a broad except Exception
+    # that logs defensively — so we assert on state NOT being written.
+    await adapter._run_post_connect_initialization()
+
+    sync.assert_awaited_once()
+    state_path = (
+        tmp_path
+        / discord_platform._DISCORD_COMMAND_SYNC_STATE_SUBDIR
+        / discord_platform._DISCORD_COMMAND_SYNC_STATE_FILENAME
+    )
+    state = json.loads(state_path.read_text()) if state_path.exists() else {}
+    entry = state.get("4242", {})
+    # Attempt was recorded before the sync call, but no rate-limit cooldown
+    # should have been persisted from the unrelated exception.
+    assert "retry_after_until" not in entry
+    assert "retry_after" not in entry
+
+
+@pytest.mark.asyncio
+async def test_safe_sync_slash_commands_paces_mutation_writes(monkeypatch):
+    adapter = DiscordAdapter(PlatformConfig(enabled=True, token="test-token"))
+    monkeypatch.setattr(
+        DiscordAdapter,
+        "_command_sync_mutation_interval_seconds",
+        lambda self: 1.25,
+    )
+    sleeps = []
+
+    async def fake_sleep(delay):
+        sleeps.append(delay)
+
+    monkeypatch.setattr(discord_platform.asyncio, "sleep", fake_sleep)
+
+    class _DesiredCommand:
+        def __init__(self, payload):
+            self._payload = payload
+
+        def to_dict(self, tree):
+            assert tree is not None
+            return dict(self._payload)
+
+    desired_one = {
+        "name": "status",
+        "description": "Show Hermes status",
+        "type": 1,
+        "options": [],
+    }
+    desired_two = {
+        "name": "debug",
+        "description": "Generate a debug report",
+        "type": 1,
+        "options": [],
+    }
+    fake_tree = SimpleNamespace(
+        get_commands=lambda: [_DesiredCommand(desired_one), _DesiredCommand(desired_two)],
+        fetch_commands=AsyncMock(return_value=[]),
+    )
+    fake_http = SimpleNamespace(
+        upsert_global_command=AsyncMock(),
+        edit_global_command=AsyncMock(),
+        delete_global_command=AsyncMock(),
+    )
+    adapter._client = SimpleNamespace(
+        tree=fake_tree,
+        http=fake_http,
+        application_id=999,
+        user=SimpleNamespace(id=999),
+    )
+
+    summary = await adapter._safe_sync_slash_commands()
+
+    assert summary["created"] == 2
+    assert fake_http.upsert_global_command.await_count == 2
+    assert sleeps == [1.25]
+
+
@pytest.mark.asyncio
 async def test_safe_sync_reads_permission_attrs_from_existing_command():
    """Regression: AppCommand.to_dict() in discord.py does NOT include
@@ -2,6 +2,7 @@

 import os
 import pwd
+import subprocess
 from pathlib import Path
 from types import SimpleNamespace

@@ -90,6 +91,13 @@ class TestSystemdServiceRefresh:
        monkeypatch.setattr(gateway_cli, "generate_systemd_unit", lambda system=False, run_as_user=None: "new unit\n")

        calls = []
+        monkeypatch.setattr("gateway.status.get_running_pid", lambda: None)
+        monkeypatch.setattr(gateway_cli, "_recover_pending_systemd_restart", lambda system=False, previous_pid=None: False)
+        monkeypatch.setattr(
+            gateway_cli,
+            "_wait_for_systemd_service_restart",
+            lambda system=False, previous_pid=None: calls.append(("wait", system, previous_pid)) or True,
+        )

        def fake_run(cmd, check=True, **kwargs):
            calls.append(cmd)
@@ -100,11 +108,12 @@ class TestSystemdServiceRefresh:
        gateway_cli.systemd_restart()

        assert unit_path.read_text(encoding="utf-8") == "new unit\n"
-        assert calls[:4] == [
+        assert calls[:5] == [
            ["systemctl", "--user", "daemon-reload"],
-            ["systemctl", "--user", "show", gateway_cli.get_service_name(), "--no-pager", "--property", "ActiveState,SubState,Result,ExecMainStatus"],
+            ["systemctl", "--user", "show", gateway_cli.get_service_name(), "--no-pager", "--property", "ActiveState,SubState,Result,ExecMainStatus,MainPID"],
            ["systemctl", "--user", "reset-failed", gateway_cli.get_service_name()],
-            ["systemctl", "--user", "reload-or-restart", gateway_cli.get_service_name()],
+            ["systemctl", "--user", "restart", gateway_cli.get_service_name()],
+            ("wait", False, None),
        ]

    def test_systemd_stop_marks_running_gateway_as_planned_stop(self, monkeypatch):
@@ -611,62 +620,141 @@ class TestGatewayServiceDetection:
        assert gateway_cli._is_service_running() is False

 class TestGatewaySystemServiceRouting:
-    def test_systemd_restart_self_requests_graceful_restart_and_waits(self, monkeypatch, capsys):
+    def test_systemd_restart_gracefully_restarts_running_service_and_waits(self, monkeypatch, capsys):
        calls = []

        monkeypatch.setattr(gateway_cli, "_select_systemd_scope", lambda system=False: False)
        monkeypatch.setattr(gateway_cli, "_require_service_installed", lambda action, system=False: None)
        monkeypatch.setattr(gateway_cli, "refresh_systemd_unit_if_needed", lambda system=False: calls.append(("refresh", system)))
+        monkeypatch.setattr(gateway_cli, "_get_restart_drain_timeout", lambda: 12.0)
        monkeypatch.setattr(
            "gateway.status.get_running_pid",
            lambda: 654,
        )
        monkeypatch.setattr(
            gateway_cli,
-            "_request_gateway_self_restart",
-            lambda pid: calls.append(("self", pid)) or True,
+            "_graceful_restart_via_sigusr1",
+            lambda pid, timeout: calls.append(("graceful", pid, timeout)) or True,
        )

-        # Simulate: old process dies immediately, new process becomes active
-        kill_call_count = [0]
-        def fake_kill(pid, sig):
-            kill_call_count[0] += 1
-            if kill_call_count[0] >= 2:  # first call checks, second = dead
-                raise ProcessLookupError()
-        monkeypatch.setattr(os, "kill", fake_kill)
-
-        # Simulate systemctl reset-failed/start followed by an active unit
-        new_pid = [None]
+        # Simulate systemctl reset-failed/restart followed by an active unit.
+        # A plain start does not break systemd's auto-restart timer once the
+        # old gateway has exited with the planned restart code.
        def fake_subprocess_run(cmd, **kwargs):
            if "reset-failed" in cmd:
                calls.append(("reset-failed", cmd))
                return SimpleNamespace(stdout="", returncode=0)
-            if "start" in cmd:
-                calls.append(("start", cmd))
+            if "restart" in cmd:
+                calls.append(("restart", cmd))
                return SimpleNamespace(stdout="", returncode=0)
-            if "show" in cmd:
-                new_pid[0] = 999
-                return SimpleNamespace(
-                    stdout="ActiveState=active\nSubState=running\nResult=success\nExecMainStatus=0\n",
-                    returncode=0,
-                )
            raise AssertionError(f"Unexpected systemctl call: {cmd}")

        monkeypatch.setattr(gateway_cli.subprocess, "run", fake_subprocess_run)
-        # get_running_pid returns new PID after restart
-        pid_calls = [0]
-        def fake_get_pid():
-            pid_calls[0] += 1
-            return 999 if pid_calls[0] > 1 else 654
-        monkeypatch.setattr("gateway.status.get_running_pid", fake_get_pid)
+        monkeypatch.setattr(
+            gateway_cli,
+            "_wait_for_systemd_service_restart",
+            lambda system=False, previous_pid=None: calls.append(("wait", system, previous_pid)) or True,
+        )

        gateway_cli.systemd_restart()

-        assert ("self", 654) in calls
+        assert ("graceful", 654, 17.0) in calls
        assert any(call[0] == "reset-failed" for call in calls)
-        assert any(call[0] == "start" for call in calls)
+        assert any(call[0] == "restart" for call in calls)
+        assert ("wait", False, 654) in calls
        out = capsys.readouterr().out.lower()
-        assert "restarted" in out
+        assert "restarting gracefully" in out
+
+    def test_systemd_restart_uses_systemd_main_pid_when_pid_file_is_missing(self, monkeypatch, capsys):
+        calls = []
+
+        monkeypatch.setattr(gateway_cli, "_select_systemd_scope", lambda system=False: False)
+        monkeypatch.setattr(gateway_cli, "_require_service_installed", lambda action, system=False: None)
+        monkeypatch.setattr(gateway_cli, "refresh_systemd_unit_if_needed", lambda system=False: None)
+        monkeypatch.setattr(gateway_cli, "_get_restart_drain_timeout", lambda: 10.0)
+        monkeypatch.setattr("gateway.status.get_running_pid", lambda: None)
+        monkeypatch.setattr(
+            gateway_cli,
+            "_read_systemd_unit_properties",
+            lambda system=False: {
+                "ActiveState": "active",
+                "SubState": "running",
+                "Result": "success",
+                "ExecMainStatus": "0",
+                "MainPID": "777",
+            },
+        )
+        monkeypatch.setattr(
+            gateway_cli,
+            "_graceful_restart_via_sigusr1",
+            lambda pid, timeout: calls.append(("graceful", pid, timeout)) or True,
+        )
+        monkeypatch.setattr(gateway_cli, "_run_systemctl", lambda args, **kwargs: calls.append(args) or SimpleNamespace(stdout="", returncode=0))
+        monkeypatch.setattr(
+            gateway_cli,
+            "_wait_for_systemd_service_restart",
+            lambda system=False, previous_pid=None: calls.append(("wait", system, previous_pid)) or True,
+        )
+
+        gateway_cli.systemd_restart()
+
+        assert ("graceful", 777, 15.0) in calls
+        assert ("wait", False, 777) in calls
+        assert "restarting gracefully (pid 777)" in capsys.readouterr().out.lower()
+
+    def test_wait_for_systemd_restart_waits_for_runtime_running(self, monkeypatch, capsys):
+        monkeypatch.setattr(
+            gateway_cli,
+            "_read_systemd_unit_properties",
+            lambda system=False: {
+                "ActiveState": "active",
+                "SubState": "running",
+                "Result": "success",
+                "ExecMainStatus": "0",
+                "MainPID": "999",
+            },
+        )
+        monkeypatch.setattr("gateway.status.get_running_pid", lambda: None)
+        monkeypatch.setattr(
+            gateway_cli,
+            "_gateway_runtime_status_for_pid",
+            lambda pid: {"pid": pid, "gateway_state": "running"},
+        )
+
+        assert gateway_cli._wait_for_systemd_service_restart(previous_pid=777, timeout=0.1) is True
+        assert "restarted (pid 999)" in capsys.readouterr().out.lower()
+
+    def test_systemd_restart_reports_start_limit_hit(self, monkeypatch, capsys):
+        calls = []
+
+        monkeypatch.setattr(gateway_cli, "_select_systemd_scope", lambda system=False: False)
+        monkeypatch.setattr(gateway_cli, "_require_service_installed", lambda action, system=False: None)
+        monkeypatch.setattr(gateway_cli, "refresh_systemd_unit_if_needed", lambda system=False: None)
+        monkeypatch.setattr("gateway.status.get_running_pid", lambda: None)
+        monkeypatch.setattr(gateway_cli, "_recover_pending_systemd_restart", lambda system=False, previous_pid=None: False)
+
+        def fake_run_systemctl(args, **kwargs):
+            calls.append(args)
+            if args[0] == "show":
+                return SimpleNamespace(stdout="ActiveState=inactive\nSubState=dead\nResult=success\nExecMainStatus=0\nMainPID=0\n", stderr="", returncode=0)
+            if args[0] == "reset-failed":
+                return SimpleNamespace(stdout="", stderr="", returncode=0)
+            if args[0] == "restart":
+                raise subprocess.CalledProcessError(
+                    1,
+                    ["systemctl", "--user", *args],
+                    stderr="Job failed. See result 'start-limit-hit'.",
+                )
+            raise AssertionError(f"Unexpected args: {args}")
+
+        monkeypatch.setattr(gateway_cli, "_run_systemctl", fake_run_systemctl)
+
+        gateway_cli.systemd_restart()
+
+        assert ["restart", gateway_cli.get_service_name()] in calls
+        out = capsys.readouterr().out.lower()
+        assert "rate-limited by systemd" in out
+        assert "reset-failed" in out

    def test_systemd_restart_recovers_failed_planned_restart(self, monkeypatch, capsys):
        monkeypatch.setattr(gateway_cli, "_select_systemd_scope", lambda system=False: False)
@@ -711,6 +799,11 @@ class TestGatewaySystemServiceRouting:
            "gateway.status.get_running_pid",
            lambda: 999 if started["value"] else None,
        )
+        monkeypatch.setattr(
+            gateway_cli,
+            "_gateway_runtime_status_for_pid",
+            lambda pid: {"pid": pid, "gateway_state": "running"},
+        )

        gateway_cli.systemd_restart()

@@ -2177,3 +2270,171 @@ class TestSystemdInstallOffersLegacyRemoval:

        assert prompt_called["count"] == 0
        assert remove_called["invoked"] is False
+
+
+class TestSystemScopeRequiresRootError:
+    """Tests for the SystemScopeRequiresRootError replacement of sys.exit(1).
+
+    Before this change, ``_require_root_for_system_service`` called
+    ``sys.exit(1)`` when non-root code tried a system-scope systemd
+    operation. The wizard's ``except Exception`` guards don't catch
+    ``SystemExit`` (it's a ``BaseException`` subclass), so the user was
+    dumped at a bare shell prompt mid-setup. The fix raises a typed
+    exception instead, which the wizard intercepts and handles with
+    actionable remediation.
+    """
+
+    def test_require_root_raises_when_non_root(self, monkeypatch):
+        monkeypatch.setattr(gateway_cli.os, "geteuid", lambda: 1000)
+
+        with pytest.raises(gateway_cli.SystemScopeRequiresRootError) as excinfo:
+            gateway_cli._require_root_for_system_service("start")
+
+        assert excinfo.value.args[0] == "System gateway start requires root. Re-run with sudo."
+        assert excinfo.value.args[1] == "start"
+        # str(e) renders only the message, not the tuple repr, so that
+        # wizard format strings like f"Failed: {e}" print cleanly.
+        assert str(excinfo.value) == "System gateway start requires root. Re-run with sudo."
+        assert f"Failed: {excinfo.value}" == "Failed: System gateway start requires root. Re-run with sudo."
+
+    def test_require_root_noop_when_root(self, monkeypatch):
+        monkeypatch.setattr(gateway_cli.os, "geteuid", lambda: 0)
+
+        # Should not raise, should not exit
+        gateway_cli._require_root_for_system_service("start")
+
+    def test_error_is_runtime_error_subclass(self):
+        """Wizards use ``except Exception`` guards — the error must be a
+        ``RuntimeError`` (catchable by ``Exception``), NOT a ``SystemExit``
+        (``BaseException``), so the wizard can recover from it.
+        """
+        err = gateway_cli.SystemScopeRequiresRootError("msg", "start")
+        assert isinstance(err, RuntimeError)
+        assert isinstance(err, Exception)
+        assert not isinstance(err, SystemExit)
+
+
+class TestSystemScopeWizardPreCheck:
+    """Tests for _system_scope_wizard_would_need_root — the guard the
+    wizard uses to detect the dead-end BEFORE prompting the user to start
+    a service that will fail without sudo.
+    """
+
+    @staticmethod
+    def _setup_units(tmp_path, monkeypatch, system_present: bool, user_present: bool):
+        sys_dir = tmp_path / "sys"
+        usr_dir = tmp_path / "usr"
+        sys_dir.mkdir()
+        usr_dir.mkdir()
+        if system_present:
+            (sys_dir / "hermes-gateway.service").write_text("[Unit]\n")
+        if user_present:
+            (usr_dir / "hermes-gateway.service").write_text("[Unit]\n")
+        monkeypatch.setattr(
+            gateway_cli,
+            "get_systemd_unit_path",
+            lambda system=False: (sys_dir if system else usr_dir) / "hermes-gateway.service",
+        )
+
+    def test_non_root_with_only_system_unit_returns_true(self, tmp_path, monkeypatch):
+        self._setup_units(tmp_path, monkeypatch, system_present=True, user_present=False)
+        monkeypatch.setattr(gateway_cli.os, "geteuid", lambda: 1000)
+
+        assert gateway_cli._system_scope_wizard_would_need_root() is True
+
+    def test_root_never_needs_root(self, tmp_path, monkeypatch):
+        self._setup_units(tmp_path, monkeypatch, system_present=True, user_present=False)
+        monkeypatch.setattr(gateway_cli.os, "geteuid", lambda: 0)
+
+        assert gateway_cli._system_scope_wizard_would_need_root() is False
+
+    def test_non_root_with_user_unit_present_returns_false(self, tmp_path, monkeypatch):
+        # User-scope unit present — user can start it themselves, no sudo needed.
+        self._setup_units(tmp_path, monkeypatch, system_present=True, user_present=True)
+        monkeypatch.setattr(gateway_cli.os, "geteuid", lambda: 1000)
+
+        assert gateway_cli._system_scope_wizard_would_need_root() is False
+
+    def test_non_root_with_no_units_returns_false(self, tmp_path, monkeypatch):
+        self._setup_units(tmp_path, monkeypatch, system_present=False, user_present=False)
+        monkeypatch.setattr(gateway_cli.os, "geteuid", lambda: 1000)
+
+        assert gateway_cli._system_scope_wizard_would_need_root() is False
+
+    def test_non_root_with_explicit_system_arg_returns_true(self, tmp_path, monkeypatch):
+        # Caller passed system=True explicitly (e.g. ``hermes gateway start --system``).
+        self._setup_units(tmp_path, monkeypatch, system_present=False, user_present=False)
+        monkeypatch.setattr(gateway_cli.os, "geteuid", lambda: 1000)
+
+        assert gateway_cli._system_scope_wizard_would_need_root(system=True) is True
+
+
+class TestSystemScopeRemediationOutput:
+    """Tests for _print_system_scope_remediation — the actionable guidance
+    shown when the wizard detects a system-scope-only setup as non-root.
+    """
+
+    def test_start_remediation_mentions_sudo_systemctl_and_uninstall(self, capsys, monkeypatch):
+        monkeypatch.setattr(gateway_cli, "get_service_name", lambda: "hermes-gateway")
+
+        gateway_cli._print_system_scope_remediation("start")
+        out = capsys.readouterr().out
+
+        assert "system-wide service" in out
+        assert "start requires root" in out
+        assert "sudo systemctl start hermes-gateway" in out
+        assert "sudo hermes gateway uninstall --system" in out
+        assert "hermes gateway install" in out
+
+    def test_restart_remediation_uses_systemctl_restart(self, capsys, monkeypatch):
+        monkeypatch.setattr(gateway_cli, "get_service_name", lambda: "hermes-gateway")
+
+        gateway_cli._print_system_scope_remediation("restart")
+        out = capsys.readouterr().out
+
+        assert "restart requires root" in out
+        assert "sudo systemctl restart hermes-gateway" in out
+
+    def test_stop_remediation_uses_systemctl_stop(self, capsys, monkeypatch):
+        monkeypatch.setattr(gateway_cli, "get_service_name", lambda: "hermes-gateway")
+
+        gateway_cli._print_system_scope_remediation("stop")
+        out = capsys.readouterr().out
+
+        assert "stop requires root" in out
+        assert "sudo systemctl stop hermes-gateway" in out
+
+
+class TestGatewayCommandCatchesSystemScopeError:
+    """The direct CLI path (``hermes gateway start --system`` etc.) must
+    still exit 1 with a clean message when non-root. The top-level
+    ``gateway_command`` catches ``SystemScopeRequiresRootError`` and
+    converts it back to ``sys.exit(1)``, preserving existing CLI behavior.
+    """
+
+    def test_non_root_system_start_exits_one_with_clean_message(self, tmp_path, monkeypatch, capsys):
+        sys_dir = tmp_path / "sys"
+        usr_dir = tmp_path / "usr"
+        sys_dir.mkdir()
+        usr_dir.mkdir()
+        (sys_dir / "hermes-gateway.service").write_text("[Unit]\n")
+        monkeypatch.setattr(
+            gateway_cli,
+            "get_systemd_unit_path",
+            lambda system=False: (sys_dir if system else usr_dir) / "hermes-gateway.service",
+        )
+        monkeypatch.setattr(gateway_cli.os, "geteuid", lambda: 1000)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
+        monkeypatch.setattr(gateway_cli, "kill_gateway_processes", lambda **kw: 0)
+
+        args = SimpleNamespace(gateway_command="start", system=True, all=False)
+
+        with pytest.raises(SystemExit) as excinfo:
+            gateway_cli.gateway_command(args)
+
+        assert excinfo.value.code == 1
+        out = capsys.readouterr().out
+        # Renders the message, NOT the ``('msg', 'action')`` tuple repr
+        assert "System gateway start requires root. Re-run with sudo." in out
+        assert "('" not in out  # no tuple repr leaking through
@@ -309,6 +309,7 @@ class TestContinuousAPI:

        # Isolate from any state left behind by other tests in the session.
        monkeypatch.setattr(voice, "_continuous_active", False)
+        monkeypatch.setattr(voice, "_continuous_stopping", False, raising=False)
        monkeypatch.setattr(voice, "_continuous_recorder", None)

        assert voice.is_continuous_active() is False
@@ -343,11 +344,20 @@ class TestContinuousAPI:

        monkeypatch.setattr(voice, "_continuous_recorder", FakeRecorder())

-        voice.start_continuous(on_transcript=lambda _t: None)
+        started = voice.start_continuous(on_transcript=lambda _t: None)

        # The guard inside start_continuous short-circuits before rec.start()
+        assert started is True
        assert called["n"] == 0

+    def test_start_returns_false_while_stopping(self, monkeypatch):
+        import hermes_cli.voice as voice
+
+        monkeypatch.setattr(voice, "_continuous_active", False)
+        monkeypatch.setattr(voice, "_continuous_stopping", True, raising=False)
+
+        assert voice.start_continuous(on_transcript=lambda _t: None) is False
+

 class TestContinuousLoopSimulation:
    """End-to-end simulation of the VAD loop with a fake recorder.
@@ -368,6 +378,8 @@ class TestContinuousLoopSimulation:
        monkeypatch.setattr(voice, "_continuous_on_transcript", None)
        monkeypatch.setattr(voice, "_continuous_on_status", None)
        monkeypatch.setattr(voice, "_continuous_on_silent_limit", None)
+        monkeypatch.setattr(voice, "_continuous_auto_restart", True, raising=False)
+        monkeypatch.setattr(voice, "_play_beep", lambda *_, **__: None)

        class FakeRecorder:
            _silence_threshold = 200
@@ -381,13 +393,20 @@ class TestContinuousLoopSimulation:
                self.cancelled = 0
                # Preset WAV path returned by stop()
                self.next_stop_wav = "/tmp/fake.wav"
+                self.fail_stop = False
+                self.fail_next_start = False

            def start(self, on_silence_stop=None):
+                if self.fail_next_start:
+                    self.fail_next_start = False
+                    raise RuntimeError("boom")
                self.start_calls += 1
                self.last_callback = on_silence_stop
                self.is_recording = True

            def stop(self):
+                if self.fail_stop:
+                    raise RuntimeError("stop failed")
                self.stopped += 1
                self.is_recording = False
                return self.next_stop_wav
@@ -433,6 +452,204 @@ class TestContinuousLoopSimulation:

        voice.stop_continuous()

+    def test_auto_restart_false_stops_after_first_transcript(self, fake_recorder, monkeypatch):
+        import hermes_cli.voice as voice
+
+        monkeypatch.setattr(
+            voice,
+            "transcribe_recording",
+            lambda _p: {"success": True, "transcript": "single shot"},
+        )
+        monkeypatch.setattr(voice, "is_whisper_hallucination", lambda _t: False)
+
+        transcripts = []
+        statuses = []
+
+        voice.start_continuous(
+            on_transcript=lambda t: transcripts.append(t),
+            on_status=lambda s: statuses.append(s),
+            auto_restart=False,
+        )
+        fake_recorder.last_callback()
+
+        assert transcripts == ["single shot"]
+        assert fake_recorder.start_calls == 1
+        assert statuses == ["listening", "transcribing", "idle"]
+        assert voice.is_continuous_active() is False
+
+    def test_auto_restart_false_retains_silent_strikes_across_starts(
+        self, fake_recorder, monkeypatch
+    ):
+        import hermes_cli.voice as voice
+
+        monkeypatch.setattr(
+            voice,
+            "transcribe_recording",
+            lambda _p: {"success": True, "transcript": ""},
+        )
+        monkeypatch.setattr(voice, "is_whisper_hallucination", lambda _t: False)
+
+        silent_limit_fired = []
+
+        for _ in range(3):
+            voice.start_continuous(
+                on_transcript=lambda _t: None,
+                on_silent_limit=lambda: silent_limit_fired.append(True),
+                auto_restart=False,
+            )
+            fake_recorder.last_callback()
+
+        assert silent_limit_fired == [True]
+        assert voice.is_continuous_active() is False
+        assert fake_recorder.start_calls == 3
+
+    def test_force_transcribe_stop_delivers_current_buffer(self, fake_recorder, monkeypatch):
+        import hermes_cli.voice as voice
+
+        class ImmediateThread:
+            def __init__(self, target, daemon=False):
+                self.target = target
+
+            def start(self):
+                self.target()
+
+        monkeypatch.setattr(voice.threading, "Thread", ImmediateThread)
+        monkeypatch.setattr(
+            voice,
+            "transcribe_recording",
+            lambda _p: {"success": True, "transcript": "manual stop"},
+        )
+        monkeypatch.setattr(voice, "is_whisper_hallucination", lambda _t: False)
+
+        transcripts = []
+        statuses = []
+
+        voice.start_continuous(
+            on_transcript=lambda t: transcripts.append(t),
+            on_status=lambda s: statuses.append(s),
+        )
+        voice.stop_continuous(force_transcribe=True)
+
+        assert fake_recorder.stopped == 1
+        assert transcripts == ["manual stop"]
+        assert statuses == ["listening", "transcribing", "idle"]
+        assert voice.is_continuous_active() is False
+
+    def test_force_transcribe_empty_single_shots_hit_silent_limit(
+        self, fake_recorder, monkeypatch
+    ):
+        import hermes_cli.voice as voice
+
+        class ImmediateThread:
+            def __init__(self, target, daemon=False):
+                self.target = target
+
+            def start(self):
+                self.target()
+
+        monkeypatch.setattr(voice.threading, "Thread", ImmediateThread)
+        monkeypatch.setattr(
+            voice,
+            "transcribe_recording",
+            lambda _p: {"success": True, "transcript": ""},
+        )
+        monkeypatch.setattr(voice, "is_whisper_hallucination", lambda _t: False)
+
+        silent_limit_fired = []
+
+        for _ in range(3):
+            voice.start_continuous(
+                on_transcript=lambda _t: None,
+                on_silent_limit=lambda: silent_limit_fired.append(True),
+                auto_restart=False,
+            )
+            voice.stop_continuous(force_transcribe=True)
+
+        assert silent_limit_fired == [True]
+        assert fake_recorder.stopped == 3
+        assert voice._continuous_no_speech_count == 0
+
+    def test_force_transcribe_valid_single_shot_resets_silent_strikes(
+        self, fake_recorder, monkeypatch
+    ):
+        import hermes_cli.voice as voice
+
+        class ImmediateThread:
+            def __init__(self, target, daemon=False):
+                self.target = target
+
+            def start(self):
+                self.target()
+
+        monkeypatch.setattr(voice.threading, "Thread", ImmediateThread)
+        monkeypatch.setattr(voice, "_continuous_no_speech_count", 2)
+        monkeypatch.setattr(
+            voice,
+            "transcribe_recording",
+            lambda _p: {"success": True, "transcript": "manual stop"},
+        )
+        monkeypatch.setattr(voice, "is_whisper_hallucination", lambda _t: False)
+
+        transcripts = []
+        silent_limit_fired = []
+
+        voice.start_continuous(
+            on_transcript=lambda t: transcripts.append(t),
+            on_silent_limit=lambda: silent_limit_fired.append(True),
+            auto_restart=False,
+        )
+        voice.stop_continuous(force_transcribe=True)
+
+        assert transcripts == ["manual stop"]
+        assert silent_limit_fired == []
+        assert voice._continuous_no_speech_count == 0
+
+    def test_force_transcribe_stop_failure_cancels_and_clears_stopping(
+        self, fake_recorder, monkeypatch
+    ):
+        import hermes_cli.voice as voice
+
+        class ImmediateThread:
+            def __init__(self, target, daemon=False):
+                self.target = target
+
+            def start(self):
+                self.target()
+
+        monkeypatch.setattr(voice.threading, "Thread", ImmediateThread)
+        fake_recorder.fail_stop = True
+
+        statuses = []
+        voice.start_continuous(
+            on_transcript=lambda _t: None,
+            on_status=lambda s: statuses.append(s),
+        )
+        voice.stop_continuous(force_transcribe=True)
+
+        assert fake_recorder.cancelled == 1
+        assert statuses == ["listening", "transcribing", "idle"]
+        assert voice.is_continuous_active() is False
+        assert voice._continuous_stopping is False
+
+    def test_restart_failure_reports_idle(self, fake_recorder, monkeypatch):
+        import hermes_cli.voice as voice
+
+        monkeypatch.setattr(
+            voice,
+            "transcribe_recording",
+            lambda _p: {"success": True, "transcript": "hello world"},
+        )
+        monkeypatch.setattr(voice, "is_whisper_hallucination", lambda _t: False)
+
+        statuses = []
+        voice.start_continuous(on_transcript=lambda _t: None, on_status=statuses.append)
+
+        fake_recorder.fail_next_start = True
+        fake_recorder.last_callback()
+
+        assert statuses == ["listening", "transcribing", "idle"]
+        assert voice.is_continuous_active() is False
+
    def test_silent_limit_halts_loop_after_three_strikes(self, fake_recorder, monkeypatch):
        import hermes_cli.voice as voice

@@ -204,6 +204,7 @@ def test_voice_record_start_handles_non_dict_voice_cfg(monkeypatch):
        assert resp["result"]["status"] == "recording"
        assert captured["silence_threshold"] == 200
        assert captured["silence_duration"] == 3.0
+        assert captured["auto_restart"] is False

    # Round-12 Copilot review regression on #19835: ``bool`` is a subclass
    # of ``int``, so the naive ``isinstance(threshold, (int, float))``
@@ -232,6 +233,80 @@ def test_voice_record_start_handles_non_dict_voice_cfg(monkeypatch):
        assert (
            captured["silence_duration"] == 3.0
        ), f"bool silence_duration leaked through for {bad_bool_cfg!r}"
+        assert captured["auto_restart"] is False
+
+
+def test_voice_record_stop_forces_transcription(monkeypatch):
+    captured: dict = {}
+
+    def fake_stop_continuous(**kwargs):
+        captured.update(kwargs)
+
+    monkeypatch.setitem(
+        sys.modules,
+        "hermes_cli.voice",
+        types.SimpleNamespace(
+            start_continuous=lambda **_kwargs: None,
+            stop_continuous=fake_stop_continuous,
+        ),
+    )
+
+    resp = server.dispatch(
+        {
+            "id": "voice-record-stop",
+            "method": "voice.record",
+            "params": {"action": "stop"},
+        }
+    )
+
+    assert resp["result"]["status"] == "stopped"
+    assert captured["force_transcribe"] is True
+
+
+def test_voice_record_stop_updates_event_session_id(monkeypatch):
+    monkeypatch.setitem(
+        sys.modules,
+        "hermes_cli.voice",
+        types.SimpleNamespace(
+            start_continuous=lambda **_kwargs: True,
+            stop_continuous=lambda **_kwargs: None,
+        ),
+    )
+    monkeypatch.setattr(server, "_voice_event_sid", "old-session")
+
+    resp = server.dispatch(
+        {
+            "id": "voice-record-stop-session",
+            "method": "voice.record",
+            "params": {"action": "stop", "session_id": "new-session"},
+        }
+    )
+
+    assert resp["result"]["status"] == "stopped"
+    assert server._voice_event_sid == "new-session"
+
+
+def test_voice_record_start_reports_busy_when_stop_is_in_progress(monkeypatch):
+    monkeypatch.setitem(
+        sys.modules,
+        "hermes_cli.voice",
+        types.SimpleNamespace(
+            start_continuous=lambda **_kwargs: False,
+            stop_continuous=lambda **_kwargs: None,
+        ),
+    )
+    monkeypatch.setenv("HERMES_VOICE", "1")
+    monkeypatch.setattr(server, "_load_cfg", lambda: {"voice": {}})
+
+    resp = server.dispatch(
+        {
+            "id": "voice-record-busy",
+            "method": "voice.record",
+            "params": {"action": "start"},
+        }
+    )
+
+    assert resp["result"]["status"] == "busy"


 def test_voice_toggle_tts_branch_also_carries_record_key(monkeypatch):
@@ -0,0 +1,648 @@
+"""Tests for compositor-level coordinate click (browser_click with x/y params).
+
+Covers:
+- Input validation (ref vs x/y mutually exclusive)
+- CDP coordinate click path (via mock CDP server)
+- agent-browser mouse fallback path
+- Camofox passthrough still works with ref
+"""
+from __future__ import annotations
+
+import asyncio
+import json
+import threading
+from typing import Any, Dict, List
+import pytest
+
+import websockets
+from websockets.asyncio.server import serve
+
+
+# ---------------------------------------------------------------------------
+# In-process CDP mock server (reused from test_browser_cdp_tool.py)
+# ---------------------------------------------------------------------------
+
+
+class _CDPServer:
+    """Tiny CDP mock — replies to registered method handlers."""
+
+    def __init__(self) -> None:
+        self._handlers: Dict[str, Any] = {}
+        self._responses: List[Dict[str, Any]] = []
+        self._loop: asyncio.AbstractEventLoop | None = None
+        self._server: Any = None
+        self._thread: threading.Thread | None = None
+        self._host = "127.0.0.1"
+        self._port = 0
+        self._url: str = ""
+
+    def on(self, method: str, handler):
+        self._handlers[method] = handler
+
+    def start(self) -> str:
+        ready = threading.Event()
+
+        def _run() -> None:
+            self._loop = asyncio.new_event_loop()
+            asyncio.set_event_loop(self._loop)
+
+            async def _handler(ws):
+                try:
+                    async for raw in ws:
+                        msg = json.loads(raw)
+                        call_id = msg.get("id")
+                        method = msg.get("method", "")
+                        params = msg.get("params", {}) or {}
+                        session_id = msg.get("sessionId")
+                        self._responses.append(msg)
+
+                        fn = self._handlers.get(method)
+                        if fn is None:
+                            reply = {
+                                "id": call_id,
+                                "error": {"code": -32601, "message": f"No handler for {method}"},
+                            }
+                        else:
+                            try:
+                                result = fn(params, session_id)
+                                reply = {"id": call_id, "result": result}
+                            except Exception as exc:
+                                reply = {"id": call_id, "error": {"code": -1, "message": str(exc)}}
+                        if session_id:
+                            reply["sessionId"] = session_id
+                        await ws.send(json.dumps(reply))
+                except websockets.exceptions.ConnectionClosed:
+                    pass
+
+            async def _serve() -> None:
+                self._server = await serve(_handler, self._host, 0)
+                sock = next(iter(self._server.sockets))
+                self._port = sock.getsockname()[1]
+                ready.set()
+                await self._server.wait_closed()
+
+            try:
+                self._loop.run_until_complete(_serve())
+            finally:
+                self._loop.close()
+
+        self._thread = threading.Thread(target=_run, daemon=True)
+        self._thread.start()
+        if not ready.wait(timeout=5.0):
+            raise RuntimeError("CDP mock server failed to start")
+        self._url = f"ws://{self._host}:{self._port}/devtools/browser/mock"
+        return self._url
+
+    def stop(self) -> None:
+        if self._loop and self._server:
+            self._loop.call_soon_threadsafe(self._server.close)
+        if self._thread:
+            self._thread.join(timeout=3.0)
+
+    def received(self) -> List[Dict[str, Any]]:
+        return list(self._responses)
+
+
+# ---------------------------------------------------------------------------
+# Fixtures
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture
+def cdp_server(monkeypatch):
+    """Start a CDP mock and point browser_cdp_tool's resolver at it."""
+    server = _CDPServer()
+    ws_url = server.start()
+
+    import tools.browser_cdp_tool as cdp_mod
+    monkeypatch.setattr(cdp_mod, "_resolve_cdp_endpoint", lambda: ws_url)
+
+    # clear the session cache so each test starts fresh
+    from tools import browser_tool as _bt
+    _bt._CDP_SESSION_CACHE.clear()
+
+    try:
+        yield server
+    finally:
+        _bt._CDP_SESSION_CACHE.clear()
+        server.stop()
+
+
+# ---------------------------------------------------------------------------
+# Input validation
+# ---------------------------------------------------------------------------
+
+
+class TestClickInputValidation:
+    """browser_click validates that exactly one of ref / (x,y) is provided."""
+
+    def test_neither_ref_nor_coords(self):
+        from tools.browser_tool import browser_click
+
+        result = json.loads(browser_click())
+        assert result["success"] is False
+        assert "ref" in result["error"].lower() or "x" in result["error"].lower()
+
+    def test_both_ref_and_coords(self):
+        from tools.browser_tool import browser_click
+
+        result = json.loads(browser_click(ref="@e1", x=100, y=200))
+        assert result["success"] is False
+        assert "not both" in result["error"].lower()
+
+    def test_x_without_y(self):
+        from tools.browser_tool import browser_click
+
+        result = json.loads(browser_click(x=100))
+        assert result["success"] is False
+        assert "both" in result["error"].lower()
+
+    def test_y_without_x(self):
+        from tools.browser_tool import browser_click
+
+        result = json.loads(browser_click(y=200))
+        assert result["success"] is False
+        assert "both" in result["error"].lower()
+
+    def test_empty_ref_treated_as_missing(self):
+        from tools.browser_tool import browser_click
+
+        result = json.loads(browser_click(ref=""))
+        assert result["success"] is False
+        assert "ref" in result["error"].lower() or "x" in result["error"].lower()
+
+    def test_non_numeric_coordinates(self):
+        from tools.browser_tool import browser_click
+
+        result = json.loads(browser_click(x="abc", y="def"))
+        assert result["success"] is False
+        assert "number" in result["error"].lower()
+
+
+# ---------------------------------------------------------------------------
+# CDP coordinate click (happy path via mock server)
+# ---------------------------------------------------------------------------
+
+
+class TestCDPCoordinateClick:
+    """Coordinate clicks via CDP Input.dispatchMouseEvent."""
+
+    def test_cdp_click_dispatches_press_and_release(self, cdp_server):
+        from tools.browser_tool import browser_click
+
+        # Register handlers for the protocol calls
+        cdp_server.on(
+            "Target.getTargets",
+            lambda p, s: {
+                "targetInfos": [
+                    {"targetId": "page-1", "type": "page", "attached": True, "url": "https://example.com"},
+                ]
+            },
+        )
+        cdp_server.on(
+            "Target.attachToTarget",
+            lambda p, s: {"sessionId": f"sess-{p['targetId']}"},
+        )
+        cdp_server.on(
+            "Input.dispatchMouseEvent",
+            lambda p, s: {},
+        )
+
+        result = json.loads(browser_click(x=150, y=300))
+        assert result["success"] is True
+        assert result["clicked_at"] == {"x": 150, "y": 300}
+        assert result["method"] == "cdp_compositor"
+
+        # Verify the CDP calls: Target.getTargets, attach, mousePressed, attach, mouseReleased
+        calls = cdp_server.received()
+        methods = [c["method"] for c in calls]
+        assert "Target.getTargets" in methods
+        assert "Input.dispatchMouseEvent" in methods
+
+        # Find the mouse events
+        mouse_events = [c for c in calls if c["method"] == "Input.dispatchMouseEvent"]
+        assert len(mouse_events) == 2
+        assert mouse_events[0]["params"]["type"] == "mousePressed"
+        assert mouse_events[0]["params"]["x"] == 150
+        assert mouse_events[0]["params"]["y"] == 300
+        assert mouse_events[0]["params"]["button"] == "left"
+        assert mouse_events[1]["params"]["type"] == "mouseReleased"
+
+    def test_cdp_click_rounds_float_coordinates(self, cdp_server):
+        from tools.browser_tool import browser_click
+
+        cdp_server.on(
+            "Target.getTargets",
+            lambda p, s: {"targetInfos": [{"targetId": "p1", "type": "page", "attached": True, "url": "..."}]},
+        )
+        cdp_server.on("Target.attachToTarget", lambda p, s: {"sessionId": "s1"})
+        cdp_server.on("Input.dispatchMouseEvent", lambda p, s: {})
+
+        result = json.loads(browser_click(x=150.7, y=299.3))
+        assert result["success"] is True
+        assert result["clicked_at"] == {"x": 151, "y": 299}
+
+    def test_cdp_click_no_page_target_still_works(self, cdp_server):
+        """When Target.getTargets returns no page targets, click proceeds without target_id."""
+        from tools.browser_tool import browser_click
+
+        cdp_server.on(
+            "Target.getTargets",
+            lambda p, s: {"targetInfos": [{"targetId": "sw1", "type": "service_worker"}]},
+        )
+        # No Target.attachToTarget needed — page_target is None so _cdp_call
+        # sends without attaching
+        cdp_server.on("Input.dispatchMouseEvent", lambda p, s: {})
+
+        result = json.loads(browser_click(x=50, y=50))
+        assert result["success"] is True
+        assert result["clicked_at"] == {"x": 50, "y": 50}
+
+    def test_cdp_dispatch_mouse_event_failure(self, cdp_server):
+        """When Input.dispatchMouseEvent returns a CDP error, return failure."""
+        from tools.browser_tool import browser_click
+
+        cdp_server.on(
+            "Target.getTargets",
+            lambda p, s: {"targetInfos": [{"targetId": "p1", "type": "page", "attached": True, "url": "..."}]},
+        )
+        cdp_server.on("Target.attachToTarget", lambda p, s: {"sessionId": "s1"})
+        # No handler for Input.dispatchMouseEvent — server returns CDP error
+
+        result = json.loads(browser_click(x=100, y=200))
+        assert result["success"] is False
+        assert "CDP coordinate click failed" in result["error"]
+
+
+# ---------------------------------------------------------------------------
+# agent-browser mouse fallback
+# ---------------------------------------------------------------------------
+
+
+class TestAgentBrowserMouseFallback:
+    """When no CDP endpoint is available, fall back to agent-browser mouse commands."""
+
+    def test_falls_back_to_agent_browser_mouse(self, monkeypatch):
+        from tools import browser_tool, browser_cdp_tool
+
+        # No CDP endpoint available
+        monkeypatch.setattr(browser_cdp_tool, "_resolve_cdp_endpoint", lambda: "")
+
+        # Mock _run_browser_command and _last_session_key
+        commands_sent = []
+
+        def mock_run_cmd(task_id, command, args=None, timeout=None):
+            commands_sent.append((command, args))
+            return {"success": True}
+
+        monkeypatch.setattr(browser_tool, "_run_browser_command", mock_run_cmd)
+        monkeypatch.setattr(browser_tool, "_last_session_key", lambda tid: tid)
+
+        result = json.loads(browser_tool.browser_click(x=200, y=400))
+        assert result["success"] is True
+        assert result["clicked_at"] == {"x": 200, "y": 400}
+        assert result["method"] == "agent_browser_mouse"
+
+        # Should have sent: mouse move, mouse down, mouse up
+        assert len(commands_sent) == 3
+        assert commands_sent[0] == ("mouse", ["move", "200", "400"])
+        assert commands_sent[1] == ("mouse", ["down"])
+        assert commands_sent[2] == ("mouse", ["up"])
+
+    def test_mouse_move_failure_returns_error(self, monkeypatch):
+        from tools import browser_tool, browser_cdp_tool
+
+        monkeypatch.setattr(browser_cdp_tool, "_resolve_cdp_endpoint", lambda: "")
+
+        def mock_run_cmd(task_id, command, args=None, timeout=None):
+            if args and args[0] == "move":
+                return {"success": False, "error": "mouse move not supported"}
+            return {"success": True}
+
+        monkeypatch.setattr(browser_tool, "_run_browser_command", mock_run_cmd)
+        monkeypatch.setattr(browser_tool, "_last_session_key", lambda tid: tid)
+
+        result = json.loads(browser_tool.browser_click(x=100, y=100))
+        assert result["success"] is False
+        assert "mouse move" in result["error"]
+
+    def test_mouse_down_failure_returns_error(self, monkeypatch):
+        from tools import browser_tool, browser_cdp_tool
+
+        monkeypatch.setattr(browser_cdp_tool, "_resolve_cdp_endpoint", lambda: "")
+
+        def mock_run_cmd(task_id, command, args=None, timeout=None):
+            if args and args[0] == "down":
+                return {"success": False, "error": "mouse down failed"}
+            return {"success": True}
+
+        monkeypatch.setattr(browser_tool, "_run_browser_command", mock_run_cmd)
+        monkeypatch.setattr(browser_tool, "_last_session_key", lambda tid: tid)
+
+        result = json.loads(browser_tool.browser_click(x=100, y=100))
+        assert result["success"] is False
+        assert "mouse down" in result["error"]
+
+    def test_mouse_up_failure_returns_error(self, monkeypatch):
+        from tools import browser_tool, browser_cdp_tool
+
+        monkeypatch.setattr(browser_cdp_tool, "_resolve_cdp_endpoint", lambda: "")
+
+        def mock_run_cmd(task_id, command, args=None, timeout=None):
+            if args and args[0] == "up":
+                return {"success": False, "error": "mouse up failed"}
+            return {"success": True}
+
+        monkeypatch.setattr(browser_tool, "_run_browser_command", mock_run_cmd)
+        monkeypatch.setattr(browser_tool, "_last_session_key", lambda tid: tid)
+
+        result = json.loads(browser_tool.browser_click(x=100, y=100))
+        assert result["success"] is False
+        assert "mouse up" in result["error"]
+
+
+# ---------------------------------------------------------------------------
+# Ref-based click unchanged
+# ---------------------------------------------------------------------------
+
+
+class TestRefClickPreserved:
+    """Existing ref-based click behavior is unchanged."""
+
+    def test_ref_click_still_works(self, monkeypatch):
+        from tools import browser_tool
+
+        monkeypatch.setattr(browser_tool, "_is_camofox_mode", lambda: False)
+        monkeypatch.setattr(browser_tool, "_last_session_key", lambda tid: tid)
+
+        def mock_run_cmd(task_id, command, args=None, timeout=None):
+            return {"success": True}
+
+        monkeypatch.setattr(browser_tool, "_run_browser_command", mock_run_cmd)
+
+        result = json.loads(browser_tool.browser_click(ref="@e5"))
+        assert result["success"] is True
+        assert result["clicked"] == "@e5"
+
+    def test_ref_without_at_prefix_auto_added(self, monkeypatch):
+        from tools import browser_tool
+
+        monkeypatch.setattr(browser_tool, "_is_camofox_mode", lambda: False)
+        monkeypatch.setattr(browser_tool, "_last_session_key", lambda tid: tid)
+
+        clicked_refs = []
+
+        def mock_run_cmd(task_id, command, args=None, timeout=None):
+            clicked_refs.append(args)
+            return {"success": True}
+
+        monkeypatch.setattr(browser_tool, "_run_browser_command", mock_run_cmd)
+
+        browser_tool.browser_click(ref="e12")
+        assert clicked_refs[0] == ["@e12"]
+
+
+# ---------------------------------------------------------------------------
+# Schema check
+# ---------------------------------------------------------------------------
+
+
+class TestSchemaUpdated:
+    """The tool schema reflects x/y params and ref is no longer required."""
+
+    def test_schema_has_x_y_properties(self):
+        from tools.browser_tool import _BROWSER_SCHEMA_MAP
+
+        schema = _BROWSER_SCHEMA_MAP["browser_click"]
+        props = schema["parameters"]["properties"]
+        assert "x" in props
+        assert "y" in props
+        assert props["x"]["type"] == "number"
+        assert props["y"]["type"] == "number"
+
+    def test_schema_no_required_fields(self):
+        from tools.browser_tool import _BROWSER_SCHEMA_MAP
+
+        schema = _BROWSER_SCHEMA_MAP["browser_click"]
+        # ref is no longer required — either ref or x+y
+        assert "required" not in schema["parameters"] or schema["parameters"]["required"] == []
+
+    def test_schema_ref_still_present(self):
+        from tools.browser_tool import _BROWSER_SCHEMA_MAP
+
+        schema = _BROWSER_SCHEMA_MAP["browser_click"]
+        assert "ref" in schema["parameters"]["properties"]
+
+
+# ---------------------------------------------------------------------------
+# Registry integration
+# ---------------------------------------------------------------------------
+
+
+class TestRegistryIntegration:
+    """browser_click is registered with x/y params wired through."""
+
+    def test_dispatch_with_coordinates(self, monkeypatch, cdp_server):
+        from tools.registry import registry
+
+        cdp_server.on(
+            "Target.getTargets",
+            lambda p, s: {"targetInfos": [{"targetId": "p1", "type": "page", "attached": True, "url": "..."}]},
+        )
+        cdp_server.on("Target.attachToTarget", lambda p, s: {"sessionId": "s1"})
+        cdp_server.on("Input.dispatchMouseEvent", lambda p, s: {})
+
+        raw = registry.dispatch(
+            "browser_click", {"x": 42, "y": 84}, task_id="t1"
+        )
+        result = json.loads(raw)
+        assert result["success"] is True
+        assert result["clicked_at"] == {"x": 42, "y": 84}
+
+    def test_dispatch_with_ref(self, monkeypatch):
+        from tools import browser_tool
+        from tools.registry import registry
+
+        monkeypatch.setattr(browser_tool, "_is_camofox_mode", lambda: False)
+        monkeypatch.setattr(browser_tool, "_last_session_key", lambda tid: tid)
+        monkeypatch.setattr(
+            browser_tool, "_run_browser_command",
+            lambda tid, cmd, args=None, timeout=None: {"success": True},
+        )
+
+        raw = registry.dispatch("browser_click", {"ref": "@e3"}, task_id="t1")
+        result = json.loads(raw)
+        assert result["success"] is True
+
+
+# ---------------------------------------------------------------------------
+# Session caching
+# ---------------------------------------------------------------------------
+
+
+class TestSessionCaching:
+    """Second click skips Target.getTargets + Target.attachToTarget."""
+
+    def test_second_click_skips_session_resolution(self, cdp_server, monkeypatch):
+        """After first click the session_id is cached; second click goes straight
+        to mousePressed+mouseReleased without re-issuing getTargets/attachToTarget."""
+        from tools import browser_tool
+        import tools.browser_cdp_tool as cdp_mod
+
+        # clear cache
+        browser_tool._CDP_SESSION_CACHE.clear()
+        monkeypatch.setattr(cdp_mod, "_resolve_cdp_endpoint", lambda: cdp_server._url)
+
+        resolve_count = {"n": 0}
+
+        def _getTargets(p, s):
+            resolve_count["n"] += 1
+            return {"targetInfos": [{"targetId": "p1", "type": "page", "attached": True, "url": "..."}]}
+
+        cdp_server.on("Target.getTargets", _getTargets)
+        cdp_server.on("Target.attachToTarget", lambda p, s: {"sessionId": "sess-cached"})
+        cdp_server.on("Input.dispatchMouseEvent", lambda p, s: {})
+
+        # First click — must call getTargets
+        r1 = json.loads(browser_tool.browser_click(x=10.0, y=20.0))
+        assert r1["success"] is True
+        assert resolve_count["n"] == 1
+
+        # Second click — cache hit; getTargets must NOT be called again
+        r2 = json.loads(browser_tool.browser_click(x=30.0, y=40.0))
+        assert r2["success"] is True
+        assert resolve_count["n"] == 1, "session resolution was repeated despite warm cache"
+
+    def test_stale_session_triggers_reattach(self, cdp_server, monkeypatch):
+        """If the browser returns 'Session with given id not found', the cache is
+        cleared and session resolution runs again before retrying the click."""
+        from tools import browser_tool
+        import tools.browser_cdp_tool as cdp_mod
+
+        browser_tool._CDP_SESSION_CACHE.clear()
+        monkeypatch.setattr(cdp_mod, "_resolve_cdp_endpoint", lambda: cdp_server._url)
+
+        call_count = {"mouse": 0, "resolve": 0}
+
+        def _getTargets(p, s):
+            call_count["resolve"] += 1
+            return {"targetInfos": [{"targetId": "px", "type": "page", "attached": True, "url": "..."}]}
+
+        def _dispatch(p, s):
+            call_count["mouse"] += 1
+            # First two mouse calls (with stale session) return an error;
+            # after re-resolve they should succeed
+            if call_count["mouse"] <= 2:
+                raise RuntimeError("Session with given id not found: stale-session-id")
+            return {}
+
+        cdp_server.on("Target.getTargets", _getTargets)
+        cdp_server.on("Target.attachToTarget", lambda p, s: {"sessionId": f"sess-{call_count['resolve']}"})
+        cdp_server.on("Input.dispatchMouseEvent", _dispatch)
+
+        # Seed cache with stale session to trigger the error path
+        browser_tool._CDP_SESSION_CACHE[cdp_server._url] = "stale-session-id"
+
+        r = json.loads(browser_tool.browser_click(x=50.0, y=60.0))
+        assert r["success"] is True
+        # Must have resolved the session once (after evicting stale entry)
+        assert call_count["resolve"] >= 1
+
+    def test_cache_cleared_on_endpoint_change(self, monkeypatch):
+        """Cache is keyed per endpoint URL; different URL doesn't reuse cached session."""
+        from tools import browser_tool
+
+        browser_tool._CDP_SESSION_CACHE.clear()
+        browser_tool._CDP_SESSION_CACHE["ws://endpoint-a/"] = "sess-a"
+
+        # Endpoint B must not find endpoint A's session
+        assert browser_tool._CDP_SESSION_CACHE.get("ws://endpoint-b/") is None
+
+
+# ---------------------------------------------------------------------------
+# Supervisor path
+# ---------------------------------------------------------------------------
+
+
+class TestSupervisorPath:
+    """When a CDPSupervisor is alive for the task_id, coordinate clicks use its
+    persistent WS connection — zero per-click connection setup cost."""
+
+    def test_supervisor_path_used_when_supervisor_alive(self, monkeypatch):
+        """browser_click delegates to the supervisor when one is registered."""
+        from tools import browser_tool
+
+        clicks = []
+
+        class _FakeSupervisor:
+            def dispatch_mouse_click(self, x, y, button="left", timeout=10.0):
+                clicks.append((x, y, button))
+
+        class _FakeRegistry:
+            def get(self, task_id):
+                return _FakeSupervisor()
+
+        import tools.browser_supervisor as bs_mod
+        monkeypatch.setattr(bs_mod, "SUPERVISOR_REGISTRY", _FakeRegistry())
+
+        result = json.loads(browser_tool.browser_click(x=77.0, y=88.0, task_id="t1"))
+        assert result["success"] is True
+        assert result["method"] == "cdp_supervisor"
+        assert result["clicked_at"] == {"x": 77, "y": 88}
+        assert clicks == [(77, 88, "left")]
+
+    def test_supervisor_error_falls_through_to_per_click(self, monkeypatch, cdp_server):
+        """If dispatch_mouse_click raises, the per-click WS path is used instead."""
+        from tools import browser_tool
+        import tools.browser_supervisor as bs_mod
+        import tools.browser_cdp_tool as cdp_mod
+
+        browser_tool._CDP_SESSION_CACHE.clear()
+        monkeypatch.setattr(cdp_mod, "_resolve_cdp_endpoint", lambda: cdp_server._url)
+
+        class _BrokenSupervisor:
+            def dispatch_mouse_click(self, x, y, button="left", timeout=10.0):
+                raise RuntimeError("supervisor WS disconnected")
+
+        class _BrokenRegistry:
+            def get(self, task_id):
+                return _BrokenSupervisor()
+
+        monkeypatch.setattr(bs_mod, "SUPERVISOR_REGISTRY", _BrokenRegistry())
+
+        cdp_server.on("Target.getTargets", lambda p, s: {
+            "targetInfos": [{"targetId": "p1", "type": "page", "attached": True, "url": "..."}]
+        })
+        cdp_server.on("Target.attachToTarget", lambda p, s: {"sessionId": "s1"})
+        cdp_server.on("Input.dispatchMouseEvent", lambda p, s: {})
+
+        result = json.loads(browser_tool.browser_click(x=10.0, y=20.0, task_id="t2"))
+        assert result["success"] is True
+        # Should have fallen through to per-click path (cdp_compositor, not cdp_supervisor)
+        assert result["method"] == "cdp_compositor"
+
+    def test_no_supervisor_uses_per_click_path(self, monkeypatch, cdp_server):
+        """When SUPERVISOR_REGISTRY.get() returns None, the per-click WS path runs."""
+        from tools import browser_tool
+        import tools.browser_supervisor as bs_mod
+        import tools.browser_cdp_tool as cdp_mod
+
+        browser_tool._CDP_SESSION_CACHE.clear()
+        monkeypatch.setattr(cdp_mod, "_resolve_cdp_endpoint", lambda: cdp_server._url)
+
+        class _EmptyRegistry:
+            def get(self, task_id):
+                return None
+
+        monkeypatch.setattr(bs_mod, "SUPERVISOR_REGISTRY", _EmptyRegistry())
+
+        cdp_server.on("Target.getTargets", lambda p, s: {
+            "targetInfos": [{"targetId": "p1", "type": "page", "attached": True, "url": "..."}]
+        })
+        cdp_server.on("Target.attachToTarget", lambda p, s: {"sessionId": "s1"})
+        cdp_server.on("Input.dispatchMouseEvent", lambda p, s: {})
+
+        result = json.loads(browser_tool.browser_click(x=5.0, y=6.0, task_id="t3"))
+        assert result["success"] is True
+        assert result["method"] == "cdp_compositor"
+
@@ -457,7 +457,57 @@ class CDPSupervisor:
            return {"ok": False, "error": f"{type(e).__name__}: {e}"}
        return {"ok": True, "dialog": snapshot_copy.to_dict()}

-    # ── Supervisor loop internals ────────────────────────────────────────────
+    def dispatch_mouse_click(
+        self,
+        x: int,
+        y: int,
+        button: str = "left",
+        timeout: float = 10.0,
+    ) -> None:
+        """Dispatch a compositor-level click over the supervisor's live WS.
+
+        Uses the supervisor's already-connected WebSocket — zero connection
+        setup cost vs opening a fresh WS per click.  mousePressed and
+        mouseReleased are both sent before awaiting either response
+        (pipelined), following the Playwright Promise.all pattern.
+
+        Raises RuntimeError if the supervisor is inactive or the click fails.
+        """
+        loop = self._loop
+        if loop is None or not loop.is_running():
+            raise RuntimeError("supervisor loop is not running")
+
+        with self._state_lock:
+            if not self._active:
+                raise RuntimeError("supervisor is not active")
+            session_id = self._page_session_id
+
+        async def _do_click() -> None:
+            mouse_params = {"x": x, "y": y, "button": button, "clickCount": 1}
+            # Pipeline both events — send without awaiting press ack.
+            # Browser processes CDP messages in order; if mouseReleased is
+            # acked, mousePressed has already been applied.
+            press_fut = asyncio.create_task(
+                self._cdp("Input.dispatchMouseEvent",
+                          {**mouse_params, "type": "mousePressed"},
+                          session_id=session_id, timeout=timeout)
+            )
+            release_fut = asyncio.create_task(
+                self._cdp("Input.dispatchMouseEvent",
+                          {**mouse_params, "type": "mouseReleased"},
+                          session_id=session_id, timeout=timeout)
+            )
+            await asyncio.gather(press_fut, release_fut)
+
+        try:
+            fut = asyncio.run_coroutine_threadsafe(_do_click(), loop)
+            fut.result(timeout=timeout + 1)
+        except Exception as exc:
+            raise RuntimeError(
+                f"supervisor mouse click failed: {type(exc).__name__}: {exc}"
+            ) from exc
+
+

    def _thread_main(self) -> None:
        """Entry point for the supervisor's dedicated thread."""
@@ -1317,16 +1317,23 @@ BROWSER_TOOL_SCHEMAS = [
    },
    {
        "name": "browser_click",
-        "description": "Click on an element identified by its ref ID from the snapshot (e.g., '@e5'). The ref IDs are shown in square brackets in the snapshot output. Requires browser_navigate and browser_snapshot to be called first.",
+        "description": "Click on an element identified by its ref ID from the snapshot (e.g., '@e5'). The ref IDs are shown in square brackets in the snapshot output. Requires browser_navigate and browser_snapshot to be called first.\n\nAlternatively, click at exact viewport coordinates (x, y) using compositor-level input. This bypasses DOM selectors entirely — clicks pass through iframes, shadow DOM, cross-origin boundaries, and canvas elements. Use browser_vision with annotate=true to find coordinates, or browser_console to evaluate getBoundingClientRect(). Provide EITHER ref OR (x + y), not both.",
        "parameters": {
            "type": "object",
            "properties": {
                "ref": {
                    "type": "string",
                    "description": "The element reference from the snapshot (e.g., '@e5', '@e12')"
+                },
+                "x": {
+                    "type": "number",
+                    "description": "Viewport X coordinate for compositor-level click. Use with y instead of ref to click through iframes, shadow DOM, or canvas elements."
+                },
+                "y": {
+                    "type": "number",
+                    "description": "Viewport Y coordinate for compositor-level click. Use with x instead of ref."
                }
-            },
-            "required": ["ref"]
+            }
        }
    },
    {
@@ -2286,17 +2293,350 @@ def browser_snapshot(
        return json.dumps(_copy_fallback_warning(response, result), ensure_ascii=False)


-def browser_click(ref: str, task_id: Optional[str] = None) -> str:
+# ---------------------------------------------------------------------------
+# Session cache for CDP coordinate clicks
+#
+# Target.getTargets + Target.attachToTarget cost one round-trip each and
+# their results (page targetId + session_id) are stable across clicks on
+# the same page.  We cache them keyed by CDP endpoint URL and invalidate
+# automatically when the browser reports a stale session error.
+#
+# Pattern: browser-harness daemon keeps session_id on the daemon object and
+# retries once on "Session with given id not found" to self-heal after
+# navigation or crash.  We replicate that here without a persistent daemon
+# process by storing it in a module-level dict.
+# ---------------------------------------------------------------------------
+
+_CDP_SESSION_CACHE: dict[str, str] = {}  # ws_url → cached session_id
+
+
+async def _cdp_resolve_session(
+    ws: Any,
+    ws_url: str,
+    deadline: float,
+    msg_id_ref: list,
+) -> Optional[str]:
+    """Resolve (and cache) the page-scoped CDP session ID.
+
+    Sends Target.getTargets + Target.attachToTarget on *ws* and caches the
+    resulting session_id for future clicks.  Returns None if no page target
+    is found (Input.dispatchMouseEvent will be sent at browser level, which
+    works for simple cases).
    """
-    Click on an element.
+    import asyncio as _asyncio
+
+    async def _send(method: str, params: dict, sid: Optional[str] = None) -> int:
+        msg_id_ref[0] += 1
+        call_id = msg_id_ref[0]
+        req: dict = {"id": call_id, "method": method, "params": params}
+        if sid:
+            req["sessionId"] = sid
+        await ws.send(json.dumps(req))
+        return call_id
+
+    async def _recv_until(call_id: int) -> dict:
+        while True:
+            remaining = deadline - _asyncio.get_running_loop().time()
+            if remaining <= 0:
+                raise TimeoutError(f"CDP timed out waiting for id={call_id}")
+            raw = await _asyncio.wait_for(ws.recv(), timeout=remaining)
+            msg = json.loads(raw)
+            if msg.get("id") == call_id:
+                if "error" in msg:
+                    raise RuntimeError(f"CDP error: {msg['error']}")
+                return msg.get("result", {})
+
+    gt_id = await _send("Target.getTargets", {})
+    gt_result = await _recv_until(gt_id)
+    page_target_id: Optional[str] = None
+    for t in gt_result.get("targetInfos", []):
+        if t.get("type") == "page" and t.get("attached", True):
+            page_target_id = t["targetId"]
+            break
+
+    if not page_target_id:
+        return None
+
+    at_id = await _send("Target.attachToTarget",
+                        {"targetId": page_target_id, "flatten": True})
+    at_result = await _recv_until(at_id)
+    session_id = at_result.get("sessionId") or None
+    if session_id:
+        _CDP_SESSION_CACHE[ws_url] = session_id
+    return session_id
+
+
+async def _cdp_coordinate_click_async(
+    ws_url: str,
+    x: int,
+    y: int,
+    button: str,
+    timeout: float,
+) -> None:
+    """Perform a compositor-level click on a single persistent WS connection.
+
+    Optimizations vs the naïve 3-separate-connections approach:
+
+    1. **Single connection** — one TCP+WS handshake for the entire click.
+       All CDP messages are sent on the same socket.
+
+    2. **Session ID caching** — Target.getTargets + Target.attachToTarget are
+       only paid once per CDP endpoint.  Subsequent clicks skip straight to
+       the two mouse events.  Cache is invalidated automatically on stale-
+       session errors and re-resolved once (browser-harness self-heal pattern).
+
+    3. **mouseReleased-only wait** — mousePressed and mouseReleased are both
+       fired before awaiting either response.  Because the browser processes
+       CDP messages sequentially within a session, if mouseReleased is
+       acknowledged then mousePressed has already been processed.  We only
+       wait for the release ack (Playwright / Puppeteer pattern), saving one
+       RTT on the common path.
+    """
+    import asyncio as _asyncio
+    from tools.browser_cdp_tool import websockets as _ws
+
+    async with _ws.connect(
+        ws_url,
+        max_size=None,
+        open_timeout=timeout,
+        close_timeout=5,
+        ping_interval=None,
+        compression=None,       # small CDP messages don't benefit from compression
+    ) as ws:
+        deadline = _asyncio.get_running_loop().time() + timeout
+        msg_id_ref = [0]        # mutable so nested helpers can increment
+
+        def _next_id() -> int:
+            msg_id_ref[0] += 1
+            return msg_id_ref[0]
+
+        async def _send_mouse(event_type: str, sid: Optional[str]) -> int:
+            call_id = _next_id()
+            req: dict = {
+                "id": call_id,
+                "method": "Input.dispatchMouseEvent",
+                "params": {"type": event_type, "x": x, "y": y,
+                           "button": button, "clickCount": 1},
+            }
+            if sid:
+                req["sessionId"] = sid
+            await ws.send(json.dumps(req))
+            return call_id
+
+        async def _recv_until(call_id: int) -> dict:
+            while True:
+                remaining = deadline - _asyncio.get_running_loop().time()
+                if remaining <= 0:
+                    raise TimeoutError(f"CDP timed out waiting for id={call_id}")
+                raw = await _asyncio.wait_for(ws.recv(), timeout=remaining)
+                msg = json.loads(raw)
+                if msg.get("id") == call_id:
+                    if "error" in msg:
+                        raise RuntimeError(f"CDP error: {msg['error']}")
+                    return msg.get("result", {})
+
+        # --- resolve session (cached after first click) ---
+        session_id: Optional[str] = _CDP_SESSION_CACHE.get(ws_url)
+        if not session_id:
+            session_id = await _cdp_resolve_session(ws, ws_url, deadline, msg_id_ref)
+
+        # --- fire mousePressed + mouseReleased without awaiting press ack ---
+        # Both messages are sent before we await either response.  The browser
+        # processes them in order, so waiting only for mouseReleased is enough.
+        _press_id = await _send_mouse("mousePressed", session_id)
+        release_id = await _send_mouse("mouseReleased", session_id)
+        try:
+            await _recv_until(release_id)
+        except RuntimeError as exc:
+            # Stale session (e.g. after navigation) — invalidate cache, retry once
+            if "Session with given id not found" in str(exc) and session_id:
+                _CDP_SESSION_CACHE.pop(ws_url, None)
+                session_id = await _cdp_resolve_session(ws, ws_url, deadline, msg_id_ref)
+                _press_id = await _send_mouse("mousePressed", session_id)
+                release_id = await _send_mouse("mouseReleased", session_id)
+                await _recv_until(release_id)
+            else:
+                raise
+
+
+def _cdp_coordinate_click(
+    x: float,
+    y: float,
+    task_id: str,
+    button: str = "left",
+) -> str:
+    """Compositor-level click at viewport coordinates via CDP Input.dispatchMouseEvent.
+
+    Dispatch priority (fastest first):
+    1. **Supervisor path** — if a CDPSupervisor is alive for this task_id, reuse
+       its already-connected WebSocket.  Zero connection setup cost; the supervisor
+       thread owns a persistent WS that self-heals on navigation/crash.
+    2. **Per-click connect path** — open a single WS, resolve session (cached),
+       pipeline mousePressed + mouseReleased, close.
+    3. **agent-browser fallback** — when no CDP endpoint is configured at all.
+    """
+    ix, iy = int(round(x)), int(round(y))
+
+    # --- path 1: reuse supervisor's live WS (zero connection overhead) ------
+    try:
+        from tools.browser_supervisor import SUPERVISOR_REGISTRY  # type: ignore[import-not-found]
+        supervisor = SUPERVISOR_REGISTRY.get(task_id)
+        if supervisor is not None:
+            try:
+                supervisor.dispatch_mouse_click(ix, iy, button)
+                return json.dumps({
+                    "success": True,
+                    "clicked_at": {"x": ix, "y": iy},
+                    "method": "cdp_supervisor",
+                }, ensure_ascii=False)
+            except Exception as _exc:
+                # Supervisor present but errored (WS disconnect, stale session, etc.)
+                # — fall through to per-click WS path.
+                logger.debug("supervisor coordinate click failed for task=%s, falling back: %s", task_id, _exc)
+    except ImportError:
+        pass
+
+    # --- path 2: per-click WS connect (with session cache) ------------------
+    try:
+        from tools.browser_cdp_tool import _run_async, _resolve_cdp_endpoint, _WS_AVAILABLE
+    except ImportError:
+        return json.dumps({
+            "success": False,
+            "error": "browser_cdp_tool not available — coordinate clicks require the CDP tool module.",
+        }, ensure_ascii=False)
+
+    if not _WS_AVAILABLE:
+        return json.dumps({
+            "success": False,
+            "error": "The 'websockets' package is required for coordinate clicks. Install with: pip install websockets",
+        }, ensure_ascii=False)
+
+    endpoint = _resolve_cdp_endpoint()
+    if not endpoint:
+        return _coordinate_click_via_agent_browser(x, y, task_id, button)
+
+    if not endpoint.startswith(("ws://", "wss://")):
+        return _coordinate_click_via_agent_browser(x, y, task_id, button)
+
+    try:
+        _run_async(_cdp_coordinate_click_async(endpoint, ix, iy, button, 10.0))
+    except Exception as exc:
+        return json.dumps({
+            "success": False,
+            "error": f"CDP coordinate click failed: {type(exc).__name__}: {exc}",
+        }, ensure_ascii=False)
+
+    return json.dumps({
+        "success": True,
+        "clicked_at": {"x": ix, "y": iy},
+        "method": "cdp_compositor",
+    }, ensure_ascii=False)
+
+
+def _coordinate_click_via_agent_browser(
+    x: float,
+    y: float,
+    task_id: str,
+    button: str = "left",
+) -> str:
+    """Fallback: coordinate click via agent-browser mouse subcommands."""
+    effective_task_id = _last_session_key(task_id)
+    ix, iy = int(round(x)), int(round(y))
+
+    # agent-browser mouse move <x> <y> + mouse down + mouse up
+    move_result = _run_browser_command(effective_task_id, "mouse", ["move", str(ix), str(iy)])
+    if not move_result.get("success"):
+        return json.dumps({
+            "success": False,
+            "error": f"mouse move failed: {move_result.get('error', 'unknown')}",
+        }, ensure_ascii=False)
+
+    btn_arg = [] if button == "left" else [button]
+    down_result = _run_browser_command(effective_task_id, "mouse", ["down"] + btn_arg)
+    if not down_result.get("success"):
+        return json.dumps({
+            "success": False,
+            "error": f"mouse down failed: {down_result.get('error', 'unknown')}",
+        }, ensure_ascii=False)
+
+    up_result = _run_browser_command(effective_task_id, "mouse", ["up"] + btn_arg)
+    if not up_result.get("success"):
+        return json.dumps({
+            "success": False,
+            "error": f"mouse up failed: {up_result.get('error', 'unknown')}",
+        }, ensure_ascii=False)
+
+    return json.dumps({
+        "success": True,
+        "clicked_at": {"x": ix, "y": iy},
+        "method": "agent_browser_mouse",
+    }, ensure_ascii=False)
+
+
+def browser_click(
+    ref: Optional[str] = None,
+    x: Optional[float] = None,
+    y: Optional[float] = None,
+    task_id: Optional[str] = None,
+) -> str:
+    """
+    Click on an element by ref ID, or at exact viewport coordinates.
+
+    Provide EITHER ``ref`` (selector-based, via agent-browser) OR ``x`` + ``y``
+    (compositor-level, via CDP Input.dispatchMouseEvent).  Coordinate clicks
+    bypass DOM selectors entirely — they pass through iframes, shadow DOM,
+    cross-origin boundaries, and canvas elements.

    Args:
        ref: Element reference (e.g., "@e5")
+        x: Viewport X coordinate for compositor-level click
+        y: Viewport Y coordinate for compositor-level click
        task_id: Task identifier for session isolation

    Returns:
        JSON string with click result
    """
+    # --- Input validation ---------------------------------------------------
+    has_ref = ref is not None and str(ref).strip() != ""
+    has_coords = x is not None and y is not None
+
+    if has_ref and has_coords:
+        return json.dumps({
+            "success": False,
+            "error": "Provide either 'ref' or 'x'+'y', not both.",
+        }, ensure_ascii=False)
+
+    if (x is not None) != (y is not None):
+        return json.dumps({
+            "success": False,
+            "error": "Both 'x' and 'y' are required for coordinate clicks.",
+        }, ensure_ascii=False)
+
+    if not has_ref and not has_coords:
+        return json.dumps({
+            "success": False,
+            "error": "Provide either 'ref' (element reference) or 'x'+'y' (viewport coordinates).",
+        }, ensure_ascii=False)
+
+    # --- Coordinate-based click (compositor-level) --------------------------
+    if has_coords:
+        try:
+            fx, fy = float(x), float(y)  # type: ignore[arg-type]
+        except (TypeError, ValueError):
+            return json.dumps({
+                "success": False,
+                "error": f"x and y must be numbers, got x={x!r} y={y!r}",
+            }, ensure_ascii=False)
+        return _cdp_coordinate_click(fx, fy, task_id or "default")
+
+    # --- Ref-based click (existing path) ------------------------------------
+    if not has_ref or ref is None:
+        # Defensive guard — validation above should ensure we never reach here
+        return json.dumps({
+            "success": False,
+            "error": "Internal error: expected ref parameter.",
+        }, ensure_ascii=False)
+
    if _is_camofox_mode():
        from tools.browser_camofox import camofox_click
        return camofox_click(ref, task_id)
@@ -3413,7 +3753,7 @@ registry.register(
    name="browser_click",
    toolset="browser",
    schema=_BROWSER_SCHEMA_MAP["browser_click"],
-    handler=lambda args, **kw: browser_click(ref=args.get("ref", ""), task_id=kw.get("task_id")),
+    handler=lambda args, **kw: browser_click(ref=args.get("ref"), x=args.get("x"), y=args.get("y"), task_id=kw.get("task_id")),
    check_fn=check_browser_requirements,
    emoji="👆",
 )
@@ -5619,14 +5619,13 @@ def _(rid, params: dict) -> dict:

@method("voice.record")
 def _(rid, params: dict) -> dict:
-    """VAD-driven continuous record loop, CLI-parity.
+    """VAD-bounded push-to-talk capture, CLI-parity.

-    ``start`` turns on a VAD loop that emits ``voice.transcript`` events
-    for each detected utterance and auto-restarts for the next turn.
-    ``stop`` halts the loop (manual stop; matches cli.py's Ctrl+B-while-
-    recording branch clearing ``_voice_continuous``). Three consecutive
-    silent cycles stop the loop automatically and emit a
-    ``voice.transcript`` with ``no_speech_limit=True``.
+    ``start`` begins one VAD-bounded capture and emits ``voice.transcript``
+    after silence stops the recorder. ``stop`` forces transcription of the
+    active buffer, matching classic CLI push-to-talk. The voice wrapper retains
+    no-speech counts across single-shot starts, so three consecutive silent
+    captures emit ``voice.transcript`` with ``no_speech_limit=True``.
    """
    action = params.get("action", "start")

@@ -5665,7 +5664,7 @@ def _(rid, params: dict) -> dict:
                if isinstance(duration, (int, float)) and not isinstance(duration, bool)
                else 3.0
            )
-            start_continuous(
+            started = start_continuous(
                on_transcript=lambda t: _voice_emit("voice.transcript", {"text": t}),
                on_status=lambda s: _voice_emit("voice.status", {"state": s}),
                on_silent_limit=lambda: _voice_emit(
@@ -5673,13 +5672,19 @@ def _(rid, params: dict) -> dict:
                ),
                silence_threshold=safe_threshold,
                silence_duration=safe_duration,
+                auto_restart=False,
            )
+            if started is False:
+                return _ok(rid, {"status": "busy"})
            return _ok(rid, {"status": "recording"})

        # action == "stop"
+        with _voice_sid_lock:
+            _voice_event_sid = params.get("session_id") or _voice_event_sid
+
        from hermes_cli.voice import stop_continuous

-        stop_continuous()
+        stop_continuous(force_transcribe=True)
        return _ok(rid, {"status": "stopped"})
    except ImportError:
        return _err(
@@ -0,0 +1,44 @@
+import { describe, expect, it } from 'vitest'
+
+import { computePrecisionWheelStep, initPrecisionWheel } from '../lib/precisionWheel.js'
+
+describe('precisionWheel', () => {
+  it('passes the first modifier-held wheel event', () => {
+    const s = initPrecisionWheel()
+
+    expect(computePrecisionWheelStep(s, 1, true, 1000)).toEqual({ active: true, entered: true, rows: 1 })
+  })
+
+  it('coalesces same-frame events without throttling line-by-line scroll', () => {
+    const s = initPrecisionWheel()
+
+    computePrecisionWheelStep(s, 1, true, 1000)
+
+    expect(computePrecisionWheelStep(s, 1, true, 1008).rows).toBe(0)
+    expect(computePrecisionWheelStep(s, 1, true, 1016).rows).toBe(1)
+  })
+
+  it('keeps queued momentum in precision mode briefly after modifier release', () => {
+    const s = initPrecisionWheel()
+
+    computePrecisionWheelStep(s, 1, true, 1000)
+
+    expect(computePrecisionWheelStep(s, 1, false, 1050)).toMatchObject({ active: true, rows: 1 })
+  })
+
+  it('leaves precision mode once modifier-free momentum goes idle', () => {
+    const s = initPrecisionWheel()
+
+    computePrecisionWheelStep(s, 1, true, 1000)
+
+    expect(computePrecisionWheelStep(s, 1, false, 1100)).toEqual({ active: false, entered: false, rows: 0 })
+  })
+
+  it('does not coalesce immediate reversals', () => {
+    const s = initPrecisionWheel()
+
+    computePrecisionWheelStep(s, 1, true, 1000)
+
+    expect(computePrecisionWheelStep(s, -1, true, 1008).rows).toBe(1)
+  })
+})
@@ -0,0 +1,37 @@
+import { describe, expect, it, vi } from 'vitest'
+
+import { applyVoiceRecordResponse } from '../app/useInputHandlers.js'
+
+describe('applyVoiceRecordResponse', () => {
+  it('reverts optimistic REC state when the gateway reports voice busy', () => {
+    const setProcessing = vi.fn()
+    const setRecording = vi.fn()
+    const sys = vi.fn()
+
+    applyVoiceRecordResponse({ status: 'busy' }, true, { setProcessing, setRecording }, sys)
+
+    expect(setRecording).toHaveBeenCalledWith(false)
+    expect(setProcessing).toHaveBeenCalledWith(true)
+    expect(sys).toHaveBeenCalledWith('voice: still transcribing; try again shortly')
+  })
+
+  it('keeps optimistic REC state for successful recording starts', () => {
+    const setProcessing = vi.fn()
+    const setRecording = vi.fn()
+
+    applyVoiceRecordResponse({ status: 'recording' }, true, { setProcessing, setRecording }, vi.fn())
+
+    expect(setRecording).not.toHaveBeenCalled()
+    expect(setProcessing).not.toHaveBeenCalled()
+  })
+
+  it('reverts optimistic REC state when the gateway returns null', () => {
+    const setProcessing = vi.fn()
+    const setRecording = vi.fn()
+
+    applyVoiceRecordResponse(null, true, { setProcessing, setRecording }, vi.fn())
+
+    expect(setRecording).toHaveBeenCalledWith(false)
+    expect(setProcessing).toHaveBeenCalledWith(false)
+  })
+})
@@ -1,6 +1,6 @@
 import { describe, expect, it } from 'vitest'

-import { getViewportSnapshot, viewportSnapshotKey } from '../lib/viewportStore.js'
+import { getScrollbarSnapshot, getViewportSnapshot, scrollbarSnapshotKey, viewportSnapshotKey } from '../lib/viewportStore.js'

 describe('viewportStore', () => {
  it('normalizes absent scroll handles', () => {
@@ -51,4 +51,35 @@ describe('viewportStore', () => {
    expect(snap.atBottom).toBe(true)
    expect(snap.scrollHeight).toBe(20)
  })
+
+  it('keeps scrollbar position tied to committed scrollTop, not pending target', () => {
+    const handle = {
+      getPendingDelta: () => 24,
+      getScrollHeight: () => 100,
+      getScrollTop: () => 10,
+      getViewportHeight: () => 20,
+      isSticky: () => false
+    }
+
+    const viewport = getViewportSnapshot(handle as any)
+    const scrollbar = getScrollbarSnapshot(handle as any)
+
+    expect(viewport.top).toBe(34)
+    expect(scrollbar).toEqual({
+      scrollHeight: 100,
+      top: 10,
+      viewportHeight: 20
+    })
+    expect(scrollbarSnapshotKey(scrollbar)).toBe('10:20:100')
+  })
+
+  it('clamps scrollbar position to committed scroll bounds', () => {
+    const handle = {
+      getScrollHeight: () => 30,
+      getScrollTop: () => 50,
+      getViewportHeight: () => 20
+    }
+
+    expect(getScrollbarSnapshot(handle as any).top).toBe(10)
+  })
 })
@@ -11,6 +11,7 @@ import type {
  VoiceRecordResponse
 } from '../gatewayTypes.js'
 import { isAction, isCopyShortcut, isMac, isVoiceToggleKey } from '../lib/platform.js'
+import { computePrecisionWheelStep, initPrecisionWheel } from '../lib/precisionWheel.js'
 import { computeWheelStep, initWheelAccelForHost } from '../lib/wheelAccel.js'

 import { getInputSelection } from './inputSelectionStore.js'
@@ -21,8 +22,26 @@ import { patchTurnState } from './turnStore.js'
 import { getUiState } from './uiStore.js'

 const isCtrl = (key: { ctrl: boolean }, ch: string, target: string) => key.ctrl && ch.toLowerCase() === target
-const PRECISION_WHEEL_MIN_GAP_MS = 80
-const PRECISION_WHEEL_STICKY_MS = 80
+
+export function applyVoiceRecordResponse(
+  response: null | VoiceRecordResponse,
+  starting: boolean,
+  voice: Pick<InputHandlerContext['voice'], 'setProcessing' | 'setRecording'>,
+  sys: (text: string) => void
+) {
+  if (!starting || response?.status === 'recording') {
+    return
+  }
+
+  voice.setRecording(false)
+
+  if (response?.status === 'busy') {
+    voice.setProcessing(true)
+    sys('voice: still transcribing; try again shortly')
+  } else {
+    voice.setProcessing(false)
+  }
+}

 export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
  const { actions, composer, gateway, terminal, voice, wheelStep } = ctx
@@ -38,9 +57,7 @@ export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
  // rows = wheelStep × accelMult. State mutates in place across renders.
  const wheelAccelRef = useRef(initWheelAccelForHost())

-  const precisionWheelRef = useRef<{ active: boolean; dir: 0 | -1 | 1; lastEventAtMs: number; lastScrollAtMs: number }>(
-    { active: false, dir: 0, lastEventAtMs: 0, lastScrollAtMs: 0 }
-  )
+  const precisionWheelRef = useRef(initPrecisionWheel())

  useEffect(() => () => clearTimeout(scrollIdleTimer.current ?? undefined), [])

@@ -160,11 +177,12 @@ export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
    }
  }

-  // CLI parity: Ctrl+B toggles the VAD-driven continuous recording loop
+  // CLI parity: Ctrl+B toggles a VAD-bounded push-to-talk capture
  // (NOT the voice-mode umbrella bit). The mode is enabled via /voice on;
  // Ctrl+B while the mode is off sys-nudges the user. While the mode is
-  // on, the first press starts a continuous loop (gateway → start_continuous,
-  // VAD auto-stop → transcribe → auto-restart), a subsequent press stops it.
+  // on, the first press starts a single VAD-bounded capture
+  // (gateway -> start_continuous(auto_restart=false), VAD auto-stop ->
+  // transcribe -> idle), a subsequent press stops and transcribes it.
  // The gateway publishes voice.status + voice.transcript events that
  // createGatewayEventHandler turns into UI badges and composer injection.
  const voiceRecordToggle = () => {
@@ -185,14 +203,17 @@ export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
      voice.setProcessing(false)
    }

-    gateway.rpc<VoiceRecordResponse>('voice.record', { action }).catch((e: Error) => {
-      // Revert optimistic UI on failure.
-      if (starting) {
-        voice.setRecording(false)
-      }
+    gateway
+      .rpc<VoiceRecordResponse>('voice.record', { action, session_id: getUiState().sid })
+      .then(r => applyVoiceRecordResponse(r, starting, voice, actions.sys))
+      .catch((e: Error) => {
+        // Revert optimistic UI on failure.
+        if (starting) {
+          voice.setRecording(false)
+        }

-      actions.sys(`voice error: ${e.message}`)
-    })
+        actions.sys(`voice error: ${e.message}`)
+      })
  }

  useInput((ch, key) => {
@@ -291,40 +312,26 @@ export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
    if (key.wheelUp || key.wheelDown) {
      const dir: -1 | 1 = key.wheelUp ? -1 : 1
      const now = Date.now()
-      // Modifier-held wheel = precision mode: at most one wheelStep per short
-      // interval. Smooth mice / trackpads emit many raw wheel events for one
-      // intended line step, so raw 1:1 still moves too far.
+      // Modifier-held wheel = precision mode: one row per frame, no accel.
+      // Smooth mice / trackpads emit tiny same-frame bursts; coalesce those
+      // without the old 80ms throttle that made opt-scroll feel stepped.
      // SGR/X10 mouse encoding only carries shift/meta/ctrl bits; Cmd on
      // macOS is intercepted by the terminal, so we honor Option (meta) on
      // Mac / Alt (meta) on Win+Linux / Ctrl as a portable fallback. Shift
      // is reserved for selection extension.
      const hasModifier = key.meta || key.ctrl
-      const precision = precisionWheelRef.current
-      // Keep precision active through the current wheel burst after the
-      // modifier is released. Otherwise a stream of queued/momentum wheel
-      // events can hand off mid-burst into the accelerated path and jump.
-      const precisionSticky = now - precision.lastEventAtMs < PRECISION_WHEEL_STICKY_MS
+      const precision = computePrecisionWheelStep(precisionWheelRef.current, dir, hasModifier, now)

-      if (hasModifier || precisionSticky) {
-        if (!precision.active) {
-          precision.active = true
+      if (precision.active) {
+        // Entering precision mode must discard any accelerated wheel state;
+        // otherwise the next normal wheel event inherits stale momentum.
+        if (precision.entered) {
          wheelAccelRef.current = initWheelAccelForHost()
        }

-        precision.lastEventAtMs = now
-
-        if (dir === precision.dir && now - precision.lastScrollAtMs < PRECISION_WHEEL_MIN_GAP_MS) {
-          return
-        }
-
-        precision.lastScrollAtMs = now
-        precision.dir = dir
-
-        return scrollTranscript(dir * wheelStep)
+        return precision.rows ? scrollTranscript(dir * wheelStep) : undefined
      }

-      precision.active = false
-
      // 0 = direction-flip bounce deferred; skip the no-op scroll.
      const rows = computeWheelStep(wheelAccelRef.current, dir, now)

@@ -1,6 +1,6 @@
 import { Box, type ScrollBoxHandle, Text } from '@hermes/ink'
 import { useStore } from '@nanostores/react'
-import { type ReactNode, type RefObject, useEffect, useMemo, useState } from 'react'
+import { type ReactNode, type RefObject, useEffect, useMemo, useRef, useState } from 'react'
 import unicodeSpinners from 'unicode-animations'

 import { $delegationState } from '../app/delegationStore.js'
@@ -13,7 +13,7 @@ import { fmtDuration } from '../domain/messages.js'
 import { stickyPromptFromViewport } from '../domain/viewport.js'
 import { buildSubagentTree, treeTotals, widthByDepth } from '../lib/subagentTree.js'
 import { fmtK } from '../lib/text.js'
-import { useViewportSnapshot } from '../lib/viewportStore.js'
+import { useScrollbarSnapshot, useViewportSnapshot } from '../lib/viewportStore.js'
 import type { Theme } from '../theme.js'
 import type { Msg, Usage } from '../types.js'

@@ -377,7 +377,8 @@ export function StickyPromptTracker({ messages, offsets, scrollRef, onChange }:
 export function TranscriptScrollbar({ scrollRef, t }: TranscriptScrollbarProps) {
  const [hover, setHover] = useState(false)
  const [grab, setGrab] = useState<number | null>(null)
-  const { scrollHeight: total, top: pos, viewportHeight: vp } = useViewportSnapshot(scrollRef)
+  const grabRef = useRef<number | null>(null)
+  const { scrollHeight: total, top: pos, viewportHeight: vp } = useScrollbarSnapshot(scrollRef)

  if (!vp) {
    return <Box width={1} />
@@ -405,15 +406,20 @@ export function TranscriptScrollbar({ scrollRef, t }: TranscriptScrollbarProps)
      onMouseDown={(e: { localRow?: number }) => {
        const row = Math.max(0, Math.min(vp - 1, e.localRow ?? 0))
        const off = row >= thumbTop && row < thumbTop + thumb ? row - thumbTop : Math.floor(thumb / 2)
+
+        grabRef.current = off
        setGrab(off)
        jump(row, off)
      }}
      onMouseDrag={(e: { localRow?: number }) =>
-        jump(Math.max(0, Math.min(vp - 1, e.localRow ?? 0)), grab ?? Math.floor(thumb / 2))
+        jump(Math.max(0, Math.min(vp - 1, e.localRow ?? 0)), grabRef.current ?? Math.floor(thumb / 2))
      }
      onMouseEnter={() => setHover(true)}
      onMouseLeave={() => setHover(false)}
-      onMouseUp={() => setGrab(null)}
+      onMouseUp={() => {
+        grabRef.current = null
+        setGrab(null)
+      }}
      width={1}
    >
      {!scrollable ? (
@@ -295,7 +295,7 @@ export interface VoiceToggleResponse {
 }

 export interface VoiceRecordResponse {
-  status?: string
+  status?: 'busy' | 'recording' | 'stopped'
  text?: string
 }

@@ -0,0 +1,48 @@
+const PRECISION_WHEEL_FRAME_MS = 16
+const PRECISION_WHEEL_STICKY_MS = 80
+
+export type PrecisionWheelState = {
+  active: boolean
+  dir: 0 | -1 | 1
+  lastEventAtMs: number
+  lastScrollAtMs: number
+}
+
+export type PrecisionWheelStep = {
+  active: boolean
+  entered: boolean
+  rows: 0 | 1
+}
+
+export function initPrecisionWheel(): PrecisionWheelState {
+  return { active: false, dir: 0, lastEventAtMs: 0, lastScrollAtMs: 0 }
+}
+
+export function computePrecisionWheelStep(
+  state: PrecisionWheelState,
+  dir: -1 | 1,
+  hasModifier: boolean,
+  now: number
+): PrecisionWheelStep {
+  const active = hasModifier || now - state.lastEventAtMs < PRECISION_WHEEL_STICKY_MS
+
+  if (!active) {
+    state.active = false
+
+    return { active: false, entered: false, rows: 0 }
+  }
+
+  const entered = !state.active
+
+  state.active = true
+  state.lastEventAtMs = now
+
+  if (dir === state.dir && now - state.lastScrollAtMs < PRECISION_WHEEL_FRAME_MS) {
+    return { active: true, entered, rows: 0 }
+  }
+
+  state.dir = dir
+  state.lastScrollAtMs = now
+
+  return { active: true, entered, rows: 1 }
+}
@@ -11,6 +11,12 @@ export interface ViewportSnapshot {
  viewportHeight: number
 }

+export interface ScrollbarSnapshot {
+  scrollHeight: number
+  top: number
+  viewportHeight: number
+}
+
 const EMPTY: ViewportSnapshot = {
  atBottom: true,
  bottom: 0,
@@ -20,6 +26,12 @@ const EMPTY: ViewportSnapshot = {
  viewportHeight: 0
 }

+const EMPTY_SCROLLBAR: ScrollbarSnapshot = {
+  scrollHeight: 0,
+  top: 0,
+  viewportHeight: 0
+}
+
 export function getViewportSnapshot(s?: ScrollBoxHandle | null): ViewportSnapshot {
  if (!s) {
    return EMPTY
@@ -52,6 +64,26 @@ export function viewportSnapshotKey(v: ViewportSnapshot) {
  return `${v.atBottom ? 1 : 0}:${Math.ceil(v.top / 8) * 8}:${v.viewportHeight}:${Math.ceil(v.scrollHeight / 8) * 8}:${v.pending}`
 }

+export function getScrollbarSnapshot(s?: ScrollBoxHandle | null): ScrollbarSnapshot {
+  if (!s) {
+    return EMPTY_SCROLLBAR
+  }
+
+  const viewportHeight = Math.max(0, s.getViewportHeight())
+  const scrollHeight = Math.max(viewportHeight, s.getScrollHeight())
+  const maxTop = Math.max(0, scrollHeight - viewportHeight)
+
+  return {
+    scrollHeight,
+    top: Math.max(0, Math.min(maxTop, s.getScrollTop())),
+    viewportHeight
+  }
+}
+
+export function scrollbarSnapshotKey(v: ScrollbarSnapshot) {
+  return `${v.top}:${v.viewportHeight}:${v.scrollHeight}`
+}
+
 export function useViewportSnapshot(scrollRef: RefObject<ScrollBoxHandle | null>): ViewportSnapshot {
  const key = useSyncExternalStore(
    useCallback((cb: () => void) => scrollRef.current?.subscribe(cb) ?? (() => {}), [scrollRef]),
@@ -72,3 +104,21 @@ export function useViewportSnapshot(scrollRef: RefObject<ScrollBoxHandle | null>
    }
  }, [key])
 }
+
+export function useScrollbarSnapshot(scrollRef: RefObject<ScrollBoxHandle | null>): ScrollbarSnapshot {
+  const key = useSyncExternalStore(
+    useCallback((cb: () => void) => scrollRef.current?.subscribe(cb) ?? (() => {}), [scrollRef]),
+    () => scrollbarSnapshotKey(getScrollbarSnapshot(scrollRef.current)),
+    () => scrollbarSnapshotKey(EMPTY_SCROLLBAR)
+  )
+
+  return useMemo(() => {
+    const [top = '0', viewportHeight = '0', scrollHeight = '0'] = key.split(':')
+
+    return {
+      scrollHeight: Number(scrollHeight),
+      top: Number(top),
+      viewportHeight: Number(viewportHeight)
+    }
+  }, [key])
+}
@@ -335,10 +335,19 @@ Any profile that should be able to work kanban tasks must load the `kanban-worke
 3. Call `kanban_heartbeat(note="...")` every few minutes during long operations.
 4. Complete with `kanban_complete(summary="...", metadata={...})`, or `kanban_block(reason="...")` if stuck.

-Load it with (this one is **you**, installing into a profile — not a tool call):
+`kanban-worker` is a bundled skill, synced into every profile during install and
+update — there is no separate Skills Hub install step. Verify it is present in
+whichever profile you use for kanban workers (`researcher`, `writer`, `ops`,
+etc.):

 ```bash
-hermes skills install devops/kanban-worker
+hermes -p <your-worker-profile> skills list | grep kanban-worker
+```
+
+If the bundled copy is missing, restore it for that profile:
+
+```bash
+hermes -p <your-worker-profile> skills reset kanban-worker --restore
 ```

 The dispatcher also auto-passes `--skills kanban-worker` when spawning every worker, so the worker always has the pattern library available even if a profile's default skills config doesn't include it.
@@ -403,10 +412,18 @@ kanban_complete(
 )
 ```

-Load it into your orchestrator profile:
+`kanban-orchestrator` is a bundled skill. It is synced into each profile during
+install and update, so there is no separate Skills Hub install step. Verify it is
+present in your orchestrator profile:

 ```bash
-hermes skills install devops/kanban-orchestrator
+hermes -p orchestrator skills list | grep kanban-orchestrator
+```
+
+If the bundled copy is missing, restore it for that profile:
+
+```bash
+hermes -p orchestrator skills reset kanban-orchestrator --restore
 ```

 For best results, pair it with a profile whose toolsets are restricted to board operations (`kanban`, `gateway`, `memory`) so the orchestrator literally cannot execute implementation tasks even if it tries.
Author	SHA1	Message	Date
kshitijk4poor	2af14bd401	fix: self-review findings — logging, create_task, get_running_loop, benchmark path Self-review findings addressed: - browser_tool.py: log swallowed supervisor error at DEBUG instead of bare 'pass' (was silent, triggered F841 for unused 'exc' variable). Renamed to '_exc' to signal intentional discard. - browser_tool.py: rename unused 'press_id' to '_press_id' in both normal and retry paths (mouseReleased-only wait is intentional; press_id is never used after send). - browser_tool.py: get_event_loop() → get_running_loop() in 3 locations inside _cdp_resolve_session and _cdp_coordinate_click_async. Both are async functions and get_event_loop() is deprecated in async context in Python 3.10+. - browser_supervisor.py: ensure_future → create_task in dispatch_mouse_click. create_task is the correct modern API when already inside a running coroutine; ensure_future is deprecated for coroutines in Python 3.10+. Also consistent with the rest of browser_supervisor.py which uses create_task exclusively everywhere else. - scripts/benchmark_click_paths.py: replace hardcoded /private/tmp/hermes- coord-click sys.path hack with __file__-relative repo root detection so the script works from any checkout location. 27/27 tests pass.	2026-05-07 10:49:06 +05:30
kshitijk4poor	aef97da6d4	perf: reuse supervisor's persistent WS for coordinate clicks (23x speedup) The CDPSupervisor (browser_supervisor.py) already maintains a persistent WebSocket connection per task_id for dialog detection and frame tracking. After browser_navigate(), a supervisor is always running with an open WS. Instead of opening a new connection per click, dispatch directly on it. Changes: - browser_supervisor.py: add CDPSupervisor.dispatch_mouse_click() — sync bridge onto the supervisor's asyncio loop via run_coroutine_threadsafe. Pipelines mousePressed + mouseReleased via asyncio.gather (Playwright Promise.all pattern), no serial round-trips. - browser_tool.py: _cdp_coordinate_click() now checks SUPERVISOR_REGISTRY.get(task_id) first; falls back to per-click WS connect if no supervisor is running (e.g. raw CDP without navigate). Dispatch priority (fastest first): 1. Supervisor path — zero WS connection cost (supervisor WS already open) 2. Warm-cache path — 1 WS open + 2 mouse events (session cached) 3. Cold-cache path — 1 WS open + getTargets + attachToTarget + 2 events 4. agent-browser — 3 subprocess IPC calls (no CDP endpoint configured) Benchmark vs real Lightpanda WS at ws://127.0.0.1:63372/ (300 iterations): Baseline (3 connections): 4.86ms mean Warm cache (1 conn + cache): 1.30ms mean (3.74x) Supervisor (persistent WS): 0.20ms mean (23.75x) Ref-click IPC baseline: 0.14ms mean (parity) The supervisor path is 1.5x ref-click (0.07ms overhead) — essentially the cost of one cross-thread future dispatch. 27/27 tests pass (+3 new TestSupervisorPath tests).	2026-05-07 10:28:48 +05:30
kshitijk4poor	451c55bd9c	perf: session ID caching + skip mousePressed ack (browser-harness/Playwright patterns) Two additional optimizations from researching Playwright, Puppeteer, and browser-harness source: SESSION ID CACHING (browser-harness daemon pattern) Target.getTargets + Target.attachToTarget are stable across clicks on the same page. Cache the resolved session_id keyed by CDP endpoint URL. Subsequent clicks skip straight to mousePressed+mouseReleased with no session negotiation overhead. Self-healing: on 'Session with given id not found' (stale after navigation), the cache is invalidated and session resolution runs once before retrying. This matches the exact retry pattern from browser-harness's daemon.handle(). SKIP mousePressed ACK (Playwright Promise.all pattern) Browser processes CDP messages sequentially within a session. If mouseReleased is acknowledged, mousePressed was already processed. We skip waiting for the press ack entirely, saving one RTT. This is the same pattern as Playwright's Mouse.click() using Promise.all and Puppeteer's concurrent down+up dispatch. COMPRESSION=NONE (Puppeteer NodeWebSocketTransport pattern) Small CDP messages (Input.dispatchMouseEvent payloads are ~80 bytes) don't benefit from per-message compression. Disable it explicitly. Puppeteer uses perMessageDeflate: false for the same reason. Benchmark vs real Lightpanda WS (300 iterations): Baseline (3 connections): 3.28ms mean Optimized cold cache (1 conn): 1.17ms mean (2.79x speedup) Optimized warm cache (1 conn): 1.17ms mean (2.82x speedup) The cold/warm delta is <0.01ms because getTargets+attachToTarget on an already-open socket costs almost nothing on localhost — the dominant cost is WS connection setup, which we eliminated in the previous commit. The session cache still removes real work (2 CDP round-trips) and prevents accumulating latency on remote/higher-latency CDP endpoints. Tests: 24 passed (21 existing + 3 new session caching tests)	2026-05-07 10:17:29 +05:30
kshitijk4poor	0bfab1d361	perf: batch CDP click into single WS connection (2.4x speedup) Replace the 3-separate-_cdp_call() approach (one WS connection per message) with a single _cdp_coordinate_click_async() coroutine that opens the WebSocket once and sequences all CDP messages on it: 1. Target.getTargets 2. Target.attachToTarget (if page target found) 3. Input.dispatchMouseEvent (mousePressed) } pipelined — both sent 4. Input.dispatchMouseEvent (mouseReleased) } before awaiting either Benchmark vs real Lightpanda WS at ws://127.0.0.1:63372/ (300 iters): Baseline (current main, 3 connections): 3.14ms mean, 2.97ms median Optimized (this commit, 1 connection): 1.30ms mean, 1.11ms median Speedup: 2.42x mean, 2.68x median, 1.62x p95 The savings come entirely from eliminating 2 TCP+WS handshakes. mousePressed + mouseReleased are pipelined on the same connection, so they travel in the same network burst. 21/21 tests pass.	2026-05-07 10:04:38 +05:30
kshitijk4poor	ff8c6f2d64	feat: add compositor-level coordinate click to browser_click Add optional x/y parameters to browser_click for viewport-coordinate clicking via CDP Input.dispatchMouseEvent. When coordinates are provided, clicks are dispatched at the browser compositor level — Chrome does its own hit-testing, bypassing DOM selectors entirely. Use cases where ref-based click fails but coordinate click works: - Cross-origin iframes (OOPIFs) - Closed shadow DOM - Canvas/WebGL elements - Dynamic overlays where the snapshot may be stale Implementation: - CDP path (preferred): Input.dispatchMouseEvent via WebSocket (Target.getTargets + mousePressed + mouseReleased) - agent-browser fallback: mouse move/down/up when no CDP endpoint available - ref is no longer required — either ref OR x+y must be provided Benchmark (real Lightpanda WS at ws://127.0.0.1:63372, 200 iterations): CDP coord click: 3.71ms mean (2.97ms median, 2.61ms min, 7.01ms p95) Single WS conn baseline: 1.57ms mean (cost per connection open+call) agent-browser IPC: 0.20ms mean per HTTP call The 3.71ms per CDP click comes from 3 sequential fresh WS connections (pre-existing architecture in browser_cdp_tool.py). A persistent WS connection pool would bring this to ~3.1ms (just the 2 mouse events). Both paths are well under the 100ms human perception threshold. Files: - tools/browser_tool.py: schema update (x/y, ref no longer required), _cdp_coordinate_click(), _coordinate_click_via_agent_browser(), updated browser_click() with validation and dispatch - tests/tools/test_browser_coordinate_click.py: 21 tests covering validation, CDP path, fallback path, ref preservation, schema, registry - scripts/benchmark_click_paths.py: real-browser latency benchmark	2026-05-07 09:57:44 +05:30
Teknium	49c3c2e0d3	docs(kanban): fix worker skill setup instructions too (#20960 ) Follow-up to #20958. The worker skill section had the same stale 'hermes skills install devops/kanban-worker' command — kanban-worker is also bundled, so that command fails with 'Could not fetch from any source.' Replace with bundled-skill verification + restore pattern, matching the orchestrator section. Uses <your-worker-profile> placeholder since assignees vary (researcher, writer, ops, linguist, reviewer, etc.) rather than a single fixed 'worker' profile.	2026-05-06 18:40:30 -07:00
Gille	45cbf93899	docs(kanban): fix orchestrator skill setup instructions (#20958 )	2026-05-06 18:14:30 -07:00
Teknium	5a3cadf6eb	fix(discord): narrow rate-limit catch and move sync state under gateway/ Two follow-ups on top of helix4u's slash-command sync hardening: - Only suppress exceptions that are actually Discord 429 rate limits (discord.RateLimited, HTTPException with status 429, or a clearly rate-limit-named duck type). Arbitrary failures that happen to expose a retry_after attribute now re-raise to the outer handler instead of silently swallowing a cooldown. - Move the sync-state JSON under $HERMES_HOME/gateway/ so the home root stops collecting ad-hoc runtime files. Added a test verifying unrelated exceptions don't get misclassified as rate limits.	2026-05-06 18:12:35 -07:00
helix4u	d797755a1c	fix(gateway): wait for systemd restart readiness	2026-05-06 18:12:35 -07:00
Teknium	3cdbf334d5	fix(gateway): don't dead-end setup wizard when only system-scope unit is installed The setup wizard dropped non-root users at a bare shell prompt when trying to start a system-scope gateway service. Previously _require_root_for_system_service called sys.exit(1), which the wizard's `except Exception` guards cannot catch (SystemExit is a BaseException). Users with a pre-existing /etc/systemd/system unit (e.g. from an earlier `sudo hermes setup` run) hit this whenever they re-ran `hermes setup` as a regular user. - Convert _require_root_for_system_service to raise a typed SystemScopeRequiresRootError (RuntimeError subclass) instead of sys.exit(1). The direct CLI path (`hermes gateway install\|start\|stop\| restart\|uninstall` without sudo) still exits 1 cleanly via a new catch at the top of gateway_command, matching the existing UserSystemdUnavailableError pattern. - Add _system_scope_wizard_would_need_root() pre-check and _print_system_scope_remediation() helper. Both setup wizards (hermes_cli/setup.py and hermes_cli/gateway.py::gateway_setup) now detect the dead-end before prompting and print actionable guidance: either `sudo systemctl start <service>` this time, or uninstall the system unit and install a per-user one. - Defense-in-depth: all 5 wizard prompt sites also catch SystemScopeRequiresRootError and fall back to the remediation helper if the pre-check is bypassed (race, etc.). Tests: 12 new tests in TestSystemScopeRequiresRootError, TestSystemScopeWizardPreCheck, TestSystemScopeRemediationOutput, and TestGatewayCommandCatchesSystemScopeError covering the exception contract, pre-check matrix (root vs non-root, system-only vs user-present vs none vs explicit system=True), remediation output for each action, and the direct-CLI exit-1 path.	2026-05-06 15:58:02 -07:00
brooklyn!	04cf4788cc	fix(tui): restore voice push-to-talk parity (#20897 ) * fix(tui): restore classic CLI voice push-to-talk parity (cherry picked from commit `93b9ae301b`) * fix(tui): harden voice push-to-talk stop flow Address review feedback from PR #16189 by stopping the active recorder before background transcription, documenting single-shot voice capture, and covering the TUI gateway flags with regression tests. * fix(tui): preserve silent voice strike tracking Keep single-shot voice recording's no-speech counter alive across starts so the TUI can still emit the three-strikes auto-disable event, and bind the auto-restart state at module scope for type checking. * fix(tui): clean up voice stop failure path Address follow-up review by naming the TUI flow as single-shot push-to-talk and cancelling the recorder when forced stop cannot produce a WAV. * fix(tui): report busy voice capture starts Return explicit start state from the voice wrapper so the TUI gateway does not report recording while forced-stop transcription is still cleaning up. * fix(tui): handle busy voice record responses Apply the gateway busy status immediately in the TUI and route forced-stop voice events to the session that sent the stop request. * fix(tui): clear voice recording on null response Treat a null voice.record RPC result as a failed optimistic start so the REC badge cannot stick after gateway-side errors. * fix(tui): count silent manual voice stops Preserve single-shot voice no-speech strikes through forced stop transcription so empty push-to-talk captures still trigger the three-strikes guard. --------- Co-authored-by: Montbra <montbra@gmail.com>	2026-05-06 15:49:59 -07:00
brooklyn!	5ccab51fa8	fix(tui): steady transcript scrollbar (#20917 ) * fix(tui): steady transcript scrollbar Keep the visible scrollbar tied to committed viewport position while virtual history can still prefetch against pending scroll targets, and preserve drag grab offset synchronously for native-feeling scrollbar drags. * fix(tui): smooth precision wheel scroll Replace the opt-scroll throttle with frame-sized coalescing so modifier wheel gestures stay line-precise without stepping.	2026-05-06 14:50:31 -07:00
ethernet	53a024994a	Merge pull request #20890 from NousResearch/fix/docker-push ci(docker): don't cancel overlapping builds, guard :latest	2026-05-06 17:38:21 -04:00
ethernet	f4031df05d	ci(docker): don't cancel overlapping builds, guard :latest Switch top-level concurrency to cancel-in-progress=false so every push to main gets its own SHA-tagged image published — no more discarded builds when commits land back-to-back. Guard the :latest tag with a second job that has its own concurrency group with cancel-in-progress=true plus a git-ancestor check against the revision label on the current :latest. Together these guarantee :latest only ever moves forward in history: a slower run whose commit isn't a descendant of the current :latest refuses to clobber it, and a newer push mid-way through the move-latest job preempts the older one before it can retag. - Every main push publishes nousresearch/hermes-agent:sha-<commit> with an org.opencontainers.image.revision label embedded. - move-latest job reads that label off :latest, runs merge-base --is-ancestor, and only retags (via buildx imagetools create, registry-side, no rebuild) if our commit strictly descends. - fetch-depth bumped to 1000 so merge-base has the history it needs. - Release tag flow unchanged (unique tag, no race).	2026-05-06 15:53:47 -04:00