test(image-shrink): accept clamp_dimensions kwarg in mock resize signature

The _fake_resize mock in test_image_shrink_recovery.py predates the clamp_dimensions kwarg on _resize_image_for_vision. Add it to keep the mock signature aligned.
refactor(vision): revert Pillow-missing warning to match codebase pattern
2026-05-22 19:09:17 +02:00 · 2026-05-22 18:49:31 +02:00 · 2026-05-22 18:31:28 +02:00 · 2026-05-22 18:17:22 +02:00 · 2026-05-21 22:21:20 +02:00 · 2026-05-21 22:16:43 +02:00
188 changed files with 2345 additions and 11927 deletions
@@ -27,9 +27,9 @@ on:
 permissions:
  contents: read

-# Concurrency: push/release runs are NEVER cancelled so every merge gets
-# its own :main or release-tagged image.  :latest is guarded separately
-# by the move-latest job.  PR runs reuse a PR-scoped group with
+# Concurrency: push/release runs are NEVER cancelled so every merge gets its
+# own SHA-tagged image; :main and :latest are guarded separately by the
+# move-main and move-latest jobs.  PR runs reuse a PR-scoped group with
 # cancel-in-progress: true so rapid pushes to the same PR collapse to the
 # latest commit.
 concurrency:
@@ -92,10 +92,10 @@ jobs:
      # pattern for multi-runner multi-platform builds.
      #
      # We apply the OCI revision label here (and again on arm64) because
-      # the move-latest job reads it off the linux/amd64 sub-manifest
-      # config of the floating tag to decide whether it's safe to advance.
-      # The label must be on each per-arch image — manifest lists themselves
-      # don't carry image config labels.
+      # the move-main / move-latest jobs read it off the linux/amd64
+      # sub-manifest config of the floating tag to decide whether it's safe
+      # to advance.  The label must be on each per-arch image — manifest
+      # lists themselves don't carry image config labels.
      - name: Push amd64 by digest
        id: push
        if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
@@ -208,14 +208,8 @@ jobs:
  # ---------------------------------------------------------------------------
  # Stitch both per-arch digests into a single tagged multi-arch manifest.
  # This is a registry-side operation — no building, no layer re-push —
-  # so it runs in ~30 seconds.  On main pushes it produces :main; on
-  # releases it produces :<release_tag_name>.
-  #
-  # For main pushes the ancestor check runs BEFORE the manifest push so
-  # we never overwrite :main with an older commit.  The top-level
-  # concurrency group (`docker-${{ github.ref }}` with
-  # `cancel-in-progress: false`) already serialises runs per ref; the
-  # ancestor check is defense-in-depth.
+  # so it runs in ~30 seconds.  On main pushes it produces :sha-<sha>.
+  # On releases it produces :<release_tag_name>.
  # ---------------------------------------------------------------------------
  merge:
    if: github.repository == 'NousResearch/hermes-agent' && (github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release')
@@ -223,15 +217,10 @@ jobs:
    needs: [build-amd64, build-arm64]
    timeout-minutes: 10
    outputs:
+      pushed_sha_tag: ${{ steps.mark_pushed.outputs.pushed }}
      pushed_release_tag: ${{ steps.mark_release_pushed.outputs.pushed }}
      release_tag: ${{ steps.tag.outputs.tag }}
    steps:
-      - name: Checkout code
-        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
-        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
-        with:
-          fetch-depth: 1000
-
      - name: Download digests
        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093  # v4
        with:
@@ -248,19 +237,120 @@ jobs:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

-      # Read the git revision label off the current :main manifest, then
-      # use `git merge-base --is-ancestor` to check whether our commit is
-      # a descendant of it.  If :main doesn't exist yet, or its label is
-      # missing, we treat that as "safe to publish".  If another run
-      # already advanced :main past us (or diverged), we skip and leave
-      # it alone.
-      - name: Decide whether to move :main
+      # Compute the tag for this run.  Main pushes use sha-<sha> (so every
+      # commit gets its own immutable tag); releases use the release tag name.
+      - name: Compute tag
+        id: tag
+        run: |
+          if [ "${{ github.event_name }}" = "release" ]; then
+            echo "tag=${{ github.event.release.tag_name }}" >> "$GITHUB_OUTPUT"
+          else
+            echo "tag=sha-${{ github.sha }}" >> "$GITHUB_OUTPUT"
+          fi
+
+      - name: Create manifest list and push
+        working-directory: /tmp/digests
+        run: |
+          set -euo pipefail
+          # Build the arg array from each digest file (filename = the digest
+          # hex, with no sha256: prefix; empty file content, only the name
+          # matters).  Using an array avoids shellcheck SC2046 and keeps
+          # every digest a single argv token even under pathological names.
+          args=()
+          for digest_file in *; do
+            args+=("${IMAGE_NAME}@sha256:${digest_file}")
+          done
+          docker buildx imagetools create \
+            -t "${IMAGE_NAME}:${TAG}" \
+            "${args[@]}"
+        env:
+          IMAGE_NAME: ${{ env.IMAGE_NAME }}
+          TAG: ${{ steps.tag.outputs.tag }}
+
+      - name: Inspect image
+        run: |
+          docker buildx imagetools inspect "${IMAGE_NAME}:${TAG}"
+        env:
+          IMAGE_NAME: ${{ env.IMAGE_NAME }}
+          TAG: ${{ steps.tag.outputs.tag }}
+
+      # Signal to move-main that the SHA tag is live.  Only on main pushes;
+      # releases set pushed_release_tag instead.
+      - name: Mark SHA tag pushed
+        id: mark_pushed
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+        run: echo "pushed=true" >> "$GITHUB_OUTPUT"
+
+      # Signal to move-latest that the release tag is live.
+      - name: Mark release tag pushed
+        id: mark_release_pushed
+        if: github.event_name == 'release'
+        run: echo "pushed=true" >> "$GITHUB_OUTPUT"
+
+  # ---------------------------------------------------------------------------
+  # Move :main to point at the SHA tag the merge job pushed.
+  #
+  # :main is the floating tag that tracks the tip of the main branch.  Every
+  # merge to main retags :main forward.  Users who want "latest dev build"
+  # pull :main; users who want stable releases pull :latest.
+  #
+  # The real serialization guarantee comes from the top-level concurrency
+  # group (`docker-${{ github.ref }}` with `cancel-in-progress: false`),
+  # which ensures at most one workflow run for this ref executes at a time.
+  # That means two move-main steps for the same ref cannot overlap.
+  #
+  # This job has its own concurrency group as defense-in-depth: if the
+  # top-level group is ever loosened, queued move-mains will run serially
+  # in arrival order, each one running the ancestor check below and either
+  # advancing :main or skipping.  `cancel-in-progress: false` matches the
+  # top-level setting — we don't want rapid pushes to cancel a queued
+  # move-main, because the ancestor check is the real safety mechanism
+  # and queueing is cheap (move-main is a ~30s registry op).
+  #
+  # Combined with the ancestor check, this means :main only ever moves
+  # forward in git history.
+  # ---------------------------------------------------------------------------
+  move-main:
+    if: |
+      github.repository == 'NousResearch/hermes-agent'
+      && github.event_name == 'push'
+      && github.ref == 'refs/heads/main'
+      && needs.merge.outputs.pushed_sha_tag == 'true'
+    needs: merge
+    runs-on: ubuntu-latest
+    timeout-minutes: 10
+    concurrency:
+      group: docker-move-main-${{ github.ref }}
+      cancel-in-progress: false
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          fetch-depth: 1000
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f  # v3
+
+      - name: Log in to Docker Hub
+        uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121  # v4.1.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+
+      # Read the git revision label off the current :main manifest, then
+      # use `git merge-base --is-ancestor` to check whether our commit is a
+      # descendant of it.  If :main doesn't exist yet, or its label is
+      # missing, we treat that as "safe to publish".  If another run already
+      # advanced :main past us (or diverged), we skip and leave it alone.
+      - name: Decide whether to move :main
        id: main_check
        run: |
          set -euo pipefail
          image=nousresearch/hermes-agent

+          # Pull the JSON for the linux/amd64 sub-manifest's config and extract
+          # the OCI revision label with jq — Go template field access can't
+          # handle dots in map keys, so using json+jq is the robust route.
          image_json=$(
            docker buildx imagetools inspect "${image}:main" \
              --format '{{ json (index .Image "linux/amd64") }}' \
@@ -293,6 +383,7 @@ jobs:
            exit 0
          fi

+          # Make sure we have the :main commit locally for merge-base.
          if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
            git fetch --no-tags --prune origin \
              "+refs/heads/main:refs/remotes/origin/main" \
@@ -305,6 +396,7 @@ jobs:
            exit 0
          fi

+          # Our SHA must be a descendant of the current :main to be safe.
          if git merge-base --is-ancestor "${current_sha}" "${GITHUB_SHA}"; then
            echo "Our commit is a descendant of :main — safe to advance."
            echo "push_main=true" >> "$GITHUB_OUTPUT"
@@ -313,48 +405,19 @@ jobs:
            echo "push_main=false" >> "$GITHUB_OUTPUT"
          fi

-      # Compute the tag for this run.  Main pushes tag directly as :main
-      # (no per-commit SHA tags); releases use the release tag name.
-      - name: Compute tag
-        id: tag
-        run: |
-          if [ "${{ github.event_name }}" = "release" ]; then
-            echo "tag=${{ github.event.release.tag_name }}" >> "$GITHUB_OUTPUT"
-          else
-            echo "tag=main" >> "$GITHUB_OUTPUT"
-          fi
-
-      # Gate the manifest push on the ancestor check for main pushes.
-      # For releases there is no gate — the check doesn't even run.
-      - name: Create manifest list and push
-        if: github.event_name != 'push' || steps.main_check.outputs.push_main == 'true'
-        working-directory: /tmp/digests
+      # Retag the already-pushed SHA manifest as :main.  This is a registry-
+      # side operation — no rebuild, no layer re-push — so it's quick and
+      # atomic per-tag.  The ancestor check above plus the cancel-in-progress
+      # concurrency on this job together guarantee we only ever move :main
+      # forward in git history.
+      - name: Move :main to this SHA
+        if: steps.main_check.outputs.push_main == 'true'
        run: |
          set -euo pipefail
-          args=()
-          for digest_file in *; do
-            args+=("${IMAGE_NAME}@sha256:${digest_file}")
-          done
+          image=nousresearch/hermes-agent
          docker buildx imagetools create \
-            -t "${IMAGE_NAME}:${TAG}" \
-            "${args[@]}"
-        env:
-          IMAGE_NAME: ${{ env.IMAGE_NAME }}
-          TAG: ${{ steps.tag.outputs.tag }}
-
-      - name: Inspect image
-        if: github.event_name != 'push' || steps.main_check.outputs.push_main == 'true'
-        run: |
-          docker buildx imagetools inspect "${IMAGE_NAME}:${TAG}"
-        env:
-          IMAGE_NAME: ${{ env.IMAGE_NAME }}
-          TAG: ${{ steps.tag.outputs.tag }}
-
-      # Signal to move-latest that the release tag is live.
-      - name: Mark release tag pushed
-        id: mark_release_pushed
-        if: github.event_name == 'release'
-        run: echo "pushed=true" >> "$GITHUB_OUTPUT"
+            --tag "${image}:main" \
+            "${image}:sha-${GITHUB_SHA}"

  # ---------------------------------------------------------------------------
  # Move :latest to point at the release tag the merge job pushed.
@@ -364,10 +427,10 @@ jobs:
  #
  # We still run an ancestor check against the existing :latest so that a
  # backport release on an older branch (e.g. patching v1.1.5 after v1.2.3
-  # is out) doesn't drag :latest backwards.  The check is the same shape
-  # as the ancestor check in the merge job for :main: read the OCI
-  # revision label off the current :latest, look up that commit in git,
-  # and only advance if our release commit is a strict descendant.
+  # is out) doesn't drag :latest backwards.  The check is the same shape as
+  # move-main: read the OCI revision label off the current :latest, look up
+  # that commit in git, and only advance if our release commit is a strict
+  # descendant.
  # ---------------------------------------------------------------------------
  move-latest:
    if: |
@@ -23,24 +23,13 @@ concurrency:
 jobs:
  test:
    runs-on: ubuntu-latest
-    timeout-minutes: 60
+    timeout-minutes: 30
    steps:
      - name: Checkout code
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2

-      - name: Install ripgrep (prebuilt binary)
-        run: |
-          set -euo pipefail
-          RG_VERSION=15.1.0
-          RG_SHA256=1c9297be4a084eea7ecaedf93eb03d058d6faae29bbc57ecdaf5063921491599
-          RG_TARBALL=ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl.tar.gz
-          curl -sSfL -o "$RG_TARBALL" \
-            "https://github.com/BurntSushi/ripgrep/releases/download/${RG_VERSION}/${RG_TARBALL}"
-          echo "${RG_SHA256}  ${RG_TARBALL}" | sha256sum -c -
-          tar -xzf "$RG_TARBALL"
-          sudo mv "ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl/rg" /usr/local/bin/rg
-          rm -rf "$RG_TARBALL" "ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl"
-          rg --version
+      - name: Install system dependencies
+        run: sudo apt-get update && sudo apt-get install -y ripgrep

      - name: Install uv
        uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86  # v5
@@ -55,26 +44,9 @@ jobs:
          uv pip install -e ".[all,dev]"

      - name: Run tests
-        # Per-file isolation via scripts/run_tests_parallel.py: discovers
-        # every test_*.py file under tests/ (excluding integration/ + e2e/),
-        # then runs `python -m pytest <file>` in a freshly-spawned subprocess
-        # with bounded parallelism. No xdist, no shared workers, no
-        # module-level state leakage between files.
-        #
-        # Why per-file (not per-test): per-test spawn cost (~250ms × 17k
-        # tests = 70min CPU minimum) blew the wall-clock budget. Per-file
-        # spawn (~250ms × ~850 files = ~3.5min) fits while still giving
-        # every file a fresh interpreter — the only isolation boundary
-        # that matters in practice (cross-file leakage was the original
-        # flake source; intra-file is the test author's responsibility).
-        #
-        # Why drop xdist entirely: xdist's persistent workers accumulate
-        # state across files, which is exactly the leakage we wanted to
-        # fix. ThreadPoolExecutor + subprocess.run is ~60 lines and does
-        # the job with cleaner semantics.
        run: |
          source .venv/bin/activate
-          python scripts/run_tests_parallel.py
+          python -m pytest tests/ -q --ignore=tests/integration --ignore=tests/e2e --tb=short -n auto --timeout=30 --timeout-method=signal
        env:
          # Ensure tests don't accidentally call real APIs
          OPENROUTER_API_KEY: ""
@@ -88,19 +60,8 @@ jobs:
      - name: Checkout code
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2

-      - name: Install ripgrep (prebuilt binary)
-        run: |
-          set -euo pipefail
-          RG_VERSION=15.1.0
-          RG_SHA256=1c9297be4a084eea7ecaedf93eb03d058d6faae29bbc57ecdaf5063921491599
-          RG_TARBALL=ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl.tar.gz
-          curl -sSfL -o "$RG_TARBALL" \
-            "https://github.com/BurntSushi/ripgrep/releases/download/${RG_VERSION}/${RG_TARBALL}"
-          echo "${RG_SHA256}  ${RG_TARBALL}" | sha256sum -c -
-          tar -xzf "$RG_TARBALL"
-          sudo mv "ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl/rg" /usr/local/bin/rg
-          rm -rf "$RG_TARBALL" "ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl"
-          rg --version
+      - name: Install system dependencies
+        run: sudo apt-get update && sudo apt-get install -y ripgrep

      - name: Install uv
        uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86  # v5
@@ -18,7 +18,6 @@ __pycache__/web_tools.cpython-310.pyc
 logs/
 data/
 .pytest_cache/
-.pytest-cache/
 tmp/
 temp_vision_images/
 hermes-*/*
@@ -1013,39 +1013,17 @@ def profile_env(tmp_path, monkeypatch):

 **ALWAYS use `scripts/run_tests.sh`** — do not call `pytest` directly. The script enforces
 hermetic environment parity with CI (unset credential vars, TZ=UTC, LANG=C.UTF-8,
-`-n auto` xdist workers, in-tree subprocess-isolation plugin). Direct `pytest`
-on a 16+ core developer machine with API keys set diverges from CI in ways
-that have caused multiple "works locally, fails in CI" incidents (and the reverse).
+4 xdist workers matching GHA ubuntu-latest). Direct `pytest` on a 16+ core
+developer machine with API keys set diverges from CI in ways that have caused
+multiple "works locally, fails in CI" incidents (and the reverse).

 ```bash
 scripts/run_tests.sh                                  # full suite, CI-parity
 scripts/run_tests.sh tests/gateway/                   # one directory
 scripts/run_tests.sh tests/agent/test_foo.py::test_x  # one test
 scripts/run_tests.sh -v --tb=long                     # pass-through pytest flags
-scripts/run_tests.sh --no-isolate tests/foo/          # disable subprocess isolation (faster, for debugging)
 ```

-### Subprocess-per-test isolation
-
-Every test runs in a freshly-spawned Python subprocess via the in-tree plugin
-at `tests/_isolate_plugin.py`. This means module-level dicts/sets and
-ContextVars from one test cannot leak into the next — the historic
-`_reset_module_state` autouse fixture is gone.
-
-Implementation notes:
-
- The plugin uses `multiprocessing.get_context("spawn")`, which works on
-  Linux, macOS, and Windows alike (POSIX `fork` is not used).
- Per-test overhead is ~0.5–1.0s (Python startup + pytest collection). xdist
-  parallelism amortizes this across cores; on a 20-core box the full suite
-  finishes in roughly the same wall time as before, but flake-free.
- `isolate_timeout` (configured in `pyproject.toml`) caps each test at 30s.
-  Hangs are killed and surfaced as a failure report.
- Pass `--no-isolate` to disable isolation — useful when debugging a single
-  test interactively, or when you specifically want to verify state leakage.
- The plugin disables itself in child processes (sentinel envvar
-  `HERMES_ISOLATE_CHILD=1`), so there's no fork-bomb risk.
-
 ### Why the wrapper (and why the old "just call pytest" doesn't work)

 Five real sources of local-vs-CI drift the script closes:
@@ -1056,7 +1034,7 @@ Five real sources of local-vs-CI drift the script closes:
 | HOME / `~/.hermes/` | Your real config+auth.json | Temp dir per test |
 | Timezone | Local TZ (PDT etc.) | UTC |
 | Locale | Whatever is set | C.UTF-8 |
-| xdist workers | `-n auto` = all cores | `-n auto` (safe — subprocess isolation prevents cross-worker flakes) |
+| xdist workers | `-n auto` = all cores (20+ on a workstation) | `-n 4` matching CI |

 `tests/conftest.py` also enforces points 1-4 as an autouse fixture so ANY pytest
 invocation (including IDE integrations) gets hermetic behavior — but the wrapper
@@ -1064,21 +1042,15 @@ is belt-and-suspenders.

 ### Running without the wrapper (only if you must)

-If you can't use the wrapper (e.g. inside an IDE that shells pytest directly),
-at minimum activate the venv. The isolation plugin loads automatically from
-`addopts` in `pyproject.toml`, so you get the same per-test process isolation
-either way.
+If you can't use the wrapper (e.g. on Windows or inside an IDE that shells
+pytest directly), at minimum activate the venv and pass `-n 4`:

 ```bash
 source .venv/bin/activate   # or: source venv/bin/activate
-python -m pytest tests/ -q
+python -m pytest tests/ -q -n 4
 ```

-If you need to bypass isolation for fast feedback while debugging:
-
-```bash
-python -m pytest tests/agent/test_foo.py -q --no-isolate
-```
+Worker count above 4 will surface test-ordering flakes that CI never sees.

 Always run the full suite before pushing changes.

@@ -71,71 +71,6 @@ def _ra():
    return run_agent


-def _normalized_custom_base_url(value: Any) -> str:
-    if not isinstance(value, str):
-        return ""
-    return value.strip().rstrip("/")
-
-
-def _custom_provider_model_matches(agent_model: str, entry: Dict[str, Any]) -> bool:
-    provider_model = str(entry.get("model", "") or "").strip().lower()
-    if not provider_model:
-        return True
-    return provider_model == str(agent_model or "").strip().lower()
-
-
-def _custom_provider_extra_body_for_agent(
-    *,
-    provider: str,
-    model: str,
-    base_url: str,
-    custom_providers: List[Dict[str, Any]],
-) -> Optional[Dict[str, Any]]:
-    if (provider or "").strip().lower() != "custom":
-        return None
-
-    target_url = _normalized_custom_base_url(base_url)
-    if not target_url:
-        return None
-
-    fallback: Optional[Dict[str, Any]] = None
-    for entry in custom_providers or []:
-        if not isinstance(entry, dict):
-            continue
-        if _normalized_custom_base_url(entry.get("base_url")) != target_url:
-            continue
-        extra_body = entry.get("extra_body")
-        if not isinstance(extra_body, dict) or not extra_body:
-            continue
-        provider_model = str(entry.get("model", "") or "").strip()
-        if provider_model:
-            if _custom_provider_model_matches(model, entry):
-                return dict(extra_body)
-        elif fallback is None:
-            fallback = dict(extra_body)
-
-    return fallback
-
-
-def _merge_custom_provider_extra_body(agent, custom_providers: List[Dict[str, Any]]) -> None:
-    extra_body = _custom_provider_extra_body_for_agent(
-        provider=agent.provider,
-        model=agent.model,
-        base_url=agent.base_url,
-        custom_providers=custom_providers,
-    )
-    if not extra_body:
-        return
-
-    overrides = dict(getattr(agent, "request_overrides", {}) or {})
-    merged_extra_body = dict(extra_body)
-    existing_extra_body = overrides.get("extra_body")
-    if isinstance(existing_extra_body, dict):
-        merged_extra_body.update(existing_extra_body)
-    overrides["extra_body"] = merged_extra_body
-    agent.request_overrides = overrides
-
-
 def init_agent(
    agent,
    base_url: str = None,
@@ -1125,18 +1060,7 @@ def init_agent(
    # through _ra().get_tool_definitions()).  Duplicate function names cause
    # 400 errors on providers that enforce unique names (e.g. Xiaomi
    # MiMo via Nous Portal).
-    #
-    # Respect the platform's enabled_toolsets configuration (#5544):
-    #   enabled_toolsets is None        → no filter, inject (backward compat)
-    #   "memory" in enabled_toolsets    → user opted in, inject
-    #   otherwise (incl. [])            → user excluded memory, skip injection
-    #
-    # Without this gate, `platform_toolsets: telegram: []` still leaks memory
-    # provider tools (fact_store, etc.) into the tool surface — a 10x latency
-    # penalty on local models and a frequent trigger of tool-call loops.
-    if agent._memory_manager and agent.tools is not None and (
-        agent.enabled_toolsets is None or "memory" in agent.enabled_toolsets
-    ):
+    if agent._memory_manager and agent.tools is not None:
        _existing_tool_names = {
            t.get("function", {}).get("name")
            for t in agent.tools
@@ -1289,7 +1213,6 @@ def init_agent(
    # Store for reuse by _check_compression_model_feasibility (auxiliary
    # compression model context-length detection needs the same list).
    agent._custom_providers = _custom_providers
-    _merge_custom_provider_extra_body(agent, _custom_providers)

    # Check custom_providers per-model context_length
    if _config_context_length is None and _custom_providers:
@@ -1446,22 +1369,8 @@ def init_agent(
    # errors. Even with the cache fix, dedup is the right defense
    # against plugin paths that may register the same schemas via
    # ctx.register_tool(). Mirrors the memory tools dedup above.
-    #
-    # Respect the platform's enabled_toolsets configuration (#5544):
-    # context engine tools follow the same gating pattern as memory
-    # provider tools — without the gate, `platform_toolsets: telegram: []`
-    # would still leak lcm_* tools into the tool surface and incur the
-    # same local-model latency penalty.
    agent._context_engine_tool_names: set = set()
-    if (
-        hasattr(agent, "context_compressor")
-        and agent.context_compressor
-        and agent.tools is not None
-        and (
-            agent.enabled_toolsets is None
-            or "context_engine" in agent.enabled_toolsets
-        )
-    ):
+    if hasattr(agent, "context_compressor") and agent.context_compressor and agent.tools is not None:
        _existing_tool_names = {
            t.get("function", {}).get("name")
            for t in agent.tools
@@ -1606,155 +1606,182 @@ def _content_parts_to_anthropic_blocks(parts: Any) -> List[Dict[str, Any]]:
    return out


-def _convert_assistant_message(m: Dict[str, Any]) -> Dict[str, Any]:
-    """Convert an assistant message to Anthropic content blocks.
+def convert_messages_to_anthropic(
+    messages: List[Dict],
+    base_url: str | None = None,
+    model: str | None = None,
+) -> Tuple[Optional[Any], List[Dict]]:
+    """Convert OpenAI-format messages to Anthropic format.

-    Handles thinking blocks, regular content, tool calls, and
-    reasoning_content injection for Kimi/DeepSeek endpoints.
+    Returns (system_prompt, anthropic_messages).
+    System messages are extracted since Anthropic takes them as a separate param.
+    system_prompt is a string or list of content blocks (when cache_control present).
+
+    When *base_url* is provided and points to a third-party Anthropic-compatible
+    endpoint, all thinking block signatures are stripped.  Signatures are
+    Anthropic-proprietary — third-party endpoints cannot validate them and will
+    reject them with HTTP 400 "Invalid signature in thinking block".
+
+    When *model* is provided and matches the Kimi / Moonshot family (or
+    *base_url* is a Kimi / Moonshot host), unsigned thinking blocks
+    synthesised from ``reasoning_content`` are preserved on replayed
+    assistant tool-call messages — Kimi requires the field to exist, even
+    if empty.
    """
-    content = m.get("content", "")
-    blocks = _extract_preserved_thinking_blocks(m)
-    if content:
-        if isinstance(content, list):
-            converted_content = _convert_content_to_anthropic(content)
-            if isinstance(converted_content, list):
-                blocks.extend(converted_content)
-        else:
-            blocks.append({"type": "text", "text": str(content)})
-    for tc in m.get("tool_calls", []):
-        if not tc or not isinstance(tc, dict):
+    system = None
+    result = []
+
+    for m in messages:
+        role = m.get("role", "user")
+        content = m.get("content", "")
+
+        if role == "system":
+            if isinstance(content, list):
+                # Preserve cache_control markers on content blocks
+                has_cache = any(
+                    p.get("cache_control") for p in content if isinstance(p, dict)
+                )
+                if has_cache:
+                    system = [p for p in content if isinstance(p, dict)]
+                else:
+                    system = "\n".join(
+                        p["text"] for p in content if p.get("type") == "text"
+                    )
+            else:
+                system = content
            continue
-        fn = tc.get("function", {})
-        args = fn.get("arguments", "{}")
-        try:
-            parsed_args = json.loads(args) if isinstance(args, str) else args
-        except (json.JSONDecodeError, ValueError):
-            parsed_args = {}
-        blocks.append({
-            "type": "tool_use",
-            "id": _sanitize_tool_id(tc.get("id", "")),
-            "name": fn.get("name", ""),
-            "input": parsed_args,
-        })
-    # Kimi's /coding endpoint (Anthropic protocol) requires assistant
-    # tool-call messages to carry reasoning_content when thinking is
-    # enabled server-side.  Preserve it as a thinking block so Kimi
-    # can validate the message history.  See hermes-agent#13848.
-    #
-    # Accept empty string "" — _copy_reasoning_content_for_api()
-    # injects "" as a tier-3 fallback for Kimi tool-call messages
-    # that had no reasoning.  Kimi requires the field to exist, even
-    # if empty.
-    #
-    # Prepend (not append): Anthropic protocol requires thinking
-    # blocks before text and tool_use blocks.
-    #
-    # Guard: only add when reasoning_details didn't already contribute
-    # thinking blocks.  On native Anthropic, reasoning_details produces
-    # signed thinking blocks — adding another unsigned one from
-    # reasoning_content would create a duplicate (same text) that gets
-    # downgraded to a spurious text block on the last assistant message.
-    reasoning_content = m.get("reasoning_content")
-    _already_has_thinking = any(
-        isinstance(b, dict) and b.get("type") in {"thinking", "redacted_thinking"}
-        for b in blocks
-    )
-    if isinstance(reasoning_content, str) and not _already_has_thinking:
-        blocks.insert(0, {"type": "thinking", "thinking": reasoning_content})
-    # Anthropic rejects empty assistant content
-    effective = blocks or content
-    if not effective or effective == "":
-        effective = [{"type": "text", "text": "(empty)"}]
-    return {"role": "assistant", "content": effective}

-
-def _convert_tool_message_to_result(
-    result: List[Dict[str, Any]], m: Dict[str, Any]
-) -> None:
-    """Convert a tool message to an Anthropic tool_result, merging consecutive
-    results into one user message.
-
-    Mutates ``result`` in place — either appends a new user message or extends
-    the trailing user message's tool_result list.
-    """
-    content = m.get("content", "")
-    multimodal_blocks: Optional[List[Dict[str, Any]]] = None
-    if isinstance(content, dict) and content.get("_multimodal"):
-        multimodal_blocks = _content_parts_to_anthropic_blocks(
-            content.get("content") or []
-        )
-        # Fallback text if the conversion produced nothing usable.
-        if not multimodal_blocks and content.get("text_summary"):
-            multimodal_blocks = [
-                {"type": "text", "text": str(content["text_summary"])}
-            ]
-    elif isinstance(content, list):
-        converted = _content_parts_to_anthropic_blocks(content)
-        if any(b.get("type") == "image" for b in converted):
-            multimodal_blocks = converted
-    # Back-compat: some callers stash blocks under a private key.
-    if multimodal_blocks is None:
-        stashed = m.get("_anthropic_content_blocks")
-        if isinstance(stashed, list) and stashed:
-            text_content = content if isinstance(content, str) and content.strip() else None
-            multimodal_blocks = (
-                [{"type": "text", "text": text_content}] + stashed
-                if text_content else list(stashed)
+        if role == "assistant":
+            blocks = _extract_preserved_thinking_blocks(m)
+            if content:
+                if isinstance(content, list):
+                    converted_content = _convert_content_to_anthropic(content)
+                    if isinstance(converted_content, list):
+                        blocks.extend(converted_content)
+                else:
+                    blocks.append({"type": "text", "text": str(content)})
+            for tc in m.get("tool_calls", []):
+                if not tc or not isinstance(tc, dict):
+                    continue
+                fn = tc.get("function", {})
+                args = fn.get("arguments", "{}")
+                try:
+                    parsed_args = json.loads(args) if isinstance(args, str) else args
+                except (json.JSONDecodeError, ValueError):
+                    parsed_args = {}
+                blocks.append({
+                    "type": "tool_use",
+                    "id": _sanitize_tool_id(tc.get("id", "")),
+                    "name": fn.get("name", ""),
+                    "input": parsed_args,
+                })
+            # Kimi's /coding endpoint (Anthropic protocol) requires assistant
+            # tool-call messages to carry reasoning_content when thinking is
+            # enabled server-side.  Preserve it as a thinking block so Kimi
+            # can validate the message history.  See hermes-agent#13848.
+            #
+            # Accept empty string "" — _copy_reasoning_content_for_api()
+            # injects "" as a tier-3 fallback for Kimi tool-call messages
+            # that had no reasoning.  Kimi requires the field to exist, even
+            # if empty.
+            #
+            # Prepend (not append): Anthropic protocol requires thinking
+            # blocks before text and tool_use blocks.
+            #
+            # Guard: only add when reasoning_details didn't already contribute
+            # thinking blocks.  On native Anthropic, reasoning_details produces
+            # signed thinking blocks — adding another unsigned one from
+            # reasoning_content would create a duplicate (same text) that gets
+            # downgraded to a spurious text block on the last assistant message.
+            reasoning_content = m.get("reasoning_content")
+            _already_has_thinking = any(
+                isinstance(b, dict) and b.get("type") in {"thinking", "redacted_thinking"}
+                for b in blocks
            )
+            if isinstance(reasoning_content, str) and not _already_has_thinking:
+                blocks.insert(0, {"type": "thinking", "thinking": reasoning_content})
+            # Anthropic rejects empty assistant content
+            effective = blocks or content
+            if not effective or effective == "":
+                effective = [{"type": "text", "text": "(empty)"}]
+            result.append({"role": "assistant", "content": effective})
+            continue

-    if multimodal_blocks:
-        result_content: Any = multimodal_blocks
-    elif isinstance(content, str):
-        result_content = content
-    else:
-        result_content = json.dumps(content) if content else "(no output)"
-    if not result_content:
-        result_content = "(no output)"
-    tool_result = {
-        "type": "tool_result",
-        "tool_use_id": _sanitize_tool_id(m.get("tool_call_id", "")),
-        "content": result_content,
-    }
-    if isinstance(m.get("cache_control"), dict):
-        tool_result["cache_control"] = dict(m["cache_control"])
-    # Merge consecutive tool results into one user message
-    if (
-        result
-        and result[-1]["role"] == "user"
-        and isinstance(result[-1]["content"], list)
-        and result[-1]["content"]
-        and result[-1]["content"][0].get("type") == "tool_result"
-    ):
-        result[-1]["content"].append(tool_result)
-    else:
-        result.append({"role": "user", "content": [tool_result]})
+        if role == "tool":
+            # Sanitize tool_use_id and ensure non-empty content.
+            # Computer-use (and other multimodal) tool results arrive as
+            # either a list of OpenAI-style content parts, or a dict
+            # marked `_multimodal` with an embedded `content` list. Convert
+            # both into Anthropic `tool_result` inner blocks (text + image).
+            multimodal_blocks: Optional[List[Dict[str, Any]]] = None
+            if isinstance(content, dict) and content.get("_multimodal"):
+                multimodal_blocks = _content_parts_to_anthropic_blocks(
+                    content.get("content") or []
+                )
+                # Fallback text if the conversion produced nothing usable.
+                if not multimodal_blocks and content.get("text_summary"):
+                    multimodal_blocks = [
+                        {"type": "text", "text": str(content["text_summary"])}
+                    ]
+            elif isinstance(content, list):
+                converted = _content_parts_to_anthropic_blocks(content)
+                if any(b.get("type") == "image" for b in converted):
+                    multimodal_blocks = converted
+            # Back-compat: some callers stash blocks under a private key.
+            if multimodal_blocks is None:
+                stashed = m.get("_anthropic_content_blocks")
+                if isinstance(stashed, list) and stashed:
+                    text_content = content if isinstance(content, str) and content.strip() else None
+                    multimodal_blocks = (
+                        [{"type": "text", "text": text_content}] + stashed
+                        if text_content else list(stashed)
+                    )

+            if multimodal_blocks:
+                result_content: Any = multimodal_blocks
+            elif isinstance(content, str):
+                result_content = content
+            else:
+                result_content = json.dumps(content) if content else "(no output)"
+            if not result_content:
+                result_content = "(no output)"
+            tool_result = {
+                "type": "tool_result",
+                "tool_use_id": _sanitize_tool_id(m.get("tool_call_id", "")),
+                "content": result_content,
+            }
+            if isinstance(m.get("cache_control"), dict):
+                tool_result["cache_control"] = dict(m["cache_control"])
+            # Merge consecutive tool results into one user message
+            if (
+                result
+                and result[-1]["role"] == "user"
+                and isinstance(result[-1]["content"], list)
+                and result[-1]["content"]
+                and result[-1]["content"][0].get("type") == "tool_result"
+            ):
+                result[-1]["content"].append(tool_result)
+            else:
+                result.append({"role": "user", "content": [tool_result]})
+            continue

-def _convert_user_message(content: Any) -> Dict[str, Any]:
-    """Validate and convert a user message to anthropic format."""
-    if isinstance(content, list):
-        converted_blocks = _convert_content_to_anthropic(content)
-        if not converted_blocks or all(
-            b.get("text", "").strip() == ""
-            for b in converted_blocks
-            if isinstance(b, dict) and b.get("type") == "text"
-        ):
-            converted_blocks = [{"type": "text", "text": "(empty message)"}]
-        return {"role": "user", "content": converted_blocks}
-    else:
-        if not content or (isinstance(content, str) and not content.strip()):
-            content = "(empty message)"
-        return {"role": "user", "content": content}
+        # Regular user message — validate non-empty content (Anthropic rejects empty)
+        if isinstance(content, list):
+            converted_blocks = _convert_content_to_anthropic(content)
+            # Check if all text blocks are empty
+            if not converted_blocks or all(
+                b.get("text", "").strip() == ""
+                for b in converted_blocks
+                if isinstance(b, dict) and b.get("type") == "text"
+            ):
+                converted_blocks = [{"type": "text", "text": "(empty message)"}]
+            result.append({"role": "user", "content": converted_blocks})
+        else:
+            # Validate string content is non-empty
+            if not content or (isinstance(content, str) and not content.strip()):
+                content = "(empty message)"
+            result.append({"role": "user", "content": content})

-
-def _strip_orphaned_tool_blocks(result: List[Dict[str, Any]]) -> None:
-    """Strip tool_use blocks with no matching tool_result, and vice versa.
-
-    Context compression or session truncation can remove either side of a
-    tool-call pair.  Anthropic rejects both orphans with HTTP 400.
-
-    Mutates ``result`` in place.
-    """
    # Strip orphaned tool_use blocks (no matching tool_result follows)
    tool_result_ids = set()
    for m in result:
@@ -1772,7 +1799,10 @@ def _strip_orphaned_tool_blocks(result: List[Dict[str, Any]]) -> None:
            if not m["content"]:
                m["content"] = [{"type": "text", "text": "(tool call removed)"}]

-    # Strip orphaned tool_result blocks (no matching tool_use precedes them)
+    # Strip orphaned tool_result blocks (no matching tool_use precedes them).
+    # This is the mirror of the above: context compression or session truncation
+    # can remove an assistant message containing a tool_use while leaving the
+    # subsequent tool_result intact.  Anthropic rejects these with a 400.
    tool_use_ids = set()
    for m in result:
        if m["role"] == "assistant" and isinstance(m["content"], list):
@@ -1789,16 +1819,12 @@ def _strip_orphaned_tool_blocks(result: List[Dict[str, Any]]) -> None:
            if not m["content"]:
                m["content"] = [{"type": "text", "text": "(tool result removed)"}]

-
-def _merge_consecutive_roles(result: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
-    """Merge consecutive same-role messages to enforce Anthropic alternation.
-
-    Returns a new list (caller must rebind ``result``).
-    """
+    # Enforce strict role alternation (Anthropic rejects consecutive same-role messages)
    fixed = []
    for m in result:
        if fixed and fixed[-1]["role"] == m["role"]:
            if m["role"] == "user":
+                # Merge consecutive user messages
                prev_content = fixed[-1]["content"]
                curr_content = m["content"]
                if isinstance(prev_content, str) and isinstance(curr_content, str):
@@ -1806,6 +1832,7 @@ def _merge_consecutive_roles(result: List[Dict[str, Any]]) -> List[Dict[str, Any
                elif isinstance(prev_content, list) and isinstance(curr_content, list):
                    fixed[-1]["content"] = prev_content + curr_content
                else:
+                    # Mixed types — wrap string in list
                    if isinstance(prev_content, str):
                        prev_content = [{"type": "text", "text": prev_content}]
                    if isinstance(curr_content, str):
@@ -1828,6 +1855,7 @@ def _merge_consecutive_roles(result: List[Dict[str, Any]]) -> List[Dict[str, Any
                elif isinstance(prev_blocks, str) and isinstance(curr_blocks, str):
                    fixed[-1]["content"] = prev_blocks + "\n" + curr_blocks
                else:
+                    # Mixed types — normalize both to list and merge
                    if isinstance(prev_blocks, str):
                        prev_blocks = [{"type": "text", "text": prev_blocks}]
                    if isinstance(curr_blocks, str):
@@ -1835,34 +1863,37 @@ def _merge_consecutive_roles(result: List[Dict[str, Any]]) -> List[Dict[str, Any
                    fixed[-1]["content"] = prev_blocks + curr_blocks
        else:
            fixed.append(m)
-    return fixed
+    result = fixed

-
-def _manage_thinking_signatures(
-    result: List[Dict[str, Any]], base_url: str | None, model: str | None
-) -> None:
-    """Strip or preserve thinking blocks based on endpoint type.
-
-    Anthropic signs thinking blocks against the full turn content.
-    Any upstream mutation (context compression, session truncation, orphan
-    stripping, message merging) invalidates the signature, causing HTTP 400
-    "Invalid signature in thinking block".
-
-    Signatures are Anthropic-proprietary.  Third-party endpoints (MiniMax,
-    Azure AI Foundry, AWS Bedrock, self-hosted proxies) cannot validate them
-    and will reject them outright.  Kimi's /coding and DeepSeek's /anthropic
-    endpoints speak the Anthropic protocol upstream but require unsigned
-    thinking blocks (synthesised from ``reasoning_content``) to round-trip on
-    replayed assistant tool-call messages.  See hermes-agent#13848 (Kimi) and
-    hermes-agent#16748 (DeepSeek).
-
-    Mutates ``result`` in place.
-    """
+    # ── Thinking block signature management ──────────────────────────
+    # Anthropic signs thinking blocks against the full turn content.
+    # Any upstream mutation (context compression, session truncation,
+    # orphan stripping, message merging) invalidates the signature,
+    # causing HTTP 400 "Invalid signature in thinking block".
+    #
+    # Signatures are Anthropic-proprietary.  Third-party endpoints
+    # (MiniMax, Microsoft Foundry, self-hosted proxies) cannot validate
+    # them and will reject them outright.  When targeting a third-party
+    # endpoint, strip ALL thinking/redacted_thinking blocks from every
+    # assistant message — the third-party will generate its own
+    # thinking blocks if it supports extended thinking.
+    #
+    # For direct Anthropic (strategy following clawdbot/OpenClaw):
+    # 1. Strip thinking/redacted_thinking from all assistant messages
+    #    EXCEPT the last one — preserves reasoning continuity on the
+    #    current tool-use chain while avoiding stale signature errors.
+    # 2. Downgrade unsigned thinking blocks (no signature) to text —
+    #    Anthropic can't validate them and will reject them.
+    # 3. Strip cache_control from thinking/redacted_thinking blocks —
+    #    cache markers can interfere with signature validation.
    _THINKING_TYPES = frozenset(("thinking", "redacted_thinking"))
    _is_third_party = _is_third_party_anthropic_endpoint(base_url)
-    # Kimi / DeepSeek share a contract: strip signed Anthropic blocks
-    # (neither upstream can validate Anthropic signatures), preserve unsigned
-    # ones synthesised from reasoning_content.  See #13848, #16748.
+    # Kimi /coding and DeepSeek /anthropic share a contract: both speak the
+    # Anthropic Messages protocol upstream but require that thinking blocks
+    # synthesised from reasoning_content round-trip on subsequent turns when
+    # thinking is enabled.  Signed Anthropic blocks still have to be stripped
+    # (neither endpoint can validate Anthropic's signatures); unsigned blocks
+    # are preserved.  See hermes-agent#13848 (Kimi) and #16748 (DeepSeek).
    _preserve_unsigned_thinking = (
        _is_kimi_family_endpoint(base_url, model)
        or _is_deepseek_anthropic_endpoint(base_url)
@@ -1879,19 +1910,26 @@ def _manage_thinking_signatures(
            continue

        if _preserve_unsigned_thinking:
-            # Kimi / DeepSeek: strip signed, preserve unsigned.
+            # Kimi's /coding and DeepSeek's /anthropic endpoints both enable
+            # thinking server-side and require unsigned thinking blocks on
+            # replayed assistant tool-call messages.  Strip signed Anthropic
+            # blocks (neither upstream can validate Anthropic signatures) but
+            # preserve the unsigned ones we synthesised from reasoning_content.
            new_content = []
            for b in m["content"]:
                if not isinstance(b, dict) or b.get("type") not in _THINKING_TYPES:
                    new_content.append(b)
                    continue
                if b.get("signature") or b.get("data"):
-                    # Signed (or redacted-with-data) — upstream can't validate, strip.
+                    # Anthropic-signed block — upstream can't validate, strip
                    continue
+                # Unsigned thinking (synthesised from reasoning_content) —
+                # keep it: the upstream needs it for message-history validation.
                new_content.append(b)
            m["content"] = new_content or [{"type": "text", "text": "(empty)"}]
        elif _is_third_party or idx != last_assistant_idx:
-            # Third-party: strip ALL thinking blocks (signatures are proprietary).
+            # Third-party endpoint: strip ALL thinking blocks from every
+            # assistant message — signatures are Anthropic-proprietary.
            # Direct Anthropic: strip from non-latest assistant messages only.
            stripped = [
                b for b in m["content"]
@@ -1899,21 +1937,24 @@ def _manage_thinking_signatures(
            ]
            m["content"] = stripped or [{"type": "text", "text": "(thinking elided)"}]
        else:
-            # Latest assistant on direct Anthropic: keep signed, downgrade unsigned
-            # to text so the reasoning isn't lost.
+            # Latest assistant on direct Anthropic: keep signed thinking
+            # blocks for reasoning continuity; downgrade unsigned ones to
+            # plain text.
            new_content = []
            for b in m["content"]:
                if not isinstance(b, dict) or b.get("type") not in _THINKING_TYPES:
                    new_content.append(b)
                    continue
                if b.get("type") == "redacted_thinking":
-                    # Redacted blocks use 'data' for the signature payload —
-                    # drop the block when 'data' is missing (can't be validated).
+                    # Redacted blocks use 'data' for the signature payload
                    if b.get("data"):
                        new_content.append(b)
+                    # else: drop — no data means it can't be validated
                elif b.get("signature"):
+                    # Signed thinking block — keep it
                    new_content.append(b)
                else:
+                    # Unsigned thinking — downgrade to text so it's not lost
                    thinking_text = b.get("thinking", "")
                    if thinking_text:
                        new_content.append({"type": "text", "text": thinking_text})
@@ -1925,15 +1966,12 @@ def _manage_thinking_signatures(
            if isinstance(b, dict) and b.get("type") in _THINKING_TYPES:
                b.pop("cache_control", None)

-
-def _evict_old_screenshots(result: List[Dict[str, Any]]) -> None:
-    """Keep only the most recent ``_MAX_KEEP_IMAGES`` computer-use screenshots.
-
-    Base64 images cost ~1,465 tokens each and accumulate across tool calls.
-    Walk backward, keep the most recent N, replace older ones with a placeholder.
-
-    Mutates ``result`` in place.
-    """
+    # ── Image eviction: keep only the most recent N screenshots ─────
+    # computer_use screenshots (base64 images) sit inside tool_result
+    # blocks: they accumulate and are sent with every API call. Each
+    # costs ~1,465 tokens; after 10+ the conversation becomes slow
+    # even for simple text queries. Walk backward, keep the most recent
+    # _MAX_KEEP_IMAGES, replace older ones with a text placeholder.
    _MAX_KEEP_IMAGES = 3
    _image_count = 0
    for msg in reversed(result):
@@ -1960,68 +1998,6 @@ def _evict_old_screenshots(result: List[Dict[str, Any]]) -> None:
                    for b in inner
                ]

-
-def convert_messages_to_anthropic(
-    messages: List[Dict],
-    base_url: str | None = None,
-    model: str | None = None,
-) -> Tuple[Optional[Any], List[Dict]]:
-    """Convert OpenAI-format messages to Anthropic format.
-
-    Returns (system_prompt, anthropic_messages).
-    System messages are extracted since Anthropic takes them as a separate param.
-    system_prompt is a string or list of content blocks (when cache_control present).
-
-    When *base_url* is provided and points to a third-party Anthropic-compatible
-    endpoint, all thinking block signatures are stripped.  Signatures are
-    Anthropic-proprietary — third-party endpoints cannot validate them and will
-    reject them with HTTP 400 "Invalid signature in thinking block".
-
-    When *model* is provided and matches the Kimi / Moonshot family (or
-    *base_url* is a Kimi / Moonshot host), unsigned thinking blocks
-    synthesised from ``reasoning_content`` are preserved on replayed
-    assistant tool-call messages — Kimi requires the field to exist, even
-    if empty.
-    """
-    system = None
-    result: List[Dict[str, Any]] = []
-
-    for m in messages:
-        role = m.get("role", "user")
-        content = m.get("content", "")
-
-        if role == "system":
-            if isinstance(content, list):
-                # Preserve cache_control markers on content blocks
-                has_cache = any(
-                    p.get("cache_control") for p in content if isinstance(p, dict)
-                )
-                if has_cache:
-                    system = [p for p in content if isinstance(p, dict)]
-                else:
-                    system = "\n".join(
-                        p["text"] for p in content if p.get("type") == "text"
-                    )
-            else:
-                system = content
-            continue
-
-        if role == "assistant":
-            result.append(_convert_assistant_message(m))
-            continue
-
-        if role == "tool":
-            _convert_tool_message_to_result(result, m)
-            continue
-
-        # Regular user message
-        result.append(_convert_user_message(content))
-
-    _strip_orphaned_tool_blocks(result)
-    result = _merge_consecutive_roles(result)
-    _manage_thinking_signatures(result, base_url, model)
-    _evict_old_screenshots(result)
-
    return system, result


@@ -504,7 +504,7 @@ def try_shrink_image_parts_in_messages(api_messages: list) -> bool:
        return False

    try:
-        from tools.vision_tools import _resize_image_for_vision
+        from tools.vision_tools import _resize_image_for_vision, _is_anthropic_provider
    except Exception as exc:
        logger.warning("image-shrink recovery: vision_tools unavailable — %s", exc)
        return False
@@ -546,6 +546,7 @@ def try_shrink_image_parts_in_messages(api_messages: list) -> bool:
                    Path(tmp.name),
                    mime_type=mime,
                    max_base64_bytes=target_bytes,
+                    clamp_dimensions=_is_anthropic_provider(),
                )
            finally:
                try:
@@ -46,7 +46,6 @@ from agent.message_sanitization import (
    _strip_non_ascii,
 )
 from agent.model_metadata import (
-    MINIMUM_CONTEXT_LENGTH,
    estimate_messages_tokens_rough,
    estimate_request_tokens_rough,
    get_next_probe_tier,
@@ -74,50 +73,6 @@ from utils import base_url_host_matches, env_var_enabled
 logger = logging.getLogger(__name__)


-def _ollama_context_limit_error(agent: Any, request_tokens: int) -> Optional[str]:
-    """Return a user-facing error when Ollama is loaded with too little context."""
-    if not getattr(agent, "tools", None):
-        return None
-
-    runtime_ctx = getattr(agent, "_ollama_num_ctx", None)
-    if not isinstance(runtime_ctx, int) or runtime_ctx <= 0:
-        return None
-    if runtime_ctx >= MINIMUM_CONTEXT_LENGTH:
-        return None
-
-    model = getattr(agent, "model", "") or "the selected model"
-    base_url = getattr(agent, "base_url", "") or "unknown base URL"
-    provider = getattr(agent, "provider", "") or "unknown"
-    tool_count = len(getattr(agent, "tools", None) or [])
-
-    logger.warning(
-        "Ollama runtime context too small for Hermes tool use: "
-        "model=%s provider=%s base_url=%s runtime_context=%d "
-        "minimum_context=%d estimated_request_tokens=%d tool_count=%d "
-        "session=%s",
-        model,
-        provider,
-        base_url,
-        runtime_ctx,
-        MINIMUM_CONTEXT_LENGTH,
-        request_tokens,
-        tool_count,
-        getattr(agent, "session_id", None) or "none",
-    )
-
-    return (
-        f"Ollama loaded `{model}` with only {runtime_ctx:,} tokens of runtime "
-        f"context, but Hermes needs at least {MINIMUM_CONTEXT_LENGTH:,} tokens "
-        "for reliable tool use.\n\n"
-        "Increase the Ollama context for this model and restart/reload the "
-        "model before trying again. A known-good starting point is 65,536 "
-        "tokens. In Hermes config, set `model.ollama_num_ctx: 65536` "
-        "(and `model.context_length: 65536` if you also override the displayed "
-        "model context). If you manage the model through an Ollama Modelfile, "
-        "set `PARAMETER num_ctx 65536` there instead."
-    )
-
-
 def _ra():
    """Lazy reference to ``run_agent`` so callers can patch
    ``run_agent.handle_function_call`` / ``run_agent._set_interrupt`` /
@@ -572,7 +527,6 @@ def run_conversation(
    api_call_count = 0
    final_response = None
    interrupted = False
-    failed = False
    codex_ack_continuations = 0
    length_continue_retries = 0
    truncated_tool_call_retries = 0
@@ -929,26 +883,6 @@ def run_conversation(
        # Calculate approximate request size for logging
        total_chars = sum(len(str(msg)) for msg in api_messages)
        approx_tokens = estimate_messages_tokens_rough(api_messages)
-        approx_request_tokens = estimate_request_tokens_rough(
-            api_messages, tools=agent.tools or None
-        )
-
-        _runtime_context_error = _ollama_context_limit_error(
-            agent, approx_request_tokens
-        )
-        if _runtime_context_error:
-            final_response = _runtime_context_error
-            failed = True
-            _turn_exit_reason = "ollama_runtime_context_too_small"
-            messages.append({"role": "assistant", "content": final_response})
-            agent._emit_status("❌ Ollama runtime context is too small for Hermes tool use")
-            api_call_count -= 1
-            agent._api_call_count = api_call_count
-            try:
-                agent.iteration_budget.refund()
-            except Exception:
-                pass
-            break
        
        # Thinking spinner for quiet mode (animated during API call)
        thinking_spinner = None
@@ -989,7 +923,6 @@ def run_conversation(
        copilot_auth_retry_attempted=False
        thinking_sig_retry_attempted = False
        image_shrink_retry_attempted = False
-        multimodal_tool_content_retry_attempted = False
        oauth_1m_beta_retry_attempted = False
        llama_cpp_grammar_retry_attempted = False
        has_retried_429 = False
@@ -2061,31 +1994,6 @@ def run_conversation(
                            "or shrink didn't reduce size; surfacing original error."
                        )

-                # Multimodal-tool-content recovery: providers that follow
-                # the OpenAI spec strictly (tool message content must be a
-                # string) reject our list-type content with a 400.  Strip
-                # image parts from any list-type tool messages, mark the
-                # (provider, model) as no-list-tool-content for the rest
-                # of this session so future tool results preemptively
-                # downgrade, and retry once.  See issue #27344.
-                if (
-                    classified.reason == FailoverReason.multimodal_tool_content_unsupported
-                    and not multimodal_tool_content_retry_attempted
-                ):
-                    multimodal_tool_content_retry_attempted = True
-                    if agent._try_strip_image_parts_from_tool_messages(api_messages):
-                        agent._vprint(
-                            f"{agent.log_prefix}📐 Provider rejected list-type tool content — "
-                            f"downgraded screenshots to text and retrying...",
-                            force=True,
-                        )
-                        continue
-                    else:
-                        logger.info(
-                            "multimodal-tool-content recovery: no list-type tool "
-                            "messages with image parts found; surfacing original error."
-                        )
-
                # Anthropic OAuth subscription rejected the 1M-context beta
                # header ("long context beta is not yet available for this
                # subscription"). Disable the beta for the rest of this
@@ -3940,11 +3848,7 @@ def run_conversation(
                )

    # Determine if conversation completed successfully
-    completed = (
-        final_response is not None
-        and api_call_count < agent.max_iterations
-        and not failed
-    )
+    completed = final_response is not None and api_call_count < agent.max_iterations

    # Save trajectory if enabled.  ``user_message`` may be a multimodal
    # list of parts; the trajectory format wants a plain string.
@@ -4094,7 +3998,6 @@ def run_conversation(
        "api_calls": api_call_count,
        "completed": completed,
        "turn_exit_reason": _turn_exit_reason,
-        "failed": failed,
        "partial": False,  # True only when stopped due to invalid tool calls
        "interrupted": interrupted,
        "response_previewed": getattr(agent, "_response_was_previewed", False),
@@ -50,7 +50,6 @@ from pathlib import Path
 from typing import Any, Dict, List, Optional, Tuple

 from hermes_constants import get_hermes_home
-from agent.skill_utils import is_excluded_skill_path

 logger = logging.getLogger(__name__)

@@ -177,9 +176,7 @@ def get_keep() -> int:

 def _count_skill_files(base: Path) -> int:
    try:
-        return sum(
-            1 for p in base.rglob("SKILL.md") if not is_excluded_skill_path(p)
-        )
+        return sum(1 for _ in base.rglob("SKILL.md"))
    except OSError:
        return 0

@@ -50,7 +50,6 @@ class FailoverReason(enum.Enum):

    # Request format
    format_error = "format_error"        # 400 bad request — abort or strip + retry
-    multimodal_tool_content_unsupported = "multimodal_tool_content_unsupported"  # Provider rejected list-type content in tool messages (e.g. Xiaomi MiMo) — downgrade to text and retry

    # Provider-specific
    thinking_signature = "thinking_signature"  # Anthropic thinking block sig invalid
@@ -166,32 +165,6 @@ _IMAGE_TOO_LARGE_PATTERNS = [
    # the likely culprit; we still try the shrink path before giving up.
 ]

-# Providers that follow the OpenAI spec strictly require tool message
-# ``content`` to be a string.  Some (Anthropic native, Codex Responses,
-# Gemini native, first-party OpenAI) extend this to accept a content-parts
-# list (text + image_url) so screenshots from computer_use survive.  Others
-# (Xiaomi MiMo, some Alibaba endpoints, a long tail of OpenAI-compatible
-# providers) reject the list with a 400 — the patterns below are the most
-# common error shapes we see.  Recovery: strip image parts from tool
-# messages in-place, record the (provider, model) for the rest of the
-# session so we don't waste another call learning the same lesson, retry.
-#
-# See: https://github.com/NousResearch/hermes-agent/issues/27344
-_MULTIMODAL_TOOL_CONTENT_PATTERNS = [
-    # Xiaomi MiMo: {"error":{"code":"400","message":"Param Incorrect","param":"text is not set"}}
-    "text is not set",
-    # Generic "tool message must be string" shapes
-    "tool message content must be a string",
-    "tool content must be a string",
-    "tool message must be a string",
-    # OpenAI-compat servers that reject list-type tool content with a
-    # schema-validation message
-    "expected string, got list",
-    "expected string, got array",
-    # Alibaba/DashScope variant
-    "tool_call.content must be string",
-]
-
 # Context overflow patterns
 _CONTEXT_OVERFLOW_PATTERNS = [
    "context length",
@@ -808,19 +781,6 @@ def _classify_400(
 ) -> ClassifiedError:
    """Classify 400 Bad Request — context overflow, format error, or generic."""

-    # Multimodal tool content rejected from 400.  Must be checked BEFORE
-    # image_too_large because the recovery is different (strip image parts
-    # from tool messages, mark the model as no-list-tool-content for the
-    # rest of the session) and BEFORE context_overflow because some of the
-    # patterns ("text is not set") are ambiguous in isolation but become
-    # specific when combined with a 400 on a request known to contain
-    # multimodal tool content.
-    if any(p in error_msg for p in _MULTIMODAL_TOOL_CONTENT_PATTERNS):
-        return result_fn(
-            FailoverReason.multimodal_tool_content_unsupported,
-            retryable=True,
-        )
-
    # Image-too-large from 400 (Anthropic's 5 MB per-image check fires this way).
    # Must be checked BEFORE context_overflow because messages can trip both
    # patterns ("exceeds" + "image") and image-shrink is a cheaper recovery.
@@ -962,13 +922,6 @@ def _classify_by_message(
            should_compress=True,
        )

-    # Multimodal tool content patterns (from message text when no status_code)
-    if any(p in error_msg for p in _MULTIMODAL_TOOL_CONTENT_PATTERNS):
-        return result_fn(
-            FailoverReason.multimodal_tool_content_unsupported,
-            retryable=True,
-        )
-
    # Image-too-large patterns (from message text when no status_code)
    if any(p in error_msg for p in _IMAGE_TOO_LARGE_PATTERNS):
        return result_fn(
@@ -1258,10 +1258,6 @@ def build_nous_subscription_prompt(valid_tool_names: "set[str] | None" = None) -
        "terminal",
        "process",
        "execute_code",
-        "app_search_tools",
-        "app_tool_schemas",
-        "app_execute_tools",
-        "app_manage_connections",
    }

    if valid_names and not (valid_names & relevant_tool_names):
@@ -1283,7 +1279,7 @@ def build_nous_subscription_prompt(valid_tool_names: "set[str] | None" = None) -

    lines = [
        "# Nous Subscription",
-        "Nous subscription includes managed web tools (Firecrawl), image generation (FAL), OpenAI TTS, browser automation (Browser Use), and app integrations (500+ apps) by default. Modal execution is optional.",
+        "Nous subscription includes managed web tools (Firecrawl), image generation (FAL), OpenAI TTS, and browser automation (Browser Use) by default. Modal execution is optional.",
        "Current capability status:",
    ]
    lines.extend(_status_line(feature) for feature in features.items())
@@ -1298,39 +1294,6 @@ def build_nous_subscription_prompt(valid_tool_names: "set[str] | None" = None) -
    return "\n".join(lines)


-# =========================================================================
-# App tools (500+ external integrations) behavioural prompt
-# =========================================================================
-
-_APP_TOOLS_PROMPT = """\
-## App Tools (500+ External Integrations)
-
-You have app_search_tools, app_tool_schemas, app_execute_tools, and app_manage_connections available RIGHT NOW as callable tools. They are already configured and connected to the Nous tool gateway — no SDK installation, no API keys, no plugin setup needed. Just call them.
-
-**When to use:** When a user asks to interact with ANY external app or service — Gmail, Slack, GitHub, Jira, Notion, Google Sheets, Linear, HubSpot, Figma, Salesforce, or any of 500+ other apps. ALWAYS prefer these tools over loading skills about the same service (e.g. do NOT load the 'linear', 'airtable', 'google-workspace', 'notion', or any similar skill — use app_search_tools instead). Do NOT suggest installing SDKs, CLI tools, MCP servers, or API keys for external services — call app_search_tools directly.
-
-**Workflow:**
-1. Call app_search_tools with a clear use_case description to discover available tools
-2. Check the response for connection status — if no active connection, call app_manage_connections and share the auth link with the user
-3. Review the execution plan and pitfalls in the search response before executing
-4. If a tool has schemaRef instead of input_schema, call app_tool_schemas to get the full schema
-5. Execute tools via app_execute_tools with schema-compliant arguments
-
-**Session tracking:** Pass session: {generate_id: true} on your first app_search_tools call. Reuse the returned session.id in all subsequent calls. Generate a new session when the user pivots to a different task.
-
-**Important:** Never fabricate tool slugs or argument field names. Only use slugs and schemas returned by app_search_tools or app_tool_schemas."""
-
-
-def build_app_tools_prompt(valid_tool_names: "set[str] | None" = None) -> str:
-    """Return the app tools behavioural guidance when the toolset is active."""
-    if valid_tool_names and "app_search_tools" not in valid_tool_names:
-        return ""
-    if not valid_tool_names:
-        # No tool names known — skip (conservative)
-        return ""
-    return _APP_TOOLS_PROMPT
-
-
 # =========================================================================
 # Context files (SOUL.md, AGENTS.md, .cursorrules)
 # =========================================================================
@@ -1,13 +0,0 @@
-"""External secret source integrations.
-
-A secret source is anything that can supply environment-variable-shaped
-credentials at process startup, _after_ ~/.hermes/.env has loaded.  By
-default sources are non-destructive: they only set values for env vars
-that aren't already present, so .env and shell exports continue to win.
-
-Currently shipped:
-
-  - ``bitwarden`` — Bitwarden Secrets Manager (`bws` CLI).  See
-    ``agent.secret_sources.bitwarden`` for the integration and
-    ``hermes_cli.secrets_cli`` for the user-facing setup wizard.
-"""
@@ -1,515 +0,0 @@
-"""Bitwarden Secrets Manager (`bws` CLI) integration.
-
-Hermes pulls API keys from Bitwarden Secrets Manager at process startup
-so they don't have to live in plaintext in ``~/.hermes/.env``.
-
-Design summary
--------------
-
-* The ``bws`` binary is auto-installed into ``<hermes_home>/bin/bws`` on
-  first use.  Hermes pins one version (``_BWS_VERSION``) and downloads
-  the matching asset from the official GitHub Releases page, verifying
-  the SHA-256 against the release's published checksum file.
-* The access token is stored in ``~/.hermes/.env`` as
-  ``BWS_ACCESS_TOKEN`` (or whatever name the user picked in
-  ``secrets.bitwarden.access_token_env``).  This is the one
-  bootstrap secret — every other provider key can live in Bitwarden.
-* Pulling secrets is a single ``bws secret list <project_id>
-  --output json`` call.  We cache the result in-process for
-  ``cache_ttl_seconds`` so back-to-back ``hermes`` invocations don't
-  hammer the API.
-* Failures NEVER block Hermes startup.  Missing binary, no network,
-  expired token, etc. all emit a one-line warning and continue with
-  whatever credentials ``.env`` already had.
-
-The module is intentionally subprocess-driven rather than going through
-the ``bitwarden-sdk-secrets`` Python package: one cross-platform binary
-is easier to lazy-install than a wheels-with-Rust-extension dependency.
-"""
-
-from __future__ import annotations
-
-import hashlib
-import json
-import logging
-import os
-import platform
-import shutil
-import stat
-import subprocess
-import sys
-import tempfile
-import time
-import urllib.error
-import urllib.request
-import zipfile
-from dataclasses import dataclass, field
-from pathlib import Path
-from typing import Dict, List, Optional, Tuple
-
-logger = logging.getLogger(__name__)
-
-
-# ---------------------------------------------------------------------------
-# Configuration constants
-# ---------------------------------------------------------------------------
-
-# Pinned upstream version.  Bump in a follow-up PR — never auto-resolve
-# "latest" because upstream release shape (asset names, CLI flags) is
-# allowed to change between majors and we want updates to be deliberate.
-_BWS_VERSION = "2.0.0"
-
-_BWS_RELEASE_BASE = (
-    f"https://github.com/bitwarden/sdk-sm/releases/download/bws-v{_BWS_VERSION}"
-)
-_BWS_CHECKSUM_NAME = f"bws-sha256-checksums-{_BWS_VERSION}.txt"
-
-# How long to wait for bws subprocesses and HTTP downloads, in seconds.
-_BWS_DOWNLOAD_TIMEOUT = 60
-_BWS_RUN_TIMEOUT = 30
-
-# In-process cache so repeated load_hermes_dotenv() calls (CLI startup,
-# gateway hot-reload, test suites) don't re-fetch from BSM.
-_CacheKey = Tuple[str, str]  # (access_token_fingerprint, project_id)
-_CACHE: Dict[_CacheKey, "_CachedFetch"] = {}
-
-
-@dataclass
-class _CachedFetch:
-    secrets: Dict[str, str]
-    fetched_at: float
-
-    def is_fresh(self, ttl_seconds: float) -> bool:
-        if ttl_seconds <= 0:
-            return False
-        return (time.time() - self.fetched_at) < ttl_seconds
-
-
-# ---------------------------------------------------------------------------
-# Public dataclasses
-# ---------------------------------------------------------------------------
-
-
-@dataclass
-class FetchResult:
-    """Outcome of a single BSM pull."""
-
-    secrets: Dict[str, str] = field(default_factory=dict)
-    applied: List[str] = field(default_factory=list)   # set into os.environ
-    skipped: List[str] = field(default_factory=list)   # already set, not overridden
-    warnings: List[str] = field(default_factory=list)  # non-fatal issues
-    error: Optional[str] = None                        # fatal: nothing was fetched
-    binary_path: Optional[Path] = None
-
-    @property
-    def ok(self) -> bool:
-        return self.error is None
-
-
-# ---------------------------------------------------------------------------
-# Binary discovery + lazy install
-# ---------------------------------------------------------------------------
-
-
-def _hermes_bin_dir() -> Path:
-    """Where Hermes stores its managed binaries.  Profile-aware."""
-    from hermes_constants import get_hermes_home
-
-    return get_hermes_home() / "bin"
-
-
-def find_bws(*, install_if_missing: bool = False) -> Optional[Path]:
-    """Return a path to a usable ``bws`` binary, or None.
-
-    Resolution order:
-      1. ``<hermes_home>/bin/bws``  (our managed copy — preferred)
-      2. ``shutil.which("bws")``    (system PATH)
-
-    When ``install_if_missing`` is True and neither resolves, this calls
-    :func:`install_bws` to download and verify the pinned version.
-    """
-    managed = _hermes_bin_dir() / _platform_binary_name()
-    if managed.exists() and os.access(managed, os.X_OK):
-        return managed
-
-    system = shutil.which("bws")
-    if system:
-        return Path(system)
-
-    if install_if_missing:
-        try:
-            return install_bws()
-        except Exception as exc:  # noqa: BLE001 — never block startup
-            logger.warning("bws auto-install failed: %s", exc)
-            return None
-    return None
-
-
-def _platform_binary_name() -> str:
-    return "bws.exe" if platform.system() == "Windows" else "bws"
-
-
-def _platform_asset_name() -> str:
-    """Map (uname, arch, libc) → the upstream asset filename.
-
-    Asset names follow Rust's target triple convention.  Linux defaults
-    to gnu (glibc); we switch to musl only if ldd --version says so.
-    """
-    system = platform.system()
-    machine = platform.machine().lower()
-
-    if system == "Darwin":
-        # Universal binary works on both Intel and Apple Silicon — no
-        # need to pick a per-arch asset.
-        return f"bws-macos-universal-{_BWS_VERSION}.zip"
-
-    if system == "Windows":
-        arch = "aarch64" if machine in ("arm64", "aarch64") else "x86_64"
-        return f"bws-{arch}-pc-windows-msvc-{_BWS_VERSION}.zip"
-
-    if system == "Linux":
-        arch = "aarch64" if machine in ("arm64", "aarch64") else "x86_64"
-        libc = "gnu"
-        # ldd --version writes to stderr on glibc, stdout on musl.  We
-        # don't need bullet-proof detection — getting it wrong falls
-        # back to a clear error from the binary loader, which we catch.
-        try:
-            res = subprocess.run(
-                ["ldd", "--version"],
-                capture_output=True,
-                text=True,
-                timeout=2,
-            )
-            if "musl" in (res.stdout + res.stderr).lower():
-                libc = "musl"
-        except (OSError, subprocess.TimeoutExpired):
-            pass
-        return f"bws-{arch}-unknown-linux-{libc}-{_BWS_VERSION}.zip"
-
-    raise RuntimeError(
-        f"Unsupported platform for bws auto-install: {system} {machine}"
-    )
-
-
-def install_bws(*, force: bool = False) -> Path:
-    """Download, verify, and install the pinned ``bws`` binary.
-
-    Returns the path to the installed executable.  Raises on any
-    failure (network, checksum, extraction) — callers in the auto-install
-    path catch these; the user-facing ``hermes secrets bitwarden setup``
-    surface lets them propagate so the wizard can show a clear error.
-    """
-    bin_dir = _hermes_bin_dir()
-    bin_dir.mkdir(parents=True, exist_ok=True)
-    target = bin_dir / _platform_binary_name()
-
-    if target.exists() and not force:
-        return target
-
-    asset_name = _platform_asset_name()
-    asset_url = f"{_BWS_RELEASE_BASE}/{asset_name}"
-    checksum_url = f"{_BWS_RELEASE_BASE}/{_BWS_CHECKSUM_NAME}"
-
-    with tempfile.TemporaryDirectory(prefix="hermes-bws-") as tmpdir:
-        tmp = Path(tmpdir)
-        zip_path = tmp / asset_name
-        checksum_path = tmp / _BWS_CHECKSUM_NAME
-
-        logger.info("Downloading %s", asset_url)
-        _http_download(asset_url, zip_path)
-        _http_download(checksum_url, checksum_path)
-
-        expected = _expected_sha256(checksum_path, asset_name)
-        actual = _sha256_file(zip_path)
-        if expected.lower() != actual.lower():
-            raise RuntimeError(
-                f"Checksum mismatch for {asset_name}: "
-                f"expected {expected}, got {actual}"
-            )
-
-        with zipfile.ZipFile(zip_path) as zf:
-            member = _pick_zip_member(zf, _platform_binary_name())
-            zf.extract(member, tmp)
-            extracted = tmp / member
-
-        # Move into place atomically.  We write to a sibling tempfile in
-        # the final directory so the rename can't cross filesystems.
-        fd, staged = tempfile.mkstemp(dir=str(bin_dir), prefix=".bws_")
-        os.close(fd)
-        shutil.copy2(extracted, staged)
-        os.chmod(
-            staged,
-            stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR
-            | stat.S_IRGRP | stat.S_IXGRP
-            | stat.S_IROTH | stat.S_IXOTH,
-        )
-        os.replace(staged, target)
-
-    logger.info("Installed bws %s at %s", _BWS_VERSION, target)
-    return target
-
-
-def _http_download(url: str, dest: Path) -> None:
-    req = urllib.request.Request(url, headers={"User-Agent": "hermes-agent"})
-    try:
-        with urllib.request.urlopen(req, timeout=_BWS_DOWNLOAD_TIMEOUT) as resp:  # noqa: S310
-            with open(dest, "wb") as f:
-                shutil.copyfileobj(resp, f)
-    except urllib.error.URLError as exc:
-        raise RuntimeError(f"Failed to download {url}: {exc}") from exc
-
-
-def _expected_sha256(checksum_file: Path, asset_name: str) -> str:
-    """Parse the upstream ``bws-sha256-checksums-X.Y.Z.txt`` file.
-
-    Format is the standard ``sha256sum`` output: ``<hex>  <filename>``,
-    one per line.
-    """
-    text = checksum_file.read_text(encoding="utf-8", errors="replace")
-    for line in text.splitlines():
-        parts = line.strip().split()
-        if len(parts) >= 2 and parts[-1] == asset_name:
-            return parts[0]
-    raise RuntimeError(
-        f"No checksum entry for {asset_name} in {checksum_file.name}"
-    )
-
-
-def _sha256_file(path: Path) -> str:
-    h = hashlib.sha256()
-    with open(path, "rb") as f:
-        for chunk in iter(lambda: f.read(65536), b""):
-            h.update(chunk)
-    return h.hexdigest()
-
-
-def _pick_zip_member(zf: zipfile.ZipFile, binary_name: str) -> str:
-    """Find the binary inside the upstream zip.
-
-    Historically the archive has been flat (``bws`` at the root) but we
-    tolerate a top-level directory just in case upstream changes.
-    """
-    candidates = [n for n in zf.namelist() if n.split("/")[-1] == binary_name]
-    if not candidates:
-        raise RuntimeError(
-            f"Could not find {binary_name} inside downloaded archive "
-            f"(members: {zf.namelist()[:5]}...)"
-        )
-    # Prefer the shortest path (i.e. root over nested) for determinism.
-    candidates.sort(key=len)
-    return candidates[0]
-
-
-# ---------------------------------------------------------------------------
-# Secret fetch + apply
-# ---------------------------------------------------------------------------
-
-
-def _token_fingerprint(token: str) -> str:
-    """SHA-256 prefix used as a cache key — never logged, never displayed."""
-    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:16]
-
-
-def fetch_bitwarden_secrets(
-    *,
-    access_token: str,
-    project_id: str,
-    binary: Optional[Path] = None,
-    cache_ttl_seconds: float = 300,
-    use_cache: bool = True,
-) -> Tuple[Dict[str, str], List[str]]:
-    """Pull the secrets for ``project_id`` from Bitwarden Secrets Manager.
-
-    Returns ``(secrets_dict, warnings_list)``.
-
-    Raises :class:`RuntimeError` for fatal conditions (missing binary,
-    auth failure, unparseable output).  Callers in the env_loader path
-    catch this and emit a single warning; callers in the user-facing
-    setup wizard let it propagate.
-    """
-    if not access_token:
-        raise RuntimeError("Bitwarden access token is empty")
-    if not project_id:
-        raise RuntimeError("Bitwarden project_id is empty")
-
-    cache_key = (_token_fingerprint(access_token), project_id)
-    if use_cache:
-        cached = _CACHE.get(cache_key)
-        if cached and cached.is_fresh(cache_ttl_seconds):
-            return cached.secrets, []
-
-    bws = binary or find_bws(install_if_missing=True)
-    if bws is None:
-        raise RuntimeError(
-            "bws binary not available — auto-install failed and `bws` is "
-            "not on PATH.  Install manually from "
-            "https://github.com/bitwarden/sdk-sm/releases or re-run "
-            "`hermes secrets bitwarden setup`."
-        )
-
-    secrets, warnings = _run_bws_list(bws, access_token, project_id)
-    _CACHE[cache_key] = _CachedFetch(secrets=secrets, fetched_at=time.time())
-    return secrets, warnings
-
-
-def _run_bws_list(
-    bws: Path, access_token: str, project_id: str
-) -> Tuple[Dict[str, str], List[str]]:
-    cmd = [str(bws), "secret", "list", project_id, "--output", "json"]
-    env = os.environ.copy()
-    env["BWS_ACCESS_TOKEN"] = access_token
-    # Make sure we're not echoing telemetry / colour codes into json.
-    env.setdefault("NO_COLOR", "1")
-
-    try:
-        proc = subprocess.run(  # noqa: S603 — bws path is trusted
-            cmd,
-            env=env,
-            capture_output=True,
-            text=True,
-            timeout=_BWS_RUN_TIMEOUT,
-        )
-    except subprocess.TimeoutExpired as exc:
-        raise RuntimeError(
-            f"bws timed out after {_BWS_RUN_TIMEOUT}s fetching secrets"
-        ) from exc
-    except OSError as exc:
-        raise RuntimeError(f"failed to invoke bws: {exc}") from exc
-
-    if proc.returncode != 0:
-        # bws writes auth/network errors to stderr in plain English.
-        # Strip ANSI just in case and surface the first 200 chars.
-        err = (proc.stderr or proc.stdout or "").strip().replace("\x1b", "")
-        raise RuntimeError(
-            f"bws exited {proc.returncode}: {err[:200]}"
-        )
-
-    raw = proc.stdout.strip()
-    if not raw:
-        return {}, ["bws returned no output (empty project?)"]
-
-    try:
-        payload = json.loads(raw)
-    except json.JSONDecodeError as exc:
-        raise RuntimeError(f"bws returned non-JSON output: {exc}") from exc
-
-    if not isinstance(payload, list):
-        raise RuntimeError(
-            f"bws returned unexpected shape: {type(payload).__name__}"
-        )
-
-    secrets: Dict[str, str] = {}
-    warnings: List[str] = []
-    for item in payload:
-        if not isinstance(item, dict):
-            continue
-        key = item.get("key")
-        value = item.get("value")
-        if not isinstance(key, str) or not isinstance(value, str):
-            continue
-        if not _is_valid_env_name(key):
-            warnings.append(
-                f"Skipping secret {key!r}: not a valid env-var name"
-            )
-            continue
-        secrets[key] = value
-    return secrets, warnings
-
-
-def _is_valid_env_name(name: str) -> bool:
-    if not name:
-        return False
-    if not (name[0].isalpha() or name[0] == "_"):
-        return False
-    return all(c.isalnum() or c == "_" for c in name)
-
-
-# ---------------------------------------------------------------------------
-# Public entry point — called from hermes_cli.env_loader
-# ---------------------------------------------------------------------------
-
-
-def apply_bitwarden_secrets(
-    *,
-    enabled: bool,
-    access_token_env: str = "BWS_ACCESS_TOKEN",
-    project_id: str = "",
-    override_existing: bool = False,
-    cache_ttl_seconds: float = 300,
-    auto_install: bool = True,
-) -> FetchResult:
-    """Pull secrets from BSM and set them on ``os.environ``.
-
-    This is the function ``load_hermes_dotenv()`` calls after the .env
-    files have loaded.  It is intentionally defensive — any failure
-    returns a :class:`FetchResult` with ``error`` set; it never raises.
-
-    Parameters mirror the ``secrets.bitwarden.*`` config keys so the
-    caller can just splat the dict in.
-    """
-    result = FetchResult()
-
-    if not enabled:
-        return result
-
-    access_token = os.environ.get(access_token_env, "").strip()
-    if not access_token:
-        result.error = (
-            f"secrets.bitwarden.enabled is true but {access_token_env} is "
-            "not set.  Run `hermes secrets bitwarden setup`."
-        )
-        return result
-
-    if not project_id:
-        result.error = (
-            "secrets.bitwarden.project_id is empty.  "
-            "Run `hermes secrets bitwarden setup`."
-        )
-        return result
-
-    binary = find_bws(install_if_missing=auto_install)
-    result.binary_path = binary
-    if binary is None:
-        result.error = (
-            "bws binary not available and auto-install is disabled.  "
-            "Run `hermes secrets bitwarden setup` to install."
-        )
-        return result
-
-    try:
-        secrets, warnings = fetch_bitwarden_secrets(
-            access_token=access_token,
-            project_id=project_id,
-            binary=binary,
-            cache_ttl_seconds=cache_ttl_seconds,
-        )
-    except RuntimeError as exc:
-        result.error = str(exc)
-        return result
-
-    result.secrets = secrets
-    result.warnings.extend(warnings)
-
-    for key, value in secrets.items():
-        if key == access_token_env:
-            # Don't let BSM clobber the very token we used to fetch
-            # itself — that would be a footgun if someone stored the
-            # token as a BSM secret too.
-            result.skipped.append(key)
-            continue
-        if not override_existing and os.environ.get(key):
-            result.skipped.append(key)
-            continue
-        os.environ[key] = value
-        result.applied.append(key)
-
-    return result
-
-
-# ---------------------------------------------------------------------------
-# Test hook — used by hermetic tests to flush the cache between cases.
-# ---------------------------------------------------------------------------
-
-
-def _reset_cache_for_tests() -> None:
-    _CACHE.clear()
@@ -12,7 +12,7 @@ import sys
 from pathlib import Path
 from typing import Any, Dict, List, Optional, Set, Tuple

-from hermes_constants import get_config_path, get_skills_dir, is_termux
+from hermes_constants import get_config_path, get_skills_dir

 logger = logging.getLogger(__name__)

@@ -24,43 +24,7 @@ PLATFORM_MAP = {
    "windows": "win32",
 }

-EXCLUDED_SKILL_DIRS = frozenset(
-    (
-        ".git",
-        ".github",
-        ".hub",
-        ".archive",
-        ".venv",
-        "venv",
-        "node_modules",
-        "site-packages",
-        "__pycache__",
-        ".tox",
-        ".nox",
-        ".pytest_cache",
-        ".mypy_cache",
-        ".ruff_cache",
-    )
-)
-
-
-def is_excluded_skill_path(path) -> bool:
-    """True if any component of *path* is in EXCLUDED_SKILL_DIRS.
-
-    Use this on every SKILL.md path produced by ``rglob`` to prune
-    dependency, virtualenv, VCS, and cache directories. Centralising the
-    check here keeps every skill-scanning site in sync with the shared
-    exclusion set.
-
-    Accepts a Path or string.
-    """
-    try:
-        parts = path.parts  # Path
-    except AttributeError:
-        from pathlib import PurePath
-        parts = PurePath(str(path)).parts
-    return any(part in EXCLUDED_SKILL_DIRS for part in parts)
-
+EXCLUDED_SKILL_DIRS = frozenset((".git", ".github", ".hub", ".archive"))

 # ── Lazy YAML loader ─────────────────────────────────────────────────────

@@ -136,14 +100,6 @@ def skill_matches_platform(frontmatter: Dict[str, Any]) -> bool:

    If the field is absent or empty the skill is compatible with **all**
    platforms (backward-compatible default).
-
-    Termux note: on Termux/Android, ``sys.platform`` is ``"linux"`` on
-    older Pythons but became ``"android"`` on Python 3.13+. Termux is a
-    Linux userland riding on the Android kernel, so skills tagged
-    ``linux`` are treated as compatible in Termux regardless of which
-    ``sys.platform`` value Python reports. Individual Linux commands
-    inside a skill may still misbehave (no systemd, BusyBox utils, no
-    apt/dnf, etc.) but that is on the skill, not on platform gating.
    """
    platforms = frontmatter.get("platforms")
    if not platforms:
@@ -151,21 +107,11 @@ def skill_matches_platform(frontmatter: Dict[str, Any]) -> bool:
    if not isinstance(platforms, list):
        platforms = [platforms]
    current = sys.platform
-    running_in_termux = is_termux()
    for platform in platforms:
        normalized = str(platform).lower().strip()
        mapped = PLATFORM_MAP.get(normalized, normalized)
        if current.startswith(mapped):
            return True
-        # Termux runs a Linux userland on Android. Accept linux-tagged
-        # skills regardless of whether sys.platform is "linux" (pre-3.13
-        # Termux) or "android" (Python 3.13+ Termux, and any other
-        # Android runtime).
-        if running_in_termux and mapped == "linux":
-            return True
-        # Explicit termux/android tags match a Termux session too.
-        if running_in_termux and mapped in ("termux", "android"):
-            return True
    return False


@@ -532,8 +478,7 @@ def extract_skill_description(frontmatter: Dict[str, Any]) -> str:
 def iter_skill_index_files(skills_dir: Path, filename: str):
    """Walk skills_dir yielding sorted paths matching *filename*.

-    Excludes Hermes metadata, VCS, virtualenv/dependency, and cache
-    directories so dependencies cannot register nested skills.
+    Excludes ``.git``, ``.github``, ``.hub``, ``.archive`` directories.
    """
    matches = []
    for root, dirs, files in os.walk(skills_dir, followlinks=True):
@@ -130,12 +130,6 @@ def build_system_prompt_parts(agent: Any, system_message: Optional[str] = None)
    nous_subscription_prompt = _r.build_nous_subscription_prompt(agent.valid_tool_names)
    if nous_subscription_prompt:
        stable_parts.append(nous_subscription_prompt)
-
-    # App tools (500+ external integrations) behavioural guidance
-    app_tools_prompt = _r.build_app_tools_prompt(agent.valid_tool_names)
-    if app_tools_prompt:
-        stable_parts.append(app_tools_prompt)
-
    # Tool-use enforcement: tells the model to actually call tools instead
    # of describing intended actions.  Controlled by config.yaml
    # agent.tool_use_enforcement:
@@ -10221,7 +10221,6 @@ class HermesCLI:
            self._voice_processing = True

        submitted = False
-        transcription_failed = False
        wav_path = None
        try:
            if self._voice_recorder is None:
@@ -10270,24 +10269,18 @@ class HermesCLI:
            else:
                error = result.get("error", "Unknown error")
                _cprint(f"\n{_DIM}Transcription failed: {error}{_RST}")
-                transcription_failed = True

        except Exception as e:
            _cprint(f"\n{_DIM}Voice processing error: {e}{_RST}")
-            transcription_failed = wav_path is not None
        finally:
            with self._voice_lock:
                self._voice_processing = False
            if hasattr(self, '_app') and self._app:
                self._app.invalidate()
-            # Clean up temp file unless transcription failed. On failure, keep
-            # the source recording so long dictation is not lost.
+            # Clean up temp file
            try:
                if wav_path and os.path.isfile(wav_path):
-                    if transcription_failed:
-                        _cprint(f"{_DIM}Recording preserved at: {wav_path}{_RST}")
-                    else:
-                        os.unlink(wav_path)
+                    os.unlink(wav_path)
            except Exception:
                pass

@@ -18,7 +18,6 @@ Security features (based on OWASP + NIST SP 800-63-4 guidance):
 Storage: ~/.hermes/pairing/
 """

-import hashlib
 import json
 import os
 import secrets
@@ -149,11 +148,6 @@ class PairingStore:

    # ----- Pending codes -----

-    @staticmethod
-    def _hash_code(code: str, salt: bytes) -> str:
-        """Hash a pairing code with the given salt using SHA-256."""
-        return hashlib.sha256(salt + code.encode("utf-8")).hexdigest()
-
    def generate_code(
        self, platform: str, user_id: str, user_name: str = ""
    ) -> Optional[str]:
@@ -164,9 +158,6 @@ class PairingStore:
          - User is rate-limited (too recent request)
          - Max pending codes reached for this platform
          - User/platform is in lockout due to failed attempts
-
-        The code is NOT stored in plaintext.  Only a salted SHA-256 hash is
-        persisted so that reading the pending file does not reveal codes.
        """
        with self._lock:
            self._cleanup_expired(platform)
@@ -187,17 +178,8 @@ class PairingStore:
            # Generate cryptographically random code
            code = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))

-            # Hash the code with a random salt before storing
-            salt = os.urandom(16)
-            code_hash = self._hash_code(code, salt)
-
-            # Use a unique entry id as the key (not the code itself)
-            entry_id = secrets.token_hex(8)
-
-            # Store pending request with hashed code
-            pending[entry_id] = {
-                "hash": code_hash,
-                "salt": salt.hex(),
+            # Store pending request
+            pending[code] = {
                "user_id": user_id,
                "user_name": user_name,
                "created_at": time.time(),
@@ -213,16 +195,10 @@ class PairingStore:
        """
        Approve a pairing code. Adds the user to the approved list.

-        Returns ``{user_id, user_name}`` on success, ``None`` if the code is
+        Returns {user_id, user_name} on success, None if code is
        invalid/expired OR the platform is currently locked out after
        ``MAX_FAILED_ATTEMPTS`` failed approvals (#10195). Callers can
        disambiguate with ``_is_locked_out(platform)``.
-
-        Verification: the user-provided code is hashed with each stored
-        entry's salt and compared to the stored hash using constant-time
-        comparison. Pre-hash entries (legacy plaintext-key format from
-        pre-upgrade pending.json files) are silently ignored — they get
-        pruned at TTL by ``_cleanup_expired``.
        """
        with self._lock:
            self._cleanup_expired(platform)
@@ -237,73 +213,34 @@ class PairingStore:
                return None

            pending = self._load_json(self._pending_path(platform))
-
-            # Find the entry whose hash matches the provided code.
-            # Tolerate legacy plaintext-key entries (no salt/hash) and
-            # malformed entries — skip them rather than KeyError, so an
-            # in-place upgrade across an existing pending.json doesn't
-            # crash on the first approve call. Legacy entries get pruned
-            # at their TTL by _cleanup_expired.
-            matched_key = None
-            matched_entry = None
-            for entry_id, entry in pending.items():
-                if not isinstance(entry, dict):
-                    continue
-                if "salt" not in entry or "hash" not in entry:
-                    continue
-                try:
-                    salt = bytes.fromhex(entry["salt"])
-                except ValueError:
-                    continue
-                candidate_hash = self._hash_code(code, salt)
-                if secrets.compare_digest(candidate_hash, entry["hash"]):
-                    matched_key = entry_id
-                    matched_entry = entry
-                    break
-
-            if matched_key is None:
+            if code not in pending:
                self._record_failed_attempt(platform)
                return None

-            del pending[matched_key]
+            entry = pending.pop(code)
            self._save_json(self._pending_path(platform), pending)

            # Add to approved list
-            self._approve_user(platform, matched_entry["user_id"],
-                               matched_entry.get("user_name", ""))
+            self._approve_user(platform, entry["user_id"], entry.get("user_name", ""))

            return {
-                "user_id": matched_entry["user_id"],
-                "user_name": matched_entry.get("user_name", ""),
+                "user_id": entry["user_id"],
+                "user_name": entry.get("user_name", ""),
            }

    def list_pending(self, platform: str = None) -> list:
-        """List pending pairing requests, optionally filtered by platform.
-
-        Codes are stored hashed — the ``code`` field is replaced with the
-        first 8 hex characters of the hash so admins can distinguish entries
-        without revealing the original code. Legacy plaintext-key entries
-        (pre-hash format) are shown with a "legacy" placeholder so admins
-        can see them age out without crashing on a missing ``hash`` field.
-        """
+        """List pending pairing requests, optionally filtered by platform."""
        results = []
        platforms = [platform] if platform else self._all_platforms("pending")
        for p in platforms:
            self._cleanup_expired(p)
            pending = self._load_json(self._pending_path(p))
-            for entry_id, info in pending.items():
-                if not isinstance(info, dict):
-                    continue
-                created_at = info.get("created_at")
-                if not isinstance(created_at, (int, float)):
-                    continue
-                age_min = int((time.time() - created_at) / 60)
-                hash_val = info.get("hash")
-                code_display = hash_val[:8] if isinstance(hash_val, str) else "legacy"
+            for code, info in pending.items():
+                age_min = int((time.time() - info["created_at"]) / 60)
                results.append({
                    "platform": p,
-                    "code": code_display,
-                    "user_id": info.get("user_id", ""),
+                    "code": code,
+                    "user_id": info["user_id"],
                    "user_name": info.get("user_name", ""),
                    "age_minutes": age_min,
                })
@@ -360,29 +297,17 @@ class PairingStore:
    # ----- Cleanup -----

    def _cleanup_expired(self, platform: str) -> None:
-        """Remove expired pending codes.
-
-        Tolerant of malformed / legacy entries — anything without a numeric
-        ``created_at`` is treated as expired (it's effectively unusable
-        with the new hash-keyed schema anyway).
-        """
+        """Remove expired pending codes."""
        path = self._pending_path(platform)
        pending = self._load_json(path)
        now = time.time()
-        expired = []
-        for entry_id, info in pending.items():
-            if not isinstance(info, dict):
-                expired.append(entry_id)
-                continue
-            created_at = info.get("created_at")
-            if not isinstance(created_at, (int, float)):
-                expired.append(entry_id)
-                continue
-            if (now - created_at) > CODE_TTL_SECONDS:
-                expired.append(entry_id)
+        expired = [
+            code for code, info in pending.items()
+            if (now - info["created_at"]) > CODE_TTL_SECONDS
+        ]
        if expired:
-            for entry_id in expired:
-                del pending[entry_id]
+            for code in expired:
+                del pending[code]
            self._save_json(path, pending)

    def _all_platforms(self, suffix: str) -> list:
@@ -308,26 +308,11 @@ class WebhookAdapter(BasePlatformAdapter):
            data = json.loads(subs_path.read_text(encoding="utf-8"))
            if not isinstance(data, dict):
                return
-            # Merge: static routes take precedence over dynamic ones.
-            # Reject any dynamic route whose effective secret is empty —
-            # an empty secret would cause _handle_webhook to skip HMAC
-            # validation entirely, letting unauthenticated callers in.
-            new_dynamic: Dict[str, dict] = {}
-            for k, v in data.items():
-                if k in self._static_routes:
-                    continue
-                effective_secret = v.get("secret", self._global_secret)
-                if not effective_secret:
-                    logger.warning(
-                        "[webhook] Dynamic route '%s' skipped: 'secret' is "
-                        "missing or empty. Set a valid HMAC secret, or use "
-                        "'%s' to explicitly disable auth (testing only).",
-                        k,
-                        _INSECURE_NO_AUTH,
-                    )
-                    continue
-                new_dynamic[k] = v
-            self._dynamic_routes = new_dynamic
+            # Merge: static routes take precedence over dynamic ones
+            self._dynamic_routes = {
+                k: v for k, v in data.items()
+                if k not in self._static_routes
+            }
            self._routes = {**self._dynamic_routes, **self._static_routes}
            self._dynamic_routes_mtime = mtime
            logger.info(
@@ -1109,7 +1109,7 @@ def _check_unavailable_skill(command_name: str) -> str | None:
    normalized = command_name.lower().replace("_", "-")
    try:
        from tools.skills_tool import _get_disabled_skill_names
-        from agent.skill_utils import get_all_skills_dirs, is_excluded_skill_path
+        from agent.skill_utils import get_all_skills_dirs
        disabled = _get_disabled_skill_names()

        # Check disabled skills across all dirs (local + external)
@@ -1117,7 +1117,7 @@ def _check_unavailable_skill(command_name: str) -> str | None:
            if not skills_dir.exists():
                continue
            for skill_md in skills_dir.rglob("SKILL.md"):
-                if is_excluded_skill_path(skill_md):
+                if any(part in {'.git', '.github', '.hub', '.archive'} for part in skill_md.parts):
                    continue
                slug, declared_name = _skill_slug_from_frontmatter(skill_md)
                if not slug or not declared_name:
@@ -1136,8 +1136,6 @@ def _check_unavailable_skill(command_name: str) -> str | None:
        optional_dir = get_optional_skills_dir(repo_root / "optional-skills")
        if optional_dir.exists():
            for skill_md in optional_dir.rglob("SKILL.md"):
-                if is_excluded_skill_path(skill_md):
-                    continue
                slug, _declared = _skill_slug_from_frontmatter(skill_md)
                if not slug:
                    continue
@@ -1747,48 +1747,8 @@ DEFAULT_CONFIG = {
        "retries": 2,
    },

-    # =========================================================================
-    # External secret sources
-    # =========================================================================
-    # Pull credentials from external secret managers at process startup
-    # rather than storing them in ~/.hermes/.env.
-    "secrets": {
-        "bitwarden": {
-            # Master switch.  When false, BSM is never contacted and the
-            # bws binary is never auto-installed — same as not having
-            # this section at all.
-            "enabled": False,
-            # Name of the env var that holds the Bitwarden machine-account
-            # access token.  This is the one bootstrap secret; it lives
-            # in ~/.hermes/.env (or your shell) and never in config.yaml.
-            "access_token_env": "BWS_ACCESS_TOKEN",
-            # UUID of the BSM project to sync from.
-            "project_id": "",
-            # Seconds to cache fetched secrets in-process.  0 disables.
-            "cache_ttl_seconds": 300,
-            # When True, BSM values overwrite existing env vars.  Default
-            # True because the point of using BSM is centralized rotation —
-            # if .env had the final say, rotating in Bitwarden wouldn't
-            # take effect until you also cleared the matching .env line.
-            "override_existing": True,
-            # When True, the bws binary is auto-downloaded into
-            # ~/.hermes/bin/ on first use.  When False you must install
-            # bws yourself and have it on PATH.
-            "auto_install": True,
-        },
-    },
-
-    # ── Nous Portal feature flags ──────────────────────────────────────
-    "portal": {
-        # App tools: 500+ external app integrations (Gmail, Slack, GitHub,
-        # Notion, etc.) via the Nous tool gateway.  Requires an active Nous
-        # subscription.  Set to False to hide the app_tools toolset even
-        # when a subscription is present.
-        "app_tools": True,
-    },
-
    # Config schema version - bump this when adding new required fields
-    "_config_version": 24,
+    "_config_version": 23,
 }

 # =============================================================================
@@ -2276,22 +2236,6 @@ OPTIONAL_ENV_VARS = {
        "category": "tool",
        "advanced": True,
    },
-    "TOOLS_GATEWAY_URL": {
-        "description": "Explicit URL for the tools-gateway (app integrations). Overrides the auto-derived tools-gateway.nousresearch.com",
-        "prompt": "Tools-gateway URL",
-        "url": None,
-        "password": False,
-        "category": "tool",
-        "advanced": True,
-    },
-    "PORTAL_APP_TOOLS": {
-        "description": "Enable app integration tools (500+ apps via Nous tool gateway). Requires Nous subscription.",
-        "prompt": "Enable app tools (500+ apps)",
-        "url": None,
-        "password": False,
-        "category": "tool",
-        "advanced": True,
-    },
    "TAVILY_API_KEY": {
        "description": "Tavily API key for AI-native web search, extract, and crawl",
        "prompt": "Tavily API key",
@@ -3073,7 +3017,7 @@ def _normalize_custom_provider_entry(
        "api_mode", "transport", "model", "default_model", "models",
        "context_length", "rate_limit_delay",
        "request_timeout_seconds", "stale_timeout_seconds",
-        "discover_models", "extra_body",
+        "discover_models",
    }
    for camel, snake in _CAMEL_ALIASES.items():
        if camel in entry and snake not in entry:
@@ -3168,10 +3112,6 @@ def _normalize_custom_provider_entry(
    if isinstance(discover_models, bool):
        normalized["discover_models"] = discover_models

-    extra_body = entry.get("extra_body")
-    if isinstance(extra_body, dict):
-        normalized["extra_body"] = dict(extra_body)
-
    return normalized


@@ -3326,13 +3266,13 @@ _KNOWN_ROOT_KEYS = {
    "fallback_providers", "credential_pool_strategies", "toolsets",
    "agent", "terminal", "display", "compression", "delegation",
    "auxiliary", "custom_providers", "context", "memory", "gateway",
-    "sessions", "portal",
+    "sessions",
 }

 # Valid fields inside a custom_providers list entry
 _VALID_CUSTOM_PROVIDER_FIELDS = {
    "name", "base_url", "api_key", "api_mode", "model", "models",
-    "context_length", "rate_limit_delay", "extra_body",
+    "context_length", "rate_limit_delay",
    # key_env is read at runtime by runtime_provider.py and auxiliary_client.py
    # — include it here so the set accurately describes the supported schema.
    "key_env",
@@ -3989,26 +3929,6 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
                        f"{', '.join(added_aux)}"
                    )

-    # ── Version 23 → 24: inject app_tools into saved platform_toolsets ──
-    # The portal.app_tools config flag is handled by deep-merge (DEFAULT_CONFIG
-    # has it, so load_config() always includes it). But platform_toolsets are
-    # user-owned lists that deep-merge can't append to — existing users who
-    # ran `hermes tools` have a saved list that won't include app_tools.
-    if current_ver < 24:
-        config = read_raw_config()
-        pt = config.get("platform_toolsets")
-        if isinstance(pt, dict):
-            patched = False
-            for plat_key, ts_list in pt.items():
-                if isinstance(ts_list, list) and "app_tools" not in ts_list:
-                    ts_list.append("app_tools")
-                    patched = True
-            if patched:
-                save_config(config)
-                results["config_added"].append("app_tools added to platform_toolsets")
-                if not quiet:
-                    print("  ✓ Added app_tools to saved platform toolset lists")
-
    if current_ver < latest_ver and not quiet:
        print(f"Config version: {current_ver} → {latest_ver}")
    
@@ -71,7 +71,7 @@ def curses_checklist(
                curses.use_default_colors()
                curses.init_pair(1, curses.COLOR_GREEN, -1)
                curses.init_pair(2, curses.COLOR_YELLOW, -1)
-                curses.init_pair(3, 8 if curses.COLORS > 8 else curses.COLOR_WHITE, -1)  # dim gray
+                curses.init_pair(3, 8, -1)  # dim gray
            cursor = 0
            scroll_offset = 0

@@ -16,7 +16,6 @@ from pathlib import Path
 from hermes_cli.config import get_hermes_home, get_env_path, get_project_root, load_config
 from hermes_cli.env_loader import load_hermes_dotenv
 from hermes_constants import display_hermes_home
-from agent.skill_utils import is_excluded_skill_path


 def _get_git_commit(project_root: Path) -> str:
@@ -70,8 +69,6 @@ def _count_skills(hermes_home: Path) -> int:
        return 0
    count = 0
    for item in skills_dir.rglob("SKILL.md"):
-        if is_excluded_skill_path(item):
-            continue
        count += 1
    return count

@@ -21,44 +21,6 @@ _CREDENTIAL_SUFFIXES = ("_API_KEY", "_TOKEN", "_SECRET", "_KEY")
 # tests) don't spam the same warning multiple times.
 _WARNED_KEYS: set[str] = set()

-# Map of env-var name → source label ("bitwarden", etc.) for credentials
-# that were injected by an external secret source during load_hermes_dotenv().
-# Used by setup / `hermes model` flows to label detected credentials so
-# users understand WHERE a key came from when their .env doesn't contain it
-# directly (otherwise the "credentials detected ✓" line looks identical to
-# the .env case and they don't know Bitwarden is wired up).
-_SECRET_SOURCES: dict[str, str] = {}
-
-
-def get_secret_source(env_var: str) -> str | None:
-    """Return the label of the secret source that supplied ``env_var``, if any.
-
-    Returns ``"bitwarden"`` for keys pulled from Bitwarden Secrets Manager
-    during the current process's ``load_hermes_dotenv()`` call.  Returns
-    ``None`` for keys that came from ``.env``, the shell environment, or
-    aren't tracked.
-    """
-    return _SECRET_SOURCES.get(env_var)
-
-
-def format_secret_source_suffix(env_var: str) -> str:
-    """Return a human-readable suffix like ``" (from Bitwarden)"`` or ``""``.
-
-    Use this when printing a detected credential so the user can see where
-    it came from.  Empty string when the credential came from ``.env`` or
-    the shell — those are the implicit / "default" cases users already
-    understand.
-    """
-    source = get_secret_source(env_var)
-    if not source:
-        return ""
-    if source == "bitwarden":
-        return " (from Bitwarden)"
-    # Generic fallback — future-proofing for additional secret sources
-    # (e.g. 1Password, HashiCorp Vault) without having to update every
-    # call site.
-    return f" (from {source})"
-

 def _format_offending_chars(value: str, limit: int = 3) -> str:
    """Return a compact 'U+XXXX ('c'), ...' summary of non-ASCII codepoints."""
@@ -210,87 +172,4 @@ def load_hermes_dotenv(
        _load_dotenv_with_fallback(project_env_path, override=not loaded)
        loaded.append(project_env_path)

-    _apply_external_secret_sources(home_path)
-
    return loaded
-
-
-def _apply_external_secret_sources(home_path: Path) -> None:
-    """Pull secrets from external sources (currently Bitwarden) into env.
-
-    Runs AFTER dotenv loads so .env values are visible (we use them to
-    locate the access token) but BEFORE the rest of Hermes reads
-    ``os.environ`` for credentials.  Any failure here is logged and
-    swallowed — external secret sources must never block startup.
-    """
-    try:
-        cfg = _load_secrets_config(home_path)
-    except Exception:  # noqa: BLE001 — config errors must not block startup
-        return
-
-    bw_cfg = (cfg or {}).get("bitwarden") or {}
-    if not bw_cfg.get("enabled"):
-        return
-
-    try:
-        from agent.secret_sources.bitwarden import apply_bitwarden_secrets
-    except ImportError:
-        return
-
-    result = apply_bitwarden_secrets(
-        enabled=True,
-        access_token_env=bw_cfg.get("access_token_env", "BWS_ACCESS_TOKEN"),
-        project_id=bw_cfg.get("project_id", ""),
-        override_existing=bool(bw_cfg.get("override_existing", False)),
-        cache_ttl_seconds=float(bw_cfg.get("cache_ttl_seconds", 300)),
-        auto_install=bool(bw_cfg.get("auto_install", True)),
-    )
-
-    if result.applied:
-        # Re-run the ASCII sanitization pass: BSM values are user-supplied
-        # and might have the same copy-paste corruption as a manually
-        # edited .env (see #6843).
-        _sanitize_loaded_credentials()
-        # Remember where these came from so the setup / `hermes model`
-        # flows can label detected credentials with "(from Bitwarden)" —
-        # otherwise users see "credentials ✓" with no hint that the value
-        # came from BSM rather than .env.
-        for name in result.applied:
-            _SECRET_SOURCES[name] = "bitwarden"
-        print(
-            f"  Bitwarden Secrets Manager: applied {len(result.applied)} "
-            f"secret{'s' if len(result.applied) != 1 else ''} "
-            f"({', '.join(sorted(result.applied))})",
-            file=sys.stderr,
-        )
-    if result.error:
-        print(
-            f"  Bitwarden Secrets Manager: {result.error}",
-            file=sys.stderr,
-        )
-    for warn in result.warnings:
-        print(
-            f"  Bitwarden Secrets Manager: {warn}",
-            file=sys.stderr,
-        )
-
-
-def _load_secrets_config(home_path: Path) -> dict:
-    """Read just the ``secrets:`` section out of config.yaml.
-
-    Imported lazily and isolated from the main config loader so a
-    malformed config can't take down dotenv loading entirely.
-    """
-    config_path = home_path / "config.yaml"
-    if not config_path.exists():
-        return {}
-    try:
-        import yaml  # type: ignore
-    except ImportError:
-        return {}
-    try:
-        with open(config_path, "r", encoding="utf-8") as f:
-            data = yaml.safe_load(f) or {}
-    except Exception:  # noqa: BLE001
-        return {}
-    return data.get("secrets") or {}
@@ -275,133 +275,6 @@ def _is_termux_startup_environment(env: dict[str, str] | None = None) -> bool:
    )


-def _read_packed_ref(common_dir: Path, ref: str) -> str | None:
-    """Look up a ref in .git/packed-refs without spawning git.
-
-    packed-refs lines look like ``<sha> <ref>`` with optional ``^<sha>``
-    peel lines and ``#``-prefixed comments / ``# pack-refs with:`` header.
-    """
-    try:
-        text = (common_dir / "packed-refs").read_text(encoding="utf-8", errors="replace")
-    except OSError:
-        return None
-    for line in text.splitlines():
-        if not line or line.startswith("#") or line.startswith("^"):
-            continue
-        parts = line.split(" ", 1)
-        if len(parts) == 2 and parts[1].strip() == ref:
-            return parts[0].strip()
-    return None
-
-
-def _read_git_revision_fingerprint(repo_root: Path) -> str | None:
-    """Return a cheap checkout fingerprint without spawning git."""
-    git_dir = repo_root / ".git"
-    try:
-        if git_dir.is_file():
-            for line in git_dir.read_text(encoding="utf-8", errors="replace").splitlines():
-                key, _, value = line.partition(":")
-                if key.strip() == "gitdir" and value.strip():
-                    git_dir = (repo_root / value.strip()).resolve()
-                    break
-        # Worktrees point HEAD at a per-worktree gitdir but pack their refs
-        # in the main repo's gitdir (referenced via ``commondir``). Resolve
-        # that up front so packed-refs lookups hit the right file.
-        common_dir = git_dir
-        commondir_file = git_dir / "commondir"
-        if commondir_file.exists():
-            try:
-                rel = commondir_file.read_text(encoding="utf-8", errors="replace").strip()
-                if rel:
-                    common_dir = (git_dir / rel).resolve()
-            except OSError:
-                pass
-        head_file = git_dir / "HEAD"
-        head = head_file.read_text(encoding="utf-8", errors="replace").strip()
-        if head.startswith("ref:"):
-            ref = head.split(":", 1)[1].strip()
-            # Loose refs may live in the worktree gitdir OR the common dir
-            # (branches created via `git worktree add` typically live in the
-            # common dir's refs/heads/).
-            for candidate in (git_dir, common_dir):
-                ref_file = candidate / ref
-                if ref_file.exists():
-                    return f"git:{ref}:{ref_file.read_text(encoding='utf-8', errors='replace').strip()}"
-            packed_sha = _read_packed_ref(common_dir, ref)
-            if packed_sha:
-                return f"git:{ref}:{packed_sha}"
-            # Ref name is known but unresolved — still stable across launches,
-            # and the version/release fallback in the caller will invalidate
-            # after `hermes update`.
-            return f"git:{ref}:unresolved"
-        return f"git:HEAD:{head}"
-    except OSError:
-        return None
-
-
-def _termux_bundled_skills_fingerprint() -> str:
-    """Cheap invalidation key for Termux bundled-skill startup sync."""
-    git_fp = _read_git_revision_fingerprint(PROJECT_ROOT)
-    if git_fp:
-        return git_fp
-    skills_dir = PROJECT_ROOT / "skills"
-    try:
-        stat = skills_dir.stat()
-        return f"skills:{__version__}:{__release_date__}:{stat.st_mtime_ns}:{stat.st_size}"
-    except OSError:
-        return f"skills:{__version__}:{__release_date__}:missing"
-
-
-def _termux_bundled_skills_stamp_path() -> Path:
-    return get_hermes_home() / "skills" / ".termux_bundled_sync_stamp"
-
-
-def _termux_bundled_skills_sync_needed() -> bool:
-    if not _is_termux_startup_environment():
-        return True
-    if os.environ.get("HERMES_TERMUX_FORCE_SKILLS_SYNC") == "1":
-        return True
-    try:
-        stamp = _termux_bundled_skills_stamp_path()
-        return stamp.read_text(encoding="utf-8").strip() != _termux_bundled_skills_fingerprint()
-    except OSError:
-        return True
-
-
-def _mark_termux_bundled_skills_synced() -> None:
-    if not _is_termux_startup_environment():
-        return
-    try:
-        stamp = _termux_bundled_skills_stamp_path()
-        stamp.parent.mkdir(parents=True, exist_ok=True)
-        stamp.write_text(_termux_bundled_skills_fingerprint() + "\n", encoding="utf-8")
-    except OSError:
-        pass
-
-
-def _sync_bundled_skills_for_startup() -> bool:
-    """Sync bundled skills, but skip unchanged Termux checkouts cheaply.
-
-    Hashing every bundled skill is safe but expensive on older Android
-    storage. The git/ref stamp keeps post-update correctness: a changed
-    checkout revision forces one real sync, then later starts skip it.
-    """
-    if _is_termux_startup_environment() and not _termux_bundled_skills_sync_needed():
-        return False
-
-    from tools.skills_sync import sync_skills
-
-    sync_skills(quiet=True)
-    _mark_termux_bundled_skills_synced()
-    return True
-
-
-def _termux_should_prefetch_update_check() -> bool:
-    if not _is_termux_startup_environment():
-        return True
-    return os.environ.get("HERMES_TERMUX_PREFETCH_UPDATES") == "1"
-
-
 def _relative_time(ts) -> str:
    """Format a timestamp as relative time (e.g., '2h ago', 'yesterday')."""
    if not ts:
@@ -591,7 +464,7 @@ def _session_browse_picker(sessions: list) -> Optional[str]:
                curses.init_pair(1, curses.COLOR_GREEN, -1)  # selected
                curses.init_pair(2, curses.COLOR_YELLOW, -1)  # header
                curses.init_pair(3, curses.COLOR_CYAN, -1)  # search
-                curses.init_pair(4, 8 if curses.COLORS > 8 else curses.COLOR_WHITE, -1)  # dim
+                curses.init_pair(4, 8, -1)  # dim

            cursor = 0
            scroll_offset = 0
@@ -1273,13 +1146,13 @@ def _make_tui_argv(tui_dir: Path, tui_dev: bool) -> tuple[list[str], Path]:
            p = Path(ext_dir)
            if (p / "dist" / "entry.js").is_file():
                node = _node_bin("node")
-                return [node, "--expose-gc", str(p / "dist" / "entry.js")], p
+                return [node, str(p / "dist" / "entry.js")], p

        # 1b. Bundled in wheel (pip install)
        bundled = _find_bundled_tui()
        if bundled is not None:
            node = _node_bin("node")
-            return [node, "--expose-gc", str(bundled)], bundled.parent
+            return [node, str(bundled)], bundled.parent

    # 2. Normal flow: npm install if needed, always esbuild, then node dist/entry.js.
    #    --dev flow: npm install if needed, then tsx src/entry.tsx.
@@ -1356,7 +1229,7 @@ def _make_tui_argv(tui_dir: Path, tui_dev: bool) -> tuple[list[str], Path]:
            sys.exit(1)

    node = _node_bin("node")
-    return [node, "--expose-gc", str(tui_dir / "dist" / "entry.js")], tui_dir
+    return [node, str(tui_dir / "dist" / "entry.js")], tui_dir


 def _normalize_tui_toolsets(toolsets: object) -> list[str]:
@@ -1478,16 +1351,16 @@ def _launch_tui(
        env["HERMES_TUI_TOOL_PROGRESS"] = "off"
    if accept_hooks:
        env["HERMES_ACCEPT_HOOKS"] = "1"
-    # Guarantee an 8GB V8 heap for the TUI. Default node cap is ~1.5–4GB
-    # depending on version and can fatal-OOM on long sessions with large
-    # transcripts / reasoning blobs. Token-level merge: respect any
-    # user-supplied --max-old-space-size (they may have set it higher).
-    # --expose-gc is *not* added here: Node rejects it in NODE_OPTIONS
-    # ("--expose-gc is not allowed in NODE_OPTIONS") and refuses to start.
-    # It is passed as a direct argv flag in _make_tui_argv() instead.
+    # Guarantee an 8GB V8 heap + exposed GC for the TUI. Default node cap is
+    # ~1.5–4GB depending on version and can fatal-OOM on long sessions with
+    # large transcripts / reasoning blobs. Token-level merge: respect any
+    # user-supplied --max-old-space-size (they may have set it higher) and
+    # avoid duplicating --expose-gc.
    _tokens = env.get("NODE_OPTIONS", "").split()
    if not any(t.startswith("--max-old-space-size=") for t in _tokens):
        _tokens.append("--max-old-space-size=8192")
+    if "--expose-gc" not in _tokens:
+        _tokens.append("--expose-gc")
    env["NODE_OPTIONS"] = " ".join(_tokens)
    # HERMES_TUI_RESUME is an internal hand-off from the Python wrapper to the
    # Ink app.  Because we start from os.environ.copy(), an exported/stale value
@@ -1650,20 +1523,19 @@ def cmd_chat(args):
        print("You can run 'hermes setup' at any time to configure.")
        sys.exit(1)

-    # Start update check in background (runs while other init happens).
-    # On Termux this imports rich/prompt_toolkit in the foreground and then
-    # competes for CPU on single-core devices, so keep it opt-in there.
-    if _termux_should_prefetch_update_check():
-        try:
-            from hermes_cli.banner import prefetch_update_check
+    # Start update check in background (runs while other init happens)
+    try:
+        from hermes_cli.banner import prefetch_update_check

-            prefetch_update_check()
-        except Exception:
-            pass
+        prefetch_update_check()
+    except Exception:
+        pass

    # Sync bundled skills on every CLI launch (fast -- skips unchanged skills)
    try:
-        _sync_bundled_skills_for_startup()
+        from tools.skills_sync import sync_skills
+
+        sync_skills(quiet=True)
    except Exception:
        pass

@@ -2433,9 +2305,6 @@ _AUX_TASKS: list[tuple[str, str, str]] = [
    ("mcp", "MCP", "MCP tool reasoning"),
    ("title_generation", "Title generation", "session titles"),
    ("skills_hub", "Skills hub", "skills search/install"),
-    ("triage_specifier", "Triage specifier", "kanban spec fleshing"),
-    ("kanban_decomposer", "Kanban decomposer", "task decomposition"),
-    ("profile_describer", "Profile describer", "auto profile descriptions"),
    ("curator", "Curator", "skill-usage review pass"),
 ]

@@ -4665,9 +4534,7 @@ def _model_flow_copilot(config, current_model=""):
        source = creds.get("source", "")
    else:
        if source in {"GITHUB_TOKEN", "GH_TOKEN"}:
-            from hermes_cli.env_loader import format_secret_source_suffix
-            bw_suffix = format_secret_source_suffix(source)
-            print(f"  GitHub token: {api_key[:8]}... ✓ ({source}{bw_suffix})")
+            print(f"  GitHub token: {api_key[:8]}... ✓ ({source})")
        elif source == "gh auth token":
            print("  GitHub token: ✓ (from `gh auth token`)")
        else:
@@ -4924,10 +4791,7 @@ def _prompt_api_key(pconfig, existing_key: str, provider_id: str = "") -> tuple:
        return new_key, False

    # Already configured — offer K / R / C ────────────────────────────────
-    from hermes_cli.env_loader import format_secret_source_suffix
-
-    source_suffix = format_secret_source_suffix(key_env) if key_env else ""
-    print(f"  {pconfig.name} API key: {existing_key[:8]}... ✓{source_suffix}")
+    print(f"  {pconfig.name} API key: {existing_key[:8]}... ✓")
    if not key_env:
        # Nothing we can rewrite; just acknowledge and move on.
        print()
@@ -5210,9 +5074,7 @@ def _model_flow_bedrock_api_key(config, region, current_model=""):
    # Prompt for API key
    existing_key = get_env_value("AWS_BEARER_TOKEN_BEDROCK") or ""
    if existing_key:
-        from hermes_cli.env_loader import format_secret_source_suffix
-        source_suffix = format_secret_source_suffix("AWS_BEARER_TOKEN_BEDROCK")
-        print(f"  Bedrock API Key: {existing_key[:12]}... ✓{source_suffix}")
+        print(f"  Bedrock API Key: {existing_key[:12]}... ✓")
    else:
        print(f"  Endpoint: {mantle_base_url}")
        print()
@@ -5883,22 +5745,7 @@ def _model_flow_anthropic(config, current_model=""):
    if has_creds:
        # Show what we found
        if existing_key:
-            from hermes_cli.env_loader import format_secret_source_suffix
-            from hermes_cli.auth import PROVIDER_REGISTRY
-
-            # Surface which env var supplied the key so users with
-            # Bitwarden see "(from Bitwarden)" — without this, a detected
-            # BSM key looks identical to a key in .env and users assume
-            # nothing is wired up.
-            source_suffix = ""
-            for var in PROVIDER_REGISTRY["anthropic"].api_key_env_vars:
-                if os.getenv(var, "").strip() == existing_key:
-                    source_suffix = format_secret_source_suffix(var)
-                    if source_suffix:
-                        break
-            print(
-                f"  Anthropic credentials: {existing_key[:12]}... ✓{source_suffix}"
-            )
+            print(f"  Anthropic credentials: {existing_key[:12]}... ✓")
        elif cc_available:
            print("  Claude Code credentials: ✓ (auto-detected)")
        print()
@@ -6124,7 +5971,8 @@ def cmd_import(args):
    run_import(args)


-def _print_version_info(*, check_updates: bool = True) -> None:
+def cmd_version(args):
+    """Show version."""
    print(f"Hermes Agent v{__version__} ({__release_date__})")
    print(f"Project: {PROJECT_ROOT}")

@@ -6144,9 +5992,6 @@ def _print_version_info(*, check_updates: bool = True) -> None:
    except ImportError:
        print("OpenAI SDK: Not installed")

-    if not check_updates:
-        return
-
    # Show update status (synchronous — acceptable since user asked for version info)
    try:
        from hermes_cli.banner import check_for_updates
@@ -6165,11 +6010,6 @@ def _print_version_info(*, check_updates: bool = True) -> None:
        pass


-def cmd_version(args):
-    """Show version."""
-    _print_version_info(check_updates=True)
-
-
 def cmd_uninstall(args):
    """Uninstall Hermes Agent."""
    _require_tty("uninstall")
@@ -6246,36 +6086,24 @@ def _validate_critical_files_syntax(root) -> tuple[bool, str | None, str | None]
    them after a successful ``git pull`` so we can auto-roll-back instead of
    leaving the user with a bricked install.

-    The compiled ``.pyc`` is written to a temp directory rather than the
-    source tree's ``__pycache__/`` so we don't race with concurrent test
-    workers that walk the same dir, and so we don't leave a stale pyc
-    behind in production if the next interpreter run picks a different
-    Python version. The pyc is discarded on function return either way —
-    we only care about the compile-or-not signal.
-
    Returns ``(ok, failing_path, error_message)``. ``ok=True`` means every
    file parsed cleanly.
    """
    import py_compile
-    import tempfile

    root = Path(root)
-    with tempfile.TemporaryDirectory(prefix="hermes-syntax-check-") as tmpdir:
-        for relpath in _UPDATE_CRITICAL_FILES:
-            path = root / relpath
-            if not path.exists():
-                # Missing file is suspicious but not necessarily fatal — a future
-                # refactor may legitimately remove one of these. Skip and move on.
-                continue
-            # Mirror the relative path under the tmpdir so two different
-            # files with the same basename don't collide on the cfile name.
-            cfile = Path(tmpdir) / (relpath.replace("/", "__") + "c")
-            try:
-                py_compile.compile(str(path), cfile=str(cfile), doraise=True)
-            except py_compile.PyCompileError as exc:
-                return False, str(path), str(exc)
-            except OSError as exc:
-                return False, str(path), f"could not read: {exc}"
+    for relpath in _UPDATE_CRITICAL_FILES:
+        path = root / relpath
+        if not path.exists():
+            # Missing file is suspicious but not necessarily fatal — a future
+            # refactor may legitimately remove one of these. Skip and move on.
+            continue
+        try:
+            py_compile.compile(str(path), doraise=True)
+        except py_compile.PyCompileError as exc:
+            return False, str(path), str(exc)
+        except OSError as exc:
+            return False, str(path), f"could not read: {exc}"
    return True, None, None


@@ -10585,7 +10413,7 @@ _BUILTIN_SUBCOMMANDS = frozenset(
        "model", "pairing", "plugins", "postinstall", "profile", "proxy",
        "send", "sessions", "setup",
        "skills", "slack", "status", "tools", "uninstall", "update",
-        "version", "webhook", "whatsapp", "chat", "secrets",
+        "version", "webhook", "whatsapp", "chat",
        # Help-ish invocations — plugin commands not being listed in
        # top-level --help is an acceptable trade-off for skipping an
        # expensive eager import of every bundled plugin module.
@@ -10675,137 +10503,6 @@ def _plugin_cli_discovery_needed() -> bool:
    return True


-_AGENT_COMMANDS = {None, "chat", "acp", "rl"}
-_AGENT_SUBCOMMANDS = {
-    "cron": ("cron_command", {"run", "tick"}),
-    "gateway": ("gateway_command", {"run"}),
-    "mcp": ("mcp_action", {"serve"}),
-}
-
-
-def _prepare_agent_startup(args) -> None:
-    """Discover plugins/MCP/hooks for commands that can run an agent turn."""
-    _sub_attr, _sub_set = _AGENT_SUBCOMMANDS.get(args.command, (None, None))
-    if not (
-        args.command in _AGENT_COMMANDS
-        or (_sub_attr and getattr(args, _sub_attr, None) in _sub_set)
-    ):
-        return
-
-    _accept_hooks = bool(getattr(args, "accept_hooks", False))
-    try:
-        from hermes_cli.plugins import discover_plugins
-
-        discover_plugins()
-    except Exception:
-        logger.warning(
-            "plugin discovery failed at CLI startup",
-            exc_info=True,
-        )
-    try:
-        # MCP tool discovery — no event loop running in CLI/TUI startup,
-        # so inline is safe.  Moved here from model_tools.py module scope
-        # to avoid freezing the gateway's event loop on its first message
-        # via the same lazy import path (#16856).
-        from tools.mcp_tool import discover_mcp_tools
-
-        discover_mcp_tools()
-    except Exception:
-        logger.debug(
-            "MCP tool discovery failed at CLI startup",
-            exc_info=True,
-        )
-    try:
-        from hermes_cli.config import load_config
-        from agent.shell_hooks import register_from_config
-
-        register_from_config(load_config(), accept_hooks=_accept_hooks)
-    except Exception:
-        logger.debug(
-            "shell-hook registration failed at CLI startup",
-            exc_info=True,
-        )
-
-
-def _set_chat_arg_defaults(args) -> None:
-    for attr, default in [
-        ("query", None),
-        ("model", None),
-        ("provider", None),
-        ("toolsets", None),
-        ("verbose", False),
-        ("resume", None),
-        ("continue_last", None),
-        ("worktree", False),
-    ]:
-        if not hasattr(args, attr):
-            setattr(args, attr, default)
-
-
-def _is_termux_fast_version_argv(argv: list[str]) -> bool:
-    return argv in (["--version"], ["-V"], ["version"])
-
-
-def _try_termux_fast_cli_launch() -> bool:
-    """Run obvious Termux non-TUI chat/oneshot/version paths on a light parser."""
-    if not _is_termux_startup_environment():
-        return False
-    if os.environ.get("HERMES_TERMUX_DISABLE_FAST_CLI") == "1":
-        return False
-
-    argv = sys.argv[1:]
-    if "-h" in argv or "--help" in argv:
-        return False
-    if os.environ.get("HERMES_TUI") == "1" or "--tui" in argv:
-        return False
-
-    if _is_termux_fast_version_argv(argv):
-        _print_version_info(check_updates=False)
-        return True
-
-    first = _first_positional_argv()
-    has_oneshot = any(
-        arg == "-z" or arg == "--oneshot" or arg.startswith("--oneshot=")
-        for arg in argv
-    )
-    if not has_oneshot and first not in {None, "chat"}:
-        return False
-
-    from hermes_cli._parser import build_top_level_parser
-
-    parser, _subparsers, chat_parser = build_top_level_parser()
-    chat_parser.set_defaults(func=cmd_chat)
-    args = parser.parse_args(_coalesce_session_name_args(argv))
-
-    if getattr(args, "version", False):
-        _print_version_info(check_updates=False)
-        return True
-
-    if getattr(args, "oneshot", None):
-        _prepare_agent_startup(args)
-        from hermes_cli.oneshot import run_oneshot
-
-        sys.exit(
-            run_oneshot(
-                args.oneshot,
-                model=getattr(args, "model", None),
-                provider=getattr(args, "provider", None),
-                toolsets=getattr(args, "toolsets", None),
-            )
-        )
-
-    if (args.resume or args.continue_last) and args.command is None:
-        args.command = "chat"
-
-    if args.command in {None, "chat"}:
-        _set_chat_arg_defaults(args)
-        _prepare_agent_startup(args)
-        cmd_chat(args)
-        return True
-
-    return False
-
-
 def _try_termux_fast_tui_launch() -> bool:
    """Launch obvious Termux TUI invocations before building every subparser.

@@ -10866,8 +10563,6 @@ def main():

    if _try_termux_fast_tui_launch():
        return
-    if _try_termux_fast_cli_launch():
-        return

    from hermes_cli._parser import build_top_level_parser

@@ -10965,42 +10660,6 @@ def main():
    )
    fallback_parser.set_defaults(func=cmd_fallback)

-    # =========================================================================
-    # secrets command — external secret managers (currently: Bitwarden)
-    # =========================================================================
-    secrets_parser = subparsers.add_parser(
-        "secrets",
-        help="Manage external secret sources (Bitwarden Secrets Manager)",
-        description=(
-            "Pull API keys from an external secret manager at process startup "
-            "instead of storing them in ~/.hermes/.env.  Currently supports "
-            "Bitwarden Secrets Manager.  See: "
-            "https://hermes-agent.nousresearch.com/docs/user-guide/secrets/bitwarden"
-        ),
-    )
-    secrets_subparsers = secrets_parser.add_subparsers(dest="secrets_command")
-
-    secrets_bw = secrets_subparsers.add_parser(
-        "bitwarden",
-        aliases=["bw"],
-        help="Bitwarden Secrets Manager integration",
-    )
-
-    # Lazy import — only pays for itself when this subcommand is actually used.
-    from hermes_cli import secrets_cli as _secrets_cli
-
-    _secrets_cli.register_cli(secrets_bw)
-
-    def _dispatch_secrets(args):  # noqa: ANN001
-        sub = getattr(args, "secrets_command", None)
-        bw_sub = getattr(args, "secrets_bw_command", None)
-        if sub in ("bitwarden", "bw") and bw_sub is not None:
-            return args.func(args)
-        secrets_parser.print_help()
-        return 0
-
-    secrets_parser.set_defaults(func=_dispatch_secrets)
-
    # =========================================================================
    # migrate command
    # =========================================================================
@@ -13666,7 +13325,51 @@ Examples:
    # so introspection/management commands (hermes hooks list, cron
    # list, gateway status, mcp add, ...) don't pay discovery cost or
    # trigger consent prompts for hooks the user is still inspecting.
-    _prepare_agent_startup(args)
+    # Groups with mixed admin/CRUD vs. agent-running entries narrow via
+    # the nested subcommand (dest varies by parser).
+    _AGENT_COMMANDS = {None, "chat", "acp", "rl"}
+    _AGENT_SUBCOMMANDS = {
+        "cron": ("cron_command", {"run", "tick"}),
+        "gateway": ("gateway_command", {"run"}),
+        "mcp": ("mcp_action", {"serve"}),
+    }
+    _sub_attr, _sub_set = _AGENT_SUBCOMMANDS.get(args.command, (None, None))
+    if args.command in _AGENT_COMMANDS or (
+        _sub_attr and getattr(args, _sub_attr, None) in _sub_set
+    ):
+        _accept_hooks = bool(getattr(args, "accept_hooks", False))
+        try:
+            from hermes_cli.plugins import discover_plugins
+
+            discover_plugins()
+        except Exception:
+            logger.warning(
+                "plugin discovery failed at CLI startup",
+                exc_info=True,
+            )
+        try:
+            # MCP tool discovery — no event loop running in CLI/TUI startup,
+            # so inline is safe.  Moved here from model_tools.py module scope
+            # to avoid freezing the gateway's event loop on its first message
+            # via the same lazy import path (#16856).
+            from tools.mcp_tool import discover_mcp_tools
+
+            discover_mcp_tools()
+        except Exception:
+            logger.debug(
+                "MCP tool discovery failed at CLI startup",
+                exc_info=True,
+            )
+        try:
+            from hermes_cli.config import load_config
+            from agent.shell_hooks import register_from_config
+
+            register_from_config(load_config(), accept_hooks=_accept_hooks)
+        except Exception:
+            logger.debug(
+                "shell-hook registration failed at CLI startup",
+                exc_info=True,
+            )

    # Handle top-level --oneshot / -z: single-shot mode, stdout = final
    # response only, nothing else. Bypasses cli.py entirely.
@@ -74,12 +74,8 @@ class NousSubscriptionFeatures:
    def modal(self) -> NousFeatureState:
        return self.features["modal"]

-    @property
-    def app_tools(self) -> NousFeatureState:
-        return self.features["app_tools"]
-
    def items(self) -> Iterable[NousFeatureState]:
-        ordered = ("web", "image_gen", "tts", "browser", "modal", "app_tools")
+        ordered = ("web", "image_gen", "tts", "browser", "modal")
        for key in ordered:
            yield self.features[key]

@@ -229,22 +225,6 @@ def _resolve_browser_feature_state(
    return "local", available, active, False


-def _read_portal_app_tools_enabled(config: Optional[Dict[str, object]] = None) -> bool:
-    """Return True when the portal.app_tools config flag is on."""
-    if config is not None:
-        # Fast path: use the pre-loaded config snapshot from the caller
-        import os
-        env_val = os.getenv("PORTAL_APP_TOOLS")
-        if env_val is not None:
-            return is_truthy_value(env_val)
-        portal = config.get("portal")
-        if isinstance(portal, dict):
-            return bool(portal.get("app_tools", True))
-        return True
-    from tools.tool_backend_helpers import portal_app_tools_enabled
-    return portal_app_tools_enabled()
-
-
 def get_nous_subscription_features(
    config: Optional[Dict[str, object]] = None,
 ) -> NousSubscriptionFeatures:
@@ -333,8 +313,6 @@ def get_nous_subscription_features(
    managed_tts_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("openai-audio")
    managed_browser_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("browser-use")
    managed_modal_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("modal")
-    app_gw_ready = bool(managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("tools"))
-    app_config_on = _read_portal_app_tools_enabled(config)
    modal_state = resolve_modal_backend_state(
        modal_mode,
        has_direct=direct_modal,
@@ -498,17 +476,6 @@ def get_nous_subscription_features(
            current_provider="Modal" if terminal_backend == "modal" else terminal_backend or "local",
            explicit_configured=terminal_backend == "modal",
        ),
-        "app_tools": NousFeatureState(
-            key="app_tools",
-            label="App tools (500+ apps)",
-            included_by_default=True,
-            available=app_gw_ready,
-            active=app_gw_ready and app_config_on,
-            managed_by_nous=app_gw_ready and app_config_on,
-            direct_override=False,
-            toolset_enabled=app_config_on,
-            current_provider="Nous Tool Gateway",
-        ),
    }

    return NousSubscriptionFeatures(
@@ -1051,7 +1051,7 @@ def _run_composite_ui(curses, plugin_names, plugin_labels, plugin_selected,
            curses.init_pair(1, curses.COLOR_GREEN, -1)
            curses.init_pair(2, curses.COLOR_YELLOW, -1)
            curses.init_pair(3, curses.COLOR_CYAN, -1)
-            curses.init_pair(4, 8 if curses.COLORS > 8 else curses.COLOR_WHITE, -1)  # dim gray
+            curses.init_pair(4, 8, -1)  # dim gray
        cursor = 0
        scroll_offset = 0

@@ -1196,7 +1196,7 @@ def _run_composite_ui(curses, plugin_names, plugin_labels, plugin_selected,
                            curses.init_pair(1, curses.COLOR_GREEN, -1)
                            curses.init_pair(2, curses.COLOR_YELLOW, -1)
                            curses.init_pair(3, curses.COLOR_CYAN, -1)
-                            curses.init_pair(4, 8 if curses.COLORS > 8 else curses.COLOR_WHITE, -1)
+                            curses.init_pair(4, 8, -1)
                        curses.curs_set(0)
            elif key in {curses.KEY_ENTER, 10, 13}:
                if cursor < n_plugins:
@@ -1228,7 +1228,7 @@ def _run_composite_ui(curses, plugin_names, plugin_labels, plugin_selected,
                            curses.init_pair(1, curses.COLOR_GREEN, -1)
                            curses.init_pair(2, curses.COLOR_YELLOW, -1)
                            curses.init_pair(3, curses.COLOR_CYAN, -1)
-                            curses.init_pair(4, 8 if curses.COLORS > 8 else curses.COLOR_WHITE, -1)
+                            curses.init_pair(4, 8, -1)
                        curses.curs_set(0)
            elif key in {27, ord("q")}:
                # Save plugin changes on exit
@@ -35,7 +35,6 @@ from pathlib import Path
 from typing import Optional

 from hermes_cli import profiles as profiles_mod
-from agent.skill_utils import is_excluded_skill_path

 logger = logging.getLogger(__name__)

@@ -110,7 +109,8 @@ def _collect_skills(profile_dir: Path) -> list[str]:
        return []
    names: list[str] = []
    for md in skills_dir.rglob("SKILL.md"):
-        if is_excluded_skill_path(md):
+        path_str = str(md)
+        if "/.hub/" in path_str or "/.git/" in path_str:
            continue
        try:
            rel = md.relative_to(skills_dir)
@@ -201,7 +201,7 @@ def describe_profile(
    skill_list = "\n".join(f"  - {n}" for n in skill_names) or "  (no skills installed)"
    skill_count = sum(
        1 for _ in (profile_dir / "skills").rglob("SKILL.md")
-        if not is_excluded_skill_path(_)
+        if "/.hub/" not in str(_) and "/.git/" not in str(_)
    ) if (profile_dir / "skills").is_dir() else 0

    # Read model + provider from the profile's config.
@@ -70,8 +70,6 @@ from datetime import datetime, timezone
 from pathlib import Path
 from typing import Any, Dict, List, Optional, Tuple

-from agent.skill_utils import is_excluded_skill_path
-

 # ---------------------------------------------------------------------------
 # Constants
@@ -465,9 +463,7 @@ def _count_skills(staged: Path) -> int:
    skills_dir = staged / "skills"
    if not skills_dir.is_dir():
        return 0
-    return sum(
-        1 for p in skills_dir.rglob("SKILL.md") if not is_excluded_skill_path(p)
-    )
+    return sum(1 for _ in skills_dir.rglob("SKILL.md"))


 def plan_install(
@@ -30,8 +30,6 @@ from dataclasses import dataclass
 from pathlib import Path, PurePosixPath, PureWindowsPath
 from typing import List, Optional

-from agent.skill_utils import is_excluded_skill_path
-
 _PROFILE_ID_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,63}$")

 # Directories bootstrapped inside every new profile
@@ -487,9 +485,8 @@ def _count_skills(profile_dir: Path) -> int:
        return 0
    count = 0
    for md in skills_dir.rglob("SKILL.md"):
-        if is_excluded_skill_path(md):
-            continue
-        count += 1
+        if "/.hub/" not in str(md) and "/.git/" not in str(md):
+            count += 1
    return count


@@ -905,49 +902,7 @@ def delete_profile(name: str, yes: bool = False) -> Path:

    # 4. Remove profile directory
    try:
-        def _make_writable(func, path, exc):
-            """onexc/onerror handler: add +w on PermissionError so rmtree can proceed.
-
-            Handles two cases on NixOS (and other systems with read-only
-            copies from immutable stores):
-            1. The path itself isn't writable (e.g. a file with mode 0444)
-            2. The *parent* directory isn't writable (e.g. mode 0555)
-
-            Compatible with both the ``onexc`` API (3.12+, receives an
-            exception instance) and the ``onerror`` API (3.11-, receives
-            ``sys.exc_info()`` tuple).
-            """
-            import stat as _stat
-            import sys as _sys
-
-            # Normalise the two callback signatures:
-            #   onexc(func, path, exc_instance)   — 3.12+
-            #   onerror(func, path, exc_info_tuple) — 3.11
-            if isinstance(exc, tuple):
-                exc = exc[1]  # exc_info → actual exception object
-
-            if isinstance(exc, PermissionError):
-                # Make the path writable
-                try:
-                    os.chmod(path, os.stat(path).st_mode | _stat.S_IWUSR)
-                except OSError:
-                    pass
-                # Also make the parent writable (needed for unlink/rmdir)
-                parent = os.path.dirname(path)
-                if parent:
-                    try:
-                        os.chmod(parent, os.stat(parent).st_mode | _stat.S_IWUSR)
-                    except OSError:
-                        pass
-                func(path)
-            else:
-                raise
-
-        # ``onexc`` was added in 3.12; fall back to ``onerror`` on 3.11.
-        try:
-            shutil.rmtree(profile_dir, onexc=_make_writable)
-        except TypeError:
-            shutil.rmtree(profile_dir, onerror=_make_writable)
+        shutil.rmtree(profile_dir)
        print(f"✓ Removed {profile_dir}")
    except Exception as e:
        print(f"⚠ Could not remove {profile_dir}: {e}")
@@ -528,9 +528,6 @@ def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, An
                        "api_key": resolved_api_key,
                        "model": entry.get("default_model", ""),
                    }
-                    extra_body = entry.get("extra_body")
-                    if isinstance(extra_body, dict):
-                        result["extra_body"] = dict(extra_body)
                    # The v11→v12 migration writes the API mode under the new
                    # ``transport`` field, but hand-edited configs may still
                    # use the legacy ``api_mode`` spelling.  Accept both —
@@ -556,9 +553,6 @@ def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, An
                            "api_key": resolved_api_key,
                            "model": entry.get("default_model", ""),
                        }
-                        extra_body = entry.get("extra_body")
-                        if isinstance(extra_body, dict):
-                            result["extra_body"] = dict(extra_body)
                        api_mode = _parse_api_mode(entry.get("api_mode") or entry.get("transport"))
                        if api_mode:
                            result["api_mode"] = api_mode
@@ -602,9 +596,6 @@ def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, An
            result["key_env"] = key_env
        if provider_key:
            result["provider_key"] = provider_key
-        extra_body = entry.get("extra_body")
-        if isinstance(extra_body, dict):
-            result["extra_body"] = dict(extra_body)
        api_mode = _parse_api_mode(entry.get("api_mode"))
        if api_mode:
            result["api_mode"] = api_mode
@@ -616,13 +607,6 @@ def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, An
    return None


-def _custom_provider_request_overrides(custom_provider: Dict[str, Any]) -> Dict[str, Any]:
-    extra_body = custom_provider.get("extra_body")
-    if not isinstance(extra_body, dict) or not extra_body:
-        return {}
-    return {"extra_body": dict(extra_body)}
-
-
 def _resolve_named_custom_runtime(
    *,
    requested_provider: str,
@@ -699,12 +683,6 @@ def _resolve_named_custom_runtime(
        model_name = custom_provider.get("model")
        if model_name:
            pool_result["model"] = model_name
-        request_overrides = _custom_provider_request_overrides(custom_provider)
-        if request_overrides:
-            pool_result["request_overrides"] = {
-                **dict(pool_result.get("request_overrides") or {}),
-                **request_overrides,
-            }
        return pool_result

    _cp_is_openai_url   = base_url_host_matches(base_url, "openai.com") or base_url_host_matches(base_url, "openai.azure.com")
@@ -736,9 +714,6 @@ def _resolve_named_custom_runtime(
    # provider name differs from the actual model string the API expects.
    if custom_provider.get("model"):
        result["model"] = custom_provider["model"]
-    request_overrides = _custom_provider_request_overrides(custom_provider)
-    if request_overrides:
-        result["request_overrides"] = request_overrides
    return result


@@ -1,445 +0,0 @@
-"""CLI handlers for ``hermes secrets bitwarden ...``.
-
-Subcommands:
-    setup    — interactive wizard: install bws, prompt for token + project, test fetch
-    status   — show current config + binary version + last fetch outcome
-    sync     — run a fetch right now and show what would be applied (dry-run friendly)
-    disable  — flip ``secrets.bitwarden.enabled`` to False
-    install  — just download the bws binary (no token / project required)
-"""
-
-from __future__ import annotations
-
-import argparse
-import getpass
-import json
-import os
-import subprocess
-import sys
-from pathlib import Path
-from typing import List, Optional, Tuple
-
-from rich.console import Console
-from rich.panel import Panel
-from rich.table import Table
-
-from agent.secret_sources import bitwarden as bw
-from hermes_cli.config import (
-    get_env_path,
-    load_config,
-    save_config,
-    save_env_value,
-)
-
-
-# ---------------------------------------------------------------------------
-# Argparse wiring — called from hermes_cli.main
-# ---------------------------------------------------------------------------
-
-
-def register_cli(parent_parser: argparse.ArgumentParser) -> None:
-    """Attach the ``bitwarden`` subcommand tree to a parent parser.
-
-    Called from ``hermes_cli.main`` as part of building the top-level
-    ``hermes secrets`` parser.
-    """
-    sub = parent_parser.add_subparsers(dest="secrets_bw_command")
-
-    setup = sub.add_parser(
-        "setup",
-        help="Interactive wizard: install bws, store access token, pick project",
-    )
-    setup.add_argument(
-        "--project-id",
-        help="Pre-select a project UUID instead of prompting",
-    )
-    setup.add_argument(
-        "--access-token",
-        help="Provide the access token non-interactively (will be stored in .env)",
-    )
-    setup.set_defaults(func=cmd_setup)
-
-    status = sub.add_parser("status", help="Show config + binary + last fetch")
-    status.set_defaults(func=cmd_status)
-
-    sync = sub.add_parser("sync", help="Fetch secrets now and report what changed")
-    sync.add_argument(
-        "--apply",
-        action="store_true",
-        help="Actually export the secrets into the current shell's env (default: dry-run)",
-    )
-    sync.set_defaults(func=cmd_sync)
-
-    disable = sub.add_parser("disable", help="Turn off the Bitwarden integration")
-    disable.set_defaults(func=cmd_disable)
-
-    install = sub.add_parser(
-        "install",
-        help=f"Download and verify the pinned bws binary (v{bw._BWS_VERSION})",
-    )
-    install.add_argument(
-        "--force",
-        action="store_true",
-        help="Re-download even if a managed copy already exists",
-    )
-    install.set_defaults(func=cmd_install)
-
-
-# ---------------------------------------------------------------------------
-# Handlers
-# ---------------------------------------------------------------------------
-
-
-def cmd_setup(args: argparse.Namespace) -> int:
-    console = Console()
-    console.print(
-        Panel.fit(
-            "[bold]Bitwarden Secrets Manager setup[/bold]\n\n"
-            "Need an access token? In the Bitwarden web app:\n"
-            "  Secrets Manager → Machine accounts → [your account] →\n"
-            "  Access tokens → Create access token\n\n"
-            "Copy the token (starts with [cyan]0.[/cyan]…) — it cannot be retrieved later.",
-            border_style="cyan",
-        )
-    )
-
-    # ------------------------------------------------------------------ binary
-    console.print()
-    console.print("[bold]Step 1[/bold]  Install the bws CLI")
-    try:
-        binary = bw.find_bws(install_if_missing=False)
-        if binary is None:
-            console.print("  No bws on PATH — downloading…")
-            binary = bw.install_bws()
-        version = _bws_version(binary)
-        console.print(f"  [green]✓[/green] {binary}  ({version})")
-    except Exception as exc:  # noqa: BLE001
-        console.print(f"  [red]✗ Could not install bws: {exc}[/red]")
-        console.print(
-            "  Manual install: "
-            "https://github.com/bitwarden/sdk-sm/releases"
-        )
-        return 1
-
-    # ------------------------------------------------------------------- token
-    console.print()
-    console.print("[bold]Step 2[/bold]  Provide your access token")
-    cfg = load_config()
-    secrets_cfg = (cfg.setdefault("secrets", {})
-                     .setdefault("bitwarden", {}))
-    token_env = secrets_cfg.get("access_token_env", "BWS_ACCESS_TOKEN")
-
-    token = (args.access_token or "").strip()
-    if not token:
-        token = getpass.getpass(f"  Paste access token ({token_env}): ").strip()
-    if not token:
-        console.print("  [red]Empty token, aborting.[/red]")
-        return 1
-    if not token.startswith("0."):
-        console.print(
-            "  [yellow]Warning: token doesn't start with '0.' — usually that means "
-            "you pasted something other than a BSM access token.  Continuing anyway.[/yellow]"
-        )
-
-    save_env_value(token_env, token)
-    os.environ[token_env] = token  # so the test fetch below sees it
-    console.print(f"  [green]✓[/green] stored in {get_env_path()} as {token_env}")
-
-    # ------------------------------------------------------------------- project
-    if args.project_id and args.project_id.strip():
-        project_id = args.project_id.strip()
-    else:
-        console.print()
-        console.print("[bold]Step 3[/bold]  Pick a project")
-        project_id = ""
-        projects = _list_projects(binary, token, console)
-        if projects is None:
-            return 1
-        if not projects:
-            console.print("  [yellow]No projects visible to this machine account.[/yellow]")
-            console.print(
-                "  In the Bitwarden web app, open the machine account → Projects tab "
-                "and grant it access to at least one project."
-            )
-            return 1
-
-        table = Table(show_header=True, header_style="bold")
-        table.add_column("#", style="cyan", width=4)
-        table.add_column("Name")
-        table.add_column("ID", style="dim")
-        for i, p in enumerate(projects, 1):
-            table.add_row(str(i), p.get("name", "?"), p.get("id", "?"))
-        console.print(table)
-
-        while True:
-            choice = console.input(f"  Select project [1-{len(projects)}]: ").strip()
-            if not choice:
-                continue
-            try:
-                idx = int(choice)
-            except ValueError:
-                console.print("  [red]Enter a number.[/red]")
-                continue
-            if 1 <= idx <= len(projects):
-                project_id = projects[idx - 1]["id"]
-                break
-            console.print(f"  [red]Out of range — pick 1-{len(projects)}.[/red]")
-
-    # ------------------------------------------------------------------- test
-    console.print()
-    step_num = 4 if not (args.project_id and args.project_id.strip()) else 3
-    console.print(f"[bold]Step {step_num}[/bold]  Test fetch")
-    try:
-        secrets, warnings = bw.fetch_bitwarden_secrets(
-            access_token=token,
-            project_id=project_id,
-            binary=binary,
-            use_cache=False,
-        )
-    except Exception as exc:  # noqa: BLE001
-        console.print(f"  [red]✗ Fetch failed: {exc}[/red]")
-        return 1
-
-    if not secrets:
-        console.print("  [yellow]Fetch succeeded but the project has no secrets.[/yellow]")
-    else:
-        table = Table(show_header=True, header_style="bold")
-        table.add_column("Name", style="cyan")
-        table.add_column("Status")
-        for key in sorted(secrets):
-            if key == token_env:
-                status = "[dim]bootstrap token — never overrides itself[/dim]"
-            elif os.environ.get(key):
-                status = "[yellow]already set in env (will be overwritten)[/yellow]"
-            else:
-                status = "[green]new[/green]"
-            table.add_row(key, status)
-        console.print(table)
-    for w in warnings:
-        console.print(f"  [yellow]warning:[/yellow] {w}")
-
-    # ------------------------------------------------------------------- save
-    secrets_cfg["enabled"] = True
-    secrets_cfg["project_id"] = project_id
-    secrets_cfg.setdefault("access_token_env", token_env)
-    secrets_cfg.setdefault("cache_ttl_seconds", 300)
-    secrets_cfg.setdefault("override_existing", True)
-    secrets_cfg.setdefault("auto_install", True)
-    save_config(cfg)
-
-    console.print()
-    console.print(
-        "[green]✓ Bitwarden Secrets Manager is enabled.[/green]  "
-        "Secrets will be pulled at the start of every Hermes process."
-    )
-    console.print(
-        "  Status:  [cyan]hermes secrets bitwarden status[/cyan]\n"
-        "  Refresh: [cyan]hermes secrets bitwarden sync[/cyan]\n"
-        "  Disable: [cyan]hermes secrets bitwarden disable[/cyan]"
-    )
-    return 0
-
-
-def cmd_status(args: argparse.Namespace) -> int:
-    console = Console()
-    cfg = load_config()
-    bw_cfg = (cfg.get("secrets") or {}).get("bitwarden") or {}
-
-    enabled = bool(bw_cfg.get("enabled"))
-    token_env = bw_cfg.get("access_token_env", "BWS_ACCESS_TOKEN")
-    project_id = bw_cfg.get("project_id", "")
-    token_set = bool(os.environ.get(token_env))
-
-    table = Table(show_header=False, box=None, padding=(0, 2))
-    table.add_column("", style="bold")
-    table.add_column("")
-    table.add_row("Enabled",         _yn(enabled))
-    table.add_row("Token env var",   token_env)
-    table.add_row("Token in env",    _yn(token_set))
-    table.add_row("Project ID",      project_id or "[dim](unset)[/dim]")
-    table.add_row("Override existing", _yn(bool(bw_cfg.get("override_existing", False))))
-    table.add_row("Cache TTL (s)",   str(bw_cfg.get("cache_ttl_seconds", 300)))
-    table.add_row("Auto-install",    _yn(bool(bw_cfg.get("auto_install", True))))
-
-    binary = bw.find_bws(install_if_missing=False)
-    if binary:
-        table.add_row("bws binary",  f"{binary} ({_bws_version(binary)})")
-    else:
-        table.add_row("bws binary",  "[yellow]not installed[/yellow]")
-
-    console.print(Panel(table, title="Bitwarden Secrets Manager", border_style="cyan"))
-
-    if not enabled:
-        console.print("\n  Run [cyan]hermes secrets bitwarden setup[/cyan] to enable.")
-        return 0
-    if not token_set:
-        console.print(
-            f"\n  [yellow]Enabled but {token_env} is not set — Hermes will skip BSM "
-            "and warn on next startup.[/yellow]"
-        )
-    if not project_id:
-        console.print(
-            "\n  [yellow]Enabled but no project_id — nothing to fetch.[/yellow]"
-        )
-    return 0
-
-
-def cmd_sync(args: argparse.Namespace) -> int:
-    console = Console()
-    cfg = load_config()
-    bw_cfg = (cfg.get("secrets") or {}).get("bitwarden") or {}
-    if not bw_cfg.get("enabled"):
-        console.print(
-            "[yellow]Bitwarden integration is disabled.  Run "
-            "`hermes secrets bitwarden setup` first.[/yellow]"
-        )
-        return 1
-
-    token_env = bw_cfg.get("access_token_env", "BWS_ACCESS_TOKEN")
-    token = os.environ.get(token_env, "").strip()
-    if not token:
-        console.print(f"[red]{token_env} is not set.[/red]")
-        return 1
-
-    project_id = bw_cfg.get("project_id", "")
-    if not project_id:
-        console.print("[red]No project_id configured.[/red]")
-        return 1
-
-    try:
-        secrets, warnings = bw.fetch_bitwarden_secrets(
-            access_token=token,
-            project_id=project_id,
-            use_cache=False,
-        )
-    except Exception as exc:  # noqa: BLE001
-        console.print(f"[red]Fetch failed: {exc}[/red]")
-        return 1
-
-    if not secrets:
-        console.print("[yellow]No secrets in project.[/yellow]")
-        return 0
-
-    override = bool(bw_cfg.get("override_existing", False)) or args.apply
-    table = Table(show_header=True, header_style="bold")
-    table.add_column("Name", style="cyan")
-    table.add_column("Action")
-    applied = 0
-    for key in sorted(secrets):
-        if key == token_env:
-            table.add_row(key, "[dim]skip (bootstrap token)[/dim]")
-            continue
-        already = bool(os.environ.get(key))
-        if already and not override:
-            table.add_row(key, "[dim]skip (already set)[/dim]")
-            continue
-        if args.apply:
-            os.environ[key] = secrets[key]
-            applied += 1
-            table.add_row(key, "[green]exported[/green]" + (" (overrode)" if already else ""))
-        else:
-            table.add_row(key, "[green]would export[/green]" + (" (overrides)" if already else ""))
-
-    console.print(table)
-    for w in warnings:
-        console.print(f"[yellow]warning:[/yellow] {w}")
-
-    if not args.apply:
-        console.print(
-            "\n  This was a dry-run — secrets are picked up automatically on the "
-            "next [cyan]hermes[/cyan] invocation.  Re-run with [cyan]--apply[/cyan] "
-            "to export into the current shell instead."
-        )
-    else:
-        console.print(f"\n  [green]Exported {applied} secret(s) into current process.[/green]")
-    return 0
-
-
-def cmd_disable(args: argparse.Namespace) -> int:
-    console = Console()
-    cfg = load_config()
-    bw_cfg = (cfg.setdefault("secrets", {})
-                .setdefault("bitwarden", {}))
-    bw_cfg["enabled"] = False
-    save_config(cfg)
-    console.print(
-        "[green]Disabled.[/green]  Bitwarden secrets will NOT be pulled on the next "
-        "Hermes invocation.\n"
-        "  Your access token is left in .env — remove it manually if you also want "
-        "to revoke the credential."
-    )
-    return 0
-
-
-def cmd_install(args: argparse.Namespace) -> int:
-    console = Console()
-    try:
-        path = bw.install_bws(force=bool(args.force))
-        console.print(f"[green]✓[/green] {path}  ({_bws_version(path)})")
-        return 0
-    except Exception as exc:  # noqa: BLE001
-        console.print(f"[red]Install failed: {exc}[/red]")
-        return 1
-
-
-# ---------------------------------------------------------------------------
-# Helpers
-# ---------------------------------------------------------------------------
-
-
-def _yn(b: bool) -> str:
-    return "[green]yes[/green]" if b else "[dim]no[/dim]"
-
-
-def _bws_version(binary: Path) -> str:
-    try:
-        res = subprocess.run(
-            [str(binary), "--version"],
-            capture_output=True,
-            text=True,
-            timeout=5,
-        )
-        if res.returncode == 0:
-            return (res.stdout or res.stderr).strip().splitlines()[0]
-    except (OSError, subprocess.TimeoutExpired):
-        pass
-    return "version unknown"
-
-
-def _list_projects(
-    binary: Path, token: str, console: Console
-) -> Optional[List[dict]]:
-    """Call ``bws project list`` and return the parsed list, or None on failure."""
-    env = os.environ.copy()
-    env["BWS_ACCESS_TOKEN"] = token
-    env.setdefault("NO_COLOR", "1")
-    try:
-        res = subprocess.run(
-            [str(binary), "project", "list", "--output", "json"],
-            env=env,
-            capture_output=True,
-            text=True,
-            timeout=15,
-        )
-    except (OSError, subprocess.TimeoutExpired) as exc:
-        console.print(f"  [red]Couldn't list projects: {exc}[/red]")
-        return None
-
-    if res.returncode != 0:
-        err = (res.stderr or res.stdout).strip()[:300]
-        console.print(f"  [red]bws project list failed: {err}[/red]")
-        if "authorization" in err.lower() or "invalid" in err.lower():
-            console.print(
-                "  [yellow]This usually means the access token is wrong or revoked. "
-                "Double-check it in the Bitwarden web app.[/yellow]"
-            )
-        return None
-
-    try:
-        data = json.loads(res.stdout or "[]")
-    except json.JSONDecodeError as exc:
-        console.print(f"  [red]bws returned non-JSON: {exc}[/red]")
-        return None
-    if not isinstance(data, list):
-        return []
-    return [p for p in data if isinstance(p, dict) and p.get("id")]
@@ -23,7 +23,6 @@ from rich.table import Table
 # Lazy imports to avoid circular dependencies and slow startup.
 # tools.skills_hub and tools.skills_guard are imported inside functions.
 from hermes_constants import display_hermes_home
-from agent.skill_utils import is_excluded_skill_path

 _console = Console()

@@ -179,12 +178,9 @@ def _existing_categories() -> List[str]:
            # top level (no category); otherwise treat as a category bucket.
            if (entry / "SKILL.md").exists():
                continue
-            # Has at least one nested SKILL.md (excluding dependency/cache dirs)?
+            # Has at least one nested SKILL.md?
            try:
-                if any(
-                    not is_excluded_skill_path(p)
-                    for p in entry.rglob("SKILL.md")
-                ):
+                if any(entry.rglob("SKILL.md")):
                    out.append(entry.name)
            except OSError:
                continue
@@ -78,7 +78,6 @@ CONFIGURABLE_TOOLSETS = [
    ("discord_admin",   "🛡️  Discord Server Admin",    "list channels/roles, pin, assign roles"),
    ("yuanbao",          "🤖 Yuanbao",                  "group info, member queries, DM"),
    ("computer_use",     "🖱️  Computer Use (macOS)",     "background desktop control via cua-driver"),
-    ("app_tools",        "🔌 App Integrations (500+)",   "Gmail, Slack, GitHub, Jira, Notion, etc. via Nous tool gateway"),
 ]

 # Toolsets that are OFF by default for new installs.
@@ -312,16 +311,6 @@ TOOL_CATEGORIES = {
    "image_gen": {
        "name": "Image Generation",
        "icon": "🎨",
-        # Per-provider rows for FAL.ai (`plugins/image_gen/fal`), OpenAI,
-        # OpenAI Codex, and xAI are injected at runtime from each
-        # ``plugins.image_gen.<vendor>`` package via
-        # ``_plugin_image_gen_providers()`` in ``_visible_providers``.
-        # Only non-provider UX setup-flow rows remain here:
-        #   - "Nous Subscription" — managed FAL billed via the Nous
-        #     subscription (requires_nous_auth + override_env_vars).
-        #     Uses the fal plugin as the underlying backend but has a
-        #     distinct setup UX.
-        # Mirrors the shape browser/video_gen ship today.
        "providers": [
            {
                "name": "Nous Subscription",
@@ -333,6 +322,15 @@ TOOL_CATEGORIES = {
                "override_env_vars": ["FAL_KEY"],
                "imagegen_backend": "fal",
            },
+            {
+                "name": "FAL.ai",
+                "badge": "paid",
+                "tag": "Pick from flux-2-klein, flux-2-pro, gpt-image, nano-banana, etc.",
+                "env_vars": [
+                    {"key": "FAL_KEY", "prompt": "FAL API key", "url": "https://fal.ai/dashboard/keys"},
+                ],
+                "imagegen_backend": "fal",
+            },
        ],
    },
    "video_gen": {
@@ -484,11 +482,6 @@ TOOLSET_ENV_REQUIREMENTS = {
 # ─── Post-Setup Hooks ─────────────────────────────────────────────────────────


-def _cua_driver_cmd() -> str:
-    """Return the cua-driver executable name/path, honoring non-empty overrides."""
-    return os.environ.get("HERMES_CUA_DRIVER_CMD", "").strip() or "cua-driver"
-
-
 def _pip_install(
    args: List[str],
    *,
@@ -557,55 +550,6 @@ def _pip_install(
    )


-
-def _check_cua_driver_asset_for_arch() -> bool:
-    """Check whether the latest CUA release ships an asset for this architecture.
-
-    Returns True if the asset likely exists (or if we cannot determine it).
-    Returns False and prints a warning when the asset is confirmed missing,
-    so callers can skip the install attempt and avoid a raw 404.
-    """
-    import platform as _plat
-    import urllib.request
-
-    machine = _plat.machine()  # "x86_64" or "arm64"
-    if machine == "arm64":
-        # arm64 (Apple Silicon) assets are always published.
-        return True
-
-    # x86_64 / Intel — probe the latest release for an architecture-specific
-    # asset before falling through to the upstream installer.
-    api_url = (
-        "https://api.github.com/repos/trycua/cua/releases/latest"
-    )
-    try:
-        req = urllib.request.Request(api_url, headers={"Accept": "application/vnd.github+json"})
-        with urllib.request.urlopen(req, timeout=10) as resp:
-            release = _json.loads(resp.read().decode())
-        tag = release.get("tag_name", "")
-        assets = release.get("assets", [])
-        arch_names = {"x86_64", "amd64"}
-        has_asset = any(
-            any(a in a_info.get("name", "").lower() for a in arch_names)
-            for a_info in assets
-        )
-        if not has_asset:
-            _print_warning(
-                f"    Latest CUA release ({tag}) has no Intel (x86_64) asset."
-            )
-            _print_info(
-                "    CUA Driver currently only ships Apple Silicon builds."
-            )
-            _print_info(
-                "    See: https://github.com/trycua/cua/issues/1493"
-            )
-            return False
-    except Exception:
-        # Network / API failure — proceed and let the installer handle it.
-        pass
-    return True
-
-
 def install_cua_driver(upgrade: bool = False) -> bool:
    """Install or refresh the cua-driver binary used by Computer Use.

@@ -635,8 +579,7 @@ def install_cua_driver(upgrade: bool = False) -> bool:
        _print_warning("    Computer Use (cua-driver) is macOS-only; skipping.")
        return False

-    driver_cmd = _cua_driver_cmd()
-    binary = shutil.which(driver_cmd)
+    binary = shutil.which("cua-driver")

    # Not installed → fresh install path (only when caller asked for it).
    if not binary and not upgrade:
@@ -644,20 +587,18 @@ def install_cua_driver(upgrade: bool = False) -> bool:
            _print_warning("    curl not found — install manually:")
            _print_info("      https://github.com/trycua/cua/blob/main/libs/cua-driver/README.md")
            return False
-        if not _check_cua_driver_asset_for_arch():
-            return False
        return _run_cua_driver_installer(label="Installing")

    # Already installed and caller didn't ask to upgrade → just confirm.
    if binary and not upgrade:
        try:
            version = subprocess.run(
-                [driver_cmd, "--version"],
+                ["cua-driver", "--version"],
                capture_output=True, text=True, timeout=5,
            ).stdout.strip()
-            _print_success(f"    {driver_cmd} already installed: {version or 'unknown version'}")
+            _print_success(f"    cua-driver already installed: {version or 'unknown version'}")
        except Exception:
-            _print_success(f"    {driver_cmd} already installed.")
+            _print_success("    cua-driver already installed.")
        _print_info("    Grant macOS permissions if not done yet:")
        _print_info("      System Settings > Privacy & Security > Accessibility")
        _print_info("      System Settings > Privacy & Security > Screen Recording")
@@ -668,14 +609,11 @@ def install_cua_driver(upgrade: bool = False) -> bool:
        _print_warning("    curl not found — cannot refresh cua-driver.")
        return bool(binary)

-    if not _check_cua_driver_asset_for_arch():
-        return bool(binary)
-
    if binary:
        # Show before/after version when we have a baseline. Best-effort.
        try:
            before = subprocess.run(
-                [driver_cmd, "--version"],
+                ["cua-driver", "--version"],
                capture_output=True, text=True, timeout=5,
            ).stdout.strip()
        except Exception:
@@ -687,13 +625,13 @@ def install_cua_driver(upgrade: bool = False) -> bool:
    if ok and before:
        try:
            after = subprocess.run(
-                [driver_cmd, "--version"],
+                ["cua-driver", "--version"],
                capture_output=True, text=True, timeout=5,
            ).stdout.strip()
            if after and after != before:
-                _print_success(f"    {driver_cmd} upgraded: {before} → {after}")
+                _print_success(f"    cua-driver upgraded: {before} → {after}")
            elif after:
-                _print_info(f"    {driver_cmd} up to date: {after}")
+                _print_info(f"    cua-driver up to date: {after}")
        except Exception:
            pass
    return ok
@@ -717,12 +655,11 @@ def _run_cua_driver_installer(label: str = "Installing", verbose: bool = True) -
        _print_info(f"    {label} cua-driver (macOS background computer-use)...")
    else:
        _print_info(f"    {label} cua-driver...")
-    driver_cmd = _cua_driver_cmd()
    try:
        result = subprocess.run(install_cmd, shell=True, timeout=300)
-        if result.returncode == 0 and shutil.which(driver_cmd):
+        if result.returncode == 0 and shutil.which("cua-driver"):
            if verbose:
-                _print_success(f"    {driver_cmd} installed.")
+                _print_success("    cua-driver installed.")
                _print_info("    IMPORTANT — grant macOS permissions now:")
                _print_info("      System Settings > Privacy & Security > Accessibility")
                _print_info("      System Settings > Privacy & Security > Screen Recording")
@@ -1569,9 +1506,12 @@ def _plugin_image_gen_providers() -> list[dict]:
    Each returned dict looks like a regular ``TOOL_CATEGORIES`` provider
    row but carries an ``image_gen_plugin_name`` marker so downstream
    code (config writing, model picker) knows to route through the
-    plugin registry. Every image-gen backend is a plugin now — there
-    are no hardcoded rows left in ``TOOL_CATEGORIES["image_gen"]`` for
-    this function to dedupe against (see issue #26241).
+    plugin registry instead of the in-tree FAL backend.
+
+    FAL is skipped — it's already exposed by the hardcoded
+    ``TOOL_CATEGORIES["image_gen"]`` entries. When FAL gets ported to
+    a plugin in a follow-up PR, the hardcoded entries go away and this
+    function surfaces it alongside OpenAI automatically.
    """
    try:
        from agent.image_gen_registry import list_providers
@@ -1584,6 +1524,9 @@ def _plugin_image_gen_providers() -> list[dict]:

    rows: list[dict] = []
    for provider in providers:
+        if getattr(provider, "name", None) == "fal":
+            # FAL has its own hardcoded rows today.
+            continue
        try:
            schema = provider.get_setup_schema()
        except Exception:
@@ -1808,7 +1751,7 @@ _POST_SETUP_INSTALLED: dict = {
    # entry when (a) the post_setup is the ONLY install side-effect for
    # a no-key provider, and (b) an installed-state check is cheap and
    # doesn't trigger a heavy import.
-    "cua_driver": lambda: bool(shutil.which(_cua_driver_cmd())),
+    "cua_driver": lambda: bool(shutil.which("cua-driver")),
 }


@@ -975,13 +975,11 @@ _AUX_TASK_SLOTS: Tuple[str, ...] = (
    "vision",
    "web_extract",
    "compression",
+    "session_search",
    "skills_hub",
    "approval",
    "mcp",
    "title_generation",
-    "triage_specifier",
-    "kanban_decomposer",
-    "profile_describer",
    "curator",
 )

@@ -1,121 +0,0 @@
-Create a professional infographic following these specifications:
-
-## Image Specifications
-
- **Type**: Infographic
- **Layout**: bento-grid
- **Style**: retro-pop-grid
- **Aspect Ratio**: 1:1 (square)
- **Language**: en
-
-## Core Principles
-
- Follow the layout structure precisely for information architecture
- Apply style aesthetics consistently throughout
- Keep information concise, highlight keywords and core concepts
- Use ample whitespace for visual clarity
- Maintain clear visual hierarchy
-
-## Text Requirements
-
- All text must match the specified style treatment
- Main titles should be prominent and readable
- Key concepts should be visually emphasized
- Labels should be clear and appropriately sized
- Use English for all text content
-
-## Layout Guidelines (bento-grid)
-
- Grid of rectangular cells with varied sizes (1x1, 2x1, 1x2, 2x2)
- Hero cell ("ONE TOKEN, EVERY KEY") takes the largest position (top-center or upper-left, 2x2)
- Supporting cells around the hero, mixed cell sizes for rhythm
- Each cell self-contained with its own title + icon + brief content
- Title strip at the top: "BITWARDEN SECRETS MANAGER — HERMES-AGENT PR #30035"
- Footer strip at the bottom with commit SHA + repo
-
-## Style Guidelines (retro-pop-grid)
-
- 1970s retro pop art with strict Swiss international grid
- Background: warm vintage cream/beige (#F5F0E6)
- Accents: salmon pink, sky blue, mustard yellow, mint green — all muted retro tones
- Pure solid black (#000000) and solid white (#FFFFFF) for extreme-contrast cells
- Uniform thick black outlines on ALL illustrations, text boxes, grid dividers
- Pure 2D flat vector aesthetic with subtle screen-print texture
- One cell inverted to black-background-with-white-text for the "NEVER BLOCKS STARTUP" warning section
- Geometric fill patterns in empty cells: checkerboards, diagonal lines, dot grids
- Flat abstract symbols: shields (security), wrenches (install), arrows (rotation), keyholes (auth), checkmarks (tests)
- Vintage comic-style smiley face for "26/26 PASSING" cell
- Bold brutalist or thick retro display fonts for headers; clean sans-serif body
- Decorative stylistic labels acceptable: "WARNING", "NEW DEFAULT", "PINNED", "VERIFIED", "ROTATE"
-
-## Avoid
-
- 3D rendering, gradients, soft shadows, sketch-like lines
- Free-floating elements — everything anchored in grid cells
- Pure white background — must use warm cream/beige
-
---
-
-Generate the infographic based on the content below:
-
-### Title (top strip)
-BITWARDEN SECRETS MANAGER → HERMES-AGENT
-PR #30035
-
-### HERO CELL (largest, top-center, salmon pink background with thick black border)
-ONE TOKEN, EVERY KEY
-Rotate once in the Bitwarden web app.
-Every Hermes process picks it up on next start.
-NEW DEFAULT: override_existing = true
-
-### Cell — LAZY INSTALL (sky blue background)
-~/.hermes/bin/bws
-bws v2.0.0 PINNED
-SHA-256 VERIFIED
-No apt · no brew · no sudo
-Icon: wrench + downward arrow
-
-### Cell — CLI SURFACE (mustard yellow background, checkerboard accents)
-$ hermes secrets bitwarden
-  setup    wizard
-  status   diagnose
-  sync     fetch
-  install  binary
-  disable  off
-Icon: terminal prompt symbol
-
-### Cell — SOURCE OF TRUTH (mint green background)
-BITWARDEN WINS
-Overwrites stale .env on every start
-Bootstrap token never overwritten (exception)
-Icon: keyhole + arrow
-
-### Cell — INVERTED BLACK CELL with WHITE TEXT — NEVER BLOCKS STARTUP (extreme contrast)
-WARNING-FREE STARTUP
-Missing binary → warn + continue
-Bad token → warn + continue
-Network down → warn + continue
-Checksum mismatch → refuse + warn
-30s timeout ceiling
-Icon: white triangle warning sign
-
-### Cell — TESTS (cream with thick black outline, vintage comic smiley face)
-26 / 26
-HERMETIC
-subprocess + urllib mocked
-linux · macos · windows
-x86_64 · arm64
-Icon: comic-style smiley face with checkmark
-
-### Cell — CONFIG YAML (white background with black grid)
-secrets:
-  bitwarden:
-    enabled: true
-    project_id: ...
-    override_existing: true
-    cache_ttl_seconds: 300
-    auto_install: true
-
-### Footer strip (bottom, black-on-cream)
-PR #30035 · commit 7f9b05668 · NousResearch/hermes-agent
-10 files · +1743 / -1 · agent/secret_sources/ · hermes_cli/secrets_cli.py
@@ -1,57 +0,0 @@
-# Hermes-Agent PR #30035 — Bitwarden Secrets Manager Integration
-
-## Hero
-**ONE TOKEN, EVERY KEY**
-Rotate once. Every Hermes process picks it up on next start.
-`secrets.bitwarden.override_existing: true` (default)
-
-## Cells
-
-### Lazy Install
- `bws v2.0.0` pinned
- Downloaded into `~/.hermes/bin/bws`
- SHA-256 verified vs GitHub Releases checksum file
- No apt, no brew, no sudo
- Cross-platform: linux gnu+musl, macos universal, windows x86_64+arm64
-
-### CLI Surface
- `hermes secrets bitwarden setup`     wizard
- `hermes secrets bitwarden status`    diagnose
- `hermes secrets bitwarden sync`      dry-run / --apply
- `hermes secrets bitwarden install`   binary only
- `hermes secrets bitwarden disable`   off switch
-
-### Source of Truth
- Bitwarden WINS on every Hermes start
- BSM values overwrite stale `.env` lines
- Rotate a key once → all your machines reload it
- Bootstrap token `BWS_ACCESS_TOKEN` is the lone exception (never overwritten)
-
-### Never Blocks Startup
- Missing binary → warn + continue
- Bad token → warn + continue
- Checksum mismatch → refuse install + warn
- No network → warn + continue
- Timeout → 30s ceiling, warn + continue
-
-### Tests
- 26/26 passing, hermetic
- subprocess + urllib mocked
- Platform matrix tested (linux, macos, windows × x86_64, arm64)
- Cache hit/miss, auth fail, non-JSON, timeout, override behavior
-
-### Config
-```yaml
-secrets:
-  bitwarden:
-    enabled: true
-    project_id: <uuid>
-    override_existing: true   # NEW DEFAULT
-    cache_ttl_seconds: 300
-    auto_install: true
-```
-
-## Footer
-PR #30035 · commit 7f9b05668 · NousResearch/hermes-agent
-
-10 files changed · +1743 / -1 · agent/secret_sources/ · hermes_cli/secrets_cli.py · tests · docs
@@ -1,85 +0,0 @@
-Create a professional infographic following these specifications:
-
-## Image Specifications
-
- **Type**: Infographic
- **Layout**: bento-grid
- **Style**: technical-schematic (engineering blueprint variant)
- **Aspect Ratio**: 1:1 (square)
- **Language**: English
-
-## Core Principles
-
- Follow the bento-grid layout precisely with varied cell sizes
- Apply technical-schematic aesthetics consistently throughout
- Keep information concise, highlight keywords and core concepts
- Use ample whitespace for visual clarity
- Maintain clear visual hierarchy with a hero cell for the headline metric
-
-## Style Guidelines (technical-schematic blueprint)
-
- Color palette: deep blue background (#1E3A5F), white lines and text, amber accent (#F59E0B) ONLY on the hero metric and critical deltas, cyan callouts for measurement annotations
- Grid pattern overlay across the entire canvas — fine white grid lines on the deep blue background
- All-caps technical stencil typography for headers; clean sans-serif for body
- Dimension lines with arrowheads connecting metrics to their cells
- Technical symbols where appropriate (gear icons, flow arrows, modular block diagrams)
- Consistent stroke weights — bold for cell borders, thin for grid, medium for connector lines
- Engineering spec-sheet aesthetic: feels like a printed architectural blueprint, austere and precise
-
-## Layout Guidelines (bento-grid)
-
- Hero cell (TOP-CENTER or LEFT, occupying ~40% of canvas): "−61 COMPLEXITY · 79 → 18" headline metric in massive amber-on-blue, with subtitle "convert_messages_to_anthropic refactored"
- 7 helper cells in a 2x4 or 3x3 grid showing each extracted helper as its own modular block — each cell has the helper name in all-caps, its complexity number, and one-line role
- Metrics strip cell: BEFORE/AFTER table with deltas (185 statements → ~70, 79 C → 18 C, +5 violations intentional)
- Test validation cell: "152/152 + 213/213 PASS" with checkmark stencil
- Footer strip across bottom: "PR #27784 · agent/anthropic_adapter.py · @kshitijk4poor · NousResearch/hermes-agent"
-
-## Content to render
-
-**Main title (top of canvas, all caps):** "ANTHROPIC ADAPTER · 1-INTO-7 EXTRACTION"
-**Subtitle:** "PR #27784 — convert_messages_to_anthropic refactor"
-
-**Hero cell (largest, amber accent):**
- "−61"
- "CYCLOMATIC COMPLEXITY"
- "79 → 18 MAX (−77%)"
- Subtext: "convert_messages_to_anthropic · pure code motion · zero behavior change"
-
-**7 helper cells (one per helper, each its own modular block):**
-
-1. _convert_assistant_message · C<10 · "Assistant msg → content blocks"
-2. _convert_tool_message_to_result · C=12 · "Tool msg → tool_result + merge"
-3. _convert_user_message · C<10 · "User msg validation"
-4. _strip_orphaned_tool_blocks · C=15 · "Orphan tool_use removal"
-5. _merge_consecutive_roles · C=13 · "Anthropic role-alternation"
-6. _manage_thinking_signatures · C=18 · "Strip/preserve by endpoint"
-7. _evict_old_screenshots · C<10 · "Keep most recent 3 images"
-
-**Metrics cell (table format with arrows):**
- MAX FUNCTION COMPLEXITY: 79 → 18 (−77%)
- MAX STATEMENTS/FUNCTION: 185 → ~70 (−62%)
- LOC FILE-WIDE: −4
- MAIN FUNCTION LOC: 395 → 63
-
-**Test validation cell (checkmark stencil):**
- test_anthropic_adapter.py: 152/152 PASS
- test_auxiliary_client.py: 172/172 PASS
- test_azure_identity_adapter.py: 39/39 PASS
- test_bedrock_1m_context.py: 2/2 PASS
-
-**Behavior preservation cell:**
-"ZERO LOGIC CHANGES · ANTHROPIC + KIMI + DEEPSEEK + MINIMAX + AZURE FOUNDRY + BEDROCK SEMANTICS PRESERVED"
-
-**Footer strip:**
-"PR #27784 · agent/anthropic_adapter.py · cherry-picked from #23968 · @kshitijk4poor · NousResearch/hermes-agent"
-
-## Text Requirements
-
- All text in English, all-caps for headers
- Hero metric "−61" in amber (#F59E0B), oversized, with thick blueprint stencil treatment
- Helper names in white technical stencil
- Complexity numbers (C=12, C=18, etc.) in cyan callouts
- "BEFORE" labels in white-on-blue, "AFTER" labels in amber-on-blue
- Footer in small white stencil
-
-Generate the infographic now as a square engineering blueprint.
@@ -1,66 +0,0 @@
-# Infographic: PR #27784 — convert_messages_to_anthropic refactor
-
-## Hero metric
-**−61 cyclomatic complexity** in `agent/anthropic_adapter.py` (79 → 18 max).
-**−4 LOC** net file-wide. **77% drop** in single-function complexity ceiling.
-
-## Title
-ANTHROPIC ADAPTER · 1-INTO-7 EXTRACTION
-PR #27784 · agent/anthropic_adapter.py · @kshitijk4poor
-
-## Section 1: BEFORE (left side)
-**convert_messages_to_anthropic**
- 185 statements
- 90 branches
- Cyclomatic: 79
- Did 7 jobs in one function
-
-Inline responsibilities mixed together:
-1. Walk + dispatch by role
-2. Tool-result conversion
-3. Orphan tool-use stripping
-4. Same-role merging
-5. Thinking-signature management
-6. Screenshot eviction
-7. Final assembly
-
-## Section 2: AFTER (right side)
-**convert_messages_to_anthropic** — now 63 lines, C<10
-Plus 7 single-responsibility helpers:
-
-| Helper | C | Role |
-|---|---|---|
-| _convert_assistant_message | <10 | Assistant msg → content blocks |
-| _convert_tool_message_to_result | 12 | Tool msg → tool_result + merge |
-| _convert_user_message | <10 | User msg validation + conversion |
-| _strip_orphaned_tool_blocks | 15 | Strip orphan tool_use + tool_result |
-| _merge_consecutive_roles | 13 | Anthropic role-alternation enforce |
-| _manage_thinking_signatures | 18 | Strip/preserve/downgrade by endpoint |
-| _evict_old_screenshots | <10 | Keep most recent 3 images |
-
-## Section 3: METRICS
-| Metric | Before | After | Δ |
-|---|---:|---:|---:|
-| Max function complexity | 79 | 18 | −77% |
-| Max statements/function | 185 | ~70 | −62% |
-| LOC (file-wide) | — | — | **−4** |
-| C901 violations | 3 | 8 | +5 (intentional split) |
-
-## Section 4: ZERO BEHAVIOR CHANGE
- Pure code motion — no logic edits
- Mutating helpers update `result` in place (same as inline)
- `_merge_consecutive_roles` returns new list — caller rebinds
- Anthropic / Kimi / DeepSeek / MiniMax / Azure Foundry / Bedrock semantics preserved
- Thinking-signature handling identical to pre-refactor
-
-## Section 5: TEST VALIDATION
- tests/agent/test_anthropic_adapter.py — **152 / 152 pass**
- tests/agent/test_auxiliary_client.py — **172 / 172 pass**
- tests/agent/test_azure_identity_adapter.py — **39 / 39 pass**
- tests/agent/test_bedrock_1m_context.py — **2 / 2 pass**
-
-## Footer
-File: agent/anthropic_adapter.py
-Original PR: #27784 (cherry-pick of #23968)
-Salvage commit: 9c102b937 (kshitijk4poor authorship preserved)
-Repo: NousResearch/hermes-agent
@@ -4,7 +4,7 @@ let
  src = ../ui-tui;
  npmDeps = pkgs.fetchNpmDeps {
    inherit src;
-    hash = "sha256-F6/MzZOWc0zhW9mIfnaY+PrllPvJcsA/OdFdEM+NpLY=";
+    hash = "sha256-dNL/J4tyQQ7Ji3xfIE5b5Jdi6rQyCFjqYpzLYftJVdc=";
  };

  npm = hermesNpmLib.mkNpmPassthru { folder = "ui-tui"; attr = "tui"; pname = "hermes-tui"; };
@@ -4,7 +4,7 @@ let
  src = ../web;
  npmDeps = pkgs.fetchNpmDeps {
    inherit src;
-    hash = "sha256-xSsyluzU2lNhwGqB6XMCGMv3QFHZizE6hgUyc1jvyOw=";
+    hash = "sha256-GxSmEpclOwmv94KmGMediPITxqXAsxqTEQOoDIbYkUw=";
  };

  npm = hermesNpmLib.mkNpmPassthru { folder = "web"; attr = "web"; pname = "hermes-web"; };
@@ -148,7 +148,7 @@ class BrowserUseBrowserProvider(BrowserProvider):

        return {
            "api_key": managed.nous_user_token,
-            "base_url": managed.resolved_origin.rstrip("/"),
+            "base_url": managed.gateway_origin.rstrip("/"),
            "managed_mode": True,
        }

@@ -1,182 +0,0 @@
-"""FAL.ai image generation backend.
-
-Wraps the 18-model FAL catalog (FLUX 2, Z-Image, Nano Banana, GPT
-Image 1.5, Recraft, Imagen 4, Qwen, Ideogram, …) as an
-:class:`ImageGenProvider` implementation.
-
-The heavy lifting — model catalog, payload construction, request
-submission, managed-Nous-gateway selection, Clarity Upscaler chaining
-— lives in :mod:`tools.image_generation_tool`. This plugin reaches into
-that module via call-time indirection (``import tools.image_generation_tool as _it``)
-so:
-
-* the existing test suite (``tests/tools/test_image_generation.py``,
-  ``tests/tools/test_managed_media_gateways.py``) keeps patching
-  ``image_tool._submit_fal_request`` / ``image_tool.fal_client`` /
-  ``image_tool._managed_fal_client`` without modification, and
-* there's exactly one canonical FAL code path on disk — the plugin is a
-  registration adapter, not a parallel implementation.
-
-See issue #26241 for the migration plan and the
-``plugin-extraction-test-patch-compatibility.md`` rules this follows.
-"""
-
-from __future__ import annotations
-
-import json
-import logging
-import os
-from typing import Any, Dict, List, Optional
-
-from agent.image_gen_provider import (
-    DEFAULT_ASPECT_RATIO,
-    ImageGenProvider,
-    resolve_aspect_ratio,
-)
-
-logger = logging.getLogger(__name__)
-
-
-class FalImageGenProvider(ImageGenProvider):
-    """FAL.ai image generation backend.
-
-    Delegates to ``tools.image_generation_tool.image_generate_tool`` so
-    the in-tree FAL implementation (model catalog, payload builder,
-    managed-gateway selection, Clarity Upscaler chaining) is the single
-    source of truth. Everything is resolved at call time via the
-    ``_it`` indirection so tests can monkey-patch the legacy module.
-    """
-
-    @property
-    def name(self) -> str:
-        return "fal"
-
-    @property
-    def display_name(self) -> str:
-        return "FAL.ai"
-
-    def is_available(self) -> bool:
-        # Available when direct FAL_KEY is set OR the managed Nous
-        # gateway resolves a fal-queue origin. Both checks come from the
-        # legacy module so this provider tracks whatever logic ships
-        # there.
-        import tools.image_generation_tool as _it
-        try:
-            return bool(_it.check_fal_api_key())
-        except Exception:  # noqa: BLE001 — defensive; never break the picker
-            return False
-
-    def list_models(self) -> List[Dict[str, Any]]:
-        import tools.image_generation_tool as _it
-        return [
-            {
-                "id": model_id,
-                "display": meta.get("display", model_id),
-                "speed": meta.get("speed", ""),
-                "strengths": meta.get("strengths", ""),
-                "price": meta.get("price", ""),
-            }
-            for model_id, meta in _it.FAL_MODELS.items()
-        ]
-
-    def default_model(self) -> Optional[str]:
-        import tools.image_generation_tool as _it
-        return _it.DEFAULT_MODEL
-
-    def get_setup_schema(self) -> Dict[str, Any]:
-        return {
-            "name": "FAL.ai",
-            "badge": "paid",
-            "tag": "Pick from flux-2-klein, flux-2-pro, gpt-image, nano-banana, etc.",
-            "env_vars": [
-                {
-                    "key": "FAL_KEY",
-                    "prompt": "FAL API key",
-                    "url": "https://fal.ai/dashboard/keys",
-                },
-            ],
-        }
-
-    def generate(
-        self,
-        prompt: str,
-        aspect_ratio: str = DEFAULT_ASPECT_RATIO,
-        **kwargs: Any,
-    ) -> Dict[str, Any]:
-        """Generate an image via the legacy FAL pipeline.
-
-        Forwards prompt + aspect_ratio (and any forward-compat extras
-        the schema supports) into :func:`tools.image_generation_tool.image_generate_tool`,
-        then reshapes its JSON-string response into the provider-ABC
-        dict format consumed by ``_dispatch_to_plugin_provider``.
-        """
-        import tools.image_generation_tool as _it
-
-        aspect = resolve_aspect_ratio(aspect_ratio)
-        passthrough = {
-            key: kwargs[key]
-            for key in (
-                "num_inference_steps",
-                "guidance_scale",
-                "num_images",
-                "output_format",
-                "seed",
-            )
-            if key in kwargs and kwargs[key] is not None
-        }
-
-        try:
-            raw = _it.image_generate_tool(
-                prompt=prompt,
-                aspect_ratio=aspect,
-                **passthrough,
-            )
-        except Exception as exc:  # noqa: BLE001 — never raise out of generate
-            logger.warning("FAL image_generate_tool raised: %s", exc, exc_info=True)
-            return {
-                "success": False,
-                "image": None,
-                "error": f"FAL image generation failed: {exc}",
-                "error_type": type(exc).__name__,
-                "provider": "fal",
-                "prompt": prompt,
-                "aspect_ratio": aspect,
-            }
-
-        try:
-            response = json.loads(raw) if isinstance(raw, str) else raw
-        except Exception:  # noqa: BLE001
-            response = {"success": False, "image": None, "error": "Invalid JSON from FAL pipeline"}
-
-        if not isinstance(response, dict):
-            response = {
-                "success": False,
-                "image": None,
-                "error": "FAL pipeline returned a non-dict response",
-                "error_type": "provider_contract",
-            }
-
-        # Stamp provider/prompt/aspect_ratio so downstream consumers see
-        # the uniform shape declared in ``agent.image_gen_provider``.
-        response.setdefault("provider", "fal")
-        response.setdefault("prompt", prompt)
-        response.setdefault("aspect_ratio", aspect)
-        # Annotate model best-effort — the legacy pipeline resolves it
-        # internally, so query it after the fact for the response shape.
-        if "model" not in response:
-            try:
-                model_id, _meta = _it._resolve_fal_model()
-                response["model"] = model_id
-            except Exception:  # noqa: BLE001
-                pass
-        return response
-
-
-# ---------------------------------------------------------------------------
-# Plugin entry point
-# ---------------------------------------------------------------------------
-
-
-def register(ctx) -> None:
-    """Plugin entry point — wire ``FalImageGenProvider`` into the registry."""
-    ctx.register_image_gen_provider(FalImageGenProvider())
@@ -1,7 +0,0 @@
-name: fal
-version: 1.0.0
-description: "FAL.ai image generation backend (flux-2-klein, flux-2-pro, nano-banana, gpt-image-1.5, recraft-v3, etc.)."
-author: NousResearch
-kind: backend
-requires_env:
-  - FAL_KEY
@@ -47,25 +47,6 @@ _DEFAULT_ENDPOINT = "http://127.0.0.1:1933"
 _TIMEOUT = 30.0
 _REMOTE_RESOURCE_PREFIXES = ("http://", "https://", "git@", "ssh://", "git://")

-# Maps the viking_remember `category` enum to a viking:// subdirectory.
-# Keep in sync with REMEMBER_SCHEMA.parameters.properties.category.enum.
-_CATEGORY_SUBDIR_MAP = {
-    "preference": "preferences",
-    "entity": "entities",
-    "event": "events",
-    "case": "cases",
-    "pattern": "patterns",
-}
-_DEFAULT_MEMORY_SUBDIR = "preferences"
-
-# Maps the built-in memory tool's `target` ("user" vs "memory") to a subdir
-# for on_memory_write mirroring. User profile facts → preferences; agent
-# notes / observations → patterns. Anything unknown falls back to the default.
-_MEMORY_WRITE_TARGET_SUBDIR_MAP = {
-    "user": "preferences",
-    "memory": "patterns",
-}
-

 # ---------------------------------------------------------------------------
 # Process-level atexit safety net — ensures pending sessions are committed
@@ -626,35 +607,24 @@ class OpenVikingMemoryProvider(MemoryProvider):
        except Exception as e:
            logger.warning("OpenViking session commit failed: %s", e)

-    def _build_memory_uri(self, subdir: str) -> str:
-        """Build a viking:// memory URI under the configured user/subdir."""
-        slug = uuid.uuid4().hex[:12]
-        return f"viking://user/{self._user}/memories/{subdir}/mem_{slug}.md"
-
-    def on_memory_write(
-        self,
-        action: str,
-        target: str,
-        content: str,
-        metadata: Optional[Dict[str, Any]] = None,
-    ) -> None:
-        """Mirror built-in memory writes to OpenViking via content/write."""
+    def on_memory_write(self, action: str, target: str, content: str) -> None:
+        """Mirror built-in memory writes to OpenViking as explicit memories."""
        if not self._client or action != "add" or not content:
            return

-        subdir = _MEMORY_WRITE_TARGET_SUBDIR_MAP.get(target, _DEFAULT_MEMORY_SUBDIR)
-        uri = self._build_memory_uri(subdir)
-
        def _write():
            try:
                client = _VikingClient(
                    self._endpoint, self._api_key,
                    account=self._account, user=self._user, agent=self._agent,
                )
-                client.post("/api/v1/content/write", {
-                    "uri": uri,
-                    "content": content,
-                    "mode": "create",
+                # Add as a user message with memory context so the commit
+                # picks it up as an explicit memory during extraction
+                client.post(f"/api/v1/sessions/{self._session_id}/messages", {
+                    "role": "user",
+                    "parts": [
+                        {"type": "text", "text": f"[Memory note — {target}] {content}"},
+                    ],
                })
            except Exception as e:
                logger.debug("OpenViking memory mirror failed: %s", e)
@@ -888,27 +858,24 @@ class OpenVikingMemoryProvider(MemoryProvider):
        if not content:
            return tool_error("content is required")

+        # Store as a session message that will be extracted during commit.
+        # The category hint helps OpenViking's extraction classify correctly.
        category = args.get("category", "")
-        subdir = _CATEGORY_SUBDIR_MAP.get(category, _DEFAULT_MEMORY_SUBDIR)
-        uri = self._build_memory_uri(subdir)
+        text = f"[Remember] {content}"
+        if category:
+            text = f"[Remember — {category}] {content}"

-        # Write directly via content/write API.
-        # This creates the file, stores the content, and queues vector indexing
-        # in a single call — no dependency on session commit / VLM extraction.
-        try:
-            result = self._client.post("/api/v1/content/write", {
-                "uri": uri,
-                "content": content,
-                "mode": "create",
-            })
-            written = result.get("result", {}).get("written_bytes", 0)
-            return json.dumps({
-                "status": "stored",
-                "message": f"Memory stored ({written}b) and queued for vector indexing.",
-            })
-        except Exception as e:
-            logger.error("OpenViking content/write failed: %s", e)
-            return tool_error(f"Failed to store memory: {e}")
+        self._client.post(f"/api/v1/sessions/{self._session_id}/messages", {
+            "role": "user",
+            "parts": [
+                {"type": "text", "text": text},
+            ],
+        })
+
+        return json.dumps({
+            "status": "stored",
+            "message": "Memory recorded. Will be extracted and indexed on session commit.",
+        })

    def _tool_add_resource(self, args: dict) -> str:
        url = args.get("url", "")
@@ -282,24 +282,20 @@ def _build_payload(


 # ---------------------------------------------------------------------------
-# fal_client lazy import (shared with image_generation_tool via fal_common)
+# fal_client lazy import (same pattern as image_generation_tool)
 # ---------------------------------------------------------------------------

 _fal_client: Any = None


 def _load_fal_client() -> Any:
-    """Lazy-load the ``fal_client`` SDK and cache it on this module.
-
-    Delegates the actual import to :func:`tools.fal_common.import_fal_client`
-    so the ``lazy_deps`` ensure-install handling stays in one place.
-    """
    global _fal_client
    if _fal_client is not None:
        return _fal_client
-    from tools.fal_common import import_fal_client
-    _fal_client = import_fal_client()
-    return _fal_client
+    import fal_client  # type: ignore
+
+    _fal_client = fal_client
+    return fal_client


 # ---------------------------------------------------------------------------
@@ -238,7 +238,7 @@ def _get_firecrawl_client() -> Any:

        kwargs = {
            "api_key": managed_gateway.nous_user_token,
-            "api_url": managed_gateway.resolved_origin,
+            "api_url": managed_gateway.gateway_origin,
        }
        client_config = (
            "tool-gateway",
@@ -84,7 +84,7 @@ modal = ["modal==1.3.4"]
 daytona = ["daytona==0.155.0"]
 vercel = ["vercel==0.5.7"]
 hindsight = ["hindsight-client==0.6.1"]
-dev = ["debugpy==1.8.20", "pytest==9.0.2", "pytest-asyncio==1.3.0", "pytest-timeout==2.4.0", "mcp==1.26.0", "ty==0.0.21", "ruff==0.15.10"]
+dev = ["debugpy==1.8.20", "pytest==9.0.2", "pytest-asyncio==1.3.0", "pytest-xdist==3.8.0", "pytest-split==0.11.0", "pytest-timeout==2.4.0", "mcp==1.26.0", "ty==0.0.21", "ruff==0.15.10"]
 messaging = ["python-telegram-bot[webhooks]==22.6", "discord.py[voice]==2.7.1", "aiohttp==3.13.3", "brotlicffi==1.2.0.1", "slack-bolt==1.27.0", "slack-sdk==3.40.1", "qrcode==7.4.2"]
 cron = []  # croniter is now a core dependency; this extra kept for back-compat
 slack = ["slack-bolt==1.27.0", "slack-sdk==3.40.1", "aiohttp==3.13.3"]
@@ -232,12 +232,16 @@ markers = [
    "integration: marks tests requiring external services (API keys, Modal, etc.)",
    "real_concurrent_gate: opt out of the autouse stub that disables _detect_concurrent_hermes_instances",
 ]
-# pytest-timeout: per-test 30s hard cap with signal method.
-# This is the fallback inside each per-file pytest subprocess (see
-# scripts/run_tests_parallel.py). Per-file isolation gives every test
-# file a fresh Python interpreter; pytest-timeout catches Python-level
-# hangs within a file.
-addopts = "-m 'not integration' --timeout=30 --timeout-method=signal"
+# pytest-timeout: per-test 60s hard cap with thread method.
+# Discovered May 2026: the suite reliably hangs at ~96% on full runs even
+# though every individual test completes in <30s. Root cause is leaked
+# threads / atexit handlers accumulating across thousands of tests until
+# something deadlocks at session teardown. Adding pytest-timeout (with
+# thread method, which forces an interrupt into the test thread) breaks
+# the deadlock — the suite then completes cleanly. The 60s cap is large
+# enough that no legitimate test trips it; if a test exceeds it that's a
+# real bug worth surfacing as a Timeout failure.
+addopts = "-m 'not integration' -n auto --timeout=30 --timeout-method=signal"

 [tool.ty.environment]
 python-version = "3.13"
@@ -3357,25 +3357,6 @@ class AIAgent:
            return content

        if self._model_supports_vision():
-            # Vision-capable on paper — but if we've already learned in this
-            # session that the active (provider, model) rejects list-type
-            # tool content (e.g. Xiaomi MiMo's 400 "text is not set"),
-            # short-circuit to a text summary so we don't burn another
-            # round-trip relearning the same lesson.  Cache populated by
-            # the 400 recovery path in agent.conversation_loop.  Transient
-            # per-session; next session retries.
-            key = (
-                (getattr(self, "provider", "") or "").strip().lower(),
-                (getattr(self, "model", "") or "").strip(),
-            )
-            no_list = getattr(self, "_no_list_tool_content_models", None)
-            if no_list and key in no_list:
-                logger.debug(
-                    "Tool %s: model %s/%s known to reject list-type tool "
-                    "content this session — sending text summary",
-                    tool_name, key[0], key[1],
-                )
-                return _multimodal_text_summary(result)
            return content

        summary = _multimodal_text_summary(result)
@@ -3404,80 +3385,6 @@ class AIAgent:
        from agent.conversation_compression import try_shrink_image_parts_in_messages
        return try_shrink_image_parts_in_messages(api_messages)

-    def _try_strip_image_parts_from_tool_messages(self, api_messages: list) -> bool:
-        """Downgrade list-type tool messages to text summaries in-place.
-
-        Recovery path for providers that reject list-type tool message content
-        (e.g. Xiaomi MiMo's 400 "text is not set"; see issue #27344).  Walks
-        ``api_messages`` for any ``role: "tool"`` message whose ``content`` is
-        a list containing image parts, replaces the content with the existing
-        text part(s) (or a minimal placeholder if none survive), and records
-        the active (provider, model) in ``self._no_list_tool_content_models``
-        so subsequent ``_tool_result_content_for_active_model`` calls in this
-        session preemptively downgrade screenshots without a round-trip.
-
-        Returns True when at least one tool message was downgraded — the
-        caller (the 400 recovery branch in ``agent.conversation_loop``) uses
-        this to decide whether to retry the API call with the modified
-        history or surface the original error.
-        """
-        if not isinstance(api_messages, list):
-            return False
-
-        # Record (provider, model) so we don't relearn this lesson.
-        key = (
-            (getattr(self, "provider", "") or "").strip().lower(),
-            (getattr(self, "model", "") or "").strip(),
-        )
-        if not hasattr(self, "_no_list_tool_content_models"):
-            self._no_list_tool_content_models = set()
-        if key[1]:  # only record when we actually have a model id
-            self._no_list_tool_content_models.add(key)
-
-        changed = False
-        for msg in api_messages:
-            if not isinstance(msg, dict) or msg.get("role") != "tool":
-                continue
-            content = msg.get("content")
-            if not isinstance(content, list):
-                continue
-
-            # Salvage any text parts so the model still sees some signal.
-            text_parts: List[str] = []
-            had_image = False
-            for part in content:
-                if not isinstance(part, dict):
-                    if isinstance(part, str) and part.strip():
-                        text_parts.append(part.strip())
-                    continue
-                ptype = part.get("type")
-                if ptype == "image_url" or ptype == "input_image":
-                    had_image = True
-                    continue
-                if ptype in {"text", "input_text"}:
-                    text = str(part.get("text") or "").strip()
-                    if text:
-                        text_parts.append(text)
-
-            if not had_image:
-                # List-type content but no image parts — leave alone (some
-                # providers reject ANY list content, but stripping a
-                # text-only list doesn't reduce ambiguity; let the caller
-                # surface the original error if this turns out to be the
-                # case).
-                continue
-
-            if text_parts:
-                msg["content"] = "\n\n".join(text_parts)
-            else:
-                msg["content"] = (
-                    "[image content removed — provider does not accept "
-                    "list-type tool message content]"
-                )
-            changed = True
-
-        return changed
-
    def _anthropic_preserve_dots(self) -> bool:
        """True when using an anthropic-compatible endpoint that preserves dots in model names.
        Alibaba/DashScope keeps dots (e.g. qwen3.5-plus).
@@ -47,9 +47,7 @@ ACP_REGISTRY_MANIFEST = REPO_ROOT / "acp_registry" / "agent.json"
 AUTHOR_MAP = {
    # teknium (multiple emails)
    "teknium1@gmail.com": "teknium1",
-    "cipherframe@users.noreply.github.com": "CipherFrame",
    "me@promplate.dev": "CNSeniorious000",
-    "yichengqiao21@gmail.com": "YarrowQiao",
    "erhanyasarx@gmail.com": "erhnysr",
    "30366221+WorldWriter@users.noreply.github.com": "WorldWriter",
    "dafeng@DafengdeMacBook-Pro.local": "WorldWriter",
@@ -60,18 +58,13 @@ AUTHOR_MAP = {
    "mgongzai@gmail.com": "vKongv",
    "0x.badfriend@gmail.com": "discodirector",
    "altriatree@gmail.com": "TruaShamu",
-    "contact-me@stark-x.cn": "Stark-X",
-    "nat@nthrow.io": "nthrow",
    "m@mobrienv.dev": "mikeyobrien",
    "saeed919@pm.me": "falasi",
-    "chrisdlc119@outlook.com": "chdlc",
    "omar@techdeveloper.site": "nycomar",
    "qiyin.zuo@pcitc.com": "qiyin-code",
    "mr.aashiz@gmail.com": "aashizpoudel",
    "70629228+shaun0927@users.noreply.github.com": "shaun0927",
    "98262967+Bihruze@users.noreply.github.com": "Bihruze",
-    "189280367+Lempkey@users.noreply.github.com": "Lempkey",
-    "leovillalbajr@gmail.com": "Lempkey",
    "nidhi2894@gmail.com": "nidhi-singh02",
    "30312689+aashizpoudel@users.noreply.github.com": "aashizpoudel",
    "oleksii.lisikh@gmail.com": "olisikh",
@@ -935,8 +928,6 @@ AUTHOR_MAP = {
    "holynn@placeholder.local": "holynn-q",
    "agent@hermes.local": "jacdevos",
    "sunsky.lau@gmail.com": "liuhao1024",
-    "fabianoeq@gmail.com": "rodrigoeqnit",
-    "178342791+sgtworkman@users.noreply.github.com": "sgtworkman",
    "qiuqfang98@qq.com": "keepcalmqqf",
    "261867348+ai-ag2026@users.noreply.github.com": "ai-ag2026",
    "yanzh.su@gmail.com": "YanzhongSu",
@@ -3,36 +3,29 @@
 # `pytest` directly to guarantee your local run matches CI behavior.
 #
 # What this script enforces:
-#   * Per-file isolation via scripts/run_tests_parallel.py — each test
-#     file runs in its own freshly-spawned `python -m pytest <file>`
-#     subprocess. No xdist, no shared workers, no module-level leakage
-#     between files.
+#   * -n 4 xdist workers (CI has 4 cores; -n auto diverges locally)
 #   * TZ=UTC, LANG=C.UTF-8, PYTHONHASHSEED=0 (deterministic)
-#   * Env vars blanked (conftest.py also does this, but this
-#     is belt-and-suspenders for anyone running pytest outside our
-#     conftest path — e.g. on a single file)
-#   * Proper venv activation (probes .venv, venv, then ~/.hermes/...)
+#   * Credential env vars blanked (conftest.py also does this, but this
+#     is belt-and-suspenders for anyone running `pytest` outside of
+#     our conftest path — e.g. calling pytest on a single file)
+#   * Proper venv activation
 #
 # Usage:
-#   scripts/run_tests.sh                            # full suite
-#   scripts/run_tests.sh -j 4                       # cap parallelism
-#   scripts/run_tests.sh tests/agent/               # discover only here
-#   scripts/run_tests.sh tests/agent/ tests/acp/    # multiple roots
-#   scripts/run_tests.sh tests/foo.py               # single file
-#   scripts/run_tests.sh tests/foo.py -- --tb=long  # path + pytest args
-#   scripts/run_tests.sh -- -v --tb=long            # pytest args only
-#
-# Everything after a literal '--' is passed through to each per-file
-# pytest invocation. Positional path arguments before '--' override
-# the default discovery root (tests/).
+#   scripts/run_tests.sh                     # full suite
+#   scripts/run_tests.sh tests/agent/        # one directory
+#   scripts/run_tests.sh tests/agent/test_foo.py::TestClass::test_method
+#   scripts/run_tests.sh --tb=long -v        # pass-through pytest args

 set -euo pipefail

 # ── Locate repo root ────────────────────────────────────────────────────────
+# Works whether this is the main checkout or a worktree.
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"

 # ── Activate venv ───────────────────────────────────────────────────────────
+# Prefer a .venv in the current tree, fall back to the main checkout's venv
+# (useful for worktrees where we don't always duplicate the venv).
 VENV=""
 for candidate in "$REPO_ROOT/.venv" "$REPO_ROOT/venv" "$HOME/.hermes/hermes-agent/venv"; do
  if [ -f "$candidate/bin/activate" ]; then
@@ -48,31 +41,94 @@ fi

 PYTHON="$VENV/bin/python"

-
-# ── Live-gateway plugin (computed before we drop env) ───────────────────────
-EXTRA_PYTHONPATH=""
-EXTRA_PYTEST_PLUGINS=""
-if [ -f "$HOME/.hermes/pytest_live_guard.py" ]; then
-  EXTRA_PYTHONPATH="$HOME/.hermes"
-  EXTRA_PYTEST_PLUGINS="pytest_live_guard"
+# ── Ensure pytest-split is installed (required for shard-equivalent runs) ──
+if ! "$PYTHON" -c "import pytest_split" 2>/dev/null; then
+  echo "→ installing pytest-split into $VENV"
+  if command -v uv >/dev/null 2>&1; then
+    uv pip install --python "$PYTHON" --quiet "pytest-split>=0.9,<1"
+  elif "$PYTHON" -m pip --version >/dev/null 2>&1; then
+    "$PYTHON" -m pip install --quiet "pytest-split>=0.9,<1"
+  else
+    echo "error: neither uv nor pip is available in $VENV — pytest-split is missing" >&2
+    echo "  fix: run  uv pip install -e \".[dev]\"  from $REPO_ROOT" >&2
+    exit 1
+  fi
 fi

+# ── Hermetic environment ────────────────────────────────────────────────────
+# Mirror what CI does in .github/workflows/tests.yml + what conftest.py does.
+# Unset every credential-shaped var currently in the environment.
+while IFS='=' read -r name _; do
+  case "$name" in
+    *_API_KEY|*_TOKEN|*_SECRET|*_PASSWORD|*_CREDENTIALS|*_ACCESS_KEY| \
+    *_SECRET_ACCESS_KEY|*_PRIVATE_KEY|*_OAUTH_TOKEN|*_WEBHOOK_SECRET| \
+    *_ENCRYPT_KEY|*_APP_SECRET|*_CLIENT_SECRET|*_CORP_SECRET|*_AES_KEY| \
+    AWS_ACCESS_KEY_ID|AWS_SECRET_ACCESS_KEY|AWS_SESSION_TOKEN|FAL_KEY| \
+    GH_TOKEN|GITHUB_TOKEN)
+      unset "$name"
+      ;;
+  esac
+done < <(env)

-# ── Run in hermetic env ──────────────────────────────────────────────────────
-# env -i: start with empty environment, opt-in only what we need.
-# No credential var can leak — you'd have to explicitly add it here.
-echo "▶ running per-file parallel test suite via run_tests_parallel.py"
-echo "  (TZ=UTC LANG=C.UTF-8 PYTHONHASHSEED=0; clean env)"
+# Unset HERMES_* behavioral vars too.
+unset HERMES_YOLO_MODE HERMES_INTERACTIVE HERMES_QUIET HERMES_TOOL_PROGRESS \
+      HERMES_TOOL_PROGRESS_MODE HERMES_MAX_ITERATIONS HERMES_SESSION_PLATFORM \
+      HERMES_SESSION_CHAT_ID HERMES_SESSION_CHAT_NAME HERMES_SESSION_THREAD_ID \
+      HERMES_SESSION_SOURCE HERMES_SESSION_KEY HERMES_GATEWAY_SESSION \
+      HERMES_CRON_SESSION \
+      HERMES_PLATFORM HERMES_INFERENCE_PROVIDER HERMES_MANAGED HERMES_DEV \
+      HERMES_CONTAINER HERMES_EPHEMERAL_SYSTEM_PROMPT HERMES_TIMEZONE \
+      HERMES_REDACT_SECRETS HERMES_BACKGROUND_NOTIFICATIONS HERMES_EXEC_ASK \
+      HERMES_HOME_MODE 2>/dev/null || true

+# Pin deterministic runtime.
+export TZ=UTC
+export LANG=C.UTF-8
+export LC_ALL=C.UTF-8
+export PYTHONHASHSEED=0
+
+# ── Live-gateway test guard (developer machines) ────────────────────────────
+# If a system-wide hermes pytest_live_guard plugin is installed at
+# $HOME/.hermes/pytest_live_guard.py, force-load it here so every test run
+# from this script gets the protection regardless of which worktree is
+# checked out (in-tree tests/conftest.py guard may be missing on stale
+# branches). Harmless on CI / fresh machines that don't have the file.
+if [ -f "$HOME/.hermes/pytest_live_guard.py" ]; then
+  case ":${PYTHONPATH:-}:" in
+    *":$HOME/.hermes:"*) ;;
+    *) export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$HOME/.hermes" ;;
+  esac
+  if [[ ",${PYTEST_PLUGINS:-}," != *,pytest_live_guard,* ]]; then
+    export PYTEST_PLUGINS="${PYTEST_PLUGINS:+$PYTEST_PLUGINS,}pytest_live_guard"
+  fi
+fi
+
+# ── Worker count ────────────────────────────────────────────────────────────
+# CI uses `-n auto` on ubuntu-latest which gives 4 workers. A 20-core
+# workstation with `-n auto` gets 20 workers and exposes test-ordering
+# flakes that CI will never see. Pin to 4 so local matches CI.
+WORKERS="${HERMES_TEST_WORKERS:-4}"
+
+# ── Run pytest ──────────────────────────────────────────────────────────────
 cd "$REPO_ROOT"

-exec env -i \
-  PATH="$PATH" \
-  HOME="$HOME" \
-  TZ=UTC \
-  LANG=C.UTF-8 \
-  LC_ALL=C.UTF-8 \
-  PYTHONHASHSEED=0 \
-  ${EXTRA_PYTHONPATH:+PYTHONPATH="$EXTRA_PYTHONPATH"} \
-  ${EXTRA_PYTEST_PLUGINS:+PYTEST_PLUGINS="$EXTRA_PYTEST_PLUGINS"} \
-  "$PYTHON" "$SCRIPT_DIR/run_tests_parallel.py" "$@"
+# If the first argument starts with `-` treat all args as pytest flags;
+# otherwise treat them as test paths.
+ARGS=("$@")
+
+echo "▶ running pytest with $WORKERS workers, hermetic env, in $REPO_ROOT"
+echo "  (TZ=UTC LANG=C.UTF-8 PYTHONHASHSEED=0; all credential env vars unset)"
+
+# -o "addopts=" clears pyproject.toml's `-n auto` so our -n wins.
+# We re-add --timeout/--timeout-method here because pyproject.toml's
+# addopts is wiped above. The 60s cap is essential: see pyproject.toml
+# for why (suite deadlocks at session teardown without it).
+exec "$PYTHON" -m pytest \
+  -o "addopts=" \
+  -n "$WORKERS" \
+  --timeout=30 \
+  --timeout-method=signal \
+  --ignore=tests/integration \
+  --ignore=tests/e2e \
+  -m "not integration" \
+  "${ARGS[@]}"
@@ -1,650 +0,0 @@
-#!/usr/bin/env python3
-"""Per-file parallel test runner.
-
-The minimum-viable replacement for pytest-xdist + a subprocess-isolation
-plugin. Discovers test files under ``tests/`` (excluding integration/e2e
-unless explicitly requested), then runs one ``python -m pytest <file>``
-subprocess per file, with bounded parallelism (default: ``os.cpu_count()``).
-
-Why per-file rather than per-test?
-    Per-test spawn overhead (~250ms × 17k tests = 70min CPU minimum)
-    swamped the actual work. Per-file spawn (~250ms × ~850 files = ~3.5min)
-    fits in the budget while still giving every file a fresh Python
-    interpreter — the only isolation boundary that actually matters
-    (cross-file module-level state leakage was the original flake source;
-    intra-file state is the test author's responsibility).
-
-Why drop xdist entirely?
-    xdist's persistent workers accumulate state across files, which is
-    exactly the leakage we wanted to fix. xdist also adds complexity
-    (loadfile vs loadscope, --max-worker-restart, internal control plane)
-    that we don't need when the unit of work is "run pytest on one file".
-    A subprocess.Popen pool gated by a semaphore is ~60 lines and does
-    the job.
-
-Usage:
-    python scripts/run_tests_parallel.py [pytest_args...]
-
-    Common pytest args pass through (e.g. ``-v``, ``-x``, ``--tb=long``,
-    ``-k 'pattern'``, ``--lf``).
-
-Environment:
-    HERMES_TEST_WORKERS  Override worker count (default: os.cpu_count())
-    HERMES_TEST_PATHS    Override discovery roots (colon-sep, default: 'tests')
-
-Exit code: 0 if every file's pytest exited 0; 1 otherwise.
-"""
-
-from __future__ import annotations
-
-import argparse
-import os
-import subprocess
-import sys
-import threading
-import time
-from concurrent.futures import ThreadPoolExecutor, Future
-from pathlib import Path
-from typing import Dict, List, Tuple
-
-
-# Default test discovery roots.
-_DEFAULT_ROOTS = ["tests"]
-
-# Directories to skip during discovery — the e2e + integration suites
-# require real services and are run separately. Match exactly the
-# ``--ignore=`` flags the previous CI command used.
-_SKIP_PARTS = {"integration", "e2e"}
-
-# Per-file wall-clock cap. Generous default — pytest-timeout still
-# enforces per-test caps inside each subprocess; this is just an outer
-# safety net so a single hung file can't stall the whole suite. Override
-# via --file-timeout or HERMES_TEST_FILE_TIMEOUT.
-_DEFAULT_FILE_TIMEOUT_SECONDS = 600.0  # 10 minutes
-
-
-def _count_tests(
-    files: List[Path], repo_root: Path, pytest_passthrough: List[str]
-) -> dict[Path, int]:
-    """Run ``pytest --co -q`` once to count individual tests per file.
-
-    Returns a mapping ``{file_path: test_count}``. Files with zero
-    collected tests are omitted from the dict (not an error — e.g. the
-    file only defines fixtures / conftest helpers).
-
-    This is a single subprocess call (~2-5s for ~1k files) that gives
-    us the total test count for the discovery announcement and
-    per-file counts for the progress lines.
-
-    ``--ignore`` flags for directories in ``_SKIP_PARTS`` are added
-    automatically so that pytest's own collection machinery (conftest
-    walking, directory traversal) doesn't pull in tests we intend to
-    skip — matching what the per-file runs will actually execute.
-    """
-    # Build --ignore flags for skipped dirs so the --co collection
-    # mirrors what we'll actually run (not what pytest might find via
-    # conftest walking or directory traversal).
-    ignore_args: List[str] = []
-    for root in [repo_root / p for p in _DEFAULT_ROOTS]:
-        for part in _SKIP_PARTS:
-            d = root / part
-            if d.is_dir():
-                ignore_args.extend(["--ignore", str(d)])
-
-    cmd = [
-        sys.executable, "-m", "pytest",
-        "--co", "-q",
-        *ignore_args,
-        *[str(f) for f in files],
-        *pytest_passthrough,
-    ]
-    try:
-        result = subprocess.run(
-            cmd,
-            cwd=repo_root,
-            capture_output=True,
-            text=True,
-            timeout=120,
-        )
-    except (subprocess.TimeoutExpired, OSError):
-        return {}
-
-    counts: dict[Path, int] = {}
-    for line in result.stdout.splitlines():
-        # Lines look like: tests/acp/test_auth.py::TestClass::test_name
-        if "::" not in line:
-            continue
-        file_part = line.split("::", 1)[0]
-        key = repo_root / file_part
-        counts[key] = counts.get(key, 0) + 1
-
-    return counts
-
-
-def _discover_files(roots: List[Path]) -> List[Path]:
-    """Return every ``test_*.py`` under the given roots (sorted).
-
-    Roots may be directories (recursed for ``test_*.py``) or explicit
-    ``.py`` files (included as-is, even if they don't match the
-    ``test_*`` prefix — caller knows what they want).
-
-    Exclude any file whose path contains a component in ``_SKIP_PARTS``,
-    UNLESS the user explicitly named it as a root (in which case the
-    user's intent overrides the skip filter).
-    """
-    seen: set[Path] = set()
-    out: List[Path] = []
-    for root in roots:
-        if not root.exists():
-            continue
-        if root.is_file():
-            # Explicit file: include it as-is, skip the _SKIP_PARTS filter
-            # since the user named it directly.
-            real = root.resolve()
-            if real not in seen:
-                seen.add(real)
-                out.append(root)
-            continue
-        for path in root.rglob("test_*.py"):
-            if any(part in _SKIP_PARTS for part in path.parts):
-                continue
-            real = path.resolve()
-            if real in seen:
-                continue
-            seen.add(real)
-            out.append(path)
-    return sorted(out)
-
-
-def _kill_tree(proc: "subprocess.Popen", pgid: int | None = None) -> None:
-    """Kill the pytest subprocess and every descendant it spawned.
-
-    A test run can spin up uvicorn servers, async runtimes, or other
-    long-running grandchildren that survive the pytest subprocess exit
-    if we don't kill the whole tree. ``subprocess.Popen.kill()`` only
-    targets the immediate child; grandchildren reparent to PID 1
-    (Linux) / get adopted by services.exe (Windows) and leak.
-
-    POSIX: the caller must pass ``pgid`` — the process group id captured
-    immediately after Popen (via ``os.getpgid(proc.pid)``). We can't
-    look it up here in the happy path because by the time we get
-    called the leader process has already been reaped and its pid is
-    gone from the kernel's process table, even though descendants in
-    the group are still alive. SIGKILL'ing the captured pgid takes out
-    everything in that group atomically.
-
-    Windows: ``taskkill /F /T /PID`` walks the recorded ppid chain and
-    terminates the whole tree, even when the root has already exited.
-
-    Why not psutil: psutil walks the parent-child tree, but in the
-    happy path the root has already been reaped so ``psutil.Process(pid)``
-    can't find it; grandchildren reparented to PID 1 are also
-    unreachable by tree walk at that point. The platform-native
-    primitives (process groups / taskkill) handle both cases correctly
-    without an extra abstraction layer.
-    """
-    if proc.pid is None:
-        return
-
-    if sys.platform == "win32":
-        try:
-            
-            subprocess.run(
-                ["taskkill", "/F", "/T", "/PID", str(proc.pid)],
-                stdout=subprocess.DEVNULL,
-                stderr=subprocess.DEVNULL,
-                timeout=10,
-            )  # windows-footgun: ok
-        except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
-            pass
-    else:
-        # POSIX: kill the captured pgid. Local-import signal so the
-        # SIGKILL attribute is never referenced on Windows.
-        if pgid is not None:
-            try:
-                import signal as _signal
-                os.killpg(pgid, _signal.SIGKILL)  # windows-footgun: ok
-            except (ProcessLookupError, PermissionError, OSError):
-                pass
-
-    # Belt-and-suspenders: ensure subprocess.communicate() sees the exit.
-    try:
-        proc.kill()
-    except (ProcessLookupError, OSError):
-        pass
-
-
-def _run_one_file(
-    file: Path,
-    pytest_args: List[str],
-    repo_root: Path,
-    file_timeout: float,
-) -> Tuple[Path, int, str, dict[str, int]]:
-    """Run ``python -m pytest <file> <pytest_args>`` in a fresh subprocess.
-
-    Returns (file, returncode, captured_combined_output, summary_counts).
-
-    ``summary_counts`` is the result of ``_parse_pytest_summary(output)`` —
-
-    pytest exit codes (https://docs.pytest.org/en/stable/reference/exit-codes.html):
-        0 = all tests passed
-        1 = some tests failed
-        2 = test execution interrupted
-        3 = internal error
-        4 = pytest CLI usage error
-        5 = no tests collected
-
-    We treat exit 5 as a pass: it just means every test in the file was
-    skipped or filtered by a marker (e.g. ``-m 'not integration'`` skips
-    files where every test is marked integration). That's intentional and
-    not a failure mode.
-
-    On per-file timeout (``file_timeout`` seconds) or any other exception
-    during ``communicate()``, we kill the whole process group / process
-    tree so grandchildren (uvicorn servers, async runtimes, etc.) do not
-    orphan onto PID 1. The pytest-timeout plugin enforces per-test
-    timeouts inside the subprocess; this outer timeout exists only to
-    bound a pathologically slow or hung file as a whole.
-    """
-    cmd = [sys.executable, "-m", "pytest", str(file), *pytest_args]
-    proc = subprocess.Popen(
-        cmd,
-        cwd=repo_root,
-        stdout=subprocess.PIPE,
-        stderr=subprocess.STDOUT,
-        text=True,
-        # POSIX: place the child at the head of its own process group so
-        # _kill_tree can SIGKILL the group atomically.
-        # Windows: this maps to CREATE_NEW_PROCESS_GROUP in CPython 3.12+;
-        # _kill_tree handles the Windows path via taskkill /F /T.
-        start_new_session=True,
-    )
-
-    # Capture the pgid NOW, before the leader can exit and be reaped.
-    # Once the leader is reaped, os.getpgid(proc.pid) raises
-    # ProcessLookupError even though grandchildren in that group are
-    # still alive — defeating the whole cleanup. None on Windows where
-    # the pgid concept doesn't apply (taskkill walks ppid chain instead).
-    pgid: int | None = None
-    if sys.platform != "win32":
-        try:
-            pgid = os.getpgid(proc.pid)
-        except (ProcessLookupError, PermissionError):
-            # Astonishingly fast child? Already dead. _kill_tree's
-            # fallback will handle this case as a no-op.
-            pgid = None
-
-    try:
-        output, _ = proc.communicate(timeout=file_timeout)
-        rc = proc.returncode
-    except subprocess.TimeoutExpired:
-        _kill_tree(proc, pgid=pgid)
-        # Drain whatever the child wrote before we killed it so we have
-        # something to surface in the failure dump.
-        try:
-            output, _ = proc.communicate(timeout=10)
-        except subprocess.TimeoutExpired:
-            output = "(file timeout exceeded; output unavailable)"
-        rc = 124  # de facto convention for "killed by timeout".
-        output = (
-            f"(per-file timeout: {file_timeout:.0f}s exceeded; "
-            f"process tree SIGKILL'd)\n{output}"
-        )
-    except BaseException:
-        # KeyboardInterrupt / runner crash — make sure no zombie
-        # grandchildren outlive us.
-        _kill_tree(proc, pgid=pgid)
-        raise
-    else:
-        # Happy path: pytest exited on its own. The child process already
-        # cleaned up its grandchildren if it's well-behaved, but
-        # well-behaved is not universal — kill the group anyway. Already-
-        # dead processes are a no-op.
-        _kill_tree(proc, pgid=pgid)
-
-    if rc == 5:
-        # No tests collected — every test in the file was filtered out.
-        # Treat as a pass; surface info in a slightly distinct status
-        # so the operator can spot it.
-        rc = 0
-    summary = _parse_pytest_summary(output)
-    return file, rc, output, summary
-
-
-def _parse_pytest_summary(output: str) -> dict[str, int]:
-    """Extract per-file test pass/fail/skip counts from pytest output.
-
-    pytest prints a summary line like ``12 passed, 3 skipped, 1 failed in 2.1s``
-    as the last non-empty line before the short test summary.  We scrape that
-    line for the individual counts so the progress display can show test-level
-    granularity instead of just file-level pass/fail.
-
-    Returns a dict with keys ``passed``, ``failed``, ``skipped``, ``errors``,
-    ``xfailed``, ``xpassed`` (only keys found in the output are present).
-    """
-    import re
-
-    result: dict[str, int] = {}
-    # Walk backwards from the end — the summary line is always near the tail.
-    for line in reversed(output.splitlines()):
-        line = line.strip()
-        if not line:
-            continue
-        # Match "N passed", "N failed", "N skipped", "N errors", "N xfailed", "N xpassed"
-        for m in re.finditer(r"(\d+)\s+(passed|failed|skipped|errors|xfailed|xpassed)", line):
-            result[m.group(2)] = int(m.group(1))
-        # Also match "N error" (singular — pytest uses this sometimes).
-        for m in re.finditer(r"(\d+)\s+error\b", line):
-            result.setdefault("errors", result.get("errors", 0) + int(m.group(1)))
-        if result:
-            # Found the counts line — done.
-            break
-        # Stop at the short test summary header (if any) — everything above
-        # that is individual failure details, not the counts line.
-        if line.startswith("FAILED") or line.startswith("SHORT TEST SUMMARY"):
-            break
-    return result
-
-
-def _format_file(file: Path, repo_root: Path) -> str:
-    """Render a test-file path for display: strip the repo-root prefix
-    when possible so output reads ``tests/acp/test_auth.py`` instead of
-    ``/home/runner/work/hermes-agent/hermes-agent/tests/acp/test_auth.py``.
-
-    Falls back to the absolute path for anything outside the repo root.
-    """
-    try:
-        return str(file.resolve().relative_to(repo_root.resolve()))
-    except ValueError:
-        return str(file)
-
-
-def _print_progress(
-    tests_done: int,
-    total_tests: int,
-    file: Path,
-    rc: int,
-    dur: float,
-    repo_root: Path,
-    tests_passed: int,
-    tests_failed: int,
-    test_counts: dict[Path, int],
-    file_summary: dict[str, int] | None = None,
-) -> None:
-    """Single-line live progress.
-
-    When ``file_summary`` is provided (parsed from pytest output), the
-    per-file parenthetical shows individual test pass/fail counts instead
-    of just the total test count.
-    """
-    status = "✓" if rc == 0 else "✗"
-    pct = (tests_done / total_tests * 100) if total_tests else 0
-    # Digit width for left-side counter padding (derived from total file count).
-    fw = len(str(tests_passed + tests_failed))
-    # Build per-file test count string.
-    if file_summary:
-        parts = []
-        p = file_summary.get("passed", 0)
-        f = file_summary.get("failed", 0)
-        s = file_summary.get("skipped", 0)
-        e = file_summary.get("errors", 0)
-        if p:
-            parts.append(f"{p}✓")
-        if f:
-            parts.append(f"{f}✗")
-        if s:
-            parts.append(f"{s}s")
-        if e:
-            parts.append(f"{e}e")
-        # xfailed/xpassed are rare; include if present.
-        xf = file_summary.get("xfailed", 0)
-        xp = file_summary.get("xpassed", 0)
-        if xf:
-            parts.append(f"{xf}xf")
-        if xp:
-            parts.append(f"{xp}xp")
-        test_str = " ".join(parts) + ", " if parts else ""
-    else:
-        n_tests = test_counts.get(file, 0)
-        test_str = f"{n_tests} tests, " if n_tests else ""
-    msg = (
-        f"[{pct:5.1f}% | {tests_done:>5}/{total_tests}"
-        f" | ✓{tests_passed:>{fw}} | ✗{tests_failed:>{fw}}] "
-        f"{status} {_format_file(file, repo_root)} ({test_str}{dur:.1f}s)"
-    )
-    # Truncate to terminal width if available (no clobbering ANSI lines).
-    try:
-        cols = os.get_terminal_size().columns
-        if len(msg) > cols:
-            msg = msg[: cols - 1] + "…"
-    except OSError:
-        pass
-    print(msg, flush=True)
-
-
-def _print_inline_failure(
-    file: Path, output: str, repo_root: Path, pytest_passthrough: List[str]
-) -> None:
-    """Print a compact failure summary immediately when a file fails.
-
-    Shows the tail of the pytest output (the failure section with stack
-    traces) and a ready-to-run repro command, so the developer doesn't
-    have to wait for the full run to finish before seeing what broke.
-    """
-    rel = _format_file(file, repo_root)
-    # Build a repro command the developer can copy-paste.
-    passthrough_str = " ".join(pytest_passthrough) if pytest_passthrough else ""
-    repro = f"python -m pytest {rel}"
-    if passthrough_str:
-        repro += f" {passthrough_str}"
-
-    # Grab just the failure lines (last ~30 lines of pytest output —
-    # typically the FAILED summary + short test info).
-    lines = output.rstrip().splitlines()
-    tail = "\n".join(lines[-30:])
-
-    print(flush=True)
-    print(f"  ╔╍ Failed: {rel} ╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍", flush=True)
-    for line in tail.splitlines():
-        print(f"  ║ {line}", flush=True)
-    print(f"  ║", flush=True)
-    print(f"  ║  Repro: {repro}", flush=True)
-    print(f"  ╚╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍", flush=True)
-    print(flush=True)
-
-
-def main() -> int:
-    parser = argparse.ArgumentParser(
-        description=__doc__,
-        formatter_class=argparse.RawDescriptionHelpFormatter,
-    )
-    parser.add_argument(
-        "-j",
-        "--jobs",
-        type=int,
-        default=int(os.environ.get("HERMES_TEST_WORKERS") or (os.cpu_count() or 4) * 2),
-        help="Parallel worker count (default: $HERMES_TEST_WORKERS or cpu_count*2)",
-    )
-    parser.add_argument(
-        "--paths",
-        default=os.environ.get("HERMES_TEST_PATHS", ":".join(_DEFAULT_ROOTS)),
-        help="Colon-separated discovery roots (default: 'tests')",
-    )
-    parser.add_argument(
-        "--include-integration",
-        action="store_true",
-        help="Don't skip integration/ e2e/ during discovery",
-    )
-    parser.add_argument(
-        "--file-timeout",
-        type=float,
-        default=float(
-            os.environ.get("HERMES_TEST_FILE_TIMEOUT", _DEFAULT_FILE_TIMEOUT_SECONDS)
-        ),
-        help=(
-            "Per-file wall-clock cap in seconds. On timeout, the pytest "
-            "subprocess and its full process tree are SIGKILL'd. "
-            "Default: 600 (10 min), env: HERMES_TEST_FILE_TIMEOUT."
-        ),
-    )
-    parser.add_argument(
-        "paths_positional",
-        nargs="*",
-        metavar="PATH",
-        help=(
-            "Restrict discovery to these paths (directories or .py files). "
-            "Mutually exclusive with --paths. Anything after a literal '--' "
-            "separator is passed through to each per-file pytest invocation."
-        ),
-    )
-    # Manually split argv on '--' so positional paths and pytest passthrough
-    # args don't fight over each other. argparse's nargs="*" positional is
-    # greedy and will swallow everything after '--' including the pytest
-    # flags, defeating the convention.
-    argv = sys.argv[1:]
-    if "--" in argv:
-        sep = argv.index("--")
-        our_args, pytest_passthrough = argv[:sep], argv[sep + 1 :]
-    else:
-        our_args, pytest_passthrough = argv, []
-    args = parser.parse_args(our_args)
-
-    repo_root = Path(__file__).resolve().parent.parent
-
-    # Resolve discovery roots: positional path args override --paths if any
-    # were supplied, otherwise --paths (which itself defaults to 'tests').
-    if args.paths_positional:
-        # Positionals can be directories OR explicit .py files. Either is
-        # fine — _discover_files handles both via rglob('test_*.py') for
-        # dirs and direct inclusion for files.
-        roots = [repo_root / p for p in args.paths_positional]
-    else:
-        roots = [repo_root / p for p in args.paths.split(":") if p]
-
-    if args.include_integration:
-        # Caller takes responsibility — typically used via explicit -k filter.
-        global _SKIP_PARTS  # noqa: PLW0603 — config knob
-        _SKIP_PARTS = set()
-
-    files = _discover_files(roots)
-    if not files:
-        print(f"No test files discovered under {[str(r) for r in roots]}", file=sys.stderr)
-        return 1
-
-    # Count individual tests per file via a single pytest --co pass.
-    test_counts = _count_tests(files, repo_root, pytest_passthrough)
-    total_tests = sum(test_counts.values())
-
-    print(
-        f"Discovered {len(files)} test files ({total_tests} tests) under "
-        f"{[str(r.relative_to(repo_root)) if r.is_relative_to(repo_root) else str(r) for r in roots]}; "
-        f"running with -j {args.jobs}",
-        flush=True,
-    )
-
-    # Capture and print on completion (out-of-order is fine — keeps the
-    # terminal clean rather than interleaving N parallel pytest outputs).
-    failures: List[Tuple[Path, str, Dict[str, int]]] = []
-    started = time.monotonic()
-    files_done = 0
-    tests_done = 0
-    pass_count = 0
-    fail_count = 0
-    tests_passed = 0
-    tests_failed = 0
-    lock = threading.Lock()
-
-    def _on_done(file: Path, started_at: float, fut: "Future[Tuple[Path, int, str, dict[str, int]]]") -> None:
-        nonlocal files_done, tests_done, pass_count, fail_count, tests_passed, tests_failed
-        n_tests = test_counts.get(file, 0)
-        try:
-            fpath, rc, output, summary = fut.result()
-        except Exception as exc:  # noqa: BLE001 — must always advance counter
-            with lock:
-                files_done += 1
-                tests_done += n_tests
-                fail_count += 1
-                failures.append((file, f"runner crashed: {exc!r}", {}))
-                _print_progress(
-                    tests_done, total_tests, file, 1,
-                    time.monotonic() - started_at,
-                    repo_root, tests_passed, tests_failed,
-                    test_counts,
-                )
-            return
-        with lock:
-            files_done += 1
-            tests_done += n_tests
-            # Accumulate test-level counts from parsed summary.
-            tests_passed += summary.get("passed", 0)
-            tests_failed += summary.get("failed", 0)
-            if rc == 0:
-                pass_count += 1
-            else:
-                fail_count += 1
-                failures.append((fpath, output, summary))
-            _print_progress(
-                tests_done, total_tests, fpath, rc,
-                time.monotonic() - started_at,
-                repo_root, tests_passed, tests_failed,
-                test_counts,
-                file_summary=summary,
-            )
-            if rc != 0:
-                _print_inline_failure(fpath, output, repo_root, pytest_passthrough)
-
-    with ThreadPoolExecutor(max_workers=args.jobs) as pool:
-        futures: List[Future] = []
-        for file in files:
-            t0 = time.monotonic()
-            fut = pool.submit(
-                _run_one_file, file, pytest_passthrough, repo_root, args.file_timeout
-            )
-            fut.add_done_callback(lambda f, file=file, t0=t0: _on_done(file, t0, f))
-            futures.append(fut)
-        # Block until everything's done. ThreadPoolExecutor.__exit__ waits
-        # for all submitted work, but doing it explicitly here makes the
-        # control flow obvious.
-        for fut in futures:
-            fut.result() if fut.exception() is None else None
-
-    elapsed = time.monotonic() - started
-    print()
-    pct = (tests_done / total_tests * 100) if total_tests else 0
-    print(f"=== Summary: {len(files)} files, {tests_passed} tests passed, {tests_failed} failed ({pct:.0f}% complete) in {elapsed:.1f}s ({args.jobs} workers) ===")
-
-    if failures:
-        print()
-        print("=== Failure output ===")
-        for file, output, _summary in failures:
-            print()
-            print(f"--- {_format_file(file, repo_root)} ---")
-            print(output.rstrip())
-        print()
-        # Split: files with actual test failures vs non-zero exit for other reasons
-        test_fail_files = [(f, s) for f, _o, s in failures if s.get("failed", 0) > 0]
-        all_passed_but_nonzero = [(f, s) for f, _o, s in failures
-                                  if s.get("failed", 0) == 0 and s.get("passed", 0) > 0]
-        no_tests_ran = [(f, s) for f, _o, s in failures
-                        if s.get("failed", 0) == 0 and s.get("passed", 0) == 0]
-        if test_fail_files:
-            total_tf = sum(s.get("failed", 0) for _, s in test_fail_files)
-            print(f"=== {len(test_fail_files)} file{'s' if len(test_fail_files) != 1 else ''} with test failures ({total_tf} test{'s' if total_tf != 1 else ''} failed) ===")
-            for file, s in test_fail_files:
-                nf = s.get("failed", 0)
-                print(f"  {_format_file(file, repo_root)}  ({nf} test{'s' if nf != 1 else ''} failed)")
-        if all_passed_but_nonzero:
-            print(f"=== {len(all_passed_but_nonzero)} file{'s' if len(all_passed_but_nonzero) != 1 else ''} where all tests passed but pytest exited non-zero (warnings-as-errors, hook failures, etc.) ===")
-            for file, s in all_passed_but_nonzero:
-                print(f"  {_format_file(file, repo_root)}  ({s.get('passed', 0)} passed)")
-        if no_tests_ran:
-            print(f"=== {len(no_tests_ran)} file{'s' if len(no_tests_ran) != 1 else ''} where no tests ran (collection/import error, timeout before collection, etc.) ===")
-            for file, s in no_tests_ran:
-                print(f"  {_format_file(file, repo_root)}")
-        return 1
-
-    return 0
-
-
-if __name__ == "__main__":
-    sys.exit(main())
@@ -40,16 +40,6 @@ def _clean_env(monkeypatch):
        "ANTHROPIC_API_KEY", "ANTHROPIC_TOKEN", "CLAUDE_CODE_OAUTH_TOKEN",
    ):
        monkeypatch.delenv(key, raising=False)
-    # Module-level unhealthy cache (10-min TTL) leaks between tests;
-    # earlier tests that call _mark_provider_unhealthy() poison the
-    # cache for later ones, causing _resolve_auto to skip providers
-    # that the test patched to return valid clients.
-    import agent.auxiliary_client as _aux_mod
-    _aux_mod._aux_unhealthy_until.clear()
-    _aux_mod._aux_unhealthy_logged_at.clear()
-    yield
-    _aux_mod._aux_unhealthy_until.clear()
-    _aux_mod._aux_unhealthy_logged_at.clear()


@pytest.fixture
@@ -471,17 +461,6 @@ class TestExpiredCodexFallback:
        import base64
        import time as _time

-        # Belt-and-suspenders: _try_openrouter marks openrouter unhealthy
-        # when OPENROUTER_API_KEY is absent (which the preceding test in
-        # this class exercises).  The file-level _clean_env autouse fixture
-        # clears the cache, but fixture ordering with the conftest
-        # _hermetic_environment autouse can leave a narrow window where
-        # the mark reappears.  Explicitly clear here so this test is
-        # independent of run order.
-        import agent.auxiliary_client as _aux_mod
-        _aux_mod._aux_unhealthy_until.clear()
-        _aux_mod._aux_unhealthy_logged_at.clear()
-
        header = base64.urlsafe_b64encode(b'{"alg":"RS256","typ":"JWT"}').rstrip(b"=").decode()
        payload_data = json.dumps({"exp": int(_time.time()) - 3600}).encode()
        payload = base64.urlsafe_b64encode(payload_data).rstrip(b"=").decode()
@@ -1068,20 +1047,6 @@ class TestGetProviderChain:
 class TestTryPaymentFallback:
    """_try_payment_fallback skips the failed provider and tries alternatives."""

-    @pytest.fixture(autouse=True)
-    def _clear_unhealthy_cache(self):
-        """Earlier tests in this file call _mark_provider_unhealthy() which
-        pollutes the module-level ``_aux_unhealthy_until`` dict (10-min TTL).
-        Without this cleanup the fallback chain skips providers we've patched
-        to return valid clients — the patched function is never called.
-        """
-        from agent.auxiliary_client import _aux_unhealthy_until, _aux_unhealthy_logged_at
-        _aux_unhealthy_until.clear()
-        _aux_unhealthy_logged_at.clear()
-        yield
-        _aux_unhealthy_until.clear()
-        _aux_unhealthy_logged_at.clear()
-
    def test_skips_failed_provider(self):
        mock_client = MagicMock()
        with patch("agent.auxiliary_client._try_openrouter", return_value=(None, None)), \
@@ -1,93 +0,0 @@
-from types import SimpleNamespace
-
-from agent.agent_init import _merge_custom_provider_extra_body
-
-
-def test_custom_provider_extra_body_merges_into_request_overrides():
-    agent = SimpleNamespace(
-        provider="custom",
-        model="google/gemma-4-31b-it",
-        base_url="https://example.test/v1",
-        request_overrides={"service_tier": "priority"},
-    )
-
-    _merge_custom_provider_extra_body(
-        agent,
-        [
-            {
-                "name": "gemma",
-                "base_url": "https://example.test/v1/",
-                "model": "google/gemma-4-31b-it",
-                "extra_body": {
-                    "enable_thinking": True,
-                    "reasoning_effort": "high",
-                },
-            }
-        ],
-    )
-
-    assert agent.request_overrides == {
-        "service_tier": "priority",
-        "extra_body": {
-            "enable_thinking": True,
-            "reasoning_effort": "high",
-        },
-    }
-
-
-def test_custom_provider_extra_body_preserves_caller_override():
-    agent = SimpleNamespace(
-        provider="custom",
-        model="google/gemma-4-31b-it",
-        base_url="https://example.test/v1",
-        request_overrides={
-            "extra_body": {
-                "reasoning_effort": "low",
-                "caller_only": True,
-            }
-        },
-    )
-
-    _merge_custom_provider_extra_body(
-        agent,
-        [
-            {
-                "name": "gemma",
-                "base_url": "https://example.test/v1",
-                "model": "google/gemma-4-31b-it",
-                "extra_body": {
-                    "enable_thinking": True,
-                    "reasoning_effort": "high",
-                },
-            }
-        ],
-    )
-
-    assert agent.request_overrides["extra_body"] == {
-        "enable_thinking": True,
-        "reasoning_effort": "low",
-        "caller_only": True,
-    }
-
-
-def test_custom_provider_extra_body_ignores_other_custom_models():
-    agent = SimpleNamespace(
-        provider="custom",
-        model="other-model",
-        base_url="https://example.test/v1",
-        request_overrides={},
-    )
-
-    _merge_custom_provider_extra_body(
-        agent,
-        [
-            {
-                "name": "gemma",
-                "base_url": "https://example.test/v1",
-                "model": "google/gemma-4-31b-it",
-                "extra_body": {"enable_thinking": True},
-            }
-        ],
-    )
-
-    assert agent.request_overrides == {}
@@ -56,7 +56,6 @@ class TestFailoverReason:
            "overloaded", "server_error", "timeout",
            "context_overflow", "payload_too_large", "image_too_large",
            "model_not_found", "format_error",
-            "multimodal_tool_content_unsupported",
            "provider_policy_blocked",
            "thinking_signature", "long_context_tier",
            "oauth_long_context_beta_forbidden",
@@ -1257,66 +1256,3 @@ class TestRateLimitErrorWithoutStatusCode:
        e.status_code = None
        result = classify_api_error(e, provider="copilot", model="gpt-4o")
        assert result.reason != FailoverReason.rate_limit
-
-
-
-# ── Test: multimodal_tool_content_unsupported pattern ───────────────────
-
-class TestMultimodalToolContentUnsupported:
-    """Issue #27344 — providers that reject list-type tool message content
-    should be classified as ``multimodal_tool_content_unsupported`` so the
-    retry loop can downgrade screenshots to text and try again.
-    """
-
-    def test_xiaomi_mimo_text_is_not_set_pattern(self):
-        """The actual Xiaomi MiMo 400 wording from the bug report."""
-        e = MockAPIError(
-            "Error code: 400 - {'error': {'code': '400', 'message': 'Param Incorrect', 'param': 'text is not set', 'type': ''}}",
-            status_code=400,
-        )
-        result = classify_api_error(e, provider="xiaomi", model="mimo-v2.5")
-        assert result.reason == FailoverReason.multimodal_tool_content_unsupported
-        assert result.retryable is True
-
-    def test_generic_tool_message_must_be_string(self):
-        e = MockAPIError(
-            "tool message content must be a string",
-            status_code=400,
-        )
-        result = classify_api_error(e, provider="custom", model="some-model")
-        assert result.reason == FailoverReason.multimodal_tool_content_unsupported
-
-    def test_expected_string_got_list(self):
-        e = MockAPIError(
-            "Schema validation failed: expected string, got list",
-            status_code=400,
-        )
-        result = classify_api_error(e, provider="custom", model="some-model")
-        assert result.reason == FailoverReason.multimodal_tool_content_unsupported
-
-    def test_multimodal_tool_content_takes_priority_over_context_overflow(self):
-        """Some providers return a 400 whose message contains BOTH
-        'text is not set' and a length-shaped phrase; the tool-content
-        recovery is cheaper than compression so it must win the priority.
-        """
-        e = MockAPIError(
-            "text is not set; context length exceeded",
-            status_code=400,
-        )
-        result = classify_api_error(e, provider="xiaomi", model="mimo-v2.5")
-        assert result.reason == FailoverReason.multimodal_tool_content_unsupported
-
-    def test_no_status_code_path_also_classifies(self):
-        """When the error reaches us without a status code (transport
-        layer ate it) the message-only classifier branch must also
-        recognise the pattern.
-        """
-        e = MockTransportError("tool_call.content must be string")
-        result = classify_api_error(e, provider="alibaba", model="qwen3.5-plus")
-        assert result.reason == FailoverReason.multimodal_tool_content_unsupported
-
-    def test_unrelated_400_is_not_misclassified(self):
-        """Make sure the patterns don't false-positive on normal 400s."""
-        e = MockAPIError("bad request: missing field 'model'", status_code=400)
-        result = classify_api_error(e, provider="openrouter", model="anthropic/claude-sonnet-4")
-        assert result.reason != FailoverReason.multimodal_tool_content_unsupported
@@ -1060,191 +1060,3 @@ class TestHonchoCadenceTracking:
        p.on_turn_start(2, "second message")
        should_skip = p._injection_frequency == "first-turn" and p._turn_count > 1
        assert should_skip, "Second turn (turn 2) SHOULD be skipped"
-
-
-class TestMemoryToolToolsetGate:
-    """Issue #5544: memory provider tools must respect platform_toolsets.
-
-    Before the fix, MemoryManager.get_all_tool_schemas() output was appended
-    to AIAgent.tools unconditionally in agent_init.py — bypassing the
-    enabled_toolsets filter. Result: `platform_toolsets: telegram: []`
-    still leaked fact_store and other memory tools into the tool surface,
-    causing 10x latency on local models (Qwen3-30B: 1.7s → 42s) and
-    tool-call loops on small models.
-
-    These tests mirror the gate logic in agent/agent_init.py around the
-    memory provider tool injection block. The gate condition is:
-
-        enabled_toolsets is None        → no filter, inject (backward compat)
-        "memory" in enabled_toolsets    → user opted in, inject
-        otherwise (incl. [])            → skip injection
-    """
-
-    @staticmethod
-    def _run_memory_injection(enabled_toolsets, memory_manager):
-        """Simulate the gated memory-tool injection block from agent_init.py."""
-        tools = []
-        valid_tool_names = set()
-
-        if memory_manager and tools is not None and (
-            enabled_toolsets is None or "memory" in enabled_toolsets
-        ):
-            _existing = {
-                t.get("function", {}).get("name")
-                for t in tools
-                if isinstance(t, dict)
-            }
-            for _schema in memory_manager.get_all_tool_schemas():
-                _tname = _schema.get("name", "")
-                if _tname and _tname in _existing:
-                    continue
-                tools.append({"type": "function", "function": _schema})
-                if _tname:
-                    valid_tool_names.add(_tname)
-                    _existing.add(_tname)
-
-        return tools, valid_tool_names
-
-    def _mgr_with_tools(self, *tool_names):
-        """Build a MemoryManager whose providers expose the named tool schemas."""
-        mgr = MemoryManager()
-        p = FakeMemoryProvider(
-            "ext",
-            tools=[{"name": n, "description": n, "parameters": {}} for n in tool_names],
-        )
-        mgr.add_provider(p)
-        return mgr
-
-    def test_none_toolsets_injects(self):
-        """enabled_toolsets=None (no filter) injects memory tools — backward compat."""
-        mgr = self._mgr_with_tools("fact_store")
-        tools, names = self._run_memory_injection(None, mgr)
-        assert "fact_store" in names
-        assert any(t["function"]["name"] == "fact_store" for t in tools)
-
-    def test_memory_in_toolsets_injects(self):
-        """enabled_toolsets including 'memory' injects memory tools."""
-        mgr = self._mgr_with_tools("fact_store")
-        tools, names = self._run_memory_injection(["terminal", "memory", "web"], mgr)
-        assert "fact_store" in names
-
-    def test_empty_toolsets_blocks_injection(self):
-        """`platform_toolsets: telegram: []` must suppress memory tools. (#5544)"""
-        mgr = self._mgr_with_tools("fact_store")
-        tools, names = self._run_memory_injection([], mgr)
-        assert tools == []
-        assert names == set()
-
-    def test_toolsets_without_memory_blocks_injection(self):
-        """Toolset list that doesn't name 'memory' must suppress injection."""
-        mgr = self._mgr_with_tools("fact_store")
-        tools, names = self._run_memory_injection(["terminal", "web"], mgr)
-        assert tools == []
-        assert names == set()
-
-    def test_no_memory_manager_no_injection(self):
-        """Gate is moot without a memory manager."""
-        tools, names = self._run_memory_injection(None, None)
-        assert tools == []
-
-    def test_multiple_schemas_all_blocked_together(self):
-        """When the gate is closed, no memory tools leak — not even partially."""
-        mgr = self._mgr_with_tools("fact_store", "memory_search", "memory_add")
-        tools, names = self._run_memory_injection(["terminal"], mgr)
-        assert tools == []
-        assert names == set()
-
-    def test_multiple_schemas_all_injected_when_enabled(self):
-        """When the gate is open, every memory tool schema is injected."""
-        mgr = self._mgr_with_tools("fact_store", "memory_search", "memory_add")
-        tools, names = self._run_memory_injection(None, mgr)
-        assert names == {"fact_store", "memory_search", "memory_add"}
-
-
-class TestContextEngineToolsetGate:
-    """Issue #5544 (sibling): context engine tools follow the same gate.
-
-    `agent.context_compressor.get_tool_schemas()` (e.g. lcm_grep, lcm_describe,
-    lcm_expand) was appended to AIAgent.tools unconditionally. Same blind
-    injection class as the memory bug; same local-model penalty. Gate name:
-    "context_engine" (matches the existing plugin-system convention).
-    """
-
-    @staticmethod
-    def _run_context_engine_injection(enabled_toolsets, compressor):
-        """Simulate the gated context-engine injection block from agent_init.py."""
-        tools = []
-        valid_tool_names = set()
-        engine_tool_names = set()
-
-        if (
-            compressor is not None
-            and tools is not None
-            and (
-                enabled_toolsets is None
-                or "context_engine" in enabled_toolsets
-            )
-        ):
-            _existing = {
-                t.get("function", {}).get("name")
-                for t in tools
-                if isinstance(t, dict)
-            }
-            for _schema in compressor.get_tool_schemas():
-                _tname = _schema.get("name", "")
-                if _tname and _tname in _existing:
-                    continue
-                tools.append({"type": "function", "function": _schema})
-                if _tname:
-                    valid_tool_names.add(_tname)
-                    engine_tool_names.add(_tname)
-                    _existing.add(_tname)
-
-        return tools, valid_tool_names, engine_tool_names
-
-    class _FakeCompressor:
-        def __init__(self, schemas):
-            self._schemas = schemas
-
-        def get_tool_schemas(self):
-            return list(self._schemas)
-
-    def _compressor_with(self, *tool_names):
-        return self._FakeCompressor(
-            [{"name": n, "description": n, "parameters": {}} for n in tool_names]
-        )
-
-    def test_none_toolsets_injects(self):
-        """enabled_toolsets=None injects context-engine tools — backward compat."""
-        c = self._compressor_with("lcm_grep", "lcm_describe", "lcm_expand")
-        tools, names, engine_names = self._run_context_engine_injection(None, c)
-        assert engine_names == {"lcm_grep", "lcm_describe", "lcm_expand"}
-
-    def test_context_engine_in_toolsets_injects(self):
-        """enabled_toolsets including 'context_engine' injects the tools."""
-        c = self._compressor_with("lcm_grep")
-        tools, names, engine_names = self._run_context_engine_injection(
-            ["terminal", "context_engine"], c
-        )
-        assert "lcm_grep" in engine_names
-
-    def test_empty_toolsets_blocks_injection(self):
-        """`platform_toolsets: telegram: []` must suppress context-engine tools."""
-        c = self._compressor_with("lcm_grep")
-        tools, names, engine_names = self._run_context_engine_injection([], c)
-        assert tools == []
-        assert engine_names == set()
-
-    def test_toolsets_without_context_engine_blocks_injection(self):
-        """A toolset list that doesn't name 'context_engine' suppresses injection."""
-        c = self._compressor_with("lcm_grep", "lcm_describe")
-        tools, names, engine_names = self._run_context_engine_injection(
-            ["terminal", "memory"], c
-        )
-        assert tools == []
-        assert engine_names == set()
-
-    def test_no_compressor_no_injection(self):
-        """Gate is moot without a context_compressor."""
-        tools, names, engine_names = self._run_context_engine_injection(None, None)
-        assert tools == []
@@ -444,7 +444,6 @@ class TestBuildNousSubscriptionPrompt:
                    "tts": NousFeatureState("tts", "OpenAI TTS", True, True, True, True, False, True, "OpenAI TTS"),
                    "browser": NousFeatureState("browser", "Browser automation", True, True, True, True, False, True, "Browser Use"),
                    "modal": NousFeatureState("modal", "Modal execution", False, True, False, False, False, True, "local"),
-                    "app_tools": NousFeatureState("app_tools", "App tools (500+ apps)", True, True, True, True, False, True, "Nous Subscription"),
                },
            ),
        )
@@ -469,7 +468,6 @@ class TestBuildNousSubscriptionPrompt:
                    "tts": NousFeatureState("tts", "OpenAI TTS", True, False, False, False, False, True, ""),
                    "browser": NousFeatureState("browser", "Browser automation", True, False, False, False, False, True, ""),
                    "modal": NousFeatureState("modal", "Modal execution", False, False, False, False, False, True, ""),
-                    "app_tools": NousFeatureState("app_tools", "App tools (500+ apps)", True, False, False, False, False, True, ""),
                },
            ),
        )
@@ -556,11 +556,10 @@ Generate some audio.
            raising=False,
        )

-        with patch("tools.skills_tool.SKILLS_DIR", tmp_path):
-            from gateway.session_context import clear_session_vars, set_session_vars
-
-            tokens = set_session_vars(platform="telegram")
-            try:
+        with patch.dict(
+            os.environ, {"HERMES_SESSION_PLATFORM": "telegram"}, clear=False
+        ):
+            with patch("tools.skills_tool.SKILLS_DIR", tmp_path):
                _make_skill(
                    tmp_path,
                    "test-skill",
@@ -572,8 +571,6 @@ Generate some audio.
                )
                scan_skill_commands()
                msg = build_skill_invocation_message("/test-skill", "do stuff")
-            finally:
-                clear_session_vars(tokens)

        assert msg is not None
        assert "local cli" in msg.lower()
@@ -1,12 +1,6 @@
-"""Tests for agent/skill_utils.py."""
+"""Tests for agent/skill_utils.py — extract_skill_conditions metadata handling."""

-from unittest.mock import patch
-
-from agent.skill_utils import (
-    extract_skill_conditions,
-    iter_skill_index_files,
-    skill_matches_platform,
-)
+from agent.skill_utils import extract_skill_conditions


 def test_metadata_as_dict_with_hermes():
@@ -62,138 +56,3 @@ def test_metadata_missing_entirely():
        "fallback_for_tools": [],
        "requires_tools": [],
    }
-
-
-def test_iter_skill_index_files_prunes_dependency_dirs(tmp_path):
-    real = tmp_path / "real-skill"
-    real.mkdir()
-    (real / "SKILL.md").write_text("---\nname: real-skill\n---\n", encoding="utf-8")
-
-    nested = (
-        tmp_path
-        / "bring"
-        / "scripts"
-        / ".venv"
-        / "lib"
-        / "python3.13"
-        / "site-packages"
-        / "typer"
-        / ".agents"
-        / "skills"
-        / "typer"
-    )
-    nested.mkdir(parents=True)
-    (nested / "SKILL.md").write_text("---\nname: typer\n---\n", encoding="utf-8")
-
-    node_module = (
-        tmp_path
-        / "web-skill"
-        / "node_modules"
-        / "dep"
-        / ".agents"
-        / "skills"
-        / "dep"
-    )
-    node_module.mkdir(parents=True)
-    (node_module / "SKILL.md").write_text("---\nname: dep\n---\n", encoding="utf-8")
-
-    found = list(iter_skill_index_files(tmp_path, "SKILL.md"))
-
-    assert found == [real / "SKILL.md"]
-
-
-# ── skill_matches_platform on Termux ──────────────────────────────────────
-
-
-class TestSkillMatchesPlatformTermux:
-    """Termux is Linux userland on Android. Skills tagged platforms:[linux]
-    must load there regardless of whether Python reports sys.platform as
-    "linux" (pre-3.13) or "android" (3.13+). Reported by user @LikiusInik
-    in May 2026 — only 3 built-in skills appeared on Termux because every
-    github/productivity/mlops skill is tagged platforms:[linux,macos,windows]
-    and sys.platform=="android" did not start with "linux".
-    """
-
-    def test_no_platforms_field_matches_everywhere(self):
-        # Backward-compat default — skills without a platforms tag load
-        # on any OS, Termux included.
-        with patch("agent.skill_utils.sys.platform", "android"), patch(
-            "agent.skill_utils.is_termux", return_value=True
-        ):
-            assert skill_matches_platform({}) is True
-            assert skill_matches_platform({"name": "foo"}) is True
-
-    def test_linux_skill_loads_on_termux_android_platform(self):
-        # Python 3.13+ on Termux reports sys.platform == "android".
-        fm = {"platforms": ["linux"]}
-        with patch("agent.skill_utils.sys.platform", "android"), patch(
-            "agent.skill_utils.is_termux", return_value=True
-        ):
-            assert skill_matches_platform(fm) is True
-
-    def test_linux_macos_windows_skill_loads_on_termux(self):
-        # The common "[linux, macos, windows]" tag used by github-*,
-        # productivity, mlops, etc.
-        fm = {"platforms": ["linux", "macos", "windows"]}
-        with patch("agent.skill_utils.sys.platform", "android"), patch(
-            "agent.skill_utils.is_termux", return_value=True
-        ):
-            assert skill_matches_platform(fm) is True
-
-    def test_linux_skill_loads_on_termux_linux_platform(self):
-        # Pre-3.13 Termux reports sys.platform == "linux" already — this
-        # works without the Termux escape hatch but must still pass.
-        fm = {"platforms": ["linux"]}
-        with patch("agent.skill_utils.sys.platform", "linux"), patch(
-            "agent.skill_utils.is_termux", return_value=True
-        ):
-            assert skill_matches_platform(fm) is True
-
-    def test_macos_only_skill_still_excluded_on_termux(self):
-        # macOS-only skills (apple-notes, imessage, ...) should NOT load
-        # on Termux. The Termux fallback only widens platforms:[linux,...].
-        fm = {"platforms": ["macos"]}
-        with patch("agent.skill_utils.sys.platform", "android"), patch(
-            "agent.skill_utils.is_termux", return_value=True
-        ):
-            assert skill_matches_platform(fm) is False
-
-    def test_windows_only_skill_still_excluded_on_termux(self):
-        fm = {"platforms": ["windows"]}
-        with patch("agent.skill_utils.sys.platform", "android"), patch(
-            "agent.skill_utils.is_termux", return_value=True
-        ):
-            assert skill_matches_platform(fm) is False
-
-    def test_explicit_termux_or_android_tag_matches(self):
-        # Skills can also opt in explicitly via platforms:[termux] or
-        # platforms:[android] — both should match a Termux session.
-        with patch("agent.skill_utils.sys.platform", "android"), patch(
-            "agent.skill_utils.is_termux", return_value=True
-        ):
-            assert skill_matches_platform({"platforms": ["termux"]}) is True
-            assert skill_matches_platform({"platforms": ["android"]}) is True
-
-    def test_non_termux_android_does_not_widen(self):
-        # If we're somehow on a plain Android Python (not Termux), don't
-        # silently load Linux skills — Termux is the supported environment.
-        fm = {"platforms": ["linux"]}
-        with patch("agent.skill_utils.sys.platform", "android"), patch(
-            "agent.skill_utils.is_termux", return_value=False
-        ):
-            assert skill_matches_platform(fm) is False
-
-    def test_linux_skill_on_real_linux_unaffected(self):
-        # The non-Termux Linux path must not change.
-        fm = {"platforms": ["linux"]}
-        with patch("agent.skill_utils.sys.platform", "linux"), patch(
-            "agent.skill_utils.is_termux", return_value=False
-        ):
-            assert skill_matches_platform(fm) is True
-
-    def test_macos_skill_on_real_macos_unaffected(self):
-        fm = {"platforms": ["macos"]}
-        with patch("agent.skill_utils.sys.platform", "darwin"), patch(
-            "agent.skill_utils.is_termux", return_value=False
-        ):
-            assert skill_matches_platform(fm) is True
@@ -20,9 +20,12 @@ test runner at ``scripts/run_tests.sh``.
 """

 import asyncio
+import logging
 import os
 import re
+import signal
 import sys
+import tempfile
 from pathlib import Path
 from unittest.mock import patch

@@ -34,22 +37,6 @@ if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))


-# ── Per-file process isolation ──────────────────────────────────────────────
-# Tests run via ``scripts/run_tests_parallel.py``, which spawns a fresh
-# ``python -m pytest <file>`` subprocess per test file. Cross-file state
-# leakage (module-level dicts, ContextVars, caches) is impossible: each
-# file gets a clean Python interpreter. Intra-file ordering is the test
-# author's responsibility — if test A in foo.py mutates state that test B
-# in foo.py reads, that's a real bug to fix in the file (it would also
-# bite anyone running ``pytest tests/foo.py`` directly).
-#
-# This replaces the historic _reset_module_state autouse fixture (manual
-# state clearing) and the brief experiment with subprocess-per-test
-# isolation (too slow at ~17k tests).
-#
-# See ``scripts/run_tests_parallel.py`` for the runner.
-
-
 # ── Credential env-var filter ──────────────────────────────────────────────
 #
 # Any env var in the current process matching ONE of these patterns is
@@ -292,7 +279,7 @@ _HERMES_BEHAVIORAL_VARS = frozenset({
    "WECOM_HOME_CHANNEL_NAME",
    # Platform gating — set by load_gateway_config() as a side effect when
    # a config.yaml is present, so individual test bodies that call the
-    # loader leak these values into later tests in the same process.
+    # loader leak these values into later tests on the same xdist worker.
    # Force-clear on every test setup so the leak can't happen.
    "SLACK_REQUIRE_MENTION",
    "SLACK_STRICT_MENTION",
@@ -381,21 +368,144 @@ def _isolate_hermes_home(_hermetic_environment):
    return None


-# ── Module-level state reset — replaced by per-file process isolation ──────
+# ── Module-level state reset ───────────────────────────────────────────────
 #
-# Each test FILE runs in a freshly-spawned ``python -m pytest <file>``
-# subprocess via ``scripts/run_tests_parallel.py``, so module-level dicts /
-# sets / ContextVars from tests in one file cannot leak into tests in
-# another file. No manual per-module clearing needed.
+# Python modules are singletons per process, and pytest-xdist workers are
+# long-lived. Module-level dicts/sets (tool registries, approval state,
+# interrupt flags) and ContextVars persist across tests in the same worker,
+# causing tests that pass alone to fail when run with siblings.
 #
-# Within a single file, ordering is the author's responsibility. If your
-# tests in the same file share mutable state, either reset it explicitly
-# in a fixture or split them across files.
+# Each entry in this fixture clears state that belongs to a specific module.
+# New state buckets go here too — this is the single gate that prevents
+# "works alone, flakes in CI" bugs from state leakage.
 #
-# The skill ``test-suite-cascade-diagnosis`` documents the cascade patterns
-# this replaces; the running example was ``test_command_guards`` failing
-# 12/15 CI runs because ``tools.approval._session_approved`` carried
-# approvals from one test's session into another's.
+# The skill `test-suite-cascade-diagnosis` documents the concrete patterns
+# this closes; the running example was `test_command_guards` failing 12/15
+# CI runs because ``tools.approval._session_approved`` carried approvals
+# from one test's session into another's.
+
+@pytest.fixture(autouse=True)
+def _reset_module_state():
+    """Clear module-level mutable state and ContextVars between tests.
+
+    Keeps state from leaking across tests on the same xdist worker. Modules
+    that don't exist yet (test collection before production import) are
+    skipped silently — production import later creates fresh empty state.
+    """
+    # --- logging — quiet/one-shot paths mutate process-global logger state ---
+    logging.disable(logging.NOTSET)
+    for _logger_name in ("tools", "run_agent", "trajectory_compressor", "cron", "hermes_cli"):
+        _logger = logging.getLogger(_logger_name)
+        _logger.disabled = False
+        _logger.setLevel(logging.NOTSET)
+        _logger.propagate = True
+
+    # --- tools.approval — the single biggest source of cross-test pollution ---
+    try:
+        from tools import approval as _approval_mod
+        _approval_mod._session_approved.clear()
+        _approval_mod._session_yolo.clear()
+        _approval_mod._permanent_approved.clear()
+        _approval_mod._pending.clear()
+        _approval_mod._gateway_queues.clear()
+        _approval_mod._gateway_notify_cbs.clear()
+        # ContextVar: reset to empty string so get_current_session_key()
+        # falls through to the env var / default path, matching a fresh
+        # process.
+        _approval_mod._approval_session_key.set("")
+    except Exception:
+        pass
+
+    # --- tools.interrupt — per-thread interrupt flag set ---
+    try:
+        from tools import interrupt as _interrupt_mod
+        with _interrupt_mod._lock:
+            _interrupt_mod._interrupted_threads.clear()
+    except Exception:
+        pass
+
+    # --- gateway.session_context — 9 ContextVars that represent
+    #     the active gateway session. If set in one test and not reset,
+    #     the next test's get_session_env() reads stale values.
+    try:
+        from gateway import session_context as _sc_mod
+        for _cv in (
+            _sc_mod._SESSION_PLATFORM,
+            _sc_mod._SESSION_CHAT_ID,
+            _sc_mod._SESSION_CHAT_NAME,
+            _sc_mod._SESSION_THREAD_ID,
+            _sc_mod._SESSION_USER_ID,
+            _sc_mod._SESSION_USER_NAME,
+            _sc_mod._SESSION_KEY,
+            _sc_mod._CRON_AUTO_DELIVER_PLATFORM,
+            _sc_mod._CRON_AUTO_DELIVER_CHAT_ID,
+            _sc_mod._CRON_AUTO_DELIVER_THREAD_ID,
+        ):
+            _cv.set(_sc_mod._UNSET)
+    except Exception:
+        pass
+
+    # --- tools.env_passthrough — ContextVar<set[str]> with no default ---
+    # LookupError is normal if the test never set it. Setting it to an
+    # empty set unconditionally normalizes the starting state.
+    try:
+        from tools import env_passthrough as _envp_mod
+        _envp_mod._allowed_env_vars_var.set(set())
+    except Exception:
+        pass
+
+    # --- tools.terminal_tool — active environment/cwd cache ---
+    # File tools prefer a live terminal cwd when one is cached for the task.
+    # Clear terminal environments between tests so a prior terminal call can't
+    # override TERMINAL_CWD in path-resolution tests.
+    try:
+        from tools import terminal_tool as _term_mod
+        _envs_to_cleanup = []
+        with _term_mod._env_lock:
+            _envs_to_cleanup = list(_term_mod._active_environments.values())
+            _term_mod._active_environments.clear()
+            _term_mod._last_activity.clear()
+            _term_mod._creation_locks.clear()
+        for _env in _envs_to_cleanup:
+            try:
+                _env.cleanup()
+            except Exception:
+                pass
+    except Exception:
+        pass
+
+    # --- tools.credential_files — ContextVar<dict> ---
+    try:
+        from tools import credential_files as _credf_mod
+        _credf_mod._registered_files_var.set({})
+    except Exception:
+        pass
+
+    # --- agent.auxiliary_client — runtime main provider/model override and
+    #     payment-error health cache. Both are process-global in production;
+    #     reset them per test so one worker's fallback/402 test does not make
+    #     later auxiliary-client tests skip otherwise-available providers.
+    try:
+        from agent import auxiliary_client as _aux_mod
+        _aux_mod.clear_runtime_main()
+        _aux_mod._reset_aux_unhealthy_cache()
+    except Exception:
+        pass
+
+    # --- tools.file_tools — per-task read history + file-ops cache ---
+    # _read_tracker accumulates per-task_id read history for loop detection,
+    # capped by _READ_HISTORY_CAP. If entries from a prior test persist, the
+    # cap is hit faster than expected and capacity-related tests flake.
+    try:
+        from tools import file_tools as _ft_mod
+        with _ft_mod._read_tracker_lock:
+            _ft_mod._read_tracker.clear()
+        with _ft_mod._file_ops_lock:
+            _ft_mod._file_ops_cache.clear()
+    except Exception:
+        pass
+
+    yield


@pytest.fixture()
@@ -422,12 +532,13 @@ def mock_config():
    }


-# ── Per-test timeout — handled by the isolation plugin ─────────────────────
-#
-# The subprocess-per-test plugin enforces the configured ``isolate_timeout``
-# ini key by terminating the child if it overruns. The old SIGALRM-based
-# fixture (POSIX-only, didn't work on Windows) is gone.
+# ── Global test timeout ─────────────────────────────────────────────────────
+# Kill any individual test that takes longer than 30 seconds.
+# Prevents hanging tests (subprocess spawns, blocking I/O) from stalling the
+# entire test suite.

+def _timeout_handler(signum, frame):
+    raise TimeoutError("Test exceeded 30 second timeout")

@pytest.fixture(autouse=True)
 def _ensure_current_event_loop(request):
@@ -473,6 +584,45 @@ def _ensure_current_event_loop(request):
                asyncio.set_event_loop(None)


+@pytest.fixture(autouse=True)
+def _enforce_test_timeout():
+    """Kill any individual test that takes longer than 30 seconds.
+    SIGALRM is Unix-only; skip on Windows."""
+    if sys.platform == "win32":
+        yield
+        return
+    old = signal.signal(signal.SIGALRM, _timeout_handler)
+    signal.alarm(30)
+    yield
+    signal.alarm(0)
+    signal.signal(signal.SIGALRM, old)
+
+
+@pytest.fixture(autouse=True)
+def _reset_tool_registry_caches():
+    """Clear tool-registry-level caches between tests.
+
+    The production registry caches ``check_fn()`` results for 30 s
+    (see tools/registry.py) and :func:`get_tool_definitions` memoizes
+    its result (see model_tools.py). Both are keyed on state that tests
+    routinely mutate (env vars, registry._generation, config.yaml mtime)
+    — but a stale result from test A can still be served to test B
+    because 30 s covers the entire suite, and xdist worker reuse means
+    one test's cache lands in another's process. Clearing before every
+    test keeps hermetic behavior.
+    """
+    try:
+        from tools.registry import invalidate_check_fn_cache
+        invalidate_check_fn_cache()
+    except ImportError:
+        pass
+    try:
+        from model_tools import _clear_tool_defs_cache
+        _clear_tool_defs_cache()
+    except ImportError:
+        pass
+
+
 # ── Live-system guard ──────────────────────────────────────────────────────
 #
 # Several test files exercise the gateway-restart / kill code paths
@@ -313,30 +313,19 @@ def _scan_for_plugin_adapter_antipattern(source: str) -> list[str]:
    return offenses


-def _fingerprint_gateway_tests() -> str:
-    """Return a short fingerprint that changes when any gateway test file changes.
+def pytest_configure(config):
+    """Reject plugin-adapter tests that use the sys.path anti-pattern.

-    Uses (mtime, size) pairs instead of content hashing — fast to compute
-    (stat-only, no reads) and sufficient for cache invalidation across
-    per-file subprocess runs.
+    Runs once per pytest session on the controller, BEFORE any xdist
+    worker is spawned. If any file under ``tests/gateway/`` matches the
+    anti-pattern, we fail the whole session with a clear message —
+    before a polluted ``sys.path`` can cascade across workers.
    """
-    import hashlib
+    # Only run on the xdist controller (or in non-xdist runs). Skip on
+    # worker subprocesses so we don't scan the filesystem N times.
+    if hasattr(config, "workerinput"):
+        return

-    h = hashlib.sha256()
-    for path in sorted(_GATEWAY_DIR.rglob("test_*.py")):
-        try:
-            st = path.stat()
-            h.update(f"{path.name}:{st.st_mtime_ns}:{st.st_size}".encode())
-        except OSError:
-            h.update(f"{path.name}:missing".encode())
-    return h.hexdigest()[:16]
-
-
-def _run_adapter_antipattern_scan() -> list[str]:
-    """Scan gateway test files for the plugin-adapter anti-pattern.
-
-    Returns a list of violation strings (empty if clean).
-    """
    violations: list[str] = []
    for path in _GATEWAY_DIR.rglob("test_*.py"):
        if path.name in {"_plugin_adapter_loader.py", "conftest.py"}:
@@ -345,108 +334,20 @@ def _run_adapter_antipattern_scan() -> list[str]:
            source = path.read_text(encoding="utf-8")
        except OSError:
            continue
-        # Fast string pre-filter: skip files that can't possibly violate.
-        # A violating file MUST contain both (a) an adapter/plugins/platforms
-        # reference AND (b) either sys.path manipulation or a bare adapter import.
        if "adapter" not in source and "plugins/platforms" not in source:
            continue
-        if not (
-            "sys.path" in source
-            or "import adapter" in source
-            or "from adapter import" in source
-        ):
-            continue
        offenses = _scan_for_plugin_adapter_antipattern(source)
        if offenses:
            violations.append(
                f"  {path.relative_to(_GATEWAY_DIR.parent.parent)}:\n    "
                + "\n    ".join(offenses)
            )
-    return violations

-
-def pytest_configure(config):
-    """Reject plugin-adapter tests that use the sys.path anti-pattern.
-
-    Runs once per pytest session on the controller, BEFORE any xdist
-    worker is spawned. If any file under ``tests/gateway/`` matches the
-    anti-pattern, we fail the whole session with a clear message —
-    before a polluted ``sys.path`` can cascade across workers.
-
-    **Performance**: in the per-file subprocess isolation model (no xdist),
-    every subprocess is a "controller" — so the naive scan would run 257
-    times, each costing ~1s of AST walking.  We avoid this with two
-    strategies:
-
-    1. **Tight string pre-filter**: a file can only violate if it contains
-       *both* an adapter/plugins/platforms reference *and* a sys.path
-       manipulation or bare ``import adapter``.  This drops ~95% of files
-       from needing AST parsing.
-    2. **File-locked cache**: the scan result is cached in
-       ``.pytest-cache/gw-adapter-guard-<fingerprint>`` keyed on a
-       fingerprint of the gateway test file mtimes/sizes.  Concurrent
-       subprocesses acquire a lock; only the first performs the scan;
-       the rest wait and read the cached result.
-    """
-    # Only run on the xdist controller (or in non-xdist runs). Skip on
-    # worker subprocesses so we don't scan the filesystem N times.
-    if hasattr(config, "workerinput"):
-        return
-
-    fp = _fingerprint_gateway_tests()
-    cache_dir = Path.cwd() / ".pytest-cache"
-    cache_file = cache_dir / f"gw-adapter-guard-{fp}"
-    lock_file = cache_dir / f".gw-adapter-guard-{fp}.lock"
-
-    cache_dir.mkdir(parents=True, exist_ok=True)
-
-    # Evict stale cache entries from previous fingerprints (best-effort).
-    try:
-        for old in cache_dir.glob("gw-adapter-guard-*"):
-            if old.name != f"gw-adapter-guard-{fp}":
-                old.unlink(missing_ok=True)
-        for old in cache_dir.glob(".gw-adapter-guard-*.lock"):
-            if old.name != f".gw-adapter-guard-{fp}.lock":
-                old.unlink(missing_ok=True)
-    except OSError:
-        pass  # Non-critical; old files are harmless.
-
-    # Use filelock to ensure only one process scans at a time.
-    # Concurrent subprocesses all hit pytest_configure simultaneously;
-    # without a lock they'd all find no cache and all run the scan.
-    try:
-        from filelock import FileLock
-        lock = FileLock(str(lock_file), timeout=120)
-    except ImportError:
-        # Fallback: no locking (still correct, just slower under contention).
-        import contextlib
-
-        class _NoLock:
-            def __enter__(self):
-                return self
-            def __exit__(self, *a):
-                pass
-        lock = _NoLock()
-
-    with lock:
-        if cache_file.exists():
-            cached = cache_file.read_text(encoding="utf-8")
-            if cached == "clean":
-                return
-            raise pytest.UsageError(cached)
-
-        # Slow path: this process is the first to acquire the lock.
-        violations = _run_adapter_antipattern_scan()
-
-        if violations:
-            msg = (
-                "Plugin-adapter-import anti-pattern detected in gateway tests:\n"
-                + "\n".join(violations)
-                + "\n\n"
-                + _GUARD_HINT
-            )
-            cache_file.write_text(msg, encoding="utf-8")
-            raise pytest.UsageError(msg)
-        else:
-            cache_file.write_text("clean", encoding="utf-8")
+    if violations:
+        raise pytest.UsageError(
+            "Plugin-adapter-import anti-pattern detected in gateway tests:\n"
+            + "\n".join(violations)
+            + "\n\n"
+            + _GUARD_HINT
+        )

@@ -22,26 +22,19 @@ from gateway.config import PlatformConfig


 def _ensure_telegram_mock():
+    if "telegram" in sys.modules and hasattr(sys.modules["telegram"], "__file__"):
+        return
+
    telegram_mod = MagicMock()
    telegram_mod.ext.ContextTypes.DEFAULT_TYPE = type(None)
+    telegram_mod.constants.ParseMode.MARKDOWN_V2 = "MarkdownV2"
+    telegram_mod.constants.ChatType.GROUP = "group"
+    telegram_mod.constants.ChatType.SUPERGROUP = "supergroup"
+    telegram_mod.constants.ChatType.CHANNEL = "channel"
+    telegram_mod.constants.ChatType.PRIVATE = "private"

-    # Register telegram.constants as a separate module mock so that
-    # ``from telegram.constants import ChatType`` resolves to our mock
-    # with string-valued members (not auto-generated MagicMocks).
-    constants_mod = MagicMock()
-    constants_mod.ParseMode.MARKDOWN_V2 = "MarkdownV2"
-    constants_mod.ChatType.GROUP = "group"
-    constants_mod.ChatType.SUPERGROUP = "supergroup"
-    constants_mod.ChatType.CHANNEL = "channel"
-    constants_mod.ChatType.PRIVATE = "private"
-
-    sys.modules["telegram"] = telegram_mod
-    sys.modules["telegram.ext"] = telegram_mod.ext
-    sys.modules["telegram.constants"] = constants_mod
-    sys.modules["telegram.request"] = telegram_mod.request
-
-    # Force reimport so the adapter picks up the mock ChatType.
-    sys.modules.pop("gateway.platforms.telegram", None)
+    for name in ("telegram", "telegram.ext", "telegram.constants", "telegram.request"):
+        sys.modules.setdefault(name, telegram_mod)


 _ensure_telegram_mock()
@@ -22,11 +22,6 @@ import pytest

 from gateway.config import Platform, PlatformConfig, load_gateway_config

-# Platform uses _missing_() for dynamic members, so "google_chat" is
-# resolvable via Platform("google_chat") even without a static
-# GOOGLE_CHAT attribute on the enum class.
-_GC = Platform("google_chat")
-

 # ---------------------------------------------------------------------------
 # Mock the google-* packages if they are not installed
@@ -234,7 +229,7 @@ def _make_chat_envelope(text="hello", sender_email="u@example.com", sender_type=

 class TestPlatformRegistration:
    def test_enum_value(self):
-        assert _GC.value == "google_chat"
+        assert Platform.GOOGLE_CHAT.value == "google_chat"

    def test_requirements_check_returns_true_when_available(self):
        # The shim flag is True in this test module.
@@ -271,14 +266,14 @@ class TestEnvConfigLoading:
        monkeypatch.setenv("GOOGLE_CHAT_PROJECT_ID", "p")
        # No subscription.
        cfg = load_gateway_config()
-        assert _GC not in cfg.platforms
+        assert Platform.GOOGLE_CHAT not in cfg.platforms

    def test_missing_project_does_not_enable(self, monkeypatch):
        self._clean_env(monkeypatch)
        monkeypatch.setenv("GOOGLE_CHAT_SUBSCRIPTION_NAME",
                           "projects/p/subscriptions/s")
        cfg = load_gateway_config()
-        assert _GC not in cfg.platforms
+        assert Platform.GOOGLE_CHAT not in cfg.platforms



@@ -2588,7 +2583,7 @@ class TestAuthorizationEmailMatch:
        runner.pairing_store.is_approved = MagicMock(return_value=False)

        source = SessionSource(
-            platform=_GC,
+            platform=Platform.GOOGLE_CHAT,
            chat_id="spaces/S",
            chat_type="dm",
            user_id="alice@example.com",       # post-swap: email is canonical
@@ -2609,7 +2604,7 @@ class TestAuthorizationEmailMatch:
        runner.pairing_store.is_approved = MagicMock(return_value=False)

        source = SessionSource(
-            platform=_GC,
+            platform=Platform.GOOGLE_CHAT,
            chat_id="spaces/S",
            chat_type="dm",
            user_id="bob@example.com",
@@ -2635,7 +2630,7 @@ class TestAuthorizationEmailMatch:
        runner.pairing_store.is_approved = MagicMock(return_value=False)

        source = SessionSource(
-            platform=_GC,
+            platform=Platform.GOOGLE_CHAT,
            chat_id="spaces/S",
            chat_type="dm",
            user_id="users/77777",  # no email available — resource name wins
@@ -75,197 +75,9 @@ class TestCodeGeneration:
            code = store.generate_code("telegram", "user1", "Alice")
            pending = store.list_pending("telegram")
        assert len(pending) == 1
-        # list_pending no longer returns the original code — it returns a
-        # truncated hash prefix.  Verify the metadata is correct instead.
+        assert pending[0]["code"] == code
        assert pending[0]["user_id"] == "user1"
        assert pending[0]["user_name"] == "Alice"
-        # The code field is now a hash prefix, not the original plaintext code
-        assert pending[0]["code"] != code
-
-
-# ---------------------------------------------------------------------------
-# Hashed storage
-# ---------------------------------------------------------------------------
-
-
-class TestHashedStorage:
-    def test_pending_file_contains_hash_and_salt(self, tmp_path):
-        """Stored entries must have 'hash' and 'salt', never the plaintext code."""
-        with patch("gateway.pairing.PAIRING_DIR", tmp_path):
-            store = PairingStore()
-            code = store.generate_code("telegram", "user1", "Alice")
-            raw = json.loads(
-                (tmp_path / "telegram-pending.json").read_text(encoding="utf-8")
-            )
-
-        assert len(raw) == 1
-        entry = next(iter(raw.values()))
-        # Must have hash and salt fields
-        assert "hash" in entry
-        assert "salt" in entry
-        # Hash must be a valid hex SHA-256 digest (64 hex chars)
-        assert len(entry["hash"]) == 64
-        assert all(c in "0123456789abcdef" for c in entry["hash"])
-        # Salt must be a valid hex string (32 hex chars for 16 bytes)
-        assert len(entry["salt"]) == 32
-        assert all(c in "0123456789abcdef" for c in entry["salt"])
-        # The plaintext code must NOT appear as a key or value anywhere
-        assert code not in raw  # not a key
-        for key, val in raw.items():
-            assert code != key
-            for field_val in val.values():
-                if isinstance(field_val, str):
-                    assert field_val != code
-
-    def test_plaintext_code_not_stored(self, tmp_path):
-        """The raw JSON file must not contain the plaintext code anywhere."""
-        with patch("gateway.pairing.PAIRING_DIR", tmp_path):
-            store = PairingStore()
-            code = store.generate_code("telegram", "user1")
-            raw_text = (tmp_path / "telegram-pending.json").read_text(encoding="utf-8")
-        assert code not in raw_text
-
-    def test_valid_code_verifies_against_hash(self, tmp_path):
-        """approve_code with the correct code should succeed."""
-        with patch("gateway.pairing.PAIRING_DIR", tmp_path):
-            store = PairingStore()
-            code = store.generate_code("telegram", "user1", "Bob")
-            result = store.approve_code("telegram", code)
-        assert result is not None
-        assert result["user_id"] == "user1"
-        assert result["user_name"] == "Bob"
-
-    def test_invalid_code_rejected(self, tmp_path):
-        """approve_code with a wrong code should fail."""
-        with patch("gateway.pairing.PAIRING_DIR", tmp_path):
-            store = PairingStore()
-            store.generate_code("telegram", "user1")
-            result = store.approve_code("telegram", "ZZZZZZZZ")
-        assert result is None
-
-    def test_different_salts_per_entry(self, tmp_path):
-        """Each pending entry should have a unique salt."""
-        with patch("gateway.pairing.PAIRING_DIR", tmp_path):
-            store = PairingStore()
-            store.generate_code("telegram", "user0")
-            store.generate_code("telegram", "user1")
-            store.generate_code("telegram", "user2")
-            raw = json.loads(
-                (tmp_path / "telegram-pending.json").read_text(encoding="utf-8")
-            )
-        salts = [entry["salt"] for entry in raw.values()]
-        assert len(set(salts)) == 3  # all unique
-
-    def test_hash_code_static_method(self, tmp_path):
-        """_hash_code should be deterministic for the same code+salt."""
-        salt = os.urandom(16)
-        h1 = PairingStore._hash_code("ABCD1234", salt)
-        h2 = PairingStore._hash_code("ABCD1234", salt)
-        assert h1 == h2
-        # Different salt should produce a different hash
-        salt2 = os.urandom(16)
-        h3 = PairingStore._hash_code("ABCD1234", salt2)
-        assert h3 != h1
-
-
-class TestLegacyPendingFileCompat:
-    """Defensive coverage for pre-hash pending.json on upgraded installs.
-
-    Existing user installs may have a pending.json written by the old
-    code (plaintext code as key, no hash/salt fields). The new
-    approve_code / list_pending / _cleanup_expired must not crash on
-    those entries — they should be ignored and aged out at TTL.
-    """
-
-    @staticmethod
-    def _write_legacy(tmp_path, code="ABCD1234", created_at=None):
-        """Write a pre-hash pending.json with plaintext code as the key."""
-        import time as _time
-        if created_at is None:
-            created_at = _time.time()
-        legacy = {
-            code: {
-                "user_id": "legacy-user",
-                "user_name": "Legacy",
-                "created_at": created_at,
-            }
-        }
-        (tmp_path / "telegram-pending.json").write_text(
-            json.dumps(legacy), encoding="utf-8"
-        )
-
-    def test_approve_code_ignores_legacy_entries(self, tmp_path):
-        """A valid old-format code must NOT silently approve under the new schema."""
-        with patch("gateway.pairing.PAIRING_DIR", tmp_path):
-            self._write_legacy(tmp_path, code="LEGACY01")
-            store = PairingStore()
-            # The plaintext "code" used to be the key — under the new schema
-            # it's not even looked at, and there's no hash/salt to verify.
-            # Result: approve_code returns None, the legacy entry is left
-            # alone (gets pruned by _cleanup_expired at TTL).
-            result = store.approve_code("telegram", "LEGACY01")
-            assert result is None
-            # Approved list must be empty
-            assert store.is_approved("telegram", "legacy-user") is False
-
-    def test_list_pending_handles_legacy_entries(self, tmp_path):
-        """list_pending must not KeyError on a missing 'hash' field."""
-        with patch("gateway.pairing.PAIRING_DIR", tmp_path):
-            self._write_legacy(tmp_path)
-            store = PairingStore()
-            pending = store.list_pending("telegram")
-        assert len(pending) == 1
-        assert pending[0]["user_id"] == "legacy-user"
-        assert pending[0]["code"] == "legacy"  # placeholder
-
-    def test_cleanup_expired_removes_legacy_at_ttl(self, tmp_path):
-        """Legacy entries past CODE_TTL must still get pruned."""
-        import time as _time
-        with patch("gateway.pairing.PAIRING_DIR", tmp_path):
-            self._write_legacy(
-                tmp_path,
-                code="LEGACY99",
-                created_at=_time.time() - CODE_TTL_SECONDS - 1,
-            )
-            store = PairingStore()
-            store._cleanup_expired("telegram")
-            raw = json.loads(
-                (tmp_path / "telegram-pending.json").read_text(encoding="utf-8")
-            )
-        assert raw == {}
-
-    def test_cleanup_expired_handles_malformed_entries(self, tmp_path):
-        """Non-dict / missing-created_at entries get evicted, not crashed on."""
-        with patch("gateway.pairing.PAIRING_DIR", tmp_path):
-            (tmp_path / "telegram-pending.json").write_text(
-                json.dumps({
-                    "broken1": "not a dict",
-                    "broken2": {"user_id": "x"},  # no created_at
-                    "broken3": {"created_at": "not a number"},
-                }),
-                encoding="utf-8",
-            )
-            store = PairingStore()
-            store._cleanup_expired("telegram")
-            raw = json.loads(
-                (tmp_path / "telegram-pending.json").read_text(encoding="utf-8")
-            )
-        assert raw == {}
-
-    def test_approve_code_skips_malformed_entries(self, tmp_path):
-        """Malformed entries must not crash approve_code's hash loop."""
-        import time as _time
-        with patch("gateway.pairing.PAIRING_DIR", tmp_path):
-            (tmp_path / "telegram-pending.json").write_text(
-                json.dumps({
-                    "broken": {"user_id": "x", "created_at": _time.time(),
-                               "salt": "not-hex", "hash": "doesntmatter"},
-                }),
-                encoding="utf-8",
-            )
-            store = PairingStore()
-            # Approving with any code must just return None, not crash.
-            assert store.approve_code("telegram", "ABCD1234") is None


 # ---------------------------------------------------------------------------
@@ -488,10 +300,9 @@ class TestCodeExpiry:
            store = PairingStore()
            code = store.generate_code("telegram", "user1")

-            # Manually expire all pending entries
+            # Manually expire the code
            pending = store._load_json(store._pending_path("telegram"))
-            for entry_id in pending:
-                pending[entry_id]["created_at"] = time.time() - CODE_TTL_SECONDS - 1
+            pending[code]["created_at"] = time.time() - CODE_TTL_SECONDS - 1
            store._save_json(store._pending_path("telegram"), pending)

            # Cleanup happens on next operation
@@ -503,10 +314,9 @@ class TestCodeExpiry:
            store = PairingStore()
            code = store.generate_code("telegram", "user1")

-            # Expire all entries
+            # Expire it
            pending = store._load_json(store._pending_path("telegram"))
-            for entry_id in pending:
-                pending[entry_id]["created_at"] = time.time() - CODE_TTL_SECONDS - 1
+            pending[code]["created_at"] = time.time() - CODE_TTL_SECONDS - 1
            store._save_json(store._pending_path("telegram"), pending)

            result = store.approve_code("telegram", code)
@@ -6,11 +6,7 @@ import pytest
 from pathlib import Path

 from gateway.config import PlatformConfig
-from gateway.platforms.webhook import (
-    WebhookAdapter,
-    _DYNAMIC_ROUTES_FILENAME,
-    _INSECURE_NO_AUTH,
-)
+from gateway.platforms.webhook import WebhookAdapter, _DYNAMIC_ROUTES_FILENAME


 def _make_adapter(routes=None, extra=None):
@@ -89,78 +85,3 @@ class TestDynamicRouteLoading:
        adapter._reload_dynamic_routes()
        assert "static" in adapter._routes
        assert len(adapter._dynamic_routes) == 0
-
-
-class TestDynamicRouteSecretValidation:
-    """Empty/missing secrets must be rejected during hot-reload.
-
-    Regression for HMAC bypass: prior to the fix, an agent-induced
-    dynamic route with `"secret": ""` would be merged into self._routes
-    by _reload_dynamic_routes(), then _handle_webhook's
-    `if secret and secret != _INSECURE_NO_AUTH` would skip signature
-    validation because empty string is falsy. Unauthenticated POSTs
-    would then execute the webhook prompt.
-    """
-
-    def test_empty_secret_rejected(self, tmp_path):
-        # Explicit empty-string secret must NOT fall back to the global
-        # secret, and the route must be skipped entirely.
-        (tmp_path / _DYNAMIC_ROUTES_FILENAME).write_text(
-            json.dumps({"evil": {"secret": "", "prompt": "rm -rf"}})
-        )
-        adapter = _make_adapter()  # has global secret
-        adapter._reload_dynamic_routes()
-        assert "evil" not in adapter._routes
-        assert "evil" not in adapter._dynamic_routes
-
-    def test_missing_secret_no_global_rejected(self, tmp_path):
-        (tmp_path / _DYNAMIC_ROUTES_FILENAME).write_text(
-            json.dumps({"orphan": {"prompt": "test"}})
-        )
-        # No global secret configured
-        adapter = _make_adapter(extra={"secret": ""})
-        adapter._reload_dynamic_routes()
-        assert "orphan" not in adapter._routes
-        assert "orphan" not in adapter._dynamic_routes
-
-    def test_missing_secret_inherits_global(self, tmp_path):
-        # No per-route secret but a global one is set → route is kept,
-        # the global secret protects it. Preserves existing fallback.
-        (tmp_path / _DYNAMIC_ROUTES_FILENAME).write_text(
-            json.dumps({"valid": {"prompt": "ok"}})
-        )
-        adapter = _make_adapter()  # global secret set
-        adapter._reload_dynamic_routes()
-        assert "valid" in adapter._routes
-
-    def test_insecure_no_auth_preserved(self, tmp_path):
-        # Explicit opt-in escape hatch for local testing — must still load.
-        (tmp_path / _DYNAMIC_ROUTES_FILENAME).write_text(
-            json.dumps({"test": {"secret": _INSECURE_NO_AUTH, "prompt": "p"}})
-        )
-        adapter = _make_adapter()
-        adapter._reload_dynamic_routes()
-        assert "test" in adapter._routes
-
-    def test_warning_logged_on_skip(self, tmp_path, caplog):
-        import logging
-        (tmp_path / _DYNAMIC_ROUTES_FILENAME).write_text(
-            json.dumps({"silent": {"secret": "", "prompt": "x"}})
-        )
-        adapter = _make_adapter()
-        with caplog.at_level(logging.WARNING, logger="gateway.platforms.webhook"):
-            adapter._reload_dynamic_routes()
-        assert any("silent" in rec.message for rec in caplog.records)
-
-    def test_partial_skip(self, tmp_path):
-        # One route bad, one route good — only the bad one is dropped.
-        (tmp_path / _DYNAMIC_ROUTES_FILENAME).write_text(
-            json.dumps({
-                "bad":  {"secret": "", "prompt": "x"},
-                "good": {"secret": "valid-secret", "prompt": "y"},
-            })
-        )
-        adapter = _make_adapter()
-        adapter._reload_dynamic_routes()
-        assert "good" in adapter._routes
-        assert "bad" not in adapter._routes
@@ -1,131 +0,0 @@
-"""Tests for curses color compatibility on low-color terminals (Docker).
-
-Regression test for #13688: ``hermes plugins`` crashes with
-``curses.error: init_pair() : color number is greater than COLORS-1``
-in Docker containers where curses.COLORS == 8 (only colors 0-7 exist).
-
-The bug was ``curses.init_pair(4, 8, -1)`` using raw color 8 ("bright
-black" / dim gray) which does not exist on 8-color terminals.  The fix
-clamps with ``min(8, curses.COLORS - 1)``.
-"""
-
-import curses
-import re
-from pathlib import Path
-from unittest.mock import patch, MagicMock, call
-
-import pytest
-
-
-# Path to the source files under test
-_SRC_ROOT = Path(__file__).parent.parent.parent / "hermes_cli"
-
-
-class TestInitPairClampingBehavior:
-    """Simulate curses color initialization on low-color terminals.
-
-    Patches curses.COLORS to 8 (Docker default) and verifies that
-    init_pair is never called with a color >= COLORS.
-    """
-
-    def _collect_init_pair_calls(self, draw_fn, colors_value):
-        """Run a curses draw function with a mock stdscr and patched COLORS.
-
-        Returns list of (pair_number, fg, bg) tuples from init_pair calls.
-        """
-        calls = []
-        real_init_pair = curses.init_pair
-
-        def tracking_init_pair(pair, fg, bg):
-            calls.append((pair, fg, bg))
-
-        mock_stdscr = MagicMock()
-        mock_stdscr.getmaxyx.return_value = (24, 80)
-        mock_stdscr.getch.return_value = 27  # ESC to exit
-
-        with patch("curses.COLORS", colors_value, create=True), \
-             patch("curses.init_pair", side_effect=tracking_init_pair), \
-             patch("curses.has_colors", return_value=True), \
-             patch("curses.start_color"), \
-             patch("curses.use_default_colors"), \
-             patch("curses.curs_set"):
-            try:
-                draw_fn(mock_stdscr)
-            except (SystemExit, StopIteration, Exception):
-                pass  # draw functions loop until keypress
-
-        return calls
-
-    def test_8_color_terminal_no_color_exceeds_limit(self):
-        """On an 8-color terminal (Docker), no init_pair fg color >= 8."""
-        # Simulate the color init pattern from plugins_cmd.py
-        def _simulated_color_init(stdscr):
-            if curses.has_colors():
-                curses.start_color()
-                curses.use_default_colors()
-                curses.init_pair(1, curses.COLOR_GREEN, -1)
-                curses.init_pair(2, curses.COLOR_YELLOW, -1)
-                curses.init_pair(3, curses.COLOR_CYAN, -1)
-                curses.init_pair(4, 8 if curses.COLORS > 8 else curses.COLOR_WHITE, -1)
-
-        calls = self._collect_init_pair_calls(_simulated_color_init, 8)
-        for pair, fg, bg in calls:
-            assert fg < 8, (
-                f"init_pair({pair}, {fg}, {bg}) uses color {fg} which "
-                f"does not exist on an 8-color terminal (valid: 0-7)"
-            )
-
-    def test_256_color_terminal_uses_color_8(self):
-        """On a 256-color terminal, color 8 (dim gray) should be used."""
-        def _simulated_color_init(stdscr):
-            if curses.has_colors():
-                curses.start_color()
-                curses.use_default_colors()
-                curses.init_pair(4, 8 if curses.COLORS > 8 else curses.COLOR_WHITE, -1)
-
-        calls = self._collect_init_pair_calls(_simulated_color_init, 256)
-        assert any(fg == 8 for _, fg, _ in calls), (
-            "On 256-color terminals, color 8 (dim gray) should be used"
-        )
-
-    def test_16_color_terminal_uses_color_8(self):
-        """On a 16-color terminal, color 8 should be available."""
-        def _simulated_color_init(stdscr):
-            if curses.has_colors():
-                curses.start_color()
-                curses.use_default_colors()
-                curses.init_pair(4, 8 if curses.COLORS > 8 else curses.COLOR_WHITE, -1)
-
-        calls = self._collect_init_pair_calls(_simulated_color_init, 16)
-        assert any(fg == 8 for _, fg, _ in calls)
-
-
-class TestSourceCodeGuardrails:
-    """Regression guardrails: raw color 8 must not reappear in source.
-
-    These complement the behavioral tests above — they catch regressions
-    introduced by copy-paste of the old pattern.
-    """
-
-    _RAW_COLOR_8_PATTERN = re.compile(r'init_pair\(\d+,\s*8\s*,')
-
-    def test_no_raw_color_8_in_plugins_cmd(self):
-        source = (_SRC_ROOT / "plugins_cmd.py").read_text()
-        matches = self._RAW_COLOR_8_PATTERN.findall(source)
-        assert not matches, (
-            f"plugins_cmd.py contains unclamped color 8: {matches}"
-        )
-
-    def test_no_raw_color_8_in_main(self):
-        source = (_SRC_ROOT / "main.py").read_text()
-        matches = self._RAW_COLOR_8_PATTERN.findall(source)
-        assert not matches, (
-            f"main.py contains unclamped color 8: {matches}"
-        )
-
-    def test_no_raw_color_8_in_curses_ui(self):
-        source = (_SRC_ROOT / "curses_ui.py").read_text()
-        matches = self._RAW_COLOR_8_PATTERN.findall(source)
-        assert not matches, (
-            f"curses_ui.py contains unclamped color 8: {matches}"
-        )
@@ -69,19 +69,18 @@ class TestPluginPickerInjection:
        assert "Myimg" in names
        assert "myimg" in plugin_names

-    def test_fal_surfaced_alongside_other_plugins(self, monkeypatch):
+    def test_fal_skipped_to_avoid_duplicate(self, monkeypatch):
        from hermes_cli import tools_config

-        # After #26241, FAL is itself a plugin (`plugins/image_gen/fal/`)
-        # and the hardcoded `TOOL_CATEGORIES["image_gen"]` FAL row is
-        # gone. The plugin-row builder therefore surfaces it like any
-        # other backend — no deduplication step needed.
+        # Simulate a FAL plugin being registered — the picker already has
+        # hardcoded FAL rows in TOOL_CATEGORIES, so plugin-FAL must be
+        # skipped to avoid showing FAL twice.
        image_gen_registry.register_provider(_FakeProvider("fal"))
        image_gen_registry.register_provider(_FakeProvider("openai"))

        rows = tools_config._plugin_image_gen_providers()
        names = [r.get("image_gen_plugin_name") for r in rows]
-        assert "fal" in names
+        assert "fal" not in names
        assert "openai" in names

    def test_visible_providers_includes_plugins_for_image_gen(self, monkeypatch):
@@ -1,4 +1,4 @@
-"""Tests for ``install_cua_driver`` upgrade semantics and architecture pre-check.
+"""Tests for ``install_cua_driver`` upgrade semantics.

 The cua-driver upstream installer always pulls the latest release tag, so
 re-running it is the canonical upgrade path. ``install_cua_driver(upgrade=True)``
@@ -10,18 +10,18 @@ must:
  fix for the "we only pulled cua-driver once on enable" complaint).
 * Preserve original ``upgrade=False`` behaviour for the toolset-enable flow:
  skip if installed, install otherwise, warn on non-macOS.
-* Pre-check architecture compatibility before downloading to avoid raw 404
-  errors on Intel macOS when the upstream release lacks x86_64 assets.
 """

 from __future__ import annotations

-import json
-from unittest.mock import MagicMock, patch
+from unittest.mock import patch


 class TestInstallCuaDriverUpgrade:
    def test_upgrade_on_non_macos_is_silent_noop(self):
+        """``hermes update`` calls install_cua_driver(upgrade=True) for every
+        user. On Linux/Windows it must return False without printing the
+        "macOS-only; skipping" warning that the toolset-enable path emits."""
        from hermes_cli import tools_config

        with patch.object(tools_config, "_print_warning") as warn, \
@@ -30,6 +30,8 @@ class TestInstallCuaDriverUpgrade:
            warn.assert_not_called()

    def test_non_upgrade_on_non_macos_warns(self):
+        """The toolset-enable path (upgrade=False) should still warn loudly
+        when the user tries to enable Computer Use on a non-macOS host."""
        from hermes_cli import tools_config

        with patch.object(tools_config, "_print_warning") as warn, \
@@ -38,36 +40,43 @@ class TestInstallCuaDriverUpgrade:
            warn.assert_called()

    def test_upgrade_on_macos_with_binary_runs_installer(self):
+        """When cua-driver is already on PATH and upgrade=True, we must
+        re-run the upstream installer (this is the fix for the bug report).
+        """
        from hermes_cli import tools_config

        with patch("platform.system", return_value="Darwin"), \
             patch.object(tools_config.shutil, "which",
                          side_effect=lambda n: "/usr/local/bin/" + n
                                                 if n in {"cua-driver", "curl"} else None), \
-             patch.object(tools_config, "_check_cua_driver_asset_for_arch",
-                          return_value=True), \
             patch.object(tools_config, "_run_cua_driver_installer",
                          return_value=True) as runner, \
             patch("subprocess.run"):
            assert tools_config.install_cua_driver(upgrade=True) is True
            runner.assert_called_once()
+            # Refresh path uses non-verbose mode so we don't re-print the
+            # "grant macOS permissions" block on every `hermes update`.
            kwargs = runner.call_args.kwargs
            assert kwargs.get("verbose") is False

    def test_upgrade_on_macos_without_binary_runs_installer(self):
+        """upgrade=True with cua-driver missing must still trigger an
+        install — equivalent to a fresh install. (Don't silently no-op.)"""
        from hermes_cli import tools_config

        with patch("platform.system", return_value="Darwin"), \
             patch.object(tools_config.shutil, "which",
                          side_effect=lambda n: "/usr/bin/curl" if n == "curl" else None), \
-             patch.object(tools_config, "_check_cua_driver_asset_for_arch",
-                          return_value=True), \
             patch.object(tools_config, "_run_cua_driver_installer",
                          return_value=True) as runner:
            assert tools_config.install_cua_driver(upgrade=True) is True
            runner.assert_called_once()

    def test_non_upgrade_on_macos_with_binary_skips_install(self):
+        """Original toolset-enable behaviour: cua-driver already installed
+        + upgrade=False → confirm and return without re-running installer.
+        This is the behaviour that ``hermes tools`` (re)enable depends on,
+        so the new helper must not regress it."""
        from hermes_cli import tools_config

        with patch("platform.system", return_value="Darwin"), \
@@ -80,133 +89,27 @@ class TestInstallCuaDriverUpgrade:
            runner.assert_not_called()

    def test_non_upgrade_on_macos_without_binary_runs_installer(self):
+        """Original fresh-install path must still work."""
        from hermes_cli import tools_config

        with patch("platform.system", return_value="Darwin"), \
             patch.object(tools_config.shutil, "which",
                          side_effect=lambda n: "/usr/bin/curl" if n == "curl" else None), \
-             patch.object(tools_config, "_check_cua_driver_asset_for_arch",
-                          return_value=True), \
             patch.object(tools_config, "_run_cua_driver_installer",
                          return_value=True) as runner:
            assert tools_config.install_cua_driver(upgrade=False) is True
+            runner.assert_called_once()

-
-class TestCheckCuaDriverAssetForArch:
-    def test_arm64_always_returns_true(self):
+    def test_upgrade_without_curl_does_not_crash(self):
+        """If curl isn't on PATH we can't refresh — must warn and return
+        the current install state, not raise."""
        from hermes_cli import tools_config

-        with patch("platform.machine", return_value="arm64"):
-            assert tools_config._check_cua_driver_asset_for_arch() is True
-
-    def test_x86_64_with_asset_returns_true(self):
-        from hermes_cli import tools_config
-
-        release = {
-            "tag_name": "cua-driver-v0.1.6",
-            "assets": [
-                {"name": "cua-driver-0.1.6-darwin-arm64.tar.gz"},
-                {"name": "cua-driver-0.1.6-darwin-x86_64.tar.gz"},
-            ],
-        }
-        mock_resp = MagicMock()
-        mock_resp.read.return_value = json.dumps(release).encode()
-        mock_resp.__enter__ = lambda s: s
-        mock_resp.__exit__ = MagicMock(return_value=False)
-
-        with patch("platform.machine", return_value="x86_64"), \
-             patch("urllib.request.urlopen", return_value=mock_resp):
-            assert tools_config._check_cua_driver_asset_for_arch() is True
-
-    def test_x86_64_without_asset_returns_false(self):
-        from hermes_cli import tools_config
-
-        release = {
-            "tag_name": "cua-driver-v0.1.6",
-            "assets": [
-                {"name": "cua-driver-0.1.6-darwin-arm64.tar.gz"},
-                {"name": "cua-driver.tar.gz"},
-            ],
-        }
-        mock_resp = MagicMock()
-        mock_resp.read.return_value = json.dumps(release).encode()
-        mock_resp.__enter__ = lambda s: s
-        mock_resp.__exit__ = MagicMock(return_value=False)
-
-        with patch("platform.machine", return_value="x86_64"), \
-             patch("urllib.request.urlopen", return_value=mock_resp), \
-             patch.object(tools_config, "_print_warning") as warn, \
-             patch.object(tools_config, "_print_info"):
-            assert tools_config._check_cua_driver_asset_for_arch() is False
-            warn.assert_called_once()
-            assert "no Intel" in warn.call_args[0][0].lower() or "x86_64" in warn.call_args[0][0]
-
-    def test_x86_64_api_failure_returns_true(self):
-        """Network failure should fail open — let the installer handle it."""
-        from hermes_cli import tools_config
-
-        with patch("platform.machine", return_value="x86_64"), \
-             patch("urllib.request.urlopen", side_effect=Exception("timeout")):
-            assert tools_config._check_cua_driver_asset_for_arch() is True
-
-    def test_fresh_install_x86_64_no_asset_skips_installer(self):
-        """When the latest release has no Intel asset, skip the installer."""
-        from hermes_cli import tools_config
-
-        release = {
-            "tag_name": "cua-driver-v0.1.6",
-            "assets": [{"name": "cua-driver-0.1.6-darwin-arm64.tar.gz"}],
-        }
-        mock_resp = MagicMock()
-        mock_resp.read.return_value = json.dumps(release).encode()
-        mock_resp.__enter__ = lambda s: s
-        mock_resp.__exit__ = MagicMock(return_value=False)
+        # cua-driver present, curl missing.
+        def _which(name):
+            return "/usr/local/bin/cua-driver" if name == "cua-driver" else None

        with patch("platform.system", return_value="Darwin"), \
-             patch.object(tools_config.shutil, "which",
-                          side_effect=lambda n: "/usr/bin/curl" if n == "curl" else None), \
-             patch("platform.machine", return_value="x86_64"), \
-             patch("urllib.request.urlopen", return_value=mock_resp), \
-             patch.object(tools_config, "_print_warning"), \
-             patch.object(tools_config, "_print_info"), \
-             patch.object(tools_config, "_run_cua_driver_installer") as runner:
-            assert tools_config.install_cua_driver(upgrade=False) is False
-            runner.assert_not_called()
-
-    def test_upgrade_x86_64_no_asset_returns_existing_status(self):
-        """On upgrade with no Intel asset, return whether binary existed."""
-        from hermes_cli import tools_config
-
-        release = {
-            "tag_name": "cua-driver-v0.1.6",
-            "assets": [{"name": "cua-driver-0.1.6-darwin-arm64.tar.gz"}],
-        }
-        mock_resp = MagicMock()
-        mock_resp.read.return_value = json.dumps(release).encode()
-        mock_resp.__enter__ = lambda s: s
-        mock_resp.__exit__ = MagicMock(return_value=False)
-
-        # With binary installed — returns True (binary exists)
-        with patch("platform.system", return_value="Darwin"), \
-             patch.object(tools_config.shutil, "which",
-                          side_effect=lambda n: "/usr/local/bin/" + n
-                                                 if n in ("cua-driver", "curl") else None), \
-             patch("platform.machine", return_value="x86_64"), \
-             patch("urllib.request.urlopen", return_value=mock_resp), \
-             patch.object(tools_config, "_print_warning"), \
-             patch.object(tools_config, "_print_info"), \
-             patch.object(tools_config, "_run_cua_driver_installer") as runner:
+             patch.object(tools_config.shutil, "which", side_effect=_which), \
+             patch.object(tools_config, "_print_warning"):
            assert tools_config.install_cua_driver(upgrade=True) is True
-            runner.assert_not_called()
-
-        # Without binary — returns False
-        with patch("platform.system", return_value="Darwin"), \
-             patch.object(tools_config.shutil, "which",
-                          side_effect=lambda n: "/usr/bin/curl" if n == "curl" else None), \
-             patch("platform.machine", return_value="x86_64"), \
-             patch("urllib.request.urlopen", return_value=mock_resp), \
-             patch.object(tools_config, "_print_warning"), \
-             patch.object(tools_config, "_print_info"), \
-             patch.object(tools_config, "_run_cua_driver_installer") as runner:
-            assert tools_config.install_cua_driver(upgrade=True) is False
-            runner.assert_not_called()
@@ -7,7 +7,6 @@ printf) to verify it behaves like a PTY you can read/write/resize/close.
 from __future__ import annotations

 import os
-import shutil
 import sys
 import time

@@ -67,7 +66,7 @@ class TestPtyBridgeIO:
    def test_write_sends_to_child_stdin(self):
        # `cat` with no args echoes stdin back to stdout.  We write a line,
        # read it back, then signal EOF to let cat exit cleanly.
-        bridge = PtyBridge.spawn([shutil.which("cat") or "cat"])
+        bridge = PtyBridge.spawn(["/bin/cat"])
        try:
            bridge.write(b"hello-pty\n")
            output = _read_until(bridge, b"hello-pty")
@@ -1631,33 +1631,6 @@ def test_named_custom_runtime_propagates_model_direct_path(monkeypatch):
    assert resolved["provider"] == "custom"


-def test_named_custom_runtime_propagates_extra_body_direct_path(monkeypatch):
-    """Custom provider extra_body should become runtime request_overrides."""
-    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "my-gemma")
-    monkeypatch.setattr(
-        rp, "_get_named_custom_provider",
-        lambda p: {
-            "name": "my-gemma",
-            "base_url": "http://localhost:8000/v1",
-            "api_key": "test-key",
-            "model": "google/gemma-4-31b-it",
-            "extra_body": {
-                "enable_thinking": True,
-                "reasoning_effort": "high",
-            },
-        },
-    )
-    monkeypatch.setattr(rp, "_try_resolve_from_custom_pool", lambda *a, **k: None)
-
-    resolved = rp.resolve_runtime_provider(requested="my-gemma")
-    assert resolved["request_overrides"] == {
-        "extra_body": {
-            "enable_thinking": True,
-            "reasoning_effort": "high",
-        }
-    }
-
-
 def test_named_custom_runtime_propagates_model_pool_path(monkeypatch):
    """Model should propagate even when credential pool handles credentials."""
    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "my-server")
@@ -1689,36 +1662,6 @@ def test_named_custom_runtime_propagates_model_pool_path(monkeypatch):
    assert resolved["api_key"] == "pool-key", "pool credentials should be used"


-def test_named_custom_runtime_propagates_extra_body_pool_path(monkeypatch):
-    """Custom provider extra_body should survive credential-pool resolution."""
-    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "my-gemma")
-    monkeypatch.setattr(
-        rp, "_get_named_custom_provider",
-        lambda p: {
-            "name": "my-gemma",
-            "base_url": "http://localhost:8000/v1",
-            "api_key": "test-key",
-            "model": "google/gemma-4-31b-it",
-            "extra_body": {"enable_thinking": True},
-        },
-    )
-    monkeypatch.setattr(
-        rp, "_try_resolve_from_custom_pool",
-        lambda *a, **k: {
-            "provider": "custom",
-            "api_mode": "chat_completions",
-            "base_url": "http://localhost:8000/v1",
-            "api_key": "pool-key",
-            "source": "pool:custom:my-gemma",
-        },
-    )
-
-    resolved = rp.resolve_runtime_provider(requested="my-gemma")
-    assert resolved["request_overrides"] == {
-        "extra_body": {"enable_thinking": True}
-    }
-
-
 def test_named_custom_runtime_no_model_when_absent(monkeypatch):
    """When custom_providers entry has no model field, runtime should not either."""
    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "my-server")
@@ -2207,24 +2150,6 @@ class TestProviderEntryApiKeyEnvAlias:
        key_env so the set stays in sync with what the runtime actually reads."""
        from hermes_cli.config import _VALID_CUSTOM_PROVIDER_FIELDS
        assert "key_env" in _VALID_CUSTOM_PROVIDER_FIELDS
-
-    def test_extra_body_is_supported_schema(self):
-        from hermes_cli.config import (
-            _VALID_CUSTOM_PROVIDER_FIELDS,
-            _normalize_custom_provider_entry,
-        )
-        entry = {
-            "name": "vendor",
-            "base_url": "https://api.vendor.example.com/v1",
-            "extra_body": {
-                "chat_template_kwargs": {"enable_thinking": True},
-                "include_reasoning": True,
-            },
-        }
-        normalized = _normalize_custom_provider_entry(dict(entry), provider_key="vendor")
-        assert normalized is not None
-        assert "extra_body" in _VALID_CUSTOM_PROVIDER_FIELDS
-        assert normalized["extra_body"] == entry["extra_body"]
 # =============================================================================
 # Tencent TokenHub — API-key provider runtime resolution
 # =============================================================================
@@ -90,7 +90,6 @@ def test_show_status_reports_managed_nous_features(monkeypatch, capsys, tmp_path
                "tts": NousFeatureState("tts", "OpenAI TTS", True, True, True, True, False, True, "OpenAI TTS"),
                "browser": NousFeatureState("browser", "Browser automation", True, True, True, True, False, True, "Browser Use"),
                "modal": NousFeatureState("modal", "Modal execution", False, True, False, False, False, True, "local"),
-                "app_tools": NousFeatureState("app_tools", "App tools (500+ apps)", True, True, True, True, False, True, "Nous Subscription"),
            },
        ),
        raising=False,
@@ -12,10 +12,8 @@ from hermes_cli.tools_config import (
    _get_platform_tools,
    _platform_toolset_summary,
    _reconfigure_tool,
-    _run_post_setup,
    _save_platform_tools,
    _toolset_has_keys,
-    _toolset_needs_configuration_prompt,
    CONFIGURABLE_TOOLSETS,
    TOOL_CATEGORIES,
    _visible_providers,
@@ -754,91 +752,6 @@ def test_numeric_mcp_server_name_does_not_crash_sorted():

 # ─── Imagegen Backend Picker Wiring ────────────────────────────────────────

-def test_toolset_has_keys_treats_no_key_providers_as_configured():
-    config = {}
-
-    assert _toolset_has_keys("computer_use", config) is True
-
-
-def test_computer_use_needs_configuration_when_cua_driver_post_setup_pending():
-    """No-key providers can still need setup when their post_setup is unsatisfied.
-
-    Returning users enabling Computer Use through `hermes tools` must reach the
-    cua-driver post-setup installer even though the provider has no API keys.
-    """
-    with patch("shutil.which", return_value=None):
-        assert _toolset_needs_configuration_prompt("computer_use", {}) is True
-
-
-def test_computer_use_skips_configuration_when_cua_driver_already_installed():
-    """Installed post_setup dependencies should keep returning-user toggles no-op."""
-    def fake_which(name: str):
-        return "/usr/local/bin/cua-driver" if name == "cua-driver" else None
-
-    with patch("shutil.which", side_effect=fake_which):
-        assert _toolset_needs_configuration_prompt("computer_use", {}) is False
-
-
-def test_computer_use_respects_custom_cua_driver_command():
-    """The setup gate should match runtime's HERMES_CUA_DRIVER_CMD override."""
-    def fake_which(name: str):
-        return "/opt/bin/custom-cua" if name == "custom-cua" else None
-
-    with patch.dict("os.environ", {"HERMES_CUA_DRIVER_CMD": "custom-cua"}), \
-         patch("shutil.which", side_effect=fake_which):
-        assert _toolset_needs_configuration_prompt("computer_use", {}) is False
-
-
-def test_computer_use_blank_custom_driver_command_falls_back_to_default():
-    """Blank overrides should not make the setup gate look for an empty command."""
-    def fake_which(name: str):
-        return "/usr/local/bin/cua-driver" if name == "cua-driver" else None
-
-    with patch.dict("os.environ", {"HERMES_CUA_DRIVER_CMD": "   "}), \
-         patch("shutil.which", side_effect=fake_which):
-        assert _toolset_needs_configuration_prompt("computer_use", {}) is False
-
-
-def test_computer_use_post_setup_respects_custom_driver_command_when_installed():
-    """post_setup already-installed checks should version-probe the override."""
-    def fake_which(name: str):
-        return "/opt/bin/custom-cua" if name == "custom-cua" else None
-
-    with patch.dict("os.environ", {"HERMES_CUA_DRIVER_CMD": "custom-cua"}), \
-         patch("platform.system", return_value="Darwin"), \
-         patch("shutil.which", side_effect=fake_which), \
-         patch("subprocess.run") as run:
-        run.return_value.stdout = "custom 1.2.3\n"
-
-        _run_post_setup("cua_driver")
-
-    run.assert_called_once()
-    assert run.call_args.args[0] == ["custom-cua", "--version"]
-
-
-def test_computer_use_post_setup_missing_override_does_not_accept_default_binary():
-    """A default cua-driver binary must not satisfy a missing runtime override."""
-    seen = []
-
-    def fake_which(name: str):
-        seen.append(name)
-        if name == "cua-driver":
-            return "/usr/local/bin/cua-driver"
-        if name == "curl":
-            return None
-        return None
-
-    with patch.dict("os.environ", {"HERMES_CUA_DRIVER_CMD": "custom-cua"}), \
-         patch("platform.system", return_value="Darwin"), \
-         patch("shutil.which", side_effect=fake_which), \
-         patch("subprocess.run") as run:
-        _run_post_setup("cua_driver")
-
-    run.assert_not_called()
-    assert "custom-cua" in seen
-    assert "curl" in seen
-
-
 class TestImagegenBackendRegistry:
    """IMAGEGEN_BACKENDS tags drive the model picker flow in tools_config."""

@@ -168,7 +168,7 @@ def test_make_tui_argv_skips_build_only_on_termux_when_fresh(

    argv, cwd = main_mod._make_tui_argv(tmp_path, tui_dev=False)

-    assert argv == ["/bin/node", "--expose-gc", str(tmp_path / "dist" / "entry.js")]
+    assert argv == ["/bin/node", str(tmp_path / "dist" / "entry.js")]
    assert cwd == tmp_path


@@ -283,233 +283,6 @@ def test_fast_tui_launch_is_termux_only(monkeypatch, main_mod):
    assert main_mod._try_termux_fast_tui_launch() is False


-def test_termux_fast_cli_launch_chat_uses_light_parser(monkeypatch, main_mod):
-    captured = {}
-    prepared = []
-
-    monkeypatch.setenv("TERMUX_VERSION", "1")
-    monkeypatch.delenv("HERMES_TUI", raising=False)
-    monkeypatch.setattr(
-        sys, "argv", ["hermes", "chat", "-q", "hello", "--toolsets", "web,terminal"]
-    )
-    monkeypatch.setattr(
-        main_mod, "_prepare_agent_startup", lambda args: prepared.append(args.command)
-    )
-    monkeypatch.setattr(
-        main_mod,
-        "cmd_chat",
-        lambda args: captured.update(
-            {"query": args.query, "toolsets": args.toolsets, "command": args.command}
-        ),
-    )
-
-    assert main_mod._try_termux_fast_cli_launch() is True
-    assert prepared == ["chat"]
-    assert captured == {
-        "query": "hello",
-        "toolsets": "web,terminal",
-        "command": "chat",
-    }
-
-
-def test_termux_fast_cli_launch_oneshot_uses_light_parser(monkeypatch, main_mod):
-    captured = {}
-    prepared = []
-
-    monkeypatch.setenv("TERMUX_VERSION", "1")
-    monkeypatch.delenv("HERMES_TUI", raising=False)
-    monkeypatch.setattr(
-        sys,
-        "argv",
-        ["hermes", "-z", "hello", "--model", "gpt-test", "--provider", "openai"],
-    )
-    monkeypatch.setattr(
-        main_mod, "_prepare_agent_startup", lambda args: prepared.append(args.command)
-    )
-    monkeypatch.setitem(
-        sys.modules,
-        "hermes_cli.oneshot",
-        types.SimpleNamespace(
-            run_oneshot=lambda prompt, **kwargs: captured.update(
-                {"prompt": prompt, **kwargs}
-            )
-            or 17
-        ),
-    )
-
-    with pytest.raises(SystemExit) as exc:
-        main_mod._try_termux_fast_cli_launch()
-
-    assert exc.value.code == 17
-    assert prepared == [None]
-    assert captured == {
-        "prompt": "hello",
-        "model": "gpt-test",
-        "provider": "openai",
-        "toolsets": None,
-    }
-
-
-def test_termux_fast_cli_launch_version_skips_update_check(monkeypatch, main_mod):
-    captured = []
-
-    monkeypatch.setenv("TERMUX_VERSION", "1")
-    monkeypatch.delenv("HERMES_TUI", raising=False)
-    monkeypatch.setattr(sys, "argv", ["hermes", "version"])
-    monkeypatch.setattr(
-        main_mod, "_print_version_info", lambda *, check_updates: captured.append(check_updates)
-    )
-
-    assert main_mod._try_termux_fast_cli_launch() is True
-    assert captured == [False]
-
-
-def test_termux_fast_cli_launch_skips_help(monkeypatch, main_mod):
-    monkeypatch.setenv("TERMUX_VERSION", "1")
-    monkeypatch.delenv("HERMES_TUI", raising=False)
-    monkeypatch.setattr(sys, "argv", ["hermes", "chat", "--help"])
-
-    assert main_mod._try_termux_fast_cli_launch() is False
-
-
-def test_termux_fast_cli_launch_can_be_disabled(monkeypatch, main_mod):
-    monkeypatch.setenv("TERMUX_VERSION", "1")
-    monkeypatch.setenv("HERMES_TERMUX_DISABLE_FAST_CLI", "1")
-    monkeypatch.delenv("HERMES_TUI", raising=False)
-    monkeypatch.setattr(sys, "argv", ["hermes", "version"])
-
-    assert main_mod._try_termux_fast_cli_launch() is False
-
-
-def test_termux_bundled_skills_stamp_controls_sync(monkeypatch, tmp_path, main_mod):
-    monkeypatch.setenv("TERMUX_VERSION", "1")
-    monkeypatch.setattr(main_mod, "get_hermes_home", lambda: tmp_path)
-    monkeypatch.setattr(main_mod, "_termux_bundled_skills_fingerprint", lambda: "fp1")
-
-    assert main_mod._termux_bundled_skills_sync_needed() is True
-    main_mod._mark_termux_bundled_skills_synced()
-    assert main_mod._termux_bundled_skills_sync_needed() is False
-
-    monkeypatch.setenv("HERMES_TERMUX_FORCE_SKILLS_SYNC", "1")
-    assert main_mod._termux_bundled_skills_sync_needed() is True
-
-
-def test_termux_skips_bundled_skill_sync_when_stamp_fresh(monkeypatch, tmp_path, main_mod):
-    calls = []
-
-    monkeypatch.setenv("TERMUX_VERSION", "1")
-    monkeypatch.setattr(main_mod, "get_hermes_home", lambda: tmp_path)
-    monkeypatch.setattr(main_mod, "_termux_bundled_skills_fingerprint", lambda: "fp1")
-    main_mod._mark_termux_bundled_skills_synced()
-    monkeypatch.setitem(
-        sys.modules,
-        "tools.skills_sync",
-        types.SimpleNamespace(sync_skills=lambda quiet: calls.append(quiet)),
-    )
-
-    assert main_mod._sync_bundled_skills_for_startup() is False
-    assert calls == []
-
-
-def test_termux_forced_bundled_skill_sync_runs(monkeypatch, tmp_path, main_mod):
-    calls = []
-
-    monkeypatch.setenv("TERMUX_VERSION", "1")
-    monkeypatch.setenv("HERMES_TERMUX_FORCE_SKILLS_SYNC", "1")
-    monkeypatch.setattr(main_mod, "get_hermes_home", lambda: tmp_path)
-    monkeypatch.setattr(main_mod, "_termux_bundled_skills_fingerprint", lambda: "fp1")
-    monkeypatch.setitem(
-        sys.modules,
-        "tools.skills_sync",
-        types.SimpleNamespace(sync_skills=lambda quiet: calls.append(quiet)),
-    )
-
-    assert main_mod._sync_bundled_skills_for_startup() is True
-    assert calls == [True]
-
-
-def test_read_git_revision_fingerprint_resolves_packed_refs(tmp_path, main_mod):
-    repo = tmp_path / "repo"
-    git_dir = repo / ".git"
-    git_dir.mkdir(parents=True)
-    (git_dir / "HEAD").write_text("ref: refs/heads/main\n", encoding="utf-8")
-    packed_sha = "1234567890abcdef1234567890abcdef12345678"
-    (git_dir / "packed-refs").write_text(
-        "# pack-refs with: peeled fully-peeled sorted\n"
-        f"{packed_sha} refs/heads/main\n"
-        "abcdef0000000000000000000000000000000000 refs/tags/v1.0\n"
-        "^99999999aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n",
-        encoding="utf-8",
-    )
-
-    fingerprint = main_mod._read_git_revision_fingerprint(repo)
-
-    assert fingerprint == f"git:refs/heads/main:{packed_sha}"
-
-
-def test_read_git_revision_fingerprint_packed_refs_in_worktree_common_dir(
-    tmp_path, main_mod
-):
-    main_repo = tmp_path / "repo"
-    common_git = main_repo / ".git"
-    common_git.mkdir(parents=True)
-    packed_sha = "fedcba9876543210fedcba9876543210fedcba98"
-    (common_git / "packed-refs").write_text(
-        f"{packed_sha} refs/heads/main\n",
-        encoding="utf-8",
-    )
-
-    worktree = tmp_path / "wt"
-    worktree.mkdir()
-    wt_gitdir = common_git / "worktrees" / "wt"
-    wt_gitdir.mkdir(parents=True)
-    (wt_gitdir / "HEAD").write_text("ref: refs/heads/main\n", encoding="utf-8")
-    (wt_gitdir / "commondir").write_text("../..\n", encoding="utf-8")
-    (worktree / ".git").write_text(f"gitdir: {wt_gitdir}\n", encoding="utf-8")
-
-    fingerprint = main_mod._read_git_revision_fingerprint(worktree)
-
-    assert fingerprint == f"git:refs/heads/main:{packed_sha}"
-
-
-def test_read_git_revision_fingerprint_loose_ref_in_worktree_common_dir(
-    tmp_path, main_mod
-):
-    """`git worktree add -b NAME` writes the new branch ref to the common dir,
-    not the per-worktree gitdir. The fingerprint must still resolve it."""
-    main_repo = tmp_path / "repo"
-    common_git = main_repo / ".git"
-    common_git.mkdir(parents=True)
-    loose_sha = "0123456789abcdef0123456789abcdef01234567"
-    (common_git / "refs" / "heads").mkdir(parents=True)
-    (common_git / "refs" / "heads" / "feature").write_text(
-        loose_sha + "\n", encoding="utf-8"
-    )
-
-    worktree = tmp_path / "wt"
-    worktree.mkdir()
-    wt_gitdir = common_git / "worktrees" / "wt"
-    wt_gitdir.mkdir(parents=True)
-    (wt_gitdir / "HEAD").write_text("ref: refs/heads/feature\n", encoding="utf-8")
-    (wt_gitdir / "commondir").write_text("../..\n", encoding="utf-8")
-    (worktree / ".git").write_text(f"gitdir: {wt_gitdir}\n", encoding="utf-8")
-
-    fingerprint = main_mod._read_git_revision_fingerprint(worktree)
-
-    assert fingerprint == f"git:refs/heads/feature:{loose_sha}"
-
-
-def test_read_git_revision_fingerprint_unresolved_ref_is_stable(tmp_path, main_mod):
-    repo = tmp_path / "repo"
-    git_dir = repo / ".git"
-    git_dir.mkdir(parents=True)
-    (git_dir / "HEAD").write_text("ref: refs/heads/missing\n", encoding="utf-8")
-
-    fingerprint = main_mod._read_git_revision_fingerprint(repo)
-
-    assert fingerprint == "git:refs/heads/missing:unresolved"
-
-
 def test_main_top_level_oneshot_accepts_toolsets(monkeypatch, main_mod):
    captured = {}

@@ -1,300 +0,0 @@
-"""Behavior-parity check for the image-gen FAL plugin migration (#26241).
-
-Spawns one subprocess per (version, scenario) cell — pinned to either
-``origin/main`` (legacy in-tree FAL fall-through + ``configured == "fal"``
-skip in ``_dispatch_to_plugin_provider``) or this PR's worktree (FAL is
-itself a plugin and the dispatcher routes every set provider through
-the registry). Each subprocess clears all FAL-related env vars + writes
-a ``config.yaml``, then asks the dispatcher how it would route an
-``image_generate`` call. The emitted shape tuple is
-``{dispatch_kind, provider_name, model}``:
-
-* ``dispatch_kind`` ∈ ``{"legacy_fal", "plugin", "error", None}`` —
-  whether the call would go straight to the in-tree pipeline,
-  through ``_dispatch_to_plugin_provider``, raise an explicit
-  provider-not-registered error, or fall through silently.
-* ``provider_name`` — when ``dispatch_kind == "plugin"``, the
-  resolved provider name. ``None`` otherwise.
-* ``model`` — the resolved FAL model id when applicable.
-
-The parent process diffs the shapes per scenario. A diff means the
-migration introduced an observable behaviour change vs origin/main —
-likely a real regression for users on the existing config keys.
-
-Run from the PR worktree:
-
-    python tests/plugins/image_gen/check_parity_vs_main.py
-"""
-from __future__ import annotations
-
-import json
-import subprocess
-import sys
-from pathlib import Path
-
-
-REPO_ROOT = Path(__file__).resolve().parents[3]
-
-
-# Pin one path to current main, one to the PR worktree.
-# ``REPO_ROOT`` is ``.../.worktrees/<name>``; the main checkout lives
-# two levels up. When running directly from a regular clone (no
-# worktree), ``MAIN_DIR`` falls back to a sibling ``hermes-agent-main``
-# checkout if one exists.
-def _resolve_main_dir() -> Path:
-    candidate = REPO_ROOT.parent.parent
-    if (candidate / "tools" / "image_generation_tool.py").exists() and candidate != REPO_ROOT:
-        return candidate
-    sibling = REPO_ROOT.parent / "hermes-agent-main"
-    if (sibling / "tools" / "image_generation_tool.py").exists():
-        return sibling
-    return REPO_ROOT
-
-
-MAIN_DIR = _resolve_main_dir()
-PR_DIR = REPO_ROOT
-assert (PR_DIR / "tools" / "image_generation_tool.py").exists(), (
-    f"PR_DIR={PR_DIR} doesn't look like a hermes-agent checkout"
-)
-
-
-SUBPROCESS_SCRIPT = r"""
-import json, os, sys, tempfile
-sys.path.insert(0, sys.argv[1])
-
-# Isolated HERMES_HOME so the config write is hermetic.
-home = tempfile.mkdtemp()
-os.environ["HERMES_HOME"] = home
-
-# Clear FAL-related env so dispatch decisions are config-driven.
-for k in (
-    "FAL_KEY", "FAL_QUEUE_GATEWAY_URL",
-    "TOOL_GATEWAY_DOMAIN", "TOOL_GATEWAY_USER_TOKEN",
-    "FAL_IMAGE_MODEL",
-):
-    os.environ.pop(k, None)
-
-scenario_env = json.loads(sys.argv[2])
-os.environ.update(scenario_env)
-
-config_yaml = sys.argv[3]
-config_path = os.path.join(home, "config.yaml")
-with open(config_path, "w") as f:
-    f.write(config_yaml)
-
-# Fresh import — must not have anything cached.
-for name in list(sys.modules):
-    if (name.startswith("tools.")
-            or name.startswith("agent.")
-            or name.startswith("plugins.")
-            or name.startswith("hermes_cli.")):
-        sys.modules.pop(name, None)
-
-import tools.image_generation_tool as image_tool
-
-dispatch_kind = None
-provider_name = None
-model = None
-error_text = None
-
-try:
-    raw = image_tool._dispatch_to_plugin_provider("ping", "landscape")
-    if raw is None:
-        dispatch_kind = "legacy_fal"
-    else:
-        parsed = json.loads(raw) if isinstance(raw, str) else raw
-        if isinstance(parsed, dict):
-            if parsed.get("error_type") == "provider_not_registered":
-                dispatch_kind = "error"
-                error_text = parsed.get("error")
-            else:
-                dispatch_kind = "plugin"
-                provider_name = parsed.get("provider")
-                model = parsed.get("model")
-        else:
-            dispatch_kind = "unknown_payload"
-
-    if model is None:
-        # _resolve_fal_model still returns the active FAL model id even
-        # when dispatch goes to a non-FAL plugin — used for the diff
-        # only when applicable.
-        try:
-            model_id, _meta = image_tool._resolve_fal_model()
-            if dispatch_kind == "legacy_fal":
-                model = model_id
-        except Exception:
-            pass
-except Exception as exc:
-    dispatch_kind = "exception"
-    error_text = repr(exc)
-
-shape = {
-    "dispatch_kind": dispatch_kind,
-    "provider_name": provider_name,
-    "model": model,
-    "error_present": error_text is not None,
-}
-print(json.dumps(shape))
-"""
-
-
-SCENARIOS: list[tuple[str, str, dict[str, str]]] = [
-    # (label, config.yaml body, extra env vars)
-    ("no-config-no-env", "", {}),
-    (
-        "explicit-fal-no-creds",
-        "image_gen:\n  provider: fal\n",
-        {},
-    ),
-    (
-        "explicit-fal-with-creds",
-        "image_gen:\n  provider: fal\n",
-        {"FAL_KEY": "test-key"},
-    ),
-    (
-        "explicit-fal-with-model",
-        "image_gen:\n  provider: fal\n  model: fal-ai/flux-2-pro\n",
-        {"FAL_KEY": "test-key"},
-    ),
-    (
-        "explicit-typo-provider",
-        "image_gen:\n  provider: not-a-real-backend\n",
-        {"FAL_KEY": "test-key"},
-    ),
-    (
-        "managed-gateway-only",
-        "",
-        {
-            "TOOL_GATEWAY_DOMAIN": "nousresearch.com",
-            "TOOL_GATEWAY_USER_TOKEN": "nous-token",
-        },
-    ),
-]
-
-
-def _run_scenario(repo_path: Path, label: str, config_yaml: str, env: dict) -> dict:
-    venv_python = repo_path / ".venv" / "bin" / "python"
-    if not venv_python.exists():
-        venv_python = MAIN_DIR / ".venv" / "bin" / "python"
-    if not venv_python.exists():
-        venv_python = Path("python3")
-
-    out = subprocess.run(
-        [
-            str(venv_python),
-            "-c",
-            SUBPROCESS_SCRIPT,
-            str(repo_path),
-            json.dumps(env),
-            config_yaml,
-        ],
-        capture_output=True,
-        text=True,
-        timeout=60,
-    )
-    if out.returncode != 0:
-        return {
-            "error": "subprocess failed",
-            "stdout": out.stdout[-500:],
-            "stderr": out.stderr[-500:],
-        }
-    try:
-        return json.loads(out.stdout.strip().splitlines()[-1])
-    except Exception as exc:
-        return {"error": f"could not parse output: {exc}", "stdout": out.stdout}
-
-
-def _reduce(shape: dict) -> dict:
-    """Reduce to the parts that matter for user-visible parity.
-
-    On origin/main, ``explicit-fal-*`` scenarios short-circuit to
-    ``legacy_fal`` because of the ``configured == "fal"`` skip. On the
-    PR, those same scenarios route through the plugin and emit
-    ``dispatch_kind == "plugin"`` with ``provider_name == "fal"``.
-
-    Both shapes are functionally equivalent — the plugin's ``generate()``
-    re-enters the same in-tree pipeline via ``_it`` indirection — but
-    we want the diff to be visible so reviewers can sign off on the
-    intentional behaviour delta.
-    """
-    return {
-        "dispatch_kind": shape.get("dispatch_kind"),
-        "provider_name": shape.get("provider_name"),
-        "model": shape.get("model"),
-        "error_present": shape.get("error_present"),
-    }
-
-
-def main() -> int:
-    print(f"main:    {MAIN_DIR}")
-    print(f"pr:      {PR_DIR}")
-    print()
-
-    if MAIN_DIR == PR_DIR:
-        print(
-            "WARN: MAIN_DIR == PR_DIR — diffs will be trivially identical.\n"
-            "      Set up a sibling 'hermes-agent-main' checkout pinned to "
-            "origin/main to get real parity coverage."
-        )
-        print()
-
-    failures: list[str] = []
-    errors: list[str] = []
-    intentional_diffs: list[tuple[str, dict, dict]] = []
-    for label, config_yaml, env in SCENARIOS:
-        main_shape = _run_scenario(MAIN_DIR, label, config_yaml, env)
-        pr_shape = _run_scenario(PR_DIR, label, config_yaml, env)
-
-        if "error" in main_shape or "error" in pr_shape:
-            print(f"  [ERR ] {label}: subprocess failed")
-            print(f"    main: {main_shape}")
-            print(f"    pr:   {pr_shape}")
-            errors.append(label)
-            continue
-
-        main_reduced = _reduce(main_shape)
-        pr_reduced = _reduce(pr_shape)
-
-        if main_reduced == pr_reduced:
-            print(f"  [OK]   {label}: {main_reduced}")
-            continue
-
-        # On main, "explicit-fal-*" returns legacy_fal; on PR, plugin
-        # dispatch. That's the only acceptable diff — flag everything
-        # else as a regression.
-        legacy_to_plugin_fal = (
-            main_reduced.get("dispatch_kind") == "legacy_fal"
-            and pr_reduced.get("dispatch_kind") == "plugin"
-            and pr_reduced.get("provider_name") == "fal"
-        )
-        if legacy_to_plugin_fal:
-            print(f"  [DIFF] {label}: legacy_fal → plugin (fal) — expected")
-            intentional_diffs.append((label, main_reduced, pr_reduced))
-        else:
-            print(f"  [FAIL] {label}")
-            print(f"    main: {main_reduced}")
-            print(f"    pr:   {pr_reduced}")
-            failures.append(label)
-
-    print()
-    if errors:
-        print(f"SUBPROCESS ERRORS in {len(errors)} scenario(s):")
-        for e in errors:
-            print(f"  - {e}")
-    if failures:
-        print(f"BEHAVIOUR REGRESSION in {len(failures)} scenario(s):")
-        for f in failures:
-            print(f"  - {f}")
-    if intentional_diffs:
-        print(
-            f"INTENTIONAL DIFFS ({len(intentional_diffs)}): "
-            f"legacy_fal → plugin dispatch for explicit FAL paths."
-        )
-    if failures or errors:
-        return 1
-    print(f"PARITY OK across {len(SCENARIOS)} scenarios.")
-    return 0
-
-
-if __name__ == "__main__":
-    sys.exit(main())
@@ -1,226 +0,0 @@
-#!/usr/bin/env python3
-"""Tests for the FAL.ai image generation plugin.
-
-The plugin is a thin registration adapter — actual FAL pipeline logic
-lives in ``tools.image_generation_tool`` and is exercised by
-``tests/tools/test_image_generation.py``. These tests focus on:
-
-* the ``ImageGenProvider`` ABC surface (name, models, schema)
-* call-time indirection (``_it`` resolution at ``generate()`` time so
-  ``monkeypatch.setattr(image_tool, ...)`` keeps working)
-* response shape stamping (provider/prompt/aspect_ratio/model)
-"""
-
-from __future__ import annotations
-
-import json
-from unittest.mock import MagicMock
-
-import pytest
-
-
-# ---------------------------------------------------------------------------
-# Provider surface
-# ---------------------------------------------------------------------------
-
-
-class TestFalImageGenProviderSurface:
-    def test_name(self):
-        from plugins.image_gen.fal import FalImageGenProvider
-
-        assert FalImageGenProvider().name == "fal"
-
-    def test_display_name(self):
-        from plugins.image_gen.fal import FalImageGenProvider
-
-        assert FalImageGenProvider().display_name == "FAL.ai"
-
-    def test_default_model_matches_legacy(self):
-        from plugins.image_gen.fal import FalImageGenProvider
-        from tools.image_generation_tool import DEFAULT_MODEL
-
-        assert FalImageGenProvider().default_model() == DEFAULT_MODEL
-
-    def test_list_models_uses_legacy_catalog(self):
-        from plugins.image_gen.fal import FalImageGenProvider
-        from tools.image_generation_tool import FAL_MODELS
-
-        provider = FalImageGenProvider()
-        models = provider.list_models()
-        ids = {m["id"] for m in models}
-        # Whatever FAL_MODELS ships, the provider mirrors verbatim.
-        assert ids == set(FAL_MODELS.keys())
-        # Spot-check the expected first-class fields are present.
-        for entry in models:
-            for field in ("id", "display", "speed", "strengths", "price"):
-                assert field in entry
-
-    def test_setup_schema_advertises_fal_key(self):
-        from plugins.image_gen.fal import FalImageGenProvider
-
-        schema = FalImageGenProvider().get_setup_schema()
-        assert schema["name"] == "FAL.ai"
-        assert schema["badge"] == "paid"
-        env_keys = {entry["key"] for entry in schema.get("env_vars", [])}
-        assert "FAL_KEY" in env_keys
-
-
-class TestFalImageGenProviderAvailability:
-    def test_is_available_when_legacy_check_passes(self, monkeypatch):
-        import tools.image_generation_tool as image_tool
-        from plugins.image_gen.fal import FalImageGenProvider
-
-        monkeypatch.setattr(image_tool, "check_fal_api_key", lambda: True)
-        assert FalImageGenProvider().is_available() is True
-
-    def test_is_available_false_when_legacy_check_fails(self, monkeypatch):
-        import tools.image_generation_tool as image_tool
-        from plugins.image_gen.fal import FalImageGenProvider
-
-        monkeypatch.setattr(image_tool, "check_fal_api_key", lambda: False)
-        assert FalImageGenProvider().is_available() is False
-
-    def test_is_available_handles_legacy_exception(self, monkeypatch):
-        import tools.image_generation_tool as image_tool
-        from plugins.image_gen.fal import FalImageGenProvider
-
-        def _boom():
-            raise RuntimeError("config broke")
-
-        monkeypatch.setattr(image_tool, "check_fal_api_key", _boom)
-        # Picker must not propagate exceptions — show as "not available".
-        assert FalImageGenProvider().is_available() is False
-
-
-# ---------------------------------------------------------------------------
-# generate() — call-time indirection
-# ---------------------------------------------------------------------------
-
-
-class TestFalImageGenProviderGenerate:
-    def test_generate_delegates_to_legacy_image_generate_tool(self, monkeypatch):
-        """Plugin must look up ``image_generate_tool`` at call time so
-        ``monkeypatch.setattr(image_tool, "image_generate_tool", ...)``
-        takes effect."""
-        import tools.image_generation_tool as image_tool
-        from plugins.image_gen.fal import FalImageGenProvider
-
-        captured = {}
-
-        def fake_image_generate_tool(prompt, aspect_ratio, **kwargs):
-            captured["prompt"] = prompt
-            captured["aspect_ratio"] = aspect_ratio
-            captured["kwargs"] = kwargs
-            return json.dumps({"success": True, "image": "https://fake/image.png"})
-
-        monkeypatch.setattr(image_tool, "image_generate_tool", fake_image_generate_tool)
-        monkeypatch.setattr(image_tool, "_resolve_fal_model",
-                            lambda: ("fal-ai/flux-2/klein/9b", {}))
-
-        result = FalImageGenProvider().generate(
-            "a serene mountain landscape",
-            aspect_ratio="square",
-            seed=42,
-        )
-
-        assert captured["prompt"] == "a serene mountain landscape"
-        assert captured["aspect_ratio"] == "square"
-        assert captured["kwargs"] == {"seed": 42}
-        assert result["success"] is True
-        assert result["image"] == "https://fake/image.png"
-        # Stamped fields for the unified response shape
-        assert result["provider"] == "fal"
-        assert result["prompt"] == "a serene mountain landscape"
-        assert result["aspect_ratio"] == "square"
-        assert result["model"] == "fal-ai/flux-2/klein/9b"
-
-    def test_generate_invalid_aspect_ratio_is_coerced(self, monkeypatch):
-        import tools.image_generation_tool as image_tool
-        from plugins.image_gen.fal import FalImageGenProvider
-
-        seen_aspect = {}
-
-        def fake(prompt, aspect_ratio, **kwargs):
-            seen_aspect["v"] = aspect_ratio
-            return json.dumps({"success": True, "image": "x"})
-
-        monkeypatch.setattr(image_tool, "image_generate_tool", fake)
-        monkeypatch.setattr(image_tool, "_resolve_fal_model",
-                            lambda: ("fal-ai/flux-2/klein/9b", {}))
-
-        FalImageGenProvider().generate("p", aspect_ratio="not-a-real-ratio")
-        # ``resolve_aspect_ratio`` clamps to landscape.
-        assert seen_aspect["v"] == "landscape"
-
-    def test_generate_passthrough_drops_none_kwargs(self, monkeypatch):
-        import tools.image_generation_tool as image_tool
-        from plugins.image_gen.fal import FalImageGenProvider
-
-        seen = {}
-
-        def fake(prompt, aspect_ratio, **kwargs):
-            seen.update(kwargs)
-            return json.dumps({"success": True, "image": "x"})
-
-        monkeypatch.setattr(image_tool, "image_generate_tool", fake)
-        monkeypatch.setattr(image_tool, "_resolve_fal_model",
-                            lambda: ("fal-ai/flux-2/klein/9b", {}))
-
-        FalImageGenProvider().generate(
-            "p",
-            aspect_ratio="landscape",
-            seed=None,
-            num_images=2,
-            guidance_scale=None,
-        )
-
-        # ``None`` values must not be forwarded — they'd override the
-        # model's defaults inside the legacy payload builder.
-        assert "seed" not in seen
-        assert "guidance_scale" not in seen
-        assert seen.get("num_images") == 2
-
-    def test_generate_catches_exception_from_legacy(self, monkeypatch):
-        import tools.image_generation_tool as image_tool
-        from plugins.image_gen.fal import FalImageGenProvider
-
-        def boom(*args, **kwargs):
-            raise RuntimeError("FAL endpoint exploded")
-
-        monkeypatch.setattr(image_tool, "image_generate_tool", boom)
-
-        result = FalImageGenProvider().generate("p")
-        assert result["success"] is False
-        assert "FAL image generation failed" in result["error"]
-        assert result["error_type"] == "RuntimeError"
-        assert result["provider"] == "fal"
-
-    def test_generate_invalid_json_response(self, monkeypatch):
-        import tools.image_generation_tool as image_tool
-        from plugins.image_gen.fal import FalImageGenProvider
-
-        monkeypatch.setattr(image_tool, "image_generate_tool", lambda **kw: "not-json")
-        monkeypatch.setattr(image_tool, "_resolve_fal_model",
-                            lambda: ("fal-ai/flux-2/klein/9b", {}))
-
-        result = FalImageGenProvider().generate("p")
-        assert result["success"] is False
-        assert "Invalid JSON" in result["error"]
-        assert result["provider"] == "fal"
-
-
-# ---------------------------------------------------------------------------
-# Registry wiring
-# ---------------------------------------------------------------------------
-
-
-class TestFalImageGenPluginRegistration:
-    def test_register_wires_provider_into_registry(self):
-        from plugins.image_gen.fal import FalImageGenProvider, register
-
-        ctx = MagicMock()
-        register(ctx)
-
-        ctx.register_image_gen_provider.assert_called_once()
-        (registered,), _ = ctx.register_image_gen_provider.call_args
-        assert isinstance(registered, FalImageGenProvider)
@@ -62,9 +62,8 @@ def plugin_api(tmp_path, monkeypatch):
 class _FakeSessionDB:
    """Stand-in for hermes_state.SessionDB that records scan calls."""

-    def __init__(self, session_count: int, scan_delay: float = 0):
+    def __init__(self, session_count: int):
        self.session_count = session_count
-        self.scan_delay = scan_delay
        self.last_limit: Optional[int] = None
        self.last_include_children: Optional[bool] = None
        self.list_calls = 0
@@ -79,8 +78,6 @@ class _FakeSessionDB:
        include_children: bool = False,
        project_compression_tips: bool = True,
    ) -> List[Dict[str, Any]]:
-        if self.scan_delay:
-            time.sleep(self.scan_delay)
        self.last_limit = limit
        self.last_include_children = include_children
        self.list_calls += 1
@@ -228,8 +225,10 @@ def test_evaluate_all_stale_cache_serves_stale_and_refreshes_in_background(plugi
    the stale data immediately and kicks a background refresh. Users don't
    stare at a loading spinner every time TTL expires.
    """
-    fake_db = _FakeSessionDB(session_count=10, scan_delay=2.0)
+    fake_db = _FakeSessionDB(session_count=10)
    _install_fake_session_db(plugin_api, fake_db)
+
+    # Seed a stale snapshot on disk.
    stale_generated_at = int(time.time()) - plugin_api.SNAPSHOT_TTL_SECONDS - 60
    stale_payload = {
        "achievements": [],
@@ -2,8 +2,8 @@

 Covers:

- All eight bundled plugins (brave-free, ddgs, searxng, exa, parallel,
-  tavily, firecrawl, xai) instantiate and self-report the expected
+- All seven bundled plugins (brave-free, ddgs, searxng, exa, parallel,
+  tavily, firecrawl) instantiate and self-report the expected
  capabilities + ABC-derived defaults.
 - Each plugin's ``is_available()`` correctly reflects env-var presence.
 - The web_search_registry resolves an active provider in the documented
@@ -47,7 +47,6 @@ def _clear_web_env(monkeypatch: pytest.MonkeyPatch) -> None:
        "FIRECRAWL_GATEWAY_URL",
        "TOOL_GATEWAY_DOMAIN",
        "TOOL_GATEWAY_USER_TOKEN",
-        "XAI_API_KEY",
    ):
        monkeypatch.delenv(k, raising=False)

@@ -71,7 +70,7 @@ def _isolate_env(monkeypatch: pytest.MonkeyPatch) -> None:


 class TestBundledPluginsRegister:
-    """All eight bundled web plugins discover and register correctly."""
+    """All seven bundled web plugins discover and register correctly."""

    def test_all_seven_plugins_present_in_registry(self) -> None:
        _ensure_plugins_loaded()
@@ -86,7 +85,6 @@ class TestBundledPluginsRegister:
            "parallel",
            "searxng",
            "tavily",
-            "xai",
        ]

    @pytest.mark.parametrize(
@@ -102,8 +100,6 @@ class TestBundledPluginsRegister:
            # disabled in the migration (fell through to a legacy inline
            # path); the follow-up commit enabled it natively.
            ("firecrawl", True, True, True),
-            # xai: search-only via Grok's agentic web_search tool.
-            ("xai", True, False, False),
        ],
    )
    def test_capability_flags_match_spec(
@@ -124,7 +120,7 @@ class TestBundledPluginsRegister:

    @pytest.mark.parametrize(
        "plugin_name",
-        ["brave-free", "ddgs", "searxng", "exa", "parallel", "tavily", "firecrawl", "xai"],
+        ["brave-free", "ddgs", "searxng", "exa", "parallel", "tavily", "firecrawl"],
    )
    def test_each_plugin_has_name_and_display_name(self, plugin_name: str) -> None:
        _ensure_plugins_loaded()
@@ -137,7 +133,7 @@ class TestBundledPluginsRegister:

    @pytest.mark.parametrize(
        "plugin_name",
-        ["brave-free", "ddgs", "searxng", "exa", "parallel", "tavily", "firecrawl", "xai"],
+        ["brave-free", "ddgs", "searxng", "exa", "parallel", "tavily", "firecrawl"],
    )
    def test_each_plugin_has_setup_schema(self, plugin_name: str) -> None:
        """``get_setup_schema()`` returns a dict the picker can consume."""
@@ -243,17 +239,6 @@ class TestIsAvailable:
        # Truthy or falsy, just must not raise.
        _ = bool(p.is_available())

-    def test_xai_requires_api_key_or_oauth(self, monkeypatch: pytest.MonkeyPatch) -> None:
-        """xAI needs XAI_API_KEY or OAuth tokens in auth.json."""
-        _ensure_plugins_loaded()
-        from agent.web_search_registry import get_provider
-
-        p = get_provider("xai")
-        assert p is not None
-        assert p.is_available() is False  # no XAI_API_KEY, no auth.json
-        monkeypatch.setenv("XAI_API_KEY", "real")
-        assert p.is_available() is True
-

 # ---------------------------------------------------------------------------
 # Registry resolution semantics (Option B — conservative smart fallback)
@@ -470,7 +455,7 @@ class TestErrorResponseShapes:
        if result["results"]:
            assert "error" in result["results"][0]

-    def test_firecrawl_crawl_returns_error_dict_when_unconfigured(self):
+    def test_firecrawl_crawl_returns_error_dict_when_unconfigured(self) -> None:
        """firecrawl crawl is async (wraps SDK in to_thread); error must be
        surfaced via the per-page result shape, not raised."""
        _ensure_plugins_loaded()
@@ -488,15 +473,3 @@ class TestErrorResponseShapes:
        assert len(result["results"]) >= 1
        assert "error" in result["results"][0]
        assert result["results"][0]["url"] == "https://example.com"
-
-    def test_xai_search_returns_error_dict_when_unconfigured(self) -> None:
-        """xAI returns a typed error dict (no XAI_API_KEY)."""
-        _ensure_plugins_loaded()
-        from agent.web_search_registry import get_provider
-
-        p = get_provider("xai")
-        assert p is not None
-        result = p.search("test", limit=5)
-        assert isinstance(result, dict)
-        assert result.get("success") is False
-        assert "error" in result
@@ -236,7 +236,7 @@ class TestQwenParity:


 class TestCustomOllamaParity:
-    """Custom/Ollama: num_ctx, thinking controls — now tested via profile."""
+    """Custom/Ollama: num_ctx, think=false — now tested via profile."""

    def test_ollama_num_ctx(self, transport):
        kw = transport.build_kwargs(
@@ -145,7 +145,7 @@ class TestShrinkImagePartsHelper:
        oversized_url = _big_png_data_url(5000)  # ~5 MB raw → ~6.7 MB b64
        shrunk = "data:image/jpeg;base64," + "A" * 1000  # small

-        def _fake_resize(path, mime_type=None, max_base64_bytes=None):
+        def _fake_resize(path, mime_type=None, max_base64_bytes=None, clamp_dimensions=False):
            return shrunk

        monkeypatch.setattr(
@@ -1,260 +0,0 @@
-"""Tests for reactive multimodal-tool-content recovery.
-
-Covers the full chain for providers that reject list-type content in
-``role: "tool"`` messages (Xiaomi MiMo's 400 "text is not set", etc.):
-
-  1. agent/error_classifier.py: 400 with the right wording classifies as
-     ``FailoverReason.multimodal_tool_content_unsupported``.
-  2. run_agent._try_strip_image_parts_from_tool_messages downgrades tool
-     messages whose ``content`` is a list-with-image to a string text
-     summary, in-place, and records the active (provider, model) in
-     ``self._no_list_tool_content_models`` so future tool results in this
-     session preemptively downgrade.
-  3. run_agent._tool_result_content_for_active_model short-circuits to a
-     text summary when the (provider, model) is in the cache, even though
-     ``_model_supports_vision`` returns True — avoiding a wasted round
-     trip on every subsequent screenshot in the session.
-
-The end-to-end retry loop wiring (`conversation_loop.py`) is exercised by
-the classifier signal + helper-mutation tests; the integration only adds
-a trivial flag-and-continue around the existing pattern used for
-``image_too_large`` recovery.
-
-See: https://github.com/NousResearch/hermes-agent/issues/27344
-"""
-
-from __future__ import annotations
-
-import pytest
-
-from agent.error_classifier import FailoverReason, classify_api_error
-
-
-class _FakeApiError(Exception):
-    """Stand-in for an openai.BadRequestError with status_code + body."""
-
-    def __init__(self, status_code: int, message: str, body: dict | None = None):
-        super().__init__(message)
-        self.status_code = status_code
-        self.body = body or {"error": {"message": message}}
-        self.response = None
-
-
-def _make_agent(provider: str = "xiaomi", model: str = "mimo-v2.5"):
-    """Build a bare AIAgent for method-level testing, no provider setup."""
-    from run_agent import AIAgent
-    agent = object.__new__(AIAgent)
-    agent.provider = provider
-    agent.model = model
-    return agent
-
-
-# ─── Strip helper ────────────────────────────────────────────────────────────
-
-
-class TestStripImagePartsHelper:
-    def test_no_messages_returns_false(self):
-        agent = _make_agent()
-        assert agent._try_strip_image_parts_from_tool_messages([]) is False
-        assert agent._try_strip_image_parts_from_tool_messages(None) is False
-
-    def test_no_tool_messages_returns_false(self):
-        agent = _make_agent()
-        msgs = [
-            {"role": "user", "content": "plain text"},
-            {"role": "assistant", "content": "ack"},
-        ]
-        assert agent._try_strip_image_parts_from_tool_messages(msgs) is False
-
-    def test_tool_message_with_string_content_unchanged(self):
-        agent = _make_agent()
-        msgs = [
-            {"role": "tool", "tool_call_id": "x", "content": "plain string result"},
-        ]
-        assert agent._try_strip_image_parts_from_tool_messages(msgs) is False
-        assert msgs[0]["content"] == "plain string result"
-
-    def test_tool_message_list_without_image_unchanged(self):
-        """List content with only text parts is left alone — caller surfaces
-        the original error if this turns out to also be rejected."""
-        agent = _make_agent()
-        msgs = [
-            {"role": "tool", "tool_call_id": "x", "content": [
-                {"type": "text", "text": "hello"},
-            ]},
-        ]
-        assert agent._try_strip_image_parts_from_tool_messages(msgs) is False
-
-    def test_tool_message_list_with_image_downgrades(self):
-        agent = _make_agent()
-        msgs = [
-            {"role": "tool", "tool_call_id": "x", "content": [
-                {"type": "text", "text": "AX summary: 5 buttons visible"},
-                {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBOR..."}},
-            ]},
-        ]
-        assert agent._try_strip_image_parts_from_tool_messages(msgs) is True
-        # Image stripped; text preserved as a string.
-        assert isinstance(msgs[0]["content"], str)
-        assert "AX summary" in msgs[0]["content"]
-        assert "image_url" not in msgs[0]["content"]
-        assert "iVBOR" not in msgs[0]["content"]
-
-    def test_tool_message_image_only_gets_placeholder(self):
-        """If the list had nothing but image parts, leave a placeholder so
-        the assistant message has something to reference."""
-        agent = _make_agent()
-        msgs = [
-            {"role": "tool", "tool_call_id": "x", "content": [
-                {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBOR..."}},
-            ]},
-        ]
-        assert agent._try_strip_image_parts_from_tool_messages(msgs) is True
-        assert isinstance(msgs[0]["content"], str)
-        assert "image content removed" in msgs[0]["content"]
-
-    def test_records_provider_model_in_session_cache(self):
-        agent = _make_agent(provider="xiaomi", model="mimo-v2.5")
-        msgs = [
-            {"role": "tool", "tool_call_id": "x", "content": [
-                {"type": "text", "text": "summary"},
-                {"type": "image_url", "image_url": {"url": "data:image/png;base64,X"}},
-            ]},
-        ]
-        agent._try_strip_image_parts_from_tool_messages(msgs)
-        assert ("xiaomi", "mimo-v2.5") in agent._no_list_tool_content_models
-
-    def test_only_tool_messages_get_downgraded(self):
-        """User / assistant messages with list-type content are out of
-        scope — they're handled by the existing image-routing path."""
-        agent = _make_agent()
-        msgs = [
-            {"role": "user", "content": [
-                {"type": "text", "text": "describe"},
-                {"type": "image_url", "image_url": {"url": "data:image/png;base64,X"}},
-            ]},
-            {"role": "tool", "tool_call_id": "x", "content": [
-                {"type": "text", "text": "summary"},
-                {"type": "image_url", "image_url": {"url": "data:image/png;base64,Y"}},
-            ]},
-        ]
-        agent._try_strip_image_parts_from_tool_messages(msgs)
-        # User message untouched.
-        assert isinstance(msgs[0]["content"], list)
-        assert any(p.get("type") == "image_url" for p in msgs[0]["content"])
-        # Tool message downgraded.
-        assert isinstance(msgs[1]["content"], str)
-        assert "summary" in msgs[1]["content"]
-
-    def test_skips_recording_when_no_model_id(self):
-        """Don't poison the cache with empty keys when provider/model is
-        unset (e.g. lazy-initialised mid-handshake)."""
-        agent = _make_agent(provider="", model="")
-        msgs = [
-            {"role": "tool", "tool_call_id": "x", "content": [
-                {"type": "text", "text": "summary"},
-                {"type": "image_url", "image_url": {"url": "data:image/png;base64,X"}},
-            ]},
-        ]
-        agent._try_strip_image_parts_from_tool_messages(msgs)
-        assert agent._no_list_tool_content_models == set()
-
-
-# ─── Short-circuit on cached models ──────────────────────────────────────────
-
-
-class TestToolResultContentShortCircuit:
-    """Once the session has learned that (provider, model) rejects list
-    content, ``_tool_result_content_for_active_model`` returns a text
-    summary even though ``_model_supports_vision`` reports True.
-    """
-
-    def _multimodal_result(self, png_b64: str = "iVBORw0KGgoAAAA"):
-        return {
-            "_multimodal": True,
-            "content": [
-                {"type": "text", "text": "capture mode=som 800x600 app=Safari"},
-                {"type": "image_url",
-                 "image_url": {"url": f"data:image/png;base64,{png_b64}"}},
-            ],
-            "text_summary": "capture mode=som 800x600 app=Safari",
-            "meta": {"mode": "som", "width": 800, "height": 600, "elements": 5,
-                     "png_bytes": 1024},
-        }
-
-    def test_returns_list_when_cache_empty_and_vision_supported(self, monkeypatch):
-        agent = _make_agent(provider="xiaomi", model="mimo-v2.5")
-        agent._no_list_tool_content_models = set()  # explicit empty
-        monkeypatch.setattr(agent, "_model_supports_vision", lambda: True)
-        out = agent._tool_result_content_for_active_model(
-            "computer_use", self._multimodal_result()
-        )
-        # Native multimodal path: returns the content parts list.
-        assert isinstance(out, list)
-        assert any(p.get("type") == "image_url" for p in out)
-
-    def test_returns_text_summary_when_model_in_cache(self, monkeypatch):
-        agent = _make_agent(provider="xiaomi", model="mimo-v2.5")
-        agent._no_list_tool_content_models = {("xiaomi", "mimo-v2.5")}
-        monkeypatch.setattr(agent, "_model_supports_vision", lambda: True)
-        out = agent._tool_result_content_for_active_model(
-            "computer_use", self._multimodal_result()
-        )
-        # Short-circuit: a plain string summary, no image_url present.
-        assert isinstance(out, str)
-        assert "data:image" not in out
-        assert "image_url" not in out
-
-    def test_cache_miss_on_different_model(self, monkeypatch):
-        """Cache is per (provider, model). A cached entry for mimo-v2.5
-        must NOT affect a session running on a different model.
-        """
-        agent = _make_agent(provider="xiaomi", model="mimo-v2.5-pro")
-        agent._no_list_tool_content_models = {("xiaomi", "mimo-v2.5")}
-        monkeypatch.setattr(agent, "_model_supports_vision", lambda: True)
-        out = agent._tool_result_content_for_active_model(
-            "computer_use", self._multimodal_result()
-        )
-        assert isinstance(out, list)
-
-    def test_missing_cache_attribute_falls_through(self, monkeypatch):
-        """Tests that build agents via ``object.__new__`` without calling
-        ``__init__`` must not crash — the cache attribute may be absent.
-        """
-        agent = _make_agent()
-        # Deliberately do not assign _no_list_tool_content_models.
-        monkeypatch.setattr(agent, "_model_supports_vision", lambda: True)
-        out = agent._tool_result_content_for_active_model(
-            "computer_use", self._multimodal_result()
-        )
-        assert isinstance(out, list)
-
-
-# ─── Classifier ──────────────────────────────────────────────────────────────
-
-
-class TestRecoveryEndToEndClassification:
-    """Lock in that the patterns used by the recovery path classify to
-    the right ``FailoverReason``. (The recovery hook in
-    ``agent.conversation_loop`` consumes this reason directly.)
-    """
-
-    def test_xiaomi_mimo_classifies(self):
-        err = _FakeApiError(
-            status_code=400,
-            message=(
-                "Error code: 400 - {'error': {'code': '400', 'message': "
-                "'Param Incorrect', 'param': 'text is not set', 'type': ''}}"
-            ),
-        )
-        result = classify_api_error(err, provider="xiaomi", model="mimo-v2.5")
-        assert result.reason == FailoverReason.multimodal_tool_content_unsupported
-        assert result.retryable is True
-
-    def test_alibaba_variant_classifies(self):
-        err = _FakeApiError(
-            status_code=400,
-            message="tool_call.content must be string",
-        )
-        result = classify_api_error(err, provider="alibaba", model="qwen3.5-plus")
-        assert result.reason == FailoverReason.multimodal_tool_content_unsupported
@@ -2636,31 +2636,6 @@ class TestRunConversation:
        assert result["final_response"] == "Final answer"
        assert result["completed"] is True

-    def test_ollama_small_runtime_context_fails_before_api_call(self, agent, caplog):
-        self._setup_agent(agent)
-        agent.model = "qwen3.5:9b"
-        agent.provider = "custom"
-        agent.base_url = "http://host.docker.internal:11434/v1"
-        agent._ollama_num_ctx = 4096
-
-        with (
-            patch.object(agent, "_persist_session"),
-            patch.object(agent, "_save_trajectory"),
-            patch.object(agent, "_cleanup_task_resources"),
-            caplog.at_level(logging.WARNING, logger="agent.conversation_loop"),
-        ):
-            result = agent.run_conversation("Call ps -aux")
-
-        assert result["failed"] is True
-        assert result["completed"] is False
-        assert result["api_calls"] == 0
-        assert result["turn_exit_reason"] == "ollama_runtime_context_too_small"
-        assert "Ollama loaded `qwen3.5:9b` with only 4,096 tokens" in result["final_response"]
-        assert "model.ollama_num_ctx: 65536" in result["final_response"]
-        assert not agent.client.chat.completions.create.called
-        assert "Ollama runtime context too small for Hermes tool use" in caplog.text
-        assert "runtime_context=4096" in caplog.text
-
    def test_tool_calls_then_stop(self, agent):
        self._setup_agent(agent)
        tc = _mock_tool_call(name="web_search", arguments="{}", call_id="c1")
@@ -1,491 +0,0 @@
-"""Hermetic tests for the Bitwarden Secrets Manager integration.
-
-We never hit GitHub or Bitwarden in tests — subprocess + urllib are
-mocked so the suite stays fast and offline-safe.  The "live" pull and
-binary download are exercised manually by `hermes secrets bitwarden
-setup` outside of pytest.
-"""
-
-from __future__ import annotations
-
-import hashlib
-import io
-import json
-import os
-import stat
-import subprocess
-import sys
-import tempfile
-import time
-import zipfile
-from pathlib import Path
-from unittest import mock
-
-import pytest
-
-
-# Make the worktree importable without depending on the installed wheel.
-ROOT = Path(__file__).resolve().parents[1]
-if str(ROOT) not in sys.path:
-    sys.path.insert(0, str(ROOT))
-
-from agent.secret_sources import bitwarden as bw  # noqa: E402
-
-
-@pytest.fixture(autouse=True)
-def _reset_caches():
-    bw._reset_cache_for_tests()
-    yield
-    bw._reset_cache_for_tests()
-
-
-@pytest.fixture
-def hermes_home(tmp_path, monkeypatch):
-    """Point Hermes at an isolated home directory."""
-    home = tmp_path / ".hermes"
-    home.mkdir()
-    monkeypatch.setenv("HERMES_HOME", str(home))
-    # Some modules cache get_hermes_home; clear if needed.
-    import hermes_constants
-    if hasattr(hermes_constants, "_HERMES_HOME_CACHE"):
-        hermes_constants._HERMES_HOME_CACHE = None  # type: ignore[attr-defined]
-    return home
-
-
-# ---------------------------------------------------------------------------
-# _platform_asset_name
-# ---------------------------------------------------------------------------
-
-
-@pytest.mark.parametrize(
-    "system,machine,libc_text,expected",
-    [
-        ("Darwin", "x86_64", "",
-         f"bws-macos-universal-{bw._BWS_VERSION}.zip"),
-        ("Darwin", "arm64", "",
-         f"bws-macos-universal-{bw._BWS_VERSION}.zip"),
-        ("Linux", "x86_64", "glibc",
-         f"bws-x86_64-unknown-linux-gnu-{bw._BWS_VERSION}.zip"),
-        ("Linux", "x86_64", "musl libc",
-         f"bws-x86_64-unknown-linux-musl-{bw._BWS_VERSION}.zip"),
-        ("Linux", "aarch64", "",
-         f"bws-aarch64-unknown-linux-gnu-{bw._BWS_VERSION}.zip"),
-        ("Windows", "AMD64", "",
-         f"bws-x86_64-pc-windows-msvc-{bw._BWS_VERSION}.zip"),
-        ("Windows", "ARM64", "",
-         f"bws-aarch64-pc-windows-msvc-{bw._BWS_VERSION}.zip"),
-    ],
-)
-def test_platform_asset_name(system, machine, libc_text, expected):
-    with mock.patch.object(bw.platform, "system", return_value=system), \
-         mock.patch.object(bw.platform, "machine", return_value=machine), \
-         mock.patch.object(
-             bw.subprocess,
-             "run",
-             return_value=mock.Mock(stdout=libc_text, stderr=libc_text),
-         ):
-        assert bw._platform_asset_name() == expected
-
-
-# ---------------------------------------------------------------------------
-# install_bws — fully mocked HTTP
-# ---------------------------------------------------------------------------
-
-
-def _make_fake_zip(binary_bytes: bytes) -> bytes:
-    buf = io.BytesIO()
-    with zipfile.ZipFile(buf, "w") as zf:
-        zf.writestr("bws", binary_bytes)
-    return buf.getvalue()
-
-
-def test_install_bws_happy_path(hermes_home, monkeypatch):
-    fake_binary = b"#!/bin/sh\necho 'bws fake 2.0.0'\n"
-    zip_bytes = _make_fake_zip(fake_binary)
-    asset_name = bw._platform_asset_name()
-    checksum_text = (
-        f"{hashlib.sha256(zip_bytes).hexdigest()}  {asset_name}\n"
-        "ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff  other-file\n"
-    )
-
-    def fake_download(url, dest):
-        if url.endswith(".zip"):
-            Path(dest).write_bytes(zip_bytes)
-        elif url.endswith(".txt"):
-            Path(dest).write_text(checksum_text)
-        else:
-            raise AssertionError(f"unexpected download url: {url}")
-
-    monkeypatch.setattr(bw, "_http_download", fake_download)
-
-    path = bw.install_bws()
-    assert path.exists()
-    assert path.read_bytes() == fake_binary
-    # Executable bit set
-    assert path.stat().st_mode & stat.S_IXUSR
-
-
-def test_install_bws_checksum_mismatch(hermes_home, monkeypatch):
-    zip_bytes = _make_fake_zip(b"contents")
-    asset_name = bw._platform_asset_name()
-    wrong_checksum = "0" * 64
-    checksum_text = f"{wrong_checksum}  {asset_name}\n"
-
-    def fake_download(url, dest):
-        if url.endswith(".zip"):
-            Path(dest).write_bytes(zip_bytes)
-        else:
-            Path(dest).write_text(checksum_text)
-
-    monkeypatch.setattr(bw, "_http_download", fake_download)
-
-    with pytest.raises(RuntimeError, match="Checksum mismatch"):
-        bw.install_bws()
-
-
-def test_install_bws_missing_checksum_entry(hermes_home, monkeypatch):
-    zip_bytes = _make_fake_zip(b"x")
-
-    def fake_download(url, dest):
-        if url.endswith(".zip"):
-            Path(dest).write_bytes(zip_bytes)
-        else:
-            Path(dest).write_text("ffffffff  some-other-file.zip\n")
-
-    monkeypatch.setattr(bw, "_http_download", fake_download)
-
-    with pytest.raises(RuntimeError, match="No checksum entry"):
-        bw.install_bws()
-
-
-# ---------------------------------------------------------------------------
-# fetch_bitwarden_secrets
-# ---------------------------------------------------------------------------
-
-
-def _fake_bws_payload(items):
-    return json.dumps(items)
-
-
-def test_fetch_happy_path(monkeypatch, tmp_path):
-    fake_binary = tmp_path / "bws"
-    fake_binary.write_text("")
-    payload = _fake_bws_payload([
-        {"key": "OPENAI_API_KEY", "value": "sk-abc"},
-        {"key": "ANTHROPIC_API_KEY", "value": "sk-ant-xyz"},
-    ])
-
-    def fake_run(cmd, **kwargs):
-        assert cmd[0] == str(fake_binary)
-        assert "secret" in cmd and "list" in cmd
-        assert kwargs["env"]["BWS_ACCESS_TOKEN"] == "0.fake.token"
-        return mock.Mock(returncode=0, stdout=payload, stderr="")
-
-    monkeypatch.setattr(bw.subprocess, "run", fake_run)
-
-    secrets, warnings = bw.fetch_bitwarden_secrets(
-        access_token="0.fake.token",
-        project_id="proj-uuid",
-        binary=fake_binary,
-        use_cache=False,
-    )
-    assert secrets == {
-        "OPENAI_API_KEY": "sk-abc",
-        "ANTHROPIC_API_KEY": "sk-ant-xyz",
-    }
-    assert warnings == []
-
-
-def test_fetch_skips_invalid_env_names(monkeypatch, tmp_path):
-    fake_binary = tmp_path / "bws"
-    fake_binary.write_text("")
-    payload = _fake_bws_payload([
-        {"key": "VALID_KEY", "value": "v1"},
-        {"key": "1BAD_START", "value": "v2"},
-        {"key": "has spaces", "value": "v3"},
-        {"key": "DASH-KEY", "value": "v4"},
-    ])
-
-    monkeypatch.setattr(
-        bw.subprocess,
-        "run",
-        lambda *a, **kw: mock.Mock(returncode=0, stdout=payload, stderr=""),
-    )
-
-    secrets, warnings = bw.fetch_bitwarden_secrets(
-        access_token="0.t",
-        project_id="p",
-        binary=fake_binary,
-        use_cache=False,
-    )
-    assert secrets == {"VALID_KEY": "v1"}
-    assert len(warnings) == 3
-
-
-def test_fetch_auth_failure(monkeypatch, tmp_path):
-    fake_binary = tmp_path / "bws"
-    fake_binary.write_text("")
-
-    monkeypatch.setattr(
-        bw.subprocess,
-        "run",
-        lambda *a, **kw: mock.Mock(
-            returncode=1, stdout="", stderr="Error: invalid access token"
-        ),
-    )
-
-    with pytest.raises(RuntimeError, match="invalid access token"):
-        bw.fetch_bitwarden_secrets(
-            access_token="0.bad",
-            project_id="p",
-            binary=fake_binary,
-            use_cache=False,
-        )
-
-
-def test_fetch_timeout(monkeypatch, tmp_path):
-    fake_binary = tmp_path / "bws"
-    fake_binary.write_text("")
-
-    def fake_run(*a, **kw):
-        raise subprocess.TimeoutExpired(cmd="bws", timeout=30)
-
-    monkeypatch.setattr(bw.subprocess, "run", fake_run)
-
-    with pytest.raises(RuntimeError, match="timed out"):
-        bw.fetch_bitwarden_secrets(
-            access_token="0.t",
-            project_id="p",
-            binary=fake_binary,
-            use_cache=False,
-        )
-
-
-def test_fetch_non_json(monkeypatch, tmp_path):
-    fake_binary = tmp_path / "bws"
-    fake_binary.write_text("")
-
-    monkeypatch.setattr(
-        bw.subprocess,
-        "run",
-        lambda *a, **kw: mock.Mock(
-            returncode=0, stdout="not json at all", stderr=""
-        ),
-    )
-
-    with pytest.raises(RuntimeError, match="non-JSON"):
-        bw.fetch_bitwarden_secrets(
-            access_token="0.t",
-            project_id="p",
-            binary=fake_binary,
-            use_cache=False,
-        )
-
-
-def test_fetch_cache_hits(monkeypatch, tmp_path):
-    fake_binary = tmp_path / "bws"
-    fake_binary.write_text("")
-    payload = _fake_bws_payload([{"key": "K", "value": "v"}])
-
-    call_count = {"n": 0}
-    def fake_run(*a, **kw):
-        call_count["n"] += 1
-        return mock.Mock(returncode=0, stdout=payload, stderr="")
-
-    monkeypatch.setattr(bw.subprocess, "run", fake_run)
-
-    bw.fetch_bitwarden_secrets(access_token="0.t", project_id="p",
-                                binary=fake_binary, cache_ttl_seconds=60)
-    bw.fetch_bitwarden_secrets(access_token="0.t", project_id="p",
-                                binary=fake_binary, cache_ttl_seconds=60)
-    assert call_count["n"] == 1  # cached on second call
-
-
-def test_fetch_cache_disabled(monkeypatch, tmp_path):
-    fake_binary = tmp_path / "bws"
-    fake_binary.write_text("")
-    payload = _fake_bws_payload([])
-    call_count = {"n": 0}
-    def fake_run(*a, **kw):
-        call_count["n"] += 1
-        return mock.Mock(returncode=0, stdout=payload, stderr="")
-    monkeypatch.setattr(bw.subprocess, "run", fake_run)
-
-    bw.fetch_bitwarden_secrets(access_token="0.t", project_id="p",
-                                binary=fake_binary, use_cache=False)
-    bw.fetch_bitwarden_secrets(access_token="0.t", project_id="p",
-                                binary=fake_binary, use_cache=False)
-    assert call_count["n"] == 2
-
-
-# ---------------------------------------------------------------------------
-# apply_bitwarden_secrets — the public entry point used by env_loader
-# ---------------------------------------------------------------------------
-
-
-def test_apply_disabled_returns_empty():
-    result = bw.apply_bitwarden_secrets(enabled=False, project_id="p")
-    assert result.ok
-    assert not result.applied
-    assert not result.error
-
-
-def test_apply_missing_token(monkeypatch):
-    monkeypatch.delenv("BWS_ACCESS_TOKEN", raising=False)
-    result = bw.apply_bitwarden_secrets(
-        enabled=True, project_id="p", auto_install=False
-    )
-    assert not result.ok
-    assert "BWS_ACCESS_TOKEN" in result.error
-
-
-def test_apply_missing_project_id(monkeypatch):
-    monkeypatch.setenv("BWS_ACCESS_TOKEN", "0.t")
-    result = bw.apply_bitwarden_secrets(
-        enabled=True, project_id="", auto_install=False
-    )
-    assert not result.ok
-    assert "project_id" in result.error
-
-
-def test_apply_does_not_override_existing(monkeypatch, tmp_path):
-    monkeypatch.setenv("BWS_ACCESS_TOKEN", "0.t")
-    monkeypatch.setenv("OPENAI_API_KEY", "existing-value")
-    fake_binary = tmp_path / "bws"
-    fake_binary.write_text("")
-    payload = _fake_bws_payload([
-        {"key": "OPENAI_API_KEY", "value": "bsm-value"},
-        {"key": "NEW_KEY", "value": "new-value"},
-    ])
-    monkeypatch.setattr(
-        bw.subprocess, "run",
-        lambda *a, **kw: mock.Mock(returncode=0, stdout=payload, stderr=""),
-    )
-    monkeypatch.setattr(bw, "find_bws", lambda **kw: fake_binary)
-
-    result = bw.apply_bitwarden_secrets(
-        enabled=True, project_id="p",
-        override_existing=False, auto_install=False,
-    )
-    assert result.ok
-    assert "NEW_KEY" in result.applied
-    assert "OPENAI_API_KEY" in result.skipped
-    assert os.environ["OPENAI_API_KEY"] == "existing-value"
-    assert os.environ["NEW_KEY"] == "new-value"
-
-
-def test_apply_override_existing(monkeypatch, tmp_path):
-    monkeypatch.setenv("BWS_ACCESS_TOKEN", "0.t")
-    monkeypatch.setenv("OPENAI_API_KEY", "stale")
-    fake_binary = tmp_path / "bws"
-    fake_binary.write_text("")
-    payload = _fake_bws_payload([{"key": "OPENAI_API_KEY", "value": "fresh"}])
-    monkeypatch.setattr(
-        bw.subprocess, "run",
-        lambda *a, **kw: mock.Mock(returncode=0, stdout=payload, stderr=""),
-    )
-    monkeypatch.setattr(bw, "find_bws", lambda **kw: fake_binary)
-
-    result = bw.apply_bitwarden_secrets(
-        enabled=True, project_id="p",
-        override_existing=True, auto_install=False,
-    )
-    assert result.ok
-    assert os.environ["OPENAI_API_KEY"] == "fresh"
-
-
-def test_apply_never_overrides_bootstrap_token(monkeypatch, tmp_path):
-    """Even with override_existing=True, the access-token var is preserved."""
-    monkeypatch.setenv("BWS_ACCESS_TOKEN", "0.original")
-    fake_binary = tmp_path / "bws"
-    fake_binary.write_text("")
-    payload = _fake_bws_payload([
-        {"key": "BWS_ACCESS_TOKEN", "value": "0.malicious-replacement"},
-    ])
-    monkeypatch.setattr(
-        bw.subprocess, "run",
-        lambda *a, **kw: mock.Mock(returncode=0, stdout=payload, stderr=""),
-    )
-    monkeypatch.setattr(bw, "find_bws", lambda **kw: fake_binary)
-
-    result = bw.apply_bitwarden_secrets(
-        enabled=True, project_id="p",
-        override_existing=True, auto_install=False,
-    )
-    assert os.environ["BWS_ACCESS_TOKEN"] == "0.original"
-    assert "BWS_ACCESS_TOKEN" in result.skipped
-
-
-def test_apply_swallows_fetch_errors(monkeypatch, tmp_path):
-    """A fetch failure produces an error, NOT an exception."""
-    monkeypatch.setenv("BWS_ACCESS_TOKEN", "0.t")
-    fake_binary = tmp_path / "bws"
-    fake_binary.write_text("")
-    monkeypatch.setattr(
-        bw.subprocess, "run",
-        lambda *a, **kw: mock.Mock(returncode=1, stdout="", stderr="bad token"),
-    )
-    monkeypatch.setattr(bw, "find_bws", lambda **kw: fake_binary)
-
-    result = bw.apply_bitwarden_secrets(
-        enabled=True, project_id="p", auto_install=False,
-    )
-    assert not result.ok
-    assert "bad token" in result.error
-
-
-# ---------------------------------------------------------------------------
-# env_loader integration
-# ---------------------------------------------------------------------------
-
-
-def test_env_loader_skips_when_disabled(tmp_path, monkeypatch):
-    """No config.yaml present → no BSM call, no crash."""
-    home = tmp_path / ".hermes"
-    home.mkdir()
-    monkeypatch.setenv("HERMES_HOME", str(home))
-    monkeypatch.setattr(Path, "home", lambda: tmp_path)
-
-    from hermes_cli.env_loader import _apply_external_secret_sources
-    # Should be a no-op (returns None).
-    assert _apply_external_secret_sources(home) is None
-
-
-def test_env_loader_calls_bsm_when_enabled(tmp_path, monkeypatch):
-    home = tmp_path / ".hermes"
-    home.mkdir()
-    (home / "config.yaml").write_text(
-        "secrets:\n"
-        "  bitwarden:\n"
-        "    enabled: true\n"
-        "    project_id: 'proj-1'\n"
-        "    access_token_env: 'BWS_ACCESS_TOKEN'\n"
-        "    cache_ttl_seconds: 0\n"
-        "    override_existing: false\n"
-        "    auto_install: false\n"
-    )
-    monkeypatch.setenv("HERMES_HOME", str(home))
-    monkeypatch.setenv("BWS_ACCESS_TOKEN", "0.t")
-    monkeypatch.delenv("MY_BSM_KEY", raising=False)
-
-    called = {"n": 0}
-    def fake_apply(**kwargs):
-        called["n"] += 1
-        assert kwargs["enabled"] is True
-        assert kwargs["project_id"] == "proj-1"
-        os.environ["MY_BSM_KEY"] = "from-bsm"
-        return bw.FetchResult(
-            secrets={"MY_BSM_KEY": "from-bsm"},
-            applied=["MY_BSM_KEY"],
-        )
-
-    monkeypatch.setattr(
-        "agent.secret_sources.bitwarden.apply_bitwarden_secrets",
-        fake_apply,
-    )
-
-    from hermes_cli.env_loader import _apply_external_secret_sources
-    _apply_external_secret_sources(home)
-
-    assert called["n"] == 1
-    assert os.environ.get("MY_BSM_KEY") == "from-bsm"
@@ -1,119 +0,0 @@
-"""Tests for the secret-source tracking in ``hermes_cli.env_loader``.
-
-These cover the small public surface that lets `hermes model` / `hermes setup`
-label detected credentials with their origin ("from Bitwarden") so users
-don't see an unexplained "credentials ✓" line when their .env is empty.
-"""
-
-from __future__ import annotations
-
-import sys
-from pathlib import Path
-
-import pytest
-
-
-ROOT = Path(__file__).resolve().parents[1]
-if str(ROOT) not in sys.path:
-    sys.path.insert(0, str(ROOT))
-
-from hermes_cli import env_loader  # noqa: E402
-
-
-@pytest.fixture(autouse=True)
-def _reset_sources():
-    """Each test starts with a clean source map."""
-    env_loader._SECRET_SOURCES.clear()
-    yield
-    env_loader._SECRET_SOURCES.clear()
-
-
-def test_get_secret_source_returns_none_for_untracked_var():
-    assert env_loader.get_secret_source("ANTHROPIC_API_KEY") is None
-
-
-def test_get_secret_source_returns_label_for_tracked_var():
-    env_loader._SECRET_SOURCES["ANTHROPIC_API_KEY"] = "bitwarden"
-    assert env_loader.get_secret_source("ANTHROPIC_API_KEY") == "bitwarden"
-
-
-def test_format_secret_source_suffix_empty_for_untracked():
-    # Credentials from .env or the shell shouldn't add noise — the
-    # implicit case stays unlabeled.
-    assert env_loader.format_secret_source_suffix("ANTHROPIC_API_KEY") == ""
-
-
-def test_format_secret_source_suffix_bitwarden_uses_proper_name():
-    env_loader._SECRET_SOURCES["ANTHROPIC_API_KEY"] = "bitwarden"
-    assert (
-        env_loader.format_secret_source_suffix("ANTHROPIC_API_KEY")
-        == " (from Bitwarden)"
-    )
-
-
-def test_format_secret_source_suffix_generic_label_for_future_sources():
-    # Future-proofing: a new secret source (e.g. "vault") should still
-    # produce a sensible label without needing to edit every call site.
-    env_loader._SECRET_SOURCES["OPENAI_API_KEY"] = "vault"
-    assert (
-        env_loader.format_secret_source_suffix("OPENAI_API_KEY")
-        == " (from vault)"
-    )
-
-
-def test_apply_external_secret_sources_records_bitwarden_origin(tmp_path, monkeypatch):
-    """End-to-end: when ``apply_bitwarden_secrets`` returns applied keys,
-    they end up in ``_SECRET_SOURCES`` so the UI can label them."""
-
-    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
-    config_path = tmp_path / "config.yaml"
-    config_path.write_text(
-        "secrets:\n"
-        "  bitwarden:\n"
-        "    enabled: true\n"
-        "    project_id: test-project\n"
-        "    access_token_env: BWS_ACCESS_TOKEN\n",
-        encoding="utf-8",
-    )
-
-    # Stub apply_bitwarden_secrets to return a synthetic FetchResult.
-    from agent.secret_sources.bitwarden import FetchResult
-
-    fake_result = FetchResult(
-        secrets={"ANTHROPIC_API_KEY": "sk-ant-test"},
-        applied=["ANTHROPIC_API_KEY"],
-    )
-
-    def _fake_apply(**_kwargs):
-        return fake_result
-
-    # The import inside _apply_external_secret_sources is lazy, so we
-    # patch the *module attribute* it will pull in.
-    import agent.secret_sources.bitwarden as bw_module
-
-    monkeypatch.setattr(bw_module, "apply_bitwarden_secrets", _fake_apply)
-
-    env_loader._apply_external_secret_sources(tmp_path)
-
-    assert env_loader.get_secret_source("ANTHROPIC_API_KEY") == "bitwarden"
-    assert (
-        env_loader.format_secret_source_suffix("ANTHROPIC_API_KEY")
-        == " (from Bitwarden)"
-    )
-
-
-def test_apply_external_secret_sources_noop_when_disabled(tmp_path, monkeypatch):
-    """Disabled Bitwarden config must not touch the source map."""
-
-    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
-    config_path = tmp_path / "config.yaml"
-    config_path.write_text(
-        "secrets:\n"
-        "  bitwarden:\n"
-        "    enabled: false\n",
-        encoding="utf-8",
-    )
-
-    env_loader._apply_external_secret_sources(tmp_path)
-
-    assert env_loader.get_secret_source("ANTHROPIC_API_KEY") is None
@@ -1,187 +0,0 @@
-"""Verify scripts/run_tests_parallel.py kills test-spawned grandchildren.
-
-Setup
-----
-A test in this file spawns a long-lived Python grandchild that writes
-its PID + a nonce to a tempfile, then exits without cleaning up.
-With the old ``subprocess.run`` runner, that grandchild would orphan
-and outlive the test (and the whole runner). With the current Popen +
-``start_new_session`` + ``_kill_tree`` runner, the grandchild gets
-SIGKILL'd via process-group kill when its file's pytest exits.
-
-The leaker test always passes — its only job is to spawn a grandchild
-and walk away. The verifier runs the runner over the leaker file in a
-subprocess, then waits for the grandchild PID to disappear from the
-kernel's process table.
-
-POSIX-only: Windows has its own grandchild lifecycle (no shared session,
-``taskkill /F /T`` semantics). Marked accordingly.
-"""
-
-from __future__ import annotations
-
-import json
-import os
-import subprocess
-import sys
-import textwrap
-import time
-from pathlib import Path
-
-import pytest
-
-
-# Both tests share the same handoff file: the leaker writes here, the
-# verifier reads here. We park it in $TMPDIR with a unique-per-run name
-# so concurrent invocations of the suite don't clobber each other.
-_HANDOFF_DIR = Path(os.environ.get("TMPDIR", "/tmp")) / "hermes-isolation-probe"
-_HANDOFF_DIR.mkdir(exist_ok=True)
-
-
-def _handoff_path_for(nonce: str) -> Path:
-    return _HANDOFF_DIR / f"grandchild-{nonce}.json"
-
-
-def _pid_alive(pid: int) -> bool:
-    """POSIX: send signal 0 to probe whether ``pid`` is still alive.
-
-    ``os.kill(pid, 0)`` raises ``ProcessLookupError`` if the process is
-    gone, ``PermissionError`` if it exists but we can't signal it
-    (someone else's pid). We treat PermissionError as "alive" because
-    the process exists and that's all we need to know.
-    """
-    if sys.platform == "win32":  # pragma: no cover — POSIX-only test
-        # On Windows we'd use OpenProcess + GetExitCodeProcess; this
-        # test is skipped on Windows so the path is unreachable.
-        raise RuntimeError("_pid_alive POSIX-only")
-    try:
-        os.kill(pid, 0)
-    except ProcessLookupError:
-        return False
-    except PermissionError:
-        return True
-    return True
-
-
-@pytest.mark.skipif(sys.platform == "win32", reason="POSIX-only probe")
-@pytest.mark.live_system_guard_bypass
-def test_grandchild_leak_is_killed_by_runner(tmp_path: Path) -> None:
-    """Run the parallel runner over a probe file and verify cleanup.
-
-    1. Materialize a probe file that spawns a long-lived grandchild and
-       writes its PID to disk before exiting.
-    2. Invoke ``scripts/run_tests_parallel.py`` against the probe file.
-    3. Wait for the grandchild PID to vanish (poll for ~5s).
-    4. Assert the runner exited cleanly AND the grandchild is dead.
-    """
-    repo_root = Path(__file__).resolve().parent.parent
-    runner = repo_root / "scripts" / "run_tests_parallel.py"
-    assert runner.exists(), f"runner missing at {runner}"
-
-    # Probe lives in a temp dir, NOT under tests/, so the regular suite
-    # never picks it up — only our explicit invocation does.
-    probe_dir = tmp_path / "probe"
-    probe_dir.mkdir()
-    probe = probe_dir / "test_probe_leaker.py"
-    nonce = f"{os.getpid()}-{int(time.time() * 1000)}"
-    handoff = _handoff_path_for(nonce)
-    if handoff.exists():
-        handoff.unlink()
-
-    probe_src = textwrap.dedent(f"""
-        import json, os, subprocess, sys, time
-        from pathlib import Path
-
-        HANDOFF = Path({str(handoff)!r})
-
-        def test_spawns_grandchild_and_walks_away():
-            # Long-lived grandchild: detached, ignores SIGTERM (we want
-            # SIGKILL or process-group kill to be the only thing that
-            # works, simulating a misbehaving server).
-            child = subprocess.Popen(
-                [
-                    sys.executable, "-c",
-                    "import os, signal, sys, time; "
-                    "signal.signal(signal.SIGTERM, signal.SIG_IGN); "
-                    "sys.stdout.write(f'gc-pgid={{os.getpgid(0)}} gc-pid={{os.getpid()}}\\\\n'); "
-                    "sys.stdout.flush(); "
-                    "time.sleep(600)",
-                ],
-                stdout=subprocess.PIPE,
-                stderr=subprocess.STDOUT,
-                # IMPORTANT: do NOT pass start_new_session here. We want
-                # the grandchild to inherit the pytest subprocess's
-                # process group, so when the runner kills the group the
-                # grandchild dies too.
-            )
-            # Read the first line so we can record gc's pgid in the
-            # handoff, then walk away — don't close the pipe (would
-            # signal EOF and let the child see SIGPIPE on next write).
-            first_line = child.stdout.readline().decode().strip()
-            HANDOFF.write_text(json.dumps({{
-                "pid": child.pid,
-                "diag": first_line,
-                "test_pid": os.getpid(),
-                "test_pgid": os.getpgid(0),
-            }}))
-            assert child.pid > 0
-    """).strip()
-    probe.write_text(probe_src + "\n")
-
-    # Run the parallel runner against just the probe file. The runner
-    # discovers under ``tests/`` by default, so we override via --paths.
-    proc = subprocess.run(
-        [
-            sys.executable,
-            str(runner),
-            "--paths",
-            str(probe_dir),
-            "-j",
-            "1",
-            # Tight per-file timeout: the probe finishes in <1s, no
-            # need for 10min.
-            "--file-timeout",
-            "30",
-        ],
-        cwd=repo_root,
-        stdout=subprocess.PIPE,
-        stderr=subprocess.STDOUT,
-        text=True,
-        timeout=60,
-    )
-
-    assert handoff.exists(), (
-        f"probe never wrote handoff file; runner output:\n{proc.stdout}"
-    )
-    handoff_data = json.loads(handoff.read_text())
-    grandchild_pid = handoff_data["pid"]
-    diag = handoff_data.get("diag", "(no diag)")
-    test_pid = handoff_data.get("test_pid")
-    test_pgid = handoff_data.get("test_pgid")
-    handoff.unlink()
-
-    # The runner must have exited cleanly (probe test passes).
-    assert proc.returncode == 0, (
-        f"runner exited {proc.returncode}; output:\n{proc.stdout}"
-    )
-
-    # The grandchild must be gone. Poll for a bit because process-group
-    # SIGKILL + reaping isn't synchronous; on a loaded box it can take
-    # a beat.
-    deadline = time.monotonic() + 5.0
-    while time.monotonic() < deadline:
-        if not _pid_alive(grandchild_pid):
-            break
-        time.sleep(0.05)
-    else:
-        # Test cleanup: kill the leaked grandchild ourselves so a
-        # FAILED assertion doesn't leave a sleep(600) running.
-        try:
-            os.kill(grandchild_pid, 9)
-        except ProcessLookupError:
-            pass
-        pytest.fail(
-            f"grandchild PID {grandchild_pid} survived runner exit; "
-            f"diag={diag!r} test_pid={test_pid} test_pgid={test_pgid}; "
-            f"runner output:\n{proc.stdout}"
-        )
@@ -59,59 +59,6 @@ def test_write_json_returns_false_on_broken_pipe(monkeypatch):
    assert server.write_json({"ok": True}) is False


-def test_tui_verbose_tool_details_fail_closed_when_redaction_fails(monkeypatch):
-    redact_module = types.ModuleType("agent.redact")
-
-    def fail_redaction(*_args, **_kwargs):
-        raise RuntimeError("redaction unavailable")
-
-    setattr(redact_module, "redact_sensitive_text", fail_redaction)
-    monkeypatch.setitem(sys.modules, "agent.redact", redact_module)
-
-    assert server._redact_tui_verbose_text("api_key=secret") == ""
-    assert server._tool_args_text({"api_key": "secret"}) == ""
-    assert server._tool_result_text("token=secret") == ""
-
-
-def test_tui_verbose_tool_details_are_capped_before_emit(monkeypatch):
-    monkeypatch.setattr(server, "_TUI_VERBOSE_TEXT_MAX_CHARS", 12)
-    monkeypatch.setattr(server, "_TUI_VERBOSE_TEXT_MAX_LINES", 2)
-
-    capped = server._cap_tui_verbose_text("one\ntwo\nthree\nfour")
-
-    assert capped.startswith("[showing verbose tail; omitted ")
-    assert capped.endswith("three\nfour")
-    assert "one" not in capped
-
-
-def test_tui_verbose_tool_events_omit_details_when_redaction_fails(monkeypatch):
-    redact_module = types.ModuleType("agent.redact")
-
-    def fail_redaction(*_args, **_kwargs):
-        raise RuntimeError("redaction unavailable")
-
-    setattr(redact_module, "redact_sensitive_text", fail_redaction)
-    monkeypatch.setitem(sys.modules, "agent.redact", redact_module)
-
-    events: list[tuple[str, str, dict]] = []
-    monkeypatch.setattr(
-        server, "_emit", lambda event_type, sid, payload: events.append((event_type, sid, payload))
-    )
-    monkeypatch.setitem(
-        server._sessions,
-        "redaction-test",
-        {"tool_progress_mode": "verbose", "tool_started_at": {}},
-    )
-
-    server._on_tool_start("redaction-test", "tool-1", "terminal", {"command": "pwd"})
-    server._on_tool_complete("redaction-test", "tool-1", "terminal", {"command": "pwd"}, "done")
-
-    assert events[0][0] == "tool.start"
-    assert events[1][0] == "tool.complete"
-    assert "args_text" not in events[0][2]
-    assert "result_text" not in events[1][2]
-
-
 def test_dispatch_rejects_non_object_request():
    resp = server.dispatch([])

@@ -1529,10 +1476,8 @@ def test_config_mouse_uses_documented_key_with_legacy_fallback(monkeypatch):
    set_toggle = server.handle_request(
        {"id": "2", "method": "config.set", "params": {"key": "mouse"}}
    )
-    # /mouse (no arg) toggles between 'all' and 'off'. Starting from
-    # tui_mouse: False (→ 'off'), the toggle flips to 'all'.
-    assert set_toggle["result"] == {"key": "mouse", "value": "all"}
-    assert writes == [("display.mouse_tracking", "all")]
+    assert set_toggle["result"] == {"key": "mouse", "value": "on"}
+    assert writes == [("display.mouse_tracking", True)]

    cfg["display"] = {"mouse_tracking": 0, "tui_mouse": True}
    get_canonical = server.handle_request(
@@ -1544,51 +1489,7 @@ def test_config_mouse_uses_documented_key_with_legacy_fallback(monkeypatch):
    get_null = server.handle_request(
        {"id": "4", "method": "config.get", "params": {"key": "mouse"}}
    )
-    # mouse_tracking present-but-None defers neither to tui_mouse nor to
-    # the legacy off bucket: it falls through to the 'all' default.
-    assert get_null["result"]["value"] == "all"
-
-
-def test_config_mouse_accepts_preset_strings_and_aliases(monkeypatch):
-    cfg = {"display": {"mouse_tracking": "all"}}
-    writes = []
-
-    monkeypatch.setattr(server, "_load_cfg", lambda: cfg)
-    monkeypatch.setattr(
-        server, "_write_config_key", lambda path, value: writes.append((path, value))
-    )
-
-    # Direct preset.
-    set_wheel = server.handle_request(
-        {
-            "id": "1",
-            "method": "config.set",
-            "params": {"key": "mouse", "value": "wheel"},
-        }
-    )
-    assert set_wheel["result"] == {"key": "mouse", "value": "wheel"}
-    assert writes[-1] == ("display.mouse_tracking", "wheel")
-
-    # Alias for buttons.
-    set_click = server.handle_request(
-        {
-            "id": "2",
-            "method": "config.set",
-            "params": {"key": "mouse", "value": "click"},
-        }
-    )
-    assert set_click["result"] == {"key": "mouse", "value": "buttons"}
-    assert writes[-1] == ("display.mouse_tracking", "buttons")
-
-    # Unknown value → 4002.
-    bad = server.handle_request(
-        {
-            "id": "3",
-            "method": "config.set",
-            "params": {"key": "mouse", "value": "rainbows"},
-        }
-    )
-    assert bad["error"]["code"] == 4002
+    assert get_null["result"]["value"] == "on"


 def test_enable_gateway_prompts_sets_gateway_env(monkeypatch):
@@ -1,69 +0,0 @@
-"""Shared fixtures for tests/tools/ web-provider tests.
-
-Per-file subprocess isolation means each test file gets a fresh interpreter,
-so module-level state (like the web-search-provider registry) is empty when
-a file starts.  The ``web_registry_populated`` fixture registers all bundled
-providers before each test and resets the registry afterwards — tests that
-depend on the registry being populated should use it explicitly or via
-``@pytest.mark.usefixtures("web_registry_populated")``.
-"""
-
-from unittest.mock import patch
-
-import pytest
-
-
-def register_all_web_providers():
-    """Register all bundled web-search providers into the global registry.
-
-    This is the single source of truth for the provider list used by
-    test classes that need the registry populated for dispatch checks.
-    """
-    from agent.web_search_registry import register_provider, _reset_for_tests
-    from plugins.web.brave_free.provider import BraveFreeWebSearchProvider
-    from plugins.web.ddgs.provider import DDGSWebSearchProvider
-    from plugins.web.exa.provider import ExaWebSearchProvider
-    from plugins.web.firecrawl.provider import FirecrawlWebSearchProvider
-    from plugins.web.parallel.provider import ParallelWebSearchProvider
-    from plugins.web.searxng.provider import SearXNGWebSearchProvider
-    from plugins.web.tavily.provider import TavilyWebSearchProvider
-    from plugins.web.xai.provider import XAIWebSearchProvider
-
-    _reset_for_tests()
-    for cls in (
-        BraveFreeWebSearchProvider,
-        DDGSWebSearchProvider,
-        ExaWebSearchProvider,
-        FirecrawlWebSearchProvider,
-        ParallelWebSearchProvider,
-        SearXNGWebSearchProvider,
-        TavilyWebSearchProvider,
-        XAIWebSearchProvider,
-    ):
-        register_provider(cls())
-
-
-@pytest.fixture
-def web_registry_populated():
-    """Populate the web-search-provider registry for one test, then reset."""
-    register_all_web_providers()
-    yield
-    from agent.web_search_registry import _reset_for_tests
-    _reset_for_tests()
-
-
-@pytest.fixture
-def disable_lazy_stt_install():
-    """Disarm the runtime lazy-install probe so static ``_HAS_FASTER_WHISPER``
-    patches accurately simulate 'faster-whisper not installed'.
-
-    Without this, ``_try_lazy_install_stt()`` calls
-    ``importlib.util.find_spec("faster_whisper")``, which returns truthy
-    whenever the package is installed in the dev / CI environment —
-    defeating the test's ``_HAS_FASTER_WHISPER=False`` patch.
-
-    Opt in at module scope with
-    ``pytestmark = pytest.mark.usefixtures("disable_lazy_stt_install")``.
-    """
-    with patch("tools.transcription_tools._try_lazy_install_stt", return_value=False):
-        yield
@@ -1,246 +0,0 @@
-"""Unit tests for tools/app_tools.py — the Nous tool gateway integration."""
-
-from __future__ import annotations
-
-import json
-from unittest.mock import MagicMock
-
-import httpx
-import pytest
-
-from tools.managed_tool_gateway import ManagedToolGatewayConfig
-
-
-_FAKE_GATEWAY = ManagedToolGatewayConfig(
-    vendor="tools",
-    gateway_origin="https://tools-gateway.example.com",
-    nous_user_token="test-token-abc123",
-    managed_mode=True,
-)
-
-
-@pytest.fixture(autouse=True)
-def _reset_http_client_cache():
-    """Clear the module-level cached httpx client between tests."""
-    import tools.app_tools as mod
-    mod._http_client = None
-    mod._http_client_origin = None
-    yield
-    mod._http_client = None
-    mod._http_client_origin = None
-
-
-@pytest.fixture()
-def gateway_post(monkeypatch):
-    """Patch the gateway and httpx.Client.post; return a dict capturing the request."""
-    monkeypatch.setattr(
-        "tools.app_tools.resolve_managed_tool_gateway", lambda v: _FAKE_GATEWAY
-    )
-    monkeypatch.setattr(
-        "tools.app_tools._get_current_model_name", lambda: None
-    )
-    captured = {}
-    resp = MagicMock(spec=httpx.Response)
-    resp.status_code = 200
-    resp.json.return_value = {"data": {}, "error": None}
-    resp.text = json.dumps({"data": {}, "error": None})
-
-    def fake_post(self, url, *, json=None, headers=None, **kw):
-        captured["url"] = url
-        captured["headers"] = headers
-        captured["json"] = json
-        return resp
-
-    monkeypatch.setattr(httpx.Client, "post", fake_post)
-    return captured
-
-
-# ---------------------------------------------------------------------------
-# check_fn gating
-# ---------------------------------------------------------------------------
-
-class TestAppToolsAvailability:
-    def test_returns_false_when_gateway_not_ready(self, monkeypatch):
-        monkeypatch.setattr("tools.app_tools.is_managed_tool_gateway_ready", lambda vendor: False)
-        monkeypatch.setattr("tools.app_tools._read_portal_app_tools_enabled", lambda: True)
-        from tools.app_tools import _app_tools_available
-        assert _app_tools_available() is False
-
-    def test_returns_true_when_gateway_ready_and_config_on(self, monkeypatch):
-        monkeypatch.setattr("tools.app_tools.is_managed_tool_gateway_ready", lambda vendor: True)
-        monkeypatch.setattr("tools.app_tools._read_portal_app_tools_enabled", lambda: True)
-        from tools.app_tools import _app_tools_available
-        assert _app_tools_available() is True
-
-    def test_returns_false_when_config_off(self, monkeypatch):
-        monkeypatch.setattr("tools.app_tools.is_managed_tool_gateway_ready", lambda vendor: True)
-        monkeypatch.setattr("tools.app_tools._read_portal_app_tools_enabled", lambda: False)
-        from tools.app_tools import _app_tools_available
-        assert _app_tools_available() is False
-
-
-# ---------------------------------------------------------------------------
-# URL + auth header
-# ---------------------------------------------------------------------------
-
-class TestSearchPostsCorrectUrlAndAuth:
-    def test_posts_to_v1_search_with_bearer_token(self, monkeypatch, gateway_post):
-        monkeypatch.setattr("tools.app_tools._get_current_model_name", lambda: "test-model")
-        from tools.app_tools import handle_app_search_tools
-        handle_app_search_tools({"queries": [{"use_case": "send email"}]})
-
-        assert gateway_post["url"] == "https://tools-gateway.example.com/v1/search"
-        assert gateway_post["headers"]["Authorization"] == "Bearer test-token-abc123"
-        assert gateway_post["headers"]["Content-Type"] == "application/json"
-        assert gateway_post["json"]["queries"] == [{"use_case": "send email"}]
-        assert gateway_post["json"]["model"] == "test-model"
-
-
-# ---------------------------------------------------------------------------
-# Model auto-injection
-# ---------------------------------------------------------------------------
-
-class TestModelAutoInjection:
-    def test_injects_model_from_config(self, monkeypatch, gateway_post):
-        monkeypatch.setattr("tools.app_tools._get_current_model_name", lambda: "claude-sonnet-4")
-        from tools.app_tools import handle_app_search_tools
-        handle_app_search_tools({"queries": [{"use_case": "test"}]})
-        assert gateway_post["json"]["model"] == "claude-sonnet-4"
-
-    def test_omits_model_when_unresolvable(self, gateway_post):
-        from tools.app_tools import handle_app_search_tools
-        handle_app_search_tools({"queries": [{"use_case": "test"}]})
-        assert "model" not in gateway_post["json"]
-
-
-# ---------------------------------------------------------------------------
-# Gateway-internal param stripping (allowlist approach)
-# ---------------------------------------------------------------------------
-
-class TestExecuteStripsInternalParams:
-    def test_strips_sync_response_thought_step_metric(self, gateway_post):
-        from tools.app_tools import handle_app_execute_tools
-        handle_app_execute_tools({
-            "tools": [{"tool_slug": "TEST", "arguments": {}}],
-            "sync_response_to_workbench": True,
-            "thought": "testing",
-            "current_step": "TESTING",
-            "current_step_metric": "1/1 tests",
-        })
-        body = gateway_post["json"]
-        for key in ("sync_response_to_workbench", "thought", "current_step", "current_step_metric"):
-            assert key not in body
-        assert body["tools"] == [{"tool_slug": "TEST", "arguments": {}}]
-
-
-# ---------------------------------------------------------------------------
-# HTTP error → tool result (not exception)
-# ---------------------------------------------------------------------------
-
-class TestHttpErrorReturnedAsToolResult:
-    @pytest.mark.parametrize("status_code", [402, 403, 422, 500])
-    def test_returns_error_json_not_exception(self, monkeypatch, status_code):
-        monkeypatch.setattr("tools.app_tools.resolve_managed_tool_gateway", lambda v: _FAKE_GATEWAY)
-        error_body = {"error": {"code": "TEST_ERROR", "message": "fail"}}
-        resp = MagicMock(spec=httpx.Response)
-        resp.status_code = status_code
-        resp.json.return_value = error_body
-        resp.text = json.dumps(error_body)
-        monkeypatch.setattr(httpx.Client, "post", lambda self, url, **kw: resp)
-
-        from tools.app_tools import handle_app_search_tools
-        result = json.loads(handle_app_search_tools({"queries": [{"use_case": "test"}]}))
-        assert result["error"]["code"] == "TEST_ERROR"
-
-
-# ---------------------------------------------------------------------------
-# Network failure → tool result
-# ---------------------------------------------------------------------------
-
-class TestNetworkFailureReturnedAsToolResult:
-    def test_connect_error_returns_gateway_unreachable(self, monkeypatch):
-        monkeypatch.setattr("tools.app_tools.resolve_managed_tool_gateway", lambda v: _FAKE_GATEWAY)
-
-        def raise_connect(self, url, **kw):
-            raise httpx.ConnectError("Connection refused")
-        monkeypatch.setattr(httpx.Client, "post", raise_connect)
-
-        from tools.app_tools import handle_app_search_tools
-        result = json.loads(handle_app_search_tools({"queries": [{"use_case": "test"}]}))
-        assert result["error"]["code"] == "GATEWAY_UNREACHABLE"
-
-    def test_timeout_returns_gateway_timeout(self, monkeypatch):
-        monkeypatch.setattr("tools.app_tools.resolve_managed_tool_gateway", lambda v: _FAKE_GATEWAY)
-
-        def raise_timeout(self, url, **kw):
-            raise httpx.ReadTimeout("timed out")
-        monkeypatch.setattr(httpx.Client, "post", raise_timeout)
-
-        from tools.app_tools import handle_app_search_tools
-        result = json.loads(handle_app_search_tools({"queries": [{"use_case": "test"}]}))
-        assert result["error"]["code"] == "GATEWAY_TIMEOUT"
-
-
-# ---------------------------------------------------------------------------
-# Endpoint routing + payload forwarding
-# ---------------------------------------------------------------------------
-
-class TestEndpointRouting:
-    def test_manage_connections_forwards_toolkits(self, gateway_post):
-        from tools.app_tools import handle_app_manage_connections
-        handle_app_manage_connections({"toolkits": ["gmail", "slack"], "reinitiate_all": True})
-        assert gateway_post["url"].endswith("/v1/connections")
-        assert gateway_post["json"]["toolkits"] == ["gmail", "slack"]
-        assert gateway_post["json"]["reinitiate_all"] is True
-
-    def test_tool_schemas_forwards_slugs(self, gateway_post):
-        from tools.app_tools import handle_app_tool_schemas
-        handle_app_tool_schemas({"tool_slugs": ["GMAIL_SEND_EMAIL"], "include": ["input_schema", "output_schema"]})
-        assert gateway_post["url"].endswith("/v1/schemas")
-        assert gateway_post["json"]["tool_slugs"] == ["GMAIL_SEND_EMAIL"]
-        assert gateway_post["json"]["include"] == ["input_schema", "output_schema"]
-
-
-# ---------------------------------------------------------------------------
-# Registry entries
-# ---------------------------------------------------------------------------
-
-class TestRegistryEntries:
-    def test_all_four_tools_registered_under_app_tools(self):
-        from tools.registry import registry
-        import tools.app_tools  # noqa: F401
-        expected = {"app_search_tools", "app_tool_schemas", "app_execute_tools", "app_manage_connections"}
-        for name in expected:
-            entry = registry._tools.get(name)
-            assert entry is not None, f"{name} not registered"
-            assert entry.toolset == "app_tools"
-
-
-# ---------------------------------------------------------------------------
-# session (object) vs session_id (string) asymmetry
-# ---------------------------------------------------------------------------
-
-class TestSessionHandling:
-    def test_search_uses_session_object(self, gateway_post):
-        from tools.app_tools import handle_app_search_tools
-        handle_app_search_tools({"queries": [{"use_case": "test"}], "session": {"generate_id": True}})
-        assert isinstance(gateway_post["json"]["session"], dict)
-        assert "session_id" not in gateway_post["json"]
-
-    def test_schemas_uses_session_id_string(self, gateway_post):
-        from tools.app_tools import handle_app_tool_schemas
-        handle_app_tool_schemas({"tool_slugs": ["TEST"], "session_id": "sess-123"})
-        assert gateway_post["json"]["session_id"] == "sess-123"
-        assert "session" not in gateway_post["json"]
-
-    def test_execute_uses_session_id_string(self, gateway_post):
-        from tools.app_tools import handle_app_execute_tools
-        handle_app_execute_tools({"tools": [{"tool_slug": "TEST", "arguments": {}}], "session_id": "sess-456"})
-        assert gateway_post["json"]["session_id"] == "sess-456"
-        assert "session" not in gateway_post["json"]
-
-    def test_connections_uses_session_id_string(self, gateway_post):
-        from tools.app_tools import handle_app_manage_connections
-        handle_app_manage_connections({"toolkits": ["gmail"], "session_id": "sess-789"})
-        assert gateway_post["json"]["session_id"] == "sess-789"
-        assert "session" not in gateway_post["json"]
@@ -22,28 +22,18 @@ from tools.approval import (


@pytest.fixture
-def isolated_session(monkeypatch, tmp_path):
-    """Give each test a fresh session_key, clean approval-state, and isolated
-    HERMES_HOME so the real user's command_allowlist doesn't leak in."""
-    import tools.approval as _am
-
+def isolated_session(monkeypatch):
+    """Give each test a fresh session_key and clean approval-state."""
    session_key = "test:session:approval_hooks"
    token = set_current_session_key(session_key)
    monkeypatch.setenv("HERMES_SESSION_KEY", session_key)
    # Make sure we don't skip guards via yolo / approvals.mode=off
    monkeypatch.delenv("HERMES_YOLO_MODE", raising=False)
-    # Isolate from the real user's permanent allowlist + session state
-    _saved_permanent = _am._permanent_approved.copy()
-    _saved_session = {k: v.copy() for k, v in _am._session_approved.items()}
-    _am._permanent_approved.clear()
-    _am._session_approved.clear()
    try:
        yield session_key
    finally:
-        _am._permanent_approved.update(_saved_permanent)
-        _am._session_approved.update(_saved_session)
        try:
-            _am._approval_session_key.reset(token)
+            approval_module._approval_session_key.reset(token)
        except Exception:
            pass
        clear_session(session_key)
@@ -41,7 +41,7 @@ def _find_chrome() -> str:


@pytest.fixture
-def chrome_cdp(request):
+def chrome_cdp(worker_id):
    """Start a headless Chrome with --remote-debugging-port, yield its WS URL.

    Uses a unique port per xdist worker to avoid cross-worker collisions.
@@ -51,9 +51,6 @@ def chrome_cdp(request):
    import socket

    # xdist worker_id is "master" in single-process mode or "gw0".."gwN" otherwise.
-    # Under subprocess-per-file isolation there's no xdist, so we fall back
-    # to "master" via the session-scoped fixture below.
-    worker_id = request.getfixturevalue("worker_id") if "worker_id" in request.fixturenames else "master"
    if worker_id == "master":
        port_offset = 0
    else:
@@ -76,27 +76,6 @@ class TestSchema:
        modes = set(COMPUTER_USE_SCHEMA["parameters"]["properties"]["mode"]["enum"])
        assert modes == {"som", "vision", "ax"}

-    def test_schema_exposes_max_elements_cap_for_capture(self):
-        from tools.computer_use.schema import COMPUTER_USE_SCHEMA
-        props = COMPUTER_USE_SCHEMA["parameters"]["properties"]
-        assert "max_elements" in props
-        assert props["max_elements"]["type"] == "integer"
-        assert props["max_elements"].get("minimum", 1) >= 1
-
-    def test_schema_max_elements_documents_default_and_upper_bound(self):
-        """Schema description must agree with the runtime. The original PR
-        text said "Default 100" without a corresponding `default` field, and
-        had no upper bound — both Copilot findings.
-        """
-        from tools.computer_use.schema import COMPUTER_USE_SCHEMA
-        from tools.computer_use.tool import (
-            _DEFAULT_MAX_ELEMENTS,
-            _MAX_ALLOWED_MAX_ELEMENTS,
-        )
-        prop = COMPUTER_USE_SCHEMA["parameters"]["properties"]["max_elements"]
-        assert prop.get("default") == _DEFAULT_MAX_ELEMENTS
-        assert prop.get("maximum") == _MAX_ALLOWED_MAX_ELEMENTS
-

 class TestRegistration:
    def test_tool_registers_with_registry(self):
@@ -176,104 +155,6 @@ class TestDispatch:
        click_kw = next(c[1] for c in noop_backend.calls if c[0] == "click")
        assert click_kw["button"] == "right"

-    def test_type_action_routes_to_type_text_backend(self, noop_backend):
-        """type action must call backend.type_text, not type_text_chars (issue #24170, bug 3)."""
-        from tools.computer_use.tool import handle_computer_use
-        out = handle_computer_use({"action": "type", "text": "hello"})
-        parsed = json.loads(out)
-        assert "error" not in parsed
-        call_names = [c[0] for c in noop_backend.calls]
-        assert "type" in call_names
-        type_kw = next(c[1] for c in noop_backend.calls if c[0] == "type")
-        assert type_kw["text"] == "hello"
-
-    def test_drag_action_routes_to_backend_by_coordinate(self, noop_backend):
-        """drag action must dispatch to backend.drag with coordinates (issue #24170, bug 4)."""
-        from tools.computer_use.tool import handle_computer_use
-        out = handle_computer_use({
-            "action": "drag",
-            "from_coordinate": [100, 200],
-            "to_coordinate": [400, 500],
-        })
-        parsed = json.loads(out)
-        assert "error" not in parsed
-        call_names = [c[0] for c in noop_backend.calls]
-        assert "drag" in call_names
-        drag_kw = next(c[1] for c in noop_backend.calls if c[0] == "drag")
-        assert drag_kw["from_xy"] == (100, 200)
-        assert drag_kw["to_xy"] == (400, 500)
-
-    def test_drag_action_routes_to_backend_by_element(self, noop_backend):
-        """drag action must dispatch to backend.drag with element indices (issue #24170, bug 4)."""
-        from tools.computer_use.tool import handle_computer_use
-        out = handle_computer_use({
-            "action": "drag",
-            "from_element": 1,
-            "to_element": 5,
-        })
-        parsed = json.loads(out)
-        assert "error" not in parsed
-        call_names = [c[0] for c in noop_backend.calls]
-        assert "drag" in call_names
-        drag_kw = next(c[1] for c in noop_backend.calls if c[0] == "drag")
-        assert drag_kw["from_element"] == 1
-        assert drag_kw["to_element"] == 5
-
-    def test_drag_action_requires_coordinates_or_elements(self, noop_backend):
-        """drag without from/to must return an error."""
-        from tools.computer_use.tool import handle_computer_use
-        out = handle_computer_use({"action": "drag"})
-        parsed = json.loads(out)
-        assert "error" in parsed
-
-    def test_set_value_routes_to_backend(self, noop_backend):
-        """set_value must reach the backend — regression for missing _NoopBackend stub."""
-        from tools.computer_use.tool import handle_computer_use
-        out = handle_computer_use({"action": "set_value", "value": "Option A", "element": 5})
-        parsed = json.loads(out)
-        assert parsed.get("ok") is True
-        assert parsed.get("action") == "set_value"
-        assert any(c[0] == "set_value" for c in noop_backend.calls)
-
-    def test_set_value_missing_value_returns_error(self, noop_backend):
-        from tools.computer_use.tool import handle_computer_use
-        out = handle_computer_use({"action": "set_value"})
-        parsed = json.loads(out)
-        assert "error" in parsed
-    def test_capture_after_skipped_when_action_failed(self, noop_backend):
-        """capture_after must not fire when res.ok=False (regression guard).
-
-        A follow-up screenshot after a failed action shows the screen in a
-        normal state, misleading the model into thinking the action succeeded.
-        """
-        from unittest.mock import patch
-        from tools.computer_use.backend import ActionResult
-        from tools.computer_use.tool import handle_computer_use
-
-        # Make click() return a failure.
-        with patch.object(noop_backend, "click",
-                          return_value=ActionResult(ok=False, action="click",
-                                                    message="element not found")):
-            out = handle_computer_use({"action": "click", "element": 99,
-                                       "capture_after": True})
-
-        parsed = json.loads(out)
-        # Should return the error, not a multimodal capture.
-        assert parsed.get("ok") is False
-        assert parsed.get("action") == "click"
-        # No follow-up capture should have been issued.
-        capture_calls = [c for c in noop_backend.calls if c[0] == "capture"]
-        assert len(capture_calls) == 0, "capture must not be called after a failed action"
-
-    def test_capture_after_fires_when_action_succeeds(self, noop_backend):
-        """capture_after must trigger for successful actions."""
-        from tools.computer_use.tool import handle_computer_use
-        out = handle_computer_use({"action": "click", "element": 1,
-                                   "capture_after": True})
-        # Noop backend returns ok=True, so capture should have been called.
-        capture_calls = [c for c in noop_backend.calls if c[0] == "capture"]
-        assert len(capture_calls) == 1
-

 # ---------------------------------------------------------------------------
 # Safety guards (type / key block lists)
@@ -406,193 +287,6 @@ class TestCaptureResponse:
        assert "AXButton" in text_part["text"]
        assert "AXTextField" in text_part["text"]

-    def _ax_backend_with(self, count: int):
-        """Construct a fake backend that yields ``count`` AX elements."""
-        from tools.computer_use.backend import CaptureResult, UIElement
-
-        elements = [
-            UIElement(index=i + 1, role="AXButton", label=f"el-{i}", bounds=(0, 0, 1, 1))
-            for i in range(count)
-        ]
-
-        class FakeBackend:
-            def start(self): pass
-            def stop(self): pass
-            def is_available(self): return True
-            def capture(self, mode="som", app=None):
-                return CaptureResult(
-                    mode=mode, width=800, height=600,
-                    png_b64="",
-                    elements=list(elements),
-                    app="Obsidian",
-                )
-            def click(self, **kw): ...
-            def drag(self, **kw): ...
-            def scroll(self, **kw): ...
-            def type_text(self, text): ...
-            def key(self, keys): ...
-            def list_apps(self): return []
-            def focus_app(self, app, raise_window=False): ...
-
-        return FakeBackend()
-
-    def test_capture_ax_caps_elements_at_default_for_dense_trees(self):
-        """Regression for #22865: an Electron-style 600-element AX tree must
-        not emit the entire array verbatim into the tool result.
-        """
-        from tools.computer_use import tool as cu_tool
-
-        fake_backend = self._ax_backend_with(600)
-        cu_tool.reset_backend_for_tests()
-        with patch.object(cu_tool, "_get_backend", return_value=fake_backend):
-            out = cu_tool.handle_computer_use({"action": "capture", "mode": "ax"})
-
-        parsed = json.loads(out)
-        assert parsed["mode"] == "ax"
-        assert parsed["total_elements"] == 600
-        assert len(parsed["elements"]) == cu_tool._DEFAULT_MAX_ELEMENTS
-        assert parsed["truncated_elements"] == 600 - cu_tool._DEFAULT_MAX_ELEMENTS
-        # Truncation must be visible in the human summary so the model knows
-        # the JSON view is partial and can re-issue with a tighter scope.
-        assert "truncated to" in parsed["summary"]
-
-    def test_capture_ax_honors_explicit_max_elements_override(self):
-        from tools.computer_use import tool as cu_tool
-
-        fake_backend = self._ax_backend_with(600)
-        cu_tool.reset_backend_for_tests()
-        with patch.object(cu_tool, "_get_backend", return_value=fake_backend):
-            out = cu_tool.handle_computer_use(
-                {"action": "capture", "mode": "ax", "max_elements": 250}
-            )
-
-        parsed = json.loads(out)
-        assert len(parsed["elements"]) == 250
-        assert parsed["truncated_elements"] == 350
-
-    def test_capture_ax_below_cap_is_unchanged(self):
-        """Backwards-compat: small captures keep the full elements array and
-        do not surface a `truncated_elements` field.
-        """
-        from tools.computer_use import tool as cu_tool
-
-        fake_backend = self._ax_backend_with(5)
-        cu_tool.reset_backend_for_tests()
-        with patch.object(cu_tool, "_get_backend", return_value=fake_backend):
-            out = cu_tool.handle_computer_use({"action": "capture", "mode": "ax"})
-
-        parsed = json.loads(out)
-        assert len(parsed["elements"]) == 5
-        assert parsed["total_elements"] == 5
-        assert "truncated_elements" not in parsed
-        assert "truncated to" not in parsed["summary"]
-
-    def test_capture_ax_invalid_max_elements_falls_back_to_default(self):
-        """Malformed `max_elements` (string, negative, zero) must not silently
-        disable the cap and re-introduce the original unbounded behavior.
-        """
-        from tools.computer_use import tool as cu_tool
-
-        fake_backend = self._ax_backend_with(600)
-        cu_tool.reset_backend_for_tests()
-        for bad in ("not-a-number", 0, -10):
-            with patch.object(cu_tool, "_get_backend", return_value=fake_backend):
-                out = cu_tool.handle_computer_use(
-                    {"action": "capture", "mode": "ax", "max_elements": bad}
-                )
-            parsed = json.loads(out)
-            assert len(parsed["elements"]) == cu_tool._DEFAULT_MAX_ELEMENTS, (
-                f"bad max_elements={bad!r} disabled the cap"
-            )
-
-    def test_capture_ax_clamps_oversized_max_elements_to_hard_cap(self):
-        """A caller passing a very large `max_elements` must not be able to
-        disable the safeguard. The cap is clamped to a hard upper bound so
-        the context-blow-up protection cannot be bypassed by argument.
-        """
-        from tools.computer_use import tool as cu_tool
-
-        fake_backend = self._ax_backend_with(5000)
-        cu_tool.reset_backend_for_tests()
-        with patch.object(cu_tool, "_get_backend", return_value=fake_backend):
-            out = cu_tool.handle_computer_use(
-                {"action": "capture", "mode": "ax", "max_elements": 10_000}
-            )
-        parsed = json.loads(out)
-        assert len(parsed["elements"]) == cu_tool._MAX_ALLOWED_MAX_ELEMENTS
-        assert parsed["total_elements"] == 5000
-        assert parsed["truncated_elements"] == 5000 - cu_tool._MAX_ALLOWED_MAX_ELEMENTS
-
-    def test_capture_ax_summary_indices_match_returned_elements(self):
-        """When `max_elements` is below the human-summary's own line cap, the
-        summary must not index elements that aren't in the returned array.
-        Otherwise the model sees `#15` in the summary and finds no matching
-        entry in `elements`.
-        """
-        from tools.computer_use import tool as cu_tool
-
-        fake_backend = self._ax_backend_with(600)
-        cu_tool.reset_backend_for_tests()
-        with patch.object(cu_tool, "_get_backend", return_value=fake_backend):
-            out = cu_tool.handle_computer_use(
-                {"action": "capture", "mode": "ax", "max_elements": 5}
-            )
-        parsed = json.loads(out)
-        returned_indices = {e["index"] for e in parsed["elements"]}
-        summary_lines = parsed["summary"].splitlines()
-        indexed_lines = [ln for ln in summary_lines if ln.lstrip().startswith("#")]
-        for ln in indexed_lines:
-            idx_token = ln.lstrip().split()[0].lstrip("#")
-            idx = int(idx_token)
-            assert idx in returned_indices, (
-                f"summary references #{idx} but it is absent from elements payload "
-                f"(returned: {sorted(returned_indices)})"
-            )
-
-    def test_capture_multimodal_summary_omits_truncation_note(self):
-        """The som/vision multimodal envelope returns a screenshot, not an
-        `elements` array — so a "response truncated to N of M elements"
-        claim in the summary would be inaccurate.
-        """
-        from tools.computer_use.backend import CaptureResult, UIElement
-        from tools.computer_use import tool as cu_tool
-
-        fake_png = "iVBORw0KGgo="
-        elements = [
-            UIElement(index=i + 1, role="AXButton", label=f"el-{i}", bounds=(0, 0, 1, 1))
-            for i in range(600)
-        ]
-
-        class FakeBackend:
-            def start(self): pass
-            def stop(self): pass
-            def is_available(self): return True
-            def capture(self, mode="som", app=None):
-                return CaptureResult(
-                    mode=mode, width=800, height=600,
-                    png_b64=fake_png, elements=list(elements),
-                    app="Obsidian",
-                )
-            def click(self, **kw): ...
-            def drag(self, **kw): ...
-            def scroll(self, **kw): ...
-            def type_text(self, text): ...
-            def key(self, keys): ...
-            def list_apps(self): return []
-            def focus_app(self, app, raise_window=False): ...
-
-        cu_tool.reset_backend_for_tests()
-        with patch.object(cu_tool, "_get_backend", return_value=FakeBackend()):
-            out = cu_tool.handle_computer_use({"action": "capture", "mode": "som"})
-
-        assert isinstance(out, dict) and out["_multimodal"] is True
-        text_part = next(p for p in out["content"] if p.get("type") == "text")
-        assert "truncated to" not in text_part["text"], (
-            "multimodal response carries an image, not an elements array; "
-            "the truncation note describes a payload field that isn't present"
-        )
-        assert "truncated to" not in out["text_summary"]
-

 # ---------------------------------------------------------------------------
 # Anthropic adapter: multimodal tool-result conversion
@@ -985,332 +679,3 @@ class TestUniversality:
        source = inspect.getsource(entry.check_fn)
        assert "anthropic" not in source.lower()
        assert "openai" not in source.lower()
-
-
-# ---------------------------------------------------------------------------
-# Regression tests for bugs 2 & 5 from issue #24170 (cua-driver v0.1.6)
-# ---------------------------------------------------------------------------
-
-class TestElementLabelParsing:
-    """Bug 5: element labels stripped in capture results (cua-driver v0.1.6 format).
-
-    cua-driver ≥0.1.6 emits ``[N] AXRole (order) id=Label`` instead of
-    ``  - [N] AXRole "label"``.  _parse_elements_from_tree must handle both.
-    """
-
-    def test_classic_quoted_label_format(self):
-        from tools.computer_use.cua_backend import _parse_elements_from_tree
-        tree = (
-            '  - [14] AXButton "One"\n'
-            '  - [15] AXButton "Two"\n'
-            '  - [16] AXTextField ""\n'
-        )
-        els = _parse_elements_from_tree(tree)
-        assert len(els) == 3
-        assert els[0].index == 14
-        assert els[0].role == "AXButton"
-        assert els[0].label == "One"
-        assert els[1].label == "Two"
-        assert els[2].label == ""  # empty quoted label
-
-    def test_new_id_eq_format(self):
-        """cua-driver v0.1.6 format: [N] AXRole (order) id=Label"""
-        from tools.computer_use.cua_backend import _parse_elements_from_tree
-        tree = (
-            "[14] AXButton (1) id=One\n"
-            "[15] AXButton (2) id=Two\n"
-            "[16] AXTextField (3) id=\n"
-        )
-        els = _parse_elements_from_tree(tree)
-        assert len(els) == 3
-        assert els[0].index == 14
-        assert els[0].role == "AXButton"
-        assert els[0].label == "One"
-        assert els[1].label == "Two"
-        assert els[2].label == ""  # empty id= value
-
-    def test_mixed_formats_in_single_tree(self):
-        """Gracefully handles trees that mix old and new line formats."""
-        from tools.computer_use.cua_backend import _parse_elements_from_tree
-        tree = (
-            '  - [1] AXWindow "Main Window"\n'
-            "[14] AXButton (1) id=One\n"
-            '  - [15] AXTextField "Search"\n'
-        )
-        els = _parse_elements_from_tree(tree)
-        assert len(els) == 3
-        labels = {e.index: e.label for e in els}
-        assert labels[1] == "Main Window"
-        assert labels[14] == "One"
-        assert labels[15] == "Search"
-
-
-class TestCaptureAfterAppContext:
-    """Bug 2: capture_after=True loses app context after actions.
-
-    _maybe_follow_capture must re-target the same app that was set by
-    the preceding capture/focus_app call, rather than the frontmost window.
-    """
-
-    def test_capture_after_uses_last_app(self):
-        """capture_after=True should pass _last_app to the follow-up capture."""
-        from tools.computer_use.backend import ActionResult, CaptureResult
-        from tools.computer_use import tool as cu_tool
-
-        captured_app_args = []
-
-        class TrackingBackend:
-            _last_app = "Calculator"  # simulates a previous focus_app call
-
-            def start(self):
-                pass
-
-            def stop(self):
-                pass
-
-            def is_available(self):
-                return True
-
-            def capture(self, mode="som", app=None):
-                captured_app_args.append(app)
-                return CaptureResult(
-                    mode=mode, width=100, height=100,
-                    png_b64=None, elements=[],
-                    app=app or "Calculator", window_title="",
-                )
-
-            def click(self, **kw):
-                return ActionResult(ok=True, action="click")
-
-            def drag(self, **kw):
-                return ActionResult(ok=True, action="drag")
-
-            def scroll(self, **kw):
-                return ActionResult(ok=True, action="scroll")
-
-            def type_text(self, text):
-                return ActionResult(ok=True, action="type")
-
-            def key(self, keys):
-                return ActionResult(ok=True, action="key")
-
-            def list_apps(self):
-                return []
-
-            def focus_app(self, app, raise_window=False):
-                return ActionResult(ok=True, action="focus_app")
-
-            def set_value(self, value, element=None):
-                return ActionResult(ok=True, action="set_value")
-
-            def wait(self, seconds=1.0):
-                return ActionResult(ok=True, action="wait")
-
-        backend = TrackingBackend()
-        cu_tool.reset_backend_for_tests()
-        cu_tool._backend = backend
-
-        cu_tool.handle_computer_use({"action": "click", "element": 14, "capture_after": True})
-
-        # The follow-up capture must have been called with app="Calculator"
-        assert len(captured_app_args) == 1
-        assert captured_app_args[0] == "Calculator", (
-            f"Expected follow-up capture with app='Calculator', got {captured_app_args[0]!r}"
-        )
-
-    def test_capture_after_without_prior_app_uses_none(self):
-        """When no app context is set, follow-up capture uses app=None (frontmost)."""
-        from tools.computer_use.backend import ActionResult, CaptureResult
-        from tools.computer_use import tool as cu_tool
-
-        captured_app_args = []
-
-        class NoContextBackend:
-            _last_app = None  # no prior context
-
-            def start(self):
-                pass
-
-            def stop(self):
-                pass
-
-            def is_available(self):
-                return True
-
-            def capture(self, mode="som", app=None):
-                captured_app_args.append(app)
-                return CaptureResult(
-                    mode=mode, width=100, height=100,
-                    png_b64=None, elements=[],
-                    app="Finder", window_title="",
-                )
-
-            def click(self, **kw):
-                return ActionResult(ok=True, action="click")
-
-            def drag(self, **kw):
-                return ActionResult(ok=True, action="drag")
-
-            def scroll(self, **kw):
-                return ActionResult(ok=True, action="scroll")
-
-            def type_text(self, text):
-                return ActionResult(ok=True, action="type")
-
-            def key(self, keys):
-                return ActionResult(ok=True, action="key")
-
-            def list_apps(self):
-                return []
-
-            def focus_app(self, app, raise_window=False):
-                return ActionResult(ok=True, action="focus_app")
-
-            def set_value(self, value, element=None):
-                return ActionResult(ok=True, action="set_value")
-
-            def wait(self, seconds=1.0):
-                return ActionResult(ok=True, action="wait")
-
-        backend = NoContextBackend()
-        cu_tool.reset_backend_for_tests()
-        cu_tool._backend = backend
-
-        cu_tool.handle_computer_use({"action": "click", "element": 5, "capture_after": True})
-
-        # No app context — should pass None so cua-driver picks the frontmost window
-        assert len(captured_app_args) == 1
-        assert captured_app_args[0] is None
-
-# ---------------------------------------------------------------------------
-# Regression tests for bug 1 from issue #24170:
-#   capture(app=...) and focus_app(app=...) must surface when the filter
-#   matches nothing instead of silently picking the frontmost window.
-# ---------------------------------------------------------------------------
-
-def _make_cua_backend_with_windows(windows: List[Dict[str, Any]]):
-    """Construct a CuaDriverBackend with a mocked MCP session that returns
-    the supplied list_windows payload."""
-    from tools.computer_use.cua_backend import CuaDriverBackend
-
-    backend = CuaDriverBackend()
-    backend._session = MagicMock()
-    backend._session.call_tool.return_value = {
-        "data": "",
-        "images": [],
-        "structuredContent": {"windows": windows},
-        "isError": False,
-    }
-    return backend
-
-
-class TestCaptureAppFilterNoMatch:
-    """capture(app=X) must not silently fall back to the frontmost window
-    when X matches nothing — on a non-English macOS, list_windows returns
-    localized app names (e.g. "計算機"), so an English `app="Calculator"`
-    legitimately matches nothing and the caller needs to retry with the
-    localized name. The old code silently captured the frontmost window
-    (e.g. a menu-bar utility), giving the agent wrong UI elements.
-    """
-
-    def test_app_filter_no_match_returns_empty_capture_with_diagnostic(self):
-        # Simulates a localized macOS where Calculator's app_name is "計算機".
-        windows = [
-            {"app_name": "Fuwari", "pid": 100, "window_id": 1,
-             "is_on_screen": True, "title": "menu bar", "z_index": 0},
-            {"app_name": "計算機", "pid": 200, "window_id": 2,
-             "is_on_screen": True, "title": "Calculator", "z_index": 1},
-        ]
-        backend = _make_cua_backend_with_windows(windows)
-
-        cap = backend.capture(mode="som", app="Calculator")
-
-        # No window matched; capture must NOT pick the frontmost (Fuwari).
-        assert cap.app == "", (
-            f"app= filter no-match should not silently target a window; got {cap.app!r}"
-        )
-        assert cap.elements == []
-        assert "Calculator" in cap.window_title
-        assert "list_apps" in cap.window_title
-        # _active_pid must remain unset so a subsequent click doesn't hit Fuwari.
-        assert backend._active_pid is None
-        assert backend._active_window_id is None
-
-    def test_app_filter_match_still_works(self):
-        windows = [
-            {"app_name": "Fuwari", "pid": 100, "window_id": 1,
-             "is_on_screen": True, "title": "menu bar", "z_index": 0},
-            {"app_name": "計算機", "pid": 200, "window_id": 2,
-             "is_on_screen": True, "title": "Calculator", "z_index": 1},
-        ]
-        backend = _make_cua_backend_with_windows(windows)
-        # get_window_state for the matched window
-        backend._session.call_tool.side_effect = [
-            {"data": "", "images": [], "isError": False,
-             "structuredContent": {"windows": windows}},
-            {"data": '✅ 計算機 — 0 elements\n', "images": [], "isError": False,
-             "structuredContent": None},
-        ]
-
-        cap = backend.capture(mode="ax", app="計算機")
-
-        assert backend._active_pid == 200
-        assert backend._active_window_id == 2
-
-    def test_no_app_filter_still_picks_frontmost(self):
-        """When no app= is given, capture continues to pick the frontmost
-        window — the no-match early-return must not fire on the empty case."""
-        windows = [
-            {"app_name": "Fuwari", "pid": 100, "window_id": 1,
-             "is_on_screen": True, "title": "menu bar", "z_index": 0},
-        ]
-        backend = _make_cua_backend_with_windows(windows)
-        backend._session.call_tool.side_effect = [
-            {"data": "", "images": [], "isError": False,
-             "structuredContent": {"windows": windows}},
-            {"data": '✅ Fuwari — 0 elements\n', "images": [], "isError": False,
-             "structuredContent": None},
-        ]
-
-        cap = backend.capture(mode="ax", app=None)
-
-        assert backend._active_pid == 100
-
-
-class TestFocusAppFilterNoMatch:
-    """focus_app(app=X) must return ok=False when X matches nothing —
-    not silently target the frontmost window and report ok=True with a
-    misleading 'Targeted Fuwari' message.
-    """
-
-    def test_focus_app_no_match_returns_not_ok(self):
-        windows = [
-            {"app_name": "Fuwari", "pid": 100, "window_id": 1,
-             "is_on_screen": True, "title": "menu bar", "z_index": 0},
-            {"app_name": "計算機", "pid": 200, "window_id": 2,
-             "is_on_screen": True, "title": "Calculator", "z_index": 1},
-        ]
-        backend = _make_cua_backend_with_windows(windows)
-
-        res = backend.focus_app("Calculator")
-
-        assert res.ok is False
-        assert res.action == "focus_app"
-        assert "Calculator" in res.message
-        # _active_pid must remain unset so a subsequent click doesn't hit Fuwari.
-        assert backend._active_pid is None
-
-    def test_focus_app_match_still_works(self):
-        windows = [
-            {"app_name": "Fuwari", "pid": 100, "window_id": 1,
-             "is_on_screen": True, "title": "menu bar", "z_index": 0},
-            {"app_name": "計算機", "pid": 200, "window_id": 2,
-             "is_on_screen": True, "title": "Calculator", "z_index": 1},
-        ]
-        backend = _make_cua_backend_with_windows(windows)
-
-        res = backend.focus_app("計算機")
-
-        assert res.ok is True
-        assert backend._active_pid == 200
-        assert backend._active_window_id == 2
@@ -1,431 +0,0 @@
-"""End-to-end regression for #24015 — capture routing via auxiliary.vision.
-
-When ``computer_use(action='capture', mode='som'|'vision')`` returns a
-screenshot, ``_capture_response`` previously always returned a
-``_multimodal`` envelope. For non-vision main models, or when the user
-explicitly configured ``auxiliary.vision`` in ``config.yaml``, that
-envelope tripped HTTP 404 / 400 at the provider boundary even though a
-perfectly good vision backend was sitting in config waiting to be used.
-
-This file exercises the integrated ``_capture_response`` flow with
-deterministic stubs for:
-
-* ``should_route_capture_to_aux_vision`` (the policy decision)
-* ``_run_async`` (sync->async bridge)
-* ``vision_analyze_tool`` (the aux LLM call)
-* ``hermes_constants.get_hermes_dir`` (cache path)
-
-…so the full code path is covered without a live cua-driver, a real
-auxiliary client, or network access.
-"""
-
-from __future__ import annotations
-
-import base64
-import json
-import os
-from pathlib import Path
-from typing import Any
-from unittest.mock import MagicMock, patch
-
-import pytest
-
-
-# ---------------------------------------------------------------------------
-# Fixtures / helpers
-# ---------------------------------------------------------------------------
-
-# 1×1 PNG (transparent) — minimal bytes that decode cleanly.
-_PNG_B64 = (
-    "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42m"
-    "NkYAAAAAYAAjCB0C8AAAAASUVORK5CYII="
-)
-
-# 1×1 JPEG — used to verify mime detection works for either stream type.
-_JPEG_B64 = (
-    "/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEB"
-    "AQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/"
-)
-
-
-@pytest.fixture
-def tmp_cache_dir(tmp_path):
-    """Override get_hermes_dir so cache writes land under tmp_path."""
-    cache_dir = tmp_path / "cache_vision"
-    cache_dir.mkdir()
-
-    def _fake_get(*_args, **_kw):
-        return cache_dir
-
-    with patch("hermes_constants.get_hermes_dir", _fake_get):
-        yield cache_dir
-
-
-def _make_capture(
-    *,
-    png_b64: str = _PNG_B64,
-    mode: str = "som",
-    elements=None,
-    app: str = "Safari",
-    window_title: str = "GitHub – Issue #24015",
-    width: int = 1280,
-    height: int = 800,
-):
-    from tools.computer_use.backend import CaptureResult, UIElement
-
-    elements = list(elements or [
-        UIElement(index=0, role="AXButton", label="Sign in",
-                  bounds=(10, 20, 80, 30)),
-        UIElement(index=1, role="AXTextField", label="username",
-                  bounds=(10, 60, 200, 24)),
-    ])
-    raw = base64.b64decode(png_b64, validate=False)
-    return CaptureResult(
-        mode=mode,
-        width=width,
-        height=height,
-        png_b64=png_b64,
-        elements=elements,
-        app=app,
-        window_title=window_title,
-        png_bytes_len=len(raw),
-    )
-
-
-def _stub_aux_analysis(text: str):
-    """Return a fake vision_analyze_tool coroutine result (JSON envelope)."""
-    return json.dumps({"success": True, "analysis": text})
-
-
-# ---------------------------------------------------------------------------
-# _capture_response: routing OFF (current/native behaviour)
-# ---------------------------------------------------------------------------
-
-class TestCaptureResponseDefaultPath:
-    """When routing helper says 'native', the existing multimodal envelope wins."""
-
-    def test_som_capture_returns_multimodal_envelope_when_native(self):
-        from tools.computer_use import tool as cu_tool
-
-        cap = _make_capture(png_b64=_PNG_B64, mode="som")
-        with patch.object(cu_tool, "_should_route_through_aux_vision",
-                          return_value=False):
-            resp = cu_tool._capture_response(cap)
-
-        assert isinstance(resp, dict)
-        assert resp.get("_multimodal") is True
-        # Image part must use image/png MIME for a PNG payload.
-        image_part = next(
-            p for p in resp["content"] if p.get("type") == "image_url"
-        )
-        url = image_part["image_url"]["url"]
-        assert url.startswith("data:image/png;base64,")
-        assert "vision_analysis" not in resp
-
-    def test_jpeg_capture_returns_image_jpeg_mime_when_native(self):
-        from tools.computer_use import tool as cu_tool
-
-        cap = _make_capture(png_b64=_JPEG_B64, mode="som")
-        with patch.object(cu_tool, "_should_route_through_aux_vision",
-                          return_value=False):
-            resp = cu_tool._capture_response(cap)
-
-        url = next(p for p in resp["content"] if p.get("type") == "image_url")
-        assert url["image_url"]["url"].startswith("data:image/jpeg;base64,")
-
-    def test_ax_only_capture_returns_text_regardless_of_routing(self):
-        from tools.computer_use import tool as cu_tool
-
-        cap = _make_capture(mode="ax", png_b64="")
-        # ax mode never has a PNG so neither path matters; assert pure text.
-        with patch.object(cu_tool, "_should_route_through_aux_vision",
-                          return_value=True) as routing:
-            resp = cu_tool._capture_response(cap)
-
-        # ax never even consults the routing helper — short-circuited above
-        # the image branch.
-        routing.assert_not_called()
-        assert isinstance(resp, str)
-        body = json.loads(resp)
-        assert body["mode"] == "ax"
-
-
-# ---------------------------------------------------------------------------
-# _capture_response: routing ON (the #24015 fix)
-# ---------------------------------------------------------------------------
-
-class TestCaptureResponseRoutedToAuxVision:
-    """When routing helper says 'aux', the PNG is pre-analysed and a text
-    response is returned with no image_url parts at all."""
-
-    def test_som_capture_returns_text_with_vision_analysis(
-        self, tmp_cache_dir,
-    ):
-        from tools.computer_use import tool as cu_tool
-
-        cap = _make_capture(mode="som")
-
-        captured_calls = {}
-
-        def _fake_run_async(coro):
-            captured_calls["called"] = True
-            return _stub_aux_analysis(
-                "A Safari window showing a GitHub issue page with a 'Sign "
-                "in' button and a 'username' text field."
-            )
-
-        # vision_analyze_tool is async; force a sync MagicMock so we can
-        # assert positional args without dealing with awaitables.
-        fake_vat = MagicMock(return_value="<coro>")
-
-        with patch.object(cu_tool, "_should_route_through_aux_vision",
-                          return_value=True), \
-             patch("model_tools._run_async", side_effect=_fake_run_async), \
-             patch("tools.vision_tools.vision_analyze_tool",
-                   new_callable=lambda: fake_vat):
-            resp = cu_tool._capture_response(cap)
-
-        # Must be a JSON string, NOT a multimodal envelope. This is exactly
-        # the contract that prevents #24015's HTTP 404 from firing on the
-        # next agent turn.
-        assert isinstance(resp, str)
-        body = json.loads(resp)
-        assert body["mode"] == "som"
-        assert body["app"] == "Safari"
-        assert "Sign in" in body["vision_analysis"]
-        assert body["vision_analysis_routed_via"] == "auxiliary.vision"
-        # The original AX-only metadata (window title, element index, app)
-        # is preserved alongside the new vision analysis so the agent loses
-        # no context vs the multimodal path.
-        assert body["window_title"] == "GitHub – Issue #24015"
-        assert len(body["elements"]) == 2
-
-        assert captured_calls.get("called") is True
-        # vision_analyze_tool was invoked with a path under the patched cache
-        # and a non-empty prompt.
-        args, _kwargs = fake_vat.call_args
-        path_arg, prompt_arg = args[0], args[1]
-        assert str(tmp_cache_dir) in path_arg
-        assert "macOS application screenshot" in prompt_arg
-        # AX summary is included so the aux model can ground its description
-        # against the same set-of-mark index the agent will see.
-        assert "Sign in" in prompt_arg
-
-    def test_temp_screenshot_file_is_cleaned_up_after_routing(
-        self, tmp_cache_dir,
-    ):
-        from tools.computer_use import tool as cu_tool
-
-        cap = _make_capture(mode="som")
-        # We capture the path the aux call sees so we can assert it's gone
-        # after _capture_response returns.
-        observed_path = {}
-
-        def _fake_run_async(_coro):
-            return _stub_aux_analysis("description goes here")
-
-        def _fake_vat(image_path, _prompt):
-            observed_path["path"] = image_path
-            # File must exist while aux is being arranged.
-            assert os.path.exists(image_path)
-            return "<coro>"
-
-        fake_vat = MagicMock(side_effect=_fake_vat)
-
-        with patch.object(cu_tool, "_should_route_through_aux_vision",
-                          return_value=True), \
-             patch("model_tools._run_async", side_effect=_fake_run_async), \
-             patch("tools.vision_tools.vision_analyze_tool",
-                   new_callable=lambda: fake_vat):
-            cu_tool._capture_response(cap)
-
-        # File must be unlinked after _capture_response returns.
-        assert observed_path["path"]
-        assert not os.path.exists(observed_path["path"])
-
-    def test_temp_file_cleaned_up_even_when_aux_call_raises(
-        self, tmp_cache_dir,
-    ):
-        from tools.computer_use import tool as cu_tool
-
-        cap = _make_capture(mode="som")
-        observed_path = {}
-
-        def _fake_vat(image_path, _prompt):
-            observed_path["path"] = image_path
-            return "<coro>"
-
-        def _fake_run_async(_coro):
-            raise RuntimeError("aux LLM down")
-
-        fake_vat = MagicMock(side_effect=_fake_vat)
-
-        with patch.object(cu_tool, "_should_route_through_aux_vision",
-                          return_value=True), \
-             patch("model_tools._run_async", side_effect=_fake_run_async), \
-             patch("tools.vision_tools.vision_analyze_tool",
-                   new_callable=lambda: fake_vat):
-            resp = cu_tool._capture_response(cap)
-
-        # Aux failure → fall back to multimodal envelope (so the user still
-        # gets *something* useful even if vision is broken).
-        assert isinstance(resp, dict)
-        assert resp.get("_multimodal") is True
-        # Temp file must still be cleaned up.
-        assert observed_path["path"]
-        assert not os.path.exists(observed_path["path"])
-
-    def test_empty_aux_analysis_falls_back_to_multimodal(self, tmp_cache_dir):
-        from tools.computer_use import tool as cu_tool
-
-        cap = _make_capture(mode="som")
-
-        def _fake_run_async(_coro):
-            return _stub_aux_analysis("")
-
-        fake_vat = MagicMock(return_value="<coro>")
-
-        with patch.object(cu_tool, "_should_route_through_aux_vision",
-                          return_value=True), \
-             patch("model_tools._run_async", side_effect=_fake_run_async), \
-             patch("tools.vision_tools.vision_analyze_tool",
-                   new_callable=lambda: fake_vat):
-            resp = cu_tool._capture_response(cap)
-
-        # Empty analysis is treated as failure — we'd rather show pixels
-        # than embed an empty 'vision_analysis' string into the result.
-        assert isinstance(resp, dict)
-        assert resp.get("_multimodal") is True
-
-    def test_invalid_aux_response_falls_back_to_multimodal(self, tmp_cache_dir):
-        from tools.computer_use import tool as cu_tool
-
-        cap = _make_capture(mode="som")
-
-        def _fake_run_async(_coro):
-            return 1234  # not a string at all
-
-        fake_vat = MagicMock(return_value="<coro>")
-
-        with patch.object(cu_tool, "_should_route_through_aux_vision",
-                          return_value=True), \
-             patch("model_tools._run_async", side_effect=_fake_run_async), \
-             patch("tools.vision_tools.vision_analyze_tool",
-                   new_callable=lambda: fake_vat):
-            resp = cu_tool._capture_response(cap)
-
-        assert isinstance(resp, dict)
-        assert resp.get("_multimodal") is True
-
-
-# ---------------------------------------------------------------------------
-# _should_route_through_aux_vision: end-to-end with real config plumbing
-# ---------------------------------------------------------------------------
-
-class TestRoutingDecisionWiring:
-    """Verify _should_route_through_aux_vision wires the right config + helper."""
-
-    def test_explicit_aux_vision_in_config_routes_to_aux(self):
-        from tools.computer_use import tool as cu_tool
-
-        cfg = {
-            "model": {"default": "tencent/hy3-preview", "provider": "openrouter"},
-            "auxiliary": {
-                "vision": {
-                    "provider": "openrouter",
-                    "model": "google/gemini-2.5-flash",
-                }
-            },
-        }
-        with patch("agent.auxiliary_client._read_main_provider",
-                   return_value="openrouter"), \
-             patch("agent.auxiliary_client._read_main_model",
-                   return_value="tencent/hy3-preview"), \
-             patch("hermes_cli.config.load_config", return_value=cfg):
-            assert cu_tool._should_route_through_aux_vision() is True
-
-    def test_no_explicit_aux_and_vision_capable_main_keeps_multimodal(self):
-        from tools.computer_use import tool as cu_tool
-
-        cfg = {
-            "model": {"default": "claude-opus-4-5", "provider": "anthropic"},
-        }
-        with patch("agent.auxiliary_client._read_main_provider",
-                   return_value="anthropic"), \
-             patch("agent.auxiliary_client._read_main_model",
-                   return_value="claude-opus-4-5"), \
-             patch("hermes_cli.config.load_config", return_value=cfg), \
-             patch("tools.computer_use.vision_routing._lookup_supports_vision",
-                   return_value=True), \
-             patch("tools.computer_use.vision_routing."
-                   "_provider_accepts_multimodal_tool_result",
-                   return_value=True):
-            assert cu_tool._should_route_through_aux_vision() is False
-
-    def test_config_load_failure_disables_routing_safely(self):
-        from tools.computer_use import tool as cu_tool
-
-        with patch("hermes_cli.config.load_config",
-                   side_effect=RuntimeError("config.yaml unreadable")):
-            # No exception should bubble up — fail open by returning False
-            # so the legacy multimodal envelope continues to work.
-            assert cu_tool._should_route_through_aux_vision() is False
-
-    def test_helper_decision_exception_is_swallowed(self):
-        from tools.computer_use import tool as cu_tool
-        from tools.computer_use import vision_routing as vr_mod
-
-        with patch("agent.auxiliary_client._read_main_provider",
-                   return_value="openrouter"), \
-             patch("agent.auxiliary_client._read_main_model",
-                   return_value="x"), \
-             patch("hermes_cli.config.load_config", return_value={}), \
-             patch.object(vr_mod, "should_route_capture_to_aux_vision",
-                          side_effect=ValueError("policy bug")):
-            assert cu_tool._should_route_through_aux_vision() is False
-
-
-# ---------------------------------------------------------------------------
-# Bug reproduction marker — proves the fix is needed.
-# ---------------------------------------------------------------------------
-
-class TestBugReproductionAnchor:
-    """Without the fix, this test would assert the wrong thing.
-
-    On upstream/main HEAD prior to this branch, _capture_response returns a
-    multimodal envelope unconditionally — so when a non-vision main model
-    is configured, the captured PNG is delivered to the main provider as
-    image_url content and the request is rejected with HTTP 404. We don't
-    have a live provider here, but we can pin the contract: with routing
-    enabled the response MUST be a JSON string with no image_url parts.
-    """
-
-    def test_non_vision_main_model_never_returns_image_url_when_routed(
-        self, tmp_cache_dir,
-    ):
-        from tools.computer_use import tool as cu_tool
-
-        cap = _make_capture(mode="som")
-
-        def _fake_run_async(_coro):
-            return _stub_aux_analysis(
-                "Screenshot showing a GitHub.com window with a sign-in "
-                "form."
-            )
-
-        fake_vat = MagicMock(return_value="<coro>")
-
-        with patch.object(cu_tool, "_should_route_through_aux_vision",
-                          return_value=True), \
-             patch("model_tools._run_async", side_effect=_fake_run_async), \
-             patch("tools.vision_tools.vision_analyze_tool",
-                   new_callable=lambda: fake_vat):
-            resp = cu_tool._capture_response(cap)
-
-        # Must be a string (text-only result).
-        assert isinstance(resp, str)
-        # Must NOT contain a base64 image URL anywhere — that's what tripped
-        # 'No endpoints found that support image input' on the reporter's
-        # main provider in #24015.
-        assert "data:image" not in resp
-        assert "image_url" not in resp
@@ -1,260 +0,0 @@
-"""Unit tests for tools.computer_use.vision_routing.
-
-Cover the small ``should_route_capture_to_aux_vision`` policy helper that
-decides whether a captured screenshot from ``computer_use(action='capture')``
-should be returned as a multimodal envelope (main model handles vision
-natively) or pre-analysed via the ``auxiliary.vision`` pipeline so the
-main model only sees text.
-
-The companion end-to-end regression for #24015 lives in
-``tests/tools/test_computer_use_capture_routing.py``; this file pins the
-unit contract of the helper in isolation so behaviour does not regress
-silently if the surrounding ``computer_use`` plumbing is refactored.
-"""
-
-from __future__ import annotations
-
-from unittest.mock import patch
-
-import pytest
-
-
-# ---------------------------------------------------------------------------
-# _explicit_aux_vision_override
-# ---------------------------------------------------------------------------
-
-class TestExplicitAuxVisionOverride:
-    """Mirror agent.image_routing — config detection must agree across paths."""
-
-    def test_returns_false_for_none_cfg(self):
-        from tools.computer_use.vision_routing import _explicit_aux_vision_override
-        assert _explicit_aux_vision_override(None) is False
-
-    def test_returns_false_for_non_dict_cfg(self):
-        from tools.computer_use.vision_routing import _explicit_aux_vision_override
-        assert _explicit_aux_vision_override("not-a-dict") is False
-        assert _explicit_aux_vision_override([]) is False
-
-    def test_returns_false_when_auxiliary_block_missing(self):
-        from tools.computer_use.vision_routing import _explicit_aux_vision_override
-        assert _explicit_aux_vision_override({}) is False
-        assert _explicit_aux_vision_override({"model": {"default": "x"}}) is False
-
-    def test_returns_false_when_vision_block_missing(self):
-        from tools.computer_use.vision_routing import _explicit_aux_vision_override
-        cfg = {"auxiliary": {"compression": {"provider": "openai"}}}
-        assert _explicit_aux_vision_override(cfg) is False
-
-    def test_returns_false_for_blank_provider_no_model_no_base_url(self):
-        from tools.computer_use.vision_routing import _explicit_aux_vision_override
-        cfg = {"auxiliary": {"vision": {"provider": "", "model": "", "base_url": ""}}}
-        assert _explicit_aux_vision_override(cfg) is False
-
-    def test_returns_false_for_provider_auto(self):
-        from tools.computer_use.vision_routing import _explicit_aux_vision_override
-        cfg = {"auxiliary": {"vision": {"provider": "auto"}}}
-        assert _explicit_aux_vision_override(cfg) is False
-
-    def test_returns_false_for_provider_AUTO_uppercase(self):
-        from tools.computer_use.vision_routing import _explicit_aux_vision_override
-        cfg = {"auxiliary": {"vision": {"provider": "  AUTO  "}}}
-        assert _explicit_aux_vision_override(cfg) is False
-
-    def test_returns_true_for_explicit_provider(self):
-        from tools.computer_use.vision_routing import _explicit_aux_vision_override
-        cfg = {"auxiliary": {"vision": {"provider": "openrouter"}}}
-        assert _explicit_aux_vision_override(cfg) is True
-
-    def test_returns_true_for_explicit_model_only(self):
-        from tools.computer_use.vision_routing import _explicit_aux_vision_override
-        cfg = {"auxiliary": {"vision": {"model": "google/gemini-2.5-flash"}}}
-        assert _explicit_aux_vision_override(cfg) is True
-
-    def test_returns_true_for_explicit_base_url_only(self):
-        from tools.computer_use.vision_routing import _explicit_aux_vision_override
-        cfg = {"auxiliary": {"vision": {"base_url": "http://localhost:1234/v1"}}}
-        assert _explicit_aux_vision_override(cfg) is True
-
-    def test_returns_true_for_provider_auto_plus_explicit_model(self):
-        """``provider: auto`` + an explicit model still counts as override."""
-        from tools.computer_use.vision_routing import _explicit_aux_vision_override
-        cfg = {
-            "auxiliary": {
-                "vision": {"provider": "auto", "model": "claude-3-haiku"},
-            }
-        }
-        assert _explicit_aux_vision_override(cfg) is True
-
-    def test_handles_non_dict_vision_block(self):
-        from tools.computer_use.vision_routing import _explicit_aux_vision_override
-        cfg = {"auxiliary": {"vision": "not-a-dict"}}
-        assert _explicit_aux_vision_override(cfg) is False
-
-
-# ---------------------------------------------------------------------------
-# should_route_capture_to_aux_vision
-# ---------------------------------------------------------------------------
-
-class TestRouteDecision:
-    """End-to-end policy: explicit override > tool-result support > vision caps."""
-
-    def test_explicit_override_routes_to_aux_even_for_vision_main(self):
-        """Issue #24015 core repro: explicit aux config must win.
-
-        Even if the main model fully supports vision (Anthropic / Claude),
-        an explicit ``auxiliary.vision`` block means the user wants their
-        configured backend used. Don't silently bypass it.
-        """
-        from tools.computer_use import vision_routing
-
-        cfg = {
-            "auxiliary": {
-                "vision": {
-                    "provider": "openrouter",
-                    "model": "google/gemini-2.5-flash",
-                }
-            }
-        }
-        with patch.object(vision_routing, "_lookup_supports_vision", return_value=True), \
-             patch.object(vision_routing,
-                          "_provider_accepts_multimodal_tool_result",
-                          return_value=True):
-            assert vision_routing.should_route_capture_to_aux_vision(
-                "anthropic", "claude-opus-4-5", cfg
-            ) is True
-
-    def test_non_vision_main_model_routes_to_aux(self):
-        """The reported #24015 scenario: tencent/hy3-preview has no vision."""
-        from tools.computer_use import vision_routing
-
-        cfg = {"model": {"default": "tencent/hy3-preview", "provider": "openrouter"}}
-        with patch.object(vision_routing, "_lookup_supports_vision", return_value=False), \
-             patch.object(vision_routing,
-                          "_provider_accepts_multimodal_tool_result",
-                          return_value=True):
-            assert vision_routing.should_route_capture_to_aux_vision(
-                "openrouter", "tencent/hy3-preview", cfg
-            ) is True
-
-    def test_vision_main_model_no_override_keeps_multimodal(self):
-        """Default path: vision-capable main model + no aux override → native."""
-        from tools.computer_use import vision_routing
-
-        with patch.object(vision_routing, "_lookup_supports_vision", return_value=True), \
-             patch.object(vision_routing,
-                          "_provider_accepts_multimodal_tool_result",
-                          return_value=True):
-            assert vision_routing.should_route_capture_to_aux_vision(
-                "anthropic", "claude-opus-4-5", None
-            ) is False
-
-    def test_provider_rejects_multimodal_tool_results_routes_to_aux(self):
-        """Some providers' tool-result messages won't carry images at all."""
-        from tools.computer_use import vision_routing
-
-        with patch.object(vision_routing, "_lookup_supports_vision", return_value=True), \
-             patch.object(vision_routing,
-                          "_provider_accepts_multimodal_tool_result",
-                          return_value=False):
-            assert vision_routing.should_route_capture_to_aux_vision(
-                "some-aggregator", "some-vision-model", {}
-            ) is True
-
-    def test_unknown_provider_capabilities_fail_closed(self):
-        """When tool-result lookup returns None, route to aux (safe default)."""
-        from tools.computer_use import vision_routing
-
-        with patch.object(vision_routing, "_lookup_supports_vision", return_value=True), \
-             patch.object(vision_routing,
-                          "_provider_accepts_multimodal_tool_result",
-                          return_value=None):
-            assert vision_routing.should_route_capture_to_aux_vision(
-                "exotic-provider", "exotic-model", {}
-            ) is True
-
-    def test_unknown_vision_capability_fails_closed(self):
-        """When models.dev has no entry, prefer aux over a likely 404."""
-        from tools.computer_use import vision_routing
-
-        with patch.object(vision_routing, "_lookup_supports_vision", return_value=None), \
-             patch.object(vision_routing,
-                          "_provider_accepts_multimodal_tool_result",
-                          return_value=True):
-            assert vision_routing.should_route_capture_to_aux_vision(
-                "openrouter", "novel/never-seen-model", {}
-            ) is True
-
-    def test_explicit_override_wins_over_unknown_caps(self):
-        """Explicit aux config wins regardless of unknown caps elsewhere."""
-        from tools.computer_use import vision_routing
-
-        cfg = {"auxiliary": {"vision": {"provider": "openrouter"}}}
-        with patch.object(vision_routing, "_lookup_supports_vision", return_value=None), \
-             patch.object(vision_routing,
-                          "_provider_accepts_multimodal_tool_result",
-                          return_value=None):
-            assert vision_routing.should_route_capture_to_aux_vision(
-                "openrouter", "tencent/hy3-preview", cfg
-            ) is True
-
-
-# ---------------------------------------------------------------------------
-# Internal lookups — defensive paths
-# ---------------------------------------------------------------------------
-
-class TestLookupHelpers:
-    def test_lookup_supports_vision_returns_none_for_blank_provider(self):
-        from tools.computer_use.vision_routing import _lookup_supports_vision
-        assert _lookup_supports_vision("", "claude") is None
-
-    def test_lookup_supports_vision_returns_none_for_blank_model(self):
-        from tools.computer_use.vision_routing import _lookup_supports_vision
-        assert _lookup_supports_vision("anthropic", "") is None
-
-    def test_lookup_supports_vision_handles_lookup_exception(self):
-        """Underlying caps lookup may raise; helper must swallow + return None."""
-        from tools.computer_use import vision_routing
-
-        def _boom(_provider, _model):
-            raise RuntimeError("models.dev unreachable")
-
-        with patch("agent.models_dev.get_model_capabilities", side_effect=_boom):
-            assert vision_routing._lookup_supports_vision("anthropic", "claude") is None
-
-    def test_lookup_supports_vision_returns_none_when_caps_missing(self):
-        from tools.computer_use import vision_routing
-
-        with patch("agent.models_dev.get_model_capabilities", return_value=None):
-            assert vision_routing._lookup_supports_vision("anthropic", "claude") is None
-
-    def test_provider_accepts_multimodal_tool_result_returns_none_for_blank_provider(self):
-        from tools.computer_use.vision_routing import (
-            _provider_accepts_multimodal_tool_result,
-        )
-        assert _provider_accepts_multimodal_tool_result("", "claude") is None
-
-
-# ---------------------------------------------------------------------------
-# Module surface
-# ---------------------------------------------------------------------------
-
-class TestModuleSurface:
-    """Pin the public surface so dependents stay in lockstep."""
-
-    def test_should_route_capture_to_aux_vision_is_exported(self):
-        from tools.computer_use import vision_routing
-
-        assert "should_route_capture_to_aux_vision" in vision_routing.__all__
-        assert callable(vision_routing.should_route_capture_to_aux_vision)
-
-    @pytest.mark.parametrize("name", [
-        "_explicit_aux_vision_override",
-        "_lookup_supports_vision",
-        "_provider_accepts_multimodal_tool_result",
-    ])
-    def test_internal_helpers_are_addressable(self, name):
-        """Internal helpers stay importable so tests can monkeypatch them."""
-        from tools.computer_use import vision_routing
-
-        assert hasattr(vision_routing, name)
-        assert callable(getattr(vision_routing, name))
@@ -1089,17 +1089,9 @@ class Test403Enrichment:
 class TestModelToolsIntegration:
    def setup_method(self):
        _reset_capability_cache()
-        from model_tools import _clear_tool_defs_cache
-        from tools.registry import invalidate_check_fn_cache
-        _clear_tool_defs_cache()
-        invalidate_check_fn_cache()

    def teardown_method(self):
        _reset_capability_cache()
-        from model_tools import _clear_tool_defs_cache
-        from tools.registry import invalidate_check_fn_cache
-        _clear_tool_defs_cache()
-        invalidate_check_fn_cache()

    @patch("tools.discord_tool._discord_request")
    def test_discord_admin_schema_rebuilt_by_get_tool_definitions(
@@ -501,18 +501,16 @@ class TestRegistration:

    def test_check_fn_gates_availability(self, monkeypatch):
        """Registry should exclude HA tools when HASS_TOKEN is not set."""
-        from tools.registry import invalidate_check_fn_cache, registry
+        from tools.registry import registry

        monkeypatch.delenv("HASS_TOKEN", raising=False)
-        invalidate_check_fn_cache()
        defs = registry.get_definitions({"ha_list_entities", "ha_get_state", "ha_call_service"})
        assert len(defs) == 0

    def test_check_fn_includes_when_token_set(self, monkeypatch):
        """Registry should include HA tools when HASS_TOKEN is set."""
-        from tools.registry import invalidate_check_fn_cache, registry
+        from tools.registry import registry

        monkeypatch.setenv("HASS_TOKEN", "test-token")
-        invalidate_check_fn_cache()
        defs = registry.get_definitions({"ha_list_entities", "ha_get_state", "ha_call_service"})
        assert len(defs) == 3
--- a/Show More
+++ b/Show More
Author	SHA1	Message	Date
yoniebans	f3058fbec7	test(image-shrink): accept clamp_dimensions kwarg in mock resize signature The _fake_resize mock in test_image_shrink_recovery.py predates the clamp_dimensions kwarg on _resize_image_for_vision. Add it to keep the mock signature aligned.	2026-05-22 19:09:17 +02:00
yoniebans	bd3bad232b	refactor(vision): revert Pillow-missing warning to match codebase pattern	2026-05-22 18:49:31 +02:00
yoniebans	a2546ed4fe	refactor(vision): simplify _is_anthropic_provider; drop manual probe script - Inline the provider check via _ANTHROPIC_IMAGE_PROVIDERS frozenset instead of duplicating the predicate logic in a function body. - Drop scripts/verify_anthropic_pixel_cap.py — it was a one-off development probe, not a repeatable utility. Moved to local workspace.	2026-05-22 18:31:28 +02:00
yoniebans	2f253a4f55	fix(vision): broaden Anthropic detection + wire clamp into browser/compression paths - Broaden _is_anthropic_provider to cover claude/claude-code aliases and aggregators that proxy Claude (openrouter, nous, vertex, bedrock, anthropic-vertex, google-vertex) — same set as _supports_media_in_tool_results. - Wire clamp_dimensions through browser_tool screenshot resize and conversation_compression image-shrink recovery, both of which were bypassing the clamp. - Promote Pillow-missing log to warning when clamp was requested. - Add parametrized tests for _is_anthropic_provider covering 19 cases.	2026-05-22 18:17:22 +02:00
yoniebans	80a0c829d7	test(vision): add wiring regression for provider-gated dimension clamp	2026-05-21 22:21:20 +02:00
yoniebans	f4b32301ce	test(vision): add manual verification script for Anthropic pixel cap Manual script that hits real Anthropic API to confirm: (1) >8000 px images are still rejected with the same error message, (2) our clamp produces an image Anthropic accepts. Run when threshold drift is suspected.	2026-05-21 22:16:43 +02:00
yoniebans	b3309f3c0f	fix(vision): gate pixel-dimension clamp on Anthropic provider Anthropic is the only major provider that hard-rejects >8000 px images. Clamping unconditionally silently downscaled images for OpenAI/Gemini/custom hosts that could handle larger inputs. Gate the clamp on the active provider and add an opt-in clamp_dimensions kwarg to _resize_image_for_vision.	2026-05-21 21:53:45 +02:00
yoniebans	2882899925	fix(vision): clamp image dimensions before inline base64 encode Anthropic's Messages API rejects any image whose width or height exceeds 8000 px with a non_retryable_client_error 400: messages.N.content.M.image.source.base64.data: At least one of the image dimensions exceed max allowed size: 8000 pixels The native vision fast path inlined oversized screenshots (e.g. tall or panoramic captures from browser_vision / vision_analyze) directly into the tool-result envelope before any size check. Once present in the message history, every subsequent request replayed the same oversized image and got the same 400 — permanently bricking the session, since the error is non-retryable. Recovery required manually editing the session JSON to drop the poisoned tool result. Fix: * Add _MAX_IMAGE_DIMENSION = 7999 (one px under Anthropic's cap). * Add _get_image_dimensions / _image_exceeds_pixel_cap helpers (header-only Pillow read, no full decode). * _resize_image_for_vision now clamps proportionally to the cap before any byte-size work. * Three call sites (native fast path + legacy path initial check) trigger resize on dimension overflow as well as byte overflow. Pillow remains a soft dependency: when missing, the dimension check returns False and the existing byte-size guard remains the last line of defence (same behaviour as today). Adds TestPixelDimensionCap covering the helpers, the Pillow-missing fallback, and the 10000x100 / 100x10000 regression cases. All 125 tests pass across vision_tools, vision_native_fast_path, image_shrink_recovery, and image_rejection_fallback.	2026-05-21 11:27:24 +02:00