fix(cli): address Copilot review #1 (4 threads)

Thread 1 (cli.py:1488): Fix broken skin hook: the class is SkinConfig, not Skin. The previous code silently no-op'd via the broad except, so SkinConfig.get_color() calls weren't actually remapped. Verified the hook fires now: in light mode, banner_text returns #1A1A1A instead of #FFF8DC.

Thread 2 (cli.py:1328): Align comment with actual timeout. The OSC 11 read deadline is 100ms (time.monotonic() + 0.1), not 50ms. Fixed the docstring.

Thread 3 (cli.py:13389): Remove unused imports of Point and Screen in the _output_screen_diff monkey-patch block. They were left over from earlier experiments; the wrapper only needs previous_screen mutation.

Thread 4 (cli.py:11422): Skip light-mode remap entirely when a pt style string already specifies its own bg (e.g. 'bg:#1a1a2e #FFF8DC' for status-bar / completion-menu). Those colors were tuned for that specific dark bg; remapping the FG to #1A1A1A would produce dark-on-dark (invisible). Now we detect the explicit 'bg:' token and leave the whole value untouched.

Also dropped the stale comment block at the resize handler that described the old 'force \x1b[2J\x1b[H clear-screen on resize' recovery, and replaced it with the actual current strategy (monkey-patch _output_screen_diff).
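
The Thread 4 rule can be sketched roughly as follows. This is a minimal illustration, not the code in cli.py: the function name `remap_style_for_light_mode` and the `LIGHT_FG_MAP` table are hypothetical, and only the #FFF8DC to #1A1A1A pair is taken from the message above.

```python
# Hypothetical light-mode foreground map. Only this one pair appears in the
# commit message; the table itself is an assumption for illustration.
LIGHT_FG_MAP = {"#fff8dc": "#1A1A1A"}


def remap_style_for_light_mode(style: str) -> str:
    """Remap foreground colors unless the style pins its own background."""
    tokens = style.split()
    # An explicit 'bg:' token means fg and bg were tuned as a pair, so the
    # whole value is returned untouched.
    if any(tok.lower().startswith("bg:") for tok in tokens):
        return style
    # Otherwise swap any known foreground color and keep other tokens as-is.
    return " ".join(LIGHT_FG_MAP.get(tok.lower(), tok) for tok in tokens)


print(remap_style_for_light_mode("#FFF8DC"))
print(remap_style_for_light_mode("bg:#1a1a2e #FFF8DC"))
```

Checking for the `bg:` token before touching any token keeps the rule all-or-nothing, which matches the "leave the whole value untouched" behaviour described in Thread 4.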
feat(cli): light-mode color remap covers all skin reads (Rich Panel borders, etc)
2026-05-15 00:21:19 -05:00 · 2026-05-15 00:01:16 -05:00 · 2026-05-14 23:41:13 -05:00 · 2026-05-14 23:39:12 -05:00 · 2026-05-14 23:24:30 -05:00 · 2026-05-14 23:10:51 -05:00
265 changed files with 25547 additions and 4902 deletions
@@ -14,6 +14,14 @@
 # LLM_MODEL is no longer read from .env — this line is kept for reference only.
 # LLM_MODEL=anthropic/claude-opus-4.6

+# =============================================================================
+# LLM PROVIDER (NovitaAI)
+# =============================================================================
+# NovitaAI — 90+ models, pay-per-use
+# Get your key at: https://novita.ai/settings/key-management
+# NOVITA_API_KEY=
+# NOVITA_BASE_URL=https://api.novita.ai/openai/v1  # Override default base URL
+
 # =============================================================================
 # LLM PROVIDER (Google AI Studio / Gemini)
 # =============================================================================
@@ -28,9 +28,10 @@ permissions:
  contents: read

 # Concurrency: push/release runs are NEVER cancelled so every merge gets its
-# own SHA-tagged image; :latest is guarded separately by the move-latest job.
-# PR runs reuse a PR-scoped group with cancel-in-progress: true so rapid
-# pushes to the same PR collapse to the latest commit.
+# own SHA-tagged image; :main and :latest are guarded separately by the
+# move-main and move-latest jobs.  PR runs reuse a PR-scoped group with
+# cancel-in-progress: true so rapid pushes to the same PR collapse to the
+# latest commit.
 concurrency:
  group: docker-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
@@ -91,10 +92,10 @@ jobs:
      # pattern for multi-runner multi-platform builds.
      #
      # We apply the OCI revision label here (and again on arm64) because
-      # the move-latest job reads it off the linux/amd64 sub-manifest config
-      # of `:latest` to decide whether it's safe to advance.  The label must
-      # be on each per-arch image — manifest lists themselves don't carry
-      # image config labels.
+      # the move-main / move-latest jobs read it off the linux/amd64
+      # sub-manifest config of the floating tag to decide whether it's safe
+      # to advance.  The label must be on each per-arch image — manifest
+      # lists themselves don't carry image config labels.
      - name: Push amd64 by digest
        id: push
        if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
@@ -217,6 +218,8 @@ jobs:
    timeout-minutes: 10
    outputs:
      pushed_sha_tag: ${{ steps.mark_pushed.outputs.pushed }}
+      pushed_release_tag: ${{ steps.mark_release_pushed.outputs.pushed }}
+      release_tag: ${{ steps.tag.outputs.tag }}
    steps:
      - name: Download digests
        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093  # v4
@@ -271,33 +274,43 @@ jobs:
          IMAGE_NAME: ${{ env.IMAGE_NAME }}
          TAG: ${{ steps.tag.outputs.tag }}

-      # Signal to move-latest that the SHA tag is live.  Only on main pushes;
-      # releases don't trigger move-latest (they use their own release tag).
+      # Signal to move-main that the SHA tag is live.  Only on main pushes;
+      # releases set pushed_release_tag instead.
      - name: Mark SHA tag pushed
        id: mark_pushed
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        run: echo "pushed=true" >> "$GITHUB_OUTPUT"

+      # Signal to move-latest that the release tag is live.
+      - name: Mark release tag pushed
+        id: mark_release_pushed
+        if: github.event_name == 'release'
+        run: echo "pushed=true" >> "$GITHUB_OUTPUT"
+
  # ---------------------------------------------------------------------------
-  # Move :latest to point at the SHA tag the merge job pushed.
+  # Move :main to point at the SHA tag the merge job pushed.
+  #
+  # :main is the floating tag that tracks the tip of the main branch.  Every
+  # merge to main retags :main forward.  Users who want "latest dev build"
+  # pull :main; users who want stable releases pull :latest.
  #
  # The real serialization guarantee comes from the top-level concurrency
  # group (`docker-${{ github.ref }}` with `cancel-in-progress: false`),
  # which ensures at most one workflow run for this ref executes at a time.
-  # That means two move-latest steps for the same ref cannot overlap.
+  # That means two move-main steps for the same ref cannot overlap.
  #
  # This job has its own concurrency group as defense-in-depth: if the
-  # top-level group is ever loosened, queued move-latests will run serially
+  # top-level group is ever loosened, queued move-mains will run serially
  # in arrival order, each one running the ancestor check below and either
-  # advancing :latest or skipping.  `cancel-in-progress: false` matches the
+  # advancing :main or skipping.  `cancel-in-progress: false` matches the
  # top-level setting — we don't want rapid pushes to cancel a queued
-  # move-latest, because the ancestor check is the real safety mechanism
-  # and queueing is cheap (move-latest is a ~30s registry op).
+  # move-main, because the ancestor check is the real safety mechanism
+  # and queueing is cheap (move-main is a ~30s registry op).
  #
-  # Combined with the ancestor check, this means :latest only ever moves
+  # Combined with the ancestor check, this means :main only ever moves
  # forward in git history.
  # ---------------------------------------------------------------------------
-  move-latest:
+  move-main:
    if: |
      github.repository == 'NousResearch/hermes-agent'
      && github.event_name == 'push'
@@ -307,7 +320,7 @@ jobs:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    concurrency:
-      group: docker-move-latest-${{ github.ref }}
+      group: docker-move-main-${{ github.ref }}
      cancel-in-progress: false
    steps:
      - name: Checkout code
@@ -324,13 +337,13 @@ jobs:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

-      # Read the git revision label off the current :latest manifest, then
+      # Read the git revision label off the current :main manifest, then
      # use `git merge-base --is-ancestor` to check whether our commit is a
-      # descendant of it.  If :latest doesn't exist yet, or its label is
+      # descendant of it.  If :main doesn't exist yet, or its label is
      # missing, we treat that as "safe to publish".  If another run already
-      # advanced :latest past us (or diverged), we skip and leave it alone.
-      - name: Decide whether to move :latest
-        id: latest_check
+      # advanced :main past us (or diverged), we skip and leave it alone.
+      - name: Decide whether to move :main
+        id: main_check
        run: |
          set -euo pipefail
          image=nousresearch/hermes-agent
@@ -338,6 +351,119 @@ jobs:
          # Pull the JSON for the linux/amd64 sub-manifest's config and extract
          # the OCI revision label with jq — Go template field access can't
          # handle dots in map keys, so using json+jq is the robust route.
+          image_json=$(
+            docker buildx imagetools inspect "${image}:main" \
+              --format '{{ json (index .Image "linux/amd64") }}' \
+              2>/dev/null || true
+          )
+
+          if [ -z "${image_json}" ]; then
+            echo "No existing :main (or inspect failed) — safe to publish."
+            echo "push_main=true" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+
+          current_sha=$(
+            printf '%s' "${image_json}" \
+              | jq -r '.config.Labels."org.opencontainers.image.revision" // ""'
+          )
+
+          if [ -z "${current_sha}" ]; then
+            echo "Registry :main has no revision label — safe to publish."
+            echo "push_main=true" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+
+          echo "Registry :main is at ${current_sha}"
+          echo "This run is at      ${GITHUB_SHA}"
+
+          if [ "${current_sha}" = "${GITHUB_SHA}" ]; then
+            echo ":main already points at our SHA — nothing to do."
+            echo "push_main=false" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+
+          # Make sure we have the :main commit locally for merge-base.
+          if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
+            git fetch --no-tags --prune origin \
+              "+refs/heads/main:refs/remotes/origin/main" \
+              || true
+          fi
+
+          if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
+            echo "Registry :main points at an unknown commit (${current_sha}); refusing to overwrite."
+            echo "push_main=false" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+
+          # Our SHA must be a descendant of the current :main to be safe.
+          if git merge-base --is-ancestor "${current_sha}" "${GITHUB_SHA}"; then
+            echo "Our commit is a descendant of :main — safe to advance."
+            echo "push_main=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "Another run advanced :main past us (or diverged) — leaving it alone."
+            echo "push_main=false" >> "$GITHUB_OUTPUT"
+          fi
+
+      # Retag the already-pushed SHA manifest as :main.  This is a registry-
+      # side operation (no rebuild, no layer re-push), so it's quick and
+      # atomic per-tag.  The ancestor check above plus this job's serialized
+      # concurrency group (cancel-in-progress: false) together guarantee we
+      # only ever move :main forward in git history.
+      - name: Move :main to this SHA
+        if: steps.main_check.outputs.push_main == 'true'
+        run: |
+          set -euo pipefail
+          image=nousresearch/hermes-agent
+          docker buildx imagetools create \
+            --tag "${image}:main" \
+            "${image}:sha-${GITHUB_SHA}"
+
+  # ---------------------------------------------------------------------------
+  # Move :latest to point at the release tag the merge job pushed.
+  #
+  # :latest is the floating tag that tracks the most recent stable release.
+  # Only `release: published` events advance it — never main pushes.
+  #
+  # We still run an ancestor check against the existing :latest so that a
+  # backport release on an older branch (e.g. patching v1.1.5 after v1.2.3
+  # is out) doesn't drag :latest backwards.  The check is the same shape as
+  # move-main: read the OCI revision label off the current :latest, look up
+  # that commit in git, and only advance if our release commit is a strict
+  # descendant.
+  # ---------------------------------------------------------------------------
+  move-latest:
+    if: |
+      github.repository == 'NousResearch/hermes-agent'
+      && github.event_name == 'release'
+      && needs.merge.outputs.pushed_release_tag == 'true'
+    needs: merge
+    runs-on: ubuntu-latest
+    timeout-minutes: 10
+    concurrency:
+      group: docker-move-latest
+      cancel-in-progress: false
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+        with:
+          fetch-depth: 1000
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f  # v3
+
+      - name: Log in to Docker Hub
+        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9  # v3
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+
+      - name: Decide whether to move :latest
+        id: latest_check
+        run: |
+          set -euo pipefail
+          image=nousresearch/hermes-agent
+
          image_json=$(
            docker buildx imagetools inspect "${image}:latest" \
              --format '{{ json (index .Image "linux/amd64") }}' \
@@ -362,7 +488,7 @@ jobs:
          fi

          echo "Registry :latest is at ${current_sha}"
-          echo "This run is at      ${GITHUB_SHA}"
+          echo "This release is at  ${GITHUB_SHA}"

          if [ "${current_sha}" = "${GITHUB_SHA}" ]; then
            echo ":latest already points at our SHA — nothing to do."
@@ -371,6 +497,7 @@ jobs:
          fi

          # Make sure we have the :latest commit locally for merge-base.
+          # Releases can be cut from any branch, so fetch broadly.
          if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
            git fetch --no-tags --prune origin \
              "+refs/heads/main:refs/remotes/origin/main" \
@@ -383,25 +510,25 @@ jobs:
            exit 0
          fi

-          # Our SHA must be a descendant of the current :latest to be safe.
+          # Our release SHA must be a descendant of the current :latest.
+          # Backport releases on older branches won't satisfy this and will
+          # be left alone — :latest stays on the newer release.
          if git merge-base --is-ancestor "${current_sha}" "${GITHUB_SHA}"; then
-            echo "Our commit is a descendant of :latest — safe to advance."
+            echo "Our release commit is a descendant of :latest — safe to advance."
            echo "push_latest=true" >> "$GITHUB_OUTPUT"
          else
-            echo "Another run advanced :latest past us (or diverged) — leaving it alone."
+            echo "Existing :latest is newer than this release (likely a backport) — leaving it alone."
            echo "push_latest=false" >> "$GITHUB_OUTPUT"
          fi

-      # Retag the already-pushed SHA manifest as :latest.  This is a registry-
-      # side operation — no rebuild, no layer re-push — so it's quick and
-      # atomic per-tag.  The ancestor check above plus the cancel-in-progress
-      # concurrency on this job together guarantee we only ever move :latest
-      # forward in git history.
-      - name: Move :latest to this SHA
+      # Retag the already-pushed release manifest as :latest.
+      - name: Move :latest to this release tag
        if: steps.latest_check.outputs.push_latest == 'true'
+        env:
+          RELEASE_TAG: ${{ needs.merge.outputs.release_tag }}
        run: |
          set -euo pipefail
          image=nousresearch/hermes-agent
          docker buildx imagetools create \
            --tag "${image}:latest" \
-            "${image}:sha-${GITHUB_SHA}"
+            "${image}:${RELEASE_TAG}"
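
The decision logic shared by the move-main and move-latest steps above can be summarized in a small sketch. The function name `should_advance_tag` and its callable parameters are hypothetical; in the workflow the two callables correspond to `git cat-file -e` and `git merge-base --is-ancestor`, and the fetch-and-retry step is omitted here.

```python
from typing import Callable


def should_advance_tag(
    current_sha: str,
    our_sha: str,
    commit_exists: Callable[[str], bool],
    is_ancestor: Callable[[str, str], bool],
) -> bool:
    """Decide whether a floating tag may be moved to our_sha."""
    if not current_sha:
        # Tag missing, inspect failed, or no revision label: safe to publish.
        return True
    if current_sha == our_sha:
        # Tag already points at our commit: nothing to do.
        return False
    if not commit_exists(current_sha):
        # Tag points at a commit we cannot find: refuse to overwrite.
        return False
    # Advance only when the tagged commit is an ancestor of ours.
    return is_ancestor(current_sha, our_sha)


# Toy linear history standing in for a git repository.
history = ["a1", "b2", "c3"]
exists = lambda sha: sha in history
ancestor = lambda old, new: history.index(old) <= history.index(new)

print(should_advance_tag("a1", "c3", exists, ancestor))
print(should_advance_tag("c3", "b2", exists, ancestor))
```

Passing the git checks in as callables keeps the ordering of the four cases testable without a real repository or registry.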
@@ -513,6 +513,17 @@ generic plugin surface (new hook, new ctx method) — never hardcode
 plugin-specific logic into core. PR #5295 removed 95 lines of hardcoded
 honcho argparse from `main.py` for exactly this reason.

+**No new in-tree memory providers (policy, May 2026):** the set of
+built-in memory providers under `plugins/memory/` is closed. New memory
+backends must ship as **standalone plugin repos** that users install
+into `~/.hermes/plugins/` (or via pip entry points) — they implement
+the same `MemoryProvider` ABC, register through the same discovery
+path, and integrate via `hermes memory setup` / `post_setup()` without
+landing in this tree. PRs that add a new directory under
+`plugins/memory/` will be closed with a pointer to publish the
+provider as its own repo. Existing in-tree providers stay; bug fixes
+to them are welcome.
+
 ### Model-provider plugins (`plugins/model-providers/<name>/`)

 Every inference backend (openrouter, anthropic, gmi, deepseek, nvidia, …)
@@ -580,6 +591,86 @@ during setup, injected at load time).
 Top-level `tags:` and `category:` are also accepted and mirrored from
 `metadata.hermes.*` by the loader.

+### Skill authoring standards (HARDLINE)
+
+Every new or modernized skill — bundled, optional, or contributed —
+must meet these standards before merge. Reviewers reject PRs that
+violate them.
+
+1. **`description` ≤ 60 characters, one sentence, ends with a period.**
+   Long descriptions bloat skill listings and dilute the model's
+   attention when many skills are loaded. State the capability, not
+   the implementation. No marketing words ("powerful",
+   "comprehensive", "seamless", "advanced"). Don't repeat the skill
+   name. Verify with:
+   ```python
+   import re, pathlib
+   text = pathlib.Path('skills/<cat>/<name>/SKILL.md').read_text(encoding='utf-8')
+   m = re.search(r'^description: (.*)$', text, re.MULTILINE)
+   assert m, 'no description field in frontmatter'
+   assert len(m.group(1)) <= 60, len(m.group(1))
+   ```
+
+2. **Tools referenced in SKILL.md prose must be native Hermes tools or
+   MCP servers the skill explicitly expects.** When the skill needs a
+   capability, point at the proper tool by name in backticks
+   (`` `terminal` ``, `` `web_extract` ``, `` `read_file` ``,
+   `` `patch` ``, `` `search_files` ``, `` `vision_analyze` ``,
+   `` `browser_navigate` ``, `` `delegate_task` ``, etc.). Do NOT
+   name shell utilities the agent already has wrapped — `grep` →
+   `search_files`, `cat`/`head`/`tail` → `read_file`, `sed`/`awk` →
+   `patch`, `find`/`ls` → `search_files target='files'`. If the skill
+   depends on an MCP server, name the MCP server and document the
+   expected setup in `## Prerequisites`. Anything else (third-party
+   CLIs, shell pipelines, etc.) is fair game inside script files but
+   should not be the headline interaction surface in the prose.
+
+3. **`platforms:` gating audited against actual script imports.**
+   Skills that use POSIX-only primitives (`fcntl`, `termios`,
+   `os.setsid`, `os.kill(pid, 0)` for liveness, `/proc`, `/tmp`
+   hardcoded, `signal.SIGKILL`, bash heredocs, `osascript`, `apt`,
+   `systemctl`) must declare their supported platforms. Default
+   posture: try to fix it cross-platform first — `tempfile.gettempdir`,
+   `pathlib.Path`, `psutil.pid_exists`, Python-level filtering instead
+   of `grep`. Gate to a narrower set only when the dependency is
+   genuinely platform-bound.
+
+4. **`author` credits the human contributor first.** For external
+   contributions, the contributor's real name + GitHub handle goes
+   first; "Hermes Agent" is the secondary collaborator. If the
+   contributor's commit shows "Hermes Agent" as author (because they
+   used Hermes to draft the skill), replace it with their actual name
+   — credit the human, not the tool.
+
+5. **SKILL.md body uses the modern section order.** `# <Skill> Skill`
+   title, 2-3 sentence intro stating what it does and doesn't do,
+   `## When to Use`, `## Prerequisites`, `## How to Run`,
+   `## Quick Reference`, `## Procedure`, `## Pitfalls`,
+   `## Verification`. Target ~200 lines for a complex skill,
+   ~100 lines for a simple one. Cut redundant intro fluff, marketing
+   prose, and re-explanations of env vars already in
+   `## Prerequisites`.
+
+6. **Scripts go in `scripts/`, references in `references/`,
+   templates in `templates/`.** Don't expect the model to inline-write
+   parsers, XML walkers, or non-trivial logic every call — ship a
+   helper script. Reference it from SKILL.md by path relative to the
+   skill directory.
+
+7. **Tests live at `tests/skills/test_<skill>_skill.py`** and use only
+   stdlib + pytest + `unittest.mock`. No live network calls. Run via
+   `scripts/run_tests.sh tests/skills/test_<skill>_skill.py -q`.
+
+8. **`.env.example` additions are isolated to a clearly delimited
+   block.** Don't touch the surrounding file — contributor-supplied
+   `.env.example` versions are usually stale and edits outside the
+   skill's own block must be dropped during salvage.
+
+The full salvage / modernization checklist for external skill PRs
+lives in the `hermes-agent-dev` skill at
+`references/new-skill-pr-salvage.md` — load it before polishing
+contributor skill PRs.
+
 ---

 ## Toolsets
@@ -49,6 +49,24 @@ If your skill is specialized, community-contributed, or niche, it's better suite

 ---

+## Memory Providers: Ship as a Standalone Plugin
+
+**We are no longer accepting new memory providers into this repo.** The set of built-in providers under `plugins/memory/` (honcho, mem0, supermemory, byterover, hindsight, holographic, openviking, retaindb) is closed. If you want to add a new memory backend, publish it as a **standalone plugin repo** that users install into `~/.hermes/plugins/` (or via a pip entry point).
+
+Standalone memory plugins:
+
+- Implement the same `MemoryProvider` ABC (`agent/memory_provider.py`) — `sync_turn`, `prefetch`, `shutdown`, and optionally `post_setup(hermes_home, config)` for setup-wizard integration
+- Use the same discovery system — `discover_memory_providers()` picks them up from user/project plugin directories and pip entry points
+- Integrate with `hermes memory setup` via `post_setup()` — no need to touch core code
+- Can register their own CLI subcommands via `register_cli(subparser)` in a `cli.py` file
+- Get all the same lifecycle hooks and config plumbing as in-tree providers
+
+PRs that add a new directory under `plugins/memory/` will be closed with a pointer to publish the provider as its own repo. Existing in-tree providers stay; bug fixes to them are welcome.
+
+This isn't a quality bar — it's a coupling-and-maintenance decision. Memory providers are the most common plugin type and they shouldn't all live in this tree.
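
As a rough picture of what a standalone provider might look like, here is a toy sketch. It does not import the real `MemoryProvider` ABC: `MemoryProviderSketch` is a local stand-in, the method names `sync_turn`, `prefetch`, and `shutdown` come from the list above, and every signature shown is an assumption that should be checked against `agent/memory_provider.py`.

```python
from abc import ABC, abstractmethod


class MemoryProviderSketch(ABC):
    """Stand-in for the real ABC. Method names follow the text above;
    the parameter lists are assumptions made for this illustration."""

    @abstractmethod
    def sync_turn(self, user_text: str, assistant_text: str) -> None: ...

    @abstractmethod
    def prefetch(self, query: str) -> list[str]: ...

    @abstractmethod
    def shutdown(self) -> None: ...


class InMemoryProvider(MemoryProviderSketch):
    """Hypothetical toy backend that keeps turns in a plain list."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def sync_turn(self, user_text: str, assistant_text: str) -> None:
        # Record one completed conversation turn.
        self.turns.append((user_text, assistant_text))

    def prefetch(self, query: str) -> list[str]:
        # Return earlier answers whose question mentions the query.
        q = query.lower()
        return [a for u, a in self.turns if q in u.lower()]

    def shutdown(self) -> None:
        # Release resources; here that just means dropping the list.
        self.turns.clear()


provider = InMemoryProvider()
provider.sync_turn("What is Hermes?", "An agent.")
print(provider.prefetch("hermes"))
```

A real plugin would subclass the actual ABC and live in its own repo; the point of the sketch is only that the provider surface is small enough to implement outside this tree.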
+
+---
+
 ## Development Setup

 ### Prerequisites
@@ -461,6 +479,58 @@ Gateway and messaging sessions never collect secrets in-band; they instruct the

 See `skills/gifs/gif-search/` and `skills/email/himalaya/` for examples.

+### Skill authoring standards (HARDLINE)
+
+Every new or modernized skill — bundled, optional, or contributed — must meet these standards before merge. Reviewers reject PRs that violate them.
+
+1. **`description` ≤ 60 characters, one sentence, ends with a period.** Long descriptions bloat the skill listing UI and dilute the model's attention when many skills are loaded. State the capability, not the implementation. No marketing words ("powerful", "comprehensive", "seamless", "advanced"). Don't repeat the skill name. Verify with:
+   ```python
+   import re, pathlib
+   text = pathlib.Path('skills/<cat>/<name>/SKILL.md').read_text(encoding='utf-8')
+   m = re.search(r'^description: (.*)$', text, re.MULTILINE)
+   assert m, 'no description field in frontmatter'
+   assert len(m.group(1)) <= 60, len(m.group(1))
+   ```
+
+   Good: `Search arXiv papers by keyword, author, category, or ID.`
+   Bad: `A powerful and comprehensive skill that allows the agent to search arXiv for relevant academic papers using various criteria including keywords, authors, and categories.`
+
+2. **Tools referenced in SKILL.md prose must be native Hermes tools or MCP servers the skill explicitly expects.** When the skill needs a capability, point at the proper tool by name in backticks: `` `terminal` ``, `` `web_extract` ``, `` `web_search` ``, `` `read_file` ``, `` `write_file` ``, `` `patch` ``, `` `search_files` ``, `` `vision_analyze` ``, `` `browser_navigate` ``, `` `delegate_task` ``, `` `image_generate` ``, `` `text_to_speech` ``, `` `cronjob` ``, `` `memory` ``, `` `skill_view` ``, `` `todo` ``, `` `execute_code` ``.
+
+   Do NOT name shell utilities the agent already has wrapped:
+
+   | Don't say | Say |
+   |---|---|
+   | `grep`, `rg` | `search_files` |
+   | `cat`, `head`, `tail` | `read_file` |
+   | `sed`, `awk` | `patch` |
+   | `find`, `ls` | `search_files` (with `target='files'`) |
+   | `curl` for content extraction | `web_extract` |
+   | `echo > file`, `cat <<EOF` | `write_file` |
+
+   If the skill depends on an MCP server, name the MCP server and document its setup in `## Prerequisites`. Third-party CLIs (e.g. `ffmpeg`, `gh`, a specific SDK) are fine to invoke from inside script files, but the prose should frame the interaction as "invoke through the `terminal` tool", not as a manual shell session.
+
+3. **`platforms:` gating audited against actual script imports.** Skills that use POSIX-only primitives (`fcntl`, `termios`, `os.setsid`, `os.kill(pid, 0)` for liveness, `/proc`, hardcoded `/tmp` paths, `signal.SIGKILL`, bash heredocs, `osascript`, `apt`, `systemctl`) must declare their supported platforms via the `platforms:` frontmatter. Default posture is to fix it cross-platform first — `tempfile.gettempdir()`, `pathlib.Path`, `psutil.pid_exists()`, Python-level filtering instead of `grep`. Gate to a narrower set only when the dependency is genuinely platform-bound (e.g. `osascript` is macOS-only, `/proc` is Linux-only).
+
+4. **`author` credits the human contributor first.** For external contributions, the contributor's real name + GitHub handle goes first (`Jane Doe (jane-doe)`); "Hermes Agent" is the secondary collaborator. If the contributor's commit shows "Hermes Agent" as author because they used Hermes to draft the skill, replace it with their actual name — credit the human, not the tool.
+
+5. **SKILL.md body uses the modern section order.** `# <Skill> Skill` title, 2-3 sentence intro stating what it does and what it doesn't do, then:
+   - `## When to Use` — trigger conditions
+   - `## Prerequisites` — env vars, install steps, MCP setup, API key sourcing
+   - `## How to Run` — canonical invocation through the `terminal` tool
+   - `## Quick Reference` — flat command/API reference
+   - `## Procedure` — numbered steps with copy-paste commands
+   - `## Pitfalls` — known limits, rate limits, things that look broken but aren't
+   - `## Verification` — single command that proves the skill works
+
+   Target ~200 lines for a complex skill, ~100 lines for a simple one. Cut redundant intro fluff, marketing prose, and re-explanations of env vars already documented in `## Prerequisites`.
+
+6. **Scripts go in `scripts/`, references in `references/`, templates in `templates/`.** Don't expect the model to inline-write parsers, XML walkers, or non-trivial logic every call — ship a helper script. Reference scripts from SKILL.md by path relative to the skill directory.
+
+7. **Tests live at `tests/skills/test_<skill>_skill.py`** and use only stdlib + pytest + `unittest.mock`. No live network calls. Run via `scripts/run_tests.sh tests/skills/test_<skill>_skill.py -q`. Must pass under the hermetic CI env (no API keys leaking through). Use `monkeypatch` and `tmp_path` for any env-var or filesystem dependencies.
+
+8. **`.env.example` additions are isolated to a clearly delimited block.** Don't touch the surrounding file: contributor-supplied `.env.example` versions are usually stale, and edits outside the skill's own block will be dropped during salvage. Comment out every value with `#` (it's documentation, not live config).
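
The rules in item 1 go beyond the length check, and a reviewer could approximate the rest with a small lint like the one below. It is a sketch only: `description_problems` and `MARKETING_WORDS` are hypothetical names, and the one-sentence test is a rough heuristic rather than a real sentence splitter.

```python
import re

# Words item 1 calls out explicitly; extend as needed.
MARKETING_WORDS = ("powerful", "comprehensive", "seamless", "advanced")


def description_problems(description: str) -> list[str]:
    """Return a list of rule violations for a skill description."""
    problems = []
    if len(description) > 60:
        problems.append(f"too long ({len(description)} > 60)")
    if not description.endswith("."):
        problems.append("does not end with a period")
    # Heuristic: sentence-ending punctuation followed by more text
    # suggests more than one sentence.
    if re.search(r"[.!?]\s+\S", description):
        problems.append("more than one sentence")
    lowered = description.lower()
    for word in MARKETING_WORDS:
        if re.search(rf"\b{word}\b", lowered):
            problems.append(f"marketing word: {word}")
    return problems


good = "Search arXiv papers by keyword, author, category, or ID."
print(description_problems(good))
```

Returning a list of problems rather than asserting on the first one lets a salvage pass report everything wrong with a description at once.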
+
 ### Skill guidelines

 - **No external dependencies unless absolutely necessary.** Prefer stdlib Python, curl, and existing Hermes tools (`web_extract`, `terminal`, `read_file`).
@@ -94,9 +94,13 @@ RUN cd web && npm run build && \
 # hermes_cli/main.py succeeds (see #18800). /opt/hermes/web is build-time
 # only (HERMES_WEB_DIST points at hermes_cli/web_dist) and is intentionally
 # not chowned here.
+# The .venv MUST be hermes-writable so lazy_deps.py can install platform
+# packages (discord.py, telegram, slack, etc.) at first gateway boot.
+# Without this, `uv pip install` fails with EACCES and all messaging
+# adapters silently fail to load.  See tools/lazy_deps.py.
 USER root
 RUN chmod -R a+rX /opt/hermes && \
-    chown -R hermes:hermes /opt/hermes/ui-tui /opt/hermes/node_modules
+    chown -R hermes:hermes /opt/hermes/.venv /opt/hermes/ui-tui /opt/hermes/node_modules
 # Start as root so the entrypoint can usermod/groupmod + gosu.
 # If HERMES_UID is unset, the entrypoint drops to the default hermes user (10000).

@@ -14,7 +14,7 @@

 **The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.

-Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
+Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [NovitaAI](https://novita.ai) (AI-native cloud for Model API, Agent Sandbox, and GPU Cloud), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.

 <table>
 <tr><td><b>A real terminal interface</b></td><td>Full TUI with multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output.</td></tr>
@@ -1,10 +1,11 @@
-"""ACP permission bridging — maps ACP approval requests to hermes approval callbacks."""
+"""ACP permission bridging for Hermes dangerous-command approvals."""

 from __future__ import annotations

 import asyncio
 import logging
 from concurrent.futures import TimeoutError as FutureTimeout
+from itertools import count
 from typing import Callable

 from acp.schema import (
@@ -14,24 +15,87 @@ from acp.schema import (

 logger = logging.getLogger(__name__)

-# Maps ACP PermissionOptionKind -> hermes approval result strings
-_KIND_TO_HERMES = {
+# Maps ACP permission option ids to Hermes approval result strings.
+# Option ids are stable across both the ``allow_permanent=True`` and
+# ``allow_permanent=False`` paths even though the option list differs.
+_OPTION_ID_TO_HERMES = {
    "allow_once": "once",
+    "allow_session": "session",
    "allow_always": "always",
-    "reject_once": "deny",
-    "reject_always": "deny",
+    "deny": "deny",
 }

+_PERMISSION_REQUEST_IDS = count(1)
+
+
+def _build_permission_options(*, allow_permanent: bool) -> list[PermissionOption]:
+    """Return ACP options that match Hermes approval semantics."""
+    options = [
+        PermissionOption(option_id="allow_once", kind="allow_once", name="Allow once"),
+        PermissionOption(
+            option_id="allow_session",
+            # ACP has no session-scoped kind, so use the closest persistent
+            # hint while keeping Hermes semantics in the option id.
+            kind="allow_always",
+            name="Allow for session",
+        ),
+    ]
+    if allow_permanent:
+        options.append(
+            PermissionOption(
+                option_id="allow_always",
+                kind="allow_always",
+                name="Allow always",
+            ),
+        )
+    options.append(PermissionOption(option_id="deny", kind="reject_once", name="Deny"))
+    return options
+
+
+def _build_permission_tool_call(command: str, description: str):
+    """Return the ACP tool-call update attached to a permission request.
+
+    ``request_permission`` expects a ``ToolCallUpdate`` payload — produced
+    by ``_acp.update_tool_call`` — not a ``ToolCallStart``. Each request
+    gets a unique ``perm-check-N`` id so concurrent requests don't collide.
+    """
+    import acp as _acp
+
+    tool_call_id = f"perm-check-{next(_PERMISSION_REQUEST_IDS)}"
+    return _acp.update_tool_call(
+        tool_call_id,
+        title=description,
+        kind="execute",
+        status="pending",
+        content=[_acp.tool_content(_acp.text_block(f"$ {command}"))],
+        raw_input={"command": command, "description": description},
+    )
+
+
+def _map_outcome_to_hermes(outcome: object, *, allowed_option_ids: set[str]) -> str:
+    """Map an ACP permission outcome into Hermes approval strings."""
+    if not isinstance(outcome, AllowedOutcome):
+        return "deny"
+
+    option_id = outcome.option_id
+    if option_id not in allowed_option_ids:
+        logger.warning("Permission request returned unknown option_id: %s", option_id)
+        return "deny"
+    return _OPTION_ID_TO_HERMES.get(option_id, "deny")
+

 def make_approval_callback(
    request_permission_fn: Callable,
    loop: asyncio.AbstractEventLoop,
    session_id: str,
    timeout: float = 60.0,
-) -> Callable[[str, str], str]:
+) -> Callable[..., str]:
    """
-    Return a hermes-compatible ``approval_callback(command, description) -> str``
-    that bridges to the ACP client's ``request_permission`` call.
+    Return a Hermes-compatible approval callback that bridges to ACP.
+
+    The callback accepts ``command`` and ``description`` plus optional
+    keyword arguments such as ``allow_permanent`` used by
+    ``tools.approval.prompt_dangerous_approval()``.

    Args:
        request_permission_fn: The ACP connection's ``request_permission`` coroutine.
@@ -40,41 +104,38 @@ def make_approval_callback(
        timeout: Seconds to wait for a response before auto-denying.
    """

-    def _callback(command: str, description: str) -> str:
-        options = [
-            PermissionOption(option_id="allow_once", kind="allow_once", name="Allow once"),
-            PermissionOption(option_id="allow_always", kind="allow_always", name="Allow always"),
-            PermissionOption(option_id="deny", kind="reject_once", name="Deny"),
-        ]
-        import acp as _acp
-
-        tool_call = _acp.start_tool_call("perm-check", command, kind="execute")
-
-        coro = request_permission_fn(
-            session_id=session_id,
-            tool_call=tool_call,
-            options=options,
-        )
+    def _callback(
+        command: str,
+        description: str,
+        *,
+        allow_permanent: bool = True,
+        **_: object,
+    ) -> str:
+        options = _build_permission_options(allow_permanent=allow_permanent)

+        future = None
        try:
+            tool_call = _build_permission_tool_call(command, description)
+            coro = request_permission_fn(
+                session_id=session_id,
+                tool_call=tool_call,
+                options=options,
+            )
            future = asyncio.run_coroutine_threadsafe(coro, loop)
            response = future.result(timeout=timeout)
        except (FutureTimeout, Exception) as exc:
+            if future is not None:
+                future.cancel()
            logger.warning("Permission request timed out or failed: %s", exc)
            return "deny"

        if response is None:
            return "deny"

-        outcome = response.outcome
-        if isinstance(outcome, AllowedOutcome):
-            option_id = outcome.option_id
-            # Look up the kind from our options list
-            for opt in options:
-                if opt.option_id == option_id:
-                    return _KIND_TO_HERMES.get(opt.kind, "deny")
-            return "once"  # fallback for unknown option_id
-        else:
-            return "deny"
+        allowed_option_ids = {option.option_id for option in options}
+        return _map_outcome_to_hermes(
+            response.outcome,
+            allowed_option_ids=allowed_option_ids,
+        )

    return _callback
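
Review note: the option-id mapping above can be exercised in isolation. The following is a minimal sketch, not the module itself. `AllowedOutcomeStub` and `OtherOutcomeStub` are hypothetical stand-ins for the ACP schema types, and `map_outcome` mirrors `_map_outcome_to_hermes` without the logging call.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AllowedOutcomeStub:
    """Hypothetical stand-in for acp.schema.AllowedOutcome."""
    option_id: str


@dataclass
class OtherOutcomeStub:
    """Hypothetical stand-in for any non-allowed outcome."""


OPTION_ID_TO_HERMES = {
    "allow_once": "once",
    "allow_session": "session",
    "allow_always": "always",
    "deny": "deny",
}


def map_outcome(outcome: object, allowed_option_ids: set[str]) -> str:
    # Anything that is not an explicit "allowed" outcome is a denial.
    if not isinstance(outcome, AllowedOutcomeStub):
        return "deny"
    # An option id that was never offered is treated as a denial too.
    if outcome.option_id not in allowed_option_ids:
        return "deny"
    return OPTION_ID_TO_HERMES.get(outcome.option_id, "deny")


# With allow_permanent=False the "allow_always" option is not offered.
offered = {"allow_once", "allow_session", "deny"}
session_result = map_outcome(AllowedOutcomeStub("allow_session"), offered)
permanent_result = map_outcome(AllowedOutcomeStub("allow_always"), offered)
```

Under these assumptions `session_result` is `"session"` while `permanent_result` falls back to `"deny"`, which is the fail-closed behavior the diff introduces in place of the old `"once"` fallback.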
@@ -1305,9 +1305,8 @@ def convert_tools_to_anthropic(tools: List[Dict]) -> List[Dict]:
            ),
        }
        # Forward cache_control marker when present on the OpenAI-format
-        # tool dict (set by ``mark_tools_for_long_lived_cache``). Anthropic's
-        # tools array supports cache_control on the last tool to cache the
-        # entire schema cross-session.
+        # tool dict. Anthropic's tools array supports cache_control on the
+        # last tool to cache the entire schema cross-session.
        cache_control = t.get("cache_control")
        if isinstance(cache_control, dict):
            anthropic_tool["cache_control"] = dict(cache_control)
@@ -382,7 +382,28 @@ _AI_GATEWAY_HEADERS = {
 # Nous Portal extra_body for product attribution.
 # Callers should pass this as extra_body in chat.completions.create()
 # when the auxiliary client is backed by Nous Portal.
-NOUS_EXTRA_BODY = {"tags": ["product=hermes-agent", "client=aux"]}
+#
+# The tags are computed from agent.portal_tags so the client= marker stays
+# in lockstep with hermes_cli.__version__ across every Portal call site
+# (main loop, aux, compression, web_extract). Do not inline a literal here;
+# see agent/portal_tags.py for the rationale.
+from agent.portal_tags import nous_portal_tags as _nous_portal_tags
+
+
+def _nous_extra_body() -> dict:
+    """Return a fresh Nous Portal ``extra_body`` dict.
+
+    Computed at call time so a hot-reloaded ``hermes_cli.__version__`` is
+    reflected without restarting long-running processes.
+    """
+    return {"tags": _nous_portal_tags()}
+
+
+# Backwards-compatible module attribute. Some callers (tests, third-party
+# plugins) read ``NOUS_EXTRA_BODY`` directly; keep it as a snapshot of the
+# tags taken at import time. Callers that need the freshest value should call
+# ``_nous_extra_body()`` or import ``nous_portal_tags`` directly.
+NOUS_EXTRA_BODY = _nous_extra_body()

 # Set at resolve time — True if the auxiliary client points to Nous Portal
 auxiliary_is_nous: bool = False
@@ -1386,6 +1407,7 @@ def _try_openrouter(explicit_api_key: str = None) -> Tuple[Optional[OpenAI], Opt
    if pool_present:
        or_key = explicit_api_key or _pool_runtime_api_key(entry)
        if not or_key:
+            _mark_provider_unhealthy("openrouter", ttl=60)
            return None, None
        base_url = _pool_runtime_base_url(entry, OPENROUTER_BASE_URL) or OPENROUTER_BASE_URL
        logger.debug("Auxiliary client: OpenRouter via pool")
@@ -1394,6 +1416,7 @@ def _try_openrouter(explicit_api_key: str = None) -> Tuple[Optional[OpenAI], Opt

    or_key = explicit_api_key or os.getenv("OPENROUTER_API_KEY")
    if not or_key:
+        _mark_provider_unhealthy("openrouter", ttl=60)
        return None, None
    logger.debug("Auxiliary client: OpenRouter")
    return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
@@ -1425,6 +1448,7 @@ def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
                "Auxiliary: skipping Nous Portal (rate-limited, resets in %.0fs)",
                _remaining,
            )
+            _mark_provider_unhealthy("nous", ttl=_remaining)
            return None, None
    except Exception:
        pass
@@ -1432,6 +1456,7 @@ def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
    nous = _read_nous_auth()
    runtime = _resolve_nous_runtime_api(force_refresh=False)
    if runtime is None and not nous:
+        _mark_provider_unhealthy("nous", ttl=60)
        return None, None
    global auxiliary_is_nous
    auxiliary_is_nous = True
@@ -3437,7 +3462,7 @@ def get_auxiliary_extra_body() -> dict:
    Includes Nous Portal product tags when the auxiliary client is backed
    by Nous Portal. Returns empty dict otherwise.
    """
-    return dict(NOUS_EXTRA_BODY) if auxiliary_is_nous else {}
+    return _nous_extra_body() if auxiliary_is_nous else {}


 def auxiliary_max_tokens_param(value: int) -> dict:
@@ -4026,7 +4051,7 @@ def _build_call_kwargs(
    # Provider-specific extra_body
    merged_extra = dict(extra_body or {})
    if provider == "nous" or auxiliary_is_nous:
-        merged_extra.setdefault("tags", []).extend(NOUS_EXTRA_BODY["tags"])
+        merged_extra.setdefault("tags", []).extend(_nous_portal_tags())
    if merged_extra:
        kwargs["extra_body"] = merged_extra

@@ -4411,7 +4436,7 @@ def extract_content_or_reasoning(response) -> str:
      1. ``message.content`` — strip inline think/reasoning blocks, check for
         remaining non-whitespace text.
      2. ``message.reasoning`` / ``message.reasoning_content`` — direct
-         structured reasoning fields (DeepSeek, Moonshot, Novita, etc.).
+         structured reasoning fields (DeepSeek, Moonshot, NovitaAI, etc.).
      3. ``message.reasoning_details`` — OpenRouter unified array format.

    Returns the best available text, or ``""`` if nothing found.
@@ -1185,6 +1185,26 @@ The user has requested that this compaction PRIORITISE preserving all informatio
            idx += 1
        return idx

+    def _protect_head_size(self, messages: List[Dict[str, Any]]) -> int:
+        """Total count of head messages to protect.
+
+        ``protect_first_n`` is defined as *additional* messages protected
+        beyond the system prompt.  The system prompt (if present at index 0)
+        is always implicitly protected — it's load-bearing context that
+        must never be summarised away.  This keeps semantics stable across
+        call paths where the system prompt may or may not be included in
+        the ``messages`` list (e.g. the gateway ``/compress`` handler
+        strips it before calling compress()).
+
+        Examples:
+          protect_first_n=0 → system prompt only (or nothing if no system msg)
+          protect_first_n=3 → system + first 3 non-system messages
+        """
+        head = 0
+        if messages and messages[0].get("role") == "system":
+            head = 1
+        return head + self.protect_first_n
+
    def _align_boundary_backward(self, messages: List[Dict[str, Any]], idx: int) -> int:
        """Pull a compress-end boundary backward to avoid splitting a
        tool_call / result group.
@@ -1343,7 +1363,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
        skip the LLM call when the transcript is still entirely inside
        the protected head/tail.
        """
-        compress_start = self._align_boundary_forward(messages, self.protect_first_n)
+        compress_start = self._align_boundary_forward(messages, self._protect_head_size(messages))
        compress_end = self._find_tail_cut_by_tokens(messages, compress_start)
        return compress_start < compress_end

@@ -1379,7 +1399,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
        self._last_aux_model_failure_model = None
        n_messages = len(messages)
        # Only need head + 3 tail messages minimum (token budget decides the real tail size)
-        _min_for_compress = self.protect_first_n + 3 + 1
+        _min_for_compress = self._protect_head_size(messages) + 3 + 1
        if n_messages <= _min_for_compress:
            if not self.quiet_mode:
                logger.warning(
@@ -1399,7 +1419,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
            logger.info("Pre-compression: pruned %d old tool result(s)", pruned_count)

        # Phase 2: Determine boundaries
-        compress_start = self.protect_first_n
+        compress_start = self._protect_head_size(messages)
        compress_start = self._align_boundary_forward(messages, compress_start)

        # Use token-budget tail protection instead of fixed message count
@@ -55,6 +55,11 @@ class ContextEngine(ABC):
    # These control the preflight compression check.  Subclasses may
    # override via __init__ or property; defaults are sensible for most
    # engines.
+    #
+    # protect_first_n semantics (since PR #13754): count of non-system head
+    # messages always preserved verbatim, IN ADDITION to the system prompt
+    # which is always implicitly protected.  Default 3 keeps the
+    # historical "system + first 3 non-system messages" head shape.

    threshold_percent: float = 0.75
    protect_first_n: int = 3
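
Review note: a small worked example of the new `protect_first_n` semantics. This is a standalone sketch that copies the arithmetic of `_protect_head_size` into a free function (`protect_head_size` is a hypothetical name, and the message contents are made up for illustration).

```python
from typing import Any, Dict, List


def protect_head_size(messages: List[Dict[str, Any]], protect_first_n: int) -> int:
    # The system prompt, when present at index 0, is protected implicitly.
    head = 1 if messages and messages[0].get("role") == "system" else 0
    # protect_first_n counts additional non-system head messages.
    return head + protect_first_n


with_system = [
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "user", "content": "first"},
    {"role": "assistant", "content": "second"},
    {"role": "user", "content": "third"},
    {"role": "assistant", "content": "fourth"},
]
# Simulates a call path that strips the system prompt before compressing.
without_system = with_system[1:]

head_with = protect_head_size(with_system, 3)
head_without = protect_head_size(without_system, 3)
```

In both cases the same three non-system messages end up in the protected head: the boundary index is 4 when the system prompt is included and 3 when it was stripped.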
@@ -14,6 +14,7 @@ from difflib import unified_diff
 from pathlib import Path

 from utils import safe_json_loads
+from agent.tool_result_classification import file_mutation_result_landed

 # ANSI escape codes for coloring tool failure indicators
 _RED = "\033[31m"
@@ -810,6 +811,8 @@ def _detect_tool_failure(tool_name: str, result: str | None) -> tuple[bool, str]
    """
    if result is None:
        return False, ""
+    if file_mutation_result_landed(tool_name, result):
+        return False, ""

    if tool_name == "terminal":
        data = safe_json_loads(result)
@@ -450,7 +450,13 @@ def _make_stream_chunk(
    finish_reason: Optional[str] = None,
    reasoning: str = "",
 ) -> _GeminiStreamChunk:
-    delta_kwargs: Dict[str, Any] = {"role": "assistant"}
+    delta_kwargs: Dict[str, Any] = {
+        "role": "assistant",
+        "content": None,
+        "tool_calls": None,
+        "reasoning": None,
+        "reasoning_content": None,
+    }
    if content:
        delta_kwargs["content"] = content
    if tool_call_delta is not None:
@@ -77,6 +77,17 @@ def get_active_provider() -> Optional[ImageGenProvider]:

    Reads ``image_gen.provider`` from config.yaml; falls back per the
    module docstring.
+
+    **Availability semantics** (mirrors :mod:`agent.web_search_registry`):
+
+    - When ``image_gen.provider`` is explicitly set, the configured
+      provider is returned even if :meth:`ImageGenProvider.is_available`
+      reports False — the dispatcher surfaces a precise "X_API_KEY is not
+      set" error rather than silently switching backends.
+    - When ``image_gen.provider`` is unset, the fallback path (single-
+      provider shortcut and the FAL legacy preference) is filtered by
+      ``is_available()`` so we don't pick a provider the user has no
+      credentials for.
    """
    configured: Optional[str] = None
    try:
@@ -94,6 +105,17 @@ def get_active_provider() -> Optional[ImageGenProvider]:
    with _lock:
        snapshot = dict(_providers)

+    def _is_available_safe(p: ImageGenProvider) -> bool:
+        """Wrap ``is_available()`` so a buggy provider doesn't kill resolution."""
+        try:
+            return bool(p.is_available())
+        except Exception as exc:  # noqa: BLE001
+            logger.debug("image_gen provider %s.is_available() raised %s", p.name, exc)
+            return False
+
+    # 1. Explicit config wins — return regardless of is_available() so the
+    #    user gets a precise downstream error message rather than a silent
+    #    backend switch.
    if configured:
        provider = snapshot.get(configured)
        if provider is not None:
@@ -103,13 +125,16 @@ def get_active_provider() -> Optional[ImageGenProvider]:
            configured,
        )

-    # Fallback: single-provider case
-    if len(snapshot) == 1:
-        return next(iter(snapshot.values()))
+    # 2. Fallback: exactly one *available* provider. Providers without
+    #    credentials are filtered out first, so they are never surfaced
+    #    as "active".
+    available = [p for p in snapshot.values() if _is_available_safe(p)]
+    if len(available) == 1:
+        return available[0]

-    # Fallback: prefer legacy FAL for backward compat
-    if "fal" in snapshot:
-        return snapshot["fal"]
+    # 3. Fallback: prefer legacy FAL for backward compat, when available.
+    fal = snapshot.get("fal")
+    if fal is not None and _is_available_safe(fal):
+        return fal

    return None
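
Review note: the resolution order above can be summarized with a toy registry. This is a sketch under assumptions: `StubProvider` and the provider name `"other"` are hypothetical, and `resolve` omits the config read, locking and logging that the real function performs.

```python
from typing import Dict, Optional


class StubProvider:
    """Hypothetical provider; available=None simulates a buggy is_available()."""

    def __init__(self, name: str, available: Optional[bool]) -> None:
        self.name = name
        self._available = available

    def is_available(self) -> bool:
        if self._available is None:
            raise RuntimeError("simulated provider bug")
        return self._available


def resolve(configured: Optional[str], providers: Dict[str, StubProvider]) -> Optional[StubProvider]:
    def is_available_safe(p: StubProvider) -> bool:
        try:
            return bool(p.is_available())
        except Exception:
            return False

    # 1. Explicit config wins, even when the provider reports unavailable.
    if configured:
        provider = providers.get(configured)
        if provider is not None:
            return provider
    # 2. Exactly one available provider.
    available = [p for p in providers.values() if is_available_safe(p)]
    if len(available) == 1:
        return available[0]
    # 3. Legacy FAL preference, only when FAL is available.
    fal = providers.get("fal")
    if fal is not None and is_available_safe(fal):
        return fal
    return None


registry = {"fal": StubProvider("fal", False), "other": StubProvider("other", True)}
implicit = resolve(None, registry)
explicit = resolve("fal", registry)
```

Here `implicit` is the `"other"` provider because FAL has no credentials, while `explicit` is still FAL so that the dispatcher can report the missing key.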

@@ -47,7 +47,7 @@ def _resolve_requests_verify() -> bool | str:
 _PROVIDER_PREFIXES: frozenset[str] = frozenset({
    "openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
    "gemini", "ollama-cloud", "zai", "kimi-coding", "kimi-coding-cn", "stepfun", "minimax", "minimax-oauth", "minimax-cn", "anthropic", "deepseek",
-    "opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba",
+    "opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba", "novita",
    "qwen-oauth",
    "xiaomi",
    "arcee",
@@ -66,7 +66,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
    "gmi-cloud", "gmicloud",
    "xai", "x-ai", "x.ai", "grok",
    "nvidia", "nim", "nvidia-nim", "nemotron",
-    "qwen-portal",
+    "qwen-portal", "novita-ai", "novitaai",
 })


@@ -104,6 +104,8 @@ def _strip_provider_prefix(model: str) -> str:

 _model_metadata_cache: Dict[str, Dict[str, Any]] = {}
 _model_metadata_cache_time: float = 0
+_novita_metadata_cache: Dict[str, Dict[str, Any]] = {}
+_novita_metadata_cache_time: float = 0
 _MODEL_CACHE_TTL = 3600
 _endpoint_model_metadata_cache: Dict[str, Dict[str, Dict[str, Any]]] = {}
 _endpoint_model_metadata_cache_time: Dict[str, float] = {}
@@ -285,6 +287,7 @@ def grok_supports_reasoning_effort(model: str) -> bool:
 _CONTEXT_LENGTH_KEYS = (
    "context_length",
    "context_window",
+    "context_size",
    "max_context_length",
    "max_position_embeddings",
    "max_model_len",
@@ -361,6 +364,7 @@ _URL_TO_PROVIDER: Dict[str, str] = {
    "api.xiaomimimo.com": "xiaomi",
    "xiaomimimo.com": "xiaomi",
    "api.gmi-serving.com": "gmi",
+    "api.novita.ai": "novita",
    "tokenhub.tencentmaas.com": "tencent-tokenhub",
    "ollama.com": "ollama-cloud",
 }
@@ -557,6 +561,16 @@ def _extract_max_completion_tokens(payload: Dict[str, Any]) -> Optional[int]:


 def _extract_pricing(payload: Dict[str, Any]) -> Dict[str, Any]:
+    novita_input = payload.get("input_token_price_per_m")
+    novita_output = payload.get("output_token_price_per_m")
+    if novita_input is not None or novita_output is not None:
+        pricing: Dict[str, Any] = {}
+        if novita_input is not None:
+            pricing["prompt"] = str(float(novita_input) / 10_000 / 1_000_000)
+        if novita_output is not None:
+            pricing["completion"] = str(float(novita_output) / 10_000 / 1_000_000)
+        return pricing
+
    alias_map = {
        "prompt": ("prompt", "input", "input_cost_per_token", "prompt_token_cost"),
        "completion": ("completion", "output", "output_cost_per_token", "completion_token_cost"),
@@ -1527,6 +1541,13 @@ def get_model_context_length(
        except ImportError:
            pass  # boto3 not installed — fall through to generic resolution

+    if provider == "novita" or (base_url and base_url_host_matches(base_url, "api.novita.ai")):
+        ctx = _resolve_endpoint_context_length(model, base_url or "https://api.novita.ai/openai/v1", api_key=api_key)
+        if ctx is not None:
+            if base_url:
+                save_context_length(model, base_url, ctx)
+            return ctx
+
    # 2. Active endpoint metadata for truly custom/unknown endpoints.
    # Known providers (Copilot, OpenAI, Anthropic, etc.) skip this — their
    # /models endpoint may report a provider-imposed limit (e.g. Copilot
@@ -141,6 +141,7 @@ class ProviderInfo:
 # Hermes provider names → models.dev provider IDs
 PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
    "openrouter": "openrouter",
+    "novita": "novita-ai",
    "anthropic": "anthropic",
    "openai": "openai",
    "openai-codex": "openai",
@@ -0,0 +1,64 @@
+"""Centralized Nous Portal request tags.
+
+Every Hermes request that hits the Nous Portal — main agent loop, auxiliary
+client (compression / titles / vision / web_extract / session_search / etc.),
+and any future code path — must carry the same product-attribution tags so
+Nous can attribute usage to Hermes Agent and bucket it by client release.
+
+Tag shape (sent in OpenAI-compatible ``extra_body['tags']``):
+
+    [
+        "product=hermes-agent",
+        "client=hermes-client-v<__version__>",
+    ]
+
+The version is sourced live from ``hermes_cli.__version__`` so it auto-aligns
+to whatever release is installed; the release script
+(``scripts/release.py``) regex-bumps that single string, and every Portal
+request picks up the new tag on the next process start.
+
+Why one helper instead of inlining the literal at each site:
+* Four call sites (main loop profile, aux client, run_agent compression
+  fallback, web_tools fallback) used to drift apart — see PR #24194 which
+  only got the aux site, leaving the main loop sending a different tag set.
+* Tests should assert the same tag list everywhere; centralizing makes that
+  assertion a one-liner against this module.
+
+Do NOT pre-compute these as module-level constants in the consumers. The
+version can change at runtime (editable installs, hot-reload tooling), and
+``hermes_cli.__version__`` is the canonical source of truth.
+"""
+
+from __future__ import annotations
+
+from typing import List
+
+
+def _hermes_version() -> str:
+    """Return the current Hermes release version, e.g. ``"0.13.0"``.
+
+    Falls back to ``"unknown"`` if ``hermes_cli`` cannot be imported (should
+    never happen in a real install — guarded for defensive testing).
+    """
+    try:
+        from hermes_cli import __version__
+        return __version__
+    except Exception:
+        return "unknown"
+
+
+def hermes_client_tag() -> str:
+    """Return the ``client=...`` tag for Nous Portal requests.
+
+    Format: ``client=hermes-client-v<MAJOR>.<MINOR>.<PATCH>``.
+    """
+    return f"client=hermes-client-v{_hermes_version()}"
+
+
+def nous_portal_tags() -> List[str]:
+    """Return the canonical list of Nous Portal product tags.
+
+    Always returns a fresh list so callers can mutate it freely
+    (e.g. ``merged_extra.setdefault("tags", []).extend(nous_portal_tags())``).
+    """
+    return ["product=hermes-agent", hermes_client_tag()]
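
Review note: a minimal sketch of why the helper returns a fresh list on every call. The version string is passed in as a parameter here instead of being read from `hermes_cli`, and the `caller=example` tag is a hypothetical pre-existing tag used only for illustration.

```python
from typing import Dict, List


def nous_portal_tags(version: str = "0.13.0") -> List[str]:
    # Stand-in for agent.portal_tags.nous_portal_tags with a fixed version.
    return ["product=hermes-agent", f"client=hermes-client-v{version}"]


# A caller that already carries its own tags extends them in place.
merged_extra: Dict[str, List[str]] = {"tags": ["caller=example"]}
merged_extra.setdefault("tags", []).extend(nous_portal_tags())

# Mutating one returned list must not leak into the next call.
first = nous_portal_tags()
first.append("mutated")
second = nous_portal_tags()
```

Because each call builds a new list, `second` still has exactly two entries after `first` was mutated, which is the property the `extend(...)` call sites rely on.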
@@ -1,25 +1,15 @@
-"""Anthropic prompt caching strategies.
+"""Anthropic prompt caching strategy.

-Two layouts:
-
-* ``system_and_3`` (default, used everywhere except the long-lived path):
-  4 cache_control breakpoints — system prompt + last 3 non-system messages.
-  All at the same TTL (5m or 1h). Reduces input token costs by ~75% on
-  multi-turn conversations within a single session.
-
-* ``prefix_and_2`` (Claude on Anthropic / OpenRouter / Nous Portal):
-  4 breakpoints split across two TTL tiers — tools[-1] (1h) +
-  stable system prefix (1h) + last 2 non-system messages (5m). The
-  long-lived prefix is byte-stable across sessions for a given user
-  config, so every fresh session reads the cached system+tools instead
-  of re-paying for them. Within-session rolling window shrinks from 3
-  messages to 2 to free the breakpoint budget.
+Single layout: ``system_and_3``. 4 cache_control breakpoints — system
+prompt + last 3 non-system messages, all at the same TTL (5m or 1h).
+Reduces input token costs by ~75% on multi-turn conversations within a
+single session.

 Pure functions -- no class state, no AIAgent dependency.
 """

 import copy
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List


 def _apply_cache_marker(msg: dict, cache_marker: dict, native_anthropic: bool = False) -> None:
@@ -87,115 +77,3 @@ def apply_anthropic_cache_control(
        _apply_cache_marker(messages[idx], marker, native_anthropic=native_anthropic)

    return messages
-
-
-def _mark_system_stable_block(
-    messages: List[Dict[str, Any]],
-    long_lived_marker: Dict[str, str],
-) -> bool:
-    """Mark the *first* content block of the system message with the 1h marker.
-
-    The system message is expected to have been split into multiple content
-    blocks beforehand by the caller — block[0] is the cross-session-stable
-    prefix, subsequent blocks carry context files + volatile suffix.
-    Falls back to marking the whole system message as a single block when
-    the message hasn't been split (preserves correctness on the fallback path).
-
-    Returns True when a marker was placed.
-    """
-    if not messages or messages[0].get("role") != "system":
-        return False
-
-    sys_msg = messages[0]
-    content = sys_msg.get("content")
-
-    # Already a list of blocks → mark the first block.
-    if isinstance(content, list) and content:
-        first = content[0]
-        if isinstance(first, dict):
-            first["cache_control"] = long_lived_marker
-            return True
-        return False
-
-    # String content (no split) → cannot place a stable-prefix breakpoint
-    # without changing the byte content.  Caller is responsible for
-    # splitting; if they didn't, fall through to envelope marker so we still
-    # cache *something* for this turn.
-    if isinstance(content, str) and content:
-        sys_msg["content"] = [
-            {"type": "text", "text": content, "cache_control": long_lived_marker}
-        ]
-        return True
-
-    return False
-
-
-def apply_anthropic_cache_control_long_lived(
-    api_messages: List[Dict[str, Any]],
-    long_lived_ttl: str = "1h",
-    rolling_ttl: str = "5m",
-    native_anthropic: bool = False,
-) -> List[Dict[str, Any]]:
-    """Apply prefix_and_2 caching: long-lived stable prefix + rolling window.
-
-    Layout (4 breakpoints total):
-      * Stable system prefix (block[0]) → ``long_lived_ttl`` TTL
-      * Last 2 non-system messages → ``rolling_ttl`` TTL each
-
-    NOTE: this function does NOT mark the tools array. Tools cache_control
-    is attached separately (see ``mark_tools_for_long_lived_cache``) because
-    tools live outside the messages list in the API payload.
-
-    The caller MUST have split the system message into ordered content
-    blocks where block[0] is the cross-session-stable portion. If the system
-    message is still a single string, it is wrapped into a single block and
-    marked — this is correct, just less effective (the volatile suffix is
-    not isolated, so the prefix invalidates per-session).
-
-    Returns:
-        Deep copy of messages with cache_control breakpoints injected.
-    """
-    messages = copy.deepcopy(api_messages)
-    if not messages:
-        return messages
-
-    long_marker = _build_marker(long_lived_ttl)
-    rolling_marker = _build_marker(rolling_ttl)
-
-    placed_prefix = _mark_system_stable_block(messages, long_marker)
-
-    # Reserve 1 breakpoint for the system prefix (when placed); spend the
-    # remaining 3 on the rolling tail.  Anthropic max is 4 total —
-    # tools[-1] (when marked) consumes the 4th, so we cap rolling at 2 here.
-    rolling_budget = 2 if placed_prefix else 3
-    non_sys = [i for i in range(len(messages)) if messages[i].get("role") != "system"]
-    for idx in non_sys[-rolling_budget:]:
-        _apply_cache_marker(messages[idx], rolling_marker, native_anthropic=native_anthropic)
-
-    return messages
-
-
-def mark_tools_for_long_lived_cache(
-    tools: Optional[List[Dict[str, Any]]],
-    long_lived_ttl: str = "1h",
-) -> Optional[List[Dict[str, Any]]]:
-    """Attach cache_control to the last tool in the OpenAI-format tools list.
-
-    Anthropic prefix-cache order is ``tools → system → messages``.  Marking
-    the last tool dict caches the entire tools array (Anthropic's docs:
-    "the marker is placed on the last block you want included in the cached
-    prefix").  Marker is preserved across the OpenAI-wire boundary on
-    OpenRouter and Nous Portal (which proxies to OpenRouter); on native
-    Anthropic the marker is forwarded by ``convert_tools_to_anthropic``.
-
-    Returns a deep copy of the tools list with the marker attached, or the
-    input unchanged when tools is empty/None.  Pure function — does not
-    mutate the input.
-    """
-    if not tools:
-        return tools
-    out = copy.deepcopy(tools)
-    last = out[-1]
-    if isinstance(last, dict):
-        last["cache_control"] = _build_marker(long_lived_ttl)
-    return out
@@ -14,6 +14,7 @@ from dataclasses import dataclass, field
 from typing import Any, Mapping

 from utils import safe_json_loads
+from agent.tool_result_classification import file_mutation_result_landed


 IDEMPOTENT_TOOL_NAMES = frozenset(
@@ -196,6 +197,8 @@ def classify_tool_failure(tool_name: str, result: str | None) -> tuple[bool, str
    """
    if result is None:
        return False, ""
+    if file_mutation_result_landed(tool_name, result):
+        return False, ""

    if tool_name == "terminal":
        data = safe_json_loads(result)
@@ -0,0 +1,26 @@
+"""Shared helpers for classifying tool result payloads."""
+
+from __future__ import annotations
+
+import json
+from typing import Any
+
+
+FILE_MUTATING_TOOL_NAMES = frozenset({"write_file", "patch"})
+
+
+def file_mutation_result_landed(tool_name: str, result: Any) -> bool:
+    """Return True when a file mutation result proves the write landed."""
+    if tool_name not in FILE_MUTATING_TOOL_NAMES or not isinstance(result, str):
+        return False
+    try:
+        data = json.loads(result.strip())
+    except Exception:
+        return False
+    if not isinstance(data, dict) or data.get("error"):
+        return False
+    if tool_name == "write_file":
+        return "bytes_written" in data
+    if tool_name == "patch":
+        return data.get("success") is True
+    return False
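
Review note: a few sample payloads run through the classifier. The function body below is copied from the new module so the snippet is self-contained. The payload shapes are assumptions made for illustration, since only the `error`, `bytes_written` and `success` keys are read by the function.

```python
import json
from typing import Any

FILE_MUTATING_TOOL_NAMES = frozenset({"write_file", "patch"})


def file_mutation_result_landed(tool_name: str, result: Any) -> bool:
    """Return True when a file mutation result proves the write landed."""
    if tool_name not in FILE_MUTATING_TOOL_NAMES or not isinstance(result, str):
        return False
    try:
        data = json.loads(result.strip())
    except Exception:
        return False
    if not isinstance(data, dict) or data.get("error"):
        return False
    if tool_name == "write_file":
        return "bytes_written" in data
    if tool_name == "patch":
        return data.get("success") is True
    return False


samples = [
    ("write_file", '{"bytes_written": 12}'),                        # landed
    ("write_file", '{"bytes_written": 12, "error": "disk full"}'),  # error wins
    ("patch", '{"success": true}'),                                 # landed
    ("patch", '{"success": "true"}'),                               # string, not True
    ("terminal", '{"success": true}'),                              # not a file tool
    ("write_file", "not json"),                                     # unparseable
]
verdicts = [file_mutation_result_landed(name, payload) for name, payload in samples]
```

Only the first and third samples count as landed. Everything else returns `False` and falls through to the existing failure heuristics in `_detect_tool_failure` and `classify_tool_failure`.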
@@ -0,0 +1,368 @@
+"""Codex app-server JSON-RPC client.
+
+Speaks the protocol documented in codex-rs/app-server/README.md (codex 0.125+).
+Transport is newline-delimited JSON-RPC 2.0 over stdio: spawn `codex app-server`,
+do an `initialize` handshake, then drive `thread/start` + `turn/start` and
+consume streaming `item/*` notifications until `turn/completed`.
+
+This module is the wire-level speaker only. Higher-level concerns (event
+projection into Hermes' display, approval bridging, transcript projection into
+AIAgent.messages, plugin migration) live in sibling modules.
+
+Status: optional opt-in runtime gated behind `model.openai_runtime ==
+"codex_app_server"`. Hermes' default tool dispatch is unchanged when this
+runtime is not selected.
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import queue
+import subprocess
+import threading
+import time
+from dataclasses import dataclass, field
+from typing import Any, Callable, Optional
+
+# Default minimum codex version we test against. The PR sets this from the
+# `codex --version` parsed at install time; bumping is a one-line change here.
+MIN_CODEX_VERSION = (0, 125, 0)
+
+
+@dataclass
+class CodexAppServerError(RuntimeError):
+    """Raised on JSON-RPC errors from the app-server."""
+
+    code: int
+    message: str
+    data: Optional[Any] = None
+
+    def __str__(self) -> str:  # pragma: no cover - trivial
+        return f"codex app-server error {self.code}: {self.message}"
+
+
+@dataclass
+class _Pending:
+    queue: queue.Queue
+    method: str
+    sent_at: float = field(default_factory=time.time)
+
+
+class CodexAppServerClient:
+    """Minimal JSON-RPC 2.0 client for `codex app-server` over stdio.
+
+    Threading model:
+      - Spawning thread (caller) drives request/response pairs synchronously.
+      - One reader thread parses stdout, dispatches replies to the matching
+        pending request queue, and routes notifications + server-initiated
+        requests to queues that the caller drains on their own cadence.
+      - One reader thread captures stderr for diagnostics; codex emits
+        tracing logs there at RUST_LOG-controlled levels.
+
+    Intentionally NOT async. AIAgent.run_conversation() is synchronous and
+    runs on the main thread; layering asyncio just to drive a stdio child
+    creates surprising interrupt semantics. We use blocking queues with
+    timeouts and rely on `turn/interrupt` for cancellation.
+    """
+
+    def __init__(
+        self,
+        codex_bin: str = "codex",
+        codex_home: Optional[str] = None,
+        extra_args: Optional[list[str]] = None,
+        env: Optional[dict[str, str]] = None,
+    ) -> None:
+        self._codex_bin = codex_bin
+        cmd = [codex_bin, "app-server"] + list(extra_args or [])
+        spawn_env = os.environ.copy()
+        if env:
+            spawn_env.update(env)
+        if codex_home:
+            spawn_env["CODEX_HOME"] = codex_home
+        # Codex emits tracing to stderr; default WARN keeps it quiet for users.
+        spawn_env.setdefault("RUST_LOG", "warn")
+
+        self._proc = subprocess.Popen(
+            cmd,
+            stdin=subprocess.PIPE,
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+            bufsize=0,
+            env=spawn_env,
+        )
+        self._next_id = 1
+        self._pending: dict[int, _Pending] = {}
+        self._pending_lock = threading.Lock()
+        self._notifications: queue.Queue = queue.Queue()
+        self._server_requests: queue.Queue = queue.Queue()
+        self._stderr_lines: list[str] = []
+        self._stderr_lock = threading.Lock()
+        self._closed = False
+        self._initialized = False
+
+        self._reader = threading.Thread(target=self._read_stdout, daemon=True)
+        self._reader.start()
+        self._stderr_reader = threading.Thread(target=self._read_stderr, daemon=True)
+        self._stderr_reader.start()
+
+    # ---------- lifecycle ----------
+
+    def initialize(
+        self,
+        client_name: str = "hermes",
+        client_title: str = "Hermes Agent",
+        client_version: str = "0.1",
+        capabilities: Optional[dict] = None,
+        timeout: float = 10.0,
+    ) -> dict:
+        """Send `initialize` + `initialized` handshake. Returns the server's
+        InitializeResponse (userAgent, codexHome, platformFamily, platformOs)."""
+        if self._initialized:
+            raise RuntimeError("already initialized")
+        params = {
+            "clientInfo": {
+                "name": client_name,
+                "title": client_title,
+                "version": client_version,
+            },
+            "capabilities": capabilities or {},
+        }
+        result = self.request("initialize", params, timeout=timeout)
+        self.notify("initialized")
+        self._initialized = True
+        return result
+
+    def close(self, timeout: float = 3.0) -> None:
+        """Close stdin, terminate the subprocess, and escalate to kill if it
+        has not exited within `timeout` seconds."""
+        if self._closed:
+            return
+        self._closed = True
+        try:
+            if self._proc.stdin and not self._proc.stdin.closed:
+                self._proc.stdin.close()
+        except Exception:
+            pass
+        try:
+            self._proc.terminate()
+            self._proc.wait(timeout=timeout)
+        except subprocess.TimeoutExpired:
+            try:
+                self._proc.kill()
+                self._proc.wait(timeout=1.0)
+            except Exception:
+                pass
+
+    def __enter__(self) -> "CodexAppServerClient":
+        return self
+
+    def __exit__(self, *exc: Any) -> None:
+        self.close()
+
+    # ---------- send/receive ----------
+
+    def request(
+        self,
+        method: str,
+        params: Optional[dict] = None,
+        timeout: float = 30.0,
+    ) -> dict:
+        """Send a JSON-RPC request and block on the response. Returns `result`,
+        raises CodexAppServerError on `error`."""
+        rid = self._take_id()
+        q: queue.Queue = queue.Queue(maxsize=1)
+        with self._pending_lock:
+            self._pending[rid] = _Pending(queue=q, method=method)
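+        try:
+            self._send({"id": rid, "method": method, "params": params or {}})
+        except Exception:
+            # Drop the pending entry so a failed write doesn't leak it.
+            with self._pending_lock:
+                self._pending.pop(rid, None)
+            raise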
+        try:
+            msg = q.get(timeout=timeout)
+        except queue.Empty:
+            with self._pending_lock:
+                self._pending.pop(rid, None)
+            raise TimeoutError(
+                f"codex app-server method {method!r} timed out after {timeout}s"
+            )
+        if "error" in msg:
+            err = msg["error"]
+            raise CodexAppServerError(
+                code=err.get("code", -1),
+                message=err.get("message", ""),
+                data=err.get("data"),
+            )
+        return msg.get("result", {})
+
+    def notify(self, method: str, params: Optional[dict] = None) -> None:
+        """Send a JSON-RPC notification (no id, no response expected)."""
+        self._send({"method": method, "params": params or {}})
+
+    def respond(self, request_id: Any, result: dict) -> None:
+        """Reply to a server-initiated request (e.g. approval prompts)."""
+        self._send({"id": request_id, "result": result})
+
+    def respond_error(
+        self, request_id: Any, code: int, message: str, data: Optional[Any] = None
+    ) -> None:
+        """Reply to a server-initiated request with an error."""
+        err: dict[str, Any] = {"code": code, "message": message}
+        if data is not None:
+            err["data"] = data
+        self._send({"id": request_id, "error": err})
+
+    def take_notification(self, timeout: float = 0.0) -> Optional[dict]:
+        """Pop the next streaming notification, or return None on timeout.
+
+        timeout=0.0 means non-blocking. Use small positive timeouts inside the
+        AIAgent turn loop to interleave reads with interrupt checks."""
+        try:
+            if timeout <= 0:
+                return self._notifications.get_nowait()
+            return self._notifications.get(timeout=timeout)
+        except queue.Empty:
+            return None
+
+    def take_server_request(self, timeout: float = 0.0) -> Optional[dict]:
+        """Pop the next server-initiated request (e.g. exec/applyPatch approval)."""
+        try:
+            if timeout <= 0:
+                return self._server_requests.get_nowait()
+            return self._server_requests.get(timeout=timeout)
+        except queue.Empty:
+            return None
+
+    # ---------- diagnostics ----------
+
+    def stderr_tail(self, n: int = 20) -> list[str]:
+        """Return last n lines of codex's stderr (for error reports)."""
+        with self._stderr_lock:
+            return list(self._stderr_lines[-n:])
+
+    def is_alive(self) -> bool:
+        return self._proc.poll() is None
+
+    # ---------- internals ----------
+
+    def _take_id(self) -> int:
+        # JSON-RPC ids only need to be unique per-connection. A simple
+        # monotonically increasing int is the common choice and matches what
+        # codex's own clients use.
+        rid = self._next_id
+        self._next_id += 1
+        return rid
+
+    def _send(self, obj: dict) -> None:
+        if self._closed:
+            raise RuntimeError("codex app-server client is closed")
+        if self._proc.stdin is None:
+            raise RuntimeError("codex app-server stdin not available")
+        try:
+            self._proc.stdin.write((json.dumps(obj) + "\n").encode("utf-8"))
+            self._proc.stdin.flush()
+        except (BrokenPipeError, ValueError) as exc:
+            raise RuntimeError(
+                f"codex app-server stdin closed unexpectedly: {exc}"
+            ) from exc
+
+    def _read_stdout(self) -> None:
+        if self._proc.stdout is None:
+            return
+        try:
+            for line in iter(self._proc.stdout.readline, b""):
+                if not line:
+                    break
+                line = line.strip()
+                if not line:
+                    continue
+                try:
+                    msg = json.loads(line)
+                except json.JSONDecodeError:
+                    # Non-JSON output is unexpected on stdout; tracing belongs
+                    # on stderr. Surface it via stderr buffer for diagnostics.
+                    with self._stderr_lock:
+                        self._stderr_lines.append(
+                            f"<non-json on stdout> {line[:200]!r}"
+                        )
+                    continue
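+                if not isinstance(msg, dict):
+                    # Valid JSON but not an object (e.g. a bare number or an
+                    # array); _dispatch only understands objects.
+                    continue
+                self._dispatch(msg)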
+        except Exception as exc:
+            with self._stderr_lock:
+                self._stderr_lines.append(f"<stdout reader error> {exc}")
+
+    def _dispatch(self, msg: dict) -> None:
+        # Reply (has id + result/error, no method)
+        if "id" in msg and ("result" in msg or "error" in msg):
+            with self._pending_lock:
+                pending = self._pending.pop(msg["id"], None)
+            if pending is not None:
+                try:
+                    pending.queue.put_nowait(msg)
+                except queue.Full:  # pragma: no cover - defensive
+                    pass
+            return
+        # Server-initiated request (has id + method)
+        if "id" in msg and "method" in msg:
+            self._server_requests.put(msg)
+            return
+        # Notification (no id)
+        if "method" in msg:
+            self._notifications.put(msg)
+
+    def _read_stderr(self) -> None:
+        if self._proc.stderr is None:
+            return
+        try:
+            for line in iter(self._proc.stderr.readline, b""):
+                if not line:
+                    break
+                with self._stderr_lock:
+                    self._stderr_lines.append(
+                        line.decode("utf-8", "replace").rstrip()
+                    )
+                    # Bound memory: keep last 500 lines.
+                    if len(self._stderr_lines) > 500:
+                        self._stderr_lines = self._stderr_lines[-500:]
+        except Exception:  # pragma: no cover
+            pass
+
+
+def parse_codex_version(output: str) -> Optional[tuple[int, int, int]]:
+    """Parse `codex --version` output. Returns (major, minor, patch) or None."""
+    # Output format: "codex-cli 0.130.0" possibly followed by metadata.
+    import re
+
+    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output or "")
+    if not match:
+        return None
+    return (int(match.group(1)), int(match.group(2)), int(match.group(3)))
+
+
+def check_codex_binary(
+    codex_bin: str = "codex", min_version: tuple[int, int, int] = MIN_CODEX_VERSION
+) -> tuple[bool, str]:
+    """Verify codex CLI is installed and meets minimum version.
+
+    Returns (ok, message). Used by setup wizard and runtime startup."""
+    try:
+        proc = subprocess.run(
+            [codex_bin, "--version"],
+            capture_output=True,
+            text=True,
+            timeout=10,
+        )
+    except FileNotFoundError:
+        return False, (
+            f"codex CLI not found at {codex_bin!r}. Install with: "
+            f"npm i -g @openai/codex"
+        )
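+    except subprocess.TimeoutExpired:
+        return False, "codex --version timed out"
+    except OSError as exc:
+        # e.g. PermissionError when the path exists but is not executable.
+        return False, f"could not run {codex_bin!r}: {exc}"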
+    if proc.returncode != 0:
+        return False, f"codex --version exited {proc.returncode}: {proc.stderr.strip()}"
+    version = parse_codex_version(proc.stdout)
+    if version is None:
+        return False, f"could not parse codex version from: {proc.stdout!r}"
+    if version < min_version:
+        return False, (
+            f"codex {'.'.join(map(str, version))} is older than required "
+            f"{'.'.join(map(str, min_version))}. Run: npm i -g @openai/codex"
+        )
+    return True, ".".join(map(str, version))
@@ -0,0 +1,810 @@
+"""Session adapter for codex app-server runtime.
+
+Owns one Codex thread per Hermes session. Drives `turn/start`, consumes
+streaming notifications via CodexEventProjector, handles server-initiated
+approval requests (apply_patch, exec command), translates cancellation,
+and returns a clean turn result that AIAgent.run_conversation() can splice
+into its `messages` list.
+
+Lifecycle:
+    session = CodexAppServerSession(cwd="/home/x/proj")
+    session.ensure_started()                              # spawns + handshake + thread/start
+    result = session.run_turn(user_input="hello")         # blocks until turn/completed
+    # result.final_text          → assistant text returned to caller
+    # result.projected_messages  → list of {role, content, ...} for messages list
+    # result.tool_iterations     → how many tool-shaped items completed (skill nudge counter)
+    # result.interrupted         → True if Ctrl+C / interrupt_requested fired mid-turn
+    session.close()                                       # tears down subprocess
+
+Threading model: the adapter is single-threaded from the caller's perspective.
+The underlying CodexAppServerClient owns its own reader threads but exposes
+blocking-with-timeout queues that this adapter polls in a loop, so the run_turn
+call is synchronous and behaves like AIAgent's existing chat_completions loop.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+import threading
+import time
+from dataclasses import dataclass, field
+from typing import Any, Callable, Optional
+
+from agent.redact import redact_sensitive_text
+from agent.transports.codex_app_server import (
+    CodexAppServerClient,
+    CodexAppServerError,
+)
+from agent.transports.codex_event_projector import CodexEventProjector
+
+logger = logging.getLogger(__name__)
+
+
+# How many trailing stderr lines from the codex subprocess to attach to a
+# user-facing error when we don't have a more specific classification (OAuth,
+# wedge watchdog, etc.). Small enough to keep error messages legible, large
+# enough to surface a config/provider/auth diagnostic.
+_STDERR_TAIL_LINES = 12
+
+
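+# Permission profile mapping, mirroring the PR proposal:
+# Hermes' tools.terminal.security_mode → Codex's permissions profile id.
+# Falls back to workspace-write if config is missing (matches Codex's own
+# default). Note: the resolved profile is currently only logged;
+# ensure_started() deliberately does not send it on thread/start.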
+_HERMES_TO_CODEX_PERMISSION_PROFILE = {
+    "auto": "workspace-write",
+    "approval-required": "read-only-with-approval",
+    "unrestricted": "full-access",
+    # Backstop alias used by some skills/tests.
+    "yolo": "full-access",
+}
+
+
+@dataclass
+class TurnResult:
+    """Result of one user→assistant→tool turn through the codex app-server."""
+
+    final_text: str = ""
+    projected_messages: list[dict] = field(default_factory=list)
+    tool_iterations: int = 0
+    interrupted: bool = False
+    error: Optional[str] = None  # Set if turn ended in a non-recoverable error
+    turn_id: Optional[str] = None
+    thread_id: Optional[str] = None
+    # Hint to the caller that the underlying codex subprocess is likely
+    # wedged (turn-level timeout fired, post-tool watchdog tripped, or
+    # token-refresh failure killed the child). The caller should retire
+    # the session so the next turn respawns codex from scratch instead
+    # of riding a CPU-spinning or auth-broken process. Mirrors openclaw
+    # beta.8's "retire timed-out app-server clients" fix.
+    should_retire: bool = False
+
+
+# Markers we accept as terminal even when codex never emits turn/completed.
+# Some codex versions stream `<turn_aborted>` as raw text in agentMessage
+# items when an interrupt or upstream error tears the turn down before the
+# normal completion path fires. Mirrors openclaw beta.8 fix.
+_TURN_ABORTED_MARKERS = ("<turn_aborted>", "<turn_aborted/>")
+
+
+# Substrings in codex stderr / JSON-RPC error messages that signal the
+# subprocess died because its OAuth credentials are no longer valid.
+# Kept conservative: we only redirect users to `codex login` when we're
+# reasonably sure that's the actual failure, otherwise we surface the
+# original error verbatim. Mirrors openclaw beta.8's auth-refresh
+# classification.
+_OAUTH_REFRESH_FAILURE_HINTS = (
+    "invalid_grant",
+    "invalid grant",
+    "refresh token",
+    "refresh_token",
+    "token refresh",
+    "token_refresh",
+    "token has expired",
+    "expired_token",
+    "expired token",
+    "not authenticated",
+    "unauthenticated",
+    "unauthorized",
+    "401 unauthorized",
+    "re-authenticate",
+    "reauthenticate",
+    "please log in",
+    "please login",
+    "auth profile",
+    "no auth profile",
+    "oauth",
+)
+
+
+def _classify_oauth_failure(*parts: str) -> Optional[str]:
+    """Return a user-friendly re-auth hint if any of the provided strings
+    look like a codex OAuth/token-refresh failure; otherwise None.
+
+    Used for both `turn/start` JSON-RPC errors and post-mortem stderr
+    inspection when the subprocess exits unexpectedly. Conservative on
+    purpose — we only redirect users to `codex login` when the signal
+    is strong, so unrelated runtime failures still surface verbatim.
+    """
+    haystack = " ".join(p for p in parts if p).lower()
+    if not haystack:
+        return None
+    for needle in _OAUTH_REFRESH_FAILURE_HINTS:
+        if needle in haystack:
+            return (
+                "Codex authentication failed — your ChatGPT/Codex login "
+                "looks expired or invalid. Run `codex login` to refresh, "
+                "then retry. (Fall back to default runtime with "
+                "`/codex-runtime auto` if the issue persists.)"
+            )
+    return None
+
+
+@dataclass
+class _ServerRequestRouting:
+    """Default policies for codex-side approval requests when no interactive
+    callback is wired in. These are only used by tests + cron / non-interactive
+    contexts; the live CLI path passes an approval_callback that defers to
+    tools.approval.prompt_dangerous_approval()."""
+
+    auto_approve_exec: bool = False
+    auto_approve_apply_patch: bool = False
+
+
+class CodexAppServerSession:
+    """One Codex thread per Hermes session, lifetime owned by AIAgent.
+
+    Not thread-safe — one caller drives it at a time, matching how AIAgent's
+    run_conversation() loop is structured today. The codex client itself can
+    handle interleaved reads/writes via its own threads, but the adapter's
+    state (projector, thread_id, turn counter) is owned by the caller thread.
+    """
+
+    def __init__(
+        self,
+        *,
+        cwd: Optional[str] = None,
+        codex_bin: str = "codex",
+        codex_home: Optional[str] = None,
+        permission_profile: Optional[str] = None,
+        approval_callback: Optional[Callable[..., str]] = None,
+        on_event: Optional[Callable[[dict], None]] = None,
+        request_routing: Optional[_ServerRequestRouting] = None,
+        client_factory: Optional[Callable[..., CodexAppServerClient]] = None,
+    ) -> None:
+        self._cwd = cwd or os.getcwd()
+        self._codex_bin = codex_bin
+        self._codex_home = codex_home
+        self._permission_profile = (
+            permission_profile or _HERMES_TO_CODEX_PERMISSION_PROFILE.get(
+                os.environ.get("HERMES_TERMINAL_SECURITY_MODE", "auto"),
+                "workspace-write",
+            )
+        )
+        self._approval_callback = approval_callback
+        self._on_event = on_event  # Display hook (kawaii spinner ticks etc.)
+        self._routing = request_routing or _ServerRequestRouting()
+        self._client_factory = client_factory or CodexAppServerClient
+
+        self._client: Optional[CodexAppServerClient] = None
+        self._thread_id: Optional[str] = None
+        self._interrupt_event = threading.Event()
+        # Pending file-change items, keyed by item id. Populated on
+        # item/started for fileChange items; consumed by the approval
+        # bridge when codex sends item/fileChange/requestApproval. The
+        # approval params don't carry the changeset, so we cache here
+        # to surface a real summary in the approval prompt (quirk #4).
+        self._pending_file_changes: dict[str, str] = {}
+        self._closed = False
+
+    # ---------- lifecycle ----------
+
+    def ensure_started(self) -> str:
+        """Spawn the subprocess, do the initialize handshake, and start a
+        thread. Returns the codex thread id. Idempotent — repeated calls
+        return the same thread id."""
+        if self._thread_id is not None:
+            return self._thread_id
+        if self._client is None:
+            self._client = self._client_factory(
+                codex_bin=self._codex_bin, codex_home=self._codex_home
+            )
+        self._client.initialize(
+            client_name="hermes",
+            client_title="Hermes Agent",
+            client_version=_get_hermes_version(),
+        )
+        # Permission selection is intentionally NOT sent on thread/start.
+        # Two reasons (live-tested against codex 0.130.0):
+        #   1. `thread/start.permissions` is gated behind the experimentalApi
+        #      capability on this codex version — we'd have to opt in during
+        #      initialize and accept the unstable surface.
+        #   2. Even with experimentalApi declared and the correct shape
+        #      (`{"type": "profile", "id": "..."}`, not `{"profileId": ...}`),
+        #      codex requires a matching `[permissions]` table in
+        #      ~/.codex/config.toml or it fails the request with
+        #      'default_permissions requires a [permissions] table'.
+        # Letting codex pick its default (`:read-only` unless the user has
+        # configured otherwise in their codex config.toml) is the standard
+        # codex CLI workflow and avoids fighting codex's own validation.
+        # Users who want a write-capable profile configure it in their
+        # ~/.codex/config.toml the same way they would for any codex usage.
+        params: dict[str, Any] = {"cwd": self._cwd}
+        result = self._client.request("thread/start", params, timeout=15)
+        # Cross-fill thread.id/sessionId — different codex versions have
+        # serialized this under either key. Mirrors openclaw beta.8's
+        # tolerance fix so future codex drops/renames don't KeyError us
+        # at handshake time.
+        thread_obj = result.get("thread") or {}
+        thread_id = (
+            thread_obj.get("id")
+            or thread_obj.get("sessionId")
+            or result.get("sessionId")
+            or result.get("threadId")
+        )
+        if not thread_id:
+            raise CodexAppServerError(
+                code=-32603,
+                message=(
+                    "codex thread/start returned no thread id "
+                    f"(payload keys: {sorted(result.keys())})"
+                ),
+            )
+        self._thread_id = thread_id
+        logger.info(
+            "codex app-server thread started: id=%s profile=%s cwd=%s",
+            self._thread_id[:8],
+            self._permission_profile,
+            self._cwd,
+        )
+        return self._thread_id
+
+    def close(self) -> None:
+        if self._closed:
+            return
+        self._closed = True
+        if self._client is not None:
+            try:
+                self._client.close()
+            except Exception:  # pragma: no cover - best-effort cleanup
+                pass
+            self._client = None
+        self._thread_id = None
+
+    def __enter__(self) -> "CodexAppServerSession":
+        return self
+
+    def __exit__(self, *exc: Any) -> None:
+        self.close()
+
+    # ---------- interrupt ----------
+
+    def request_interrupt(self) -> None:
+        """Idempotent: signal the active turn loop to issue turn/interrupt
+        and unwind. Called by AIAgent's _interrupt_requested path."""
+        self._interrupt_event.set()
+
+    # ---------- diagnostics ----------
+
+    def _format_error_with_stderr(
+        self,
+        prefix: str,
+        exc: Any = "",
+        *,
+        tail_lines: int = _STDERR_TAIL_LINES,
+    ) -> str:
+        """Build a user-facing error string for codex failures.
+
+        Appends the last few lines of codex's stderr buffer when available,
+        passed through agent.redact with force=True so secrets in provider
+        error responses (auth headers, query-string tokens, sk-* keys) never
+        leak into chat output or trajectories. The codex CLI's own error
+        text ('Internal error', 'turn/start failed: ...') is otherwise
+        opaque and forces users to re-run with verbose flags to diagnose
+        config / provider / auth-bridge problems.
+
+        Use this for the generic / catch-all branches. Specific
+        classifications (OAuth via _classify_oauth_failure, post-tool wedge
+        watchdog) already produce a clean hint and should be used instead.
+        """
+        exc_str = str(exc) if exc != "" and exc is not None else ""
+        base = f"{prefix}: {exc_str}" if exc_str else prefix
+        if self._client is None:
+            return base
+        try:
+            tail = self._client.stderr_tail(tail_lines)
+        except Exception:  # pragma: no cover - diagnostic best-effort
+            return base
+        if not tail:
+            return base
+        joined = "\n".join(line.rstrip() for line in tail if line)
+        if not joined.strip():
+            return base
+        redacted = redact_sensitive_text(joined, force=True)
+        return f"{base}\ncodex stderr (last {len(tail)} lines):\n{redacted}"
+
+    # ---------- per-turn ----------
+
+    def run_turn(
+        self,
+        user_input: str,
+        *,
+        turn_timeout: float = 600.0,
+        notification_poll_timeout: float = 0.25,
+        post_tool_quiet_timeout: float = 90.0,
+    ) -> TurnResult:
+        """Send a user message and block until turn/completed, while
+        forwarding server-initiated approval requests and projecting items
+        into Hermes' messages shape.
+
+        post_tool_quiet_timeout: if codex emits a tool completion and then
+        goes quiet for this many seconds without emitting another item or
+        `turn/completed`, fast-fail and mark the session for retirement.
+        Mirrors openclaw beta.8's post-tool completion watchdog (#81697)
+        so a wedged codex doesn't burn the full turn deadline.
+        """
+        # Pre-create the result so startup failures (codex subprocess can't
+        # spawn, initialize handshake rejects, thread/start blows up) surface
+        # the same way per-turn failures do — with a TurnResult.error string
+        # the caller can render — instead of bubbling raw codex exceptions
+        # up to AIAgent.run_conversation.
+        result = TurnResult()
+        try:
+            self.ensure_started()
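+        # CodexAppServerError is a RuntimeError and TimeoutError is an OSError,
+        # so this also catches spawn failures (missing or non-executable
+        # binary) and writes to a closed stdin pipe.
+        except (RuntimeError, OSError) as exc: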
+            result.error = self._format_error_with_stderr(
+                "codex app-server startup failed", exc
+            )
+            # Subprocess almost certainly unhealthy — retire so the next
+            # turn re-spawns cleanly.
+            result.should_retire = True
+            return result
+        assert self._client is not None and self._thread_id is not None
+        result.thread_id = self._thread_id
+
+        self._interrupt_event.clear()
+        projector = CodexEventProjector()
+
+        # Send turn/start with the user input. Text-only for now (codex
+        # supports rich content but Hermes' text path is the common case).
+        try:
+            ts = self._client.request(
+                "turn/start",
+                {
+                    "threadId": self._thread_id,
+                    "input": [{"type": "text", "text": user_input}],
+                },
+                timeout=10,
+            )
+        except CodexAppServerError as exc:
+            # Classify auth/refresh failures so the user gets a clear
+            # `codex login` pointer instead of a raw RPC error string.
+            stderr_blob = "\n".join(self._client.stderr_tail(40))
+            hint = _classify_oauth_failure(exc.message, stderr_blob)
+            if hint is not None:
+                result.error = hint
+                # Subprocess is fine on a JSON-RPC level here, but the
+                # token store is broken — retire so the next turn does a
+                # clean handshake (and the user has a chance to re-auth
+                # via `codex login` between turns).
+                result.should_retire = True
+            else:
+                result.error = self._format_error_with_stderr(
+                    "turn/start failed", exc
+                )
+            return result
+        except TimeoutError as exc:
+            # turn/start hanging is a strong signal the subprocess is wedged.
+            stderr_blob = "\n".join(self._client.stderr_tail(40))
+            hint = _classify_oauth_failure(stderr_blob)
+            result.error = hint or self._format_error_with_stderr(
+                "turn/start timed out", exc
+            )
+            result.should_retire = True
+            return result
+
+        result.turn_id = (ts.get("turn") or {}).get("id")
+        deadline = time.time() + turn_timeout
+        turn_complete = False
+        # Post-tool watchdog state. last_tool_completion_at is set whenever
+        # a tool-shaped item completes; if no further notification arrives
+        # within post_tool_quiet_timeout and the turn hasn't completed, we
+        # fast-fail and retire the session.
+        last_tool_completion_at: Optional[float] = None
+
+        while time.time() < deadline and not turn_complete:
+            if self._interrupt_event.is_set():
+                self._issue_interrupt(result.turn_id)
+                result.interrupted = True
+                break
+
+            # Detect a dead subprocess between iterations. If codex exited
+            # (e.g. crashed, segfaulted, or its auth refresh thread killed
+            # the process), we won't get any more notifications — bail out
+            # rather than waiting for the full turn deadline.
+            if not self._client.is_alive():
+                stderr_blob = "\n".join(self._client.stderr_tail(60))
+                hint = _classify_oauth_failure(stderr_blob)
+                if hint is not None:
+                    result.error = hint
+                else:
+                    result.error = self._format_error_with_stderr(
+                        "codex app-server subprocess exited unexpectedly",
+                        tail_lines=20,
+                    )
+                result.should_retire = True
+                break
+
+            # Post-tool watchdog: if a tool completion was the most recent
+            # signal and codex has been silent past the quiet timeout, give
+            # up on this turn instead of waiting for the outer deadline.
+            if (
+                last_tool_completion_at is not None
+                and (time.time() - last_tool_completion_at)
+                    > post_tool_quiet_timeout
+            ):
+                self._issue_interrupt(result.turn_id)
+                result.interrupted = True
+                result.error = (
+                    f"codex went silent for "
+                    f"{post_tool_quiet_timeout:.0f}s after a tool result; "
+                    f"retiring app-server session."
+                )
+                result.should_retire = True
+                break
+
+            # Drain any server-initiated requests (approvals) before
+            # reading notifications, so the codex side isn't blocked.
+            sreq = self._client.take_server_request(timeout=0)
+            if sreq is not None:
+                # Drain any pending notifications first so per-turn state
+                # (e.g. _pending_file_changes for fileChange approvals) is
+                # up to date when we make the approval decision. Bounded
+                # to avoid starving the server-request response.
+                for _ in range(8):
+                    pending = self._client.take_notification(timeout=0)
+                    if pending is None:
+                        break
+                    self._track_pending_file_change(pending)
+                    proj = projector.project(pending)
+                    if proj.messages:
+                        result.projected_messages.extend(proj.messages)
+                    if proj.is_tool_iteration:
+                        result.tool_iterations += 1
+                        last_tool_completion_at = time.time()
+                    if proj.final_text is not None:
+                        result.final_text = proj.final_text
+                        if _has_turn_aborted_marker(proj.final_text):
+                            turn_complete = True
+                            result.interrupted = True
+                            result.error = (
+                                result.error
+                                or "codex reported turn_aborted"
+                            )
+                self._handle_server_request(sreq)
+                # Activity counts as live signal — reset the post-tool
+                # quiet timer so an approval round-trip doesn't trip it.
+                last_tool_completion_at = None
+                continue
+
+            note = self._client.take_notification(
+                timeout=notification_poll_timeout
+            )
+            if note is None:
+                continue
+
+            method = note.get("method", "")
+            if self._on_event is not None:
+                try:
+                    self._on_event(note)
+                except Exception:  # pragma: no cover - display callback
+                    logger.debug("on_event callback raised", exc_info=True)
+
+            # Track in-progress fileChange items so the approval bridge
+            # can surface a real change summary when codex requests
+            # approval (the approval params themselves don't carry the
+            # changeset). Quirk #4 fix.
+            self._track_pending_file_change(note)
+
+            # Project into messages
+            projection = projector.project(note)
+            if projection.messages:
+                result.projected_messages.extend(projection.messages)
+            if projection.is_tool_iteration:
+                result.tool_iterations += 1
+                # Arm/refresh the post-tool quiet watchdog whenever a
+                # tool-shaped item completes.
+                last_tool_completion_at = time.time()
+            else:
+                # Any non-tool projected activity (assistant message,
+                # status update, etc.) means codex is still producing
+                # output — clear the quiet timer so we don't fast-fail.
+                if projection.messages or projection.final_text is not None:
+                    last_tool_completion_at = None
+            if projection.final_text is not None:
+                # Codex can emit multiple agentMessage items in one turn
+                # (e.g. partial then final). Take the last one as canonical.
+                result.final_text = projection.final_text
+                # Some codex builds tear a turn down by emitting a
+                # `<turn_aborted>` marker in the agent message text and
+                # never sending turn/completed. Treat the marker itself
+                # as terminal so we don't burn the full deadline.
+                if _has_turn_aborted_marker(projection.final_text):
+                    turn_complete = True
+                    result.interrupted = True
+                    result.error = (
+                        result.error or "codex reported turn_aborted"
+                    )
+
+            if method == "turn/completed":
+                turn_complete = True
+                turn_status = (
+                    (note.get("params") or {}).get("turn") or {}
+                ).get("status")
+                if turn_status and turn_status not in ("completed", "interrupted"):
+                    err_obj = (
+                        (note.get("params") or {}).get("turn") or {}
+                    ).get("error")
+                    if err_obj:
+                        err_msg = err_obj.get("message") or str(err_obj)
+                        # If the turn failed for an auth/refresh reason,
+                        # rewrite the error into a re-auth hint AND mark
+                        # the session for retirement.
+                        stderr_blob = "\n".join(
+                            self._client.stderr_tail(40)
+                        )
+                        hint = _classify_oauth_failure(err_msg, stderr_blob)
+                        if hint is not None:
+                            result.error = hint
+                            result.should_retire = True
+                        else:
+                            result.error = self._format_error_with_stderr(
+                                f"turn ended status={turn_status}", err_msg
+                            )
+
+        if not turn_complete and not result.interrupted:
+            # Hit the deadline. Issue interrupt to stop wasted compute, and
+            # tell the caller to retire the session — a turn that never
+            # finished is a strong sign codex is wedged in a way the next
+            # turn shouldn't inherit.
+            self._issue_interrupt(result.turn_id)
+            result.interrupted = True
+            if not result.error:
+                result.error = self._format_error_with_stderr(
+                    f"turn timed out after {turn_timeout}s"
+                )
+            result.should_retire = True
+
+        return result
+
+    # ---------- internals ----------
+
+    def _issue_interrupt(self, turn_id: Optional[str]) -> None:
+        if self._client is None or self._thread_id is None or turn_id is None:
+            return
+        try:
+            self._client.request(
+                "turn/interrupt",
+                {"threadId": self._thread_id, "turnId": turn_id},
+                timeout=5,
+            )
+        except CodexAppServerError as exc:
+            # "no active turn to interrupt" is fine — already done.
+            logger.debug("turn/interrupt non-fatal: %s", exc)
+        except TimeoutError:
+            logger.warning("turn/interrupt timed out")
+
+    def _handle_server_request(self, req: dict) -> None:
+        """Translate a codex server request (approval) into Hermes' approval
+        flow, then send the response.
+
+        Method names verified live against codex 0.130.0 (Apr 2026):
+          item/commandExecution/requestApproval — exec approvals
+          item/fileChange/requestApproval       — apply_patch approvals
+          item/permissions/requestApproval      — permissions changes
+                                                  (we decline; user controls
+                                                  permission profile in
+                                                  ~/.codex/config.toml).
+        """
+        if self._client is None:
+            return
+        method = req.get("method", "")
+        rid = req.get("id")
+        params = req.get("params") or {}
+
+        if method == "item/commandExecution/requestApproval":
+            decision = self._decide_exec_approval(params)
+            self._client.respond(rid, {"decision": decision})
+        elif method == "item/fileChange/requestApproval":
+            decision = self._decide_apply_patch_approval(params)
+            self._client.respond(rid, {"decision": decision})
+        elif method == "item/permissions/requestApproval":
+            # Codex sometimes asks to escalate permissions mid-turn. We
+            # always decline — the user already chose their permission
+            # profile in ~/.codex/config.toml and surprise escalations
+            # shouldn't be silently accepted.
+            self._client.respond(rid, {"decision": "decline"})
+        elif method == "mcpServer/elicitation/request":
+            # Codex's MCP layer asks the user for structured input on
+            # behalf of an MCP server (e.g. tool-call confirmation,
+            # OAuth, form data). For our own hermes-tools callback we
+            # auto-accept — the user already approved Hermes' tools
+            # by enabling the runtime, and we never expose anything
+            # codex's built-in shell can't already do. For other MCP
+            # servers we decline so the user explicitly opts in via
+            # codex's own auth flow.
+            server_name = params.get("serverName") or ""
+            if server_name == "hermes-tools":
+                self._client.respond(
+                    rid,
+                    {"action": "accept", "content": None, "_meta": None},
+                )
+            else:
+                self._client.respond(
+                    rid,
+                    {"action": "decline", "content": None, "_meta": None},
+                )
+        else:
+            # Unknown server request — codex can extend this surface. Reject
+            # cleanly so codex doesn't hang waiting for us.
+            logger.warning("Unknown codex server request: %s", method)
+            self._client.respond_error(
+                rid, code=-32601, message=f"Unsupported method: {method}"
+            )
+
+    def _decide_exec_approval(self, params: dict) -> str:
+        if self._routing.auto_approve_exec:
+            return "accept"
+        command = params.get("command") or ""
+        # Codex's CommandExecutionRequestApprovalParams has cwd as Optional —
+        # fall back to the session's cwd when codex doesn't include it so the
+        # approval prompt is never empty (quirk #10 fix).
+        cwd = params.get("cwd") or self._cwd or "<unknown>"
+        reason = params.get("reason")
+        description = f"Codex requests exec in {cwd}"
+        if reason:
+            description += f" — {reason}"
+        if self._approval_callback is not None:
+            try:
+                choice = self._approval_callback(
+                    command, description, allow_permanent=False
+                )
+                return _approval_choice_to_codex_decision(choice)
+            except Exception:
+                logger.exception("approval_callback raised on exec request")
+                return "decline"
+        return "decline"  # fail-closed when no callback wired
+
+    def _decide_apply_patch_approval(self, params: dict) -> str:
+        if self._routing.auto_approve_apply_patch:
+            return "accept"
+        if self._approval_callback is not None:
+            # FileChangeRequestApprovalParams gives us reason + grantRoot.
+            # The actual changeset lives on the corresponding fileChange
+            # item, which _track_pending_file_change has already summarized
+            # for us; look it up by itemId so the user sees what's changing.
+            reason = params.get("reason")
+            grant_root = params.get("grantRoot")
+            item_id = params.get("itemId") or ""
+            change_summary = self._lookup_pending_file_change(item_id)
+            description_parts = []
+            if reason:
+                description_parts.append(reason)
+            if change_summary:
+                description_parts.append(change_summary)
+            if grant_root:
+                description_parts.append(f"grants write to {grant_root}")
+            description = (
+                "; ".join(description_parts)
+                if description_parts
+                else "Codex requests to apply a patch"
+            )
+            command_label = (
+                f"apply_patch: {change_summary}" if change_summary
+                else f"apply_patch: {reason}" if reason
+                else "apply_patch"
+            )
+            try:
+                choice = self._approval_callback(
+                    command_label,
+                    description,
+                    allow_permanent=False,
+                )
+                return _approval_choice_to_codex_decision(choice)
+            except Exception:
+                logger.exception("approval_callback raised on apply_patch")
+                return "decline"
+        return "decline"
+
+    def _track_pending_file_change(self, note: dict) -> None:
+        """Maintain self._pending_file_changes from item/started + item/completed
+        notifications. Lets the apply_patch approval prompt show what's
+        actually changing — codex's approval params don't carry the data."""
+        method = note.get("method", "")
+        params = note.get("params") or {}
+        item = params.get("item") or {}
+        if item.get("type") != "fileChange":
+            return
+        item_id = item.get("id") or ""
+        if not item_id:
+            return
+        if method == "item/started":
+            changes = item.get("changes") or []
+            if not changes:
+                self._pending_file_changes[item_id] = "1 change pending"
+                return
+            kinds: dict[str, int] = {}
+            paths: list[str] = []
+            for ch in changes:
+                if not isinstance(ch, dict):
+                    continue
+                kind = (ch.get("kind") or {}).get("type") or "update"
+                kinds[kind] = kinds.get(kind, 0) + 1
+                p = ch.get("path") or ""
+                if p:
+                    paths.append(p)
+            counts = ", ".join(f"{n} {k}" for k, n in sorted(kinds.items()))
+            preview = ", ".join(paths[:3])
+            if len(paths) > 3:
+                preview += f", +{len(paths) - 3} more"
+            self._pending_file_changes[item_id] = (
+                f"{counts}: {preview}" if preview else counts
+            )
+        elif method == "item/completed":
+            self._pending_file_changes.pop(item_id, None)
+
+    def _lookup_pending_file_change(self, item_id: str) -> Optional[str]:
+        """Return the cached change summary for an in-progress fileChange
+        item, for use in the approval prompt. Returns None when we don't
+        have the item cached (e.g. approval arrived before item/started, or
+        fileChange item content not tracked yet)."""
+        if not item_id:
+            return None
+        cached = self._pending_file_changes.get(item_id)
+        if not cached:
+            return None
+        return cached
+
+
+def _approval_choice_to_codex_decision(choice: str) -> str:
+    """Map Hermes approval choices onto codex's CommandExecutionApprovalDecision
+    / FileChangeApprovalDecision wire values.
+
+    Hermes returns 'once', 'session', 'always', or 'deny'; unknown values
+    fail closed to 'decline'. Codex expects 'accept', 'acceptForSession',
+    'decline', or 'cancel' (verified against
+    codex-rs/app-server-protocol/src/protocol/v2/item.rs on codex 0.130.0).
+    """
+    if choice == "once":
+        return "accept"
+    if choice in ("session", "always"):
+        return "acceptForSession"
+    return "decline"
+
+
+def _has_turn_aborted_marker(text: str) -> bool:
+    """Return True if `text` contains any of the raw markers codex uses
+    to signal a turn was aborted without emitting `turn/completed`.
+
+    Codex emits `<turn_aborted>` (and sometimes `<turn_aborted/>`) as raw
+    text inside agentMessage items when an interrupt or upstream error
+    tears the turn down before the normal completion path fires. Mirrors
+    openclaw beta.8's terminal-marker fix so we don't burn the full turn
+    deadline waiting for a turn/completed that never comes.
+    """
+    if not text:
+        return False
+    for marker in _TURN_ABORTED_MARKERS:
+        if marker in text:
+            return True
+    return False
+
+
+def _get_hermes_version() -> str:
+    """Best-effort Hermes version string for codex's userAgent line."""
+    try:
+        from importlib.metadata import version
+
+        return version("hermes-agent")
+    except Exception:  # pragma: no cover
+        return "0.0.0"
@@ -0,0 +1,312 @@
+"""Projects codex app-server events into Hermes' messages list.
+
+This translator lets Hermes' memory/skill review keep working under the
+Codex runtime: it converts Codex `item/*` notifications into the standard
+OpenAI-shaped `{role, content, tool_calls, tool_call_id}` entries that
+`agent/curator.py` already knows how to read.
+
+Codex emits items with a discriminator field `type`:
+  - userMessage         → {role: "user", content}
+  - agentMessage        → {role: "assistant", content}
+  - reasoning           → stashed in the assistant's "reasoning" field
+  - commandExecution    → assistant tool_call(name="exec_command") + tool result
+  - fileChange          → assistant tool_call(name="apply_patch") + tool result
+  - mcpToolCall         → assistant tool_call(name=f"mcp.{server}.{tool}") + tool result
+  - dynamicToolCall     → assistant tool_call(name=tool) + tool result
+  - plan/hookPrompt/collabAgentToolCall → recorded as opaque assistant notes
+
+Each item maps to AT MOST one assistant entry + one tool entry, preserving
+Hermes' message-alternation invariants (system → user → assistant → user/tool
+→ assistant → ...). Multiple Codex tool calls within one Codex turn produce
+multiple consecutive (assistant, tool) pairs, which is the same shape Hermes
+already produces for parallel tool calls.
+
+Counters tracked alongside projection:
+  - tool_iterations: ticks once per completed tool-shaped item. Used by
+    AIAgent._iters_since_skill (skill nudge gate, default threshold 10).
+"""
+
+from __future__ import annotations
+
+import hashlib
+import json
+from dataclasses import dataclass, field
+from typing import Any, Optional
+
+
+def _deterministic_call_id(item_type: str, item_id: str) -> str:
+    """Stable id for tool_call message correlation.
+
+    Uses the codex item id directly when present (already a uuid); falls back
+    to a hash of the item type so replay produces the same id across sessions
+    and prefix caches stay valid. See AGENTS.md Pitfall #16 (deterministic IDs
+    in tool call history)."""
+    if item_id:
+        return f"codex_{item_type}_{item_id}"
+    digest = hashlib.sha256(item_type.encode()).hexdigest()[:16]
+    return f"codex_{item_type}_{digest}"
+
+
+def _format_tool_args(d: dict) -> str:
+    """Format a dict as JSON the way Hermes' existing tool_calls path does."""
+    return json.dumps(d, ensure_ascii=False, sort_keys=True)
+
+
+@dataclass
+class ProjectionResult:
+    """Output of projecting one Codex item.
+
+    `messages` is a list because some Codex items produce two messages
+    (assistant tool_call + tool result). Empty list = item ignored (e.g. a
+    streaming `outputDelta` that doesn't materialize into messages until the
+    `item/completed` event)."""
+
+    messages: list[dict] = field(default_factory=list)
+    is_tool_iteration: bool = False
+    final_text: Optional[str] = None  # Set when an agentMessage completes
+
+
+class CodexEventProjector:
+    """Stateful projector consuming Codex notifications in arrival order.
+
+    Owns the in-progress reasoning content (codex emits reasoning as separate
+    items but Hermes stashes it on the next assistant message)."""
+
+    def __init__(self) -> None:
+        self._pending_reasoning: list[str] = []
+
+    def project(self, notification: dict) -> ProjectionResult:
+        """Project a single notification. Non-completion events are no-ops;
+        only `item/completed` materializes messages."""
+        method = notification.get("method", "")
+        params = notification.get("params", {}) or {}
+
+        # We only materialize messages on `item/completed`. Streaming deltas
+        # (`item/<type>/outputDelta`, `item/<type>/delta`) are display-only and
+        # don't enter the messages list — same way Hermes already only writes
+        # the assistant message after the streaming completion event.
+        if method != "item/completed":
+            return ProjectionResult()
+
+        item = params.get("item") or {}
+        item_type = item.get("type") or ""
+        item_id = item.get("id") or ""
+
+        if item_type == "agentMessage":
+            return self._project_agent_message(item)
+        if item_type == "reasoning":
+            self._pending_reasoning.extend(item.get("summary") or [])
+            self._pending_reasoning.extend(item.get("content") or [])
+            return ProjectionResult()
+        if item_type == "commandExecution":
+            return self._project_command(item, item_id)
+        if item_type == "fileChange":
+            return self._project_file_change(item, item_id)
+        if item_type == "mcpToolCall":
+            return self._project_mcp_tool_call(item, item_id)
+        if item_type == "dynamicToolCall":
+            return self._project_dynamic_tool_call(item, item_id)
+        if item_type == "userMessage":
+            return self._project_user_message(item)
+
+        # Unknown / rare items (plan, hookPrompt, collabAgentToolCall, etc.)
+        # — record as opaque assistant note so memory review can still see
+        # *something* happened, but don't fabricate tool_call structure.
+        return self._project_opaque(item, item_type)
+
+    # ---------- per-type projections ----------
+
+    def _project_agent_message(self, item: dict) -> ProjectionResult:
+        text = item.get("text") or ""
+        msg: dict[str, Any] = {"role": "assistant", "content": text}
+        if self._pending_reasoning:
+            msg["reasoning"] = "\n".join(self._pending_reasoning)
+            self._pending_reasoning = []
+        return ProjectionResult(messages=[msg], final_text=text)
+
+    def _project_user_message(self, item: dict) -> ProjectionResult:
+        # codex's userMessage content is a list of UserInput variants. For
+        # projection purposes we flatten any text fragments and ignore
+        # non-text parts (images, etc.) — Hermes' messages store text only.
+        text_parts: list[str] = []
+        for fragment in item.get("content") or []:
+            if isinstance(fragment, dict):
+                if fragment.get("type") == "text":
+                    text_parts.append(fragment.get("text") or "")
+                elif "text" in fragment:
+                    text_parts.append(str(fragment["text"]))
+        return ProjectionResult(
+            messages=[{"role": "user", "content": "\n".join(text_parts)}]
+        )
+
+    def _project_command(self, item: dict, item_id: str) -> ProjectionResult:
+        call_id = _deterministic_call_id("exec", item_id)
+        args = {
+            "command": item.get("command") or "",
+            "cwd": item.get("cwd") or "",
+        }
+        assistant_msg = {
+            "role": "assistant",
+            "content": None,
+            "tool_calls": [
+                {
+                    "id": call_id,
+                    "type": "function",
+                    "function": {
+                        "name": "exec_command",
+                        "arguments": _format_tool_args(args),
+                    },
+                }
+            ],
+        }
+        if self._pending_reasoning:
+            assistant_msg["reasoning"] = "\n".join(self._pending_reasoning)
+            self._pending_reasoning = []
+        output = item.get("aggregatedOutput") or ""
+        exit_code = item.get("exitCode")
+        if exit_code is not None and exit_code != 0:
+            output = f"[exit {exit_code}]\n{output}"
+        tool_msg = {
+            "role": "tool",
+            "tool_call_id": call_id,
+            "content": output,
+        }
+        return ProjectionResult(
+            messages=[assistant_msg, tool_msg], is_tool_iteration=True
+        )
+
+    def _project_file_change(self, item: dict, item_id: str) -> ProjectionResult:
+        call_id = _deterministic_call_id("apply_patch", item_id)
+        # Reduce the codex changes array to a digest the agent loop will
+        # find readable. We record per-file change kinds (Add/Update/Delete)
+        # without inlining full file contents — those can be huge.
+        changes_summary = []
+        for change in item.get("changes") or []:
+            kind = (change.get("kind") or {}).get("type") or "update"
+            path = change.get("path") or ""
+            changes_summary.append({"kind": kind, "path": path})
+        args = {"changes": changes_summary}
+        assistant_msg = {
+            "role": "assistant",
+            "content": None,
+            "tool_calls": [
+                {
+                    "id": call_id,
+                    "type": "function",
+                    "function": {
+                        "name": "apply_patch",
+                        "arguments": _format_tool_args(args),
+                    },
+                }
+            ],
+        }
+        if self._pending_reasoning:
+            assistant_msg["reasoning"] = "\n".join(self._pending_reasoning)
+            self._pending_reasoning = []
+        status = item.get("status") or "unknown"
+        n = len(changes_summary)
+        tool_msg = {
+            "role": "tool",
+            "tool_call_id": call_id,
+            "content": f"apply_patch status={status}, {n} change(s)",
+        }
+        return ProjectionResult(
+            messages=[assistant_msg, tool_msg], is_tool_iteration=True
+        )
+
+    def _project_mcp_tool_call(self, item: dict, item_id: str) -> ProjectionResult:
+        server = item.get("server") or "mcp"
+        tool = item.get("tool") or "unknown"
+        call_id = _deterministic_call_id(f"mcp_{server}_{tool}", item_id)
+        args = item.get("arguments") or {}
+        if not isinstance(args, dict):
+            args = {"arguments": args}
+        assistant_msg = {
+            "role": "assistant",
+            "content": None,
+            "tool_calls": [
+                {
+                    "id": call_id,
+                    "type": "function",
+                    "function": {
+                        "name": f"mcp.{server}.{tool}",
+                        "arguments": _format_tool_args(args),
+                    },
+                }
+            ],
+        }
+        if self._pending_reasoning:
+            assistant_msg["reasoning"] = "\n".join(self._pending_reasoning)
+            self._pending_reasoning = []
+        result = item.get("result")
+        error = item.get("error")
+        if error:
+            content = f"[error] {json.dumps(error, ensure_ascii=False)[:1000]}"
+        elif result is not None:
+            content = json.dumps(result, ensure_ascii=False)[:4000]
+        else:
+            content = ""
+        tool_msg = {
+            "role": "tool",
+            "tool_call_id": call_id,
+            "content": content,
+        }
+        return ProjectionResult(
+            messages=[assistant_msg, tool_msg], is_tool_iteration=True
+        )
+
+    def _project_dynamic_tool_call(
+        self, item: dict, item_id: str
+    ) -> ProjectionResult:
+        tool = item.get("tool") or "unknown"
+        call_id = _deterministic_call_id(f"dyn_{tool}", item_id)
+        args = item.get("arguments") or {}
+        if not isinstance(args, dict):
+            args = {"arguments": args}
+        assistant_msg = {
+            "role": "assistant",
+            "content": None,
+            "tool_calls": [
+                {
+                    "id": call_id,
+                    "type": "function",
+                    "function": {
+                        "name": tool,
+                        "arguments": _format_tool_args(args),
+                    },
+                }
+            ],
+        }
+        if self._pending_reasoning:
+            assistant_msg["reasoning"] = "\n".join(self._pending_reasoning)
+            self._pending_reasoning = []
+        content_items = item.get("contentItems") or []
+        if isinstance(content_items, list) and content_items:
+            content = json.dumps(content_items, ensure_ascii=False)[:4000]
+        else:
+            success = item.get("success")
+            content = f"success={success}"
+        tool_msg = {
+            "role": "tool",
+            "tool_call_id": call_id,
+            "content": content,
+        }
+        return ProjectionResult(
+            messages=[assistant_msg, tool_msg], is_tool_iteration=True
+        )
+
+    def _project_opaque(self, item: dict, item_type: str) -> ProjectionResult:
+        # Record the existence of the item without inventing tool_calls.
+        # Memory review will see this and may or may not save anything.
+        try:
+            payload = json.dumps(item, ensure_ascii=False)[:1500]
+        except (TypeError, ValueError):
+            payload = repr(item)[:1500]
+        return ProjectionResult(
+            messages=[
+                {
+                    "role": "assistant",
+                    "content": f"[codex {item_type}] {payload}",
+                }
+            ]
+        )
@@ -0,0 +1,225 @@
+"""Hermes-tools-as-MCP server for the codex_app_server runtime.
+
+When the user runs `openai/*` turns through the codex app-server, codex
+owns the loop and builds its own tool list. By default, that means
+Hermes' richer tool surface — web search, browser automation,
+delegate_task subagents, vision analysis, persistent memory, skills,
+cross-session search, image generation, TTS — is unreachable.
+
+This module exposes a curated subset of those Hermes tools to the
+spawned codex subprocess via stdio MCP. Codex registers it as a normal
+MCP server (per `~/.codex/config.toml [mcp_servers.hermes-tools]`) and
+the user gets most of Hermes' tool surface inside a Codex turn.
+
+Scope (what we expose; see EXPOSED_TOOLS below):
+  - web_search, web_extract              — Firecrawl, no codex equivalent
+  - browser_navigate / _click / _type /  — Camofox/Browserbase automation
+    _press / _snapshot / _scroll / _back / _get_images / _console / _vision
+  - vision_analyze                       — image inspection by vision model
+  - image_generate                       — image generation
+  - skill_view, skills_list              — Hermes' skill library
+  - text_to_speech                       — TTS
+  - kanban_*                             — kanban worker/orchestrator handoff
+
+What we DO NOT expose:
+  - terminal / shell                     — codex's own shell tool
+  - read_file / write_file / patch       — codex's apply_patch + shell
+  - search_files / process               — codex's shell
+  - clarify                              — codex's own UX
+  - delegate_task / memory /             — need the running AIAgent context;
+    session_search / todo                  a stateless MCP callback can't dispatch
+
+Run with: python -m agent.transports.hermes_tools_mcp_server
+Spawned by: CodexAppServerSession.ensure_started() when the runtime is
+            active and config opts in.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import os
+import sys
+from typing import Any, Optional
+
+logger = logging.getLogger(__name__)
+
+
+# Tools we expose. Each name MUST match a registered Hermes tool that
+# `model_tools.handle_function_call()` can dispatch.
+#
+# What we deliberately DO NOT expose:
+#   - terminal / shell / read_file / write_file / patch / search_files /
+#     process — codex's built-ins cover these and approval routes through
+#     codex's own UI.
+#   - delegate_task / memory / session_search / todo — these are
+#     `_AGENT_LOOP_TOOLS` in Hermes (model_tools.py:493). They require
+#     the running AIAgent context to dispatch (mid-loop state), so a
+#     stateless MCP callback can't drive them. Hermes' default runtime
+#     keeps these working; the codex_app_server runtime cannot.
+EXPOSED_TOOLS: tuple[str, ...] = (
+    "web_search",
+    "web_extract",
+    "browser_navigate",
+    "browser_click",
+    "browser_type",
+    "browser_press",
+    "browser_snapshot",
+    "browser_scroll",
+    "browser_back",
+    "browser_get_images",
+    "browser_console",
+    "browser_vision",
+    "vision_analyze",
+    "image_generate",
+    "skill_view",
+    "skills_list",
+    "text_to_speech",
+    # Kanban worker handoff tools — gated on HERMES_KANBAN_TASK env var
+    # (set by the kanban dispatcher when spawning a worker). Without these
+    # in the callback, a worker spawned with openai_runtime=codex_app_server
+    # could do the work but couldn't report completion back to the kernel,
+    # making it hang until timeout. Stateless dispatch — they just read
+    # the env var and write to ~/.hermes/kanban.db.
+    "kanban_complete",
+    "kanban_block",
+    "kanban_comment",
+    "kanban_heartbeat",
+    "kanban_show",
+    "kanban_list",
+    # NOTE: kanban_create / kanban_unblock / kanban_link are orchestrator-
+    # only — the kanban tool gates them on HERMES_KANBAN_TASK being unset.
+    # They're exposed here for orchestrator agents running on the codex
+    # runtime that need to dispatch new tasks.
+    "kanban_create",
+    "kanban_unblock",
+    "kanban_link",
+)
+
+
+def _build_server() -> Any:
+    """Create the FastMCP server with Hermes tools attached. Lazy imports
+    so the module can be imported without the mcp package installed
+    (we degrade to a clear error only when actually run)."""
+    try:
+        from mcp.server.fastmcp import FastMCP
+    except ImportError as exc:  # pragma: no cover - install hint
+        raise ImportError(
+            f"hermes-tools MCP server requires the 'mcp' package: {exc}"
+        ) from exc
+
+    # Discover Hermes tools so dispatch works.
+    from model_tools import (
+        get_tool_definitions,
+        handle_function_call,
+    )
+
+    mcp = FastMCP(
+        "hermes-tools",
+        instructions=(
+            "Hermes Agent's tool surface, exposed for use inside a Codex "
+            "session. Use these for capabilities Codex's built-in toolset "
+            "doesn't cover: web search/extract, browser automation, "
+            "vision, image generation, text-to-speech, skills, and "
+            "kanban task handoff."
+        ),
+    )
+
+    # Pull authoritative Hermes tool schemas for the ones we expose, so
+    # MCP clients see the same parameter docs Hermes gives the model.
+    all_defs = {
+        td["function"]["name"]: td["function"]
+        for td in (get_tool_definitions(quiet_mode=True) or [])
+        if isinstance(td, dict) and td.get("type") == "function"
+    }
+
+    exposed_count = 0
+
+    for name in EXPOSED_TOOLS:
+        spec = all_defs.get(name)
+        if spec is None:
+            logger.debug(
+                "skipping %s — not registered in this Hermes process", name
+            )
+            continue
+
+        description = spec.get("description") or f"Hermes {name} tool"
+        params_schema = spec.get("parameters") or {"type": "object", "properties": {}}
+
+        # FastMCP wants a Python callable. Build a closure that takes the
+        # arguments dict, dispatches via handle_function_call, and returns
+        # the result string. We use add_tool() for full control over the
+        # input schema (FastMCP's @tool() decorator inspects type hints,
+        # which we can't get from a JSON schema at runtime).
+        def _make_handler(tool_name: str):
+            def _dispatch(**kwargs: Any) -> str:
+                try:
+                    return handle_function_call(tool_name, kwargs or {})
+                except Exception as exc:
+                    logger.exception("tool %s raised", tool_name)
+                    return json.dumps({"error": str(exc), "tool": tool_name})
+            _dispatch.__name__ = tool_name
+            _dispatch.__doc__ = description
+            return _dispatch
+
+        try:
+            mcp.add_tool(
+                _make_handler(name),
+                name=name,
+                description=description,
+                # NOTE: params_schema (built above) is not forwarded in
+                # this call, so the input schema comes from the handler's
+                # signature, not from the Hermes JSON schema.
+            )
+        except TypeError:
+            # Older mcp SDK signature — fall back to decorator-style.
+            handler = _make_handler(name)
+            handler = mcp.tool(name=name, description=description)(handler)
+
+        exposed_count += 1
+
+    logger.info(
+        "hermes-tools MCP server registered %d/%d tools",
+        exposed_count,
+        len(EXPOSED_TOOLS),
+    )
+    return mcp
+
+
+def main(argv: Optional[list[str]] = None) -> int:
+    """Entry point for `python -m agent.transports.hermes_tools_mcp_server`."""
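+    # Only fall back to sys.argv when no argv was passed; an explicit
+    # empty list (e.g. from tests) must stay empty.
+    argv = sys.argv[1:] if argv is None else argv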
+    verbose = "--verbose" in argv or "-v" in argv
+
+    log_level = logging.INFO if verbose else logging.WARNING
+    logging.basicConfig(
+        level=log_level,
+        stream=sys.stderr,  # MCP uses stdio for protocol — logs MUST go to stderr
+        format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
+    )
+
+    # Quiet mode: keep Hermes' own banners off stdout (which is the MCP wire).
+    os.environ.setdefault("HERMES_QUIET", "1")
+    os.environ.setdefault("HERMES_REDACT_SECRETS", "true")
+
+    try:
+        server = _build_server()
+    except ImportError as exc:
+        sys.stderr.write(f"hermes-tools MCP server cannot start: {exc}\n")
+        return 2
+
+    # FastMCP runs with stdio transport by default when launched as a
+    # subprocess.
+    try:
+        server.run()
+    except KeyboardInterrupt:
+        return 0
+    except Exception as exc:
+        logger.exception("hermes-tools MCP server crashed")
+        sys.stderr.write(f"hermes-tools MCP server error: {exc}\n")
+        return 1
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
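The per-tool handler factory used in `_build_server()` can be looked at in isolation. The following is a minimal sketch, not the module's actual code: `fake_dispatch` is a hypothetical stand-in for `handle_function_call`, and `make_handler` is an illustrative name. It shows why each handler is created through a factory (so it binds its own tool name rather than the loop variable) and how a raised exception becomes a JSON error payload.

```python
import json
from typing import Any, Callable, Dict


def fake_dispatch(tool_name: str, args: Dict[str, Any]) -> str:
    """Hypothetical stand-in for model_tools.handle_function_call."""
    if tool_name == "boom":
        raise RuntimeError("simulated failure")
    return json.dumps({"tool": tool_name, "args": args})


def make_handler(tool_name: str) -> Callable[..., str]:
    """Bind tool_name at creation time so every handler keeps its own name."""

    def _dispatch(**kwargs: Any) -> str:
        try:
            return fake_dispatch(tool_name, kwargs or {})
        except Exception as exc:
            # Errors are returned as a JSON payload instead of propagating.
            return json.dumps({"error": str(exc), "tool": tool_name})

    _dispatch.__name__ = tool_name
    return _dispatch


# One handler per tool name, each with its own bound name.
handlers = {name: make_handler(name) for name in ("web_search", "boom")}
search_result = json.loads(handlers["web_search"](query="hermes"))
boom_result = json.loads(handlers["boom"]())
```

Returning the error as a string keeps the MCP response well-formed even when the underlying tool fails, which is the same trade-off the real `_dispatch` makes.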
@@ -0,0 +1,299 @@
+"""
+Video Generation Provider ABC
+=============================
+
+Defines the pluggable-backend interface for video generation. Providers register
+instances via ``PluginContext.register_video_gen_provider()``; the active one
+(selected via ``video_gen.provider`` in ``config.yaml``) services every
+``video_generate`` tool call.
+
+Providers live in ``<repo>/plugins/video_gen/<name>/`` (built-in, auto-loaded
+as ``kind: backend``) or ``~/.hermes/plugins/video_gen/<name>/`` (user, opt-in
+via ``plugins.enabled``).
+
+Mirrors the ``image_gen`` provider design (``agent/image_gen_provider.py``) so
+the two surfaces stay learnable together.
+
+Unified surface
+---------------
+One tool — ``video_generate`` — covers **text-to-video** and **image-to-video**.
+The router is the presence of ``image_url``: if it's set, the provider routes
+to its image-to-video endpoint; if it's omitted, the provider routes to
+text-to-video. Users pick one **model family** (e.g. Pixverse v6, Veo 3.1,
+Kling O3 Standard); the provider handles which underlying FAL/xAI endpoint
+to hit.
+
+Video edit and video extend are intentionally NOT exposed in this surface —
+the inconsistency across backends is too large for one unified tool. If
+those use cases warrant attention later they can ship as separate tools.
+
+Response shape
+--------------
+All providers return a dict built by :func:`success_response` /
+:func:`error_response`. Keys:
+
+    success         bool
+    video           str | None      URL or absolute file path
+    model           str             provider-specific model identifier
+    prompt          str             echoed prompt
+    modality        str             "text" | "image" (which mode was used); success only
+    aspect_ratio    str             provider-native (e.g. "16:9") or ""
+    duration        int             seconds (0 if not applicable); success only
+    provider        str             provider name (for diagnostics)
+    error           str             only when success=False
+    error_type      str             only when success=False
+"""
+
+from __future__ import annotations
+
+import abc
+import base64
+import datetime
+import logging
+import uuid
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple
+
+logger = logging.getLogger(__name__)
+
+
+# Common aspect ratios across providers (Veo / Kling / xAI / Pixverse). The
+# tool schema advertises this set as an enum hint, but providers may accept
+# a narrower or wider set — they are responsible for clamping.
+COMMON_ASPECT_RATIOS: Tuple[str, ...] = ("16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3")
+DEFAULT_ASPECT_RATIO = "16:9"
+
+COMMON_RESOLUTIONS: Tuple[str, ...] = ("480p", "540p", "720p", "1080p")
+DEFAULT_RESOLUTION = "720p"
+
+
+# ---------------------------------------------------------------------------
+# ABC
+# ---------------------------------------------------------------------------
+
+
+class VideoGenProvider(abc.ABC):
+    """Abstract base class for a video generation backend.
+
+    Subclasses must implement :meth:`generate`. Everything else has sane
+    defaults — override only what your provider needs.
+    """
+
+    @property
+    @abc.abstractmethod
+    def name(self) -> str:
+        """Stable short identifier used in ``video_gen.provider`` config.
+
+        Lowercase, no spaces. Examples: ``xai``, ``fal``, ``google``.
+        """
+
+    @property
+    def display_name(self) -> str:
+        """Human-readable label shown in ``hermes tools``. Defaults to ``name.title()``."""
+        return self.name.title()
+
+    def is_available(self) -> bool:
+        """Return True when this provider can service calls.
+
+        Typically checks for a required API key and optional-dependency
+        import. Default: True.
+        """
+        return True
+
+    def list_models(self) -> List[Dict[str, Any]]:
+        """Return catalog entries for ``hermes tools`` model picker.
+
+        Each entry represents a **model family** that supports text-to-video
+        and/or image-to-video routing internally::
+
+            {
+                "id": "veo-3.1",                       # required
+                "display": "Veo 3.1",                  # optional; defaults to id
+                "speed": "~60s",                       # optional
+                "strengths": "...",                    # optional
+                "price": "$0.20/s",                    # optional
+                "modalities": ["text", "image"],       # optional, advisory
+            }
+
+        Default: empty list (provider has no user-selectable models).
+        """
+        return []
+
+    def get_setup_schema(self) -> Dict[str, Any]:
+        """Return provider metadata for the ``hermes tools`` picker."""
+        return {
+            "name": self.display_name,
+            "badge": "",
+            "tag": "",
+            "env_vars": [],
+        }
+
+    def default_model(self) -> Optional[str]:
+        """Return the default model id, or None if not applicable."""
+        models = self.list_models()
+        if models:
+            return models[0].get("id")
+        return None
+
+    def capabilities(self) -> Dict[str, Any]:
+        """Return what this provider supports.
+
+        Returned dict (all keys optional)::
+
+            {
+                "modalities": ["text", "image"],      # which inputs the backend accepts
+                "aspect_ratios": ["16:9", "9:16", ...],
+                "resolutions": ["720p", "1080p"],
+                "max_duration": 15,                   # seconds
+                "min_duration": 1,
+                "supports_audio": True,
+                "supports_negative_prompt": True,
+                "max_reference_images": 7,
+            }
+
+        Used by the tool layer for soft validation and by ``hermes tools``
+        for the picker. Default: text-only.
+        """
+        return {
+            "modalities": ["text"],
+            "aspect_ratios": list(COMMON_ASPECT_RATIOS),
+            "resolutions": list(COMMON_RESOLUTIONS),
+            "max_duration": 10,
+            "min_duration": 1,
+            "supports_audio": False,
+            "supports_negative_prompt": False,
+            "max_reference_images": 0,
+        }
+
+    @abc.abstractmethod
+    def generate(
+        self,
+        prompt: str,
+        *,
+        model: Optional[str] = None,
+        image_url: Optional[str] = None,
+        reference_image_urls: Optional[List[str]] = None,
+        duration: Optional[int] = None,
+        aspect_ratio: str = DEFAULT_ASPECT_RATIO,
+        resolution: str = DEFAULT_RESOLUTION,
+        negative_prompt: Optional[str] = None,
+        audio: Optional[bool] = None,
+        seed: Optional[int] = None,
+        **kwargs: Any,
+    ) -> Dict[str, Any]:
+        """Generate a video from a prompt (text-to-video) or animate an image
+        (image-to-video).
+
+        Routing: if ``image_url`` is provided, the provider should route to
+        its image-to-video endpoint; otherwise text-to-video. The plugin
+        is responsible for picking the right underlying endpoint within
+        the user's chosen model family.
+
+        Implementations should return the dict from :func:`success_response`
+        or :func:`error_response`. ``kwargs`` may contain forward-compat
+        parameters future versions of the schema will expose —
+        implementations MUST ignore unknown keys (no TypeError).
+        """
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def _videos_cache_dir() -> Path:
+    """Return ``$HERMES_HOME/cache/videos/``, creating parents as needed."""
+    from hermes_constants import get_hermes_home
+
+    path = get_hermes_home() / "cache" / "videos"
+    path.mkdir(parents=True, exist_ok=True)
+    return path
+
+
+def save_b64_video(
+    b64_data: str,
+    *,
+    prefix: str = "video",
+    extension: str = "mp4",
+) -> Path:
+    """Decode base64 video data and write under ``$HERMES_HOME/cache/videos/``.
+
+    Returns the absolute :class:`Path` to the saved file.
+
+    Filename format: ``<prefix>_<YYYYMMDD_HHMMSS>_<short-uuid>.<ext>``.
+    """
+    raw = base64.b64decode(b64_data)
+    ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
+    short = uuid.uuid4().hex[:8]
+    path = _videos_cache_dir() / f"{prefix}_{ts}_{short}.{extension}"
+    path.write_bytes(raw)
+    return path
+
+
+def save_bytes_video(
+    raw: bytes,
+    *,
+    prefix: str = "video",
+    extension: str = "mp4",
+) -> Path:
+    """Write raw video bytes (e.g. an HTTP download body) to the cache."""
+    ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
+    short = uuid.uuid4().hex[:8]
+    path = _videos_cache_dir() / f"{prefix}_{ts}_{short}.{extension}"
+    path.write_bytes(raw)
+    return path
+
+
+def success_response(
+    *,
+    video: str,
+    model: str,
+    prompt: str,
+    modality: str = "text",
+    aspect_ratio: str = "",
+    duration: int = 0,
+    provider: str,
+    extra: Optional[Dict[str, Any]] = None,
+) -> Dict[str, Any]:
+    """Build a uniform success response dict.
+
+    ``video`` may be an HTTP URL or an absolute filesystem path.
+    ``modality`` is ``"text"`` (text-to-video) or ``"image"`` (image-to-video) —
+    indicates which endpoint was actually hit, useful for diagnostics.
+    """
+    payload: Dict[str, Any] = {
+        "success": True,
+        "video": video,
+        "model": model,
+        "prompt": prompt,
+        "modality": modality,
+        "aspect_ratio": aspect_ratio,
+        "duration": int(duration) if duration else 0,
+        "provider": provider,
+    }
+    if extra:
+        for k, v in extra.items():
+            payload.setdefault(k, v)
+    return payload
+
+
+def error_response(
+    *,
+    error: str,
+    error_type: str = "provider_error",
+    provider: str = "",
+    model: str = "",
+    prompt: str = "",
+    aspect_ratio: str = "",
+) -> Dict[str, Any]:
+    """Build a uniform error response dict."""
+    return {
+        "success": False,
+        "video": None,
+        "error": error,
+        "error_type": error_type,
+        "model": model,
+        "prompt": prompt,
+        "aspect_ratio": aspect_ratio,
+        "provider": provider,
+    }
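The routing rule described in the module docstring (image-to-video when `image_url` is set, text-to-video otherwise) can be sketched as follows. This is an illustration under stated assumptions: `DemoProvider`, `build_success`, the `demo-v1` model id, and the `example.invalid` URLs are all hypothetical, and a real provider would subclass `VideoGenProvider` and call `success_response()` instead of the local stand-in.

```python
from typing import Any, Dict, Optional


def build_success(*, video: str, model: str, prompt: str, modality: str,
                  provider: str, duration: int = 0) -> Dict[str, Any]:
    """Local stand-in mirroring the keys produced by success_response()."""
    return {
        "success": True, "video": video, "model": model, "prompt": prompt,
        "modality": modality, "aspect_ratio": "", "duration": duration,
        "provider": provider,
    }


class DemoProvider:
    """Hypothetical provider; a real one would subclass VideoGenProvider."""

    name = "demo"

    def generate(self, prompt: str, *, model: Optional[str] = None,
                 image_url: Optional[str] = None,
                 duration: Optional[int] = None,
                 **kwargs: Any) -> Dict[str, Any]:
        # Route on image_url presence; unknown kwargs are accepted and ignored.
        modality = "image" if image_url else "text"
        endpoint = f"https://example.invalid/{modality}-to-video"  # placeholder
        return build_success(
            video=f"{endpoint}/result.mp4",
            model=model or "demo-v1",
            prompt=prompt,
            modality=modality,
            provider=self.name,
            duration=int(duration) if duration else 0,
        )


provider = DemoProvider()
t2v = provider.generate("a red kite over dunes", duration=5)
i2v = provider.generate("animate this",
                        image_url="https://example.invalid/a.png",
                        future_flag=True)  # unknown key, must not raise
```

Accepting `**kwargs` and discarding what is not understood is what lets the tool schema grow new parameters without breaking older providers.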
@@ -0,0 +1,117 @@
+"""
+Video Generation Provider Registry
+==================================
+
+Central map of registered providers. Populated by plugins at import-time via
+``PluginContext.register_video_gen_provider()``; consumed by the
+``video_generate`` tool to dispatch each call to the active backend.
+
+Active selection
+----------------
+The active provider is chosen by ``video_gen.provider`` in ``config.yaml``.
+If unset (or set to a name that is not registered),
+:func:`get_active_provider` applies fallback logic:
+
+1. If exactly one provider is registered, use it.
+2. Otherwise return ``None`` (the tool surfaces a helpful error pointing
+   the user at ``hermes tools``).
+
+Mirrors ``agent/image_gen_registry.py`` so the two surfaces behave the
+same.
+"""
+
+from __future__ import annotations
+
+import logging
+import threading
+from typing import Dict, List, Optional
+
+from agent.video_gen_provider import VideoGenProvider
+
+logger = logging.getLogger(__name__)
+
+
+_providers: Dict[str, VideoGenProvider] = {}
+_lock = threading.Lock()
+
+
+def register_provider(provider: VideoGenProvider) -> None:
+    """Register a video generation provider.
+
+    Re-registration (same ``name``) overwrites the previous entry and logs
+    a debug message — this makes hot-reload scenarios (tests, dev loops)
+    behave predictably.
+    """
+    if not isinstance(provider, VideoGenProvider):
+        raise TypeError(
+            f"register_provider() expects a VideoGenProvider instance, "
+            f"got {type(provider).__name__}"
+        )
+    name = provider.name
+    if not isinstance(name, str) or not name.strip():
+        raise ValueError("Video gen provider .name must be a non-empty string")
+    with _lock:
+        existing = _providers.get(name)
+        _providers[name] = provider
+    if existing is not None:
+        logger.debug("Video gen provider '%s' re-registered (was %r)", name, type(existing).__name__)
+    else:
+        logger.debug("Registered video gen provider '%s' (%s)", name, type(provider).__name__)
+
+
+def list_providers() -> List[VideoGenProvider]:
+    """Return all registered providers, sorted by name."""
+    with _lock:
+        items = list(_providers.values())
+    return sorted(items, key=lambda p: p.name)
+
+
+def get_provider(name: str) -> Optional[VideoGenProvider]:
+    """Return the provider registered under *name*, or None."""
+    if not isinstance(name, str):
+        return None
+    with _lock:
+        return _providers.get(name.strip())
+
+
+def get_active_provider() -> Optional[VideoGenProvider]:
+    """Resolve the currently-active provider.
+
+    Reads ``video_gen.provider`` from config.yaml; falls back per the
+    module docstring.
+    """
+    configured: Optional[str] = None
+    try:
+        from hermes_cli.config import load_config
+
+        cfg = load_config()
+        section = cfg.get("video_gen") if isinstance(cfg, dict) else None
+        if isinstance(section, dict):
+            raw = section.get("provider")
+            if isinstance(raw, str) and raw.strip():
+                configured = raw.strip()
+    except Exception as exc:
+        logger.debug("Could not read video_gen.provider from config: %s", exc)
+
+    with _lock:
+        snapshot = dict(_providers)
+
+    if configured:
+        provider = snapshot.get(configured)
+        if provider is not None:
+            return provider
+        logger.debug(
+            "video_gen.provider='%s' configured but not registered; falling back",
+            configured,
+        )
+
+    # Fallback: single-provider case
+    if len(snapshot) == 1:
+        return next(iter(snapshot.values()))
+
+    return None
+
+
+def _reset_for_tests() -> None:
+    """Clear the registry. **Test-only.**"""
+    with _lock:
+        _providers.clear()
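The selection rules in `get_active_provider()` reduce to a small pure function once config loading and locking are set aside. The sketch below is a simplified restatement, not the registry's code: `resolve_active` is a hypothetical name, and the providers are plain `object()` placeholders rather than `VideoGenProvider` instances.

```python
from typing import Dict, Optional


def resolve_active(configured: Optional[str],
                   providers: Dict[str, object]) -> Optional[object]:
    """Configured name wins if registered; else a lone provider; else None."""
    if configured:
        hit = providers.get(configured.strip())
        if hit is not None:
            return hit
        # Configured but not registered: fall through to the fallback.
    if len(providers) == 1:
        return next(iter(providers.values()))
    return None


fal, xai = object(), object()

explicit = resolve_active("xai", {"fal": fal, "xai": xai})
single = resolve_active(None, {"fal": fal})
stale_config = resolve_active("missing", {"fal": fal})
ambiguous = resolve_active(None, {"fal": fal, "xai": xai})
empty = resolve_active(None, {})
```

Note the `stale_config` case: a configured name that is not registered still resolves when exactly one provider exists, which matches the "configured but not registered; falling back" debug log in the real function.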
@@ -0,0 +1,221 @@
+"""
+Web Search Provider ABC
+=======================
+
+Defines the pluggable-backend interface for web search and content extraction.
+Providers register instances via ``PluginContext.register_web_search_provider()``;
+the active one (selected via ``web.search_backend`` / ``web.extract_backend`` /
+``web.backend`` in ``config.yaml``) services every ``web_search`` /
+``web_extract`` tool call.
+
+Providers live in ``<repo>/plugins/web/<name>/`` (built-in, auto-loaded as
+``kind: backend``) or ``~/.hermes/plugins/web/<name>/`` (user, opt-in via
+``plugins.enabled``).
+
+This ABC is the SINGLE plugin-facing surface for web providers — every
+provider in the tree (brave-free, ddgs, searxng, exa, parallel, tavily,
+firecrawl) implements it. The legacy in-tree ``tools.web_providers.base``
+ABCs were deleted in PR #25182 along with the per-vendor inline helpers
+in ``tools/web_tools.py``; the response-shape contract documented below
+is preserved bit-for-bit so the tool wrapper does not have to translate.
+
+Response shape (preserved from the legacy contract):
+
+Search results::
+
+    {
+        "success": True,
+        "data": {
+            "web": [
+                {"title": str, "url": str, "description": str, "position": int},
+                ...
+            ]
+        }
+    }
+
+Extract results::
+
+    {
+        "success": True,
+        "data": [
+            {"url": str, "title": str, "content": str,
+             "raw_content": str, "metadata": dict},
+            ...
+        ]
+    }
+
+On failure (either capability)::
+
+    {"success": False, "error": str}
+"""
+
+from __future__ import annotations
+
+import abc
+from typing import Any, Dict, List
+
+
+# ---------------------------------------------------------------------------
+# ABC
+# ---------------------------------------------------------------------------
+
+
+class WebSearchProvider(abc.ABC):
+    """Abstract base class for a web search/extract/crawl backend.
+
+    Subclasses must implement :attr:`name` and :meth:`is_available`, plus at
+    least one of :meth:`search` / :meth:`extract` / :meth:`crawl`. The
+    :meth:`supports_search` / :meth:`supports_extract` / :meth:`supports_crawl`
+    capability flags let the registry route each tool call to the right
+    provider, and let multi-capability providers (Firecrawl, Tavily, Exa,
+    …) advertise multiple capabilities from a single class.
+    """
+
+    @property
+    @abc.abstractmethod
+    def name(self) -> str:
+        """Stable short identifier used in ``web.search_backend`` /
+        ``web.extract_backend`` / ``web.backend`` config keys.
+
+        Lowercase, no spaces; hyphens permitted to preserve existing
+        user-visible names. Examples: ``brave-free``, ``ddgs``,
+        ``searxng``, ``firecrawl``.
+        """
+
+    @property
+    def display_name(self) -> str:
+        """Human-readable label shown in ``hermes tools``. Defaults to ``name``."""
+        return self.name
+
+    @abc.abstractmethod
+    def is_available(self) -> bool:
+        """Return True when this provider can service calls.
+
+        Typically a cheap check (env var present, optional Python dep
+        importable, instance URL set). Must NOT make network calls — this
+        runs at tool-registration time and on every ``hermes tools`` paint.
+        """
+
+    def supports_search(self) -> bool:
+        """Return True if this provider implements :meth:`search`."""
+        return True
+
+    def supports_extract(self) -> bool:
+        """Return True if this provider implements :meth:`extract`.
+
+        Both sync and async :meth:`extract` implementations are valid — the
+        dispatcher detects coroutine functions via
+        :func:`inspect.iscoroutinefunction` and awaits as needed. Sync
+        implementations that perform blocking I/O (HTTP, SDK calls) should
+        ideally wrap in :func:`asyncio.to_thread` at the call site; small
+        providers can keep their sync shape and let the dispatcher handle
+        threading.
+        """
+        return False
+
+    def supports_crawl(self) -> bool:
+        """Return True if this provider implements :meth:`crawl`.
+
+        Crawl differs from extract in that the agent provides a *seed URL*
+        and the provider walks linked pages on its own — useful for
+        documentation sites where the agent doesn't know all relevant
+        URLs upfront. Tavily is the only built-in backend that natively
+        crawls today; Firecrawl provides a similar capability that we
+        don't currently surface as a tool.
+
+        Providers that don't crawl should leave this as False; the
+        dispatcher in :func:`tools.web_tools.web_crawl_tool` will fall
+        back to its auxiliary-model summarization path.
+        """
+        return False
+
+    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
+        """Execute a web search.
+
+        Override when :meth:`supports_search` returns True. The default
+        raises NotImplementedError; callers should gate on
+        :meth:`supports_search` before calling.
+        """
+        raise NotImplementedError(
+            f"{self.name} does not support search (override supports_search)"
+        )
+
+    def extract(self, urls: List[str], **kwargs: Any) -> Any:
+        """Extract content from one or more URLs.
+
+        Override when :meth:`supports_extract` returns True. The default
+        raises NotImplementedError; callers should gate on
+        :meth:`supports_extract` before calling.
+
+        Return shape: a list of result dicts matching what the legacy
+        :func:`tools.web_tools.web_extract_tool` post-processing pipeline
+        expects::
+
+            [
+                {
+                    "url": str,
+                    "title": str,
+                    "content": str,
+                    "raw_content": str,
+                    "metadata": dict,           # optional
+                    "error": str,               # optional, only on per-URL failure
+                },
+                ...
+            ]
+
+        Implementations MAY be ``async def`` — the dispatcher detects
+        coroutines via :func:`inspect.iscoroutinefunction` and awaits.
+
+        ``kwargs`` may carry forward-compat fields (``format``, ``include_raw``,
+        ``max_chars``) — implementations should ignore unknown keys.
+        """
+        raise NotImplementedError(
+            f"{self.name} does not support extract (override supports_extract)"
+        )
+
+    def crawl(self, url: str, **kwargs: Any) -> Any:
+        """Crawl a seed URL and return results.
+
+        Override when :meth:`supports_crawl` returns True. The default
+        raises NotImplementedError; callers should gate on
+        :meth:`supports_crawl` before calling.
+
+        Return shape: ``{"results": [{"url": str, "title": str,
+        "content": str, ...}, ...]}`` matching what
+        :func:`tools.web_tools.web_crawl_tool` post-processing expects.
+
+        Implementations MAY be ``async def``.
+
+        ``kwargs`` may carry forward-compat fields (e.g. ``max_depth``,
+        ``include_domains``) — implementations should ignore unknown keys.
+        """
+        raise NotImplementedError(
+            f"{self.name} does not support crawl (override supports_crawl)"
+        )
+
+    def get_setup_schema(self) -> Dict[str, Any]:
+        """Return provider metadata for the ``hermes tools`` picker.
+
+        Used by ``hermes_cli/tools_config.py`` to inject this provider as a
+        row in the Web Search / Web Extract picker. Shape::
+
+            {
+                "name": "Brave Search (Free)",
+                "badge": "free",
+                "tag": "No paid tier needed — uses Brave's free API.",
+                "env_vars": [
+                    {"key": "BRAVE_SEARCH_API_KEY",
+                     "prompt": "Brave Search API key",
+                     "url": "https://brave.com/search/api/"},
+                ],
+            }
+
+        Default: minimal entry derived from ``display_name``. Override to
+        expose API key prompts, badges, and instance URL fields.
+        """
+        return {
+            "name": self.display_name,
+            "badge": "",
+            "tag": "",
+            "env_vars": [],
+        }
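The docstrings above say that `extract` may be sync or async and that the dispatcher detects coroutine functions via `inspect.iscoroutinefunction`. A minimal sketch of that dispatch pattern is shown here; `dispatch_extract`, `SyncExtractor`, and `AsyncExtractor` are hypothetical names, and the real dispatcher in `tools.web_tools` may differ in how it handles threading.

```python
import asyncio
import inspect
from typing import Any, Dict, List


class SyncExtractor:
    """Hypothetical provider with a blocking extract()."""

    def extract(self, urls: List[str], **kwargs: Any) -> List[Dict[str, Any]]:
        return [{"url": u, "title": "", "content": "sync", "raw_content": ""}
                for u in urls]


class AsyncExtractor:
    """Hypothetical provider with a coroutine extract()."""

    async def extract(self, urls: List[str], **kwargs: Any) -> List[Dict[str, Any]]:
        await asyncio.sleep(0)  # stands in for real async I/O
        return [{"url": u, "title": "", "content": "async", "raw_content": ""}
                for u in urls]


async def dispatch_extract(provider: Any, urls: List[str],
                           **kwargs: Any) -> List[Dict[str, Any]]:
    """Await coroutine implementations; run sync ones in a worker thread."""
    if inspect.iscoroutinefunction(provider.extract):
        return await provider.extract(urls, **kwargs)
    return await asyncio.to_thread(provider.extract, urls, **kwargs)


sync_out = asyncio.run(
    dispatch_extract(SyncExtractor(), ["https://example.invalid/a"]))
async_out = asyncio.run(
    dispatch_extract(AsyncExtractor(), ["https://example.invalid/b"]))
```

Running the sync path through `asyncio.to_thread` keeps a blocking HTTP or SDK call from stalling the event loop, which is the concern the `supports_extract` docstring raises.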
@@ -0,0 +1,262 @@
+"""
+Web Search Provider Registry
+============================
+
+Central map of registered web providers. Populated by plugins at import-time
+via :meth:`PluginContext.register_web_search_provider`; consumed by the
+``web_search`` and ``web_extract`` tool wrappers in :mod:`tools.web_tools` to
+dispatch each call to the active backend.
+
+Active selection
+----------------
+The active provider is chosen by configuration with this precedence:
+
+1. ``web.search_backend`` / ``web.extract_backend`` / ``web.crawl_backend``
+   (per-capability override).
+2. ``web.backend`` (shared fallback).
+3. If exactly one capability-eligible provider is registered AND available,
+   use it.
+4. Legacy preference order — ``firecrawl`` → ``parallel`` → ``tavily`` →
+   ``exa`` → ``searxng`` → ``brave-free`` → ``ddgs`` — filtered by
+   availability. Matches the historic ``tools.web_tools._get_backend()``
+   candidate order so installs that never set a config key keep landing
+   on the same provider they did before the plugin migration.
+5. Otherwise ``None`` — the tool surfaces a helpful error pointing at
+   ``hermes tools``.
+
+The capability filter (``supports_search`` / ``supports_extract`` /
+``supports_crawl``) is applied at every step so a search-only provider
+(``brave-free``) configured as ``web.extract_backend`` correctly falls
+through to an extract-capable backend.
+"""
+
+from __future__ import annotations
+
+import logging
+import threading
+from typing import Dict, List, Optional
+
+from agent.web_search_provider import WebSearchProvider
+
+logger = logging.getLogger(__name__)
+
+
+_providers: Dict[str, WebSearchProvider] = {}
+_lock = threading.Lock()
+
+
+def register_provider(provider: WebSearchProvider) -> None:
+    """Register a web search/extract provider.
+
+    Re-registration (same ``name``) overwrites the previous entry and logs
+    a debug message — makes hot-reload scenarios (tests, dev loops) behave
+    predictably.
+    """
+    if not isinstance(provider, WebSearchProvider):
+        raise TypeError(
+            f"register_provider() expects a WebSearchProvider instance, "
+            f"got {type(provider).__name__}"
+        )
+    name = provider.name
+    if not isinstance(name, str) or not name.strip():
+        raise ValueError("Web provider .name must be a non-empty string")
+    with _lock:
+        existing = _providers.get(name)
+        _providers[name] = provider
+    if existing is not None:
+        logger.debug(
+            "Web provider '%s' re-registered (was %r)",
+            name, type(existing).__name__,
+        )
+    else:
+        logger.debug(
+            "Registered web provider '%s' (%s)",
+            name, type(provider).__name__,
+        )
+
+
+def list_providers() -> List[WebSearchProvider]:
+    """Return all registered providers, sorted by name."""
+    with _lock:
+        items = list(_providers.values())
+    return sorted(items, key=lambda p: p.name)
+
+
+def get_provider(name: str) -> Optional[WebSearchProvider]:
+    """Return the provider registered under *name*, or None."""
+    if not isinstance(name, str):
+        return None
+    with _lock:
+        return _providers.get(name.strip())
+
+
+# ---------------------------------------------------------------------------
+# Active-provider resolution
+# ---------------------------------------------------------------------------
+
+
+def _read_config_key(*path: str) -> Optional[str]:
+    """Resolve a dotted config key from ``config.yaml``. Returns None on miss."""
+    try:
+        from hermes_cli.config import load_config
+
+        cfg = load_config()
+        cur = cfg
+        for segment in path:
+            if not isinstance(cur, dict):
+                return None
+            cur = cur.get(segment)
+        if isinstance(cur, str) and cur.strip():
+            return cur.strip()
+    except Exception as exc:
+        logger.debug("Could not read config %s: %s", ".".join(path), exc)
+    return None
+
+
+# Legacy preference order — preserves behaviour for users who set no
+# ``web.backend`` / ``web.<capability>_backend`` config key at all. Matches
+# the historic candidate order in :func:`tools.web_tools._get_backend`
+# (paid providers first so existing paid setups don't get downgraded to
+# a free tier on upgrade). Filtered by ``is_available()`` at walk time so
+# we don't surface a provider the user has no credentials for.
+_LEGACY_PREFERENCE = (
+    "firecrawl",
+    "parallel",
+    "tavily",
+    "exa",
+    "searxng",
+    "brave-free",
+    "ddgs",
+)
+
+
+def _resolve(configured: Optional[str], *, capability: str) -> Optional[WebSearchProvider]:
+    """Resolve the active provider for a capability ("search" | "extract" | "crawl").
+
+    Resolution rules (in order):
+
+    1. **Explicit config wins, ignoring availability.** If
+       ``web.{capability}_backend`` or ``web.backend`` names a registered
+       provider that supports *capability*, return it even if its
+       :meth:`is_available` returns False — the dispatcher will surface a
+       precise "X_API_KEY is not set" error to the user instead of silently
+       routing somewhere else. Matches legacy
+       :func:`tools.web_tools._get_backend` behavior for configured names.
+
+    2. **Single-provider shortcut.** When only one registered provider
+       supports *capability* AND ``is_available()`` reports True, return it.
+
+    3. **Legacy preference walk, filtered by availability.** Walk the
+       :data:`_LEGACY_PREFERENCE` order (firecrawl → parallel → tavily →
+       exa → searxng → brave-free → ddgs) looking for a provider whose
+       ``supports_<capability>()`` is True AND whose ``is_available()`` is
+       True. Matches the historic ``tools.web_tools._get_backend()``
+       candidate order so users with credentials but no explicit config
+       key keep landing on the same provider as pre-migration. This is
+       the path that fires when no config key is set — pick the
+       highest-priority backend the user actually has credentials for.
+
+    Returns None when no provider is configured AND no available provider
+    matches the legacy preference; the dispatcher then returns a "set up a
+    provider" error to the user.
+    """
+    with _lock:
+        snapshot = dict(_providers)
+
+    def _capable(p: WebSearchProvider) -> bool:
+        if capability == "search":
+            return bool(p.supports_search())
+        if capability == "extract":
+            return bool(p.supports_extract())
+        if capability == "crawl":
+            return bool(p.supports_crawl())
+        return False
+
+    def _is_available_safe(p: WebSearchProvider) -> bool:
+        """Wrap ``is_available()`` so a buggy provider doesn't kill resolution."""
+        try:
+            return bool(p.is_available())
+        except Exception as exc:  # noqa: BLE001
+            logger.debug("provider %s.is_available() raised %s", p.name, exc)
+            return False
+
+    # 1. Explicit config wins — return regardless of is_available() so the
+    #    user gets a precise downstream error message rather than a silent
+    #    backend switch. Matches _get_backend() in web_tools.py.
+    if configured:
+        provider = snapshot.get(configured)
+        if provider is not None and _capable(provider):
+            return provider
+        if provider is None:
+            logger.debug(
+                "web backend '%s' configured but not registered; falling back",
+                configured,
+            )
+        else:
+            logger.debug(
+                "web backend '%s' configured but does not support '%s'; falling back",
+                configured, capability,
+            )
+
+    # 2. + 3. Fallback path — filter by availability so we don't surface
+    #    a provider the user has no credentials for. Without this filter,
+    #    a registered-but-unconfigured provider could end up "active" on
+    #    a fresh install with no API keys at all.
+    eligible = [
+        p for p in snapshot.values()
+        if _capable(p) and _is_available_safe(p)
+    ]
+    if len(eligible) == 1:
+        return eligible[0]
+
+    for legacy in _LEGACY_PREFERENCE:
+        provider = snapshot.get(legacy)
+        if (
+            provider is not None
+            and _capable(provider)
+            and _is_available_safe(provider)
+        ):
+            return provider
+
+    return None
+
+
+def get_active_search_provider() -> Optional[WebSearchProvider]:
+    """Resolve the currently-active web search provider.
+
+    Reads ``web.search_backend`` (preferred) or ``web.backend`` (shared
+    fallback) from config.yaml; falls back per the module docstring.
+    """
+    explicit = _read_config_key("web", "search_backend") or _read_config_key("web", "backend")
+    return _resolve(explicit, capability="search")
+
+
+def get_active_extract_provider() -> Optional[WebSearchProvider]:
+    """Resolve the currently-active web extract provider.
+
+    Reads ``web.extract_backend`` (preferred) or ``web.backend`` (shared
+    fallback) from config.yaml; falls back per the module docstring.
+    """
+    explicit = _read_config_key("web", "extract_backend") or _read_config_key("web", "backend")
+    return _resolve(explicit, capability="extract")
+
+
+def get_active_crawl_provider() -> Optional[WebSearchProvider]:
+    """Resolve the currently-active web crawl provider.
+
+    Reads ``web.crawl_backend`` (preferred) or ``web.backend`` (shared
+    fallback) from config.yaml; falls back per the module docstring.
+
+    Crawl is a niche capability — among built-in providers only Tavily and
+    Firecrawl implement it. Callers should expect ``None`` and fall back to
+    a different strategy (e.g. summarize-via-LLM) when neither is
+    configured.
+    """
+    explicit = _read_config_key("web", "crawl_backend") or _read_config_key("web", "backend")
+    return _resolve(explicit, capability="crawl")
+
+
+def _reset_for_tests() -> None:
+    """Clear the registry. **Test-only.**"""
+    with _lock:
+        _providers.clear()
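
Review note: the three resolution rules documented in `_resolve` can be exercised in isolation. The sketch below is a minimal illustration restricted to the "search" capability; `FakeProvider`, `PREFERENCE`, and `resolve_search` are hypothetical names invented for this example and are not part of the patch. It assumes a provider exposes only `name`, `is_available()`, and `supports_search()`.

```python
from typing import Dict, Optional, Sequence


class FakeProvider:
    """Hypothetical stand-in for WebSearchProvider (illustration only)."""

    def __init__(self, name: str, *, available: bool, search: bool = True) -> None:
        self.name = name
        self._available = available
        self._search = search

    def is_available(self) -> bool:
        return self._available

    def supports_search(self) -> bool:
        return self._search


# Same order as _LEGACY_PREFERENCE in the patch.
PREFERENCE: Sequence[str] = (
    "firecrawl", "parallel", "tavily", "exa", "searxng", "brave-free", "ddgs",
)


def resolve_search(
    configured: Optional[str], providers: Dict[str, FakeProvider]
) -> Optional[FakeProvider]:
    # Rule 1: an explicitly configured, capable provider wins even if unavailable.
    if configured:
        chosen = providers.get(configured)
        if chosen is not None and chosen.supports_search():
            return chosen
    # Rule 2: exactly one capable AND available provider.
    eligible = [
        p for p in providers.values() if p.supports_search() and p.is_available()
    ]
    if len(eligible) == 1:
        return eligible[0]
    # Rule 3: first capable AND available provider in preference order.
    for name in PREFERENCE:
        p = providers.get(name)
        if p is not None and p.supports_search() and p.is_available():
            return p
    return None


registry = {
    "tavily": FakeProvider("tavily", available=False),
    "exa": FakeProvider("exa", available=True),
    "ddgs": FakeProvider("ddgs", available=True),
}
explicit = resolve_search("tavily", registry)  # configured, so returned despite no credentials
fallback = resolve_search(None, registry)      # nothing configured, so preference walk applies
```

With the registry above, the explicit lookup returns the unavailable `tavily` entry (so the caller can report the missing key), while the unconfigured lookup skips it and lands on `exa`.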
@@ -364,6 +364,18 @@ compression:
  # compression of older turns.
  protect_last_n: 20

+  # Number of non-system messages to protect at the head of the transcript, in
+  # ADDITION to the system prompt (which is always implicitly protected).
+  # Head messages are NEVER summarized — they survive every compression
+  # indefinitely. This gives stable early context for short/medium sessions,
+  # but in long-running sessions that rely on rolling compaction the pinned
+  # opening turns may not match how you want the session framed over time.
+  # Set to 0 to preserve ONLY the system prompt (plus the rolling summary
+  # and recent tail) — the cleanest configuration for long-running sessions.
+  # Default 3 preserves the system prompt plus the first three non-system
+  # head messages, matching the pre-feature behaviour.
+  protect_first_n: 3
+
  # To pin a specific model/provider for compression summaries, use the
  # auxiliary section below (auxiliary.compression.provider / model).
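
Review note: the interaction between `protect_first_n` and `protect_last_n` may be easier to reason about with a concrete partition. The compressor itself is not part of this hunk, so the sketch below is only an assumption about how the two settings could divide a transcript; `partition_for_compression` is a hypothetical helper, not a function from the codebase.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def partition_for_compression(
    messages: List[Message],
    protect_first_n: int = 3,
    protect_last_n: int = 20,
) -> Tuple[List[Message], List[Message], List[Message]]:
    """Split a transcript into (head, middle, tail).

    Assumed semantics: system messages are always kept in the head, the
    first ``protect_first_n`` non-system messages join them, the last
    ``protect_last_n`` non-system messages form the tail, and only the
    middle is eligible for summarization.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    head_n = max(0, min(protect_first_n, len(rest)))
    tail_n = max(0, min(protect_last_n, len(rest) - head_n))
    head = system + rest[:head_n]
    middle = rest[head_n:len(rest) - tail_n]
    tail = rest[len(rest) - tail_n:]
    return head, middle, tail


messages = [{"role": "system", "content": "sys"}] + [
    {"role": "user", "content": f"m{i}"} for i in range(10)
]
head, middle, tail = partition_for_compression(messages, 3, 4)
head_zero, middle_zero, _ = partition_for_compression(messages, 0, 4)
```

Under these assumptions, setting `protect_first_n: 0` leaves only the system prompt pinned, and the opening turns move into the summarizable middle.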

@@ -1242,7 +1242,13 @@ _STREAM_PAD = "    "  # 4-space indent for streamed response text (matches Panel


 def _hex_to_ansi(hex_color: str, *, bold: bool = False) -> str:
-    """Convert a hex color like '#268bd2' to a true-color ANSI escape."""
+    """Convert a hex color like '#268bd2' to a true-color ANSI escape.
+
+    Auto-remaps known dark-mode-tuned colors to readable light-mode
+    equivalents when running on a light terminal (see
+    _maybe_remap_for_light_mode + _LIGHT_MODE_REMAP).
+    """
+    hex_color = _maybe_remap_for_light_mode(hex_color)
    try:
        r = int(hex_color[1:3], 16)
        g = int(hex_color[3:5], 16)
@@ -1253,6 +1259,250 @@ def _hex_to_ansi(hex_color: str, *, bold: bool = False) -> str:
        return _ACCENT_ANSI_DEFAULT if bold else "\033[38;2;184;134;11m"


+# ────────────────────────────────────────────────────────────────────────
+# Light/dark terminal mode detection.
+#
+# Mirrors ui-tui/src/theme.ts detectLightMode().  Used to decide whether
+# to remap "near-white" skin colors (e.g. #FFF8DC banner_text, #B8860B
+# banner_dim) to darker equivalents that are readable on a light
+# Terminal.app / iTerm2 background.
+#
+# Detection priority:
+#   1. HERMES_LIGHT / HERMES_TUI_LIGHT env (true/false) — explicit override
+#   2. HERMES_TUI_THEME=light|dark — explicit theme
+#   3. HERMES_TUI_BACKGROUND=#RRGGBB — explicit bg hint
+#   4. COLORFGBG env (set by xterm/Konsole/urxvt) — bg slot 7/15 = light
+#   5. OSC 11 query (\x1b]11;?\x1b\\) — ask the terminal directly
+#   6. TERM_PROGRAM allow-list (currently empty)
+#   7. Default: assume dark (matches the legacy Hermes assumption)
+#
+# Cached after first call so we don't query the terminal repeatedly.
+_LIGHT_MODE_CACHE: bool | None = None
+_TRUE_RE = re.compile(r"^(1|true|on|yes|y)$")
+_FALSE_RE = re.compile(r"^(0|false|off|no|n)$")
+_LIGHT_DEFAULT_TERM_PROGRAMS = frozenset()  # Apple_Terminal doesn't reliably indicate; require explicit
+
+
+def _luminance_from_hex(hex_str: str) -> float | None:
+    s = (hex_str or "").strip().lstrip("#")
+    if len(s) == 3:
+        s = "".join(c * 2 for c in s)
+    if len(s) != 6 or not all(c in "0123456789abcdefABCDEF" for c in s):
+        return None
+    try:
+        r, g, b = int(s[0:2], 16), int(s[2:4], 16), int(s[4:6], 16)
+    except ValueError:
+        return None
+    # Rec.709 luma
+    return (0.2126 * r + 0.7152 * g + 0.0722 * b) / 255.0
+
+
+def _query_osc11_background() -> str | None:
+    """Ask the terminal for its background color via OSC 11.
+
+    Most modern terminals reply with \x1b]11;rgb:RRRR/GGGG/BBBB\x1b\\
+    within a few ms.  We wait up to 100ms total before giving up.
+    Returns "#RRGGBB" or None on timeout / non-tty.
+    """
+    if not sys.stdin.isatty() or not sys.stdout.isatty():
+        return None
+    try:
+        import termios
+        import tty
+        fd = sys.stdin.fileno()
+        old = termios.tcgetattr(fd)
+    except Exception:
+        return None
+    try:
+        try:
+            tty.setcbreak(fd)
+        except Exception:
+            return None
+        try:
+            sys.stdout.write("\x1b]11;?\x1b\\")
+            sys.stdout.flush()
+        except Exception:
+            return None
+        # Wait up to ~100ms for the response (see deadline below)
+        import select
+        deadline = time.monotonic() + 0.1
+        buf = b""
+        while time.monotonic() < deadline:
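+            # Clamp to 0 so a deadline that expires between the loop check
+            # and this call cannot pass a negative timeout (ValueError).
+            r, _, _ = select.select([fd], [], [], max(0.0, deadline - time.monotonic()))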
+            if not r:
+                continue
+            try:
+                chunk = os.read(fd, 64)
+            except OSError:
+                break
+            if not chunk:
+                break
+            buf += chunk
+            if b"\x1b\\" in buf or b"\x07" in buf:
+                break
+        # Parse: \x1b]11;rgb:RRRR/GGGG/BBBB\x1b\\
+        m = re.search(rb"rgb:([0-9a-fA-F]+)/([0-9a-fA-F]+)/([0-9a-fA-F]+)", buf)
+        if not m:
+            return None
+        # Each component is 1-4 hex digits — normalize to 8-bit
+        def norm(h: bytes) -> int:
+            v = int(h, 16)
+            # Scale to 0-255 based on hex length
+            bits = len(h) * 4
+            return (v * 255) // ((1 << bits) - 1) if bits else 0
+        r, g, b = norm(m.group(1)), norm(m.group(2)), norm(m.group(3))
+        return f"#{r:02X}{g:02X}{b:02X}"
+    finally:
+        try:
+            termios.tcsetattr(fd, termios.TCSANOW, old)
+        except Exception:
+            pass
+
+
+def _detect_light_mode() -> bool:
+    global _LIGHT_MODE_CACHE
+    if _LIGHT_MODE_CACHE is not None:
+        return _LIGHT_MODE_CACHE
+    result = False
+    try:
+        # 1. Explicit env override
+        for var in ("HERMES_LIGHT", "HERMES_TUI_LIGHT"):
+            v = (os.environ.get(var) or "").strip().lower()
+            if _TRUE_RE.match(v):
+                result = True
+                _LIGHT_MODE_CACHE = result
+                return result
+            if _FALSE_RE.match(v):
+                _LIGHT_MODE_CACHE = result
+                return result
+        # 2. Theme hint
+        theme = (os.environ.get("HERMES_TUI_THEME") or "").strip().lower()
+        if theme == "light":
+            result = True
+            _LIGHT_MODE_CACHE = result
+            return result
+        if theme == "dark":
+            _LIGHT_MODE_CACHE = result
+            return result
+        # 3. Explicit bg hex
+        bg_hint = os.environ.get("HERMES_TUI_BACKGROUND") or ""
+        bg_lum = _luminance_from_hex(bg_hint)
+        if bg_lum is not None:
+            result = bg_lum >= 0.5
+            _LIGHT_MODE_CACHE = result
+            return result
+        # 4. COLORFGBG (xterm/Konsole/urxvt)
+        cfgbg = (os.environ.get("COLORFGBG") or "").strip()
+        if cfgbg:
+            last = cfgbg.split(";")[-1] if ";" in cfgbg else cfgbg
+            if last.isdigit():
+                bg = int(last)
+                if bg in (7, 15):
+                    result = True
+                    _LIGHT_MODE_CACHE = result
+                    return result
+                if 0 <= bg < 16:
+                    _LIGHT_MODE_CACHE = result
+                    return result
+        # 5. OSC 11 query (best-effort, only when stdin/stdout are TTY)
+        bg_color = _query_osc11_background()
+        if bg_color:
+            lum = _luminance_from_hex(bg_color)
+            if lum is not None:
+                result = lum >= 0.5
+                _LIGHT_MODE_CACHE = result
+                return result
+        # 6. TERM_PROGRAM allow-list (currently empty)
+        tp = (os.environ.get("TERM_PROGRAM") or "").strip()
+        if tp in _LIGHT_DEFAULT_TERM_PROGRAMS:
+            result = True
+    except Exception:
+        result = False
+    _LIGHT_MODE_CACHE = result
+    return result
+
+
+# Light-mode equivalents of skin colors that are unreadable on cream
+# Terminal.app backgrounds.  Applied by _maybe_remap_for_light_mode
+# (called from _hex_to_ansi and the SkinConfig.get_color hook) at
+# resolution time when light mode is detected.
+#
+# IMPORTANT: only remap colors that are used as STANDALONE foregrounds
+# on the terminal's background.  Don't remap colors that are paired
+# with a dark bg (e.g. status bar text on bg:#1a1a2e) — those would
+# become invisible the OTHER direction (dark gray on dark navy).
+_LIGHT_MODE_REMAP: dict[str, str] = {
+    # Original (dark-mode) -> Light-mode replacement (darker, readable)
+    "#FFF8DC": "#1A1A1A",   # cornsilk -> near-black
+    "#FFD700": "#9A6B00",   # gold -> dark goldenrod (readable on cream)
+    "#FFBF00": "#8A5A00",   # amber -> dark amber
+    "#B8860B": "#5C4500",   # dark goldenrod -> deeper brown (more contrast)
+    "#DAA520": "#6B4F00",   # goldenrod -> dark olive
+    "#F1E6CF": "#1A1A1A",   # cream -> near-black
+    "#c9d1d9": "#24292F",   # github-light fg
+    "#EAF7FF": "#0F1B26",   # ice
+    "#F5F5F5": "#1A1A1A",
+    "#FFF0D4": "#1A1A1A",
+    "#CD7F32": "#8A4F1A",   # bronze -> darker bronze
+    "#FFEFB5": "#3A2A00",
+    # NOTE: skipping #C0C0C0/#888888/#555555/#8B8682 — those are
+    # status-bar foregrounds paired with dark navy bg, where dark
+    # remap values would become invisible.
+}
+
+
+def _maybe_remap_for_light_mode(hex_color: str) -> str:
+    """If we're in light mode, remap a dark-mode-tuned color to a
+    higher-contrast equivalent.  No-op in dark mode."""
+    if not _detect_light_mode():
+        return hex_color
+    if not hex_color or not hex_color.startswith("#"):
+        return hex_color
+    # Case-insensitive lookup
+    upper = hex_color.upper()
+    if upper in _LIGHT_MODE_REMAP_UPPER:
+        return _LIGHT_MODE_REMAP_UPPER[upper]
+    return hex_color
+
+
+# Pre-uppercased lookup table for case-insensitive remapping
+_LIGHT_MODE_REMAP_UPPER = {k.upper(): v for k, v in _LIGHT_MODE_REMAP.items()}
+
+
+def _install_skin_light_mode_hook() -> None:
+    """Wrap SkinConfig.get_color at import time so EVERY skin color read goes
+    through the light-mode remap.  Idempotent."""
+    try:
+        from hermes_cli.skin_engine import SkinConfig  # type: ignore[import]
+    except Exception:
+        return
+    if getattr(SkinConfig, "_hermes_light_mode_hook_installed", False):
+        return
+    _orig_get_color = SkinConfig.get_color
+
+    def _wrapped_get_color(self, key, fallback=""):
+        value = _orig_get_color(self, key, fallback)
+        try:
+            return _maybe_remap_for_light_mode(value)
+        except Exception:
+            return value
+
+    SkinConfig.get_color = _wrapped_get_color  # type: ignore[method-assign]
+    SkinConfig._hermes_light_mode_hook_installed = True  # type: ignore[attr-defined]
+
+
+_install_skin_light_mode_hook()
+
+
+# Prime the light-mode detection cache early (at module load) when
+# we're running interactively so OSC 11 happens before pt grabs the
+# tty.  Skip for non-tty contexts (subagents, gateway, tests).
+try:
+    if sys.stdin.isatty() and sys.stdout.isatty():
+        _detect_light_mode()
+except Exception:
+    pass
+
+
+
 class _SkinAwareAnsi:
    """Lazy ANSI escape that resolves from the skin engine on first use.

@@ -1290,7 +1540,12 @@ class _SkinAwareAnsi:


 _ACCENT = _SkinAwareAnsi("response_border", "#FFD700", bold=True)
-_DIM = _SkinAwareAnsi("banner_dim", "#B8860B")
+# Use ANSI dim+italic attributes (\x1b[2;3m) instead of a hardcoded
+# hex color so dim/thinking text inherits the terminal's default
+# foreground color and stays readable in both light and dark
+# Terminal.app modes.  Hardcoded skin colors like #B8860B
+# (dark goldenrod) become invisible against light cream backgrounds.
+_DIM = "\x1b[2;3m"


 def _accent_hex() -> str:
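
Review note: the pure parts of the light-mode detection (luminance threshold, OSC 11 reply parsing, COLORFGBG slot check) can be tested without a terminal. The sketch below re-implements them as standalone functions under hypothetical names (`luminance_from_hex`, `parse_osc11_reply`, `colorfgbg_is_light`); it follows the logic in the patch but is not the patch's code, and it deliberately leaves out the tty I/O.

```python
import re
from typing import Optional


def luminance_from_hex(hex_str: str) -> Optional[float]:
    """Rec.709 luma in [0, 1] for '#RGB' or '#RRGGBB', else None."""
    s = (hex_str or "").strip().lstrip("#")
    if len(s) == 3:
        s = "".join(c * 2 for c in s)
    if not re.fullmatch(r"[0-9a-fA-F]{6}", s):
        return None
    r, g, b = (int(s[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * r + 0.7152 * g + 0.0722 * b) / 255.0


def parse_osc11_reply(reply: bytes) -> Optional[str]:
    """Turn an OSC 11 reply such as b'...rgb:ffff/ffff/ffff...' into '#RRGGBB'."""
    m = re.search(rb"rgb:([0-9a-fA-F]+)/([0-9a-fA-F]+)/([0-9a-fA-F]+)", reply)
    if not m:
        return None

    def to_8bit(component: bytes) -> int:
        # Components are 1-4 hex digits; scale by their own maximum value.
        max_value = (1 << (len(component) * 4)) - 1
        return int(component, 16) * 255 // max_value

    r, g, b = (to_8bit(m.group(i)) for i in (1, 2, 3))
    return f"#{r:02X}{g:02X}{b:02X}"


def colorfgbg_is_light(value: str) -> Optional[bool]:
    """True/False for a recognised COLORFGBG background slot, None if unknown."""
    last = value.strip().split(";")[-1]
    if not last.isdigit():
        return None
    slot = int(last)
    if slot in (7, 15):
        return True
    if 0 <= slot < 16:
        return False
    return None


dark_bg = parse_osc11_reply(b"\x1b]11;rgb:1e1e/1e1e/2e2e\x07")
light_bg = parse_osc11_reply(b"\x1b]11;rgb:ffff/ffff/ffff\x1b\\")
is_light = luminance_from_hex(light_bg) >= 0.5
```

A background counts as light when its luma is at least 0.5, which is the same threshold the patch applies to both `HERMES_TUI_BACKGROUND` and the OSC 11 result.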
@@ -1415,9 +1670,6 @@ _OUTPUT_HISTORY_REPLAYING = False
 _OUTPUT_HISTORY_SUPPRESSED = False
 _OUTPUT_HISTORY_MAX_LINES = 200
 _OUTPUT_HISTORY = deque(maxlen=_OUTPUT_HISTORY_MAX_LINES)
-_ANSI_CONTROL_RE = re.compile(
-    r"\x1b(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~]|\][^\x07]*(?:\x07|\x1b\\))"
-)


 def _coerce_output_history_limit(value) -> int:
@@ -1459,10 +1711,10 @@ def _record_output_history_entry(entry) -> None:
 def _record_output_history(text: str) -> None:
    if not _OUTPUT_HISTORY_ENABLED or _OUTPUT_HISTORY_REPLAYING or _OUTPUT_HISTORY_SUPPRESSED:
        return
-    clean = _ANSI_CONTROL_RE.sub("", str(text)).replace("\r", "").rstrip("\n")
-    if not clean:
+    normalized = str(text).replace("\r", "").rstrip("\n")
+    if not normalized:
        return
-    for line in clean.splitlines():
+    for line in normalized.splitlines():
        _record_output_history_entry(line)


@@ -1473,6 +1725,7 @@ def _replay_output_history() -> None:
        return
    _OUTPUT_HISTORY_REPLAYING = True
    try:
+        rendered_lines = []
        for entry in tuple(_OUTPUT_HISTORY):
            if callable(entry):
                try:
@@ -1483,8 +1736,15 @@ def _replay_output_history() -> None:
                    lines = lines.splitlines()
            else:
                lines = [entry]
-            for line in lines:
-                _pt_print(_PT_ANSI(str(line)))
+            rendered_lines.extend(str(line) for line in lines)
+        if rendered_lines:
+            # Replay after resize can contain hundreds of history lines. A
+            # per-line prompt_toolkit print forces one synchronous terminal I/O
+            # and redraw cycle per line, which users perceive as a waterfall of
+            # old output. Keep the existing history contents unchanged, but
+            # emit the replay as one ANSI payload so resize recovery does a
+            # single prompt_toolkit print/redraw.
+            _pt_print(_PT_ANSI("\n".join(rendered_lines)))
    except Exception:
        pass
    finally:
@@ -2639,6 +2899,12 @@ class HermesCLI:

        # Status bar visibility (toggled via /statusbar)
        self._status_bar_visible = True
+        # When True, the input separator rules and the dynamic status bar are
+        # hidden until the next user input. Set by _recover_after_resize() so a
+        # SIGWINCH cannot stamp a freshly-drawn status bar on top of one that
+        # the terminal just reflowed into scrollback — the cause of duplicated
+        # bars / "blank line flooding" reports (#19280, #22976).
+        self._status_bar_suppressed_after_resize = False
        self._resize_recovery_lock = threading.Lock()
        self._resize_recovery_timer = None
        self._resize_recovery_pending = False
@@ -2703,9 +2969,36 @@ class HermesCLI:
            pass

    def _recover_after_resize(self, app, original_on_resize) -> None:
-        """Recover a resized classic CLI without desynchronizing cursor state."""
-        self._clear_prompt_toolkit_screen(app, rebuild_scrollback=True)
-        _replay_output_history()
+        """Recover a resized classic CLI without desynchronizing cursor state.
+
+        Unlike _force_full_redraw, we do NOT clear the physical screen or
+        scrollback here.  The startup banner and tool summary are printed
+        before prompt_toolkit owns the live chrome, so they live in normal
+        terminal scrollback.  Erasing the screen on SIGWINCH removes that
+        startup UI and ``_replay_output_history`` cannot reconstruct it
+        (the banner was never added to ``_OUTPUT_HISTORY``).
+
+        Instead we just reset prompt_toolkit's renderer cache so the next
+        incremental redraw starts from a clean slate, then let
+        ``original_on_resize`` recalculate layout for the new size.
+
+        We also flag ``_status_bar_suppressed_after_resize`` so the dynamic
+        status bar and input separator rules stay hidden until the next user
+        input.  On column shrink the terminal reflows already-rendered status
+        bar rows into scrollback before prompt_toolkit can erase them; drawing
+        a fresh full-width bar immediately makes the old and new versions
+        look duplicated (#19280, #22976).  Clearing the suppression on the
+        next prompt restores the bar cleanly.
+        """
+        self._status_bar_suppressed_after_resize = True
+        try:
+            app.renderer.reset(leave_alternate_screen=False)
+        except Exception:
+            pass
+        try:
+            app.invalidate()
+        except Exception:
+            pass
        original_on_resize()

    def _schedule_resize_recovery(self, app, original_on_resize, delay: float = 0.12) -> None:
@@ -2940,10 +3233,34 @@ class HermesCLI:
            width = self._get_tui_terminal_width()
        return width < 64

+    @staticmethod
+    def _scrollback_box_width(width: Optional[int] = None) -> int:
+        """Return a resize-safe width for printed scrollback box rules.
+
+        Lines already printed to terminal scrollback are reflowed by the
+        terminal emulator when the column count shrinks. A full-width response
+        border drawn at, say, 200 columns will wrap into two or three rows of
+        dashes after the user resizes to 80 columns, looking like duplicated
+        separator lines (the family of bugs tracked by #18449, #19280, #22976).
+
+        Keep decorative scrollback boxes intentionally narrower than the
+        viewport so a moderate resize never triggers reflow. The live TUI
+        footer (status bar, input rule) still uses the full width — only
+        content that is *stamped into scrollback* needs this clamp.
+        """
+        if width is None:
+            try:
+                width = shutil.get_terminal_size((80, 24)).columns
+            except Exception:
+                width = 80
+        return max(32, min(int(width or 80), 56))
+
    def _tui_input_rule_height(self, position: str, width: Optional[int] = None) -> int:
        """Return the visible height for the top/bottom input separator rules."""
        if position not in {"top", "bottom"}:
            raise ValueError(f"Unknown input rule position: {position}")
+        if getattr(self, "_status_bar_suppressed_after_resize", False):
+            return 0
        if position == "top":
            return 1
        return 0 if self._use_minimal_tui_chrome(width=width) else 1
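
Review note: the clamp in `_scrollback_box_width` is small enough to check by hand. The sketch below copies its arithmetic into a free function and adds a hypothetical `top_rule` helper that mirrors how the response box header is assembled; `top_rule` uses `len()` for the label width, which is an assumption that only holds for labels without wide characters (the patch uses `_status_bar_display_width` instead).

```python
import shutil
from typing import Optional


def scrollback_box_width(width: Optional[int] = None) -> int:
    """Clamp a box width to the range [32, 56], as the patch does."""
    if width is None:
        width = shutil.get_terminal_size((80, 24)).columns
    return max(32, min(int(width or 80), 56))


def top_rule(label: str, width: int) -> str:
    """Build a top border of exactly `width` cells (narrow-character labels only)."""
    fill = width - 2 - len(label)
    return "╭─" + label + "─" * max(fill - 1, 0) + "╮"


wide = scrollback_box_width(200)    # capped at 56
narrow = scrollback_box_width(40)   # passed through unchanged
tiny = scrollback_box_width(20)     # raised to the 32-column floor
rule = top_rule(" Hermes ", wide)
```

Because the result never exceeds 56 columns, a border printed on a 200-column terminal still fits on one row after a resize down to 80 columns, which is the reflow case the docstring describes.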
@@ -3453,7 +3770,7 @@ class HermesCLI:
        # Open reasoning box on first reasoning token
        if not getattr(self, "_reasoning_box_opened", False):
            self._reasoning_box_opened = True
-            w = shutil.get_terminal_size().columns
+            w = self._scrollback_box_width()
            r_label = " Reasoning "
            r_fill = w - 2 - len(r_label)
            _cprint(f"\n{_DIM}┌─{r_label}{'─' * max(r_fill - 1, 0)}┐{_RST}")
@@ -3477,7 +3794,7 @@ class HermesCLI:
            if buf:
                _cprint(f"{_DIM}{buf}{_RST}")
                self._reasoning_buf = ""
-            w = shutil.get_terminal_size().columns
+            w = self._scrollback_box_width()
            _cprint(f"{_DIM}└{'─' * (w - 2)}┘{_RST}")
            self._reasoning_box_opened = False

@@ -3668,7 +3985,7 @@ class HermesCLI:
                self._stream_text_ansi = ""
            if self.show_timestamps:
                label = f"{label} {datetime.now().strftime('%H:%M')}"
-            w = shutil.get_terminal_size().columns
+            w = self._scrollback_box_width()
            fill = w - 2 - HermesCLI._status_bar_display_width(label)
            _cprint(f"\n{_ACCENT}╭─{label}{'─' * max(fill - 1, 0)}╮{_RST}")

@@ -3769,7 +4086,7 @@ class HermesCLI:

        # Close the response box
        if self._stream_box_opened:
-            w = shutil.get_terminal_size().columns
+            w = self._scrollback_box_width()
            _cprint(f"{_ACCENT}╰{'─' * (w - 2)}╯{_RST}")

    def _reset_stream_state(self) -> None:
@@ -6596,7 +6913,7 @@ class HermesCLI:
          /model <name> --provider <provider> — switch provider + model
          /model --provider <provider>        — switch to provider, auto-detect model
        """
-        from hermes_cli.model_switch import switch_model, parse_model_flags, list_authenticated_providers
+        from hermes_cli.model_switch import switch_model, parse_model_flags
        from hermes_cli.providers import get_label

        # Parse args from the original command
@@ -6606,16 +6923,25 @@ class HermesCLI:
        # Parse --provider and --global flags
        model_input, explicit_provider, persist_global = parse_model_flags(raw_args)

-        # Load providers for switch_model (picker path needs them below)
-        user_provs = None
-        custom_provs = None
+        # Single inventory context — replaces the inline config-slice the
+        # dashboard / TUI used to duplicate. Overlay live session state
+        # via with_overrides (truthy-only) so empty self.* attrs don't
+        # clobber disk config.
+        from hermes_cli.inventory import build_models_payload, load_picker_context
+
        try:
-            from hermes_cli.config import get_compatible_custom_providers, load_config
-            cfg = load_config()
-            user_provs = cfg.get("providers")
-            custom_provs = get_compatible_custom_providers(cfg)
+            ctx = load_picker_context().with_overrides(
+                current_provider=self.provider or "",
+                current_model=self.model or "",
+                current_base_url=self.base_url or "",
+            )
        except Exception:
-            pass
+            ctx = None
+
+        # switch_model() + _open_model_picker still need the raw provider
+        # dicts; ConfigContext is the canonical source for both.
+        user_provs = ctx.user_providers if ctx is not None else None
+        custom_provs = ctx.custom_providers if ctx is not None else None

        # No args at all: open prompt_toolkit-native picker modal
        if not model_input and not explicit_provider:
@@ -6623,14 +6949,9 @@ class HermesCLI:
            provider_display = get_label(self.provider) if self.provider else "unknown"

            try:
-                providers = list_authenticated_providers(
-                    current_provider=self.provider or "",
-                    current_base_url=self.base_url or "",
-                    current_model=self.model or "",
-                    user_providers=user_provs,
-                    custom_providers=custom_provs,
-                    max_models=50,
-                )
+                if ctx is None:
+                    raise RuntimeError("inventory context unavailable")
+                providers = build_models_payload(ctx, max_models=50)["providers"]
            except Exception:
                providers = []

@@ -6756,6 +7077,46 @@ class HermesCLI:
        else:
            _cprint("    (session only — add --global to persist)")

+    def _handle_codex_runtime(self, cmd_original: str) -> None:
+        """Handle /codex-runtime — toggle the codex app-server runtime opt-in.
+
+        Usage:
+            /codex-runtime                       — show current state
+            /codex-runtime auto                  — Hermes default (chat_completions)
+            /codex-runtime codex_app_server      — hand turns to codex subprocess
+            /codex-runtime on / off              — synonyms for the above
+        """
+        from hermes_cli import codex_runtime_switch as crs
+
+        parts = cmd_original.split(None, 1)
+        raw_args = parts[1].strip() if len(parts) > 1 else ""
+        new_value, errors = crs.parse_args(raw_args)
+        if errors:
+            for err in errors:
+                _cprint(f"❌ {err}")
+            return
+
+        # Load + persist via the existing config helpers
+        try:
+            from hermes_cli.config import load_config, save_config
+        except Exception as exc:
+            _cprint(f"❌ could not load config: {exc}")
+            return
+        cfg = load_config()
+
+        result = crs.apply(
+            cfg,
+            new_value,
+            persist_callback=(save_config if new_value is not None else None),
+        )
+
+        prefix = "✓" if result.success else "✗"
+        for line in result.message.splitlines():
+            _cprint(f"  {prefix} {line}" if line.startswith("openai_runtime")
+                    else f"    {line}")
+        if result.success and result.requires_new_session:
+            _cprint("    Tip: `/reset` starts a new session immediately.")
+
    def _should_handle_model_command_inline(self, text: str, has_images: bool = False) -> bool:
        """Return True when /model should be handled immediately on the UI thread."""
        if not text or has_images or not _looks_like_slash_command(text):
@@ -7436,6 +7797,8 @@ class HermesCLI:
            self._handle_resume_command(cmd_original)
        elif canonical == "model":
            self._handle_model_switch(cmd_original)
+        elif canonical == "codex-runtime":
+            self._handle_codex_runtime(cmd_original)
        elif canonical == "gquota":
            self._handle_gquota_command(cmd_original)

@@ -7583,6 +7946,8 @@ class HermesCLI:
                _cprint(f"  No agent running; queued as next turn: {payload[:80]}{'...' if len(payload) > 80 else ''}")
        elif canonical == "goal":
            self._handle_goal_command(cmd_original)
+        elif canonical == "subgoal":
+            self._handle_subgoal_command(cmd_original)
        elif canonical == "skin":
            self._handle_skin_command(cmd_original)
        elif canonical == "voice":
@@ -7600,6 +7965,8 @@ class HermesCLI:
                    exec_cmd = qcmd.get("command", "")
                    if exec_cmd:
                        try:
+                            # shell=True is intentional: quick_commands are user-defined
+                            # shell snippets from config.yaml — not agent/LLM controlled.
                            result = subprocess.run(
                                exec_cmd, shell=True, capture_output=True,
                                text=True, timeout=30
@@ -7801,8 +8168,8 @@ class HermesCLI:
                        from hermes_cli.skin_engine import get_active_skin
                        _skin = get_active_skin()
                        label = _skin.get_branding("response_label", "⚕ Hermes")
-                        _resp_color = _skin.get_color("response_border", "#CD7F32")
-                        _resp_text = _skin.get_color("banner_text", "#FFF8DC")
+                        _resp_color = _maybe_remap_for_light_mode(_skin.get_color("response_border", "#CD7F32"))
+                        _resp_text = _maybe_remap_for_light_mode(_skin.get_color("banner_text", "#FFF8DC"))
                    except Exception:
                        label = "⚕ Hermes"
                        _resp_color = "#CD7F32"
@@ -7817,6 +8184,7 @@ class HermesCLI:
                        style=_resp_text,
                        box=rich_box.HORIZONTALS,
                        padding=(1, 4),
+                        width=self._scrollback_box_width(),
                    ))
                else:
                    _cprint("  (No response generated)")
@@ -8179,6 +8547,81 @@ class HermesCLI:
        except Exception:
            pass

+    def _handle_subgoal_command(self, cmd: str) -> None:
+        """Dispatch /subgoal subcommands.
+
+        Forms:
+          /subgoal                              show current subgoals
+          /subgoal <text>                       append a criterion
+          /subgoal remove <n>                   drop subgoal n (1-based)
+          /subgoal clear                        wipe all subgoals
+
+        Subgoals are extra criteria the user adds mid-loop. They get
+        appended to both the judge prompt (verdict must consider them)
+        and the continuation prompt (agent sees them) on the next turn
+        boundary.  The running turn is not interrupted: it finishes, and
+        the next judge call includes them.
+        """
+        parts = (cmd or "").strip().split(None, 2)
+        arg = " ".join(parts[1:]).strip() if len(parts) > 1 else ""
+
+        mgr = self._get_goal_manager()
+        if mgr is None:
+            _cprint(f"  {_DIM}Goals unavailable (no active session).{_RST}")
+            return
+
+        if not mgr.has_goal():
+            _cprint(f"  {_DIM}No active goal. Set one with /goal <text>.{_RST}")
+            return
+
+        # No args → list current subgoals.
+        if not arg:
+            _cprint(f"  {mgr.status_line()}")
+            _cprint(f"  {mgr.render_subgoals()}")
+            return
+
+        tokens = arg.split(None, 1)
+        verb = tokens[0].lower()
+        rest = tokens[1].strip() if len(tokens) > 1 else ""
+
+        if verb == "remove":
+            if not rest:
+                _cprint("  Usage: /subgoal remove <n>")
+                return
+            try:
+                idx = int(rest.split()[0])
+            except ValueError:
+                _cprint("  /subgoal remove: <n> must be an integer (1-based index).")
+                return
+            try:
+                removed = mgr.remove_subgoal(idx)
+            except (IndexError, RuntimeError) as exc:
+                _cprint(f"  /subgoal remove: {exc}")
+                return
+            _cprint(f"  ✓ Removed subgoal {idx}: {removed}")
+            return
+
+        if verb == "clear":
+            try:
+                prev = mgr.clear_subgoals()
+            except RuntimeError as exc:
+                _cprint(f"  /subgoal clear: {exc}")
+                return
+            if prev:
+                _cprint(f"  ✓ Cleared {prev} subgoal{'s' if prev != 1 else ''}.")
+            else:
+                _cprint(f"  {_DIM}No subgoals to clear.{_RST}")
+            return
+
+        # Otherwise — append the whole arg as a new subgoal.
+        try:
+            text = mgr.add_subgoal(arg)
+        except (ValueError, RuntimeError) as exc:
+            _cprint(f"  /subgoal: {exc}")
+            return
+        idx = len(mgr.state.subgoals) if mgr.state else 0
+        _cprint(f"  ✓ Added subgoal {idx}: {text}")
+
    def _maybe_continue_goal_after_turn(self) -> None:
        """Hook run after every CLI turn. Judges + maybe re-queues.

@@ -8205,10 +8648,36 @@ class HermesCLI:

        # If a real user message is already queued, don't inject a
        # continuation prompt on top — let the user's turn go first.
+        # Slash commands don't count as "real user messages" for this
+        # check: they're inspection/mutation (e.g. /subgoal added mid-
+        # run) and the process_loop dispatches them via process_command,
+        # not via chat(). If we treat a queued /subgoal as preempting,
+        # the goal loop silently stalls — we'd return here, then the
+        # slash command consumes its queue slot via process_command()
+        # which never re-fires the goal hook. Peek at all queued entries
+        # and only defer when there's a non-slash payload.
        try:
-            if getattr(self, "_pending_input", None) is not None \
-                    and not self._pending_input.empty():
-                return
+            pending = getattr(self, "_pending_input", None)
+            if pending is not None and not pending.empty():
+                has_real_message = False
+                try:
+                    # Queue.queue is the underlying deque — direct peek
+                    # without disturbing FIFO order.
+                    for entry in list(pending.queue):
+                        # Bundled payloads are (text, images) tuples;
+                        # unpack for inspection.
+                        if isinstance(entry, tuple) and entry:
+                            entry = entry[0]
+                        if isinstance(entry, str) and _looks_like_slash_command(entry):
+                            continue
+                        has_real_message = True
+                        break
+                except Exception:
+                    # Fallback: if we can't introspect the queue, behave
+                    # like the old check and defer to be safe.
+                    has_real_message = True
+                if has_real_message:
+                    return
        except Exception:
            pass
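The queue-peek logic in this hunk can be sketched in isolation. This is a minimal illustration, not the CLI's actual code: `looks_like_slash_command` is a hypothetical stand-in for `_looks_like_slash_command` (whose real rules are not shown in this diff), and holding `Queue.mutex` while copying is an extra precaution the hunk itself does not take.

```python
import queue


def looks_like_slash_command(text: str) -> bool:
    # Hypothetical stand-in: the real helper may apply stricter rules.
    return text.lstrip().startswith("/")


def has_real_message(pending: "queue.Queue") -> bool:
    """Return True if any queued entry is a non-slash payload."""
    try:
        # Queue.queue is the underlying deque; copy it under the queue's
        # own lock so FIFO order is not disturbed (assumption: the caller
        # tolerates a brief lock hold).
        with pending.mutex:
            entries = list(pending.queue)
    except Exception:
        # Cannot introspect: defer to be safe, as the hunk does.
        return True
    for entry in entries:
        # Bundled payloads are (text, images) tuples; inspect the text.
        if isinstance(entry, tuple) and entry:
            entry = entry[0]
        if isinstance(entry, str) and looks_like_slash_command(entry):
            continue
        return True
    return False
```

With only `/subgoal ...` queued the function returns False, so the goal loop would go on to inject its continuation prompt. Any plain-text entry makes it return True and the loop defers to the user.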

@@ -8301,7 +8770,8 @@ class HermesCLI:

        set_active_skin(new_skin)
        _ACCENT.reset()  # Re-resolve ANSI color for the new skin
-        _DIM.reset()     # Re-resolve dim/secondary ANSI color for the new skin
+        # _DIM is now a fixed dim+italic ANSI escape (terminal-default fg)
+        # so it doesn't need re-resolving on skin switch.
        if save_config_value("display.skin", new_skin):
            print(f"  Skin set to: {new_skin} (saved)")
        else:
@@ -9198,7 +9668,7 @@ class HermesCLI:

        Updates the TUI spinner widget so the user can see what the agent
        is doing during tool execution (fills the gap between thinking
-        spinner and next response).  Also plays audio cue in voice mode.
+        spinner and next response).

        On tool.started, records a monotonic timestamp so get_spinner_text()
        can show a live elapsed timer (the TUI poll loop already invalidates
@@ -9277,20 +9747,6 @@ class HermesCLI:
            )
            self._invalidate()

-        if not self._voice_mode:
-            return
-        if not function_name or function_name.startswith("_"):
-            return
-        try:
-            from tools.voice_mode import play_beep
-            threading.Thread(
-                target=play_beep,
-                kwargs={"frequency": 1200, "duration": 0.06, "count": 1},
-                daemon=True,
-            ).start()
-        except Exception:
-            pass
-
    def _on_tool_start(self, tool_call_id: str, function_name: str, function_args: dict):
        """Capture local before-state for write-capable tools."""
        try:
@@ -9895,7 +10351,7 @@ class HermesCLI:
        import time as _time

        with self._approval_lock:
-            timeout = 60
+            timeout = int(CLI_CONFIG.get("approvals", {}).get("timeout", 60))
            response_queue = queue.Queue()

            self._approval_state = {
@@ -10389,7 +10845,7 @@ class HermesCLI:
                    nonlocal _streaming_box_opened
                    if not _streaming_box_opened:
                        _streaming_box_opened = True
-                        w = self.console.width
+                        w = self._scrollback_box_width(getattr(self.console, "width", 80))
                        label = " ⚕ Hermes "
                        if self.show_timestamps:
                            label = f"{label}{datetime.now().strftime('%H:%M')} "
@@ -10674,7 +11130,7 @@ class HermesCLI:
            if self.show_reasoning and result and not _reasoning_already_shown:
                reasoning = result.get("last_reasoning")
                if reasoning:
-                    w = shutil.get_terminal_size().columns
+                    w = self._scrollback_box_width()
                    r_label = " Reasoning "
                    r_fill = w - 2 - len(r_label)
                    r_top = f"{_DIM}┌─{r_label}{'─' * max(r_fill - 1, 0)}┐{_RST}"
@@ -10694,18 +11150,18 @@ class HermesCLI:
                    from hermes_cli.skin_engine import get_active_skin
                    _skin = get_active_skin()
                    label = _skin.get_branding("response_label", "⚕ Hermes")
-                    _resp_color = _skin.get_color("response_border", "#CD7F32")
-                    _resp_text = _skin.get_color("banner_text", "#FFF8DC")
+                    _resp_color = _maybe_remap_for_light_mode(_skin.get_color("response_border", "#CD7F32"))
+                    _resp_text = _maybe_remap_for_light_mode(_skin.get_color("banner_text", "#FFF8DC"))
                except Exception:
                    label = "⚕ Hermes"
-                    _resp_color = "#CD7F32"
-                    _resp_text = "#FFF8DC"
+                    _resp_color = _maybe_remap_for_light_mode("#CD7F32")
+                    _resp_text = _maybe_remap_for_light_mode("#FFF8DC")

                is_error_response = result and (result.get("failed") or result.get("partial"))
                already_streamed = self._stream_started and self._stream_box_opened and not is_error_response
                if use_streaming_tts and _streaming_box_opened and not is_error_response:
                    # Text was already printed sentence-by-sentence; just close the box
-                    w = shutil.get_terminal_size().columns
+                    w = self._scrollback_box_width()
                    _cprint(f"\n{_ACCENT}╰{'─' * (w - 2)}╯{_RST}")
                elif already_streamed:
                    # Response was already streamed token-by-token with box framing;
@@ -10721,6 +11177,7 @@ class HermesCLI:
                        style=_resp_text,
                        box=rich_box.HORIZONTALS,
                        padding=(1, 4),
+                        width=self._scrollback_box_width(),
                    ))


@@ -10937,13 +11394,48 @@ class HermesCLI:
        return "".join(text for _, text in self._get_tui_prompt_fragments())

    def _build_tui_style_dict(self) -> dict[str, str]:
-        """Layer the active skin's prompt_toolkit colors over the base TUI style."""
+        """Layer the active skin's prompt_toolkit colors over the base TUI style.
+
+        Also rewrites bare hex-color tokens in the resulting style strings
+        to their light-mode equivalents (via _LIGHT_MODE_REMAP) when the
+        terminal is detected as light; styles with an explicit ``bg:`` are
+        skipped.  Keeps chrome readable on cream Terminal.app backgrounds.
+        """
        style_dict = dict(getattr(self, "_tui_style_base", {}) or {})
        try:
            from hermes_cli.skin_engine import get_prompt_toolkit_style_overrides
            style_dict.update(get_prompt_toolkit_style_overrides())
        except Exception:
            pass
+        # Light-mode remap on the style strings.  Each value is a pt
+        # style string like "#C0C0C0 bold": split on whitespace, rewrite
+        # each bare "#XXXXXX" token through the light-mode remap, and
+        # rejoin.  "bg:#XXXXXX" tokens are never rewritten.
+        #
+        # CRITICAL: skip the remap entirely when a style string already
+        # specifies its own bg (e.g. status-bar / completion-menu styles
+        # with `bg:#1a1a2e ...`).  Those colors were tuned for that
+        # specific dark bg and remapping the FG to a dark equivalent
+        # would produce dark-on-dark (invisible).  The terminal's BG
+        # mode is irrelevant — what matters is the bg the style itself
+        # paints.
+        try:
+            if _detect_light_mode():
+                def _remap_value(v: str) -> str:
+                    if not v:
+                        return v
+                    tokens = v.split()
+                    has_explicit_bg = any(t.startswith("bg:") for t in tokens)
+                    if has_explicit_bg:
+                        # The style paints its own bg — leave its fg alone.
+                        return v
+                    return " ".join(
+                        _maybe_remap_for_light_mode(t) if t.startswith("#") else t
+                        for t in tokens
+                    )
+                style_dict = {k: _remap_value(v or "") for k, v in style_dict.items()}
+        except Exception:
+            pass
        return style_dict

    def _apply_tui_skin_style(self) -> bool:
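The remap rule in `_build_tui_style_dict` can be exercised on its own. This is a sketch under stated assumptions: `LIGHT_MODE_REMAP` here is a hypothetical one-entry table (only the `#FFF8DC` to `#1A1A1A` pair is taken from the commit message), and the case-insensitive lookup is an illustrative choice, not confirmed behavior of `_maybe_remap_for_light_mode`.

```python
# Hypothetical remap table; the real _LIGHT_MODE_REMAP is not shown in the diff.
LIGHT_MODE_REMAP = {"#FFF8DC": "#1A1A1A"}


def remap_color(token: str) -> str:
    # Assumption: lookups ignore hex-digit case; unknown colors pass through.
    return LIGHT_MODE_REMAP.get(token.upper(), token)


def remap_style_value(value: str) -> str:
    """Remap bare '#RRGGBB' tokens unless the style sets its own bg."""
    if not value:
        return value
    tokens = value.split()
    if any(t.startswith("bg:") for t in tokens):
        # The style paints its own background, so its fg was tuned for
        # that background and must stay as-is.
        return value
    return " ".join(
        remap_color(t) if t.startswith("#") else t for t in tokens
    )
```

A status-bar style such as `bg:#1a1a2e #FFF8DC` comes back unchanged, while a bare `#FFF8DC bold` becomes `#1A1A1A bold`.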
@@ -11029,6 +11521,13 @@ class HermesCLI:

    def run(self):
        """Run the interactive CLI loop with persistent input at bottom."""
+        # Detect light/dark terminal mode now (before pt grabs the tty).
+        # Caches the result so subsequent _hex_to_ansi / style calls
+        # don't risk re-querying mid-render.
+        try:
+            _detect_light_mode()
+        except Exception:
+            pass
        # Push the entire TUI to the bottom of the terminal so the banner,
        # responses, and prompt all appear pinned to the bottom — empty
        # space stays above, not below.  This prints enough blank lines to
@@ -12754,7 +13253,10 @@ class HermesCLI:
                # guard against any future width mismatch.
                wrap_lines=False,
            ),
-            filter=Condition(lambda: cli_ref._status_bar_visible),
+            filter=Condition(
+                lambda: cli_ref._status_bar_visible
+                and not getattr(cli_ref, "_status_bar_suppressed_after_resize", False)
+            ),
        )

        # Allow wrapper CLIs to register extra keybindings.
@@ -12789,11 +13291,16 @@ class HermesCLI:
        
        # Style for the application
        self._tui_style_base = {
-            'input-area': '#FFF8DC',
-            'placeholder': '#555555 italic',
-            'prompt': '#FFF8DC',
+            # Input area / prompt: empty style strings inherit the
+            # terminal's default foreground/background, so the typed
+            # text is readable in both light and dark Terminal.app
+            # color schemes.  (Hardcoding a near-white #FFF8DC made
+            # input invisible on light backgrounds.)
+            'input-area': '',
+            'placeholder': '#888888 italic',
+            'prompt': '',
            'prompt-working': '#888888 italic',
-            'hint': '#555555 italic',
+            'hint': '#888888 italic',
            'status-bar': 'bg:#1a1a2e #C0C0C0',
            'status-bar-strong': 'bg:#1a1a2e #FFD700 bold',
            'status-bar-dim': 'bg:#1a1a2e #8B8682',
@@ -12852,19 +13359,70 @@ class HermesCLI:
        self._app = app  # Store reference for clarify_callback

        # ── Fix ghost status-bar lines on terminal resize ──────────────
-        # When the terminal shrinks (e.g. un-maximize), the emulator reflows
-        # the previously-rendered full-width rows (status bar, input rules)
-        # into multiple narrower rows.  prompt_toolkit's _on_resize handler
-        # only cursor_up()s by the stored layout height, missing the extra
-        # rows created by reflow — leaving ghost duplicates visible.
+        # Resize handling: monkey-patch prompt_toolkit's _output_screen_diff
+        # to suppress the deliberate "reserve vertical space" scroll-up.
        #
-        # It's not just column-shrink: widening, row-shrinking, and
-        # multiplexer-driven SIGWINCH-less redraws (cmux / tmux tab switch)
-        # all produce the same class of drift, where the renderer's tracked
-        # _cursor_pos.y no longer matches terminal reality. The only reliable
-        # recovery is a full screen-clear (\x1b[2J\x1b[H) before the next
-        # redraw, so we force one on every resize rather than trying to
-        # compute the exact drift.
+        # Background: prompt_toolkit's renderer (renderer.py L232-242)
+        # explicitly moves the cursor to the bottom of the canvas after
+        # painting "to make sure the terminal scrolls up, even when the
+        # lower lines of the canvas just contain whitespace".  In
+        # non-fullscreen mode this scrolls chrome content (status bar,
+        # input rules) into terminal scrollback on every render.  When
+        # the terminal column-shrinks, the emulator reflows the previously
+        # rendered full-width rows into multiple narrower rows that get
+        # pushed up — leaving ghost duplicates AND polluting scrollback.
+        # Same issue as pt #29 (open since 2014), #1675, #1933.
+        #
+        # Surgical fix: wrap _output_screen_diff so that when its internal
+        # `if current_height > previous_screen.height` branch fires (the
+        # one that does the bottom-cursor-move), we make it fall through
+        # by inflating previous_screen.height first.
+        try:
+            import prompt_toolkit.renderer as _pt_renderer
+            from prompt_toolkit.renderer import _output_screen_diff as _orig_osd
+
+            if not getattr(_pt_renderer, "_hermes_osd_patched", False):
+                def _patched_output_screen_diff(
+                    app, output, screen, current_pos, color_depth,
+                    previous_screen, last_style, is_done, full_screen,
+                    attrs_for_style_string, style_string_has_style,
+                    size, previous_width,
+                ):
+                    """Wraps pt's _output_screen_diff to suppress the
+                    reserve-vertical-space scroll (renderer.py L232-242).
+
+                    Strategy: ONLY when previous_screen is non-None and
+                    its current height is genuinely smaller than the new
+                    screen's height, inflate it to match.  This prevents
+                    the bottom-cursor-move at L242 without changing any
+                    other code path's behavior.
+
+                    Critical: do NOT replace a None previous_screen with
+                    a fresh Screen() — that would skip the proper
+                    reset_attributes()+erase_down() at L178-185 which
+                    fires when previous_screen is None (first-paint /
+                    width-change).  Without that reset, ANSI styles
+                    leak between renders.
+                    """
+                    try:
+                        if previous_screen is not None and hasattr(previous_screen, "height"):
+                            if previous_screen.height < screen.height:
+                                previous_screen.height = screen.height
+                    except Exception:
+                        pass
+
+                    return _orig_osd(
+                        app, output, screen, current_pos, color_depth,
+                        previous_screen, last_style, is_done, full_screen,
+                        attrs_for_style_string, style_string_has_style,
+                        size, previous_width,
+                    )
+
+                _pt_renderer._output_screen_diff = _patched_output_screen_diff
+                _pt_renderer._hermes_osd_patched = True
+        except Exception:
+            pass
+
        _original_on_resize = app._on_resize

        def _resize_clear_ghosts():
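The wrap-once pattern used for `_output_screen_diff` can be shown against a toy module. Everything below is a stand-in: `renderer` is a `SimpleNamespace`, not `prompt_toolkit.renderer`, and the toy `_output_screen_diff` takes two arguments and only reports whether the height-growth branch would fire. It illustrates the sentinel guard and the height inflation, nothing more.

```python
from types import SimpleNamespace

# Stand-in for a renderer module; NOT prompt_toolkit itself.
renderer = SimpleNamespace()


def _output_screen_diff(screen, previous_screen):
    """Toy diff: True when the 'reserve vertical space' branch would run."""
    return previous_screen is None or screen.height > previous_screen.height


renderer._output_screen_diff = _output_screen_diff


def install_patch(module) -> None:
    """Wrap module._output_screen_diff once, guarded by a sentinel flag."""
    if getattr(module, "_hermes_osd_patched", False):
        return  # already wrapped; avoid stacking wrappers
    original = module._output_screen_diff

    def patched(screen, previous_screen):
        # Only touch a real previous screen; None (first paint) is passed
        # through so the original's reset path still runs.
        if previous_screen is not None and previous_screen.height < screen.height:
            previous_screen.height = screen.height
        return original(screen, previous_screen)

    module._output_screen_diff = patched
    module._hermes_osd_patched = True
```

Calling `install_patch(renderer)` twice leaves a single wrapper in place, which matters when the setup code can run more than once in a process.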
@@ -12923,6 +13481,10 @@ class HermesCLI:
                    if not user_input:
                        continue

+                    # The user has typed and submitted something, so any
+                    # post-resize transient suppression should end here.
+                    self._status_bar_suppressed_after_resize = False
+
                    # Unpack image payload: (text, [Path, ...]) or plain str
                    submit_images = []
                    if isinstance(user_input, tuple):
@@ -39,6 +39,10 @@ if [ "$(id -u)" = "0" ]; then
        # by the mapped user on the host side.
        chown -R hermes:hermes "$HERMES_HOME" 2>/dev/null || \
            echo "Warning: chown failed (rootless container?) — continuing anyway"
+        # The .venv must also be re-chowned when UID is remapped, otherwise
+        # lazy_deps.py cannot install platform packages (discord.py, etc.).
+        chown -R hermes:hermes "$INSTALL_DIR/.venv" 2>/dev/null || \
+            echo "Warning: chown .venv failed (rootless container?) — continuing anyway"
    fi

    # Ensure config.yaml is readable by the hermes runtime user even if it was
@@ -74,6 +74,24 @@ def _normalize_notice_delivery(value: Any, default: str = "public") -> str:
    return default


+def _ensure_platform_extra_dict(platforms_data: dict, name: str) -> tuple[dict, dict]:
+    """Get-or-create ``platforms_data[name]`` and its nested ``extra`` dict.
+
+    Both slots are coerced to ``{}`` if a non-dict value is encountered, so
+    callers can safely write keys without type-checking.  Returns
+    ``(plat_data, extra)`` for in-place mutation.
+    """
+    plat_data = platforms_data.setdefault(name, {})
+    if not isinstance(plat_data, dict):
+        plat_data = {}
+        platforms_data[name] = plat_data
+    extra = plat_data.setdefault("extra", {})
+    if not isinstance(extra, dict):
+        extra = {}
+        plat_data["extra"] = extra
+    return plat_data, extra
+
+
 # Module-level cache for bundled platform plugin names (lives outside the
 # enum so it doesn't become an accidental enum member).
 _Platform__bundled_plugin_names: Optional[set] = None
@@ -717,6 +735,10 @@ def load_gateway_config() -> GatewayConfig:
                gw_data["thread_sessions_per_user"] = yaml_cfg["thread_sessions_per_user"]

            streaming_cfg = yaml_cfg.get("streaming")
+            if not isinstance(streaming_cfg, dict):
+                # Fall back to nested gateway.streaming written by
+                # ``hermes config set gateway.streaming.*``
+                streaming_cfg = (yaml_cfg.get("gateway") or {}).get("streaming")
            if isinstance(streaming_cfg, dict):
                gw_data["streaming"] = streaming_cfg

@@ -755,7 +777,27 @@ def load_gateway_config() -> GatewayConfig:
                        merged["extra"] = merged_extra
                    platforms_data[plat_name] = merged
                gw_data["platforms"] = platforms_data
-            for plat in Platform:
+            # Iterate built-in platforms plus any registered plugin platforms
+            # so plugin authors get the same shared-key bridging (#24836).
+            try:
+                from hermes_cli.plugins import discover_plugins
+                discover_plugins()  # idempotent
+                from gateway.platform_registry import platform_registry as _pr
+            except Exception as e:
+                logger.debug("plugin discovery skipped: %s", e)
+                _pr = None
+
+            _shared_loop_targets: list = list(Platform)
+            if _pr is not None:
+                for _entry in _pr.plugin_entries():
+                    try:
+                        _plat = Platform(_entry.name)
+                    except (ValueError, KeyError):
+                        continue
+                    if _plat not in _shared_loop_targets:
+                        _shared_loop_targets.append(_plat)
+
+            for plat in _shared_loop_targets:
                if plat == Platform.LOCAL:
                    continue
                platform_cfg = yaml_cfg.get(plat.value)
@@ -810,20 +852,38 @@ def load_gateway_config() -> GatewayConfig:
                enabled_was_explicit = "enabled" in platform_cfg
                if not bridged and not enabled_was_explicit:
                    continue
-                plat_data = platforms_data.setdefault(plat.value, {})
-                if not isinstance(plat_data, dict):
-                    plat_data = {}
-                    platforms_data[plat.value] = plat_data
+                plat_data, extra = _ensure_platform_extra_dict(platforms_data, plat.value)
                if enabled_was_explicit:
                    plat_data["enabled"] = platform_cfg["enabled"]
-                extra = plat_data.setdefault("extra", {})
-                if not isinstance(extra, dict):
-                    extra = {}
-                    plat_data["extra"] = extra
                if plat == Platform.SLACK and enabled_was_explicit:
                    extra["_enabled_explicit"] = True
                extra.update(bridged)

+            # Plugin-owned YAML→env config bridges (#24836).  See
+            # ``PlatformEntry.apply_yaml_config_fn`` for the hook contract.
+            # Order: shared-key loop (above) → this dispatch → legacy hardcoded
+            # blocks (below; no-op when a hook already set their env var) →
+            # ``_apply_env_overrides()`` after ``GatewayConfig.from_dict``.
+            if _pr is not None:
+                for entry in _pr.all_entries():
+                    if entry.apply_yaml_config_fn is None:
+                        continue
+                    platform_cfg = yaml_cfg.get(entry.name)
+                    if not isinstance(platform_cfg, dict):
+                        continue
+                    try:
+                        seeded = entry.apply_yaml_config_fn(yaml_cfg, platform_cfg)
+                    except Exception as e:
+                        logger.debug(
+                            "apply_yaml_config_fn for %s raised: %s",
+                            entry.name, e,
+                        )
+                        continue
+                    if not isinstance(seeded, dict) or not seeded:
+                        continue
+                    _, extra = _ensure_platform_extra_dict(platforms_data, entry.name)
+                    extra.update(seeded)
+
            # Slack settings → env vars (env vars take precedence)
            slack_cfg = yaml_cfg.get("slack", {})
            if isinstance(slack_cfg, dict):
@@ -852,6 +912,8 @@ def load_gateway_config() -> GatewayConfig:
            if isinstance(discord_cfg, dict):
                if "require_mention" in discord_cfg and not os.getenv("DISCORD_REQUIRE_MENTION"):
                    os.environ["DISCORD_REQUIRE_MENTION"] = str(discord_cfg["require_mention"]).lower()
+                if "thread_require_mention" in discord_cfg and not os.getenv("DISCORD_THREAD_REQUIRE_MENTION"):
+                    os.environ["DISCORD_THREAD_REQUIRE_MENTION"] = str(discord_cfg["thread_require_mention"]).lower()
                frc = discord_cfg.get("free_response_channels")
                if frc is not None and not os.getenv("DISCORD_FREE_RESPONSE_CHANNELS"):
                    if isinstance(frc, list):
@@ -119,6 +119,22 @@ class PlatformEntry:
    # Signature: () -> Optional[dict[str, Any]]
    env_enablement_fn: Optional[Callable[[], Optional[dict]]] = None

+    # ── YAML→env config bridge ──
+    # Optional: translate this platform's ``config.yaml`` keys into env vars
+    # and/or seed ``PlatformConfig.extra`` directly.  Lets a plugin own its
+    # YAML config translation instead of forcing core ``gateway/config.py``
+    # to know every platform's schema.
+    #
+    # Signature: (yaml_cfg: dict, platform_cfg: dict) -> Optional[dict]
+    # Called from ``load_gateway_config()`` after the generic shared-key loop
+    # and before ``_apply_env_overrides``.  Mutating ``os.environ`` is allowed
+    # (use ``not os.getenv(...)`` guards to preserve env > YAML precedence);
+    # any returned dict is merged into ``PlatformConfig.extra``.  Exceptions
+    # are caught and logged at debug level.
+    # See website/docs/developer-guide/adding-platform-adapters.md for the
+    # full contract and a worked example.
+    apply_yaml_config_fn: Optional[Callable[[dict, dict], Optional[dict]]] = None
+
    # Optional: home-channel env var name for cron/notification delivery
    # (e.g. ``"IRC_HOME_CHANNEL"``).  When set, ``cron.scheduler`` treats this
    # platform as a valid ``deliver=<name>`` target and reads the env var to
@@ -21,6 +21,14 @@ status display, gateway setup, and more.
  constructed.  Without this, env-only setups don't surface in
  `hermes gateway status` or `get_connected_platforms()` until the SDK
  instantiates.
+- `apply_yaml_config_fn: (yaml_cfg, platform_cfg) -> Optional[dict]` —
+  translate this platform's `config.yaml` keys into env vars and/or seed
+  `PlatformConfig.extra` directly.  Lets a plugin own its YAML schema
+  instead of growing core `gateway/config.py` boilerplate per platform.
+  Mutating `os.environ` is allowed (use `not os.getenv(...)` guards to
+  preserve env > YAML precedence); the returned dict is merged into
+  `PlatformConfig.extra`.  Called during `load_gateway_config()` after
+  the generic shared-key loop and before `_apply_env_overrides()`.
 - `cron_deliver_env_var: str` — name of the `*_HOME_CHANNEL` env var.  When
  set, `deliver=<name>` cron jobs route to this var without editing
  `cron/scheduler.py`'s hardcoded sets.
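A hook matching the `apply_yaml_config_fn` signature might look like the sketch below. The platform (`irc`), the `IRC_REQUIRE_MENTION` env var, and the `nick` key are hypothetical names chosen for illustration; only the signature, the `not os.getenv(...)` guard, and the "returned dict is merged into `PlatformConfig.extra`" behavior come from the text above.

```python
import os
from typing import Optional


def apply_irc_yaml_config(yaml_cfg: dict, platform_cfg: dict) -> Optional[dict]:
    """Translate hypothetical `irc:` YAML keys into env vars and extra."""
    # Env var wins over YAML: only seed it when nothing is set already.
    if "require_mention" in platform_cfg and not os.getenv("IRC_REQUIRE_MENTION"):
        os.environ["IRC_REQUIRE_MENTION"] = str(
            platform_cfg["require_mention"]
        ).lower()

    # Anything returned here would be merged into PlatformConfig.extra.
    extra = {}
    if "nick" in platform_cfg:
        extra["nick"] = platform_cfg["nick"]
    return extra or None
```

Returning `None` when there is nothing to seed keeps the hook a no-op for configs that do not mention the platform's optional keys.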
@@ -1774,8 +1774,12 @@ class BasePlatformAdapter(ABC):
        The default implementation falls back to a numbered text list,
        which works on every platform — the user replies with a number
        ("2") or with the literal choice text, and the gateway intercepts
-        and resolves.  Adapters with native button UIs (Telegram, Discord)
-        SHOULD override this for a richer UX.
+        and resolves.  For the text fallback path, the default calls
+        ``mark_awaiting_text()`` so that the gateway text-intercept
+        (:meth:`GatewayRunner._maybe_intercept_clarify_text`) catches the
+        user's reply instead of timing out.
+        Adapters with native button UIs (Telegram, Discord) SHOULD
+        override this for a richer UX.
        """
        if choices:
            lines = [f"❓ {question}", ""]
@@ -1784,6 +1788,10 @@ class BasePlatformAdapter(ABC):
            lines.append("")
            lines.append("Reply with the number, the option text, or your own answer.")
            text = "\n".join(lines)
+            # Text fallback: enable text-capture so the gateway intercept
+            # picks up the user's typed reply (e.g. "2" or choice text).
+            from tools.clarify_gateway import mark_awaiting_text
+            mark_awaiting_text(clarify_id)
        else:
            text = f"❓ {question}"
        return await self.send(
@@ -111,9 +111,33 @@ DINGTALK_TYPE_MAPPING = {


 def check_dingtalk_requirements() -> bool:
-    """Check if DingTalk dependencies are available and configured."""
+    """Check if DingTalk dependencies are available and configured.
+
+    Lazy-installs dingtalk-stream via ``tools.lazy_deps.ensure("platform.dingtalk")``
+    on first call if not present.
+    """
+    global DINGTALK_STREAM_AVAILABLE, dingtalk_stream, ChatbotMessage, CallbackMessage, AckMessage
+    global HTTPX_AVAILABLE, httpx
    if not DINGTALK_STREAM_AVAILABLE or not HTTPX_AVAILABLE:
-        return False
+        try:
+            from tools.lazy_deps import ensure as _lazy_ensure
+            _lazy_ensure("platform.dingtalk", prompt=False)
+        except Exception:
+            return False
+        try:
+            import dingtalk_stream as _ds
+            from dingtalk_stream import ChatbotMessage as _CM
+            from dingtalk_stream.frames import CallbackMessage as _CBM, AckMessage as _AM
+            import httpx as _httpx
+        except ImportError:
+            return False
+        dingtalk_stream = _ds
+        ChatbotMessage = _CM
+        CallbackMessage = _CBM
+        AckMessage = _AM
+        httpx = _httpx
+        DINGTALK_STREAM_AVAILABLE = True
+        HTTPX_AVAILABLE = True
    if not os.getenv("DINGTALK_CLIENT_ID") or not os.getenv("DINGTALK_CLIENT_SECRET"):
        return False
    return True
@@ -3577,6 +3577,25 @@ class DiscordAdapter(BasePlatformAdapter):
            return {part.strip() for part in s.split(",") if part.strip()}
        return set()

+    def _discord_thread_require_mention(self) -> bool:
+        """Return whether thread participation requires @mention to follow up.
+
+        When ``False`` (default), once the bot has participated in a thread it
+        keeps responding to every message in that thread without needing to be
+        mentioned again — useful for one-on-one conversations.
+
+        When ``True``, the @mention requirement is enforced inside threads as
+        well.  Set this when multiple bots share a thread and you want each
+        one to only fire on explicit @mention, avoiding bot-to-bot loops or
+        unwanted cross-replies.
+        """
+        configured = self.config.extra.get("thread_require_mention")
+        if configured is not None:
+            if isinstance(configured, str):
+                return configured.lower() not in ("false", "0", "no", "off")
+            return bool(configured)
+        return os.getenv("DISCORD_THREAD_REQUIRE_MENTION", "false").lower() in ("true", "1", "yes", "on")
+
    def _thread_parent_channel(self, channel: Any) -> Any:
        """Return the parent text channel when invoked from a thread."""
        return getattr(channel, "parent", None) or channel
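Review note: the lookup order in `_discord_thread_require_mention` can be exercised in isolation with the sketch below. `thread_require_mention` is a hypothetical standalone mirror that takes the config dict and environment as arguments. It also makes one asymmetry visible: an unrecognised config string counts as enabled, while an unrecognised env value counts as disabled.

```python
import os
from typing import Any, Mapping, Optional

_FALSY = ("false", "0", "no", "off")
_TRUTHY = ("true", "1", "yes", "on")


def thread_require_mention(
    extra: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Standalone mirror of the hunk's lookup order (helper name is hypothetical)."""
    env = os.environ if env is None else env
    configured = extra.get("thread_require_mention")
    if configured is not None:
        if isinstance(configured, str):
            # Config strings: anything that is not an explicit "off" value is on.
            return configured.lower() not in _FALSY
        return bool(configured)
    # Env var: only an explicit "on" value enables the gate.
    return env.get("DISCORD_THREAD_REQUIRE_MENTION", "false").lower() in _TRUTHY
```

The config value always wins over the env var when it is present, so `{"thread_require_mention": "off"}` disables the gate even if `DISCORD_THREAD_REQUIRE_MENTION=true` is exported.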
@@ -3877,6 +3896,84 @@ class DiscordAdapter(BasePlatformAdapter):
        except Exception as e:
            return SendResult(success=False, error=str(e))

+    async def send_clarify(
+        self,
+        chat_id: str,
+        question: str,
+        choices: Optional[list],
+        clarify_id: str,
+        session_key: str,
+        metadata: Optional[Dict[str, Any]] = None,
+    ) -> SendResult:
+        """Render a clarify prompt with one Discord button per choice.
+
+        Multi-choice mode (``choices`` non-empty): renders a button per option
+        plus a final "✏️ Other (type answer)" button. Picking "Other" flips
+        the clarify entry into text-capture mode so the next user message in
+        the session becomes the response. Clicking a numbered choice resolves
+        immediately via ``resolve_gateway_clarify(clarify_id, choice_text)``.
+
+        Open-ended mode (``choices`` empty/None): renders the question as
+        plain embed text — no buttons. The gateway's text-intercept captures
+        the next message in this session and resolves the clarify.
+        """
+        if not self._client or not DISCORD_AVAILABLE:
+            return SendResult(success=False, error="Not connected")
+
+        try:
+            target_id = chat_id
+            if metadata and metadata.get("thread_id"):
+                target_id = metadata["thread_id"]
+
+            channel = self._client.get_channel(int(target_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(target_id))
+
+            # Discord embed description limit is 4096; trim conservatively.
+            max_desc = 4088
+            body = str(question or "").strip()
+            if len(body) > max_desc:
+                body = body[: max_desc - 3] + "..."
+
+            embed = discord.Embed(
+                title="❓ Hermes needs your input",
+                description=body,
+                color=discord.Color.orange(),
+            )
+
+            clean_choices = [
+                str(c).strip() for c in (choices or []) if c is not None and str(c).strip()
+            ]
+            # Discord allows up to 5 buttons per row, 5 rows per view = 25.
+            # We reserve one slot for the "Other" button, so cap at 24 choices.
+            clean_choices = clean_choices[:24]
+
+            if clean_choices:
+                embed.add_field(
+                    name="Choices",
+                    value="Pick one below, or click ✏️ Other to type a custom answer.",
+                    inline=False,
+                )
+                view = ClarifyChoiceView(
+                    choices=clean_choices,
+                    clarify_id=clarify_id,
+                    allowed_user_ids=self._allowed_user_ids,
+                    allowed_role_ids=self._allowed_role_ids,
+                )
+            else:
+                embed.add_field(
+                    name="Reply",
+                    value="Reply in this channel with your answer.",
+                    inline=False,
+                )
+                view = None
+
+            msg = await channel.send(embed=embed, view=view) if view else await channel.send(embed=embed)
+            return SendResult(success=True, message_id=str(msg.id))
+        except Exception as e:
+            logger.warning("[%s] send_clarify failed: %s", self.name, e)
+            return SendResult(success=False, error=str(e))
+
    async def send_update_prompt(
        self, chat_id: str, prompt: str, default: str = "",
        session_key: str = "",
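Review note: the input normalisation in `send_clarify` (trim the question to fit the embed, drop empty choices, cap at 24) is easy to check without a Discord client. The sketch below pulls it out into a hypothetical `prepare_clarify` helper, using the same limits as the hunk.

```python
from typing import List, Optional, Tuple

MAX_DESC = 4088    # same conservative cap as the hunk (Discord's limit is 4096)
MAX_CHOICES = 24   # 25 button slots minus one reserved for "Other"


def prepare_clarify(
    question: Optional[str],
    choices: Optional[list],
) -> Tuple[str, List[str]]:
    """Return the trimmed embed body and the cleaned choice list (hypothetical helper)."""
    body = str(question or "").strip()
    if len(body) > MAX_DESC:
        # Leave room for the ellipsis so the result is exactly MAX_DESC long.
        body = body[: MAX_DESC - 3] + "..."
    clean = [
        str(c).strip() for c in (choices or []) if c is not None and str(c).strip()
    ]
    return body, clean[:MAX_CHOICES]
```

Non-string choices are stringified rather than rejected, so a choice list such as `[" a ", None, "", 3]` comes out as `["a", "3"]`.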
@@ -4167,6 +4264,17 @@ class DiscordAdapter(BasePlatformAdapter):
        raw_content = message.content.strip()
        normalized_content = raw_content
        mention_prefix = False
+
+        snapshot_attachments = []
+        if hasattr(message, "message_snapshots") and message.message_snapshots:
+            snapshot_text_parts = []
+            for snap in message.message_snapshots:
+                if getattr(snap, "content", None):
+                    snapshot_text_parts.append(snap.content.strip())
+                snapshot_attachments.extend(getattr(snap, "attachments", []) or [])
+            if snapshot_text_parts and not raw_content:
+                raw_content = "\n".join(snapshot_text_parts)
+                normalized_content = raw_content
        if self._client.user and self._client.user in message.mentions:
            mention_prefix = True
            normalized_content = normalized_content.replace(f"<@{self._client.user.id}>", "").strip()
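Review note: the forwarded-message handling above only falls back to snapshot text when the message itself is empty, but it always collects snapshot attachments. A minimal sketch of that behaviour follows, using `types.SimpleNamespace` objects as stand-ins for `discord.Message`. `merge_snapshots` is a hypothetical helper that folds the two hunks' logic into one function.

```python
from types import SimpleNamespace
from typing import Any, List, Tuple


def merge_snapshots(message: Any) -> Tuple[str, List[Any]]:
    """Return (text, attachments) with forwarded snapshots folded in."""
    raw_content = message.content.strip()
    snapshot_attachments: List[Any] = []
    parts: List[str] = []
    for snap in getattr(message, "message_snapshots", None) or []:
        if getattr(snap, "content", None):
            parts.append(snap.content.strip())
        snapshot_attachments.extend(getattr(snap, "attachments", []) or [])
    # Snapshot text is only a fallback; the message's own text takes priority.
    if parts and not raw_content:
        raw_content = "\n".join(parts)
    return raw_content, list(message.attachments) + snapshot_attachments


forwarded = SimpleNamespace(
    content="",
    attachments=[],
    message_snapshots=[SimpleNamespace(content=" hi ", attachments=["img.png"])],
)
text, attachments = merge_snapshots(forwarded)
```

Here `text` is `"hi"` and `attachments` is `["img.png"]`. A message with its own text keeps that text but still gains the snapshot attachments.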
@@ -4209,8 +4317,15 @@ class DiscordAdapter(BasePlatformAdapter):
            )

            # Skip the mention check if the message is in a thread where
-            # the bot has previously participated (auto-created or replied in).
-            in_bot_thread = is_thread and thread_id in self._threads
+            # the bot has previously participated (auto-created or replied in)
+            # — UNLESS thread_require_mention is enabled, in which case threads
+            # are gated the same as channels.  Useful when multiple bots share
+            # a thread.
+            in_bot_thread = (
+                is_thread
+                and thread_id in self._threads
+                and not self._discord_thread_require_mention()
+            )

            if require_mention and not is_free_channel and not in_bot_thread:
                if self._client.user not in message.mentions and not mention_prefix:
@@ -4223,7 +4338,7 @@ class DiscordAdapter(BasePlatformAdapter):
        if not is_thread and not isinstance(message.channel, discord.DMChannel):
            no_thread_channels_raw = os.getenv("DISCORD_NO_THREAD_CHANNELS", "")
            no_thread_channels = {ch.strip() for ch in no_thread_channels_raw.split(",") if ch.strip()}
-            skip_thread = bool(channel_ids & no_thread_channels)
+            skip_thread = bool(channel_ids & no_thread_channels) or is_free_channel
            auto_thread = os.getenv("DISCORD_AUTO_THREAD", "true").lower() in {"true", "1", "yes"}
            is_reply_message = getattr(message, "type", None) == discord.MessageType.reply
            if auto_thread and not skip_thread and not is_voice_linked_channel and not is_reply_message:
@@ -4235,13 +4350,15 @@ class DiscordAdapter(BasePlatformAdapter):
                    auto_threaded_channel = thread
                    self._threads.mark(thread_id)

+        all_attachments = list(message.attachments) + snapshot_attachments
+
        # Determine message type
        msg_type = MessageType.TEXT
        if normalized_content.startswith("/"):
            msg_type = MessageType.COMMAND
-        elif message.attachments:
+        elif all_attachments:
            # Check attachment types
-            for att in message.attachments:
+            for att in all_attachments:
                if att.content_type:
                    if att.content_type.startswith("image/"):
                        msg_type = MessageType.PHOTO
@@ -4300,7 +4417,7 @@ class DiscordAdapter(BasePlatformAdapter):
        media_urls = []
        media_types = []
        pending_text_injection: Optional[str] = None
-        for att in message.attachments:
+        for att in all_attachments:
            content_type = att.content_type or "unknown"
            if content_type.startswith("image/"):
                try:
@@ -5099,3 +5216,188 @@ if DISCORD_AVAILABLE:
        async def on_timeout(self):
            self.resolved = True
            self.clear_items()
+
+
+    class ClarifyChoiceView(discord.ui.View):
+        """Interactive button view for the clarify tool's multiple-choice prompts.
+
+        Renders one button per choice (max 24) plus a final ``✏️ Other`` button.
+        Picking a numeric choice resolves the gateway clarify entry immediately;
+        picking ``Other`` flips the entry into text-capture mode so the next
+        user message in the session becomes the response (the gateway's
+        text-intercept handles the resolution).
+
+        Auth gating mirrors ``ExecApprovalView`` — only users/roles in the
+        Discord adapter's allowlist may answer. Single-use: after the first
+        valid click all buttons disable and the embed updates to show who
+        answered and what they chose.
+        """
+
+        def __init__(
+            self,
+            choices: List[str],
+            clarify_id: str,
+            allowed_user_ids: set,
+            allowed_role_ids: Optional[set] = None,
+        ):
+            super().__init__(timeout=300)  # 5-minute timeout
+            self.choices = list(choices)[:24]
+            self.clarify_id = clarify_id
+            self.allowed_user_ids = allowed_user_ids
+            self.allowed_role_ids = allowed_role_ids or set()
+            self.resolved = False
+
+            for index, choice in enumerate(self.choices):
+                # Discord button labels are capped at 80 chars.
+                label_body = choice if len(choice) <= 75 else choice[:72] + "..."
+                button = discord.ui.Button(
+                    label=f"{index + 1}. {label_body}",
+                    style=discord.ButtonStyle.primary,
+                    custom_id=f"clarify:{clarify_id}:{index}",
+                )
+                button.callback = self._make_choice_callback(index, choice)
+                self.add_item(button)
+
+            other_btn = discord.ui.Button(
+                label="✏️ Other (type answer)",
+                style=discord.ButtonStyle.secondary,
+                custom_id=f"clarify:{clarify_id}:other",
+            )
+            other_btn.callback = self._on_other
+            self.add_item(other_btn)
+
+        def _check_auth(self, interaction: "discord.Interaction") -> bool:
+            return _component_check_auth(
+                interaction, self.allowed_user_ids, self.allowed_role_ids,
+            )
+
+        def _make_choice_callback(self, index: int, choice: str):
+            async def _callback(interaction: "discord.Interaction"):
+                await self._resolve_choice(interaction, index, choice)
+            return _callback
+
+        async def _resolve_choice(
+            self,
+            interaction: "discord.Interaction",
+            index: int,
+            choice: str,
+        ) -> None:
+            """Resolve the clarify with a chosen option."""
+            if self.resolved:
+                await interaction.response.send_message(
+                    "This prompt has already been answered~", ephemeral=True,
+                )
+                return
+            if not self._check_auth(interaction):
+                await interaction.response.send_message(
+                    "You're not authorized to answer this prompt~", ephemeral=True,
+                )
+                return
+
+            self.resolved = True
+            for child in self.children:
+                child.disabled = True
+
+            embed = interaction.message.embeds[0] if (
+                interaction.message and interaction.message.embeds
+            ) else None
+            if embed:
+                user = getattr(interaction, "user", None)
+                display_name = getattr(user, "display_name", "user")
+                embed.color = discord.Color.green()
+                embed.set_footer(text=f"Answered by {display_name}: {choice}")
+
+            try:
+                await interaction.response.edit_message(embed=embed, view=self)
+            except Exception:
+                logger.debug(
+                    "Discord clarify edit_message failed for %s",
+                    self.clarify_id,
+                    exc_info=True,
+                )
+                try:
+                    await interaction.response.defer()
+                except Exception:
+                    pass
+
+            # Resolve via the gateway clarify primitive — same mechanism as
+            # Telegram. Look up the canonical choice text from the entry so
+            # we round-trip the original value, not a button-label variant.
+            resolved_text: Optional[str] = None
+            try:
+                from tools.clarify_gateway import _entries as _clarify_entries  # type: ignore
+                entry = _clarify_entries.get(self.clarify_id)
+                if entry and entry.choices and 0 <= index < len(entry.choices):
+                    resolved_text = entry.choices[index]
+            except Exception:
+                resolved_text = None
+            if resolved_text is None:
+                resolved_text = choice
+
+            try:
+                from tools.clarify_gateway import resolve_gateway_clarify
+                resolved = resolve_gateway_clarify(self.clarify_id, resolved_text)
+                logger.info(
+                    "Discord clarify button resolved (id=%s, choice=%r, user=%s, ok=%s)",
+                    self.clarify_id, resolved_text,
+                    getattr(getattr(interaction, "user", None), "display_name", "?"),
+                    resolved,
+                )
+            except Exception as exc:
+                logger.error(
+                    "Discord clarify resolve_gateway_clarify failed (id=%s): %s",
+                    self.clarify_id, exc,
+                )
+
+        async def _on_other(self, interaction: "discord.Interaction") -> None:
+            """Flip the clarify entry into text-capture mode."""
+            if self.resolved:
+                await interaction.response.send_message(
+                    "This prompt has already been answered~", ephemeral=True,
+                )
+                return
+            if not self._check_auth(interaction):
+                await interaction.response.send_message(
+                    "You're not authorized to answer this prompt~", ephemeral=True,
+                )
+                return
+
+            # Don't pop the entry — the gateway's text-intercept needs it
+            # until the user actually types. Just mark it as awaiting text
+            # and disable the buttons so the user can't double-click.
+            try:
+                from tools.clarify_gateway import mark_awaiting_text
+                mark_awaiting_text(self.clarify_id)
+            except Exception as exc:
+                logger.warning(
+                    "Discord clarify mark_awaiting_text failed (id=%s): %s",
+                    self.clarify_id, exc,
+                )
+
+            self.resolved = True
+            for child in self.children:
+                child.disabled = True
+
+            embed = interaction.message.embeds[0] if (
+                interaction.message and interaction.message.embeds
+            ) else None
+            if embed:
+                user = getattr(interaction, "user", None)
+                display_name = getattr(user, "display_name", "user")
+                embed.color = discord.Color.blue()
+                embed.set_footer(
+                    text=f"Awaiting typed response from {display_name}…",
+                )
+
+            try:
+                await interaction.response.edit_message(embed=embed, view=self)
+            except Exception:
+                try:
+                    await interaction.response.defer()
+                except Exception:
+                    pass
+
+        async def on_timeout(self):
+            self.resolved = True
+            for child in self.children:
+                child.disabled = True
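Review note: the label truncation in `ClarifyChoiceView.__init__` stays under Discord's 80-character button-label cap even for the highest index, because the body is held to 75 characters and the longest prefix (`"24. "`) adds 4. The sketch below reproduces just that arithmetic. `button_label` is a hypothetical helper name.

```python
def button_label(index: int, choice: str) -> str:
    """Build the numbered button label the same way the view does."""
    label_body = choice if len(choice) <= 75 else choice[:72] + "..."
    return f"{index + 1}. {label_body}"


# Worst case: the 24th button with an over-long choice.
longest = button_label(23, "x" * 200)
```

`len(longest)` is 79, one below the cap, so the 24-choice limit and the 75-character body limit are consistent with each other.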
@@ -1300,12 +1300,12 @@ def _run_official_feishu_ws_client(ws_client: Any, adapter: Any) -> None:
        except Exception:
            logger.debug("[Feishu] Failed to apply websocket runtime overrides", exc_info=True)

-    async def _connect_with_overrides(*args: Any, **kwargs: Any) -> Any:
+    def _connect_with_overrides(*args: Any, **kwargs: Any) -> Any:
        if adapter._ws_ping_interval is not None and "ping_interval" not in kwargs:
            kwargs["ping_interval"] = adapter._ws_ping_interval
        if adapter._ws_ping_timeout is not None and "ping_timeout" not in kwargs:
            kwargs["ping_timeout"] = adapter._ws_ping_timeout
-        return await original_connect(*args, **kwargs)
+        return original_connect(*args, **kwargs)

    def _configure_with_overrides(conf: Any) -> Any:
        if original_configure is None:
@@ -1343,8 +1343,65 @@ def _run_official_feishu_ws_client(ws_client: Any, adapter: Any) -> None:


 def check_feishu_requirements() -> bool:
-    """Check if Feishu/Lark dependencies are available."""
-    return FEISHU_AVAILABLE
+    """Check if Feishu/Lark dependencies are available.
+
+    Lazy-installs lark-oapi via ``tools.lazy_deps.ensure_and_bind("platform.feishu", ...)``
+    on first call if not present. Rebinds all module-level globals on success.
+    """
+    if FEISHU_AVAILABLE:
+        return True
+
+    def _import():
+        import lark_oapi as lark
+        from lark_oapi.api.application.v6 import GetApplicationRequest
+        from lark_oapi.api.im.v1 import (
+            CreateFileRequest, CreateFileRequestBody,
+            CreateImageRequest, CreateImageRequestBody,
+            CreateMessageRequest, CreateMessageRequestBody,
+            GetChatRequest, GetMessageRequest, GetMessageResourceRequest,
+            P2ImMessageMessageReadV1,
+            ReplyMessageRequest, ReplyMessageRequestBody,
+            UpdateMessageRequest, UpdateMessageRequestBody,
+        )
+        from lark_oapi.core import AccessTokenType, HttpMethod
+        from lark_oapi.core.const import FEISHU_DOMAIN, LARK_DOMAIN
+        from lark_oapi.core.model import BaseRequest
+        from lark_oapi.event.callback.model.p2_card_action_trigger import (
+            CallBackCard, P2CardActionTriggerResponse,
+        )
+        from lark_oapi.event.dispatcher_handler import EventDispatcherHandler
+        from lark_oapi.ws import Client as FeishuWSClient
+        return {
+            "lark": lark,
+            "GetApplicationRequest": GetApplicationRequest,
+            "CreateFileRequest": CreateFileRequest,
+            "CreateFileRequestBody": CreateFileRequestBody,
+            "CreateImageRequest": CreateImageRequest,
+            "CreateImageRequestBody": CreateImageRequestBody,
+            "CreateMessageRequest": CreateMessageRequest,
+            "CreateMessageRequestBody": CreateMessageRequestBody,
+            "GetChatRequest": GetChatRequest,
+            "GetMessageRequest": GetMessageRequest,
+            "GetMessageResourceRequest": GetMessageResourceRequest,
+            "P2ImMessageMessageReadV1": P2ImMessageMessageReadV1,
+            "ReplyMessageRequest": ReplyMessageRequest,
+            "ReplyMessageRequestBody": ReplyMessageRequestBody,
+            "UpdateMessageRequest": UpdateMessageRequest,
+            "UpdateMessageRequestBody": UpdateMessageRequestBody,
+            "AccessTokenType": AccessTokenType,
+            "HttpMethod": HttpMethod,
+            "FEISHU_DOMAIN": FEISHU_DOMAIN,
+            "LARK_DOMAIN": LARK_DOMAIN,
+            "BaseRequest": BaseRequest,
+            "CallBackCard": CallBackCard,
+            "P2CardActionTriggerResponse": P2CardActionTriggerResponse,
+            "EventDispatcherHandler": EventDispatcherHandler,
+            "FeishuWSClient": FeishuWSClient,
+            "FEISHU_AVAILABLE": True,
+        }
+
+    from tools.lazy_deps import ensure_and_bind
+    return ensure_and_bind("platform.feishu", _import, globals(), prompt=False)


 class FeishuAdapter(BasePlatformAdapter):
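Review note: the Feishu, Matrix and Slack checks all hand an `_import` closure and `globals()` to `ensure_and_bind`. Its implementation is not part of this diff, so the sketch below is only a guess at the pattern: try the import, install on `ImportError`, retry, then rebind the returned names. `ensure_and_bind_sketch` and its `install` parameter are hypothetical and the real signature differs (it takes a feature key and `prompt=`).

```python
from typing import Any, Callable, Dict, MutableMapping


def ensure_and_bind_sketch(
    importer: Callable[[], Dict[str, Any]],
    namespace: MutableMapping[str, Any],
    install: Callable[[], None],
) -> bool:
    """Hypothetical stand-in for tools.lazy_deps.ensure_and_bind."""
    try:
        bindings = importer()
    except ImportError:
        try:
            install()              # assumed: installs the optional dependency
            bindings = importer()  # retry now that the package should exist
        except Exception:
            return False
    # Rebind every module-level name in one step, including the *_AVAILABLE flag.
    namespace.update(bindings)
    return True
```

Returning the bindings as a dict keeps the rebinding atomic: either every name (including the availability flag) is updated, or none is.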
@@ -224,7 +224,11 @@ def _check_e2ee_deps() -> bool:


 def check_matrix_requirements() -> bool:
-    """Return True if the Matrix adapter can be used."""
+    """Return True if the Matrix adapter can be used.
+
+    Lazy-installs mautrix via ``tools.lazy_deps.ensure_and_bind("platform.matrix", ...)``
+    on first call if not present. Rebinds all module-level type globals on success.
+    """
    token = os.getenv("MATRIX_ACCESS_TOKEN", "")
    password = os.getenv("MATRIX_PASSWORD", "")
    homeserver = os.getenv("MATRIX_HOMESERVER", "")
@@ -238,10 +242,31 @@ def check_matrix_requirements() -> bool:
    try:
        import mautrix  # noqa: F401
    except ImportError:
-        logger.warning(
-            "Matrix: mautrix not installed. Run: pip install 'mautrix[encryption]'"
-        )
-        return False
+        def _import():
+            from mautrix.types import (
+                ContentURI, EventID, EventType, PaginationDirection,
+                PresenceState, RoomCreatePreset, RoomID, SyncToken,
+                TrustState, UserID,
+            )
+            return {
+                "ContentURI": ContentURI,
+                "EventID": EventID,
+                "EventType": EventType,
+                "PaginationDirection": PaginationDirection,
+                "PresenceState": PresenceState,
+                "RoomCreatePreset": RoomCreatePreset,
+                "RoomID": RoomID,
+                "SyncToken": SyncToken,
+                "TrustState": TrustState,
+                "UserID": UserID,
+            }
+
+        from tools.lazy_deps import ensure_and_bind
+        if not ensure_and_bind("platform.matrix", _import, globals(), prompt=False):
+            logger.warning(
+                "Matrix: mautrix not installed. Run: pip install 'mautrix[encryption]'"
+            )
+            return False

    # If encryption is requested, verify E2EE deps are available at startup
    # rather than silently degrading to plaintext-only at connect time.
@@ -176,6 +176,28 @@ class QQAdapter(BasePlatformAdapter):
                fut.set_exception(RuntimeError(reason))
        self._pending_responses.clear()

+    def _mark_transport_disconnected(self) -> None:
+        """Mark QQ WS down without stopping the reconnect loop.
+
+        BasePlatformAdapter uses _running for both process lifecycle and
+        connection status. QQBot needs to keep the listener task alive across
+        transient transport drops so it can continue reconnect attempts after a
+        short-lived gateway or network failure.
+        """
+        if self.has_fatal_error:
+            return
+        self._write_runtime_status_safe(
+            "disconnected",
+            platform_state="disconnected",
+            error_code=None,
+            error_message=None,
+        )
+
+    @property
+    def is_connected(self) -> bool:
+        """Return True only when the QQ WebSocket transport is usable."""
+        return bool(self._running and self._ws and not self._ws.closed)
+
    def __init__(self, config: PlatformConfig):
        super().__init__(config, Platform.QQBOT)

@@ -509,7 +531,7 @@ class QQAdapter(BasePlatformAdapter):
                else:
                    quick_disconnect_count = 0

-                self._mark_disconnected()
+                self._mark_transport_disconnected()
                self._fail_pending("Connection closed")

                # Stop reconnecting for fatal codes
@@ -531,6 +553,7 @@ class QQAdapter(BasePlatformAdapter):
                        RATE_LIMIT_DELAY,
                    )
                    if backoff_idx >= MAX_RECONNECT_ATTEMPTS:
+                        self._mark_disconnected()
                        return
                    await asyncio.sleep(RATE_LIMIT_DELAY)
                    if await self._reconnect(backoff_idx):
@@ -584,17 +607,19 @@ class QQAdapter(BasePlatformAdapter):
                    backoff_idx += 1
                    if backoff_idx >= MAX_RECONNECT_ATTEMPTS:
                        logger.error("[%s] Max reconnect attempts reached (QQCloseError)", self._log_tag)
+                        self._mark_disconnected()
                        return

            except Exception as exc:
                if not self._running:
                    return
                logger.warning("[%s] WebSocket error: %s", self._log_tag, exc)
-                self._mark_disconnected()
+                self._mark_transport_disconnected()
                self._fail_pending("Connection interrupted")

                if backoff_idx >= MAX_RECONNECT_ATTEMPTS:
                    logger.error("[%s] Max reconnect attempts reached", self._log_tag)
+                    self._mark_disconnected()
                    return

                if await self._reconnect(backoff_idx):
@@ -73,8 +73,29 @@ class _ThreadContextCache:


 def check_slack_requirements() -> bool:
-    """Check if Slack dependencies are available."""
-    return SLACK_AVAILABLE
+    """Check if Slack dependencies are available.
+
+    Lazy-installs slack-bolt/slack-sdk via ``tools.lazy_deps.ensure_and_bind("platform.slack", ...)``
+    on first call if not present. Rebinds all module-level globals on success.
+    """
+    if SLACK_AVAILABLE:
+        return True
+
+    def _import():
+        from slack_bolt.async_app import AsyncApp
+        from slack_bolt.adapter.socket_mode.async_handler import AsyncSocketModeHandler
+        from slack_sdk.web.async_client import AsyncWebClient
+        import aiohttp
+        return {
+            "AsyncApp": AsyncApp,
+            "AsyncSocketModeHandler": AsyncSocketModeHandler,
+            "AsyncWebClient": AsyncWebClient,
+            "aiohttp": aiohttp,
+            "SLACK_AVAILABLE": True,
+        }
+
+    from tools.lazy_deps import ensure_and_bind
+    return ensure_and_bind("platform.slack", _import, globals(), prompt=False)


 def _extract_text_from_slack_blocks(blocks: list) -> str:
@@ -1777,6 +1798,26 @@ class SlackAdapter(BasePlatformAdapter):
            return

        original_text = event.get("text", "")
+
+        # Slack blocks native slash commands inside threads ("/queue is not
+        # supported in threads. Sorry!").  As a workaround, recognise a
+        # leading ``!`` as an alternate command prefix and rewrite it to
+        # ``/`` so the rest of the pipeline (MessageType.COMMAND tagging,
+        # gateway dispatcher) handles it like a normal slash command.  Only
+        # rewrite when the first token resolves to a known gateway command
+        # so casual messages like "!nice work" pass through unchanged.
+        if original_text.startswith("!"):
+            try:
+                from hermes_cli.commands import is_gateway_known_command
+                first_token = original_text[1:].split(maxsplit=1)[0]
+                # Strip "@suffix" the same way get_command() does, so
+                # forms like ``!stop@hermes`` still resolve.
+                cmd_name = first_token.split("@", 1)[0].lower()
+                if cmd_name and "/" not in cmd_name and is_gateway_known_command(cmd_name):
+                    original_text = "/" + original_text[1:]
+            except Exception:  # pragma: no cover - defensive
+                pass
+
        text = original_text

        # Extract quoted/forwarded content from Slack blocks.
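Review note: the `!` to `/` rewrite can be tested without Slack by passing the command lookup in as a callable. In the sketch below, `rewrite_bang_command` is a hypothetical helper and `is_known` stands in for `hermes_cli.commands.is_gateway_known_command`. It handles a bare `"!"` explicitly, where the hunk relies on the surrounding `except` to absorb the `IndexError`.

```python
from typing import Callable


def rewrite_bang_command(text: str, is_known: Callable[[str], bool]) -> str:
    """Rewrite a leading '!' to '/' only when the first token is a known command."""
    if not text.startswith("!"):
        return text
    tokens = text[1:].split(maxsplit=1)
    if not tokens:
        return text  # bare "!" has no command token
    # Strip an "@suffix" so forms like "!stop@hermes" still resolve.
    cmd_name = tokens[0].split("@", 1)[0].lower()
    if cmd_name and "/" not in cmd_name and is_known(cmd_name):
        return "/" + text[1:]
    return text


known = {"stop", "queue"}.__contains__
rewritten = rewrite_bang_command("!stop@hermes now", known)
```

`rewritten` is `"/stop@hermes now"`, while a casual message such as `"!nice work"` passes through unchanged because `nice` is not a known command.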
@@ -332,6 +332,13 @@ class TelegramAdapter(BasePlatformAdapter):
    MEDIA_GROUP_WAIT_SECONDS = 0.8
    _GENERAL_TOPIC_THREAD_ID = "1"

+    # Telegram's edit_message applies MarkdownV2 formatting only on the
+    # finalize=True path.  Without this flag, stream_consumer._send_or_edit
+    # short-circuits when the raw text is unchanged between the last streamed
+    # edit and the final edit, skipping the plain-text → MarkdownV2 conversion.
+    # Fixes #25710.
+    REQUIRES_EDIT_FINALIZE: bool = True
+
    # Adaptive text-batch ingress: short messages need a tighter delay so the
    # first token reaches the agent fast.  Numbers tuned for "feels instant":
    # ≤320 codepoints (one short paragraph) settles in ~180ms; ≤1024
@@ -2070,7 +2077,7 @@ class TelegramAdapter(BasePlatformAdapter):
            return SendResult(success=False, error="Not connected")
        try:
            default_hint = f" (default: {default})" if default else ""
-            text = f"⚕ *Update needs your input:*\n\n{prompt}{default_hint}"
+            text = self.format_message(f"⚕ *Update needs your input:*\n\n{prompt}{default_hint}")
            keyboard = InlineKeyboardMarkup([
                [
                    InlineKeyboardButton("✓ Yes", callback_data="update_prompt:y"),
@@ -2082,7 +2089,7 @@ class TelegramAdapter(BasePlatformAdapter):
            msg = await self._send_message_with_thread_fallback(
                chat_id=int(chat_id),
                text=text,
-                parse_mode=ParseMode.MARKDOWN,
+                parse_mode=ParseMode.MARKDOWN_V2,
                reply_markup=keyboard,
                reply_to_message_id=reply_to_id,
                **self._thread_kwargs_for_send(
@@ -2334,11 +2341,13 @@ class TelegramAdapter(BasePlatformAdapter):
            keyboard = InlineKeyboardMarkup(rows)

            provider_label = get_label(current_provider)
-            text = (
-                f"⚙ *Model Configuration*\n\n"
-                f"Current model: `{current_model or 'unknown'}`\n"
-                f"Provider: {provider_label}\n\n"
-                f"Select a provider:"
+            text = self.format_message(
+                (
+                    f"⚙ *Model Configuration*\n\n"
+                    f"Current model: `{current_model or 'unknown'}`\n"
+                    f"Provider: {provider_label}\n\n"
+                    f"Select a provider:"
+                )
            )

            thread_id = metadata.get("thread_id") if metadata else None
@@ -2346,7 +2355,7 @@ class TelegramAdapter(BasePlatformAdapter):
            msg = await self._send_message_with_thread_fallback(
                chat_id=int(chat_id),
                text=text,
-                parse_mode=ParseMode.MARKDOWN,
+                parse_mode=ParseMode.MARKDOWN_V2,
                reply_markup=keyboard,
                reply_to_message_id=reply_to_id,
                **self._thread_kwargs_for_send(
@@ -2456,12 +2465,14 @@ class TelegramAdapter(BasePlatformAdapter):
            extra = f"\n_{total - shown} more available — type `/model <name>` directly_" if total > shown else ""

            await query.edit_message_text(
-                text=(
-                    f"⚙ *Model Configuration*\n\n"
-                    f"Provider: *{pname}*{page_info}\n"
-                    f"Select a model:{extra}"
+                text=self.format_message(
+                    (
+                        f"⚙ *Model Configuration*\n\n"
+                        f"Provider: *{pname}*{page_info}\n"
+                        f"Select a model:{extra}"
+                    )
                ),
-                parse_mode=ParseMode.MARKDOWN,
+                parse_mode=ParseMode.MARKDOWN_V2,
                reply_markup=keyboard,
            )
            await query.answer()
@@ -2490,12 +2501,14 @@ class TelegramAdapter(BasePlatformAdapter):
            extra = f"\n_{total - shown} more available — type `/model <name>` directly_" if total > shown else ""

            await query.edit_message_text(
-                text=(
-                    f"⚙ *Model Configuration*\n\n"
-                    f"Provider: *{pname}*{page_info}\n"
-                    f"Select a model:{extra}"
+                text=self.format_message(
+                    (
+                        f"⚙ *Model Configuration*\n\n"
+                        f"Provider: *{pname}*{page_info}\n"
+                        f"Select a model:{extra}"
+                    )
                ),
-                parse_mode=ParseMode.MARKDOWN,
+                parse_mode=ParseMode.MARKDOWN_V2,
                reply_markup=keyboard,
            )
            await query.answer()
@@ -2530,8 +2543,8 @@ class TelegramAdapter(BasePlatformAdapter):
            # Edit message to show confirmation, remove buttons
            try:
                await query.edit_message_text(
-                    text=result_text,
-                    parse_mode=ParseMode.MARKDOWN,
+                    text=self.format_message(result_text),
+                    parse_mode=ParseMode.MARKDOWN_V2,
                    reply_markup=None,
                )
            except Exception:
@@ -2571,13 +2584,15 @@ class TelegramAdapter(BasePlatformAdapter):
                provider_label = state["current_provider"]

            await query.edit_message_text(
-                text=(
-                    f"⚙ *Model Configuration*\n\n"
-                    f"Current model: `{state['current_model'] or 'unknown'}`\n"
-                    f"Provider: {provider_label}\n\n"
-                    f"Select a provider:"
+                text=self.format_message(
+                    (
+                        f"⚙ *Model Configuration*\n\n"
+                        f"Current model: `{state['current_model'] or 'unknown'}`\n"
+                        f"Provider: {provider_label}\n\n"
+                        f"Select a provider:"
+                    )
                ),
-                parse_mode=ParseMode.MARKDOWN,
+                parse_mode=ParseMode.MARKDOWN_V2,
                reply_markup=keyboard,
            )
            await query.answer()
@@ -2660,8 +2675,8 @@ class TelegramAdapter(BasePlatformAdapter):
                # Edit message to show decision, remove buttons
                try:
                    await query.edit_message_text(
-                        text=f"{label} by {user_display}",
-                        parse_mode=ParseMode.MARKDOWN,
+                        text=self.format_message(f"{label} by {user_display}"),
+                        parse_mode=ParseMode.MARKDOWN_V2,
                        reply_markup=None,
                    )
                except Exception:
@@ -2714,8 +2729,8 @@ class TelegramAdapter(BasePlatformAdapter):

                try:
                    await query.edit_message_text(
-                        text=f"{label} by {user_display}",
-                        parse_mode=ParseMode.MARKDOWN,
+                        text=self.format_message(f"{label} by {user_display}"),
+                        parse_mode=ParseMode.MARKDOWN_V2,
                        reply_markup=None,
                    )
                except Exception:
@@ -2740,8 +2755,8 @@ class TelegramAdapter(BasePlatformAdapter):
                        prompt_message_id = getattr(query.message, "message_id", None)
                        send_kwargs: Dict[str, Any] = {
                            "chat_id": int(query.message.chat_id),
-                            "text": result_text,
-                            "parse_mode": ParseMode.MARKDOWN,
+                            "text": self.format_message(result_text),
+                            "parse_mode": ParseMode.MARKDOWN_V2,
                            **self._link_preview_kwargs(),
                        }
                        chat_type_value = getattr(chat_type, "value", chat_type)
@@ -2901,8 +2916,8 @@ class TelegramAdapter(BasePlatformAdapter):
        label = "Yes" if answer == "y" else "No"
        try:
            await query.edit_message_text(
-                text=f"⚕ Update prompt answered: *{label}*",
-                parse_mode=ParseMode.MARKDOWN,
+                text=self.format_message(f"⚕ Update prompt answered: *{label}*"),
+                parse_mode=ParseMode.MARKDOWN_V2,
                reply_markup=None,
            )
        except Exception:
@@ -345,6 +345,7 @@ class WeComAdapter(BasePlatformAdapter):
                try:
                    await self._open_connection()
                    backoff_idx = 0
+                    self._mark_connected()
                    logger.info("[%s] Reconnected", self.name)
                except Exception as reconnect_exc:
                    logger.warning("[%s] Reconnect failed: %s", self.name, reconnect_exc)
@@ -322,6 +322,26 @@ class WhatsAppAdapter(BasePlatformAdapter):
            return {str(part).strip() for part in raw if str(part).strip()}
        return {part.strip() for part in str(raw).split(",") if part.strip()}

+    @staticmethod
+    def _is_broadcast_chat(chat_id: str) -> bool:
+        """True for WhatsApp pseudo-chats that aren't real conversations.
+
+        Covers Status updates (Stories) and Channel/Newsletter broadcasts.
+        These show up as inbound messages on Baileys but the agent should
+        never reply — answering a Story update spams the contact's status
+        feed, and Channel posts aren't addressable in the first place.
+        """
+        if not chat_id:
+            return False
+        cid = chat_id.strip().lower()
+        if cid == "status@broadcast":
+            return True
+        # @broadcast suffix covers status@broadcast plus any future
+        # broadcast-list variants. @newsletter is the Channel JID suffix.
+        if cid.endswith("@broadcast") or cid.endswith("@newsletter"):
+            return True
+        return False
+
    def _is_dm_allowed(self, sender_id: str) -> bool:
        """Check whether a DM from the given sender should be processed."""
        if self._dm_policy == "disabled":
@@ -432,9 +452,16 @@ class WhatsAppAdapter(BasePlatformAdapter):
        return cleaned.strip() or text

    def _should_process_message(self, data: Dict[str, Any]) -> bool:
+        chat_id_raw = str(data.get("chatId") or "")
+        # WhatsApp uses pseudo-chats for Status updates (Stories) and
+        # Channel/Newsletter broadcasts. These are not real conversations
+        # and the agent should never reply to them — even in self-chat mode
+        # where the bridge may surface them as "fromMe" events.
+        if self._is_broadcast_chat(chat_id_raw):
+            return False
        is_group = data.get("isGroup", False)
        if is_group:
-            chat_id = str(data.get("chatId") or "")
+            chat_id = chat_id_raw
            if not self._is_group_allowed(chat_id):
                return False
        else:
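
The broadcast filter added above can be exercised in isolation. This is a minimal sketch, not the adapter's code: `is_broadcast_chat` is a standalone mirror of `_is_broadcast_chat`, the sample JIDs are made up, and the explicit `status@broadcast` comparison is left out because the `@broadcast` suffix test already covers it.

```python
def is_broadcast_chat(chat_id: str) -> bool:
    """Standalone mirror of the suffix check in the hunk above."""
    if not chat_id:
        return False
    cid = chat_id.strip().lower()
    # "@broadcast" covers Status updates; "@newsletter" covers Channels.
    return cid.endswith("@broadcast") or cid.endswith("@newsletter")


# Made-up sample JIDs mapped to the expected verdict; only the suffix matters.
samples = {
    "status@broadcast": True,
    "120363000000000000@newsletter": True,
    " Status@Broadcast ": True,          # whitespace and case are normalized
    "15551234567@s.whatsapp.net": False,  # ordinary DM
    "120363000000000001@g.us": False,     # ordinary group
    "": False,
}
results = {jid: is_broadcast_chat(jid) for jid in samples}
```

Because the check runs before the group/DM branch in `_should_process_message`, a broadcast pseudo-chat is rejected regardless of the allow-list or DM policy.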
@@ -494,12 +521,15 @@ class WhatsAppAdapter(BasePlatformAdapter):
                # plain executable path.
                _npm_bin = shutil.which("npm") or "npm"
                try:
+                    # Timeout comes from WHATSAPP_NPM_INSTALL_TIMEOUT; defaults to 300 seconds
+                    # (5 minutes) to accommodate slower systems such as an Unraid NAS
+                    npm_install_timeout = int(os.environ.get("WHATSAPP_NPM_INSTALL_TIMEOUT", "300"))
                    install_result = subprocess.run(
                        [_npm_bin, "install", "--silent"],
                        cwd=str(bridge_dir),
                        capture_output=True,
                        text=True,
-                        timeout=60,
+                        timeout=npm_install_timeout,
                    )
                    if install_result.returncode != 0:
                        print(f"[{self.name}] npm install failed: {install_result.stderr}")
@@ -1139,6 +1139,38 @@ def _should_clear_resume_pending_after_turn(agent_result: dict) -> bool:
    return True


+def _preserve_queued_followup_history_offset(
+    current_result: dict,
+    followup_result: dict,
+) -> dict:
+    """Carry the outer history offset through queued follow-up drains.
+
+    ``_process_message_background()`` persists transcript rows only once, after the
+    entire in-band queued-follow-up chain returns.  Each recursive ``_run_agent()``
+    call advances ``history_offset`` to the history it received, so without
+    correction the outermost persistence step sees only the *last* queued turn as
+    "new" and silently drops earlier turns from the same drain chain.
+
+    Preserve the earliest (outermost) history offset so the final transcript slice
+    still includes every queued turn that ran during the chain.
+    """
+    if not isinstance(followup_result, dict):
+        return followup_result
+    if not isinstance(current_result, dict):
+        return followup_result
+
+    current_offset = current_result.get("history_offset")
+    followup_offset = followup_result.get("history_offset")
+    if not isinstance(current_offset, int):
+        return followup_result
+    if isinstance(followup_offset, int) and followup_offset <= current_offset:
+        return followup_result
+
+    merged = dict(followup_result)
+    merged["history_offset"] = current_offset
+    return merged
+
+
 class GatewayRunner:
    """
    Main gateway controller.
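
The effect of `_preserve_queued_followup_history_offset` is easiest to see on a two-turn drain chain. The sketch below re-implements the same merge rule under a hypothetical name (`preserve_outer_offset`) with made-up result dicts; it assumes only what the docstring states, namely that each result carries an integer `history_offset`.

```python
from typing import Any, Dict


def preserve_outer_offset(current: Dict[str, Any], followup: Dict[str, Any]) -> Dict[str, Any]:
    """Return the follow-up result, but never with a later offset than the outer turn."""
    if not isinstance(followup, dict) or not isinstance(current, dict):
        return followup
    current_offset = current.get("history_offset")
    followup_offset = followup.get("history_offset")
    if not isinstance(current_offset, int):
        return followup
    if isinstance(followup_offset, int) and followup_offset <= current_offset:
        return followup
    merged = dict(followup)  # copy so the caller's dict is not mutated
    merged["history_offset"] = current_offset
    return merged


# Outer turn started at history index 4; the queued follow-up started at 6.
outer = {"history_offset": 4, "final_response": "first answer"}
inner = {"history_offset": 6, "final_response": "second answer"}
merged = preserve_outer_offset(outer, inner)
```

Here `merged` keeps the follow-up's response but reports offset 4, so a transcript slice taken from that offset includes both turns rather than only the last one.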
@@ -6096,6 +6128,12 @@ class GatewayRunner:
            if _cmd_def_inner and _cmd_def_inner.name == "model":
                return "Agent is running — wait or /stop first, then switch models."

+            # /codex-runtime must not be used while the agent is running.
+            # Switching mid-turn would split a turn across two transports.
+            if _cmd_def_inner and _cmd_def_inner.name == "codex-runtime":
+                return ("Agent is running — wait or /stop first, then "
+                        "change runtime.")
+
            # /approve and /deny must bypass the running-agent interrupt path.
            # The agent thread is blocked on a threading.Event inside
            # tools/approval.py — sending an interrupt won't unblock it.
@@ -6135,6 +6173,12 @@ class GatewayRunner:
                    return await self._handle_goal_command(event)
                return "Agent is running — use /goal status / pause / clear mid-run, or /stop before setting a new goal."

+            # /subgoal is safe mid-run — it only modifies the goal's
+            # subgoals list, which the judge reads at the next turn
+            # boundary. No race with the running turn.
+            if _cmd_def_inner and _cmd_def_inner.name == "subgoal":
+                return await self._handle_subgoal_command(event)
+
            # Session-level toggles that are safe to run mid-agent —
            # /yolo can unblock a pending approval prompt, /verbose cycles
            # the tool-progress display mode for the ongoing stream.
@@ -6430,6 +6474,9 @@ class GatewayRunner:
        if canonical == "model":
            return await self._handle_model_command(event)

+        if canonical == "codex-runtime":
+            return await self._handle_codex_runtime_command(event)
+
        if canonical == "personality":
            return await self._handle_personality_command(event)

@@ -6513,6 +6560,9 @@ class GatewayRunner:
        if canonical == "goal":
            return await self._handle_goal_command(event)

+        if canonical == "subgoal":
+            return await self._handle_subgoal_command(event)
+
        if canonical == "voice":
            return await self._handle_voice_command(event)

@@ -9210,6 +9260,51 @@ class GatewayRunner:

        return "\n".join(lines)

+    async def _handle_codex_runtime_command(self, event: MessageEvent) -> str:
+        """Handle /codex-runtime command in the gateway.
+
+        Same surface as the CLI handler in cli.py:
+            /codex-runtime                  — show current state
+            /codex-runtime auto             — Hermes default runtime
+            /codex-runtime codex_app_server — codex subprocess runtime
+            /codex-runtime on / off         — synonyms
+
+        On change, the cached agent for this session is evicted so the next
+        message creates a fresh AIAgent with the new api_mode wired in
+        (avoids prompt-cache invalidation mid-session)."""
+        from hermes_cli import codex_runtime_switch as crs
+
+        raw_args = event.get_command_args().strip() if event else ""
+        new_value, errors = crs.parse_args(raw_args)
+        if errors:
+            return "❌ " + "\n❌ ".join(errors)
+
+        # Load + persist via the same helpers used for /model and /yolo
+        try:
+            from hermes_cli.config import load_config, save_config
+        except Exception as exc:
+            return f"❌ Could not load config: {exc}"
+        cfg = load_config()
+
+        result = crs.apply(
+            cfg,
+            new_value,
+            persist_callback=(save_config if new_value is not None else None),
+        )
+
+        # On a real change, evict the cached agent so the new runtime takes
+        # effect on the next message rather than waiting for cache TTL.
+        if result.success and new_value is not None and result.requires_new_session:
+            try:
+                session_key = self._session_key_for_source(event.source)
+                self._evict_cached_agent(session_key)
+            except Exception:
+                logger.debug("could not evict cached agent after codex-runtime change",
+                             exc_info=True)
+
+        prefix = "✓" if result.success else "✗"
+        return f"{prefix} {result.message}"
+
    async def _handle_personality_command(self, event: MessageEvent) -> str:
        """Handle /personality command - list or set a personality."""
        from hermes_constants import display_hermes_home
@@ -9438,6 +9533,57 @@ class GatewayRunner:

        return t("gateway.goal.set", budget=state.max_turns, goal=state.goal)

+    async def _handle_subgoal_command(self, event: "MessageEvent") -> str:
+        """Handle /subgoal for gateway platforms (mirror of CLI handler).
+
+        Subgoals are extra criteria appended to the active goal mid-loop.
+        They modify state read at the next turn boundary, so this is safe
+        to invoke while the agent is running.
+        """
+        args = (event.get_command_args() or "").strip()
+        mgr, _session_entry = self._get_goal_manager_for_event(event)
+        if mgr is None:
+            return t("gateway.goal.unavailable")
+        if not mgr.has_goal():
+            return "No active goal. Set one with /goal <text>."
+
+        # No args → list current subgoals.
+        if not args:
+            return f"{mgr.status_line()}\n{mgr.render_subgoals()}"
+
+        tokens = args.split(None, 1)
+        verb = tokens[0].lower()
+        rest = tokens[1].strip() if len(tokens) > 1 else ""
+
+        if verb == "remove":
+            if not rest:
+                return "Usage: /subgoal remove <n>"
+            try:
+                idx = int(rest.split()[0])
+            except ValueError:
+                return "/subgoal remove: <n> must be an integer (1-based index)."
+            try:
+                removed = mgr.remove_subgoal(idx)
+            except (IndexError, RuntimeError) as exc:
+                return f"/subgoal remove: {exc}"
+            return f"✓ Removed subgoal {idx}: {removed}"
+
+        if verb == "clear":
+            try:
+                prev = mgr.clear_subgoals()
+            except RuntimeError as exc:
+                return f"/subgoal clear: {exc}"
+            if prev:
+                return f"✓ Cleared {prev} subgoal{'s' if prev != 1 else ''}."
+            return "No subgoals to clear."
+
+        try:
+            text = mgr.add_subgoal(args)
+        except (ValueError, RuntimeError) as exc:
+            return f"/subgoal: {exc}"
+        idx = len(mgr.state.subgoals) if mgr.state else 0
+        return f"✓ Added subgoal {idx}: {text}"
+
    async def _send_goal_status_notice(self, source: Any, message: str) -> None:
        """Send a /goal judge status line back to the originating chat/thread."""
        adapter = self.adapters.get(source.platform)
@@ -10209,6 +10355,10 @@ class GatewayRunner:

        event_message_id = self._reply_anchor_for_event(event)

+        # Forward image/audio attachments so the background agent can see them.
+        media_urls = list(event.media_urls) if event.media_urls else []
+        media_types = list(event.media_types) if event.media_types else []
+
        # Fire-and-forget the background task
        _task = asyncio.create_task(
            self._run_background_task(
@@ -10216,6 +10366,8 @@ class GatewayRunner:
                source,
                task_id,
                event_message_id=event_message_id,
+                media_urls=media_urls,
+                media_types=media_types,
            )
        )
        self._background_tasks.add(_task)
@@ -10230,10 +10382,15 @@ class GatewayRunner:
        source: "SessionSource",
        task_id: str,
        event_message_id: Optional[str] = None,
+        media_urls: Optional[List[str]] = None,
+        media_types: Optional[List[str]] = None,
    ) -> None:
        """Execute a background agent task and deliver the result to the chat."""
        from run_agent import AIAgent

+        media_urls = media_urls or []
+        media_types = media_types or []
+
        adapter = self.adapters.get(source.platform)
        if not adapter:
            logger.warning("No adapter for platform %s in background task %s", source.platform, task_id)
@@ -10269,6 +10426,23 @@ class GatewayRunner:
            self._service_tier = self._load_service_tier()
            turn_route = self._resolve_turn_agent_config(prompt, model, runtime_kwargs)

+            # Enrich the prompt with image descriptions so the background
+            # agent can see user-attached images (same as the main flow).
+            enriched_prompt = prompt
+            if media_urls:
+                image_paths = []
+                for i, path in enumerate(media_urls):
+                    mtype = media_types[i] if i < len(media_types) else ""
+                    if mtype.startswith("image/"):
+                        image_paths.append(path)
+                if image_paths:
+                    try:
+                        enriched_prompt = await self._enrich_message_with_vision(
+                            prompt, image_paths,
+                        )
+                    except Exception as e:
+                        logger.warning("Background task vision enrichment failed: %s", e)
+
            def run_sync():
                agent = AIAgent(
                    model=turn_route["model"],
@@ -10300,7 +10474,7 @@ class GatewayRunner:
                )
                try:
                    return agent.run_conversation(
-                        user_message=prompt,
+                        user_message=enriched_prompt,
                        task_id=task_id,
                    )
                finally:
@@ -15957,6 +16131,7 @@ class GatewayRunner:
                    _already_streamed = bool(
                        (_sc and getattr(_sc, "final_response_sent", False))
                        or _previewed
+                        or (_sc and getattr(_sc, "final_content_delivered", False))
                    )
                    first_response = result.get("final_response", "")
                    if first_response and not _already_streamed:
@@ -16042,7 +16217,7 @@ class GatewayRunner:
                    except Exception:
                        pass

-                return await self._run_agent(
+                followup_result = await self._run_agent(
                    message=next_message,
                    context_prompt=context_prompt,
                    history=updated_history,
@@ -16054,6 +16229,7 @@ class GatewayRunner:
                    event_message_id=next_message_id,
                    channel_prompt=next_channel_prompt,
                )
+                return _preserve_queued_followup_history_offset(result, followup_result)
        finally:
            # Stop progress sender, interrupt monitor, and notification task
            if progress_task:
@@ -16117,12 +16293,16 @@ class GatewayRunner:
            # response_previewed means the interim_assistant_callback already
            # sent the final text via the adapter (non-streaming path).
            _previewed = bool(response.get("response_previewed"))
-            if not _is_empty_sentinel and (_streamed or _previewed):
+            _content_delivered = bool(
+                _sc and getattr(_sc, "final_content_delivered", False)
+            )
+            if not _is_empty_sentinel and (_streamed or _previewed or _content_delivered):
                logger.info(
-                    "Suppressing normal final send for session %s: final delivery already confirmed (streamed=%s previewed=%s).",
+                    "Suppressing normal final send for session %s: final delivery already confirmed (streamed=%s previewed=%s content_delivered=%s).",
                    session_key or "?",
                    _streamed,
                    _previewed,
+                    _content_delivered,
                )
                response["already_sent"] = True

@@ -128,6 +128,7 @@ def _read_process_cmdline(pid: int) -> Optional[str]:

    On Linux, reads /proc/<pid>/cmdline directly.  On macOS and other
    platforms without /proc, falls back to ``ps -p <pid> -o command=``.
+    On Windows (no /proc, no ps), uses psutil.
    """
    cmdline_path = Path(f"/proc/{pid}/cmdline")
    try:
@@ -150,6 +151,16 @@ def _read_process_cmdline(pid: int) -> Optional[str]:
    except (OSError, subprocess.TimeoutExpired):
        pass

+    # Windows fallback: psutil (already used by _pid_exists)
+    try:
+        import psutil  # type: ignore
+        proc = psutil.Process(pid)
+        cmdline_parts = proc.cmdline()
+        if cmdline_parts:
+            return " ".join(cmdline_parts)
+    except Exception:
+        pass
+
    return None


@@ -178,7 +189,8 @@ def _record_looks_like_gateway(record: dict[str, Any]) -> bool:
    if not isinstance(argv, list) or not argv:
        return False

-    cmdline = " ".join(str(part) for part in argv)
+    # Normalize Windows backslashes so patterns match cross-platform.
+    cmdline = " ".join(str(part) for part in argv).replace("\\", "/")
    patterns = (
        "hermes_cli.main gateway",
        "hermes_cli/main.py gateway",
@@ -150,6 +150,10 @@ class GatewayStreamConsumer:
        self._flood_strikes = 0         # Consecutive flood-control edit failures
        self._current_edit_interval = self.cfg.edit_interval  # Adaptive backoff
        self._final_response_sent = False
+        # Set when the final response content was sent to the user via
+        # streaming, even if the final edit (cursor removal etc.)
+        # subsequently failed.
+        self._final_content_delivered = False
        # Cache adapter lifecycle capability: only platforms that need an
        # explicit finalize call (e.g. DingTalk AI Cards) force us to make
        # a redundant final edit.  Everyone else keeps the fast path.
@@ -187,6 +191,12 @@ class GatewayStreamConsumer:
        """True when the stream consumer delivered the final assistant reply."""
        return self._final_response_sent

+    @property
+    def final_content_delivered(self) -> bool:
+        """True when the final response content reached the user, even if
+        the subsequent cosmetic edit (cursor removal) failed."""
+        return self._final_content_delivered
+
    def on_segment_break(self) -> None:
        """Finalize the current stream segment and start a fresh message."""
        self._queue.put(_NEW_SEGMENT)
@@ -455,6 +465,8 @@ class GatewayStreamConsumer:
                            # tool-progress edits or fallback-mode promotion (#10748)
                            # — that doesn't mean the final answer reached the user.
                            self._final_response_sent = chunks_delivered
+                            if chunks_delivered:
+                                self._final_content_delivered = True
                            return
                        if got_segment_break:
                            self._message_id = None
@@ -505,6 +517,11 @@ class GatewayStreamConsumer:
                    self._last_edit_time = time.monotonic()

                if got_done:
+                    # Record that the final content reached the user even
+                    # if the cosmetic final edit below fails.
+                    if current_update_visible and self._accumulated:
+                        self._final_content_delivered = True
+
                    # Final edit without cursor. If progressive editing failed
                    # mid-stream, send a single continuation/fallback message
                    # here instead of letting the base gateway path send the
@@ -35,7 +35,7 @@ from dataclasses import dataclass, field
 from datetime import datetime, timezone
 from http.server import BaseHTTPRequestHandler, HTTPServer
 from pathlib import Path
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List, Optional, Tuple
 from urllib.parse import parse_qs, urlencode, urlparse

 import httpx
@@ -284,7 +284,7 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
    ),
    "alibaba": ProviderConfig(
        id="alibaba",
-        name="Alibaba Cloud (DashScope)",
+        name="Qwen Cloud",
        auth_type="api_key",
        inference_base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
        api_key_env_vars=("DASHSCOPE_API_KEY",),
@@ -3870,6 +3870,39 @@ def _snapshot_nous_pool_status() -> Dict[str, Any]:
        return _empty_nous_auth_status()


+# ── Process-level memo for get_nous_auth_status() ──
+# get_nous_auth_status() validates state by calling resolve_nous_runtime_credentials(),
+# which does a synchronous OAuth refresh POST to portal.nousresearch.com. That can take
+# ~350ms even on the failure path, and read-only UI surfaces (`hermes tools`, status panels,
+# subscription-feature checks) call it many times per render — `hermes tools` → "All Platforms"
+# was firing the refresh ~31× during one menu paint, racking up >13s of HTTP and burning
+# single-use refresh tokens. Cache the snapshot for 15 seconds, keyed on the auth.json
+# mtime so that `hermes auth login/logout/add/remove` invalidate naturally on the next call.
+_NOUS_AUTH_STATUS_CACHE_TTL = 15.0  # seconds
+_nous_auth_status_cache: Optional[Tuple[float, Optional[float], Dict[str, Any]]] = None
+
+
+def _auth_file_mtime() -> Optional[float]:
+    try:
+        return _auth_file_path().stat().st_mtime
+    except FileNotFoundError:
+        return None
+    except Exception:
+        return None
+
+
+def invalidate_nous_auth_status_cache() -> None:
+    """Clear the get_nous_auth_status() process-level memo.
+
+    Call this from any code path that mutates Nous auth state without going
+    through resolve_nous_runtime_credentials() (e.g. tests). Login/logout
+    flows touch auth.json, so the mtime check below invalidates them
+    automatically — explicit invalidation is the belt-and-braces option.
+    """
+    global _nous_auth_status_cache
+    _nous_auth_status_cache = None
+
+
 def get_nous_auth_status() -> Dict[str, Any]:
    """Status snapshot for Nous auth.

@@ -3878,7 +3911,32 @@ def get_nous_auth_status() -> Dict[str, Any]:
    by resolving runtime credentials so revoked refresh sessions do not show up
    as a healthy login. If provider state is absent, fall back to the credential
    pool for the just-logged-in / not-yet-promoted case.
+
+    The returned snapshot is memoised for ~15s keyed on the auth.json mtime,
+    so menu/status surfaces that ask repeatedly don't trigger one refresh POST
+    per call. Login/logout flows write to auth.json and therefore invalidate
+    the cache automatically; tests can also call
+    ``invalidate_nous_auth_status_cache()`` explicitly.
    """
+    global _nous_auth_status_cache
+    now = time.monotonic()
+    mtime = _auth_file_mtime()
+    cached = _nous_auth_status_cache
+    if cached is not None:
+        cached_at, cached_mtime, cached_status = cached
+        if (
+            cached_mtime == mtime
+            and (now - cached_at) < _NOUS_AUTH_STATUS_CACHE_TTL
+        ):
+            return dict(cached_status)
+
+    status = _compute_nous_auth_status()
+    _nous_auth_status_cache = (now, mtime, dict(status))
+    return status
+
+
+def _compute_nous_auth_status() -> Dict[str, Any]:
+    """Uncached implementation of get_nous_auth_status(). See that function."""
    state = get_provider_auth_state("nous")
    if state:
        base_status = {
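
The caching pattern in this hunk (a TTL memo that is also invalidated when a file's mtime changes) can be sketched generically. Everything named here is hypothetical: `MtimeTTLMemo`, `expensive_status`, and the temporary `auth.json` stand in for the real module-level cache and the OAuth refresh call.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional, Tuple


class MtimeTTLMemo:
    """Cache a computed dict until the TTL expires or the watched file's mtime changes."""

    def __init__(self, path: Path, ttl: float, compute: Callable[[], Dict[str, Any]]) -> None:
        self._path = path
        self._ttl = ttl
        self._compute = compute
        self._cached: Optional[Tuple[float, Optional[float], Dict[str, Any]]] = None

    def _mtime(self) -> Optional[float]:
        try:
            return self._path.stat().st_mtime
        except OSError:
            return None  # a missing file is a valid cache key too

    def get(self) -> Dict[str, Any]:
        now = time.monotonic()
        mtime = self._mtime()
        if self._cached is not None:
            cached_at, cached_mtime, value = self._cached
            if cached_mtime == mtime and (now - cached_at) < self._ttl:
                return dict(value)  # copy so callers cannot mutate the cache
        value = self._compute()
        self._cached = (now, mtime, dict(value))
        return value


calls: List[int] = []


def expensive_status() -> Dict[str, Any]:
    """Stand-in for the slow status computation; counts its invocations."""
    calls.append(1)
    return {"logged_in": True, "computed": len(calls)}


with tempfile.TemporaryDirectory() as tmp:
    auth_file = Path(tmp) / "auth.json"
    auth_file.write_text("{}")
    memo = MtimeTTLMemo(auth_file, ttl=15.0, compute=expensive_status)
    first = memo.get()
    second = memo.get()  # served from the cache
    st = auth_file.stat()
    os.utime(auth_file, (st.st_atime, st.st_mtime + 10))  # simulate a login rewriting the file
    third = memo.get()   # mtime changed, so the value is recomputed
```

Using `time.monotonic()` for the age check, as the hunk does, keeps the TTL immune to wall-clock adjustments; the mtime comparison handles the case where auth state changes well inside the TTL window.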
@@ -581,6 +581,19 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
    if mcp_connected:
        summary_parts.append(f"{mcp_connected} MCP servers")
    summary_parts.append("/help for commands")
+    # Indicate when the codex_app_server runtime is active so users
+    # understand why tool counts may not match what's actually reachable
+    # (codex builds its own tool list inside the spawned subprocess).
+    try:
+        from hermes_cli.codex_runtime_switch import get_current_runtime
+        from hermes_cli.config import load_config as _load_cfg
+        if get_current_runtime(_load_cfg()) == "codex_app_server":
+            right_lines.append(
+                f"[bold {accent}]Runtime:[/] [{text}]codex app-server[/] "
+                f"[dim {dim}](terminal/file ops/MCP run inside codex)[/]"
+            )
+    except Exception:
+        pass
    # Show active profile name when not 'default'
    try:
        from hermes_cli.profiles import get_active_profile_name
@@ -22,6 +22,7 @@ from pathlib import Path
 from hermes_constants import is_wsl as _is_wsl

 logger = logging.getLogger(__name__)
+_PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


 def save_clipboard_image(dest: Path) -> bool:
@@ -378,10 +379,13 @@ def _wayland_save(dest: Path) -> bool:
            dest.unlink(missing_ok=True)
            return False

-        # BMP needs conversion to PNG (common in WSLg where only BMP
-        # is bridged from Windows clipboard via RDP).
-        if mime == "image/bmp":
-            return _convert_to_png(dest)
+        # save_clipboard_image() promises a PNG output path. Wayland can offer
+        # JPEG/GIF/WebP/BMP payloads, so normalize every non-PNG result before
+        # returning success.
+        if mime != "image/png":
+            if not _convert_to_png(dest) or not _is_png_file(dest):
+                dest.unlink(missing_ok=True)
+                return False

        return True

@@ -433,6 +437,15 @@ def _convert_to_png(path: Path) -> bool:
    return path.exists() and path.stat().st_size > 0


+def _is_png_file(path: Path) -> bool:
+    """Return True when *path* starts with the PNG file signature."""
+    try:
+        with path.open("rb") as f:
+            return f.read(len(_PNG_SIGNATURE)) == _PNG_SIGNATURE
+    except OSError:
+        return False
+
+
 # ── X11 (xclip) ─────────────────────────────────────────────────────────

 def _xclip_has_image() -> bool:
@@ -0,0 +1,614 @@
+"""Migrate Hermes' MCP server config and Codex's installed curated plugins
+to the format Codex expects in ~/.codex/config.toml.
+
+When the user enables the codex_app_server runtime, the codex subprocess
+runs its own MCP client and its own plugin runtime (Linear, Atlassian,
+Asana, plus per-account ChatGPT apps via app/list). For both of those to
+be useful, the user's choices need to be visible to codex too. This
+module:
+
+  1. Reads Hermes' YAML and writes equivalent [mcp_servers.<name>]
+     entries to ~/.codex/config.toml.
+  2. Queries codex's `plugin/list` for the openai-curated marketplace
+     and writes [plugins."<name>@<marketplace>"] entries for any plugin
+     the user has installed=true on their codex CLI. (This is what
+     OpenClaw calls "migrate native codex plugins" — the YouTube-video-
+     worthy bit Pash highlighted: Canva, GitHub, Calendar, Gmail
+     pre-configured.)
+  3. Writes a top-level `default_permissions` entry so users on this
+     runtime don't get an approval prompt on every write attempt.
+
+What translates (MCP servers):
+  Hermes mcp_servers.<n>.command/args/env  → codex stdio transport
+  Hermes mcp_servers.<n>.url/headers       → codex streamable_http transport
+  Hermes mcp_servers.<n>.timeout           → codex tool_timeout_sec
+  Hermes mcp_servers.<n>.connect_timeout   → codex startup_timeout_sec
+
+What does NOT translate (warned + skipped):
+  Hermes-specific keys (sampling, etc.) — codex's MCP client has no
+  equivalent. Listed in the per-server skipped[] field of the report.
+
+What's NOT migrated (intentional):
+  AGENTS.md — codex respects this file natively in its cwd. Hermes' own
+  AGENTS.md (project-level) is already in the worktree, so codex picks
+  it up without translation. No code needed.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Any, Optional
+
+logger = logging.getLogger(__name__)
+
+
+# Marker comments wrapping the managed section so re-runs can detect
+# what's ours. If only the start marker exists, a legacy heuristic applies.
+MIGRATION_MARKER = (
+    "# managed by hermes-agent — `hermes codex-runtime migrate` regenerates this section"
+)
+MIGRATION_END_MARKER = (
+    "# end hermes-agent managed section"
+)
+
+
+@dataclass
+class MigrationReport:
+    """Outcome of a migration pass."""
+
+    target_path: Optional[Path] = None
+    migrated: list[str] = field(default_factory=list)
+    skipped_keys_per_server: dict[str, list[str]] = field(default_factory=dict)
+    migrated_plugins: list[str] = field(default_factory=list)
+    plugin_query_error: Optional[str] = None
+    wrote_permissions_default: Optional[str] = None
+    errors: list[str] = field(default_factory=list)
+    written: bool = False
+    dry_run: bool = False
+
+    def summary(self) -> str:
+        lines = []
+        if self.dry_run:
+            lines.append(f"(dry run) Would write {self.target_path}")
+        elif self.written:
+            lines.append(f"Wrote {self.target_path}")
+        if self.migrated:
+            lines.append(f"Migrated {len(self.migrated)} MCP server(s):")
+            for name in self.migrated:
+                skipped = self.skipped_keys_per_server.get(name, [])
+                note = (
+                    f" (skipped: {', '.join(skipped)})" if skipped else ""
+                )
+                lines.append(f"  - {name}{note}")
+        else:
+            lines.append("No MCP servers found in Hermes config.")
+        if self.migrated_plugins:
+            lines.append(
+                f"Migrated {len(self.migrated_plugins)} native Codex plugin(s):"
+            )
+            for name in self.migrated_plugins:
+                lines.append(f"  - {name}")
+        elif self.plugin_query_error:
+            lines.append(f"Codex plugin discovery skipped: {self.plugin_query_error}")
+        if self.wrote_permissions_default:
+            lines.append(
+                f"Wrote default_permissions = "
+                f"{self.wrote_permissions_default!r}"
+            )
+        for err in self.errors:
+            lines.append(f"⚠ {err}")
+        return "\n".join(lines)
+
+
+# Hermes keys the migration recognizes (transport, timeouts, general).
+# Any key outside this set, and not in _KEYS_DROPPED_WITH_WARNING below,
+# is reported in skipped as an unknown Hermes key.
+_KNOWN_HERMES_KEYS = {
+    # transport — stdio
+    "command", "args", "env", "cwd",
+    # transport — http
+    "url", "headers", "transport",
+    # timeouts
+    "timeout", "connect_timeout",
+    # general
+    "enabled", "description",
+}
+
+# Keys with no codex equivalent; reported in skipped with a warning.
+_KEYS_DROPPED_WITH_WARNING = {
+    # Hermes' sampling subsection — codex MCP has no equivalent
+    "sampling",
+}
+
+
+def _translate_one_server(
+    name: str, hermes_cfg: dict
+) -> tuple[Optional[dict], list[str]]:
+    """Translate one Hermes MCP server config to the codex inline-table dict
+    representation. Returns (codex_entry, skipped_keys).
+
+    codex_entry is a dict ready for TOML serialization, or None when the
+    server can't be translated (e.g. neither command nor url present)."""
+    if not isinstance(hermes_cfg, dict):
+        return None, []
+
+    skipped: list[str] = []
+    out: dict[str, Any] = {}
+
+    has_command = bool(hermes_cfg.get("command"))
+    has_url = bool(hermes_cfg.get("url"))
+
+    if has_command and has_url:
+        skipped.append("url (both command and url set; preferring stdio)")
+        has_url = False
+
+    if has_command:
+        # Stdio transport
+        out["command"] = str(hermes_cfg["command"])
+        args = hermes_cfg.get("args") or []
+        if args:
+            out["args"] = [str(a) for a in args]
+        env = hermes_cfg.get("env") or {}
+        if env:
+            # Codex expects string values
+            out["env"] = {str(k): str(v) for k, v in env.items()}
+        cwd = hermes_cfg.get("cwd")
+        if cwd:
+            out["cwd"] = str(cwd)
+    elif has_url:
+        # streamable_http transport (codex covers both http and SSE here)
+        out["url"] = str(hermes_cfg["url"])
+        headers = hermes_cfg.get("headers") or {}
+        if headers:
+            out["http_headers"] = {str(k): str(v) for k, v in headers.items()}
+        # Hermes' transport: sse hint is informational; codex auto-negotiates
+        if hermes_cfg.get("transport") == "sse":
+            skipped.append("transport=sse (codex auto-negotiates)")
+    else:
+        return None, ["no command or url field"]
+
+    # Timeouts
+    if "timeout" in hermes_cfg:
+        try:
+            out["tool_timeout_sec"] = float(hermes_cfg["timeout"])
+        except (TypeError, ValueError):
+            skipped.append("timeout (not numeric)")
+    if "connect_timeout" in hermes_cfg:
+        try:
+            out["startup_timeout_sec"] = float(hermes_cfg["connect_timeout"])
+        except (TypeError, ValueError):
+            skipped.append("connect_timeout (not numeric)")
+
+    # Enabled flag (codex defaults to true so we only emit when explicitly false)
+    if hermes_cfg.get("enabled") is False:
+        out["enabled"] = False
+
+    # Detect keys we explicitly drop with warning
+    for key in hermes_cfg:
+        if key in _KEYS_DROPPED_WITH_WARNING:
+            skipped.append(f"{key} (no codex equivalent)")
+        elif key not in _KNOWN_HERMES_KEYS:
+            skipped.append(f"{key} (unknown Hermes key)")
+
+    return out, skipped
+
+
+def _format_toml_value(value: Any) -> str:
+    """Minimal TOML value formatter for the value types we emit.
+
+    We only emit strings, numbers, booleans, lists, and inline tables of
+    those; no arrays of tables. This covers everything the migration emits."""
+    if isinstance(value, bool):
+        return "true" if value else "false"
+    if isinstance(value, (int, float)):
+        return repr(value)
+    if isinstance(value, str):
+        # Escape per TOML basic-string rules. Order matters: backslash
+        # first so the other escapes don't get re-escaped.
+        # Control characters (newline, tab, etc.) must use \-escapes
+        # because TOML basic strings don't allow literal control chars
+        # — passing them through would produce invalid TOML that codex
+        # would refuse to load. Paths usually don't contain control
+        # chars but env-var passthrough (HERMES_HOME, PYTHONPATH) could
+        # in pathological cases.
+        escaped = (
+            value
+            .replace("\\", "\\\\")
+            .replace('"', '\\"')
+            .replace("\b", "\\b")
+            .replace("\t", "\\t")
+            .replace("\n", "\\n")
+            .replace("\f", "\\f")
+            .replace("\r", "\\r")
+        )
+        return f'"{escaped}"'
+    if isinstance(value, list):
+        items = ", ".join(_format_toml_value(v) for v in value)
+        return f"[{items}]"
+    if isinstance(value, dict):
+        items = ", ".join(
+            f'{_quote_key(k)} = {_format_toml_value(v)}' for k, v in value.items()
+        )
+        return "{ " + items + " }" if items else "{}"
+    raise ValueError(f"Unsupported TOML value type: {type(value).__name__}")
+
+
+def _quote_key(key: str) -> str:
+    """Return key bare-or-quoted depending on whether it's a valid bare key."""
+    if key and all((c.isascii() and c.isalnum()) or c in "-_" for c in key):
+        return key
+    escaped = key.replace("\\", "\\\\").replace('"', '\\"')
+    return f'"{escaped}"'
+
+def render_codex_toml_section(
+    servers: dict[str, dict],
+    plugins: Optional[list[dict]] = None,
+    default_permission_profile: Optional[str] = None,
+) -> str:
+    """Render the managed [mcp_servers.<n>] / [plugins.<id>] / [permissions]
+    block for ~/.codex/config.toml.
+
+    Args:
+        servers: dict of MCP server name → translated codex inline-table
+        plugins: optional list of {name, marketplace, enabled} for native
+            Codex plugins to enable. (E.g. the Linear / Atlassian / Asana
+            curated plugins, or per-account ChatGPT apps.)
+        default_permission_profile: when set, write a top-level
+            `default_permissions` key so the user doesn't get an approval
+            prompt on every write attempt. A leading ":" (built-in profile
+            prefix) is added when the name lacks one.
+    """
+    out = [MIGRATION_MARKER]
+    if not servers and not plugins and not default_permission_profile:
+        out.append("# (no MCP servers, plugins, or permissions configured by Hermes)")
+        out.append(MIGRATION_END_MARKER)
+        return "\n".join(out) + "\n"
+
+    if default_permission_profile:
+        # Codex's config schema: `default_permissions` is a top-level
+        # string referencing a profile name. Built-in profile names start
+        # with ":" (migrate()'s docstring lists them). The [permissions]
+        # table is for *user-defined* named profiles with structured
+        # fields, which is not what we want here.
+        normalized = (
+            default_permission_profile
+            if default_permission_profile.startswith(":")
+            else f":{default_permission_profile}"
+        )
+        out.append("")
+        out.append(f"default_permissions = {_format_toml_value(normalized)}")
+
+    if servers:
+        for name in sorted(servers.keys()):
+            cfg = servers[name]
+            out.append("")
+            out.append(f"[mcp_servers.{_quote_key(name)}]")
+            for k, v in cfg.items():
+                out.append(f"{_quote_key(k)} = {_format_toml_value(v)}")
+
+    if plugins:
+        for plugin in sorted(plugins, key=lambda p: f"{p.get('name','')}@{p.get('marketplace','')}"):
+            name = plugin.get("name") or ""
+            marketplace = plugin.get("marketplace") or "openai-curated"
+            enabled = bool(plugin.get("enabled", True))
+            qualified = f"{name}@{marketplace}"
+            out.append("")
+            out.append(f'[plugins.{_quote_key(qualified)}]')
+            out.append(f"enabled = {_format_toml_value(enabled)}")
+
+    out.append("")
+    out.append(MIGRATION_END_MARKER)
+    return "\n".join(out) + "\n"
+
+
+def _strip_existing_managed_block(toml_text: str) -> str:
+    """Remove any prior managed section so re-runs idempotently replace it.
+
+    The managed section is everything between MIGRATION_MARKER (start) and
+    MIGRATION_END_MARKER (end), inclusive of both markers. User-edited
+    sections above or below are preserved verbatim.
+
+    Backward compatibility: if the start marker is found but no end marker
+    follows, we fall back to a heuristic that swallows lines until the
+    first section header that is not [mcp_servers.*], [plugins.*], or
+    [permissions*]. Top-level keys such as `default_permissions =` are
+    swallowed too. This matches what older versions of this code wrote,
+    so re-runs don't break configs from prior Hermes versions."""
+    lines = toml_text.splitlines(keepends=True)
+    out: list[str] = []
+    in_managed = False
+    saw_end_marker = False
+    for line in lines:
+        line_stripped_nl = line.rstrip("\n")
+        if line_stripped_nl == MIGRATION_MARKER:
+            in_managed = True
+            saw_end_marker = False
+            continue
+        if in_managed:
+            if line_stripped_nl == MIGRATION_END_MARKER:
+                in_managed = False
+                saw_end_marker = True
+                continue
+            stripped = line.lstrip()
+            if not saw_end_marker and stripped.startswith("[") and not (
+                stripped.startswith("[mcp_servers")
+                or stripped.startswith("[plugins")
+                or stripped.startswith("[permissions]")
+                or stripped.startswith("[permissions.")
+            ):
+                # Old-format managed block without end marker: bail back
+                # to user content as soon as we see a non-managed section.
+                in_managed = False
+                out.append(line)
+                continue
+            # Otherwise swallow the line.
+            continue
+        out.append(line)
+    return "".join(out)
+
+
+def _query_codex_plugins(
+    codex_home: Optional[Path] = None,
+    timeout: float = 8.0,
+) -> tuple[list[dict], Optional[str]]:
+    """Query codex's `plugin/list` for installed curated plugins.
+
+    Spawns `codex app-server` briefly, sends initialize + plugin/list,
+    extracts plugins where installed=true. Returns (plugins, error).
+    Plugins is a list of {name, marketplace, enabled} dicts ready for
+    render_codex_toml_section().
+
+    On any failure (codex not installed, RPC error, timeout) returns
+    ([], error_message). Migration treats this as non-fatal — MCP
+    servers and permissions still write through.
+    """
+    try:
+        from agent.transports.codex_app_server import CodexAppServerClient
+    except Exception as exc:
+        return [], f"transport unavailable: {exc}"
+
+    try:
+        with CodexAppServerClient(
+            codex_home=str(codex_home) if codex_home else None
+        ) as client:
+            client.initialize(client_name="hermes-migration")
+            resp = client.request("plugin/list", {}, timeout=timeout)
+    except Exception as exc:
+        return [], f"plugin/list query failed: {exc}"
+
+    out: list[dict] = []
+    seen: set[tuple[str, str]] = set()
+    marketplaces = resp.get("marketplaces") or []
+    if not isinstance(marketplaces, list):
+        return [], "plugin/list response missing 'marketplaces'"
+    for marketplace in marketplaces:
+        if not isinstance(marketplace, dict):
+            continue
+        market_name = str(marketplace.get("name") or "openai-curated")
+        plugins = marketplace.get("plugins") or []
+        if not isinstance(plugins, list):
+            continue
+        for plugin in plugins:
+            if not isinstance(plugin, dict):
+                continue
+            installed = bool(plugin.get("installed", False))
+            if not installed:
+                continue
+            # Skip plugins codex itself reports as unavailable (broken
+            # install, missing OAuth, removed from marketplace, etc.).
+            # Cf. openclaw/openclaw#80815 — OpenClaw learned to gate
+            # migration on app readiness to avoid writing config that
+            # would fail at activation time. Our migration writes to
+            # codex's config.toml directly, so a broken plugin would
+            # surface as a codex error on first use. Skipping it here
+            # keeps the migrated config clean and the user's first
+            # codex turn from failing.
+            availability = str(plugin.get("availability") or "").upper()
+            if availability and availability != "AVAILABLE":
+                logger.debug(
+                    "skipping plugin %s: availability=%s",
+                    plugin.get("name"), availability,
+                )
+                continue
+            name = str(plugin.get("name") or "")
+            if not name:
+                continue
+            key = (name, market_name)
+            if key in seen:
+                continue
+            seen.add(key)
+            # Carry forward whatever 'enabled' codex reports — defaults to
+            # true for installed plugins. This is the same shape OpenClaw
+            # writes when migrating native codex plugins.
+            out.append({
+                "name": name,
+                "marketplace": market_name,
+                "enabled": bool(plugin.get("enabled", True)),
+            })
+    return out, None
+
+
+def _build_hermes_tools_mcp_entry() -> dict:
+    """Build the codex stdio-transport entry that launches Hermes' own
+    tool surface as an MCP server. Codex's subprocess will call back into
+    this for browser/web/delegate_task/vision/memory/skills tools.
+
+    The command runs the worktree's Python via the current sys.executable
+    so Hermes installs under /opt/, /usr/local/, or a venv all work.
+    HERMES_HOME and PYTHONPATH are passed through so the spawned process
+    sees the same config + module layout the user is running."""
+    import sys
+
+    env: dict[str, str] = {}
+    # HERMES_HOME passes through if set so the MCP subprocess sees the
+    # same config / auth / sessions DB as the parent CLI.
+    hermes_home = os.environ.get("HERMES_HOME")
+    if hermes_home:
+        env["HERMES_HOME"] = hermes_home
+    # PYTHONPATH passes through so a worktree-launched hermes finds the
+    # branch's modules instead of the installed package.
+    pythonpath = os.environ.get("PYTHONPATH")
+    if pythonpath:
+        env["PYTHONPATH"] = pythonpath
+    # Quiet mode + redaction defaults so the MCP wire stays clean.
+    env["HERMES_QUIET"] = "1"
+    env["HERMES_REDACT_SECRETS"] = os.environ.get("HERMES_REDACT_SECRETS", "true")
+
+    out: dict[str, Any] = {
+        "command": sys.executable,
+        "args": ["-m", "agent.transports.hermes_tools_mcp_server"],
+    }
+    if env:
+        out["env"] = env
+    # Generous timeouts — browser_navigate or delegate_task can take a
+    # while; we don't want codex's MCP client to give up too early.
+    out["startup_timeout_sec"] = 30.0
+    out["tool_timeout_sec"] = 600.0
+    return out
+
+
+def migrate(
+    hermes_config: dict,
+    *,
+    codex_home: Optional[Path] = None,
+    dry_run: bool = False,
+    discover_plugins: bool = True,
+    default_permission_profile: Optional[str] = ":workspace",
+    expose_hermes_tools: bool = True,
+) -> MigrationReport:
+    """Translate Hermes mcp_servers config + Codex curated plugins into
+    ~/.codex/config.toml.
+
+    Args:
+        hermes_config: full ~/.hermes/config.yaml dict
+        codex_home: override CODEX_HOME (defaults to ~/.codex)
+        dry_run: skip the actual write; report what would happen
+        discover_plugins: when True (default), query `plugin/list` against
+            the live codex CLI to migrate any installed curated plugins
+            into [plugins."<name>@<marketplace>"] entries. Set False to
+            skip the subprocess spawn (for tests or restricted environments).
+        default_permission_profile: when set (default ":workspace"), write
+            top-level `default_permissions = "<name>"` so users on this
+            runtime don't get an approval prompt on every write attempt.
+            Built-in codex profile names are ":workspace", ":read-only",
+            ":danger-no-sandbox" (note the leading ":"). Names passed
+            without a leading ":" get one added at render time, so a
+            user-defined [permissions.<name>] profile can't be selected
+            here yet. Set None to leave permissions unset and let codex
+            use its compiled-in default (which is read-only).
+        expose_hermes_tools: when True (default), register Hermes' own
+            tool surface (web_search, browser_*, delegate_task, vision,
+            memory, skills, etc.) as an MCP server in ~/.codex/config.toml
+            so the codex subprocess can call back into Hermes for tools
+            codex doesn't have built in. Set False to opt out.
+    """
+    report = MigrationReport(dry_run=dry_run)
+    codex_home = codex_home or Path.home() / ".codex"
+    target = codex_home / "config.toml"
+    report.target_path = target
+
+    hermes_servers = (hermes_config or {}).get("mcp_servers") or {}
+    if not isinstance(hermes_servers, dict):
+        report.errors.append(
+            "mcp_servers in Hermes config is not a dict; cannot migrate."
+        )
+        return report
+
+    translated: dict[str, dict] = {}
+    for name, cfg in hermes_servers.items():
+        out, skipped = _translate_one_server(str(name), cfg or {})
+        if out is None:
+            report.errors.append(
+                f"server {name!r} skipped: {', '.join(skipped) or 'no transport configured'}"
+            )
+            continue
+        translated[str(name)] = out
+        if skipped:
+            report.skipped_keys_per_server[str(name)] = skipped
+        report.migrated.append(str(name))
+
+    # Discover installed Codex curated plugins. Best-effort — never blocks
+    # the migration if codex is unreachable or the RPC fails.
+    plugins: list[dict] = []
+    if discover_plugins and not dry_run:
+        plugins, plugin_err = _query_codex_plugins(codex_home=codex_home)
+        if plugin_err:
+            report.plugin_query_error = plugin_err
+        for p in plugins:
+            report.migrated_plugins.append(f"{p['name']}@{p['marketplace']}")
+
+    # Track whether we wrote a default permission profile so the report
+    # surfaces it to the user.
+    if default_permission_profile:
+        report.wrote_permissions_default = default_permission_profile
+
+    # Inject Hermes' own tool surface as an MCP server so the spawned
+    # codex subprocess can call back into Hermes for the tools codex
+    # doesn't ship with — web_search, browser_*, delegate_task, vision,
+    # memory, skills, session_search, image_generate, text_to_speech.
+    # The server itself is agent/transports/hermes_tools_mcp_server.py
+    # and is launched on demand by codex (stdio MCP).
+    if expose_hermes_tools:
+        translated["hermes-tools"] = _build_hermes_tools_mcp_entry()
+        if "hermes-tools" not in report.migrated:
+            report.migrated.append("hermes-tools")
+
+    # Build the new managed block
+    managed_block = render_codex_toml_section(
+        translated, plugins=plugins,
+        default_permission_profile=default_permission_profile,
+    )
+
+    # Read existing codex config if any, strip the prior managed block,
+    # append the new one.
+    if target.exists():
+        try:
+            existing = target.read_text(encoding="utf-8")
+        except Exception as exc:
+            report.errors.append(f"could not read {target}: {exc}")
+            return report
+        without_managed = _strip_existing_managed_block(existing)
+        # Ensure exactly one blank line between user content and managed block
+        if without_managed and not without_managed.endswith("\n"):
+            without_managed += "\n"
+        new_text = (
+            without_managed.rstrip("\n") + "\n\n" + managed_block
+            if without_managed.strip()
+            else managed_block
+        )
+    else:
+        new_text = managed_block
+
+    if dry_run:
+        return report
+
+    try:
+        codex_home.mkdir(parents=True, exist_ok=True)
+        # Atomic write: write to a temp file in the same directory, then
+        # rename. Same-directory rename is atomic on POSIX; on Windows
+        # Path.replace() swaps the file in one call. A crash mid-write
+        # can't leave a half-written config.toml that codex won't load.
+        import tempfile
+        tmp_fd, tmp_path_str = tempfile.mkstemp(
+            prefix=".config.toml.", dir=str(codex_home)
+        )
+        tmp_path = Path(tmp_path_str)
+        try:
+            with os.fdopen(tmp_fd, "w", encoding="utf-8") as fh:
+                fh.write(new_text)
+            tmp_path.replace(target)
+        except Exception:
+            # Clean up the temp file if the rename didn't happen.
+            try:
+                if tmp_path.exists():
+                    tmp_path.unlink()
+            except Exception:
+                pass
+            raise
+        report.written = True
+    except Exception as exc:
+        report.errors.append(f"could not write {target}: {exc}")
+    return report
@@ -0,0 +1,266 @@
+"""Shared logic for the /codex-runtime slash command.
+
+Toggles `model.openai_runtime` between "auto" (= chat_completions, Hermes'
+default) and "codex_app_server" (= hand turns to a codex subprocess).
+
+Both CLI (cli.py) and gateway (gateway/run.py) call into this module so the
+behavior stays identical across surfaces.
+
+The actual runtime resolution happens in hermes_cli.runtime_provider's
+_maybe_apply_codex_app_server_runtime() helper, which reads the persisted
+config value. This module just persists the value and reports the change.
+"""
+
+from __future__ import annotations
+
+import logging
+from dataclasses import dataclass
+from typing import Optional
+
+logger = logging.getLogger(__name__)
+
+
+VALID_RUNTIMES = ("auto", "codex_app_server")
+
+
+@dataclass
+class CodexRuntimeStatus:
+    """Result of a /codex-runtime invocation. Callers render this however
+    suits their surface (CLI uses Rich panels, gateway sends a text message)."""
+
+    success: bool
+    new_value: Optional[str] = None
+    old_value: Optional[str] = None
+    message: str = ""
+    requires_new_session: bool = False
+    codex_binary_ok: bool = True
+    codex_version: Optional[str] = None
+
+
+def parse_args(arg_string: str) -> tuple[Optional[str], list[str]]:
+    """Parse the slash-command argument string. Returns (value, errors).
+
+    No args         → return current state (value=None)
+    'auto' / 'codex_app_server'  → that value ('on'/'off' and synonyms map to these)
+    anything else   → error
+    """
+    raw = (arg_string or "").strip().lower()
+    if not raw:
+        return None, []
+    # Accept human-friendly synonyms
+    if raw in ("on", "codex", "enable"):
+        return "codex_app_server", []
+    if raw in ("off", "default", "disable", "hermes"):
+        return "auto", []
+    if raw in VALID_RUNTIMES:
+        return raw, []
+    return None, [
+        f"Unknown runtime {raw!r}. Use one of: auto, codex_app_server, on, off"
+    ]
+
+
+def get_current_runtime(config: dict) -> str:
+    """Read the current `model.openai_runtime` value from a config dict.
+    Returns 'auto' for unset / empty / unrecognized values."""
+    if not isinstance(config, dict):
+        return "auto"
+    model_cfg = config.get("model") or {}
+    if not isinstance(model_cfg, dict):
+        return "auto"
+    value = str(model_cfg.get("openai_runtime") or "").strip().lower()
+    if value in VALID_RUNTIMES:
+        return value
+    return "auto"
+
+
+def set_runtime(config: dict, new_value: str) -> str:
+    """Mutate the config dict in place to persist the new runtime value.
+    Returns the previous value for callers that want to report a delta."""
+    if new_value not in VALID_RUNTIMES:
+        raise ValueError(
+            f"invalid runtime {new_value!r}; must be one of {VALID_RUNTIMES}"
+        )
+    old = get_current_runtime(config)
+    if not isinstance(config.get("model"), dict):
+        config["model"] = {}
+    config["model"]["openai_runtime"] = new_value
+    return old
+
+
+def check_codex_binary_ok() -> tuple[bool, Optional[str]]:
+    """Best-effort verification that codex CLI is installed at acceptable
+    version. Returns (ok, version_or_message)."""
+    try:
+        from agent.transports.codex_app_server import check_codex_binary
+
+        return check_codex_binary()
+    except Exception as exc:  # pragma: no cover
+        return False, f"codex check failed: {exc}"
+
+
+def apply(
+    config: dict,
+    new_value: Optional[str],
+    *,
+    persist_callback=None,
+) -> CodexRuntimeStatus:
+    """Top-level entry point used by both CLI and gateway handlers.
+
+    Args:
+        config: in-memory config dict (will be mutated when new_value is set)
+        new_value: desired runtime; None means "show current state only"
+        persist_callback: optional callable taking the mutated config dict
+            and persisting it to disk. Skipped when None (used by tests).
+
+    Returns: CodexRuntimeStatus describing the outcome.
+    """
+    current = get_current_runtime(config)
+
+    # Cache the codex binary check for this apply() call. Subprocess spawn
+    # is cheap (~50ms for `codex --version`), but we'd otherwise run it
+    # twice in the enable path (gate, then success message).
+    # None = not yet checked; (bool, Optional[str]) = result.
+    _binary_check: Optional[tuple[bool, Optional[str]]] = None
+
+    def _check_binary_cached() -> tuple[bool, Optional[str]]:
+        nonlocal _binary_check
+        if _binary_check is None:
+            _binary_check = check_codex_binary_ok()
+        return _binary_check
+
+    # Read-only call: just report state
+    if new_value is None:
+        ok, ver = _check_binary_cached()
+        msg = (
+            f"openai_runtime: {current}\n"
+            f"codex CLI: {'OK ' + ver if ok else 'not available — ' + (ver or 'install with `npm i -g @openai/codex`')}"
+        )
+        return CodexRuntimeStatus(
+            success=True,
+            new_value=current,
+            old_value=current,
+            message=msg,
+            codex_binary_ok=ok,
+            codex_version=ver if ok else None,
+        )
+
+    # No change requested
+    if new_value == current:
+        return CodexRuntimeStatus(
+            success=True,
+            new_value=current,
+            old_value=current,
+            message=f"openai_runtime already set to {current}",
+        )
+
+    # If switching ON, verify codex CLI is installed before persisting —
+    # an opt-in toggle that silently fails on the first turn is the
+    # worst possible UX. Block here with a clear install hint.
+    if new_value == "codex_app_server":
+        ok, ver_or_msg = _check_binary_cached()
+        if not ok:
+            return CodexRuntimeStatus(
+                success=False,
+                new_value=None,
+                old_value=current,
+                message=(
+                    "Cannot enable codex_app_server runtime: "
+                    f"{ver_or_msg or 'codex CLI not available'}\n"
+                    "Install with: npm i -g @openai/codex"
+                ),
+                codex_binary_ok=False,
+                codex_version=None,
+            )
+
+    set_runtime(config, new_value)
+    if persist_callback is not None:
+        try:
+            persist_callback(config)
+        except Exception as exc:
+            logger.exception("failed to persist openai_runtime change")
+            return CodexRuntimeStatus(
+                success=False,
+                new_value=new_value,
+                old_value=current,
+                message=f"updated config in memory but persist failed: {exc}",
+            )
+
+    msg_lines = [
+        f"openai_runtime: {current} → {new_value}",
+    ]
+    if new_value == "codex_app_server":
+        ok, ver = _check_binary_cached()
+        if ok:
+            msg_lines.append(f"codex CLI: {ver}")
+        # Auto-migrate Hermes' MCP servers + Codex's installed curated
+        # plugins into ~/.codex/config.toml so the spawned codex subprocess
+        # sees the same tool surface AND can call back into Hermes for
+        # browser/web/delegate_task/vision/memory tools (#7 fix).
+        # Failures are non-fatal — the runtime change still proceeds.
+        try:
+            from hermes_cli.codex_runtime_plugin_migration import migrate
+            mig_report = migrate(config)
+            # Tools/MCP servers (excluding the hermes-tools callback,
+            # which is internal plumbing — surface separately).
+            user_servers = [
+                s for s in mig_report.migrated if s != "hermes-tools"
+            ]
+            if user_servers:
+                msg_lines.append(
+                    f"Migrated {len(user_servers)} MCP server(s): "
+                    f"{', '.join(user_servers)}"
+                )
+            # Native Codex plugin migration (Linear, GitHub, etc.)
+            if mig_report.migrated_plugins:
+                msg_lines.append(
+                    f"Migrated {len(mig_report.migrated_plugins)} native "
+                    f"Codex plugin(s): {', '.join(mig_report.migrated_plugins)}"
+                )
+            elif mig_report.plugin_query_error:
+                msg_lines.append(
+                    "Codex plugin discovery skipped: "
+                    f"{mig_report.plugin_query_error}"
+                )
+            # Permissions + Hermes tool callback are always-on production
+            # bits the user benefits from knowing about.
+            if mig_report.wrote_permissions_default:
+                msg_lines.append(
+                    f"Default sandbox: {mig_report.wrote_permissions_default} "
+                    "(no approval prompt on every write)"
+                )
+            if "hermes-tools" in mig_report.migrated:
+                msg_lines.append(
+                    "Hermes tool callback registered: codex can now use "
+                    "web_search, web_extract, browser_*, vision_analyze, "
+                    "image_generate, skill_view, skills_list, text_to_speech, "
+                    "kanban_* (worker + orchestrator) via MCP."
+                )
+                msg_lines.append(
+                    "  (delegate_task, memory, session_search, todo run "
+                    "only on the default Hermes runtime — they need the "
+                    "agent loop context.)"
+                )
+            msg_lines.append(f"  (config: {mig_report.target_path})")
+            for err in mig_report.errors:
+                msg_lines.append(f"⚠ MCP migration: {err}")
+        except Exception as exc:
+            msg_lines.append(f"⚠ MCP migration skipped: {exc}")
+        msg_lines.append(
+            "OpenAI/Codex turns now run through `codex app-server` "
+            "(terminal/file ops/patching inside Codex; "
+            "Hermes tools available via MCP callback)."
+        )
+        msg_lines.append(
+            "Effective on next session — current cached agent keeps "
+            "the prior runtime to preserve prompt cache."
+        )
+    else:
+        msg_lines.append("OpenAI/Codex turns will use the default Hermes runtime.")
+        msg_lines.append("Effective on next session.")
+    return CodexRuntimeStatus(
+        success=True,
+        new_value=new_value,
+        old_value=current,
+        message="\n".join(msg_lines),
+        requires_new_session=True,
+    )
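The per-call cache behind `_check_binary_cached()` above can be looked at in isolation. The following is only a minimal sketch of that closure pattern, not the code in this commit: `make_cached_check` and `fake_probe` are hypothetical names, with `fake_probe` standing in for `check_codex_binary_ok()`.

```python
from typing import Callable, Optional, Tuple

Result = Tuple[bool, Optional[str]]


def make_cached_check(probe: Callable[[], Result]) -> Callable[[], Result]:
    """Wrap ``probe`` so it runs at most once per wrapper instance."""
    cached: Optional[Result] = None  # None = not yet checked

    def check() -> Result:
        nonlocal cached
        if cached is None:
            cached = probe()
        return cached

    return check


calls = 0


def fake_probe() -> Result:
    """Hypothetical stand-in for an expensive subprocess check."""
    global calls
    calls += 1
    return True, "codex 0.0.0-example"


check = make_cached_check(fake_probe)
first = check()
second = check()  # served from the closure, probe is not called again
```

Scoping the cache to one call (rather than to the module) keeps the result fresh across separate invocations, which matters if the user installs the CLI between two toggles.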
@@ -104,6 +104,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
               args_hint="<prompt>"),
    CommandDef("goal", "Set a standing goal Hermes works on across turns until achieved", "Session",
               args_hint="[text | pause | resume | clear | status]"),
+    CommandDef("subgoal", "Add or manage extra criteria on the active goal", "Session",
+               args_hint="[text | remove N | clear]"),
    CommandDef("status", "Show session info", "Session"),
    CommandDef("whoami", "Show your slash command access (admin / user)", "Info"),
    CommandDef("profile", "Show active profile name and home directory", "Info"),
@@ -120,6 +122,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
               cli_only=True),
    CommandDef("model", "Switch model for this session", "Configuration",
               aliases=("provider",), args_hint="[model] [--provider name] [--global]"),
+    CommandDef("codex-runtime", "Toggle codex app-server runtime for OpenAI/Codex models",
+               "Configuration", args_hint="[auto|codex_app_server]"),
    CommandDef("gquota", "Show Google Gemini Code Assist quota usage", "Info",
               cli_only=True),

@@ -238,7 +238,7 @@ _hermes() {{
    esac
 }}

-_hermes "$@"
+compdef _hermes hermes
 """


@@ -731,19 +731,18 @@ DEFAULT_CONFIG = {
        "target_ratio": 0.20,         # fraction of threshold to preserve as recent tail
        "protect_last_n": 20,         # minimum recent messages to keep uncompressed
        "hygiene_hard_message_limit": 400,  # gateway session-hygiene force-compress threshold by message count
+        "protect_first_n": 3,         # non-system head messages always preserved
+                                      # verbatim, in ADDITION to the system prompt
+                                      # (which is always implicitly protected). Set to
+                                      # 0 for long-running rolling-compaction sessions
+                                      # where you want nothing pinned except the
+                                      # system prompt + rolling summary + recent tail.
    },

    # Anthropic prompt caching (Claude via OpenRouter or native Anthropic API).
    # cache_ttl must be "5m" or "1h" (Anthropic-supported tiers); other values are ignored.
-    # long_lived_prefix: when true (default), Claude on Anthropic / OpenRouter / Nous
-    #   Portal uses a split layout: tools[-1] + stable system prefix at long_lived_ttl
-    #   (cross-session cache), last 2 messages at cache_ttl (within-session rolling).
-    #   Set false to keep the legacy "system + last 3 messages" single-tier layout.
-    # long_lived_ttl: TTL for the cross-session prefix tier ("5m" or "1h"; default "1h").
    "prompt_caching": {
        "cache_ttl": "5m",
-        "long_lived_prefix": True,
-        "long_lived_ttl": "1h",
    },

    # OpenRouter-specific settings.
@@ -978,6 +977,21 @@ DEFAULT_CONFIG = {
    # Web dashboard settings
    "dashboard": {
        "theme": "default",  # Dashboard visual theme: "default", "midnight", "ember", "mono", "cyberpunk", "rose"
+        # Hide the token/cost analytics surfaces (Analytics page, token bars and
+        # cost figures on the Models page) by default.  The numbers shown there
+        # are a local debug estimate: they only count successful main-agent
+        # responses with a usable ``response.usage``, and silently exclude every
+        # auxiliary call (context compression, title generation, vision,
+        # session search, web extract, smart approval, MCP routing, plugin LLM
+        # access) plus provider-side retries, fallback attempts, and any call
+        # whose usage block didn't come back.  Cache writes are also missing
+        # from the API response.  On models with heavy auxiliary traffic
+        # (Kimi K2.6, MiniMax M2.7) the local total can be 10x-100x lower than
+        # the provider bill. Showing such numbers is worse than hiding them,
+        # because they look precise enough to compare against the provider.
+        # Set this to True to re-enable the surfaces with the understanding
+        # that the numbers are a local lower-bound estimate, not billing.
+        "show_token_analytics": False,
    },

    # Privacy settings
@@ -1236,6 +1250,7 @@ DEFAULT_CONFIG = {
        "free_response_channels": "",  # Comma-separated channel IDs where bot responds without mention
        "allowed_channels": "",        # If set, bot ONLY responds in these channel IDs (whitelist)
        "auto_thread": True,           # Auto-create threads on @mention in channels (like Slack)
+        "thread_require_mention": False,  # If True, require @mention in threads too (multi-bot threads)
        "reactions": True,             # Add 👀/✅/❌ reactions to messages during processing
        "channel_prompts": {},         # Per-channel ephemeral system prompts (forum parents apply to child threads)
        # Opt-in DM role-based auth (#12136). By default, DISCORD_ALLOWED_ROLES
@@ -2114,10 +2129,10 @@ OPTIONAL_ENV_VARS = {
        "category": "tool",
    },
    "FAL_KEY": {
-        "description": "FAL API key for image generation",
+        "description": "FAL API key for image and video generation",
        "prompt": "FAL API key",
        "url": "https://fal.ai/",
-        "tools": ["image_generate"],
+        "tools": ["image_generate", "video_generate"],
        "password": True,
        "category": "tool",
    },
@@ -4326,10 +4341,34 @@ def load_env() -> Dict[str, str]:
    concatenated KEY=VALUE pairs on a single line) are handled
    gracefully instead of producing mangled values such as duplicated
    bot tokens.  See #8908.
+
+    The parsed dict is memoised keyed on the .env path, mtime and size,
+    because ``get_env_value()`` is called dozens-to-hundreds of times per
+    interactive menu render (`hermes tools`, `hermes setup`, status
+    panels). Sanitisation is O(lines × known-keys), so re-parsing the
+    same file on every call was burning ~300ms of CPU per `hermes tools`
+    menu paint on top of the OAuth-refresh slowness. The mtime/size check
+    invalidates the cache when the user edits .env mid-process.
    """
+    global _env_cache
    env_path = get_env_path()
-    env_vars = {}
-    
+
+    try:
+        st = env_path.stat()  # one stat() call: mtime and size from the same snapshot
+        mtime, size = st.st_mtime, st.st_size
+        cache_key = (str(env_path), mtime, size)
+    except FileNotFoundError:
+        cache_key = (str(env_path), None, None)
+    except Exception:
+        cache_key = None
+
+    if cache_key is not None and _env_cache is not None:
+        cached_key, cached_vars = _env_cache
+        if cached_key == cache_key:
+            return dict(cached_vars)
+
+    env_vars: Dict[str, str] = {}
+
    if env_path.exists():
        # On Windows, open() defaults to the system locale (cp1252) which can
        # fail on UTF-8 .env files. Always use explicit UTF-8; tolerate BOM
@@ -4345,10 +4384,33 @@ def load_env() -> Dict[str, str]:
            if line and not line.startswith('#') and '=' in line:
                key, _, value = line.partition('=')
                env_vars[key.strip()] = value.strip().strip('"\'')
-    
+
+    if cache_key is not None:
+        _env_cache = (cache_key, dict(env_vars))
+
    return env_vars


+# Module-level memo for load_env(), keyed on (path, mtime, size).
+# Editing .env bumps mtime → next load_env() rebuilds. invalidate_env_cache()
+# is the explicit knob for writers that update .env via this module
+# (save_env_value, remove_env_value, sanitize_env_file) without relying on
+# filesystem mtime resolution.
+_env_cache: Optional[Tuple[Tuple[str, Optional[float], Optional[int]], Dict[str, str]]] = None
+
+
+def invalidate_env_cache() -> None:
+    """Clear the load_env() process-level memo.
+
+    Writers that mutate .env (save_env_value, remove_env_value,
+    sanitize_env_file) call this to guarantee the next load_env() sees
+    their change even on filesystems with coarse mtime resolution. Reads
+    pick up external edits via the mtime/size check.
+    """
+    global _env_cache
+    _env_cache = None
+
+
 def _sanitize_env_lines(lines: list) -> list:
    """Fix corrupted .env lines before reading or writing.

@@ -4451,6 +4513,7 @@ def sanitize_env_file() -> int:
            pass
        raise
    _secure_file(env_path)
+    invalidate_env_cache()
    return fixes


@@ -4562,6 +4625,7 @@ def save_env_value(key: str, value: str):
    _secure_file(env_path)

    os.environ[key] = value
+    invalidate_env_cache()


 def remove_env_value(key: str) -> bool:
@@ -4617,6 +4681,7 @@ def remove_env_value(key: str) -> bool:
        _secure_file(env_path)

    os.environ.pop(key, None)
+    invalidate_env_cache()
    return found


@@ -4803,6 +4868,7 @@ def show_config():
        print(f"  Threshold:    {compression.get('threshold', 0.50) * 100:.0f}%")
        print(f"  Target ratio: {compression.get('target_ratio', 0.20) * 100:.0f}% of threshold preserved")
        print(f"  Protect last: {compression.get('protect_last_n', 20)} messages")
+        print(f"  Protect first: {compression.get('protect_first_n', 3)} non-system head messages")
        _aux_comp = config.get('auxiliary', {}).get('compression', {})
        _sm = _aux_comp.get('model', '') or '(auto)'
        print(f"  Model:        {_sm}")
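The `load_env()` memo earlier in this file keys its cache on `(path, mtime, size)` and exposes an explicit invalidation hook for writers. The sketch below illustrates that idea on its own, under simplified assumptions: `load_kv`, `invalidate`, and `parse_count` are hypothetical names, and the parser skips the sanitisation step the real function performs.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional, Tuple

CacheKey = Tuple[str, Optional[float], Optional[int]]
_cache: Optional[Tuple[CacheKey, Dict[str, str]]] = None
parse_count = 0  # instrumentation for this demo only


def load_kv(path: Path) -> Dict[str, str]:
    """Parse KEY=VALUE lines, re-reading only when (mtime, size) changes."""
    global _cache, parse_count
    try:
        st = path.stat()
        key: CacheKey = (str(path), st.st_mtime, st.st_size)
    except FileNotFoundError:
        key = (str(path), None, None)

    if _cache is not None and _cache[0] == key:
        return dict(_cache[1])  # copy so callers cannot mutate the memo

    parse_count += 1
    values: Dict[str, str] = {}
    if path.exists():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                k, _, v = line.partition("=")
                values[k.strip()] = v.strip()
    _cache = (key, dict(values))
    return values


def invalidate() -> None:
    """Drop the memo; writers call this after changing the file."""
    global _cache
    _cache = None


with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text("A=1\n", encoding="utf-8")
    first = load_kv(env_file)
    second = load_kv(env_file)  # served from the memo
    parses_after_two_reads = parse_count
    env_file.write_text("A=1\nB=22\n", encoding="utf-8")  # size changes
    third = load_kv(env_file)
```

Including the size in the key catches edits that land within one mtime tick as long as the byte count changes; a same-size rewrite within that tick is the case the explicit `invalidate()` call covers.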
@@ -33,8 +33,8 @@ import json
 import logging
 import re
 import time
-from dataclasses import dataclass, asdict
-from typing import Any, Dict, Optional, Tuple
+from dataclasses import dataclass, field, asdict
+from typing import Any, Dict, List, Optional, Tuple

 logger = logging.getLogger(__name__)

@@ -65,6 +65,21 @@ CONTINUATION_PROMPT_TEMPLATE = (
    "If you are blocked and need input from the user, say so clearly and stop."
 )

+# Used when the user has added one or more /subgoal criteria. Surfaced
+# to the agent verbatim so it sees what to target on the next turn,
+# and surfaced to the judge so the verdict considers them too.
+CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE = (
+    "[Continuing toward your standing goal]\n"
+    "Goal: {goal}\n\n"
+    "Additional criteria the user added mid-loop:\n"
+    "{subgoals_block}\n\n"
+    "Continue working toward the goal AND all additional criteria. Take "
+    "the next concrete step. If you believe the goal and every "
+    "additional criterion are complete, state so explicitly and stop. "
+    "If you are blocked and need input from the user, say so clearly "
+    "and stop."
+)
+

 JUDGE_SYSTEM_PROMPT = (
    "You are a strict judge evaluating whether an autonomous agent has "
@@ -88,6 +103,23 @@ JUDGE_USER_PROMPT_TEMPLATE = (
    "Is the goal satisfied?"
 )

+# Used when the user has added /subgoal criteria. The judge must
+# evaluate ALL of them being met, not just the original goal.
+JUDGE_USER_PROMPT_WITH_SUBGOALS_TEMPLATE = (
+    "Goal:\n{goal}\n\n"
+    "Additional criteria the user added mid-loop (all must also be "
+    "satisfied for the goal to be DONE):\n{subgoals_block}\n\n"
+    "Agent's most recent response:\n{response}\n\n"
+    "Decision: For each numbered criterion above, find concrete "
+    "evidence in the agent's response that the criterion is "
+    "satisfied. Do not accept generic phrases like 'all requirements "
+    "met' or 'implying it was done' — require specific evidence (a "
+    "file contents excerpt, an output line, a command result). If "
+    "ANY criterion lacks specific evidence in the response, the goal "
+    "is NOT done — return CONTINUE.\n\n"
+    "Is the goal AND every additional criterion satisfied?"
+)
+

 # ──────────────────────────────────────────────────────────────────────
 # Dataclass
@@ -108,6 +140,12 @@ class GoalState:
    last_reason: Optional[str] = None
    paused_reason: Optional[str] = None       # why we auto-paused (budget, etc.)
    consecutive_parse_failures: int = 0       # judge-output parse failures in a row
+    # User-added criteria appended mid-loop via the /subgoal command.
+    # When non-empty the judge prompt and continuation prompt both
+    # include them so the agent works toward them and the judge factors
+    # them into the verdict. Backwards-compatible: defaults to empty so
+    # old state_meta rows load unchanged.
+    subgoals: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False)
@@ -115,6 +153,10 @@ class GoalState:
    @classmethod
    def from_json(cls, raw: str) -> "GoalState":
        data = json.loads(raw)
+        raw_subgoals = data.get("subgoals") or []
+        subgoals: List[str] = []
+        if isinstance(raw_subgoals, list):
+            subgoals = [str(s).strip() for s in raw_subgoals if str(s).strip()]
        return cls(
            goal=data.get("goal", ""),
            status=data.get("status", "active"),
@@ -126,8 +168,18 @@ class GoalState:
            last_reason=data.get("last_reason"),
            paused_reason=data.get("paused_reason"),
            consecutive_parse_failures=int(data.get("consecutive_parse_failures", 0) or 0),
+            subgoals=subgoals,
        )

+    # --- subgoals helpers -------------------------------------------------
+
+    def render_subgoals_block(self) -> str:
+        """Render the subgoals as a numbered ``- N. text`` block. Empty
+        when no subgoals exist."""
+        if not self.subgoals:
+            return ""
+        return "\n".join(f"- {i}. {text}" for i, text in enumerate(self.subgoals, start=1))
+

 # ──────────────────────────────────────────────────────────────────────
 # Persistence (SessionDB state_meta)
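The `subgoals` field and its helpers above are meant to load old `state_meta` rows unchanged and to render a numbered block for the prompts. A minimal sketch of that behaviour follows, assuming a trimmed-down stand-in class: `MiniGoalState` is hypothetical and carries only two of the real `GoalState` fields.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class MiniGoalState:
    """Hypothetical, trimmed-down stand-in for GoalState."""

    goal: str = ""
    subgoals: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json(cls, raw: str) -> "MiniGoalState":
        data = json.loads(raw)
        raw_subgoals = data.get("subgoals") or []
        subgoals: List[str] = []
        if isinstance(raw_subgoals, list):
            # Coerce entries to str and drop blanks, as the diff above does.
            subgoals = [str(s).strip() for s in raw_subgoals if str(s).strip()]
        return cls(goal=data.get("goal", ""), subgoals=subgoals)

    def render_subgoals_block(self) -> str:
        if not self.subgoals:
            return ""
        return "\n".join(
            f"- {i}. {text}" for i, text in enumerate(self.subgoals, start=1)
        )


# A row written before subgoals existed has no "subgoals" key at all.
old_row = MiniGoalState.from_json('{"goal": "ship v2"}')
new_row = MiniGoalState.from_json(
    '{"goal": "ship v2", "subgoals": ["add tests", "  ", "update docs"]}'
)
block = new_row.render_subgoals_block()
round_trip = MiniGoalState.from_json(new_row.to_json())
```

Using `field(default_factory=list)` rather than a shared `[]` default gives each state its own list, so appending a subgoal to one session cannot leak into another.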
@@ -284,6 +336,7 @@ def judge_goal(
    last_response: str,
    *,
    timeout: float = DEFAULT_JUDGE_TIMEOUT,
+    subgoals: Optional[List[str]] = None,
 ) -> Tuple[str, str, bool]:
    """Ask the auxiliary model whether the goal is satisfied.

@@ -296,6 +349,11 @@ def judge_goal(
    auto-pause after N consecutive parse failures (see
    ``DEFAULT_MAX_CONSECUTIVE_PARSE_FAILURES``).

+    ``subgoals`` is an optional list of user-added criteria (from
+    ``/subgoal``) that the judge must also factor into its DONE/CONTINUE
+    decision. When non-empty the prompt switches to the with-subgoals
+    template; otherwise behavior is identical to the original judge.
+
    This is deliberately fail-open: any error returns ``("continue", "...", False)``
    so a broken judge doesn't wedge progress — the turn budget and the
    consecutive-parse-failures auto-pause are the backstops.
@@ -307,7 +365,7 @@ def judge_goal(
        return "continue", "empty response (nothing to evaluate)", False

    try:
-        from agent.auxiliary_client import get_text_auxiliary_client
+        from agent.auxiliary_client import get_auxiliary_extra_body, get_text_auxiliary_client
    except Exception as exc:
        logger.debug("goal judge: auxiliary client import failed: %s", exc)
        return "continue", "auxiliary client unavailable", False
@@ -321,10 +379,22 @@ def judge_goal(
    if client is None or not model:
        return "continue", "no auxiliary client configured", False

-    prompt = JUDGE_USER_PROMPT_TEMPLATE.format(
-        goal=_truncate(goal, 2000),
-        response=_truncate(last_response, _JUDGE_RESPONSE_SNIPPET_CHARS),
-    )
+    # Build the prompt — pick the with-subgoals variant when applicable.
+    clean_subgoals = [s.strip() for s in (subgoals or []) if s and s.strip()]
+    if clean_subgoals:
+        subgoals_block = "\n".join(
+            f"- {i}. {text}" for i, text in enumerate(clean_subgoals, start=1)
+        )
+        prompt = JUDGE_USER_PROMPT_WITH_SUBGOALS_TEMPLATE.format(
+            goal=_truncate(goal, 2000),
+            subgoals_block=_truncate(subgoals_block, 2000),
+            response=_truncate(last_response, _JUDGE_RESPONSE_SNIPPET_CHARS),
+        )
+    else:
+        prompt = JUDGE_USER_PROMPT_TEMPLATE.format(
+            goal=_truncate(goal, 2000),
+            response=_truncate(last_response, _JUDGE_RESPONSE_SNIPPET_CHARS),
+        )

    try:
        resp = client.chat.completions.create(
@@ -336,6 +406,7 @@ def judge_goal(
            temperature=0,
            max_tokens=200,
            timeout=timeout,
+            extra_body=get_auxiliary_extra_body() or None,
        )
    except Exception as exc:
        logger.info("goal judge: API call failed (%s) — falling through to continue", exc)
@@ -396,14 +467,15 @@ class GoalManager:
        if s is None or s.status in {"cleared",}:
            return "No active goal. Set one with /goal <text>."
        turns = f"{s.turns_used}/{s.max_turns} turns"
+        sub = f", {len(s.subgoals)} subgoal{'s' if len(s.subgoals) != 1 else ''}" if s.subgoals else ""
        if s.status == "active":
-            return f"⊙ Goal (active, {turns}): {s.goal}"
+            return f"⊙ Goal (active, {turns}{sub}): {s.goal}"
        if s.status == "paused":
            extra = f" — {s.paused_reason}" if s.paused_reason else ""
-            return f"⏸ Goal (paused, {turns}{extra}): {s.goal}"
+            return f"⏸ Goal (paused, {turns}{sub}{extra}): {s.goal}"
        if s.status == "done":
-            return f"✓ Goal done ({turns}): {s.goal}"
-        return f"Goal ({s.status}, {turns}): {s.goal}"
+            return f"✓ Goal done ({turns}{sub}): {s.goal}"
+        return f"Goal ({s.status}, {turns}{sub}): {s.goal}"

    # --- mutation -----------------------------------------------------

@@ -456,6 +528,53 @@ class GoalManager:
        self._state.last_reason = reason
        save_goal(self.session_id, self._state)

+    # --- /subgoal user controls ---------------------------------------
+
+    def add_subgoal(self, text: str) -> str:
+        """Append a user-added criterion to the active goal. Requires
+        ``has_goal()`` (else ``RuntimeError``); empty text raises ``ValueError``.
+
+        Returns the cleaned text so the caller can show it back to the user.
+        """
+        if self._state is None or not self.has_goal():
+            raise RuntimeError("no active goal")
+        text = (text or "").strip()
+        if not text:
+            raise ValueError("subgoal text is empty")
+        self._state.subgoals.append(text)
+        save_goal(self.session_id, self._state)
+        return text
+
+    def remove_subgoal(self, index_1based: int) -> str:
+        """Remove a subgoal by 1-based index. Returns the removed text."""
+        if self._state is None or not self.has_goal():
+            raise RuntimeError("no active goal")
+        idx = int(index_1based) - 1
+        if idx < 0 or idx >= len(self._state.subgoals):
+            raise IndexError(
+                f"index out of range (1..{len(self._state.subgoals)})"
+            )
+        removed = self._state.subgoals.pop(idx)
+        save_goal(self.session_id, self._state)
+        return removed
+
+    def clear_subgoals(self) -> int:
+        """Wipe all subgoals. Returns the previous count."""
+        if self._state is None or not self.has_goal():
+            raise RuntimeError("no active goal")
+        prev = len(self._state.subgoals)
+        self._state.subgoals = []
+        save_goal(self.session_id, self._state)
+        return prev
+
+    def render_subgoals(self) -> str:
+        """Public helper for the /subgoal slash command."""
+        if self._state is None:
+            return "(no active goal)"
+        if not self._state.subgoals:
+            return "(no subgoals — use /subgoal <text> to add criteria)"
+        return self._state.render_subgoals_block()
+
    # --- the main entry point called after every turn -----------------

    def evaluate_after_turn(
@@ -493,7 +612,9 @@ class GoalManager:
        state.turns_used += 1
        state.last_turn_at = time.time()

-        verdict, reason, parse_failed = judge_goal(state.goal, last_response)
+        verdict, reason, parse_failed = judge_goal(
+            state.goal, last_response, subgoals=state.subgoals or None
+        )
        state.last_verdict = verdict
        state.last_reason = reason

@@ -578,6 +699,11 @@ class GoalManager:
    def next_continuation_prompt(self) -> Optional[str]:
        if not self._state or self._state.status != "active":
            return None
+        if self._state.subgoals:
+            return CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE.format(
+                goal=self._state.goal,
+                subgoals_block=self._state.render_subgoals_block(),
+            )
        return CONTINUATION_PROMPT_TEMPLATE.format(goal=self._state.goal)


@@ -585,6 +711,9 @@ __all__ = [
    "GoalState",
    "GoalManager",
    "CONTINUATION_PROMPT_TEMPLATE",
+    "CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE",
+    "JUDGE_USER_PROMPT_TEMPLATE",
+    "JUDGE_USER_PROMPT_WITH_SUBGOALS_TEMPLATE",
    "DEFAULT_MAX_TURNS",
    "load_goal",
    "save_goal",
@@ -0,0 +1,240 @@
+"""Provider/model inventory context — shared substrate for the dashboard
+``/api/model/options``, the TUI ``model.options``/``model.save_key``
+JSON-RPC handlers, and the interactive picker.
+
+Before this module the three call-sites each duplicated:
+
+1. The 17-LOC config-slice that pulls ``model.{default,name,provider,base_url}``,
+   ``providers:``, and ``custom_providers:`` out of ``load_config()``;
+2. The call into ``list_authenticated_providers`` with the resulting kwargs;
+3. (TUI only) a 45-LOC post-pass that merges authenticated rows with
+   unconfigured ``CANONICAL_PROVIDERS`` rows and emits ``authenticated``/
+   ``auth_type``/``key_env``/``warning`` hints for the picker UI.
+
+Consolidating those three steps into one entry point eliminates two bugs
+the duplicates were hiding:
+
+- The dashboard read ``cfg.get("custom_providers")`` directly, missing the
+  v12+ keyed ``providers:`` form (which the TUI handled via
+  ``get_compatible_custom_providers``).
+- The TUI's canonical-merge keyed on ``is_user_defined`` to decide
+  ordering. Section 3 of ``list_authenticated_providers`` sets
+  ``is_user_defined=True`` even for canonical slugs that appear in the
+  ``providers:`` config dict, which silently demoted them to the tail of
+  the picker. ``_reorder_canonical`` keys on slug membership instead.
+
+Substrate facts (verified May 2026):
+- ``list_authenticated_providers`` already populates each row's
+  ``models`` from the curated catalog (same source as the picker). Do
+  NOT call ``provider_model_ids()`` per row to "freshen" — that bypasses
+  curation and pulls in non-agentic models (Nous /models returns ~400
+  IDs including TTS, embeddings, rerankers, image/video generators).
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, replace
+from typing import Optional
+
+
+# ─── Public types ───────────────────────────────────────────────────────
+
+
+@dataclass(frozen=True)
+class ConfigContext:
+    """Snapshot of the model + provider config every inventory caller
+    needs. Built once via ``load_picker_context()``; the TUI overlays
+    live agent state via ``with_overrides()`` before passing through.
+    """
+
+    current_provider: str
+    current_model: str
+    current_base_url: str
+    user_providers: dict
+    custom_providers: list
+
+    def with_overrides(
+        self,
+        *,
+        current_provider: Optional[str] = None,
+        current_model: Optional[str] = None,
+        current_base_url: Optional[str] = None,
+    ) -> "ConfigContext":
+        """Return a copy with truthy overrides applied.
+
+        Truthy-only because the TUI reads agent attributes that may be
+        empty strings before an agent is spawned — empties must NOT
+        clobber the disk-config values.
+        """
+        kw: dict = {}
+        if current_provider:
+            kw["current_provider"] = current_provider
+        if current_model:
+            kw["current_model"] = current_model
+        if current_base_url:
+            kw["current_base_url"] = current_base_url
+        return replace(self, **kw) if kw else self
+
+
+def load_picker_context() -> ConfigContext:
+    """Load the disk-config snapshot every consumer needs.
+
+    Replaces the inline 17-LOC config-slice that ``web_server.py`` and
+    ``tui_gateway/server.py`` (×2 sites) used to do.
+    """
+    from hermes_cli.config import get_compatible_custom_providers, load_config
+
+    cfg = load_config()
+    model_cfg = cfg.get("model", {})
+    if isinstance(model_cfg, dict):
+        current_model = model_cfg.get("default", model_cfg.get("name", "")) or ""
+        current_provider = model_cfg.get("provider", "") or ""
+        current_base_url = model_cfg.get("base_url", "") or ""
+    else:
+        # config.model can be a bare string in older configs.
+        current_model = str(model_cfg) if model_cfg else ""
+        current_provider = ""
+        current_base_url = ""
+    raw = cfg.get("providers")
+    return ConfigContext(
+        current_provider=current_provider,
+        current_model=current_model,
+        current_base_url=current_base_url,
+        user_providers=raw if isinstance(raw, dict) else {},
+        custom_providers=get_compatible_custom_providers(cfg),
+    )
+
+
+# ─── Public: payload builder ────────────────────────────────────────────
+
+
+def build_models_payload(
+    ctx: ConfigContext,
+    *,
+    include_unconfigured: bool = False,
+    picker_hints: bool = False,
+    canonical_order: bool = False,
+    max_models: int = 50,
+) -> dict:
+    """Build the ``{providers, model, provider}`` shape every consumer
+    needs from a single substrate call.
+
+    Flags:
+    - ``include_unconfigured``: append ``CANONICAL_PROVIDERS`` rows that
+      ``list_authenticated_providers`` didn't emit (TUI uses this to show
+      the full provider universe in the picker).
+    - ``picker_hints``: add ``authenticated``/``auth_type``/``key_env``/
+      ``warning`` per row (TUI ``ModelPickerDialog`` shape).
+    - ``canonical_order``: reorder canonical-slug rows to
+      ``CANONICAL_PROVIDERS`` declaration order; truly-custom rows go
+      last (TUI display order).
+    """
+    from hermes_cli.model_switch import list_authenticated_providers
+
+    rows = list_authenticated_providers(
+        current_provider=ctx.current_provider,
+        current_base_url=ctx.current_base_url,
+        current_model=ctx.current_model,
+        user_providers=ctx.user_providers,
+        custom_providers=ctx.custom_providers,
+        max_models=max_models,
+    )
+
+    if include_unconfigured:
+        rows = list(rows) + _append_unconfigured_rows(rows, ctx)
+    if picker_hints:
+        _apply_picker_hints(rows)
+    if canonical_order:
+        rows = _reorder_canonical(rows)
+
+    return {
+        "providers": rows,
+        "model": ctx.current_model,
+        "provider": ctx.current_provider,
+    }
+
+
+# ─── Internal: row post-processing ──────────────────────────────────────
+
+
+def _append_unconfigured_rows(rows: list[dict], ctx: ConfigContext) -> list[dict]:
+    """Build skeleton rows for canonical providers missing from ``rows``."""
+    from hermes_cli.models import CANONICAL_PROVIDERS, _PROVIDER_LABELS
+
+    seen = {r["slug"].lower() for r in rows}
+    cur = (ctx.current_provider or "").lower()
+    extras: list[dict] = []
+    for entry in CANONICAL_PROVIDERS:
+        if entry.slug.lower() in seen:
+            continue
+        extras.append(
+            {
+                "slug": entry.slug,
+                "name": _PROVIDER_LABELS.get(entry.slug, entry.label),
+                "is_current": entry.slug.lower() == cur,
+                "is_user_defined": False,
+                "models": [],
+                "total_models": 0,
+                "source": "canonical",
+            }
+        )
+    return extras
+
+
+def _apply_picker_hints(rows: list[dict]) -> None:
+    """Add ``authenticated``/``auth_type``/``key_env``/``warning`` per row.
+
+    Mutates ``rows`` in-place. Rows already from
+    ``list_authenticated_providers`` are marked ``authenticated=True``;
+    the unconfigured skeleton rows from ``_append_unconfigured_rows`` get
+    the picker's setup-hint shape.
+    """
+    from hermes_cli.auth import PROVIDER_REGISTRY
+
+    for row in rows:
+        if "authenticated" in row:
+            continue
+        # Distinguish authenticated rows (returned by
+        # list_authenticated_providers) from skeleton rows (from
+        # _append_unconfigured_rows). The skeleton rows have empty
+        # `models` AND source="canonical"; authenticated rows have
+        # populated `models` OR a non-canonical source.
+        is_skeleton = row.get("source") == "canonical" and not row.get("models")
+        row["authenticated"] = not is_skeleton
+        if not is_skeleton or row.get("is_user_defined"):
+            continue
+        cfg = PROVIDER_REGISTRY.get(row["slug"])
+        auth_type = cfg.auth_type if cfg else "api_key"
+        key_env = (
+            cfg.api_key_env_vars[0]
+            if (cfg and cfg.api_key_env_vars)
+            else ""
+        )
+        row["auth_type"] = auth_type
+        row["key_env"] = key_env
+        row["warning"] = (
+            f"paste {key_env} to activate"
+            if auth_type == "api_key" and key_env
+            else f"run `hermes model` to configure ({auth_type})"
+        )
+
+
+def _reorder_canonical(rows: list[dict]) -> list[dict]:
+    """Return rows with canonical slugs in ``CANONICAL_PROVIDERS``
+    declaration order, followed by truly-custom rows.
+
+    Keys on slug membership, NOT ``is_user_defined`` — section 3 of
+    ``list_authenticated_providers`` sets ``is_user_defined=True`` on
+    rows from the ``providers:`` config dict even when the slug is
+    canonical. Keying on the flag would silently demote canonical
+    providers configured via the new keyed schema.
+    """
+    from hermes_cli.models import CANONICAL_PROVIDERS
+
+    order = {e.slug: i for i, e in enumerate(CANONICAL_PROVIDERS)}
+    canon = sorted(
+        (r for r in rows if r["slug"] in order),
+        key=lambda r: order[r["slug"]],
+    )
+    extras = [r for r in rows if r["slug"] not in order]
+    return canon + extras
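
A minimal sketch of the ordering rule that `_reorder_canonical` describes, under stated assumptions: `ProviderEntry` here is a two-field stand-in and `CANONICAL` is a three-entry illustrative subset, not the real `hermes_cli.models` definitions. It shows why keying on slug membership rather than `is_user_defined` keeps a canonical provider configured through the `providers:` dict in its canonical position.

```python
from typing import NamedTuple


class ProviderEntry(NamedTuple):
    # Hypothetical stand-in for hermes_cli.models.ProviderEntry.
    slug: str
    label: str


# Illustrative subset only; the real CANONICAL_PROVIDERS list is much longer.
CANONICAL = [
    ProviderEntry("nous", "Nous Portal"),
    ProviderEntry("openrouter", "OpenRouter"),
    ProviderEntry("novita", "NovitaAI"),
]


def reorder_canonical(rows: list[dict]) -> list[dict]:
    """Canonical slugs first, in declaration order; everything else after."""
    order = {e.slug: i for i, e in enumerate(CANONICAL)}
    canon = sorted(
        (r for r in rows if r["slug"] in order),
        key=lambda r: order[r["slug"]],
    )
    extras = [r for r in rows if r["slug"] not in order]
    return canon + extras


rows = [
    {"slug": "my-local", "is_user_defined": True},
    # Canonical slug that arrived via the keyed config schema, so the flag is True.
    {"slug": "novita", "is_user_defined": True},
    {"slug": "nous", "is_user_defined": False},
]
ordered = [r["slug"] for r in reorder_canonical(rows)]
print(ordered)
```

Sorting on `is_user_defined` instead would push `novita` to the end next to `my-local`, which is the demotion the docstring warns about.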
@@ -155,7 +155,7 @@ def specify_task(
        )

    try:
-        from agent.auxiliary_client import get_text_auxiliary_client
+        from agent.auxiliary_client import get_auxiliary_extra_body, get_text_auxiliary_client
    except Exception as exc:  # pragma: no cover — import smoke test
        logger.debug("specify: auxiliary client import failed: %s", exc)
        return SpecifyOutcome(task_id, False, "auxiliary client unavailable")
@@ -187,6 +187,7 @@ def specify_task(
            temperature=0.3,
            max_tokens=1500,
            timeout=timeout or 120,
+            extra_body=get_auxiliary_extra_body() or None,
        )
    except Exception as exc:
        logger.info(
@@ -2414,30 +2414,31 @@ def _prompt_provider_choice(choices, *, default=0):
 def _model_flow_openrouter(config, current_model=""):
    """OpenRouter provider: ensure API key, then pick model."""
    from hermes_cli.auth import (
+        ProviderConfig,
        _prompt_model_selection,
        _save_model_choice,
        deactivate_provider,
    )
-    from hermes_cli.config import get_env_value, save_env_value
+    from hermes_cli.config import get_env_value

-    api_key = get_env_value("OPENROUTER_API_KEY")
-    if not api_key:
-        print("No OpenRouter API key configured.")
+    # Route through _prompt_api_key so users can replace a stale/broken key
+    # in-flow (K/R/C) instead of having to edit ~/.hermes/.env by hand. The
+    # previous bypass-when-key-exists branch left no way to recover from a
+    # bad paste short of re-running `hermes setup` from scratch. OpenRouter
+    # isn't in PROVIDER_REGISTRY so we synthesize a minimal pconfig.
+    pconfig = ProviderConfig(
+        id="openrouter",
+        name="OpenRouter",
+        auth_type="api_key",
+        api_key_env_vars=("OPENROUTER_API_KEY",),
+    )
+    existing_key = get_env_value("OPENROUTER_API_KEY") or ""
+    if not existing_key:
        print("Get one at: https://openrouter.ai/keys")
        print()
-        try:
-            import getpass
-
-            key = getpass.getpass("OpenRouter API key (or Enter to cancel): ").strip()
-        except (KeyboardInterrupt, EOFError):
-            print()
-            return
-        if not key:
-            print("Cancelled.")
-            return
-        save_env_value("OPENROUTER_API_KEY", key)
-        print("API key saved.")
-        print()
+    _resolved, abort = _prompt_api_key(pconfig, existing_key, provider_id="openrouter")
+    if abort:
+        return

    from hermes_cli.models import model_ids, get_pricing_for_provider

@@ -2473,33 +2474,26 @@ def _model_flow_openrouter(config, current_model=""):
 def _model_flow_ai_gateway(config, current_model=""):
    """Vercel AI Gateway provider: ensure API key, then pick model with pricing."""
    from hermes_cli.auth import (
+        PROVIDER_REGISTRY,
        _prompt_model_selection,
        _save_model_choice,
        deactivate_provider,
    )
-    from hermes_cli.config import get_env_value, save_env_value
+    from hermes_cli.config import get_env_value

-    api_key = get_env_value("AI_GATEWAY_API_KEY")
-    if not api_key:
-        print("No Vercel AI Gateway API key configured.")
+    # Route through _prompt_api_key so users can replace a stale/broken key
+    # in-flow (K/R/C) instead of having to edit ~/.hermes/.env by hand.
+    pconfig = PROVIDER_REGISTRY["ai-gateway"]
+    existing_key = get_env_value("AI_GATEWAY_API_KEY") or ""
+    if not existing_key:
        print(
            "Create API key here: https://vercel.com/d?to=%2F%5Bteam%5D%2F%7E%2Fai-gateway&title=AI+Gateway"
        )
        print("Add a payment method to get $5 in free credits.")
        print()
-        try:
-            import getpass
-
-            key = getpass.getpass("AI Gateway API key (or Enter to cancel): ").strip()
-        except (KeyboardInterrupt, EOFError):
-            print()
-            return
-        if not key:
-            print("Cancelled.")
-            return
-        save_env_value("AI_GATEWAY_API_KEY", key)
-        print("API key saved.")
-        print()
+    _resolved, abort = _prompt_api_key(pconfig, existing_key, provider_id="ai-gateway")
+    if abort:
+        return

    from hermes_cli.models import ai_gateway_model_ids, get_pricing_for_provider

@@ -3079,6 +3073,21 @@ def _model_flow_custom(config):
            else:
                print(f"  If /v1 should not be in the base URL, try: {suggested}")

+    # Prompt for API compatibility mode explicitly so codex-compatible custom
+    # providers don't silently fall back to chat_completions.
+    current_model_cfg = config.get("model")
+    current_api_mode = ""
+    if isinstance(current_model_cfg, dict):
+        current_api_mode = str(current_model_cfg.get("api_mode") or "").strip()
+    api_mode = _prompt_custom_api_mode_selection(
+        effective_url,
+        current_api_mode=current_api_mode,
+    )
+    if api_mode:
+        print(f"  API mode: {api_mode}")
+    else:
+        print("  API mode: auto-detect")
+
    # Select model — use probe results when available, fall back to manual input
    model_name = ""
    detected_models = probe.get("models") or []
@@ -3142,7 +3151,10 @@ def _model_flow_custom(config):
        model["base_url"] = effective_url
        if effective_key:
            model["api_key"] = effective_key
-        model.pop("api_mode", None)  # let runtime auto-detect from URL
+        if api_mode:
+            model["api_mode"] = api_mode
+        else:
+            model.pop("api_mode", None)
        save_config(cfg)
        deactivate_provider()

@@ -3165,7 +3177,10 @@ def _model_flow_custom(config):
        _caller_model["base_url"] = effective_url
        if effective_key:
            _caller_model["api_key"] = effective_key
-        _caller_model.pop("api_mode", None)
+        if api_mode:
+            _caller_model["api_mode"] = api_mode
+        else:
+            _caller_model.pop("api_mode", None)
        config["model"] = _caller_model
        print("Endpoint saved. Use `/model` in chat or `hermes model` to set a model.")

@@ -3176,9 +3191,80 @@ def _model_flow_custom(config):
        model_name or "",
        context_length=context_length,
        name=display_name,
+        api_mode=api_mode,
    )


+def _prompt_custom_api_mode_selection(base_url: str, current_api_mode: str = "") -> Optional[str]:
+    """Prompt for a custom provider API mode.
+
+    Returns an explicit mode string, or None to keep auto-detect behavior.
+    """
+    from hermes_cli.runtime_provider import _detect_api_mode_for_url
+
+    detected_mode = _detect_api_mode_for_url(base_url)
+    normalized_current = str(current_api_mode or "").strip().lower()
+    default_mode = normalized_current or detected_mode or ""
+
+    mode_options = [
+        (
+            "",
+            "Auto-detect",
+            "Use Hermes URL heuristics; best for standard OpenAI-compatible endpoints.",
+        ),
+        (
+            "chat_completions",
+            "Chat Completions",
+            "Use /chat/completions for standard OpenAI-compatible servers.",
+        ),
+        (
+            "codex_responses",
+            "Responses / Codex",
+            "Use /responses for Codex-compatible tool-calling backends.",
+        ),
+        (
+            "anthropic_messages",
+            "Anthropic Messages",
+            "Use /v1/messages for Anthropic-compatible endpoints.",
+        ),
+    ]
+
+    print()
+    print("Select API compatibility mode:")
+    for idx, (value, label, description) in enumerate(mode_options, 1):
+        markers = []
+        if value == detected_mode:
+            markers.append("detected")
+        if normalized_current and value == normalized_current:
+            markers.append("current")
+        suffix = f" [{' / '.join(markers)}]" if markers else ""
+        print(f"  {idx}. {label}{suffix}")
+        print(f"     {description}")
+
+    try:
+        raw = input(
+            "Choice [1-4, Enter to keep current/detected]: "
+        ).strip().lower()
+    except (KeyboardInterrupt, EOFError):
+        print("\nCancelled.")
+        raise
+
+    if not raw:
+        return default_mode or None
+
+    if raw in {"1", "auto", "detect", "auto-detect"}:
+        return None
+    if raw in {"2", "chat", "chat_completions", "completions"}:
+        return "chat_completions"
+    if raw in {"3", "responses", "codex", "codex_responses"}:
+        return "codex_responses"
+    if raw in {"4", "anthropic", "anthropic_messages", "messages"}:
+        return "anthropic_messages"
+
+    print(f"Invalid API mode choice: {raw}. Falling back to auto-detect.")
+    return None
+
+
 def _auto_provider_name(base_url: str) -> str:
    """Generate a display name from a custom endpoint URL.

@@ -3214,12 +3300,12 @@ def _custom_provider_api_key_config_value(provider_info, resolved_api_key=""):


 def _save_custom_provider(
-    base_url, api_key="", model="", context_length=None, name=None
+    base_url, api_key="", model="", context_length=None, name=None, api_mode=None
 ):
    """Save a custom endpoint to custom_providers in config.yaml.

    Deduplicates by base_url — if the URL already exists, updates the
-    model name and context_length but doesn't add a duplicate entry.
+    model name, context_length, and api_mode but doesn't add a duplicate entry.
    Uses *name* when provided, otherwise auto-generates from the URL.
    """
    from hermes_cli.config import load_config, save_config
@@ -3245,6 +3331,13 @@ def _save_custom_provider(
                models_cfg[model] = {"context_length": context_length}
                entry["models"] = models_cfg
                changed = True
+            if api_mode:
+                if entry.get("api_mode") != api_mode:
+                    entry["api_mode"] = api_mode
+                    changed = True
+            elif "api_mode" in entry:
+                entry.pop("api_mode", None)
+                changed = True
            if changed:
                cfg["custom_providers"] = providers
                save_config(cfg)
@@ -3259,6 +3352,8 @@ def _save_custom_provider(
        entry["api_key"] = api_key
    if model:
        entry["model"] = model
+    if api_mode:
+        entry["api_mode"] = api_mode
    if model and context_length:
        entry["models"] = {model: {"context_length": context_length}}

@@ -3712,7 +3807,7 @@ def _model_flow_named_custom(config, provider_info):
                save_config(cfg)
    else:
        # Save model name to the custom_providers entry for next time
-        _save_custom_provider(base_url, config_api_key, model_name)
+        _save_custom_provider(base_url, config_api_key, model_name, api_mode=api_mode)

    print(f"\n✅ Model set to: {model_name}")
    print(f"   Provider: {name} ({base_url})")
@@ -4869,6 +4964,37 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
        )
        if model_list:
            print(f"  Found {len(model_list)} model(s) from Ollama Cloud")
+    elif provider_id == "novita":
+        from hermes_cli.models import fetch_api_models
+
+        api_key_for_probe = existing_key or (get_env_value(key_env) if key_env else "")
+        curated = _PROVIDER_MODELS.get(provider_id, [])
+        live_models = fetch_api_models(api_key_for_probe, effective_base)
+        if live_models:
+            model_list = live_models
+            print(f"  Found {len(model_list)} model(s) from {pconfig.name} API")
+        else:
+            mdev_models: list = []
+            try:
+                from agent.models_dev import list_agentic_models
+
+                mdev_models = list_agentic_models(provider_id)
+            except Exception:
+                pass
+            if mdev_models:
+                seen = {m.lower() for m in mdev_models}
+                model_list = list(mdev_models)
+                for m in curated:
+                    if m.lower() not in seen:
+                        model_list.append(m)
+                        seen.add(m.lower())
+                print(f"  Found {len(model_list)} model(s) from models.dev registry")
+            else:
+                model_list = curated
+                if model_list:
+                    print(
+                        f'  Showing {len(model_list)} curated models — use "Enter custom model name" for others.'
+                    )
    else:
        curated = _PROVIDER_MODELS.get(provider_id, [])

@@ -6701,6 +6827,74 @@ def _cleanup_quarantined_exes(scripts_dir: Path | None = None) -> None:
        pass


+def _refresh_active_lazy_features() -> None:
+    """Refresh lazy-installed backends after a code update.
+
+    When pyproject.toml's ``[all]`` extra was slimmed down (May 2026), most
+    optional backends moved to ``tools/lazy_deps.py`` and only install on
+    first use. ``hermes update`` runs ``uv pip install -e .[all]`` which
+    leaves those packages untouched — so if we bump a pin in
+    :data:`LAZY_DEPS` (CVE response, transitive bug fix), users who already
+    activated the backend keep the stale version forever.
+
+    This function asks lazy_deps which features the user has previously
+    activated and reinstalls them under the current pins. Features the
+    user never enabled stay quiet — no churn for cold backends.
+
+    Never raises. A failure here must not block the rest of the update.
+    """
+    try:
+        from tools import lazy_deps
+    except Exception as exc:
+        logger.debug("Lazy refresh skipped (import failed): %s", exc)
+        return
+
+    try:
+        active = lazy_deps.active_features()
+    except Exception as exc:
+        logger.debug("Lazy refresh skipped (active_features failed): %s", exc)
+        return
+
+    if not active:
+        return
+
+    print()
+    print(f"→ Refreshing {len(active)} active lazy backend(s)...")
+
+    try:
+        results = lazy_deps.refresh_active_features(prompt=False)
+    except Exception as exc:
+        # refresh_active_features is documented as never-raise, but defend
+        # the update flow against future regressions.
+        print(f"  ⚠ Lazy refresh failed unexpectedly: {exc}")
+        return
+
+    refreshed = [f for f, s in results.items() if s == "refreshed"]
+    current = [f for f, s in results.items() if s == "current"]
+    failed = [(f, s) for f, s in results.items() if s.startswith("failed:")]
+    skipped = [(f, s) for f, s in results.items() if s.startswith("skipped:")]
+
+    if refreshed:
+        print(f"  ↑ {len(refreshed)} refreshed: {', '.join(refreshed)}")
+    if current:
+        print(f"  ✓ {len(current)} already current")
+    if skipped:
+        # Most common reason: security.allow_lazy_installs=false. Show one
+        # line so the user knows why; not an error.
+        names = ", ".join(f for f, _ in skipped)
+        reason = skipped[0][1].split(": ", 1)[-1]
+        print(f"  · {len(skipped)} skipped ({reason}): {names}")
+    if failed:
+        for feature, status in failed:
+            reason = status.split(": ", 1)[-1]
+            # Clip noisy pip stderr to keep update output legible.
+            if len(reason) > 200:
+                reason = reason[:200] + "..."
+            print(f"  ⚠ {feature} failed to refresh: {reason}")
+        print("  Backends keep their previously-installed version; rerun")
+        print("  `hermes update` once the upstream issue is resolved.")
+
+
 def _install_python_dependencies_with_optional_fallback(
    install_cmd_prefix: list[str],
    *,
@@ -7623,6 +7817,8 @@ def _cmd_update_impl(args, gateway_mode: bool):
                _install_psutil_android_compat(pip_cmd)
            _install_python_dependencies_with_optional_fallback(pip_cmd, group=install_group)

+        _refresh_active_lazy_features()
+
        _update_node_dependencies()
        _build_web_ui(PROJECT_ROOT / "web")

@@ -9168,7 +9364,7 @@ def _build_provider_choices() -> list[str]:
            "auto", "openrouter", "nous", "openai-codex", "copilot-acp", "copilot",
            "anthropic", "gemini", "google-gemini-cli", "xai", "bedrock", "azure-foundry",
            "ollama-cloud", "huggingface", "zai", "kimi-coding", "kimi-coding-cn",
-            "stepfun", "minimax", "minimax-cn", "kilocode", "xiaomi", "arcee",
+            "stepfun", "minimax", "minimax-cn", "kilocode", "novita", "xiaomi", "arcee",
            "nvidia", "deepseek", "alibaba", "qwen-oauth", "opencode-zen", "opencode-go",
        ]

@@ -9188,10 +9384,10 @@ _BUILTIN_SUBCOMMANDS = frozenset(
        "computer-use",
        "config", "cron", "curator", "dashboard", "debug", "doctor",
        "dump", "fallback", "gateway", "hooks", "import", "insights",
-        "kanban", "login", "logout", "logs", "mcp", "memory", "model",
-        "pairing", "plugins", "profile", "sessions", "setup", "skills",
-        "slack", "status", "tools", "uninstall", "update", "version",
-        "webhook", "whatsapp", "chat",
+        "kanban", "login", "logout", "logs", "lsp", "mcp", "memory",
+        "model", "pairing", "plugins", "profile", "sessions", "setup",
+        "skills", "slack", "status", "tools", "uninstall", "update",
+        "version", "webhook", "whatsapp", "chat",
        # Help-ish invocations — plugin commands not being listed in
        # top-level --help is an acceptable trade-off for skipping an
        # expensive eager import of every bundled plugin module.
@@ -10,6 +10,7 @@ from __future__ import annotations
 import getpass
 import os
+import shlex
 import sys
 from pathlib import Path

 from hermes_constants import get_hermes_home
@@ -134,7 +135,7 @@ def _install_dependencies(provider_name: str) -> None:
        if check_cmd:
            try:
                subprocess.run(
-                    check_cmd, shell=True, capture_output=True, timeout=5
+                    shlex.split(check_cmd), check=True, capture_output=True, timeout=5
                )
            except Exception:
                if install_cmd:
@@ -378,6 +379,12 @@ def _write_env_vars(env_path: Path, env_writes: dict) -> None:
            new_lines.append(f"{key}={val}")

    env_path.write_text("\n".join(new_lines) + "\n", encoding="utf-8")
+    # Restrict permissions — .env holds API keys and tokens.
+    try:
+        import stat
+        env_path.chmod(stat.S_IRUSR | stat.S_IWUSR)  # 0600
+    except OSError:
+        pass  # Windows or read-only FS


 # ---------------------------------------------------------------------------
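
The permission tightening added to `_write_env_vars` can be sketched in isolation as follows. This is an illustration, not the project's code: `write_private_env` and the `EXAMPLE_API_KEY` entry are hypothetical names, and the mode check only applies on POSIX systems.

```python
import os
import stat
import tempfile
from pathlib import Path


def write_private_env(env_path: Path, lines: list[str]) -> None:
    """Write KEY=value lines, then restrict the file to owner read/write."""
    env_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    try:
        env_path.chmod(stat.S_IRUSR | stat.S_IWUSR)  # 0600
    except OSError:
        pass  # e.g. a filesystem that does not support chmod


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".env"
    write_private_env(path, ["EXAMPLE_API_KEY=placeholder"])
    mode = stat.S_IMODE(path.stat().st_mode)
    content = path.read_text(encoding="utf-8")

print(oct(mode), repr(content))
```

One design note: with write-then-chmod, a newly created file briefly carries the process's default permissions. Creating it through `os.open(..., 0o600)` would close that window, at the cost of a slightly more involved write path.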
@@ -445,6 +445,14 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
    # Azure Foundry: user-provided endpoint and model.
    # Empty list because models depend on the endpoint configuration.
    "azure-foundry": [],
+    "novita": [
+        "moonshotai/kimi-k2.5",
+        "minimax/minimax-m2.7",
+        "zai-org/glm-5",
+        "deepseek/deepseek-v3-0324",
+        "deepseek/deepseek-r1-0528",
+        "qwen/qwen3-235b-a22b-fp8",
+    ],
 }

 # Vercel AI Gateway: derive the bare-model-id catalog from the curated
@@ -905,13 +913,14 @@ class ProviderEntry(NamedTuple):
 CANONICAL_PROVIDERS: list[ProviderEntry] = [
    ProviderEntry("nous",           "Nous Portal",              "Nous Portal (Nous Research subscription)"),
    ProviderEntry("openrouter",     "OpenRouter",               "OpenRouter (100+ models, pay-per-use)"),
+    ProviderEntry("novita",         "NovitaAI",                 "NovitaAI (AI-native cloud: Model API, Agent Sandbox, GPU Cloud)"),
    ProviderEntry("lmstudio",       "LM Studio",                "LM Studio (local desktop app with built-in model server)"),
    ProviderEntry("anthropic",      "Anthropic",                "Anthropic (Claude models — API key or Claude Code)"),
    ProviderEntry("openai-codex",   "OpenAI Codex",             "OpenAI Codex"),
+    ProviderEntry("alibaba",        "Qwen Cloud",               "Qwen Cloud / DashScope Coding (Qwen + multi-provider)"),
    ProviderEntry("xiaomi",         "Xiaomi MiMo",              "Xiaomi MiMo (MiMo-V2.5 and V2 models — pro, omni, flash)"),
    ProviderEntry("tencent-tokenhub", "Tencent TokenHub",       "Tencent TokenHub (Hy3 Preview — direct API via tokenhub.tencentmaas.com)"),
    ProviderEntry("nvidia",         "NVIDIA NIM",               "NVIDIA NIM (Nemotron models — build.nvidia.com or local NIM)"),
-    ProviderEntry("qwen-oauth",     "Qwen OAuth (Portal)",      "Qwen OAuth (reuses local Qwen CLI login)"),
    ProviderEntry("copilot",        "GitHub Copilot",           "GitHub Copilot (uses GITHUB_TOKEN or gh auth token)"),
    ProviderEntry("copilot-acp",    "GitHub Copilot ACP",       "GitHub Copilot ACP (spawns `copilot --acp --stdio`)"),
    ProviderEntry("huggingface",    "Hugging Face",             "Hugging Face Inference Providers (20+ open models)"),
@@ -926,7 +935,6 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
    ProviderEntry("minimax",        "MiniMax",                  "MiniMax (global direct API)"),
    ProviderEntry("minimax-oauth",  "MiniMax (OAuth)",          "MiniMax via OAuth browser login (Coding Plan, minimax.io)"),
    ProviderEntry("minimax-cn",     "MiniMax (China)",          "MiniMax China (domestic direct API)"),
-    ProviderEntry("alibaba",        "Alibaba Cloud (DashScope)","Alibaba Cloud / DashScope Coding (Qwen + multi-provider)"),
    ProviderEntry("ollama-cloud",   "Ollama Cloud",             "Ollama Cloud (cloud-hosted open models — ollama.com)"),
    ProviderEntry("arcee",          "Arcee AI",                 "Arcee AI (Trinity models — direct API)"),
    ProviderEntry("gmi",            "GMI Cloud",                "GMI Cloud (multi-model direct API)"),
@@ -936,6 +944,7 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
    ProviderEntry("bedrock",        "AWS Bedrock",              "AWS Bedrock (Claude, Nova, Llama, DeepSeek — IAM or API key)"),
    ProviderEntry("azure-foundry",  "Azure Foundry",            "Azure Foundry (OpenAI-style or Anthropic-style endpoint — your Azure AI deployment)"),
    ProviderEntry("ai-gateway",     "Vercel AI Gateway",        "Vercel AI Gateway"),
+    ProviderEntry("qwen-oauth",     "Qwen OAuth (Portal)",      "Qwen OAuth (reuses local Qwen CLI login)"),
 ]

 # Auto-extend CANONICAL_PROVIDERS with any provider registered in providers/
@@ -1014,6 +1023,8 @@ _PROVIDER_ALIASES = {
    "hf": "huggingface",
    "hugging-face": "huggingface",
    "huggingface-hub": "huggingface",
+    "novita-ai": "novita",
+    "novitaai": "novita",
    "mimo": "xiaomi",
    "xiaomi-mimo": "xiaomi",
    "tencent": "tencent-tokenhub",
@@ -1494,7 +1505,7 @@ def _resolve_nous_pricing_credentials() -> tuple[str, str]:


 def get_pricing_for_provider(provider: str, *, force_refresh: bool = False) -> dict[str, dict[str, str]]:
-    """Return live pricing for providers that support it (openrouter, nous, ai-gateway)."""
+    """Return live pricing for providers that support it (openrouter, nous, ai-gateway, novita)."""
    normalized = normalize_provider(provider)
    if normalized == "openrouter":
        return fetch_models_with_pricing(
@@ -1504,6 +1515,8 @@ def get_pricing_for_provider(provider: str, *, force_refresh: bool = False) -> d
        )
    if normalized == "ai-gateway":
        return fetch_ai_gateway_pricing(force_refresh=force_refresh)
+    if normalized == "novita":
+        return _fetch_novita_pricing(force_refresh=force_refresh)
    if normalized == "nous":
        api_key, base_url = _resolve_nous_pricing_credentials()
        if base_url:
@@ -1520,6 +1533,65 @@ def get_pricing_for_provider(provider: str, *, force_refresh: bool = False) -> d
    return {}


+def _fetch_novita_pricing(
+    timeout: float = 8.0,
+    *,
+    force_refresh: bool = False,
+) -> dict[str, dict[str, str]]:
+    """Fetch pricing from NovitaAI /v1/models.
+
+    NovitaAI returns input/output prices per million tokens in units of
+    0.0001 USD. Convert them to the per-token strings used by the shared
+    pricing formatter.
+
+    Results are cached in ``_pricing_cache`` keyed on the resolved base URL,
+    matching the pattern used by ``fetch_ai_gateway_pricing`` — without this,
+    every menu render or pricing lookup re-hits the network.
+    """
+    api_key = os.getenv("NOVITA_API_KEY", "").strip()
+    if not api_key:
+        return {}
+
+    base_url = os.getenv("NOVITA_BASE_URL", "").strip() or "https://api.novita.ai/openai/v1"
+    cache_key = base_url.rstrip("/")
+    if not force_refresh and cache_key in _pricing_cache:
+        return _pricing_cache[cache_key]
+
+    url = cache_key + "/models"
+    headers = {
+        "Authorization": f"Bearer {api_key}",
+        "Accept": "application/json",
+        "User-Agent": _HERMES_USER_AGENT,
+    }
+
+    try:
+        req = urllib.request.Request(url, headers=headers)
+        with urllib.request.urlopen(req, timeout=timeout) as resp:
+            payload = json.loads(resp.read().decode())
+    except Exception:
+        _pricing_cache[cache_key] = {}
+        return {}
+
+    result: dict[str, dict[str, str]] = {}
+    for item in payload.get("data", []):
+        if not isinstance(item, dict):
+            continue
+        mid = item.get("id")
+        if not mid:
+            continue
+        inp = item.get("input_token_price_per_m")
+        out = item.get("output_token_price_per_m")
+        if inp is None and out is None:
+            continue
+        result[str(mid)] = {
+            "prompt": str(float(inp or 0) / 10_000 / 1_000_000),
+            "completion": str(float(out or 0) / 10_000 / 1_000_000),
+        }
+
+    _pricing_cache[cache_key] = result
+    return result
+
+
 # All provider IDs and aliases that are valid for the provider:model syntax.
 _KNOWN_PROVIDER_NAMES: set[str] = (
    set(_PROVIDER_LABELS.keys())
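
A small worked example of the unit conversion described in the `_fetch_novita_pricing` docstring: prices arrive per million tokens in units of 0.0001 USD and leave as USD per token. The helper name and the raw value `5700` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def novita_price_to_per_token(raw_per_million) -> float:
    """Convert a price in 0.0001-USD units per million tokens to USD per token."""
    return float(raw_per_million or 0) / 10_000 / 1_000_000


raw = 5700  # hypothetical input_token_price_per_m value
usd_per_million = raw / 10_000  # 0.0001-USD units -> USD per million tokens
per_token = novita_price_to_per_token(raw)  # USD per single token

print(usd_per_million, per_token)
```

A missing price (`None`) maps to `0.0`, mirroring the `inp or 0` fallback in the hunk above.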
@@ -542,6 +542,61 @@ class PluginContext:
            self.manifest.name, provider.name,
        )

+    # -- video gen provider registration -------------------------------------
+
+    def register_video_gen_provider(self, provider) -> None:
+        """Register a video generation backend.
+
+        ``provider`` must be an instance of
+        :class:`agent.video_gen_provider.VideoGenProvider`. The
+        ``provider.name`` attribute is what ``video_gen.provider`` in
+        ``config.yaml`` matches against when routing ``video_generate``
+        tool calls.
+        """
+        from agent.video_gen_provider import VideoGenProvider
+        from agent.video_gen_registry import register_provider as _register_video_provider
+
+        if not isinstance(provider, VideoGenProvider):
+            logger.warning(
+                "Plugin '%s' tried to register a video_gen provider that does "
+                "not inherit from VideoGenProvider. Ignoring.",
+                self.manifest.name,
+            )
+            return
+        _register_video_provider(provider)
+        logger.info(
+            "Plugin '%s' registered video_gen provider: %s",
+            self.manifest.name, provider.name,
+        )
+
+    # -- web search/extract provider registration ----------------------------
+
+    def register_web_search_provider(self, provider) -> None:
+        """Register a web search/extract backend.
+
+        ``provider`` must be an instance of
+        :class:`agent.web_search_provider.WebSearchProvider`. The
+        ``provider.name`` attribute is what ``web.search_backend`` /
+        ``web.extract_backend`` / ``web.backend`` in ``config.yaml``
+        matches against when routing ``web_search`` / ``web_extract``
+        tool calls.
+        """
+        from agent.web_search_provider import WebSearchProvider
+        from agent.web_search_registry import register_provider as _register_web_provider
+
+        if not isinstance(provider, WebSearchProvider):
+            logger.warning(
+                "Plugin '%s' tried to register a web provider that does "
+                "not inherit from WebSearchProvider. Ignoring.",
+                self.manifest.name,
+            )
+            return
+        _register_web_provider(provider)
+        logger.info(
+            "Plugin '%s' registered web provider: %s",
+            self.manifest.name, provider.name,
+        )
+
    # -- platform adapter registration ---------------------------------------

    def register_platform(
@@ -1312,6 +1367,21 @@ def invoke_hook(hook_name: str, **kwargs: Any) -> List[Any]:



+_thread_tool_whitelist = threading.local()
+
+
+def set_thread_tool_whitelist(
+    allowed: Optional[Set[str]],
+    deny_msg_fmt: str = "Tool '{tool_name}' denied: not in this thread's tool whitelist",
+) -> None:
+    _thread_tool_whitelist.allowed = allowed
+    _thread_tool_whitelist.fmt = deny_msg_fmt
+
+
+def clear_thread_tool_whitelist() -> None:
+    _thread_tool_whitelist.allowed = None
+
+
 def get_pre_tool_call_block_message(
    tool_name: str,
    args: Optional[Dict[str, Any]],
@@ -1330,6 +1400,11 @@ def get_pre_tool_call_block_message(
    directive wins.  Invalid or irrelevant hook return values are
    silently ignored so existing observer-only hooks are unaffected.
    """
+    allowed = getattr(_thread_tool_whitelist, "allowed", None)
+    if allowed is not None and tool_name not in allowed:
+        fmt = getattr(_thread_tool_whitelist, "fmt", "Tool '{tool_name}' denied")
+        return fmt.format(tool_name=tool_name)
+
    hook_results = invoke_hook(
        "pre_tool_call",
        tool_name=tool_name,
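
The per-thread whitelist check added ahead of the `pre_tool_call` hooks can be sketched as below. This is a simplified, self-contained illustration: the names `set_whitelist` and `block_message` are hypothetical and the hook dispatch is omitted. It shows that a whitelist set in one thread does not affect tool calls made from another.

```python
import threading
from typing import Optional, Set

_whitelist = threading.local()


def set_whitelist(allowed: Optional[Set[str]]) -> None:
    """Restrict the calling thread to the given tool names (None = no restriction)."""
    _whitelist.allowed = allowed


def block_message(tool_name: str) -> Optional[str]:
    """Return a denial message if this thread's whitelist excludes the tool."""
    allowed = getattr(_whitelist, "allowed", None)
    if allowed is not None and tool_name not in allowed:
        return f"Tool '{tool_name}' denied"
    return None


results = {}


def worker() -> None:
    set_whitelist({"web_search"})
    results["worker_terminal"] = block_message("terminal")
    results["worker_search"] = block_message("web_search")


t = threading.Thread(target=worker)
t.start()
t.join()

# The main thread never set a whitelist, so nothing is blocked here.
results["main_terminal"] = block_message("terminal")
print(results)
```

The `getattr(..., None)` default matters because attributes on a `threading.local` object exist only in threads that assigned them.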
@@ -1295,91 +1295,6 @@ def rename_profile(old_name: str, new_name: str) -> Path:
    return new_dir


-# ---------------------------------------------------------------------------
-# Tab completion
-# ---------------------------------------------------------------------------
-
-def generate_bash_completion() -> str:
-    """Generate a bash completion script for hermes profile names."""
-    return '''# Hermes Agent profile completion
-# Add to ~/.bashrc: eval "$(hermes completion bash)"
-
-_hermes_profiles() {
-    local profiles_dir="$HOME/.hermes/profiles"
-    local profiles="default"
-    if [ -d "$profiles_dir" ]; then
-        profiles="$profiles $(ls "$profiles_dir" 2>/dev/null)"
-    fi
-    echo "$profiles"
-}
-
-_hermes_completion() {
-    local cur prev
-    cur="${COMP_WORDS[COMP_CWORD]}"
-    prev="${COMP_WORDS[COMP_CWORD-1]}"
-
-    # Complete profile names after -p / --profile
-    if [[ "$prev" == "-p" || "$prev" == "--profile" ]]; then
-        COMPREPLY=($(compgen -W "$(_hermes_profiles)" -- "$cur"))
-        return
-    fi
-
-    # Complete profile subcommands
-    if [[ "${COMP_WORDS[1]}" == "profile" ]]; then
-        case "$prev" in
-            profile)
-                COMPREPLY=($(compgen -W "list use create delete show alias rename export import" -- "$cur"))
-                return
-                ;;
-            use|delete|show|alias|rename|export)
-                COMPREPLY=($(compgen -W "$(_hermes_profiles)" -- "$cur"))
-                return
-                ;;
-        esac
-    fi
-
-    # Top-level subcommands
-    if [[ "$COMP_CWORD" == 1 ]]; then
-        local commands="chat model gateway setup status cron doctor dump config skills tools mcp sessions profile update version"
-        COMPREPLY=($(compgen -W "$commands" -- "$cur"))
-    fi
-}
-
-complete -F _hermes_completion hermes
-'''
-
-
-def generate_zsh_completion() -> str:
-    """Generate a zsh completion script for hermes profile names."""
-    return '''#compdef hermes
-# Hermes Agent profile completion
-# Add to ~/.zshrc: eval "$(hermes completion zsh)"
-
-_hermes() {
-    local -a profiles
-    profiles=(default)
-    if [[ -d "$HOME/.hermes/profiles" ]]; then
-        profiles+=("${(@f)$(ls $HOME/.hermes/profiles 2>/dev/null)}")
-    fi
-
-    _arguments \\
-        '-p[Profile name]:profile:($profiles)' \\
-        '--profile[Profile name]:profile:($profiles)' \\
-        '1:command:(chat model gateway setup status cron doctor dump config skills tools mcp sessions profile update version)' \\
-        '*::arg:->args'
-
-    case $words[1] in
-        profile)
-            _arguments '1:action:(list use create delete show alias rename export import)' \\
-                        '2:profile:($profiles)'
-            ;;
-    esac
-}
-
-_hermes "$@"
-'''
-
-
 # ---------------------------------------------------------------------------
 # Profile env resolution (called from _apply_profile_override)
 # ---------------------------------------------------------------------------
@@ -156,6 +156,11 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
        is_aggregator=True,
        base_url_env_var="HF_BASE_URL",
    ),
+    "novita": HermesOverlay(
+        transport="openai_chat",
+        is_aggregator=True,
+        base_url_env_var="NOVITA_BASE_URL",
+    ),
    "xai": HermesOverlay(
        transport="codex_responses",
        base_url_override="https://api.x.ai/v1",
@@ -309,6 +314,10 @@ ALIASES: Dict[str, str] = {
    "hugging-face": "huggingface",
    "huggingface-hub": "huggingface",

+    # novita
+    "novita-ai": "novita",
+    "novitaai": "novita",
+
    # xiaomi
    "mimo": "xiaomi",
    "xiaomi-mimo": "xiaomi",
@@ -164,7 +164,18 @@ def _copilot_runtime_api_mode(model_cfg: Dict[str, Any], api_key: str) -> str:
        return "chat_completions"


-_VALID_API_MODES = {"chat_completions", "codex_responses", "anthropic_messages", "bedrock_converse"}
+_VALID_API_MODES = {
+    "chat_completions",
+    "codex_responses",
+    "anthropic_messages",
+    "bedrock_converse",
+    # Optional opt-in: hand the entire turn to a `codex app-server` subprocess
+    # so terminal/file-ops/patching/sandboxing run inside Codex's own runtime
+    # instead of Hermes' tool dispatch. Gated behind config key
+    # `model.openai_runtime == "codex_app_server"` AND provider in
+    # {"openai", "openai-codex"}. Default is unchanged.
+    "codex_app_server",
+}


 def _parse_api_mode(raw: Any) -> Optional[str]:
@@ -176,6 +187,32 @@ def _parse_api_mode(raw: Any) -> Optional[str]:
    return None


+def _maybe_apply_codex_app_server_runtime(
+    *,
+    provider: str,
+    api_mode: str,
+    model_cfg: Optional[Dict[str, Any]],
+) -> str:
+    """Optional opt-in: rewrite api_mode → "codex_app_server" for OpenAI/Codex
+    providers when the user has explicitly enabled that runtime via
+    `model.openai_runtime: codex_app_server` in config.yaml.
+
+    Default behavior is preserved: when the key is unset, "auto", or empty,
+    this function is a no-op. Only providers in {"openai", "openai-codex"}
+    are eligible — other providers (anthropic, openrouter, etc.) cannot be
+    rerouted through codex.
+
+    Returns the (possibly-rewritten) api_mode."""
+    if not model_cfg:
+        return api_mode
+    if provider not in ("openai", "openai-codex"):
+        return api_mode
+    runtime = str(model_cfg.get("openai_runtime") or "").strip().lower()
+    if runtime == "codex_app_server":
+        return "codex_app_server"
+    return api_mode
+
+
 def _resolve_runtime_from_pool_entry(
    *,
    provider: str,
@@ -293,6 +330,12 @@ def _resolve_runtime_from_pool_entry(
    if api_mode == "anthropic_messages" and provider in {"opencode-zen", "opencode-go"}:
        base_url = re.sub(r"/v1/?$", "", base_url)

+    # Optional opt-in: route OpenAI/Codex turns through `codex app-server`.
+    # Inert when `model.openai_runtime` is unset or "auto".
+    api_mode = _maybe_apply_codex_app_server_runtime(
+        provider=provider, api_mode=api_mode, model_cfg=model_cfg
+    )
+
    return {
        "provider": provider,
        "api_mode": api_mode,
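
Illustration (not part of the patch): a condensed restatement of `_maybe_apply_codex_app_server_runtime`, with a few sample inputs to show when the rewrite applies. The name `apply_runtime` and the sample config values are assumptions made for this sketch; the logic follows the hunk above.

```python
from typing import Any, Dict, Optional


def apply_runtime(
    provider: str, api_mode: str, model_cfg: Optional[Dict[str, Any]]
) -> str:
    """Condensed copy of the opt-in check from the hunk above."""
    if not model_cfg or provider not in ("openai", "openai-codex"):
        return api_mode
    # The value is normalised, so case and surrounding whitespace are ignored.
    runtime = str(model_cfg.get("openai_runtime") or "").strip().lower()
    return "codex_app_server" if runtime == "codex_app_server" else api_mode


# Eligible provider with the key set: api_mode is rewritten.
opted_in = apply_runtime(
    "openai", "codex_responses", {"openai_runtime": " Codex_App_Server "}
)
# Ineligible provider: the key is ignored.
wrong_provider = apply_runtime(
    "anthropic", "anthropic_messages", {"openai_runtime": "codex_app_server"}
)
# "auto" or a missing config leaves api_mode untouched.
auto = apply_runtime("openai-codex", "codex_responses", {"openai_runtime": "auto"})
no_cfg = apply_runtime("openai", "chat_completions", None)
```

Any value other than `codex_app_server` behaves like "auto" here, which matches the "default is unchanged" note in the comment on `_VALID_API_MODES`.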
@@ -454,6 +454,26 @@ def _print_setup_summary(config: dict, hermes_home):
        else:
            tool_status.append(("Image Generation", False, "FAL_KEY or OPENAI_API_KEY"))

+    # Video generation — opt-in via `hermes tools` → Video Generation.
+    # Only show the row when a plugin reports available so we don't badger
+    # users who don't care about video gen with a "missing" status line.
+    try:
+        from agent.video_gen_registry import list_providers as _list_video_providers
+        from hermes_cli.plugins import _ensure_plugins_discovered as _ensure_plugins
+        _ensure_plugins()
+        _video_backend = None
+        for _vp in _list_video_providers():
+            try:
+                if _vp.is_available():
+                    _video_backend = _vp.display_name
+                    break
+            except Exception:
+                continue
+    except Exception:
+        _video_backend = None
+    if _video_backend:
+        tool_status.append((f"Video Generation ({_video_backend})", True, None))
+
    # TTS — show configured provider
    tts_provider = cfg_get(config, "tts", "provider", default="edge")
    if subscription_features.tts.managed_by_nous:
@@ -3246,18 +3266,6 @@ def run_setup_wizard(args):
        print_info(f"  cp {_backup_path} {config_path}")
    _print_setup_summary(config, hermes_home)

-    _offer_launch_chat()
-
-
-def _offer_launch_chat():
-    """Prompt the user to jump straight into chat after setup."""
-    print()
-    if not prompt_yes_no("Launch hermes chat now?", True):
-        return
-
-    from hermes_cli.relaunch import relaunch
-    relaunch(["chat"])
-

 def _run_first_time_quick_setup(config: dict, hermes_home, is_existing: bool):
    """Streamlined first-time setup: provider, model, terminal & messaging.
@@ -3301,8 +3309,6 @@ def _run_first_time_quick_setup(config: dict, hermes_home, is_existing: bool):

    _print_setup_summary(config, hermes_home)

-    _offer_launch_chat()
-

 def _run_quick_setup(config: dict, hermes_home):
    """Quick setup — only configure items that are missing."""
@@ -666,25 +666,46 @@ def _load_skin_from_yaml(path: Path) -> Optional[Dict[str, Any]]:
    return None


+def _mapping_or_empty(value: Any, *, section: str, skin_name: str) -> Dict[str, Any]:
+    """Return a mapping value or an empty dict when the section type is invalid."""
+    if isinstance(value, dict):
+        return value
+    if value is None:
+        return {}
+    logger.warning(
+        "Skin '%s' has invalid '%s' section type (%s); ignoring section",
+        skin_name,
+        section,
+        type(value).__name__,
+    )
+    return {}
+
+
 def _build_skin_config(data: Dict[str, Any]) -> SkinConfig:
    """Build a SkinConfig from a raw dict (built-in or loaded from YAML)."""
    # Start with default values as base for missing keys
    default = _BUILTIN_SKINS["default"]
+    skin_name = str(data.get("name", "unknown"))
+    color_overrides = _mapping_or_empty(data.get("colors"), section="colors", skin_name=skin_name)
+    spinner_overrides = _mapping_or_empty(data.get("spinner"), section="spinner", skin_name=skin_name)
+    branding_overrides = _mapping_or_empty(data.get("branding"), section="branding", skin_name=skin_name)
+    emoji_overrides = _mapping_or_empty(data.get("tool_emojis"), section="tool_emojis", skin_name=skin_name)
+
    colors = dict(default.get("colors", {}))
-    colors.update(data.get("colors", {}))
+    colors.update(color_overrides)
    spinner = dict(default.get("spinner", {}))
-    spinner.update(data.get("spinner", {}))
+    spinner.update(spinner_overrides)
    branding = dict(default.get("branding", {}))
-    branding.update(data.get("branding", {}))
+    branding.update(branding_overrides)

    return SkinConfig(
-        name=data.get("name", "unknown"),
+        name=skin_name,
        description=data.get("description", ""),
        colors=colors,
        spinner=spinner,
        branding=branding,
        tool_prefix=data.get("tool_prefix", default.get("tool_prefix", "┊")),
-        tool_emojis=data.get("tool_emojis", {}),
+        tool_emojis=emoji_overrides,
        banner_logo=data.get("banner_logo", ""),
        banner_hero=data.get("banner_hero", ""),
    )
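
Illustration (not part of the patch): why guarding the section type matters. If a skin YAML contains a scalar where a mapping is expected (for example `colors: red`), passing it straight to `dict.update` raises instead of falling back to defaults. The helper below is a simplified, hypothetical version of `_mapping_or_empty` without the logging, and the sample values are assumptions.

```python
from typing import Any, Dict


def mapping_or_empty(value: Any) -> Dict[str, Any]:
    """Simplified guard: keep dicts, replace anything else with {}."""
    return value if isinstance(value, dict) else {}


defaults = {"banner_title": "#FFD700"}
bad_section = "red"  # e.g. YAML `colors: red` parsed into a plain string

# Old behaviour: update() tries to treat the string as key/value pairs.
try:
    dict(defaults).update(bad_section)
    unguarded = "ok"
except (TypeError, ValueError) as exc:
    unguarded = type(exc).__name__

# New behaviour: the invalid section is ignored and defaults survive.
colors = dict(defaults)
colors.update(mapping_or_empty(bad_section))
```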
@@ -828,10 +849,14 @@ def get_prompt_toolkit_style_overrides() -> Dict[str, str]:
    except Exception:
        return {}

-    prompt = skin.get_color("prompt", "#FFF8DC")
+    # Input/prompt: leave unset by default so the typed text inherits
+    # the terminal's foreground color (readable in both light and dark
+    # color schemes).  Skins can opt into a colored prompt by setting
+    # `prompt` explicitly in their YAML.
+    prompt = skin.get_color("prompt", "")
    input_rule = skin.get_color("input_rule", "#CD7F32")
    title = skin.get_color("banner_title", "#FFD700")
-    text = skin.get_color("banner_text", prompt)
+    text = skin.get_color("banner_text", "#FFF8DC")
    dim = skin.get_color("banner_dim", "#555555")
    label = skin.get_color("ui_label", title)
    warn = skin.get_color("ui_warn", "#FF8C00")
@@ -851,7 +876,11 @@ def get_prompt_toolkit_style_overrides() -> Dict[str, str]:
    menu_meta_current_bg = skin.get_color("completion_menu_meta_current_bg", menu_current_bg)

    return {
-        "input-area": prompt,
+        # Typed input always uses terminal default fg/bg so it's
+        # readable in both light and dark Terminal.app modes.  The
+        # skin's `prompt` color (if any) only styles the prompt symbol,
+        # NOT the user's typed text.
+        "input-area": "",
        "placeholder": f"{dim} italic",
        "prompt": prompt,
        "prompt-working": f"{dim} italic",
@@ -60,6 +60,7 @@ CONFIGURABLE_TOOLSETS = [
    ("vision",          "👁️  Vision / Image Analysis",  "vision_analyze"),
    ("video",           "🎬 Video Analysis",            "video_analyze (requires video-capable model)"),
    ("image_gen",       "🎨 Image Generation",          "image_generate"),
+    ("video_gen",       "🎬 Video Generation",          "video_generate (text-to-video + image-to-video)"),
    ("moa",             "🧠 Mixture of Agents",         "mixture_of_agents"),
    ("tts",             "🔊 Text-to-Speech",            "text_to_speech"),
    ("skills",          "📚 Skills",                    "list, view, manage"),
@@ -82,7 +83,11 @@ CONFIGURABLE_TOOLSETS = [
 # Toolsets that are OFF by default for new installs.
 # They're still in _HERMES_CORE_TOOLS (available at runtime if enabled),
 # but the setup checklist won't pre-select them for first-time users.
-_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl", "spotify", "discord", "discord_admin", "video"}
+#
+# Video gen is off by default — it's a niche, paid, slow feature. Users
+# who want it opt in via `hermes tools` → Video Generation, which walks
+# them through provider + model selection.
+_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl", "spotify", "discord", "discord_admin", "video", "video_gen"}

 # Platform-scoped toolsets: only appear in the `hermes tools` checklist for
 # these platforms, and only resolve/save for these platforms.  A toolset
@@ -240,6 +245,15 @@ TOOL_CATEGORIES = {
        "setup_title": "Select Search Provider",
        "setup_note": "A free DuckDuckGo search skill is also included — skip this if you don't need a premium provider.",
        "icon": "🔍",
+        # Per-provider rows are injected at runtime from
+        # plugins.web.<vendor>.provider via _plugin_web_search_providers()
+        # in _visible_providers(). Only non-provider UX setup-flow rows
+        # for the firecrawl backend are listed here:
+        #   - "Nous Subscription" — managed Firecrawl billed via Nous
+        #     subscription (requires_nous_auth + override_env_vars).
+        #   - "Firecrawl Self-Hosted" — points firecrawl at a private
+        #     Docker instance via FIRECRAWL_API_URL only.
+        # See PR #25182 for the migration rationale.
        "providers": [
            {
                "name": "Nous Subscription",
@@ -251,42 +265,6 @@ TOOL_CATEGORIES = {
                "managed_nous_feature": "web",
                "override_env_vars": ["FIRECRAWL_API_KEY", "FIRECRAWL_API_URL"],
            },
-            {
-                "name": "Firecrawl Cloud",
-                "badge": "★ recommended",
-                "tag": "Full-featured search, extract, and crawl",
-                "web_backend": "firecrawl",
-                "env_vars": [
-                    {"key": "FIRECRAWL_API_KEY", "prompt": "Firecrawl API key", "url": "https://firecrawl.dev"},
-                ],
-            },
-            {
-                "name": "Exa",
-                "badge": "paid",
-                "tag": "Neural search with semantic understanding",
-                "web_backend": "exa",
-                "env_vars": [
-                    {"key": "EXA_API_KEY", "prompt": "Exa API key", "url": "https://exa.ai"},
-                ],
-            },
-            {
-                "name": "Parallel",
-                "badge": "paid",
-                "tag": "AI-powered search and extract",
-                "web_backend": "parallel",
-                "env_vars": [
-                    {"key": "PARALLEL_API_KEY", "prompt": "Parallel API key", "url": "https://parallel.ai"},
-                ],
-            },
-            {
-                "name": "Tavily",
-                "badge": "free tier",
-                "tag": "Search, extract, and crawl — 1000 free searches/mo",
-                "web_backend": "tavily",
-                "env_vars": [
-                    {"key": "TAVILY_API_KEY", "prompt": "Tavily API key", "url": "https://app.tavily.com/home"},
-                ],
-            },
            {
                "name": "Firecrawl Self-Hosted",
                "badge": "free · self-hosted",
@@ -296,32 +274,6 @@ TOOL_CATEGORIES = {
                    {"key": "FIRECRAWL_API_URL", "prompt": "Your Firecrawl instance URL (e.g., http://localhost:3002)"},
                ],
            },
-            {
-                "name": "SearXNG",
-                "badge": "free · self-hosted · search only",
-                "tag": "Privacy-respecting metasearch engine — search only (pair with any extract provider)",
-                "web_backend": "searxng",
-                "env_vars": [
-                    {"key": "SEARXNG_URL", "prompt": "Your SearXNG instance URL (e.g., http://localhost:8080)", "url": "https://searxng.github.io/searxng/"},
-                ],
-            },
-            {
-                "name": "Brave Search (Free Tier)",
-                "badge": "free tier · search only",
-                "tag": "2,000 queries/mo free — search only (pair with any extract provider)",
-                "web_backend": "brave-free",
-                "env_vars": [
-                    {"key": "BRAVE_SEARCH_API_KEY", "prompt": "Brave Search subscription token", "url": "https://brave.com/search/api/"},
-                ],
-            },
-            {
-                "name": "DuckDuckGo (ddgs)",
-                "badge": "free · no key · search only",
-                "tag": "Search via the ddgs Python package — no API key (pair with any extract provider)",
-                "web_backend": "ddgs",
-                "env_vars": [],
-                "post_setup": "ddgs",
-            },
        ],
    },
    "image_gen": {
@@ -349,6 +301,15 @@ TOOL_CATEGORIES = {
            },
        ],
    },
+    "video_gen": {
+        "name": "Video Generation",
+        "icon": "🎬",
+        # Providers list is intentionally empty — every video gen backend
+        # is a plugin, surfaced by ``_plugin_video_gen_providers()`` and
+        # injected by ``_visible_providers``. Mirrors the design we'll
+        # converge image_gen toward.
+        "providers": [],
+    },
    "browser": {
        "name": "Browser Automation",
        "icon": "🌐",
@@ -1525,6 +1486,101 @@ def _plugin_image_gen_providers() -> list[dict]:
    return rows


+def _plugin_video_gen_providers() -> list[dict]:
+    """Build picker-row dicts from plugin-registered video gen providers.
+
+    Mirrors ``_plugin_image_gen_providers`` exactly — every video backend
+    is a plugin, so this function is the *only* source of provider rows
+    for the Video Generation category. The hardcoded ``TOOL_CATEGORIES``
+    entry for ``video_gen`` keeps an empty providers list.
+    """
+    try:
+        from agent.video_gen_registry import list_providers
+        from hermes_cli.plugins import _ensure_plugins_discovered
+
+        _ensure_plugins_discovered()
+        providers = list_providers()
+    except Exception:
+        return []
+
+    rows: list[dict] = []
+    for provider in providers:
+        try:
+            schema = provider.get_setup_schema()
+        except Exception:
+            continue
+        if not isinstance(schema, dict):
+            continue
+        rows.append(
+            {
+                "name": schema.get("name", provider.display_name),
+                "badge": schema.get("badge", ""),
+                "tag": schema.get("tag", ""),
+                "env_vars": schema.get("env_vars", []),
+                "video_gen_plugin_name": provider.name,
+            }
+        )
+    return rows
+
+
+# Mirror of _plugin_image_gen_providers for web search backends. Surfaces
+# every plugin-registered web provider so it appears in the
+# "Web Search & Extract" picker. All seven providers (brave-free, ddgs,
+# searxng, exa, parallel, tavily, firecrawl) live as plugins after
+# PR #25182 — this helper is the sole source of truth for the category's
+# provider rows. The hardcoded entries that used to drive the category
+# were deleted in the same PR; only the two non-provider UX rows
+# ("Nous Subscription" managed-gateway entry, "Firecrawl Self-Hosted")
+# remain in TOOL_CATEGORIES because they describe alternative *setup
+# flows* for the firecrawl backend rather than distinct providers.
+def _plugin_web_search_providers() -> list[dict]:
+    """Build picker-row dicts from plugin-registered web search providers.
+
+    Each returned dict is a regular ``TOOL_CATEGORIES`` provider row. It
+    populates both ``web_backend`` (legacy field consumed by setup +
+    selection helpers) and ``web_search_plugin_name`` (informational
+    marker) so the picker behaves identically whether a provider is
+    hardcoded or plugin-registered.
+
+    After PR #25182, all seven web providers (brave-free, ddgs, searxng,
+    exa, parallel, tavily, firecrawl) are plugins; this helper is the sole
+    source of provider rows for the Web Search & Extract category.
+    """
+    try:
+        from agent.web_search_registry import list_providers as _list_web_providers
+        from hermes_cli.plugins import _ensure_plugins_discovered
+
+        _ensure_plugins_discovered()
+        providers = _list_web_providers()
+    except Exception:
+        return []
+
+    rows: list[dict] = []
+    for provider in providers:
+        name = getattr(provider, "name", None)
+        if not name:
+            continue
+        try:
+            schema = provider.get_setup_schema()
+        except Exception:
+            continue
+        if not isinstance(schema, dict):
+            continue
+        row = {
+            "name": schema.get("name", provider.display_name),
+            "badge": schema.get("badge", ""),
+            "tag": schema.get("tag", ""),
+            "env_vars": schema.get("env_vars", []),
+            "web_backend": name,
+            "web_search_plugin_name": name,
+        }
+        # Optional pass-through fields the schema can opt into.
+        if schema.get("post_setup"):
+            row["post_setup"] = schema["post_setup"]
+        rows.append(row)
+    return rows
+
+
 def _visible_providers(cat: dict, config: dict) -> list[dict]:
    """Return provider entries visible for the current auth/config state."""
    features = get_nous_subscription_features(config)
@@ -1541,6 +1597,19 @@ def _visible_providers(cat: dict, config: dict) -> list[dict]:
    if cat.get("name") == "Image Generation":
        visible.extend(_plugin_image_gen_providers())

+    # Inject plugin-registered video_gen backends. Unlike image_gen,
+    # video_gen has NO hardcoded providers — every backend is a plugin.
+    if cat.get("name") == "Video Generation":
+        visible.extend(_plugin_video_gen_providers())
+
+    # Inject plugin-registered web search backends. After PR #25182, this
+    # is the SOLE source of provider rows for the Web Search & Extract
+    # category — the per-provider hardcoded entries were deleted. The two
+    # remaining hardcoded rows ("Nous Subscription", "Firecrawl
+    # Self-Hosted") are non-provider UX setup-flow rows for firecrawl.
+    if cat.get("name") == "Web Search & Extract":
+        visible.extend(_plugin_web_search_providers())
+
    return visible
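
Illustration (not part of the patch): how the plugin-to-picker-row mapping in `_plugin_web_search_providers` treats well-behaved and misbehaving providers. `FakeProvider` and `build_rows` are hypothetical names for this sketch, and registry lookup and plugin discovery are left out; only the row-building loop is mirrored.

```python
from typing import Any, Dict, List


class FakeProvider:
    """Hypothetical stand-in for a plugin-registered web search provider."""

    def __init__(self, name: str, display_name: str, schema: Any) -> None:
        self.name = name
        self.display_name = display_name
        self._schema = schema

    def get_setup_schema(self) -> Any:
        if isinstance(self._schema, Exception):
            raise self._schema
        return self._schema


def build_rows(providers: List[FakeProvider]) -> List[Dict[str, Any]]:
    rows: List[Dict[str, Any]] = []
    for provider in providers:
        try:
            schema = provider.get_setup_schema()
        except Exception:
            continue  # a broken plugin should not break the picker
        if not isinstance(schema, dict):
            continue  # ignore schemas of the wrong type
        row = {
            "name": schema.get("name", provider.display_name),
            "badge": schema.get("badge", ""),
            "tag": schema.get("tag", ""),
            "env_vars": schema.get("env_vars", []),
            "web_backend": provider.name,
            "web_search_plugin_name": provider.name,
        }
        # Optional pass-through field, copied only when the schema sets it.
        if schema.get("post_setup"):
            row["post_setup"] = schema["post_setup"]
        rows.append(row)
    return rows


rows = build_rows([
    FakeProvider("ddgs", "DuckDuckGo", {"badge": "free", "post_setup": "ddgs"}),
    FakeProvider("broken", "Broken", RuntimeError("schema unavailable")),
    FakeProvider("odd", "Odd", ["not", "a", "dict"]),
])
```

Only the first provider yields a row; the other two are skipped silently, which is the same failure mode the patch chooses for the real registry.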


@@ -1608,6 +1677,23 @@ def _toolset_needs_configuration_prompt(ts_key: str, config: dict) -> bool:
            from agent.image_gen_registry import list_providers
            from hermes_cli.plugins import _ensure_plugins_discovered

+            _ensure_plugins_discovered()
+            for provider in list_providers():
+                try:
+                    if provider.is_available():
+                        return False
+                except Exception:
+                    continue
+        except Exception:
+            pass
+        return True
+    if ts_key == "video_gen":
+        # Satisfied when any plugin-registered video gen provider reports
+        # available — no in-tree fallback (every backend is a plugin).
+        try:
+            from agent.video_gen_registry import list_providers
+            from hermes_cli.plugins import _ensure_plugins_discovered
+
            _ensure_plugins_discovered()
            for provider in list_providers():
                try:
@@ -1952,6 +2038,106 @@ def _select_plugin_image_gen_provider(plugin_name: str, config: dict) -> None:
    _configure_imagegen_model_for_plugin(plugin_name, config)


+# ─── Video Generation Model Pickers ───────────────────────────────────────────
+
+
+def _plugin_video_gen_catalog(plugin_name: str):
+    """Return ``(catalog_dict, default_model_id)`` for a video gen plugin.
+
+    Mirrors :func:`_plugin_image_gen_catalog`. Returns ``({}, None)`` when
+    the plugin isn't registered or has no models.
+    """
+    try:
+        from agent.video_gen_registry import get_provider
+        from hermes_cli.plugins import _ensure_plugins_discovered
+
+        _ensure_plugins_discovered()
+        provider = get_provider(plugin_name)
+    except Exception:
+        return {}, None
+    if provider is None:
+        return {}, None
+    try:
+        models = provider.list_models() or []
+        default = provider.default_model()
+    except Exception:
+        return {}, None
+    catalog = {m["id"]: m for m in models if isinstance(m, dict) and "id" in m}
+    return catalog, default
+
+
+def _configure_videogen_model_for_plugin(plugin_name: str, config: dict) -> None:
+    """Prompt for a video gen model from a plugin's catalog.
+
+    Mirrors :func:`_configure_imagegen_model_for_plugin`. Writes the
+    selection to ``video_gen.model``.
+    """
+    catalog, default_model = _plugin_video_gen_catalog(plugin_name)
+    if not catalog:
+        return
+
+    cur_cfg = config.setdefault("video_gen", {})
+    if not isinstance(cur_cfg, dict):
+        cur_cfg = {}
+        config["video_gen"] = cur_cfg
+    current_model = cur_cfg.get("model") or default_model
+    if current_model not in catalog:
+        current_model = default_model if default_model in catalog else next(iter(catalog))
+
+    model_ids = list(catalog.keys())
+    ordered = [current_model] + [m for m in model_ids if m != current_model]
+
+    widths = {
+        "model": max(len(m) for m in model_ids),
+        "speed": max((len(catalog[m].get("speed", "")) for m in model_ids), default=6),
+        "strengths": max((len(catalog[m].get("strengths", "")) for m in model_ids), default=0),
+    }
+
+    print()
+    header = (
+        f"  {'Model':<{widths['model']}}  "
+        f"{'Speed':<{widths['speed']}}  "
+        f"{'Strengths':<{widths['strengths']}}  "
+        f"Price"
+    )
+    print(color(header, Colors.CYAN))
+
+    rows = []
+    for mid in ordered:
+        meta = catalog[mid]
+        row = (
+            f"  {mid:<{widths['model']}}  "
+            f"{meta.get('speed', ''):<{widths['speed']}}  "
+            f"{meta.get('strengths', ''):<{widths['strengths']}}  "
+            f"{meta.get('price', '')}"
+        )
+        if mid == current_model:
+            row += "  ← currently in use"
+        rows.append(row)
+
+    idx = _prompt_choice(
+        f"  Choose {plugin_name} model:",
+        rows,
+        default=0,
+    )
+
+    chosen = ordered[idx]
+    cur_cfg["model"] = chosen
+    _print_success(f"  Model set to: {chosen}")
+
+
+def _select_plugin_video_gen_provider(plugin_name: str, config: dict) -> None:
+    """Persist a plugin-backed video generation provider selection."""
+    vid_cfg = config.setdefault("video_gen", {})
+    if not isinstance(vid_cfg, dict):
+        vid_cfg = {}
+        config["video_gen"] = vid_cfg
+    vid_cfg["provider"] = plugin_name
+    vid_cfg["use_gateway"] = False
+    _print_success(f"  video_gen.provider set to: {plugin_name}")
+    _configure_videogen_model_for_plugin(plugin_name, config)
+
+
 def _configure_provider(provider: dict, config: dict):
    """Configure a single provider - prompt for API keys and set config."""
    env_vars = provider.get("env_vars", [])
@@ -2014,6 +2200,12 @@ def _configure_provider(provider: dict, config: dict):
        if plugin_name:
            _select_plugin_image_gen_provider(plugin_name, config)
            return
+        # Plugin-registered video_gen provider — same flow, different
+        # registry.
+        video_plugin = provider.get("video_gen_plugin_name")
+        if video_plugin:
+            _select_plugin_video_gen_provider(video_plugin, config)
+            return
        # Imagegen backends prompt for model selection after backend pick.
        backend = provider.get("imagegen_backend")
        if backend:
@@ -2062,6 +2254,10 @@ def _configure_provider(provider: dict, config: dict):
        if plugin_name:
            _select_plugin_image_gen_provider(plugin_name, config)
            return
+        video_plugin = provider.get("video_gen_plugin_name")
+        if video_plugin:
+            _select_plugin_video_gen_provider(video_plugin, config)
+            return
        # Imagegen backends prompt for model selection after env vars are in.
        backend = provider.get("imagegen_backend")
        if backend:
@@ -2286,6 +2482,11 @@ def _reconfigure_provider(provider: dict, config: dict):
        if plugin_name:
            _select_plugin_image_gen_provider(plugin_name, config)
            return
+        # Plugin-registered video_gen provider — same flow, different registry.
+        video_plugin = provider.get("video_gen_plugin_name")
+        if video_plugin:
+            _select_plugin_video_gen_provider(video_plugin, config)
+            return
        # Imagegen backends prompt for model selection on reconfig too.
        backend = provider.get("imagegen_backend")
        if backend:
@@ -2318,6 +2519,12 @@ def _reconfigure_provider(provider: dict, config: dict):
        _select_plugin_image_gen_provider(plugin_name, config)
        return

+    # Plugin-registered video_gen provider — same flow, different registry.
+    video_plugin = provider.get("video_gen_plugin_name")
+    if video_plugin:
+        _select_plugin_video_gen_provider(video_plugin, config)
+        return
+
    backend = provider.get("imagegen_backend")
    if backend:
        _configure_imagegen_model(backend, config)
@@ -994,39 +994,9 @@ def get_model_options():
    can share the same types.
    """
    try:
-        from hermes_cli.model_switch import list_authenticated_providers
+        from hermes_cli.inventory import build_models_payload, load_picker_context

-        cfg = load_config()
-        model_cfg = cfg.get("model", {})
-        if isinstance(model_cfg, dict):
-            current_model = model_cfg.get("default", model_cfg.get("name", "")) or ""
-            current_provider = model_cfg.get("provider", "") or ""
-            current_base_url = model_cfg.get("base_url", "") or ""
-        else:
-            current_model = str(model_cfg) if model_cfg else ""
-            current_provider = ""
-            current_base_url = ""
-
-        user_providers = cfg.get("providers") if isinstance(cfg.get("providers"), dict) else {}
-        custom_providers = (
-            cfg.get("custom_providers")
-            if isinstance(cfg.get("custom_providers"), list)
-            else []
-        )
-
-        providers = list_authenticated_providers(
-            current_provider=current_provider,
-            current_base_url=current_base_url,
-            current_model=current_model,
-            user_providers=user_providers,
-            custom_providers=custom_providers,
-            max_models=50,
-        )
-        return {
-            "providers": providers,
-            "model": current_model,
-            "provider": current_provider,
-        }
+        return build_models_payload(load_picker_context(), max_models=50)
    except Exception:
        _log.exception("GET /api/model/options failed")
        raise HTTPException(status_code=500, detail="Failed to list model options")
@@ -1597,10 +1597,10 @@ class SessionDB:
        self._execute_write(_do)

    def get_messages(self, session_id: str) -> List[Dict[str, Any]]:
-        """Load all messages for a session, ordered by timestamp."""
+        """Load all messages for a session in insertion order (ascending id)."""
        with self._lock:
            cursor = self._conn.execute(
-                "SELECT * FROM messages WHERE session_id = ? ORDER BY timestamp, id",
+                "SELECT * FROM messages WHERE session_id = ? ORDER BY id",
                (session_id,),
            )
            rows = cursor.fetchall()
@@ -1700,7 +1700,7 @@ class SessionDB:
                "SELECT role, content, tool_call_id, tool_calls, tool_name, "
                "finish_reason, reasoning, reasoning_content, reasoning_details, "
                "codex_reasoning_items, codex_message_items "
-                f"FROM messages WHERE session_id IN ({placeholders}) ORDER BY timestamp, id",
+                f"FROM messages WHERE session_id IN ({placeholders}) ORDER BY id",
                tuple(session_ids),
            ).fetchall()
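
Note on the two `ORDER BY` changes above: the diff does not state the motivation, but one plausible reason is that timestamps are not guaranteed to increase with insertion order, for example after a clock adjustment. The sketch below uses a hypothetical `messages` table with an auto-incrementing `id` (an assumption, not the real `SessionDB` schema) to show how the two orderings can diverge.

```python
import sqlite3

# Hypothetical schema for illustration; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "session_id TEXT, content TEXT, timestamp REAL)"
)

# The second message carries an earlier timestamp than the first,
# as could happen if the system clock stepped backwards between writes.
rows = [
    ("s1", "first", 100.0),
    ("s1", "second", 99.5),
    ("s1", "third", 100.0),
]
conn.executemany(
    "INSERT INTO messages (session_id, content, timestamp) VALUES (?, ?, ?)",
    rows,
)

# Old ordering: timestamp first, id only as a tie-breaker.
by_timestamp = [
    r[0]
    for r in conn.execute(
        "SELECT content FROM messages WHERE session_id = ? ORDER BY timestamp, id",
        ("s1",),
    )
]

# New ordering: insertion order via the auto-incrementing id.
by_id = [
    r[0]
    for r in conn.execute(
        "SELECT content FROM messages WHERE session_id = ? ORDER BY id",
        ("s1",),
    )
]

print(by_timestamp)
print(by_id)
```

With these rows, ordering by `id` returns the messages in the order they were written, while ordering by `timestamp, id` moves the second message to the front.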

@@ -1,232 +0,0 @@
---
-name: base
-description: Query Base (Ethereum L2) blockchain data with USD pricing — wallet balances, token info, transaction details, gas analysis, contract inspection, whale detection, and live network stats. Uses Base RPC + CoinGecko. No API key required.
-version: 0.1.0
-author: youssefea
-license: MIT
-platforms: [linux, macos, windows]
-metadata:
-  hermes:
-    tags: [Base, Blockchain, Crypto, Web3, RPC, DeFi, EVM, L2, Ethereum]
-    related_skills: []
---
-
-# Base Blockchain Skill
-
-Query Base (Ethereum L2) on-chain data enriched with USD pricing via CoinGecko.
-8 commands: wallet portfolio, token info, transactions, gas analysis,
-contract inspection, whale detection, network stats, and price lookup.
-
-No API key needed. Uses only Python standard library (urllib, json, argparse).
-
---
-
-## When to Use
-
- User asks for a Base wallet balance, token holdings, or portfolio value
- User wants to inspect a specific transaction by hash
- User wants ERC-20 token metadata, price, supply, or market cap
- User wants to understand Base gas costs and L1 data fees
- User wants to inspect a contract (ERC type detection, proxy resolution)
- User wants to find large ETH transfers (whale detection)
- User wants Base network health, gas price, or ETH price
- User asks "what's the price of USDC/AERO/DEGEN/ETH?"
-
---
-
-## Prerequisites
-
-The helper script uses only Python standard library (urllib, json, argparse).
-No external packages required.
-
-Pricing data comes from CoinGecko's free API (no key needed, rate-limited
-to ~10-30 requests/minute). For faster lookups, use `--no-prices` flag.
-
---
-
-## Quick Reference
-
-RPC endpoint (default): https://mainnet.base.org
-Override: export BASE_RPC_URL=https://your-private-rpc.com
-
-Helper script path: ~/.hermes/skills/blockchain/base/scripts/base_client.py
-
-```
-python3 base_client.py wallet   <address> [--limit N] [--all] [--no-prices]
-python3 base_client.py tx       <hash>
-python3 base_client.py token    <contract_address>
-python3 base_client.py gas
-python3 base_client.py contract <address>
-python3 base_client.py whales   [--min-eth N]
-python3 base_client.py stats
-python3 base_client.py price    <contract_address_or_symbol>
-```
-
---
-
-## Procedure
-
-### 0. Setup Check
-
-```bash
-python3 --version
-
-# Optional: set a private RPC for better rate limits
-export BASE_RPC_URL="https://mainnet.base.org"
-
-# Confirm connectivity
-python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py stats
-```
-
-### 1. Wallet Portfolio
-
-Get ETH balance and ERC-20 token holdings with USD values.
-Checks ~15 well-known Base tokens (USDC, WETH, AERO, DEGEN, etc.)
-via on-chain `balanceOf` calls. Tokens sorted by value, dust filtered.
-
-```bash
-python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
-  wallet 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045
-```
-
-Flags:
- `--limit N` — show top N tokens (default: 20)
- `--all` — show all tokens, no dust filter, no limit
- `--no-prices` — skip CoinGecko price lookups (faster, RPC-only)
-
-Output includes: ETH balance + USD value, token list with prices sorted
-by value, dust count, total portfolio value in USD.
-
-Note: Only checks known tokens. Unknown ERC-20s are not discovered.
-Use the `token` command with a specific contract address for any token.
-
-### 2. Transaction Details
-
-Inspect a full transaction by its hash. Shows ETH value transferred,
-gas used, fee in ETH/USD, status, and decoded ERC-20/ERC-721 transfers.
-
-```bash
-python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
-  tx 0xabc123...your_tx_hash_here
-```
-
-Output: hash, block, from, to, value (ETH + USD), gas price, gas used,
-fee, status, contract creation address (if any), token transfers.
-
-### 3. Token Info
-
-Get ERC-20 token metadata: name, symbol, decimals, total supply, price,
-market cap, and contract code size.
-
-```bash
-python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
-  token 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
-```
-
-Output: name, symbol, decimals, total supply, price, market cap.
-Reads name/symbol/decimals directly from the contract via eth_call.
-
-### 4. Gas Analysis
-
-Detailed gas analysis with cost estimates for common operations.
-Shows current gas price, base fee trends over 10 blocks, block
-utilization, and estimated costs for ETH transfers, ERC-20 transfers,
-and swaps.
-
-```bash
-python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py gas
-```
-
-Output: current gas price, base fee, block utilization, 10-block trend,
-cost estimates in ETH and USD.
-
-Note: Base is an L2 — actual transaction costs include an L1 data
-posting fee that depends on calldata size and L1 gas prices. The
-estimates shown are for L2 execution only.
-
-### 5. Contract Inspection
-
-Inspect an address: determine if it's an EOA or contract, detect
-ERC-20/ERC-721/ERC-1155 interfaces, resolve EIP-1967 proxy
-implementation addresses.
-
-```bash
-python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
-  contract 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
-```
-
-Output: is_contract, code size, ETH balance, detected interfaces
-(ERC-20, ERC-721, ERC-1155), ERC-20 metadata, proxy implementation
-address.
-
-### 6. Whale Detector
-
-Scan the most recent block for large ETH transfers with USD values.
-
-```bash
-python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
-  whales --min-eth 1.0
-```
-
-Note: scans the latest block only — point-in-time snapshot, not historical.
-Default threshold is 1.0 ETH (lower than Solana's default since ETH
-values are higher).
-
-### 7. Network Stats
-
-Live Base network health: latest block, chain ID, gas price, base fee,
-block utilization, transaction count, and ETH price.
-
-```bash
-python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py stats
-```
-
-### 8. Price Lookup
-
-Quick price check for any token by contract address or known symbol.
-
-```bash
-python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price ETH
-python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price USDC
-python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price AERO
-python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price DEGEN
-python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
-```
-
-Known symbols: ETH, WETH, USDC, cbETH, AERO, DEGEN, TOSHI, BRETT,
-WELL, wstETH, rETH, cbBTC.
-
---
-
-## Pitfalls
-
- **CoinGecko rate-limits** — free tier allows ~10-30 requests/minute.
-  Price lookups use 1 request per token. Use `--no-prices` for speed.
- **Public RPC rate-limits** — Base's public RPC limits requests.
-  For production use, set BASE_RPC_URL to a private endpoint
-  (Alchemy, QuickNode, Infura).
- **Wallet shows known tokens only** — unlike Solana, EVM chains have no
-  built-in "get all tokens" RPC. The wallet command checks ~15 popular
-  Base tokens via `balanceOf`. Unknown ERC-20s won't appear. Use the
-  `token` command for any specific contract.
- **Token names read from contract** — if a contract doesn't implement
-  `name()` or `symbol()`, these fields may be empty. Known tokens have
-  hardcoded labels as fallback.
- **Gas estimates are L2 only** — Base transaction costs include an L1
-  data posting fee (depends on calldata size and L1 gas prices). The gas
-  command estimates L2 execution cost only.
- **Whale detector scans latest block only** — not historical. Results
-  vary by the moment you query. Default threshold is 1.0 ETH.
- **Proxy detection** — only EIP-1967 proxies are detected. Other proxy
-  patterns (EIP-1167 minimal proxy, custom storage slots) are not checked.
- **Retry on 429** — both RPC and CoinGecko calls retry up to 2 times
-  with exponential backoff on rate-limit errors.
-
---
-
-## Verification
-
-```bash
-# Should print Base chain ID (8453), latest block, gas price, and ETH price
-python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py stats
-```
@@ -0,0 +1,211 @@
+---
+name: evm
+description: "Read-only EVM client: wallets, tokens, gas across 8 chains."
+version: 1.0.0
+author: Mibayy (@Mibayy), youssefea (@youssefea), ethernet8023 (@ethernet8023), Hermes Agent
+license: MIT
+platforms: [linux, macos, windows]
+metadata:
+  hermes:
+    tags: [EVM, Ethereum, BNB, BSC, Base, Arbitrum, Polygon, Optimism, Avalanche, zkSync, Blockchain, Crypto, Web3, DeFi, NFT, ENS, Whale, Security]
+    category: blockchain
+    related_skills: [solana]
+    requires_toolsets: [terminal]
+---
+
+# EVM Blockchain Skill
+
+Query EVM-compatible blockchain data across 8 chains with USD pricing.
+14 commands: wallet portfolio, token info, transactions, activity, gas tracker,
+network stats, price lookup, multi-chain scan, chain comparison, whale detection,
+ENS resolution, allowance checker, contract inspector, and transaction decoder.
+
+Supports 8 chains: Ethereum, BNB Chain (BSC), Base, Arbitrum One, Polygon,
+Optimism, Avalanche (C-Chain), zkSync Era.
+
+No API key needed. Zero external dependencies — Python standard library only
+(urllib, json, argparse, threading).
+
+> **Supersedes the standalone `base` skill.** Base-specific tokens (AERO, DEGEN,
+> TOSHI, BRETT, WELL, cbETH, cbBTC, wstETH, rETH) and all Base RPC functionality
+> previously living under `optional-skills/blockchain/base/` have been folded
+> into this skill. Pass `--chain base` to any command for Base coverage.
+
+---
+
+## When to Use
+- User asks for a wallet balance or portfolio on any EVM chain
+- User wants to check the same wallet across ALL chains at once
+- User wants to inspect a transaction by hash (or decode what it did)
+- User wants ERC-20 token metadata, price, supply, or market cap
+- User wants recent transaction history for an address
+- User wants current gas prices or to compare fees across chains
+- User wants to find large whale transfers in recent blocks
+- User asks to resolve an ENS name (vitalik.eth) or reverse-lookup an address
+- User wants to check whether a wallet has granted dangerous token approvals
+- User wants to inspect a smart contract (proxy? ERC-20? ERC-721? bytecode size?)
+- User wants to compare gas costs across chains before a transaction
+
+---
+
+## Prerequisites
+Python 3.8+ standard library only. No pip installs required.
+Pricing: CoinGecko free API (rate-limited, ~10-30 req/min).
+ENS: ensideas.com public API.
+Tx decoding: 4byte.directory public API.
+
+Override RPC endpoint: `export EVM_RPC_URL=https://your-rpc.com`
+
+Helper script path: `~/.hermes/skills/blockchain/evm/scripts/evm_client.py`
+
+---
+
+## Quick Reference
+
+```
+SCRIPT=~/.hermes/skills/blockchain/evm/scripts/evm_client.py
+
+# Network & prices
+python3 $SCRIPT stats                            # Ethereum stats
+python3 $SCRIPT stats --chain arbitrum           # Arbitrum stats
+python3 $SCRIPT compare                          # Gas + prices ALL 8 chains
+
+# Wallet
+python3 $SCRIPT wallet 0xd8dA...96045            # Portfolio (ETH + ERC-20)
+python3 $SCRIPT wallet 0xd8dA...96045 --chain bsc
+python3 $SCRIPT multichain 0xd8dA...96045        # Same wallet on ALL chains
+
+# Tokens & prices
+python3 $SCRIPT price ETH
+python3 $SCRIPT price 0xdAC1...1ec7              # By contract address
+python3 $SCRIPT token 0xdAC1...1ec7              # ERC-20 metadata + market cap
+
+# Transactions
+python3 $SCRIPT tx 0x5c50...f060                 # Transaction details
+python3 $SCRIPT decode 0x5c50...f060             # Decode input data (4byte.directory)
+python3 $SCRIPT activity 0xd8dA...96045          # Recent transactions
+
+# Gas
+python3 $SCRIPT gas                              # Gas prices + cost estimates
+python3 $SCRIPT gas --chain optimism
+
+# Security
+python3 $SCRIPT allowance 0xd8dA...96045         # Dangerous ERC-20 approvals
+python3 $SCRIPT contract 0xdAC1...1ec7           # Contract inspection (proxy? standards?)
+
+# ENS
+python3 $SCRIPT ens vitalik.eth                  # Name -> address + profile
+python3 $SCRIPT ens 0xd8dA...96045               # Address -> ENS name
+
+# Whale detection
+python3 $SCRIPT whale                            # Large transfers (last 20 blocks, >$10k)
+python3 $SCRIPT whale --blocks 50 --min-usd 100000 --chain arbitrum
+```
+
+---
+
+## Procedure
+
+### 0. Setup Check
+```bash
+python3 --version   # 3.8+ required
+python3 ~/.hermes/skills/blockchain/evm/scripts/evm_client.py stats
+```
+
+### 1. Wallet Portfolio
+Native balance + known ERC-20 tokens, sorted by USD value.
+```bash
+python3 $SCRIPT wallet 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045
+python3 $SCRIPT wallet 0xd8dA... --chain bsc --no-prices   # faster
+```
+
+### 2. Multi-Chain Scan
+Scans all 8 chains simultaneously for the same address using threads.
+```bash
+python3 $SCRIPT multichain 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045
+```
+Output: per-chain native balance + token holdings + grand total USD.
+
+### 3. Compare (Gas + Prices)
+All 8 chains queried in parallel. Shows cheapest/most expensive chain.
+```bash
+python3 $SCRIPT compare
+```
+
+### 4. Transaction Details & Decode
+```bash
+python3 $SCRIPT tx 0x5c504ed432cb51138bcf09aa5e8a410dd4a1e204ef84bfed1be16dfba1b22060
+python3 $SCRIPT decode 0x5c504ed...   # Shows human-readable function signature
+```
+Decode uses 4byte.directory to translate 0xa9059cbb -> transfer(address,uint256).
+
+### 5. ENS Resolution
+```bash
+python3 $SCRIPT ens vitalik.eth          # -> 0xd8dA... + avatar + social links
+python3 $SCRIPT ens 0xd8dA...96045       # -> vitalik.eth
+```
+
+### 6. Allowance Checker (Security)
+Checks ERC-20 approvals granted to known DEX/bridge contracts.
+```bash
+python3 $SCRIPT allowance 0xYourWallet
+```
+Flags UNLIMITED approvals as HIGH risk.
+
+### 7. Contract Inspector
+```bash
+python3 $SCRIPT contract 0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48   # USDC (proxy)
+python3 $SCRIPT contract 0xdAC17F958D2ee523a2206206994597C13D831ec7   # USDT (ERC-20)
+```
+Detects: proxy (EIP-1967/EIP-1167), ERC-20, ERC-721, ERC-165. Shows bytecode size and implementation address for proxies.
+
+### 8. Whale Detection
+```bash
+python3 $SCRIPT whale                                    # ETH, last 20 blocks, >$10k
+python3 $SCRIPT whale --blocks 50 --min-usd 50000 --chain bsc
+```
+
+### 9. Gas Tracker
+```bash
+python3 $SCRIPT gas
+python3 $SCRIPT gas --chain polygon
+```
+Shows gwei price + USD cost for: transfer, ERC-20 transfer, approve, swap, NFT mint, NFT transfer.
+
+---
+
+## Supported Chains
+| Key       | Name           | Native | Chain ID |
+|-----------|----------------|--------|----------|
+| ethereum  | Ethereum       | ETH    | 1        |
+| bsc       | BNB Chain      | BNB    | 56       |
+| base      | Base           | ETH    | 8453     |
+| arbitrum  | Arbitrum One   | ETH    | 42161    |
+| polygon   | Polygon        | POL    | 137      |
+| optimism  | Optimism       | ETH    | 10       |
+| avalanche | Avalanche C    | AVAX   | 43114    |
+| zksync    | zkSync Era     | ETH    | 324      |
+
+---
+
+## Pitfalls
+- CoinGecko free tier: ~10-30 req/min. Use `--no-prices` for faster wallet scans.
+- Public RPCs may throttle. Set EVM_RPC_URL to a private endpoint for production.
+- `wallet` and `allowance` only check a known token list (~30 tokens per chain). Use a block explorer for complete token discovery.
+- `activity` scans recent blocks only (max 200). For full history, use Etherscan API.
+- `multichain` runs 8 parallel threads — can trigger rate limits on public RPCs.
+- ENS resolution depends on a single public endpoint (ensideas.com / ens.vitalik.ca) with no fallback. If that endpoint is down, `ens` will fail — re-run later or use a block explorer.
+- Tx decoding depends on a single public endpoint (4byte.directory) with no fallback. Selectors not in their database show up as `unknown`.
+- **L2 gas estimates are L2-execution only.** On rollups like Base, Arbitrum, Optimism, and zkSync, the actual transaction cost also includes an L1 data-posting fee that depends on calldata size and current L1 gas prices. The `gas` command does not estimate that L1 component. For Base specifically, see the network's L1 fee oracle (contract `0x420000000000000000000000000000000000000F`).
+- Address / tx-hash inputs are validated for 0x-prefix + correct length + hex, but EIP-55 checksum casing is **not** enforced (RPC endpoints accept any-case hex).
+
+---
+
+## Verification
+```bash
+# Should print current block, gas price, ETH price
+python3 ~/.hermes/skills/blockchain/evm/scripts/evm_client.py stats
+
+# Should resolve vitalik.eth to 0xd8dA...
+python3 ~/.hermes/skills/blockchain/evm/scripts/evm_client.py ens vitalik.eth
+```
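
Note on the input validation described in the Pitfalls section above: the check it describes (0x prefix, correct length, hex characters, no EIP-55 checksum enforcement) can be sketched in a few lines. This is an illustrative sketch only; the function names `is_address` and `is_tx_hash` are hypothetical and are not taken from `evm_client.py`.

```python
import re

# An address is 20 bytes (40 hex chars); a transaction hash is 32 bytes (64 hex chars).
# Both patterns accept any letter casing, so EIP-55 checksums are not verified.
_ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")
_TX_HASH_RE = re.compile(r"0x[0-9a-fA-F]{64}")


def is_address(value: str) -> bool:
    """Return True if value looks like a 0x-prefixed 20-byte hex address."""
    return _ADDRESS_RE.fullmatch(value) is not None


def is_tx_hash(value: str) -> bool:
    """Return True if value looks like a 0x-prefixed 32-byte hex hash."""
    return _TX_HASH_RE.fullmatch(value) is not None


mixed_case = "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045"
print(is_address(mixed_case))
print(is_address(mixed_case.lower()))
print(is_address("0xd8dA...96045"))
print(is_tx_hash("0x5c504ed432cb51138bcf09aa5e8a410dd4a1e204ef84bfed1be16dfba1b22060"))
```

Abbreviated forms such as `0xd8dA...96045`, used in the Quick Reference for readability, would be rejected by a check like this, so full addresses are needed when running the commands.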
@@ -21,6 +21,7 @@ from dataclasses import dataclass, field
 from pathlib import Path

 from hermes_constants import get_hermes_home
+from hermes_cli.profiles import _get_default_hermes_home
 from typing import Any, TYPE_CHECKING

 if TYPE_CHECKING:
@@ -73,7 +74,7 @@ def resolve_config_path() -> Path:
        return local_path

    # Default profile's config — host blocks accumulate here via setup/clone
-    default_path = Path.home() / ".hermes" / "honcho.json"
+    default_path = _get_default_hermes_home() / "honcho.json"
    if default_path != local_path and default_path.exists():
        return default_path

@@ -336,10 +336,17 @@ ADD_RESOURCE_SCHEMA = {

 def _zip_directory(dir_path: Path) -> Path:
    """Create a temporary zip file containing a directory tree."""
+    root = dir_path.resolve()
    zip_path = Path(tempfile.gettempdir()) / f"openviking_upload_{uuid.uuid4().hex}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zipf:
        for file_path in dir_path.rglob("*"):
+            if file_path.is_symlink():
+                continue
            if file_path.is_file():
+                try:
+                    file_path.resolve().relative_to(root)
+                except ValueError:
+                    continue
                arcname = str(file_path.relative_to(dir_path)).replace("\\", "/")
                zipf.write(file_path, arcname=arcname)
    return zip_path
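
Note on the `_zip_directory` hardening above: the added checks skip symlinks and any file whose resolved path falls outside the directory being zipped. The standalone sketch below mirrors that logic in a hypothetical helper, `zip_directory_safely` (not part of the codebase), and exercises it against a temporary directory containing a symlink that points outside the upload root. It assumes a platform where symlinks can be created; where they cannot, the symlink step is skipped.

```python
import os
import tempfile
import zipfile
from pathlib import Path


def zip_directory_safely(dir_path: Path, zip_path: Path) -> list:
    """Zip dir_path, skipping symlinks and files resolving outside it.

    Returns the sorted archive member names for inspection.
    """
    root = dir_path.resolve()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zipf:
        for file_path in dir_path.rglob("*"):
            if file_path.is_symlink():
                continue  # never follow links, even ones that stay inside root
            if file_path.is_file():
                try:
                    file_path.resolve().relative_to(root)
                except ValueError:
                    continue  # resolved path escapes the root directory
                arcname = file_path.relative_to(dir_path).as_posix()
                zipf.write(file_path, arcname=arcname)
    with zipfile.ZipFile(zip_path) as zipf:
        return sorted(zipf.namelist())


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    upload = base / "upload"
    (upload / "sub").mkdir(parents=True)
    (upload / "a.txt").write_text("inside")
    (upload / "sub" / "b.txt").write_text("inside too")

    # A file outside the upload root, reachable only through a symlink.
    secret = base / "secret.txt"
    secret.write_text("outside the upload root")
    try:
        os.symlink(secret, upload / "link.txt")
    except (OSError, NotImplementedError):
        pass  # symlinks unavailable on this platform; the rest still runs

    names = zip_directory_safely(upload, base / "out.zip")
    print(names)
```

Only the two regular files end up in the archive; `link.txt` is left out, so the contents of `secret.txt` are never copied into the upload.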
@@ -2,6 +2,7 @@

 from typing import Any

+from agent.portal_tags import nous_portal_tags
 from providers import register_provider
 from providers.base import ProviderProfile

@@ -12,7 +13,7 @@ class NousProfile(ProviderProfile):
    def build_extra_body(
        self, *, session_id: str | None = None, **context
    ) -> dict[str, Any]:
-        return {"tags": ["product=hermes-agent"]}
+        return {"tags": nous_portal_tags()}

    def build_api_kwargs_extras(
        self,
@@ -0,0 +1,27 @@
+"""NovitaAI provider profile."""
+
+from providers import register_provider
+from providers.base import ProviderProfile
+
+
+novita = ProviderProfile(
+    name="novita",
+    aliases=("novita-ai", "novitaai"),
+    display_name="NovitaAI",
+    description="NovitaAI — AI-native cloud for builders and agents",
+    signup_url="https://novita.ai/settings/key-management",
+    env_vars=("NOVITA_API_KEY", "NOVITA_BASE_URL"),
+    base_url="https://api.novita.ai/openai/v1",
+    auth_type="api_key",
+    default_aux_model="deepseek/deepseek-v3-0324",
+    fallback_models=(
+        "moonshotai/kimi-k2.5",
+        "minimax/minimax-m2.7",
+        "zai-org/glm-5",
+        "deepseek/deepseek-v3-0324",
+        "deepseek/deepseek-r1-0528",
+        "qwen/qwen3-235b-a22b-fp8",
+    ),
+)
+
+register_provider(novita)
@@ -0,0 +1,5 @@
+name: novita-provider
+kind: model-provider
+version: 1.0.0
+description: NovitaAI AI-native cloud for builders and agents
+author: Nous Research
@@ -0,0 +1,523 @@
+"""FAL.ai video generation backend.
+
+User-facing surface: pick a **model family** (e.g. "Pixverse v6",
+"Veo 3.1", "Seedance 2.0", "Kling v3 4K", "LTX 2.3", "Happy Horse").
+The plugin auto-routes to the family's text-to-video endpoint when
+called without ``image_url``, and to its image-to-video endpoint when
+``image_url`` is provided. The agent never sees the routing — it just
+calls ``video_generate(prompt=..., image_url=...)``.
+
+Model families (each with t2v + i2v endpoints):
+
+  Cheap tier:
+    ltx-2.3       fal-ai/ltx-2.3-22b/text-to-video               /  fal-ai/ltx-2.3-22b/image-to-video
+    pixverse-v6   fal-ai/pixverse/v6/text-to-video               /  fal-ai/pixverse/v6/image-to-video
+
+  Premium tier:
+    veo3.1        fal-ai/veo3.1                                  /  fal-ai/veo3.1/image-to-video
+    seedance-2.0  bytedance/seedance-2.0/text-to-video           /  bytedance/seedance-2.0/image-to-video
+    kling-v3-4k   fal-ai/kling-video/v3/4k/text-to-video         /  fal-ai/kling-video/v3/4k/image-to-video
+    happy-horse   fal-ai/happy-horse/text-to-video               /  fal-ai/happy-horse/image-to-video
+
+Selection precedence for the active family:
+    1. ``model=`` arg from the tool call
+    2. ``FAL_VIDEO_MODEL`` env var
+    3. ``video_gen.fal.model`` in ``config.yaml``
+    4. ``video_gen.model`` in ``config.yaml`` (when it's one of our family IDs)
+    5. ``DEFAULT_MODEL``
+
+Authentication via ``FAL_KEY``. Output is an HTTPS URL from FAL's CDN; the
+gateway downloads and delivers it.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+from typing import Any, Dict, List, Optional, Tuple
+
+from agent.video_gen_provider import (
+    VideoGenProvider,
+    error_response,
+    success_response,
+)
+
+logger = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Family catalog
+# ---------------------------------------------------------------------------
+#
+# Each family declares both endpoints (when available) plus a per-family
+# capability sheet derived from FAL's OpenAPI schemas. Capability flags
+# drive which keys get added to the request payload — keys a family doesn't
+# advertise are dropped before send.
+#
+# Capabilities:
+#   aspect_ratios  : tuple of supported ratios (None = endpoint decides)
+#   resolutions    : tuple of supported resolutions (None = endpoint decides)
+#   durations      : tuple of supported durations OR (min, max) range
+#                    (heuristic: 2-element with gap > 1 is a range)
+#   audio          : True if generate_audio is supported
+#   negative       : True if negative_prompt is supported
+
+FAL_FAMILIES: Dict[str, Dict[str, Any]] = {
+    # ─── Cheap / fast tier ─────────────────────────────────────────────
+    "ltx-2.3": {
+        "display": "LTX 2.3 (22B)",
+        "speed": "~30-60s",
+        "price": "cheap",
+        "strengths": "22B model with native audio generation. Affordable.",
+        "tier": "cheap",
+        "text_endpoint": "fal-ai/ltx-2.3-22b/text-to-video",
+        "image_endpoint": "fal-ai/ltx-2.3-22b/image-to-video",
+        # LTX docs don't expose duration/aspect/resolution enums — leave
+        # blank so we don't send unrecognized payload keys.
+        "aspect_ratios": None,
+        "resolutions": None,
+        "durations": None,
+        "audio": True,
+        "negative": True,
+    },
+    "pixverse-v6": {
+        "display": "Pixverse v6",
+        "speed": "~30-90s",
+        "price": "cheap",
+        "strengths": "Affordable. Negative prompts. 1-15s durations.",
+        "tier": "cheap",
+        "text_endpoint": "fal-ai/pixverse/v6/text-to-video",
+        "image_endpoint": "fal-ai/pixverse/v6/image-to-video",
+        "aspect_ratios": None,
+        "resolutions": ("360p", "540p", "720p", "1080p"),
+        "durations": (1, 15),
+        "audio": True,
+        "negative": True,
+    },
+    # ─── Expensive / premium tier ──────────────────────────────────────
+    "veo3.1": {
+        "display": "Veo 3.1",
+        "speed": "~60-120s",
+        "price": "premium",
+        "strengths": "Google DeepMind. Cinematic, native audio, strong prompt adherence.",
+        "tier": "premium",
+        "text_endpoint": "fal-ai/veo3.1",
+        "image_endpoint": "fal-ai/veo3.1/image-to-video",
+        "aspect_ratios": ("16:9", "9:16"),
+        "resolutions": ("720p", "1080p"),
+        "durations": (4, 6, 8),
+        "audio": True,
+        "negative": True,
+    },
+    "seedance-2.0": {
+        "display": "Seedance 2.0",
+        "speed": "~60-120s",
+        "price": "premium",
+        "strengths": "ByteDance. Cinematic, synchronized audio + lip-sync, 4-15s.",
+        "tier": "premium",
+        "text_endpoint": "bytedance/seedance-2.0/text-to-video",
+        "image_endpoint": "bytedance/seedance-2.0/image-to-video",
+        # Seedance accepts "auto" too — we omit it from the enum so the
+        # agent can't pass it; the endpoint defaults handle the rest.
+        "aspect_ratios": ("21:9", "16:9", "4:3", "1:1", "3:4", "9:16"),
+        "resolutions": ("480p", "720p", "1080p"),
+        "durations": (4, 15),
+        "audio": True,
+        "negative": False,
+    },
+    "kling-v3-4k": {
+        "display": "Kling v3 4K",
+        "speed": "~120-300s",
+        "price": "premium",
+        "strengths": "4K output, native audio (Chinese/English), 3-15s.",
+        "tier": "premium",
+        "text_endpoint": "fal-ai/kling-video/v3/4k/text-to-video",
+        "image_endpoint": "fal-ai/kling-video/v3/4k/image-to-video",
+        # Kling 4K image-to-video uses `start_image_url` instead of
+        # `image_url`. Handled in _build_payload via image_param_key.
+        "image_param_key": "start_image_url",
+        "aspect_ratios": ("16:9", "9:16", "1:1"),
+        "resolutions": None,  # 4K is implicit
+        "durations": (3, 15),
+        "audio": True,
+        "negative": True,
+    },
+    "happy-horse": {
+        "display": "Happy Horse 1.0",
+        "speed": "~60-120s",
+        "price": "premium",
+        "strengths": "Alibaba. New model, sparse public docs — conservative defaults.",
+        "tier": "premium",
+        "text_endpoint": "fal-ai/happy-horse/text-to-video",
+        "image_endpoint": "fal-ai/happy-horse/image-to-video",
+        # Docs don't expose duration/aspect/resolution — let the endpoint
+        # apply its own defaults.
+        "aspect_ratios": None,
+        "resolutions": None,
+        "durations": None,
+        "audio": False,
+        "negative": False,
+    },
+}
+
+DEFAULT_MODEL = "pixverse-v6"  # cheap, both modalities, sane defaults
+
+
+def _is_duration_range(durations: Any) -> bool:
+    """Heuristic: a 2-tuple of ints with a gap > 1 is treated as ``(min, max)``."""
+    if not isinstance(durations, tuple) or len(durations) != 2:
+        return False
+    if not all(isinstance(d, int) for d in durations):
+        return False
+    return durations[1] - durations[0] > 1
+
+
+def _clamp_duration(family: Dict[str, Any], duration: Optional[int]) -> Optional[int]:
+    durations = family.get("durations")
+    if not durations:
+        return duration
+    if duration is None:
+        return durations[0]
+    if _is_duration_range(durations):
+        lo, hi = durations
+        return max(lo, min(hi, duration))
+    # enum
+    if duration in durations:
+        return duration
+    return min(durations, key=lambda d: abs(d - duration))
+
+
+# ---------------------------------------------------------------------------
+# Config / model resolution
+# ---------------------------------------------------------------------------
+
+
+def _load_video_gen_section() -> Dict[str, Any]:
+    try:
+        from hermes_cli.config import load_config
+
+        cfg = load_config()
+        section = cfg.get("video_gen") if isinstance(cfg, dict) else None
+        return section if isinstance(section, dict) else {}
+    except Exception as exc:
+        logger.debug("Could not load video_gen config: %s", exc)
+        return {}
+
+
+def _resolve_family(explicit: Optional[str]) -> Tuple[str, Dict[str, Any]]:
+    """Decide which FAL family to use. Returns ``(family_id, meta)``."""
+    candidates: List[Optional[str]] = []
+    candidates.append(explicit)
+    candidates.append(os.environ.get("FAL_VIDEO_MODEL"))
+
+    cfg = _load_video_gen_section()
+    fal_cfg = cfg.get("fal") if isinstance(cfg.get("fal"), dict) else {}
+    if isinstance(fal_cfg, dict):
+        candidates.append(fal_cfg.get("model"))
+    top = cfg.get("model")
+    if isinstance(top, str):
+        candidates.append(top)
+
+    for c in candidates:
+        if isinstance(c, str) and c.strip() and c.strip() in FAL_FAMILIES:
+            fid = c.strip()
+            return fid, FAL_FAMILIES[fid]
+
+    return DEFAULT_MODEL, FAL_FAMILIES[DEFAULT_MODEL]
+
+
+# ---------------------------------------------------------------------------
+# Payload construction
+# ---------------------------------------------------------------------------
+
+
+def _build_payload(
+    family: Dict[str, Any],
+    *,
+    prompt: str,
+    image_url: Optional[str],
+    duration: Optional[int],
+    aspect_ratio: str,
+    resolution: str,
+    negative_prompt: Optional[str],
+    audio: Optional[bool],
+    seed: Optional[int],
+) -> Dict[str, Any]:
+    """Build a family-specific payload, dropping keys the family doesn't declare."""
+    payload: Dict[str, Any] = {}
+
+    if prompt:
+        payload["prompt"] = prompt
+    if image_url:
+        # Some endpoints (e.g. Kling v3 4K image-to-video) expect
+        # `start_image_url` instead of `image_url`. The family entry can
+        # declare an override.
+        key = family.get("image_param_key") or "image_url"
+        payload[key] = image_url
+    if seed is not None:
+        payload["seed"] = seed
+
+    if family.get("aspect_ratios"):
+        if aspect_ratio in family["aspect_ratios"]:
+            payload["aspect_ratio"] = aspect_ratio
+        # otherwise let the endpoint auto-crop / use its default
+
+    if family.get("resolutions"):
+        if resolution in family["resolutions"]:
+            payload["resolution"] = resolution
+        # else: let the endpoint default
+
+    clamped = _clamp_duration(family, duration)
+    if clamped is not None and family.get("durations"):
+        # FAL exposes duration as a string in the queue API ("8" not 8).
+        payload["duration"] = str(clamped)
+
+    if family.get("audio") and audio is not None:
+        payload["generate_audio"] = bool(audio)
+
+    if family.get("negative") and negative_prompt:
+        payload["negative_prompt"] = negative_prompt
+
+    return payload
+
+
+# ---------------------------------------------------------------------------
+# fal_client lazy import (same pattern as image_generation_tool)
+# ---------------------------------------------------------------------------
+
+_fal_client: Any = None
+
+
+def _load_fal_client() -> Any:
+    global _fal_client
+    if _fal_client is not None:
+        return _fal_client
+    import fal_client  # type: ignore
+
+    _fal_client = fal_client
+    return fal_client
+
+
+# ---------------------------------------------------------------------------
+# Provider
+# ---------------------------------------------------------------------------
+
+
+class FALVideoGenProvider(VideoGenProvider):
+    """FAL.ai multi-family video generation backend.
+
+    Routes between text-to-video and image-to-video endpoints automatically
+    based on whether ``image_url`` was provided.
+    """
+
+    @property
+    def name(self) -> str:
+        return "fal"
+
+    @property
+    def display_name(self) -> str:
+        return "FAL"
+
+    def is_available(self) -> bool:
+        if not os.environ.get("FAL_KEY", "").strip():
+            return False
+        try:
+            import fal_client  # noqa: F401
+        except ImportError:
+            return False
+        return True
+
+    def list_models(self) -> List[Dict[str, Any]]:
+        out: List[Dict[str, Any]] = []
+        for fid, meta in FAL_FAMILIES.items():
+            modalities: List[str] = []
+            if meta.get("text_endpoint"):
+                modalities.append("text")
+            if meta.get("image_endpoint"):
+                modalities.append("image")
+            out.append({
+                "id": fid,
+                "display": meta["display"],
+                "speed": meta["speed"],
+                "strengths": meta["strengths"],
+                "price": meta["price"],
+                "tier": meta.get("tier", "premium"),
+                "modalities": modalities,
+            })
+        return out
+
+    def default_model(self) -> Optional[str]:
+        return DEFAULT_MODEL
+
+    def get_setup_schema(self) -> Dict[str, Any]:
+        return {
+            "name": "FAL",
+            "badge": "paid",
+            "tag": "LTX, Pixverse, Veo 3.1, Seedance 2.0, Kling 4K, Happy Horse — text-to-video & image-to-video",
+            "env_vars": [
+                {
+                    "key": "FAL_KEY",
+                    "prompt": "FAL.ai API key",
+                    "url": "https://fal.ai/dashboard/keys",
+                },
+            ],
+        }
+
+    def capabilities(self) -> Dict[str, Any]:
+        return {
+            "modalities": ["text", "image"],
+            "aspect_ratios": ["16:9", "9:16", "1:1"],
+            "resolutions": ["360p", "540p", "720p", "1080p"],
+            "max_duration": 15,
+            "min_duration": 1,
+            "supports_audio": True,
+            "supports_negative_prompt": True,
+            "max_reference_images": 0,
+        }
+
+    def generate(
+        self,
+        prompt: str,
+        *,
+        model: Optional[str] = None,
+        image_url: Optional[str] = None,
+        reference_image_urls: Optional[List[str]] = None,
+        duration: Optional[int] = None,
+        aspect_ratio: str = "16:9",
+        resolution: str = "720p",
+        negative_prompt: Optional[str] = None,
+        audio: Optional[bool] = None,
+        seed: Optional[int] = None,
+        **kwargs: Any,
+    ) -> Dict[str, Any]:
+        if not os.environ.get("FAL_KEY", "").strip():
+            return error_response(
+                error=(
+                    "FAL_KEY not set. Run `hermes tools` → Video Generation "
+                    "→ FAL to configure."
+                ),
+                error_type="auth_required",
+                provider="fal",
+                prompt=prompt,
+            )
+
+        try:
+            fal_client = _load_fal_client()
+        except ImportError:
+            return error_response(
+                error="fal_client Python package not installed (pip install fal-client)",
+                error_type="missing_dependency",
+                provider="fal",
+                prompt=prompt,
+            )
+
+        prompt = (prompt or "").strip()
+        family_id, family = _resolve_family(model)
+
+        # Route: image_url → image-to-video endpoint; else → text-to-video.
+        image_url_norm = (image_url or "").strip() or None
+        if image_url_norm:
+            endpoint = family.get("image_endpoint")
+            modality_used = "image"
+            if not endpoint:
+                return error_response(
+                    error=(
+                        f"FAL family {family_id} has no image-to-video "
+                        f"endpoint. Pick a family with image-to-video support "
+                        f"via `hermes tools` → Video Generation."
+                    ),
+                    error_type="modality_unsupported",
+                    provider="fal", model=family_id, prompt=prompt,
+                )
+        else:
+            endpoint = family.get("text_endpoint")
+            modality_used = "text"
+            if not endpoint:
+                return error_response(
+                    error=(
+                        f"FAL family {family_id} has no text-to-video "
+                        f"endpoint. Pass an image_url to use its "
+                        f"image-to-video endpoint, or pick a different family."
+                    ),
+                    error_type="modality_unsupported",
+                    provider="fal", model=family_id, prompt=prompt,
+                )
+
+        if not prompt:
+            return error_response(
+                error="prompt is required.",
+                error_type="missing_prompt",
+                provider="fal", model=family_id, prompt=prompt,
+            )
+
+        payload = _build_payload(
+            family,
+            prompt=prompt,
+            image_url=image_url_norm,
+            duration=duration,
+            aspect_ratio=aspect_ratio,
+            resolution=resolution,
+            negative_prompt=negative_prompt,
+            audio=audio,
+            seed=seed,
+        )
+
+        try:
+            result = fal_client.subscribe(
+                endpoint,
+                arguments=payload,
+                with_logs=False,
+            )
+        except Exception as exc:
+            logger.warning(
+                "FAL video gen failed (family=%s, endpoint=%s): %s",
+                family_id, endpoint, exc, exc_info=True,
+            )
+            return error_response(
+                error=f"FAL video generation failed: {exc}",
+                error_type="api_error",
+                provider="fal", model=family_id, prompt=prompt,
+                aspect_ratio=aspect_ratio,
+            )
+
+        video = (result or {}).get("video") if isinstance(result, dict) else None
+        url: Optional[str] = None
+        if isinstance(video, dict):
+            url = video.get("url")
+        elif isinstance(video, str):
+            url = video
+
+        if not url:
+            return error_response(
+                error="FAL returned no video URL in response",
+                error_type="empty_response",
+                provider="fal", model=family_id, prompt=prompt,
+            )
+
+        extra: Dict[str, Any] = {"endpoint": endpoint}
+        if isinstance(video, dict):
+            if video.get("file_size"):
+                extra["file_size"] = video["file_size"]
+            if video.get("content_type"):
+                extra["content_type"] = video["content_type"]
+
+        return success_response(
+            video=url,
+            model=family_id,
+            prompt=prompt,
+            modality=modality_used,
+            aspect_ratio=aspect_ratio if "aspect_ratio" in payload else "",
+            duration=int(payload["duration"]) if "duration" in payload else 0,
+            provider="fal",
+            extra=extra,
+        )
+
+
+# ---------------------------------------------------------------------------
+# Plugin entry point
+# ---------------------------------------------------------------------------
+
+
+def register(ctx) -> None:
+    """Plugin entry point — wire ``FALVideoGenProvider`` into the registry."""
+    ctx.register_video_gen_provider(FALVideoGenProvider())
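Review note: the text-vs-image routing in `FALVideoGenProvider.generate()` can be exercised in isolation. The sketch below is a minimal, standalone restatement of that rule, not the provider's actual code. `EXAMPLE_FAMILY` and `pick_endpoint` are hypothetical names introduced for illustration, and the endpoint string is a placeholder because the real `FAL_FAMILIES` table is not shown in this hunk.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical family entry (assumption): a family that only has a
# text-to-video endpoint. The endpoint id is a placeholder, not a real one.
EXAMPLE_FAMILY: Dict[str, Any] = {
    "text_endpoint": "example/text-to-video",
    "image_endpoint": None,
}


def pick_endpoint(
    family: Dict[str, Any], image_url: Optional[str]
) -> Tuple[Optional[str], str]:
    """Return (endpoint, modality) using the same rule as generate().

    A blank or whitespace-only image_url is treated as "no image", so the
    request falls back to the text-to-video endpoint.
    """
    image_url_norm = (image_url or "").strip() or None
    if image_url_norm:
        return family.get("image_endpoint"), "image"
    return family.get("text_endpoint"), "text"
```

A `None` endpoint in the returned tuple corresponds to the `modality_unsupported` error path in the provider.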
@@ -0,0 +1,7 @@
+name: fal
+version: 1.0.0
+description: "FAL.ai video generation backend. Multi-model — Veo 3.1, Kling, Pixverse — covering text-to-video and image-to-video via fal_client's queue API."
+author: NousResearch
+kind: backend
+requires_env:
+  - FAL_KEY
@@ -0,0 +1,402 @@
+"""xAI Grok-Imagine video generation backend.
+
+Surface: text-to-video and image-to-video (animate an input image)
+through xAI's ``/videos/generations`` endpoint. Edit and extend are not
+exposed in this unified surface — xAI is the only backend that supports
+them and the inconsistency would force per-backend prose in the agent's
+tool description.
+
+Originally salvaged from PR #10600 by @Jaaneek; reshaped into the
+:class:`VideoGenProvider` plugin interface and trimmed to the
+generate-only surface.
+
+Authentication via ``XAI_API_KEY``. Output is an HTTPS URL from xAI's
+CDN; the gateway downloads and delivers it.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import os
+import uuid
+from typing import Any, Dict, List, Optional
+
+import httpx
+
+from agent.video_gen_provider import (
+    VideoGenProvider,
+    error_response,
+    success_response,
+)
+
+logger = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Constants
+# ---------------------------------------------------------------------------
+
+DEFAULT_XAI_BASE_URL = "https://api.x.ai/v1"
+DEFAULT_MODEL = "grok-imagine-video"
+DEFAULT_DURATION = 8
+DEFAULT_ASPECT_RATIO = "16:9"
+DEFAULT_RESOLUTION = "720p"
+DEFAULT_TIMEOUT_SECONDS = 240
+DEFAULT_POLL_INTERVAL_SECONDS = 5
+
+VALID_ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3"}
+VALID_RESOLUTIONS = {"480p", "720p"}
+MAX_REFERENCE_IMAGES = 7
+
+
+_MODELS: Dict[str, Dict[str, Any]] = {
+    "grok-imagine-video": {
+        "display": "Grok Imagine Video",
+        "speed": "~60-240s",
+        "strengths": "Text-to-video + image-to-video; up to 7 reference images for style/character.",
+        "price": "see https://docs.x.ai/docs/models",
+        "modalities": ["text", "image"],
+    },
+}
+
+
+# ---------------------------------------------------------------------------
+# HTTP helpers
+# ---------------------------------------------------------------------------
+
+
+def _xai_base_url() -> str:
+    return (os.getenv("XAI_BASE_URL") or DEFAULT_XAI_BASE_URL).strip().rstrip("/")
+
+
+def _xai_headers() -> Dict[str, str]:
+    api_key = os.getenv("XAI_API_KEY", "").strip()
+    if not api_key:
+        raise ValueError("XAI_API_KEY not set. Get one at https://console.x.ai/")
+    try:
+        from tools.xai_http import hermes_xai_user_agent
+
+        ua = hermes_xai_user_agent()
+    except Exception:
+        ua = "hermes-agent/video_gen"
+    return {
+        "Authorization": f"Bearer {api_key}",
+        "Content-Type": "application/json",
+        "User-Agent": ua,
+    }
+
+
+def _normalize_reference_images(reference_image_urls: Optional[List[str]]):
+    refs = []
+    for url in reference_image_urls or []:
+        normalized = (url or "").strip()
+        if normalized:
+            refs.append({"url": normalized})
+    return refs or None
+
+
+def _clamp_duration(duration: Optional[int], has_reference_images: bool) -> int:
+    value = duration if duration is not None else DEFAULT_DURATION
+    if value < 1:
+        value = 1
+    if value > 15:
+        value = 15
+    if has_reference_images and value > 10:
+        value = 10
+    return value
+
+
+async def _submit(
+    client: httpx.AsyncClient,
+    payload: Dict[str, Any],
+) -> str:
+    """POST to /videos/generations — xAI's only public endpoint for our
+    text-to-video and image-to-video surface."""
+    response = await client.post(
+        f"{_xai_base_url()}/videos/generations",
+        headers={**_xai_headers(), "x-idempotency-key": str(uuid.uuid4())},
+        json=payload,
+        timeout=60,
+    )
+    response.raise_for_status()
+    body = response.json()
+    request_id = body.get("request_id")
+    if not request_id:
+        raise RuntimeError("xAI video response did not include request_id")
+    return request_id
+
+
+async def _poll(
+    client: httpx.AsyncClient,
+    request_id: str,
+    *,
+    timeout_seconds: int,
+    poll_interval: int,
+) -> Dict[str, Any]:
+    # Monotonic deadline: time spent inside HTTP requests counts toward the timeout.
+    deadline = asyncio.get_running_loop().time() + timeout_seconds
+    last_status = "queued"
+    while asyncio.get_running_loop().time() < deadline:
+        response = await client.get(
+            f"{_xai_base_url()}/videos/{request_id}",
+            headers=_xai_headers(),
+            timeout=30,
+        )
+        response.raise_for_status()
+        body = response.json()
+        last_status = (body.get("status") or "").lower()
+
+        if last_status == "done":
+            return {"status": "done", "body": body}
+        if last_status in {"failed", "error", "expired", "cancelled"}:
+            return {"status": last_status, "body": body}
+
+        await asyncio.sleep(poll_interval)
+
+    return {"status": "timeout", "body": {"status": last_status}}
+
+
+# ---------------------------------------------------------------------------
+# Provider
+# ---------------------------------------------------------------------------
+
+
+class XAIVideoGenProvider(VideoGenProvider):
+    """xAI grok-imagine-video backend (text-to-video + image-to-video)."""
+
+    @property
+    def name(self) -> str:
+        return "xai"
+
+    @property
+    def display_name(self) -> str:
+        return "xAI"
+
+    def is_available(self) -> bool:
+        return bool(os.environ.get("XAI_API_KEY", "").strip())
+
+    def list_models(self) -> List[Dict[str, Any]]:
+        return [{"id": mid, **meta} for mid, meta in _MODELS.items()]
+
+    def default_model(self) -> Optional[str]:
+        return DEFAULT_MODEL
+
+    def get_setup_schema(self) -> Dict[str, Any]:
+        return {
+            "name": "xAI",
+            "badge": "paid",
+            "tag": "grok-imagine-video — text-to-video & image-to-video with reference images",
+            "env_vars": [
+                {
+                    "key": "XAI_API_KEY",
+                    "prompt": "xAI API key",
+                    "url": "https://console.x.ai/",
+                },
+            ],
+        }
+
+    def capabilities(self) -> Dict[str, Any]:
+        return {
+            "modalities": ["text", "image"],
+            "aspect_ratios": sorted(VALID_ASPECT_RATIOS),
+            "resolutions": sorted(VALID_RESOLUTIONS),
+            "max_duration": 15,
+            "min_duration": 1,
+            "supports_audio": False,
+            "supports_negative_prompt": False,
+            "max_reference_images": MAX_REFERENCE_IMAGES,
+        }
+
+    def generate(
+        self,
+        prompt: str,
+        *,
+        model: Optional[str] = None,
+        image_url: Optional[str] = None,
+        reference_image_urls: Optional[List[str]] = None,
+        duration: Optional[int] = None,
+        aspect_ratio: str = DEFAULT_ASPECT_RATIO,
+        resolution: str = DEFAULT_RESOLUTION,
+        negative_prompt: Optional[str] = None,
+        audio: Optional[bool] = None,
+        seed: Optional[int] = None,
+        **kwargs: Any,
+    ) -> Dict[str, Any]:
+        try:
+            loop = asyncio.new_event_loop()
+            try:
+                return loop.run_until_complete(self._generate_async(
+                    prompt=prompt,
+                    model=model,
+                    image_url=image_url,
+                    reference_image_urls=reference_image_urls,
+                    duration=duration,
+                    aspect_ratio=aspect_ratio,
+                    resolution=resolution,
+                ))
+            finally:
+                loop.close()
+        except Exception as exc:
+            logger.warning("xAI video gen unexpected failure: %s", exc, exc_info=True)
+            return error_response(
+                error=f"xAI video generation failed: {exc}",
+                error_type="api_error",
+                provider="xai",
+                model=model or DEFAULT_MODEL,
+                prompt=prompt,
+                aspect_ratio=aspect_ratio,
+            )
+
+    async def _generate_async(
+        self,
+        *,
+        prompt: str,
+        model: Optional[str],
+        image_url: Optional[str],
+        reference_image_urls: Optional[List[str]],
+        duration: Optional[int],
+        aspect_ratio: str,
+        resolution: str,
+    ) -> Dict[str, Any]:
+        if not os.environ.get("XAI_API_KEY", "").strip():
+            return error_response(
+                error="XAI_API_KEY not set. Get one at https://console.x.ai/",
+                error_type="auth_required",
+                provider="xai", prompt=prompt,
+            )
+
+        prompt = (prompt or "").strip()
+        image_url_norm = (image_url or "").strip() or None
+        normalized_aspect_ratio = (aspect_ratio or DEFAULT_ASPECT_RATIO).strip()
+        normalized_resolution = (resolution or DEFAULT_RESOLUTION).strip().lower()
+        modality_used = "image" if image_url_norm else "text"
+
+        if not prompt:
+            return error_response(
+                error=(
+                    "prompt is required for xAI video generation "
+                    "(text-to-video or image-to-video)"
+                ),
+                error_type="missing_prompt",
+                provider="xai", prompt=prompt,
+            )
+
+        refs = _normalize_reference_images(reference_image_urls)
+        if refs and len(refs) > MAX_REFERENCE_IMAGES:
+            return error_response(
+                error=f"reference_image_urls supports at most {MAX_REFERENCE_IMAGES} images on xAI",
+                error_type="too_many_references",
+                provider="xai", prompt=prompt,
+            )
+        if image_url_norm and refs:
+            return error_response(
+                error="image_url and reference_image_urls cannot be combined on xAI",
+                error_type="conflicting_inputs",
+                provider="xai", prompt=prompt,
+            )
+
+        clamped_duration = _clamp_duration(duration, has_reference_images=bool(refs))
+
+        if normalized_aspect_ratio not in VALID_ASPECT_RATIOS:
+            normalized_aspect_ratio = DEFAULT_ASPECT_RATIO
+        if normalized_resolution not in VALID_RESOLUTIONS:
+            normalized_resolution = DEFAULT_RESOLUTION
+
+        payload: Dict[str, Any] = {
+            "model": model or DEFAULT_MODEL,
+            "prompt": prompt,
+            "duration": clamped_duration,
+            "aspect_ratio": normalized_aspect_ratio,
+            "resolution": normalized_resolution,
+        }
+        if image_url_norm:
+            payload["image"] = {"url": image_url_norm}
+        if refs:
+            payload["reference_images"] = refs
+
+        async with httpx.AsyncClient() as client:
+            try:
+                request_id = await _submit(client, payload)
+            except httpx.HTTPStatusError as exc:
+                detail = ""
+                try:
+                    detail = exc.response.text[:500]
+                except Exception:
+                    pass
+                return error_response(
+                    error=f"xAI submit failed ({exc.response.status_code}): {detail or exc}",
+                    error_type="api_error",
+                    provider="xai",
+                    model=model or DEFAULT_MODEL,
+                    prompt=prompt,
+                )
+
+            poll_result = await _poll(
+                client, request_id,
+                timeout_seconds=DEFAULT_TIMEOUT_SECONDS,
+                poll_interval=DEFAULT_POLL_INTERVAL_SECONDS,
+            )
+
+        status = poll_result["status"]
+        body = poll_result["body"]
+
+        if status == "done":
+            video = body.get("video") or {}
+            url = video.get("url")
+            if not url:
+                return error_response(
+                    error="xAI video generation completed without a video URL",
+                    error_type="empty_response",
+                    provider="xai",
+                    model=body.get("model") or model or DEFAULT_MODEL,
+                    prompt=prompt,
+                )
+            extra: Dict[str, Any] = {
+                "request_id": request_id,
+                "resolution": normalized_resolution,
+            }
+            if body.get("usage"):
+                extra["usage"] = body["usage"]
+            return success_response(
+                video=url,
+                model=body.get("model") or model or DEFAULT_MODEL,
+                prompt=prompt,
+                modality=modality_used,
+                aspect_ratio=normalized_aspect_ratio,
+                duration=video.get("duration") or clamped_duration,
+                provider="xai",
+                extra=extra,
+            )
+
+        if status == "timeout":
+            return error_response(
+                error=f"Timed out waiting for video generation after {DEFAULT_TIMEOUT_SECONDS}s",
+                error_type="timeout",
+                provider="xai",
+                model=model or DEFAULT_MODEL,
+                prompt=prompt,
+            )
+
+        message = (
+            (body.get("error", {}) or {}).get("message")
+            or body.get("message")
+            or f"xAI video generation ended with status '{status}'"
+        )
+        return error_response(
+            error=message,
+            error_type=f"xai_{status}",
+            provider="xai",
+            model=model or DEFAULT_MODEL,
+            prompt=prompt,
+        )
+
+
+# ---------------------------------------------------------------------------
+# Plugin entry point
+# ---------------------------------------------------------------------------
+
+
+def register(ctx) -> None:
+    """Plugin entry point — wire ``XAIVideoGenProvider`` into the registry."""
+    ctx.register_video_gen_provider(XAIVideoGenProvider())
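Review note: the duration limits applied by `_clamp_duration` in the xAI provider (1 to 15 seconds, capped at 10 when reference images are present, default 8) are easy to check with a standalone copy. The sketch below restates the same logic under the hypothetical name `clamp_duration`; it is an illustration of the rule, not a replacement for the helper in the diff.

```python
from typing import Optional

DEFAULT_DURATION = 8  # same default as the provider module


def clamp_duration(duration: Optional[int], has_reference_images: bool) -> int:
    """Clamp a requested duration to the range the provider sends to xAI."""
    value = duration if duration is not None else DEFAULT_DURATION
    # Overall bounds: at least 1 second, at most 15 seconds.
    value = max(1, min(15, value))
    # Reference-image requests are capped at 10 seconds.
    if has_reference_images:
        value = min(value, 10)
    return value
```

Because out-of-range values are clamped rather than rejected, a caller asking for 30 seconds silently gets 15; the `duration` field of the success response is where the effective value shows up.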
@@ -0,0 +1,7 @@
+name: xai
+version: 1.0.0
+description: "xAI Grok-Imagine video generation backend. Supports text-to-video, image-to-video, and reference-image-guided generation via the xAI async videos API."
+author: NousResearch
+kind: backend
+requires_env:
+  - XAI_API_KEY
@@ -0,0 +1,7 @@
+# Bundled web search providers — plugins/web/.
+#
+# Each subdirectory follows the image_gen plugin layout:
+#   plugins/web/<name>/{plugin.yaml, __init__.py, provider.py}
+#
+# They auto-load via kind: backend and register via
+# ctx.register_web_search_provider() into agent.web_search_registry.
@@ -0,0 +1,14 @@
+"""Brave Search (free tier) plugin — bundled, auto-loaded.
+
+Mirrors the ``plugins/image_gen/openai/`` layout: ``provider.py`` holds the
+provider class, ``__init__.py::register(ctx)`` registers an instance.
+"""
+
+from __future__ import annotations
+
+from plugins.web.brave_free.provider import BraveFreeWebSearchProvider
+
+
+def register(ctx) -> None:
+    """Register the Brave-free provider with the plugin context."""
+    ctx.register_web_search_provider(BraveFreeWebSearchProvider())
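Review note: the `register(ctx)` entry point only needs a context object exposing `register_web_search_provider()`. The sketch below shows one way to smoke-test that wiring without the real plugin loader. `StubContext` and `StubProvider` are hypothetical stand-ins invented for this example; the real context and `agent.web_search_registry` may behave differently.

```python
from typing import Any, Dict


class StubProvider:
    """Hypothetical provider exposing the attributes the diff relies on."""

    name = "stub"

    def is_available(self) -> bool:
        return True

    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
        # Same result envelope the bundled providers return on success.
        return {"success": True, "data": {"web": []}}


class StubContext:
    """Hypothetical plugin context that records registered providers by name."""

    def __init__(self) -> None:
        self.web_providers: Dict[str, Any] = {}

    def register_web_search_provider(self, provider: Any) -> None:
        self.web_providers[provider.name] = provider


def register(ctx: StubContext) -> None:
    """Same shape as the plugin entry points in this diff."""
    ctx.register_web_search_provider(StubProvider())
```

Calling `register(StubContext())` and inspecting `web_providers` is enough to confirm that an entry point registers exactly one provider under the expected name.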
@@ -0,0 +1,7 @@
+name: web-brave-free
+version: 1.0.0
+description: "Brave Search (free tier) — web search via Brave's Data-for-Search API. Requires BRAVE_SEARCH_API_KEY (free signup at https://brave.com/search/api/, 2k queries/month)."
+author: NousResearch
+kind: backend
+provides_web_providers:
+  - brave-free
@@ -1,23 +1,20 @@
-"""Brave Search web search provider (free tier).
+"""Brave Search (free tier) — plugin form.

-Brave Search's Data-for-Search API offers a free tier (2,000 queries/mo at the
-time of writing) after signing up at https://brave.com/search/api/.  This
-provider implements ``WebSearchProvider`` only — the Data-for-Search endpoint
-returns search results, it does not extract/crawl arbitrary URLs.
+Subclasses :class:`agent.web_search_provider.WebSearchProvider` (the
+plugin-facing ABC). The legacy in-tree module
+``tools.web_providers.brave_free`` was removed in the same commit that
+moved this code under ``plugins/``; this file is now the canonical
+implementation.

-Configuration::
+Config keys this provider responds to::

-    # ~/.hermes/.env
-    BRAVE_SEARCH_API_KEY=your-subscription-token
-
-    # ~/.hermes/config.yaml
    web:
-      search_backend: "brave-free"
-      extract_backend: "firecrawl"    # pair with an extract provider if needed
+      search_backend: "brave-free"     # explicit per-capability
+      backend: "brave-free"            # shared fallback

-The API uses the ``X-Subscription-Token`` header.  Free-tier keys are rate
-limited (1 qps) and capped at 2k queries/month; see the Brave dashboard for
-current quotas.
+Auth env var::
+
+    BRAVE_SEARCH_API_KEY=...    # https://brave.com/search/api/ (free tier)
 """

 from __future__ import annotations
@@ -26,49 +23,45 @@ import logging
 import os
 from typing import Any, Dict

-from tools.web_providers.base import WebSearchProvider
+from agent.web_search_provider import WebSearchProvider

 logger = logging.getLogger(__name__)

 _BRAVE_ENDPOINT = "https://api.search.brave.com/res/v1/web/search"


-class BraveFreeSearchProvider(WebSearchProvider):
-    """Search via the Brave Search API (free tier).
+class BraveFreeWebSearchProvider(WebSearchProvider):
+    """Search-only Brave provider using the free-tier Data-for-Search API.

-    Requires ``BRAVE_SEARCH_API_KEY`` to be set. The value is passed as the
-    ``X-Subscription-Token`` header. No extract capability — pair with
-    Firecrawl/Tavily/Exa/Parallel when you also need ``web_extract``.
+    Free tier is 2,000 queries/month (1 qps). No content-extraction capability —
+    users pair this with Firecrawl/Tavily/Exa for ``web_extract``.
    """

-    def provider_name(self) -> str:
+    @property
+    def name(self) -> str:
+        # Hyphen form preserved for backward compat with the existing
+        # ``web.search_backend: "brave-free"`` config keys users have set.
        return "brave-free"

-    def is_configured(self) -> bool:
+    @property
+    def display_name(self) -> str:
+        return "Brave Search (Free)"
+
+    def is_available(self) -> bool:
        """Return True when ``BRAVE_SEARCH_API_KEY`` is set to a non-empty value."""
        return bool(os.getenv("BRAVE_SEARCH_API_KEY", "").strip())

+    def supports_search(self) -> bool:
+        return True
+
+    def supports_extract(self) -> bool:
+        return False
+
    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
        """Execute a search against the Brave Search API.

-        Returns normalized results::
-
-            {
-                "success": True,
-                "data": {
-                    "web": [
-                        {
-                            "title": str,
-                            "url": str,
-                            "description": str,
-                            "position": int,
-                        },
-                        ...
-                    ]
-                }
-            }
-
-        On failure returns ``{"success": False, "error": str}``.
+        Returns ``{"success": True, "data": {"web": [{"title", "url", "description", "position"}]}}``
+        on success, or ``{"success": False, "error": str}`` on failure.
        """
        import httpx

@@ -128,3 +121,17 @@ class BraveFreeSearchProvider(WebSearchProvider):
        )

        return {"success": True, "data": {"web": web_results}}
+
+    def get_setup_schema(self) -> Dict[str, Any]:
+        return {
+            "name": "Brave Search (Free)",
+            "badge": "free",
+            "tag": "Free-tier API key — 2k queries/mo, search only.",
+            "env_vars": [
+                {
+                    "key": "BRAVE_SEARCH_API_KEY",
+                    "prompt": "Brave Search API key (free tier)",
+                    "url": "https://brave.com/search/api/",
+                },
+            ],
+        }
@@ -0,0 +1,15 @@
+"""DuckDuckGo search plugin — bundled, auto-loaded.
+
+Backed by the community ``ddgs`` Python package which scrapes DDG's HTML
+results page. No API key required, but the package itself must be installed
+(it's an optional dep — gated via :meth:`is_available`).
+"""
+
+from __future__ import annotations
+
+from plugins.web.ddgs.provider import DDGSWebSearchProvider
+
+
+def register(ctx) -> None:
+    """Register the DDGS provider with the plugin context."""
+    ctx.register_web_search_provider(DDGSWebSearchProvider())
@@ -0,0 +1,7 @@
+name: web-ddgs
+version: 1.0.0
+description: "DuckDuckGo web search via the ddgs Python package — no API key required. Install with `pip install ddgs`."
+author: NousResearch
+kind: backend
+provides_web_providers:
+  - ddgs
@@ -1,28 +1,13 @@
-"""DuckDuckGo web search provider via the ``ddgs`` Python package.
+"""DuckDuckGo search — plugin form (via the ``ddgs`` package).

-DuckDuckGo does not provide an official programmatic search API.  The
-community-maintained `ddgs <https://pypi.org/project/ddgs/>`_ package (the
-renamed successor of ``duckduckgo-search``) scrapes DuckDuckGo's HTML results
-page and normalizes them.  It implements ``WebSearchProvider`` only — there is
-no extract capability.
+Subclasses the plugin-facing :class:`agent.web_search_provider.WebSearchProvider`.
+The legacy in-tree module ``tools.web_providers.ddgs`` was removed in the
+same commit that moved this code under ``plugins/``; this file is now the
+canonical implementation.

-Configuration::
-
-    # No API key required. Enable by installing the package and pointing the
-    # web backend at ddgs:
-    pip install ddgs
-
-    # ~/.hermes/config.yaml
-    web:
-      search_backend: "ddgs"
-      extract_backend: "firecrawl"    # pair with an extract provider if needed
-
-Rate limits are enforced server-side by DuckDuckGo.  Expect intermittent
-``DuckDuckGoSearchException`` / 202 responses under heavy use; this provider
-surfaces them as ``{"success": False, "error": ...}`` rather than crashing
-the tool call.
-
-See https://duckduckgo.com/?q=duckduckgo+tos for terms of use.
+The ``ddgs`` package is an optional dependency. ``is_available()`` reflects
+whether the package is importable; the plugin still registers either way so
+``hermes tools`` can prompt the user to install it.
 """

 from __future__ import annotations
@@ -30,39 +15,49 @@ from __future__ import annotations
 import logging
 from typing import Any, Dict

-from tools.web_providers.base import WebSearchProvider
+from agent.web_search_provider import WebSearchProvider

 logger = logging.getLogger(__name__)


-class DDGSSearchProvider(WebSearchProvider):
-    """Search via the ``ddgs`` package (DuckDuckGo HTML scrape).
+class DDGSWebSearchProvider(WebSearchProvider):
+    """DuckDuckGo HTML-scrape search provider.

-    No API key required.  The provider is considered "configured" when the
-    ``ddgs`` package is importable — there is nothing else to set up.
+    No API key needed. Rate limits are enforced server-side by DuckDuckGo;
+    the provider surfaces ``DuckDuckGoSearchException`` and other ddgs errors
+    as ``{"success": False, "error": ...}`` rather than raising.
    """

-    def provider_name(self) -> str:
+    @property
+    def name(self) -> str:
        return "ddgs"

-    def is_configured(self) -> bool:
+    @property
+    def display_name(self) -> str:
+        return "DuckDuckGo (ddgs)"
+
+    def is_available(self) -> bool:
        """Return True when the ``ddgs`` package is importable.

-        Called at tool-registration time; must not perform network I/O.
+        Attempts the import on every call; repeat calls are cheap because
+        Python caches loaded modules. Must NOT perform network I/O: this runs
+        at tool-registration time and on every ``hermes tools`` paint.
        """
        try:
            import ddgs  # noqa: F401
+
            return True
        except ImportError:
            return False

-    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
-        """Execute a DuckDuckGo search and return normalized results.
+    def supports_search(self) -> bool:
+        return True

-        Returns ``{"success": True, "data": {"web": [...]}}`` on success or
-        ``{"success": False, "error": str}`` on failure (missing package,
-        rate-limited, network error, etc.).
-        """
+    def supports_extract(self) -> bool:
+        return False
+
+    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
+        """Execute a DuckDuckGo search and return normalized results."""
        try:
            from ddgs import DDGS  # type: ignore
        except ImportError:
@@ -96,3 +91,14 @@ class DDGSSearchProvider(WebSearchProvider):

        logger.info("DDGS search '%s': %d results (limit %d)", query, len(web_results), limit)
        return {"success": True, "data": {"web": web_results}}
+
+    def get_setup_schema(self) -> Dict[str, Any]:
+        return {
+            "name": "DuckDuckGo (ddgs)",
+            "badge": "free · no key · search only",
+            "tag": "Search via the ddgs Python package — no API key (pair with any extract provider)",
+            "env_vars": [],
+            # Trigger `_run_post_setup("ddgs")` after the user picks this row
+            # so the ddgs Python package gets pip-installed on first selection.
+            "post_setup": "ddgs",
+        }
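
The provider docstring above says ddgs errors are surfaced as ``{"success": False, "error": ...}`` rather than raised. A minimal sketch of that envelope pattern follows. The helper names (``normalize_hits``, ``safe_search``) and the raw hit keys (``href``, ``title``, ``body``) are assumptions made for illustration and are not taken from the elided part of this diff.

```python
from typing import Any, Callable, Dict, List


def normalize_hits(hits: List[Dict[str, Any]], limit: int = 5) -> Dict[str, Any]:
    """Map raw hits onto the {"success": True, "data": {"web": [...]}} envelope."""
    web = [
        {
            "url": hit.get("href", ""),        # assumed raw key
            "title": hit.get("title", ""),
            "description": hit.get("body", ""),  # assumed raw key
            "position": i + 1,                 # 1-based rank
        }
        for i, hit in enumerate(hits[:limit])
    ]
    return {"success": True, "data": {"web": web}}


def safe_search(
    run_query: Callable[[str], List[Dict[str, Any]]],
    query: str,
    limit: int = 5,
) -> Dict[str, Any]:
    """Run a backend call and report failures as data instead of raising."""
    try:
        return normalize_hits(run_query(query), limit)
    except Exception as exc:  # broad on purpose: any backend error becomes a failure envelope
        return {"success": False, "error": str(exc)}
```

Callers can then branch on ``result["success"]`` without wrapping every search in their own ``try`` block.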
@@ -0,0 +1,15 @@
+"""Exa web search + extract plugin — bundled, auto-loaded.
+
+Backed by the official Exa SDK (``exa-py``). Both search and extract are
+sync; the dispatcher in :mod:`tools.web_tools` handles the wrap when the
+caller is async.
+"""
+
+from __future__ import annotations
+
+from plugins.web.exa.provider import ExaWebSearchProvider
+
+
+def register(ctx) -> None:
+    """Register the Exa provider with the plugin context."""
+    ctx.register_web_search_provider(ExaWebSearchProvider())
@@ -0,0 +1,7 @@
+name: web-exa
+version: 1.0.0
+description: "Exa web search and content extraction. Requires EXA_API_KEY — sign up at https://exa.ai."
+author: NousResearch
+kind: backend
+provides_web_providers:
+  - exa
@@ -0,0 +1,212 @@
+"""Exa web search + content extraction — plugin form.
+
+Subclasses :class:`agent.web_search_provider.WebSearchProvider`. Uses the
+official Exa SDK (``exa-py``) which is lazy-loaded via
+:func:`tools.lazy_deps.ensure` so that cold-start CLI users don't pay the
+SDK import cost when Exa isn't configured.
+
+Config keys this provider responds to::
+
+    web:
+      search_backend: "exa"      # explicit per-capability
+      extract_backend: "exa"     # explicit per-capability
+      backend: "exa"             # shared fallback for both
+
+Env var::
+
+    EXA_API_KEY=...    # https://exa.ai (paid tier; free trial available)
+
+The previous in-tree implementation lived at
+``tools.web_tools._exa_search`` / ``_exa_extract``; this file is the
+canonical replacement. Behavior is bit-for-bit identical aside from the
+ABC method-name change.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+from typing import Any, Dict, List
+
+from agent.web_search_provider import WebSearchProvider
+
+logger = logging.getLogger(__name__)
+
+# Module-level note: the canonical ``_exa_client`` cache slot lives on
+# :mod:`tools.web_tools` so tests that do ``tools.web_tools._exa_client =
+# None`` between cases see fresh state. The plugin reads/writes through
+# that public module (see :func:`_get_exa_client`).
+
+
+def _get_exa_client() -> Any:
+    """Lazy-import and cache an Exa SDK client.
+
+    Cache lives on :mod:`tools.web_tools` (as ``_exa_client``) so unit
+    tests that reset that name between cases keep working. Raises
+    ``ValueError`` when ``EXA_API_KEY`` is unset.
+    """
+    import tools.web_tools as _wt
+
+    cached = getattr(_wt, "_exa_client", None)
+    if cached is not None:
+        return cached
+
+    api_key = os.getenv("EXA_API_KEY")
+    if not api_key:
+        raise ValueError(
+            "EXA_API_KEY environment variable not set. "
+            "Get your API key at https://exa.ai"
+        )
+
+    try:
+        from tools.lazy_deps import ensure as _lazy_ensure
+
+        _lazy_ensure("search.exa", prompt=False)
+    except ImportError:
+        pass
+    except Exception as exc:  # noqa: BLE001 — lazy_deps surfaces install hints
+        raise ImportError(str(exc)) from exc
+
+    from exa_py import Exa  # noqa: WPS433 — deliberately lazy
+
+    client = Exa(api_key=api_key)
+    client.headers["x-exa-integration"] = "hermes-agent"
+    _wt._exa_client = client
+    return client
+
+
+def _reset_client_for_tests() -> None:
+    """Drop the cached Exa client so tests can re-instantiate cleanly."""
+    import tools.web_tools as _wt
+
+    _wt._exa_client = None
+
+
+class ExaWebSearchProvider(WebSearchProvider):
+    """Exa search + extract provider.
+
+    Both methods are sync — Exa's SDK is sync-only. The web_extract_tool
+    dispatcher wraps sync extracts via ``asyncio.to_thread`` when it
+    needs to keep the event loop responsive.
+    """
+
+    @property
+    def name(self) -> str:
+        return "exa"
+
+    @property
+    def display_name(self) -> str:
+        return "Exa"
+
+    def is_available(self) -> bool:
+        """Return True when ``EXA_API_KEY`` is set to a non-empty value."""
+        return bool(os.getenv("EXA_API_KEY", "").strip())
+
+    def supports_search(self) -> bool:
+        return True
+
+    def supports_extract(self) -> bool:
+        return True
+
+    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
+        """Execute an Exa search.
+
+        Returns ``{"success": True, "data": {"web": [{...}, ...]}}`` on
+        success, ``{"success": False, "error": str}`` on failure (including
+        a missing API key and SDK install errors).
+        """
+        try:
+            from tools.interrupt import is_interrupted
+
+            if is_interrupted():
+                return {"success": False, "error": "Interrupted"}
+
+            logger.info("Exa search: '%s' (limit=%d)", query, limit)
+            response = _get_exa_client().search(
+                query,
+                num_results=limit,
+                contents={"highlights": True},
+            )
+
+            web_results = []
+            for i, result in enumerate(response.results or []):
+                highlights = result.highlights or []
+                web_results.append(
+                    {
+                        "url": result.url or "",
+                        "title": result.title or "",
+                        "description": " ".join(highlights) if highlights else "",
+                        "position": i + 1,
+                    }
+                )
+
+            return {"success": True, "data": {"web": web_results}}
+        except ValueError as exc:
+            # Raised by _get_exa_client when EXA_API_KEY missing
+            return {"success": False, "error": str(exc)}
+        except ImportError as exc:
+            return {"success": False, "error": f"Exa SDK not installed: {exc}"}
+        except Exception as exc:  # noqa: BLE001 — surface as failure
+            logger.warning("Exa search error: %s", exc)
+            return {"success": False, "error": f"Exa search failed: {exc}"}
+
+    def extract(self, urls: List[str], **kwargs: Any) -> List[Dict[str, Any]]:
+        """Extract content from one or more URLs via Exa.
+
+        Returns a list of result dicts shaped for the legacy LLM
+        post-processing pipeline. On per-URL or whole-batch failure,
+        results carry an ``error`` field rather than raising.
+        """
+        try:
+            from tools.interrupt import is_interrupted
+
+            if is_interrupted():
+                return [
+                    {"url": u, "error": "Interrupted", "title": ""} for u in urls
+                ]
+
+            logger.info("Exa extract: %d URL(s)", len(urls))
+            response = _get_exa_client().get_contents(urls, text=True)
+
+            results: List[Dict[str, Any]] = []
+            for result in response.results or []:
+                content = result.text or ""
+                url = result.url or ""
+                title = result.title or ""
+                results.append(
+                    {
+                        "url": url,
+                        "title": title,
+                        "content": content,
+                        "raw_content": content,
+                        "metadata": {"sourceURL": url, "title": title},
+                    }
+                )
+            return results
+        except ValueError as exc:
+            return [{"url": u, "title": "", "content": "", "error": str(exc)} for u in urls]
+        except ImportError as exc:
+            return [
+                {"url": u, "title": "", "content": "", "error": f"Exa SDK not installed: {exc}"}
+                for u in urls
+            ]
+        except Exception as exc:  # noqa: BLE001
+            logger.warning("Exa extract error: %s", exc)
+            return [
+                {"url": u, "title": "", "content": "", "error": f"Exa extract failed: {exc}"}
+                for u in urls
+            ]
+
+    def get_setup_schema(self) -> Dict[str, Any]:
+        return {
+            "name": "Exa",
+            "badge": "paid",
+            "tag": "Semantic + neural web search with content extraction.",
+            "env_vars": [
+                {
+                    "key": "EXA_API_KEY",
+                    "prompt": "Exa API key",
+                    "url": "https://exa.ai",
+                },
+            ],
+        }
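
The Exa provider keeps its client cache on a different module so that tests can reset it by assigning ``None``. The sketch below illustrates that idea in isolation. ``shared``, ``FakeClient``, ``get_client`` and ``DEMO_API_KEY`` are hypothetical stand-ins, not names from this PR; a ``SimpleNamespace`` plays the role of the module that owns the cache slot.

```python
import os
from types import SimpleNamespace
from typing import Any

# Hypothetical stand-in for the module that owns the cache slot.
shared = SimpleNamespace(_client=None)


class FakeClient:
    """Placeholder for an SDK client; records the key it was built with."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


def get_client(env_var: str = "DEMO_API_KEY") -> Any:
    """Return the cached client, building it on first use."""
    cached = getattr(shared, "_client", None)
    if cached is not None:
        return cached

    api_key = os.getenv(env_var, "").strip()
    if not api_key:
        raise ValueError(f"{env_var} environment variable not set")

    # Write through the shared namespace so a reset there is seen here.
    shared._client = FakeClient(api_key)
    return shared._client
```

Because the lookup goes through the shared namespace on every call, setting ``shared._client = None`` in a test forces the next call to rebuild the client.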
@@ -0,0 +1,28 @@
+"""Firecrawl web search + extract plugin — bundled, auto-loaded.
+
+Largest single plugin in this PR. Captures everything the previous
+inline implementation in tools/web_tools.py did:
+
+  - Lazy import of the firecrawl SDK (~200ms cold-start cost) via a
+    callable proxy that defers the actual import to first use.
+  - Dual client paths: direct (FIRECRAWL_API_KEY / FIRECRAWL_API_URL)
+    OR Nous-hosted tool-gateway routing for subscribers, with
+    web.use_gateway as the tie-breaker.
+  - Per-URL scrape loop with 60s timeout, SSRF re-check after redirect,
+    website-policy gating, and format-aware content selection.
+  - Robust response shape normalization across SDK / direct API /
+    gateway variants (search returns differ by transport).
+
+The plugin re-exports ``Firecrawl`` (the lazy proxy) and
+``check_firecrawl_api_key`` for backward-compatibility with tests and
+external code that imports those names from ``tools.web_tools``.
+"""
+
+from __future__ import annotations
+
+from plugins.web.firecrawl.provider import FirecrawlWebSearchProvider
+
+
+def register(ctx) -> None:
+    """Register the Firecrawl provider with the plugin context."""
+    ctx.register_web_search_provider(FirecrawlWebSearchProvider())
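
The docstring above mentions a callable proxy that defers the SDK import to first use. A minimal, generic sketch of that technique is shown here under stated assumptions: ``LazyClassProxy`` is a hypothetical name, and the standard-library ``fractions.Fraction`` stands in for the real SDK class so the example runs without extra dependencies.

```python
import importlib
from typing import Any, Optional


class LazyClassProxy:
    """Callable stand-in that imports the real class on first use."""

    def __init__(self, module_name: str, attr: str) -> None:
        self._module_name = module_name
        self._attr = attr
        self._cls: Optional[type] = None

    def _load(self) -> type:
        # Import once, then reuse the cached class object.
        if self._cls is None:
            module = importlib.import_module(self._module_name)
            self._cls = getattr(module, self._attr)
        return self._cls

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self._load()(*args, **kwargs)

    def __instancecheck__(self, obj: Any) -> bool:
        # isinstance(obj, proxy) looks this method up on type(proxy),
        # so defining it on the proxy class is sufficient.
        return isinstance(obj, self._load())


# Stand-in target: nothing is resolved until the proxy is first called.
Fraction = LazyClassProxy("fractions", "Fraction")
```

Calling ``Fraction(1, 2)`` triggers the import and constructs a real instance, and ``isinstance(value, Fraction)`` keeps working against the proxy.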
@@ -0,0 +1,7 @@
+name: web-firecrawl
+version: 1.0.0
+description: "Firecrawl web search + content extraction. Supports direct API and Nous-hosted tool-gateway routing for subscribers. Requires FIRECRAWL_API_KEY (or FIRECRAWL_API_URL for self-hosted), or an active Nous subscription with FIRECRAWL_GATEWAY_URL."
+author: NousResearch
+kind: backend
+provides_web_providers:
+  - firecrawl
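
The Firecrawl plugin describes a per-URL scrape loop with a 60s timeout around a sync SDK call. The following sketch shows one way to combine ``asyncio.to_thread`` with ``asyncio.wait_for`` for that purpose (Python 3.9+). ``scrape_all`` and ``demo_scrape`` are hypothetical helpers written for this example; they are not the provider's actual implementation.

```python
import asyncio
import time
from typing import Any, Callable, Dict, List


def demo_scrape(url: str) -> Dict[str, Any]:
    """Hypothetical blocking fetcher used only for this example."""
    if "slow" in url:
        time.sleep(0.6)
    if "bad" in url:
        raise RuntimeError("scrape failed")
    return {"markdown": f"# {url}"}


async def scrape_all(
    urls: List[str],
    scrape: Callable[[str], Dict[str, Any]],
    timeout: float = 60.0,
) -> List[Dict[str, Any]]:
    """Run a blocking scrape per URL in a worker thread, with a deadline."""
    results: List[Dict[str, Any]] = []
    for url in urls:
        try:
            payload = await asyncio.wait_for(
                asyncio.to_thread(scrape, url), timeout=timeout
            )
            results.append({"url": url, "content": payload.get("markdown", "")})
        except asyncio.TimeoutError:
            # The worker thread keeps running; only the await is abandoned.
            results.append(
                {"url": url, "content": "", "error": f"timed out after {timeout}s"}
            )
        except Exception as exc:  # per-URL failure becomes an error item
            results.append({"url": url, "content": "", "error": str(exc)})
    return results
```

Note that a timeout does not cancel the underlying thread; it only stops the coroutine from waiting on it, so one hung fetch cannot stall the remaining URLs.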
@@ -0,0 +1,773 @@
+"""Firecrawl web search + extract — plugin form.
+
+Subclasses :class:`agent.web_search_provider.WebSearchProvider`. This is
+the largest provider migrated in this PR; it captures the full inline
+firecrawl implementation that previously lived in tools/web_tools.py:
+
+  - :data:`Firecrawl` lazy proxy that defers the ~200ms SDK import to
+    first use (re-exported by tools.web_tools for backward compat with
+    existing tests that mock that name).
+  - :func:`_get_firecrawl_client` with direct + managed-gateway dual
+    mode, controlled by ``web.use_gateway`` config when both are
+    configured.
+  - :func:`check_firecrawl_api_key` re-exported (tests + tools_config
+    setup hint depend on this name living in tools.web_tools).
+  - :func:`_extract_web_search_results` / :func:`_extract_scrape_payload`
+    response-shape normalizers that handle SDK / direct API / gateway
+    response variants.
+  - Per-URL extract loop with 60s timeout, redirect-aware SSRF re-check,
+    website-policy gating, and format-aware content selection.
+
+Async note: the underlying SDK is sync. ``extract()`` is declared
+``async def`` because it performs per-URL I/O that benefits from
+running in an executor; the implementation wraps each scrape in
+:func:`asyncio.to_thread` under :func:`asyncio.wait_for` with
+``timeout=60`` to guard against hung fetches.
+
+Config keys this provider responds to::
+
+    web:
+      search_backend: "firecrawl"     # explicit per-capability
+      extract_backend: "firecrawl"    # explicit per-capability
+      backend: "firecrawl"            # shared fallback (default)
+      use_gateway: false              # set true to prefer the managed gateway
+                                      # when both direct + gateway credentials exist
+
+Env vars::
+
+    FIRECRAWL_API_KEY=...            # direct cloud auth
+    FIRECRAWL_API_URL=...            # self-hosted Firecrawl
+    FIRECRAWL_GATEWAY_URL=...        # Nous tool-gateway (subscribers)
+    TOOL_GATEWAY_DOMAIN=...          # alternate gateway env
+    TOOL_GATEWAY_SCHEME=...
+    TOOL_GATEWAY_USER_TOKEN=...
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import os
+from typing import Any, Dict, List, Optional, TYPE_CHECKING
+
+from agent.web_search_provider import WebSearchProvider
+from tools.website_policy import check_website_access
+
+logger = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Lazy Firecrawl SDK proxy
+# ---------------------------------------------------------------------------
+# The firecrawl SDK pulls ~200ms of imports (httpcore, firecrawl.v1/v2 type
+# trees) on a cold CLI. We only need it when the backend is actually
+# "firecrawl", so defer the import to first use via a callable proxy.
+#
+# Tests that do ``patch("tools.web_tools.Firecrawl", ...)`` continue to
+# work because tools/web_tools.py re-exports ``Firecrawl`` from this
+# module — so the patched name still references the same proxy instance.
+
+if TYPE_CHECKING:
+    from firecrawl import Firecrawl as FirecrawlSDK  # noqa: F401 — type hints only
+
+_FIRECRAWL_CLS_CACHE: Optional[type] = None
+
+
+def _load_firecrawl_cls() -> type:
+    """Import and cache ``firecrawl.Firecrawl``."""
+    global _FIRECRAWL_CLS_CACHE
+    if _FIRECRAWL_CLS_CACHE is None:
+        try:
+            from tools.lazy_deps import ensure as _lazy_ensure
+
+            _lazy_ensure("search.firecrawl", prompt=False)
+        except ImportError:
+            pass
+        except Exception as exc:  # noqa: BLE001 — surface install hint
+            raise ImportError(str(exc)) from exc
+        from firecrawl import Firecrawl as _cls  # noqa: WPS433 — deliberately lazy
+
+        _FIRECRAWL_CLS_CACHE = _cls
+    return _FIRECRAWL_CLS_CACHE
+
+
+class _FirecrawlProxy:
+    """Callable proxy that looks like ``firecrawl.Firecrawl`` but imports lazily."""
+
+    __slots__ = ()
+
+    def __call__(self, *args: Any, **kwargs: Any) -> Any:
+        return _load_firecrawl_cls()(*args, **kwargs)
+
+    def __instancecheck__(self, obj: Any) -> bool:
+        return isinstance(obj, _load_firecrawl_cls())
+
+    def __repr__(self) -> str:
+        return "<lazy firecrawl.Firecrawl proxy>"
+
+
+Firecrawl = _FirecrawlProxy()
+
+
+# ---------------------------------------------------------------------------
+# Client construction (direct vs managed-gateway)
+# ---------------------------------------------------------------------------
+#
+# The canonical cache slots live on :mod:`tools.web_tools` so tests that do
+# ``tools.web_tools._firecrawl_client = None`` between cases see fresh
+# state. The plugin reads/writes through that public module — see
+# :func:`_get_firecrawl_client` below.
+
+
+def _get_direct_firecrawl_config() -> Optional[tuple]:
+    """Return explicit direct Firecrawl kwargs + cache key, or None when unset."""
+    api_key = os.getenv("FIRECRAWL_API_KEY", "").strip()
+    api_url = os.getenv("FIRECRAWL_API_URL", "").strip().rstrip("/")
+
+    if not api_key and not api_url:
+        return None
+
+    kwargs: Dict[str, str] = {}
+    if api_key:
+        kwargs["api_key"] = api_key
+    if api_url:
+        kwargs["api_url"] = api_url
+
+    return kwargs, ("direct", api_url or None, api_key or None)
+
+
+def _get_firecrawl_gateway_url() -> str:
+    """Return the configured Firecrawl gateway URL."""
+    import tools.web_tools as _wt
+
+    return _wt.build_vendor_gateway_url("firecrawl")
+
+
+def _is_tool_gateway_ready() -> bool:
+    """Return True when gateway URL + Nous Subscriber token are available.
+
+    Reads ``_read_nous_access_token`` and ``resolve_managed_tool_gateway``
+    via :mod:`tools.web_tools` rather than direct imports, so unit tests
+    that ``patch("tools.web_tools._read_nous_access_token", ...)`` see
+    their patches honored. The names are re-exported on
+    :mod:`tools.web_tools` for exactly this reason.
+    """
+    import tools.web_tools as _wt
+
+    return _wt.resolve_managed_tool_gateway(
+        "firecrawl", token_reader=_wt._read_nous_access_token
+    ) is not None
+
+
+def _has_direct_firecrawl_config() -> bool:
+    """Return True when direct Firecrawl config is explicitly configured."""
+    return _get_direct_firecrawl_config() is not None
+
+
+def check_firecrawl_api_key() -> bool:
+    """Return True when Firecrawl backend (direct or gateway) is usable.
+
+    Re-exported by :mod:`tools.web_tools` for backward compatibility with
+    existing tests and the ``hermes tools`` setup flow.
+    """
+    return _has_direct_firecrawl_config() or _is_tool_gateway_ready()
+
+
+def _firecrawl_backend_help_suffix() -> str:
+    """Return optional managed-gateway guidance for Firecrawl help text."""
+    import tools.web_tools as _wt
+
+    if not _wt.managed_nous_tools_enabled():
+        return ""
+    return (
+        ", or use the Nous Tool Gateway via your subscription "
+        "(FIRECRAWL_GATEWAY_URL or TOOL_GATEWAY_DOMAIN)"
+    )
+
+
+def _raise_web_backend_configuration_error() -> None:
+    """Raise a clear error for unsupported web backend configuration."""
+    import tools.web_tools as _wt
+
+    message = (
+        "Web tools are not configured. "
+        "Set FIRECRAWL_API_KEY for cloud Firecrawl or set FIRECRAWL_API_URL "
+        "for a self-hosted Firecrawl instance."
+    )
+    if _wt.managed_nous_tools_enabled():
+        message += (
+            " With your Nous subscription you can also use the Tool Gateway — "
+            "run `hermes tools` and select Nous Subscription as the web provider."
+        )
+    raise ValueError(message)
+
+
+def _get_firecrawl_client() -> Any:
+    """Get or create the cached Firecrawl client.
+
+    When ``web.use_gateway`` is set in config, the managed Tool Gateway is
+    preferred even if direct Firecrawl credentials are present. Otherwise
+    direct Firecrawl takes precedence when explicitly configured.
+
+    Raises ValueError when neither path is usable.
+
+    The cached client is stored on :mod:`tools.web_tools` (as
+    ``_firecrawl_client`` and ``_firecrawl_client_config``) rather than on
+    this plugin module so that unit tests that reset the cache via
+    ``tools.web_tools._firecrawl_client = None`` keep working. Helper
+    functions (``prefers_gateway``, ``resolve_managed_tool_gateway``,
+    ``_read_nous_access_token``, ``Firecrawl``) are also looked up via
+    :mod:`tools.web_tools` for the same reason — see
+    :func:`_is_tool_gateway_ready`.
+    """
+    import tools.web_tools as _wt
+
+    direct_config = _get_direct_firecrawl_config()
+    if direct_config is not None and not _wt.prefers_gateway("web"):
+        kwargs, client_config = direct_config
+    else:
+        managed_gateway = _wt.resolve_managed_tool_gateway(
+            "firecrawl", token_reader=_wt._read_nous_access_token
+        )
+        if managed_gateway is None:
+            logger.error(
+                "Firecrawl client initialization failed: "
+                "missing direct config and tool-gateway auth."
+            )
+            _raise_web_backend_configuration_error()
+
+        kwargs = {
+            "api_key": managed_gateway.nous_user_token,
+            "api_url": managed_gateway.gateway_origin,
+        }
+        client_config = (
+            "tool-gateway",
+            kwargs["api_url"],
+            managed_gateway.nous_user_token,
+        )
+
+    cached = getattr(_wt, "_firecrawl_client", None)
+    cached_config = getattr(_wt, "_firecrawl_client_config", None)
+    if cached is not None and cached_config == client_config:
+        return cached
+
+    # Construct via the re-exported Firecrawl proxy on tools.web_tools so
+    # unit tests patching ``tools.web_tools.Firecrawl`` see their mock.
+    _wt._firecrawl_client = _wt.Firecrawl(**kwargs)
+    _wt._firecrawl_client_config = client_config
+    return _wt._firecrawl_client
+
+
+def _reset_client_for_tests() -> None:
+    """Drop the cached Firecrawl client so tests can re-instantiate cleanly.
+
+    Clears the canonical slots on :mod:`tools.web_tools` (where
+    :func:`_get_firecrawl_client` reads/writes them).
+    """
+    import tools.web_tools as _wt
+
+    _wt._firecrawl_client = None
+    _wt._firecrawl_client_config = None
+
+
+# ---------------------------------------------------------------------------
+# Response shape normalization (SDK / direct / gateway differ)
+# ---------------------------------------------------------------------------
+
+
+def _to_plain_object(value: Any) -> Any:
+    """Convert SDK objects to plain python data structures when possible."""
+    if value is None:
+        return None
+
+    if isinstance(value, (dict, list, str, int, float, bool)):
+        return value
+
+    if hasattr(value, "model_dump"):
+        try:
+            return value.model_dump()
+        except Exception:  # noqa: BLE001
+            pass
+
+    if hasattr(value, "__dict__"):
+        try:
+            return {k: v for k, v in value.__dict__.items() if not k.startswith("_")}
+        except Exception:  # noqa: BLE001
+            pass
+
+    return value
+
+
+def _normalize_result_list(values: Any) -> List[Dict[str, Any]]:
+    """Normalize mixed SDK/list payloads into a list of dicts."""
+    if not isinstance(values, list):
+        return []
+
+    normalized: List[Dict[str, Any]] = []
+    for item in values:
+        plain = _to_plain_object(item)
+        if isinstance(plain, dict):
+            normalized.append(plain)
+    return normalized
+
+
+def _extract_web_search_results(response: Any) -> List[Dict[str, Any]]:
+    """Extract Firecrawl search results across SDK/direct/gateway response shapes."""
+    response_plain = _to_plain_object(response)
+
+    if isinstance(response_plain, dict):
+        data = response_plain.get("data")
+        if isinstance(data, list):
+            return _normalize_result_list(data)
+
+        if isinstance(data, dict):
+            data_web = _normalize_result_list(data.get("web"))
+            if data_web:
+                return data_web
+            data_results = _normalize_result_list(data.get("results"))
+            if data_results:
+                return data_results
+
+        top_web = _normalize_result_list(response_plain.get("web"))
+        if top_web:
+            return top_web
+
+        top_results = _normalize_result_list(response_plain.get("results"))
+        if top_results:
+            return top_results
+
+    if hasattr(response, "web"):
+        return _normalize_result_list(getattr(response, "web", []))
+
+    return []
+
+
+def _extract_scrape_payload(scrape_result: Any) -> Dict[str, Any]:
+    """Normalize Firecrawl scrape payload shape across SDK and gateway variants."""
+    result_plain = _to_plain_object(scrape_result)
+    if not isinstance(result_plain, dict):
+        return {}
+
+    nested = result_plain.get("data")
+    if isinstance(nested, dict):
+        return nested
+
+    return result_plain
+
+
+# ---------------------------------------------------------------------------
+# Provider class
+# ---------------------------------------------------------------------------
+
+
+class FirecrawlWebSearchProvider(WebSearchProvider):
+    """Firecrawl search + extract provider with dual auth paths."""
+
+    @property
+    def name(self) -> str:
+        return "firecrawl"
+
+    @property
+    def display_name(self) -> str:
+        return "Firecrawl"
+
+    def is_available(self) -> bool:
+        """Return True when direct Firecrawl OR managed-gateway path is configured."""
+        return check_firecrawl_api_key()
+
+    def supports_search(self) -> bool:
+        return True
+
+    def supports_extract(self) -> bool:
+        return True
+
+    def supports_crawl(self) -> bool:
+        return True
+
+    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
+        """Execute a Firecrawl search.
+
+        Sync; matches the legacy ``_get_firecrawl_client().search(...)``
+        call directly. Normalizes the response across SDK/direct/gateway
+        shapes via :func:`_extract_web_search_results`.
+
+        Pre-flight errors (``ValueError`` from configuration check,
+        ``ImportError`` from missing SDK) propagate to the dispatcher's
+        top-level handler, which wraps them as ``tool_error(...)`` —
+        matching the legacy ``{"error": "Error searching web: ..."}``
+        envelope. Only in-flight errors are caught and surfaced as
+        ``{"success": False, "error": ...}``.
+        """
+        from tools.interrupt import is_interrupted
+
+        if is_interrupted():
+            return {"success": False, "error": "Interrupted"}
+
+        logger.info("Firecrawl search: '%s' (limit=%d)", query, limit)
+        # _get_firecrawl_client() raises ValueError on unconfigured systems —
+        # let it propagate so the dispatcher emits the legacy envelope shape.
+        client = _get_firecrawl_client()
+        try:
+            response = client.search(query=query, limit=limit)
+            web_results = _extract_web_search_results(response)
+            logger.info("Firecrawl: found %d search results", len(web_results))
+            return {"success": True, "data": {"web": web_results}}
+        except Exception as exc:  # noqa: BLE001
+            logger.warning("Firecrawl search error: %s", exc)
+            return {"success": False, "error": f"Firecrawl search failed: {exc}"}
+
+    async def extract(self, urls: List[str], **kwargs: Any) -> List[Dict[str, Any]]:
+        """Extract content from one or more URLs via Firecrawl.
+
+        Async; each URL is scraped in a background thread with a 60s
+        timeout. After scraping, the final URL (post-redirect) is
+        re-checked against website-access policy.
+
+        Accepted kwargs (others ignored for forward compat):
+          - ``format``: ``"markdown"`` or ``"html"``; default is both
+            (request both, return markdown when available).
+
+        Returns the legacy per-URL list-of-results shape. Per-URL failures
+        (timeout, SSRF block, scrape error, policy block) become items
+        with an ``error`` field rather than raising.
+        """
+        from tools.interrupt import is_interrupted as _is_interrupted
+
+        if _is_interrupted():
+            return [{"url": u, "error": "Interrupted", "title": ""} for u in urls]
+
+        format = kwargs.get("format")
+        formats: List[str] = []
+        if format == "markdown":
+            formats = ["markdown"]
+        elif format == "html":
+            formats = ["html"]
+        else:
+            formats = ["markdown", "html"]
+
+        # check_website_access is the legacy policy gate; imported at
+        # module level (lazy-friendly because the website_policy import is
+        # cheap) so monkeypatching it in tests works as expected.
+
+        results: List[Dict[str, Any]] = []
+
+        for url in urls:
+            if _is_interrupted():
+                results.append({"url": url, "error": "Interrupted", "title": ""})
+                continue
+
+            # Pre-scrape website policy gate
+            blocked = check_website_access(url)
+            if blocked:
+                logger.info(
+                    "Blocked web_extract for %s by rule %s",
+                    blocked["host"],
+                    blocked["rule"],
+                )
+                results.append(
+                    {
+                        "url": url,
+                        "title": "",
+                        "content": "",
+                        "error": blocked["message"],
+                        "blocked_by_policy": {
+                            "host": blocked["host"],
+                            "rule": blocked["rule"],
+                            "source": blocked["source"],
+                        },
+                    }
+                )
+                continue
+
+            try:
+                logger.info("Firecrawl scraping: %s", url)
+                try:
+                    scrape_result = await asyncio.wait_for(
+                        asyncio.to_thread(
+                            _get_firecrawl_client().scrape,
+                            url=url,
+                            formats=formats,
+                        ),
+                        timeout=60,
+                    )
+                except asyncio.TimeoutError:
+                    logger.warning("Firecrawl scrape timed out for %s", url)
+                    results.append(
+                        {
+                            "url": url,
+                            "title": "",
+                            "content": "",
+                            "error": (
+                                "Scrape timed out after 60s — page may be too large "
+                                "or unresponsive. Try browser_navigate instead."
+                            ),
+                        }
+                    )
+                    continue
+
+                scrape_payload = _extract_scrape_payload(scrape_result)
+                metadata = scrape_payload.get("metadata", {})
+                content_markdown = scrape_payload.get("markdown")
+                content_html = scrape_payload.get("html")
+
+                # Ensure metadata is a dict (SDK may return a typed object)
+                if not isinstance(metadata, dict):
+                    if hasattr(metadata, "model_dump"):
+                        metadata = metadata.model_dump()
+                    elif hasattr(metadata, "__dict__"):
+                        metadata = metadata.__dict__
+                    else:
+                        metadata = {}
+
+                title = metadata.get("title", "")
+                final_url = metadata.get("sourceURL", url)
+
+                # Re-check website-access policy after any redirect
+                final_blocked = check_website_access(final_url)
+                if final_blocked:
+                    logger.info(
+                        "Blocked redirected web_extract for %s by rule %s",
+                        final_blocked["host"],
+                        final_blocked["rule"],
+                    )
+                    results.append(
+                        {
+                            "url": final_url,
+                            "title": title,
+                            "content": "",
+                            "raw_content": "",
+                            "error": final_blocked["message"],
+                            "blocked_by_policy": {
+                                "host": final_blocked["host"],
+                                "rule": final_blocked["rule"],
+                                "source": final_blocked["source"],
+                            },
+                        }
+                    )
+                    continue
+
+                # Choose markdown vs html according to the requested format
+                if format == "markdown" or (format is None and content_markdown):
+                    chosen_content = content_markdown or ""
+                else:
+                    chosen_content = content_html or content_markdown or ""
+
+                results.append(
+                    {
+                        "url": final_url,
+                        "title": title,
+                        "content": chosen_content,
+                        "raw_content": chosen_content,
+                        "metadata": metadata,
+                    }
+                )
+            except Exception as scrape_err:  # noqa: BLE001
+                logger.debug("Firecrawl scrape failed for %s: %s", url, scrape_err)
+                results.append(
+                    {
+                        "url": url,
+                        "title": "",
+                        "content": "",
+                        "raw_content": "",
+                        "error": str(scrape_err),
+                    }
+                )
+
+        return results
+
+    async def crawl(self, url: str, **kwargs: Any) -> Dict[str, Any]:
+        """Crawl a seed URL via Firecrawl's ``/crawl`` endpoint.
+
+        Sync SDK call wrapped in ``asyncio.to_thread`` because the dispatcher
+        in :func:`tools.web_tools.web_crawl_tool` is async and runs LLM
+        post-processing on the response. The dispatcher gates the seed URL
+        against SSRF + website-access policy before calling us; this method
+        re-checks every crawled page's URL against the policy after the
+        crawl returns to catch redirected pages that map to a blocked host.
+
+        Accepted kwargs (others ignored for forward compat):
+          - ``instructions``: str — logged then dropped. Firecrawl's /crawl
+            endpoint does NOT accept natural-language instructions (that's
+            an /extract feature), so we record the value for debugging and
+            proceed without it. Tavily's crawl IS instruction-aware; this
+            divergence is documented in both plugins' docstrings.
+          - ``limit``: int — max pages to crawl (default 20).
+          - ``depth``: str — accepted for API parity with Tavily; ignored
+            by Firecrawl's crawl endpoint.
+
+        Returns ``{"results": [...]}`` matching the shape that
+        :func:`tools.web_tools.web_crawl_tool`'s shared LLM-summarization
+        path expects. Per-page failures (policy block on redirected URL,
+        bad response shape) are included as items with an ``error`` field
+        rather than raising.
+        """
+        try:
+            from tools.interrupt import is_interrupted
+
+            if is_interrupted():
+                return {"results": [{"url": url, "title": "", "content": "", "error": "Interrupted"}]}
+
+            instructions = kwargs.get("instructions")
+            limit = kwargs.get("limit", 20)
+
+            # Firecrawl's /crawl endpoint does not accept natural-language
+            # instructions (that's an /extract feature). Log + drop.
+            if instructions:
+                logger.info(
+                    "Firecrawl crawl: 'instructions' parameter ignored "
+                    "(not supported by Firecrawl /crawl)"
+                )
+
+            logger.info("Firecrawl crawl: %s (limit=%d)", url, limit)
+
+            crawl_params = {
+                "limit": limit,
+                "scrape_options": {"formats": ["markdown"]},
+            }
+
+            # The SDK call is sync; run in a thread so we don't block the
+            # gateway event loop on a multi-page crawl.
+            crawl_result = await asyncio.to_thread(
+                _get_firecrawl_client().crawl,
+                url=url,
+                **crawl_params,
+            )
+
+            # CrawlJob normalization across SDK + direct + gateway shapes.
+            data_list: List[Any] = []
+            if hasattr(crawl_result, "data"):
+                data_list = crawl_result.data if crawl_result.data else []
+                logger.info(
+                    "Firecrawl crawl status: %s, %d pages",
+                    getattr(crawl_result, "status", "unknown"),
+                    len(data_list),
+                )
+            elif isinstance(crawl_result, dict) and "data" in crawl_result:
+                data_list = crawl_result.get("data", []) or []
+            else:
+                logger.warning(
+                    "Firecrawl crawl: unexpected result type %r",
+                    type(crawl_result).__name__,
+                )
+
+            pages: List[Dict[str, Any]] = []
+            for item in data_list:
+                # Pydantic model | typed object | dict — handle all shapes.
+                content_markdown = None
+                content_html = None
+                metadata: Any = {}
+
+                if hasattr(item, "model_dump"):
+                    item_dict = item.model_dump()
+                    content_markdown = item_dict.get("markdown")
+                    content_html = item_dict.get("html")
+                    metadata = item_dict.get("metadata", {})
+                elif hasattr(item, "__dict__"):
+                    content_markdown = getattr(item, "markdown", None)
+                    content_html = getattr(item, "html", None)
+                    metadata_obj = getattr(item, "metadata", {})
+                    if hasattr(metadata_obj, "model_dump"):
+                        metadata = metadata_obj.model_dump()
+                    elif hasattr(metadata_obj, "__dict__"):
+                        metadata = metadata_obj.__dict__
+                    elif isinstance(metadata_obj, dict):
+                        metadata = metadata_obj
+                    else:
+                        metadata = {}
+                elif isinstance(item, dict):
+                    content_markdown = item.get("markdown")
+                    content_html = item.get("html")
+                    metadata = item.get("metadata", {})
+
+                # Ensure metadata is a plain dict.
+                if not isinstance(metadata, dict):
+                    if hasattr(metadata, "model_dump"):
+                        metadata = metadata.model_dump()
+                    elif hasattr(metadata, "__dict__"):
+                        metadata = metadata.__dict__
+                    else:
+                        metadata = {}
+
+                page_url = metadata.get(
+                    "sourceURL", metadata.get("url", "Unknown URL")
+                )
+                title = metadata.get("title", "")
+
+                # Per-page policy re-check (catches blocked redirects).
+                page_blocked = check_website_access(page_url)
+                if page_blocked:
+                    logger.info(
+                        "Blocked crawled page %s by rule %s",
+                        page_blocked["host"],
+                        page_blocked["rule"],
+                    )
+                    pages.append(
+                        {
+                            "url": page_url,
+                            "title": title,
+                            "content": "",
+                            "raw_content": "",
+                            "error": page_blocked["message"],
+                            "blocked_by_policy": {
+                                "host": page_blocked["host"],
+                                "rule": page_blocked["rule"],
+                                "source": page_blocked["source"],
+                            },
+                        }
+                    )
+                    continue
+
+                content = content_markdown or content_html or ""
+                pages.append(
+                    {
+                        "url": page_url,
+                        "title": title,
+                        "content": content,
+                        "raw_content": content,
+                        "metadata": metadata,
+                    }
+                )
+
+            return {"results": pages}
+        except ValueError as exc:
+            return {"results": [{"url": url, "title": "", "content": "", "error": str(exc)}]}
+        except ImportError as exc:
+            return {
+                "results": [
+                    {
+                        "url": url,
+                        "title": "",
+                        "content": "",
+                        "error": f"Firecrawl SDK not installed: {exc}",
+                    }
+                ]
+            }
+        except Exception as exc:  # noqa: BLE001
+            logger.warning("Firecrawl crawl error: %s", exc)
+            return {
+                "results": [
+                    {
+                        "url": url,
+                        "title": "",
+                        "content": "",
+                        "error": f"Firecrawl crawl failed: {exc}",
+                    }
+                ]
+            }
+
+    def get_setup_schema(self) -> Dict[str, Any]:
+        return {
+            "name": "Firecrawl",
+            "badge": "paid · optional gateway",
+            "tag": (
+                "Full search + extract + crawl; supports direct API and "
+                "Nous tool-gateway routing."
+            ),
+            "env_vars": [
+                {
+                    "key": "FIRECRAWL_API_KEY",
+                    "prompt": "Firecrawl API key (or leave blank for self-hosted)",
+                    "url": "https://docs.firecrawl.dev/introduction",
+                },
+            ],
+        }
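
Review note: the "ensure metadata is a plain dict" logic appears twice in the Firecrawl provider (once in the scrape path, once per crawled page). A minimal sketch of how it could be factored into one helper follows. The name `to_plain_dict` and the two stand-in classes are hypothetical and not part of this PR; the sketch only assumes that Pydantic-style objects expose `model_dump()`.

```python
from typing import Any, Dict


def to_plain_dict(obj: Any) -> Dict[str, Any]:
    """Coerce a dict, Pydantic-style model, or plain object into a dict."""
    if isinstance(obj, dict):
        return obj
    if hasattr(obj, "model_dump"):
        # Pydantic v2 models (and look-alikes) serialize themselves.
        return obj.model_dump()
    if hasattr(obj, "__dict__"):
        # Copy so callers cannot mutate the SDK object's attributes.
        return dict(vars(obj))
    return {}


class FakeModel:
    """Hypothetical stand-in for a Pydantic model (only model_dump is used)."""

    def model_dump(self) -> Dict[str, Any]:
        return {"title": "from model", "sourceURL": "https://example.com/a"}


class FakeObject:
    """Hypothetical stand-in for a typed SDK object with plain attributes."""

    def __init__(self) -> None:
        self.title = "from object"


model_meta = to_plain_dict(FakeModel())
object_meta = to_plain_dict(FakeObject())
empty_meta = to_plain_dict(None)
```

Unlike the code in the diff, the sketch copies `__dict__` rather than returning it directly; that is a design choice of the sketch, not a claim about what the provider needs.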
@@ -0,0 +1,16 @@
+"""Parallel.ai web search + extract plugin — bundled, auto-loaded.
+
+First plugin in this repo to expose an async :meth:`extract` — Parallel's
+SDK is async-native (``AsyncParallel.beta.extract``). The web_extract_tool
+dispatcher detects coroutine functions via :func:`inspect.iscoroutinefunction`
+and awaits the call.
+"""
+
+from __future__ import annotations
+
+from plugins.web.parallel.provider import ParallelWebSearchProvider
+
+
+def register(ctx) -> None:
+    """Register the Parallel provider with the plugin context."""
+    ctx.register_web_search_provider(ParallelWebSearchProvider())
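
Review note: the module docstring above says the dispatcher uses `inspect.iscoroutinefunction` to decide whether to await `extract`. A rough sketch of that kind of dispatch is below. `SyncProvider`, `AsyncProvider`, and `dispatch_extract` are hypothetical names for illustration, and running the sync branch through `asyncio.to_thread` is an assumption of the sketch; the real `web_extract_tool` may handle sync providers differently.

```python
import asyncio
import inspect
from typing import Any, Dict, List


class SyncProvider:
    """Hypothetical provider with a blocking extract()."""

    def extract(self, urls: List[str]) -> List[Dict[str, Any]]:
        return [{"url": u, "content": "sync"} for u in urls]


class AsyncProvider:
    """Hypothetical provider whose extract() is a coroutine function."""

    async def extract(self, urls: List[str]) -> List[Dict[str, Any]]:
        return [{"url": u, "content": "async"} for u in urls]


async def dispatch_extract(provider: Any, urls: List[str]) -> List[Dict[str, Any]]:
    """Await coroutine extractors; run blocking ones on a worker thread."""
    if inspect.iscoroutinefunction(provider.extract):
        return await provider.extract(urls)
    # Assumption: blocking providers are moved off the event loop.
    return await asyncio.to_thread(provider.extract, urls)


sync_out = asyncio.run(dispatch_extract(SyncProvider(), ["https://example.com"]))
async_out = asyncio.run(dispatch_extract(AsyncProvider(), ["https://example.com"]))
```

The check is made on the bound method, so it works for instances without inspecting the class.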
@@ -0,0 +1,7 @@
+name: web-parallel
+version: 1.0.0
+description: "Parallel.ai web search + content extraction. Search returns objective-tuned results; extract uses the async SDK for parallel page fetches. Requires PARALLEL_API_KEY — sign up at https://parallel.ai."
+author: NousResearch
+kind: backend
+provides_web_providers:
+  - parallel
@@ -0,0 +1,291 @@
+"""Parallel.ai web search + content extraction — plugin form.
+
+Subclasses :class:`agent.web_search_provider.WebSearchProvider`. Uses two
+distinct Parallel SDK clients:
+
+- ``Parallel`` (sync)        — for :meth:`search`
+- ``AsyncParallel`` (async)  — for :meth:`extract`
+
+This is the first plugin to exercise the **async-extract** code path in
+the ABC: :meth:`extract` is declared ``async def``, and the dispatcher
+in :func:`tools.web_tools.web_extract_tool` detects coroutine functions via
+:func:`inspect.iscoroutinefunction` and awaits the call.
+
+Config keys this provider responds to::
+
+    web:
+      search_backend: "parallel"      # explicit per-capability
+      extract_backend: "parallel"     # explicit per-capability
+      backend: "parallel"             # shared fallback
+      # Search mode is not a config key; set it through the
+      # PARALLEL_SEARCH_MODE env var ("agentic" default, "fast", "one-shot").
+
+Env vars::
+
+    PARALLEL_API_KEY=...             # https://parallel.ai (required)
+    PARALLEL_SEARCH_MODE=agentic     # optional: agentic|fast|one-shot
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+from typing import Any, Dict, List
+
+from agent.web_search_provider import WebSearchProvider
+
+logger = logging.getLogger(__name__)
+
+# Module-level note: the canonical cache slots ``_parallel_client`` and
+# ``_async_parallel_client`` live on :mod:`tools.web_tools` so tests that do
+# ``tools.web_tools._parallel_client = None`` between cases see fresh state.
+# The plugin reads/writes through that public module (see
+# :func:`_get_sync_client` / :func:`_get_async_client`).
+
+
+def _ensure_parallel_sdk_installed() -> None:
+    """Trigger lazy install of the parallel SDK if it isn't present.
+
+    Mirrors the lazy-deps pattern used by the legacy implementation.
+    Swallows benign ImportError from the lazy_deps helper itself; if the
+    SDK is genuinely missing the subsequent ``from parallel import ...``
+    raises ImportError that the caller can handle.
+    """
+    try:
+        from tools.lazy_deps import ensure as _lazy_ensure
+
+        _lazy_ensure("search.parallel", prompt=False)
+    except ImportError:
+        pass
+    except Exception as exc:  # noqa: BLE001 — surface install hint as ImportError
+        raise ImportError(str(exc)) from exc
+
+
+def _get_sync_client() -> Any:
+    """Lazy-load + cache the sync Parallel client.
+
+    Cache lives on :mod:`tools.web_tools` (as ``_parallel_client``) so unit
+    tests that reset that name between cases keep working.
+    """
+    import tools.web_tools as _wt
+
+    cached = getattr(_wt, "_parallel_client", None)
+    if cached is not None:
+        return cached
+
+    api_key = os.getenv("PARALLEL_API_KEY")
+    if not api_key:
+        raise ValueError(
+            "PARALLEL_API_KEY environment variable not set. "
+            "Get your API key at https://parallel.ai"
+        )
+
+    _ensure_parallel_sdk_installed()
+    from parallel import Parallel  # noqa: WPS433 — deliberately lazy
+
+    client = Parallel(api_key=api_key)
+    _wt._parallel_client = client
+    return client
+
+
+def _get_async_client() -> Any:
+    """Lazy-load + cache the async Parallel client.
+
+    Cache lives on :mod:`tools.web_tools` (as ``_async_parallel_client``).
+    """
+    import tools.web_tools as _wt
+
+    cached = getattr(_wt, "_async_parallel_client", None)
+    if cached is not None:
+        return cached
+
+    api_key = os.getenv("PARALLEL_API_KEY")
+    if not api_key:
+        raise ValueError(
+            "PARALLEL_API_KEY environment variable not set. "
+            "Get your API key at https://parallel.ai"
+        )
+
+    _ensure_parallel_sdk_installed()
+    from parallel import AsyncParallel  # noqa: WPS433 — deliberately lazy
+
+    client = AsyncParallel(api_key=api_key)
+    _wt._async_parallel_client = client
+    return client
+
+
+def _reset_clients_for_tests() -> None:
+    """Drop both cached clients so tests can re-instantiate cleanly.
+
+    Clears the canonical slots on :mod:`tools.web_tools` (where
+    :func:`_get_sync_client` / :func:`_get_async_client` read/write them).
+    """
+    import tools.web_tools as _wt
+
+    _wt._parallel_client = None
+    _wt._async_parallel_client = None
+
+
+# Backward-compatible aliases for the names that lived in tools.web_tools
+# before the migration (matches existing tests + external callers).
+_get_parallel_client = _get_sync_client
+_get_async_parallel_client = _get_async_client
+
+
+def _resolve_search_mode() -> str:
+    """Return the validated PARALLEL_SEARCH_MODE value (default "agentic")."""
+    mode = os.getenv("PARALLEL_SEARCH_MODE", "agentic").lower().strip()
+    if mode not in {"fast", "one-shot", "agentic"}:
+        mode = "agentic"
+    return mode
+
+
+class ParallelWebSearchProvider(WebSearchProvider):
+    """Parallel.ai search + async extract provider."""
+
+    @property
+    def name(self) -> str:
+        return "parallel"
+
+    @property
+    def display_name(self) -> str:
+        return "Parallel"
+
+    def is_available(self) -> bool:
+        """Return True when ``PARALLEL_API_KEY`` is set to a non-empty value."""
+        return bool(os.getenv("PARALLEL_API_KEY", "").strip())
+
+    def supports_search(self) -> bool:
+        return True
+
+    def supports_extract(self) -> bool:
+        return True
+
+    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
+        """Execute a Parallel search (sync).
+
+        Uses the ``beta.search`` endpoint with the configured mode
+        (``PARALLEL_SEARCH_MODE`` env var, default "agentic"). ``limit`` is
+        clamped to 20 client-side before the request is sent.
+        """
+        try:
+            from tools.interrupt import is_interrupted
+
+            if is_interrupted():
+                return {"success": False, "error": "Interrupted"}
+
+            mode = _resolve_search_mode()
+            logger.info(
+                "Parallel search: '%s' (mode=%s, limit=%d)", query, mode, limit
+            )
+            response = _get_sync_client().beta.search(
+                search_queries=[query],
+                objective=query,
+                mode=mode,
+                max_results=min(limit, 20),
+            )
+
+            web_results = []
+            for i, result in enumerate(response.results or []):
+                excerpts = result.excerpts or []
+                web_results.append(
+                    {
+                        "url": result.url or "",
+                        "title": result.title or "",
+                        "description": " ".join(excerpts) if excerpts else "",
+                        "position": i + 1,
+                    }
+                )
+
+            return {"success": True, "data": {"web": web_results}}
+        except ValueError as exc:
+            return {"success": False, "error": str(exc)}
+        except ImportError as exc:
+            return {
+                "success": False,
+                "error": f"Parallel SDK not installed: {exc}",
+            }
+        except Exception as exc:  # noqa: BLE001
+            logger.warning("Parallel search error: %s", exc)
+            return {"success": False, "error": f"Parallel search failed: {exc}"}
+
+    async def extract(
+        self, urls: List[str], **kwargs: Any
+    ) -> List[Dict[str, Any]]:
+        """Extract content from one or more URLs via the async SDK.
+
+        Returns the legacy list-of-results shape that
+        :func:`tools.web_tools.web_extract_tool` expects: one entry per
+        successful URL plus one entry per failed URL with an ``error``
+        field. Errors are not raised — they're returned as per-URL items.
+        """
+        try:
+            from tools.interrupt import is_interrupted
+
+            if is_interrupted():
+                return [
+                    {"url": u, "title": "", "content": "", "error": "Interrupted"} for u in urls
+                ]
+
+            logger.info("Parallel extract: %d URL(s)", len(urls))
+            response = await _get_async_client().beta.extract(
+                urls=urls,
+                full_content=True,
+            )
+
+            results: List[Dict[str, Any]] = []
+            for result in response.results or []:
+                content = result.full_content or ""
+                if not content:
+                    content = "\n\n".join(result.excerpts or [])
+                url = result.url or ""
+                title = result.title or ""
+                results.append(
+                    {
+                        "url": url,
+                        "title": title,
+                        "content": content,
+                        "raw_content": content,
+                        "metadata": {"sourceURL": url, "title": title},
+                    }
+                )
+
+            for error in response.errors or []:
+                results.append(
+                    {
+                        "url": error.url or "",
+                        "title": "",
+                        "content": "",
+                        "error": error.content or error.error_type or "extraction failed",
+                        "metadata": {"sourceURL": error.url or ""},
+                    }
+                )
+
+            return results
+        except ValueError as exc:
+            return [{"url": u, "title": "", "content": "", "error": str(exc)} for u in urls]
+        except ImportError as exc:
+            return [
+                {"url": u, "title": "", "content": "", "error": f"Parallel SDK not installed: {exc}"}
+                for u in urls
+            ]
+        except Exception as exc:  # noqa: BLE001
+            logger.warning("Parallel extract error: %s", exc)
+            return [
+                {"url": u, "title": "", "content": "", "error": f"Parallel extract failed: {exc}"}
+                for u in urls
+            ]
+
+    def get_setup_schema(self) -> Dict[str, Any]:
+        return {
+            "name": "Parallel",
+            "badge": "paid",
+            "tag": "Objective-tuned search + parallel page extraction.",
+            "env_vars": [
+                {
+                    "key": "PARALLEL_API_KEY",
+                    "prompt": "Parallel API key",
+                    "url": "https://parallel.ai",
+                },
+            ],
+        }
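
Review note: the provider keeps its client cache on another module so that tests which reset that module's attribute see fresh state. The pattern can be sketched in isolation as below. Here `shared` is a throwaway module object standing in for `tools.web_tools`, and `FakeClient` / `get_client` are hypothetical names; nothing in the sketch touches the real Parallel SDK.

```python
import types
from typing import Any

# Hypothetical stand-in for the module that owns the cache slot.
shared = types.ModuleType("shared_cache")
shared._client = None


class FakeClient:
    """Hypothetical client; records the key it was built with."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


def get_client(api_key: str) -> Any:
    """Return the cached client, building and storing one on first use."""
    cached = getattr(shared, "_client", None)
    if cached is not None:
        return cached
    client = FakeClient(api_key)
    shared._client = client
    return client


first = get_client("key-1")
second = get_client("key-2")  # cache hit, so the key-1 client is returned
shared._client = None         # what a test would do between cases
third = get_client("key-3")   # cache was cleared, so a new client is built
```

One consequence worth noting: once a client is cached, a changed API key is not picked up until the slot is cleared.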
@@ -0,0 +1,15 @@
+"""SearXNG search plugin — bundled, auto-loaded.
+
+Backed by a user-hosted SearXNG instance (URL configured via ``SEARXNG_URL``).
+Search-only — pair with an extract provider (firecrawl/tavily/exa) for
+``web_extract`` calls.
+"""
+
+from __future__ import annotations
+
+from plugins.web.searxng.provider import SearXNGWebSearchProvider
+
+
+def register(ctx) -> None:
+    """Register the SearXNG provider with the plugin context."""
+    ctx.register_web_search_provider(SearXNGWebSearchProvider())