ci: split Tests workflow into 4 parallel shards via pytest-split

Reduces CI wall time by running the test suite as 4 parallel matrix jobs instead of a single job. Each shard runs ~3,000 tests in parallel, so total wall time drops from ~4min to ~60-90s. Changes: - Add pytest-split to dev extras (deterministic test splitting, composes with pytest-xdist's -n auto inside each shard). - Matrix-split tests.yml 'test' job into 4 groups. Each shard runs 'pytest ... --splits 4 --group N' and parallelizes inside with the -n auto already in pyproject.toml's addopts. - fail-fast: false so all shards finish even if one fails (consistent with current behavior when there's no matrix). Expected CI timing: Before: 243s single-job (4m03s) After: ~60-90s per shard in parallel + ~25s install overhead \u2192 total CI ~90-115s No test-file changes. Deterministic hash-based distribution (no .test_durations file yet; can add one later for better balance). The e2e job is unchanged — it's already small (20s) and runs separately.
2026-04-17 04:21:02 -07:00
1537 changed files with 24207 additions and 162473 deletions
--- a/.envrc
+++ b/.envrc
@@ -1,5 +1 @@
-watch_file pyproject.toml uv.lock
-watch_file ui-tui/package-lock.json ui-tui/package.json
-watch_file flake.nix flake.lock nix/devShell.nix nix/tui.nix nix/package.nix nix/python.nix
-
 use flake
--- a/.git-blame-ignore-revs
+++ b/.git-blame-ignore-revs
@@ -1,5 +0,0 @@
-# hermes_agent package restructure (PR 1/3)
-# Commit 2: pure git mv — all source files into hermes_agent/
-65ca3ba93b3fa7fd2b15af5b62d54020061f3672
-# Commit 3: rewrite all imports for hermes_agent package
-4b16341975a1217588054f567d0f76dc5a3cc481
--- a/.github/actions/nix-setup/action.yml
+++ b/.github/actions/nix-setup/action.yml
@@ -1,8 +0,0 @@
-name: 'Setup Nix'
-description: 'Install Nix with DeterminateSystems and enable magic-nix-cache'
-
-runs:
-  using: composite
-  steps:
-    - uses: DeterminateSystems/nix-installer-action@ef8a148080ab6020fd15196c2084a2eea5ff2d25 # v22
-    - uses: DeterminateSystems/magic-nix-cache-action@565684385bcd71bad329742eefe8d12f2e765b39 # v13
--- a/.github/workflows/docker-publish.yml
+++ b/.github/workflows/docker-publish.yml
@@ -3,13 +3,8 @@ name: Docker Build and Publish
 on:
  push:
    branches: [main]
-    paths:
-      - '**/*.py'
-      - 'pyproject.toml'
-      - 'uv.lock'
-      - 'Dockerfile'
-      - 'docker/**'
-      - '.github/workflows/docker-publish.yml'
+  pull_request:
+    branches: [main]
  release:
    types: [published]

@@ -54,14 +49,6 @@ jobs:

      - name: Test image starts
        run: |
-          # The image runs as the hermes user (UID 10000).  GitHub Actions
-          # creates /tmp/hermes-test root-owned by default, which hermes
-          # can't write to — chown it to match the in-container UID before
-          # bind-mounting.  Real users doing `docker run -v ~/.hermes:...`
-          # with their own UID hit the same issue and have their own
-          # remediations (HERMES_UID env var, or chown locally).
-          mkdir -p /tmp/hermes-test
-          sudo chown -R 10000:10000 /tmp/hermes-test
          docker run --rm \
            -v /tmp/hermes-test:/opt/data \
            --entrypoint /opt/hermes/docker/entrypoint.sh \
--- a/.github/workflows/nix-lockfile-check.yml
+++ b/.github/workflows/nix-lockfile-check.yml
@@ -1,68 +0,0 @@
-name: Nix Lockfile Check
-
-on:
-  pull_request:
-  workflow_dispatch:
-
-permissions:
-  contents: read
-  pull-requests: write
-
-concurrency:
-  group: nix-lockfile-check-${{ github.ref }}
-  cancel-in-progress: true
-
-jobs:
-  check:
-    runs-on: ubuntu-latest
-    timeout-minutes: 20
-    steps:
-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
-
-      - uses: ./.github/actions/nix-setup
-
-      - name: Resolve head SHA
-        id: sha
-        shell: bash
-        run: |
-          FULL="${{ github.event.pull_request.head.sha || github.sha }}"
-          echo "full=$FULL" >> "$GITHUB_OUTPUT"
-          echo "short=${FULL:0:7}" >> "$GITHUB_OUTPUT"
-
-      - name: Check lockfile hashes
-        id: check
-        continue-on-error: true
-        env:
-          LINK_SHA: ${{ steps.sha.outputs.full }}
-        run: nix run .#fix-lockfiles -- --check
-
-      - name: Post sticky PR comment (stale)
-        if: steps.check.outputs.stale == 'true' && github.event_name == 'pull_request'
-        uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728  # v2.9.1
-        with:
-          header: nix-lockfile-check
-          message: |
-            ### ⚠️ npm lockfile hash out of date
-
-            Checked against commit [`${{ steps.sha.outputs.short }}`](${{ github.server_url }}/${{ github.repository }}/commit/${{ steps.sha.outputs.full }}) (PR head at check time).
-
-            The `hash = "sha256-..."` line in these nix files no longer matches the committed `package-lock.json`:
-
-            ${{ steps.check.outputs.report }}
-
-            #### Apply the fix
-
-            - [ ] **Apply lockfile fix** — tick to push a commit with the correct hashes to this PR branch
-            - Or [run the Nix Lockfile Fix workflow](${{ github.server_url }}/${{ github.repository }}/actions/workflows/nix-lockfile-fix.yml) manually (pass PR `#${{ github.event.pull_request.number }}`)
-            - Or locally: `nix run .#fix-lockfiles -- --apply` and commit the diff
-
-      - name: Clear sticky PR comment (resolved)
-        if: steps.check.outputs.stale == 'false' && github.event_name == 'pull_request'
-        uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728  # v2.9.1
-        with:
-          header: nix-lockfile-check
-          delete: true
-
-      - name: Fail if stale
-        if: steps.check.outputs.stale == 'true'
-        run: exit 1
--- a/.github/workflows/nix-lockfile-fix.yml
+++ b/.github/workflows/nix-lockfile-fix.yml
@@ -1,149 +0,0 @@
-name: Nix Lockfile Fix
-
-on:
-  workflow_dispatch:
-    inputs:
-      pr_number:
-        description: 'PR number to fix (leave empty to run on the selected branch)'
-        required: false
-        type: string
-  issue_comment:
-    types: [edited]
-
-permissions:
-  contents: write
-  pull-requests: write
-
-concurrency:
-  group: nix-lockfile-fix-${{ github.event.issue.number || github.event.inputs.pr_number || github.ref }}
-  cancel-in-progress: false
-
-jobs:
-  fix:
-    # Run on manual dispatch OR when a task-list checkbox in the sticky
-    # lockfile-check comment flips from `[ ]` to `[x]`.
-    if: |
-      github.event_name == 'workflow_dispatch' ||
-      (github.event_name == 'issue_comment'
-       && github.event.issue.pull_request != null
-       && contains(github.event.comment.body, '[x] **Apply lockfile fix**')
-       && !contains(github.event.changes.body.from, '[x] **Apply lockfile fix**'))
-    runs-on: ubuntu-latest
-    timeout-minutes: 25
-    steps:
-      - name: Authorize & resolve PR
-        id: resolve
-        uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea  # v7.0.1
-        with:
-          script: |
-            // 1. Verify the actor has write access — applies to both checkbox
-            //    clicks and manual dispatch.
-            const { data: perm } =
-              await github.rest.repos.getCollaboratorPermissionLevel({
-                owner: context.repo.owner,
-                repo: context.repo.repo,
-                username: context.actor,
-              });
-            if (!['admin', 'write', 'maintain'].includes(perm.permission)) {
-              core.setFailed(
-                `${context.actor} lacks write access (has: ${perm.permission})`
-              );
-              return;
-            }
-
-            // 2. Resolve which ref to check out.
-            let prNumber = '';
-            if (context.eventName === 'issue_comment') {
-              prNumber = String(context.payload.issue.number);
-            } else if (context.eventName === 'workflow_dispatch') {
-              prNumber = context.payload.inputs.pr_number || '';
-            }
-
-            if (!prNumber) {
-              core.setOutput('ref', context.ref.replace(/^refs\/heads\//, ''));
-              core.setOutput('repo', context.repo.repo);
-              core.setOutput('owner', context.repo.owner);
-              core.setOutput('pr', '');
-              return;
-            }
-
-            const { data: pr } = await github.rest.pulls.get({
-              owner: context.repo.owner,
-              repo: context.repo.repo,
-              pull_number: Number(prNumber),
-            });
-            core.setOutput('ref', pr.head.ref);
-            core.setOutput('repo', pr.head.repo.name);
-            core.setOutput('owner', pr.head.repo.owner.login);
-            core.setOutput('pr', String(pr.number));
-
-      # Wipe the sticky lockfile-check comment to a "running" state as soon
-      # as the job is authorized, so the user sees their click was picked up
-      # before the ~minute of nix build work.
-      - name: Mark sticky as running
-        if: steps.resolve.outputs.pr != ''
-        uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728  # v2.9.1
-        with:
-          header: nix-lockfile-check
-          number: ${{ steps.resolve.outputs.pr }}
-          message: |
-            ### 🔄 Applying lockfile fix…
-
-            Triggered by @${{ github.actor }} — [workflow run](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}).
-
-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
-        with:
-          repository: ${{ steps.resolve.outputs.owner }}/${{ steps.resolve.outputs.repo }}
-          ref: ${{ steps.resolve.outputs.ref }}
-          token: ${{ secrets.GITHUB_TOKEN }}
-          fetch-depth: 0
-
-      - uses: ./.github/actions/nix-setup
-
-      - name: Apply lockfile hashes
-        id: apply
-        run: nix run .#fix-lockfiles -- --apply
-
-      - name: Commit & push
-        if: steps.apply.outputs.changed == 'true'
-        shell: bash
-        run: |
-          set -euo pipefail
-          git config user.name 'github-actions[bot]'
-          git config user.email '41898282+github-actions[bot]@users.noreply.github.com'
-          git add nix/tui.nix nix/web.nix
-          git commit -m "fix(nix): refresh npm lockfile hashes"
-          git push
-
-      - name: Update sticky (applied)
-        if: steps.apply.outputs.changed == 'true' && steps.resolve.outputs.pr != ''
-        uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728  # v2.9.1
-        with:
-          header: nix-lockfile-check
-          number: ${{ steps.resolve.outputs.pr }}
-          message: |
-            ### ✅ Lockfile fix applied
-
-            Pushed a commit refreshing the npm lockfile hashes — [workflow run](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}).
-
-      - name: Update sticky (already current)
-        if: steps.apply.outputs.changed == 'false' && steps.resolve.outputs.pr != ''
-        uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728  # v2.9.1
-        with:
-          header: nix-lockfile-check
-          number: ${{ steps.resolve.outputs.pr }}
-          message: |
-            ### ✅ Lockfile hashes already current
-
-            Nothing to commit — [workflow run](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}).
-
-      - name: Update sticky (failed)
-        if: failure() && steps.resolve.outputs.pr != ''
-        uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728  # v2.9.1
-        with:
-          header: nix-lockfile-check
-          number: ${{ steps.resolve.outputs.pr }}
-          message: |
-            ### ❌ Lockfile fix failed
-
-            See the [workflow run](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}) for logs.
--- a/.github/workflows/nix.yml
+++ b/.github/workflows/nix.yml
@@ -4,6 +4,15 @@ on:
  push:
    branches: [main]
  pull_request:
+    paths:
+      - 'flake.nix'
+      - 'flake.lock'
+      - 'nix/**'
+      - 'pyproject.toml'
+      - 'uv.lock'
+      - 'hermes_cli/**'
+      - 'run_agent.py'
+      - 'acp_adapter/**'

 permissions:
  contents: read
@@ -20,8 +29,9 @@ jobs:
    runs-on: ${{ matrix.os }}
    timeout-minutes: 30
    steps:
-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
-      - uses: ./.github/actions/nix-setup
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+      - uses: DeterminateSystems/nix-installer-action@ef8a148080ab6020fd15196c2084a2eea5ff2d25  # v22
+      - uses: DeterminateSystems/magic-nix-cache-action@565684385bcd71bad329742eefe8d12f2e765b39  # v13
      - name: Check flake
        if: runner.os == 'Linux'
        run: nix flake check --print-build-logs
--- a/.github/workflows/supply-chain-audit.yml
+++ b/.github/workflows/supply-chain-audit.yml
@@ -3,31 +3,14 @@ name: Supply Chain Audit
 on:
  pull_request:
    types: [opened, synchronize, reopened]
-    paths:
-      - '**/*.py'
-      - '**/*.pth'
-      - '**/setup.py'
-      - '**/setup.cfg'
-      - '**/sitecustomize.py'
-      - '**/usercustomize.py'
-      - '**/__init__.pth'

 permissions:
  pull-requests: write
  contents: read

-# Narrow, high-signal scanner. Only fires on critical indicators of supply
-# chain attacks (e.g. the litellm-style payloads). Low-signal heuristics
-# (plain base64, plain exec/eval, dependency/Dockerfile/workflow edits,
-# Actions version unpinning, outbound POST/PUT) were intentionally
-# removed — they fired on nearly every PR and trained reviewers to ignore
-# the scanner. Keep this file's checks ruthlessly narrow: if you find
-# yourself adding WARNING-tier patterns here again, make a separate
-# advisory-only workflow instead.
-
 jobs:
  scan:
-    name: Scan PR for critical supply chain risks
+    name: Scan PR for supply chain risks
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
@@ -35,7 +18,7 @@ jobs:
        with:
          fetch-depth: 0

-      - name: Scan diff for critical patterns
+      - name: Scan diff for suspicious patterns
        id: scan
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@@ -45,19 +28,19 @@ jobs:
          BASE="${{ github.event.pull_request.base.sha }}"
          HEAD="${{ github.event.pull_request.head.sha }}"

-          # Added lines only, excluding lockfiles.
+          # Get the full diff (added lines only)
          DIFF=$(git diff "$BASE".."$HEAD" -- . ':!uv.lock' ':!*.lock' ':!package-lock.json' ':!yarn.lock' || true)

          FINDINGS=""
+          CRITICAL=false

          # --- .pth files (auto-execute on Python startup) ---
-          # The exact mechanism used in the litellm supply chain attack:
-          # https://github.com/BerriAI/litellm/issues/24512
          PTH_FILES=$(git diff --name-only "$BASE".."$HEAD" | grep '\.pth$' || true)
          if [ -n "$PTH_FILES" ]; then
+            CRITICAL=true
            FINDINGS="${FINDINGS}
          ### 🚨 CRITICAL: .pth file added or modified
-          Python \`.pth\` files in \`site-packages/\` execute automatically when the interpreter starts — no import required.
+          Python \`.pth\` files in \`site-packages/\` execute automatically when the interpreter starts — no import required. This is the exact mechanism used in the [litellm supply chain attack](https://github.com/BerriAI/litellm/issues/24512).

          **Files:**
          \`\`\`
@@ -66,12 +49,13 @@ jobs:
          "
          fi

-          # --- base64 decode + exec/eval on the same line (the litellm attack pattern) ---
+          # --- base64 + exec/eval combo (the litellm attack pattern) ---
          B64_EXEC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'base64\.(b64decode|decodebytes|urlsafe_b64decode)' | grep -iE 'exec\(|eval\(' | head -10 || true)
          if [ -n "$B64_EXEC_HITS" ]; then
+            CRITICAL=true
            FINDINGS="${FINDINGS}
          ### 🚨 CRITICAL: base64 decode + exec/eval combo
-          Base64-decoded strings passed directly to exec/eval — the signature of hidden credential-stealing payloads.
+          This is the exact pattern used in the [litellm supply chain attack](https://github.com/BerriAI/litellm/issues/24512) — base64-decoded strings passed to exec/eval to hide credential-stealing payloads.

          **Matches:**
          \`\`\`
@@ -80,12 +64,41 @@ jobs:
          "
          fi

-          # --- subprocess with encoded/obfuscated command argument ---
-          PROC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -E 'subprocess\.(Popen|call|run)\s*\(' | grep -iE 'base64|\\x[0-9a-f]{2}|chr\(' | head -10 || true)
+          # --- base64 decode/encode (alone — legitimate uses exist) ---
+          B64_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'base64\.(b64decode|b64encode|decodebytes|encodebytes|urlsafe_b64decode)|atob\(|btoa\(|Buffer\.from\(.*base64' | head -20 || true)
+          if [ -n "$B64_HITS" ]; then
+            FINDINGS="${FINDINGS}
+          ### ⚠️ WARNING: base64 encoding/decoding detected
+          Base64 has legitimate uses (images, JWT, etc.) but is also commonly used to obfuscate malicious payloads. Verify the usage is appropriate.
+
+          **Matches (first 20):**
+          \`\`\`
+          ${B64_HITS}
+          \`\`\`
+          "
+          fi
+
+          # --- exec/eval with string arguments ---
+          EXEC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -E '(exec|eval)\s*\(' | grep -v '^\+\s*#' | grep -v 'test_\|mock\|assert\|# ' | head -20 || true)
+          if [ -n "$EXEC_HITS" ]; then
+            FINDINGS="${FINDINGS}
+          ### ⚠️ WARNING: exec() or eval() usage
+          Dynamic code execution can hide malicious behavior, especially when combined with base64 or network fetches.
+
+          **Matches (first 20):**
+          \`\`\`
+          ${EXEC_HITS}
+          \`\`\`
+          "
+          fi
+
+          # --- subprocess with encoded/obfuscated commands ---
+          PROC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -E 'subprocess\.(Popen|call|run)\s*\(' | grep -iE 'base64|decode|encode|\\x|chr\(' | head -10 || true)
          if [ -n "$PROC_HITS" ]; then
+            CRITICAL=true
            FINDINGS="${FINDINGS}
          ### 🚨 CRITICAL: subprocess with encoded/obfuscated command
-          Subprocess calls whose command strings are base64- or hex-encoded are a strong indicator of payload execution.
+          Subprocess calls with encoded arguments are a strong indicator of payload execution.

          **Matches:**
          \`\`\`
@@ -94,12 +107,25 @@ jobs:
          "
          fi

-          # --- Install-hook files (setup.py/sitecustomize/usercustomize/__init__.pth) ---
-          # These execute during pip install or interpreter startup.
-          SETUP_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -E '(^|/)(setup\.py|setup\.cfg|sitecustomize\.py|usercustomize\.py|__init__\.pth)$' || true)
+          # --- Network calls to non-standard domains ---
+          EXFIL_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'requests\.(post|put)\(|httpx\.(post|put)\(|urllib\.request\.urlopen' | grep -v '^\+\s*#' | grep -v 'test_\|mock\|assert' | head -10 || true)
+          if [ -n "$EXFIL_HITS" ]; then
+            FINDINGS="${FINDINGS}
+          ### ⚠️ WARNING: Outbound network calls (POST/PUT)
+          Outbound POST/PUT requests in new code could be data exfiltration. Verify the destination URLs are legitimate.
+
+          **Matches (first 10):**
+          \`\`\`
+          ${EXFIL_HITS}
+          \`\`\`
+          "
+          fi
+
+          # --- setup.py / setup.cfg install hooks ---
+          SETUP_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -E '(setup\.py|setup\.cfg|__init__\.pth|sitecustomize\.py|usercustomize\.py)$' || true)
          if [ -n "$SETUP_HITS" ]; then
            FINDINGS="${FINDINGS}
-          ### 🚨 CRITICAL: Install-hook file added or modified
+          ### ⚠️ WARNING: Install hook files modified
          These files can execute code during package installation or interpreter startup.

          **Files:**
@@ -109,31 +135,114 @@ jobs:
          "
          fi

+          # --- Compile/marshal/pickle (code object injection) ---
+          MARSHAL_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'marshal\.loads|pickle\.loads|compile\(' | grep -v '^\+\s*#' | grep -v 'test_\|re\.compile\|ast\.compile' | head -10 || true)
+          if [ -n "$MARSHAL_HITS" ]; then
+            FINDINGS="${FINDINGS}
+          ### ⚠️ WARNING: marshal/pickle/compile usage
+          These can deserialize or construct executable code objects.
+
+          **Matches:**
+          \`\`\`
+          ${MARSHAL_HITS}
+          \`\`\`
+          "
+          fi
+
+          # --- CI/CD workflow files modified ---
+          WORKFLOW_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -E '\.github/workflows/.*\.ya?ml$' || true)
+          if [ -n "$WORKFLOW_HITS" ]; then
+            FINDINGS="${FINDINGS}
+          ### ⚠️ WARNING: CI/CD workflow files modified
+          Changes to workflow files can alter build pipelines, inject steps, or modify permissions. Verify no unauthorized actions or secrets access were added.
+
+          **Files:**
+          \`\`\`
+          ${WORKFLOW_HITS}
+          \`\`\`
+          "
+          fi
+
+          # --- Dockerfile / container build files modified ---
+          DOCKER_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -iE '(Dockerfile|\.dockerignore|docker-compose)' || true)
+          if [ -n "$DOCKER_HITS" ]; then
+            FINDINGS="${FINDINGS}
+          ### ⚠️ WARNING: Container build files modified
+          Changes to Dockerfiles or compose files can alter base images, add build steps, or expose ports. Verify base image pins and build commands.
+
+          **Files:**
+          \`\`\`
+          ${DOCKER_HITS}
+          \`\`\`
+          "
+          fi
+
+          # --- Dependency manifest files modified ---
+          DEP_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -E '(pyproject\.toml|requirements.*\.txt|package\.json|Gemfile|go\.mod|Cargo\.toml)$' || true)
+          if [ -n "$DEP_HITS" ]; then
+            FINDINGS="${FINDINGS}
+          ### ⚠️ WARNING: Dependency manifest files modified
+          Changes to dependency files can introduce new packages or change version pins. Verify all dependency changes are intentional and from trusted sources.
+
+          **Files:**
+          \`\`\`
+          ${DEP_HITS}
+          \`\`\`
+          "
+          fi
+
+          # --- GitHub Actions version unpinning (mutable tags instead of SHAs) ---
+          ACTIONS_UNPIN=$(echo "$DIFF" | grep -n '^\+' | grep 'uses:' | grep -v '#' | grep -E '@v[0-9]' | head -10 || true)
+          if [ -n "$ACTIONS_UNPIN" ]; then
+            FINDINGS="${FINDINGS}
+          ### ⚠️ WARNING: GitHub Actions with mutable version tags
+          Actions should be pinned to full commit SHAs (not \`@v4\`, \`@v5\`). Mutable tags can be retargeted silently if a maintainer account is compromised.
+
+          **Matches:**
+          \`\`\`
+          ${ACTIONS_UNPIN}
+          \`\`\`
+          "
+          fi
+
+          # --- Output results ---
          if [ -n "$FINDINGS" ]; then
            echo "found=true" >> "$GITHUB_OUTPUT"
+            if [ "$CRITICAL" = true ]; then
+              echo "critical=true" >> "$GITHUB_OUTPUT"
+            else
+              echo "critical=false" >> "$GITHUB_OUTPUT"
+            fi
+            # Write findings to a file (multiline env vars are fragile)
            echo "$FINDINGS" > /tmp/findings.md
          else
            echo "found=false" >> "$GITHUB_OUTPUT"
+            echo "critical=false" >> "$GITHUB_OUTPUT"
          fi

-      - name: Post critical finding comment
+      - name: Post warning comment
        if: steps.scan.outputs.found == 'true'
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
-          BODY="## 🚨 CRITICAL Supply Chain Risk Detected
+          SEVERITY="⚠️ Supply Chain Risk Detected"
+          if [ "${{ steps.scan.outputs.critical }}" = "true" ]; then
+            SEVERITY="🚨 CRITICAL Supply Chain Risk Detected"
+          fi

-          This PR contains a pattern that has been used in real supply chain attacks. A maintainer must review the flagged code carefully before merging.
+          BODY="## ${SEVERITY}
+
+          This PR contains patterns commonly associated with supply chain attacks. This does **not** mean the PR is malicious — but these patterns require careful human review before merging.

          $(cat /tmp/findings.md)

          ---
-          *Scanner only fires on high-signal indicators: .pth files, base64+exec/eval combos, subprocess with encoded commands, or install-hook files. Low-signal warnings were removed intentionally — if you're seeing this comment, the finding is worth inspecting.*"
+          *Automated scan triggered by [supply-chain-audit](/.github/workflows/supply-chain-audit.yml). If this is a false positive, a maintainer can approve after manual review.*"

          gh pr comment "${{ github.event.pull_request.number }}" --body "$BODY" || echo "::warning::Could not post PR comment (expected for fork PRs — GITHUB_TOKEN is read-only)"

      - name: Fail on critical findings
-        if: steps.scan.outputs.found == 'true'
+        if: steps.scan.outputs.critical == 'true'
        run: |
          echo "::error::CRITICAL supply chain risk patterns detected in this PR. See the PR comment for details."
          exit 1
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -3,14 +3,8 @@ name: Tests
 on:
  push:
    branches: [main]
-    paths-ignore:
-      - '**/*.md'
-      - 'docs/**'
  pull_request:
    branches: [main]
-    paths-ignore:
-      - '**/*.md'
-      - 'docs/**'

 permissions:
  contents: read
@@ -22,8 +16,13 @@ concurrency:

 jobs:
  test:
+    name: test (${{ matrix.group }}/4)
    runs-on: ubuntu-latest
-    timeout-minutes: 20
+    timeout-minutes: 10
+    strategy:
+      fail-fast: false
+      matrix:
+        group: [1, 2, 3, 4]
    steps:
      - name: Checkout code
        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
@@ -43,10 +42,11 @@ jobs:
          source .venv/bin/activate
          uv pip install -e ".[all,dev]"

-      - name: Run tests
+      - name: Run tests (shard ${{ matrix.group }}/4)
        run: |
          source .venv/bin/activate
-          python -m pytest tests/ -q --ignore=tests/integration --ignore=tests/e2e --tb=short -n auto
+          python -m pytest tests/ -q --ignore=tests/integration --ignore=tests/e2e --tb=short \
+            --splits 4 --group ${{ matrix.group }}
        env:
          # Ensure tests don't accidentally call real APIs
          OPENROUTER_API_KEY: ""
--- a/.gitignore
+++ b/.gitignore
@@ -54,17 +54,11 @@ environments/benchmarks/evals/
 # Web UI build output
 hermes_cli/web_dist/

-# Web UI assets — synced from @nous-research/ui at build time via
-# `npm run sync-assets` (see web/package.json).
-web/public/fonts/
-web/public/ds-assets/
-
 # Release script temp files
 .release_notes.md
 mini-swe-agent/

 # Nix
 .direnv/
-.nix-stamps/
 result
 website/static/api/skills-index.json
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -12,59 +12,55 @@ source venv/bin/activate  # ALWAYS activate before running Python

 ```
 hermes-agent/
-├── hermes_agent/             # Single installable package
-│   ├── agent/                # Core conversation loop and agent internals
-│   │   ├── loop.py               # AIAgent class — core conversation loop
-│   │   ├── prompt_builder.py     # System prompt assembly
-│   │   ├── context/              # Context management (engine, compressor, references)
-│   │   ├── memory/               # Memory management (manager, provider)
-│   │   ├── image_gen/            # Image generation (provider, registry)
-│   │   ├── display.py            # KawaiiSpinner, tool preview formatting
-│   │   ├── skill_commands.py     # Skill slash commands (shared CLI/gateway)
-│   │   └── trajectory.py         # Trajectory saving helpers
-│   ├── providers/            # LLM provider adapters and transports
-│   │   ├── anthropic_adapter.py  # Anthropic adapter
-│   │   ├── anthropic_transport.py # Anthropic transport
-│   │   ├── metadata.py           # Model context lengths, token estimation
-│   │   ├── auxiliary.py           # Auxiliary LLM client (vision, summarization)
-│   │   ├── caching.py            # Anthropic prompt caching
-│   │   └── credential_pool.py    # Credential management
-│   ├── tools/                # Tool implementations
-│   │   ├── dispatch.py           # Tool orchestration, discover_builtin_tools()
-│   │   ├── toolsets.py           # Toolset definitions
-│   │   ├── registry.py           # Central tool registry
-│   │   ├── terminal.py           # Terminal orchestration
-│   │   ├── browser/              # Browser tools (tool, cdp, camofox, providers/)
-│   │   ├── mcp/                  # MCP client and server
-│   │   ├── skills/               # Skill management (manager, tool, hub, guard, sync)
-│   │   ├── media/                # Voice, TTS, transcription, image gen
-│   │   ├── files/                # File operations (tools, operations, state)
-│   │   └── security/             # Path security, URL safety, approval
-│   ├── backends/             # Terminal backends (local, docker, ssh, modal, daytona, singularity)
-│   ├── cli/                  # CLI subcommands and setup
-│   │   ├── main.py               # Entry point — all `hermes` subcommands
-│   │   ├── repl.py               # HermesCLI class — interactive CLI orchestrator
-│   │   ├── config.py             # DEFAULT_CONFIG, OPTIONAL_ENV_VARS, migration
-│   │   ├── commands.py           # Slash command definitions
-│   │   ├── auth/                 # Provider credential resolution
-│   │   ├── models/               # Model catalog, provider lists, switching
-│   │   └── ui/                   # Banner, colors, skin engine, callbacks, tips
-│   ├── gateway/              # Messaging platform gateway
-│   │   ├── run.py                # Main loop, slash commands, message dispatch
-│   │   ├── session.py            # SessionStore — conversation persistence
-│   │   └── platforms/            # Adapters: telegram, discord, slack, whatsapp, etc.
-│   ├── acp/                  # ACP server (VS Code / Zed / JetBrains integration)
-│   ├── cron/                 # Scheduler (jobs.py, scheduler.py)
-│   ├── plugins/              # Plugin system (memory providers, context engines)
-│   ├── constants.py          # Shared constants
-│   ├── state.py              # SessionDB — SQLite session store
-│   ├── logging.py            # Logging configuration
-│   └── utils.py              # Shared utilities
-├── tui_gateway/          # Python JSON-RPC backend for the TUI
-├── ui-tui/               # Ink (React) terminal UI — `hermes --tui`
+├── run_agent.py          # AIAgent class — core conversation loop
+├── model_tools.py        # Tool orchestration, discover_builtin_tools(), handle_function_call()
+├── toolsets.py           # Toolset definitions, _HERMES_CORE_TOOLS list
+├── cli.py                # HermesCLI class — interactive CLI orchestrator
+├── hermes_state.py       # SessionDB — SQLite session store (FTS5 search)
+├── agent/                # Agent internals
+│   ├── prompt_builder.py     # System prompt assembly
+│   ├── context_compressor.py # Auto context compression
+│   ├── prompt_caching.py     # Anthropic prompt caching
+│   ├── auxiliary_client.py   # Auxiliary LLM client (vision, summarization)
+│   ├── model_metadata.py     # Model context lengths, token estimation
+│   ├── models_dev.py         # models.dev registry integration (provider-aware context)
+│   ├── display.py            # KawaiiSpinner, tool preview formatting
+│   ├── skill_commands.py     # Skill slash commands (shared CLI/gateway)
+│   └── trajectory.py         # Trajectory saving helpers
+├── hermes_cli/           # CLI subcommands and setup
+│   ├── main.py           # Entry point — all `hermes` subcommands
+│   ├── config.py         # DEFAULT_CONFIG, OPTIONAL_ENV_VARS, migration
+│   ├── commands.py       # Slash command definitions + SlashCommandCompleter
+│   ├── callbacks.py      # Terminal callbacks (clarify, sudo, approval)
+│   ├── setup.py          # Interactive setup wizard
+│   ├── skin_engine.py    # Skin/theme engine — CLI visual customization
+│   ├── skills_config.py  # `hermes skills` — enable/disable skills per platform
+│   ├── tools_config.py   # `hermes tools` — enable/disable tools per platform
+│   ├── skills_hub.py     # `/skills` slash command (search, browse, install)
+│   ├── models.py         # Model catalog, provider model lists
+│   ├── model_switch.py   # Shared /model switch pipeline (CLI + gateway)
+│   └── auth.py           # Provider credential resolution
+├── tools/                # Tool implementations (one file per tool)
+│   ├── registry.py       # Central tool registry (schemas, handlers, dispatch)
+│   ├── approval.py       # Dangerous command detection
+│   ├── terminal_tool.py  # Terminal orchestration
+│   ├── process_registry.py # Background process management
+│   ├── file_tools.py     # File read/write/search/patch
+│   ├── web_tools.py      # Web search/extract (Parallel + Firecrawl)
+│   ├── browser_tool.py   # Browserbase browser automation
+│   ├── code_execution_tool.py # execute_code sandbox
+│   ├── delegate_tool.py  # Subagent delegation
+│   ├── mcp_tool.py       # MCP client (~1050 lines)
+│   └── environments/     # Terminal backends (local, docker, ssh, modal, daytona, singularity)
+├── gateway/              # Messaging platform gateway
+│   ├── run.py            # Main loop, slash commands, message dispatch
+│   ├── session.py        # SessionStore — conversation persistence
+│   └── platforms/        # Adapters: telegram, discord, slack, whatsapp, homeassistant, signal, qqbot
+├── acp_adapter/          # ACP server (VS Code / Zed / JetBrains integration)
+├── cron/                 # Scheduler (jobs.py, scheduler.py)
 ├── environments/         # RL training environments (Atropos)
-├── tests/                # Pytest suite
-└── web/                  # Vite + React web dashboard
+├── tests/                # Pytest suite (~3000 tests)
+└── batch_runner.py       # Parallel batch processing
 ```

 **User config:** `~/.hermes/config.yaml` (settings), `~/.hermes/.env` (API keys)
@@ -72,18 +68,18 @@ hermes-agent/
 ## File Dependency Chain

 ```
-hermes_agent/tools/registry.py  (no deps — imported by all tool files)
+tools/registry.py  (no deps — imported by all tool files)
       ↑
-hermes_agent/tools/*.py  (each calls registry.register() at import time)
+tools/*.py  (each calls registry.register() at import time)
       ↑
-hermes_agent/tools/dispatch.py  (imports registry + triggers tool discovery)
+model_tools.py  (imports tools/registry + triggers tool discovery)
       ↑
-hermes_agent/agent/loop.py, hermes_agent/cli/repl.py, environments/
+run_agent.py, cli.py, batch_runner.py, environments/
 ```

 ---

-## AIAgent Class (hermes_agent/agent/loop.py)
+## AIAgent Class (run_agent.py)

 ```python
 class AIAgent:
@@ -129,14 +125,14 @@ Messages follow OpenAI format: `{"role": "system/user/assistant/tool", ...}`. Re

 ---

-## CLI Architecture (hermes_agent/cli/repl.py)
+## CLI Architecture (cli.py)

 - **Rich** for banner/panels, **prompt_toolkit** for input with autocomplete
- **KawaiiSpinner** (`hermes_agent/agent/display.py`) — animated faces during API calls, `┊` activity feed for tool results
- `load_cli_config()` in repl.py merges hardcoded defaults + user config YAML
- **Skin engine** (`hermes_agent/cli/ui/skin_engine.py`) — data-driven CLI theming; initialized from `display.skin` config key at startup; skins customize banner colors, spinner faces/verbs/wings, tool prefix, response box, branding text
+- **KawaiiSpinner** (`agent/display.py`) — animated faces during API calls, `┊` activity feed for tool results
+- `load_cli_config()` in cli.py merges hardcoded defaults + user config YAML
+- **Skin engine** (`hermes_cli/skin_engine.py`) — data-driven CLI theming; initialized from `display.skin` config key at startup; skins customize banner colors, spinner faces/verbs/wings, tool prefix, response box, branding text
 - `process_command()` is a method on `HermesCLI` — dispatches on canonical command name resolved via `resolve_command()` from the central registry
- Skill slash commands: `hermes_agent/agent/skill_commands.py` scans `~/.hermes/skills/`, injects as **user message** (not system prompt) to preserve prompt caching
+- Skill slash commands: `agent/skill_commands.py` scans `~/.hermes/skills/`, injects as **user message** (not system prompt) to preserve prompt caching

 ### Slash Command Registry (`hermes_cli/commands.py`)

@@ -183,59 +179,6 @@ if canonical == "mycommand":

 ---

-## TUI Architecture (ui-tui + tui_gateway)
-
-The TUI is a full replacement for the classic (prompt_toolkit) CLI, activated via `hermes --tui` or `HERMES_TUI=1`.
-
-### Process Model
-
-```
-hermes --tui
-  └─ Node (Ink)  ──stdio JSON-RPC──  Python (tui_gateway)
-       │                                  └─ AIAgent + tools + sessions
-       └─ renders transcript, composer, prompts, activity
-```
-
-TypeScript owns the screen. Python owns sessions, tools, model calls, and slash command logic.
-
-### Transport
-
-Newline-delimited JSON-RPC over stdio. Requests from Ink, events from Python. See `tui_gateway/server.py` for the full method/event catalog.
-
-### Key Surfaces
-
-| Surface | Ink component | Gateway method |
-|---------|---------------|----------------|
-| Chat streaming | `app.tsx` + `messageLine.tsx` | `prompt.submit` → `message.delta/complete` |
-| Tool activity | `thinking.tsx` | `tool.start/progress/complete` |
-| Approvals | `prompts.tsx` | `approval.respond` ← `approval.request` |
-| Clarify/sudo/secret | `prompts.tsx`, `maskedPrompt.tsx` | `clarify/sudo/secret.respond` |
-| Session picker | `sessionPicker.tsx` | `session.list/resume` |
-| Slash commands | Local handler + fallthrough | `slash.exec` → `_SlashWorker`, `command.dispatch` |
-| Completions | `useCompletion` hook | `complete.slash`, `complete.path` |
-| Theming | `theme.ts` + `branding.tsx` | `gateway.ready` with skin data |
-
-### Slash Command Flow
-
-1. Built-in client commands (`/help`, `/quit`, `/clear`, `/resume`, `/copy`, `/paste`, etc.) handled locally in `app.tsx`
-2. Everything else → `slash.exec` (runs in persistent `_SlashWorker` subprocess) → `command.dispatch` fallback
-
-### Dev Commands
-
-```bash
-cd ui-tui
-npm install       # first time
-npm run dev       # watch mode (rebuilds hermes-ink + tsx --watch)
-npm start         # production
-npm run build     # full build (hermes-ink + tsc)
-npm run type-check # typecheck only (tsc --noEmit)
-npm run lint      # eslint
-npm run fmt       # prettier
-npm test          # vitest
-```
-
---
-
 ## Adding New Tools

 Requires changes in **2 files**:
@@ -263,7 +206,7 @@ registry.register(

 **2. Add to `toolsets.py`** — either `_HERMES_CORE_TOOLS` (all platforms) or a new toolset.

-Auto-discovery: any `hermes_agent/tools/*.py` file with a top-level `registry.register()` call is imported automatically — no manual import list to maintain.
+Auto-discovery: any `tools/*.py` file with a top-level `registry.register()` call is imported automatically — no manual import list to maintain.

 The registry handles schema collection, dispatch, availability checking, and error wrapping. All handlers MUST return a JSON string.

@@ -489,11 +432,11 @@ Rendering bugs in tmux/iTerm2 — ghosting on scroll. Use `curses` (stdlib) inst
 ### DO NOT use `\033[K` (ANSI erase-to-EOL) in spinner/display code
 Leaks as literal `?[K` text under `prompt_toolkit`'s `patch_stdout`. Use space-padding: `f"\r{line}{' ' * pad}"`.

-### `_last_resolved_tool_names` is a process-global in `hermes_agent/tools/dispatch.py`
+### `_last_resolved_tool_names` is a process-global in `model_tools.py`
 `_run_single_child()` in `delegate_tool.py` saves and restores this global around subagent execution. If you add new code that reads this global, be aware it may be temporarily stale during child agent runs.

 ### DO NOT hardcode cross-tool references in schema descriptions
-Tool schema descriptions must not mention tools from other toolsets by name (e.g., `browser_navigate` saying "prefer web_search"). Those tools may be unavailable (missing API keys, disabled toolset), causing the model to hallucinate calls to non-existent tools. If a cross-reference is needed, add it dynamically in `get_tool_definitions()` in `hermes_agent/tools/dispatch.py` — see the `browser_navigate` / `execute_code` post-processing blocks for the pattern.
+Tool schema descriptions must not mention tools from other toolsets by name (e.g., `browser_navigate` saying "prefer web_search"). Those tools may be unavailable (missing API keys, disabled toolset), causing the model to hallucinate calls to non-existent tools. If a cross-reference is needed, add it dynamically in `get_tool_definitions()` in `model_tools.py` — see the `browser_navigate` / `execute_code` post-processing blocks for the pattern.

 ### Tests must not write to `~/.hermes/`
 The `_isolate_hermes_home` autouse fixture in `tests/conftest.py` redirects `HERMES_HOME` to a temp dir. Never hardcode `~/.hermes/` paths in tests.
@@ -515,94 +458,13 @@ def profile_env(tmp_path, monkeypatch):

 ## Testing

-**ALWAYS use `scripts/run_tests.sh`** — do not call `pytest` directly. The script enforces
-hermetic environment parity with CI (unset credential vars, TZ=UTC, LANG=C.UTF-8,
-4 xdist workers matching GHA ubuntu-latest). Direct `pytest` on a 16+ core
-developer machine with API keys set diverges from CI in ways that have caused
-multiple "works locally, fails in CI" incidents (and the reverse).
-
-```bash
-scripts/run_tests.sh                                  # full suite, CI-parity
-scripts/run_tests.sh tests/gateway/                   # one directory
-scripts/run_tests.sh tests/agent/test_foo.py::test_x  # one test
-scripts/run_tests.sh -v --tb=long                     # pass-through pytest flags
-```
-
-### Why the wrapper (and why the old "just call pytest" doesn't work)
-
-Five real sources of local-vs-CI drift the script closes:
-
-| | Without wrapper | With wrapper |
-|---|---|---|
-| Provider API keys | Whatever is in your env (auto-detects pool) | All `*_API_KEY`/`*_TOKEN`/etc. unset |
-| HOME / `~/.hermes/` | Your real config+auth.json | Temp dir per test |
-| Timezone | Local TZ (PDT etc.) | UTC |
-| Locale | Whatever is set | C.UTF-8 |
-| xdist workers | `-n auto` = all cores (20+ on a workstation) | `-n 4` matching CI |
-
-`tests/conftest.py` also enforces points 1-4 as an autouse fixture so ANY pytest
-invocation (including IDE integrations) gets hermetic behavior — but the wrapper
-is belt-and-suspenders.
-
-### Running without the wrapper (only if you must)
-
-If you can't use the wrapper (e.g. on Windows or inside an IDE that shells
-pytest directly), at minimum activate the venv and pass `-n 4`:
-
 ```bash
 source venv/bin/activate
-python -m pytest tests/ -q -n 4
+python -m pytest tests/ -q          # Full suite (~3000 tests, ~3 min)
+python -m pytest tests/test_model_tools.py -q   # Toolset resolution
+python -m pytest tests/test_cli_init.py -q       # CLI config loading
+python -m pytest tests/gateway/ -q               # Gateway tests
+python -m pytest tests/tools/ -q                 # Tool-level tests
 ```

-Worker count above 4 will surface test-ordering flakes that CI never sees.
-
 Always run the full suite before pushing changes.
-
-### Don't write change-detector tests
-
-A test is a **change-detector** if it fails whenever data that is **expected
-to change** gets updated — model catalogs, config version numbers,
-enumeration counts, hardcoded lists of provider models. These tests add no
-behavioral coverage; they just guarantee that routine source updates break
-CI and cost engineering time to "fix."
-
-**Do not write:**
-
-```python
-# catalog snapshot — breaks every model release
-assert "gemini-2.5-pro" in _PROVIDER_MODELS["gemini"]
-assert "MiniMax-M2.7" in models
-
-# config version literal — breaks every schema bump
-assert DEFAULT_CONFIG["_config_version"] == 21
-
-# enumeration count — breaks every time a skill/provider is added
-assert len(_PROVIDER_MODELS["huggingface"]) == 8
-```
-
-**Do write:**
-
-```python
-# behavior: does the catalog plumbing work at all?
-assert "gemini" in _PROVIDER_MODELS
-assert len(_PROVIDER_MODELS["gemini"]) >= 1
-
-# behavior: does migration bump the user's version to current latest?
-assert raw["_config_version"] == DEFAULT_CONFIG["_config_version"]
-
-# invariant: no plan-only model leaks into the legacy list
-assert not (set(moonshot_models) & coding_plan_only_models)
-
-# invariant: every model in the catalog has a context-length entry
-for m in _PROVIDER_MODELS["huggingface"]:
-    assert m.lower() in DEFAULT_CONTEXT_LENGTHS_LOWER
-```
-
-The rule: if the test reads like a snapshot of current data, delete it. If
-it reads like a contract about how two pieces of data must relate, keep it.
-When a PR adds a new provider/model and you want a test, make the test
-assert the relationship (e.g. "catalog entries all have context lengths"),
-not the specific names.
-
-Reviewers should reject new change-detector tests; authors should convert
-them into invariants before re-requesting review.
--- a/28
+++ b/28
@@ -21,34 +21,26 @@ RUN useradd -u 10000 -m -d /opt/data hermes
 COPY --chmod=0755 --from=gosu_source /gosu /usr/local/bin/
 COPY --chmod=0755 --from=uv_source /usr/local/bin/uv /usr/local/bin/uvx /usr/local/bin/

+COPY . /opt/hermes
 WORKDIR /opt/hermes

-# ---------- Layer-cached dependency install ----------
-# Copy only package manifests first so npm install + Playwright are cached
-# unless the lockfiles themselves change.
-COPY package.json package-lock.json ./
-COPY web/package.json web/package-lock.json web/
-
+# Install Node dependencies and Playwright as root (--with-deps needs apt)
 RUN npm install --prefer-offline --no-audit && \
    npx playwright install --with-deps chromium --only-shell && \
-    (cd web && npm install --prefer-offline --no-audit) && \
+    cd /opt/hermes/scripts/whatsapp-bridge && \
+    npm install --prefer-offline --no-audit && \
    npm cache clean --force

-# ---------- Source code ----------
-# .dockerignore excludes node_modules, so the installs above survive.
-COPY --chown=hermes:hermes . .
-
-# Build web dashboard (Vite outputs to hermes_agent/cli/web_dist/)
-RUN cd web && npm run build
-
-# ---------- Python virtualenv ----------
-RUN chown hermes:hermes /opt/hermes
+# Hand ownership to hermes user, then install Python deps in a virtualenv
+RUN chown -R hermes:hermes /opt/hermes
 USER hermes
+
 RUN uv venv && \
    uv pip install --no-cache-dir -e ".[all]"

-# ---------- Runtime ----------
-ENV HERMES_WEB_DIST=/opt/hermes/hermes_agent/cli/web_dist
+USER root
+RUN chmod +x /opt/hermes/docker/entrypoint.sh
+
 ENV HERMES_HOME=/opt/data
 VOLUME [ "/opt/data" ]
 ENTRYPOINT [ "/opt/hermes/docker/entrypoint.sh" ]
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -1,4 +1,3 @@
-graft hermes_agent
 graft skills
 graft optional-skills
 global-exclude __pycache__
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@

 **The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.

-Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
+Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.

 <table>
 <tr><td><b>A real terminal interface</b></td><td>Full TUI with multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output.</td></tr>
@@ -141,18 +141,11 @@ See `hermes claw migrate --help` for all options, or use the `openclaw-migration

 We welcome contributions! See the [Contributing Guide](https://hermes-agent.nousresearch.com/docs/developer-guide/contributing) for development setup, code style, and PR process.

-Quick start for contributors — clone and go with `setup-hermes.sh`:
+Quick start for contributors:

 ```bash
 git clone https://github.com/NousResearch/hermes-agent.git
 cd hermes-agent
-./setup-hermes.sh     # installs uv, creates venv, installs .[all], symlinks ~/.local/bin/hermes
-./hermes              # auto-detects the venv, no need to `source` first
-```
-
-Manual path (equivalent to the above):
-
-```bash
 curl -LsSf https://astral.sh/uv/install.sh | sh
 uv venv venv --python 3.11
 source venv/bin/activate
--- a/hermes_agent/acp/init.py
+++ b/hermes_agent/acp/init.py
--- a/hermes_agent/acp/main.py
+++ b/hermes_agent/acp/main.py
--- a/hermes_agent/acp/auth.py
+++ b/hermes_agent/acp/auth.py
@@ -8,7 +8,7 @@ from typing import Optional
 def detect_provider() -> Optional[str]:
    """Resolve the active Hermes runtime provider, or None if unavailable."""
    try:
-        from hermes_agent.cli.runtime_provider import resolve_runtime_provider
+        from hermes_cli.runtime_provider import resolve_runtime_provider
        runtime = resolve_runtime_provider()
        api_key = runtime.get("api_key")
        provider = runtime.get("provider")
--- a/hermes_agent/acp/entry.py
+++ b/hermes_agent/acp/entry.py
@@ -17,47 +17,7 @@ import asyncio
 import logging
 import sys
 from pathlib import Path
-from hermes_agent.constants import get_hermes_home
-
-
-# Methods clients send as periodic liveness probes. They are not part of the
-# ACP schema, so the acp router correctly returns JSON-RPC -32601 to the
-# caller — but the supervisor task that dispatches the request then surfaces
-# the raised RequestError via ``logging.exception("Background task failed")``,
-# which dumps a traceback to stderr every probe interval. Clients like
-# acp-bridge already treat the -32601 response as "agent alive", so the
-# traceback is pure noise. We keep the protocol response intact and only
-# silence the stderr noise for this specific benign case.
-_BENIGN_PROBE_METHODS = frozenset({"ping", "health", "healthcheck"})
-
-
-class _BenignProbeMethodFilter(logging.Filter):
-    """Suppress acp 'Background task failed' tracebacks caused by unknown
-    liveness-probe methods (e.g. ``ping``) while leaving every other
-    background-task error — including method_not_found for any non-probe
-    method — visible in stderr.
-    """
-
-    def filter(self, record: logging.LogRecord) -> bool:
-        if record.getMessage() != "Background task failed":
-            return True
-        exc_info = record.exc_info
-        if not exc_info:
-            return True
-        exc = exc_info[1]
-        # Imported lazily so this module stays importable when the optional
-        # ``agent-client-protocol`` dependency is not installed.
-        try:
-            from acp.exceptions import RequestError
-        except ImportError:
-            return True
-        if not isinstance(exc, RequestError):
-            return True
-        if getattr(exc, "code", None) != -32601:
-            return True
-        data = getattr(exc, "data", None)
-        method = data.get("method") if isinstance(data, dict) else None
-        return method not in _BENIGN_PROBE_METHODS
+from hermes_constants import get_hermes_home


 def _setup_logging() -> None:
@@ -69,7 +29,6 @@ def _setup_logging() -> None:
            datefmt="%Y-%m-%d %H:%M:%S",
        )
    )
-    handler.addFilter(_BenignProbeMethodFilter())
    root = logging.getLogger()
    root.handlers.clear()
    root.addHandler(handler)
@@ -83,7 +42,7 @@ def _setup_logging() -> None:

 def _load_env() -> None:
    """Load .env from HERMES_HOME (default ``~/.hermes``)."""
-    from hermes_agent.cli.env_loader import load_hermes_dotenv
+    from hermes_cli.env_loader import load_hermes_dotenv

    hermes_home = get_hermes_home()
    loaded = load_hermes_dotenv(hermes_home=hermes_home)
@@ -104,6 +63,11 @@ def main() -> None:
    logger = logging.getLogger(__name__)
    logger.info("Starting hermes-agent ACP adapter")

+    # Ensure the project root is on sys.path so ``from run_agent import AIAgent`` works
+    project_root = str(Path(__file__).resolve().parent.parent)
+    if project_root not in sys.path:
+        sys.path.insert(0, project_root)
+
    import acp
    from .server import HermesACPAgent

--- a/hermes_agent/acp/events.py
+++ b/hermes_agent/acp/events.py
@@ -49,7 +49,6 @@ def make_tool_progress_cb(
    session_id: str,
    loop: asyncio.AbstractEventLoop,
    tool_call_ids: Dict[str, Deque[str]],
-    tool_call_meta: Dict[str, Dict[str, Any]],
 ) -> Callable:
    """Create a ``tool_progress_callback`` for AIAgent.

@@ -85,16 +84,6 @@ def make_tool_progress_cb(
            tool_call_ids[name] = queue
        queue.append(tc_id)

-        snapshot = None
-        if name in {"write_file", "patch", "skill_manage"}:
-            try:
-                from hermes_agent.agent.display import capture_local_edit_snapshot
-
-                snapshot = capture_local_edit_snapshot(name, args)
-            except Exception:
-                logger.debug("Failed to capture ACP edit snapshot for %s", name, exc_info=True)
-        tool_call_meta[tc_id] = {"args": args, "snapshot": snapshot}
-
        update = build_tool_start(tc_id, name, args)
        _send_update(conn, session_id, loop, update)

@@ -130,7 +119,6 @@ def make_step_cb(
    session_id: str,
    loop: asyncio.AbstractEventLoop,
    tool_call_ids: Dict[str, Deque[str]],
-    tool_call_meta: Dict[str, Dict[str, Any]],
 ) -> Callable:
    """Create a ``step_callback`` for AIAgent.

@@ -144,12 +132,10 @@ def make_step_cb(
            for tool_info in prev_tools:
                tool_name = None
                result = None
-                function_args = None

                if isinstance(tool_info, dict):
                    tool_name = tool_info.get("name") or tool_info.get("function_name")
                    result = tool_info.get("result") or tool_info.get("output")
-                    function_args = tool_info.get("arguments") or tool_info.get("args")
                elif isinstance(tool_info, str):
                    tool_name = tool_info

@@ -159,13 +145,8 @@ def make_step_cb(
                    tool_call_ids[tool_name] = queue
                if tool_name and queue:
                    tc_id = queue.popleft()
-                    meta = tool_call_meta.pop(tc_id, {})
                    update = build_tool_complete(
-                        tc_id,
-                        tool_name,
-                        result=str(result) if result is not None else None,
-                        function_args=function_args or meta.get("args"),
-                        snapshot=meta.get("snapshot"),
+                        tc_id, tool_name, result=str(result) if result is not None else None
                    )
                    _send_update(conn, session_id, loop, update)
                    if not queue:
--- a/hermes_agent/acp/permissions.py
+++ b/hermes_agent/acp/permissions.py
@@ -63,9 +63,6 @@ def make_approval_callback(
            logger.warning("Permission request timed out or failed: %s", exc)
            return "deny"

-        if response is None:
-            return "deny"
-
        outcome = response.outcome
        if isinstance(outcome, AllowedOutcome):
            option_id = outcome.option_id
--- a/hermes_agent/acp/server.py
+++ b/hermes_agent/acp/server.py
@@ -4,7 +4,6 @@ from __future__ import annotations

 import asyncio
 import logging
-import os
 from collections import defaultdict, deque
 from concurrent.futures import ThreadPoolExecutor
 from typing import Any, Deque, Optional
@@ -27,7 +26,6 @@ from acp.schema import (
    McpServerHttp,
    McpServerSse,
    McpServerStdio,
-    ModelInfo,
    NewSessionResponse,
    PromptResponse,
    ResumeSessionResponse,
@@ -38,7 +36,6 @@ from acp.schema import (
    SessionCapabilities,
    SessionForkCapabilities,
    SessionListCapabilities,
-    SessionModelState,
    SessionResumeCapabilities,
    SessionInfo,
    TextContentBlock,
@@ -52,31 +49,26 @@ try:
 except ImportError:
    from acp.schema import AuthMethod as AuthMethodAgent  # type: ignore[attr-defined]

-from hermes_agent.acp.auth import detect_provider
-from hermes_agent.acp.events import (
+from acp_adapter.auth import detect_provider, has_provider
+from acp_adapter.events import (
    make_message_cb,
    make_step_cb,
    make_thinking_cb,
    make_tool_progress_cb,
 )
-from hermes_agent.acp.permissions import make_approval_callback
-from hermes_agent.acp.session import SessionManager, SessionState
+from acp_adapter.permissions import make_approval_callback
+from acp_adapter.session import SessionManager, SessionState

 logger = logging.getLogger(__name__)

 try:
-    from hermes_agent.cli import __version__ as HERMES_VERSION
+    from hermes_cli import __version__ as HERMES_VERSION
 except Exception:
    HERMES_VERSION = "0.0.0"

 # Thread pool for running AIAgent (synchronous) in parallel.
 _executor = ThreadPoolExecutor(max_workers=4, thread_name_prefix="acp-agent")

-# Server-side page size for list_sessions. The ACP ListSessionsRequest schema
-# does not expose a client-side limit, so this is a fixed cap that clients
-# paginate against using `cursor` / `next_cursor`.
-_LIST_SESSIONS_PAGE_SIZE = 50
-

 def _extract_text(
    prompt: list[
@@ -155,98 +147,6 @@ class HermesACPAgent(acp.Agent):
        self._conn = conn
        logger.info("ACP client connected")

-    @staticmethod
-    def _encode_model_choice(provider: str | None, model: str | None) -> str:
-        """Encode a model selection so ACP clients can keep provider context."""
-        raw_model = str(model or "").strip()
-        if not raw_model:
-            return ""
-        raw_provider = str(provider or "").strip().lower()
-        if not raw_provider:
-            return raw_model
-        return f"{raw_provider}:{raw_model}"
-
-    def _build_model_state(self, state: SessionState) -> SessionModelState | None:
-        """Return the ACP model selector payload for editors like Zed."""
-        model = str(state.model or getattr(state.agent, "model", "") or "").strip()
-        provider = getattr(state.agent, "provider", None) or detect_provider() or "openrouter"
-
-        try:
-            from hermes_agent.cli.models.models import curated_models_for_provider, normalize_provider, provider_label
-
-            normalized_provider = normalize_provider(provider)
-            provider_name = provider_label(normalized_provider)
-            available_models: list[ModelInfo] = []
-            seen_ids: set[str] = set()
-
-            for model_id, description in curated_models_for_provider(normalized_provider):
-                rendered_model = str(model_id or "").strip()
-                if not rendered_model:
-                    continue
-                choice_id = self._encode_model_choice(normalized_provider, rendered_model)
-                if choice_id in seen_ids:
-                    continue
-                desc_parts = [f"Provider: {provider_name}"]
-                if description:
-                    desc_parts.append(str(description).strip())
-                if rendered_model == model:
-                    desc_parts.append("current")
-                available_models.append(
-                    ModelInfo(
-                        model_id=choice_id,
-                        name=rendered_model,
-                        description=" • ".join(part for part in desc_parts if part),
-                    )
-                )
-                seen_ids.add(choice_id)
-
-            current_model_id = self._encode_model_choice(normalized_provider, model)
-            if current_model_id and current_model_id not in seen_ids:
-                available_models.insert(
-                    0,
-                    ModelInfo(
-                        model_id=current_model_id,
-                        name=model,
-                        description=f"Provider: {provider_name} • current",
-                    ),
-                )
-
-            if available_models:
-                return SessionModelState(
-                    available_models=available_models,
-                    current_model_id=current_model_id or available_models[0].model_id,
-                )
-        except Exception:
-            logger.debug("Could not build ACP model state", exc_info=True)
-
-        if not model:
-            return None
-
-        fallback_choice = self._encode_model_choice(provider, model)
-        return SessionModelState(
-            available_models=[ModelInfo(model_id=fallback_choice, name=model)],
-            current_model_id=fallback_choice,
-        )
-
-    @staticmethod
-    def _resolve_model_selection(raw_model: str, current_provider: str) -> tuple[str, str]:
-        """Resolve ``provider:model`` input into the provider and normalized model id."""
-        target_provider = current_provider
-        new_model = raw_model.strip()
-
-        try:
-            from hermes_agent.cli.models.models import detect_provider_for_model, parse_model_input
-
-            target_provider, new_model = parse_model_input(new_model, current_provider)
-            if target_provider == current_provider:
-                detected = detect_provider_for_model(new_model, current_provider)
-                if detected:
-                    target_provider, new_model = detected
-        except Exception:
-            logger.debug("Provider detection failed, using model as-is", exc_info=True)
-
-        return target_provider, new_model
-
    async def _register_session_mcp_servers(
        self,
        state: SessionState,
@@ -257,7 +157,7 @@ class HermesACPAgent(acp.Agent):
            return

        try:
-            from hermes_agent.tools.mcp.tool import register_mcp_servers
+            from tools.mcp_tool import register_mcp_servers

            config_map: dict[str, dict] = {}
            for server in mcp_servers:
@@ -285,7 +185,7 @@ class HermesACPAgent(acp.Agent):
            return

        try:
-            from hermes_agent.tools.dispatch import get_tool_definitions
+            from model_tools import get_tool_definitions

            enabled_toolsets = getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
            disabled_toolsets = getattr(state.agent, "disabled_toolsets", None)
@@ -357,18 +257,9 @@ class HermesACPAgent(acp.Agent):
        )

    async def authenticate(self, method_id: str, **kwargs: Any) -> AuthenticateResponse | None:
-        # Only accept authenticate() calls whose method_id matches the
-        # provider we advertised in initialize(). Without this check,
-        # authenticate() would acknowledge any method_id as long as the
-        # server has provider credentials configured — harmless under
-        # Hermes' threat model (ACP is stdio-only, local-trust), but poor
-        # API hygiene and confusing if ACP ever grows multi-method auth.
-        provider = detect_provider()
-        if not provider:
-            return None
-        if not isinstance(method_id, str) or method_id.strip().lower() != provider:
-            return None
-        return AuthenticateResponse()
+        if has_provider():
+            return AuthenticateResponse()
+        return None

    # ---- Session management -------------------------------------------------

@@ -382,10 +273,7 @@ class HermesACPAgent(acp.Agent):
        await self._register_session_mcp_servers(state, mcp_servers)
        logger.info("New session %s (cwd=%s)", state.session_id, cwd)
        self._schedule_available_commands_update(state.session_id)
-        return NewSessionResponse(
-            session_id=state.session_id,
-            models=self._build_model_state(state),
-        )
+        return NewSessionResponse(session_id=state.session_id)

    async def load_session(
        self,
@@ -401,7 +289,7 @@ class HermesACPAgent(acp.Agent):
        await self._register_session_mcp_servers(state, mcp_servers)
        logger.info("Loaded session %s", session_id)
        self._schedule_available_commands_update(session_id)
-        return LoadSessionResponse(models=self._build_model_state(state))
+        return LoadSessionResponse()

    async def resume_session(
        self,
@@ -417,7 +305,7 @@ class HermesACPAgent(acp.Agent):
        await self._register_session_mcp_servers(state, mcp_servers)
        logger.info("Resumed session %s", state.session_id)
        self._schedule_available_commands_update(state.session_id)
-        return ResumeSessionResponse(models=self._build_model_state(state))
+        return ResumeSessionResponse()

    async def cancel(self, session_id: str, **kwargs: Any) -> None:
        state = self.session_manager.get_session(session_id)
@@ -452,44 +340,12 @@ class HermesACPAgent(acp.Agent):
        cwd: str | None = None,
        **kwargs: Any,
    ) -> ListSessionsResponse:
-        """List ACP sessions with optional ``cwd`` filtering and cursor pagination.
-
-        ``cwd`` is passed through to ``SessionManager.list_sessions`` which already
-        normalizes and filters by working directory. ``cursor`` is a ``session_id``
-        previously returned as ``next_cursor``; results resume after that entry.
-        Server-side page size is capped at ``_LIST_SESSIONS_PAGE_SIZE``; when more
-        results remain, ``next_cursor`` is set to the last returned ``session_id``.
-        """
-        infos = self.session_manager.list_sessions(cwd=cwd)
-
-        if cursor:
-            for idx, s in enumerate(infos):
-                if s["session_id"] == cursor:
-                    infos = infos[idx + 1:]
-                    break
-            else:
-                # Unknown cursor -> empty page (do not fall back to full list).
-                infos = []
-
-        has_more = len(infos) > _LIST_SESSIONS_PAGE_SIZE
-        infos = infos[:_LIST_SESSIONS_PAGE_SIZE]
-
-        sessions = []
-        for s in infos:
-            updated_at = s.get("updated_at")
-            if updated_at is not None and not isinstance(updated_at, str):
-                updated_at = str(updated_at)
-            sessions.append(
-                SessionInfo(
-                    session_id=s["session_id"],
-                    cwd=s["cwd"],
-                    title=s.get("title"),
-                    updated_at=updated_at,
-                )
-            )
-
-        next_cursor = sessions[-1].session_id if has_more and sessions else None
-        return ListSessionsResponse(sessions=sessions, next_cursor=next_cursor)
+        infos = self.session_manager.list_sessions()
+        sessions = [
+            SessionInfo(session_id=s["session_id"], cwd=s["cwd"])
+            for s in infos
+        ]
+        return ListSessionsResponse(sessions=sessions)

    # ---- Prompt (core) ------------------------------------------------------

@@ -533,13 +389,12 @@ class HermesACPAgent(acp.Agent):
            state.cancel_event.clear()

        tool_call_ids: dict[str, Deque[str]] = defaultdict(deque)
-        tool_call_meta: dict[str, dict[str, Any]] = {}
        previous_approval_cb = None

        if conn:
-            tool_progress_cb = make_tool_progress_cb(conn, session_id, loop, tool_call_ids, tool_call_meta)
+            tool_progress_cb = make_tool_progress_cb(conn, session_id, loop, tool_call_ids)
            thinking_cb = make_thinking_cb(conn, session_id, loop)
-            step_cb = make_step_cb(conn, session_id, loop, tool_call_ids, tool_call_meta)
+            step_cb = make_step_cb(conn, session_id, loop, tool_call_ids)
            message_cb = make_message_cb(conn, session_id, loop)
            approval_cb = make_approval_callback(conn.request_permission, loop, session_id)
        else:
@@ -555,32 +410,15 @@ class HermesACPAgent(acp.Agent):
        agent.step_callback = step_cb
        agent.message_callback = message_cb

-        # Approval callback is per-thread (thread-local, GHSA-qg5c-hvr5-hjgr).
-        # Set it INSIDE _run_agent so the TLS write happens in the executor
-        # thread — setting it here would write to the event-loop thread's TLS,
-        # not the executor's. Also set HERMES_INTERACTIVE so approval.py
-        # takes the CLI-interactive path (which calls the registered
-        # callback via prompt_dangerous_approval) instead of the
-        # non-interactive auto-approve branch (GHSA-96vc-wcxf-jjff).
-        # ACP's conn.request_permission maps cleanly to the interactive
-        # callback shape — not the gateway-queue HERMES_EXEC_ASK path,
-        # which requires a notify_cb registered in _gateway_notify_cbs.
-        previous_approval_cb = None
-        previous_interactive = None
+        if approval_cb:
+            try:
+                from tools import terminal_tool as _terminal_tool
+                previous_approval_cb = getattr(_terminal_tool, "_approval_callback", None)
+                _terminal_tool.set_approval_callback(approval_cb)
+            except Exception:
+                logger.debug("Could not set ACP approval callback", exc_info=True)

        def _run_agent() -> dict:
-            nonlocal previous_approval_cb, previous_interactive
-            if approval_cb:
-                try:
-                    from hermes_agent.tools import terminal as _terminal_tool
-                    previous_approval_cb = _terminal_tool._get_approval_callback()
-                    _terminal_tool.set_approval_callback(approval_cb)
-                except Exception:
-                    logger.debug("Could not set ACP approval callback", exc_info=True)
-            # Signal to tools.approval that we have an interactive callback
-            # and the non-interactive auto-approve path must not fire.
-            previous_interactive = os.environ.get("HERMES_INTERACTIVE")
-            os.environ["HERMES_INTERACTIVE"] = "1"
            try:
                result = agent.run_conversation(
                    user_message=user_text,
@@ -592,14 +430,9 @@ class HermesACPAgent(acp.Agent):
                logger.exception("Agent error in session %s", session_id)
                return {"final_response": f"Error: {e}", "messages": state.history}
            finally:
-                # Restore HERMES_INTERACTIVE.
-                if previous_interactive is None:
-                    os.environ.pop("HERMES_INTERACTIVE", None)
-                else:
-                    os.environ["HERMES_INTERACTIVE"] = previous_interactive
                if approval_cb:
                    try:
-                        from hermes_agent.tools import terminal as _terminal_tool
+                        from tools import terminal_tool as _terminal_tool
                        _terminal_tool.set_approval_callback(previous_approval_cb)
                    except Exception:
                        logger.debug("Could not restore approval callback", exc_info=True)
@@ -616,19 +449,6 @@ class HermesACPAgent(acp.Agent):
            self.session_manager.save_session(session_id)

        final_response = result.get("final_response", "")
-        if final_response:
-            try:
-                from hermes_agent.agent.title_generator import maybe_auto_title
-
-                maybe_auto_title(
-                    self.session_manager._get_db(),
-                    session_id,
-                    user_text,
-                    final_response,
-                    state.history,
-                )
-            except Exception:
-                logger.debug("Failed to auto-title ACP session %s", session_id, exc_info=True)
        if final_response and conn:
            update = acp.update_agent_message_text(final_response)
            await conn.session_update(session_id, update)
@@ -673,8 +493,8 @@ class HermesACPAgent(acp.Agent):
            await self._conn.session_update(
                session_id=session_id,
                update=AvailableCommandsUpdate(
-                    session_update="available_commands_update",
-                    available_commands=self._available_commands(),
+                    sessionUpdate="available_commands_update",
+                    availableCommands=self._available_commands(),
                ),
            )
        except Exception:
@@ -736,15 +556,27 @@ class HermesACPAgent(acp.Agent):
            provider = getattr(state.agent, "provider", None) or "auto"
            return f"Current model: {model}\nProvider: {provider}"

+        new_model = args.strip()
+        target_provider = None
        current_provider = getattr(state.agent, "provider", None) or "openrouter"
-        target_provider, new_model = self._resolve_model_selection(args, current_provider)
+
+        # Auto-detect provider for the requested model
+        try:
+            from hermes_cli.models import parse_model_input, detect_provider_for_model
+            target_provider, new_model = parse_model_input(new_model, current_provider)
+            if target_provider == current_provider:
+                detected = detect_provider_for_model(new_model, current_provider)
+                if detected:
+                    target_provider, new_model = detected
+        except Exception:
+            logger.debug("Provider detection failed, using model as-is", exc_info=True)

        state.model = new_model
        state.agent = self.session_manager._make_agent(
            session_id=state.session_id,
            cwd=state.cwd,
            model=new_model,
-            requested_provider=target_provider,
+            requested_provider=target_provider or current_provider,
        )
        self.session_manager.save_session(state.session_id)
        provider_label = getattr(state.agent, "provider", None) or target_provider or current_provider
@@ -753,7 +585,7 @@ class HermesACPAgent(acp.Agent):

    def _cmd_tools(self, args: str, state: SessionState) -> str:
        try:
-            from hermes_agent.tools.dispatch import get_tool_definitions
+            from model_tools import get_tool_definitions
            toolsets = getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
            tools = get_tool_definitions(enabled_toolsets=toolsets, quiet_mode=True)
            if not tools:
@@ -804,7 +636,7 @@ class HermesACPAgent(acp.Agent):
            if not hasattr(agent, "_compress_context"):
                return "Context compression not available for this agent."

-            from hermes_agent.providers.metadata import estimate_messages_tokens_rough
+            from agent.model_metadata import estimate_messages_tokens_rough

            original_count = len(state.history)
            approx_tokens = estimate_messages_tokens_rough(state.history)
@@ -846,30 +678,20 @@ class HermesACPAgent(acp.Agent):
        """Switch the model for a session (called by ACP protocol)."""
        state = self.session_manager.get_session(session_id)
        if state:
+            state.model = model_id
            current_provider = getattr(state.agent, "provider", None)
-            requested_provider, resolved_model = self._resolve_model_selection(
-                model_id,
-                current_provider or "openrouter",
-            )
-            state.model = resolved_model
-            provider_changed = bool(current_provider and requested_provider != current_provider)
-            current_base_url = None if provider_changed else getattr(state.agent, "base_url", None)
-            current_api_mode = None if provider_changed else getattr(state.agent, "api_mode", None)
+            current_base_url = getattr(state.agent, "base_url", None)
+            current_api_mode = getattr(state.agent, "api_mode", None)
            state.agent = self.session_manager._make_agent(
                session_id=session_id,
                cwd=state.cwd,
-                model=resolved_model,
-                requested_provider=requested_provider,
+                model=model_id,
+                requested_provider=current_provider,
                base_url=current_base_url,
                api_mode=current_api_mode,
            )
            self.session_manager.save_session(session_id)
-            logger.info(
-                "Session %s: model switched to %s via provider %s",
-                session_id,
-                resolved_model,
-                requested_provider,
-            )
+            logger.info("Session %s: model switched to %s", session_id, model_id)
            return SetSessionModelResponse()
        logger.warning("Session %s: model switch requested for missing session", session_id)
        return None
--- a/hermes_agent/acp/session.py
+++ b/hermes_agent/acp/session.py
@@ -8,17 +8,13 @@ history.
 """
 from __future__ import annotations

-from hermes_agent.constants import get_hermes_home
+from hermes_constants import get_hermes_home

 import copy
 import json
 import logging
-import os
-import re
 import sys
-import time
 import uuid
-from datetime import datetime, timezone
 from dataclasses import dataclass, field
 from threading import Lock
 from typing import Any, Dict, List, Optional
@@ -26,64 +22,6 @@ from typing import Any, Dict, List, Optional
 logger = logging.getLogger(__name__)


-def _normalize_cwd_for_compare(cwd: str | None) -> str:
-    raw = str(cwd or ".").strip()
-    if not raw:
-        raw = "."
-    expanded = os.path.expanduser(raw)
-
-    # Normalize Windows drive paths into the equivalent WSL mount form so
-    # ACP history filters match the same workspace across Windows and WSL.
-    match = re.match(r"^([A-Za-z]):[\\/](.*)$", expanded)
-    if match:
-        drive = match.group(1).lower()
-        tail = match.group(2).replace("\\", "/")
-        expanded = f"/mnt/{drive}/{tail}"
-    elif re.match(r"^/mnt/[A-Za-z]/", expanded):
-        expanded = f"/mnt/{expanded[5].lower()}/{expanded[7:]}"
-
-    return os.path.normpath(expanded)
-
-
-def _build_session_title(title: Any, preview: Any, cwd: str | None) -> str:
-    explicit = str(title or "").strip()
-    if explicit:
-        return explicit
-    preview_text = str(preview or "").strip()
-    if preview_text:
-        return preview_text
-    leaf = os.path.basename(str(cwd or "").rstrip("/\\"))
-    return leaf or "New thread"
-
-
-def _format_updated_at(value: Any) -> str | None:
-    if value is None:
-        return None
-    if isinstance(value, str) and value.strip():
-        return value
-    try:
-        return datetime.fromtimestamp(float(value), tz=timezone.utc).isoformat()
-    except Exception:
-        return None
-
-
-def _updated_at_sort_key(value: Any) -> float:
-    if value is None:
-        return float("-inf")
-    if isinstance(value, (int, float)):
-        return float(value)
-    raw = str(value).strip()
-    if not raw:
-        return float("-inf")
-    try:
-        return datetime.fromisoformat(raw.replace("Z", "+00:00")).timestamp()
-    except Exception:
-        try:
-            return float(raw)
-        except Exception:
-            return float("-inf")
-
-
 def _acp_stderr_print(*args, **kwargs) -> None:
    """Best-effort human-readable output sink for ACP stdio sessions.

@@ -100,7 +38,7 @@ def _register_task_cwd(task_id: str, cwd: str) -> None:
    if not task_id:
        return
    try:
-        from hermes_agent.tools.terminal import register_task_env_overrides
+        from tools.terminal_tool import register_task_env_overrides
        register_task_env_overrides(task_id, {"cwd": cwd})
    except Exception:
        logger.debug("Failed to register ACP task cwd override", exc_info=True)
@@ -111,7 +49,7 @@ def _clear_task_cwd(task_id: str) -> None:
    if not task_id:
        return
    try:
-        from hermes_agent.tools.terminal import clear_task_env_overrides
+        from tools.terminal_tool import clear_task_env_overrides
        clear_task_env_overrides(task_id)
    except Exception:
        logger.debug("Failed to clear ACP task cwd override", exc_info=True)
@@ -224,78 +162,47 @@ class SessionManager:
        logger.info("Forked ACP session %s -> %s", session_id, new_id)
        return state

-    def list_sessions(self, cwd: str | None = None) -> List[Dict[str, Any]]:
+    def list_sessions(self) -> List[Dict[str, Any]]:
        """Return lightweight info dicts for all sessions (memory + database)."""
-        normalized_cwd = _normalize_cwd_for_compare(cwd) if cwd else None
-        db = self._get_db()
-        persisted_rows: dict[str, dict[str, Any]] = {}
-
-        if db is not None:
-            try:
-                for row in db.list_sessions_rich(source="acp", limit=1000):
-                    persisted_rows[str(row["id"])] = dict(row)
-            except Exception:
-                logger.debug("Failed to load ACP sessions from DB", exc_info=True)
-
        # Collect in-memory sessions first.
        with self._lock:
            seen_ids = set(self._sessions.keys())
-            results = []
-            for s in self._sessions.values():
-                history_len = len(s.history)
-                if history_len <= 0:
-                    continue
-                if normalized_cwd and _normalize_cwd_for_compare(s.cwd) != normalized_cwd:
-                    continue
-                persisted = persisted_rows.get(s.session_id, {})
-                preview = next(
-                    (
-                        str(msg.get("content") or "").strip()
-                        for msg in s.history
-                        if msg.get("role") == "user" and str(msg.get("content") or "").strip()
-                    ),
-                    persisted.get("preview") or "",
-                )
-                results.append(
-                    {
-                        "session_id": s.session_id,
-                        "cwd": s.cwd,
-                        "model": s.model,
-                        "history_len": history_len,
-                        "title": _build_session_title(persisted.get("title"), preview, s.cwd),
-                        "updated_at": _format_updated_at(
-                            persisted.get("last_active") or persisted.get("started_at") or time.time()
-                        ),
-                    }
-                )
+            results = [
+                {
+                    "session_id": s.session_id,
+                    "cwd": s.cwd,
+                    "model": s.model,
+                    "history_len": len(s.history),
+                }
+                for s in self._sessions.values()
+            ]

        # Merge any persisted sessions not currently in memory.
-        for sid, row in persisted_rows.items():
-            if sid in seen_ids:
-                continue
-            message_count = int(row.get("message_count") or 0)
-            if message_count <= 0:
-                continue
-            # Extract cwd from model_config JSON.
-            session_cwd = "."
-            mc = row.get("model_config")
-            if mc:
-                try:
-                    session_cwd = json.loads(mc).get("cwd", ".")
-                except (json.JSONDecodeError, TypeError):
-                    pass
-            if normalized_cwd and _normalize_cwd_for_compare(session_cwd) != normalized_cwd:
-                continue
-            results.append({
-                "session_id": sid,
-                "cwd": session_cwd,
-                "model": row.get("model") or "",
-                "history_len": message_count,
-                "title": _build_session_title(row.get("title"), row.get("preview"), session_cwd),
-                "updated_at": _format_updated_at(row.get("last_active") or row.get("started_at")),
-            })
+        db = self._get_db()
+        if db is not None:
+            try:
+                rows = db.search_sessions(source="acp", limit=1000)
+                for row in rows:
+                    sid = row["id"]
+                    if sid in seen_ids:
+                        continue
+                    # Extract cwd from model_config JSON.
+                    cwd = "."
+                    mc = row.get("model_config")
+                    if mc:
+                        try:
+                            cwd = json.loads(mc).get("cwd", ".")
+                        except (json.JSONDecodeError, TypeError):
+                            pass
+                    results.append({
+                        "session_id": sid,
+                        "cwd": cwd,
+                        "model": row.get("model") or "",
+                        "history_len": row.get("message_count") or 0,
+                    })
+            except Exception:
+                logger.debug("Failed to list ACP sessions from DB", exc_info=True)

-        results.sort(key=lambda item: _updated_at_sort_key(item.get("updated_at")), reverse=True)
        return results

    def update_cwd(self, session_id: str, cwd: str) -> Optional[SessionState]:
@@ -355,7 +262,7 @@ class SessionManager:
        if self._db_instance is not None:
            return self._db_instance
        try:
-            from hermes_agent.state import SessionDB
+            from hermes_state import SessionDB
            hermes_home = get_hermes_home()
            self._db_instance = SessionDB(db_path=hermes_home / "state.db")
            return self._db_instance
@@ -523,9 +430,9 @@ class SessionManager:
        if self._agent_factory is not None:
            return self._agent_factory()

-        from hermes_agent.agent.loop import AIAgent
-        from hermes_agent.cli.config import load_config
-        from hermes_agent.cli.runtime_provider import resolve_runtime_provider
+        from run_agent import AIAgent
+        from hermes_cli.config import load_config
+        from hermes_cli.runtime_provider import resolve_runtime_provider

        config = load_config()
        model_cfg = config.get("model")
--- a/hermes_agent/acp/tools.py
+++ b/hermes_agent/acp/tools.py
@@ -2,7 +2,6 @@

 from __future__ import annotations

-import json
 import uuid
 from typing import Any, Dict, List, Optional

@@ -97,170 +96,6 @@ def build_tool_title(tool_name: str, args: Dict[str, Any]) -> str:
    return tool_name


-def _build_patch_mode_content(patch_text: str) -> List[Any]:
-    """Parse V4A patch mode input into ACP diff blocks when possible."""
-    if not patch_text:
-        return [acp.tool_content(acp.text_block(""))]
-
-    try:
-        from hermes_agent.tools.patch_parser import OperationType, parse_v4a_patch
-
-        operations, error = parse_v4a_patch(patch_text)
-        if error or not operations:
-            return [acp.tool_content(acp.text_block(patch_text))]
-
-        content: List[Any] = []
-        for op in operations:
-            if op.operation == OperationType.UPDATE:
-                old_chunks: list[str] = []
-                new_chunks: list[str] = []
-                for hunk in op.hunks:
-                    old_lines = [line.content for line in hunk.lines if line.prefix in (" ", "-")]
-                    new_lines = [line.content for line in hunk.lines if line.prefix in (" ", "+")]
-                    if old_lines or new_lines:
-                        old_chunks.append("\n".join(old_lines))
-                        new_chunks.append("\n".join(new_lines))
-
-                old_text = "\n...\n".join(chunk for chunk in old_chunks if chunk)
-                new_text = "\n...\n".join(chunk for chunk in new_chunks if chunk)
-                if old_text or new_text:
-                    content.append(
-                        acp.tool_diff_content(
-                            path=op.file_path,
-                            old_text=old_text or None,
-                            new_text=new_text or "",
-                        )
-                    )
-                continue
-
-            if op.operation == OperationType.ADD:
-                added_lines = [line.content for hunk in op.hunks for line in hunk.lines if line.prefix == "+"]
-                content.append(
-                    acp.tool_diff_content(
-                        path=op.file_path,
-                        new_text="\n".join(added_lines),
-                    )
-                )
-                continue
-
-            if op.operation == OperationType.DELETE:
-                content.append(
-                    acp.tool_diff_content(
-                        path=op.file_path,
-                        old_text=f"Delete file: {op.file_path}",
-                        new_text="",
-                    )
-                )
-                continue
-
-            if op.operation == OperationType.MOVE:
-                content.append(
-                    acp.tool_content(acp.text_block(f"Move file: {op.file_path} -> {op.new_path}"))
-                )
-
-        return content or [acp.tool_content(acp.text_block(patch_text))]
-    except Exception:
-        return [acp.tool_content(acp.text_block(patch_text))]
-
-
-def _strip_diff_prefix(path: str) -> str:
-    raw = str(path or "").strip()
-    if raw.startswith(("a/", "b/")):
-        return raw[2:]
-    return raw
-
-
-def _parse_unified_diff_content(diff_text: str) -> List[Any]:
-    """Convert unified diff text into ACP diff content blocks."""
-    if not diff_text:
-        return []
-
-    content: List[Any] = []
-    current_old_path: Optional[str] = None
-    current_new_path: Optional[str] = None
-    old_lines: list[str] = []
-    new_lines: list[str] = []
-
-    def _flush() -> None:
-        nonlocal current_old_path, current_new_path, old_lines, new_lines
-        if current_old_path is None and current_new_path is None:
-            return
-        path = current_new_path if current_new_path and current_new_path != "/dev/null" else current_old_path
-        if not path or path == "/dev/null":
-            current_old_path = None
-            current_new_path = None
-            old_lines = []
-            new_lines = []
-            return
-        content.append(
-            acp.tool_diff_content(
-                path=_strip_diff_prefix(path),
-                old_text="\n".join(old_lines) if old_lines else None,
-                new_text="\n".join(new_lines),
-            )
-        )
-        current_old_path = None
-        current_new_path = None
-        old_lines = []
-        new_lines = []
-
-    for line in diff_text.splitlines():
-        if line.startswith("--- "):
-            _flush()
-            current_old_path = line[4:].strip()
-            continue
-        if line.startswith("+++ "):
-            current_new_path = line[4:].strip()
-            continue
-        if line.startswith("@@"):
-            continue
-        if current_old_path is None and current_new_path is None:
-            continue
-        if line.startswith("+"):
-            new_lines.append(line[1:])
-        elif line.startswith("-"):
-            old_lines.append(line[1:])
-        elif line.startswith(" "):
-            shared = line[1:]
-            old_lines.append(shared)
-            new_lines.append(shared)
-
-    _flush()
-    return content
-
-
-def _build_tool_complete_content(
-    tool_name: str,
-    result: Optional[str],
-    *,
-    function_args: Optional[Dict[str, Any]] = None,
-    snapshot: Any = None,
-) -> List[Any]:
-    """Build structured ACP completion content, falling back to plain text."""
-    display_result = result or ""
-    if len(display_result) > 5000:
-        display_result = display_result[:4900] + f"\n... ({len(result)} chars total, truncated)"
-
-    if tool_name in {"write_file", "patch", "skill_manage"}:
-        try:
-            from hermes_agent.agent.display import extract_edit_diff
-
-            diff_text = extract_edit_diff(
-                tool_name,
-                result,
-                function_args=function_args,
-                snapshot=snapshot,
-            )
-            if isinstance(diff_text, str) and diff_text.strip():
-                diff_content = _parse_unified_diff_content(diff_text)
-                if diff_content:
-                    return diff_content
-        except Exception:
-            pass
-
-    return [acp.tool_content(acp.text_block(display_result))]
-
-
 # ---------------------------------------------------------------------------
 # Build ACP content objects for tool-call events
 # ---------------------------------------------------------------------------
@@ -284,8 +119,9 @@ def build_tool_start(
            new = arguments.get("new_string", "")
            content = [acp.tool_diff_content(path=path, new_text=new, old_text=old)]
        else:
+            # Patch mode — show the patch content as text
            patch_text = arguments.get("patch", "")
-            content = _build_patch_mode_content(patch_text)
+            content = [acp.tool_content(acp.text_block(patch_text))]
        return acp.start_tool_call(
            tool_call_id, title, kind=kind, content=content, locations=locations,
            raw_input=arguments,
@@ -342,17 +178,16 @@ def build_tool_complete(
    tool_call_id: str,
    tool_name: str,
    result: Optional[str] = None,
-    function_args: Optional[Dict[str, Any]] = None,
-    snapshot: Any = None,
 ) -> ToolCallProgress:
    """Create a ToolCallUpdate (progress) event for a completed tool call."""
    kind = get_tool_kind(tool_name)
-    content = _build_tool_complete_content(
-        tool_name,
-        result,
-        function_args=function_args,
-        snapshot=snapshot,
-    )
+
+    # Truncate very large results for the UI
+    display_result = result or ""
+    if len(display_result) > 5000:
+        display_result = display_result[:4900] + f"\n... ({len(result)} chars total, truncated)"
+
+    content = [acp.tool_content(acp.text_block(display_result))]
    return acp.update_tool_call(
        tool_call_id,
        kind=kind,
--- a/hermes_agent/agent/init.py
+++ b/hermes_agent/agent/init.py
--- a/hermes_agent/providers/anthropic_adapter.py
+++ b/hermes_agent/providers/anthropic_adapter.py
@@ -16,10 +16,9 @@ import logging
 import os
 from pathlib import Path

-from hermes_agent.constants import get_hermes_home
+from hermes_constants import get_hermes_home
 from types import SimpleNamespace
 from typing import Any, Dict, List, Optional, Tuple
-from hermes_agent.utils import normalize_proxy_env_vars

 try:
    import anthropic as _anthropic_sdk
@@ -266,14 +265,6 @@ def _is_third_party_anthropic_endpoint(base_url: str | None) -> bool:
    return True  # Any other endpoint is a third-party proxy


-def _is_kimi_coding_endpoint(base_url: str | None) -> bool:
-    """Return True for Kimi's /coding endpoint that requires claude-code UA."""
-    normalized = _normalize_base_url_text(base_url)
-    if not normalized:
-        return False
-    return normalized.rstrip("/").lower().startswith("https://api.kimi.com/coding")
-
-
 def _requires_bearer_auth(base_url: str | None) -> bool:
    """Return True for Anthropic-compatible providers that require Bearer auth.

@@ -301,15 +292,9 @@ def _common_betas_for_base_url(base_url: str | None) -> list[str]:
    return _COMMON_BETAS


-def build_anthropic_client(api_key: str, base_url: str = None, timeout: Optional[float] = None):
+def build_anthropic_client(api_key: str, base_url: str = None):
    """Create an Anthropic client, auto-detecting setup-tokens vs API keys.

-    If *timeout* is provided it overrides the default 900s read timeout.  The
-    connect timeout stays at 10s.  Callers pass this from the per-provider /
-    per-model ``request_timeout_seconds`` config so Anthropic-native and
-    Anthropic-compatible providers respect the same knob as OpenAI-wire
-    providers.
-
    Returns an anthropic.Anthropic instance.
    """
    if _anthropic_sdk is None:
@@ -317,32 +302,19 @@ def build_anthropic_client(api_key: str, base_url: str = None, timeout: Optional
            "The 'anthropic' package is required for the Anthropic provider. "
            "Install it with: pip install 'anthropic>=0.39.0'"
        )
-
-    normalize_proxy_env_vars()
-
    from httpx import Timeout

    normalized_base_url = _normalize_base_url_text(base_url)
-    _read_timeout = timeout if (isinstance(timeout, (int, float)) and timeout > 0) else 900.0
    kwargs = {
-        "timeout": Timeout(timeout=float(_read_timeout), connect=10.0),
+        "timeout": Timeout(timeout=900.0, connect=10.0),
    }
    if normalized_base_url:
        kwargs["base_url"] = normalized_base_url
    common_betas = _common_betas_for_base_url(normalized_base_url)

-    if _is_kimi_coding_endpoint(base_url):
-        # Kimi's /coding endpoint requires User-Agent: claude-code/0.1.0
-        # to be recognized as a valid Coding Agent. Without it, returns 403.
-        # Check this BEFORE _requires_bearer_auth since both match api.kimi.com/coding.
-        kwargs["api_key"] = api_key
-        kwargs["default_headers"] = {
-            "User-Agent": "claude-code/0.1.0",
-            **( {"anthropic-beta": ",".join(common_betas)} if common_betas else {} )
-        }
-    elif _requires_bearer_auth(normalized_base_url):
+    if _requires_bearer_auth(normalized_base_url):
        # Some Anthropic-compatible providers (e.g. MiniMax) expect the API key in
-        # Authorization: Bearer *** for regular API keys. Route those endpoints
+        # Authorization: Bearer even for regular API keys. Route those endpoints
        # through auth_token so the SDK sends Bearer auth instead of x-api-key.
        # Check this before OAuth token shape detection because MiniMax secrets do
        # not use Anthropic's sk-ant-api prefix and would otherwise be misread as
@@ -1426,25 +1398,11 @@ def build_anthropic_kwargs(
    # MiniMax Anthropic-compat endpoints support thinking (manual mode only,
    # not adaptive).  Haiku does NOT support extended thinking — skip entirely.
    #
-    # Kimi's /coding endpoint speaks the Anthropic Messages protocol but has
-    # its own thinking semantics: when ``thinking.enabled`` is sent, Kimi
-    # validates the message history and requires every prior assistant
-    # tool-call message to carry OpenAI-style ``reasoning_content``.  The
-    # Anthropic path never populates that field, and
-    # ``convert_messages_to_anthropic`` strips all Anthropic thinking blocks
-    # on third-party endpoints — so the request fails with HTTP 400
-    # "thinking is enabled but reasoning_content is missing in assistant
-    # tool call message at index N".  Kimi's reasoning is driven server-side
-    # on the /coding route, so skip Anthropic's thinking parameter entirely
-    # for that host.  (Kimi on chat_completions enables thinking via
-    # extra_body in the ChatCompletionsTransport — see #13503.)
-    #
    # On 4.7+ the `thinking.display` field defaults to "omitted", which
    # silently hides reasoning text that Hermes surfaces in its CLI. We
    # request "summarized" so the reasoning blocks stay populated — matching
    # 4.6 behavior and preserving the activity-feed UX during long tool runs.
-    _is_kimi_coding = _is_kimi_coding_endpoint(base_url)
-    if reasoning_config and isinstance(reasoning_config, dict) and not _is_kimi_coding:
+    if reasoning_config and isinstance(reasoning_config, dict):
        if reasoning_config.get("enabled") is not False and "haiku" not in model.lower():
            effort = str(reasoning_config.get("effort", "medium")).lower()
            budget = THINKING_BUDGET.get(effort, 8000)
@@ -1560,42 +1518,3 @@ def normalize_anthropic_response(
        ),
        finish_reason,
    )
-
-
-def normalize_anthropic_response_v2(
-    response,
-    strip_tool_prefix: bool = False,
-) -> "NormalizedResponse":
-    """Normalize Anthropic response to NormalizedResponse.
-
-    Wraps the existing normalize_anthropic_response() and maps its output
-    to the shared transport types.  This allows incremental migration —
-    one call site at a time — without changing the original function.
-    """
-    from hermes_agent.providers.types import NormalizedResponse, build_tool_call
-
-    assistant_msg, finish_reason = normalize_anthropic_response(response, strip_tool_prefix)
-
-    tool_calls = None
-    if assistant_msg.tool_calls:
-        tool_calls = [
-            build_tool_call(
-                id=tc.id,
-                name=tc.function.name,
-                arguments=tc.function.arguments,
-            )
-            for tc in assistant_msg.tool_calls
-        ]
-
-    provider_data = {}
-    if getattr(assistant_msg, "reasoning_details", None):
-        provider_data["reasoning_details"] = assistant_msg.reasoning_details
-
-    return NormalizedResponse(
-        content=assistant_msg.content,
-        tool_calls=tool_calls,
-        finish_reason=finish_reason,
-        reasoning=getattr(assistant_msg, "reasoning", None),
-        usage=None,  # Anthropic usage is on the raw response, not the normaliser
-        provider_data=provider_data or None,
-    )
--- a/hermes_agent/providers/auxiliary.py
+++ b/hermes_agent/providers/auxiliary.py
--- a/hermes_agent/providers/bedrock_adapter.py
+++ b/hermes_agent/providers/bedrock_adapter.py
--- a/hermes_agent/agent/context/compressor.py
+++ b/hermes_agent/agent/context/compressor.py
@@ -24,14 +24,13 @@ import re
 import time
 from typing import Any, Dict, List, Optional

-from hermes_agent.providers.auxiliary import call_llm
-from hermes_agent.agent.context.engine import ContextEngine
-from hermes_agent.providers.metadata import (
+from agent.auxiliary_client import call_llm
+from agent.context_engine import ContextEngine
+from agent.model_metadata import (
    MINIMUM_CONTEXT_LENGTH,
    get_model_context_length,
    estimate_messages_tokens_rough,
 )
-from hermes_agent.agent.redact import redact_sensitive_text

 logger = logging.getLogger(__name__)

@@ -64,52 +63,6 @@ _CHARS_PER_TOKEN = 4
 _SUMMARY_FAILURE_COOLDOWN_SECONDS = 600


-def _truncate_tool_call_args_json(args: str, head_chars: int = 200) -> str:
-    """Shrink long string values inside a tool-call arguments JSON blob while
-    preserving JSON validity.
-
-    The ``function.arguments`` field on a tool call is a JSON-encoded string
-    passed through to the LLM provider; downstream providers strictly
-    validate it and return a non-retryable 400 when it is not well-formed.
-    An earlier implementation sliced the raw JSON at a fixed byte offset and
-    appended ``...[truncated]`` — which routinely produced strings like::
-
-        {"path": "/foo/bar", "content": "# long markdown
-        ...[truncated]
-
-    i.e. an unterminated string and a missing closing brace. MiniMax, for
-    example, rejects this with ``invalid function arguments json string``
-    and the session gets stuck re-sending the same broken history on every
-    turn. See issue #11762 for the observed loop.
-
-    This helper parses the arguments, shrinks long string leaves inside the
-    parsed structure, and re-serialises. Non-string values (paths, ints,
-    booleans) are preserved intact. If the arguments are not valid JSON
-    to begin with — some model backends use non-JSON tool arguments — the
-    original string is returned unchanged rather than replaced with
-    something neither we nor the backend can parse.
-    """
-    try:
-        parsed = json.loads(args)
-    except (ValueError, TypeError):
-        return args
-
-    def _shrink(obj: Any) -> Any:
-        if isinstance(obj, str):
-            if len(obj) > head_chars:
-                return obj[:head_chars] + "...[truncated]"
-            return obj
-        if isinstance(obj, dict):
-            return {k: _shrink(v) for k, v in obj.items()}
-        if isinstance(obj, list):
-            return [_shrink(v) for v in obj]
-        return obj
-
-    shrunken = _shrink(parsed)
-    # ensure_ascii=False preserves CJK/emoji instead of bloating with \uXXXX
-    return json.dumps(shrunken, ensure_ascii=False)
-
-
 def _summarize_tool_result(tool_name: str, tool_args: str, tool_content: str) -> str:
    """Create an informative 1-line summary of a tool call + result.

@@ -496,11 +449,6 @@ class ContextCompressor(ContextEngine):
        # Pass 3: Truncate large tool_call arguments in assistant messages
        # outside the protected tail. write_file with 50KB content, for
        # example, survives pruning entirely without this.
-        #
-        # The shrinking is done inside the parsed JSON structure so the
-        # result remains valid JSON — otherwise downstream providers 400
-        # on every subsequent turn until the broken call falls out of
-        # the window. See ``_truncate_tool_call_args_json`` docstring.
        for i in range(prune_boundary):
            msg = result[i]
            if msg.get("role") != "assistant" or not msg.get("tool_calls"):
@@ -511,10 +459,8 @@ class ContextCompressor(ContextEngine):
                if isinstance(tc, dict):
                    args = tc.get("function", {}).get("arguments", "")
                    if len(args) > 500:
-                        new_args = _truncate_tool_call_args_json(args)
-                        if new_args != args:
-                            tc = {**tc, "function": {**tc["function"], "arguments": new_args}}
-                            modified = True
+                        tc = {**tc, "function": {**tc["function"], "arguments": args[:200] + "...[truncated]"}}
+                        modified = True
                new_tcs.append(tc)
            if modified:
                result[i] = {**msg, "tool_calls": new_tcs}
@@ -551,15 +497,11 @@ class ContextCompressor(ContextEngine):
        Includes tool call arguments and result content (up to
        ``_CONTENT_MAX`` chars per message) so the summarizer can preserve
        specific details like file paths, commands, and outputs.
-
-        All content is redacted before serialization to prevent secrets
-        (API keys, tokens, passwords) from leaking into the summary that
-        gets sent to the auxiliary model and persisted across compactions.
        """
        parts = []
        for msg in turns:
            role = msg.get("role", "unknown")
-            content = redact_sensitive_text(msg.get("content") or "")
+            content = msg.get("content") or ""

            # Tool results: keep enough content for the summarizer
            if role == "tool":
@@ -580,7 +522,7 @@ class ContextCompressor(ContextEngine):
                        if isinstance(tc, dict):
                            fn = tc.get("function", {})
                            name = fn.get("name", "?")
-                            args = redact_sensitive_text(fn.get("arguments", ""))
+                            args = fn.get("arguments", "")
                            # Truncate long arguments but keep enough for context
                            if len(args) > self._TOOL_ARGS_MAX:
                                args = args[:self._TOOL_ARGS_HEAD] + "..."
@@ -638,13 +580,7 @@ class ContextCompressor(ContextEngine):
            "assistant that continues the conversation. "
            "Do NOT respond to any questions or requests in the conversation — "
            "only output the structured summary. "
-            "Do NOT include any preamble, greeting, or prefix. "
-            "Write the summary in the same language the user was using in the "
-            "conversation — do not translate or switch to English. "
-            "NEVER include API keys, tokens, passwords, secrets, credentials, "
-            "or connection strings in the summary — replace any that appear "
-            "with [REDACTED]. Note that the user had credentials present, but "
-            "do not preserve their values."
+            "Do NOT include any preamble, greeting, or prefix."
        )

        # Shared structured template (used by both paths).
@@ -701,7 +637,7 @@ Be specific with file paths, commands, line numbers, and results.]
 [What remains to be done — framed as context, not instructions]

 ## Critical Context
-[Any specific values, error messages, configuration details, or data that would be lost without explicit preservation. NEVER include API keys, tokens, passwords, or credentials — write [REDACTED] instead.]
+[Any specific values, error messages, configuration details, or data that would be lost without explicit preservation]

 Target ~{summary_budget} tokens. Be CONCRETE — include file paths, command outputs, error messages, line numbers, and specific values. Avoid vague descriptions like "made some changes" — say exactly what changed.

@@ -741,7 +677,7 @@ Use this exact structure:
            prompt += f"""

 FOCUS TOPIC: "{focus_topic}"
-The user has requested that this compaction PRIORITISE preserving all information related to the focus topic above. For content related to "{focus_topic}", include full detail — exact values, file paths, command outputs, error messages, and decisions. For content NOT related to the focus topic, summarise more aggressively (brief one-liners or omit if truly irrelevant). The focus topic sections should receive roughly 60-70% of the summary token budget. Even for the focus topic, NEVER preserve API keys, tokens, passwords, or credentials — use [REDACTED]."""
+The user has requested that this compaction PRIORITISE preserving all information related to the focus topic above. For content related to "{focus_topic}", include full detail — exact values, file paths, command outputs, error messages, and decisions. For content NOT related to the focus topic, summarise more aggressively (brief one-liners or omit if truly irrelevant). The focus topic sections should receive roughly 60-70% of the summary token budget."""

        try:
            call_kwargs = {
@@ -764,9 +700,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
            # Handle cases where content is not a string (e.g., dict from llama.cpp)
            if not isinstance(content, str):
                content = str(content) if content else ""
-            # Redact the summary output as well — the summarizer LLM may
-            # ignore prompt instructions and echo back secrets verbatim.
-            summary = redact_sensitive_text(content.strip())
+            summary = content.strip()
            # Store for iterative updates on next compaction
            self._previous_summary = summary
            self._summary_failure_cooldown_until = 0.0
@@ -807,7 +741,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
                )
                self.summary_model = ""  # empty = use main model
                self._summary_failure_cooldown_until = 0.0  # no cooldown
-                return self._generate_summary(turns_to_summarize)  # retry immediately
+                return self._generate_summary(messages, summary_budget)  # retry immediately

            # Transient errors (timeout, rate limit, network) — shorter cooldown
            _transient_cooldown = 60
--- a/hermes_agent/agent/context/engine.py
+++ b/hermes_agent/agent/context/engine.py
--- a/hermes_agent/agent/context/references.py
+++ b/hermes_agent/agent/context/references.py
@@ -11,7 +11,7 @@ from dataclasses import dataclass, field
 from pathlib import Path
 from typing import Awaitable, Callable

-from hermes_agent.providers.metadata import estimate_tokens_rough
+from agent.model_metadata import estimate_tokens_rough

 _QUOTED_REFERENCE_VALUE = r'(?:`[^`\n]+`|"[^"\n]+"|\'[^\'\n]+\')'
 REFERENCE_PATTERN = re.compile(
@@ -315,7 +315,7 @@ async def _fetch_url_content(


 async def _default_url_fetcher(url: str) -> str:
-    from hermes_agent.tools.web import web_extract_tool
+    from tools.web_tools import web_extract_tool

    raw = await web_extract_tool([url], format="markdown", use_llm_processing=True)
    payload = json.loads(raw)
@@ -340,7 +340,7 @@ def _resolve_path(cwd: Path, target: str, *, allowed_root: Path | None = None) -


 def _ensure_reference_path_allowed(path: Path) -> None:
-    from hermes_agent.constants import get_hermes_home
+    from hermes_constants import get_hermes_home
    home = Path(os.path.expanduser("~")).resolve()
    hermes_home = get_hermes_home().resolve()

@@ -483,7 +483,9 @@ def _rg_files(path: Path, cwd: Path, limit: int) -> list[Path] | None:
            text=True,
            timeout=10,
        )
-    except (FileNotFoundError, OSError, subprocess.TimeoutExpired):
+    except FileNotFoundError:
+        return None
+    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
--- a/hermes_agent/agent/copilot_acp_client.py
+++ b/hermes_agent/agent/copilot_acp_client.py
@@ -21,9 +21,6 @@ from pathlib import Path
 from types import SimpleNamespace
 from typing import Any

-from hermes_agent.agent.file_safety import get_read_block_error, is_write_denied
-from hermes_agent.agent.redact import redact_sensitive_text
-
 ACP_MARKER_BASE_URL = "acp://copilot"
 _DEFAULT_TIMEOUT_SECONDS = 900.0

@@ -57,18 +54,6 @@ def _jsonrpc_error(message_id: Any, code: int, message: str) -> dict[str, Any]:
    }


-def _permission_denied(message_id: Any) -> dict[str, Any]:
-    return {
-        "jsonrpc": "2.0",
-        "id": message_id,
-        "result": {
-            "outcome": {
-                "outcome": "cancelled",
-            }
-        },
-    }
-
-
 def _format_messages_as_prompt(
    messages: list[dict[str, Any]],
    model: str | None = None,
@@ -401,8 +386,6 @@ class CopilotACPClient:
        stderr_tail: deque[str] = deque(maxlen=40)

        def _stdout_reader() -> None:
-            if proc.stdout is None:
-                return
            for line in proc.stdout:
                try:
                    inbox.put(json.loads(line))
@@ -550,13 +533,18 @@ class CopilotACPClient:
        params = msg.get("params") or {}

        if method == "session/request_permission":
-            response = _permission_denied(message_id)
+            response = {
+                "jsonrpc": "2.0",
+                "id": message_id,
+                "result": {
+                    "outcome": {
+                        "outcome": "allow_once",
+                    }
+                },
+            }
        elif method == "fs/read_text_file":
            try:
                path = _ensure_path_within_cwd(str(params.get("path") or ""), cwd)
-                block_error = get_read_block_error(str(path))
-                if block_error:
-                    raise PermissionError(block_error)
                content = path.read_text() if path.exists() else ""
                line = params.get("line")
                limit = params.get("limit")
@@ -565,8 +553,6 @@ class CopilotACPClient:
                    start = line - 1
                    end = start + limit if isinstance(limit, int) and limit > 0 else None
                    content = "".join(lines[start:end])
-                if content:
-                    content = redact_sensitive_text(content)
                response = {
                    "jsonrpc": "2.0",
                    "id": message_id,
@@ -579,10 +565,6 @@ class CopilotACPClient:
        elif method == "fs/write_text_file":
            try:
                path = _ensure_path_within_cwd(str(params.get("path") or ""), cwd)
-                if is_write_denied(str(path)):
-                    raise PermissionError(
-                        f"Write denied: '{path}' is a protected system/credential file."
-                    )
                path.parent.mkdir(parents=True, exist_ok=True)
                path.write_text(str(params.get("content") or ""))
                response = {
--- a/hermes_agent/providers/credential_pool.py
+++ b/hermes_agent/providers/credential_pool.py
@@ -13,15 +13,17 @@ from dataclasses import dataclass, fields, replace
 from datetime import datetime
 from typing import Any, Dict, List, Optional, Set, Tuple

-from hermes_agent.constants import OPENROUTER_BASE_URL
-import hermes_agent.cli.auth.auth as auth_mod
-from hermes_agent.cli.auth.auth import (
+from hermes_constants import OPENROUTER_BASE_URL
+import hermes_cli.auth as auth_mod
+from hermes_cli.auth import (
    CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
    DEFAULT_AGENT_KEY_MIN_TTL_SECONDS,
    PROVIDER_REGISTRY,
    _auth_store_lock,
    _codex_access_token_is_expiring,
    _decode_jwt_claims,
+    _import_codex_cli_tokens,
+    _write_codex_cli_tokens,
    _load_auth_store,
    _load_provider_state,
    _resolve_kimi_base_url,
@@ -29,7 +31,6 @@ from hermes_agent.cli.auth.auth import (
    _save_auth_store,
    _save_provider_state,
    read_credential_pool,
-    read_provider_credentials,
    write_credential_pool,
 )

@@ -39,7 +40,7 @@ logger = logging.getLogger(__name__)
 def _load_config_safe() -> Optional[dict]:
    """Load config.yaml, returning None on any error."""
    try:
-        from hermes_agent.cli.config import load_config
+        from hermes_cli.config import load_config

        return load_config()
    except Exception:
@@ -289,7 +290,7 @@ def _iter_custom_providers(config: Optional[dict] = None):
    if not isinstance(custom_providers, list):
        # Fall back to the v12+ providers dict via the compatibility layer
        try:
-            from hermes_agent.cli.config import get_compatible_custom_providers
+            from hermes_cli.config import get_compatible_custom_providers

            custom_providers = get_compatible_custom_providers(config)
        except Exception:
@@ -322,7 +323,7 @@ def get_custom_provider_pool_key(base_url: str) -> Optional[str]:

 def list_custom_pool_providers() -> List[str]:
    """Return all 'custom:*' pool keys that have entries in auth.json."""
-    pool_data = read_credential_pool()
+    pool_data = read_credential_pool(None)
    return sorted(
        key for key in pool_data
        if key.startswith(CUSTOM_POOL_PREFIX)
@@ -430,7 +431,7 @@ class CredentialPool:
        if self.provider != "anthropic" or entry.source != "claude_code":
            return entry
        try:
-            from hermes_agent.providers.anthropic_adapter import read_claude_code_credentials
+            from agent.anthropic_adapter import read_claude_code_credentials
            creds = read_claude_code_credentials()
            if not creds:
                return entry
@@ -456,6 +457,39 @@ class CredentialPool:
            logger.debug("Failed to sync from credentials file: %s", exc)
        return entry

+    def _sync_codex_entry_from_cli(self, entry: PooledCredential) -> PooledCredential:
+        """Sync an openai-codex pool entry from ~/.codex/auth.json if tokens differ.
+
+        OpenAI OAuth refresh tokens are single-use and rotate on every refresh.
+        When the Codex CLI (or another Hermes profile) refreshes its token,
+        the pool entry's refresh_token becomes stale.  This method detects that
+        by comparing against ~/.codex/auth.json and syncing the fresh pair.
+        """
+        if self.provider != "openai-codex":
+            return entry
+        try:
+            cli_tokens = _import_codex_cli_tokens()
+            if not cli_tokens:
+                return entry
+            cli_refresh = cli_tokens.get("refresh_token", "")
+            cli_access = cli_tokens.get("access_token", "")
+            if cli_refresh and cli_refresh != entry.refresh_token:
+                logger.debug("Pool entry %s: syncing tokens from ~/.codex/auth.json (refresh token changed)", entry.id)
+                updated = replace(
+                    entry,
+                    access_token=cli_access,
+                    refresh_token=cli_refresh,
+                    last_status=None,
+                    last_status_at=None,
+                    last_error_code=None,
+                )
+                self._replace_entry(entry, updated)
+                self._persist()
+                return updated
+        except Exception as exc:
+            logger.debug("Failed to sync from ~/.codex/auth.json: %s", exc)
+        return entry
+
    def _sync_device_code_entry_to_auth_store(self, entry: PooledCredential) -> None:
        """Write refreshed pool entry tokens back to auth.json providers.

@@ -525,7 +559,7 @@ class CredentialPool:

        try:
            if self.provider == "anthropic":
-                from hermes_agent.providers.anthropic_adapter import refresh_anthropic_oauth_pure
+                from agent.anthropic_adapter import refresh_anthropic_oauth_pure

                refreshed = refresh_anthropic_oauth_pure(
                    entry.refresh_token,
@@ -542,7 +576,7 @@ class CredentialPool:
                # see the latest tokens.
                if entry.source == "claude_code":
                    try:
-                        from hermes_agent.providers.anthropic_adapter import _write_claude_code_credentials
+                        from agent.anthropic_adapter import _write_claude_code_credentials
                        _write_claude_code_credentials(
                            refreshed["access_token"],
                            refreshed["refresh_token"],
@@ -551,6 +585,13 @@ class CredentialPool:
                    except Exception as wexc:
                        logger.debug("Failed to write refreshed token to credentials file: %s", wexc)
            elif self.provider == "openai-codex":
+                # Proactively sync from ~/.codex/auth.json before refresh.
+                # The Codex CLI (or another Hermes profile) may have already
+                # consumed our refresh_token.  Syncing first avoids a
+                # "refresh_token_reused" error when the CLI has a newer pair.
+                synced = self._sync_codex_entry_from_cli(entry)
+                if synced is not entry:
+                    entry = synced
                refreshed = auth_mod.refresh_codex_oauth_pure(
                    entry.access_token,
                    entry.refresh_token,
@@ -604,7 +645,7 @@ class CredentialPool:
                if synced.refresh_token != entry.refresh_token:
                    logger.debug("Retrying refresh with synced token from credentials file")
                    try:
-                        from hermes_agent.providers.anthropic_adapter import refresh_anthropic_oauth_pure
+                        from agent.anthropic_adapter import refresh_anthropic_oauth_pure
                        refreshed = refresh_anthropic_oauth_pure(
                            synced.refresh_token,
                            use_json=synced.source.endswith("hermes_pkce"),
@@ -621,7 +662,7 @@ class CredentialPool:
                        self._replace_entry(synced, updated)
                        self._persist()
                        try:
-                            from hermes_agent.providers.anthropic_adapter import _write_claude_code_credentials
+                            from agent.anthropic_adapter import _write_claude_code_credentials
                            _write_claude_code_credentials(
                                refreshed["access_token"],
                                refreshed["refresh_token"],
@@ -636,6 +677,45 @@ class CredentialPool:
                    # Credentials file had a valid (non-expired) token — use it directly
                    logger.debug("Credentials file has valid token, using without refresh")
                    return synced
+            # For openai-codex: the refresh_token may have been consumed by
+            # the Codex CLI between our proactive sync and the refresh call.
+            # Re-sync and retry once.
+            if self.provider == "openai-codex":
+                synced = self._sync_codex_entry_from_cli(entry)
+                if synced.refresh_token != entry.refresh_token:
+                    logger.debug("Retrying Codex refresh with synced token from ~/.codex/auth.json")
+                    try:
+                        refreshed = auth_mod.refresh_codex_oauth_pure(
+                            synced.access_token,
+                            synced.refresh_token,
+                        )
+                        updated = replace(
+                            synced,
+                            access_token=refreshed["access_token"],
+                            refresh_token=refreshed["refresh_token"],
+                            last_refresh=refreshed.get("last_refresh"),
+                            last_status=STATUS_OK,
+                            last_status_at=None,
+                            last_error_code=None,
+                        )
+                        self._replace_entry(synced, updated)
+                        self._persist()
+                        self._sync_device_code_entry_to_auth_store(updated)
+                        try:
+                            _write_codex_cli_tokens(
+                                updated.access_token,
+                                updated.refresh_token,
+                                last_refresh=updated.last_refresh,
+                            )
+                        except Exception as wexc:
+                            logger.debug("Failed to write refreshed Codex tokens to CLI file (retry): %s", wexc)
+                        return updated
+                    except Exception as retry_exc:
+                        logger.debug("Codex retry refresh also failed: %s", retry_exc)
+                elif not self._entry_needs_refresh(synced):
+                    logger.debug("Codex CLI has valid token, using without refresh")
+                    self._sync_device_code_entry_to_auth_store(synced)
+                    return synced
            self._mark_exhausted(entry, None)
            return None

@@ -654,6 +734,17 @@ class CredentialPool:
        # _seed_from_singletons() on the next load_pool() sees fresh state
        # instead of re-seeding stale/consumed tokens.
        self._sync_device_code_entry_to_auth_store(updated)
+        # Write refreshed tokens back to ~/.codex/auth.json so Codex CLI
+        # and VS Code don't hit "refresh_token_reused" on their next refresh.
+        if self.provider == "openai-codex":
+            try:
+                _write_codex_cli_tokens(
+                    updated.access_token,
+                    updated.refresh_token,
+                    last_refresh=updated.last_refresh,
+                )
+            except Exception as wexc:
+                logger.debug("Failed to write refreshed Codex tokens to CLI file: %s", wexc)
        return updated

    def _entry_needs_refresh(self, entry: PooledCredential) -> bool:
@@ -699,6 +790,16 @@ class CredentialPool:
                if synced is not entry:
                    entry = synced
                    cleared_any = True
+            # For openai-codex entries, sync from ~/.codex/auth.json before
+            # any status/refresh checks.  This picks up tokens refreshed by
+            # the Codex CLI or another Hermes profile.
+            if (self.provider == "openai-codex"
+                    and entry.last_status == STATUS_EXHAUSTED
+                    and entry.refresh_token):
+                synced = self._sync_codex_entry_from_cli(entry)
+                if synced is not entry:
+                    entry = synced
+                    cleared_any = True
            if entry.last_status == STATUS_EXHAUSTED:
                exhausted_until = _exhausted_until(entry)
                if exhausted_until is not None and now < exhausted_until:
@@ -876,20 +977,6 @@ class CredentialPool:
            self._current_id = None
        return removed

-    def remove_entry(self, entry_id: str) -> Optional[PooledCredential]:
-        for idx, entry in enumerate(self._entries):
-            if entry.id == entry_id:
-                removed = self._entries.pop(idx)
-                self._entries = [
-                    replace(e, priority=new_priority)
-                    for new_priority, e in enumerate(self._entries)
-                ]
-                self._persist()
-                if self._current_id == removed.id:
-                    self._current_id = None
-                return removed
-        return None
-
    def resolve_target(self, target: Any) -> Tuple[Optional[int], Optional[PooledCredential], Optional[str]]:
        raw = str(target or "").strip()
        if not raw:
@@ -998,35 +1085,32 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
    active_sources: Set[str] = set()
    auth_store = _load_auth_store()

-    # Shared suppression gate — used at every upsert site so
-    # `hermes auth remove <provider> <N>` is stable across all source types.
-    try:
-        from hermes_agent.cli.auth.auth import is_source_suppressed as _is_suppressed
-    except ImportError:
-        def _is_suppressed(_p, _s):  # type: ignore[misc]
-            return False
-
    if provider == "anthropic":
        # Only auto-discover external credentials (Claude Code, Hermes PKCE)
        # when the user has explicitly configured anthropic as their provider.
        # Without this gate, auxiliary client fallback chains silently read
        # ~/.claude/.credentials.json without user consent.  See PR #4210.
        try:
-            from hermes_agent.cli.auth.auth import is_provider_explicitly_configured
+            from hermes_cli.auth import is_provider_explicitly_configured
            if not is_provider_explicitly_configured("anthropic"):
                return changed, active_sources
        except ImportError:
            pass

-        from hermes_agent.providers.anthropic_adapter import read_claude_code_credentials, read_hermes_oauth_credentials
+        from agent.anthropic_adapter import read_claude_code_credentials, read_hermes_oauth_credentials

        for source_name, creds in (
            ("hermes_pkce", read_hermes_oauth_credentials()),
            ("claude_code", read_claude_code_credentials()),
        ):
            if creds and creds.get("accessToken"):
-                if _is_suppressed(provider, source_name):
-                    continue
+                # Check if user explicitly removed this source
+                try:
+                    from hermes_cli.auth import is_source_suppressed
+                    if is_source_suppressed(provider, source_name):
+                        continue
+                except ImportError:
+                    pass
                active_sources.add(source_name)
                changed |= _upsert_entry(
                    entries,
@@ -1044,16 +1128,8 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup

    elif provider == "nous":
        state = _load_provider_state(auth_store, "nous")
-        if state and not _is_suppressed(provider, "device_code"):
+        if state:
            active_sources.add("device_code")
-            # Prefer a user-supplied label embedded in the singleton state
-            # (set by persist_nous_credentials(label=...) when the user ran
-            # `hermes auth add nous --label <name>`).  Fall back to the
-            # auto-derived token fingerprint for logins that didn't supply one.
-            custom_label = str(state.get("label") or "").strip()
-            seeded_label = custom_label or label_from_token(
-                state.get("access_token", ""), "device_code"
-            )
            changed |= _upsert_entry(
                entries,
                provider,
@@ -1072,7 +1148,7 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
                    "agent_key": state.get("agent_key"),
                    "agent_key_expires_at": state.get("agent_key_expires_at"),
                    "tls": state.get("tls") if isinstance(state.get("tls"), dict) else None,
-                    "label": seeded_label,
+                    "label": label_from_token(state.get("access_token", ""), "device_code"),
                },
            )

@@ -1081,25 +1157,24 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
        # env vars (COPILOT_GITHUB_TOKEN / GH_TOKEN).  They don't live in
        # the auth store or credential pool, so we resolve them here.
        try:
-            from hermes_agent.cli.auth.copilot import resolve_copilot_token
+            from hermes_cli.copilot_auth import resolve_copilot_token
            token, source = resolve_copilot_token()
            if token:
                source_name = "gh_cli" if "gh" in source.lower() else f"env:{source}"
-                if not _is_suppressed(provider, source_name):
-                    active_sources.add(source_name)
-                    pconfig = PROVIDER_REGISTRY.get(provider)
-                    changed |= _upsert_entry(
-                        entries,
-                        provider,
-                        source_name,
-                        {
-                            "source": source_name,
-                            "auth_type": AUTH_TYPE_API_KEY,
-                            "access_token": token,
-                            "base_url": pconfig.inference_base_url if pconfig else "",
-                            "label": source,
-                        },
-                    )
+                active_sources.add(source_name)
+                pconfig = PROVIDER_REGISTRY.get(provider)
+                changed |= _upsert_entry(
+                    entries,
+                    provider,
+                    source_name,
+                    {
+                        "source": source_name,
+                        "auth_type": AUTH_TYPE_API_KEY,
+                        "access_token": token,
+                        "base_url": pconfig.inference_base_url if pconfig else "",
+                        "label": source,
+                    },
+                )
        except Exception as exc:
            logger.debug("Copilot token seed failed: %s", exc)

@@ -1110,45 +1185,48 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
        # Use refresh_if_expiring=False to avoid network calls during
        # pool loading / provider discovery.
        try:
-            from hermes_agent.cli.auth.auth import resolve_qwen_runtime_credentials
+            from hermes_cli.auth import resolve_qwen_runtime_credentials
            creds = resolve_qwen_runtime_credentials(refresh_if_expiring=False)
            token = creds.get("api_key", "")
            if token:
                source_name = creds.get("source", "qwen-cli")
-                if not _is_suppressed(provider, source_name):
-                    active_sources.add(source_name)
-                    changed |= _upsert_entry(
-                        entries,
-                        provider,
-                        source_name,
-                        {
-                            "source": source_name,
-                            "auth_type": AUTH_TYPE_OAUTH,
-                            "access_token": token,
-                            "expires_at_ms": creds.get("expires_at_ms"),
-                            "base_url": creds.get("base_url", ""),
-                            "label": creds.get("auth_file", source_name),
-                        },
-                    )
+                active_sources.add(source_name)
+                changed |= _upsert_entry(
+                    entries,
+                    provider,
+                    source_name,
+                    {
+                        "source": source_name,
+                        "auth_type": AUTH_TYPE_OAUTH,
+                        "access_token": token,
+                        "expires_at_ms": creds.get("expires_at_ms"),
+                        "base_url": creds.get("base_url", ""),
+                        "label": creds.get("auth_file", source_name),
+                    },
+                )
        except Exception as exc:
            logger.debug("Qwen OAuth token seed failed: %s", exc)

    elif provider == "openai-codex":
-        # Respect user suppression — `hermes auth remove openai-codex` marks
-        # the device_code source as suppressed so it won't be re-seeded from
-        # the Hermes auth store.  Without this gate the removal is instantly
-        # undone on the next load_pool() call.
-        if _is_suppressed(provider, "device_code"):
-            return changed, active_sources
-
        state = _load_provider_state(auth_store, "openai-codex")
        tokens = state.get("tokens") if isinstance(state, dict) else None
-        # Hermes owns its own Codex auth state — we do NOT auto-import from
-        # ~/.codex/auth.json at pool-load time.  OAuth refresh tokens are
-        # single-use, so sharing them with Codex CLI / VS Code causes
-        # refresh_token_reused race failures.  Users who want to adopt
-        # existing Codex CLI credentials get a one-time, explicit prompt
-        # via `hermes auth openai-codex`.
+        # Fallback: import from Codex CLI (~/.codex/auth.json) if Hermes auth
+        # store has no tokens.  This mirrors resolve_codex_runtime_credentials()
+        # so that load_pool() and list_authenticated_providers() detect tokens
+        # that only exist in the Codex CLI shared file.
+        if not (isinstance(tokens, dict) and tokens.get("access_token")):
+            try:
+                from hermes_cli.auth import _import_codex_cli_tokens, _save_codex_tokens
+                cli_tokens = _import_codex_cli_tokens()
+                if cli_tokens:
+                    logger.info("Importing Codex CLI tokens into Hermes auth store.")
+                    _save_codex_tokens(cli_tokens)
+                    # Re-read state after import
+                    auth_store = _load_auth_store()
+                    state = _load_provider_state(auth_store, "openai-codex")
+                    tokens = state.get("tokens") if isinstance(state, dict) else None
+            except Exception as exc:
+                logger.debug("Codex CLI token import failed: %s", exc)
        if isinstance(tokens, dict) and tokens.get("access_token"):
            active_sources.add("device_code")
            changed |= _upsert_entry(
@@ -1172,22 +1250,10 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
 def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool, Set[str]]:
    changed = False
    active_sources: Set[str] = set()
-    # Honour user suppression — `hermes auth remove <provider> <N>` for an
-    # env-seeded credential marks the env:<VAR> source as suppressed so it
-    # won't be re-seeded from the user's shell environment or ~/.hermes/.env.
-    # Without this gate the removal is silently undone on the next
-    # load_pool() call whenever the var is still exported by the shell.
-    try:
-        from hermes_agent.cli.auth.auth import is_source_suppressed as _is_source_suppressed
-    except ImportError:
-        def _is_source_suppressed(_p, _s):  # type: ignore[misc]
-            return False
    if provider == "openrouter":
        token = os.getenv("OPENROUTER_API_KEY", "").strip()
        if token:
            source = "env:OPENROUTER_API_KEY"
-            if _is_source_suppressed(provider, source):
-                return changed, active_sources
            active_sources.add(source)
            changed |= _upsert_entry(
                entries,
@@ -1224,8 +1290,6 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
        if not token:
            continue
        source = f"env:{env_var}"
-        if _is_source_suppressed(provider, source):
-            continue
        active_sources.add(source)
        auth_type = AUTH_TYPE_OAUTH if provider == "anthropic" and not token.startswith("sk-ant-api") else AUTH_TYPE_API_KEY
        base_url = env_url or pconfig.inference_base_url
@@ -1270,13 +1334,6 @@ def _seed_custom_pool(pool_key: str, entries: List[PooledCredential]) -> Tuple[b
    changed = False
    active_sources: Set[str] = set()

-    # Shared suppression gate — same pattern as _seed_from_env/_seed_from_singletons.
-    try:
-        from hermes_agent.cli.auth.auth import is_source_suppressed as _is_suppressed
-    except ImportError:
-        def _is_suppressed(_p, _s):  # type: ignore[misc]
-            return False
-
    # Seed from the custom_providers config entry's api_key field
    cp_config = _get_custom_provider_config(pool_key)
    if cp_config:
@@ -1285,20 +1342,19 @@ def _seed_custom_pool(pool_key: str, entries: List[PooledCredential]) -> Tuple[b
        name = str(cp_config.get("name") or "").strip()
        if api_key:
            source = f"config:{name}"
-            if not _is_suppressed(pool_key, source):
-                active_sources.add(source)
-                changed |= _upsert_entry(
-                    entries,
-                    pool_key,
-                    source,
-                    {
-                        "source": source,
-                        "auth_type": AUTH_TYPE_API_KEY,
-                        "access_token": api_key,
-                        "base_url": base_url,
-                        "label": name or source,
-                    },
-                )
+            active_sources.add(source)
+            changed |= _upsert_entry(
+                entries,
+                pool_key,
+                source,
+                {
+                    "source": source,
+                    "auth_type": AUTH_TYPE_API_KEY,
+                    "access_token": api_key,
+                    "base_url": base_url,
+                    "label": name or source,
+                },
+            )

    # Seed from model.api_key if model.provider=='custom' and model.base_url matches
    try:
@@ -1318,20 +1374,19 @@ def _seed_custom_pool(pool_key: str, entries: List[PooledCredential]) -> Tuple[b
                matched_key = get_custom_provider_pool_key(model_base_url)
                if matched_key == pool_key:
                    source = "model_config"
-                    if not _is_suppressed(pool_key, source):
-                        active_sources.add(source)
-                        changed |= _upsert_entry(
-                            entries,
-                            pool_key,
-                            source,
-                            {
-                                "source": source,
-                                "auth_type": AUTH_TYPE_API_KEY,
-                                "access_token": model_api_key,
-                                "base_url": model_base_url,
-                                "label": "model_config",
-                            },
-                        )
+                    active_sources.add(source)
+                    changed |= _upsert_entry(
+                        entries,
+                        pool_key,
+                        source,
+                        {
+                            "source": source,
+                            "auth_type": AUTH_TYPE_API_KEY,
+                            "access_token": model_api_key,
+                            "base_url": model_base_url,
+                            "label": "model_config",
+                        },
+                    )
    except Exception:
        pass

@@ -1340,7 +1395,7 @@ def _seed_custom_pool(pool_key: str, entries: List[PooledCredential]) -> Tuple[b

 def load_pool(provider: str) -> CredentialPool:
    provider = (provider or "").strip().lower()
-    raw_entries = read_provider_credentials(provider)
+    raw_entries = read_credential_pool(provider)
    entries = [PooledCredential.from_dict(provider, payload) for payload in raw_entries]

    if provider.startswith(CUSTOM_POOL_PREFIX):
--- a/hermes_agent/agent/display.py
+++ b/hermes_agent/agent/display.py
@@ -13,7 +13,7 @@ from dataclasses import dataclass, field
 from difflib import unified_diff
 from pathlib import Path

-from hermes_agent.utils import safe_json_loads
+from utils import safe_json_loads

 # ANSI escape codes for coloring tool failure indicators
 _RED = "\033[31m"
@@ -43,7 +43,7 @@ def _diff_ansi() -> dict[str, str]:
    plus = "\033[38;2;255;255;255;48;2;20;90;20m"

    try:
-        from hermes_agent.cli.ui.skin_engine import get_active_skin
+        from hermes_cli.skin_engine import get_active_skin
        skin = get_active_skin()

        def _hex_fg(key: str, fallback_rgb: tuple[int, int, int]) -> str:
@@ -118,7 +118,7 @@ def get_tool_preview_max_len() -> int:
 def _get_skin():
    """Get the active skin config, or None if not available."""
    try:
-        from hermes_agent.cli.ui.skin_engine import get_active_skin
+        from hermes_cli.skin_engine import get_active_skin
        return get_active_skin()
    except Exception:
        return None
@@ -148,7 +148,7 @@ def get_tool_emoji(tool_name: str, default: str = "⚡") -> str:
            return override
    # 2. Registry default
    try:
-        from hermes_agent.tools.registry import registry
+        from tools.registry import registry
        emoji = registry.get_emoji(tool_name, default="")
        if emoji:
            return emoji
@@ -225,11 +225,9 @@ def build_tool_preview(tool_name: str, args: dict, max_len: int | None = None) -
            content = _oneline(args.get("content", ""))
            return f"+{target}: \"{content[:25]}{'...' if len(content) > 25 else ''}\""
        elif action == "replace":
-            old = _oneline(args.get("old_text") or "") or "<missing old_text>"
-            return f"~{target}: \"{old[:20]}\""
+            return f"~{target}: \"{_oneline(args.get('old_text', '')[:20])}\""
        elif action == "remove":
-            old = _oneline(args.get("old_text") or "") or "<missing old_text>"
-            return f"-{target}: \"{old[:20]}\""
+            return f"-{target}: \"{_oneline(args.get('old_text', '')[:20])}\""
        return action

    if tool_name == "send_message":
@@ -311,7 +309,7 @@ def _resolve_skill_manage_paths(args: dict) -> list[Path]:
    if not action or not name:
        return []

-    from hermes_agent.tools.skills.manager import _find_skill, _resolve_skill_dir
+    from tools.skill_manager_tool import _find_skill, _resolve_skill_dir

    if action == "create":
        skill_dir = _resolve_skill_dir(name, args.get("category"))
@@ -729,7 +727,6 @@ class KawaiiSpinner:
                time.sleep(0.1)
                continue
            frame = self.spinner_frames[self.frame_idx % len(self.spinner_frames)]
-            assert self.start_time is not None  # start() sets it before thread starts
            elapsed = time.time() - self.start_time
            if wings:
                left, right = wings[self.frame_idx % len(wings)]
@@ -942,13 +939,9 @@ def get_cute_tool_message(
        if action == "add":
            return _wrap(f"┊ 🧠 memory    +{target}: \"{_trunc(args.get('content', ''), 30)}\"  {dur}")
        elif action == "replace":
-            old = args.get("old_text") or ""
-            old = old if old else "<missing old_text>"
-            return _wrap(f"┊ 🧠 memory    ~{target}: \"{_trunc(old, 20)}\"  {dur}")
+            return _wrap(f"┊ 🧠 memory    ~{target}: \"{_trunc(args.get('old_text', ''), 20)}\"  {dur}")
        elif action == "remove":
-            old = args.get("old_text") or ""
-            old = old if old else "<missing old_text>"
-            return _wrap(f"┊ 🧠 memory    -{target}: \"{_trunc(old, 20)}\"  {dur}")
+            return _wrap(f"┊ 🧠 memory    -{target}: \"{_trunc(args.get('old_text', ''), 20)}\"  {dur}")
        return _wrap(f"┊ 🧠 memory    {action}  {dur}")
    if tool_name == "skills_list":
        return _wrap(f"┊ 📚 skills    list {args.get('category', 'all')}  {dur}")
--- a/hermes_agent/providers/errors.py
+++ b/hermes_agent/providers/errors.py
@@ -290,7 +290,7 @@ def classify_api_error(
    if isinstance(body, dict):
        _err_obj = body.get("error", {})
        if isinstance(_err_obj, dict):
-            _body_msg = str(_err_obj.get("message") or "").lower()
+            _body_msg = (_err_obj.get("message") or "").lower()
            # Parse metadata.raw for wrapped provider errors
            _metadata = _err_obj.get("metadata", {})
            if isinstance(_metadata, dict):
@@ -302,11 +302,11 @@ def classify_api_error(
                        if isinstance(_inner, dict):
                            _inner_err = _inner.get("error", {})
                            if isinstance(_inner_err, dict):
-                                _metadata_msg = str(_inner_err.get("message") or "").lower()
+                                _metadata_msg = (_inner_err.get("message") or "").lower()
                    except (json.JSONDecodeError, TypeError):
                        pass
        if not _body_msg:
-            _body_msg = str(body.get("message") or "").lower()
+            _body_msg = (body.get("message") or "").lower()
    # Combine all message sources for pattern matching
    parts = [_raw_msg]
    if _body_msg and _body_msg not in _raw_msg:
@@ -606,10 +606,10 @@ def _classify_400(
    if isinstance(body, dict):
        err_obj = body.get("error", {})
        if isinstance(err_obj, dict):
-            err_body_msg = str(err_obj.get("message") or "").strip().lower()
+            err_body_msg = (err_obj.get("message") or "").strip().lower()
        # Responses API (and some providers) use flat body: {"message": "..."}
        if not err_body_msg:
-            err_body_msg = str(body.get("message") or "").strip().lower()
+            err_body_msg = (body.get("message") or "").strip().lower()
    is_generic = len(err_body_msg) < 30 or err_body_msg in ("error", "")
    is_large = approx_tokens > context_length * 0.4 or approx_tokens > 80000 or num_messages > 80

--- a/hermes_agent/providers/gemini_cloudcode_adapter.py
+++ b/hermes_agent/providers/gemini_cloudcode_adapter.py
@@ -38,9 +38,8 @@ from typing import Any, Dict, Iterator, List, Optional

 import httpx

-from hermes_agent.providers import google_oauth
-from hermes_agent.providers.gemini_schema import sanitize_gemini_tool_parameters
-from hermes_agent.providers.google_code_assist import (
+from agent import google_oauth
+from agent.google_code_assist import (
    CODE_ASSIST_ENDPOINT,
    FREE_TIER_ID,
    CodeAssistError,
@@ -206,7 +205,7 @@ def _translate_tools_to_gemini(tools: Any) -> List[Dict[str, Any]]:
            decl["description"] = str(fn["description"])
        params = fn.get("parameters")
        if isinstance(params, dict):
-            decl["parameters"] = sanitize_gemini_tool_parameters(params)
+            decl["parameters"] = params
        declarations.append(decl)
    if not declarations:
        return []
@@ -505,16 +504,9 @@ def _iter_sse_events(response: httpx.Response) -> Iterator[Dict[str, Any]]:
 def _translate_stream_event(
    event: Dict[str, Any],
    model: str,
-    tool_call_counter: List[int],
+    tool_call_indices: Dict[str, int],
 ) -> List[_GeminiStreamChunk]:
-    """Unwrap Code Assist envelope and emit OpenAI-shaped chunk(s).
-
-    ``tool_call_counter`` is a single-element list used as a mutable counter
-    across events in the same stream. Each ``functionCall`` part gets a
-    fresh, unique OpenAI ``index`` — keying by function name would collide
-    whenever the model issues parallel calls to the same tool (e.g. reading
-    three files in one turn).
-    """
+    """Unwrap Code Assist envelope and emit OpenAI-shaped chunk(s)."""
    inner = event.get("response") if isinstance(event.get("response"), dict) else event
    candidates = inner.get("candidates") or []
    if not candidates:
@@ -540,8 +532,7 @@ def _translate_stream_event(
        fc = part.get("functionCall")
        if isinstance(fc, dict) and fc.get("name"):
            name = str(fc["name"])
-            idx = tool_call_counter[0]
-            tool_call_counter[0] += 1
+            idx = tool_call_indices.setdefault(name, len(tool_call_indices))
            try:
                args_str = json.dumps(fc.get("args") or {}, ensure_ascii=False)
            except (TypeError, ValueError):
@@ -558,7 +549,7 @@ def _translate_stream_event(
    finish_reason_raw = str(cand.get("finishReason") or "")
    if finish_reason_raw:
        mapped = _map_gemini_finish_reason(finish_reason_raw)
-        if tool_call_counter[0] > 0:
+        if tool_call_indices:
            mapped = "tool_calls"
        chunks.append(_make_stream_chunk(model=model, finish_reason=mapped))
    return chunks
@@ -742,9 +733,9 @@ class GeminiCloudCodeClient:
                        # Materialize error body for better diagnostics
                        response.read()
                        raise _gemini_http_error(response)
-                    tool_call_counter: List[int] = [0]
+                    tool_call_indices: Dict[str, int] = {}
                    for event in _iter_sse_events(response):
-                        for chunk in _translate_stream_event(event, model, tool_call_counter):
+                        for chunk in _translate_stream_event(event, model, tool_call_indices):
                            yield chunk
            except httpx.HTTPError as exc:
                raise CodeAssistError(
@@ -756,150 +747,18 @@ class GeminiCloudCodeClient:


 def _gemini_http_error(response: httpx.Response) -> CodeAssistError:
-    """Translate an httpx response into a CodeAssistError with rich metadata.
-
-    Parses Google's error envelope (``{"error": {"code", "message", "status",
-    "details": [...]}}``) so the agent's error classifier can reason about
-    the failure — ``status_code`` enables the rate_limit / auth classification
-    paths, and ``response`` lets the main loop honor ``Retry-After`` just
-    like it does for OpenAI SDK exceptions.
-
-    Also lifts a few recognizable Google conditions into human-readable
-    messages so the user sees something better than a 500-char JSON dump:
-
-        MODEL_CAPACITY_EXHAUSTED → "Gemini model capacity exhausted for
-            <model>. This is a Google-side throttle..."
-        RESOURCE_EXHAUSTED w/o reason → quota-style message
-        404 → "Model <name> not found at cloudcode-pa..."
-    """
    status = response.status_code
-
-    # Parse the body once, surviving any weird encodings.
-    body_text = ""
-    body_json: Dict[str, Any] = {}
    try:
-        body_text = response.text
+        body = response.text[:500]
    except Exception:
-        body_text = ""
-    if body_text:
-        try:
-            parsed = json.loads(body_text)
-            if isinstance(parsed, dict):
-                body_json = parsed
-        except (ValueError, TypeError):
-            body_json = {}
-
-    # Dig into Google's error envelope.  Shape is:
-    #   {"error": {"code": 429, "message": "...", "status": "RESOURCE_EXHAUSTED",
-    #              "details": [{"@type": ".../ErrorInfo", "reason": "MODEL_CAPACITY_EXHAUSTED",
-    #                           "metadata": {...}},
-    #                          {"@type": ".../RetryInfo", "retryDelay": "30s"}]}}
-    err_obj = body_json.get("error") if isinstance(body_json, dict) else None
-    if not isinstance(err_obj, dict):
-        err_obj = {}
-    err_status = str(err_obj.get("status") or "").strip()
-    err_message = str(err_obj.get("message") or "").strip()
-    _raw_details = err_obj.get("details")
-    err_details_list = _raw_details if isinstance(_raw_details, list) else []
-
-    # Extract google.rpc.ErrorInfo reason + metadata.  There may be more
-    # than one ErrorInfo (rare), so we pick the first one with a reason.
-    error_reason = ""
-    error_metadata: Dict[str, Any] = {}
-    retry_delay_seconds: Optional[float] = None
-    for detail in err_details_list:
-        if not isinstance(detail, dict):
-            continue
-        type_url = str(detail.get("@type") or "")
-        if not error_reason and type_url.endswith("/google.rpc.ErrorInfo"):
-            reason = detail.get("reason")
-            if isinstance(reason, str) and reason:
-                error_reason = reason
-            md = detail.get("metadata")
-            if isinstance(md, dict):
-                error_metadata = md
-        elif retry_delay_seconds is None and type_url.endswith("/google.rpc.RetryInfo"):
-            # retryDelay is a google.protobuf.Duration string like "30s" or "1.5s".
-            delay_raw = detail.get("retryDelay")
-            if isinstance(delay_raw, str) and delay_raw.endswith("s"):
-                try:
-                    retry_delay_seconds = float(delay_raw[:-1])
-                except ValueError:
-                    pass
-            elif isinstance(delay_raw, (int, float)):
-                retry_delay_seconds = float(delay_raw)
-
-    # Fall back to the Retry-After header if the body didn't include RetryInfo.
-    if retry_delay_seconds is None:
-        try:
-            header_val = response.headers.get("Retry-After") or response.headers.get("retry-after")
-        except Exception:
-            header_val = None
-        if header_val:
-            try:
-                retry_delay_seconds = float(header_val)
-            except (TypeError, ValueError):
-                retry_delay_seconds = None
-
-    # Classify the error code.  ``code_assist_rate_limited`` stays the default
-    # for 429s; a more specific reason tag helps downstream callers (e.g. tests,
-    # logs) without changing the rate_limit classification path.
+        body = ""
+    # Let run_agent's retry logic see auth errors as rotatable via `api_key`
    code = f"code_assist_http_{status}"
    if status == 401:
        code = "code_assist_unauthorized"
    elif status == 429:
        code = "code_assist_rate_limited"
-        if error_reason == "MODEL_CAPACITY_EXHAUSTED":
-            code = "code_assist_capacity_exhausted"
-
-    # Build a human-readable message.  Keep the status + a raw-body tail for
-    # debugging, but lead with a friendlier summary when we recognize the
-    # Google signal.
-    model_hint = ""
-    if isinstance(error_metadata, dict):
-        model_hint = str(error_metadata.get("model") or error_metadata.get("modelId") or "").strip()
-
-    if status == 429 and error_reason == "MODEL_CAPACITY_EXHAUSTED":
-        target = model_hint or "this Gemini model"
-        message = (
-            f"Gemini capacity exhausted for {target} (Google-side throttle, "
-            f"not a Hermes issue). Try a different Gemini model or set a "
-            f"fallback_providers entry to a non-Gemini provider."
-        )
-        if retry_delay_seconds is not None:
-            message += f" Google suggests retrying in {retry_delay_seconds:g}s."
-    elif status == 429 and err_status == "RESOURCE_EXHAUSTED":
-        message = (
-            f"Gemini quota exhausted ({err_message or 'RESOURCE_EXHAUSTED'}). "
-            f"Check /gquota for remaining daily requests."
-        )
-        if retry_delay_seconds is not None:
-            message += f" Retry suggested in {retry_delay_seconds:g}s."
-    elif status == 404:
-        # Google returns 404 when a model has been retired or renamed.
-        target = model_hint or (err_message or "model")
-        message = (
-            f"Code Assist 404: {target} is not available at "
-            f"cloudcode-pa.googleapis.com. It may have been renamed or "
-            f"retired. Check hermes_cli/models.py for the current list."
-        )
-    elif err_message:
-        # Generic fallback with the parsed message.
-        message = f"Code Assist HTTP {status} ({err_status or 'error'}): {err_message}"
-    else:
-        # Last-ditch fallback — raw body snippet.
-        message = f"Code Assist returned HTTP {status}: {body_text[:500]}"
-
    return CodeAssistError(
-        message,
+        f"Code Assist returned HTTP {status}: {body}",
        code=code,
-        status_code=status,
-        response=response,
-        retry_after=retry_delay_seconds,
-        details={
-            "status": err_status,
-            "reason": error_reason,
-            "metadata": error_metadata,
-            "message": err_message,
-        },
    )
--- a/hermes_agent/providers/google_code_assist.py
+++ b/hermes_agent/providers/google_code_assist.py
@@ -68,45 +68,9 @@ _ONBOARDING_POLL_INTERVAL_SECONDS = 5.0


 class CodeAssistError(RuntimeError):
-    """Exception raised by the Code Assist (``cloudcode-pa``) integration.
-
-    Carries HTTP status / response / retry-after metadata so the agent's
-    ``error_classifier._extract_status_code`` and the main loop's Retry-After
-    handling (which walks ``error.response.headers``) pick up the right
-    signals.  Without these, 429s from the OAuth path look like opaque
-    ``RuntimeError`` and skip the rate-limit path.
-    """
-
-    def __init__(
-        self,
-        message: str,
-        *,
-        code: str = "code_assist_error",
-        status_code: Optional[int] = None,
-        response: Any = None,
-        retry_after: Optional[float] = None,
-        details: Optional[Dict[str, Any]] = None,
-    ) -> None:
+    def __init__(self, message: str, *, code: str = "code_assist_error") -> None:
        super().__init__(message)
        self.code = code
-        # ``status_code`` is picked up by ``agent.error_classifier._extract_status_code``
-        # so a 429 from Code Assist classifies as FailoverReason.rate_limit and
-        # triggers the main loop's fallback_providers chain the same way SDK
-        # errors do.
-        self.status_code = status_code
-        # ``response`` is the underlying ``httpx.Response`` (or a shim with a
-        # ``.headers`` mapping and ``.json()`` method).  The main loop reads
-        # ``error.response.headers["Retry-After"]`` to honor Google's retry
-        # hints when the backend throttles us.
-        self.response = response
-        # Parsed ``Retry-After`` seconds (kept separately for convenience —
-        # Google returns retry hints in both the header and the error body's
-        # ``google.rpc.RetryInfo`` details, and we pick whichever we found).
-        self.retry_after = retry_after
-        # Parsed structured error details from the Google error envelope
-        # (e.g. ``{"reason": "MODEL_CAPACITY_EXHAUSTED", "status": "RESOURCE_EXHAUSTED"}``).
-        # Useful for logging and for tests that want to assert on specifics.
-        self.details = details or {}


 class ProjectIdRequiredError(CodeAssistError):
--- a/hermes_agent/providers/google_oauth.py
+++ b/hermes_agent/providers/google_oauth.py
@@ -60,7 +60,7 @@ from dataclasses import dataclass, field
 from pathlib import Path
 from typing import Any, Dict, Optional, Tuple

-from hermes_agent.constants import get_hermes_home
+from hermes_constants import get_hermes_home

 logger = logging.getLogger(__name__)

--- a/hermes_agent/agent/insights.py
+++ b/hermes_agent/agent/insights.py
@@ -10,7 +10,7 @@ multi-platform architecture with additional cost estimation and platform
 breakdown capabilities.

 Usage:
-    from hermes_agent.agent.insights import InsightsEngine
+    from agent.insights import InsightsEngine
    engine = InsightsEngine(db)
    report = engine.generate(days=30)
    print(engine.format_terminal(report))
@@ -22,7 +22,7 @@ from collections import Counter, defaultdict
 from datetime import datetime
 from typing import Any, Dict, List

-from hermes_agent.providers.pricing import (
+from agent.usage_pricing import (
    CanonicalUsage,
    DEFAULT_PRICING,
    estimate_usage_cost,
@@ -124,7 +124,6 @@ class InsightsEngine:
        # Gather raw data
        sessions = self._get_sessions(cutoff, source)
        tool_usage = self._get_tool_usage(cutoff, source)
-        skill_usage = self._get_skill_usage(cutoff, source)
        message_stats = self._get_message_stats(cutoff, source)

        if not sessions:
@@ -136,15 +135,6 @@ class InsightsEngine:
                "models": [],
                "platforms": [],
                "tools": [],
-                "skills": {
-                    "summary": {
-                        "total_skill_loads": 0,
-                        "total_skill_edits": 0,
-                        "total_skill_actions": 0,
-                        "distinct_skills_used": 0,
-                    },
-                    "top_skills": [],
-                },
                "activity": {},
                "top_sessions": [],
            }
@@ -154,7 +144,6 @@ class InsightsEngine:
        models = self._compute_model_breakdown(sessions)
        platforms = self._compute_platform_breakdown(sessions)
        tools = self._compute_tool_breakdown(tool_usage)
-        skills = self._compute_skill_breakdown(skill_usage)
        activity = self._compute_activity_patterns(sessions)
        top_sessions = self._compute_top_sessions(sessions)

@@ -167,7 +156,6 @@ class InsightsEngine:
            "models": models,
            "platforms": platforms,
            "tools": tools,
-            "skills": skills,
            "activity": activity,
            "top_sessions": top_sessions,
        }
@@ -296,82 +284,6 @@ class InsightsEngine:
            for name, count in tool_counts.most_common()
        ]

-    def _get_skill_usage(self, cutoff: float, source: str = None) -> List[Dict]:
-        """Extract per-skill usage from assistant tool calls."""
-        skill_counts: Dict[str, Dict[str, Any]] = {}
-
-        if source:
-            cursor = self._conn.execute(
-                """SELECT m.tool_calls, m.timestamp
-                   FROM messages m
-                   JOIN sessions s ON s.id = m.session_id
-                   WHERE s.started_at >= ? AND s.source = ?
-                     AND m.role = 'assistant' AND m.tool_calls IS NOT NULL""",
-                (cutoff, source),
-            )
-        else:
-            cursor = self._conn.execute(
-                """SELECT m.tool_calls, m.timestamp
-                   FROM messages m
-                   JOIN sessions s ON s.id = m.session_id
-                   WHERE s.started_at >= ?
-                     AND m.role = 'assistant' AND m.tool_calls IS NOT NULL""",
-                (cutoff,),
-            )
-
-        for row in cursor.fetchall():
-            try:
-                calls = row["tool_calls"]
-                if isinstance(calls, str):
-                    calls = json.loads(calls)
-                if not isinstance(calls, list):
-                    continue
-            except (json.JSONDecodeError, TypeError):
-                continue
-
-            timestamp = row["timestamp"]
-            for call in calls:
-                if not isinstance(call, dict):
-                    continue
-                func = call.get("function", {})
-                tool_name = func.get("name")
-                if tool_name not in {"skill_view", "skill_manage"}:
-                    continue
-
-                args = func.get("arguments")
-                if isinstance(args, str):
-                    try:
-                        args = json.loads(args)
-                    except (json.JSONDecodeError, TypeError):
-                        continue
-                if not isinstance(args, dict):
-                    continue
-
-                skill_name = args.get("name")
-                if not isinstance(skill_name, str) or not skill_name.strip():
-                    continue
-
-                entry = skill_counts.setdefault(
-                    skill_name,
-                    {
-                        "skill": skill_name,
-                        "view_count": 0,
-                        "manage_count": 0,
-                        "last_used_at": None,
-                    },
-                )
-                if tool_name == "skill_view":
-                    entry["view_count"] += 1
-                else:
-                    entry["manage_count"] += 1
-
-                if timestamp is not None and (
-                    entry["last_used_at"] is None or timestamp > entry["last_used_at"]
-                ):
-                    entry["last_used_at"] = timestamp
-
-        return list(skill_counts.values())
-
    def _get_message_stats(self, cutoff: float, source: str = None) -> Dict:
        """Get aggregate message statistics."""
        if source:
@@ -563,46 +475,6 @@ class InsightsEngine:
            })
        return result

-    def _compute_skill_breakdown(self, skill_usage: List[Dict]) -> Dict[str, Any]:
-        """Process per-skill usage into summary + ranked list."""
-        total_skill_loads = sum(s["view_count"] for s in skill_usage) if skill_usage else 0
-        total_skill_edits = sum(s["manage_count"] for s in skill_usage) if skill_usage else 0
-        total_skill_actions = total_skill_loads + total_skill_edits
-
-        top_skills = []
-        for skill in skill_usage:
-            total_count = skill["view_count"] + skill["manage_count"]
-            percentage = (total_count / total_skill_actions * 100) if total_skill_actions else 0
-            top_skills.append({
-                "skill": skill["skill"],
-                "view_count": skill["view_count"],
-                "manage_count": skill["manage_count"],
-                "total_count": total_count,
-                "percentage": percentage,
-                "last_used_at": skill.get("last_used_at"),
-            })
-
-        top_skills.sort(
-            key=lambda s: (
-                s["total_count"],
-                s["view_count"],
-                s["manage_count"],
-                s["last_used_at"] or 0,
-                s["skill"],
-            ),
-            reverse=True,
-        )
-
-        return {
-            "summary": {
-                "total_skill_loads": total_skill_loads,
-                "total_skill_edits": total_skill_edits,
-                "total_skill_actions": total_skill_actions,
-                "distinct_skills_used": len(skill_usage),
-            },
-            "top_skills": top_skills,
-        }
-
    def _compute_activity_patterns(self, sessions: List[Dict]) -> Dict:
        """Analyze activity patterns by day of week and hour."""
        day_counts = Counter()  # 0=Monday ... 6=Sunday
@@ -798,28 +670,6 @@ class InsightsEngine:
                lines.append(f"  ... and {len(report['tools']) - 15} more tools")
            lines.append("")

-        # Skill usage
-        skills = report.get("skills", {})
-        top_skills = skills.get("top_skills", [])
-        if top_skills:
-            lines.append("  🧠 Top Skills")
-            lines.append("  " + "─" * 56)
-            lines.append(f"  {'Skill':<28} {'Loads':>7} {'Edits':>7} {'Last used':>11}")
-            for skill in top_skills[:10]:
-                last_used = "—"
-                if skill.get("last_used_at"):
-                    last_used = datetime.fromtimestamp(skill["last_used_at"]).strftime("%b %d")
-                lines.append(
-                    f"  {skill['skill'][:28]:<28} {skill['view_count']:>7,} {skill['manage_count']:>7,} {last_used:>11}"
-                )
-            summary = skills.get("summary", {})
-            lines.append(
-                f"  Distinct skills: {summary.get('distinct_skills_used', 0)}  "
-                f"Loads: {summary.get('total_skill_loads', 0):,}  "
-                f"Edits: {summary.get('total_skill_edits', 0):,}"
-            )
-            lines.append("")
-
        # Activity patterns
        act = report.get("activity", {})
        if act.get("by_day"):
@@ -903,18 +753,6 @@ class InsightsEngine:
                lines.append(f"  {t['tool']} — {t['count']:,} calls ({t['percentage']:.1f}%)")
            lines.append("")

-        skills = report.get("skills", {})
-        if skills.get("top_skills"):
-            lines.append("**🧠 Top Skills:**")
-            for skill in skills["top_skills"][:5]:
-                suffix = ""
-                if skill.get("last_used_at"):
-                    suffix = f", last used {datetime.fromtimestamp(skill['last_used_at']).strftime('%b %d')}"
-                lines.append(
-                    f"  {skill['skill']} — {skill['view_count']:,} loads, {skill['manage_count']:,} edits{suffix}"
-                )
-            lines.append("")
-
        # Activity summary
        act = report.get("activity", {})
        if act.get("busiest_day") and act.get("busiest_hour"):
--- a/hermes_agent/agent/manual_compression_feedback.py
+++ b/hermes_agent/agent/manual_compression_feedback.py
--- a/hermes_agent/agent/memory/manager.py
+++ b/hermes_agent/agent/memory/manager.py
@@ -33,8 +33,8 @@ import logging
 import re
 from typing import Any, Dict, List, Optional

-from hermes_agent.agent.memory.provider import MemoryProvider
-from hermes_agent.tools.registry import tool_error
+from agent.memory_provider import MemoryProvider
+from tools.registry import tool_error

 logger = logging.getLogger(__name__)

@@ -361,7 +361,7 @@ class MemoryManager:
        ``get_hermes_home()`` themselves.
        """
        if "hermes_home" not in kwargs:
-            from hermes_agent.constants import get_hermes_home
+            from hermes_constants import get_hermes_home
            kwargs["hermes_home"] = str(get_hermes_home())
        for provider in self._providers:
            try:
--- a/hermes_agent/agent/memory/provider.py
+++ b/hermes_agent/agent/memory/provider.py
--- a/hermes_agent/providers/metadata.py
+++ b/hermes_agent/providers/metadata.py
@@ -14,9 +14,7 @@ from urllib.parse import urlparse
 import requests
 import yaml

-from hermes_agent.utils import base_url_host_matches, base_url_hostname
-
-from hermes_agent.constants import OPENROUTER_MODELS_URL
+from hermes_constants import OPENROUTER_MODELS_URL

 logger = logging.getLogger(__name__)

@@ -40,7 +38,6 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
    "mimo", "xiaomi-mimo",
    "arcee-ai", "arceeai",
    "xai", "x-ai", "x.ai", "grok",
-    "nvidia", "nim", "nvidia-nim", "nemotron",
    "qwen-portal",
 })

@@ -118,6 +115,7 @@ DEFAULT_CONTEXT_LENGTHS = {
    "gpt-5.4-nano": 400000,           # 400k (not 1.05M like full 5.4)
    "gpt-5.4-mini": 400000,           # 400k (not 1.05M like full 5.4)
    "gpt-5.4": 1050000,               # GPT-5.4, GPT-5.4 Pro (1.05M context)
+    "gpt-5.3-codex-spark": 128000,    # Spark variant has reduced 128k context
    "gpt-5.1-chat": 128000,           # Chat variant has 128k context
    "gpt-5": 400000,                  # GPT-5.x base, mini, codex variants (400k)
    "gpt-4.1": 1047576,
@@ -126,6 +124,7 @@ DEFAULT_CONTEXT_LENGTHS = {
    "gemini": 1048576,
    # Gemma (open models served via AI Studio)
    "gemma-4-31b": 256000,
+    "gemma-4-26b": 256000,
    "gemma-3": 131072,
    "gemma": 8192,  # fallback for older gemma models
    # DeepSeek
@@ -159,8 +158,6 @@ DEFAULT_CONTEXT_LENGTHS = {
    "grok": 131072,             # catch-all (grok-beta, unknown grok-*)
    # Kimi
    "kimi": 262144,
-    # Nemotron — NVIDIA's open-weights series (128K context across all sizes)
-    "nemotron": 131072,
    # Arcee
    "trinity": 262144,
    # OpenRouter
@@ -170,7 +167,6 @@ DEFAULT_CONTEXT_LENGTHS = {
    "Qwen/Qwen3.5-35B-A3B": 131072,
    "deepseek-ai/DeepSeek-V3.2": 65536,
    "moonshotai/Kimi-K2.5": 262144,
-    "moonshotai/Kimi-K2.6": 262144,
    "moonshotai/Kimi-K2-Thinking": 262144,
    "MiniMaxAI/MiniMax-M2.5": 204800,
    "XiaomiMiMo/MiMo-V2-Flash": 256000,
@@ -213,15 +209,8 @@ def _normalize_base_url(base_url: str) -> str:
    return (base_url or "").strip().rstrip("/")


-def _auth_headers(api_key: str = "") -> Dict[str, str]:
-    token = str(api_key or "").strip()
-    if not token:
-        return {}
-    return {"Authorization": f"Bearer {token}"}
-
-
 def _is_openrouter_base_url(base_url: str) -> bool:
-    return base_url_host_matches(base_url, "openrouter.ai")
+    return "openrouter.ai" in _normalize_base_url(base_url).lower()


 def _is_custom_endpoint(base_url: str) -> bool:
@@ -251,7 +240,6 @@ _URL_TO_PROVIDER: Dict[str, str] = {
    "api.fireworks.ai": "fireworks",
    "opencode.ai": "opencode-go",
    "api.x.ai": "xai",
-    "integrate.api.nvidia.com": "nvidia",
    "api.xiaomimimo.com": "xiaomi",
    "xiaomimimo.com": "xiaomi",
    "ollama.com": "ollama-cloud",
@@ -319,7 +307,7 @@ def is_local_endpoint(base_url: str) -> bool:
    return False


-def detect_local_server_type(base_url: str, api_key: str = "") -> Optional[str]:
+def detect_local_server_type(base_url: str) -> Optional[str]:
    """Detect which local server is running at base_url by probing known endpoints.

    Returns one of: "ollama", "lm-studio", "vllm", "llamacpp", or None.
@@ -331,10 +319,8 @@ def detect_local_server_type(base_url: str, api_key: str = "") -> Optional[str]:
    if server_url.endswith("/v1"):
        server_url = server_url[:-3]

-    headers = _auth_headers(api_key)
-
    try:
-        with httpx.Client(timeout=2.0, headers=headers) as client:
+        with httpx.Client(timeout=2.0) as client:
            # LM Studio exposes /api/v1/models — check first (most specific)
            try:
                r = client.get(f"{server_url}/api/v1/models")
@@ -521,59 +507,6 @@ def fetch_endpoint_model_metadata(
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    last_error: Optional[Exception] = None

-    if is_local_endpoint(normalized):
-        try:
-            if detect_local_server_type(normalized, api_key=api_key) == "lm-studio":
-                server_url = normalized[:-3].rstrip("/") if normalized.endswith("/v1") else normalized
-                response = requests.get(
-                    server_url.rstrip("/") + "/api/v1/models",
-                    headers=headers,
-                    timeout=10,
-                )
-                response.raise_for_status()
-                payload = response.json()
-                cache: Dict[str, Dict[str, Any]] = {}
-                for model in payload.get("models", []):
-                    if not isinstance(model, dict):
-                        continue
-                    model_id = model.get("key") or model.get("id")
-                    if not model_id:
-                        continue
-                    entry: Dict[str, Any] = {"name": model.get("name", model_id)}
-
-                    context_length = None
-                    for inst in model.get("loaded_instances", []) or []:
-                        if not isinstance(inst, dict):
-                            continue
-                        cfg = inst.get("config", {})
-                        ctx = cfg.get("context_length") if isinstance(cfg, dict) else None
-                        if isinstance(ctx, int) and ctx > 0:
-                            context_length = ctx
-                            break
-                    if context_length is None:
-                        context_length = _extract_context_length(model)
-                    if context_length is not None:
-                        entry["context_length"] = context_length
-
-                    max_completion_tokens = _extract_max_completion_tokens(model)
-                    if max_completion_tokens is not None:
-                        entry["max_completion_tokens"] = max_completion_tokens
-
-                    pricing = _extract_pricing(model)
-                    if pricing:
-                        entry["pricing"] = pricing
-
-                    _add_model_aliases(cache, model_id, entry)
-                    alt_id = model.get("id")
-                    if isinstance(alt_id, str) and alt_id and alt_id != model_id:
-                        _add_model_aliases(cache, alt_id, entry)
-
-                _endpoint_model_metadata_cache[normalized] = cache
-                _endpoint_model_metadata_cache_time[normalized] = time.time()
-                return cache
-        except Exception as exc:
-            last_error = exc
-
    for candidate in candidates:
        url = candidate.rstrip("/") + "/models"
        try:
@@ -636,7 +569,7 @@ def fetch_endpoint_model_metadata(

 def _get_context_cache_path() -> Path:
    """Return path to the persistent context length cache file."""
-    from hermes_agent.constants import get_hermes_home
+    from hermes_constants import get_hermes_home
    return get_hermes_home() / "context_length_cache.yaml"


@@ -780,7 +713,7 @@ def _model_id_matches(candidate_id: str, lookup_model: str) -> bool:
    return False


-def query_ollama_num_ctx(model: str, base_url: str, api_key: str = "") -> Optional[int]:
+def query_ollama_num_ctx(model: str, base_url: str) -> Optional[int]:
    """Query an Ollama server for the model's context length.

    Returns the model's maximum context from GGUF metadata via ``/api/show``,
@@ -798,16 +731,14 @@ def query_ollama_num_ctx(model: str, base_url: str, api_key: str = "") -> Option
        server_url = server_url[:-3]

    try:
-        server_type = detect_local_server_type(base_url, api_key=api_key)
+        server_type = detect_local_server_type(base_url)
    except Exception:
        return None
    if server_type != "ollama":
        return None

-    headers = _auth_headers(api_key)
-
    try:
-        with httpx.Client(timeout=3.0, headers=headers) as client:
+        with httpx.Client(timeout=3.0) as client:
            resp = client.post(f"{server_url}/api/show", json={"name": bare_model})
            if resp.status_code != 200:
                return None
@@ -835,7 +766,7 @@ def query_ollama_num_ctx(model: str, base_url: str, api_key: str = "") -> Option
    return None


-def _query_local_context_length(model: str, base_url: str, api_key: str = "") -> Optional[int]:
+def _query_local_context_length(model: str, base_url: str) -> Optional[int]:
    """Query a local server for the model's context length."""
    import httpx

@@ -848,15 +779,13 @@ def _query_local_context_length(model: str, base_url: str, api_key: str = "") ->
    if server_url.endswith("/v1"):
        server_url = server_url[:-3]

-    headers = _auth_headers(api_key)
-
    try:
-        server_type = detect_local_server_type(base_url, api_key=api_key)
+        server_type = detect_local_server_type(base_url)
    except Exception:
        server_type = None

    try:
-        with httpx.Client(timeout=3.0, headers=headers) as client:
+        with httpx.Client(timeout=3.0) as client:
            # Ollama: /api/show returns model details with context info
            if server_type == "ollama":
                resp = client.post(f"{server_url}/api/show", json={"name": model})
@@ -1067,7 +996,7 @@ def get_model_context_length(
        if not _is_known_provider_base_url(base_url):
            # 3. Try querying local server directly
            if is_local_endpoint(base_url):
-                local_ctx = _query_local_context_length(model, base_url, api_key=api_key)
+                local_ctx = _query_local_context_length(model, base_url)
                if local_ctx and local_ctx > 0:
                    save_context_length(model, base_url, local_ctx)
                    return local_ctx
@@ -1081,7 +1010,7 @@ def get_model_context_length(

    # 4. Anthropic /v1/models API (only for regular API keys, not OAuth)
    if provider == "anthropic" or (
-        base_url and base_url_hostname(base_url) == "api.anthropic.com"
+        base_url and "api.anthropic.com" in base_url
    ):
        ctx = _query_anthropic_context_length(model, base_url or "https://api.anthropic.com", api_key)
        if ctx:
@@ -1090,13 +1019,9 @@ def get_model_context_length(
    # 4b. AWS Bedrock — use static context length table.
    # Bedrock's ListFoundationModels doesn't expose context window sizes,
    # so we maintain a curated table in bedrock_adapter.py.
-    if provider == "bedrock" or (
-        base_url
-        and base_url_hostname(base_url).startswith("bedrock-runtime.")
-        and base_url_host_matches(base_url, "amazonaws.com")
-    ):
+    if provider == "bedrock" or (base_url and "bedrock-runtime" in base_url):
        try:
-            from hermes_agent.providers.bedrock_adapter import get_bedrock_context_length
+            from agent.bedrock_adapter import get_bedrock_context_length
            return get_bedrock_context_length(model)
        except ImportError:
            pass  # boto3 not installed — fall through to generic resolution
@@ -1118,7 +1043,7 @@ def get_model_context_length(
        if ctx:
            return ctx
    if effective_provider:
-        from hermes_agent.providers.metadata_dev import lookup_models_dev_context
+        from agent.models_dev import lookup_models_dev_context
        ctx = lookup_models_dev_context(effective_provider, model)
        if ctx:
            return ctx
@@ -1141,7 +1066,7 @@ def get_model_context_length(

    # 9. Query local server as last resort
    if base_url and is_local_endpoint(base_url):
-        local_ctx = _query_local_context_length(model, base_url, api_key=api_key)
+        local_ctx = _query_local_context_length(model, base_url)
        if local_ctx and local_ctx > 0:
            save_context_length(model, base_url, local_ctx)
            return local_ctx
--- a/hermes_agent/providers/metadata_dev.py
+++ b/hermes_agent/providers/metadata_dev.py
@@ -25,7 +25,7 @@ from dataclasses import dataclass
 from pathlib import Path
 from typing import Any, Dict, List, Optional, Tuple

-from hermes_agent.utils import atomic_json_write
+from utils import atomic_json_write

 import requests

@@ -179,7 +179,7 @@ _MODELS_DEV_TO_PROVIDER: Optional[Dict[str, str]] = None

 def _get_cache_path() -> Path:
    """Return path to disk cache file."""
-    from hermes_agent.constants import get_hermes_home
+    from hermes_constants import get_hermes_home
    return get_hermes_home() / "models_dev_cache.json"


@@ -420,10 +420,7 @@ def list_provider_models(provider: str) -> List[str]:
    models = _get_provider_models(provider)
    if models is None:
        return []
-    return [
-        mid for mid in models.keys()
-        if not _should_hide_from_provider_catalog(provider, mid)
-    ]
+    return list(models.keys())


 # Patterns that indicate non-agentic or noise models (TTS, embedding,
@@ -435,43 +432,6 @@ _NOISE_PATTERNS: re.Pattern = re.compile(
    re.IGNORECASE,
 )

-# Google's live Gemini catalogs currently include a mix of stale slugs and
-# Gemma models whose TPM quotas are too small for normal Hermes agent traffic.
-# Keep capability metadata available for direct/manual use, but hide these from
-# the Gemini model catalogs we surface in setup and model selection.
-_GOOGLE_HIDDEN_MODELS = frozenset({
-    # Low-TPM Gemma models that trip Google input-token quota walls under
-    # agent-style traffic despite advertising large context windows.
-    "gemma-4-31b-it",
-    "gemma-4-26b-it",
-    "gemma-4-26b-a4b-it",
-    "gemma-3-1b",
-    "gemma-3-1b-it",
-    "gemma-3-2b",
-    "gemma-3-2b-it",
-    "gemma-3-4b",
-    "gemma-3-4b-it",
-    "gemma-3-12b",
-    "gemma-3-12b-it",
-    "gemma-3-27b",
-    "gemma-3-27b-it",
-    # Stale/retired Google slugs that still surface through models.dev-backed
-    # Gemini selection but 404 on the current Google endpoints.
-    "gemini-1.5-flash",
-    "gemini-1.5-pro",
-    "gemini-1.5-flash-8b",
-    "gemini-2.0-flash",
-    "gemini-2.0-flash-lite",
-})
-
-
-def _should_hide_from_provider_catalog(provider: str, model_id: str) -> bool:
-    provider_lower = (provider or "").strip().lower()
-    model_lower = (model_id or "").strip().lower()
-    if provider_lower in {"gemini", "google"} and model_lower in _GOOGLE_HIDDEN_MODELS:
-        return True
-    return False
-

 def list_agentic_models(provider: str) -> List[str]:
    """Return model IDs suitable for agentic use from models.dev.
@@ -488,8 +448,6 @@ def list_agentic_models(provider: str) -> List[str]:
    for mid, entry in models.items():
        if not isinstance(entry, dict):
            continue
-        if _should_hide_from_provider_catalog(provider, mid):
-            continue
        if not entry.get("tool_call", False):
            continue
        if _NOISE_PATTERNS.search(mid):
@@ -624,3 +582,5 @@ def get_model_info(
            return _parse_model_info(mid, mdata, mdev_id)

    return None
+
+
--- a/hermes_agent/providers/nous_rate_guard.py
+++ b/hermes_agent/providers/nous_rate_guard.py
@@ -28,7 +28,7 @@ _STATE_FILENAME = "nous.json"
 def _state_path() -> str:
    """Return the path to the Nous rate limit state file."""
    try:
-        from hermes_agent.constants import get_hermes_home
+        from hermes_constants import get_hermes_home
        base = get_hermes_home()
    except ImportError:
        base = os.path.join(os.path.expanduser("~"), ".hermes")
--- a/hermes_agent/agent/prompt_builder.py
+++ b/hermes_agent/agent/prompt_builder.py
@@ -12,10 +12,10 @@ import threading
 from collections import OrderedDict
 from pathlib import Path

-from hermes_agent.constants import get_hermes_home, get_skills_dir, is_wsl
+from hermes_constants import get_hermes_home, get_skills_dir, is_wsl
 from typing import Optional

-from hermes_agent.agent.skill_utils import (
+from agent.skill_utils import (
    extract_skill_conditions,
    extract_skill_description,
    get_all_skills_dirs,
@@ -24,7 +24,7 @@ from hermes_agent.agent.skill_utils import (
    parse_frontmatter,
    skill_matches_platform,
 )
-from hermes_agent.utils import atomic_json_write
+from utils import atomic_json_write

 logger = logging.getLogger(__name__)

@@ -152,13 +152,7 @@ MEMORY_GUIDANCE = (
    "Do NOT save task progress, session outcomes, completed-work logs, or temporary TODO "
    "state to memory; use session_search to recall those from past transcripts. "
    "If you've discovered a new way to do something, solved a problem that could be "
-    "necessary later, save it as a skill with the skill tool.\n"
-    "Write memories as declarative facts, not instructions to yourself. "
-    "'User prefers concise responses' ✓ — 'Always respond concisely' ✗. "
-    "'Project uses pytest with xdist' ✓ — 'Run tests with pytest -n 4' ✗. "
-    "Imperative phrasing gets re-read as a directive in later sessions and can "
-    "cause repeated work or override the user's current request. Procedures and "
-    "workflows belong in skills, not memory."
+    "necessary later, save it as a skill with the skill tool."
 )

 SESSION_SEARCH_GUIDANCE = (
@@ -350,13 +344,7 @@ PLATFORM_HINTS = {
    ),
    "cli": (
        "You are a CLI AI Agent. Try not to use markdown but simple text "
-        "renderable inside a terminal. "
-        "File delivery: there is no attachment channel — the user reads your "
-        "response directly in their terminal. Do NOT emit MEDIA:/path tags "
-        "(those are only intercepted on messaging platforms like Telegram, "
-        "Discord, Slack, etc.; on the CLI they render as literal text). "
-        "When referring to a file you created or changed, just state its "
-        "absolute path in plain text; the user can open it from there."
+        "renderable inside a terminal."
    ),
    "sms": (
        "You are communicating via SMS. Keep responses concise and use plain text "
@@ -619,20 +607,18 @@ def build_skills_system_prompt(
    # ── Layer 1: in-process LRU cache ─────────────────────────────────
    # Include the resolved platform so per-platform disabled-skill lists
    # produce distinct cache entries (gateway serves multiple platforms).
-    from hermes_agent.gateway.session_context import get_session_env
+    from gateway.session_context import get_session_env
    _platform_hint = (
        os.environ.get("HERMES_PLATFORM")
        or get_session_env("HERMES_SESSION_PLATFORM")
        or ""
    )
-    disabled = get_disabled_skill_names()
    cache_key = (
        str(skills_dir.resolve()),
        tuple(str(d) for d in external_dirs),
        tuple(sorted(str(t) for t in (available_tools or set()))),
        tuple(sorted(str(ts) for ts in (available_toolsets or set()))),
        _platform_hint,
-        tuple(sorted(disabled)),
    )
    with _SKILLS_PROMPT_CACHE_LOCK:
        cached = _SKILLS_PROMPT_CACHE.get(cache_key)
@@ -640,6 +626,8 @@ def build_skills_system_prompt(
            _SKILLS_PROMPT_CACHE.move_to_end(cache_key)
            return cached

+    disabled = get_disabled_skill_names()
+
    # ── Layer 2: disk snapshot ────────────────────────────────────────
    snapshot = _load_skills_snapshot(skills_dir)

@@ -666,7 +654,7 @@ def build_skills_system_prompt(
            ):
                continue
            skills_by_category.setdefault(category, []).append(
-                (frontmatter_name, entry.get("description", ""))
+                (skill_name, entry.get("description", ""))
            )
        category_descriptions = {
            str(k): str(v)
@@ -691,7 +679,7 @@ def build_skills_system_prompt(
            ):
                continue
            skills_by_category.setdefault(entry["category"], []).append(
-                (entry["frontmatter_name"], entry["description"])
+                (skill_name, entry["description"])
            )

        # Read category-level DESCRIPTION.md files
@@ -734,10 +722,9 @@ def build_skills_system_prompt(
                    continue
                entry = _build_snapshot_entry(skill_file, ext_dir, frontmatter, desc)
                skill_name = entry["skill_name"]
-                frontmatter_name = entry["frontmatter_name"]
-                if frontmatter_name in seen_skill_names:
+                if skill_name in seen_skill_names:
                    continue
-                if frontmatter_name in disabled or skill_name in disabled:
+                if entry["frontmatter_name"] in disabled or skill_name in disabled:
                    continue
                if not _skill_should_show(
                    extract_skill_conditions(frontmatter),
@@ -745,9 +732,9 @@ def build_skills_system_prompt(
                    available_toolsets,
                ):
                    continue
-                seen_skill_names.add(frontmatter_name)
+                seen_skill_names.add(skill_name)
                skills_by_category.setdefault(entry["category"], []).append(
-                    (frontmatter_name, entry["description"])
+                    (skill_name, entry["description"])
                )
            except Exception as e:
                logger.debug("Error reading external skill %s: %s", skill_file, e)
@@ -824,8 +811,8 @@ def build_skills_system_prompt(
 def build_nous_subscription_prompt(valid_tool_names: "set[str] | None" = None) -> str:
    """Build a compact Nous subscription capability block for the system prompt."""
    try:
-        from hermes_agent.cli.nous_subscription import get_nous_subscription_features
-        from hermes_agent.tools.backend_helpers import managed_nous_tools_enabled
+        from hermes_cli.nous_subscription import get_nous_subscription_features
+        from tools.tool_backend_helpers import managed_nous_tools_enabled
    except Exception as exc:
        logger.debug("Failed to import Nous subscription helper: %s", exc)
        return ""
@@ -911,7 +898,7 @@ def load_soul_md() -> Optional[str]:
    ``skip_soul=True`` so SOUL.md isn't injected twice.
    """
    try:
-        from hermes_agent.cli.config import ensure_hermes_home
+        from hermes_cli.config import ensure_hermes_home
        ensure_hermes_home()
    except Exception as e:
        logger.debug("Could not ensure HERMES_HOME before loading SOUL.md: %s", e)
--- a/hermes_agent/providers/caching.py
+++ b/hermes_agent/providers/caching.py
--- a/hermes_agent/providers/rate_limiting.py
+++ b/hermes_agent/providers/rate_limiting.py
--- a/hermes_agent/agent/redact.py
+++ b/hermes_agent/agent/redact.py
@@ -13,48 +13,6 @@ import re

 logger = logging.getLogger(__name__)

-# Sensitive query-string parameter names (case-insensitive exact match).
-# Ported from nearai/ironclaw#2529 — catches tokens whose values don't match
-# any known vendor prefix regex (e.g. opaque tokens, short OAuth codes).
-_SENSITIVE_QUERY_PARAMS = frozenset({
-    "access_token",
-    "refresh_token",
-    "id_token",
-    "token",
-    "api_key",
-    "apikey",
-    "client_secret",
-    "password",
-    "auth",
-    "jwt",
-    "session",
-    "secret",
-    "key",
-    "code",           # OAuth authorization codes
-    "signature",      # pre-signed URL signatures
-    "x-amz-signature",
-})
-
-# Sensitive form-urlencoded / JSON body key names (case-insensitive exact match).
-# Exact match, NOT substring — "token_count" and "session_id" must NOT match.
-# Ported from nearai/ironclaw#2529.
-_SENSITIVE_BODY_KEYS = frozenset({
-    "access_token",
-    "refresh_token",
-    "id_token",
-    "token",
-    "api_key",
-    "apikey",
-    "client_secret",
-    "password",
-    "auth",
-    "jwt",
-    "secret",
-    "private_key",
-    "authorization",
-    "key",
-})
-
 # Snapshot at import time so runtime env mutations (e.g. LLM-generated
 # `export HERMES_REDACT_SECRETS=false`) cannot disable redaction mid-session.
 _REDACT_ENABLED = os.getenv("HERMES_REDACT_SECRETS", "").lower() not in ("0", "false", "no", "off")
@@ -150,30 +108,6 @@ _DISCORD_MENTION_RE = re.compile(r"<@!?(\d{17,20})>")
 # Negative lookahead prevents matching hex strings or identifiers
 _SIGNAL_PHONE_RE = re.compile(r"(\+[1-9]\d{6,14})(?![A-Za-z0-9])")

-# URLs containing query strings — matches `scheme://...?...[# or end]`.
-# Used to scan text for URLs whose query params may contain secrets.
-# Ported from nearai/ironclaw#2529.
-_URL_WITH_QUERY_RE = re.compile(
-    r"(https?|wss?|ftp)://"          # scheme
-    r"([^\s/?#]+)"                    # authority (may include userinfo)
-    r"([^\s?#]*)"                     # path
-    r"\?([^\s#]+)"                    # query (required)
-    r"(#\S*)?",                       # optional fragment
-)
-
-# URLs containing userinfo — `scheme://user:password@host` for ANY scheme
-# (not just DB protocols already covered by _DB_CONNSTR_RE above).
-# Catches things like `https://user:token@api.example.com/v1/foo`.
-_URL_USERINFO_RE = re.compile(
-    r"(https?|wss?|ftp)://([^/\s:@]+):([^/\s@]+)@",
-)
-
-# Form-urlencoded body detection: conservative — only applies when the entire
-# text looks like a query string (k=v&k=v pattern with no newlines).
-_FORM_BODY_RE = re.compile(
-    r"^[A-Za-z_][A-Za-z0-9_.-]*=[^&\s]*(?:&[A-Za-z_][A-Za-z0-9_.-]*=[^&\s]*)+$"
-)
-
 # Compile known prefix patterns into one alternation
 _PREFIX_RE = re.compile(
    r"(?<![A-Za-z0-9_-])(" + "|".join(_PREFIX_PATTERNS) + r")(?![A-Za-z0-9_-])"
@@ -187,72 +121,6 @@ def _mask_token(token: str) -> str:
    return f"{token[:6]}...{token[-4:]}"


-def _redact_query_string(query: str) -> str:
-    """Redact sensitive parameter values in a URL query string.
-
-    Handles `k=v&k=v` format. Sensitive keys (case-insensitive) have values
-    replaced with `***`. Non-sensitive keys pass through unchanged.
-    Empty or malformed pairs are preserved as-is.
-    """
-    if not query:
-        return query
-    parts = []
-    for pair in query.split("&"):
-        if "=" not in pair:
-            parts.append(pair)
-            continue
-        key, _, value = pair.partition("=")
-        if key.lower() in _SENSITIVE_QUERY_PARAMS:
-            parts.append(f"{key}=***")
-        else:
-            parts.append(pair)
-    return "&".join(parts)
-
-
-def _redact_url_query_params(text: str) -> str:
-    """Scan text for URLs with query strings and redact sensitive params.
-
-    Catches opaque tokens that don't match vendor prefix regexes, e.g.
-    `https://example.com/cb?code=ABC123&state=xyz` → `...?code=***&state=xyz`.
-    """
-    def _sub(m: re.Match) -> str:
-        scheme = m.group(1)
-        authority = m.group(2)
-        path = m.group(3)
-        query = _redact_query_string(m.group(4))
-        fragment = m.group(5) or ""
-        return f"{scheme}://{authority}{path}?{query}{fragment}"
-    return _URL_WITH_QUERY_RE.sub(_sub, text)
-
-
-def _redact_url_userinfo(text: str) -> str:
-    """Strip `user:password@` from HTTP/WS/FTP URLs.
-
-    DB protocols (postgres, mysql, mongodb, redis, amqp) are handled
-    separately by `_DB_CONNSTR_RE`.
-    """
-    return _URL_USERINFO_RE.sub(
-        lambda m: f"{m.group(1)}://{m.group(2)}:***@",
-        text,
-    )
-
-
-def _redact_form_body(text: str) -> str:
-    """Redact sensitive values in a form-urlencoded body.
-
-    Only applies when the entire input looks like a pure form body
-    (k=v&k=v with no newlines, no other text). Single-line non-form
-    text passes through unchanged. This is a conservative pass — the
-    `_redact_url_query_params` function handles embedded query strings.
-    """
-    if not text or "\n" in text or "&" not in text:
-        return text
-    # The body-body form check is strict: only trigger on clean k=v&k=v.
-    if not _FORM_BODY_RE.match(text.strip()):
-        return text
-    return _redact_query_string(text.strip())
-
-
 def redact_sensitive_text(text: str) -> str:
    """Apply all redaction patterns to a block of text.

@@ -305,16 +173,6 @@ def redact_sensitive_text(text: str) -> str:
    # JWT tokens (eyJ... — base64-encoded JSON headers)
    text = _JWT_RE.sub(lambda m: _mask_token(m.group(0)), text)

-    # URL userinfo (http(s)://user:pass@host) — redact for non-DB schemes.
-    # DB schemes are handled above by _DB_CONNSTR_RE.
-    text = _redact_url_userinfo(text)
-
-    # URL query params containing opaque tokens (?access_token=…&code=…)
-    text = _redact_url_query_params(text)
-
-    # Form-urlencoded bodies (only triggers on clean k=v&k=v inputs).
-    text = _redact_form_body(text)
-
    # Discord user/role mentions (<@snowflake_id>)
    text = _DISCORD_MENTION_RE.sub(lambda m: f"<@{'!' if '!' in m.group(0) else ''}***>", text)

--- a/hermes_agent/providers/retry.py
+++ b/hermes_agent/providers/retry.py
--- a/hermes_agent/agent/skill_commands.py
+++ b/hermes_agent/agent/skill_commands.py
@@ -8,12 +8,11 @@ can invoke skills via /skill-name commands and prompt-only built-ins like
 import json
 import logging
 import re
-import subprocess
 from datetime import datetime
 from pathlib import Path
 from typing import Any, Dict, Optional

-from hermes_agent.constants import display_hermes_home
+from hermes_constants import display_hermes_home

 logger = logging.getLogger(__name__)

@@ -23,110 +22,6 @@ _PLAN_SLUG_RE = re.compile(r"[^a-z0-9]+")
 _SKILL_INVALID_CHARS = re.compile(r"[^a-z0-9-]")
 _SKILL_MULTI_HYPHEN = re.compile(r"-{2,}")

-# Matches ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in SKILL.md.
-# Tokens that don't resolve (e.g. ${HERMES_SESSION_ID} with no session) are
-# left as-is so the user can debug them.
-_SKILL_TEMPLATE_RE = re.compile(r"\$\{(HERMES_SKILL_DIR|HERMES_SESSION_ID)\}")
-
-# Matches inline shell snippets like:  !`date +%Y-%m-%d`
-# Non-greedy, single-line only — no newlines inside the backticks.
-_INLINE_SHELL_RE = re.compile(r"!`([^`\n]+)`")
-
-# Cap inline-shell output so a runaway command can't blow out the context.
-_INLINE_SHELL_MAX_OUTPUT = 4000
-
-
-def _load_skills_config() -> dict:
-    """Load the ``skills`` section of config.yaml (best-effort)."""
-    try:
-        from hermes_agent.cli.config import load_config
-
-        cfg = load_config() or {}
-        skills_cfg = cfg.get("skills")
-        if isinstance(skills_cfg, dict):
-            return skills_cfg
-    except Exception:
-        logger.debug("Could not read skills config", exc_info=True)
-    return {}
-
-
-def _substitute_template_vars(
-    content: str,
-    skill_dir: Path | None,
-    session_id: str | None,
-) -> str:
-    """Replace ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} in skill content.
-
-    Only substitutes tokens for which a concrete value is available —
-    unresolved tokens are left in place so the author can spot them.
-    """
-    if not content:
-        return content
-
-    skill_dir_str = str(skill_dir) if skill_dir else None
-
-    def _replace(match: re.Match) -> str:
-        token = match.group(1)
-        if token == "HERMES_SKILL_DIR" and skill_dir_str:
-            return skill_dir_str
-        if token == "HERMES_SESSION_ID" and session_id:
-            return str(session_id)
-        return match.group(0)
-
-    return _SKILL_TEMPLATE_RE.sub(_replace, content)
-
-
-def _run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
-    """Execute a single inline-shell snippet and return its stdout (trimmed).
-
-    Failures return a short ``[inline-shell error: ...]`` marker instead of
-    raising, so one bad snippet can't wreck the whole skill message.
-    """
-    try:
-        completed = subprocess.run(
-            ["bash", "-c", command],
-            cwd=str(cwd) if cwd else None,
-            capture_output=True,
-            text=True,
-            timeout=max(1, int(timeout)),
-            check=False,
-        )
-    except subprocess.TimeoutExpired:
-        return f"[inline-shell timeout after {timeout}s: {command}]"
-    except FileNotFoundError:
-        return f"[inline-shell error: bash not found]"
-    except Exception as exc:
-        return f"[inline-shell error: {exc}]"
-
-    output = (completed.stdout or "").rstrip("\n")
-    if not output and completed.stderr:
-        output = completed.stderr.rstrip("\n")
-    if len(output) > _INLINE_SHELL_MAX_OUTPUT:
-        output = output[:_INLINE_SHELL_MAX_OUTPUT] + "…[truncated]"
-    return output
-
-
-def _expand_inline_shell(
-    content: str,
-    skill_dir: Path | None,
-    timeout: int,
-) -> str:
-    """Replace every !`cmd` snippet in ``content`` with its stdout.
-
-    Runs each snippet with the skill directory as CWD so relative paths in
-    the snippet work the way the author expects.
-    """
-    if "!`" not in content:
-        return content
-
-    def _replace(match: re.Match) -> str:
-        cmd = match.group(1).strip()
-        if not cmd:
-            return ""
-        return _run_inline_shell(cmd, skill_dir, timeout)
-
-    return _INLINE_SHELL_RE.sub(_replace, content)
-

 def build_plan_path(
    user_instruction: str = "",
@@ -156,7 +51,7 @@ def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tu
        return None

    try:
-        from hermes_agent.tools.skills.tool import SKILLS_DIR, skill_view
+        from tools.skills_tool import SKILLS_DIR, skill_view

        identifier_path = Path(raw_identifier).expanduser()
        if identifier_path.is_absolute():
@@ -202,7 +97,7 @@ def _inject_skill_config(loaded_skill: dict[str, Any], parts: list[str]) -> None
    without needing to read config.yaml itself.
    """
    try:
-        from hermes_agent.agent.skill_utils import (
+        from agent.skill_utils import (
            extract_skill_config_vars,
            parse_frontmatter,
            resolve_skill_config_values,
@@ -238,36 +133,14 @@ def _build_skill_message(
    activation_note: str,
    user_instruction: str = "",
    runtime_note: str = "",
-    session_id: str | None = None,
 ) -> str:
    """Format a loaded skill into a user/system message payload."""
-    from hermes_agent.tools.skills.tool import SKILLS_DIR
+    from tools.skills_tool import SKILLS_DIR

    content = str(loaded_skill.get("content") or "")

-    # ── Template substitution and inline-shell expansion ──
-    # Done before anything else so downstream blocks (setup notes,
-    # supporting-file hints) see the expanded content.
-    skills_cfg = _load_skills_config()
-    if skills_cfg.get("template_vars", True):
-        content = _substitute_template_vars(content, skill_dir, session_id)
-    if skills_cfg.get("inline_shell", False):
-        timeout = int(skills_cfg.get("inline_shell_timeout", 10) or 10)
-        content = _expand_inline_shell(content, skill_dir, timeout)
-
    parts = [activation_note, "", content.strip()]

-    # ── Inject the absolute skill directory so the agent can reference
-    #    bundled scripts without an extra skill_view() round-trip. ──
-    if skill_dir:
-        parts.append("")
-        parts.append(f"[Skill directory: {skill_dir}]")
-        parts.append(
-            "Resolve any relative paths in this skill (e.g. `scripts/foo.js`, "
-            "`templates/config.yaml`) against that directory, then run them "
-            "with the terminal tool using the absolute path."
-        )
-
    # ── Inject resolved skill config values ──
    _inject_skill_config(loaded_skill, parts)

@@ -315,13 +188,11 @@ def _build_skill_message(
            # Skill is from an external dir — use the skill name instead
            skill_view_target = skill_dir.name
        parts.append("")
-        parts.append("[This skill has supporting files:]")
+        parts.append("[This skill has supporting files you can load with the skill_view tool:]")
        for sf in supporting:
-            parts.append(f"- {sf}  ->  {skill_dir / sf}")
+            parts.append(f"- {sf}")
        parts.append(
-            f'\nLoad any of these with skill_view(name="{skill_view_target}", '
-            f'file_path="<path>"), or run scripts directly by absolute path '
-            f"(e.g. `node {skill_dir}/scripts/foo.js`)."
+            f'\nTo view any of these, use: skill_view(name="{skill_view_target}", file_path="<path>")'
        )

    if user_instruction:
@@ -344,8 +215,8 @@ def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
    global _skill_commands
    _skill_commands = {}
    try:
-        from hermes_agent.tools.skills.tool import SKILLS_DIR, _parse_frontmatter, skill_matches_platform, _get_disabled_skill_names
-        from hermes_agent.agent.skill_utils import get_external_skills_dirs
+        from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, skill_matches_platform, _get_disabled_skill_names
+        from agent.skill_utils import get_external_skills_dirs
        disabled = _get_disabled_skill_names()
        seen_names: set = set()

@@ -461,7 +332,6 @@ def build_skill_invocation_message(
        activation_note,
        user_instruction=user_instruction,
        runtime_note=runtime_note,
-        session_id=task_id,
    )


@@ -500,7 +370,6 @@ def build_preloaded_skills_prompt(
                loaded_skill,
                skill_dir,
                activation_note,
-                session_id=task_id,
            )
        )
        loaded_names.append(skill_name)
--- a/hermes_agent/agent/skill_utils.py
+++ b/hermes_agent/agent/skill_utils.py
@@ -12,7 +12,7 @@ import sys
 from pathlib import Path
 from typing import Any, Dict, List, Optional, Set, Tuple

-from hermes_agent.constants import get_config_path, get_skills_dir
+from hermes_constants import get_config_path, get_skills_dir

 logger = logging.getLogger(__name__)

@@ -145,7 +145,7 @@ def get_disabled_skill_names(platform: str | None = None) -> Set[str]:
    if not isinstance(skills_cfg, dict):
        return set()

-    from hermes_agent.gateway.session_context import get_session_env
+    from gateway.session_context import get_session_env
    resolved_platform = (
        platform
        or os.getenv("HERMES_PLATFORM")
@@ -455,8 +455,7 @@ def parse_qualified_name(name: str) -> Tuple[Optional[str], str]:
    """
    if ":" not in name:
        return None, name
-    ns, bare = name.split(":", 1)
-    return ns, bare
+    return tuple(name.split(":", 1))  # type: ignore[return-value]


 def is_valid_namespace(candidate: Optional[str]) -> bool:
--- a/agent/smart_model_routing.py
+++ b/agent/smart_model_routing.py
@@ -0,0 +1,195 @@
+"""Helpers for optional cheap-vs-strong model routing."""
+
+from __future__ import annotations
+
+import os
+import re
+from typing import Any, Dict, Optional
+
+from utils import is_truthy_value
+
+_COMPLEX_KEYWORDS = {
+    "debug",
+    "debugging",
+    "implement",
+    "implementation",
+    "refactor",
+    "patch",
+    "traceback",
+    "stacktrace",
+    "exception",
+    "error",
+    "analyze",
+    "analysis",
+    "investigate",
+    "architecture",
+    "design",
+    "compare",
+    "benchmark",
+    "optimize",
+    "optimise",
+    "review",
+    "terminal",
+    "shell",
+    "tool",
+    "tools",
+    "pytest",
+    "test",
+    "tests",
+    "plan",
+    "planning",
+    "delegate",
+    "subagent",
+    "cron",
+    "docker",
+    "kubernetes",
+}
+
+_URL_RE = re.compile(r"https?://|www\.", re.IGNORECASE)
+
+
+def _coerce_bool(value: Any, default: bool = False) -> bool:
+    return is_truthy_value(value, default=default)
+
+
+def _coerce_int(value: Any, default: int) -> int:
+    try:
+        return int(value)
+    except (TypeError, ValueError):
+        return default
+
+
+def choose_cheap_model_route(user_message: str, routing_config: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
+    """Return the configured cheap-model route when a message looks simple.
+
+    Conservative by design: if the message has signs of code/tool/debugging/
+    long-form work, keep the primary model.
+    """
+    cfg = routing_config or {}
+    if not _coerce_bool(cfg.get("enabled"), False):
+        return None
+
+    cheap_model = cfg.get("cheap_model") or {}
+    if not isinstance(cheap_model, dict):
+        return None
+    provider = str(cheap_model.get("provider") or "").strip().lower()
+    model = str(cheap_model.get("model") or "").strip()
+    if not provider or not model:
+        return None
+
+    text = (user_message or "").strip()
+    if not text:
+        return None
+
+    max_chars = _coerce_int(cfg.get("max_simple_chars"), 160)
+    max_words = _coerce_int(cfg.get("max_simple_words"), 28)
+
+    if len(text) > max_chars:
+        return None
+    if len(text.split()) > max_words:
+        return None
+    if text.count("\n") > 1:
+        return None
+    if "```" in text or "`" in text:
+        return None
+    if _URL_RE.search(text):
+        return None
+
+    lowered = text.lower()
+    words = {token.strip(".,:;!?()[]{}\"'`") for token in lowered.split()}
+    if words & _COMPLEX_KEYWORDS:
+        return None
+
+    route = dict(cheap_model)
+    route["provider"] = provider
+    route["model"] = model
+    route["routing_reason"] = "simple_turn"
+    return route
+
+
+def resolve_turn_route(user_message: str, routing_config: Optional[Dict[str, Any]], primary: Dict[str, Any]) -> Dict[str, Any]:
+    """Resolve the effective model/runtime for one turn.
+
+    Returns a dict with model/runtime/signature/label fields.
+    """
+    route = choose_cheap_model_route(user_message, routing_config)
+    if not route:
+        return {
+            "model": primary.get("model"),
+            "runtime": {
+                "api_key": primary.get("api_key"),
+                "base_url": primary.get("base_url"),
+                "provider": primary.get("provider"),
+                "api_mode": primary.get("api_mode"),
+                "command": primary.get("command"),
+                "args": list(primary.get("args") or []),
+                "credential_pool": primary.get("credential_pool"),
+            },
+            "label": None,
+            "signature": (
+                primary.get("model"),
+                primary.get("provider"),
+                primary.get("base_url"),
+                primary.get("api_mode"),
+                primary.get("command"),
+                tuple(primary.get("args") or ()),
+            ),
+        }
+
+    from hermes_cli.runtime_provider import resolve_runtime_provider
+
+    explicit_api_key = None
+    api_key_env = str(route.get("api_key_env") or "").strip()
+    if api_key_env:
+        explicit_api_key = os.getenv(api_key_env) or None
+
+    try:
+        runtime = resolve_runtime_provider(
+            requested=route.get("provider"),
+            explicit_api_key=explicit_api_key,
+            explicit_base_url=route.get("base_url"),
+        )
+    except Exception:
+        return {
+            "model": primary.get("model"),
+            "runtime": {
+                "api_key": primary.get("api_key"),
+                "base_url": primary.get("base_url"),
+                "provider": primary.get("provider"),
+                "api_mode": primary.get("api_mode"),
+                "command": primary.get("command"),
+                "args": list(primary.get("args") or []),
+                "credential_pool": primary.get("credential_pool"),
+            },
+            "label": None,
+            "signature": (
+                primary.get("model"),
+                primary.get("provider"),
+                primary.get("base_url"),
+                primary.get("api_mode"),
+                primary.get("command"),
+                tuple(primary.get("args") or ()),
+            ),
+        }
+
+    return {
+        "model": route.get("model"),
+        "runtime": {
+            "api_key": runtime.get("api_key"),
+            "base_url": runtime.get("base_url"),
+            "provider": runtime.get("provider"),
+            "api_mode": runtime.get("api_mode"),
+            "command": runtime.get("command"),
+            "args": list(runtime.get("args") or []),
+            "credential_pool": runtime.get("credential_pool"),
+        },
+        "label": f"smart route → {route.get('model')} ({runtime.get('provider')})",
+        "signature": (
+            route.get("model"),
+            runtime.get("provider"),
+            runtime.get("base_url"),
+            runtime.get("api_mode"),
+            runtime.get("command"),
+            tuple(runtime.get("args") or ()),
+        ),
+    }
--- a/hermes_agent/agent/subdirectory_hints.py
+++ b/hermes_agent/agent/subdirectory_hints.py
@@ -19,7 +19,7 @@ import shlex
 from pathlib import Path
 from typing import Dict, Any, Optional, Set

-from hermes_agent.agent.prompt_builder import _scan_context_content
+from agent.prompt_builder import _scan_context_content

 logger = logging.getLogger(__name__)

--- a/hermes_agent/agent/title_generator.py
+++ b/hermes_agent/agent/title_generator.py
@@ -8,7 +8,7 @@ import logging
 import threading
 from typing import Optional

-from hermes_agent.providers.auxiliary import call_llm
+from agent.auxiliary_client import call_llm

 logger = logging.getLogger(__name__)

--- a/hermes_agent/agent/trajectory.py
+++ b/hermes_agent/agent/trajectory.py
--- a/hermes_agent/providers/pricing.py
+++ b/hermes_agent/providers/pricing.py
@@ -5,8 +5,7 @@ from datetime import datetime, timezone
 from decimal import Decimal
 from typing import Any, Dict, Literal, Optional

-from hermes_agent.providers.metadata import fetch_endpoint_model_metadata, fetch_model_metadata
-from hermes_agent.utils import base_url_host_matches
+from agent.model_metadata import fetch_endpoint_model_metadata, fetch_model_metadata

 DEFAULT_PRICING = {"input": 0.0, "output": 0.0}

@@ -394,7 +393,7 @@ def resolve_billing_route(

    if provider_name == "openai-codex":
        return BillingRoute(provider="openai-codex", model=model, base_url=base_url or "", billing_mode="subscription_included")
-    if provider_name == "openrouter" or base_url_host_matches(base_url or "", "openrouter.ai"):
+    if provider_name == "openrouter" or "openrouter.ai" in base:
        return BillingRoute(provider="openrouter", model=model, base_url=base_url or "", billing_mode="official_models_api")
    if provider_name == "anthropic":
        return BillingRoute(provider="anthropic", model=model.split("/")[-1], base_url=base_url or "", billing_mode="official_docs_snapshot")
--- a/scripts/batch_runner.py
+++ b/scripts/batch_runner.py
@@ -20,13 +20,9 @@ Usage:
    python batch_runner.py --dataset_file=data.jsonl --batch_size=10 --run_name=my_run --distribution=image_gen
 """

-import os
-import sys
-
-sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-
 import json
 import logging
+import os
 import time
 from pathlib import Path
 from typing import List, Dict, Any, Optional, Tuple
@@ -39,13 +35,13 @@ from rich.console import Console
 logger = logging.getLogger(__name__)
 import fire

-from hermes_agent.agent.loop import AIAgent
-from hermes_agent.tools.distributions import (
+from run_agent import AIAgent
+from toolset_distributions import (
    list_distributions, 
    sample_toolsets_from_distribution,
    validate_distribution
 )
-from hermes_agent.tools.dispatch import TOOL_TO_TOOLSET_MAP
+from model_tools import TOOL_TO_TOOLSET_MAP


 # Global configuration for worker processes
@@ -293,7 +289,7 @@ def _process_single_prompt(
                if config.get("verbose"):
                    print(f"   Prompt {prompt_index}: Docker image check failed: {img_err}", flush=True)

-        from hermes_agent.tools.terminal import register_task_env_overrides
+        from tools.terminal_tool import register_task_env_overrides
        overrides = {
            "docker_image": container_image,
            "modal_image": container_image,
@@ -448,7 +444,6 @@ def _process_batch_worker(args: Tuple) -> Dict[str, Any]:
            if not reasoning.get("has_any_reasoning", True):
                print(f"   🚫 Prompt {prompt_index} discarded (no reasoning in any turn)")
                discarded_no_reasoning += 1
-                completed_in_batch.append(prompt_index)
                continue
            
            # Get and normalize tool stats for consistent schema across all entries
@@ -712,7 +707,7 @@ class BatchRunner:
        """
        checkpoint_data["last_updated"] = datetime.now().isoformat()

-        from hermes_agent.utils import atomic_json_write
+        from utils import atomic_json_write
        if lock:
            with lock:
                atomic_json_write(self.checkpoint_file, checkpoint_data)
@@ -1130,7 +1125,7 @@ def main(
    num_workers: int = 4,
    resume: bool = False,
    verbose: bool = False,
-    show_distributions: bool = False,
+    list_distributions: bool = False,
    ephemeral_system_prompt: str = None,
    log_prefix_chars: int = 100,
    providers_allowed: str = None,
@@ -1158,7 +1153,7 @@ def main(
        num_workers (int): Number of parallel worker processes (default: 4)
        resume (bool): Resume from checkpoint if run was interrupted (default: False)
        verbose (bool): Enable verbose logging (default: False)
-        show_distributions (bool): List available toolset distributions and exit
+        list_distributions (bool): List available toolset distributions and exit
        ephemeral_system_prompt (str): System prompt used during agent execution but NOT saved to trajectories (optional)
        log_prefix_chars (int): Number of characters to show in log previews for tool calls/responses (default: 20)
        providers_allowed (str): Comma-separated list of OpenRouter providers to allow (e.g. "anthropic,openai")
@@ -1190,16 +1185,16 @@ def main(
                               --prefill_messages_file=configs/prefill_opus.json
        
        # List available distributions
-        python batch_runner.py --show_distributions
+        python batch_runner.py --list_distributions
    """
    # Handle list distributions
-    if show_distributions:
-        from hermes_agent.tools.distributions import print_distribution_info
-
+    if list_distributions:
+        from toolset_distributions import list_distributions as get_all_dists, print_distribution_info
+        
        print("📊 Available Toolset Distributions")
        print("=" * 70)
-
-        all_dists = list_distributions()
+        
+        all_dists = get_all_dists()
        for dist_name in sorted(all_dists.keys()):
            print_distribution_info(dist_name)
        
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@@ -24,7 +24,6 @@ model:
  #   "minimax"      - MiniMax global (requires: MINIMAX_API_KEY)
  #   "minimax-cn"   - MiniMax China (requires: MINIMAX_CN_API_KEY)
  #   "huggingface"  - Hugging Face Inference (requires: HF_TOKEN)
-  #   "nvidia"       - NVIDIA NIM / build.nvidia.com (requires: NVIDIA_API_KEY)
  #   "xiaomi"       - Xiaomi MiMo (requires: XIAOMI_API_KEY)
  #   "arcee"        - Arcee AI Trinity models (requires: ARCEEAI_API_KEY)
  #   "ollama-cloud" - Ollama Cloud (requires: OLLAMA_API_KEY — https://ollama.com/settings)
@@ -63,38 +62,7 @@ model:
  #   Leave unset to use the model's native output ceiling (recommended).
  #   Set only if you want to deliberately limit individual response length.
  #
-# max_tokens: 8192
-
-# Named provider overrides (optional)
-# Use this for per-provider request timeouts, non-stream stale timeouts,
-# and per-model exceptions.
-# Applies to the primary turn client on every api_mode (OpenAI-wire, native
-# Anthropic, and Anthropic-compatible providers), the fallback chain, and
-# client rebuilds during credential rotation.  For OpenAI-wire chat
-# completions (streaming and non-streaming) the configured value is also
-# used as the per-request ``timeout=`` kwarg so it wins over the legacy
-# HERMES_API_TIMEOUT env var (which still applies when no config is set).
-# ``stale_timeout_seconds`` controls the non-streaming stale-call detector and
-# wins over the legacy HERMES_API_CALL_STALE_TIMEOUT env var. Leaving these
-# unset keeps the legacy defaults (HERMES_API_TIMEOUT=1800s,
-# HERMES_API_CALL_STALE_TIMEOUT=300s, native Anthropic 900s).
-#
-# Not currently wired for AWS Bedrock (bedrock_converse + AnthropicBedrock
-# SDK paths) — those use boto3 with its own timeout configuration.
-#
-# providers:
-#   ollama-local:
-#     request_timeout_seconds: 300   # Longer timeout for local cold-starts
-#     stale_timeout_seconds: 900     # Explicitly re-enable stale detection on local endpoints
-#   anthropic:
-#     request_timeout_seconds: 30    # Fast-fail cloud requests
-#     models:
-#       claude-opus-4.6:
-#         timeout_seconds: 600       # Longer timeout for extended-thinking Opus calls
-#   openai-codex:
-#     models:
-#       gpt-5.4:
-#         stale_timeout_seconds: 1800  # Longer non-stream stale timeout for slow large-context turns
+  # max_tokens: 8192

 # =============================================================================
 # OpenRouter Provider Routing (only applies when using OpenRouter)
@@ -122,6 +90,20 @@ model:
 #   # Data policy: "allow" (default) or "deny" to exclude providers that may store data
 #   # data_collection: "deny"

+# =============================================================================
+# Smart Model Routing (optional)
+# =============================================================================
+# Use a cheaper model for short/simple turns while keeping your main model for
+# more complex requests. Disabled by default.
+#
+# smart_model_routing:
+#   enabled: true
+#   max_simple_chars: 160
+#   max_simple_words: 28
+#   cheap_model:
+#     provider: openrouter
+#     model: google/gemini-2.5-flash
+
 # =============================================================================
 # Git Worktree Isolation
 # =============================================================================
@@ -374,18 +356,6 @@ compression:
 #   web_extract:
 #     provider: "auto"
 #     model: ""
-#
-#   # Session search — summarizes matching past sessions
-#   session_search:
-#     provider: "auto"
-#     model: ""
-#     timeout: 30
-#     max_concurrency: 3    # Limit parallel summaries to reduce request-burst 429s
-#     extra_body: {}        # Provider-specific OpenAI-compatible request fields
-#                           # Example for providers that support request-body
-#                           # reasoning controls:
-#                           # extra_body:
-#                           #   enable_thinking: false

 # =============================================================================
 # Persistent Memory
@@ -770,12 +740,10 @@ code_execution:
 # Subagent Delegation
 # =============================================================================
 # The delegate_task tool spawns child agents with isolated context.
-# Supports single tasks and batch mode (default 3 parallel, configurable).
+# Supports single tasks and batch mode (up to 3 parallel).
 delegation:
  max_iterations: 50                          # Max tool-calling turns per child (default: 50)
-  # max_concurrent_children: 3                # Max parallel child agents (default: 3)
-  # max_spawn_depth: 1                        # Tree depth cap (1-3, default: 1 = flat). Raise to 2 or 3 to allow orchestrator children to spawn their own workers.
-  # orchestrator_enabled: true                # Kill switch for role="orchestrator" children (default: true).
+  default_toolsets: ["terminal", "file", "web"]  # Default toolsets for subagents
  # model: "google/gemini-3-flash-preview"    # Override model for subagents (empty = inherit parent)
  # provider: "openrouter"                    # Override provider for subagents (empty = inherit parent)
  #                                           # Resolves full credentials (base_url, api_key) automatically.
@@ -919,39 +887,3 @@ display:
 #   # Names and usernames are NOT affected (user-chosen, publicly visible).
 #   # Routing/delivery still uses the original values internally.
 #   redact_pii: false
-
-# =============================================================================
-# Shell-script hooks
-# =============================================================================
-# Register shell scripts as plugin-hook callbacks.  Each entry is executed as
-# a subprocess (shell=False, shlex.split) with a JSON payload on stdin.  On
-# stdout the script may return JSON that either blocks the tool call or
-# injects context into the next LLM call.
-#
-# Valid events (mirror hermes_cli.plugins.VALID_HOOKS):
-#   pre_tool_call, post_tool_call, pre_llm_call, post_llm_call,
-#   pre_api_request, post_api_request, on_session_start, on_session_end,
-#   on_session_finalize, on_session_reset, subagent_stop
-#
-# First-use consent: each (event, command) pair prompts once on a TTY, then
-# is persisted to ~/.hermes/shell-hooks-allowlist.json.  Non-interactive
-# runs (gateway, cron) need --accept-hooks, HERMES_ACCEPT_HOOKS=1, or the
-# hooks_auto_accept key below.
-#
-# See website/docs/user-guide/features/hooks.md for the full JSON wire
-# protocol and worked examples.
-#
-# hooks:
-#   pre_tool_call:
-#     - matcher: "terminal"
-#       command: "~/.hermes/agent-hooks/block-rm-rf.sh"
-#       timeout: 10
-#   post_tool_call:
-#     - matcher: "write_file|patch"
-#       command: "~/.hermes/agent-hooks/auto-format.sh"
-#   pre_llm_call:
-#     - command: "~/.hermes/agent-hooks/inject-cwd-context.sh"
-#   subagent_stop:
-#     - command: "~/.hermes/agent-hooks/log-orchestration.sh"
-#
-# hooks_auto_accept: false
--- a/hermes_agent/cli/repl.py
+++ b/hermes_agent/cli/repl.py
--- a/hermes_agent/cron/init.py
+++ b/hermes_agent/cron/init.py
@@ -15,7 +15,7 @@ The gateway ticks the scheduler every 60 seconds. A file lock prevents
 duplicate execution if multiple processes overlap.
 """

-from hermes_agent.cron.jobs import (
+from cron.jobs import (
    create_job,
    get_job,
    list_jobs,
@@ -26,7 +26,7 @@ from hermes_agent.cron.jobs import (
    trigger_job,
    JOBS_FILE,
 )
-from hermes_agent.cron.scheduler import tick
+from cron.scheduler import tick

 __all__ = [
    "create_job",
--- a/hermes_agent/cron/jobs.py
+++ b/hermes_agent/cron/jobs.py
@@ -9,18 +9,17 @@ import copy
 import json
 import logging
 import tempfile
-import threading
 import os
 import re
 import uuid
 from datetime import datetime, timedelta
 from pathlib import Path
-from hermes_agent.constants import get_hermes_home
+from hermes_constants import get_hermes_home
 from typing import Optional, Dict, List, Any

 logger = logging.getLogger(__name__)

-from hermes_agent.time import now as _hermes_now
+from hermes_time import now as _hermes_now

 try:
    from croniter import croniter
@@ -35,11 +34,6 @@ except ImportError:
 HERMES_DIR = get_hermes_home().resolve()
 CRON_DIR = HERMES_DIR / "cron"
 JOBS_FILE = CRON_DIR / "jobs.json"
-
-# In-process lock protecting load_jobs→modify→save_jobs cycles.
-# Required when tick() runs jobs in parallel threads — without this,
-# concurrent mark_job_run / advance_next_run calls can clobber each other.
-_jobs_file_lock = threading.Lock()
 OUTPUT_DIR = CRON_DIR / "output"
 ONESHOT_GRACE_SECONDS = 120

@@ -600,44 +594,43 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None,
    ``delivery_error`` is tracked separately from the agent error — a job
    can succeed (agent produced output) but fail delivery (platform down).
    """
-    with _jobs_file_lock:
-        jobs = load_jobs()
-        for i, job in enumerate(jobs):
-            if job["id"] == job_id:
-                now = _hermes_now().isoformat()
-                job["last_run_at"] = now
-                job["last_status"] = "ok" if success else "error"
-                job["last_error"] = error if not success else None
-                # Track delivery failures separately — cleared on successful delivery
-                job["last_delivery_error"] = delivery_error
+    jobs = load_jobs()
+    for i, job in enumerate(jobs):
+        if job["id"] == job_id:
+            now = _hermes_now().isoformat()
+            job["last_run_at"] = now
+            job["last_status"] = "ok" if success else "error"
+            job["last_error"] = error if not success else None
+            # Track delivery failures separately — cleared on successful delivery
+            job["last_delivery_error"] = delivery_error
+            
+            # Increment completed count
+            if job.get("repeat"):
+                job["repeat"]["completed"] = job["repeat"].get("completed", 0) + 1
                
-                # Increment completed count
-                if job.get("repeat"):
-                    job["repeat"]["completed"] = job["repeat"].get("completed", 0) + 1
-                    
-                    # Check if we've hit the repeat limit
-                    times = job["repeat"].get("times")
-                    completed = job["repeat"]["completed"]
-                    if times is not None and times > 0 and completed >= times:
-                        # Remove the job (limit reached)
-                        jobs.pop(i)
-                        save_jobs(jobs)
-                        return
-                
-                # Compute next run
-                job["next_run_at"] = compute_next_run(job["schedule"], now)
+                # Check if we've hit the repeat limit
+                times = job["repeat"].get("times")
+                completed = job["repeat"]["completed"]
+                if times is not None and times > 0 and completed >= times:
+                    # Remove the job (limit reached)
+                    jobs.pop(i)
+                    save_jobs(jobs)
+                    return
+            
+            # Compute next run
+            job["next_run_at"] = compute_next_run(job["schedule"], now)

-                # If no next run (one-shot completed), disable
-                if job["next_run_at"] is None:
-                    job["enabled"] = False
-                    job["state"] = "completed"
-                elif job.get("state") != "paused":
-                    job["state"] = "scheduled"
+            # If no next run (one-shot completed), disable
+            if job["next_run_at"] is None:
+                job["enabled"] = False
+                job["state"] = "completed"
+            elif job.get("state") != "paused":
+                job["state"] = "scheduled"

-                save_jobs(jobs)
-                return
+            save_jobs(jobs)
+            return

-        logger.warning("mark_job_run: job_id %s not found, skipping save", job_id)
+    logger.warning("mark_job_run: job_id %s not found, skipping save", job_id)


 def advance_next_run(job_id: str) -> bool:
@@ -652,21 +645,20 @@ def advance_next_run(job_id: str) -> bool:

    Returns True if next_run_at was advanced, False otherwise.
    """
-    with _jobs_file_lock:
-        jobs = load_jobs()
-        for job in jobs:
-            if job["id"] == job_id:
-                kind = job.get("schedule", {}).get("kind")
-                if kind not in ("cron", "interval"):
-                    return False
-                now = _hermes_now().isoformat()
-                new_next = compute_next_run(job["schedule"], now)
-                if new_next and new_next != job.get("next_run_at"):
-                    job["next_run_at"] = new_next
-                    save_jobs(jobs)
-                    return True
+    jobs = load_jobs()
+    for job in jobs:
+        if job["id"] == job_id:
+            kind = job.get("schedule", {}).get("kind")
+            if kind not in ("cron", "interval"):
                return False
-        return False
+            now = _hermes_now().isoformat()
+            new_next = compute_next_run(job["schedule"], now)
+            if new_next and new_next != job.get("next_run_at"):
+                job["next_run_at"] = new_next
+                save_jobs(jobs)
+                return True
+            return False
+    return False


 def get_due_jobs() -> List[Dict[str, Any]]:
--- a/hermes_agent/cron/scheduler.py
+++ b/hermes_agent/cron/scheduler.py
@@ -27,11 +27,16 @@ except ImportError:
    except ImportError:
        msvcrt = None
 from pathlib import Path
-from typing import List, Optional
+from typing import Optional

-from hermes_agent.constants import get_hermes_home
-from hermes_agent.cli.config import load_config
-from hermes_agent.time import now as _hermes_now
+# Add parent directory to path for imports BEFORE repo-level imports.
+# Without this, standalone invocations (e.g. after `hermes update` reloads
+# the module) fail with ModuleNotFoundError for hermes_time et al.
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+from hermes_constants import get_hermes_home
+from hermes_cli.config import load_config
+from hermes_time import now as _hermes_now

 logger = logging.getLogger(__name__)

@@ -44,34 +49,7 @@ _KNOWN_DELIVERY_PLATFORMS = frozenset({
    "qqbot",
 })

-# Platforms that support a configured cron/notification home target, mapped to
-# the environment variable used by gateway setup/runtime config.
-_HOME_TARGET_ENV_VARS = {
-    "matrix": "MATRIX_HOME_ROOM",
-    "telegram": "TELEGRAM_HOME_CHANNEL",
-    "discord": "DISCORD_HOME_CHANNEL",
-    "slack": "SLACK_HOME_CHANNEL",
-    "signal": "SIGNAL_HOME_CHANNEL",
-    "mattermost": "MATTERMOST_HOME_CHANNEL",
-    "sms": "SMS_HOME_CHANNEL",
-    "email": "EMAIL_HOME_ADDRESS",
-    "dingtalk": "DINGTALK_HOME_CHANNEL",
-    "feishu": "FEISHU_HOME_CHANNEL",
-    "wecom": "WECOM_HOME_CHANNEL",
-    "weixin": "WEIXIN_HOME_CHANNEL",
-    "bluebubbles": "BLUEBUBBLES_HOME_CHANNEL",
-    "qqbot": "QQBOT_HOME_CHANNEL",
-}
-
-# Legacy env var names kept for back-compat.  Each entry is the current
-# primary env var → the previous name.  _get_home_target_chat_id falls
-# back to the legacy name if the primary is unset, so users who set the
-# old name before the rename keep working until they migrate.
-_LEGACY_HOME_TARGET_ENV_VARS = {
-    "QQBOT_HOME_CHANNEL": "QQ_HOME_CHANNEL",
-}
-
-from hermes_agent.cron.jobs import get_due_jobs, mark_job_run, save_job_output, advance_next_run
+from cron.jobs import get_due_jobs, mark_job_run, save_job_output, advance_next_run

 # Sentinel: when a cron agent has nothing new to report, it can start its
 # response with this marker to suppress delivery.  Output is still saved
@@ -98,28 +76,15 @@ def _resolve_origin(job: dict) -> Optional[dict]:
    return None


-def _get_home_target_chat_id(platform_name: str) -> str:
-    """Return the configured home target chat/room ID for a delivery platform."""
-    env_var = _HOME_TARGET_ENV_VARS.get(platform_name.lower())
-    if not env_var:
-        return ""
-    value = os.getenv(env_var, "")
-    if not value:
-        legacy = _LEGACY_HOME_TARGET_ENV_VARS.get(env_var)
-        if legacy:
-            value = os.getenv(legacy, "")
-    return value
-
-
-def _resolve_single_delivery_target(job: dict, deliver_value: str) -> Optional[dict]:
-    """Resolve one concrete auto-delivery target for a cron job."""
-
+def _resolve_delivery_target(job: dict) -> Optional[dict]:
+    """Resolve the concrete auto-delivery target for a cron job, if any."""
+    deliver = job.get("deliver", "local")
    origin = _resolve_origin(job)

-    if deliver_value == "local":
+    if deliver == "local":
        return None

-    if deliver_value == "origin":
+    if deliver == "origin":
        if origin:
            return {
                "platform": origin["platform"],
@@ -128,8 +93,8 @@ def _resolve_single_delivery_target(job: dict, deliver_value: str) -> Optional[d
            }
        # Origin missing (e.g. job created via API/script) — try each
        # platform's home channel as a fallback instead of silently dropping.
-        for platform_name in _HOME_TARGET_ENV_VARS:
-            chat_id = _get_home_target_chat_id(platform_name)
+        for platform_name in ("matrix", "telegram", "discord", "slack", "bluebubbles"):
+            chat_id = os.getenv(f"{platform_name.upper()}_HOME_CHANNEL", "")
            if chat_id:
                logger.info(
                    "Job '%s' has deliver=origin but no origin; falling back to %s home channel",
@@ -143,11 +108,11 @@ def _resolve_single_delivery_target(job: dict, deliver_value: str) -> Optional[d
                }
        return None

-    if ":" in deliver_value:
-        platform_name, rest = deliver_value.split(":", 1)
+    if ":" in deliver:
+        platform_name, rest = deliver.split(":", 1)
        platform_key = platform_name.lower()

-        from hermes_agent.tools.send_message import _parse_target_ref
+        from tools.send_message_tool import _parse_target_ref

        parsed_chat_id, parsed_thread_id, is_explicit = _parse_target_ref(platform_key, rest)
        if is_explicit:
@@ -157,7 +122,7 @@ def _resolve_single_delivery_target(job: dict, deliver_value: str) -> Optional[d

        # Resolve human-friendly labels like "Alice (dm)" to real IDs.
        try:
-            from hermes_agent.gateway.channel_directory import resolve_channel_name
+            from gateway.channel_directory import resolve_channel_name
            resolved = resolve_channel_name(platform_key, chat_id)
            if resolved:
                parsed_chat_id, parsed_thread_id, resolved_is_explicit = _parse_target_ref(platform_key, resolved)
@@ -174,7 +139,7 @@ def _resolve_single_delivery_target(job: dict, deliver_value: str) -> Optional[d
            "thread_id": thread_id,
        }

-    platform_name = deliver_value
+    platform_name = deliver
    if origin and origin.get("platform") == platform_name:
        return {
            "platform": platform_name,
@@ -184,7 +149,7 @@ def _resolve_single_delivery_target(job: dict, deliver_value: str) -> Optional[d

    if platform_name.lower() not in _KNOWN_DELIVERY_PLATFORMS:
        return None
-    chat_id = _get_home_target_chat_id(platform_name)
+    chat_id = os.getenv(f"{platform_name.upper()}_HOME_CHANNEL", "")
    if not chat_id:
        return None

@@ -195,30 +160,6 @@ def _resolve_single_delivery_target(job: dict, deliver_value: str) -> Optional[d
    }


-def _resolve_delivery_targets(job: dict) -> List[dict]:
-    """Resolve all concrete auto-delivery targets for a cron job (supports comma-separated deliver)."""
-    deliver = job.get("deliver", "local")
-    if deliver == "local":
-        return []
-    parts = [p.strip() for p in str(deliver).split(",") if p.strip()]
-    seen = set()
-    targets = []
-    for part in parts:
-        target = _resolve_single_delivery_target(job, part)
-        if target:
-            key = (target["platform"].lower(), str(target["chat_id"]), target.get("thread_id"))
-            if key not in seen:
-                seen.add(key)
-                targets.append(target)
-    return targets
-
-
-def _resolve_delivery_target(job: dict) -> Optional[dict]:
-    """Resolve the concrete auto-delivery target for a cron job, if any."""
-    targets = _resolve_delivery_targets(job)
-    return targets[0] if targets else None
-
-
 # Media extension sets — keep in sync with gateway/platforms/base.py:_process_message_background
 _AUDIO_EXTS = frozenset({'.ogg', '.opus', '.mp3', '.wav', '.m4a'})
 _VIDEO_EXTS = frozenset({'.mp4', '.mov', '.avi', '.mkv', '.webm', '.3gp'})
@@ -247,11 +188,7 @@ def _send_media_via_adapter(adapter, chat_id: str, media_files: list, metadata:
                coro = adapter.send_document(chat_id=chat_id, file_path=media_path, metadata=metadata)

            future = asyncio.run_coroutine_threadsafe(coro, loop)
-            try:
-                result = future.result(timeout=30)
-            except TimeoutError:
-                future.cancel()
-                raise
+            result = future.result(timeout=30)
            if result and not getattr(result, "success", True):
                logger.warning(
                    "Job '%s': media send failed for %s: %s",
@@ -263,7 +200,7 @@ def _send_media_via_adapter(adapter, chat_id: str, media_files: list, metadata:

 def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Optional[str]:
    """
-    Deliver job output to the configured target(s) (origin chat, specific platform, etc.).
+    Deliver job output to the configured target (origin chat, specific platform, etc.).

    When ``adapters`` and ``loop`` are provided (gateway is running), tries to
    use the live adapter first — this supports E2EE rooms (e.g. Matrix) where
@@ -272,16 +209,35 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option

    Returns None on success, or an error string on failure.
    """
-    targets = _resolve_delivery_targets(job)
-    if not targets:
+    target = _resolve_delivery_target(job)
+    if not target:
        if job.get("deliver", "local") != "local":
            msg = f"no delivery target resolved for deliver={job.get('deliver', 'local')}"
            logger.warning("Job '%s': %s", job["id"], msg)
            return msg
        return None  # local-only jobs don't deliver — not a failure

-    from hermes_agent.tools.send_message import _send_to_platform
-    from hermes_agent.gateway.config import load_gateway_config, Platform
+    platform_name = target["platform"]
+    chat_id = target["chat_id"]
+    thread_id = target.get("thread_id")
+
+    # Diagnostic: log thread_id for topic-aware delivery debugging
+    origin = job.get("origin") or {}
+    origin_thread = origin.get("thread_id")
+    if origin_thread and not thread_id:
+        logger.warning(
+            "Job '%s': origin has thread_id=%s but delivery target lost it "
+            "(deliver=%s, target=%s)",
+            job["id"], origin_thread, job.get("deliver", "local"), target,
+        )
+    elif thread_id:
+        logger.debug(
+            "Job '%s': delivering to %s:%s thread_id=%s",
+            job["id"], platform_name, chat_id, thread_id,
+        )
+
+    from tools.send_message_tool import _send_to_platform
+    from gateway.config import load_gateway_config, Platform

    platform_map = {
        "telegram": Platform.TELEGRAM,
@@ -302,6 +258,24 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
        "bluebubbles": Platform.BLUEBUBBLES,
        "qqbot": Platform.QQBOT,
    }
+    platform = platform_map.get(platform_name.lower())
+    if not platform:
+        msg = f"unknown platform '{platform_name}'"
+        logger.warning("Job '%s': %s", job["id"], msg)
+        return msg
+
+    try:
+        config = load_gateway_config()
+    except Exception as e:
+        msg = f"failed to load gateway config: {e}"
+        logger.error("Job '%s': %s", job["id"], msg)
+        return msg
+
+    pconfig = config.platforms.get(platform)
+    if not pconfig or not pconfig.enabled:
+        msg = f"platform '{platform_name}' not configured/enabled"
+        logger.warning("Job '%s': %s", job["id"], msg)
+        return msg

    # Optionally wrap the content with a header/footer so the user knows this
    # is a cron delivery.  Wrapping is on by default; set cron.wrap_response: false
@@ -327,124 +301,70 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
        delivery_content = content

    # Extract MEDIA: tags so attachments are forwarded as files, not raw text
-    from hermes_agent.gateway.platforms.base import BasePlatformAdapter
+    from gateway.platforms.base import BasePlatformAdapter
    media_files, cleaned_delivery_content = BasePlatformAdapter.extract_media(delivery_content)

+    # Prefer the live adapter when the gateway is running — this supports E2EE
+    # rooms (e.g. Matrix) where the standalone HTTP path cannot encrypt.
+    runtime_adapter = (adapters or {}).get(platform)
+    if runtime_adapter is not None and loop is not None and getattr(loop, "is_running", lambda: False)():
+        send_metadata = {"thread_id": thread_id} if thread_id else None
+        try:
+            # Send cleaned text (MEDIA tags stripped) — not the raw content
+            text_to_send = cleaned_delivery_content.strip()
+            adapter_ok = True
+            if text_to_send:
+                future = asyncio.run_coroutine_threadsafe(
+                    runtime_adapter.send(chat_id, text_to_send, metadata=send_metadata),
+                    loop,
+                )
+                send_result = future.result(timeout=60)
+                if send_result and not getattr(send_result, "success", True):
+                    err = getattr(send_result, "error", "unknown")
+                    logger.warning(
+                        "Job '%s': live adapter send to %s:%s failed (%s), falling back to standalone",
+                        job["id"], platform_name, chat_id, err,
+                    )
+                    adapter_ok = False  # fall through to standalone path
+
+            # Send extracted media files as native attachments via the live adapter
+            if adapter_ok and media_files:
+                _send_media_via_adapter(runtime_adapter, chat_id, media_files, send_metadata, loop, job)
+
+            if adapter_ok:
+                logger.info("Job '%s': delivered to %s:%s via live adapter", job["id"], platform_name, chat_id)
+                return None
+        except Exception as e:
+            logger.warning(
+                "Job '%s': live adapter delivery to %s:%s failed (%s), falling back to standalone",
+                job["id"], platform_name, chat_id, e,
+            )
+
+    # Standalone path: run the async send in a fresh event loop (safe from any thread)
+    coro = _send_to_platform(platform, pconfig, chat_id, cleaned_delivery_content, thread_id=thread_id, media_files=media_files)
    try:
-        config = load_gateway_config()
+        result = asyncio.run(coro)
+    except RuntimeError:
+        # asyncio.run() checks for a running loop before awaiting the coroutine;
+        # when it raises, the original coro was never started — close it to
+        # prevent "coroutine was never awaited" RuntimeWarning, then retry in a
+        # fresh thread that has no running loop.
+        coro.close()
+        import concurrent.futures
+        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
+            future = pool.submit(asyncio.run, _send_to_platform(platform, pconfig, chat_id, cleaned_delivery_content, thread_id=thread_id, media_files=media_files))
+            result = future.result(timeout=30)
    except Exception as e:
-        msg = f"failed to load gateway config: {e}"
+        msg = f"delivery to {platform_name}:{chat_id} failed: {e}"
        logger.error("Job '%s': %s", job["id"], msg)
        return msg

-    delivery_errors = []
+    if result and result.get("error"):
+        msg = f"delivery error: {result['error']}"
+        logger.error("Job '%s': %s", job["id"], msg)
+        return msg

-    for target in targets:
-        platform_name = target["platform"]
-        chat_id = target["chat_id"]
-        thread_id = target.get("thread_id")
-
-        # Diagnostic: log thread_id for topic-aware delivery debugging
-        origin = job.get("origin") or {}
-        origin_thread = origin.get("thread_id")
-        if origin_thread and not thread_id:
-            logger.warning(
-                "Job '%s': origin has thread_id=%s but delivery target lost it "
-                "(deliver=%s, target=%s)",
-                job["id"], origin_thread, job.get("deliver", "local"), target,
-            )
-        elif thread_id:
-            logger.debug(
-                "Job '%s': delivering to %s:%s thread_id=%s",
-                job["id"], platform_name, chat_id, thread_id,
-            )
-
-        platform = platform_map.get(platform_name.lower())
-        if not platform:
-            msg = f"unknown platform '{platform_name}'"
-            logger.warning("Job '%s': %s", job["id"], msg)
-            delivery_errors.append(msg)
-            continue
-
-        # Prefer the live adapter when the gateway is running — this supports E2EE
-        # rooms (e.g. Matrix) where the standalone HTTP path cannot encrypt.
-        runtime_adapter = (adapters or {}).get(platform)
-        delivered = False
-        if runtime_adapter is not None and loop is not None and getattr(loop, "is_running", lambda: False)():
-            send_metadata = {"thread_id": thread_id} if thread_id else None
-            try:
-                # Send cleaned text (MEDIA tags stripped) — not the raw content
-                text_to_send = cleaned_delivery_content.strip()
-                adapter_ok = True
-                if text_to_send:
-                    future = asyncio.run_coroutine_threadsafe(
-                        runtime_adapter.send(chat_id, text_to_send, metadata=send_metadata),
-                        loop,
-                    )
-                    try:
-                        send_result = future.result(timeout=60)
-                    except TimeoutError:
-                        future.cancel()
-                        raise
-                    if send_result and not getattr(send_result, "success", True):
-                        err = getattr(send_result, "error", "unknown")
-                        logger.warning(
-                            "Job '%s': live adapter send to %s:%s failed (%s), falling back to standalone",
-                            job["id"], platform_name, chat_id, err,
-                        )
-                        adapter_ok = False  # fall through to standalone path
-
-                # Send extracted media files as native attachments via the live adapter
-                if adapter_ok and media_files:
-                    _send_media_via_adapter(runtime_adapter, chat_id, media_files, send_metadata, loop, job)
-
-                if adapter_ok:
-                    logger.info("Job '%s': delivered to %s:%s via live adapter", job["id"], platform_name, chat_id)
-                    delivered = True
-            except Exception as e:
-                logger.warning(
-                    "Job '%s': live adapter delivery to %s:%s failed (%s), falling back to standalone",
-                    job["id"], platform_name, chat_id, e,
-                )
-
-        if not delivered:
-            pconfig = config.platforms.get(platform)
-            if not pconfig or not pconfig.enabled:
-                msg = f"platform '{platform_name}' not configured/enabled"
-                logger.warning("Job '%s': %s", job["id"], msg)
-                delivery_errors.append(msg)
-                continue
-
-            # Standalone path: run the async send in a fresh event loop (safe from any thread)
-            coro = _send_to_platform(platform, pconfig, chat_id, cleaned_delivery_content, thread_id=thread_id, media_files=media_files)
-            try:
-                result = asyncio.run(coro)
-            except RuntimeError:
-                # asyncio.run() checks for a running loop before awaiting the coroutine;
-                # when it raises, the original coro was never started — close it to
-                # prevent "coroutine was never awaited" RuntimeWarning, then retry in a
-                # fresh thread that has no running loop.
-                coro.close()
-                with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
-                    future = pool.submit(asyncio.run, _send_to_platform(platform, pconfig, chat_id, cleaned_delivery_content, thread_id=thread_id, media_files=media_files))
-                    result = future.result(timeout=30)
-            except Exception as e:
-                msg = f"delivery to {platform_name}:{chat_id} failed: {e}"
-                logger.error("Job '%s': %s", job["id"], msg)
-                delivery_errors.append(msg)
-                continue
-
-            error = result.get("error") if result else None
-            if error:
-                msg = f"delivery error: {error}"
-                logger.error("Job '%s': %s", job["id"], msg)
-                delivery_errors.append(msg)
-                continue
-
-            logger.info("Job '%s': delivered to %s:%s", job["id"], platform_name, chat_id)
-
-    if delivery_errors:
-        return "; ".join(delivery_errors)
+    logger.info("Job '%s': delivered to %s:%s", job["id"], platform_name, chat_id)
    return None


@@ -503,7 +423,7 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
        (success, output) — on failure *output* contains the error message so the
        LLM can report the problem to the user.
    """
-    from hermes_agent.constants import get_hermes_home
+    from hermes_constants import get_hermes_home

    scripts_dir = get_hermes_home() / "scripts"
    scripts_dir.mkdir(parents=True, exist_ok=True)
@@ -545,7 +465,7 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:

        # Redact secrets from both stdout and stderr before any return path.
        try:
-            from hermes_agent.agent.redact import redact_sensitive_text
+            from agent.redact import redact_sensitive_text
            stdout = redact_sensitive_text(stdout)
            stderr = redact_sensitive_text(stderr)
        except Exception:
@@ -567,53 +487,15 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
        return False, f"Script execution failed: {exc}"


-def _parse_wake_gate(script_output: str) -> bool:
-    """Parse the last non-empty stdout line of a cron job's pre-check script
-    as a wake gate.
-
-    The convention (ported from nanoclaw #1232): if the last stdout line is
-    JSON like ``{"wakeAgent": false}``, the agent is skipped entirely — no
-    LLM run, no delivery. Any other output (non-JSON, missing flag, gate
-    absent, or ``wakeAgent: true``) means wake the agent normally.
-
-    Returns True if the agent should wake, False to skip.
-    """
-    if not script_output:
-        return True
-    stripped_lines = [line for line in script_output.splitlines() if line.strip()]
-    if not stripped_lines:
-        return True
-    last_line = stripped_lines[-1].strip()
-    try:
-        gate = json.loads(last_line)
-    except (json.JSONDecodeError, ValueError):
-        return True
-    if not isinstance(gate, dict):
-        return True
-    return gate.get("wakeAgent", True) is not False
-
-
-def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
-    """Build the effective prompt for a cron job, optionally loading one or more skills first.
-
-    Args:
-        job: The cron job dict.
-        prerun_script: Optional ``(success, stdout)`` from a script that has
-            already been executed by the caller (e.g. for a wake-gate check).
-            When provided, the script is not re-executed and the cached
-            result is used for prompt injection. When omitted, the script
-            (if any) runs inline as before.
-    """
+def _build_job_prompt(job: dict) -> str:
+    """Build the effective prompt for a cron job, optionally loading one or more skills first."""
    prompt = job.get("prompt", "")
    skills = job.get("skills")

    # Run data-collection script if configured, inject output as context.
    script_path = job.get("script")
    if script_path:
-        if prerun_script is not None:
-            success, script_output = prerun_script
-        else:
-            success, script_output = _run_job_script(script_path)
+        success, script_output = _run_job_script(script_path)
        if success:
            if script_output:
                prompt = (
@@ -658,7 +540,7 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
    if not skill_names:
        return prompt

-    from hermes_agent.tools.skills.tool import skill_view
+    from tools.skills_tool import skill_view

    parts = []
    skipped: list[str] = []
@@ -702,65 +584,34 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
    Returns:
        Tuple of (success, full_output_doc, final_response, error_message)
    """
-    from hermes_agent.agent.loop import AIAgent
+    from run_agent import AIAgent
    
    # Initialize SQLite session store so cron job messages are persisted
    # and discoverable via session_search (same pattern as gateway/run.py).
    _session_db = None
    try:
-        from hermes_agent.state import SessionDB
+        from hermes_state import SessionDB
        _session_db = SessionDB()
    except Exception as e:
        logger.debug("Job '%s': SQLite session store not available: %s", job.get("id", "?"), e)
    
    job_id = job["id"]
    job_name = job["name"]
-
-    # Wake-gate: if this job has a pre-check script, run it BEFORE building
-    # the prompt so a ``{"wakeAgent": false}`` response can short-circuit
-    # the whole agent run. We pass the result into _build_job_prompt so
-    # the script is only executed once.
-    prerun_script = None
-    script_path = job.get("script")
-    if script_path:
-        prerun_script = _run_job_script(script_path)
-        _ran_ok, _script_output = prerun_script
-        if _ran_ok and not _parse_wake_gate(_script_output):
-            logger.info(
-                "Job '%s' (ID: %s): wakeAgent=false, skipping agent run",
-                job_name, job_id,
-            )
-            silent_doc = (
-                f"# Cron Job: {job_name}\n\n"
-                f"**Job ID:** {job_id}\n"
-                f"**Run Time:** {_hermes_now().strftime('%Y-%m-%d %H:%M:%S')}\n\n"
-                "Script gate returned `wakeAgent=false` — agent skipped.\n"
-            )
-            return True, silent_doc, SILENT_MARKER, None
-
-    prompt = _build_job_prompt(job, prerun_script=prerun_script)
+    prompt = _build_job_prompt(job)
    origin = _resolve_origin(job)
    _cron_session_id = f"cron_{job_id}_{_hermes_now().strftime('%Y%m%d_%H%M%S')}"

    logger.info("Running job '%s' (ID: %s)", job_name, job_id)
    logger.info("Prompt: %s", prompt[:100])

-    # Mark this as a cron session so the approval system can apply cron_mode.
-    # This env var is process-wide and persists for the lifetime of the
-    # scheduler process — every job this process runs is a cron job.
-    os.environ["HERMES_CRON_SESSION"] = "1"
-
-    # Use ContextVars for per-job session/delivery state so parallel jobs
-    # don't clobber each other's targets (os.environ is process-global).
-    from hermes_agent.gateway.session_context import set_session_vars, clear_session_vars, _VAR_MAP
-
-    _ctx_tokens = set_session_vars(
-        platform=origin["platform"] if origin else "",
-        chat_id=str(origin["chat_id"]) if origin else "",
-        chat_name=origin.get("chat_name", "") if origin else "",
-    )
-
    try:
+        # Inject origin context so the agent's send_message tool knows the chat.
+        # Must be INSIDE the try block so the finally cleanup always runs.
+        if origin:
+            os.environ["HERMES_SESSION_PLATFORM"] = origin["platform"]
+            os.environ["HERMES_SESSION_CHAT_ID"] = str(origin["chat_id"])
+            if origin.get("chat_name"):
+                os.environ["HERMES_SESSION_CHAT_NAME"] = origin["chat_name"]
        # Re-read .env and config.yaml fresh every run so provider/key
        # changes take effect without a gateway restart.
        from dotenv import load_dotenv
@@ -771,10 +622,10 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:

        delivery_target = _resolve_delivery_target(job)
        if delivery_target:
-            _VAR_MAP["HERMES_CRON_AUTO_DELIVER_PLATFORM"].set(delivery_target["platform"])
-            _VAR_MAP["HERMES_CRON_AUTO_DELIVER_CHAT_ID"].set(str(delivery_target["chat_id"]))
+            os.environ["HERMES_CRON_AUTO_DELIVER_PLATFORM"] = delivery_target["platform"]
+            os.environ["HERMES_CRON_AUTO_DELIVER_CHAT_ID"] = str(delivery_target["chat_id"])
            if delivery_target.get("thread_id") is not None:
-                _VAR_MAP["HERMES_CRON_AUTO_DELIVER_THREAD_ID"].set(str(delivery_target["thread_id"]))
+                os.environ["HERMES_CRON_AUTO_DELIVER_THREAD_ID"] = str(delivery_target["thread_id"])

        model = job.get("model") or os.getenv("HERMES_MODEL") or ""

@@ -797,7 +648,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:

        # Apply IPv4 preference if configured.
        try:
-            from hermes_agent.constants import apply_ipv4_preference
+            from hermes_constants import apply_ipv4_preference
            _net_cfg = _cfg.get("network", {})
            if isinstance(_net_cfg, dict) and _net_cfg.get("force_ipv4"):
                apply_ipv4_preference(force=True)
@@ -805,7 +656,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
            pass

        # Reasoning config from config.yaml
-        from hermes_agent.constants import parse_reasoning_effort
+        from hermes_constants import parse_reasoning_effort
        effort = str(_cfg.get("agent", {}).get("reasoning_effort", "")).strip()
        reasoning_config = parse_reasoning_effort(effort)

@@ -813,13 +664,14 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
        prefill_messages = None
        prefill_file = os.getenv("HERMES_PREFILL_MESSAGES_FILE", "") or _cfg.get("prefill_messages_file", "")
        if prefill_file:
+            import json as _json
            pfpath = Path(prefill_file).expanduser()
            if not pfpath.is_absolute():
                pfpath = _hermes_home / pfpath
            if pfpath.exists():
                try:
                    with open(pfpath, "r", encoding="utf-8") as _pf:
-                        prefill_messages = json.load(_pf)
+                        prefill_messages = _json.load(_pf)
                    if not isinstance(prefill_messages, list):
                        prefill_messages = None
                except Exception as e:
@@ -831,8 +683,9 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:

        # Provider routing
        pr = _cfg.get("provider_routing", {})
+        smart_routing = _cfg.get("smart_model_routing", {}) or {}

-        from hermes_agent.cli.runtime_provider import (
+        from hermes_cli.runtime_provider import (
            resolve_runtime_provider,
            format_runtime_provider_error,
        )
@@ -847,12 +700,27 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
            message = format_runtime_provider_error(exc)
            raise RuntimeError(message) from exc

+        from agent.smart_model_routing import resolve_turn_route
+        turn_route = resolve_turn_route(
+            prompt,
+            smart_routing,
+            {
+                "model": model,
+                "api_key": runtime.get("api_key"),
+                "base_url": runtime.get("base_url"),
+                "provider": runtime.get("provider"),
+                "api_mode": runtime.get("api_mode"),
+                "command": runtime.get("command"),
+                "args": list(runtime.get("args") or []),
+            },
+        )
+
        fallback_model = _cfg.get("fallback_providers") or _cfg.get("fallback_model") or None
        credential_pool = None
-        runtime_provider = str(runtime.get("provider") or "").strip().lower()
+        runtime_provider = str(turn_route["runtime"].get("provider") or "").strip().lower()
        if runtime_provider:
            try:
-                from hermes_agent.providers.credential_pool import load_pool
+                from agent.credential_pool import load_pool
                pool = load_pool(runtime_provider)
                if pool.has_credentials():
                    credential_pool = pool
@@ -866,13 +734,13 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
                logger.debug("Job '%s': failed to load credential pool for %s: %s", job_id, runtime_provider, e)

        agent = AIAgent(
-            model=model,
-            api_key=runtime.get("api_key"),
-            base_url=runtime.get("base_url"),
-            provider=runtime.get("provider"),
-            api_mode=runtime.get("api_mode"),
-            acp_command=runtime.get("command"),
-            acp_args=runtime.get("args"),
+            model=turn_route["model"],
+            api_key=turn_route["runtime"].get("api_key"),
+            base_url=turn_route["runtime"].get("base_url"),
+            provider=turn_route["runtime"].get("provider"),
+            api_mode=turn_route["runtime"].get("api_mode"),
+            acp_command=turn_route["runtime"].get("command"),
+            acp_args=turn_route["runtime"].get("args"),
            max_iterations=max_iterations,
            reasoning_config=reasoning_config,
            prefill_messages=prefill_messages,
@@ -1017,8 +885,16 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
        return False, output, "", error_msg

    finally:
-        # Clean up ContextVar session/delivery state for this job.
-        clear_session_vars(_ctx_tokens)
+        # Clean up injected env vars so they don't leak to other jobs
+        for key in (
+            "HERMES_SESSION_PLATFORM",
+            "HERMES_SESSION_CHAT_ID",
+            "HERMES_SESSION_CHAT_NAME",
+            "HERMES_CRON_AUTO_DELIVER_PLATFORM",
+            "HERMES_CRON_AUTO_DELIVER_CHAT_ID",
+            "HERMES_CRON_AUTO_DELIVER_THREAD_ID",
+        ):
+            os.environ.pop(key, None)
        if _session_db:
            try:
                _session_db.end_session(_cron_session_id, "cron_complete")
@@ -1071,41 +947,15 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int:
        if verbose:
            logger.info("%s - %s job(s) due", _hermes_now().strftime('%H:%M:%S'), len(due_jobs))

-        # Advance next_run_at for all recurring jobs FIRST, under the file lock,
-        # before any execution begins.  This preserves at-most-once semantics.
+        executed = 0
        for job in due_jobs:
-            advance_next_run(job["id"])
-
-        # Resolve max parallel workers: env var > config.yaml > unbounded.
-        # Set HERMES_CRON_MAX_PARALLEL=1 to restore old serial behaviour.
-        _max_workers: Optional[int] = None
-        try:
-            _env_par = os.getenv("HERMES_CRON_MAX_PARALLEL", "").strip()
-            if _env_par:
-                _max_workers = int(_env_par) or None
-        except (ValueError, TypeError):
-            logger.warning("Invalid HERMES_CRON_MAX_PARALLEL value; defaulting to unbounded")
-        if _max_workers is None:
            try:
-                _ucfg = load_config() or {}
-                _cfg_par = (
-                    _ucfg.get("cron", {}) if isinstance(_ucfg, dict) else {}
-                ).get("max_parallel_jobs")
-                if _cfg_par is not None:
-                    _max_workers = int(_cfg_par) or None
-            except Exception:
-                pass
+                # For recurring jobs (cron/interval), advance next_run_at to the
+                # next future occurrence BEFORE execution.  This way, if the
+                # process crashes mid-run, the job won't re-fire on restart.
+                # One-shot jobs are left alone so they can retry on restart.
+                advance_next_run(job["id"])

-        if verbose:
-            logger.info(
-                "Running %d job(s) in parallel (max_workers=%s)",
-                len(due_jobs),
-                _max_workers if _max_workers else "unbounded",
-            )
-
-        def _process_job(job: dict) -> bool:
-            """Run one due job end-to-end: execute, save, deliver, mark."""
-            try:
                success, output, final_response, error = run_job(job)

                output_file = save_job_output(job["id"], output)
@@ -1137,23 +987,13 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int:
                    error = "Agent completed but produced empty response (model error, timeout, or misconfiguration)"

                mark_job_run(job["id"], success, error, delivery_error=delivery_error)
-                return True
+                executed += 1

            except Exception as e:
                logger.error("Error processing job %s: %s", job['id'], e)
                mark_job_run(job["id"], False, str(e))
-                return False

-        # Run all due jobs concurrently, each in its own ContextVar copy
-        # so session/delivery state stays isolated per-thread.
-        with concurrent.futures.ThreadPoolExecutor(max_workers=_max_workers) as _tick_pool:
-            _futures = []
-            for job in due_jobs:
-                _ctx = contextvars.copy_context()
-                _futures.append(_tick_pool.submit(_ctx.run, _process_job, job))
-            _results = [f.result() for f in _futures]
-
-        return sum(_results)
+        return executed
    finally:
        if fcntl:
            fcntl.flock(lock_fd, fcntl.LOCK_UN)
--- a/datagen-config-examples/run_browser_tasks.sh
+++ b/datagen-config-examples/run_browser_tasks.sh
@@ -29,7 +29,7 @@ echo "📝 Logging to: $LOG_FILE"
 # Point to the example dataset in this directory
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

-python scripts/batch_runner.py \
+python batch_runner.py \
  --dataset_file="$SCRIPT_DIR/example_browser_tasks.jsonl" \
  --batch_size=5 \
  --run_name="browser_tasks_example" \
--- a/datagen-config-examples/web_research.yaml
+++ b/datagen-config-examples/web_research.yaml
@@ -4,7 +4,7 @@
 # Generates tool-calling trajectories for multi-step web research tasks.
 #
 # Usage:
-#   python scripts/batch_runner.py \
+#   python batch_runner.py \
 #     --config datagen-config-examples/web_research.yaml \
 #     --run_name web_research_v1

--- a/docker/entrypoint.sh
+++ b/docker/entrypoint.sh
@@ -65,7 +65,7 @@ fi

 # Sync bundled skills (manifest-based so user edits are preserved)
 if [ -d "$INSTALL_DIR/skills" ]; then
-    hermes-skills-sync
+    python3 "$INSTALL_DIR/tools/skills_sync.py"
 fi

 exec hermes "$@"
--- a/docs/acp-setup.md
+++ b/docs/acp-setup.md
@@ -0,0 +1,228 @@
+# Hermes Agent — ACP (Agent Client Protocol) Setup Guide
+
+Hermes Agent supports the **Agent Client Protocol (ACP)**, allowing it to run as
+a coding agent inside your editor. ACP lets your IDE send tasks to Hermes, and
+Hermes responds with file edits, terminal commands, and explanations — all shown
+natively in the editor UI.
+
+---
+
+## Prerequisites
+
+- Hermes Agent installed and configured (`hermes setup` completed)
+- An API key / provider set up in `~/.hermes/.env` or via `hermes login`
+- Python 3.11+
+
+Install the ACP extra:
+
+```bash
+pip install -e ".[acp]"
+```
+
+---
+
+## VS Code Setup
+
+### 1. Install the ACP Client extension
+
+Open VS Code and install **ACP Client** from the marketplace:
+
+- Press `Ctrl+Shift+X` (or `Cmd+Shift+X` on macOS)
+- Search for **"ACP Client"**
+- Click **Install**
+
+Or install from the command line:
+
+```bash
+code --install-extension anysphere.acp-client
+```
+
+### 2. Configure settings.json
+
+Open your VS Code settings (`Ctrl+,` → click the `{}` icon for JSON) and add:
+
+```json
+{
+  "acpClient.agents": [
+    {
+      "name": "hermes-agent",
+      "registryDir": "/path/to/hermes-agent/acp_registry"
+    }
+  ]
+}
+```
+
+Replace `/path/to/hermes-agent` with the actual path to your Hermes Agent
+installation (e.g. `~/.hermes/hermes-agent`).
+
+Alternatively, if `hermes` is on your PATH, the ACP Client can discover it
+automatically via the registry directory.
+
+### 3. Restart VS Code
+
+After configuring, restart VS Code. You should see **Hermes Agent** appear in
+the ACP agent picker in the chat/agent panel.
+
+---
+
+## Zed Setup
+
+Zed has built-in ACP support.
+
+### 1. Configure Zed settings
+
+Open Zed settings (`Cmd+,` on macOS or `Ctrl+,` on Linux) and add to your
+`settings.json`:
+
+```json
+{
+  "agent_servers": {
+    "hermes-agent": {
+      "type": "custom",
+      "command": "hermes",
+      "args": ["acp"],
+    },
+  },
+}
+```
+
+### 2. Restart Zed
+
+Hermes Agent will appear in the agent panel. Select it and start a conversation.
+
+---
+
+## JetBrains Setup (IntelliJ, PyCharm, WebStorm, etc.)
+
+### 1. Install the ACP plugin
+
+- Open **Settings** → **Plugins** → **Marketplace**
+- Search for **"ACP"** or **"Agent Client Protocol"**
+- Install and restart the IDE
+
+### 2. Configure the agent
+
+- Open **Settings** → **Tools** → **ACP Agents**
+- Click **+** to add a new agent
+- Set the registry directory to your `acp_registry/` folder:
+  `/path/to/hermes-agent/acp_registry`
+- Click **OK**
+
+### 3. Use the agent
+
+Open the ACP panel (usually in the right sidebar) and select **Hermes Agent**.
+
+---
+
+## What You Will See
+
+Once connected, your editor provides a native interface to Hermes Agent:
+
+### Chat Panel
+A conversational interface where you can describe tasks, ask questions, and
+give instructions. Hermes responds with explanations and actions.
+
+### File Diffs
+When Hermes edits files, you see standard diffs in the editor. You can:
+- **Accept** individual changes
+- **Reject** changes you don't want
+- **Review** the full diff before applying
+
+### Terminal Commands
+When Hermes needs to run shell commands (builds, tests, installs), the editor
+shows them in an integrated terminal. Depending on your settings:
+- Commands may run automatically
+- Or you may be prompted to **approve** each command
+
+### Approval Flow
+For potentially destructive operations, the editor will prompt you for
+approval before Hermes proceeds. This includes:
+- File deletions
+- Shell commands
+- Git operations
+
+---
+
+## Configuration
+
+Hermes Agent under ACP uses the **same configuration** as the CLI:
+
+- **API keys / providers**: `~/.hermes/.env`
+- **Agent config**: `~/.hermes/config.yaml`
+- **Skills**: `~/.hermes/skills/`
+- **Sessions**: `~/.hermes/state.db`
+
+You can run `hermes setup` to configure providers, or edit `~/.hermes/.env`
+directly.
+
+### Changing the model
+
+Edit `~/.hermes/config.yaml`:
+
+```yaml
+model: openrouter/nous/hermes-3-llama-3.1-70b
+```
+
+Or set the `HERMES_MODEL` environment variable.
+
+### Toolsets
+
+ACP sessions use the curated `hermes-acp` toolset by default. It is designed for editor workflows and intentionally excludes things like messaging delivery, cronjob management, and audio-first UX features.
+
+---
+
+## Troubleshooting
+
+### Agent doesn't appear in the editor
+
+1. **Check the registry path** — make sure the `acp_registry/` directory path
+   in your editor settings is correct and contains `agent.json`.
+2. **Check `hermes` is on PATH** — run `which hermes` in a terminal. If not
+   found, you may need to activate your virtualenv or add it to PATH.
+3. **Restart the editor** after changing settings.
+
+### Agent starts but errors immediately
+
+1. Run `hermes doctor` to check your configuration.
+2. Check that you have a valid API key: `hermes status`
+3. Try running `hermes acp` directly in a terminal to see error output.
+
+### "Module not found" errors
+
+Make sure you installed the ACP extra:
+
+```bash
+pip install -e ".[acp]"
+```
+
+### Slow responses
+
+- ACP streams responses, so you should see incremental output. If the agent
+  appears stuck, check your network connection and API provider status.
+- Some providers have rate limits. Try switching to a different model/provider.
+
+### Permission denied for terminal commands
+
+If the editor blocks terminal commands, check your ACP Client extension
+settings for auto-approval or manual-approval preferences.
+
+### Logs
+
+Hermes logs are written to stderr when running in ACP mode. Check:
+- VS Code: **Output** panel → select **ACP Client** or **Hermes Agent**
+- Zed: **View** → **Toggle Terminal** and check the process output
+- JetBrains: **Event Log** or the ACP tool window
+
+You can also enable verbose logging:
+
+```bash
+HERMES_LOG_LEVEL=DEBUG hermes acp
+```
+
+---
+
+## Further Reading
+
+- [ACP Specification](https://github.com/anysphere/acp)
+- [Hermes Agent Documentation](https://github.com/NousResearch/hermes-agent)
+- Run `hermes --help` for all CLI options
--- a/docs/honcho-integration-spec.html
+++ b/docs/honcho-integration-spec.html
@@ -0,0 +1,698 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>honcho-integration-spec</title>
+<style>
+  :root {
+    --bg:             #0b0e14;
+    --bg-surface:     #11151c;
+    --bg-elevated:    #181d27;
+    --bg-code:        #0d1018;
+    --fg:             #c9d1d9;
+    --fg-bright:      #e6edf3;
+    --fg-muted:       #6e7681;
+    --fg-subtle:      #484f58;
+    --accent:         #7eb8f6;
+    --accent-dim:     #3d6ea5;
+    --accent-glow:    rgba(126, 184, 246, 0.08);
+    --green:          #7ee6a8;
+    --green-dim:      #2ea04f;
+    --orange:         #e6a855;
+    --red:            #f47067;
+    --purple:         #bc8cff;
+    --cyan:           #56d4dd;
+    --border:         #21262d;
+    --border-subtle:  #161b22;
+    --radius:         6px;
+    --font-sans:      'New York', ui-serif, 'Iowan Old Style', 'Apple Garamond', Baskerville, 'Times New Roman', 'Noto Emoji', serif;
+    --font-mono:      'Departure Mono', 'Noto Emoji', monospace;
+  }
+
+  *, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
+  html { scroll-behavior: smooth; scroll-padding-top: 2rem; }
+  body {
+    font-family: var(--font-sans);
+    background: var(--bg);
+    color: var(--fg);
+    line-height: 1.7;
+    font-size: 15px;
+    -webkit-font-smoothing: antialiased;
+  }
+
+  .container { max-width: 860px; margin: 0 auto; padding: 3rem 2rem 6rem; }
+
+  .hero {
+    text-align: center;
+    padding: 4rem 0 3rem;
+    border-bottom: 1px solid var(--border);
+    margin-bottom: 3rem;
+  }
+  .hero h1 { font-family: var(--font-mono); font-size: 2.2rem; font-weight: 700; color: var(--fg-bright); letter-spacing: -0.03em; margin-bottom: 0.5rem; }
+  .hero h1 span { color: var(--accent); }
+  .hero .subtitle { font-family: var(--font-sans); color: var(--fg-muted); font-size: 0.92rem; max-width: 560px; margin: 0 auto; line-height: 1.6; }
+  .hero .meta { margin-top: 1.5rem; display: flex; justify-content: center; gap: 1.5rem; flex-wrap: wrap; }
+  .hero .meta span { font-size: 0.8rem; color: var(--fg-subtle); font-family: var(--font-mono); }
+
+  .toc { background: var(--bg-surface); border: 1px solid var(--border); border-radius: var(--radius); padding: 1.5rem 2rem; margin-bottom: 3rem; }
+  .toc h2 { font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.1em; color: var(--fg-muted); margin-bottom: 1rem; }
+  .toc ol { list-style: none; counter-reset: toc; columns: 2; column-gap: 2rem; }
+  .toc li { counter-increment: toc; break-inside: avoid; margin-bottom: 0.35rem; }
+  .toc li::before { content: counter(toc, decimal-leading-zero) " "; color: var(--fg-subtle); font-family: var(--font-mono); font-size: 0.75rem; margin-right: 0.25rem; }
+  .toc a { font-family: var(--font-mono); color: var(--fg); text-decoration: none; font-size: 0.82rem; transition: color 0.15s; }
+  .toc a:hover { color: var(--accent); }
+
+  section { margin-bottom: 4rem; }
+  section + section { padding-top: 1rem; }
+
+  h2 { font-family: var(--font-mono); font-size: 1.3rem; font-weight: 700; color: var(--fg-bright); letter-spacing: -0.01em; margin-bottom: 1.25rem; padding-bottom: 0.5rem; border-bottom: 1px solid var(--border); }
+  h3 { font-family: var(--font-mono); font-size: 1rem; font-weight: 600; color: var(--fg-bright); margin-top: 2rem; margin-bottom: 0.75rem; }
+  h4 { font-family: var(--font-mono); font-size: 0.9rem; font-weight: 600; color: var(--accent); margin-top: 1.5rem; margin-bottom: 0.5rem; }
+
+  p { margin-bottom: 1rem; font-size: 0.95rem; line-height: 1.75; }
+  strong { color: var(--fg-bright); font-weight: 600; }
+  a { color: var(--accent); text-decoration: none; }
+  a:hover { text-decoration: underline; }
+
+  ul, ol { margin-bottom: 1rem; padding-left: 1.5rem; font-size: 0.93rem; line-height: 1.7; }
+  li { margin-bottom: 0.35rem; }
+  li::marker { color: var(--fg-subtle); }
+
+  .table-wrap { overflow-x: auto; margin-bottom: 1.5rem; }
+  table { width: 100%; border-collapse: collapse; font-size: 0.88rem; }
+  th, td { text-align: left; padding: 0.6rem 1rem; border-bottom: 1px solid var(--border-subtle); }
+  th { font-family: var(--font-mono); font-size: 0.72rem; text-transform: uppercase; letter-spacing: 0.06em; color: var(--fg-muted); background: var(--bg-surface); border-bottom-color: var(--border); white-space: nowrap; }
+  td { font-family: var(--font-sans); font-size: 0.88rem; color: var(--fg); }
+  tr:hover td { background: var(--accent-glow); }
+  td code { background: var(--bg-elevated); padding: 0.15em 0.4em; border-radius: 3px; font-family: var(--font-mono); font-size: 0.82em; color: var(--cyan); }
+
+  pre { background: var(--bg-code); border: 1px solid var(--border); border-radius: var(--radius); padding: 1.25rem 1.5rem; overflow-x: auto; margin-bottom: 1.5rem; font-family: var(--font-mono); font-size: 0.82rem; line-height: 1.65; color: var(--fg); }
+  pre code { background: none; padding: 0; color: inherit; font-size: inherit; }
+  code { font-family: var(--font-mono); font-size: 0.85em; }
+  p code, li code { background: var(--bg-elevated); padding: 0.15em 0.4em; border-radius: 3px; color: var(--cyan); font-size: 0.85em; }
+
+  .kw { color: var(--purple); }
+  .str { color: var(--green); }
+  .cm { color: var(--fg-subtle); font-style: italic; }
+  .num { color: var(--orange); }
+  .key { color: var(--accent); }
+
+  .mermaid { margin: 1.5rem 0 2rem; text-align: center; }
+  .mermaid svg { max-width: 100%; height: auto; }
+
+  .callout { font-family: var(--font-sans); background: var(--bg-surface); border-left: 3px solid var(--accent-dim); border-radius: 0 var(--radius) var(--radius) 0; padding: 1rem 1.25rem; margin-bottom: 1.5rem; font-size: 0.88rem; color: var(--fg-muted); line-height: 1.6; }
+  .callout strong { font-family: var(--font-mono); color: var(--fg-bright); }
+  .callout.success { border-left-color: var(--green-dim); }
+  .callout.warn { border-left-color: var(--orange); }
+
+  .badge { display: inline-block; font-family: var(--font-mono); font-size: 0.65rem; font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; padding: 0.2em 0.6em; border-radius: 3px; vertical-align: middle; margin-left: 0.4rem; }
+  .badge-done { background: var(--green-dim); color: #fff; }
+  .badge-wip { background: var(--orange); color: #0b0e14; }
+  .badge-todo { background: var(--fg-subtle); color: var(--fg); }
+
+  .checklist { list-style: none; padding-left: 0; }
+  .checklist li { padding-left: 1.5rem; position: relative; margin-bottom: 0.5rem; }
+  .checklist li::before { position: absolute; left: 0; font-family: var(--font-mono); font-size: 0.85rem; }
+  .checklist li.done { color: var(--fg-muted); }
+  .checklist li.done::before { content: "\2713"; color: var(--green); }
+  .checklist li.todo::before { content: "\25CB"; color: var(--fg-subtle); }
+  .checklist li.wip::before { content: "\25D4"; color: var(--orange); }
+
+  .compare { display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; margin-bottom: 2rem; }
+  .compare-card { background: var(--bg-surface); border: 1px solid var(--border); border-radius: var(--radius); padding: 1.25rem; }
+  .compare-card h4 { margin-top: 0; font-size: 0.82rem; }
+  .compare-card.after { border-color: var(--accent-dim); }
+  .compare-card ul { font-family: var(--font-mono); padding-left: 1.25rem; font-size: 0.8rem; }
+
+  hr { border: none; border-top: 1px solid var(--border); margin: 3rem 0; }
+
+  .progress-bar { position: fixed; top: 0; left: 0; height: 2px; background: var(--accent); z-index: 999; transition: width 0.1s linear; }
+
+  @media (max-width: 640px) {
+    .container { padding: 2rem 1rem 4rem; }
+    .hero h1 { font-size: 1.6rem; }
+    .toc ol { columns: 1; }
+    .compare { grid-template-columns: 1fr; }
+    table { font-size: 0.8rem; }
+    th, td { padding: 0.4rem 0.6rem; }
+  }
+</style>
+<link rel="preconnect" href="https://fonts.googleapis.com">
+<link href="https://fonts.googleapis.com/css2?family=Noto+Emoji&display=swap" rel="stylesheet">
+<style>
+  @font-face {
+    font-family: 'Departure Mono';
+    src: url('https://cdn.jsdelivr.net/gh/rektdeckard/departure-mono@latest/fonts/DepartureMono-Regular.woff2') format('woff2');
+    font-weight: normal;
+    font-style: normal;
+    font-display: swap;
+  }
+</style>
+</head>
+<body>
+
+<div class="progress-bar" id="progress"></div>
+
+<div class="container">
+
+<header class="hero">
+  <h1>honcho<span>-integration-spec</span></h1>
+  <p class="subtitle">Comparison of Hermes Agent vs. openclaw-honcho — and a porting spec for bringing Hermes patterns into other Honcho integrations.</p>
+  <div class="meta">
+    <span>hermes-agent / openclaw-honcho</span>
+    <span>Python + TypeScript</span>
+    <span>2026-03-09</span>
+  </div>
+</header>
+
+<nav class="toc">
+  <h2>Contents</h2>
+  <ol>
+    <li><a href="#overview">Overview</a></li>
+    <li><a href="#architecture">Architecture comparison</a></li>
+    <li><a href="#diff-table">Diff table</a></li>
+    <li><a href="#patterns">Hermes patterns to port</a></li>
+    <li><a href="#spec-async">Spec: async prefetch</a></li>
+    <li><a href="#spec-reasoning">Spec: dynamic reasoning level</a></li>
+    <li><a href="#spec-modes">Spec: per-peer memory modes</a></li>
+    <li><a href="#spec-identity">Spec: AI peer identity formation</a></li>
+    <li><a href="#spec-sessions">Spec: session naming strategies</a></li>
+    <li><a href="#spec-cli">Spec: CLI surface injection</a></li>
+    <li><a href="#openclaw-checklist">openclaw-honcho checklist</a></li>
+    <li><a href="#nanobot-checklist">nanobot-honcho checklist</a></li>
+  </ol>
+</nav>
+
+<!-- OVERVIEW -->
+<section id="overview">
+  <h2>Overview</h2>
+
+  <p>Two independent Honcho integrations have been built for two different agent runtimes: <strong>Hermes Agent</strong> (Python, baked into the runner) and <strong>openclaw-honcho</strong> (TypeScript plugin via hook/tool API). Both use the same Honcho peer paradigm — dual peer model, <code>session.context()</code>, <code>peer.chat()</code> — but they made different tradeoffs at every layer.</p>
+
+  <p>This document maps those tradeoffs and defines a porting spec: a set of Hermes-originated patterns, each stated as an integration-agnostic interface, that any Honcho integration can adopt regardless of runtime or language.</p>
+
+  <div class="callout">
+    <strong>Scope</strong> Both integrations work correctly today. This spec is about the delta — patterns in Hermes that are worth propagating and patterns in openclaw-honcho that Hermes should eventually adopt. The spec is additive, not prescriptive.
+  </div>
+</section>
+
+<!-- ARCHITECTURE -->
+<section id="architecture">
+  <h2>Architecture comparison</h2>
+
+  <h3>Hermes: baked-in runner</h3>
+  <p>Honcho is initialised directly inside <code>AIAgent.__init__</code>. There is no plugin boundary. Session management, context injection, async prefetch, and CLI surface are all first-class concerns of the runner. Context is injected once per session (baked into <code>_cached_system_prompt</code>) and never re-fetched mid-session — this maximises prefix cache hits at the LLM provider.</p>
+
+  <div class="mermaid">
+%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1f3150', 'primaryTextColor': '#c9d1d9', 'primaryBorderColor': '#3d6ea5', 'lineColor': '#3d6ea5', 'secondaryColor': '#162030', 'tertiaryColor': '#11151c' }}}%%
+flowchart TD
+    U["user message"] --> P["_honcho_prefetch()<br/>(reads cache — no HTTP)"]
+    P --> SP["_build_system_prompt()<br/>(first turn only, cached)"]
+    SP --> LLM["LLM call"]
+    LLM --> R["response"]
+    R --> FP["_honcho_fire_prefetch()<br/>(daemon threads, turn end)"]
+    FP --> C1["prefetch_context() thread"]
+    FP --> C2["prefetch_dialectic() thread"]
+    C1 --> CACHE["_context_cache / _dialectic_cache"]
+    C2 --> CACHE
+
+    style U fill:#162030,stroke:#3d6ea5,color:#c9d1d9
+    style P fill:#1f3150,stroke:#3d6ea5,color:#c9d1d9
+    style SP fill:#1f3150,stroke:#3d6ea5,color:#c9d1d9
+    style LLM fill:#162030,stroke:#3d6ea5,color:#c9d1d9
+    style R fill:#162030,stroke:#3d6ea5,color:#c9d1d9
+    style FP fill:#2a1a40,stroke:#bc8cff,color:#c9d1d9
+    style C1 fill:#2a1a40,stroke:#bc8cff,color:#c9d1d9
+    style C2 fill:#2a1a40,stroke:#bc8cff,color:#c9d1d9
+    style CACHE fill:#11151c,stroke:#484f58,color:#6e7681
+  </div>
+
+  <h3>openclaw-honcho: hook-based plugin</h3>
+  <p>The plugin registers hooks against OpenClaw's event bus. Context is fetched synchronously inside <code>before_prompt_build</code> on every turn. Message capture happens in <code>agent_end</code>. The multi-agent hierarchy is tracked via <code>subagent_spawned</code>. This model is correct but every turn pays a blocking Honcho round-trip before the LLM call can begin.</p>
+
+  <div class="mermaid">
+%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1f3150', 'primaryTextColor': '#c9d1d9', 'primaryBorderColor': '#3d6ea5', 'lineColor': '#3d6ea5', 'secondaryColor': '#162030', 'tertiaryColor': '#11151c' }}}%%
+flowchart TD
+    U2["user message"] --> BPB["before_prompt_build<br/>(BLOCKING HTTP — every turn)"]
+    BPB --> CTX["session.context()"]
+    CTX --> SP2["system prompt assembled"]
+    SP2 --> LLM2["LLM call"]
+    LLM2 --> R2["response"]
+    R2 --> AE["agent_end hook"]
+    AE --> SAVE["session.addMessages()<br/>session.setMetadata()"]
+
+    style U2 fill:#162030,stroke:#3d6ea5,color:#c9d1d9
+    style BPB fill:#3a1515,stroke:#f47067,color:#c9d1d9
+    style CTX fill:#3a1515,stroke:#f47067,color:#c9d1d9
+    style SP2 fill:#1f3150,stroke:#3d6ea5,color:#c9d1d9
+    style LLM2 fill:#162030,stroke:#3d6ea5,color:#c9d1d9
+    style R2 fill:#162030,stroke:#3d6ea5,color:#c9d1d9
+    style AE fill:#162030,stroke:#3d6ea5,color:#c9d1d9
+    style SAVE fill:#11151c,stroke:#484f58,color:#6e7681
+  </div>
+</section>
+
+<!-- DIFF TABLE -->
+<section id="diff-table">
+  <h2>Diff table</h2>
+
+  <div class="table-wrap">
+    <table>
+      <thead>
+        <tr>
+          <th>Dimension</th>
+          <th>Hermes Agent</th>
+          <th>openclaw-honcho</th>
+        </tr>
+      </thead>
+      <tbody>
+        <tr>
+          <td><strong>Context injection timing</strong></td>
+          <td>Once per session (cached). Zero HTTP on response path after turn 1.</td>
+          <td>Every turn, blocking. Fresh context per turn but adds latency.</td>
+        </tr>
+        <tr>
+          <td><strong>Prefetch strategy</strong></td>
+          <td>Daemon threads fire at turn end; consumed next turn from cache.</td>
+          <td>None. Blocking call at prompt-build time.</td>
+        </tr>
+        <tr>
+          <td><strong>Dialectic (peer.chat)</strong></td>
+          <td>Prefetched async; result injected into system prompt next turn.</td>
+          <td>On-demand via <code>honcho_recall</code> / <code>honcho_analyze</code> tools.</td>
+        </tr>
+        <tr>
+          <td><strong>Reasoning level</strong></td>
+          <td>Dynamic: scales with message length. Floor = config default. Cap = "high".</td>
+          <td>Fixed per tool: recall=minimal, analyze=medium.</td>
+        </tr>
+        <tr>
+          <td><strong>Memory modes</strong></td>
+          <td><code>user_memory_mode</code> / <code>agent_memory_mode</code>: hybrid / honcho / local.</td>
+          <td>None. Always writes to Honcho.</td>
+        </tr>
+        <tr>
+          <td><strong>Write frequency</strong></td>
+          <td>async (background queue), turn, session, N turns.</td>
+          <td>After every agent_end (no control).</td>
+        </tr>
+        <tr>
+          <td><strong>AI peer identity</strong></td>
+          <td><code>observe_me=True</code>, <code>seed_ai_identity()</code>, <code>get_ai_representation()</code>, SOUL.md → AI peer.</td>
+          <td>Agent files uploaded to agent peer at setup. No ongoing self-observation seeding.</td>
+        </tr>
+        <tr>
+          <td><strong>Context scope</strong></td>
+          <td>User peer + AI peer representation, both injected.</td>
+          <td>User peer (owner) representation + conversation summary. <code>peerPerspective</code> on context call.</td>
+        </tr>
+        <tr>
+          <td><strong>Session naming</strong></td>
+          <td>per-directory / global / manual map / title-based.</td>
+          <td>Derived from platform session key.</td>
+        </tr>
+        <tr>
+          <td><strong>Multi-agent</strong></td>
+          <td>Single-agent only.</td>
+          <td>Parent observer hierarchy via <code>subagent_spawned</code>.</td>
+        </tr>
+        <tr>
+          <td><strong>Tool surface</strong></td>
+          <td>Single <code>query_user_context</code> tool (on-demand dialectic).</td>
+          <td>6 tools: session, profile, search, context (fast) + recall, analyze (LLM).</td>
+        </tr>
+        <tr>
+          <td><strong>Platform metadata</strong></td>
+          <td>Not stripped.</td>
+          <td>Explicitly stripped before Honcho storage.</td>
+        </tr>
+        <tr>
+          <td><strong>Message dedup</strong></td>
+          <td>None (sends on every save cycle).</td>
+          <td><code>lastSavedIndex</code> in session metadata prevents re-sending.</td>
+        </tr>
+        <tr>
+          <td><strong>CLI surface in prompt</strong></td>
+          <td>Management commands injected into system prompt. Agent knows its own CLI.</td>
+          <td>Not injected.</td>
+        </tr>
+        <tr>
+          <td><strong>AI peer name in identity</strong></td>
+          <td>Replaces "Hermes Agent" in DEFAULT_AGENT_IDENTITY when configured.</td>
+          <td>Not implemented.</td>
+        </tr>
+        <tr>
+          <td><strong>QMD / local file search</strong></td>
+          <td>Not implemented.</td>
+          <td>Passthrough tools when QMD backend configured.</td>
+        </tr>
+        <tr>
+          <td><strong>Workspace metadata</strong></td>
+          <td>Not implemented.</td>
+          <td><code>agentPeerMap</code> in workspace metadata tracks agent&#8594;peer ID.</td>
+        </tr>
+      </tbody>
+    </table>
+  </div>
+</section>
+
+<!-- PATTERNS -->
+<section id="patterns">
+  <h2>Hermes patterns to port</h2>
+
+  <p>Six patterns from Hermes are worth adopting in any Honcho integration. They are described below as integration-agnostic interfaces — the implementation will differ per runtime, but the contract is the same.</p>
+
+  <div class="compare">
+    <div class="compare-card">
+      <h4>Patterns Hermes contributes</h4>
+      <ul>
+        <li>Async prefetch (zero-latency)</li>
+        <li>Dynamic reasoning level</li>
+        <li>Per-peer memory modes</li>
+        <li>AI peer identity formation</li>
+        <li>Session naming strategies</li>
+        <li>CLI surface injection</li>
+      </ul>
+    </div>
+    <div class="compare-card after">
+      <h4>Patterns openclaw contributes back</h4>
+      <ul>
+        <li>lastSavedIndex dedup</li>
+        <li>Platform metadata stripping</li>
+        <li>Multi-agent observer hierarchy</li>
+        <li>peerPerspective on context()</li>
+        <li>Tiered tool surface (fast/LLM)</li>
+        <li>Workspace agentPeerMap</li>
+      </ul>
+    </div>
+  </div>
+</section>
+
+<!-- SPEC: ASYNC PREFETCH -->
+<section id="spec-async">
+  <h2>Spec: async prefetch</h2>
+
+  <h3>Problem</h3>
+  <p>Calling <code>session.context()</code> and <code>peer.chat()</code> synchronously before each LLM call adds 200–800ms of Honcho round-trip latency to every turn. Users experience this as the agent "thinking slowly."</p>
+
+  <h3>Pattern</h3>
+  <p>Fire both calls as non-blocking background work at the <strong>end</strong> of each turn. Store results in a per-session cache keyed by session ID. At the <strong>start</strong> of the next turn, pop from cache — the HTTP is already done. First turn is cold (empty cache); all subsequent turns are zero-latency on the response path.</p>
+
+  <h3>Interface contract</h3>
+  <pre><code><span class="cm">// TypeScript (openclaw / nanobot plugin shape)</span>
+
+<span class="kw">interface</span> <span class="key">AsyncPrefetch</span> {
+  <span class="cm">// Fire context + dialectic fetches at turn end. Non-blocking.</span>
+  firePrefetch(sessionId: <span class="str">string</span>, userMessage: <span class="str">string</span>): <span class="kw">void</span>;
+
+  <span class="cm">// Pop cached results at turn start. Returns empty if cache is cold.</span>
+  popContextResult(sessionId: <span class="str">string</span>): ContextResult | <span class="kw">null</span>;
+  popDialecticResult(sessionId: <span class="str">string</span>): <span class="str">string</span> | <span class="kw">null</span>;
+}
+
+<span class="kw">type</span> <span class="key">ContextResult</span> = {
+  representation: <span class="str">string</span>;
+  card: <span class="str">string</span>[];
+  aiRepresentation?: <span class="str">string</span>;  <span class="cm">// AI peer context if enabled</span>
+  summary?: <span class="str">string</span>;            <span class="cm">// conversation summary if fetched</span>
+};</code></pre>
+
+  <h3>Implementation notes</h3>
+  <ul>
+    <li>Python: <code>threading.Thread(daemon=True)</code>. Write to <code>dict[session_id, result]</code> — GIL makes this safe for simple writes.</li>
+    <li>TypeScript: <code>Promise</code> stored in <code>Map&lt;string, Promise&lt;ContextResult&gt;&gt;</code>. Await at pop time. If not resolved yet, skip (return null) — do not block.</li>
+    <li>The pop is destructive: clears the cache entry after reading so stale data never accumulates.</li>
+    <li>Prefetch should also fire on first turn (even though it won't be consumed until turn 2) — this ensures turn 2 is never cold.</li>
+  </ul>
+
+  <h3>openclaw-honcho adoption</h3>
+  <p>Move <code>session.context()</code> from <code>before_prompt_build</code> to a post-<code>agent_end</code> background task. Store result in <code>state.contextCache</code>. In <code>before_prompt_build</code>, read from cache instead of calling Honcho. If cache is empty (turn 1), inject nothing — the prompt is still valid without Honcho context on the first turn.</p>
+</section>
+
+<!-- SPEC: DYNAMIC REASONING LEVEL -->
+<section id="spec-reasoning">
+  <h2>Spec: dynamic reasoning level</h2>
+
+  <h3>Problem</h3>
+  <p>Honcho's dialectic endpoint supports reasoning levels from <code>minimal</code> to <code>max</code>. A fixed level per tool wastes budget on simple queries and under-serves complex ones.</p>
+
+  <h3>Pattern</h3>
+  <p>Select the reasoning level dynamically based on the user's message. Use the configured default as a floor. Bump by message length. Cap auto-selection at <code>high</code> — never select <code>max</code> automatically.</p>
+
+  <h3>Interface contract</h3>
+  <pre><code><span class="cm">// Shared helper — identical logic in any language</span>
+
+<span class="kw">const</span> LEVELS = [<span class="str">"minimal"</span>, <span class="str">"low"</span>, <span class="str">"medium"</span>, <span class="str">"high"</span>, <span class="str">"max"</span>];
+
+<span class="kw">function</span> <span class="key">dynamicReasoningLevel</span>(
+  query: <span class="str">string</span>,
+  configDefault: <span class="str">string</span> = <span class="str">"low"</span>
+): <span class="str">string</span> {
+  <span class="kw">const</span> baseIdx = Math.max(<span class="num">0</span>, LEVELS.indexOf(configDefault));
+  <span class="kw">const</span> n = query.length;
+  <span class="kw">const</span> bump = n &lt; <span class="num">120</span> ? <span class="num">0</span> : n &lt; <span class="num">400</span> ? <span class="num">1</span> : <span class="num">2</span>;
+  <span class="kw">return</span> LEVELS[Math.min(baseIdx + bump, <span class="num">3</span>)]; <span class="cm">// cap at "high" (idx 3)</span>
+}</code></pre>
+
+  <h3>Config key</h3>
+  <p>Add a <code>dialecticReasoningLevel</code> config field (string, default <code>"low"</code>). This sets the floor. Users can raise or lower it. The dynamic bump always applies on top.</p>
+
+  <h3>openclaw-honcho adoption</h3>
+  <p>Apply in <code>honcho_recall</code> and <code>honcho_analyze</code>: replace the fixed <code>reasoningLevel</code> with the dynamic selector. <code>honcho_recall</code> should use floor <code>"minimal"</code> and <code>honcho_analyze</code> floor <code>"medium"</code> — both still bump with message length.</p>
+</section>
+
+<!-- SPEC: PER-PEER MEMORY MODES -->
+<section id="spec-modes">
+  <h2>Spec: per-peer memory modes</h2>
+
+  <h3>Problem</h3>
+  <p>Users want independent control over whether user context and agent context are written locally, to Honcho, or both. A single <code>memoryMode</code> shorthand is not granular enough.</p>
+
+  <h3>Pattern</h3>
+  <p>Three modes per peer: <code>hybrid</code> (write both local + Honcho), <code>honcho</code> (Honcho only, disable local files), <code>local</code> (local files only, skip Honcho sync for this peer). Two orthogonal axes: user peer and agent peer.</p>
+
+  <h3>Config schema</h3>
+  <pre><code><span class="cm">// ~/.openclaw/openclaw.json  (or ~/.nanobot/config.json)</span>
+{
+  <span class="str">"plugins"</span>: {
+    <span class="str">"openclaw-honcho"</span>: {
+      <span class="str">"config"</span>: {
+        <span class="str">"apiKey"</span>: <span class="str">"..."</span>,
+        <span class="str">"memoryMode"</span>: <span class="str">"hybrid"</span>,          <span class="cm">// shorthand: both peers</span>
+        <span class="str">"userMemoryMode"</span>: <span class="str">"honcho"</span>,       <span class="cm">// override for user peer</span>
+        <span class="str">"agentMemoryMode"</span>: <span class="str">"hybrid"</span>       <span class="cm">// override for agent peer</span>
+      }
+    }
+  }
+}</code></pre>
+
+  <h3>Resolution order</h3>
+  <ol>
+    <li>Per-peer field (<code>userMemoryMode</code> / <code>agentMemoryMode</code>) — wins if present.</li>
+    <li>Shorthand <code>memoryMode</code> — applies to both peers as default.</li>
+    <li>Hardcoded default: <code>"hybrid"</code>.</li>
+  </ol>
+
+  <h3>Effect on Honcho sync</h3>
+  <ul>
+    <li><code>userMemoryMode=local</code>: skip adding user peer messages to Honcho.</li>
+    <li><code>agentMemoryMode=local</code>: skip adding assistant peer messages to Honcho.</li>
+    <li>Both local: skip <code>session.addMessages()</code> entirely.</li>
+    <li><code>userMemoryMode=honcho</code>: disable local USER.md writes.</li>
+    <li><code>agentMemoryMode=honcho</code>: disable local MEMORY.md / SOUL.md writes.</li>
+  </ul>
+</section>
+
+<!-- SPEC: AI PEER IDENTITY -->
+<section id="spec-identity">
+  <h2>Spec: AI peer identity formation</h2>
+
+  <h3>Problem</h3>
+  <p>Honcho builds the user's representation organically by observing what the user says. The same mechanism exists for the AI peer — but only if <code>observe_me=True</code> is set for the agent peer. Without it, the agent peer accumulates nothing and Honcho's AI-side model never forms.</p>
+
+  <p>Additionally, existing persona files (SOUL.md, IDENTITY.md) should seed the AI peer's Honcho representation at first activation, rather than waiting for it to emerge from scratch.</p>
+
+  <h3>Part A: observe_me=True for agent peer</h3>
+  <pre><code><span class="cm">// TypeScript — in session.addPeers() call</span>
+<span class="kw">await</span> session.addPeers([
+  [ownerPeer.id, { observeMe: <span class="kw">true</span>,  observeOthers: <span class="kw">false</span> }],
+  [agentPeer.id, { observeMe: <span class="kw">true</span>,  observeOthers: <span class="kw">true</span>  }], <span class="cm">// was false</span>
+]);</code></pre>
+
+  <p>This is a one-line change but foundational. Without it, Honcho's AI peer representation stays empty regardless of what the agent says.</p>
+
+  <h3>Part B: seedAiIdentity()</h3>
+  <pre><code><span class="kw">async function</span> <span class="key">seedAiIdentity</span>(
+  session: HonchoSession,
+  agentPeer: Peer,
+  content: <span class="str">string</span>,
+  source: <span class="str">string</span>
+): Promise&lt;<span class="kw">boolean</span>&gt; {
+  <span class="kw">const</span> wrapped = [
+    <span class="str">`&lt;ai_identity_seed&gt;`</span>,
+    <span class="str">`&lt;source&gt;${source}&lt;/source&gt;`</span>,
+    <span class="str">``</span>,
+    content.trim(),
+    <span class="str">`&lt;/ai_identity_seed&gt;`</span>,
+  ].join(<span class="str">"\n"</span>);
+
+  <span class="kw">await</span> agentPeer.addMessage(<span class="str">"assistant"</span>, wrapped);
+  <span class="kw">return true</span>;
+}</code></pre>
+
+  <h3>Part C: migrate agent files at setup</h3>
+  <p>During <code>openclaw honcho setup</code>, upload agent-self files (SOUL.md, IDENTITY.md, AGENTS.md, BOOTSTRAP.md) to the agent peer using <code>seedAiIdentity()</code> instead of <code>session.uploadFile()</code>. This routes the content through Honcho's observation pipeline rather than the file store.</p>
+
+  <h3>Part D: AI peer name in identity</h3>
+  <p>When the agent has a configured name (non-default), inject it into the agent's self-identity prefix. In OpenClaw this means adding to the injected system prompt section:</p>
+  <pre><code><span class="cm">// In context hook return value</span>
+<span class="kw">return</span> {
+  systemPrompt: [
+    agentName ? <span class="str">`You are ${agentName}.`</span> : <span class="str">""</span>,
+    <span class="str">"## User Memory Context"</span>,
+    ...sections,
+  ].filter(Boolean).join(<span class="str">"\n\n"</span>)
+};</code></pre>
+
+  <h3>CLI surface: honcho identity subcommand</h3>
+  <pre><code>openclaw honcho identity &lt;file&gt;    <span class="cm"># seed from file</span>
+openclaw honcho identity --show    <span class="cm"># show current AI peer representation</span></code></pre>
+</section>
+
+<!-- SPEC: SESSION NAMING -->
+<section id="spec-sessions">
+  <h2>Spec: session naming strategies</h2>
+
+  <h3>Problem</h3>
+  <p>When Honcho is used across multiple projects or directories, a single global session means every project shares the same context. Per-directory sessions provide isolation without requiring users to name sessions manually.</p>
+
+  <h3>Strategies</h3>
+  <div class="table-wrap">
+    <table>
+      <thead><tr><th>Strategy</th><th>Session key</th><th>When to use</th></tr></thead>
+      <tbody>
+        <tr><td><code>per-directory</code></td><td>basename of CWD</td><td>Default. Each project gets its own session.</td></tr>
+        <tr><td><code>global</code></td><td>fixed string <code>"global"</code></td><td>Single cross-project session.</td></tr>
+        <tr><td>manual map</td><td>user-configured per path</td><td><code>sessions</code> config map overrides directory basename.</td></tr>
+        <tr><td>title-based</td><td>sanitized session title</td><td>When agent supports named sessions; title set mid-conversation.</td></tr>
+      </tbody>
+    </table>
+  </div>
+
+  <h3>Config schema</h3>
+  <pre><code>{
+  <span class="str">"sessionStrategy"</span>: <span class="str">"per-directory"</span>,   <span class="cm">// "per-directory" | "global"</span>
+  <span class="str">"sessionPeerPrefix"</span>: <span class="kw">false</span>,            <span class="cm">// prepend peer name to session key</span>
+  <span class="str">"sessions"</span>: {                            <span class="cm">// manual overrides</span>
+    <span class="str">"/home/user/projects/foo"</span>: <span class="str">"foo-project"</span>
+  }
+}</code></pre>
+
+  <h3>CLI surface</h3>
+  <pre><code>openclaw honcho sessions              <span class="cm"># list all mappings</span>
+openclaw honcho map &lt;name&gt;           <span class="cm"># map cwd to session name</span>
+openclaw honcho map                   <span class="cm"># no-arg = list mappings</span></code></pre>
+
+  <p>Resolution order: manual map wins &rarr; session title &rarr; directory basename &rarr; platform key.</p>
+</section>
+
+<!-- SPEC: CLI SURFACE INJECTION -->
+<section id="spec-cli">
+  <h2>Spec: CLI surface injection</h2>
+
+  <h3>Problem</h3>
+  <p>When a user asks "how do I change my memory settings?" or "what Honcho commands are available?" the agent either hallucinates or says it doesn't know. The agent should know its own management interface.</p>
+
+  <h3>Pattern</h3>
+  <p>When Honcho is active, append a compact command reference to the system prompt. The agent can cite these commands directly instead of guessing.</p>
+
+  <pre><code><span class="cm">// In context hook, append to systemPrompt</span>
+<span class="kw">const</span> honchoSection = [
+  <span class="str">"# Honcho memory integration"</span>,
+  <span class="str">`Active. Session: ${sessionKey}. Mode: ${mode}.`</span>,
+  <span class="str">"Management commands:"</span>,
+  <span class="str">"  openclaw honcho status                    — show config + connection"</span>,
+  <span class="str">"  openclaw honcho mode [hybrid|honcho|local] — show or set memory mode"</span>,
+  <span class="str">"  openclaw honcho sessions                  — list session mappings"</span>,
+  <span class="str">"  openclaw honcho map &lt;name&gt;                — map directory to session"</span>,
+  <span class="str">"  openclaw honcho identity [file] [--show]  — seed or show AI identity"</span>,
+  <span class="str">"  openclaw honcho setup                     — full interactive wizard"</span>,
+].join(<span class="str">"\n"</span>);</code></pre>
+
+  <div class="callout warn">
+    <strong>Keep it compact.</strong> This section is injected every turn. Keep it under 300 chars of context. List commands, not explanations — the agent can explain them on request.
+  </div>
+</section>
+
+<!-- OPENCLAW CHECKLIST -->
+<section id="openclaw-checklist">
+  <h2>openclaw-honcho checklist</h2>
+
+  <p>Ordered by impact. Each item maps to a spec section above.</p>
+
+  <ul class="checklist">
+    <li class="todo"><strong>Async prefetch</strong> — move <code>session.context()</code> out of <code>before_prompt_build</code> into post-<code>agent_end</code> background Promise. Pop from cache at prompt build. (<a href="#spec-async">spec</a>)</li>
+    <li class="todo"><strong>observe_me=True for agent peer</strong> — one-line change in <code>session.addPeers()</code> config for agent peer. (<a href="#spec-identity">spec</a>)</li>
+    <li class="todo"><strong>Dynamic reasoning level</strong> — add <code>dynamicReasoningLevel()</code> helper; apply in <code>honcho_recall</code> and <code>honcho_analyze</code>. Add <code>dialecticReasoningLevel</code> to config schema. (<a href="#spec-reasoning">spec</a>)</li>
+    <li class="todo"><strong>Per-peer memory modes</strong> — add <code>userMemoryMode</code> / <code>agentMemoryMode</code> to config; gate Honcho sync and local writes accordingly. (<a href="#spec-modes">spec</a>)</li>
+    <li class="todo"><strong>seedAiIdentity()</strong> — add helper; apply during setup migration for SOUL.md / IDENTITY.md instead of <code>session.uploadFile()</code>. (<a href="#spec-identity">spec</a>)</li>
+    <li class="todo"><strong>Session naming strategies</strong> — add <code>sessionStrategy</code>, <code>sessions</code> map, <code>sessionPeerPrefix</code> to config; implement resolution function. (<a href="#spec-sessions">spec</a>)</li>
+    <li class="todo"><strong>CLI surface injection</strong> — append command reference to <code>before_prompt_build</code> return value when Honcho is active. (<a href="#spec-cli">spec</a>)</li>
+    <li class="todo"><strong>honcho identity subcommand</strong> — add <code>openclaw honcho identity</code> CLI command. (<a href="#spec-identity">spec</a>)</li>
+    <li class="todo"><strong>AI peer name injection</strong> — if <code>aiPeer</code> name configured, prepend to injected system prompt. (<a href="#spec-identity">spec</a>)</li>
+    <li class="todo"><strong>honcho mode / honcho sessions / honcho map</strong> — CLI parity with Hermes. (<a href="#spec-sessions">spec</a>)</li>
+  </ul>
+
+  <div class="callout success">
+    <strong>Already done in openclaw-honcho (do not re-implement):</strong> lastSavedIndex dedup, platform metadata stripping, multi-agent parent observer hierarchy, peerPerspective on context(), tiered tool surface (fast/LLM), workspace agentPeerMap, QMD passthrough, self-hosted Honcho support.
+  </div>
+</section>
+
+<!-- NANOBOT CHECKLIST -->
+<section id="nanobot-checklist">
+  <h2>nanobot-honcho checklist</h2>
+
+  <p>nanobot-honcho is a greenfield integration. Start from openclaw-honcho's architecture (hook-based, dual peer) and apply all Hermes patterns from day one rather than retrofitting. Priority order:</p>
+
+  <h3>Phase 1 — core correctness</h3>
+  <ul class="checklist">
+    <li class="todo">Dual peer model (owner + agent peer), both with <code>observe_me=True</code></li>
+    <li class="todo">Message capture at turn end with <code>lastSavedIndex</code> dedup</li>
+    <li class="todo">Platform metadata stripping before Honcho storage</li>
+    <li class="todo">Async prefetch from day one — do not implement blocking context injection</li>
+    <li class="todo">Legacy file migration at first activation (USER.md → owner peer, SOUL.md → <code>seedAiIdentity()</code>)</li>
+  </ul>
+
+  <h3>Phase 2 — configuration</h3>
+  <ul class="checklist">
+    <li class="todo">Config schema: <code>apiKey</code>, <code>workspaceId</code>, <code>baseUrl</code>, <code>memoryMode</code>, <code>userMemoryMode</code>, <code>agentMemoryMode</code>, <code>dialecticReasoningLevel</code>, <code>sessionStrategy</code>, <code>sessions</code></li>
+    <li class="todo">Per-peer memory mode gating</li>
+    <li class="todo">Dynamic reasoning level</li>
+    <li class="todo">Session naming strategies</li>
+  </ul>
+
+  <h3>Phase 3 — tools and CLI</h3>
+  <ul class="checklist">
+    <li class="todo">Tool surface: <code>honcho_profile</code>, <code>honcho_recall</code>, <code>honcho_analyze</code>, <code>honcho_search</code>, <code>honcho_context</code></li>
+    <li class="todo">CLI: <code>setup</code>, <code>status</code>, <code>sessions</code>, <code>map</code>, <code>mode</code>, <code>identity</code></li>
+    <li class="todo">CLI surface injection into system prompt</li>
+    <li class="todo">AI peer name wired into agent identity</li>
+  </ul>
+</section>
+
+</div>
+
+<script type="module">
+  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';
+  mermaid.initialize({ startOnLoad: true, securityLevel: 'loose', fontFamily: 'Departure Mono, Noto Emoji, monospace' });
+</script>
+<script>
+  window.addEventListener('scroll', () => {
+    const bar = document.getElementById('progress');
+    const max = document.documentElement.scrollHeight - window.innerHeight;
+    bar.style.width = (max > 0 ? (window.scrollY / max) * 100 : 0) + '%';
+  });
+</script>
+</body>
+</html>
--- a/docs/honcho-integration-spec.md
+++ b/docs/honcho-integration-spec.md
@@ -0,0 +1,377 @@
+# honcho-integration-spec
+
+Comparison of Hermes Agent vs. openclaw-honcho — and a porting spec for bringing Hermes patterns into other Honcho integrations.
+
+---
+
+## Overview
+
+Two independent Honcho integrations have been built for two different agent runtimes: **Hermes Agent** (Python, baked into the runner) and **openclaw-honcho** (TypeScript plugin via hook/tool API). Both use the same Honcho peer paradigm — dual peer model, `session.context()`, `peer.chat()` — but they made different tradeoffs at every layer.
+
+This document maps those tradeoffs and defines a porting spec: a set of Hermes-originated patterns, each stated as an integration-agnostic interface, that any Honcho integration can adopt regardless of runtime or language.
+
+> **Scope** Both integrations work correctly today. This spec is about the delta — patterns in Hermes that are worth propagating and patterns in openclaw-honcho that Hermes should eventually adopt. The spec is additive, not prescriptive.
+
+---
+
+## Architecture comparison
+
+### Hermes: baked-in runner
+
+Honcho is initialised directly inside `AIAgent.__init__`. There is no plugin boundary. Session management, context injection, async prefetch, and CLI surface are all first-class concerns of the runner. Context is injected once per session (baked into `_cached_system_prompt`) and never re-fetched mid-session — this maximises prefix cache hits at the LLM provider.
+
+Turn flow:
+
+```
+user message
+  → _honcho_prefetch()       (reads cache — no HTTP)
+  → _build_system_prompt()   (first turn only, cached)
+  → LLM call
+  → response
+  → _honcho_fire_prefetch()  (daemon threads, turn end)
+       → prefetch_context() thread  ──┐
+       → prefetch_dialectic() thread ─┴→ _context_cache / _dialectic_cache
+```
+
+### openclaw-honcho: hook-based plugin
+
+The plugin registers hooks against OpenClaw's event bus. Context is fetched synchronously inside `before_prompt_build` on every turn. Message capture happens in `agent_end`. The multi-agent hierarchy is tracked via `subagent_spawned`. This model is correct but every turn pays a blocking Honcho round-trip before the LLM call can begin.
+
+Turn flow:
+
+```
+user message
+  → before_prompt_build (BLOCKING HTTP — every turn)
+       → session.context()
+  → system prompt assembled
+  → LLM call
+  → response
+  → agent_end hook
+       → session.addMessages()
+       → session.setMetadata()
+```
+
+---
+
+## Diff table
+
+| Dimension | Hermes Agent | openclaw-honcho |
+|---|---|---|
+| **Context injection timing** | Once per session (cached). Zero HTTP on response path after turn 1. | Every turn, blocking. Fresh context per turn but adds latency. |
+| **Prefetch strategy** | Daemon threads fire at turn end; consumed next turn from cache. | None. Blocking call at prompt-build time. |
+| **Dialectic (peer.chat)** | Prefetched async; result injected into system prompt next turn. | On-demand via `honcho_recall` / `honcho_analyze` tools. |
+| **Reasoning level** | Dynamic: scales with message length. Floor = config default. Cap = "high". | Fixed per tool: recall=minimal, analyze=medium. |
+| **Memory modes** | `user_memory_mode` / `agent_memory_mode`: hybrid / honcho / local. | None. Always writes to Honcho. |
+| **Write frequency** | async (background queue), turn, session, N turns. | After every agent_end (no control). |
+| **AI peer identity** | `observe_me=True`, `seed_ai_identity()`, `get_ai_representation()`, SOUL.md → AI peer. | Agent files uploaded to agent peer at setup. No ongoing self-observation. |
+| **Context scope** | User peer + AI peer representation, both injected. | User peer (owner) representation + conversation summary. `peerPerspective` on context call. |
+| **Session naming** | per-directory / global / manual map / title-based. | Derived from platform session key. |
+| **Multi-agent** | Single-agent only. | Parent observer hierarchy via `subagent_spawned`. |
+| **Tool surface** | Single `query_user_context` tool (on-demand dialectic). | 6 tools: session, profile, search, context (fast) + recall, analyze (LLM). |
+| **Platform metadata** | Not stripped. | Explicitly stripped before Honcho storage. |
+| **Message dedup** | None. | `lastSavedIndex` in session metadata prevents re-sending. |
+| **CLI surface in prompt** | Management commands injected into system prompt. Agent knows its own CLI. | Not injected. |
+| **AI peer name in identity** | Replaces "Hermes Agent" in DEFAULT_AGENT_IDENTITY when configured. | Not implemented. |
+| **QMD / local file search** | Not implemented. | Passthrough tools when QMD backend configured. |
+| **Workspace metadata** | Not implemented. | `agentPeerMap` in workspace metadata tracks agent→peer ID. |
+
+---
+
+## Patterns
+
+Six patterns from Hermes are worth adopting in any Honcho integration. Each is described as an integration-agnostic interface.
+
+**Hermes contributes:**
+- Async prefetch (zero-latency)
+- Dynamic reasoning level
+- Per-peer memory modes
+- AI peer identity formation
+- Session naming strategies
+- CLI surface injection
+
+**openclaw-honcho contributes back (Hermes should adopt):**
+- `lastSavedIndex` dedup
+- Platform metadata stripping
+- Multi-agent observer hierarchy
+- `peerPerspective` on `context()`
+- Tiered tool surface (fast/LLM)
+- Workspace `agentPeerMap`
+
+---
+
+## Spec: async prefetch
+
+### Problem
+
+Calling `session.context()` and `peer.chat()` synchronously before each LLM call adds 200–800ms of Honcho round-trip latency to every turn.
+
+### Pattern
+
+Fire both calls as non-blocking background work at the **end** of each turn. Store results in a per-session cache keyed by session ID. At the **start** of the next turn, pop from cache — the HTTP is already done. First turn is cold (empty cache); all subsequent turns are zero-latency on the response path.
+
+### Interface contract
+
+```typescript
+interface AsyncPrefetch {
+  // Fire context + dialectic fetches at turn end. Non-blocking.
+  firePrefetch(sessionId: string, userMessage: string): void;
+
+  // Pop cached results at turn start. Returns empty if cache is cold.
+  popContextResult(sessionId: string): ContextResult | null;
+  popDialecticResult(sessionId: string): string | null;
+}
+
+type ContextResult = {
+  representation: string;
+  card: string[];
+  aiRepresentation?: string;  // AI peer context if enabled
+  summary?: string;           // conversation summary if fetched
+};
+```
+
+### Implementation notes
+
+- **Python:** `threading.Thread(daemon=True)`. Write to `dict[session_id, result]` — GIL makes this safe for simple writes.
+- **TypeScript:** `Promise` stored in `Map<string, Promise<ContextResult>>`. Await at pop time. If not resolved yet, return null — do not block.
+- The pop is destructive: clears the cache entry after reading so stale data never accumulates.
+- Prefetch should also fire on first turn (even though it won't be consumed until turn 2).
+
+### openclaw-honcho adoption
+
+Move `session.context()` from `before_prompt_build` to a post-`agent_end` background task. Store result in `state.contextCache`. In `before_prompt_build`, read from cache instead of calling Honcho. If cache is empty (turn 1), inject nothing — the prompt is still valid without Honcho context on the first turn.
+
+---
+
+## Spec: dynamic reasoning level
+
+### Problem
+
+Honcho's dialectic endpoint supports reasoning levels from `minimal` to `max`. A fixed level per tool wastes budget on simple queries and under-serves complex ones.
+
+### Pattern
+
+Select the reasoning level dynamically based on the user's message. Use the configured default as a floor. Bump by message length. Cap auto-selection at `high` — never select `max` automatically.
+
+### Logic
+
+```
+< 120 chars  → default (typically "low")
+120–400 chars → one level above default (cap at "high")
+> 400 chars  → two levels above default (cap at "high")
+```
+
+### Config key
+
+Add `dialecticReasoningLevel` (string, default `"low"`). This sets the floor. The dynamic bump always applies on top.
+
+### openclaw-honcho adoption
+
+Apply in `honcho_recall` and `honcho_analyze`: replace fixed `reasoningLevel` with the dynamic selector. `honcho_recall` uses floor `"minimal"`, `honcho_analyze` uses floor `"medium"` — both still bump with message length.
+
+---
+
+## Spec: per-peer memory modes
+
+### Problem
+
+Users want independent control over whether user context and agent context are written locally, to Honcho, or both.
+
+### Modes
+
+| Mode | Effect |
+|---|---|
+| `hybrid` | Write to both local files and Honcho (default) |
+| `honcho` | Honcho only — disable corresponding local file writes |
+| `local` | Local files only — skip Honcho sync for this peer |
+
+### Config schema
+
+```json
+{
+  "memoryMode": "hybrid",
+  "userMemoryMode": "honcho",
+  "agentMemoryMode": "hybrid"
+}
+```
+
+Resolution order: per-peer field wins → shorthand `memoryMode` → default `"hybrid"`.
+
+### Effect on Honcho sync
+
+- `userMemoryMode=local`: skip adding user peer messages to Honcho
+- `agentMemoryMode=local`: skip adding assistant peer messages to Honcho
+- Both local: skip `session.addMessages()` entirely
+- `userMemoryMode=honcho`: disable local USER.md writes
+- `agentMemoryMode=honcho`: disable local MEMORY.md / SOUL.md writes
+
+---
+
+## Spec: AI peer identity formation
+
+### Problem
+
+Honcho builds the user's representation organically by observing what the user says. The same mechanism exists for the AI peer — but only if `observe_me=True` is set for the agent peer. Without it, the agent peer accumulates nothing.
+
+Additionally, existing persona files (SOUL.md, IDENTITY.md) should seed the AI peer's Honcho representation at first activation.
+
+### Part A: observe_me=True for agent peer
+
+```typescript
+await session.addPeers([
+  [ownerPeer.id, { observeMe: true,  observeOthers: false }],
+  [agentPeer.id, { observeMe: true,  observeOthers: true  }], // was false
+]);
+```
+
+One-line change. Foundational. Without it, the AI peer representation stays empty regardless of what the agent says.
+
+### Part B: seedAiIdentity()
+
+```typescript
+async function seedAiIdentity(
+  agentPeer: Peer,
+  content: string,
+  source: string
+): Promise<boolean> {
+  const wrapped = [
+    `<ai_identity_seed>`,
+    `<source>${source}</source>`,
+    ``,
+    content.trim(),
+    `</ai_identity_seed>`,
+  ].join("\n");
+
+  await agentPeer.addMessage("assistant", wrapped);
+  return true;
+}
+```
+
+### Part C: migrate agent files at setup
+
+During `honcho setup`, upload agent-self files (SOUL.md, IDENTITY.md, AGENTS.md) to the agent peer via `seedAiIdentity()` instead of `session.uploadFile()`. This routes content through Honcho's observation pipeline.
+
+### Part D: AI peer name in identity
+
+When the agent has a configured name, prepend it to the injected system prompt:
+
+```typescript
+const namePrefix = agentName ? `You are ${agentName}.\n\n` : "";
+return { systemPrompt: namePrefix + "## User Memory Context\n\n" + sections };
+```
+
+### CLI surface
+
+```
+honcho identity <file>    # seed from file
+honcho identity --show    # show current AI peer representation
+```
+
+---
+
+## Spec: session naming strategies
+
+### Problem
+
+A single global session means every project shares the same Honcho context. Per-directory sessions provide isolation without requiring users to name sessions manually.
+
+### Strategies
+
+| Strategy | Session key | When to use |
+|---|---|---|
+| `per-directory` | basename of CWD | Default. Each project gets its own session. |
+| `global` | fixed string `"global"` | Single cross-project session. |
+| manual map | user-configured per path | `sessions` config map overrides directory basename. |
+| title-based | sanitized session title | When agent supports named sessions set mid-conversation. |
+
+### Config schema
+
+```json
+{
+  "sessionStrategy": "per-directory",
+  "sessionPeerPrefix": false,
+  "sessions": {
+    "/home/user/projects/foo": "foo-project"
+  }
+}
+```
+
+### CLI surface
+
+```
+honcho sessions              # list all mappings
+honcho map <name>            # map cwd to session name
+honcho map                   # no-arg = list mappings
+```
+
+Resolution order: manual map → session title → directory basename → platform key.
+
+---
+
+## Spec: CLI surface injection
+
+### Problem
+
+When a user asks "how do I change my memory settings?" the agent either hallucinates or says it doesn't know. The agent should know its own management interface.
+
+### Pattern
+
+When Honcho is active, append a compact command reference to the system prompt. Keep it under 300 chars.
+
+```
+# Honcho memory integration
+Active. Session: {sessionKey}. Mode: {mode}.
+Management commands:
+  honcho status                    — show config + connection
+  honcho mode [hybrid|honcho|local] — show or set memory mode
+  honcho sessions                  — list session mappings
+  honcho map <name>                — map directory to session
+  honcho identity [file] [--show]  — seed or show AI identity
+  honcho setup                     — full interactive wizard
+```
+
+---
+
+## openclaw-honcho checklist
+
+Ordered by impact:
+
+- [ ] **Async prefetch** — move `session.context()` out of `before_prompt_build` into post-`agent_end` background Promise
+- [ ] **observe_me=True for agent peer** — one-line change in `session.addPeers()`
+- [ ] **Dynamic reasoning level** — add helper; apply in `honcho_recall` and `honcho_analyze`; add `dialecticReasoningLevel` to config
+- [ ] **Per-peer memory modes** — add `userMemoryMode` / `agentMemoryMode` to config; gate Honcho sync and local writes
+- [ ] **seedAiIdentity()** — add helper; use during setup migration for SOUL.md / IDENTITY.md
+- [ ] **Session naming strategies** — add `sessionStrategy`, `sessions` map, `sessionPeerPrefix`
+- [ ] **CLI surface injection** — append command reference to `before_prompt_build` return value
+- [ ] **honcho identity subcommand** — seed from file or `--show` current representation
+- [ ] **AI peer name injection** — if `aiPeer` name configured, prepend to injected system prompt
+- [ ] **honcho mode / sessions / map** — CLI parity with Hermes
+
+Already done in openclaw-honcho (do not re-implement): `lastSavedIndex` dedup, platform metadata stripping, multi-agent parent observer, `peerPerspective` on `context()`, tiered tool surface, workspace `agentPeerMap`, QMD passthrough, self-hosted Honcho.
+
+---
+
+## nanobot-honcho checklist
+
+Greenfield integration. Start from openclaw-honcho's architecture and apply all Hermes patterns from day one.
+
+### Phase 1 — core correctness
+
+- [ ] Dual peer model (owner + agent peer), both with `observe_me=True`
+- [ ] Message capture at turn end with `lastSavedIndex` dedup
+- [ ] Platform metadata stripping before Honcho storage
+- [ ] Async prefetch from day one — do not implement blocking context injection
+- [ ] Legacy file migration at first activation (USER.md → owner peer, SOUL.md → `seedAiIdentity()`)
+
+### Phase 2 — configuration
+
+- [ ] Config schema: `apiKey`, `workspaceId`, `baseUrl`, `memoryMode`, `userMemoryMode`, `agentMemoryMode`, `dialecticReasoningLevel`, `sessionStrategy`, `sessions`
+- [ ] Per-peer memory mode gating
+- [ ] Dynamic reasoning level
+- [ ] Session naming strategies
+
+### Phase 3 — tools and CLI
+
+- [ ] Tool surface: `honcho_profile`, `honcho_recall`, `honcho_analyze`, `honcho_search`, `honcho_context`
+- [ ] CLI: `setup`, `status`, `sessions`, `map`, `mode`, `identity`
+- [ ] CLI surface injection into system prompt
+- [ ] AI peer name wired into agent identity
--- a/docs/migration/openclaw.md
+++ b/docs/migration/openclaw.md
@@ -0,0 +1,142 @@
+# Migrating from OpenClaw to Hermes Agent
+
+This guide covers how to import your OpenClaw settings, memories, skills, and API keys into Hermes Agent.
+
+## Three Ways to Migrate
+
+### 1. Automatic (during first-time setup)
+
+When you run `hermes setup` for the first time and Hermes detects `~/.openclaw`, it automatically offers to import your OpenClaw data before configuration begins. Just accept the prompt and everything is handled for you.
+
+### 2. CLI Command (quick, scriptable)
+
+```bash
+hermes claw migrate                      # Preview then migrate (always shows preview first)
+hermes claw migrate --dry-run            # Preview only, no changes
+hermes claw migrate --preset user-data   # Migrate without API keys/secrets
+hermes claw migrate --yes                # Skip confirmation prompt
+```
+
+The migration always shows a full preview of what will be imported before making any changes. You review the preview and confirm before anything is written.
+
+**All options:**
+
+| Flag | Description |
+|------|-------------|
+| `--source PATH` | Path to OpenClaw directory (default: `~/.openclaw`) |
+| `--dry-run` | Preview only — no files are modified |
+| `--preset {user-data,full}` | Migration preset (default: `full`). `user-data` excludes secrets |
+| `--overwrite` | Overwrite existing files (default: skip conflicts) |
+| `--migrate-secrets` | Include allowlisted secrets (auto-enabled with `full` preset) |
+| `--workspace-target PATH` | Copy workspace instructions (AGENTS.md) to this absolute path |
+| `--skill-conflict {skip,overwrite,rename}` | How to handle skill name conflicts (default: `skip`) |
+| `--yes`, `-y` | Skip confirmation prompts |
+
+### 3. Agent-Guided (interactive, with previews)
+
+Ask the agent to run the migration for you:
+
+```
+> Migrate my OpenClaw setup to Hermes
+```
+
+The agent will use the `openclaw-migration` skill to:
+1. Run a preview first to show what would change
+2. Ask about conflict resolution (SOUL.md, skills, etc.)
+3. Let you choose between `user-data` and `full` presets
+4. Execute the migration with your choices
+5. Print a detailed summary of what was migrated
+
+## What Gets Migrated
+
+### `user-data` preset
+| Item | Source | Destination |
+|------|--------|-------------|
+| SOUL.md | `~/.openclaw/workspace/SOUL.md` | `~/.hermes/SOUL.md` |
+| Memory entries | `~/.openclaw/workspace/MEMORY.md` | `~/.hermes/memories/MEMORY.md` |
+| User profile | `~/.openclaw/workspace/USER.md` | `~/.hermes/memories/USER.md` |
+| Skills | `~/.openclaw/workspace/skills/` | `~/.hermes/skills/openclaw-imports/` |
+| Command allowlist | `~/.openclaw/workspace/exec_approval_patterns.yaml` | Merged into `~/.hermes/config.yaml` |
+| Messaging settings | `~/.openclaw/config.yaml` (TELEGRAM_ALLOWED_USERS, MESSAGING_CWD) | `~/.hermes/.env` |
+| TTS assets | `~/.openclaw/workspace/tts/` | `~/.hermes/tts/` |
+
+Workspace files are also checked at `workspace.default/` and `workspace-main/` as fallback paths (OpenClaw renamed `workspace/` to `workspace-main/` in recent versions).
+
+### `full` preset (adds to `user-data`)
+| Item | Source | Destination |
+|------|--------|-------------|
+| Telegram bot token | `openclaw.json` channels config | `~/.hermes/.env` |
+| OpenRouter API key | `.env`, `openclaw.json`, or `openclaw.json["env"]` | `~/.hermes/.env` |
+| OpenAI API key | `.env`, `openclaw.json`, or `openclaw.json["env"]` | `~/.hermes/.env` |
+| Anthropic API key | `.env`, `openclaw.json`, or `openclaw.json["env"]` | `~/.hermes/.env` |
+| ElevenLabs API key | `.env`, `openclaw.json`, or `openclaw.json["env"]` | `~/.hermes/.env` |
+
+API keys are searched across four sources: inline config values, `~/.openclaw/.env`, the `openclaw.json` `"env"` sub-object, and per-agent auth profiles.
+
+Only allowlisted secrets are ever imported. Other credentials are skipped and reported.
+
+## OpenClaw Schema Compatibility
+
+The migration handles both old and current OpenClaw config layouts:
+
+- **Channel tokens**: Reads from flat paths (`channels.telegram.botToken`) and the newer `accounts.default` layout (`channels.telegram.accounts.default.botToken`)
+- **TTS provider**: OpenClaw renamed "edge" to "microsoft" — both are recognized and mapped to Hermes' "edge"
+- **Provider API types**: Both short (`openai`, `anthropic`) and hyphenated (`openai-completions`, `anthropic-messages`, `google-generative-ai`) values are mapped correctly
+- **thinkingDefault**: All enum values are handled including newer ones (`minimal`, `xhigh`, `adaptive`)
+- **Matrix**: Uses `accessToken` field (not `botToken`)
+- **SecretRef formats**: Plain strings, env templates (`${VAR}`), and `source: "env"` SecretRefs are resolved. `source: "file"` and `source: "exec"` SecretRefs produce a warning — add those keys manually after migration.
+
+## Conflict Handling
+
+By default, the migration **will not overwrite** existing Hermes data:
+
+- **SOUL.md** — skipped if one already exists in `~/.hermes/`
+- **Memory entries** — skipped if memories already exist (to avoid duplicates)
+- **Skills** — skipped if a skill with the same name already exists
+- **API keys** — skipped if the key is already set in `~/.hermes/.env`
+
+To overwrite conflicts, use `--overwrite`. The migration creates backups before overwriting.
+
+For skills, you can also use `--skill-conflict rename` to import conflicting skills under a new name (e.g., `skill-name-imported`).
+
+## Migration Report
+
+Every migration produces a report showing:
+- **Migrated items** — what was successfully imported
+- **Conflicts** — items skipped because they already exist
+- **Skipped items** — items not found in the source
+- **Errors** — items that failed to import
+
+For executed migrations, the full report is saved to `~/.hermes/migration/openclaw/<timestamp>/`.
+
+## Post-Migration Notes
+
+- **Skills require a new session** — imported skills take effect after restarting your agent or starting a new chat.
+- **WhatsApp requires re-pairing** — WhatsApp uses QR-code pairing, not token-based auth. Run `hermes whatsapp` to pair.
+- **Archive cleanup** — after migration, you'll be offered to rename `~/.openclaw/` to `.openclaw.pre-migration/` to prevent state confusion. You can also run `hermes claw cleanup` later.
+
+## Troubleshooting
+
+### "OpenClaw directory not found"
+The migration looks for `~/.openclaw` by default, then tries `~/.clawdbot` and `~/.moltbot`. If your OpenClaw is installed elsewhere, use `--source`:
+```bash
+hermes claw migrate --source /path/to/.openclaw
+```
+
+### "Migration script not found"
+The migration script ships with Hermes Agent. If you installed via pip (not git clone), the `optional-skills/` directory may not be present. Install the skill from the Skills Hub:
+```bash
+hermes skills install openclaw-migration
+```
+
+### Memory overflow
+If your OpenClaw MEMORY.md or USER.md exceeds Hermes' character limits, excess entries are exported to an overflow file in the migration report directory. You can manually review and add the most important ones.
+
+### API keys not found
+Keys might be stored in different places depending on your OpenClaw setup:
+- `~/.openclaw/.env` file
+- Inline in `openclaw.json` under `models.providers.*.apiKey`
+- In `openclaw.json` under the `"env"` or `"env.vars"` sub-objects
+- In `~/.openclaw/agents/main/agent/auth-profiles.json`
+
+The migration checks all four. If keys use `source: "file"` or `source: "exec"` SecretRefs, they can't be resolved automatically — add them via `hermes config set`.
--- a/docs/plans/2026-03-16-pricing-accuracy-architecture-design.md
+++ b/docs/plans/2026-03-16-pricing-accuracy-architecture-design.md
@@ -0,0 +1,608 @@
+# Pricing Accuracy Architecture
+
+Date: 2026-03-16
+
+## Goal
+
+Hermes should only show dollar costs when they are backed by an official source for the user's actual billing path.
+
+This design replaces the current static, heuristic pricing flow in:
+
+- `run_agent.py`
+- `agent/usage_pricing.py`
+- `agent/insights.py`
+- `cli.py`
+
+with a provider-aware pricing system that:
+
+- handles cache billing correctly
+- distinguishes `actual` vs `estimated` vs `included` vs `unknown`
+- reconciles post-hoc costs when providers expose authoritative billing data
+- supports direct providers, OpenRouter, subscriptions, enterprise pricing, and custom endpoints
+
+## Problems In The Current Design
+
+Current Hermes behavior has four structural issues:
+
+1. It stores only `prompt_tokens` and `completion_tokens`, which is insufficient for providers that bill cache reads and cache writes separately.
+2. It uses a static model price table and fuzzy heuristics, which can drift from current official pricing.
+3. It assumes public API list pricing matches the user's real billing path.
+4. It has no distinction between live estimates and reconciled billed cost.
+
+## Design Principles
+
+1. Normalize usage before pricing.
+2. Never fold cached tokens into plain input cost.
+3. Track certainty explicitly.
+4. Treat the billing path as part of the model identity.
+5. Prefer official machine-readable sources over scraped docs.
+6. Use post-hoc provider cost APIs when available.
+7. Show `n/a` rather than inventing precision.
+
+## High-Level Architecture
+
+The new system has four layers:
+
+1. `usage_normalization`
+   Converts raw provider usage into a canonical usage record.
+2. `pricing_source_resolution`
+   Determines the billing path, source of truth, and applicable pricing source.
+3. `cost_estimation_and_reconciliation`
+   Produces an immediate estimate when possible, then replaces or annotates it with actual billed cost later.
+4. `presentation`
+   `/usage`, `/insights`, and the status bar display cost with certainty metadata.
+
+## Canonical Usage Record
+
+Add a canonical usage model that every provider path maps into before any pricing math happens.
+
+Suggested structure:
+
+```python
+@dataclass
+class CanonicalUsage:
+    provider: str
+    billing_provider: str
+    model: str
+    billing_route: str
+
+    input_tokens: int = 0
+    output_tokens: int = 0
+    cache_read_tokens: int = 0
+    cache_write_tokens: int = 0
+    reasoning_tokens: int = 0
+    request_count: int = 1
+
+    raw_usage: dict[str, Any] | None = None
+    raw_usage_fields: dict[str, str] | None = None
+    computed_fields: set[str] | None = None
+
+    provider_request_id: str | None = None
+    provider_generation_id: str | None = None
+    provider_response_id: str | None = None
+```
+
+Rules:
+
+- `input_tokens` means non-cached input only.
+- `cache_read_tokens` and `cache_write_tokens` are never merged into `input_tokens`.
+- `output_tokens` excludes cache metrics.
+- `reasoning_tokens` is telemetry unless a provider officially bills it separately.
+
+This is the same normalization pattern used by `opencode`, extended with provenance and reconciliation ids.
+
+## Provider Normalization Rules
+
+### OpenAI Direct
+
+Source usage fields:
+
+- `prompt_tokens`
+- `completion_tokens`
+- `prompt_tokens_details.cached_tokens`
+
+Normalization:
+
+- `cache_read_tokens = cached_tokens`
+- `input_tokens = prompt_tokens - cached_tokens`
+- `cache_write_tokens = 0` unless OpenAI exposes it in the relevant route
+- `output_tokens = completion_tokens`
+
+### Anthropic Direct
+
+Source usage fields:
+
+- `input_tokens`
+- `output_tokens`
+- `cache_read_input_tokens`
+- `cache_creation_input_tokens`
+
+Normalization:
+
+- `input_tokens = input_tokens`
+- `output_tokens = output_tokens`
+- `cache_read_tokens = cache_read_input_tokens`
+- `cache_write_tokens = cache_creation_input_tokens`
+
+### OpenRouter
+
+Estimate-time usage normalization should use the response usage payload with the same rules as the underlying provider when possible.
+
+Reconciliation-time records should also store:
+
+- OpenRouter generation id
+- native token fields when available
+- `total_cost`
+- `cache_discount`
+- `upstream_inference_cost`
+- `is_byok`
+
+### Gemini / Vertex
+
+Use official Gemini or Vertex usage fields where available.
+
+If cached content tokens are exposed:
+
+- map them to `cache_read_tokens`
+
+If a route exposes no cache creation metric:
+
+- store `cache_write_tokens = 0`
+- preserve the raw usage payload for later extension
+
+### DeepSeek And Other Direct Providers
+
+Normalize only the fields that are officially exposed.
+
+If a provider does not expose cache buckets:
+
+- do not infer them unless the provider explicitly documents how to derive them
+
+### Subscription / Included-Cost Routes
+
+These still use the canonical usage model.
+
+Tokens are tracked normally. Cost depends on billing mode, not on whether usage exists.
+
+## Billing Route Model
+
+Hermes must stop keying pricing solely by `model`.
+
+Introduce a billing route descriptor:
+
+```python
+@dataclass
+class BillingRoute:
+    provider: str
+    base_url: str | None
+    model: str
+    billing_mode: str
+    organization_hint: str | None = None
+```
+
+`billing_mode` values:
+
+- `official_cost_api`
+- `official_generation_api`
+- `official_models_api`
+- `official_docs_snapshot`
+- `subscription_included`
+- `user_override`
+- `custom_contract`
+- `unknown`
+
+Examples:
+
+- OpenAI direct API with Costs API access: `official_cost_api`
+- Anthropic direct API with Usage & Cost API access: `official_cost_api`
+- OpenRouter request before reconciliation: `official_models_api`
+- OpenRouter request after generation lookup: `official_generation_api`
+- GitHub Copilot style subscription route: `subscription_included`
+- local OpenAI-compatible server: `unknown`
+- enterprise contract with configured rates: `custom_contract`
+
+## Cost Status Model
+
+Every displayed cost should have:
+
+```python
+@dataclass
+class CostResult:
+    amount_usd: Decimal | None
+    status: Literal["actual", "estimated", "included", "unknown"]
+    source: Literal[
+        "provider_cost_api",
+        "provider_generation_api",
+        "provider_models_api",
+        "official_docs_snapshot",
+        "user_override",
+        "custom_contract",
+        "none",
+    ]
+    label: str
+    fetched_at: datetime | None
+    pricing_version: str | None
+    notes: list[str]
+```
+
+Presentation rules:
+
+- `actual`: show dollar amount as final
+- `estimated`: show dollar amount with estimate labeling
+- `included`: show `included` or `$0.00 (included)` depending on UX choice
+- `unknown`: show `n/a`
+
+## Official Source Hierarchy
+
+Resolve cost using this order:
+
+1. Request-level or account-level official billed cost
+2. Official machine-readable model pricing
+3. Official docs snapshot
+4. User override or custom contract
+5. Unknown
+
+The system must never skip to a lower level if a higher-confidence source exists for the current billing route.
+
+## Provider-Specific Truth Rules
+
+### OpenAI Direct
+
+Preferred truth:
+
+1. Costs API for reconciled spend
+2. Official pricing page for live estimate
+
+### Anthropic Direct
+
+Preferred truth:
+
+1. Usage & Cost API for reconciled spend
+2. Official pricing docs for live estimate
+
+### OpenRouter
+
+Preferred truth:
+
+1. `GET /api/v1/generation` for reconciled `total_cost`
+2. `GET /api/v1/models` pricing for live estimate
+
+Do not use underlying provider public pricing as the source of truth for OpenRouter billing.
+
+### Gemini / Vertex
+
+Preferred truth:
+
+1. official billing export or billing API for reconciled spend when available for the route
+2. official pricing docs for estimate
+
+### DeepSeek
+
+Preferred truth:
+
+1. official machine-readable cost source if available in the future
+2. official pricing docs snapshot today
+
+### Subscription-Included Routes
+
+Preferred truth:
+
+1. explicit route config marking the model as included in subscription
+
+These should display `included`, not an API list-price estimate.
+
+### Custom Endpoint / Local Model
+
+Preferred truth:
+
+1. user override
+2. custom contract config
+3. unknown
+
+These should default to `unknown`.
+
+## Pricing Catalog
+
+Replace the current `MODEL_PRICING` dict with a richer pricing catalog.
+
+Suggested record:
+
+```python
+@dataclass
+class PricingEntry:
+    provider: str
+    route_pattern: str
+    model_pattern: str
+
+    input_cost_per_million: Decimal | None = None
+    output_cost_per_million: Decimal | None = None
+    cache_read_cost_per_million: Decimal | None = None
+    cache_write_cost_per_million: Decimal | None = None
+    request_cost: Decimal | None = None
+    image_cost: Decimal | None = None
+
+    source: str = "official_docs_snapshot"
+    source_url: str | None = None
+    fetched_at: datetime | None = None
+    pricing_version: str | None = None
+```
+
+The catalog should be route-aware:
+
+- `openai:gpt-5`
+- `anthropic:claude-opus-4-6`
+- `openrouter:anthropic/claude-opus-4.6`
+- `copilot:gpt-4o`
+
+This avoids conflating direct-provider billing with aggregator billing.
+
+## Pricing Sync Architecture
+
+Introduce a pricing sync subsystem instead of manually maintaining a single hardcoded table.
+
+Suggested modules:
+
+- `agent/pricing/catalog.py`
+- `agent/pricing/sources.py`
+- `agent/pricing/sync.py`
+- `agent/pricing/reconcile.py`
+- `agent/pricing/types.py`
+
+### Sync Sources
+
+- OpenRouter models API
+- official provider docs snapshots where no API exists
+- user overrides from config
+
+### Sync Output
+
+Cache pricing entries locally with:
+
+- source URL
+- fetch timestamp
+- version/hash
+- confidence/source type
+
+### Sync Frequency
+
+- startup warm cache
+- background refresh every 6 to 24 hours depending on source
+- manual `hermes pricing sync`
+
+## Reconciliation Architecture
+
+Live requests may produce only an estimate initially. Hermes should reconcile them later when a provider exposes actual billed cost.
+
+Suggested flow:
+
+1. Agent call completes.
+2. Hermes stores canonical usage plus reconciliation ids.
+3. Hermes computes an immediate estimate if a pricing source exists.
+4. A reconciliation worker fetches actual cost when supported.
+5. Session and message records are updated with `actual` cost.
+
+This can run:
+
+- inline for cheap lookups
+- asynchronously for delayed provider accounting
+
+## Persistence Changes
+
+Session storage should stop storing only aggregate prompt/completion totals.
+
+Add fields for both usage and cost certainty:
+
+- `input_tokens`
+- `output_tokens`
+- `cache_read_tokens`
+- `cache_write_tokens`
+- `reasoning_tokens`
+- `estimated_cost_usd`
+- `actual_cost_usd`
+- `cost_status`
+- `cost_source`
+- `pricing_version`
+- `billing_provider`
+- `billing_mode`
+
+If schema expansion is too large for one PR, add a new pricing events table:
+
+```text
+session_cost_events
+  id
+  session_id
+  request_id
+  provider
+  model
+  billing_mode
+  input_tokens
+  output_tokens
+  cache_read_tokens
+  cache_write_tokens
+  estimated_cost_usd
+  actual_cost_usd
+  cost_status
+  cost_source
+  pricing_version
+  created_at
+  updated_at
+```
+
+## Hermes Touchpoints
+
+### `run_agent.py`
+
+Current responsibility:
+
+- parse raw provider usage
+- update session token counters
+
+New responsibility:
+
+- build `CanonicalUsage`
+- update canonical counters
+- store reconciliation ids
+- emit usage event to pricing subsystem
+
+### `agent/usage_pricing.py`
+
+Current responsibility:
+
+- static lookup table
+- direct cost arithmetic
+
+New responsibility:
+
+- move or replace with pricing catalog facade
+- no fuzzy model-family heuristics
+- no direct pricing without billing-route context
+
+### `cli.py`
+
+Current responsibility:
+
+- compute session cost directly from prompt/completion totals
+
+New responsibility:
+
+- display `CostResult`
+- show status badges:
+  - `actual`
+  - `estimated`
+  - `included`
+  - `n/a`
+
+### `agent/insights.py`
+
+Current responsibility:
+
+- recompute historical estimates from static pricing
+
+New responsibility:
+
+- aggregate stored pricing events
+- prefer actual cost over estimate
+- surface estimates only when reconciliation is unavailable
+
+## UX Rules
+
+### Status Bar
+
+Show one of:
+
+- `$1.42`
+- `~$1.42`
+- `included`
+- `cost n/a`
+
+Where:
+
+- `$1.42` means `actual`
+- `~$1.42` means `estimated`
+- `included` means subscription-backed or explicitly zero-cost route
+- `cost n/a` means unknown
+
+### `/usage`
+
+Show:
+
+- token buckets
+- estimated cost
+- actual cost if available
+- cost status
+- pricing source
+
+### `/insights`
+
+Aggregate:
+
+- actual cost totals
+- estimated-only totals
+- unknown-cost sessions count
+- included-cost sessions count
+
+## Config And Overrides
+
+Add user-configurable pricing overrides in config:
+
+```yaml
+pricing:
+  mode: hybrid
+  sync_on_startup: true
+  sync_interval_hours: 12
+  overrides:
+    - provider: openrouter
+      model: anthropic/claude-opus-4.6
+      billing_mode: custom_contract
+      input_cost_per_million: 4.25
+      output_cost_per_million: 22.0
+      cache_read_cost_per_million: 0.5
+      cache_write_cost_per_million: 6.0
+  included_routes:
+    - provider: copilot
+      model: "*"
+    - provider: codex-subscription
+      model: "*"
+```
+
+Overrides must win over catalog defaults for the matching billing route.
+
+## Rollout Plan
+
+### Phase 1
+
+- add canonical usage model
+- split cache token buckets in `run_agent.py`
+- stop pricing cache-inflated prompt totals
+- preserve current UI with improved backend math
+
+### Phase 2
+
+- add route-aware pricing catalog
+- integrate OpenRouter models API sync
+- add `estimated` vs `included` vs `unknown`
+
+### Phase 3
+
+- add reconciliation for OpenRouter generation cost
+- add actual cost persistence
+- update `/insights` to prefer actual cost
+
+### Phase 4
+
+- add direct OpenAI and Anthropic reconciliation paths
+- add user overrides and contract pricing
+- add pricing sync CLI command
+
+## Testing Strategy
+
+Add tests for:
+
+- OpenAI cached token subtraction
+- Anthropic cache read/write separation
+- OpenRouter estimated vs actual reconciliation
+- subscription-backed models showing `included`
+- custom endpoints showing `n/a`
+- override precedence
+- stale catalog fallback behavior
+
+Current tests that assume heuristic pricing should be replaced with route-aware expectations.
+
+## Non-Goals
+
+- exact enterprise billing reconstruction without an official source or user override
+- backfilling perfect historical cost for old sessions that lack cache bucket data
+- scraping arbitrary provider web pages at request time
+
+## Recommendation
+
+Do not expand the existing `MODEL_PRICING` dict.
+
+That path cannot satisfy the product requirement. Hermes should instead migrate to:
+
+- canonical usage normalization
+- route-aware pricing sources
+- estimate-then-reconcile cost lifecycle
+- explicit certainty states in the UI
+
+This is the minimum architecture that makes the statement "Hermes pricing is backed by official sources where possible, and otherwise clearly labeled" defensible.
--- a/docs/skins/example-skin.yaml
+++ b/docs/skins/example-skin.yaml
@@ -0,0 +1,97 @@
+# ============================================================================
+# Hermes Agent — Example Skin Template
+# ============================================================================
+#
+# Copy this file to ~/.hermes/skins/<name>.yaml to create a custom skin.
+# All fields are optional — missing values inherit from the default skin.
+# Activate with: /skin <name>  or  display.skin: <name> in config.yaml
+#
+# See hermes_cli/skin_engine.py for the full schema reference.
+# ============================================================================
+
+# Required: unique skin name (used in /skin command and config)
+name: example
+description: An example custom skin — copy and modify this template
+
+# ── Colors ──────────────────────────────────────────────────────────────────
+# Hex color values for Rich markup. These control the CLI's visual palette.
+colors:
+  # Banner panel (the startup welcome box)
+  banner_border: "#CD7F32"        # Panel border
+  banner_title: "#FFD700"         # Panel title text
+  banner_accent: "#FFBF00"        # Section headers (Available Tools, Skills, etc.)
+  banner_dim: "#B8860B"           # Dim/muted text (separators, model info)
+  banner_text: "#FFF8DC"          # Body text (tool names, skill names)
+
+  # UI elements
+  ui_accent: "#FFBF00"            # General accent color
+  ui_label: "#4dd0e1"             # Labels
+  ui_ok: "#4caf50"                # Success indicators
+  ui_error: "#ef5350"             # Error indicators
+  ui_warn: "#ffa726"              # Warning indicators
+
+  # Input area
+  prompt: "#FFF8DC"               # Prompt text color
+  input_rule: "#CD7F32"           # Horizontal rule around input
+
+  # Response box
+  response_border: "#FFD700"      # Response box border (ANSI color)
+
+  # Session display
+  session_label: "#DAA520"        # Session label
+  session_border: "#8B8682"       # Session ID dim color
+
+  # TUI surfaces
+  status_bar_bg: "#1a1a2e"              # Status / usage bar background
+  voice_status_bg: "#1a1a2e"            # Voice-mode badge background
+  completion_menu_bg: "#1a1a2e"         # Completion list background
+  completion_menu_current_bg: "#333355" # Active completion row background
+  completion_menu_meta_bg: "#1a1a2e"    # Completion meta column background
+  completion_menu_meta_current_bg: "#333355"  # Active completion meta background
+
+# ── Spinner ─────────────────────────────────────────────────────────────────
+# Customize the animated spinner shown during API calls and tool execution.
+spinner:
+  # Faces shown while waiting for the API response
+  waiting_faces:
+    - "(｡◕‿◕｡)"
+    - "(◕‿◕✿)"
+    - "٩(◕‿◕｡)۶"
+
+  # Faces shown during extended thinking/reasoning
+  thinking_faces:
+    - "(｡•́︿•̀｡)"
+    - "(◔_◔)"
+    - "(¬‿¬)"
+
+  # Verbs used in spinner messages (e.g., "pondering your request...")
+  thinking_verbs:
+    - "pondering"
+    - "contemplating"
+    - "musing"
+    - "ruminating"
+
+  # Optional: left/right decorations around the spinner
+  # Each entry is a [left, right] pair. Omit entirely for no wings.
+  # wings:
+  #   - ["⟪⚔", "⚔⟫"]
+  #   - ["⟪▲", "▲⟫"]
+
+# ── Branding ────────────────────────────────────────────────────────────────
+# Text strings used throughout the CLI interface.
+branding:
+  agent_name: "Hermes Agent"          # Banner title, about display
+  welcome: "Welcome! Type your message or /help for commands."
+  goodbye: "Goodbye! ⚕"              # Exit message
+  response_label: " ⚕ Hermes "       # Response box header label
+  prompt_symbol: "❯ "                 # Input prompt symbol
+  help_header: "(^_^)? Available Commands"  # /help header text
+
+# ── Tool Output ─────────────────────────────────────────────────────────────
+# Character used as the prefix for tool output lines.
+# Default is "┊" (thin dotted vertical line). Some alternatives:
+#   "╎" (light triple dash vertical)
+#   "▏" (left one-eighth block)
+#   "│" (box drawing light vertical)
+#   "┃" (box drawing heavy vertical)
+tool_prefix: "┊"
--- a/docs/specs/container-cli-review-fixes.md
+++ b/docs/specs/container-cli-review-fixes.md
@@ -0,0 +1,329 @@
+# Container-Aware CLI Review Fixes Spec
+
+**PR:** NousResearch/hermes-agent#7543
+**Review:** cursor[bot] bugbot review (4094049442) + two prior rounds
+**Date:** 2026-04-12
+**Branch:** `feat/container-aware-cli-clean`
+
+## Review Issues Summary
+
+Six issues were raised across three bugbot review rounds. Three were fixed in intermediate commits (38277a6a, 726cf90f). This spec addresses remaining design concerns surfaced by those reviews and simplifies the implementation based on interview decisions.
+
+| # | Issue | Severity | Status |
+|---|-------|----------|--------|
+| 1 | `os.execvp` retry loop unreachable | Medium | Fixed in 79e8cd12 (switched to subprocess.run) |
+| 2 | Redundant `shutil.which("sudo")` | Medium | Fixed in 38277a6a (reuses `sudo` var) |
+| 3 | Missing `chown -h` on symlink update | Low | Fixed in 38277a6a |
+| 4 | Container routing after `parse_args()` | High | Fixed in 726cf90f |
+| 5 | Hardcoded `/home/${user}` | Medium | Fixed in 726cf90f |
+| 6 | Group membership not gated on `container.enable` | Low | Fixed in 726cf90f |
+
+The mechanical fixes are in place but the overall design needs revision. The retry loop, error swallowing, and process model have deeper issues than what the bugbot flagged.
+
+---
+
+## Spec: Revised `_exec_in_container`
+
+### Design Principles
+
+1. **Let it crash.** No silent fallbacks. If `.container-mode` exists but something goes wrong, the error propagates naturally (Python traceback). The only case where container routing is skipped is when `.container-mode` doesn't exist or `HERMES_DEV=1`.
+2. **No retries.** Probe once for sudo, exec once. If it fails, docker/podman's stderr reaches the user verbatim.
+3. **Completely transparent.** No error wrapping, no prefixes, no spinners. Docker's output goes straight through.
+4. **`os.execvp` on the happy path.** Replace the Python process entirely so there's no idle parent during interactive sessions. Note: `execvp` never returns on success (process is replaced) and raises `OSError` on failure (it does not return a value). The container process's exit code becomes the process exit code by definition — no explicit propagation needed.
+5. **One human-readable exception to "let it crash".** `subprocess.TimeoutExpired` from the sudo probe gets a specific catch with a readable message, since a raw traceback for "your Docker daemon is slow" is confusing. All other exceptions propagate naturally.
+
+### Execution Flow
+
+```
+1. get_container_exec_info()
+   - HERMES_DEV=1 → return None (skip routing)
+   - Inside container → return None (skip routing)
+   - .container-mode doesn't exist → return None (skip routing)
+   - .container-mode exists → parse and return dict
+   - .container-mode exists but malformed/unreadable → LET IT CRASH (no try/except)
+
+2. _exec_in_container(container_info, sys.argv[1:])
+   a. shutil.which(backend) → if None, print "{backend} not found on PATH" and sys.exit(1)
+   b. Sudo probe: subprocess.run([runtime, "inspect", "--format", "ok", container_name], timeout=15)
+      - If succeeds → needs_sudo = False
+      - If fails → try subprocess.run([sudo, "-n", runtime, "inspect", ...], timeout=15)
+        - If succeeds → needs_sudo = True
+        - If fails → print error with sudoers hint (including why -n is required) and sys.exit(1)
+      - If TimeoutExpired → catch specifically, print human-readable message about slow daemon
+   c. Build exec_cmd: [sudo? + runtime, "exec", tty_flags, "-u", exec_user, env_flags, container, hermes_bin, *cli_args]
+   d. os.execvp(exec_cmd[0], exec_cmd)
+      - On success: process is replaced — Python is gone, container exit code IS the process exit code
+      - On OSError: let it crash (natural traceback)
+```
+
+### Changes to `hermes_cli/main.py`
+
+#### `_exec_in_container` — rewrite
+
+Remove:
+- The entire retry loop (`max_retries`, `for attempt in range(...)`)
+- Spinner logic (`"Waiting for container..."`, dots)
+- Exit code classification (125/126/127 handling)
+- `subprocess.run` for the exec call (keep it only for the sudo probe)
+- Special TTY vs non-TTY retry counts
+- The `time` import (no longer needed)
+
+Change:
+- Use `os.execvp(exec_cmd[0], exec_cmd)` as the final call
+- Keep the `subprocess` import only for the sudo probe
+- Keep TTY detection for the `-it` vs `-i` flag
+- Keep env var forwarding (TERM, COLORTERM, LANG, LC_ALL)
+- Keep the sudo probe as-is (it's the one "smart" part)
+- Bump probe `timeout` from 5s to 15s — cold podman on a loaded machine needs headroom
+- Catch `subprocess.TimeoutExpired` specifically on both probe calls — print a readable message about the daemon being unresponsive instead of a raw traceback
+- Expand the sudoers hint error message to explain *why* `-n` (non-interactive) is required: a password prompt would hang the CLI or break piped commands
+
+The function becomes roughly:
+
+```python
+def _exec_in_container(container_info: dict, cli_args: list):
+    """Replace the current process with a command inside the managed container.
+
+    Probes whether sudo is needed (rootful containers), then os.execvp
+    into the container. If exec fails, the OS error propagates naturally.
+    """
+    import shutil
+    import subprocess
+
+    backend = container_info["backend"]
+    container_name = container_info["container_name"]
+    exec_user = container_info["exec_user"]
+    hermes_bin = container_info["hermes_bin"]
+
+    runtime = shutil.which(backend)
+    if not runtime:
+        print(f"Error: {backend} not found on PATH. Cannot route to container.",
+              file=sys.stderr)
+        sys.exit(1)
+
+    # Probe whether we need sudo to see the rootful container.
+    # Timeout is 15s — cold podman on a loaded machine can take a while.
+    # TimeoutExpired is caught specifically for a human-readable message;
+    # all other exceptions propagate naturally.
+    needs_sudo = False
+    sudo = None
+    try:
+        probe = subprocess.run(
+            [runtime, "inspect", "--format", "ok", container_name],
+            capture_output=True, text=True, timeout=15,
+        )
+    except subprocess.TimeoutExpired:
+        print(
+            f"Error: timed out waiting for {backend} to respond.\n"
+            f"The {backend} daemon may be unresponsive or starting up.",
+            file=sys.stderr,
+        )
+        sys.exit(1)
+
+    if probe.returncode != 0:
+        sudo = shutil.which("sudo")
+        if sudo:
+            try:
+                probe2 = subprocess.run(
+                    [sudo, "-n", runtime, "inspect", "--format", "ok", container_name],
+                    capture_output=True, text=True, timeout=15,
+                )
+            except subprocess.TimeoutExpired:
+                print(
+                    f"Error: timed out waiting for sudo {backend} to respond.",
+                    file=sys.stderr,
+                )
+                sys.exit(1)
+
+            if probe2.returncode == 0:
+                needs_sudo = True
+            else:
+                print(
+                    f"Error: container '{container_name}' not found via {backend}.\n"
+                    f"\n"
+                    f"The NixOS service runs the container as root. Your user cannot\n"
+                    f"see it because {backend} uses per-user namespaces.\n"
+                    f"\n"
+                    f"Fix: grant passwordless sudo for {backend}. The -n (non-interactive)\n"
+                    f"flag is required because the CLI calls sudo non-interactively —\n"
+                    f"a password prompt would hang or break piped commands:\n"
+                    f"\n"
+                    f'  security.sudo.extraRules = [{{\n'
+                    f'    users = [ "{os.getenv("USER", "your-user")}" ];\n'
+                    f'    commands = [{{ command = "{runtime}"; options = [ "NOPASSWD" ]; }}];\n'
+                    f'  }}];\n'
+                    f"\n"
+                    f"Or run: sudo hermes {' '.join(cli_args)}",
+                    file=sys.stderr,
+                )
+                sys.exit(1)
+        else:
+            print(
+                f"Error: container '{container_name}' not found via {backend}.\n"
+                f"The container may be running under root. Try: sudo hermes {' '.join(cli_args)}",
+                file=sys.stderr,
+            )
+            sys.exit(1)
+
+    is_tty = sys.stdin.isatty()
+    tty_flags = ["-it"] if is_tty else ["-i"]
+
+    env_flags = []
+    for var in ("TERM", "COLORTERM", "LANG", "LC_ALL"):
+        val = os.environ.get(var)
+        if val:
+            env_flags.extend(["-e", f"{var}={val}"])
+
+    cmd_prefix = [sudo, "-n", runtime] if needs_sudo else [runtime]
+    exec_cmd = (
+        cmd_prefix + ["exec"]
+        + tty_flags
+        + ["-u", exec_user]
+        + env_flags
+        + [container_name, hermes_bin]
+        + cli_args
+    )
+
+    # execvp replaces this process entirely — it never returns on success.
+    # On failure it raises OSError, which propagates naturally.
+    os.execvp(exec_cmd[0], exec_cmd)
+```
+
+#### Container routing call site in `main()` — remove try/except
+
+Current:
+```python
+try:
+    from hermes_cli.config import get_container_exec_info
+    container_info = get_container_exec_info()
+    if container_info:
+        _exec_in_container(container_info, sys.argv[1:])
+        sys.exit(1)  # exec failed if we reach here
+except SystemExit:
+    raise
+except Exception:
+    pass  # Container routing unavailable, proceed locally
+```
+
+Revised:
+```python
+from hermes_cli.config import get_container_exec_info
+container_info = get_container_exec_info()
+if container_info:
+    _exec_in_container(container_info, sys.argv[1:])
+    # Unreachable: os.execvp never returns on success (process is replaced)
+    # and raises OSError on failure (which propagates as a traceback).
+    # This line exists only as a defensive assertion.
+    sys.exit(1)
+```
+
+No try/except. If `.container-mode` doesn't exist, `get_container_exec_info()` returns `None` and we skip routing. If it exists but is broken, the exception propagates with a natural traceback.
+
+Note: `sys.exit(1)` after `_exec_in_container` is dead code in all paths — `os.execvp` either replaces the process or raises. It's kept as a belt-and-suspenders assertion with a comment marking it unreachable, not as actual error handling.
+
+### Changes to `hermes_cli/config.py`
+
+#### `get_container_exec_info` — remove inner try/except
+
+Current code catches `(OSError, IOError)` and returns `None`. This silently hides permission errors, corrupt files, etc.
+
+Change: Remove the try/except around file reading. Keep the early returns for `HERMES_DEV=1` and `_is_inside_container()`. The `FileNotFoundError` from `open()` when `.container-mode` doesn't exist should still return `None` (this is the "container mode not enabled" case). All other exceptions propagate.
+
+```python
+def get_container_exec_info() -> Optional[dict]:
+    if os.environ.get("HERMES_DEV") == "1":
+        return None
+    if _is_inside_container():
+        return None
+
+    container_mode_file = get_hermes_home() / ".container-mode"
+
+    try:
+        with open(container_mode_file, "r") as f:
+            # ... parse key=value lines ...
+    except FileNotFoundError:
+        return None
+    # All other exceptions (PermissionError, malformed data, etc.) propagate
+
+    return { ... }
+```
+
+---
+
+## Spec: NixOS Module Changes
+
+### Symlink creation — simplify to two branches
+
+Current: 4 branches (symlink exists, directory exists, other file, doesn't exist).
+
+Revised: 2 branches.
+
+```bash
+if [ -d "${symlinkPath}" ] && [ ! -L "${symlinkPath}" ]; then
+  # Real directory — back it up, then create symlink
+  _backup="${symlinkPath}.bak.$(date +%s)"
+  echo "hermes-agent: backing up existing ${symlinkPath} to $_backup"
+  mv "${symlinkPath}" "$_backup"
+fi
+# For everything else (symlink, doesn't exist, etc.) — just force-create
+ln -sfn "${target}" "${symlinkPath}"
+chown -h ${user}:${cfg.group} "${symlinkPath}"
+```
+
+`ln -sfn` handles: existing symlink (replaces), doesn't exist (creates), and after the `mv` above (creates). The only case that needs special handling is a real directory, because `ln -sfn` cannot atomically replace a directory.
+
+Note: there is a theoretical race between the `[ -d ... ]` check and the `mv` (something could create/remove the directory in between). In practice this is a NixOS activation script running as root during `nixos-rebuild switch` — no other process should be touching `~/.hermes` at that moment. Not worth adding locking for.
+
+### Sudoers — document, don't auto-configure
+
+Do NOT add `security.sudo.extraRules` to the module. Document the sudoers requirement in the module's description/comments and in the error message the CLI prints when sudo probe fails.
+
+### Group membership gating — keep as-is
+
+The fix in 726cf90f (`cfg.container.enable && cfg.container.hostUsers != []`) is correct. Leftover group membership when container mode is disabled is harmless. No cleanup needed.
+
+---
+
+## Spec: Test Rewrite
+
+The existing test file (`tests/hermes_cli/test_container_aware_cli.py`) has 16 tests. With the simplified exec model, several are obsolete.
+
+### Tests to keep (update as needed)
+
+- `test_is_inside_container_dockerenv` — unchanged
+- `test_is_inside_container_containerenv` — unchanged
+- `test_is_inside_container_cgroup_docker` — unchanged
+- `test_is_inside_container_false_on_host` — unchanged
+- `test_get_container_exec_info_returns_metadata` — unchanged
+- `test_get_container_exec_info_none_inside_container` — unchanged
+- `test_get_container_exec_info_none_without_file` — unchanged
+- `test_get_container_exec_info_skipped_when_hermes_dev` — unchanged
+- `test_get_container_exec_info_not_skipped_when_hermes_dev_zero` — unchanged
+- `test_get_container_exec_info_defaults` — unchanged
+- `test_get_container_exec_info_docker_backend` — unchanged
+
+### Tests to add
+
+- `test_get_container_exec_info_crashes_on_permission_error` — verify that `PermissionError` propagates (no silent `None` return)
+- `test_exec_in_container_calls_execvp` — verify `os.execvp` is called with correct args (runtime, tty flags, user, env, container, binary, cli args)
+- `test_exec_in_container_sudo_probe_sets_prefix` — verify that when first probe fails and sudo probe succeeds, `os.execvp` is called with `sudo -n` prefix
+- `test_exec_in_container_no_runtime_hard_fails` — keep existing, verify `sys.exit(1)` when `shutil.which` returns None
+- `test_exec_in_container_non_tty_uses_i_only` — update to check `os.execvp` args instead of `subprocess.run` args
+- `test_exec_in_container_probe_timeout_prints_message` — verify that `subprocess.TimeoutExpired` from the probe produces a human-readable error and `sys.exit(1)`, not a raw traceback
+- `test_exec_in_container_container_not_running_no_sudo` — verify the path where runtime exists (`shutil.which` returns a path) but probe returns non-zero and no sudo is available. Should print the "container may be running under root" error. This is distinct from `no_runtime_hard_fails` which covers `shutil.which` returning None.
+
+### Tests to delete
+
+- `test_exec_in_container_tty_retries_on_container_failure` — retry loop removed
+- `test_exec_in_container_non_tty_retries_silently_exits_126` — retry loop removed
+- `test_exec_in_container_propagates_hermes_exit_code` — no subprocess.run to check exit codes; execvp replaces the process. Note: exit code propagation still works correctly — when `os.execvp` succeeds, the container's process *becomes* this process, so its exit code is the process exit code by OS semantics. No application code needed, no test needed. A comment in the function docstring documents this intent for future readers.
+
+---
+
+## Out of Scope
+
+- Auto-configuring sudoers rules in the NixOS module
+- Any changes to `get_container_exec_info` parsing logic beyond the try/except narrowing
+- Changes to `.container-mode` file format
+- Changes to the `HERMES_DEV=1` bypass
+- Changes to container detection logic (`_is_inside_container`)
--- a/environments/agent_loop.py
+++ b/environments/agent_loop.py
@@ -18,14 +18,11 @@ import logging
 import os
 import uuid
 from dataclasses import dataclass, field
-from typing import Any, Dict, List, Optional, Set, TYPE_CHECKING
+from typing import Any, Dict, List, Optional, Set

-if TYPE_CHECKING:
-    from hermes_agent.tools.budget_config import BudgetConfig
-
-from hermes_agent.tools.dispatch import handle_function_call
-from hermes_agent.tools.terminal import get_active_env
-from hermes_agent.tools.result_storage import maybe_persist_tool_result, enforce_turn_budget
+from model_tools import handle_function_call
+from tools.terminal_tool import get_active_env
+from tools.tool_result_storage import maybe_persist_tool_result, enforce_turn_budget

 # Thread pool for running sync tool calls that internally use asyncio.run()
 # (e.g., the Modal/Docker/Daytona terminal backends). Running them in a separate
@@ -164,7 +161,7 @@ class HermesAgentLoop:
                        thresholds, per-turn aggregate budget, and preview size.
                        If None, uses DEFAULT_BUDGET (current hardcoded values).
        """
-        from hermes_agent.tools.budget_config import DEFAULT_BUDGET
+        from tools.budget_config import DEFAULT_BUDGET
        self.server = server
        self.tool_schemas = tool_schemas
        self.valid_tool_names = valid_tool_names
@@ -190,7 +187,7 @@ class HermesAgentLoop:
        tool_errors: List[ToolError] = []

        # Per-loop TodoStore for the todo tool (ephemeral, dies with the loop)
-        from hermes_agent.tools.todo import TodoStore, todo_tool as _todo_tool
+        from tools.todo_tool import TodoStore, todo_tool as _todo_tool
        _todo_store = TodoStore()

        # Extract user task from first user message for browser_snapshot context
--- a/environments/benchmarks/terminalbench_2/terminalbench2_env.py
+++ b/environments/benchmarks/terminalbench_2/terminalbench2_env.py
@@ -60,7 +60,7 @@ from atroposlib.envs.server_handling.server_manager import APIServerConfig
 from environments.agent_loop import AgentResult, HermesAgentLoop
 from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
 from environments.tool_context import ToolContext
-from hermes_agent.tools.terminal import (
+from tools.terminal_tool import (
    register_task_env_overrides,
    clear_task_env_overrides,
    cleanup_vm,
@@ -876,7 +876,7 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
            # Let cancellations propagate (finally blocks run cleanup_vm)
            await asyncio.gather(*eval_tasks, return_exceptions=True)
            # Belt-and-suspenders: clean up any remaining sandboxes
-            from hermes_agent.tools.terminal import cleanup_all_environments
+            from tools.terminal_tool import cleanup_all_environments
            cleanup_all_environments()
            print("All sandboxes cleaned up.")
            return
@@ -984,7 +984,7 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):

        # Kill all remaining sandboxes. Timed-out tasks leave orphaned thread
        # pool workers still executing commands -- cleanup_all stops them.
-        from hermes_agent.tools.terminal import cleanup_all_environments
+        from tools.terminal_tool import cleanup_all_environments
        print("\nCleaning up all sandboxes...")
        cleanup_all_environments()

--- a/environments/benchmarks/yc_bench/yc_bench_env.py
+++ b/environments/benchmarks/yc_bench/yc_bench_env.py
@@ -709,7 +709,7 @@ class YCBenchEvalEnv(HermesAgentBaseEnv):
            tqdm.write("\n[INTERRUPTED] Stopping evaluation...")
            pbar.close()
            try:
-                from hermes_agent.tools.terminal import cleanup_all_environments
+                from tools.terminal_tool import cleanup_all_environments
                cleanup_all_environments()
            except Exception:
                pass
@@ -819,7 +819,7 @@ class YCBenchEvalEnv(HermesAgentBaseEnv):
            print(f"Results saved to: {self._streaming_path}")

        try:
-            from hermes_agent.tools.terminal import cleanup_all_environments
+            from tools.terminal_tool import cleanup_all_environments
            cleanup_all_environments()
        except Exception:
            pass
--- a/environments/hermes_base_env.py
+++ b/environments/hermes_base_env.py
@@ -62,15 +62,15 @@ from atroposlib.type_definitions import Item

 from environments.agent_loop import AgentResult, HermesAgentLoop
 from environments.tool_context import ToolContext
-from hermes_agent.tools.budget_config import (
+from tools.budget_config import (
    DEFAULT_RESULT_SIZE_CHARS,
    DEFAULT_TURN_BUDGET_CHARS,
    DEFAULT_PREVIEW_SIZE_CHARS,
 )

 # Import hermes-agent toolset infrastructure
-from hermes_agent.tools.dispatch import get_tool_definitions
-from hermes_agent.tools.distributions import sample_toolsets_from_distribution
+from model_tools import get_tool_definitions
+from toolset_distributions import sample_toolsets_from_distribution

 logger = logging.getLogger(__name__)

@@ -209,7 +209,7 @@ class HermesAgentEnvConfig(BaseEnvConfig):

    def build_budget_config(self):
        """Build a BudgetConfig from env config fields."""
-        from hermes_agent.tools.budget_config import BudgetConfig
+        from tools.budget_config import BudgetConfig
        return BudgetConfig(
            default_result_size=self.default_result_size_chars,
            turn_budget=self.turn_budget_chars,
--- a/environments/tool_context.py
+++ b/environments/tool_context.py
@@ -31,9 +31,9 @@ from typing import Any, Dict, List, Optional
 import asyncio
 import concurrent.futures

-from hermes_agent.tools.dispatch import handle_function_call
-from hermes_agent.tools.terminal import cleanup_vm
-from hermes_agent.tools.browser.tool import cleanup_browser
+from model_tools import handle_function_call
+from tools.terminal_tool import cleanup_vm
+from tools.browser_tool import cleanup_browser

 logger = logging.getLogger(__name__)

@@ -53,6 +53,7 @@ def _run_tool_in_thread(tool_name: str, arguments: Dict[str, Any], task_id: str)
    try:
        loop = asyncio.get_running_loop()
        # We're in an async context -- need to run in thread
+        import concurrent.futures
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(
                handle_function_call, tool_name, arguments, task_id
@@ -446,7 +447,7 @@ class ToolContext:
        """
        # Kill any background processes from this rollout (safety net)
        try:
-            from hermes_agent.tools.process_registry import process_registry
+            from tools.process_registry import process_registry
            killed = process_registry.kill_all(task_id=self.task_id)
            if killed:
                logger.debug("Process cleanup for task %s: killed %d process(es)", self.task_id, killed)
--- a/flake.lock
+++ b/flake.lock
@@ -36,26 +36,6 @@
        "type": "github"
      }
    },
-    "npm-lockfile-fix": {
-      "inputs": {
-        "nixpkgs": [
-          "nixpkgs"
-        ]
-      },
-      "locked": {
-        "lastModified": 1775903712,
-        "narHash": "sha256-2GV79U6iVH4gKAPWYrxUReB0S41ty/Y3dBLquU8AlaA=",
-        "owner": "jeslie0",
-        "repo": "npm-lockfile-fix",
-        "rev": "c6093acb0c0548e0f9b8b3d82918823721930fe8",
-        "type": "github"
-      },
-      "original": {
-        "owner": "jeslie0",
-        "repo": "npm-lockfile-fix",
-        "type": "github"
-      }
-    },
    "pyproject-build-systems": {
      "inputs": {
        "nixpkgs": [
@@ -144,7 +124,6 @@
      "inputs": {
        "flake-parts": "flake-parts",
        "nixpkgs": "nixpkgs",
-        "npm-lockfile-fix": "npm-lockfile-fix",
        "pyproject-build-systems": "pyproject-build-systems",
        "pyproject-nix": "pyproject-nix_2",
        "uv2nix": "uv2nix_2"
--- a/flake.nix
+++ b/flake.nix
@@ -19,20 +19,11 @@
      url = "github:pyproject-nix/build-system-pkgs";
      inputs.nixpkgs.follows = "nixpkgs";
    };
-    npm-lockfile-fix = {
-      url = "github:jeslie0/npm-lockfile-fix";
-      inputs.nixpkgs.follows = "nixpkgs";
-    };
  };

-  outputs =
-    inputs:
+  outputs = inputs:
    inputs.flake-parts.lib.mkFlake { inherit inputs; } {
-      systems = [
-        "x86_64-linux"
-        "aarch64-linux"
-        "aarch64-darwin"
-      ];
+      systems = [ "x86_64-linux" "aarch64-linux" "aarch64-darwin" ];

      imports = [
        ./nix/packages.nix
--- a/hermes_agent/gateway/init.py
+++ b/hermes_agent/gateway/init.py
--- a/hermes_agent/gateway/builtin_hooks/init.py
+++ b/hermes_agent/gateway/builtin_hooks/init.py
--- a/hermes_agent/gateway/builtin_hooks/boot_md.py
+++ b/hermes_agent/gateway/builtin_hooks/boot_md.py
@@ -20,9 +20,9 @@ suppress delivery.
 import logging
 import threading

-logger = logging.getLogger(__name__)
+logger = logging.getLogger("hooks.boot-md")

-from hermes_agent.constants import get_hermes_home
+from hermes_constants import get_hermes_home
 HERMES_HOME = get_hermes_home()
 BOOT_FILE = HERMES_HOME / "BOOT.md"

@@ -45,7 +45,7 @@ def _build_boot_prompt(content: str) -> str:
 def _run_boot_agent(content: str) -> None:
    """Spawn a one-shot agent session to execute the boot instructions."""
    try:
-        from hermes_agent.agent.loop import AIAgent
+        from run_agent import AIAgent

        prompt = _build_boot_prompt(content)
        agent = AIAgent(
--- a/hermes_agent/gateway/channel_directory.py
+++ b/hermes_agent/gateway/channel_directory.py
@@ -11,8 +11,8 @@ import logging
 from datetime import datetime
 from typing import Any, Dict, List, Optional

-from hermes_agent.cli.config import get_hermes_home
-from hermes_agent.utils import atomic_json_write
+from hermes_cli.config import get_hermes_home
+from utils import atomic_json_write

 logger = logging.getLogger(__name__)

@@ -63,7 +63,7 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:

    Returns the directory dict and writes it to DIRECTORY_PATH.
    """
-    from hermes_agent.gateway.config import Platform
+    from gateway.config import Platform

    platforms: Dict[str, List[Dict[str, str]]] = {}

@@ -100,7 +100,7 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:


 def _build_discord(adapter) -> List[Dict[str, str]]:
-    """Enumerate all text channels and forum channels the Discord bot can see."""
+    """Enumerate all text channels the Discord bot can see."""
    channels = []
    client = getattr(adapter, "_client", None)
    if not client:
@@ -119,15 +119,6 @@ def _build_discord(adapter) -> List[Dict[str, str]]:
                "guild": guild.name,
                "type": "channel",
            })
-        # Forum channels (type 15) — creating a message auto-spawns a thread post.
-        forums = getattr(guild, "forum_channels", None) or []
-        for ch in forums:
-            channels.append({
-                "id": str(ch.id),
-                "name": ch.name,
-                "guild": guild.name,
-                "type": "forum",
-            })
        # Also include DM-capable users we've interacted with is not
        # feasible via guild enumeration; those come from sessions.

@@ -144,7 +135,7 @@ def _build_slack(adapter) -> List[Dict[str, str]]:
        return _build_from_sessions("slack")

    try:
-        from hermes_agent.tools.send_message import _send_slack  # noqa: F401
+        from tools.send_message_tool import _send_slack  # noqa: F401
        # Use the Slack Web API directly if available
    except Exception:
        pass
@@ -200,15 +191,6 @@ def load_directory() -> Dict[str, Any]:
        return {"updated_at": None, "platforms": {}}


-def lookup_channel_type(platform_name: str, chat_id: str) -> Optional[str]:
-    """Return the channel ``type`` string (e.g. ``"channel"``, ``"forum"``) for *chat_id*, or *None* if unknown."""
-    directory = load_directory()
-    for ch in directory.get("platforms", {}).get(platform_name, []):
-        if ch.get("id") == chat_id:
-            return ch.get("type")
-    return None
-
-
 def resolve_channel_name(platform_name: str, name: str) -> Optional[str]:
    """
    Resolve a human-friendly channel name to a numeric ID.
--- a/hermes_agent/gateway/config.py
+++ b/hermes_agent/gateway/config.py
@@ -16,8 +16,8 @@ from dataclasses import dataclass, field
 from typing import Dict, List, Optional, Any
 from enum import Enum

-from hermes_agent.cli.config import get_hermes_home
-from hermes_agent.utils import is_truthy_value
+from hermes_cli.config import get_hermes_home
+from utils import is_truthy_value

 logger = logging.getLogger(__name__)

@@ -258,13 +258,6 @@ class GatewayConfig:
    # Streaming configuration
    streaming: StreamingConfig = field(default_factory=StreamingConfig)

-    # Session store pruning: drop SessionEntry records older than this many
-    # days from the in-memory dict and sessions.json.  Keeps the store from
-    # growing unbounded in gateways serving many chats/threads/users over
-    # months.  Pruning is invisible to users — if they resume, they get a
-    # fresh session exactly as if the reset policy had fired.  0 = disabled.
-    session_store_max_age_days: int = 90
-
    def get_connected_platforms(self) -> List[Platform]:
        """Return list of platforms that are enabled and configured."""
        connected = []
@@ -314,14 +307,6 @@ class GatewayConfig:
            # QQBot uses extra dict for app credentials
            elif platform == Platform.QQBOT and config.extra.get("app_id") and config.extra.get("client_secret"):
                connected.append(platform)
-            # DingTalk uses client_id/client_secret from config.extra or env vars
-            elif platform == Platform.DINGTALK and (
-                config.extra.get("client_id") or os.getenv("DINGTALK_CLIENT_ID")
-            ) and (
-                config.extra.get("client_secret") or os.getenv("DINGTALK_CLIENT_SECRET")
-            ):
-                connected.append(platform)
-        
        return connected
    
    def get_home_channel(self, platform: Platform) -> Optional[HomeChannel]:
@@ -372,7 +357,6 @@ class GatewayConfig:
            "thread_sessions_per_user": self.thread_sessions_per_user,
            "unauthorized_dm_behavior": self.unauthorized_dm_behavior,
            "streaming": self.streaming.to_dict(),
-            "session_store_max_age_days": self.session_store_max_age_days,
        }
    
    @classmethod
@@ -420,13 +404,6 @@ class GatewayConfig:
            "pair",
        )

-        try:
-            session_store_max_age_days = int(data.get("session_store_max_age_days", 90))
-            if session_store_max_age_days < 0:
-                session_store_max_age_days = 0
-        except (TypeError, ValueError):
-            session_store_max_age_days = 90
-
        return cls(
            platforms=platforms,
            default_reset_policy=default_policy,
@@ -441,7 +418,6 @@ class GatewayConfig:
            thread_sessions_per_user=_coerce_bool(thread_sessions_per_user, False),
            unauthorized_dm_behavior=unauthorized_dm_behavior,
            streaming=StreamingConfig.from_dict(data.get("streaming", {})),
-            session_store_max_age_days=session_store_max_age_days,
        )

    def get_unauthorized_dm_behavior(self, platform: Optional[Platform] = None) -> str:
@@ -576,14 +552,6 @@ def load_gateway_config() -> GatewayConfig:
                    bridged["free_response_channels"] = platform_cfg["free_response_channels"]
                if "mention_patterns" in platform_cfg:
                    bridged["mention_patterns"] = platform_cfg["mention_patterns"]
-                if "dm_policy" in platform_cfg:
-                    bridged["dm_policy"] = platform_cfg["dm_policy"]
-                if "allow_from" in platform_cfg:
-                    bridged["allow_from"] = platform_cfg["allow_from"]
-                if "group_policy" in platform_cfg:
-                    bridged["group_policy"] = platform_cfg["group_policy"]
-                if "group_allow_from" in platform_cfg:
-                    bridged["group_allow_from"] = platform_cfg["group_allow_from"]
                if plat == Platform.DISCORD and "channel_skill_bindings" in platform_cfg:
                    bridged["channel_skill_bindings"] = platform_cfg["channel_skill_bindings"]
                if "channel_prompts" in platform_cfg:
@@ -649,20 +617,6 @@ def load_gateway_config() -> GatewayConfig:
                    if isinstance(ntc, list):
                        ntc = ",".join(str(v) for v in ntc)
                    os.environ["DISCORD_NO_THREAD_CHANNELS"] = str(ntc)
-                # allow_mentions: granular control over what the bot can ping.
-                # Safe defaults (no @everyone/roles) are applied in the adapter;
-                # these YAML keys only override when set and let users opt back
-                # into unsafe modes (e.g. roles=true) if they actually want it.
-                allow_mentions_cfg = discord_cfg.get("allow_mentions")
-                if isinstance(allow_mentions_cfg, dict):
-                    for yaml_key, env_key in (
-                        ("everyone", "DISCORD_ALLOW_MENTION_EVERYONE"),
-                        ("roles", "DISCORD_ALLOW_MENTION_ROLES"),
-                        ("users", "DISCORD_ALLOW_MENTION_USERS"),
-                        ("replied_user", "DISCORD_ALLOW_MENTION_REPLIED_USER"),
-                    ):
-                        if yaml_key in allow_mentions_cfg and not os.getenv(env_key):
-                            os.environ[env_key] = str(allow_mentions_cfg[yaml_key]).lower()

            # Telegram settings → env vars (env vars take precedence)
            telegram_cfg = yaml_cfg.get("telegram", {})
@@ -670,7 +624,8 @@ def load_gateway_config() -> GatewayConfig:
                if "require_mention" in telegram_cfg and not os.getenv("TELEGRAM_REQUIRE_MENTION"):
                    os.environ["TELEGRAM_REQUIRE_MENTION"] = str(telegram_cfg["require_mention"]).lower()
                if "mention_patterns" in telegram_cfg and not os.getenv("TELEGRAM_MENTION_PATTERNS"):
-                    os.environ["TELEGRAM_MENTION_PATTERNS"] = json.dumps(telegram_cfg["mention_patterns"])
+                    import json as _json
+                    os.environ["TELEGRAM_MENTION_PATTERNS"] = _json.dumps(telegram_cfg["mention_patterns"])
                frc = telegram_cfg.get("free_response_chats")
                if frc is not None and not os.getenv("TELEGRAM_FREE_RESPONSE_CHATS"):
                    if isinstance(frc, list):
@@ -707,38 +662,6 @@ def load_gateway_config() -> GatewayConfig:
                    if isinstance(frc, list):
                        frc = ",".join(str(v) for v in frc)
                    os.environ["WHATSAPP_FREE_RESPONSE_CHATS"] = str(frc)
-                if "dm_policy" in whatsapp_cfg and not os.getenv("WHATSAPP_DM_POLICY"):
-                    os.environ["WHATSAPP_DM_POLICY"] = str(whatsapp_cfg["dm_policy"]).lower()
-                af = whatsapp_cfg.get("allow_from")
-                if af is not None and not os.getenv("WHATSAPP_ALLOWED_USERS"):
-                    if isinstance(af, list):
-                        af = ",".join(str(v) for v in af)
-                    os.environ["WHATSAPP_ALLOWED_USERS"] = str(af)
-                if "group_policy" in whatsapp_cfg and not os.getenv("WHATSAPP_GROUP_POLICY"):
-                    os.environ["WHATSAPP_GROUP_POLICY"] = str(whatsapp_cfg["group_policy"]).lower()
-                gaf = whatsapp_cfg.get("group_allow_from")
-                if gaf is not None and not os.getenv("WHATSAPP_GROUP_ALLOWED_USERS"):
-                    if isinstance(gaf, list):
-                        gaf = ",".join(str(v) for v in gaf)
-                    os.environ["WHATSAPP_GROUP_ALLOWED_USERS"] = str(gaf)
-
-            # DingTalk settings → env vars (env vars take precedence)
-            dingtalk_cfg = yaml_cfg.get("dingtalk", {})
-            if isinstance(dingtalk_cfg, dict):
-                if "require_mention" in dingtalk_cfg and not os.getenv("DINGTALK_REQUIRE_MENTION"):
-                    os.environ["DINGTALK_REQUIRE_MENTION"] = str(dingtalk_cfg["require_mention"]).lower()
-                if "mention_patterns" in dingtalk_cfg and not os.getenv("DINGTALK_MENTION_PATTERNS"):
-                    os.environ["DINGTALK_MENTION_PATTERNS"] = json.dumps(dingtalk_cfg["mention_patterns"])
-                frc = dingtalk_cfg.get("free_response_chats")
-                if frc is not None and not os.getenv("DINGTALK_FREE_RESPONSE_CHATS"):
-                    if isinstance(frc, list):
-                        frc = ",".join(str(v) for v in frc)
-                    os.environ["DINGTALK_FREE_RESPONSE_CHATS"] = str(frc)
-                allowed = dingtalk_cfg.get("allowed_users")
-                if allowed is not None and not os.getenv("DINGTALK_ALLOWED_USERS"):
-                    if isinstance(allowed, list):
-                        allowed = ",".join(str(v) for v in allowed)
-                    os.environ["DINGTALK_ALLOWED_USERS"] = str(allowed)

            # Matrix settings → env vars (env vars take precedence)
            matrix_cfg = yaml_cfg.get("matrix", {})
@@ -821,7 +744,7 @@ def _validate_gateway_config(config: "GatewayConfig") -> None:
    # without changing placeholder values get a clear startup error instead
    # of a confusing "auth failed" from the platform API.
    try:
-        from hermes_agent.cli.auth.auth import has_usable_secret
+        from hermes_cli.auth import has_usable_secret
    except ImportError:
        has_usable_secret = None  # type: ignore[assignment]

@@ -1083,25 +1006,6 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
        if webhook_secret:
            config.platforms[Platform.WEBHOOK].extra["secret"] = webhook_secret

-    # DingTalk
-    dingtalk_client_id = os.getenv("DINGTALK_CLIENT_ID")
-    dingtalk_client_secret = os.getenv("DINGTALK_CLIENT_SECRET")
-    if dingtalk_client_id and dingtalk_client_secret:
-        if Platform.DINGTALK not in config.platforms:
-            config.platforms[Platform.DINGTALK] = PlatformConfig()
-        config.platforms[Platform.DINGTALK].enabled = True
-        config.platforms[Platform.DINGTALK].extra.update({
-            "client_id": dingtalk_client_id,
-            "client_secret": dingtalk_client_secret,
-        })
-        dingtalk_home = os.getenv("DINGTALK_HOME_CHANNEL")
-        if dingtalk_home:
-            config.platforms[Platform.DINGTALK].home_channel = HomeChannel(
-                platform=Platform.DINGTALK,
-                chat_id=dingtalk_home,
-                name=os.getenv("DINGTALK_HOME_CHANNEL_NAME", "Home"),
-            )
-
    # Feishu / Lark
    feishu_app_id = os.getenv("FEISHU_APP_ID")
    feishu_app_secret = os.getenv("FEISHU_APP_SECRET")
@@ -1250,23 +1154,12 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
        qq_group_allowed = os.getenv("QQ_GROUP_ALLOWED_USERS", "").strip()
        if qq_group_allowed:
            extra["group_allow_from"] = qq_group_allowed
-        qq_home = os.getenv("QQBOT_HOME_CHANNEL", "").strip()
-        qq_home_name_env = "QQBOT_HOME_CHANNEL_NAME"
-        if not qq_home:
-            # Back-compat: accept the pre-rename name and log a one-time warning.
-            legacy_home = os.getenv("QQ_HOME_CHANNEL", "").strip()
-            if legacy_home:
-                qq_home = legacy_home
-                qq_home_name_env = "QQ_HOME_CHANNEL_NAME"
-                logging.getLogger(__name__).warning(
-                    "QQ_HOME_CHANNEL is deprecated; rename to QQBOT_HOME_CHANNEL "
-                    "in your .env for consistency with the platform key."
-                )
+        qq_home = os.getenv("QQ_HOME_CHANNEL", "").strip()
        if qq_home:
            config.platforms[Platform.QQBOT].home_channel = HomeChannel(
                platform=Platform.QQBOT,
                chat_id=qq_home,
-                name=os.getenv("QQBOT_HOME_CHANNEL_NAME") or os.getenv(qq_home_name_env, "Home"),
+                name=os.getenv("QQ_HOME_CHANNEL_NAME", "Home"),
            )

    # Session settings
--- a/hermes_agent/gateway/delivery.py
+++ b/hermes_agent/gateway/delivery.py
@@ -14,7 +14,7 @@ from datetime import datetime
 from dataclasses import dataclass
 from typing import Dict, List, Optional, Any

-from hermes_agent.cli.config import get_hermes_home
+from hermes_cli.config import get_hermes_home

 logger = logging.getLogger(__name__)

--- a/hermes_agent/gateway/display_config.py
+++ b/hermes_agent/gateway/display_config.py
--- a/hermes_agent/gateway/hooks.py
+++ b/hermes_agent/gateway/hooks.py
@@ -25,7 +25,7 @@ from typing import Any, Callable, Dict, List, Optional

 import yaml

-from hermes_agent.cli.config import get_hermes_home
+from hermes_cli.config import get_hermes_home


 HOOKS_DIR = get_hermes_home() / "hooks"
@@ -54,7 +54,7 @@ class HookRegistry:
    def _register_builtin_hooks(self) -> None:
        """Register built-in hooks that are always active."""
        try:
-            from hermes_agent.gateway.builtin_hooks.boot_md import handle as boot_md_handle
+            from gateway.builtin_hooks.boot_md import handle as boot_md_handle

            self._handlers.setdefault("gateway:startup", []).append(boot_md_handle)
            self._loaded_hooks.append({
--- a/hermes_agent/gateway/mirror.py
+++ b/hermes_agent/gateway/mirror.py
@@ -14,7 +14,7 @@ import logging
 from datetime import datetime
 from typing import Optional

-from hermes_agent.cli.config import get_hermes_home
+from hermes_cli.config import get_hermes_home

 logger = logging.getLogger(__name__)

@@ -118,7 +118,7 @@ def _append_to_sqlite(session_id: str, message: dict) -> None:
    """Append a message to the SQLite session database."""
    db = None
    try:
-        from hermes_agent.state import SessionDB
+        from hermes_state import SessionDB
        db = SessionDB()
        db.append_message(
            session_id=session_id,
--- a/hermes_agent/gateway/pairing.py
+++ b/hermes_agent/gateway/pairing.py
@@ -27,7 +27,7 @@ import time
 from pathlib import Path
 from typing import Optional

-from hermes_agent.constants import get_hermes_dir
+from hermes_constants import get_hermes_dir


 # Unambiguous alphabet -- excludes 0/O, 1/I to prevent confusion
--- a/hermes_agent/gateway/platforms/ADDING_A_PLATFORM.md
+++ b/hermes_agent/gateway/platforms/ADDING_A_PLATFORM.md
--- a/hermes_agent/gateway/platforms/init.py
+++ b/hermes_agent/gateway/platforms/init.py
--- a/hermes_agent/gateway/platforms/api_server.py
+++ b/hermes_agent/gateway/platforms/api_server.py
@@ -32,16 +32,16 @@ import sqlite3
 import time
 import uuid
 from typing import Any, Dict, List, Optional
+
 try:
    from aiohttp import web
-
    AIOHTTP_AVAILABLE = True
 except ImportError:
    AIOHTTP_AVAILABLE = False
    web = None  # type: ignore[assignment]

-from hermes_agent.gateway.config import Platform, PlatformConfig
-from hermes_agent.gateway.platforms.base import (
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import (
    BasePlatformAdapter,
    SendResult,
    is_network_accessible,
@@ -59,11 +59,6 @@ MAX_NORMALIZED_TEXT_LENGTH = 65_536  # 64 KB cap for normalized content parts
 MAX_CONTENT_LIST_SIZE = 1_000  # Max items when content is an array


-def check_api_server_requirements() -> bool:
-    """Check if API server adapter dependencies are available."""
-    return AIOHTTP_AVAILABLE
-
-
 def _normalize_chat_content(
    content: Any, *, _max_depth: int = 10, _depth: int = 0,
 ) -> str:
@@ -122,159 +117,11 @@ def _normalize_chat_content(
        return ""


-# Content part type aliases used by the OpenAI Chat Completions and Responses
-# APIs.  We accept both spellings on input and emit a single canonical internal
-# shape (``{"type": "text", ...}`` / ``{"type": "image_url", ...}``) that the
-# rest of the agent pipeline already understands.
-_TEXT_PART_TYPES = frozenset({"text", "input_text", "output_text"})
-_IMAGE_PART_TYPES = frozenset({"image_url", "input_image"})
-_FILE_PART_TYPES = frozenset({"file", "input_file"})
+def check_api_server_requirements() -> bool:
+    """Check if API server dependencies are available."""
+    return AIOHTTP_AVAILABLE


-def _normalize_multimodal_content(content: Any) -> Any:
-    """Validate and normalize multimodal content for the API server.
-
-    Returns a plain string when the content is text-only, or a list of
-    ``{"type": "text"|"image_url", ...}`` parts when images are present.
-    The output shape is the native OpenAI Chat Completions vision format,
-    which the agent pipeline accepts verbatim (OpenAI-wire providers) or
-    converts (``_preprocess_anthropic_content`` for Anthropic).
-
-    Raises ``ValueError`` with an OpenAI-style code on invalid input:
-      * ``unsupported_content_type`` — file/input_file/file_id parts, or
-        non-image ``data:`` URLs.
-      * ``invalid_image_url`` — missing URL or unsupported scheme.
-      * ``invalid_content_part`` — malformed text/image objects.
-
-    Callers translate the ValueError into a 400 response.
-    """
-    # Scalar passthrough mirrors ``_normalize_chat_content``.
-    if content is None:
-        return ""
-    if isinstance(content, str):
-        return content[:MAX_NORMALIZED_TEXT_LENGTH] if len(content) > MAX_NORMALIZED_TEXT_LENGTH else content
-    if not isinstance(content, list):
-        # Mirror the legacy text-normalizer's fallback so callers that
-        # pre-existed image support still get a string back.
-        return _normalize_chat_content(content)
-
-    items = content[:MAX_CONTENT_LIST_SIZE] if len(content) > MAX_CONTENT_LIST_SIZE else content
-    normalized_parts: List[Dict[str, Any]] = []
-    text_accum_len = 0
-
-    for part in items:
-        if isinstance(part, str):
-            if part:
-                trimmed = part[:MAX_NORMALIZED_TEXT_LENGTH]
-                normalized_parts.append({"type": "text", "text": trimmed})
-                text_accum_len += len(trimmed)
-            continue
-
-        if not isinstance(part, dict):
-            # Ignore unknown scalars for forward compatibility with future
-            # Responses API additions (e.g. ``refusal``).  The same policy
-            # the text normalizer applies.
-            continue
-
-        raw_type = part.get("type")
-        part_type = str(raw_type or "").strip().lower()
-
-        if part_type in _TEXT_PART_TYPES:
-            text = part.get("text")
-            if text is None:
-                continue
-            if not isinstance(text, str):
-                text = str(text)
-            if text:
-                trimmed = text[:MAX_NORMALIZED_TEXT_LENGTH]
-                normalized_parts.append({"type": "text", "text": trimmed})
-                text_accum_len += len(trimmed)
-            continue
-
-        if part_type in _IMAGE_PART_TYPES:
-            detail = part.get("detail")
-            image_ref = part.get("image_url")
-            # OpenAI Responses sends ``input_image`` with a top-level
-            # ``image_url`` string; Chat Completions sends ``image_url`` as
-            # ``{"url": "...", "detail": "..."}``.  Support both.
-            if isinstance(image_ref, dict):
-                url_value = image_ref.get("url")
-                detail = image_ref.get("detail", detail)
-            else:
-                url_value = image_ref
-            if not isinstance(url_value, str) or not url_value.strip():
-                raise ValueError("invalid_image_url:Image parts must include a non-empty image URL.")
-            url_value = url_value.strip()
-            lowered = url_value.lower()
-            if lowered.startswith("data:"):
-                if not lowered.startswith("data:image/") or "," not in url_value:
-                    raise ValueError(
-                        "unsupported_content_type:Only image data URLs are supported. "
-                        "Non-image data payloads are not supported."
-                    )
-            elif not (lowered.startswith("http://") or lowered.startswith("https://")):
-                raise ValueError(
-                    "invalid_image_url:Image inputs must use http(s) URLs or data:image/... URLs."
-                )
-            image_part: Dict[str, Any] = {"type": "image_url", "image_url": {"url": url_value}}
-            if detail is not None:
-                if not isinstance(detail, str) or not detail.strip():
-                    raise ValueError("invalid_content_part:Image detail must be a non-empty string when provided.")
-                image_part["image_url"]["detail"] = detail.strip()
-            normalized_parts.append(image_part)
-            continue
-
-        if part_type in _FILE_PART_TYPES:
-            raise ValueError(
-                "unsupported_content_type:Inline image inputs are supported, "
-                "but uploaded files and document inputs are not supported on this endpoint."
-            )
-
-        # Unknown part type — reject explicitly so clients get a clear error
-        # instead of a silently dropped turn.
-        raise ValueError(
-            f"unsupported_content_type:Unsupported content part type {raw_type!r}. "
-            "Only text and image_url/input_image parts are supported."
-        )
-
-    if not normalized_parts:
-        return ""
-
-    # Text-only: collapse to a plain string so downstream logging/trajectory
-    # code sees the native shape and prompt caching on text-only turns is
-    # unaffected.
-    if all(p.get("type") == "text" for p in normalized_parts):
-        return "\n".join(p["text"] for p in normalized_parts if p.get("text"))
-
-    return normalized_parts
-
-
-def _content_has_visible_payload(content: Any) -> bool:
-    """True when content has any text or image attachment.  Used to reject empty turns."""
-    if isinstance(content, str):
-        return bool(content.strip())
-    if isinstance(content, list):
-        for part in content:
-            if isinstance(part, dict):
-                ptype = str(part.get("type") or "").strip().lower()
-                if ptype in _TEXT_PART_TYPES and str(part.get("text") or "").strip():
-                    return True
-                if ptype in _IMAGE_PART_TYPES:
-                    return True
-    return False
-
-
-def _multimodal_validation_error(exc: ValueError, *, param: str) -> "web.Response":
-    """Translate a ``_normalize_multimodal_content`` ValueError into a 400 response."""
-    raw = str(exc)
-    code, _, message = raw.partition(":")
-    if not message:
-        code, message = "invalid_content_part", raw
-    return web.json_response(
-        _openai_error(message, code=code, param=param),
-        status=400,
-    )
-
 class ResponseStore:
    """
    SQLite-backed LRU store for Responses API state.
@@ -291,7 +138,7 @@ class ResponseStore:
        self._max_size = max_size
        if db_path is None:
            try:
-                from hermes_agent.cli.config import get_hermes_home
+                from hermes_cli.config import get_hermes_home
                db_path = str(get_hermes_home() / "response_store.db")
            except Exception:
                db_path = ":memory:"
@@ -322,6 +169,7 @@ class ResponseStore:
        ).fetchone()
        if row is None:
            return None
+        import time
        self._conn.execute(
            "UPDATE responses SET accessed_at = ? WHERE response_id = ?",
            (time.time(), response_id),
@@ -331,6 +179,7 @@ class ResponseStore:

    def put(self, response_id: str, data: Dict[str, Any]) -> None:
        """Store a response, evicting the oldest if at capacity."""
+        import time
        self._conn.execute(
            "INSERT OR REPLACE INTO responses (response_id, data, accessed_at) VALUES (?, ?, ?)",
            (response_id, json.dumps(data, default=str), time.time()),
@@ -390,26 +239,30 @@ _CORS_HEADERS = {
 }


-@web.middleware
-async def cors_middleware(request, handler):
-    """Add CORS headers for explicitly allowed origins; handle OPTIONS preflight."""
-    adapter = request.app.get("api_server_adapter")
-    origin = request.headers.get("Origin", "")
-    cors_headers = None
-    if adapter is not None:
-        if not adapter._origin_allowed(origin):
-            return web.Response(status=403)
-        cors_headers = adapter._cors_headers_for_origin(origin)
+if AIOHTTP_AVAILABLE:
+    @web.middleware
+    async def cors_middleware(request, handler):
+        """Add CORS headers for explicitly allowed origins; handle OPTIONS preflight."""
+        adapter = request.app.get("api_server_adapter")
+        origin = request.headers.get("Origin", "")
+        cors_headers = None
+        if adapter is not None:
+            if not adapter._origin_allowed(origin):
+                return web.Response(status=403)
+            cors_headers = adapter._cors_headers_for_origin(origin)

-    if request.method == "OPTIONS":
-        if cors_headers is None:
-            return web.Response(status=403)
-        return web.Response(status=200, headers=cors_headers)
+        if request.method == "OPTIONS":
+            if cors_headers is None:
+                return web.Response(status=403)
+            return web.Response(status=200, headers=cors_headers)
+
+        response = await handler(request)
+        if cors_headers is not None:
+            response.headers.update(cors_headers)
+        return response
+else:
+    cors_middleware = None  # type: ignore[assignment]

-    response = await handler(request)
-    if cors_headers is not None:
-        response.headers.update(cors_headers)
-    return response

 def _openai_error(message: str, err_type: str = "invalid_request_error", param: str = None, code: str = None) -> Dict[str, Any]:
    """OpenAI-style error envelope."""
@@ -423,18 +276,21 @@ def _openai_error(message: str, err_type: str = "invalid_request_error", param:
    }


-@web.middleware
-async def body_limit_middleware(request, handler):
-    """Reject overly large request bodies early based on Content-Length."""
-    if request.method in ("POST", "PUT", "PATCH"):
-        cl = request.headers.get("Content-Length")
-        if cl is not None:
-            try:
-                if int(cl) > MAX_REQUEST_BYTES:
-                    return web.json_response(_openai_error("Request body too large.", code="body_too_large"), status=413)
-            except ValueError:
-                return web.json_response(_openai_error("Invalid Content-Length header.", code="invalid_content_length"), status=400)
-    return await handler(request)
+if AIOHTTP_AVAILABLE:
+    @web.middleware
+    async def body_limit_middleware(request, handler):
+        """Reject overly large request bodies early based on Content-Length."""
+        if request.method in ("POST", "PUT", "PATCH"):
+            cl = request.headers.get("Content-Length")
+            if cl is not None:
+                try:
+                    if int(cl) > MAX_REQUEST_BYTES:
+                        return web.json_response(_openai_error("Request body too large.", code="body_too_large"), status=413)
+                except ValueError:
+                    return web.json_response(_openai_error("Invalid Content-Length header.", code="invalid_content_length"), status=400)
+        return await handler(request)
+else:
+    body_limit_middleware = None  # type: ignore[assignment]

 _SECURITY_HEADERS = {
    "X-Content-Type-Options": "nosniff",
@@ -442,13 +298,16 @@ _SECURITY_HEADERS = {
 }


-@web.middleware
-async def security_headers_middleware(request, handler):
-    """Add security headers to all responses (including errors)."""
-    response = await handler(request)
-    for k, v in _SECURITY_HEADERS.items():
-        response.headers.setdefault(k, v)
-    return response
+if AIOHTTP_AVAILABLE:
+    @web.middleware
+    async def security_headers_middleware(request, handler):
+        """Add security headers to all responses (including errors)."""
+        response = await handler(request)
+        for k, v in _SECURITY_HEADERS.items():
+            response.headers.setdefault(k, v)
+        return response
+else:
+    security_headers_middleware = None  # type: ignore[assignment]


 class _IdempotencyCache:
@@ -456,12 +315,12 @@ class _IdempotencyCache:
    def __init__(self, max_items: int = 1000, ttl_seconds: int = 300):
        from collections import OrderedDict
        self._store = OrderedDict()
-        self._inflight: Dict[tuple[str, str], "asyncio.Task[Any]"] = {}
        self._ttl = ttl_seconds
        self._max = max_items

    def _purge(self):
-        now = time.time()
+        import time as _t
+        now = _t.time()
        expired = [k for k, v in self._store.items() if now - v["ts"] > self._ttl]
        for k in expired:
            self._store.pop(k, None)
@@ -473,27 +332,11 @@ class _IdempotencyCache:
        item = self._store.get(key)
        if item and item["fp"] == fingerprint:
            return item["resp"]
-
-        inflight_key = (key, fingerprint)
-        task = self._inflight.get(inflight_key)
-        if task is None:
-            async def _compute_and_store():
-                resp = await compute_coro()
-                import time as _t
-                self._store[key] = {"resp": resp, "fp": fingerprint, "ts": _t.time()}
-                self._purge()
-                return resp
-
-            task = asyncio.create_task(_compute_and_store())
-            self._inflight[inflight_key] = task
-
-            def _clear_inflight(done_task: "asyncio.Task[Any]") -> None:
-                if self._inflight.get(inflight_key) is done_task:
-                    self._inflight.pop(inflight_key, None)
-
-            task.add_done_callback(_clear_inflight)
-
-        return await asyncio.shield(task)
+        resp = await compute_coro()
+        import time as _t
+        self._store[key] = {"resp": resp, "fp": fingerprint, "ts": _t.time()}
+        self._purge()
+        return resp


 _idem_cache = _IdempotencyCache()
@@ -523,30 +366,6 @@ def _derive_chat_session_id(
    return f"api-{digest}"


-_CRON_AVAILABLE = False
-try:
-    from hermes_agent.cron.jobs import (
-        list_jobs as _cron_list,
-        get_job as _cron_get,
-        create_job as _cron_create,
-        update_job as _cron_update,
-        remove_job as _cron_remove,
-        pause_job as _cron_pause,
-        resume_job as _cron_resume,
-        trigger_job as _cron_trigger,
-    )
-    _CRON_AVAILABLE = True
-except ImportError:
-    _cron_list = None
-    _cron_get = None
-    _cron_create = None
-    _cron_update = None
-    _cron_remove = None
-    _cron_pause = None
-    _cron_resume = None
-    _cron_trigger = None
-
-
 class APIServerAdapter(BasePlatformAdapter):
    """
    OpenAI-compatible HTTP API server adapter.
@@ -604,7 +423,7 @@ class APIServerAdapter(BasePlatformAdapter):
        if explicit and explicit.strip():
            return explicit.strip()
        try:
-            from hermes_agent.cli.profiles import get_active_profile_name
+            from hermes_cli.profiles import get_active_profile_name
            profile = get_active_profile_name()
            if profile and profile not in ("default", "custom"):
                return profile
@@ -680,7 +499,7 @@ class APIServerAdapter(BasePlatformAdapter):
        """
        if self._session_db is None:
            try:
-                from hermes_agent.state import SessionDB
+                from hermes_state import SessionDB
                self._session_db = SessionDB()
            except Exception as e:
                logger.debug("SessionDB unavailable for API server: %s", e)
@@ -707,9 +526,9 @@ class APIServerAdapter(BasePlatformAdapter):
        from config.yaml platform_toolsets.api_server (same as all other
        gateway platforms), falling back to the hermes-api-server default.
        """
-        from hermes_agent.agent.loop import AIAgent
-        from hermes_agent.gateway.run import _resolve_runtime_agent_kwargs, _resolve_gateway_model, _load_gateway_config
-        from hermes_agent.cli.tools_config import _get_platform_tools
+        from run_agent import AIAgent
+        from gateway.run import _resolve_runtime_agent_kwargs, _resolve_gateway_model, _load_gateway_config
+        from hermes_cli.tools_config import _get_platform_tools

        runtime_kwargs = _resolve_runtime_agent_kwargs()
        model = _resolve_gateway_model()
@@ -721,7 +540,7 @@ class APIServerAdapter(BasePlatformAdapter):

        # Load fallback provider chain so the API server platform has the
        # same fallback behaviour as Telegram/Discord/Slack (fixes #4954).
-        from hermes_agent.gateway.run import GatewayRunner
+        from gateway.run import GatewayRunner
        fallback_model = GatewayRunner._load_fallback_model()

        agent = AIAgent(
@@ -758,7 +577,7 @@ class APIServerAdapter(BasePlatformAdapter):
        dashboard can display full status without needing a shared PID file or
        /proc access.  No authentication required.
        """
-        from hermes_agent.gateway.status import read_runtime_status
+        from gateway.status import read_runtime_status

        runtime = read_runtime_status() or {}
        return web.json_response({
@@ -793,7 +612,7 @@ class APIServerAdapter(BasePlatformAdapter):
            ],
        })

-    async def _handle_chat_completions(self, request: "web.Request") -> "web.StreamResponse":
+    async def _handle_chat_completions(self, request: "web.Request") -> "web.Response":
        """POST /v1/chat/completions — OpenAI Chat Completions format."""
        auth_err = self._check_auth(request)
        if auth_err:
@@ -818,32 +637,26 @@ class APIServerAdapter(BasePlatformAdapter):
        system_prompt = None
        conversation_messages: List[Dict[str, str]] = []

-        for idx, msg in enumerate(messages):
+        for msg in messages:
            role = msg.get("role", "")
-            raw_content = msg.get("content", "")
+            content = _normalize_chat_content(msg.get("content", ""))
            if role == "system":
-                # System messages don't support images (Anthropic rejects, OpenAI
-                # text-model systems don't render them).  Flatten to text.
-                content = _normalize_chat_content(raw_content)
+                # Accumulate system messages
                if system_prompt is None:
                    system_prompt = content
                else:
                    system_prompt = system_prompt + "\n" + content
            elif role in ("user", "assistant"):
-                try:
-                    content = _normalize_multimodal_content(raw_content)
-                except ValueError as exc:
-                    return _multimodal_validation_error(exc, param=f"messages[{idx}].content")
                conversation_messages.append({"role": role, "content": content})

        # Extract the last user message as the primary input
-        user_message: Any = ""
+        user_message = ""
        history = []
        if conversation_messages:
            user_message = conversation_messages[-1].get("content", "")
            history = conversation_messages[:-1]

-        if not _content_has_visible_payload(user_message):
+        if not user_message:
            return web.json_response(
                {"error": {"message": "No user message found in messages", "type": "invalid_request_error"}},
                status=400,
@@ -939,7 +752,7 @@ class APIServerAdapter(BasePlatformAdapter):
                    return
                if name.startswith("_"):
                    return
-                from hermes_agent.agent.display import get_tool_emoji
+                from agent.display import get_tool_emoji
                emoji = get_tool_emoji(name)
                label = preview or name
                _stream_q.put(("__tool_progress__", {
@@ -1577,7 +1390,7 @@ class APIServerAdapter(BasePlatformAdapter):

        return response

-    async def _handle_responses(self, request: "web.Request") -> "web.StreamResponse":
+    async def _handle_responses(self, request: "web.Request") -> "web.Response":
        """POST /v1/responses — OpenAI Responses API format."""
        auth_err = self._check_auth(request)
        if auth_err:
@@ -1611,19 +1424,16 @@ class APIServerAdapter(BasePlatformAdapter):
            # No error if conversation doesn't exist yet — it's a new conversation

        # Normalize input to message list
-        input_messages: List[Dict[str, Any]] = []
+        input_messages: List[Dict[str, str]] = []
        if isinstance(raw_input, str):
            input_messages = [{"role": "user", "content": raw_input}]
        elif isinstance(raw_input, list):
-            for idx, item in enumerate(raw_input):
+            for item in raw_input:
                if isinstance(item, str):
                    input_messages.append({"role": "user", "content": item})
                elif isinstance(item, dict):
                    role = item.get("role", "user")
-                    try:
-                        content = _normalize_multimodal_content(item.get("content", ""))
-                    except ValueError as exc:
-                        return _multimodal_validation_error(exc, param=f"input[{idx}].content")
+                    content = _normalize_chat_content(item.get("content", ""))
                    input_messages.append({"role": role, "content": content})
        else:
            return web.json_response(_openai_error("'input' must be a string or array"), status=400)
@@ -1632,7 +1442,7 @@ class APIServerAdapter(BasePlatformAdapter):
        # This lets stateless clients supply their own history instead of
        # relying on server-side response chaining via previous_response_id.
        # Precedence: explicit conversation_history > previous_response_id.
-        conversation_history: List[Dict[str, Any]] = []
+        conversation_history: List[Dict[str, str]] = []
        raw_history = body.get("conversation_history")
        if raw_history:
            if not isinstance(raw_history, list):
@@ -1646,11 +1456,7 @@ class APIServerAdapter(BasePlatformAdapter):
                        _openai_error(f"conversation_history[{i}] must have 'role' and 'content' fields"),
                        status=400,
                    )
-                try:
-                    entry_content = _normalize_multimodal_content(entry["content"])
-                except ValueError as exc:
-                    return _multimodal_validation_error(exc, param=f"conversation_history[{i}].content")
-                conversation_history.append({"role": str(entry["role"]), "content": entry_content})
+                conversation_history.append({"role": str(entry["role"]), "content": str(entry["content"])})
            if previous_response_id:
                logger.debug("Both conversation_history and previous_response_id provided; using conversation_history")

@@ -1670,8 +1476,8 @@ class APIServerAdapter(BasePlatformAdapter):
            conversation_history.append(msg)

        # Last input message is the user_message
-        user_message: Any = input_messages[-1].get("content", "") if input_messages else ""
-        if not _content_has_visible_payload(user_message):
+        user_message = input_messages[-1].get("content", "") if input_messages else ""
+        if not user_message:
            return web.json_response(_openai_error("No user message found in input"), status=400)

        # Truncation support
@@ -1876,16 +1682,44 @@ class APIServerAdapter(BasePlatformAdapter):
    # Cron jobs API
    # ------------------------------------------------------------------

+    # Check cron module availability once (not per-request)
+    _CRON_AVAILABLE = False
+    try:
+        from cron.jobs import (
+            list_jobs as _cron_list,
+            get_job as _cron_get,
+            create_job as _cron_create,
+            update_job as _cron_update,
+            remove_job as _cron_remove,
+            pause_job as _cron_pause,
+            resume_job as _cron_resume,
+            trigger_job as _cron_trigger,
+        )
+        # Wrap as staticmethod to prevent descriptor binding — these are plain
+        # module functions, not instance methods.  Without this, self._cron_*()
+        # injects ``self`` as the first positional argument and every call
+        # raises TypeError.
+        _cron_list = staticmethod(_cron_list)
+        _cron_get = staticmethod(_cron_get)
+        _cron_create = staticmethod(_cron_create)
+        _cron_update = staticmethod(_cron_update)
+        _cron_remove = staticmethod(_cron_remove)
+        _cron_pause = staticmethod(_cron_pause)
+        _cron_resume = staticmethod(_cron_resume)
+        _cron_trigger = staticmethod(_cron_trigger)
+        _CRON_AVAILABLE = True
+    except ImportError:
+        pass
+
    _JOB_ID_RE = __import__("re").compile(r"[a-f0-9]{12}")
    # Allowed fields for update — prevents clients injecting arbitrary keys
    _UPDATE_ALLOWED_FIELDS = {"name", "schedule", "prompt", "deliver", "skills", "skill", "repeat", "enabled"}
    _MAX_NAME_LENGTH = 200
    _MAX_PROMPT_LENGTH = 5000

-    @staticmethod
-    def _check_jobs_available() -> Optional["web.Response"]:
+    def _check_jobs_available(self) -> Optional["web.Response"]:
        """Return error response if cron module isn't available."""
-        if not _CRON_AVAILABLE:
+        if not self._CRON_AVAILABLE:
            return web.json_response(
                {"error": "Cron module not available"}, status=501,
            )
@@ -1910,7 +1744,7 @@ class APIServerAdapter(BasePlatformAdapter):
            return cron_err
        try:
            include_disabled = request.query.get("include_disabled", "").lower() in ("true", "1")
-            jobs = _cron_list(include_disabled=include_disabled)
+            jobs = self._cron_list(include_disabled=include_disabled)
            return web.json_response({"jobs": jobs})
        except Exception as e:
            return web.json_response({"error": str(e)}, status=500)
@@ -1958,7 +1792,7 @@ class APIServerAdapter(BasePlatformAdapter):
            if repeat is not None:
                kwargs["repeat"] = repeat

-            job = _cron_create(**kwargs)
+            job = self._cron_create(**kwargs)
            return web.json_response({"job": job})
        except Exception as e:
            return web.json_response({"error": str(e)}, status=500)
@@ -1975,7 +1809,7 @@ class APIServerAdapter(BasePlatformAdapter):
        if id_err:
            return id_err
        try:
-            job = _cron_get(job_id)
+            job = self._cron_get(job_id)
            if not job:
                return web.json_response({"error": "Job not found"}, status=404)
            return web.json_response({"job": job})
@@ -2008,7 +1842,7 @@ class APIServerAdapter(BasePlatformAdapter):
                return web.json_response(
                    {"error": f"Prompt must be ≤ {self._MAX_PROMPT_LENGTH} characters"}, status=400,
                )
-            job = _cron_update(job_id, sanitized)
+            job = self._cron_update(job_id, sanitized)
            if not job:
                return web.json_response({"error": "Job not found"}, status=404)
            return web.json_response({"job": job})
@@ -2027,7 +1861,7 @@ class APIServerAdapter(BasePlatformAdapter):
        if id_err:
            return id_err
        try:
-            success = _cron_remove(job_id)
+            success = self._cron_remove(job_id)
            if not success:
                return web.json_response({"error": "Job not found"}, status=404)
            return web.json_response({"ok": True})
@@ -2046,7 +1880,7 @@ class APIServerAdapter(BasePlatformAdapter):
        if id_err:
            return id_err
        try:
-            job = _cron_pause(job_id)
+            job = self._cron_pause(job_id)
            if not job:
                return web.json_response({"error": "Job not found"}, status=404)
            return web.json_response({"job": job})
@@ -2065,7 +1899,7 @@ class APIServerAdapter(BasePlatformAdapter):
        if id_err:
            return id_err
        try:
-            job = _cron_resume(job_id)
+            job = self._cron_resume(job_id)
            if not job:
                return web.json_response({"error": "Job not found"}, status=404)
            return web.json_response({"job": job})
@@ -2084,7 +1918,7 @@ class APIServerAdapter(BasePlatformAdapter):
        if id_err:
            return id_err
        try:
-            job = _cron_trigger(job_id)
+            job = self._cron_trigger(job_id)
            if not job:
                return web.json_response({"error": "Job not found"}, status=404)
            return web.json_response({"job": job})
@@ -2471,6 +2305,10 @@ class APIServerAdapter(BasePlatformAdapter):

    async def connect(self) -> bool:
        """Start the aiohttp web server."""
+        if not AIOHTTP_AVAILABLE:
+            logger.warning("[%s] aiohttp not installed", self.name)
+            return False
+
        try:
            mws = [mw for mw in (cors_middleware, body_limit_middleware, security_headers_middleware) if mw is not None]
            self._app = web.Application(middlewares=mws)
@@ -2517,7 +2355,7 @@ class APIServerAdapter(BasePlatformAdapter):
            # Ported from openclaw/openclaw#64586.
            if is_network_accessible(self._host) and self._api_key:
                try:
-                    from hermes_agent.cli.auth.auth import has_usable_secret
+                    from hermes_cli.auth import has_usable_secret
                    if not has_usable_secret(self._api_key, min_length=8):
                        logger.error(
                            "[%s] Refusing to start: API_SERVER_KEY is set to a "
--- a/hermes_agent/gateway/platforms/base.py
+++ b/hermes_agent/gateway/platforms/base.py
@@ -6,7 +6,6 @@ and implement the required methods.
 """

 import asyncio
-import inspect
 import ipaddress
 import logging
 import os
@@ -19,8 +18,6 @@ import uuid
 from abc import ABC, abstractmethod
 from urllib.parse import urlsplit

-from hermes_agent.utils import normalize_proxy_url
-
 logger = logging.getLogger(__name__)


@@ -161,13 +158,13 @@ def resolve_proxy_url(platform_env_var: str | None = None) -> str | None:
    if platform_env_var:
        value = (os.environ.get(platform_env_var) or "").strip()
        if value:
-            return normalize_proxy_url(value)
+            return value
    for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
                "https_proxy", "http_proxy", "all_proxy"):
        value = (os.environ.get(key) or "").strip()
        if value:
-            return normalize_proxy_url(value)
-    return normalize_proxy_url(_detect_macos_system_proxy())
+            return value
+    return _detect_macos_system_proxy()


 def proxy_kwargs_for_bot(proxy_url: str | None) -> dict:
@@ -187,14 +184,16 @@ def proxy_kwargs_for_bot(proxy_url: str | None) -> dict:
    if proxy_url.lower().startswith("socks"):
        try:
            from aiohttp_socks import ProxyConnector
-        except ImportError:
-            raise ImportError(
-                "aiohttp-socks is required for SOCKS proxy support. "
-                "Install with: pip install hermes-agent[messaging]"
-            ) from None

-        connector = ProxyConnector.from_url(proxy_url, rdns=True)
-        return {"connector": connector}
+            connector = ProxyConnector.from_url(proxy_url, rdns=True)
+            return {"connector": connector}
+        except ImportError:
+            logger.warning(
+                "aiohttp_socks not installed — SOCKS proxy %s ignored. "
+                "Run: pip install aiohttp-socks",
+                proxy_url,
+            )
+            return {}
    return {"proxy": proxy_url}


@@ -218,14 +217,16 @@ def proxy_kwargs_for_aiohttp(proxy_url: str | None) -> tuple[dict, dict]:
    if proxy_url.lower().startswith("socks"):
        try:
            from aiohttp_socks import ProxyConnector
-        except ImportError:
-            raise ImportError(
-                "aiohttp-socks is required for SOCKS proxy support. "
-                "Install with: pip install hermes-agent[messaging]"
-            ) from None

-        connector = ProxyConnector.from_url(proxy_url, rdns=True)
-        return {"connector": connector}, {}
+            connector = ProxyConnector.from_url(proxy_url, rdns=True)
+            return {"connector": connector}, {}
+        except ImportError:
+            logger.warning(
+                "aiohttp_socks not installed — SOCKS proxy %s ignored. "
+                "Run: pip install aiohttp-socks",
+                proxy_url,
+            )
+            return {}, {}
    return {}, {"proxy": proxy_url}


@@ -235,9 +236,12 @@ from pathlib import Path
 from typing import Dict, List, Optional, Any, Callable, Awaitable, Tuple
 from enum import Enum

-from hermes_agent.gateway.config import Platform, PlatformConfig
-from hermes_agent.gateway.session import SessionSource, build_session_key
-from hermes_agent.constants import get_hermes_dir
+from pathlib import Path as _Path
+sys.path.insert(0, str(_Path(__file__).resolve().parents[2]))
+
+from gateway.config import Platform, PlatformConfig
+from gateway.session import SessionSource, build_session_key
+from hermes_constants import get_hermes_dir


 GATEWAY_SECRET_CAPTURE_UNSUPPORTED_MESSAGE = (
@@ -293,7 +297,7 @@ async def _ssrf_redirect_guard(response):
    """
    if response.is_redirect and response.next_request:
        redirect_url = str(response.next_request.url)
-        from hermes_agent.tools.security.urls import is_safe_url
+        from tools.url_safety import is_safe_url
        if not is_safe_url(redirect_url):
            raise ValueError(
                f"Blocked redirect to private/internal address: {safe_url_for_log(redirect_url)}"
@@ -382,13 +386,16 @@ async def cache_image_from_url(url: str, ext: str = ".jpg", retries: int = 2) ->
    Raises:
        ValueError: If the URL targets a private/internal network (SSRF protection).
    """
-    from hermes_agent.tools.security.urls import is_safe_url
+    from tools.url_safety import is_safe_url
    if not is_safe_url(url):
        raise ValueError(f"Blocked unsafe URL (SSRF protection): {safe_url_for_log(url)}")

+    import asyncio
    import httpx
-    _log = logging.getLogger(__name__)
+    import logging as _logging
+    _log = _logging.getLogger(__name__)

+    last_exc = None
    async with httpx.AsyncClient(
        timeout=30.0,
        follow_redirects=True,
@@ -406,6 +413,7 @@ async def cache_image_from_url(url: str, ext: str = ".jpg", retries: int = 2) ->
                response.raise_for_status()
                return cache_image_from_bytes(response.content, ext)
            except (httpx.TimeoutException, httpx.HTTPStatusError) as exc:
+                last_exc = exc
                if isinstance(exc, httpx.HTTPStatusError) and exc.response.status_code < 429:
                    raise
                if attempt < retries:
@@ -421,7 +429,7 @@ async def cache_image_from_url(url: str, ext: str = ".jpg", retries: int = 2) ->
                    await asyncio.sleep(wait)
                    continue
                raise
-    raise AssertionError("unreachable: retry loop exhausted")
+    raise last_exc


 def cleanup_image_cache(max_age_hours: int = 24) -> int:
@@ -497,13 +505,16 @@ async def cache_audio_from_url(url: str, ext: str = ".ogg", retries: int = 2) ->
    Raises:
        ValueError: If the URL targets a private/internal network (SSRF protection).
    """
-    from hermes_agent.tools.security.urls import is_safe_url
+    from tools.url_safety import is_safe_url
    if not is_safe_url(url):
        raise ValueError(f"Blocked unsafe URL (SSRF protection): {safe_url_for_log(url)}")

+    import asyncio
    import httpx
-    _log = logging.getLogger(__name__)
+    import logging as _logging
+    _log = _logging.getLogger(__name__)

+    last_exc = None
    async with httpx.AsyncClient(
        timeout=30.0,
        follow_redirects=True,
@@ -521,6 +532,7 @@ async def cache_audio_from_url(url: str, ext: str = ".ogg", retries: int = 2) ->
                response.raise_for_status()
                return cache_audio_from_bytes(response.content, ext)
            except (httpx.TimeoutException, httpx.HTTPStatusError) as exc:
+                last_exc = exc
                if isinstance(exc, httpx.HTTPStatusError) and exc.response.status_code < 429:
                    raise
                if attempt < retries:
@@ -536,40 +548,7 @@ async def cache_audio_from_url(url: str, ext: str = ".ogg", retries: int = 2) ->
                    await asyncio.sleep(wait)
                    continue
                raise
-    raise AssertionError("unreachable: retry loop exhausted")
-
-
-# ---------------------------------------------------------------------------
-# Video cache utilities
-#
-# Same pattern as image/audio cache -- videos from platforms are downloaded
-# here so the agent can reference them by local file path.
-# ---------------------------------------------------------------------------
-
-VIDEO_CACHE_DIR = get_hermes_dir("cache/videos", "video_cache")
-
-SUPPORTED_VIDEO_TYPES = {
-    ".mp4": "video/mp4",
-    ".mov": "video/quicktime",
-    ".webm": "video/webm",
-    ".mkv": "video/x-matroska",
-    ".avi": "video/x-msvideo",
-}
-
-
-def get_video_cache_dir() -> Path:
-    """Return the video cache directory, creating it if it doesn't exist."""
-    VIDEO_CACHE_DIR.mkdir(parents=True, exist_ok=True)
-    return VIDEO_CACHE_DIR
-
-
-def cache_video_from_bytes(data: bytes, ext: str = ".mp4") -> str:
-    """Save raw video bytes to the cache and return the absolute file path."""
-    cache_dir = get_video_cache_dir()
-    filename = f"video_{uuid.uuid4().hex[:12]}{ext}"
-    filepath = cache_dir / filename
-    filepath.write_bytes(data)
-    return str(filepath)
+    raise last_exc


 # ---------------------------------------------------------------------------
@@ -690,15 +669,6 @@ class MessageEvent:
    # Original platform data
    raw_message: Any = None
    message_id: Optional[str] = None
-
-    # Platform-specific update identifier.  For Telegram this is the
-    # ``update_id`` from the PTB Update wrapper; other platforms currently
-    # ignore it.  Used by ``/restart`` to record the triggering update so the
-    # new gateway can advance the Telegram offset past it and avoid processing
-    # the same ``/restart`` twice if PTB's graceful-shutdown ACK times out
-    # ("Error while calling `get_updates` one more time to mark all fetched
-    # updates" in gateway.log).
-    platform_update_id: Optional[int] = None
    
    # Media attachments
    # media_urls: local file paths (for vision tool access)
@@ -901,11 +871,10 @@ class BasePlatformAdapter(ABC):
        # working on a task after --replace or manual restarts.
        self._background_tasks: set[asyncio.Task] = set()
        # One-shot callbacks to fire after the main response is delivered.
-        # Keyed by session_key. Values are either a bare callback (legacy) or
-        # a ``(generation, callback)`` tuple so GatewayRunner can make deferred
-        # deliveries generation-aware and avoid stale runs clearing callbacks
-        # registered by a fresher run for the same session.
-        self._post_delivery_callbacks: Dict[str, Any] = {}
+        # Keyed by session_key.  GatewayRunner uses this to defer
+        # background-review notifications ("💾 Skill created") until the
+        # primary reply has been sent.
+        self._post_delivery_callbacks: Dict[str, Callable] = {}
        self._expected_cancelled_tasks: set[asyncio.Task] = set()
        self._busy_session_handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]] = None
        # Chats where auto-TTS on voice input is disabled (set by /voice off)
@@ -939,7 +908,7 @@ class BasePlatformAdapter(ABC):
        self._fatal_error_message = None
        self._fatal_error_retryable = True
        try:
-            from hermes_agent.gateway.status import write_runtime_status
+            from gateway.status import write_runtime_status
            write_runtime_status(platform=self.platform.value, platform_state="connected", error_code=None, error_message=None)
        except Exception:
            pass
@@ -949,7 +918,7 @@ class BasePlatformAdapter(ABC):
        if self.has_fatal_error:
            return
        try:
-            from hermes_agent.gateway.status import write_runtime_status
+            from gateway.status import write_runtime_status
            write_runtime_status(platform=self.platform.value, platform_state="disconnected", error_code=None, error_message=None)
        except Exception:
            pass
@@ -960,7 +929,7 @@ class BasePlatformAdapter(ABC):
        self._fatal_error_message = message
        self._fatal_error_retryable = retryable
        try:
-            from hermes_agent.gateway.status import write_runtime_status
+            from gateway.status import write_runtime_status
            write_runtime_status(
                platform=self.platform.value,
                platform_state="fatal",
@@ -980,7 +949,7 @@ class BasePlatformAdapter(ABC):

    def _acquire_platform_lock(self, scope: str, identity: str, resource_desc: str) -> bool:
        """Acquire a scoped lock for this adapter. Returns True on success."""
-        from hermes_agent.gateway.status import acquire_scoped_lock
+        from gateway.status import acquire_scoped_lock
        self._platform_lock_scope = scope
        self._platform_lock_identity = identity
        acquired, existing = acquire_scoped_lock(
@@ -1003,7 +972,7 @@ class BasePlatformAdapter(ABC):
        identity = getattr(self, '_platform_lock_identity', None)
        if not identity:
            return
-        from hermes_agent.gateway.status import release_scoped_lock
+        from gateway.status import release_scoped_lock
        release_scoped_lock(self._platform_lock_scope, identity)
        self._platform_lock_identity = None

@@ -1076,40 +1045,16 @@ class BasePlatformAdapter(ABC):
        """
        pass

-    # Default: the adapter treats ``finalize=True`` on edit_message as a
-    # no-op and is happy to have the stream consumer skip redundant final
-    # edits.  Subclasses that *require* an explicit finalize call to close
-    # out the message lifecycle (e.g. rich card / AI assistant surfaces
-    # such as DingTalk AI Cards) override this to True (class attribute or
-    # property) so the stream consumer knows not to short-circuit.
-    REQUIRES_EDIT_FINALIZE: bool = False
-
    async def edit_message(
        self,
        chat_id: str,
        message_id: str,
        content: str,
-        *,
-        finalize: bool = False,
    ) -> SendResult:
        """
        Edit a previously sent message. Optional — platforms that don't
        support editing return success=False and callers fall back to
        sending a new message.
-
-        ``finalize`` signals that this is the last edit in a streaming
-        sequence.  Most platforms (Telegram, Slack, Discord, Matrix,
-        etc.) treat it as a no-op because their edit APIs have no notion
-        of message lifecycle state — an edit is an edit.  Platforms that
-        render streaming updates with a distinct "in progress" state and
-        require explicit closure (e.g. rich card / AI assistant surfaces
-        such as DingTalk AI Cards) use it to finalize the message and
-        transition the UI out of the streaming indicator — those should
-        also set ``REQUIRES_EDIT_FINALIZE = True`` so callers route a
-        final edit through even when content is unchanged.  Callers
-        should set ``finalize=True`` on the final edit of a streamed
-        response (typically when ``got_done`` fires in the stream
-        consumer) and leave it ``False`` on intermediate edits.
        """
        return SendResult(success=False, error="Not supported")

@@ -1338,7 +1283,7 @@ class BasePlatformAdapter(ABC):
        # Extract MEDIA:<path> tags, allowing optional whitespace after the colon
        # and quoted/backticked paths for LLM-formatted outputs.
        media_pattern = re.compile(
-            r'''[`"']?MEDIA:\s*(?P<path>`[^`\n]+`|"[^"\n]+"|'[^'\n]+'|(?:~/|/)\S+(?:[^\S\n]+\S+)*?\.(?:png|jpe?g|gif|webp|mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|pdf)(?=[\s`"',;:)\]}]|$)|\S+)[`"']?'''
+            r'''[`"']?MEDIA:\s*(?P<path>`[^`\n]+`|"[^"\n]+"|'[^'\n]+'|(?:~/|/)\S+(?:[^\S\n]+\S+)*?\.(?:png|jpe?g|gif|webp|mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a)(?=[\s`"',;:)\]}]|$)|\S+)[`"']?'''
        )
        for match in media_pattern.finditer(content):
            path = match.group("path").strip()
@@ -1423,13 +1368,7 @@ class BasePlatformAdapter(ABC):

        return paths, cleaned

-    async def _keep_typing(
-        self,
-        chat_id: str,
-        interval: float = 2.0,
-        metadata=None,
-        stop_event: asyncio.Event | None = None,
-    ) -> None:
+    async def _keep_typing(self, chat_id: str, interval: float = 2.0, metadata=None) -> None:
        """
        Continuously send typing indicator until cancelled.
        
@@ -1443,18 +1382,9 @@ class BasePlatformAdapter(ABC):
        """
        try:
            while True:
-                if stop_event is not None and stop_event.is_set():
-                    return
                if chat_id not in self._typing_paused:
                    await self.send_typing(chat_id, metadata=metadata)
-                if stop_event is None:
-                    await asyncio.sleep(interval)
-                    continue
-                try:
-                    await asyncio.wait_for(stop_event.wait(), timeout=interval)
-                except asyncio.TimeoutError:
-                    continue
-                return
+                await asyncio.sleep(interval)
        except asyncio.CancelledError:
            pass  # Normal cancellation when handler completes
        finally:
@@ -1481,59 +1411,6 @@ class BasePlatformAdapter(ABC):
        """Resume typing indicator for a chat after approval resolves."""
        self._typing_paused.discard(chat_id)

-    async def interrupt_session_activity(self, session_key: str, chat_id: str) -> None:
-        """Signal the active session loop to stop and clear typing immediately."""
-        if session_key:
-            interrupt_event = self._active_sessions.get(session_key)
-            if interrupt_event is not None:
-                interrupt_event.set()
-        try:
-            await self.stop_typing(chat_id)
-        except Exception:
-            pass
-
-    def register_post_delivery_callback(
-        self,
-        session_key: str,
-        callback: Callable,
-        *,
-        generation: int | None = None,
-    ) -> None:
-        """Register a deferred callback to fire after the main response.
-
-        ``generation`` lets callers tie the callback to a specific gateway run
-        generation so stale runs cannot clear callbacks owned by a fresher run.
-        """
-        if not session_key or not callable(callback):
-            return
-        if generation is None:
-            self._post_delivery_callbacks[session_key] = callback
-        else:
-            self._post_delivery_callbacks[session_key] = (int(generation), callback)
-
-    def pop_post_delivery_callback(
-        self,
-        session_key: str,
-        *,
-        generation: int | None = None,
-    ) -> Callable | None:
-        """Pop a deferred callback, optionally requiring generation ownership."""
-        if not session_key:
-            return None
-        entry = self._post_delivery_callbacks.get(session_key)
-        if entry is None:
-            return None
-        if isinstance(entry, tuple) and len(entry) == 2:
-            entry_generation, callback = entry
-            if generation is not None and int(entry_generation) != int(generation):
-                return None
-            self._post_delivery_callbacks.pop(session_key, None)
-            return callback if callable(callback) else None
-        if generation is not None:
-            return None
-        self._post_delivery_callbacks.pop(session_key, None)
-        return entry if callable(entry) else None
-
    # ── Processing lifecycle hooks ──────────────────────────────────────────
    # Subclasses override these to react to message processing events
    # (e.g. Discord adds 👀/✅/❌ reactions).
@@ -1702,9 +1579,7 @@ class BasePlatformAdapter(ABC):
            # session lifecycle and its cleanup races with the running task
            # (see PR #4926).
            cmd = event.get_command()
-            from hermes_agent.cli.commands import should_bypass_active_session
-
-            if should_bypass_active_session(cmd):
+            if cmd in ("approve", "deny", "status", "stop", "new", "reset", "background", "restart", "queue", "q"):
                logger.debug(
                    "[%s] Command '/%s' bypassing active-session guard for %s",
                    self.name, cmd, session_key,
@@ -1774,6 +1649,8 @@ class BasePlatformAdapter(ABC):
          HERMES_HUMAN_DELAY_MIN_MS: minimum delay in ms (default 800, custom mode)
          HERMES_HUMAN_DELAY_MAX_MS: maximum delay in ms (default 2500, custom mode)
        """
+        import random
+
        mode = os.getenv("HERMES_HUMAN_DELAY_MODE", "off").lower()
        if mode == "off":
            return 0.0
@@ -1802,32 +1679,16 @@ class BasePlatformAdapter(ABC):
        # Fall back to a new Event only if the entry was removed externally.
        interrupt_event = self._active_sessions.get(session_key) or asyncio.Event()
        self._active_sessions[session_key] = interrupt_event
-        callback_generation = getattr(interrupt_event, "_hermes_run_generation", None)
        
        # Start continuous typing indicator (refreshes every 2 seconds)
        _thread_metadata = {"thread_id": event.source.thread_id} if event.source.thread_id else None
-        _keep_typing_kwargs = {"metadata": _thread_metadata}
-        try:
-            _keep_typing_sig = inspect.signature(self._keep_typing)
-        except (TypeError, ValueError):
-            _keep_typing_sig = None
-        if _keep_typing_sig is None or "stop_event" in _keep_typing_sig.parameters:
-            _keep_typing_kwargs["stop_event"] = interrupt_event
-        typing_task = asyncio.create_task(
-            self._keep_typing(
-                event.source.chat_id,
-                **_keep_typing_kwargs,
-            )
-        )
+        typing_task = asyncio.create_task(self._keep_typing(event.source.chat_id, metadata=_thread_metadata))
        
        try:
            await self._run_processing_hook("on_processing_start", event)

-            handler = self._message_handler
-            if handler is None:
-                return
-
-            response = await handler(event)
+            # Call the handler (this can take a while with tool calls)
+            response = await self._message_handler(event)
            
            # Send response if any.  A None/empty response is normal when
            # streaming already delivered the text (already_sent=True) or
@@ -1876,7 +1737,7 @@ class BasePlatformAdapter(ABC):
                        and not media_files
                        and event.source.chat_id not in self._auto_tts_disabled_chats):
                    try:
-                        from hermes_agent.tools.media.tts import text_to_speech_tool, check_tts_requirements
+                        from tools.tts_tool import text_to_speech_tool, check_tts_requirements
                        if check_tts_requirements():
                            import json as _json
                            speech_text = re.sub(r'[*_`#\[\]()]', '', text_content)[:4000].strip()
@@ -2030,18 +1891,9 @@ class BasePlatformAdapter(ABC):
            if session_key in self._pending_messages:
                pending_event = self._pending_messages.pop(session_key)
                logger.debug("[%s] Processing queued message from interrupt", self.name)
-                # Keep the _active_sessions entry live across the turn chain
-                # and only CLEAR the interrupt Event — do NOT delete the entry.
-                # If we deleted here, a concurrent inbound message arriving
-                # during the awaits below would pass the Level-1 guard, spawn
-                # its own _process_message_background, and run simultaneously
-                # with the recursive drain below.  Two agents on one
-                # session_key = duplicate responses, duplicate tool calls.
-                # Clearing the Event keeps the guard live so follow-ups take
-                # the busy-handler path (queue + interrupt) as intended.
-                _active = self._active_sessions.get(session_key)
-                if _active is not None:
-                    _active.clear()
+                # Clean up current session before processing pending
+                if session_key in self._active_sessions:
+                    del self._active_sessions[session_key]
                typing_task.cancel()
                try:
                    await typing_task
@@ -2080,14 +1932,7 @@ class BasePlatformAdapter(ABC):
        finally:
            # Fire any one-shot post-delivery callback registered for this
            # session (e.g. deferred background-review notifications).
-            _callback_generation = callback_generation
-            if hasattr(self, "pop_post_delivery_callback"):
-                _post_cb = self.pop_post_delivery_callback(
-                    session_key,
-                    generation=_callback_generation,
-                )
-            else:
-                _post_cb = getattr(self, "_post_delivery_callbacks", {}).pop(session_key, None)
+            _post_cb = getattr(self, "_post_delivery_callbacks", {}).pop(session_key, None)
            if callable(_post_cb):
                try:
                    _post_cb()
@@ -2106,37 +1951,9 @@ class BasePlatformAdapter(ABC):
                    await self.stop_typing(event.source.chat_id)
            except Exception:
                pass
-            # Late-arrival drain: a message may have arrived during the
-            # cleanup awaits above (typing_task cancel, stop_typing).  Such
-            # messages passed the Level-1 guard (entry still live, Event
-            # possibly set) and landed in _pending_messages via the
-            # busy-handler path.  Without this block, we would delete the
-            # active-session entry and the queued message would be silently
-            # dropped (user never gets a reply).
-            late_pending = self._pending_messages.pop(session_key, None)
-            if late_pending is not None:
-                logger.debug(
-                    "[%s] Late-arrival pending message during cleanup — spawning drain task",
-                    self.name,
-                )
-                _active = self._active_sessions.get(session_key)
-                if _active is not None:
-                    _active.clear()
-                drain_task = asyncio.create_task(
-                    self._process_message_background(late_pending, session_key)
-                )
-                try:
-                    self._background_tasks.add(drain_task)
-                    drain_task.add_done_callback(self._background_tasks.discard)
-                except TypeError:
-                    # Tests stub create_task() with non-hashable sentinels; tolerate.
-                    pass
-                # Leave _active_sessions[session_key] populated — the drain
-                # task's own lifecycle will clean it up.
-            else:
-                # Clean up session tracking
-                if session_key in self._active_sessions:
-                    del self._active_sessions[session_key]
+            # Clean up session tracking
+            if session_key in self._active_sessions:
+                del self._active_sessions[session_key]
    
    async def cancel_background_tasks(self) -> None:
        """Cancel any in-flight background message-processing tasks.
@@ -2144,26 +1961,12 @@ class BasePlatformAdapter(ABC):
        Used during gateway shutdown/replacement so active sessions from the old
        process do not keep running after adapters are being torn down.
        """
-        # Loop until no new tasks appear.  Without this, a message
-        # arriving during the `await asyncio.gather` below would spawn
-        # a fresh _process_message_background task (added to
-        # self._background_tasks at line ~1668 via handle_message),
-        # and the _background_tasks.clear() at the end of this method
-        # would drop the reference — the task runs untracked against a
-        # disconnecting adapter, logs send-failures, and may linger
-        # until it completes on its own.  Retrying the drain until the
-        # task set stabilizes closes the window.
-        MAX_DRAIN_ROUNDS = 5
-        for _ in range(MAX_DRAIN_ROUNDS):
-            tasks = [task for task in self._background_tasks if not task.done()]
-            if not tasks:
-                break
-            for task in tasks:
-                self._expected_cancelled_tasks.add(task)
-                task.cancel()
+        tasks = [task for task in self._background_tasks if not task.done()]
+        for task in tasks:
+            self._expected_cancelled_tasks.add(task)
+            task.cancel()
+        if tasks:
            await asyncio.gather(*tasks, return_exceptions=True)
-            # Loop: late-arrival tasks spawned during the gather above
-            # will be in self._background_tasks now.  Re-check.
        self._background_tasks.clear()
        self._expected_cancelled_tasks.clear()
        self._pending_messages.clear()
@@ -2188,7 +1991,6 @@ class BasePlatformAdapter(ABC):
        chat_topic: Optional[str] = None,
        user_id_alt: Optional[str] = None,
        chat_id_alt: Optional[str] = None,
-        is_bot: bool = False,
    ) -> SessionSource:
        """Helper to build a SessionSource for this platform."""
        # Normalize empty topic to None
@@ -2205,7 +2007,6 @@ class BasePlatformAdapter(ABC):
            chat_topic=chat_topic.strip() if chat_topic else None,
            user_id_alt=user_id_alt,
            chat_id_alt=chat_id_alt,
-            is_bot=is_bot,
        )
    
    @abstractmethod
--- a/hermes_agent/gateway/platforms/bluebubbles.py
+++ b/hermes_agent/gateway/platforms/bluebubbles.py
@@ -14,14 +14,14 @@ import logging
 import os
 import re
 import uuid
-from datetime import datetime, timezone
+from datetime import datetime
 from typing import Any, Dict, List, Optional
 from urllib.parse import quote

 import httpx

-from hermes_agent.gateway.config import Platform, PlatformConfig
-from hermes_agent.gateway.platforms.base import (
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import (
    BasePlatformAdapter,
    MessageEvent,
    MessageType,
@@ -30,7 +30,7 @@ from hermes_agent.gateway.platforms.base import (
    cache_audio_from_bytes,
    cache_document_from_bytes,
 )
-from hermes_agent.gateway.platforms.helpers import strip_markdown
+from gateway.platforms.helpers import strip_markdown

 logger = logging.getLogger(__name__)

@@ -75,7 +75,7 @@ def _redact(text: str) -> str:
 def check_bluebubbles_requirements() -> bool:
    try:
        import aiohttp  # noqa: F401
-        import httpx  # noqa: F401
+        import httpx as _httpx  # noqa: F401
    except ImportError:
        return False
    return True
@@ -377,7 +377,7 @@ class BlueBubblesAdapter(BasePlatformAdapter):
        payload = {
            "addresses": [address],
            "message": message,
-            "tempGuid": f"temp-{datetime.now(timezone.utc).timestamp()}",
+            "tempGuid": f"temp-{datetime.utcnow().timestamp()}",
        }
        try:
            res = await self._api_post("/api/v1/chat/new", payload)
@@ -417,7 +417,7 @@ class BlueBubblesAdapter(BasePlatformAdapter):
                )
            payload: Dict[str, Any] = {
                "chatGuid": guid,
-                "tempGuid": f"temp-{datetime.now(timezone.utc).timestamp()}",
+                "tempGuid": f"temp-{datetime.utcnow().timestamp()}",
                "message": chunk,
            }
            if reply_to and self._private_api_enabled and self._helper_connected:
@@ -502,7 +502,7 @@ class BlueBubblesAdapter(BasePlatformAdapter):
        metadata: Optional[Dict[str, Any]] = None,
    ) -> SendResult:
        try:
-            from hermes_agent.gateway.platforms.base import cache_image_from_url
+            from gateway.platforms.base import cache_image_from_url

            local_path = await cache_image_from_url(image_url)
            return await self._send_attachment(chat_id, local_path, caption=caption)
--- a/gateway/platforms/dingtalk.py
+++ b/gateway/platforms/dingtalk.py
@@ -0,0 +1,347 @@
+"""
+DingTalk platform adapter using Stream Mode.
+
+Uses dingtalk-stream SDK for real-time message reception without webhooks.
+Responses are sent via DingTalk's session webhook (markdown format).
+
+Requires:
+    pip install dingtalk-stream httpx
+    DINGTALK_CLIENT_ID and DINGTALK_CLIENT_SECRET env vars
+
+Configuration in config.yaml:
+    platforms:
+      dingtalk:
+        enabled: true
+        extra:
+          client_id: "your-app-key"      # or DINGTALK_CLIENT_ID env var
+          client_secret: "your-secret"   # or DINGTALK_CLIENT_SECRET env var
+"""
+
+import asyncio
+import logging
+import os
+import re
+import uuid
+from datetime import datetime, timezone
+from typing import Any, Dict, Optional
+
+try:
+    import dingtalk_stream
+    from dingtalk_stream import ChatbotHandler, ChatbotMessage
+    DINGTALK_STREAM_AVAILABLE = True
+except ImportError:
+    DINGTALK_STREAM_AVAILABLE = False
+    dingtalk_stream = None  # type: ignore[assignment]
+
+try:
+    import httpx
+    HTTPX_AVAILABLE = True
+except ImportError:
+    HTTPX_AVAILABLE = False
+    httpx = None  # type: ignore[assignment]
+
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.helpers import MessageDeduplicator
+from gateway.platforms.base import (
+    BasePlatformAdapter,
+    MessageEvent,
+    MessageType,
+    SendResult,
+)
+
+logger = logging.getLogger(__name__)
+
+MAX_MESSAGE_LENGTH = 20000
+RECONNECT_BACKOFF = [2, 5, 10, 30, 60]
+_SESSION_WEBHOOKS_MAX = 500
+_DINGTALK_WEBHOOK_RE = re.compile(r'^https://(?:api|oapi)\.dingtalk\.com/')
+
+
+def check_dingtalk_requirements() -> bool:
+    """Check if DingTalk dependencies are available and configured."""
+    if not DINGTALK_STREAM_AVAILABLE or not HTTPX_AVAILABLE:
+        return False
+    if not os.getenv("DINGTALK_CLIENT_ID") or not os.getenv("DINGTALK_CLIENT_SECRET"):
+        return False
+    return True
+
+
+class DingTalkAdapter(BasePlatformAdapter):
+    """DingTalk chatbot adapter using Stream Mode.
+
+    The dingtalk-stream SDK maintains a long-lived WebSocket connection.
+    Incoming messages arrive via a ChatbotHandler callback. Replies are
+    sent via the incoming message's session_webhook URL using httpx.
+    """
+
+    MAX_MESSAGE_LENGTH = MAX_MESSAGE_LENGTH
+
+    def __init__(self, config: PlatformConfig):
+        super().__init__(config, Platform.DINGTALK)
+
+        extra = config.extra or {}
+        self._client_id: str = extra.get("client_id") or os.getenv("DINGTALK_CLIENT_ID", "")
+        self._client_secret: str = extra.get("client_secret") or os.getenv("DINGTALK_CLIENT_SECRET", "")
+
+        self._stream_client: Any = None
+        self._stream_task: Optional[asyncio.Task] = None
+        self._http_client: Optional["httpx.AsyncClient"] = None
+
+        # Message deduplication
+        self._dedup = MessageDeduplicator(max_size=1000)
+        # Map chat_id -> session_webhook for reply routing
+        self._session_webhooks: Dict[str, str] = {}
+
+    # -- Connection lifecycle -----------------------------------------------
+
+    async def connect(self) -> bool:
+        """Connect to DingTalk via Stream Mode."""
+        if not DINGTALK_STREAM_AVAILABLE:
+            logger.warning("[%s] dingtalk-stream not installed. Run: pip install dingtalk-stream", self.name)
+            return False
+        if not HTTPX_AVAILABLE:
+            logger.warning("[%s] httpx not installed. Run: pip install httpx", self.name)
+            return False
+        if not self._client_id or not self._client_secret:
+            logger.warning("[%s] DINGTALK_CLIENT_ID and DINGTALK_CLIENT_SECRET required", self.name)
+            return False
+
+        try:
+            self._http_client = httpx.AsyncClient(timeout=30.0)
+
+            credential = dingtalk_stream.Credential(self._client_id, self._client_secret)
+            self._stream_client = dingtalk_stream.DingTalkStreamClient(credential)
+
+            # Capture the current event loop for cross-thread dispatch
+            loop = asyncio.get_running_loop()
+            handler = _IncomingHandler(self, loop)
+            self._stream_client.register_callback_handler(
+                dingtalk_stream.ChatbotMessage.TOPIC, handler
+            )
+
+            self._stream_task = asyncio.create_task(self._run_stream())
+            self._mark_connected()
+            logger.info("[%s] Connected via Stream Mode", self.name)
+            return True
+        except Exception as e:
+            logger.error("[%s] Failed to connect: %s", self.name, e)
+            return False
+
+    async def _run_stream(self) -> None:
+        """Run the stream client with auto-reconnection."""
+        backoff_idx = 0
+        while self._running:
+            try:
+                logger.debug("[%s] Starting stream client...", self.name)
+                await self._stream_client.start()
+            except asyncio.CancelledError:
+                return
+            except Exception as e:
+                if not self._running:
+                    return
+                logger.warning("[%s] Stream client error: %s", self.name, e)
+
+            if not self._running:
+                return
+
+            delay = RECONNECT_BACKOFF[min(backoff_idx, len(RECONNECT_BACKOFF) - 1)]
+            logger.info("[%s] Reconnecting in %ds...", self.name, delay)
+            await asyncio.sleep(delay)
+            backoff_idx += 1
+
+    async def disconnect(self) -> None:
+        """Disconnect from DingTalk."""
+        self._running = False
+        self._mark_disconnected()
+
+        if self._stream_task:
+            self._stream_task.cancel()
+            try:
+                await self._stream_task
+            except asyncio.CancelledError:
+                pass
+            self._stream_task = None
+
+        if self._http_client:
+            await self._http_client.aclose()
+            self._http_client = None
+
+        self._stream_client = None
+        self._session_webhooks.clear()
+        self._dedup.clear()
+        logger.info("[%s] Disconnected", self.name)
+
+    # -- Inbound message processing -----------------------------------------
+
+    async def _on_message(self, message: "ChatbotMessage") -> None:
+        """Process an incoming DingTalk chatbot message."""
+        msg_id = getattr(message, "message_id", None) or uuid.uuid4().hex
+        if self._dedup.is_duplicate(msg_id):
+            logger.debug("[%s] Duplicate message %s, skipping", self.name, msg_id)
+            return
+
+        text = self._extract_text(message)
+        if not text:
+            logger.debug("[%s] Empty message, skipping", self.name)
+            return
+
+        # Chat context
+        conversation_id = getattr(message, "conversation_id", "") or ""
+        conversation_type = getattr(message, "conversation_type", "1")
+        is_group = str(conversation_type) == "2"
+        sender_id = getattr(message, "sender_id", "") or ""
+        sender_nick = getattr(message, "sender_nick", "") or sender_id
+        sender_staff_id = getattr(message, "sender_staff_id", "") or ""
+
+        chat_id = conversation_id or sender_id
+        chat_type = "group" if is_group else "dm"
+
+        # Store session webhook for reply routing (validate origin to prevent SSRF)
+        session_webhook = getattr(message, "session_webhook", None) or ""
+        if session_webhook and chat_id and _DINGTALK_WEBHOOK_RE.match(session_webhook):
+            if len(self._session_webhooks) >= _SESSION_WEBHOOKS_MAX:
+                # Evict oldest entry to cap memory growth
+                try:
+                    self._session_webhooks.pop(next(iter(self._session_webhooks)))
+                except StopIteration:
+                    pass
+            self._session_webhooks[chat_id] = session_webhook
+
+        source = self.build_source(
+            chat_id=chat_id,
+            chat_name=getattr(message, "conversation_title", None),
+            chat_type=chat_type,
+            user_id=sender_id,
+            user_name=sender_nick,
+            user_id_alt=sender_staff_id if sender_staff_id else None,
+        )
+
+        # Parse timestamp
+        create_at = getattr(message, "create_at", None)
+        try:
+            timestamp = datetime.fromtimestamp(int(create_at) / 1000, tz=timezone.utc) if create_at else datetime.now(tz=timezone.utc)
+        except (ValueError, OSError, TypeError):
+            timestamp = datetime.now(tz=timezone.utc)
+
+        event = MessageEvent(
+            text=text,
+            message_type=MessageType.TEXT,
+            source=source,
+            message_id=msg_id,
+            raw_message=message,
+            timestamp=timestamp,
+        )
+
+        logger.debug("[%s] Message from %s in %s: %s",
+                      self.name, sender_nick, chat_id[:20] if chat_id else "?", text[:50])
+        await self.handle_message(event)
+
+    @staticmethod
+    def _extract_text(message: "ChatbotMessage") -> str:
+        """Extract plain text from a DingTalk chatbot message.
+
+        Handles both legacy and current dingtalk-stream SDK payload shapes:
+          * legacy: ``message.text`` was a dict ``{"content": "..."}``
+          * >= 0.20: ``message.text`` is a ``TextContent`` dataclass whose
+            ``__str__`` returns ``"TextContent(content=...)"`` — never fall
+            back to ``str(text)`` without extracting ``.content`` first.
+          * rich text moved from ``message.rich_text`` (list) to
+            ``message.rich_text_content.rich_text_list`` (list of dicts).
+        """
+        text = getattr(message, "text", None)
+        content = ""
+        if text is not None:
+            if isinstance(text, dict):
+                content = (text.get("content") or "").strip()
+            elif hasattr(text, "content"):
+                content = str(text.content or "").strip()
+            else:
+                content = str(text).strip()
+
+        if not content:
+            rich_list = None
+            rtc = getattr(message, "rich_text_content", None)
+            if rtc is not None and hasattr(rtc, "rich_text_list"):
+                rich_list = rtc.rich_text_list
+            if rich_list is None:
+                rich_list = getattr(message, "rich_text", None)
+            if rich_list and isinstance(rich_list, list):
+                parts = [item["text"] for item in rich_list
+                         if isinstance(item, dict) and item.get("text")]
+                content = " ".join(parts).strip()
+        return content
+
+    # -- Outbound messaging -------------------------------------------------
+
+    async def send(
+        self,
+        chat_id: str,
+        content: str,
+        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None,
+    ) -> SendResult:
+        """Send a markdown reply via DingTalk session webhook."""
+        metadata = metadata or {}
+
+        session_webhook = metadata.get("session_webhook") or self._session_webhooks.get(chat_id)
+        if not session_webhook:
+            return SendResult(success=False,
+                              error="No session_webhook available. Reply must follow an incoming message.")
+
+        if not self._http_client:
+            return SendResult(success=False, error="HTTP client not initialized")
+
+        payload = {
+            "msgtype": "markdown",
+            "markdown": {"title": "Hermes", "text": content[:self.MAX_MESSAGE_LENGTH]},
+        }
+
+        try:
+            resp = await self._http_client.post(session_webhook, json=payload, timeout=15.0)
+            if resp.status_code < 300:
+                return SendResult(success=True, message_id=uuid.uuid4().hex[:12])
+            body = resp.text
+            logger.warning("[%s] Send failed HTTP %d: %s", self.name, resp.status_code, body[:200])
+            return SendResult(success=False, error=f"HTTP {resp.status_code}: {body[:200]}")
+        except httpx.TimeoutException:
+            return SendResult(success=False, error="Timeout sending message to DingTalk")
+        except Exception as e:
+            logger.error("[%s] Send error: %s", self.name, e)
+            return SendResult(success=False, error=str(e))
+
+    async def send_typing(self, chat_id: str, metadata=None) -> None:
+        """DingTalk does not support typing indicators."""
+        pass
+
+    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
+        """Return basic info about a DingTalk conversation."""
+        return {"name": chat_id, "type": "group" if "group" in chat_id.lower() else "dm"}
+
+
+# ---------------------------------------------------------------------------
+# Internal stream handler
+# ---------------------------------------------------------------------------
+
+class _IncomingHandler(ChatbotHandler if DINGTALK_STREAM_AVAILABLE else object):
+    """dingtalk-stream ChatbotHandler that forwards messages to the adapter."""
+
+    def __init__(self, adapter: DingTalkAdapter, loop: asyncio.AbstractEventLoop):
+        if DINGTALK_STREAM_AVAILABLE:
+            super().__init__()
+        self._adapter = adapter
+        self._loop = loop
+
+    async def process(self, callback_message):
+        """Called by dingtalk-stream when a message arrives.
+
+        dingtalk-stream >= 0.24 passes a CallbackMessage whose `.data` contains
+        the chatbot payload. Convert it to ChatbotMessage and await the adapter
+        handler directly on the main event loop.
+        """
+        try:
+            chatbot_msg = ChatbotMessage.from_dict(callback_message.data)
+            await self._adapter._on_message(chatbot_msg)
+        except Exception:
+            logger.exception("[DingTalk] Error processing incoming message")
+
+        return dingtalk_stream.AckMessage.STATUS_OK, "OK"
--- a/hermes_agent/gateway/platforms/discord.py
+++ b/hermes_agent/gateway/platforms/discord.py
--- a/hermes_agent/gateway/platforms/email.py
+++ b/hermes_agent/gateway/platforms/email.py
@@ -32,7 +32,7 @@ from email import encoders
 from pathlib import Path
 from typing import Any, Dict, List, Optional

-from hermes_agent.gateway.platforms.base import (
+from gateway.platforms.base import (
    BasePlatformAdapter,
    MessageEvent,
    MessageType,
@@ -40,7 +40,7 @@ from hermes_agent.gateway.platforms.base import (
    cache_document_from_bytes,
    cache_image_from_bytes,
 )
-from hermes_agent.gateway.config import Platform, PlatformConfig
+from gateway.config import Platform, PlatformConfig

 logger = logging.getLogger(__name__)
 # Automated sender patterns — emails from these are silently ignored
@@ -532,7 +532,6 @@ class EmailAdapter(BasePlatformAdapter):
        image_url: str,
        caption: Optional[str] = None,
        reply_to: Optional[str] = None,
-        metadata: Optional[Dict[str, Any]] = None,
    ) -> SendResult:
        """Send an image URL as part of an email body."""
        text = caption or ""
@@ -546,7 +545,6 @@ class EmailAdapter(BasePlatformAdapter):
        caption: Optional[str] = None,
        file_name: Optional[str] = None,
        reply_to: Optional[str] = None,
-        **kwargs,
    ) -> SendResult:
        """Send a file as an email attachment."""
        try:
--- a/hermes_agent/gateway/platforms/feishu.py
+++ b/hermes_agent/gateway/platforms/feishu.py
@@ -8,8 +8,7 @@ Supports:
 - Gateway allowlist integration via FEISHU_ALLOWED_USERS
 - Persistent dedup state across restarts
 - Per-chat serial message processing (matches openclaw createChatQueue)
- Processing status reactions: Typing while working, removed on success,
-  swapped for CrossMark on failure
+- Persistent ACK emoji reaction on inbound messages
 - Reaction events routed as synthetic text events (matches openclaw)
 - Interactive card button-click events routed as synthetic COMMAND events
 - Webhook anomaly tracking (matches openclaw createWebhookAnomalyTracker)
@@ -30,7 +29,6 @@ import re
 import threading
 import time
 import uuid
-from collections import OrderedDict
 from dataclasses import dataclass, field
 from datetime import datetime
 from pathlib import Path
@@ -95,12 +93,11 @@ except ImportError:
 FEISHU_WEBSOCKET_AVAILABLE = websockets is not None
 FEISHU_WEBHOOK_AVAILABLE = aiohttp is not None

-from hermes_agent.gateway.config import Platform, PlatformConfig
-from hermes_agent.gateway.platforms.base import (
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import (
    BasePlatformAdapter,
    MessageEvent,
    MessageType,
-    ProcessingOutcome,
    SendResult,
    SUPPORTED_DOCUMENT_TYPES,
    cache_document_from_bytes,
@@ -108,8 +105,8 @@ from hermes_agent.gateway.platforms.base import (
    cache_audio_from_bytes,
    cache_image_from_bytes,
 )
-from hermes_agent.gateway.status import acquire_scoped_lock, release_scoped_lock
-from hermes_agent.constants import get_hermes_home
+from gateway.status import acquire_scoped_lock, release_scoped_lock
+from hermes_constants import get_hermes_home

 logger = logging.getLogger(__name__)

@@ -122,8 +119,6 @@ _MARKDOWN_HINT_RE = re.compile(
    re.MULTILINE,
 )
 _MARKDOWN_LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")
-_MARKDOWN_FENCE_OPEN_RE = re.compile(r"^```([^\n`]*)\s*$")
-_MARKDOWN_FENCE_CLOSE_RE = re.compile(r"^```\s*$")
 _MENTION_RE = re.compile(r"@_user_\d+")
 _MULTISPACE_RE = re.compile(r"[ \t]{2,}")
 _POST_CONTENT_INVALID_RE = re.compile(r"content format of the post type is incorrect", re.IGNORECASE)
@@ -193,17 +188,7 @@ _APPROVAL_LABEL_MAP: Dict[str, str] = {
 }
 _FEISHU_BOT_MSG_TRACK_SIZE = 512                   # LRU size for tracking sent message IDs
 _FEISHU_REPLY_FALLBACK_CODES = frozenset({230011, 231003})  # reply target withdrawn/missing → create fallback
-
-# Feishu reactions render as prominent badges, unlike Discord/Telegram's
-# small footer emoji — a success badge on every message would add noise, so
-# we only mark start (Typing) and failure (CrossMark); the reply itself is
-# the success signal.
-_FEISHU_REACTION_IN_PROGRESS = "Typing"
-_FEISHU_REACTION_FAILURE = "CrossMark"
-# Bound on the (message_id → reaction_id) handle cache. Happy-path entries
-# drain on completion; the cap is a safeguard against unbounded growth from
-# delete-failures, not a capacity plan.
-_FEISHU_PROCESSING_REACTION_CACHE_SIZE = 1024
+_FEISHU_ACK_EMOJI = "OK"

 # QR onboarding constants
 _ONBOARD_ACCOUNTS_URLS = {
@@ -414,7 +399,7 @@ def _strip_markdown_to_plain_text(text: str) -> str:
    Feishu-specific patterns (blockquotes, strikethrough, underline tags,
    horizontal rules, \\r\\n normalisation).
    """
-    from hermes_agent.gateway.platforms.helpers import strip_markdown
+    from gateway.platforms.helpers import strip_markdown
    plain = text.replace("\r\n", "\n")
    plain = _MARKDOWN_LINK_RE.sub(lambda m: f"{m.group(1)} ({m.group(2).strip()})", plain)
    plain = re.sub(r"^>\s?", "", plain, flags=re.MULTILINE)
@@ -445,66 +430,23 @@ def _coerce_required_int(value: Any, default: int, min_value: int = 0) -> int:


 def _build_markdown_post_payload(content: str) -> str:
-    rows = _build_markdown_post_rows(content)
    return json.dumps(
        {
            "zh_cn": {
-                "content": rows,
+                "content": [
+                    [
+                        {
+                            "tag": "md",
+                            "text": content,
+                        }
+                    ]
+                ],
            }
        },
        ensure_ascii=False,
    )


-def _build_markdown_post_rows(content: str) -> List[List[Dict[str, str]]]:
-    """Build Feishu post rows while isolating fenced code blocks.
-
-    Feishu's `md` renderer can swallow trailing content when a fenced code block
-    appears inside one large markdown element. Split the reply at real fence
-    lines so prose before/after the code block remains visible while code stays
-    in a dedicated row.
-    """
-    if not content:
-        return [[{"tag": "md", "text": ""}]]
-    if "```" not in content:
-        return [[{"tag": "md", "text": content}]]
-
-    rows: List[List[Dict[str, str]]] = []
-    current: List[str] = []
-    in_code_block = False
-
-    def _flush_current() -> None:
-        nonlocal current
-        if not current:
-            return
-        segment = "\n".join(current)
-        if segment.strip():
-            rows.append([{"tag": "md", "text": segment}])
-        current = []
-
-    for raw_line in content.splitlines():
-        stripped_line = raw_line.strip()
-        is_fence = bool(
-            _MARKDOWN_FENCE_CLOSE_RE.match(stripped_line)
-            if in_code_block
-            else _MARKDOWN_FENCE_OPEN_RE.match(stripped_line)
-        )
-
-        if is_fence:
-            if not in_code_block:
-                _flush_current()
-            current.append(raw_line)
-            in_code_block = not in_code_block
-            if not in_code_block:
-                _flush_current()
-            continue
-
-        current.append(raw_line)
-
-    _flush_current()
-    return rows or [[{"tag": "md", "text": content}]]
-
-
 def parse_feishu_post_payload(payload: Any) -> FeishuPostParseResult:
    resolved = _resolve_post_payload(payload)
    if not resolved:
@@ -1154,9 +1096,6 @@ class FeishuAdapter(BasePlatformAdapter):
        # Exec approval button state (approval_id → {session_key, message_id, chat_id})
        self._approval_state: Dict[int, Dict[str, str]] = {}
        self._approval_counter = itertools.count(1)
-        # Feishu reaction deletion requires the opaque reaction_id returned
-        # by create, so we cache it per message_id.
-        self._pending_processing_reactions: "OrderedDict[str, str]" = OrderedDict()
        self._load_seen_message_ids()

    @staticmethod
@@ -1289,10 +1228,6 @@ class FeishuAdapter(BasePlatformAdapter):
            .register_p2_im_chat_member_bot_deleted_v1(self._on_bot_removed_from_chat)
            .register_p2_im_chat_access_event_bot_p2p_chat_entered_v1(self._on_p2p_chat_entered)
            .register_p2_im_message_recalled_v1(self._on_message_recalled)
-            .register_p2_customized_event(
-                "drive.notice.comment_add_v1",
-                self._on_drive_comment_event,
-            )
            .build()
        )

@@ -1484,8 +1419,6 @@ class FeishuAdapter(BasePlatformAdapter):
        chat_id: str,
        message_id: str,
        content: str,
-        *,
-        finalize: bool = False,
    ) -> SendResult:
        """Edit a previously sent Feishu text/post message."""
        if not self._client:
@@ -1988,8 +1921,8 @@ class FeishuAdapter(BasePlatformAdapter):
        if not message_id or self._is_duplicate(message_id):
            logger.debug("[Feishu] Dropping duplicate/missing message_id: %s", message_id)
            return
-        if self._is_self_sent_bot_message(event):
-            logger.debug("[Feishu] Dropping self-sent bot event: %s", message_id)
+        if getattr(sender, "sender_type", "") == "bot":
+            logger.debug("[Feishu] Dropping bot-originated event: %s", message_id)
            return

        chat_type = getattr(message, "chat_type", "p2p")
@@ -2032,25 +1965,6 @@ class FeishuAdapter(BasePlatformAdapter):
    def _on_message_recalled(self, data: Any) -> None:
        logger.debug("[Feishu] Message recalled by user")

-    def _on_drive_comment_event(self, data: Any) -> None:
-        """Handle drive document comment notification (drive.notice.comment_add_v1).
-
-        Delegates to :mod:`gateway.platforms.feishu_comment` for parsing,
-        logging, and reaction.  Scheduling follows the same
-        ``run_coroutine_threadsafe`` pattern used by ``_on_message_event``.
-        """
-        from hermes_agent.gateway.platforms.feishu_comment import handle_drive_comment_event
-
-        loop = self._loop
-        if not self._loop_accepts_callbacks(loop):
-            logger.warning("[Feishu] Dropping drive comment event before adapter loop is ready")
-            return
-        future = asyncio.run_coroutine_threadsafe(
-            handle_drive_comment_event(self._client, data, self_open_id=self._bot_open_id),
-            loop,
-        )
-        future.add_done_callback(self._log_background_failure)
-
    def _on_reaction_event(self, event_type: str, data: Any) -> None:
        """Route user reactions on bot messages as synthetic text events."""
        event = getattr(data, "event", None)
@@ -2066,12 +1980,12 @@ class FeishuAdapter(BasePlatformAdapter):
            operator_type,
            emoji_type,
        )
-        # Drop bot/app-origin reactions to break the feedback loop from our
-        # own lifecycle reactions. A human reacting with the same emoji (e.g.
-        # clicking Typing on a bot message) is still routed through.
+        # Only process reactions from real users. Ignore app/bot-generated reactions
+        # and Hermes' own ACK emoji to avoid feedback loops.
        loop = self._loop
        if (
            operator_type in {"bot", "app"}
+            or emoji_type == _FEISHU_ACK_EMOJI
            or not message_id
            or loop is None
            or bool(getattr(loop, "is_closed", lambda: False)())
@@ -2151,7 +2065,7 @@ class FeishuAdapter(BasePlatformAdapter):
            logger.debug("[Feishu] Approval %s already resolved or unknown", approval_id)
            return
        try:
-            from hermes_agent.tools.security.approval import resolve_gateway_approval
+            from tools.approval import resolve_gateway_approval
            count = resolve_gateway_approval(state["session_key"], choice)
            logger.info(
                "Feishu button resolved %d approval(s) for session %s (choice=%s, user=%s)",
@@ -2295,35 +2209,33 @@ class FeishuAdapter(BasePlatformAdapter):

    async def _handle_message_with_guards(self, event: MessageEvent) -> None:
        """Dispatch a single event through the agent pipeline with per-chat serialization
-        before handing the event off to the agent.
+        and a persistent ACK emoji reaction before processing starts.

-        Per-chat lock ensures messages in the same chat are processed one at a
-        time (matches openclaw's createChatQueue serial queue behaviour).
+        - Per-chat lock: ensures messages in the same chat are processed one at a time
+          (matches openclaw's createChatQueue serial queue behaviour).
+        - ACK indicator: adds a CHECK reaction to the triggering message before handing
+          off to the agent and leaves it in place as a receipt marker.
        """
        chat_id = getattr(event.source, "chat_id", "") or "" if event.source else ""
        chat_lock = self._get_chat_lock(chat_id)
        async with chat_lock:
+            message_id = event.message_id
+            if message_id:
+                await self._add_ack_reaction(message_id)
            await self.handle_message(event)

-    # =========================================================================
-    # Processing status reactions
-    # =========================================================================
-
-    def _reactions_enabled(self) -> bool:
-        return os.getenv("FEISHU_REACTIONS", "true").strip().lower() not in ("false", "0", "no")
-
-    async def _add_reaction(self, message_id: str, emoji_type: str) -> Optional[str]:
-        """Return the reaction_id on success, else None. The id is needed later for deletion."""
-        if not self._client or not message_id or not emoji_type:
+    async def _add_ack_reaction(self, message_id: str) -> Optional[str]:
+        """Add a persistent ACK emoji reaction to signal the message was received."""
+        if not self._client or not message_id:
            return None
        try:
-            from lark_oapi.api.im.v1 import (
+            from lark_oapi.api.im.v1 import (  # lazy import — keeps optional dep optional
                CreateMessageReactionRequest,
                CreateMessageReactionRequestBody,
            )
            body = (
                CreateMessageReactionRequestBody.builder()
-                .reaction_type({"emoji_type": emoji_type})
+                .reaction_type({"emoji_type": _FEISHU_ACK_EMOJI})
                .build()
            )
            request = (
@@ -2336,93 +2248,16 @@ class FeishuAdapter(BasePlatformAdapter):
            if response and getattr(response, "success", lambda: False)():
                data = getattr(response, "data", None)
                return getattr(data, "reaction_id", None)
-            logger.debug(
-                "[Feishu] Add reaction %s on %s rejected: code=%s msg=%s",
-                emoji_type,
+            logger.warning(
+                "[Feishu] Failed to add ack reaction to %s: code=%s msg=%s",
                message_id,
                getattr(response, "code", None),
                getattr(response, "msg", None),
            )
        except Exception:
-            logger.warning(
-                "[Feishu] Add reaction %s on %s raised",
-                emoji_type,
-                message_id,
-                exc_info=True,
-            )
+            logger.warning("[Feishu] Failed to add ack reaction to %s", message_id, exc_info=True)
        return None

-    async def _remove_reaction(self, message_id: str, reaction_id: str) -> bool:
-        if not self._client or not message_id or not reaction_id:
-            return False
-        try:
-            from lark_oapi.api.im.v1 import DeleteMessageReactionRequest
-            request = (
-                DeleteMessageReactionRequest.builder()
-                .message_id(message_id)
-                .reaction_id(reaction_id)
-                .build()
-            )
-            response = await asyncio.to_thread(self._client.im.v1.message_reaction.delete, request)
-            if response and getattr(response, "success", lambda: False)():
-                return True
-            logger.debug(
-                "[Feishu] Remove reaction %s on %s rejected: code=%s msg=%s",
-                reaction_id,
-                message_id,
-                getattr(response, "code", None),
-                getattr(response, "msg", None),
-            )
-        except Exception:
-            logger.warning(
-                "[Feishu] Remove reaction %s on %s raised",
-                reaction_id,
-                message_id,
-                exc_info=True,
-            )
-        return False
-
-    def _remember_processing_reaction(self, message_id: str, reaction_id: str) -> None:
-        cache = self._pending_processing_reactions
-        cache[message_id] = reaction_id
-        cache.move_to_end(message_id)
-        while len(cache) > _FEISHU_PROCESSING_REACTION_CACHE_SIZE:
-            cache.popitem(last=False)
-
-    def _pop_processing_reaction(self, message_id: str) -> Optional[str]:
-        return self._pending_processing_reactions.pop(message_id, None)
-
-    async def on_processing_start(self, event: MessageEvent) -> None:
-        if not self._reactions_enabled():
-            return
-        message_id = event.message_id
-        if not message_id or message_id in self._pending_processing_reactions:
-            return
-        reaction_id = await self._add_reaction(message_id, _FEISHU_REACTION_IN_PROGRESS)
-        if reaction_id:
-            self._remember_processing_reaction(message_id, reaction_id)
-
-    async def on_processing_complete(
-        self, event: MessageEvent, outcome: ProcessingOutcome
-    ) -> None:
-        if not self._reactions_enabled():
-            return
-        message_id = event.message_id
-        if not message_id:
-            return
-
-        start_reaction_id = self._pending_processing_reactions.get(message_id)
-        if start_reaction_id:
-            if not await self._remove_reaction(message_id, start_reaction_id):
-                # Don't stack a second badge on top of a Typing we couldn't
-                # remove — UI would read as both "working" and "done/failed"
-                # simultaneously. Keep the handle so LRU eventually evicts it.
-                return
-            self._pop_processing_reaction(message_id)
-
-        if outcome is ProcessingOutcome.FAILURE:
-            await self._add_reaction(message_id, _FEISHU_REACTION_FAILURE)
-
    # =========================================================================
    # Webhook server and security
    # =========================================================================
@@ -2542,7 +2377,7 @@ class FeishuAdapter(BasePlatformAdapter):
        )

    def _media_batch_key(self, event: MessageEvent) -> str:
-        from hermes_agent.gateway.session import build_session_key
+        from gateway.session import build_session_key

        session_key = build_session_key(
            event.source,
@@ -2619,7 +2454,7 @@ class FeishuAdapter(BasePlatformAdapter):
        default_ext: str,
        preferred_name: str,
    ) -> tuple[str, str]:
-        from hermes_agent.tools.security.urls import is_safe_url
+        from tools.url_safety import is_safe_url
        if not is_safe_url(file_url):
            raise ValueError(f"Blocked unsafe URL (SSRF protection): {file_url[:80]}")

@@ -2755,8 +2590,6 @@ class FeishuAdapter(BasePlatformAdapter):
            self._on_reaction_event(event_type, data)
        elif event_type == "card.action.trigger":
            self._on_card_action_trigger(data)
-        elif event_type == "drive.notice.comment_add_v1":
-            self._on_drive_comment_event(data)
        else:
            logger.debug("[Feishu] Ignoring webhook event type: %s", event_type or "unknown")
        return web.json_response({"code": 0, "msg": "ok"})
@@ -2822,7 +2655,7 @@ class FeishuAdapter(BasePlatformAdapter):

    def _text_batch_key(self, event: MessageEvent) -> str:
        """Return the session-scoped key used for Feishu text aggregation."""
-        from hermes_agent.gateway.session import build_session_key
+        from gateway.session import build_session_key

        return build_session_key(
            event.source,
@@ -3391,23 +3224,6 @@ class FeishuAdapter(BasePlatformAdapter):
            return self._post_mentions_bot(normalized.mentioned_ids)
        return False

-    def _is_self_sent_bot_message(self, event: Any) -> bool:
-        """Return True only for Feishu events emitted by this Hermes bot."""
-        sender = getattr(event, "sender", None)
-        sender_type = str(getattr(sender, "sender_type", "") or "").strip().lower()
-        if sender_type not in {"bot", "app"}:
-            return False
-
-        sender_id = getattr(sender, "sender_id", None)
-        sender_open_id = str(getattr(sender_id, "open_id", "") or "").strip()
-        sender_user_id = str(getattr(sender_id, "user_id", "") or "").strip()
-
-        if self._bot_open_id and sender_open_id == self._bot_open_id:
-            return True
-        if self._bot_user_id and sender_user_id == self._bot_user_id:
-            return True
-        return False
-
    def _message_mentions_bot(self, mentions: List[Any]) -> bool:
        """Check whether any mention targets the configured or inferred bot identity."""
        for mention in mentions:
@@ -3435,55 +3251,10 @@ class FeishuAdapter(BasePlatformAdapter):
        return False

    async def _hydrate_bot_identity(self) -> None:
-        """Best-effort discovery of bot identity for precise group mention gating
-        and self-sent bot event filtering.
-
-        Populates ``_bot_open_id`` and ``_bot_name`` from /open-apis/bot/v3/info
-        (no extra scopes required beyond the tenant access token). Falls back to
-        the application info endpoint for ``_bot_name`` only when the first probe
-        doesn't return it. Each field is hydrated independently — a value already
-        supplied via env vars (FEISHU_BOT_OPEN_ID / FEISHU_BOT_USER_ID /
-        FEISHU_BOT_NAME) is preserved and skips its probe.
-        """
+        """Best-effort discovery of bot identity for precise group mention gating."""
        if not self._client:
            return
-        if self._bot_open_id and self._bot_name:
-            # Everything the self-send filter and precise mention gate need is
-            # already in place; nothing to probe.
-            return
-
-        # Primary probe: /open-apis/bot/v3/info — returns bot_name + open_id, no
-        # extra scopes required. This is the same endpoint the onboarding wizard
-        # uses via probe_bot().
-        if not self._bot_open_id or not self._bot_name:
-            try:
-                resp = await asyncio.to_thread(
-                    self._client.request,
-                    method="GET",
-                    url="/open-apis/bot/v3/info",
-                    body=None,
-                    raw_response=True,
-                )
-                content = getattr(resp, "content", None)
-                if content:
-                    payload = json.loads(content)
-                    parsed = _parse_bot_response(payload) or {}
-                    open_id = (parsed.get("bot_open_id") or "").strip()
-                    bot_name = (parsed.get("bot_name") or "").strip()
-                    if open_id and not self._bot_open_id:
-                        self._bot_open_id = open_id
-                    if bot_name and not self._bot_name:
-                        self._bot_name = bot_name
-            except Exception:
-                logger.debug(
-                    "[Feishu] /bot/v3/info probe failed during hydration",
-                    exc_info=True,
-                )
-
-        # Fallback probe for _bot_name only: application info endpoint. Needs
-        # admin:app.info:readonly or application:application:self_manage scope,
-        # so it's best-effort.
-        if self._bot_name:
+        if any((self._bot_open_id, self._bot_user_id, self._bot_name)):
            return
        try:
            request = self._build_get_application_request(app_id=self._app_id, lang="en_us")
@@ -3492,17 +3263,17 @@ class FeishuAdapter(BasePlatformAdapter):
                code = getattr(response, "code", None)
                if code == 99991672:
                    logger.warning(
-                        "[Feishu] Unable to hydrate bot name from application info. "
+                        "[Feishu] Unable to hydrate bot identity from application info. "
                        "Grant admin:app.info:readonly or application:application:self_manage "
                        "so group @mention gating can resolve the bot name precisely."
                    )
                return
            app = getattr(getattr(response, "data", None), "app", None)
            app_name = (getattr(app, "app_name", None) or "").strip()
-            if app_name and not self._bot_name:
+            if app_name:
                self._bot_name = app_name
        except Exception:
-            logger.debug("[Feishu] Failed to hydrate bot name from application info", exc_info=True)
+            logger.debug("[Feishu] Failed to hydrate bot identity", exc_info=True)

    # =========================================================================
    # Deduplication — seen message ID cache (persistent)
--- a/hermes_agent/gateway/platforms/helpers.py
+++ b/hermes_agent/gateway/platforms/helpers.py
@@ -14,7 +14,7 @@ from pathlib import Path
 from typing import TYPE_CHECKING, Dict, Optional

 if TYPE_CHECKING:
-    from hermes_agent.gateway.platforms.base import BasePlatformAdapter, MessageEvent
+    from gateway.platforms.base import BasePlatformAdapter, MessageEvent

 logger = logging.getLogger(__name__)

@@ -214,7 +214,7 @@ class ThreadParticipationTracker:
        self._threads: set = self._load()

    def _state_path(self) -> Path:
-        from hermes_agent.constants import get_hermes_home
+        from hermes_constants import get_hermes_home
        return get_hermes_home() / f"{self._platform}_threads.json"

    def _load(self) -> set:
--- a/hermes_agent/gateway/platforms/homeassistant.py
+++ b/hermes_agent/gateway/platforms/homeassistant.py
@@ -28,8 +28,8 @@ except ImportError:
    AIOHTTP_AVAILABLE = False
    aiohttp = None  # type: ignore[assignment]

-from hermes_agent.gateway.config import Platform, PlatformConfig
-from hermes_agent.gateway.platforms.base import (
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import (
    BasePlatformAdapter,
    MessageEvent,
    MessageType,
--- a/Show More
+++ b/Show More