Compare commits

..

241 Commits

Author SHA1 Message Date
emozilla 854206e59e fix(plugins): register dynamically-loaded modules in sys.modules before exec
Dashboard plugin API routes (web_server._mount_plugin_api_routes) and
gateway event hooks (gateway.hooks.HookRegistry.discover_and_load) both
loaded Python files via importlib.util.spec_from_file_location +
exec_module without registering the resulting module in sys.modules.

That breaks any plugin or hook handler that uses `from __future__ import
annotations` together with a Pydantic BaseModel / dataclass / anything
that introspects `__module__`: at first request Pydantic tries to
resolve string-form type hints against the defining module's namespace,
can't find it by name, and raises:

  PydanticUserError: TypeAdapter[...] is not fully defined;
  you should define ... and all referenced types,
  then call `.rebuild()` on the instance.

This is what broke the kanban dashboard's 'triage' button — POST
/api/plugins/kanban/tasks validated against CreateTaskBody (a Pydantic
model in a file using `from __future__ import annotations`) and
returned 500 on every click.

The fix, applied symmetrically to both loaders:

  1. Compute module_name once.
  2. Register the module in sys.modules BEFORE exec_module.
  3. On exec_module failure, pop the half-initialized stub so subsequent
     reloads don't pick up broken state.

GETs were unaffected because they don't build a body TypeAdapter, which
is why this only surfaced when users started POSTing.
2026-04-28 13:24:29 -04:00
teknium1 dd83173621 feat(kanban): per-task force-loaded skills
Tasks can now pin extra skills to load into their worker alongside the
built-in kanban-worker. Use cases: translation tasks that need a
translation skill, review tasks that need github-code-review, security
audits that need security-pr-audit — without editing the assignee's
profile config.

Changes:
- tasks.skills column (JSON array), idempotent migration, Task.skills
  dataclass field.
- create_task(skills=[...]) normalises (strip/dedupe), rejects commas
  in a single name.
- _default_spawn emits one `--skills X` pair per task skill, in
  addition to the built-in `--skills kanban-worker`. Deduped against
  the built-in.
- CLI: `hermes kanban create --skill <name>` (repeatable).
- Tool: `kanban_create(skills=[...])` (accepts list or single string).
- Dashboard: POST /tasks accepts `skills`; inline create form has a
  comma-separated skills input; drawer shows a Skills row.
- `hermes kanban show` prints a skills row when present.
- Docs: kanban.md has a new 'Pinning extra skills to a specific task'
  section; CLI reference shows `--skill`.
- Tests: 16 new (kernel round-trip, dedupe, comma-rejection, dispatcher
  argv with multiple skills, built-in dedupe, CLI flag repeatable, CLI
  no-flag stays None, show renders skills, idempotent migration on
  legacy DB, tool list/string/non-list, dashboard REST round-trip,
  dashboard default empty list).
2026-04-28 07:50:36 -07:00
teknium1 c65c1ddf21 feat(kanban): dispatcher auto-loads kanban-worker skill on every spawn
`_default_spawn` now passes `--skills kanban-worker` in the child's
argv, so every dispatched worker gets the skill loaded automatically
regardless of the profile's default skill config.

Why
  The system prompt carries the MANDATORY lifecycle (via
  KANBAN_GUIDANCE). The skill carries the deeper reference material:
  good summary/metadata patterns, retry diagnostics, block-reason
  examples, workspace handling, CLI fallback. Both are useful;
  requiring users to wire the skill into skills config per-profile
  is a footgun — they'd hit tasks where workers use the lifecycle
  correctly but not the patterns, producing weaker handoffs.

  Auto-loading makes kanban-worker the baseline. Profiles can still
  add more skills via the normal config; --skills is additive.

Test
  New test_default_spawn_auto_loads_kanban_worker_skill intercepts
  subprocess.Popen to capture the argv without actually spawning a
  hermes subprocess. Asserts '--skills kanban-worker' appears in the
  cmd and the env still carries HERMES_KANBAN_TASK + HERMES_PROFILE.

Skill note updated
  Worker skill's opening note changed from "you don't need to load
  this" to "you're seeing this because the dispatcher loaded it for
  you" — reflects the new always-loaded reality.

218/218 kanban + tools suite green.
2026-04-28 07:12:34 -07:00
teknium1 1970bcf5a5 feat(kanban): system-prompt guidance block + reshape skills to reference-only
Moves the kanban worker lifecycle out of the skills and into the system
prompt as a conditionally-injected guidance block, matching the
existing MEMORY_GUIDANCE / SESSION_SEARCH_GUIDANCE / SKILLS_GUIDANCE
pattern in `agent/prompt_builder.py`. Workers now know the full
lifecycle even if no skill was loaded; skills become the deeper
playbook for edge cases.

Why
  Previously the 6-step lifecycle (orient → work → heartbeat → block/
  complete → fan-out) only reached the model if the kanban-worker
  skill was loaded. That's fragile: users configure skills-per-
  platform, profiles can forget to include them, skill loading order
  matters. The lifecycle is load-bearing for correct board behavior
  — it belongs in the prompt path.

What
  - New `KANBAN_GUIDANCE` constant in agent/prompt_builder.py (~3 KB):
    identity framing ("You are a Kanban worker"), 6-step lifecycle
    with exact tool-call shapes (`kanban_show()`, `kanban_complete(
    summary=..., metadata=...)`, `kanban_block(reason=...)`, etc.),
    orchestrator mode note, and explicit DO NOTs including "don't
    shell out to `hermes kanban <verb>`".
  - Wired into `AIAgent._build_system_prompt` gated on `kanban_show
    in self.valid_tool_names`. Because `kanban_show`'s `check_fn`
    gates on `HERMES_KANBAN_TASK` env var, this indirection
    guarantees guidance appears iff a worker was dispatched. Zero
    cost to normal sessions.

Gating verified live
  Normal `hermes chat` session: 2329-char prompt, zero kanban content.
  Worker session (HERMES_KANBAN_TASK set):  5272-char prompt, +2943
  chars of KANBAN_GUIDANCE present. Regression test asserts the
  header phrase and each tool name.

Skills reshape (both from ~6 KB lifecycles to ~7 KB references)
  - `skills/devops/kanban-worker/SKILL.md`: drops the lifecycle
    (now in the guidance block), keeps good-summary patterns,
    block-reason examples, retry diagnostics, pitfalls, CLI
    fallback cheatsheet. Adds explicit note that you don't need to
    load the skill to work a task anymore.
  - `skills/devops/kanban-orchestrator/SKILL.md`: drops the
    "don't execute" rule (now in guidance), keeps the full
    decomposition playbook, specialist roster with typical
    workspaces, a concrete Postgres-migration example with
    `kanban_create` + `parents=[...]` linking, and pitfalls
    (reassignment vs new task, link argument order, tenant
    inheritance).

Tests (+3)
  - `test_kanban_guidance_not_in_normal_prompt`: verifies a regular
    AIAgent with no env var produces a prompt that contains none of
    the kanban lifecycle phrases.
  - `test_kanban_guidance_in_worker_prompt`: spawns an AIAgent with
    HERMES_KANBAN_TASK set, verifies the header + all 4 tool-call
    examples + anti-shell guidance appear.
  - `test_kanban_guidance_prompt_size_bounded`: sanity check that
    the guidance is 1.5-4 KB so it doesn't balloon the cached prompt.

217/217 kanban + tools suite green; 303/303 run_agent suite green.
The guidance block sits alongside the other tool-gated blocks, so
prompt caching remains intact — the system prompt is stable across
every turn of a kanban worker's run.
2026-04-28 05:29:31 -07:00
teknium1 832ecde4b0 feat(kanban): structured tool surface for worker + orchestrator agents
Seven new tools in `tools/kanban_tools.py` that give kanban workers a
backend-portable, schema-filtered way to interact with the board from
inside their own Python process — no shelling out to `hermes kanban`.

Motivation
  The CLI path (`hermes kanban complete \$TASK --summary ...`) breaks
  on any remote terminal backend (Docker, Modal, Singularity, SSH).
  The terminal tool runs `hermes kanban` inside the container, where
  `hermes` isn't installed and `~/.hermes/kanban.db` isn't mounted.
  Tools run in the agent's own Python process, so they always reach
  the board regardless of backend. Also skips shell-quoting fragility
  on --metadata JSON and gives structured error returns the model can
  reason about.

The seven tools
  kanban_show        read current task (defaults to HERMES_KANBAN_TASK)
  kanban_complete    structured handoff: summary + metadata
  kanban_block       ask for human input
  kanban_heartbeat   signal liveness during long operations
  kanban_comment     append to task thread
  kanban_create      fan out into child tasks (orchestrator path)
  kanban_link        add parent→child dependency after the fact

Gating
  Each tool's check_fn returns True iff HERMES_KANBAN_TASK is set in
  the process env. The dispatcher sets it when spawning a worker;
  normal `hermes chat` sessions never have it. Empirically verified:
  a baseline hermes-cli schema is 27 tools; with HERMES_KANBAN_TASK
  set it grows to exactly 34 (+7). Zero leak into normal sessions.

Also set HERMES_PROFILE in the spawn env so the kanban_comment tool's
author default works cleanly (it's what the tool reads to attribute
comments).

Skill updates
  - `skills/devops/kanban-worker/SKILL.md`: lifecycle rewritten to use
    kanban_show / kanban_heartbeat / kanban_block / kanban_complete /
    kanban_comment / kanban_create directly. CLI fallback section
    added for human operators / scripts.
  - `skills/devops/kanban-orchestrator/SKILL.md`: all examples ported
    from CLI to tool form; top-banner note explaining tools are the
    primary surface. kanban_create / kanban_link throughout.

Docs
  `website/docs/user-guide/features/kanban.md`:
  new "How workers interact with the board" section explaining the
  tool surface, gating mechanism, and why tools vs CLI. The worker
  skill / orchestrator skill subsections are now nested under it.

Tests (+25 in tests/tools/test_kanban_tools.py)
  - Schema gating: kanban_tools_hidden_without_env_var,
    kanban_tools_visible_with_env_var.
  - Happy paths: show (default + explicit task_id), complete (with
    summary+metadata, with result only), block, heartbeat (with and
    without note), comment (default + custom author), create (with
    list parents, with string parent), link.
  - Error paths: complete rejects no-handoff and non-dict metadata,
    block rejects empty reason, comment rejects empty body, create
    rejects no title / no assignee / non-list parents, link rejects
    self-reference / missing args / cycles.
  - End-to-end: full worker lifecycle driven entirely through the
    tools, verified against DB state.

214/214 kanban suite pass under scripts/run_tests.sh.
2026-04-28 04:30:22 -07:00
Teknium be184aa5fa fix(kanban): close the two v2-flagged issues in v1
Both items the atypical-scenarios pass flagged as "v2 follow-up"
actually belong in v1. Fixed now.

Fix 1: workspace path traversal
  resolve_workspace now rejects non-absolute paths for all three
  workspace_kinds (scratch-with-explicit-path, dir:, worktree). A
  relative path like '../../../tmp/attacker' was being silently
  resolved against the dispatcher's CWD — a confused-deputy escape.
  Error message points users at the absolute-path requirement.
  Storage remains verbatim (kernel doesn't rewrite user input);
  the refusal happens at resolution time, so the dispatcher's
  existing spawn-failure circuit breaker correctly categorizes it.

  Threat model documented in website/docs/user-guide/features/kanban.md:
  single-host, trusted-local-user. The absolute-path rule prevents
  ambiguity-driven escape, not malicious access — kanban runs as you,
  with your uid, on your filesystem.

Fix 2: build_worker_context unbounded
  Added per-section caps so worker prompts stay bounded on pathological
  boards:
    _CTX_MAX_PRIOR_ATTEMPTS = 10   most-recent N runs shown; older
                                   collapsed into "N earlier attempts
                                   omitted" marker. Attempt numbering
                                   preserved (shows "Attempt 16" not
                                   renumbered).
    _CTX_MAX_COMMENTS       = 30   same pattern for comments.
    _CTX_MAX_FIELD_BYTES    = 4 KB per summary / error / metadata / result.
    _CTX_MAX_BODY_BYTES     = 8 KB per task.body (opening post).
    _CTX_MAX_COMMENT_BYTES  = 2 KB per comment.
  Truncation uses a visible ellipsis + char-count so the worker knows
  it's been truncated.

  Effect on atypical-scenario runs:
    huge_run_count_on_one_task (1000 runs):  63 KB  →    820 chars
    comment_storm       (1000 comments):     50 KB  →  1,671 chars

Tests (+6 in main suite)
  test_resolve_workspace_rejects_relative_dir_path — relative dir:
    path stored verbatim but refused at resolve.
  test_resolve_workspace_accepts_absolute_dir_path — legitimate
    absolute paths are created and returned.
  test_resolve_workspace_rejects_relative_worktree_path — same guard
    for worktree kind.
  test_build_worker_context_caps_prior_attempts — 25 runs → exactly
    _CTX_MAX_PRIOR_ATTEMPTS shown, omitted marker present,
    attempt numbering preserves original index.
  test_build_worker_context_caps_comments — 100 comments → 30 shown,
    70 in the omitted marker.
  test_build_worker_context_caps_huge_summary — 1 MB summary on a
    prior run → context under 10 KB total, truncation marker visible.

189/189 kanban suite pass. Atypical-scenarios stress script still
passes all 28 scenarios with the new caps in effect.
2026-04-28 01:20:08 -07:00
Teknium 63b7b6d5bd test(kanban): atypical-scenario stress suite + clock-skew elapsed clamp fix
Added tests/stress/test_atypical_scenarios.py — 28 scenarios covering
atypical user inputs and environments that the normal tests assume
away. Surfaced one real UI bug in the process.

Bug found and fixed
  - CLI display of negative elapsed time when NTP jumps backward
    between claim_task and complete_task. The kernel faithfully stores
    started_at > ended_at; both CLI display sites (_cmd_show's Runs
    section at line 685, _cmd_runs's table at line 1153) now clamp
    elapsed to max(0, end - start), matching the dashboard JS which
    already had the same Math.max(0, ...) clamp. Regression test
    test_cli_show_clamps_negative_elapsed forces a future started_at
    via raw UPDATE and verifies neither show nor runs prints a
    `-<digits>s` token.

Atypical scenarios covered (all passing, 28 total)
  Data:
    - unicode_and_emoji: CJK, RTL (Hebrew/Arabic), ZWJ emoji sequences,
      control chars, null bytes in titles + metadata
    - huge_strings: 1 MB body + 1 MB summary + 50-level nested metadata
    - sql_injection_attempts: 6 classic payloads, parameterized queries
      hold across every string field
    - newlines_in_summary: full preserved on run, first line in event
      payload for notifier brevity
    - malformed_metadata_via_cli: 4 bad JSON values each cleanly
      rejected with stderr error, no partial task mutation
    - empty_string_fields: empty title rejected, whitespace-only title
      rejected, empty body accepted
    - tenant_with_newlines: multiline tenant strings survive
      board_stats

  Dependency graphs:
    - dependency_cycle: A→B→A refused
    - self_parent: cannot depend on itself
    - diamond_dependency: child promotes only when both parents done
    - wide_fan_out: 500 children promoted in 4ms via complete_task's
      internal recompute_ready
    - wide_fan_in: 500 parents → 1 child, promotion gated correctly
    - parent_in_different_status_states: child stays todo for parent
      in ready/running/blocked/triage/archived; only 'done' unblocks

  Workspace:
    - workspace_path_traversal: dir: workspaces are intentionally
      arbitrary paths (documented threat model)
    - workspace_nonexistent_path: spawn_failure counter increments,
      task returns to ready

  Clock:
    - clock_skew_start_greater_than_end: kernel stores faithfully,
      CLI now clamps at display (see Bug above)

  Filesystem:
    - hermes_home_with_spaces: works
    - hermes_home_with_unicode: works
    - hermes_home_via_symlink: two symlinks to same dir share DB
      (Path.resolve() in _INITIALIZED_PATHS)

  Scale extremes:
    - huge_run_count_on_one_task: 1000 runs → list_runs in <1ms,
      build_worker_context 3ms/63KB (flagged: unbounded on
      retry-heavy tasks; v2 cap candidate)
    - hundred_tenants: 5000 tasks / 100 tenants → stats 1ms, list 26ms
    - comment_storm: 1000 comments → 50KB worker context (same
      unbounded-context flag)

  Lifecycle:
    - completed_task_reclaim_attempt: done tasks can't be
      re-claimed/re-completed/blocked
    - archived_task_resurrection_attempt: archived tasks invisible to
      default list + all ops refused
    - unassigned_task_never_claims: dispatcher skips, task untouched

  Assignees:
    - assignee_with_special_chars: @-signs, dots, CJK, emoji, 200-char
      names, empty strings all handled

  Concurrency:
    - idempotency_key_race: two concurrent processes calling
      create_task with same key both get back the SAME task id,
      exactly one row in DB

  Dashboard:
    - dashboard_rest_with_weird_inputs: empty/huge/unicode titles,
      unknown fields, type mismatches handled correctly (200 for
      valid, 400/422 for invalid)

Observations flagged for v2 (not fixed, not blocking)
  - build_worker_context is unbounded on retry-heavy tasks (1000 runs
    → 63KB) and comment-heavy tasks (1000 comments → 50KB). A
    `--max-prior-attempts` / `--max-comments` cap would be appropriate.
  - `dir:` workspace paths are intentionally arbitrary; docs should
    note the threat model (trusted local user; path is stored
    verbatim, not sandboxed).

183/183 main kanban suite pass. Stress suite still opted out by
default; enable with `pytest --run-stress` or run scripts directly.
2026-04-28 01:13:17 -07:00
Teknium 123f8d0fed feat(kanban): battle-test suite + 2 real bugs it found
Added tests/stress/ — an opt-in suite that stresses the kernel with
real concurrency, real subprocesses, random property fuzzing, and scale
benchmarks. Exposed two real bugs the 4 audit passes missed.

Bugs found and fixed
  - _pid_alive returned True for zombie processes. os.kill(pid, 0)
    succeeds against zombies (process table entry exists until parent
    reaps), so a worker that exited normally but wasn't yet reaped
    would look alive to detect_crashed_workers indefinitely. Fixed by
    peeking at /proc/<pid>/status on Linux and treating State: Z as
    dead. No-op on other POSIX; Windows unchanged.
  - Task ID generator used 2 hex bytes (65k space). By birthday
    paradox, ~5% collision probability at 1k tasks, ~50% at 10k.
    Would raise UNIQUE constraint errors on large boards without a
    retry path. Bumped to 4 hex bytes (4.3B space). Surfaced by
    benchmarks trying to seed 10k tasks.

Stress suite (tests/stress/, opt-in via --run-stress)
  - test_concurrency.py — 5 processes race for 100 tasks. 1
    lost-claim race observed, 0 double-claims, 0 orphan runs.
  - test_concurrency_mixed.py — 10 workers + 1 reclaimer, 500
    tasks, random ops (claim/complete/block/unblock/archive).
    1518 events, 43 lost-claim races, zero invariant violations.
  - test_concurrency_reclaim_race.py — TTL < work duration so the
    reclaimer intentionally yanks tasks mid-work. 82 claims, 44
    reclaimed, 50 completed, 32 complete_refused (CAS correctly
    blocked late completes on reclaimed tasks).
  - test_subprocess_e2e.py — dispatcher spawns real python
    subprocess workers that heartbeat + complete via the CLI.
    Also exercises crash detection against a real dead PID via
    double-fork.
  - test_property_fuzzing.py — 500 randomized sequences, ~40k
    operations, 9 invariant checks after each step. Zero
    violations.
  - test_benchmarks.py — latency at 100/1k/10k tasks. Key numbers:
    dispatch_once @ 10k = 4ms, recompute_ready @ 10k = 47ms,
    list_tasks @ 10k = 50ms, build_worker_context w/ 50 parents
    = 1ms. Erosika's 10k starvation flag is pessimistic by roughly
    an order of magnitude.

Regression tests added to the main suite (+2)
  - test_pid_alive_detects_zombie: /proc check against a real
    zombified process (Linux-only, skipped elsewhere).
  - test_task_ids_dont_collide_at_scale: 500 creates, verify
    uniqueness + format.

New harness
  - tests/stress/conftest.py skips the suite by default; opt in with
    pytest --run-stress. Scripts are also __main__-runnable directly
    since they were developed as standalone stress tests.
  - tests/stress/_fake_worker.py: minimal Python worker that
    exercises the real subprocess contract (reads HERMES_KANBAN_TASK,
    heartbeats, completes via CLI).
  - tests/stress/README.md explains opt-in usage.

Test count: 182/182 kanban suite still pass under scripts/run_tests.sh;
stress suite is additive and out-of-band.
2026-04-27 21:37:34 -07:00
Teknium a24c6e191f fix(kanban): address @erosika's pre-merge review (issue #16102)
Six concrete bugs + two cheap v2 extensions from the review at
https://github.com/NousResearch/hermes-agent/issues/16102#issuecomment-4331125835
Larger items (structured comments as session substrate, taxonomy
reorg) deferred to v2 with reply posted on the issue.

Pre-merge bug fixes
  - unblock_task: close any stale current_run_id pointer with a
    reclaimed run inside the unblock txn. Defensive; the invariant
    holds under current data paths (block_task already closes the
    run) but a future or external write that leaves the pointer
    dangling would otherwise persist across the ready->blocked->
    ready cycle. Mirrors the same pattern in claim_task +
    archive_task.
  - Migration backfill: wrap the in-flight backfill loop in
    write_txn and add a CAS guard (`current_run_id IS NULL`) on
    the pointer UPDATE, with a cleanup path that marks any orphan
    run row reclaimed if the CAS fails. Prevents races against a
    concurrent dispatcher between SELECT and INSERT.
  - Notifier sub leak on non-done terminals: unsub on the last
    delivered event's kind being terminal (completed / blocked /
    gave_up / crashed / timed_out), not just on task.status in
    (done, archived). blocked / gave_up / crashed / timed_out used
    to fire one ping then strand the subscription row forever.
  - Notifier thrashes dead chats: per-subscription send-failure
    counter keyed on (task_id, platform, chat_id, thread_id).
    After 3 consecutive adapter.send exceptions, drop the sub
    automatically. Counter resets on any successful send.

Daemon ops visibility
  - run_daemon on_tick now tracks consecutive ticks where the
    ready queue is non-empty but 0 spawns succeeded. After 6 such
    ticks (default ~30s at interval=5), emits a WARN line to
    stderr pointing at profile health (venv, PATH, credentials)
    and `hermes kanban list --status blocked`. Rate-limited to
    one message per 5 minutes so a persistent outage doesn't
    spam logs.

v2 extensions shipped in scope (pure upside)
  - build_worker_context: new "Recent work by @assignee" section
    surfacing the 5 most-recent completed runs for the current
    task's assignee (excluding this task). Bounded, cached by
    the natural LIMIT, no new dependencies. Skipped when the
    task has no assignee.
  - Gateway notifier message prefix: terminal pings now lead
    with `@<assignee>` so fleets (one chat subscribing to many
    tasks with different workers) stay legible at a glance.
    One-line template change.

Deferred to v2 (noted in reply to erosika)
  - recompute_ready full-scan starvation at 10k+ tasks: dirty-set
    approach is a real refactor; fine as follow-up.
  - Skill ↔ assignee validation for routing: depends on skill
    introspection surface that isn't nailed down.
  - Structured comments (in_reply_to / addressed_to / kind) as
    multi-peer session substrate: schema-affecting, exactly the
    v2-scope design vulcan flagged shouldn't cram into this PR.
  - Pattern vs mechanism taxonomy split in docs: pure docs reorg,
    low urgency.

Tests (+6 in core functionality)
  - unblock_invariant_recovery (engineered leak, defensive close)
  - unblock_normal_path_no_spurious_run (no run created on happy
    block->unblock; erosika's main concern)
  - migration_backfill_idempotent_under_re_run (3x init_db on a
    legacy-shape DB yields exactly 1 run row, not 3)
  - build_worker_context_includes_role_history (role continuity)
  - build_worker_context_role_history_skipped_when_no_assignee
  - build_worker_context_role_history_bounded_to_5

180/180 kanban suite pass under scripts/run_tests.sh. Live-smoke
exercised all three kernel fixes end-to-end with isolated
HERMES_HOME.
2026-04-27 20:40:49 -07:00
Teknium 7206eed319 docs(kanban): add step-by-step tutorial with 10 dashboard screenshots
New website/docs/user-guide/features/kanban-tutorial.md walks four
user stories end-to-end, each backed by a real screenshot of the
dashboard running against seeded data.

Stories
  1. Solo dev shipping a feature (parent->child dependencies,
     structured handoff, run history rendering).
  2. Fleet farming (parallel independent tasks across 3 assignees,
     lanes-by-profile grouping, dispatcher daemon).
  3. Role pipeline with retry (PM spec -> eng implements -> review
     blocks -> eng retries -> review approves; two-run history
     visible in the drawer; downstream workers pull parent
     summary+metadata).
  4. Circuit breaker + crash recovery (2 spawn_failed + 1 gave_up
     for a deploy with missing creds; 1 crashed + 1 completed for
     an OOM-killed migration that recovered on retry).

Each story shows both CLI commands and the dashboard drawer
equivalent. Screenshots captured via playwright + chromium at 2x
device scale, then repalettized with PIL (22MB -> 6.1MB for the
10-image set, no visible quality loss verified against vision).

Side updates
  - website/sidebars.ts: added kanban-tutorial under features.
  - website/docs/user-guide/features/kanban.md: prefix banner
    linking new readers to the tutorial before the reference.

All image references validate: `/img/kanban-tutorial/*` maps to
website/static/img/kanban-tutorial/ (10 files). Docusaurus build
not run locally (no node_modules in worktree); CI build on merge
will confirm.
2026-04-27 20:28:53 -07:00
Teknium 1619c0e503 fix(kanban): third pass — auto-init on first use, show --json carries runs[]
Found during full-stack live-test of the kanban system: two bugs where
the kernel and CLI didn't match the documented contract.

Kernel
  - `connect()` now auto-runs schema creation + migrations on the
    first connection to a given DB path, matching its docstring
    ('Open (and initialize if needed)' — which was aspirational until
    now). Module-level _INITIALIZED_PATHS cache keeps subsequent
    connects cheap. Previously the docstring lied: every path that
    went through connect() on a fresh HERMES_HOME raised 'no such
    table: tasks' — only `hermes kanban init` and `daemon` triggered
    schema creation.
  - `init_db()` always re-runs the migration pass (clears the cache
    entry first). Callers that know the on-disk schema may have
    drifted — tests writing legacy event kinds, external tools
    upgrading an old DB file — can force re-migration.

CLI
  - `kanban_command()` entry point auto-inits the DB before
    dispatching any subcommand. Idempotent; the underlying
    connect-based init pattern makes this a one-line SELECT against
    sqlite_master after the first call.
  - `hermes kanban show --json` now includes:
      - `runs`: full attempt history (id, profile, step_key, status,
        outcome, summary, error, metadata, worker_pid, started_at,
        ended_at)
      - `run_id` on every event object
    Dashboard API already had both; CLI was behind. Now scripts that
    inspect a task can use `show --json` alone.
  - `hermes kanban show` (human-readable) prints a Runs section
    matching the `runs` subcommand's format, and each Event line
    prefixes its run_id when present. Makes attempt attribution
    visible at a glance without a second command.

Tests (+3)
  - cli_create_on_fresh_home_auto_inits (subprocess, no init_db call,
    must succeed — covers the most common first-user path)
  - connect_auto_inits_fresh_db (direct kernel use without init_db)
  - cli_show_json_carries_runs (runs[] present; events carry run_id)

174/174 kanban suite pass under scripts/run_tests.sh.

Live-tested end-to-end
  - Phases 1-6: CLI subprocess (create/claim/complete/bulk-guard/
    synthetic-run/reclaim-via-TTL/multi-attempt history)
  - Phases 7-10: Dashboard FastAPI TestClient (POST/PATCH with
    summary+metadata, drag-drop running->ready, archive-while-
    running, mark-done-with-handoff)
  - Phases 11-13: Dispatcher with stub spawn_fn (3 tasks success,
    3 failures -> gave_up circuit breaker, event.run_id attribution
    across retries)
  - Phases 14-15: WebSocket /events (run_id payload, auth rejects
    wrong tokens with code 1008)
  - Phase 16: Gateway notifier (unseen_events_for_sub returns
    run_id on events, message renders 'done — title' + handoff
    summary from event payload, crashed event path, sub cleanup)

Every surface — kernel, dispatcher, CLI, dashboard REST, dashboard
WebSocket, gateway notifier — exercised end-to-end against a live
FastAPI app with a real SQLite DB in an isolated HERMES_HOME. All
passed.
2026-04-27 19:41:08 -07:00
Teknium e27c819de3 fix(kanban): deep-scan pass 2 — synthetic runs, event.run_id plumbing, invariant recovery, live drawer refresh
Second integration audit covering surfaces the first pass didn't hit.
Found eight issues spanning kernel, dashboard frontend, notifier, and CLI.
All behavioral / UX fixes; no schema change.

Kernel
  - complete_task on a never-claimed task (ready/blocked → done with no
    run in flight) was silently dropping the summary/metadata/result
    onto a non-existent run. Now synthesizes a zero-duration run
    (started_at == ended_at) so attempt history is complete. Only
    fires when there's actually handoff data to persist — bare
    complete_task(tid) remains a no-op for run creation.
  - block_task on a never-claimed task had the same bug for --reason.
    Same fix: synthesize a zero-duration run when a reason is passed.
  - Event dataclass gained a `run_id: Optional[int] = None` field.
    list_events, unseen_events_for_sub, and the dashboard _event_dict
    were all SELECTing the column but dropping it on the way out,
    so downstream consumers couldn't group events by attempt. Every
    read path now surfaces run_id.
  - claim_task got a defensive invariant-recovery step: if somehow
    `current_run_id` is non-NULL on a task in 'ready' status (invariant
    violation from an unknown code path), close the leaked run as
    'reclaimed' inside the same txn as the new claim. No-op in the
    common case; belt-and-suspenders in case a future code path forgets
    to clear the pointer.

Dashboard
  - GET /tasks/:id events array now carries run_id per event (via
    _event_dict).
  - WebSocket /events SELECT now includes run_id in the pushed event
    payload.
  - TaskDrawer reloads itself on live events for its own task id. New
    `taskEventTick[taskId]` state in the Board, incremented on every
    WS event, passed down as `eventTick` prop; drawer's useEffect
    depends on it. Previously, background workers completing a task
    the user was viewing left the drawer showing stale data until
    manual close/reopen.
  - CSS: added `.hermes-kanban-run--ended` rule for the fallback class
    the JS emits when outcome is unset. Harmless before; just
    inconsistent.

CLI
  - `hermes kanban watch --kinds` help text listed the legacy event
    name `spawn_auto_blocked`. The kernel migration renames it to
    `gave_up`, so users typing the documented name got zero matches.
    Now shows the current lexicon (`completed,blocked,gave_up,
    crashed,timed_out`).

Tests (+6 in core functionality, +1 in dashboard plugin)
  - complete_never_claimed_task_synthesizes_run
  - block_never_claimed_task_synthesizes_run
  - complete_never_claimed_without_handoff_skips_synthesis
  - event_dataclass_carries_run_id (created.run_id None, completed.run_id matches)
  - unseen_events_for_sub_includes_run_id (notifier path)
  - claim_task_recovers_from_invariant_leak (engineer the leak, verify recovery)
  - event_dict_includes_run_id (dashboard API shape)

171/171 kanban suite pass under scripts/run_tests.sh. Live-smoke (isolated
HERMES_HOME via execute_code) exercised all six fixed paths plus the
claim-after-leak recovery sequence.

Docs
  - Runs section: new 'Synthetic runs for never-claimed completions'
    and 'Live drawer refresh' paragraphs explaining the invariants.
  - Event reference: `created` / `promoted` / `unblocked` entries now
    explicitly note `run_id` is `NULL`; `completed` / `blocked`
    describe synthetic-run fallback.
2026-04-27 19:23:49 -07:00
Teknium 1c78f6627a docs(kanban): document audit-pass invariants — bulk-close guard, reclaimed-on-status-change, completed event carries summary
- Runs section: dashboard PATCH parity (summary/metadata forward),
  `completed` event embeds first-line summary for notifiers, bulk
  --summary/--metadata refused, archive/drag-drop reclaim semantics.
- Event reference: added Payload column to Lifecycle and Edits
  tables; called out the invariant that `status` carries run_id
  when closing a reclaimed run.
2026-04-27 08:46:08 -07:00
Teknium 8ef2ae6502 fix(kanban): audit pass — close orphaned runs on archive / dashboard direct-status / drag-drop
Integration audit of the runs-as-first-class work (0146cb2bd) found five
bugs where structured runs got orphaned or dashboard parity was missing.
All behavioral fixes; no schema change needed.

Kernel
  - archive_task: when called on a running task, now closes the
    in-flight run with outcome='reclaimed' and clears current_run_id.
    Previously, dashboard bulk-archive or CLI `kanban archive <running>`
    would leave the task_runs row open with ended_at=NULL forever and
    strand the pointer. Adds the claim_lock / claim_expires / worker_pid
    clearing to the UPDATE so the task row is clean too.
  - complete_task: embeds the first-line handoff summary in the
    `completed` event payload (capped at 400 chars). Notifier can now
    render `✔ task done — <title>\n<summary>` without a second SQL hit,
    and the full summary still lives on the run row.

Dashboard plugin
  - _set_status_direct: drag-drop OFF 'running' (to 'ready', 'todo',
    'triage', 'done' — anywhere except back to 'running') now closes
    the active run with outcome='reclaimed'. Clears worker_pid too.
    Snapshots previous status + current_run_id before the UPDATE so
    the decision has the right before-state. status event rows now
    carry run_id when closing a run, NULL otherwise.
  - UpdateTaskBody: adds `summary` and `metadata` fields. PATCH
    /tasks/:id with status='done' now forwards them to complete_task,
    giving the dashboard parity with `hermes kanban complete --summary
    ... --metadata ...`. Previously these fields only existed on the
    CLI.

CLI
  - `hermes kanban complete a b c --summary X` or `--metadata Y`:
    refused with a clear stderr message instead of silently applying
    the same handoff to every task. Bulk-close without handoff flags
    still works. (Note: hermes_cli.main discards subcommand exit
    codes via `args.func(args)` without propagating; tracked
    separately. Side-effect check is the real guard.)

Gateway notifier
  - Completion message prefers run.summary (carried in event payload)
    over task.result. task.result remains the fallback for legacy rows
    written before runs shipped.
  - Docstring: renamed stale `spawn_auto_blocked` reference to
    `gave_up` / `timed_out` — matches the actual TERMINAL_KINDS
    tuple, which was already correct in code.

Tests (+8 in core functionality, +3 in dashboard plugin)
  - archive_of_running_task_closes_run
  - archive_of_ready_task_does_not_create_spurious_run
  - dashboard_direct_status_change_off_running_closes_run
  - dashboard_direct_status_change_within_same_state_is_noop_for_runs
  - cli_bulk_complete_with_summary_rejects (side-effect assertion)
  - cli_bulk_complete_without_summary_still_works
  - completed_event_payload_carries_summary
  - completed_event_payload_summary_none_when_missing
  - patch_status_done_with_summary_and_metadata
  - patch_status_done_without_summary_still_works (legacy path)
  - patch_status_archive_closes_running_run (E2E through FastAPI TestClient)

164/164 kanban suite pass under scripts/run_tests.sh. Live smoke
(execute_code with isolated HERMES_HOME) covered all five fixed paths
plus a re-claim-after-drag-drop to confirm the fresh run is tracked
correctly after the orphan close.
2026-04-27 07:44:39 -07:00
Teknium 0146cb2bd2 feat(kanban): runs as first-class (v1); structured handoffs; forward-compat for v2 workflows
Addresses vulcan-artivus's RFC review on issue #16102. Picks up the
structural changes that are expensive to retrofit later and zero-cost
to land now; defers workflow-template routing + per-stage lanes to v2
(kept forward-compat hooks in the schema).

Kernel
  - New `task_runs` table. Each claim opens a run (pid, claim_lock,
    heartbeat, max_runtime, started_at), each terminal transition
    closes it with an outcome (completed / blocked / crashed /
    timed_out / spawn_failed / gave_up / reclaimed). Multiple rows per
    task when retries happen, preserving full attempt history.
  - `tasks.current_run_id` points at the active run (NULL when idle);
    denormalised for cheap reads.
  - `task_events.run_id` carries the run a given event belongs to so
    UIs group events by attempt. claim/spawned/complete/block/crash/
    timeout/spawn_fail/gave_up/heartbeat events are all run-scoped;
    created/promoted/assigned/edited stay task-scoped (run_id=NULL).
  - Legacy DBs: migration adds the columns + indexes + synthesizes a
    run row for any task that's 'running' before the runs table
    existed, so subsequent complete/heartbeat/reclaim calls have a
    target. Idempotent.

Structured handoff
  - `complete_task(summary=, metadata=)` persists both on the closing
    run. `summary` falls back to `result` when omitted so single-run
    callers don't duplicate. `metadata` is a free-form dict
    ({changed_files, tests_run, findings, ...}).
  - `build_worker_context` rewrites: now reads "Prior attempts on this
    task" (closed runs: outcome, summary, error, metadata) and
    "Parent task results" pulls run.summary + run.metadata of the
    most-recent completed run per parent, falling back to task.result
    for legacy rows without runs. Retrying workers see why earlier
    attempts failed; downstream workers see parent handoffs
    structurally, not as loose `result` strings.

CLI
  - `hermes kanban complete <id> --summary "..." --metadata '{"files":1}'`.
    JSON is parsed and rejected with exit-2 if malformed.
  - New `hermes kanban runs <id> [--json]` verb. Shows per-run rows:
    outcome, profile, elapsed, summary, error. JSON mode serializes
    the full run dataclass for scripting.

Dashboard plugin
  - GET /tasks/:id now carries a runs[] array alongside task / events /
    comments / links. Each run serialised with outcome, summary,
    metadata, worker_pid, elapsed fields.
  - New Run History section in the drawer. Outcome-coloured left
    border (green=active, blue=completed, amber=reclaimed,
    red=crashed/timed_out/gave_up/blocked). Collapsed when >3 runs
    with a '+N earlier' toggle. Shows summary + error + metadata
    inline.

Forward-compat for v2 (vulcan's workflow templates + stages)
  - `tasks.workflow_template_id` and `tasks.current_step_key` added as
    nullable columns. v1 kernel ignores them for routing; v2 will add
    workflow_templates + workflow_steps tables and wire the dispatcher
    to consult them. task_runs has a matching `step_key` column. Lets
    a v2 release land additively without another schema migration.

Tests (+22 in test_kanban_core_functionality.py, +2 in dashboard)
  - run_created_on_claim / run_closed_on_complete_with_summary
  - run_summary_falls_back_to_result
  - multiple_attempts_preserved_as_runs (3 attempts: reclaimed →
    crashed → completed, all visible in list_runs)
  - run_on_block_with_reason / run_on_spawn_failure_records_failed_runs
    (5 spawn_failed runs + 1 gave_up run)
  - event_rows_carry_run_id (task-scoped vs run-scoped split)
  - build_worker_context_includes_prior_attempts
  - build_worker_context_uses_parent_run_summary (metadata JSON in context)
  - migration_backfills_inflight_run_for_legacy_db (simulates a
    pre-migration running task, re-runs init_db, asserts backfill)
  - forward_compat_columns_writable
  - cli_runs_verb + cli_runs_json
  - cli_complete_with_summary_and_metadata (JSON round-trip through
    shlex + argparse)
  - cli_complete_bad_metadata_exits_nonzero
  - task_detail_includes_runs / task_detail_runs_empty_before_claim

269/269 kanban suite pass under scripts/run_tests.sh. Live-smoke
covered: single-attempt complete → run closed + summary persisted;
retry scenario → two runs visible (blocked + completed); parent run
summary + metadata surfaced to child via build_worker_context;
forward-compat columns writable via UPDATE; GET /tasks/:id returns
runs[].

Docs
  - New 'Runs — one row per attempt' section in kanban.md: the
    why (full attempt history, structured metadata), the two-table
    model (task is logical, run is execution), the structured handoff
    shape (--summary / --metadata), example CLI + dashboard output,
    forward-compat note for v2.
  - Event reference updated to mention task_events.run_id.
  - CLI reference gains 'hermes kanban runs <id>'.

Not in v1 (deferred to v2):
  - Workflow templates (workflow_templates + workflow_steps tables,
    stage-based routing, success/failure step links).
  - 'stage' as a distinct axis from status in the UI.
  - Shared-by-default workspace binding across stages of the same
    workflow run.
  - Pipeline replacement for the kanban-orchestrator skill (the
    orchestrator's 'decompose, don't execute' guidance is still
    correct; it becomes partly redundant once workflows land).
2026-04-27 06:54:19 -07:00
Teknium da7d09c3b6 feat(kanban): max-runtime timeouts, worker heartbeats, assignees picker, event vocab cleanup
Ports four items from the Multica audit (https://github.com/multica-ai/multica).
Dropped their cross-host server/daemon architecture and their Postgres+pgvector
skill search — both the wrong shape for our single-host SQLite kernel.

1. Per-task max-runtime (`max_runtime_seconds` column)
   - New kernel function `enforce_max_runtime(conn)` runs in every dispatch
     tick. When a running task's elapsed time exceeds the cap, we SIGTERM
     the worker, wait a 5 s grace (polling _pid_alive), then SIGKILL. The
     task goes back to 'ready' with a `timed_out` event and re-queues
     on the next tick (unless the spawn-failure circuit breaker has
     already parked it).
   - Host-local only: lock prefix must match this host's claimer_id so we
     never signal a PID on another machine.
   - CLI: `hermes kanban create --max-runtime 30m | 2h | 1d | <seconds>`.
     New `_parse_duration` helper accepts s/m/h/d suffixes or bare
     integers.
   - Dashboard POST body + the card's `max_runtime_seconds` field.

2. Worker heartbeat (`last_heartbeat_at` column, `heartbeat` event)
   - `heartbeat_worker(conn, task_id, note=None)` emits the event and
     touches last_heartbeat_at. Refused when the task isn't running.
   - CLI: `hermes kanban heartbeat <id> [--note "..."]`.
   - kanban-worker skill instructs workers to heartbeat during long
     loops (training runs, encodes, crawls, batch uploads).
   - Separate signal from PID crash detection: a worker's Python can
     still be alive while the actual work process is stuck. Heartbeat
     absence is diagnostic; future work can auto-block on stale
     heartbeats but v1 just surfaces the signal.

3. Assignee enumeration (`known_assignees`, `list_profiles_on_disk`)
   - Scans ~/.hermes/profiles/ for dirs containing config.yaml + unions
     with current assignees on the board. Each entry returns
     {name, on_disk, counts: {status: n}}.
   - CLI: `hermes kanban assignees [--json]`. Also hooked into
     `hermes kanban init` which now prints discovered profiles so new
     installs see 'these are the assignees you can target' immediately.
   - Dashboard: GET /api/plugins/kanban/assignees for the picker.

4. Event vocab cleanup (three renames + three new kinds)
   - `ready` → `promoted` (fires when deps clear; clearer semantic).
   - `priority` → `reprioritized` (past-tense verb, matches others).
   - `spawn_auto_blocked` → `gave_up` (short, memorable; the circuit
     breaker gave up on this task).
   - New: `spawned` (emitted with {pid} on successful spawn),
     `heartbeat` ({note?}), `timed_out`
     ({pid, elapsed_seconds, limit_seconds, sigkill}).
   - One-shot migration in `_migrate_add_optional_columns` renames
     legacy rows in-place on init_db(), so existing DBs upgrade cleanly.
   - Gateway notifier's TERMINAL_KINDS set updated; timed_out gets its
     own ⏱ message template, gave_up renamed from 'auto-blocked'.
   - Plugin_api.py's two 'priority' emit sites renamed to
     'reprioritized'.
   - Documented in a new 'Event reference' section in kanban.md,
     grouped into three clusters (lifecycle / edits / worker
     telemetry) with payload shapes.

Tests (+18 in tests/hermes_cli/test_kanban_core_functionality.py,
136/136 pass):
  - max_runtime_terminates_overrun_worker: real SIGTERM flow with
    _pid_alive stub, verifies event payload + state reset.
  - max_runtime_none_means_no_cap: unbounded tasks aren't timed out.
  - create_task_persists_max_runtime.
  - enforce_max_runtime_integrates_with_dispatch: kernel-level +
    dispatch_once chaining.
  - heartbeat_on_running_task + heartbeat_refused_when_not_running.
  - cli_heartbeat_verb with --note round-trip.
  - recompute_ready_emits_promoted_not_ready.
  - spawn_failure_circuit_breaker_emits_gave_up.
  - spawned_event_emitted_with_pid.
  - migration_renames_legacy_event_kinds (injects old rows, re-runs
    init_db, asserts rename).
  - list_profiles_on_disk (tmp_path + config.yaml filter).
  - known_assignees_merges_disk_and_board (profiles on disk + board
    assignees + per-status counts).
  - cli_assignees_json.
  - parse_duration_accepts_formats (s/m/h/d/float).
  - parse_duration_rejects_garbage.
  - cli_create_max_runtime_via_duration (2h → 7200).
  - cli_create_max_runtime_bad_format_exits_nonzero.

Live smoke: POST /tasks with max_runtime_seconds round-trips;
/assignees returns the union of on-disk + board-assigned names;
PATCH priority produces 'reprioritized' events (not 'priority');
board cards expose max_runtime_seconds + last_heartbeat_at.

Docs (website/docs/user-guide/features/kanban.md):
  - New 'Event reference' section with three-cluster table
    (lifecycle / edits / worker telemetry) + payload shapes.
  - CLI reference updated for --max-runtime, heartbeat, assignees.
  - Gateway notifications section updated for the new TERMINAL_KINDS.

Not ported from Multica (deliberate, documented in the out-of-scope
section already): Postgres+pgvector skill search (heavy deps conflict
with SQLite kernel), server+daemon cross-host model (we're
single-host on purpose), first-class agent identity with threaded
comments (we keep the board profile-agnostic).
2026-04-27 06:32:17 -07:00
Teknium af8d43dbbb feat(kanban): core hardening — daemon, circuit breaker, crash detect, logs, notify, bulk, stats
Eliminates every 'known broken on day one' item in the core functionality
audit. The board is now self-driving (daemon, not cron), self-healing
(crash detection, spawn-failure circuit breaker), and self-reporting
(logs, stats, gateway notifications).

Dispatcher
  - New `hermes kanban daemon` long-lived loop with --interval, --max,
    --failure-limit, --pidfile, --verbose, signal-clean shutdown
    (SIGINT/SIGTERM via threading.Event). A kb.run_daemon() entry point
    lets tests drive it inline without subprocess.
  - `hermes kanban init` now prints the dispatcher setup hint so users
    don't leave the board off-by-default. Ships a systemd user unit at
    plugins/kanban/systemd/hermes-kanban-dispatcher.service.
  - Removed the old 'add this to cron' doc path. Cron runs agent
    prompts (LLM cost per tick) — unacceptable for a per-minute
    coordination loop.

Worker aliveness / safety
  - Spawn returns the child's PID; dispatcher stores it on the task row
    and calls detect_crashed_workers() every tick. If the PID is gone
    but the claim TTL hasn't expired, the task drops back to ready with
    a 'crashed' event. Host-local only — cross-host PIDs are ignored
    per the single-host design.
  - Spawn-failure circuit breaker: after N consecutive spawn_failed
    events on the same task (default 5), the dispatcher auto-blocks
    with the last error as the reason. Success resets the counter.
    Workspace-resolution failures count against the same budget.
  - Log rotation: _rotate_worker_log trims at 2 MiB, keeps one
    generation (.log.1), bounds per-task disk usage at ~4 MiB.

Idempotency / dedup
  - create_task(idempotency_key=...) returns the existing non-archived
    task id for retried webhooks. --idempotency-key on the CLI, json
    body field on the dashboard plugin. Archived tasks don't block a
    fresh create with the same key.

CLI surface
  - Bulk verbs: complete, unblock, archive accept multiple ids;
    block accepts --ids for sibling blocks with the same reason.
  - New verbs: daemon, watch (live event tail filtered by
    assignee/tenant/kinds), stats, log, notify-subscribe,
    notify-list, notify-unsubscribe.
  - dispatch gains --failure-limit + crashed/auto_blocked columns in
    JSON output and human-readable output.
  - gc accepts --event-retention-days / --log-retention-days; prunes
    task_events for terminal tasks and old log files.

Gateway integration
  - New GatewayRunner._kanban_notifier_watcher: polls
    kanban_notify_subs every 5s, pushes ✔/⏸/✖ messages to subscribed
    chats for completed/blocked/spawn_auto_blocked/crashed events.
    Cursor-advanced per-sub; auto-removed when the task reaches
    done/archived. Runs alongside the session expiry and platform
    reconnect watchers — SQLite work in asyncio.to_thread so the
    event loop never blocks.
  - /kanban create in the gateway auto-subscribes the originating
    chat (platform + chat_id + thread_id). Users see
    '(subscribed — you'll be notified when t_abcd completes or
    blocks)' appended to the response.

Dashboard plugin
  - GET /stats returns board_stats (by_status, by_assignee,
    oldest_ready_age_seconds).
  - GET /tasks/:id/log returns the worker log with optional ?tail=N
    cap. 404 on unknown task, exists=false when the task has never
    spawned.
  - POST /tasks accepts idempotency_key; both Pydantic body and the
    create_task kwarg now round-trip.
  - /board attaches task.age (created/started/time_to_complete in
    seconds) so the UI can colour stale cards without recomputing.
  - Card CSS: amber border after N minutes, red border when clearly
    stuck (tier per status: running 10m/60m, ready 1h/24h, todo
    7d/30d, blocked 1h/24h).
  - Drawer: new Worker log section, auto-loads on mount, last 100 KB
    cap with on-disk path surfaced when truncated.

Kernel
  - Schema additions: tasks.idempotency_key, tasks.spawn_failures,
    tasks.worker_pid, tasks.last_spawn_error; new
    kanban_notify_subs table. All gated by _migrate_add_optional_columns
    so legacy DBs upgrade cleanly.
  - release_stale_claims / complete_task / block_task now all clear
    worker_pid so crash detection doesn't false-positive on reclaimed
    tasks.
  - read_worker_log fixed: tail-skip no longer eats one-giant-line
    logs (common with child processes that don't flush newlines
    before dying).

Tests (tests/hermes_cli/test_kanban_core_functionality.py, 28 new)
  - Idempotency: same key returns existing, archived doesn't block,
    no key never collides
  - Circuit breaker: auto-blocks after limit, success resets counter,
    workspace-resolution failure counts against budget
  - Aliveness: _pid_alive helper, detect_crashed_workers reclaims
    exited child
  - Daemon: runs and stops cleanly via stop_event, survives a tick
    exception
  - Stats + task_age helpers
  - Notify subs: CRUD, cursor advances, distinct-thread is a separate row
  - GC: events-only-for-terminal-tasks, old worker logs deleted
  - Log: rotation keeps one generation, read_worker_log tail
  - CLI: bulk complete/archive/unblock/block, create with
    --idempotency-key, stats --json, notify-subscribe+list, log
    missing task, gc reports counts
  - run_slash parity: smoke-tests every registered verb (23
    invocations); none may raise or return empty string

Full kanban test suite: 234/234 pass under scripts/run_tests.sh
(60 original + 30 dashboard plugin + 28 new core + 116 command
registry). Live smoke covers /stats, idempotency, age, log endpoint
with and without content, log?tail= truncation signal, 404 on unknown
task.

Docs (website/docs/user-guide/features/kanban.md)
  - 'Core concepts' rewritten: new statuses (triage), idempotency key,
    dispatcher-as-daemon-not-cron with circuit breaker behaviour
    documented.
  - Quick start swapped to daemon. New systemd section covers user
    service install.
  - New sections: idempotent create, bulk verbs, gateway
    notifications, out-of-scope single-host note (kanban.db is local;
    don't expect multi-host).
  - CLI reference updated for every new verb, every new flag.
2026-04-26 13:01:09 -07:00
Teknium 27fc6c1086 feat(kanban): bulk ops, drawer edit, dep editor, markdown, touch, config
The dashboard plugin gets the last layer of features that turn it from a
'usable read surface with drag-drop' into a 'full kanban UI' — no more
'drop to CLI to do X' moments from inside the tab.

Plugin backend
  - POST /tasks/bulk — apply the same patch (status / archive / assignee
    / priority) to every id in the request body. Each id runs
    independently: one bad id reports {ok: false, error: ...} without
    aborting siblings. Status transitions that aren't legal for the
    current state are surfaced per-id ('transition to done refused').
    Used by the multi-select bulk action bar.
  - GET /config — returns the dashboard.kanban section of config.yaml
    (default_tenant, lane_by_profile, include_archived_by_default,
    render_markdown) with sensible defaults when the section is absent.
    Loaded once by the SPA to preselect filters and toggle markdown
    rendering.
  - _conn() helper — every handler now goes through it, calling
    kanban_db.init_db() (idempotent) before every connection. Fresh
    installs work whether the first hit is GET /board, POST /tasks, or
    any other endpoint — no more 'no such table: tasks' when the CLI
    or a script hits the plugin before the dashboard has ever loaded.

Plugin UI (plugin bundle, +~12 KB)
  - Multi-select: per-card checkbox; shift/ctrl-click also toggles
    without opening the drawer. A BulkActionBar appears above the
    columns with batch → ready / complete / archive / reassign
    (profile dropdown + unassign option). Destructive batches confirm
    first. Partial failures from the backend are surfaced inline.
  - Drawer inline editing:
    - Click the title → TitleEditor swaps in an input, Enter saves,
      Escape cancels.
    - Click the Assignee meta row → AssigneeEditor input (empty string
      unassigns).
    - Click the Priority meta row → PriorityEditor numeric input.
    - New 'edit' button on Description → full-width textarea; Save /
      Cancel switch back to rendered view.
  - Dependency editor: chip list of parents + children with per-chip
    × button (calls DELETE /links). Add-parent / add-child dropdowns
    filter out self + already-linked tasks so you cannot re-add a
    duplicate edge or a self-loop. Cycle rejections from the server
    surface cleanly via the existing error banner.
  - Parent selection in InlineCreate: new dropdown listing every task
    on the board ('{id} — {title}') — picking one sends parents=[id]
    with the create payload, so the task lands in todo (or triage if
    created from the Triage column) with the dependency wired up.
  - Safe markdown rendering for description, comment bodies, and
    result. A small in-bundle renderer handles headings, bold, italic,
    inline code, fenced code, bullet lists, and http(s)/mailto links.
    Every substitution runs on HTML-escaped input (no raw HTML), links
    get target=_blank + rel=noopener,noreferrer. Disabled by config
    key dashboard.kanban.render_markdown=false (falls back to <pre>).
  - Touch drag-drop: attachTouchDrag() installs a pointerdown handler
    that spawns a drag proxy, tracks elementFromPoint under the finger,
    and dispatches a hermes-kanban:drop CustomEvent on the column when
    released. Desktop continues to use native HTML5 DnD. Columns
    listen for both.
  - ErrorBoundary already present from the prior commit catches any
    renderer throw; markdown escape + touch-proxy cleanup both have
    their own try/finally.

Tests (tests/plugins/test_kanban_dashboard_plugin.py — 90/90 pass)
  - bulk_status_ready: 3 tasks blocked, batch → ready, all move
  - bulk_archive hides all ids from default board
  - bulk_reassign changes every assignee
  - bulk_unassign_via_empty_string sets assignee back to None
  - bulk_partial_failure_doesnt_abort_siblings: bogus id in middle,
    good siblings still get priority=7
  - bulk_empty_ids_400
  - config_returns_defaults_when_section_missing
  - config_reads_dashboard_kanban_section (writes config.yaml, verifies
    every key round-trips)

Live smoke (real FastAPI app + isolated HERMES_HOME):
  - /config without section returns defaults
  - /config with dashboard.kanban section returns the configured values
  - POST /tasks as the first-ever request (no prior /board) succeeds —
    auto-init handles it
  - Link add + remove via POST /links + DELETE /links round-trip
  - Bulk priority bump on 2 ids, both get priority=5
  - Bulk archive hides ids from default board
  - PATCH {title, body} updates the task, markdown source survives
    the round trip
  - POST /tasks {triage: true, parents: [id]} lands in triage, not todo
  - Bulk partial: 2 good + 1 bogus returns per-id outcome

Docs (website/docs/user-guide/features/kanban.md)
  - 'What the plugin gives you' rewritten to reflect bulk, drawer
    edit, dep editor, parent-on-create, markdown, touch drag-drop.
  - New 'Dashboard config' subsection with a YAML example for
    dashboard.kanban.*.
  - REST table gains /tasks/bulk and /config rows.
2026-04-26 12:36:23 -07:00
Teknium 45806629c5 feat(kanban): Triage column, progress rollup, WS auth, lanes, polish
Follows up on the initial dashboard plugin with the items called out
during self-review — ships the GUI-reality claims the PR body made,
closes the WebSocket auth gap, and lands the 'Triage' status the design
spec's Fusion-style screenshot leads with.

Kernel changes
  - kanban_db.VALID_STATUSES gains 'triage'. status is TEXT without a
    CHECK constraint so no schema migration is needed.
  - create_task(triage=True) forces the initial status to 'triage'
    regardless of parents, and parent ids are still validated so the
    eventual link rows don't dangle. recompute_ready() only promotes
    'todo' -> 'ready', so triage tasks are naturally isolated from the
    dispatcher pipeline.
  - hermes kanban create gains --triage.
  Patterns table (docs) gains P9 'Triage specifier'.

Plugin backend (plugins/kanban/dashboard/plugin_api.py)
  - GET /board now auto-init's kanban.db on first read (idempotent).
    A fresh install shows an empty board instead of 'failed to load'.
  - GET /board returns a new 'progress' field per task — {done, total}
    of child-task completion, or None if the task has no children.
  - BOARD_COLUMNS prepends 'triage'.
  - POST /tasks accepts {triage: bool}; PATCH /tasks/:id accepts
    {status: 'triage'}.
  - WebSocket /events now requires ?token=<session_token> as a query
    param — browsers can't set Authorization on a WS upgrade, so this
    matches the pattern the in-browser PTY bridge uses. Constant-time
    compare against hermes_cli.web_server._SESSION_TOKEN. In bare-test
    contexts (no dashboard module) the check no-ops so the tail loop
    stays testable. Security boundary documented in the module header
    and in website/docs/user-guide/features/kanban.md.

Plugin UI (plugins/kanban/dashboard/dist/index.js + style.css)
  - Adds the Triage column (lilac dot) with helper text
    'Raw ideas — a specifier will flesh out the spec'. Inline-create
    from the Triage column parks new tasks in triage.
  - Status action row in the drawer gains '→ triage'.
  - Progress pill (N/M) on cards that have children. Full-complete
    state tints the pill green.
  - 'Lanes by profile' toolbar toggle — sub-groups the Running column
    by assignee so you see at a glance which specialist is busy on
    what.
  - Destructive status moves (done / archived / blocked) via drag-drop
    OR via the drawer action row now prompt for confirmation.
  - Escape closes the drawer.
  - Live-update reloads are debounced (250ms) so a burst of
    task_events triggers one refetch, not N.
  - WebSocket includes ?token= built from window.__HERMES_SESSION_TOKEN__.
  - WebSocket reconnect uses exponential backoff capped at 30s, not
    a fixed 1.5s spin loop, and surfaces a user-visible error on
    code-1008 (auth rejected) instead of reconnecting forever.
  - ErrorBoundary wraps the page — a bad card render shows a
    'rendering error, reload view' card instead of crashing the tab.

Tests (tests/plugins/test_kanban_dashboard_plugin.py, +5 tests = 21)
  - empty-board shape now asserts all 6 columns including 'triage'
  - create_triage_lands_in_triage_column
  - triage_task_not_promoted_to_ready (dispatcher bypasses triage)
  - patch_status_triage_works (both into triage and out of it)
  - board_progress_rollup (0/2 -> 1/2 -> childless cards = None)
  - board_auto_initializes_missing_db
  - ws_events_rejects_when_token_required (three sub-assertions:
    missing → 1008, wrong → 1008, correct → handshake accepted)

All 82 kanban tests pass under scripts/run_tests.sh.

Docs
  - kanban.md 'What the plugin gives you' fully rewritten to match
    shipped reality (triage, progress pill, assignee lanes,
    destructive-confirm, Escape-close, debounce).
  - New 'Security model' subsection documents the explicit-plugin-
    route-bypass, the WS token requirement, and the --host 0.0.0.0
    warning; also notes that kanban.db is profile-agnostic on purpose
    (the coordination primitive) so cross-profile visibility is
    expected.
  - CLI command reference shows --triage.
  - Collaboration patterns table adds P9 'Triage specifier'.
2026-04-26 12:26:43 -07:00
Teknium 4093201c47 feat(kanban): dashboard plugin — Linear/Fusion-style board UI
Ships plugins/kanban/dashboard/ as a bundled dashboard plugin. No core
changes — uses the standard dashboard plugin contract (manifest.json +
dist/index.js + plugin_api.py) documented in 'Extending the Dashboard'.

What the tab gives you:
- One column per kanban status (todo / ready / running / blocked / done;
  archived behind a toggle), column counts, coloured status dots.
- Cards with id, title, priority badge, tenant tag, assignee,
  comment/link counts, 'created N ago'.
- HTML5 drag-drop between columns — status change routes through the
  same kanban_db code the CLI /kanban verbs use, so the three surfaces
  (CLI, gateway, dashboard) can never drift.
- Inline create per-column (title, assignee, priority).
- Side drawer on card click: description, status action row
  (→ ready / → running / block / unblock / complete / archive),
  dependency links, comment thread with Enter-to-submit,
  last 20 events.
- Toolbar: search, tenant filter, assignee filter, show-archived,
  nudge-dispatcher (skip the 60s wait), refresh.
- Live updates via WebSocket tailing task_events — the board reflects
  CLI or gateway actions in real time.

REST surface under /api/plugins/kanban/: GET /board, GET /tasks/:id,
POST /tasks, PATCH /tasks/:id, POST /tasks/:id/comments, POST /links,
DELETE /links, POST /dispatch, WS /events. Every handler is a thin
wrapper around kanban_db — no new business logic.

Visually theme-aware: the plugin CSS reads only --color-*, --radius,
--font-mono etc. so it reskins with whichever dashboard theme is active.

Tests (tests/plugins/test_kanban_dashboard_plugin.py, 16 tests):
- empty board shape
- create + appears in ready column with tenant/assignee rollups
- tenant filter
- detail includes parents/children/events
- 404 on unknown task
- PATCH status: complete / block / unblock / ready drag-drop / running
- PATCH reassign, priority, edit, invalid-status rejection
- POST comment (plus empty-body rejection)
- POST link + DELETE link + cycle rejection
- POST dispatch (dry run)

All 76 kanban tests pass under scripts/run_tests.sh.

Docs: website/docs/user-guide/features/kanban.md gains a full
'Dashboard (GUI)' section covering install, architecture, REST surface,
live-updates mechanism, extending, and scope boundary.
2026-04-26 12:08:47 -07:00
Teknium 9f610aa8f3 docs(kanban): add GUI/Dashboard plugin section
The /kanban CLI + slash command are enough to run the board
headlessly, but triage and cross-profile supervision want a
visual board. Document the design as a dashboard plugin that:

- reads live state from kanban.db over a WebSocket on
  task_events (no polling)
- writes through run_slash() so CLI/gateway/GUI cannot drift
- mounts under /api/plugins/kanban/ following the existing
  'Extending the Dashboard' plugin shape

The plugin is strictly a thin layer over kanban_db — no new
business logic, nothing to merge into the kernel.
2026-04-26 11:57:04 -07:00
Teknium e1c5e741ad feat(kanban): durable multi-profile collaboration board (#16081)
New `hermes kanban` CLI subcommand + `/kanban` slash command + skills for
worker and orchestrator profiles. SQLite-backed task board
(~/.hermes/kanban.db) shared across all profiles on the host. Zero
changes to run_agent.py, no new core tools, no tool-schema bloat.

Motivation: delegate_task is a function call — sync fork/join, anonymous
subagent, no resumability, no human-in-the-loop. Kanban is the durable
shape needed for research triage, scheduled ops, digital twins,
engineering pipelines, and fleet work. They coexist (workers may call
delegate_task internally).

What this adds
- hermes_cli/kanban_db.py — schema, CAS claim, dependency resolution,
  dispatcher, workspace resolution, worker-context builder.
- hermes_cli/kanban.py — 15-verb CLI surface and shared run_slash()
  entry point used by both CLI and gateway.
- skills/devops/kanban-worker — how a profile should work a claimed task.
- skills/devops/kanban-orchestrator — "you are a dispatcher, not a
  worker" template with anti-temptation rules.
- /kanban slash command wired into cli.py and gateway/run.py. Bypasses
  the running-agent guard (board writes don't touch agent state), so
  /kanban unblock can free a stuck worker mid-conversation.
- Design spec at docs/hermes-kanban-v1-spec.pdf — comparative analysis
  vs Cline Kanban, Paperclip, NanoClaw, Gemini Enterprise; 8 patterns;
  4 user stories; implementation plan; concurrency correctness.
- Docs: website/docs/user-guide/features/kanban.md, CLI reference
  updated, sidebar entry added.

Architecture highlights
- Three planes: control (user + gateway), state (board + dispatcher),
  execution (pool of profile processes).
- Every worker is a full OS process, spawned as `hermes -p <profile>`.
  No in-process subagent swarms — solves NanoClaw's SDK-lifecycle
  failure class.
- Atomic claim via SQLite CAS in a BEGIN IMMEDIATE transaction; stale
  claims reclaimed 15 min after their TTL expires.
- Tenant namespacing via one nullable column — one specialist fleet
  can serve many businesses with data isolation by workspace path.

Tests: 60 targeted tests (schema, CAS atomicity, dependency resolution,
dispatcher, workspace kinds, tenancy, CLI + slash surface). All pass
hermetic via scripts/run_tests.sh.
2026-04-26 08:29:46 -07:00
Teknium e3901d5b25 fix(run_agent): background review fork inherits parent's live runtime (#16099)
The background memory/skill review (_spawn_background_review) has always
forked a new AIAgent passing only model and provider, then relied on
AIAgent.__init__ to re-resolve credentials from env vars. This works for
users with keys in ~/.hermes/.env but silently falls back to env-var
auto-resolution in all cases, which fails for OAuth-only providers,
session-scoped creds, and credential-pool setups where auth can't be
reconstructed from env.

This used to be invisible -- failures were swallowed via logger.debug().
PR 8a2506af4 (Apr 24) surfaced auxiliary failures to the user, which
made the stale bug visible as:
    "Auxiliary background review failed: No LLM provider configured"

Fix: pass api_key, base_url, api_mode, and credential_pool from the
parent's live runtime into the fork -- matching how every other
auxiliary path (compression, memory flush, vision, session search)
already inherits the parent's credentials via _current_main_runtime().
2026-04-26 08:29:40 -07:00
Teknium 06f81752ed Revert "feat(kanban): durable multi-profile collaboration board (#16081)" (#16098)
This reverts commit 15937a6b46.
2026-04-26 08:29:37 -07:00
Teknium 9ef1ae138a fix(docker): don't chown config.yaml after gosu drop (#15865) (#16096)
The chown/chmod block on config.yaml was added in b24d239ce to keep the
file readable by the hermes runtime user, but it sat in the post-gosu
'running as hermes' section of the entrypoint. That meant:

1. Default `docker run <image>` — container starts as root, entrypoint
   drops to hermes via gosu, then non-root hermes tries to chown the
   file to hermes. Works by coincidence because the file was just
   created by root during volume setup and gosu target == target owner.
2. `docker run -u $(id -u):$(id -g) <image>` (#15865) — container
   starts as the caller's UID. The root block is skipped entirely, we
   land in the hermes section as some arbitrary non-root user, and
   chown to 'hermes' fails with 'Operation not permitted'. Script
   aborts under `set -e`.

Move the chown/chmod into the root block (before the gosu exec) where
it actually has privilege, and guard with `2>/dev/null || true` so
rootless Podman (where even in-container root lacks host-side chown
rights) doesn't abort either.

Closes #15865
2026-04-26 08:27:39 -07:00
Teknium c5196f1fc2 chore(release): map focusflow.app.help@gmail.com to yes999zc
Salvage PR #15883 cherry-picked FocusFlow Dev's commit; release-notes
CI needs the AUTHOR_MAP entry to attribute to the PR author's GitHub
login rather than a placeholder.
2026-04-26 08:25:22 -07:00
FocusFlow Dev 63bf7a29b6 fix(run_agent): prevent reasoning_content regression in DeepSeek/Kimi tool-call replay
PR #15478 fixed missing reasoning_content for DeepSeek API but introduced
a regression: tool-call messages with genuine 'reasoning' field were
overwritten by empty-string fallback before promotion.

Re-order _copy_reasoning_content_for_api steps:
  1. Preserve explicit reasoning_content
  2. Promote 'reasoning' field (MOVED UP)
  3. DeepSeek/Kimi tool-call empty-string fallback (MOVED DOWN)
  4. Non-thinking provider cleanup

Fixes #15812, relates #15749, #15478.
2026-04-26 08:25:22 -07:00
Teknium 15937a6b46 feat(kanban): durable multi-profile collaboration board (#16081)
New `hermes kanban` CLI subcommand + `/kanban` slash command + skills for
worker and orchestrator profiles. SQLite-backed task board
(~/.hermes/kanban.db) shared across all profiles on the host. Zero
changes to run_agent.py, no new core tools, no tool-schema bloat.

Motivation: delegate_task is a function call — sync fork/join, anonymous
subagent, no resumability, no human-in-the-loop. Kanban is the durable
shape needed for research triage, scheduled ops, digital twins,
engineering pipelines, and fleet work. They coexist (workers may call
delegate_task internally).

What this adds
- hermes_cli/kanban_db.py — schema, CAS claim, dependency resolution,
  dispatcher, workspace resolution, worker-context builder.
- hermes_cli/kanban.py — 15-verb CLI surface and shared run_slash()
  entry point used by both CLI and gateway.
- skills/devops/kanban-worker — how a profile should work a claimed task.
- skills/devops/kanban-orchestrator — "you are a dispatcher, not a
  worker" template with anti-temptation rules.
- /kanban slash command wired into cli.py and gateway/run.py. Bypasses
  the running-agent guard (board writes don't touch agent state), so
  /kanban unblock can free a stuck worker mid-conversation.
- Design spec at docs/hermes-kanban-v1-spec.pdf — comparative analysis
  vs Cline Kanban, Paperclip, NanoClaw, Gemini Enterprise; 8 patterns;
  4 user stories; implementation plan; concurrency correctness.
- Docs: website/docs/user-guide/features/kanban.md, CLI reference
  updated, sidebar entry added.

Architecture highlights
- Three planes: control (user + gateway), state (board + dispatcher),
  execution (pool of profile processes).
- Every worker is a full OS process, spawned as `hermes -p <profile>`.
  No in-process subagent swarms — solves NanoClaw's SDK-lifecycle
  failure class.
- Atomic claim via SQLite CAS in a BEGIN IMMEDIATE transaction; stale
  claims reclaimed 15 min after their TTL expires.
- Tenant namespacing via one nullable column — one specialist fleet
  can serve many businesses with data isolation by workspace path.

Tests: 60 targeted tests (schema, CAS atomicity, dependency resolution,
dispatcher, workspace kinds, tenancy, CLI + slash surface). All pass
hermetic via scripts/run_tests.sh.
2026-04-26 08:24:26 -07:00
Teknium 454d883e69 refactor: drop persist_session plumbing + fix broken btw mid-turn bypass (#16075)
Follow-up to PR #16053 (/btw as /background alias). Cleans up the
plumbing added exclusively for the old ephemeral /btw handler and
repairs a broken btw bypass that landed between my refactor and this
follow-up.

run_agent.py:
- Remove persist_session kwarg, instance attr, and _persist_session
  short-circuit. Only /btw ever passed persist_session=False; with
  /btw gone the default (always persist) is the only behavior anyone
  ever wanted.

gateway/run.py:
- Remove the unreachable 'if _cmd_def_inner.name == "btw"' block
  (PR #16059). Canonical name for a /btw message is 'background' after
  alias resolution — the comparison could never be true, and it called
  _handle_btw_command which no longer exists. The /background branch
  above it already dispatches /btw correctly.

tests/gateway/test_running_agent_session_toggles.py:
- Fix test_btw_dispatches_mid_run to mock _handle_background_command
  (the real dispatch target for /btw) instead of the deleted
  _handle_btw_command.
2026-04-26 07:15:23 -07:00
Teknium 70f56e7605 fix(gateway): let /btw dispatch mid-turn instead of being rejected
/btw spawns a parallel ephemeral side-question task (self-guarded against
concurrent /btw on the same chat) — exactly like /background. But it was
missing from the running-agent bypass list in _handle_message(), so it
fell through to the catch-all and returned:

   Agent is running — /btw can't run mid-turn. Wait for the current
  response or /stop first.

That's the opposite of what /btw is for — asking a side question while
the main turn is still working. Add the bypass next to /background and a
regression test covering the mid-turn dispatch path.

Reported by @IuriiTiunov on Telegram.
2026-04-26 07:11:10 -07:00
Teknium 7fa70b6c87 refactor: /btw is now an alias for /background (#16053)
The ephemeral no-tools side-question variant of /btw confused users who
expected 'by-the-way' to mean 'run this off to the side with tools' —
they'd type /btw and get a toolless agent that couldn't do the work.
/bg worked because it was /background with full tools.

Collapse the two: /btw and /bg both alias to /background. One command,
one behavior, no more gotchas about which variant has tools.

Removed:
- _handle_btw_command in cli.py and gateway/run.py
- _run_btw_task + _active_btw_tasks state in gateway/run.py
- prompt.btw JSON-RPC method + btw.complete event in tui_gateway
- BtwStartResponse type + btw.complete case in ui-tui
- Standalone /btw slash tree registration in Discord
- Standalone btw CommandDef in hermes_cli/commands.py

Updated:
- background CommandDef aliases: (bg,) -> (bg, btw)
- TUI session.ts: local btw handler merged into background
- Docs and tips updated to describe /btw as a /background alias
2026-04-26 07:11:08 -07:00
Teknium 9a70260490 Revert "feat(onboarding): port first-touch hints to the TUI (#16054)" (#16062)
This reverts commit ffd2621039.
2026-04-26 06:31:37 -07:00
Teknium ffd2621039 feat(onboarding): port first-touch hints to the TUI (#16054)
PR #16046 added /busy and /verbose hints to the classic CLI and the
gateway runner but skipped the Ink TUI (and therefore the dashboard
/chat page, which embeds the TUI via PTY).  This extends the same
latch to the TUI with TUI-native wording.

The TUI's busy-input model is not the /busy knob from the CLI —
single Enter while busy auto-queues, double Enter on an empty line
interrupts.  The new busy-input hint teaches THAT gesture instead of
telling the user to flip a config that does not apply.

Changes:
- agent/onboarding.py — add busy_input_hint_tui() + tool_progress_hint_tui()
- tui_gateway/server.py — onboarding.claim JSON-RPC (Ink triggers busy
  hint on enqueue) + _maybe_emit_onboarding_hint helper hooked into
  _on_tool_complete for the 30s/tool_progress=all path.  Same
  config.yaml latch so each hint fires at most once per install across
  CLI, gateway, and TUI combined.
- ui-tui/src/gatewayTypes.ts — OnboardingClaimResponse + onboarding.hint event
- ui-tui/src/app/createGatewayEventHandler.ts — render the hint event as sys()
- ui-tui/src/app/useSubmission.ts — claim busy_input_prompt on first
  busy enqueue
- tests/agent/test_onboarding.py — +3 cases for TUI hint shape
- tests/tui_gateway/test_protocol.py — +4 cases for onboarding.claim
- website/docs/user-guide/tui.md — new 'Interrupting and queueing'
  section explaining the TUI's double-Enter model and the hints

Validation:
scripts/run_tests.sh tests/agent/test_onboarding.py \
  tests/tui_gateway/test_protocol.py \
  tests/gateway/test_busy_session_ack.py
  -> 66 passed
npm --prefix ui-tui run type-check -> clean
npm --prefix ui-tui run lint       -> clean
npm --prefix ui-tui run build      -> clean
2026-04-26 06:24:19 -07:00
Teknium 1e37ddc929 feat(cli): add 'hermes fallback' command to manage fallback providers (#16052)
Manage the fallback_providers chain from the CLI instead of hand-editing
config.yaml. The picker reuses select_provider_and_model() from 'hermes
model' — same provider list, same credential prompts, same model picker.

  hermes fallback [list]   Show the current chain (primary + fallbacks)
  hermes fallback add      Run the model picker, append selection to chain
  hermes fallback remove   Pick an entry to delete (arrow-key menu)
  hermes fallback clear    Remove all entries (with confirmation)

'add' snapshots config['model'] before calling the picker, extracts the
user's selection from the post-picker state, then restores the primary
and appends {provider, model, base_url?, api_mode?} to fallback_providers.
Auth store's active_provider is snapshot/restored too so OAuth-provider
fallbacks don't silently deactivate the user's primary. Duplicates and
self-as-fallback are rejected. Legacy single-dict 'fallback_model' entries
are auto-migrated to the list format on first write.
2026-04-26 06:19:04 -07:00
Teknium 83c1c201f6 feat(onboarding): contextual first-touch hints for /busy and /verbose (#16046)
Instead of a blocking first-run questionnaire, show a one-time hint the first
time the user hits each behavior fork:

1. First message while the agent is working — appends a hint to the busy-ack
   explaining the /busy queue vs /busy interrupt knob, phrased to match the
   mode that was just applied (don't tell a queue-mode user to switch to
   queue).

2. First tool that runs for >= 30s in the noisiest progress mode
   (tool_progress: all) — prints a hint about /verbose to cycle display
   modes (all -> new -> off -> verbose). Gated on /verbose actually being
   usable on the surface: always shown on CLI; on gateway only shown when
   display.tool_progress_command is enabled.

Each hint is latched in config.yaml under onboarding.seen.<flag>, so it
fires exactly once per install across CLI, gateway, and cron, then never
again. Users can wipe the section to re-see hints.

New:
- agent/onboarding.py — is_seen / mark_seen / hint strings, shared by
  both CLI and gateway.
- onboarding.seen in DEFAULT_CONFIG (hermes_cli/config.py) and in
  load_cli_config defaults (cli.py). No _config_version bump — deep
  merge handles new keys.

Wired:
- gateway/run.py: _handle_active_session_busy_message appends the hint
  after building the ack.  progress_callback tracks tool.completed
  duration and queues the tool-progress hint into the progress bubble.
- cli.py: CLI input loop appends the busy-input hint on the first busy
  Enter; _on_tool_progress appends the tool-progress hint on the first
  >=30s tool completion.  In-memory CLI_CONFIG is also updated so
  subsequent fires in the same process are suppressed immediately.

All writes go through atomic_yaml_write and are wrapped in try/except
so onboarding can never break the input/busy-ack paths.
2026-04-26 06:06:27 -07:00
Teknium 4bda9dcade fix(gateway): honor voice.auto_tts config in auto-TTS gate (#16007) (#16039)
The base adapter's auto-TTS path fired on any voice message unless the
chat had explicitly run /voice off — it never read voice.auto_tts from
config.yaml, so users who set auto_tts: false still got audio replies.

Gate the base adapter on a three-layer decision instead:
  1. chat in _auto_tts_enabled_chats (explicit /voice on|tts) → fire
  2. chat in _auto_tts_disabled_chats (explicit /voice off)  → suppress
  3. else → voice.auto_tts global default

Runner now pushes voice.auto_tts onto the adapter as _auto_tts_default
and mirrors /voice on|tts chats into _auto_tts_enabled_chats via the
existing _sync_voice_mode_state_to_adapter path. /voice off still wins.

Closes #16007.
2026-04-26 05:52:05 -07:00
Teknium 67dcace412 docs(config): show options in comments for display settings (#16038)
Users who run `hermes setup` get `cli-config.yaml.example` copied verbatim
(including comments) to ~/.hermes/config.yaml. But several display settings
had thin comments that didn't enumerate the valid options, so users couldn't
tell from reading their config what values each key accepts.

- busy_input_mode: widen from 'CLI' to 'CLI and gateway platforms';
  note /stop as gateway equivalent of Ctrl+C; add /busy_input_mode runtime hint
- compact, interim_assistant_messages, bell_on_complete, show_reasoning,
  streaming: add true/false option lines showing effect of each value
- skin: refresh the built-in skin list (was missing daylight, warm-lightmode,
  poseidon, sisyphus, charizard — 5 of 9 built-ins undocumented)
2026-04-26 05:51:37 -07:00
Teknium 35c57cc46b fix(gateway): suppress tool-progress bubbles after interrupt (#16034)
When the LLM response carries N parallel tool calls, the agent fires
N tool.started events back-to-back before its interrupt check runs.
A user sending /stop mid-batch would see the ' Interrupting current
task' ack followed by a trail of 🔍 web_search bubbles for the remaining
events in the batch — making the interrupt feel ignored.

progress_callback and the drain loop in send_progress_messages now
check agent.is_interrupted (via agent_holder[0], the existing
cross-scope handle). Events that arrive after interrupt are dropped
at both the queueing and rendering stages. The ' Interrupting'
message is sent through a separate adapter path and is unaffected.
2026-04-26 05:47:37 -07:00
Teknium e8441c4c0f fix(clipboard): report native/tmux success, keep Ctrl+Shift+C on dashboard
Follow-up on #16020 salvage. Three corrections:

1. Truth signal for /copy
   Before: success was 'OSC 52 sequence was emitted to stdout'. That's
   false on local Linux inside tmux (emitSequence=false), so /copy kept
   printing 'clipboard copy failed' to users whose xclip/wl-copy had
   already succeeded fire-and-forget.
   Fix: setClipboard() now returns { sequence, success } where success =
   native-fired OR tmux-buffer-loaded OR osc52-emitted. copyNative()
   returns a boolean telling setClipboard whether a native attempt was
   made. /copy only shows 'failed' when literally no path was taken.

2. Dashboard keybinding
   Before: Ctrl+C for copy on non-Mac (Ctrl+Shift+C for paste).
   That swallows SIGINT when a stale selection is present and breaks
   the xterm/gnome-terminal/konsole/Windows-Terminal convention where
   Ctrl+C in a terminal emulator is always SIGINT. The real bug was
   that clipboard writes lost user-gesture through OSC-52 round-trips,
   which the direct writeText already fixes.
   Fix: revert copyModifier to Ctrl+Shift+C on non-Mac. Direct
   writeText in the keydown handler preserves user gesture. term.write
   Escape replaced with term.clearSelection() (works without relying
   on TUI input mode).

3. Error toast text
   Before: 'see HERMES_TUI_DEBUG_CLIPBOARD' — tells users how to
   debug but not how to fix.
   Fix: point users at HERMES_TUI_FORCE_OSC52=1 first (the actual
   escape hatch), mention the debug var second.
2026-04-26 05:46:45 -07:00
Harry Riddle 2511207cb0 chore: revert docs 2026-04-26 05:46:45 -07:00
Harry Riddle 0f3a6f0fb3 fix(clipboard): dashboard Ctrl+C direct copy; TUI honest feedback; HERMES_TUI_FORCE_OSC52
- Dashboard copy: direct Clipboard API on Ctrl+C/Cmd+C (user gesture);
  send Escape to TUI to clear selection; Ctrl+Shift+C kept as fallback.
- TUI /copy: copySelection() async; only reports success if OSC52 emitted.
- Add HERMES_TUI_FORCE_OSC52 env var to override native-tool detection.
- Fixes "copied N chars" false-positive when clipboard backend absent.

Changes:
  web/src/pages/ChatPage.tsx — direct navigator.clipboard.writeText
  ui-tui/packages/hermes-ink/src/ink/ink.tsx — async copySelection
  ui-tui/packages/hermes-ink/src/ink/termio/osc.ts — HERMES_TUI_FORCE_OSC52
  ui-tui/src/app/slash/commands/core.ts — async /copy with honest feedback
2026-04-26 05:46:45 -07:00
Harry Riddle a562420383 fix(tui): robust clipboard handling with debug logging and headless detection
Problem: Ctrl+C in Hermes TUI shows 'copied' but clipboard often empty.
Root causes:
- Native Linux tools (xclip, wl-copy) require DISPLAY/WAYLAND_DISPLAY; in
  headless Docker/SSH they fail or hang.
- OSC 52 fallback requires terminal emulator support; when absent, sequence
  is dropped silently.
- Dashboard OSC 52 → Clipboard API path fails due to missing user gesture;
  errors were silently caught.
- User feedback 'copied selection' was shown unconditionally, regardless of
  success.

Solution implemented:
- Short-circuit Linux native clipboard probing when no display server is
  present (no DISPLAY and no WAYLAND_DISPLAY). Avoids futile attempts and
  timeouts.
- Add HERMES_TUI_DEBUG_CLIPBOARD env var (1/true). When set, TUI logs to
  stderr which clipboard path is used, probe results on Linux, and whether
  OSC 52 was emitted. Greatly improves diagnosability.
- Improve dashboard clipboard error handling: replace empty catch blocks
  with console.warn messages for OSC 52 decode/Write failures and direct
  copy/paste errors. Makes browser permission/user-gesture failures visible
  in DevTools.
- Add comprehensive clipboard troubleshooting documentation to README and
  AGENTS, covering OSC 52 verification, tmux config, Docker/headless
  constraints, env vars, dashboard caveats, and fallback strategies.

Technical details:
-  in ui-tui/packages/hermes-ink/src/ink/termio/osc.ts:
  - Early return on Linux if both DISPLAY and WAYLAND_DISPLAY unset.
  - Refactor probe sequence to async  with 500ms timeout,
    caching result; subsequent copies use cached tool immediately.
  - Emit debug logs when HERMES_TUI_DEBUG_CLIPBOARD=1.
-  in ink.tsx: log when OSC 52 not emitted (native
  or tmux path in use) in debug mode.
- : OSC 52 handler and Ctrl+Shift+C handler now
  log warnings to console on Clipboard API rejection with error message.
- Documentation: new 'Clipboard Troubleshooting' section in README; new
  'Clipboard environment variables and pitfalls' subsection in AGENTS.md
  (Known Pitfalls).

Tests: full ui-tui test suite (292 tests) passes; clipboard and OSC tests
unaffected. No breaking changes.

Files changed:
- ui-tui/packages/hermes-ink/src/ink/termio/osc.ts
- ui-tui/packages/hermes-ink/src/ink/ink.tsx
- web/src/pages/ChatPage.tsx
- README.md
- AGENTS.md
- CHANGELOG.md (new)
2026-04-26 05:46:45 -07:00
Teknium 855366909f feat(models): remote model catalog manifest for OpenRouter + Nous Portal (#16033)
OpenRouter and Nous Portal curated picker lists now resolve via a JSON
manifest served by the docs site, falling back to the in-repo snapshot
when unreachable. Lets us update model lists without shipping a release.

Live URL: https://hermes-agent.nousresearch.com/docs/api/model-catalog.json
(source at website/static/api/model-catalog.json; auto-deploys via the
existing deploy-site.yml GitHub Pages pipeline on every merge to main).

Schema (v1) carries id + optional description + free-form metadata at
manifest, provider, and model levels. Pricing and context length stay
live-fetched via existing machinery (/v1/models endpoints, models.dev).

Config (new model_catalog section, default enabled):
  model_catalog.url       master manifest URL
  model_catalog.ttl_hours disk cache TTL (default 24h)
  model_catalog.providers.<name>.url   optional per-provider override

Fetch pipeline: in-process cache -> disk cache (fresh < TTL) -> HTTP
fetch -> disk-cache-on-failure fallback -> in-repo snapshot as last
resort. Never raises to callers; at worst returns the bundled list.

Changes:
- website/static/api/model-catalog.json    initial manifest (35 OR + 31 Nous)
- scripts/build_model_catalog.py           regenerator from in-repo lists
- hermes_cli/model_catalog.py              fetch + validate + cache module
- hermes_cli/models.py                     fetch_openrouter_models() +
                                           new get_curated_nous_model_ids()
- hermes_cli/main.py, hermes_cli/auth.py   Nous flows use the helper
- hermes_cli/config.py                     model_catalog defaults
- website/docs/reference/model-catalog.md  + sidebars.ts
- tests/hermes_cli/test_model_catalog.py   21 tests (validation, fetch
                                           success/failure, accessors,
                                           disabled, overrides, integration)
2026-04-26 05:46:43 -07:00
Teknium d09ab8ff13 fix(mcp-oauth): preserve server_url path for protected-resource validation (#16031)
Stop pre-stripping the path from the configured MCP server URL before
constructing OAuthClientProvider. The MCP SDK strips the path itself via
OAuthContext.get_authorization_base_url() for authorization-server
discovery, but uses the full server_url through
resource_url_from_server_url() + check_resource_allowed() to validate
against the server's RFC 9728 Protected Resource Metadata.

For servers whose PRM advertises a path-scoped resource (e.g. Notion's
https://mcp.notion.com/mcp), our _parse_base_url() collapsed the URL to
the origin, so check_resource_allowed() saw requested='/' vs
configured='/mcp/' and refused the token. Fixes OAuth against Notion MCP
(and any other path-scoped resource).

Closes #16015.
2026-04-26 05:43:54 -07:00
Teknium 438db0c7b0 fix(cli): /model picker honors provider-specific context caps (#16030)
`_apply_model_switch_result` (the interactive `/model` picker's
confirmation path) printed `ModelInfo.context_window` straight from
models.dev, which reports the vendor-wide value (1.05M for gpt-5.5 on
openai). ChatGPT Codex OAuth caps the same slug at 272K, so the picker
showed 1M while the runtime (compressor, gateway `/model`, typed
`/model <name>`) correctly used 272K — the classic 'sometimes 1M,
sometimes 272K' mismatch on a single model.

Both display paths now go through `resolve_display_context_length()`,
matching the fix that `_handle_model_switch` received earlier.

Also bump the stale last-resort fallback in DEFAULT_CONTEXT_LENGTHS
(`gpt-5.5: 400000 -> 1050000`) to match the real OpenAI API value; the
272K Codex cap is already enforced via the Codex-OAuth branch, so the
fallback now reflects what every non-Codex probe-miss should see.

Tests: adds `test_apply_model_switch_result_context.py` with three
scenarios (Codex cap wins, OpenRouter shows 1.05M, resolver-empty falls
back to ModelInfo). Updates the existing non-Codex fallback test to
assert 1.05M (the correct value).

## Validation
| path                          | before    | after     |
|-------------------------------|-----------|-----------|
| picker -> gpt-5.5 on Codex    | 1,050,000 | 272,000   |
| picker -> gpt-5.5 on OpenAI   | 1,050,000 | 1,050,000 |
| picker -> gpt-5.5 on OpenRouter | 1,050,000 | 1,050,000 |
| typed /model gpt-5.5 on Codex | 272,000   | 272,000   |
2026-04-26 05:43:31 -07:00
zkl 2ccdadcca6 fix(deepseek): bump V4 family context window to 1M tokens
#14934 added deepseek-v4-pro / deepseek-v4-flash to the DeepSeek native
provider but the context-window lookup still falls back to the existing
"deepseek" substring entry (128K). DeepSeek V4 ships with a 1M context
window, so any caller relying on get_model_context_length() for
pre-flight token budgeting (compression, context warnings) under-counts
by ~8x.

Add explicit lowercase entries for the four DeepSeek model ids that
ship 1M context:

- deepseek-v4-pro
- deepseek-v4-flash
- deepseek-chat (legacy alias, server-side maps to v4-flash non-thinking)
- deepseek-reasoner (legacy alias, server-side maps to v4-flash thinking)

Longest-key-first substring matching means these explicit entries also
cover the vendor-prefixed forms (deepseek/deepseek-v4-pro on OpenRouter
and Nous Portal) without regressing the existing 128K fallback for
older / unknown DeepSeek model ids on custom endpoints.

Source: https://api-docs.deepseek.com/zh-cn/quick_start/pricing
2026-04-26 05:32:54 -07:00
Teknium 76042f5867 feat(review): class-first skill review prompt (#16026)
The background skill-review prompt (spawned after N user turns) now instructs
the reviewer to SURVEY existing skills first, identify the CLASS of task, and
PREFER updating/generalizing an existing skill over creating a new narrow one.

This reduces near-duplicate skill accumulation at the source. Catches the
common failure mode where repeated tasks of the same class each spawn their
own specific skill ("fix-my-tauri-error", "fix-my-electron-error") instead
of a single class-level skill ("desktop-app-build-troubleshooting").

Applied to both _SKILL_REVIEW_PROMPT and the **Skills** half of
_COMBINED_REVIEW_PROMPT. Memory-only review prompt unchanged.

Groundwork for the Curator feature (issue #7816) — the creation-side fix.
Curator handles the retirement/consolidation side in a follow-up PR.

Tests assert the behavioral instructions are present (survey, class, update-
over-create, overlap-flagging, opt-out clause) rather than snapshotting the
full prompt text.
2026-04-26 05:17:10 -07:00
Teknium 192e7eb21f fix(nous): don't trip cross-session rate breaker on upstream-capacity 429s (#15898)
Nous Portal multiplexes multiple upstream providers (DeepSeek, Kimi,
MiMo, Hermes) behind one endpoint. Before this fix, any 429 on any of
those models recorded a cross-session file breaker that blocked EVERY
model on Nous for the cooldown window -- even though the caller's
own RPM/RPH/TPM/TPH buckets were healthy. Users hit a DeepSeek V4 Pro
capacity error, restarted, switched to Kimi 2.6, and still got
'Nous Portal rate limit active -- resets in 46m 53s'.

Nous already emits the full x-ratelimit-* header suite on every
response (captured by rate_limit_tracker into agent._rate_limit_state).
We now gate the breaker on that data: trip it only when either the
429's own headers or the last-known-good state show a bucket with
remaining == 0 AND a reset window >= 60s. Upstream-capacity 429s
(healthy buckets everywhere, but upstream out of capacity) fall
through to normal retry/fallback and the breaker is never written.

Note: the in-memory 'restart TUI/gateway to clear' workaround
circulated in Discord does NOT work -- the breaker is file-backed at
~/.hermes/rate_limits/nous.json. The workaround for users still
affected by a bad state file is to delete it.

Reported in Discord by CrazyDok1 and KYSIV (Apr 2026).
2026-04-26 04:53:42 -07:00
Teknium 59b56d445c feat(hooks): add duration_ms to post_tool_call + transform_tool_result (#15429)
Plugin hooks fired after a tool dispatch now receive an integer
duration_ms kwarg measuring how long the tool's registry.dispatch()
call took (time.monotonic() before/after). Inspired by Claude Code
2.1.119 which added the same field to PostToolUse hook inputs.

Wire points:
- model_tools.py: measure dispatch latency, pass duration_ms to
  invoke_hook("post_tool_call", ...) and invoke_hook("transform_tool_result", ...)
- hermes_cli/hooks.py: include duration_ms in the synthetic payload
  used by 'hermes hooks test' and 'hermes hooks doctor' so shell-hook
  authors see the same shape at development time as runtime
- shell hooks (agent/shell_hooks.py): no code change needed;
  _serialize_payload already surfaces non-top-level kwargs under
  payload['extra'], so duration_ms lands at extra.duration_ms for
  shell-hook scripts

Plugin authors can now build latency dashboards, per-tool SLO alerts,
and regression canaries without having to wrap every tool manually.

Test: tests/test_model_tools.py::test_post_tool_call_receives_non_negative_integer_duration_ms
E2E: real PluginManager + dispatch monkey-patched with a 50ms sleep,
hook callback observes duration_ms=50 (int).

Refs: https://code.claude.com/docs/en/changelog (2.1.119, Apr 23 2026)
2026-04-25 22:13:12 -07:00
Teknium eb28145f36 feat(approval): hardline blocklist for unrecoverable commands (#15878)
Adds a floor below --yolo: a tiny set of commands so catastrophic they
should never run via the agent, regardless of --yolo, gateway /yolo,
approvals.mode=off, or cron approve mode.  Opting into yolo is trusting
the agent with your files and services — not trusting it to wipe the
disk or power the box off.

The list is deliberately small (12 patterns), covering only
unrecoverable ops:
- rm -rf targeting /, /home, /etc, /usr, /var, /boot, /bin, /sbin,
  /lib, ~, $HOME
- mkfs (any variant)
- dd + redirection to raw block devices (/dev/sd*, /dev/nvme*, etc.)
- fork bomb
- kill -1 / kill -9 -1
- shutdown, reboot, halt, poweroff, init 0/6, telinit 0/6,
  systemctl poweroff/reboot/halt/kexec

Recoverable-but-costly commands (git reset --hard, rm -rf /tmp/x,
chmod -R 777, curl | sh) stay in DANGEROUS_PATTERNS where yolo can
still pass them through — that's what yolo is for.

Container backends (docker/singularity/modal/daytona) continue to
bypass both hardline and dangerous checks, since nothing they do can
touch the host.

Inspired by Mercury Agent's permission-hardened blocklist.
2026-04-25 22:07:12 -07:00
Teknium a55de5bcd0 feat(setup): auto-reconfigure on existing installs (#15879)
Bare `hermes setup` on a returning user now drops straight into the
full reconfigure wizard — every prompt shows the current value as its
default, press Enter to keep or type a new value to change it. The
returning-user menu is gone.

Behavior:
- First-time user: first-time wizard (unchanged)
- Returning user, bare command: full reconfigure wizard (new default)
- Returning user, `--quick`: only prompt for missing/unset items
- Returning user, one section: `hermes setup model|terminal|gateway|tools|agent`
- `--reconfigure`: preserved as backwards-compat alias (no-op since it's now default)

The section functions already used current values as prompt defaults —
this change just removes the extra click to get to them.

The 'Quick Setup - configure missing items only' menu option is now
exposed as the explicit `--quick` flag; it's the narrow case of
filling in missing config (e.g. after a partial OpenClaw migration or
when a required API key got cleared).

Inspired by Mercury Agent's `mercury doctor` UX.

Also removes:
- RETURNING_USER_MENU_SECTION_KEYS (orphaned constant)
- Two returning-user menu tests in test_setup_noninteractive.py
  (guarding behavior that no longer exists — covered by
  test_setup_reconfigure.py instead)
2026-04-25 22:02:02 -07:00
brooklyn! cec0af02ad Merge pull request #15870 from NousResearch/bb/fix-skills-search
fix(tui): restore skills search RPC
2026-04-25 22:13:28 -05:00
Brooklyn Nicholson 91a7a0acbe fix(tui): restore skills search RPC 2026-04-25 22:11:52 -05:00
Teknium 7c50ed707c docs(azure-foundry): add provider guide, env vars, release AUTHOR_MAP
- New website/docs/guides/azure-foundry.md covering both OpenAI-style
  and Anthropic-style endpoints, auto-detection behaviour, gpt-5.x
  routing, /v1 stripping, api-version query forwarding, and the
  provider: anthropic + Azure URL alternative setup.
- environment-variables.md picks up AZURE_FOUNDRY_API_KEY,
  AZURE_FOUNDRY_BASE_URL, AZURE_ANTHROPIC_KEY.
- cli-commands.md includes azure-foundry in the provider choices list.
- configuration.md lists azure-foundry among auxiliary-task providers.
- sidebars.ts wires the new guide into the Guides section.
- scripts/release.py AUTHOR_MAP entries for TechPrototyper,
  HangGlidersRule (noreply), and pein892 so the contributor-attribution
  CI check does not reject the salvage.
2026-04-25 18:48:43 -07:00
Teknium 731e1ef8cb feat(azure-foundry): auto-detect transport, models, context length
The azure-foundry wizard now probes the endpoint before asking the user
to pick anything by hand:

  1. URL path sniff — endpoints ending in /anthropic are Azure Foundry
     Claude routes and skip to anthropic_messages.
  2. GET <base>/models probe — if the endpoint returns an OpenAI-shaped
     model list, we switch to chat_completions and prefill the picker
     with the returned deployment/model IDs.
  3. Anthropic Messages probe — fallback for endpoints that don't expose
     /models but do speak the Anthropic Messages shape.
  4. Manual fallback — private endpoints / custom routes still work;
     the user picks API mode + types a deployment name.

Context length for the selected model is resolved through the existing
agent.model_metadata.get_model_context_length chain (models.dev,
provider metadata, hardcoded family fallbacks) and stored in
model.context_length when a non-default value is found.

Also refactors runtime_provider so Azure Foundry resolution is reused
between the explicit-credentials path and the default top-level path —
previously the /v1 strip for Anthropic-style Azure only ran when the
caller passed explicit_* args, which meant config-driven sessions
hit a double-/v1 URL.

New module hermes_cli/azure_detect.py with 19 unit tests covering:
- path sniff, model ID extraction, probe fallbacks
- HTTP error handling (URLError, HTTPError)
- context-length lookup passthrough
- DEFAULT_FALLBACK_CONTEXT rejection

New runtime tests cover:
- OpenAI-style Azure Foundry
- Anthropic-style Azure Foundry with /v1 stripping
- Missing base_url / API key raising AuthError

Rationale: Microsoft confirms there's no pure-API-key endpoint to list
Azure deployments (that requires ARM management auth).  The v1 Azure
OpenAI endpoint does expose /models with the resource's available
model catalog, which is good enough for picker prefill in the common
case.  Users on private/gated endpoints fall through to manual entry.
2026-04-25 18:48:43 -07:00
akhater ac57114284 fix(agent): support Azure OpenAI gpt-5.x on chat/completions endpoint
Azure OpenAI exposes an OpenAI-compatible endpoint at
`{resource}.openai.azure.com/openai/v1` that accepts the standard
`openai` Python client. Two issues prevented gpt-5.x models from working:

1. `_max_tokens_param()` only sent `max_completion_tokens` for
   `api.openai.com` URLs. Azure also requires `max_completion_tokens`
   for gpt-5.x models.

2. The `codex_responses` upgrade gate unconditionally upgraded gpt-5.x
   to Responses API. Azure does NOT support the Responses API — it serves
   gpt-5.x on the regular `/chat/completions` path, causing a 404.

Fix: add `_is_azure_openai_url()` that matches `openai.azure.com` URLs.
- `_max_tokens_param()` now returns `max_completion_tokens` for Azure.
- The `codex_responses` upgrade gate skips Azure so gpt-5.x stays on
  `chat_completions` where Azure actually serves it.
- The fallback-provider api_mode picker also recognises Azure and stays
  on chat_completions.
- Tests cover max_tokens routing, api_mode behaviour, and URL detection.

gpt-4.x models on Azure are unaffected (already used chat_completions +
max_tokens, which Azure accepts for those models).

Salvage of PR #10086 — rewritten against current main where the
codex_responses upgrade gate gained copilot-acp / explicit-api_mode
exclusions.
2026-04-25 18:48:43 -07:00
pein892 24b4b24d79 fix: preserve URL query params for Azure OpenAI and custom endpoints
Azure OpenAI requires an `api-version` query parameter on every request.
When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`),
the OpenAI SDK silently drops it during URL construction, causing 404 errors.

Extract query params from base_url and pass them via `default_query` so the
SDK appends them to every request. This is a generic solution that works for
any custom endpoint requiring query parameters, not just Azure.

No-op for URLs without query params — fully backward compatible.
2026-04-25 18:48:43 -07:00
HangGlidersRule c15064fa37 fix: pass api-version as default_query param, not in base_url — SDK was producing malformed URLs like /anthropic?api-version=.../v1/messages 2026-04-25 18:48:43 -07:00
HangGlidersRule 7bfa9442de fix: skip OAuth token refresh for Azure Anthropic endpoints — prevents ~/.claude/.credentials.json from overwriting Azure key mid-session 2026-04-25 18:48:43 -07:00
HangGlidersRule d8e4c7214e fix: Azure Anthropic short-circuit in resolve_runtime_provider — bypass custom runtime when provider=anthropic + azure.com URL 2026-04-25 18:48:43 -07:00
HangGlidersRule 6ef3a47ce5 fix: use Azure API key directly for Azure endpoints, bypass OAuth token priority chain 2026-04-25 18:48:43 -07:00
TechPrototyper 3a7653dd1f feat: Add Azure Foundry provider with OpenAI/Anthropic API mode selection
Add support for Azure Foundry as a new inference provider. Azure Foundry
endpoints can use either OpenAI-style (/v1/chat/completions) or
Anthropic-style (/v1/messages) API formats.

Changes:
- Add azure-foundry to PROVIDER_REGISTRY (auth.py)
- Add azure-foundry overlay in HERMES_OVERLAYS (providers.py)
- Add empty model list for azure-foundry (models.py)
- Add _model_flow_azure_foundry() interactive setup (main.py)
- Add azure-foundry runtime resolution with api_mode support (runtime_provider.py)
- Add AZURE_FOUNDRY_API_KEY and AZURE_FOUNDRY_BASE_URL env vars (config.py)

Usage:
  hermes model -> More providers -> Azure Foundry

The setup wizard prompts for:
- Endpoint URL
- API format (OpenAI or Anthropic-style)
- API key
- Model name

Configuration is saved to config.yaml (model.provider, model.base_url,
model.api_mode, model.default) and ~/.hermes/.env (AZURE_FOUNDRY_API_KEY).
2026-04-25 18:48:43 -07:00
Teknium 125de02056 fix(context): honor custom_providers context_length on /model switch + bump probe tier to 256K (#15844)
Fixes #15779. Custom-provider per-model context_length (`custom_providers[].models.<id>.context_length`) is now honored across every resolution path, not just agent startup. Also adds 256K as the top probe tier and default fallback.

## What changed

New helper `hermes_cli.config.get_custom_provider_context_length()` — single source of truth for the per-model override lookup, with trailing-slash-insensitive base-url matching.

`agent.model_metadata.get_model_context_length()` gains an optional `custom_providers=` kwarg (step 0b — runs after explicit `config_context_length` but before every other probe).

Wired through five call sites that previously either duplicated the lookup or ignored it entirely:
- `run_agent.py` startup — refactored to use the new helper (dedups legacy inline loop, keeps invalid-value warning)
- `AIAgent.switch_model()` — re-reads custom_providers from live config on every /model switch
- `hermes_cli.model_switch.resolve_display_context_length()` — new `custom_providers=` kwarg
- `gateway/run.py` /model confirmation (picker callback + text path)
- `gateway/run.py` `_format_session_info` (/info)

## Context probe tiers

`CONTEXT_PROBE_TIERS = [256_000, 128_000, 64_000, 32_000, 16_000, 8_000]` — was `[128_000, ...]`. `DEFAULT_FALLBACK_CONTEXT` follows tier[0], so unknown models now default to 256K. The stale `128000` literal in the OpenRouter metadata-miss path is replaced with `DEFAULT_FALLBACK_CONTEXT` for consistency.

## Repro (from #15779)

```yaml
custom_providers:
  - name: my-custom-endpoint
    base_url: https://example.invalid/v1
    model: gpt-5.5
    models:
      gpt-5.5:
        context_length: 1050000
```

`/model gpt-5.5 --provider custom:my-custom-endpoint` → previously "Context: 128,000", now "Context: 1,050,000".

## Tests

- `tests/hermes_cli/test_custom_provider_context_length.py` — new file, 19 tests covering the helper, step-0b integration, and the 256K tier invariants
- `tests/hermes_cli/test_model_switch_context_display.py` — added regression tests for #15779 through the display resolver
- `tests/gateway/test_session_info.py` — updated default-fallback assertion (128K → 256K)
- `tests/agent/test_model_metadata.py` — updated tier assertions for the new top tier
2026-04-25 18:47:53 -07:00
Teknium 4c591c2819 chore(release): map fqsy1416@gmail.com to EKKOLearnAI 2026-04-25 18:40:35 -07:00
Teknium 01535a4732 fix(api_server): cap stop-run wait at 5s so interrupt can't hang handler
task.cancel() can't preempt the run_in_executor thread running
run_conversation(), so we rely on agent.interrupt() to wake the loop.
Without a timeout, a slow/unresponsive interrupt blocks the HTTP
response indefinitely. Wrap the await in wait_for(shield(task), 5.0)
and log a warning on timeout.

Also tidy one extra space in the module docstring's /stop entry.
2026-04-25 18:40:35 -07:00
ekko 0a15dbdc43 feat(api_server): add POST /v1/runs/{run_id}/stop endpoint
Add ability to interrupt a running agent via the runs API. Previously
/v1/runs could start a run and subscribe to events, but there was no
way to cancel it. The new endpoint stores agent and task references
during execution, calls agent.interrupt() to stop LLM calls, then
cancels the asyncio task.

Includes 15 tests covering start, events, and stop scenarios.
2026-04-25 18:40:35 -07:00
Teknium ce0513dd2e chore(release): map Feranmi10 personal email 2026-04-25 18:39:55 -07:00
Oluwadare Feranmi dc5e02ea7f feat(cli): implement hermes update --check flag (fixes #10318) 2026-04-25 18:39:55 -07:00
brooklyn! ff851ba7b9 Merge pull request #15821 from NousResearch/fix/tui-ctrl-g-editor
fix: external editor handoff in CLI/TUI
2026-04-25 20:37:05 -05:00
Brooklyn Nicholson 14dd8e9a72 fix(tui): address Copilot review on editor handoff
- resolveEditor() now returns argv (string[]) so EDITOR='code --wait'
  and VISUAL='emacsclient -t' tokenize correctly into spawnSync's
  separate command + args. Previously the whole string was passed as
  argv[0] and would ENOENT.
- Skip the POSIX X_OK PATH walk on Windows; return ['notepad.exe']
  there since fs.constants.X_OK is not meaningful and PATHEXT-based
  resolution would need its own implementation.
- Surface openEditor() rejections via actions.sys instead of letting
  them become unhandled promise rejections in the useInput callback.
- Hotkey docs/comment now say Cmd/Ctrl+G to match isAction()'s
  platform-action-modifier behavior (Cmd on macOS, Ctrl elsewhere).
2026-04-25 20:34:24 -05:00
Wysie 1d80e92c7e test(discord): add guild to fake e2e messages 2026-04-25 18:25:56 -07:00
Teknium edce7522a5 chore(release): add AUTHOR_MAP entry for voidborne-d personal email 2026-04-25 18:25:13 -07:00
voidborne-d 45e1228a8a fix(cli): suppress OSError EIO on interrupt shutdown
When the user interrupts a long-running task, prompt_toolkit tries to
flush stdout during emergency shutdown.  If stdout is in a broken state
(redirected to /dev/null, pipe closed, terminal gone), the flush raises
`OSError: [Errno 5] Input/output error` which propagates unhandled and
crashes the CLI.

Two defense layers:

1. `_suppress_closed_loop_errors`: add `OSError` with `errno.EIO` to
   the asyncio exception handler, matching the existing pattern for
   `RuntimeError("Event loop is closed")` and `KeyError("is not
   registered")`.

2. Outer `except (KeyError, OSError)` block: add `errno.EIO` check
   before the existing string-match guards, silently suppressing the
   error instead of printing a misleading stdin-related message.

Fixes #13710.
2026-04-25 18:25:13 -07:00
Brooklyn Nicholson 83129e72de refactor(tui): tighten editor handoff helpers
- editor.ts: collapse two private helpers into one flatMap-driven lookup,
  keep `isExecutable` as the only named primitive, document the fallback
  chain with prompt_toolkit parity
- editor.test.ts: hoist the `exe` helper out of `describe`, drop the
  empty afterEach + dead mkdir branch, materialize expected paths before
  the resolveEditor call so argument evaluation order doesn't bite
- useComposerState.openEditor: rmSync the mkdtemp dir (was leaking),
  early-return on bad exit / empty buffer, run cleanup in finally
- useInputHandlers: cheap `ch.toLowerCase() === 'g'` guard before the
  modifier check
- hermes-ink/screen.ts: pick up `npm run fix` import-sort cleanup so
  lint passes
2026-04-25 20:24:06 -05:00
Teknium 4d170134ef chore(release): map nerijusn76@gmail.com to Nerijusas (#15833) 2026-04-25 18:22:49 -07:00
nerijusas 81e01f6ee9 fix(agent): preserve Codex message items for replay 2026-04-25 18:22:06 -07:00
Brooklyn Nicholson 7fd8dc0bfb fix: preserve prompt_toolkit editor picker and mirror it in TUI
Base CLI's editor UX was better because prompt_toolkit picks the system
editor first, then friendly terminal editors before vi. Do not override
that with a vim-first chain.

Keep the CLI on prompt_toolkit's picker and only set tempfile_suffix='.md'
to avoid the complex-tempfile EEXIST path. Update the TUI resolver to
match prompt_toolkit's fallback order: $VISUAL, $EDITOR, editor, nano,
pico, vi, emacs.
2026-04-25 20:20:05 -05:00
Brooklyn Nicholson d056b610b7 fix: avoid prompt_toolkit complex tempfile bug and prefer nvim first
Setting buffer.tempfile = 'prompt.md' pushed prompt_toolkit into its
complex-tempfile path, which creates a temp dir and then calls
os.makedirs() on that same path when no subdirectory is present. That
raises EEXIST before the editor can launch.

Keep prompt_toolkit on the simple tempfile path with .md suffix, and
make the editor fallback chain explicit on both surfaces:
$VISUAL -> $EDITOR -> nvim -> vim -> vi -> nano.
2026-04-25 20:16:50 -05:00
Teknium 2536a36f6f fix(tui): route /save through session.save JSON-RPC
The cherry-picked approach serialized the UI-shaped transcript on the Node
side, producing a third JSON format alongside cli.py save_conversation and
tui_gateway session.save. Simpler to call the existing session.save method,
which already writes the canonical agent history (raw OpenAI messages +
model) to an absolute-path file.

- /save still short-circuits before the slash worker
- Empty transcript -> 'no conversation yet'
- No active session -> 'no active session - nothing to save'
- Otherwise: rpc('session.save', {session_id}) and echo back the file path
- Tests updated to assert RPC contract; new test covers the no-sid case
2026-04-25 18:11:37 -07:00
helix4u 1b8ca9254f fix(tui): save live transcript from slash command 2026-04-25 18:11:37 -07:00
Brooklyn Nicholson db7c5735f0 fix: prefer vim over nano for $EDITOR fallback (CLI + TUI)
prompt_toolkit's default editor list is: $VISUAL, $EDITOR, /usr/bin/editor,
/usr/bin/nano, /usr/bin/pico, /usr/bin/vi, /usr/bin/emacs — so when
neither env var is set, the base CLI launched nano. The TUI fell back
to a literal 'vi'. Same Ctrl+G keystroke, two different editors.

Pick the same chain on both surfaces:
  $VISUAL → $EDITOR → vim → vi → nano

CLI: override input_area.buffer._open_file_in_editor on the TextArea
once at app build time. Local to that buffer; doesn't touch
os.environ or affect other subprocesses.

TUI: extract resolveEditor() into ui-tui/src/lib/editor.ts. PATH walk
with accessSync(X_OK), no shelling out. Six-line unit test verifies
the priority order and the multi-entry PATH walk.
2026-04-25 20:11:25 -05:00
Teknium 8bbeaea6c7 fix(config): broaden api-key ref lookup to templated base_url
The raw-template lookup added in PR #15817 went through
`get_compatible_custom_providers(read_raw_config())`, which calls
`_normalize_custom_provider_entry` → `urlparse(base_url)`. Any
entry whose `base_url` is itself an env-ref (`${NEURALWATT_API_BASE}`)
was dropped as 'not a valid URL', so `api_key_ref` stayed empty and the
resolved secret was still written to `model.api_key` — the exact case
the original Discord report described.

Replace the normalizer-gated lookup with a direct read of
`raw['custom_providers']` and `raw['providers']`, indexed by name
(case-insensitive, optionally qualified by model) so the loaded
(expanded) entry can be matched regardless of how `base_url` is
written.

Add an integration regression test driving the real
`select_provider_and_model` entry point with the Discord-reported
NeuralWatt config (`${VAR}` in both `base_url` and `api_key`).
This test fails on the PR-only fix and passes with the broadened
lookup.
2026-04-25 18:10:52 -07:00
helix4u 1fdc31b214 fix(config): preserve custom provider api key refs 2026-04-25 18:10:52 -07:00
Brooklyn Nicholson 5fac6c3440 fix(cli): write editor draft to prompt.md so syntax highlighting works
Base CLI was handing prompt_toolkit's Buffer.open_in_editor() a default
config — Buffer.tempfile_suffix and .tempfile both empty — so it
created /tmp/tmpXXXXXX with no extension. nano/vim/helix all key
syntax highlighting off the file extension, so the buffer rendered
plain.

The TUI already writes to <mkdtemp>/prompt.md and gets full markdown
highlighting + a sensible title bar. Set buffer.tempfile = 'prompt.md'
on the TextArea so prompt_toolkit's complex-tempfile path produces
<mkdtemp>/prompt.md to match. shutil.rmtree cleanup is built-in.
2026-04-25 20:04:04 -05:00
kshitijk4poor 2c56dce0ed fix(model): preserve custom endpoint credentials and accept cloud models not in /v1/models
When switching models on a custom endpoint (ollama-launch):
- Same-provider switches no longer re-resolve credentials (fixes base_url
  being lost for 'custom' provider on subsequent switches)
- Named providers (ollama-launch) are resolved via user_providers so
  switch_model can find their base_url from config
- Models not in the /v1/models probe but present in the user's saved
  provider config are accepted with a warning instead of rejected
- CLI /model and TUI /model both pass user_providers/custom_providers
  to switch_model so the config model list is available for validation

Closes #15088
2026-04-25 18:03:47 -07:00
Teknium 01cf2c65cc chore(release): map iris@growthpillars.co to irispillars (#15825)
Follow-up to #15533 (merged). Prevents release notes CI from
attributing the contributor to the placeholder.
2026-04-25 18:02:13 -07:00
helix4u b2d3308f98 fix(doctor): accept bare custom provider 2026-04-25 18:01:36 -07:00
Iris Jin 25ba6a4a74 fix(gateway): make reasoning session-scoped by default 2026-04-25 18:01:31 -07:00
Brooklyn Nicholson 4c797bfae9 fix(cli): accept Alt+G as Ctrl+G fallback in VSCode/Cursor terminals
Same problem as the TUI: Cursor and VSCode bind Ctrl+G to "Find Next"
at the editor level, so the keystroke never reaches the terminal and
the prompt_toolkit-driven Hermes CLI sees nothing.

Register ('escape', 'g') alongside the existing 'c-g' on the same
handler so the editor handoff works inside Cursor/VSCode too. The
filter (no clarify/approval/sudo/secret prompt active) is unchanged.
2026-04-25 20:01:03 -05:00
Brooklyn Nicholson c58956a9a2 fix(tui): accept Alt+G as Ctrl+G fallback in VSCode/Cursor terminals
VSCode and Cursor bind Ctrl+G to "Find Next" at the editor level, so
the keystroke never reaches the embedded terminal — Ctrl+G to open
\$EDITOR was effectively dead inside those IDEs.

Alt+G is unbound in both editors and reaches the TUI cleanly as
`\x1bg` → `key.meta && ch === 'g'` after parse-keypress. Accept it
alongside the existing isAction(key, ch, 'g') check, and document the
fallback in README + the hotkeys panel.
2026-04-25 19:57:17 -05:00
Brooklyn Nicholson 3944b22506 fix(tui): suspend Ink properly when opening $EDITOR via Ctrl+G
The Ctrl+G handler was toggling the alt-screen by hand
(`\x1b[?1049l` ... `\x1b[?1049h`) without releasing stdin or kitty
keyboard mode, so the launched editor would lose keystrokes (Ink kept
swallowing them) and editors that don't speak CSI-u (e.g. nano) would
print "Unknown sequence" for every Ctrl-key.

Switch to `withInkSuspended` from @hermes/ink, the same helper
`/setup` already uses. It pauses Ink, removes stdin listeners, drops
raw mode, disables kitty/modifyOtherKeys + mouse + focus reporting,
runs the editor, then restores everything with a full repaint.
2026-04-25 19:54:06 -05:00
brooklyn! 489bed6f96 Merge pull request #15478 from yes999zc/fix-deepseek-reasoning-all-assistant-messages
fix: DeepSeek/Kimi thinking mode requires reasoning_content on ALL assistant messages
2026-04-25 19:19:33 -05:00
FocusFlow Dev ad0ac89478 fix: DeepSeek/Kimi thinking mode requires reasoning_content on ALL assistant messages
Previously _copy_reasoning_content_for_api only padded reasoning_content
when the assistant message had tool_calls. DeepSeek V4 thinking mode
requires the field on every assistant turn, including plain text replies
without tool_calls.

- Remove the 'source_msg.get("tool_calls") and' guard
- Update test: plain assistant turns now get padded for DeepSeek/Kimi

Fixes #15213
2026-04-26 07:47:13 +08:00
Teknium dc4d92f131 docs: embed tutorial videos on webhooks + auxiliary models pages (#15809)
- webhooks.md: adds a Video Tutorial section under the intro with a
  responsive YouTube iframe (WNYe5mD4fY8).
- configuration.md: adds a Video Tutorial subsection under Auxiliary
  Models with a responsive YouTube iframe (NoF-YajElIM).

Both use a 16:9 aspect-ratio wrapper so the embeds scale cleanly on
mobile. Verified with `npm run build` — MDX parses clean, no new
warnings or broken links introduced.
2026-04-25 16:44:53 -07:00
Teknium 47420a84b9 docs(obliteratus): link YouTube video guide in SKILL.md (#15808)
Adds a 'Video Guide' section pointing at the walkthrough of a Hermes agent
abliterating Gemma with OBLITERATUS, so the agent can surface it when the
user wants a visual overview before running the workflow.
2026-04-25 16:30:38 -07:00
brooklyn! f93d4624bf Merge pull request #15749 from Zjianru/fix/copy-reasoning-content-ordering-and-cross-provider-isolation
fix(agent): ordering fix in _copy_reasoning_content_for_api — cross-provider reasoning isolation
2026-04-25 17:21:49 -05:00
codez 5ae608152e fix: remove has_reasoning guard — inject empty reasoning_content for DeepSeek/Kimi tool_calls unconditionally 2026-04-26 06:08:54 +08:00
brooklyn! 88b65cc82a Update run_agent.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-26 05:49:38 +08:00
brooklyn! edc78e258c Merge pull request #15766 from NousResearch/bb/tui-ssh-copy
fix(tui): honor client copy shortcut over ssh
2026-04-25 15:33:17 -05:00
Brooklyn Nicholson 31d7f1951a fix(tui): clamp copied selection bounds
Clamp copied selection columns to the screen width before scanning rendered cells.
2026-04-25 15:32:45 -05:00
Brooklyn Nicholson b1c18e5a41 refactor(tui): format screen imports
Keep screen.ts import ordering aligned with the ui-tui formatter.
2026-04-25 15:26:51 -05:00
Brooklyn Nicholson bd66e55a02 fix(tui): track rendered spaces for selection copy
- add a written-cell bitmap so selection can distinguish rendered spaces from blank padding
- preserve code indentation without markdown-specific rendering hacks
2026-04-25 15:21:26 -05:00
Brooklyn Nicholson 1735ced93b fix(tui): preserve code block indentation in selection
Render code indentation spaces as selectable cells so copied fenced code keeps its leading whitespace.
2026-04-25 15:17:36 -05:00
Brooklyn Nicholson bba16943f6 fix(tui): preserve rendered indentation in selections
- trim only empty edge rows instead of full selected text
- bound selection paint using unwritten cells so rendered indentation remains copyable
2026-04-25 15:14:26 -05:00
Brooklyn Nicholson 132620ba3d refactor(tui): simplify remote copy hotkey hints
Use an explicit conditional table instead of spread casting for SSH copy hint rows.
2026-04-25 15:09:12 -05:00
Brooklyn Nicholson 876bb60044 fix(tui): trim whitespace-only selection chrome
- clamp selection highlight to real row content so blank drag margins do not render or copy
- keep successful copy actions quiet while preserving usage and failure feedback
2026-04-25 15:07:29 -05:00
Brooklyn Nicholson a68793b6c4 refactor(tui): share remote shell detection
Reuse the platform helper for SSH-aware copy hints so hotkey display and input handling cannot drift.
2026-04-25 14:55:28 -05:00
Brooklyn Nicholson bcc5362432 fix(tui): honor client copy shortcut over ssh
- accept forwarded Cmd+C for selection copy in SSH sessions even when Hermes runs on Linux
- keep local Linux Alt+C from acting as copy and update TUI hotkey hints for remote shells
2026-04-25 14:44:39 -05:00
brooklyn! 283c8fd6e2 Merge pull request #15755 from NousResearch/bb/tui-model-flag
fix(tui): honor launch model overrides
2026-04-25 14:30:26 -05:00
Brooklyn Nicholson 919274b60e fix(tui): align overlay q shortcut casing
Keep shared overlay close behavior consistent with pager and agents overlays by binding lowercase q only.
2026-04-25 14:26:35 -05:00
Brooklyn Nicholson 6e83d90eb4 refactor(tui): tighten overlay helpers
- rename overlay help text component to match its role
- share picker window math across model, session, and skills overlays
2026-04-25 14:23:45 -05:00
Brooklyn Nicholson c6fdf48b79 fix(tui): sync inference model after switches
- keep HERMES_INFERENCE_MODEL aligned with HERMES_MODEL after in-TUI model switches
- clarify static provider detection remapping docs
2026-04-25 14:17:57 -05:00
Brooklyn Nicholson a046483e86 fix(tui): share overlay close controls
- add reusable overlay key and help-text helpers for picker-style overlays
- make model, session, skills, and pager hints consistently support Esc/q close behavior
2026-04-25 14:17:04 -05:00
Brooklyn Nicholson fdcbd2257b fix(tui): resolve startup model aliases statically
- expand short model aliases like sonnet/opus via static catalogs during startup runtime resolution
- keep startup alias resolution network-free and add regression tests in models and tui gateway suites
2026-04-25 14:13:02 -05:00
Brooklyn Nicholson 48bdd2445e fix(tui): apply ui-tui fix pass and restore type-check
- run the requested ui-tui lint+format pass and include resulting formatting updates
- guard text-measure cache eviction key in hermes-ink so ui-tui type-check stays green
2026-04-25 14:08:54 -05:00
Brooklyn Nicholson 5e52011de3 fix(tui): bind provider as model alias 2026-04-25 13:58:59 -05:00
Brooklyn Nicholson e48a497d16 fix(tui): share static model detection 2026-04-25 13:56:16 -05:00
Brooklyn Nicholson 2dfcc8087a fix(tui): avoid network lookup during startup 2026-04-25 13:47:18 -05:00
Brooklyn Nicholson 4db58d45d4 fix(tui): address startup provider review 2026-04-25 13:29:15 -05:00
Brooklyn Nicholson 57b43fdd4b fix(tui): preserve provider precedence on startup 2026-04-25 13:25:43 -05:00
Brooklyn Nicholson e9c47c7042 fix(tui): honor launch model overrides 2026-04-25 13:21:59 -05:00
brooklyn! ee0728c6c4 Merge pull request #15351 from helix4u/fix/tui-rebuild-missing-ink-bundle
fix(tui): rebuild when ink bundle is missing
2026-04-25 13:14:23 -05:00
codez 9daa0620a6 fix(agent): ordering fix in _copy_reasoning_content_for_api — cross-provider reasoning isolation
Fix logic-ordering bug where normalized_reasoning promotion returns
before the DeepSeek/Kimi needs_empty_reasoning guard, causing
cross-provider reasoning content (MiniMax → DeepSeek) to leak into
reasoning_content and trigger HTTP 400.

Changes:
- Reorder branching: existing reasoning_content check first
- Add 'not has_reasoning' guard so poisoned histories (no reasoning)
  still get '' injected for DeepSeek/Kimi
- Healthy same-provider reasoning promotion path unchanged

Refs: #15250, #15213
2026-04-26 02:04:52 +08:00
kshitij 648b89911f fix: use output_text for assistant message content in Codex Responses API (#15690)
The Codex Responses API rejects input_text inside assistant messages —
only output_text and refusal are valid content types for assistant role.

_chat_content_to_responses_parts() previously hardcoded all text content
to input_text regardless of the message role. When an assistant message
had list-format content (multimodal or structured), this produced invalid
input_text parts that the API rejected with:

  Invalid value: 'input_text'. Supported values are: 'output_text' and 'refusal'.

Fix: add a role parameter to _chat_content_to_responses_parts() that
selects output_text for assistant messages and input_text for user
messages. Thread this through _chat_messages_to_responses_input() and
_preflight_codex_input_items().

Fixes #15687
2026-04-25 10:13:29 -07:00
kshitijk4poor 7c17accb29 fix: /stop now immediately aborts streaming retry loop
When a user sends /stop during a streaming API call, the outer poll loop
detects _interrupt_requested and closes the HTTP connection. However, the
inner _call() thread catches the connection error and enters its retry
loop — opening a FRESH connection without checking the interrupt flag.

On slow providers like ollama-cloud, each retry attempt blocks for the
full stream-read timeout (120s+). With 3 retry attempts this caused
510+ second delays between /stop and actual response — the agent appeared
completely unresponsive despite the stop being acknowledged.

Fix: add an _interrupt_requested check at the top of the streaming retry
loop so the agent exits immediately instead of retrying.

Also fix log truncation: all session key logging in gateway/run.py used
[:20] or [:30] slices, which truncated 'agent:main:telegram:dm:5690190437'
(33 chars) to 'agent:main:telegram:' — losing the identifying chat type
and user ID. Replace with full keys to make logs debuggable.

Reported by user Sidharth Pulipaka via Telegram on ollama-cloud provider.
2026-04-25 09:51:39 -07:00
Teknium 5006b2204b fix(update): honor RestartSec when polling for gateway respawn (#15707)
The post-graceful-drain is-active poll used a fixed 10s timeout, but
systemd's hermes-gateway.service has RestartSec=30 — so systemd won't
respawn the unit for 30s after exit-75, and our poll gives up during
the cooldown. Result: every 'hermes update' printed

  ⚠ hermes-gateway drained but didn't relaunch — forcing restart

followed by a redundant 'systemctl restart' that kicked the newly-
respawning gateway again (and re-started WhatsApp / Discord a second
time in the process).

Fix: read RestartUSec from the unit via 'systemctl show' and set the
poll budget to max(10s, RestartSec + 10s slack). Units without
RestartSec set (or value=infinity) fall back to the original 10s.

Observed timeline from journalctl before fix:
  08:56:22.262  old PID exits 75
  08:56:32.707  systemd logs Stopped -> Started  (10.4s gap, > 10s budget)

After fix the poll covers 40s — comfortably inside RestartSec + slack.

Validation:
- RestartUSec parser tested against '30s', '100ms', '1min 30s',
  'infinity', '', 'garbage', '500us', '2min' — all correct.
- Against the live hermes-gateway.service: parses to 30.0s.
- tests/hermes_cli/test_update_gateway_restart.py: 41/41 pass.
2026-04-25 09:08:27 -07:00
Teknium a9fa73a620 feat(oneshot): add --model / --provider / HERMES_INFERENCE_MODEL (#15704)
Makes hermes -z usable by sweeper without mutating user config.

- Top-level -m/--model and --provider flags that apply to -z/--oneshot
  (mirrors hermes chat's plumbing).
- HERMES_INFERENCE_MODEL env var as the parallel to HERMES_INFERENCE_PROVIDER
  for CI / scripted invocations.
- resolve_runtime_provider() gets the requested provider; when --model is
  given without --provider, detect_provider_for_model() auto-selects the
  provider that serves it (same semantic as /model in an interactive session).
- --provider without --model errors out with exit 2 — carrying a config
  model across to a different provider is usually wrong, and silently
  picking the provider's catalog default hides the mismatch.

Config defaults still used when both flags are omitted (existing behavior).

Validation (all live against OpenRouter):
  -z 'x' ....................... uses config default (opus-4.7)
  -z 'x' --model haiku-4.5 ..... haiku-4.5 via auto-detected openrouter
  -z 'x' --model ... --provider  pair as given
  HERMES_INFERENCE_MODEL=... -z  haiku-4.5 via env var
  -z 'x' --provider anthropic .. exits 2 with error to stderr
2026-04-25 08:55:36 -07:00
Teknium 7c8c031f60 feat: add hermes -z <prompt> one-shot mode (#15702)
* feat: add `hermes -z <prompt>` one-shot mode

Top-level flag that runs a single prompt and prints ONLY the final
response text to stdout. No banner, no spinner, no tool previews, no
session_id line — stdout is machine-readable, stderr is silent.

Tools, memory, rules, and AGENTS.md in the CWD are loaded as normal.
Approvals are auto-bypassed (sets HERMES_YOLO_MODE=1 for the call).
Bypasses cli.py entirely — goes straight to AIAgent.chat().

* feat(oneshot): handle interactive-callback gaps explicitly

Document (and where needed, patch) the interactive surfaces that have
no user to answer in oneshot mode:

  - clarify       — inject a callback that tells the agent to pick the
                    best default and continue (previously returned a
                    generic 'not available in this execution context'
                    error that wastes a tool call)
  - sudo password — terminal_tool already gates on HERMES_INTERACTIVE
                    (we don't set it); sudo fails gracefully
  - shell hooks   — HERMES_ACCEPT_HOOKS=1 auto-approves; also falls
                    back to deny on non-tty stdin
  - dangerous cmd — HERMES_YOLO_MODE=1 short-circuits before input()
  - secret capture— tool returns gracefully when no callback wired

Live-tested: agent asked clarify(['red','blue']) and got 'red' back,
replied with only 'red'.
2026-04-25 08:44:38 -07:00
Teknium ea01bdcebe refactor(memory): remove flush_memories entirely (#15696)
The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are
obsolete now that the background memory/skill review handles
persistent memory extraction.

Problems with flush_memories:

- Pre-dates the background review loop.  It was the only memory-save
  path when introduced; the background review now fires every 10 user
  turns on CLI and gateway alike, which is far more frequent than
  compression or session reset ever triggered flush.
- Blocking and synchronous.  Pre-compression flush ran on the live agent
  before compression, blocking the user-visible response.
- Cache-breaking.  Flush built a temporary conversation prefix
  (system prompt + memory-only tool list) that diverged from the live
  conversation's cached prefix, invalidating prompt caching.  The
  gateway variant spawned a fresh AIAgent with its own clean prompt
  for each finalized session — still cache-breaking, just in a
  different process.
- Redundant.  Background review runs in the live conversation's
  session context, gets the same content, writes to the same memory
  store, and doesn't break the cache.  Everything flush_memories
  claimed to preserve is already covered.

What this removes:

- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
  (and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks,
  hermes tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
  _check_compression_model_feasibility (headroom was only needed
  because flush dragged the full main-agent system prompt along;
  the compression summariser sends a single user-role prompt so
  new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
  flush-specific paths

What this renames (with read-time backcompat on sessions.json):

- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized.
  The session-expiry watcher still uses the flag to avoid re-running
  finalize/eviction on the same expired session; the new name
  reflects what it now actually gates.  from_dict() reads
  'expiry_finalized' first, falls back to the legacy 'memory_flushed'
  key so existing sessions.json files upgrade seamlessly.

Supersedes #15631 and #15638.

Tested: 383 targeted tests pass across run_agent/, agent/, cli/,
and gateway/ session-boundary suites.  No behavior regressions —
background memory review continues to handle persistent memory
extraction on both CLI and gateway.
2026-04-25 08:21:14 -07:00
kshitijk4poor d635e2df3f fix(compression): pass provider to context length resolver in feasibility check
_check_compression_model_feasibility calls get_model_context_length
without provider=, so Codex OAuth users get 1,050,000 (from models.dev
for 'openai') instead of the actual 272,000 limit. This happens because
_infer_provider_from_url maps chatgpt.com → 'openai' (not 'openai-codex'),
skipping the Codex-specific resolution branch entirely.

Result: compression threshold set at 85% of 1.05M = 892K — conversations
never trigger compression, the context grows unbounded, and when gateway
hygiene eventually forces compression, the Codex endpoint drops the
oversized streaming request ('peer closed connection without sending
complete message body').

Fix: forward self.provider to get_model_context_length so provider-
specific resolution branches (Codex OAuth 272K, Copilot live /models,
Nous suffix-match) fire correctly.

Reported by user on GPT 5.5 via Codex OAuth Pro (paste.rs/vsra3).
2026-04-25 07:09:47 -07:00
Teknium cf2fabc40f docs(dashboard): document page-scoped plugin slots (#15662)
Follow-up to PR #15658. The feature PR introduced page-scoped slots
(<page>:top / <page>:bottom inside every built-in page) but only
touched the Shell slots catalogue. Adds proper narrative coverage so
plugin authors find the feature.

Changes
- extending-the-dashboard.md:
  - Frontmatter description + intro bullet now mention page-scoped slots
  - New TOC entry "Augmenting built-in pages (page-scoped slots)"
  - New dedicated subsection after "Replacing built-in pages"
    explaining the heavy-vs-light tradeoff, listing the pages that
    expose slots, and showing a worked manifest + IIFE example with
    tab.hidden: true
  - Cross-link from the tab.override section pointing readers to the
    lighter augmentation option
- web-dashboard.md:
  - Bullet mentioning "page-scoped slots (inject widgets into
    built-in pages without overriding them)"

Validation
- TOC anchor "#augmenting-built-in-pages-page-scoped-slots" matches
  the generated heading slug
- Code fences balanced (64, even)
- Pre-existing docusaurus build errors (skills.json, api-server.md
  link) reproduce on bare main -- not introduced here
2026-04-25 06:59:24 -07:00
Teknium af22421e87 feat(dashboard): page-scoped plugin slots for built-in pages (#15658)
* fix(terminal): three-layer defense against watch_patterns notification spam

Background processes that stack notify_on_complete=True with watch_patterns
can flood the user with duplicate, delayed notifications — matches deliver
asynchronously via the completion queue and continue arriving minutes after
the process has exited. The docstring warning against this (PR #12113) has
proven insufficient; agents still misuse the combination.

Three layered defenses, each sufficient on its own:

1. Mutual exclusion (terminal_tool.py): When both flags are set on a
   background process, drop watch_patterns with a warning. notify_on_complete
   wins because 'let me know when it's done' is the more useful signal and
   fires exactly once. Extracted as _resolve_notification_flag_conflict() so
   the rule is testable in isolation.

2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now
   bails the moment session.exited is True. Post-exit chunks (buffered reads
   draining after the process is gone) no longer produce notifications. This
   is the fix flagged as future work in session 20260418_020302_79881c.

3. Global circuit breaker (process_registry.py): Per-session rate limits don't
   catch the sibling-flood case — N concurrent processes can each stay under
   8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap
   trips a 30-second cooldown across ALL sessions, emits a single
   watch_overflow_tripped event, silently counts dropped events, and emits a
   watch_overflow_released summary when the cooldown ends.

Also updates the tool schema + docstring to document the new behavior.

Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
mutual-exclusion resolver x4, global breaker trip/cooldown/release x2).
All 60 tests across test_watch_patterns.py, test_notify_on_complete.py,
test_terminal_tool.py pass.

Real-world trigger: self-inflicted in session 20260425_051924 — three
concurrent hermes-sweeper review subprocesses each set watch_patterns=
['failed validation', 'errored'] AND notify_on_complete=True, then iterated
over multiple items, producing enough matches per process to defeat the
per-session cap while staying under the global cap that didn't yet exist.

* fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion

Per Teknium's direction, the watch_patterns rate limit is now much more
aggressive and self-healing.

## New rule — per session

- HARD cap: 1 watch-match notification per 15 seconds per process.
- Any match arriving inside the cooldown window is dropped and counts as
  ONE strike for that window (many drops in the same window still = 1 strike).
- After 3 consecutive strike windows, watch_patterns is permanently disabled
  for the session and the session is auto-promoted to notify_on_complete
  semantics — exactly one notification when the process actually exits.
- A cooldown window that expires with zero drops resets the consecutive
  strike counter — healthy cadence is forgiven.

## Schema + docstring rewritten

The tool schema description now gives the model explicit guidance:
- notify_on_complete is 'the right choice for almost every long-running task'
- watch_patterns is for RARE one-shot signals on LONG-LIVED processes
- Do NOT use watch_patterns with loops/batch jobs — error patterns fire every
  iteration and will hit the strike limit fast
- Mutual exclusion is stated on both parameter descriptions
- 1/15s cooldown and 3-strike promotion are stated in the watch_patterns
  description so the model sees the contract every turn

## Removed

- WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the
  new 1/15s limit subsumes both; keeping them would double-count.
- _watch_window_hits / _watch_window_start / _watch_overload_since fields
  on ProcessSession. Replaced by _watch_last_emit_at / _watch_cooldown_until
  / _watch_strike_candidate / _watch_consecutive_strikes.

## Kept

- Global circuit breaker across all sessions (15/10s → 30s cooldown) as a
  secondary safety net for concurrent siblings. Still valuable when 20
  short-lived processes each fire once — none individually violates the
  per-session limit.
- Suppress-after-exit guard.
- Mutual exclusion resolver at the tool entry point.

## Tests

- 6 new tests in TestPerSessionRateLimit covering: first match delivers,
  second in cooldown suppressed, multi-drop = single strike, 3 strikes
  disables + promotes, clean window resets counter, suppressed count
  carried to next emit.
- Global circuit breaker tests rewritten to use fresh sessions instead of
  hacking removed per-window fields.
- 50/50 watch_patterns + notify_on_complete tests pass.
- 60/60 including test_terminal_tool.py pass.

* feat(dashboard): page-scoped plugin slots for built-in pages

Dashboard plugins can now inject components into specific built-in
pages (Sessions, Analytics, Logs, Cron, Skills, Config, Env, Docs,
Chat) without overriding the whole route.

Previously, plugins could only:
  1. Add new tabs (tab.path)
  2. Replace whole built-in pages (tab.override)
  3. Inject into global shell slots (header-*, footer-*, pre-main, ...)

None of those let a plugin add a banner, card, or widget to an
existing page. The new <page>:top / <page>:bottom slots close that
gap, reusing the existing registerSlot() API.

Changes
- web/src/plugins/slots.ts: 18 new KNOWN_SLOT_NAMES entries
  (sessions:top, sessions:bottom, analytics:top, ..., chat:bottom),
  grouped under "Shell-wide" vs "Page-scoped" in the docblock
- web/src/pages/*: each built-in page now renders
    <PluginSlot name="<page>:top" />
  as the first child of its outer wrapper and
    <PluginSlot name="<page>:bottom" />
  as the last child -- zero visual cost when no plugin registers
- plugins/example-dashboard: registers a demo banner into
  sessions:top via registerSlot(), with matching slots entry in
  the manifest -- so freshly-setup users can see what page-scoped
  slots look like without writing any plugin code
- website/docs: new "Page-scoped slots" table in the plugin
  authoring guide, with a worked example
- tests/hermes_cli/test_web_server.py: round-trip test for
  colon-bearing slot names (sessions:top, analytics:bottom, ...)

Validation
- npm run build: clean (tsc -b + vite build, 2761 modules)
- scripts/run_tests.sh tests/hermes_cli/test_web_server.py::TestDashboardPluginManifestExtensions: 5/5 pass
2026-04-25 06:55:35 -07:00
Teknium 97d54f0e4d fix(terminal): three-layer defense against watch_patterns notification spam (#15642)
* fix(terminal): three-layer defense against watch_patterns notification spam

Background processes that stack notify_on_complete=True with watch_patterns
can flood the user with duplicate, delayed notifications — matches deliver
asynchronously via the completion queue and continue arriving minutes after
the process has exited. The docstring warning against this (PR #12113) has
proven insufficient; agents still misuse the combination.

Three layered defenses, each sufficient on its own:

1. Mutual exclusion (terminal_tool.py): When both flags are set on a
   background process, drop watch_patterns with a warning. notify_on_complete
   wins because 'let me know when it's done' is the more useful signal and
   fires exactly once. Extracted as _resolve_notification_flag_conflict() so
   the rule is testable in isolation.

2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now
   bails the moment session.exited is True. Post-exit chunks (buffered reads
   draining after the process is gone) no longer produce notifications. This
   is the fix flagged as future work in session 20260418_020302_79881c.

3. Global circuit breaker (process_registry.py): Per-session rate limits don't
   catch the sibling-flood case — N concurrent processes can each stay under
   8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap
   trips a 30-second cooldown across ALL sessions, emits a single
   watch_overflow_tripped event, silently counts dropped events, and emits a
   watch_overflow_released summary when the cooldown ends.

Also updates the tool schema + docstring to document the new behavior.

Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
mutual-exclusion resolver x4, global breaker trip/cooldown/release x2).
All 60 tests across test_watch_patterns.py, test_notify_on_complete.py,
test_terminal_tool.py pass.

Real-world trigger: self-inflicted in session 20260425_051924 — three
concurrent hermes-sweeper review subprocesses each set watch_patterns=
['failed validation', 'errored'] AND notify_on_complete=True, then iterated
over multiple items, producing enough matches per process to defeat the
per-session cap while staying under the global cap that didn't yet exist.

* fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion

Per Teknium's direction, the watch_patterns rate limit is now much more
aggressive and self-healing.

## New rule — per session

- HARD cap: 1 watch-match notification per 15 seconds per process.
- Any match arriving inside the cooldown window is dropped and counts as
  ONE strike for that window (many drops in the same window still = 1 strike).
- After 3 consecutive strike windows, watch_patterns is permanently disabled
  for the session and the session is auto-promoted to notify_on_complete
  semantics — exactly one notification when the process actually exits.
- A cooldown window that expires with zero drops resets the consecutive
  strike counter — healthy cadence is forgiven.

## Schema + docstring rewritten

The tool schema description now gives the model explicit guidance:
- notify_on_complete is 'the right choice for almost every long-running task'
- watch_patterns is for RARE one-shot signals on LONG-LIVED processes
- Do NOT use watch_patterns with loops/batch jobs — error patterns fire every
  iteration and will hit the strike limit fast
- Mutual exclusion is stated on both parameter descriptions
- 1/15s cooldown and 3-strike promotion are stated in the watch_patterns
  description so the model sees the contract every turn

## Removed

- WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the
  new 1/15s limit subsumes both; keeping them would double-count.
- _watch_window_hits / _watch_window_start / _watch_overload_since fields
  on ProcessSession. Replaced by _watch_last_emit_at / _watch_cooldown_until
  / _watch_strike_candidate / _watch_consecutive_strikes.

## Kept

- Global circuit breaker across all sessions (15/10s → 30s cooldown) as a
  secondary safety net for concurrent siblings. Still valuable when 20
  short-lived processes each fire once — none individually violates the
  per-session limit.
- Suppress-after-exit guard.
- Mutual exclusion resolver at the tool entry point.

## Tests

- 6 new tests in TestPerSessionRateLimit covering: first match delivers,
  second in cooldown suppressed, multi-drop = single strike, 3 strikes
  disables + promotes, clean window resets counter, suppressed count
  carried to next emit.
- Global circuit breaker tests rewritten to use fresh sessions instead of
  hacking removed per-window fields.
- 50/50 watch_patterns + notify_on_complete tests pass.
- 60/60 including test_terminal_tool.py pass.
2026-04-25 06:41:58 -07:00
Teknium 6e561ffa6d fix(update): poll is-active instead of one-shot sleep(3) after gateway restart (#15639)
The auto-restart path in `hermes update` verifies systemd unit health with
`time.sleep(3)` + a single `systemctl is-active` call.  The unit's
Stopped -> Started transition after a graceful SIGUSR1 exit (or a hard
restart) is not always complete inside that 3s window, so the verify
races and reports 'drained but didn't relaunch' even though systemd is
about to bring the unit back up a fraction of a second later.  Users
then see a spurious warning, a redundant fallback `systemctl restart`
fires, and adapters (Discord, WhatsApp) get restarted twice.

Replace the three sleep+oneshot sites with a small `_wait_for_service_active()`
closure that polls `is-active` every 0.5s for up to 10s.  Behaviour
is unchanged when the unit is healthy or truly dead — only the race
window around a clean restart is now handled correctly.

Tests: tests/hermes_cli/test_update_gateway_restart.py (41/41).
2026-04-25 06:11:22 -07:00
Teknium ac05daa189 fix(tools): dedupe bundled plugin toolsets with built-in entries (#15634)
`hermes tools` → "reconfigure existing" listed Spotify twice because
the Apr 24 refactor that moved Spotify into plugins/spotify/ (PR #15174)
left the entry in CONFIGURABLE_TOOLSETS. _get_effective_configurable_toolsets()
unconditionally appended get_plugin_toolsets() on top, so the same
'spotify' key showed up from both sources.

Dedupe by key — built-in CONFIGURABLE_TOOLSETS entry wins (it has the
nicer label and description). Also guards against future bundled plugins
that share a toolset key with a built-in.
2026-04-25 05:53:08 -07:00
Teknium 3c1c65e754 fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry (#15633)
Generalize the temperature-specific 400 retry that shipped in PR #15621 so
the same reactive strategy covers any provider that rejects an arbitrary
request parameter —  — not just temperature.

- agent/auxiliary_client.py:
  * New _is_unsupported_parameter_error(exc, param): matches the same six
    phrasings the old temperature detector did plus 'unrecognized parameter'
    and 'invalid parameter', against any named param.
  * _is_unsupported_temperature_error is now a thin back-compat wrapper so
    existing imports and tests keep working.
  * The max_tokens → max_completion_tokens retry branch in call_llm and
    async_call_llm now (a) gates on 'max_tokens is not None' so we do not
    pop a key that was never set and silently substitute a None value on
    the retry, and (b) also matches the generic helper in addition to the
    legacy 'max_tokens' / 'unsupported_parameter' substring checks — picking
    up phrasings like 'Unknown parameter: max_tokens' that previously slipped
    through.

- tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering
  the generic detector across params, the back-compat wrapper, and the two
  hardenings to the max_tokens retry branch (None gate + generic phrasing).

Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR
also proposed the reactive temperature retry which landed independently via
PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages
the remaining hardening ideas onto current main.
2026-04-25 05:50:34 -07:00
Teknium f92006ce1c fix(compression): reserve system+tools headroom when aux binds threshold (#15631)
When the auxiliary compression model's context is smaller than the main
model's compression threshold, _check_compression_model_feasibility
auto-lowers the session threshold. Previously it set:

    new_threshold = aux_context

This let the raw message list grow to exactly aux_context tokens. But
compression and flush_memories actually send system_prompt + tool_schemas
+ messages to the aux model. With 50+ tools that overhead is 25-30K
tokens, so the full request overflowed aux with HTTP 400.

Subtract a headroom estimate from aux_context before setting the new
threshold: the actual tool-schema token count (from
estimate_request_tokens_rough) plus a 12K allowance for the system
prompt (not yet built at __init__ time) and flush-instruction overhead.
Clamp to MINIMUM_CONTEXT_LENGTH so the session still starts even with
an unusually heavy tool schema.

This fixes the 'flush_memories overflow on busy toolsets' path that
Teknium flagged — where main and aux can be nominally the same model
but still 400 because the threshold left no room for the request
overhead. Same fix also protects the normal compression summarisation
request on the same binding aux.

Tests: two new regression tests cover the headroom reservation and the
MINIMUM_CONTEXT_LENGTH floor. Two existing tests updated for the new
(lower) threshold values now that empty-tools still produces a 12K
static headroom deduction.
2026-04-25 05:41:56 -07:00
Teknium b35d692f45 chore(release): map ash@users.noreply.github.com to ash 2026-04-25 05:27:17 -07:00
Ash Rowan Vale 🌿 facea84559 fix(auxiliary): retry without temperature when any provider rejects it
Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature'
across all providers/models — not just Codex Responses.

The same backend can accept temperature for some models and reject it for
others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI
endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and
Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does
not scale.

call_llm / async_call_llm now detect the concrete 'unsupported parameter:
temperature' 400 and transparently retry once without temperature. Kimi's
server-managed omission and Opus 4.7+'s proactive strip stay in place —
this is the safety net for everything else.

Changes:
- agent/auxiliary_client.py: add _is_unsupported_temperature_error helper;
  wire into both sync and async call_llm paths before the existing
  max_tokens/payment/auth retry ladder
- tests/agent/test_unsupported_temperature_retry.py: 19 tests covering
  detector phrasings, sync + async retry, no-retry-without-temperature,
  and non-temperature 400s not triggering the retry

Builds on PR #15620 (codex_responses fallback) which stripped temperature
up front for that one api_mode. This PR closes the gap for every other
provider/model combo via reactive retry.

Credit: retry approach and detector originate from @BlueBirdBack's PR #15578.

Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>
2026-04-25 05:27:17 -07:00
Teknium f67a61dc93 fix(flush_memories): strip temperature from codex_responses fallback (#15620)
The memory-flush fallback for api_mode='codex_responses' was unconditionally
adding `temperature` to codex_kwargs before calling _run_codex_stream. The
Responses API does not accept temperature on any supported backend:

- chatgpt.com/backend-api/codex rejects it outright
- api.openai.com + gpt-5/o-series reasoning models reject it
- Copilot Responses rejects it on reasoning models

The CodexAuxiliaryClient adapter and the codex_responses transport both
correctly omit temperature — the flush fallback was the only path putting
it back. On errors from the primary aux path (e.g. expired OAuth token),
users saw `⚠ Auxiliary memory flush failed: HTTP 400: Unsupported parameter:
temperature`.

Reported by Garik [NOUS] on GPT-5.5 via Codex OAuth Pro.
2026-04-25 05:01:25 -07:00
Teknium 6ed37e0f42 feat(tools): make discord/discord_admin opt-in, Discord-only
Both discord (read/participate) and discord_admin (server admin) are now
configurable via `hermes tools` with default-OFF. Previously the core
discord tool (fetch_messages, search_members, create_thread) auto-loaded
on every Discord install with DISCORD_BOT_TOKEN set — 19 tools the user
never opted into.

Adds a platform-scoping mechanism (_TOOLSET_PLATFORM_RESTRICTIONS) so
the discord toolsets only show up in the Discord platform's checklist,
not on CLI/Telegram/Slack/etc. Applied at four gates:
  - _prompt_toolset_checklist: checklist filter
  - _get_platform_tools: resolution filter (both branches)
  - _save_platform_tools: save-time filter (covers 'Configure all
    platforms' and hand-edited config.yaml)
  - tools_disable_enable_command: rejects `hermes tools enable discord`
    on non-Discord platforms with a clear error

build_session_context_prompt now injects the Discord IDs block only
when both conditions hold: the discord/discord_admin toolset is
enabled AND DISCORD_BOT_TOKEN is set. Toolset alone isn't enough —
the tool's check_fn gates on the token at registry time, so opting
in without a token yields no tools and the IDs block would lie.
Otherwise keep the stale-API disclaimer.
2026-04-25 04:51:11 -07:00
alt-glitch 591deeb928 feat(session): inject Discord IDs block when discord tool is loaded
When DISCORD_BOT_TOKEN is set — meaning the discord tool actually
loads — emit a dedicated IDs block in the session context prompt so
the agent can call ``fetch_messages``, ``pin_message``, etc. with
real identifiers instead of probing.

Currently only ``thread_id`` was exposed as a raw ID (via the
``description`` string).  The agent in a Discord thread had to guess
that the thread ID doubles as a channel ID for the REST API (it
does), and it had no way to reference the parent channel, the guild,
or the triggering message at all.

The block adapts to context:

  - Thread:     guild / parent channel / thread / message
  - Channel:    guild / channel / message
  - (DM has no guild/channel IDs worth listing; only message)

Discord isn't in _PII_SAFE_PLATFORMS, so IDs ship unredacted.
2026-04-25 04:51:11 -07:00
alt-glitch 5ae07e7b5c fix(session): gate stale "no Discord APIs" note on DISCORD_BOT_TOKEN
The Discord platform note in the session context prompt claimed the
agent has no server-management APIs — pre-dating the discord tool.
With a bot token configured the agent actually has fetch_messages,
search_members, create_thread, and optionally the discord_admin tool;
telling the model otherwise causes it to refuse or apologise for
calls it is fully able to make.

Gate the disclaimer on DISCORD_BOT_TOKEN being unset, matching the
tool's own ``check_fn``.  Without a token the note still appears and
remains accurate; with a token the model is no longer gaslit into
refusing valid tool calls.
2026-04-25 04:51:11 -07:00
alt-glitch 47b02e961c feat(discord): populate guild_id, parent_chat_id, message_id on SessionSource
Discord knows all four identifiers for every inbound message — guild,
channel (or thread), parent channel when in a thread, and the
triggering message.  Pass them into ``SessionSource`` via the new
``build_source()`` kwargs so downstream code (context-prompt builder,
delivery, logging) can use them without re-resolving from discord.py
objects.

For auto-threaded messages, remember the original channel as the
parent before swapping ``chat_id`` to the freshly created thread.

Behavioural: still a no-op — nothing consumes these fields yet.
2026-04-25 04:51:11 -07:00
alt-glitch 0702231dd8 feat(session): add guild_id/parent_chat_id/message_id to SessionSource
Groundwork for injecting raw platform identifiers into the agent's
system prompt.  Currently only `thread_id` is exposed as a raw ID —
callers in a Discord thread had to guess `channel_id == thread_id`
(which happens to work because threads are channels in Discord's REST
API) and had no way to reference the parent channel, guild, or the
triggering message.

Adds three optional fields:

- `guild_id` — Discord guild / Slack workspace / Matrix server scope
- `parent_chat_id` — parent channel when chat_id refers to a thread
- `message_id` — ID of the triggering message (pin/reply/react)

Extends `BasePlatformAdapter.build_source()` to accept + forward them
and teaches `to_dict`/`from_dict` to serialize them.  Behaviourally a
no-op: nothing reads the fields yet and they default to None.
2026-04-25 04:51:11 -07:00
alt-glitch db09477b77 feat(feishu): wire feishu doc/drive tools into hermes-feishu composite
The feishu_doc and feishu_drive tools were registered in the tool
registry but never added to the hermes-feishu composite toolset.
The pipeline fix from the prior commit now recovers them automatically
once they are in the composite.
2026-04-25 04:50:14 -07:00
alt-glitch 81987f0350 feat(discord): split discord_server into discord + discord_admin tools
Split the monolithic discord_server tool (14 actions) into two:

- discord: core actions (fetch_messages, search_members, create_thread)
  that are useful for the agent's normal operation. Auto-enabled on
  the discord platform via the pipeline fix.

- discord_admin: server management actions (list channels/roles, pins,
  role assignment) that require explicit opt-in via hermes tools.
  Added to CONFIGURABLE_TOOLSETS and _DEFAULT_OFF_TOOLSETS.
2026-04-25 04:50:14 -07:00
alt-glitch 9830905dab fix(tools): recover non-configurable toolsets from composite resolution
The reverse-mapping loop in _get_platform_tools only checked
CONFIGURABLE_TOOLSETS, silently dropping platform-specific toolsets
like discord and feishu_doc whose tools were in the composite but
had no configurable key. Add a second pass over TOOLSETS that picks
up unclaimed toolsets whose tools are present in the resolved
composite.
2026-04-25 04:50:14 -07:00
Teknium 0d548d1db9 fix(cron): wire context_from through the update action
The tool schema promised 'On update, pass an empty array to clear' but the
update branch ignored the context_from kwarg entirely — users could set
the field at create time and never modify or clear it afterward.

- tools/cronjob_tools.py: handle context_from in the update branch the
  same way script/enabled_toolsets/workdir are handled: normalize str/list
  to refs, validate each referenced job exists (same check the create
  branch does), store as list-or-None to match create_job()'s shape.
  Empty string or empty list clears the field.
- tests/cron/test_cron_context_from.py: 6 new tests covering add/change/
  clear (both shapes)/bad-ref/preserve-across-unrelated-update.
2026-04-25 04:49:28 -07:00
MorAlekss eb92222811 fix(cron): silent skip when context_from job has no output yet 2026-04-25 04:49:28 -07:00
MorAlekss e4a91ccb76 test(cron): add PermissionError coverage for context_from 2026-04-25 04:49:28 -07:00
MorAlekss 5ac5365923 feat(cron): add context_from field for cron job output chaining 2026-04-25 04:49:28 -07:00
Teknium f433197f23 feat(installer): FHS layout for root installs on Linux (#15608)
Root installs on Linux now put the code at /usr/local/lib/hermes-agent and
the hermes command at /usr/local/bin/hermes.  HERMES_HOME (~/.hermes) stays
state-only.  Matches Claude Code / Codex CLI / OpenClaw, keeps Docker
bind-mounted /root/ volumes lean, and puts the command on every shell's
default PATH without touching shell RC files.

- Non-root users and macOS root: unchanged
- Existing root installs at $HERMES_HOME/hermes-agent: preserved in-place
  (detected via .git dir) — no auto-migration, no breakage
- Explicit --dir / $HERMES_INSTALL_DIR: always wins, never overridden
- Termux: unchanged (package manager manages /data/data/...)

Requested by @souly9999 (Discord). Our own Dockerfile already uses this
split (code at /opt/hermes, data at /opt/data volume); the user-install
path now matches.
2026-04-25 04:49:16 -07:00
Teknium df485628ce chore(release): map Readon's git email to GitHub login 2026-04-25 04:49:07 -07:00
Yindong 9fde22d233 fix the reset of model change by /model. 2026-04-25 04:49:07 -07:00
alt-glitch 9d7b64b5dd fix(tools): normalize numeric entries and clear stale no_mcp in _save_platform_tools
YAML parses bare numeric toolset names (e.g. 12306:) as int, causing
TypeError in sorted() since the read path normalizes to str but the
save path did not.

The no_mcp sentinel was preserved in existing entries even when the
user re-enabled MCP servers, causing MCP to stay silently disabled.
2026-04-25 04:49:02 -07:00
vominh1919 5401a0080d fix: recalculate token budgets on model switch in ContextCompressor
update_model() recalculated threshold_tokens but left tail_token_budget
and max_summary_tokens at their __init__ values. When switching from a
200K model to 32K, the tail budget stayed at ~20K tokens (62% of 32K)
instead of the intended ~10%.

Adds budget recalculation in update_model() and 2 regression tests.
2026-04-25 15:07:56 +05:30
Teknium e5647d7863 docs: consolidate dashboard themes and plugins into Extending the Dashboard (#15530)
The web-dashboard.md and dashboard-plugins.md pages had overlapping,
partial coverage of the theme and plugin systems. Themes were split
across two pages; the plugin docs had a minimal manifest reference but
no step-by-step guide, no slot catalog, and no theme+plugin demo.

New: user-guide/features/extending-the-dashboard.md — single navigable
reference for all three extension layers (themes, UI plugins, backend
plugins). Includes:

- Theme quick-start + full schema (palette, typography, layout, layout
  variants, assets, componentStyles, colorOverrides, customCSS)
- Plugin quick-start + full schema (manifest, SDK, slots, tab.override,
  tab.hidden, backend routes, custom CSS)
- 10-slot shell catalog with locations
- Plugin discovery + load lifecycle
- Combined theme+plugin walkthrough (Strike Freedom cockpit demo)
- API reference + troubleshooting

web-dashboard.md: trimmed to core tool docs (pages, REST API, CORS,
development). Theme/plugin content now points to the new page with a
built-in themes summary table.

dashboard-plugins.md: deleted (merged into extending-the-dashboard.md).

sidebars.ts: swap 'dashboard-plugins' → 'extending-the-dashboard' under
the Management group.

No user-facing behavior change; docs-only.
2026-04-24 23:26:51 -07:00
Teknium 023b1bff11 fix(delegate): resolve subagent approval prompts without deadlocking parent TUI (#15491)
Subagents run inside a ThreadPoolExecutor. The CLI's interactive approval
callback lives in tools/terminal_tool.py's threading.local(), which worker
threads do not inherit. When a subagent hits a dangerous-command guard,
prompt_dangerous_approval() falls back to input() from the worker thread,
deadlocking against the parent's prompt_toolkit TUI that owns stdin.

Fix: install a non-interactive callback into every subagent worker thread
via ThreadPoolExecutor(initializer=set_approval_callback, initargs=(cb,)).
The callback is config-gated by delegation.subagent_auto_approve:

  false (default) -> _subagent_auto_deny (safe; matches leaf tool blocklist)
  true            -> _subagent_auto_approve (opt-in YOLO for cron/batch)

Both emit a logger.warning audit line. Gateway sessions are unaffected
because they resolve approvals via tools/approval.py's per-session queue,
not through these TLS callbacks. Diagnosis credit: @MorAlekss (#14685).

- hermes_cli/config.py: DEFAULT_CONFIG.delegation.subagent_auto_approve: False
- cli-config.yaml.example: documented, commented (default)
- tools/delegate_tool.py: _subagent_auto_deny, _subagent_auto_approve,
  _get_subagent_approval_callback, wired into the child timeout executor
- tests/tools/test_delegate.py: 7 tests covering defaults, truthy coercion,
  and TLS scoping in the worker thread
2026-04-24 22:37:22 -07:00
brooklyn! 6407b3d5b3 Merge pull request #15488 from kevin-ho/fix/tui-mouse-toggle
fix(tui): proactive mouse disable on ConPTY + /mouse toggle command
2026-04-24 22:43:47 -05:00
Teknium 0a59994030 fix(cli-config): keep delegation overrides commented in example 2026-04-24 20:38:58 -07:00
MorAlekss 0ed37c0ca4 docs(delegate): document max_concurrent_children and max_spawn_depth + cost warning 2026-04-24 20:38:58 -07:00
Vesper (on behalf of Director) 1c8ce33d51 fix(tui): proactive mouse disable on ConPTY + /mouse toggle command
On Windows WSL2, ConPTY implicitly enables mouse event injection when
the alternate screen buffer (DEC 1049) is entered, causing raw escape
sequences to appear in the transcript as ghost characters.

Fix (two parts):
1. ConPTY fix: send DISABLE_MOUSE_TRACKING immediately after entering
   alt screen when mouse tracking is off (AlternateScreen.tsx)
2. Runtime toggle: add /mouse [on|off|toggle] slash command with config
   persistence (display.tui_mouse) so users can manage this at runtime

The env var HERMES_TUI_DISABLE_MOUSE continues to work as the initial
default, but can now be overridden via /mouse and persisted to config.

Closes: upstream ConPTY mouse injection issue
Credits: OutThisLife / PR #13716 for the toggle concept
2026-04-24 20:32:12 -07:00
Clifford Garwood 2182de55bb fix(matrix): drop needless DeviceID import + mock put_device_id in tests
Two adjustments to make CI pass:

- In gateway/platforms/matrix.py: `DeviceID` is `NewType("DeviceID", str)`,
  so passing `client.device_id` directly (already a str) works identically
  at runtime. The explicit import was cosmetic and tripped CI environments
  where `mautrix.types` doesn't re-export DeviceID at the expected path
  ("cannot import name 'DeviceID' from 'mautrix.types' (unknown location)").

- In tests/gateway/test_matrix.py: add `put_device_id` to the hand-written
  `PgCryptoStore` fake so the three encryption-path tests
  (test_connect_with_access_token_and_encryption,
  test_connect_uses_configured_device_id_over_whoami,
  test_connect_registers_encrypted_event_handler_when_encryption_on) can
  exercise the new crypto-store binding without AttributeError.
2026-04-25 07:17:03 +05:30
Clifford Garwood 3cf13747b7 fix(matrix): bind PgCryptoStore device_id so fresh E2EE installs work
PgCryptoStore.__init__ defaults _device_id to "" and put_account writes
that blank value into crypto_account. The UPSERT's ON CONFLICT DO UPDATE
clause deliberately does not touch device_id, so once the row is written
blank it stays blank forever — breaking every downstream device-scoped
olm operation. Peers' to-device olm ciphertext can't match our identity
key, no megolm sessions ever land, and the user sees "hermes is in the
room but never responds to encrypted messages".

Fix: call put_device_id(client.device_id) immediately after
crypto_store.open() and before olm.load(). This sets the store's
in-memory _device_id so the first put_account INSERT writes the correct
value from the start.

Observable symptoms without the fix, on a fresh crypto.db:
  - crypto_account.device_id = ""
  - crypto_tracked_user: 0 rows
  - crypto_device: 0 rows
  - crypto_olm_session: 0 rows
  - crypto_megolm_inbound_session: 0 rows
  - "No one-time keys nor device keys got when trying to share keys"
    warning on every startup
  - "olm event doesn't contain ciphertext for this device" DecryptionError
    on any inbound to-device event
  - Encrypted room messages arrive but never decrypt

After the fix (wiped crypto.db + restart):
  - device_id populated with actual runtime device (e.g. CZIKTRFLOV)
  - all counts populate from sync as expected
  - encrypted DMs flow normally

Who hits this: anyone with a fresh crypto.db — includes first-time matrix
E2EE setup, nio→mautrix migrations (since matrix.py removes the legacy
pickle on startup, creating a fresh SQLite store), and anyone who wipes
crypto.db to start over. Existing installs that somehow already have a
non-blank device_id would be unaffected, but no prior code path writes
it correctly, so that set is likely empty.
2026-04-25 07:17:03 +05:30
Siddharth Balyan 3e61703b08 fix(nix): use --rebuild in fix-lockfiles to bypass cached FOD store paths (#15444)
* fix(nix): use --rebuild in fix-lockfiles to bypass cached FOD store paths

fix-lockfiles checked npm lockfile hashes by running
`nix build .#<attr>.npmDeps`, but fetchNpmDeps is a fixed-output
derivation — if the old store path exists locally, Nix returns it from
cache without re-fetching. This caused the script to report "ok" even
when hashes were stale, while CI (with no cache) failed with a hash
mismatch.

Adding --rebuild forces Nix to re-derive and verify the output hash
against the declared one, catching staleness regardless of local cache
state. Also updates the tui and web npm deps hashes that were stale.

* fix(nix): regenerate ui-tui lockfile to add missing @emnapi entries

npm ci was failing because @emnapi/core and @emnapi/runtime were
missing from ui-tui/package-lock.json despite being required as peer
deps by @napi-rs/wasm-runtime (via @rolldown/binding-wasm32-wasi).

Running npm install --package-lock-only adds the missing entries.
The npmDepsHash reverts to its previous value since fetchNpmDeps was
already fetching these packages as transitive dependencies.
2026-04-25 06:14:32 +05:30
Teknium 05d8f11085 fix(/model): show provider-enforced context length, not raw models.dev (#15438)
/model gpt-5.5 on openai-codex showed 'Context: 1,050,000 tokens' because
the display block used ModelInfo.context_window directly from models.dev.
Codex OAuth actually enforces 272K for the same slug, and the agent's
compressor already runs at 272K via get_model_context_length() — so the
banner + real context budget said 272K while /model lied with 1M.

Route the display context through a new resolve_display_context_length()
helper that always prefers agent.model_metadata.get_model_context_length
(which knows about Codex OAuth, Copilot, Nous caps) and only falls back
to models.dev when that returns nothing.

Fix applied to all 3 /model display sites:
  cli.py _handle_model_switch
  gateway/run.py picker on_model_selected callback
  gateway/run.py text-fallback confirmation

Reported by @emilstridell (Telegram, April 2026).
2026-04-24 17:21:38 -07:00
Teknium 13038dc747 fix(skills): ship google-workspace deps as [google] extra; make setup.py 3.9-parseable
Closes #13626.

Two follow-ups on top of the _hermes_home helper from @jerome-benoit's #12729:

1. Declare a [google] optional extra in pyproject.toml
   (google-api-python-client, google-auth-oauthlib, google-auth-httplib2) and
   include it in [all]. Packagers (Nix flake, Homebrew) now ship the deps by
   default, so `setup.py --check` does not need to shell out to pip at
   runtime — the imports succeed and install_deps() is never reached.
   This fixes the Nix breakage where pip/ensurepip are stripped.

2. Add `from __future__ import annotations` to setup.py so the PEP 604
   `str | None` annotation parses on Python 3.9 (macOS system python).
   Previously system python3 SyntaxError'd before any code ran.

install_deps() error message now also points users at the extra instead of
just the raw pip command.
2026-04-24 16:45:27 -07:00
Teknium 629e108ee2 chore(release): map jerome.benoit@sap.com to jerome-benoit 2026-04-24 16:45:27 -07:00
Jérôme Benoit c34d3f4807 fix(skills): factor HERMES_HOME resolution into shared _hermes_home helper
The three google-workspace scripts (setup.py, google_api.py, gws_bridge.py)
each had their own way of resolving HERMES_HOME:

- setup.py imported hermes_constants (crashes outside Hermes process)
- google_api.py used os.getenv inline (no strip, no empty handling)
- gws_bridge.py defined its own local get_hermes_home() (duplicate)

Extract the common logic into _hermes_home.py which:
- Delegates to hermes_constants when available (profile support, etc.)
- Falls back to os.getenv with .strip() + empty-as-unset handling
- Provides display_hermes_home() with ~/ shortening for profiles

All three scripts now import from _hermes_home instead of duplicating.

7 regression tests cover the fallback path: env var override, default
~/.hermes, empty env var, display shortening, profile paths, and
custom non-home paths.

Closes #12722
2026-04-24 16:45:27 -07:00
Teknium f14264c438 chore(release): map simbamax99@gmail.com to @simbam99 2026-04-24 16:42:31 -07:00
simbam99 19a3e2ce8e fix(gateway): follow compression continuations during /resume 2026-04-24 16:42:31 -07:00
Teknium d58b305adf refactor(deepseek-reasoning): consolidate detection into helpers + regression tests
Extracts _needs_kimi_tool_reasoning() for symmetry with the existing
_needs_deepseek_tool_reasoning() helper, so _copy_reasoning_content_for_api
uses the same detection logic as _build_assistant_message. Future changes
to either provider's signals now only touch one function.

Adds tests/run_agent/test_deepseek_reasoning_content_echo.py covering:
- All 3 DeepSeek detection signals (provider, model, host)
- Poisoned history replay (empty string fallback)
- Plain assistant turns NOT padded
- Explicit reasoning_content preserved
- Reasoning field promoted to reasoning_content
- Existing Kimi/Moonshot detection intact
- Non-thinking providers left alone

21 tests, all pass.
2026-04-24 16:38:29 -07:00
Teknium e93cc934c7 chore(release): map chenzeshi@live.com -> chen1749144759 in AUTHOR_MAP 2026-04-24 16:38:29 -07:00
chen1749144759 93a2d6b307 fix: add DeepSeek reasoning_content echo for tool-call messages
DeepSeek V4 thinking mode requires reasoning_content on every
assistant message that includes tool_calls. When this field is
missing from persisted history, replaying the session causes
HTTP 400: 'The reasoning_content in the thinking mode must be
passed back to the API.'

Two-part fix (refs #15250):

1. _copy_reasoning_content_for_api: Merge the Kimi-only and
   DeepSeek detection into a single needs_tool_reasoning_echo
   check. This handles already-poisoned persisted sessions by
   injecting an empty reasoning_content on replay.

2. _build_assistant_message: Store reasoning_content='' on new
   DeepSeek tool-call messages at creation time, preventing
   future session poisoning at the source.

Additional fix:
3. _handle_max_iterations: Add missing call to
   _copy_reasoning_content_for_api in the max-iterations flush
   path (previously only main loop and flush_memories had it).

Detection covers:
- provider == 'deepseek'
- model name containing 'deepseek' (case-insensitive)
- base URL matching api.deepseek.com (for custom provider)
2026-04-24 16:38:29 -07:00
Teknium 4fade39c90 chore(release): map benjaminsehl noreply email in AUTHOR_MAP 2026-04-24 16:04:37 -07:00
Benjamin Sehl f731c2c2bd fix(gateway/bluebubbles): align iMessage delivery with non-editable UX 2026-04-24 16:04:37 -07:00
Brian D. Evans 00c3d848d8 fix(memory): skip external-provider sync on interrupted turns (#15218)
``run_conversation`` was calling ``memory_manager.sync_all(
original_user_message, final_response)`` at the end of every turn
where both args were present.  That gate didn't consider the
``interrupted`` local flag, so an external memory backend received
partial assistant output, aborted tool chains, or mid-stream resets as
durable conversational truth.  Downstream recall then treated the
not-yet-real state as if the user had seen it complete, poisoning the
trust boundary between "what the user took away from the turn" and
"what Hermes was in the middle of producing when the interrupt hit".

Extracted the inline sync block into a new private method
``AIAgent._sync_external_memory_for_turn(original_user_message,
final_response, interrupted)`` so the interrupt guard is a single
visible check at the top of the method instead of hidden in a
boolean-and at the call site.  That also gives tests a clean seam to
assert on — the pre-fix layout buried the logic inside the 3,000-line
``run_conversation`` function where no focused test could reach it.

The new method encodes three independent skip conditions:

  1. ``interrupted`` → skip entirely (the #15218 fix).  Applies even
     when ``final_response`` and ``original_user_message`` happen to
     be populated — an interrupt may have landed between a streamed
     reply and the next tool call, so the strings on disk are not
     actually the turn the user took away.
  2. No memory manager / no final_response / no user message →
     preserve existing skip behaviour (nothing new for providerless
     sessions, system-initiated refreshes, tool-only turns that never
     resolved, etc.).
  3. Sync_all / queue_prefetch_all exceptions → swallow.  External
     memory providers are strictly best-effort; a misconfigured or
     offline backend must never block the user from seeing their
     response.

The prefetch side-effect is gated on the same interrupt flag: the
user's next message is almost certainly a retry of the same intent,
and a prefetch keyed on the interrupted turn would fire against stale
context.

### Tests (16 new, all passing on py3.11 venv)

``tests/run_agent/test_memory_sync_interrupted.py`` exercises the
helper directly on a bare ``AIAgent`` (``__new__`` pattern that the
interrupt-propagation tests already use).  Coverage:

- Interrupted turn with full-looking response → no sync (the fix)
- Interrupted turn with long assistant output → no sync (the interrupt
  could have landed mid-stream; strings-on-disk lie)
- Normal completed turn → sync_all + queue_prefetch_all both called
  with the right args (regression guard for the positive path)
- No final_response / no user_message / no memory manager → existing
  pre-fix skip paths still apply
- sync_all raises → exception swallowed, prefetch still attempted
- queue_prefetch_all raises → exception swallowed after sync succeeded
- 8-case parametrised matrix across (interrupted × final_response ×
  original_user_message) asserts sync fires iff interrupted=False AND
  both strings are non-empty

Closes #15218

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:30:18 -07:00
Yukipukii1 fd10463069 fix(env): safely quote ~/ subpaths in wrapped cd commands 2026-04-24 15:25:12 -07:00
sprmn24 c599a41b84 fix(auth): preserve corrupt auth.json and warn instead of silently resetting
_load_auth_store() caught all parse/read exceptions and silently
returned an empty store, making corruption look like a logout with
no diagnostic information and no way to recover the original file.

Now copies the corrupt file to auth.json.corrupt before resetting,
and logs a warning with the exception and backup path.
2026-04-24 15:22:44 -07:00
Teknium c7d62b3fe3 chore(release): map ebukau84@gmail.com -> UgwujaGeorge in AUTHOR_MAP 2026-04-24 15:22:19 -07:00
Teknium 36d68bcb82 fix(api-server): persist incomplete snapshot on asyncio.CancelledError too
Extends PR #15171 to also cover the server-side cancellation path (aiohttp
shutdown, request-level timeout) — previously only ConnectionResetError
triggered the incomplete-snapshot write, so cancellations left the store
stuck at the in_progress snapshot written on response.created.

Factors the incomplete-snapshot build into a _persist_incomplete_if_needed()
helper called from both the ConnectionResetError and CancelledError
branches; the CancelledError handler re-raises so cooperative cancellation
semantics are preserved.

Adds two regression tests that drive _write_sse_responses directly (the
TestClient disconnect path races the server handler, which makes the
end-to-end assertion flaky).
2026-04-24 15:22:19 -07:00
UgwujaGeorge a29bad2a3c fix(api-server): persist response snapshot on client disconnect when store=True 2026-04-24 15:22:19 -07:00
sprmn24 7957da7a1d fix(web_server): hold _oauth_sessions_lock during PKCE session state writes
_submit_anthropic_pkce() retrieved sess under _oauth_sessions_lock but
wrote back to sess["status"] and sess["error_message"] outside the lock.
A concurrent session GC or cancel could race with these writes, producing
inconsistent session state.

Wrap all 4 sess write sites in _oauth_sessions_lock:
- network exception path (Token exchange failed)
- missing access_token path
- credential save failure path
- success path (approved)
2026-04-24 15:22:04 -07:00
Cyprian Kowalczyk fd3864d8bd feat(cli): wrap /compress in _busy_command to block input during compression
Before this, typing during /compress was accepted by the classic CLI
prompt and landed in the next prompt after compression finished,
effectively consuming a keystroke for a prompt that was about to be
replaced. Wrapping the body in self._busy_command('Compressing
context...') blocks input rendering for the duration, matching the
pattern /skills install and other slow commands already use.

Salvages the useful part of #10303 (@iRonin). The `_compressing` flag
added to run_agent.py in the original PR was dead code (set in 3 spots,
read nowhere — not by cli.py, not by run_agent.py, not by the Ink TUI
which doesn't use _busy_command at all) and was dropped.
2026-04-24 15:21:22 -07:00
Yukipukii1 8ea389a7f8 fix(gateway/config): coerce quoted boolean values in config parsing 2026-04-24 15:20:05 -07:00
knockyai 3e6c108565 fix(gateway): honor queue mode in runner PRIORITY interrupt path
When display.busy_input_mode is 'queue', the runner-level PRIORITY block
in _handle_message was still calling running_agent.interrupt() for every
text follow-up to an active session. The adapter-level busy handler
already honors queue mode (commit 9d147f7fd), but this runner-level path
was an unconditional interrupt regardless of config.

Adds a queue-mode branch that queues the follow-up via
_queue_or_replace_pending_event() and returns without interrupting.

Salvages the useful part of #12070 (@knockyai). The config fan-out to
per-platform extra was redundant — runner already loads busy_input_mode
directly via _load_busy_input_mode().
2026-04-24 15:18:34 -07:00
Teknium e3a1a9c24d chore(release): map julia@alexland.us -> alexg0bot in AUTHOR_MAP (#15384) 2026-04-24 15:18:09 -07:00
Teknium e3697e20a6 chore(release): map iRonin personal email to GitHub login 2026-04-24 15:17:09 -07:00
Teknium ed91b79b7e fix(cli): keep Ctrl+D no-op when only attachments pending
Follow-up to @iRonin's Ctrl+D EOF fix. If the input text is empty but
the user has pending attached images, do nothing rather than exiting —
otherwise a stray Ctrl+D silently discards the attachments.
2026-04-24 15:17:09 -07:00
CK iRonin.IT 08d5c9c539 fix: Ctrl+D deletes char under cursor, only exits on empty input (bash/zsh behaviour) 2026-04-24 15:17:09 -07:00
Julia Bennet 1dcf79a864 feat: add slash command for busy input mode 2026-04-24 15:15:26 -07:00
teknium1 2de8a7a229 fix(skills): drop raw_content to avoid doubling skill payload
skill_view response went to the model verbatim; duplicating the SKILL.md
body as raw_content on every tool call added token cost with no agent-facing
benefit. Remove the field and update tests to assert on content only.

The slash/preload caller (agent/skill_commands.py) already falls back to
content when raw_content is absent, and it calls skill_view(preprocess=False)
anyway, so content is already unrendered on that path.
2026-04-24 15:15:07 -07:00
helix4u ead66f0c92 fix(skills): apply inline shell in skill_view 2026-04-24 15:15:07 -07:00
Allard 0bcbc9e316 docs(faq): Update docs on backups
- update faq answer with new `backup` command in release 0.9.0
- move profile export section together with backup section so related information can be read more easily
- add table comparison between `profile export` and `backup` to assist users if understanding the nuances between both
2026-04-24 15:14:08 -07:00
Teknium 2d444fc84d fix(run_agent): handle unescaped control chars in tool_call arguments (#15356)
Extends _repair_tool_call_arguments() to cover the most common local-model
JSON corruption pattern: llama.cpp/Ollama backends emit literal tabs and
newlines inside JSON string values (memory save summaries, file contents,
etc.). Previously fell through to '{}' replacement, losing the call.

Adds two repair passes:
  - Pass 0: json.loads(strict=False) + re-serialise to canonical wire form
  - Pass 4: escape 0x00-0x1F control chars inside string values, then retry

Ports the core utility from #12068 / PR #12093 without the larger plumbing
change (that PR also replaced json.loads at 8 call sites; current main's
_repair_tool_call_arguments is already the single chokepoint, so the
upgrade happens transparently for every existing caller).

Credit: @truenorth-lj for the original utility design.

4 new regression tests covering literal newlines, tabs, re-serialisation
to strict=True-valid output, and the trailing-comma + control-char
combination case.
2026-04-24 15:06:41 -07:00
Teknium bb53d79d26 chore(release): map q19dcp@gmail.com -> aj-nt in AUTHOR_MAP 2026-04-24 15:03:07 -07:00
AJ 17fc84c256 fix: repair malformed tool call args in streaming assembly before flagging as truncated
When the streaming path (chat completions) assembled tool call deltas and
detected malformed JSON arguments, it set has_truncated_tool_args=True but
passed the broken args through unchanged. This triggered the truncation
handler which returned a partial result and killed the session (/new required).

_many_ malformations are repairable: trailing commas, unclosed brackets,
Python None, empty strings. _repair_tool_call_arguments() already existed
for the pre-API-request path but wasn't called during streaming assembly.

Now when JSON parsing fails during streaming assembly, we attempt repair
via _repair_tool_call_arguments() before flagging as truncated. If repair
succeeds (returns valid JSON), the tool call proceeds normally. Only truly
unrepairable args fall through to the truncation handler.

This prevents the most common session-killing failure mode for models like
GLM-5.1 that produce trailing commas or unclosed brackets.

Tests: 12 new streaming assembly repair tests, all 29 existing repair
tests still passing.
2026-04-24 15:03:07 -07:00
Teknium b7c1d77e55 fix(dashboard): remove unimplemented 'block' busy_input_mode option
The web UI schema advertised 'block' as a busy_input_mode choice, but
no implementation ever existed — the gateway and CLI both silently
collapsed 'block' (and anything other than 'queue') to 'interrupt'.
Users who picked 'block' in the dashboard got interrupts anyway.

Drop 'block' from the select options. The two supported modes are
'interrupt' (default) and 'queue'.
2026-04-24 15:01:38 -07:00
luyao618 7a192b124e fix(run_agent): repair corrupted tool_call arguments before sending to provider
When a session is split by context compression mid-tool-call, an assistant
message may end up with truncated/invalid JSON in tool_calls[*].function.arguments.
On the next turn this is replayed verbatim and providers reject the entire request
with HTTP 400 invalid_tool_call_format, bricking the conversation in a loop that
cannot recover without manual session quarantine.

This patch adds a defensive sanitizer that runs immediately before
client.chat.completions.create() in AIAgent.run_conversation():

- Validates each assistant tool_calls[*].function.arguments via json.loads
- Replaces invalid/empty arguments with '{}'
- Injects a synthetic tool response (or prepends a marker to the existing one)
  so downstream messages keep valid tool_call_id pairing
- Logs each repair with session_id / message_index / preview for observability

Defense in depth: corruption can originate from compression splits, manual edits,
or plugin bugs. Sanitizing at the send chokepoint catches all sources.

Adds 7 unit tests covering: truncated JSON, empty string, None, non-string args,
existing matching tool response (no duplicate injection), non-assistant messages
ignored, multiple repairs.

Fixes #15236
2026-04-24 14:55:47 -07:00
helix4u 0738b80833 fix(tui): rebuild when ink bundle is missing 2026-04-24 15:51:38 -06:00
Teknium 4093ee9c62 fix(codex): detect leaked tool-call text in assistant content (#15347)
gpt-5.x on the Codex Responses API sometimes degenerates and emits
Harmony-style `to=functions.<name> {json}` serialization as plain
assistant-message text instead of a structured `function_call` item.
The intent never makes it into `response.output` as a function_call,
so `tool_calls` is empty and `_normalize_codex_response()` returns
the leaked text as the final content. Downstream (e.g. delegate_task),
this surfaces as a confident-looking summary with `tool_trace: []`
because no tools actually ran — the Taiwan-embassy-email bug report.

Detect the pattern, scrub the content, and return finish_reason=
'incomplete' so the existing Codex-incomplete continuation path
(run_agent.py:11331, 3 retries) gets a chance to re-elicit a proper
function_call item. Encrypted reasoning items are preserved so the
model keeps its chain-of-thought on the retry.

Regression tests: leaked text triggers incomplete, real tool calls
alongside leak-looking text are preserved, clean responses pass
through unchanged.

Reported on Discord (gpt-5.4 / openai-codex).
2026-04-24 14:39:59 -07:00
helix4u 6a957a74bc fix(memory): add write origin metadata 2026-04-24 14:37:55 -07:00
Teknium 14b27bb68c chore(release): map @tochukwuada in AUTHOR_MAP
Contributor email for PR #15161 salvage (debthemelon
<thomasgeorgevii09@gmail.com>).
2026-04-24 14:32:21 -07:00
Teknium ef9355455b test: regression coverage for checkpoint dedup and inf/nan coercion
Covers the two bugs salvaged from PR #15161:

- test_batch_runner_checkpoint: TestFinalCheckpointNoDuplicates asserts
  the final aggregated completed_prompts list has no duplicate indices,
  and keeps a sanity anchor test documenting the pre-fix pattern so a
  future refactor that re-introduces it is caught immediately.

- test_model_tools: TestCoerceNumberInfNan asserts _coerce_number
  returns the original string for inf/-inf/nan/Infinity inputs and that
  the result round-trips through strict (allow_nan=False) json.dumps.
2026-04-24 14:32:21 -07:00
debthemelon dbdefa43c8 fix: eliminate duplicate checkpoint entries and JSON-unsafe coercion
batch_runner: completed_prompts_set is already fully populated by the
time the aggregation loop runs (incremental updates happen at result
collection time), so the subsequent extend() call re-added every
completed prompt index a second time. Removed the redundant variable
and extend, and write sorted(completed_prompts_set) directly to the
final checkpoint instead.

model_tools: _coerce_number returned Python float('inf')/float('nan')
for inf/nan strings rather than the original string. json.dumps raises
ValueError for these values, so any tool call where the model emitted
"inf" or "nan" for a numeric parameter would crash at serialization.
Changed the guard to return the original string, matching the
function's documented "returns original string on failure" contract.
2026-04-24 14:32:21 -07:00
Teknium db9d6375fb feat(models): add openai/gpt-5.5 and gpt-5.5-pro to OpenRouter + Nous Portal (#15343)
Replaces gpt-5.4 / gpt-5.4-pro entries in the OpenRouter fallback snapshot
and the Nous Portal curated list. Other aggregators (Vercel AI Gateway)
and provider-native lists are unchanged.
2026-04-24 14:31:47 -07:00
helix4u 8a2506af43 fix(aux): surface auxiliary failures in UI 2026-04-24 14:31:21 -07:00
helix4u e7590f92a2 fix(telegram): honor no_proxy for explicit proxy setup 2026-04-24 14:31:04 -07:00
brooklyn! a5129c72ef Merge pull request #15337 from NousResearch/bb/tui-kawaii-default-off
fix(tui): keep default personality neutral
2026-04-24 16:23:00 -05:00
Brooklyn Nicholson 53fc10fc9a fix(tui): keep default personality neutral 2026-04-24 16:19:23 -05:00
brooklyn! 93ddff53e3 Merge pull request #15321 from NousResearch/bb/tui-inline-diff-tooltrail-order
fix(tui): render tool trail before anchored inline diffs
2026-04-24 15:20:42 -05:00
Brooklyn Nicholson de596aca1c fix(tui): render tool trail before anchored inline diffs
Inline diff segments were anchored relative to assistant narration, but the
turn details pane still rendered after streamSegments. On completion that put
the diff before the tool telemetry that produced it. When a turn has anchored
diff segments, commit the accumulated thinking/tool trail as a pre-diff trail
message, then render the diff and final summary.
2026-04-24 15:07:02 -05:00
brooklyn! 6f1eed3968 Merge pull request #15274 from NousResearch/bb/tui-null-config-guard
fix(tui): tolerate + warn on null sections in config.yaml
2026-04-24 13:02:12 -05:00
Brooklyn Nicholson e3940f9807 fix(tui): guard personality overlay when personalities is null
TUI auto-resolves `display.personality` at session init, unlike the base CLI.
If config contains `agent.personalities: null`, `_resolve_personality_prompt`
called `.get()` on None and failed before model/provider selection.
Normalize null personalities to `{}` and surface a targeted config warning.
2026-04-24 12:57:51 -05:00
Brooklyn Nicholson bfa60234c8 feat(tui): warn on bare null sections in config.yaml
Tolerating null top-level keys silently drops user settings (e.g.
`agent.system_prompt` next to a bare `agent:` line is gone). Probe at
session create, log via `logger.warning`, and surface in the boot info
under `config_warning` — rendered in the TUI feed alongside the existing
`credential_warning` banner.
2026-04-24 12:49:02 -05:00
Brooklyn Nicholson fd9b692d33 fix(tui): tolerate null top-level sections in config.yaml
YAML parses bare keys like `agent:` or `display:` as None. `dict.get(key, {})`
returns that None instead of the default (defaults only fire on missing keys),
so every `cfg.get("agent", {}).get(...)` chain in tui_gateway/server.py
crashed agent init with `'NoneType' object has no attribute 'get'`.

Guard all 21 sites with `(cfg.get(X) or {})`. Regression test covers the
null-section init path reported on Twitter against the new TUI.
2026-04-24 12:43:09 -05:00
Austin Pickett c61547c067 Merge pull request #14890 from NousResearch/bb/tui-web-chat-unified
feat(web): dashboard Chat tab — xterm.js + JSON-RPC sidecar (supersedes #12710 + #13379)
2026-04-24 10:35:43 -07:00
brooklyn! 7f0f67d5f7 Merge pull request #15266 from NousResearch/bb/fix-tui-section-toggle
fix(tui): chevrons re-toggle even when section default is expanded
2026-04-24 12:24:27 -05:00
Brooklyn Nicholson f5e2a77a80 fix(tui): chevrons re-toggle even when section default is expanded
Recovers the manual click on the details accordion: with #14968's new
SECTION_DEFAULTS (thinking/tools start `expanded`), every panel render
was OR-ing the local open toggle against `visible.X === 'expanded'`.
That pinned `open=true` for the default-expanded sections, so clicking
the chevron flipped the local state but the panel never collapsed.

Local toggle is now the sole source of truth at render time; the
useState init still seeds from the resolved visibility (so first paint
is correct) and the existing useEffect still re-syncs when the user
mutates visibility at runtime via `/details`.

Same OR-lock cleared inside SubagentAccordion (`showChildren ||
openX`) — pre-existing but the same shape, so expand-all on the
spawn tree no longer makes inner sections un-collapsible either.
2026-04-24 12:22:20 -05:00
Austin Pickett 850fac14e3 chore: address copilot comments 2026-04-24 12:51:04 -04:00
Austin Pickett 5500b51800 chore: fix lint 2026-04-24 12:32:10 -04:00
Austin Pickett 63975aa75b fix: mobile chat in new layout 2026-04-24 12:07:46 -04:00
Teknium 62c14d5513 refactor(gateway): extract WhatsApp identity helpers into shared module
Follow-up to the canonical-identity session-key fix: pull the
JID/LID normalize/expand/canonical helpers into gateway/whatsapp_identity.py
instead of living in two places. gateway/session.py (session-key build) and
gateway/run.py (authorisation allowlist) now both import from the shared
module, so the two resolution paths can't drift apart.

Also switches the auth path from module-level _hermes_home (cached at
import time) to dynamic get_hermes_home() lookup, which matches the
session-key path and correctly reflects HERMES_HOME env overrides. The
lone test that monkeypatched gateway.run._hermes_home for the WhatsApp
auth path is updated to set HERMES_HOME env var instead; all other
tests that monkeypatch _hermes_home for unrelated paths (update,
restart drain, shutdown marker, etc.) still work — the module-level
_hermes_home is untouched.
2026-04-24 07:55:55 -07:00
Keira Voss 10deb1b87d fix(gateway): canonicalize WhatsApp identity in session keys
Hermes' WhatsApp bridge routinely surfaces the same person under either
a phone-format JID (60123456789@s.whatsapp.net) or a LID (…@lid),
and may flip between the two for a single human within the same
conversation. Before this change, build_session_key used the raw
identifier verbatim, so the bridge reshuffling an alias form produced
two distinct session keys for the same person — in two places:

  1. DM chat_id — a user's DM sessions split in half, transcripts and
     per-sender state diverge.
  2. Group participant_id (with group_sessions_per_user enabled) — a
     member's per-user session inside a group splits in half for the
     same reason.

Add a canonicalizer that walks the bridge's lid-mapping-*.json files
and picks the shortest/numeric-preferred alias as the stable identity.
build_session_key now routes both the DM chat_id and the group
participant_id through this helper when the platform is WhatsApp.
All other platforms and chat types are untouched.

Expose canonical_whatsapp_identifier and normalize_whatsapp_identifier
as public helpers. Plugins that need per-sender behaviour (role-based
routing, per-contact authorization, policy gating) need the same
identity resolution Hermes uses internally; without a public helper,
each plugin would have to re-implement the walker against the bridge's
internal on-disk format. Keeping this alongside build_session_key
makes it authoritative and one refactor away if the bridge ever
changes shape.

_expand_whatsapp_aliases stays private — it's an implementation detail
of how the mapping files are walked, not a contract callers should
depend on.
2026-04-24 07:55:55 -07:00
emozilla f49afd3122 feat(web): add /api/pty WebSocket bridge to embed TUI in dashboard
Exposes hermes --tui over a PTY-backed WebSocket so the dashboard can
embed the real TUI rather than reimplement its surface. The browser
attaches xterm.js to the socket; keystrokes flow in, PTY output bytes
flow out.

Architecture:

    browser <Terminal> (xterm.js)
           │  onData ───► ws.send(keystrokes)
           │  onResize ► ws.send('\x1b[RESIZE:cols;rows]')
           │  write   ◄── ws.onmessage (PTY bytes)
           ▼
    FastAPI /api/pty (token-gated, loopback-only)
           ▼
    PtyBridge (ptyprocess) ── spawns node ui-tui/dist/entry.js ──► tui_gateway + AIAgent

Components
----------

hermes_cli/pty_bridge.py
  Thin wrapper around ptyprocess.PtyProcess: byte-safe read/write on the
  master fd via os.read/os.write (not PtyProcessUnicode — ANSI is
  inherently byte-oriented and UTF-8 boundaries may land mid-read),
  non-blocking select-based reads, TIOCSWINSZ resize, idempotent
  SIGHUP→SIGTERM→SIGKILL teardown, platform guard (POSIX-only; Windows
  is WSL-supported only).

hermes_cli/web_server.py
  @app.websocket("/api/pty") endpoint gated by the existing
  _SESSION_TOKEN (via ?token= query param since browsers can't set
  Authorization on WS upgrades). Loopback-only enforcement. Reader task
  uses run_in_executor to pump PTY bytes without blocking the event
  loop. Writer loop intercepts a custom \x1b[RESIZE:cols;rows] escape
  before forwarding to the PTY. The endpoint resolves the TUI argv
  through a _resolve_chat_argv hook so tests can inject fake commands
  without building the real TUI.

Tests
-----

tests/hermes_cli/test_pty_bridge.py — 12 unit tests: spawn, stdout,
stdin round-trip, EOF, resize (via TIOCSWINSZ + tput readback), close
idempotency, cwd, env forwarding, unavailable-platform error.

tests/hermes_cli/test_web_server.py — TestPtyWebSocket adds 7 tests:
missing/bad token rejection (close code 4401), stdout streaming,
stdin round-trip, resize escape forwarding, unavailable-platform ANSI
error frame + 1011 close, resume parameter forwarding to argv.

96 tests pass under scripts/run_tests.sh.

(cherry picked from commit 29b337bca7)

feat(web): add Chat tab with xterm.js terminal + Sessions resume button

(cherry picked from commit 3d21aee8 by emozilla, conflicts resolved
 against current main: BUILTIN_ROUTES table + plugin slot layout)

fix(tui): replace OSC 52 jargon in /copy confirmation

When the user ran /copy successfully, Ink confirmed with:

  sent OSC52 copy sequence (terminal support required)

That reads like a protocol spec to everyone who isn't a terminal
implementer. The caveat was a historical artifact — OSC 52 wasn't
universally supported when this message was written, so the TUI
honestly couldn't guarantee the copy had landed anywhere.

Today every modern terminal (including the dashboard's embedded
xterm.js) handles OSC 52 reliably. Say what the user actually wants
to know — that it copied, and how much — matching the message the
TUI already uses for selection copy:

  copied 1482 chars

(cherry picked from commit a0701b1d5a)

docs: document the dashboard Chat tab

AGENTS.md — new subsection under TUI Architecture explaining that the
dashboard embeds the real hermes --tui rather than rewriting it,
with pointers to the pty_bridge + WebSocket endpoint and the rule
'never add a parallel chat surface in React.'

website/docs/user-guide/features/web-dashboard.md — user-facing Chat
section inside the existing Web Dashboard page, covering how it works
(WebSocket + PTY + xterm.js), the Sessions-page resume flow, and
prerequisites (Node.js, ptyprocess, POSIX kernel / WSL on Windows).

(cherry picked from commit 2c2e32cc45)

feat(tui-gateway): transport-aware dispatch + WebSocket sidecar

Decouples the JSON-RPC dispatcher from its I/O sink so the same handler
surface can drive multiple transports concurrently. The PTY chat tab
already speaks to the TUI binary as bytes — this adds a structured
event channel alongside it for dashboard-side React widgets that need
typed events (tool.start/complete, model picker state, slash catalog)
that PTY can't surface.

- `tui_gateway/transport.py` — `Transport` protocol + `contextvars` binding
  + module-level `StdioTransport` fallback. The stdio stream resolves
  through a lambda so existing tests that monkey-patch `_real_stdout`
  keep passing without modification.
- `tui_gateway/ws.py` — WebSocket transport implementation; FastAPI
  endpoint mounting lives in hermes_cli/web_server.py.
- `tui_gateway/server.py`:
  - `write_json` routes via session transport (for async events) →
    contextvar transport (for in-request writes) → stdio fallback.
  - `dispatch(req, transport=None)` binds the transport for the request
    lifetime and propagates it to pool workers via `contextvars.copy_context`
    so async handlers don't lose their sink.
  - `_init_session` and the manual-session create path stash the
    request's transport so out-of-band events (subagent.complete, etc.)
    fan out to the right peer.

`tui_gateway.entry` (Ink's stdio handshake) is unchanged externally —
it falls through every precedence step into the stdio fallback, byte-
identical to the previous behaviour.

feat(web): ChatSidebar — JSON-RPC sidecar next to xterm.js terminal

Composes the two transports into a single Chat tab:

  ┌─────────────────────────────────────────┬──────────────┐
  │  xterm.js / PTY  (emozilla #13379)      │ ChatSidebar  │
  │  the literal hermes --tui process       │  /api/ws     │
  └─────────────────────────────────────────┴──────────────┘
        terminal bytes                          structured events

The terminal pane stays the canonical chat surface — full TUI fidelity,
slash commands, model picker, mouse, skin engine, wide chars all paint
inside the terminal. The sidebar opens a parallel JSON-RPC WebSocket
to the same gateway and renders metadata that PTY can't surface to
React chrome:

  • model + provider badge with connection state (click → switch)
  • running tool-call list (driven by tool.start / tool.progress /
    tool.complete events)
  • model picker dialog (gateway-driven, reuses ModelPickerDialog)

The sidecar is best-effort. If the WS can't connect (older gateway,
network hiccup, missing token) the terminal pane keeps working
unimpaired — sidebar just shows the connection-state badge in the
appropriate tone.

- `web/src/components/ChatSidebar.tsx` — new component (~270 lines).
  Owns its GatewayClient, drives the model picker through
  `slash.exec`, fans tool events into a capped tool list.
- `web/src/pages/ChatPage.tsx` — split layout: terminal pane
  (`flex-1`) + sidebar (`w-80`, `lg+` only).
- `hermes_cli/web_server.py` — mount `/api/ws` (token + loopback
  guards mirror /api/pty), delegate to `tui_gateway.ws.handle_ws`.

Co-authored-by: emozilla <emozilla@nousresearch.com>

refactor(web): /clean pass on ChatSidebar + ChatPage lint debt

- ChatSidebar: lift gw out of useRef into a useMemo derived from a
  reconnect counter. React 19's react-hooks/refs and react-hooks/
  set-state-in-effect rules both fire when you touch a ref during
  render or call setState from inside a useEffect body. The
  counter-derived gw is the canonical pattern for "external resource
  that needs to be replaceable on user action" — re-creating the
  client comes from bumping `version`, the effect just wires + tears
  down. Drops the imperative `gwRef.current = …` reassign in
  reconnect, drops the truthy ref guard in JSX. modelLabel +
  banner inlined as derived locals (one-off useMemo was overkill).
- ChatPage: lazy-init the banner state from the missing-token check
  so the effect body doesn't have to setState on first run. Drops
  the unused react-hooks/exhaustive-deps eslint-disable. Adds a
  scoped no-control-regex disable on the SGR mouse parser regex
  (the \\x1b is intentional for xterm escape sequences).

All my-touched files now lint clean. Remaining warnings on web/
belong to pre-existing files this PR doesn't touch.

Verified: vitest 249/249, ui-tui eslint clean, web tsc clean,
python imports clean.

chore: uptick

fix(web): drop ChatSidebar tool list — events can't cross PTY/WS boundary

The /api/pty endpoint spawns `hermes --tui` as a child process with its
own tui_gateway and _sessions dict; /api/ws runs handle_ws in-process in
the dashboard server with a separate _sessions dict. Tool events fire on
the child's gateway and never reach the WS sidecar, so the sidebar's
tool.start/progress/complete listeners always observed an empty list.

Drop the misleading list (and the now-orphaned ToolCall primitive),
keep model badge + connection state + model picker + error banner —
those work because they're sidecar-local concerns. Surfacing tool calls
in the sidebar requires cross-process forwarding (PTY child opens a
back-WS to the dashboard, gateway tees emits onto stdio + sidecar
transport) — proper feature for a follow-up.

feat(web): wire ChatSidebar tool list to PTY child via /api/pub broadcast

The dashboard's /api/pty spawns hermes --tui as a child process; tool
events fire in the python tui_gateway grandchild and never crossed the
process boundary into the in-process WS sidecar — so the sidebar tool
list was always empty.

Cross-process forwarding:

- tui_gateway: TeeTransport (transport.py) + WsPublisherTransport
  (event_publisher.py, sync websockets client). entry.py installs the
  tee on _stdio_transport when HERMES_TUI_SIDECAR_URL is set, mirroring
  every dispatcher emit to a back-WS without disturbing Ink's stdio
  handshake.

- hermes_cli/web_server.py: new /api/pub (publisher) + /api/events
  (subscriber) endpoints with a per-channel registry. /api/pty now
  accepts ?channel= and propagates the sidecar URL via env. start_server
  also stashes app.state.bound_port so the URL is constructable.

- web/src/pages/ChatPage.tsx: generates a channel UUID per mount,
  passes it to /api/pty and as a prop to ChatSidebar.

- web/src/components/ChatSidebar.tsx: opens /api/events?channel=, fans
  tool.start/progress/complete back into the ToolCall list. Restores
  the ToolCall primitive.

Tests: 4 new TestPtyWebSocket cases cover channel propagation,
broadcast fan-out, and missing-channel rejection (10 PTY tests pass,
120 web_server tests overall).

fix(web): address Copilot review on #14890

Five threads, all real:

- gatewayClient.ts: register `message`/`close` listeners BEFORE awaiting
  the open handshake.  Server emits `gateway.ready` immediately after
  accept, so a listener attached after the open promise could race past
  the initial skin payload and lose it.

- ChatSidebar.tsx: wire `error`/`close` on the /api/events subscriber
  WS into the existing error banner.  4401/4403 (auth/loopback reject)
  surface as a "reload the page" message; mid-stream drops surface as
  "events feed disconnected" with the existing reconnect button.  Clean
  unmount closes (1000/1001) stay silent.

- web-dashboard.md: install hint was `pip install hermes-agent[web]` but
  ptyprocess lives in the `pty` extra, not `web`.  Switch to
  `hermes-agent[web,pty]` in both prerequisite blocks.

- AGENTS.md: previous "never add a parallel React chat surface" guidance
  was overbroad and contradicted this PR's sidebar.  Tightened to forbid
  re-implementing the transcript/composer/PTY terminal while explicitly
  allowing structured supporting widgets (sidebar / model picker /
  inspectors), matching the actual architecture.

- web/package-lock.json: regenerated cleanly so the wterm sibling
  workspace paths (extraneous machine-local entries) stop polluting CI.

Tests: 249/249 vitest, 10/10 PTY/events, web tsc clean.

refactor(web): /clean pass on ChatSidebar events handler

Spotted in the round-2 review:

- Banner flashed on clean unmount: `ws.close()` from the effect cleanup
  fires `close` with code 1005, opened=true, neither 1000 nor 1001 —
  hit the "unexpected drop" branch.  Track `unmounting` in the effect
  scope and gate the banner through a `surface()` helper so cleanup
  closes stay silent.

- DRY the duplicated "events feed disconnected" string into a local
  const used by both the error and close handlers.

- Drop the `opened` flag (no longer needed once the unmount guard is
  the source of truth for "is this an expected close?").
2026-04-24 10:51:49 -04:00
Austin Pickett 1143f234e3 Merge pull request #14899 from NousResearch/feat/dashboard-layout
Feat/dashboard layout
2026-04-24 07:48:31 -07:00
Teknium c4627f4933 chore(release): map Group G contributors in AUTHOR_MAP 2026-04-24 07:26:07 -07:00
bsgdigital 7c3e5706d8 fix(bedrock): Bedrock-aware _rebuild_anthropic_client helper on interrupt
Three interrupt-recovery sites in run_agent.py rebuilt self._anthropic_client
with build_anthropic_client(self._anthropic_api_key, ...) unconditionally.
When provider=bedrock + api_mode=anthropic_messages (AnthropicBedrock SDK
path), self._anthropic_api_key is the sentinel 'aws-sdk' — build_anthropic_client
doesn't accept that and the rebuild either crashed or produced a non-functional
client.

Extract a _rebuild_anthropic_client() helper that dispatches to
build_anthropic_bedrock_client(region) when provider='bedrock', falling back
to build_anthropic_client() for native Anthropic and other anthropic_messages
providers (MiniMax, Kimi, Alibaba, etc.). Three inline rebuild sites now call
the helper.

Partial salvage of #14680 by @bsgdigital — only the _rebuild_anthropic_client
helper. The normalize_model_name Bedrock-prefix piece was subsumed by #14664,
and the aux client aws_sdk branch was subsumed by #14770 (both in the same
salvage PR as this commit).
2026-04-24 07:26:07 -07:00
Andre Kurait a9ccb03ccc fix(bedrock): evict cached boto3 client on stale-connection errors
## Problem

When a pooled HTTPS connection to the Bedrock runtime goes stale (NAT
timeout, VPN flap, server-side TCP RST, proxy idle cull), the next
Converse call surfaces as one of:

  * botocore.exceptions.ConnectionClosedError / ReadTimeoutError /
    EndpointConnectionError / ConnectTimeoutError
  * urllib3.exceptions.ProtocolError
  * A bare AssertionError raised from inside urllib3 or botocore
    (internal connection-pool invariant check)

The agent loop retries the request 3x, but the cached boto3 client in
_bedrock_runtime_client_cache is reused across retries — so every
attempt hits the same dead connection pool and fails identically.
Only a process restart clears the cache and lets the user keep working.

The bare-AssertionError variant is particularly user-hostile because
str(AssertionError()) is an empty string, so the retry banner shows:

    ⚠️  API call failed: AssertionError
       📝 Error:

with no hint of what went wrong.

## Fix

Add two helpers to agent/bedrock_adapter.py:

  * is_stale_connection_error(exc) — classifies exceptions that
    indicate dead-client/dead-socket state. Matches botocore
    ConnectionError + HTTPClientError subtrees, urllib3
    ProtocolError / NewConnectionError, and AssertionError
    raised from a frame whose module name starts with urllib3.,
    botocore., or boto3.. Application-level AssertionErrors are
    intentionally excluded.

  * invalidate_runtime_client(region) — per-region counterpart to
    the existing reset_client_cache(). Evicts a single cached
    client so the next call rebuilds it (and its connection pool).

Wire both into the Converse call sites:

  * call_converse() / call_converse_stream() in
    bedrock_adapter.py (defense-in-depth for any future caller)
  * The two direct client.converse(**kwargs) /
    client.converse_stream(**kwargs) call sites in run_agent.py
    (the paths the agent loop actually uses)

On a stale-connection exception, the client is evicted and the
exception re-raised unchanged. The agent's existing retry loop then
builds a fresh client on the next attempt and recovers without
requiring a process restart.

## Tests

tests/agent/test_bedrock_adapter.py gets three new classes (14 tests):

  * TestInvalidateRuntimeClient — per-region eviction correctness;
    non-cached region returns False.
  * TestIsStaleConnectionError — classifies botocore
    ConnectionClosedError / EndpointConnectionError /
    ReadTimeoutError, urllib3 ProtocolError, library-internal
    AssertionError (both urllib3.* and botocore.* frames), and
    correctly ignores application-level AssertionError and
    unrelated exceptions (ValueError, KeyError).
  * TestCallConverseInvalidatesOnStaleError — end-to-end: stale
    error evicts the cached client, non-stale error (validation)
    leaves it alone, successful call leaves it cached.

All 116 tests in test_bedrock_adapter.py pass.

Signed-off-by: Andre Kurait <andrekurait@gmail.com>
2026-04-24 07:26:07 -07:00
Tranquil-Flow 7dc6eb9fbf fix(agent): handle aws_sdk auth type in resolve_provider_client
Bedrock's aws_sdk auth_type had no matching branch in
resolve_provider_client(), causing it to fall through to the
"unhandled auth_type" warning and return (None, None).  This broke
all auxiliary tasks (compression, memory, summarization) for Bedrock
users — the main conversation loop worked fine, but background
context management silently failed.

Add an aws_sdk branch that creates an AnthropicAuxiliaryClient via
build_anthropic_bedrock_client(), using boto3's default credential
chain (IAM roles, SSO, env vars, instance metadata).  Default
auxiliary model is Haiku for cost efficiency.

Closes #13919
2026-04-24 07:26:07 -07:00
Andre Kurait b290297d66 fix(bedrock): resolve context length via static table before custom-endpoint probe
## Problem

`get_model_context_length()` in `agent/model_metadata.py` had a resolution
order bug that caused every Bedrock model to fall back to the 128K default
context length instead of reaching the static Bedrock table (200K for
Claude, etc.).

The root cause: `bedrock-runtime.<region>.amazonaws.com` is not listed in
`_URL_TO_PROVIDER`, so `_is_known_provider_base_url()` returned False.
The resolution order then ran the custom-endpoint probe (step 2) *before*
the Bedrock branch (step 4b), which:

  1. Treated Bedrock as a custom endpoint (via `_is_custom_endpoint`).
  2. Called `fetch_endpoint_model_metadata()` → `GET /models` on the
     bedrock-runtime URL (Bedrock doesn't serve this shape).
  3. Fell through to `return DEFAULT_FALLBACK_CONTEXT` (128K) at the
     "probe-down" branch — never reaching the Bedrock static table.

Result: users on Bedrock saw 128K context for Claude models that
actually support 200K on Bedrock, causing premature auto-compression.

## Fix

Promote the Bedrock branch from step 4b to step 1b, so it runs *before*
the custom-endpoint probe at step 2. The static table in
`bedrock_adapter.py::get_bedrock_context_length()` is the authoritative
source for Bedrock (the ListFoundationModels API doesn't expose context
window sizes), so there's no reason to probe `/models` first.

The original step 4b is replaced with a one-line breadcrumb comment
pointing to the new location, to make the resolution-order docstring
accurate.

## Changes

- `agent/model_metadata.py`
  - Add step 1b: Bedrock static-table branch (unchanged predicate, moved).
  - Remove dead step 4b block, replace with breadcrumb comment.
  - Update resolution-order docstring to include step 1b.

- `tests/agent/test_model_metadata.py`
  - New `TestBedrockContextResolution` class (3 tests):
    - `test_bedrock_provider_returns_static_table_before_probe`:
      confirms `provider="bedrock"` hits the static table and does NOT
      call `fetch_endpoint_model_metadata` (regression guard).
    - `test_bedrock_url_without_provider_hint`: confirms the
      `bedrock-runtime.*.amazonaws.com` host match works without an
      explicit `provider=` hint.
    - `test_non_bedrock_url_still_probes`: confirms the probe still
      fires for genuinely-custom endpoints (no over-reach).

## Testing

  pytest tests/agent/test_model_metadata.py -q
  # 83 passed in 1.95s (3 new + 80 existing)

## Risk

Very low.

- Predicate is identical to the original step 4b — no behaviour change
  for non-Bedrock paths.
- Original step 4b was dead code for the user-facing case (always hit
  the 128K fallback first), so removing it cannot regress behaviour.
- Bedrock path now short-circuits before any network I/O — faster too.
- `ImportError` fall-through preserved so users without `boto3`
  installed are unaffected.

## Related

- This is a prerequisite for accurate context-window accounting on
  Bedrock — the fix for #14710 (stale-connection client eviction)
  depends on correct context sizing to know when to compress.

Signed-off-by: Andre Kurait <andrekurait@gmail.com>
2026-04-24 07:26:07 -07:00
Qi Ke f2fba4f9a1 fix(anthropic): auto-detect Bedrock model IDs in normalize_model_name (#12295)
Bedrock model IDs use dots as namespace separators (anthropic.claude-opus-4-7,
us.anthropic.claude-sonnet-4-5-v1:0), not version separators.
normalize_model_name() was unconditionally converting all dots to hyphens,
producing invalid IDs that Bedrock rejects with HTTP 400/404.

This affected both the main agent loop (partially mitigated by
_anthropic_preserve_dots in run_agent.py) and all auxiliary client calls
(compression, session_search, vision, etc.) which go through
_AnthropicCompletionsAdapter and never pass preserve_dots=True.

Fix: add _is_bedrock_model_id() to detect Bedrock namespace prefixes
(anthropic., us., eu., ap., jp., global.) and skip dot-to-hyphen
conversion for these IDs regardless of the preserve_dots flag.
2026-04-24 07:26:07 -07:00
Teknium fcc05284fc fix(delegate): tool-activity-aware heartbeat stale detection (#13041) (#15183)
A child running a legitimately long-running tool (terminal command,
browser fetch, big file read) holds current_tool set and keeps
api_call_count frozen while the tool runs. The previous stale check
treated that as idle after 5 heartbeat cycles (~150s), stopped
touching the parent, and let the gateway kill the session.

Split the threshold in two:
- _HEARTBEAT_STALE_CYCLES_IDLE=5 (~150s)  — applied only when
  current_tool is None (child wedged between turns)
- _HEARTBEAT_STALE_CYCLES_IN_TOOL=20 (~600s) — applied when the child
  is inside a tool call

Stale counter also resets when current_tool changes (new tool =
progress). The hard child_timeout_seconds (default 600s) is still
the final cap, so genuinely stuck tools don't get to block forever.
2026-04-24 07:25:19 -07:00
Teknium 1840c6a57d feat(spotify): wire setup wizard into 'hermes tools' + document cron usage (#15180)
A — 'hermes tools' activation now runs the full Spotify wizard.

Previously a user had to (1) toggle the Spotify toolset on in 'hermes
tools' AND (2) separately run 'hermes auth spotify' to actually use
it. The second step was a discovery gap — the docs mentioned it but
nothing in the TUI pointed users there.

Now toggling Spotify on calls login_spotify_command as a post_setup
hook. If the user has no client_id yet, the interactive wizard walks
them through Spotify app creation; if they do, it skips straight to
PKCE. Either way, one 'hermes tools' pass leaves Spotify toggled on
AND authenticated. SystemExit from the wizard (user abort) leaves the
toolset enabled and prints a 'run: hermes auth spotify' hint — it
does NOT fail the toolset toggle.

Dropped the TOOL_CATEGORIES env_vars list for Spotify. The wizard
handles HERMES_SPOTIFY_CLIENT_ID persistence itself, and asking users
to type env var names before the wizard fires was UX-backwards — the
point of the wizard is that they don't HAVE a client_id yet.

B — Docs page now covers cron + Spotify.

New 'Scheduling: Spotify + cron' section with two working examples
(morning playlist, wind-down pause) using the real 'hermes cron add'
CLI surface (verified via 'cron add --help'). Covers the active-device
gotcha, Premium gating, memory isolation, and links to the cron docs.

Also fixed a stale '9 Spotify tools' reference in the setup copy —
we consolidated to 7 tools in #15154.

Validation:
- scripts/run_tests.sh tests/hermes_cli/test_tools_config.py
    tests/hermes_cli/test_spotify_auth.py
    tests/tools/test_spotify_client.py
  → 54 passed
- website: node scripts/prebuild.mjs && npx docusaurus build
  → SUCCESS, no new warnings
2026-04-24 07:24:28 -07:00
Blind Dev 591aa159aa feat: allow Telegram chat allowlists for groups and forums (#15027)
* feat: allow Telegram chat allowlists for groups and forums

* chore: map web3blind noreply email for release attribution

---------

Co-authored-by: web3blind <web3blind@users.noreply.github.com>
2026-04-24 07:23:14 -07:00
Austin Pickett d3e56b9f39 chore: refac 2026-04-24 10:17:57 -04:00
Austin Pickett 0fdbfad2b0 feat: embed docs 2026-04-24 09:04:11 -04:00
Austin Pickett 4f5669a569 feat: add docs link 2026-04-24 08:22:44 -04:00
Austin Pickett 809868e628 feat: refac 2026-04-24 01:04:19 -04:00
Austin Pickett e5d2815b41 feat: add sidebar 2026-04-24 00:56:19 -04:00
364 changed files with 42820 additions and 10531 deletions
-4
View File
@@ -52,10 +52,6 @@ ignored/
.worktrees/
environments/benchmarks/evals/
# Compression eval run outputs (harness lives in scripts/compression_eval/)
scripts/compression_eval/results/*
!scripts/compression_eval/results/.gitkeep
# Web UI build output
hermes_cli/web_dist/
+13
View File
@@ -240,6 +240,19 @@ npm run fmt # prettier
npm test # vitest
```
### TUI in the Dashboard (`hermes dashboard` → `/chat`)
The dashboard embeds the real `hermes --tui`**not** a rewrite. See `hermes_cli/pty_bridge.py` + the `@app.websocket("/api/pty")` endpoint in `hermes_cli/web_server.py`.
- Browser loads `web/src/pages/ChatPage.tsx`, which mounts xterm.js's `Terminal` with the WebGL renderer, `@xterm/addon-fit` for container-driven resize, and `@xterm/addon-unicode11` for modern wide-character widths.
- `/api/pty?token=…` upgrades to a WebSocket; auth uses the same ephemeral `_SESSION_TOKEN` as REST, via query param (browsers can't set `Authorization` on WS upgrade).
- The server spawns whatever `hermes --tui` would spawn, through `ptyprocess` (POSIX PTY — WSL works, native Windows does not).
- Frames: raw PTY bytes each direction; resize via `\x1b[RESIZE:<cols>;<rows>]` intercepted on the server and applied with `TIOCSWINSZ`.
**Do not re-implement the primary chat experience in React.** The main transcript, composer/input flow (including slash-command behavior), and PTY-backed terminal belong to the embedded `hermes --tui` — anything new you add to Ink shows up in the dashboard automatically. If you find yourself rebuilding the transcript or composer for the dashboard, stop and extend Ink instead.
**Structured React UI around the TUI is allowed when it is not a second chat surface.** Sidebar widgets, inspectors, summaries, status panels, and similar supporting views (e.g. `ChatSidebar`, `ModelPickerDialog`, `ToolCall`) are fine when they complement the embedded TUI rather than replacing the transcript / composer / terminal. Keep their state independent of the PTY child's session and surface their failures non-destructively so the terminal pane keeps working unimpaired.
---
## Adding New Tools
+41 -4
View File
@@ -390,7 +390,16 @@ def build_anthropic_client(api_key: str, base_url: str = None, timeout: float =
"timeout": Timeout(timeout=float(_read_timeout), connect=10.0),
}
if normalized_base_url:
kwargs["base_url"] = normalized_base_url
# Azure Anthropic endpoints require an ``api-version`` query parameter.
# Pass it via default_query so the SDK appends it to every request URL
# without corrupting the base_url (appending it directly produces
# malformed paths like /anthropic?api-version=.../v1/messages).
_is_azure_endpoint = "azure.com" in normalized_base_url.lower()
if _is_azure_endpoint and "api-version" not in normalized_base_url:
kwargs["base_url"] = normalized_base_url.rstrip("/")
kwargs["default_query"] = {"api-version": "2025-04-15"}
else:
kwargs["base_url"] = normalized_base_url
common_betas = _common_betas_for_base_url(normalized_base_url)
if _is_kimi_coding_endpoint(base_url):
@@ -986,6 +995,26 @@ def read_hermes_oauth_credentials() -> Optional[Dict[str, Any]]:
# ---------------------------------------------------------------------------
def _is_bedrock_model_id(model: str) -> bool:
"""Detect AWS Bedrock model IDs that use dots as namespace separators.
Bedrock model IDs come in two forms:
- Bare: ``anthropic.claude-opus-4-7``
- Regional (inference profiles): ``us.anthropic.claude-sonnet-4-5-v1:0``
In both cases the dots separate namespace components, not version
numbers, and must be preserved verbatim for the Bedrock API.
"""
lower = model.lower()
# Regional inference-profile prefixes
if any(lower.startswith(p) for p in ("global.", "us.", "eu.", "ap.", "jp.")):
return True
# Bare Bedrock model IDs: provider.model-family
if lower.startswith("anthropic."):
return True
return False
def normalize_model_name(model: str, preserve_dots: bool = False) -> str:
"""Normalize a model name for the Anthropic API.
@@ -993,11 +1022,19 @@ def normalize_model_name(model: str, preserve_dots: bool = False) -> str:
- Converts dots to hyphens in version numbers (OpenRouter uses dots,
Anthropic uses hyphens: claude-opus-4.6 → claude-opus-4-6), unless
preserve_dots is True (e.g. for Alibaba/DashScope: qwen3.5-plus).
- Preserves Bedrock model IDs (``anthropic.claude-opus-4-7``) and
regional inference profiles (``us.anthropic.claude-*``) whose dots
are namespace separators, not version separators.
"""
lower = model.lower()
if lower.startswith("anthropic/"):
model = model[len("anthropic/"):]
if not preserve_dots:
# Bedrock model IDs use dots as namespace separators
# (e.g. "anthropic.claude-opus-4-7", "us.anthropic.claude-*").
# These must not be converted to hyphens. See issue #12295.
if _is_bedrock_model_id(model):
return model
# OpenRouter uses dots for version separators (claude-opus-4.6),
# Anthropic uses hyphens (claude-opus-4-6). Convert dots to hyphens.
model = model.replace(".", "-")
@@ -1652,9 +1689,9 @@ def build_anthropic_kwargs(
# ── Strip sampling params on 4.7+ ─────────────────────────────────
# Opus 4.7 rejects any non-default temperature/top_p/top_k with a 400.
# Callers (auxiliary_client, flush_memories, etc.) may set these for
# older models; drop them here as a safety net so upstream 4.6 → 4.7
# migrations don't require coordinated edits everywhere.
# Callers (auxiliary_client, etc.) may set these for older models;
# drop them here as a safety net so upstream 4.6 → 4.7 migrations
# don't require coordinated edits everywhere.
if _forbids_sampling_params(model):
for _sampling_key in ("temperature", "top_p", "top_k"):
kwargs.pop(_sampling_key, None)
+167 -13
View File
@@ -42,6 +42,7 @@ import time
from pathlib import Path # noqa: F401 — used by test mocks
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import urlparse, parse_qs, urlunparse
from openai import OpenAI
@@ -52,6 +53,17 @@ from utils import base_url_host_matches, base_url_hostname, normalize_proxy_env_
logger = logging.getLogger(__name__)
def _extract_url_query_params(url: str):
"""Extract query params from URL, return (clean_url, default_query dict or None)."""
parsed = urlparse(url)
if parsed.query:
clean = urlunparse(parsed._replace(query=""))
params = {k: v[0] for k, v in parse_qs(parsed.query).items()}
return clean, params
return url, None
# Module-level flag: only warn once per process about stale OPENAI_BASE_URL.
_stale_base_url_warned = False
@@ -390,7 +402,7 @@ class _CodexCompletionsAdapter:
# Note: the Codex endpoint (chatgpt.com/backend-api/codex) does NOT
# support max_output_tokens or temperature — omit to avoid 400 errors.
# Tools support for flush_memories and similar callers
# Tools support for auxiliary callers (e.g. skills_hub) that pass function schemas
tools = kwargs.get("tools")
if tools:
converted = []
@@ -1157,8 +1169,10 @@ def _try_custom_endpoint() -> Tuple[Optional[Any], Optional[str]]:
return None, None
model = _read_main_model() or "gpt-4o-mini"
logger.debug("Auxiliary client: custom endpoint (%s, api_mode=%s)", model, custom_mode or "chat_completions")
_clean_base, _dq = _extract_url_query_params(custom_base)
_extra = {"default_query": _dq} if _dq else {}
if custom_mode == "codex_responses":
real_client = OpenAI(api_key=custom_key, base_url=custom_base)
real_client = OpenAI(api_key=custom_key, base_url=_clean_base, **_extra)
return CodexAuxiliaryClient(real_client, model), model
if custom_mode == "anthropic_messages":
# Third-party Anthropic-compatible gateway (MiniMax, Zhipu GLM,
@@ -1172,12 +1186,12 @@ def _try_custom_endpoint() -> Tuple[Optional[Any], Optional[str]]:
"Custom endpoint declares api_mode=anthropic_messages but the "
"anthropic SDK is not installed — falling back to OpenAI-wire."
)
return OpenAI(api_key=custom_key, base_url=custom_base), model
return OpenAI(api_key=custom_key, base_url=_clean_base, **_extra), model
return (
AnthropicAuxiliaryClient(real_client, model, custom_key, custom_base, is_oauth=False),
model,
)
return OpenAI(api_key=custom_key, base_url=custom_base), model
return OpenAI(api_key=custom_key, base_url=_clean_base, **_extra), model
def _try_codex() -> Tuple[Optional[Any], Optional[str]]:
@@ -1349,6 +1363,49 @@ def _is_auth_error(exc: Exception) -> bool:
return "error code: 401" in err_lower or "authenticationerror" in type(exc).__name__.lower()
def _is_unsupported_parameter_error(exc: Exception, param: str) -> bool:
"""Detect provider 400s for an unsupported request parameter.
Different OpenAI-compatible endpoints phrase the same class of error a few
ways: ``Unsupported parameter: X``, ``unsupported_parameter`` with a
``param`` field, ``X is not supported``, ``unknown parameter: X``,
``unrecognized request argument: X``. We match on both the parameter
name and a generic "unsupported/unknown/unrecognized parameter" marker so
call sites can reactively retry without the offending key instead of
surfacing a noisy auxiliary failure.
Generalizes the temperature-specific detector that originally shipped
with PR #15621 so the same retry strategy can cover ``max_tokens``,
``seed``, ``top_p``, and any future quirk. Credit @nicholasrae (PR #15416)
for the generalization pattern.
"""
param_lower = (param or "").lower()
if not param_lower:
return False
err_lower = str(exc).lower()
if param_lower not in err_lower:
return False
return any(marker in err_lower for marker in (
"unsupported parameter",
"unsupported_parameter",
"not supported",
"does not support",
"unknown parameter",
"unrecognized request argument",
"unrecognized parameter",
"invalid parameter",
))
def _is_unsupported_temperature_error(exc: Exception) -> bool:
"""Back-compat wrapper: detect API errors where the model rejects ``temperature``.
Delegates to :func:`_is_unsupported_parameter_error`; kept as a separate
public symbol because existing tests and call sites import it by name.
"""
return _is_unsupported_parameter_error(exc, "temperature")
def _evict_cached_clients(provider: str) -> None:
"""Drop cached auxiliary clients for a provider so fresh creds are used."""
normalized = _normalize_aux_provider(provider)
@@ -1782,12 +1839,15 @@ def resolve_provider_client(
provider,
)
extra = {}
_clean_base, _dq = _extract_url_query_params(custom_base)
if _dq:
extra["default_query"] = _dq
if base_url_host_matches(custom_base, "api.kimi.com"):
extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
elif base_url_host_matches(custom_base, "api.githubcopilot.com"):
from hermes_cli.models import copilot_default_headers
extra["default_headers"] = copilot_default_headers()
client = OpenAI(api_key=custom_key, base_url=custom_base, **extra)
client = OpenAI(api_key=custom_key, base_url=_clean_base, **extra)
client = _wrap_if_needed(client, final_model, custom_base)
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
@@ -1824,6 +1884,8 @@ def resolve_provider_client(
model or custom_entry.get("model") or _read_main_model() or "gpt-4o-mini",
provider,
)
_clean_base2, _dq2 = _extract_url_query_params(custom_base)
_extra2 = {"default_query": _dq2} if _dq2 else {}
logger.debug(
"resolve_provider_client: named custom provider %r (%s, api_mode=%s)",
provider, final_model, entry_api_mode or "chat_completions")
@@ -1841,7 +1903,7 @@ def resolve_provider_client(
"installed — falling back to OpenAI-wire.",
provider,
)
client = OpenAI(api_key=custom_key, base_url=custom_base)
client = OpenAI(api_key=custom_key, base_url=_clean_base2, **_extra2)
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
sync_anthropic = AnthropicAuxiliaryClient(
@@ -1850,7 +1912,7 @@ def resolve_provider_client(
if async_mode:
return AsyncAnthropicAuxiliaryClient(sync_anthropic), final_model
return sync_anthropic, final_model
client = OpenAI(api_key=custom_key, base_url=custom_base)
client = OpenAI(api_key=custom_key, base_url=_clean_base2, **_extra2)
# codex_responses or inherited auto-detect (via _wrap_if_needed).
# _wrap_if_needed reads the closed-over `api_mode` (the task-level
# override). Named-provider entry api_mode=codex_responses also
@@ -1993,6 +2055,39 @@ def resolve_provider_client(
"directly supported", provider)
return None, None
elif pconfig.auth_type == "aws_sdk":
# AWS SDK providers (Bedrock) — use the Anthropic Bedrock client via
# boto3's credential chain (IAM roles, SSO, env vars, instance metadata).
try:
from agent.bedrock_adapter import has_aws_credentials, resolve_bedrock_region
from agent.anthropic_adapter import build_anthropic_bedrock_client
except ImportError:
logger.warning("resolve_provider_client: bedrock requested but "
"boto3 or anthropic SDK not installed")
return None, None
if not has_aws_credentials():
logger.debug("resolve_provider_client: bedrock requested but "
"no AWS credentials found")
return None, None
region = resolve_bedrock_region()
default_model = "anthropic.claude-haiku-4-5-20251001-v1:0"
final_model = _normalize_resolved_model(model or default_model, provider)
try:
real_client = build_anthropic_bedrock_client(region)
except ImportError as exc:
logger.warning("resolve_provider_client: cannot create Bedrock "
"client: %s", exc)
return None, None
client = AnthropicAuxiliaryClient(
real_client, final_model, api_key="aws-sdk",
base_url=f"https://bedrock-runtime.{region}.amazonaws.com",
)
logger.debug("resolve_provider_client: bedrock (%s, %s)", final_model, region)
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
elif pconfig.auth_type in ("oauth_device_code", "oauth_external"):
# OAuth providers — route through their specific try functions
if provider == "nous":
@@ -2727,8 +2822,8 @@ def _build_call_kwargs(
temperature = fixed_temperature
# Opus 4.7+ rejects any non-default temperature/top_p/top_k — silently
# drop here so auxiliary callers that hardcode temperature (e.g. 0.3 on
# flush_memories, 0 on structured-JSON extraction) don't 400 the moment
# drop here so auxiliary callers that hardcode temperature (e.g. 0 on
# structured-JSON extraction) don't 400 the moment
# the aux model is flipped to 4.7.
if temperature is not None:
from agent.anthropic_adapter import _forbids_sampling_params
@@ -2816,7 +2911,7 @@ def call_llm(
Args:
task: Auxiliary task name ("compression", "vision", "web_extract",
"session_search", "skills_hub", "mcp", "flush_memories").
"session_search", "skills_hub", "mcp", "title_generation").
Reads provider:model from config/env. Ignored if provider is set.
provider: Explicit provider override.
model: Explicit model override.
@@ -2919,13 +3014,45 @@ def call_llm(
if _is_anthropic_compat_endpoint(resolved_provider, _client_base):
kwargs["messages"] = _convert_openai_images_to_anthropic(kwargs["messages"])
# Handle max_tokens vs max_completion_tokens retry, then payment fallback.
# Handle unsupported temperature, max_tokens vs max_completion_tokens retry,
# then payment fallback.
try:
return _validate_llm_response(
client.chat.completions.create(**kwargs), task)
except Exception as first_err:
if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
retry_kwargs = dict(kwargs)
retry_kwargs.pop("temperature", None)
logger.info(
"Auxiliary %s: provider rejected temperature; retrying once without it",
task or "call",
)
try:
return _validate_llm_response(
client.chat.completions.create(**retry_kwargs), task)
except Exception as retry_err:
retry_err_str = str(retry_err)
# If retry still fails, fall through to the max_tokens /
# payment / auth chains below using the temperature-stripped
# kwargs. Re-raise only if the retry hit something those
# chains won't handle.
if not (
_is_payment_error(retry_err)
or _is_connection_error(retry_err)
or _is_auth_error(retry_err)
or "max_tokens" in retry_err_str
or "unsupported_parameter" in retry_err_str
):
raise
first_err = retry_err
kwargs = retry_kwargs
err_str = str(first_err)
if "max_tokens" in err_str or "unsupported_parameter" in err_str:
if max_tokens is not None and (
"max_tokens" in err_str
or "unsupported_parameter" in err_str
or _is_unsupported_parameter_error(first_err, "max_tokens")
):
kwargs.pop("max_tokens", None)
kwargs["max_completion_tokens"] = max_tokens
try:
@@ -3188,8 +3315,35 @@ async def async_call_llm(
return _validate_llm_response(
await client.chat.completions.create(**kwargs), task)
except Exception as first_err:
if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
retry_kwargs = dict(kwargs)
retry_kwargs.pop("temperature", None)
logger.info(
"Auxiliary %s (async): provider rejected temperature; retrying once without it",
task or "call",
)
try:
return _validate_llm_response(
await client.chat.completions.create(**retry_kwargs), task)
except Exception as retry_err:
retry_err_str = str(retry_err)
if not (
_is_payment_error(retry_err)
or _is_connection_error(retry_err)
or _is_auth_error(retry_err)
or "max_tokens" in retry_err_str
or "unsupported_parameter" in retry_err_str
):
raise
first_err = retry_err
kwargs = retry_kwargs
err_str = str(first_err)
if "max_tokens" in err_str or "unsupported_parameter" in err_str:
if max_tokens is not None and (
"max_tokens" in err_str
or "unsupported_parameter" in err_str
or _is_unsupported_parameter_error(first_err, "max_tokens")
):
kwargs.pop("max_tokens", None)
kwargs["max_completion_tokens"] = max_tokens
try:
+130 -2
View File
@@ -87,6 +87,114 @@ def reset_client_cache():
_bedrock_control_client_cache.clear()
def invalidate_runtime_client(region: str) -> bool:
"""Evict the cached ``bedrock-runtime`` client for a single region.
Per-region counterpart to :func:`reset_client_cache`. Used by the converse
call wrappers to discard clients whose underlying HTTP connection has
gone stale, so the next call allocates a fresh client (with a fresh
connection pool) instead of reusing a dead socket.
Returns True if a cached entry was evicted, False if the region was not
cached.
"""
existed = region in _bedrock_runtime_client_cache
_bedrock_runtime_client_cache.pop(region, None)
return existed
# ---------------------------------------------------------------------------
# Stale-connection detection
# ---------------------------------------------------------------------------
#
# boto3 caches its HTTPS connection pool inside the client object. When a
# pooled connection is killed out from under us (NAT timeout, VPN flap,
# server-side TCP RST, proxy idle cull, etc.), the next use surfaces as
# one of a handful of low-level exceptions — most commonly
# ``botocore.exceptions.ConnectionClosedError`` or
# ``urllib3.exceptions.ProtocolError``. urllib3 also trips an internal
# ``assert`` in a couple of paths (connection pool state checks, chunked
# response readers) which bubbles up as a bare ``AssertionError`` with an
# empty ``str(exc)``.
#
# In all of these cases the client is the problem, not the request: retrying
# with the same cached client reproduces the failure until the process
# restarts. The fix is to evict the region's cached client so the next
# attempt builds a new one.
_STALE_LIB_MODULE_PREFIXES = (
"urllib3.",
"botocore.",
"boto3.",
)
def _traceback_frames_modules(exc: BaseException):
"""Yield ``__name__``-style module strings for each frame in exc's traceback."""
tb = getattr(exc, "__traceback__", None)
while tb is not None:
frame = tb.tb_frame
module = frame.f_globals.get("__name__", "")
yield module or ""
tb = tb.tb_next
def is_stale_connection_error(exc: BaseException) -> bool:
"""Return True if ``exc`` indicates a dead/stale Bedrock HTTP connection.
Matches:
* ``botocore.exceptions.ConnectionError`` and subclasses
(``ConnectionClosedError``, ``EndpointConnectionError``,
``ReadTimeoutError``, ``ConnectTimeoutError``).
* ``urllib3.exceptions.ProtocolError`` / ``NewConnectionError`` /
``ConnectionError`` (best-effort import — urllib3 is a transitive
dependency of botocore so it is always available in practice).
* Bare ``AssertionError`` raised from a frame inside urllib3, botocore,
or boto3. These are internal-invariant failures (typically triggered
by corrupted connection-pool state after a dropped socket) and are
recoverable by swapping the client.
Non-library ``AssertionError``s (from application code or tests) are
intentionally not matched — only library-internal asserts signal stale
connection state.
"""
# botocore: the canonical signal — HTTPClientError is the umbrella for
# ConnectionClosedError, ReadTimeoutError, EndpointConnectionError,
# ConnectTimeoutError, and ProxyConnectionError. ConnectionError covers
# the same family via a different branch of the hierarchy.
try:
from botocore.exceptions import (
ConnectionError as BotoConnectionError,
HTTPClientError,
)
botocore_errors: tuple = (BotoConnectionError, HTTPClientError)
except ImportError: # pragma: no cover — botocore always present with boto3
botocore_errors = ()
if botocore_errors and isinstance(exc, botocore_errors):
return True
# urllib3: low-level transport failures
try:
from urllib3.exceptions import (
ProtocolError,
NewConnectionError,
ConnectionError as Urllib3ConnectionError,
)
urllib3_errors = (ProtocolError, NewConnectionError, Urllib3ConnectionError)
except ImportError: # pragma: no cover
urllib3_errors = ()
if urllib3_errors and isinstance(exc, urllib3_errors):
return True
# Library-internal AssertionError (urllib3 / botocore / boto3)
if isinstance(exc, AssertionError):
for module in _traceback_frames_modules(exc):
if any(module.startswith(prefix) for prefix in _STALE_LIB_MODULE_PREFIXES):
return True
return False
# ---------------------------------------------------------------------------
# AWS credential detection
# ---------------------------------------------------------------------------
@@ -787,7 +895,17 @@ def call_converse(
guardrail_config=guardrail_config,
)
response = client.converse(**kwargs)
try:
response = client.converse(**kwargs)
except Exception as exc:
if is_stale_connection_error(exc):
logger.warning(
"bedrock: stale-connection error on converse(region=%s, model=%s): "
"%s — evicting cached client so the next call reconnects.",
region, model, type(exc).__name__,
)
invalidate_runtime_client(region)
raise
return normalize_converse_response(response)
@@ -819,7 +937,17 @@ def call_converse_stream(
guardrail_config=guardrail_config,
)
response = client.converse_stream(**kwargs)
try:
response = client.converse_stream(**kwargs)
except Exception as exc:
if is_stale_connection_error(exc):
logger.warning(
"bedrock: stale-connection error on converse_stream(region=%s, "
"model=%s): %s — evicting cached client so the next call reconnects.",
region, model, type(exc).__name__,
)
invalidate_runtime_client(region)
raise
return normalize_converse_stream_events(response)
+197 -11
View File
@@ -23,26 +23,52 @@ from agent.prompt_builder import DEFAULT_AGENT_IDENTITY
logger = logging.getLogger(__name__)
# Matches Codex/Harmony tool-call serialization that occasionally leaks into
# assistant-message content when the model fails to emit a structured
# ``function_call`` item. Accepts the common forms:
#
# to=functions.exec_command
# assistant to=functions.exec_command
# <|channel|>commentary to=functions.exec_command
#
# ``to=functions.<name>`` is the stable marker — the optional ``assistant`` or
# Harmony channel prefix varies by degeneration mode. Case-insensitive to
# cover lowercase/uppercase ``assistant`` variants.
_TOOL_CALL_LEAK_PATTERN = re.compile(
r"(?:^|[\s>|])to=functions\.[A-Za-z_][\w.]*",
re.IGNORECASE,
)
# ---------------------------------------------------------------------------
# Multimodal content helpers
# ---------------------------------------------------------------------------
def _chat_content_to_responses_parts(content: Any) -> List[Dict[str, Any]]:
def _chat_content_to_responses_parts(content: Any, *, role: str = "user") -> List[Dict[str, Any]]:
"""Convert chat-style multimodal content to Responses API input parts.
Input: ``[{"type":"text"|"image_url", ...}]`` (native OpenAI Chat format)
Output: ``[{"type":"input_text"|"input_image", ...}]`` (Responses format)
Output: ``[{"type":"input_text"|"output_text"|"input_image", ...}]`` (Responses format)
The ``role`` parameter controls the text content type:
- ``"user"`` (default) → ``"input_text"``
- ``"assistant"`` → ``"output_text"``
The Responses API rejects ``input_text`` inside assistant messages and
``output_text`` inside user messages, so callers MUST pass the correct
role for the message being converted.
Returns an empty list when ``content`` is not a list or contains no
recognized parts — callers fall back to the string path.
"""
text_type = "output_text" if role == "assistant" else "input_text"
if not isinstance(content, list):
return []
converted: List[Dict[str, Any]] = []
for part in content:
if isinstance(part, str):
if part:
converted.append({"type": "input_text", "text": part})
converted.append({"type": text_type, "text": part})
continue
if not isinstance(part, dict):
continue
@@ -50,7 +76,7 @@ def _chat_content_to_responses_parts(content: Any) -> List[Dict[str, Any]]:
if ptype in {"text", "input_text", "output_text"}:
text = part.get("text")
if isinstance(text, str) and text:
converted.append({"type": "input_text", "text": text})
converted.append({"type": text_type, "text": text})
continue
if ptype in {"image_url", "input_image"}:
image_ref = part.get("image_url")
@@ -201,6 +227,23 @@ def _responses_tools(tools: Optional[List[Dict[str, Any]]] = None) -> Optional[L
# Message format conversion
# ---------------------------------------------------------------------------
_RESPONSE_MESSAGE_STATUSES = {"completed", "incomplete", "in_progress"}
def _normalize_responses_message_status(value: Any, *, default: str = "completed") -> str:
"""Normalize a Responses assistant message status for replay.
The API accepts completed/incomplete/in_progress on replayed assistant
output messages. Preserve those exactly (modulo case/hyphen spelling) so
incomplete Codex continuation turns don't get falsely marked completed.
"""
if isinstance(value, str):
status = value.strip().lower().replace("-", "_").replace(" ", "_")
if status in _RESPONSE_MESSAGE_STATUSES:
return status
return default
def _chat_messages_to_responses_input(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Convert internal chat-style messages to Responses input items."""
items: List[Dict[str, Any]] = []
@@ -216,9 +259,10 @@ def _chat_messages_to_responses_input(messages: List[Dict[str, Any]]) -> List[Di
if role in {"user", "assistant"}:
content = msg.get("content", "")
if isinstance(content, list):
content_parts = _chat_content_to_responses_parts(content)
content_parts = _chat_content_to_responses_parts(content, role=role)
text_type = "output_text" if role == "assistant" else "input_text"
content_text = "".join(
p.get("text", "") for p in content_parts if p.get("type") == "input_text"
p.get("text", "") for p in content_parts if p.get("type") == text_type
)
else:
content_parts = []
@@ -245,7 +289,57 @@ def _chat_messages_to_responses_input(messages: List[Dict[str, Any]]) -> List[Di
seen_item_ids.add(item_id)
has_codex_reasoning = True
if content_parts:
# Replay exact assistant message items (with id/phase) from
# previous turns so the API can maintain prefix-cache hits.
# OpenAI docs: "preserve and resend phase on all assistant
# messages — dropping it can degrade performance."
codex_message_items = msg.get("codex_message_items")
replayed_message_items = 0
if isinstance(codex_message_items, list):
for raw_item in codex_message_items:
if not isinstance(raw_item, dict):
continue
if raw_item.get("type") != "message" or raw_item.get("role") != "assistant":
continue
raw_content_parts = raw_item.get("content")
if not isinstance(raw_content_parts, list):
continue
normalized_content_parts = []
for part in raw_content_parts:
if not isinstance(part, dict):
continue
part_type = str(part.get("type") or "").strip()
if part_type not in {"output_text", "text"}:
continue
text = part.get("text", "")
if text is None:
text = ""
if not isinstance(text, str):
text = str(text)
normalized_content_parts.append({"type": "output_text", "text": text})
if not normalized_content_parts:
continue
replay_item = {
"type": "message",
"role": "assistant",
"status": _normalize_responses_message_status(raw_item.get("status")),
"content": normalized_content_parts,
}
item_id = raw_item.get("id")
if isinstance(item_id, str) and item_id.strip():
replay_item["id"] = item_id.strip()
phase = raw_item.get("phase")
if isinstance(phase, str) and phase.strip():
replay_item["phase"] = phase.strip()
items.append(replay_item)
replayed_message_items += 1
if replayed_message_items > 0:
pass
elif content_parts:
items.append({"role": "assistant", "content": content_parts})
elif content_text.strip():
items.append({"role": "assistant", "content": content_text})
@@ -405,6 +499,47 @@ def _preflight_codex_input_items(raw_items: Any) -> List[Dict[str, Any]]:
normalized.append(reasoning_item)
continue
if item_type == "message":
role = item.get("role")
if role != "assistant":
raise ValueError(f"Codex Responses input[{idx}] message items must have role='assistant'.")
content = item.get("content")
if not isinstance(content, list):
raise ValueError(f"Codex Responses input[{idx}] message item must have content list.")
normalized_content = []
for part_idx, part in enumerate(content):
if not isinstance(part, dict):
raise ValueError(
f"Codex Responses input[{idx}] message content[{part_idx}] must be an object."
)
part_type = part.get("type")
if part_type not in {"output_text", "text"}:
raise ValueError(
f"Codex Responses input[{idx}] message content[{part_idx}] has unsupported type {part_type!r}."
)
text = part.get("text", "")
if text is None:
text = ""
if not isinstance(text, str):
text = str(text)
normalized_content.append({"type": "output_text", "text": text})
if not normalized_content:
raise ValueError(f"Codex Responses input[{idx}] message item must contain at least one text part.")
normalized_item: Dict[str, Any] = {
"type": "message",
"role": "assistant",
"status": _normalize_responses_message_status(item.get("status")),
"content": normalized_content,
}
item_id = item.get("id")
if isinstance(item_id, str) and item_id.strip():
normalized_item["id"] = item_id.strip()
phase = item.get("phase")
if isinstance(phase, str) and phase.strip():
normalized_item["phase"] = phase.strip()
normalized.append(normalized_item)
continue
role = item.get("role")
if role in {"user", "assistant"}:
content = item.get("content", "")
@@ -412,13 +547,16 @@ def _preflight_codex_input_items(raw_items: Any) -> List[Dict[str, Any]]:
content = ""
if isinstance(content, list):
# Multimodal content from ``_chat_messages_to_responses_input``
# is already in Responses format (``input_text`` / ``input_image``).
# Validate each part and pass through.
# is already in Responses format (``input_text`` / ``output_text``
# / ``input_image``). Validate each part and pass through.
# Use the correct text type for the role — ``output_text`` for
# assistant messages, ``input_text`` for user messages.
text_type = "output_text" if role == "assistant" else "input_text"
validated: List[Dict[str, Any]] = []
for part_idx, part in enumerate(content):
if isinstance(part, str):
if part:
validated.append({"type": "input_text", "text": part})
validated.append({"type": text_type, "text": part})
continue
if not isinstance(part, dict):
raise ValueError(
@@ -429,7 +567,7 @@ def _preflight_codex_input_items(raw_items: Any) -> List[Dict[str, Any]]:
text = part.get("text", "")
if not isinstance(text, str):
text = str(text or "")
validated.append({"type": "input_text", "text": text})
validated.append({"type": text_type, "text": text})
elif ptype in {"input_image", "image_url"}:
image_ref = part.get("image_url", "")
detail = part.get("detail")
@@ -686,6 +824,7 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
content_parts: List[str] = []
reasoning_parts: List[str] = []
reasoning_items_raw: List[Dict[str, Any]] = []
message_items_raw: List[Dict[str, Any]] = []
tool_calls: List[Any] = []
has_incomplete_items = response_status in {"queued", "in_progress", "incomplete"}
saw_commentary_phase = False
@@ -704,6 +843,7 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
if item_type == "message":
item_phase = getattr(item, "phase", None)
normalized_phase = None
if isinstance(item_phase, str):
normalized_phase = item_phase.strip().lower()
if normalized_phase in {"commentary", "analysis"}:
@@ -713,6 +853,18 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
message_text = _extract_responses_message_text(item)
if message_text:
content_parts.append(message_text)
raw_message_item: Dict[str, Any] = {
"type": "message",
"role": "assistant",
"status": _normalize_responses_message_status(item_status),
"content": [{"type": "output_text", "text": message_text}],
}
item_id = getattr(item, "id", None)
if isinstance(item_id, str) and item_id:
raw_message_item["id"] = item_id
if normalized_phase:
raw_message_item["phase"] = normalized_phase
message_items_raw.append(raw_message_item)
elif item_type == "reasoning":
reasoning_text = _extract_responses_reasoning_text(item)
if reasoning_text:
@@ -787,6 +939,37 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
if isinstance(out_text, str):
final_text = out_text.strip()
# ── Tool-call leak recovery ──────────────────────────────────
# gpt-5.x on the Codex Responses API sometimes degenerates and emits
# what should be a structured `function_call` item as plain assistant
# text using the Harmony/Codex serialization (``to=functions.foo
# {json}`` or ``assistant to=functions.foo {json}``). The model
# intended to call a tool, but the intent never made it into
# ``response.output`` as a ``function_call`` item, so ``tool_calls``
# is empty here. If we pass this through, the parent sees a
# confident-looking summary with no audit trail (empty ``tool_trace``)
# and no tools actually ran — the Taiwan-embassy-email incident.
#
# Detection: leaked tokens always contain ``to=functions.<name>`` and
# the assistant message has no real tool calls. Treat it as incomplete
# so the existing Codex-incomplete continuation path (3 retries,
# handled in run_agent.py) gets a chance to re-elicit a proper
# ``function_call`` item. The existing loop already handles message
# append, dedup, and retry budget.
leaked_tool_call_text = False
if final_text and not tool_calls and _TOOL_CALL_LEAK_PATTERN.search(final_text):
leaked_tool_call_text = True
logger.warning(
"Codex response contains leaked tool-call text in assistant content "
"(no structured function_call items). Treating as incomplete so the "
"continuation path can re-elicit a proper tool call. Leaked snippet: %r",
final_text[:300],
)
# Clear the text so downstream code doesn't surface the garbage as
# a summary. The encrypted reasoning items (if any) are preserved
# so the model keeps its chain-of-thought on the retry.
final_text = ""
assistant_message = SimpleNamespace(
content=final_text,
tool_calls=tool_calls,
@@ -794,10 +977,13 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
reasoning_content=None,
reasoning_details=None,
codex_reasoning_items=reasoning_items_raw or None,
codex_message_items=message_items_raw or None,
)
if tool_calls:
finish_reason = "tool_calls"
elif leaked_tool_call_text:
finish_reason = "incomplete"
elif has_incomplete_items or (saw_commentary_phase and not saw_final_answer_phase):
finish_reason = "incomplete"
elif reasoning_items_raw and not final_text:
+15
View File
@@ -294,6 +294,7 @@ class ContextCompressor(ContextEngine):
self._context_probed = False
self._context_probe_persistable = False
self._previous_summary = None
self._last_summary_error = None
self._last_compression_savings_pct = 100.0
self._ineffective_compression_count = 0
@@ -317,6 +318,13 @@ class ContextCompressor(ContextEngine):
int(context_length * self.threshold_percent),
MINIMUM_CONTEXT_LENGTH,
)
# Recalculate token budgets for the new context length so the
# compressor stays calibrated after a model switch (e.g. 200K → 32K).
target_tokens = int(self.threshold_tokens * self.summary_target_ratio)
self.tail_token_budget = target_tokens
self.max_summary_tokens = min(
int(context_length * 0.05), _SUMMARY_TOKENS_CEILING,
)
def __init__(
self,
@@ -389,6 +397,7 @@ class ContextCompressor(ContextEngine):
self._last_compression_savings_pct: float = 100.0
self._ineffective_compression_count: int = 0
self._summary_failure_cooldown_until: float = 0.0
self._last_summary_error: Optional[str] = None
def update_from_response(self, usage: Dict[str, Any]):
"""Update tracked token usage from API response."""
@@ -812,10 +821,12 @@ The user has requested that this compaction PRIORITISE preserving all informatio
self._previous_summary = summary
self._summary_failure_cooldown_until = 0.0
self._summary_model_fallen_back = False
self._last_summary_error = None
return self._with_summary_prefix(summary)
except RuntimeError:
# No provider configured — long cooldown, unlikely to self-resolve
self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
self._last_summary_error = "no auxiliary LLM provider configured"
logging.warning("Context compression: no provider available for "
"summary. Middle turns will be dropped without summary "
"for %d seconds.",
@@ -853,6 +864,10 @@ The user has requested that this compaction PRIORITISE preserving all informatio
# Transient errors (timeout, rate limit, network) — shorter cooldown
_transient_cooldown = 60
self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown
err_text = str(e).strip() or e.__class__.__name__
if len(err_text) > 220:
err_text = err_text[:217].rstrip() + "..."
self._last_summary_error = err_text
logging.warning(
"Failed to generate context summary: %s. "
"Further summary attempts paused for %d seconds.",
+43 -2
View File
@@ -31,6 +31,7 @@ from __future__ import annotations
import json
import logging
import re
import inspect
from typing import Any, Dict, List, Optional
from agent.memory_provider import MemoryProvider
@@ -312,7 +313,39 @@ class MemoryManager:
)
return "\n\n".join(parts)
def on_memory_write(self, action: str, target: str, content: str) -> None:
@staticmethod
def _provider_memory_write_metadata_mode(provider: MemoryProvider) -> str:
"""Return how to pass metadata to a provider's memory-write hook."""
try:
signature = inspect.signature(provider.on_memory_write)
except (TypeError, ValueError):
return "keyword"
params = list(signature.parameters.values())
if any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params):
return "keyword"
if "metadata" in signature.parameters:
return "keyword"
accepted = [
p for p in params
if p.kind in (
inspect.Parameter.POSITIONAL_ONLY,
inspect.Parameter.POSITIONAL_OR_KEYWORD,
inspect.Parameter.KEYWORD_ONLY,
)
]
if len(accepted) >= 4:
return "positional"
return "legacy"
def on_memory_write(
self,
action: str,
target: str,
content: str,
metadata: Optional[Dict[str, Any]] = None,
) -> None:
"""Notify external providers when the built-in memory tool writes.
Skips the builtin provider itself (it's the source of the write).
@@ -321,7 +354,15 @@ class MemoryManager:
if provider.name == "builtin":
continue
try:
provider.on_memory_write(action, target, content)
metadata_mode = self._provider_memory_write_metadata_mode(provider)
if metadata_mode == "keyword":
provider.on_memory_write(
action, target, content, metadata=dict(metadata or {})
)
elif metadata_mode == "positional":
provider.on_memory_write(action, target, content, dict(metadata or {}))
else:
provider.on_memory_write(action, target, content)
except Exception as e:
logger.debug(
"Memory provider '%s' on_memory_write failed: %s",
+12 -3
View File
@@ -26,7 +26,7 @@ Optional hooks (override to opt in):
on_turn_start(turn, message, **kwargs) — per-turn tick with runtime context
on_session_end(messages) — end-of-session extraction
on_pre_compress(messages) -> str — extract before context compression
on_memory_write(action, target, content) — mirror built-in memory writes
on_memory_write(action, target, content, metadata=None) — mirror built-in memory writes
on_delegation(task, result, **kwargs) — parent-side observation of subagent work
"""
@@ -34,7 +34,7 @@ from __future__ import annotations
import logging
from abc import ABC, abstractmethod
from typing import Any, Dict, List
from typing import Any, Dict, List, Optional
logger = logging.getLogger(__name__)
@@ -220,12 +220,21 @@ class MemoryProvider(ABC):
should all have ``env_var`` set and this method stays no-op).
"""
def on_memory_write(self, action: str, target: str, content: str) -> None:
def on_memory_write(
self,
action: str,
target: str,
content: str,
metadata: Optional[Dict[str, Any]] = None,
) -> None:
"""Called when the built-in memory tool writes an entry.
action: 'add', 'replace', or 'remove'
target: 'memory' or 'user'
content: the entry content
metadata: structured provenance for the write, when available. Common
keys include ``write_origin``, ``execution_context``, ``session_id``,
``parent_session_id``, ``platform``, and ``tool_name``.
Use to mirror built-in memory writes to your backend.
"""
+61 -21
View File
@@ -106,9 +106,11 @@ _endpoint_model_metadata_cache_time: Dict[str, float] = {}
_ENDPOINT_MODEL_CACHE_TTL = 300
# Descending tiers for context length probing when the model is unknown.
# We start at 128K (a safe default for most modern models) and step down
# on context-length errors until one works.
# We start at 256K (covers GPT-5.x, many current large-context models) and
# step down on context-length errors until one works. Tier[0] is also the
# default fallback when no detection method succeeds.
CONTEXT_PROBE_TIERS = [
256_000,
128_000,
64_000,
32_000,
@@ -143,10 +145,11 @@ DEFAULT_CONTEXT_LENGTHS = {
"claude": 200000,
# OpenAI — GPT-5 family (most have 400k; specific overrides first)
# Source: https://developers.openai.com/api/docs/models
# GPT-5.5 (launched Apr 23 2026). 400k is the fallback for providers we
# can't probe live. ChatGPT Codex OAuth actually caps lower (272k as of
# Apr 2026) and is resolved via _resolve_codex_oauth_context_length().
"gpt-5.5": 400000,
# GPT-5.5 (launched Apr 23 2026) is 1.05M on the direct OpenAI API and
# ChatGPT Codex OAuth caps it at 272K; both paths resolve via their own
# provider-aware branches (_resolve_codex_oauth_context_length + models.dev).
# This hardcoded value is only reached when every probe misses.
"gpt-5.5": 1050000,
"gpt-5.4-nano": 400000, # 400k (not 1.05M like full 5.4)
"gpt-5.4-mini": 400000, # 400k (not 1.05M like full 5.4)
"gpt-5.4": 1050000, # GPT-5.4, GPT-5.4 Pro (1.05M context)
@@ -162,7 +165,17 @@ DEFAULT_CONTEXT_LENGTHS = {
"gemma-4-31b": 256000,
"gemma-3": 131072,
"gemma": 8192, # fallback for older gemma models
# DeepSeek
# DeepSeek — V4 family ships with a 1M context window. The legacy
# aliases ``deepseek-chat`` / ``deepseek-reasoner`` are server-side
# mapped to the non-thinking / thinking modes of ``deepseek-v4-flash``
# and inherit the same 1M window. The ``deepseek`` substring entry
# below remains as a 128K fallback for older / unknown DeepSeek model
# ids (e.g. via custom endpoints).
# https://api-docs.deepseek.com/zh-cn/quick_start/pricing
"deepseek-v4-pro": 1_000_000,
"deepseek-v4-flash": 1_000_000,
"deepseek-chat": 1_000_000,
"deepseek-reasoner": 1_000_000,
"deepseek": 128000,
# Meta
"llama": 131072,
@@ -1193,12 +1206,14 @@ def get_model_context_length(
api_key: str = "",
config_context_length: int | None = None,
provider: str = "",
custom_providers: list | None = None,
) -> int:
"""Get the context length for a model.
Resolution order:
0. Explicit config override (model.context_length or custom_providers per-model)
1. Persistent cache (previously discovered via probing)
1b. AWS Bedrock static table (must precede custom-endpoint probe)
2. Active endpoint metadata (/models for explicit custom endpoints)
3. Local server query (for local endpoints)
4. Anthropic /v1/models API (API-key users only, not OAuth)
@@ -1212,6 +1227,23 @@ def get_model_context_length(
if config_context_length is not None and isinstance(config_context_length, int) and config_context_length > 0:
return config_context_length
# 0b. custom_providers per-model override — check before any probe.
# This closes the gap where /model switch and display paths used to fall
# back to 128K despite the user having a per-model context_length set.
# See #15779.
if custom_providers and base_url and model:
try:
from hermes_cli.config import get_custom_provider_context_length
cp_ctx = get_custom_provider_context_length(
model=model,
base_url=base_url,
custom_providers=custom_providers,
)
if cp_ctx:
return cp_ctx
except Exception:
pass # fall through to probing
# Normalise provider-prefixed model names (e.g. "local:model-name" →
# "model-name") so cache lookups and server queries use the bare ID that
# local servers actually know about. Ollama "model:tag" colons are preserved.
@@ -1237,6 +1269,26 @@ def get_model_context_length(
else:
return cached
# 1b. AWS Bedrock — use static context length table.
# Bedrock's ListFoundationModels API doesn't expose context window sizes,
# so we maintain a curated table in bedrock_adapter.py that reflects
# AWS-imposed limits (e.g. 200K for Claude models vs 1M on the native
# Anthropic API). This must run BEFORE the custom-endpoint probe at
# step 2 — bedrock-runtime.<region>.amazonaws.com is not in
# _URL_TO_PROVIDER, so it would otherwise be treated as a custom endpoint,
# fail the /models probe (Bedrock doesn't expose that shape), and fall
# back to the 128K default before reaching the original step 4b branch.
if provider == "bedrock" or (
base_url
and base_url_hostname(base_url).startswith("bedrock-runtime.")
and base_url_host_matches(base_url, "amazonaws.com")
):
try:
from agent.bedrock_adapter import get_bedrock_context_length
return get_bedrock_context_length(model)
except ImportError:
pass # boto3 not installed — fall through to generic resolution
# 2. Active endpoint metadata for truly custom/unknown endpoints.
# Known providers (Copilot, OpenAI, Anthropic, etc.) skip this — their
# /models endpoint may report a provider-imposed limit (e.g. Copilot
@@ -1282,19 +1334,7 @@ def get_model_context_length(
if ctx:
return ctx
# 4b. AWS Bedrock — use static context length table.
# Bedrock's ListFoundationModels doesn't expose context window sizes,
# so we maintain a curated table in bedrock_adapter.py.
if provider == "bedrock" or (
base_url
and base_url_hostname(base_url).startswith("bedrock-runtime.")
and base_url_host_matches(base_url, "amazonaws.com")
):
try:
from agent.bedrock_adapter import get_bedrock_context_length
return get_bedrock_context_length(model)
except ImportError:
pass # boto3 not installed — fall through to generic resolution
# 4b. (Bedrock handled earlier at step 1b — before custom-endpoint probe.)
# 5. Provider-aware lookups (before generic OpenRouter cache)
# These are provider-specific and take priority over the generic OR cache,
@@ -1343,7 +1383,7 @@ def get_model_context_length(
# 6. OpenRouter live API metadata (provider-unaware fallback)
metadata = fetch_model_metadata()
if model in metadata:
return metadata[model].get("context_length", 128000)
return metadata[model].get("context_length", DEFAULT_FALLBACK_CONTEXT)
# 8. Hardcoded defaults (fuzzy match — longest key first for specificity)
# Only check `default_model in model` (is the key a substring of the input).
+142
View File
@@ -180,3 +180,145 @@ def format_remaining(seconds: float) -> str:
h, remainder = divmod(s, 3600)
m = remainder // 60
return f"{h}h {m}m" if m else f"{h}h"
# Buckets with reset windows shorter than this are treated as transient
# (upstream jitter, secondary throttling) rather than a genuine quota
# exhaustion worth a cross-session breaker trip.
_MIN_RESET_FOR_BREAKER_SECONDS = 60.0
def is_genuine_nous_rate_limit(
*,
headers: Optional[Mapping[str, str]] = None,
last_known_state: Optional[Any] = None,
) -> bool:
"""Decide whether a 429 from Nous Portal is a real account rate limit.
Nous Portal multiplexes multiple upstream providers (DeepSeek, Kimi,
MiMo, Hermes, ...) behind one endpoint. A 429 can mean either:
(a) The caller's own RPM / RPH / TPM / TPH bucket on Nous is
exhausted — a genuine rate limit that will last until the
bucket resets.
(b) The upstream provider is out of capacity for a specific model
— transient, clears in seconds, and has nothing to do with
the caller's quota on Nous.
Tripping the cross-session breaker on (b) blocks ALL Nous requests
(and all models, since Nous is one provider key) for minutes even
though the caller's account is healthy and a different model would
have worked. That's the bug users hit when DeepSeek V4 Pro 429s
trigger a breaker that then blocks Kimi 2.6 and MiMo V2.5 Pro.
We tell the two apart by looking at:
1. The 429 response's own ``x-ratelimit-*`` headers. Nous emits
the full suite on every response including 429s. An exhausted
bucket (``remaining == 0`` with a reset window >= 60s) is
proof of (a).
2. The last-known-good rate-limit state captured by
``_capture_rate_limits()`` on the previous successful
response. If any bucket there was already near-exhausted with
a substantial reset window, the current 429 is almost
certainly (a) continuing from that condition.
If neither signal fires, we treat the 429 as (b): fail the single
request, let the retry loop or model-switch proceed, and do NOT
write the cross-session breaker file.
Returns True when the evidence points at (a).
"""
# Signal 1: current 429 response headers.
state = _parse_buckets_from_headers(headers)
if _has_exhausted_bucket(state):
return True
# Signal 2: last-known-good state from a recent successful response.
# Accepts either a RateLimitState (dataclass from rate_limit_tracker)
# or a dict of bucket snapshots.
if last_known_state is not None and _has_exhausted_bucket_in_object(last_known_state):
return True
return False
def _parse_buckets_from_headers(
headers: Optional[Mapping[str, str]],
) -> dict[str, tuple[Optional[int], Optional[float]]]:
"""Extract (remaining, reset_seconds) per bucket from x-ratelimit-* headers.
Returns empty dict when no rate-limit headers are present.
"""
if not headers:
return {}
lowered = {k.lower(): v for k, v in headers.items()}
if not any(k.startswith("x-ratelimit-") for k in lowered):
return {}
def _maybe_int(raw: Optional[str]) -> Optional[int]:
if raw is None:
return None
try:
return int(float(raw))
except (TypeError, ValueError):
return None
def _maybe_float(raw: Optional[str]) -> Optional[float]:
if raw is None:
return None
try:
return float(raw)
except (TypeError, ValueError):
return None
result: dict[str, tuple[Optional[int], Optional[float]]] = {}
for tag in ("requests", "requests-1h", "tokens", "tokens-1h"):
remaining = _maybe_int(lowered.get(f"x-ratelimit-remaining-{tag}"))
reset = _maybe_float(lowered.get(f"x-ratelimit-reset-{tag}"))
if remaining is not None or reset is not None:
result[tag] = (remaining, reset)
return result
def _has_exhausted_bucket(
buckets: Mapping[str, tuple[Optional[int], Optional[float]]],
) -> bool:
"""Return True when any bucket has remaining == 0 AND a meaningful reset window."""
for remaining, reset in buckets.values():
if remaining is None or remaining > 0:
continue
if reset is None:
continue
if reset >= _MIN_RESET_FOR_BREAKER_SECONDS:
return True
return False
def _has_exhausted_bucket_in_object(state: Any) -> bool:
"""Check a RateLimitState-like object for an exhausted bucket.
Accepts the dataclass from ``agent.rate_limit_tracker`` (buckets
exposed as attributes ``requests_min``, ``requests_hour``,
``tokens_min``, ``tokens_hour``) and falls back gracefully for any
object missing those attributes.
"""
for attr in ("requests_min", "requests_hour", "tokens_min", "tokens_hour"):
bucket = getattr(state, attr, None)
if bucket is None:
continue
limit = getattr(bucket, "limit", 0) or 0
remaining = getattr(bucket, "remaining", 0) or 0
# Prefer the adjusted "remaining_seconds_now" property when present;
# fall back to raw reset_seconds.
reset = getattr(bucket, "remaining_seconds_now", None)
if reset is None:
reset = getattr(bucket, "reset_seconds", 0.0) or 0.0
if limit <= 0:
continue
if remaining > 0:
continue
if reset >= _MIN_RESET_FOR_BREAKER_SECONDS:
return True
return False
+144
View File
@@ -0,0 +1,144 @@
"""
Contextual first-touch onboarding hints.
Instead of blocking first-run questionnaires, show a one-time hint the *first*
time a user hits a behavior fork — message-while-running, first long-running
tool, etc. Each hint is shown once per install (tracked in ``config.yaml`` under
``onboarding.seen.<flag>``) and then never again.
Keep this module tiny and dependency-free so both the CLI and gateway can import
it without pulling in heavy modules.
"""
from __future__ import annotations
import logging
from pathlib import Path
from typing import Any, Mapping, Optional
logger = logging.getLogger(__name__)
# -------------------------------------------------------------------------
# Flag names (stable — used as config.yaml keys under onboarding.seen)
# -------------------------------------------------------------------------
BUSY_INPUT_FLAG = "busy_input_prompt"
TOOL_PROGRESS_FLAG = "tool_progress_prompt"
# -------------------------------------------------------------------------
# Hint content
# -------------------------------------------------------------------------
def busy_input_hint_gateway(mode: str) -> str:
"""Hint shown the first time a user messages while the agent is busy.
``mode`` is the effective busy_input_mode that was just applied, so the
message matches reality ("I just interrupted…" vs "I just queued…").
"""
if mode == "queue":
return (
"💡 First-time tip — I queued your message instead of interrupting. "
"Send `/busy interrupt` to make new messages stop the current task "
"immediately, or `/busy status` to check. This notice won't appear again."
)
return (
"💡 First-time tip — I just interrupted my current task to answer you. "
"Send `/busy queue` to queue follow-ups for after the current task instead, "
"or `/busy status` to check. This notice won't appear again."
)
def busy_input_hint_cli(mode: str) -> str:
"""CLI version of the busy-input hint (plain text, no markdown)."""
if mode == "queue":
return (
"(tip) Your message was queued for the next turn. "
"Use /busy interrupt to make Enter stop the current run instead. "
"This tip only shows once."
)
return (
"(tip) Your message interrupted the current run. "
"Use /busy queue to queue messages for the next turn instead. "
"This tip only shows once."
)
def tool_progress_hint_gateway() -> str:
return (
"💡 First-time tip — that tool took a while and I'm streaming every step. "
"If the progress messages feel noisy, send `/verbose` to cycle modes "
"(all → new → off). This notice won't appear again."
)
def tool_progress_hint_cli() -> str:
return (
"(tip) That tool ran for a while. Use /verbose to cycle tool-progress "
"display modes (all -> new -> off -> verbose). This tip only shows once."
)
# -------------------------------------------------------------------------
# State read / write
# -------------------------------------------------------------------------
def _get_seen_dict(config: Mapping[str, Any]) -> Mapping[str, Any]:
onboarding = config.get("onboarding") if isinstance(config, Mapping) else None
if not isinstance(onboarding, Mapping):
return {}
seen = onboarding.get("seen")
return seen if isinstance(seen, Mapping) else {}
def is_seen(config: Mapping[str, Any], flag: str) -> bool:
"""Return True if the user has already been shown this first-touch hint."""
return bool(_get_seen_dict(config).get(flag))
def mark_seen(config_path: Path, flag: str) -> bool:
"""Persist ``onboarding.seen.<flag> = True`` to ``config_path``.
Uses the atomic YAML writer so a concurrent process can't observe a
partially-written file. Returns True on success, False on any error
(including the config file being absent — onboarding is best-effort).
"""
try:
import yaml
from utils import atomic_yaml_write
except Exception as e: # pragma: no cover — dependency issue
logger.debug("onboarding: failed to import yaml/utils: %s", e)
return False
try:
cfg: dict = {}
if config_path.exists():
with open(config_path, encoding="utf-8") as f:
cfg = yaml.safe_load(f) or {}
if not isinstance(cfg.get("onboarding"), dict):
cfg["onboarding"] = {}
seen = cfg["onboarding"].get("seen")
if not isinstance(seen, dict):
seen = {}
cfg["onboarding"]["seen"] = seen
if seen.get(flag) is True:
return True # already marked — nothing to do
seen[flag] = True
atomic_yaml_write(config_path, cfg)
return True
except Exception as e:
logger.debug("onboarding: failed to mark flag %s: %s", flag, e)
return False
__all__ = [
"BUSY_INPUT_FLAG",
"TOOL_PROGRESS_FLAG",
"busy_input_hint_gateway",
"busy_input_hint_cli",
"tool_progress_hint_gateway",
"tool_progress_hint_cli",
"is_seen",
"mark_seen",
]
+58
View File
@@ -176,6 +176,64 @@ SKILLS_GUIDANCE = (
"Skills that aren't maintained become liabilities."
)
KANBAN_GUIDANCE = (
"# You are a Kanban worker\n"
"You were spawned by the Hermes Kanban dispatcher to execute ONE task from "
"the shared board at `~/.hermes/kanban.db`. Your task id is in "
"`$HERMES_KANBAN_TASK`; your workspace is `$HERMES_KANBAN_WORKSPACE`. "
"The `kanban_*` tools in your schema are your primary coordination surface — "
"they write directly to the shared SQLite DB and work regardless of terminal "
"backend (local/docker/modal/ssh).\n"
"\n"
"## Lifecycle\n"
"\n"
"1. **Orient.** Call `kanban_show()` first (no args — it defaults to your "
"task). The response includes title, body, parent-task handoffs (summary + "
"metadata), any prior attempts on this task if you're a retry, the full "
"comment thread, and a pre-formatted `worker_context` you can treat as "
"ground truth.\n"
"2. **Work inside the workspace.** `cd $HERMES_KANBAN_WORKSPACE` before "
"any file operations. The workspace is yours for this run. Don't modify "
"files outside it unless the task explicitly asks.\n"
"3. **Heartbeat on long operations.** Call `kanban_heartbeat(note=...)` "
"every few minutes during long subprocesses (training, encoding, crawling). "
"Skip heartbeats for short tasks.\n"
"4. **Block on genuine ambiguity.** If you need a human decision you cannot "
"infer (missing credentials, UX choice, paywalled source, peer output you "
"need first), call `kanban_block(reason=\"...\")` and stop. Don't guess. "
"The user will unblock with context and the dispatcher will respawn you.\n"
"5. **Complete with structured handoff.** Call `kanban_complete(summary=..., "
"metadata=...)`. `summary` is 13 human-readable sentences naming concrete "
"artifacts. `metadata` is machine-readable facts "
"(`{changed_files: [...], tests_run: N, decisions: [...]}`). Downstream "
"workers read both via their own `kanban_show`. Never put secrets / "
"tokens / raw PII in either field — run rows are durable forever.\n"
"6. **If follow-up work appears, create it; don't do it.** Use "
"`kanban_create(title=..., assignee=<right-profile>, parents=[your-task-id])` "
"to spawn a child task for the appropriate specialist profile instead of "
"scope-creeping into the next thing.\n"
"\n"
"## Orchestrator mode\n"
"\n"
"If your task is itself a decomposition task (e.g. a planner profile given "
"a high-level goal), use `kanban_create` to fan out into child tasks — one "
"per specialist, each with an explicit `assignee` and `parents=[...]` to "
"express dependencies. Then `kanban_complete` your own task with a summary "
"of the decomposition. Do NOT execute the work yourself; your job is "
"routing, not implementation.\n"
"\n"
"## Do NOT\n"
"\n"
"- Do not shell out to `hermes kanban <verb>` for board operations. Use "
"the `kanban_*` tools — they work across all terminal backends.\n"
"- Do not complete a task you didn't actually finish. Block it.\n"
"- Do not assign follow-up work to yourself. Assign it to the right "
"specialist profile.\n"
"- Do not call `delegate_task` as a board substitute. `delegate_task` is "
"for short reasoning subtasks inside your own run; board tasks are for "
"cross-agent handoffs that outlive one API loop."
)
TOOL_USE_ENFORCEMENT_GUIDANCE = (
"# Tool-use enforcement\n"
"You MUST use your tools to take action — do not describe what you would do "
+8 -107
View File
@@ -7,11 +7,15 @@ can invoke skills via /skill-name commands.
import json
import logging
import re
import subprocess
from pathlib import Path
from typing import Any, Dict, Optional
from hermes_constants import display_hermes_home
from agent.skill_preprocessing import (
expand_inline_shell as _expand_inline_shell,
load_skills_config as _load_skills_config,
substitute_template_vars as _substitute_template_vars,
)
logger = logging.getLogger(__name__)
@@ -20,111 +24,6 @@ _skill_commands: Dict[str, Dict[str, Any]] = {}
_SKILL_INVALID_CHARS = re.compile(r"[^a-z0-9-]")
_SKILL_MULTI_HYPHEN = re.compile(r"-{2,}")
# Matches ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in SKILL.md.
# Tokens that don't resolve (e.g. ${HERMES_SESSION_ID} with no session) are
# left as-is so the user can debug them.
_SKILL_TEMPLATE_RE = re.compile(r"\$\{(HERMES_SKILL_DIR|HERMES_SESSION_ID)\}")
# Matches inline shell snippets like: !`date +%Y-%m-%d`
# Non-greedy, single-line only — no newlines inside the backticks.
_INLINE_SHELL_RE = re.compile(r"!`([^`\n]+)`")
# Cap inline-shell output so a runaway command can't blow out the context.
_INLINE_SHELL_MAX_OUTPUT = 4000
def _load_skills_config() -> dict:
"""Load the ``skills`` section of config.yaml (best-effort)."""
try:
from hermes_cli.config import load_config
cfg = load_config() or {}
skills_cfg = cfg.get("skills")
if isinstance(skills_cfg, dict):
return skills_cfg
except Exception:
logger.debug("Could not read skills config", exc_info=True)
return {}
def _substitute_template_vars(
content: str,
skill_dir: Path | None,
session_id: str | None,
) -> str:
"""Replace ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} in skill content.
Only substitutes tokens for which a concrete value is available —
unresolved tokens are left in place so the author can spot them.
"""
if not content:
return content
skill_dir_str = str(skill_dir) if skill_dir else None
def _replace(match: re.Match) -> str:
token = match.group(1)
if token == "HERMES_SKILL_DIR" and skill_dir_str:
return skill_dir_str
if token == "HERMES_SESSION_ID" and session_id:
return str(session_id)
return match.group(0)
return _SKILL_TEMPLATE_RE.sub(_replace, content)
def _run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
"""Execute a single inline-shell snippet and return its stdout (trimmed).
Failures return a short ``[inline-shell error: ...]`` marker instead of
raising, so one bad snippet can't wreck the whole skill message.
"""
try:
completed = subprocess.run(
["bash", "-c", command],
cwd=str(cwd) if cwd else None,
capture_output=True,
text=True,
timeout=max(1, int(timeout)),
check=False,
)
except subprocess.TimeoutExpired:
return f"[inline-shell timeout after {timeout}s: {command}]"
except FileNotFoundError:
return f"[inline-shell error: bash not found]"
except Exception as exc:
return f"[inline-shell error: {exc}]"
output = (completed.stdout or "").rstrip("\n")
if not output and completed.stderr:
output = completed.stderr.rstrip("\n")
if len(output) > _INLINE_SHELL_MAX_OUTPUT:
output = output[:_INLINE_SHELL_MAX_OUTPUT] + "…[truncated]"
return output
def _expand_inline_shell(
content: str,
skill_dir: Path | None,
timeout: int,
) -> str:
"""Replace every !`cmd` snippet in ``content`` with its stdout.
Runs each snippet with the skill directory as CWD so relative paths in
the snippet work the way the author expects.
"""
if "!`" not in content:
return content
def _replace(match: re.Match) -> str:
cmd = match.group(1).strip()
if not cmd:
return ""
return _run_inline_shell(cmd, skill_dir, timeout)
return _INLINE_SHELL_RE.sub(_replace, content)
def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tuple[dict[str, Any], Path | None, str] | None:
"""Load a skill by name/path and return (loaded_payload, skill_dir, display_name)."""
raw_identifier = (skill_identifier or "").strip()
@@ -143,7 +42,9 @@ def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tu
else:
normalized = raw_identifier.lstrip("/")
loaded_skill = json.loads(skill_view(normalized, task_id=task_id))
loaded_skill = json.loads(
skill_view(normalized, task_id=task_id, preprocess=False)
)
except Exception:
return None
+131
View File
@@ -0,0 +1,131 @@
"""Shared SKILL.md preprocessing helpers."""
import logging
import re
import subprocess
from pathlib import Path
logger = logging.getLogger(__name__)
# Matches ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in SKILL.md.
# Tokens that don't resolve (e.g. ${HERMES_SESSION_ID} with no session) are
# left as-is so the user can debug them.
_SKILL_TEMPLATE_RE = re.compile(r"\$\{(HERMES_SKILL_DIR|HERMES_SESSION_ID)\}")
# Matches inline shell snippets like: !`date +%Y-%m-%d`
# Non-greedy, single-line only -- no newlines inside the backticks.
_INLINE_SHELL_RE = re.compile(r"!`([^`\n]+)`")
# Cap inline-shell output so a runaway command can't blow out the context.
_INLINE_SHELL_MAX_OUTPUT = 4000
def load_skills_config() -> dict:
"""Load the ``skills`` section of config.yaml (best-effort)."""
try:
from hermes_cli.config import load_config
cfg = load_config() or {}
skills_cfg = cfg.get("skills")
if isinstance(skills_cfg, dict):
return skills_cfg
except Exception:
logger.debug("Could not read skills config", exc_info=True)
return {}
def substitute_template_vars(
content: str,
skill_dir: Path | None,
session_id: str | None,
) -> str:
"""Replace ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} in skill content.
Only substitutes tokens for which a concrete value is available --
unresolved tokens are left in place so the author can spot them.
"""
if not content:
return content
skill_dir_str = str(skill_dir) if skill_dir else None
def _replace(match: re.Match) -> str:
token = match.group(1)
if token == "HERMES_SKILL_DIR" and skill_dir_str:
return skill_dir_str
if token == "HERMES_SESSION_ID" and session_id:
return str(session_id)
return match.group(0)
return _SKILL_TEMPLATE_RE.sub(_replace, content)
def run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
"""Execute a single inline-shell snippet and return its stdout (trimmed).
Failures return a short ``[inline-shell error: ...]`` marker instead of
raising, so one bad snippet can't wreck the whole skill message.
"""
try:
completed = subprocess.run(
["bash", "-c", command],
cwd=str(cwd) if cwd else None,
capture_output=True,
text=True,
timeout=max(1, int(timeout)),
check=False,
)
except subprocess.TimeoutExpired:
return f"[inline-shell timeout after {timeout}s: {command}]"
except FileNotFoundError:
return "[inline-shell error: bash not found]"
except Exception as exc:
return f"[inline-shell error: {exc}]"
output = (completed.stdout or "").rstrip("\n")
if not output and completed.stderr:
output = completed.stderr.rstrip("\n")
if len(output) > _INLINE_SHELL_MAX_OUTPUT:
output = output[:_INLINE_SHELL_MAX_OUTPUT] + "...[truncated]"
return output
def expand_inline_shell(
content: str,
skill_dir: Path | None,
timeout: int,
) -> str:
"""Replace every !`cmd` snippet in ``content`` with its stdout.
Runs each snippet with the skill directory as CWD so relative paths in
the snippet work the way the author expects.
"""
if "!`" not in content:
return content
def _replace(match: re.Match) -> str:
cmd = match.group(1).strip()
if not cmd:
return ""
return run_inline_shell(cmd, skill_dir, timeout)
return _INLINE_SHELL_RE.sub(_replace, content)
def preprocess_skill_content(
content: str,
skill_dir: Path | None,
session_id: str | None = None,
skills_cfg: dict | None = None,
) -> str:
"""Apply configured SKILL.md template and inline-shell preprocessing."""
if not content:
return content
cfg = skills_cfg if isinstance(skills_cfg, dict) else load_skills_config()
if cfg.get("template_vars", True):
content = substitute_template_vars(content, skill_dir, session_id)
if cfg.get("inline_shell", False):
timeout = int(cfg.get("inline_shell_timeout", 10) or 10)
content = expand_inline_shell(content, skill_dir, timeout)
return content
+7 -2
View File
@@ -23,9 +23,14 @@ def get_transport(api_mode: str):
This allows gradual migration call sites can check for None
and fall back to the legacy code path.
"""
if not _REGISTRY:
_discover_transports()
cls = _REGISTRY.get(api_mode)
if cls is None:
# The registry can be partially populated when a specific transport
# module was imported directly (for example chat_completions before
# codex). Discover on misses, not only when the registry is empty, so
# test/order-dependent imports do not make valid api_modes unavailable.
_discover_transports()
cls = _REGISTRY.get(api_mode)
if cls is None:
return None
return cls()
+5 -4
View File
@@ -31,15 +31,15 @@ class ChatCompletionsTransport(ProviderTransport):
def convert_messages(self, messages: List[Dict[str, Any]], **kwargs) -> List[Dict[str, Any]]:
"""Messages are already in OpenAI format — sanitize Codex leaks only.
Strips Codex Responses API fields (``codex_reasoning_items`` on the
message, ``call_id``/``response_item_id`` on tool_calls) that strict
chat-completions providers reject with 400/422.
Strips Codex Responses API fields (``codex_reasoning_items`` /
``codex_message_items`` on the message, ``call_id``/``response_item_id``
on tool_calls) that strict chat-completions providers reject with 400/422.
"""
needs_sanitize = False
for msg in messages:
if not isinstance(msg, dict):
continue
if "codex_reasoning_items" in msg:
if "codex_reasoning_items" in msg or "codex_message_items" in msg:
needs_sanitize = True
break
tool_calls = msg.get("tool_calls")
@@ -59,6 +59,7 @@ class ChatCompletionsTransport(ProviderTransport):
if not isinstance(msg, dict):
continue
msg.pop("codex_reasoning_items", None)
msg.pop("codex_message_items", None)
tool_calls = msg.get("tool_calls")
if isinstance(tool_calls, list):
for tc in tool_calls:
+20
View File
@@ -120,6 +120,24 @@ class ResponsesApiTransport(ProviderTransport):
if request_overrides:
kwargs.update(request_overrides)
if is_codex_backend:
prompt_cache_key = kwargs.get("prompt_cache_key")
cache_scope_id = str(prompt_cache_key or session_id or "").strip()
if cache_scope_id:
existing_extra_headers = kwargs.get("extra_headers")
merged_extra_headers: Dict[str, str] = {}
if isinstance(existing_extra_headers, dict):
merged_extra_headers.update(
{
str(key): str(value)
for key, value in existing_extra_headers.items()
if key and value is not None
}
)
merged_extra_headers["session_id"] = cache_scope_id
merged_extra_headers["x-client-request-id"] = cache_scope_id
kwargs["extra_headers"] = merged_extra_headers
max_tokens = params.get("max_tokens")
if max_tokens is not None and not is_codex_backend:
kwargs["max_output_tokens"] = max_tokens
@@ -160,6 +178,8 @@ class ResponsesApiTransport(ProviderTransport):
provider_data = {}
if msg and hasattr(msg, "codex_reasoning_items") and msg.codex_reasoning_items:
provider_data["codex_reasoning_items"] = msg.codex_reasoning_items
if msg and hasattr(msg, "codex_message_items") and msg.codex_message_items:
provider_data["codex_message_items"] = msg.codex_message_items
if msg and hasattr(msg, "reasoning_details") and msg.reasoning_details:
provider_data["reasoning_details"] = msg.reasoning_details
+6 -1
View File
@@ -97,7 +97,7 @@ class NormalizedResponse:
Response-level ``provider_data`` examples:
* Anthropic: ``{"reasoning_details": [...]}``
* Codex: ``{"codex_reasoning_items": [...]}``
* Codex: ``{"codex_reasoning_items": [...], "codex_message_items": [...]}``
* Others: ``None``
"""
@@ -126,6 +126,11 @@ class NormalizedResponse:
pd = self.provider_data or {}
return pd.get("codex_reasoning_items")
@property
def codex_message_items(self):
pd = self.provider_data or {}
return pd.get("codex_message_items")
# ---------------------------------------------------------------------------
# Factory helpers
+2 -6
View File
@@ -951,13 +951,9 @@ class BatchRunner:
root_logger.setLevel(original_level)
# Aggregate all batch statistics and update checkpoint
all_completed_prompts = list(completed_prompts_set)
total_reasoning_stats = {"total_assistant_turns": 0, "turns_with_reasoning": 0, "turns_without_reasoning": 0}
for batch_result in results:
# Add newly completed prompts
all_completed_prompts.extend(batch_result.get("completed_prompts", []))
# Aggregate tool stats
for tool_name, stats in batch_result.get("tool_stats", {}).items():
if tool_name not in total_tool_stats:
@@ -977,7 +973,7 @@ class BatchRunner:
# Save final checkpoint (best-effort; incremental writes already happened)
try:
checkpoint_data["completed_prompts"] = all_completed_prompts
checkpoint_data["completed_prompts"] = sorted(completed_prompts_set)
self._save_checkpoint(checkpoint_data, lock=checkpoint_lock)
except Exception as ckpt_err:
print(f"⚠️ Warning: Failed to save final checkpoint: {ckpt_err}")
+32 -10
View File
@@ -790,9 +790,16 @@ code_execution:
# Supports single tasks and batch mode (default 3 parallel, configurable).
delegation:
max_iterations: 50 # Max tool-calling turns per child (default: 50)
# max_concurrent_children: 3 # Max parallel child agents (default: 3)
# max_spawn_depth: 1 # Tree depth cap (1-3, default: 1 = flat). Raise to 2 or 3 to allow orchestrator children to spawn their own workers.
# max_concurrent_children: 3 # Max parallel child agents per batch (default: 3, floor: 1, no ceiling).
# WARNING: values above 10 multiply API cost linearly.
# max_spawn_depth: 1 # Delegation tree depth cap (range: 1-3, default: 1 = flat).
# Raise to 2 to allow workers to spawn their own subagents.
# Requires role="orchestrator" on intermediate agents.
# orchestrator_enabled: true # Kill switch for role="orchestrator" children (default: true).
# subagent_auto_approve: false # When a subagent hits a dangerous-command approval prompt, auto-deny (default: false)
# or auto-approve "once" (true) instead of blocking on stdin.
# The parent TUI owns stdin, so blocking would deadlock; non-interactive resolution is required.
# Both choices emit a logger.warning audit line. Flip to true only for cron/batch pipelines.
# inherit_mcp_toolsets: true # When explicit child toolsets are narrowed, also keep the parent's MCP toolsets (default: true). Set false for strict intersection.
# model: "google/gemini-3-flash-preview" # Override model for subagents (empty = inherit parent)
# provider: "openrouter" # Override provider for subagents (empty = inherit parent)
@@ -817,7 +824,9 @@ delegation:
# Display
# =============================================================================
display:
# Use compact banner mode
# Use compact banner mode (hides the ASCII-art banner, shows a single line).
# true: Compact single-line banner
# false: Full ASCII banner with tool/skill summary (default)
compact: false
# Tool progress display level (CLI and gateway)
@@ -831,12 +840,15 @@ display:
# Gateway-only natural mid-turn assistant updates.
# When true, completed assistant status messages are sent as separate chat
# messages. This is independent of tool_progress and gateway streaming.
# true: Send mid-turn assistant updates as separate messages (default)
# false: Only send the final response
interim_assistant_messages: true
# What Enter does when Hermes is already busy in the CLI.
# What Enter does when Hermes is already busy (CLI and gateway platforms).
# interrupt: Interrupt the current run and redirect Hermes (default)
# queue: Queue your message for the next turn
# Ctrl+C always interrupts regardless of this setting.
# Ctrl+C (or /stop in gateway) always interrupts regardless of this setting.
# Toggle at runtime with /busy_input_mode <interrupt|queue>.
busy_input_mode: interrupt
# Background process notifications (gateway/messaging only).
@@ -852,17 +864,22 @@ display:
# Play terminal bell when agent finishes a response.
# Useful for long-running tasks — your terminal will ding when the agent is done.
# Works over SSH. Most terminals can be configured to flash the taskbar or play a sound.
# true: Ring the terminal bell on each response
# false: Silent (default)
bell_on_complete: false
# Show model reasoning/thinking before each response.
# When enabled, a dim box shows the model's thought process above the response.
# Toggle at runtime with /reasoning show or /reasoning hide.
# true: Show the reasoning box
# false: Hide reasoning (default)
show_reasoning: false
# Stream tokens to the terminal as they arrive instead of waiting for the
# full response. The response box opens on first token and text appears
# line-by-line. Tool calls are still captured silently.
# Stream tokens to the terminal in real-time. Disable to wait for full responses.
# true: Stream tokens as they arrive (default)
# false: Wait for the full response before rendering
streaming: true
# ───────────────────────────────────────────────────────────────────────────
@@ -872,10 +889,15 @@ display:
# response box label, and branding text. Change at runtime with /skin <name>.
#
# Built-in skins:
# default — Classic Hermes gold/kawaii
# ares — Crimson/bronze war-god theme with spinner wings
# mono — Clean grayscale monochrome
# slate — Cool blue developer-focused
# default — Classic Hermes gold/kawaii
# ares — Crimson/bronze war-god theme with spinner wings
# mono — Clean grayscale monochrome
# slate — Cool blue developer-focused
# daylight — Bright light-mode theme
# warm-lightmode — Warm paper-tone light-mode theme
# poseidon — Sea-green/teal Olympian theme
# sisyphus — Earthy stone-and-moss theme
# charizard — Fiery orange dragon theme
#
# Custom skins: drop a YAML file in ~/.hermes/skins/<name>.yaml
# Schema (all fields optional, missing values inherit from default):
+231 -226
View File
@@ -22,6 +22,7 @@ import re
import concurrent.futures
import base64
import atexit
import errno
import tempfile
import time
import uuid
@@ -416,6 +417,11 @@ def load_cli_config() -> Dict[str, Any]:
"base_url": "", # Direct OpenAI-compatible endpoint for subagents
"api_key": "", # API key for delegation.base_url (falls back to OPENAI_API_KEY)
},
"onboarding": {
# First-touch hint flags (see agent/onboarding.py). Each hint is
# shown once per install then latched here.
"seen": {},
},
}
# Track whether the config file explicitly set terminal config.
@@ -3176,7 +3182,14 @@ class HermesCLI:
# the configured model (e.g. "qwen3.6-plus"), causing 400 errors.
runtime_model = runtime.get("model")
if runtime_model and isinstance(runtime_model, str):
self.model = runtime_model
# Only use runtime model if: model is unset, or model equals provider name
should_use_runtime_model = (
not self.model or # No model configured yet
self.model == self.provider or # Model is the provider slug
self.model == runtime.get("name") # Model matches provider display name
)
if should_use_runtime_model:
self.model = runtime_model
# If model is still empty (e.g. user ran `hermes auth add openai-codex`
# without `hermes model`), fall back to the provider's first catalog
@@ -4311,7 +4324,7 @@ class HermesCLI:
_cprint(f"\n {_DIM}Tip: Just type your message to chat with Hermes!{_RST}")
_cprint(f" {_DIM}Multi-line: Alt+Enter for a new line{_RST}")
_cprint(f" {_DIM}Draft editor: Ctrl+G{_RST}")
_cprint(f" {_DIM}Draft editor: Ctrl+G (Alt+G in VSCode/Cursor){_RST}")
if _is_termux_environment():
_cprint(f" {_DIM}Attach image: /image {_termux_example_image_path()} or start your prompt with a local image path{_RST}\n")
else:
@@ -4661,10 +4674,6 @@ class HermesCLI:
def new_session(self, silent=False):
"""Start a fresh session with a new session ID and cleared agent state."""
if self.agent and self.conversation_history:
try:
self.agent.flush_memories(self.conversation_history)
except (Exception, KeyboardInterrupt):
pass
# Trigger memory extraction on the old session before session_id rotates.
self.agent.commit_memory_session(self.conversation_history)
self._notify_session_boundary("on_session_finalize")
@@ -5149,27 +5158,29 @@ class HermesCLI:
_cprint(f" ✓ Model switched: {result.new_model}")
_cprint(f" Provider: {provider_label}")
# Context: always resolve via the provider-aware chain so Codex OAuth,
# Copilot, and Nous-enforced caps win over the raw models.dev entry
# (e.g. gpt-5.5 is 1.05M on openai but 272K on Codex OAuth).
mi = result.model_info
try:
from hermes_cli.model_switch import resolve_display_context_length
ctx = resolve_display_context_length(
result.new_model,
result.target_provider,
base_url=result.base_url or self.base_url or "",
api_key=result.api_key or self.api_key or "",
model_info=mi,
)
if ctx:
_cprint(f" Context: {ctx:,} tokens")
except Exception:
pass
if mi:
if mi.context_window:
_cprint(f" Context: {mi.context_window:,} tokens")
if mi.max_output:
_cprint(f" Max output: {mi.max_output:,} tokens")
if mi.has_cost_data():
_cprint(f" Cost: {mi.format_cost()}")
_cprint(f" Capabilities: {mi.format_capabilities()}")
else:
try:
from agent.model_metadata import get_model_context_length
ctx = get_model_context_length(
result.new_model,
base_url=result.base_url or self.base_url,
api_key=result.api_key or self.api_key,
provider=result.target_provider,
)
_cprint(f" Context: {ctx:,} tokens")
except Exception:
pass
cache_enabled = (
(base_url_host_matches(result.base_url or "", "openrouter.ai") and "claude" in result.new_model.lower())
@@ -5270,24 +5281,22 @@ class HermesCLI:
# Parse --provider and --global flags
model_input, explicit_provider, persist_global = parse_model_flags(raw_args)
# Load providers for switch_model (picker path needs them below)
user_provs = None
custom_provs = None
try:
from hermes_cli.config import get_compatible_custom_providers, load_config
cfg = load_config()
user_provs = cfg.get("providers")
custom_provs = get_compatible_custom_providers(cfg)
except Exception:
pass
# No args at all: open prompt_toolkit-native picker modal
if not model_input and not explicit_provider:
model_display = self.model or "unknown"
provider_display = get_label(self.provider) if self.provider else "unknown"
user_provs = None
custom_provs = None
try:
from hermes_cli.config import get_compatible_custom_providers, load_config
cfg = load_config()
user_provs = cfg.get("providers")
custom_provs = get_compatible_custom_providers(cfg)
except Exception:
pass
try:
providers = list_authenticated_providers(
current_provider=self.provider or "",
@@ -5374,29 +5383,26 @@ class HermesCLI:
_cprint(f" ✓ Model switched: {result.new_model}")
_cprint(f" Provider: {provider_label}")
# Rich metadata from models.dev
# Context: always resolve via the provider-aware chain so Codex OAuth,
# Copilot, and Nous-enforced caps win over the raw models.dev entry
# (e.g. gpt-5.5 is 1.05M on openai but 272K on Codex OAuth).
mi = result.model_info
from hermes_cli.model_switch import resolve_display_context_length
ctx = resolve_display_context_length(
result.new_model,
result.target_provider,
base_url=result.base_url or self.base_url or "",
api_key=result.api_key or self.api_key or "",
model_info=mi,
)
if ctx:
_cprint(f" Context: {ctx:,} tokens")
if mi:
if mi.context_window:
_cprint(f" Context: {mi.context_window:,} tokens")
if mi.max_output:
_cprint(f" Max output: {mi.max_output:,} tokens")
if mi.has_cost_data():
_cprint(f" Cost: {mi.format_cost()}")
_cprint(f" Capabilities: {mi.format_capabilities()}")
else:
# Fallback to old context length lookup
try:
from agent.model_metadata import get_model_context_length
ctx = get_model_context_length(
result.new_model,
base_url=result.base_url or self.base_url,
api_key=result.api_key or self.api_key,
provider=result.target_provider,
)
_cprint(f" Context: {ctx:,} tokens")
except Exception:
pass
# Cache notice
cache_enabled = (
@@ -5812,7 +5818,28 @@ class HermesCLI:
print(f"(._.) Unknown cron command: {subcommand}")
print(" Available: list, add, edit, pause, resume, run, remove")
def _handle_kanban_command(self, cmd: str):
"""Handle the /kanban command — delegate to the shared kanban CLI.
The string form passed here is the user's full ``/kanban ...``
including the leading slash; we strip it and hand the remainder
to ``kanban.run_slash`` which returns a single formatted string.
"""
from hermes_cli.kanban import run_slash
rest = cmd.strip()
if rest.startswith("/"):
rest = rest.lstrip("/")
if rest.startswith("kanban"):
rest = rest[len("kanban"):].lstrip()
try:
output = run_slash(rest)
except Exception as exc: # pragma: no cover - defensive
output = f"(._.) kanban error: {exc}"
if output:
print(output)
def _handle_skills_command(self, cmd: str):
"""Handle /skills slash command — delegates to hermes_cli.skills_hub."""
from hermes_cli.skills_hub import handle_skills_slash
@@ -6049,6 +6076,8 @@ class HermesCLI:
self.save_conversation()
elif canonical == "cron":
self._handle_cron_command(cmd_original)
elif canonical == "kanban":
self._handle_kanban_command(cmd_original)
elif canonical == "skills":
with self._busy_command(self._slow_command_status(cmd_original)):
self._handle_skills_command(cmd_original)
@@ -6123,8 +6152,6 @@ class HermesCLI:
self._handle_agents_command()
elif canonical == "background":
self._handle_background_command(cmd_original)
elif canonical == "btw":
self._handle_btw_command(cmd_original)
elif canonical == "queue":
# Extract prompt after "/queue " or "/q "
parts = cmd_original.split(None, 1)
@@ -6165,6 +6192,8 @@ class HermesCLI:
self._handle_skin_command(cmd_original)
elif canonical == "voice":
self._handle_voice_command(cmd_original)
elif canonical == "busy":
self._handle_busy_command(cmd_original)
else:
# Check for user-defined quick commands (bypass agent loop, no LLM call)
base_cmd = cmd_lower.split()[0]
@@ -6409,122 +6438,6 @@ class HermesCLI:
self._background_tasks[task_id] = thread
thread.start()
def _handle_btw_command(self, cmd: str):
"""Handle /btw <question> — ephemeral side question using session context.
Snapshots the current conversation history, spawns a no-tools agent in
a background thread, and prints the answer without persisting anything
to the main session.
"""
parts = cmd.strip().split(maxsplit=1)
if len(parts) < 2 or not parts[1].strip():
_cprint(" Usage: /btw <question>")
_cprint(" Example: /btw what module owns session title sanitization?")
_cprint(" Answers using session context. No tools, not persisted.")
return
question = parts[1].strip()
task_id = f"btw_{datetime.now().strftime('%H%M%S')}_{uuid.uuid4().hex[:6]}"
if not self._ensure_runtime_credentials():
_cprint(" (>_<) Cannot start /btw: no valid credentials.")
return
turn_route = self._resolve_turn_agent_config(question)
history_snapshot = list(self.conversation_history)
preview = question[:60] + ("..." if len(question) > 60 else "")
_cprint(f' 💬 /btw: "{preview}"')
def run_btw():
try:
btw_agent = AIAgent(
model=turn_route["model"],
api_key=turn_route["runtime"].get("api_key"),
base_url=turn_route["runtime"].get("base_url"),
provider=turn_route["runtime"].get("provider"),
api_mode=turn_route["runtime"].get("api_mode"),
acp_command=turn_route["runtime"].get("command"),
acp_args=turn_route["runtime"].get("args"),
max_iterations=8,
enabled_toolsets=[],
quiet_mode=True,
verbose_logging=False,
session_id=task_id,
platform="cli",
reasoning_config=self.reasoning_config,
service_tier=self.service_tier,
request_overrides=turn_route.get("request_overrides"),
providers_allowed=self._providers_only,
providers_ignored=self._providers_ignore,
providers_order=self._providers_order,
provider_sort=self._provider_sort,
provider_require_parameters=self._provider_require_params,
provider_data_collection=self._provider_data_collection,
fallback_model=self._fallback_model,
session_db=None,
skip_memory=True,
skip_context_files=True,
persist_session=False,
)
btw_prompt = (
"[Ephemeral /btw side question. Answer using the conversation "
"context. No tools available. Be direct and concise.]\n\n"
+ question
)
result = btw_agent.run_conversation(
user_message=btw_prompt,
conversation_history=history_snapshot,
task_id=task_id,
)
response = (result.get("final_response") or "") if result else ""
if not response and result and result.get("error"):
response = f"Error: {result['error']}"
# TUI refresh before printing
if self._app:
self._app.invalidate()
time.sleep(0.05)
print()
if response:
try:
from hermes_cli.skin_engine import get_active_skin
_skin = get_active_skin()
_resp_color = _skin.get_color("response_border", "#4F6D4A")
except Exception:
_resp_color = "#4F6D4A"
ChatConsole().print(Panel(
_render_final_assistant_content(response, mode=self.final_response_markdown),
title=f"[{_resp_color} bold]⚕ /btw[/]",
title_align="left",
border_style=_resp_color,
box=rich_box.HORIZONTALS,
padding=(1, 4),
))
else:
_cprint(" 💬 /btw: (no response)")
if self.bell_on_complete:
sys.stdout.write("\a")
sys.stdout.flush()
except Exception as e:
if self._app:
self._app.invalidate()
time.sleep(0.05)
print()
_cprint(f" ❌ /btw failed: {e}")
finally:
if self._app:
self._invalidate(min_interval=0)
thread = threading.Thread(target=run_btw, daemon=True, name=f"btw-{task_id}")
thread.start()
@staticmethod
def _try_launch_chrome_debug(port: int, system: str) -> bool:
"""Try to launch Chrome/Chromium with remote debugging enabled.
@@ -6901,6 +6814,36 @@ class HermesCLI:
else:
_cprint(f" {_ACCENT}✓ Reasoning effort set to '{arg}' (session only){_RST}")
def _handle_busy_command(self, cmd: str):
"""Handle /busy — control what Enter does while Hermes is working.
Usage:
/busy Show current busy input mode
/busy status Show current busy input mode
/busy queue Queue input for the next turn instead of interrupting
/busy interrupt Interrupt the current run on Enter (default)
"""
parts = cmd.strip().split(maxsplit=1)
if len(parts) < 2 or parts[1].strip().lower() == "status":
_cprint(f" {_ACCENT}Busy input mode: {self.busy_input_mode}{_RST}")
_cprint(f" {_DIM}Enter while busy: {'queues for next turn' if self.busy_input_mode == 'queue' else 'interrupts current run'}{_RST}")
_cprint(f" {_DIM}Usage: /busy [queue|interrupt|status]{_RST}")
return
arg = parts[1].strip().lower()
if arg not in {"queue", "interrupt"}:
_cprint(f" {_DIM}(._.) Unknown argument: {arg}{_RST}")
_cprint(f" {_DIM}Usage: /busy [queue|interrupt|status]{_RST}")
return
self.busy_input_mode = arg
if save_config_value("display.busy_input_mode", arg):
behavior = "Enter will queue follow-up input while Hermes is busy." if arg == "queue" else "Enter will interrupt the current run while Hermes is busy."
_cprint(f" {_ACCENT}✓ Busy input mode set to '{arg}' (saved to config){_RST}")
_cprint(f" {_DIM}{behavior}{_RST}")
else:
_cprint(f" {_ACCENT}✓ Busy input mode set to '{arg}' (session only){_RST}")
def _handle_fast_command(self, cmd: str):
"""Handle /fast — toggle fast mode (OpenAI Priority Processing / Anthropic Fast Mode)."""
if not self._fast_command_available():
@@ -6979,51 +6922,52 @@ class HermesCLI:
focus_topic = parts[1].strip()
original_count = len(self.conversation_history)
try:
from agent.model_metadata import estimate_messages_tokens_rough
from agent.manual_compression_feedback import summarize_manual_compression
original_history = list(self.conversation_history)
approx_tokens = estimate_messages_tokens_rough(original_history)
if focus_topic:
print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens), "
f"focus: \"{focus_topic}\"...")
else:
print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens)...")
with self._busy_command("Compressing context..."):
try:
from agent.model_metadata import estimate_messages_tokens_rough
from agent.manual_compression_feedback import summarize_manual_compression
original_history = list(self.conversation_history)
approx_tokens = estimate_messages_tokens_rough(original_history)
if focus_topic:
print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens), "
f"focus: \"{focus_topic}\"...")
else:
print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens)...")
compressed, _ = self.agent._compress_context(
original_history,
self.agent._cached_system_prompt or "",
approx_tokens=approx_tokens,
focus_topic=focus_topic or None,
)
self.conversation_history = compressed
# _compress_context ends the old session and creates a new child
# session on the agent (run_agent.py::_compress_context). Sync the
# CLI's session_id so /status, /resume, exit summary, and title
# generation all point at the live continuation session, not the
# ended parent. Without this, subsequent end_session() calls target
# the already-closed parent and the child is orphaned.
if (
getattr(self.agent, "session_id", None)
and self.agent.session_id != self.session_id
):
self.session_id = self.agent.session_id
self._pending_title = None
new_tokens = estimate_messages_tokens_rough(self.conversation_history)
summary = summarize_manual_compression(
original_history,
self.conversation_history,
approx_tokens,
new_tokens,
)
icon = "🗜️" if summary["noop"] else ""
print(f" {icon} {summary['headline']}")
print(f" {summary['token_line']}")
if summary["note"]:
print(f" {summary['note']}")
compressed, _ = self.agent._compress_context(
original_history,
self.agent._cached_system_prompt or "",
approx_tokens=approx_tokens,
focus_topic=focus_topic or None,
)
self.conversation_history = compressed
# _compress_context ends the old session and creates a new child
# session on the agent (run_agent.py::_compress_context). Sync the
# CLI's session_id so /status, /resume, exit summary, and title
# generation all point at the live continuation session, not the
# ended parent. Without this, subsequent end_session() calls target
# the already-closed parent and the child is orphaned.
if (
getattr(self.agent, "session_id", None)
and self.agent.session_id != self.session_id
):
self.session_id = self.agent.session_id
self._pending_title = None
new_tokens = estimate_messages_tokens_rough(self.conversation_history)
summary = summarize_manual_compression(
original_history,
self.conversation_history,
approx_tokens,
new_tokens,
)
icon = "🗜️" if summary["noop"] else ""
print(f" {icon} {summary['headline']}")
print(f" {summary['token_line']}")
if summary["note"]:
print(f" {summary['note']}")
except Exception as e:
print(f" ❌ Compression failed: {e}")
except Exception as e:
print(f" ❌ Compression failed: {e}")
def _handle_debug_command(self):
"""Handle /debug — upload debug report + logs and print paste URLs."""
@@ -7378,6 +7322,31 @@ class HermesCLI:
_cprint(f" {line}")
except Exception:
pass
# First-touch onboarding: on the first tool in this process
# that takes longer than the threshold while we're in the
# noisiest progress mode, print a one-time hint about
# /verbose. Latched on self so it fires at most once per
# process; persisted to config.yaml so it never fires again
# across processes either.
try:
if (
not getattr(self, "_long_tool_hint_fired", False)
and self.tool_progress_mode == "all"
and duration >= 30.0
):
from agent.onboarding import (
TOOL_PROGRESS_FLAG,
is_seen,
mark_seen,
tool_progress_hint_cli,
)
if not is_seen(CLI_CONFIG, TOOL_PROGRESS_FLAG):
self._long_tool_hint_fired = True
_cprint(f" {_DIM}{tool_progress_hint_cli()}{_RST}")
mark_seen(_hermes_home / "config.yaml", TOOL_PROGRESS_FLAG)
CLI_CONFIG.setdefault("onboarding", {}).setdefault("seen", {})[TOOL_PROGRESS_FLAG] = True
except Exception:
pass
self._invalidate()
return
if event_type != "tool.started":
@@ -9261,6 +9230,24 @@ class HermesCLI:
f"agent_running={self._agent_running}\n")
except Exception:
pass
# First-touch onboarding: on the very first busy-while-running
# event for this install, print a one-line tip explaining the
# /busy knob. Flag persists to config.yaml and never fires
# again. Guarded for exceptions so onboarding can't break
# the input loop.
try:
from agent.onboarding import (
BUSY_INPUT_FLAG,
busy_input_hint_cli,
is_seen,
mark_seen,
)
if not is_seen(CLI_CONFIG, BUSY_INPUT_FLAG):
_cprint(f" {_DIM}{busy_input_hint_cli(self.busy_input_mode)}{_RST}")
mark_seen(_hermes_home / "config.yaml", BUSY_INPUT_FLAG)
CLI_CONFIG.setdefault("onboarding", {}).setdefault("seen", {})[BUSY_INPUT_FLAG] = True
except Exception:
pass
else:
self._pending_input.put(payload)
event.app.current_buffer.reset(append_to_history=True)
@@ -9275,14 +9262,18 @@ class HermesCLI:
"""Ctrl+Enter (c-j) inserts a newline. Most terminals send c-j for Ctrl+Enter."""
event.current_buffer.insert_text('\n')
@kb.add(
'c-g',
filter=Condition(
lambda: not self._clarify_state and not self._approval_state and not self._sudo_state and not self._secret_state
),
# VSCode/Cursor bind Ctrl+G to "Find Next" at the editor level, so
# the keystroke never reaches the embedded terminal. Alt+G is unbound
# in those IDEs and arrives here as ('escape', 'g') — register it as
# a fallback so the editor handoff works inside Cursor/VSCode too.
_editor_filter = Condition(
lambda: not self._clarify_state and not self._approval_state and not self._sudo_state and not self._secret_state
)
@kb.add('c-g', filter=_editor_filter)
@kb.add('escape', 'g', filter=_editor_filter)
def handle_open_in_editor(event):
"""Ctrl+G opens the current draft in an external editor."""
"""Ctrl+G (or Alt+G in VSCode/Cursor) opens the current draft in an external editor."""
cli_ref._open_external_editor(event.current_buffer)
@kb.add('tab', eager=True)
@@ -9525,9 +9516,20 @@ class HermesCLI:
@kb.add('c-d')
def handle_ctrl_d(event):
"""Handle Ctrl+D - exit."""
self._should_exit = True
event.app.exit()
"""Ctrl+D: delete char under cursor (standard readline behaviour).
Only exit when the input is empty same as bash/zsh. Pending
attached images count as input and block the EOF-exit so the
user doesn't lose them silently.
"""
buf = event.app.current_buffer
if buf.text:
buf.delete()
elif self._attached_images:
# Empty text but pending attachments — no-op, don't exit.
return
else:
self._should_exit = True
event.app.exit()
_modal_prompt_active = Condition(
lambda: bool(self._secret_state or self._sudo_state)
@@ -9735,6 +9737,11 @@ class HermesCLI:
completer=_completer,
),
)
# Keep prompt_toolkit on its simple tempfile path. Setting
# buffer.tempfile = "prompt.md" triggers its complex-tempfile branch,
# which tries to mkdir() the mkdtemp() directory again and raises
# EEXIST. The suffix keeps markdown highlighting without that bug.
input_area.buffer.tempfile_suffix = '.md'
# Dynamic height: accounts for both explicit newlines AND visual
# wrapping of long lines so the input area always fits its content.
@@ -10687,6 +10694,8 @@ class HermesCLI:
return # silently suppress
if isinstance(exc, KeyError) and "is not registered" in str(exc):
return # suppress selector registration failures (#6393)
if isinstance(exc, OSError) and getattr(exc, "errno", None) == errno.EIO:
return # suppress I/O errors from broken stdout on interrupt (#13710)
# Fall back to default handler for everything else
loop.default_exception_handler(context)
@@ -10719,9 +10728,11 @@ class HermesCLI:
except (EOFError, KeyboardInterrupt, BrokenPipeError):
pass
except (KeyError, OSError) as _stdin_err:
# Catch selector registration failures from broken stdin (#6393).
# This is the fallback for cases that slip past the fstat() guard.
if "is not registered" in str(_stdin_err) or "Bad file descriptor" in str(_stdin_err):
# Catch selector registration failures from broken stdin (#6393)
# and I/O errors from broken stdout during interrupt (#13710).
if isinstance(_stdin_err, OSError) and getattr(_stdin_err, "errno", None) == errno.EIO:
pass # suppress broken-stdout I/O errors on interrupt (#13710)
elif "is not registered" in str(_stdin_err) or "Bad file descriptor" in str(_stdin_err):
print(
f"\nError: stdin is not usable ({_stdin_err}).\n"
"This can happen with certain Python installations (e.g. uv-managed cPython on macOS).\n"
@@ -10740,12 +10751,6 @@ class HermesCLI:
self.agent.interrupt()
except Exception:
pass
# Flush memories before exit (only for substantial conversations)
if self.agent and self.conversation_history:
try:
self.agent.flush_memories(self.conversation_history)
except (Exception, KeyboardInterrupt):
pass
# Shut down voice recorder (release persistent audio stream)
if hasattr(self, '_voice_recorder') and self._voice_recorder:
try:
+14 -1
View File
@@ -16,7 +16,7 @@ import uuid
from datetime import datetime, timedelta
from pathlib import Path
from hermes_constants import get_hermes_home
from typing import Optional, Dict, List, Any
from typing import Optional, Dict, List, Any, Union
logger = logging.getLogger(__name__)
@@ -417,6 +417,7 @@ def create_job(
provider: Optional[str] = None,
base_url: Optional[str] = None,
script: Optional[str] = None,
context_from: Optional[Union[str, List[str]]] = None,
enabled_toolsets: Optional[List[str]] = None,
workdir: Optional[str] = None,
) -> Dict[str, Any]:
@@ -438,6 +439,9 @@ def create_job(
script: Optional path to a Python script whose stdout is injected into the
prompt each run. The script runs before the agent turn, and its output
is prepended as context. Useful for data collection / change detection.
context_from: Optional job ID (or list of job IDs) whose most recent output
is injected into the prompt as context before each run.
Useful for chaining cron jobs: job A finds data, job B processes it.
enabled_toolsets: Optional list of toolset names to restrict the agent to.
When set, only tools from these toolsets are loaded, reducing
token overhead. When omitted, all default tools are loaded.
@@ -481,6 +485,14 @@ def create_job(
normalized_toolsets = normalized_toolsets or None
normalized_workdir = _normalize_workdir(workdir)
# Normalize context_from: accept str or list of str, store as list or None
if isinstance(context_from, str):
context_from = [context_from.strip()] if context_from.strip() else None
elif isinstance(context_from, list):
context_from = [str(j).strip() for j in context_from if str(j).strip()] or None
else:
context_from = None
label_source = (prompt or (normalized_skills[0] if normalized_skills else None)) or "cron job"
job = {
"id": job_id,
@@ -492,6 +504,7 @@ def create_job(
"provider": normalized_provider,
"base_url": normalized_base_url,
"script": normalized_script,
"context_from": context_from,
"schedule": parsed_schedule,
"schedule_display": parsed_schedule.get("display", schedule),
"repeat": {
+41
View File
@@ -671,6 +671,47 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
f"{prompt}"
)
# Inject output from referenced cron jobs as context.
context_from = job.get("context_from")
if context_from:
from cron.jobs import OUTPUT_DIR
if isinstance(context_from, str):
context_from = [context_from]
for source_job_id in context_from:
# Guard against path traversal — valid job IDs are 12-char hex strings
if not source_job_id or not all(c in "0123456789abcdef" for c in source_job_id):
logger.warning("context_from: skipping invalid job_id %r", source_job_id)
continue
try:
job_output_dir = OUTPUT_DIR / source_job_id
if not job_output_dir.exists():
continue # silent skip — no output yet
output_files = sorted(
job_output_dir.glob("*.md"),
key=lambda f: f.stat().st_mtime,
reverse=True,
)
if not output_files:
continue # silent skip — no output yet
latest_output = output_files[0].read_text(encoding="utf-8").strip()
# Truncate to 8K characters to avoid prompt bloat
_MAX_CONTEXT_CHARS = 8000
if len(latest_output) > _MAX_CONTEXT_CHARS:
latest_output = latest_output[:_MAX_CONTEXT_CHARS] + "\n\n[... output truncated ...]"
if latest_output:
prompt = (
f"## Output from job '{source_job_id}'\n"
"The following is the most recent output from a preceding "
"cron job. Use it as context for your analysis.\n\n"
f"```\n{latest_output}\n```\n\n"
f"{prompt}"
)
else:
continue # silent skip — empty output
except (OSError, PermissionError) as e:
logger.warning("context_from: failed to read output for job %r: %s", source_job_id, e)
# silent skip — do not pollute the prompt with error messages
# Always prepend cron execution guidance so the agent knows how
# delivery works and can suppress delivery when appropriate.
cron_hint = (
+9 -7
View File
@@ -41,6 +41,15 @@ if [ "$(id -u)" = "0" ]; then
echo "Warning: chown failed (rootless container?) — continuing anyway"
fi
# Ensure config.yaml is readable by the hermes runtime user even if it was
# edited on the host after initial ownership setup. Must run here (as root)
# rather than after the gosu drop, otherwise a non-root caller like
# `docker run -u $(id -u):$(id -g)` hits "Operation not permitted" (#15865).
if [ -f "$HERMES_HOME/config.yaml" ]; then
chown hermes:hermes "$HERMES_HOME/config.yaml" 2>/dev/null || true
chmod 640 "$HERMES_HOME/config.yaml" 2>/dev/null || true
fi
echo "Dropping root privileges"
exec gosu hermes "$0" "$@"
fi
@@ -67,13 +76,6 @@ if [ ! -f "$HERMES_HOME/config.yaml" ]; then
cp "$INSTALL_DIR/cli-config.yaml.example" "$HERMES_HOME/config.yaml"
fi
# Ensure the main config file remains accessible to the hermes runtime user
# even if it was edited on the host after initial ownership setup.
if [ -f "$HERMES_HOME/config.yaml" ]; then
chown hermes:hermes "$HERMES_HOME/config.yaml"
chmod 640 "$HERMES_HOME/config.yaml"
fi
# SOUL.md
if [ ! -f "$HERMES_HOME/SOUL.md" ]; then
cp "$INSTALL_DIR/docker/SOUL.md" "$HERMES_HOME/SOUL.md"
Binary file not shown.
+8 -3
View File
@@ -135,7 +135,7 @@ class SessionResetPolicy:
mode=mode if mode is not None else "both",
at_hour=at_hour if at_hour is not None else 4,
idle_minutes=idle_minutes if idle_minutes is not None else 1440,
notify=notify if notify is not None else True,
notify=_coerce_bool(notify, True),
notify_exclude_platforms=tuple(exclude) if exclude is not None else ("api_server", "webhook"),
)
@@ -178,7 +178,7 @@ class PlatformConfig:
home_channel = HomeChannel.from_dict(data["home_channel"])
return cls(
enabled=data.get("enabled", False),
enabled=_coerce_bool(data.get("enabled"), False),
token=data.get("token"),
api_key=data.get("api_key"),
home_channel=home_channel,
@@ -435,7 +435,7 @@ class GatewayConfig:
reset_triggers=data.get("reset_triggers", ["/new", "/reset"]),
quick_commands=quick_commands,
sessions_dir=sessions_dir,
always_log_local=data.get("always_log_local", True),
always_log_local=_coerce_bool(data.get("always_log_local"), True),
stt_enabled=_coerce_bool(stt_enabled, True),
group_sessions_per_user=_coerce_bool(group_sessions_per_user, True),
thread_sessions_per_user=_coerce_bool(thread_sessions_per_user, False),
@@ -687,6 +687,11 @@ def load_gateway_config() -> GatewayConfig:
os.environ["TELEGRAM_REACTIONS"] = str(telegram_cfg["reactions"]).lower()
if "proxy_url" in telegram_cfg and not os.getenv("TELEGRAM_PROXY"):
os.environ["TELEGRAM_PROXY"] = str(telegram_cfg["proxy_url"]).strip()
if "group_allowed_chats" in telegram_cfg and not os.getenv("TELEGRAM_GROUP_ALLOWED_USERS"):
gac = telegram_cfg["group_allowed_chats"]
if isinstance(gac, list):
gac = ",".join(str(v) for v in gac)
os.environ["TELEGRAM_GROUP_ALLOWED_USERS"] = str(gac)
if "disable_link_previews" in telegram_cfg:
plat_data = platforms_data.setdefault(Platform.TELEGRAM.value, {})
if not isinstance(plat_data, dict):
+16 -3
View File
@@ -21,6 +21,7 @@ Errors in hooks are caught and logged but never block the main pipeline.
import asyncio
import importlib.util
import sys
from typing import Any, Callable, Dict, List, Optional
import yaml
@@ -103,16 +104,28 @@ class HookRegistry:
print(f"[hooks] Skipping {hook_name}: no events declared", flush=True)
continue
# Dynamically load the handler module
# Dynamically load the handler module.
# Register in sys.modules BEFORE exec_module so Pydantic /
# dataclasses / typing introspection can resolve forward
# references (triggered by `from __future__ import annotations`
# in the handler). Without this, a handler that declares a
# Pydantic BaseModel for webhook/event payloads fails at first
# dispatch with "TypeAdapter ... is not fully defined".
module_name = f"hermes_hook_{hook_name}"
spec = importlib.util.spec_from_file_location(
f"hermes_hook_{hook_name}", handler_path
module_name, handler_path
)
if spec is None or spec.loader is None:
print(f"[hooks] Skipping {hook_name}: could not load handler.py", flush=True)
continue
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
sys.modules[module_name] = module
try:
spec.loader.exec_module(module)
except Exception:
sys.modules.pop(module_name, None)
raise
handle_fn = getattr(module, "handle", None)
if handle_fn is None:
+150 -22
View File
@@ -9,6 +9,7 @@ Exposes an HTTP server with endpoints:
- GET /v1/models lists hermes-agent as an available model
- POST /v1/runs start a run, returns run_id immediately (202)
- GET /v1/runs/{run_id}/events SSE stream of structured lifecycle events
- POST /v1/runs/{run_id}/stop interrupt a running agent
- GET /health health check
- GET /health/detailed rich status for cross-container dashboard probing
@@ -586,6 +587,9 @@ class APIServerAdapter(BasePlatformAdapter):
self._run_streams: Dict[str, "asyncio.Queue[Optional[Dict]]"] = {}
# Creation timestamps for orphaned-run TTL sweep
self._run_streams_created: Dict[str, float] = {}
# Active run agent/task references for stop support
self._active_run_agents: Dict[str, Any] = {}
self._active_run_tasks: Dict[str, "asyncio.Task"] = {}
self._session_db: Optional[Any] = None # Lazy-init SessionDB for session continuity
@staticmethod
@@ -1204,10 +1208,12 @@ class APIServerAdapter(BasePlatformAdapter):
If the client disconnects mid-stream, ``agent.interrupt()`` is
called so the agent stops issuing upstream LLM calls, then the
asyncio task is cancelled. When ``store=True`` the full response
is persisted to the ResponseStore in a ``finally`` block so GET
/v1/responses/{id} and ``previous_response_id`` chaining work the
same as the batch path.
asyncio task is cancelled. When ``store=True`` an initial
``in_progress`` snapshot is persisted immediately after
``response.created`` and disconnects update it to an
``incomplete`` snapshot so GET /v1/responses/{id} and
``previous_response_id`` chaining still have something to
recover from.
"""
import queue as _q
@@ -1269,6 +1275,60 @@ class APIServerAdapter(BasePlatformAdapter):
final_response_text = ""
agent_error: Optional[str] = None
usage: Dict[str, int] = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
terminal_snapshot_persisted = False
def _persist_response_snapshot(
response_env: Dict[str, Any],
*,
conversation_history_snapshot: Optional[List[Dict[str, Any]]] = None,
) -> None:
if not store:
return
if conversation_history_snapshot is None:
conversation_history_snapshot = list(conversation_history)
conversation_history_snapshot.append({"role": "user", "content": user_message})
self._response_store.put(response_id, {
"response": response_env,
"conversation_history": conversation_history_snapshot,
"instructions": instructions,
"session_id": session_id,
})
if conversation:
self._response_store.set_conversation(conversation, response_id)
def _persist_incomplete_if_needed() -> None:
"""Persist an ``incomplete`` snapshot if no terminal one was written.
Called from both the client-disconnect (``ConnectionResetError``)
and server-cancellation (``asyncio.CancelledError``) paths so
GET /v1/responses/{id} and ``previous_response_id`` chaining keep
working after abrupt stream termination.
"""
if not store or terminal_snapshot_persisted:
return
incomplete_text = "".join(final_text_parts) or final_response_text
incomplete_items: List[Dict[str, Any]] = list(emitted_items)
if incomplete_text:
incomplete_items.append({
"type": "message",
"role": "assistant",
"content": [{"type": "output_text", "text": incomplete_text}],
})
incomplete_env = _envelope("incomplete")
incomplete_env["output"] = incomplete_items
incomplete_env["usage"] = {
"input_tokens": usage.get("input_tokens", 0),
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
incomplete_history = list(conversation_history)
incomplete_history.append({"role": "user", "content": user_message})
if incomplete_text:
incomplete_history.append({"role": "assistant", "content": incomplete_text})
_persist_response_snapshot(
incomplete_env,
conversation_history_snapshot=incomplete_history,
)
try:
# response.created — initial envelope, status=in_progress
@@ -1278,6 +1338,7 @@ class APIServerAdapter(BasePlatformAdapter):
"type": "response.created",
"response": created_env,
})
_persist_response_snapshot(created_env)
last_activity = time.monotonic()
async def _open_message_item() -> None:
@@ -1534,6 +1595,18 @@ class APIServerAdapter(BasePlatformAdapter):
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
_failed_history = list(conversation_history)
_failed_history.append({"role": "user", "content": user_message})
if final_response_text or agent_error:
_failed_history.append({
"role": "assistant",
"content": final_response_text or agent_error,
})
_persist_response_snapshot(
failed_env,
conversation_history_snapshot=_failed_history,
)
terminal_snapshot_persisted = True
await _write_event("response.failed", {
"type": "response.failed",
"response": failed_env,
@@ -1546,30 +1619,24 @@ class APIServerAdapter(BasePlatformAdapter):
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
full_history = list(conversation_history)
full_history.append({"role": "user", "content": user_message})
if isinstance(result, dict) and result.get("messages"):
full_history.extend(result["messages"])
else:
full_history.append({"role": "assistant", "content": final_response_text})
_persist_response_snapshot(
completed_env,
conversation_history_snapshot=full_history,
)
terminal_snapshot_persisted = True
await _write_event("response.completed", {
"type": "response.completed",
"response": completed_env,
})
# Persist for future chaining / GET retrieval, mirroring
# the batch path behavior.
if store:
full_history = list(conversation_history)
full_history.append({"role": "user", "content": user_message})
if isinstance(result, dict) and result.get("messages"):
full_history.extend(result["messages"])
else:
full_history.append({"role": "assistant", "content": final_response_text})
self._response_store.put(response_id, {
"response": completed_env,
"conversation_history": full_history,
"instructions": instructions,
"session_id": session_id,
})
if conversation:
self._response_store.set_conversation(conversation, response_id)
except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError, OSError):
_persist_incomplete_if_needed()
# Client disconnected — interrupt the agent so it stops
# making upstream LLM calls, then cancel the task.
agent = agent_ref[0] if agent_ref else None
@@ -1585,6 +1652,22 @@ class APIServerAdapter(BasePlatformAdapter):
except (asyncio.CancelledError, Exception):
pass
logger.info("SSE client disconnected; interrupted agent task %s", response_id)
except asyncio.CancelledError:
# Server-side cancellation (e.g. shutdown, request timeout) —
# persist an incomplete snapshot so GET /v1/responses/{id} and
# previous_response_id chaining still work, then re-raise so the
# runtime's cancellation semantics are respected.
_persist_incomplete_if_needed()
agent = agent_ref[0] if agent_ref else None
if agent is not None:
try:
agent.interrupt("SSE task cancelled")
except Exception:
pass
if not agent_task.done():
agent_task.cancel()
logger.info("SSE task cancelled; persisted incomplete snapshot for %s", response_id)
raise
return response
@@ -2362,6 +2445,7 @@ class APIServerAdapter(BasePlatformAdapter):
stream_delta_callback=_text_cb,
tool_progress_callback=event_cb,
)
self._active_run_agents[run_id] = agent
def _run_sync():
r = agent.run_conversation(
user_message=user_message,
@@ -2401,8 +2485,11 @@ class APIServerAdapter(BasePlatformAdapter):
q.put_nowait(None)
except Exception:
pass
self._active_run_agents.pop(run_id, None)
self._active_run_tasks.pop(run_id, None)
task = asyncio.create_task(_run_and_close())
self._active_run_tasks[run_id] = task
try:
self._background_tasks.add(task)
except TypeError:
@@ -2461,6 +2548,44 @@ class APIServerAdapter(BasePlatformAdapter):
return response
async def _handle_stop_run(self, request: "web.Request") -> "web.Response":
"""POST /v1/runs/{run_id}/stop — interrupt a running agent."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
run_id = request.match_info["run_id"]
agent = self._active_run_agents.get(run_id)
task = self._active_run_tasks.get(run_id)
if agent is None and task is None:
return web.json_response(_openai_error(f"Run not found: {run_id}", code="run_not_found"), status=404)
if agent is not None:
try:
agent.interrupt("Stop requested via API")
except Exception:
pass
if task is not None and not task.done():
task.cancel()
# Bounded wait: run_conversation() executes in the default
# executor thread which task.cancel() cannot preempt — we rely on
# agent.interrupt() above to break the loop. Cap the wait so a
# slow/unresponsive interrupt can't hang this handler.
try:
await asyncio.wait_for(asyncio.shield(task), timeout=5.0)
except asyncio.TimeoutError:
logger.warning(
"[api_server] stop for run %s timed out after 5s; "
"agent may still be finishing the current step",
run_id,
)
except (asyncio.CancelledError, Exception):
pass
return web.json_response({"run_id": run_id, "status": "stopping"})
async def _sweep_orphaned_runs(self) -> None:
"""Periodically clean up run streams that were never consumed."""
while True:
@@ -2475,6 +2600,8 @@ class APIServerAdapter(BasePlatformAdapter):
logger.debug("[api_server] sweeping orphaned run %s", run_id)
self._run_streams.pop(run_id, None)
self._run_streams_created.pop(run_id, None)
self._active_run_agents.pop(run_id, None)
self._active_run_tasks.pop(run_id, None)
# ------------------------------------------------------------------
# BasePlatformAdapter interface
@@ -2510,6 +2637,7 @@ class APIServerAdapter(BasePlatformAdapter):
# Structured event streaming
self._app.router.add_post("/v1/runs", self._handle_runs)
self._app.router.add_get("/v1/runs/{run_id}/events", self._handle_run_events)
self._app.router.add_post("/v1/runs/{run_id}/stop", self._handle_stop_run)
# Start background sweep to clean up orphaned (unconsumed) run streams
sweep_task = asyncio.create_task(self._sweep_orphaned_runs())
try:
+147 -8
View File
@@ -148,7 +148,102 @@ def _detect_macos_system_proxy() -> str | None:
return None
def resolve_proxy_url(platform_env_var: str | None = None) -> str | None:
def _split_host_port(value: str) -> tuple[str, int | None]:
raw = str(value or "").strip()
if not raw:
return "", None
if "://" in raw:
parsed = urlsplit(raw)
return (parsed.hostname or "").lower().rstrip("."), parsed.port
if raw.startswith("[") and "]" in raw:
host, _, rest = raw[1:].partition("]")
port = None
if rest.startswith(":") and rest[1:].isdigit():
port = int(rest[1:])
return host.lower().rstrip("."), port
if raw.count(":") == 1:
host, _, maybe_port = raw.rpartition(":")
if maybe_port.isdigit():
return host.lower().rstrip("."), int(maybe_port)
return raw.lower().strip("[]").rstrip("."), None
def _no_proxy_entries() -> list[str]:
entries: list[str] = []
for key in ("NO_PROXY", "no_proxy"):
raw = os.environ.get(key, "")
entries.extend(part.strip() for part in raw.split(",") if part.strip())
return entries
def _no_proxy_entry_matches(entry: str, host: str, port: int | None = None) -> bool:
token = str(entry or "").strip().lower()
if not token:
return False
if token == "*":
return True
token_host, token_port = _split_host_port(token)
if token_port is not None and port is not None and token_port != port:
return False
if token_port is not None and port is None:
return False
if not token_host:
return False
try:
network = ipaddress.ip_network(token_host, strict=False)
try:
return ipaddress.ip_address(host) in network
except ValueError:
return False
except ValueError:
pass
try:
token_ip = ipaddress.ip_address(token_host)
try:
return ipaddress.ip_address(host) == token_ip
except ValueError:
return False
except ValueError:
pass
if token_host.startswith("*."):
suffix = token_host[1:]
return host.endswith(suffix)
if token_host.startswith("."):
return host == token_host[1:] or host.endswith(token_host)
return host == token_host or host.endswith(f".{token_host}")
def should_bypass_proxy(target_hosts: str | list[str] | tuple[str, ...] | set[str] | None) -> bool:
"""Return True when NO_PROXY/no_proxy matches at least one target host.
Supports exact hosts, domain suffixes, wildcard suffixes, IP literals,
CIDR ranges, optional host:port entries, and ``*``.
"""
entries = _no_proxy_entries()
if not entries or not target_hosts:
return False
if isinstance(target_hosts, str):
candidates = [target_hosts]
else:
candidates = list(target_hosts)
for candidate in candidates:
host, port = _split_host_port(str(candidate))
if not host:
continue
if any(_no_proxy_entry_matches(entry, host, port) for entry in entries):
return True
return False
def resolve_proxy_url(
platform_env_var: str | None = None,
*,
target_hosts: str | list[str] | tuple[str, ...] | set[str] | None = None,
) -> str | None:
"""Return a proxy URL from env vars, or macOS system proxy.
Check order:
@@ -156,18 +251,26 @@ def resolve_proxy_url(platform_env_var: str | None = None) -> str | None:
1. HTTPS_PROXY / HTTP_PROXY / ALL_PROXY (and lowercase variants)
2. macOS system proxy via ``scutil --proxy`` (auto-detect)
Returns *None* if no proxy is found.
Returns *None* if no proxy is found, or if NO_PROXY/no_proxy matches one
of ``target_hosts``.
"""
if platform_env_var:
value = (os.environ.get(platform_env_var) or "").strip()
if value:
if should_bypass_proxy(target_hosts):
return None
return normalize_proxy_url(value)
for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
"https_proxy", "http_proxy", "all_proxy"):
value = (os.environ.get(key) or "").strip()
if value:
if should_bypass_proxy(target_hosts):
return None
return normalize_proxy_url(value)
return normalize_proxy_url(_detect_macos_system_proxy())
detected = normalize_proxy_url(_detect_macos_system_proxy())
if detected and should_bypass_proxy(target_hosts):
return None
return detected
def proxy_kwargs_for_bot(proxy_url: str | None) -> dict:
@@ -922,7 +1025,20 @@ class BasePlatformAdapter(ABC):
self._post_delivery_callbacks: Dict[str, Any] = {}
self._expected_cancelled_tasks: set[asyncio.Task] = set()
self._busy_session_handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]] = None
# Chats where auto-TTS on voice input is disabled (set by /voice off)
# Auto-TTS on voice input: ``_auto_tts_default`` is the global default
# (``voice.auto_tts`` in config.yaml, pushed by GatewayRunner on connect).
# Per-chat overrides live in two sets populated from ``_voice_mode``:
# - ``_auto_tts_enabled_chats``: chat explicitly opted in via ``/voice on``
# or ``/voice tts`` (mode is ``voice_only`` or ``all``). Fires even when
# the global default is False.
# - ``_auto_tts_disabled_chats``: chat explicitly opted out via
# ``/voice off`` (mode is ``off``). Suppresses auto-TTS even when the
# global default is True.
# The gate in _process_message() is:
# fire if chat in _auto_tts_enabled_chats
# OR (_auto_tts_default and chat not in _auto_tts_disabled_chats)
self._auto_tts_default: bool = False
self._auto_tts_enabled_chats: set = set()
self._auto_tts_disabled_chats: set = set()
# Chats where typing indicator is paused (e.g. during approval waits).
# _keep_typing skips send_typing when the chat_id is in this set.
@@ -944,6 +1060,21 @@ class BasePlatformAdapter(ABC):
def fatal_error_retryable(self) -> bool:
return self._fatal_error_retryable
def _should_auto_tts_for_chat(self, chat_id: str) -> bool:
"""Whether auto-TTS on voice input should fire for ``chat_id``.
Decision layers (Issue #16007):
1. Explicit ``/voice on`` or ``/voice tts`` always fire (even if
``voice.auto_tts`` is False).
2. Explicit ``/voice off`` never fire.
3. Fall back to the global ``voice.auto_tts`` config default.
"""
if chat_id in self._auto_tts_enabled_chats:
return True
if chat_id in self._auto_tts_disabled_chats:
return False
return bool(self._auto_tts_default)
def set_fatal_error_handler(self, handler: Callable[["BasePlatformAdapter"], Awaitable[None] | None]) -> None:
self._fatal_error_handler = handler
@@ -2111,12 +2242,14 @@ class BasePlatformAdapter(ABC):
logger.info("[%s] extract_local_files found %d file(s) in response", self.name, len(local_files))
# Auto-TTS: if voice message, generate audio FIRST (before sending text)
# Skipped when the chat has voice mode disabled (/voice off)
# Gated via ``_should_auto_tts_for_chat``: fires when the chat has
# an explicit ``/voice on|tts`` opt-in OR when ``voice.auto_tts`` is
# True globally and no ``/voice off`` has been issued.
_tts_path = None
if (event.message_type == MessageType.VOICE
if (self._should_auto_tts_for_chat(event.source.chat_id)
and event.message_type == MessageType.VOICE
and text_content
and not media_files
and event.source.chat_id not in self._auto_tts_disabled_chats):
and not media_files):
try:
from tools.tts_tool import text_to_speech_tool, check_tts_requirements
if check_tts_requirements():
@@ -2440,6 +2573,9 @@ class BasePlatformAdapter(ABC):
user_id_alt: Optional[str] = None,
chat_id_alt: Optional[str] = None,
is_bot: bool = False,
guild_id: Optional[str] = None,
parent_chat_id: Optional[str] = None,
message_id: Optional[str] = None,
) -> SessionSource:
"""Helper to build a SessionSource for this platform."""
# Normalize empty topic to None
@@ -2457,6 +2593,9 @@ class BasePlatformAdapter(ABC):
user_id_alt=user_id_alt,
chat_id_alt=chat_id_alt,
is_bot=is_bot,
guild_id=str(guild_id) if guild_id else None,
parent_chat_id=str(parent_chat_id) if parent_chat_id else None,
message_id=str(message_id) if message_id else None,
)
@abstractmethod
+19 -2
View File
@@ -99,6 +99,7 @@ def _normalize_server_url(raw: str) -> str:
class BlueBubblesAdapter(BasePlatformAdapter):
platform = Platform.BLUEBUBBLES
SUPPORTS_MESSAGE_EDITING = False
MAX_MESSAGE_LENGTH = MAX_TEXT_LENGTH
def __init__(self, config: PlatformConfig):
@@ -391,6 +392,13 @@ class BlueBubblesAdapter(BasePlatformAdapter):
# Text sending
# ------------------------------------------------------------------
@staticmethod
def truncate_message(content: str, max_length: int = MAX_TEXT_LENGTH) -> List[str]:
# Use the base splitter but skip pagination indicators — iMessage
# bubbles flow naturally without "(1/3)" suffixes.
chunks = BasePlatformAdapter.truncate_message(content, max_length)
return [re.sub(r"\s*\(\d+/\d+\)$", "", c) for c in chunks]
async def send(
self,
chat_id: str,
@@ -398,10 +406,19 @@ class BlueBubblesAdapter(BasePlatformAdapter):
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
text = strip_markdown(content or "")
text = self.format_message(content)
if not text:
return SendResult(success=False, error="BlueBubbles send requires text")
chunks = self.truncate_message(text, max_length=self.MAX_MESSAGE_LENGTH)
# Split on paragraph breaks first (double newlines) so each thought
# becomes its own iMessage bubble, then truncate any that are still
# too long.
paragraphs = [p.strip() for p in re.split(r'\n\s*\n', text) if p.strip()]
chunks: List[str] = []
for para in (paragraphs or [text]):
if len(para) <= self.MAX_MESSAGE_LENGTH:
chunks.append(para)
else:
chunks.extend(self.truncate_message(para, max_length=self.MAX_MESSAGE_LENGTH))
last = SendResult(success=True)
for chunk in chunks:
guid = await self._resolve_chat_guid(chat_id)
+4 -5
View File
@@ -2315,11 +2315,6 @@ class DiscordAdapter(BasePlatformAdapter):
async def slash_background(interaction: discord.Interaction, prompt: str):
await self._run_simple_slash(interaction, f"/background {prompt}", "Background task started~")
@tree.command(name="btw", description="Ephemeral side question using session context")
@discord.app_commands.describe(question="Your side question (no tools, not persisted)")
async def slash_btw(interaction: discord.Interaction, question: str):
await self._run_simple_slash(interaction, f"/btw {question}")
# ── Auto-register any gateway-available commands not yet on the tree ──
# This ensures new commands added to COMMAND_REGISTRY in
# hermes_cli/commands.py automatically appear as Discord slash
@@ -3261,6 +3256,7 @@ class DiscordAdapter(BasePlatformAdapter):
if auto_thread and not skip_thread and not is_voice_linked_channel and not is_reply_message:
thread = await self._auto_create_thread(message)
if thread:
parent_channel_id = str(message.channel.id)
is_thread = True
thread_id = str(thread.id)
auto_threaded_channel = thread
@@ -3320,6 +3316,9 @@ class DiscordAdapter(BasePlatformAdapter):
thread_id=thread_id,
chat_topic=chat_topic,
is_bot=getattr(message.author, "bot", False),
guild_id=str(message.guild.id) if message.guild else None,
parent_chat_id=parent_channel_id,
message_id=str(message.id),
)
# Build media URLs -- download image attachments to local cache so the
+14
View File
@@ -532,6 +532,20 @@ class MatrixAdapter(BasePlatformAdapter):
)
await crypto_store.open()
# Bind the store to the runtime device_id before any
# put_account() runs. PgCryptoStore defaults _device_id
# to "" and its crypto_account UPSERT never updates the
# device_id column on conflict — so once put_account
# writes blank, it stays blank forever. That breaks
# every downstream device-scoped olm operation: peer
# to-device ciphertext can't find our identity key and
# no megolm sessions ever land. Setting _device_id here
# (in-memory; the on-disk row may not exist yet) makes
# the first put_account write the correct value.
# DeviceID is a NewType(str) so plain str works at runtime.
if client.device_id:
await crypto_store.put_device_id(client.device_id)
crypto_state = _CryptoStateStore(state_store, self._joined_rooms)
olm = OlmMachine(client, crypto_store, crypto_state)
+2 -1
View File
@@ -703,7 +703,6 @@ class TelegramAdapter(BasePlatformAdapter):
"write_timeout": _env_float("HERMES_TELEGRAM_HTTP_WRITE_TIMEOUT", 20.0),
}
proxy_url = resolve_proxy_url("TELEGRAM_PROXY")
disable_fallback = (os.getenv("HERMES_TELEGRAM_DISABLE_FALLBACK_IPS", "").strip().lower() in ("1", "true", "yes", "on"))
fallback_ips = self._fallback_ips()
if not fallback_ips:
@@ -714,6 +713,8 @@ class TelegramAdapter(BasePlatformAdapter):
", ".join(fallback_ips),
)
proxy_targets = ["api.telegram.org", *fallback_ips]
proxy_url = resolve_proxy_url("TELEGRAM_PROXY", target_hosts=proxy_targets)
if fallback_ips and not proxy_url and not disable_fallback:
logger.info(
"[%s] Telegram fallback IPs active: %s",
+3 -3
View File
@@ -43,10 +43,10 @@ _DOH_PROVIDERS: list[dict] = [
_SEED_FALLBACK_IPS: list[str] = ["149.154.167.220"]
def _resolve_proxy_url() -> str | None:
def _resolve_proxy_url(target_hosts=None) -> str | None:
# Delegate to shared implementation (env vars + macOS system proxy detection)
from gateway.platforms.base import resolve_proxy_url
return resolve_proxy_url("TELEGRAM_PROXY")
return resolve_proxy_url("TELEGRAM_PROXY", target_hosts=target_hosts)
class TelegramFallbackTransport(httpx.AsyncBaseTransport):
@@ -60,7 +60,7 @@ class TelegramFallbackTransport(httpx.AsyncBaseTransport):
def __init__(self, fallback_ips: Iterable[str], **transport_kwargs):
self._fallback_ips = [ip for ip in dict.fromkeys(_normalize_fallback_ips(fallback_ips))]
proxy_url = _resolve_proxy_url()
proxy_url = _resolve_proxy_url(target_hosts=[_TELEGRAM_API_HOST, *self._fallback_ips])
if proxy_url and "proxy" not in transport_kwargs:
transport_kwargs["proxy"] = proxy_url
self._primary = httpx.AsyncHTTPTransport(**transport_kwargs)
+709 -470
View File
File diff suppressed because it is too large Load Diff
+99 -16
View File
@@ -60,6 +60,10 @@ from .config import (
SessionResetPolicy, # noqa: F401 — re-exported via gateway/__init__.py
HomeChannel,
)
from .whatsapp_identity import (
canonical_whatsapp_identifier,
normalize_whatsapp_identifier,
)
@dataclass
@@ -83,6 +87,9 @@ class SessionSource:
user_id_alt: Optional[str] = None # Platform-specific stable alt ID (Signal UUID, Feishu union_id)
chat_id_alt: Optional[str] = None # Signal group internal ID
is_bot: bool = False # True when the message author is a bot/webhook (Discord)
guild_id: Optional[str] = None # Discord guild / Slack workspace / Matrix server scope
parent_chat_id: Optional[str] = None # Parent channel when chat_id refers to a thread
message_id: Optional[str] = None # ID of the triggering message (for pin/reply/react)
@property
def description(self) -> str:
@@ -120,8 +127,14 @@ class SessionSource:
d["user_id_alt"] = self.user_id_alt
if self.chat_id_alt:
d["chat_id_alt"] = self.chat_id_alt
if self.guild_id:
d["guild_id"] = self.guild_id
if self.parent_chat_id:
d["parent_chat_id"] = self.parent_chat_id
if self.message_id:
d["message_id"] = self.message_id
return d
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "SessionSource":
return cls(
@@ -135,6 +148,9 @@ class SessionSource:
chat_topic=data.get("chat_topic"),
user_id_alt=data.get("user_id_alt"),
chat_id_alt=data.get("chat_id_alt"),
guild_id=data.get("guild_id"),
parent_chat_id=data.get("parent_chat_id"),
message_id=data.get("message_id"),
)
@@ -186,6 +202,31 @@ that requires raw IDs). Discord is excluded because mentions use ``<@user_id>``
and the LLM needs the real ID to tag users."""
def _discord_tools_loaded() -> bool:
"""True iff the agent will actually have Discord tools this session.
Two conditions must hold:
1. The `discord` or `discord_admin` toolset is enabled for the
Discord platform via `hermes tools` (opt-in, default OFF).
2. `DISCORD_BOT_TOKEN` is set the tool's `check_fn` gates on it
at registry time, so the toolset being enabled in config is not
enough if the token isn't configured.
Returns False (safe default keeps the stale-API disclaimer) on any
error so a bad config can't silently promise tools the agent lacks.
"""
if not (os.environ.get("DISCORD_BOT_TOKEN") or "").strip():
return False
try:
from hermes_cli.config import load_config
from hermes_cli.tools_config import _get_platform_tools
cfg = load_config()
enabled = _get_platform_tools(cfg, "discord", include_default_mcp_servers=False)
return "discord" in enabled or "discord_admin" in enabled
except Exception:
return False
def build_session_context_prompt(
context: SessionContext,
*,
@@ -273,13 +314,44 @@ def build_session_context_prompt(
"that you can only read messages sent directly to you and respond."
)
elif context.source.platform == Platform.DISCORD:
# Inject the Discord IDs block only when the agent actually has
# Discord tools loaded this session — i.e. the user opted into
# `discord` / `discord_admin` via `hermes tools` AND the bot
# token is configured. Otherwise keep the stale-API disclaimer
# honest so we never promise tools the agent lacks.
if _discord_tools_loaded():
src = context.source
id_lines = ["", "**Discord IDs (for the `discord` / `discord_admin` tools):**"]
if src.guild_id:
id_lines.append(f" - Guild: `{src.guild_id}`")
if src.thread_id and src.parent_chat_id:
id_lines.append(f" - Parent channel: `{src.parent_chat_id}`")
id_lines.append(f" - Thread: `{src.thread_id}` (use as `channel_id` for fetch_messages etc.)")
else:
id_lines.append(f" - Channel: `{src.chat_id}`")
if src.message_id:
id_lines.append(f" - Triggering message: `{src.message_id}`")
lines.extend(id_lines)
else:
lines.append("")
lines.append(
"**Platform notes:** You are running inside Discord. "
"You do NOT have access to Discord-specific APIs — you cannot search "
"channel history, pin messages, manage roles, or list server members. "
"Do not promise to perform these actions. If the user asks, explain "
"that you can only read messages sent directly to you and respond."
)
elif context.source.platform == Platform.BLUEBUBBLES:
lines.append("")
lines.append(
"**Platform notes:** You are running inside Discord. "
"You do NOT have access to Discord-specific APIs — you cannot search "
"channel history, pin messages, manage roles, or list server members. "
"Do not promise to perform these actions. If the user asks, explain "
"that you can only read messages sent directly to you and respond."
"**Platform notes:** You are responding via iMessage. "
"Keep responses short and conversational — think texts, not essays. "
"Structure longer replies as separate short thoughts, each separated "
"by a blank line (double newline). Each block between blank lines "
"will be delivered as its own iMessage bubble, so write accordingly: "
"one idea per bubble, 13 sentences each. "
"If the user needs a detailed answer, give the short version first "
"and offer to elaborate."
)
# Connected platforms
@@ -367,11 +439,11 @@ class SessionEntry:
auto_reset_reason: Optional[str] = None # "idle" or "daily"
reset_had_activity: bool = False # whether the expired session had any messages
# Set by the background expiry watcher after it successfully flushes
# memories for this session. Persisted to sessions.json so the flag
# survives gateway restarts (the old in-memory _pre_flushed_sessions
# set was lost on restart, causing redundant re-flushes).
memory_flushed: bool = False
# Set by the background expiry watcher after it finalizes an expired
# session (invoking on_session_finalize hooks and evicting the cached
# agent). Persisted to sessions.json so the flag survives gateway
# restarts — prevents redundant finalization runs.
expiry_finalized: bool = False
# When True the next call to get_or_create_session() will auto-reset
# this session (create a new session_id) so the user starts fresh.
@@ -407,7 +479,7 @@ class SessionEntry:
"last_prompt_tokens": self.last_prompt_tokens,
"estimated_cost_usd": self.estimated_cost_usd,
"cost_status": self.cost_status,
"memory_flushed": self.memory_flushed,
"expiry_finalized": self.expiry_finalized,
"suspended": self.suspended,
"resume_pending": self.resume_pending,
"resume_reason": self.resume_reason,
@@ -459,7 +531,7 @@ class SessionEntry:
last_prompt_tokens=data.get("last_prompt_tokens", 0),
estimated_cost_usd=data.get("estimated_cost_usd", 0.0),
cost_status=data.get("cost_status", "unknown"),
memory_flushed=data.get("memory_flushed", False),
expiry_finalized=data.get("expiry_finalized", data.get("memory_flushed", False)),
suspended=data.get("suspended", False),
resume_pending=data.get("resume_pending", False),
resume_reason=data.get("resume_reason"),
@@ -518,15 +590,24 @@ def build_session_key(
"""
platform = source.platform.value
if source.chat_type == "dm":
if source.chat_id:
dm_chat_id = source.chat_id
if source.platform == Platform.WHATSAPP:
dm_chat_id = canonical_whatsapp_identifier(source.chat_id)
if dm_chat_id:
if source.thread_id:
return f"agent:main:{platform}:dm:{source.chat_id}:{source.thread_id}"
return f"agent:main:{platform}:dm:{source.chat_id}"
return f"agent:main:{platform}:dm:{dm_chat_id}:{source.thread_id}"
return f"agent:main:{platform}:dm:{dm_chat_id}"
if source.thread_id:
return f"agent:main:{platform}:dm:{source.thread_id}"
return f"agent:main:{platform}:dm"
participant_id = source.user_id_alt or source.user_id
if participant_id and source.platform == Platform.WHATSAPP:
# Same JID/LID-flip bug as the DM case: without canonicalisation, a
# single group member gets two isolated per-user sessions when the
# bridge reshuffles alias forms.
participant_id = canonical_whatsapp_identifier(str(participant_id)) or participant_id
key_parts = ["agent:main", platform, source.chat_type]
if source.chat_id:
@@ -1151,6 +1232,7 @@ class SessionStore:
reasoning_content=message.get("reasoning_content") if message.get("role") == "assistant" else None,
reasoning_details=message.get("reasoning_details") if message.get("role") == "assistant" else None,
codex_reasoning_items=message.get("codex_reasoning_items") if message.get("role") == "assistant" else None,
codex_message_items=message.get("codex_message_items") if message.get("role") == "assistant" else None,
)
except Exception as e:
logger.debug("Session DB operation failed: %s", e)
@@ -1183,6 +1265,7 @@ class SessionStore:
reasoning_content=msg.get("reasoning_content") if role == "assistant" else None,
reasoning_details=msg.get("reasoning_details") if role == "assistant" else None,
codex_reasoning_items=msg.get("codex_reasoning_items") if role == "assistant" else None,
codex_message_items=msg.get("codex_message_items") if role == "assistant" else None,
)
except Exception as e:
logger.debug("Failed to rewrite transcript in DB: %s", e)
+135
View File
@@ -0,0 +1,135 @@
"""Shared helpers for canonicalising WhatsApp sender identity.
WhatsApp's bridge can surface the same human under two different JID shapes
within a single conversation:
- LID form: ``999999999999999@lid``
- Phone form: ``15551234567@s.whatsapp.net``
Both the authorisation path (:mod:`gateway.run`) and the session-key path
(:mod:`gateway.session`) need to collapse these aliases to a single stable
identity. This module is the single source of truth for that resolution so
the two paths can never drift apart.
Public helpers:
- :func:`normalize_whatsapp_identifier` strip JID/LID/device/plus syntax
down to the bare numeric identifier.
- :func:`canonical_whatsapp_identifier` walk the bridge's
``lid-mapping-*.json`` files and return a stable canonical identity
across phone/LID variants.
- :func:`expand_whatsapp_aliases` return the full alias set for an
identifier. Used by authorisation code that needs to match any known
form of a sender against an allow-list.
Plugins that need per-sender behaviour on WhatsApp (role-based routing,
per-contact authorisation, policy gating in a gateway hook) should use
``canonical_whatsapp_identifier`` so their bookkeeping lines up with
Hermes' own session keys.
"""
from __future__ import annotations
import json
from typing import Set
from hermes_constants import get_hermes_home
def normalize_whatsapp_identifier(value: str) -> str:
"""Strip WhatsApp JID/LID syntax down to its stable numeric identifier.
Accepts any of the identifier shapes the WhatsApp bridge may emit:
``"60123456789@s.whatsapp.net"``, ``"60123456789:47@s.whatsapp.net"``,
``"60123456789@lid"``, or a bare ``"+601****6789"`` / ``"60123456789"``.
Returns just the numeric identifier (``"60123456789"``) suitable for
equality comparisons.
Useful for plugins that want to match sender IDs against
user-supplied config (phone numbers in ``config.yaml``) without
worrying about which variant the bridge happens to deliver.
"""
return (
str(value or "")
.strip()
.replace("+", "", 1)
.split(":", 1)[0]
.split("@", 1)[0]
)
def expand_whatsapp_aliases(identifier: str) -> Set[str]:
"""Resolve WhatsApp phone/LID aliases via bridge session mapping files.
Returns the set of all identifiers transitively reachable through the
bridge's ``$HERMES_HOME/whatsapp/session/lid-mapping-*.json`` files,
starting from ``identifier``. The result always includes the
normalized input itself, so callers can safely ``in`` check against
the return value without a separate fallback branch.
Returns an empty set if ``identifier`` normalizes to empty.
"""
normalized = normalize_whatsapp_identifier(identifier)
if not normalized:
return set()
session_dir = get_hermes_home() / "whatsapp" / "session"
resolved: Set[str] = set()
queue = [normalized]
while queue:
current = queue.pop(0)
if not current or current in resolved:
continue
resolved.add(current)
for suffix in ("", "_reverse"):
mapping_path = session_dir / f"lid-mapping-{current}{suffix}.json"
if not mapping_path.exists():
continue
try:
mapped = normalize_whatsapp_identifier(
json.loads(mapping_path.read_text(encoding="utf-8"))
)
except Exception:
continue
if mapped and mapped not in resolved:
queue.append(mapped)
return resolved
def canonical_whatsapp_identifier(identifier: str) -> str:
"""Return a stable WhatsApp sender identity across phone-JID/LID variants.
WhatsApp may surface the same person under either a phone-format JID
(``60123456789@s.whatsapp.net``) or a LID (``1234567890@lid``). This
applies to a DM ``chat_id`` *and* to the ``participant_id`` of a
member inside a group chat both represent a user identity, and the
bridge may flip between the two for the same human.
This helper reads the bridge's ``whatsapp/session/lid-mapping-*.json``
files, walks the mapping transitively, and picks the shortest
(numeric-preferred) alias as the canonical identity.
:func:`gateway.session.build_session_key` uses this for both WhatsApp
DM chat_ids and WhatsApp group participant_ids, so callers get the
same session-key identity Hermes itself uses.
Plugins that need per-sender behaviour (role-based routing,
authorisation, per-contact policy) should use this so their
bookkeeping lines up with Hermes' session bookkeeping even when
the bridge reshuffles aliases.
Returns an empty string if ``identifier`` normalizes to empty. If no
mapping files exist yet (fresh bridge install), returns the
normalized input unchanged.
"""
normalized = normalize_whatsapp_identifier(identifier)
if not normalized:
return ""
# expand_whatsapp_aliases always includes `normalized` itself in the
# returned set, so the min() below degrades gracefully to `normalized`
# when no lid-mapping files are present.
aliases = expand_whatsapp_aliases(normalized)
return min(aliases, key=lambda candidate: (len(candidate), candidate))
+22 -3
View File
@@ -356,6 +356,14 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
api_key_env_vars=(),
base_url_env_var="BEDROCK_BASE_URL",
),
"azure-foundry": ProviderConfig(
id="azure-foundry",
name="Azure Foundry",
auth_type="api_key",
inference_base_url="", # User-provided endpoint
api_key_env_vars=("AZURE_FOUNDRY_API_KEY",),
base_url_env_var="AZURE_FOUNDRY_BASE_URL",
),
}
@@ -743,7 +751,18 @@ def _load_auth_store(auth_file: Optional[Path] = None) -> Dict[str, Any]:
try:
raw = json.loads(auth_file.read_text())
except Exception:
except Exception as exc:
corrupt_path = auth_file.with_suffix(".json.corrupt")
try:
import shutil
shutil.copy2(auth_file, corrupt_path)
except Exception:
pass
logger.warning(
"auth: failed to parse %s (%s) — starting with empty store. "
"Corrupt file preserved at %s",
auth_file, exc, corrupt_path,
)
return {"version": AUTH_STORE_VERSION, "providers": {}}
if isinstance(raw, dict) and (
@@ -4225,10 +4244,10 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
)
from hermes_cli.models import (
_PROVIDER_MODELS, get_pricing_for_provider,
get_curated_nous_model_ids, get_pricing_for_provider,
check_nous_free_tier, partition_nous_models_by_tier,
)
model_ids = _PROVIDER_MODELS.get("nous", [])
model_ids = get_curated_nous_model_ids()
print()
unavailable_models: list = []
+300
View File
@@ -0,0 +1,300 @@
"""Azure Foundry endpoint auto-detection.
Inspect an Azure AI Foundry / Azure OpenAI endpoint to determine:
- API transport (OpenAI-style ``chat_completions`` vs
Anthropic-style ``anthropic_messages``)
- Available models (best effort Azure does not expose a deployment
listing via the inference API key, but Azure OpenAI v1 endpoints
return the resource's model catalog via ``GET /models``)
- Context length for each discovered/entered model, via the existing
:func:`agent.model_metadata.get_model_context_length` resolver.
Rationale:
Azure has no pure-API-key deployment-listing endpoint per Microsoft,
deployment enumeration requires ARM management-plane auth. Azure
OpenAI v1 endpoints ``{resource}.openai.azure.com/openai/v1`` do return
a ``/models`` list, but it reflects the resource's *available* models
rather than the user's *deployed* deployment names. In practice it is
still a useful hint the user picks a familiar model name and we look
up its context length from the catalog.
The detector never crashes on errors (every HTTP call is wrapped in a
broad try/except). Callers get a :class:`DetectionResult` with whatever
information could be gathered, and fall back to manual entry for the
rest.
"""
from __future__ import annotations
import json
import logging
import re
from dataclasses import dataclass, field
from typing import Optional
from urllib import request as urllib_request
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse, urlunparse
logger = logging.getLogger(__name__)
# Default Azure OpenAI ``api-version`` to probe with. The v1 GA endpoint
# accepts requests without ``api-version`` entirely, so this is only used
# as a fallback for pre-v1 resources that still require it.
_AZURE_OPENAI_PROBE_API_VERSIONS = (
"2025-04-01-preview",
"2024-10-21", # oldest GA that supports /models
)
# Default Azure Anthropic ``api-version``. Matches the value used by
# ``agent/anthropic_adapter.py`` when building the Anthropic client.
_AZURE_ANTHROPIC_API_VERSION = "2025-04-15"
@dataclass
class DetectionResult:
"""Everything auto-detection could gather from a base URL + API key."""
#: Detected API transport: ``"chat_completions"``,
#: ``"anthropic_messages"``, or ``None`` when detection failed.
api_mode: Optional[str] = None
#: Deployment / model IDs returned by ``/models`` (best effort).
#: Empty when the endpoint doesn't expose the list with an API key.
models: list[str] = field(default_factory=list)
#: Lowercased host from the base URL (used for display messages).
hostname: str = ""
#: Human-readable reason the detector chose ``api_mode``. Useful
#: for explaining auto-detection to the user in the wizard.
reason: str = ""
#: ``True`` when ``/models`` returned a valid OpenAI-shaped payload.
models_probe_ok: bool = False
#: ``True`` when the URL was determined to be an Anthropic-style
#: endpoint (from path suffix or live probe).
is_anthropic: bool = False
def _http_get_json(url: str, api_key: str, timeout: float = 6.0) -> tuple[int, Optional[dict]]:
"""GET a URL with ``api-key`` + ``Authorization`` headers. Return
``(status_code, parsed_json_or_None)``. Never raises."""
req = urllib_request.Request(url, method="GET")
# Azure OpenAI uses ``api-key``. Some Azure deployments (and
# Anthropic-style routes) use ``Authorization: Bearer``. Send both
# so we probe once per URL rather than twice.
req.add_header("api-key", api_key)
req.add_header("Authorization", f"Bearer {api_key}")
req.add_header("User-Agent", "hermes-agent/azure-detect")
try:
with urllib_request.urlopen(req, timeout=timeout) as resp:
body = resp.read()
try:
return resp.status, json.loads(body.decode("utf-8", errors="replace"))
except Exception:
return resp.status, None
except HTTPError as exc:
return exc.code, None
except (URLError, TimeoutError, OSError) as exc:
logger.debug("azure_detect: GET %s failed: %s", url, exc)
return 0, None
except Exception as exc: # pragma: no cover — defensive
logger.debug("azure_detect: GET %s unexpected error: %s", url, exc)
return 0, None
def _strip_trailing_v1(url: str) -> str:
"""Strip trailing ``/v1`` or ``/v1/`` so we can construct sub-paths."""
return re.sub(r"/v1/?$", "", url.rstrip("/"))
def _looks_like_anthropic_path(url: str) -> bool:
"""Return True when the URL's path ends in ``/anthropic`` or
contains a ``/anthropic/`` segment. Used by Azure Foundry
resources that route Claude traffic through a dedicated path."""
try:
parsed = urlparse(url)
path = (parsed.path or "").lower().rstrip("/")
return path.endswith("/anthropic") or "/anthropic/" in path + "/"
except Exception:
return False
def _extract_model_ids(payload: dict) -> list[str]:
"""Extract a list of model IDs from an OpenAI-shaped ``/models``
response. Returns ``[]`` on any shape mismatch."""
data = payload.get("data") if isinstance(payload, dict) else None
if not isinstance(data, list):
return []
ids: list[str] = []
for item in data:
if not isinstance(item, dict):
continue
# OpenAI shape: {"id": "gpt-5.4", "object": "model", ...}
mid = item.get("id") or item.get("model") or item.get("name")
if isinstance(mid, str) and mid:
ids.append(mid)
return ids
def _probe_openai_models(base_url: str, api_key: str) -> tuple[bool, list[str]]:
"""Probe ``<base>/models`` for an OpenAI-shaped response.
Returns ``(ok, models)``. ``ok`` is True iff the endpoint accepted
us as an OpenAI-style caller (200 OK + OpenAI-shaped JSON body).
"""
base_url = base_url.rstrip("/")
# Azure OpenAI v1: {resource}.openai.azure.com/openai/v1 — no
# api-version required for GA paths, so probe without first.
candidates = [f"{base_url}/models"]
# Fallback: explicit api-version for pre-v1 resources
for v in _AZURE_OPENAI_PROBE_API_VERSIONS:
candidates.append(f"{base_url}/models?api-version={v}")
for url in candidates:
status, body = _http_get_json(url, api_key)
if status == 200 and body is not None:
ids = _extract_model_ids(body)
if ids:
logger.info(
"azure_detect: /models probe OK at %s (%d models)",
url, len(ids),
)
return True, ids
# 200 + empty list still counts as "OpenAI shape, no models
# listed" — let the user proceed with manual entry.
if isinstance(body, dict) and "data" in body:
return True, []
return False, []
def _probe_anthropic_messages(base_url: str, api_key: str) -> bool:
"""Send a zero-token request to ``<base>/v1/messages`` and check
whether the endpoint at least *recognises* the Anthropic Messages
shape (any 4xx that mentions ``messages`` or ``model``, or a 400
``invalid_request`` with an Anthropic error shape). Never completes
a real chat.
"""
base = _strip_trailing_v1(base_url)
url = f"{base}/v1/messages?api-version={_AZURE_ANTHROPIC_API_VERSION}"
payload = json.dumps({
"model": "probe",
"max_tokens": 1,
"messages": [{"role": "user", "content": "ping"}],
}).encode("utf-8")
req = urllib_request.Request(url, method="POST", data=payload)
req.add_header("api-key", api_key)
req.add_header("Authorization", f"Bearer {api_key}")
req.add_header("anthropic-version", "2023-06-01")
req.add_header("content-type", "application/json")
req.add_header("User-Agent", "hermes-agent/azure-detect")
try:
with urllib_request.urlopen(req, timeout=6.0) as resp:
# Should never 200 — "probe" isn't a real deployment. But
# if it does, the endpoint definitely speaks Anthropic.
return resp.status < 500
except HTTPError as exc:
# 4xx with an Anthropic-shaped error body = Anthropic endpoint.
try:
body = exc.read().decode("utf-8", errors="replace")
lowered = body.lower()
if "anthropic" in lowered or '"type"' in lowered and '"error"' in lowered:
return True
# Pre-Azure-v1 Azure Foundry returns a plain 404 for
# Anthropic-style calls on non-Anthropic deployments. A
# 400 "model not found" IS Anthropic though.
if exc.code == 400 and ("messages" in lowered or "model" in lowered):
return True
return False
except Exception:
return False
except (URLError, TimeoutError, OSError):
return False
except Exception: # pragma: no cover
return False
def detect(base_url: str, api_key: str) -> DetectionResult:
"""Inspect an Azure endpoint and describe its transport + models.
Call this from the wizard before asking the user to pick an API
mode manually. The caller should treat the returned
:class:`DetectionResult` as *advisory* if ``api_mode`` is None,
fall back to asking the user.
"""
result = DetectionResult()
try:
parsed = urlparse(base_url)
result.hostname = (parsed.hostname or "").lower()
except Exception:
result.hostname = ""
# 1. Path sniff. Azure Foundry exposes Anthropic-style deployments
# under a dedicated ``/anthropic`` path.
if _looks_like_anthropic_path(base_url):
result.is_anthropic = True
result.api_mode = "anthropic_messages"
result.reason = "URL path ends in /anthropic → Anthropic Messages API"
return result
# 2. Try the OpenAI-style /models probe. If this works, the
# endpoint definitely speaks OpenAI wire.
ok, models = _probe_openai_models(base_url, api_key)
if ok:
result.models_probe_ok = True
result.models = models
result.api_mode = "chat_completions"
result.reason = (
f"GET /models returned {len(models)} model(s) — OpenAI-style endpoint"
if models
else "GET /models returned an OpenAI-shaped empty list — OpenAI-style endpoint"
)
return result
# 3. Fallback: probe the Anthropic Messages shape. Slower and more
# intrusive than /models, so only run it when the OpenAI probe
# failed.
if _probe_anthropic_messages(base_url, api_key):
result.is_anthropic = True
result.api_mode = "anthropic_messages"
result.reason = "Endpoint accepts Anthropic Messages shape"
return result
# Nothing matched. Caller falls back to manual selection.
result.reason = (
"Could not probe endpoint (private network, missing model list, or "
"non-standard path) — falling back to manual API-mode selection"
)
return result
def lookup_context_length(model: str, base_url: str, api_key: str) -> Optional[int]:
"""Thin wrapper around :func:`agent.model_metadata.get_model_context_length`
that returns ``None`` when only the fallback default (128k) would
fire, so the wizard can distinguish "we actually know this" from
"we guessed."""
try:
from agent.model_metadata import (
DEFAULT_FALLBACK_CONTEXT,
get_model_context_length,
)
except Exception:
return None
try:
n = get_model_context_length(model, base_url=base_url, api_key=api_key)
except Exception as exc:
logger.debug("azure_detect: context length lookup failed: %s", exc)
return None
if isinstance(n, int) and n > 0 and n != DEFAULT_FALLBACK_CONTEXT:
return n
return None
__all__ = ["DetectionResult", "detect", "lookup_context_length"]
+11 -4
View File
@@ -84,9 +84,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("deny", "Deny a pending dangerous command", "Session",
gateway_only=True),
CommandDef("background", "Run a prompt in the background", "Session",
aliases=("bg",), args_hint="<prompt>"),
CommandDef("btw", "Ephemeral side question using session context (no tools, not persisted)", "Session",
args_hint="<question>"),
aliases=("bg", "btw"), args_hint="<prompt>"),
CommandDef("agents", "Show active agents and running tasks", "Session",
aliases=("tasks",)),
CommandDef("queue", "Queue a prompt for the next turn (doesn't interrupt)", "Session",
@@ -103,7 +101,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
# Configuration
CommandDef("config", "Show current configuration", "Configuration",
cli_only=True),
CommandDef("model", "Switch model for this session", "Configuration", args_hint="[model] [--provider name] [--global]"),
CommandDef("model", "Switch model for this session", "Configuration",
aliases=("provider",), args_hint="[model] [--provider name] [--global]"),
CommandDef("gquota", "Show Google Gemini Code Assist quota usage", "Info",
cli_only=True),
@@ -126,6 +125,9 @@ COMMAND_REGISTRY: list[CommandDef] = [
cli_only=True, args_hint="[name]"),
CommandDef("voice", "Toggle voice mode", "Configuration",
args_hint="[on|off|tts|status]", subcommands=("on", "off", "tts", "status")),
CommandDef("busy", "Control what Enter does while Hermes is working", "Configuration",
cli_only=True, args_hint="[queue|interrupt|status]",
subcommands=("queue", "interrupt", "status")),
# Tools & Skills
CommandDef("tools", "Manage tools: /tools [list|disable|enable] [name...]", "Tools & Skills",
@@ -138,6 +140,11 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("cron", "Manage scheduled tasks", "Tools & Skills",
cli_only=True, args_hint="[subcommand]",
subcommands=("list", "add", "create", "edit", "pause", "resume", "run", "remove")),
CommandDef("kanban", "Multi-profile collaboration board (tasks, links, comments)",
"Tools & Skills", args_hint="[subcommand]",
subcommands=("list", "ls", "show", "create", "assign", "link", "unlink",
"claim", "comment", "complete", "block", "unblock", "archive",
"tail", "dispatch", "context", "init", "gc")),
CommandDef("reload", "Reload .env variables into the running session", "Tools & Skills",
cli_only=True),
CommandDef("reload-mcp", "Reload MCP servers from config", "Tools & Skills",
+118 -9
View File
@@ -612,14 +612,6 @@ DEFAULT_CONFIG = {
"timeout": 30,
"extra_body": {},
},
"flush_memories": {
"provider": "auto",
"model": "",
"base_url": "",
"api_key": "",
"timeout": 30,
"extra_body": {},
},
"title_generation": {
"provider": "auto",
"model": "",
@@ -783,6 +775,15 @@ DEFAULT_CONFIG = {
# warning log if out of range.
"max_spawn_depth": 1, # depth cap (1 = flat [default], 2 = orchestrator→leaf, 3 = three-level)
"orchestrator_enabled": True, # kill switch for role="orchestrator"
# When a subagent hits a dangerous-command approval prompt, the parent's
# prompt_toolkit TUI owns stdin — a thread-local input() call from the
# subagent worker would deadlock the parent UI. To avoid the deadlock,
# subagent threads ALWAYS resolve approvals non-interactively:
# false (default) → auto-deny with a logger.warning audit line (safe)
# true → auto-approve "once" with a logger.warning audit line
# Flip to true only if you trust delegated work to run dangerous cmds
# without human review (cron pipelines, batch automation, etc.).
"subagent_auto_approve": False,
},
# Ephemeral prefill messages file — JSON list of {role, content} dicts
@@ -839,7 +840,7 @@ DEFAULT_CONFIG = {
"auto_thread": True, # Auto-create threads on @mention in channels (like Slack)
"reactions": True, # Add 👀/✅/❌ reactions to messages during processing
"channel_prompts": {}, # Per-channel ephemeral system prompts (forum parents apply to child threads)
# discord_server tool: restrict which actions the agent may call.
# discord / discord_admin tools: restrict which actions the agent may call.
# Default (empty) = all actions allowed (subject to bot privileged intents).
# Accepts comma-separated string ("list_guilds,list_channels,fetch_messages")
# or YAML list. Unknown names are dropped with a warning at load time.
@@ -958,6 +959,27 @@ DEFAULT_CONFIG = {
"backup_count": 3, # Number of rotated backup files to keep
},
# Remotely-hosted model catalog manifest. When enabled, the CLI fetches
# curated model lists for OpenRouter and Nous Portal from this URL,
# falling back to the in-repo snapshot on network failure. Lets us
# update model picker lists without shipping a hermes-agent release.
# The default URL is served by the docs site GitHub Pages deploy.
"model_catalog": {
"enabled": True,
"url": "https://hermes-agent.nousresearch.com/docs/api/model-catalog.json",
# Disk cache TTL in hours. Beyond this, the CLI refetches on the
# next /model or `hermes model` invocation; network failures
# silently fall back to the stale cache.
"ttl_hours": 24,
# Optional per-provider override URLs for third parties that want
# to self-host their own curation list using the same schema.
# Example:
# providers:
# openrouter:
# url: https://example.com/my-curation.json
"providers": {},
},
# Network settings — workarounds for connectivity issues.
"network": {
# Force IPv4 connections. On servers with broken or unreachable IPv6,
@@ -994,6 +1016,13 @@ DEFAULT_CONFIG = {
"min_interval_hours": 24,
},
# Contextual first-touch onboarding hints (see agent/onboarding.py).
# Each hint is shown once per install and then latched here so it
# never fires again. Users can wipe the section to re-see all hints.
"onboarding": {
"seen": {},
},
# Config schema version - bump this when adding new required fields
"_config_version": 22,
}
@@ -1370,6 +1399,21 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"AZURE_FOUNDRY_API_KEY": {
"description": "Azure Foundry API key for custom Azure endpoints",
"prompt": "Azure Foundry API Key",
"url": "https://ai.azure.com/",
"password": True,
"category": "provider",
},
"AZURE_FOUNDRY_BASE_URL": {
"description": "Azure Foundry base URL (set via 'hermes model' for endpoint-specific config)",
"prompt": "Azure Foundry base URL",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
# ── Tool API keys ──
"EXA_API_KEY": {
@@ -2205,6 +2249,71 @@ def get_compatible_custom_providers(
return compatible
def get_custom_provider_context_length(
model: str,
base_url: str,
custom_providers: Optional[List[Dict[str, Any]]] = None,
config: Optional[Dict[str, Any]] = None,
) -> Optional[int]:
"""Look up a per-model ``context_length`` override from ``custom_providers``.
Matches any entry whose ``base_url`` equals ``base_url`` (trailing-slash
insensitive) and returns ``custom_providers[i].models.<model>.context_length``
if present and valid. Returns ``None`` when no override applies.
This is the single source of truth for custom-provider context overrides,
used by:
* ``AIAgent.__init__`` (startup resolution)
* ``AIAgent.switch_model`` (mid-session ``/model`` switch)
* ``hermes_cli.model_switch.resolve_display_context_length`` (``/model`` confirmation display)
* ``gateway.run._format_session_info`` (``/info`` display)
* ``agent.model_metadata.get_model_context_length`` (when custom_providers is threaded through)
Before this helper existed, the lookup was duplicated in ``run_agent.py``'s
startup path only; every other path (notably ``/model`` switch) fell back
to the 128K default. See #15779.
"""
if not model or not base_url:
return None
if custom_providers is None:
try:
custom_providers = get_compatible_custom_providers(config)
except Exception:
if config is None:
return None
raw = config.get("custom_providers")
custom_providers = raw if isinstance(raw, list) else []
if not isinstance(custom_providers, list):
return None
target_url = (base_url or "").rstrip("/")
if not target_url:
return None
for entry in custom_providers:
if not isinstance(entry, dict):
continue
entry_url = (entry.get("base_url") or "").rstrip("/")
if not entry_url or entry_url != target_url:
continue
models = entry.get("models")
if not isinstance(models, dict):
continue
model_cfg = models.get(model)
if not isinstance(model_cfg, dict):
continue
raw_ctx = model_cfg.get("context_length")
if raw_ctx is None:
continue
try:
ctx = int(raw_ctx)
except (TypeError, ValueError):
continue
if ctx > 0:
return ctx
return None
def check_config_version() -> Tuple[int, int]:
"""
Check config version.
+5 -1
View File
@@ -320,7 +320,11 @@ def run_doctor(args):
known_providers.add("custom:" + name.lower().replace(" ", "-"))
canonical_provider = provider
if provider and _resolve_provider_full is not None and provider != "auto":
if (
provider
and _resolve_provider_full is not None
and provider not in ("auto", "custom")
):
provider_def = _resolve_provider_full(provider, user_providers, custom_providers)
canonical_provider = provider_def.id if provider_def is not None else None
+361
View File
@@ -0,0 +1,361 @@
"""
hermes fallback manage the fallback provider chain.
Fallback providers are tried in order when the primary model fails with
rate-limit, overload, or connection errors. See:
https://hermes-agent.nousresearch.com/docs/user-guide/features/fallback-providers
Subcommands:
hermes fallback [list] Show the current fallback chain (default when no subcommand)
hermes fallback add Pick provider + model via the same picker as `hermes model`,
then append the selection to the chain
hermes fallback remove Pick an entry to delete from the chain
hermes fallback clear Remove all fallback entries
Storage: ``fallback_providers`` in ``~/.hermes/config.yaml`` (top-level, list of
``{provider, model, base_url?, api_mode?}`` dicts). The legacy single-dict
``fallback_model`` format is migrated to the new list format on first add.
"""
from __future__ import annotations
import copy
from typing import Any, Dict, List, Optional
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _read_chain(config: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Return the normalized fallback chain as a list of dicts.
Accepts both the new list format (``fallback_providers``) and the legacy
single-dict format (``fallback_model``). The returned list is always a
fresh copy callers can mutate without touching the config dict.
"""
chain = config.get("fallback_providers") or []
if isinstance(chain, list):
result = [dict(e) for e in chain if isinstance(e, dict) and e.get("provider") and e.get("model")]
if result:
return result
legacy = config.get("fallback_model")
if isinstance(legacy, dict) and legacy.get("provider") and legacy.get("model"):
return [dict(legacy)]
if isinstance(legacy, list):
return [dict(e) for e in legacy if isinstance(e, dict) and e.get("provider") and e.get("model")]
return []
def _write_chain(config: Dict[str, Any], chain: List[Dict[str, Any]]) -> None:
"""Persist the chain to ``fallback_providers`` and clear legacy key."""
config["fallback_providers"] = chain
# Drop the legacy single-dict key on write so there's only one source of truth.
if "fallback_model" in config:
config.pop("fallback_model", None)
def _format_entry(entry: Dict[str, Any]) -> str:
"""One-line human-readable rendering of a fallback entry."""
provider = entry.get("provider", "?")
model = entry.get("model", "?")
base = entry.get("base_url")
suffix = f" [{base}]" if base else ""
return f"{model} (via {provider}){suffix}"
def _extract_fallback_from_model_cfg(model_cfg: Any) -> Optional[Dict[str, Any]]:
"""Pull the ``{provider, model, base_url?, api_mode?}`` dict from a ``config["model"]`` snapshot."""
if not isinstance(model_cfg, dict):
return None
provider = (model_cfg.get("provider") or "").strip()
# The picker writes the selected model to ``model.default``.
model = (model_cfg.get("default") or model_cfg.get("model") or "").strip()
if not provider or not model:
return None
entry: Dict[str, Any] = {"provider": provider, "model": model}
base_url = (model_cfg.get("base_url") or "").strip()
if base_url:
entry["base_url"] = base_url
api_mode = (model_cfg.get("api_mode") or "").strip()
if api_mode:
entry["api_mode"] = api_mode
return entry
def _snapshot_auth_active_provider() -> Any:
"""Return the current ``active_provider`` in auth.json, or a sentinel if unavailable."""
try:
from hermes_cli.auth import _load_auth_store
store = _load_auth_store()
return store.get("active_provider")
except Exception:
return None
def _restore_auth_active_provider(value: Any) -> None:
"""Write back a previously snapshotted ``active_provider`` value."""
try:
from hermes_cli.auth import _auth_store_lock, _load_auth_store, _save_auth_store
with _auth_store_lock():
store = _load_auth_store()
store["active_provider"] = value
_save_auth_store(store)
except Exception:
# Best-effort — if auth.json can't be restored, the user's primary
# provider may have been deactivated by the picker. They can re-run
# `hermes model` to fix it. Don't fail the fallback add.
pass
# ---------------------------------------------------------------------------
# Subcommand handlers
# ---------------------------------------------------------------------------
def cmd_fallback_list(args) -> None: # noqa: ARG001
"""Print the current fallback chain."""
from hermes_cli.config import load_config
config = load_config()
chain = _read_chain(config)
print()
if not chain:
print(" No fallback providers configured.")
print()
print(" Add one with: hermes fallback add")
print()
return
primary = _describe_primary(config)
if primary:
print(f" Primary: {primary}")
print()
print(f" Fallback chain ({len(chain)} {'entry' if len(chain) == 1 else 'entries'}):")
for i, entry in enumerate(chain, 1):
print(f" {i}. {_format_entry(entry)}")
print()
print(" Tried in order when the primary fails (rate-limit, 5xx, connection errors).")
print(" Docs: https://hermes-agent.nousresearch.com/docs/user-guide/features/fallback-providers")
print()
def _describe_primary(config: Dict[str, Any]) -> Optional[str]:
"""One-line description of the primary model for display purposes."""
model_cfg = config.get("model")
if isinstance(model_cfg, dict):
provider = (model_cfg.get("provider") or "?").strip() or "?"
model = (model_cfg.get("default") or model_cfg.get("model") or "?").strip() or "?"
return f"{model} (via {provider})"
if isinstance(model_cfg, str) and model_cfg.strip():
return model_cfg.strip()
return None
def cmd_fallback_add(args) -> None:
"""Launch the same picker as `hermes model`, then append the selection to the chain."""
from hermes_cli.main import _require_tty, select_provider_and_model
from hermes_cli.config import load_config, save_config
_require_tty("fallback add")
# Snapshot BEFORE the picker runs so we can distinguish "user actually
# picked something" from "user cancelled" by comparing before/after.
before_cfg = load_config()
model_before = copy.deepcopy(before_cfg.get("model"))
active_provider_before = _snapshot_auth_active_provider()
print()
print(" Adding a fallback provider. The picker below is the same one used by")
print(" `hermes model` — select the provider + model you want as a fallback.")
print()
try:
select_provider_and_model(args=args)
except SystemExit:
# Some provider flows exit on auth failure — restore state and re-raise.
_restore_model_cfg(model_before)
_restore_auth_active_provider(active_provider_before)
raise
# Read the post-picker state to see what the user selected.
after_cfg = load_config()
model_after = after_cfg.get("model")
new_entry = _extract_fallback_from_model_cfg(model_after)
if not new_entry:
# Picker didn't complete (user cancelled or flow bailed). Nothing to do.
_restore_model_cfg(model_before)
_restore_auth_active_provider(active_provider_before)
print()
print(" No fallback added.")
return
# Picker picked the same thing that's already the primary → nothing changed,
# and there's nothing useful to add as a fallback to itself.
primary_entry = _extract_fallback_from_model_cfg(model_before)
if primary_entry and primary_entry["provider"] == new_entry["provider"] \
and primary_entry["model"] == new_entry["model"]:
_restore_model_cfg(model_before)
_restore_auth_active_provider(active_provider_before)
print()
print(f" Selected model matches the current primary ({_format_entry(new_entry)}).")
print(" A provider cannot be a fallback for itself — no change.")
return
# Reload the config with the primary restored, then append the new entry
# to ``fallback_providers``. We deliberately re-load (rather than mutating
# ``after_cfg``) because the picker may have touched other top-level keys
# (custom_providers, providers credentials) that we want to keep.
_restore_model_cfg(model_before)
_restore_auth_active_provider(active_provider_before)
final_cfg = load_config()
chain = _read_chain(final_cfg)
# Reject exact-duplicate fallback entries.
for existing in chain:
if existing.get("provider") == new_entry["provider"] \
and existing.get("model") == new_entry["model"]:
print()
print(f" {_format_entry(new_entry)} is already in the fallback chain — skipped.")
return
chain.append(new_entry)
_write_chain(final_cfg, chain)
save_config(final_cfg)
print()
print(f" Added fallback: {_format_entry(new_entry)}")
print(f" Chain is now {len(chain)} {'entry' if len(chain) == 1 else 'entries'} long.")
print()
print(" Run `hermes fallback list` to view, or `hermes fallback remove` to delete.")
def _restore_model_cfg(model_before: Any) -> None:
"""Restore ``config["model"]`` to a previously-captured snapshot."""
from hermes_cli.config import load_config, save_config
cfg = load_config()
if model_before is None:
cfg.pop("model", None)
else:
cfg["model"] = copy.deepcopy(model_before)
save_config(cfg)
def cmd_fallback_remove(args) -> None: # noqa: ARG001
"""Pick an entry from the chain and remove it."""
from hermes_cli.config import load_config, save_config
config = load_config()
chain = _read_chain(config)
if not chain:
print()
print(" No fallback providers configured — nothing to remove.")
print()
return
choices = [_format_entry(e) for e in chain]
choices.append("Cancel")
try:
from hermes_cli.setup import _curses_prompt_choice
idx = _curses_prompt_choice("Select a fallback to remove:", choices, 0)
except Exception:
idx = _numbered_pick("Select a fallback to remove:", choices)
if idx is None or idx < 0 or idx >= len(chain):
print()
print(" Cancelled — no change.")
return
removed = chain.pop(idx)
_write_chain(config, chain)
save_config(config)
print()
print(f" Removed fallback: {_format_entry(removed)}")
if chain:
print(f" Chain is now {len(chain)} {'entry' if len(chain) == 1 else 'entries'} long.")
else:
print(" Fallback chain is now empty.")
print()
def cmd_fallback_clear(args) -> None: # noqa: ARG001
"""Remove all fallback entries (with confirmation)."""
from hermes_cli.config import load_config, save_config
config = load_config()
chain = _read_chain(config)
if not chain:
print()
print(" No fallback providers configured — nothing to clear.")
print()
return
print()
print(f" Current fallback chain ({len(chain)} {'entry' if len(chain) == 1 else 'entries'}):")
for i, entry in enumerate(chain, 1):
print(f" {i}. {_format_entry(entry)}")
print()
try:
resp = input(" Clear all entries? [y/N]: ").strip().lower()
except (KeyboardInterrupt, EOFError):
print()
print(" Cancelled.")
return
if resp not in ("y", "yes"):
print(" Cancelled — no change.")
return
_write_chain(config, [])
save_config(config)
print()
print(" Fallback chain cleared.")
print()
def _numbered_pick(question: str, choices: List[str]) -> Optional[int]:
"""Fallback numbered-list picker when curses is unavailable."""
print(question)
for i, c in enumerate(choices, 1):
print(f" {i}. {c}")
print()
while True:
try:
val = input(f"Choice [1-{len(choices)}]: ").strip()
if not val:
return None
idx = int(val) - 1
if 0 <= idx < len(choices):
return idx
print(f"Please enter 1-{len(choices)}")
except ValueError:
print("Please enter a number")
except (KeyboardInterrupt, EOFError):
print()
return None
# ---------------------------------------------------------------------------
# Dispatch
# ---------------------------------------------------------------------------
def cmd_fallback(args) -> None:
"""Top-level dispatcher for ``hermes fallback [subcommand]``."""
sub = getattr(args, "fallback_command", None)
if sub in (None, "", "list", "ls"):
cmd_fallback_list(args)
elif sub == "add":
cmd_fallback_add(args)
elif sub in ("remove", "rm"):
cmd_fallback_remove(args)
elif sub == "clear":
cmd_fallback_clear(args)
else:
print(f"Unknown fallback subcommand: {sub}")
print("Use one of: list, add, remove, clear")
raise SystemExit(2)
+1
View File
@@ -125,6 +125,7 @@ _DEFAULT_PAYLOADS = {
"task_id": "test-task",
"tool_call_id": "test-call",
"result": '{"output": "hello"}',
"duration_ms": 42,
},
"pre_llm_call": {
"session_id": "test-session",
+1281
View File
File diff suppressed because it is too large Load Diff
File diff suppressed because it is too large Load Diff
+595 -35
View File
@@ -839,6 +839,8 @@ def _find_bundled_tui(tui_dir: Path) -> Optional[Path]:
def _tui_build_needed(tui_dir: Path) -> bool:
if _hermes_ink_bundle_stale(tui_dir):
return True
entry = tui_dir / "dist" / "entry.js"
if not entry.exists():
return True
@@ -1026,7 +1028,12 @@ def _make_tui_argv(tui_dir: Path, tui_dev: bool) -> tuple[list[str], Path]:
return [node, str(root / "dist" / "entry.js")], root
def _launch_tui(resume_session_id: Optional[str] = None, tui_dev: bool = False):
def _launch_tui(
resume_session_id: Optional[str] = None,
tui_dev: bool = False,
model: Optional[str] = None,
provider: Optional[str] = None,
):
"""Replace current process with the TUI."""
tui_dir = PROJECT_ROOT / "ui-tui"
@@ -1036,6 +1043,12 @@ def _launch_tui(resume_session_id: Optional[str] = None, tui_dev: bool = False):
)
env.setdefault("HERMES_PYTHON", sys.executable)
env.setdefault("HERMES_CWD", os.getcwd())
if model:
env["HERMES_MODEL"] = model
env["HERMES_INFERENCE_MODEL"] = model
if provider:
env["HERMES_TUI_PROVIDER"] = provider
env["HERMES_INFERENCE_PROVIDER"] = provider
# Guarantee an 8GB V8 heap + exposed GC for the TUI. Default node cap is
# ~1.54GB depending on version and can fatal-OOM on long sessions with
# large transcripts / reasoning blobs. Token-level merge: respect any
@@ -1174,6 +1187,8 @@ def cmd_chat(args):
_launch_tui(
getattr(args, "resume", None),
tui_dev=getattr(args, "tui_dev", False),
model=getattr(args, "model", None),
provider=getattr(args, "provider", None),
)
# Import and run the CLI
@@ -1512,6 +1527,83 @@ def select_provider_and_model(args=None):
all_providers = [(p.slug, p.tui_desc) for p in CANONICAL_PROVIDERS]
def _named_custom_provider_map(cfg) -> dict[str, dict[str, str]]:
from hermes_cli.config import read_raw_config
# Build a lookup of raw (un-expanded) api_key templates keyed by a
# stable identity. We intentionally bypass
# ``get_compatible_custom_providers(read_raw_config())`` here because
# its ``_normalize_custom_provider_entry`` step calls ``urlparse()``
# on ``base_url`` and drops any entry whose ``base_url`` is itself an
# env-ref template (e.g. ``${NEURALWATT_API_BASE}``). Dropping those
# entries is exactly how env-ref preservation fails for the user
# config that motivated this fix.
raw_api_key_refs: dict[tuple, str] = {}
raw_cfg = read_raw_config()
def _record_raw(
name: str,
provider_key: str,
model: str,
api_key: str,
) -> None:
template = str(api_key or "").strip()
if "${" not in template:
return
name = str(name or "").strip()
provider_key = str(provider_key or "").strip()
model = str(model or "").strip()
# Index by every plausible identity the loaded (expanded) config
# might present: (name), (name, model), (provider_key), and
# (provider_key, model). Case-insensitive on name/provider_key so
# the loaded entry matches regardless of display casing.
if name:
raw_api_key_refs.setdefault((name.lower(),), template)
raw_api_key_refs.setdefault((name.lower(), model), template)
if provider_key:
raw_api_key_refs.setdefault((provider_key.lower(),), template)
raw_api_key_refs.setdefault(
(provider_key.lower(), model), template
)
raw_list = raw_cfg.get("custom_providers")
if isinstance(raw_list, list):
for raw_entry in raw_list:
if not isinstance(raw_entry, dict):
continue
_record_raw(
raw_entry.get("name", ""),
"",
raw_entry.get("model", "")
or raw_entry.get("default_model", ""),
raw_entry.get("api_key", ""),
)
raw_providers = raw_cfg.get("providers")
if isinstance(raw_providers, dict):
for raw_key, raw_entry in raw_providers.items():
if not isinstance(raw_entry, dict):
continue
_record_raw(
raw_entry.get("name", "") or raw_key,
raw_key,
raw_entry.get("model", "")
or raw_entry.get("default_model", ""),
raw_entry.get("api_key", ""),
)
def _lookup_ref(name: str, provider_key: str, model: str) -> str:
name_lc = str(name or "").strip().lower()
pkey_lc = str(provider_key or "").strip().lower()
model = str(model or "").strip()
for identity in (
(pkey_lc, model),
(pkey_lc,),
(name_lc, model),
(name_lc,),
):
if identity[0] and identity in raw_api_key_refs:
return raw_api_key_refs[identity]
return ""
custom_provider_map = {}
for entry in get_compatible_custom_providers(cfg):
if not isinstance(entry, dict):
@@ -1535,6 +1627,9 @@ def select_provider_and_model(args=None):
"model": entry.get("model", ""),
"api_mode": entry.get("api_mode", ""),
"provider_key": provider_key,
"api_key_ref": _lookup_ref(
name, provider_key, entry.get("model", "")
),
}
return custom_provider_map
@@ -1624,6 +1719,8 @@ def select_provider_and_model(args=None):
_model_flow_stepfun(config, current_model)
elif selected_provider == "bedrock":
_model_flow_bedrock(config, current_model)
elif selected_provider == "azure-foundry":
_model_flow_azure_foundry(config, current_model)
elif selected_provider in (
"gemini",
"deepseek",
@@ -1707,7 +1804,6 @@ _AUX_TASKS: list[tuple[str, str, str]] = [
("session_search", "Session search", "past-conversation recall"),
("approval", "Approval", "smart command approval"),
("mcp", "MCP", "MCP tool reasoning"),
("flush_memories", "Flush memories", "memory consolidation"),
("title_generation", "Title generation", "session titles"),
("skills_hub", "Skills hub", "skills search/install"),
]
@@ -2219,13 +2315,13 @@ def _model_flow_nous(config, current_model="", args=None):
# The live /models endpoint returns hundreds of models; the curated list
# shows only agentic models users recognize from OpenRouter.
from hermes_cli.models import (
_PROVIDER_MODELS,
get_curated_nous_model_ids,
get_pricing_for_provider,
check_nous_free_tier,
partition_nous_models_by_tier,
)
model_ids = _PROVIDER_MODELS.get("nous", [])
model_ids = get_curated_nous_model_ids()
if not model_ids:
print("No curated models available for Nous Portal.")
return
@@ -2768,6 +2864,19 @@ def _auto_provider_name(base_url: str) -> str:
return name
def _custom_provider_api_key_config_value(provider_info, resolved_api_key=""):
"""Return the value that should be persisted for a custom provider key."""
api_key_ref = str(provider_info.get("api_key_ref", "") or "").strip()
if api_key_ref:
return api_key_ref
key_env = str(provider_info.get("key_env", "") or "").strip()
if key_env and not str(provider_info.get("api_key", "") or "").strip():
return f"${{{key_env}}}"
return str(resolved_api_key or "").strip()
def _save_custom_provider(
base_url, api_key="", model="", context_length=None, name=None
):
@@ -2823,6 +2932,203 @@ def _save_custom_provider(
print(f' 💾 Saved to custom providers as "{name}" (edit in config.yaml)')
def _model_flow_azure_foundry(config, current_model=""):
"""Azure Foundry provider: configure endpoint, API mode, API key, and model.
Azure Foundry supports both OpenAI-style (``/v1/chat/completions``) and
Anthropic-style (``/v1/messages``) endpoints. The wizard auto-detects
the transport and available models when possible:
* URLs ending in ``/anthropic`` Anthropic Messages API.
* Successful ``GET <base>/models`` probe OpenAI-style + populates
a picker with the returned deployment / model IDs.
* Anthropic Messages probe fallback when ``/models`` fails.
* Manual entry when every probe fails (private endpoints, etc.).
Context lengths for the chosen model are resolved via the standard
:func:`agent.model_metadata.get_model_context_length` chain
(models.dev, provider metadata, hardcoded family fallbacks).
"""
from hermes_cli.auth import _save_model_choice, deactivate_provider # noqa: F401
from hermes_cli.config import get_env_value, save_env_value, load_config, save_config
from hermes_cli import azure_detect
import getpass
# ── Load current Azure Foundry configuration ─────────────────────
model_cfg = config.get("model", {})
if isinstance(model_cfg, dict) and model_cfg.get("provider") == "azure-foundry":
current_base_url = str(model_cfg.get("base_url", "") or "")
current_api_mode = str(model_cfg.get("api_mode", "") or "")
else:
current_base_url = ""
current_api_mode = ""
current_api_key = get_env_value("AZURE_FOUNDRY_API_KEY") or ""
print()
print("Azure Foundry Configuration")
print("=" * 50)
print()
print("Azure Foundry can host models with either OpenAI-style or")
print("Anthropic-style API endpoints. Hermes will probe your")
print("endpoint to auto-detect the transport and the deployed")
print("models when possible.")
print()
if current_base_url:
print(f" Current endpoint: {current_base_url}")
if current_api_mode:
_lbl = "OpenAI-style" if current_api_mode == "chat_completions" else "Anthropic-style"
print(f" Current API mode: {_lbl}")
if current_api_key:
print(f" Current API key: {current_api_key[:8]}...")
print()
# ── Step 1: endpoint URL ─────────────────────────────────────────
try:
base_url = input(
f"API endpoint URL [{current_base_url or 'e.g. https://your-resource.openai.azure.com/openai/v1'}]: "
).strip()
except (KeyboardInterrupt, EOFError):
print("\nCancelled.")
return
effective_url = (base_url or current_base_url).rstrip("/")
if not effective_url:
print("No endpoint URL provided. Cancelled.")
return
if not effective_url.startswith(("http://", "https://")):
print(f"Invalid URL: {effective_url} (must start with http:// or https://)")
return
# ── Step 2: API key ──────────────────────────────────────────────
print()
try:
api_key = getpass.getpass(
f"API key [{current_api_key[:8] + '...' if current_api_key else 'required'}]: "
).strip()
except (KeyboardInterrupt, EOFError):
print("\nCancelled.")
return
effective_key = api_key or current_api_key
if not effective_key:
print("No API key provided. Cancelled.")
return
# ── Step 3: auto-detect transport + models ───────────────────────
print()
print("◐ Probing endpoint to auto-detect transport and models...")
detection = azure_detect.detect(effective_url, effective_key)
discovered_models: list[str] = list(detection.models)
api_mode: str = detection.api_mode or ""
if api_mode:
mode_label = "OpenAI-style" if api_mode == "chat_completions" else "Anthropic-style"
print(f"✓ Detected API transport: {mode_label}")
if detection.reason:
print(f" ({detection.reason})")
if discovered_models:
print(f"✓ Found {len(discovered_models)} deployed model(s) on this endpoint")
else:
print(f"⚠ Auto-detection incomplete: {detection.reason}")
print()
print("Select the API format your Azure Foundry endpoint uses:")
print(" 1. OpenAI-style (POST /v1/chat/completions)")
print(" For: GPT models, Llama, Mistral, and most open models")
print(" 2. Anthropic-style (POST /v1/messages)")
print(" For: Claude models deployed via Anthropic API format")
try:
default_choice = "2" if current_api_mode == "anthropic_messages" else "1"
mode_choice = input(f"API format [1/2] ({default_choice}): ").strip() or default_choice
except (KeyboardInterrupt, EOFError):
print("\nCancelled.")
return
api_mode = "anthropic_messages" if mode_choice == "2" else "chat_completions"
# ── Step 4: model name ───────────────────────────────────────────
print()
effective_model = ""
if discovered_models:
print("Available models on this endpoint:")
for i, mid in enumerate(discovered_models[:30], start=1):
print(f" {i:>2}. {mid}")
if len(discovered_models) > 30:
print(f" ... and {len(discovered_models) - 30} more (type name manually if not shown)")
print()
try:
pick = input(
f"Pick by number, or type a deployment name [{current_model or discovered_models[0]}]: "
).strip()
except (KeyboardInterrupt, EOFError):
print("\nCancelled.")
return
if not pick:
effective_model = current_model or discovered_models[0]
elif pick.isdigit() and 1 <= int(pick) <= min(len(discovered_models), 30):
effective_model = discovered_models[int(pick) - 1]
else:
effective_model = pick
else:
try:
model_name = input(
f"Model / deployment name [{current_model or 'e.g. gpt-5.4, claude-sonnet-4-6'}]: "
).strip()
except (KeyboardInterrupt, EOFError):
print("\nCancelled.")
return
effective_model = model_name or current_model
if not effective_model:
print("No model name provided. Cancelled.")
return
# ── Step 5: context-length lookup ────────────────────────────────
ctx_len = azure_detect.lookup_context_length(
effective_model, effective_url, effective_key,
)
# ── Step 6: persist ──────────────────────────────────────────────
save_env_value("AZURE_FOUNDRY_API_KEY", effective_key)
cfg = load_config()
model = cfg.get("model")
if not isinstance(model, dict):
model = {"default": model} if model else {}
cfg["model"] = model
model["provider"] = "azure-foundry"
model["base_url"] = effective_url
model["api_mode"] = api_mode
model["default"] = effective_model
if ctx_len:
model["context_length"] = ctx_len
save_config(cfg)
deactivate_provider()
config["model"] = dict(model)
# Clear any conflicting env vars so auxiliary clients don't poison
# themselves with a stale OpenAI base URL / key.
if get_env_value("OPENAI_BASE_URL"):
save_env_value("OPENAI_BASE_URL", "")
if get_env_value("OPENAI_API_KEY"):
save_env_value("OPENAI_API_KEY", "")
mode_label = "OpenAI-style" if api_mode == "chat_completions" else "Anthropic-style"
print()
print("✓ Azure Foundry configured:")
print(f" Endpoint: {effective_url}")
print(f" API mode: {mode_label}")
print(f" Model: {effective_model}")
if ctx_len:
print(f" Context length: {ctx_len:,} tokens")
else:
print(" Context length: not auto-detected (will fall back at runtime)")
print()
def _remove_custom_provider(config):
"""Let the user remove a saved custom provider from config.yaml."""
from hermes_cli.config import load_config, save_config
@@ -2909,6 +3215,7 @@ def _model_flow_named_custom(config, provider_info):
# Resolve key from env var if api_key not set directly
if not api_key and key_env:
api_key = os.environ.get(key_env, "")
config_api_key = _custom_provider_api_key_config_value(provider_info, api_key)
print(f" Provider: {name}")
print(f" URL: {base_url}")
@@ -3005,8 +3312,8 @@ def _model_flow_named_custom(config, provider_info):
else:
model["provider"] = "custom"
model["base_url"] = base_url
if api_key:
model["api_key"] = api_key
if config_api_key:
model["api_key"] = config_api_key
# Apply api_mode from custom_providers entry, or clear stale value
custom_api_mode = provider_info.get("api_mode", "")
if custom_api_mode:
@@ -3024,15 +3331,15 @@ def _model_flow_named_custom(config, provider_info):
provider_entry = providers_cfg.get(provider_key)
if isinstance(provider_entry, dict):
provider_entry["default_model"] = model_name
if api_key and not str(provider_entry.get("api_key", "") or "").strip():
provider_entry["api_key"] = api_key
if config_api_key and not str(provider_entry.get("api_key", "") or "").strip():
provider_entry["api_key"] = config_api_key
if key_env and not str(provider_entry.get("key_env", "") or "").strip():
provider_entry["key_env"] = key_env
cfg["providers"] = providers_cfg
save_config(cfg)
else:
# Save model name to the custom_providers entry for next time
_save_custom_provider(base_url, api_key, model_name)
_save_custom_provider(base_url, config_api_key, model_name)
print(f"\n✅ Model set to: {model_name}")
print(f" Provider: {name} ({base_url})")
@@ -4473,6 +4780,13 @@ def cmd_webhook(args):
webhook_command(args)
def cmd_kanban(args):
"""Multi-profile collaboration board."""
from hermes_cli.kanban import kanban_command
return kanban_command(args)
def cmd_hooks(args):
"""Shell-hook inspection and management."""
from hermes_cli.hooks import hooks_command
@@ -5570,6 +5884,54 @@ def _finalize_update_output(state):
pass
def _cmd_update_check():
"""Implement ``hermes update --check``: fetch and report without installing."""
git_dir = PROJECT_ROOT / ".git"
if not git_dir.exists():
print("✗ Not a git repository — cannot check for updates.")
sys.exit(1)
git_cmd = ["git"]
if sys.platform == "win32":
git_cmd = ["git", "-c", "windows.appendAtomically=false"]
print("→ Fetching from origin...")
fetch_result = subprocess.run(
git_cmd + ["fetch", "origin"],
cwd=PROJECT_ROOT,
capture_output=True,
text=True,
)
if fetch_result.returncode != 0:
stderr = fetch_result.stderr.strip()
if "Could not resolve host" in stderr or "unable to access" in stderr:
print("✗ Network error — cannot reach the remote repository.")
elif "Authentication failed" in stderr or "could not read Username" in stderr:
print("✗ Authentication failed — check your git credentials or SSH key.")
else:
print("✗ Failed to fetch from origin.")
if stderr:
print(f" {stderr.splitlines()[0]}")
sys.exit(1)
rev_result = subprocess.run(
git_cmd + ["rev-list", "HEAD..origin/main", "--count"],
cwd=PROJECT_ROOT,
capture_output=True,
text=True,
check=True,
)
behind = int(rev_result.stdout.strip())
if behind == 0:
print("✓ Already up to date.")
else:
commits_word = "commit" if behind == 1 else "commits"
print(f"⚕ Update available: {behind} {commits_word} behind origin/main.")
from hermes_cli.config import recommended_update_command
print(f" Run '{recommended_update_command()}' to install.")
def cmd_update(args):
"""Update Hermes Agent to the latest version.
@@ -5583,6 +5945,10 @@ def cmd_update(args):
managed_error("update Hermes Agent")
return
if getattr(args, "check", False):
_cmd_update_check()
return
gateway_mode = getattr(args, "gateway", False)
# Protect against mid-update terminal disconnects (SIGHUP) and tolerate
@@ -6046,6 +6412,75 @@ def _cmd_update_impl(args, gateway_mode: bool):
)
import signal as _signal
def _wait_for_service_active(
scope_cmd_: list, svc_name_: str, timeout: float = 10.0,
) -> bool:
"""Poll ``systemctl is-active`` until the unit reports active.
systemd's Stopped -> Started transition after a graceful exit
(or a hard restart) is not instantaneous; a one-shot check
races that window and falsely reports the unit as down.
Poll every 0.5s up to ``timeout`` seconds before giving up.
"""
deadline = _time.monotonic() + max(timeout, 0.5)
while True:
try:
_verify = subprocess.run(
scope_cmd_ + ["is-active", svc_name_],
capture_output=True, text=True, timeout=5,
)
if _verify.stdout.strip() == "active":
return True
except (FileNotFoundError, subprocess.TimeoutExpired):
pass
if _time.monotonic() >= deadline:
return False
_time.sleep(0.5)
def _service_restart_sec(
scope_cmd_: list, svc_name_: str, default: float = 0.0,
) -> float:
"""Read the unit's ``RestartUSec`` (RestartSec) in seconds.
After a graceful exit-75, systemd waits ``RestartSec`` before
respawning the unit. Callers that poll for ``is-active``
must use a timeout >= ``RestartSec`` + transition slack, or
they'll give up *during* the cooldown window and wrongly
conclude the unit didn't relaunch.
"""
try:
_show = subprocess.run(
scope_cmd_ + [
"show", svc_name_,
"--property=RestartUSec", "--value",
],
capture_output=True, text=True, timeout=5,
)
except (FileNotFoundError, subprocess.TimeoutExpired):
return default
raw = (_show.stdout or "").strip()
# systemd emits values like "30s", "100ms", "1min 30s", or
# "infinity". Parse conservatively; on any miss return default.
if not raw or raw == "infinity":
return default
total = 0.0
matched = False
for part in raw.split():
for _suf, _mult in (
("ms", 0.001),
("us", 0.000001),
("min", 60.0),
("s", 1.0),
):
if part.endswith(_suf):
try:
total += float(part[: -len(_suf)]) * _mult
matched = True
except ValueError:
pass
break
return total if matched else default
# Drain budget for graceful SIGUSR1 restarts. The gateway drains
# for up to ``agent.restart_drain_timeout`` (default 60s) before
# exiting with code 75; we wait slightly longer so the drain
@@ -6152,14 +6587,23 @@ def _cmd_update_impl(args, gateway_mode: bool):
if _graceful_ok:
# Gateway exited 75; systemd should relaunch
# via Restart=on-failure. Verify the new
# process came up.
_time.sleep(3)
verify = subprocess.run(
scope_cmd + ["is-active", svc_name],
capture_output=True, text=True, timeout=5,
# via Restart=on-failure. The unit's
# RestartSec (default 30s on ours) gates the
# respawn — poll past that + slack so we
# don't give up mid-cooldown and falsely
# print "drained but didn't relaunch". For
# units without RestartSec set we fall back
# to the original 10s budget.
_restart_sec = _service_restart_sec(
scope_cmd, svc_name, default=0.0,
)
if verify.stdout.strip() == "active":
_post_drain_timeout = max(
10.0, _restart_sec + 10.0,
)
if _wait_for_service_active(
scope_cmd, svc_name,
timeout=_post_drain_timeout,
):
restarted_services.append(svc_name)
continue
# Process exited but wasn't respawned (older
@@ -6185,14 +6629,9 @@ def _cmd_update_impl(args, gateway_mode: bool):
# Verify the service actually survived the
# restart. systemctl restart returns 0 even
# if the new process crashes immediately.
_time.sleep(3)
verify = subprocess.run(
scope_cmd + ["is-active", svc_name],
capture_output=True,
text=True,
timeout=5,
)
if verify.stdout.strip() == "active":
if _wait_for_service_active(
scope_cmd, svc_name, timeout=10.0,
):
restarted_services.append(svc_name)
else:
# Retry once — transient startup failures
@@ -6207,14 +6646,9 @@ def _cmd_update_impl(args, gateway_mode: bool):
text=True,
timeout=15,
)
_time.sleep(3)
verify2 = subprocess.run(
scope_cmd + ["is-active", svc_name],
capture_output=True,
text=True,
timeout=5,
)
if verify2.stdout.strip() == "active":
if _wait_for_service_active(
scope_cmd, svc_name, timeout=10.0,
):
restarted_services.append(svc_name)
print(f"{svc_name} recovered on retry")
else:
@@ -6715,9 +7149,15 @@ def cmd_dashboard(args):
try:
import fastapi # noqa: F401
import uvicorn # noqa: F401
except ImportError:
print("Web UI dependencies not installed.")
print(f"Install them with: {sys.executable} -m pip install 'fastapi' 'uvicorn[standard]'")
except ImportError as e:
print("Web UI dependencies not installed (need fastapi + uvicorn).")
print(
f"Re-install the package into this interpreter so metadata updates apply:\n"
f" cd {PROJECT_ROOT}\n"
f" {sys.executable} -m pip install -e .\n"
"If `pip` is missing in this venv, use: uv pip install -e ."
)
print(f"Import error: {e}")
sys.exit(1)
if "HERMES_WEB_DIST" not in os.environ:
@@ -6726,11 +7166,13 @@ def cmd_dashboard(args):
from hermes_cli.web_server import start_server
embedded_chat = args.tui or os.environ.get("HERMES_DASHBOARD_TUI") == "1"
start_server(
host=args.host,
port=args.port,
open_browser=not args.no_open,
allow_public=getattr(args, "insecure", False),
embedded_chat=embedded_chat,
)
@@ -6788,6 +7230,9 @@ Examples:
hermes auth remove <p> <t> Remove pooled credential by index, id, or label
hermes auth reset <provider> Clear exhaustion status for a provider
hermes model Select default model
hermes fallback [list] Show fallback provider chain
hermes fallback add Add a fallback provider (same picker as `hermes model`)
hermes fallback remove Remove a fallback provider from the chain
hermes config View configuration
hermes config edit Edit config in $EDITOR
hermes config set model gpt-4 Set a config value
@@ -6813,6 +7258,40 @@ For more help on a command:
parser.add_argument(
"--version", "-V", action="store_true", help="Show version and exit"
)
parser.add_argument(
"-z",
"--oneshot",
metavar="PROMPT",
default=None,
help=(
"One-shot mode: send a single prompt and print ONLY the final "
"response text to stdout. No banner, no spinner, no tool "
"previews, no session_id line. Tools, memory, rules, and "
"AGENTS.md in the CWD are loaded as normal; approvals are "
"auto-bypassed. Intended for scripts / pipes."
),
)
# --model / --provider are accepted at the top level so they can pair
# with -z without needing the `chat` subcommand. If neither -z nor a
# subcommand consumes them, they fall through harmlessly as None.
# Mirrors `hermes chat --model ... --provider ...` semantics.
parser.add_argument(
"-m",
"--model",
default=None,
help=(
"Model override for this invocation (e.g. anthropic/claude-sonnet-4.6). "
"Applies to -z/--oneshot and --tui. Also settable via HERMES_INFERENCE_MODEL env var."
),
)
parser.add_argument(
"--provider",
default=None,
help=(
"Provider override for this invocation (e.g. openrouter, anthropic). "
"Applies to -z/--oneshot and --tui. Also settable via HERMES_INFERENCE_PROVIDER env var."
),
)
parser.add_argument(
"--resume",
"-r",
@@ -7095,6 +7574,42 @@ For more help on a command:
)
model_parser.set_defaults(func=cmd_model)
# =========================================================================
# fallback command — manage the fallback provider chain
# =========================================================================
from hermes_cli.fallback_cmd import cmd_fallback
fallback_parser = subparsers.add_parser(
"fallback",
help="Manage fallback providers (tried when the primary model fails)",
description=(
"Manage the fallback provider chain. Fallback providers are tried "
"in order when the primary model fails with rate-limit, overload, or "
"connection errors. See: "
"https://hermes-agent.nousresearch.com/docs/user-guide/features/fallback-providers"
),
)
fallback_subparsers = fallback_parser.add_subparsers(dest="fallback_command")
fallback_subparsers.add_parser(
"list",
aliases=["ls"],
help="Show the current fallback chain (default when no subcommand)",
)
fallback_subparsers.add_parser(
"add",
help="Pick a provider + model (same picker as `hermes model`) and append to the chain",
)
fallback_subparsers.add_parser(
"remove",
aliases=["rm"],
help="Pick an entry to delete from the chain",
)
fallback_subparsers.add_parser(
"clear",
help="Remove all fallback entries",
)
fallback_parser.set_defaults(func=cmd_fallback)
# =========================================================================
# gateway command
# =========================================================================
@@ -7265,6 +7780,19 @@ For more help on a command:
setup_parser.add_argument(
"--reset", action="store_true", help="Reset configuration to defaults"
)
setup_parser.add_argument(
"--reconfigure",
action="store_true",
help="(Default on existing installs.) Re-run the full wizard, "
"showing current values as defaults. Kept for backwards "
"compatibility — a bare 'hermes setup' now does this.",
)
setup_parser.add_argument(
"--quick",
action="store_true",
help="On existing installs: only prompt for items that are missing "
"or unset, instead of running the full reconfigure wizard.",
)
setup_parser.set_defaults(func=cmd_setup)
# =========================================================================
@@ -7595,6 +8123,13 @@ For more help on a command:
webhook_parser.set_defaults(func=cmd_webhook)
# =========================================================================
# kanban command — multi-profile collaboration board
# =========================================================================
from hermes_cli.kanban import build_parser as _build_kanban_parser
kanban_parser = _build_kanban_parser(subparsers)
kanban_parser.set_defaults(func=cmd_kanban)
# =========================================================================
# hooks command — shell-hook inspection and management
# =========================================================================
@@ -8747,6 +9282,12 @@ Examples:
default=False,
help="Gateway mode: use file-based IPC for prompts instead of stdin (used internally by /update)",
)
update_parser.add_argument(
"--check",
action="store_true",
default=False,
help="Check whether an update is available without installing anything",
)
update_parser.set_defaults(func=cmd_update)
# =========================================================================
@@ -8916,6 +9457,14 @@ Examples:
action="store_true",
help="Allow binding to non-localhost (DANGEROUS: exposes API keys on the network)",
)
dashboard_parser.add_argument(
"--tui",
action="store_true",
help=(
"Expose the in-browser Chat tab (embedded `hermes --tui` via PTY/WebSocket). "
"Alternatively set HERMES_DASHBOARD_TUI=1."
),
)
dashboard_parser.set_defaults(func=cmd_dashboard)
# =========================================================================
@@ -9085,6 +9634,17 @@ Examples:
exc_info=True,
)
# Handle top-level --oneshot / -z: single-shot mode, stdout = final
# response only, nothing else. Bypasses cli.py entirely.
if getattr(args, "oneshot", None):
from hermes_cli.oneshot import run_oneshot
sys.exit(run_oneshot(
args.oneshot,
model=getattr(args, "model", None),
provider=getattr(args, "provider", None),
))
# Handle top-level --resume / --continue as shortcut to chat
if (args.resume or args.continue_last) and args.command is None:
args.command = "chat"
+329
View File
@@ -0,0 +1,329 @@
"""Remote model catalog fetcher.
The Hermes docs site hosts a JSON manifest of curated models for providers
we want to update without shipping a release (currently OpenRouter and
Nous Portal). This module fetches, validates, and caches that manifest,
falling back to the in-repo hardcoded lists when the network is unavailable.
Pipeline
--------
1. ``get_catalog()`` returns a parsed manifest dict.
- Checks in-process cache (invalidated by TTL).
- Reads disk cache at ``~/.hermes/cache/model_catalog.json``.
- Fetches the master URL if disk cache is stale or missing.
- On any fetch failure, keeps using the stale cache (or empty dict).
2. ``get_curated_openrouter_models()`` / ``get_curated_nous_models()``
thin accessors returning the shapes existing callers expect. Each
falls back to the in-repo hardcoded list on any lookup failure.
Schema (version 1)
------------------
::
{
"version": 1,
"updated_at": "2026-04-25T22:00:00Z",
"metadata": {...}, # free-form
"providers": {
"openrouter": {
"metadata": {...}, # free-form
"models": [
{"id": "vendor/model", "description": "recommended",
"metadata": {...}} # free-form, model-level
]
},
"nous": {...}
}
}
Unknown fields are ignored extra metadata can be added at either level
without bumping ``version``. ``version`` bumps are reserved for
breaking changes (renaming ``providers``, changing ``models`` shape).
"""
from __future__ import annotations
import json
import logging
import os
import time
import urllib.error
import urllib.request
from pathlib import Path
from typing import Any
from hermes_cli import __version__ as _HERMES_VERSION
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
DEFAULT_CATALOG_URL = (
"https://hermes-agent.nousresearch.com/docs/api/model-catalog.json"
)
DEFAULT_TTL_HOURS = 24
DEFAULT_FETCH_TIMEOUT = 8.0
SUPPORTED_SCHEMA_VERSION = 1
_HERMES_USER_AGENT = f"hermes-cli/{_HERMES_VERSION}"
# In-process cache to avoid repeated disk + parse work across multiple
# calls within the same session. Invalidated by TTL against the disk file's
# mtime, so calling code never has to think about this.
_catalog_cache: dict[str, Any] | None = None
_catalog_cache_source_mtime: float = 0.0
# ---------------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------------
def _load_catalog_config() -> dict[str, Any]:
"""Load the ``model_catalog`` config block with defaults filled in."""
try:
from hermes_cli.config import load_config
cfg = load_config() or {}
except Exception:
cfg = {}
raw = cfg.get("model_catalog")
if not isinstance(raw, dict):
raw = {}
return {
"enabled": bool(raw.get("enabled", True)),
"url": str(raw.get("url") or DEFAULT_CATALOG_URL),
"ttl_hours": float(raw.get("ttl_hours") or DEFAULT_TTL_HOURS),
"providers": raw.get("providers") if isinstance(raw.get("providers"), dict) else {},
}
def _cache_path() -> Path:
"""Return the disk cache path. Import lazily so tests can monkeypatch home."""
from hermes_constants import get_hermes_home
return get_hermes_home() / "cache" / "model_catalog.json"
# ---------------------------------------------------------------------------
# Fetch + validate + cache
# ---------------------------------------------------------------------------
def _fetch_manifest(url: str, timeout: float) -> dict[str, Any] | None:
"""HTTP GET the manifest URL and return a parsed dict, or None on failure."""
try:
req = urllib.request.Request(
url,
headers={
"Accept": "application/json",
"User-Agent": _HERMES_USER_AGENT,
},
)
with urllib.request.urlopen(req, timeout=timeout) as resp:
data = json.loads(resp.read().decode())
except (urllib.error.URLError, TimeoutError, json.JSONDecodeError, OSError) as exc:
logger.info("model catalog fetch failed (%s): %s", url, exc)
return None
except Exception as exc: # pragma: no cover — defensive
logger.info("model catalog fetch errored (%s): %s", url, exc)
return None
if not _validate_manifest(data):
logger.info("model catalog at %s failed schema validation", url)
return None
return data
def _validate_manifest(data: Any) -> bool:
"""Return True when ``data`` matches the minimum manifest shape."""
if not isinstance(data, dict):
return False
version = data.get("version")
if not isinstance(version, int) or version > SUPPORTED_SCHEMA_VERSION:
# Future schema version we don't understand — refuse rather than
# guess. Older schemas (version < 1) aren't supported either.
return False
providers = data.get("providers")
if not isinstance(providers, dict):
return False
for pname, pblock in providers.items():
if not isinstance(pname, str) or not isinstance(pblock, dict):
return False
models = pblock.get("models")
if not isinstance(models, list):
return False
for m in models:
if not isinstance(m, dict):
return False
if not isinstance(m.get("id"), str) or not m["id"].strip():
return False
return True
def _read_disk_cache() -> tuple[dict[str, Any] | None, float]:
"""Return ``(data_or_none, mtime)``. mtime is 0 if file is missing."""
path = _cache_path()
try:
mtime = path.stat().st_mtime
except (OSError, FileNotFoundError):
return (None, 0.0)
try:
with open(path) as fh:
data = json.load(fh)
except (OSError, json.JSONDecodeError):
return (None, 0.0)
if not _validate_manifest(data):
return (None, 0.0)
return (data, mtime)
def _write_disk_cache(data: dict[str, Any]) -> None:
path = _cache_path()
try:
path.parent.mkdir(parents=True, exist_ok=True)
tmp = path.with_suffix(path.suffix + ".tmp")
with open(tmp, "w") as fh:
json.dump(data, fh, indent=2)
fh.write("\n")
os.replace(tmp, path)
except OSError as exc:
logger.info("model catalog cache write failed: %s", exc)
# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------
def get_catalog(*, force_refresh: bool = False) -> dict[str, Any]:
"""Return the parsed model catalog manifest, or an empty dict on failure.
Callers should treat a missing provider/model as "use the in-repo fallback"
never raise from this function so the CLI keeps working offline.
"""
global _catalog_cache, _catalog_cache_source_mtime
cfg = _load_catalog_config()
if not cfg["enabled"]:
return {}
ttl_seconds = max(0.0, cfg["ttl_hours"] * 3600.0)
disk_data, disk_mtime = _read_disk_cache()
now = time.time()
disk_fresh = disk_data is not None and (now - disk_mtime) < ttl_seconds
# In-process cache hit: disk hasn't changed since we loaded it and still fresh.
if (
not force_refresh
and _catalog_cache is not None
and disk_data is not None
and disk_mtime == _catalog_cache_source_mtime
and disk_fresh
):
return _catalog_cache
# Disk is fresh enough — use it without a network hit.
if not force_refresh and disk_fresh and disk_data is not None:
_catalog_cache = disk_data
_catalog_cache_source_mtime = disk_mtime
return disk_data
# Need to (re)fetch. If it fails, fall back to any stale disk copy.
fetched = _fetch_manifest(cfg["url"], DEFAULT_FETCH_TIMEOUT)
if fetched is not None:
_write_disk_cache(fetched)
new_disk_data, new_mtime = _read_disk_cache()
if new_disk_data is not None:
_catalog_cache = new_disk_data
_catalog_cache_source_mtime = new_mtime
return new_disk_data
_catalog_cache = fetched
_catalog_cache_source_mtime = now
return fetched
if disk_data is not None:
_catalog_cache = disk_data
_catalog_cache_source_mtime = disk_mtime
return disk_data
return {}
def _fetch_provider_override(provider: str) -> dict[str, Any] | None:
"""If ``model_catalog.providers.<name>.url`` is set, fetch that instead."""
cfg = _load_catalog_config()
if not cfg["enabled"]:
return None
provider_cfg = cfg["providers"].get(provider)
if not isinstance(provider_cfg, dict):
return None
override_url = provider_cfg.get("url")
if not isinstance(override_url, str) or not override_url.strip():
return None
# Override fetches skip the disk cache because they're usually
# third-party self-hosted. Re-request on every call but with a short
# timeout so they don't block the picker.
return _fetch_manifest(override_url.strip(), DEFAULT_FETCH_TIMEOUT)
def _get_provider_block(provider: str) -> dict[str, Any] | None:
"""Return the provider's manifest block, respecting per-provider overrides."""
override = _fetch_provider_override(provider)
if override is not None:
block = override.get("providers", {}).get(provider)
if isinstance(block, dict):
return block
catalog = get_catalog()
if not catalog:
return None
block = catalog.get("providers", {}).get(provider)
return block if isinstance(block, dict) else None
def get_curated_openrouter_models() -> list[tuple[str, str]] | None:
"""Return OpenRouter's curated ``[(id, description), ...]`` from the manifest.
Returns ``None`` when the manifest is unavailable, so callers can fall
back to their hardcoded list.
"""
block = _get_provider_block("openrouter")
if not block:
return None
out: list[tuple[str, str]] = []
for m in block.get("models", []):
mid = str(m.get("id") or "").strip()
if not mid:
continue
desc = str(m.get("description") or "")
out.append((mid, desc))
return out or None
def get_curated_nous_models() -> list[str] | None:
"""Return Nous Portal's curated list of model ids from the manifest.
Returns ``None`` when the manifest is unavailable.
"""
block = _get_provider_block("nous")
if not block:
return None
out: list[str] = []
for m in block.get("models", []):
mid = str(m.get("id") or "").strip()
if mid:
out.append(mid)
return out or None
def reset_cache() -> None:
"""Clear the in-process cache. Used by tests and ``hermes model --refresh``."""
global _catalog_cache, _catalog_cache_source_mtime
_catalog_cache = None
_catalog_cache_source_mtime = 0.0
+75 -12
View File
@@ -527,6 +527,49 @@ def _resolve_alias_fallback(
return None
def resolve_display_context_length(
model: str,
provider: str,
base_url: str = "",
api_key: str = "",
model_info: Optional[ModelInfo] = None,
custom_providers: list | None = None,
) -> Optional[int]:
"""Resolve the context length to show in /model output.
models.dev reports per-vendor context (e.g. gpt-5.5 = 1.05M on openai)
but provider-enforced limits can be lower (e.g. Codex OAuth caps the
same slug at 272k). The authoritative source is
``agent.model_metadata.get_model_context_length`` which already knows
about Codex OAuth, Copilot, Nous, and falls back to models.dev for the
rest.
When ``custom_providers`` is provided, per-model ``context_length``
overrides from ``custom_providers[].models.<id>.context_length`` are
honored this closes #15779 where ``/model`` switch ignored user-set
overrides.
Prefer the provider-aware value; fall back to ``model_info.context_window``
only if the resolver returns nothing.
"""
try:
from agent.model_metadata import get_model_context_length
ctx = get_model_context_length(
model,
base_url=base_url or "",
api_key=api_key or "",
provider=provider or None,
custom_providers=custom_providers,
)
if ctx:
return int(ctx)
except Exception:
pass
if model_info is not None and model_info.context_window:
return int(model_info.context_window)
return None
# ---------------------------------------------------------------------------
# Core model-switching pipeline
# ---------------------------------------------------------------------------
@@ -795,9 +838,14 @@ def switch_model(
requested=current_provider,
target_model=new_model,
)
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
api_mode = runtime.get("api_mode", "")
# If resolution fell through to "custom" (e.g. named custom provider like
# "ollama-launch" that resolve_runtime_provider doesn't know), keep existing
# credentials. Otherwise use the resolved values (picks up credential rotation,
# base_url adjustments for OpenCode, etc.).
if runtime.get("provider") != "custom":
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
api_mode = runtime.get("api_mode", "")
except Exception:
pass
@@ -831,16 +879,31 @@ def switch_model(
"message": f"Could not validate `{new_model}`: {e}",
}
# Override rejection if model is in the user's saved provider config.
# API /v1/models may not list cloud/aliased models even though the server supports them.
if not validation.get("accepted"):
msg = validation.get("message", "Invalid model")
return ModelSwitchResult(
success=False,
new_model=new_model,
target_provider=target_provider,
provider_label=provider_label,
is_global=is_global,
error_message=msg,
)
override = False
if user_providers:
for up in user_providers:
if isinstance(up, dict) and up.get("provider") == target_provider:
cfg_models = up.get("models", [])
if new_model in cfg_models or any(
m.get("name") == new_model for m in cfg_models if isinstance(m, dict)
):
override = True
break
if override:
validation = {"accepted": True, "persist": True, "recognized": False, "message": validation.get("message", "")}
else:
msg = validation.get("message", "Invalid model")
return ModelSwitchResult(
success=False,
new_model=new_model,
target_provider=target_provider,
provider_label=provider_label,
is_global=is_global,
error_message=msg,
)
# Apply auto-correction if validation found a closer match
if validation.get("corrected_model"):
+144 -62
View File
@@ -42,7 +42,7 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
("anthropic/claude-sonnet-4.5", ""),
("anthropic/claude-haiku-4.5", ""),
("openrouter/elephant-alpha", "free"),
("openai/gpt-5.4", ""),
("openai/gpt-5.5", ""),
("openai/gpt-5.4-mini", ""),
("xiaomi/mimo-v2.5-pro", ""),
("xiaomi/mimo-v2.5", ""),
@@ -65,7 +65,7 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
("nvidia/nemotron-3-super-120b-a12b:free", "free"),
("arcee-ai/trinity-large-preview:free", "free"),
("arcee-ai/trinity-large-thinking", ""),
("openai/gpt-5.4-pro", ""),
("openai/gpt-5.5-pro", ""),
("openai/gpt-5.4-nano", ""),
]
@@ -120,7 +120,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"anthropic/claude-sonnet-4.6",
"anthropic/claude-sonnet-4.5",
"anthropic/claude-haiku-4.5",
"openai/gpt-5.4",
"openai/gpt-5.5",
"openai/gpt-5.4-mini",
"openai/gpt-5.3-codex",
"google/gemini-3-pro-preview",
@@ -139,7 +139,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"x-ai/grok-4.20-beta",
"nvidia/nemotron-3-super-120b-a12b",
"arcee-ai/trinity-large-thinking",
"openai/gpt-5.4-pro",
"openai/gpt-5.5-pro",
"openai/gpt-5.4-nano",
],
# Native OpenAI Chat Completions (api.openai.com). Used by /model counts and
@@ -383,6 +383,9 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"us.meta.llama4-maverick-17b-instruct-v1:0",
"us.meta.llama4-scout-17b-instruct-v1:0",
],
# Azure Foundry: user-provided endpoint and model.
# Empty list because models depend on the endpoint configuration.
"azure-foundry": [],
}
# Vercel AI Gateway: derive the bare-model-id catalog from the curated
@@ -740,6 +743,7 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("opencode-zen", "OpenCode Zen", "OpenCode Zen (35+ curated models, pay-as-you-go)"),
ProviderEntry("opencode-go", "OpenCode Go", "OpenCode Go (open models, $10/month subscription)"),
ProviderEntry("bedrock", "AWS Bedrock", "AWS Bedrock (Claude, Nova, Llama, DeepSeek — IAM or API key)"),
ProviderEntry("azure-foundry", "Azure Foundry", "Azure Foundry (OpenAI-style or Anthropic-style endpoint — your Azure AI deployment)"),
]
# Derived dicts — used throughout the codebase
@@ -872,7 +876,16 @@ def fetch_openrouter_models(
if _openrouter_catalog_cache is not None and not force_refresh:
return list(_openrouter_catalog_cache)
fallback = list(OPENROUTER_MODELS)
# Prefer the remotely-hosted catalog manifest; fall back to the in-repo
# snapshot when the manifest is unreachable. Both are curated lists that
# drive the picker; the OpenRouter live /v1/models filter (tool support,
# free pricing) is applied on top either way.
try:
from hermes_cli.model_catalog import get_curated_openrouter_models
remote = get_curated_openrouter_models()
except Exception:
remote = None
fallback = list(remote) if remote else list(OPENROUTER_MODELS)
preferred_ids = [mid for mid, _ in fallback]
try:
@@ -925,6 +938,24 @@ def model_ids(*, force_refresh: bool = False) -> list[str]:
return [mid for mid, _ in fetch_openrouter_models(force_refresh=force_refresh)]
def get_curated_nous_model_ids() -> list[str]:
"""Return the curated Nous Portal model-id list.
Prefers the remotely-hosted catalog manifest (published under
``website/static/api/model-catalog.json``); falls back to the in-repo
snapshot in ``_PROVIDER_MODELS["nous"]`` when the manifest is
unreachable. Always returns a list (never None).
"""
try:
from hermes_cli.model_catalog import get_curated_nous_models
remote = get_curated_nous_models()
except Exception:
remote = None
if remote:
return list(remote)
return list(_PROVIDER_MODELS.get("nous", []))
def _ai_gateway_model_is_free(pricing: Any) -> bool:
"""Return True if an AI Gateway model has $0 input AND output pricing."""
if not isinstance(pricing, dict):
@@ -1379,27 +1410,93 @@ def curated_models_for_provider(
return [(m, "") for m in models]
def detect_provider_for_model(
def _provider_keys(provider: str) -> set[str]:
key = (provider or "").strip().lower()
normalized = normalize_provider(provider)
return {k for k in (key, normalized) if k}
def _model_in_provider_catalog(name_lower: str, providers: set[str]) -> bool:
return any(
name_lower == model.lower()
for provider in providers
for model in _PROVIDER_MODELS.get(provider, [])
)
_AGGREGATOR_PROVIDERS = frozenset(
{"nous", "openrouter", "ai-gateway", "copilot", "kilocode"}
)
def _resolve_static_model_alias(
name_lower: str,
current_keys: set[str],
) -> Optional[tuple[str, str]]:
"""Resolve short aliases (e.g. sonnet/opus) using static catalogs only."""
try:
from hermes_cli.model_switch import MODEL_ALIASES
except Exception:
return None
identity = MODEL_ALIASES.get(name_lower)
if identity is None:
return None
vendor = identity.vendor
family = identity.family
def _match(provider: str) -> Optional[str]:
models = _PROVIDER_MODELS.get(provider, [])
if not models:
return None
prefix = (
f"{vendor}/{family}"
if provider in _AGGREGATOR_PROVIDERS
else family
).lower()
for model in models:
if model.lower().startswith(prefix):
return model
return None
for provider in current_keys:
if matched := _match(provider):
return provider, matched
for provider in _PROVIDER_MODELS:
if provider in current_keys or provider in _AGGREGATOR_PROVIDERS:
continue
if matched := _match(provider):
return provider, matched
for provider in _AGGREGATOR_PROVIDERS:
if provider in current_keys and (matched := _match(provider)):
return provider, matched
return None
def detect_static_provider_for_model(
model_name: str,
current_provider: str,
) -> Optional[tuple[str, str]]:
"""Auto-detect the best provider for a model name.
"""Auto-detect a provider from static catalogs only.
Returns ``(provider_id, model_name)`` the model name may be remapped
(e.g. bare ``deepseek-chat`` ``deepseek/deepseek-chat`` for OpenRouter).
Returns ``(provider_id, model_name)``. The model name may be remapped
when a static alias or bare provider name resolves to a catalog default.
Returns ``None`` when no confident match is found.
Priority:
0. Bare provider name switch to that provider's default model
1. Direct provider with credentials (highest)
2. Direct provider without credentials remap to OpenRouter slug
3. OpenRouter catalog match
"""
name = (model_name or "").strip()
if not name:
return None
name_lower = name.lower()
current_keys = _provider_keys(current_provider)
alias_match = _resolve_static_model_alias(name_lower, current_keys)
if alias_match:
return alias_match
# --- Step 0: bare provider name typed as model ---
# If someone types `/model nous` or `/model anthropic`, treat it as a
@@ -1412,64 +1509,49 @@ def detect_provider_for_model(
if (
resolved_provider in _PROVIDER_LABELS
and default_models
and resolved_provider != normalize_provider(current_provider)
and resolved_provider not in current_keys
):
return (resolved_provider, default_models[0])
# Aggregators list other providers' models — never auto-switch TO them
_AGGREGATORS = {"nous", "openrouter", "ai-gateway", "copilot", "kilocode"}
# If the model belongs to the current provider's catalog, don't suggest switching
current_models = _PROVIDER_MODELS.get(current_provider, [])
if any(name_lower == m.lower() for m in current_models):
if _model_in_provider_catalog(name_lower, current_keys):
return None
# --- Step 1: check static provider catalogs for a direct match ---
direct_match: Optional[str] = None
for pid, models in _PROVIDER_MODELS.items():
if pid == current_provider or pid in _AGGREGATORS:
if pid in current_keys or pid in _AGGREGATOR_PROVIDERS:
continue
if any(name_lower == m.lower() for m in models):
direct_match = pid
break
return (pid, name)
if direct_match:
# Check if we have credentials for this provider — env vars,
# credential pool, or auth store entries.
has_creds = False
try:
from hermes_cli.auth import PROVIDER_REGISTRY
pconfig = PROVIDER_REGISTRY.get(direct_match)
if pconfig:
for env_var in pconfig.api_key_env_vars:
if os.getenv(env_var, "").strip():
has_creds = True
break
except Exception:
pass
# Also check credential pool and auth store — covers OAuth,
# Claude Code tokens, and other non-env-var credentials (#10300).
if not has_creds:
try:
from agent.credential_pool import load_pool
pool = load_pool(direct_match)
if pool.has_credentials():
has_creds = True
except Exception:
pass
if not has_creds:
try:
from hermes_cli.auth import _load_auth_store
store = _load_auth_store()
if direct_match in store.get("providers", {}) or direct_match in store.get("credential_pool", {}):
has_creds = True
except Exception:
pass
return None
# Always return the direct provider match. If credentials are
# missing, the client init will give a clear error rather than
# silently routing through the wrong provider (#10300).
return (direct_match, name)
def detect_provider_for_model(
model_name: str,
current_provider: str,
) -> Optional[tuple[str, str]]:
"""Auto-detect the best provider for a model name.
Returns ``(provider_id, model_name)`` the model name may be remapped
(e.g. bare ``deepseek-chat`` ``deepseek/deepseek-chat`` for OpenRouter).
Returns ``None`` when no confident match is found.
Priority:
0. Bare provider name switch to that provider's default model
1. Direct provider static catalog match
2. OpenRouter catalog match
"""
name = (model_name or "").strip()
if not name:
return None
static_match = detect_static_provider_for_model(name, current_provider)
if static_match:
return static_match
if _model_in_provider_catalog(name.lower(), _provider_keys(current_provider)):
return None
# --- Step 2: check OpenRouter catalog ---
# First try exact match (handles provider/model format)
@@ -2571,8 +2653,8 @@ def validate_requested_model(
)
return {
"accepted": False,
"persist": False,
"accepted": True,
"persist": True,
"recognized": False,
"message": message,
}
+202
View File
@@ -0,0 +1,202 @@
"""Oneshot (-z) mode: send a prompt, get the final content block, exit.
Bypasses cli.py entirely. No banner, no spinner, no session_id line,
no stderr chatter. Just the agent's final text to stdout.
Toolsets = whatever the user has configured for "cli" in `hermes tools`.
Rules / memory / AGENTS.md / preloaded skills = same as a normal chat turn.
Approvals = auto-bypassed (HERMES_YOLO_MODE=1 is set for the call).
Working directory = the user's CWD (AGENTS.md etc. resolve from there as usual).
Model / provider selection mirrors `hermes chat`:
- Both optional. If omitted, use the user's configured default.
- If both given, pair them exactly as given.
- If only --model given, auto-detect the provider that serves it.
- If only --provider given, error out (ambiguous caller must pick a model).
Env var fallbacks (used when the corresponding arg is not passed):
- HERMES_INFERENCE_MODEL
- HERMES_INFERENCE_PROVIDER (already read by resolve_runtime_provider)
"""
from __future__ import annotations
import logging
import os
import sys
from contextlib import redirect_stderr, redirect_stdout
from typing import Optional
def run_oneshot(
prompt: str,
model: Optional[str] = None,
provider: Optional[str] = None,
) -> int:
"""Execute a single prompt and print only the final content block.
Args:
prompt: The user message to send.
model: Optional model override. Falls back to HERMES_INFERENCE_MODEL
env var, then config.yaml's model.default / model.model.
provider: Optional provider override. Falls back to
HERMES_INFERENCE_PROVIDER env var, then config.yaml's model.provider,
then "auto".
Returns the exit code. Caller should sys.exit() with the return.
"""
# Silence every stdlib logger for the duration. AIAgent, tools, and
# provider adapters all log to stderr through the root logger; file
# handlers added by setup_logging() keep working (they're attached to
# the root logger's handler list, not affected by level), but no
# bytes reach the terminal.
logging.disable(logging.CRITICAL)
# --provider without --model is ambiguous: carrying the user's configured
# model across to a different provider is usually wrong (that provider may
# not host it), and silently picking the provider's catalog default hides
# the mismatch. Require the caller to be explicit. Validate BEFORE the
# stderr redirect so the message actually reaches the terminal.
env_model_early = os.getenv("HERMES_INFERENCE_MODEL", "").strip()
if provider and not ((model or "").strip() or env_model_early):
sys.stderr.write(
"hermes -z: --provider requires --model (or HERMES_INFERENCE_MODEL). "
"Pass both explicitly, or neither to use your configured defaults.\n"
)
return 2
# Auto-approve any shell / tool approvals. Non-interactive by
# definition — a prompt would hang forever.
os.environ["HERMES_YOLO_MODE"] = "1"
os.environ["HERMES_ACCEPT_HOOKS"] = "1"
# Redirect stderr AND stdout to devnull for the entire call tree.
# We'll print the final response to the real stdout at the end.
real_stdout = sys.stdout
devnull = open(os.devnull, "w")
try:
with redirect_stdout(devnull), redirect_stderr(devnull):
response = _run_agent(prompt, model=model, provider=provider)
finally:
try:
devnull.close()
except Exception:
pass
if response:
real_stdout.write(response)
if not response.endswith("\n"):
real_stdout.write("\n")
real_stdout.flush()
return 0
def _run_agent(
prompt: str,
model: Optional[str] = None,
provider: Optional[str] = None,
) -> str:
"""Build an AIAgent exactly like a normal CLI chat turn would, then
run a single conversation. Returns the final response string."""
# Imports are local so they don't run when hermes is invoked for
# other commands (keeps top-level CLI startup cheap).
from hermes_cli.config import load_config
from hermes_cli.models import detect_provider_for_model
from hermes_cli.runtime_provider import resolve_runtime_provider
from hermes_cli.tools_config import _get_platform_tools
from run_agent import AIAgent
cfg = load_config()
# Resolve effective model: explicit arg → env var → config.
model_cfg = cfg.get("model") or {}
if isinstance(model_cfg, str):
cfg_model = model_cfg
else:
cfg_model = model_cfg.get("default") or model_cfg.get("model") or ""
env_model = os.getenv("HERMES_INFERENCE_MODEL", "").strip()
effective_model = (model or "").strip() or env_model or cfg_model
# Resolve effective provider: explicit arg → (auto-detect from model if
# model was explicit) → env / config (handled inside resolve_runtime_provider).
#
# When --model is given without --provider, auto-detect the provider that
# serves that model — same semantic as `/model <name>` in an interactive
# session. Without this, resolve_runtime_provider() would fall back to
# the user's configured default provider, which may not host the model
# the caller just asked for.
effective_provider = (provider or "").strip() or None
if effective_provider is None and (model or env_model):
# Only auto-detect when the model was explicitly requested via arg or
# env var (not when it came from config — that's the "use my defaults"
# path and the configured provider is already correct).
explicit_model = (model or "").strip() or env_model
if explicit_model:
cfg_provider = ""
if isinstance(model_cfg, dict):
cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
current_provider = (
cfg_provider
or os.getenv("HERMES_INFERENCE_PROVIDER", "").strip().lower()
or "auto"
)
detected = detect_provider_for_model(explicit_model, current_provider)
if detected:
effective_provider, effective_model = detected
runtime = resolve_runtime_provider(
requested=effective_provider,
target_model=effective_model or None,
)
# Pull in whatever toolsets the user has enabled for "cli".
# sorted() gives stable ordering; set→list for AIAgent's signature.
toolsets_list = sorted(_get_platform_tools(cfg, "cli"))
agent = AIAgent(
api_key=runtime.get("api_key"),
base_url=runtime.get("base_url"),
provider=runtime.get("provider"),
api_mode=runtime.get("api_mode"),
model=effective_model,
enabled_toolsets=toolsets_list,
quiet_mode=True,
platform="cli",
credential_pool=runtime.get("credential_pool"),
# Interactive callbacks are intentionally NOT wired beyond this
# one. In oneshot mode there's no user sitting at a terminal:
# - clarify → returns a synthetic "pick a default" instruction
# so the agent continues instead of stalling on
# the tool's built-in "not available" error
# - sudo password prompt → terminal_tool gates on
# HERMES_INTERACTIVE which we never set
# - shell-hook approval → auto-approved via HERMES_ACCEPT_HOOKS=1
# (set above); also falls back to deny on non-tty
# - dangerous-command approval → bypassed via HERMES_YOLO_MODE=1
# - skill secret capture → returns gracefully when no callback set
clarify_callback=_oneshot_clarify_callback,
)
# Belt-and-braces: make sure AIAgent doesn't invoke any streaming
# display callbacks that would bypass our stdout capture.
agent.suppress_status_output = True
agent.stream_delta_callback = None
agent.tool_gen_callback = None
return agent.chat(prompt) or ""
def _oneshot_clarify_callback(question: str, choices=None) -> str:
"""Clarify is disabled in oneshot mode — tell the agent to pick a
default and proceed instead of stalling or erroring."""
if choices:
return (
f"[oneshot mode: no user available. Pick the best option from "
f"{choices} using your own judgment and continue.]"
)
return (
"[oneshot mode: no user available. Make the most reasonable "
"assumption you can and continue.]"
)
+6
View File
@@ -167,6 +167,12 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
transport="openai_chat",
base_url_env_var="OLLAMA_BASE_URL",
),
# Azure Foundry: supports both OpenAI-style and Anthropic-style endpoints.
# The transport is determined at runtime from config.yaml model.api_mode.
"azure-foundry": HermesOverlay(
transport="openai_chat", # default; overridden by api_mode in config
base_url_env_var="AZURE_FOUNDRY_BASE_URL",
),
}
+229
View File
@@ -0,0 +1,229 @@
"""PTY bridge for `hermes dashboard` chat tab.
Wraps a child process behind a pseudo-terminal so its ANSI output can be
streamed to a browser-side terminal emulator (xterm.js) and typed
keystrokes can be fed back in. The only caller today is the
``/api/pty`` WebSocket endpoint in ``hermes_cli.web_server``.
Design constraints:
* **POSIX-only.** Hermes Agent supports Windows exclusively via WSL, which
exposes a native POSIX PTY via ``openpty(3)``. Native Windows Python
has no PTY; :class:`PtyUnavailableError` is raised with a user-readable
install/platform message so the dashboard can render a banner instead of
crashing.
* **Zero Node dependency on the server side.** We use :mod:`ptyprocess`,
which is a pure-Python wrapper around the OS calls. The browser talks
to the same ``hermes --tui`` binary it would launch from the CLI, so
every TUI feature (slash popover, model picker, tool rows, markdown,
skin engine, clarify/sudo/approval prompts) ships automatically.
* **Byte-safe I/O.** Reads and writes go through the PTY master fd
directly we avoid :class:`ptyprocess.PtyProcessUnicode` because
streaming ANSI is inherently byte-oriented and UTF-8 boundaries may land
mid-read.
"""
from __future__ import annotations
import errno
import fcntl
import os
import select
import signal
import struct
import sys
import termios
import time
from typing import Optional, Sequence
try:
import ptyprocess # type: ignore
_PTY_AVAILABLE = not sys.platform.startswith("win")
except ImportError: # pragma: no cover - dev env without ptyprocess
ptyprocess = None # type: ignore
_PTY_AVAILABLE = False
__all__ = ["PtyBridge", "PtyUnavailableError"]
class PtyUnavailableError(RuntimeError):
"""Raised when a PTY cannot be created on this platform.
Today this means native Windows (no ConPTY bindings) or a dev
environment missing the ``ptyprocess`` dependency. The dashboard
surfaces the message to the user as a chat-tab banner.
"""
class PtyBridge:
"""Thin wrapper around ``ptyprocess.PtyProcess`` for byte streaming.
Not thread-safe. A single bridge is owned by the WebSocket handler
that spawned it; the reader runs in an executor thread while writes
happen on the event-loop thread. Both sides are OK because the
kernel PTY is the actual synchronization point we never call
:mod:`ptyprocess` methods concurrently, we only call ``os.read`` and
``os.write`` on the master fd, which is safe.
"""
def __init__(self, proc: "ptyprocess.PtyProcess"): # type: ignore[name-defined]
self._proc = proc
self._fd: int = proc.fd
self._closed = False
# -- lifecycle --------------------------------------------------------
@classmethod
def is_available(cls) -> bool:
"""True if a PTY can be spawned on this platform."""
return bool(_PTY_AVAILABLE)
@classmethod
def spawn(
cls,
argv: Sequence[str],
*,
cwd: Optional[str] = None,
env: Optional[dict] = None,
cols: int = 80,
rows: int = 24,
) -> "PtyBridge":
"""Spawn ``argv`` behind a new PTY and return a bridge.
Raises :class:`PtyUnavailableError` if the platform can't host a
PTY. Raises :class:`FileNotFoundError` or :class:`OSError` for
ordinary exec failures (missing binary, bad cwd, etc.).
"""
if not _PTY_AVAILABLE:
if sys.platform.startswith("win"):
raise PtyUnavailableError(
"Pseudo-terminals are unavailable on this platform. "
"Hermes Agent supports Windows only via WSL."
)
if ptyprocess is None:
raise PtyUnavailableError(
"The `ptyprocess` package is missing. "
"Install with: pip install ptyprocess "
"(or pip install -e '.[pty]')."
)
raise PtyUnavailableError("Pseudo-terminals are unavailable.")
# Let caller-supplied env fully override inheritance; if they pass
# None we inherit the server's env (same semantics as subprocess).
spawn_env = os.environ.copy() if env is None else env
proc = ptyprocess.PtyProcess.spawn( # type: ignore[union-attr]
list(argv),
cwd=cwd,
env=spawn_env,
dimensions=(rows, cols),
)
return cls(proc)
@property
def pid(self) -> int:
return int(self._proc.pid)
def is_alive(self) -> bool:
if self._closed:
return False
try:
return bool(self._proc.isalive())
except Exception:
return False
# -- I/O --------------------------------------------------------------
def read(self, timeout: float = 0.2) -> Optional[bytes]:
"""Read up to 64 KiB of raw bytes from the PTY master.
Returns:
* bytes zero or more bytes of child output
* empty bytes (``b""``) no data available within ``timeout``
* None child has exited and the master fd is at EOF
Never blocks longer than ``timeout`` seconds. Safe to call after
:meth:`close`; returns ``None`` in that case.
"""
if self._closed:
return None
try:
readable, _, _ = select.select([self._fd], [], [], timeout)
except (OSError, ValueError):
return None
if not readable:
return b""
try:
data = os.read(self._fd, 65536)
except OSError as exc:
# EIO on Linux = slave side closed. EBADF = already closed.
if exc.errno in (errno.EIO, errno.EBADF):
return None
raise
if not data:
return None
return data
def write(self, data: bytes) -> None:
"""Write raw bytes to the PTY master (i.e. the child's stdin)."""
if self._closed or not data:
return
# os.write can return a short write under load; loop until drained.
view = memoryview(data)
while view:
try:
n = os.write(self._fd, view)
except OSError as exc:
if exc.errno in (errno.EIO, errno.EBADF, errno.EPIPE):
return
raise
if n <= 0:
return
view = view[n:]
def resize(self, cols: int, rows: int) -> None:
"""Forward a terminal resize to the child via ``TIOCSWINSZ``."""
if self._closed:
return
# struct winsize: rows, cols, xpixel, ypixel (all unsigned short)
winsize = struct.pack("HHHH", max(1, rows), max(1, cols), 0, 0)
try:
fcntl.ioctl(self._fd, termios.TIOCSWINSZ, winsize)
except OSError:
pass
# -- teardown ---------------------------------------------------------
def close(self) -> None:
"""Terminate the child (SIGTERM → 0.5s grace → SIGKILL) and close fds.
Idempotent. Reaping the child is important so we don't leak
zombies across the lifetime of the dashboard process.
"""
if self._closed:
return
self._closed = True
# SIGHUP is the conventional "your terminal went away" signal.
# We escalate if the child ignores it.
for sig in (signal.SIGHUP, signal.SIGTERM, signal.SIGKILL):
if not self._proc.isalive():
break
try:
self._proc.kill(sig)
except Exception:
pass
deadline = time.monotonic() + 0.5
while self._proc.isalive() and time.monotonic() < deadline:
time.sleep(0.02)
try:
self._proc.close(force=True)
except Exception:
pass
# Context-manager sugar — handy in tests and ad-hoc scripts.
def __enter__(self) -> "PtyBridge":
return self
def __exit__(self, *_exc) -> None:
self.close()
+148 -7
View File
@@ -221,6 +221,19 @@ def _resolve_runtime_from_pool_entry(
elif provider == "copilot":
api_mode = _copilot_runtime_api_mode(model_cfg, getattr(entry, "runtime_api_key", ""))
base_url = base_url or PROVIDER_REGISTRY["copilot"].inference_base_url
elif provider == "azure-foundry":
# Azure Foundry: read api_mode and base_url from config
cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
if cfg_provider == "azure-foundry":
cfg_base_url = str(model_cfg.get("base_url") or "").strip().rstrip("/")
if cfg_base_url:
base_url = cfg_base_url
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
if configured_mode:
api_mode = configured_mode
# For Anthropic-style endpoints, strip /v1 suffix
if api_mode == "anthropic_messages":
base_url = re.sub(r"/v1/?$", "", base_url)
else:
configured_provider = str(model_cfg.get("provider") or "").strip().lower()
# Honour model.base_url from config.yaml when the configured provider
@@ -589,6 +602,71 @@ def _resolve_openrouter_runtime(
}
def _resolve_azure_foundry_runtime(
*,
requested_provider: str,
model_cfg: Dict[str, Any],
explicit_api_key: Optional[str] = None,
explicit_base_url: Optional[str] = None,
) -> Dict[str, Any]:
"""Resolve an Azure Foundry runtime entry.
Reads ``model.base_url`` + ``model.api_mode`` from config.yaml (or
explicit overrides), pulls the API key from ``.env`` / env var, and
strips a trailing ``/v1`` for Anthropic-style endpoints because the
Anthropic SDK appends ``/v1/messages`` internally.
Raises :class:`AuthError` when required values are missing.
"""
explicit_api_key = str(explicit_api_key or "").strip()
explicit_base_url_clean = str(explicit_base_url or "").strip().rstrip("/")
cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
cfg_base_url = ""
cfg_api_mode = "chat_completions"
if cfg_provider == "azure-foundry":
cfg_base_url = str(model_cfg.get("base_url") or "").strip().rstrip("/")
cfg_api_mode = _parse_api_mode(model_cfg.get("api_mode")) or "chat_completions"
env_base_url = os.getenv("AZURE_FOUNDRY_BASE_URL", "").strip().rstrip("/")
base_url = explicit_base_url_clean or cfg_base_url or env_base_url
if not base_url:
raise AuthError(
"Azure Foundry requires a base URL. Set it via 'hermes model' or "
"the AZURE_FOUNDRY_BASE_URL environment variable."
)
api_key = explicit_api_key
if not api_key:
try:
from hermes_cli.config import get_env_value
api_key = get_env_value("AZURE_FOUNDRY_API_KEY") or ""
except Exception:
api_key = ""
if not api_key:
api_key = os.getenv("AZURE_FOUNDRY_API_KEY", "").strip()
if not api_key:
raise AuthError(
"Azure Foundry requires an API key. Set AZURE_FOUNDRY_API_KEY in "
"~/.hermes/.env or run 'hermes model' to configure."
)
# Anthropic SDK appends /v1/messages itself, so strip any trailing /v1
# we inherited from the configured base_url to avoid double-/v1 paths.
if cfg_api_mode == "anthropic_messages":
base_url = re.sub(r"/v1/?$", "", base_url)
source = "explicit" if (explicit_api_key or explicit_base_url) else "config"
return {
"provider": "azure-foundry",
"api_mode": cfg_api_mode,
"base_url": base_url,
"api_key": api_key,
"source": source,
"requested_provider": requested_provider,
}
def _resolve_explicit_runtime(
*,
provider: str,
@@ -678,6 +756,15 @@ def _resolve_explicit_runtime(
"requested_provider": requested_provider,
}
# Azure Foundry: user-configured endpoint with selectable API mode
if provider == "azure-foundry":
return _resolve_azure_foundry_runtime(
requested_provider=requested_provider,
model_cfg=model_cfg,
explicit_api_key=explicit_api_key,
explicit_base_url=explicit_base_url,
)
pconfig = PROVIDER_REGISTRY.get(provider)
if pconfig and pconfig.auth_type == "api_key":
env_url = ""
@@ -746,6 +833,40 @@ def resolve_runtime_provider(
"""
requested_provider = resolve_requested_provider(requested)
# Azure Anthropic short-circuit: when explicitly targeting an Azure endpoint
# with provider="anthropic", bypass _resolve_named_custom_runtime (which would
# return provider="custom" with chat_completions api_mode and no valid key).
# Instead, use the Azure key directly with anthropic_messages api_mode.
_eff_base = (explicit_base_url or "").strip()
if requested_provider == "anthropic" and "azure.com" in _eff_base:
_azure_key = (
(explicit_api_key or "").strip()
or os.getenv("AZURE_ANTHROPIC_KEY", "").strip()
or os.getenv("ANTHROPIC_API_KEY", "").strip()
)
return {
"provider": "anthropic",
"api_mode": "anthropic_messages",
"base_url": _eff_base.rstrip("/"),
"api_key": _azure_key,
"source": "azure-explicit",
"requested_provider": requested_provider,
}
# Azure Foundry: user-configured endpoint with selectable API mode
# (OpenAI-style chat_completions or Anthropic-style anthropic_messages).
# Resolve before the custom-runtime / pool / generic paths so Azure
# config is always picked up from model.base_url + model.api_mode,
# regardless of whether the caller passed explicit_* args.
if requested_provider == "azure-foundry":
azure_runtime = _resolve_azure_foundry_runtime(
requested_provider=requested_provider,
model_cfg=_get_model_config(),
explicit_api_key=explicit_api_key,
explicit_base_url=explicit_base_url,
)
return azure_runtime
custom_runtime = _resolve_named_custom_runtime(
requested_provider=requested_provider,
explicit_api_key=explicit_api_key,
@@ -924,13 +1045,6 @@ def resolve_runtime_provider(
# Anthropic (native Messages API)
if provider == "anthropic":
from agent.anthropic_adapter import resolve_anthropic_token
token = resolve_anthropic_token()
if not token:
raise AuthError(
"No Anthropic credentials found. Set ANTHROPIC_TOKEN or ANTHROPIC_API_KEY, "
"run 'claude setup-token', or authenticate with 'claude /login'."
)
# Allow base URL override from config.yaml model.base_url, but only
# when the configured provider is anthropic — otherwise a non-Anthropic
# base_url (e.g. Codex endpoint) would leak into Anthropic requests.
@@ -939,6 +1053,33 @@ def resolve_runtime_provider(
if cfg_provider == "anthropic":
cfg_base_url = (model_cfg.get("base_url") or "").strip().rstrip("/")
base_url = cfg_base_url or "https://api.anthropic.com"
# For Azure AI Foundry endpoints, use ANTHROPIC_API_KEY directly —
# Claude Code OAuth tokens (sk-ant-oat01) are not accepted by Azure.
# Azure keys don't start with "sk-ant-" so resolve_anthropic_token()
# would find the Claude Code OAuth token first (priority 3) and return
# that instead, causing 401s. Detect Azure endpoints and use the env
# key directly to bypass the OAuth priority chain.
_is_azure_endpoint = "azure.com" in base_url.lower() or (
cfg_base_url and "azure.com" in cfg_base_url.lower()
)
if _is_azure_endpoint:
token = (
os.getenv("AZURE_ANTHROPIC_KEY", "").strip()
or os.getenv("ANTHROPIC_API_KEY", "").strip()
)
if not token:
raise AuthError(
"No Azure Anthropic API key found. Set AZURE_ANTHROPIC_KEY or ANTHROPIC_API_KEY."
)
else:
from agent.anthropic_adapter import resolve_anthropic_token
token = resolve_anthropic_token()
if not token:
raise AuthError(
"No Anthropic credentials found. Set ANTHROPIC_TOKEN or ANTHROPIC_API_KEY, "
"run 'claude setup-token', or authenticate with 'claude /login'."
)
return {
"provider": "anthropic",
"api_mode": "anthropic_messages",
+27 -49
View File
@@ -2863,17 +2863,6 @@ SETUP_SECTIONS = [
("agent", "Agent Settings", setup_agent_settings),
]
# The returning-user menu intentionally omits standalone TTS because model setup
# already includes TTS selection and tools setup covers the rest of the provider
# configuration. Keep this list in the same order as the visible menu entries.
RETURNING_USER_MENU_SECTION_KEYS = [
"model",
"terminal",
"gateway",
"tools",
"agent",
]
def run_setup_wizard(args):
"""Run the interactive setup wizard.
@@ -2898,6 +2887,9 @@ def run_setup_wizard(args):
save_config(copy.deepcopy(DEFAULT_CONFIG))
print_success("Configuration reset to defaults.")
reconfigure_requested = bool(getattr(args, "reconfigure", False))
quick_requested = bool(getattr(args, "quick", False))
config = load_config()
hermes_home = get_hermes_home()
@@ -2989,50 +2981,36 @@ def run_setup_wizard(args):
migration_ran = False
if is_existing:
# ── Returning User Menu ──
print()
print_header("Welcome Back!")
print_success("You already have Hermes configured.")
print()
menu_choices = [
"Quick Setup - configure missing items only",
"Full Setup - reconfigure everything",
"Model & Provider",
"Terminal Backend",
"Messaging Platforms (Gateway)",
"Tools",
"Agent Settings",
"Exit",
]
choice = prompt_choice("What would you like to do?", menu_choices, 0)
if choice == 0:
# Quick setup
# Existing install — default is the full-wizard reconfigure flow.
# Every prompt shows the current value as its default, so pressing
# Enter keeps it. Opt into `--quick` for the narrow "just fill in
# missing items" flow (useful after a partial OpenClaw migration
# or when a required API key got cleared).
if quick_requested:
_run_quick_setup(config, hermes_home)
return
elif choice == 1:
# Full setup — fall through to run all sections
pass
elif choice == 7:
print_info("Exiting. Run 'hermes setup' again when ready.")
return
elif 2 <= choice <= 6:
# Individual section — map by key, not by position.
# SETUP_SECTIONS includes TTS but the returning-user menu skips it,
# so positional indexing (choice - 2) would dispatch the wrong section.
section_key = RETURNING_USER_MENU_SECTION_KEYS[choice - 2]
section = next((s for s in SETUP_SECTIONS if s[0] == section_key), None)
if section:
_, label, func = section
func(config)
save_config(config)
_print_setup_summary(config, hermes_home)
return
print()
print_header("Reconfigure")
print_success("You already have Hermes configured.")
print_info("Running the full wizard — each prompt shows your current value.")
print_info("Press Enter to keep it, or type a new value to change it.")
print_info("")
print_info("Tip: jump straight to a section with 'hermes setup model|terminal|")
print_info(" gateway|tools|agent', or fill only missing items with --quick.")
# Fall through to the "Full Setup — run all sections" block below.
# --reconfigure is now the default on existing installs; the flag
# is preserved for backwards compatibility but is a no-op here.
else:
# ── First-Time Setup ──
print()
# --reconfigure / --quick on a fresh install are meaningless — fall
# through to the normal first-time flow.
if reconfigure_requested or quick_requested:
print_info("No existing configuration found — running first-time setup.")
print()
# Offer OpenClaw migration before configuration begins
migration_ran = _offer_openclaw_migration(hermes_home)
if migration_ran:
+1 -2
View File
@@ -10,8 +10,7 @@ import random
TIPS = [
# --- Slash Commands ---
"/btw <question> asks a quick side question without tools or history — great for clarifications.",
"/background <prompt> runs a task in a separate session while your current one stays free.",
"/background <prompt> (alias /bg or /btw) runs a task in a separate session while your current one stays free.",
"/branch forks the current session so you can explore a different direction without losing progress.",
"/compress manually compresses conversation context when things get long.",
"/rollback lists filesystem checkpoints — restore files the agent modified to any prior state.",
+155 -19
View File
@@ -68,25 +68,58 @@ CONFIGURABLE_TOOLSETS = [
("rl", "🧪 RL Training", "Tinker-Atropos training tools"),
("homeassistant", "🏠 Home Assistant", "smart home device control"),
("spotify", "🎵 Spotify", "playback, search, playlists, library"),
("discord", "💬 Discord (read/participate)", "fetch messages, search members, create thread"),
("discord_admin", "🛡️ Discord Server Admin", "list channels/roles, pin, assign roles"),
]
# Toolsets that are OFF by default for new installs.
# They're still in _HERMES_CORE_TOOLS (available at runtime if enabled),
# but the setup checklist won't pre-select them for first-time users.
_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl", "spotify"}
_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl", "spotify", "discord", "discord_admin"}
# Platform-scoped toolsets: only appear in the `hermes tools` checklist for
# these platforms, and only resolve/save for these platforms. A toolset
# absent from this map is available on every platform (current behaviour).
#
# Use this for tools whose APIs only make sense on one platform (Discord
# server admin, Slack workspace admin, etc.). Keeps every other platform's
# checklist from filling up with irrelevant toggles.
_TOOLSET_PLATFORM_RESTRICTIONS: Dict[str, Set[str]] = {
"discord": {"discord"},
"discord_admin": {"discord"},
}
def _toolset_allowed_for_platform(ts_key: str, platform: str) -> bool:
"""Return True if ``ts_key`` is configurable on ``platform``.
Toolsets without a restriction entry are allowed everywhere (the default).
"""
allowed = _TOOLSET_PLATFORM_RESTRICTIONS.get(ts_key)
return allowed is None or platform in allowed
def _get_effective_configurable_toolsets():
"""Return CONFIGURABLE_TOOLSETS + any plugin-provided toolsets.
Plugin toolsets are appended at the end so they appear after the
built-in toolsets in the TUI checklist.
built-in toolsets in the TUI checklist. A plugin whose toolset key
already appears in ``CONFIGURABLE_TOOLSETS`` is skipped bundled
plugins (e.g. ``plugins/spotify``) share their toolset key with the
built-in entry, and we want the built-in label/description to win.
Without the dedupe, ``hermes tools`` "reconfigure existing" would
list the same toolset twice.
"""
result = list(CONFIGURABLE_TOOLSETS)
seen = {ts_key for ts_key, _, _ in result}
try:
from hermes_cli.plugins import discover_plugins, get_plugin_toolsets
discover_plugins() # idempotent — ensures plugins are loaded
result.extend(get_plugin_toolsets())
for entry in get_plugin_toolsets():
if entry[0] in seen:
continue
seen.add(entry[0])
result.append(entry)
except Exception:
pass
return result
@@ -368,13 +401,9 @@ TOOL_CATEGORIES = {
"providers": [
{
"name": "Spotify Web API",
"tag": "PKCE OAuth — run `hermes auth spotify` after this",
"env_vars": [
{"key": "HERMES_SPOTIFY_CLIENT_ID", "prompt": "Spotify app client_id",
"url": "https://developer.spotify.com/dashboard"},
{"key": "HERMES_SPOTIFY_REDIRECT_URI", "prompt": "Redirect URI (must be allow-listed in your Spotify app)",
"default": "http://127.0.0.1:43827/spotify/callback"},
],
"tag": "PKCE OAuth — opens the setup wizard",
"env_vars": [],
"post_setup": "spotify",
},
],
},
@@ -478,6 +507,35 @@ def _run_post_setup(post_setup_key: str):
_print_warning(" kittentts install timed out (>5min)")
_print_info(f" Run manually: python -m pip install -U '{wheel_url}' soundfile")
elif post_setup_key == "spotify":
# Run the full `hermes auth spotify` flow — if the user has no
# client_id yet, this drops them into the interactive wizard
# (opens the Spotify dashboard, prompts for client_id, persists
# to ~/.hermes/.env), then continues straight into PKCE. If they
# already have an app, it skips the wizard and just does OAuth.
from types import SimpleNamespace
try:
from hermes_cli.auth import login_spotify_command
except Exception as exc:
_print_warning(f" Could not load Spotify auth: {exc}")
_print_info(" Run manually: hermes auth spotify")
return
_print_info(" Starting Spotify login...")
try:
login_spotify_command(SimpleNamespace(
client_id=None, redirect_uri=None, scope=None,
no_browser=False, timeout=None,
))
_print_success(" Spotify authenticated")
except SystemExit as exc:
# User aborted the wizard, or OAuth failed — don't fail the
# toolset enable; they can retry with `hermes auth spotify`.
_print_warning(f" Spotify login did not complete: {exc}")
_print_info(" Run later: hermes auth spotify")
except Exception as exc:
_print_warning(f" Spotify login failed: {exc}")
_print_info(" Run manually: hermes auth spotify")
elif post_setup_key == "rl_training":
try:
__import__("tinker_atropos")
@@ -566,7 +624,7 @@ def _get_platform_tools(
include_default_mcp_servers: bool = True,
) -> Set[str]:
"""Resolve which individual toolset names are enabled for a platform."""
from toolsets import resolve_toolset
from toolsets import resolve_toolset, TOOLSETS
platform_toolsets = config.get("platform_toolsets") or {}
toolset_names = platform_toolsets.get(platform)
@@ -580,6 +638,8 @@ def _get_platform_tools(
toolset_names = [str(ts) for ts in toolset_names]
configurable_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
plugin_ts_keys = _get_plugin_toolset_keys()
platform_default_keys = {p["default_toolset"] for p in PLATFORMS.values()}
# If the saved list contains any configurable keys directly, the user
# has explicitly configured this platform — use direct membership.
@@ -589,7 +649,10 @@ def _get_platform_tools(
has_explicit_config = any(ts in configurable_keys for ts in toolset_names)
if has_explicit_config:
enabled_toolsets = {ts for ts in toolset_names if ts in configurable_keys}
enabled_toolsets = {
ts for ts in toolset_names
if ts in configurable_keys and _toolset_allowed_for_platform(ts, platform)
}
else:
# No explicit config — fall back to resolving composite toolset names
# (e.g. "hermes-cli") to individual tool names and reverse-mapping.
@@ -599,14 +662,52 @@ def _get_platform_tools(
enabled_toolsets = set()
for ts_key, _, _ in CONFIGURABLE_TOOLSETS:
if not _toolset_allowed_for_platform(ts_key, platform):
continue
ts_tools = set(resolve_toolset(ts_key))
if ts_tools and ts_tools.issubset(all_tool_names):
enabled_toolsets.add(ts_key)
default_off = set(_DEFAULT_OFF_TOOLSETS)
if platform in default_off:
# Legacy safety: if the platform's own name matches a default-off
# toolset (e.g. `homeassistant` platform + `homeassistant` toolset),
# keep that toolset enabled on first install. Skip this dodge for
# platform-restricted toolsets — those are always opt-in even on
# their own platform (e.g. `discord` + `discord` should stay OFF).
if platform in default_off and platform not in _TOOLSET_PLATFORM_RESTRICTIONS:
default_off.remove(platform)
enabled_toolsets -= default_off
# Recover non-configurable platform toolsets (e.g. discord, feishu_doc,
# feishu_drive). These are part of the platform's default composite but
# absent from CONFIGURABLE_TOOLSETS, so they can't appear in the TUI
# checklist or in a user-saved config. Must run in BOTH branches —
# otherwise saving via `hermes tools` (which flips has_explicit_config
# to True) silently drops them.
platform_tool_universe = set(resolve_toolset(PLATFORMS[platform]["default_toolset"]))
configurable_tool_universe = set()
for ck in configurable_keys:
configurable_tool_universe.update(resolve_toolset(ck))
claimed = set()
for ts_key in enabled_toolsets:
claimed.update(resolve_toolset(ts_key))
skip = configurable_keys | plugin_ts_keys | platform_default_keys
skip |= {k for k in TOOLSETS if k.startswith("hermes-")}
skip |= set(_DEFAULT_OFF_TOOLSETS) - {platform}
for ts_key, ts_def in TOOLSETS.items():
if ts_key in skip:
continue
if ts_def.get("includes"):
continue
ts_tools = set(resolve_toolset(ts_key))
if not ts_tools or not ts_tools.issubset(platform_tool_universe):
continue
if ts_tools.issubset(configurable_tool_universe):
continue
if not ts_tools.issubset(claimed):
enabled_toolsets.add(ts_key)
claimed.update(ts_tools)
# Plugin toolsets: enabled by default unless explicitly disabled, or
# unless the toolset is in _DEFAULT_OFF_TOOLSETS (e.g. spotify —
# shipped as a bundled plugin but user must opt in via `hermes tools`
@@ -614,7 +715,6 @@ def _get_platform_tools(
# A plugin toolset is "known" for a platform once `hermes tools`
# has been saved for that platform (tracked via known_plugin_toolsets).
# Unknown plugins default to enabled; known-but-absent = disabled.
plugin_ts_keys = _get_plugin_toolset_keys()
if plugin_ts_keys:
known_map = config.get("known_plugin_toolsets", {})
known_for_platform = set(known_map.get(platform, []))
@@ -632,7 +732,6 @@ def _get_platform_tools(
# Preserve any explicit non-configurable toolset entries (for example,
# custom toolsets or MCP server names saved in platform_toolsets).
platform_default_keys = {p["default_toolset"] for p in PLATFORMS.values()}
explicit_passthrough = {
ts
for ts in toolset_names
@@ -678,6 +777,14 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
"""
config.setdefault("platform_toolsets", {})
# Drop platform-scoped toolsets that don't apply here. Prevents the
# "Configure all platforms" checklist (or a hand-edited config.yaml)
# from turning on, say, the `discord` toolset for Telegram.
enabled_toolset_keys = {
ts for ts in enabled_toolset_keys
if _toolset_allowed_for_platform(ts, platform)
}
# Get the set of all configurable toolset keys (built-in + plugin)
configurable_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
plugin_keys = _get_plugin_toolset_keys()
@@ -692,6 +799,7 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
existing_toolsets = config.get("platform_toolsets", {}).get(platform, [])
if not isinstance(existing_toolsets, list):
existing_toolsets = []
existing_toolsets = [str(ts) for ts in existing_toolsets]
# Preserve any entries that are NOT configurable toolsets and NOT platform
# defaults (i.e. only MCP server names should be preserved)
@@ -699,6 +807,11 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
entry for entry in existing_toolsets
if entry not in configurable_keys and entry not in platform_default_keys
}
# Opening `hermes tools` is the user's opt-in to reconfigure tools, so treat
# saving from the picker as consent to clear the "no_mcp" sentinel. The
# picker has no checkbox for no_mcp, so without this users who once set it
# by hand could never re-enable MCP servers through the UI.
preserved_entries.discard("no_mcp")
# Merge preserved entries with new enabled toolsets
config["platform_toolsets"][platform] = sorted(enabled_toolset_keys | preserved_entries)
@@ -806,7 +919,7 @@ def _estimate_tool_tokens() -> Dict[str, int]:
return _tool_token_cache
def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str]:
def _prompt_toolset_checklist(platform_label: str, enabled: Set[str], platform: str = "cli") -> Set[str]:
"""Multi-select checklist of toolsets. Returns set of selected toolset keys."""
from hermes_cli.curses_ui import curses_checklist
from toolsets import resolve_toolset
@@ -814,7 +927,12 @@ def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str
# Pre-compute per-tool token counts (cached after first call).
tool_tokens = _estimate_tool_tokens()
effective = _get_effective_configurable_toolsets()
effective_all = _get_effective_configurable_toolsets()
# Drop platform-scoped toolsets that don't apply to this platform.
effective = [
(k, l, d) for (k, l, d) in effective_all
if _toolset_allowed_for_platform(k, platform)
]
labels = []
for ts_key, ts_label, ts_desc in effective:
@@ -1728,7 +1846,7 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
checklist_preselected = current_enabled - _DEFAULT_OFF_TOOLSETS
# Show checklist
new_enabled = _prompt_toolset_checklist(pinfo["label"], checklist_preselected)
new_enabled = _prompt_toolset_checklist(pinfo["label"], checklist_preselected, pkey)
added = new_enabled - current_enabled
removed = current_enabled - new_enabled
@@ -2084,7 +2202,11 @@ def _apply_mcp_change(config: dict, targets: List[str], action: str) -> Set[str]
def _print_tools_list(enabled_toolsets: set, mcp_servers: dict, platform: str = "cli"):
"""Print a summary of enabled/disabled toolsets and MCP tool filters."""
effective = _get_effective_configurable_toolsets()
effective_all = _get_effective_configurable_toolsets()
effective = [
(k, l, d) for (k, l, d) in effective_all
if _toolset_allowed_for_platform(k, platform)
]
builtin_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
print(f"Built-in toolsets ({platform}):")
@@ -2150,6 +2272,20 @@ def tools_disable_enable_command(args):
_print_error(f"Unknown toolset '{name}'")
toolset_targets = [t for t in toolset_targets if t in valid_toolsets]
# Reject platform-scoped toolsets on platforms that don't allow them.
restricted_targets = [
t for t in toolset_targets
if not _toolset_allowed_for_platform(t, platform)
]
if restricted_targets:
for name in restricted_targets:
allowed = sorted(_TOOLSET_PLATFORM_RESTRICTIONS.get(name) or set())
_print_error(
f"Toolset '{name}' is not available on platform '{platform}' "
f"(only: {', '.join(allowed)})"
)
toolset_targets = [t for t in toolset_targets if t not in restricted_targets]
if toolset_targets:
_apply_toolset_change(config, platform, toolset_targets, action)
+365 -14
View File
@@ -49,7 +49,7 @@ from hermes_cli.config import (
from gateway.status import get_running_pid, read_runtime_status
try:
from fastapi import FastAPI, HTTPException, Request
from fastapi import FastAPI, HTTPException, Request, WebSocket, WebSocketDisconnect
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import FileResponse, HTMLResponse, JSONResponse
from fastapi.staticfiles import StaticFiles
@@ -73,6 +73,10 @@ app = FastAPI(title="Hermes Agent", version=__version__)
_SESSION_TOKEN = secrets.token_urlsafe(32)
_SESSION_HEADER_NAME = "X-Hermes-Session-Token"
# In-browser Chat tab (/chat, /api/pty, …). Off unless ``hermes dashboard --tui``
# or HERMES_DASHBOARD_TUI=1. Set from :func:`start_server`.
_DASHBOARD_EMBEDDED_CHAT_ENABLED = False
# Simple rate limiter for the reveal endpoint
_reveal_timestamps: List[float] = []
_REVEAL_MAX_PER_WINDOW = 5
@@ -283,7 +287,7 @@ _SCHEMA_OVERRIDES: Dict[str, Dict[str, Any]] = {
"display.busy_input_mode": {
"type": "select",
"description": "Input behavior while agent is running",
"options": ["queue", "interrupt", "block"],
"options": ["interrupt", "queue"],
},
"memory.provider": {
"type": "select",
@@ -1529,26 +1533,30 @@ def _submit_anthropic_pkce(session_id: str, code_input: str) -> Dict[str, Any]:
with urllib.request.urlopen(req, timeout=20) as resp:
result = json.loads(resp.read().decode())
except Exception as e:
sess["status"] = "error"
sess["error_message"] = f"Token exchange failed: {e}"
with _oauth_sessions_lock:
sess["status"] = "error"
sess["error_message"] = f"Token exchange failed: {e}"
return {"ok": False, "status": "error", "message": sess["error_message"]}
access_token = result.get("access_token", "")
refresh_token = result.get("refresh_token", "")
expires_in = int(result.get("expires_in") or 3600)
if not access_token:
sess["status"] = "error"
sess["error_message"] = "No access token returned"
with _oauth_sessions_lock:
sess["status"] = "error"
sess["error_message"] = "No access token returned"
return {"ok": False, "status": "error", "message": sess["error_message"]}
expires_at_ms = int(time.time() * 1000) + (expires_in * 1000)
try:
_save_anthropic_oauth_creds(access_token, refresh_token, expires_at_ms)
except Exception as e:
sess["status"] = "error"
sess["error_message"] = f"Save failed: {e}"
with _oauth_sessions_lock:
sess["status"] = "error"
sess["error_message"] = f"Save failed: {e}"
return {"ok": False, "status": "error", "message": sess["error_message"]}
sess["status"] = "approved"
with _oauth_sessions_lock:
sess["status"] = "approved"
_log.info("oauth/pkce: anthropic login completed (session=%s)", session_id)
return {"ok": True, "status": "approved"}
@@ -2263,6 +2271,329 @@ async def get_usage_analytics(days: int = 30):
db.close()
# ---------------------------------------------------------------------------
# /api/pty — PTY-over-WebSocket bridge for the dashboard "Chat" tab.
#
# The endpoint spawns the same ``hermes --tui`` binary the CLI uses, behind
# a POSIX pseudo-terminal, and forwards bytes + resize escapes across a
# WebSocket. The browser renders the ANSI through xterm.js (see
# web/src/pages/ChatPage.tsx).
#
# Auth: ``?token=<session_token>`` query param (browsers can't set
# Authorization on the WS upgrade). Same ephemeral ``_SESSION_TOKEN`` as
# REST. Localhost-only — we defensively reject non-loopback clients even
# though uvicorn binds to 127.0.0.1.
# ---------------------------------------------------------------------------
import re
import asyncio
from hermes_cli.pty_bridge import PtyBridge, PtyUnavailableError
_RESIZE_RE = re.compile(rb"\x1b\[RESIZE:(\d+);(\d+)\]")
_PTY_READ_CHUNK_TIMEOUT = 0.2
_VALID_CHANNEL_RE = re.compile(r"^[A-Za-z0-9._-]{1,128}$")
# Starlette's TestClient reports the peer as "testclient"; treat it as
# loopback so tests don't need to rewrite request scope.
_LOOPBACK_HOSTS = frozenset({"127.0.0.1", "::1", "localhost", "testclient"})
# Per-channel subscriber registry used by /api/pub (PTY-side gateway → dashboard)
# and /api/events (dashboard → browser sidebar). Keyed by an opaque channel id
# the chat tab generates on mount; entries auto-evict when the last subscriber
# drops AND the publisher has disconnected.
_event_channels: dict[str, set] = {}
_event_lock = asyncio.Lock()
def _resolve_chat_argv(
resume: Optional[str] = None,
sidecar_url: Optional[str] = None,
) -> tuple[list[str], Optional[str], Optional[dict]]:
"""Resolve the argv + cwd + env for the chat PTY.
Default: whatever ``hermes --tui`` would run. Tests monkeypatch this
function to inject a tiny fake command (``cat``, ``sh -c 'printf …'``)
so nothing has to build Node or the TUI bundle.
Session resume is propagated via the ``HERMES_TUI_RESUME`` env var
matching what ``hermes_cli.main._launch_tui`` does for the CLI path.
Appending ``--resume <id>`` to argv doesn't work because ``ui-tui`` does
not parse its argv.
`sidecar_url` (when set) is forwarded as ``HERMES_TUI_SIDECAR_URL`` so
the spawned ``tui_gateway.entry`` can mirror dispatcher emits to the
dashboard's ``/api/pub`` endpoint (see :func:`pub_ws`).
"""
from hermes_cli.main import PROJECT_ROOT, _make_tui_argv
argv, cwd = _make_tui_argv(PROJECT_ROOT / "ui-tui", tui_dev=False)
env: Optional[dict] = None
if resume or sidecar_url:
env = os.environ.copy()
if resume:
env["HERMES_TUI_RESUME"] = resume
if sidecar_url:
env["HERMES_TUI_SIDECAR_URL"] = sidecar_url
return list(argv), str(cwd) if cwd else None, env
def _build_sidecar_url(channel: str) -> Optional[str]:
"""ws:// URL the PTY child should publish events to, or None when unbound."""
host = getattr(app.state, "bound_host", None)
port = getattr(app.state, "bound_port", None)
if not host or not port:
return None
netloc = f"[{host}]:{port}" if ":" in host and not host.startswith("[") else f"{host}:{port}"
qs = urllib.parse.urlencode({"token": _SESSION_TOKEN, "channel": channel})
return f"ws://{netloc}/api/pub?{qs}"
async def _broadcast_event(channel: str, payload: str) -> None:
"""Fan out one publisher frame to every subscriber on `channel`."""
async with _event_lock:
subs = list(_event_channels.get(channel, ()))
for sub in subs:
try:
await sub.send_text(payload)
except Exception:
# Subscriber went away mid-send; the /api/events finally clause
# will remove it from the registry on its next iteration.
pass
def _channel_or_close_code(ws: WebSocket) -> Optional[str]:
"""Return the channel id from the query string or None if invalid."""
channel = ws.query_params.get("channel", "")
return channel if _VALID_CHANNEL_RE.match(channel) else None
@app.websocket("/api/pty")
async def pty_ws(ws: WebSocket) -> None:
if not _DASHBOARD_EMBEDDED_CHAT_ENABLED:
await ws.close(code=4403)
return
# --- auth + loopback check (before accept so we can close cleanly) ---
token = ws.query_params.get("token", "")
expected = _SESSION_TOKEN
if not hmac.compare_digest(token.encode(), expected.encode()):
await ws.close(code=4401)
return
client_host = ws.client.host if ws.client else ""
if client_host and client_host not in _LOOPBACK_HOSTS:
await ws.close(code=4403)
return
await ws.accept()
# --- spawn PTY ------------------------------------------------------
resume = ws.query_params.get("resume") or None
channel = _channel_or_close_code(ws)
sidecar_url = _build_sidecar_url(channel) if channel else None
try:
argv, cwd, env = _resolve_chat_argv(resume=resume, sidecar_url=sidecar_url)
except SystemExit as exc:
# _make_tui_argv calls sys.exit(1) when node/npm is missing.
await ws.send_text(f"\r\n\x1b[31mChat unavailable: {exc}\x1b[0m\r\n")
await ws.close(code=1011)
return
try:
bridge = PtyBridge.spawn(argv, cwd=cwd, env=env)
except PtyUnavailableError as exc:
await ws.send_text(f"\r\n\x1b[31mChat unavailable: {exc}\x1b[0m\r\n")
await ws.close(code=1011)
return
except (FileNotFoundError, OSError) as exc:
await ws.send_text(f"\r\n\x1b[31mChat failed to start: {exc}\x1b[0m\r\n")
await ws.close(code=1011)
return
loop = asyncio.get_running_loop()
# --- reader task: PTY master → WebSocket ----------------------------
async def pump_pty_to_ws() -> None:
while True:
chunk = await loop.run_in_executor(
None, bridge.read, _PTY_READ_CHUNK_TIMEOUT
)
if chunk is None: # EOF
return
if not chunk: # no data this tick; yield control and retry
await asyncio.sleep(0)
continue
try:
await ws.send_bytes(chunk)
except Exception:
return
reader_task = asyncio.create_task(pump_pty_to_ws())
# --- writer loop: WebSocket → PTY master ----------------------------
try:
while True:
msg = await ws.receive()
msg_type = msg.get("type")
if msg_type == "websocket.disconnect":
break
raw = msg.get("bytes")
if raw is None:
text = msg.get("text")
raw = text.encode("utf-8") if isinstance(text, str) else b""
if not raw:
continue
# Resize escape is consumed locally, never written to the PTY.
match = _RESIZE_RE.match(raw)
if match and match.end() == len(raw):
cols = int(match.group(1))
rows = int(match.group(2))
bridge.resize(cols=cols, rows=rows)
continue
bridge.write(raw)
except WebSocketDisconnect:
pass
finally:
reader_task.cancel()
try:
await reader_task
except (asyncio.CancelledError, Exception):
pass
bridge.close()
# ---------------------------------------------------------------------------
# /api/ws — JSON-RPC WebSocket sidecar for the dashboard "Chat" tab.
#
# Drives the same `tui_gateway.dispatch` surface Ink uses over stdio, so the
# dashboard can render structured metadata (model badge, tool-call sidebar,
# slash launcher, session info) alongside the xterm.js terminal that PTY
# already paints. Both transports bind to the same session id when one is
# active, so a tool.start emitted by the agent fans out to both sinks.
# ---------------------------------------------------------------------------
@app.websocket("/api/ws")
async def gateway_ws(ws: WebSocket) -> None:
if not _DASHBOARD_EMBEDDED_CHAT_ENABLED:
await ws.close(code=4403)
return
token = ws.query_params.get("token", "")
if not hmac.compare_digest(token.encode(), _SESSION_TOKEN.encode()):
await ws.close(code=4401)
return
client_host = ws.client.host if ws.client else ""
if client_host and client_host not in _LOOPBACK_HOSTS:
await ws.close(code=4403)
return
from tui_gateway.ws import handle_ws
await handle_ws(ws)
# ---------------------------------------------------------------------------
# /api/pub + /api/events — chat-tab event broadcast.
#
# The PTY-side ``tui_gateway.entry`` opens /api/pub at startup (driven by
# HERMES_TUI_SIDECAR_URL set in /api/pty's PTY env) and writes every
# dispatcher emit through it. The dashboard fans those frames out to any
# subscriber that opened /api/events on the same channel id. This is what
# gives the React sidebar its tool-call feed without breaking the PTY
# child's stdio handshake with Ink.
# ---------------------------------------------------------------------------
@app.websocket("/api/pub")
async def pub_ws(ws: WebSocket) -> None:
if not _DASHBOARD_EMBEDDED_CHAT_ENABLED:
await ws.close(code=4403)
return
token = ws.query_params.get("token", "")
if not hmac.compare_digest(token.encode(), _SESSION_TOKEN.encode()):
await ws.close(code=4401)
return
client_host = ws.client.host if ws.client else ""
if client_host and client_host not in _LOOPBACK_HOSTS:
await ws.close(code=4403)
return
channel = _channel_or_close_code(ws)
if not channel:
await ws.close(code=4400)
return
await ws.accept()
try:
while True:
await _broadcast_event(channel, await ws.receive_text())
except WebSocketDisconnect:
pass
@app.websocket("/api/events")
async def events_ws(ws: WebSocket) -> None:
if not _DASHBOARD_EMBEDDED_CHAT_ENABLED:
await ws.close(code=4403)
return
token = ws.query_params.get("token", "")
if not hmac.compare_digest(token.encode(), _SESSION_TOKEN.encode()):
await ws.close(code=4401)
return
client_host = ws.client.host if ws.client else ""
if client_host and client_host not in _LOOPBACK_HOSTS:
await ws.close(code=4403)
return
channel = _channel_or_close_code(ws)
if not channel:
await ws.close(code=4400)
return
await ws.accept()
async with _event_lock:
_event_channels.setdefault(channel, set()).add(ws)
try:
while True:
# Subscribers don't speak — the receive() just blocks until
# disconnect so the connection stays open as long as the
# browser holds it.
await ws.receive_text()
except WebSocketDisconnect:
pass
finally:
async with _event_lock:
subs = _event_channels.get(channel)
if subs is not None:
subs.discard(ws)
if not subs:
_event_channels.pop(channel, None)
def mount_spa(application: FastAPI):
"""Mount the built SPA. Falls back to index.html for client-side routing.
@@ -2284,8 +2615,10 @@ def mount_spa(application: FastAPI):
def _serve_index():
"""Return index.html with the session token injected."""
html = _index_path.read_text()
chat_js = "true" if _DASHBOARD_EMBEDDED_CHAT_ENABLED else "false"
token_script = (
f'<script>window.__HERMES_SESSION_TOKEN__="{_SESSION_TOKEN}";</script>'
f'<script>window.__HERMES_SESSION_TOKEN__="{_SESSION_TOKEN}";'
f"window.__HERMES_DASHBOARD_EMBEDDED_CHAT__={chat_js};</script>"
)
html = html.replace("</head>", f"{token_script}</head>", 1)
return HTMLResponse(
@@ -2770,13 +3103,23 @@ def _mount_plugin_api_routes():
_log.warning("Plugin %s declares api=%s but file not found", plugin["name"], api_file_name)
continue
try:
spec = importlib.util.spec_from_file_location(
f"hermes_dashboard_plugin_{plugin['name']}", api_path,
)
module_name = f"hermes_dashboard_plugin_{plugin['name']}"
spec = importlib.util.spec_from_file_location(module_name, api_path)
if spec is None or spec.loader is None:
continue
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
# Register in sys.modules BEFORE exec_module so pydantic/FastAPI
# can resolve forward references (e.g. models defined in a file
# that uses `from __future__ import annotations`). Without this,
# TypeAdapter lazy-build fails at first request with
# "is not fully defined" because the module namespace isn't
# reachable by name for string-annotation resolution.
sys.modules[module_name] = mod
try:
spec.loader.exec_module(mod)
except Exception:
sys.modules.pop(module_name, None)
raise
router = getattr(mod, "router", None)
if router is None:
_log.warning("Plugin %s api file has no 'router' attribute", plugin["name"])
@@ -2798,10 +3141,15 @@ def start_server(
port: int = 9119,
open_browser: bool = True,
allow_public: bool = False,
*,
embedded_chat: bool = False,
):
"""Start the web UI server."""
import uvicorn
global _DASHBOARD_EMBEDDED_CHAT_ENABLED
_DASHBOARD_EMBEDDED_CHAT_ENABLED = embedded_chat
_LOCALHOST = ("127.0.0.1", "localhost", "::1")
if host not in _LOCALHOST and not allow_public:
raise SystemExit(
@@ -2817,7 +3165,10 @@ def start_server(
# Record the bound host so host_header_middleware can validate incoming
# Host headers against it. Defends against DNS rebinding (GHSA-ppp5-vxwm-4cf7).
# bound_port is also stashed so /api/pty can build the back-WS URL the
# PTY child uses to publish events to the dashboard sidebar.
app.state.bound_host = host
app.state.bound_port = port
if open_browser:
import webbrowser
+29 -5
View File
@@ -31,7 +31,7 @@ T = TypeVar("T")
DEFAULT_DB_PATH = get_hermes_home() / "state.db"
SCHEMA_VERSION = 8
SCHEMA_VERSION = 9
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS schema_version (
@@ -83,7 +83,8 @@ CREATE TABLE IF NOT EXISTS messages (
reasoning TEXT,
reasoning_content TEXT,
reasoning_details TEXT,
codex_reasoning_items TEXT
codex_reasoning_items TEXT,
codex_message_items TEXT
);
CREATE TABLE IF NOT EXISTS state_meta (
@@ -356,6 +357,15 @@ class SessionDB:
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 8")
if current_version < 9:
# v9: preserve replayable Codex assistant message ids/phases so
# follow-up turns can rebuild Responses API message items instead
# of flattening everything to plain assistant text.
try:
cursor.execute('ALTER TABLE messages ADD COLUMN "codex_message_items" TEXT')
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 9")
# Unique title index — always ensure it exists (safe to run after migrations
# since the title column is guaranteed to exist at this point)
@@ -956,6 +966,7 @@ class SessionDB:
reasoning_content: str = None,
reasoning_details: Any = None,
codex_reasoning_items: Any = None,
codex_message_items: Any = None,
) -> int:
"""
Append a message to a session. Returns the message row ID.
@@ -972,6 +983,10 @@ class SessionDB:
json.dumps(codex_reasoning_items)
if codex_reasoning_items else None
)
codex_message_items_json = (
json.dumps(codex_message_items)
if codex_message_items else None
)
tool_calls_json = json.dumps(tool_calls) if tool_calls else None
# Pre-compute tool call count
@@ -983,8 +998,9 @@ class SessionDB:
cursor = conn.execute(
"""INSERT INTO messages (session_id, role, content, tool_call_id,
tool_calls, tool_name, timestamp, token_count, finish_reason,
reasoning, reasoning_content, reasoning_details, codex_reasoning_items)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
reasoning, reasoning_content, reasoning_details, codex_reasoning_items,
codex_message_items)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
session_id,
role,
@@ -999,6 +1015,7 @@ class SessionDB:
reasoning_content,
reasoning_details_json,
codex_items_json,
codex_message_items_json,
),
)
msg_id = cursor.lastrowid
@@ -1112,7 +1129,8 @@ class SessionDB:
with self._lock:
cursor = self._conn.execute(
"SELECT role, content, tool_call_id, tool_calls, tool_name, "
"reasoning, reasoning_content, reasoning_details, codex_reasoning_items "
"reasoning, reasoning_content, reasoning_details, codex_reasoning_items, "
"codex_message_items "
"FROM messages WHERE session_id = ? ORDER BY timestamp, id",
(session_id,),
)
@@ -1150,6 +1168,12 @@ class SessionDB:
except (json.JSONDecodeError, TypeError):
logger.warning("Failed to deserialize codex_reasoning_items, falling back to None")
msg["codex_reasoning_items"] = None
if row["codex_message_items"]:
try:
msg["codex_message_items"] = json.loads(row["codex_message_items"])
except (json.JSONDecodeError, TypeError):
logger.warning("Failed to deserialize codex_message_items, falling back to None")
msg["codex_message_items"] = None
messages.append(msg)
return messages
+41 -25
View File
@@ -24,6 +24,7 @@ import json
import asyncio
import logging
import threading
import time
from typing import Dict, Any, List, Optional, Tuple
from tools.registry import discover_builtin_tools, registry
@@ -288,30 +289,34 @@ def get_tool_definitions(
filtered_tools[i] = {"type": "function", "function": dynamic_schema}
break
# Rebuild discord_server schema based on the bot's privileged intents
# (detected from GET /applications/@me) and the user's action allowlist
# in config. Hides actions the bot's intents don't support so the
# model never attempts them, and annotates fetch_messages when the
# Rebuild discord / discord_admin schemas based on the bot's privileged
# intents (detected from GET /applications/@me) and the user's action
# allowlist in config. Hides actions the bot's intents don't support so
# the model never attempts them, and annotates fetch_messages when the
# MESSAGE_CONTENT intent is missing.
if "discord_server" in available_tool_names:
try:
from tools.discord_tool import get_dynamic_schema
dynamic = get_dynamic_schema()
except Exception: # pragma: no cover — defensive, fall back to static
dynamic = None
if dynamic is None:
# Tool filtered out entirely (empty allowlist or detection disabled
# the only remaining actions). Drop it from the schema list.
filtered_tools = [
t for t in filtered_tools
if t.get("function", {}).get("name") != "discord_server"
]
available_tool_names.discard("discord_server")
else:
for i, td in enumerate(filtered_tools):
if td.get("function", {}).get("name") == "discord_server":
filtered_tools[i] = {"type": "function", "function": dynamic}
break
_discord_schema_fns = {
"discord": "get_dynamic_schema_core",
"discord_admin": "get_dynamic_schema_admin",
}
for discord_tool_name in _discord_schema_fns:
if discord_tool_name in available_tool_names:
try:
from tools import discord_tool as _dt
schema_fn = getattr(_dt, _discord_schema_fns[discord_tool_name])
dynamic = schema_fn()
except Exception:
dynamic = None
if dynamic is None:
filtered_tools = [
t for t in filtered_tools
if t.get("function", {}).get("name") != discord_tool_name
]
available_tool_names.discard(discord_tool_name)
else:
for i, td in enumerate(filtered_tools):
if td.get("function", {}).get("name") == discord_tool_name:
filtered_tools[i] = {"type": "function", "function": dynamic}
break
# Strip web tool cross-references from browser_navigate description when
# web_search / web_extract are not available. The static schema says
@@ -464,9 +469,9 @@ def _coerce_number(value: str, integer_only: bool = False):
f = float(value)
except (ValueError, OverflowError):
return value
# Guard against inf/nan before int() conversion
# Guard against inf/nan — not JSON-serializable, keep original string
if f != f or f == float("inf") or f == float("-inf"):
return f
return value
# If it looks like an integer (no fractional part), return int
if f == int(f):
return int(f)
@@ -563,6 +568,14 @@ def handle_function_call(
except Exception:
pass # file_tools may not be loaded yet
# Measure tool dispatch latency so post_tool_call and
# transform_tool_result hooks can observe per-tool duration.
# Inspired by Claude Code 2.1.119, which added ``duration_ms`` to
# PostToolUse hook inputs so plugin authors can build latency
# dashboards, budget alerts, and regression canaries without having
# to wrap every tool manually. We use monotonic() so the value is
# unaffected by wall-clock adjustments during the call.
_dispatch_start = time.monotonic()
if function_name == "execute_code":
# Prefer the caller-provided list so subagents can't overwrite
# the parent's tool set via the process-global.
@@ -578,6 +591,7 @@ def handle_function_call(
task_id=task_id,
user_task=user_task,
)
duration_ms = int((time.monotonic() - _dispatch_start) * 1000)
try:
from hermes_cli.plugins import invoke_hook
@@ -589,6 +603,7 @@ def handle_function_call(
task_id=task_id or "",
session_id=session_id or "",
tool_call_id=tool_call_id or "",
duration_ms=duration_ms,
)
except Exception:
pass
@@ -609,6 +624,7 @@ def handle_function_call(
task_id=task_id or "",
session_id=session_id or "",
tool_call_id=tool_call_id or "",
duration_ms=duration_ms,
)
for hook_result in hook_results:
if isinstance(hook_result, str):
+1 -1
View File
@@ -156,7 +156,7 @@
for entry in "''${ENTRIES[@]}"; do
IFS=":" read -r ATTR FOLDER NIX_FILE <<< "$entry"
echo "==> .#$ATTR ($FOLDER -> $NIX_FILE)"
OUTPUT=$(nix build ".#$ATTR.npmDeps" --no-link --print-build-logs 2>&1)
OUTPUT=$(nix build ".#$ATTR.npmDeps" --no-link --rebuild --print-build-logs 2>&1)
STATUS=$?
if [ "$STATUS" -eq 0 ]; then
echo " ok"
+1 -1
View File
@@ -4,7 +4,7 @@ let
src = ../web;
npmDeps = pkgs.fetchNpmDeps {
inherit src;
hash = "sha256-TS/vrCHbdvXkPcAPxImKzAd2pdDCrKlgYZkXBMQ+TEg=";
hash = "sha256-4Z8KQ69QhO83X6zff+5urWBv6MME686MhTTMdwSl65o=";
};
npm = hermesNpmLib.mkNpmPassthru { folder = "web"; attr = "web"; pname = "hermes-web"; };
+25
View File
@@ -91,4 +91,29 @@
// Register this plugin — the dashboard picks it up automatically.
window.__HERMES_PLUGINS__.register("example", ExamplePage);
// ─────────────────────────────────────────────────────────────────────
// Page-scoped slot demo: inject a small banner at the top of /sessions.
//
// Built-in pages expose named slots (<page>:top, <page>:bottom) that
// plugins can populate without overriding the whole route. The
// manifest lists the slots we use in its `slots` array so the shell
// knows to render <PluginSlot name="sessions:top" /> there.
// ─────────────────────────────────────────────────────────────────────
function SessionsTopBanner() {
return React.createElement(Card, {
className: "border-dashed",
},
React.createElement(CardContent, { className: "flex items-center gap-3 py-2" },
React.createElement(Badge, { variant: "outline" }, "Example"),
React.createElement("span", {
className: "text-xs text-muted-foreground",
}, "This banner was injected into the Sessions page by the example plugin via the ",
React.createElement("code", { className: "font-courier" }, "sessions:top"),
" slot."),
),
);
}
window.__HERMES_PLUGINS__.registerSlot("example", "sessions:top", SessionsTopBanner);
})();
@@ -8,6 +8,7 @@
"path": "/example",
"position": "after:skills"
},
"slots": ["sessions:top"],
"entry": "dist/index.js",
"api": "plugin_api.py"
}
File diff suppressed because it is too large Load Diff
+752
View File
@@ -0,0 +1,752 @@
/*
* Hermes Kanban dashboard plugin styles.
*
* All colors reference theme CSS vars so the board reskins with the
* active dashboard theme. No hardcoded palette.
*/
.hermes-kanban {
width: 100%;
}
/* ---- Columns layout -------------------------------------------------- */
.hermes-kanban-columns {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(260px, 1fr));
gap: 0.75rem;
align-items: start;
}
.hermes-kanban-column {
display: flex;
flex-direction: column;
background: color-mix(in srgb, var(--color-card) 85%, transparent);
border: 1px solid var(--color-border);
border-radius: var(--radius);
padding: 0.5rem;
min-height: 200px;
max-height: calc(100vh - 220px);
transition: border-color 120ms ease, background-color 120ms ease;
}
.hermes-kanban-column--drop {
border-color: var(--color-ring);
background: color-mix(in srgb, var(--color-ring) 8%, var(--color-card));
}
.hermes-kanban-column-header {
display: flex;
align-items: center;
gap: 0.5rem;
padding: 0.25rem 0.25rem 0.35rem;
font-weight: 600;
font-size: 0.85rem;
color: var(--color-foreground);
}
.hermes-kanban-column-label {
flex: 1;
letter-spacing: 0.01em;
}
.hermes-kanban-column-count {
font-variant-numeric: tabular-nums;
color: var(--color-muted-foreground);
font-size: 0.75rem;
font-weight: 500;
}
.hermes-kanban-column-add {
appearance: none;
background: transparent;
border: 1px solid var(--color-border);
color: var(--color-foreground);
border-radius: var(--radius-sm, 0.25rem);
width: 22px;
height: 22px;
line-height: 1;
font-size: 1rem;
cursor: pointer;
}
.hermes-kanban-column-add:hover {
background: color-mix(in srgb, var(--color-foreground) 8%, transparent);
}
.hermes-kanban-column-sub {
padding: 0 0.25rem 0.5rem;
font-size: 0.7rem;
color: var(--color-muted-foreground);
border-bottom: 1px solid color-mix(in srgb, var(--color-border) 60%, transparent);
margin-bottom: 0.5rem;
}
.hermes-kanban-column-body {
display: flex;
flex-direction: column;
gap: 0.45rem;
overflow-y: auto;
padding-right: 0.1rem;
}
.hermes-kanban-empty {
padding: 1.5rem 0.5rem;
text-align: center;
font-size: 0.75rem;
color: var(--color-muted-foreground);
border: 1px dashed color-mix(in srgb, var(--color-border) 70%, transparent);
border-radius: var(--radius-sm, 0.25rem);
}
/* ---- Status dots ----------------------------------------------------- */
.hermes-kanban-dot {
display: inline-block;
width: 0.5rem;
height: 0.5rem;
border-radius: 999px;
background: var(--color-muted-foreground);
}
.hermes-kanban-dot-triage { background: #b47dd6; } /* lilac — fresh/unspecified */
.hermes-kanban-dot-todo { background: var(--color-muted-foreground); }
.hermes-kanban-dot-ready { background: #d4b348; } /* amber */
.hermes-kanban-dot-running { background: #3fb97d; } /* green */
.hermes-kanban-dot-blocked { background: var(--color-destructive, #d14a4a); }
.hermes-kanban-dot-done { background: #4a8cd1; } /* blue */
.hermes-kanban-dot-archived { background: var(--color-border); }
/* ---- Progress pill (N/M child tasks done) --------------------------- */
.hermes-kanban-progress {
font-family: var(--font-mono, ui-monospace, monospace);
font-size: 0.62rem;
padding: 0.05rem 0.35rem;
border-radius: 999px;
background: color-mix(in srgb, var(--color-foreground) 8%, transparent);
border: 1px solid color-mix(in srgb, var(--color-border) 80%, transparent);
color: var(--color-muted-foreground);
letter-spacing: 0.02em;
}
.hermes-kanban-progress--full {
background: color-mix(in srgb, #3fb97d 22%, transparent);
border-color: color-mix(in srgb, #3fb97d 45%, transparent);
color: var(--color-foreground);
}
/* ---- Lanes (per-profile sub-grouping inside Running) ---------------- */
.hermes-kanban-lane {
display: flex;
flex-direction: column;
gap: 0.35rem;
padding: 0.25rem 0 0.35rem;
border-top: 1px dashed color-mix(in srgb, var(--color-border) 70%, transparent);
}
.hermes-kanban-lane:first-child {
border-top: 0;
padding-top: 0;
}
.hermes-kanban-lane-head {
display: flex;
align-items: center;
gap: 0.4rem;
font-size: 0.65rem;
text-transform: uppercase;
letter-spacing: 0.08em;
color: var(--color-muted-foreground);
padding: 0 0.1rem;
}
.hermes-kanban-lane-name {
font-weight: 600;
font-family: var(--font-mono, ui-monospace, monospace);
}
.hermes-kanban-lane-count {
margin-left: auto;
font-variant-numeric: tabular-nums;
}
/* ---- Card ------------------------------------------------------------ */
.hermes-kanban-card {
cursor: grab;
transition: transform 100ms ease, box-shadow 100ms ease;
}
.hermes-kanban-card:hover {
box-shadow: 0 1px 0 0 var(--color-ring) inset, 0 0 0 1px var(--color-ring) inset;
}
.hermes-kanban-card:active {
cursor: grabbing;
transform: scale(0.995);
}
.hermes-kanban-card-content {
padding: 0.5rem 0.6rem !important;
display: flex;
flex-direction: column;
gap: 0.3rem;
}
.hermes-kanban-card-row {
display: flex;
align-items: center;
gap: 0.35rem;
flex-wrap: wrap;
}
.hermes-kanban-card-id {
font-family: var(--font-mono, ui-monospace, monospace);
font-size: 0.65rem;
color: var(--color-muted-foreground);
letter-spacing: 0.03em;
}
.hermes-kanban-card-title {
font-size: 0.85rem;
font-weight: 500;
line-height: 1.3;
color: var(--color-foreground);
word-break: break-word;
}
.hermes-kanban-card-meta {
font-size: 0.7rem;
color: var(--color-muted-foreground);
gap: 0.55rem;
}
.hermes-kanban-priority {
font-size: 0.6rem !important;
padding: 0.05rem 0.3rem !important;
background: color-mix(in srgb, var(--color-ring) 18%, transparent);
color: var(--color-foreground);
border: 1px solid color-mix(in srgb, var(--color-ring) 40%, transparent);
}
.hermes-kanban-tag {
font-size: 0.6rem !important;
padding: 0.05rem 0.3rem !important;
}
.hermes-kanban-assignee {
font-weight: 500;
color: color-mix(in srgb, var(--color-foreground) 80%, var(--color-muted-foreground));
}
.hermes-kanban-unassigned {
font-style: italic;
}
.hermes-kanban-ago {
margin-left: auto;
}
/* ---- Inline create --------------------------------------------------- */
.hermes-kanban-inline-create {
display: flex;
flex-direction: column;
gap: 0.35rem;
padding: 0.5rem;
margin-bottom: 0.5rem;
background: color-mix(in srgb, var(--color-card) 70%, transparent);
border: 1px dashed var(--color-border);
border-radius: var(--radius-sm, 0.25rem);
}
/* ---- Drawer (task detail side panel) --------------------------------- */
.hermes-kanban-drawer-shade {
position: fixed;
inset: 0;
background: rgba(0, 0, 0, 0.45);
z-index: 60;
display: flex;
justify-content: flex-end;
}
.hermes-kanban-drawer {
width: min(480px, 92vw);
height: 100vh;
background: var(--color-card);
border-left: 1px solid var(--color-border);
display: flex;
flex-direction: column;
box-shadow: -4px 0 18px rgba(0, 0, 0, 0.35);
animation: hermes-kanban-drawer-in 180ms ease-out;
}
@keyframes hermes-kanban-drawer-in {
from { transform: translateX(100%); opacity: 0.3; }
to { transform: translateX(0); opacity: 1; }
}
.hermes-kanban-drawer-head {
display: flex;
align-items: center;
justify-content: space-between;
padding: 0.6rem 0.8rem;
border-bottom: 1px solid var(--color-border);
font-family: var(--font-mono, ui-monospace, monospace);
}
.hermes-kanban-drawer-close {
appearance: none;
background: transparent;
border: 0;
color: var(--color-muted-foreground);
font-size: 1.25rem;
line-height: 1;
cursor: pointer;
padding: 0 0.25rem;
}
.hermes-kanban-drawer-close:hover { color: var(--color-foreground); }
.hermes-kanban-drawer-body {
flex: 1;
overflow-y: auto;
padding: 0.9rem;
display: flex;
flex-direction: column;
gap: 0.85rem;
}
.hermes-kanban-drawer-title {
display: flex;
align-items: center;
gap: 0.5rem;
font-size: 1rem;
font-weight: 600;
}
.hermes-kanban-drawer-meta {
display: flex;
flex-direction: column;
gap: 0.15rem;
padding: 0.5rem 0.6rem;
background: color-mix(in srgb, var(--color-foreground) 4%, transparent);
border: 1px solid var(--color-border);
border-radius: var(--radius-sm, 0.25rem);
}
.hermes-kanban-meta-row {
display: flex;
gap: 0.5rem;
font-size: 0.72rem;
}
.hermes-kanban-meta-label {
width: 92px;
color: var(--color-muted-foreground);
}
.hermes-kanban-meta-value {
color: var(--color-foreground);
word-break: break-word;
}
.hermes-kanban-actions {
display: flex;
flex-wrap: wrap;
gap: 0.3rem;
}
.hermes-kanban-section {
display: flex;
flex-direction: column;
gap: 0.35rem;
}
.hermes-kanban-section-head {
font-size: 0.72rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.07em;
color: var(--color-muted-foreground);
}
.hermes-kanban-pre {
margin: 0;
padding: 0.45rem 0.55rem;
white-space: pre-wrap;
word-break: break-word;
background: color-mix(in srgb, var(--color-foreground) 4%, transparent);
border: 1px solid var(--color-border);
border-radius: var(--radius-sm, 0.25rem);
font-family: var(--font-mono, ui-monospace, monospace);
font-size: 0.72rem;
color: var(--color-foreground);
}
.hermes-kanban-comment {
border-left: 2px solid color-mix(in srgb, var(--color-ring) 35%, transparent);
padding-left: 0.5rem;
display: flex;
flex-direction: column;
gap: 0.2rem;
}
.hermes-kanban-comment-head {
display: flex;
gap: 0.5rem;
font-size: 0.7rem;
}
.hermes-kanban-comment-author {
font-weight: 600;
color: var(--color-foreground);
}
.hermes-kanban-comment-ago {
color: var(--color-muted-foreground);
}
.hermes-kanban-event {
display: flex;
gap: 0.5rem;
font-size: 0.7rem;
color: var(--color-muted-foreground);
font-family: var(--font-mono, ui-monospace, monospace);
}
.hermes-kanban-event-kind {
color: var(--color-foreground);
min-width: 6rem;
}
.hermes-kanban-event-payload {
color: var(--color-muted-foreground);
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
max-width: 280px;
}
.hermes-kanban-drawer-comment-row {
display: flex;
gap: 0.4rem;
padding: 0.55rem 0.75rem;
border-top: 1px solid var(--color-border);
background: color-mix(in srgb, var(--color-card) 90%, transparent);
}
.hermes-kanban-count {
display: inline-flex;
gap: 0.2rem;
align-items: center;
}
/* ---- Selection chrome ----------------------------------------------- */
.hermes-kanban-card--selected :where(.hermes-kanban-card-content) {
box-shadow: 0 0 0 2px var(--color-ring) inset,
0 0 0 1px var(--color-ring) inset;
background: color-mix(in srgb, var(--color-ring) 6%, var(--color-card));
}
.hermes-kanban-card-check {
width: 0.85rem;
height: 0.85rem;
margin: 0;
cursor: pointer;
accent-color: var(--color-ring);
}
/* ---- Bulk action bar ------------------------------------------------ */
.hermes-kanban-bulk {
display: flex;
align-items: center;
gap: 0.5rem;
padding: 0.4rem 0.75rem;
background: color-mix(in srgb, var(--color-ring) 10%, var(--color-card));
border: 1px solid color-mix(in srgb, var(--color-ring) 40%, var(--color-border));
border-radius: var(--radius-sm, 0.25rem);
flex-wrap: wrap;
}
.hermes-kanban-bulk-count {
font-weight: 600;
font-size: 0.75rem;
padding-right: 0.25rem;
}
.hermes-kanban-bulk-btn {
height: 1.7rem !important;
padding: 0 0.5rem !important;
font-size: 0.7rem !important;
border: 1px solid var(--color-border);
cursor: pointer;
}
.hermes-kanban-bulk-btn:hover {
background: color-mix(in srgb, var(--color-foreground) 8%, transparent);
}
.hermes-kanban-bulk-reassign {
display: flex;
align-items: center;
gap: 0.25rem;
padding-left: 0.5rem;
border-left: 1px solid color-mix(in srgb, var(--color-border) 70%, transparent);
}
/* ---- Dependency editor chips --------------------------------------- */
.hermes-kanban-deps-row {
display: flex;
align-items: center;
gap: 0.5rem;
margin-bottom: 0.4rem;
}
.hermes-kanban-deps-label {
font-size: 0.68rem;
text-transform: uppercase;
letter-spacing: 0.08em;
color: var(--color-muted-foreground);
min-width: 4rem;
}
.hermes-kanban-deps-chips {
display: flex;
gap: 0.3rem;
flex-wrap: wrap;
flex: 1;
}
.hermes-kanban-deps-empty {
font-size: 0.7rem;
color: var(--color-muted-foreground);
font-style: italic;
}
.hermes-kanban-dep-chip {
display: inline-flex;
align-items: center;
gap: 0.15rem;
padding: 0.1rem 0.35rem;
background: color-mix(in srgb, var(--color-foreground) 6%, transparent);
border: 1px solid var(--color-border);
border-radius: var(--radius-sm, 0.25rem);
font-family: var(--font-mono, ui-monospace, monospace);
font-size: 0.68rem;
color: var(--color-foreground);
}
.hermes-kanban-dep-chip-x {
appearance: none;
background: transparent;
border: 0;
color: var(--color-muted-foreground);
cursor: pointer;
font-size: 0.85rem;
line-height: 1;
padding: 0 0.15rem;
}
.hermes-kanban-dep-chip-x:hover { color: var(--color-destructive, #d14a4a); }
/* ---- Inline edit affordances --------------------------------------- */
.hermes-kanban-editable {
cursor: pointer;
border-bottom: 1px dotted color-mix(in srgb, var(--color-border) 80%, transparent);
}
.hermes-kanban-editable:hover {
color: var(--color-foreground);
border-bottom-color: var(--color-ring);
}
.hermes-kanban-drawer-title-text {
cursor: pointer;
}
.hermes-kanban-drawer-title-text:hover {
text-decoration: underline;
text-decoration-color: var(--color-ring);
text-decoration-style: dotted;
text-underline-offset: 3px;
}
.hermes-kanban-edit-row {
display: flex;
align-items: center;
gap: 0.35rem;
width: 100%;
}
.hermes-kanban-section-head-row {
display: flex;
align-items: center;
justify-content: space-between;
gap: 0.5rem;
}
.hermes-kanban-edit-link {
appearance: none;
background: transparent;
border: 0;
color: var(--color-muted-foreground);
font-size: 0.7rem;
text-transform: uppercase;
letter-spacing: 0.05em;
cursor: pointer;
padding: 0;
}
.hermes-kanban-edit-link:hover { color: var(--color-ring); }
.hermes-kanban-textarea {
width: 100%;
min-height: 8rem;
background: var(--color-card);
color: var(--color-foreground);
border: 1px solid var(--color-border);
border-radius: var(--radius-sm, 0.25rem);
padding: 0.5rem 0.6rem;
font-family: var(--font-mono, ui-monospace, monospace);
font-size: 0.8rem;
line-height: 1.5;
resize: vertical;
}
.hermes-kanban-textarea:focus {
outline: none;
border-color: var(--color-ring);
box-shadow: 0 0 0 2px color-mix(in srgb, var(--color-ring) 30%, transparent);
}
/* ---- Markdown rendering -------------------------------------------- */
.hermes-kanban-md {
font-size: 0.8rem;
line-height: 1.55;
color: var(--color-foreground);
}
.hermes-kanban-md p { margin: 0.25rem 0; }
.hermes-kanban-md h1,
.hermes-kanban-md h2,
.hermes-kanban-md h3,
.hermes-kanban-md h4 {
margin: 0.6rem 0 0.2rem;
line-height: 1.25;
}
.hermes-kanban-md h1 { font-size: 1.05rem; }
.hermes-kanban-md h2 { font-size: 0.95rem; }
.hermes-kanban-md h3 { font-size: 0.88rem; }
.hermes-kanban-md h4 { font-size: 0.82rem; }
.hermes-kanban-md ul {
margin: 0.25rem 0 0.25rem 1.1rem;
padding: 0;
}
.hermes-kanban-md li { margin: 0.1rem 0; }
.hermes-kanban-md a {
color: var(--color-ring);
text-decoration: underline;
}
.hermes-kanban-md code {
font-family: var(--font-mono, ui-monospace, monospace);
font-size: 0.75rem;
padding: 0.05rem 0.3rem;
background: color-mix(in srgb, var(--color-foreground) 8%, transparent);
border-radius: 3px;
}
.hermes-kanban-md-code {
margin: 0.35rem 0;
padding: 0.5rem 0.6rem;
background: color-mix(in srgb, var(--color-foreground) 5%, transparent);
border: 1px solid var(--color-border);
border-radius: var(--radius-sm, 0.25rem);
overflow-x: auto;
}
.hermes-kanban-md-code code {
background: transparent;
padding: 0;
font-size: 0.75rem;
white-space: pre;
}
.hermes-kanban-md strong { font-weight: 600; }
/* ---- Touch-drag proxy ---------------------------------------------- */
.hermes-kanban-touch-proxy {
pointer-events: none;
opacity: 0.85;
box-shadow: 0 8px 20px rgba(0, 0, 0, 0.35);
transform: scale(1.02);
transition: none;
}
/* ---- Staleness tiers ------------------------------------------------ */
.hermes-kanban-card--stale-amber :where(.hermes-kanban-card-content) {
box-shadow: 0 0 0 1px #d4b34888 inset;
}
.hermes-kanban-card--stale-amber:hover :where(.hermes-kanban-card-content) {
box-shadow: 0 0 0 2px #d4b348 inset;
}
.hermes-kanban-card--stale-red :where(.hermes-kanban-card-content) {
box-shadow: 0 0 0 1px var(--color-destructive, #d14a4a) inset,
0 0 8px color-mix(in srgb, var(--color-destructive, #d14a4a) 30%, transparent);
}
.hermes-kanban-card--stale-red:hover :where(.hermes-kanban-card-content) {
box-shadow: 0 0 0 2px var(--color-destructive, #d14a4a) inset,
0 0 10px color-mix(in srgb, var(--color-destructive, #d14a4a) 45%, transparent);
}
/* ---- Worker log pane ------------------------------------------------ */
.hermes-kanban-log {
max-height: 340px;
overflow: auto;
white-space: pre;
font-size: 0.7rem;
line-height: 1.45;
}
/* ---- Run history (per-attempt log in the drawer) ------------------- */
.hermes-kanban-run {
border-left: 2px solid var(--color-border);
padding: 0.35rem 0.5rem;
margin-bottom: 0.4rem;
background: color-mix(in srgb, var(--color-foreground) 3%, transparent);
border-radius: var(--radius-sm, 0.25rem);
}
.hermes-kanban-run--active { border-left-color: #3fb97d; }
.hermes-kanban-run--completed { border-left-color: #4a8cd1; }
.hermes-kanban-run--ended { border-left-color: #6b7280; } /* generic fallback when outcome is unset */
.hermes-kanban-run--blocked { border-left-color: var(--color-destructive, #d14a4a); }
.hermes-kanban-run--crashed,
.hermes-kanban-run--timed_out,
.hermes-kanban-run--gave_up,
.hermes-kanban-run--spawn_failed {
border-left-color: var(--color-destructive, #d14a4a);
background: color-mix(in srgb, var(--color-destructive, #d14a4a) 6%, transparent);
}
.hermes-kanban-run--reclaimed { border-left-color: #d4b348; }
.hermes-kanban-run-head {
display: flex;
align-items: center;
gap: 0.6rem;
font-size: 0.7rem;
}
.hermes-kanban-run-outcome {
font-family: var(--font-mono, ui-monospace, monospace);
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.05em;
color: var(--color-foreground);
}
.hermes-kanban-run-profile {
color: var(--color-muted-foreground);
}
.hermes-kanban-run-elapsed {
font-variant-numeric: tabular-nums;
color: var(--color-muted-foreground);
}
.hermes-kanban-run-ago {
margin-left: auto;
color: var(--color-muted-foreground);
}
.hermes-kanban-run-summary {
font-size: 0.75rem;
padding: 0.2rem 0 0;
color: var(--color-foreground);
}
.hermes-kanban-run-error {
font-size: 0.7rem;
color: var(--color-destructive, #d14a4a);
padding: 0.15rem 0 0;
font-family: var(--font-mono, ui-monospace, monospace);
}
.hermes-kanban-run-meta {
display: block;
font-size: 0.65rem;
padding: 0.15rem 0 0;
color: var(--color-muted-foreground);
white-space: pre-wrap;
word-break: break-word;
font-family: var(--font-mono, ui-monospace, monospace);
}
+14
View File
@@ -0,0 +1,14 @@
{
"name": "kanban",
"label": "Kanban",
"description": "Multi-agent collaboration board — drag-drop cards across columns, read comment threads, see which profile is running what",
"icon": "Package",
"version": "1.0.0",
"tab": {
"path": "/kanban",
"position": "after:skills"
},
"entry": "dist/index.js",
"css": "dist/style.css",
"api": "plugin_api.py"
}
+830
View File
@@ -0,0 +1,830 @@
"""Kanban dashboard plugin — backend API routes.
Mounted at /api/plugins/kanban/ by the dashboard plugin system.
This layer is intentionally thin: every handler is a small wrapper around
``hermes_cli.kanban_db`` or a direct SQL query. Writes use the same code
paths the CLI and gateway ``/kanban`` command use, so the three surfaces
cannot drift.
Live updates arrive via the ``/events`` WebSocket, which tails the
append-only ``task_events`` table on a short poll interval (WAL mode lets
reads run alongside the dispatcher's IMMEDIATE write transactions).
Security note
-------------
The dashboard's HTTP auth middleware (``web_server.auth_middleware``)
explicitly skips ``/api/plugins/`` plugin routes are unauthenticated by
design because the dashboard binds to localhost by default. For the
WebSocket we still require the session token as a ``?token=`` query
parameter (browsers cannot set the ``Authorization`` header on an upgrade
request), matching the established pattern used by the in-browser PTY
bridge in ``hermes_cli/web_server.py``. If you run the dashboard with
``--host 0.0.0.0``, every plugin route kanban included becomes
reachable from the network. Don't do that on a shared host.
"""
from __future__ import annotations
import asyncio
import hmac
import json
import logging
import sqlite3
import time
from dataclasses import asdict
from typing import Any, Optional
from fastapi import APIRouter, HTTPException, Query, WebSocket, WebSocketDisconnect, status as http_status
from pydantic import BaseModel, Field
from hermes_cli import kanban_db
log = logging.getLogger(__name__)
router = APIRouter()
# ---------------------------------------------------------------------------
# Auth helper — WebSocket only (HTTP routes live behind the dashboard's
# existing plugin-bypass; this is documented above).
# ---------------------------------------------------------------------------
def _check_ws_token(provided: Optional[str]) -> bool:
"""Constant-time compare against the dashboard session token.
Imported lazily so the plugin still loads in test contexts where the
dashboard web_server module isn't importable (e.g. the bare-FastAPI
test harness).
"""
if not provided:
return False
try:
from hermes_cli import web_server as _ws
except Exception:
# No dashboard context (tests). Accept so the tail loop is still
# testable; in production the dashboard module always imports
# cleanly because it's the caller.
return True
expected = getattr(_ws, "_SESSION_TOKEN", None)
if not expected:
return True
return hmac.compare_digest(str(provided), str(expected))
def _conn():
"""Open a kanban_db connection, creating the schema on first use.
Every handler that mutates the DB goes through this so the plugin
self-heals on a fresh install (no user-visible "no such table"
error if somebody hits POST /tasks before GET /board).
``init_db`` is idempotent.
"""
try:
kanban_db.init_db()
except Exception as exc:
log.warning("kanban init_db failed: %s", exc)
return kanban_db.connect()
# ---------------------------------------------------------------------------
# Serialization helpers
# ---------------------------------------------------------------------------
# Columns shown by the dashboard, in left-to-right order. "archived" is
# available via a filter toggle rather than a visible column.
BOARD_COLUMNS: list[str] = [
"triage", "todo", "ready", "running", "blocked", "done",
]
def _task_dict(task: kanban_db.Task) -> dict[str, Any]:
d = asdict(task)
# Add derived age metrics so the UI can colour stale cards without
# computing deltas client-side.
d["age"] = kanban_db.task_age(task)
# Keep body short on list endpoints; full body comes from /tasks/:id.
return d
def _event_dict(event: kanban_db.Event) -> dict[str, Any]:
return {
"id": event.id,
"task_id": event.task_id,
"kind": event.kind,
"payload": event.payload,
"created_at": event.created_at,
"run_id": event.run_id,
}
def _comment_dict(c: kanban_db.Comment) -> dict[str, Any]:
return {
"id": c.id,
"task_id": c.task_id,
"author": c.author,
"body": c.body,
"created_at": c.created_at,
}
def _run_dict(r: kanban_db.Run) -> dict[str, Any]:
"""Serialise a Run for the drawer's Run history section."""
return {
"id": r.id,
"task_id": r.task_id,
"profile": r.profile,
"step_key": r.step_key,
"status": r.status,
"claim_lock": r.claim_lock,
"claim_expires": r.claim_expires,
"worker_pid": r.worker_pid,
"max_runtime_seconds": r.max_runtime_seconds,
"last_heartbeat_at": r.last_heartbeat_at,
"started_at": r.started_at,
"ended_at": r.ended_at,
"outcome": r.outcome,
"summary": r.summary,
"metadata": r.metadata,
"error": r.error,
}
def _links_for(conn: sqlite3.Connection, task_id: str) -> dict[str, list[str]]:
"""Return {'parents': [...], 'children': [...]} for a task."""
parents = [
r["parent_id"]
for r in conn.execute(
"SELECT parent_id FROM task_links WHERE child_id = ? ORDER BY parent_id",
(task_id,),
)
]
children = [
r["child_id"]
for r in conn.execute(
"SELECT child_id FROM task_links WHERE parent_id = ? ORDER BY child_id",
(task_id,),
)
]
return {"parents": parents, "children": children}
# ---------------------------------------------------------------------------
# GET /board
# ---------------------------------------------------------------------------
@router.get("/board")
def get_board(
tenant: Optional[str] = Query(None, description="Filter to a single tenant"),
include_archived: bool = Query(False),
):
"""Return the full board grouped by status column.
``_conn()`` auto-initializes ``kanban.db`` on first call so a fresh
install doesn't surface a "failed to load" error on the plugin tab.
"""
conn = _conn()
try:
tasks = kanban_db.list_tasks(
conn, tenant=tenant, include_archived=include_archived
)
# Pre-fetch link counts per task (cheap: one query).
link_counts: dict[str, dict[str, int]] = {}
for row in conn.execute(
"SELECT parent_id, child_id FROM task_links"
).fetchall():
link_counts.setdefault(row["parent_id"], {"parents": 0, "children": 0})[
"children"
] += 1
link_counts.setdefault(row["child_id"], {"parents": 0, "children": 0})[
"parents"
] += 1
# Comment + event counts (both cheap aggregates).
comment_counts: dict[str, int] = {
r["task_id"]: r["n"]
for r in conn.execute(
"SELECT task_id, COUNT(*) AS n FROM task_comments GROUP BY task_id"
)
}
# Progress rollup: for each parent, how many children are done / total.
# One pass over task_links joined with child status — cheaper than
# N per-task queries and the plugin uses it to render "N/M".
progress: dict[str, dict[str, int]] = {}
for row in conn.execute(
"SELECT l.parent_id AS pid, t.status AS cstatus "
"FROM task_links l JOIN tasks t ON t.id = l.child_id"
).fetchall():
p = progress.setdefault(row["pid"], {"done": 0, "total": 0})
p["total"] += 1
if row["cstatus"] == "done":
p["done"] += 1
latest_event_id = conn.execute(
"SELECT COALESCE(MAX(id), 0) AS m FROM task_events"
).fetchone()["m"]
columns: dict[str, list[dict]] = {c: [] for c in BOARD_COLUMNS}
if include_archived:
columns["archived"] = []
for t in tasks:
d = _task_dict(t)
d["link_counts"] = link_counts.get(t.id, {"parents": 0, "children": 0})
d["comment_count"] = comment_counts.get(t.id, 0)
d["progress"] = progress.get(t.id) # None when the task has no children
col = t.status if t.status in columns else "todo"
columns[col].append(d)
# Stable per-column ordering already applied by list_tasks
# (priority DESC, created_at ASC), keep as-is.
# List of known tenants for the UI filter dropdown.
tenants = [
r["tenant"]
for r in conn.execute(
"SELECT DISTINCT tenant FROM tasks WHERE tenant IS NOT NULL ORDER BY tenant"
)
]
# List of distinct assignees for the lane-by-profile sub-grouping.
assignees = [
r["assignee"]
for r in conn.execute(
"SELECT DISTINCT assignee FROM tasks WHERE assignee IS NOT NULL "
"AND status != 'archived' ORDER BY assignee"
)
]
return {
"columns": [
{"name": name, "tasks": columns[name]} for name in columns.keys()
],
"tenants": tenants,
"assignees": assignees,
"latest_event_id": int(latest_event_id),
"now": int(time.time()),
}
finally:
conn.close()
# ---------------------------------------------------------------------------
# GET /tasks/:id
# ---------------------------------------------------------------------------
@router.get("/tasks/{task_id}")
def get_task(task_id: str):
conn = _conn()
try:
task = kanban_db.get_task(conn, task_id)
if task is None:
raise HTTPException(status_code=404, detail=f"task {task_id} not found")
return {
"task": _task_dict(task),
"comments": [_comment_dict(c) for c in kanban_db.list_comments(conn, task_id)],
"events": [_event_dict(e) for e in kanban_db.list_events(conn, task_id)],
"links": _links_for(conn, task_id),
"runs": [_run_dict(r) for r in kanban_db.list_runs(conn, task_id)],
}
finally:
conn.close()
# ---------------------------------------------------------------------------
# POST /tasks
# ---------------------------------------------------------------------------
class CreateTaskBody(BaseModel):
title: str
body: Optional[str] = None
assignee: Optional[str] = None
tenant: Optional[str] = None
priority: int = 0
workspace_kind: str = "scratch"
workspace_path: Optional[str] = None
parents: list[str] = Field(default_factory=list)
triage: bool = False
idempotency_key: Optional[str] = None
max_runtime_seconds: Optional[int] = None
skills: Optional[list[str]] = None
@router.post("/tasks")
def create_task(payload: CreateTaskBody):
conn = _conn()
try:
task_id = kanban_db.create_task(
conn,
title=payload.title,
body=payload.body,
assignee=payload.assignee,
created_by="dashboard",
workspace_kind=payload.workspace_kind,
workspace_path=payload.workspace_path,
tenant=payload.tenant,
priority=payload.priority,
parents=payload.parents,
triage=payload.triage,
idempotency_key=payload.idempotency_key,
max_runtime_seconds=payload.max_runtime_seconds,
skills=payload.skills,
)
task = kanban_db.get_task(conn, task_id)
return {"task": _task_dict(task) if task else None}
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
finally:
conn.close()
# ---------------------------------------------------------------------------
# PATCH /tasks/:id (status / assignee / priority / title / body)
# ---------------------------------------------------------------------------
class UpdateTaskBody(BaseModel):
status: Optional[str] = None
assignee: Optional[str] = None
priority: Optional[int] = None
title: Optional[str] = None
body: Optional[str] = None
result: Optional[str] = None
block_reason: Optional[str] = None
# Structured handoff fields — forwarded to complete_task when status
# transitions to 'done'. Dashboard parity with ``hermes kanban
# complete --summary ... --metadata ...``.
summary: Optional[str] = None
metadata: Optional[dict] = None
@router.patch("/tasks/{task_id}")
def update_task(task_id: str, payload: UpdateTaskBody):
conn = _conn()
try:
task = kanban_db.get_task(conn, task_id)
if task is None:
raise HTTPException(status_code=404, detail=f"task {task_id} not found")
# --- assignee ----------------------------------------------------
if payload.assignee is not None:
try:
ok = kanban_db.assign_task(
conn, task_id, payload.assignee or None,
)
except RuntimeError as e:
raise HTTPException(status_code=409, detail=str(e))
if not ok:
raise HTTPException(status_code=404, detail="task not found")
# --- status -------------------------------------------------------
if payload.status is not None:
s = payload.status
ok = True
if s == "done":
ok = kanban_db.complete_task(
conn, task_id,
result=payload.result,
summary=payload.summary,
metadata=payload.metadata,
)
elif s == "blocked":
ok = kanban_db.block_task(conn, task_id, reason=payload.block_reason)
elif s == "ready":
# Re-open a blocked task, or just an explicit status set.
current = kanban_db.get_task(conn, task_id)
if current and current.status == "blocked":
ok = kanban_db.unblock_task(conn, task_id)
else:
# Direct status write for drag-drop (todo -> ready etc).
ok = _set_status_direct(conn, task_id, "ready")
elif s == "archived":
ok = kanban_db.archive_task(conn, task_id)
elif s in ("todo", "running", "triage"):
ok = _set_status_direct(conn, task_id, s)
else:
raise HTTPException(status_code=400, detail=f"unknown status: {s}")
if not ok:
raise HTTPException(
status_code=409,
detail=f"status transition to {s!r} not valid from current state",
)
# --- priority -----------------------------------------------------
if payload.priority is not None:
with kanban_db.write_txn(conn):
conn.execute(
"UPDATE tasks SET priority = ? WHERE id = ?",
(int(payload.priority), task_id),
)
conn.execute(
"INSERT INTO task_events (task_id, kind, payload, created_at) "
"VALUES (?, 'reprioritized', ?, ?)",
(task_id, json.dumps({"priority": int(payload.priority)}),
int(time.time())),
)
# --- title / body -------------------------------------------------
if payload.title is not None or payload.body is not None:
with kanban_db.write_txn(conn):
sets, vals = [], []
if payload.title is not None:
if not payload.title.strip():
raise HTTPException(status_code=400, detail="title cannot be empty")
sets.append("title = ?")
vals.append(payload.title.strip())
if payload.body is not None:
sets.append("body = ?")
vals.append(payload.body)
vals.append(task_id)
conn.execute(
f"UPDATE tasks SET {', '.join(sets)} WHERE id = ?", vals,
)
conn.execute(
"INSERT INTO task_events (task_id, kind, payload, created_at) "
"VALUES (?, 'edited', NULL, ?)",
(task_id, int(time.time())),
)
updated = kanban_db.get_task(conn, task_id)
return {"task": _task_dict(updated) if updated else None}
finally:
conn.close()
def _set_status_direct(
conn: sqlite3.Connection, task_id: str, new_status: str,
) -> bool:
"""Direct status write for drag-drop moves that aren't covered by the
structured complete/block/unblock/archive verbs (e.g. todo<->ready,
running<->ready). Appends a ``status`` event row for the live feed.
When this transitions OFF ``running`` to anything other than the
terminal verbs above (which own their own run closing), we close the
active run with outcome='reclaimed' so attempt history isn't
orphaned. ``running -> ready`` via drag-drop is the common case
(user yanking a stuck worker back to the queue).
"""
with kanban_db.write_txn(conn):
# Snapshot current state so we know whether to close a run.
prev = conn.execute(
"SELECT status, current_run_id FROM tasks WHERE id = ?",
(task_id,),
).fetchone()
if prev is None:
return False
was_running = prev["status"] == "running"
cur = conn.execute(
"UPDATE tasks SET status = ?, "
" claim_lock = CASE WHEN ? = 'running' THEN claim_lock ELSE NULL END, "
" claim_expires = CASE WHEN ? = 'running' THEN claim_expires ELSE NULL END, "
" worker_pid = CASE WHEN ? = 'running' THEN worker_pid ELSE NULL END "
"WHERE id = ?",
(new_status, new_status, new_status, new_status, task_id),
)
if cur.rowcount != 1:
return False
run_id = None
if was_running and new_status != "running" and prev["current_run_id"]:
run_id = kanban_db._end_run(
conn, task_id,
outcome="reclaimed", status="reclaimed",
summary=f"status changed to {new_status} (dashboard/direct)",
)
conn.execute(
"INSERT INTO task_events (task_id, run_id, kind, payload, created_at) "
"VALUES (?, ?, 'status', ?, ?)",
(task_id, run_id, json.dumps({"status": new_status}), int(time.time())),
)
# If we re-opened something, children may have gone stale.
if new_status in ("done", "ready"):
kanban_db.recompute_ready(conn)
return True
# ---------------------------------------------------------------------------
# Comments
# ---------------------------------------------------------------------------
class CommentBody(BaseModel):
body: str
author: Optional[str] = "dashboard"
@router.post("/tasks/{task_id}/comments")
def add_comment(task_id: str, payload: CommentBody):
if not payload.body.strip():
raise HTTPException(status_code=400, detail="body is required")
conn = _conn()
try:
if kanban_db.get_task(conn, task_id) is None:
raise HTTPException(status_code=404, detail=f"task {task_id} not found")
kanban_db.add_comment(
conn, task_id, author=payload.author or "dashboard", body=payload.body,
)
return {"ok": True}
finally:
conn.close()
# ---------------------------------------------------------------------------
# Links
# ---------------------------------------------------------------------------
class LinkBody(BaseModel):
parent_id: str
child_id: str
@router.post("/links")
def add_link(payload: LinkBody):
conn = _conn()
try:
kanban_db.link_tasks(conn, payload.parent_id, payload.child_id)
return {"ok": True}
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
finally:
conn.close()
@router.delete("/links")
def delete_link(parent_id: str = Query(...), child_id: str = Query(...)):
conn = _conn()
try:
ok = kanban_db.unlink_tasks(conn, parent_id, child_id)
return {"ok": bool(ok)}
finally:
conn.close()
# ---------------------------------------------------------------------------
# Bulk actions (multi-select on the board)
# ---------------------------------------------------------------------------
class BulkTaskBody(BaseModel):
ids: list[str]
status: Optional[str] = None
assignee: Optional[str] = None # "" or None = unassign
priority: Optional[int] = None
archive: bool = False
@router.post("/tasks/bulk")
def bulk_update(payload: BulkTaskBody):
"""Apply the same patch to every id in ``payload.ids``.
This is an *independent* iteration per-task failures don't abort
siblings. Returns per-id outcome so the UI can surface partials.
"""
ids = [i for i in (payload.ids or []) if i]
if not ids:
raise HTTPException(status_code=400, detail="ids is required")
results: list[dict] = []
conn = _conn()
try:
for tid in ids:
entry: dict[str, Any] = {"id": tid, "ok": True}
try:
task = kanban_db.get_task(conn, tid)
if task is None:
entry.update(ok=False, error="not found")
results.append(entry)
continue
if payload.archive:
if not kanban_db.archive_task(conn, tid):
entry.update(ok=False, error="archive refused")
if payload.status is not None and not payload.archive:
s = payload.status
if s == "done":
ok = kanban_db.complete_task(conn, tid)
elif s == "blocked":
ok = kanban_db.block_task(conn, tid)
elif s == "ready":
cur = kanban_db.get_task(conn, tid)
if cur and cur.status == "blocked":
ok = kanban_db.unblock_task(conn, tid)
else:
ok = _set_status_direct(conn, tid, "ready")
elif s in ("todo", "running", "triage"):
ok = _set_status_direct(conn, tid, s)
else:
entry.update(ok=False, error=f"unknown status {s!r}")
results.append(entry)
continue
if not ok:
entry.update(ok=False, error=f"transition to {s!r} refused")
if payload.assignee is not None:
try:
if not kanban_db.assign_task(
conn, tid, payload.assignee or None,
):
entry.update(ok=False, error="assign refused")
except RuntimeError as e:
entry.update(ok=False, error=str(e))
if payload.priority is not None:
with kanban_db.write_txn(conn):
conn.execute(
"UPDATE tasks SET priority = ? WHERE id = ?",
(int(payload.priority), tid),
)
conn.execute(
"INSERT INTO task_events (task_id, kind, payload, created_at) "
"VALUES (?, 'reprioritized', ?, ?)",
(tid, json.dumps({"priority": int(payload.priority)}),
int(time.time())),
)
except Exception as e: # defensive — one bad id shouldn't kill the batch
entry.update(ok=False, error=str(e))
results.append(entry)
return {"results": results}
finally:
conn.close()
# ---------------------------------------------------------------------------
# Plugin config (read dashboard.kanban.* defaults from config.yaml)
# ---------------------------------------------------------------------------
@router.get("/config")
def get_config():
"""Return kanban dashboard preferences from ~/.hermes/config.yaml.
Reads the ``dashboard.kanban`` section if present; defaults otherwise.
Used by the UI to pre-select tenant filters, toggle markdown rendering,
or set column-width preferences without a round-trip per page load.
"""
try:
from hermes_cli.config import load_config
cfg = load_config() or {}
except Exception:
cfg = {}
dash_cfg = (cfg.get("dashboard") or {})
# dashboard.kanban may itself be a dict; fall back to {}.
k_cfg = dash_cfg.get("kanban") or {}
return {
"default_tenant": k_cfg.get("default_tenant") or "",
"lane_by_profile": bool(k_cfg.get("lane_by_profile", True)),
"include_archived_by_default": bool(k_cfg.get("include_archived_by_default", False)),
"render_markdown": bool(k_cfg.get("render_markdown", True)),
}
# ---------------------------------------------------------------------------
# Stats (per-profile / per-status counts + oldest-ready age)
# ---------------------------------------------------------------------------
@router.get("/stats")
def get_stats():
"""Per-status + per-assignee counts + oldest-ready age.
Designed for the dashboard HUD and for router profiles that need to
answer "is this specialist overloaded?" without scanning the whole
board themselves.
"""
conn = _conn()
try:
return kanban_db.board_stats(conn)
finally:
conn.close()
@router.get("/assignees")
def get_assignees():
"""Known profiles + per-profile task counts.
Returns the union of ``~/.hermes/profiles/*`` on disk and every
distinct assignee currently used on the board. The dashboard uses
this to populate its assignee dropdown so a freshly-created profile
appears in the picker before it's been given any task.
"""
conn = _conn()
try:
return {"assignees": kanban_db.known_assignees(conn)}
finally:
conn.close()
# ---------------------------------------------------------------------------
# Worker log (read-only; file written by _default_spawn)
# ---------------------------------------------------------------------------
@router.get("/tasks/{task_id}/log")
def get_task_log(task_id: str, tail: Optional[int] = Query(None, ge=1, le=2_000_000)):
"""Return the worker's stdout/stderr log.
``tail`` caps the response size (bytes) so the dashboard drawer
doesn't paginate megabytes into the browser. Returns 404 if the task
has never spawned. The on-disk log is rotated at 2 MiB per
``_rotate_worker_log`` a single ``.log.1`` is kept, no further
generations, so disk usage per task is bounded at ~4 MiB.
"""
conn = _conn()
try:
task = kanban_db.get_task(conn, task_id)
finally:
conn.close()
if task is None:
raise HTTPException(status_code=404, detail=f"task {task_id} not found")
content = kanban_db.read_worker_log(task_id, tail_bytes=tail)
log_path = kanban_db.worker_log_path(task_id)
size = log_path.stat().st_size if log_path.exists() else 0
return {
"task_id": task_id,
"path": str(log_path),
"exists": content is not None,
"size_bytes": size,
"content": content or "",
# Truncated when the on-disk file was larger than the tail cap.
"truncated": bool(tail and size > tail),
}
# ---------------------------------------------------------------------------
# Dispatch nudge (optional quick-path so the UI doesn't wait 60 s)
# ---------------------------------------------------------------------------
@router.post("/dispatch")
def dispatch(dry_run: bool = Query(False), max_n: int = Query(8, alias="max")):
conn = _conn()
try:
result = kanban_db.dispatch_once(
conn, dry_run=dry_run, max_spawn=max_n,
)
# DispatchResult is a dataclass.
try:
return asdict(result)
except TypeError:
return {"result": str(result)}
finally:
conn.close()
# ---------------------------------------------------------------------------
# WebSocket: /events?since=<event_id>
# ---------------------------------------------------------------------------
# Poll interval for the event tail loop. SQLite WAL + 300 ms polling is
# the simplest and most robust approach; it adds a fraction of a percent
# of CPU and has no shared state to synchronize across workers.
_EVENT_POLL_SECONDS = 0.3
@router.websocket("/events")
async def stream_events(ws: WebSocket):
# Enforce the dashboard session token as a query param — browsers can't
# set Authorization on a WS upgrade. This matches how the PTY bridge
# authenticates in hermes_cli/web_server.py.
token = ws.query_params.get("token")
if not _check_ws_token(token):
await ws.close(code=http_status.WS_1008_POLICY_VIOLATION)
return
await ws.accept()
try:
since_raw = ws.query_params.get("since", "0")
try:
cursor = int(since_raw)
except ValueError:
cursor = 0
def _fetch_new(cursor_val: int) -> tuple[int, list[dict]]:
conn = kanban_db.connect()
try:
rows = conn.execute(
"SELECT id, task_id, run_id, kind, payload, created_at "
"FROM task_events WHERE id > ? ORDER BY id ASC LIMIT 200",
(cursor_val,),
).fetchall()
out: list[dict] = []
new_cursor = cursor_val
for r in rows:
try:
payload = json.loads(r["payload"]) if r["payload"] else None
except Exception:
payload = None
out.append({
"id": r["id"],
"task_id": r["task_id"],
"run_id": r["run_id"],
"kind": r["kind"],
"payload": payload,
"created_at": r["created_at"],
})
new_cursor = r["id"]
return new_cursor, out
finally:
conn.close()
while True:
cursor, events = await asyncio.to_thread(_fetch_new, cursor)
if events:
await ws.send_json({"events": events, "cursor": cursor})
await asyncio.sleep(_EVENT_POLL_SECONDS)
except WebSocketDisconnect:
return
except Exception as exc: # defensive: never crash the dashboard worker
log.warning("Kanban event stream error: %s", exc)
try:
await ws.close()
except Exception:
pass
@@ -0,0 +1,17 @@
[Unit]
Description=Hermes Kanban dispatcher (hermes kanban daemon)
Documentation=https://hermes-agent.nousresearch.com/docs/user-guide/features/kanban
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/env hermes kanban daemon --interval 60 --pidfile %t/hermes-kanban-dispatcher.pid
Restart=on-failure
RestartSec=5
# Log to the journal via stdout/stderr; the dispatcher also writes per-task
# worker output to $HERMES_HOME/kanban/logs/<task>.log.
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=default.target
+1 -1
View File
@@ -43,7 +43,7 @@ _TIMEOUT = 30.0
# ---------------------------------------------------------------------------
# Process-level atexit safety net — ensures pending sessions are committed
# even if shutdown_memory_provider is never called (e.g. gateway crash,
# SIGKILL, or exception in _async_flush_memories preventing shutdown).
# SIGKILL, or exception in the session expiry watcher preventing shutdown).
# ---------------------------------------------------------------------------
_last_active_provider: Optional["OpenVikingMemoryProvider"] = None
+11
View File
@@ -78,6 +78,16 @@ termux = [
]
dingtalk = ["dingtalk-stream>=0.20,<1", "alibabacloud-dingtalk>=2.0.0", "qrcode>=7.0,<8"]
feishu = ["lark-oapi>=1.5.3,<2", "qrcode>=7.0,<8"]
google = [
# Required by the google-workspace skill (Gmail, Calendar, Drive, Contacts,
# Sheets, Docs). Declared here so packagers (Nix, Homebrew) ship them with
# the [all] extra and users don't hit runtime `pip install` paths that fail
# in environments without pip (e.g. Nix-managed Python).
"google-api-python-client>=2.100,<3",
"google-auth-oauthlib>=1.0,<2",
"google-auth-httplib2>=0.2,<1",
]
# `hermes dashboard` (localhost SPA + API). Not in core to keep the default install lean.
web = ["fastapi>=0.104.0,<1", "uvicorn[standard]>=0.24.0,<1"]
rl = [
"atroposlib @ git+https://github.com/NousResearch/atropos.git@c20c85256e5a45ad31edf8b7276e9c5ee1995a30",
@@ -109,6 +119,7 @@ all = [
"hermes-agent[voice]",
"hermes-agent[dingtalk]",
"hermes-agent[feishu]",
"hermes-agent[google]",
"hermes-agent[mistral]",
"hermes-agent[bedrock]",
"hermes-agent[web]",
+723 -319
View File
File diff suppressed because it is too large Load Diff
+95
View File
@@ -0,0 +1,95 @@
#!/usr/bin/env python3
"""Build the Hermes Model Catalog — a centralized JSON manifest of curated models.
This script reads the in-repo hardcoded curated lists (``OPENROUTER_MODELS``,
``_PROVIDER_MODELS["nous"]``) and writes them to a JSON manifest that the
Hermes CLI fetches at runtime. Publishing the catalog through the docs site
lets maintainers update model lists without shipping a Hermes release.
The runtime fetcher falls back to the same in-repo hardcoded lists if the
manifest is unreachable, so this script is a convenience for keeping the
manifest in sync not a source of truth.
Usage::
python scripts/build_model_catalog.py
Output: ``website/static/api/model-catalog.json``
Live URL (after ``deploy-site.yml`` runs on merge to main):
``https://hermes-agent.nousresearch.com/docs/api/model-catalog.json``
"""
from __future__ import annotations
import json
import os
import sys
from datetime import datetime, timezone
REPO_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, REPO_ROOT)
# Ensure HERMES_HOME is set for imports that touch it at module level.
os.environ.setdefault("HERMES_HOME", os.path.join(os.path.expanduser("~"), ".hermes"))
from hermes_cli.models import OPENROUTER_MODELS, _PROVIDER_MODELS # noqa: E402
OUTPUT_PATH = os.path.join(REPO_ROOT, "website", "static", "api", "model-catalog.json")
CATALOG_VERSION = 1
def build_catalog() -> dict:
return {
"version": CATALOG_VERSION,
"updated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
"metadata": {
"source": "hermes-agent repo",
"docs": "https://hermes-agent.nousresearch.com/docs/reference/model-catalog",
},
"providers": {
"openrouter": {
"metadata": {
"display_name": "OpenRouter",
"note": (
"Descriptions drive picker badges. Live /api/v1/models "
"filters curated ids by tool-calling support and free pricing."
),
},
"models": [
{"id": mid, "description": desc}
for mid, desc in OPENROUTER_MODELS
],
},
"nous": {
"metadata": {
"display_name": "Nous Portal",
"note": (
"Free-tier gating is determined live via Portal pricing "
"(partition_nous_models_by_tier), not this manifest."
),
},
"models": [
{"id": mid}
for mid in _PROVIDER_MODELS.get("nous", [])
],
},
},
}
def main() -> int:
catalog = build_catalog()
os.makedirs(os.path.dirname(OUTPUT_PATH), exist_ok=True)
with open(OUTPUT_PATH, "w") as fh:
json.dump(catalog, fh, indent=2)
fh.write("\n")
print(f"Wrote {OUTPUT_PATH}")
for provider, block in catalog["providers"].items():
print(f" {provider}: {len(block['models'])} models")
return 0
if __name__ == "__main__":
sys.exit(main())
-377
View File
@@ -1,377 +0,0 @@
# Compression Eval — Design
Status: proposal. Nothing under `scripts/compression_eval/` runs in CI.
This is an offline tool authors run before merging prompt or algorithm
changes to `agent/context_compressor.py`.
## Why
We tune the compressor prompt and the `_template_sections` checklist by
hand, ship, and wait for the next real session to notice regressions.
There is no automated check that a prompt edit still preserves file
paths, error messages, or the active task across a compression.
Factory.ai's December 2025 write-up
(https://factory.ai/news/evaluating-compression) describes a
probe-based eval that scores compressed state on six dimensions. The
methodology is the valuable part — the benchmarks in the post are a
marketing piece. We adopt the methodology and discard the scoreboard.
## Goal
Given a real session transcript and a bank of probe questions that
exercise what the transcript contained, answer:
1. After `ContextCompressor.compress()` runs, can the agent still
answer each probe correctly from the compressed state?
2. Which of the six dimensions (accuracy, context awareness, artifact
trail, completeness, continuity, instruction following) is the
prompt weakest on?
3. Does a prompt change improve or regress any dimension vs. the
previous run?
That is the full scope. No "compare against OpenAI and Anthropic"
benchmarking, no public scoreboard, no marketing claims.
## Non-goals
- Not a pytest. Requires API credentials, costs money, takes minutes
per fixture, and output is LLM-graded and non-deterministic.
- Not part of `scripts/run_tests.sh`. Not invoked by CI.
- Not a replacement for the existing compressor unit tests in
`tests/agent/test_context_compressor.py` — those stay as the
structural / boundary / tool-pair-sanitization guard.
- Not a general trajectory eval. Scoped to context compaction only.
## Where it lives
```
scripts/compression_eval/
├── DESIGN.md # this file
├── README.md # how to run, cost expectations, caveats
├── run_eval.py # entry point (fire CLI, like sample_and_compress.py)
├── scrub_fixtures.py # regenerate fixtures from ~/.hermes/sessions/*.jsonl
├── fixtures/ # checked-in scrubbed session snapshots
│ ├── feature-impl-context-priority.json
│ ├── debug-session-feishu-id-model.json
│ └── config-build-competitive-scouts.json
├── probes/ # probe banks paired with fixtures
│ └── <fixture>.probes.json
├── rubric.py # grading prompt + dimension definitions
├── grader.py # judge-model call + score parsing
├── compressor_driver.py # thin wrapper over ContextCompressor
└── results/ # gitignored; timestamped output per run
└── .gitkeep
```
`scripts/` is the right home: offline tooling, no CI involvement,
precedent already set by `sample_and_compress.py`,
`contributor_audit.py`, `discord-voice-doctor.py`.
`environments/` is for Atropos RL training environments — wrong shape.
`tests/` is hermetic and credential-free — incompatible with a
probe-based eval that needs a judge model.
## Fixture format
A fixture is a single compressed-enough conversation captured from a
real session. Stored as JSON (pretty-printed, reviewable in PRs):
```json
{
"name": "401-debug",
"description": "178-turn session debugging a 401 on /api/auth/login",
"model": "anthropic/claude-sonnet-4.6",
"context_length": 200000,
"messages": [
{"role": "system", "content": "..."},
{"role": "user", "content": "..."},
{"role": "assistant", "content": "...", "tool_calls": [...]},
{"role": "tool", "tool_call_id": "...", "content": "..."}
],
"notes": "Captured 2026-04-24 from session 20260424_*.jsonl; \
PII scrubbed; secrets redacted via redact_sensitive_text."
}
```
### Sourcing fixtures
Fixtures are scrubbed snapshots of real sessions from the
maintainer's `~/.hermes/sessions/*.jsonl` store, generated
reproducibly by `scrub_fixtures.py` in this directory. Re-run the
scrubber with `python3 scripts/compression_eval/scrub_fixtures.py`
to regenerate them after a scrubber change.
Three shipped fixtures cover three different session shapes:
| Fixture | Source shape | Messages | Tokens (rough) | Tests |
|---|---|---|---|---|
| `feature-impl-context-priority` | investigate → patch → test → PR → merge | 75 | ~45k | continuation, artifact trail (2 files modified, 1 PR, ~16k skill_view in head) |
| `debug-session-feishu-id-model` | PR triage + upstream docs + decision | 59 | ~28k | recall (PR #, error shape), decision (outcome + reason), large PR diff blocks |
| `config-build-competitive-scouts` | iterative config: 11 cron jobs across 7 weekdays | 61 | ~26k | artifact trail (which jobs, which days), iterative-merge |
The `~26k-45k` token range is below the default 50%-of-200k
compression threshold, so the eval will always **force** a
`compress()` call rather than wait for the natural trigger. That is
the intended shape — we want a controlled single-shot compression so
score deltas are attributable to the prompt change, not to whether
the threshold happened to fire at the same boundary twice.
### Scrubber pipeline
`scrub_fixtures.py` applies, per message:
1. `agent.redact.redact_sensitive_text` — API keys, tokens,
connection strings
2. Username paths: `/home/teknium``/home/user`
3. Personal handles: all case variants of the maintainer name → `user`
4. Email addresses → `contributor@example.com`; git
`Author: Name <addr>` header lines normalised
5. `<REASONING_SCRATCHPAD>...</REASONING_SCRATCHPAD>` and
`<think>...</think>` stripped from assistant content
6. Messaging-platform user mentions (`<@123456>`, `<@***>`) →
`<@user>`
7. First user message paraphrased to remove personal voice;
subsequent user turns kept verbatim after the redactions above
8. System prompt replaced with a generic public-safe placeholder so
we don't check in the maintainer's tuned soul/skills/memory system
block
9. Orphan empty-assistant messages (artifact of scratchpad-only
turns) and trailing tool messages with no matching assistant are
dropped
10. Tool outputs preserved verbatim. An earlier iteration truncated
> 2KB tool bodies to keep fixture JSON small, but that defeats
the purpose: real sessions have 30KB `skill_view` dumps, 10KB
`read_file` outputs, 5KB `web_extract` bodies — compression has
to handle them. Truncation is now a no-op; the pipeline note
remains in `scrubbing_passes` for audit trail clarity.
Before every fixture PR: grep the fixture for PII patterns. An
audit is embedded at the bottom of the scrubber as comments.
**Fixtures must stay small.** Target <200 KB per fixture, <500 KB
total for the directory. Current total: ~410 KB across three
fixtures. Larger sessions are truncated with a
`truncated_to: <index>` field in the fixture header so the cut is
reviewable.
## Probe format
One probe file per fixture, so reviewers can see the question bank
evolve alongside the fixture:
```json
{
"fixture": "401-debug",
"probes": [
{
"id": "recall-error-code",
"type": "recall",
"question": "What was the original error code and endpoint?",
"expected_facts": ["401", "/api/auth/login"]
},
{
"id": "artifact-files-modified",
"type": "artifact",
"question": "Which files have been modified in this session?",
"expected_facts": ["session_store.py", "redis_client.py"]
},
{
"id": "continuation-next-step",
"type": "continuation",
"question": "What should we do next?",
"expected_facts": ["re-run the integration tests", "restart the worker"]
},
{
"id": "decision-redis-approach",
"type": "decision",
"question": "What did we decide about the Redis issue?",
"expected_facts": ["switch to redis-py 5.x", "pooled connection"]
}
]
}
```
The four probe types come directly from Factory's methodology:
**recall, artifact, continuation, decision**. `expected_facts` gives
the grader concrete anchors instead of relying purely on LLM taste.
Authoring a probe bank is a one-time cost per fixture. 8-12 probes per
fixture is the target — enough to cover all four types, few enough to
grade in under a minute at reasonable cost.
## Grading
Each probe gets scored 0-5 on **six dimensions** (Factory's six):
| Dimension | What it measures |
|-----------------------|-----------------------------------------------------|
| accuracy | File paths, function names, error codes are correct |
| context_awareness | Reflects current state, not a mid-session snapshot |
| artifact_trail | Knows which files were read / modified / created |
| completeness | Addresses all parts of the probe |
| continuity | Agent can continue without re-fetching |
| instruction_following | Probe answered in the requested form |
Grading is done by a single judge-model call per probe with a
deterministic rubric prompt (see `rubric.py`). The rubric includes the
`expected_facts` list so the judge has a concrete anchor. Default
judge model: whatever the user has configured as their main model at
run time (same resolution path as `auxiliary_client.call_llm`). A
`--judge-model` flag allows overriding for consistency across runs.
Non-determinism caveat: two runs of the same fixture will produce
different scores. A single run means nothing. Report medians over
N=3 runs by default, and require an improvement of >=0.3 on any
dimension before claiming a prompt change is a win.
## Run flow
```
python scripts/compression_eval/run_eval.py [OPTIONS]
```
Options (fire-style, mirroring `sample_and_compress.py`):
| Flag | Default | Purpose |
|------------------------|------------|-------------------------------------------|
| `--fixtures` | all | Comma-separated fixture names |
| `--runs` | 3 | Runs per fixture (for median) |
| `--judge-model` | auto | Override judge model |
| `--compressor-model` | auto | Override model used *inside* the compressor |
| `--label` | timestamp | Subdirectory under `results/` |
| `--focus-topic` | none | Pass-through to `compress(focus_topic=)` |
| `--compare-to` | none | Path to a previous run for diff output |
Steps per fixture per run:
1. Load fixture JSON and probe bank.
2. Construct a `ContextCompressor` against the fixture's model.
3. Call `compressor.compress(messages)` — capture the compressed
message list.
4. For each probe: ask the judge model to role-play as the continuing
agent with only the compressed state, then grade the answer on the
six dimensions using `rubric.py`.
5. Write a per-run JSON to `results/<label>/<fixture>-run-N.json`.
6. After all runs, emit a markdown summary to
`results/<label>/report.md`.
## Report format
Pasted verbatim into PR descriptions that touch the compressor:
```
## Compression eval — label 2026-04-25_13-40-02
Main model: anthropic/claude-sonnet-4.6 Judge: same
3 runs per fixture, medians reported.
| Fixture | Accuracy | Context | Artifact | Complete | Continuity | Instruction | Overall |
|----------------|----------|---------|----------|----------|------------|-------------|---------|
| 401-debug | 4.1 | 4.0 | 2.5 | 4.3 | 3.8 | 5.0 | 3.95 |
| pr-review | 3.9 | 3.8 | 3.1 | 4.2 | 3.9 | 5.0 | 3.98 |
| feature-impl | 4.0 | 3.9 | 2.9 | 4.1 | 4.0 | 5.0 | 3.98 |
Per-probe misses (score < 3.0):
- 401-debug / artifact-files-modified: 1.7 — summary dropped redis_client.py
- pr-review / decision-auth-rewrite: 2.3 — outcome captured, reasoning dropped
```
## Cost expectations
Dominated by the judge calls. For 3 fixtures × 10 probes × 3 runs =
90 judge calls per eval run. On Claude Sonnet 4.6 that is roughly
$0.50-$1.50 per full eval depending on probe length. The compressor
itself makes 1 call per fixture × 3 runs = 9 additional calls.
**This is not a check to run after every commit.** It is a
before-merge check for PRs that touch:
- `agent/context_compressor.py` — any change to `_template_sections`,
`_generate_summary`, or `compress()`.
- `agent/auxiliary_client.py` — when changing how compression tasks
are routed.
- `agent/prompt_builder.py` — when the compression-note phrasing
changes.
## Open questions (to resolve before implementing)
1. **Fixture scrubbing: manual or scripted?** A scripted scrub that
also replaces project names / hostnames would lower the cost of
contributing a new fixture. Risk: over-aggressive replacement
destroys the signal the probe depends on. Propose: start manual,
add scripted helpers once we have 3 fixtures and know the common
PII shapes.
2. **Judge model selection.** Factory uses GPT-5.2. We can't pin one
— user's main model changes. Options: (a) grade with main model
(cheap, inconsistent across users), (b) require a specific judge
model (e.g. `claude-sonnet-4.6`), inconsistent for users without
access. Propose (a) with a `--judge-model` override, and make the
model name prominent in the report so comparisons across machines
are legible.
3. **Noise floor.** Before landing prompt changes, run the current
prompt N=10 times to measure per-dimension stddev. That tells us
the minimum delta to call a change significant. Suspect 0.2-0.3 on
a 0-5 scale. Decision deferred until after the first fixture is
landed.
4. **Iterative-merge coverage.** The real Factory-vs-Anthropic
difference is incremental merge vs. regenerate. A fixture that
only compresses once doesn't exercise our iterative path. Add a
fourth fixture that forces two compressions (manually chained),
with probes that test whether information from the first
compression survives the second. Deferred to a follow-up PR.
## Implementation status
This PR ships the full eval end-to-end:
- `scrub_fixtures.py` — reproducible scrubber
- `fixtures/` — three scrubbed session fixtures
- `probes/` — three probe banks (10-11 probes each, all four types)
- `rubric.py` — six-dimension grading rubric + judge-prompt builder + response parser
- `compressor_driver.py` — thin wrapper around `ContextCompressor` for forced single-shot compression
- `grader.py` — two-phase continuation + grading calls via OpenAI SDK
- `report.py` — markdown report renderer + `--compare-to` delta mode + per-run JSON dumper
- `run_eval.py` — entry point (`fire`-style CLI)
- `tests/scripts/test_compression_eval.py` — 33 unit tests covering rubric parsing, report rendering, fixture/probe loading, and a PII smoke test on the fixtures (LLM paths not tested — they require credentials and are exercised by the eval itself)
### Noise floor — one empirical data point
A single same-inputs re-run of `debug-session-feishu-id-model`
(compressor + judge = `openai/gpt-5.4-mini` via Nous Portal,
runs=1) produced:
- Run A overall: 3.25
- Run B overall: 3.17 (delta -0.08)
Individual dimensions varied by up to ±0.5 between the two runs on
single-run medians. This confirms DESIGN.md's "< 0.3 is noise"
guidance is the right order of magnitude for a single-run
comparison. With `runs=3` default, per-dimension variance should
tighten; noise-floor measurement at N=10 is still a useful
follow-up to calibrate precisely.
## Open follow-ups (not blocking this PR)
1. **Iterative-merge fixture** — our actual compression win over
"regenerate from scratch" approaches is only exercised when
`_previous_summary` is re-used on a second compression. None of
the three shipped fixtures force two compressions. The natural
basis is `config-build-competitive-scouts` (already iterative by
shape); splitting it at the Monday/Tuesday boundary would force
the second compression to merge rather than regenerate.
2. **Noise-floor precision** — run the current prompt N=10 times
against one fixture to pin down per-dimension stddev and publish
the numbers in README.
3. **Scripted scrubber helpers** — the current scrubber is manual
per-fixture. A helper that identifies candidate sessions to
scrub (by shape or by keyword) would lower the cost of adding
fixture #4+.
4. **Judge model selection policy** — current code uses whatever
the user passes as `--judge-model` (default: same as compressor).
Pinning the judge across users would stabilise cross-machine
comparisons, at the cost of blocking users without access to
the pinned model.
-110
View File
@@ -1,110 +0,0 @@
# compression_eval
Offline eval harness for `agent/context_compressor.py`. Runs a real
conversation transcript through the compressor, then probes the
compressed state with targeted questions graded on six dimensions.
## When to run
Before merging changes to:
- `agent/context_compressor.py` — any change to `_template_sections`,
`_generate_summary`, `compress()`, or its boundary logic
- `agent/auxiliary_client.py` — when changing how compression tasks
are routed
- `agent/prompt_builder.py` — when the compression-note phrasing
changes
## Not for CI
This harness makes real model calls (compressor + continuation +
judge = ~3 calls per probe × probes per fixture × runs). Costs ~$0.50
to ~$1.50 per full run depending on models, takes minutes, is
LLM-graded (non-deterministic). It lives in `scripts/` and is
invoked by hand. `tests/` and `scripts/run_tests.sh` do not touch it.
`tests/scripts/test_compression_eval.py` covers the non-LLM code
paths (rubric parsing, report rendering, fixture/probe loading, PII
smoke check on the checked-in fixtures) and DOES run in CI.
## Usage
```bash
# Run all three fixtures, 3 runs each, with your configured provider
python3 scripts/compression_eval/run_eval.py
# Faster iteration — one fixture, one run
python3 scripts/compression_eval/run_eval.py \
--fixtures=debug-session-feishu-id-model --runs=1
# Pin a cheap model for both compression + judge (recommended)
python3 scripts/compression_eval/run_eval.py \
--compressor-provider=nous --compressor-model=openai/gpt-5.4-mini \
--judge-provider=nous --judge-model=openai/gpt-5.4-mini \
--runs=3 --label=baseline
# After editing context_compressor.py, rerun with a new label and diff
python3 scripts/compression_eval/run_eval.py \
--compressor-provider=nous --compressor-model=openai/gpt-5.4-mini \
--judge-provider=nous --judge-model=openai/gpt-5.4-mini \
--runs=3 --label=my-prompt-tweak \
--compare-to=results/baseline
```
Results land in `results/<label>/report.md` and are intended to be
pasted verbatim into PR descriptions. `--compare-to` renders a delta
column per dimension so reviewers can see "did this actually help?"
at a glance.
Rule of thumb: dimension deltas below ±0.3 are within run-to-run
noise on `runs=3`. Publish a bigger N if you want tighter bounds.
## Fixtures
Three scrubbed session snapshots live under `fixtures/`:
- `feature-impl-context-priority.json` — 75 msgs, investigate →
patch → test → PR → merge
- `debug-session-feishu-id-model.json` — 59 msgs, PR triage +
upstream docs + decision
- `config-build-competitive-scouts.json` — 61 msgs, iterative
config accumulation (11 cron jobs)
Regenerate them from the maintainer's `~/.hermes/sessions/*.jsonl`
with `python3 scripts/compression_eval/scrub_fixtures.py`. The
scrubber pipeline and PII-audit checklist are documented in
`DESIGN.md` under **Scrubber pipeline**.
## Probes
One probe bank per fixture under `probes/`, 10-11 probes each,
covering all four types: **recall**, **artifact**, **continuation**,
**decision**. Each probe carries an `expected_facts` list of concrete
anchors (PR numbers, file paths, error codes, commands run) that the
judge sees alongside the assistant's answer.
## How it scores
Six dimensions, 0-5 per probe:
| Dimension | What it measures |
|-----------------------|------------------------------------------------------|
| accuracy | File paths, function names, PR/issue numbers correct |
| context_awareness | Reflects current session state, not a snapshot |
| artifact_trail | Correctly enumerates files / commands / PRs |
| completeness | Addresses ALL parts of the probe |
| continuity | Next assistant could continue without re-fetching |
| instruction_following | Answer in the requested form |
Report renders medians across N runs; probes scoring below 3.0
overall surface in a separate section with the judge's specific
complaint noted inline.
## Related
- `agent/context_compressor.py` — the thing under test
- `tests/agent/test_context_compressor.py` — structural unit tests
that do run in CI
- `scripts/sample_and_compress.py` — the closest existing script in
shape (offline, credential-requiring, not in CI)
- `DESIGN.md` — full architecture + methodology + open follow-ups
@@ -1,114 +0,0 @@
"""Wraps ContextCompressor to run a single forced compression on a fixture.
The real agent loop checks ``should_compress()`` before calling ``compress()``.
Fixtures are intentionally sized below the 100k threshold so ``compress()``
runs in a controlled, single-shot mode score deltas attribute to the
prompt change, not to whether the threshold happened to fire at the same
boundary twice.
Resolves the provider for the compression call via the same path the real
agent uses (``hermes_cli.runtime_provider.resolve_runtime_provider``) so
behaviour matches production aside from being a single call.
"""
from __future__ import annotations
import sys
from pathlib import Path
from typing import Any, Dict, List, Optional
# Make sibling imports work whether invoked as a script or as a module.
_REPO_ROOT = Path(__file__).resolve().parents[2]
if str(_REPO_ROOT) not in sys.path:
sys.path.insert(0, str(_REPO_ROOT))
from agent.context_compressor import ( # noqa: E402
ContextCompressor,
estimate_messages_tokens_rough,
)
def run_compression(
*,
messages: List[Dict[str, Any]],
compressor_model: str,
compressor_provider: str,
compressor_base_url: str,
compressor_api_key: str,
compressor_api_mode: str,
context_length: int,
focus_topic: Optional[str] = None,
summary_model_override: Optional[str] = None,
) -> Dict[str, Any]:
"""Run a single forced compression pass over the fixture messages.
Returns a dict with:
- compressed_messages: the post-compression message list
- summary_text: the summary produced (extracted from the compressed head)
- pre_tokens, post_tokens: rough token counts before/after
- compression_ratio: 1 - (post/pre)
- pre_message_count, post_message_count
"""
compressor = ContextCompressor(
model=compressor_model,
threshold_percent=0.50,
protect_first_n=3,
protect_last_n=20,
summary_target_ratio=0.20,
quiet_mode=True,
summary_model_override=summary_model_override or "",
base_url=compressor_base_url,
api_key=compressor_api_key,
config_context_length=context_length,
provider=compressor_provider,
api_mode=compressor_api_mode,
)
pre_tokens = estimate_messages_tokens_rough(messages)
compressed = compressor.compress(
messages,
current_tokens=pre_tokens,
focus_topic=focus_topic,
)
post_tokens = estimate_messages_tokens_rough(compressed)
summary_text = _extract_summary_from_messages(compressed)
ratio = (1.0 - (post_tokens / pre_tokens)) if pre_tokens > 0 else 0.0
return {
"compressed_messages": compressed,
"summary_text": summary_text,
"pre_tokens": pre_tokens,
"post_tokens": post_tokens,
"compression_ratio": ratio,
"pre_message_count": len(messages),
"post_message_count": len(compressed),
}
_SUMMARY_MARKERS = (
"## Active Task",
"## Goal",
"## Completed Actions",
)
def _extract_summary_from_messages(messages: List[Dict[str, Any]]) -> str:
"""Find the structured summary block inside the compressed message list.
The compressor injects the summary as a user (or system-appended) message
near the head. We look for the section-header markers from
``_template_sections`` in ``agent/context_compressor.py``.
"""
for msg in messages:
content = msg.get("content")
if not isinstance(content, str):
if isinstance(content, list):
content = "\n".join(
p.get("text", "") for p in content if isinstance(p, dict)
)
else:
continue
if any(marker in content for marker in _SUMMARY_MARKERS):
return content
return ""
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
-181
View File
@@ -1,181 +0,0 @@
"""Two-phase probe grading.
Phase 1 **Continuation**: simulate the next assistant turn. Feed the
compressed message list plus the probe question and ask the continuing
model to answer using only the compressed context. This is exactly what
a real next-turn call would look like.
Phase 2 **Grading**: a separate judge-model call scores the answer on
the six rubric dimensions using ``rubric.build_judge_prompt``.
Both phases use the OpenAI SDK directly against the resolved provider
endpoint, so the explicit api_key + base_url we pass always reaches the
wire. (``agent.auxiliary_client.call_llm`` is designed for task-tagged
auxiliary calls backed by config lookups; for eval we need the explicit
credentials to win unconditionally.)
"""
from __future__ import annotations
import logging
import sys
from pathlib import Path
from typing import Any, Dict, List, Optional
_REPO_ROOT = Path(__file__).resolve().parents[2]
if str(_REPO_ROOT) not in sys.path:
sys.path.insert(0, str(_REPO_ROOT))
from openai import OpenAI # noqa: E402
from rubric import build_judge_prompt, parse_judge_response # noqa: E402
logger = logging.getLogger(__name__)
_CONTINUATION_SYSTEM = (
"You are the continuing assistant in a long session. Earlier turns have "
"been compacted into a handoff summary that is now part of the "
"conversation history. The user has just asked you a question. "
"Answer using ONLY what you can determine from the conversation history "
"you see (including the handoff summary). Do NOT invent details. If the "
"summary does not contain a specific fact, say so explicitly rather "
"than guessing. Be direct and concrete — cite file paths, PR numbers, "
"error codes, and exact values when they are present in the summary."
)
def answer_probe(
*,
compressed_messages: List[Dict[str, Any]],
probe_question: str,
model: str,
provider: str,
base_url: str,
api_key: str,
max_tokens: int = 1024,
timeout: Optional[float] = 120.0,
) -> str:
"""Run the continuation call: what does the next assistant answer?
Builds a messages list of [system_continuation, *compressed, probe_user]
and asks the configured model. Returns the answer content as a string.
"""
# Strip any pre-existing system message from the compressed list and
# replace with our continuation system prompt. The fixture's generic
# system is not the right frame for the continuation simulation.
history = [m for m in compressed_messages if m.get("role") != "system"]
messages = (
[{"role": "system", "content": _CONTINUATION_SYSTEM}]
+ _sanitize_for_chat_api(history)
+ [{"role": "user", "content": probe_question}]
)
client = OpenAI(api_key=api_key, base_url=base_url, timeout=timeout)
response = client.chat.completions.create(
model=model,
messages=messages,
max_tokens=max_tokens,
)
content = response.choices[0].message.content
if not isinstance(content, str):
content = "" if content is None else str(content)
return content.strip()
def grade_probe(
*,
probe_question: str,
probe_type: str,
expected_facts: List[str],
assistant_answer: str,
judge_model: str,
judge_provider: str,
judge_base_url: str,
judge_api_key: str,
max_tokens: int = 512,
timeout: Optional[float] = 120.0,
) -> Dict[str, Any]:
"""Run the judge call and parse the six dimension scores.
Returns dict {scores: {dim: int}, notes: str, overall: float,
raw: str, parse_error: str|None}. On parse failure, scores are zeros
and parse_error is populated the caller decides whether to retry
or accept.
"""
prompt = build_judge_prompt(
probe_question=probe_question,
probe_type=probe_type,
expected_facts=expected_facts,
assistant_answer=assistant_answer,
)
client = OpenAI(api_key=judge_api_key, base_url=judge_base_url, timeout=timeout)
response = client.chat.completions.create(
model=judge_model,
messages=[{"role": "user", "content": prompt}],
max_tokens=max_tokens,
)
raw = response.choices[0].message.content or ""
if not isinstance(raw, str):
raw = str(raw)
try:
parsed = parse_judge_response(raw)
parsed["raw"] = raw
parsed["parse_error"] = None
return parsed
except ValueError as exc:
logger.warning("Judge response parse failed: %s | raw=%r", exc, raw[:200])
from rubric import DIMENSIONS
return {
"scores": {d: 0 for d in DIMENSIONS},
"notes": "",
"overall": 0.0,
"raw": raw,
"parse_error": str(exc),
}
def _sanitize_for_chat_api(
messages: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
"""Drop tool_calls/tool pairs that are incomplete.
A compressed message list may contain tool_call references whose matching
``tool`` result was summarized away, which breaks strict-validator
providers (Anthropic, OpenAI). Easiest correct behaviour for the eval:
strip tool_calls entirely and drop ``tool`` role messages the
continuation model only needs the summary + recent turns to answer the
probe, not the precise tool-call bookkeeping.
"""
clean: List[Dict[str, Any]] = []
for m in messages:
role = m.get("role")
if role == "tool":
# Convert tool result to a plain user note so the continuation
# model still sees the content without needing the structured
# tool_call_id pairing.
content = m.get("content")
if isinstance(content, list):
content = "\n".join(
p.get("text", "") for p in content if isinstance(p, dict)
)
clean.append({
"role": "user",
"content": f"[earlier tool result]\n{content or ''}",
})
continue
new = {"role": role, "content": m.get("content", "")}
# Drop tool_calls — the downstream assistant message's content
# still describes what the agent was doing.
clean.append(new)
# Collapse consecutive same-role turns into one (alternation rule)
merged: List[Dict[str, Any]] = []
for m in clean:
if merged and merged[-1]["role"] == m["role"]:
prev = merged[-1]
prev_c = prev.get("content") or ""
new_c = m.get("content") or ""
prev["content"] = f"{prev_c}\n\n{new_c}" if prev_c else new_c
else:
merged.append(m)
return merged
@@ -1,96 +0,0 @@
{
"fixture": "config-build-competitive-scouts",
"description": "Probes for the competitive-scout cron-job setup session. Anchors are which agents were configured, which day of the week each runs, and the full final schedule. This fixture most directly tests artifact-trail and iterative-merge because the job list grows by one per user turn.",
"probes": [
{
"id": "recall-first-repo",
"type": "recall",
"question": "What was the first repository the user asked to create a scout cron for, and on what day of the week?",
"expected_facts": ["openclaw", "Sunday"]
},
{
"id": "recall-closed-source-target",
"type": "recall",
"question": "One of the scout targets does not have an open-source repository and had to be configured as a web scan instead. Which one, and on what day?",
"expected_facts": ["claude code", "Friday", "web scan"]
},
{
"id": "artifact-all-jobs",
"type": "artifact",
"question": "List every scout cron job created in this session.",
"expected_facts": [
"openclaw-pr-scout",
"nanoclaw-pr-scout",
"ironclaw-pr-scout",
"kilocode-pr-scout",
"codex-pr-scout",
"gemini-cli-pr-scout",
"cline-pr-scout",
"opencode-pr-scout",
"claude-code-scout",
"aider-pr-scout",
"roocode-pr-scout"
]
},
{
"id": "artifact-final-schedule",
"type": "artifact",
"question": "What is the final weekly schedule? Give the day and the agents scanned on each day.",
"expected_facts": [
"Sun: openclaw, nanoclaw, ironclaw",
"Mon: kilo code",
"Tue: codex",
"Wed: gemini cli, cline",
"Thu: opencode",
"Fri: claude code",
"Sat: aider, roo"
]
},
{
"id": "artifact-sunday-count",
"type": "artifact",
"question": "How many cron jobs run on Sunday?",
"expected_facts": ["3", "three", "openclaw, nanoclaw, ironclaw"]
},
{
"id": "artifact-total-count",
"type": "artifact",
"question": "How many scout cron jobs were created in total by the end of the session?",
"expected_facts": ["11", "eleven"]
},
{
"id": "decision-kilo-open-source",
"type": "decision",
"question": "The user asked whether Kilo Code is open source. What was the answer, and what did the user decide to do with it?",
"expected_facts": [
"yes, open source",
"Kilo-Org/kilocode",
"added as Monday scout"
]
},
{
"id": "decision-saturday-fill",
"type": "decision",
"question": "Saturday was the last open day at one point. Which scout(s) were placed on Saturday, and why were those chosen?",
"expected_facts": ["aider", "roo", "filled in last based on openrouter popularity / cli comparison rankings"]
},
{
"id": "continuation-execution-time",
"type": "continuation",
"question": "At what local time of day do these scout cron jobs run?",
"expected_facts": ["10 AM Pacific", "17:00 UTC", "0 17 * * *"]
},
{
"id": "continuation-skill-used",
"type": "continuation",
"question": "Each scout job runs with a specific skill preloaded. Which one?",
"expected_facts": ["hermes-agent-dev"]
},
{
"id": "continuation-weekday-coverage",
"type": "continuation",
"question": "After the session ended, are there any weekdays still uncovered by a scout job?",
"expected_facts": ["no", "all 7 days covered", "full week loaded"]
}
]
}
@@ -1,72 +0,0 @@
{
"fixture": "debug-session-feishu-id-model",
"description": "Probes for the Feishu identity-model PR #8388 triage session. Anchors are the PR number, what the PR actually contained, what upstream docs confirmed, and the final decision + reasoning.",
"probes": [
{
"id": "recall-pr-number",
"type": "recall",
"question": "What is the PR number under review in this session, and what repository is it against?",
"expected_facts": ["PR #8388", "NousResearch/hermes-agent", "hermes-agent"]
},
{
"id": "recall-bug-claim",
"type": "recall",
"question": "What is the core bug the PR claims to fix? Be specific about the identifier involved.",
"expected_facts": ["open_id", "app-scoped", "not canonical", "Feishu identity model"]
},
{
"id": "recall-upstream-confirmation",
"type": "recall",
"question": "Do upstream Feishu/Lark docs confirm that open_id is app-scoped rather than a canonical cross-app identity?",
"expected_facts": ["yes", "confirmed", "open.feishu.cn", "same user has different Open IDs in different apps"]
},
{
"id": "artifact-pr-scope",
"type": "artifact",
"question": "Roughly how large is PR #8388, and which gateway subsystems does it touch beyond the Feishu adapter?",
"expected_facts": ["4647 lines", "gateway/run.py", "cron/scheduler.py", "gateway/config.py", "multi-account", "bind"]
},
{
"id": "artifact-new-tool",
"type": "artifact",
"question": "Does the PR add a new tool file? If so, what is its path?",
"expected_facts": ["tools/feishu_id_tool.py", "new file"]
},
{
"id": "decision-pr-assessment",
"type": "decision",
"question": "What is the reviewer's overall assessment of PR #8388 — approve, reject, or something more nuanced? Explain in one sentence.",
"expected_facts": [
"core claim is correct",
"scope is wrong",
"bait-and-switch",
"overbuilt",
"implement cleaner ourselves"
]
},
{
"id": "decision-core-claim-validity",
"type": "decision",
"question": "Setting aside the PR's size, is the underlying identity-model concern technically valid or not?",
"expected_facts": ["technically valid", "correct", "open_id is app-scoped"]
},
{
"id": "continuation-next-action",
"type": "continuation",
"question": "Based on the review outcome, what is the next action the agent has been asked to take regarding this PR?",
"expected_facts": ["close the PR", "implement ourselves", "cleaner", "less complex"]
},
{
"id": "continuation-implementation-scope",
"type": "continuation",
"question": "If implementing the Feishu fix cleanly ourselves, which specific behaviour needs to change — what should replace the current use of open_id?",
"expected_facts": ["use union_id", "or user_id", "canonical identity", "cross-app stable ID"]
},
{
"id": "continuation-sources-to-reference",
"type": "continuation",
"question": "Which upstream documentation sources were fetched during review that should be referenced when writing the clean implementation?",
"expected_facts": ["open.feishu.cn", "open.larkoffice.com", "user-identity-introduction"]
}
]
}
@@ -1,74 +0,0 @@
{
"fixture": "feature-impl-context-priority",
"description": "Probes for the .hermes.md / AGENTS.md / CLAUDE.md / .cursorrules priority feature session. Anchors are the concrete facts the next assistant would need to continue: user's priority order, files modified, helper-function structure, live-test scenarios, and PR number.",
"probes": [
{
"id": "recall-priority-order",
"type": "recall",
"question": "What is the priority order the user asked for when multiple project-context files are present? List them from highest to lowest priority.",
"expected_facts": [".hermes.md", "AGENTS.md", "CLAUDE.md", ".cursorrules", "highest to lowest"]
},
{
"id": "recall-selection-mode",
"type": "recall",
"question": "When multiple context files exist in the same directory, does the agent now load all of them or pick only one?",
"expected_facts": ["only one", "priority-based selection", "highest-priority winner"]
},
{
"id": "artifact-files-modified",
"type": "artifact",
"question": "Which files in the hermes-agent repository were modified during this session? List them.",
"expected_facts": [
"agent/prompt_builder.py",
"tests/agent/test_prompt_builder.py"
]
},
{
"id": "artifact-helper-functions",
"type": "artifact",
"question": "The session introduced separate helper functions for each context-file type. What are their names?",
"expected_facts": [
"_load_hermes_md",
"_load_agents_md",
"_load_claude_md",
"_load_cursorrules"
]
},
{
"id": "artifact-test-scenarios",
"type": "artifact",
"question": "A scratch directory was created with scenario subdirectories to live-test the priority chain. Roughly how many scenarios, and what directory was it created under?",
"expected_facts": ["10 scenarios", "/tmp/context-priority-test"]
},
{
"id": "decision-claude-md-was-unsupported",
"type": "decision",
"question": "What was the finding about CLAUDE.md support in the existing loader before this session's changes?",
"expected_facts": ["CLAUDE.md was not handled", "not supported", "new handler added"]
},
{
"id": "decision-load-all-or-one",
"type": "decision",
"question": "Was the decision to load multiple context files when present, or to load only the highest-priority one? Explain the reasoning in one sentence.",
"expected_facts": ["load only one", "highest priority", "user preference", "do not want to load multiple"]
},
{
"id": "continuation-pr-number-and-status",
"type": "continuation",
"question": "A pull request was opened for this feature. What is the PR number and what is its merge status?",
"expected_facts": ["PR #2301", "merged", "squash"]
},
{
"id": "continuation-test-suite-result",
"type": "continuation",
"question": "What was the result of the full test suite run after the implementation changes?",
"expected_facts": ["5680 passed", "0 failures", "clean"]
},
{
"id": "continuation-next-step",
"type": "continuation",
"question": "If asked to pick up this session, what is the current state of main? Anything left to do?",
"expected_facts": ["merged to main", "main is current", "nothing outstanding", "pulled"]
}
]
}
-235
View File
@@ -1,235 +0,0 @@
"""Markdown report rendering + diff-against-baseline for compression-eval runs.
Report format is optimised for pasting directly into a PR description.
Top-of-report table is the per-fixture medians; below that is the
probe-by-probe miss list (scores < 3.0 on overall).
Diff mode (``compare_to``) emits a second table with deltas per fixture
per dimension against a previous run directory.
"""
from __future__ import annotations
import json
import statistics
from pathlib import Path
from typing import Any, Dict, List, Optional
from rubric import DIMENSIONS
def write_run_json(
*,
results_dir: Path,
fixture_name: str,
run_index: int,
payload: Dict[str, Any],
) -> Path:
"""Dump one fixture's per-run results as JSON for later diffing."""
results_dir.mkdir(parents=True, exist_ok=True)
path = results_dir / f"{fixture_name}-run-{run_index}.json"
with path.open("w") as fh:
json.dump(payload, fh, indent=2, ensure_ascii=False)
return path
def _median(values: List[float]) -> float:
return statistics.median(values) if values else 0.0
def _format_score(value: float) -> str:
return f"{value:.2f}"
def _format_delta(baseline: float, current: float) -> str:
delta = current - baseline
if abs(delta) < 0.01:
return f"{current:.2f} (±0)"
sign = "+" if delta > 0 else ""
return f"{current:.2f} ({sign}{delta:.2f})"
def summarize_fixture_runs(
fixture_runs: List[Dict[str, Any]],
) -> Dict[str, Any]:
"""Collapse N runs of one fixture into per-dimension medians + metadata.
Each run payload is {probes: [{id, type, scores: {...}, overall, ...}]}.
Returns {fixture_name, runs, dimension_medians, overall_median, misses}.
"""
if not fixture_runs:
return {}
fixture_name = fixture_runs[0]["fixture_name"]
n_runs = len(fixture_runs)
# Per-probe-per-dimension aggregation across runs
probe_ids = [p["id"] for p in fixture_runs[0]["probes"]]
per_probe: Dict[str, Dict[str, List[float]]] = {
pid: {d: [] for d in DIMENSIONS} for pid in probe_ids
}
per_probe_overall: Dict[str, List[float]] = {pid: [] for pid in probe_ids}
for run in fixture_runs:
for p in run["probes"]:
pid = p["id"]
for d in DIMENSIONS:
per_probe[pid][d].append(p["scores"].get(d, 0))
per_probe_overall[pid].append(p["overall"])
# Median each probe across runs, then median those medians across probes
dim_medians: Dict[str, float] = {}
for d in DIMENSIONS:
per_probe_med = [_median(per_probe[pid][d]) for pid in probe_ids]
dim_medians[d] = _median(per_probe_med)
overall_median = _median([_median(per_probe_overall[pid]) for pid in probe_ids])
# Misses = probes whose median overall < 3.0
misses: List[Dict[str, Any]] = []
for pid in probe_ids:
med = _median(per_probe_overall[pid])
if med < 3.0:
# Pull the notes from the last run to give the reader a
# concrete clue. (Taking the most recent run is fine —
# notes vary across runs and any one is illustrative.)
notes = ""
probe_type = ""
for p in fixture_runs[-1]["probes"]:
if p["id"] == pid:
notes = p.get("notes", "")
probe_type = p.get("type", "")
break
misses.append({
"id": pid,
"type": probe_type,
"overall_median": med,
"notes": notes,
})
return {
"fixture_name": fixture_name,
"runs": n_runs,
"dimension_medians": dim_medians,
"overall_median": overall_median,
"misses": misses,
"compression": fixture_runs[0].get("compression", {}),
}
def render_report(
*,
label: str,
compressor_model: str,
judge_model: str,
runs_per_fixture: int,
summaries: List[Dict[str, Any]],
baseline_summaries: Optional[List[Dict[str, Any]]] = None,
) -> str:
"""Render the full markdown report.
baseline_summaries is the same shape as summaries, sourced from a
previous run (via --compare-to). When present, dimension scores in
the main table render with deltas.
"""
lines: List[str] = []
lines.append(f"## Compression eval — label `{label}`")
lines.append("")
lines.append(f"- Compressor model: `{compressor_model}`")
lines.append(f"- Judge model: `{judge_model}`")
lines.append(f"- Runs per fixture: {runs_per_fixture}")
lines.append("- Medians over runs reported")
if baseline_summaries:
lines.append("- Deltas shown against baseline run")
lines.append("")
baseline_by_name: Dict[str, Dict[str, Any]] = {}
if baseline_summaries:
baseline_by_name = {s["fixture_name"]: s for s in baseline_summaries}
# Main table
header = ["Fixture"] + DIMENSIONS + ["overall"]
lines.append("| " + " | ".join(header) + " |")
lines.append("|" + "|".join(["---"] * len(header)) + "|")
for s in summaries:
row = [s["fixture_name"]]
baseline = baseline_by_name.get(s["fixture_name"])
for d in DIMENSIONS:
cur = s["dimension_medians"][d]
if baseline and d in baseline.get("dimension_medians", {}):
row.append(_format_delta(baseline["dimension_medians"][d], cur))
else:
row.append(_format_score(cur))
if baseline:
row.append(_format_delta(baseline["overall_median"], s["overall_median"]))
else:
row.append(_format_score(s["overall_median"]))
lines.append("| " + " | ".join(row) + " |")
lines.append("")
# Compression metadata
lines.append("### Compression summary")
lines.append("")
lines.append("| Fixture | Pre tokens | Post tokens | Ratio | Pre msgs | Post msgs |")
lines.append("|---|---|---|---|---|---|")
for s in summaries:
c = s.get("compression", {})
lines.append(
"| {name} | {pre} | {post} | {ratio:.1%} | {pm} | {pom} |".format(
name=s["fixture_name"],
pre=c.get("pre_tokens", 0),
post=c.get("post_tokens", 0),
ratio=c.get("compression_ratio", 0.0),
pm=c.get("pre_message_count", 0),
pom=c.get("post_message_count", 0),
)
)
lines.append("")
# Per-probe misses
any_misses = any(s["misses"] for s in summaries)
if any_misses:
lines.append("### Probes scoring below 3.0 overall (median)")
lines.append("")
for s in summaries:
if not s["misses"]:
continue
lines.append(f"**{s['fixture_name']}**")
for m in s["misses"]:
note_part = f"{m['notes']}" if m["notes"] else ""
lines.append(
f"- `{m['id']}` ({m['type']}): "
f"{m['overall_median']:.2f}{note_part}"
)
lines.append("")
lines.append("### Methodology")
lines.append("")
lines.append(
"Probe-based eval adapted from "
"https://factory.ai/news/evaluating-compression. Each fixture is "
"compressed in a single forced `ContextCompressor.compress()` call, "
"then a continuation call asks the compressor model to answer each "
"probe from the compressed state, then the judge model scores the "
"answer 0-5 on six dimensions. A single run is noisy; medians "
"across multiple runs are the meaningful signal. Changes below "
"~0.3 on any dimension are likely within run-to-run noise."
)
return "\n".join(lines) + "\n"
def load_baseline_summaries(baseline_dir: Path) -> List[Dict[str, Any]]:
"""Load summaries from a previous eval run for --compare-to.
Reads the dumped per-run JSONs and re-summarises them so the
aggregation matches whatever summariser was current at the time of
the new run (forward-compatible with schema additions).
"""
if not baseline_dir.exists():
raise FileNotFoundError(f"baseline dir not found: {baseline_dir}")
by_fixture: Dict[str, List[Dict[str, Any]]] = {}
for path in sorted(baseline_dir.glob("*-run-*.json")):
with path.open() as fh:
payload = json.load(fh)
by_fixture.setdefault(payload["fixture_name"], []).append(payload)
return [summarize_fixture_runs(runs) for runs in by_fixture.values()]
-198
View File
@@ -1,198 +0,0 @@
"""Rubric for probe-based compression eval grading.
Six dimensions scored 0-5 by a judge model. The scoring anchors are spelled
out so the judge interpretation is stable across runs and across judge
models.
Adapted from the methodology in
https://factory.ai/news/evaluating-compression. Their scoreboard is not
adopted; only the dimension definitions and the 0-5 scale.
"""
from __future__ import annotations
from typing import Any, Dict, List
# Canonical dimension order. All reports, parsers, and comparisons derive
# from this list — do not hardcode the order elsewhere.
DIMENSIONS: List[str] = [
"accuracy",
"context_awareness",
"artifact_trail",
"completeness",
"continuity",
"instruction_following",
]
DIMENSION_DESCRIPTIONS: Dict[str, str] = {
"accuracy": (
"Are concrete facts correct — file paths, function names, PR/issue "
"numbers, error codes, command outputs, line numbers? A single wrong "
"path or error code should cost points. Vague but non-contradicting "
"answers score mid-range."
),
"context_awareness": (
"Does the answer reflect the CURRENT state of the session, not a "
"mid-session snapshot? For example, if a file was modified then "
"reverted, does the answer describe the reverted state? If three "
"PRs were opened, does the answer know which was merged?"
),
"artifact_trail": (
"Does the answer correctly enumerate the artifacts (files read, "
"files modified, commands run, tools called, PRs opened, cron jobs "
"created)? Missing artifacts cost more than extra unrelated ones."
),
"completeness": (
"Does the answer address ALL parts of the probe question? If the "
"probe asks for three things and only two are answered, that is "
"incomplete regardless of accuracy on the two."
),
"continuity": (
"Could the next assistant continue the work using only this answer, "
"without having to re-fetch files or re-explore the codebase? An "
"answer that lists files by name but doesn't mention the change is "
"poor continuity even if accurate."
),
"instruction_following": (
"Is the answer in the format the probe requested (list, number, "
"short phrase, yes/no)? Ignore tone and length, only assess "
"whether the requested form was honoured."
),
}
SCORE_SCALE: Dict[int, str] = {
0: "No useful information; wrong or hallucinated.",
1: "Major gaps or a key fact is wrong.",
2: "Partially correct but significant omissions.",
3: "Mostly correct with minor omissions or imprecision.",
4: "Correct and complete with only trivial imprecision.",
5: "Fully correct, complete, and in the requested format.",
}
_RUBRIC_HEADER = """You are an evaluator grading a single answer produced by an AI assistant \
that was given a COMPRESSED handoff summary of an earlier conversation and \
asked a probe question. You are NOT evaluating the compression summary \
directly you are evaluating whether the answer the assistant produced \
from that summary is correct, complete, and useful.
Grade on six dimensions, each 0-5:
{dimension_block}
0-5 scale:
{scale_block}
Grade strictly. Fractional scores are NOT allowed output integers only. \
If the answer is ambiguous, use the lower of the two candidate scores."""
def build_judge_prompt(
*,
probe_question: str,
probe_type: str,
expected_facts: List[str],
assistant_answer: str,
) -> str:
"""Build the full judge prompt for one (probe, answer) pair.
The judge is told the expected_facts up front so grading is anchored to
concrete signal rather than judge taste. Expected facts are intentionally
NOT shown to the assistant that produces the answer.
"""
dim_block = "\n".join(
f"- {d}: {DIMENSION_DESCRIPTIONS[d]}" for d in DIMENSIONS
)
scale_block = "\n".join(
f" {score}: {desc}" for score, desc in sorted(SCORE_SCALE.items())
)
header = _RUBRIC_HEADER.format(
dimension_block=dim_block,
scale_block=scale_block,
)
expected_block = (
"\n".join(f"- {f}" for f in expected_facts) if expected_facts else "(none provided)"
)
output_schema = (
"Respond with ONLY a JSON object, no prose before or after, matching "
"this schema exactly:\n"
"{\n"
' "accuracy": <int 0-5>,\n'
' "context_awareness": <int 0-5>,\n'
' "artifact_trail": <int 0-5>,\n'
' "completeness": <int 0-5>,\n'
' "continuity": <int 0-5>,\n'
' "instruction_following": <int 0-5>,\n'
' "notes": "<one short sentence, <=200 chars, identifying the '
'single biggest issue with the answer if any>"\n'
"}"
)
return (
f"{header}\n\n"
f"PROBE TYPE: {probe_type}\n\n"
f"PROBE QUESTION:\n{probe_question}\n\n"
f"EXPECTED FACTS (the answer should contain these concrete anchors; "
f"missing any is a material defect in accuracy and/or completeness):\n"
f"{expected_block}\n\n"
f"ASSISTANT ANSWER TO GRADE:\n{assistant_answer}\n\n"
f"{output_schema}"
)
def parse_judge_response(raw: str) -> Dict[str, Any]:
"""Parse the judge model's JSON response into a score dict.
Tolerates surrounding prose (judges ignore instructions sometimes) by
extracting the first {...} block. Validates that every dimension is
present as an integer 0-5.
Returns dict with keys: scores (dim->int), notes (str), overall (float).
Raises ValueError if the response cannot be parsed into a complete
score set.
"""
import json
import re
if not raw or not raw.strip():
raise ValueError("empty judge response")
# Strip code fences and any ```json prefix judges sometimes emit.
stripped = raw.strip()
fence_match = re.match(r"^```(?:json)?\s*(.*?)\s*```$", stripped, re.DOTALL)
if fence_match:
stripped = fence_match.group(1).strip()
# Extract the first {...} block greedy-to-matching-brace.
brace_match = re.search(r"\{.*\}", stripped, re.DOTALL)
if not brace_match:
raise ValueError(f"no JSON object found in judge response: {raw[:200]!r}")
candidate = brace_match.group(0)
try:
parsed = json.loads(candidate)
except json.JSONDecodeError as exc:
raise ValueError(f"judge response not valid JSON: {exc}; raw={candidate[:200]!r}")
scores: Dict[str, int] = {}
for dim in DIMENSIONS:
if dim not in parsed:
raise ValueError(f"judge response missing dimension {dim!r}: {parsed}")
value = parsed[dim]
if isinstance(value, bool) or not isinstance(value, (int, float)):
raise ValueError(f"dimension {dim} is not numeric: {value!r}")
int_val = int(round(value))
if int_val < 0 or int_val > 5:
raise ValueError(f"dimension {dim} out of range: {int_val}")
scores[dim] = int_val
notes_val = parsed.get("notes", "")
notes = str(notes_val)[:200] if notes_val else ""
overall = sum(scores.values()) / len(scores)
return {
"scores": scores,
"notes": notes,
"overall": overall,
}
-383
View File
@@ -1,383 +0,0 @@
#!/usr/bin/env python3
"""Compression eval — entry point.
Runs the full probe-based eval over one or more fixtures, produces a
markdown report in ``results/<label>/report.md`` paired with per-run JSON
for later diffing.
Not a pytest. Requires a configured provider + credentials (same path the
agent uses). Does not run in CI. See README.md for usage examples.
"""
from __future__ import annotations
import json
import logging
import sys
import time
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Optional
_HERE = Path(__file__).resolve().parent
_REPO_ROOT = _HERE.parents[1]
if str(_REPO_ROOT) not in sys.path:
sys.path.insert(0, str(_REPO_ROOT))
# Make our sibling modules importable whether invoked as a script or as -m.
if str(_HERE) not in sys.path:
sys.path.insert(0, str(_HERE))
try:
import fire # noqa: F401
except ImportError:
fire = None # fallback to argparse if fire is unavailable
from hermes_cli.runtime_provider import resolve_runtime_provider # noqa: E402
from compressor_driver import run_compression # noqa: E402
from grader import answer_probe, grade_probe # noqa: E402
from report import ( # noqa: E402
load_baseline_summaries,
render_report,
summarize_fixture_runs,
write_run_json,
)
logger = logging.getLogger("compression_eval")
FIXTURES_DIR = _HERE / "fixtures"
PROBES_DIR = _HERE / "probes"
RESULTS_DIR = _HERE / "results"
def _load_fixture(name: str) -> Dict[str, Any]:
path = FIXTURES_DIR / f"{name}.json"
if not path.exists():
available = sorted(p.stem for p in FIXTURES_DIR.glob("*.json"))
raise FileNotFoundError(
f"Fixture not found: {name}. Available: {available}"
)
with path.open() as fh:
return json.load(fh)
def _load_probes(name: str) -> Dict[str, Any]:
path = PROBES_DIR / f"{name}.probes.json"
if not path.exists():
raise FileNotFoundError(f"Probe bank not found for fixture {name}: {path}")
with path.open() as fh:
return json.load(fh)
def _resolve_runtime(
*,
provider_override: Optional[str],
model_override: Optional[str],
) -> Dict[str, Any]:
"""Resolve provider credentials via the same path the agent uses."""
runtime = resolve_runtime_provider(
requested=provider_override,
target_model=model_override,
)
if not runtime.get("api_key") and not runtime.get("base_url"):
raise RuntimeError(
"No provider configured. Run `hermes setup` or set provider "
"credentials in the environment before running the eval."
)
return runtime
def _available_fixtures() -> List[str]:
return sorted(p.stem for p in FIXTURES_DIR.glob("*.json"))
def _run_one_fixture(
*,
fixture_name: str,
run_index: int,
compressor_runtime: Dict[str, Any],
compressor_model: str,
judge_runtime: Dict[str, Any],
judge_model: str,
focus_topic: Optional[str],
) -> Dict[str, Any]:
fx = _load_fixture(fixture_name)
probes = _load_probes(fixture_name)
logger.info(
"[%s run=%d] compressing (%d messages, ctx=%d)",
fixture_name, run_index, len(fx["messages"]), fx["context_length"],
)
compression = run_compression(
messages=fx["messages"],
compressor_model=compressor_model,
compressor_provider=compressor_runtime["provider"],
compressor_base_url=compressor_runtime["base_url"],
compressor_api_key=compressor_runtime["api_key"],
compressor_api_mode=compressor_runtime.get("api_mode", ""),
context_length=fx["context_length"],
focus_topic=focus_topic,
# Force the compressor to use the model we're testing, bypassing
# any auxiliary.compression.model config override. Without this,
# ContextCompressor.call_llm(task="compression") routes through
# the user's config which may pin a different model (e.g.
# google/gemini-3-flash-preview).
summary_model_override=compressor_model,
)
logger.info(
"[%s run=%d] compressed %d -> %d tokens (%.1f%%)",
fixture_name, run_index,
compression["pre_tokens"], compression["post_tokens"],
compression["compression_ratio"] * 100,
)
probe_results: List[Dict[str, Any]] = []
for probe in probes["probes"]:
t0 = time.monotonic()
try:
answer = answer_probe(
compressed_messages=compression["compressed_messages"],
probe_question=probe["question"],
provider=compressor_runtime["provider"],
model=compressor_model,
base_url=compressor_runtime["base_url"],
api_key=compressor_runtime["api_key"],
)
except Exception as exc:
logger.warning(
"[%s run=%d probe=%s] continuation failed: %s",
fixture_name, run_index, probe["id"], exc,
)
answer = ""
try:
grade = grade_probe(
probe_question=probe["question"],
probe_type=probe["type"],
expected_facts=probe.get("expected_facts", []),
assistant_answer=answer,
judge_provider=judge_runtime["provider"],
judge_model=judge_model,
judge_base_url=judge_runtime["base_url"],
judge_api_key=judge_runtime["api_key"],
)
except Exception as exc:
logger.warning(
"[%s run=%d probe=%s] grading failed: %s",
fixture_name, run_index, probe["id"], exc,
)
from rubric import DIMENSIONS
grade = {
"scores": {d: 0 for d in DIMENSIONS},
"notes": f"grading error: {exc}",
"overall": 0.0,
"raw": "",
"parse_error": str(exc),
}
elapsed = time.monotonic() - t0
logger.info(
"[%s run=%d probe=%s] overall=%.2f (%.1fs)",
fixture_name, run_index, probe["id"], grade["overall"], elapsed,
)
probe_results.append({
"id": probe["id"],
"type": probe["type"],
"question": probe["question"],
"expected_facts": probe.get("expected_facts", []),
"answer": answer,
"scores": grade["scores"],
"overall": grade["overall"],
"notes": grade["notes"],
"parse_error": grade["parse_error"],
"elapsed_seconds": elapsed,
})
return {
"fixture_name": fixture_name,
"run_index": run_index,
"compression": {
"pre_tokens": compression["pre_tokens"],
"post_tokens": compression["post_tokens"],
"compression_ratio": compression["compression_ratio"],
"pre_message_count": compression["pre_message_count"],
"post_message_count": compression["post_message_count"],
"summary_text": compression["summary_text"],
},
"probes": probe_results,
}
def _coerce_fixtures_arg(arg: Optional[str]) -> List[str]:
if not arg:
return _available_fixtures()
return [s.strip() for s in arg.split(",") if s.strip()]
def main(
fixtures: Optional[str] = None,
runs: int = 3,
judge_model: Optional[str] = None,
judge_provider: Optional[str] = None,
compressor_model: Optional[str] = None,
compressor_provider: Optional[str] = None,
label: Optional[str] = None,
focus_topic: Optional[str] = None,
compare_to: Optional[str] = None,
verbose: bool = False,
) -> int:
"""Run the compression eval.
Args:
fixtures: Comma-separated fixture names; default = all in fixtures/.
runs: Runs per fixture. Medians reported. Default 3.
judge_model: Override the judge model (default = same as
compressor model resolved from config).
judge_provider: Override the judge provider.
compressor_model: Override the compressor model (default =
whatever resolve_runtime_provider returns for the active
configuration).
compressor_provider: Override the compressor provider.
label: Output subdirectory under results/. Default = timestamp.
focus_topic: Optional focus topic passed through to
ContextCompressor.compress(focus_topic=...).
compare_to: Path to a previous run directory (e.g.
results/2026-04-24_baseline) to diff against in the report.
verbose: Print debug logs.
"""
logging.basicConfig(
level=logging.DEBUG if verbose else logging.INFO,
format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
fixture_names = _coerce_fixtures_arg(fixtures)
# Validate every fixture has a probe bank before spending any money.
for name in fixture_names:
_load_fixture(name)
_load_probes(name)
compressor_runtime = _resolve_runtime(
provider_override=compressor_provider,
model_override=compressor_model,
)
effective_compressor_model = (
compressor_model or compressor_runtime.get("resolved_model") or "auto"
)
if effective_compressor_model == "auto":
# resolve_runtime_provider doesn't always fill resolved_model;
# fall back to reading model.default from config.
from hermes_cli.config import load_config
cfg = load_config()
mc = cfg.get("model", {}) or {}
if isinstance(mc, dict):
effective_compressor_model = (
mc.get("default") or mc.get("model") or "anthropic/claude-sonnet-4.6"
)
else:
effective_compressor_model = str(mc) or "anthropic/claude-sonnet-4.6"
if judge_provider or judge_model:
judge_runtime = _resolve_runtime(
provider_override=judge_provider,
model_override=judge_model,
)
effective_judge_model = judge_model or effective_compressor_model
else:
judge_runtime = compressor_runtime
effective_judge_model = effective_compressor_model
effective_label = label or datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
out_dir = RESULTS_DIR / effective_label
out_dir.mkdir(parents=True, exist_ok=True)
logger.info(
"Compression eval starting: label=%s fixtures=%s runs=%d "
"compressor=%s judge=%s out=%s",
effective_label, fixture_names, runs,
effective_compressor_model, effective_judge_model, out_dir,
)
all_summaries: List[Dict[str, Any]] = []
for fixture_name in fixture_names:
per_run: List[Dict[str, Any]] = []
for run_i in range(1, runs + 1):
payload = _run_one_fixture(
fixture_name=fixture_name,
run_index=run_i,
compressor_runtime=compressor_runtime,
compressor_model=effective_compressor_model,
judge_runtime=judge_runtime,
judge_model=effective_judge_model,
focus_topic=focus_topic,
)
write_run_json(
results_dir=out_dir,
fixture_name=fixture_name,
run_index=run_i,
payload=payload,
)
per_run.append(payload)
summary = summarize_fixture_runs(per_run)
all_summaries.append(summary)
baseline_summaries: Optional[List[Dict[str, Any]]] = None
if compare_to:
baseline_path = Path(compare_to)
if not baseline_path.is_absolute():
baseline_path = _HERE / baseline_path
baseline_summaries = load_baseline_summaries(baseline_path)
report_md = render_report(
label=effective_label,
compressor_model=effective_compressor_model,
judge_model=effective_judge_model,
runs_per_fixture=runs,
summaries=all_summaries,
baseline_summaries=baseline_summaries,
)
report_path = out_dir / "report.md"
report_path.write_text(report_md)
# Also write a machine-readable summary.json alongside the human report.
summary_path = out_dir / "summary.json"
with summary_path.open("w") as fh:
json.dump(
{
"label": effective_label,
"compressor_model": effective_compressor_model,
"judge_model": effective_judge_model,
"runs_per_fixture": runs,
"fixtures": all_summaries,
},
fh,
indent=2,
ensure_ascii=False,
)
print()
print(report_md)
print(f"Report written to {report_path}")
print(f"Per-run JSON in {out_dir}")
return 0
if __name__ == "__main__":
if fire is not None:
# fire preserves docstrings as --help and handles kwarg-style CLI.
sys.exit(fire.Fire(main))
else:
import argparse
p = argparse.ArgumentParser()
p.add_argument("--fixtures")
p.add_argument("--runs", type=int, default=3)
p.add_argument("--judge-model", dest="judge_model")
p.add_argument("--judge-provider", dest="judge_provider")
p.add_argument("--compressor-model", dest="compressor_model")
p.add_argument("--compressor-provider", dest="compressor_provider")
p.add_argument("--label")
p.add_argument("--focus-topic", dest="focus_topic")
p.add_argument("--compare-to", dest="compare_to")
p.add_argument("--verbose", action="store_true")
args = p.parse_args()
sys.exit(main(**vars(args)))
-381
View File
@@ -1,381 +0,0 @@
"""One-shot fixture scrubber for scripts/compression_eval/fixtures/.
Source: ~/.hermes/sessions/<file>.jsonl
Output: .worktrees/.../scripts/compression_eval/fixtures/<name>.json
Scrubbing passes:
1. agent.redact.redact_sensitive_text API keys, tokens, connection strings
2. Username paths /home/teknium/ /home/user/, ~/.hermes/ preserved as-is
(that path is universal)
3. Personal handles "Teknium"/"teknium"/"teknium1" "user"
4. Reasoning scratchpads strip <REASONING_SCRATCHPAD>...</REASONING_SCRATCHPAD>
blocks and <think>...</think> tags (personality leakage risk)
5. session_meta line drop entirely, we only need the messages
6. User message personality lightly paraphrase the first user message to keep
task intent while removing "vibe"; subsequent user turns kept verbatim
since they're short instructions
The fixture format matches DESIGN.md:
{
"name": "...",
"description": "...",
"model": "...", # best guess from original session
"context_length": 200000,
"messages": [...], # OpenAI-format, only role/content/tool_calls/tool_call_id/tool_name
"notes": "Scrubbed from ~/.hermes/sessions/... on 2026-04-24"
}
"""
from __future__ import annotations
import json
import re
import sys
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List
# Resolve the hermes-agent checkout relative to this script so agent.redact
# imports cleanly whether we run from a worktree or a main clone.
_REPO_ROOT = Path(__file__).resolve().parents[2]
sys.path.insert(0, str(_REPO_ROOT))
from agent.redact import redact_sensitive_text # noqa: E402
SESSION_DIR = Path.home() / ".hermes" / "sessions"
# Resolve FIXTURES_DIR relative to this script so the scrubber runs the
# same way inside a worktree, a main checkout, or from a contributor's
# clone at a different path.
FIXTURES_DIR = Path(__file__).resolve().parent / "fixtures"
# (source_file, output_name, description, user_first_paraphrase, model_guess, context_length, truncate_at)
# truncate_at: keep messages[:truncate_at] (None = keep all). Applied BEFORE
# orphan-empty-assistant cleanup.
SPECS = [
(
"20260321_060441_fef7be92.jsonl",
"feature-impl-context-priority",
"~75-turn feature-impl: user asks how multiple project-context files "
"(.hermes.md / AGENTS.md / CLAUDE.md / .cursorrules) are handled when "
"all are present; agent investigates the codebase, designs a priority "
"order, patches the loader + tests, live-tests with a scenario "
"directory, commits to a feature branch, opens a PR, and merges after "
"approval. Exercises investigate → decide → implement → verify → "
"ship flow with clear artifact trail (2 files modified, 1 PR).",
(
"If .hermes.md, AGENTS.md, CLAUDE.md, and .cursorrules all exist in "
"the same directory, does the agent load all of them or pick one? "
"Use the hermes-agent-dev skill to check."
),
"anthropic/claude-sonnet-4.6",
200000,
74, # cut at "Merged and pulled. Main is current." — drops trailing unrelated cron-delivery messages
),
(
"20260412_233741_3f2119a8.jsonl",
"debug-session-feishu-id-model",
"~60-turn debug/triage PR-review session: a third-party bug report "
"says the gateway's Feishu adapter misuses the open_id / union_id / "
"user_id identity model (open_id is app-scoped, not the bot's "
"canonical ID). An open community PR (#8388) tries to fix it. Agent "
"reviews the PR against current main, fetches upstream Feishu/Lark "
"identity docs, and produces a decision. Exercises long tool-heavy "
"context with PR diffs, upstream docs, and a clear decision at the "
"end — the classic 'can the summary still name the PR number, the "
"root cause, and the decision?' scenario.",
(
"A community user reports the Feishu/Lark adapter gets the identity "
"model wrong — open_id is app-scoped, not the bot's canonical ID. "
"There's an open PR #8388 trying to fix it. Use the hermes-agent-dev "
"skill and the pr-triage-salvage skill to review it."
),
"anthropic/claude-sonnet-4.6",
200000,
58, # end at "Here's my review: ..." — clean decision point before the "close it, implement cleaner" pivot
),
(
"20260328_160817_77bd258b.jsonl",
"config-build-competitive-scouts",
"~60-turn iterative config/build session: user wants a set of weekly "
"cron jobs that scan competing AI coding agents (openclaw, nanoclaw, "
"ironclaw, codex, opencode, claude-code, kilo-code, gemini-cli, "
"cline, aider, roo) for merged PRs or web updates worth porting to "
"hermes-agent. User adds one target per turn; agent creates each cron "
"job and re-states the accumulated schedule. Exercises artifact trail "
"(which jobs are configured, which days) and iterative state "
"accumulation — the canonical case for iterative-merge summarization.",
(
"Set up a cron job for the agent every Sunday to scan all PRs "
"merged into openclaw that week, decide which are worth adding to "
"hermes-agent, and open PRs porting those features."
),
"anthropic/claude-sonnet-4.6",
200000,
None,
),
]
# Tool output truncation is DELIBERATELY DISABLED.
#
# An earlier iteration truncated tool outputs > 2KB to keep fixture JSON
# files small, but that defeats the whole purpose of the eval. Real
# sessions have 30KB skill_view dumps, 10KB read_file outputs, 5KB
# web_extract bodies — compression has to either head-protect them,
# summarize them, or drop them. A fixture without that load doesn't
# exercise the compressor. The size win wasn't worth the signal loss.
#
# The function remains so the scrubbing_passes record in the fixture
# JSON continues to truthfully describe what was applied (no-op in this
# configuration).
_TOOL_OUTPUT_MAX = None # None disables truncation entirely
def _maybe_truncate_tool_output(text: str, tool_name: str) -> str:
if _TOOL_OUTPUT_MAX is None or not text or len(text) <= _TOOL_OUTPUT_MAX:
return text
keep = _TOOL_OUTPUT_MAX - 200
head = text[:keep]
return (
head
+ f"\n\n[... tool output truncated for fixture — original was {len(text)} chars"
+ (f" from {tool_name}" if tool_name else "")
+ "]"
)
_PATH_RE = re.compile(r"/home/teknium\b")
# No \b boundaries — some tool content stores newlines as the literal
# two-char sequence "\\n" (escaped JSON), so a "\\nTeknium..." run has a
# word char ('n') immediately before 'T' and \b fails. Substring match is
# safer here; "Teknium" as a substring of an unrelated word is
# implausible in this corpus.
_USER_RE = re.compile(r"teknium1|Teknium|teknium", re.IGNORECASE)
# Only strip scratchpads in ASSISTANT content, not tool results (might be legit)
_SCRATCH_RE = re.compile(
r"<REASONING_SCRATCHPAD>.*?</REASONING_SCRATCHPAD>\s*", re.DOTALL
)
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)
# Discord/Telegram user mention leakage in messaging-platform sessions
_USER_MENTION_RE = re.compile(r"<@\*{3}>|<@\d+>")
# Contributor emails (from git show output etc) — anything@domain.tld
# Keep noreply@github-style placeholders obvious; real personal emails get
# replaced with a contributor placeholder.
_EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
# "Author: Name <email>" git-show headers — rewrite the whole line
_GIT_AUTHOR_RE = re.compile(r"Author:\s*[^<\n]+<[^>]+>")
def _scrub_text(text: str, *, drop_scratchpads: bool = False) -> str:
"""Apply the pipeline to a raw text string.
drop_scratchpads only affects assistant messages tool outputs that
happen to contain similar markers are left alone.
"""
if not text:
return text
if drop_scratchpads:
text = _SCRATCH_RE.sub("", text)
text = _THINK_RE.sub("", text)
text = _PATH_RE.sub("/home/user", text)
text = _USER_RE.sub("user", text)
text = _USER_MENTION_RE.sub("<@user>", text)
# Rewrite git "Author: Name <email>" lines before generic email replace
text = _GIT_AUTHOR_RE.sub("Author: contributor <contributor@example.com>", text)
text = _EMAIL_RE.sub("contributor@example.com", text)
text = redact_sensitive_text(text)
return text
def _content_to_str(content: Any) -> str:
if content is None:
return ""
if isinstance(content, str):
return content
if isinstance(content, list):
parts = []
for p in content:
if isinstance(p, dict) and "text" in p:
parts.append(p["text"])
elif isinstance(p, str):
parts.append(p)
return "\n".join(parts)
return str(content)
def _scrub_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
out = []
for tc in tool_calls or []:
if not isinstance(tc, dict):
continue
fn = tc.get("function", {}) or {}
args = fn.get("arguments", "")
if isinstance(args, str):
args = _scrub_text(args)
new_tc = {
"id": tc.get("id", ""),
"type": tc.get("type", "function"),
"function": {
"name": fn.get("name", ""),
"arguments": args,
},
}
out.append(new_tc)
return out
def _scrub_message(m: Dict[str, Any], *, first_user_paraphrase: str | None, user_turn_idx: List[int]) -> Dict[str, Any] | None:
role = m.get("role")
if role in (None, "session_meta"):
return None
content = _content_to_str(m.get("content"))
if role == "assistant":
content = _scrub_text(content, drop_scratchpads=True)
elif role == "user":
# Use paraphrase for the very first user turn only
user_turn_idx[0] += 1
if user_turn_idx[0] == 1 and first_user_paraphrase is not None:
content = first_user_paraphrase
else:
content = _scrub_text(content)
else:
content = _scrub_text(content)
# Truncate large tool outputs
if role == "tool":
tn = m.get("tool_name") or m.get("name") or ""
content = _maybe_truncate_tool_output(content, tn)
new_msg: Dict[str, Any] = {"role": role, "content": content}
if role == "assistant":
tcs = m.get("tool_calls") or []
if tcs:
new_msg["tool_calls"] = _scrub_tool_calls(tcs)
if role == "tool":
if m.get("tool_call_id"):
new_msg["tool_call_id"] = m["tool_call_id"]
if m.get("tool_name") or m.get("name"):
new_msg["tool_name"] = m.get("tool_name") or m.get("name")
return new_msg
def build_fixture(
source_file: str,
output_name: str,
description: str,
first_user_paraphrase: str,
model_guess: str,
context_length: int,
truncate_at: int | None = None,
) -> Dict[str, Any]:
src = SESSION_DIR / source_file
raw_msgs: List[Dict[str, Any]] = []
with src.open() as fh:
for line in fh:
try:
raw_msgs.append(json.loads(line))
except Exception:
pass
# Skip session_meta lines up front so truncate_at counts real messages
raw_msgs = [m for m in raw_msgs if m.get("role") != "session_meta"]
if truncate_at is not None:
raw_msgs = raw_msgs[:truncate_at]
user_turn_counter = [0]
scrubbed: List[Dict[str, Any]] = []
for m in raw_msgs:
new = _scrub_message(
m,
first_user_paraphrase=first_user_paraphrase,
user_turn_idx=user_turn_counter,
)
if new is not None:
scrubbed.append(new)
# Drop empty-content assistant messages that have no tool_calls
# (artifact of scratchpad-only turns post-scrub)
pruned: List[Dict[str, Any]] = []
for m in scrubbed:
if (
m["role"] == "assistant"
and not (m.get("content") or "").strip()
and not m.get("tool_calls")
):
continue
pruned.append(m)
# Trim trailing orphan tool messages (no matching assistant)
while pruned and pruned[-1]["role"] == "tool":
pruned.pop()
scrubbed = pruned
# Inject a synthetic public-safe system message so the compressor has
# a head to anchor on. The real system prompts embed personality and
# platform-specific content we don't want checked in.
system_msg = {
"role": "system",
"content": (
"You are a helpful AI coding assistant with access to tools "
"(terminal, file editing, search, web, etc.). You operate in a "
"conversational loop: the user gives you a task, you call tools "
"to accomplish it, and you report back concisely."
),
}
if scrubbed and scrubbed[0].get("role") == "system":
scrubbed[0] = system_msg
else:
scrubbed.insert(0, system_msg)
fixture = {
"name": output_name,
"description": description,
"model": model_guess,
"context_length": context_length,
"source": f"~/.hermes/sessions/{source_file}",
"truncated_to": truncate_at,
"scrubbed_at": datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ"),
"scrubbing_passes": [
"redact_sensitive_text (agent.redact)",
"username paths replaced with /home/user",
"personal handles (all case variants of the maintainer name) replaced with 'user'",
"email addresses replaced with contributor@example.com",
"git 'Author: Name <addr>' header lines normalised",
"reasoning scratchpad blocks stripped from assistant content",
"think tag blocks stripped from assistant content",
"messaging-platform user mentions replaced with <@user>",
"first user message paraphrased to remove personal voice",
"subsequent user messages kept verbatim (after above redactions)",
"system prompt replaced with generic public-safe placeholder",
"orphan empty-assistant messages and trailing tool messages dropped",
"tool outputs preserved verbatim (truncation disabled so the compressor sees real load)",
],
"messages": scrubbed,
}
return fixture
def main() -> int:
FIXTURES_DIR.mkdir(parents=True, exist_ok=True)
for spec in SPECS:
source_file, output_name, description, paraphrase, model, ctx, truncate = spec
fixture = build_fixture(
source_file=source_file,
output_name=output_name,
description=description,
first_user_paraphrase=paraphrase,
model_guess=model,
context_length=ctx,
truncate_at=truncate,
)
out_path = FIXTURES_DIR / f"{output_name}.json"
with out_path.open("w") as fh:
json.dump(fixture, fh, indent=2, ensure_ascii=False)
size_kb = out_path.stat().st_size / 1024
print(f" {output_name}.json {size_kb:.1f} KB {len(fixture['messages'])} msgs")
return 0
if __name__ == "__main__":
sys.exit(main())
+99 -7
View File
@@ -29,10 +29,25 @@ BOLD='\033[1m'
REPO_URL_SSH="git@github.com:NousResearch/hermes-agent.git"
REPO_URL_HTTPS="https://github.com/NousResearch/hermes-agent.git"
HERMES_HOME="${HERMES_HOME:-$HOME/.hermes}"
INSTALL_DIR="${HERMES_INSTALL_DIR:-$HERMES_HOME/hermes-agent}"
# INSTALL_DIR is resolved AFTER arg parsing and OS detection so we can pick an
# FHS-style layout for root installs. Track whether the user gave us an
# explicit directory — if so we never override it.
if [ -n "${HERMES_INSTALL_DIR:-}" ]; then
INSTALL_DIR="$HERMES_INSTALL_DIR"
INSTALL_DIR_EXPLICIT=true
else
INSTALL_DIR=""
INSTALL_DIR_EXPLICIT=false
fi
PYTHON_VERSION="3.11"
NODE_VERSION="22"
# FHS-style root install layout (set by resolve_install_layout when applicable):
# code at /usr/local/lib/hermes-agent, command at /usr/local/bin/hermes,
# data still at /root/.hermes (HERMES_HOME). Matches Claude Code / Codex CLI
# and keeps Docker bind-mounted /root/ volumes lean.
ROOT_FHS_LAYOUT=false
# Options
USE_VENV=true
RUN_SETUP=true
@@ -64,6 +79,7 @@ while [[ $# -gt 0 ]]; do
;;
--dir)
INSTALL_DIR="$2"
INSTALL_DIR_EXPLICIT=true
shift 2
;;
--hermes-home)
@@ -79,9 +95,20 @@ while [[ $# -gt 0 ]]; do
echo " --no-venv Don't create virtual environment"
echo " --skip-setup Skip interactive setup wizard"
echo " --branch NAME Git branch to install (default: main)"
echo " --dir PATH Installation directory (default: ~/.hermes/hermes-agent)"
echo " --dir PATH Installation directory"
echo " default (non-root): ~/.hermes/hermes-agent"
echo " default (root, Linux): /usr/local/lib/hermes-agent"
echo " --hermes-home PATH Data directory (default: ~/.hermes, or \$HERMES_HOME)"
echo " -h, --help Show this help"
echo ""
echo "Notes:"
echo " When running as root on Linux, Hermes installs the code under"
echo " /usr/local/lib/hermes-agent and links the command into"
echo " /usr/local/bin/hermes (FHS layout — matches Claude Code / Codex CLI)."
echo " Data, config, sessions, and logs still live in \$HERMES_HOME"
echo " (default /root/.hermes). This keeps Docker bind-mounted volumes"
echo " small and ensures the command is on PATH for all shells."
echo " Existing installs at \$HERMES_HOME/hermes-agent are preserved in-place."
exit 0
;;
*)
@@ -163,9 +190,60 @@ is_termux() {
[ -n "${TERMUX_VERSION:-}" ] || [[ "${PREFIX:-}" == *"com.termux/files/usr"* ]]
}
# Decide where the repo checkout + venv live, and where the `hermes` command
# symlink goes. Called after detect_os so $OS/$DISTRO are known.
#
# Defaults:
# - Non-root, any OS: INSTALL_DIR = $HERMES_HOME/hermes-agent
# command link in $HOME/.local/bin
# - Termux (any uid): INSTALL_DIR = $HERMES_HOME/hermes-agent
# command link in $PREFIX/bin (already on PATH)
# - Root on Linux (new): INSTALL_DIR = /usr/local/lib/hermes-agent
# command link in /usr/local/bin
# (unless a legacy install already exists at
# $HERMES_HOME/hermes-agent — then preserve it)
#
# Always no-op when the user set --dir or $HERMES_INSTALL_DIR.
resolve_install_layout() {
if [ "$INSTALL_DIR_EXPLICIT" = true ]; then
log_info "Install directory: $INSTALL_DIR (explicit)"
return 0
fi
# Termux: package manager manages /data/data/..., keep code in HERMES_HOME.
if is_termux; then
INSTALL_DIR="$HERMES_HOME/hermes-agent"
return 0
fi
# Root on Linux: prefer FHS layout unless a legacy install already exists.
# macOS root installs keep the legacy layout because /usr/local/ on macOS
# is Homebrew territory and we don't want to fight that.
if [ "$OS" = "linux" ] && [ "$(id -u)" -eq 0 ]; then
if [ -d "$HERMES_HOME/hermes-agent/.git" ]; then
INSTALL_DIR="$HERMES_HOME/hermes-agent"
log_info "Existing install detected at $INSTALL_DIR — keeping legacy layout"
log_info " (new root installs use /usr/local/lib/hermes-agent)"
return 0
fi
INSTALL_DIR="/usr/local/lib/hermes-agent"
ROOT_FHS_LAYOUT=true
log_info "Root install on Linux — using FHS layout"
log_info " Code: $INSTALL_DIR"
log_info " Command: /usr/local/bin/hermes"
log_info " Data: $HERMES_HOME (unchanged)"
return 0
fi
# Default: non-root, non-Termux → legacy user-scoped layout.
INSTALL_DIR="$HERMES_HOME/hermes-agent"
}
get_command_link_dir() {
if is_termux && [ -n "${PREFIX:-}" ]; then
echo "$PREFIX/bin"
elif [ "$ROOT_FHS_LAYOUT" = true ]; then
echo "/usr/local/bin"
else
echo "$HOME/.local/bin"
fi
@@ -174,6 +252,8 @@ get_command_link_dir() {
get_command_link_display_dir() {
if is_termux && [ -n "${PREFIX:-}" ]; then
echo '$PREFIX/bin'
elif [ "$ROOT_FHS_LAYOUT" = true ]; then
echo '/usr/local/bin'
else
echo '~/.local/bin'
fi
@@ -975,6 +1055,14 @@ setup_path() {
return 0
fi
# FHS layout: /usr/local/bin is on PATH for every standard shell, nothing to inject.
if [ "$ROOT_FHS_LAYOUT" = true ]; then
export PATH="$command_link_dir:$PATH"
log_info "/usr/local/bin is already on PATH for all shells"
log_success "hermes command ready"
return 0
fi
# Check if ~/.local/bin is on PATH; if not, add it to shell config.
# Detect the user's actual login shell (not the shell running this script,
# which is always bash when piped from curl).
@@ -1339,12 +1427,12 @@ print_success() {
echo ""
# Show file locations
echo -e "${CYAN}${BOLD}📁 Your files (all in ~/.hermes/):${NC}"
echo -e "${CYAN}${BOLD}📁 Your files:${NC}"
echo ""
echo -e " ${YELLOW}Config:${NC} ~/.hermes/config.yaml"
echo -e " ${YELLOW}API Keys:${NC} ~/.hermes/.env"
echo -e " ${YELLOW}Data:${NC} ~/.hermes/cron/, sessions/, logs/"
echo -e " ${YELLOW}Code:${NC} ~/.hermes/hermes-agent/"
echo -e " ${YELLOW}Config:${NC} $HERMES_HOME/config.yaml"
echo -e " ${YELLOW}API Keys:${NC} $HERMES_HOME/.env"
echo -e " ${YELLOW}Data:${NC} $HERMES_HOME/cron/, sessions/, logs/"
echo -e " ${YELLOW}Code:${NC} $INSTALL_DIR"
echo ""
echo -e "${CYAN}─────────────────────────────────────────────────────────${NC}"
@@ -1364,6 +1452,9 @@ print_success() {
if [ "$DISTRO" = "termux" ]; then
echo -e "${YELLOW}⚡ 'hermes' was linked into $(get_command_link_display_dir), which is already on PATH in Termux.${NC}"
echo ""
elif [ "$ROOT_FHS_LAYOUT" = true ]; then
echo -e "${YELLOW}⚡ 'hermes' was linked into /usr/local/bin and is ready to use — no shell reload needed.${NC}"
echo ""
else
echo -e "${YELLOW}⚡ Reload your shell to use 'hermes' command:${NC}"
echo ""
@@ -1415,6 +1506,7 @@ main() {
print_banner
detect_os
resolve_install_layout
install_uv
check_python
check_git
+26
View File
@@ -43,11 +43,16 @@ AUTHOR_MAP = {
"teknium1@gmail.com": "teknium1",
"teknium@nousresearch.com": "teknium1",
"127238744+teknium1@users.noreply.github.com": "teknium1",
"focusflow.app.help@gmail.com": "yes999zc",
"343873859@qq.com": "DrStrangerUJN",
"uzmpsk.dilekakbas@gmail.com": "dlkakbs",
"jefferson@heimdallstrategy.com": "Mind-Dragon",
"130918800+devorun@users.noreply.github.com": "devorun",
"maks.mir@yahoo.com": "say8hi",
"web3blind@users.noreply.github.com": "web3blind",
"julia@alexland.us": "alexg0bot",
"1060770+benjaminsehl@users.noreply.github.com": "benjaminsehl",
"nerijusn76@gmail.com": "Nerijusas",
# contributors (from noreply pattern)
"david.vv@icloud.com": "davidvv",
"wangqiang@wangqiangdeMac-mini.local": "xiaoqiang243",
@@ -59,14 +64,21 @@ AUTHOR_MAP = {
"keifergu@tencent.com": "keifergu",
"kshitijk4poor@users.noreply.github.com": "kshitijk4poor",
"abner.the.foreman@agentmail.to": "Abnertheforeman",
"thomasgeorgevii09@gmail.com": "tochukwuada",
"harryykyle1@gmail.com": "hharry11",
"kshitijk4poor@gmail.com": "kshitijk4poor",
"keira.voss94@gmail.com": "keiravoss94",
"16443023+stablegenius49@users.noreply.github.com": "stablegenius49",
"fqsy1416@gmail.com": "EKKOLearnAI",
"simbamax99@gmail.com": "simbam99",
"iris@growthpillars.co": "irispillars",
"185121704+stablegenius49@users.noreply.github.com": "stablegenius49",
"101283333+batuhankocyigit@users.noreply.github.com": "batuhankocyigit",
"255305877+ismell0992-afk@users.noreply.github.com": "ismell0992-afk",
"cyprian@ironin.pl": "iRonin",
"valdi.jorge@gmail.com": "jvcl",
"q19dcp@gmail.com": "aj-nt",
"ebukau84@gmail.com": "UgwujaGeorge",
"francip@gmail.com": "francip",
"omni@comelse.com": "omnissiah-comelse",
"oussama.redcode@gmail.com": "mavrickdeveloper",
@@ -84,6 +96,8 @@ AUTHOR_MAP = {
"104278804+Sertug17@users.noreply.github.com": "Sertug17",
"112503481+caentzminger@users.noreply.github.com": "caentzminger",
"258577966+voidborne-d@users.noreply.github.com": "voidborne-d",
"liusway405@gmail.com": "voidborne-d",
"xydarcher@uestc.edu.cn": "Readon",
"sir_even@icloud.com": "sirEven",
"36056348+sirEven@users.noreply.github.com": "sirEven",
"70424851+insecurejezza@users.noreply.github.com": "insecurejezza",
@@ -106,6 +120,7 @@ AUTHOR_MAP = {
"30841158+n-WN@users.noreply.github.com": "n-WN",
"tsuijinglei@gmail.com": "hiddenpuppy",
"jerome@clawwork.ai": "HiddenPuppy",
"jerome.benoit@sap.com": "jerome-benoit",
"wysie@users.noreply.github.com": "Wysie",
"leoyuan0099@gmail.com": "keyuyuan",
"bxzt2006@163.com": "Only-Code-A",
@@ -166,6 +181,10 @@ AUTHOR_MAP = {
"jaisehgal11299@gmail.com": "jaisup",
"percydikec@gmail.com": "PercyDikec",
"noonou7@gmail.com": "HenkDz",
# Azure Foundry salvage (PRs #9029, #4599, #10086, #8766)
"tech@smartlogics.net": "TechPrototyper",
"637186+HangGlidersRule@users.noreply.github.com": "HangGlidersRule",
"pein892@gmail.com": "pein892",
"dean.kerr@gmail.com": "deankerr",
"socrates1024@gmail.com": "socrates1024",
"seanalt555@gmail.com": "Salt-555",
@@ -200,6 +219,9 @@ AUTHOR_MAP = {
"1434494126@qq.com": "5park1e",
"158153005+5park1e@users.noreply.github.com": "5park1e",
"innocarpe@gmail.com": "innocarpe",
"noreply@ked.com": "qike-ms",
"andrekurait@gmail.com": "AndreKurait",
"bsgdigital@users.noreply.github.com": "bsgdigital",
"numman.ali@gmail.com": "nummanali",
"rohithsaimidigudla@gmail.com": "whitehatjr1001",
"0xNyk@users.noreply.github.com": "0xNyk",
@@ -397,6 +419,7 @@ AUTHOR_MAP = {
"105142614+VTRiot@users.noreply.github.com": "VTRiot",
"vivien000812@gmail.com": "iamagenius00",
"89228157+Feranmi10@users.noreply.github.com": "Feranmi10",
"oluwadareferanmi11@gmail.com": "Feranmi10",
"simon@gtcl.us": "simon-gtcl",
"suzukaze.haduki@gmail.com": "houko",
"cliff@cigii.com": "cgarwood82",
@@ -490,6 +513,9 @@ AUTHOR_MAP = {
"zhangxicen@example.com": "zhangxicen",
"codex@openai.invalid": "teknium1",
"screenmachine@gmail.com": "teknium1",
"chenzeshi@live.com": "chen1749144759",
"mor.aleksandr@yahoo.com": "MorAlekss",
"ash@users.noreply.github.com": "ash",
}
@@ -281,7 +281,6 @@ Type these during an interactive chat session.
### Utility
```
/branch (/fork) Branch the current session
/btw Ephemeral side question (doesn't interrupt main task)
/fast Toggle priority/fast processing
/browser Open CDP browser connection
/history Show conversation history (CLI)
+152
View File
@@ -0,0 +1,152 @@
---
name: kanban-orchestrator
description: Decomposition playbook + specialist-roster conventions + anti-temptation rules for an orchestrator profile routing work through Kanban. The "don't do the work yourself" rule and the basic lifecycle are auto-injected into every kanban worker's system prompt; this skill is the deeper playbook when you're specifically playing the orchestrator role.
version: 2.0.0
metadata:
hermes:
tags: [kanban, multi-agent, orchestration, routing]
related_skills: [kanban-worker]
---
# Kanban Orchestrator — Decomposition Playbook
> The **core worker lifecycle** (including the `kanban_create` fan-out pattern and the "decompose, don't execute" rule) is auto-injected into every kanban process via the `KANBAN_GUIDANCE` system-prompt block. This skill is the deeper playbook when you're an orchestrator profile whose whole job is routing.
## When to use the board (vs. just doing the work)
Create Kanban tasks when any of these are true:
1. **Multiple specialists are needed.** Research + analysis + writing is three profiles.
2. **The work should survive a crash or restart.** Long-running, recurring, or important.
3. **The user might want to interject.** Human-in-the-loop at any step.
4. **Multiple subtasks can run in parallel.** Fan-out for speed.
5. **Review / iteration is expected.** A reviewer profile loops on drafter output.
6. **The audit trail matters.** Board rows persist in SQLite forever.
If *none* of those apply — it's a small one-shot reasoning task — use `delegate_task` instead or answer the user directly.
## The anti-temptation rules
Your job description says "route, don't execute." The rules that enforce that:
- **Do not execute the work yourself.** Your restricted toolset usually doesn't even include terminal/file/code/web for implementation. If you find yourself "just fixing this quickly" — stop and create a task for the right specialist.
- **For any concrete task, create a Kanban task and assign it.** Every single time.
- **If no specialist fits, ask the user which profile to create.** Do not default to doing it yourself under "close enough."
- **Decompose, route, and summarize — that's the whole job.**
## The standard specialist roster (convention)
Unless the user's setup has customized profiles, assume these exist. Adjust to whatever the user actually has — ask if you're unsure.
| Profile | Does | Typical workspace |
|---|---|---|
| `researcher` | Reads sources, gathers facts, writes findings | `scratch` |
| `analyst` | Synthesizes, ranks, de-dupes. Consumes multiple `researcher` outputs | `scratch` |
| `writer` | Drafts prose in the user's voice | `scratch` or `dir:` into their Obsidian vault |
| `reviewer` | Reads output, leaves findings, gates approval | `scratch` |
| `backend-eng` | Writes server-side code | `worktree` |
| `frontend-eng` | Writes client-side code | `worktree` |
| `ops` | Runs scripts, manages services, handles deployments | `dir:` into ops scripts repo |
| `pm` | Writes specs, acceptance criteria | `scratch` |
## Decomposition playbook
### Step 1 — Understand the goal
Ask clarifying questions if the goal is ambiguous. Cheap to ask; expensive to spawn the wrong fleet.
### Step 2 — Sketch the task graph
Before creating anything, draft the graph out loud (in your response to the user). Example for "Analyze whether we should migrate to Postgres":
```
T1 researcher research: Postgres cost vs current
T2 researcher research: Postgres performance vs current
T3 analyst synthesize migration recommendation parents: T1, T2
T4 writer draft decision memo parents: T3
```
Show this to the user. Let them correct it before you create anything.
### Step 3 — Create tasks and link
```python
t1 = kanban_create(
title="research: Postgres cost vs current",
assignee="researcher",
body="Compare estimated infrastructure costs, migration costs, and ongoing ops costs over a 3-year window. Sources: AWS/GCP pricing, team time estimates, current Postgres bills from peers.",
tenant=os.environ.get("HERMES_TENANT"),
)["task_id"]
t2 = kanban_create(
title="research: Postgres performance vs current",
assignee="researcher",
body="Compare query latency, throughput, and scaling characteristics at our expected data volume (~500GB, 10k QPS peak). Sources: benchmark papers, public case studies, pgbench results if easy.",
)["task_id"]
t3 = kanban_create(
title="synthesize migration recommendation",
assignee="analyst",
body="Read the findings from T1 (cost) and T2 (performance). Produce a 1-page recommendation with explicit trade-offs and a go/no-go call.",
parents=[t1, t2],
)["task_id"]
t4 = kanban_create(
title="draft decision memo",
assignee="writer",
body="Turn the analyst's recommendation into a 2-page memo for the CTO. Match the tone of previous decision memos in the team's knowledge base.",
parents=[t3],
)["task_id"]
```
`parents=[...]` gates promotion — children stay in `todo` until every parent reaches `done`, then auto-promote to `ready`. No manual coordination needed; the dispatcher and dependency engine handle it.
### Step 4 — Complete your own task
If you were spawned as a task yourself (e.g. `planner` profile was assigned `T0: "investigate Postgres migration"`), mark it done with a summary of what you created:
```python
kanban_complete(
summary="decomposed into T1-T4: 2 researchers parallel, 1 analyst on their outputs, 1 writer on the recommendation",
metadata={
"task_graph": {
"T1": {"assignee": "researcher", "parents": []},
"T2": {"assignee": "researcher", "parents": []},
"T3": {"assignee": "analyst", "parents": ["T1", "T2"]},
"T4": {"assignee": "writer", "parents": ["T3"]},
},
},
)
```
### Step 5 — Report back to the user
Tell them what you created in plain prose:
> I've queued 4 tasks:
> - **T1** (researcher): cost comparison
> - **T2** (researcher): performance comparison, in parallel with T1
> - **T3** (analyst): synthesizes T1 + T2 into a recommendation
> - **T4** (writer): turns T3 into a CTO memo
>
> The dispatcher will pick up T1 and T2 now. T3 starts when both finish. You'll get a gateway ping when T4 completes. Use the dashboard or `hermes kanban tail <id>` to follow along.
## Common patterns
**Fan-out + fan-in (research → synthesize):** N `researcher` tasks with no parents, one `analyst` task with all of them as parents.
**Pipeline with gates:** `pm → backend-eng → reviewer`. Each stage's `parents=[previous_task]`. Reviewer blocks or completes; if reviewer blocks, the operator unblocks with feedback and respawns.
**Same-profile queue:** 50 tasks, all assigned to `translator`, no dependencies between them. Dispatcher serializes — translator processes them in priority order, accumulating experience in their own memory.
**Human-in-the-loop:** Any task can `kanban_block()` to wait for input. Dispatcher respawns after `/unblock`. The comment thread carries the full context.
## Pitfalls
**Reassignment vs. new task.** If a reviewer blocks with "needs changes," create a NEW task linked from the reviewer's task — don't re-run the same task with a stern look. The new task is assigned to the original implementer profile.
**Argument order for links.** `kanban_link(parent_id=..., child_id=...)` — parent first. Mixing them up demotes the wrong task to `todo`.
**Don't pre-create the whole graph if the shape depends on intermediate findings.** If T3's structure depends on what T1 and T2 find, let T3 exist as a "synthesize findings" task whose own first step is to read parent handoffs and plan the rest. Orchestrators can spawn orchestrators.
**Tenant inheritance.** If `HERMES_TENANT` is set in your env, pass `tenant=os.environ.get("HERMES_TENANT")` on every `kanban_create` call so child tasks stay in the same namespace.
+134
View File
@@ -0,0 +1,134 @@
---
name: kanban-worker
description: Pitfalls, examples, and edge cases for Hermes Kanban workers. The lifecycle itself is auto-injected into every worker's system prompt as KANBAN_GUIDANCE (from agent/prompt_builder.py); this skill is what you load when you want deeper detail on specific scenarios.
version: 2.0.0
metadata:
hermes:
tags: [kanban, multi-agent, collaboration, workflow, pitfalls]
related_skills: [kanban-orchestrator]
---
# Kanban Worker — Pitfalls and Examples
> You're seeing this skill because the Hermes Kanban dispatcher spawned you as a worker with `--skills kanban-worker` — it's loaded automatically for every dispatched worker. The **lifecycle** (6 steps: orient → work → heartbeat → block/complete) also lives in the `KANBAN_GUIDANCE` block that's auto-injected into your system prompt. This skill is the deeper detail: good handoff shapes, retry diagnostics, edge cases.
## Workspace handling
Your workspace kind determines how you should behave inside `$HERMES_KANBAN_WORKSPACE`:
| Kind | What it is | How to work |
|---|---|---|
| `scratch` | Fresh tmp dir, yours alone | Read/write freely; it gets GC'd when the task is archived. |
| `dir:<path>` | Shared persistent directory | Other runs will read what you write. Treat it like long-lived state. Path is guaranteed absolute (the kernel rejects relative paths). |
| `worktree` | Git worktree at the resolved path | If `.git` doesn't exist, run `git worktree add <path> <branch>` from the main repo first, then cd and work normally. Commit work here. |
## Tenant isolation
If `$HERMES_TENANT` is set, the task belongs to a tenant namespace. When reading or writing persistent memory, prefix memory entries with the tenant so context doesn't leak across tenants:
- Good: `business-a: Acme is our biggest customer`
- Bad (leaks): `Acme is our biggest customer`
## Good summary + metadata shapes
The `kanban_complete(summary=..., metadata=...)` handoff is how downstream workers read what you did. Patterns that work:
**Coding task:**
```python
kanban_complete(
summary="shipped rate limiter — token bucket, keys on user_id with IP fallback, 14 tests pass",
metadata={
"changed_files": ["rate_limiter.py", "tests/test_rate_limiter.py"],
"tests_run": 14,
"tests_passed": 14,
"decisions": ["user_id primary, IP fallback for unauthenticated requests"],
},
)
```
**Research task:**
```python
kanban_complete(
summary="3 competing libraries reviewed; vLLM wins on throughput, SGLang on latency, Tensorrt-LLM on memory efficiency",
metadata={
"sources_read": 12,
"recommendation": "vLLM",
"benchmarks": {"vllm": 1.0, "sglang": 0.87, "trtllm": 0.72},
},
)
```
**Review task:**
```python
kanban_complete(
summary="reviewed PR #123; 2 blocking issues found (SQL injection in /search, missing CSRF on /settings)",
metadata={
"pr_number": 123,
"findings": [
{"severity": "critical", "file": "api/search.py", "line": 42, "issue": "raw SQL concat"},
{"severity": "high", "file": "api/settings.py", "issue": "missing CSRF middleware"},
],
"approved": False,
},
)
```
Shape `metadata` so downstream parsers (reviewers, aggregators, schedulers) can use it without re-reading your prose.
## Block reasons that get answered fast
Bad: `"stuck"` — the human has no context.
Good: one sentence naming the specific decision you need. Leave longer context as a comment instead.
```python
kanban_comment(
task_id=os.environ["HERMES_KANBAN_TASK"],
body="Full context: I have user IPs from Cloudflare headers but some users are behind NATs with thousands of peers. Keying on IP alone causes false positives.",
)
kanban_block(reason="Rate limit key choice: IP (simple, NAT-unsafe) or user_id (requires auth, skips anonymous endpoints)?")
```
The block message is what appears in the dashboard / gateway notifier. The comment is the deeper context a human reads when they open the task.
## Heartbeats worth sending
Good heartbeats name progress: `"epoch 12/50, loss 0.31"`, `"scanned 1.2M/2.4M rows"`, `"uploaded 47/120 videos"`.
Bad heartbeats: `"still working"`, empty notes, sub-second intervals. Every few minutes max; skip entirely for tasks under ~2 minutes.
## Retry scenarios
If you open the task and `kanban_show` returns `runs: [...]` with one or more closed runs, you're a retry. The prior runs' `outcome` / `summary` / `error` tell you what didn't work. Don't repeat that path. Typical retry diagnostics:
- `outcome: "timed_out"` — the previous attempt hit `max_runtime_seconds`. You may need to chunk the work or shorten it.
- `outcome: "crashed"` — OOM or segfault. Reduce memory footprint.
- `outcome: "spawn_failed"` + `error: "..."` — usually a profile config issue (missing credential, bad PATH). Ask the human via `kanban_block` instead of retrying blindly.
- `outcome: "reclaimed"` + `summary: "task archived..."` — operator archived the task out from under the previous run; you probably shouldn't be running at all, check status carefully.
- `outcome: "blocked"` — a previous attempt blocked; the unblock comment should be in the thread by now.
## Do NOT
- Call `delegate_task` as a substitute for `kanban_create`. `delegate_task` is for short reasoning subtasks inside YOUR run; `kanban_create` is for cross-agent handoffs that outlive one API loop.
- Modify files outside `$HERMES_KANBAN_WORKSPACE` unless the task body says to.
- Create follow-up tasks assigned to yourself — assign to the right specialist.
- Complete a task you didn't actually finish. Block it instead.
## Pitfalls
**Task state can change between dispatch and your startup.** Between when the dispatcher claimed and when your process actually booted, the task may have been blocked, reassigned, or archived. Always `kanban_show` first. If it reports `blocked` or `archived`, stop — you shouldn't be running.
**Workspace may have stale artifacts.** Especially `dir:` and `worktree` workspaces can have files from previous runs. Read the comment thread — it usually explains why you're running again and what state the workspace is in.
**Don't rely on the CLI when the guidance is available.** The `kanban_*` tools work across all terminal backends (Docker, Modal, SSH). `hermes kanban <verb>` from your terminal tool will fail in containerized backends because the CLI isn't installed there. When in doubt, use the tool.
## CLI fallback (for scripting)
Every tool has a CLI equivalent for human operators and scripts:
- `kanban_show``hermes kanban show <id> --json`
- `kanban_complete``hermes kanban complete <id> --summary "..." --metadata '{...}'`
- `kanban_block``hermes kanban block <id> "reason"`
- `kanban_create``hermes kanban create "title" --assignee <profile> [--parent <id>]`
- etc.
Use the tools from inside an agent; the CLI exists for the human at the terminal.
@@ -17,6 +17,13 @@ Remove refusal behaviors (guardrails) from open-weight LLMs without retraining o
**License warning:** OBLITERATUS is AGPL-3.0. NEVER import it as a Python library. Always invoke via CLI (`obliteratus` command) or subprocess. This keeps Hermes Agent's MIT license clean.
## Video Guide
Walkthrough of OBLITERATUS used by a Hermes agent to abliterate Gemma:
https://www.youtube.com/watch?v=8fG9BrNTeHs ("OBLITERATUS: An AI Agent Removed Gemma 4's Safety Guardrails")
Useful when the user wants a visual overview of the end-to-end workflow before running it themselves.
## When to Use This Skill
Trigger when the user:
@@ -134,6 +134,7 @@ masks = processor.image_processor.post_process_masks(
### Model architecture
<!-- ascii-guard-ignore -->
```
SAM Architecture:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
@@ -144,6 +145,7 @@ SAM Architecture:
Image Embeddings Prompt Embeddings Masks + IoU
(computed once) (per prompt) predictions
```
<!-- ascii-guard-ignore-end -->
### Model variants
@@ -0,0 +1,42 @@
"""Resolve HERMES_HOME for standalone skill scripts.
Skill scripts may run outside the Hermes process (e.g. system Python,
nix env, CI) where ``hermes_constants`` is not importable. This module
provides the same ``get_hermes_home()`` and ``display_hermes_home()``
contracts as ``hermes_constants`` without requiring it on ``sys.path``.
When ``hermes_constants`` IS available it is used directly so that any
future enhancements (profile resolution, Docker detection, etc.) are
picked up automatically. The fallback path replicates the core logic
from ``hermes_constants.py`` using only the stdlib.
All scripts under ``google-workspace/scripts/`` should import from here
instead of duplicating the ``HERMES_HOME = Path(os.getenv(...))`` pattern.
"""
from __future__ import annotations
import os
from pathlib import Path
try:
from hermes_constants import display_hermes_home as display_hermes_home
from hermes_constants import get_hermes_home as get_hermes_home
except (ModuleNotFoundError, ImportError):
def get_hermes_home() -> Path:
"""Return the Hermes home directory (default: ~/.hermes).
Mirrors ``hermes_constants.get_hermes_home()``."""
val = os.environ.get("HERMES_HOME", "").strip()
return Path(val) if val else Path.home() / ".hermes"
def display_hermes_home() -> str:
"""Return a user-friendly ``~/``-shortened display string.
Mirrors ``hermes_constants.display_hermes_home()``."""
home = get_hermes_home()
try:
return "~/" + str(home.relative_to(Path.home()))
except ValueError:
return str(home)
@@ -31,7 +31,14 @@ from datetime import datetime, timedelta, timezone
from email.mime.text import MIMEText
from pathlib import Path
HERMES_HOME = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
# Ensure sibling modules (_hermes_home) are importable when run standalone.
_SCRIPTS_DIR = str(Path(__file__).resolve().parent)
if _SCRIPTS_DIR not in sys.path:
sys.path.insert(0, _SCRIPTS_DIR)
from _hermes_home import get_hermes_home
HERMES_HOME = get_hermes_home()
TOKEN_PATH = HERMES_HOME / "google_token.json"
CLIENT_SECRET_PATH = HERMES_HOME / "google_client_secret.json"
@@ -10,9 +10,12 @@ import sys
from datetime import datetime, timezone
from pathlib import Path
# Ensure sibling modules (_hermes_home) are importable when run standalone.
_SCRIPTS_DIR = str(Path(__file__).resolve().parent)
if _SCRIPTS_DIR not in sys.path:
sys.path.insert(0, _SCRIPTS_DIR)
def get_hermes_home() -> Path:
return Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
from _hermes_home import get_hermes_home
def get_token_path() -> Path:
@@ -21,6 +21,8 @@ Agent workflow:
6. Run --check to verify. Done.
"""
from __future__ import annotations # allow PEP 604 `X | None` on Python 3.9+
import argparse
import json
import os
@@ -28,13 +30,12 @@ import subprocess
import sys
from pathlib import Path
try:
from hermes_constants import display_hermes_home, get_hermes_home
except ModuleNotFoundError:
HERMES_AGENT_ROOT = Path(__file__).resolve().parents[4]
if HERMES_AGENT_ROOT.exists():
sys.path.insert(0, str(HERMES_AGENT_ROOT))
from hermes_constants import display_hermes_home, get_hermes_home
# Ensure sibling modules (_hermes_home) are importable when run standalone.
_SCRIPTS_DIR = str(Path(__file__).resolve().parent)
if _SCRIPTS_DIR not in sys.path:
sys.path.insert(0, _SCRIPTS_DIR)
from _hermes_home import display_hermes_home, get_hermes_home
HERMES_HOME = get_hermes_home()
TOKEN_PATH = HERMES_HOME / "google_token.json"
@@ -111,7 +112,11 @@ def install_deps():
return True
except subprocess.CalledProcessError as e:
print(f"ERROR: Failed to install dependencies: {e}")
print(f"Try manually: {sys.executable} -m pip install {' '.join(REQUIRED_PACKAGES)}")
print(
"On environments without pip (e.g. Nix), install the optional extra instead:"
)
print(" pip install 'hermes-agent[google]'")
print(f"Or manually: {sys.executable} -m pip install {' '.join(REQUIRED_PACKAGES)}")
return False

Some files were not shown because too many files have changed in this diff Show More