Compare commits

..

22 Commits

Author SHA1 Message Date
emozilla 854206e59e fix(plugins): register dynamically-loaded modules in sys.modules before exec
Dashboard plugin API routes (web_server._mount_plugin_api_routes) and
gateway event hooks (gateway.hooks.HookRegistry.discover_and_load) both
loaded Python files via importlib.util.spec_from_file_location +
exec_module without registering the resulting module in sys.modules.

That breaks any plugin or hook handler that uses `from __future__ import
annotations` together with a Pydantic BaseModel / dataclass / anything
that introspects `__module__`: at first request Pydantic tries to
resolve string-form type hints against the defining module's namespace,
can't find it by name, and raises:

  PydanticUserError: TypeAdapter[...] is not fully defined;
  you should define ... and all referenced types,
  then call `.rebuild()` on the instance.

This is what broke the kanban dashboard's 'triage' button — POST
/api/plugins/kanban/tasks validated against CreateTaskBody (a Pydantic
model in a file using `from __future__ import annotations`) and
returned 500 on every click.

The fix, applied symmetrically to both loaders:

  1. Compute module_name once.
  2. Register the module in sys.modules BEFORE exec_module.
  3. On exec_module failure, pop the half-initialized stub so subsequent
     reloads don't pick up broken state.

GETs were unaffected because they don't build a body TypeAdapter, which
is why this only surfaced when users started POSTing.
2026-04-28 13:24:29 -04:00
teknium1 dd83173621 feat(kanban): per-task force-loaded skills
Tasks can now pin extra skills to load into their worker alongside the
built-in kanban-worker. Use cases: translation tasks that need a
translation skill, review tasks that need github-code-review, security
audits that need security-pr-audit — without editing the assignee's
profile config.

Changes:
- tasks.skills column (JSON array), idempotent migration, Task.skills
  dataclass field.
- create_task(skills=[...]) normalises (strip/dedupe), rejects commas
  in a single name.
- _default_spawn emits one `--skills X` pair per task skill, in
  addition to the built-in `--skills kanban-worker`. Deduped against
  the built-in.
- CLI: `hermes kanban create --skill <name>` (repeatable).
- Tool: `kanban_create(skills=[...])` (accepts list or single string).
- Dashboard: POST /tasks accepts `skills`; inline create form has a
  comma-separated skills input; drawer shows a Skills row.
- `hermes kanban show` prints a skills row when present.
- Docs: kanban.md has a new 'Pinning extra skills to a specific task'
  section; CLI reference shows `--skill`.
- Tests: 16 new (kernel round-trip, dedupe, comma-rejection, dispatcher
  argv with multiple skills, built-in dedupe, CLI flag repeatable, CLI
  no-flag stays None, show renders skills, idempotent migration on
  legacy DB, tool list/string/non-list, dashboard REST round-trip,
  dashboard default empty list).
2026-04-28 07:50:36 -07:00
teknium1 c65c1ddf21 feat(kanban): dispatcher auto-loads kanban-worker skill on every spawn
`_default_spawn` now passes `--skills kanban-worker` in the child's
argv, so every dispatched worker gets the skill loaded automatically
regardless of the profile's default skill config.

Why
  The system prompt carries the MANDATORY lifecycle (via
  KANBAN_GUIDANCE). The skill carries the deeper reference material:
  good summary/metadata patterns, retry diagnostics, block-reason
  examples, workspace handling, CLI fallback. Both are useful;
  requiring users to wire the skill into skills config per-profile
  is a footgun — they'd hit tasks where workers use the lifecycle
  correctly but not the patterns, producing weaker handoffs.

  Auto-loading makes kanban-worker the baseline. Profiles can still
  add more skills via the normal config; --skills is additive.

Test
  New test_default_spawn_auto_loads_kanban_worker_skill intercepts
  subprocess.Popen to capture the argv without actually spawning a
  hermes subprocess. Asserts '--skills kanban-worker' appears in the
  cmd and the env still carries HERMES_KANBAN_TASK + HERMES_PROFILE.

Skill note updated
  Worker skill's opening note changed from "you don't need to load
  this" to "you're seeing this because the dispatcher loaded it for
  you" — reflects the new always-loaded reality.

218/218 kanban + tools suite green.
2026-04-28 07:12:34 -07:00
teknium1 1970bcf5a5 feat(kanban): system-prompt guidance block + reshape skills to reference-only
Moves the kanban worker lifecycle out of the skills and into the system
prompt as a conditionally-injected guidance block, matching the
existing MEMORY_GUIDANCE / SESSION_SEARCH_GUIDANCE / SKILLS_GUIDANCE
pattern in `agent/prompt_builder.py`. Workers now know the full
lifecycle even if no skill was loaded; skills become the deeper
playbook for edge cases.

Why
  Previously the 6-step lifecycle (orient → work → heartbeat → block/
  complete → fan-out) only reached the model if the kanban-worker
  skill was loaded. That's fragile: users configure skills-per-
  platform, profiles can forget to include them, skill loading order
  matters. The lifecycle is load-bearing for correct board behavior
  — it belongs in the prompt path.

What
  - New `KANBAN_GUIDANCE` constant in agent/prompt_builder.py (~3 KB):
    identity framing ("You are a Kanban worker"), 6-step lifecycle
    with exact tool-call shapes (`kanban_show()`, `kanban_complete(
    summary=..., metadata=...)`, `kanban_block(reason=...)`, etc.),
    orchestrator mode note, and explicit DO NOTs including "don't
    shell out to `hermes kanban <verb>`".
  - Wired into `AIAgent._build_system_prompt` gated on `kanban_show
    in self.valid_tool_names`. Because `kanban_show`'s `check_fn`
    gates on `HERMES_KANBAN_TASK` env var, this indirection
    guarantees guidance appears iff a worker was dispatched. Zero
    cost to normal sessions.

Gating verified live
  Normal `hermes chat` session: 2329-char prompt, zero kanban content.
  Worker session (HERMES_KANBAN_TASK set):  5272-char prompt, +2943
  chars of KANBAN_GUIDANCE present. Regression test asserts the
  header phrase and each tool name.

Skills reshape (both from ~6 KB lifecycles to ~7 KB references)
  - `skills/devops/kanban-worker/SKILL.md`: drops the lifecycle
    (now in the guidance block), keeps good-summary patterns,
    block-reason examples, retry diagnostics, pitfalls, CLI
    fallback cheatsheet. Adds explicit note that you don't need to
    load the skill to work a task anymore.
  - `skills/devops/kanban-orchestrator/SKILL.md`: drops the
    "don't execute" rule (now in guidance), keeps the full
    decomposition playbook, specialist roster with typical
    workspaces, a concrete Postgres-migration example with
    `kanban_create` + `parents=[...]` linking, and pitfalls
    (reassignment vs new task, link argument order, tenant
    inheritance).

Tests (+3)
  - `test_kanban_guidance_not_in_normal_prompt`: verifies a regular
    AIAgent with no env var produces a prompt that contains none of
    the kanban lifecycle phrases.
  - `test_kanban_guidance_in_worker_prompt`: spawns an AIAgent with
    HERMES_KANBAN_TASK set, verifies the header + all 4 tool-call
    examples + anti-shell guidance appear.
  - `test_kanban_guidance_prompt_size_bounded`: sanity check that
    the guidance is 1.5-4 KB so it doesn't balloon the cached prompt.

217/217 kanban + tools suite green; 303/303 run_agent suite green.
The guidance block sits alongside the other tool-gated blocks, so
prompt caching remains intact — the system prompt is stable across
every turn of a kanban worker's run.
2026-04-28 05:29:31 -07:00
teknium1 832ecde4b0 feat(kanban): structured tool surface for worker + orchestrator agents
Seven new tools in `tools/kanban_tools.py` that give kanban workers a
backend-portable, schema-filtered way to interact with the board from
inside their own Python process — no shelling out to `hermes kanban`.

Motivation
  The CLI path (`hermes kanban complete \$TASK --summary ...`) breaks
  on any remote terminal backend (Docker, Modal, Singularity, SSH).
  The terminal tool runs `hermes kanban` inside the container, where
  `hermes` isn't installed and `~/.hermes/kanban.db` isn't mounted.
  Tools run in the agent's own Python process, so they always reach
  the board regardless of backend. Also skips shell-quoting fragility
  on --metadata JSON and gives structured error returns the model can
  reason about.

The seven tools
  kanban_show        read current task (defaults to HERMES_KANBAN_TASK)
  kanban_complete    structured handoff: summary + metadata
  kanban_block       ask for human input
  kanban_heartbeat   signal liveness during long operations
  kanban_comment     append to task thread
  kanban_create      fan out into child tasks (orchestrator path)
  kanban_link        add parent→child dependency after the fact

Gating
  Each tool's check_fn returns True iff HERMES_KANBAN_TASK is set in
  the process env. The dispatcher sets it when spawning a worker;
  normal `hermes chat` sessions never have it. Empirically verified:
  a baseline hermes-cli schema is 27 tools; with HERMES_KANBAN_TASK
  set it grows to exactly 34 (+7). Zero leak into normal sessions.

Also set HERMES_PROFILE in the spawn env so the kanban_comment tool's
author default works cleanly (it's what the tool reads to attribute
comments).

Skill updates
  - `skills/devops/kanban-worker/SKILL.md`: lifecycle rewritten to use
    kanban_show / kanban_heartbeat / kanban_block / kanban_complete /
    kanban_comment / kanban_create directly. CLI fallback section
    added for human operators / scripts.
  - `skills/devops/kanban-orchestrator/SKILL.md`: all examples ported
    from CLI to tool form; top-banner note explaining tools are the
    primary surface. kanban_create / kanban_link throughout.

Docs
  `website/docs/user-guide/features/kanban.md`:
  new "How workers interact with the board" section explaining the
  tool surface, gating mechanism, and why tools vs CLI. The worker
  skill / orchestrator skill subsections are now nested under it.

Tests (+25 in tests/tools/test_kanban_tools.py)
  - Schema gating: kanban_tools_hidden_without_env_var,
    kanban_tools_visible_with_env_var.
  - Happy paths: show (default + explicit task_id), complete (with
    summary+metadata, with result only), block, heartbeat (with and
    without note), comment (default + custom author), create (with
    list parents, with string parent), link.
  - Error paths: complete rejects no-handoff and non-dict metadata,
    block rejects empty reason, comment rejects empty body, create
    rejects no title / no assignee / non-list parents, link rejects
    self-reference / missing args / cycles.
  - End-to-end: full worker lifecycle driven entirely through the
    tools, verified against DB state.

214/214 kanban suite pass under scripts/run_tests.sh.
2026-04-28 04:30:22 -07:00
Teknium be184aa5fa fix(kanban): close the two v2-flagged issues in v1
Both items the atypical-scenarios pass flagged as "v2 follow-up"
actually belong in v1. Fixed now.

Fix 1: workspace path traversal
  resolve_workspace now rejects non-absolute paths for all three
  workspace_kinds (scratch-with-explicit-path, dir:, worktree). A
  relative path like '../../../tmp/attacker' was being silently
  resolved against the dispatcher's CWD — a confused-deputy escape.
  Error message points users at the absolute-path requirement.
  Storage remains verbatim (kernel doesn't rewrite user input);
  the refusal happens at resolution time, so the dispatcher's
  existing spawn-failure circuit breaker correctly categorizes it.

  Threat model documented in website/docs/user-guide/features/kanban.md:
  single-host, trusted-local-user. The absolute-path rule prevents
  ambiguity-driven escape, not malicious access — kanban runs as you,
  with your uid, on your filesystem.

Fix 2: build_worker_context unbounded
  Added per-section caps so worker prompts stay bounded on pathological
  boards:
    _CTX_MAX_PRIOR_ATTEMPTS = 10   most-recent N runs shown; older
                                   collapsed into "N earlier attempts
                                   omitted" marker. Attempt numbering
                                   preserved (shows "Attempt 16" not
                                   renumbered).
    _CTX_MAX_COMMENTS       = 30   same pattern for comments.
    _CTX_MAX_FIELD_BYTES    = 4 KB per summary / error / metadata / result.
    _CTX_MAX_BODY_BYTES     = 8 KB per task.body (opening post).
    _CTX_MAX_COMMENT_BYTES  = 2 KB per comment.
  Truncation uses a visible ellipsis + char-count so the worker knows
  it's been truncated.

  Effect on atypical-scenario runs:
    huge_run_count_on_one_task (1000 runs):  63 KB  →    820 chars
    comment_storm       (1000 comments):     50 KB  →  1,671 chars

Tests (+6 in main suite)
  test_resolve_workspace_rejects_relative_dir_path — relative dir:
    path stored verbatim but refused at resolve.
  test_resolve_workspace_accepts_absolute_dir_path — legitimate
    absolute paths are created and returned.
  test_resolve_workspace_rejects_relative_worktree_path — same guard
    for worktree kind.
  test_build_worker_context_caps_prior_attempts — 25 runs → exactly
    _CTX_MAX_PRIOR_ATTEMPTS shown, omitted marker present,
    attempt numbering preserves original index.
  test_build_worker_context_caps_comments — 100 comments → 30 shown,
    70 in the omitted marker.
  test_build_worker_context_caps_huge_summary — 1 MB summary on a
    prior run → context under 10 KB total, truncation marker visible.

189/189 kanban suite pass. Atypical-scenarios stress script still
passes all 28 scenarios with the new caps in effect.
2026-04-28 01:20:08 -07:00
Teknium 63b7b6d5bd test(kanban): atypical-scenario stress suite + clock-skew elapsed clamp fix
Added tests/stress/test_atypical_scenarios.py — 28 scenarios covering
atypical user inputs and environments that the normal tests assume
away. Surfaced one real UI bug in the process.

Bug found and fixed
  - CLI display of negative elapsed time when NTP jumps backward
    between claim_task and complete_task. The kernel faithfully stores
    started_at > ended_at; both CLI display sites (_cmd_show's Runs
    section at line 685, _cmd_runs's table at line 1153) now clamp
    elapsed to max(0, end - start), matching the dashboard JS which
    already had the same Math.max(0, ...) clamp. Regression test
    test_cli_show_clamps_negative_elapsed forces a future started_at
    via raw UPDATE and verifies neither show nor runs prints a
    `-<digits>s` token.

Atypical scenarios covered (all passing, 28 total)
  Data:
    - unicode_and_emoji: CJK, RTL (Hebrew/Arabic), ZWJ emoji sequences,
      control chars, null bytes in titles + metadata
    - huge_strings: 1 MB body + 1 MB summary + 50-level nested metadata
    - sql_injection_attempts: 6 classic payloads, parameterized queries
      hold across every string field
    - newlines_in_summary: full preserved on run, first line in event
      payload for notifier brevity
    - malformed_metadata_via_cli: 4 bad JSON values each cleanly
      rejected with stderr error, no partial task mutation
    - empty_string_fields: empty title rejected, whitespace-only title
      rejected, empty body accepted
    - tenant_with_newlines: multiline tenant strings survive
      board_stats

  Dependency graphs:
    - dependency_cycle: A→B→A refused
    - self_parent: cannot depend on itself
    - diamond_dependency: child promotes only when both parents done
    - wide_fan_out: 500 children promoted in 4ms via complete_task's
      internal recompute_ready
    - wide_fan_in: 500 parents → 1 child, promotion gated correctly
    - parent_in_different_status_states: child stays todo for parent
      in ready/running/blocked/triage/archived; only 'done' unblocks

  Workspace:
    - workspace_path_traversal: dir: workspaces are intentionally
      arbitrary paths (documented threat model)
    - workspace_nonexistent_path: spawn_failure counter increments,
      task returns to ready

  Clock:
    - clock_skew_start_greater_than_end: kernel stores faithfully,
      CLI now clamps at display (see Bug above)

  Filesystem:
    - hermes_home_with_spaces: works
    - hermes_home_with_unicode: works
    - hermes_home_via_symlink: two symlinks to same dir share DB
      (Path.resolve() in _INITIALIZED_PATHS)

  Scale extremes:
    - huge_run_count_on_one_task: 1000 runs → list_runs in <1ms,
      build_worker_context 3ms/63KB (flagged: unbounded on
      retry-heavy tasks; v2 cap candidate)
    - hundred_tenants: 5000 tasks / 100 tenants → stats 1ms, list 26ms
    - comment_storm: 1000 comments → 50KB worker context (same
      unbounded-context flag)

  Lifecycle:
    - completed_task_reclaim_attempt: done tasks can't be
      re-claimed/re-completed/blocked
    - archived_task_resurrection_attempt: archived tasks invisible to
      default list + all ops refused
    - unassigned_task_never_claims: dispatcher skips, task untouched

  Assignees:
    - assignee_with_special_chars: @-signs, dots, CJK, emoji, 200-char
      names, empty strings all handled

  Concurrency:
    - idempotency_key_race: two concurrent processes calling
      create_task with same key both get back the SAME task id,
      exactly one row in DB

  Dashboard:
    - dashboard_rest_with_weird_inputs: empty/huge/unicode titles,
      unknown fields, type mismatches handled correctly (200 for
      valid, 400/422 for invalid)

Observations flagged for v2 (not fixed, not blocking)
  - build_worker_context is unbounded on retry-heavy tasks (1000 runs
    → 63KB) and comment-heavy tasks (1000 comments → 50KB). A
    `--max-prior-attempts` / `--max-comments` cap would be appropriate.
  - `dir:` workspace paths are intentionally arbitrary; docs should
    note the threat model (trusted local user; path is stored
    verbatim, not sandboxed).

183/183 main kanban suite pass. Stress suite still opted out by
default; enable with `pytest --run-stress` or run scripts directly.
2026-04-28 01:13:17 -07:00
Teknium 123f8d0fed feat(kanban): battle-test suite + 2 real bugs it found
Added tests/stress/ — an opt-in suite that stresses the kernel with
real concurrency, real subprocesses, random property fuzzing, and scale
benchmarks. Exposed two real bugs the 4 audit passes missed.

Bugs found and fixed
  - _pid_alive returned True for zombie processes. os.kill(pid, 0)
    succeeds against zombies (process table entry exists until parent
    reaps), so a worker that exited normally but wasn't yet reaped
    would look alive to detect_crashed_workers indefinitely. Fixed by
    peeking at /proc/<pid>/status on Linux and treating State: Z as
    dead. No-op on other POSIX; Windows unchanged.
  - Task ID generator used 2 hex bytes (65k space). By birthday
    paradox, ~5% collision probability at 1k tasks, ~50% at 10k.
    Would raise UNIQUE constraint errors on large boards without a
    retry path. Bumped to 4 hex bytes (4.3B space). Surfaced by
    benchmarks trying to seed 10k tasks.

Stress suite (tests/stress/, opt-in via --run-stress)
  - test_concurrency.py — 5 processes race for 100 tasks. 1
    lost-claim race observed, 0 double-claims, 0 orphan runs.
  - test_concurrency_mixed.py — 10 workers + 1 reclaimer, 500
    tasks, random ops (claim/complete/block/unblock/archive).
    1518 events, 43 lost-claim races, zero invariant violations.
  - test_concurrency_reclaim_race.py — TTL < work duration so the
    reclaimer intentionally yanks tasks mid-work. 82 claims, 44
    reclaimed, 50 completed, 32 complete_refused (CAS correctly
    blocked late completes on reclaimed tasks).
  - test_subprocess_e2e.py — dispatcher spawns real python
    subprocess workers that heartbeat + complete via the CLI.
    Also exercises crash detection against a real dead PID via
    double-fork.
  - test_property_fuzzing.py — 500 randomized sequences, ~40k
    operations, 9 invariant checks after each step. Zero
    violations.
  - test_benchmarks.py — latency at 100/1k/10k tasks. Key numbers:
    dispatch_once @ 10k = 4ms, recompute_ready @ 10k = 47ms,
    list_tasks @ 10k = 50ms, build_worker_context w/ 50 parents
    = 1ms. Erosika's 10k starvation flag is pessimistic by roughly
    an order of magnitude.

Regression tests added to the main suite (+2)
  - test_pid_alive_detects_zombie: /proc check against a real
    zombified process (Linux-only, skipped elsewhere).
  - test_task_ids_dont_collide_at_scale: 500 creates, verify
    uniqueness + format.

New harness
  - tests/stress/conftest.py skips the suite by default; opt in with
    pytest --run-stress. Scripts are also __main__-runnable directly
    since they were developed as standalone stress tests.
  - tests/stress/_fake_worker.py: minimal Python worker that
    exercises the real subprocess contract (reads HERMES_KANBAN_TASK,
    heartbeats, completes via CLI).
  - tests/stress/README.md explains opt-in usage.

Test count: 182/182 kanban suite still pass under scripts/run_tests.sh;
stress suite is additive and out-of-band.
2026-04-27 21:37:34 -07:00
Teknium a24c6e191f fix(kanban): address @erosika's pre-merge review (issue #16102)
Six concrete bugs + two cheap v2 extensions from the review at
https://github.com/NousResearch/hermes-agent/issues/16102#issuecomment-4331125835
Larger items (structured comments as session substrate, taxonomy
reorg) deferred to v2 with reply posted on the issue.

Pre-merge bug fixes
  - unblock_task: close any stale current_run_id pointer with a
    reclaimed run inside the unblock txn. Defensive; the invariant
    holds under current data paths (block_task already closes the
    run) but a future or external write that leaves the pointer
    dangling would otherwise persist across the ready->blocked->
    ready cycle. Mirrors the same pattern in claim_task +
    archive_task.
  - Migration backfill: wrap the in-flight backfill loop in
    write_txn and add a CAS guard (`current_run_id IS NULL`) on
    the pointer UPDATE, with a cleanup path that marks any orphan
    run row reclaimed if the CAS fails. Prevents races against a
    concurrent dispatcher between SELECT and INSERT.
  - Notifier sub leak on non-done terminals: unsub on the last
    delivered event's kind being terminal (completed / blocked /
    gave_up / crashed / timed_out), not just on task.status in
    (done, archived). blocked / gave_up / crashed / timed_out used
    to fire one ping then strand the subscription row forever.
  - Notifier thrashes dead chats: per-subscription send-failure
    counter keyed on (task_id, platform, chat_id, thread_id).
    After 3 consecutive adapter.send exceptions, drop the sub
    automatically. Counter resets on any successful send.

Daemon ops visibility
  - run_daemon on_tick now tracks consecutive ticks where the
    ready queue is non-empty but 0 spawns succeeded. After 6 such
    ticks (default ~30s at interval=5), emits a WARN line to
    stderr pointing at profile health (venv, PATH, credentials)
    and `hermes kanban list --status blocked`. Rate-limited to
    one message per 5 minutes so a persistent outage doesn't
    spam logs.

v2 extensions shipped in scope (pure upside)
  - build_worker_context: new "Recent work by @assignee" section
    surfacing the 5 most-recent completed runs for the current
    task's assignee (excluding this task). Bounded, cached by
    the natural LIMIT, no new dependencies. Skipped when the
    task has no assignee.
  - Gateway notifier message prefix: terminal pings now lead
    with `@<assignee>` so fleets (one chat subscribing to many
    tasks with different workers) stay legible at a glance.
    One-line template change.

Deferred to v2 (noted in reply to erosika)
  - recompute_ready full-scan starvation at 10k+ tasks: dirty-set
    approach is a real refactor; fine as follow-up.
  - Skill ↔ assignee validation for routing: depends on skill
    introspection surface that isn't nailed down.
  - Structured comments (in_reply_to / addressed_to / kind) as
    multi-peer session substrate: schema-affecting, exactly the
    v2-scope design vulcan flagged shouldn't cram into this PR.
  - Pattern vs mechanism taxonomy split in docs: pure docs reorg,
    low urgency.

Tests (+6 in core functionality)
  - unblock_invariant_recovery (engineered leak, defensive close)
  - unblock_normal_path_no_spurious_run (no run created on happy
    block->unblock; erosika's main concern)
  - migration_backfill_idempotent_under_re_run (3x init_db on a
    legacy-shape DB yields exactly 1 run row, not 3)
  - build_worker_context_includes_role_history (role continuity)
  - build_worker_context_role_history_skipped_when_no_assignee
  - build_worker_context_role_history_bounded_to_5

180/180 kanban suite pass under scripts/run_tests.sh. Live-smoke
exercised all three kernel fixes end-to-end with isolated
HERMES_HOME.
2026-04-27 20:40:49 -07:00
Teknium 7206eed319 docs(kanban): add step-by-step tutorial with 10 dashboard screenshots
New website/docs/user-guide/features/kanban-tutorial.md walks four
user stories end-to-end, each backed by a real screenshot of the
dashboard running against seeded data.

Stories
  1. Solo dev shipping a feature (parent->child dependencies,
     structured handoff, run history rendering).
  2. Fleet farming (parallel independent tasks across 3 assignees,
     lanes-by-profile grouping, dispatcher daemon).
  3. Role pipeline with retry (PM spec -> eng implements -> review
     blocks -> eng retries -> review approves; two-run history
     visible in the drawer; downstream workers pull parent
     summary+metadata).
  4. Circuit breaker + crash recovery (2 spawn_failed + 1 gave_up
     for a deploy with missing creds; 1 crashed + 1 completed for
     an OOM-killed migration that recovered on retry).

Each story shows both CLI commands and the dashboard drawer
equivalent. Screenshots captured via playwright + chromium at 2x
device scale, then repalettized with PIL (22MB -> 6.1MB for the
10-image set, no visible quality loss verified against vision).

Side updates
  - website/sidebars.ts: added kanban-tutorial under features.
  - website/docs/user-guide/features/kanban.md: prefix banner
    linking new readers to the tutorial before the reference.

All image references validate: `/img/kanban-tutorial/*` maps to
website/static/img/kanban-tutorial/ (10 files). Docusaurus build
not run locally (no node_modules in worktree); CI build on merge
will confirm.
2026-04-27 20:28:53 -07:00
Teknium 1619c0e503 fix(kanban): third pass — auto-init on first use, show --json carries runs[]
Found during full-stack live-test of the kanban system: two bugs where
the kernel and CLI didn't match the documented contract.

Kernel
  - `connect()` now auto-runs schema creation + migrations on the
    first connection to a given DB path, matching its docstring
    ('Open (and initialize if needed)' — which was aspirational until
    now). Module-level _INITIALIZED_PATHS cache keeps subsequent
    connects cheap. Previously the docstring lied: every path that
    went through connect() on a fresh HERMES_HOME raised 'no such
    table: tasks' — only `hermes kanban init` and `daemon` triggered
    schema creation.
  - `init_db()` always re-runs the migration pass (clears the cache
    entry first). Callers that know the on-disk schema may have
    drifted — tests writing legacy event kinds, external tools
    upgrading an old DB file — can force re-migration.

CLI
  - `kanban_command()` entry point auto-inits the DB before
    dispatching any subcommand. Idempotent; the underlying
    connect-based init pattern makes this a one-line SELECT against
    sqlite_master after the first call.
  - `hermes kanban show --json` now includes:
      - `runs`: full attempt history (id, profile, step_key, status,
        outcome, summary, error, metadata, worker_pid, started_at,
        ended_at)
      - `run_id` on every event object
    Dashboard API already had both; CLI was behind. Now scripts that
    inspect a task can use `show --json` alone.
  - `hermes kanban show` (human-readable) prints a Runs section
    matching the `runs` subcommand's format, and each Event line
    prefixes its run_id when present. Makes attempt attribution
    visible at a glance without a second command.

Tests (+3)
  - cli_create_on_fresh_home_auto_inits (subprocess, no init_db call,
    must succeed — covers the most common first-user path)
  - connect_auto_inits_fresh_db (direct kernel use without init_db)
  - cli_show_json_carries_runs (runs[] present; events carry run_id)

174/174 kanban suite pass under scripts/run_tests.sh.

Live-tested end-to-end
  - Phases 1-6: CLI subprocess (create/claim/complete/bulk-guard/
    synthetic-run/reclaim-via-TTL/multi-attempt history)
  - Phases 7-10: Dashboard FastAPI TestClient (POST/PATCH with
    summary+metadata, drag-drop running->ready, archive-while-
    running, mark-done-with-handoff)
  - Phases 11-13: Dispatcher with stub spawn_fn (3 tasks success,
    3 failures -> gave_up circuit breaker, event.run_id attribution
    across retries)
  - Phases 14-15: WebSocket /events (run_id payload, auth rejects
    wrong tokens with code 1008)
  - Phase 16: Gateway notifier (unseen_events_for_sub returns
    run_id on events, message renders 'done — title' + handoff
    summary from event payload, crashed event path, sub cleanup)

Every surface — kernel, dispatcher, CLI, dashboard REST, dashboard
WebSocket, gateway notifier — exercised end-to-end against a live
FastAPI app with a real SQLite DB in an isolated HERMES_HOME. All
passed.
2026-04-27 19:41:08 -07:00
Teknium e27c819de3 fix(kanban): deep-scan pass 2 — synthetic runs, event.run_id plumbing, invariant recovery, live drawer refresh
Second integration audit covering surfaces the first pass didn't hit.
Found eight issues spanning kernel, dashboard frontend, notifier, and CLI.
All behavioral / UX fixes; no schema change.

Kernel
  - complete_task on a never-claimed task (ready/blocked → done with no
    run in flight) was silently dropping the summary/metadata/result
    onto a non-existent run. Now synthesizes a zero-duration run
    (started_at == ended_at) so attempt history is complete. Only
    fires when there's actually handoff data to persist — bare
    complete_task(tid) remains a no-op for run creation.
  - block_task on a never-claimed task had the same bug for --reason.
    Same fix: synthesize a zero-duration run when a reason is passed.
  - Event dataclass gained a `run_id: Optional[int] = None` field.
    list_events, unseen_events_for_sub, and the dashboard _event_dict
    were all SELECTing the column but dropping it on the way out,
    so downstream consumers couldn't group events by attempt. Every
    read path now surfaces run_id.
  - claim_task got a defensive invariant-recovery step: if somehow
    `current_run_id` is non-NULL on a task in 'ready' status (invariant
    violation from an unknown code path), close the leaked run as
    'reclaimed' inside the same txn as the new claim. No-op in the
    common case; belt-and-suspenders in case a future code path forgets
    to clear the pointer.

Dashboard
  - GET /tasks/:id events array now carries run_id per event (via
    _event_dict).
  - WebSocket /events SELECT now includes run_id in the pushed event
    payload.
  - TaskDrawer reloads itself on live events for its own task id. New
    `taskEventTick[taskId]` state in the Board, incremented on every
    WS event, passed down as `eventTick` prop; drawer's useEffect
    depends on it. Previously, background workers completing a task
    the user was viewing left the drawer showing stale data until
    manual close/reopen.
  - CSS: added `.hermes-kanban-run--ended` rule for the fallback class
    the JS emits when outcome is unset. Harmless before; just
    inconsistent.

CLI
  - `hermes kanban watch --kinds` help text listed the legacy event
    name `spawn_auto_blocked`. The kernel migration renames it to
    `gave_up`, so users typing the documented name got zero matches.
    Now shows the current lexicon (`completed,blocked,gave_up,
    crashed,timed_out`).

Tests (+6 in core functionality, +1 in dashboard plugin)
  - complete_never_claimed_task_synthesizes_run
  - block_never_claimed_task_synthesizes_run
  - complete_never_claimed_without_handoff_skips_synthesis
  - event_dataclass_carries_run_id (created.run_id None, completed.run_id matches)
  - unseen_events_for_sub_includes_run_id (notifier path)
  - claim_task_recovers_from_invariant_leak (engineer the leak, verify recovery)
  - event_dict_includes_run_id (dashboard API shape)

171/171 kanban suite pass under scripts/run_tests.sh. Live-smoke (isolated
HERMES_HOME via execute_code) exercised all six fixed paths plus the
claim-after-leak recovery sequence.

Docs
  - Runs section: new 'Synthetic runs for never-claimed completions'
    and 'Live drawer refresh' paragraphs explaining the invariants.
  - Event reference: `created` / `promoted` / `unblocked` entries now
    explicitly note `run_id` is `NULL`; `completed` / `blocked`
    describe synthetic-run fallback.
2026-04-27 19:23:49 -07:00
Teknium 1c78f6627a docs(kanban): document audit-pass invariants — bulk-close guard, reclaimed-on-status-change, completed event carries summary
- Runs section: dashboard PATCH parity (summary/metadata forward),
  `completed` event embeds first-line summary for notifiers, bulk
  --summary/--metadata refused, archive/drag-drop reclaim semantics.
- Event reference: added Payload column to Lifecycle and Edits
  tables; called out the invariant that `status` carries run_id
  when closing a reclaimed run.
2026-04-27 08:46:08 -07:00
Teknium 8ef2ae6502 fix(kanban): audit pass — close orphaned runs on archive / dashboard direct-status / drag-drop
Integration audit of the runs-as-first-class work (0146cb2bd) found five
bugs where structured runs got orphaned or dashboard parity was missing.
All behavioral fixes; no schema change needed.

Kernel
  - archive_task: when called on a running task, now closes the
    in-flight run with outcome='reclaimed' and clears current_run_id.
    Previously, dashboard bulk-archive or CLI `kanban archive <running>`
    would leave the task_runs row open with ended_at=NULL forever and
    strand the pointer. Adds the claim_lock / claim_expires / worker_pid
    clearing to the UPDATE so the task row is clean too.
  - complete_task: embeds the first-line handoff summary in the
    `completed` event payload (capped at 400 chars). Notifier can now
    render `✔ task done — <title>\n<summary>` without a second SQL hit,
    and the full summary still lives on the run row.

Dashboard plugin
  - _set_status_direct: drag-drop OFF 'running' (to 'ready', 'todo',
    'triage', 'done' — anywhere except back to 'running') now closes
    the active run with outcome='reclaimed'. Clears worker_pid too.
    Snapshots previous status + current_run_id before the UPDATE so
    the decision has the right before-state. status event rows now
    carry run_id when closing a run, NULL otherwise.
  - UpdateTaskBody: adds `summary` and `metadata` fields. PATCH
    /tasks/:id with status='done' now forwards them to complete_task,
    giving the dashboard parity with `hermes kanban complete --summary
    ... --metadata ...`. Previously these fields only existed on the
    CLI.

CLI
  - `hermes kanban complete a b c --summary X` or `--metadata Y`:
    refused with a clear stderr message instead of silently applying
    the same handoff to every task. Bulk-close without handoff flags
    still works. (Note: hermes_cli.main discards subcommand exit
    codes via `args.func(args)` without propagating; tracked
    separately. Side-effect check is the real guard.)

Gateway notifier
  - Completion message prefers run.summary (carried in event payload)
    over task.result. task.result remains the fallback for legacy rows
    written before runs shipped.
  - Docstring: renamed stale `spawn_auto_blocked` reference to
    `gave_up` / `timed_out` — matches the actual TERMINAL_KINDS
    tuple, which was already correct in code.

Tests (+8 in core functionality, +3 in dashboard plugin)
  - archive_of_running_task_closes_run
  - archive_of_ready_task_does_not_create_spurious_run
  - dashboard_direct_status_change_off_running_closes_run
  - dashboard_direct_status_change_within_same_state_is_noop_for_runs
  - cli_bulk_complete_with_summary_rejects (side-effect assertion)
  - cli_bulk_complete_without_summary_still_works
  - completed_event_payload_carries_summary
  - completed_event_payload_summary_none_when_missing
  - patch_status_done_with_summary_and_metadata
  - patch_status_done_without_summary_still_works (legacy path)
  - patch_status_archive_closes_running_run (E2E through FastAPI TestClient)

164/164 kanban suite pass under scripts/run_tests.sh. Live smoke
(execute_code with isolated HERMES_HOME) covered all five fixed paths
plus a re-claim-after-drag-drop to confirm the fresh run is tracked
correctly after the orphan close.
2026-04-27 07:44:39 -07:00
Teknium 0146cb2bd2 feat(kanban): runs as first-class (v1); structured handoffs; forward-compat for v2 workflows
Addresses vulcan-artivus's RFC review on issue #16102. Picks up the
structural changes that are expensive to retrofit later and zero-cost
to land now; defers workflow-template routing + per-stage lanes to v2
(kept forward-compat hooks in the schema).

Kernel
  - New `task_runs` table. Each claim opens a run (pid, claim_lock,
    heartbeat, max_runtime, started_at), each terminal transition
    closes it with an outcome (completed / blocked / crashed /
    timed_out / spawn_failed / gave_up / reclaimed). Multiple rows per
    task when retries happen, preserving full attempt history.
  - `tasks.current_run_id` points at the active run (NULL when idle);
    denormalised for cheap reads.
  - `task_events.run_id` carries the run a given event belongs to so
    UIs group events by attempt. claim/spawned/complete/block/crash/
    timeout/spawn_fail/gave_up/heartbeat events are all run-scoped;
    created/promoted/assigned/edited stay task-scoped (run_id=NULL).
  - Legacy DBs: migration adds the columns + indexes + synthesizes a
    run row for any task that's 'running' before the runs table
    existed, so subsequent complete/heartbeat/reclaim calls have a
    target. Idempotent.

Structured handoff
  - `complete_task(summary=, metadata=)` persists both on the closing
    run. `summary` falls back to `result` when omitted so single-run
    callers don't duplicate. `metadata` is a free-form dict
    ({changed_files, tests_run, findings, ...}).
  - `build_worker_context` rewrites: now reads "Prior attempts on this
    task" (closed runs: outcome, summary, error, metadata) and
    "Parent task results" pulls run.summary + run.metadata of the
    most-recent completed run per parent, falling back to task.result
    for legacy rows without runs. Retrying workers see why earlier
    attempts failed; downstream workers see parent handoffs
    structurally, not as loose `result` strings.

CLI
  - `hermes kanban complete <id> --summary "..." --metadata '{"files":1}'`.
    JSON is parsed and rejected with exit-2 if malformed.
  - New `hermes kanban runs <id> [--json]` verb. Shows per-run rows:
    outcome, profile, elapsed, summary, error. JSON mode serializes
    the full run dataclass for scripting.

Dashboard plugin
  - GET /tasks/:id now carries a runs[] array alongside task / events /
    comments / links. Each run serialised with outcome, summary,
    metadata, worker_pid, elapsed fields.
  - New Run History section in the drawer. Outcome-coloured left
    border (green=active, blue=completed, amber=reclaimed,
    red=crashed/timed_out/gave_up/blocked). Collapsed when >3 runs
    with a '+N earlier' toggle. Shows summary + error + metadata
    inline.

Forward-compat for v2 (vulcan's workflow templates + stages)
  - `tasks.workflow_template_id` and `tasks.current_step_key` added as
    nullable columns. v1 kernel ignores them for routing; v2 will add
    workflow_templates + workflow_steps tables and wire the dispatcher
    to consult them. task_runs has a matching `step_key` column. Lets
    a v2 release land additively without another schema migration.

Tests (+22 in test_kanban_core_functionality.py, +2 in dashboard)
  - run_created_on_claim / run_closed_on_complete_with_summary
  - run_summary_falls_back_to_result
  - multiple_attempts_preserved_as_runs (3 attempts: reclaimed →
    crashed → completed, all visible in list_runs)
  - run_on_block_with_reason / run_on_spawn_failure_records_failed_runs
    (5 spawn_failed runs + 1 gave_up run)
  - event_rows_carry_run_id (task-scoped vs run-scoped split)
  - build_worker_context_includes_prior_attempts
  - build_worker_context_uses_parent_run_summary (metadata JSON in context)
  - migration_backfills_inflight_run_for_legacy_db (simulates a
    pre-migration running task, re-runs init_db, asserts backfill)
  - forward_compat_columns_writable
  - cli_runs_verb + cli_runs_json
  - cli_complete_with_summary_and_metadata (JSON round-trip through
    shlex + argparse)
  - cli_complete_bad_metadata_exits_nonzero
  - task_detail_includes_runs / task_detail_runs_empty_before_claim

269/269 kanban suite pass under scripts/run_tests.sh. Live-smoke
covered: single-attempt complete → run closed + summary persisted;
retry scenario → two runs visible (blocked + completed); parent run
summary + metadata surfaced to child via build_worker_context;
forward-compat columns writable via UPDATE; GET /tasks/:id returns
runs[].

Docs
  - New 'Runs — one row per attempt' section in kanban.md: the
    why (full attempt history, structured metadata), the two-table
    model (task is logical, run is execution), the structured handoff
    shape (--summary / --metadata), example CLI + dashboard output,
    forward-compat note for v2.
  - Event reference updated to mention task_events.run_id.
  - CLI reference gains 'hermes kanban runs <id>'.

Not in v1 (deferred to v2):
  - Workflow templates (workflow_templates + workflow_steps tables,
    stage-based routing, success/failure step links).
  - 'stage' as a distinct axis from status in the UI.
  - Shared-by-default workspace binding across stages of the same
    workflow run.
  - Pipeline replacement for the kanban-orchestrator skill (the
    orchestrator's 'decompose, don't execute' guidance is still
    correct; it becomes partly redundant once workflows land).
2026-04-27 06:54:19 -07:00
Teknium da7d09c3b6 feat(kanban): max-runtime timeouts, worker heartbeats, assignees picker, event vocab cleanup
Ports four items from the Multica audit (https://github.com/multica-ai/multica).
Dropped their cross-host server/daemon architecture and their Postgres+pgvector
skill search — both the wrong shape for our single-host SQLite kernel.

1. Per-task max-runtime (`max_runtime_seconds` column)
   - New kernel function `enforce_max_runtime(conn)` runs in every dispatch
     tick. When a running task's elapsed time exceeds the cap, we SIGTERM
     the worker, wait a 5 s grace (polling _pid_alive), then SIGKILL. The
     task goes back to 'ready' with a `timed_out` event and re-queues
     on the next tick (unless the spawn-failure circuit breaker has
     already parked it).
   - Host-local only: lock prefix must match this host's claimer_id so we
     never signal a PID on another machine.
   - CLI: `hermes kanban create --max-runtime 30m | 2h | 1d | <seconds>`.
     New `_parse_duration` helper accepts s/m/h/d suffixes or bare
     integers.
   - Dashboard POST body + the card's `max_runtime_seconds` field.

2. Worker heartbeat (`last_heartbeat_at` column, `heartbeat` event)
   - `heartbeat_worker(conn, task_id, note=None)` emits the event and
     touches last_heartbeat_at. Refused when the task isn't running.
   - CLI: `hermes kanban heartbeat <id> [--note "..."]`.
   - kanban-worker skill instructs workers to heartbeat during long
     loops (training runs, encodes, crawls, batch uploads).
   - Separate signal from PID crash detection: a worker's Python can
     still be alive while the actual work process is stuck. Heartbeat
     absence is diagnostic; future work can auto-block on stale
     heartbeats but v1 just surfaces the signal.

3. Assignee enumeration (`known_assignees`, `list_profiles_on_disk`)
   - Scans ~/.hermes/profiles/ for dirs containing config.yaml + unions
     with current assignees on the board. Each entry returns
     {name, on_disk, counts: {status: n}}.
   - CLI: `hermes kanban assignees [--json]`. Also hooked into
     `hermes kanban init` which now prints discovered profiles so new
     installs see 'these are the assignees you can target' immediately.
   - Dashboard: GET /api/plugins/kanban/assignees for the picker.

4. Event vocab cleanup (three renames + three new kinds)
   - `ready` → `promoted` (fires when deps clear; clearer semantic).
   - `priority` → `reprioritized` (past-tense verb, matches others).
   - `spawn_auto_blocked` → `gave_up` (short, memorable; the circuit
     breaker gave up on this task).
   - New: `spawned` (emitted with {pid} on successful spawn),
     `heartbeat` ({note?}), `timed_out`
     ({pid, elapsed_seconds, limit_seconds, sigkill}).
   - One-shot migration in `_migrate_add_optional_columns` renames
     legacy rows in-place on init_db(), so existing DBs upgrade cleanly.
   - Gateway notifier's TERMINAL_KINDS set updated; timed_out gets its
     own ⏱ message template, gave_up renamed from 'auto-blocked'.
   - Plugin_api.py's two 'priority' emit sites renamed to
     'reprioritized'.
   - Documented in a new 'Event reference' section in kanban.md,
     grouped into three clusters (lifecycle / edits / worker
     telemetry) with payload shapes.

Tests (+18 in tests/hermes_cli/test_kanban_core_functionality.py,
136/136 pass):
  - max_runtime_terminates_overrun_worker: real SIGTERM flow with
    _pid_alive stub, verifies event payload + state reset.
  - max_runtime_none_means_no_cap: unbounded tasks aren't timed out.
  - create_task_persists_max_runtime.
  - enforce_max_runtime_integrates_with_dispatch: kernel-level +
    dispatch_once chaining.
  - heartbeat_on_running_task + heartbeat_refused_when_not_running.
  - cli_heartbeat_verb with --note round-trip.
  - recompute_ready_emits_promoted_not_ready.
  - spawn_failure_circuit_breaker_emits_gave_up.
  - spawned_event_emitted_with_pid.
  - migration_renames_legacy_event_kinds (injects old rows, re-runs
    init_db, asserts rename).
  - list_profiles_on_disk (tmp_path + config.yaml filter).
  - known_assignees_merges_disk_and_board (profiles on disk + board
    assignees + per-status counts).
  - cli_assignees_json.
  - parse_duration_accepts_formats (s/m/h/d/float).
  - parse_duration_rejects_garbage.
  - cli_create_max_runtime_via_duration (2h → 7200).
  - cli_create_max_runtime_bad_format_exits_nonzero.

Live smoke: POST /tasks with max_runtime_seconds round-trips;
/assignees returns the union of on-disk + board-assigned names;
PATCH priority produces 'reprioritized' events (not 'priority');
board cards expose max_runtime_seconds + last_heartbeat_at.

Docs (website/docs/user-guide/features/kanban.md):
  - New 'Event reference' section with three-cluster table
    (lifecycle / edits / worker telemetry) + payload shapes.
  - CLI reference updated for --max-runtime, heartbeat, assignees.
  - Gateway notifications section updated for the new TERMINAL_KINDS.

Not ported from Multica (deliberate, documented in the out-of-scope
section already): Postgres+pgvector skill search (heavy deps conflict
with SQLite kernel), server+daemon cross-host model (we're
single-host on purpose), first-class agent identity with threaded
comments (we keep the board profile-agnostic).
2026-04-27 06:32:17 -07:00
Teknium af8d43dbbb feat(kanban): core hardening — daemon, circuit breaker, crash detect, logs, notify, bulk, stats
Eliminates every 'known broken on day one' item in the core functionality
audit. The board is now self-driving (daemon, not cron), self-healing
(crash detection, spawn-failure circuit breaker), and self-reporting
(logs, stats, gateway notifications).

Dispatcher
  - New `hermes kanban daemon` long-lived loop with --interval, --max,
    --failure-limit, --pidfile, --verbose, signal-clean shutdown
    (SIGINT/SIGTERM via threading.Event). A kb.run_daemon() entry point
    lets tests drive it inline without subprocess.
  - `hermes kanban init` now prints the dispatcher setup hint so users
    don't leave the board off-by-default. Ships a systemd user unit at
    plugins/kanban/systemd/hermes-kanban-dispatcher.service.
  - Removed the old 'add this to cron' doc path. Cron runs agent
    prompts (LLM cost per tick) — unacceptable for a per-minute
    coordination loop.

Worker aliveness / safety
  - Spawn returns the child's PID; dispatcher stores it on the task row
    and calls detect_crashed_workers() every tick. If the PID is gone
    but the claim TTL hasn't expired, the task drops back to ready with
    a 'crashed' event. Host-local only — cross-host PIDs are ignored
    per the single-host design.
  - Spawn-failure circuit breaker: after N consecutive spawn_failed
    events on the same task (default 5), the dispatcher auto-blocks
    with the last error as the reason. Success resets the counter.
    Workspace-resolution failures count against the same budget.
  - Log rotation: _rotate_worker_log trims at 2 MiB, keeps one
    generation (.log.1), bounds per-task disk usage at ~4 MiB.

Idempotency / dedup
  - create_task(idempotency_key=...) returns the existing non-archived
    task id for retried webhooks. --idempotency-key on the CLI, json
    body field on the dashboard plugin. Archived tasks don't block a
    fresh create with the same key.

CLI surface
  - Bulk verbs: complete, unblock, archive accept multiple ids;
    block accepts --ids for sibling blocks with the same reason.
  - New verbs: daemon, watch (live event tail filtered by
    assignee/tenant/kinds), stats, log, notify-subscribe,
    notify-list, notify-unsubscribe.
  - dispatch gains --failure-limit + crashed/auto_blocked columns in
    JSON output and human-readable output.
  - gc accepts --event-retention-days / --log-retention-days; prunes
    task_events for terminal tasks and old log files.

Gateway integration
  - New GatewayRunner._kanban_notifier_watcher: polls
    kanban_notify_subs every 5s, pushes ✔/⏸/✖ messages to subscribed
    chats for completed/blocked/spawn_auto_blocked/crashed events.
    Cursor-advanced per-sub; auto-removed when the task reaches
    done/archived. Runs alongside the session expiry and platform
    reconnect watchers — SQLite work in asyncio.to_thread so the
    event loop never blocks.
  - /kanban create in the gateway auto-subscribes the originating
    chat (platform + chat_id + thread_id). Users see
    '(subscribed — you'll be notified when t_abcd completes or
    blocks)' appended to the response.

Dashboard plugin
  - GET /stats returns board_stats (by_status, by_assignee,
    oldest_ready_age_seconds).
  - GET /tasks/:id/log returns the worker log with optional ?tail=N
    cap. 404 on unknown task, exists=false when the task has never
    spawned.
  - POST /tasks accepts idempotency_key; both Pydantic body and the
    create_task kwarg now round-trip.
  - /board attaches task.age (created/started/time_to_complete in
    seconds) so the UI can colour stale cards without recomputing.
  - Card CSS: amber border after N minutes, red border when clearly
    stuck (tier per status: running 10m/60m, ready 1h/24h, todo
    7d/30d, blocked 1h/24h).
  - Drawer: new Worker log section, auto-loads on mount, last 100 KB
    cap with on-disk path surfaced when truncated.

Kernel
  - Schema additions: tasks.idempotency_key, tasks.spawn_failures,
    tasks.worker_pid, tasks.last_spawn_error; new
    kanban_notify_subs table. All gated by _migrate_add_optional_columns
    so legacy DBs upgrade cleanly.
  - release_stale_claims / complete_task / block_task now all clear
    worker_pid so crash detection doesn't false-positive on reclaimed
    tasks.
  - read_worker_log fixed: tail-skip no longer eats one-giant-line
    logs (common with child processes that don't flush newlines
    before dying).

Tests (tests/hermes_cli/test_kanban_core_functionality.py, 28 new)
  - Idempotency: same key returns existing, archived doesn't block,
    no key never collides
  - Circuit breaker: auto-blocks after limit, success resets counter,
    workspace-resolution failure counts against budget
  - Aliveness: _pid_alive helper, detect_crashed_workers reclaims
    exited child
  - Daemon: runs and stops cleanly via stop_event, survives a tick
    exception
  - Stats + task_age helpers
  - Notify subs: CRUD, cursor advances, distinct-thread is a separate row
  - GC: events-only-for-terminal-tasks, old worker logs deleted
  - Log: rotation keeps one generation, read_worker_log tail
  - CLI: bulk complete/archive/unblock/block, create with
    --idempotency-key, stats --json, notify-subscribe+list, log
    missing task, gc reports counts
  - run_slash parity: smoke-tests every registered verb (23
    invocations); none may raise or return empty string

Full kanban test suite: 234/234 pass under scripts/run_tests.sh
(60 original + 30 dashboard plugin + 28 new core + 116 command
registry). Live smoke covers /stats, idempotency, age, log endpoint
with and without content, log?tail= truncation signal, 404 on unknown
task.

Docs (website/docs/user-guide/features/kanban.md)
  - 'Core concepts' rewritten: new statuses (triage), idempotency key,
    dispatcher-as-daemon-not-cron with circuit breaker behaviour
    documented.
  - Quick start swapped to daemon. New systemd section covers user
    service install.
  - New sections: idempotent create, bulk verbs, gateway
    notifications, out-of-scope single-host note (kanban.db is local;
    don't expect multi-host).
  - CLI reference updated for every new verb, every new flag.
2026-04-26 13:01:09 -07:00
Teknium 27fc6c1086 feat(kanban): bulk ops, drawer edit, dep editor, markdown, touch, config
The dashboard plugin gets the last layer of features that turn it from a
'usable read surface with drag-drop' into a 'full kanban UI' — no more
'drop to CLI to do X' moments from inside the tab.

Plugin backend
  - POST /tasks/bulk — apply the same patch (status / archive / assignee
    / priority) to every id in the request body. Each id runs
    independently: one bad id reports {ok: false, error: ...} without
    aborting siblings. Status transitions that aren't legal for the
    current state are surfaced per-id ('transition to done refused').
    Used by the multi-select bulk action bar.
  - GET /config — returns the dashboard.kanban section of config.yaml
    (default_tenant, lane_by_profile, include_archived_by_default,
    render_markdown) with sensible defaults when the section is absent.
    Loaded once by the SPA to preselect filters and toggle markdown
    rendering.
  - _conn() helper — every handler now goes through it, calling
    kanban_db.init_db() (idempotent) before every connection. Fresh
    installs work whether the first hit is GET /board, POST /tasks, or
    any other endpoint — no more 'no such table: tasks' when the CLI
    or a script hits the plugin before the dashboard has ever loaded.

Plugin UI (plugin bundle, +~12 KB)
  - Multi-select: per-card checkbox; shift/ctrl-click also toggles
    without opening the drawer. A BulkActionBar appears above the
    columns with batch → ready / complete / archive / reassign
    (profile dropdown + unassign option). Destructive batches confirm
    first. Partial failures from the backend are surfaced inline.
  - Drawer inline editing:
    - Click the title → TitleEditor swaps in an input, Enter saves,
      Escape cancels.
    - Click the Assignee meta row → AssigneeEditor input (empty string
      unassigns).
    - Click the Priority meta row → PriorityEditor numeric input.
    - New 'edit' button on Description → full-width textarea; Save /
      Cancel switch back to rendered view.
  - Dependency editor: chip list of parents + children with per-chip
    × button (calls DELETE /links). Add-parent / add-child dropdowns
    filter out self + already-linked tasks so you cannot re-add a
    duplicate edge or a self-loop. Cycle rejections from the server
    surface cleanly via the existing error banner.
  - Parent selection in InlineCreate: new dropdown listing every task
    on the board ('{id} — {title}') — picking one sends parents=[id]
    with the create payload, so the task lands in todo (or triage if
    created from the Triage column) with the dependency wired up.
  - Safe markdown rendering for description, comment bodies, and
    result. A small in-bundle renderer handles headings, bold, italic,
    inline code, fenced code, bullet lists, and http(s)/mailto links.
    Every substitution runs on HTML-escaped input (no raw HTML), links
    get target=_blank + rel=noopener,noreferrer. Disabled by config
    key dashboard.kanban.render_markdown=false (falls back to <pre>).
  - Touch drag-drop: attachTouchDrag() installs a pointerdown handler
    that spawns a drag proxy, tracks elementFromPoint under the finger,
    and dispatches a hermes-kanban:drop CustomEvent on the column when
    released. Desktop continues to use native HTML5 DnD. Columns
    listen for both.
  - ErrorBoundary already present from the prior commit catches any
    renderer throw; markdown escape + touch-proxy cleanup both have
    their own try/finally.

Tests (tests/plugins/test_kanban_dashboard_plugin.py — 90/90 pass)
  - bulk_status_ready: 3 tasks blocked, batch → ready, all move
  - bulk_archive hides all ids from default board
  - bulk_reassign changes every assignee
  - bulk_unassign_via_empty_string sets assignee back to None
  - bulk_partial_failure_doesnt_abort_siblings: bogus id in middle,
    good siblings still get priority=7
  - bulk_empty_ids_400
  - config_returns_defaults_when_section_missing
  - config_reads_dashboard_kanban_section (writes config.yaml, verifies
    every key round-trips)

Live smoke (real FastAPI app + isolated HERMES_HOME):
  - /config without section returns defaults
  - /config with dashboard.kanban section returns the configured values
  - POST /tasks as the first-ever request (no prior /board) succeeds —
    auto-init handles it
  - Link add + remove via POST /links + DELETE /links round-trip
  - Bulk priority bump on 2 ids, both get priority=5
  - Bulk archive hides ids from default board
  - PATCH {title, body} updates the task, markdown source survives
    the round trip
  - POST /tasks {triage: true, parents: [id]} lands in triage, not todo
  - Bulk partial: 2 good + 1 bogus returns per-id outcome

Docs (website/docs/user-guide/features/kanban.md)
  - 'What the plugin gives you' rewritten to reflect bulk, drawer
    edit, dep editor, parent-on-create, markdown, touch drag-drop.
  - New 'Dashboard config' subsection with a YAML example for
    dashboard.kanban.*.
  - REST table gains /tasks/bulk and /config rows.
2026-04-26 12:36:23 -07:00
Teknium 45806629c5 feat(kanban): Triage column, progress rollup, WS auth, lanes, polish
Follows up on the initial dashboard plugin with the items called out
during self-review — ships the GUI-reality claims the PR body made,
closes the WebSocket auth gap, and lands the 'Triage' status the design
spec's Fusion-style screenshot leads with.

Kernel changes
  - kanban_db.VALID_STATUSES gains 'triage'. status is TEXT without a
    CHECK constraint so no schema migration is needed.
  - create_task(triage=True) forces the initial status to 'triage'
    regardless of parents, and parent ids are still validated so the
    eventual link rows don't dangle. recompute_ready() only promotes
    'todo' -> 'ready', so triage tasks are naturally isolated from the
    dispatcher pipeline.
  - hermes kanban create gains --triage.
  Patterns table (docs) gains P9 'Triage specifier'.

Plugin backend (plugins/kanban/dashboard/plugin_api.py)
  - GET /board now auto-init's kanban.db on first read (idempotent).
    A fresh install shows an empty board instead of 'failed to load'.
  - GET /board returns a new 'progress' field per task — {done, total}
    of child-task completion, or None if the task has no children.
  - BOARD_COLUMNS prepends 'triage'.
  - POST /tasks accepts {triage: bool}; PATCH /tasks/:id accepts
    {status: 'triage'}.
  - WebSocket /events now requires ?token=<session_token> as a query
    param — browsers can't set Authorization on a WS upgrade, so this
    matches the pattern the in-browser PTY bridge uses. Constant-time
    compare against hermes_cli.web_server._SESSION_TOKEN. In bare-test
    contexts (no dashboard module) the check no-ops so the tail loop
    stays testable. Security boundary documented in the module header
    and in website/docs/user-guide/features/kanban.md.

Plugin UI (plugins/kanban/dashboard/dist/index.js + style.css)
  - Adds the Triage column (lilac dot) with helper text
    'Raw ideas — a specifier will flesh out the spec'. Inline-create
    from the Triage column parks new tasks in triage.
  - Status action row in the drawer gains '→ triage'.
  - Progress pill (N/M) on cards that have children. Full-complete
    state tints the pill green.
  - 'Lanes by profile' toolbar toggle — sub-groups the Running column
    by assignee so you see at a glance which specialist is busy on
    what.
  - Destructive status moves (done / archived / blocked) via drag-drop
    OR via the drawer action row now prompt for confirmation.
  - Escape closes the drawer.
  - Live-update reloads are debounced (250ms) so a burst of
    task_events triggers one refetch, not N.
  - WebSocket includes ?token= built from window.__HERMES_SESSION_TOKEN__.
  - WebSocket reconnect uses exponential backoff capped at 30s, not
    a fixed 1.5s spin loop, and surfaces a user-visible error on
    code-1008 (auth rejected) instead of reconnecting forever.
  - ErrorBoundary wraps the page — a bad card render shows a
    'rendering error, reload view' card instead of crashing the tab.

Tests (tests/plugins/test_kanban_dashboard_plugin.py, +5 tests = 21)
  - empty-board shape now asserts all 6 columns including 'triage'
  - create_triage_lands_in_triage_column
  - triage_task_not_promoted_to_ready (dispatcher bypasses triage)
  - patch_status_triage_works (both into triage and out of it)
  - board_progress_rollup (0/2 -> 1/2 -> childless cards = None)
  - board_auto_initializes_missing_db
  - ws_events_rejects_when_token_required (three sub-assertions:
    missing → 1008, wrong → 1008, correct → handshake accepted)

All 82 kanban tests pass under scripts/run_tests.sh.

Docs
  - kanban.md 'What the plugin gives you' fully rewritten to match
    shipped reality (triage, progress pill, assignee lanes,
    destructive-confirm, Escape-close, debounce).
  - New 'Security model' subsection documents the explicit-plugin-
    route-bypass, the WS token requirement, and the --host 0.0.0.0
    warning; also notes that kanban.db is profile-agnostic on purpose
    (the coordination primitive) so cross-profile visibility is
    expected.
  - CLI command reference shows --triage.
  - Collaboration patterns table adds P9 'Triage specifier'.
2026-04-26 12:26:43 -07:00
Teknium 4093201c47 feat(kanban): dashboard plugin — Linear/Fusion-style board UI
Ships plugins/kanban/dashboard/ as a bundled dashboard plugin. No core
changes — uses the standard dashboard plugin contract (manifest.json +
dist/index.js + plugin_api.py) documented in 'Extending the Dashboard'.

What the tab gives you:
- One column per kanban status (todo / ready / running / blocked / done;
  archived behind a toggle), column counts, coloured status dots.
- Cards with id, title, priority badge, tenant tag, assignee,
  comment/link counts, 'created N ago'.
- HTML5 drag-drop between columns — status change routes through the
  same kanban_db code the CLI /kanban verbs use, so the three surfaces
  (CLI, gateway, dashboard) can never drift.
- Inline create per-column (title, assignee, priority).
- Side drawer on card click: description, status action row
  (→ ready / → running / block / unblock / complete / archive),
  dependency links, comment thread with Enter-to-submit,
  last 20 events.
- Toolbar: search, tenant filter, assignee filter, show-archived,
  nudge-dispatcher (skip the 60s wait), refresh.
- Live updates via WebSocket tailing task_events — the board reflects
  CLI or gateway actions in real time.

REST surface under /api/plugins/kanban/: GET /board, GET /tasks/:id,
POST /tasks, PATCH /tasks/:id, POST /tasks/:id/comments, POST /links,
DELETE /links, POST /dispatch, WS /events. Every handler is a thin
wrapper around kanban_db — no new business logic.

Visually theme-aware: the plugin CSS reads only --color-*, --radius,
--font-mono etc. so it reskins with whichever dashboard theme is active.

Tests (tests/plugins/test_kanban_dashboard_plugin.py, 16 tests):
- empty board shape
- create + appears in ready column with tenant/assignee rollups
- tenant filter
- detail includes parents/children/events
- 404 on unknown task
- PATCH status: complete / block / unblock / ready drag-drop / running
- PATCH reassign, priority, edit, invalid-status rejection
- POST comment (plus empty-body rejection)
- POST link + DELETE link + cycle rejection
- POST dispatch (dry run)

All 76 kanban tests pass under scripts/run_tests.sh.

Docs: website/docs/user-guide/features/kanban.md gains a full
'Dashboard (GUI)' section covering install, architecture, REST surface,
live-updates mechanism, extending, and scope boundary.
2026-04-26 12:08:47 -07:00
Teknium 9f610aa8f3 docs(kanban): add GUI/Dashboard plugin section
The /kanban CLI + slash command are enough to run the board
headlessly, but triage and cross-profile supervision want a
visual board. Document the design as a dashboard plugin that:

- reads live state from kanban.db over a WebSocket on
  task_events (no polling)
- writes through run_slash() so CLI/gateway/GUI cannot drift
- mounts under /api/plugins/kanban/ following the existing
  'Extending the Dashboard' plugin shape

The plugin is strictly a thin layer over kanban_db — no new
business logic, nothing to merge into the kernel.
2026-04-26 11:57:04 -07:00
Teknium e1c5e741ad feat(kanban): durable multi-profile collaboration board (#16081)
New `hermes kanban` CLI subcommand + `/kanban` slash command + skills for
worker and orchestrator profiles. SQLite-backed task board
(~/.hermes/kanban.db) shared across all profiles on the host. Zero
changes to run_agent.py, no new core tools, no tool-schema bloat.

Motivation: delegate_task is a function call — sync fork/join, anonymous
subagent, no resumability, no human-in-the-loop. Kanban is the durable
shape needed for research triage, scheduled ops, digital twins,
engineering pipelines, and fleet work. They coexist (workers may call
delegate_task internally).

What this adds
- hermes_cli/kanban_db.py — schema, CAS claim, dependency resolution,
  dispatcher, workspace resolution, worker-context builder.
- hermes_cli/kanban.py — 15-verb CLI surface and shared run_slash()
  entry point used by both CLI and gateway.
- skills/devops/kanban-worker — how a profile should work a claimed task.
- skills/devops/kanban-orchestrator — "you are a dispatcher, not a
  worker" template with anti-temptation rules.
- /kanban slash command wired into cli.py and gateway/run.py. Bypasses
  the running-agent guard (board writes don't touch agent state), so
  /kanban unblock can free a stuck worker mid-conversation.
- Design spec at docs/hermes-kanban-v1-spec.pdf — comparative analysis
  vs Cline Kanban, Paperclip, NanoClaw, Gemini Enterprise; 8 patterns;
  4 user stories; implementation plan; concurrency correctness.
- Docs: website/docs/user-guide/features/kanban.md, CLI reference
  updated, sidebar entry added.

Architecture highlights
- Three planes: control (user + gateway), state (board + dispatcher),
  execution (pool of profile processes).
- Every worker is a full OS process, spawned as `hermes -p <profile>`.
  No in-process subagent swarms — solves NanoClaw's SDK-lifecycle
  failure class.
- Atomic claim via SQLite CAS in a BEGIN IMMEDIATE transaction; stale
  claims reclaimed 15 min after their TTL expires.
- Tenant namespacing via one nullable column — one specialist fleet
  can serve many businesses with data isolation by workspace path.

Tests: 60 targeted tests (schema, CAS atomicity, dependency resolution,
dispatcher, workspace kinds, tenancy, CLI + slash surface). All pass
hermetic via scripts/run_tests.sh.
2026-04-26 08:29:46 -07:00
725 changed files with 21721 additions and 74380 deletions
-2
View File
@@ -5,9 +5,7 @@
# Dependencies
node_modules
**/node_modules
.venv
**/.venv
# CI/CD
.github
+1 -7
View File
@@ -13,7 +13,7 @@ concurrency:
cancel-in-progress: true
jobs:
nix-lockfile-check:
check:
runs-on: ubuntu-latest
timeout-minutes: 20
steps:
@@ -36,12 +36,6 @@ jobs:
LINK_SHA: ${{ steps.sha.outputs.full }}
run: nix run .#fix-lockfiles -- --check
- name: Fail if check crashed without reporting
if: steps.check.outputs.stale != 'true' && steps.check.outputs.stale != 'false'
run: |
echo "::error::fix-lockfiles exited without reporting stale status — likely an infrastructure or script failure"
exit 1
- name: Post sticky PR comment (stale)
if: steps.check.outputs.stale == 'true' && github.event_name == 'pull_request'
uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728 # v2.9.1
+2 -103
View File
@@ -1,13 +1,6 @@
name: Nix Lockfile Fix
on:
push:
branches: [main]
paths:
- 'ui-tui/package-lock.json'
- 'ui-tui/package.json'
- 'web/package-lock.json'
- 'web/package.json'
workflow_dispatch:
inputs:
pr_number:
@@ -26,103 +19,9 @@ concurrency:
cancel-in-progress: false
jobs:
# ── Auto-fix on main ───────────────────────────────────────────────
# Fires when a push to main touches package.json or package-lock.json
# in ui-tui/ or web/. Runs fix-lockfiles --apply and pushes the hash
# update commit directly to main so Nix builds never stay broken.
#
# Safety invariants:
# 1. The fix commit only touches nix/*.nix files, which are NOT in
# the paths filter above, so this cannot re-trigger itself.
# 2. An explicit file-whitelist check before commit aborts if
# fix-lockfiles ever modifies unexpected files.
# 3. Job-level concurrency with cancel-in-progress: true ensures
# back-to-back pushes collapse to the newest; ref: main checkout
# always operates on the latest branch state.
# 4. Uses a GitHub App token (not GITHUB_TOKEN) so the fix commit
# triggers downstream nix.yml verification.
auto-fix-main:
if: github.event_name == 'push'
runs-on: ubuntu-latest
timeout-minutes: 25
concurrency:
group: auto-fix-main
cancel-in-progress: true
steps:
- name: Generate GitHub App token
id: app-token
uses: actions/create-github-app-token@7bfa3a4717ef143a604ee0a99d859b8886a96d00 # v1.9.3
with:
app-id: ${{ secrets.APP_ID }}
private-key: ${{ secrets.APP_PRIVATE_KEY }}
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
ref: main
token: ${{ steps.app-token.outputs.token }}
- uses: ./.github/actions/nix-setup
- name: Apply lockfile hashes
id: apply
run: nix run .#fix-lockfiles -- --apply
- name: Commit & push
if: steps.apply.outputs.changed == 'true'
shell: bash
run: |
set -euo pipefail
# Ensure only nix files were modified — prevents accidental
# self-triggering if fix-lockfiles ever touches package files.
unexpected="$(git diff --name-only | grep -Ev '^nix/(tui|web)\.nix$' || true)"
if [ -n "$unexpected" ]; then
echo "::error::Unexpected modified files: $unexpected"
exit 1
fi
# Record the base SHA before committing — used to detect package
# file changes if we need to rebase after a non-fast-forward push.
BASE_SHA="$(git rev-parse HEAD)"
git config user.name 'github-actions[bot]'
git config user.email '41898282+github-actions[bot]@users.noreply.github.com'
git add nix/tui.nix nix/web.nix
git commit -m "fix(nix): auto-refresh npm lockfile hashes" \
-m "Source: $GITHUB_SHA" \
-m "Run: $GITHUB_SERVER_URL/$GITHUB_REPOSITORY/actions/runs/$GITHUB_RUN_ID"
# Retry push with rebase in case main advanced with an unrelated
# commit during the nix build. Without this, a non-fast-forward
# rejection silently loses the fix. If package files changed during
# the rebase, abort — a fresh auto-fix run will handle the new state.
for attempt in 1 2 3; do
if git push origin HEAD:main; then
exit 0
fi
echo "::warning::Push attempt $attempt failed (non-fast-forward?), rebasing…"
git fetch origin main
# If package files changed between our base and the new main,
# our computed hashes are stale. Abort and let the next triggered
# run recompute from the correct package-lock state.
pkg_changed="$(git diff --name-only "$BASE_SHA"..origin/main -- \
'ui-tui/package-lock.json' 'ui-tui/package.json' \
'web/package-lock.json' 'web/package.json' || true)"
if [ -n "$pkg_changed" ]; then
echo "::warning::Package files changed since hash computation — aborting; a fresh run will recompute"
exit 0
fi
git rebase origin/main
done
echo "::error::Failed to push after 3 rebase attempts"
exit 1
# ── PR fix (manual / checkbox) ─────────────────────────────────────
# Existing behavior: run on manual dispatch OR when a task-list
# checkbox in the sticky lockfile-check comment flips from [ ] to [x].
fix:
# Run on manual dispatch OR when a task-list checkbox in the sticky
# lockfile-check comment flips from `[ ]` to `[x]`.
if: |
github.event_name == 'workflow_dispatch' ||
(github.event_name == 'issue_comment'
-1
View File
@@ -69,4 +69,3 @@ mini-swe-agent/
.nix-stamps/
result
website/static/api/skills-index.json
models-dev-upstream/
+1 -1
View File
@@ -38,7 +38,7 @@ hermes-agent/
│ │ # homeassistant, signal, matrix, mattermost, email, sms,
│ │ # dingtalk, wecom, weixin, feishu, qqbot, bluebubbles,
│ │ # webhook, api_server, ...). See ADDING_A_PLATFORM.md.
│ └── builtin_hooks/ # Extension point for always-registered gateway hooks (none shipped)
│ └── builtin_hooks/ # Always-registered gateway hooks (boot-md, ...)
├── plugins/ # Plugin system (see "Plugins" section below)
│ ├── memory/ # Memory-provider plugins (honcho, mem0, supermemory, ...)
│ ├── context_engine/ # Context-engine plugins
+1 -1
View File
@@ -494,7 +494,7 @@ branding:
agent_name: "My Agent"
welcome: "Welcome message"
response_label: " ⚔ Agent "
prompt_symbol: "⚔"
prompt_symbol: "⚔ "
tool_prefix: "╎" # Tool output line prefix
```
+3 -13
View File
@@ -14,7 +14,7 @@ ENV PLAYWRIGHT_BROWSERS_PATH=/opt/hermes/.playwright
# that would otherwise accumulate when hermes runs as PID 1. See #15012.
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli tini && \
build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli tini && \
rm -rf /var/lib/apt/lists/*
# Non-root user for runtime; UID can be overridden via HERMES_UID at runtime
@@ -30,28 +30,18 @@ WORKDIR /opt/hermes
# unless the lockfiles themselves change.
COPY package.json package-lock.json ./
COPY web/package.json web/package-lock.json web/
COPY ui-tui/package.json ui-tui/package-lock.json ui-tui/
COPY ui-tui/packages/hermes-ink/package.json ui-tui/packages/hermes-ink/package-lock.json ui-tui/packages/hermes-ink/
RUN npm install --prefer-offline --no-audit && \
npx playwright install --with-deps chromium --only-shell && \
(cd web && npm install --prefer-offline --no-audit) && \
(cd ui-tui && npm install --prefer-offline --no-audit) && \
npm cache clean --force
# ---------- Source code ----------
# .dockerignore excludes node_modules, so the installs above survive.
COPY --chown=hermes:hermes . .
# Build browser dashboard and terminal UI assets.
RUN cd web && npm run build && \
cd ../ui-tui && npm run build && \
rm -rf node_modules/@hermes/ink && \
rm -rf packages/hermes-ink/node_modules && \
cp -R packages/hermes-ink node_modules/@hermes/ink && \
npm install --omit=dev --prefer-offline --no-audit --prefix node_modules/@hermes/ink && \
rm -rf node_modules/@hermes/ink/node_modules/react && \
node --input-type=module -e "await import('@hermes/ink')"
# Build web dashboard (Vite outputs to hermes_cli/web_dist/)
RUN cd web && npm run build
# ---------- Permissions ----------
# Make install dir world-readable so any HERMES_UID can read it at runtime.
-11
View File
@@ -112,17 +112,6 @@ def main() -> None:
import acp
from .server import HermesACPAgent
# MCP tool discovery from config.yaml — run before asyncio.run() so
# it's safe to use blocking waits. (ACP also registers per-session
# MCP servers dynamically via asyncio.to_thread inside the event
# loop; that path is unaffected.) Moved from model_tools.py module
# scope to avoid freezing the gateway's loop on lazy import (#16856).
try:
from tools.mcp_tool import discover_mcp_tools
discover_mcp_tools()
except Exception:
logger.debug("MCP tool discovery failed at ACP startup", exc_info=True)
agent = HermesACPAgent()
try:
asyncio.run(acp.run_agent(agent, use_unstable_protocol=True))
+1 -28
View File
@@ -3,7 +3,6 @@
from __future__ import annotations
import asyncio
import contextvars
import logging
import os
from collections import defaultdict, deque
@@ -575,22 +574,6 @@ class HermesACPAgent(acp.Agent):
def _run_agent() -> dict:
nonlocal previous_approval_cb, previous_interactive
# Bind HERMES_SESSION_KEY for this session so per-session caches
# (e.g. the interactive sudo password cache in tools.terminal_tool)
# scope to the ACP session rather than leaking across sessions
# that land on the same reused executor thread. This call runs
# inside a contextvars.copy_context() below, so the ContextVar
# write is isolated from other concurrent ACP sessions.
try:
from gateway.session_context import (
clear_session_vars,
set_session_vars,
)
session_tokens = set_session_vars(session_key=session_id)
except Exception:
session_tokens = None
clear_session_vars = None # type: ignore[assignment]
logger.debug("Could not set ACP session context", exc_info=True)
if approval_cb:
try:
from tools import terminal_tool as _terminal_tool
@@ -624,19 +607,9 @@ class HermesACPAgent(acp.Agent):
_terminal_tool.set_approval_callback(previous_approval_cb)
except Exception:
logger.debug("Could not restore approval callback", exc_info=True)
if session_tokens is not None and clear_session_vars is not None:
try:
clear_session_vars(session_tokens)
except Exception:
logger.debug("Could not clear ACP session context", exc_info=True)
try:
# Wrap the executor call in a fresh copy of the current context so
# concurrent ACP sessions on the shared ThreadPoolExecutor don't
# stomp on each other's ContextVar writes (HERMES_SESSION_KEY in
# particular — used by the interactive sudo password cache scope).
ctx = contextvars.copy_context()
result = await loop.run_in_executor(_executor, ctx.run, _run_agent)
result = await loop.run_in_executor(_executor, _run_agent)
except Exception:
logger.exception("Executor error for session %s", session_id)
return PromptResponse(stop_reason="end_turn")
+78 -155
View File
@@ -22,25 +22,10 @@ from hermes_constants import get_hermes_home
from typing import Any, Dict, List, Optional, Tuple
from utils import normalize_proxy_env_vars
# NOTE: `import anthropic` is deliberately NOT at module top — the SDK pulls
# ~220 ms of imports (anthropic.types, anthropic.lib.tools._beta_runner, etc.)
# and the 3 usage sites (build_anthropic_client, build_anthropic_bedrock_client,
# read_claude_code_credentials_from_keychain) are all on cold user-triggered
# paths. Access via the `_get_anthropic_sdk()` accessor below, which caches
# the module after the first call and returns None on ImportError.
_anthropic_sdk: Any = ... # sentinel — None means "tried and missing"
def _get_anthropic_sdk():
"""Return the ``anthropic`` SDK module, importing lazily. None if not installed."""
global _anthropic_sdk
if _anthropic_sdk is ...:
try:
import anthropic as _sdk
_anthropic_sdk = _sdk
except ImportError:
_anthropic_sdk = None
return _anthropic_sdk
try:
import anthropic as _anthropic_sdk
except ImportError:
_anthropic_sdk = None # type: ignore[assignment]
logger = logging.getLogger(__name__)
@@ -217,33 +202,19 @@ def _forbids_sampling_params(model: str) -> bool:
# Beta headers for enhanced features (sent with ALL auth types).
# As of Opus 4.7 (2026-04-16), the first two are GA on Claude 4.6+ — the
# As of Opus 4.7 (2026-04-16), both of these are GA on Claude 4.6+ — the
# beta headers are still accepted (harmless no-op) but not required. Kept
# here so older Claude (4.5, 4.1) + third-party Anthropic-compat endpoints
# that still gate on the headers continue to get the enhanced features.
#
# ``context-1m-2025-08-07`` unlocks the 1M context window on Claude Opus 4.6/4.7
# and Sonnet 4.6 when served via AWS Bedrock or Azure AI Foundry. 1M is GA on
# native Anthropic (api.anthropic.com) for Opus 4.6+, but Bedrock/Azure still
# gate it behind this beta header as of 2026-04 — without it Bedrock caps Opus
# at 200K even though model_metadata.py advertises 1M. The header is a harmless
# no-op on endpoints where 1M is GA.
#
# Migration guide: remove these if you no longer support ≤4.5 models or once
# Bedrock/Azure promote 1M to GA.
# Migration guide: remove these if you no longer support ≤4.5 models.
_COMMON_BETAS = [
"interleaved-thinking-2025-05-14",
"fine-grained-tool-streaming-2025-05-14",
"context-1m-2025-08-07",
]
# MiniMax's Anthropic-compatible endpoints fail tool-use requests when
# the fine-grained tool streaming beta is present. Omit it so tool calls
# fall back to the provider's default response path.
_TOOL_STREAMING_BETA = "fine-grained-tool-streaming-2025-05-14"
# 1M context beta — see comment on _COMMON_BETAS above. Stripped for
# Bearer-auth (MiniMax) endpoints since they host their own models and
# unknown Anthropic beta headers risk request rejection.
_CONTEXT_1M_BETA = "context-1m-2025-08-07"
# Fast mode beta — enables the ``speed: "fast"`` request parameter for
# significantly higher output token throughput on Opus 4.6 (~2.5x).
@@ -257,11 +228,10 @@ _OAUTH_ONLY_BETAS = [
"oauth-2025-04-20",
]
# Claude Code version — sent on OAuth token-exchange / refresh requests
# (platform.claude.com/v1/oauth/token) as the client's user-agent. Anthropic's
# OAuth flow validates the UA and may reject requests with a version that's
# too old, so detecting dynamically keeps users on a current Claude Code
# install from hitting stale-version errors during login/refresh.
# Claude Code identity — required for OAuth requests to be routed correctly.
# Without these, Anthropic's infrastructure intermittently 500s OAuth traffic.
# The version must stay reasonably current — Anthropic rejects OAuth requests
# when the spoofed user-agent version is too far behind the actual release.
_CLAUDE_CODE_VERSION_FALLBACK = "2.1.74"
_claude_code_version_cache: Optional[str] = None
@@ -269,9 +239,9 @@ _claude_code_version_cache: Optional[str] = None
def _detect_claude_code_version() -> str:
"""Detect the installed Claude Code version, fall back to a static constant.
Used only by the OAuth token-exchange / refresh flow
(``platform.claude.com/v1/oauth/token``). The Messages API client no
longer sends a claude-cli user-agent.
Anthropic's OAuth infrastructure validates the user-agent version and may
reject requests with a version that's too old. Detecting dynamically means
users who keep Claude Code updated never hit stale-version 400s.
"""
import subprocess as _sp
@@ -291,13 +261,12 @@ def _detect_claude_code_version() -> str:
return _CLAUDE_CODE_VERSION_FALLBACK
def _get_claude_code_version() -> str:
"""Lazily detect the installed Claude Code version for OAuth flow headers.
_CLAUDE_CODE_SYSTEM_PREFIX = "You are Claude Code, Anthropic's official CLI for Claude."
_MCP_TOOL_PREFIX = "mcp_"
Used only on the OAuth token-exchange and refresh endpoints
(``platform.claude.com/v1/oauth/token``). The Messages API client does
not send a claude-cli user-agent.
"""
def _get_claude_code_version() -> str:
"""Lazily detect the installed Claude Code version when OAuth headers need it."""
global _claude_code_version_cache
if _claude_code_version_cache is None:
_claude_code_version_cache = _detect_claude_code_version()
@@ -388,14 +357,9 @@ def _common_betas_for_base_url(base_url: str | None) -> list[str]:
that include Anthropic's ``fine-grained-tool-streaming`` beta — every
tool-use message triggers a connection error. Strip that beta for
Bearer-auth endpoints while keeping all other betas intact.
The ``context-1m-2025-08-07`` beta is also stripped for Bearer-auth
endpoints — MiniMax hosts its own models, not Claude, so the header is
irrelevant at best and risks request rejection at worst.
"""
if _requires_bearer_auth(base_url):
_stripped = {_TOOL_STREAMING_BETA, _CONTEXT_1M_BETA}
return [b for b in _COMMON_BETAS if b not in _stripped]
return [b for b in _COMMON_BETAS if b != _TOOL_STREAMING_BETA]
return _COMMON_BETAS
@@ -410,7 +374,6 @@ def build_anthropic_client(api_key: str, base_url: str = None, timeout: float =
Returns an anthropic.Anthropic instance.
"""
_anthropic_sdk = _get_anthropic_sdk()
if _anthropic_sdk is None:
raise ImportError(
"The 'anthropic' package is required for the Anthropic provider. "
@@ -467,21 +430,15 @@ def build_anthropic_client(api_key: str, base_url: str = None, timeout: float =
if common_betas:
kwargs["default_headers"] = {"anthropic-beta": ",".join(common_betas)}
elif _is_oauth_token(api_key):
# OAuth access token / setup-token → Bearer auth + OAuth-only betas.
# The OAuth-specific beta headers are still required by Anthropic's
# OAuth-gated Messages API path; the Claude Code user-agent / x-app
# spoofing is deliberately NOT sent — Hermes identifies as itself.
#
# ``context-1m-2025-08-07`` is stripped here: Anthropic rejects
# OAuth requests that carry it with
# "This authentication style is incompatible with the long
# context beta header."
# Subscription-gated OAuth traffic gets the 200K default window.
oauth_safe_common = [b for b in common_betas if b != _CONTEXT_1M_BETA]
all_betas = oauth_safe_common + _OAUTH_ONLY_BETAS
# OAuth access token / setup-token → Bearer auth + Claude Code identity.
# Anthropic routes OAuth requests based on user-agent and headers;
# without Claude Code's fingerprint, requests get intermittent 500s.
all_betas = common_betas + _OAUTH_ONLY_BETAS
kwargs["auth_token"] = api_key
kwargs["default_headers"] = {
"anthropic-beta": ",".join(all_betas),
"user-agent": f"claude-cli/{_get_claude_code_version()} (external, cli)",
"x-app": "cli",
}
else:
# Regular API key → x-api-key header + common betas
@@ -499,16 +456,8 @@ def build_anthropic_bedrock_client(region: str):
Claude feature parity: prompt caching, thinking budgets, adaptive
thinking, fast mode — features not available via the Converse API.
Attaches the common Anthropic beta headers as client-level defaults so
that Bedrock-hosted Claude models get the same enhanced features as
native Anthropic. The ``context-1m-2025-08-07`` beta in particular
unlocks the 1M context window for Opus 4.6/4.7 on Bedrock — without
it, Bedrock caps these models at 200K even though the Anthropic API
serves them with 1M natively.
Auth uses the boto3 default credential chain (IAM roles, SSO, env vars).
"""
_anthropic_sdk = _get_anthropic_sdk()
if _anthropic_sdk is None:
raise ImportError(
"The 'anthropic' package is required for the Bedrock provider. "
@@ -524,7 +473,6 @@ def build_anthropic_bedrock_client(region: str):
return _anthropic_sdk.AnthropicBedrock(
aws_region=region,
timeout=Timeout(timeout=900.0, connect=10.0),
default_headers={"anthropic-beta": ",".join(_COMMON_BETAS)},
)
@@ -540,6 +488,9 @@ def _read_claude_code_credentials_from_keychain() -> Optional[Dict[str, Any]]:
Returns dict with {accessToken, refreshToken?, expiresAt?} or None.
"""
import platform
import subprocess
if platform.system() != "Darwin":
return None
@@ -825,45 +776,17 @@ def resolve_anthropic_token() -> Optional[str]:
"""Resolve an Anthropic token from all available sources.
Priority:
1. Hermes credential pool (``~/.hermes/auth.json`` →
``credential_pool.anthropic``) — OAuth tokens minted by Hermes'
own PKCE login flow. Entries are auto-refreshed when near
expiry. Env-sourced pool entries (``source="env:..."``) are
skipped here so the env-var priority logic below still runs.
2. ANTHROPIC_TOKEN env var (OAuth/setup token saved by Hermes)
3. CLAUDE_CODE_OAUTH_TOKEN env var
4. Claude Code credentials (~/.claude.json or ~/.claude/.credentials.json)
1. ANTHROPIC_TOKEN env var (OAuth/setup token saved by Hermes)
2. CLAUDE_CODE_OAUTH_TOKEN env var
3. Claude Code credentials (~/.claude.json or ~/.claude/.credentials.json)
— with automatic refresh if expired and a refresh token is available
5. ANTHROPIC_API_KEY env var (regular API key, or legacy fallback)
4. ANTHROPIC_API_KEY env var (regular API key, or legacy fallback)
Returns the token string or None.
"""
# 1. Hermes credential pool — the live source of truth for tokens
# minted via ``hermes login anthropic`` / the dashboard PKCE flow.
# ``select()`` picks the best available entry and refreshes it if
# it's near expiry, so callers always get a fresh token.
#
# Skip env-sourced pool entries (``env:ANTHROPIC_TOKEN``, etc.) —
# those are passthroughs of the env var, and the env-var branches
# below have richer priority logic (``_prefer_refreshable_claude_code_token``)
# that can upgrade a static env OAuth token to a refreshed
# Claude Code token. Letting the pool win here would short-circuit
# that upgrade.
try:
from agent.credential_pool import load_pool
pool = load_pool("anthropic")
entry = pool.select()
if entry and entry.access_token and not entry.source.startswith("env:"):
return entry.access_token
except Exception as exc:
# Pool lookup is best-effort — fall through to env/file sources
# if anything goes wrong (e.g. auth.json corruption during a
# concurrent write).
logger.debug("Credential-pool lookup failed for anthropic: %s", exc)
creds = read_claude_code_credentials()
# 2. Hermes-managed OAuth/setup token env var
# 1. Hermes-managed OAuth/setup token env var
token = os.getenv("ANTHROPIC_TOKEN", "").strip()
if token:
preferred = _prefer_refreshable_claude_code_token(token, creds)
@@ -871,7 +794,7 @@ def resolve_anthropic_token() -> Optional[str]:
return preferred
return token
# 3. CLAUDE_CODE_OAUTH_TOKEN (used by Claude Code for setup-tokens)
# 2. CLAUDE_CODE_OAUTH_TOKEN (used by Claude Code for setup-tokens)
cc_token = os.getenv("CLAUDE_CODE_OAUTH_TOKEN", "").strip()
if cc_token:
preferred = _prefer_refreshable_claude_code_token(cc_token, creds)
@@ -879,12 +802,12 @@ def resolve_anthropic_token() -> Optional[str]:
return preferred
return cc_token
# 4. Claude Code credential file
# 3. Claude Code credential file
resolved_claude_token = _resolve_claude_code_token_from_credentials(creds)
if resolved_claude_token:
return resolved_claude_token
# 5. Regular API key, or a legacy OAuth token saved in ANTHROPIC_API_KEY.
# 4. Regular API key, or a legacy OAuth token saved in ANTHROPIC_API_KEY.
# This remains as a compatibility fallback for pre-migration Hermes configs.
api_key = os.getenv("ANTHROPIC_API_KEY", "").strip()
if api_key:
@@ -1131,33 +1054,6 @@ def _sanitize_tool_id(tool_id: str) -> str:
return sanitized or "tool_0"
def _normalize_tool_input_schema(schema: Any) -> Dict[str, Any]:
"""Normalize tool schemas before sending them to Anthropic.
Anthropic's tool schema validator rejects nullable unions such as
``anyOf: [{"type": "string"}, {"type": "null"}]`` that Pydantic/MCP
commonly emits for optional fields. Tool optionality is represented by
the parent ``required`` array, so we delegate to the shared
``strip_nullable_unions`` helper to collapse nullable unions to the
non-null branch while preserving metadata like description/default.
``keep_nullable_hint=False`` because the Anthropic validator does not
recognize the OpenAPI-style ``nullable: true`` extension and strict
schema-to-grammar converters may reject unknown keywords.
"""
if not schema:
return {"type": "object", "properties": {}}
from tools.schema_sanitizer import strip_nullable_unions
normalized = strip_nullable_unions(schema, keep_nullable_hint=False)
if not isinstance(normalized, dict):
return {"type": "object", "properties": {}}
if normalized.get("type") == "object" and not isinstance(normalized.get("properties"), dict):
normalized = {**normalized, "properties": {}}
return normalized
def convert_tools_to_anthropic(tools: List[Dict]) -> List[Dict]:
"""Convert OpenAI tool definitions to Anthropic format."""
if not tools:
@@ -1168,9 +1064,7 @@ def convert_tools_to_anthropic(tools: List[Dict]) -> List[Dict]:
result.append({
"name": fn.get("name", ""),
"description": fn.get("description", ""),
"input_schema": _normalize_tool_input_schema(
fn.get("parameters", {"type": "object", "properties": {}})
),
"input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
})
return result
@@ -1649,10 +1543,8 @@ def build_anthropic_kwargs(
"max_tokens too large given prompt" errors and retry with a smaller cap
(see parse_available_output_tokens_from_error + _ephemeral_max_output_tokens).
When *is_oauth* is True, enables the OAuth-only beta headers required by
Anthropic's subscription-gated Messages endpoint (fast-mode branch only;
the default headers are set by build_anthropic_client). No system-prompt
or tool-name rewriting is performed — Hermes identifies as itself.
When *is_oauth* is True, applies Claude Code compatibility transforms:
system prompt prefix, tool name prefixing, and prompt sanitization.
When *preserve_dots* is True, model name dots are not converted to hyphens
(for Alibaba/DashScope anthropic-compatible endpoints: qwen3.5-plus).
@@ -1685,11 +1577,45 @@ def build_anthropic_kwargs(
if context_length and effective_max_tokens > context_length:
effective_max_tokens = max(context_length - 1, 1)
# OAuth requests go through Anthropic's subscription-gated Messages
# endpoint but otherwise send the real Hermes system prompt and real
# Hermes tool names — the only OAuth-specific wire differences are
# Bearer auth and the _OAUTH_ONLY_BETAS header (applied in
# build_anthropic_client and the fast-mode branch below).
# ── OAuth: Claude Code identity ──────────────────────────────────
if is_oauth:
# 1. Prepend Claude Code system prompt identity
cc_block = {"type": "text", "text": _CLAUDE_CODE_SYSTEM_PREFIX}
if isinstance(system, list):
system = [cc_block] + system
elif isinstance(system, str) and system:
system = [cc_block, {"type": "text", "text": system}]
else:
system = [cc_block]
# 2. Sanitize system prompt — replace product name references
# to avoid Anthropic's server-side content filters.
for block in system:
if isinstance(block, dict) and block.get("type") == "text":
text = block.get("text", "")
text = text.replace("Hermes Agent", "Claude Code")
text = text.replace("Hermes agent", "Claude Code")
text = text.replace("hermes-agent", "claude-code")
text = text.replace("Nous Research", "Anthropic")
block["text"] = text
# 3. Prefix tool names with mcp_ (Claude Code convention)
if anthropic_tools:
for tool in anthropic_tools:
if "name" in tool:
tool["name"] = _MCP_TOOL_PREFIX + tool["name"]
# 4. Prefix tool names in message history (tool_use and tool_result blocks)
for msg in anthropic_messages:
content = msg.get("content")
if isinstance(content, list):
for block in content:
if isinstance(block, dict):
if block.get("type") == "tool_use" and "name" in block:
if not block["name"].startswith(_MCP_TOOL_PREFIX):
block["name"] = _MCP_TOOL_PREFIX + block["name"]
elif block.get("type") == "tool_result" and "tool_use_id" in block:
pass # tool_result uses ID, not name
kwargs: Dict[str, Any] = {
"model": model,
@@ -1780,9 +1706,6 @@ def build_anthropic_kwargs(
# extra_headers override the client-level anthropic-beta header).
betas = list(_common_betas_for_base_url(base_url))
if is_oauth:
# Strip context-1m — incompatible with OAuth auth. See matching
# comment in build_anthropic_client().
betas = [b for b in betas if b != _CONTEXT_1M_BETA]
betas.extend(_OAUTH_ONLY_BETAS)
betas.append(_FAST_MODE_BETA)
kwargs["extra_headers"] = {"anthropic-beta": ",".join(betas)}
+55 -338
View File
@@ -41,57 +41,10 @@ import threading
import time
from pathlib import Path # noqa: F401 — used by test mocks
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Tuple, TYPE_CHECKING
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import urlparse, parse_qs, urlunparse
# NOTE: `from openai import OpenAI` is deliberately NOT at module top — the
# openai SDK pulls a large type tree (~240 ms cold, including responses/*,
# graders/*). We expose `OpenAI` here as a thin proxy that imports the SDK on
# first call and forwards, so:
# (a) the 15+ in-module `OpenAI(...)` construction sites work unchanged
# (Python's function-scope name lookup resolves `OpenAI` to the proxy
# object bound in module globals here, without triggering any import);
# (b) external code can still do `auxiliary_client.OpenAI` or
# `patch("agent.auxiliary_client.OpenAI", ...)` — tests see the proxy,
# and patch replaces the module attribute as usual;
# (c) `OpenAI` as a type annotation resolves at runtime to the proxy class
# (which is harmless — annotations aren't type-checked at runtime).
# See tests/agent/test_auxiliary_client.py for patch patterns this supports.
if TYPE_CHECKING:
from openai import OpenAI # noqa: F401 — type hints only
_OPENAI_CLS_CACHE: Optional[type] = None
def _load_openai_cls() -> type:
"""Import and cache ``openai.OpenAI``."""
global _OPENAI_CLS_CACHE
if _OPENAI_CLS_CACHE is None:
from openai import OpenAI as _cls
_OPENAI_CLS_CACHE = _cls
return _OPENAI_CLS_CACHE
class _OpenAIProxy:
"""Module-level proxy that looks like the ``openai.OpenAI`` class.
Forwards ``OpenAI(...)`` calls and ``isinstance(x, OpenAI)`` checks to the
real SDK class, importing the SDK lazily on first use.
"""
__slots__ = ()
def __call__(self, *args, **kwargs):
return _load_openai_cls()(*args, **kwargs)
def __instancecheck__(self, obj):
return isinstance(obj, _load_openai_cls())
def __repr__(self):
return "<lazy openai.OpenAI proxy>"
OpenAI = _OpenAIProxy() # module-level name, resolves lazily on call/isinstance
from openai import OpenAI
from agent.credential_pool import load_pool
from hermes_cli.config import get_hermes_home
@@ -129,8 +82,6 @@ _PROVIDER_ALIASES = {
"moonshot": "kimi-coding",
"kimi-cn": "kimi-coding-cn",
"moonshot-cn": "kimi-coding-cn",
"gmi-cloud": "gmi",
"gmicloud": "gmi",
"minimax-china": "minimax-cn",
"minimax_cn": "minimax-cn",
"claude": "anthropic",
@@ -141,10 +92,6 @@ _PROVIDER_ALIASES = {
"github-models": "copilot",
"github-copilot-acp": "copilot-acp",
"copilot-acp-agent": "copilot-acp",
"tencent": "tencent-tokenhub",
"tokenhub": "tencent-tokenhub",
"tencent-cloud": "tencent-tokenhub",
"tencentmaas": "tencent-tokenhub",
}
@@ -208,7 +155,6 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
"kimi-coding": "kimi-k2-turbo-preview",
"stepfun": "step-3.5-flash",
"kimi-coding-cn": "kimi-k2-turbo-preview",
"gmi": "google/gemini-3.1-flash-lite-preview",
"minimax": "MiniMax-M2.7",
"minimax-cn": "MiniMax-M2.7",
"anthropic": "claude-haiku-4-5-20251001",
@@ -217,7 +163,6 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
"opencode-go": "glm-5",
"kilocode": "google/gemini-3-flash-preview",
"ollama-cloud": "nemotron-3-nano:30b",
"tencent-tokenhub": "hy3-preview",
}
# Vision-specific model overrides for direct providers.
@@ -457,33 +402,6 @@ class _CodexCompletionsAdapter:
# Note: the Codex endpoint (chatgpt.com/backend-api/codex) does NOT
# support max_output_tokens or temperature — omit to avoid 400 errors.
# Translate extra_body.reasoning (chat.completions shape) into the
# Responses API's top-level reasoning + include fields. Mirrors
# agent/transports/codex.py::build_kwargs() so auxiliary callers
# that configure reasoning via auxiliary.<task>.extra_body get the
# same behavior as the main agent's Codex transport.
extra_body = kwargs.get("extra_body") or {}
if isinstance(extra_body, dict):
reasoning_cfg = extra_body.get("reasoning")
if isinstance(reasoning_cfg, dict):
if reasoning_cfg.get("enabled") is False:
# Reasoning explicitly disabled — do not set reasoning
# or include. The Codex backend still thinks by
# default, but we honor the caller's intent where the
# API allows it.
pass
else:
effort = reasoning_cfg.get("effort", "medium")
# Codex backend rejects "minimal"; clamp to "low" to
# match the main-agent Codex transport behavior.
if effort == "minimal":
effort = "low"
resp_kwargs["reasoning"] = {
"effort": effort,
"summary": "auto",
}
resp_kwargs["include"] = ["reasoning.encrypted_content"]
# Tools support for auxiliary callers (e.g. skills_hub) that pass function schemas
tools = kwargs.get("tools")
if tools:
@@ -713,7 +631,9 @@ class _AnthropicCompletionsAdapter:
response = self._client.messages.create(**anthropic_kwargs)
_transport = get_transport("anthropic_messages")
_nr = _transport.normalize_response(response)
_nr = _transport.normalize_response(
response, strip_tool_prefix=self._is_oauth
)
# ToolCall already duck-types as OpenAI shape (.type, .function.name,
# .function.arguments) via properties, so no wrapping needed.
@@ -791,116 +711,6 @@ class AsyncAnthropicAuxiliaryClient:
self.base_url = sync_wrapper.base_url
def _endpoint_speaks_anthropic_messages(base_url: str) -> bool:
"""True if the endpoint at ``base_url`` speaks the Anthropic Messages
protocol instead of OpenAI chat.completions.
Mirrors ``hermes_cli.runtime_provider._detect_api_mode_for_url`` so the
auxiliary client and the main agent stay in sync on transport selection.
Covers:
- Any URL ending in ``/anthropic`` (MiniMax, Zhipu GLM, LiteLLM proxies,
Anthropic-compatible gateways).
- ``api.kimi.com/coding`` (Kimi Coding Plan — the /coding route only
speaks Claude-Code's native Anthropic shape; ``chat.completions``
returns 404 on Anthropic-only model aliases like ``kimi-for-coding``).
- ``api.anthropic.com`` (native Anthropic).
"""
normalized = (base_url or "").strip().lower().rstrip("/")
if not normalized:
return False
if normalized.endswith("/anthropic"):
return True
hostname = base_url_hostname(normalized)
if hostname == "api.anthropic.com":
return True
if hostname == "api.kimi.com" and "/coding" in normalized:
return True
return False
def _maybe_wrap_anthropic(
client_obj: Any,
model: str,
api_key: str,
base_url: str,
api_mode: Optional[str] = None,
) -> Any:
"""Rewrap a plain OpenAI client in ``AnthropicAuxiliaryClient`` when
the endpoint actually speaks Anthropic Messages.
This is the single chokepoint for aux-client transport correction.
Runs at the end of every ``resolve_provider_client`` branch so that
api_key providers (Kimi Coding Plan), the ``custom`` endpoint, and
future /anthropic gateways all land on the right wire format
regardless of which branch built the client.
Returns ``client_obj`` unchanged when:
- It's already an Anthropic/Codex/Gemini/CopilotACP wrapper.
- The endpoint is an OpenAI-wire endpoint.
- ``api_mode`` is explicitly set to a non-Anthropic transport.
- The ``anthropic`` SDK is not installed (falls back to OpenAI wire).
"""
# Already wrapped — don't double-wrap.
if isinstance(client_obj, AnthropicAuxiliaryClient):
return client_obj
# Other specialized adapters we should never re-dispatch.
if isinstance(client_obj, CodexAuxiliaryClient):
return client_obj
try:
from agent.gemini_native_adapter import GeminiNativeClient
if isinstance(client_obj, GeminiNativeClient):
return client_obj
except ImportError:
pass
try:
from agent.copilot_acp_client import CopilotACPClient
if isinstance(client_obj, CopilotACPClient):
return client_obj
except ImportError:
pass
# Explicit non-anthropic api_mode wins over URL heuristics.
if api_mode and api_mode != "anthropic_messages":
return client_obj
should_wrap = (
api_mode == "anthropic_messages"
or _endpoint_speaks_anthropic_messages(base_url)
)
if not should_wrap:
return client_obj
try:
from agent.anthropic_adapter import build_anthropic_client
except ImportError:
logger.warning(
"Endpoint %s speaks Anthropic Messages but the anthropic SDK is "
"not installed — falling back to OpenAI-wire (will likely 404).",
base_url,
)
return client_obj
try:
real_client = build_anthropic_client(api_key, base_url)
except Exception as exc:
logger.warning(
"Failed to build Anthropic client for %s (%s) — falling back to "
"OpenAI-wire client.", base_url, exc,
)
return client_obj
logger.debug(
"Auxiliary transport: wrapping client in AnthropicAuxiliaryClient "
"(model=%s, base_url=%s, api_mode=%s)",
model, base_url[:60] if base_url else "", api_mode or "auto-detected",
)
return AnthropicAuxiliaryClient(
real_client, model, api_key, base_url, is_oauth=False,
)
def _read_nous_auth() -> Optional[dict]:
"""Read and validate ~/.hermes/auth.json for an active Nous provider.
@@ -1071,9 +881,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
from hermes_cli.models import copilot_default_headers
extra["default_headers"] = copilot_default_headers()
_client = OpenAI(api_key=api_key, base_url=base_url, **extra)
_client = _maybe_wrap_anthropic(_client, model, api_key, base_url)
return _client, model
return OpenAI(api_key=api_key, base_url=base_url, **extra), model
creds = resolve_api_key_provider_credentials(provider_id)
api_key = str(creds.get("api_key", "")).strip()
@@ -1099,9 +907,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
from hermes_cli.models import copilot_default_headers
extra["default_headers"] = copilot_default_headers()
_client = OpenAI(api_key=api_key, base_url=base_url, **extra)
_client = _maybe_wrap_anthropic(_client, model, api_key, base_url)
return _client, model
return OpenAI(api_key=api_key, base_url=base_url, **extra), model
return None, None
@@ -1385,13 +1191,7 @@ def _try_custom_endpoint() -> Tuple[Optional[Any], Optional[str]]:
AnthropicAuxiliaryClient(real_client, model, custom_key, custom_base, is_oauth=False),
model,
)
# URL-based anthropic detection for custom endpoints that didn't set
# api_mode explicitly (e.g. kimi.com/coding reached via custom config).
_fallback_client = OpenAI(api_key=custom_key, base_url=_clean_base, **_extra)
_fallback_client = _maybe_wrap_anthropic(
_fallback_client, model, custom_key, custom_base, custom_mode,
)
return _fallback_client, model
return OpenAI(api_key=custom_key, base_url=_clean_base, **_extra), model
def _try_codex() -> Tuple[Optional[Any], Optional[str]]:
@@ -1817,14 +1617,8 @@ def _resolve_auto(main_runtime: Optional[Dict[str, Any]] = None) -> Tuple[Option
# below — never look up auth env vars ad-hoc.
def _to_async_client(sync_client, model: str, is_vision: bool = False):
"""Convert a sync client to its async counterpart, preserving Codex routing.
When ``is_vision=True`` and the underlying base URL is Copilot, the
resulting async client carries the ``Copilot-Vision-Request: true``
header so the request is routed to Copilot's vision-capable
infrastructure (otherwise vision payloads silently time out).
"""
def _to_async_client(sync_client, model: str):
"""Convert a sync client to its async counterpart, preserving Codex routing."""
from openai import AsyncOpenAI
if isinstance(sync_client, CodexAuxiliaryClient):
@@ -1853,11 +1647,9 @@ def _to_async_client(sync_client, model: str, is_vision: bool = False):
if base_url_host_matches(sync_base_url, "openrouter.ai"):
async_kwargs["default_headers"] = dict(_OR_HEADERS)
elif base_url_host_matches(sync_base_url, "api.githubcopilot.com"):
from hermes_cli.copilot_auth import copilot_request_headers
from hermes_cli.models import copilot_default_headers
async_kwargs["default_headers"] = copilot_request_headers(
is_agent_turn=True, is_vision=is_vision
)
async_kwargs["default_headers"] = copilot_default_headers()
elif base_url_host_matches(sync_base_url, "api.kimi.com"):
async_kwargs["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
return AsyncOpenAI(**async_kwargs), model
@@ -1884,7 +1676,6 @@ def resolve_provider_client(
explicit_api_key: str = None,
api_mode: str = None,
main_runtime: Optional[Dict[str, Any]] = None,
is_vision: bool = False,
) -> Tuple[Optional[Any], Optional[str]]:
"""Central router: given a provider name and optional model, return a
configured client with the correct auth, base URL, and API format.
@@ -1942,20 +1733,8 @@ def resolve_provider_client(
return True
return False
def _wrap_if_needed(client_obj, final_model_str: str, base_url_str: str = "",
api_key_str: str = ""):
"""Wrap a plain OpenAI client in the correct transport adapter.
Handles two cases:
- ``CodexAuxiliaryClient`` when the endpoint needs the Responses API
(explicit ``api_mode=codex_responses`` or api.openai.com + codex
model name).
- ``AnthropicAuxiliaryClient`` when the endpoint speaks Anthropic
Messages (explicit ``api_mode=anthropic_messages``, any ``/anthropic``
suffix, ``api.kimi.com/coding``, or ``api.anthropic.com``).
Clients that are already specialized wrappers pass through unchanged.
"""
def _wrap_if_needed(client_obj, final_model_str: str, base_url_str: str = ""):
"""Wrap a plain OpenAI client in CodexAuxiliaryClient if Responses API is needed."""
if _needs_codex_wrap(client_obj, base_url_str, final_model_str):
logger.debug(
"resolve_provider_client: wrapping client in CodexAuxiliaryClient "
@@ -1963,11 +1742,7 @@ def resolve_provider_client(
api_mode or "auto-detected", final_model_str,
base_url_str[:60] if base_url_str else "")
return CodexAuxiliaryClient(client_obj, final_model_str)
# Anthropic-wire endpoints: rewrap plain OpenAI clients so
# chat.completions.create() is translated to /v1/messages.
return _maybe_wrap_anthropic(
client_obj, final_model_str, api_key_str, base_url_str, api_mode,
)
return client_obj
# ── Auto: try all providers in priority order ────────────────────
if provider == "auto":
@@ -1984,7 +1759,7 @@ def resolve_provider_client(
"auxiliary provider (using %r instead)", model, resolved)
model = None
final_model = model or resolved
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
# ── OpenRouter ───────────────────────────────────────────────────
@@ -1997,7 +1772,7 @@ def resolve_provider_client(
)
return None, None
final_model = _normalize_resolved_model(model or default, provider)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
# ── Nous Portal (OAuth) ──────────────────────────────────────────
@@ -2014,7 +1789,7 @@ def resolve_provider_client(
"but Nous Portal not configured (run: hermes auth)")
return None, None
final_model = _normalize_resolved_model(model or default, provider)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
# ── OpenAI Codex (OAuth → Responses API) ─────────────────────────
@@ -2041,13 +1816,13 @@ def resolve_provider_client(
"but no Codex OAuth token found (run: hermes model)")
return None, None
final_model = _normalize_resolved_model(model or default, provider)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
# ── Custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY) ───────────
if provider == "custom":
if explicit_base_url:
custom_base = _to_openai_base_url(explicit_base_url).strip()
custom_base = explicit_base_url.strip()
custom_key = (
(explicit_api_key or "").strip()
or os.getenv("OPENAI_API_KEY", "").strip()
@@ -2060,7 +1835,7 @@ def resolve_provider_client(
)
return None, None
final_model = _normalize_resolved_model(
model or (main_runtime.get("model") if main_runtime else None) or "gpt-4o-mini",
model or _read_main_model() or "gpt-4o-mini",
provider,
)
extra = {}
@@ -2070,13 +1845,11 @@ def resolve_provider_client(
if base_url_host_matches(custom_base, "api.kimi.com"):
extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
elif base_url_host_matches(custom_base, "api.githubcopilot.com"):
from hermes_cli.copilot_auth import copilot_request_headers
extra["default_headers"] = copilot_request_headers(
is_agent_turn=True, is_vision=is_vision
)
from hermes_cli.models import copilot_default_headers
extra["default_headers"] = copilot_default_headers()
client = OpenAI(api_key=custom_key, base_url=_clean_base, **extra)
client = _wrap_if_needed(client, final_model, custom_base, custom_key)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
client = _wrap_if_needed(client, final_model, custom_base)
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
# Try custom first, then codex, then API-key providers
for try_fn in (_try_custom_endpoint, _try_codex,
@@ -2085,9 +1858,8 @@ def resolve_provider_client(
if client is not None:
final_model = _normalize_resolved_model(model or default, provider)
_cbase = str(getattr(client, "base_url", "") or "")
_ckey = str(getattr(client, "api_key", "") or "")
client = _wrap_if_needed(client, final_model, _cbase, _ckey)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
client = _wrap_if_needed(client, final_model, _cbase)
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
logger.warning("resolve_provider_client: custom/main requested "
"but no endpoint credentials found")
@@ -2109,22 +1881,10 @@ def resolve_provider_client(
entry_api_mode = (api_mode or custom_entry.get("api_mode") or "").strip()
if custom_base:
final_model = _normalize_resolved_model(
model
or custom_entry.get("model")
or (main_runtime.get("model") if main_runtime else None)
or _read_main_model()
or "gpt-4o-mini",
model or custom_entry.get("model") or _read_main_model() or "gpt-4o-mini",
provider,
)
# anthropic_messages talks to the /anthropic surface directly;
# OpenAI-wire paths (chat_completions / codex_responses) need the
# /v1 equivalent. Rewrite only on the OpenAI-wire path so the
# Anthropic fallback SDK still sees the original URL.
if entry_api_mode == "anthropic_messages":
openai_base = custom_base
else:
openai_base = _to_openai_base_url(custom_base)
_clean_base2, _dq2 = _extract_url_query_params(openai_base)
_clean_base2, _dq2 = _extract_url_query_params(custom_base)
_extra2 = {"default_query": _dq2} if _dq2 else {}
logger.debug(
"resolve_provider_client: named custom provider %r (%s, api_mode=%s)",
@@ -2143,13 +1903,8 @@ def resolve_provider_client(
"installed — falling back to OpenAI-wire.",
provider,
)
# Fallback went OpenAI-wire after all — redo the query
# extraction against the rewritten /v1 URL.
_fallback_base = _to_openai_base_url(custom_base)
_fb_clean, _fb_dq = _extract_url_query_params(_fallback_base)
_fb_extra = {"default_query": _fb_dq} if _fb_dq else {}
client = OpenAI(api_key=custom_key, base_url=_fb_clean, **_fb_extra)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
client = OpenAI(api_key=custom_key, base_url=_clean_base2, **_extra2)
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
sync_anthropic = AnthropicAuxiliaryClient(
real_client, final_model, custom_key, custom_base, is_oauth=False,
@@ -2167,8 +1922,8 @@ def resolve_provider_client(
):
client = CodexAuxiliaryClient(client, final_model)
else:
client = _wrap_if_needed(client, final_model, openai_base, custom_key)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
client = _wrap_if_needed(client, final_model, custom_base)
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
logger.warning(
"resolve_provider_client: named custom provider %r has no base_url",
@@ -2200,7 +1955,7 @@ def resolve_provider_client(
logger.warning("resolve_provider_client: anthropic requested but no Anthropic credentials found")
return None, None
final_model = _normalize_resolved_model(model or default_model, provider)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode else (client, final_model))
return (_to_async_client(client, final_model) if async_mode else (client, final_model))
creds = resolve_api_key_provider_credentials(provider)
api_key = str(creds.get("api_key", "")).strip()
@@ -2226,7 +1981,7 @@ def resolve_provider_client(
if is_native_gemini_base_url(base_url):
client = GeminiNativeClient(api_key=api_key, base_url=base_url)
logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
# Provider-specific headers
@@ -2234,11 +1989,9 @@ def resolve_provider_client(
if base_url_host_matches(base_url, "api.kimi.com"):
headers["User-Agent"] = "claude-code/0.1.0"
elif base_url_host_matches(base_url, "api.githubcopilot.com"):
from hermes_cli.copilot_auth import copilot_request_headers
from hermes_cli.models import copilot_default_headers
headers.update(copilot_request_headers(
is_agent_turn=True, is_vision=is_vision
))
headers.update(copilot_default_headers())
client = OpenAI(api_key=api_key, base_url=base_url,
**({"default_headers": headers} if headers else {}))
@@ -2260,24 +2013,16 @@ def resolve_provider_client(
# Honor api_mode for any API-key provider (e.g. direct OpenAI with
# codex-family models). The copilot-specific wrapping above handles
# copilot; this covers the general case (#6800). Also rewraps
# Anthropic-wire endpoints (Kimi Coding Plan api.kimi.com/coding,
# /anthropic-suffixed gateways) so named providers like kimi-coding
# land on the right transport without needing per-provider branches.
client = _wrap_if_needed(client, final_model, base_url, api_key)
# copilot; this covers the general case (#6800).
client = _wrap_if_needed(client, final_model, base_url)
logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
if pconfig.auth_type == "external_process":
creds = resolve_external_process_provider_credentials(provider)
final_model = _normalize_resolved_model(
model
or (main_runtime.get("model") if main_runtime else None)
or _read_main_model(),
provider,
)
final_model = _normalize_resolved_model(model or _read_main_model(), provider)
if provider == "copilot-acp":
api_key = str(creds.get("api_key", "")).strip()
base_url = str(creds.get("base_url", "")).strip()
@@ -2304,7 +2049,7 @@ def resolve_provider_client(
args=args,
)
logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
logger.warning("resolve_provider_client: external-process provider %s not "
"directly supported", provider)
@@ -2340,7 +2085,7 @@ def resolve_provider_client(
base_url=f"https://bedrock-runtime.{region}.amazonaws.com",
)
logger.debug("resolve_provider_client: bedrock (%s, %s)", final_model, region)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
elif pconfig.auth_type in ("oauth_device_code", "oauth_external"):
@@ -2415,13 +2160,8 @@ def _normalize_vision_provider(provider: Optional[str]) -> str:
return _normalize_aux_provider(provider)
def _resolve_strict_vision_backend(
provider: str,
model: Optional[str] = None,
) -> Tuple[Optional[Any], Optional[str]]:
def _resolve_strict_vision_backend(provider: str) -> Tuple[Optional[Any], Optional[str]]:
provider = _normalize_vision_provider(provider)
if provider == "copilot":
return resolve_provider_client("copilot", model, is_vision=True)
if provider == "openrouter":
return _try_openrouter()
if provider == "nous":
@@ -2489,7 +2229,7 @@ def resolve_vision_provider_client(
return resolved_provider, None, None
final_model = resolved_model or default_model
if async_mode:
async_client, async_model = _to_async_client(sync_client, final_model, is_vision=True)
async_client, async_model = _to_async_client(sync_client, final_model)
return resolved_provider, async_client, async_model
return resolved_provider, sync_client, final_model
@@ -2521,11 +2261,8 @@ def resolve_vision_provider_client(
main_provider = _read_main_provider()
main_model = _read_main_model()
if main_provider and main_provider not in ("auto", ""):
vision_model = _PROVIDER_VISION_MODELS.get(main_provider, main_model)
if main_provider == "nous":
sync_client, default_model = _resolve_strict_vision_backend(
main_provider, vision_model
)
sync_client, default_model = _resolve_strict_vision_backend(main_provider)
if sync_client is not None:
logger.info(
"Vision auto-detect: using main provider %s (%s)",
@@ -2533,10 +2270,10 @@ def resolve_vision_provider_client(
)
return _finalize(main_provider, sync_client, default_model)
else:
vision_model = _PROVIDER_VISION_MODELS.get(main_provider, main_model)
rpc_client, rpc_model = resolve_provider_client(
main_provider, vision_model,
api_mode=resolved_api_mode,
is_vision=True)
api_mode=resolved_api_mode)
if rpc_client is not None:
logger.info(
"Vision auto-detect: using main provider %s (%s)",
@@ -2558,14 +2295,11 @@ def resolve_vision_provider_client(
return None, None, None
if requested in _VISION_AUTO_PROVIDER_ORDER:
sync_client, default_model = _resolve_strict_vision_backend(
requested, resolved_model
)
sync_client, default_model = _resolve_strict_vision_backend(requested)
return _finalize(requested, sync_client, default_model)
client, final_model = _get_cached_client(requested, resolved_model, async_mode,
api_mode=resolved_api_mode,
is_vision=True)
api_mode=resolved_api_mode)
if client is None:
return requested, None, None
return requested, client, final_model
@@ -2629,11 +2363,10 @@ def _client_cache_key(
api_key: Optional[str] = None,
api_mode: Optional[str] = None,
main_runtime: Optional[Dict[str, Any]] = None,
is_vision: bool = False,
) -> tuple:
runtime = _normalize_main_runtime(main_runtime)
runtime_key = tuple(runtime.get(field, "") for field in _MAIN_RUNTIME_FIELDS) if provider == "auto" else ()
return (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key, is_vision)
return (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key)
def _store_cached_client(cache_key: tuple, client: Any, default_model: Optional[str], *, bound_loop: Any = None) -> None:
@@ -2659,7 +2392,6 @@ def _refresh_nous_auxiliary_client(
api_key: Optional[str] = None,
api_mode: Optional[str] = None,
main_runtime: Optional[Dict[str, Any]] = None,
is_vision: bool = False,
) -> Tuple[Optional[Any], Optional[str]]:
"""Refresh Nous runtime creds, rebuild the client, and replace the cache entry."""
runtime = _resolve_nous_runtime_api(force_refresh=True)
@@ -2677,7 +2409,7 @@ def _refresh_nous_auxiliary_client(
current_loop = _aio.get_event_loop()
except RuntimeError:
pass
client, final_model = _to_async_client(sync_client, final_model or "", is_vision=is_vision)
client, final_model = _to_async_client(sync_client, final_model or "")
else:
client = sync_client
@@ -2688,7 +2420,6 @@ def _refresh_nous_auxiliary_client(
api_key=api_key,
api_mode=api_mode,
main_runtime=main_runtime,
is_vision=is_vision,
)
_store_cached_client(cache_key, client, final_model, bound_loop=current_loop)
return client, final_model
@@ -2800,19 +2531,12 @@ def _is_openrouter_client(client: Any) -> bool:
return False
def _cached_client_accepts_slash_models(client: Any, cached_default: Optional[str]) -> bool:
"""Best-effort check for cached clients that accept ``vendor/model`` IDs."""
if _is_openrouter_client(client):
return True
return bool(cached_default and "/" in cached_default)
def _compat_model(client: Any, model: Optional[str], cached_default: Optional[str]) -> Optional[str]:
"""Keep slash-bearing model IDs only for cached clients that support them.
"""Drop OpenRouter-format model slugs (with '/') for non-OpenRouter clients.
Mirrors the guard in resolve_provider_client() which is skipped on cache hits.
"""
if model and "/" in model and not _cached_client_accepts_slash_models(client, cached_default):
if model and "/" in model and not _is_openrouter_client(client):
return cached_default
return model or cached_default
@@ -2825,7 +2549,6 @@ def _get_cached_client(
api_key: str = None,
api_mode: str = None,
main_runtime: Optional[Dict[str, Any]] = None,
is_vision: bool = False,
) -> Tuple[Optional[Any], Optional[str]]:
"""Get or create a cached client for the given provider.
@@ -2862,7 +2585,6 @@ def _get_cached_client(
api_key=api_key,
api_mode=api_mode,
main_runtime=main_runtime,
is_vision=is_vision,
)
with _client_cache_lock:
if cache_key in _client_cache:
@@ -2894,7 +2616,6 @@ def _get_cached_client(
explicit_api_key=api_key,
api_mode=api_mode,
main_runtime=runtime,
is_vision=is_vision,
)
if client is not None:
# For async clients, remember which loop they were created on so we
@@ -3358,7 +3079,6 @@ def call_llm(
api_key=resolved_api_key,
api_mode=resolved_api_mode,
main_runtime=main_runtime,
is_vision=(task == "vision"),
)
if refreshed_client is not None:
logger.info("Auxiliary %s: refreshed Nous runtime credentials after 401, retrying",
@@ -3649,7 +3369,6 @@ async def async_call_llm(
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
is_vision=(task == "vision"),
)
if refreshed_client is not None:
logger.info("Auxiliary %s (async): refreshed Nous runtime credentials after 401, retrying",
@@ -3718,9 +3437,7 @@ async def async_call_llm(
extra_body=effective_extra_body,
base_url=str(getattr(fb_client, "base_url", "") or ""))
# Convert sync fallback client to async
async_fb, async_fb_model = _to_async_client(
fb_client, fb_model or "", is_vision=(task == "vision")
)
async_fb, async_fb_model = _to_async_client(fb_client, fb_model or "")
if async_fb_model and async_fb_model != fb_kwargs.get("model"):
fb_kwargs["model"] = async_fb_model
return _validate_llm_response(
+3 -41
View File
@@ -291,52 +291,14 @@ def has_aws_credentials(env: Optional[Dict[str, str]] = None) -> bool:
def resolve_bedrock_region(env: Optional[Dict[str, str]] = None) -> str:
"""Resolve the AWS region for Bedrock API calls.
Priority:
1. AWS_REGION env var
2. AWS_DEFAULT_REGION env var
3. boto3/botocore configured region (from ~/.aws/config or SSO profile)
4. us-east-1 (hard fallback)
The boto3 fallback is critical for EU/AP users who configure their region
in ~/.aws/config via a named profile rather than env vars — without it,
live model discovery would always return us.* profile IDs regardless of
the user's actual region.
Priority: AWS_REGION → AWS_DEFAULT_REGION → us-east-1 (fallback).
"""
env = env if env is not None else os.environ
explicit = (
return (
env.get("AWS_REGION", "").strip()
or env.get("AWS_DEFAULT_REGION", "").strip()
or "us-east-1"
)
if explicit:
return explicit
try:
import botocore.session
region = botocore.session.get_session().get_config_variable("region")
if region:
return region
except Exception:
pass
return "us-east-1"
def bedrock_model_ids_or_none() -> Optional[List[str]]:
"""Live-discover Bedrock model IDs for the active region.
Returns a list of model ID strings if discovery succeeds and yields
at least one model, or ``None`` on failure / empty result. Callers
should fall back to the static curated list when ``None`` is returned.
This helper consolidates the discover → extract-ids → fallback
pattern that was previously duplicated across ``provider_model_ids``,
``list_authenticated_providers`` section 2, and section 3.
"""
try:
discovered = discover_bedrock_models(resolve_bedrock_region())
if discovered:
return [m["id"] for m in discovered]
except Exception:
pass
return None
# ---------------------------------------------------------------------------
+5 -113
View File
@@ -61,52 +61,9 @@ _PRUNED_TOOL_PLACEHOLDER = "[Old tool output cleared to save context space]"
# Chars per token rough estimate
_CHARS_PER_TOKEN = 4
# Flat token cost per attached image part. Real cost varies by provider and
# dimensions (Anthropic ≈ width×height/750, GPT-4o up to ~1700 for
# high-detail 2048×2048, Gemini 258/tile), but 1600 is a realistic ceiling
# that keeps compression budgeting honest for multi-image conversations.
# Matches Claude Code's IMAGE_TOKEN_ESTIMATE constant.
_IMAGE_TOKEN_ESTIMATE = 1600
# Same figure expressed in the char-budget currency the rest of the
# compressor speaks in. Used when accumulating message "content length"
# for tail-cut decisions.
_IMAGE_CHAR_EQUIVALENT = _IMAGE_TOKEN_ESTIMATE * _CHARS_PER_TOKEN
_SUMMARY_FAILURE_COOLDOWN_SECONDS = 600
def _content_length_for_budget(raw_content: Any) -> int:
"""Return the effective char-length of a message's content for token budgeting.
Plain strings: ``len(content)``. Multimodal lists: sum of text-part
``len(text)`` plus a flat ``_IMAGE_CHAR_EQUIVALENT`` per image part
(``image_url`` / ``input_image`` / Anthropic-style ``image``). This
keeps the compressor from treating a turn with 5 attached images as
near-zero tokens just because the text part is empty.
"""
if isinstance(raw_content, str):
return len(raw_content)
if not isinstance(raw_content, list):
return len(str(raw_content or ""))
total = 0
for p in raw_content:
if isinstance(p, str):
total += len(p)
continue
if not isinstance(p, dict):
total += len(str(p))
continue
ptype = p.get("type")
if ptype in {"image_url", "input_image", "image"}:
total += _IMAGE_CHAR_EQUIVALENT
else:
# text / input_text / tool_result-with-text / anything else with
# a text field. Ignore the raw base64 payload inside image_url
# dicts — dimensions don't matter, only whether it's an image.
total += len(p.get("text", "") or "")
return total
def _content_text_for_contains(content: Any) -> str:
"""Return a best-effort text view of message content.
@@ -338,10 +295,6 @@ class ContextCompressor(ContextEngine):
self._context_probe_persistable = False
self._previous_summary = None
self._last_summary_error = None
self._last_summary_dropped_count = 0
self._last_summary_fallback_used = False
self._last_aux_model_failure_error = None
self._last_aux_model_failure_model = None
self._last_compression_savings_pct = 100.0
self._ineffective_compression_count = 0
@@ -445,17 +398,6 @@ class ContextCompressor(ContextEngine):
self._ineffective_compression_count: int = 0
self._summary_failure_cooldown_until: float = 0.0
self._last_summary_error: Optional[str] = None
# When summary generation fails and a static fallback is inserted,
# record how many turns were unrecoverably dropped so callers
# (gateway hygiene, /compress) can surface a visible warning.
self._last_summary_dropped_count: int = 0
self._last_summary_fallback_used: bool = False
# When a user-configured summary model fails and we recover by
# retrying on the main model, record the failure so gateway /
# CLI callers can still warn the user even though compression
# succeeded. Silent recovery would hide the broken config.
self._last_aux_model_failure_error: Optional[str] = None
self._last_aux_model_failure_model: Optional[str] = None
def update_from_response(self, usage: Dict[str, Any]):
"""Update tracked token usage from API response."""
@@ -542,7 +484,7 @@ class ContextCompressor(ContextEngine):
for i in range(len(result) - 1, -1, -1):
msg = result[i]
raw_content = msg.get("content") or ""
content_len = _content_length_for_budget(raw_content)
content_len = sum(len(p.get("text", "")) for p in raw_content) if isinstance(raw_content, list) else len(raw_content)
msg_tokens = content_len // _CHARS_PER_TOKEN + 10
for tc in msg.get("tool_calls") or []:
if isinstance(tc, dict):
@@ -915,50 +857,10 @@ The user has requested that this compaction PRIORITISE preserving all informatio
"Falling back to main model '%s' for compression.",
self.summary_model, e, self.model,
)
# Record the aux-model failure so callers can warn the user
# even if the retry-on-main succeeds — a misconfigured aux
# model is something the user needs to fix.
_err_text = str(e).strip() or e.__class__.__name__
if len(_err_text) > 220:
_err_text = _err_text[:217].rstrip() + "..."
self._last_aux_model_failure_error = _err_text
self._last_aux_model_failure_model = self.summary_model
self.summary_model = "" # empty = use main model
self._summary_failure_cooldown_until = 0.0 # no cooldown
return self._generate_summary(turns_to_summarize, focus_topic=focus_topic) # retry immediately
# Unknown-error best-effort retry on main model. Losing N turns of
# context is almost always worse than one extra summary attempt, so
# if we haven't already fallen back and the summary model differs
# from the main model, try once more on main before entering
# cooldown. Errors that DID match _is_model_not_found above are
# already handled by the fast-path retry; this branch catches
# everything else (400s, provider-specific "no route" strings,
# aggregator rejections, etc.) where auto-retry is still safer
# than dropping the turns.
if (
self.summary_model
and self.summary_model != self.model
and not getattr(self, "_summary_model_fallen_back", False)
):
self._summary_model_fallen_back = True
logging.warning(
"Summary model '%s' failed (%s). "
"Retrying on main model '%s' before giving up.",
self.summary_model, e, self.model,
)
# Record the aux-model failure (see 404 branch above) — user
# should know their configured model is broken even if main
# recovers the call.
_err_text = str(e).strip() or e.__class__.__name__
if len(_err_text) > 220:
_err_text = _err_text[:217].rstrip() + "..."
self._last_aux_model_failure_error = _err_text
self._last_aux_model_failure_model = self.summary_model
self.summary_model = "" # empty = use main model
self._summary_failure_cooldown_until = 0.0
return self._generate_summary(turns_to_summarize, focus_topic=focus_topic)
# Transient errors (timeout, rate limit, network) — shorter cooldown
_transient_cooldown = 60
self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown
@@ -1180,9 +1082,8 @@ The user has requested that this compaction PRIORITISE preserving all informatio
for i in range(n - 1, head_end - 1, -1):
msg = messages[i]
raw_content = msg.get("content") or ""
content_len = _content_length_for_budget(raw_content)
msg_tokens = content_len // _CHARS_PER_TOKEN + 10 # +10 for role/metadata
content = msg.get("content") or ""
msg_tokens = len(content) // _CHARS_PER_TOKEN + 10 # +10 for role/metadata
# Include tool call arguments in estimate
for tc in msg.get("tool_calls") or []:
if isinstance(tc, dict):
@@ -1251,13 +1152,6 @@ The user has requested that this compaction PRIORITISE preserving all informatio
related to this topic and be more aggressive about compressing
everything else. Inspired by Claude Code's ``/compact``.
"""
# Reset per-call summary failure state — callers inspect these fields
# after compress() returns to decide whether to surface a warning.
self._last_summary_dropped_count = 0
self._last_summary_fallback_used = False
self._last_summary_error = None
self._last_aux_model_failure_error = None
self._last_aux_model_failure_model = None
n_messages = len(messages)
# Only need head + 3 tail messages minimum (token budget decides the real tail size)
_min_for_compress = self.protect_first_n + 3 + 1
@@ -1336,13 +1230,11 @@ The user has requested that this compaction PRIORITISE preserving all informatio
if not self.quiet_mode:
logger.warning("Summary generation failed — inserting static fallback context marker")
n_dropped = compress_end - compress_start
self._last_summary_dropped_count = n_dropped
self._last_summary_fallback_used = True
summary = (
f"{SUMMARY_PREFIX}\n"
f"Summary generation was unavailable. {n_dropped} message(s) were "
f"Summary generation was unavailable. {n_dropped} conversation turns were "
f"removed to free context space but could not be summarized. The removed "
f"messages contained earlier work in this session. Continue based on the "
f"turns contained earlier work in this session. Continue based on the "
f"recent messages below and the current state of any files or resources."
)
+4 -82
View File
@@ -7,13 +7,13 @@ import random
import threading
import time
import uuid
import os
import re
from dataclasses import dataclass, fields, replace
from datetime import datetime
from typing import Any, Dict, List, Optional, Set, Tuple
from hermes_constants import OPENROUTER_BASE_URL
from hermes_cli.config import get_env_value
import hermes_cli.auth as auth_mod
from hermes_cli.auth import (
CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
@@ -455,70 +455,6 @@ class CredentialPool:
logger.debug("Failed to sync from credentials file: %s", exc)
return entry
def _sync_codex_entry_from_auth_store(self, entry: PooledCredential) -> PooledCredential:
"""Sync a Codex device_code pool entry from auth.json if tokens differ.
When a Codex OAuth access token expires (or the ChatGPT account hits
its 5h/weekly quota), the pool entry gets marked ``STATUS_EXHAUSTED``
with a ``last_error_reset_at`` that can be many hours in the future.
Meanwhile the user may run ``hermes model`` / ``hermes auth`` which
performs a fresh device-code login and writes new tokens to
``auth.json`` under ``_auth_store_lock``. Without this sync the pool
entry stays frozen until ``last_error_reset_at`` elapses — even
though fresh credentials are sitting on disk — and every request
fails with "no available entries (all exhausted or empty)".
Mirrors the Nous/Anthropic resync paths above. Only applies to
device_code-sourced entries; env/API-key-sourced entries have no
auth.json shadow to sync from.
"""
if self.provider != "openai-codex" or entry.source != "device_code":
return entry
try:
with _auth_store_lock():
auth_store = _load_auth_store()
state = _load_provider_state(auth_store, "openai-codex")
if not isinstance(state, dict):
return entry
tokens = state.get("tokens")
if not isinstance(tokens, dict):
return entry
store_access = tokens.get("access_token", "")
store_refresh = tokens.get("refresh_token", "")
# Adopt auth.json tokens when either side differs. Codex refresh
# tokens are single-use too, so a fresh refresh_token from
# another process means our entry's pair is consumed/stale.
entry_access = entry.access_token or ""
entry_refresh = entry.refresh_token or ""
if store_access and (
store_access != entry_access
or (store_refresh and store_refresh != entry_refresh)
):
logger.debug(
"Pool entry %s: syncing Codex tokens from auth.json "
"(refreshed by another process)",
entry.id,
)
field_updates: Dict[str, Any] = {
"access_token": store_access,
"refresh_token": store_refresh or entry.refresh_token,
"last_status": None,
"last_status_at": None,
"last_error_code": None,
"last_error_reason": None,
"last_error_message": None,
"last_error_reset_at": None,
}
if state.get("last_refresh"):
field_updates["last_refresh"] = state["last_refresh"]
updated = replace(entry, **field_updates)
self._replace_entry(entry, updated)
self._persist()
return updated
except Exception as exc:
logger.debug("Failed to sync Codex entry from auth.json: %s", exc)
return entry
def _sync_nous_entry_from_auth_store(self, entry: PooledCredential) -> PooledCredential:
"""Sync a Nous pool entry from auth.json if tokens differ.
@@ -851,18 +787,6 @@ class CredentialPool:
if synced is not entry:
entry = synced
cleared_any = True
# For openai-codex entries, same pattern: the user may have
# re-authed via `hermes model` / `hermes auth` after a 429/401,
# leaving fresh tokens on disk while the pool entry is still
# frozen behind last_error_reset_at (can be hours in the
# future for ChatGPT weekly windows).
if (self.provider == "openai-codex"
and entry.source == "device_code"
and entry.last_status == STATUS_EXHAUSTED):
synced = self._sync_codex_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
cleared_any = True
if entry.last_status == STATUS_EXHAUSTED:
exhausted_until = _exhausted_until(entry)
if exhausted_until is not None and now < exhausted_until:
@@ -1349,8 +1273,7 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
def _is_source_suppressed(_p, _s): # type: ignore[misc]
return False
if provider == "openrouter":
# Check both os.environ and ~/.hermes/.env file
token = (get_env_value("OPENROUTER_API_KEY") or "").strip()
token = os.getenv("OPENROUTER_API_KEY", "").strip()
if token:
source = "env:OPENROUTER_API_KEY"
if _is_source_suppressed(provider, source):
@@ -1376,7 +1299,7 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
env_url = ""
if pconfig.base_url_env_var:
env_url = (get_env_value(pconfig.base_url_env_var) or "").strip().rstrip("/")
env_url = os.getenv(pconfig.base_url_env_var, "").strip().rstrip("/")
env_vars = list(pconfig.api_key_env_vars)
if provider == "anthropic":
@@ -1387,8 +1310,7 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
]
for env_var in env_vars:
# Check both os.environ and ~/.hermes/.env file
token = (get_env_value(env_var) or "").strip()
token = os.getenv(env_var, "").strip()
if not token:
continue
source = f"env:{env_var}"
+1
View File
@@ -47,6 +47,7 @@ from __future__ import annotations
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, List, Optional
-561
View File
@@ -1,561 +0,0 @@
"""Curator — background skill maintenance orchestrator.
The curator is an auxiliary-model task that periodically reviews agent-created
skills and maintains the collection. It runs inactivity-triggered (no cron
daemon): when the agent is idle and the last curator run was longer than
``interval_hours`` ago, ``maybe_run_curator()`` spawns a forked AIAgent to do
the review.
Responsibilities:
- Auto-transition lifecycle states based on last_used_at timestamps
- Spawn a background review agent that can pin / archive / consolidate /
patch agent-created skills via skill_manage
- Persist curator state (last_run_at, paused, etc.) in .curator_state
Strict invariants:
- Only touches agent-created skills (see tools/skill_usage.is_agent_created)
- Never auto-deletes — only archives. Archive is recoverable.
- Pinned skills bypass all auto-transitions
- Uses the auxiliary client; never touches the main session's prompt cache
"""
from __future__ import annotations
import json
import logging
import os
import tempfile
import threading
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any, Callable, Dict, Optional
from hermes_constants import get_hermes_home
from tools import skill_usage
logger = logging.getLogger(__name__)
DEFAULT_INTERVAL_HOURS = 24 * 7 # 7 days
DEFAULT_MIN_IDLE_HOURS = 2
DEFAULT_STALE_AFTER_DAYS = 30
DEFAULT_ARCHIVE_AFTER_DAYS = 90
# ---------------------------------------------------------------------------
# .curator_state — persistent scheduler + status
# ---------------------------------------------------------------------------
def _state_file() -> Path:
return get_hermes_home() / "skills" / ".curator_state"
def _default_state() -> Dict[str, Any]:
return {
"last_run_at": None,
"last_run_duration_seconds": None,
"last_run_summary": None,
"paused": False,
"run_count": 0,
}
def load_state() -> Dict[str, Any]:
path = _state_file()
if not path.exists():
return _default_state()
try:
data = json.loads(path.read_text(encoding="utf-8"))
if isinstance(data, dict):
base = _default_state()
base.update({k: v for k, v in data.items() if k in base or k.startswith("_")})
return base
except (OSError, json.JSONDecodeError) as e:
logger.debug("Failed to read curator state: %s", e)
return _default_state()
def save_state(data: Dict[str, Any]) -> None:
path = _state_file()
try:
path.parent.mkdir(parents=True, exist_ok=True)
fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=".curator_state_", suffix=".tmp")
try:
with os.fdopen(fd, "w", encoding="utf-8") as f:
json.dump(data, f, indent=2, sort_keys=True, ensure_ascii=False)
f.flush()
os.fsync(f.fileno())
os.replace(tmp, path)
except BaseException:
try:
os.unlink(tmp)
except OSError:
pass
raise
except Exception as e:
logger.debug("Failed to save curator state: %s", e, exc_info=True)
def set_paused(paused: bool) -> None:
state = load_state()
state["paused"] = bool(paused)
save_state(state)
def is_paused() -> bool:
return bool(load_state().get("paused"))
# ---------------------------------------------------------------------------
# Config access
# ---------------------------------------------------------------------------
def _load_config() -> Dict[str, Any]:
"""Read curator.* config from ~/.hermes/config.yaml. Tolerates missing file."""
try:
from hermes_cli.config import load_config
cfg = load_config()
except Exception as e:
logger.debug("Failed to load config for curator: %s", e)
return {}
if not isinstance(cfg, dict):
return {}
cur = cfg.get("curator") or {}
if not isinstance(cur, dict):
return {}
return cur
def is_enabled() -> bool:
"""Default ON when no config says otherwise."""
cfg = _load_config()
return bool(cfg.get("enabled", True))
def get_interval_hours() -> int:
cfg = _load_config()
try:
return int(cfg.get("interval_hours", DEFAULT_INTERVAL_HOURS))
except (TypeError, ValueError):
return DEFAULT_INTERVAL_HOURS
def get_min_idle_hours() -> float:
cfg = _load_config()
try:
return float(cfg.get("min_idle_hours", DEFAULT_MIN_IDLE_HOURS))
except (TypeError, ValueError):
return DEFAULT_MIN_IDLE_HOURS
def get_stale_after_days() -> int:
cfg = _load_config()
try:
return int(cfg.get("stale_after_days", DEFAULT_STALE_AFTER_DAYS))
except (TypeError, ValueError):
return DEFAULT_STALE_AFTER_DAYS
def get_archive_after_days() -> int:
cfg = _load_config()
try:
return int(cfg.get("archive_after_days", DEFAULT_ARCHIVE_AFTER_DAYS))
except (TypeError, ValueError):
return DEFAULT_ARCHIVE_AFTER_DAYS
# ---------------------------------------------------------------------------
# Idle / interval check
# ---------------------------------------------------------------------------
def _parse_iso(ts: Optional[str]) -> Optional[datetime]:
if not ts:
return None
try:
return datetime.fromisoformat(ts)
except (TypeError, ValueError):
return None
def should_run_now(now: Optional[datetime] = None) -> bool:
"""Return True if the curator should run immediately.
Gates:
- curator.enabled == True
- not paused
- last_run_at missing, OR older than interval_hours
The idle check (min_idle_hours) is applied at the call site where we know
whether an agent is actively running — here we only enforce the static
gates.
"""
if not is_enabled():
return False
if is_paused():
return False
state = load_state()
last = _parse_iso(state.get("last_run_at"))
if last is None:
return True
if now is None:
now = datetime.now(timezone.utc)
if last.tzinfo is None:
last = last.replace(tzinfo=timezone.utc)
interval = timedelta(hours=get_interval_hours())
return (now - last) >= interval
# ---------------------------------------------------------------------------
# Automatic state transitions (pure function, no LLM)
# ---------------------------------------------------------------------------
def apply_automatic_transitions(now: Optional[datetime] = None) -> Dict[str, int]:
"""Walk every agent-created skill and move active/stale/archived based on
last_used_at. Pinned skills are never touched. Returns a counter dict
describing what changed."""
from tools import skill_usage as _u
if now is None:
now = datetime.now(timezone.utc)
stale_cutoff = now - timedelta(days=get_stale_after_days())
archive_cutoff = now - timedelta(days=get_archive_after_days())
counts = {"marked_stale": 0, "archived": 0, "reactivated": 0, "checked": 0}
for row in _u.agent_created_report():
counts["checked"] += 1
name = row["name"]
if row.get("pinned"):
continue
last_used = _parse_iso(row.get("last_used_at"))
# If never used, treat as using created_at as the anchor so new skills
# don't immediately archive themselves.
anchor = last_used or _parse_iso(row.get("created_at")) or now
if anchor.tzinfo is None:
anchor = anchor.replace(tzinfo=timezone.utc)
current = row.get("state", _u.STATE_ACTIVE)
if anchor <= archive_cutoff and current != _u.STATE_ARCHIVED:
ok, _msg = _u.archive_skill(name)
if ok:
counts["archived"] += 1
elif anchor <= stale_cutoff and current == _u.STATE_ACTIVE:
_u.set_state(name, _u.STATE_STALE)
counts["marked_stale"] += 1
elif anchor > stale_cutoff and current == _u.STATE_STALE:
# Skill got used again after being marked stale — reactivate.
_u.set_state(name, _u.STATE_ACTIVE)
counts["reactivated"] += 1
return counts
# ---------------------------------------------------------------------------
# Review prompt for the forked agent
# ---------------------------------------------------------------------------
CURATOR_REVIEW_PROMPT = (
"You are running as Hermes' background skill CURATOR. This is an "
"UMBRELLA-BUILDING consolidation pass, not a passive audit and not a "
"duplicate-finder.\n\n"
"The goal of the skill collection is a LIBRARY OF CLASS-LEVEL "
"INSTRUCTIONS AND EXPERIENTIAL KNOWLEDGE. A collection of hundreds of "
"narrow skills where each one captures one session's specific bug is "
"a FAILURE of the library — not a feature. An agent searching skills "
"matches on descriptions, not on exact names; one broad umbrella "
"skill with labeled subsections beats five narrow siblings for "
"discoverability, not the other way around.\n\n"
"The right target shape is CLASS-LEVEL skills with rich SKILL.md "
"bodies + `references/`, `templates/`, and `scripts/` subfiles for "
"session-specific detail — not one-session-one-skill micro-entries.\n\n"
"Hard rules — do not violate:\n"
"1. DO NOT touch bundled or hub-installed skills. The candidate list "
"below is already filtered to agent-created skills only.\n"
"2. DO NOT delete any skill. Archiving (moving the skill's directory "
"into ~/.hermes/skills/.archive/) is the maximum destructive action. "
"Archives are recoverable; deletion is not.\n"
"3. DO NOT touch skills shown as pinned=yes. Skip them entirely.\n"
"4. DO NOT use usage counters as a reason to skip consolidation. The "
"counters are new and often mostly zero. Judge overlap on CONTENT, "
"not on use_count. 'use=0' is not evidence a skill is valuable; it's "
"absence of evidence either way.\n"
"5. DO NOT reject consolidation on the grounds that 'each skill has "
"a distinct trigger'. Pairwise distinctness is the wrong bar. The "
"right bar is: 'would a human maintainer write this as N separate "
"skills, or as one skill with N labeled subsections?' When the "
"answer is the latter, merge.\n\n"
"How to work — not optional:\n"
"1. Scan the full candidate list. Identify PREFIX CLUSTERS (skills "
"sharing a first word or domain keyword). Examples you are likely "
"to find: hermes-config-*, hermes-dashboard-*, gateway-*, codex-*, "
"ollama-*, anthropic-*, gemini-*, mcp-*, salvage-*, pr-*, "
"competitor-*, python-*, security-*, etc. Expect 10-25 clusters.\n"
"2. For each cluster with 2+ members, do NOT ask 'are these pairs "
"overlapping?' — ask 'what is the UMBRELLA CLASS these skills all "
"serve? Would a maintainer name that class and write one skill for "
"it?' If yes, pick (or create) the umbrella and absorb the siblings "
"into it.\n"
"3. Three ways to consolidate — use the right one per cluster:\n"
" a. MERGE INTO EXISTING UMBRELLA — one skill in the cluster is "
"already broad enough to be the umbrella (example: `pr-triage-"
"salvage` for the PR review cluster). Patch it to add a labeled "
"section for each sibling's unique insight, then archive the "
"siblings.\n"
" b. CREATE A NEW UMBRELLA SKILL.md — no existing member is broad "
"enough. Use skill_manage action=create to write a new class-level "
"skill whose SKILL.md covers the shared workflow and has short "
"labeled subsections. Archive the now-absorbed narrow siblings.\n"
" c. DEMOTE TO REFERENCES/TEMPLATES/SCRIPTS — a sibling has "
"narrow-but-valuable session-specific content. Move it into the "
"umbrella's appropriate support directory:\n"
" • `references/<topic>.md` for session-specific detail OR "
"condensed knowledge banks (quoted research, API docs excerpts, "
"domain notes, provider quirks, reproduction recipes)\n"
" • `templates/<name>.<ext>` for starter files meant to be "
"copied and modified\n"
" • `scripts/<name>.<ext>` for statically re-runnable actions "
"(verification scripts, fixture generators, probes)\n"
" Then archive the old sibling. Use `terminal` with `mkdir -p "
"~/.hermes/skills/<umbrella>/references/ && mv ... <umbrella>/"
"references/<topic>.md` (or templates/ / scripts/).\n"
"4. Also flag skills whose NAME is too narrow (contains a PR number, "
"a feature codename, a specific error string, an 'audit' / "
"'diagnosis' / 'salvage' session artifact). These almost always "
"belong as a subsection or support file under a class-level umbrella.\n"
"5. Iterate. After one consolidation round, scan the remaining set "
"and look for the NEXT umbrella opportunity. Don't stop after 3 "
"merges.\n\n"
"Your toolset:\n"
" - skills_list, skill_view — read the current landscape\n"
" - skill_manage action=patch — add sections to the umbrella\n"
" - skill_manage action=create — create a new umbrella SKILL.md\n"
" - skill_manage action=write_file — add a references/, templates/, "
"or scripts/ file under an existing skill (the skill must already "
"exist)\n"
" - terminal — mv a sibling into the archive "
"OR move its content into a support subfile\n\n"
"'keep' is a legitimate decision ONLY when the skill is already a "
"class-level umbrella and none of the proposed merges would improve "
"discoverability. 'This is narrow but distinct from its siblings' "
"is NOT a reason to keep — it's a reason to move it under an "
"umbrella as a subsection or support file.\n\n"
"Expected output: real umbrella-ification. Process every obvious "
"cluster. If you end the pass with fewer than 10 archives, you "
"stopped too early — go back and look at the clusters you left "
"alone.\n\n"
"When done, write a summary with: clusters processed, skills "
"patched/absorbed, skills demoted to references/templates/scripts, "
"skills archived, new umbrellas created, and clusters you "
"deliberately left alone with one line each."
)
# ---------------------------------------------------------------------------
# Orchestrator — spawn a forked AIAgent for the LLM review pass
# ---------------------------------------------------------------------------
def _render_candidate_list() -> str:
"""Human/agent-readable list of agent-created skills with usage stats."""
rows = skill_usage.agent_created_report()
if not rows:
return "No agent-created skills to review."
lines = [f"Agent-created skills ({len(rows)}):\n"]
for r in rows:
lines.append(
f"- {r['name']} "
f"state={r['state']} "
f"pinned={'yes' if r.get('pinned') else 'no'} "
f"use={r.get('use_count', 0)} "
f"view={r.get('view_count', 0)} "
f"patches={r.get('patch_count', 0)} "
f"last_used={r.get('last_used_at') or 'never'}"
)
return "\n".join(lines)
def run_curator_review(
on_summary: Optional[Callable[[str], None]] = None,
synchronous: bool = False,
) -> Dict[str, Any]:
"""Execute a single curator review pass.
Steps:
1. Apply automatic state transitions (pure, no LLM).
2. If there are agent-created skills, spawn a forked AIAgent that runs
the LLM review prompt against the current candidate list.
3. Update .curator_state with last_run_at and a one-line summary.
4. Invoke *on_summary* with a user-visible description.
If *synchronous* is True, the LLM review runs in the calling thread; the
default is to spawn a daemon thread so the caller returns immediately.
"""
start = datetime.now(timezone.utc)
counts = apply_automatic_transitions(now=start)
auto_summary_parts = []
if counts["marked_stale"]:
auto_summary_parts.append(f"{counts['marked_stale']} marked stale")
if counts["archived"]:
auto_summary_parts.append(f"{counts['archived']} archived")
if counts["reactivated"]:
auto_summary_parts.append(f"{counts['reactivated']} reactivated")
auto_summary = ", ".join(auto_summary_parts) if auto_summary_parts else "no changes"
# Persist state before the LLM pass so a crash mid-review still records
# the run and doesn't immediately re-trigger.
state = load_state()
state["last_run_at"] = start.isoformat()
state["run_count"] = int(state.get("run_count", 0)) + 1
state["last_run_summary"] = f"auto: {auto_summary}"
save_state(state)
def _llm_pass():
nonlocal auto_summary
try:
candidate_list = _render_candidate_list()
if "No agent-created skills" in candidate_list:
final_summary = f"auto: {auto_summary}; llm: skipped (no candidates)"
else:
prompt = f"{CURATOR_REVIEW_PROMPT}\n\n{candidate_list}"
llm_summary = _run_llm_review(prompt)
final_summary = f"auto: {auto_summary}; llm: {llm_summary}"
except Exception as e:
logger.debug("Curator LLM pass failed: %s", e, exc_info=True)
final_summary = f"auto: {auto_summary}; llm: error ({e})"
elapsed = (datetime.now(timezone.utc) - start).total_seconds()
state2 = load_state()
state2["last_run_duration_seconds"] = elapsed
state2["last_run_summary"] = final_summary
save_state(state2)
if on_summary:
try:
on_summary(f"curator: {final_summary}")
except Exception:
pass
if synchronous:
_llm_pass()
else:
t = threading.Thread(target=_llm_pass, daemon=True, name="curator-review")
t.start()
return {
"started_at": start.isoformat(),
"auto_transitions": counts,
"summary_so_far": auto_summary,
}
def _run_llm_review(prompt: str) -> str:
"""Spawn an AIAgent fork to run the curator review prompt. Returns a short
summary of what the model said in its final response."""
import contextlib
try:
from run_agent import AIAgent
except Exception as e:
return f"AIAgent import failed: {e}"
# Resolve provider + model the same way the CLI does, so the curator
# fork inherits the user's active main config rather than falling
# through to an empty provider/model pair (which sends HTTP 400
# "No models provided"). AIAgent() without explicit provider/model
# arguments hits an auto-resolution path that fails for OAuth-only
# providers and for pool-backed credentials.
_api_key = None
_base_url = None
_api_mode = None
_resolved_provider = None
_model_name = ""
try:
from hermes_cli.config import load_config
from hermes_cli.runtime_provider import resolve_runtime_provider
_cfg = load_config()
_m = _cfg.get("model", {}) if isinstance(_cfg.get("model"), dict) else {}
_provider = _m.get("provider") or "auto"
_model_name = _m.get("default") or _m.get("model") or ""
_rp = resolve_runtime_provider(
requested=_provider, target_model=_model_name
)
_api_key = _rp.get("api_key")
_base_url = _rp.get("base_url")
_api_mode = _rp.get("api_mode")
_resolved_provider = _rp.get("provider") or _provider
except Exception as e:
logger.debug("Curator provider resolution failed: %s", e, exc_info=True)
review_agent = None
try:
review_agent = AIAgent(
model=_model_name,
provider=_resolved_provider,
api_key=_api_key,
base_url=_base_url,
api_mode=_api_mode,
# Umbrella-building over a large skill collection is worth a
# high iteration ceiling — the pass typically takes 50-100
# API calls against hundreds of candidate skills. The
# single-session review path caps itself at a much smaller
# number because it's not doing a curation sweep.
max_iterations=9999,
quiet_mode=True,
platform="curator",
skip_context_files=True,
skip_memory=True,
)
# Disable recursive nudges — the curator must never spawn its own review.
review_agent._memory_nudge_interval = 0
review_agent._skill_nudge_interval = 0
# Redirect the forked agent's stdout/stderr to /dev/null while it
# runs so its tool-call chatter doesn't pollute the foreground
# terminal. The background-thread runner also hides it; this
# belt-and-suspenders path matters when a caller invokes
# run_curator_review(synchronous=True) from the CLI.
with open(os.devnull, "w") as _devnull, \
contextlib.redirect_stdout(_devnull), \
contextlib.redirect_stderr(_devnull):
result = review_agent.run_conversation(user_message=prompt)
final = ""
if isinstance(result, dict):
final = str(result.get("final_response") or "").strip()
return (final[:240] + "") if len(final) > 240 else (final or "no change")
except Exception as e:
return f"error: {e}"
finally:
if review_agent is not None:
try:
review_agent.close()
except Exception:
pass
# ---------------------------------------------------------------------------
# Public entrypoint for the session-start hook
# ---------------------------------------------------------------------------
def maybe_run_curator(
*,
idle_for_seconds: Optional[float] = None,
on_summary: Optional[Callable[[str], None]] = None,
) -> Optional[Dict[str, Any]]:
"""Best-effort: run a curator pass if all gates pass. Returns the result
dict if a pass was started, else None. Never raises."""
try:
if not should_run_now():
return None
# Idle gating: only enforce when the caller provided a measurement.
if idle_for_seconds is not None:
min_idle_s = get_min_idle_hours() * 3600.0
if idle_for_seconds < min_idle_s:
return None
return run_curator_review(on_summary=on_summary)
except Exception as e:
logger.debug("maybe_run_curator failed: %s", e, exc_info=True)
return None
-32
View File
@@ -42,7 +42,6 @@ class FailoverReason(enum.Enum):
# Context / payload
context_overflow = "context_overflow" # Context too large — compress, not failover
payload_too_large = "payload_too_large" # 413 — compress payload
image_too_large = "image_too_large" # Native image part exceeds provider's per-image limit — shrink and retry
# Model
model_not_found = "model_not_found" # 404 or invalid model — fallback to different model
@@ -91,7 +90,6 @@ class ClassifiedError:
_BILLING_PATTERNS = [
"insufficient credits",
"insufficient_quota",
"insufficient balance",
"credit balance",
"credits have been exhausted",
"top up your credits",
@@ -149,20 +147,6 @@ _PAYLOAD_TOO_LARGE_PATTERNS = [
"error code: 413",
]
# Image-size patterns. Matched against 400 bodies (not 413) because most
# providers return a 400 with a specific image-too-big message before the
# whole request hits the 413 size limit. Anthropic's wording is the most
# important here (hard 5 MB per image, returned as
# "messages.N.content.K.image.source.base64: image exceeds 5 MB maximum").
_IMAGE_TOO_LARGE_PATTERNS = [
"image exceeds", # Anthropic: "image exceeds 5 MB maximum"
"image too large", # generic
"image_too_large", # error_code variant
"image size exceeds", # variant
# "request_too_large" on a request known to contain an image → image is
# the likely culprit; we still try the shrink path before giving up.
]
# Context overflow patterns
_CONTEXT_OVERFLOW_PATTERNS = [
"context length",
@@ -687,15 +671,6 @@ def _classify_400(
) -> ClassifiedError:
"""Classify 400 Bad Request — context overflow, format error, or generic."""
# Image-too-large from 400 (Anthropic's 5 MB per-image check fires this way).
# Must be checked BEFORE context_overflow because messages can trip both
# patterns ("exceeds" + "image") and image-shrink is a cheaper recovery.
if any(p in error_msg for p in _IMAGE_TOO_LARGE_PATTERNS):
return result_fn(
FailoverReason.image_too_large,
retryable=True,
)
# Context overflow from 400
if any(p in error_msg for p in _CONTEXT_OVERFLOW_PATTERNS):
return result_fn(
@@ -823,13 +798,6 @@ def _classify_by_message(
should_compress=True,
)
# Image-too-large patterns (from message text when no status_code)
if any(p in error_msg for p in _IMAGE_TOO_LARGE_PATTERNS):
return result_fn(
FailoverReason.image_too_large,
retryable=True,
)
# Usage-limit patterns need the same disambiguation as 402: some providers
# surface "usage limit" errors without an HTTP status code. A transient
# signal ("try again", "resets at", …) means it's a periodic quota, not
+2
View File
@@ -30,6 +30,7 @@ from __future__ import annotations
import json
import logging
import os
import time
import uuid
from types import SimpleNamespace
@@ -41,6 +42,7 @@ from agent import google_oauth
from agent.gemini_schema import sanitize_gemini_tool_parameters
from agent.google_code_assist import (
CODE_ASSIST_ENDPOINT,
FREE_TIER_ID,
CodeAssistError,
ProjectContext,
resolve_project_context,
+1 -1
View File
@@ -2,7 +2,7 @@
from __future__ import annotations
from typing import Any, Dict
from typing import Any, Dict, List
# Gemini's ``FunctionDeclaration.parameters`` field accepts the ``Schema``
# object, which is only a subset of OpenAPI 3.0 / JSON Schema. Strip fields
+1
View File
@@ -29,6 +29,7 @@ from __future__ import annotations
import json
import logging
import os
import time
import urllib.error
import urllib.parse
+3 -3
View File
@@ -49,13 +49,14 @@ import json
import logging
import os
import secrets
import socket
import stat
import threading
import time
import urllib.error
import urllib.parse
import urllib.request
from dataclasses import dataclass
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Optional, Tuple
@@ -97,7 +98,6 @@ _DEFAULT_CLIENT_SECRET = f"GOCSPX-{_PUBLIC_CLIENT_SECRET_SUFFIX}"
# Regex patterns for fallback scraping from an installed gemini-cli.
import re as _re
from utils import atomic_replace
_CLIENT_ID_PATTERN = _re.compile(
r"OAUTH_CLIENT_ID\s*=\s*['\"]([0-9]+-[a-z0-9]+\.apps\.googleusercontent\.com)['\"]"
)
@@ -499,7 +499,7 @@ def save_credentials(creds: GoogleCredentials) -> Path:
fh.flush()
os.fsync(fh.fileno())
os.chmod(tmp_path, stat.S_IRUSR | stat.S_IWUSR)
atomic_replace(tmp_path, path)
os.replace(tmp_path, path)
finally:
try:
if tmp_path.exists():
-236
View File
@@ -1,236 +0,0 @@
"""Routing helpers for inbound user-attached images.
Two modes:
native attach images as OpenAI-style ``image_url`` content parts on the
user turn. Provider adapters (Anthropic, Gemini, Bedrock, Codex,
OpenAI chat.completions) already translate these into their
vendor-specific multimodal formats.
text run ``vision_analyze`` on each image up-front and prepend the
description to the user's text. The model never sees the pixels;
it only sees a lossy text summary. This is the pre-existing
behaviour and still the right choice for non-vision models.
The decision is made once per message turn by :func:`decide_image_input_mode`.
It reads ``agent.image_input_mode`` from config.yaml (``auto`` | ``native``
| ``text``, default ``auto``) and the active model's capability metadata.
In ``auto`` mode:
- If the user has explicitly configured ``auxiliary.vision.provider``
(i.e. not ``auto`` and not empty), we assume they want the text pipeline
regardless of the main model they've opted in to a specific vision
backend for a reason (cost, quality, local-only, etc.).
- Otherwise, if the active model reports ``supports_vision=True`` in its
models.dev metadata, we attach natively.
- Otherwise (non-vision model, no explicit override), we fall back to text.
This keeps ``vision_analyze`` surfaced as a tool in every session skills
and agent flows that chain it (browser screenshots, deeper inspection of
URL-referenced images, style-gating loops) keep working. The routing only
affects *how user-attached images on the current turn* are presented to the
main model.
"""
from __future__ import annotations
import base64
import logging
import mimetypes
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
logger = logging.getLogger(__name__)
_VALID_MODES = frozenset({"auto", "native", "text"})
def _coerce_mode(raw: Any) -> str:
"""Normalize a config value into one of the valid modes."""
if not isinstance(raw, str):
return "auto"
val = raw.strip().lower()
if val in _VALID_MODES:
return val
return "auto"
def _explicit_aux_vision_override(cfg: Optional[Dict[str, Any]]) -> bool:
"""True when the user configured a specific auxiliary vision backend.
An explicit override means the user *wants* the text pipeline (they're
paying for a dedicated vision model), so we don't silently bypass it.
"""
if not isinstance(cfg, dict):
return False
aux = cfg.get("auxiliary") or {}
if not isinstance(aux, dict):
return False
vision = aux.get("vision") or {}
if not isinstance(vision, dict):
return False
provider = str(vision.get("provider") or "").strip().lower()
model = str(vision.get("model") or "").strip()
base_url = str(vision.get("base_url") or "").strip()
# "auto" / "" / blank = not explicit
if provider in ("", "auto") and not model and not base_url:
return False
return True
def _lookup_supports_vision(provider: str, model: str) -> Optional[bool]:
"""Return True/False if we can resolve caps, None if unknown."""
if not provider or not model:
return None
try:
from agent.models_dev import get_model_capabilities
caps = get_model_capabilities(provider, model)
except Exception as exc: # pragma: no cover - defensive
logger.debug("image_routing: caps lookup failed for %s:%s%s", provider, model, exc)
return None
if caps is None:
return None
return bool(caps.supports_vision)
def decide_image_input_mode(
provider: str,
model: str,
cfg: Optional[Dict[str, Any]],
) -> str:
"""Return ``"native"`` or ``"text"`` for the given turn.
Args:
provider: active inference provider ID (e.g. ``"anthropic"``, ``"openrouter"``).
model: active model slug as it would be sent to the provider.
cfg: loaded config.yaml dict, or None. When None, behaves as auto.
"""
mode_cfg = "auto"
if isinstance(cfg, dict):
agent_cfg = cfg.get("agent") or {}
if isinstance(agent_cfg, dict):
mode_cfg = _coerce_mode(agent_cfg.get("image_input_mode"))
if mode_cfg == "native":
return "native"
if mode_cfg == "text":
return "text"
# auto
if _explicit_aux_vision_override(cfg):
return "text"
supports = _lookup_supports_vision(provider, model)
if supports is True:
return "native"
return "text"
# Image size handling is REACTIVE rather than proactive: we attempt native
# attachment at full size regardless of provider, and rely on
# ``run_agent._try_shrink_image_parts_in_messages`` to shrink + retry if
# the provider rejects the request (e.g. Anthropic's hard 5 MB per-image
# ceiling returned as HTTP 400 "image exceeds 5 MB maximum").
#
# Why reactive: our knowledge of provider ceilings is partial and evolving
# (OpenAI accepts 49 MB+, Anthropic 5 MB, Gemini 100 MB, others unknown).
# A proactive per-provider table would be stale the moment a provider raises
# or lowers its limit, and silently degrading quality for users on providers
# that would have accepted the full image is the worse failure mode.
# The shrink-on-reject path loses 1 API call + maybe 1s of Pillow work when
# it fires, which is cheaper than permanent quality loss.
def _guess_mime(path: Path) -> str:
mime, _ = mimetypes.guess_type(str(path))
if mime and mime.startswith("image/"):
return mime
# mimetypes on some Linux distros mis-maps .jpg; default to jpeg when
# the suffix looks imagey.
suffix = path.suffix.lower()
return {
".jpg": "image/jpeg",
".jpeg": "image/jpeg",
".png": "image/png",
".gif": "image/gif",
".webp": "image/webp",
".bmp": "image/bmp",
}.get(suffix, "image/jpeg")
def _file_to_data_url(path: Path) -> Optional[str]:
"""Encode a local image as a base64 data URL at its native size.
Size limits are NOT enforced here the agent retry loop
(``run_agent._try_shrink_image_parts_in_messages``) shrinks on the
provider's first rejection. Keeping this simple means providers that
accept large images (OpenAI 49 MB+, Gemini 100 MB) don't pay a silent
quality tax just because one other provider is stricter.
Returns None only if the file can't be read (missing, permission
denied, etc.); the caller reports those paths in ``skipped``.
"""
try:
raw = path.read_bytes()
except Exception as exc:
logger.warning("image_routing: failed to read %s%s", path, exc)
return None
mime = _guess_mime(path)
b64 = base64.b64encode(raw).decode("ascii")
return f"data:{mime};base64,{b64}"
def build_native_content_parts(
user_text: str,
image_paths: List[str],
) -> Tuple[List[Dict[str, Any]], List[str]]:
"""Build an OpenAI-style ``content`` list for a user turn.
Shape:
[{"type": "text", "text": "..."},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
...]
Images are attached at their native size. If a provider rejects the
request because an image is too large (e.g. Anthropic's 5 MB per-image
ceiling), the agent's retry loop transparently shrinks and retries
once see ``run_agent._try_shrink_image_parts_in_messages``.
Returns (content_parts, skipped_paths). Skipped paths are files that
couldn't be read from disk.
"""
parts: List[Dict[str, Any]] = []
skipped: List[str] = []
text = (user_text or "").strip()
if text:
parts.append({"type": "text", "text": text})
for raw_path in image_paths:
p = Path(raw_path)
if not p.exists() or not p.is_file():
skipped.append(str(raw_path))
continue
data_url = _file_to_data_url(p)
if not data_url:
skipped.append(str(raw_path))
continue
parts.append({
"type": "image_url",
"image_url": {"url": data_url},
})
# If the text was empty, add a neutral prompt so the turn isn't just images.
if not text and any(p.get("type") == "image_url" for p in parts):
parts.insert(0, {"type": "text", "text": "What do you see in this image?"})
return parts, skipped
__all__ = [
"decide_image_input_mode",
"build_native_content_parts",
]
-48
View File
@@ -1,48 +0,0 @@
"""LM Studio reasoning-effort resolution shared by the chat-completions
transport and run_agent's iteration-limit summary path.
LM Studio publishes per-model ``capabilities.reasoning.allowed_options`` (e.g.
``["off","on"]`` for toggle-style models, ``["off","minimal","low"]`` for
graduated models). We map the user's ``reasoning_config`` onto LM Studio's
OpenAI-compatible vocabulary, then clamp against the model's allowed set so
the server doesn't 400 on an unsupported effort.
"""
from __future__ import annotations
from typing import List, Optional
# LM Studio accepts these top-level reasoning_effort values via its
# OpenAI-compatible chat.completions endpoint.
_LM_VALID_EFFORTS = {"none", "minimal", "low", "medium", "high", "xhigh"}
# Toggle-style models publish allowed_options as ["off","on"] in /api/v1/models.
# Map them onto the OpenAI-compatible request vocabulary.
_LM_EFFORT_ALIASES = {"off": "none", "on": "medium"}
def resolve_lmstudio_effort(
reasoning_config: Optional[dict],
allowed_options: Optional[List[str]],
) -> Optional[str]:
"""Return the ``reasoning_effort`` string to send to LM Studio, or ``None``.
``None`` means "omit the field": the user picked a level the model can't
honor, so let LM Studio fall back to the model's declared default rather
than silently substituting a different effort. When ``allowed_options`` is
falsy (probe failed), skip clamping and send the resolved effort anyway.
"""
effort = "medium"
if reasoning_config and isinstance(reasoning_config, dict):
if reasoning_config.get("enabled") is False:
effort = "none"
else:
raw = (reasoning_config.get("effort") or "").strip().lower()
raw = _LM_EFFORT_ALIASES.get(raw, raw)
if raw in _LM_VALID_EFFORTS:
effort = raw
if allowed_options:
allowed = {_LM_EFFORT_ALIASES.get(opt, opt) for opt in allowed_options}
if effort not in allowed:
return None
return effort
+6 -114
View File
@@ -28,6 +28,7 @@ Usage in run_agent.py:
from __future__ import annotations
import json
import logging
import re
import inspect
@@ -62,124 +63,15 @@ def sanitize_context(text: str) -> str:
return text
class StreamingContextScrubber:
"""Stateful scrubber for streaming text that may contain split memory-context spans.
The one-shot ``sanitize_context`` regex cannot survive chunk boundaries:
a ``<memory-context>`` opened in one delta and closed in a later delta
leaks its payload to the UI because the non-greedy block regex needs
both tags in one string. This scrubber runs a small state machine
across deltas, holding back partial-tag tails and discarding
everything inside a span (including the system-note line).
Usage::
scrubber = StreamingContextScrubber()
for delta in stream:
visible = scrubber.feed(delta)
if visible:
emit(visible)
trailing = scrubber.flush() # at end of stream
if trailing:
emit(trailing)
The scrubber is re-entrant per agent instance. Callers building new
top-level responses (new turn) should create a fresh scrubber or call
``reset()``.
"""
_OPEN_TAG = "<memory-context>"
_CLOSE_TAG = "</memory-context>"
def __init__(self) -> None:
self._in_span: bool = False
self._buf: str = ""
def reset(self) -> None:
self._in_span = False
self._buf = ""
def feed(self, text: str) -> str:
"""Return the visible portion of ``text`` after scrubbing.
Any trailing fragment that could be the start of an open/close tag
is held back in the internal buffer and surfaced on the next
``feed()`` call or discarded/emitted by ``flush()``.
"""
if not text:
return ""
buf = self._buf + text
self._buf = ""
out: list[str] = []
while buf:
if self._in_span:
idx = buf.lower().find(self._CLOSE_TAG)
if idx == -1:
# Hold back a potential partial close tag; drop the rest
held = self._max_partial_suffix(buf, self._CLOSE_TAG)
self._buf = buf[-held:] if held else ""
return "".join(out)
# Found close — skip span content + tag, continue
buf = buf[idx + len(self._CLOSE_TAG):]
self._in_span = False
else:
idx = buf.lower().find(self._OPEN_TAG)
if idx == -1:
# No open tag — hold back a potential partial open tag
held = self._max_partial_suffix(buf, self._OPEN_TAG)
if held:
out.append(buf[:-held])
self._buf = buf[-held:]
else:
out.append(buf)
return "".join(out)
# Emit text before the tag, enter span
if idx > 0:
out.append(buf[:idx])
buf = buf[idx + len(self._OPEN_TAG):]
self._in_span = True
return "".join(out)
def flush(self) -> str:
"""Emit any held-back buffer at end-of-stream.
If we're still inside an unterminated span the remaining content is
discarded (safer: leaking partial memory context is worse than a
truncated answer). Otherwise the held-back partial-tag tail is
emitted verbatim (it turned out not to be a real tag).
"""
if self._in_span:
self._buf = ""
self._in_span = False
return ""
tail = self._buf
self._buf = ""
return tail
@staticmethod
def _max_partial_suffix(buf: str, tag: str) -> int:
"""Return the length of the longest buf-suffix that is a tag-prefix.
Case-insensitive. Returns 0 if no suffix could start the tag.
"""
tag_lower = tag.lower()
buf_lower = buf.lower()
max_check = min(len(buf_lower), len(tag_lower) - 1)
for i in range(max_check, 0, -1):
if tag_lower.startswith(buf_lower[-i:]):
return i
return 0
def build_memory_context_block(raw_context: str) -> str:
"""Wrap prefetched memory in a fenced block with system note."""
"""Wrap prefetched memory in a fenced block with system note.
The fence prevents the model from treating recalled context as user
discourse. Injected at API-call time only never persisted.
"""
if not raw_context or not raw_context.strip():
return ""
clean = sanitize_context(raw_context)
if clean != raw_context:
logger.warning("memory provider returned pre-wrapped context; stripped")
return (
"<memory-context>\n"
"[System note: The following is recalled memory context, "
+25 -49
View File
@@ -51,8 +51,6 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
"qwen-oauth",
"xiaomi",
"arcee",
"gmi",
"tencent-tokenhub",
"custom", "local",
# Common aliases
"google", "google-gemini", "google-ai-studio",
@@ -61,9 +59,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
"ollama",
"stepfun", "opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
"mimo", "xiaomi-mimo",
"tencent", "tokenhub", "tencent-cloud", "tencentmaas",
"arcee-ai", "arceeai",
"gmi-cloud", "gmicloud",
"xai", "x-ai", "x.ai", "grok",
"nvidia", "nim", "nvidia-nim", "nemotron",
"qwen-portal",
@@ -210,8 +206,6 @@ DEFAULT_CONTEXT_LENGTHS = {
"grok": 131072, # catch-all (grok-beta, unknown grok-*)
# Kimi
"kimi": 262144,
# Tencent — Hy3 Preview (Hunyuan) with 256K context window
"hy3-preview": 256000,
# Nemotron — NVIDIA's open-weights series (128K context across all sizes)
"nemotron": 131072,
# Arcee
@@ -313,8 +307,6 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"integrate.api.nvidia.com": "nvidia",
"api.xiaomimimo.com": "xiaomi",
"xiaomimimo.com": "xiaomi",
"api.gmi-serving.com": "gmi",
"tokenhub.tencentmaas.com": "tencent-tokenhub",
"ollama.com": "ollama-cloud",
}
@@ -625,6 +617,8 @@ def fetch_endpoint_model_metadata(
if isinstance(ctx, int) and ctx > 0:
context_length = ctx
break
if context_length is None:
context_length = _extract_context_length(model)
if context_length is not None:
entry["context_length"] = context_length
@@ -708,29 +702,6 @@ def fetch_endpoint_model_metadata(
return {}
def _resolve_endpoint_context_length(
model: str,
base_url: str,
api_key: str = "",
) -> Optional[int]:
"""Resolve context length from an endpoint's live ``/models`` metadata."""
endpoint_metadata = fetch_endpoint_model_metadata(base_url, api_key=api_key)
matched = endpoint_metadata.get(model)
if not matched:
if len(endpoint_metadata) == 1:
matched = next(iter(endpoint_metadata.values()))
else:
for key, entry in endpoint_metadata.items():
if model in key or key in model:
matched = entry
break
if matched:
context_length = matched.get("context_length")
if isinstance(context_length, int):
return context_length
return None
def _get_context_cache_path() -> Path:
"""Return path to the persistent context length cache file."""
from hermes_constants import get_hermes_home
@@ -1014,7 +985,10 @@ def _query_local_context_length(model: str, base_url: str, api_key: str = "") ->
ctx = cfg.get("context_length")
if ctx and isinstance(ctx, (int, float)):
return int(ctx)
break
# Fall back to max_context_length (theoretical model max)
ctx = m.get("max_context_length") or m.get("context_length")
if ctx and isinstance(ctx, (int, float)):
return int(ctx)
# LM Studio / vLLM / llama.cpp: try /v1/models/{model}
resp = client.get(f"{server_url}/v1/models/{model}")
@@ -1276,10 +1250,7 @@ def get_model_context_length(
model = _strip_provider_prefix(model)
# 1. Check persistent cache (model+provider)
# LM Studio is excluded — its loaded context length is transient (the
# user can reload the model with a different context_length at any time
# via /api/v1/models/load), so a stale cached value would mask reloads.
if base_url and provider != "lmstudio":
if base_url:
cached = get_cached_context_length(model, base_url)
if cached is not None:
# Invalidate stale Codex OAuth cache entries: pre-PR #14935 builds
@@ -1324,16 +1295,28 @@ def get_model_context_length(
# returns 128k) instead of the model's full context (400k). models.dev
# has the correct per-provider values and is checked at step 5+.
if _is_custom_endpoint(base_url) and not _is_known_provider_base_url(base_url):
context_length = _resolve_endpoint_context_length(model, base_url, api_key=api_key)
if context_length is not None:
return context_length
endpoint_metadata = fetch_endpoint_model_metadata(base_url, api_key=api_key)
matched = endpoint_metadata.get(model)
if not matched:
# Single-model servers: if only one model is loaded, use it
if len(endpoint_metadata) == 1:
matched = next(iter(endpoint_metadata.values()))
else:
# Fuzzy match: substring in either direction
for key, entry in endpoint_metadata.items():
if model in key or key in model:
matched = entry
break
if matched:
context_length = matched.get("context_length")
if isinstance(context_length, int):
return context_length
if not _is_known_provider_base_url(base_url):
# 3. Try querying local server directly
if is_local_endpoint(base_url):
local_ctx = _query_local_context_length(model, base_url, api_key=api_key)
if local_ctx and local_ctx > 0:
if provider != "lmstudio":
save_context_length(model, base_url, local_ctx)
save_context_length(model, base_url, local_ctx)
return local_ctx
logger.info(
"Could not detect context length for model %r at %s"
@@ -1391,12 +1374,6 @@ def get_model_context_length(
if base_url:
save_context_length(model, base_url, codex_ctx)
return codex_ctx
if effective_provider == "gmi" and base_url:
# GMI exposes authoritative context_length via /models, but it is not
# in models.dev yet. Preserve that higher-fidelity endpoint lookup.
ctx = _resolve_endpoint_context_length(model, base_url, api_key=api_key)
if ctx is not None:
return ctx
if effective_provider:
from agent.models_dev import lookup_models_dev_context
ctx = lookup_models_dev_context(effective_provider, model)
@@ -1423,8 +1400,7 @@ def get_model_context_length(
if base_url and is_local_endpoint(base_url):
local_ctx = _query_local_context_length(model, base_url, api_key=api_key)
if local_ctx and local_ctx > 0:
if provider != "lmstudio":
save_context_length(model, base_url, local_ctx)
save_context_length(model, base_url, local_ctx)
return local_ctx
# 10. Default fallback — 128K
+1 -2
View File
@@ -18,7 +18,6 @@ import os
import tempfile
import time
from typing import Any, Mapping, Optional
from utils import atomic_replace
logger = logging.getLogger(__name__)
@@ -119,7 +118,7 @@ def record_nous_rate_limit(
try:
with os.fdopen(fd, "w") as f:
json.dump(state, f)
atomic_replace(tmp_path, path)
os.replace(tmp_path, path)
except Exception:
# Clean up temp file on failure
try:
+5 -52
View File
@@ -25,7 +25,6 @@ logger = logging.getLogger(__name__)
BUSY_INPUT_FLAG = "busy_input_prompt"
TOOL_PROGRESS_FLAG = "tool_progress_prompt"
OPENCLAW_RESIDUE_FLAG = "openclaw_residue_cleanup"
# -------------------------------------------------------------------------
@@ -44,18 +43,10 @@ def busy_input_hint_gateway(mode: str) -> str:
"Send `/busy interrupt` to make new messages stop the current task "
"immediately, or `/busy status` to check. This notice won't appear again."
)
if mode == "steer":
return (
"💡 First-time tip — I steered your message into the current run; "
"it will arrive after the next tool call instead of interrupting. "
"Send `/busy interrupt` or `/busy queue` to change this, or "
"`/busy status` to check. This notice won't appear again."
)
return (
"💡 First-time tip — I just interrupted my current task to answer you. "
"Send `/busy queue` to queue follow-ups for after the current task instead, "
"`/busy steer` to inject them mid-run without interrupting, or "
"`/busy status` to check. This notice won't appear again."
"or `/busy status` to check. This notice won't appear again."
)
@@ -64,19 +55,13 @@ def busy_input_hint_cli(mode: str) -> str:
if mode == "queue":
return (
"(tip) Your message was queued for the next turn. "
"Use /busy interrupt to make Enter stop the current run instead, "
"or /busy steer to inject mid-run. This tip only shows once."
)
if mode == "steer":
return (
"(tip) Your message was steered into the current run; it arrives "
"after the next tool call. Use /busy interrupt or /busy queue to "
"change this. This tip only shows once."
"Use /busy interrupt to make Enter stop the current run instead. "
"This tip only shows once."
)
return (
"(tip) Your message interrupted the current run. "
"Use /busy queue to queue messages for the next turn instead, "
"or /busy steer to inject mid-run. This tip only shows once."
"Use /busy queue to queue messages for the next turn instead. "
"This tip only shows once."
)
@@ -95,35 +80,6 @@ def tool_progress_hint_cli() -> str:
)
def openclaw_residue_hint_cli() -> str:
"""Banner shown the first time Hermes starts and finds ``~/.openclaw/``.
OpenClaw-era config, memory, and skill paths in ``~/.openclaw/`` will
otherwise attract the agent (memory entries like ``~/.openclaw/config.yaml``
get carried forward and the agent dutifully reads them). ``hermes claw
cleanup`` renames the directory so the agent stops finding it.
"""
return (
"Heads up — an OpenClaw workspace was detected at ~/.openclaw/.\n"
"After migrating, the agent can still get confused and read that "
"directory's config/memory instead of Hermes's.\n"
"Run `hermes claw cleanup` to archive it (rename → .openclaw.pre-migration). "
"This tip only shows once; rerun it any time with `hermes claw cleanup`."
)
def detect_openclaw_residue(home: Optional[Path] = None) -> bool:
"""Return True if an OpenClaw workspace directory is present in ``$HOME``.
Pure filesystem check no side effects. ``home`` override exists for tests.
"""
base = home or Path.home()
try:
return (base / ".openclaw").is_dir()
except OSError:
return False
# -------------------------------------------------------------------------
# State read / write
# -------------------------------------------------------------------------
@@ -179,13 +135,10 @@ def mark_seen(config_path: Path, flag: str) -> bool:
__all__ = [
"BUSY_INPUT_FLAG",
"TOOL_PROGRESS_FLAG",
"OPENCLAW_RESIDUE_FLAG",
"busy_input_hint_gateway",
"busy_input_hint_cli",
"tool_progress_hint_gateway",
"tool_progress_hint_cli",
"openclaw_residue_hint_cli",
"detect_openclaw_residue",
"is_seen",
"mark_seen",
]
+58 -38
View File
@@ -141,12 +141,6 @@ DEFAULT_AGENT_IDENTITY = (
"Be targeted and efficient in your exploration and investigations."
)
HERMES_AGENT_HELP_GUIDANCE = (
"If the user asks about configuring, setting up, or using Hermes Agent "
"itself, load the `hermes-agent` skill with skill_view(name='hermes-agent') "
"before answering. Docs: https://hermes-agent.nousresearch.com/docs"
)
MEMORY_GUIDANCE = (
"You have persistent memory across sessions. Save durable facts using the memory "
"tool: user preferences, environment details, tool quirks, and stable conventions. "
@@ -182,6 +176,64 @@ SKILLS_GUIDANCE = (
"Skills that aren't maintained become liabilities."
)
KANBAN_GUIDANCE = (
"# You are a Kanban worker\n"
"You were spawned by the Hermes Kanban dispatcher to execute ONE task from "
"the shared board at `~/.hermes/kanban.db`. Your task id is in "
"`$HERMES_KANBAN_TASK`; your workspace is `$HERMES_KANBAN_WORKSPACE`. "
"The `kanban_*` tools in your schema are your primary coordination surface — "
"they write directly to the shared SQLite DB and work regardless of terminal "
"backend (local/docker/modal/ssh).\n"
"\n"
"## Lifecycle\n"
"\n"
"1. **Orient.** Call `kanban_show()` first (no args — it defaults to your "
"task). The response includes title, body, parent-task handoffs (summary + "
"metadata), any prior attempts on this task if you're a retry, the full "
"comment thread, and a pre-formatted `worker_context` you can treat as "
"ground truth.\n"
"2. **Work inside the workspace.** `cd $HERMES_KANBAN_WORKSPACE` before "
"any file operations. The workspace is yours for this run. Don't modify "
"files outside it unless the task explicitly asks.\n"
"3. **Heartbeat on long operations.** Call `kanban_heartbeat(note=...)` "
"every few minutes during long subprocesses (training, encoding, crawling). "
"Skip heartbeats for short tasks.\n"
"4. **Block on genuine ambiguity.** If you need a human decision you cannot "
"infer (missing credentials, UX choice, paywalled source, peer output you "
"need first), call `kanban_block(reason=\"...\")` and stop. Don't guess. "
"The user will unblock with context and the dispatcher will respawn you.\n"
"5. **Complete with structured handoff.** Call `kanban_complete(summary=..., "
"metadata=...)`. `summary` is 13 human-readable sentences naming concrete "
"artifacts. `metadata` is machine-readable facts "
"(`{changed_files: [...], tests_run: N, decisions: [...]}`). Downstream "
"workers read both via their own `kanban_show`. Never put secrets / "
"tokens / raw PII in either field — run rows are durable forever.\n"
"6. **If follow-up work appears, create it; don't do it.** Use "
"`kanban_create(title=..., assignee=<right-profile>, parents=[your-task-id])` "
"to spawn a child task for the appropriate specialist profile instead of "
"scope-creeping into the next thing.\n"
"\n"
"## Orchestrator mode\n"
"\n"
"If your task is itself a decomposition task (e.g. a planner profile given "
"a high-level goal), use `kanban_create` to fan out into child tasks — one "
"per specialist, each with an explicit `assignee` and `parents=[...]` to "
"express dependencies. Then `kanban_complete` your own task with a summary "
"of the decomposition. Do NOT execute the work yourself; your job is "
"routing, not implementation.\n"
"\n"
"## Do NOT\n"
"\n"
"- Do not shell out to `hermes kanban <verb>` for board operations. Use "
"the `kanban_*` tools — they work across all terminal backends.\n"
"- Do not complete a task you didn't actually finish. Block it.\n"
"- Do not assign follow-up work to yourself. Assign it to the right "
"specialist profile.\n"
"- Do not call `delegate_task` as a board substitute. `delegate_task` is "
"for short reasoning subtasks inside your own run; board tasks are for "
"cross-agent handoffs that outlive one API loop."
)
TOOL_USE_ENFORCEMENT_GUIDANCE = (
"# Tool-use enforcement\n"
"You MUST use your tools to take action — do not describe what you would do "
@@ -310,10 +362,6 @@ PLATFORM_HINTS = {
"Standard markdown is automatically converted to Telegram format. "
"Supported: **bold**, *italic*, ~~strikethrough~~, ||spoiler||, "
"`inline code`, ```code blocks```, [links](url), and ## headers. "
"Telegram has NO table syntax — prefer bullet lists or labeled "
"key: value pairs over pipe tables (any tables you do emit are "
"auto-rewritten into row-group bullets, which you can produce "
"directly for cleaner output). "
"You can send media files natively: to deliver a file to the user, "
"include MEDIA:/absolute/path/to/file in your response. Images "
"(.png, .jpg, .webp) appear as photos, audio (.ogg) sends as voice "
@@ -432,29 +480,6 @@ PLATFORM_HINTS = {
"your response. Images are sent as native photos, and other files arrive as downloadable "
"documents."
),
"yuanbao": (
"You are on Yuanbao (腾讯元宝), a Chinese AI assistant platform. "
"Markdown formatting is supported (code blocks, tables, bold/italic). "
"You CAN send media files natively — to deliver a file to the user, include "
"MEDIA:/absolute/path/to/file in your response. The file will be sent as a native "
"Yuanbao attachment: images (.jpg, .png, .webp, .gif) are sent as photos, "
"and other files (.pdf, .docx, .txt, .zip, etc.) arrive as downloadable documents "
"(max 50 MB). You can also include image URLs in markdown format ![alt](url) and "
"they will be downloaded and sent as native photos. "
"Do NOT tell the user you lack file-sending capability — use MEDIA: syntax "
"whenever a file delivery is appropriate.\n\n"
"Stickers (贴纸 / 表情包 / TIM face): Yuanbao has a built-in sticker catalogue. "
"When the user sends a sticker (you see '[emoji: 名称]' in their message) or asks "
"you to send/reply-with a 贴纸/表情/表情包, you MUST use the sticker tools:\n"
" 1. Call yb_search_sticker with a Chinese keyword (e.g. '666', '比心', '吃瓜', "
" '捂脸', '合十') to discover matching sticker_ids.\n"
" 2. Call yb_send_sticker with the chosen sticker_id or name — this sends a real "
" TIMFaceElem that renders as a native sticker in the chat.\n"
"DO NOT draw sticker-like PNGs with execute_code/Pillow/matplotlib and then send "
"them via MEDIA: or send_image_file. That produces a fake low-quality 'sticker' "
"image and is the WRONG path. Bare Unicode emoji in text is also not a substitute "
"— when a sticker is the right response, use yb_send_sticker."
),
}
# ---------------------------------------------------------------------------
@@ -858,11 +883,6 @@ def build_skills_system_prompt(
"Skills also encode the user's preferred approach, conventions, and quality standards "
"for tasks like code review, planning, and testing — load them even for tasks you "
"already know how to do, because the skill defines how it should be done here.\n"
"Whenever the user asks you to configure, set up, install, enable, disable, modify, "
"or troubleshoot Hermes Agent itself — its CLI, config, models, providers, tools, "
"skills, voice, gateway, plugins, or any feature — load the `hermes-agent` skill "
"first. It has the actual commands (e.g. `hermes config set …`, `hermes tools`, "
"`hermes setup`) so you don't have to guess or invent workarounds.\n"
"If a skill has issues, fix it with skill_manage(action='patch').\n"
"After difficult/iterative tasks, offer to save as a skill. "
"If a skill you loaded was missing steps, had wrong commands, or needed "
+6 -58
View File
@@ -56,12 +56,8 @@ _SENSITIVE_BODY_KEYS = frozenset({
})
# Snapshot at import time so runtime env mutations (e.g. LLM-generated
# `export HERMES_REDACT_SECRETS=true`) cannot enable/disable redaction
# mid-session. OFF by default — user must opt in via
# `security.redact_secrets: true` in config.yaml (bridged to this env var
# in hermes_cli/main.py and gateway/run.py) or `HERMES_REDACT_SECRETS=true`
# in ~/.hermes/.env.
_REDACT_ENABLED = os.getenv("HERMES_REDACT_SECRETS", "").lower() in ("1", "true", "yes", "on")
# `export HERMES_REDACT_SECRETS=false`) cannot disable redaction mid-session.
_REDACT_ENABLED = os.getenv("HERMES_REDACT_SECRETS", "").lower() not in ("0", "false", "no", "off")
# Known API key prefixes -- match the prefix + contiguous token chars
_PREFIX_PATTERNS = [
@@ -184,59 +180,11 @@ _PREFIX_RE = re.compile(
)
def mask_secret(
value: str,
*,
head: int = 4,
tail: int = 4,
floor: int = 12,
placeholder: str = "***",
empty: str = "",
) -> str:
"""Mask a secret for display, preserving ``head`` and ``tail`` characters.
Canonical helper for display-time redaction across Hermes used by
``hermes config``, ``hermes status``, ``hermes dump``, and anywhere
a secret needs to be shown truncated for debuggability while still
keeping the bulk hidden.
Args:
value: The secret to mask. ``None``/empty returns ``empty``.
head: Leading characters to preserve. Default 4.
tail: Trailing characters to preserve. Default 4.
floor: Values shorter than ``head + tail + floor_margin`` are
fully masked (returns ``placeholder``). Default 12
matches the existing config/status/dump convention.
placeholder: Value returned for too-short inputs. Default ``"***"``.
empty: Value returned when ``value`` is falsy (None, ""). The
caller can override this to e.g. ``color("(not set)",
Colors.DIM)`` for user-facing display.
Examples:
>>> mask_secret("sk-proj-abcdef1234567890")
'sk-p...7890'
>>> mask_secret("short") # fully masked
'***'
>>> mask_secret("") # empty default
''
>>> mask_secret("", empty="(not set)") # empty override
'(not set)'
>>> mask_secret("long-token", head=6, tail=4, floor=18)
'***'
"""
if not value:
return empty
if len(value) < floor:
return placeholder
return f"{value[:head]}...{value[-tail:]}"
def _mask_token(token: str) -> str:
"""Mask a log token — conservative 18-char floor, preserves 6 prefix / 4 suffix."""
# Empty input: historically this returned "***" rather than "". Preserve.
if not token:
"""Mask a token, preserving prefix for long tokens."""
if len(token) < 18:
return "***"
return mask_secret(token, head=6, tail=4, floor=18)
return f"{token[:6]}...{token[-4:]}"
def _redact_query_string(query: str) -> str:
@@ -309,7 +257,7 @@ def redact_sensitive_text(text: str) -> str:
"""Apply all redaction patterns to a block of text.
Safe to call on any string -- non-matching text passes through unchanged.
Disabled by default enable via security.redact_secrets: true in config.yaml.
Disabled when security.redact_secrets is false in config.yaml.
"""
if text is None:
return None
+2 -7
View File
@@ -76,7 +76,6 @@ except ImportError: # pragma: no cover
fcntl = None # type: ignore[assignment]
from hermes_constants import get_hermes_home
from utils import atomic_replace
logger = logging.getLogger(__name__)
@@ -569,7 +568,7 @@ def save_allowlist(data: Dict[str, Any]) -> None:
try:
with os.fdopen(fd, "w") as fh:
fh.write(json.dumps(data, indent=2, sort_keys=True))
atomic_replace(tmp_path, p)
os.replace(tmp_path, p)
except Exception:
try:
os.unlink(tmp_path)
@@ -755,11 +754,7 @@ def _resolve_effective_accept(
if env in ("1", "true", "yes", "on"):
return True
cfg_val = cfg.get("hooks_auto_accept", False)
if isinstance(cfg_val, bool):
return cfg_val
if isinstance(cfg_val, str):
return cfg_val.strip().lower() in ("1", "true", "yes", "on")
return False
return bool(cfg_val)
# ---------------------------------------------------------------------------
+2 -2
View File
@@ -329,7 +329,7 @@ def build_skill_invocation_message(
loaded_skill, skill_dir, skill_name = loaded
activation_note = (
f'[IMPORTANT: The user has invoked the "{skill_name}" skill, indicating they want '
f'[SYSTEM: The user has invoked the "{skill_name}" skill, indicating they want '
"you to follow its instructions. The full skill content is loaded below.]"
)
return _build_skill_message(
@@ -368,7 +368,7 @@ def build_preloaded_skills_prompt(
loaded_skill, skill_dir, skill_name = loaded
activation_note = (
f'[IMPORTANT: The user launched this CLI session with the "{skill_name}" skill '
f'[SYSTEM: The user launched this CLI session with the "{skill_name}" skill '
"preloaded. Treat its instructions as active guidance for the duration of this "
"session unless the user overrides them.]"
)
+1 -9
View File
@@ -200,9 +200,6 @@ def get_external_skills_dirs() -> List[Path]:
if not isinstance(raw_dirs, list):
return []
from hermes_constants import get_hermes_home
hermes_home = get_hermes_home()
local_skills = get_skills_dir().resolve()
seen: Set[Path] = set()
result: List[Path] = []
@@ -213,12 +210,7 @@ def get_external_skills_dirs() -> List[Path]:
continue
# Expand ~ and environment variables
expanded = os.path.expanduser(os.path.expandvars(entry))
p = Path(expanded)
# Resolve relative paths against HERMES_HOME, not cwd
if not p.is_absolute():
p = (hermes_home / p).resolve()
else:
p = p.resolve()
p = Path(expanded).resolve()
if p == local_skills:
continue
if p in seen:
+5 -39
View File
@@ -6,18 +6,12 @@ adds latency to the user-facing reply.
import logging
import threading
from typing import Callable, Optional
from typing import Optional
from agent.auxiliary_client import call_llm
logger = logging.getLogger(__name__)
# Callback signature: (task_name, exception) -> None. Used to surface
# auxiliary failures to the user through AIAgent._emit_auxiliary_failure
# so silent-drops (e.g. OpenRouter 402 exhausting the fallback chain)
# become visible instead of piling up as NULL session titles.
FailureCallback = Callable[[str, BaseException], None]
_TITLE_PROMPT = (
"Generate a short, descriptive title (3-7 words) for a conversation that starts with the "
"following exchange. The title should capture the main topic or intent. "
@@ -25,23 +19,11 @@ _TITLE_PROMPT = (
)
def generate_title(
user_message: str,
assistant_response: str,
timeout: float = 30.0,
failure_callback: Optional[FailureCallback] = None,
main_runtime: dict = None,
) -> Optional[str]:
def generate_title(user_message: str, assistant_response: str, timeout: float = 30.0) -> Optional[str]:
"""Generate a session title from the first exchange.
Uses the main runtime's model when available, falling back to the
auxiliary LLM client (cheapest/fastest available model).
Uses the auxiliary LLM client (cheapest/fastest available model).
Returns the title string or None on failure.
``failure_callback`` is invoked with ``(task, exception)`` when the
auxiliary call raises the caller typically wires this to
``AIAgent._emit_auxiliary_failure`` so the user sees a warning instead
of silently accumulating untitled sessions.
"""
# Truncate long messages to keep the request small
user_snippet = user_message[:500] if user_message else ""
@@ -59,7 +41,6 @@ def generate_title(
max_tokens=500,
temperature=0.3,
timeout=timeout,
main_runtime=main_runtime,
)
title = (response.choices[0].message.content or "").strip()
# Clean up: remove quotes, trailing punctuation, prefixes like "Title: "
@@ -71,15 +52,7 @@ def generate_title(
title = title[:77] + "..."
return title if title else None
except Exception as e:
# Log at WARNING so this shows up in agent.log without debug mode.
# Full detail at debug level for operators who need the stack.
logger.warning("Title generation failed: %s", e)
logger.debug("Title generation traceback", exc_info=True)
if failure_callback is not None:
try:
failure_callback("title generation", e)
except Exception:
logger.debug("Title generation failure_callback raised", exc_info=True)
logger.debug("Title generation failed: %s", e)
return None
@@ -88,8 +61,6 @@ def auto_title_session(
session_id: str,
user_message: str,
assistant_response: str,
failure_callback: Optional[FailureCallback] = None,
main_runtime: dict = None,
) -> None:
"""Generate and set a session title if one doesn't already exist.
@@ -110,9 +81,7 @@ def auto_title_session(
except Exception:
return
title = generate_title(
user_message, assistant_response, failure_callback=failure_callback, main_runtime=main_runtime
)
title = generate_title(user_message, assistant_response)
if not title:
return
@@ -129,8 +98,6 @@ def maybe_auto_title(
user_message: str,
assistant_response: str,
conversation_history: list,
failure_callback: Optional[FailureCallback] = None,
main_runtime: dict = None,
) -> None:
"""Fire-and-forget title generation after the first exchange.
@@ -152,7 +119,6 @@ def maybe_auto_title(
thread = threading.Thread(
target=auto_title_session,
args=(session_db, session_id, user_message, assistant_response),
kwargs={"failure_callback": failure_callback, "main_runtime": main_runtime},
daemon=True,
name="auto-title",
)
+7 -1
View File
@@ -85,6 +85,9 @@ class AnthropicTransport(ProviderTransport):
from agent.anthropic_adapter import _to_plain_data
from agent.transports.types import ToolCall
strip_tool_prefix = kwargs.get("strip_tool_prefix", False)
_MCP_PREFIX = "mcp_"
text_parts = []
reasoning_parts = []
reasoning_details = []
@@ -99,10 +102,13 @@ class AnthropicTransport(ProviderTransport):
if isinstance(block_dict, dict):
reasoning_details.append(block_dict)
elif block.type == "tool_use":
name = block.name
if strip_tool_prefix and name.startswith(_MCP_PREFIX):
name = name[len(_MCP_PREFIX):]
tool_calls.append(
ToolCall(
id=block.id,
name=block.name,
name=name,
arguments=json.dumps(block.input),
)
)
+2 -92
View File
@@ -12,65 +12,12 @@ reasoning configuration, temperature handling, and extra_body assembly.
import copy
from typing import Any, Dict, List, Optional
from agent.lmstudio_reasoning import resolve_lmstudio_effort
from agent.moonshot_schema import is_moonshot_model, sanitize_moonshot_tools
from agent.prompt_builder import DEVELOPER_ROLE_MODELS
from agent.transports.base import ProviderTransport
from agent.transports.types import NormalizedResponse, ToolCall, Usage
def _build_gemini_thinking_config(model: str, reasoning_config: dict | None) -> dict | None:
"""Translate Hermes/OpenRouter-style reasoning config to Gemini thinkingConfig.
Gemini native/cloud-code adapters do not read ``extra_body.reasoning``.
They only inspect ``extra_body.thinking_config`` / ``thinkingConfig`` and
then request thought parts with ``includeThoughts`` enabled.
"""
if reasoning_config is None or not isinstance(reasoning_config, dict):
return None
if reasoning_config.get("enabled") is False:
# Gemini can hide thought parts even when internal thinking still
# happens; omit thinkingLevel to avoid model-specific validation quirks.
return {"includeThoughts": False}
effort = str(reasoning_config.get("effort", "medium") or "medium").strip().lower()
if effort == "none":
return {"includeThoughts": False}
thinking_config: Dict[str, Any] = {"includeThoughts": True}
normalized_model = (model or "").strip().lower()
if normalized_model.startswith("google/"):
normalized_model = normalized_model.split("/", 1)[1]
# Gemini 2.5 accepts thinkingBudget; don't guess a budget from Hermes'
# coarse effort levels. ``includeThoughts`` alone is enough to surface
# thought parts without risking request validation errors.
if normalized_model.startswith("gemini-2.5-"):
return thinking_config
if effort not in {"minimal", "low", "medium", "high", "xhigh"}:
effort = "medium"
# Gemini 3 Flash documents low/medium/high thinking levels; Gemini 3 Pro
# is stricter (low/high). Clamp Hermes' wider effort set to what each
# family accepts so we never forward an undocumented level verbatim.
if normalized_model.startswith(("gemini-3", "gemini-3.1")):
if "flash" in normalized_model:
if effort in {"minimal", "low"}:
thinking_config["thinkingLevel"] = "low"
elif effort in {"high", "xhigh"}:
thinking_config["thinkingLevel"] = "high"
else:
thinking_config["thinkingLevel"] = "medium"
elif "pro" in normalized_model:
thinking_config["thinkingLevel"] = (
"high" if effort in {"high", "xhigh"} else "low"
)
return thinking_config
class ChatCompletionsTransport(ProviderTransport):
"""Transport for api_mode='chat_completions'.
@@ -154,7 +101,6 @@ class ChatCompletionsTransport(ProviderTransport):
is_github_models: bool
is_nvidia_nim: bool
is_kimi: bool
is_lmstudio: bool
is_custom_provider: bool
ollama_num_ctx: int | None
# Provider routing
@@ -168,7 +114,6 @@ class ChatCompletionsTransport(ProviderTransport):
# Reasoning
supports_reasoning: bool
github_reasoning_extra: dict | None
lmstudio_reasoning_options: list[str] | None # raw allowed_options from /api/v1/models
# Claude on OpenRouter/Nous max output
anthropic_max_output: int | None
# Extra
@@ -243,7 +188,6 @@ class ChatCompletionsTransport(ProviderTransport):
anthropic_max_out = params.get("anthropic_max_output")
is_nvidia_nim = params.get("is_nvidia_nim", False)
is_kimi = params.get("is_kimi", False)
is_tokenhub = params.get("is_tokenhub", False)
reasoning_config = params.get("reasoning_config")
if ephemeral is not None and max_tokens_fn:
@@ -275,40 +219,12 @@ class ChatCompletionsTransport(ProviderTransport):
_kimi_effort = _e
api_kwargs["reasoning_effort"] = _kimi_effort
# Tencent TokenHub: top-level reasoning_effort (unless thinking disabled)
if is_tokenhub:
_tokenhub_thinking_off = bool(
reasoning_config
and isinstance(reasoning_config, dict)
and reasoning_config.get("enabled") is False
)
if not _tokenhub_thinking_off:
_tokenhub_effort = "high"
if reasoning_config and isinstance(reasoning_config, dict):
_e = (reasoning_config.get("effort") or "").strip().lower()
if _e in ("low", "medium", "high"):
_tokenhub_effort = _e
api_kwargs["reasoning_effort"] = _tokenhub_effort
# LM Studio: top-level reasoning_effort. Only emit when the model
# declares reasoning support via /api/v1/models capabilities (gated
# upstream by params["supports_reasoning"]). resolve_lmstudio_effort
# is shared with run_agent's summary path so both stay in sync.
if params.get("is_lmstudio", False) and params.get("supports_reasoning", False):
_lm_effort = resolve_lmstudio_effort(
reasoning_config,
params.get("lmstudio_reasoning_options"),
)
if _lm_effort is not None:
api_kwargs["reasoning_effort"] = _lm_effort
# extra_body assembly
extra_body: Dict[str, Any] = {}
is_openrouter = params.get("is_openrouter", False)
is_nous = params.get("is_nous", False)
is_github_models = params.get("is_github_models", False)
provider_name = str(params.get("provider_name") or "").strip().lower()
provider_prefs = params.get("provider_preferences")
if provider_prefs and is_openrouter:
@@ -324,9 +240,8 @@ class ChatCompletionsTransport(ProviderTransport):
"type": "enabled" if _kimi_thinking_enabled else "disabled",
}
# Reasoning. LM Studio is handled above via top-level reasoning_effort,
# so skip emitting extra_body.reasoning for it.
if params.get("supports_reasoning", False) and not params.get("is_lmstudio", False):
# Reasoning
if params.get("supports_reasoning", False):
if is_github_models:
gh_reasoning = params.get("github_reasoning_extra")
if gh_reasoning is not None:
@@ -362,11 +277,6 @@ class ChatCompletionsTransport(ProviderTransport):
if is_qwen:
extra_body["vl_high_resolution_images"] = True
if provider_name in {"gemini", "google-gemini-cli"}:
thinking_config = _build_gemini_thinking_config(model, reasoning_config)
if thinking_config:
extra_body["thinking_config"] = thinking_config
# Merge any pre-built extra_body additions
additions = params.get("extra_body_additions")
if additions:
+3 -1
View File
@@ -8,7 +8,7 @@ streaming, or the _run_codex_stream() call path.
from typing import Any, Dict, List, Optional
from agent.transports.base import ProviderTransport
from agent.transports.types import NormalizedResponse, ToolCall
from agent.transports.types import NormalizedResponse, ToolCall, Usage
class ResponsesApiTransport(ProviderTransport):
@@ -151,6 +151,8 @@ class ResponsesApiTransport(ProviderTransport):
"""Normalize Codex Responses API response to NormalizedResponse."""
from agent.codex_responses_adapter import (
_normalize_codex_response,
_extract_responses_message_text,
_extract_responses_reasoning_text,
)
# _normalize_codex_response returns (SimpleNamespace, finish_reason_str)
+8 -12
View File
@@ -30,13 +30,14 @@ model:
# "ollama-cloud" - Ollama Cloud (requires: OLLAMA_API_KEY — https://ollama.com/settings)
# "kilocode" - KiloCode gateway (requires: KILOCODE_API_KEY)
# "ai-gateway" - Vercel AI Gateway (requires: AI_GATEWAY_API_KEY)
# "lmstudio" - LM Studio local server (optional: LM_API_KEY, defaults to http://127.0.0.1:1234/v1)
#
# Local servers (LM Studio, Ollama, vLLM, llama.cpp):
# "custom" - Any other OpenAI-compatible endpoint. Set base_url below.
# Aliases: "ollama", "vllm", "llamacpp" all map to "custom".
# LM Studio is first-class and uses provider: "lmstudio".
# It works with both no-auth and auth-enabled server modes.
# "custom" - Any OpenAI-compatible endpoint. Set base_url below.
# Aliases: "lmstudio", "ollama", "vllm", "llamacpp" all map to "custom".
# Example for LM Studio:
# provider: "lmstudio"
# base_url: "http://localhost:1234/v1"
# No API key needed — local servers typically ignore auth.
#
# Can also be overridden with --provider flag or HERMES_INFERENCE_PROVIDER env var.
provider: "auto"
@@ -605,7 +606,6 @@ platform_toolsets:
signal: [hermes-signal]
homeassistant: [hermes-homeassistant]
qqbot: [hermes-qqbot]
yuanbao: [hermes-yuanbao]
# =============================================================================
# Gateway Platform Settings
@@ -847,12 +847,8 @@ display:
# What Enter does when Hermes is already busy (CLI and gateway platforms).
# interrupt: Interrupt the current run and redirect Hermes (default)
# queue: Queue your message for the next turn
# steer: Inject your message mid-run via /steer, arriving at the agent
# after the next tool call — no interrupt, no role violation.
# Falls back to 'queue' if the agent isn't running yet or if
# images are attached (steer only carries text).
# Ctrl+C (or /stop in gateway) always interrupts regardless of this setting.
# Toggle at runtime with /busy <interrupt|queue|steer>.
# Toggle at runtime with /busy_input_mode <interrupt|queue>.
busy_input_mode: interrupt
# Background process notifications (gateway/messaging only).
@@ -927,7 +923,7 @@ display:
# agent_name: "My Agent" # Banner title and branding
# welcome: "Welcome message" # Shown at CLI startup
# response_label: " ⚔ Agent " # Response box header label
# prompt_symbol: "⚔" # Prompt symbol (bare token; renderers add trailing space)
# prompt_symbol: "⚔ " # Prompt symbol
# tool_prefix: "╎" # Tool output line prefix (default: ┊)
#
skin: default
+201 -550
View File
File diff suppressed because it is too large Load Diff
+5 -34
View File
@@ -21,7 +21,6 @@ from typing import Optional, Dict, List, Any, Union
logger = logging.getLogger(__name__)
from hermes_time import now as _hermes_now
from utils import atomic_replace
try:
from croniter import croniter
@@ -312,12 +311,6 @@ def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None
elif schedule["kind"] == "cron":
if not HAS_CRONITER:
logger.warning(
"Cannot compute next run for cron schedule %r: 'croniter' "
"is not installed. Install the 'cron' extra (pip install "
"'hermes-agent[cron]') to re-enable recurring cron jobs.",
schedule.get("expr"),
)
return None
cron = croniter(schedule["expr"], now)
next_run = cron.get_next(datetime)
@@ -368,7 +361,7 @@ def save_jobs(jobs: List[Dict[str, Any]]):
json.dump({"jobs": jobs, "updated_at": _hermes_now().isoformat()}, f, indent=2)
f.flush()
os.fsync(f.fileno())
atomic_replace(tmp_path, JOBS_FILE)
os.replace(tmp_path, JOBS_FILE)
_secure_file(JOBS_FILE)
except BaseException:
try:
@@ -705,32 +698,10 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None,
# Compute next run
job["next_run_at"] = compute_next_run(job["schedule"], now)
# If no next run, decide whether this is terminal completion
# (one-shot) or a transient failure (recurring schedule couldn't
# compute — e.g. 'croniter' missing from the runtime env).
# Recurring jobs must NEVER be silently disabled: that turns a
# missing runtime dep into "job completed" and the user's
# schedule quietly goes off. See issue #16265.
# If no next run (one-shot completed), disable
if job["next_run_at"] is None:
kind = job.get("schedule", {}).get("kind")
if kind in ("cron", "interval"):
job["state"] = "error"
if not job.get("last_error"):
job["last_error"] = (
"Failed to compute next run for recurring "
"schedule (is the 'croniter' package "
"installed in the gateway's Python env?)"
)
logger.error(
"Job '%s' (%s) could not compute next_run_at; "
"leaving enabled and marking state=error so the "
"job is not silently disabled.",
job.get("name", job["id"]),
kind,
)
else:
job["enabled"] = False
job["state"] = "completed"
job["enabled"] = False
job["state"] = "completed"
elif job.get("state") != "paused":
job["state"] = "scheduled"
@@ -864,7 +835,7 @@ def save_job_output(job_id: str, output: str):
f.write(output)
f.flush()
os.fsync(f.fileno())
atomic_replace(tmp_path, output_file)
os.replace(tmp_path, output_file)
_secure_file(output_file)
except BaseException:
try:
+5 -39
View File
@@ -77,7 +77,7 @@ _KNOWN_DELIVERY_PLATFORMS = frozenset({
"telegram", "discord", "slack", "whatsapp", "signal",
"matrix", "mattermost", "homeassistant", "dingtalk", "feishu",
"wecom", "wecom_callback", "weixin", "sms", "email", "webhook", "bluebubbles",
"qqbot", "yuanbao",
"qqbot",
})
# Platforms that support a configured cron/notification home target, mapped to
@@ -198,9 +198,7 @@ def _resolve_single_delivery_target(job: dict, deliver_value: str) -> Optional[d
if resolved:
parsed_chat_id, parsed_thread_id, resolved_is_explicit = _parse_target_ref(platform_key, resolved)
if resolved_is_explicit:
chat_id = parsed_chat_id
if parsed_thread_id is not None:
thread_id = parsed_thread_id
chat_id, thread_id = parsed_chat_id, parsed_thread_id
else:
chat_id = resolved
except Exception:
@@ -339,7 +337,6 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
"sms": Platform.SMS,
"bluebubbles": Platform.BLUEBUBBLES,
"qqbot": Platform.QQBOT,
"yuanbao": Platform.YUANBAO,
}
# Optionally wrap the content with a header/footer so the user knows this
@@ -718,7 +715,7 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
# Always prepend cron execution guidance so the agent knows how
# delivery works and can suppress delivery when appropriate.
cron_hint = (
"[IMPORTANT: You are running as a scheduled cron job. "
"[SYSTEM: You are running as a scheduled cron job. "
"DELIVERY: Your final response will be automatically delivered "
"to the user — do NOT use send_message or try to deliver "
"the output yourself. Just produce your report/output as your "
@@ -754,7 +751,7 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
parts.append("")
parts.extend(
[
f'[IMPORTANT: The user has invoked the "{skill_name}" skill, indicating they want you to follow its instructions. The full skill content is loaded below.]',
f'[SYSTEM: The user has invoked the "{skill_name}" skill, indicating they want you to follow its instructions. The full skill content is loaded below.]',
"",
content,
]
@@ -762,7 +759,7 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
if skipped:
notice = (
f"[IMPORTANT: The following skill(s) were listed for this job but could not be found "
f"[SYSTEM: The following skill(s) were listed for this job but could not be found "
f"and were skipped: {', '.join(skipped)}. "
f"Start your response with a brief notice so the user is aware, e.g.: "
f"'⚠️ Skill(s) not found and skipped: {', '.join(skipped)}']"
@@ -824,8 +821,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
logger.info("Running job '%s' (ID: %s)", job_name, job_id)
logger.info("Prompt: %s", prompt[:100])
agent = None
# Mark this as a cron session so the approval system can apply cron_mode.
# This env var is process-wide and persists for the lifetime of the
# scheduler process — every job this process runs is a cron job.
@@ -1174,24 +1169,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
_session_db.close()
except (Exception, KeyboardInterrupt) as e:
logger.debug("Job '%s': failed to close SQLite session store: %s", job_id, e)
# Release subprocesses, terminal sandboxes, browser daemons, and the
# main OpenAI/httpx client held by this ephemeral cron agent. Without
# this, a gateway that ticks cron every N minutes leaks fds per job
# until it hits EMFILE (#10200 / "too many open files").
try:
if agent is not None:
agent.close()
except (Exception, KeyboardInterrupt) as e:
logger.debug("Job '%s': failed to close agent resources: %s", job_id, e)
# Each cron run spins up a short-lived worker thread whose event loop
# dies as soon as the ``ThreadPoolExecutor`` shuts down. Any async
# httpx clients cached under that loop are now unusable — reap them
# so their transports don't accumulate in the process-global cache.
try:
from agent.auxiliary_client import cleanup_stale_async_clients
cleanup_stale_async_clients()
except Exception as e:
logger.debug("Job '%s': failed to reap stale auxiliary clients: %s", job_id, e)
def tick(verbose: bool = True, adapters=None, loop=None) -> int:
@@ -1331,17 +1308,6 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int:
_futures.append(_tick_pool.submit(_ctx.run, _process_job, job))
_results.extend(f.result() for f in _futures)
# Best-effort sweep of MCP stdio subprocesses that survived their
# session teardown during this tick. Runs AFTER every job has
# finished so active sessions (including live user chats) are
# never touched — only PIDs explicitly detected as orphans in
# tools.mcp_tool._run_stdio's finally block are reaped.
try:
from tools.mcp_tool import _kill_orphaned_mcp_children
_kill_orphaned_mcp_children()
except Exception as _e:
logger.debug("Post-tick MCP orphan cleanup failed: %s", _e)
return sum(_results)
finally:
if fcntl:
Binary file not shown.
-1
View File
@@ -36,7 +36,6 @@
imports = [
./nix/packages.nix
./nix/overlays.nix
./nix/nixosModules.nix
./nix/checks.nix
./nix/devShell.nix
+85
View File
@@ -0,0 +1,85 @@
"""Built-in boot-md hook — run ~/.hermes/BOOT.md on gateway startup.
This hook is always registered. It silently skips if no BOOT.md exists.
To activate, create ``~/.hermes/BOOT.md`` with instructions for the
agent to execute on every gateway restart.
Example BOOT.md::
# Startup Checklist
1. Check if any cron jobs failed overnight
2. Send a status update to Discord #general
3. If there are errors in /opt/app/deploy.log, summarize them
The agent runs in a background thread so it doesn't block gateway
startup. If nothing needs attention, it replies with [SILENT] to
suppress delivery.
"""
import logging
import threading
logger = logging.getLogger("hooks.boot-md")
from hermes_constants import get_hermes_home
HERMES_HOME = get_hermes_home()
BOOT_FILE = HERMES_HOME / "BOOT.md"
def _build_boot_prompt(content: str) -> str:
"""Wrap BOOT.md content in a system-level instruction."""
return (
"You are running a startup boot checklist. Follow the BOOT.md "
"instructions below exactly.\n\n"
"---\n"
f"{content}\n"
"---\n\n"
"Execute each instruction. If you need to send a message to a "
"platform, use the send_message tool.\n"
"If nothing needs attention and there is nothing to report, "
"reply with ONLY: [SILENT]"
)
def _run_boot_agent(content: str) -> None:
"""Spawn a one-shot agent session to execute the boot instructions."""
try:
from run_agent import AIAgent
prompt = _build_boot_prompt(content)
agent = AIAgent(
quiet_mode=True,
skip_context_files=True,
skip_memory=True,
max_iterations=20,
)
result = agent.run_conversation(prompt)
response = result.get("final_response", "")
if response and "[SILENT]" not in response:
logger.info("boot-md completed: %s", response[:200])
else:
logger.info("boot-md completed (nothing to report)")
except Exception as e:
logger.error("boot-md agent failed: %s", e)
async def handle(event_type: str, context: dict) -> None:
"""Gateway startup handler — run BOOT.md if it exists."""
if not BOOT_FILE.exists():
return
content = BOOT_FILE.read_text(encoding="utf-8").strip()
if not content:
return
logger.info("Running BOOT.md (%d chars)", len(content))
# Run in a background thread so we don't block gateway startup.
thread = threading.Thread(
target=_run_boot_agent,
args=(content,),
name="boot-md",
daemon=True,
)
thread.start()
+14 -67
View File
@@ -57,7 +57,7 @@ def _session_entry_name(origin: Dict[str, Any]) -> str:
# Build / refresh
# ---------------------------------------------------------------------------
async def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
"""
Build a channel directory from connected platform adapters and session data.
@@ -72,7 +72,7 @@ async def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
if platform == Platform.DISCORD:
platforms["discord"] = _build_discord(adapter)
elif platform == Platform.SLACK:
platforms["slack"] = await _build_slack(adapter)
platforms["slack"] = _build_slack(adapter)
except Exception as e:
logger.warning("Channel directory: failed to build %s: %s", platform.value, e)
@@ -136,66 +136,21 @@ def _build_discord(adapter) -> List[Dict[str, str]]:
return channels
async def _build_slack(adapter) -> List[Dict[str, Any]]:
"""List Slack channels the bot has joined across all workspaces.
Uses ``users.conversations`` against each workspace's web client. Pulls
public + private channels the bot is a member of, then merges in DMs
discovered from session history (IMs aren't useful to enumerate
proactively).
"""
team_clients = getattr(adapter, "_team_clients", None) or {}
if not team_clients:
def _build_slack(adapter) -> List[Dict[str, str]]:
"""List Slack channels the bot has joined."""
# Slack adapter may expose a web client
client = getattr(adapter, "_app", None) or getattr(adapter, "_client", None)
if not client:
return _build_from_sessions("slack")
channels: List[Dict[str, Any]] = []
seen_ids: set = set()
try:
from tools.send_message_tool import _send_slack # noqa: F401
# Use the Slack Web API directly if available
except Exception:
pass
for team_id, client in team_clients.items():
try:
cursor: Optional[str] = None
for _page in range(20): # safety cap on pagination
response = await client.users_conversations(
types="public_channel,private_channel",
exclude_archived=True,
limit=200,
cursor=cursor,
)
if not response.get("ok"):
logger.warning(
"Channel directory: users.conversations not ok for team %s: %s",
team_id,
response.get("error", "unknown"),
)
break
for ch in response.get("channels", []):
cid = ch.get("id")
name = ch.get("name")
if not cid or not name or cid in seen_ids:
continue
seen_ids.add(cid)
channels.append({
"id": cid,
"name": name,
"type": "private" if ch.get("is_private") else "channel",
})
cursor = (response.get("response_metadata") or {}).get("next_cursor")
if not cursor:
break
except Exception as e:
logger.warning(
"Channel directory: failed to list Slack channels for team %s: %s",
team_id, e,
)
continue
# Merge in DM/group entries discovered from session history.
for entry in _build_from_sessions("slack"):
if entry.get("id") not in seen_ids:
channels.append(entry)
seen_ids.add(entry.get("id"))
return channels
# Fallback to session data
return _build_from_sessions("slack")
def _build_from_sessions(platform_name: str) -> List[Dict[str, str]]:
@@ -268,14 +223,6 @@ def resolve_channel_name(platform_name: str, name: str) -> Optional[str]:
if not channels:
return None
# 0. Exact ID match — case-sensitive, no normalization. Lets callers pass
# raw platform IDs (e.g. Slack "C0B0QV5434G") even when the format guard
# in _parse_target_ref hasn't recognized them as explicit.
raw = name.strip()
for ch in channels:
if ch.get("id") == raw:
return ch["id"]
query = _normalize_channel_query(name)
# 1. Exact name match, including the display labels shown by send_message(action="list")
+3 -84
View File
@@ -67,7 +67,6 @@ class Platform(Enum):
WEIXIN = "weixin"
BLUEBUBBLES = "bluebubbles"
QQBOT = "qqbot"
YUANBAO = "yuanbao"
@dataclass
@@ -196,14 +195,6 @@ class StreamingConfig:
edit_interval: float = 1.0 # Seconds between message edits (Telegram rate-limits at ~1/s)
buffer_threshold: int = 40 # Chars before forcing an edit
cursor: str = "" # Cursor shown during streaming
# Ported from openclaw/openclaw#72038. When >0, the final edit for
# a long-running streamed response is delivered as a fresh message
# if the original preview has been visible for at least this many
# seconds, so the platform's visible timestamp reflects completion
# time instead of the preview creation time. Currently applied to
# Telegram only (other platforms ignore the setting). Default 60s
# matches the OpenClaw rollout. Set to 0 to disable.
fresh_final_after_seconds: float = 60.0
def to_dict(self) -> Dict[str, Any]:
return {
@@ -212,7 +203,6 @@ class StreamingConfig:
"edit_interval": self.edit_interval,
"buffer_threshold": self.buffer_threshold,
"cursor": self.cursor,
"fresh_final_after_seconds": self.fresh_final_after_seconds,
}
@classmethod
@@ -225,9 +215,6 @@ class StreamingConfig:
edit_interval=float(data.get("edit_interval", 1.0)),
buffer_threshold=int(data.get("buffer_threshold", 40)),
cursor=data.get("cursor", ""),
fresh_final_after_seconds=float(
data.get("fresh_final_after_seconds", 60.0)
),
)
@@ -327,9 +314,6 @@ class GatewayConfig:
# QQBot uses extra dict for app credentials
elif platform == Platform.QQBOT and config.extra.get("app_id") and config.extra.get("client_secret"):
connected.append(platform)
# Yuanbao uses extra dict for app credentials
elif platform == Platform.YUANBAO and config.extra.get("app_id") and config.extra.get("app_secret"):
connected.append(platform)
# DingTalk uses client_id/client_secret from config.extra or env vars
elif platform == Platform.DINGTALK and (
config.extra.get("client_id") or os.getenv("DINGTALK_CLIENT_ID")
@@ -566,8 +550,6 @@ def load_gateway_config() -> GatewayConfig:
existing = {}
# Deep-merge extra dicts so gateway.json defaults survive
merged_extra = {**existing.get("extra", {}), **plat_block.get("extra", {})}
if plat_name == Platform.SLACK.value and "enabled" in plat_block:
merged_extra["_enabled_explicit"] = True
merged = {**existing, **plat_block}
if merged_extra:
merged["extra"] = merged_extra
@@ -588,8 +570,6 @@ def load_gateway_config() -> GatewayConfig:
)
if "reply_prefix" in platform_cfg:
bridged["reply_prefix"] = platform_cfg["reply_prefix"]
if "reply_in_thread" in platform_cfg:
bridged["reply_in_thread"] = platform_cfg["reply_in_thread"]
if "require_mention" in platform_cfg:
bridged["require_mention"] = platform_cfg["require_mention"]
if "free_response_channels" in platform_cfg:
@@ -604,7 +584,7 @@ def load_gateway_config() -> GatewayConfig:
bridged["group_policy"] = platform_cfg["group_policy"]
if "group_allow_from" in platform_cfg:
bridged["group_allow_from"] = platform_cfg["group_allow_from"]
if plat in (Platform.DISCORD, Platform.SLACK) and "channel_skill_bindings" in platform_cfg:
if plat == Platform.DISCORD and "channel_skill_bindings" in platform_cfg:
bridged["channel_skill_bindings"] = platform_cfg["channel_skill_bindings"]
if "channel_prompts" in platform_cfg:
channel_prompts = platform_cfg["channel_prompts"]
@@ -612,21 +592,16 @@ def load_gateway_config() -> GatewayConfig:
bridged["channel_prompts"] = {str(k): v for k, v in channel_prompts.items()}
else:
bridged["channel_prompts"] = channel_prompts
enabled_was_explicit = "enabled" in platform_cfg
if not bridged and not enabled_was_explicit:
if not bridged:
continue
plat_data = platforms_data.setdefault(plat.value, {})
if not isinstance(plat_data, dict):
plat_data = {}
platforms_data[plat.value] = plat_data
if enabled_was_explicit:
plat_data["enabled"] = platform_cfg["enabled"]
extra = plat_data.setdefault("extra", {})
if not isinstance(extra, dict):
extra = {}
plat_data["extra"] = extra
if plat == Platform.SLACK and enabled_was_explicit:
extra["_enabled_explicit"] = True
extra.update(bridged)
# Slack settings → env vars (env vars take precedence)
@@ -634,8 +609,6 @@ def load_gateway_config() -> GatewayConfig:
if isinstance(slack_cfg, dict):
if "require_mention" in slack_cfg and not os.getenv("SLACK_REQUIRE_MENTION"):
os.environ["SLACK_REQUIRE_MENTION"] = str(slack_cfg["require_mention"]).lower()
if "strict_mention" in slack_cfg and not os.getenv("SLACK_STRICT_MENTION"):
os.environ["SLACK_STRICT_MENTION"] = str(slack_cfg["strict_mention"]).lower()
if "allow_bots" in slack_cfg and not os.getenv("SLACK_ALLOW_BOTS"):
os.environ["SLACK_ALLOW_BOTS"] = str(slack_cfg["allow_bots"]).lower()
frc = slack_cfg.get("free_response_channels")
@@ -945,20 +918,8 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
slack_token = os.getenv("SLACK_BOT_TOKEN")
if slack_token:
if Platform.SLACK not in config.platforms:
# No yaml config for Slack — env-only setup, enable it
config.platforms[Platform.SLACK] = PlatformConfig()
config.platforms[Platform.SLACK].enabled = True
else:
slack_config = config.platforms[Platform.SLACK]
enabled_was_explicit = bool(slack_config.extra.pop("_enabled_explicit", False))
if not slack_config.enabled and not enabled_was_explicit:
# Top-level Slack settings such as channel prompts should not
# turn an env-token setup into a disabled platform. Only an
# explicit slack.enabled/platforms.slack.enabled false should.
slack_config.enabled = True
# If yaml config exists, respect its enabled flag (don't override
# explicit enabled: false). Token is still stored so skills that
# send Slack messages can use it without activating the gateway adapter.
config.platforms[Platform.SLACK].enabled = True
config.platforms[Platform.SLACK].token = slack_token
slack_home = os.getenv("SLACK_HOME_CHANNEL")
if slack_home and Platform.SLACK in config.platforms:
@@ -1315,48 +1276,6 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
name=os.getenv("QQBOT_HOME_CHANNEL_NAME") or os.getenv(qq_home_name_env, "Home"),
)
# Yuanbao — YUANBAO_APP_ID preferred
yuanbao_app_id = os.getenv("YUANBAO_APP_ID") or os.getenv("YUANBAO_APP_KEY")
yuanbao_app_secret = os.getenv("YUANBAO_APP_SECRET")
if yuanbao_app_id and yuanbao_app_secret:
if Platform.YUANBAO not in config.platforms:
config.platforms[Platform.YUANBAO] = PlatformConfig()
config.platforms[Platform.YUANBAO].enabled = True
extra = config.platforms[Platform.YUANBAO].extra
extra["app_id"] = yuanbao_app_id
extra["app_secret"] = yuanbao_app_secret
yuanbao_bot_id = os.getenv("YUANBAO_BOT_ID")
if yuanbao_bot_id:
extra["bot_id"] = yuanbao_bot_id
yuanbao_ws_url = os.getenv("YUANBAO_WS_URL")
if yuanbao_ws_url:
extra["ws_url"] = yuanbao_ws_url
yuanbao_api_domain = os.getenv("YUANBAO_API_DOMAIN")
if yuanbao_api_domain:
extra["api_domain"] = yuanbao_api_domain
yuanbao_route_env = os.getenv("YUANBAO_ROUTE_ENV")
if yuanbao_route_env:
extra["route_env"] = yuanbao_route_env
yuanbao_home = os.getenv("YUANBAO_HOME_CHANNEL")
if yuanbao_home:
config.platforms[Platform.YUANBAO].home_channel = HomeChannel(
platform=Platform.YUANBAO,
chat_id=yuanbao_home,
name=os.getenv("YUANBAO_HOME_CHANNEL_NAME", "Home"),
)
yuanbao_dm_policy = os.getenv("YUANBAO_DM_POLICY")
if yuanbao_dm_policy:
extra["dm_policy"] = yuanbao_dm_policy.strip().lower()
yuanbao_dm_allow_from = os.getenv("YUANBAO_DM_ALLOW_FROM")
if yuanbao_dm_allow_from:
extra["dm_allow_from"] = yuanbao_dm_allow_from
yuanbao_group_policy = os.getenv("YUANBAO_GROUP_POLICY")
if yuanbao_group_policy:
extra["group_policy"] = yuanbao_group_policy.strip().lower()
yuanbao_group_allow_from = os.getenv("YUANBAO_GROUP_ALLOW_FROM")
if yuanbao_group_allow_from:
extra["group_allow_from"] = yuanbao_group_allow_from
# Session settings
idle_minutes = os.getenv("SESSION_IDLE_MINUTES")
if idle_minutes:
+1 -3
View File
@@ -79,9 +79,7 @@ _PLATFORM_DEFAULTS: dict[str, dict[str, Any]] = {
"discord": _TIER_HIGH,
# Tier 2 — edit support, often customer/workspace channels
# Slack: tool_progress off by default — Bolt posts cannot be edited like CLI;
# "new"/"all" spam permanent lines in channels (hermes-agent#14663).
"slack": {**_TIER_MEDIUM, "tool_progress": "off"},
"slack": _TIER_MEDIUM,
"mattermost": _TIER_MEDIUM,
"matrix": _TIER_MEDIUM,
"feishu": _TIER_MEDIUM,
+28 -9
View File
@@ -21,6 +21,7 @@ Errors in hooks are caught and logged but never block the main pipeline.
import asyncio
import importlib.util
import sys
from typing import Any, Callable, Dict, List, Optional
import yaml
@@ -52,13 +53,19 @@ class HookRegistry:
return list(self._loaded_hooks)
def _register_builtin_hooks(self) -> None:
"""Register built-in hooks that are always active.
"""Register built-in hooks that are always active."""
try:
from gateway.builtin_hooks.boot_md import handle as boot_md_handle
Currently empty no shipped built-in hooks. Kept as the extension
point for future always-on gateway hooks so they drop in without
re-plumbing discover_and_load().
"""
return
self._handlers.setdefault("gateway:startup", []).append(boot_md_handle)
self._loaded_hooks.append({
"name": "boot-md",
"description": "Run ~/.hermes/BOOT.md on gateway startup",
"events": ["gateway:startup"],
"path": "(builtin)",
})
except Exception as e:
print(f"[hooks] Could not load built-in boot-md hook: {e}", flush=True)
def discover_and_load(self) -> None:
"""
@@ -97,16 +104,28 @@ class HookRegistry:
print(f"[hooks] Skipping {hook_name}: no events declared", flush=True)
continue
# Dynamically load the handler module
# Dynamically load the handler module.
# Register in sys.modules BEFORE exec_module so Pydantic /
# dataclasses / typing introspection can resolve forward
# references (triggered by `from __future__ import annotations`
# in the handler). Without this, a handler that declares a
# Pydantic BaseModel for webhook/event payloads fails at first
# dispatch with "TypeAdapter ... is not fully defined".
module_name = f"hermes_hook_{hook_name}"
spec = importlib.util.spec_from_file_location(
f"hermes_hook_{hook_name}", handler_path
module_name, handler_path
)
if spec is None or spec.loader is None:
print(f"[hooks] Skipping {hook_name}: could not load handler.py", flush=True)
continue
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
sys.modules[module_name] = module
try:
spec.loader.exec_module(module)
except Exception:
sys.modules.pop(module_name, None)
raise
handle_fn = getattr(module, "handle", None)
if handle_fn is None:
+11 -57
View File
@@ -28,7 +28,6 @@ def mirror_to_session(
message_text: str,
source_label: str = "cli",
thread_id: Optional[str] = None,
user_id: Optional[str] = None,
) -> bool:
"""
Append a delivery-mirror message to the target session's transcript.
@@ -40,20 +39,9 @@ def mirror_to_session(
All errors are caught -- this is never fatal.
"""
try:
session_id = _find_session_id(
platform,
str(chat_id),
thread_id=thread_id,
user_id=user_id,
)
session_id = _find_session_id(platform, str(chat_id), thread_id=thread_id)
if not session_id:
logger.debug(
"Mirror: no session found for %s:%s:%s:%s",
platform,
chat_id,
thread_id,
user_id,
)
logger.debug("Mirror: no session found for %s:%s:%s", platform, chat_id, thread_id)
return False
mirror_msg = {
@@ -71,33 +59,17 @@ def mirror_to_session(
return True
except Exception as e:
logger.debug(
"Mirror failed for %s:%s:%s:%s: %s",
platform,
chat_id,
thread_id,
user_id,
e,
)
logger.debug("Mirror failed for %s:%s:%s: %s", platform, chat_id, thread_id, e)
return False
def _find_session_id(
platform: str,
chat_id: str,
thread_id: Optional[str] = None,
user_id: Optional[str] = None,
) -> Optional[str]:
def _find_session_id(platform: str, chat_id: str, thread_id: Optional[str] = None) -> Optional[str]:
"""
Find the active session_id for a platform + chat_id pair.
Scans sessions.json entries and matches where origin.chat_id == chat_id
on the right platform. DM session keys don't embed the chat_id
(e.g. "agent:main:telegram:dm"), so we check the origin dict.
When *user_id* is provided, prefer exact sender matches. If multiple
same-chat candidates exist and none matches the user, return None instead
of guessing and contaminating another participant's session.
"""
if not _SESSIONS_INDEX.exists():
return None
@@ -109,7 +81,8 @@ def _find_session_id(
return None
platform_lower = platform.lower()
candidates = []
best_match = None
best_updated = ""
for _key, entry in data.items():
origin = entry.get("origin") or {}
@@ -123,31 +96,12 @@ def _find_session_id(
origin_thread_id = origin.get("thread_id")
if thread_id is not None and str(origin_thread_id or "") != str(thread_id):
continue
candidates.append(entry)
updated = entry.get("updated_at", "")
if updated > best_updated:
best_updated = updated
best_match = entry.get("session_id")
if not candidates:
return None
if user_id:
exact_user_matches = [
entry for entry in candidates
if str((entry.get("origin") or {}).get("user_id") or "") == str(user_id)
]
if exact_user_matches:
candidates = exact_user_matches
elif len(candidates) > 1:
return None
elif len(candidates) > 1:
distinct_user_ids = {
str((entry.get("origin") or {}).get("user_id") or "").strip()
for entry in candidates
if str((entry.get("origin") or {}).get("user_id") or "").strip()
}
if len(distinct_user_ids) > 1:
return None
best_entry = max(candidates, key=lambda entry: entry.get("updated_at", ""))
return best_entry.get("session_id")
return best_match
def _append_to_jsonl(session_id: str, message: dict) -> None:
+1 -2
View File
@@ -28,7 +28,6 @@ from pathlib import Path
from typing import Optional
from hermes_constants import get_hermes_dir
from utils import atomic_replace
# Unambiguous alphabet -- excludes 0/O, 1/I to prevent confusion
@@ -60,7 +59,7 @@ def _secure_write(path: Path, data: str) -> None:
f.write(data)
f.flush()
os.fsync(f.fileno())
atomic_replace(tmp_path, path)
os.replace(tmp_path, str(path))
try:
os.chmod(path, 0o600)
except OSError:
-2
View File
@@ -10,12 +10,10 @@ Each adapter handles:
from .base import BasePlatformAdapter, MessageEvent, SendResult
from .qqbot import QQAdapter
from .yuanbao import YuanbaoAdapter
__all__ = [
"BasePlatformAdapter",
"MessageEvent",
"SendResult",
"QQAdapter",
"YuanbaoAdapter",
]
+11 -198
View File
@@ -307,14 +307,9 @@ def proxy_kwargs_for_aiohttp(proxy_url: str | None) -> tuple[dict, dict]:
"""Build kwargs for standalone ``aiohttp.ClientSession`` with proxy.
Returns ``(session_kwargs, request_kwargs)`` where:
- With aiohttp-socks ``({"connector": ProxyConnector(...)}, {})``
for *all* proxy schemes (SOCKS **and** HTTP/HTTPS).
- HTTP without aiohttp-socks ``({}, {"proxy": url})``.
- None ``({}, {})``.
Prefer the connector path: it works transparently with libraries
(like mautrix) that call ``session.request()`` without forwarding
per-request ``proxy=`` kwargs.
- SOCKS ``({"connector": ProxyConnector(...)}, {})``
- HTTP ``({}, {"proxy": url})``
- None ``({}, {})``
Usage::
@@ -325,53 +320,20 @@ def proxy_kwargs_for_aiohttp(proxy_url: str | None) -> tuple[dict, dict]:
"""
if not proxy_url:
return {}, {}
try:
from aiohttp_socks import ProxyConnector
if proxy_url.lower().startswith("socks"):
try:
from aiohttp_socks import ProxyConnector
connector = ProxyConnector.from_url(proxy_url, rdns=True)
return {"connector": connector}, {}
except ImportError:
if proxy_url.lower().startswith("socks"):
connector = ProxyConnector.from_url(proxy_url, rdns=True)
return {"connector": connector}, {}
except ImportError:
logger.warning(
"aiohttp_socks not installed — SOCKS proxy %s ignored. "
"Run: pip install aiohttp-socks",
proxy_url,
)
return {}, {}
return {}, {"proxy": proxy_url}
def is_host_excluded_by_no_proxy(hostname: str, no_proxy_value: str | None = None) -> bool:
"""Return True when ``hostname`` matches a ``NO_PROXY`` entry.
Supports comma- or whitespace-separated entries with optional leading dots
and ``*.`` wildcards, which match both the apex domain and subdomains.
"""
raw = no_proxy_value
if raw is None:
raw = os.environ.get("NO_PROXY") or os.environ.get("no_proxy") or ""
raw = raw.strip()
if not raw:
return False
lower_hostname = hostname.lower()
for entry in re.split(r"[\s,]+", raw):
normalized = entry.strip().lower()
if not normalized:
continue
if normalized == "*":
return True
if normalized.startswith("*."):
normalized = normalized[2:]
elif normalized.startswith("."):
normalized = normalized[1:]
if lower_hostname == normalized or lower_hostname.endswith(f".{normalized}"):
return True
return False
return {}, {"proxy": proxy_url}
from dataclasses import dataclass, field
@@ -731,15 +693,7 @@ SUPPORTED_DOCUMENT_TYPES = {
".pdf": "application/pdf",
".md": "text/markdown",
".txt": "text/plain",
".csv": "text/csv",
".log": "text/plain",
".json": "application/json",
".xml": "application/xml",
".yaml": "application/yaml",
".yml": "application/yaml",
".toml": "application/toml",
".ini": "text/plain",
".cfg": "text/plain",
".zip": "application/zip",
".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
@@ -907,41 +861,6 @@ class MessageEvent:
return args
_PLAINTEXT_GATEWAY_RESTART_PATTERNS: tuple[re.Pattern[str], ...] = (
re.compile(r"^(?:please\s+)?restart\s+(?:the\s+)?gateway[.!?\s]*$", re.IGNORECASE),
re.compile(r"^(?:please\s+)?restart\s+(?:the\s+)?hermes\s+gateway[.!?\s]*$", re.IGNORECASE),
re.compile(r"^(?:please\s+)?restart\s+hermes[.!?\s]*$", re.IGNORECASE),
)
def coerce_plaintext_gateway_command(event: "MessageEvent") -> None:
"""Rewrite a tiny set of DM plaintext admin phrases into slash commands.
This keeps high-impact operational phrases like ``restart gateway`` out of
the LLM/tool path, where they can trigger a self-restart from inside the
currently running agent and leave the gateway stuck in ``draining`` while it
waits for that same agent to finish.
Scope is intentionally narrow: DM text messages only, exact restart-style
phrases only. Group chats keep natural-language semantics.
"""
try:
if event is None or event.message_type != MessageType.TEXT:
return
text = (event.text or "").strip()
if not text or text.startswith("/"):
return
source = getattr(event, "source", None)
if getattr(source, "chat_type", None) != "dm":
return
for pattern in _PLAINTEXT_GATEWAY_RESTART_PATTERNS:
if pattern.match(text):
event.text = "/restart"
return
except Exception:
return
@dataclass
class SendResult:
"""Result of sending a message."""
@@ -1063,61 +982,6 @@ def resolve_channel_prompt(
return None
def resolve_channel_skills(
config_extra: dict,
channel_id: str,
parent_id: str | None = None,
) -> list[str] | None:
"""Resolve auto-loaded skill(s) for a channel/thread from platform config.
Looks up ``channel_skill_bindings`` in the adapter's ``config.extra`` dict.
Config format::
channel_skill_bindings:
- id: "C0123" # Slack channel ID or Discord channel/forum ID
skills: ["skill-a", "skill-b"]
- id: "D0ABCDE"
skill: "solo-skill" # single string also accepted
Prefers an exact match on *channel_id*; falls back to *parent_id*
(useful for forum threads / Slack threads inheriting the parent channel's
binding).
Returns a deduplicated list of skill names (order preserved), or None if
no match is found.
"""
bindings = config_extra.get("channel_skill_bindings") or []
if not isinstance(bindings, list) or not bindings:
return None
ids_to_check: set[str] = set()
if channel_id:
ids_to_check.add(str(channel_id))
if parent_id:
ids_to_check.add(str(parent_id))
if not ids_to_check:
return None
for entry in bindings:
if not isinstance(entry, dict):
continue
entry_id = str(entry.get("id", ""))
if entry_id in ids_to_check:
skills = entry.get("skills") or entry.get("skill")
if isinstance(skills, str):
s = skills.strip()
return [s] if s else None
if isinstance(skills, list) and skills:
seen: list[str] = []
for name in skills:
if not isinstance(name, str):
continue
nm = name.strip()
if nm and nm not in seen:
seen.append(nm)
return seen or None
return None
class BasePlatformAdapter(ABC):
"""
Base class for platform adapters.
@@ -1394,27 +1258,6 @@ class BasePlatformAdapter(ABC):
"""
return SendResult(success=False, error="Not supported")
async def delete_message(
self,
chat_id: str,
message_id: str,
) -> bool:
"""
Delete a previously sent message. Optional platforms that don't
support deletion return ``False`` and callers fall back to leaving
the message in place.
Used by the stream consumer's fresh-final cleanup path (see
openclaw/openclaw#72038) to remove long-lived preview messages
after sending the completed reply as a fresh message so the
platform's visible timestamp reflects completion time.
Returns ``True`` on successful deletion, ``False`` otherwise.
Subclasses should override for platforms with a deletion API
(e.g. Telegram ``deleteMessage``).
"""
return False
async def send_typing(self, chat_id: str, metadata=None) -> None:
"""
Send a typing indicator.
@@ -1742,41 +1585,13 @@ class BasePlatformAdapter(ABC):
the agent is waiting for dangerous-command approval). This is critical
for Slack's Assistant API where ``assistant_threads_setStatus`` disables
the compose box pausing lets the user type ``/approve`` or ``/deny``.
Each ``send_typing`` call is bounded by a ~1.5s timeout so a slow
network round-trip can't stall the refresh cadence. Telegram- and
Discord-side typing expire after ~5s; if any individual send_typing
takes longer than the refresh interval, the bubble would die and
stay dead until that call returns. Abandoning the slow call lets
the next tick fire a fresh send_typing on schedule as long as
one of them succeeds within the 5s platform-side window, the bubble
stays visible across provider stalls / upstream API timeouts.
"""
# Bound each send_typing round-trip so the refresh cadence isn't
# gated on network health. Must stay below ``interval`` so a slow
# call gets abandoned before the next scheduled tick.
_send_typing_timeout = max(0.25, min(1.5, interval - 0.25))
try:
while True:
if stop_event is not None and stop_event.is_set():
return
if chat_id not in self._typing_paused:
try:
await asyncio.wait_for(
self.send_typing(chat_id, metadata=metadata),
timeout=_send_typing_timeout,
)
except asyncio.TimeoutError:
# Slow network — abandon this tick, keep the loop
# on schedule so the next send_typing fires fresh.
pass
except asyncio.CancelledError:
raise
except Exception as typing_err:
logger.debug(
"[%s] send_typing error (non-fatal): %s",
self.name, typing_err,
)
await self.send_typing(chat_id, metadata=metadata)
if stop_event is None:
await asyncio.sleep(interval)
continue
@@ -2228,8 +2043,6 @@ class BasePlatformAdapter(ABC):
"""
if not self._message_handler:
return
coerce_plaintext_gateway_command(event)
session_key = build_session_key(
event.source,
+19 -18
View File
@@ -305,7 +305,7 @@ class VoiceReceiver:
encrypted = bytes(payload_with_nonce[:-4])
try:
import nacl.secret # noqa: E402 — delayed import, only in voice path
import nacl.secret # noqa: delayed import only in voice path
box = nacl.secret.Aead(self._secret_key)
decrypted = box.decrypt(encrypted, header, bytes(nonce))
except Exception as e:
@@ -813,14 +813,7 @@ class DiscordAdapter(BasePlatformAdapter):
logger.info("[%s] Synced %d slash command(s) via bulk tree sync", self.name, len(synced))
return
# Discord's per-app command-management bucket is ~5 writes / 20 s,
# so a mass-prune-plus-upsert reconcile (e.g. 77 orphans + 30
# desired = 107 writes) takes several minutes of forced waits.
# A flat 30 s budget blew up reliably under bucket pressure and
# left slash commands broken for ~60 min until the bucket fully
# recovered. Use a wide ceiling; the cap still guards against a
# true hang. (#16713)
summary = await asyncio.wait_for(self._safe_sync_slash_commands(), timeout=600)
summary = await asyncio.wait_for(self._safe_sync_slash_commands(), timeout=30)
logger.info(
"[%s] Safely reconciled %d slash command(s): unchanged=%d updated=%d recreated=%d created=%d deleted=%d",
self.name,
@@ -832,11 +825,7 @@ class DiscordAdapter(BasePlatformAdapter):
summary["deleted"],
)
except asyncio.TimeoutError:
logger.warning(
"[%s] Slash command sync timed out — Discord rate-limit bucket "
"may be saturated; will retry on next reconnect",
self.name,
)
logger.warning("[%s] Slash command sync timed out after 30s", self.name)
except asyncio.CancelledError:
raise
except Exception as e: # pragma: no cover - defensive logging
@@ -2690,8 +2679,21 @@ class DiscordAdapter(BasePlatformAdapter):
skills: ["skill-a", "skill-b"]
Also checks parent_id so forum threads inherit the forum's bindings.
"""
from gateway.platforms.base import resolve_channel_skills
return resolve_channel_skills(self.config.extra, channel_id, parent_id)
bindings = self.config.extra.get("channel_skill_bindings", [])
if not bindings:
return None
ids_to_check = {channel_id}
if parent_id:
ids_to_check.add(parent_id)
for entry in bindings:
entry_id = str(entry.get("id", ""))
if entry_id in ids_to_check:
skills = entry.get("skills") or entry.get("skill")
if isinstance(skills, str):
return [skills]
if isinstance(skills, list) and skills:
return list(dict.fromkeys(skills)) # dedup, preserve order
return None
def _resolve_channel_prompt(self, channel_id: str, parent_id: str | None = None) -> str | None:
"""Resolve a Discord per-channel prompt, preferring the exact channel over its parent."""
@@ -3305,7 +3307,6 @@ class DiscordAdapter(BasePlatformAdapter):
chat_topic = self._get_effective_topic(message.channel, is_thread=is_thread)
# Build source
guild = getattr(message, "guild", None)
source = self.build_source(
chat_id=str(effective_channel.id),
chat_name=chat_name,
@@ -3315,7 +3316,7 @@ class DiscordAdapter(BasePlatformAdapter):
thread_id=thread_id,
chat_topic=chat_topic,
is_bot=getattr(message.author, "bot", False),
guild_id=str(guild.id) if guild else None,
guild_id=str(message.guild.id) if message.guild else None,
parent_chat_id=parent_channel_id,
message_id=str(message.id),
)
-3
View File
@@ -28,7 +28,6 @@ from email.header import decode_header
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.base import MIMEBase
from email.utils import formatdate
from email import encoders
from pathlib import Path
from typing import Any, Dict, List, Optional
@@ -505,7 +504,6 @@ class EmailAdapter(BasePlatformAdapter):
msg["In-Reply-To"] = original_msg_id
msg["References"] = original_msg_id
msg["Date"] = formatdate(localtime=True)
msg_id = f"<hermes-{uuid.uuid4().hex[:12]}@{self._address.split('@')[1]}>"
msg["Message-ID"] = msg_id
@@ -588,7 +586,6 @@ class EmailAdapter(BasePlatformAdapter):
msg["In-Reply-To"] = original_msg_id
msg["References"] = original_msg_id
msg["Date"] = formatdate(localtime=True)
msg_id = f"<hermes-{uuid.uuid4().hex[:12]}@{self._address.split('@')[1]}>"
msg["Message-ID"] = msg_id
+1
View File
@@ -974,6 +974,7 @@ def build_whole_comment_prompt(
def _resolve_model_and_runtime() -> Tuple[str, dict]:
"""Resolve model and provider credentials, same as gateway message handling."""
import os
from gateway.run import _load_gateway_config, _resolve_gateway_model
user_config = _load_gateway_config()
+2 -11
View File
@@ -11,10 +11,10 @@ import logging
import re
import time
from pathlib import Path
from typing import TYPE_CHECKING, Dict
from typing import TYPE_CHECKING, Dict, Optional
if TYPE_CHECKING:
from gateway.platforms.base import MessageEvent
from gateway.platforms.base import BasePlatformAdapter, MessageEvent
logger = logging.getLogger(__name__)
@@ -57,15 +57,6 @@ class MessageDeduplicator:
if len(self._seen) > self._max_size:
cutoff = now - self._ttl
self._seen = {k: v for k, v in self._seen.items() if v > cutoff}
if len(self._seen) > self._max_size:
# TTL pruning alone does not cap the cache when every entry is
# still fresh. Keep the newest entries so the helper's
# max_size bound is enforced under sustained traffic.
newest = sorted(
self._seen.items(),
key=lambda item: item[1],
)[-self._max_size:]
self._seen = dict(newest)
return False
def clear(self):
+47 -493
View File
@@ -11,7 +11,6 @@ Environment variables:
MATRIX_PASSWORD Password (alternative to access token)
MATRIX_ENCRYPTION Set "true" to enable E2EE
MATRIX_DEVICE_ID Stable device ID for E2EE persistence across restarts
MATRIX_PROXY HTTP(S) or SOCKS proxy URL for Matrix traffic
MATRIX_ALLOWED_USERS Comma-separated Matrix user IDs (@user:server)
MATRIX_HOME_ROOM Room ID for cron/notification delivery
MATRIX_REACTIONS Set "false" to disable processing lifecycle reactions
@@ -19,7 +18,6 @@ Environment variables:
MATRIX_REQUIRE_MENTION Require @mention in rooms (default: true)
MATRIX_FREE_RESPONSE_ROOMS Comma-separated room IDs exempt from mention requirement
MATRIX_AUTO_THREAD Auto-create threads for room messages (default: true)
MATRIX_DM_AUTO_THREAD Auto-create threads for DM messages (default: false)
MATRIX_RECOVERY_KEY Recovery key for cross-signing verification after device key rotation
MATRIX_DM_MENTION_THREADS Create a thread when bot is @mentioned in a DM (default: false)
"""
@@ -32,8 +30,6 @@ import mimetypes
import os
import re
import time
from dataclasses import dataclass
from html import escape as _html_escape
from pathlib import Path
from typing import Any, Dict, Optional, Set
@@ -99,25 +95,11 @@ from gateway.platforms.base import (
MessageType,
ProcessingOutcome,
SendResult,
resolve_proxy_url,
proxy_kwargs_for_aiohttp,
)
from gateway.platforms.helpers import ThreadParticipationTracker
logger = logging.getLogger(__name__)
@dataclass
class _MatrixApprovalPrompt:
"""Tracks a pending Matrix reaction-based exec approval prompt."""
def __init__(self, session_key: str, chat_id: str, message_id: str, resolved: bool = False):
self.session_key = session_key
self.chat_id = chat_id
self.message_id = message_id
self.resolved = resolved
self.bot_reaction_events: dict[str, str] = {} # emoji -> event_id
# Matrix message size limit (4000 chars practical, spec has no hard limit
# but clients render poorly above this).
MAX_MESSAGE_LENGTH = 4000
@@ -132,85 +114,11 @@ _CRYPTO_DB_PATH = _STORE_DIR / "crypto.db"
# Grace period: ignore messages older than this many seconds before startup.
_STARTUP_GRACE_SECONDS = 5
_OUTBOUND_MENTION_RE = re.compile(
r"(?<![\w/])(@[0-9A-Za-z._=/-]+:[0-9A-Za-z.-]+(?::\d+)?)"
)
_E2EE_INSTALL_HINT = (
"Install with: pip install 'mautrix[encryption]' (requires libolm C library)"
)
_MATRIX_IMAGE_FILENAME_EXTS = frozenset({
".jpg",
".jpeg",
".png",
".gif",
".webp",
".bmp",
".svg",
".heic",
".heif",
".avif",
})
def _looks_like_matrix_image_filename(text: str) -> bool:
"""Return True when Matrix image body text is probably just a transport filename.
Matrix ``m.image`` events commonly populate ``content.body`` with the uploaded
filename when the user did not add a caption. Treating that raw filename as
user-authored text confuses downstream vision enrichment.
"""
candidate = str(text or "").strip()
if not candidate or "\n" in candidate or candidate.endswith("/"):
return False
name = Path(candidate).name
if not name or name != candidate:
return False
suffix = Path(name).suffix.lower()
if not suffix:
return False
guessed_type, _ = mimetypes.guess_type(name)
if guessed_type and guessed_type.startswith("image/"):
return True
return suffix in _MATRIX_IMAGE_FILENAME_EXTS
def _create_matrix_session(proxy_url: str | None):
"""Create an ``aiohttp.ClientSession`` whose proxy applies to *all* requests.
mautrix's ``HTTPAPI._send()`` calls ``session.request()`` without forwarding
per-request ``proxy=`` kwargs. For HTTP(S) proxies we use aiohttp's native
``proxy=`` session parameter which sets a default for every request. For SOCKS
we use ``aiohttp_socks.ProxyConnector`` (connector-level).
When no proxy is configured we enable ``trust_env`` so standard env vars
(``HTTP_PROXY`` / ``HTTPS_PROXY``) are honoured automatically.
"""
import aiohttp
if not proxy_url:
return aiohttp.ClientSession(trust_env=True)
if proxy_url.split("://")[0].lower().startswith("socks"):
try:
from aiohttp_socks import ProxyConnector
return aiohttp.ClientSession(
connector=ProxyConnector.from_url(proxy_url, rdns=True),
)
except ImportError:
logger.warning(
"aiohttp_socks not installed — SOCKS proxy %s ignored. "
"Run: pip install aiohttp-socks",
proxy_url,
)
return aiohttp.ClientSession(trust_env=True)
return aiohttp.ClientSession(proxy=proxy_url)
def _check_e2ee_deps() -> bool:
"""Return True if mautrix E2EE dependencies (python-olm) are available."""
@@ -352,9 +260,6 @@ class MatrixAdapter(BasePlatformAdapter):
"1",
"yes",
)
self._dm_auto_thread: bool = os.getenv(
"MATRIX_DM_AUTO_THREAD", "false"
).lower() in ("true", "1", "yes")
self._dm_mention_threads: bool = os.getenv(
"MATRIX_DM_MENTION_THREADS", "false"
).lower() in ("true", "1", "yes")
@@ -365,11 +270,6 @@ class MatrixAdapter(BasePlatformAdapter):
).lower() not in ("false", "0", "no")
self._pending_reactions: dict[tuple[str, str], str] = {}
# Proxy support — resolve once at init, reuse for all HTTP traffic.
self._proxy_url: str | None = resolve_proxy_url(platform_env_var="MATRIX_PROXY")
if self._proxy_url:
logger.info("Matrix: proxy configured — %s", self._proxy_url)
# Text batching: merge rapid successive messages (Telegram-style).
# Matrix clients split long messages around 4000 chars.
self._text_batch_delay_seconds = float(
@@ -381,18 +281,6 @@ class MatrixAdapter(BasePlatformAdapter):
self._pending_text_batches: Dict[str, MessageEvent] = {}
self._pending_text_batch_tasks: Dict[str, asyncio.Task] = {}
# Matrix reaction-based dangerous command approvals.
self._approval_reaction_map = {
"": "once",
"": "deny",
}
self._approval_prompts_by_event: Dict[str, _MatrixApprovalPrompt] = {}
self._approval_prompt_by_session: Dict[str, str] = {}
allowed_users_raw = os.getenv("MATRIX_ALLOWED_USERS", "")
self._allowed_user_ids: Set[str] = {
u.strip() for u in allowed_users_raw.split(",") if u.strip()
}
def _is_duplicate_event(self, event_id) -> bool:
"""Return True if this event was already processed. Tracks the ID otherwise."""
if not event_id:
@@ -438,7 +326,7 @@ class MatrixAdapter(BasePlatformAdapter):
)
return False
except Exception as exc:
logger.error("Matrix: post-upload key verification failed: %s", exc, exc_info=True)
logger.error("Matrix: post-upload key verification failed: %s", exc)
return False
return True
@@ -454,7 +342,6 @@ class MatrixAdapter(BasePlatformAdapter):
logger.error(
"Matrix: cannot verify device keys on server: %s — refusing E2EE",
exc,
exc_info=True,
)
return False
@@ -469,7 +356,7 @@ class MatrixAdapter(BasePlatformAdapter):
try:
await olm.share_keys()
except Exception as exc:
logger.error("Matrix: failed to re-upload device keys: %s", exc, exc_info=True)
logger.error("Matrix: failed to re-upload device keys: %s", exc)
return False
return await self._reverify_keys_after_upload(client, local_ed25519)
@@ -509,7 +396,6 @@ class MatrixAdapter(BasePlatformAdapter):
"Try generating a new access token to get a fresh device.",
client.device_id,
exc,
exc_info=True,
)
return False
return await self._reverify_keys_after_upload(client, local_ed25519)
@@ -534,11 +420,9 @@ class MatrixAdapter(BasePlatformAdapter):
_STORE_DIR.mkdir(parents=True, exist_ok=True)
# Create the HTTP API layer.
client_session = _create_matrix_session(self._proxy_url)
api = HTTPAPI(
base_url=self._homeserver,
token=self._access_token or "",
client_session=client_session,
)
# Create the client.
@@ -581,7 +465,6 @@ class MatrixAdapter(BasePlatformAdapter):
logger.error(
"Matrix: whoami failed — check MATRIX_ACCESS_TOKEN and MATRIX_HOMESERVER: %s",
exc,
exc_info=True,
)
await api.session.close()
return False
@@ -724,44 +607,6 @@ class MatrixAdapter(BasePlatformAdapter):
logger.warning(
"Matrix: recovery key verification failed: %s", exc
)
else:
# No recovery key — bootstrap cross-signing if the bot
# has none yet. Without this, Element shows "Encrypted
# by a device not verified by its owner" on every
# message from this bot, indefinitely. mautrix's
# generate_recovery_key does the full flow: generates
# MSK/SSK/USK, uploads private keys to SSSS, publishes
# public keys to the homeserver, and signs the current
# device with the new SSK. Some homeservers require UIA
# for /keys/device_signing/upload — those will need an
# alternate path; Continuwuity and Synapse-with-shared-
# secret accept the unauthenticated upload.
try:
own_xsign = await olm.get_own_cross_signing_public_keys()
except Exception as exc:
own_xsign = None
logger.warning(
"Matrix: cross-signing key lookup failed: %s", exc
)
if own_xsign is None:
try:
new_recovery_key = await olm.generate_recovery_key()
logger.warning(
"Matrix: bootstrapped cross-signing for %s. "
"SAVE THIS RECOVERY KEY — set "
"MATRIX_RECOVERY_KEY for future restarts so "
"the bot can re-sign its device after key "
"rotation: %s",
client.mxid,
new_recovery_key,
)
except Exception as exc:
logger.warning(
"Matrix: cross-signing bootstrap failed "
"(non-fatal — Element will show 'not "
"verified by its owner'): %s",
exc,
)
client.crypto = olm
logger.info(
@@ -819,7 +664,6 @@ class MatrixAdapter(BasePlatformAdapter):
await asyncio.gather(*tasks)
except Exception as exc:
logger.warning("Matrix: initial sync event dispatch error: %s", exc)
await self._join_pending_invites(sync_data)
else:
logger.warning(
"Matrix: initial sync returned unexpected type %s",
@@ -883,8 +727,17 @@ class MatrixAdapter(BasePlatformAdapter):
chunks = self.truncate_message(formatted, MAX_MESSAGE_LENGTH)
last_event_id = None
for i, chunk in enumerate(chunks):
msg_content = self._build_text_message_content(chunk)
for chunk in chunks:
msg_content: Dict[str, Any] = {
"msgtype": "m.text",
"body": chunk,
}
# Convert markdown to HTML for rich rendering.
html = self._markdown_to_html(chunk)
if html and html != chunk:
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = html
# Reply-to support.
if reply_to:
@@ -991,21 +844,25 @@ class MatrixAdapter(BasePlatformAdapter):
"""Edit an existing message (via m.replace)."""
formatted = self.format_message(content)
new_content = self._build_text_message_content(formatted)
msg_content: Dict[str, Any] = {
"msgtype": "m.text",
"body": f"* {formatted}",
"m.new_content": new_content,
"m.new_content": {
"msgtype": "m.text",
"body": formatted,
},
"m.relates_to": {
"rel_type": "m.replace",
"event_id": message_id,
},
}
if "m.mentions" in new_content:
msg_content["m.mentions"] = new_content["m.mentions"]
if "formatted_body" in new_content:
html = self._markdown_to_html(formatted)
if html and html != formatted:
msg_content["m.new_content"]["format"] = "org.matrix.custom.html"
msg_content["m.new_content"]["formatted_body"] = html
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = f'* {new_content["formatted_body"]}'
msg_content["m.relates_to"] = {
"rel_type": "m.replace",
"event_id": message_id,
}
msg_content["formatted_body"] = f"* {html}"
try:
event_id = await self._client.send_message_event(
@@ -1038,12 +895,10 @@ class MatrixAdapter(BasePlatformAdapter):
# Try aiohttp first (always available), fall back to httpx
try:
import aiohttp as _aiohttp
_sess_kw, _req_kw = proxy_kwargs_for_aiohttp(self._proxy_url)
async with _aiohttp.ClientSession(**_sess_kw) as http:
async with _aiohttp.ClientSession(trust_env=True) as http:
async with http.get(
image_url,
timeout=_aiohttp.ClientTimeout(total=30),
**_req_kw,
image_url, timeout=_aiohttp.ClientTimeout(total=30)
) as resp:
resp.raise_for_status()
data = await resp.read()
@@ -1053,10 +908,8 @@ class MatrixAdapter(BasePlatformAdapter):
)
except ImportError:
import httpx
_httpx_kw: dict = {}
if self._proxy_url:
_httpx_kw["proxy"] = self._proxy_url
async with httpx.AsyncClient(**_httpx_kw) as http:
async with httpx.AsyncClient() as http:
resp = await http.get(image_url, follow_redirects=True, timeout=30)
resp.raise_for_status()
data = resp.content
@@ -1131,56 +984,6 @@ class MatrixAdapter(BasePlatformAdapter):
chat_id, video_path, "m.video", caption, reply_to, metadata=metadata
)
async def send_exec_approval(
self,
chat_id: str,
command: str,
session_key: str,
description: str = "dangerous command",
metadata: Optional[dict] = None,
) -> SendResult:
"""Send a reaction-based exec approval prompt for Matrix."""
if not self._client:
return SendResult(success=False, error="Not connected")
cmd_preview = command[:2000] + "..." if len(command) > 2000 else command
text = (
"⚠️ **Dangerous command requires approval**\n"
f"```\n{cmd_preview}\n```\n"
f"Reason: {description}\n\n"
"Reply `/approve` to execute, `/approve session` to approve this pattern for the session, "
"`/approve always` to approve permanently, or `/deny` to cancel.\n\n"
"You can also click the reaction to approve:\n"
"✅ = /approve\n"
"❎ = /deny"
)
result = await self.send(chat_id, text, metadata=metadata)
if not result.success or not result.message_id:
return result
prompt = _MatrixApprovalPrompt(
session_key=session_key,
chat_id=chat_id,
message_id=result.message_id,
)
old_event = self._approval_prompt_by_session.get(session_key)
if old_event:
self._approval_prompts_by_event.pop(old_event, None)
self._approval_prompts_by_event[result.message_id] = prompt
self._approval_prompt_by_session[session_key] = result.message_id
for emoji in ("", ""):
try:
reaction_result = await self._send_reaction(chat_id, result.message_id, emoji)
# Save the bot's reaction event_id for later cleanup
if reaction_result:
prompt.bot_reaction_events[emoji] = str(reaction_result)
except Exception as exc:
logger.debug("Matrix: failed to add approval reaction %s: %s", emoji, exc)
return result
def format_message(self, content: str) -> str:
"""Pass-through — Matrix supports standard Markdown natively."""
# Strip image markdown; media is uploaded separately.
@@ -1312,15 +1115,9 @@ class MatrixAdapter(BasePlatformAdapter):
next_batch = await client.sync_store.get_next_batch()
while not self._closing:
try:
# Wrap in asyncio.wait_for to guard against TCP-level hangs
# that the Matrix long-poll timeout cannot catch. Long-poll
# is 30s, so 45s gives 15s slack for network drain.
sync_data = await asyncio.wait_for(
client.sync(
since=next_batch,
timeout=30000,
),
timeout=45.0,
sync_data = await client.sync(
since=next_batch,
timeout=30000,
)
# nio returns SyncError objects (not exceptions) for auth
@@ -1356,7 +1153,6 @@ class MatrixAdapter(BasePlatformAdapter):
await asyncio.gather(*tasks)
except Exception as exc:
logger.warning("Matrix: sync event dispatch error: %s", exc)
await self._join_pending_invites(sync_data)
except asyncio.CancelledError:
return
@@ -1382,92 +1178,13 @@ class MatrixAdapter(BasePlatformAdapter):
# Event callbacks
# ------------------------------------------------------------------
def _is_self_sender(self, sender: str) -> bool:
"""Return True if the sender refers to the bot's own account.
Matrix user IDs are byte-compared after trimming whitespace and
lowercasing some homeservers normalize the localpart case
differently at different API surfaces, and the reply-loop tail
of the "hall of mirrors" bug (#15763) has been observed with the
bot's own account bypassing a case-sensitive equality check.
When ``self._user_id`` is empty (whoami hasn't resolved yet, or
login failed), we cannot prove a sender is NOT us, so we return
True defensively an unidentified bot dropping its own events
is always preferable to falling into an echo loop.
"""
own = (self._user_id or "").strip().lower()
if not own:
return True
return sender.strip().lower() == own
@staticmethod
def _is_system_or_bridge_sender(sender: str) -> bool:
"""Return True if the sender looks like a system / bridge / appservice
identity rather than a real user.
Appservice namespaces on Matrix conventionally prefix bot / puppet
user IDs with an underscore (e.g. ``@_telegram_12345:server``,
``@_discord_999:server``, ``@_slack_...:server``). Server-notices
bots and bridge-controller bots on many homeservers use the same
pattern.
We treat these as system identities for pairing purposes: they
should never be offered a pairing code, because an operator
approving the code would hand the bridge itself permanent
authorization and every outbound message relayed by the bridge
would then loop back into the agent as an "authorized user
message", which is the root of issue #15763.
Matches:
``@_something:server`` appservice namespace convention
``@:server`` malformed / empty localpart
``:server`` malformed, no leading ``@``
"""
s = (sender or "").strip()
if not s:
return True
# Localpart is everything between leading '@' and ':'
if s.startswith("@"):
s = s[1:]
if ":" in s:
localpart, _, _ = s.partition(":")
else:
localpart = s
if not localpart:
return True
return localpart.startswith("_")
async def _on_room_message(self, event: Any) -> None:
"""Handle incoming room message events (text, media)."""
room_id = str(getattr(event, "room_id", ""))
sender = str(getattr(event, "sender", ""))
# Diagnostic: confirm the callback is firing at all when DEBUG is on.
# Helps users troubleshoot silent inbound issues like #5819, #7914, #12614.
logger.debug(
"Matrix: callback fired — event %s from %s in %s",
getattr(event, "event_id", "?"),
sender,
room_id,
)
# Ignore own messages (case-insensitive; also drops when our own
# user_id hasn't been resolved yet — see _is_self_sender docstring
# and issue #15763).
if self._is_self_sender(sender):
return
# Ignore appservice / bridge / system identities so they never
# trigger the pairing flow. Once a bridge user is paired, every
# outbound message it relays would loop back as an authorized
# user message (the "hall of mirrors" in #15763).
if self._is_system_or_bridge_sender(sender):
logger.debug(
"Matrix: ignoring system/bridge sender %s in %s",
sender,
room_id,
)
# Ignore own messages.
if sender == self._user_id:
return
# Deduplicate by event ID.
@@ -1563,12 +1280,6 @@ class MatrixAdapter(BasePlatformAdapter):
in_bot_thread = bool(thread_id and thread_id in self._threads)
if self._require_mention and not is_free_room and not in_bot_thread:
if not is_mentioned:
logger.debug(
"Matrix: ignoring message %s in %s — no @mention "
"(set MATRIX_REQUIRE_MENTION=false to disable)",
event_id,
room_id,
)
return None
# DM mention-thread.
@@ -1581,7 +1292,7 @@ class MatrixAdapter(BasePlatformAdapter):
body = self._strip_mention(body)
# Auto-thread.
if not thread_id and ((not is_dm and self._auto_thread) or (is_dm and self._dm_auto_thread)):
if not is_dm and not thread_id and self._auto_thread:
thread_id = event_id
self._threads.mark(thread_id)
@@ -1823,9 +1534,6 @@ class MatrixAdapter(BasePlatformAdapter):
return
body, is_dm, chat_type, thread_id, display_name, source = ctx
if msgtype == "m.image" and _looks_like_matrix_image_filename(body):
body = ""
allow_http_fallback = bool(http_url) and not is_encrypted_media
media_urls = (
[cached_path]
@@ -1855,35 +1563,13 @@ class MatrixAdapter(BasePlatformAdapter):
"Matrix: invited to %s — joining",
room_id,
)
await self._join_room_by_id(room_id)
async def _join_room_by_id(self, room_id: str) -> bool:
"""Join a room by ID and refresh local caches on success."""
if not room_id:
return False
if room_id in self._joined_rooms:
return True
try:
await self._client.join_room(RoomID(room_id))
self._joined_rooms.add(room_id)
logger.info("Matrix: joined %s", room_id)
await self._refresh_dm_cache()
return True
except Exception as exc:
logger.warning("Matrix: error joining %s: %s", room_id, exc)
return False
async def _join_pending_invites(self, sync_data: Dict[str, Any]) -> None:
"""Join rooms still present in rooms.invite after sync processing."""
rooms = sync_data.get("rooms", {}) if isinstance(sync_data, dict) else {}
invites = rooms.get("invite", {})
if not isinstance(invites, dict):
return
for room_id in invites:
if room_id in self._joined_rooms:
continue
logger.info("Matrix: reconciling pending invite for %s", room_id)
await self._join_room_by_id(str(room_id))
# ------------------------------------------------------------------
# Reactions (send, receive, processing lifecycle)
@@ -1968,7 +1654,7 @@ class MatrixAdapter(BasePlatformAdapter):
async def _on_reaction(self, event: Any) -> None:
"""Handle incoming reaction events."""
sender = str(getattr(event, "sender", ""))
if self._is_self_sender(sender):
if sender == self._user_id:
return
event_id = str(getattr(event, "event_id", ""))
if self._is_duplicate_event(event_id):
@@ -1998,51 +1684,6 @@ class MatrixAdapter(BasePlatformAdapter):
room_id,
)
# Check if this reaction resolves a pending approval prompt.
prompt = self._approval_prompts_by_event.get(reacts_to)
if prompt and not prompt.resolved:
if room_id != prompt.chat_id:
return
if self._allowed_user_ids and sender not in self._allowed_user_ids:
logger.info(
"Matrix: ignoring approval reaction from unauthorized user %s on %s",
sender, reacts_to,
)
return
choice = self._approval_reaction_map.get(key)
if not choice:
return
try:
from tools.approval import resolve_gateway_approval
count = resolve_gateway_approval(prompt.session_key, choice)
if count:
prompt.resolved = True
self._approval_prompts_by_event.pop(reacts_to, None)
self._approval_prompt_by_session.pop(prompt.session_key, None)
logger.info(
"Matrix reaction resolved %d approval(s) for session %s "
"(choice=%s, user=%s)",
count, prompt.session_key, choice, sender,
)
# Redact bot's seed reactions, leaving only the user's
await self._redact_bot_approval_reactions(room_id, prompt)
except Exception as exc:
logger.error("Failed to resolve gateway approval from Matrix reaction: %s", exc)
async def _redact_bot_approval_reactions(
self,
room_id: str,
prompt: "_MatrixApprovalPrompt",
) -> None:
"""Redact the bot's seed ✅/❎ reactions, leaving only the user's reaction."""
for emoji, evt_id in prompt.bot_reaction_events.items():
try:
await self.redact_message(room_id, evt_id, "approval resolved")
logger.debug("Matrix: redacted bot reaction %s (%s)", emoji, evt_id)
except Exception as exc:
logger.debug("Matrix: failed to redact bot reaction %s: %s", emoji, exc)
# ------------------------------------------------------------------
# Text message aggregation (handles Matrix client-side splits)
# ------------------------------------------------------------------
@@ -2268,7 +1909,11 @@ class MatrixAdapter(BasePlatformAdapter):
if not self._client or not text:
return SendResult(success=False, error="No client or empty text")
msg_content = self._build_text_message_content(text, msgtype=msgtype)
msg_content: Dict[str, Any] = {"msgtype": msgtype, "body": text}
html = self._markdown_to_html(text)
if html and html != text:
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = html
try:
event_id = await self._client.send_message_event(
@@ -2331,77 +1976,6 @@ class MatrixAdapter(BasePlatformAdapter):
# Mention detection helpers
# ------------------------------------------------------------------
def _build_text_message_content(self, text: str, msgtype: str = "m.text") -> Dict[str, Any]:
"""Build Matrix text content with HTML and outbound mention metadata."""
msg_content: Dict[str, Any] = {"msgtype": msgtype, "body": text}
mention_user_ids = self._extract_outbound_mentions(text)
if mention_user_ids:
msg_content["m.mentions"] = {"user_ids": mention_user_ids}
html_source = self._inject_outbound_mention_links(text)
html = self._markdown_to_html(html_source)
if html and html != text:
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = html
return msg_content
def _extract_outbound_mentions(self, text: str) -> list[str]:
"""Return unique Matrix user IDs mentioned in outbound text."""
protected, _ = self._protect_outbound_mention_regions(text)
seen: Set[str] = set()
mentions: list[str] = []
for match in _OUTBOUND_MENTION_RE.finditer(protected):
user_id = match.group(1)
if user_id not in seen:
seen.add(user_id)
mentions.append(user_id)
return mentions
def _inject_outbound_mention_links(self, text: str) -> str:
"""Wrap outbound Matrix mentions in markdown links outside code spans."""
if not text:
return text
protected, placeholders = self._protect_outbound_mention_regions(text)
linked = _OUTBOUND_MENTION_RE.sub(
lambda match: f"[{match.group(1)}](https://matrix.to/#/{match.group(1)})",
protected,
)
for idx, original in enumerate(placeholders):
linked = linked.replace(f"\x00MENTION_PROTECTED{idx}\x00", original)
return linked
def _protect_outbound_mention_regions(self, text: str) -> tuple[str, list[str]]:
"""Protect markdown regions where outbound mentions should stay literal."""
placeholders: list[str] = []
def _protect(fragment: str) -> str:
idx = len(placeholders)
placeholders.append(fragment)
return f"\x00MENTION_PROTECTED{idx}\x00"
protected = re.sub(
r"```[\s\S]*?```",
lambda match: _protect(match.group(0)),
text or "",
)
protected = re.sub(
r"`[^`\n]+`",
lambda match: _protect(match.group(0)),
protected,
)
protected = re.sub(
r"\[[^\]]+\]\([^)]+\)",
lambda match: _protect(match.group(0)),
protected,
)
return protected, placeholders
def _is_bot_mentioned(
self,
body: str,
@@ -2436,33 +2010,13 @@ class MatrixAdapter(BasePlatformAdapter):
return False
def _strip_mention(self, body: str) -> str:
"""Remove explicit bot mentions from message body.
"""Strip the bot's full MXID (``@user:server``) from *body*.
Important: only strip explicit mention tokens (``@user:server`` or
``@localpart``). Do NOT strip bare words matching the bot localpart,
otherwise normal phrases like "Hermes Agent" become "Agent".
The bare localpart is intentionally *not* stripped it would
mangle file paths like ``/home/hermes/media/file.png``.
"""
if not body:
return ""
# Strip explicit full MXID mentions.
if self._user_id:
body = body.replace(self._user_id, "")
# Strip explicit @localpart mentions only (not bare localpart words).
if self._user_id and ":" in self._user_id:
localpart = self._user_id.split(":")[0].lstrip("@")
if localpart:
body = re.sub(
r'(?<![\w])@' + re.escape(localpart) + r'\b',
'',
body,
flags=re.IGNORECASE,
)
# Normalize spacing after mention removal.
body = re.sub(r'[ \t]{2,}', ' ', body)
body = re.sub(r'\s+([,.;:!?])', r'\1', body)
return body.strip()
async def _get_display_name(self, room_id: str, user_id: str) -> str:
+1
View File
@@ -412,6 +412,7 @@ class MattermostAdapter(BasePlatformAdapter):
import aiohttp
last_exc = None
file_data = None
ct = "application/octet-stream"
fname = url.rsplit("/", 1)[-1].split("?")[0] or f"{kind}.png"
+7 -2
View File
@@ -1957,7 +1957,7 @@ class QQAdapter(BasePlatformAdapter):
self, openid: str, content: str, reply_to: Optional[str] = None
) -> SendResult:
"""Send text to a C2C user via REST API."""
self._next_msg_seq(reply_to or openid)
msg_seq = self._next_msg_seq(reply_to or openid)
body = self._build_text_body(content, reply_to)
if reply_to:
body["msg_id"] = reply_to
@@ -1970,7 +1970,7 @@ class QQAdapter(BasePlatformAdapter):
self, group_openid: str, content: str, reply_to: Optional[str] = None
) -> SendResult:
"""Send text to a group via REST API."""
self._next_msg_seq(reply_to or group_openid)
msg_seq = self._next_msg_seq(reply_to or group_openid)
body = self._build_text_body(content, reply_to)
if reply_to:
body["msg_id"] = reply_to
@@ -2135,6 +2135,11 @@ class QQAdapter(BasePlatformAdapter):
# Route
chat_type = self._guess_chat_type(chat_id)
target_path = (
f"/v2/users/{chat_id}/files"
if chat_type == "c2c"
else f"/v2/groups/{chat_id}/files"
)
if chat_type == "guild":
# Guild channels don't support native media upload in the same way
File diff suppressed because it is too large Load Diff
+14 -138
View File
@@ -84,7 +84,6 @@ from gateway.platforms.telegram_network import (
discover_fallback_ips,
parse_fallback_ip_env,
)
from utils import atomic_replace
def check_telegram_requirements() -> bool:
@@ -123,12 +122,12 @@ def _strip_mdv2(text: str) -> str:
# ---------------------------------------------------------------------------
# Markdown table → Telegram-friendly row groups
# Markdown table → code block conversion
# ---------------------------------------------------------------------------
# Telegram's MarkdownV2 has no table syntax — '|' is just an escaped literal,
# so pipe tables render as noisy backslash-pipe text with no alignment.
# Reformating each row into a bold heading plus bullet list keeps the content
# readable on mobile clients while preserving the source data.
# Wrapping the table in a fenced code block makes Telegram render it as
# monospace preformatted text with columns intact.
# Matches a GFM table delimiter row: optional outer pipes, cells containing
# only dashes (with optional leading/trailing colons for alignment) separated
@@ -145,49 +144,13 @@ def _is_table_row(line: str) -> bool:
return bool(stripped) and '|' in stripped
def _split_markdown_table_row(line: str) -> list[str]:
"""Split a simple GFM table row into stripped cell values."""
stripped = line.strip()
if stripped.startswith("|"):
stripped = stripped[1:]
if stripped.endswith("|"):
stripped = stripped[:-1]
return [cell.strip() for cell in stripped.split("|")]
def _render_table_block_for_telegram(table_block: list[str]) -> str:
"""Render a detected GFM table as Telegram-friendly row groups."""
if len(table_block) < 3:
return "\n".join(table_block)
headers = _split_markdown_table_row(table_block[0])
if len(headers) < 2:
return "\n".join(table_block)
rendered_rows: list[str] = []
for index, row in enumerate(table_block[2:], start=1):
cells = _split_markdown_table_row(row)
if len(cells) < len(headers):
cells.extend([""] * (len(headers) - len(cells)))
elif len(cells) > len(headers):
cells = cells[: len(headers)]
heading = next((cell for cell in cells if cell), f"Row {index}")
rendered_rows.append(f"**{heading}**")
rendered_rows.extend(
f"{header}: {value}" for header, value in zip(headers, cells)
)
return "\n\n".join(rendered_rows)
def _wrap_markdown_tables(text: str) -> str:
"""Rewrite GFM-style pipe tables into Telegram-friendly bullet groups.
"""Wrap GFM-style pipe tables in ``` fences so Telegram renders them.
Detected by a row containing '|' immediately followed by a delimiter
row matching :data:`_TABLE_SEPARATOR_RE`. Subsequent pipe-containing
non-blank lines are consumed as the table body and rewritten as
per-row bullet groups. Tables inside existing fenced code blocks are left
non-blank lines are consumed as the table body and included in the
wrapped block. Tables inside existing fenced code blocks are left
alone.
"""
if '|' not in text or '-' not in text:
@@ -224,7 +187,9 @@ def _wrap_markdown_tables(text: str) -> str:
while j < len(lines) and _is_table_row(lines[j]):
table_block.append(lines[j])
j += 1
out.append(_render_table_block_for_telegram(table_block))
out.append('```')
out.extend(table_block)
out.append('```')
i = j
continue
@@ -369,49 +334,6 @@ class TelegramAdapter(BasePlatformAdapter):
return {"link_preview_options": LinkPreviewOptions(is_disabled=True)}
return {"disable_web_page_preview": True}
async def _drain_polling_connections(self) -> None:
"""Reset the httpx connection pool used for getUpdates polling.
Network errors (especially through proxies like sing-box) can leave
httpx connections in a half-closed state that still occupy pool slots.
After enough reconnect cycles the pool fills up entirely, causing
``Pool timeout: All connections in the connection pool are occupied.``
We reset ONLY ``_request[0]`` (the getUpdates request) the general
request (``_request[1]``) is left untouched so concurrent
``send_message`` / ``edit_message`` calls are never interrupted.
Implementation note: accesses ``Bot._request[0]`` which is the
get-updates ``BaseRequest`` in the PTB 22.x internal tuple
``(get_updates_request, general_request)``. There is no public
accessor for the polling request; review if upgrading to PTB 23+.
"""
if not (self._app and self._app.bot):
return
try:
# PTB 22.x: _request is a (get_updates, general) tuple;
# no public accessor exists for the polling request.
polling_req = self._app.bot._request[0] # noqa: SLF001
except Exception:
return
try:
await polling_req.shutdown()
except Exception:
logger.debug(
"[%s] Polling request shutdown failed (non-fatal)",
self.name, exc_info=True,
)
try:
await polling_req.initialize()
logger.debug(
"[%s] Polling request pool drained before reconnect", self.name
)
except Exception:
logger.debug(
"[%s] Polling request re-initialize failed (non-fatal)",
self.name, exc_info=True,
)
async def _handle_polling_network_error(self, error: Exception) -> None:
"""Reconnect polling after a transient network interruption.
@@ -457,8 +379,6 @@ class TelegramAdapter(BasePlatformAdapter):
except Exception:
pass
await self._drain_polling_connections()
try:
await self._app.updater.start_polling(
allowed_updates=Update.ALL_TYPES,
@@ -506,7 +426,6 @@ class TelegramAdapter(BasePlatformAdapter):
except Exception:
pass
await asyncio.sleep(RETRY_DELAY)
await self._drain_polling_connections()
try:
await self._app.updater.start_polling(
allowed_updates=Update.ALL_TYPES,
@@ -635,7 +554,7 @@ class TelegramAdapter(BasePlatformAdapter):
_yaml.dump(config, f, default_flow_style=False, sort_keys=False)
f.flush()
os.fsync(f.fileno())
atomic_replace(tmp_path, config_path)
os.replace(tmp_path, config_path)
except BaseException:
try:
os.unlink(tmp_path)
@@ -1290,31 +1209,6 @@ class TelegramAdapter(BasePlatformAdapter):
)
return SendResult(success=False, error=str(e))
async def delete_message(self, chat_id: str, message_id: str) -> bool:
"""Delete a previously sent Telegram message.
Used by the stream consumer's fresh-final cleanup path (ported
from openclaw/openclaw#72038) to remove long-lived preview
messages after sending the completed reply as a fresh message.
Telegram's Bot API ``deleteMessage`` works for bot-posted
messages in the last 48 hours. Failures are non-fatal the
caller leaves the preview in place and logs at debug level.
"""
if not self._bot:
return False
try:
await self._bot.delete_message(
chat_id=int(chat_id),
message_id=int(message_id),
)
return True
except Exception as e:
logger.debug(
"[%s] Failed to delete Telegram message %s: %s",
self.name, message_id, e,
)
return False
async def send_update_prompt(
self, chat_id: str, prompt: str, default: str = "",
session_key: str = "",
@@ -2161,8 +2055,10 @@ class TelegramAdapter(BasePlatformAdapter):
text = content
# 0) Rewrite GFM-style pipe tables into Telegram-friendly row groups
# before the normal MarkdownV2 conversions run.
# 0) Pre-wrap GFM-style pipe tables in ``` fences. Telegram can't
# render tables natively, but fenced code blocks render as
# monospace preformatted text with columns intact. The wrapped
# tables then flow through step (1) below as protected regions.
text = _wrap_markdown_tables(text)
# 1) Protect fenced code blocks (``` ... ```)
@@ -2432,26 +2328,6 @@ class TelegramAdapter(BasePlatformAdapter):
user = getattr(entity, "user", None)
if user and getattr(user, "id", None) == bot_id:
return True
elif entity_type == "bot_command" and expected:
# Telegram's official group-disambiguation form for slash
# commands (``/cmd@botname``) is emitted as a single
# ``bot_command`` entity covering the whole span — there
# is no accompanying ``mention`` entity. Treat it as a
# direct address to this bot when the ``@botname`` suffix
# matches. This is the form Telegram's own command menu
# autocomplete produces in groups, so dropping it at the
# mention gate would break /new, /reset, /help, ... for
# every group that has ``require_mention`` enabled (#15415).
offset = int(getattr(entity, "offset", -1))
length = int(getattr(entity, "length", 0))
if offset < 0 or length <= 0:
continue
command_text = source_text[offset:offset + length]
at_index = command_text.find("@")
if at_index < 0:
continue
if command_text[at_index:].strip().lower() == expected:
return True
return False
def _message_matches_mention_patterns(self, message: Message) -> bool:
+3 -26
View File
@@ -89,7 +89,6 @@ MAX_CONSECUTIVE_FAILURES = 3
RETRY_DELAY_SECONDS = 2
BACKOFF_DELAY_SECONDS = 30
SESSION_EXPIRED_ERRCODE = -14
RATE_LIMIT_ERRCODE = -2 # iLink frequency limit — backoff and retry
MESSAGE_DEDUP_TTL_SECONDS = 300
MEDIA_IMAGE = 1
@@ -1114,7 +1113,7 @@ async def qr_login(
class WeixinAdapter(BasePlatformAdapter):
"""Native Hermes adapter for Weixin personal accounts."""
MAX_MESSAGE_LENGTH = 2000
MAX_MESSAGE_LENGTH = 4000
# WeChat does not support editing sent messages — streaming must use the
# fallback "send-final-only" path so the cursor (▉) is never left visible.
@@ -1139,10 +1138,10 @@ class WeixinAdapter(BasePlatformAdapter):
extra.get("cdn_base_url") or os.getenv("WEIXIN_CDN_BASE_URL", WEIXIN_CDN_BASE_URL)
).strip().rstrip("/")
self._send_chunk_delay_seconds = float(
extra.get("send_chunk_delay_seconds") or os.getenv("WEIXIN_SEND_CHUNK_DELAY_SECONDS", "1.5")
extra.get("send_chunk_delay_seconds") or os.getenv("WEIXIN_SEND_CHUNK_DELAY_SECONDS", "0.35")
)
self._send_chunk_retries = int(
extra.get("send_chunk_retries") or os.getenv("WEIXIN_SEND_CHUNK_RETRIES", "4")
extra.get("send_chunk_retries") or os.getenv("WEIXIN_SEND_CHUNK_RETRIES", "2")
)
self._send_chunk_retry_delay_seconds = float(
extra.get("send_chunk_retry_delay_seconds")
@@ -1532,28 +1531,6 @@ class WeixinAdapter(BasePlatformAdapter):
self.name, _safe_id(chat_id),
)
continue
# Rate limit (-2) — backoff and retry
is_rate_limited = (
ret == RATE_LIMIT_ERRCODE
or errcode == RATE_LIMIT_ERRCODE
)
if is_rate_limited:
errmsg = resp.get("errmsg") or resp.get("msg") or "rate limited"
# Record the error so we raise a descriptive
# RuntimeError (instead of AssertionError) if the
# loop exhausts with the server still rate-limiting.
last_error = RuntimeError(
f"iLink sendmessage rate limited: ret={ret} errcode={errcode} errmsg={errmsg}"
)
if attempt >= self._send_chunk_retries:
break
wait = self._send_chunk_retry_delay_seconds * 3 # 3x backoff for rate limit
logger.warning(
"[%s] rate limited for %s; backing off %.1fs before retry",
self.name, _safe_id(chat_id), wait,
)
await asyncio.sleep(wait)
continue
errmsg = resp.get("errmsg") or resp.get("msg") or "unknown error"
raise RuntimeError(
f"iLink sendmessage error: ret={ret} errcode={errcode} errmsg={errmsg}"
File diff suppressed because it is too large Load Diff
-645
View File
@@ -1,645 +0,0 @@
"""
yuanbao_media.py 元宝平台媒体处理模块
提供 COS 上传文件下载TIM 媒体消息构建等功能
移植自 TypeScript media.tsyuanbao-openclaw-plugin
使用 httpx 替代 cos-nodejs-sdk-v5避免引入额外 SDK 依赖
COS 上传流程
1. 调用 genUploadInfo 获取临时凭证tmpSecretId/tmpSecretKey/sessionToken
2. 用临时凭证通过 HMAC-SHA1 签名构建 Authorization
3. HTTP PUT 上传到 COS
TIM 消息体构建
- buildImageMsgBody() TIMImageElem
- buildFileMsgBody() TIMFileElem
"""
from __future__ import annotations
import hashlib
import hmac
import logging
import os
import secrets
import struct
import time
import urllib.parse
from typing import Optional, Any
import httpx
logger = logging.getLogger(__name__)
# ============ 常量 ============
UPLOAD_INFO_PATH = "/api/resource/genUploadInfo"
DEFAULT_API_DOMAIN = "yuanbao.tencent.com"
DEFAULT_MAX_SIZE_MB = 50
# COS 加速域名后缀(优先使用全球加速)
COS_USE_ACCELERATE = True
# ============ 类型映射 ============
# MIME → image_format 数字(TIM 协议字段)
_MIME_TO_IMAGE_FORMAT: dict[str, int] = {
"image/jpeg": 1,
"image/jpg": 1,
"image/gif": 2,
"image/png": 3,
"image/bmp": 4,
"image/webp": 255,
"image/heic": 255,
"image/tiff": 255,
}
# 文件扩展名 → MIME
_EXT_TO_MIME: dict[str, str] = {
".jpg": "image/jpeg",
".jpeg": "image/jpeg",
".png": "image/png",
".gif": "image/gif",
".webp": "image/webp",
".bmp": "image/bmp",
".heic": "image/heic",
".tiff": "image/tiff",
".ico": "image/x-icon",
".pdf": "application/pdf",
".doc": "application/msword",
".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
".xls": "application/vnd.ms-excel",
".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
".ppt": "application/vnd.ms-powerpoint",
".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
".txt": "text/plain",
".zip": "application/zip",
".tar": "application/x-tar",
".gz": "application/gzip",
".mp3": "audio/mpeg",
".mp4": "video/mp4",
".wav": "audio/wav",
".ogg": "audio/ogg",
".webm": "video/webm",
}
# ============ 工具函数 ============
def guess_mime_type(filename: str) -> str:
"""根据文件扩展名猜测 MIME 类型。"""
ext = os.path.splitext(filename)[-1].lower()
return _EXT_TO_MIME.get(ext, "application/octet-stream")
def is_image(filename: str, mime_type: str = "") -> bool:
"""判断是否为图片类型。"""
if mime_type.startswith("image/"):
return True
ext = os.path.splitext(filename)[-1].lower()
return ext in {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".heic", ".tiff", ".ico"}
def get_image_format(mime_type: str) -> int:
"""获取 TIM 图片格式编号。"""
return _MIME_TO_IMAGE_FORMAT.get(mime_type.lower(), 255)
def md5_hex(data: bytes) -> str:
"""计算 MD5 十六进制摘要。"""
return hashlib.md5(data).hexdigest()
def generate_file_id() -> str:
"""生成随机文件 ID(32 位 hex)。"""
return secrets.token_hex(16)
# ============ 图片尺寸解析(纯 Python,无需 Pillow ============
def parse_image_size(data: bytes) -> Optional[dict[str, int]]:
"""
解析图片宽高支持 JPEG/PNG/GIF/WebP无需第三方依赖
返回 {"width": w, "height": h} None无法识别
"""
return (
_parse_png_size(data)
or _parse_jpeg_size(data)
or _parse_gif_size(data)
or _parse_webp_size(data)
)
def _parse_png_size(buf: bytes) -> Optional[dict[str, int]]:
if len(buf) < 24:
return None
if buf[:4] != b"\x89PNG":
return None
w = struct.unpack(">I", buf[16:20])[0]
h = struct.unpack(">I", buf[20:24])[0]
return {"width": w, "height": h}
def _parse_jpeg_size(buf: bytes) -> Optional[dict[str, int]]:
if len(buf) < 4 or buf[0] != 0xFF or buf[1] != 0xD8:
return None
i = 2
while i < len(buf) - 9:
if buf[i] != 0xFF:
i += 1
continue
marker = buf[i + 1]
if marker in (0xC0, 0xC2):
h = struct.unpack(">H", buf[i + 5: i + 7])[0]
w = struct.unpack(">H", buf[i + 7: i + 9])[0]
return {"width": w, "height": h}
if i + 3 < len(buf):
i += 2 + struct.unpack(">H", buf[i + 2: i + 4])[0]
else:
break
return None
def _parse_gif_size(buf: bytes) -> Optional[dict[str, int]]:
if len(buf) < 10:
return None
sig = buf[:6].decode("ascii", errors="replace")
if sig not in ("GIF87a", "GIF89a"):
return None
w = struct.unpack("<H", buf[6:8])[0]
h = struct.unpack("<H", buf[8:10])[0]
return {"width": w, "height": h}
def _parse_webp_size(buf: bytes) -> Optional[dict[str, int]]:
if len(buf) < 16:
return None
if buf[:4] != b"RIFF" or buf[8:12] != b"WEBP":
return None
chunk = buf[12:16].decode("ascii", errors="replace")
if chunk == "VP8 ":
if len(buf) >= 30 and buf[23] == 0x9D and buf[24] == 0x01 and buf[25] == 0x2A:
w = struct.unpack("<H", buf[26:28])[0] & 0x3FFF
h = struct.unpack("<H", buf[28:30])[0] & 0x3FFF
return {"width": w, "height": h}
elif chunk == "VP8L":
if len(buf) >= 25 and buf[20] == 0x2F:
bits = struct.unpack("<I", buf[21:25])[0]
w = (bits & 0x3FFF) + 1
h = ((bits >> 14) & 0x3FFF) + 1
return {"width": w, "height": h}
elif chunk == "VP8X":
if len(buf) >= 30:
w = (buf[24] | (buf[25] << 8) | (buf[26] << 16)) + 1
h = (buf[27] | (buf[28] << 8) | (buf[29] << 16)) + 1
return {"width": w, "height": h}
return None
# ============ URL 下载 ============
async def download_url(
url: str,
max_size_mb: int = DEFAULT_MAX_SIZE_MB,
) -> tuple[bytes, str]:
"""
下载 URL 内容返回 (bytes, content_type)
Args:
url: HTTP(S) URL
max_size_mb: 最大允许大小MB超过则抛出异常
Returns:
(data_bytes, content_type_string)
Raises:
ValueError: 内容超过大小限制
httpx.HTTPError: 网络/HTTP 错误
"""
max_bytes = max_size_mb * 1024 * 1024
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
# 先 HEAD 检查大小
try:
head = await client.head(url)
content_length = int(head.headers.get("content-length", 0) or 0)
if content_length > 0 and content_length > max_bytes:
raise ValueError(
f"文件过大: {content_length / 1024 / 1024:.1f} MB > {max_size_mb} MB"
)
except httpx.HTTPStatusError:
pass # 部分服务器不支持 HEAD,忽略
# GET 下载(流式读取,防止超限)
async with client.stream("GET", url) as resp:
resp.raise_for_status()
content_type = resp.headers.get("content-type", "").split(";")[0].strip()
chunks: list[bytes] = []
downloaded = 0
async for chunk in resp.aiter_bytes(65536):
downloaded += len(chunk)
if downloaded > max_bytes:
raise ValueError(
f"文件过大: 已超过 {max_size_mb} MB 限制"
)
chunks.append(chunk)
data = b"".join(chunks)
return data, content_type
# ============ COS 鉴权(HMAC-SHA1 ============
def _cos_sign(
method: str,
path: str,
params: dict[str, str],
headers: dict[str, str],
secret_id: str,
secret_key: str,
start_time: Optional[int] = None,
expire_seconds: int = 3600,
) -> str:
"""
构建 COS 请求签名q-sign-algorithm=sha1 方案
参考https://cloud.tencent.com/document/product/436/7778
Args:
method: HTTP 方法小写 "put"
path: URL 路径URL encode 后的小写
params: URL 查询参数 dict用于签名
headers: 参与签名的请求头 dictkey 需小写
secret_id: 临时 SecretIdtmpSecretId
secret_key: 临时 SecretKeytmpSecretKey
start_time: 签名起始 Unix 时间戳默认 now
expire_seconds: 签名有效期默认 3600
Returns:
Authorization header 完整字符串
"""
now = int(time.time())
q_sign_time = f"{start_time or now};{(start_time or now) + expire_seconds}"
# Step 1: SignKey = HMAC-SHA1(SecretKey, q-sign-time)
sign_key = hmac.new(
secret_key.encode("utf-8"),
q_sign_time.encode("utf-8"),
hashlib.sha1,
).hexdigest()
# Step 2: HttpString
# 参数和头部需按字典序排列,key 小写
sorted_params = sorted((k.lower(), urllib.parse.quote(str(v), safe="") ) for k, v in params.items())
sorted_headers = sorted((k.lower(), urllib.parse.quote(str(v), safe="") ) for k, v in headers.items())
url_param_list = ";".join(k for k, _ in sorted_params)
url_params = "&".join(f"{k}={v}" for k, v in sorted_params)
header_list = ";".join(k for k, _ in sorted_headers)
header_str = "&".join(f"{k}={v}" for k, v in sorted_headers)
http_string = "\n".join([
method.lower(),
path,
url_params,
header_str,
"",
])
# Step 3: StringToSign = sha1 hash of HttpString
sha1_of_http = hashlib.sha1(http_string.encode("utf-8")).hexdigest()
string_to_sign = "\n".join([
"sha1",
q_sign_time,
sha1_of_http,
"",
])
# Step 4: Signature = HMAC-SHA1(SignKey, StringToSign)
signature = hmac.new(
sign_key.encode("utf-8"),
string_to_sign.encode("utf-8"),
hashlib.sha1,
).hexdigest()
return (
f"q-sign-algorithm=sha1"
f"&q-ak={secret_id}"
f"&q-sign-time={q_sign_time}"
f"&q-key-time={q_sign_time}"
f"&q-header-list={header_list}"
f"&q-url-param-list={url_param_list}"
f"&q-signature={signature}"
)
# ============ 主要公开 API ============
async def get_cos_credentials(
app_key: str,
api_domain: str,
token: str,
filename: str = "file",
file_id: Optional[str] = None,
bot_id: str = "",
route_env: str = "",
) -> dict:
"""
调用 genUploadInfo 接口获取 COS 临时密钥及上传配置
Args:
app_key: 应用 Key用于 X-ID
api_domain: API 域名 https://bot.yuanbao.tencent.com
token: 当前有效的签票 tokenX-Token
filename: 待上传的文件名含扩展名
file_id: 客户端生成的唯一文件 ID不传则自动生成
bot_id: Bot 账号 ID用于 X-ID
Returns:
COS 上传配置 dict包含以下字段
bucketName (str) COS Bucket 名称
region (str) COS 地域
location (str) 上传 Key对象路径
encryptTmpSecretId (str) 临时 SecretId
encryptTmpSecretKey(str) 临时 SecretKey
encryptToken (str) SessionToken
startTime (int) 凭证起始时间戳Unix
expiredTime (int) 凭证过期时间戳Unix
resourceUrl (str) 上传后的公网访问 URL
resourceID (str) 资源 ID可选
Raises:
RuntimeError: 接口返回非 0 code 或字段缺失
"""
if file_id is None:
file_id = generate_file_id()
upload_url = f"{api_domain.rstrip('/')}{UPLOAD_INFO_PATH}"
headers = {
"Content-Type": "application/json",
"X-Token": token,
"X-ID": bot_id or app_key,
"X-Source": "web",
}
if route_env:
headers["X-Route-Env"] = route_env
body = {
"fileName": filename,
"fileId": file_id,
"docFrom": "localDoc",
"docOpenId": "",
}
async with httpx.AsyncClient(timeout=15.0) as client:
resp = await client.post(upload_url, json=body, headers=headers)
resp.raise_for_status()
result: dict[str, Any] = resp.json()
code = result.get("code")
if code != 0 and code is not None:
raise RuntimeError(
f"genUploadInfo 失败: code={code}, msg={result.get('msg', '')}"
)
data = result.get("data") or result
required_fields = ["bucketName", "location"]
missing = [f for f in required_fields if not data.get(f)]
if missing:
raise RuntimeError(
f"genUploadInfo 返回字段不完整: 缺少字段 {missing}"
)
return data
async def upload_to_cos(
file_bytes: bytes,
filename: str,
content_type: str,
credentials: dict,
bucket: str,
region: str,
) -> dict:
"""
通过 httpx PUT 请求将文件上传到 COS
使用临时凭证tmpSecretId/tmpSecretKey/sessionToken构建 HMAC-SHA1 签名
Args:
file_bytes: 文件二进制内容
filename: 文件名用于辅助计算 MIMEUUID
content_type: MIME 类型 "image/jpeg"
credentials: get_cos_credentials() 返回的 dict包含
encryptTmpSecretId tmpSecretId
encryptTmpSecretKey tmpSecretKey
encryptToken sessionToken
location COS key对象路径
resourceUrl 上传后公网 URL
startTime 凭证起始时间Unix
expiredTime 凭证过期时间Unix
bucket: COS Bucket 名称 chatbot-1234567890
region: COS 地域 ap-guangzhou
Returns:
上传结果 dict包含
url (str) COS 公网访问 URL
uuid (str) 文件内容 MD5
size (int) 文件大小字节
width (int, optional) 图片宽度仅图片
height (int, optional) 图片高度仅图片
Raises:
httpx.HTTPStatusError: COS 返回非 2xx 状态
RuntimeError: credentials 字段缺失
"""
secret_id: str = credentials.get("encryptTmpSecretId", "")
secret_key: str = credentials.get("encryptTmpSecretKey", "")
session_token: str = credentials.get("encryptToken", "")
cos_key: str = credentials.get("location", "")
resource_url: str = credentials.get("resourceUrl", "")
start_time: Optional[int] = credentials.get("startTime")
expired_time: Optional[int] = credentials.get("expiredTime")
if not secret_id or not secret_key or not cos_key:
raise RuntimeError(
f"COS credentials 不完整: secretId={bool(secret_id)}, "
f"secretKey={bool(secret_key)}, location={bool(cos_key)}"
)
# 构建 COS 上传 URL(优先使用全球加速域名)
if COS_USE_ACCELERATE:
cos_host = f"{bucket}.cos.accelerate.myqcloud.com"
else:
cos_host = f"{bucket}.cos.{region}.myqcloud.com"
# URL encode cos_key(保留 /
encoded_key = urllib.parse.quote(cos_key, safe="/")
cos_url = f"https://{cos_host}/{encoded_key.lstrip('/')}"
# 确定 Content-Type
if not content_type or content_type == "application/octet-stream":
if is_image(filename):
content_type = guess_mime_type(filename)
else:
content_type = "application/octet-stream"
# 计算文件 MD5 + size
file_uuid = md5_hex(file_bytes)
file_size = len(file_bytes)
# 参与签名的请求头
sign_headers = {
"host": cos_host,
"content-type": content_type,
"x-cos-security-token": session_token,
}
# 计算签名有效期
now = int(time.time())
sign_start = start_time if start_time else now
sign_expire = (expired_time - now) if expired_time and expired_time > now else 3600
authorization = _cos_sign(
method="put",
path=f"/{encoded_key.lstrip('/')}",
params={},
headers=sign_headers,
secret_id=secret_id,
secret_key=secret_key,
start_time=sign_start,
expire_seconds=sign_expire,
)
put_headers = {
"Authorization": authorization,
"Content-Type": content_type,
"x-cos-security-token": session_token,
}
logger.info(
"COS PUT: bucket=%s region=%s key=%s size=%d mime=%s",
bucket, region, cos_key, file_size, content_type,
)
async with httpx.AsyncClient(timeout=120.0) as client:
resp = await client.put(
cos_url,
content=file_bytes,
headers=put_headers,
)
resp.raise_for_status()
# 解析图片尺寸(仅图片类型)
result: dict[str, Any] = {
"url": resource_url or cos_url,
"uuid": file_uuid,
"size": file_size,
}
if content_type.startswith("image/"):
size_info = parse_image_size(file_bytes)
if size_info:
result["width"] = size_info["width"]
result["height"] = size_info["height"]
logger.info(
"COS 上传成功: url=%s size=%d",
result["url"], file_size,
)
return result
# ============ TIM 媒体消息构建 ============
def build_image_msg_body(
url: str,
uuid: Optional[str] = None,
filename: Optional[str] = None,
size: int = 0,
width: int = 0,
height: int = 0,
mime_type: str = "",
) -> list[dict]:
"""
构建腾讯 IM TIMImageElem 消息体
参考https://cloud.tencent.com/document/product/269/2720
Args:
url: 图片公网访问 URLCOS resourceUrl
uuid: 文件 UUIDMD5 或其他唯一标识
filename: 文件名uuid 为空时作为备用
size: 文件大小字节
width: 图片宽度像素
height: 图片高度像素
mime_type: MIME 类型用于确定 image_format
Returns:
TIMImageElem 消息体列表适合直接放入 msg_body
"""
_uuid = uuid or filename or _basename_from_url(url) or "image"
image_format = get_image_format(mime_type) if mime_type else 255
return [
{
"msg_type": "TIMImageElem",
"msg_content": {
"uuid": _uuid,
"image_format": image_format,
"image_info_array": [
{
"type": 1, # 1 = 原图
"size": size,
"width": width,
"height": height,
"url": url,
}
],
},
}
]
def build_file_msg_body(
url: str,
filename: str,
uuid: Optional[str] = None,
size: int = 0,
) -> list[dict]:
"""
构建腾讯 IM TIMFileElem 消息体
参考https://cloud.tencent.com/document/product/269/2720
Args:
url: 文件公网访问 URLCOS resourceUrl
filename: 文件名含扩展名
uuid: 文件 UUIDMD5 或其他唯一标识不传则使用 filename
size: 文件大小字节
Returns:
TIMFileElem 消息体列表适合直接放入 msg_body
"""
_uuid = uuid or filename
return [
{
"msg_type": "TIMFileElem",
"msg_content": {
"uuid": _uuid,
"file_name": filename,
"file_size": size,
"url": url,
},
}
]
# ============ 内部工具 ============
def _basename_from_url(url: str) -> str:
"""从 URL 提取文件名。"""
try:
parsed = urllib.parse.urlparse(url)
return os.path.basename(parsed.path)
except Exception:
return ""
File diff suppressed because it is too large Load Diff
-558
View File
@@ -1,558 +0,0 @@
"""
Yuanbao sticker (TIMFaceElem) support.
Ported from yuanbao-openclaw-plugin/src/sticker/.
TIMFaceElem wire format:
{
"msg_type": "TIMFaceElem",
"msg_content": {
"index": 0, # always 0 per Yuanbao convention
"data": "<json>", # serialised sticker metadata
}
}
The `data` field carries a JSON string with the sticker's metadata so the
receiver can look up the correct asset in the emoji pack.
"""
from __future__ import annotations
import json
import random
import re
import unicodedata
from typing import Optional
# ---------------------------------------------------------------------------
# Sticker catalogue ported from builtin-stickers.json
# Key : canonical name (Chinese)
# Value : {sticker_id, package_id, name, description, width, height, formats}
# ---------------------------------------------------------------------------
STICKER_MAP: dict[str, dict] = {
"六六六": {
"sticker_id": "278", "package_id": "1003", "name": "六六六",
"description": "666 厉害 牛 棒 绝了 好强 awesome",
"width": 128, "height": 128, "formats": "png",
},
"我想开了": {
"sticker_id": "262", "package_id": "1003", "name": "我想开了",
"description": "想开 佛系 释怀 顿悟 看淡了 无所谓",
"width": 128, "height": 128, "formats": "png",
},
"害羞": {
"sticker_id": "130", "package_id": "1003", "name": "害羞",
"description": "腼腆 不好意思 脸红 娇羞 羞涩 捂脸",
"width": 128, "height": 128, "formats": "png",
},
"比心": {
"sticker_id": "252", "package_id": "1003", "name": "比心",
"description": "笔芯 爱你 爱心手势 love heart 喜欢你",
"width": 128, "height": 128, "formats": "png",
},
"委屈": {
"sticker_id": "125", "package_id": "1003", "name": "委屈",
"description": "难过 想哭 可怜巴巴 瘪嘴 受伤 被欺负",
"width": 128, "height": 128, "formats": "png",
},
"亲亲": {
"sticker_id": "146", "package_id": "1003", "name": "亲亲",
"description": "么么 mua 亲一下 kiss 飞吻 啵",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "131", "package_id": "1003", "name": "",
"description": "帅 墨镜 cool 高冷 有型 swagger",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "145", "package_id": "1003", "name": "",
"description": "睡觉 困 zzZ 打盹 躺平 休眠 sleepy",
"width": 128, "height": 128, "formats": "png",
},
"发呆": {
"sticker_id": "152", "package_id": "1003", "name": "发呆",
"description": "懵 愣住 放空 呆滞 出神 脑子空白",
"width": 128, "height": 128, "formats": "png",
},
"可怜": {
"sticker_id": "157", "package_id": "1003", "name": "可怜",
"description": "卖萌 求饶 委屈巴巴 弱小 拜托 眼巴巴",
"width": 128, "height": 128, "formats": "png",
},
"摊手": {
"sticker_id": "200", "package_id": "1003", "name": "摊手",
"description": "无奈 没办法 耸肩 随便 那咋整 whatever",
"width": 128, "height": 128, "formats": "png",
},
"头大": {
"sticker_id": "213", "package_id": "1003", "name": "头大",
"description": "头疼 烦恼 郁闷 难搞 崩溃 一团乱",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "256", "package_id": "1003", "name": "",
"description": "害怕 惊恐 震惊 吓一跳 恐怖 怂",
"width": 128, "height": 128, "formats": "png",
},
"吐血": {
"sticker_id": "203", "package_id": "1003", "name": "吐血",
"description": "无语 崩溃 被雷 内伤 一口老血 屮",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "185", "package_id": "1003", "name": "",
"description": "傲娇 生气 不满 撇嘴 不理 赌气",
"width": 128, "height": 128, "formats": "png",
},
"嘿嘿": {
"sticker_id": "220", "package_id": "1003", "name": "嘿嘿",
"description": "坏笑 猥琐笑 偷笑 憨笑 得意 你懂的",
"width": 128, "height": 128, "formats": "png",
},
"头秃": {
"sticker_id": "218", "package_id": "1003", "name": "头秃",
"description": "程序员 加班 焦虑 没头发 秃了 肝爆",
"width": 128, "height": 128, "formats": "png",
},
"暗中观察": {
"sticker_id": "221", "package_id": "1003", "name": "暗中观察",
"description": "窥屏 潜水 偷偷看 角落 围观 屏住呼吸",
"width": 128, "height": 128, "formats": "png",
},
"我酸了": {
"sticker_id": "224", "package_id": "1003", "name": "我酸了",
"description": "嫉妒 柠檬精 羡慕 吃柠檬 眼红 恰柠檬",
"width": 128, "height": 128, "formats": "png",
},
"打call": {
"sticker_id": "246", "package_id": "1003", "name": "打call",
"description": "应援 加油 支持 喝彩 助威 call",
"width": 128, "height": 128, "formats": "png",
},
"庆祝": {
"sticker_id": "251", "package_id": "1003", "name": "庆祝",
"description": "祝贺 开心 耶 party 胜利 干杯",
"width": 128, "height": 128, "formats": "png",
},
"奋斗": {
"sticker_id": "151", "package_id": "1003", "name": "奋斗",
"description": "努力 加油 拼搏 冲 干劲 卷起来",
"width": 128, "height": 128, "formats": "png",
},
"惊讶": {
"sticker_id": "143", "package_id": "1003", "name": "惊讶",
"description": "震惊 哇 不敢相信 OMG 居然 这么离谱",
"width": 128, "height": 128, "formats": "png",
},
"疑问": {
"sticker_id": "144", "package_id": "1003", "name": "疑问",
"description": "问号 不懂 啥 为什么 啥情况 懵逼问",
"width": 128, "height": 128, "formats": "png",
},
"仔细分析": {
"sticker_id": "248", "package_id": "1003", "name": "仔细分析",
"description": "思考 推敲 认真 研究 琢磨 让我想想",
"width": 128, "height": 128, "formats": "png",
},
"撅嘴": {
"sticker_id": "184", "package_id": "1003", "name": "撅嘴",
"description": "嘟嘴 卖萌 不高兴 撒娇 嘴翘",
"width": 128, "height": 128, "formats": "png",
},
"泪奔": {
"sticker_id": "199", "package_id": "1003", "name": "泪奔",
"description": "大哭 伤心 破防 感动哭 泪流满面 呜呜",
"width": 128, "height": 128, "formats": "png",
},
"尊嘟假嘟": {
"sticker_id": "276", "package_id": "1003", "name": "尊嘟假嘟",
"description": "真的假的 真假 可爱问 你骗我 是不是",
"width": 128, "height": 128, "formats": "png",
},
"略略略": {
"sticker_id": "113", "package_id": "1003", "name": "略略略",
"description": "调皮 吐舌 不服 略 气死你 鬼脸",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "180", "package_id": "1003", "name": "",
"description": "想睡 倦 打哈欠 睁不开眼 好困啊 sleepy",
"width": 128, "height": 128, "formats": "png",
},
"折磨": {
"sticker_id": "181", "package_id": "1003", "name": "折磨",
"description": "难受 痛苦 煎熬 蚌埠住了 受不了 要命",
"width": 128, "height": 128, "formats": "png",
},
"抠鼻": {
"sticker_id": "182", "package_id": "1003", "name": "抠鼻",
"description": "不屑 无聊 淡定 无所谓 鄙视 挖鼻",
"width": 128, "height": 128, "formats": "png",
},
"鼓掌": {
"sticker_id": "183", "package_id": "1003", "name": "鼓掌",
"description": "拍手 叫好 赞同 666 喝彩 掌声",
"width": 128, "height": 128, "formats": "png",
},
"斜眼笑": {
"sticker_id": "204", "package_id": "1003", "name": "斜眼笑",
"description": "滑稽 坏笑 doge 意味深长 阴阳怪气 嘿嘿嘿",
"width": 128, "height": 128, "formats": "png",
},
"辣眼睛": {
"sticker_id": "216", "package_id": "1003", "name": "辣眼睛",
"description": "看不下去 cringe 毁三观 太丑了 瞎了",
"width": 128, "height": 128, "formats": "png",
},
"哦哟": {
"sticker_id": "217", "package_id": "1003", "name": "哦哟",
"description": "惊讶 起哄 哇哦 有戏 不简单 哟",
"width": 128, "height": 128, "formats": "png",
},
"吃瓜": {
"sticker_id": "222", "package_id": "1003", "name": "吃瓜",
"description": "围观 看戏 八卦 路人 看热闹 板凳",
"width": 128, "height": 128, "formats": "png",
},
"狗头": {
"sticker_id": "225", "package_id": "1003", "name": "狗头",
"description": "doge 保命 开玩笑 滑稽 反讽 懂的都懂",
"width": 128, "height": 128, "formats": "png",
},
"敬礼": {
"sticker_id": "227", "package_id": "1003", "name": "敬礼",
"description": "salute 尊重 收到 遵命 致敬 报告",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "231", "package_id": "1003", "name": "",
"description": "知道了 明白 敷衍 嗯 这样啊 收到",
"width": 128, "height": 128, "formats": "png",
},
"拿到红包": {
"sticker_id": "236", "package_id": "1003", "name": "拿到红包",
"description": "红包 谢谢老板 发财 开心 抢到了 欧气",
"width": 128, "height": 128, "formats": "png",
},
"牛吖": {
"sticker_id": "239", "package_id": "1003", "name": "牛吖",
"description": "牛 厉害 强 666 佩服 大佬",
"width": 128, "height": 128, "formats": "png",
},
"贴贴": {
"sticker_id": "272", "package_id": "1003", "name": "贴贴",
"description": "抱抱 亲昵 蹭蹭 亲密 靠靠 撒娇贴",
"width": 128, "height": 128, "formats": "png",
},
"爱心": {
"sticker_id": "138", "package_id": "1003", "name": "爱心",
"description": "心 love 喜欢你 红心 示爱 么么哒",
"width": 128, "height": 128, "formats": "png",
},
"晚安": {
"sticker_id": "170", "package_id": "1003", "name": "晚安",
"description": "好梦 睡了 night 早点休息 安啦 moon",
"width": 128, "height": 128, "formats": "png",
},
"太阳": {
"sticker_id": "176", "package_id": "1003", "name": "太阳",
"description": "晴天 早上好 阳光 morning 好天气 日",
"width": 128, "height": 128, "formats": "png",
},
"柠檬": {
"sticker_id": "266", "package_id": "1003", "name": "柠檬",
"description": "酸 嫉妒 柠檬精 羡慕 我酸 恰柠檬",
"width": 128, "height": 128, "formats": "png",
},
"大冤种": {
"sticker_id": "267", "package_id": "1003", "name": "大冤种",
"description": "倒霉 吃亏 自嘲 好心没好报 背锅 工具人",
"width": 128, "height": 128, "formats": "png",
},
"吐了": {
"sticker_id": "132", "package_id": "1003", "name": "吐了",
"description": "恶心 yue 受不了 嫌弃 想吐 生理不适",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "134", "package_id": "1003", "name": "",
"description": "生气 愤怒 火大 暴躁 气炸 怼",
"width": 128, "height": 128, "formats": "png",
},
"玫瑰": {
"sticker_id": "165", "package_id": "1003", "name": "玫瑰",
"description": "花 示爱 表白 浪漫 送你花 情人节",
"width": 128, "height": 128, "formats": "png",
},
"凋谢": {
"sticker_id": "119", "package_id": "1003", "name": "凋谢",
"description": "花谢 失恋 难过 枯萎 心碎 凉了",
"width": 128, "height": 128, "formats": "png",
},
"点赞": {
"sticker_id": "159", "package_id": "1003", "name": "点赞",
"description": "赞 认同 好棒 good like 大拇指 顶",
"width": 128, "height": 128, "formats": "png",
},
"握手": {
"sticker_id": "164", "package_id": "1003", "name": "握手",
"description": "合作 你好 商务 hello deal 成交 友好",
"width": 128, "height": 128, "formats": "png",
},
"抱拳": {
"sticker_id": "163", "package_id": "1003", "name": "抱拳",
"description": "谢谢 失敬 江湖 承让 拜托 有礼",
"width": 128, "height": 128, "formats": "png",
},
"ok": {
"sticker_id": "169", "package_id": "1003", "name": "ok",
"description": "好的 收到 没问题 okay 行 可以 懂了",
"width": 128, "height": 128, "formats": "png",
},
"拳头": {
"sticker_id": "174", "package_id": "1003", "name": "拳头",
"description": "加油 干 冲 fight 力量 击拳 硬气",
"width": 128, "height": 128, "formats": "png",
},
"鞭炮": {
"sticker_id": "191", "package_id": "1003", "name": "鞭炮",
"description": "过年 喜庆 爆竹 春节 噼里啪啦 红",
"width": 128, "height": 128, "formats": "png",
},
"烟花": {
"sticker_id": "258", "package_id": "1003", "name": "烟花",
"description": "庆典 漂亮 新年 嘭 绽放 节日快乐",
"width": 128, "height": 128, "formats": "png",
},
}
def get_sticker_by_name(name: str) -> Optional[dict]:
"""
按名称查找贴纸支持模糊匹配
匹配优先级
1. 完全相等name
2. name 包含查询词前缀/子串
3. description 包含查询词同义词搜索
4. 通用模糊评分 sticker-search 同算法命中即返回得分最高的一条
返回 sticker dict找不到返回 None
"""
if not name:
return None
query = name.strip()
if query in STICKER_MAP:
return STICKER_MAP[query]
for key, sticker in STICKER_MAP.items():
if query in key or key in query:
return sticker
for sticker in STICKER_MAP.values():
desc = sticker.get("description", "")
if query in desc:
return sticker
matches = search_stickers(query, limit=1)
return matches[0] if matches else None
def get_random_sticker(category: str = None) -> dict:
"""
随机返回一个贴纸
若指定 category则在 description 中含有该关键词的贴纸里随机选取
category None 时从全表随机
"""
if category:
candidates = [
s for s in STICKER_MAP.values()
if category in s.get("description", "") or category in s.get("name", "")
]
if candidates:
return random.choice(candidates)
return random.choice(list(STICKER_MAP.values()))
def get_sticker_by_id(sticker_id: str) -> Optional[dict]:
"""按 sticker_id 精确查找贴纸。"""
if not sticker_id:
return None
sid = str(sticker_id).strip()
for sticker in STICKER_MAP.values():
if sticker.get("sticker_id") == sid:
return sticker
return None
# ---------------------------------------------------------------------------
# 模糊搜索(对齐 chatbot-web yuanbao-openclaw-plugin/sticker-cache.ts.searchStickers
# ---------------------------------------------------------------------------
_PUNCT_RE = re.compile(r"[\s\u3000\-_·.,,。!?\"“”'‘’、/\\]+")
def _normalize_text(raw: str) -> str:
return unicodedata.normalize("NFKC", str(raw or "")).strip().lower()
def _compact_text(raw: str) -> str:
return _PUNCT_RE.sub("", _normalize_text(raw))
def _multiset_char_hit_ratio(needle: str, haystack: str) -> float:
if not needle:
return 0.0
bag: dict[str, int] = {}
for ch in haystack:
bag[ch] = bag.get(ch, 0) + 1
hits = 0
for ch in needle:
n = bag.get(ch, 0)
if n > 0:
hits += 1
bag[ch] = n - 1
return hits / len(needle)
def _bigram_jaccard(a: str, b: str) -> float:
if len(a) < 2 or len(b) < 2:
return 0.0
A = {a[i:i + 2] for i in range(len(a) - 1)}
B = {b[i:i + 2] for i in range(len(b) - 1)}
inter = len(A & B)
union = len(A) + len(B) - inter
return inter / union if union else 0.0
def _longest_subsequence_ratio(needle: str, haystack: str) -> float:
if not needle:
return 0.0
j = 0
for ch in haystack:
if j >= len(needle):
break
if ch == needle[j]:
j += 1
return j / len(needle)
def _score_field(haystack: str, query: str) -> float:
hay = _normalize_text(haystack)
q = _normalize_text(query)
if not hay or not q:
return 0.0
hay_c = _compact_text(haystack)
q_c = _compact_text(query)
best = 0.0
if hay == q:
best = max(best, 100.0)
if q in hay:
best = max(best, 92 + min(6, len(q)))
if len(q) >= 2 and hay.startswith(q):
best = max(best, 88.0)
if q_c and q_c in hay_c:
best = max(best, 86.0)
best = max(best, _multiset_char_hit_ratio(q_c, hay_c) * 62)
best = max(best, _bigram_jaccard(q_c, hay_c) * 58)
best = max(best, _longest_subsequence_ratio(q_c, hay_c) * 52)
if len(q) == 1 and q in hay:
best = max(best, 68.0)
return best
def search_stickers(query: str, limit: int = 10) -> list[dict]:
"""
在内置贴纸表中按模糊匹配排序返回前 N 条结果
评分综合 name/description 字段的子串字符多重集覆盖bigram Jaccard子序列比例
name 权重略高于 description×0.88 query 时按字典顺序返回前 N
"""
safe_limit = max(1, min(500, int(limit) if limit else 10))
if not query or not _normalize_text(query):
return list(STICKER_MAP.values())[:safe_limit]
scored: list[tuple[float, dict]] = []
for sticker in STICKER_MAP.values():
name_s = _score_field(sticker.get("name", ""), query)
desc_s = _score_field(sticker.get("description", ""), query) * 0.88
sid = str(sticker.get("sticker_id", "")).strip()
q_norm = _normalize_text(query)
id_s = 0.0
if sid and q_norm:
sid_norm = _normalize_text(sid)
if sid_norm == q_norm:
id_s = 100.0
elif q_norm in sid_norm:
id_s = 84.0
scored.append((max(name_s, desc_s, id_s), sticker))
scored.sort(key=lambda x: x[0], reverse=True)
top = scored[0][0] if scored else 0
if top <= 0:
return [s for _, s in scored[:safe_limit]]
if top >= 22:
floor = 18.0
elif top >= 12:
floor = max(10.0, top * 0.5)
else:
floor = max(6.0, top * 0.35)
filtered = [pair for pair in scored if pair[0] >= floor]
out = filtered if filtered else scored
return [s for _, s in out[:safe_limit]]
def build_face_msg_body(
face_index: int,
face_type: int = 1,
data: Optional[str] = None,
) -> list:
"""
构造 TIMFaceElem 消息体
Yuanbao 约定
- index 固定传 0服务端通过 data 字段识别具体表情
- data JSON 字符串包含 sticker_id / package_id 等字段
Args:
face_index: 保留字段暂时不影响 wire formatYuanbao 固定 index=0
face_index > 0 时视为旧版 QQ 表情 ID直接放入 index
face_type: 保留字段兼容旧接口当前未使用
data: 已序列化的 JSON 字符串 None 时仅传 index
Returns:
符合 Yuanbao TIM 协议的 msg_body list::
[{"msg_type": "TIMFaceElem", "msg_content": {"index": 0, "data": "..."}}]
"""
msg_content: dict = {"index": face_index}
if data is not None:
msg_content["data"] = data
return [{"msg_type": "TIMFaceElem", "msg_content": msg_content}]
def build_sticker_msg_body(sticker: dict) -> list:
"""
STICKER_MAP 中的 sticker dict 直接构造 TIMFaceElem 消息体
这是 send_sticker() 的内部辅助确保 data 字段与原始 JS 插件一致
"""
data_payload = json.dumps(
{
"sticker_id": sticker["sticker_id"],
"package_id": sticker["package_id"],
"width": sticker.get("width", 128),
"height": sticker.get("height", 128),
"formats": sticker.get("formats", "png"),
"name": sticker["name"],
},
ensure_ascii=False,
separators=(",", ":"),
)
return build_face_msg_body(face_index=0, data=data_payload)
+421 -1050
View File
File diff suppressed because it is too large Load Diff
-150
View File
@@ -1,150 +0,0 @@
"""Gateway runtime-metadata footer.
Renders a compact footer showing runtime state (model, context %, cwd) and
appends it to the FINAL message of an agent turn when enabled. Off by default
to keep replies minimal.
Config (``~/.hermes/config.yaml``)::
display:
runtime_footer:
enabled: true # off by default
fields: [model, context_pct, cwd] # order shown; drop any to hide
Per-platform overrides live under ``display.platforms.<platform>.runtime_footer``.
Users can toggle the global setting with ``/footer on|off`` from both the CLI
and any gateway platform.
The footer is appended to the final response text in ``gateway/run.py`` right
before returning the response to the adapter send path so it only lands on
the final message a user sees, not on tool-progress updates or streaming
partials. When streaming is on and the final text has already been delivered
piecemeal, the footer is sent as a separate trailing message via
``send_trailing_footer()``.
"""
from __future__ import annotations
import os
from pathlib import Path
from typing import Any, Iterable, Optional
_DEFAULT_FIELDS: tuple[str, ...] = ("model", "context_pct", "cwd")
_SEP = " · "
def _home_relative_cwd(cwd: str) -> str:
"""Return *cwd* with ``$HOME`` collapsed to ``~``. Empty string if unset."""
if not cwd:
return ""
try:
home = os.path.expanduser("~")
p = os.path.abspath(cwd)
if home and (p == home or p.startswith(home + os.sep)):
return "~" + p[len(home):]
return p
except Exception:
return cwd
def _model_short(model: Optional[str]) -> str:
"""Drop ``vendor/`` prefix for readability (``openai/gpt-5.4`` → ``gpt-5.4``)."""
if not model:
return ""
return model.rsplit("/", 1)[-1]
def resolve_footer_config(
user_config: dict[str, Any] | None,
platform_key: str | None = None,
) -> dict[str, Any]:
"""Resolve effective runtime-footer config for *platform_key*.
Merge order (later wins):
1. Built-in defaults (enabled=False)
2. ``display.runtime_footer``
3. ``display.platforms.<platform_key>.runtime_footer``
"""
resolved = {"enabled": False, "fields": list(_DEFAULT_FIELDS)}
cfg = (user_config or {}).get("display") or {}
global_cfg = cfg.get("runtime_footer")
if isinstance(global_cfg, dict):
if "enabled" in global_cfg:
resolved["enabled"] = bool(global_cfg.get("enabled"))
if isinstance(global_cfg.get("fields"), list) and global_cfg["fields"]:
resolved["fields"] = [str(f) for f in global_cfg["fields"]]
if platform_key:
platforms = cfg.get("platforms") or {}
plat_cfg = platforms.get(platform_key)
if isinstance(plat_cfg, dict):
plat_footer = plat_cfg.get("runtime_footer")
if isinstance(plat_footer, dict):
if "enabled" in plat_footer:
resolved["enabled"] = bool(plat_footer.get("enabled"))
if isinstance(plat_footer.get("fields"), list) and plat_footer["fields"]:
resolved["fields"] = [str(f) for f in plat_footer["fields"]]
return resolved
def format_runtime_footer(
*,
model: Optional[str],
context_tokens: int,
context_length: Optional[int],
cwd: Optional[str] = None,
fields: Iterable[str] = _DEFAULT_FIELDS,
) -> str:
"""Render the footer line, or return "" if no fields have data.
Fields are skipped silently when their underlying data is missing a
partially-populated footer is better than a line with ``?%`` or empty slots.
"""
parts: list[str] = []
for field in fields:
if field == "model":
m = _model_short(model)
if m:
parts.append(m)
elif field == "context_pct":
if context_length and context_length > 0 and context_tokens >= 0:
pct = max(0, min(100, round((context_tokens / context_length) * 100)))
parts.append(f"{pct}%")
elif field == "cwd":
rel = _home_relative_cwd(cwd or os.environ.get("TERMINAL_CWD", ""))
if rel:
parts.append(rel)
# Unknown field names are silently ignored.
if not parts:
return ""
return _SEP.join(parts)
def build_footer_line(
*,
user_config: dict[str, Any] | None,
platform_key: str | None,
model: Optional[str],
context_tokens: int,
context_length: Optional[int],
cwd: Optional[str] = None,
) -> str:
"""Top-level entry point used by gateway/run.py.
Returns the footer text (empty string when disabled or no data). Callers
append this to the final response themselves, preserving a single blank
line of separation.
"""
cfg = resolve_footer_config(user_config, platform_key)
if not cfg.get("enabled"):
return ""
return format_runtime_footer(
model=model,
context_tokens=context_tokens,
context_length=context_length,
cwd=cwd,
fields=cfg.get("fields") or _DEFAULT_FIELDS,
)
+21 -16
View File
@@ -62,8 +62,8 @@ from .config import (
)
from .whatsapp_identity import (
canonical_whatsapp_identifier,
normalize_whatsapp_identifier,
)
from utils import atomic_replace
@dataclass
@@ -310,9 +310,8 @@ def build_session_context_prompt(
"**Platform notes:** You are running inside Slack. "
"You do NOT have access to Slack-specific APIs — you cannot search "
"channel history, pin/unpin messages, manage channels, or list users. "
"Do not promise to perform these actions. The gateway may inline the "
"current message's Slack block/attachment payload when available, but "
"you still cannot call Slack APIs yourself."
"Do not promise to perform these actions. If the user asks, explain "
"that you can only read messages sent directly to you and respond."
)
elif context.source.platform == Platform.DISCORD:
# Inject the Discord IDs block only when the agent actually has
@@ -354,14 +353,6 @@ def build_session_context_prompt(
"If the user needs a detailed answer, give the short version first "
"and offer to elaborate."
)
elif context.source.platform == Platform.YUANBAO:
lines.append("")
lines.append(
"**Platform notes:** You are running inside Yuanbao. "
"You CAN send private (DM) messages via the send_message tool. "
"Use target='yuanbao:direct:<account_id>' for DM "
"and target='yuanbao:group:<group_code>' for group chat."
)
# Connected platforms
platforms_list = ["local (files on this machine)"]
@@ -705,7 +696,7 @@ class SessionStore:
json.dump(data, f, indent=2)
f.flush()
os.fsync(f.fileno())
atomic_replace(tmp_path, sessions_file)
os.replace(tmp_path, sessions_file)
except BaseException:
try:
os.unlink(tmp_path)
@@ -1257,11 +1248,25 @@ class SessionStore:
Used by /retry, /undo, and /compress to persist modified conversation history.
Rewrites both SQLite and legacy JSONL storage.
"""
# SQLite: replace atomically so a mid-rewrite failure doesn't leave
# the session half-empty in the DB while JSONL still has history.
# SQLite: clear old messages and re-insert
if self._db:
try:
self._db.replace_messages(session_id, messages)
self._db.clear_messages(session_id)
for msg in messages:
role = msg.get("role", "unknown")
self._db.append_message(
session_id=session_id,
role=role,
content=msg.get("content"),
tool_name=msg.get("tool_name"),
tool_calls=msg.get("tool_calls"),
tool_call_id=msg.get("tool_call_id"),
reasoning=msg.get("reasoning") if role == "assistant" else None,
reasoning_content=msg.get("reasoning_content") if role == "assistant" else None,
reasoning_details=msg.get("reasoning_details") if role == "assistant" else None,
codex_reasoning_items=msg.get("codex_reasoning_items") if role == "assistant" else None,
codex_message_items=msg.get("codex_message_items") if role == "assistant" else None,
)
except Exception as e:
logger.debug("Failed to rewrite transcript in DB: %s", e)
-145
View File
@@ -44,14 +44,6 @@ class StreamConsumerConfig:
buffer_threshold: int = 40
cursor: str = ""
buffer_only: bool = False
# When >0, the final edit for a streamed response is delivered as a
# fresh message if the original preview has been visible for at least
# this many seconds. This makes the platform's visible timestamp
# reflect completion time instead of first-token time for long-running
# responses (e.g. reasoning models that stream slowly). Ported from
# openclaw/openclaw#72038. Default 0 = always edit in place (legacy
# behavior). The gateway enables this selectively per-platform.
fresh_final_after_seconds: float = 0.0
class GatewayStreamConsumer:
@@ -91,29 +83,14 @@ class GatewayStreamConsumer:
chat_id: str,
config: Optional[StreamConsumerConfig] = None,
metadata: Optional[dict] = None,
on_new_message: Optional[callable] = None,
):
self.adapter = adapter
self.chat_id = chat_id
self.cfg = config or StreamConsumerConfig()
self.metadata = metadata
# Fired whenever a fresh content bubble is created on the platform
# (first-send of a new message, commentary, overflow chunk, or
# fallback continuation). The gateway uses this to linearize the
# tool-progress bubble: when content resumes after a tool batch,
# the next tool.started should open a NEW progress bubble below
# the content, not edit the old bubble above it.
# Called with no arguments. Exceptions are swallowed.
self._on_new_message = on_new_message
self._queue: queue.Queue = queue.Queue()
self._accumulated = ""
self._message_id: Optional[str] = None
# Wall-clock timestamp (time.monotonic) when ``_message_id`` was
# first assigned from a successful first-send. Used by the
# fresh-final logic to detect long-lived previews whose edit
# timestamps would be stale by completion time. Ported from
# openclaw/openclaw#72038.
self._message_created_ts: Optional[float] = None
self._already_sent = False
self._edit_supported = True # Disabled when progressive edits are no longer usable
self._last_edit_time = 0.0
@@ -155,21 +132,10 @@ class GatewayStreamConsumer:
if text:
self._queue.put((_COMMENTARY, text))
def _notify_new_message(self) -> None:
"""Fire the on_new_message callback, swallowing any errors."""
cb = self._on_new_message
if cb is None:
return
try:
cb()
except Exception:
logger.debug("on_new_message callback error", exc_info=True)
def _reset_segment_state(self, *, preserve_no_edit: bool = False) -> None:
if preserve_no_edit and self._message_id == "__no_edit__":
return
self._message_id = None
self._message_created_ts = None
self._accumulated = ""
self._last_sent_text = ""
self._fallback_final_send = False
@@ -548,9 +514,6 @@ class GatewayStreamConsumer:
self._message_id = str(result.message_id)
self._already_sent = True
self._last_sent_text = text
# Fresh content bubble — close off any stale tool bubble
# above so the next tool starts a new bubble below.
self._notify_new_message()
return str(result.message_id)
else:
self._edit_supported = False
@@ -683,9 +646,6 @@ class GatewayStreamConsumer:
sent_any_chunk = True
last_successful_chunk = chunk
last_message_id = result.message_id or last_message_id
# Each fallback chunk is a fresh platform message — notify
# so any stale tool-progress bubble gets closed off.
self._notify_new_message()
self._message_id = last_message_id
self._already_sent = True
@@ -769,91 +729,11 @@ class GatewayStreamConsumer:
# tool..."), not the final response. Setting already_sent would cause
# the final response to be incorrectly suppressed when there are
# multiple tool calls. See: https://github.com/NousResearch/hermes-agent/issues/10454
if result.success:
# Commentary counts as fresh content — close off any
# stale tool bubble above it so the next tool starts a
# new bubble below.
self._notify_new_message()
return result.success
except Exception as e:
logger.error("Commentary send error: %s", e)
return False
def _should_send_fresh_final(self) -> bool:
"""Return True when a long-lived preview should be replaced with a
fresh final message instead of an edit.
Conditions:
- Fresh-final is enabled (``fresh_final_after_seconds > 0``).
- We have a real preview message id (not the ``__no_edit__`` sentinel
and not ``None``).
- The preview has been visible for at least the configured threshold.
Ported from openclaw/openclaw#72038.
"""
threshold = getattr(self.cfg, "fresh_final_after_seconds", 0.0) or 0.0
if threshold <= 0:
return False
if not self._message_id or self._message_id == "__no_edit__":
return False
if self._message_created_ts is None:
return False
age = time.monotonic() - self._message_created_ts
return age >= threshold
async def _try_fresh_final(self, text: str) -> bool:
"""Send ``text`` as a brand-new message (best-effort delete the old
preview) so the platform's visible timestamp reflects completion
time. Returns True on successful delivery, False on any failure so
the caller falls back to the normal edit path.
Ported from openclaw/openclaw#72038.
"""
old_message_id = self._message_id
try:
result = await self.adapter.send(
chat_id=self.chat_id,
content=text,
metadata=self.metadata,
)
except Exception as e:
logger.debug("Fresh-final send failed, falling back to edit: %s", e)
return False
if not getattr(result, "success", False):
return False
# Successful fresh send — try to delete the stale preview so the
# user doesn't see the old edit-stuck message underneath. Cleanup
# is best-effort; platforms that don't implement ``delete_message``
# just leave the preview behind (still an acceptable outcome —
# the visible final timestamp is the important part).
if old_message_id and old_message_id != "__no_edit__":
delete_fn = getattr(self.adapter, "delete_message", None)
if delete_fn is not None:
try:
await delete_fn(self.chat_id, old_message_id)
except Exception as e:
logger.debug(
"Fresh-final preview cleanup failed (%s): %s",
old_message_id, e,
)
# Adopt the new message id as the current message so subsequent
# callers (e.g. overflow split loops, finalize retries) see a
# consistent state.
new_message_id = getattr(result, "message_id", None)
if new_message_id:
self._message_id = new_message_id
self._message_created_ts = time.monotonic()
else:
# Send succeeded but platform didn't return an id — treat the
# delivery as final-only and fall back to "__no_edit__" so we
# don't try to edit something we can't address.
self._message_id = "__no_edit__"
self._message_created_ts = None
self._already_sent = True
self._last_sent_text = text
self._final_response_sent = True
return True
async def _send_or_edit(self, text: str, *, finalize: bool = False) -> bool:
"""Send or edit the streaming message.
@@ -906,22 +786,6 @@ class GatewayStreamConsumer:
finalize and self._adapter_requires_finalize
):
return True
# Fresh-final for long-lived previews: when finalizing
# the last edit in a streaming sequence, if the
# original preview has been visible for at least
# ``fresh_final_after_seconds``, send the completed
# reply as a fresh message so the platform's visible
# timestamp reflects completion time instead of the
# preview creation time. Best-effort cleanup of the
# old preview follows. Ported from
# openclaw/openclaw#72038. Gated by config so the
# legacy edit-in-place path stays the default.
if (
finalize
and self._should_send_fresh_final()
and await self._try_fresh_final(text)
):
return True
# Edit existing message
result = await self.adapter.edit_message(
chat_id=self.chat_id,
@@ -988,10 +852,6 @@ class GatewayStreamConsumer:
if result.success:
if result.message_id:
self._message_id = result.message_id
# Track when the preview first became visible to
# the user so fresh-final logic can detect stale
# preview timestamps on long-running responses.
self._message_created_ts = time.monotonic()
else:
self._edit_supported = False
self._already_sent = True
@@ -1003,11 +863,6 @@ class GatewayStreamConsumer:
# every delta/tool boundary when platforms accept a
# message but do not return an editable message id.
self._message_id = "__no_edit__"
# Notify the gateway that a fresh content bubble was
# created so any accumulated tool-progress bubble above
# gets closed off — the next tool fires into a new
# bubble below, preserving chronological order.
self._notify_new_message()
return True
else:
# Initial send failed — disable streaming for this session
+1 -21
View File
@@ -31,17 +31,8 @@ Hermes' own session keys.
from __future__ import annotations
import json
import logging
import re
from typing import Set
logger = logging.getLogger(__name__)
# WhatsApp JIDs are numeric (or plus-prefixed numeric) with optional
# ``@``, ``.`` and ``:`` separators. ``\w`` is pinned to ASCII so
# full-width digits / Unicode word chars can't sneak through.
_SAFE_IDENTIFIER_RE = re.compile(r"^[A-Za-z0-9@.+\-]+$")
from hermes_constants import get_hermes_home
@@ -90,16 +81,6 @@ def expand_whatsapp_aliases(identifier: str) -> Set[str]:
current = queue.pop(0)
if not current or current in resolved:
continue
# Defense-in-depth: reject identifiers that could sneak path
# separators / traversal segments into the ``lid-mapping-{current}``
# filename below. The hardcoded ``lid-mapping-`` prefix already
# prevents escape via pathlib's component split (an attacker can't
# create ``lid-mapping-..`` as a real directory in session_dir), but
# this keeps the identifier space to the characters WhatsApp JIDs
# actually use and avoids depending on that filesystem-layout
# invariant.
if not _SAFE_IDENTIFIER_RE.match(current):
continue
resolved.add(current)
for suffix in ("", "_reverse"):
@@ -110,8 +91,7 @@ def expand_whatsapp_aliases(identifier: str) -> Set[str]:
mapped = normalize_whatsapp_identifier(
json.loads(mapping_path.read_text(encoding="utf-8"))
)
except (OSError, json.JSONDecodeError) as exc:
logger.debug("whatsapp_identity: failed to read %s: %s", mapping_path, exc)
except Exception:
continue
if mapped and mapped not in resolved:
queue.append(mapped)
+5 -65
View File
@@ -43,7 +43,6 @@ import yaml
from hermes_cli.config import get_hermes_home, get_config_path, read_raw_config
from hermes_constants import OPENROUTER_BASE_URL
from utils import atomic_replace
logger = logging.getLogger(__name__)
@@ -110,12 +109,6 @@ SERVICE_PROVIDER_NAMES: Dict[str, str] = {
DEFAULT_GEMINI_CLOUDCODE_BASE_URL = "cloudcode-pa://google"
GEMINI_OAUTH_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 60 # refresh 60s before expiry
# LM Studio's default no-auth mode still requires *some* non-empty bearer for
# the API-key code paths (auxiliary_client, runtime resolver) to treat the
# provider as configured. This sentinel is sent only to LM Studio, never to
# any remote service.
LMSTUDIO_NOAUTH_PLACEHOLDER = "dummy-lm-api-key"
# =============================================================================
# Provider Registry
@@ -166,14 +159,6 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
auth_type="oauth_external",
inference_base_url=DEFAULT_GEMINI_CLOUDCODE_BASE_URL,
),
"lmstudio": ProviderConfig(
id="lmstudio",
name="LM Studio",
auth_type="api_key",
inference_base_url="http://127.0.0.1:1234/v1",
api_key_env_vars=("LM_API_KEY",),
base_url_env_var="LM_BASE_URL",
),
"copilot": ProviderConfig(
id="copilot",
name="GitHub Copilot",
@@ -239,14 +224,6 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
api_key_env_vars=("ARCEEAI_API_KEY",),
base_url_env_var="ARCEE_BASE_URL",
),
"gmi": ProviderConfig(
id="gmi",
name="GMI Cloud",
auth_type="api_key",
inference_base_url="https://api.gmi-serving.com/v1",
api_key_env_vars=("GMI_API_KEY",),
base_url_env_var="GMI_BASE_URL",
),
"minimax": ProviderConfig(
id="minimax",
name="MiniMax",
@@ -363,14 +340,6 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
api_key_env_vars=("XIAOMI_API_KEY",),
base_url_env_var="XIAOMI_BASE_URL",
),
"tencent-tokenhub": ProviderConfig(
id="tencent-tokenhub",
name="Tencent TokenHub",
auth_type="api_key",
inference_base_url="https://tokenhub.tencentmaas.com/v1",
api_key_env_vars=("TOKENHUB_API_KEY",),
base_url_env_var="TOKENHUB_BASE_URL",
),
"ollama-cloud": ProviderConfig(
id="ollama-cloud",
name="Ollama Cloud",
@@ -498,27 +467,11 @@ def _resolve_api_key_provider_secret(
pass
return "", ""
from hermes_cli.config import get_env_value
for env_var in pconfig.api_key_env_vars:
# Check both os.environ and ~/.hermes/.env file
val = (get_env_value(env_var) or "").strip()
val = os.getenv(env_var, "").strip()
if has_usable_secret(val):
return val, env_var
# Fallback: try credential pool (e.g. zai key stored via auth.json)
try:
from agent.credential_pool import load_pool
pool = load_pool(provider_id)
if pool and pool.has_credentials():
entry = pool.peek()
if entry:
key = getattr(entry, "access_token", "") or getattr(entry, "runtime_api_key", "")
key = str(key).strip()
if has_usable_secret(key):
return key, f"credential_pool:{provider_id}"
except Exception:
pass
return "", ""
@@ -843,7 +796,7 @@ def _save_auth_store(auth_store: Dict[str, Any]) -> Path:
handle.write(payload)
handle.flush()
os.fsync(handle.fileno())
atomic_replace(tmp_path, auth_file)
os.replace(tmp_path, auth_file)
try:
dir_fd = os.open(str(auth_file.parent), os.O_RDONLY)
except OSError:
@@ -1151,7 +1104,6 @@ def resolve_provider(
"kimi-cn": "kimi-coding-cn", "moonshot-cn": "kimi-coding-cn",
"step": "stepfun", "stepfun-coding-plan": "stepfun",
"arcee-ai": "arcee", "arceeai": "arcee",
"gmi-cloud": "gmi", "gmicloud": "gmi",
"minimax-china": "minimax-cn", "minimax_cn": "minimax-cn",
"alibaba_coding": "alibaba-coding-plan", "alibaba-coding": "alibaba-coding-plan",
"alibaba_coding_plan": "alibaba-coding-plan",
@@ -1164,13 +1116,11 @@ def resolve_provider(
"qwen-portal": "qwen-oauth", "qwen-cli": "qwen-oauth", "qwen-oauth": "qwen-oauth", "google-gemini-cli": "google-gemini-cli", "gemini-cli": "google-gemini-cli", "gemini-oauth": "google-gemini-cli",
"hf": "huggingface", "hugging-face": "huggingface", "huggingface-hub": "huggingface",
"mimo": "xiaomi", "xiaomi-mimo": "xiaomi",
"tencent": "tencent-tokenhub", "tokenhub": "tencent-tokenhub",
"tencent-cloud": "tencent-tokenhub", "tencentmaas": "tencent-tokenhub",
"aws": "bedrock", "aws-bedrock": "bedrock", "amazon-bedrock": "bedrock", "amazon": "bedrock",
"go": "opencode-go", "opencode-go-sub": "opencode-go",
"kilo": "kilocode", "kilo-code": "kilocode", "kilo-gateway": "kilocode",
"lmstudio": "lmstudio", "lm-studio": "lmstudio", "lm_studio": "lmstudio",
# Local server aliases — route through the generic custom provider
"lmstudio": "custom", "lm-studio": "custom", "lm_studio": "custom",
"ollama": "custom", "ollama_cloud": "ollama-cloud",
"vllm": "custom", "llamacpp": "custom",
"llama.cpp": "custom", "llama-cpp": "custom",
@@ -1217,11 +1167,8 @@ def resolve_provider(
continue
# GitHub tokens are commonly present for repo/tool access but should not
# hijack inference auto-selection unless the user explicitly chooses
# Copilot/GitHub Models as the provider. LM Studio is a local server
# whose availability isn't implied by LM_API_KEY presence (it may be
# offline, and the no-auth setup uses a placeholder value), so it
# also requires explicit selection.
if pid in ("copilot", "lmstudio"):
# Copilot/GitHub Models as the provider.
if pid == "copilot":
continue
for env_var in pconfig.api_key_env_vars:
if has_usable_secret(os.getenv(env_var, "")):
@@ -3499,13 +3446,6 @@ def resolve_api_key_provider_credentials(provider_id: str) -> Dict[str, Any]:
key_source = ""
api_key, key_source = _resolve_api_key_provider_secret(provider_id, pconfig)
# No-auth LM Studio: substitute a placeholder so runtime / auxiliary_client
# see the local server as configured. doctor still reports unconfigured
# because get_api_key_provider_status uses the raw secret resolver.
if not api_key and provider_id == "lmstudio":
api_key = LMSTUDIO_NOAUTH_PLACEHOLDER
key_source = key_source or "default"
env_url = ""
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
+1 -1
View File
@@ -34,7 +34,7 @@ from dataclasses import dataclass, field
from typing import Optional
from urllib import request as urllib_request
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse
from urllib.parse import urlparse, urlunparse
logger = logging.getLogger(__name__)
+1 -272
View File
@@ -36,23 +36,12 @@ _EXCLUDED_DIRS = {
"__pycache__", # bytecode caches — regenerated on import
".git", # nested git dirs (profiles shouldn't have these, but safety)
"node_modules", # js deps if website/ somehow leaks in
"backups", # prior auto-backups — don't nest backups exponentially
"checkpoints", # session-local trajectory caches — regenerated per-session,
# session-hash-keyed so they don't port to another machine anyway
}
# File-name suffixes to skip
_EXCLUDED_SUFFIXES = (
".pyc",
".pyo",
# SQLite sidecar files — the backup takes a consistent snapshot of ``*.db``
# via ``sqlite3.backup()``, so shipping the live WAL / shared-memory /
# rollback-journal alongside would pair a fresh snapshot with stale sidecar
# state and produce a torn restore on the next open. They're transient and
# regenerated on first connection anyway.
".db-wal",
".db-shm",
".db-journal",
)
# File names to skip (runtime state that's meaningless on another machine)
@@ -465,12 +454,6 @@ def run_import(args) -> None:
# Critical state files to include in quick snapshots (relative to HERMES_HOME).
# Everything else is either regeneratable (logs, cache) or managed separately
# (skills, repo, sessions/).
#
# Entries may be individual files OR directories. Directories are captured
# recursively; missing entries are silently skipped. Pairing data lives in
# platform-specific JSON blobs outside state.db, so it's listed here explicitly
# — `hermes update` snapshots this set before pulling so approved-user lists
# are recoverable if anything goes wrong (issue #15733).
_QUICK_STATE_FILES = (
"state.db",
"config.yaml",
@@ -480,10 +463,6 @@ _QUICK_STATE_FILES = (
"gateway_state.json",
"channel_directory.json",
"processes.json",
# Pairing stores (generic + per-platform JSONs outside state.db)
"pairing", # legacy location (gateway/pairing.py)
"platforms/pairing", # new location (gateway/pairing.py)
"feishu_comment_pairing.json", # Feishu comment subscription pairings
)
_QUICK_SNAPSHOTS_DIR = "state-snapshots"
@@ -519,27 +498,7 @@ def create_quick_snapshot(
for rel in _QUICK_STATE_FILES:
src = home / rel
if not src.exists():
continue
if src.is_dir():
# Walk the directory and record each file individually in the
# manifest so restore can treat them uniformly. Empty dirs are
# skipped (nothing to snapshot).
for sub in src.rglob("*"):
if not sub.is_file():
continue
sub_rel = sub.relative_to(home).as_posix()
dst = snap_dir / sub_rel
dst.parent.mkdir(parents=True, exist_ok=True)
try:
shutil.copy2(sub, dst)
manifest[sub_rel] = dst.stat().st_size
except (OSError, PermissionError) as exc:
logger.warning("Could not snapshot %s: %s", sub_rel, exc)
continue
if not src.is_file():
if not src.exists() or not src.is_file():
continue
dst = snap_dir / rel
@@ -694,233 +653,3 @@ def run_quick_backup(args) -> None:
print(f" Restore with: /snapshot restore {snap_id}")
else:
print("No state files found to snapshot.")
# ---------------------------------------------------------------------------
# Shared full-zip backup helper
# ---------------------------------------------------------------------------
def _write_full_zip_backup(out_path: Path, hermes_root: Path) -> Optional[Path]:
"""Write a full zip snapshot of ``hermes_root`` to ``out_path``.
Uses the same exclusion rules and SQLite safe-copy as :func:`run_backup`.
Returns the output path on success, None on failure (nothing to back up,
or write error caller should surface the outcome but not raise).
"""
files_to_add: list[tuple[Path, Path]] = []
try:
for dirpath, dirnames, filenames in os.walk(hermes_root, followlinks=False):
dp = Path(dirpath)
# Prune excluded directories in-place so os.walk doesn't descend
dirnames[:] = [d for d in dirnames if d not in _EXCLUDED_DIRS]
for fname in filenames:
fpath = dp / fname
try:
rel = fpath.relative_to(hermes_root)
except ValueError:
continue
if _should_exclude(rel):
continue
# Skip the output zip itself if it already exists inside root.
try:
if fpath.resolve() == out_path.resolve():
continue
except (OSError, ValueError):
pass
files_to_add.append((fpath, rel))
except OSError as exc:
logger.warning("Full-zip backup: walk failed: %s", exc)
return None
if not files_to_add:
return None
try:
with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED, compresslevel=6) as zf:
for abs_path, rel_path in files_to_add:
try:
if abs_path.suffix == ".db":
with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp:
tmp_db = Path(tmp.name)
try:
if _safe_copy_db(abs_path, tmp_db):
zf.write(tmp_db, arcname=str(rel_path))
finally:
tmp_db.unlink(missing_ok=True)
else:
zf.write(abs_path, arcname=str(rel_path))
except (PermissionError, OSError, ValueError) as exc:
logger.debug("Skipping %s in zip backup: %s", rel_path, exc)
continue
except OSError as exc:
logger.warning("Full-zip backup: zip write failed: %s", exc)
# Best-effort cleanup of partial file
try:
out_path.unlink(missing_ok=True)
except OSError:
pass
return None
return out_path
# ---------------------------------------------------------------------------
# Pre-update auto-backup
# ---------------------------------------------------------------------------
_PRE_UPDATE_BACKUPS_DIR = "backups"
_PRE_UPDATE_PREFIX = "pre-update-"
_PRE_UPDATE_DEFAULT_KEEP = 5
def _pre_update_backup_dir(hermes_home: Optional[Path] = None) -> Path:
home = hermes_home or get_hermes_home()
return home / _PRE_UPDATE_BACKUPS_DIR
def _prune_pre_update_backups(backup_dir: Path, keep: int) -> int:
"""Remove oldest pre-update backups beyond the keep limit.
Returns the number of files deleted. Only touches files matching
``pre-update-*.zip`` so hand-made zips dropped in the same directory
are never touched.
"""
if keep < 0:
keep = 0
if not backup_dir.exists():
return 0
backups = sorted(
(p for p in backup_dir.iterdir()
if p.is_file() and p.name.startswith(_PRE_UPDATE_PREFIX) and p.suffix.lower() == ".zip"),
key=lambda p: p.name,
reverse=True,
)
deleted = 0
for p in backups[keep:]:
try:
p.unlink()
deleted += 1
except OSError as exc:
logger.warning("Failed to prune backup %s: %s", p.name, exc)
return deleted
def create_pre_update_backup(
hermes_home: Optional[Path] = None,
keep: int = _PRE_UPDATE_DEFAULT_KEEP,
) -> Optional[Path]:
"""Create a full zip backup of HERMES_HOME under ``backups/``.
Mirrors :func:`run_backup` (same exclusion rules, same SQLite safe-copy)
but writes to ``<HERMES_HOME>/backups/pre-update-<timestamp>.zip`` and
auto-prunes old pre-update backups.
Returns the path to the created zip, or ``None`` if no files were
found or the backup could not be created. Never raises the caller
(``hermes update``) should continue even if the backup fails.
"""
hermes_root = hermes_home or get_default_hermes_root()
if not hermes_root.is_dir():
return None
backup_dir = _pre_update_backup_dir(hermes_root)
try:
backup_dir.mkdir(parents=True, exist_ok=True)
except OSError as exc:
logger.warning("Could not create pre-update backup dir %s: %s", backup_dir, exc)
return None
stamp = datetime.now().strftime("%Y-%m-%d-%H%M%S")
out_path = backup_dir / f"{_PRE_UPDATE_PREFIX}{stamp}.zip"
result = _write_full_zip_backup(out_path, hermes_root)
if result is None:
return None
_prune_pre_update_backups(backup_dir, keep=keep)
return out_path
# ---------------------------------------------------------------------------
# Pre-migration auto-backup (used by `hermes claw migrate`)
# ---------------------------------------------------------------------------
_PRE_MIGRATION_PREFIX = "pre-migration-"
_PRE_MIGRATION_DEFAULT_KEEP = 5
def _prune_pre_migration_backups(backup_dir: Path, keep: int) -> int:
"""Remove oldest pre-migration backups beyond the keep limit.
Only touches files matching ``pre-migration-*.zip`` so other backups in
the same directory are never touched.
"""
if keep < 0:
keep = 0
if not backup_dir.exists():
return 0
backups = sorted(
(p for p in backup_dir.iterdir()
if p.is_file() and p.name.startswith(_PRE_MIGRATION_PREFIX) and p.suffix.lower() == ".zip"),
key=lambda p: p.name,
reverse=True,
)
deleted = 0
for p in backups[keep:]:
try:
p.unlink()
deleted += 1
except OSError as exc:
logger.warning("Failed to prune pre-migration backup %s: %s", p.name, exc)
return deleted
def create_pre_migration_backup(
hermes_home: Optional[Path] = None,
keep: int = _PRE_MIGRATION_DEFAULT_KEEP,
) -> Optional[Path]:
"""Create a full zip backup of HERMES_HOME under ``backups/`` before a
``hermes claw migrate`` apply.
Shares implementation with :func:`create_pre_update_backup` via
``_write_full_zip_backup`` same exclusions, same SQLite safe-copy,
restorable with ``hermes import <archive>``. Writes to
``<HERMES_HOME>/backups/pre-migration-<timestamp>.zip`` and auto-prunes
old pre-migration backups.
Returns the path to the created zip, or ``None`` if nothing was found
to back up (fresh install) or the write failed. Never raises the
caller decides whether to abort or proceed.
"""
hermes_root = hermes_home or get_default_hermes_root()
if not hermes_root.is_dir():
return None
# Reuses the shared backups/ directory so `hermes import` and the
# update-backup listing pick up pre-migration archives too.
backup_dir = _pre_update_backup_dir(hermes_root)
try:
backup_dir.mkdir(parents=True, exist_ok=True)
except OSError as exc:
logger.warning("Could not create pre-migration backup dir %s: %s", backup_dir, exc)
return None
stamp = datetime.now().strftime("%Y-%m-%d-%H%M%S")
out_path = backup_dir / f"{_PRE_MIGRATION_PREFIX}{stamp}.zip"
result = _write_full_zip_backup(out_path, hermes_root)
if result is None:
return None
_prune_pre_migration_backups(backup_dir, keep=keep)
return out_path
+1
View File
@@ -562,6 +562,7 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
right_content = "\n".join(right_lines)
layout_table.add_row(left_content, right_content)
agent_name = _skin_branding("agent_name", "Hermes Agent")
title_color = _skin_color("banner_title", "#FFD700")
border_color = _skin_color("banner_border", "#CD7F32")
version_label = format_banner_version_label()
-138
View File
@@ -1,138 +0,0 @@
"""Shared helpers for attaching Hermes to a local Chrome CDP port."""
from __future__ import annotations
import os
import platform
import shlex
import shutil
import subprocess
from hermes_constants import get_hermes_home
DEFAULT_BROWSER_CDP_PORT = 9222
DEFAULT_BROWSER_CDP_URL = f"http://127.0.0.1:{DEFAULT_BROWSER_CDP_PORT}"
_DARWIN_APPS = (
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
"/Applications/Chromium.app/Contents/MacOS/Chromium",
"/Applications/Brave Browser.app/Contents/MacOS/Brave Browser",
"/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge",
)
_WINDOWS_INSTALL_PARTS = (
("Google", "Chrome", "Application", "chrome.exe"),
("Chromium", "Application", "chrome.exe"),
("Chromium", "Application", "chromium.exe"),
("BraveSoftware", "Brave-Browser", "Application", "brave.exe"),
("Microsoft", "Edge", "Application", "msedge.exe"),
)
_LINUX_BIN_NAMES = (
"google-chrome", "google-chrome-stable", "chromium-browser",
"chromium", "brave-browser", "microsoft-edge",
)
_WINDOWS_BIN_NAMES = (
"chrome.exe", "msedge.exe", "brave.exe", "chromium.exe",
"chrome", "msedge", "brave", "chromium",
)
def get_chrome_debug_candidates(system: str) -> list[str]:
candidates: list[str] = []
seen: set[str] = set()
def add(path: str | None) -> None:
if not path:
return
normalized = os.path.normcase(os.path.normpath(path))
if normalized in seen or not os.path.isfile(path):
return
candidates.append(path)
seen.add(normalized)
def add_install_paths(bases: tuple[str | None, ...]) -> None:
for base in filter(None, bases):
for parts in _WINDOWS_INSTALL_PARTS:
add(os.path.join(base, *parts))
if system == "Darwin":
for app in _DARWIN_APPS:
add(app)
return candidates
if system == "Windows":
for name in _WINDOWS_BIN_NAMES:
add(shutil.which(name))
add_install_paths((
os.environ.get("ProgramFiles"),
os.environ.get("ProgramFiles(x86)"),
os.environ.get("LOCALAPPDATA"),
))
return candidates
for name in _LINUX_BIN_NAMES:
add(shutil.which(name))
add_install_paths(("/mnt/c/Program Files", "/mnt/c/Program Files (x86)"))
return candidates
def chrome_debug_data_dir() -> str:
return str(get_hermes_home() / "chrome-debug")
def _chrome_debug_args(port: int) -> list[str]:
return [
f"--remote-debugging-port={port}",
f"--user-data-dir={chrome_debug_data_dir()}",
"--no-first-run",
"--no-default-browser-check",
]
def manual_chrome_debug_command(port: int = DEFAULT_BROWSER_CDP_PORT, system: str | None = None) -> str | None:
system = system or platform.system()
candidates = get_chrome_debug_candidates(system)
if candidates:
argv = [candidates[0], *_chrome_debug_args(port)]
return subprocess.list2cmdline(argv) if system == "Windows" else shlex.join(argv)
if system == "Darwin":
data_dir = chrome_debug_data_dir()
return (
f'open -a "Google Chrome" --args --remote-debugging-port={port} '
f'--user-data-dir="{data_dir}" --no-first-run --no-default-browser-check'
)
return None
def _detach_kwargs(system: str) -> dict:
if system != "Windows":
return {"start_new_session": True}
flags = getattr(subprocess, "DETACHED_PROCESS", 0) | getattr(
subprocess, "CREATE_NEW_PROCESS_GROUP", 0
)
return {"creationflags": flags} if flags else {}
def try_launch_chrome_debug(port: int = DEFAULT_BROWSER_CDP_PORT, system: str | None = None) -> bool:
system = system or platform.system()
candidates = get_chrome_debug_candidates(system)
if not candidates:
return False
os.makedirs(chrome_debug_data_dir(), exist_ok=True)
try:
subprocess.Popen(
[candidates[0], *_chrome_debug_args(port)],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
**_detach_kwargs(system),
)
return True
except Exception:
return False
+6 -67
View File
@@ -4,8 +4,7 @@ Usage:
hermes claw migrate # Preview then migrate (always shows preview first)
hermes claw migrate --dry-run # Preview only, no changes
hermes claw migrate --yes # Skip confirmation prompt
hermes claw migrate --preset full --overwrite --migrate-secrets # Full run w/ secrets
hermes claw migrate --no-backup # Skip pre-migration snapshot
hermes claw migrate --preset full --overwrite # Full migration, overwrite conflicts
hermes claw cleanup # Archive leftover OpenClaw directories
hermes claw cleanup --dry-run # Preview what would be archived
"""
@@ -16,7 +15,6 @@ import subprocess
import sys
from datetime import datetime
from pathlib import Path
from typing import Optional
from hermes_cli.config import get_hermes_home, get_config_path, load_config, save_config
from hermes_constants import get_optional_skills_dir
@@ -323,13 +321,10 @@ def _cmd_migrate(args):
migrate_secrets = getattr(args, "migrate_secrets", False)
workspace_target = getattr(args, "workspace_target", None)
skill_conflict = getattr(args, "skill_conflict", "skip")
no_backup = getattr(args, "no_backup", False)
# Secrets are never included implicitly — they must be explicitly requested
# via --migrate-secrets, even under --preset full. This mirrors OpenClaw's
# migrate-hermes posture (two-phase: run once without secrets, rerun with
# --include-secrets) and prevents a --preset full invocation from silently
# importing API keys that the user may not have intended to copy.
# If using the "full" preset, secrets are included by default
if preset == "full":
migrate_secrets = True
print()
print(
@@ -436,24 +431,15 @@ def _cmd_migrate(args):
preview_summary = preview_report.get("summary", {})
preview_count = preview_summary.get("migrated", 0)
preview_conflicts = preview_summary.get("conflict", 0)
# "Nothing to migrate" means nothing migrated AND nothing blocked by
# conflicts. If there are conflicts, we still want to show the plan and
# surface the refusal/--overwrite guidance instead of silently bailing.
if preview_count == 0 and preview_conflicts == 0:
if preview_count == 0:
print()
print_info("Nothing to migrate from OpenClaw.")
_print_migration_report(preview_report, dry_run=True)
return
print()
if preview_count > 0:
print_header(f"Migration Preview — {preview_count} item(s) would be imported")
else:
print_header(
f"Migration Preview — {preview_conflicts} conflict(s), nothing would be imported"
)
print_header(f"Migration Preview — {preview_count} item(s) would be imported")
print_info("No changes have been made yet. Review the list below:")
_print_migration_report(preview_report, dry_run=True)
@@ -461,24 +447,6 @@ def _cmd_migrate(args):
if dry_run:
return
# ── Phase 1b: Refuse if the plan has conflicts and --overwrite is not set ─
# Modelled on OpenClaw's assertConflictFreePlan() — apply is a safe no-op
# on conflicts unless the user explicitly opts in to overwriting. Without
# this guard, the user would answer "yes, proceed" and silently end up
# with a migration that skipped every conflicting item.
if preview_conflicts > 0 and not overwrite:
print()
print_error(
f"Plan has {preview_conflicts} conflict(s). Refusing to apply."
)
print_info(
"Each conflict is an item whose target already exists in ~/.hermes/. "
"Re-run with --overwrite to replace conflicting targets (item-level "
"backups are written to the migration report directory)."
)
print_info("Or re-run with --dry-run to review the full plan.")
return
# ── Phase 2: Confirm and execute ───────────────────────────
print()
if not auto_yes:
@@ -490,32 +458,6 @@ def _cmd_migrate(args):
print_info("Migration cancelled.")
return
# ── Phase 2b: Pre-apply backup of the Hermes home ─────────
# Delegates to hermes_cli.backup.create_pre_migration_backup(), which
# shares implementation with the pre-update backup (same exclusion
# rules, same SQLite safe-copy, zip format) so the archive is
# restorable with `hermes import`. Mirrors OpenClaw's
# createPreMigrationBackup posture — one atomic restore point before
# any mutation, auto-pruned to the last 5 pre-migration zips.
backup_archive: Optional[Path] = None
if not no_backup:
try:
from hermes_cli.backup import create_pre_migration_backup, _format_size
backup_archive = create_pre_migration_backup(hermes_home=hermes_home)
if backup_archive:
size_str = _format_size(backup_archive.stat().st_size)
print()
print_success(f"Pre-migration backup: {backup_archive} ({size_str})")
print_info(f"Restore with: hermes import {backup_archive.name}")
except Exception as e:
print()
print_error(f"Could not create pre-migration backup: {e}")
print_info(
"Re-run with --no-backup to skip, or free up disk space under the Hermes home."
)
logger.debug("Pre-migration backup error", exc_info=True)
return
try:
migrator = mod.Migrator(
source_root=source_dir.resolve(),
@@ -534,9 +476,6 @@ def _cmd_migrate(args):
print()
print_error(f"Migration failed: {e}")
logger.debug("OpenClaw migration error", exc_info=True)
if backup_archive:
print_info(f"A pre-migration backup is available at: {backup_archive}")
print_info(f"Restore with: hermes import {backup_archive.name}")
return
# Print results
+6 -169
View File
@@ -62,8 +62,6 @@ COMMAND_REGISTRY: list[CommandDef] = [
aliases=("reset",)),
CommandDef("clear", "Clear screen and start a new session", "Session",
cli_only=True),
CommandDef("redraw", "Force a full UI repaint (recovers from terminal drift)", "Session",
cli_only=True),
CommandDef("history", "Show conversation history", "Session",
cli_only=True),
CommandDef("save", "Save the current conversation", "Session",
@@ -115,9 +113,6 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("verbose", "Cycle tool progress display: off -> new -> all -> verbose",
"Configuration", cli_only=True,
gateway_config_gate="display.tool_progress_command"),
CommandDef("footer", "Toggle gateway runtime-metadata footer on final replies",
"Configuration", args_hint="[on|off|status]",
subcommands=("on", "off", "status")),
CommandDef("yolo", "Toggle YOLO mode (skip all dangerous command approvals)",
"Configuration"),
CommandDef("reasoning", "Manage reasoning effort and display", "Configuration",
@@ -128,14 +123,11 @@ COMMAND_REGISTRY: list[CommandDef] = [
subcommands=("normal", "fast", "status", "on", "off")),
CommandDef("skin", "Show or change the display skin/theme", "Configuration",
cli_only=True, args_hint="[name]"),
CommandDef("indicator", "Pick the TUI busy-indicator style", "Configuration",
cli_only=True, args_hint="[kaomoji|emoji|unicode|ascii]",
subcommands=("kaomoji", "emoji", "unicode", "ascii")),
CommandDef("voice", "Toggle voice mode", "Configuration",
args_hint="[on|off|tts|status]", subcommands=("on", "off", "tts", "status")),
CommandDef("busy", "Control what Enter does while Hermes is working", "Configuration",
cli_only=True, args_hint="[queue|steer|interrupt|status]",
subcommands=("queue", "steer", "interrupt", "status")),
cli_only=True, args_hint="[queue|interrupt|status]",
subcommands=("queue", "interrupt", "status")),
# Tools & Skills
CommandDef("tools", "Manage tools: /tools [list|disable|enable] [name...]", "Tools & Skills",
@@ -148,9 +140,11 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("cron", "Manage scheduled tasks", "Tools & Skills",
cli_only=True, args_hint="[subcommand]",
subcommands=("list", "add", "create", "edit", "pause", "resume", "run", "remove")),
CommandDef("curator", "Background skill maintenance (status, run, pin, archive)",
CommandDef("kanban", "Multi-profile collaboration board (tasks, links, comments)",
"Tools & Skills", args_hint="[subcommand]",
subcommands=("status", "run", "pause", "resume", "pin", "unpin", "restore")),
subcommands=("list", "ls", "show", "create", "assign", "link", "unlink",
"claim", "comment", "complete", "block", "unblock", "archive",
"tail", "dispatch", "context", "init", "gc")),
CommandDef("reload", "Reload .env variables into the running session", "Tools & Skills",
cli_only=True),
CommandDef("reload-mcp", "Reload MCP servers from config", "Tools & Skills",
@@ -817,114 +811,6 @@ def discord_skill_commands_by_category(
return trimmed_categories, uncategorized, hidden
# ---------------------------------------------------------------------------
# Slack native slash commands
# ---------------------------------------------------------------------------
# Slack slash command name constraints: lowercase a-z, 0-9, hyphens,
# underscores. Max 32 chars. Slack app manifest accepts up to 50 slash
# commands per app.
_SLACK_MAX_SLASH_COMMANDS = 50
_SLACK_NAME_LIMIT = 32
_SLACK_INVALID_CHARS = re.compile(r"[^a-z0-9_\-]")
def _sanitize_slack_name(raw: str) -> str:
"""Convert a command name to a valid Slack slash command name.
Slack allows lowercase a-z, digits, hyphens, and underscores. Max 32
chars. Uppercase is lowercased; invalid chars are stripped.
"""
name = raw.lower()
name = _SLACK_INVALID_CHARS.sub("", name)
name = name.strip("-_")
return name[:_SLACK_NAME_LIMIT]
def slack_native_slashes() -> list[tuple[str, str, str]]:
"""Return (slash_name, description, usage_hint) triples for Slack.
Every gateway-available command in ``COMMAND_REGISTRY`` is surfaced as
a standalone Slack slash command (e.g. ``/btw``, ``/stop``, ``/model``),
matching Discord's and Telegram's model where every command is a
first-class slash and not a ``/hermes <verb>`` subcommand.
Both canonical names and aliases are included so users can type any
documented form (e.g. ``/background``, ``/bg``, and ``/btw`` all work).
Plugin-registered slash commands are included too.
Results are clamped to Slack's 50-command limit with duplicate-name
avoidance. ``/hermes`` is always reserved as the first entry so the
legacy ``/hermes <subcommand>`` form keeps working for anything that
gets dropped by the clamp or for free-form questions.
"""
overrides = _resolve_config_gates()
entries: list[tuple[str, str, str]] = []
seen: set[str] = set()
# Reserve /hermes as the catch-all top-level command.
entries.append(("hermes", "Talk to Hermes or run a subcommand", "[subcommand] [args]"))
seen.add("hermes")
def _add(name: str, desc: str, hint: str) -> None:
slack_name = _sanitize_slack_name(name)
if not slack_name or slack_name in seen:
return
if len(entries) >= _SLACK_MAX_SLASH_COMMANDS:
return
# Slack description cap is 2000 chars; keep it short.
entries.append((slack_name, desc[:140], hint[:100]))
seen.add(slack_name)
# First pass: canonical names (so they win slots if we hit the cap).
for cmd in COMMAND_REGISTRY:
if not _is_gateway_available(cmd, overrides):
continue
_add(cmd.name, cmd.description, cmd.args_hint or "")
# Second pass: aliases.
for cmd in COMMAND_REGISTRY:
if not _is_gateway_available(cmd, overrides):
continue
for alias in cmd.aliases:
# Skip aliases that only differ from canonical by case/punctuation
# normalization (already covered by _add dedup).
_add(alias, f"Alias for /{cmd.name}{cmd.description}", cmd.args_hint or "")
# Third pass: plugin commands.
for name, description, args_hint in _iter_plugin_command_entries():
_add(name, description, args_hint or "")
return entries
def slack_app_manifest(request_url: str = "https://hermes-agent.local/slack/commands") -> dict[str, Any]:
"""Generate a Slack app manifest with all gateway commands as slashes.
``request_url`` is required by Slack's manifest schema for every slash
command, but in Socket Mode (which we use) Slack ignores it and routes
the command event through the WebSocket. A placeholder URL is fine.
The returned dict is the ``features.slash_commands`` portion only
callers compose it into a full manifest (or merge into an existing
one). Keeping it narrow avoids coupling us to the rest of the manifest
schema (display_information, oauth_config, settings, etc.) which users
set up once in the Slack UI and rarely change.
"""
slashes = []
for name, desc, usage in slack_native_slashes():
entry = {
"command": f"/{name}",
"description": desc or f"Run /{name}",
"should_escape": False,
"url": request_url,
}
if usage:
entry["usage_hint"] = usage
slashes.append(entry)
return {"features": {"slash_commands": slashes}}
def slack_subcommand_map() -> dict[str, str]:
"""Return subcommand -> /command mapping for Slack /hermes handler.
@@ -952,42 +838,6 @@ def slack_subcommand_map() -> dict[str, str]:
# Autocomplete
# ---------------------------------------------------------------------------
# Per-process cache for /model<space> LM Studio autocomplete. Probing on
# every keystroke would block the UI; a short TTL keeps it live without
# hammering the server.
_LMSTUDIO_COMPLETION_CACHE: tuple[float, list[str]] | None = None
def _lmstudio_completion_models() -> list[str]:
"""Locally-loaded LM Studio models for /model autocomplete (cached, gated)."""
global _LMSTUDIO_COMPLETION_CACHE
# Gate: don't probe 127.0.0.1 on every keystroke for users who don't use LM Studio.
if not (os.environ.get("LM_API_KEY") or os.environ.get("LM_BASE_URL")):
try:
from hermes_cli.auth import _load_auth_store
store = _load_auth_store() or {}
if "lmstudio" not in (store.get("providers") or {}) \
and "lmstudio" not in (store.get("credential_pool") or {}):
return []
except Exception:
return []
now = time.time()
if _LMSTUDIO_COMPLETION_CACHE and (now - _LMSTUDIO_COMPLETION_CACHE[0]) < 30.0:
return _LMSTUDIO_COMPLETION_CACHE[1]
try:
from hermes_cli.models import fetch_lmstudio_models
models = fetch_lmstudio_models(
api_key=os.environ.get("LM_API_KEY", ""),
base_url=os.environ.get("LM_BASE_URL") or "http://127.0.0.1:1234/v1",
timeout=0.8,
)
except Exception:
models = []
_LMSTUDIO_COMPLETION_CACHE = (now, models)
return models
class SlashCommandCompleter(Completer):
"""Autocomplete for built-in slash commands, subcommands, and skill commands."""
@@ -1411,19 +1261,6 @@ class SlashCommandCompleter(Completer):
)
except Exception:
pass
# LM Studio: surface locally-loaded models. Gated on the user actually
# having LM Studio configured (env var or auth-store entry) so we
# don't probe 127.0.0.1 on every keystroke for users who don't use it.
for name in _lmstudio_completion_models():
if name in seen:
continue
if name.startswith(sub_lower) and name != sub_lower:
yield Completion(
name,
start_position=-len(sub_text),
display=name,
display_meta="LM Studio",
)
def get_completions(self, document, complete_event):
text = document.text_before_cursor
+43 -371
View File
@@ -30,67 +30,34 @@ logger = logging.getLogger(__name__)
_IS_WINDOWS = platform.system() == "Windows"
_ENV_VAR_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_LAST_EXPANDED_CONFIG_BY_PATH: Dict[str, Any] = {}
# (path, mtime_ns, size) -> cached expanded config dict.
# load_config() returns a deepcopy of the cached value when the file
# hasn't changed since the last load, skipping yaml.safe_load +
# _deep_merge + _normalize_* + _expand_env_vars (~13 ms/call).
# save_config() + migrate_config() write via atomic_yaml_write which
# produces a fresh inode, so stat() sees a new mtime_ns and the next
# load repopulates automatically — no explicit invalidation hook.
_LOAD_CONFIG_CACHE: Dict[str, Tuple[int, int, Dict[str, Any]]] = {}
# (path, mtime_ns, size) -> cached raw yaml dict. Same pattern as
# _LOAD_CONFIG_CACHE but for read_raw_config() — used when callers want
# the user's on-disk values without defaults merged in.
_RAW_CONFIG_CACHE: Dict[str, Tuple[int, int, Dict[str, Any]]] = {}
# Env var names written to .env that aren't in OPTIONAL_ENV_VARS
# (managed by setup/provider flows directly).
_EXTRA_ENV_KEYS = frozenset({
"OPENAI_API_KEY", "OPENAI_BASE_URL",
"ANTHROPIC_API_KEY", "ANTHROPIC_TOKEN",
"DISCORD_HOME_CHANNEL", "DISCORD_HOME_CHANNEL_NAME",
"TELEGRAM_HOME_CHANNEL", "TELEGRAM_HOME_CHANNEL_NAME",
"SLACK_HOME_CHANNEL", "SLACK_HOME_CHANNEL_NAME",
"DISCORD_HOME_CHANNEL", "TELEGRAM_HOME_CHANNEL",
"SIGNAL_ACCOUNT", "SIGNAL_HTTP_URL",
"SIGNAL_ALLOWED_USERS", "SIGNAL_GROUP_ALLOWED_USERS",
"SIGNAL_HOME_CHANNEL", "SIGNAL_HOME_CHANNEL_NAME",
"SMS_HOME_CHANNEL", "SMS_HOME_CHANNEL_NAME",
"DINGTALK_CLIENT_ID", "DINGTALK_CLIENT_SECRET",
"DINGTALK_HOME_CHANNEL", "DINGTALK_HOME_CHANNEL_NAME",
"FEISHU_APP_ID", "FEISHU_APP_SECRET", "FEISHU_ENCRYPT_KEY", "FEISHU_VERIFICATION_TOKEN",
"FEISHU_HOME_CHANNEL", "FEISHU_HOME_CHANNEL_NAME",
"YUANBAO_HOME_CHANNEL", "YUANBAO_HOME_CHANNEL_NAME",
"WECOM_BOT_ID", "WECOM_SECRET",
"WECOM_CALLBACK_CORP_ID", "WECOM_CALLBACK_CORP_SECRET", "WECOM_CALLBACK_AGENT_ID",
"WECOM_CALLBACK_TOKEN", "WECOM_CALLBACK_ENCODING_AES_KEY",
"WECOM_CALLBACK_HOST", "WECOM_CALLBACK_PORT",
"WECOM_HOME_CHANNEL", "WECOM_HOME_CHANNEL_NAME",
"WEIXIN_ACCOUNT_ID", "WEIXIN_TOKEN", "WEIXIN_BASE_URL", "WEIXIN_CDN_BASE_URL",
"WEIXIN_HOME_CHANNEL", "WEIXIN_HOME_CHANNEL_NAME", "WEIXIN_DM_POLICY", "WEIXIN_GROUP_POLICY",
"WEIXIN_ALLOWED_USERS", "WEIXIN_GROUP_ALLOWED_USERS", "WEIXIN_ALLOW_ALL_USERS",
"BLUEBUBBLES_SERVER_URL", "BLUEBUBBLES_PASSWORD",
"BLUEBUBBLES_HOME_CHANNEL", "BLUEBUBBLES_HOME_CHANNEL_NAME",
"QQ_APP_ID", "QQ_CLIENT_SECRET", "QQBOT_HOME_CHANNEL", "QQBOT_HOME_CHANNEL_NAME",
"QQ_HOME_CHANNEL", "QQ_HOME_CHANNEL_NAME", # legacy aliases (pre-rename, still read for back-compat)
"QQ_ALLOWED_USERS", "QQ_GROUP_ALLOWED_USERS", "QQ_ALLOW_ALL_USERS", "QQ_MARKDOWN_SUPPORT",
"QQ_STT_API_KEY", "QQ_STT_BASE_URL", "QQ_STT_MODEL",
"TERMINAL_ENV", "TERMINAL_SSH_KEY", "TERMINAL_SSH_PORT",
"WHATSAPP_MODE", "WHATSAPP_ENABLED",
"MATTERMOST_HOME_CHANNEL", "MATTERMOST_HOME_CHANNEL_NAME", "MATTERMOST_REPLY_MODE",
"MATTERMOST_HOME_CHANNEL", "MATTERMOST_REPLY_MODE",
"MATRIX_PASSWORD", "MATRIX_ENCRYPTION", "MATRIX_DEVICE_ID", "MATRIX_HOME_ROOM",
"MATRIX_REQUIRE_MENTION", "MATRIX_FREE_RESPONSE_ROOMS", "MATRIX_AUTO_THREAD", "MATRIX_DM_AUTO_THREAD",
"MATRIX_REQUIRE_MENTION", "MATRIX_FREE_RESPONSE_ROOMS", "MATRIX_AUTO_THREAD",
"MATRIX_RECOVERY_KEY",
# Langfuse observability plugin — optional tuning keys + standard SDK vars.
# Activation is via plugins.enabled (opt-in through `hermes plugins enable
# observability/langfuse` or `hermes tools → Langfuse`); credentials gate
# the plugin at runtime.
"HERMES_LANGFUSE_ENV",
"HERMES_LANGFUSE_RELEASE",
"HERMES_LANGFUSE_SAMPLE_RATE",
"HERMES_LANGFUSE_MAX_CHARS",
"HERMES_LANGFUSE_DEBUG",
"LANGFUSE_PUBLIC_KEY",
"LANGFUSE_SECRET_KEY",
"LANGFUSE_BASE_URL",
})
import yaml
@@ -239,7 +206,6 @@ def get_container_exec_info() -> Optional[dict]:
# Re-export from hermes_constants — canonical definition lives there.
from hermes_constants import get_hermes_home # noqa: F811,E402
from utils import atomic_replace
def get_config_path() -> Path:
"""Get the main config file path."""
@@ -423,34 +389,6 @@ DEFAULT_CONFIG = {
# (60+ tool iterations with tiny output) before users assume the
# bot is dead and /restart.
"gateway_notify_interval": 180,
# Freshness window for the gateway auto-continue note (seconds).
# After a gateway crash/restart/SIGTERM mid-run, the next user
# message gets a "[System note: your previous turn was
# interrupted — process the unfinished tool result(s) first]"
# prepended so the model picks up where it left off. That's the
# right behaviour while the interruption is fresh, but stale
# markers (transcript last touched hours or days ago) can revive
# an unrelated old task when the user's next message starts new
# work. This window is the max age of the last persisted
# transcript row for which we still inject the continue note.
# Default 3600s comfortably covers a long turn (gateway_timeout
# default is 1800s) plus runtime slack. Set to 0 to disable the
# gate and restore pre-fix behaviour (always inject).
"gateway_auto_continue_freshness": 3600,
# How user-attached images are presented to the main model on each turn.
# "auto" — attach natively when the active model reports
# supports_vision=True AND the user hasn't explicitly
# configured auxiliary.vision.provider. Otherwise fall
# back to text (vision_analyze pre-analysis).
# "native" — always attach natively; non-vision models will either
# error at the provider or get a last-chance text fallback
# (see run_agent._prepare_messages_for_api).
# "text" — always pre-analyze with vision_analyze and prepend the
# description as text; the main model never sees pixels.
# Affects gateway platforms, the TUI, and CLI /attach. vision_analyze
# remains available as a tool regardless of this setting — the routing
# only controls how inbound user images are presented.
"image_input_mode": "auto",
},
"terminal": {
@@ -527,7 +465,6 @@ DEFAULT_CONFIG = {
"command_timeout": 30, # Timeout for browser commands in seconds (screenshot, navigate, etc.)
"record_sessions": False, # Auto-record browser sessions as WebM videos
"allow_private_urls": False, # Allow navigating to private/internal IPs (localhost, 192.168.x.x, etc.)
"auto_local_for_private_urls": True, # When a cloud provider is set, auto-spawn local Chromium for LAN/localhost URLs instead of sending them to the cloud
"cdp_url": "", # Optional persistent CDP endpoint for attaching to an existing Chromium/Chrome
# CDP supervisor — dialog + frame detection via a persistent WebSocket.
# Active only when a CDP-capable backend is attached (Browserbase or
@@ -549,19 +486,6 @@ DEFAULT_CONFIG = {
"checkpoints": {
"enabled": True,
"max_snapshots": 50, # Max checkpoints to keep per directory
# Auto-maintenance: shadow repos accumulate forever under
# ~/.hermes/checkpoints/ (one per cd'd working directory). Field
# reports put the typical offender at 1000+ repos / ~12 GB. When
# auto_prune is on, hermes sweeps at startup (at most once per
# min_interval_hours) and deletes:
# * orphan repos: HERMES_WORKDIR no longer exists on disk
# * stale repos: newest mtime older than retention_days
# Opt-in so users who rely on /rollback against long-ago sessions
# never lose data silently.
"auto_prune": False,
"retention_days": 7,
"delete_orphans": True,
"min_interval_hours": 24,
},
# Maximum characters returned by a single read_file call. Reads that
@@ -594,7 +518,7 @@ DEFAULT_CONFIG = {
"threshold": 0.50, # compress when context usage exceeds this ratio
"target_ratio": 0.20, # fraction of threshold to preserve as recent tail
"protect_last_n": 20, # minimum recent messages to keep uncompressed
"hygiene_hard_message_limit": 400, # gateway session-hygiene force-compress threshold by message count
},
# Anthropic prompt caching (Claude via OpenRouter or native Anthropic API).
@@ -702,12 +626,7 @@ DEFAULT_CONFIG = {
"compact": False,
"personality": "kawaii",
"resume_display": "full",
"busy_input_mode": "interrupt", # interrupt | queue | steer
# When true, `hermes --tui` auto-resumes the most recent human-
# facing session on launch instead of forging a fresh one.
# Mirrors `hermes -c` muscle memory. Default off so existing
# users aren't surprised. HERMES_TUI_RESUME=<id> always wins.
"tui_auto_resume_recent": False,
"busy_input_mode": "interrupt",
"bell_on_complete": False,
"show_reasoning": False,
"streaming": False,
@@ -715,9 +634,6 @@ DEFAULT_CONFIG = {
"inline_diffs": True, # Show inline diff previews for write actions (write_file, patch, skill_manage)
"show_cost": False, # Show $ cost in the status bar (off by default)
"skin": "default",
# TUI busy indicator style: kaomoji (default), emoji, unicode (braille
# spinner), or ascii. Live-swappable via `/indicator <style>`.
"tui_status_indicator": "kaomoji",
"user_message_preview": { # CLI: how many submitted user-message lines to echo back in scrollback
"first_lines": 2,
"last_lines": 2,
@@ -727,14 +643,6 @@ DEFAULT_CONFIG = {
"tool_progress_overrides": {}, # DEPRECATED — use display.platforms instead
"tool_preview_length": 0, # Max chars for tool call previews (0 = no limit, show full paths/commands)
"platforms": {}, # Per-platform display overrides: {"telegram": {"tool_progress": "all"}, "slack": {"tool_progress": "off"}}
# Gateway runtime-metadata footer appended to the FINAL message of a turn
# (disabled by default to keep replies minimal). When enabled, renders
# e.g. `model · 68% · ~/projects/hermes`. Per-platform overrides go under
# display.platforms.<platform>.runtime_footer.
"runtime_footer": {
"enabled": False,
"fields": ["model", "context_pct", "cwd"], # Order shown; drop any to hide
},
},
# Web dashboard settings
@@ -915,35 +823,6 @@ DEFAULT_CONFIG = {
"guard_agent_created": False,
},
# Curator — background skill maintenance.
#
# Periodically reviews AGENT-CREATED skills (never bundled or
# hub-installed) and keeps the collection tidy: marks long-unused skills
# as stale, archives genuinely obsolete ones (archive only, never
# deletes), and spawns a forked aux-model agent to consolidate overlaps
# and patch drift. Runs inactivity-triggered from session start — no
# cron daemon.
#
# See `hermes curator status` for the last run summary.
"curator": {
"enabled": True,
# How long to wait between curator runs (hours). Default: 7 days.
"interval_hours": 24 * 7,
# Only run when the agent has been idle at least this long (hours).
"min_idle_hours": 2,
# Mark a skill as "stale" after this many days without use.
"stale_after_days": 30,
# Archive a skill (move to skills/.archive/) after this many days
# without use. Archived skills are recoverable — no auto-deletion.
"archive_after_days": 90,
# Optional per-task override for the curator's aux model. Leave null
# to use Hermes' main auxiliary client resolution.
"auxiliary": {
"provider": None,
"model": None,
},
},
# Honcho AI-native memory -- reads ~/.honcho/config.json as single source of truth.
# This section is only needed for hermes-specific overrides; everything else
# (apiKey, workspace, peerName, sessions, enabled) comes from the global config.
@@ -981,7 +860,6 @@ DEFAULT_CONFIG = {
# Telegram platform settings (gateway mode)
"telegram": {
"reactions": False, # Add 👀/✅/❌ reactions to messages during processing
"channel_prompts": {}, # Per-chat/topic ephemeral system prompts (topics inherit from parent group)
},
@@ -1036,7 +914,7 @@ DEFAULT_CONFIG = {
# Pre-exec security scanning via tirith
"security": {
"allow_private_urls": False, # Allow requests to private/internal IPs (for OpenWrt, proxies, VPNs)
"redact_secrets": False,
"redact_secrets": True,
"tirith_enabled": True,
"tirith_path": "tirith",
"tirith_timeout": 5,
@@ -1145,20 +1023,6 @@ DEFAULT_CONFIG = {
"seen": {},
},
# ``hermes update`` behaviour.
"updates": {
# Run a full ``hermes backup``-style zip of HERMES_HOME before every
# ``hermes update``. Backups land in ``<HERMES_HOME>/backups/`` and
# can be restored with ``hermes import <path>``. Off by default —
# on large HERMES_HOME directories the zip can add minutes to every
# update. Set to true to re-enable, or pass ``--backup`` to opt in
# for a single update run.
"pre_update_backup": False,
# How many pre-update backup zips to retain. Older ones are pruned
# automatically after each successful backup.
"backup_keep": 5,
},
# Config schema version - bump this when adding new required fields
"_config_version": 22,
}
@@ -1260,22 +1124,6 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"LM_API_KEY": {
"description": "LM Studio bearer token for auth-enabled local servers",
"prompt": "LM Studio API key / bearer token",
"url": None,
"password": True,
"category": "provider",
"advanced": True,
},
"LM_BASE_URL": {
"description": "LM Studio base URL override",
"prompt": "LM Studio base URL (leave empty for default)",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
"GLM_API_KEY": {
"description": "Z.AI / GLM API key (also recognized as ZAI_API_KEY / Z_AI_API_KEY)",
"prompt": "Z.AI / GLM API key",
@@ -1364,22 +1212,6 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"GMI_API_KEY": {
"description": "GMI Cloud API key",
"prompt": "GMI Cloud API key",
"url": "https://www.gmicloud.ai/",
"password": True,
"category": "provider",
"advanced": True,
},
"GMI_BASE_URL": {
"description": "GMI Cloud base URL override",
"prompt": "GMI Cloud base URL (leave empty for default)",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
"MINIMAX_API_KEY": {
"description": "MiniMax API key (international)",
"prompt": "MiniMax API key",
@@ -1749,44 +1581,6 @@ OPTIONAL_ENV_VARS = {
"category": "tool",
},
# ── Bundled skills (opt-in: only needed if the user uses that skill) ──
# These use category="skill" (distinct from "tool") so the sandbox
# env blocklist in tools/environments/local.py does NOT rewrite them —
# skills legitimately need these passed through to curl via
# tools/env_passthrough.py when the user's skill calls out.
"NOTION_API_KEY": {
"description": "Notion integration token (used by the `notion` skill)",
"prompt": "Notion API key",
"url": "https://www.notion.so/my-integrations",
"password": True,
"category": "skill",
"advanced": True,
},
"LINEAR_API_KEY": {
"description": "Linear personal API key (used by the `linear` skill)",
"prompt": "Linear API key",
"url": "https://linear.app/settings/api",
"password": True,
"category": "skill",
"advanced": True,
},
"AIRTABLE_API_KEY": {
"description": "Airtable personal access token (used by the `airtable` skill)",
"prompt": "Airtable API key",
"url": "https://airtable.com/create/tokens",
"password": True,
"category": "skill",
"advanced": True,
},
"TENOR_API_KEY": {
"description": "Tenor API key for GIF search (used by the `gif-search` skill)",
"prompt": "Tenor API key",
"url": "https://developers.google.com/tenor/guides/quickstart",
"password": True,
"category": "skill",
"advanced": True,
},
# ── Honcho ──
"HONCHO_API_KEY": {
"description": "Honcho API key for AI-native persistent memory",
@@ -1802,30 +1596,6 @@ OPTIONAL_ENV_VARS = {
"category": "tool",
},
# ── Langfuse observability ──
"HERMES_LANGFUSE_PUBLIC_KEY": {
"description": "Langfuse project public key (pk-lf-...)",
"prompt": "Langfuse public key",
"url": "https://cloud.langfuse.com",
"password": False,
"category": "tool",
},
"HERMES_LANGFUSE_SECRET_KEY": {
"description": "Langfuse project secret key (sk-lf-...)",
"prompt": "Langfuse secret key",
"url": "https://cloud.langfuse.com",
"password": True,
"category": "tool",
},
"HERMES_LANGFUSE_BASE_URL": {
"description": "Langfuse server URL (default: https://cloud.langfuse.com)",
"prompt": "Langfuse server URL (leave empty for cloud.langfuse.com)",
"url": None,
"password": False,
"category": "tool",
"advanced": True,
},
# ── Messaging platforms ──
"TELEGRAM_BOT_TOKEN": {
"description": "Telegram bot token from @BotFather",
@@ -1973,14 +1743,6 @@ OPTIONAL_ENV_VARS = {
"category": "messaging",
"advanced": True,
},
"MATRIX_DM_AUTO_THREAD": {
"description": "Auto-create threads for DM messages in Matrix (default: false)",
"prompt": "Auto-create threads in DMs (true/false)",
"url": None,
"password": False,
"category": "messaging",
"advanced": True,
},
"MATRIX_DEVICE_ID": {
"description": "Stable Matrix device ID for E2EE persistence across restarts (e.g. HERMES_BOT)",
"prompt": "Matrix device ID (stable across restarts)",
@@ -2322,21 +2084,14 @@ def _normalize_custom_provider_entry(
"baseUrl": "base_url",
"apiMode": "api_mode",
"keyEnv": "key_env",
"apiKeyEnv": "key_env", # alias — OpenClaw-compatible + docs variant
"defaultModel": "default_model",
"contextLength": "context_length",
"rateLimitDelay": "rate_limit_delay",
}
# api_key_env is a documented snake_case alias for key_env (see
# website/docs/guides/azure-foundry.md). Normalize it up front so the
# rest of the normalizer treats it as the canonical field.
if "api_key_env" in entry and "key_env" not in entry:
entry["key_env"] = entry["api_key_env"]
_KNOWN_KEYS = {
"name", "api", "url", "base_url", "api_key", "key_env", "api_key_env",
"name", "api", "url", "base_url", "api_key", "key_env",
"api_mode", "transport", "model", "default_model", "models",
"context_length", "rate_limit_delay",
"request_timeout_seconds", "stale_timeout_seconds",
}
for camel, snake in _CAMEL_ALIASES.items():
if camel in entry and snake not in entry:
@@ -2588,9 +2343,6 @@ _KNOWN_ROOT_KEYS = {
_VALID_CUSTOM_PROVIDER_FIELDS = {
"name", "base_url", "api_key", "api_mode", "model", "models",
"context_length", "rate_limit_delay",
# key_env is read at runtime by runtime_provider.py and auxiliary_client.py
# — include it here so the set accurately describes the supported schema.
"key_env",
}
# Fields that look like they should be inside custom_providers, not at root
@@ -2667,32 +2419,10 @@ def validate_config_structure(config: Optional[Dict[str, Any]] = None) -> List["
"Add the API endpoint URL, e.g.: base_url: https://api.example.com/v1",
))
# ── fallback_model: single dict OR list of dicts (chain) ─────────────
# ── fallback_model must be a top-level dict with provider + model ────
fb = config.get("fallback_model")
if fb is not None:
if isinstance(fb, list):
# Chain fallback — validate each entry
for i, entry in enumerate(fb):
if not isinstance(entry, dict):
issues.append(ConfigIssue(
"error",
f"fallback_model[{i}] should be a dict, got {type(entry).__name__}",
"Each entry needs provider + model",
))
else:
if not entry.get("provider"):
issues.append(ConfigIssue(
"warning",
f"fallback_model[{i}] is missing 'provider' field",
"Add: provider: openrouter (or another provider)",
))
if not entry.get("model"):
issues.append(ConfigIssue(
"warning",
f"fallback_model[{i}] is missing 'model' field",
"Add: model: <model-name>",
))
elif not isinstance(fb, dict):
if not isinstance(fb, dict):
issues.append(ConfigIssue(
"error",
f"fallback_model should be a dict with 'provider' and 'model', got {type(fb).__name__}",
@@ -3485,62 +3215,25 @@ def read_raw_config() -> Dict[str, Any]:
be parsed. Use this for lightweight config reads where you just need a
single value and don't want the overhead of ``load_config()``'s deep-merge
+ migration pipeline.
Cached on the config file's (mtime_ns, size) — same strategy as
``load_config()``. Returns a deepcopy on every call since some callers
mutate the result before passing to ``save_config()``.
"""
try:
config_path = get_config_path()
st = config_path.stat()
cache_key = (st.st_mtime_ns, st.st_size)
except (FileNotFoundError, OSError):
return {}
path_key = str(config_path)
cached = _RAW_CONFIG_CACHE.get(path_key)
if cached is not None and cached[:2] == cache_key:
return copy.deepcopy(cached[2])
try:
with open(config_path, encoding="utf-8") as f:
data = yaml.safe_load(f) or {}
if config_path.exists():
with open(config_path, encoding="utf-8") as f:
return yaml.safe_load(f) or {}
except Exception:
return {}
if not isinstance(data, dict):
data = {}
_RAW_CONFIG_CACHE[path_key] = (cache_key[0], cache_key[1], copy.deepcopy(data))
return data
pass
return {}
def load_config() -> Dict[str, Any]:
"""Load configuration from ~/.hermes/config.yaml.
Cached on the config file's (mtime_ns, size). Returns a deepcopy of
the cached value when unchanged, since most call sites mutate the
result (e.g. ``cfg["model"]["default"] = ...`` before ``save_config``).
The cache is keyed on ``str(config_path)`` so profile switches
(which change ``HERMES_HOME`` and therefore ``get_config_path()``)
don't collide.
"""
"""Load configuration from ~/.hermes/config.yaml."""
ensure_hermes_home()
config_path = get_config_path()
path_key = str(config_path)
try:
st = config_path.stat()
cache_key: Optional[Tuple[int, int]] = (st.st_mtime_ns, st.st_size)
except FileNotFoundError:
cache_key = None
cached = _LOAD_CONFIG_CACHE.get(path_key)
if cached is not None and cache_key is not None and cached[:2] == cache_key:
return copy.deepcopy(cached[2])
config = copy.deepcopy(DEFAULT_CONFIG)
if cache_key is not None:
if config_path.exists():
try:
with open(config_path, encoding="utf-8") as f:
user_config = yaml.safe_load(f) or {}
@@ -3558,26 +3251,20 @@ def load_config() -> Dict[str, Any]:
normalized = _normalize_root_model_keys(_normalize_max_turns_config(config))
expanded = _expand_env_vars(normalized)
_LAST_EXPANDED_CONFIG_BY_PATH[path_key] = copy.deepcopy(expanded)
if cache_key is not None:
_LOAD_CONFIG_CACHE[path_key] = (cache_key[0], cache_key[1], copy.deepcopy(expanded))
else:
_LOAD_CONFIG_CACHE.pop(path_key, None)
_LAST_EXPANDED_CONFIG_BY_PATH[str(config_path)] = copy.deepcopy(expanded)
return expanded
_SECURITY_COMMENT = """
# ── Security ──────────────────────────────────────────────────────────
# Secret redaction is OFF by default — tool output (terminal stdout,
# read_file results, web content) passes through unmodified. Set
# redact_secrets to true to mask strings that look like API keys, tokens,
# and passwords before they enter the model context and logs.
# API keys, tokens, and passwords are redacted from tool output by default.
# Set to false to see full values (useful for debugging auth issues).
# tirith pre-exec scanning is enabled by default when the tirith binary
# is available. Configure via security.tirith_* keys or env vars
# (TIRITH_ENABLED, TIRITH_BIN, TIRITH_TIMEOUT, TIRITH_FAIL_OPEN).
#
# security:
# redact_secrets: true
# redact_secrets: false
# tirith_enabled: true
# tirith_path: "tirith"
# tirith_timeout: 5
@@ -3610,11 +3297,11 @@ _FALLBACK_COMMENT = """
_COMMENTED_SECTIONS = """
# ── Security ──────────────────────────────────────────────────────────
# Secret redaction is OFF by default. Set to true to mask strings that
# look like API keys, tokens, and passwords in tool output and logs.
# API keys, tokens, and passwords are redacted from tool output by default.
# Set to false to see full values (useful for debugging auth issues).
#
# security:
# redact_secrets: true
# redact_secrets: false
# ── Fallback Model ────────────────────────────────────────────────────
# Automatic provider failover when primary is unavailable.
@@ -3665,12 +3352,7 @@ def save_config(config: Dict[str, Any]):
if not sec or sec.get("redact_secrets") is None:
parts.append(_SECURITY_COMMENT)
fb = normalized.get("fallback_model", {})
fb_is_valid = False
if isinstance(fb, list):
fb_is_valid = any(isinstance(e, dict) and e.get("provider") and e.get("model") for e in fb)
elif isinstance(fb, dict):
fb_is_valid = bool(fb.get("provider") and fb.get("model"))
if not fb_is_valid:
if not fb or not isinstance(fb, dict) or not (fb.get("provider") and fb.get("model")):
parts.append(_FALLBACK_COMMENT)
atomic_yaml_write(
@@ -3739,27 +3421,18 @@ def _sanitize_env_lines(lines: list) -> list:
# Detect concatenated KEY=VALUE pairs on one line.
# Search for known KEY= patterns at any position in the line.
# We collect full needle ranges so we can drop matches that are
# fully contained within a longer overlapping needle. Without this,
# suffix collisions corrupt the file: e.g. LM_API_KEY= inside
# GLM_API_KEY= would otherwise split the line into "G\nLM_API_KEY=...".
match_ranges: list[tuple[int, int]] = []
split_positions = []
for key_name in known_keys:
needle = key_name + "="
idx = stripped.find(needle)
while idx >= 0:
match_ranges.append((idx, idx + len(needle)))
split_positions.append(idx)
idx = stripped.find(needle, idx + len(needle))
split_positions = sorted({
s for s, e in match_ranges
if not any(
s2 <= s and e2 >= e and (s2, e2) != (s, e)
for s2, e2 in match_ranges
)
})
if len(split_positions) > 1:
split_positions.sort()
# Deduplicate (shouldn't happen, but be safe)
split_positions = sorted(set(split_positions))
for i, pos in enumerate(split_positions):
end = split_positions[i + 1] if i + 1 < len(split_positions) else len(stripped)
part = stripped[pos:end].strip()
@@ -3805,7 +3478,7 @@ def sanitize_env_file() -> int:
f.writelines(sanitized)
f.flush()
os.fsync(f.fileno())
atomic_replace(tmp_path, env_path)
os.replace(tmp_path, env_path)
except BaseException:
try:
os.unlink(tmp_path)
@@ -3868,7 +3541,7 @@ def save_env_value(key: str, value: str):
value = _check_non_ascii_credential(key, value)
ensure_hermes_home()
env_path = get_env_path()
# On Windows, open() defaults to the system locale (cp1252) which can
# cause OSError errno 22 on UTF-8 .env files.
read_kw = {"encoding": "utf-8", "errors": "replace"} if _IS_WINDOWS else {}
@@ -3880,7 +3553,7 @@ def save_env_value(key: str, value: str):
lines = f.readlines()
# Sanitize on every read: split concatenated keys, drop stale placeholders
lines = _sanitize_env_lines(lines)
# Find and update or append
found = False
for i, line in enumerate(lines):
@@ -3888,7 +3561,7 @@ def save_env_value(key: str, value: str):
lines[i] = f"{key}={value}\n"
found = True
break
if not found:
# Ensure there's a newline at the end of the file before appending
if lines and not lines[-1].endswith("\n"):
@@ -3908,7 +3581,7 @@ def save_env_value(key: str, value: str):
f.writelines(lines)
f.flush()
os.fsync(f.fileno())
atomic_replace(tmp_path, env_path)
os.replace(tmp_path, env_path)
# Restore original permissions before _secure_file may tighten them.
if original_mode is not None:
try:
@@ -3964,7 +3637,7 @@ def remove_env_value(key: str) -> bool:
f.writelines(new_lines)
f.flush()
os.fsync(f.fileno())
atomic_replace(tmp_path, env_path)
os.replace(tmp_path, env_path)
if original_mode is not None:
try:
os.chmod(env_path, original_mode)
@@ -4051,13 +3724,12 @@ def get_env_value(key: str) -> Optional[str]:
# =============================================================================
def redact_key(key: str) -> str:
"""Redact an API key for display.
Thin wrapper over :func:`agent.redact.mask_secret` preserves the
"(not set)" placeholder in dim color for the empty case.
"""
from agent.redact import mask_secret
return mask_secret(key, empty=color("(not set)", Colors.DIM))
"""Redact an API key for display."""
if not key:
return color("(not set)", Colors.DIM)
if len(key) < 12:
return "***"
return key[:4] + "..." + key[-4:]
def show_config():
-232
View File
@@ -1,232 +0,0 @@
"""CLI subcommand: `hermes curator <subcommand>`.
Thin shell around agent/curator.py and tools/skill_usage.py. Renders a status
table, triggers a run, pauses/resumes, and pins/unpins skills.
This module intentionally has no side effects at import time main.py wires
the argparse subparsers on demand.
"""
from __future__ import annotations
import argparse
import sys
from datetime import datetime, timezone
from typing import Optional
def _fmt_ts(ts: Optional[str]) -> str:
if not ts:
return "never"
try:
dt = datetime.fromisoformat(ts)
except (TypeError, ValueError):
return str(ts)
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
delta = datetime.now(timezone.utc) - dt
secs = int(delta.total_seconds())
if secs < 60:
return f"{secs}s ago"
if secs < 3600:
return f"{secs // 60}m ago"
if secs < 86400:
return f"{secs // 3600}h ago"
return f"{secs // 86400}d ago"
def _cmd_status(args) -> int:
from agent import curator
from tools import skill_usage
state = curator.load_state()
enabled = curator.is_enabled()
paused = state.get("paused", False)
last_run = state.get("last_run_at")
summary = state.get("last_run_summary") or "(none)"
runs = state.get("run_count", 0)
status_line = (
"ENABLED" if enabled and not paused else
"PAUSED" if paused else
"DISABLED"
)
print(f"curator: {status_line}")
print(f" runs: {runs}")
print(f" last run: {_fmt_ts(last_run)}")
print(f" last summary: {summary}")
_ih = curator.get_interval_hours()
_interval_label = (
f"{_ih // 24}d" if _ih % 24 == 0 and _ih >= 24
else f"{_ih}h"
)
print(f" interval: every {_interval_label}")
print(f" stale after: {curator.get_stale_after_days()}d unused")
print(f" archive after: {curator.get_archive_after_days()}d unused")
rows = skill_usage.agent_created_report()
if not rows:
print("\nno agent-created skills")
return 0
by_state = {"active": [], "stale": [], "archived": []}
pinned = []
for r in rows:
state_name = r.get("state", "active")
by_state.setdefault(state_name, []).append(r)
if r.get("pinned"):
pinned.append(r["name"])
print(f"\nagent-created skills: {len(rows)} total")
for state_name in ("active", "stale", "archived"):
bucket = by_state.get(state_name, [])
print(f" {state_name:10s} {len(bucket)}")
if pinned:
print(f"\npinned ({len(pinned)}): {', '.join(pinned)}")
# Show top 5 least-recently-used active skills
active = sorted(
by_state.get("active", []),
key=lambda r: r.get("last_used_at") or r.get("created_at") or "",
)[:5]
if active:
print("\nleast recently used (top 5):")
for r in active:
last = _fmt_ts(r.get("last_used_at"))
print(f" {r['name']:40s} use={r.get('use_count', 0):3d} last_used={last}")
return 0
def _cmd_run(args) -> int:
from agent import curator
if not curator.is_enabled():
print("curator: disabled via config; enable with `curator.enabled: true`")
return 1
print("curator: running review pass...")
def _on_summary(msg: str) -> None:
print(msg)
result = curator.run_curator_review(
on_summary=_on_summary,
synchronous=bool(args.synchronous),
)
auto = result.get("auto_transitions", {})
if auto:
print(
f"auto: checked={auto.get('checked', 0)} "
f"stale={auto.get('marked_stale', 0)} "
f"archived={auto.get('archived', 0)} "
f"reactivated={auto.get('reactivated', 0)}"
)
if not args.synchronous:
print("llm pass running in background — check `hermes curator status` later")
return 0
def _cmd_pause(args) -> int:
from agent import curator
curator.set_paused(True)
print("curator: paused")
return 0
def _cmd_resume(args) -> int:
from agent import curator
curator.set_paused(False)
print("curator: resumed")
return 0
def _cmd_pin(args) -> int:
from tools import skill_usage
if not skill_usage.is_agent_created(args.skill):
print(
f"curator: '{args.skill}' is bundled or hub-installed — cannot pin "
"(only agent-created skills participate in curation)"
)
return 1
skill_usage.set_pinned(args.skill, True)
print(f"curator: pinned '{args.skill}' (will bypass auto-transitions)")
return 0
def _cmd_unpin(args) -> int:
from tools import skill_usage
if not skill_usage.is_agent_created(args.skill):
print(
f"curator: '{args.skill}' is bundled or hub-installed — "
"there's nothing to unpin (curator only tracks agent-created skills)"
)
return 1
skill_usage.set_pinned(args.skill, False)
print(f"curator: unpinned '{args.skill}'")
return 0
def _cmd_restore(args) -> int:
from tools import skill_usage
ok, msg = skill_usage.restore_skill(args.skill)
print(f"curator: {msg}")
return 0 if ok else 1
# ---------------------------------------------------------------------------
# argparse wiring (called from hermes_cli.main)
# ---------------------------------------------------------------------------
def register_cli(parent: argparse.ArgumentParser) -> None:
"""Attach `curator` subcommands to *parent*.
main.py calls this with the ArgumentParser returned by
``subparsers.add_parser("curator", ...)``.
"""
parent.set_defaults(func=lambda a: (parent.print_help(), 0)[1])
subs = parent.add_subparsers(dest="curator_command")
p_status = subs.add_parser("status", help="Show curator status and skill stats")
p_status.set_defaults(func=_cmd_status)
p_run = subs.add_parser("run", help="Trigger a curator review now")
p_run.add_argument(
"--sync", "--synchronous", dest="synchronous", action="store_true",
help="Wait for the LLM review pass to finish (default: background thread)",
)
p_run.set_defaults(func=_cmd_run)
p_pause = subs.add_parser("pause", help="Pause the curator until resumed")
p_pause.set_defaults(func=_cmd_pause)
p_resume = subs.add_parser("resume", help="Resume a paused curator")
p_resume.set_defaults(func=_cmd_resume)
p_pin = subs.add_parser("pin", help="Pin a skill so the curator never auto-transitions it")
p_pin.add_argument("skill", help="Skill name")
p_pin.set_defaults(func=_cmd_pin)
p_unpin = subs.add_parser("unpin", help="Unpin a skill")
p_unpin.add_argument("skill", help="Skill name")
p_unpin.set_defaults(func=_cmd_unpin)
p_restore = subs.add_parser("restore", help="Restore an archived skill")
p_restore.add_argument("skill", help="Skill name")
p_restore.set_defaults(func=_cmd_restore)
def cli_main(argv=None) -> int:
"""Standalone entry (also usable by hermes_cli.main fallthrough)."""
parser = argparse.ArgumentParser(prog="hermes curator")
register_cli(parser)
args = parser.parse_args(argv)
fn = getattr(args, "func", None)
if fn is None:
parser.print_help()
return 0
return int(fn(args) or 0)
if __name__ == "__main__": # pragma: no cover
sys.exit(cli_main())
+7 -13
View File
@@ -7,6 +7,7 @@ Currently supports:
import io
import json
import os
import sys
import time
import urllib.error
@@ -17,7 +18,6 @@ from pathlib import Path
from typing import Optional
from hermes_constants import get_hermes_home
from utils import atomic_replace
# ---------------------------------------------------------------------------
@@ -45,13 +45,8 @@ def _pending_file() -> Path:
Each entry: ``{"url": "...", "expire_at": <unix_ts>}``. Scheduled
DELETEs used to be handled by spawning a detached Python process per
paste that slept for 6 hours; those accumulated forever if the user
ran ``hermes debug share`` repeatedly.
Deletion is now driven by the gateway's cron ticker
(``gateway/run.py::_start_cron_ticker``) which calls
``_sweep_expired_pastes`` once per hour. ``hermes debug share`` also
runs an opportunistic sweep on entry as a fallback for CLI-only users
who never start the gateway.
ran ``hermes debug share`` repeatedly. We now persist the schedule
to disk and sweep expired entries on the next debug invocation.
"""
return get_hermes_home() / "pastes" / "pending.json"
@@ -79,7 +74,7 @@ def _save_pending(entries: list[dict]) -> None:
path.parent.mkdir(parents=True, exist_ok=True)
tmp = path.with_suffix(".json.tmp")
tmp.write_text(json.dumps(entries, indent=2), encoding="utf-8")
atomic_replace(tmp, path)
os.replace(tmp, path)
except OSError:
# Non-fatal — worst case the user has to run ``hermes debug delete``
# manually.
@@ -228,10 +223,9 @@ def _schedule_auto_delete(urls: list[str], delay_seconds: int = _AUTO_DELETE_SEC
interpreters that never exited until the sleep completed.
The replacement is stateless: we append to ``~/.hermes/pastes/pending.json``
and the gateway's cron ticker sweeps expired entries once per hour.
``hermes debug share`` also runs an opportunistic sweep as a fallback
for CLI-only users. If neither runs again, paste.rs's own retention
policy handles cleanup.
and rely on opportunistic sweeps (``_sweep_expired_pastes``) called from
every ``hermes debug`` invocation. If the user never runs ``hermes debug``
again, paste.rs's own retention policy handles cleanup.
"""
_record_pending(urls, delay_seconds=delay_seconds)
+1
View File
@@ -13,6 +13,7 @@ automatically.
from __future__ import annotations
import io
import os
import sys
import time
+13 -78
View File
@@ -46,7 +46,6 @@ _PROVIDER_ENV_HINTS = (
"Z_AI_API_KEY",
"KIMI_API_KEY",
"KIMI_CN_API_KEY",
"GMI_API_KEY",
"MINIMAX_API_KEY",
"MINIMAX_CN_API_KEY",
"KILOCODE_API_KEY",
@@ -57,7 +56,6 @@ _PROVIDER_ENV_HINTS = (
"OPENCODE_ZEN_API_KEY",
"OPENCODE_GO_API_KEY",
"XIAOMI_API_KEY",
"TOKENHUB_API_KEY",
)
@@ -293,23 +291,15 @@ def run_doctor(args):
known_providers: set = set()
try:
from hermes_cli.auth import (
PROVIDER_REGISTRY,
resolve_provider as _resolve_auth_provider,
)
from hermes_cli.auth import PROVIDER_REGISTRY
known_providers = set(PROVIDER_REGISTRY.keys()) | {"openrouter", "custom", "auto"}
except Exception:
_resolve_auth_provider = None
pass
try:
from hermes_cli.config import get_compatible_custom_providers as _compatible_custom_providers
from hermes_cli.providers import (
normalize_provider as _normalize_catalog_provider,
resolve_provider_full as _resolve_provider_full,
)
from hermes_cli.providers import resolve_provider_full as _resolve_provider_full
except Exception:
_compatible_custom_providers = None
_normalize_catalog_provider = None
_resolve_provider_full = None
custom_providers = []
@@ -329,43 +319,17 @@ def run_doctor(args):
if name:
known_providers.add("custom:" + name.lower().replace(" ", "-"))
valid_provider_ids = set(known_providers)
provider_ids_to_accept = {provider} if provider else set()
if _normalize_catalog_provider is not None:
for known_provider in known_providers:
try:
valid_provider_ids.add(_normalize_catalog_provider(known_provider))
except Exception:
continue
runtime_provider = provider
if (
provider
and _resolve_auth_provider is not None
and provider not in ("auto", "custom")
):
try:
runtime_provider = _resolve_auth_provider(provider)
provider_ids_to_accept.add(runtime_provider)
except Exception:
runtime_provider = provider
catalog_provider = provider
canonical_provider = provider
if (
provider
and _resolve_provider_full is not None
and provider not in ("auto", "custom")
):
provider_def = _resolve_provider_full(provider, user_providers, custom_providers)
catalog_provider = provider_def.id if provider_def is not None else None
if catalog_provider is not None:
provider_ids_to_accept.add(catalog_provider)
canonical_provider = provider_def.id if provider_def is not None else None
if provider and provider != "auto":
if catalog_provider is None or (
known_providers
and not (provider_ids_to_accept & valid_provider_ids)
):
if canonical_provider is None or (known_providers and canonical_provider not in known_providers):
known_list = ", ".join(sorted(known_providers)) if known_providers else "(unavailable)"
check_fail(
f"model.provider '{provider_raw}' is not a recognised provider",
@@ -378,24 +342,7 @@ def run_doctor(args):
)
# Warn if model is set to a provider-prefixed name on a provider that doesn't use them
provider_for_policy = runtime_provider or catalog_provider
providers_accepting_vendor_slugs = {
"openrouter",
"custom",
"auto",
"ai-gateway",
"kilocode",
"opencode-zen",
"huggingface",
"lmstudio",
"nous",
}
if (
default_model
and "/" in default_model
and provider_for_policy
and provider_for_policy not in providers_accepting_vendor_slugs
):
if default_model and "/" in default_model and canonical_provider and canonical_provider not in ("openrouter", "custom", "auto", "ai-gateway", "kilocode", "opencode-zen", "huggingface", "nous"):
check_warn(
f"model.default '{default_model}' uses a vendor/model slug but provider is '{provider_raw}'",
"(vendor-prefixed slugs belong to aggregators like openrouter)",
@@ -411,24 +358,20 @@ def run_doctor(args):
# own env-var checks elsewhere in doctor, and get_auth_status()
# returns a bare {logged_in: False} for anything it doesn't
# explicitly dispatch, which would produce false positives.
if runtime_provider and runtime_provider not in ("auto", "custom", "openrouter"):
if canonical_provider and canonical_provider not in ("auto", "custom", "openrouter"):
try:
from hermes_cli.auth import PROVIDER_REGISTRY, get_auth_status
pconfig = PROVIDER_REGISTRY.get(runtime_provider)
pconfig = PROVIDER_REGISTRY.get(canonical_provider)
if pconfig and getattr(pconfig, "auth_type", "") == "api_key":
status = get_auth_status(runtime_provider) or {}
configured = bool(
status.get("configured")
or status.get("logged_in")
or status.get("api_key")
)
status = get_auth_status(canonical_provider) or {}
configured = bool(status.get("configured") or status.get("logged_in") or status.get("api_key"))
if not configured:
check_fail(
f"model.provider '{runtime_provider}' is set but no API key is configured",
f"model.provider '{canonical_provider}' is set but no API key is configured",
"(check ~/.hermes/.env or run 'hermes setup')",
)
issues.append(
f"No credentials found for provider '{runtime_provider}'. "
f"No credentials found for provider '{canonical_provider}'. "
f"Run 'hermes setup' or set the provider's API key in {_DHH}/.env, "
f"or switch providers with 'hermes config set model.provider <name>'"
)
@@ -572,14 +515,7 @@ def run_doctor(args):
if shutil.which("codex"):
check_ok("codex CLI")
else:
# Native OAuth uses Hermes' own device-code flow — the Codex CLI is
# only needed if you want to import existing tokens from
# ~/.codex/auth.json. Downgrade to info so users running
# `hermes auth openai-codex` aren't told they're missing something.
check_info(
"codex CLI not installed "
"(optional — only required to import tokens from an existing Codex CLI login)"
)
check_warn("codex CLI not found", "(required for openai-codex login)")
# =========================================================================
# Check: Directory structure
@@ -1001,7 +937,6 @@ def run_doctor(args):
("StepFun Step Plan", ("STEPFUN_API_KEY",), "https://api.stepfun.ai/step_plan/v1/models", "STEPFUN_BASE_URL", True),
("Kimi / Moonshot (China)", ("KIMI_CN_API_KEY",), "https://api.moonshot.cn/v1/models", None, True),
("Arcee AI", ("ARCEEAI_API_KEY",), "https://api.arcee.ai/api/v1/models", "ARCEE_BASE_URL", True),
("GMI Cloud", ("GMI_API_KEY",), "https://api.gmi-serving.com/v1/models", "GMI_BASE_URL", True),
("DeepSeek", ("DEEPSEEK_API_KEY",), "https://api.deepseek.com/v1/models", "DEEPSEEK_BASE_URL", True),
("Hugging Face", ("HF_TOKEN",), "https://router.huggingface.co/v1/models", "HF_BASE_URL", True),
("NVIDIA NIM", ("NVIDIA_API_KEY",), "https://integrate.api.nvidia.com/v1/models", "NVIDIA_BASE_URL", True),
+6 -8
View File
@@ -33,14 +33,12 @@ def _get_git_commit(project_root: Path) -> str:
def _redact(value: str) -> str:
"""Redact all but first 4 and last 4 chars.
Thin wrapper over :func:`agent.redact.mask_secret`. Returns ``""`` for
an empty value (matches the historical behavior of this helper
``hermes dump`` formats empty values as blank, not as ``"(not set)"``).
"""
from agent.redact import mask_secret
return mask_secret(value)
"""Redact all but first 4 and last 4 chars."""
if not value:
return ""
if len(value) < 12:
return "***"
return value[:4] + "..." + value[-4:]
def _gateway_status() -> str:
+1 -2
View File
@@ -7,7 +7,6 @@ import sys
from pathlib import Path
from dotenv import load_dotenv
from utils import atomic_replace
# Env var name suffixes that indicate credential values. These are the
@@ -128,7 +127,7 @@ def _sanitize_env_file_if_needed(path: Path) -> None:
f.writelines(sanitized)
f.flush()
os.fsync(f.fileno())
atomic_replace(tmp, path)
os.replace(tmp, path)
except BaseException:
try:
os.unlink(tmp)
+4 -25
View File
@@ -2724,24 +2724,6 @@ _PLATFORMS = [
"help": "OpenID to deliver cron results and notifications to."},
],
},
{
"key": "yuanbao",
"label": "Yuanbao",
"emoji": "💎",
"token_var": "YUANBAO_APP_ID",
"setup_instructions": [
"1. Download the Yuanbao app from https://yuanbao.tencent.com/",
"2. In the app, go to PAI → My Bot and create a new bot",
"3. After the bot is created, copy the App ID and App Secret",
"4. Enter them below and Hermes will connect automatically over WebSocket",
],
"vars": [
{"name": "YUANBAO_APP_ID", "prompt": "App ID", "password": False,
"help": "The App ID from your Yuanbao IM Bot credentials."},
{"name": "YUANBAO_APP_SECRET", "prompt": "App Secret", "password": True,
"help": "The App Secret (used for HMAC signing) from your Yuanbao IM Bot."},
],
},
]
@@ -2953,7 +2935,7 @@ def _setup_sms():
def _setup_dingtalk():
"""Configure DingTalk — QR scan (recommended) or manual credential entry."""
from hermes_cli.setup import (
prompt_choice, prompt_yes_no, print_success, print_warning,
prompt_choice, prompt_yes_no, print_info, print_success, print_warning,
)
dingtalk_platform = next(p for p in _PLATFORMS if p["key"] == "dingtalk")
@@ -3126,12 +3108,6 @@ def _setup_wecom():
print_success("💬 WeCom configured!")
def _setup_yuanbao():
"""Configure Yuanbao via the standard platform setup."""
yuanbao_platform = next(p for p in _PLATFORMS if p["key"] == "yuanbao")
_setup_standard_platform(yuanbao_platform)
def _is_service_installed() -> bool:
"""Check if the gateway is installed as a system service."""
if supports_systemd_services():
@@ -3504,6 +3480,7 @@ def _setup_qqbot():
method_idx = prompt_choice(" How would you like to set up QQ Bot?", method_choices, 0)
credentials = None
used_qr = False
if method_idx == 0:
# ── QR scan-to-configure ──
@@ -3514,6 +3491,8 @@ def _setup_qqbot():
print()
print_warning(" QQ Bot setup cancelled.")
return
if credentials:
used_qr = True
if not credentials:
print_info(" QR setup did not complete. Continuing with manual input.")
+2 -1
View File
@@ -19,8 +19,9 @@ format) lives there.
from __future__ import annotations
import json
import os
from pathlib import Path
from typing import Any, Dict, List
from typing import Any, Dict, List, Optional
def hooks_command(args) -> None:
+1281
View File
File diff suppressed because it is too large Load Diff
File diff suppressed because it is too large Load Diff
+74 -701
View File
File diff suppressed because it is too large Load Diff
+2 -2
View File
@@ -46,6 +46,7 @@ from __future__ import annotations
import json
import logging
import os
import time
import urllib.error
import urllib.request
@@ -53,7 +54,6 @@ from pathlib import Path
from typing import Any
from hermes_cli import __version__ as _HERMES_VERSION
from utils import atomic_replace
logger = logging.getLogger(__name__)
@@ -190,7 +190,7 @@ def _write_disk_cache(data: dict[str, Any]) -> None:
with open(tmp, "w") as fh:
json.dump(data, fh, indent=2)
fh.write("\n")
atomic_replace(tmp, path)
os.replace(tmp, path)
except OSError as exc:
logger.info("model catalog cache write failed: %s", exc)
+6 -84
View File
@@ -213,15 +213,10 @@ def _load_direct_aliases() -> dict[str, DirectAlias]:
def _ensure_direct_aliases() -> None:
"""Lazy-load direct aliases on first use.
Mutates the existing DIRECT_ALIASES dict in place rather than rebinding
the module attribute. This keeps `from hermes_cli.model_switch import
DIRECT_ALIASES` references valid in callers rebinding would leave them
pointing at a stale empty dict.
"""
"""Lazy-load direct aliases on first use."""
global DIRECT_ALIASES
if not DIRECT_ALIASES:
DIRECT_ALIASES.update(_load_direct_aliases())
DIRECT_ALIASES = _load_direct_aliases()
# ---------------------------------------------------------------------------
@@ -984,7 +979,6 @@ def list_authenticated_providers(
user_providers: dict = None,
custom_providers: list | None = None,
max_models: int = 8,
current_model: str = "",
) -> List[dict]:
"""Detect which providers have credentials and list their curated models.
@@ -1031,34 +1025,6 @@ def list_authenticated_providers(
if "ollama-cloud" not in curated:
from hermes_cli.models import fetch_ollama_cloud_models
curated["ollama-cloud"] = fetch_ollama_cloud_models()
# LM Studio has no static catalog — probe its native /api/v1/models
# endpoint live so the picker reflects whatever the user has loaded.
# Base URL precedence: LM_BASE_URL env var > active config's base_url
# (when current provider is lmstudio) > 127.0.0.1 default.
# On auth rejection or unreachable server, fall back to the caller-supplied
# current model so the picker still shows something when offline / mis-keyed.
if "lmstudio" not in curated and (
os.environ.get("LM_API_KEY") or os.environ.get("LM_BASE_URL") or current_provider.strip().lower() == "lmstudio"
):
from hermes_cli.models import fetch_lmstudio_models
from hermes_cli.auth import AuthError
is_current_lmstudio = current_provider.strip().lower() == "lmstudio"
lm_base = (
os.environ.get("LM_BASE_URL")
or (current_base_url if is_current_lmstudio and current_base_url else None)
or "http://127.0.0.1:1234/v1"
)
try:
live = fetch_lmstudio_models(
api_key=os.environ.get("LM_API_KEY", ""),
base_url=lm_base,
timeout=1.5, # Smaller timeout for picker
)
except AuthError:
live = []
if not live and is_current_lmstudio and current_model:
live = [current_model]
curated["lmstudio"] = live
# --- 1. Check Hermes-mapped providers ---
for hermes_id, mdev_id in PROVIDER_TO_MODELS_DEV.items():
@@ -1209,15 +1175,6 @@ def list_authenticated_providers(
if hermes_slug in {"copilot", "copilot-acp"}:
model_ids = provider_model_ids(hermes_slug)
# For aws_sdk providers (bedrock), use live discovery so the list
# reflects the active region (eu.*, ap.*) not the static us.* list.
elif overlay.auth_type == "aws_sdk":
try:
from agent.bedrock_adapter import bedrock_model_ids_or_none
_ids = bedrock_model_ids_or_none()
model_ids = _ids if _ids is not None else (curated.get(hermes_slug, []) or curated.get(pid, []))
except Exception:
model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
else:
# Use curated list — look up by Hermes slug, fall back to overlay key
model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
@@ -1280,30 +1237,10 @@ def list_authenticated_providers(
except Exception:
pass
# Special case: aws_sdk auth (bedrock) — no API key env vars,
# credentials come from the boto3 credential chain (env vars,
# ~/.aws/credentials, instance roles, etc.)
if not _cp_has_creds and _cp_config and getattr(_cp_config, "auth_type", "") == "aws_sdk":
try:
from agent.bedrock_adapter import has_aws_credentials
_cp_has_creds = has_aws_credentials()
except Exception:
pass
if not _cp_has_creds:
continue
# For bedrock, use live discovery so the list reflects the active
# region (eu.*, us.*, ap.*) instead of the hardcoded us.* static list.
if _cp_config and getattr(_cp_config, "auth_type", "") == "aws_sdk":
try:
from agent.bedrock_adapter import bedrock_model_ids_or_none
_ids = bedrock_model_ids_or_none()
_cp_model_ids = _ids if _ids is not None else curated.get(_cp.slug, [])
except Exception:
_cp_model_ids = curated.get(_cp.slug, [])
else:
_cp_model_ids = curated.get(_cp.slug, [])
_cp_model_ids = curated.get(_cp.slug, [])
_cp_total = len(_cp_model_ids)
_cp_top = _cp_model_ids[:max_models]
@@ -1375,23 +1312,8 @@ def list_authenticated_providers(
if fb:
models_list = list(fb)
# Prefer the endpoint's live /models list when credentials are
# available. This keeps OpenAI-compatible relays (for example CRS)
# in sync when the server catalog changes without requiring the
# user to mirror every model into config.yaml.
api_key = str(ep_cfg.get("api_key", "") or "").strip()
if not api_key:
key_env = str(ep_cfg.get("key_env", "") or "").strip()
api_key = os.environ.get(key_env, "").strip() if key_env else ""
if api_url and api_key:
try:
from hermes_cli.models import fetch_api_models
live_models = fetch_api_models(api_key, api_url)
if live_models:
models_list = live_models
except Exception:
pass
# Try to probe /v1/models if URL is set (but don't block on it)
# For now just show what we know from config
results.append({
"slug": ep_name,
"name": display_name,
+42 -483
View File
@@ -33,6 +33,8 @@ COPILOT_REASONING_EFFORTS_O_SERIES = ["low", "medium", "high"]
# (model_id, display description shown in menus)
OPENROUTER_MODELS: list[tuple[str, str]] = [
("moonshotai/kimi-k2.6", "recommended"),
("deepseek/deepseek-v4-pro", ""),
("deepseek/deepseek-v4-flash", ""),
("anthropic/claude-opus-4.7", ""),
("anthropic/claude-opus-4.6", ""),
("anthropic/claude-sonnet-4.6", ""),
@@ -44,7 +46,6 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
("openai/gpt-5.4-mini", ""),
("xiaomi/mimo-v2.5-pro", ""),
("xiaomi/mimo-v2.5", ""),
("tencent/hy3-preview:free", "free"),
("openai/gpt-5.3-codex", ""),
("google/gemini-3-pro-image-preview", ""),
("google/gemini-3-flash-preview", ""),
@@ -107,57 +108,13 @@ def _codex_curated_models() -> list[str]:
return _add_forward_compat_models(list(DEFAULT_CODEX_MODELS))
# Static fallback for xAI when the models.dev disk cache is empty (fresh
# install, offline first run, etc.). Mirrors the xAI-direct model IDs from
# $HERMES_HOME/models_dev_cache.json as of 2026-04-28. Whenever xAI renames
# or retires a model, the disk cache picks it up on the next refresh and the
# fallback here only matters until that refresh lands.
_XAI_STATIC_FALLBACK: list[str] = [
"grok-4.20-0309-reasoning",
"grok-4.20-0309-non-reasoning",
"grok-4.20-multi-agent-0309",
"grok-4-1-fast",
"grok-4-1-fast-non-reasoning",
"grok-4-fast",
"grok-4-fast-non-reasoning",
"grok-4",
"grok-code-fast-1",
]
def _xai_curated_models() -> list[str]:
"""Derive the xAI-direct curated list from models.dev disk cache.
Reads $HERMES_HOME/models_dev_cache.json directly (no network) so this
runs at import time without blocking. Falls back to ``_XAI_STATIC_FALLBACK``
when the cache is empty or unreadable. Hermes refreshes the cache from
https://models.dev/api.json on normal use, so this list self-heals as
xAI renames models.
Mirrors ``_codex_curated_models()``'s role for openai-codex.
"""
try:
from agent.models_dev import _load_disk_cache
data = _load_disk_cache()
xai = data.get("xai") if isinstance(data, dict) else None
models = xai.get("models") if isinstance(xai, dict) else None
if isinstance(models, dict) and models:
ids = [mid for mid in models.keys() if isinstance(mid, str)]
if ids:
return sorted(ids)
except Exception:
# Any failure (missing file, malformed JSON, import error)
# falls through to the static list.
pass
return list(_XAI_STATIC_FALLBACK)
_PROVIDER_MODELS: dict[str, list[str]] = {
"nous": [
"moonshotai/kimi-k2.6",
"deepseek/deepseek-v4-pro",
"deepseek/deepseek-v4-flash",
"xiaomi/mimo-v2.5-pro",
"xiaomi/mimo-v2.5",
"tencent/hy3-preview",
"anthropic/claude-opus-4.7",
"anthropic/claude-opus-4.6",
"anthropic/claude-sonnet-4.6",
@@ -240,7 +197,10 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"glm-4.5",
"glm-4.5-flash",
],
"xai": _xai_curated_models(),
"xai": [
"grok-4.20-reasoning",
"grok-4-1-fast-reasoning",
],
"nvidia": [
# NVIDIA flagship reasoning models
"nvidia/nemotron-3-super-120b-a12b",
@@ -317,22 +277,11 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"mimo-v2-omni",
"mimo-v2-flash",
],
"tencent-tokenhub": [
"hy3-preview",
],
"arcee": [
"trinity-large-thinking",
"trinity-large-preview",
"trinity-mini",
],
"gmi": [
"zai-org/GLM-5.1-FP8",
"deepseek-ai/DeepSeek-V3.2",
"moonshotai/Kimi-K2.5",
"google/gemini-3.1-flash-lite-preview",
"anthropic/claude-sonnet-4.6",
"openai/gpt-5.4",
],
"opencode-zen": [
"kimi-k2.5",
"gpt-5.4-pro",
@@ -397,7 +346,6 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
# to https://dashscope-intl.aliyuncs.com/compatible-mode/v1 (OpenAI-compat)
# or https://dashscope-intl.aliyuncs.com/apps/anthropic (Anthropic-compat).
"alibaba": [
"qwen3.6-plus",
"kimi-k2.5",
"qwen3.5-plus",
"qwen3-coder-plus",
@@ -765,15 +713,14 @@ class ProviderEntry(NamedTuple):
label: str
tui_desc: str # detailed description for `hermes model` TUI
CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("nous", "Nous Portal", "Nous Portal (Nous Research subscription)"),
ProviderEntry("openrouter", "OpenRouter", "OpenRouter (100+ models, pay-per-use)"),
ProviderEntry("lmstudio", "LM Studio", "LM Studio (local desktop app with built-in model server)"),
ProviderEntry("ai-gateway", "Vercel AI Gateway", "Vercel AI Gateway (200+ models, $5 free credit, no markup)"),
ProviderEntry("anthropic", "Anthropic", "Anthropic (Claude models — API key or Claude Code)"),
ProviderEntry("openai-codex", "OpenAI Codex", "OpenAI Codex"),
ProviderEntry("xiaomi", "Xiaomi MiMo", "Xiaomi MiMo (MiMo-V2.5 and V2 models — pro, omni, flash)"),
ProviderEntry("tencent-tokenhub", "Tencent TokenHub", "Tencent TokenHub (Hy3 Preview — direct API via tokenhub.tencentmaas.com)"),
ProviderEntry("nvidia", "NVIDIA NIM", "NVIDIA NIM (Nemotron models — build.nvidia.com or local NIM)"),
ProviderEntry("qwen-oauth", "Qwen OAuth (Portal)", "Qwen OAuth (reuses local Qwen CLI login)"),
ProviderEntry("copilot", "GitHub Copilot", "GitHub Copilot (uses GITHUB_TOKEN or gh auth token)"),
@@ -792,7 +739,6 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("alibaba", "Alibaba Cloud (DashScope)","Alibaba Cloud / DashScope Coding (Qwen + multi-provider)"),
ProviderEntry("ollama-cloud", "Ollama Cloud", "Ollama Cloud (cloud-hosted open models — ollama.com)"),
ProviderEntry("arcee", "Arcee AI", "Arcee AI (Trinity models — direct API)"),
ProviderEntry("gmi", "GMI Cloud", "GMI Cloud (multi-model direct API)"),
ProviderEntry("kilocode", "Kilo Code", "Kilo Code (Kilo Gateway API)"),
ProviderEntry("opencode-zen", "OpenCode Zen", "OpenCode Zen (35+ curated models, pay-as-you-go)"),
ProviderEntry("opencode-go", "OpenCode Go", "OpenCode Go (open models, $10/month subscription)"),
@@ -827,8 +773,6 @@ _PROVIDER_ALIASES = {
"stepfun-coding-plan": "stepfun",
"arcee-ai": "arcee",
"arceeai": "arcee",
"gmi-cloud": "gmi",
"gmicloud": "gmi",
"minimax-china": "minimax-cn",
"minimax_cn": "minimax-cn",
"claude": "anthropic",
@@ -856,10 +800,6 @@ _PROVIDER_ALIASES = {
"huggingface-hub": "huggingface",
"mimo": "xiaomi",
"xiaomi-mimo": "xiaomi",
"tencent": "tencent-tokenhub",
"tokenhub": "tencent-tokenhub",
"tencent-cloud": "tencent-tokenhub",
"tencentmaas": "tencent-tokenhub",
"aws": "bedrock",
"aws-bedrock": "bedrock",
"amazon-bedrock": "bedrock",
@@ -871,9 +811,6 @@ _PROVIDER_ALIASES = {
"nvidia-nim": "nvidia",
"build-nvidia": "nvidia",
"nemotron": "nvidia",
"lmstudio": "lmstudio",
"lm-studio": "lmstudio",
"lm_studio": "lmstudio",
"ollama": "custom", # bare "ollama" = local; use "ollama-cloud" for cloud
"ollama_cloud": "ollama-cloud",
}
@@ -1680,41 +1617,31 @@ def provider_label(provider: Optional[str]) -> str:
# Models that support OpenAI Priority Processing (service_tier="priority").
# See https://openai.com/api-priority-processing/ for the canonical list.
#
# Pattern-based matching — any OpenAI flagship model (gpt-*, o1*, o3*, o4*)
# is assumed to support Priority Processing. service_tier=priority is silently
# ignored by non-OpenAI endpoints (OpenRouter/Copilot/opencode-zen proxies
# strip the field), so false positives are harmless. Codex-series models
# (gpt-5-codex, gpt-5.3-codex, etc.) are excluded — they don't expose the
# service_tier parameter through the Codex Responses API.
_OPENAI_FAST_MODE_PREFIXES: tuple[str, ...] = (
"gpt-",
"o1",
# Only the bare model slug is stored (no vendor prefix).
_PRIORITY_PROCESSING_MODELS: frozenset[str] = frozenset({
"gpt-5.4",
"gpt-5.4-mini",
"gpt-5.2",
"gpt-5.1",
"gpt-5",
"gpt-5-mini",
"gpt-4.1",
"gpt-4.1-mini",
"gpt-4.1-nano",
"gpt-4o",
"gpt-4o-mini",
"o3",
"o4",
)
def _is_openai_fast_model(model_id: Optional[str]) -> bool:
"""Return True if the model is an OpenAI flagship eligible for Priority Processing."""
raw = _strip_vendor_prefix(str(model_id or ""))
base = raw.split(":")[0]
if not base:
return False
# Exclude Codex-series — they route through the Codex Responses API
# which doesn't accept service_tier.
if "codex" in base:
return False
return any(base.startswith(prefix) for prefix in _OPENAI_FAST_MODE_PREFIXES)
"o4-mini",
})
# Models that support Anthropic Fast Mode (speed="fast").
# See https://platform.claude.com/docs/en/build-with-claude/fast-mode
#
# Pattern-based matching — any claude-* model is eligible. The anthropic
# adapter gates speed=fast on native Anthropic endpoints only (see
# _is_third_party_anthropic_endpoint in agent/anthropic_adapter.py), so
# third-party proxies that would reject the beta header are protected.
# Currently only Claude Opus 4.6. Both hyphen and dot variants are stored
# to handle native Anthropic (claude-opus-4-6) and OpenRouter (claude-opus-4.6).
_ANTHROPIC_FAST_MODE_MODELS: frozenset[str] = frozenset({
"claude-opus-4-6",
"claude-opus-4.6",
})
def _strip_vendor_prefix(model_id: str) -> str:
@@ -1727,14 +1654,20 @@ def _strip_vendor_prefix(model_id: str) -> str:
def model_supports_fast_mode(model_id: Optional[str]) -> bool:
"""Return whether Hermes should expose the /fast toggle for this model."""
return _is_anthropic_fast_model(model_id) or _is_openai_fast_model(model_id)
raw = _strip_vendor_prefix(str(model_id or ""))
if raw in _PRIORITY_PROCESSING_MODELS:
return True
# Anthropic fast mode — strip date suffixes (e.g. claude-opus-4-6-20260401)
# and OpenRouter variant tags (:fast, :beta) for matching.
base = raw.split(":")[0]
return base in _ANTHROPIC_FAST_MODE_MODELS
def _is_anthropic_fast_model(model_id: Optional[str]) -> bool:
"""Return True if the model is a Claude model eligible for Anthropic Fast Mode."""
"""Return True if the model supports Anthropic's fast mode (speed='fast')."""
raw = _strip_vendor_prefix(str(model_id or ""))
base = raw.split(":")[0]
return base.startswith("claude-")
return base in _ANTHROPIC_FAST_MODE_MODELS
def resolve_fast_mode_overrides(model_id: Optional[str]) -> dict[str, Any] | None:
@@ -1756,61 +1689,14 @@ def resolve_fast_mode_overrides(model_id: Optional[str]) -> dict[str, Any] | Non
def _resolve_copilot_catalog_api_key() -> str:
"""Best-effort GitHub token for fetching the Copilot model catalog.
Resolution order:
1. ``resolve_api_key_provider_credentials("copilot")`` env vars
(``COPILOT_GITHUB_TOKEN`` / ``GH_TOKEN`` / ``GITHUB_TOKEN``) plus
the ``gh auth token`` CLI fallback.
2. ``read_credential_pool("copilot")`` a token (typically a
``gho_*`` from device-code login, or a fine-grained PAT) stored in
``auth.json`` under ``credential_pool.copilot[]``. The pool is
populated by ``hermes auth add copilot`` and by ``_seed_from_env``
when the env var is set in ``~/.hermes/.env``.
Without (2), users whose only Copilot credential is in the pool see
the ``/model`` picker fall back to a stale hardcoded list because the
live catalog fetch silently 401s. To avoid wedging on a malformed pool
entry, each candidate is exchanged via ``exchange_copilot_token``
only entries that actually exchange successfully are returned, so a
later valid entry is reachable when an earlier one is unsupported.
"""
"""Best-effort GitHub token for fetching the Copilot model catalog."""
try:
from hermes_cli.auth import resolve_api_key_provider_credentials
creds = resolve_api_key_provider_credentials("copilot")
api_key = str(creds.get("api_key") or "").strip()
if api_key:
return api_key
return str(creds.get("api_key") or "").strip()
except Exception:
pass
try:
from hermes_cli.auth import read_credential_pool
from hermes_cli.copilot_auth import (
exchange_copilot_token,
validate_copilot_token,
)
for entry in read_credential_pool("copilot"):
if not isinstance(entry, dict):
continue
raw = str(entry.get("access_token") or "").strip()
if not raw:
continue
valid, _ = validate_copilot_token(raw)
if not valid:
continue
try:
api_token, _expires_at = exchange_copilot_token(raw)
except Exception:
continue
if api_token:
return api_token
except Exception:
pass
return ""
return ""
# Providers where models.dev is treated as authoritative: curated static
@@ -1967,19 +1853,6 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
return live
except Exception:
pass
if normalized == "gmi":
try:
from hermes_cli.auth import resolve_api_key_provider_credentials
creds = resolve_api_key_provider_credentials("gmi")
api_key = str(creds.get("api_key") or "").strip()
base_url = str(creds.get("base_url") or "").strip()
if api_key and base_url:
live = fetch_api_models(api_key, base_url)
if live:
return live
except Exception:
pass
if normalized == "custom":
base_url = _get_custom_base_url()
if base_url:
@@ -1992,18 +1865,6 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
live = fetch_api_models(api_key, base_url)
if live:
return live
# Bedrock uses live discovery keyed by the resolved AWS region so that
# EU/AP users see eu.*/ap.* model IDs instead of the static us.* list.
# Note: early return intentionally skips _MODELS_DEV_PREFERRED merge
# below — bedrock is not expected to appear in that table.
if normalized == "bedrock":
try:
from agent.bedrock_adapter import bedrock_model_ids_or_none
ids = bedrock_model_ids_or_none()
if ids is not None:
return ids
except Exception:
pass
curated_static = list(_PROVIDER_MODELS.get(normalized, []))
if normalized in _MODELS_DEV_PREFERRED:
return _merge_with_models_dev(normalized, curated_static)
@@ -2199,228 +2060,6 @@ def _is_github_models_base_url(base_url: Optional[str]) -> bool:
)
def _lmstudio_server_root(base_url: Optional[str]) -> Optional[str]:
"""Strip ``/v1`` suffix from an LM Studio base URL to get the native API root.
Returns ``None`` when the base URL is empty/invalid.
"""
root = (base_url or "").strip().rstrip("/")
if root.endswith("/v1"):
root = root[:-3].rstrip("/")
return root or None
def _lmstudio_request_headers(api_key: Optional[str] = None) -> dict:
"""Build HTTP headers for LM Studio native API requests."""
headers = {"User-Agent": _HERMES_USER_AGENT}
token = str(api_key or "").strip()
if token:
headers["Authorization"] = f"Bearer {token}"
return headers
def _lmstudio_fetch_raw_models(
api_key: Optional[str] = None,
base_url: Optional[str] = None,
timeout: float = 5.0,
) -> Optional[list[dict]]:
"""Fetch the raw model list from LM Studio's ``/api/v1/models``.
Returns the ``models`` list of dicts on success, ``None`` on network
errors or malformed responses. Raises ``AuthError`` on HTTP 401/403.
"""
server_root = _lmstudio_server_root(base_url)
if not server_root:
return None
headers = _lmstudio_request_headers(api_key)
request = urllib.request.Request(server_root + "/api/v1/models", headers=headers)
try:
with urllib.request.urlopen(request, timeout=timeout) as resp:
payload = json.loads(resp.read().decode())
except urllib.error.HTTPError as exc:
if exc.code in (401, 403):
from hermes_cli.auth import AuthError
raise AuthError(
f"LM Studio rejected the request with HTTP {exc.code}.",
provider="lmstudio",
code="auth_rejected",
) from exc
import logging
logging.getLogger(__name__).debug(
"LM Studio probe at %s failed with HTTP %s", server_root, exc.code,
)
return None
except Exception as exc:
import logging
logging.getLogger(__name__).debug(
"LM Studio probe at %s failed: %s", server_root, exc,
)
return None
raw_models = payload.get("models") if isinstance(payload, dict) else None
if not isinstance(raw_models, list):
import logging
logging.getLogger(__name__).debug(
"LM Studio probe at %s returned malformed payload (no `models` list)",
server_root,
)
return None
return raw_models
def probe_lmstudio_models(
api_key: Optional[str] = None,
base_url: Optional[str] = None,
timeout: float = 5.0,
) -> Optional[list[str]]:
"""Probe LM Studio's model listing.
Returns chat-capable model keys on success, including the valid empty-list
case when the server is reachable but has no non-embedding models.
Returns ``None`` on network errors, malformed responses, or empty/invalid
base URLs.
Raises ``AuthError`` on HTTP 401/403 so callers can surface token issues
separately from reachability problems.
"""
raw_models = _lmstudio_fetch_raw_models(api_key=api_key, base_url=base_url, timeout=timeout)
if raw_models is None:
return None
keys: list[str] = []
for raw in raw_models:
if not isinstance(raw, dict):
continue
if str(raw.get("type") or "").strip().lower() == "embedding":
continue
key = str(raw.get("key") or raw.get("id") or "").strip()
if key and key not in keys:
keys.append(key)
return keys
def fetch_lmstudio_models(
api_key: Optional[str] = None,
base_url: Optional[str] = None,
timeout: float = 5.0,
) -> list[str]:
"""Fetch LM Studio chat-capable model keys from native ``/api/v1/models``.
Returns a list of model keys (e.g. ``publisher/model-name``) with embedding
models filtered out. Returns an empty list on network errors, malformed
responses, or empty/invalid base URLs.
Raises ``AuthError`` on HTTP 401/403 so callers can distinguish a missing
or wrong ``LM_API_KEY`` from an unreachable server the most common
LM Studio support case once auth-enabled mode is turned on.
"""
models = probe_lmstudio_models(api_key=api_key, base_url=base_url, timeout=timeout)
return models or []
def ensure_lmstudio_model_loaded(
model: str,
base_url: Optional[str],
api_key: Optional[str],
target_context_length: int,
timeout: float = 120.0,
) -> Optional[int]:
"""Ensure LM Studio has ``model`` loaded with at least ``target_context_length``.
No-op when an instance is already loaded with sufficient context. Otherwise
POSTs ``/api/v1/models/load`` to (re)load with the target context, capped
at the model's ``max_context_length``. Returns the resolved loaded context
length, or ``None`` when the probe / load failed.
"""
server_root = _lmstudio_server_root(base_url)
if not server_root:
return None
headers = _lmstudio_request_headers(api_key)
try:
raw_models = _lmstudio_fetch_raw_models(api_key=api_key, base_url=base_url, timeout=10)
except Exception:
raw_models = None
if raw_models is None:
return None
target_entry = None
for raw in raw_models:
if not isinstance(raw, dict):
continue
if raw.get("key") == model or raw.get("id") == model:
target_entry = raw
break
if target_entry is None:
return None
max_ctx = target_entry.get("max_context_length")
if isinstance(max_ctx, int) and max_ctx > 0:
target_context_length = min(target_context_length, max_ctx)
for inst in target_entry.get("loaded_instances") or []:
cfg = inst.get("config") if isinstance(inst, dict) else None
loaded_ctx = cfg.get("context_length") if isinstance(cfg, dict) else None
if isinstance(loaded_ctx, int) and loaded_ctx >= target_context_length:
return loaded_ctx
body = json.dumps({
"model": model,
"context_length": target_context_length,
}).encode()
load_headers = dict(headers)
load_headers["Content-Type"] = "application/json"
try:
with urllib.request.urlopen(
urllib.request.Request(
server_root + "/api/v1/models/load",
data=body,
headers=load_headers,
method="POST",
),
timeout=timeout,
) as resp:
resp.read()
except Exception:
return None
return target_context_length
def lmstudio_model_reasoning_options(
model: str,
base_url: Optional[str],
api_key: Optional[str] = None,
timeout: float = 5.0,
) -> list[str]:
"""Return the reasoning ``allowed_options`` LM Studio publishes for ``model``.
Pulls ``capabilities.reasoning.allowed_options`` from ``/api/v1/models``.
Returns ``[]`` when the model is unknown, the endpoint is unreachable,
or the model does not declare a reasoning capability.
"""
try:
raw_models = _lmstudio_fetch_raw_models(api_key=api_key, base_url=base_url, timeout=timeout)
except Exception:
raw_models = None
if not raw_models:
return []
for raw in raw_models:
if not isinstance(raw, dict):
continue
if raw.get("key") != model and raw.get("id") != model:
continue
caps = raw.get("capabilities")
reasoning = caps.get("reasoning") if isinstance(caps, dict) else None
opts = reasoning.get("allowed_options") if isinstance(reasoning, dict) else None
if isinstance(opts, list):
return [str(o).strip().lower() for o in opts if isinstance(o, str)]
return []
return []
def _fetch_github_models(api_key: Optional[str] = None, timeout: float = 5.0) -> Optional[list[str]]:
catalog = fetch_github_model_catalog(api_key=api_key, timeout=timeout)
if not catalog:
@@ -2591,52 +2230,6 @@ def copilot_model_api_mode(
return "chat_completions"
# Azure Foundry model families that require the Responses API. Azure
# rejects /chat/completions against these deployments with
# ``400 "The requested operation is unsupported."`` — the same payload Bob
# Dobolina hit in April 2026 on ``gpt-5.3-codex`` while ``gpt-4o-pure`` on
# the same endpoint worked fine. Keep the patterns broad enough to cover
# vendor-renamed deployments (e.g. ``gpt-5.3-codex``, ``gpt-5-codex``,
# ``gpt-5.4``, ``o1-preview``) but tight enough to leave GPT-4 / 3.5 / Llama /
# Mistral / Grok deployments on chat completions.
_AZURE_FOUNDRY_RESPONSES_PREFIXES = (
"codex", # codex-*, codex-mini
"gpt-5", # gpt-5, gpt-5.x, gpt-5-codex, gpt-5.x-codex
"o1", # o1, o1-preview, o1-mini
"o3", # o3, o3-mini
"o4", # o4, o4-mini
)
def azure_foundry_model_api_mode(model_name: Optional[str]) -> Optional[str]:
"""Infer Azure Foundry api_mode from a deployment/model name.
Returns ``"codex_responses"`` when the model name matches a family that
only accepts the Responses API on Azure Foundry (GPT-5.x, codex, o1/o3/o4
reasoning models). Returns ``None`` otherwise the caller should fall
back to the configured/default api_mode (typically ``chat_completions``)
so GPT-4o, GPT-4 Turbo, Llama, Mistral, etc. keep working.
Intentionally does NOT return ``anthropic_messages``; Anthropic-style
Azure endpoints are disambiguated by URL (``/anthropic`` suffix) in
``runtime_provider._detect_api_mode_for_url`` and by the user setting
``model.api_mode: anthropic_messages`` explicitly.
"""
raw = str(model_name or "").strip().lower()
if not raw:
return None
# Strip any vendor/ prefix a user may have copied from OpenRouter / Copilot.
if "/" in raw:
raw = raw.rsplit("/", 1)[-1]
# gpt-5-mini speaks chat completions on Copilot but Azure Foundry deploys
# the full gpt-5 family uniformly on Responses API — don't carve an
# exception here.
for prefix in _AZURE_FOUNDRY_RESPONSES_PREFIXES:
if raw.startswith(prefix):
return "codex_responses"
return None
def normalize_opencode_model_id(provider_id: Optional[str], model_id: Optional[str]) -> str:
"""Normalize OpenCode config IDs to the bare model slug used in API requests."""
provider = normalize_provider(provider_id)
@@ -3016,40 +2609,6 @@ def validate_requested_model(
"message": "Model names cannot contain spaces.",
}
if normalized == "lmstudio":
from hermes_cli.auth import AuthError
# Use probe_lmstudio_models so we can distinguish None (unreachable
# / malformed response) from [] (reachable, but no chat-capable models
# are loaded). fetch_lmstudio_models collapses both to [].
try:
models = probe_lmstudio_models(api_key=api_key, base_url=base_url)
except AuthError as exc:
return {
"accepted": False, "persist": False, "recognized": False,
"message": (
f"{exc} Set `LM_API_KEY` (or update it) to match the server's bearer token."
),
}
if models is None:
return {
"accepted": False, "persist": False, "recognized": False,
"message": f"Could not reach LM Studio's `/api/v1/models` to validate `{requested}`.",
}
if not models:
return {
"accepted": False, "persist": False, "recognized": False,
"message": (
f"LM Studio is reachable but no chat-capable models are loaded. "
f"Load `{requested}` in LM Studio (Developer tab → Load Model) and try again."
),
}
if requested_for_lookup in set(models):
return {"accepted": True, "persist": True, "recognized": True, "message": None}
return {
"accepted": False, "persist": False, "recognized": False,
"message": f"Model `{requested}` was not found in LM Studio's model listing.",
}
if normalized == "custom":
# Try probing with correct auth for the api_mode.
if api_mode == "anthropic_messages":
+8 -16
View File
@@ -9,7 +9,6 @@ from typing import Dict, Iterable, Optional, Set
from hermes_cli.auth import get_nous_auth_status
from hermes_cli.config import get_env_value, load_config
from tools.managed_tool_gateway import is_managed_tool_gateway_ready
from utils import is_truthy_value
from tools.tool_backend_helpers import (
fal_key_is_configured,
has_direct_modal_credentials,
@@ -26,13 +25,6 @@ _DEFAULT_PLATFORM_TOOLSETS = {
}
def _uses_gateway(section: object) -> bool:
"""Return True when a config section explicitly opts into the gateway."""
if not isinstance(section, dict):
return False
return is_truthy_value(section.get("use_gateway"), default=False)
@dataclass(frozen=True)
class NousFeatureState:
key: str
@@ -270,11 +262,11 @@ def get_nous_subscription_features(
# use_gateway flags — when True, the user explicitly opted into the
# Tool Gateway via `hermes model`, so direct credentials should NOT
# prevent gateway routing.
web_use_gateway = _uses_gateway(web_cfg)
tts_use_gateway = _uses_gateway(tts_cfg)
browser_use_gateway = _uses_gateway(browser_cfg)
web_use_gateway = bool(web_cfg.get("use_gateway"))
tts_use_gateway = bool(tts_cfg.get("use_gateway"))
browser_use_gateway = bool(browser_cfg.get("use_gateway"))
image_gen_cfg = config.get("image_gen") if isinstance(config.get("image_gen"), dict) else {}
image_use_gateway = _uses_gateway(image_gen_cfg)
image_use_gateway = bool(image_gen_cfg.get("use_gateway"))
direct_exa = bool(get_env_value("EXA_API_KEY"))
direct_firecrawl = bool(get_env_value("FIRECRAWL_API_KEY") or get_env_value("FIRECRAWL_API_URL"))
@@ -609,10 +601,10 @@ def get_gateway_eligible_tools(
# no direct keys exist — we only skip the prompt for tools where
# use_gateway was explicitly set.
opted_in = {
"web": _uses_gateway(config.get("web")),
"image_gen": _uses_gateway(config.get("image_gen")),
"tts": _uses_gateway(config.get("tts")),
"browser": _uses_gateway(config.get("browser")),
"web": bool((config.get("web") if isinstance(config.get("web"), dict) else {}).get("use_gateway")),
"image_gen": bool((config.get("image_gen") if isinstance(config.get("image_gen"), dict) else {}).get("use_gateway")),
"tts": bool((config.get("tts") if isinstance(config.get("tts"), dict) else {}).get("use_gateway")),
"browser": bool((config.get("browser") if isinstance(config.get("browser"), dict) else {}).get("use_gateway")),
}
unconfigured: list[str] = []
+11 -28
View File
@@ -128,44 +128,27 @@ def _run_agent(
# the user's configured default provider, which may not host the model
# the caller just asked for.
effective_provider = (provider or "").strip() or None
explicit_base_url_from_alias: Optional[str] = None
if effective_provider is None and (model or env_model):
# Only auto-detect when the model was explicitly requested via arg or
# env var (not when it came from config — that's the "use my defaults"
# path and the configured provider is already correct).
explicit_model = (model or "").strip() or env_model
if explicit_model:
# First check DIRECT_ALIASES populated from config.yaml `model_aliases:`.
# These map a user-defined alias to (model, provider, base_url) for
# endpoints not in any catalog (local servers, custom proxies, etc.).
try:
from hermes_cli import model_switch as _ms
_ms._ensure_direct_aliases()
direct = _ms.DIRECT_ALIASES.get(explicit_model.strip().lower())
except Exception:
direct = None
if direct is not None:
effective_model = direct.model
effective_provider = direct.provider
if direct.base_url:
explicit_base_url_from_alias = direct.base_url.rstrip("/")
else:
cfg_provider = ""
if isinstance(model_cfg, dict):
cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
current_provider = (
cfg_provider
or os.getenv("HERMES_INFERENCE_PROVIDER", "").strip().lower()
or "auto"
)
detected = detect_provider_for_model(explicit_model, current_provider)
if detected:
effective_provider, effective_model = detected
cfg_provider = ""
if isinstance(model_cfg, dict):
cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
current_provider = (
cfg_provider
or os.getenv("HERMES_INFERENCE_PROVIDER", "").strip().lower()
or "auto"
)
detected = detect_provider_for_model(explicit_model, current_provider)
if detected:
effective_provider, effective_model = detected
runtime = resolve_runtime_provider(
requested=effective_provider,
target_model=effective_model or None,
explicit_base_url=explicit_base_url_from_alias,
)
# Pull in whatever toolsets the user has enabled for "cli".
-1
View File
@@ -36,7 +36,6 @@ PLATFORMS: OrderedDict[str, PlatformInfo] = OrderedDict([
("wecom_callback", PlatformInfo(label="💬 WeCom Callback", default_toolset="hermes-wecom-callback")),
("weixin", PlatformInfo(label="💬 Weixin", default_toolset="hermes-weixin")),
("qqbot", PlatformInfo(label="💬 QQBot", default_toolset="hermes-qqbot")),
("yuanbao", PlatformInfo(label="🤖 Yuanbao", default_toolset="hermes-yuanbao")),
("webhook", PlatformInfo(label="🔗 Webhook", default_toolset="hermes-webhook")),
("api_server", PlatformInfo(label="🌐 API Server", default_toolset="hermes-api-server")),
("cron", PlatformInfo(label="⏰ Cron", default_toolset="hermes-cron")),
-14
View File
@@ -79,20 +79,6 @@ VALID_HOOKS: Set[str] = {
# {"action": "allow"} / None -> normal dispatch
# Kwargs: event: MessageEvent, gateway: GatewayRunner, session_store.
"pre_gateway_dispatch",
# Approval lifecycle hooks. Fired by tools/approval.py when a dangerous
# command needs user approval -- fires BOTH for CLI-interactive prompts
# and for gateway/ACP approvals (Telegram, Discord, Slack, TUI, etc.).
# Observers only: return values are ignored. Plugins cannot veto or
# pre-answer an approval from these hooks (use pre_tool_call to block
# a tool before it reaches approval).
#
# Kwargs for pre_approval_request:
# command: str, description: str, pattern_key: str, pattern_keys: list[str],
# session_key: str, surface: "cli" | "gateway"
# Kwargs for post_approval_response: same as above plus
# choice: "once" | "session" | "always" | "deny" | "timeout"
"pre_approval_request",
"post_approval_response",
}
ENTRY_POINTS_GROUP = "hermes_agent.plugins"
+1
View File
@@ -999,6 +999,7 @@ def _run_composite_ui(curses, plugin_names, plugin_labels, plugin_selected,
# We need to map logical cursor positions to screen rows
# accounting for non-navigable separator/headers
draw_row = 0 # tracks navigable item index
# --- General Plugins section ---
if n_plugins > 0:
+2 -58
View File
@@ -954,59 +954,6 @@ def import_profile(archive_path: str, name: Optional[str] = None) -> Path:
# Rename
# ---------------------------------------------------------------------------
def _migrate_honcho_profile_host(old_name: str, new_name: str, new_dir: Path) -> None:
"""Rename Honcho host blocks for a renamed profile without changing peers."""
old_host = f"hermes.{old_name}"
new_host = f"hermes.{new_name}"
candidates = [
new_dir / "honcho.json",
_get_default_hermes_home() / "honcho.json",
Path.home() / ".honcho" / "config.json",
]
seen: set[Path] = set()
for path in candidates:
try:
resolved = path.resolve()
except OSError:
resolved = path
if resolved in seen or not path.is_file():
continue
seen.add(resolved)
try:
raw = json.loads(path.read_text(encoding="utf-8"))
except (OSError, json.JSONDecodeError):
continue
hosts = raw.get("hosts")
if not isinstance(hosts, dict) or old_host not in hosts:
continue
if new_host in hosts:
print(f"⚠ Honcho host block not migrated: {new_host} already exists in {path}")
continue
block = hosts[old_host]
if isinstance(block, dict) and "aiPeer" not in block:
bare = old_host.split(".", 1)[1] if "." in old_host else old_host
block["aiPeer"] = bare
hosts[new_host] = hosts.pop(old_host)
tmp = path.with_suffix(path.suffix + ".tmp")
try:
tmp.write_text(json.dumps(raw, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
tmp.replace(path)
except OSError:
try:
tmp.unlink(missing_ok=True)
except OSError:
pass
continue
print(f"✓ Honcho host updated: {old_host}{new_host}")
def rename_profile(old_name: str, new_name: str) -> Path:
"""Rename a profile: directory, wrapper script, service, active_profile.
@@ -1037,10 +984,7 @@ def rename_profile(old_name: str, new_name: str) -> Path:
old_dir.rename(new_dir)
print(f"✓ Renamed {old_dir.name}{new_dir.name}")
# 3. Update profile-scoped Honcho host blocks, preserving aiPeer identity
_migrate_honcho_profile_host(old_name, new_name, new_dir)
# 4. Update wrapper script
# 3. Update wrapper script
remove_wrapper_script(old_name)
collision = check_alias_collision(new_name)
if not collision:
@@ -1049,7 +993,7 @@ def rename_profile(old_name: str, new_name: str) -> Path:
else:
print(f"⚠ Cannot create alias '{new_name}'{collision}")
# 5. Update active_profile if it pointed to old name
# 4. Update active_profile if it pointed to old name
try:
if get_active_profile() == old_name:
set_active_profile(new_name)
-34
View File
@@ -71,13 +71,6 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
auth_type="oauth_external",
base_url_override="cloudcode-pa://google",
),
"lmstudio": HermesOverlay(
transport="openai_chat",
auth_type="api_key",
extra_env_vars=("LM_API_KEY",),
base_url_override="http://127.0.0.1:1234/v1",
base_url_env_var="LM_BASE_URL",
),
"copilot-acp": HermesOverlay(
transport="codex_responses",
auth_type="external_process",
@@ -165,21 +158,11 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
transport="openai_chat",
base_url_env_var="XIAOMI_BASE_URL",
),
"tencent-tokenhub": HermesOverlay(
transport="openai_chat",
base_url_env_var="TOKENHUB_BASE_URL",
),
"arcee": HermesOverlay(
transport="openai_chat",
base_url_override="https://api.arcee.ai/api/v1",
base_url_env_var="ARCEE_BASE_URL",
),
"gmi": HermesOverlay(
transport="openai_chat",
extra_env_vars=("GMI_API_KEY",),
base_url_override="https://api.gmi-serving.com/v1",
base_url_env_var="GMI_BASE_URL",
),
"ollama-cloud": HermesOverlay(
transport="openai_chat",
base_url_env_var="OLLAMA_BASE_URL",
@@ -190,10 +173,6 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
transport="openai_chat", # default; overridden by api_mode in config
base_url_env_var="AZURE_FOUNDRY_BASE_URL",
),
"bedrock": HermesOverlay(
transport="bedrock_converse",
auth_type="aws_sdk",
),
}
@@ -308,12 +287,6 @@ ALIASES: Dict[str, str] = {
"mimo": "xiaomi",
"xiaomi-mimo": "xiaomi",
# tencent
"tencent": "tencent-tokenhub",
"tokenhub": "tencent-tokenhub",
"tencent-cloud": "tencent-tokenhub",
"tencentmaas": "tencent-tokenhub",
# bedrock
"aws": "bedrock",
"aws-bedrock": "bedrock",
@@ -324,10 +297,6 @@ ALIASES: Dict[str, str] = {
"arcee-ai": "arcee",
"arceeai": "arcee",
# gmi
"gmi-cloud": "gmi",
"gmicloud": "gmi",
# Local server aliases → virtual "local" concept (resolved via user config)
"lmstudio": "lmstudio",
"lm-studio": "lmstudio",
@@ -350,9 +319,6 @@ _LABEL_OVERRIDES: Dict[str, str] = {
"copilot-acp": "GitHub Copilot ACP",
"stepfun": "StepFun Step Plan",
"xiaomi": "Xiaomi MiMo",
"gmi": "GMI Cloud",
"tencent-tokenhub": "Tencent TokenHub",
"lmstudio": "LM Studio",
"local": "Local endpoint",
"bedrock": "AWS Bedrock",
"ollama-cloud": "Ollama Cloud",
+14 -100
View File
@@ -231,19 +231,6 @@ def _resolve_runtime_from_pool_entry(
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
if configured_mode:
api_mode = configured_mode
# Model-family inference for GPT-5.x / codex / o1-o4: Azure rejects
# /chat/completions on these with 400 "operation unsupported" — see
# azure_foundry_model_api_mode() for rationale. Skip when the user
# explicitly picked anthropic_messages (Anthropic-style endpoint).
if effective_model and api_mode != "anthropic_messages":
try:
from hermes_cli.models import azure_foundry_model_api_mode
inferred = azure_foundry_model_api_mode(effective_model)
except Exception:
inferred = None
if inferred:
api_mode = inferred
# For Anthropic-style endpoints, strip /v1 suffix
if api_mode == "anthropic_messages":
base_url = re.sub(r"/v1/?$", "", base_url)
@@ -260,16 +247,11 @@ def _resolve_runtime_from_pool_entry(
if cfg_base_url:
base_url = cfg_base_url
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
if provider in ("opencode-zen", "opencode-go"):
# Re-derive api_mode from the effective model rather than the
# persisted api_mode: the opencode providers serve both
# anthropic_messages and chat_completions models, so the previous
# session's mode must not leak across /model switches.
# Refs #16878.
if configured_mode and _provider_supports_explicit_api_mode(provider, configured_provider):
api_mode = configured_mode
elif provider in ("opencode-zen", "opencode-go"):
from hermes_cli.models import opencode_model_api_mode
api_mode = opencode_model_api_mode(provider, effective_model)
elif configured_mode and _provider_supports_explicit_api_mode(provider, configured_provider):
api_mode = configured_mode
else:
# Auto-detect Anthropic-compatible endpoints (/anthropic suffix,
# Kimi /coding, api.openai.com → codex_responses, api.x.ai →
@@ -469,30 +451,6 @@ def _resolve_named_custom_runtime(
explicit_api_key: Optional[str] = None,
explicit_base_url: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
# Bare `provider="custom"` with an explicit base_url (e.g. propagated
# from a `model_aliases:` direct-alias resolution) — build a runtime
# directly so the alias's base_url actually takes effect.
requested_norm = (requested_provider or "").strip().lower()
if requested_norm == "custom" and explicit_base_url:
base_url = explicit_base_url.strip().rstrip("/")
api_key_candidates = [
(explicit_api_key or "").strip(),
os.getenv("OPENAI_API_KEY", "").strip(),
os.getenv("OPENROUTER_API_KEY", "").strip(),
]
api_key = next(
(c for c in api_key_candidates if has_usable_secret(c)),
"",
) or "no-key-required"
return {
"provider": "custom",
"api_mode": _detect_api_mode_for_url(base_url) or "chat_completions",
"base_url": base_url,
"api_key": api_key,
"source": "direct-alias",
"requested_provider": requested_provider,
}
custom_provider = _get_named_custom_provider(requested_provider)
if not custom_provider:
return None
@@ -650,7 +608,6 @@ def _resolve_azure_foundry_runtime(
model_cfg: Dict[str, Any],
explicit_api_key: Optional[str] = None,
explicit_base_url: Optional[str] = None,
target_model: Optional[str] = None,
) -> Dict[str, Any]:
"""Resolve an Azure Foundry runtime entry.
@@ -671,22 +628,6 @@ def _resolve_azure_foundry_runtime(
cfg_base_url = str(model_cfg.get("base_url") or "").strip().rstrip("/")
cfg_api_mode = _parse_api_mode(model_cfg.get("api_mode")) or "chat_completions"
# Model-family inference: Azure Foundry deploys GPT-5.x / codex / o1-o4
# reasoning models as Responses-API-only. Calling /chat/completions
# against them returns 400 "The requested operation is unsupported."
# Upgrade api_mode when the model name matches, unless the user has
# explicitly chosen anthropic_messages (Anthropic-style endpoint).
effective_model = str(target_model or model_cfg.get("default") or "").strip()
if effective_model and cfg_api_mode != "anthropic_messages":
try:
from hermes_cli.models import azure_foundry_model_api_mode
inferred = azure_foundry_model_api_mode(effective_model)
except Exception:
inferred = None
if inferred:
cfg_api_mode = inferred
env_base_url = os.getenv("AZURE_FOUNDRY_BASE_URL", "").strip().rstrip("/")
base_url = explicit_base_url_clean or cfg_base_url or env_base_url
if not base_url:
@@ -923,7 +864,6 @@ def resolve_runtime_provider(
model_cfg=_get_model_config(),
explicit_api_key=explicit_api_key,
explicit_base_url=explicit_base_url,
target_model=target_model,
)
return azure_runtime
@@ -1124,34 +1064,13 @@ def resolve_runtime_provider(
cfg_base_url and "azure.com" in cfg_base_url.lower()
)
if _is_azure_endpoint:
# Honor user-specified env var hints on the model config before
# falling back to the built-in AZURE_ANTHROPIC_KEY / ANTHROPIC_API_KEY
# chain. Accept both `key_env` (Hermes canonical — matches the
# custom_providers field name) and `api_key_env` (documented in the
# Azure Foundry guide and read by most Hermes-compatible importers).
# Matches the config.yaml examples in website/docs/guides/azure-foundry.md.
token = ""
for hint_key in ("key_env", "api_key_env"):
env_var = str(model_cfg.get(hint_key) or "").strip()
if env_var:
token = os.getenv(env_var, "").strip()
if token:
break
# Next: an inline api_key on the model config (useful in multi-profile
# setups that want to avoid env-var juggling).
if not token:
token = str(model_cfg.get("api_key") or "").strip()
# Finally fall back to the historical fixed names.
if not token:
token = (
os.getenv("AZURE_ANTHROPIC_KEY", "").strip()
or os.getenv("ANTHROPIC_API_KEY", "").strip()
)
token = (
os.getenv("AZURE_ANTHROPIC_KEY", "").strip()
or os.getenv("ANTHROPIC_API_KEY", "").strip()
)
if not token:
raise AuthError(
"No Azure Anthropic API key found. Set AZURE_ANTHROPIC_KEY or "
"ANTHROPIC_API_KEY, or point key_env/api_key_env in your "
"config.yaml model section at a custom env var."
"No Azure Anthropic API key found. Set AZURE_ANTHROPIC_KEY or ANTHROPIC_API_KEY."
)
else:
from agent.anthropic_adapter import resolve_anthropic_token
@@ -1262,20 +1181,15 @@ def resolve_runtime_provider(
configured_provider = str(model_cfg.get("provider") or "").strip().lower()
# Only honor persisted api_mode when it belongs to the same provider family.
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
if provider in ("opencode-zen", "opencode-go"):
# opencode-zen/go must always re-derive api_mode from the
# target model (not the stale persisted api_mode), because
# the same provider serves both anthropic_messages
# (e.g. minimax-m2.7) and chat_completions (e.g.
# deepseek-v4-flash) and switching models via /model would
# otherwise carry the previous mode forward, stripping /v1
# from base_url for chat_completions models and 404'ing.
# Refs #16878.
if configured_mode and _provider_supports_explicit_api_mode(provider, configured_provider):
api_mode = configured_mode
elif provider in ("opencode-zen", "opencode-go"):
from hermes_cli.models import opencode_model_api_mode
# Prefer the target_model from the caller (explicit mid-session
# switch) over the stale model.default; see _resolve_runtime_from_pool_entry
# for the same rationale.
_effective = target_model or model_cfg.get("default", "")
api_mode = opencode_model_api_mode(provider, _effective)
elif configured_mode and _provider_supports_explicit_api_mode(provider, configured_provider):
api_mode = configured_mode
else:
# Auto-detect Anthropic-compatible endpoints by URL convention
# (e.g. https://api.minimax.io/anthropic, https://dashscope.../anthropic)

Some files were not shown because too many files have changed in this diff Show More