Compare commits

331 Commits

Author SHA1 Message Date
Brooklyn Nicholson b9393296fc Fix TUI input field ANSI leaks and text selection issues
1. Fix ANSI escape code leakage during scroll operations:
   - Add ensureSafeAnsi utility to ensure all ANSI sequences are properly terminated
   - Modify renderWithCursor and renderWithSelection to always include reset codes
   - Add final safety check in text rendering to catch any potential leaks

2. Improve text selection in input field:
   - Add proper mouse drag event handling for text selection
   - Enhance click handlers to support selection operations
   - Fix edge cases in selection rendering
   - Ensure proper reset of ANSI codes after selections

Fixes reported issues where users couldn't copy/paste text in input field and
experienced ANSI leakage during scrolling operations.
2026-04-26 23:31:49 -05:00
brooklyn! e63929d4f3 Merge pull request #15926 from NousResearch/bb/tui-long-session-perf
perf(tui): stabilize long-session scrolling
2026-04-26 23:10:08 -05:00
Teknium 859e09b7ce chore(release): map xiahu889889@proton.me to xiahu88988 2026-04-26 21:08:19 -07:00
xiahu88988 898ccfd667 fix(skills): honor scope query from Google OAuth redirect URL
Parse scope from the raw callback URL before stripping the auth code so Flow.fetch_token matches user-granted scopes. Add regression test for dual-scope callbacks.

Made-with: Cursor
2026-04-26 21:08:19 -07:00
Teknium 6c87371815 fix(openclaw-migration): case-preserving brand rewrite + one-time ~/.openclaw residue banner (#16327)
Two related fixes for OpenClaw-residue problems after an OpenClaw→Hermes
migration (especially migrations done via OpenClaw's own tool, which
doesn't archive the source directory).

1. optional-skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py:
   rebrand_text() was rewriting ~/.openclaw/config.yaml → ~/.Hermes/config.yaml
   (capital H — a directory that doesn't exist). Now case-preserving:
   "OpenClaw" → "Hermes" (prose), but "openclaw" → "hermes" (so filesystem
   paths land on the real Hermes home). Regex logic unchanged — replacement
   function now checks if the matched text was all-lowercase and emits the
   replacement in the matching case.

2. agent/onboarding.py + cli.py: one-time startup banner the first time
   Hermes launches and finds ~/.openclaw/. Tells the user to run
   `hermes claw cleanup` to archive it, gated on the existing onboarding
   seen-flag framework (onboarding.seen.openclaw_residue_cleanup in
   config.yaml). Fires once per install; re-running requires wiping that
   flag or running cleanup directly.
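
The case-preserving replacement in item 1 can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the pattern and the function name `rebrand_text_sketch` are assumptions, and only the "lowercase in, lowercase out" rule comes from the commit message.

```python
import re

# Hypothetical pattern; the real script's regex is not shown in the commit.
_BRAND_RE = re.compile(r"openclaw", re.IGNORECASE)


def rebrand_text_sketch(text: str) -> str:
    """Replace the old brand, emitting lowercase when the match was lowercase."""

    def _replace(match: re.Match) -> str:
        # All-lowercase matches are typically filesystem paths or identifiers,
        # so they must map to the lowercase Hermes home.
        return "hermes" if match.group(0).islower() else "Hermes"

    return _BRAND_RE.sub(_replace, text)
```

With this rule, `~/.openclaw/config.yaml` maps to `~/.hermes/config.yaml`, while prose mentions of "OpenClaw" become "Hermes".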

Tests:
- 4 new TestDetectOpenclawResidue tests (present / absent / file-instead-
  of-dir / default-home smoke)
- 2 TestOpenclawResidueHint tests (content check)
- 2 TestOpenclawResidueSeenFlag tests (flag isolation + round-trip)
- test_rebrand_text_preserves_filesystem_path_casing regression test
  with 4 scenarios including the exact ~/.openclaw/config.yaml case
- Existing test_rebrand_text_* tests updated to the new case-preserving
  contract (lowercase input → lowercase output)

Co-authored-by: teknium1 <teknium@noreply.github.com>
2026-04-26 20:57:26 -07:00
Teknium 517f30b043 improve(agent): guidance for plain-text URLs, subagent language/verification, hermes-config routing (#16325)
Four small tool-description / skill-content tweaks addressing recurring
model mistakes seen in @versun's docx feedback (Kimi 2.6, but the patterns
apply to every model):

1. browser_navigate description: call out .md/.txt/.json/.yaml/.csv/.xml,
   raw.githubusercontent.com, and API endpoints as specifically preferring
   curl or web_extract. The generic "prefer web_search or web_extract" was
   too weak; models kept firing up the browser for plain-text URLs.

2. delegate_task description: two additions.
   (a) Pass user language / output-style preferences in 'context' when they
   differ from English — otherwise subagents default to English and their
   summaries contaminate the final reply (caused the bilingual digest bug).
   (b) Subagent summaries are self-reports, not verified facts. For
   operations with external side-effects (HTTP uploads, remote writes,
   file creation at shared paths), require a verifiable handle (URL, ID,
   path) and verify it yourself before claiming success.

3. agent/prompt_builder.py Skills-mandatory block: new explicit line
   "Whenever the user asks to configure / set up / modify / install /
   enable / disable / troubleshoot Hermes Agent itself, load the
   `hermes-agent` skill first." The generic "load what's relevant" didn't
   route Hermes-meta questions (like "how do I turn off redaction?") to
   the one skill that has the answer.

4. skills/autonomous-ai-agents/hermes-agent/SKILL.md: new "Security &
   Privacy Toggles" section covering security.redact_secrets (with the
   import-time-snapshot restart-required caveat), privacy.redact_pii,
   approvals.mode (manual/smart/off) + --yolo + HERMES_YOLO_MODE, shell
   hooks allowlist, and how to disable network/media tools entirely.
   Every command verified against the actual config keys — no invented
   knobs.

Co-authored-by: teknium1 <teknium@noreply.github.com>
2026-04-26 20:57:19 -07:00
Teknium 9c416e20ab feat(skills): install skills from a direct HTTP(S) URL (#16323)
* feat(skills): install skills from a direct HTTP(S) URL

Adds UrlSource adapter so `hermes skills install <url-to-SKILL.md>` and
`/skills install <url>` work as first-class operations — no more
improvising with curl + patch + cp.

- Claims identifiers that start with http(s):// and end in .md
- Skips /.well-known/skills/ URLs (WellKnownSkillSource handles those)
- Skill name from YAML frontmatter, URL-slug fallback
- Single-file SKILL.md only (v1 scope — multi-file skills need a manifest)
- Trust level 'community'; full security scan still runs
- Lock file stores the URL as identifier so `hermes skills update`
  re-fetches from the same URL cleanly

Scope matches real user need from @versun's docx feedback where
`https://sharethis.chat/SKILL.md` had no first-class install path.

* feat(skills): interactive name/category for URL installs + --name override

Follow-up to the UrlSource adapter. The previous commit fell back to weak
heuristics when frontmatter had no ``name:`` and could produce garbage names
like ``SKILL`` or ``unnamed-skill``. Now:

tools/skills_hub.py
- ``UrlSource._is_valid_skill_name()`` — strict identifier check
  (``^[a-z][a-z0-9_-]*$``), rejects sentinel values (``SKILL``, ``README``,
  ``INDEX``, ``unnamed-skill``, empty, non-strings).
- ``_resolve_skill_name()`` returns ``Optional[str]`` — ``None`` when
  nothing valid is resolvable. Also ignores unsafe frontmatter names
  (``../evil``) and falls through to URL slug instead of returning None
  immediately, so a URL with a bad frontmatter but a good path still
  works.
- ``fetch()``/``inspect()`` carry an ``awaiting_name=True`` marker in
  metadata/extra when resolution fails, letting ``do_install`` decide
  whether to prompt, apply an override, or error out.
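
A strict name check along the lines described above might look like this. It is a sketch under assumptions: the regex and the sentinel values are taken from the commit message, but the function body and the exact-match treatment of sentinels are guesses at the shape, not the real `UrlSource._is_valid_skill_name()`.

```python
import re

# Pattern quoted in the commit message.
_SKILL_NAME_RE = re.compile(r"^[a-z][a-z0-9_-]*$")
# Sentinel values listed in the commit message; compared exactly here.
_SENTINEL_NAMES = {"SKILL", "README", "INDEX", "unnamed-skill"}


def is_valid_skill_name(name) -> bool:
    """Return True only for a non-sentinel, lowercase identifier string."""
    if not isinstance(name, str) or not name:
        return False
    if name in _SENTINEL_NAMES:
        return False
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    return _SKILL_NAME_RE.fullmatch(name) is not None
```

Note that `unnamed-skill` satisfies the regex on its own, which is why the sentinel set has to be checked separately.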

hermes_cli/skills_hub.py
- ``do_install`` gains a ``name_override`` parameter.
- On URL-sourced bundles with ``awaiting_name=True``:
  1. If ``name_override`` is valid → use it.
  2. If ``name_override`` is invalid → refuse with a clear error.
  3. Else if ``skip_confirm=True`` (non-interactive: slash / TUI /
     gateway / scripts) → refuse with an actionable retry hint pointing
     at ``--name <your-name>`` on both CLI and slash forms.
  4. Else (interactive TTY) → prompt for the name.
- Interactive TTY also prompts for a category when none is given for a
  URL-sourced install, hinting existing category buckets so users can
  reuse ``productivity``, ``devops``, etc. Empty input → flat install.
- ``_existing_categories()`` scans ``~/.hermes/skills/`` for subdirs that
  look like category buckets (contain nested SKILL.md files); skips
  top-level skills and hidden dirs.
- ``_prompt_for_skill_name()`` / ``_prompt_for_category()`` helpers
  (EOF/Ctrl-C-safe, match the existing ``Confirm [y/N]`` prompt style).
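
The four-way decision for bundles marked ``awaiting_name=True`` can be summarized in a small sketch. The function name, the callback parameters, and the use of `ValueError` are hypothetical; only the ordering of the branches follows the commit message.

```python
from typing import Callable, Optional


def choose_skill_name(
    name_override: Optional[str],
    skip_confirm: bool,
    is_valid: Callable[[str], bool],
    prompt: Callable[[], str],
) -> str:
    """Pick a name for a URL-sourced skill whose name could not be resolved."""
    if name_override is not None:
        if is_valid(name_override):
            return name_override  # 1. valid override wins
        # 2. invalid override is refused outright
        raise ValueError(f"invalid skill name: {name_override!r}")
    if skip_confirm:
        # 3. non-interactive callers get an actionable hint instead of a prompt
        raise ValueError("no usable skill name; retry with --name <your-name>")
    return prompt()  # 4. interactive TTY: ask the user
```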

hermes_cli/main.py
- ``hermes skills install`` argparse gains ``--name <name>``.

hermes_cli/skills_hub.py (slash)
- ``/skills install <url> --name <x>`` parsing added.

Tests
- tests/tools/test_skills_hub.py: updated ``UrlSource`` tests to assert
  the new ``awaiting_name`` metadata; added 4 new tests for
  ``_is_valid_skill_name`` rejection sets and the awaiting-name marker.
- tests/hermes_cli/test_skills_hub.py: 8 new tests covering --name
  override accept/reject, non-interactive error, interactive name prompt,
  interactive category prompt, cancel-aborts-install, and
  ``_existing_categories`` scan behavior (buckets vs flat skills).
- E2E verified all four paths (no-name/no-override → error;
  --name override → install; frontmatter name → install;
  invalid --name → rejection).

---------

Co-authored-by: teknium1 <teknium@noreply.github.com>
2026-04-26 20:57:10 -07:00
Brooklyn Nicholson d308ae27e1 fix(nix): refresh tui npm deps hash
Update nix/tui.nix npmDeps hash to match the current ui-tui package-lock inputs so nix builds and CI lockfile checks pass.
2026-04-26 22:56:36 -05:00
sprmn24 b288934dff fix(discord_tool): coerce limit parameter to int before min() call
_search_members() and _fetch_messages() call min(limit, 100) assuming
limit is int. Models can pass limit as a string (e.g. "10"), causing
TypeError: '<' not supported between instances of 'str' and 'int'.

Add try/except int() coercion with safe defaults at the top of both
functions, matching the pattern used in session_search fix (#10522).
2026-04-26 20:48:38 -07:00
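
The coercion described in b288934dff could look something like the sketch below. The helper name `coerce_limit`, the default of 25, and the lower bound of 1 are assumptions for illustration; the commit only states that `limit` is coerced with `int()` inside a try/except before the `min(limit, 100)` call.

```python
def coerce_limit(limit, default=25, ceiling=100):
    """Coerce a model-supplied limit to an int within [1, ceiling]."""
    try:
        value = int(limit)  # accepts 10, "10", 10.0
    except (TypeError, ValueError):
        value = default  # None, "abc", and other junk fall back to a default
    # Lower bound of 1 is an assumption, not stated in the commit.
    return max(1, min(value, ceiling))
```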
Teknium e19854d893 fix(shell_hooks): parse hooks_auto_accept as strict bool/string, not bool() (#16322)
`_resolve_effective_accept()` used `return bool(cfg_val)` for the
`hooks_auto_accept` config key. In Python, `bool("false")` is `True`,
so a user setting `hooks_auto_accept: "false"` (quoted YAML string)
in `config.yaml` would silently enable auto-approval of every shell
hook, bypassing the consent prompt entirely.

Replace the coercion with the same type-aware parsing already used for
the HERMES_ACCEPT_HOOKS env var three lines above: bool passthrough,
strings checked against {1,true,yes,on} case-insensitively, everything
else (including "false", None, 0, ints) rejected.
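
A minimal sketch of the type-aware parsing described above, assuming a standalone helper (the name `parse_auto_accept` is hypothetical; the truthy set and the "reject everything else" rule come from the commit message):

```python
_TRUTHY_STRINGS = {"1", "true", "yes", "on"}


def parse_auto_accept(value) -> bool:
    """Strictly parse a config value; never fall back to bool(value)."""
    if isinstance(value, bool):
        return value  # real booleans pass through unchanged
    if isinstance(value, str):
        return value.strip().lower() in _TRUTHY_STRINGS
    return False  # None, ints, and anything else are rejected
```

The key difference from `bool(cfg_val)` is that the quoted string `"false"` now evaluates to `False`, so the consent prompt is preserved.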

Add TestHooksAutoAcceptParsing guarding the regression across all four
value shapes (bool, string-truthy, string-falsy, missing/None).

Reported by @sprmn24 in #16244.
2026-04-26 20:48:35 -07:00
Teknium 6993e566ba fix(whatsapp_identity): pin identifier regex to ASCII, clarify it's defense-in-depth
Follow-up on top of #16243. Two small tweaks:

- Compile the regex once as `_SAFE_IDENTIFIER_RE` and pin it to
  `[A-Za-z0-9@.+\-]`. The previous `\w` accepts Unicode word chars
  (full-width digits, accented letters) which aren't valid WhatsApp
  identifiers and shouldn't reach the mapping-file lookup.
- Add a comment clarifying this is defense-in-depth, not a live
  traversal. The hardcoded `lid-mapping-{current}{suffix}.json`
  prefix already prevents escape via pathlib's component split —
  with `current='../secrets'`, the first path component under
  `session/` is the literal directory name `lid-mapping-..`,
  which the attacker cannot create.
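
Both points can be illustrated with a short sketch. The character class is the one quoted above; the `+` quantifier, the use of `fullmatch`, and the helper names are assumptions, and `PurePosixPath` is used only to show how pathlib splits the components.

```python
import re
from pathlib import PurePosixPath

# ASCII-only class from the commit; "\w" would also admit Unicode word chars.
_SAFE_IDENTIFIER_RE = re.compile(r"[A-Za-z0-9@.+\-]+")


def is_safe_identifier(identifier) -> bool:
    """Accept only identifiers built from the pinned ASCII character set."""
    return (
        isinstance(identifier, str)
        and _SAFE_IDENTIFIER_RE.fullmatch(identifier) is not None
    )


def mapping_path(session_dir: str, current: str, suffix: str = "") -> PurePosixPath:
    """Build the mapping-file path with the hardcoded filename prefix."""
    return PurePosixPath(session_dir) / f"lid-mapping-{current}{suffix}.json"
```

For `current='../secrets'`, `mapping_path("session", "../secrets").parts` is `('session', 'lid-mapping-..', 'secrets.json')`, which shows why the prefix alone already blocks the escape.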

E2E verified: legit mapping chains still resolve, all probed attack
shapes (`../`, absolute paths, shell metacharacters, Unicode digit
tricks) are rejected before any file access.
2026-04-26 20:48:31 -07:00
sprmn24 91512b8210 fix(whatsapp_identity): guard against path traversal and silent mapping errors
expand_whatsapp_aliases() interpolated untrusted identifiers directly
into filenames (lid-mapping-{current}.json) without validation.
An identifier containing ../ or / could escape the session directory.

Also replaced bare except Exception: continue with targeted
(OSError, json.JSONDecodeError) and a debug log so mapping
corruption is diagnosable instead of silently skipped.

Fixes:
- Reject identifiers with unsafe characters via re.match guard
- Replace broad exception swallow with specific catch + debug log
2026-04-26 20:48:31 -07:00
Teknium 366351b94d refactor(timeouts): drop redundant ImportError in except clause
Exception already covers ImportError; (ImportError, Exception) was a
cosmetic wart from the bugfix. Pure no-op.
2026-04-26 20:48:20 -07:00
sprmn24 16e243e067 fix(timeouts): guard load_config() call against runtime exceptions
Both get_provider_request_timeout() and get_provider_stale_timeout()
wrapped the load_config import in try/except ImportError but left the
actual load_config() call unprotected. A corrupt config file, YAML
parse error, or permission failure would raise instead of returning
None safely.

Move load_config() inside the try block so any exception returns None.
2026-04-26 20:48:20 -07:00
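
The shape of the fix in 16e243e067 can be sketched as below. This is an illustration only: the loader is injected as a parameter so the example is self-contained, and the config keys (`providers`, `request_timeout`) are hypothetical rather than the project's real schema.

```python
def get_provider_request_timeout_sketch(provider, load_config):
    """Return the configured timeout, or None if anything goes wrong."""
    try:
        # The call itself sits inside the try block, so a corrupt file,
        # a YAML parse error, or a permission failure returns None.
        config = load_config()
        return config["providers"][provider]["request_timeout"]
    except Exception:
        return None
```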
Brooklyn Nicholson 3e1664923d Revert "fix(tui): report actual session on exit"
This reverts commit 1566f1eecc.
2026-04-26 22:43:34 -05:00
Brooklyn Nicholson c23463fce9 chore(tui): keep MRU resume split out of perf PR
- remove the temporary -c MRU logic and companion test from this branch so PR #15926 stays focused on TUI perf work
- keep the resume-ordering change isolated in the dedicated follow-up PR
2026-04-26 22:40:35 -05:00
Brooklyn Nicholson de790eaceb test(tui): align viewport snapshot key test with quantization
- keep 8-row key binning for scroll jitter stability and update the assertion to match runtime behavior
2026-04-26 22:35:55 -05:00
Brooklyn Nicholson d81b1cd86c chore: uptick 2026-04-26 22:22:31 -05:00
Brooklyn Nicholson 7945fcef21 Merge branch 'main' of github.com:NousResearch/hermes-agent into bb/tui-long-session-perf 2026-04-26 22:17:22 -05:00
Brooklyn Nicholson ffa33e53f6 chore(tui): remove dead branch cleanup code
- drop unused TUI helpers, test-only layout scaffolding, and stale public debug exports
- remove an unused profiler import and trim test-only coverage for deleted helpers
2026-04-26 21:54:24 -05:00
Brooklyn Nicholson 635948d0e0 chore(tui): tighten todo-fix comments, drop dead archive call
- gateway handler: turnController always archives in recordMessageComplete,
  so the post-complete archiveTodosAtTurnEnd().forEach is dead code. Drop
  it and the now-unused import.
- turnController: collapse archive prepend into a single spread expression.
- gateway server: one-line comment for the tool.start todo skip.
2026-04-26 21:46:50 -05:00
Brooklyn Nicholson c2ca02fcff fix(tui): stabilize live todo panel count and anchor position
Two bugs surfaced together while the model fired the todo tool:

1. Count flickered (e.g. 3 → 1 → 3) because tool.start echoed
   args.todos as the live state. With merge=true (or any partial
   replacement) args.todos is just the items being updated, not the
   full list. Drop the early echo — tool.complete already carries the
   canonical full list from the tool result.

2. After turn end the panel jumped from under the user prompt to below
   thinking/tools because archiveDoneTodos() was pushed AFTER segments
   in finalMessages. Prepend the archive trail msg so it sits right
   after the user prompt — same visual slot the live panel occupied
   during streaming.
2026-04-26 21:45:18 -05:00
Brooklyn Nicholson b51c528613 fix(tui): address virtual row and perf log review notes
Keep transcript row keys stable across capped-history trims and rename React Profiler timestamp fields so JSONL consumers don't confuse absolute timestamps with durations.
2026-04-26 21:37:43 -05:00
Brooklyn Nicholson 625c31fcea fix(tui): run built TUI with production React by default
CPU profiling showed the built TUI loading React development modules unless NODE_ENV was set. Default CLI and dashboard TUI children to production while preserving explicit user overrides.
2026-04-26 21:34:31 -05:00
Brooklyn Nicholson dda12775f2 fix(tui): address Copilot review follow-ups
Keep history metadata consistent with lineage replay, globally order replayed lineage messages, and make Ink cache eviction report post-eviction sizes. Also keys TUI config cache by path to avoid cross-home test leakage.
2026-04-26 21:24:54 -05:00
Brooklyn Nicholson 2e4b65b9f5 chore(tui): clean remaining Ink perf scaffolding
Trim narration comments and collapse small one-off helpers in the remaining ui-tui perf support files while preserving behaviour.
2026-04-26 21:20:54 -05:00
Teknium cb51baeceb chore(release): map Tosko4 in AUTHOR_MAP 2026-04-26 19:07:18 -07:00
Tosko4 e85b752516 fix: signal compression boundary to context engine
When _compress_context rotates session_id (compression split), fire
on_session_start(new_sid, boundary_reason="compression",
old_session_id=<old>) on the active context engine. Plugin engines
(e.g. hermes-lcm) use this to preserve DAG lineage across the rollover
instead of re-initializing fresh per-session state.

Built-in ContextCompressor.on_session_start accepts **kwargs and ignores
them — no behavior change for default users.

Closes hermes-lcm#68 symptom: after Hermes compressed and minted a new
physical session, LCM was treating the split as a fresh /new and losing
continuity (compression_count: 1, store_messages: 0, dag_nodes: 0).

Credit: @Tosko4 (PR #13370) — minimized scope to the boundary_reason
signal only; the broader session-lifecycle refactor will be taken in
separate PRs if justified by concrete plugin need.
2026-04-26 19:07:18 -07:00
Brooklyn Nicholson 7da2f07641 Merge remote-tracking branch 'origin/main' into bb/tui-long-session-perf 2026-04-26 21:07:15 -05:00
Teknium 478444c262 feat(checkpoints): auto-prune orphan and stale shadow repos at startup (#16303)
Every working dir hermes ever touches gets its own shadow git repo under
~/.hermes/checkpoints/{sha256(abs_dir)[:16]}/.  The per-repo _prune is a
no-op (comment in CheckpointManager._prune says so), so abandoned repos
from deleted/moved projects or one-off tmp dirs pile up forever.  Field
reports put the typical offender at 1000+ repos / ~12 GB on active
contributor machines.

Adds an opt-in startup sweep that mirrors the sessions.auto_prune
pattern from #13861 / #16286:

- tools/checkpoint_manager.py: new prune_checkpoints() and
  maybe_auto_prune_checkpoints() helpers.  Deletes shadow repos that
  are orphan (HERMES_WORKDIR marker points to a path that no longer
  exists) or stale (newest in-repo mtime older than retention_days).
  Idempotent via a CHECKPOINT_BASE/.last_prune marker file so it only
  runs once per min_interval_hours regardless of how many hermes
  processes start up.
- hermes_cli/config.py: new checkpoints.auto_prune /
  retention_days / delete_orphans / min_interval_hours knobs.
  Default auto_prune: false so users who rely on /rollback against
  long-ago sessions never lose data silently.
- cli.py / gateway/run.py: startup hooks gated on checkpoints.auto_prune,
  called right next to the existing state.db maintenance block.
- Docs updated with the new config knobs.
- 11 regression tests: orphan/stale deletion, precedence, byte-freed
  tracking, non-shadow dir skip, interval gating, corrupt marker
  recovery.
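
The interval gating and corrupt-marker recovery mentioned above might work roughly like this. It is a sketch under assumptions: the marker is assumed to hold an epoch timestamp as text, and `prune_is_due` is a hypothetical helper, not a function from tools/checkpoint_manager.py.

```python
import math
from typing import Optional


def prune_is_due(marker_text: Optional[str], min_interval_hours: float, now: float) -> bool:
    """Decide whether a sweep should run, given the marker file's contents."""
    if marker_text is None:
        return True  # no marker yet: first run
    try:
        last_run = float(marker_text.strip())
    except ValueError:
        return True  # corrupt marker: recover by running the sweep
    if not math.isfinite(last_run):
        return True  # "nan" or "inf" are treated as corrupt too
    return now - last_run >= min_interval_hours * 3600
```

Keeping the decision a pure function of the marker contents makes the gating easy to test without touching the filesystem.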

Refs #3015 (session-file disk growth was fixed in #16286; this covers
the checkpoint side noted out-of-scope there).
2026-04-26 19:05:52 -07:00
Teknium ced8f44cd2 fix(file-tools): broaden dedup-status write guard to cover small wrappers
The write_file guard added in #16223 used strict equality against the
internal dedup status message. In practice, the model sometimes
prepends a short note or appends a trailing comment before calling
write_file, which slipped past the strict check.

Broaden the heuristic: reject writes whose stripped content equals
the status message OR contains it and is <=2x its length. Short,
status-dominated writes are always corruption; legitimate docs that
quote the message verbatim are always much longer.
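
The broadened heuristic can be written as a small predicate. This is a sketch: the status text below is a placeholder for the internal dedup message, and the function name is hypothetical.

```python
# Placeholder for the internal dedup status message (assumed wording).
STATUS_MESSAGE = "File unchanged since last read"


def looks_like_status_corruption(content: str, status: str = STATUS_MESSAGE) -> bool:
    """Flag writes that are dominated by the dedup status message."""
    stripped = content.strip()
    if stripped == status:
        return True  # the original strict-equality case
    # Small wrapper: contains the status and is at most twice its length.
    return status in stripped and len(stripped) <= 2 * len(status)
```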

Adds two tests: one for the small-wrapper corruption shape, one
confirming large legitimate files that quote the status still write.
2026-04-26 19:05:36 -07:00
helix4u 977d5f56c9 fix(file-tools): keep read dedup status out of file content 2026-04-26 19:05:36 -07:00
voidborne-d a32b325d06 fix(tools): invalidate read_file dedup cache on write_file and patch
write_file_tool and patch_tool both call _update_read_timestamp to
refresh the staleness tracker after writing, but they never invalidate
the dedup cache entries for the written path.  The dedup cache keys are
(resolved_path, offset, limit) → mtime tuples populated by read_file_tool.

On filesystems where a read and write land in the same mtime second (or
when mtime granularity is 1s), the cached and current mtime are equal,
so the dedup check incorrectly returns a 'File unchanged since last
read' stub — even though the file was just overwritten.

The agent then sees stale content (or a stale 'File not found' error)
and enters expensive error-recovery loops, burning API calls.

Fix: add _invalidate_dedup_for_path(filepath, task_id) that removes all
dedup entries whose resolved path matches the written file.  Called from
_update_read_timestamp so both write_file_tool and patch_tool benefit
automatically.  Scoped to the writing task_id — other tasks' caches are
not affected.
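
A minimal sketch of the invalidation, assuming the dedup state is a per-task dict keyed by `(resolved_path, offset, limit)` as described above. The container layout and the return value are assumptions; the real helper has the signature `_invalidate_dedup_for_path(filepath, task_id)`.

```python
def invalidate_dedup_for_path(dedup_by_task: dict, task_id: str, resolved_path: str) -> int:
    """Drop every cached read of resolved_path for one task; return the count."""
    cache = dedup_by_task.get(task_id)
    if not cache:
        return 0  # missing task or empty cache: nothing to do
    # Collect keys first so the dict is not mutated while iterating.
    stale = [key for key in cache if key[0] == resolved_path]
    for key in stale:
        del cache[key]
    return len(stale)
```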

6 regression tests added covering:
- read→write→read within same mtime second (core #13144 scenario)
- invalidation across all offset/limit combinations
- isolation: writing file A does not invalidate file B's cache
- isolation: writing in task A does not invalidate task B's cache
- _invalidate_dedup_for_path safety on missing task / empty dedup

All 25 tests pass (19 existing + 6 new).

Fixes #13144
2026-04-26 19:05:36 -07:00
0z! 419535f07f Update maps_client.py 2026-04-26 19:03:54 -07:00
0z! e504a599fe Update maps_client.py
fix: include seconds in timezone UTC offset output
2026-04-26 19:03:54 -07:00
Yukipukii1 dbe5015566 fix(session-search): exclude current lineage root deterministically in recent mode 2026-04-26 19:03:17 -07:00
teknium ebad6d3f1e chore(release): map yoimexex@gmail.com -> Yoimex 2026-04-26 19:02:55 -07:00
Teknium 87610ce380 fix(tools): coerce quoted use_gateway in image_gen UI detection
Follow-up to #15960 — the provider-active detection in tools_config.py
also read use_gateway with raw truthiness (is False, not dict.get), so
quoted 'false' caused the FAL-direct row to show wrong active status in
the hermes tools picker. Route both sites through is_truthy_value().
2026-04-26 19:02:55 -07:00
Yoimex f66ebe64e8 fix(cli): coerce use_gateway config flags in tool routing 2026-04-26 19:02:55 -07:00
Teknium 36b13709f5 chore(release): map johnncenae in AUTHOR_MAP 2026-04-26 19:01:50 -07:00
Teknium 77d4766602 fix(gateway): clear pending model note on auto-reset paths too
PR #16013 plugged the leak in `/new`, but two sibling session-boundary
resets had the same bug:

1. Inactivity / suspended-session auto-reset (top of `_handle_message`)
   previously cleared only reasoning. Now drops model override and the
   queued "/model switched" note as well.
2. Compression-exhaustion auto-reset now also drops the pending note
   alongside the existing model/reasoning cleanup.

All three session-boundary sites now use the identical cleanup idiom.
2026-04-26 19:01:50 -07:00
johnncenae 00c6480a05 fix(gateway): clear stale pending model note on session reset 2026-04-26 19:01:50 -07:00
helix4u 88a85d30c1 fix(logging): attach gateway log after cli init 2026-04-26 19:01:26 -07:00
simbam99 cebf95854b Fix MessageDeduplicator max_size enforcement 2026-04-26 18:51:51 -07:00
Teknium 34eb1aaa9a fix(update): use npm ci to stop rewriting package-lock on every update (#16295)
`npm install --silent` (used by `_build_web_ui` and `_update_node_dependencies`)
silently rewrites package-lock.json on npm ≥ 10 (strips "peer": true etc.),
leaving the working tree dirty after every `hermes update`. The next update
then detects the dirty lockfile and stashes it — producing a trail of
hermes-update-autostash entries for web/package-lock.json, ui-tui/package-lock.json,
and root package-lock.json.

Switch to `npm ci` (strict, lockfile-preserving) via a new
`_run_npm_install_deterministic` helper that falls back to `npm install`
when the lockfile is missing or out of sync (WIP forks).
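
The fallback logic could be sketched as follows. The function below is illustrative rather than the real `_run_npm_install_deterministic`: the injectable `run` parameter and the string return value exist only to make the sketch testable without npm installed.

```python
import subprocess
from pathlib import Path


def run_npm_install_deterministic(project_dir, run=subprocess.run) -> str:
    """Prefer `npm ci`; fall back to `npm install` when it cannot be used."""
    lockfile = Path(project_dir) / "package-lock.json"
    if lockfile.exists():
        # npm ci installs exactly what the lockfile says and never rewrites it.
        result = run(["npm", "ci", "--silent"], cwd=project_dir)
        if result.returncode == 0:
            return "ci"
    # Missing or out-of-sync lockfile: use the permissive install path.
    run(["npm", "install", "--silent"], cwd=project_dir, check=True)
    return "install"
```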

Verified locally: all three lockfiles stay byte-identical after the real
_build_web_ui / _update_node_dependencies run twice back-to-back. Fallback
path tested with a deliberately out-of-sync lockfile and a no-lockfile case.
2026-04-26 18:51:31 -07:00
Teknium ab6879634e yuanbao platform (#16298)
Co-authored-by: loongzhao <loongzhao@tencent.com>
2026-04-26 18:50:49 -07:00
Teknium 5eb6cd82b2 fix(sessions): /save lands under $HERMES_HOME, widen browse+TUI picker, force-refresh ollama-cloud on setup (#16296)
Four independent session-UX bugs reported by an external user (#16294).

/save wrote hermes_conversation_<ts>.json to CWD — invisible to
'hermes sessions browse' and easy to lose. Snapshots now write under
~/.hermes/sessions/saved/ and the command prints the absolute path plus
a 'hermes --resume <id>' hint for the live DB-indexed session.

'hermes sessions browse' default --limit raised from 50 to 500. With the
old ceiling, users with moderately long histories saw only the most
recent 50 rows and assumed older sessions had been lost.

TUI session.list (`/resume` picker) switched from a hardcoded allow-list
of 13 gateway source names to a deny-list of just { 'tool' }. Sessions
tagged acp / webhook / user-defined HERMES_SESSION_SOURCE values and
any newly-added platform now surface. Default limit 20 → 200.
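
The switch from allow-list to deny-list amounts to inverting the filter. A rough sketch, assuming sessions are dicts with a `source` key (the data shape and function name are hypothetical; the `{ 'tool' }` deny-list and the limit of 200 come from the commit message):

```python
HIDDEN_SOURCES = {"tool"}


def visible_sessions(sessions, limit=200):
    """Keep every session whose source is not explicitly hidden."""
    shown = [s for s in sessions if s.get("source") not in HIDDEN_SOURCES]
    return shown[:limit]
```

With a deny-list, a source the picker has never heard of is shown by default instead of silently dropped.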

ollama-cloud provider setup passes force_refresh=True to
fetch_ollama_cloud_models() so a user entering their API key sees the
fresh catalog (e.g. deepseek v4 flash, kimi k2.6) immediately instead
of waiting up to an hour for the disk cache TTL to expire.

Closes #16294.
2026-04-26 18:49:48 -07:00
Teknium 7e3c8a31f0 feat(skills/airtable): tailor skill to Hermes idioms + expand cookbook
Expand the airtable skill from bare CRUD to a full Hermes-shaped
cookbook matching the linear/notion neighbors, and trim the
description to fit the 60-char system-prompt cutoff.

Hermes-specific additions:
- Explicit 'use the terminal tool with curl — not web_extract or
  browser_navigate' guidance, matching the same note in linear.
- Note that AIRTABLE_API_KEY flows from ~/.hermes/.env into the
  subprocess automatically via env_passthrough, so curl calls don't
  need to re-export it.
- Prefer 'python3 -m json.tool' (always present) over jq (optional)
  for pretty-printing, with -s on every curl to keep output clean.
- Read-before-write workflow that resolves record IDs via
  filterByFormula instead of guessing.

Cookbook expansion (new vs original):
- Field-type reference table (text, select, multi-select, attachment,
  linked record, user) with the exact write-shape Airtable expects.
- typecast flag for auto-coercing values / auto-creating select options.
- performUpsert PATCH for idempotent sync by merge field.
- Batch create/delete endpoints (10-record cap per call).
- Sort + fields query params with URL-encoding (%5B / %5D).
- Named-view query that applies saved filter/sort server-side.
- Full pagination loop template (while loop with offset).
- Common filterByFormula patterns (exact match, contains, AND/OR,
  date comparison, NOT empty).
- Rate-limit backoff guidance (Retry-After header, per-base budget).
- Airtable error-code reference (AUTHENTICATION_REQUIRED,
  INVALID_PERMISSIONS, MODEL_ID_NOT_FOUND,
  INVALID_MULTIPLE_CHOICE_OPTIONS) so the agent can map failures to
  user-actionable fixes instead of just retrying.

Also: description trimmed from 183 chars (truncated to 60 in system
prompt, losing 'filter/upsert/delete' trigger terms) down to 59 chars
that render whole: 'Airtable REST API via curl. Records CRUD, filters,
upserts.' Catalog row updated to match.

SKILL.md grew from 115 to 228 lines — still under the 500-line soft
cap and below the linear skill (297 lines) which serves the same
role for GraphQL.
2026-04-26 18:45:15 -07:00
Teknium 0bef0b9416 chore: docs + attribution for airtable skill
- scripts/release.py: map sonoyuncudmr@gmail.com -> Sonoyunchu so the
  check-attribution CI job and release notes credit Sonoyunchu correctly.
- website/docs/reference/skills-catalog.md: add the airtable row to
  the productivity bundled-skills table.
2026-04-26 18:45:15 -07:00
Teknium 55e9329ee6 feat(config): register bundled-skill API keys in OPTIONAL_ENV_VARS
Adds NOTION_API_KEY, LINEAR_API_KEY, TENOR_API_KEY, and AIRTABLE_API_KEY
to OPTIONAL_ENV_VARS so:

- They persist to ~/.hermes/.env via save_env_value like every other
  key Hermes knows about, instead of being ad-hoc variables the user
  has to hand-edit the dotfile for.
- load_env() / reload_env() populate os.environ from .env on every
  startup — the user sets the key once, skills keep working across
  restarts without losing access.
- hermes setup / hermes config show surface them as known optional
  vars with the correct signup URL (linear.app/settings/api,
  airtable.com/create/tokens, etc.).

These four entries use category="skill" (new) rather than "tool".
tools/environments/local.py auto-adds every category=tool/messaging
entry to _HERMES_PROVIDER_ENV_BLOCKLIST, which stops env passthrough
from leaking provider credentials into the execute_code sandbox
(GHSA-rhgp-j443-p4rf). Skill API keys are the opposite case — the
point is for the agent's subprocess to see them so curl can read
Authorization headers — so they must be outside the blocklist. The
new category is inert for that check.
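
Why the new category stays out of the blocklist can be seen in a small sketch. The registry entries and the helper name below are made up for illustration; only the rule that `tool` and `messaging` categories are auto-blocked comes from the commit message.

```python
# Hypothetical registry entries; the real OPTIONAL_ENV_VARS has more fields.
EXAMPLE_ENV_VARS = {
    "EXAMPLE_PROVIDER_KEY": {"category": "tool"},
    "EXAMPLE_CHAT_TOKEN": {"category": "messaging"},
    "AIRTABLE_API_KEY": {"category": "skill"},
}

BLOCKED_CATEGORIES = {"tool", "messaging"}


def build_env_blocklist(env_vars: dict) -> set:
    """Collect the variables that must not leak into the sandbox."""
    return {
        name
        for name, meta in env_vars.items()
        if meta.get("category") in BLOCKED_CATEGORIES
    }
```

Because `skill` is not in the blocked set, `AIRTABLE_API_KEY` remains visible to the agent's subprocess.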

All four entries are advanced=True: they show up in 'hermes config'
and 'hermes status' displays, but do not nag users who have never
touched those skills during setup checklists.

E2E verified: save_env_value → reload_env → os.environ populated →
skill_view reports setup_needed=False → env_passthrough registers
the key for subprocess inheritance.
2026-04-26 18:45:15 -07:00
Teknium 0d4247d9bf fix(skills/airtable): use .env credential pattern matching notion/linear
Convert the airtable skill from 'skills.config.airtable.api_key'
(config.yaml, wrong bucket for a secret) to 'prerequisites.env_vars:
[AIRTABLE_API_KEY]' (~/.hermes/.env), matching every other bundled
skill that authenticates with an API token.

Why the original shape was wrong:
- metadata.hermes.config is for non-secret skill settings (paths,
  preferences) per references/skill-config-interface.md. Storing a
  bearer token under skills.config.* also triggered the documented
  'hermes config migrate' nag-on-every-run problem.
- The Quick Reference's 'AIRTABLE_API_KEY=...' bash line couldn't
  read skills.config.airtable.api_key anyway — it's a yaml path, not
  an env var.

Follow-up polish on the same pass:
- Added version/author/license frontmatter to match notion/linear.
- Added prerequisites.commands: [curl].
- Setup section now specifies the PAT format (pat...) that replaced
  legacy 'key...' API keys in Feb 2024, plus the three required scopes
  (data.records:read/write, schema.bases:read) and the per-base Access
  list requirement.
- Clarified PATCH vs PUT and pagination (100 records/page cap).
- Swapped verification from 'hermes -q ...' (non-deterministic) to a
  curl /v0/meta/bases call that returns a verifiable HTTP status code.
2026-04-26 18:45:15 -07:00
Sonoyunchu c997183f53 feat(skills): add bundled Airtable productivity skill 2026-04-26 18:45:15 -07:00
Teknium f01e4402a9 chore(release): map georgeglessner in AUTHOR_MAP 2026-04-26 18:43:57 -07:00
George Glessner 5b5a53a155 fix(cli): check hermes_cli/web_dist/ not web/dist/ for build staleness
_web_ui_build_needed() in PR #14914 checked web_dir/"dist" as the
sentinel, but vite.config.ts sets outDir: "../hermes_cli/web_dist" so
the build output lands in hermes_cli/web_dist/, never in web/dist/.
The sentinel was therefore always missing → _web_ui_build_needed always
returned True → npm install + Vite build ran on every startup → OOM on
low-memory VPS persisted unchanged.

Fix: derive dist_dir as web_dir.parent / "hermes_cli" / "web_dist" so
the sentinel points to the actual build output directory.
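
The corrected derivation is a one-liner with pathlib. The helper name is hypothetical; the path expression is the one given in the commit message.

```python
from pathlib import Path


def web_ui_dist_dir(web_dir) -> Path:
    """Locate the Vite build output, which lives outside the web/ directory."""
    return Path(web_dir).parent / "hermes_cli" / "web_dist"
```

For a checkout at `/repo`, `web_ui_dist_dir("/repo/web")` points at `/repo/hermes_cli/web_dist` rather than `/repo/web/dist`.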

Fixes #14898
2026-04-26 18:43:57 -07:00
Teknium 90c84c6dba fix(gateway): unblock update subprocess on recognized-command bypass
When the gateway intercepts a pending /update prompt and the user sends
a recognized slash command (/new, /help, ...), the command now dispatches
normally AND the detached update subprocess is unblocked by writing a
blank .update_response. _gateway_prompt reads '' → strips → returns the
prompt's default (typically a safe 'n' / skip), so the update process
exits cleanly instead of blocking on stdin until the 30-minute watcher
timeout.

Also clears _update_prompt_pending[session_key] on this path so stray
future input for the same session isn't re-intercepted.

Extends PR #15849 with tests for the new cancel-write + a regression
test pinning the legacy behavior of unrecognized /foo slash commands
still being consumed as the response.
2026-04-26 18:39:44 -07:00
Yukipukii1 bdaf56a94d fix(gateway): bypass slash commands during pending update prompts 2026-04-26 18:39:44 -07:00
Brooklyn Nicholson b1c49d5e73 chore(tui): /clean recent perf work — KISS/DRY pass
24 files, -319 LoC. Behaviour preserved, 369/369 tests green.

- hermes-ink caches: shared lruEvict helper for the four parallel LRU
  caches (stringWidth, wrapText, sliceAnsi, lineWidth); touch-on-read
  stays inlined per cache; tightened output.ts skip-slice fast path.
- wheelAccel: trimmed provenance header, collapsed env parsing, ternary
  dispatch in computeWheelStep.
- perfPane: folded ensureLogDir into once-flag, spread-with-overrides
  for fastPath/phases instead of full rebuilds.
- env: extracted truthy() (used 4×).
- virtualHeights: collapsed user/diff/slash height bumps; trail+todos
  estimate.
- useInputHandlers: scrollIdleTimer cleanup on unmount, ?? undefined
  shorthand.
- useMainApp: dropped dead liveTailVisible IIFE and liveProgress
  indirection.
- appLayout, markdown, messageLine, entry: vertical rhythm, dropped
  narration comments, inlined one-shot vars.
- fix: empty catch blocks → /* best-effort */ for no-empty lint.
2026-04-26 20:38:47 -05:00
Teknium bdc1adf711 chore(release): map haru398801, badgerbees, xnbi in AUTHOR_MAP 2026-04-26 18:33:35 -07:00
Badgerbees 55f212a7a2 fix(slack): honor NO_PROXY for Slack transport 2026-04-26 18:33:35 -07:00
Xnbi 7eaad06a87 fix(gateway): default Slack tool_progress to off
Slack Bolt posts are not editable like CLI spinners, so the medium-tier 'new' setting still emitted a permanent line per tool start (issue #14663).

- Built-in slack default: off; other tier-2 platforms unchanged.

- Adjust the /verbose isolation test for the off-to-new cycle.

- Migration tests: read/write config.yaml as UTF-8 (Windows locale).
2026-04-26 18:33:35 -07:00
haru398801 a01e767b24 fix(gateway): respect config.yaml slack.enabled when SLACK_BOT_TOKEN env var is set
Previously, setting SLACK_BOT_TOKEN in .env would unconditionally enable
the Slack gateway adapter regardless of `slack.enabled: false` in config.yaml.
This caused spurious "SLACK_APP_TOKEN not set" errors when the token was
used only by skills (e.g. cron jobs that send Slack messages) rather than
for the Hermes messaging gateway.

Now, enabled: false in config.yaml is respected — the token is stored so
skills can still use it, but the gateway adapter is not activated.
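
A rough sketch of the new precedence, under the assumption that the adapter is otherwise enabled by the presence of the token. The function name and the dict-shaped `config` / `env` arguments are hypothetical and chosen only for illustration.

```python
def slack_adapter_enabled(config: dict, env: dict) -> bool:
    """Decide whether to activate the Slack gateway adapter.

    Assumed rule: an explicit `slack.enabled: false` always wins;
    otherwise the adapter follows the presence of SLACK_BOT_TOKEN.
    """
    slack_cfg = config.get("slack") or {}
    if slack_cfg.get("enabled") is False:
        # The token stays available to skills, but the adapter stays off.
        return False
    return bool(env.get("SLACK_BOT_TOKEN"))
```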

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 18:33:35 -07:00
hharry11 fd474d0f00 fix(gateway): avoid cross-user mirror writes in per-user group sessions 2026-04-26 18:31:24 -07:00
Teknium cd2aee36ca test(sessions): wire sessions_dir through auto-prune + file-cleanup regression tests
- TestAutoMaintenance gains 3 tests: auto-prune deletes transcript files
  when sessions_dir is passed, preserves them when it isn't (backward-
  compat), and never touches active-session files during prune.
- FakeDB helpers in test_sessions_delete.py accept **kwargs so they
  don't break when delete_session signature gains sessions_dir.
2026-04-26 18:31:07 -07:00
Yang Zhi 3b60abb6bb fix(sessions): delete on-disk transcript files during prune and delete (#3015)
`delete_session()` and `prune_sessions()` only removed SQLite records,
leaving .json/.jsonl transcript files on disk forever. Over time this
causes unbounded disk growth (~27MB/day observed).

Changes:
- Add `_remove_session_files()` static helper that cleans up
  `{session_id}.json`, `.jsonl`, and `request_dump_{session_id}_*.json`
- `delete_session()` accepts optional `sessions_dir` param and removes
  files for the deleted session and its children
- `prune_sessions()` accepts optional `sessions_dir` param and removes
  files for all pruned sessions after the DB transaction
- Wire up CLI `hermes sessions delete` and `hermes sessions prune` to
  pass `sessions_dir`
- File cleanup is best-effort (OSError silenced) so DB operations are
  never blocked by filesystem issues
- Fully backward-compatible: `sessions_dir=None` (default) preserves
  existing behavior
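
The cleanup helper described above might look roughly like this. It is a sketch under assumptions: the public name `remove_session_files` and the returned count are illustrative, and session IDs are assumed to contain no glob metacharacters.

```python
from pathlib import Path


def remove_session_files(sessions_dir: Path, session_id: str) -> int:
    """Best-effort removal of on-disk files for one session.

    Returns the number of files actually deleted. Illustrative stand-in
    for the _remove_session_files() helper named above.
    """
    candidates = [
        sessions_dir / f"{session_id}.json",
        sessions_dir / f"{session_id}.jsonl",
        *sessions_dir.glob(f"request_dump_{session_id}_*.json"),
    ]
    removed = 0
    for path in candidates:
        try:
            path.unlink()
            removed += 1
        except OSError:
            # Missing file, permission problem, etc.: never block the DB path.
            pass
    return removed
```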
2026-04-26 18:31:07 -07:00
Wysie 0ba6471dd1 fix: recover hindsight embedded daemon after idle shutdown 2026-04-26 18:29:11 -07:00
Yukipukii1 7317d69f19 fix(security): treat quoted false as false in browser SSRF guards 2026-04-26 18:27:13 -07:00
Teknium 2a0fc97c76 chore(release): map mewwts in AUTHOR_MAP 2026-04-26 18:25:41 -07:00
mewwts 8fb861ea6e feat(gateway/slack): support channel_skill_bindings
Extends the existing channel_skill_bindings mechanism (previously
Discord-only) to Slack, so a channel or DM can auto-load one or more
skills at session start without relying on the model's skill selector
for every short reply.

Motivation: Mats's German flashcards DM pushes a cron-driven card
5x/day; he responds with one-word guesses like 'work'. Previously each
reply required the main agent to decide whether to load german-flashcards
(full opus turn just to pick a skill). With the binding configured per
Slack channel, the skill is injected at session start and grading runs
directly.

Changes:
- Extract resolve_channel_skills() from DiscordAdapter._resolve_channel_skills
  into gateway.platforms.base (now shared across adapters).
- DiscordAdapter._resolve_channel_skills delegates to the shared helper
  (behavior preserved — existing test suite still passes unchanged).
- SlackAdapter: resolve channel_skill_bindings on each message and attach
  auto_skill to MessageEvent. gateway/run.py already handles auto-skill
  injection on new sessions; this just wires Slack through it.
- gateway/config.py: accept channel_skill_bindings in slack: block of
  config.yaml (was Discord-only).
- Tests: new tests/gateway/test_slack_channel_skills.py with 11 cases
  covering DM/thread/parent resolution, single-vs-list skills, dedup,
  malformed entries. Discord suite unchanged.
- Docs: add 'Per-Channel Skill Bindings' section to Slack user guide.

Config example:
  slack:
    channel_skill_bindings:
      - id: "D0ATH9TQ0G6"
        skills: ["german-flashcards"]
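
Resolution against a bindings list shaped like the config example could be sketched as below. The signature, the `parent_id` fallback, and the acceptance of a bare string for `skills` are assumptions inferred from the test description (parent resolution, single-vs-list, dedup, malformed entries), not the shared helper's confirmed behavior.

```python
from typing import Any, Optional


def resolve_channel_skills(bindings: Any, channel_id: str,
                           parent_id: Optional[str] = None) -> list[str]:
    """Collect skills bound to a channel (or its parent), de-duplicated."""
    if not isinstance(bindings, list):
        return []
    wanted = {channel_id, parent_id} - {None}
    skills: list[str] = []
    for entry in bindings:
        if not isinstance(entry, dict) or str(entry.get("id")) not in wanted:
            continue  # skip malformed or non-matching entries
        raw = entry.get("skills")
        names = [raw] if isinstance(raw, str) else raw if isinstance(raw, list) else []
        for name in names:
            if isinstance(name, str) and name and name not in skills:
                skills.append(name)  # dedup, first occurrence wins
    return skills
```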
2026-04-26 18:25:41 -07:00
Teknium 635253b918 feat(busy): add 'steer' as a third display.busy_input_mode option (#16279)
Enter while the agent is busy can now inject the typed text via /steer —
arriving at the agent after the next tool call — instead of interrupting
(current default) or queueing for the next turn.

Changes:
- cli.py: keybinding honors busy_input_mode='steer' by calling
  agent.steer(text) on the UI thread (thread-safe), with automatic
  fallback to 'queue' when the agent is missing, steer() is unavailable,
  images are attached, or steer() rejects the payload. /busy accepts
  'steer' as a fourth argument alongside queue/interrupt/status.
- gateway/run.py: busy-message handler and the PRIORITY running-agent
  path both route through running_agent.steer() when the mode is 'steer',
  with the same fallback-to-queue safety net. Ack wording tells users
  their message was steered into the current run. Restart-drain queueing
  now also activates for 'steer' so messages aren't lost across restarts.
- agent/onboarding.py: first-touch hint has a steer branch for both
  CLI and gateway.
- hermes_cli/commands.py: /busy args_hint updated to include steer,
  and 'steer' is registered as a subcommand (completions).
- hermes_cli/web_server.py: dashboard select widget offers steer.
- hermes_cli/config.py, cli-config.yaml.example, hermes_cli/tips.py:
  inline docs updated.
- website/docs/user-guide/cli.md + messaging/index.md: documented.
- Tests: steer set/status path for /busy; onboarding hints;
  _load_busy_input_mode accepts steer; busy-session ack exercises
  steer success + two fallback-to-queue branches.

Requested on X by @CodingAcct.

Default is unchanged (interrupt).
2026-04-26 18:21:29 -07:00
Teknium 87477756fd chore(release): map Ito-69 in AUTHOR_MAP 2026-04-26 18:21:20 -07:00
Ivan Tonov 930494d687 fix(cron): reap orphaned MCP stdio subprocesses after each tick
MCP stdio servers are spawned via the SDK's stdio_client, which on
Linux uses start_new_session=True (setsid).  When a cron job is
cancelled mid-way (timeout, agent finish, exception), the subprocess
often escapes the SDK's teardown and survives as a session leader.
Because setsid() detaches the child from the gateway's process group
/ cgroup tree, systemd does not reap it on service restart either —
so every cron tick that touches an MCP tool leaks a dangling server
process.

Fix:

* tools/mcp_tool.py — _run_stdio now wraps the whole stdio+session
  context in try/finally.  On any exit path (clean, exception,
  cancellation), PIDs still alive are moved from the active
  _stdio_pids set into a new _orphan_stdio_pids set.  Orphan
  detection is done via os.kill(pid, 0) — a cheap liveness probe
  that never signals the target.

* tools/mcp_tool.py — _kill_orphaned_mcp_children gains an
  include_active=False flag.  Default behaviour now only reaps the
  orphan set so concurrent sessions (other parallel cron jobs or
  live user chats) are never disrupted.  The existing shutdown path
  passes include_active=True to keep the previous "kill everything"
  semantics after the MCP loop is stopped.

* cron/scheduler.py — the cleanup hook is moved from run_job()'s
  finally (which would race with parallel siblings after #13021)
  into tick() after the ThreadPoolExecutor has joined every future.
  At that point there are no in-flight sessions from this tick, so
  sweeping the orphan set is always safe.

Net effect: zero regression for healthy sessions, and orphan MCP
servers no longer accumulate between gateway restarts.
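
The liveness probe and the active-to-orphan handoff can be sketched like this. It is POSIX-only (on Windows, os.kill with signal 0 does not behave as a probe), and the function names and set arguments are illustrative rather than the module's actual internals.

```python
import os


def pid_alive(pid: int) -> bool:
    """Probe a PID with signal 0: checks existence without signalling it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


def collect_orphans(session_pids: set[int], active: set[int],
                    orphans: set[int]) -> None:
    """Move this session's still-alive PIDs from `active` into `orphans`."""
    for pid in session_pids:
        active.discard(pid)
        if pid_alive(pid):
            orphans.add(pid)
```

Keeping orphans in a separate set is what lets the default sweep leave concurrent sessions alone.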

Made-with: Cursor
2026-04-26 18:21:20 -07:00
Teknium 5db6db891c chore(release): map ghostmfr in AUTHOR_MAP 2026-04-26 18:20:17 -07:00
ghostmfr e818ec520a fix(slack): harden attachment handling
Multiple overlapping Slack attachment improvements:

1. Upload retry with backoff on transient errors (429, 5xx, connection
   reset, rate_limited, service unavailable). New _is_retryable_upload_error
   helper covers three upload paths: _upload_file, send_video,
   send_document. Up to 3 attempts with 1.5s * attempt backoff.

2. Thread participation tracking: successful file uploads now add the
   thread_ts to _bot_message_ts, mirroring how text replies are tracked.
   This lets follow-up thread messages auto-trigger the bot (same
   engagement rules as replied threads).

3. Thread metadata preservation in the image redirect-guard fallback
   (send_image → send text fallback) and in two gateway/run.py send
   paths (image + document fallback calls).

4. HTML response rejection in _download_slack_file_bytes. Parallels
   the existing check in _download_slack_file. Guards against Slack
   returning a sign-in / redirect page as document bytes when scopes
   are missing, so the agent doesn't get HTML-as-a-PDF.

5. File lifecycle event acks (file_shared / file_created / file_change).
   These events arrive around snippet uploads. Acking them silences the
   slack_bolt 'Unhandled request' 404 warnings without changing behavior.

6. Post-loop message type classification so a mixed image+document upload
   classifies as PHOTO (or VOICE if no image), falling back to DOCUMENT.
   Previously, the per-file classification in the inbound loop could be
   overwritten unpredictably.

7. Expanded text-inject whitelist in inbound document handling to cover
   .csv, .json, .xml, .yaml, .yml, .toml, .ini, .cfg (up to 100KB) so
   snippets and config files are directly visible to the agent, not just
   cached as opaque uploads. Paired with new MIME entries in
   SUPPORTED_DOCUMENT_TYPES in base.py.

Squashed from two commits in #11819 so the single commit carries the
contributor's GitHub attribution (the original commits were authored
under a local dev hostname).
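
The retry policy from item 1 (up to 3 attempts, 1.5s * attempt backoff) can be sketched as follows. The marker list is a simplified assumption and does not reproduce the real _is_retryable_upload_error; `upload_with_retry` and the injectable `sleep` parameter are hypothetical names used so the example can run without real delays.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical markers; the real helper also covers 5xx responses.
_RETRYABLE_MARKERS = ("429", "rate_limited", "connection reset", "service unavailable")


def is_retryable(exc: Exception) -> bool:
    text = str(exc).lower()
    return any(marker in text for marker in _RETRYABLE_MARKERS)


def upload_with_retry(upload: Callable[[], T], attempts: int = 3,
                      base_delay: float = 1.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `upload`, retrying transient failures with linear backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return upload()
        except Exception as exc:
            if attempt == attempts or not is_retryable(exc):
                raise
            sleep(base_delay * attempt)  # 1.5s after attempt 1, 3.0s after attempt 2
    raise AssertionError("unreachable")
```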
2026-04-26 18:20:17 -07:00
Brooklyn Nicholson 527ac351b4 fix(tui): address Copilot review comments
- stringWidth: true LRU on cache hit (touch-on-read via delete+set) so
  hot strings stay resident under long sessions; was insertion-order
  FIFO before
- virtualHeights: include todos, panel sections, and intro version in
  messageHeightKey so height-cache reuse correctly invalidates when
  todo content / panel sections change
- virtualHeights: estimate trail+todos rows at todos.length+2 (or 2
  collapsed) instead of the generic ~1-line fallback, so initial
  virtualization offsets are closer to reality
- useInputHandlers: clearTimeout on unmount for scrollIdleTimer so
  pending relaxStreaming() never fires after teardown
- render-node-to-output: drop unused declined.noHint counter from
  scrollFastPathStats; it was always 0 (the "hint missing" branch is
  outside the diagnostics block)
- perfPane / hermes-ink.d.ts: follow the noHint removal
- wheelAccel: replace ~/claude-code path comment with generic
  attribution that doesn't reference a developer-local checkout
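
The touch-on-read change can be illustrated with a Python analog (the real cache is a TypeScript Map; this sketch relies on Python dicts preserving insertion order in the same way). The class and method names are illustrative.

```python
class LruCache:
    """Insertion-ordered dict used as an LRU: re-insert on every hit."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: dict[str, int] = {}

    def get(self, key: str):
        if key not in self._data:
            return None
        value = self._data.pop(key)  # delete ...
        self._data[key] = value      # ... then set: key becomes the newest entry
        return value

    def set(self, key: str, value: int) -> None:
        self._data.pop(key, None)
        self._data[key] = value
        if len(self._data) > self.capacity:
            oldest = next(iter(self._data))  # first key is least recently used
            del self._data[oldest]
```

Without the delete+set in `get`, eviction order would depend only on insertion time, which is the FIFO behavior the commit replaces.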
2026-04-26 20:07:41 -05:00
Brooklyn Nicholson b115ea62da feat(tui): anchor LiveTodoPanel to latest user message row
TodoPanel now renders as a child of the most recent user message's
virtualized row container, so it visually belongs to that prompt and
follows it during scroll. Falls back gracefully when no user message
exists yet (panel just doesn't render).
2026-04-26 20:07:29 -05:00
Brooklyn Nicholson 25767513f2 perf(tui): unified Ink cache eviction on memory pressure + session reset
Adds an `evictInkCaches(level)` API that prunes the four hot module-level
caches (`widthCache`, `wrapCache`, `sliceCache`, `lineWidthCache`) with
either a half-keep LRU pass or a full clear. Wired into:

- memoryMonitor: half-prune on 'high', full drop on 'critical', before
  the heap dump / auto-restart path. Gives long sessions a shot at
  recovering RSS instead of hard-exiting.
- useSessionLifecycle.resetSession: half-prune so a /new session starts
  with a half-warm pool and the prior session can resume cheaply.

Also: lineWidthCache now uses LRU half-eviction on overflow instead of a
full `cache.clear()`, matching the other three caches.
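
A Python analog of the two eviction levels, assuming an insertion-ordered cache where the oldest entries come first. The level names "half" and "all" are placeholders; the actual argument values of `evictInkCaches` are not stated above.

```python
def evict_lru(cache: dict, level: str) -> None:
    """Prune an insertion-ordered cache in place.

    level="all" clears everything; any other level drops the oldest half
    and keeps the most recently used entries.
    """
    if level == "all":
        cache.clear()
        return
    drop = len(cache) // 2
    for key in list(cache)[:drop]:
        del cache[key]
```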

Comparison vs claude-code: both forks now share the same `prevScreen`
blit + dirty-cascade machinery in render-node-to-output. Their smoothness
came from sibling-memo discipline (every chrome pane memo'd so dirty
cascade doesn't disable transcript blit) — already in place in our
appLayout.tsx (TranscriptPane / ComposerPane / StatusRulePane all memo'd).
Alt-screen is not the cause; both use it. The remaining gap was per-row
CPU on width/wrap/slice, which the previous commit closed.
2026-04-26 19:41:53 -05:00
Brooklyn Nicholson c370e2e1e5 perf(tui): cache stringWidth/wrapText/sliceAnsi + skip-slice when line fits clip
CPU profile (Apr 2026, real-user scroll on 11k-line session) showed three
hot loops in the per-frame render path:

  Output.get() per-frame walk:                 24% total
  └─ sliceAnsi(line, from, to) per write:     18% total
  stringWidth(line) chain (cached + JS):      14% total

All three were re-doing identical work every frame: same string → same
clipped slice → same width.

Fixes:

1. Memoize stringWidth (8k-entry LRU) for non-ASCII strings; ASCII fast-path
   skips the cache (inline scan beats Map.get for short ASCII, the >90%
   case). String.charCodeAt scan up to 64 chars is cheaper than the regex
   fallback.

2. Memoize wrapText (4k-entry LRU keyed by maxWidth|wrapType|text) — wrapAnsi
   is pure and the same content reflows identically every frame.

3. Memoize sliceAnsi (4k-entry LRU keyed by start|end|str) for the
   end-defined hot path used by Output.get().

4. Skip the slice entirely in Output.get() when the line already fits the
   clip box (startsBefore=false && endsAfter=false). Most transcript lines
   never exceed their container width, and tokenizing them just to slice
   (line, 0, width) was pure overhead. This single fast-path drops
   sliceAnsi from 18% → ~0% in the profile.

Also tighten virtualization constants (MAX_MOUNTED 260→120, OVERSCAN 40→20,
SLIDE_STEP 25→12) and cap historical-message render at 800 chars / 16
lines via HISTORY_RENDER_MAX_*; messages inside the FULL_RENDER_TAIL_ITEMS
window still render in full so reading-zone behavior is unchanged.

Validation, real-user CPU profile, page-up scroll on 11k-line session:

  Output.get() self-time:     24%   →   0.3%
  sliceAnsi total:            18%   →   not in top 25
  stringWidth family:         14%   →   ~3%
  idle:                     60.7%   →  77.3%

Frame timings (synthetic page-up profile harness):
  dur p95:   ~10ms   →  4.87ms
  dur p99:   25ms+   → 12.80ms
  yoga p99:  ~20ms   →  1.87ms

The remaining CPU in the profile is Yoga layoutNode + React commit,
which is the irreducible work for this UI tree size.
2026-04-26 19:28:09 -05:00
Teknium b16f9d438b feat(telegram): send fresh finals for stale preview streams (port openclaw#72038) (#16261)
Ports openclaw/openclaw#72038 to hermes-agent.

Telegram's `editMessageText` preserves the original message timestamp,
so a long-running streamed reply (reasoning models that take 60+ seconds
to finish) would keep the first-token timestamp even after completion.
Users can't tell how long a task actually took.

When a preview message has been visible for >= 60s (configurable via
`streaming.fresh_final_after_seconds`), finalize by sending a fresh
message instead of editing in place, then best-effort delete the stale
preview. Short previews still edit in place (the existing fast path).

Implementation notes adapted from OpenClaw's TypeScript original:
- `StreamConsumerConfig` gains `fresh_final_after_seconds` (default 0 =
  legacy edit-in-place). Gateway-level `StreamingConfig` defaults to 60.
- `GatewayStreamConsumer` tracks `_message_created_ts` at first-send and
  checks it in `_send_or_edit` on `finalize=True`. New helpers
  `_should_send_fresh_final` + `_try_fresh_final`.
- `BasePlatformAdapter` gains optional `delete_message(chat_id, message_id)`
  returning False by default. `TelegramAdapter` implements it via
  `_bot.delete_message`.
- `gateway/run.py` only enables fresh-final for `Platform.TELEGRAM`;
  other platforms ignore the setting (either they don't have the
  stale-edit timestamp problem, or edit-then-read works cheaply there).
- Fallback to normal edit on any fresh-send failure — no user-visible
  regression if Telegram rate-limits a send or the message is gone.
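
The finalize decision can be sketched as a pure function. This is a simplified assumption about `_should_send_fresh_final`: the standalone name, the explicit `now` argument, and the handling of a missing timestamp are illustrative choices.

```python
from typing import Optional


def should_send_fresh_final(created_ts: Optional[float], now: float,
                            fresh_final_after_seconds: float) -> bool:
    """Return True when the preview is old enough to replace, not edit."""
    if fresh_final_after_seconds <= 0:  # 0 = legacy edit-in-place
        return False
    if created_ts is None:              # no preview message was ever sent
        return False
    return now - created_ts >= fresh_final_after_seconds
```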

Tests: 15 new cases in tests/gateway/test_stream_consumer_fresh_final.py
covering short/long previews, config plumbing, delete-support absent,
send-failure fallback, __no_edit__ sentinel safety, and StreamingConfig
round-trip.

Co-authored-by: Hermes Agent <agent@nousresearch.com>
2026-04-26 17:26:37 -07:00
Brooklyn Nicholson 85e9a23efb feat(tui): HERMES_TUI_FPS=1 shows live fps counter
Adds a corner-overlay FPS readout gated on HERMES_TUI_FPS, fed by
ink's onFrame callback (so it's the REAL render rate, not a timer).
Displays fps, last-frame duration, and total frame count, colored by
threshold (green ≥50, yellow ≥30, red below).

Implementation:
  * lib/fpsStore.ts — nanostore atom updated from a trackFrame()
    sink.  Ring buffer of last 30 frame timestamps; fps = 29/elapsed.
    trackFrame is undefined when SHOW_FPS is off so ink's onFrame
    short-circuits at the optional chain.
  * components/fpsOverlay.tsx — tiny <Text> subscriber; returns null
    when SHOW_FPS is off (React skips the subtree entirely).
  * entry.tsx — composes onFrame from logFrameEvent (dev-perf) and
    trackFrame (fps) so both flags can coexist.  When both are off,
    onFrame is undefined and ink never attaches the handler.
  * appLayout.tsx — mounts the overlay as a flex-shrink=0 right-
    aligned Box below the composer, conditional on SHOW_FPS.
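
The ring-buffer arithmetic can be shown with a Python analog of the store (the real one is a TypeScript nanostore atom). The class name and the millisecond timestamps are assumptions.

```python
from collections import deque


class FpsTracker:
    def __init__(self, window: int = 30) -> None:
        # Ring buffer: the deque silently drops the oldest stamp when full.
        self._stamps = deque(maxlen=window)

    def track_frame(self, now_ms: float) -> float:
        """Record a frame timestamp (ms) and return the current fps estimate."""
        self._stamps.append(now_ms)
        if len(self._stamps) < 2:
            return 0.0
        elapsed_ms = self._stamps[-1] - self._stamps[0]
        if elapsed_ms <= 0:
            return 0.0
        # N timestamps span N-1 frame intervals (29 for a full window of 30).
        return (len(self._stamps) - 1) * 1000.0 / elapsed_ms
```

Dividing by the interval count rather than the sample count is why the formula uses 29 and not 30.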

Usage:
  HERMES_TUI_FPS=1 hermes --tui
  # bottom right: "  62.3fps ·   0.8ms · #1234" (green/yellow/red)

Intended as a user-facing diagnostic during the scroll-perf tuning
pass — watch the counter drop while holding PageUp to see where
frames go silent, without having to run scripts/profile-tui.py in a
side terminal.

126 files post-compile with React Compiler; 352 tests still pass.
2026-04-26 17:20:47 -05:00
Brooklyn Nicholson 4395c2b007 feat(tui): port claude-code's wheel accel state machine
Replaces the static WHEEL_SCROLL_STEP=1 multiplier on wheel events
with an adaptive accel state machine that infers user intent from
inter-event timing.

Algorithm ported straight from claude-code's
src/components/ScrollKeybindingHandler.tsx.  All tuning constants,
the native/xterm.js path split, the encoder-bounce detection, and the
trackpad-burst signature are all theirs.  This file is a mechanical
port into our module structure.

What it does:

  precision click (>500ms gap)   1 row/event   (deliberate scan)
  sustained mouse (40-200ms)     2-6 rows      (decay curve)
  detected wheel bounce          ramps to 15   (sticky wheel-mode)
  trackpad flick (5+ <5ms)       1 row/event   (burst detect)
  direction reversal             reset to base

Two implementation paths:

  * native terminals (ghostty, iTerm2, Kitty, WezTerm) — linear
    window-ramp + optional wheel-mode curve triggered by detected
    encoder bounce.  SGR proportional reporting handled via the
    burst-count guard.

  * xterm.js (VS Code / Cursor / browser terminals) — pure
    exponential-decay curve with fractional carry.  Events arrive
    1-per-notch with no pre-amplification, so the curve is more
    aggressive.

Selected at construction via isXtermJs() from @hermes/ink (now
exported).  Per-user tune via HERMES_TUI_SCROLL_SPEED (alias
CLAUDE_CODE_SCROLL_SPEED for portability).

13 unit tests covering direction flip/bounce/reversal, idle
disengage, trackpad-burst disengage, frac invariants, and the
native vs xterm.js branches.

Profiled under --rate 30 (stress test) and --rate 10 (realistic
sustained scroll): accel ramps to cap=6 at 30Hz burst, decays to
1-3 rows at sparse 10Hz clicks.  Perf is comparable to baseline
because accel IS multiplying step — the win is perceptual (fast
flicks cover distance, slow clicks keep precision), not raw fps.

Companion to the earlier WHEEL_SCROLL_STEP=1 change: that set the
base; this modulates around it.
2026-04-26 17:16:11 -05:00
Brooklyn Nicholson 0cd98499bb Promote debugging-hermes-tui-commands to in-repo skill
Was user-local in ~/.hermes/skills/. Ported into skills/software-development/
so other Hermes users get it and so the related_skills links from
node-inspect-debugger and python-debugpy resolve in-repo.

Frontmatter upgraded to match repo convention (version/author/license/
metadata.hermes.{tags,related_skills}, description rewritten as "Use when ...").
Body expanded with debugging-tactics section pointing at the two new
debugger skills, and additional common-issues / pitfalls entries.
2026-04-26 17:13:12 -05:00
Brooklyn Nicholson 4cdb6962ca Add hermes-agent-skill-authoring skill
Class-level skill for writing SKILL.md files inside this repo: required
frontmatter per tools/skill_manager_tool.py validator, size limits,
peer-matched structure, directory placement, write_file vs skill_manage,
caching pitfalls, cross-reference caveats.
2026-04-26 17:12:25 -05:00
Brooklyn Nicholson 9a46feb9bd experiment(tui): HERMES_TUI_INLINE flag to skip AlternateScreen
Adds a gate so we can A/B test whether bypassing the alt-screen +
viewport constraint lets the terminal's native scrollback beat our
virtualization on scroll perf.

Result: definitively NO.  Inline mode is 40x worse on every metric
that moves, because AlternateScreen is what constrains the ScrollBox
to the viewport height.  Without it, the ScrollBox grows to contain
every child of the transcript and every frame re-renders all 1100
messages.

Profile under hold-wheel_up (1106-msg session, 30Hz for 6s):

  metric                    fullscreen       inline       delta
  patches_total              28,864         1,111,574     +3751%
  writeBytes_total           42 KB          1.6 MB        +3881%
  fps_throughput             15.8 fps       1.75 fps      -89%
  frames                     179            18            -90%
  gap_p50_ms                 17 (~60fps)    726 (~1fps)   +4170%
  yoga_p99                   34 ms          405 ms        +1083%
  renderer_p99               14 ms          169 ms        +1062%
  flickers                   0              5 offscreen   —

This is actually the cleanest data we've gotten so far:

  * AlternateScreen is LOAD-BEARING for perf — its viewport height
    constraint is what lets useVirtualHistory's culling work.  No
    constraint → ScrollBox grows unbounded → every fiber mounts.

  * The outer terminal (Cursor's xterm.js) parsed 1.6 MB of ANSI in
    under 10 seconds with drain p99 = 8.83 ms and 0 backpressure
    frames.  Our terminal-write hypothesis from last session was
    wrong: the bottleneck is React + Yoga, not the wire.

  * Doing proper inline mode (non-virtualized transcript in
    scrollback, composer pinned below) is not a flag flip — it's a
    different UI architecture.  Leaving this flag in so anyone
    re-running the experiment gets the same numbers, but not
    building the architecture until we're sure the perf win is
    worth the UX loss (it probably isn't — the fullscreen + virt
    path is the one we should optimize, not replace).

Keeping the flag as an experiment gate.  Flip HERMES_TUI_INLINE=1
and run scripts/profile-tui.py --compare to reproduce.
2026-04-26 17:11:49 -05:00
Brooklyn Nicholson 8d2b08342c Add node-inspect-debugger and python-debugpy skills
Two new skills under skills/software-development/ for real breakpoint-driven
debugging from the terminal:

- node-inspect-debugger: node --inspect / --inspect-brk, node inspect REPL,
  CDP scripting via chrome-remote-interface, attaching to running Node
  processes (SIGUSR1), ui-tui-specific recipes, Vitest under debugger,
  CPU profiles + heap snapshots.

- python-debugpy: pdb quick reference, breakpoint() workflow, pytest --pdb
  (with xdist caveat for scripts/run_tests.sh), post-mortem, debugpy for
  remote/attach, remote-pdb as the agent-friendly alternative to DAP,
  recipes for tui_gateway/_SlashWorker/subprocess debugging.
2026-04-26 17:10:11 -05:00
Brooklyn Nicholson 82f842277e perf(tui): profile harness gains --loop, --save, --compare
Before: change code → build → run profile → manually compare to
mental model of last run.  After: `--loop` watches ui-tui/src and
packages/hermes-ink/src for .ts(x) changes, rebuilds on change,
re-runs the same scenario, prints a side-by-side A/B diff against
the previous iteration — so each edit's impact is quantified
instantly.  Ctrl+C to stop.

Also added:
  --save LABEL     saves metrics snapshot to /tmp/perf-<LABEL>.json
  --compare LABEL  diffs the current run vs that snapshot
  --extra-flag X   pass-through to node dist/entry.js (prepping for
                   --no-fullscreen below)

key_metrics() flattens a full run into scalar numbers across
frames, React commits, and per-phase timings.  format_diff() prints
a table with ↑/↓ markers denoting regressions vs improvements based
on whether the metric is lower-is-better (p99, max, patches, drain)
or higher-is-better (fps, gaps_under_16ms).

Run-to-run noise on static code is ~5-15% on most metrics — big
signal (>30% change on renderer_p99 / fps) cuts through cleanly.
Useful both for validating a single fix and for detecting subtle
regressions during the wheel-accel port.
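
The direction-aware comparison behind the diff table can be sketched as below. The name `classify_change`, the substring match on metric names, and the returned labels are assumptions; format_diff() itself may be structured differently.

```python
# Metric-name fragments treated as higher-is-better; everything else
# (p99, max, patches, drain, ...) is treated as lower-is-better.
HIGHER_IS_BETTER = ("fps", "gaps_under_16ms")


def classify_change(metric: str, before: float, after: float) -> str:
    """Label a metric change as 'improved', 'regressed', or 'same'."""
    if after == before:
        return "same"
    higher_is_better = any(tag in metric for tag in HIGHER_IS_BETTER)
    went_up = after > before
    return "improved" if went_up == higher_is_better else "regressed"
```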

Usage during the next perf session:

  # one-shot with a baseline for later comparison
  scripts/profile-tui.py --seconds 6 --hold wheel_up --save pre-accel

  # after porting the wheel handler
  scripts/profile-tui.py --seconds 6 --hold wheel_up --compare pre-accel

  # continuous iteration
  scripts/profile-tui.py --seconds 6 --hold wheel_up --loop
2026-04-26 17:08:07 -05:00
Brooklyn Nicholson f823535db2 perf(tui): instrument stdout drain — rule out terminal parse bottleneck
Adds four fields to FrameEvent.phases and the matching profile
summary:

  optimizedPatches  post-optimize patch count (what's actually
                    written to stdout; the .patches field is
                    pre-optimize)
  writeBytes        UTF-8 byte count of the write this frame
  backpressure      true when Node's stdout.write returned false
                    (Writable buffer full — outer terminal can't
                    keep up)
  prevFrameDrainMs  end-to-end drain time of the PREVIOUS frame's
                    write, captured from stdout.write's 2-arg
                    callback.  Reported on the next frame so the
                    measurement reflects "time until OS flushed
                    the bytes to the terminal fd", not "time until
                    queued in Node".

writeDiffToTerminal() now returns { bytes, backpressure } and
accepts an optional onDrain callback.  Only attached on TTY with
diff; piped/non-TTY stdout bypasses flow control so the callback
would fire synchronously anyway.

Initial measurements under hold-wheel_up against 1106-msg session
(30Hz for 6s):

  patches total    28,888
  optimized total  16,700   (ratio 0.58 — optimizer cuts ~42%)
  writeBytes       42 KB / 10s = 4.2 KB/s throughput
  drainMs p50      0.14 ms   terminal accepts bytes instantly
  drainMs p99      0.85 ms
  backpressure     0% of frames

This rules out the terminal-parse hypothesis — Cursor's xterm.js
drains our output in sub-millisecond time at only 4 KB/s.  The
remaining lag has to be in the render pipeline, not the wire.
Profile output now includes the bytes+drain+backpressure lines to
keep this visible on every subsequent iteration.
2026-04-26 17:06:22 -05:00
Brooklyn Nicholson d3dedf10aa revert(tui): drop DeferredMd, profiling showed it was neutral
Profiled with scripts/profile-tui.py under hold-PageUp + hold-wheel.
The placeholder → microtask-upgrade pattern did not reduce renderer
p99 (63ms → 63ms) or max (96ms → 142ms, slightly worse).  Each fresh
row still pays the Md cost — just on a follow-up commit instead of
inline — and the follow-up commit shows up as a second heavy frame
a few ms later.

The real bottlenecks turned out to be:

  1. wheel step too large (fixed in 7ca16eea)
  2. outer terminal ANSI parse throughput (diagnosing next)
  3. React commit frequency during hold-scroll (needs coalescing)

None of which DeferredMd addresses.  Clearing the complexity so the
next experiments land on a simpler substrate.
2026-04-26 17:03:38 -05:00
Brooklyn Nicholson 7ca16eea56 perf(tui): scroll one row at a time per wheel event, half-viewport per pageUp
User observation: "it doesn't scroll line by line/row by row."

Was right.  Two places hardcoded big deltas:

1. WHEEL_SCROLL_STEP = 6 (config/limits.ts)
   Each wheel event scrolled 6 rows.  A mechanical wheel notch emits
   3-5 events → 18-30 rows per click, which visually teleports past
   content instead of smooth-scrolling it.  Drop to 1.  Trackpads
   emit 50-100 events per flick — at step=1 that's still a fast flick
   (a whole viewport in one flick) but each intermediate frame is
   visible.  Porting claude-code's wheel accel state machine is the
   right next step if this feels sluggish on precision scrolls.

2. pageUp/pageDown = viewport - 2 (useInputHandlers.ts)
   Full-viewport jumps replace the entire screen — no visual
   continuity, can't scan content — AND land right at Ink's fast-path
   threshold (`delta < innerHeight`), which disqualifies the DECSTBM
   blit on every press.  Half-viewport keeps 50% continuity AND
   drops well under the threshold.  Two presses still cover the same
   total distance.
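
The two step sizes above can be sketched as follows. This is an illustration in Python only (the TUI itself is TypeScript); `page_step` and `fast_path_eligible` are hypothetical names, and the floor-division rounding for the half viewport is an assumption.

```python
WHEEL_SCROLL_STEP = 1  # rows per wheel event (previously 6)


def page_step(viewport_rows: int) -> int:
    """Rows moved per PageUp/PageDown press: half the viewport, at least 1."""
    return max(1, viewport_rows // 2)


def fast_path_eligible(delta: int, inner_height: int) -> bool:
    """The `delta < innerHeight` threshold quoted above, applied to |delta|."""
    return abs(delta) < inner_height
```

With a 40-row viewport a press moves 20 rows, which sits well under the threshold, while a delta equal to the inner height does not qualify.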

Profiled against the 1106-msg session, holding the key at 30Hz for
6s:

  wheel_up (step 6 → 1):
    frames       142  →  163    (+15%)
    throughput   10.7 → 15.8 fps (+48%)
    patches tot  53018→ 36562   (-31%)
    gap p50      5ms  → 16ms    (actual rendering ~60fps now)
    <16ms frames 93   → 76
    16-33ms      82   → 76
    hitches      3    → 1

  pageUp (viewport-2 → viewport/2):
    throughput   10.7 → 9.5 fps  (same ballpark — smaller delta × same
                                  event rate = less total scroll)

Ink's proportional drain caps at `innerHeight - 1` per frame to keep
the DECSTBM fast path firing.  With these smaller deltas every event
comfortably fits under that cap, so fast-path hit rate goes up and
patch volume per frame drops — the measured 31% reduction in total
patches-sent correlates with users perceiving smoother scrolling
because the outer terminal (VS Code / xterm.js / tmux) isn't drowning
in ANSI between paints.

Tests/type-check/build clean; 352 tests pass.
2026-04-26 17:01:22 -05:00
Brooklyn Nicholson 4a9070c9ac perf(tui): defer Md upgrade for fresh-mounted assistant rows
Adds DeferredMd — a wrapper around <Md> that renders a lightweight
<Text> placeholder on first mount and upgrades to the full markdown
subtree on a queueMicrotask follow-up. Rationale: fresh MessageLine
mounts during PageUp hold run our markdown tokenizer + syntax
highlighter synchronously, producing the 63-112ms renderer spikes
profiled earlier. A plain <Text> placeholder only needs Yoga to wrap
the pre-stripped string (no tokenizer, no highlight), then the Md
subtree builds in a follow-up React commit.

Upgrade cache: once a (theme, compact, text) tuple has been upgraded,
a WeakMap-keyed Set remembers it so remounts (scroll-out then
scroll-back) mount straight into <Md> — no placeholder round-trip.
WeakMap on theme means palette swaps re-upgrade naturally.

Honesty note: profiling under hold-PageUp showed this didn't reduce
renderer p99 measurably — the upgrade commit just pays the Md cost on
a follow-up frame instead of inline. The bigger bottleneck turned out
to be React commit frequency (3.5 commits/sec during 30Hz scroll
input, with 200ms+ silent gaps between commits dominating perceived
FPS), which this change doesn't address. Keeping the deferred path
anyway because:

  1. It's correct and tested — no regressions across 352 tests
  2. Defensive for pathological fresh-mount cases (giant code blocks,
     wide tables) that aren't in the current profile fixture
  3. Pairs naturally with useVirtualHistory's useDeferredValue to keep
     React's concurrent scheduler able to interrupt upgrade commits

If the follow-up perf investigation (terminal write throughput / patch
volume / commit frequency) shows DeferredMd is net-neutral-or-worse in
practice, this can be reverted with a one-line swap back to <Md> in
messageLine.tsx:115.

Companion to the streaming 2-column fix in 7242361a — these two
touched messageLine.tsx together so they land as a pair.
2026-04-26 16:56:09 -05:00
Brooklyn Nicholson 7242361a69 fix(tui): wrap streaming markdown split in column Box
StreamingMd returned <><Md/><Md/></> — a bare Fragment with two <Md>
children. Each <Md> returns a <Box flexDirection="column">, but its
parent in messageLine.tsx (line 169) is `<Box width={...}>` with no
flexDirection, which Ink defaults to 'row'. So during streaming the
two column boxes rendered side-by-side, producing the visible "tokens
jumble into two columns until it fixes itself" bug — the "fix" was
message.complete flipping isStreaming→false, which swaps the
StreamingMd subtree for a single DeferredMd/Md child (no siblings → row
direction is harmless).

Wrap the two <Md> siblings in a flexDirection="column" Box so they
stack. Localized fix so the non-streaming path (single-child, works
fine in a row parent) is untouched.

Reported by user:
> "tokens streaming... going into 2 columns randomly and jumbling
>  together until it fixes itself"

No test changes — findStableBoundary tests still pass (the layout
change is parent-structural, not in the boundary logic). Build clean,
tsc clean, 352 tests pass.
2026-04-26 16:55:56 -05:00
Brooklyn Nicholson cd7a200e6c perf(tui): instrument scroll fast-path decline reasons
Adds scrollFastPathStats counters to render-node-to-output.ts: captures
every time a ScrollBox's DECSTBM scroll hint is generated, records
whether the fast path took it (blit+shift from prevScreen) or declined,
and why. Exposed through hermes-ink's public exports and snapshotted on
every FrameEvent so the profiler harness can correlate decline reasons
with the actual patch/renderer cost per frame.

This is pure observation — no behaviour change. Preparing for the
virtual-history rewrite: the hypothesis was that our topSpacer/
bottomSpacer scheme disqualifies every scroll via heightDelta
mismatch, but the data shows the fast path is actually taken on most
scrolls (19/23 over a 6s PageUp hold through 1100 messages) — the
remaining steady-state renderer cost is Yoga tree traversal, not
the per-frame full redraw I initially suspected.

Declines that do happen correlate with React commits that changed the
mounted range mid-scroll (heightDelta=±3 to ±35). Those are the rarer
cases the virtualization rewrite still needs to address.

No test diffs — instrumentation-only.  Build verified: `tsc --noEmit`
plus the full `npm run build` compiler post-pass pass cleanly.
2026-04-26 16:45:53 -05:00
Brooklyn Nicholson 71eee26640 perf(tui): full-pipeline instrumentation + profiling harness
Extends HERMES_DEV_PERF to capture the complete render pipeline, not
just React commits. Adds scripts/profile-tui.py to drive repeatable
hold-PageUp stress tests against a real long session.

perfPane.tsx:
  Wires ink's onFrame callback (already plumbed through the fork) into
  the same perf.log as the React.Profiler samples. Captures per-phase
  timing (yoga calculateLayout, renderNodeToOutput, screen diff, patch
  optimize, stdout write) plus yoga counters (visited/measured/
  cacheHits/live) and patch counts per frame.  Events are tagged
  {src: 'react'|'frame'} so jq can split them.  logFrameEvent is
  undefined when HERMES_DEV_PERF is unset, so ink doesn't even attach
  the callback.

entry.tsx:
  Passes logFrameEvent into render().

types/hermes-ink.d.ts:
  Declares FrameEvent + onFrame on RenderOptions so the ui-tui side
  type-checks against the plumbed-through ink option.

scripts/profile-tui.py:
  New harness. Launches the built TUI under a PTY with the longest
  session in state.db resumed, holds PageUp/PageDown/etc at a
  configurable Hz for N seconds, then parses perf.log and prints
  per-phase p50/p95/p99/max plus yoga-counter summaries. Zero deps
  beyond stdlib. Exit 2 if nothing was captured (wiring broken).
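
The summary step of such a harness might look like the sketch below. It assumes perf.log is JSON-lines with the `src` tag described above; the field name `renderer_ms`, the helper names, and the nearest-rank percentile method are all assumptions for illustration, not the script's actual code.

```python
import json


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer in 1..100."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceiling division
    return ordered[rank - 1]


def summarize(lines, field):
    """Summarize one numeric field across frame events in JSON-lines input."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Only ink frame events carry per-phase timings; skip React samples.
        if event.get("src") == "frame" and field in event:
            samples.append(float(event[field]))
    if not samples:
        return None  # nothing captured: the case the harness reports as exit 2
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
        "max": max(samples),
    }
```

Filtering on `src` up front keeps React.Profiler samples from skewing the frame-phase percentiles.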

Initial findings (1106-msg session, 6s PageUp hold at 30Hz):
  - Steady state: 10 fps; renderer phase p99=63ms, write p99=0.2ms
  - 4/107 heavy frames (>=16ms), all dominated by renderNodeToOutput
  - One pathological 97ms frame with Yoga measuring 70,415 text cells
    and visiting 225k nodes (the cold-unmeasured-region hit)
  - Ink's scroll fast-path (DECSTBM blit from prevScreen) is
    disqualified because our spacer-based virtual history doesn't
    keep heightDelta in sync with scroll.delta, so every PageUp step
    falls through to a full 2000-4800 patch re-render instead of ~40
2026-04-26 16:36:25 -05:00
Brooklyn Nicholson 69ff201050 feat(tui): anchor todo panel above streaming output 2026-04-26 16:26:50 -05:00
Brooklyn Nicholson 2259eac49e feat(tui): collapse completed todo panel on turn end 2026-04-26 16:24:15 -05:00
Brooklyn Nicholson cb7cfba6de fix(cli): surface last_active in search_sessions so -c works 2026-04-26 16:21:57 -05:00
Brooklyn Nicholson debae25f1c perf(tui): incremental markdown during streaming
Split in-flight assistant text at the last stable block boundary so only
the unclosed tail re-tokenizes per stream delta. Previously the full
text was rendered as plain <Text> during streaming and only flipped to
<Md> at message.complete — cheap per delta but loses live markdown
formatting.

New StreamingMd component holds a monotonically-growing stablePrefix
in a ref (idempotent under StrictMode double-render), renders it as
one <Md> that memoizes across deltas, and renders the unstable suffix
as a second <Md> that re-parses on each delta. Cost per delta drops
from O(total length) to O(unstable length).

findStableBoundary walks back to the last "\n\n" outside an open
fenced code block — splitting inside an open fence would orphan the
opener and break highlighting in the prefix.
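
A minimal sketch of the boundary search, written in Python for illustration (the real helper is TypeScript and walks backwards; this version scans forward and tracks fence state, which should find the same split point). The name `find_stable_boundary` and the simple toggle-on-fence-line rule are assumptions.

```python
FENCE = "`" * 3  # fenced-code marker, built this way to keep the sketch fence-safe


def find_stable_boundary(text: str) -> int:
    """Index just past the last blank-line break outside a fenced block, or 0."""
    boundary = 0
    in_fence = False
    pos = 0
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        elif line == "\n" and not in_fence and text[pos - 1:pos] == "\n":
            # An empty line right after a newline forms "\n\n"; everything
            # up to and including it can be treated as the stable prefix.
            boundary = pos + 1
        pos += len(line)
    return boundary
```

Everything before the returned index would go to the memoized prefix, and the remainder is the tail that re-parses on each delta.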

Adapted from claude-code's src/components/Markdown.tsx:186 but built
on our line-based tokenizer instead of marked.lexer. 9 new tests cover
fence balance, boundary walk, and empty input.

Part of the --tui perf audit (see audit #7).
2026-04-26 16:21:34 -05:00
Brooklyn Nicholson bde89c169b fix(cli): -c picks the most recently used session 2026-04-26 16:17:39 -05:00
Brooklyn Nicholson b36007b246 feat(tui): allow collapsing archived todo panels 2026-04-26 16:15:59 -05:00
Brooklyn Nicholson c78b528125 feat(tui): archive todos at turn end with incomplete hint 2026-04-26 16:14:58 -05:00
Brooklyn Nicholson 319c1c1691 fix(tui): inline todo in transcript, group across thinking 2026-04-26 16:09:28 -05:00
Brooklyn Nicholson 4943ea2a7c fix(tui): merge tools into contextual shelves 2026-04-26 16:00:38 -05:00
Brooklyn Nicholson 4d3e3a738d chore(tui): sort imports 2026-04-26 15:56:47 -05:00
Brooklyn Nicholson a5319fb7af test(tui): cover live todo completion flow 2026-04-26 15:56:08 -05:00
Brooklyn Nicholson f5552f92e2 fix(tui): stabilize live todo progress 2026-04-26 15:55:38 -05:00
Brooklyn Nicholson 1566f1eecc fix(tui): report actual session on exit 2026-04-26 15:55:01 -05:00
Brooklyn Nicholson a30db69dd5 chore(tui): clean live progress lint 2026-04-26 15:42:07 -05:00
Brooklyn Nicholson f6846205cc fix(tui): isolate turn state from app render 2026-04-26 15:40:38 -05:00
Brooklyn Nicholson 6a3873942f fix(tui): format thinking paragraphs 2026-04-26 15:38:18 -05:00
Brooklyn Nicholson 64de685d3f test(tui): remove stale turn freeze experiment 2026-04-26 15:35:41 -05:00
Brooklyn Nicholson cee4036e8b fix(tui): merge tool shelves in transcript 2026-04-26 15:35:38 -05:00
Brooklyn Nicholson cf8439263a fix(tui): keep todo pinned outside transcript 2026-04-26 15:33:01 -05:00
Brooklyn Nicholson 3271ffbd80 fix(tui): pin todo panel above live output 2026-04-26 15:27:31 -05:00
Brooklyn Nicholson a7831b63db fix(tui): stabilize live progress rendering 2026-04-26 15:23:43 -05:00
Brooklyn Nicholson d4dde6b5f2 fix(tui): restore resumed transcript lineage 2026-04-26 15:16:12 -05:00
Teknium 755a280424 chore(release): map Wang-tianhao in AUTHOR_MAP 2026-04-26 13:02:51 -07:00
Wang-tianhao 6087e04043 fix(slack): extract rich_text quotes/lists and link unfurl previews
Slack's modern composer sends messages with a 'blocks' array that
contains rich_text elements. When a user forwards or quotes another
message, the quoted content shows up in the rich_text_quote children
of that array — and is NOT included in the plain 'text' field. The
agent saw only the lossy plain text and was blind to forwarded /
quoted content. Same story for link unfurl previews (Notion, docs,
GitHub, etc.) which Slack puts in the 'attachments' array.

Two fixes in the inbound handler:

1. _extract_text_from_slack_blocks walks rich_text / rich_text_quote /
   rich_text_list / rich_text_preformatted trees and renders readable
   text ('> quoted', '• bullet', code fences), dedupes against the
   plain text field, and appends the extracted content so the agent
   sees everything.

2. Link unfurl / attachment preview extraction reads title, url,
   body, and footer from the 'attachments' array and appends a
   '📎 [title](url)\n   body\n   _footer_' section per preview.
   Skips is_msg_unfurl to avoid echoing our own Slack replies back.
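
The rich_text walk in item 1 could be sketched roughly like this. It is not the gateway's `_extract_text_from_slack_blocks`; `extract_rich_text` and `_inline_text` are hypothetical names, only `text` and `link` inline elements are handled, and dedup against the plain text field is left out.

```python
FENCE = "`" * 3  # code-fence marker used when rendering preformatted blocks


def _inline_text(elements: list) -> str:
    """Join the text of inline children; links fall back to their URL."""
    parts = []
    for el in elements:
        if el.get("type") == "text":
            parts.append(el.get("text", ""))
        elif el.get("type") == "link":
            parts.append(el.get("text") or el.get("url", ""))
    return "".join(parts)


def extract_rich_text(blocks: list) -> str:
    """Render rich_text blocks as readable plain text."""
    lines = []
    for block in blocks:
        if block.get("type") != "rich_text":
            continue  # section/actions/etc. are out of scope here
        for el in block.get("elements", []):
            kind = el.get("type")
            children = el.get("elements", [])
            if kind == "rich_text_section":
                lines.append(_inline_text(children))
            elif kind == "rich_text_quote":
                lines.extend("> " + ln for ln in _inline_text(children).splitlines())
            elif kind == "rich_text_list":
                lines.extend("• " + _inline_text(item.get("elements", [])) for item in children)
            elif kind == "rich_text_preformatted":
                lines.extend([FENCE, _inline_text(children), FENCE])
    return "\n".join(ln for ln in lines if ln.strip())
```

The result would be appended to the message text for the agent, while routing decisions keep using the original `text` field.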

Routing is careful not to trust augmented text: mention gating
(is_mentioned) and slash-command detection both run against the
original 'text' field, so forwarded content containing '<@bot>' or
'/deploy' in a quote can't trick the bot into responding in a
channel it shouldn't or classifying a normal message as a command.

Adjustment from original PR: dropped _serialize_slack_blocks_for_agent,
which inlined a redacted JSON dump of non-rich_text blocks (section,
accessory, actions, etc.) — the agent would see the raw Block Kit
structure for UI-heavy alerts. It added up to 6000 characters to the
prompt context on every qualifying message with no opt-out. The
rich_text extraction and attachment unfurls cover the common bug-fix
case (quoted/forwarded content + link previews) without the prefill
tax. If a user needs block inspection later, it can return as a
config opt-in.

Also updates the Slack platform notes in session.py to accurately
describe what the gateway inlines.
2026-04-26 13:02:51 -07:00
Teknium 4921b26945 fix(cron): keep homeassistant toolset enabled when HASS_TOKEN is set (#16208)
After #14798 made cron honor per-platform `hermes tools` config, the
`_DEFAULT_OFF_TOOLSETS` filter silently stripped `homeassistant` from
cron jobs for users who'd been relying on the previous blanket toolset.
Norbert's HA cron reports regressed as a result.

The HA toolset is already runtime-gated by its `check_fn` (requires
HASS_TOKEN to register any tools). When HASS_TOKEN is set the user has
explicitly opted in — `_DEFAULT_OFF_TOOLSETS` adds nothing in that case,
so stop double-gating and restore HA for cron / cli / other platforms
without an explicit saved toolset list.

moa and rl stay off by default (original #14798 goal preserved).

Fixes HA cron regression reported by Norbert.
2026-04-26 12:55:58 -07:00
Teknium 822b507a72 chore(release): map maxims-oss in AUTHOR_MAP 2026-04-26 12:54:46 -07:00
maxims-oss 18beb69b49 fix(memory): close embedded Hindsight async client cleanly
HindsightEmbedded.close() delegates to its sync client.close(). When Hermes
created/used that client on the shared async loop, closing it from the main
thread raises 'attached to a different loop' before aiohttp releases the
session — so the ClientSession / TCPConnector leak past provider teardown.

Close the embedded inner async client on the shared loop first via
_run_sync(inner_client.aclose()), then let the wrapper's sync close()
do its daemon/UI bookkeeping.
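
The ordering described above (async teardown on the owning loop, then the sync close) can be illustrated with a small stand-alone sketch. `SharedLoop`, `FakeAsyncClient`, and `close_embedded` are hypothetical stand-ins, not Hermes or Hindsight classes.

```python
import asyncio
import threading


class SharedLoop:
    """An event loop running forever in a daemon thread."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.thread.start()

    def run_sync(self, coro, timeout: float = 5.0):
        """Run a coroutine on the shared loop and block for its result."""
        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
        return future.result(timeout)


class FakeAsyncClient:
    """Stand-in for a loop-bound async client (for example an HTTP session)."""

    def __init__(self) -> None:
        self.closed_on = None

    async def aclose(self) -> None:
        self.closed_on = asyncio.get_running_loop()


def close_embedded(shared: SharedLoop, inner_client, sync_close) -> None:
    shared.run_sync(inner_client.aclose())  # async teardown on the owning loop
    sync_close()                            # then the wrapper's sync bookkeeping
```

Closing on the loop that owns the client is what avoids the 'attached to a different loop' failure described above.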

Salvage of #14605: test placement rebased — appended TestShutdown class
after TestSharedEventLoopLifecycle (which landed on main after the PR was
written). Original author attribution preserved.
2026-04-26 12:54:46 -07:00
Tranquil-Flow bf05b8f4a2 fix(gateway): clean up cached agents on shutdown (#11205) 2026-04-26 12:51:53 -07:00
Zainan Victor Zhou 778fd1898e fix(slack): surface attachment access diagnostics
Translate Slack attachment failures into actionable user-facing notices
instead of generic download errors. When a scope/auth/permission issue
breaks attachment processing, the user sees:

  [Slack attachment notice]
  - Slack attachment access failed for photo.jpg. Missing scope:
    files:read. Update the Slack app scopes/settings and reinstall
    the app to the workspace.

Two helpers do the translation:

  _describe_slack_api_error — handles SlackApiError responses
    (missing_scope, invalid_auth, file_not_found, access_denied, etc.)

  _describe_slack_download_failure — handles httpx.HTTPStatusError
    (401/403/404) and Slack-returns-HTML-sign-in fallbacks

Wired into three existing call sites:
 - the Slack Connect files.info path (PR #11111) so scope errors
   surface instead of being logged as generic "files.info failed"
 - the image, audio, and document download paths so 401/403 and
   HTML-body responses translate into actionable notices

Adjustment from original PR: dropped _probe_slack_file_access_issue,
the proactive pre-download files.info probe. It added one extra
Slack API call per attachment even on healthy ones, and overlapped
with the existing files.info call from PR #11111. The post-failure
translation path covers the same user-facing diagnostic value
without the per-message tax.

Also documents files:read scope more prominently in the Slack setup
guide and troubleshooting table.

Contributed back from https://github.com/xinbenlv/zn-hermes-agent.

Closes #7015.
Co-authored-by: xinbenlv <zzn+pa@zzn.im>
2026-04-26 12:47:43 -07:00
Teknium 45bfcb9e71 test: update bare-agent helper for live-runtime attrs added by #16099
Background review fork now inherits session_id, credential_pool, and
status_callback from the parent (added in #16099 after this PR was
written). Extend the bare-agent helper so the regression test keeps
reaching the cleanup assertions instead of failing in the runtime
resolver.

Signed-off-by: Teknium <8425893+teknium1@users.noreply.github.com>
2026-04-26 12:45:39 -07:00
MRHwick aa7b5acfcd pass attribution check 2026-04-26 12:45:39 -07:00
MRHwick 36e352afa7 preserve the original comment 2026-04-26 12:45:39 -07:00
MRHwick 2d86e97a7e fix(run_agent): shut down background review memory providers
Temporary background review agents can initialize Hindsight-backed memory clients, but close() alone skips provider teardown. Shut the memory provider down before closing so aiohttp sessions do not leak at process exit.

Made-with: Cursor
2026-04-26 12:45:39 -07:00
Teknium edadeaf495 chore(release): map Satoshi-agi and kunlabs in AUTHOR_MAP 2026-04-26 12:35:16 -07:00
kunlabs f9885130b4 fix(slack): download files in Slack Connect channels
Slack Connect channels return file objects with file_access="check_file_info"
and no url_private_download field (see
https://docs.slack.dev/reference/objects/file-object/#slack_connect_files).
These stub objects must be resolved via files.info before download can
proceed. Without this the agent silently skips attachments posted in
Slack Connect channels.

Call files.info on every file whose file_access is check_file_info,
replace the stub with the full file object, and let the existing
download path continue. Warn and skip on files.info failures.
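
A rough sketch of the stub-resolution step, assuming a hypothetical synchronous `files_info(file_id)` callable that returns the full file object (the real adapter is async and goes through Slack's files.info Web API method):

```python
import logging

logger = logging.getLogger(__name__)


def resolve_connect_files(files, files_info):
    """Replace Slack Connect stub file objects with their full versions."""
    resolved = []
    for f in files:
        if f.get("file_access") == "check_file_info":
            try:
                f = files_info(f["id"])  # full object, including url_private_download
            except Exception as exc:
                logger.warning("files.info failed for %s: %s", f.get("id"), exc)
                continue  # warn and skip, as described above
        resolved.append(f)
    return resolved
```

Files that already carry a download URL pass through untouched, so the existing download path sees a uniform list.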

Closes #11095.
2026-04-26 12:35:16 -07:00
flobo3 f414df3a56 fix(slack): include team_id in thread-context cache key 2026-04-26 12:35:16 -07:00
Satoshi-agi c0d25df311 fix(slack): preserve thread-parent context when cron/bot posted the parent
The Slack thread-context fetcher used to drop every message with a
bot_id, which silently erased the thread parent whenever a cron job (or
any other bot) had posted it. As a result, replies to a cron-posted
summary lost all context and the agent answered as if from a blank
thread.

Changes:

1. gateway/platforms/slack.py::_fetch_thread_context
   - Keep the thread parent even when it was posted by a bot
     (e.g. cron summaries, third-party integrations).
   - Only skip *our own* prior bot replies to avoid circular context,
     matching the per-workspace bot user id via _team_bot_user_ids so
     multi-workspace deployments stay correct.
   - Keep non-self bot children (useful third-party context).

2. gateway/platforms/slack.py::_handle_slack_message
   - Populate MessageEvent.reply_to_text for thread replies (parity
     with Telegram/Discord/Feishu/WeCom). gateway.run uses this field
     to inject a [Replying to: "..."] prefix when the parent is not
     already in the session history, which is exactly the scenario
     triggered by cron-generated thread parents.
   - New helper _fetch_thread_parent_text reuses the existing thread-
     context cache (and its 60s TTL) to avoid duplicate
     conversations.replies calls; falls back to a cheap limit=1 fetch
     when the cache is cold.
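
The filtering rule from item 1 can be sketched as below. `filter_thread_context` is a hypothetical name; the sketch assumes each message carries `ts`, `thread_ts`, and `user`, and that the caller has already looked up the workspace's own bot user id.

```python
def filter_thread_context(messages, self_bot_user_id, current_ts):
    """Pick the thread messages worth showing to the agent."""
    kept = []
    for msg in messages:
        ts = msg.get("ts")
        if ts == current_ts:
            continue  # the message being answered is not context
        is_parent = msg.get("thread_ts", ts) == ts
        is_self = msg.get("user") == self_bot_user_id
        if is_self and not is_parent:
            continue  # our own earlier replies would be circular
        kept.append(msg)
    return kept
```

Note that a parent posted by the bot itself (the cron case) is kept, because the self check only applies to replies.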

Tests:

- Updated TestSlackThreadContext::test_skips_bot_messages to reflect
  the new behaviour (self-bot child dropped, third-party bot kept).
- Added:
    * test_fetch_thread_context_includes_bot_parent
    * test_fetch_thread_context_excludes_self_bot_replies
    * test_fetch_thread_context_multi_workspace
    * test_fetch_thread_context_current_ts_excluded (regression guard)
    * test_fetch_thread_parent_text_from_cache
    * test_slack_reply_to_text_set_on_thread_reply
    * test_slack_reply_to_text_none_for_top_level_message

Full Slack suite: 176 passed (was 169).
2026-04-26 12:35:16 -07:00
helix4u 10e36188da fix(cli): wire approvals in background tasks 2026-04-26 12:29:48 -07:00
Teknium 6a3102f9d4 chore(release): map hhuang91 in AUTHOR_MAP 2026-04-26 12:29:02 -07:00
bde3249023 75d3eaa0e4 fix(slack): exclude U/W user IDs from explicit target regex
Slack's chat.postMessage API rejects user IDs (U...) and Enterprise
Grid user IDs (W...); they are not valid conversation IDs. Posting to them
fails because the API requires a channel ID (C/G/D). To DM a user,
the sender must first call conversations.open to obtain a D... ID.

Tighten _SLACK_TARGET_RE from [CGDUW] to [CGD] so the send path rejects
U/W values as explicit targets and instead falls through to channel-
name resolution (where they'll fail with a clear 'could not resolve'
error rather than silently getting stuck in a retry loop on the API).
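
For illustration, a tightened pattern might look like the following. Only the character class change ([CGDUW] to [CGD]) comes from the commit; the rest of the pattern, including the length bound, and the helper name are assumptions.

```python
import re

# Hypothetical pattern: a C/G/D prefix followed by uppercase alphanumerics.
_SLACK_TARGET_RE = re.compile(r"^[CGD][A-Z0-9]{8,}$")


def is_explicit_slack_target(ref: str) -> bool:
    """True when ref looks like a channel, group, or DM conversation ID."""
    return bool(_SLACK_TARGET_RE.match(ref))
```

Anything that fails the check, including U/W values, would fall through to name resolution.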

Flip the corresponding regression test to assert U/W values are not
explicit. Matches the narrower regex briandevans proposed in #15939.

Co-authored-by: briandevans <brian@bde.io>
2026-04-26 12:29:02 -07:00
hhuang91 802c7acb81 fix(Slack): resolve Slack channels by raw ID and enumerate joined channels
send_message(target='slack:<channel_id>') failed with "Could not
resolve" because _parse_target_ref had no Slack branch — Slack's
uppercase alphanumeric IDs fell through to channel-name resolution,
which only matched by name. As a fallback, the agent would retry with
bare target='slack' and post to the home channel instead.

Three fixes:

- _parse_target_ref recognizes Slack IDs (C/G/D/U/W prefix) as
  explicit targets so the name-resolver is bypassed entirely.
- resolve_channel_name tries a case-sensitive raw-ID match before
  the existing name match, so any platform's IDs resolve cleanly.
- _build_slack now actually calls users.conversations against each
  workspace's AsyncWebClient (paginated), instead of only returning
  session-history entries. This populates the directory with public
  and private channels the bot has joined, so action='list' shows
  them and they can also be addressed by name. Errors from one
  workspace don't block others.
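
The raw-ID-before-name lookup in the second bullet could be sketched like this. The directory shape (a list of `{"id", "name"}` dicts) and the `#` stripping are assumptions made for the example.

```python
def resolve_channel_name(target, directory):
    """Return the channel ID for a raw ID or a channel name, else None."""
    for entry in directory:
        if entry["id"] == target:  # case-sensitive raw-ID match first
            return entry["id"]
    wanted = target.lstrip("#").lower()
    for entry in directory:
        if entry["name"].lower() == wanted:  # then the existing name match
            return entry["id"]
    return None
```

Keeping the ID comparison case-sensitive means a lowercased ID does not accidentally match, while names stay forgiving.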

build_channel_directory becomes async (Slack web calls require it).
The two async-context callers in gateway/run.py are awaited; the
cron ticker thread call bridges via asyncio.run_coroutine_threadsafe.

Slack bot needs channels:read and groups:read scopes for full
enumeration; missing scopes degrade gracefully per-workspace.

addressing #15927
2026-04-26 12:29:02 -07:00
Teknium 541cd732e8 chore(models): drop deepseek from OpenRouter and Nous Portal curated picker lists (#16197)
Removes deepseek/deepseek-v4-pro and deepseek/deepseek-v4-flash from
OPENROUTER_MODELS and _PROVIDER_MODELS['nous'], then regenerates
website/static/api/model-catalog.json so the hosted picker JSON drops
them too. Direct-API deepseek provider support is unchanged.
2026-04-26 12:28:17 -07:00
Teknium 4d119bb62a test: blank platform-gating env vars in hermetic fixture
load_gateway_config() has a side effect: when config.yaml contains
platform-gating keys (slack.require_mention, slack.strict_mention,
slack.free_response_channels, slack.allow_bots, slack.reactions, plus
analogous keys for discord/telegram/whatsapp/dingtalk/matrix), it calls
os.environ[KEY] = ... to bridge them to env-var form.

monkeypatch.delenv doesn't track direct os.environ mutations made
inside the test body, so tests that call load_gateway_config() leak
those env vars into later tests on the same xdist worker. The failure
mode is flaky and seed-dependent:
test_top_level_message_requires_mention_even_with_session (and siblings
in TestThreadReplyHandling) pass when SLACK_REQUIRE_MENTION is unset
but fail when a leaked value of 'false' is present.

Add the gating env vars to _HERMES_BEHAVIORAL_VARS so the hermetic
autouse fixture blanks them on every test setup, closing the leak
regardless of which test sets them.
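
The blanking idea can be shown with a stdlib-only sketch. The real change extends a pytest autouse fixture; here a context manager stands in for it, and `hermetic_env` plus the two-entry variable list are hypothetical.

```python
import os
from contextlib import contextmanager

# Illustrative subset; the real list covers every platform-gating variable.
_BEHAVIORAL_VARS = ("SLACK_REQUIRE_MENTION", "SLACK_STRICT_MENTION")


@contextmanager
def hermetic_env(names=_BEHAVIORAL_VARS):
    """Blank the named variables for the body, then restore the prior state."""
    saved = {name: os.environ.pop(name, None) for name in names}
    try:
        yield
    finally:
        for name, value in saved.items():
            if value is None:
                os.environ.pop(name, None)  # drop anything the body leaked
            else:
                os.environ[name] = value
```

Because the variables are removed on entry, a value leaked by an earlier test cannot influence the body, whichever test set it.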
2026-04-26 12:23:20 -07:00
Teknium 878c196738 chore(release): map hhhonzik in AUTHOR_MAP 2026-04-26 12:23:20 -07:00
Honza Stepanovsky 50dd67c680 fix(slack): skip _mentioned_threads registration when strict_mention is on
Extends the strict_mention feature so an @mention in strict mode no
longer persistently tags the thread as 'mentioned'. Without this, the
thread's first mention would permanently auto-trigger the bot on every
subsequent message — which is exactly what strict_mention is designed
to prevent. Closes the agent-to-agent ack loop hole hhhonzik identified
in #14117.

Co-authored-by: hhhonzik <me@janstepanovsky.cz>
2026-04-26 12:23:20 -07:00
Ching aea4a90f0e feat(slack): add opt-in slack.strict_mention gate for channel threads
Adds a strict_mention config option that, when enabled, requires an
explicit @-mention on every message in channel threads. Disables the
'once mentioned, forever in the thread' and session-presence auto-triggers.

- New _slack_strict_mention() helper (config.extra + SLACK_STRICT_MENTION env)
- Bridged top-level slack.strict_mention yaml to SLACK_STRICT_MENTION env,
  matching require_mention/allow_bots bridging
- Unit tests for the helper + config bridge
2026-04-26 12:23:20 -07:00
Teknium 897dc3a2bb fix(install+update): add /usr/local/bin PATH guard for RHEL root non-login shells (#16191)
* fix(install): add /usr/local/bin PATH guard for RHEL root non-login shells

The FHS-layout branch assumed /usr/local/bin is on PATH for every
standard shell. That holds for login shells (via /etc/profile's
pathmunge) but breaks on RHEL/CentOS/Rocky/Alma 8+ root in non-login
interactive shells (su, sudo -s, tmux panes, some web terminals) —
/etc/bashrc does not add /usr/local/bin and /root/.bash_profile
doesn't either. Result: hermes command links to /usr/local/bin/hermes
but the user has to type the absolute path each time.

Probe a fresh 'bash -i -c' (non-login interactive, matching the user
scenario) after symlinking. If hermes isn't resolvable, append an
idempotent PATH guard to /root/.bashrc and /root/.bash_profile, same
grep pattern already used by the ~/.local/bin branch below. No change
on distros where /usr/local/bin is already inherited.

* fix(update): repair RHEL root PATH on hermes update

Existing RHEL/CentOS/Rocky/Alma root installs won't be repaired by the
install.sh fix alone because 'hermes update' is an in-place git pull, not
a rerun of install.sh. Port the same probe + idempotent .bashrc write
into cmd_update so affected users get fixed automatically on next update.

_ensure_fhs_path_guard() runs after 'Update complete!':
- Linux + root + FHS-layout install (command at /usr/local/bin/hermes) only
- Probe: env -i bash -i -c 'command -v hermes' — fresh non-login interactive
  shell, same scenario the user reports
- On failure, append PATH guard to /root/.bashrc and /root/.bash_profile,
  skipping if any uncommented PATH line already mentions /usr/local/bin
- Silent no-op on macOS, non-root, legacy layout, or shells that already
  resolve hermes
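
The idempotent append can be sketched as two pure functions over the rc file's text. The exact guard line and the function names are assumptions; the skip rule (any uncommented PATH line that mentions /usr/local/bin) follows the description above.

```python
PATH_GUARD = 'export PATH="/usr/local/bin:$PATH"'  # assumed guard line


def has_path_entry(rc_text: str) -> bool:
    """True if an uncommented PATH line already mentions /usr/local/bin."""
    for line in rc_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue
        if "PATH" in stripped and "/usr/local/bin" in stripped:
            return True
    return False


def with_path_guard(rc_text: str) -> str:
    """Return rc_text with the guard appended, unchanged if already covered."""
    if has_path_entry(rc_text):
        return rc_text
    separator = "" if not rc_text or rc_text.endswith("\n") else "\n"
    return rc_text + separator + PATH_GUARD + "\n"
```

Applying `with_path_guard` twice yields the same text as applying it once, which is what makes rerunning the update safe.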
2026-04-26 12:22:37 -07:00
Brooklyn Nicholson 350ee1bf23 refactor(tui): render progress in ordered stream timeline 2026-04-26 14:12:43 -05:00
Brooklyn Nicholson 3d21f97422 fix(tui): keep live tool state before stream segments 2026-04-26 14:06:42 -05:00
Teknium 4b5a88d714 fix(slack): honor reply_in_thread=false for top-level channel messages
Top-level channel messages arrive at _resolve_thread_ts with
metadata.thread_id set to the message's own ts, because the inbound
handler in _handle_message_event uses 'event.ts' as a session-keying
fallback when event.thread_ts is absent. That made metadata alone
insufficient to distinguish a real thread reply from a top-level
message, so reply_in_thread=false only took effect in DMs.

Use reply_to (== incoming message_id == ts for top-level messages) as
the tiebreaker: when metadata.thread_id == reply_to the 'thread' is the
synthetic session-keying fallback, not a real parent, so we reply
directly in the channel. Real thread replies (reply_to != thread_id)
still resolve to the parent thread and preserve conversation context.
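
The tiebreaker can be written down as a small pure function. This is a sketch of the rule as described, not the adapter's `_resolve_thread_ts`; the flat argument list is an assumption.

```python
def resolve_thread_ts(thread_id, reply_to, reply_in_thread):
    """Return the thread ts to reply into, or None to post in the channel."""
    if not thread_id:
        return None
    # thread_id == reply_to means the "thread" is only the session-keying
    # fallback for a top-level message, not a real parent.
    synthetic = thread_id == reply_to
    if synthetic and not reply_in_thread:
        return None
    return thread_id
```

A real thread reply has a `reply_to` that differs from `thread_id`, so it keeps its parent even when `reply_in_thread` is false.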

Closes #9268.
2026-04-26 12:04:46 -07:00
bde3249023 b1be86ef96 fix(gateway): bridge slack.reply_in_thread config 2026-04-26 12:04:46 -07:00
Brooklyn Nicholson 7b5b524fc7 refactor(tui): clean thinking and viewport helpers 2026-04-26 14:03:36 -05:00
Brooklyn Nicholson a30ffbe1d4 fix(tui): show queued prompts when drained 2026-04-26 14:01:14 -05:00
Brooklyn Nicholson c9f7b703dd fix(tui): filter thinking status noise 2026-04-26 13:59:56 -05:00
Brooklyn Nicholson a8bfe72d35 fix(tui): address latest review feedback 2026-04-26 13:56:26 -05:00
Teknium ae7687cdc5 chore(release): map zhiyanliu in AUTHOR_MAP 2026-04-26 11:56:23 -07:00
sgaofen c730f6cc0b test(gateway): cover Slack vs non-Slack home-channel onboarding hint
Parameterize the test helpers in test_status_command.py to accept a
Platform and add two regression tests ensuring the first-run home-channel
onboarding uses '/hermes sethome' on Slack and '/sethome' everywhere else.

Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com>
2026-04-26 11:56:23 -07:00
Zhi Yan Liu d993a3f450 fix(gateway): use /hermes sethome in onboarding hint on Slack
Slack's adapter registers a single parent slash command /hermes and
dispatches subcommands via slack_subcommand_map(). Bare /sethome is
not a registered command on Slack and fails with 'app did not
respond', logging 'Unhandled request' in slack_bolt.AsyncApp.

Show /hermes sethome in the first-run onboarding hint when the
source platform is Slack; keep /sethome for Telegram, Discord,
Matrix, Mattermost, and other platforms that register it directly.

Fixes #14632
2026-04-26 11:56:23 -07:00
Teknium 1dfcc2ffc3 fix(gateway): /queue is now a true FIFO — each invocation gets its own turn (#16175)
Repeated /queue commands now each produce a full agent turn, in order,
with no merging.  Previously the second /queue overwrote the first
because the handler wrote directly into the adapter's single-slot
_pending_messages dict.

- GatewayRunner grows a _queued_events overflow buffer (dict of list).
- /queue puts new items in the adapter's next-up slot when free,
  otherwise appends to the overflow.  After each run's drain consumes
  the slot, the next overflow item is promoted so the recursive run
  picks it up.
- /new and /reset clear the overflow.
- /status now reports queue depth when non-zero.
- Ack message shows the depth once it exceeds 1.

Helpers (_enqueue_fifo, _promote_queued_event, _queue_depth) use the
getattr default-fallback pattern so existing tests that build bare
GatewayRunner instances via object.__new__ keep working.
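
A minimal model of the slot-plus-overflow scheme is sketched below. `FifoQueue` and its method names are hypothetical; in the gateway the slot lives on the adapter and the overflow on GatewayRunner.

```python
class FifoQueue:
    """Single next-up slot per session plus an ordered overflow list."""

    def __init__(self) -> None:
        self.pending = {}   # session -> next-up event (single slot)
        self.overflow = {}  # session -> list of later events, in order

    def enqueue(self, session, event) -> int:
        if session not in self.pending:
            self.pending[session] = event
        else:
            self.overflow.setdefault(session, []).append(event)
        return self.depth(session)

    def drain(self, session):
        """Consume the slot, then promote the next overflow item into it."""
        event = self.pending.pop(session, None)
        queue = self.overflow.get(session)
        if queue:
            self.pending[session] = queue.pop(0)
            if not queue:
                del self.overflow[session]
        return event

    def depth(self, session) -> int:
        slot = 1 if session in self.pending else 0
        return slot + len(self.overflow.get(session, []))

    def clear(self, session) -> None:
        self.pending.pop(session, None)
        self.overflow.pop(session, None)
```

Each `drain` returns exactly one event, so three enqueues produce three separate turns in the order they were queued.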
2026-04-26 11:55:09 -07:00
Teknium 5b2c59559a feat(terminal): collapse subagent task_ids to shared container (#16177)
Before: delegate_task children each allocated their own terminal
sandbox keyed by child task_id. Starting extra containers (or Modal
sandboxes / Daytona workspaces) is expensive, and the subagent's work
is invisible to the parent — files written by the child in its
container don't exist in the parent's when the subagent returns.

After: a single `_resolve_container_task_id` helper maps any
tool-call task_id to "default" UNLESS an env override is registered
for it. The parent agent and all delegate_task children therefore
share one long-lived sandbox — installed packages, cwd, /workspace
files, and /tmp scratch carry over freely between them.

RL and benchmark environments (TerminalBench2, HermesSweEnv, ...)
opt in to isolation via `register_task_env_overrides(task_id, {...})`;
those task_ids survive the collapse and get their own sandbox,
preserving the per-task Docker image behavior these benchmarks rely on.
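
The collapse rule is small enough to sketch directly. The function names follow the commit text, but the module-level registry and the bodies below are assumptions, not the project's implementation.

```python
_task_env_overrides = {}  # task_id -> overrides, as registered by benchmarks


def register_task_env_overrides(task_id, overrides) -> None:
    """Opt a task_id into its own sandbox by registering env overrides."""
    _task_env_overrides[task_id] = dict(overrides)


def resolve_container_task_id(task_id):
    """Map a tool-call task_id to the sandbox key it should use."""
    if task_id and task_id in _task_env_overrides:
        return task_id  # isolated: RL/benchmark environments
    return "default"    # shared: parent agent and delegate_task children
```

Under this rule an unregistered child id and the parent both land on "default", while a registered benchmark id keeps its own key.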

file_state / active-subagents registry / TUI events still key off the
original child task_id, so the 'subagent wrote a file the parent read'
warning and UI per-subagent panels keep working.

Tradeoff: parallel delegate_task children (tasks=[...]) now share one
bash/container. Concurrent cd, env-var mutations, and writes to the
same path will collide. If that bites a specific workflow, the
subagent can opt back into isolation via register_task_env_overrides.

Applied at these lookup sites:
- tools/terminal_tool.py terminal_tool() and get_active_env()
- tools/file_tools.py _get_file_ops() and _get_live_tracking_cwd()
- tools/code_execution_tool.py _get_or_create_environment()

Docs: website/docs/user-guide/configuration.md updated to reflect the
shared-container reality and document the RL/benchmark carve-out.
Tests: tests/tools/test_shared_container_task_id.py (9 cases).
2026-04-26 11:55:02 -07:00
Brooklyn Nicholson 2be5e181a9 fix(tui): keep thinking color theme-neutral 2026-04-26 13:54:12 -05:00
Brooklyn Nicholson 015f6c825d fix(tui): support modified enter for multiline input 2026-04-26 13:52:54 -05:00
Brooklyn Nicholson bb59d3bac2 fix(tui): preserve completed thinking panel 2026-04-26 13:49:41 -05:00
Brooklyn Nicholson 4a21920b5e fix(tui): address copilot review nits 2026-04-26 13:43:08 -05:00
Brooklyn Nicholson cc16d0ef77 Merge remote-tracking branch 'origin/main' into bb/tui-long-session-perf
# Conflicts:
#	ui-tui/src/app/interfaces.ts
2026-04-26 13:39:57 -05:00
Teknium 087e74d4d7 feat(slack): register every gateway command as a native slash (Discord/Telegram parity) (#16164)
Every command in COMMAND_REGISTRY (/btw, /stop, /model, /help, /new,
/bg, /reset, ...) is now a first-class Slack slash command instead of
a /hermes <subcommand>. Users get the same autocomplete-driven slash
picker experience Slack users expect and that Discord and Telegram
already provide.

Previously Slack registered ONE native slash (/hermes) and split on
the first word, so typing /btw in Slack's composer got 'couldn't find
an app for /btw' because the workspace manifest never declared it.

Changes
- hermes_cli/commands.py: slack_native_slashes() + slack_app_manifest()
  generate a Slack manifest from the registry (canonical names +
  aliases + plugin commands), clamped to Slack's 50-slash cap with
  /hermes reserved as the catch-all.
- gateway/platforms/slack.py: single regex matcher dispatches every
  registered slash to _handle_slash_command, which dispatches on
  command['command']. Legacy /hermes <subcommand> keeps working for
  backward compat with older workspace manifests.
- hermes_cli/slack_cli.py + hermes_cli/main.py: new 'hermes slack
  manifest' command prints/writes a full manifest (display info,
  OAuth scopes, event subs, socket mode, slash commands) ready to
  paste into 'Create from manifest' or Features → App Manifest.
- hermes_cli/setup.py: _setup_slack() now writes the manifest up-front
  and points users at the 'From an app manifest' flow; also offers
  to refresh the manifest on reconfigure for picking up new commands.
- Tests: 14 new tests covering native-slash dispatch (/btw, /stop,
  /model), legacy /hermes <sub> compat, manifest structure, and
  telegram<->slack parity (every Telegram command must also register
  as a Slack slash). Existing /hermes-registration test updated to
  assert the new regex matches /hermes, /btw, /stop, /model, /help.
- Docs: slack.md gains a 'Slash Commands' section + Option A manifest
  flow in Step 1; cli-commands.md documents 'hermes slack manifest'.

Users pick up the new slashes by running 'hermes slack manifest --write'
and pasting into Features → App Manifest → Edit in their Slack app
config, then Save (Slack prompts for reinstall if scopes changed).
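
A rough sketch of the clamping step, assuming the catch-all is placed first
and the remaining names are kept in registry order until the cap is
reached. clamp_slashes is a hypothetical helper, not the real
slack_native_slashes() implementation, and the ordering is a guess.

```python
SLACK_SLASH_CAP = 50  # cap as stated in the commit message


def clamp_slashes(names, catch_all="hermes", cap=SLACK_SLASH_CAP):
    """Return at most `cap` unique slash names, always including the catch-all."""
    ordered = [catch_all]
    seen = {catch_all}
    for name in names:
        if len(ordered) >= cap:
            break
        if name not in seen:
            seen.add(name)
            ordered.append(name)
    return ordered
```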
2026-04-26 11:38:32 -07:00
Brooklyn Nicholson a8fcd1c742 fix(tui): apply details mode live 2026-04-26 13:34:33 -05:00
Teknium 9be83728a6 docs(docker-backend): clarify container is shared across sessions, not per-session (#16158)
The Docker terminal-backend docs said 'each session starts a long-lived
container', implying a fresh container per chat session. That hasn't been
true for a while: for the top-level agent, task_id defaults to 'default'
and the container is cached in _active_environments for the lifetime of
the Hermes process. /new, /reset, and switching sessions all reuse the
same container. Only delegate_task subagents and RL rollouts get isolated
containers keyed by their own task_id.
2026-04-26 10:46:08 -07:00
Teknium 9397767513 chore(skills): remove empty feeds category (#16153)
skills/feeds/ only contained a category-marker DESCRIPTION.md with no
actual skills in it. Removing the directory and the 'feeds' -> 'Feeds'
display-label mapping in website/scripts/extract-skills.py (the only
other reference in the repo).
2026-04-26 10:44:56 -07:00
Teknium 9662e3218a fix(tui): call maybe_auto_title for TUI sessions (#15949) (#16151)
* fix(tui): call maybe_auto_title for TUI sessions (#15961)

The maybe_auto_title() helper is called from cli.py and gateway/run.py
but was never wired into tui_gateway/server.py, so every session started
via 'hermes --tui' landed in state.db with an empty title. Evidence from
the issue reporter: 0/154 TUI sessions titled vs 91/383 CLI.

Mirror the CLI/Gateway pattern: after emitting message.complete, when the
turn finished cleanly, fire-and-forget title generation using the session
key, user prompt, agent response, and current history.

Fixes #15949.

Co-authored-by: math0r-be <math0r-be@github.com>

* chore(release): map math0r-be placeholder email in AUTHOR_MAP

---------

Co-authored-by: math0r-be <math0r-be@github.com>
2026-04-26 10:44:22 -07:00
Teknium 0824ba6a9d fix(/branch): redirect session_log_file and expose branch sessions in list (#14854) (#16150)
* fix(/branch): redirect session_log_file and expose branch sessions in list

Two bugs when using /branch:

1. cli.py _handle_branch_command updated agent.session_id but not
   agent.session_log_file, so all messages written after branching
   landed in the original session's JSON file and the branch never
   got its own session_{id}.json on disk.

   Fix: mirror the compression-split path (run_agent.py:7579) and
   update session_log_file immediately after changing session_id.

2. hermes_state.py list_sessions_rich filtered out every session
   with parent_session_id IS NOT NULL to hide sub-agent runs and
   compression continuations. Branch sessions share this column, so
   they became invisible to `hermes sessions list` and `sessions browse`.

   Fix: also include branch children — those whose parent ended with
   end_reason='branched' AND whose started_at >= parent.ended_at
   (the same timing condition that get_compression_tip uses to
   distinguish continuations from live-spawned subagents).

Fixes #14854

Co-Authored-By: Octopus <liyuan851277048@icloud.com>

* chore(release): map octo-patch placeholder email in AUTHOR_MAP

---------

Co-authored-by: octo-patch <octo-patch@github.com>
Co-authored-by: Octopus <liyuan851277048@icloud.com>
2026-04-26 10:28:19 -07:00
Teknium 42c076d349 feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.

Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.

Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
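
The routing decision and composite key can be illustrated with the sketch
below. It only inspects the literal host (no DNS resolution, unlike the
"resolves to" wording above), and is_private_host / session_key_for are
hypothetical names rather than the tool's real functions.

```python
import ipaddress
from urllib.parse import urlsplit

PRIVATE_SUFFIXES = (".local", ".lan", ".internal")


def is_private_host(host):
    """Literal-only check; a real implementation would also resolve DNS."""
    host = host.strip("[]").lower()
    if host == "localhost" or host.endswith(PRIVATE_SUFFIXES):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # an ordinary hostname; not resolved in this sketch
    return ip.is_private or ip.is_loopback or ip.is_link_local


def session_key_for(task_id, url):
    # Private hosts go to the '{task_id}::local' sidecar, others to the cloud session.
    host = urlsplit(url).hostname or ""
    return f"{task_id}::local" if is_private_host(host) else task_id
```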

Feature is on by default. Opt out via:
  browser:
    auto_local_for_private_urls: false

The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
Teknium 0e2a53eab2 feat(skills): show enabled/disabled status in 'skills list' (#16129)
'hermes skills list' now shows every skill's enabled/disabled status
and accepts --enabled-only to filter down to what will actually load
for the active profile:

    hermes -p dario skills list --enabled-only

Previously the command was a flat catalog — it did not apply
skills.disabled from config.yaml, so there was no way to see the
live skill set for a profile without reading config by hand.
Profile switching already works via -p (swaps HERMES_HOME); this
just surfaces the result visibly.

Changes:
- hermes_cli/skills_hub.py: do_list adds a Status column and an
  enabled_only filter; summary reports enabled/disabled split
- hermes_cli/main.py: --enabled-only flag on 'skills list'
- /skills list slash command accepts --enabled-only too
- tests: 4 new (status column, disabled marking, enabled-only
  hiding, no platform leakage into get_disabled_skill_names);
  existing fixtures updated to accept skip_disabled kwarg

Reported by @mochizukimr on X.
2026-04-26 09:20:53 -07:00
Brooklyn Nicholson 6814646b36 fix(tui): avoid duplicating flushed stream text 2026-04-26 10:58:18 -05:00
Teknium eaa7e2db67 feat(cli,tui): surface /queue, /bg, /steer in agent-running placeholder (#16118)
* feat(cli,tui): surface /queue, /bg, /steer in agent-running placeholder

While the agent loop is running, the input placeholder previously only
hinted at Enter-to-interrupt. Surface the full set of busy-time actions
(interrupt via new message, /queue, /bg, /steer) so users discover them
without hunting through docs or Teknium's tweets.

- cli.py: "msg=interrupt · /queue · /bg · /steer · Ctrl+C cancel"
- ui-tui/src/components/appLayout.tsx: same string (was "Ctrl+C to interrupt…")

* revert tui placeholder change (cli-only per review)
2026-04-26 08:50:30 -07:00
briandevans 4e356098d2 fixup! fix(gateway): preserve inactivity clock on interrupt-recursive cached-agent turns (#15654)
Address Copilot review findings:

1. Gate _last_activity_desc on interrupt_depth == 0 alongside _last_activity_ts.
   Both fields are semantically paired — desc describes the activity *at* ts.
   Updating desc without ts made get_activity_summary() report "starting new
   turn (cached)" for 20+ minutes while the timestamp showed the true stale
   duration, producing misleading diagnostic output.

2. Monkeypatch gateway.run.time.time to a fixed epoch in tests that assert
   on _last_activity_ts values.  Real time.time() comparisons were latently
   flaky under slow CI or NTP adjustments.  _FAKE_NOW = 10_000.0 is used
   as the reference; assertions are now exact equality rather than >=.

3. Add test_fresh_turn_resets_desc and test_interrupt_turn_preserves_desc to
   directly cover the gated desc behaviour introduced by (1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 08:45:44 -07:00
briandevans de24315978 fix(gateway): preserve inactivity clock on interrupt-recursive cached-agent turns (#15654)
_last_activity_ts was unconditionally reset to time.time() on every
_agent_cache hit.  For interrupt-recursive _run_agent calls
(_interrupt_depth > 0) this silently reset the inactivity watchdog's
idle clock on each re-entry, preventing the 30-min timeout from ever
firing when a turn got stuck in an interrupt loop.  A stuck session
would emit "Still working... iteration 0/60, starting new turn (cached)"
heartbeats indefinitely instead of timing out.

Gate the reset on _interrupt_depth == 0 only.  Fresh external turns
still receive the reset so a session idle for 29 min doesn't trip the
watchdog before the new turn makes its first API call (#9051).

The per-turn reset logic is extracted into a static helper
_init_cached_agent_for_turn() to make it directly testable.
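
A minimal sketch of the gated reset, assuming the agent exposes a plain
timestamp attribute. CachedAgentSketch and the `now` parameter are
illustrative additions for testability, not the real helper's signature.

```python
import time


class CachedAgentSketch:
    """Stand-in for a cached agent; only the idle timestamp is modelled."""

    def __init__(self):
        self.last_activity_ts = 0.0


def init_cached_agent_for_turn(agent, interrupt_depth, now=None):
    # Only a fresh external turn (depth 0) resets the idle clock. Interrupt-
    # recursive re-entries keep the old timestamp so the watchdog can fire.
    if interrupt_depth == 0:
        agent.last_activity_ts = time.time() if now is None else now
```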

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 08:45:44 -07:00
Teknium 20cb706e03 chore: extend [SYSTEM:→[IMPORTANT: rename + AUTHOR_MAP
Follow-up to #6616 covering the remaining user-injected prompt markers that
the original PR did not touch (reporter's second comment on #6576 explicitly
flagged these). Azure OpenAI Default/DefaultV2 content filters treat any
bracketed [SYSTEM: ...] as prompt-injection and reject with HTTP 400.

Remaining call sites renamed:
- cli.py: background-process notifications (watch_disabled, watch_match,
  completion), MCP reload notice (4 live + 1 docstring)
- gateway/run.py: same notification paths + auto-loaded skill banner +
  MCP reload notice (5 live + 1 docstring)
- tools/process_registry.py: comment reference

Not renamed:
- environments/hermes_base_env.py '[SYSTEM]\n{content}' — RL training
  trajectory rendering only, never sent to Azure, part of a symmetric
  [USER]/[ASSISTANT]/[TOOL] scheme.

AUTHOR_MAP: buraysandro9@gmail.com -> ygd58.
2026-04-26 08:44:58 -07:00
ygd58 d7a3468246 fix(prompts): replace [SYSTEM: with [IMPORTANT: to avoid Azure content filter
Azure OpenAI content filters (Default/DefaultV2) treat bracketed
[SYSTEM: ...] meta-instructions as prompt-injection attempts and
reject requests with HTTP 400.

Replacing [SYSTEM: with [IMPORTANT: preserves the same semantic
meaning for the model while bypassing the Azure heuristic.

Fixes #6576
2026-04-26 08:44:58 -07:00
Teknium f2d655529a fix(auth): hoist get_env_value import + strengthen .env fallback tests
Follow-up to cherry-picked PR #15920:

- agent/credential_pool.py: hoist 'from hermes_cli.config import get_env_value'
  to module top instead of inline try/except in each seed site (3 sites).
  No import cycle — hermes_cli/config.py doesn't depend on agent.credential_pool.
- hermes_cli/auth.py: same hoist for the _resolve_api_key_provider_secret loop.
- tests/tools/test_credential_pool_env_fallback.py: replace smoke-only tests
  with real .env file I/O. Each test writes a temp ~/.hermes/.env, verifies
  _seed_from_env / _resolve_api_key_provider_secret read from it, and asserts
  the full priority chain: os.environ > .env > credential_pool. Uses
  'deepseek' as the test provider since 'openai' isn't in PROVIDER_REGISTRY
  and _seed_from_env's generic path requires a real pconfig lookup.
2026-04-26 08:32:09 -07:00
阿泥豆 27f4dba5ce test: add unit tests for credential pool env fallback 2026-04-26 08:32:09 -07:00
阿泥豆 8443998dc3 fix(auth): resolve API keys from ~/.hermes/.env and credential_pool
_resolve_api_key_provider_secret() and _seed_from_env() only checked
os.environ for provider API keys. When keys exist in ~/.hermes/.env but
are not loaded into the process environment (e.g. ACP adapter entry
point, post-session-start .env edits, or non-CLI entry points), the
resolution returns an empty string, causing HTTP 401 failures.

Changes:
- credential_pool._seed_from_env: use get_env_value() which checks both
  os.environ and ~/.hermes/.env file, preventing _prune_stale_seeded_entries
  from removing valid entries whose env var isn't in os.environ
- credential_pool._seed_from_env: same fix for openrouter and
  base_url_env_var resolution
- auth._resolve_api_key_provider_secret: use get_env_value() instead of
  os.getenv(), and add credential_pool fallback when env resolution fails
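
The resulting lookup order (process environment, then the .env file, then
the credential pool) can be sketched as below. parse_dotenv and
resolve_api_key are hypothetical helpers, and modelling the pool as a dict
keyed by env var name is a simplification, not how credential_pool stores
entries.

```python
import os


def parse_dotenv(text):
    """Very small KEY=VALUE parser; real .env parsing handles more cases."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(env_var, dotenv_values, pool_keys, environ=None):
    environ = os.environ if environ is None else environ
    # 1. process environment, 2. .env file contents, 3. credential pool.
    return (
        environ.get(env_var)
        or dotenv_values.get(env_var)
        or pool_keys.get(env_var)
        or ""
    )
```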

Fixes #15914
2026-04-26 08:32:09 -07:00
Teknium e3901d5b25 fix(run_agent): background review fork inherits parent's live runtime (#16099)
The background memory/skill review (_spawn_background_review) has always
forked a new AIAgent passing only model and provider, then relied on
AIAgent.__init__ to re-resolve credentials from env vars. That works for
users with keys in ~/.hermes/.env, but the fork always falls back to
env-var auto-resolution, which fails for OAuth-only providers,
session-scoped creds, and credential-pool setups where auth can't be
reconstructed from env.

This used to be invisible -- failures were swallowed via logger.debug().
PR 8a2506af4 (Apr 24) surfaced auxiliary failures to the user, which
made the stale bug visible as:
    "Auxiliary background review failed: No LLM provider configured"

Fix: pass api_key, base_url, api_mode, and credential_pool from the
parent's live runtime into the fork -- matching how every other
auxiliary path (compression, memory flush, vision, session search)
already inherits the parent's credentials via _current_main_runtime().
2026-04-26 08:29:40 -07:00
Teknium 06f81752ed Revert "feat(kanban): durable multi-profile collaboration board (#16081)" (#16098)
This reverts commit 15937a6b46.
2026-04-26 08:29:37 -07:00
Teknium 9ef1ae138a fix(docker): don't chown config.yaml after gosu drop (#15865) (#16096)
The chown/chmod block on config.yaml was added in b24d239ce to keep the
file readable by the hermes runtime user, but it sat in the post-gosu
'running as hermes' section of the entrypoint. That meant:

1. Default `docker run <image>` — container starts as root, entrypoint
   drops to hermes via gosu, then non-root hermes tries to chown the
   file to hermes. Works by coincidence because the file was just
   created by root during volume setup and gosu target == target owner.
2. `docker run -u $(id -u):$(id -g) <image>` (#15865) — container
   starts as the caller's UID. The root block is skipped entirely, we
   land in the hermes section as some arbitrary non-root user, and
   chown to 'hermes' fails with 'Operation not permitted'. Script
   aborts under `set -e`.

Move the chown/chmod into the root block (before the gosu exec) where
it actually has privilege, and guard with `2>/dev/null || true` so
rootless Podman (where even in-container root lacks host-side chown
rights) doesn't abort either.

Closes #15865
2026-04-26 08:27:39 -07:00
Teknium c5196f1fc2 chore(release): map focusflow.app.help@gmail.com to yes999zc
Salvage PR #15883 cherry-picked FocusFlow Dev's commit; release-notes
CI needs the AUTHOR_MAP entry to attribute to the PR author's GitHub
login rather than a placeholder.
2026-04-26 08:25:22 -07:00
FocusFlow Dev 63bf7a29b6 fix(run_agent): prevent reasoning_content regression in DeepSeek/Kimi tool-call replay
PR #15478 fixed missing reasoning_content for DeepSeek API but introduced
a regression: tool-call messages with genuine 'reasoning' field were
overwritten by empty-string fallback before promotion.

Re-order _copy_reasoning_content_for_api steps:
  1. Preserve explicit reasoning_content
  2. Promote 'reasoning' field (MOVED UP)
  3. DeepSeek/Kimi tool-call empty-string fallback (MOVED DOWN)
  4. Non-thinking provider cleanup
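
The new ordering can be sketched as follows. The function signature, the
flag names, and the guess that "cleanup" means dropping the field for
non-thinking providers are all assumptions; this is not the body of
_copy_reasoning_content_for_api.

```python
def copy_reasoning_content(msg, needs_empty_fallback, supports_thinking=True):
    """Return a copy of msg with reasoning_content settled in the new order."""
    out = dict(msg)
    if out.get("reasoning_content") is not None:
        pass                                         # 1. keep the explicit value
    elif out.get("reasoning"):
        out["reasoning_content"] = out["reasoning"]  # 2. promote 'reasoning'
    elif needs_empty_fallback and out.get("tool_calls"):
        out["reasoning_content"] = ""                # 3. empty-string fallback
    if not supports_thinking:
        out.pop("reasoning_content", None)           # 4. assumed cleanup: drop the field
    return out
```

Because promotion runs before the fallback, a tool-call message that
carries a genuine 'reasoning' value is no longer overwritten with "".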

Fixes #15812, relates #15749, #15478.
2026-04-26 08:25:22 -07:00
Teknium 15937a6b46 feat(kanban): durable multi-profile collaboration board (#16081)
New `hermes kanban` CLI subcommand + `/kanban` slash command + skills for
worker and orchestrator profiles. SQLite-backed task board
(~/.hermes/kanban.db) shared across all profiles on the host. Zero
changes to run_agent.py, no new core tools, no tool-schema bloat.

Motivation: delegate_task is a function call — sync fork/join, anonymous
subagent, no resumability, no human-in-the-loop. Kanban is the durable
shape needed for research triage, scheduled ops, digital twins,
engineering pipelines, and fleet work. They coexist (workers may call
delegate_task internally).

What this adds
- hermes_cli/kanban_db.py — schema, CAS claim, dependency resolution,
  dispatcher, workspace resolution, worker-context builder.
- hermes_cli/kanban.py — 15-verb CLI surface and shared run_slash()
  entry point used by both CLI and gateway.
- skills/devops/kanban-worker — how a profile should work a claimed task.
- skills/devops/kanban-orchestrator — "you are a dispatcher, not a
  worker" template with anti-temptation rules.
- /kanban slash command wired into cli.py and gateway/run.py. Bypasses
  the running-agent guard (board writes don't touch agent state), so
  /kanban unblock can free a stuck worker mid-conversation.
- Design spec at docs/hermes-kanban-v1-spec.pdf — comparative analysis
  vs Cline Kanban, Paperclip, NanoClaw, Gemini Enterprise; 8 patterns;
  4 user stories; implementation plan; concurrency correctness.
- Docs: website/docs/user-guide/features/kanban.md, CLI reference
  updated, sidebar entry added.

Architecture highlights
- Three planes: control (user + gateway), state (board + dispatcher),
  execution (pool of profile processes).
- Every worker is a full OS process, spawned as `hermes -p <profile>`.
  No in-process subagent swarms — solves NanoClaw's SDK-lifecycle
  failure class.
- Atomic claim via SQLite CAS in a BEGIN IMMEDIATE transaction; stale
  claims reclaimed 15 min after their TTL expires.
- Tenant namespacing via one nullable column — one specialist fleet
  can serve many businesses with data isolation by workspace path.
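
A minimal sketch of a compare-and-swap claim inside a BEGIN IMMEDIATE
transaction, using Python's sqlite3 module. The tasks table, its columns,
and the open_board / claim_task helpers are hypothetical; the real schema
lives in hermes_cli/kanban_db.py and is not reproduced here.

```python
import sqlite3


def open_board(path=":memory:"):
    # isolation_level=None lets this sketch issue BEGIN/COMMIT explicitly.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY, status TEXT NOT NULL, claimed_by TEXT)"
    )
    return conn


def claim_task(conn, task_id, worker):
    """Try to claim a ready task; return True only for the winning caller."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        cur = conn.execute(
            "UPDATE tasks SET status = 'claimed', claimed_by = ? "
            "WHERE id = ? AND status = 'ready'",
            (worker, task_id),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    # The WHERE clause is the compare step: a second claimer updates zero rows.
    return cur.rowcount == 1
```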

Tests: 60 targeted tests (schema, CAS atomicity, dependency resolution,
dispatcher, workspace kinds, tenancy, CLI + slash surface). All pass
hermetic via scripts/run_tests.sh.
2026-04-26 08:24:26 -07:00
Teknium 454d883e69 refactor: drop persist_session plumbing + fix broken btw mid-turn bypass (#16075)
Follow-up to PR #16053 (/btw as /background alias). Cleans up the
plumbing added exclusively for the old ephemeral /btw handler and
repairs a broken btw bypass that landed between my refactor and this
follow-up.

run_agent.py:
- Remove persist_session kwarg, instance attr, and _persist_session
  short-circuit. Only /btw ever passed persist_session=False; with
  /btw gone the default (always persist) is the only behavior anyone
  ever wanted.

gateway/run.py:
- Remove the unreachable 'if _cmd_def_inner.name == "btw"' block
  (PR #16059). Canonical name for a /btw message is 'background' after
  alias resolution — the comparison could never be true, and it called
  _handle_btw_command which no longer exists. The /background branch
  above it already dispatches /btw correctly.

tests/gateway/test_running_agent_session_toggles.py:
- Fix test_btw_dispatches_mid_run to mock _handle_background_command
  (the real dispatch target for /btw) instead of the deleted
  _handle_btw_command.
2026-04-26 07:15:23 -07:00
Teknium 70f56e7605 fix(gateway): let /btw dispatch mid-turn instead of being rejected
/btw spawns a parallel ephemeral side-question task (self-guarded against
concurrent /btw on the same chat) — exactly like /background. But it was
missing from the running-agent bypass list in _handle_message(), so it
fell through to the catch-all and returned:

  Agent is running — /btw can't run mid-turn. Wait for the current
  response or /stop first.

That's the opposite of what /btw is for — asking a side question while
the main turn is still working. Add the bypass next to /background and a
regression test covering the mid-turn dispatch path.

Reported by @IuriiTiunov on Telegram.
2026-04-26 07:11:10 -07:00
Teknium 7fa70b6c87 refactor: /btw is now an alias for /background (#16053)
The ephemeral no-tools side-question variant of /btw confused users who
expected 'by-the-way' to mean 'run this off to the side with tools' —
they'd type /btw and get a toolless agent that couldn't do the work.
/bg worked because it was /background with full tools.

Collapse the two: /btw and /bg both alias to /background. One command,
one behavior, no more gotchas about which variant has tools.

Removed:
- _handle_btw_command in cli.py and gateway/run.py
- _run_btw_task + _active_btw_tasks state in gateway/run.py
- prompt.btw JSON-RPC method + btw.complete event in tui_gateway
- BtwStartResponse type + btw.complete case in ui-tui
- Standalone /btw slash tree registration in Discord
- Standalone btw CommandDef in hermes_cli/commands.py

Updated:
- background CommandDef aliases: (bg,) -> (bg, btw)
- TUI session.ts: local btw handler merged into background
- Docs and tips updated to describe /btw as a /background alias
2026-04-26 07:11:08 -07:00
Teknium 9a70260490 Revert "feat(onboarding): port first-touch hints to the TUI (#16054)" (#16062)
This reverts commit ffd2621039.
2026-04-26 06:31:37 -07:00
Teknium ffd2621039 feat(onboarding): port first-touch hints to the TUI (#16054)
PR #16046 added /busy and /verbose hints to the classic CLI and the
gateway runner but skipped the Ink TUI (and therefore the dashboard
/chat page, which embeds the TUI via PTY).  This extends the same
latch to the TUI with TUI-native wording.

The TUI's busy-input model is not the /busy knob from the CLI —
single Enter while busy auto-queues, double Enter on an empty line
interrupts.  The new busy-input hint teaches THAT gesture instead of
telling the user to flip a config that does not apply.

Changes:
- agent/onboarding.py — add busy_input_hint_tui() + tool_progress_hint_tui()
- tui_gateway/server.py — onboarding.claim JSON-RPC (Ink triggers busy
  hint on enqueue) + _maybe_emit_onboarding_hint helper hooked into
  _on_tool_complete for the 30s/tool_progress=all path.  Same
  config.yaml latch so each hint fires at most once per install across
  CLI, gateway, and TUI combined.
- ui-tui/src/gatewayTypes.ts — OnboardingClaimResponse + onboarding.hint event
- ui-tui/src/app/createGatewayEventHandler.ts — render the hint event as sys()
- ui-tui/src/app/useSubmission.ts — claim busy_input_prompt on first
  busy enqueue
- tests/agent/test_onboarding.py — +3 cases for TUI hint shape
- tests/tui_gateway/test_protocol.py — +4 cases for onboarding.claim
- website/docs/user-guide/tui.md — new 'Interrupting and queueing'
  section explaining the TUI's double-Enter model and the hints

Validation:
scripts/run_tests.sh tests/agent/test_onboarding.py \
  tests/tui_gateway/test_protocol.py \
  tests/gateway/test_busy_session_ack.py
  -> 66 passed
npm --prefix ui-tui run type-check -> clean
npm --prefix ui-tui run lint       -> clean
npm --prefix ui-tui run build      -> clean
2026-04-26 06:24:19 -07:00
Teknium 1e37ddc929 feat(cli): add 'hermes fallback' command to manage fallback providers (#16052)
Manage the fallback_providers chain from the CLI instead of hand-editing
config.yaml. The picker reuses select_provider_and_model() from 'hermes
model' — same provider list, same credential prompts, same model picker.

  hermes fallback [list]   Show the current chain (primary + fallbacks)
  hermes fallback add      Run the model picker, append selection to chain
  hermes fallback remove   Pick an entry to delete (arrow-key menu)
  hermes fallback clear    Remove all entries (with confirmation)

'add' snapshots config['model'] before calling the picker, extracts the
user's selection from the post-picker state, then restores the primary
and appends {provider, model, base_url?, api_mode?} to fallback_providers.
Auth store's active_provider is snapshot/restored too so OAuth-provider
fallbacks don't silently deactivate the user's primary. Duplicates and
self-as-fallback are rejected. Legacy single-dict 'fallback_model' entries
are auto-migrated to the list format on first write.
2026-04-26 06:19:04 -07:00
Teknium 83c1c201f6 feat(onboarding): contextual first-touch hints for /busy and /verbose (#16046)
Instead of a blocking first-run questionnaire, show a one-time hint the first
time the user hits each behavior fork:

1. First message while the agent is working — appends a hint to the busy-ack
   explaining the /busy queue vs /busy interrupt knob, phrased to match the
   mode that was just applied (don't tell a queue-mode user to switch to
   queue).

2. First tool that runs for >= 30s in the noisiest progress mode
   (tool_progress: all) — prints a hint about /verbose to cycle display
   modes (all -> new -> off -> verbose). Gated on /verbose actually being
   usable on the surface: always shown on CLI; on gateway only shown when
   display.tool_progress_command is enabled.

Each hint is latched in config.yaml under onboarding.seen.<flag>, so it
fires exactly once per install across CLI, gateway, and cron, then never
again. Users can wipe the section to re-see hints.
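
The latch can be sketched with an in-memory dict standing in for
config.yaml. is_seen and mark_seen are named in this commit, but their
bodies here are guesses, and claim_hint is a hypothetical convenience
wrapper; the real code persists via atomic_yaml_write.

```python
def is_seen(config, flag):
    return bool(config.get("onboarding", {}).get("seen", {}).get(flag, False))


def mark_seen(config, flag):
    # Creates onboarding.seen on demand, then sets the flag.
    config.setdefault("onboarding", {}).setdefault("seen", {})[flag] = True


def claim_hint(config, flag, text):
    """Return the hint text the first time a flag is claimed, else None."""
    if is_seen(config, flag):
        return None
    mark_seen(config, flag)
    return text
```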

New:
- agent/onboarding.py — is_seen / mark_seen / hint strings, shared by
  both CLI and gateway.
- onboarding.seen in DEFAULT_CONFIG (hermes_cli/config.py) and in
  load_cli_config defaults (cli.py). No _config_version bump — deep
  merge handles new keys.

Wired:
- gateway/run.py: _handle_active_session_busy_message appends the hint
  after building the ack.  progress_callback tracks tool.completed
  duration and queues the tool-progress hint into the progress bubble.
- cli.py: CLI input loop appends the busy-input hint on the first busy
  Enter; _on_tool_progress appends the tool-progress hint on the first
  >=30s tool completion.  In-memory CLI_CONFIG is also updated so
  subsequent fires in the same process are suppressed immediately.

All writes go through atomic_yaml_write and are wrapped in try/except
so onboarding can never break the input/busy-ack paths.
2026-04-26 06:06:27 -07:00
Teknium 4bda9dcade fix(gateway): honor voice.auto_tts config in auto-TTS gate (#16007) (#16039)
The base adapter's auto-TTS path fired on any voice message unless the
chat had explicitly run /voice off — it never read voice.auto_tts from
config.yaml, so users who set auto_tts: false still got audio replies.

Gate the base adapter on a three-layer decision instead:
  1. chat in _auto_tts_enabled_chats (explicit /voice on|tts) → fire
  2. chat in _auto_tts_disabled_chats (explicit /voice off)  → suppress
  3. else → voice.auto_tts global default
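
As a sketch, the three layers reduce to a short function. should_auto_tts
and its parameters are illustrative names; the sets stand in for the
adapter's _auto_tts_enabled_chats and _auto_tts_disabled_chats.

```python
def should_auto_tts(chat_id, enabled_chats, disabled_chats, default):
    if chat_id in enabled_chats:    # explicit /voice on|tts
        return True
    if chat_id in disabled_chats:   # explicit /voice off
        return False
    return default                  # voice.auto_tts from config.yaml
```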

Runner now pushes voice.auto_tts onto the adapter as _auto_tts_default
and mirrors /voice on|tts chats into _auto_tts_enabled_chats via the
existing _sync_voice_mode_state_to_adapter path. /voice off still wins.

Closes #16007.
2026-04-26 05:52:05 -07:00
Teknium 67dcace412 docs(config): show options in comments for display settings (#16038)
Users who run `hermes setup` get `cli-config.yaml.example` copied verbatim
(including comments) to ~/.hermes/config.yaml. But several display settings
had thin comments that didn't enumerate the valid options, so users couldn't
tell from reading their config what values each key accepts.

- busy_input_mode: widen from 'CLI' to 'CLI and gateway platforms';
  note /stop as gateway equivalent of Ctrl+C; add /busy_input_mode runtime hint
- compact, interim_assistant_messages, bell_on_complete, show_reasoning,
  streaming: add true/false option lines showing effect of each value
- skin: refresh the built-in skin list (was missing daylight, warm-lightmode,
  poseidon, sisyphus, charizard — 5 of 9 built-ins undocumented)
2026-04-26 05:51:37 -07:00
Teknium 35c57cc46b fix(gateway): suppress tool-progress bubbles after interrupt (#16034)
When the LLM response carries N parallel tool calls, the agent fires
N tool.started events back-to-back before its interrupt check runs.
A user sending /stop mid-batch would see the 'Interrupting current
task' ack followed by a trail of 🔍 web_search bubbles for the remaining
events in the batch — making the interrupt feel ignored.

progress_callback and the drain loop in send_progress_messages now
check agent.is_interrupted (via agent_holder[0], the existing
cross-scope handle). Events that arrive after interrupt are dropped
at both the queueing and rendering stages. The 'Interrupting'
message is sent through a separate adapter path and is unaffected.
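
A simplified sketch of the two drop points, assuming a one-element list as
the cross-scope agent handle and a standard queue between the callback and
the drain loop. AgentStub, make_progress_callback, and drain are
hypothetical stand-ins for the gateway's real objects.

```python
import queue


class AgentStub:
    """Stand-in for the running agent; only the interrupt flag is modelled."""
    is_interrupted = False


def make_progress_callback(agent_holder, events):
    def progress_callback(event):
        agent = agent_holder[0]
        if agent is not None and agent.is_interrupted:
            return  # drop at the queueing stage
        events.put(event)
    return progress_callback


def drain(agent_holder, events):
    rendered = []
    while not events.empty():
        event = events.get_nowait()
        agent = agent_holder[0]
        if agent is not None and agent.is_interrupted:
            continue  # drop at the rendering stage
        rendered.append(event)
    return rendered
```

Checking in both places covers events that were queued before the
interrupt landed as well as those fired afterwards.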
2026-04-26 05:47:37 -07:00
Teknium e8441c4c0f fix(clipboard): report native/tmux success, keep Ctrl+Shift+C on dashboard
Follow-up on #16020 salvage. Three corrections:

1. Truth signal for /copy
   Before: success was 'OSC 52 sequence was emitted to stdout'. That's
   false on local Linux inside tmux (emitSequence=false), so /copy kept
   printing 'clipboard copy failed' to users whose xclip/wl-copy had
   already succeeded fire-and-forget.
   Fix: setClipboard() now returns { sequence, success } where success =
   native-fired OR tmux-buffer-loaded OR osc52-emitted. copyNative()
   returns a boolean telling setClipboard whether a native attempt was
   made. /copy only shows 'failed' when literally no path was taken.

2. Dashboard keybinding
   Before: Ctrl+C for copy on non-Mac (Ctrl+Shift+C for paste).
   That swallows SIGINT when a stale selection is present and breaks
   the xterm/gnome-terminal/konsole/Windows-Terminal convention where
   Ctrl+C in a terminal emulator is always SIGINT. The real bug was
   that clipboard writes lost user-gesture through OSC-52 round-trips,
   which the direct writeText already fixes.
   Fix: revert copyModifier to Ctrl+Shift+C on non-Mac. Direct
   writeText in the keydown handler preserves the user gesture. The
   term.write Escape call is replaced with term.clearSelection(), which
   works without relying on TUI input mode.

3. Error toast text
   Before: 'see HERMES_TUI_DEBUG_CLIPBOARD' — tells users how to
   debug but not how to fix.
   Fix: point users at HERMES_TUI_FORCE_OSC52=1 first (the actual
   escape hatch), mention the debug var second.
2026-04-26 05:46:45 -07:00
Harry Riddle 2511207cb0 chore: revert docs 2026-04-26 05:46:45 -07:00
Harry Riddle 0f3a6f0fb3 fix(clipboard): dashboard Ctrl+C direct copy; TUI honest feedback; HERMES_TUI_FORCE_OSC52
- Dashboard copy: direct Clipboard API on Ctrl+C/Cmd+C (user gesture);
  send Escape to TUI to clear selection; Ctrl+Shift+C kept as fallback.
- TUI /copy: copySelection() async; only reports success if OSC52 emitted.
- Add HERMES_TUI_FORCE_OSC52 env var to override native-tool detection.
- Fixes "copied N chars" false-positive when clipboard backend absent.

Changes:
  web/src/pages/ChatPage.tsx — direct navigator.clipboard.writeText
  ui-tui/packages/hermes-ink/src/ink/ink.tsx — async copySelection
  ui-tui/packages/hermes-ink/src/ink/termio/osc.ts — HERMES_TUI_FORCE_OSC52
  ui-tui/src/app/slash/commands/core.ts — async /copy with honest feedback
2026-04-26 05:46:45 -07:00
Harry Riddle a562420383 fix(tui): robust clipboard handling with debug logging and headless detection
Problem: Ctrl+C in Hermes TUI shows 'copied' but clipboard often empty.
Root causes:
- Native Linux tools (xclip, wl-copy) require DISPLAY/WAYLAND_DISPLAY; in
  headless Docker/SSH they fail or hang.
- OSC 52 fallback requires terminal emulator support; when absent, sequence
  is dropped silently.
- Dashboard OSC 52 → Clipboard API path fails due to missing user gesture;
  errors were silently caught.
- User feedback 'copied selection' was shown unconditionally, regardless of
  success.

Solution implemented:
- Short-circuit Linux native clipboard probing when no display server is
  present (no DISPLAY and no WAYLAND_DISPLAY). Avoids futile attempts and
  timeouts.
- Add HERMES_TUI_DEBUG_CLIPBOARD env var (1/true). When set, TUI logs to
  stderr which clipboard path is used, probe results on Linux, and whether
  OSC 52 was emitted. Greatly improves diagnosability.
- Improve dashboard clipboard error handling: replace empty catch blocks
  with console.warn messages for OSC 52 decode/Write failures and direct
  copy/paste errors. Makes browser permission/user-gesture failures visible
  in DevTools.
- Add comprehensive clipboard troubleshooting documentation to README and
  AGENTS, covering OSC 52 verification, tmux config, Docker/headless
  constraints, env vars, dashboard caveats, and fallback strategies.

Technical details:
- In ui-tui/packages/hermes-ink/src/ink/termio/osc.ts:
  - Early return on Linux if both DISPLAY and WAYLAND_DISPLAY are unset.
  - Refactor the probe sequence to run asynchronously with a 500ms timeout,
    caching the result; subsequent copies use the cached tool immediately.
  - Emit debug logs when HERMES_TUI_DEBUG_CLIPBOARD=1.
- In ink.tsx: in debug mode, log when OSC 52 is not emitted (native
  or tmux path in use).
- Dashboard: the OSC 52 handler and the Ctrl+Shift+C handler now log
  warnings to the console on Clipboard API rejection, with the error message.
- Documentation: new 'Clipboard Troubleshooting' section in README; new
  'Clipboard environment variables and pitfalls' subsection in AGENTS.md
  (Known Pitfalls).

Tests: full ui-tui test suite (292 tests) passes; clipboard and OSC tests
unaffected. No breaking changes.

Files changed:
- ui-tui/packages/hermes-ink/src/ink/termio/osc.ts
- ui-tui/packages/hermes-ink/src/ink/ink.tsx
- web/src/pages/ChatPage.tsx
- README.md
- AGENTS.md
- CHANGELOG.md (new)
2026-04-26 05:46:45 -07:00
Teknium 855366909f feat(models): remote model catalog manifest for OpenRouter + Nous Portal (#16033)
OpenRouter and Nous Portal curated picker lists now resolve via a JSON
manifest served by the docs site, falling back to the in-repo snapshot
when unreachable. Lets us update model lists without shipping a release.

Live URL: https://hermes-agent.nousresearch.com/docs/api/model-catalog.json
(source at website/static/api/model-catalog.json; auto-deploys via the
existing deploy-site.yml GitHub Pages pipeline on every merge to main).

Schema (v1) carries id + optional description + free-form metadata at
manifest, provider, and model levels. Pricing and context length stay
live-fetched via existing machinery (/v1/models endpoints, models.dev).

Config (new model_catalog section, default enabled):
  model_catalog.url       master manifest URL
  model_catalog.ttl_hours disk cache TTL (default 24h)
  model_catalog.providers.<name>.url   optional per-provider override

Fetch pipeline: in-process cache -> disk cache (fresh < TTL) -> HTTP
fetch -> disk-cache-on-failure fallback -> in-repo snapshot as last
resort. Never raises to callers; at worst returns the bundled list.

Changes:
- website/static/api/model-catalog.json    initial manifest (35 OR + 31 Nous)
- scripts/build_model_catalog.py           regenerator from in-repo lists
- hermes_cli/model_catalog.py              fetch + validate + cache module
- hermes_cli/models.py                     fetch_openrouter_models() +
                                           new get_curated_nous_model_ids()
- hermes_cli/main.py, hermes_cli/auth.py   Nous flows use the helper
- hermes_cli/config.py                     model_catalog defaults
- website/docs/reference/model-catalog.md  + sidebars.ts
- tests/hermes_cli/test_model_catalog.py   21 tests (validation, fetch
                                           success/failure, accessors,
                                           disabled, overrides, integration)
2026-04-26 05:46:43 -07:00
Teknium d09ab8ff13 fix(mcp-oauth): preserve server_url path for protected-resource validation (#16031)
Stop pre-stripping the path from the configured MCP server URL before
constructing OAuthClientProvider. The MCP SDK strips the path itself via
OAuthContext.get_authorization_base_url() for authorization-server
discovery, but uses the full server_url through
resource_url_from_server_url() + check_resource_allowed() to validate
against the server's RFC 9728 Protected Resource Metadata.

For servers whose PRM advertises a path-scoped resource (e.g. Notion's
https://mcp.notion.com/mcp), our _parse_base_url() collapsed the URL to
the origin, so check_resource_allowed() saw requested='/' vs
configured='/mcp/' and refused the token. Fixes OAuth against Notion MCP
(and any other path-scoped resource).
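
To illustrate why collapsing the URL to its origin breaks validation, here is a simplified stand-in for a path-scoped resource check. It is not the MCP SDK's implementation; `origin_only` and `resource_allowed` are hypothetical names used only for this sketch.

```python
from urllib.parse import urlsplit


def origin_only(url):
    """What pre-stripping the path amounts to: scheme and host only."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def _with_trailing_slash(path):
    return path if path.endswith("/") else path + "/"


def resource_allowed(requested, configured):
    """Simplified check: same origin, and requested path under configured path."""
    r, c = urlsplit(requested), urlsplit(configured)
    if (r.scheme, r.netloc.lower()) != (c.scheme, c.netloc.lower()):
        return False
    return _with_trailing_slash(r.path).startswith(_with_trailing_slash(c.path))
```

With the origin-only URL the requested path normalizes to `/`, which is not under `/mcp/`, so the check fails; passing the full server URL lets it succeed.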

Closes #16015.
2026-04-26 05:43:54 -07:00
Teknium 438db0c7b0 fix(cli): /model picker honors provider-specific context caps (#16030)
`_apply_model_switch_result` (the interactive `/model` picker's
confirmation path) printed `ModelInfo.context_window` straight from
models.dev, which reports the vendor-wide value (1.05M for gpt-5.5 on
openai). ChatGPT Codex OAuth caps the same slug at 272K, so the picker
showed 1M while the runtime (compressor, gateway `/model`, typed
`/model <name>`) correctly used 272K — the classic 'sometimes 1M,
sometimes 272K' mismatch on a single model.

Both display paths now go through `resolve_display_context_length()`,
matching the fix that `_handle_model_switch` received earlier.

Also bump the stale last-resort fallback in DEFAULT_CONTEXT_LENGTHS
(`gpt-5.5: 400000 -> 1050000`) to match the real OpenAI API value; the
272K Codex cap is already enforced via the Codex-OAuth branch, so the
fallback now reflects what every non-Codex probe-miss should see.

Tests: adds `test_apply_model_switch_result_context.py` with three
scenarios (Codex cap wins, OpenRouter shows 1.05M, resolver-empty falls
back to ModelInfo). Updates the existing non-Codex fallback test to
assert 1.05M (the correct value).

## Validation
| path                          | before    | after     |
|-------------------------------|-----------|-----------|
| picker -> gpt-5.5 on Codex    | 1,050,000 | 272,000   |
| picker -> gpt-5.5 on OpenAI   | 1,050,000 | 1,050,000 |
| picker -> gpt-5.5 on OpenRouter | 1,050,000 | 1,050,000 |
| typed /model gpt-5.5 on Codex | 272,000   | 272,000   |
2026-04-26 05:43:31 -07:00
zkl 2ccdadcca6 fix(deepseek): bump V4 family context window to 1M tokens
#14934 added deepseek-v4-pro / deepseek-v4-flash to the DeepSeek native
provider but the context-window lookup still falls back to the existing
"deepseek" substring entry (128K). DeepSeek V4 ships with a 1M context
window, so any caller relying on get_model_context_length() for
pre-flight token budgeting (compression, context warnings) under-counts
by ~8x.

Add explicit lowercase entries for the four DeepSeek model ids that
ship 1M context:

- deepseek-v4-pro
- deepseek-v4-flash
- deepseek-chat (legacy alias, server-side maps to v4-flash non-thinking)
- deepseek-reasoner (legacy alias, server-side maps to v4-flash thinking)

Longest-key-first substring matching means these explicit entries also
cover the vendor-prefixed forms (deepseek/deepseek-v4-pro on OpenRouter
and Nous Portal) without regressing the existing 128K fallback for
older / unknown DeepSeek model ids on custom endpoints.
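
A minimal sketch of longest-key-first substring matching, assuming a plain dict of lowercase keys. The table and `lookup_context_length` are illustrative; the 1,000,000 figure is a stand-in for "1M" and may not match the exact value in the repository.

```python
from typing import Optional

# Illustrative table: explicit V4-family entries plus the generic fallback.
CONTEXT_LENGTHS = {
    "deepseek": 128_000,
    "deepseek-v4-pro": 1_000_000,
    "deepseek-v4-flash": 1_000_000,
    "deepseek-chat": 1_000_000,
    "deepseek-reasoner": 1_000_000,
}


def lookup_context_length(model_id: str) -> Optional[int]:
    """Return the value for the longest key that occurs inside model_id."""
    needle = model_id.lower()
    for key in sorted(CONTEXT_LENGTHS, key=len, reverse=True):
        if key in needle:
            return CONTEXT_LENGTHS[key]
    return None
```

Because longer keys are tried first, `deepseek/deepseek-v4-pro` hits the explicit entry before the generic `deepseek` key gets a chance to match.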

Source: https://api-docs.deepseek.com/zh-cn/quick_start/pricing
2026-04-26 05:32:54 -07:00
Teknium 76042f5867 feat(review): class-first skill review prompt (#16026)
The background skill-review prompt (spawned after N user turns) now instructs
the reviewer to SURVEY existing skills first, identify the CLASS of task, and
PREFER updating/generalizing an existing skill over creating a new narrow one.

This reduces near-duplicate skill accumulation at the source. Catches the
common failure mode where repeated tasks of the same class each spawn their
own specific skill ("fix-my-tauri-error", "fix-my-electron-error") instead
of a single class-level skill ("desktop-app-build-troubleshooting").

Applied to both _SKILL_REVIEW_PROMPT and the **Skills** half of
_COMBINED_REVIEW_PROMPT. Memory-only review prompt unchanged.

Groundwork for the Curator feature (issue #7816) — the creation-side fix.
Curator handles the retirement/consolidation side in a follow-up PR.

Tests assert the behavioral instructions are present (survey, class, update-
over-create, overlap-flagging, opt-out clause) rather than snapshotting the
full prompt text.
2026-04-26 05:17:10 -07:00
Teknium 192e7eb21f fix(nous): don't trip cross-session rate breaker on upstream-capacity 429s (#15898)
Nous Portal multiplexes multiple upstream providers (DeepSeek, Kimi,
MiMo, Hermes) behind one endpoint. Before this fix, any 429 on any of
those models recorded a cross-session file breaker that blocked EVERY
model on Nous for the cooldown window -- even though the caller's
own RPM/RPH/TPM/TPH buckets were healthy. Users hit a DeepSeek V4 Pro
capacity error, restarted, switched to Kimi 2.6, and still got
'Nous Portal rate limit active -- resets in 46m 53s'.

Nous already emits the full x-ratelimit-* header suite on every
response (captured by rate_limit_tracker into agent._rate_limit_state).
We now gate the breaker on that data: trip it only when either the
429's own headers or the last-known-good state show a bucket with
remaining == 0 AND a reset window >= 60s. Upstream-capacity 429s
(healthy buckets everywhere, but upstream out of capacity) fall
through to normal retry/fallback and the breaker is never written.
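
The gating rule can be sketched as below. The `Bucket` shape and `should_trip_breaker` are hypothetical; only the condition (remaining == 0 and reset window >= 60s, checked against both the 429's headers and the last-known-good state) comes from the description above.

```python
from dataclasses import dataclass
from typing import Iterable

MIN_RESET_SECONDS = 60


@dataclass
class Bucket:
    name: str             # e.g. "rpm", "tph"
    remaining: int        # requests or tokens left in the window
    reset_seconds: float  # time until the bucket refills


def should_trip_breaker(
    header_buckets: Iterable[Bucket],
    last_known_buckets: Iterable[Bucket],
) -> bool:
    """Trip only when some bucket is exhausted and slow to reset."""
    for bucket in (*header_buckets, *last_known_buckets):
        if bucket.remaining == 0 and bucket.reset_seconds >= MIN_RESET_SECONDS:
            return True
    return False
```

An upstream-capacity 429 arrives with healthy buckets in both sources, so this returns False and the caller proceeds to ordinary retry or fallback.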

Note: the 'restart TUI/gateway to clear' workaround circulated in
Discord does NOT work. The breaker is not held in memory; it is
file-backed at ~/.hermes/rate_limits/nous.json. Users still affected
by a bad state file should delete it.

Reported in Discord by CrazyDok1 and KYSIV (Apr 2026).
2026-04-26 04:53:42 -07:00
Brooklyn Nicholson d91e24547c fix(tui): attach inline diffs to tool timeline 2026-04-26 05:17:26 -05:00
Brooklyn Nicholson 05dc2eec36 fix(tui): tighten timeline detail spacing 2026-04-26 05:13:21 -05:00
Brooklyn Nicholson 2e6c3c7d23 fix(tui): address follow-up review nits 2026-04-26 05:06:57 -05:00
Brooklyn Nicholson a0aebad673 fix(tui): anchor details to stream timeline 2026-04-26 04:59:44 -05:00
Brooklyn Nicholson 7143d22a83 fix(tui): keep queued sends in queue UI 2026-04-26 04:49:56 -05:00
Brooklyn Nicholson 5ac4088856 fix(tui): keep live progress visible while scrolling 2026-04-26 04:46:44 -05:00
Brooklyn Nicholson e16e196c7e fix(tui): keep selection drag responsive 2026-04-26 04:44:19 -05:00
Brooklyn Nicholson 7d68ea9501 fix(tui): stream legacy thinking deltas visibly 2026-04-26 04:42:04 -05:00
Brooklyn Nicholson bc17310442 fix(tui): smooth selection drag behavior 2026-04-26 04:39:25 -05:00
Brooklyn Nicholson 8f0fa0836f fix(tui): preserve composer width on narrow panes 2026-04-26 04:35:54 -05:00
Brooklyn Nicholson bbd950efcf fix(tui): keep stream cadence responsive while typing 2026-04-26 04:32:55 -05:00
Brooklyn Nicholson 381121025e fix(tui): address review feedback 2026-04-26 04:28:55 -05:00
Brooklyn Nicholson 355e0ae960 fix(tui): keep streaming progress stable during interaction 2026-04-26 04:23:57 -05:00
Brooklyn Nicholson 1c964ed43f fix(tui): rely on native cursor for input 2026-04-26 03:47:05 -05:00
Brooklyn Nicholson cd7c5e5606 perf(tui): defer local input render during echo 2026-04-26 03:38:56 -05:00
Brooklyn Nicholson ee7ef33b02 fix(tui): queue busy submissions gracefully 2026-04-26 03:27:45 -05:00
Brooklyn Nicholson 5cd41d2b3b perf(tui): widen native input echo 2026-04-26 03:22:50 -05:00
Brooklyn Nicholson 9bb3bc422d perf(tui): optimistically echo simple input 2026-04-26 03:07:15 -05:00
Brooklyn Nicholson 19d75d1797 perf(tui): coalesce composer echo updates 2026-04-26 02:21:22 -05:00
Brooklyn Nicholson 458ce792d2 fix(tui): persist model switches by default 2026-04-26 02:15:10 -05:00
Brooklyn Nicholson 14fcff60c9 style(tui): apply formatter 2026-04-26 01:48:10 -05:00
Brooklyn Nicholson db4e4acca0 perf(tui): stabilize long-session scrolling 2026-04-26 01:47:05 -05:00
Teknium 59b56d445c feat(hooks): add duration_ms to post_tool_call + transform_tool_result (#15429)
Plugin hooks fired after a tool dispatch now receive an integer
duration_ms kwarg measuring how long the tool's registry.dispatch()
call took (time.monotonic() before/after). Inspired by Claude Code
2.1.119 which added the same field to PostToolUse hook inputs.

Wire points:
- model_tools.py: measure dispatch latency, pass duration_ms to
  invoke_hook("post_tool_call", ...) and invoke_hook("transform_tool_result", ...)
- hermes_cli/hooks.py: include duration_ms in the synthetic payload
  used by 'hermes hooks test' and 'hermes hooks doctor' so shell-hook
  authors see the same shape at development time as runtime
- shell hooks (agent/shell_hooks.py): no code change needed;
  _serialize_payload already surfaces non-top-level kwargs under
  payload['extra'], so duration_ms lands at extra.duration_ms for
  shell-hook scripts
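
A rough sketch of the measurement, assuming `dispatch` and `invoke_hook` are passed in as callables. The wrapper name and the other keyword arguments are hypothetical; only the `time.monotonic()` before/after pattern and the integer `duration_ms` kwarg come from the description above.

```python
import time


def dispatch_with_timing(dispatch, invoke_hook, tool_name, args):
    """Time one tool dispatch and hand the latency to the post-call hook."""
    started = time.monotonic()  # monotonic clock: immune to wall-clock jumps
    result = dispatch(tool_name, args)
    duration_ms = int((time.monotonic() - started) * 1000)
    invoke_hook(
        "post_tool_call",
        tool_name=tool_name,
        args=args,
        result=result,
        duration_ms=duration_ms,
    )
    return result
```

A plugin callback that accepts `**kwargs` can then read `kwargs["duration_ms"]` to feed a latency dashboard or alert.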

Plugin authors can now build latency dashboards, per-tool SLO alerts,
and regression canaries without having to wrap every tool manually.

Test: tests/test_model_tools.py::test_post_tool_call_receives_non_negative_integer_duration_ms
E2E: real PluginManager + dispatch monkey-patched with a 50ms sleep,
hook callback observes duration_ms=50 (int).

Refs: https://code.claude.com/docs/en/changelog (2.1.119, Apr 23 2026)
2026-04-25 22:13:12 -07:00
Teknium eb28145f36 feat(approval): hardline blocklist for unrecoverable commands (#15878)
Adds a floor below --yolo: a tiny set of commands so catastrophic they
should never run via the agent, regardless of --yolo, gateway /yolo,
approvals.mode=off, or cron approve mode.  Opting into yolo is trusting
the agent with your files and services — not trusting it to wipe the
disk or power the box off.

The list is deliberately small (12 patterns), covering only
unrecoverable ops:
- rm -rf targeting /, /home, /etc, /usr, /var, /boot, /bin, /sbin,
  /lib, ~, $HOME
- mkfs (any variant)
- dd + redirection to raw block devices (/dev/sd*, /dev/nvme*, etc.)
- fork bomb
- kill -1 / kill -9 -1
- shutdown, reboot, halt, poweroff, init 0/6, telinit 0/6,
  systemctl poweroff/reboot/halt/kexec

Recoverable-but-costly commands (git reset --hard, rm -rf /tmp/x,
chmod -R 777, curl | sh) stay in DANGEROUS_PATTERNS where yolo can
still pass them through — that's what yolo is for.
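
The two-tier behavior might look like the following. The regexes are a small illustrative subset written for this sketch, not the 12 patterns in the repository, and `check_command` is a hypothetical name.

```python
import re

# Illustrative hardline subset: never allowed, even with yolo.
HARDLINE_PATTERNS = [
    re.compile(r"\bmkfs(\.\w+)?\b"),
    re.compile(r"\b(shutdown|reboot|halt|poweroff)\b"),
    # rm with both -r and -f flags aimed at /, ~ or $HOME itself.
    re.compile(r"\brm\s+-(?=[a-zA-Z]*r)(?=[a-zA-Z]*f)[a-zA-Z]+\s+(/|~|\$HOME)/?(\s|$)"),
]

# Illustrative dangerous subset: needs approval unless yolo is on.
DANGEROUS_PATTERNS = [
    re.compile(r"\bgit\s+reset\s+--hard\b"),
    re.compile(r"\brm\s+-[a-zA-Z]*r"),
]


def check_command(command: str, yolo: bool) -> str:
    """Return 'blocked', 'needs_approval', or 'allowed'."""
    if any(p.search(command) for p in HARDLINE_PATTERNS):
        return "blocked"  # the floor below yolo
    if any(p.search(command) for p in DANGEROUS_PATTERNS):
        return "allowed" if yolo else "needs_approval"
    return "allowed"
```

The hardline check runs first, so no approval mode can reach the second tier for those commands.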

Container backends (docker/singularity/modal/daytona) continue to
bypass both hardline and dangerous checks, since nothing they do can
touch the host.

Inspired by Mercury Agent's permission-hardened blocklist.
2026-04-25 22:07:12 -07:00
Teknium a55de5bcd0 feat(setup): auto-reconfigure on existing installs (#15879)
Bare `hermes setup` on a returning user now drops straight into the
full reconfigure wizard — every prompt shows the current value as its
default, press Enter to keep or type a new value to change it. The
returning-user menu is gone.

Behavior:
- First-time user: first-time wizard (unchanged)
- Returning user, bare command: full reconfigure wizard (new default)
- Returning user, `--quick`: only prompt for missing/unset items
- Returning user, one section: `hermes setup model|terminal|gateway|tools|agent`
- `--reconfigure`: preserved as backwards-compat alias (no-op since it's now default)

The section functions already used current values as prompt defaults —
this change just removes the extra click to get to them.

The 'Quick Setup - configure missing items only' menu option is now
exposed as the explicit `--quick` flag; it's the narrow case of
filling in missing config (e.g. after a partial OpenClaw migration or
when a required API key got cleared).

Inspired by Mercury Agent's `mercury doctor` UX.

Also removes:
- RETURNING_USER_MENU_SECTION_KEYS (orphaned constant)
- Two returning-user menu tests in test_setup_noninteractive.py
  (guarding behavior that no longer exists — covered by
  test_setup_reconfigure.py instead)
2026-04-25 22:02:02 -07:00
brooklyn! cec0af02ad Merge pull request #15870 from NousResearch/bb/fix-skills-search
fix(tui): restore skills search RPC
2026-04-25 22:13:28 -05:00
Brooklyn Nicholson 91a7a0acbe fix(tui): restore skills search RPC 2026-04-25 22:11:52 -05:00
Teknium 7c50ed707c docs(azure-foundry): add provider guide, env vars, release AUTHOR_MAP
- New website/docs/guides/azure-foundry.md covering both OpenAI-style
  and Anthropic-style endpoints, auto-detection behaviour, gpt-5.x
  routing, /v1 stripping, api-version query forwarding, and the
  provider: anthropic + Azure URL alternative setup.
- environment-variables.md picks up AZURE_FOUNDRY_API_KEY,
  AZURE_FOUNDRY_BASE_URL, AZURE_ANTHROPIC_KEY.
- cli-commands.md includes azure-foundry in the provider choices list.
- configuration.md lists azure-foundry among auxiliary-task providers.
- sidebars.ts wires the new guide into the Guides section.
- scripts/release.py AUTHOR_MAP entries for TechPrototyper,
  HangGlidersRule (noreply), and pein892 so the contributor-attribution
  CI check does not reject the salvage.
2026-04-25 18:48:43 -07:00
Teknium 731e1ef8cb feat(azure-foundry): auto-detect transport, models, context length
The azure-foundry wizard now probes the endpoint before asking the user
to pick anything by hand:

  1. URL path sniff — endpoints ending in /anthropic are Azure Foundry
     Claude routes and skip to anthropic_messages.
  2. GET <base>/models probe — if the endpoint returns an OpenAI-shaped
     model list, we switch to chat_completions and prefill the picker
     with the returned deployment/model IDs.
  3. Anthropic Messages probe — fallback for endpoints that don't expose
     /models but do speak the Anthropic Messages shape.
  4. Manual fallback — private endpoints / custom routes still work;
     the user picks API mode + types a deployment name.
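
Steps 1 and 2 can be sketched without any network access. The helpers below are assumptions for illustration, not the contents of hermes_cli/azure_detect.py: `sniff_api_mode` and `extract_model_ids` are hypothetical names, and the payload shape is the usual OpenAI-style `{"data": [{"id": ...}]}` list.

```python
from typing import Optional
from urllib.parse import urlsplit


def sniff_api_mode(base_url: str) -> Optional[str]:
    """Step 1: a path ending in /anthropic implies the Anthropic Messages shape."""
    path = urlsplit(base_url).path.rstrip("/")
    return "anthropic_messages" if path.endswith("/anthropic") else None


def extract_model_ids(payload) -> list:
    """Step 2: pull IDs out of an OpenAI-shaped model list, tolerating junk."""
    data = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(data, list):
        return []
    return [
        item["id"]
        for item in data
        if isinstance(item, dict) and isinstance(item.get("id"), str)
    ]
```

An empty list from `extract_model_ids` would be the cue to move on to the next probe or to manual entry.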

Context length for the selected model is resolved through the existing
agent.model_metadata.get_model_context_length chain (models.dev,
provider metadata, hardcoded family fallbacks) and stored in
model.context_length when a non-default value is found.

Also refactors runtime_provider so Azure Foundry resolution is reused
between the explicit-credentials path and the default top-level path —
previously the /v1 strip for Anthropic-style Azure only ran when the
caller passed explicit_* args, which meant config-driven sessions
hit a double-/v1 URL.

New module hermes_cli/azure_detect.py with 19 unit tests covering:
- path sniff, model ID extraction, probe fallbacks
- HTTP error handling (URLError, HTTPError)
- context-length lookup passthrough
- DEFAULT_FALLBACK_CONTEXT rejection

New runtime tests cover:
- OpenAI-style Azure Foundry
- Anthropic-style Azure Foundry with /v1 stripping
- Missing base_url / API key raising AuthError

Rationale: Microsoft confirms there's no pure-API-key endpoint to list
Azure deployments (that requires ARM management auth).  The v1 Azure
OpenAI endpoint does expose /models with the resource's available
model catalog, which is good enough for picker prefill in the common
case.  Users on private/gated endpoints fall through to manual entry.
2026-04-25 18:48:43 -07:00
akhater ac57114284 fix(agent): support Azure OpenAI gpt-5.x on chat/completions endpoint
Azure OpenAI exposes an OpenAI-compatible endpoint at
`{resource}.openai.azure.com/openai/v1` that accepts the standard
`openai` Python client. Two issues prevented gpt-5.x models from working:

1. `_max_tokens_param()` only sent `max_completion_tokens` for
   `api.openai.com` URLs. Azure also requires `max_completion_tokens`
   for gpt-5.x models.

2. The `codex_responses` upgrade gate unconditionally upgraded gpt-5.x
   to Responses API. Azure does NOT support the Responses API — it serves
   gpt-5.x on the regular `/chat/completions` path, causing a 404.

Fix: add `_is_azure_openai_url()` that matches `openai.azure.com` URLs.
- `_max_tokens_param()` now returns `max_completion_tokens` for Azure.
- The `codex_responses` upgrade gate skips Azure so gpt-5.x stays on
  `chat_completions` where Azure actually serves it.
- The fallback-provider api_mode picker also recognises Azure and stays
  on chat_completions.
- Tests cover max_tokens routing, api_mode behaviour, and URL detection.
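
A hedged sketch of the URL detection. The commit only says the helper matches `openai.azure.com` URLs; the hostname-suffix comparison below is one possible way to do that and may differ from the real `_is_azure_openai_url()`.

```python
from urllib.parse import urlsplit


def is_azure_openai_url(base_url: str) -> bool:
    """True when the host is openai.azure.com or a subdomain of it."""
    host = (urlsplit(base_url).hostname or "").lower()
    return host == "openai.azure.com" or host.endswith(".openai.azure.com")
```

Comparing the parsed hostname rather than searching the raw string avoids matching look-alike hosts that merely contain the Azure domain.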

gpt-4.x models on Azure are unaffected (already used chat_completions +
max_tokens, which Azure accepts for those models).

Salvage of PR #10086 — rewritten against current main where the
codex_responses upgrade gate gained copilot-acp / explicit-api_mode
exclusions.
2026-04-25 18:48:43 -07:00
pein892 24b4b24d79 fix: preserve URL query params for Azure OpenAI and custom endpoints
Azure OpenAI requires an `api-version` query parameter on every request.
When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`),
the OpenAI SDK silently drops it during URL construction, causing 404 errors.

Extract query params from base_url and pass them via `default_query` so the
SDK appends them to every request. This is a generic solution that works for
any custom endpoint requiring query parameters, not just Azure.

No-op for URLs without query params — fully backward compatible.
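
The split itself needs only the standard library. `split_base_url` is a hypothetical name, and the example host is a placeholder; the returned dict is what the commit describes passing as `default_query`.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def split_base_url(base_url: str):
    """Separate a base URL into (url_without_query, query_params_dict)."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return clean, query
```

For a URL with no query string the dict comes back empty, which matches the no-op behavior described above.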
2026-04-25 18:48:43 -07:00
HangGlidersRule c15064fa37 fix: pass api-version as default_query param, not in base_url — SDK was producing malformed URLs like /anthropic?api-version=.../v1/messages 2026-04-25 18:48:43 -07:00
HangGlidersRule 7bfa9442de fix: skip OAuth token refresh for Azure Anthropic endpoints — prevents ~/.claude/.credentials.json from overwriting Azure key mid-session 2026-04-25 18:48:43 -07:00
HangGlidersRule d8e4c7214e fix: Azure Anthropic short-circuit in resolve_runtime_provider — bypass custom runtime when provider=anthropic + azure.com URL 2026-04-25 18:48:43 -07:00
HangGlidersRule 6ef3a47ce5 fix: use Azure API key directly for Azure endpoints, bypass OAuth token priority chain 2026-04-25 18:48:43 -07:00
TechPrototyper 3a7653dd1f feat: Add Azure Foundry provider with OpenAI/Anthropic API mode selection
Add support for Azure Foundry as a new inference provider. Azure Foundry
endpoints can use either OpenAI-style (/v1/chat/completions) or
Anthropic-style (/v1/messages) API formats.

Changes:
- Add azure-foundry to PROVIDER_REGISTRY (auth.py)
- Add azure-foundry overlay in HERMES_OVERLAYS (providers.py)
- Add empty model list for azure-foundry (models.py)
- Add _model_flow_azure_foundry() interactive setup (main.py)
- Add azure-foundry runtime resolution with api_mode support (runtime_provider.py)
- Add AZURE_FOUNDRY_API_KEY and AZURE_FOUNDRY_BASE_URL env vars (config.py)

Usage:
  hermes model -> More providers -> Azure Foundry

The setup wizard prompts for:
- Endpoint URL
- API format (OpenAI or Anthropic-style)
- API key
- Model name

Configuration is saved to config.yaml (model.provider, model.base_url,
model.api_mode, model.default) and ~/.hermes/.env (AZURE_FOUNDRY_API_KEY).
2026-04-25 18:48:43 -07:00
Teknium 125de02056 fix(context): honor custom_providers context_length on /model switch + bump probe tier to 256K (#15844)
Fixes #15779. Custom-provider per-model context_length (`custom_providers[].models.<id>.context_length`) is now honored across every resolution path, not just agent startup. Also adds 256K as the top probe tier and default fallback.

## What changed

New helper `hermes_cli.config.get_custom_provider_context_length()` — single source of truth for the per-model override lookup, with trailing-slash-insensitive base-url matching.

`agent.model_metadata.get_model_context_length()` gains an optional `custom_providers=` kwarg (step 0b — runs after explicit `config_context_length` but before every other probe).

Wired through five call sites that previously either duplicated the lookup or ignored it entirely:
- `run_agent.py` startup — refactored to use the new helper (dedups legacy inline loop, keeps invalid-value warning)
- `AIAgent.switch_model()` — re-reads custom_providers from live config on every /model switch
- `hermes_cli.model_switch.resolve_display_context_length()` — new `custom_providers=` kwarg
- `gateway/run.py` /model confirmation (picker callback + text path)
- `gateway/run.py` `_format_session_info` (/info)

## Context probe tiers

`CONTEXT_PROBE_TIERS = [256_000, 128_000, 64_000, 32_000, 16_000, 8_000]` — was `[128_000, ...]`. `DEFAULT_FALLBACK_CONTEXT` follows tier[0], so unknown models now default to 256K. The stale `128000` literal in the OpenRouter metadata-miss path is replaced with `DEFAULT_FALLBACK_CONTEXT` for consistency.

## Repro (from #15779)

```yaml
custom_providers:
  - name: my-custom-endpoint
    base_url: https://example.invalid/v1
    model: gpt-5.5
    models:
      gpt-5.5:
        context_length: 1050000
```

`/model gpt-5.5 --provider custom:my-custom-endpoint` → previously "Context: 128,000", now "Context: 1,050,000".
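
A sketch of the per-model override lookup against the config shape shown in the repro. The function name comes from the commit, but the signature and the validation rules here are assumptions.

```python
from typing import Optional


def get_custom_provider_context_length(custom_providers, base_url, model_id) -> Optional[int]:
    """Find custom_providers[].models.<id>.context_length for a base URL."""
    wanted = (base_url or "").rstrip("/")  # trailing-slash-insensitive match
    for provider in custom_providers or []:
        if str(provider.get("base_url", "")).rstrip("/") != wanted:
            continue
        entry = (provider.get("models") or {}).get(model_id) or {}
        value = entry.get("context_length")
        # Accept only positive integers; bool is excluded on purpose.
        if isinstance(value, int) and not isinstance(value, bool) and value > 0:
            return value
    return None
```

Returning None on a miss lets the caller continue to the remaining probes in the resolution chain.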

## Tests

- `tests/hermes_cli/test_custom_provider_context_length.py` — new file, 19 tests covering the helper, step-0b integration, and the 256K tier invariants
- `tests/hermes_cli/test_model_switch_context_display.py` — added regression tests for #15779 through the display resolver
- `tests/gateway/test_session_info.py` — updated default-fallback assertion (128K → 256K)
- `tests/agent/test_model_metadata.py` — updated tier assertions for the new top tier
2026-04-25 18:47:53 -07:00
Teknium 4c591c2819 chore(release): map fqsy1416@gmail.com to EKKOLearnAI 2026-04-25 18:40:35 -07:00
Teknium 01535a4732 fix(api_server): cap stop-run wait at 5s so interrupt can't hang handler
task.cancel() can't preempt the run_in_executor thread running
run_conversation(), so we rely on agent.interrupt() to wake the loop.
Without a timeout, a slow/unresponsive interrupt blocks the HTTP
response indefinitely. Wrap the await in wait_for(shield(task), 5.0)
and log a warning on timeout.
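
The bounded wait can be sketched as follows. `wait_for_stop` is a hypothetical helper; the `wait_for(shield(task), 5.0)` shape is taken from the description above.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def wait_for_stop(task: "asyncio.Future", timeout: float = 5.0) -> bool:
    """Wait up to `timeout` seconds for the run to wind down.

    shield() keeps the timeout from cancelling the run itself; only the
    wait is abandoned. Returns True if the task finished in time.
    """
    try:
        await asyncio.wait_for(asyncio.shield(task), timeout)
        return True
    except asyncio.TimeoutError:
        logger.warning("run did not stop within %.1fs", timeout)
        return False
```

On timeout the handler can still return its HTTP response while the underlying task keeps unwinding in the background.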

Also tidy one extra space in the module docstring's /stop entry.
2026-04-25 18:40:35 -07:00
ekko 0a15dbdc43 feat(api_server): add POST /v1/runs/{run_id}/stop endpoint
Add ability to interrupt a running agent via the runs API. Previously
/v1/runs could start a run and subscribe to events, but there was no
way to cancel it. The new endpoint stores agent and task references
during execution, calls agent.interrupt() to stop LLM calls, then
cancels the asyncio task.

Includes 15 tests covering start, events, and stop scenarios.
2026-04-25 18:40:35 -07:00
Teknium ce0513dd2e chore(release): map Feranmi10 personal email 2026-04-25 18:39:55 -07:00
Oluwadare Feranmi dc5e02ea7f feat(cli): implement hermes update --check flag (fixes #10318) 2026-04-25 18:39:55 -07:00
brooklyn! ff851ba7b9 Merge pull request #15821 from NousResearch/fix/tui-ctrl-g-editor
fix: external editor handoff in CLI/TUI
2026-04-25 20:37:05 -05:00
Brooklyn Nicholson 14dd8e9a72 fix(tui): address Copilot review on editor handoff
- resolveEditor() now returns argv (string[]) so EDITOR='code --wait'
  and VISUAL='emacsclient -t' tokenize correctly into spawnSync's
  separate command + args. Previously the whole string was passed as
  argv[0] and would ENOENT.
- Skip the POSIX X_OK PATH walk on Windows; return ['notepad.exe']
  there since fs.constants.X_OK is not meaningful and PATHEXT-based
  resolution would need its own implementation.
- Surface openEditor() rejections via actions.sys instead of letting
  them become unhandled promise rejections in the useInput callback.
- Hotkey docs/comment now say Cmd/Ctrl+G to match isAction()'s
  platform-action-modifier behavior (Cmd on macOS, Ctrl elsewhere).
2026-04-25 20:34:24 -05:00
Wysie 1d80e92c7e test(discord): add guild to fake e2e messages 2026-04-25 18:25:56 -07:00
Teknium edce7522a5 chore(release): add AUTHOR_MAP entry for voidborne-d personal email 2026-04-25 18:25:13 -07:00
voidborne-d 45e1228a8a fix(cli): suppress OSError EIO on interrupt shutdown
When the user interrupts a long-running task, prompt_toolkit tries to
flush stdout during emergency shutdown.  If stdout is in a broken state
(redirected to /dev/null, pipe closed, terminal gone), the flush raises
`OSError: [Errno 5] Input/output error` which propagates unhandled and
crashes the CLI.

Two defense layers:

1. `_suppress_closed_loop_errors`: add `OSError` with `errno.EIO` to
   the asyncio exception handler, matching the existing pattern for
   `RuntimeError("Event loop is closed")` and `KeyError("is not
   registered")`.

2. Outer `except (KeyError, OSError)` block: add `errno.EIO` check
   before the existing string-match guards, silently suppressing the
   error instead of printing a misleading stdin-related message.
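
The first defense layer might look like this. The handler name follows the commit, but the body is a sketch under assumptions rather than the CLI's actual code.

```python
import errno


def suppress_closed_loop_errors(loop, context):
    """asyncio exception handler that swallows known shutdown noise."""
    exc = context.get("exception")
    if isinstance(exc, RuntimeError) and "Event loop is closed" in str(exc):
        return
    if isinstance(exc, KeyError) and "is not registered" in str(exc):
        return
    # Broken stdout during emergency shutdown (pipe closed, terminal gone).
    if isinstance(exc, OSError) and exc.errno == errno.EIO:
        return
    loop.default_exception_handler(context)
```

It would be installed with `loop.set_exception_handler(suppress_closed_loop_errors)`; any error outside these three cases still reaches the default handler.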

Fixes #13710.
2026-04-25 18:25:13 -07:00
Brooklyn Nicholson 83129e72de refactor(tui): tighten editor handoff helpers
- editor.ts: collapse two private helpers into one flatMap-driven lookup,
  keep `isExecutable` as the only named primitive, document the fallback
  chain with prompt_toolkit parity
- editor.test.ts: hoist the `exe` helper out of `describe`, drop the
  empty afterEach + dead mkdir branch, materialize expected paths before
  the resolveEditor call so argument evaluation order doesn't bite
- useComposerState.openEditor: rmSync the mkdtemp dir (was leaking),
  early-return on bad exit / empty buffer, run cleanup in finally
- useInputHandlers: cheap `ch.toLowerCase() === 'g'` guard before the
  modifier check
- hermes-ink/screen.ts: pick up `npm run fix` import-sort cleanup so
  lint passes
2026-04-25 20:24:06 -05:00
Teknium 4d170134ef chore(release): map nerijusn76@gmail.com to Nerijusas (#15833) 2026-04-25 18:22:49 -07:00
nerijusas 81e01f6ee9 fix(agent): preserve Codex message items for replay 2026-04-25 18:22:06 -07:00
Brooklyn Nicholson 7fd8dc0bfb fix: preserve prompt_toolkit editor picker and mirror it in TUI
Base CLI's editor UX was better because prompt_toolkit picks the system
editor first, then friendly terminal editors before vi. Do not override
that with a vim-first chain.

Keep the CLI on prompt_toolkit's picker and only set tempfile_suffix='.md'
to avoid the complex-tempfile EEXIST path. Update the TUI resolver to
match prompt_toolkit's fallback order: $VISUAL, $EDITOR, editor, nano,
pico, vi, emacs.
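
The fallback order above can be sketched in Python as follows. This is an illustration only: the real TUI resolver is TypeScript, and `resolve_editor` plus its injectable `which` parameter are hypothetical names chosen for the example:

```python
import os
import shutil

# Fallback order from the commit message, tried after $VISUAL and $EDITOR
FALLBACK_EDITORS = ("editor", "nano", "pico", "vi", "emacs")


def resolve_editor(env=None, which=shutil.which):
    """Return the first usable editor command, or None if none is found."""
    env = os.environ if env is None else env
    candidates = [env.get("VISUAL"), env.get("EDITOR"), *FALLBACK_EDITORS]
    for candidate in candidates:
        if not candidate:
            continue
        parts = candidate.split()
        if not parts:
            continue
        # env values may carry arguments ("code --wait"); probe only the program
        if which(parts[0]):
            return candidate
    return None
```

Passing `which` as a parameter keeps the lookup testable without touching the real `PATH`.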
2026-04-25 20:20:05 -05:00
Brooklyn Nicholson d056b610b7 fix: avoid prompt_toolkit complex tempfile bug and prefer nvim first
Setting buffer.tempfile = 'prompt.md' pushed prompt_toolkit into its
complex-tempfile path, which creates a temp dir and then calls
os.makedirs() on that same path when no subdirectory is present. That
raises EEXIST before the editor can launch.

Keep prompt_toolkit on the simple tempfile path with .md suffix, and
make the editor fallback chain explicit on both surfaces:
$VISUAL -> $EDITOR -> nvim -> vim -> vi -> nano.
2026-04-25 20:16:50 -05:00
Teknium 2536a36f6f fix(tui): route /save through session.save JSON-RPC
The cherry-picked approach serialized the UI-shaped transcript on the Node
side, producing a third JSON format alongside cli.py save_conversation and
tui_gateway session.save. Simpler to call the existing session.save method,
which already writes the canonical agent history (raw OpenAI messages +
model) to an absolute-path file.

- /save still short-circuits before the slash worker
- Empty transcript -> 'no conversation yet'
- No active session -> 'no active session - nothing to save'
- Otherwise: rpc('session.save', {session_id}) and echo back the file path
- Tests updated to assert RPC contract; new test covers the no-sid case
2026-04-25 18:11:37 -07:00
helix4u 1b8ca9254f fix(tui): save live transcript from slash command 2026-04-25 18:11:37 -07:00
Brooklyn Nicholson db7c5735f0 fix: prefer vim over nano for $EDITOR fallback (CLI + TUI)
prompt_toolkit's default editor list is: $VISUAL, $EDITOR, /usr/bin/editor,
/usr/bin/nano, /usr/bin/pico, /usr/bin/vi, /usr/bin/emacs — so when
neither env var is set, the base CLI launched nano. The TUI fell back
to a literal 'vi'. Same Ctrl+G keystroke, two different editors.

Pick the same chain on both surfaces:
  $VISUAL → $EDITOR → vim → vi → nano

CLI: override input_area.buffer._open_file_in_editor on the TextArea
once at app build time. Local to that buffer; doesn't touch
os.environ or affect other subprocesses.

TUI: extract resolveEditor() into ui-tui/src/lib/editor.ts. PATH walk
with accessSync(X_OK), no shelling out. Six-line unit test verifies
the priority order and the multi-entry PATH walk.
2026-04-25 20:11:25 -05:00
Teknium 8bbeaea6c7 fix(config): broaden api-key ref lookup to templated base_url
The raw-template lookup added in PR #15817 went through
`get_compatible_custom_providers(read_raw_config())`, which calls
`_normalize_custom_provider_entry` → `urlparse(base_url)`. Any
entry whose `base_url` is itself an env-ref (`${NEURALWATT_API_BASE}`)
was dropped as 'not a valid URL', so `api_key_ref` stayed empty and the
resolved secret was still written to `model.api_key` — the exact case
the original Discord report described.

Replace the normalizer-gated lookup with a direct read of
`raw['custom_providers']` and `raw['providers']`, indexed by name
(case-insensitive, optionally qualified by model) so the loaded
(expanded) entry can be matched regardless of how `base_url` is
written.

Add an integration regression test driving the real
`select_provider_and_model` entry point with the Discord-reported
NeuralWatt config (`${VAR}` in both `base_url` and `api_key`).
This test fails on the PR-only fix and passes with the broadened
lookup.
2026-04-25 18:10:52 -07:00
helix4u 1fdc31b214 fix(config): preserve custom provider api key refs 2026-04-25 18:10:52 -07:00
Brooklyn Nicholson 5fac6c3440 fix(cli): write editor draft to prompt.md so syntax highlighting works
Base CLI was handing prompt_toolkit's Buffer.open_in_editor() a default
config — Buffer.tempfile_suffix and .tempfile both empty — so it
created /tmp/tmpXXXXXX with no extension. nano/vim/helix all key
syntax highlighting off the file extension, so the buffer rendered
plain.

The TUI already writes to <mkdtemp>/prompt.md and gets full markdown
highlighting + a sensible title bar. Set buffer.tempfile = 'prompt.md'
on the TextArea so prompt_toolkit's complex-tempfile path produces
<mkdtemp>/prompt.md to match. shutil.rmtree cleanup is built-in.
2026-04-25 20:04:04 -05:00
kshitijk4poor 2c56dce0ed fix(model): preserve custom endpoint credentials and accept cloud models not in /v1/models
When switching models on a custom endpoint (ollama-launch):
- Same-provider switches no longer re-resolve credentials (fixes base_url
  being lost for 'custom' provider on subsequent switches)
- Named providers (ollama-launch) are resolved via user_providers so
  switch_model can find their base_url from config
- Models not in the /v1/models probe but present in the user's saved
  provider config are accepted with a warning instead of rejected
- CLI /model and TUI /model both pass user_providers/custom_providers
  to switch_model so the config model list is available for validation

Closes #15088
2026-04-25 18:03:47 -07:00
Teknium 01cf2c65cc chore(release): map iris@growthpillars.co to irispillars (#15825)
Follow-up to #15533 (merged). Prevents release notes CI from
attributing the contributor to the placeholder.
2026-04-25 18:02:13 -07:00
helix4u b2d3308f98 fix(doctor): accept bare custom provider 2026-04-25 18:01:36 -07:00
Iris Jin 25ba6a4a74 fix(gateway): make reasoning session-scoped by default 2026-04-25 18:01:31 -07:00
Brooklyn Nicholson 4c797bfae9 fix(cli): accept Alt+G as Ctrl+G fallback in VSCode/Cursor terminals
Same problem as the TUI: Cursor and VSCode bind Ctrl+G to "Find Next"
at the editor level, so the keystroke never reaches the terminal and
the prompt_toolkit-driven Hermes CLI sees nothing.

Register ('escape', 'g') alongside the existing 'c-g' on the same
handler so the editor handoff works inside Cursor/VSCode too. The
filter (no clarify/approval/sudo/secret prompt active) is unchanged.
2026-04-25 20:01:03 -05:00
Brooklyn Nicholson c58956a9a2 fix(tui): accept Alt+G as Ctrl+G fallback in VSCode/Cursor terminals
VSCode and Cursor bind Ctrl+G to "Find Next" at the editor level, so
the keystroke never reaches the embedded terminal — Ctrl+G to open
$EDITOR was effectively dead inside those IDEs.

Alt+G is unbound in both editors and reaches the TUI cleanly as
`\x1bg` → `key.meta && ch === 'g'` after parse-keypress. Accept it
alongside the existing isAction(key, ch, 'g') check, and document the
fallback in README + the hotkeys panel.
2026-04-25 19:57:17 -05:00
Brooklyn Nicholson 3944b22506 fix(tui): suspend Ink properly when opening $EDITOR via Ctrl+G
The Ctrl+G handler was toggling the alt-screen by hand
(`\x1b[?1049l` ... `\x1b[?1049h`) without releasing stdin or kitty
keyboard mode, so the launched editor would lose keystrokes (Ink kept
swallowing them) and editors that don't speak CSI-u (e.g. nano) would
print "Unknown sequence" for every Ctrl-key.

Switch to `withInkSuspended` from @hermes/ink, the same helper
`/setup` already uses. It pauses Ink, removes stdin listeners, drops
raw mode, disables kitty/modifyOtherKeys + mouse + focus reporting,
runs the editor, then restores everything with a full repaint.
2026-04-25 19:54:06 -05:00
brooklyn! 489bed6f96 Merge pull request #15478 from yes999zc/fix-deepseek-reasoning-all-assistant-messages
fix: DeepSeek/Kimi thinking mode requires reasoning_content on ALL assistant messages
2026-04-25 19:19:33 -05:00
FocusFlow Dev ad0ac89478 fix: DeepSeek/Kimi thinking mode requires reasoning_content on ALL assistant messages
Previously _copy_reasoning_content_for_api only padded reasoning_content
when the assistant message had tool_calls. DeepSeek V4 thinking mode
requires the field on every assistant turn, including plain text replies
without tool_calls.

- Remove the 'source_msg.get("tool_calls") and' guard
- Update test: plain assistant turns now get padded for DeepSeek/Kimi

Fixes #15213
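
A simplified sketch of the behavior after this change, assuming messages are plain dicts. `pad_reasoning_content` and its `needs_empty_reasoning` flag are hypothetical stand-ins, not the real `_copy_reasoning_content_for_api` signature:

```python
def pad_reasoning_content(messages, needs_empty_reasoning):
    """Return copies of messages, padding reasoning_content on assistant turns.

    needs_empty_reasoning stands in for the provider check (DeepSeek/Kimi
    thinking mode); how that check is made is outside this sketch.
    """
    padded = []
    for msg in messages:
        copy = dict(msg)
        is_assistant = copy.get("role") == "assistant"
        # No tool_calls guard: every assistant turn gets the field
        if needs_empty_reasoning and is_assistant and "reasoning_content" not in copy:
            copy["reasoning_content"] = ""
        padded.append(copy)
    return padded
```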
2026-04-26 07:47:13 +08:00
Teknium dc4d92f131 docs: embed tutorial videos on webhooks + auxiliary models pages (#15809)
- webhooks.md: adds a Video Tutorial section under the intro with a
  responsive YouTube iframe (WNYe5mD4fY8).
- configuration.md: adds a Video Tutorial subsection under Auxiliary
  Models with a responsive YouTube iframe (NoF-YajElIM).

Both use a 16:9 aspect-ratio wrapper so the embeds scale cleanly on
mobile. Verified with `npm run build` — MDX parses clean, no new
warnings or broken links introduced.
2026-04-25 16:44:53 -07:00
Teknium 47420a84b9 docs(obliteratus): link YouTube video guide in SKILL.md (#15808)
Adds a 'Video Guide' section pointing at the walkthrough of a Hermes agent
abliterating Gemma with OBLITERATUS, so the agent can surface it when the
user wants a visual overview before running the workflow.
2026-04-25 16:30:38 -07:00
brooklyn! f93d4624bf Merge pull request #15749 from Zjianru/fix/copy-reasoning-content-ordering-and-cross-provider-isolation
fix(agent): ordering fix in _copy_reasoning_content_for_api — cross-provider reasoning isolation
2026-04-25 17:21:49 -05:00
codez 5ae608152e fix: remove has_reasoning guard — inject empty reasoning_content for DeepSeek/Kimi tool_calls unconditionally 2026-04-26 06:08:54 +08:00
brooklyn! 88b65cc82a Update run_agent.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-26 05:49:38 +08:00
brooklyn! edc78e258c Merge pull request #15766 from NousResearch/bb/tui-ssh-copy
fix(tui): honor client copy shortcut over ssh
2026-04-25 15:33:17 -05:00
Brooklyn Nicholson 31d7f1951a fix(tui): clamp copied selection bounds
Clamp copied selection columns to the screen width before scanning rendered cells.
2026-04-25 15:32:45 -05:00
Brooklyn Nicholson b1c18e5a41 refactor(tui): format screen imports
Keep screen.ts import ordering aligned with the ui-tui formatter.
2026-04-25 15:26:51 -05:00
Brooklyn Nicholson bd66e55a02 fix(tui): track rendered spaces for selection copy
- add a written-cell bitmap so selection can distinguish rendered spaces from blank padding
- preserve code indentation without markdown-specific rendering hacks
2026-04-25 15:21:26 -05:00
Brooklyn Nicholson 1735ced93b fix(tui): preserve code block indentation in selection
Render code indentation spaces as selectable cells so copied fenced code keeps its leading whitespace.
2026-04-25 15:17:36 -05:00
Brooklyn Nicholson bba16943f6 fix(tui): preserve rendered indentation in selections
- trim only empty edge rows instead of full selected text
- bound selection paint using unwritten cells so rendered indentation remains copyable
2026-04-25 15:14:26 -05:00
Brooklyn Nicholson 132620ba3d refactor(tui): simplify remote copy hotkey hints
Use an explicit conditional table instead of spread casting for SSH copy hint rows.
2026-04-25 15:09:12 -05:00
Brooklyn Nicholson 876bb60044 fix(tui): trim whitespace-only selection chrome
- clamp selection highlight to real row content so blank drag margins do not render or copy
- keep successful copy actions quiet while preserving usage and failure feedback
2026-04-25 15:07:29 -05:00
Brooklyn Nicholson a68793b6c4 refactor(tui): share remote shell detection
Reuse the platform helper for SSH-aware copy hints so hotkey display and input handling cannot drift.
2026-04-25 14:55:28 -05:00
Brooklyn Nicholson bcc5362432 fix(tui): honor client copy shortcut over ssh
- accept forwarded Cmd+C for selection copy in SSH sessions even when Hermes runs on Linux
- keep local Linux Alt+C from acting as copy and update TUI hotkey hints for remote shells
2026-04-25 14:44:39 -05:00
brooklyn! 283c8fd6e2 Merge pull request #15755 from NousResearch/bb/tui-model-flag
fix(tui): honor launch model overrides
2026-04-25 14:30:26 -05:00
Brooklyn Nicholson 919274b60e fix(tui): align overlay q shortcut casing
Keep shared overlay close behavior consistent with pager and agents overlays by binding lowercase q only.
2026-04-25 14:26:35 -05:00
Brooklyn Nicholson 6e83d90eb4 refactor(tui): tighten overlay helpers
- rename overlay help text component to match its role
- share picker window math across model, session, and skills overlays
2026-04-25 14:23:45 -05:00
Brooklyn Nicholson c6fdf48b79 fix(tui): sync inference model after switches
- keep HERMES_INFERENCE_MODEL aligned with HERMES_MODEL after in-TUI model switches
- clarify static provider detection remapping docs
2026-04-25 14:17:57 -05:00
Brooklyn Nicholson a046483e86 fix(tui): share overlay close controls
- add reusable overlay key and help-text helpers for picker-style overlays
- make model, session, skills, and pager hints consistently support Esc/q close behavior
2026-04-25 14:17:04 -05:00
Brooklyn Nicholson fdcbd2257b fix(tui): resolve startup model aliases statically
- expand short model aliases like sonnet/opus via static catalogs during startup runtime resolution
- keep startup alias resolution network-free and add regression tests in models and tui gateway suites
2026-04-25 14:13:02 -05:00
Brooklyn Nicholson 48bdd2445e fix(tui): apply ui-tui fix pass and restore type-check
- run the requested ui-tui lint+format pass and include resulting formatting updates
- guard text-measure cache eviction key in hermes-ink so ui-tui type-check stays green
2026-04-25 14:08:54 -05:00
Brooklyn Nicholson 5e52011de3 fix(tui): bind provider as model alias 2026-04-25 13:58:59 -05:00
Brooklyn Nicholson e48a497d16 fix(tui): share static model detection 2026-04-25 13:56:16 -05:00
Brooklyn Nicholson 2dfcc8087a fix(tui): avoid network lookup during startup 2026-04-25 13:47:18 -05:00
Brooklyn Nicholson 4db58d45d4 fix(tui): address startup provider review 2026-04-25 13:29:15 -05:00
Brooklyn Nicholson 57b43fdd4b fix(tui): preserve provider precedence on startup 2026-04-25 13:25:43 -05:00
Brooklyn Nicholson e9c47c7042 fix(tui): honor launch model overrides 2026-04-25 13:21:59 -05:00
brooklyn! ee0728c6c4 Merge pull request #15351 from helix4u/fix/tui-rebuild-missing-ink-bundle
fix(tui): rebuild when ink bundle is missing
2026-04-25 13:14:23 -05:00
codez 9daa0620a6 fix(agent): ordering fix in _copy_reasoning_content_for_api — cross-provider reasoning isolation
Fix logic-ordering bug where normalized_reasoning promotion returns
before the DeepSeek/Kimi needs_empty_reasoning guard, causing
cross-provider reasoning content (MiniMax → DeepSeek) to leak into
reasoning_content and trigger HTTP 400.

Changes:
- Reorder branching: existing reasoning_content check first
- Add 'not has_reasoning' guard so poisoned histories (no reasoning)
  still get '' injected for DeepSeek/Kimi
- Healthy same-provider reasoning promotion path unchanged

Refs: #15250, #15213
2026-04-26 02:04:52 +08:00
kshitij 648b89911f fix: use output_text for assistant message content in Codex Responses API (#15690)
The Codex Responses API rejects input_text inside assistant messages —
only output_text and refusal are valid content types for assistant role.

_chat_content_to_responses_parts() previously hardcoded all text content
to input_text regardless of the message role. When an assistant message
had list-format content (multimodal or structured), this produced invalid
input_text parts that the API rejected with:

  Invalid value: 'input_text'. Supported values are: 'output_text' and 'refusal'.

Fix: add a role parameter to _chat_content_to_responses_parts() that
selects output_text for assistant messages and input_text for user
messages. Thread this through _chat_messages_to_responses_input() and
_preflight_codex_input_items().

Fixes #15687
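
The role-dependent part type might look roughly like this. The helper names are hypothetical and the sketch handles only text parts, so it should be read as an illustration of the rule rather than the actual `_chat_content_to_responses_parts()` body:

```python
def text_part_type(role):
    """Assistant text must be output_text; other roles use input_text."""
    return "output_text" if role == "assistant" else "input_text"


def content_to_parts(content, role="user"):
    """Convert chat-style content (str or list of parts) to Responses-style text parts."""
    if isinstance(content, str):
        return [{"type": text_part_type(role), "text": content}]
    parts = []
    for item in content:
        # Non-text parts (images etc.) are skipped here for brevity
        if isinstance(item, dict) and item.get("type") == "text":
            parts.append({"type": text_part_type(role), "text": item.get("text", "")})
    return parts
```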
2026-04-25 10:13:29 -07:00
kshitijk4poor 7c17accb29 fix: /stop now immediately aborts streaming retry loop
When a user sends /stop during a streaming API call, the outer poll loop
detects _interrupt_requested and closes the HTTP connection. However, the
inner _call() thread catches the connection error and enters its retry
loop — opening a FRESH connection without checking the interrupt flag.

On slow providers like ollama-cloud, each retry attempt blocks for the
full stream-read timeout (120s+). With 3 retry attempts this caused
510+ second delays between /stop and actual response — the agent appeared
completely unresponsive despite the stop being acknowledged.

Fix: add an _interrupt_requested check at the top of the streaming retry
loop so the agent exits immediately instead of retrying.
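
As a rough illustration of the fix, here is a retry loop that checks an interrupt flag before each attempt. The function and parameter names are hypothetical, and a `threading.Event` stands in for the agent's `_interrupt_requested` flag:

```python
import threading


def call_with_retries(attempt, interrupt, max_retries=3):
    """Run attempt() up to max_retries times, bailing out once interrupt is set."""
    last_error = None
    for _ in range(max_retries):
        # Check BEFORE opening a fresh connection, so /stop is honored at once
        if interrupt.is_set():
            raise InterruptedError("stopped by user")
        try:
            return attempt()
        except ConnectionError as exc:
            last_error = exc
    raise last_error if last_error else RuntimeError("no attempts were made")
```

Without the check at the top of the loop, a connection closed by the interrupt would simply trigger the next retry.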

Also fix log truncation: all session key logging in gateway/run.py used
[:20] or [:30] slices, which truncated 'agent:main:telegram:dm:5690190437'
(33 chars) to 'agent:main:telegram:' — losing the identifying chat type
and user ID. Replace with full keys to make logs debuggable.

Reported by user Sidharth Pulipaka via Telegram on ollama-cloud provider.
2026-04-25 09:51:39 -07:00
Teknium 5006b2204b fix(update): honor RestartSec when polling for gateway respawn (#15707)
The post-graceful-drain is-active poll used a fixed 10s timeout, but
systemd's hermes-gateway.service has RestartSec=30 — so systemd won't
respawn the unit for 30s after exit-75, and our poll gives up during
the cooldown. Result: every 'hermes update' printed

  ⚠ hermes-gateway drained but didn't relaunch — forcing restart

followed by a redundant 'systemctl restart' that kicked the newly-
respawning gateway again (and re-started WhatsApp / Discord a second
time in the process).

Fix: read RestartUSec from the unit via 'systemctl show' and set the
poll budget to max(10s, RestartSec + 10s slack). Units without
RestartSec set (or value=infinity) fall back to the original 10s.

Observed timeline from journalctl before fix:
  08:56:22.262  old PID exits 75
  08:56:32.707  systemd logs Stopped -> Started  (10.4s gap, > 10s budget)

After the fix the poll budget is 40s (RestartSec 30s + 10s slack), which comfortably covers the 30s respawn cooldown.

Validation:
- RestartUSec parser tested against '30s', '100ms', '1min 30s',
  'infinity', '', 'garbage', '500us', '2min' — all correct.
- Against the live hermes-gateway.service: parses to 30.0s.
- tests/hermes_cli/test_update_gateway_restart.py: 41/41 pass.
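
A sketch of the parsing and budget rule described above, assuming `RestartUSec` arrives as whitespace-separated systemd time spans such as `1min 30s`. The function names and the supported unit table are assumptions for illustration, not the shipped parser:

```python
import re

# Seconds per unit; only the units needed for this sketch
_UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0, "min": 60.0, "h": 3600.0}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(us|ms|s|min|h)")


def parse_restart_usec(value):
    """Return the span in seconds, or None for empty, 'infinity', or unparseable input."""
    tokens = value.strip().split()
    if not tokens:
        return None
    total = 0.0
    for token in tokens:
        match = _TOKEN.fullmatch(token)
        if match is None:
            return None
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return total


def poll_budget(restart_sec, floor=10.0, slack=10.0):
    """max(10s, RestartSec + 10s slack); fall back to the floor when unknown."""
    if restart_sec is None:
        return floor
    return max(floor, restart_sec + slack)
```

With `RestartSec=30`, `poll_budget(parse_restart_usec("30s"))` gives the 40-second budget mentioned above.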
2026-04-25 09:08:27 -07:00
Teknium a9fa73a620 feat(oneshot): add --model / --provider / HERMES_INFERENCE_MODEL (#15704)
Makes hermes -z usable by sweeper without mutating user config.

- Top-level -m/--model and --provider flags that apply to -z/--oneshot
  (mirrors hermes chat's plumbing).
- HERMES_INFERENCE_MODEL env var as the parallel to HERMES_INFERENCE_PROVIDER
  for CI / scripted invocations.
- resolve_runtime_provider() gets the requested provider; when --model is
  given without --provider, detect_provider_for_model() auto-selects the
  provider that serves it (same semantic as /model in an interactive session).
- --provider without --model errors out with exit 2 — carrying a config
  model across to a different provider is usually wrong, and silently
  picking the provider's catalog default hides the mismatch.

Config defaults still used when both flags are omitted (existing behavior).
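
The flag rules can be sketched with `argparse` as below. This is a simplified stand-in for the real CLI wiring: `parse_oneshot_args` is a hypothetical name, and environment-variable handling and provider auto-detection are left out:

```python
import argparse


def parse_oneshot_args(argv):
    """Parse -z/--oneshot with optional --model / --provider overrides."""
    parser = argparse.ArgumentParser(prog="hermes")
    parser.add_argument("-z", "--oneshot", metavar="PROMPT")
    parser.add_argument("-m", "--model")
    parser.add_argument("--provider")
    args = parser.parse_args(argv)
    if args.provider and not args.model:
        # ArgumentParser.error prints to stderr and exits with status 2
        parser.error("--provider requires --model")
    return args
```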

Validation (all live against OpenRouter):
  -z 'x' ....................... uses config default (opus-4.7)
  -z 'x' --model haiku-4.5 ..... haiku-4.5 via auto-detected openrouter
  -z 'x' --model ... --provider  pair as given
  HERMES_INFERENCE_MODEL=... -z  haiku-4.5 via env var
  -z 'x' --provider anthropic .. exits 2 with error to stderr
2026-04-25 08:55:36 -07:00
Teknium 7c8c031f60 feat: add hermes -z <prompt> one-shot mode (#15702)
* feat: add `hermes -z <prompt>` one-shot mode

Top-level flag that runs a single prompt and prints ONLY the final
response text to stdout. No banner, no spinner, no tool previews, no
session_id line — stdout is machine-readable, stderr is silent.

Tools, memory, rules, and AGENTS.md in the CWD are loaded as normal.
Approvals are auto-bypassed (sets HERMES_YOLO_MODE=1 for the call).
Bypasses cli.py entirely — goes straight to AIAgent.chat().

* feat(oneshot): handle interactive-callback gaps explicitly

Document (and where needed, patch) the interactive surfaces that have
no user to answer in oneshot mode:

  - clarify       — inject a callback that tells the agent to pick the
                    best default and continue (previously returned a
                    generic 'not available in this execution context'
                    error that wastes a tool call)
  - sudo password — terminal_tool already gates on HERMES_INTERACTIVE
                    (we don't set it); sudo fails gracefully
  - shell hooks   — HERMES_ACCEPT_HOOKS=1 auto-approves; also falls
                    back to deny on non-tty stdin
  - dangerous cmd — HERMES_YOLO_MODE=1 short-circuits before input()
  - secret capture— tool returns gracefully when no callback wired

Live-tested: agent asked clarify(['red','blue']) and got 'red' back,
replied with only 'red'.
2026-04-25 08:44:38 -07:00
Teknium ea01bdcebe refactor(memory): remove flush_memories entirely (#15696)
The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are
obsolete now that the background memory/skill review handles
persistent memory extraction.

Problems with flush_memories:

- Pre-dates the background review loop.  It was the only memory-save
  path when introduced; the background review now fires every 10 user
  turns on CLI and gateway alike, which is far more frequent than
  compression or session reset ever triggered flush.
- Blocking and synchronous.  Pre-compression flush ran on the live agent
  before compression, blocking the user-visible response.
- Cache-breaking.  Flush built a temporary conversation prefix
  (system prompt + memory-only tool list) that diverged from the live
  conversation's cached prefix, invalidating prompt caching.  The
  gateway variant spawned a fresh AIAgent with its own clean prompt
  for each finalized session — still cache-breaking, just in a
  different process.
- Redundant.  Background review runs in the live conversation's
  session context, gets the same content, writes to the same memory
  store, and doesn't break the cache.  Everything flush_memories
  claimed to preserve is already covered.

What this removes:

- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
  (and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks,
  hermes tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
  _check_compression_model_feasibility (headroom was only needed
  because flush dragged the full main-agent system prompt along;
  the compression summariser sends a single user-role prompt so
  new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
  flush-specific paths

What this renames (with read-time backcompat on sessions.json):

- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized.
  The session-expiry watcher still uses the flag to avoid re-running
  finalize/eviction on the same expired session; the new name
  reflects what it now actually gates.  from_dict() reads
  'expiry_finalized' first, falls back to the legacy 'memory_flushed'
  key so existing sessions.json files upgrade seamlessly.
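
A minimal sketch of the read-time fallback, assuming a dataclass-style entry. The `session_id` field and the overall class shape are illustrative; only the two key names come from the commit message:

```python
from dataclasses import dataclass


@dataclass
class SessionEntry:
    session_id: str  # hypothetical field, included only to make the example complete
    expiry_finalized: bool = False

    @classmethod
    def from_dict(cls, data):
        # Prefer the new key; fall back to the legacy name from older sessions.json files
        flag = data.get("expiry_finalized", data.get("memory_flushed", False))
        return cls(session_id=data["session_id"], expiry_finalized=bool(flag))
```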

Supersedes #15631 and #15638.

Tested: 383 targeted tests pass across run_agent/, agent/, cli/,
and gateway/ session-boundary suites.  No behavior regressions —
background memory review continues to handle persistent memory
extraction on both CLI and gateway.
2026-04-25 08:21:14 -07:00
kshitijk4poor d635e2df3f fix(compression): pass provider to context length resolver in feasibility check
_check_compression_model_feasibility calls get_model_context_length
without provider=, so Codex OAuth users get 1,050,000 (from models.dev
for 'openai') instead of the actual 272,000 limit. This happens because
_infer_provider_from_url maps chatgpt.com → 'openai' (not 'openai-codex'),
skipping the Codex-specific resolution branch entirely.

Result: compression threshold set at 85% of 1.05M = 892K — conversations
never trigger compression, the context grows unbounded, and when gateway
hygiene eventually forces compression, the Codex endpoint drops the
oversized streaming request ('peer closed connection without sending
complete message body').

Fix: forward self.provider to get_model_context_length so provider-
specific resolution branches (Codex OAuth 272K, Copilot live /models,
Nous suffix-match) fire correctly.
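
To make the arithmetic concrete, here is the threshold under both context lengths, assuming the 85% ratio quoted above. `compression_threshold` is a hypothetical helper written with integer math to avoid float rounding:

```python
def compression_threshold(context_length, percent=85):
    """Token count at which compression would trigger."""
    return context_length * percent // 100


# Wrong limit resolved without provider= versus the Codex OAuth limit
wrong = compression_threshold(1_050_000)
right = compression_threshold(272_000)
```

With the wrong limit the threshold sits far above the real 272,000-token window, so compression can never fire before the endpoint's limit is reached.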

Reported by user on GPT 5.5 via Codex OAuth Pro (paste.rs/vsra3).
2026-04-25 07:09:47 -07:00
Teknium cf2fabc40f docs(dashboard): document page-scoped plugin slots (#15662)
Follow-up to PR #15658. The feature PR introduced page-scoped slots
(<page>:top / <page>:bottom inside every built-in page) but only
touched the Shell slots catalogue. Adds proper narrative coverage so
plugin authors find the feature.

Changes
- extending-the-dashboard.md:
  - Frontmatter description + intro bullet now mention page-scoped slots
  - New TOC entry "Augmenting built-in pages (page-scoped slots)"
  - New dedicated subsection after "Replacing built-in pages"
    explaining the heavy-vs-light tradeoff, listing the pages that
    expose slots, and showing a worked manifest + IIFE example with
    tab.hidden: true
  - Cross-link from the tab.override section pointing readers to the
    lighter augmentation option
- web-dashboard.md:
  - Bullet mentioning "page-scoped slots (inject widgets into
    built-in pages without overriding them)"

Validation
- TOC anchor "#augmenting-built-in-pages-page-scoped-slots" matches
  the generated heading slug
- Code fences balanced (64, even)
- Pre-existing docusaurus build errors (skills.json, api-server.md
  link) reproduce on bare main -- not introduced here
2026-04-25 06:59:24 -07:00
Teknium af22421e87 feat(dashboard): page-scoped plugin slots for built-in pages (#15658)
* fix(terminal): three-layer defense against watch_patterns notification spam

Background processes that stack notify_on_complete=True with watch_patterns
can flood the user with duplicate, delayed notifications — matches deliver
asynchronously via the completion queue and continue arriving minutes after
the process has exited. The docstring warning against this (PR #12113) has
proven insufficient; agents still misuse the combination.

Three layered defenses, each sufficient on its own:

1. Mutual exclusion (terminal_tool.py): When both flags are set on a
   background process, drop watch_patterns with a warning. notify_on_complete
   wins because 'let me know when it's done' is the more useful signal and
   fires exactly once. Extracted as _resolve_notification_flag_conflict() so
   the rule is testable in isolation.

2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now
   bails the moment session.exited is True. Post-exit chunks (buffered reads
   draining after the process is gone) no longer produce notifications. This
   is the fix flagged as future work in session 20260418_020302_79881c.

3. Global circuit breaker (process_registry.py): Per-session rate limits don't
   catch the sibling-flood case — N concurrent processes can each stay under
   8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap
   trips a 30-second cooldown across ALL sessions, emits a single
   watch_overflow_tripped event, silently counts dropped events, and emits a
   watch_overflow_released summary when the cooldown ends.
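
The mutual-exclusion rule from layer 1 could be expressed roughly like this. The signature is a guess for illustration; only the rule itself (notify_on_complete wins on background processes) comes from the text above:

```python
import logging

logger = logging.getLogger(__name__)


def resolve_notification_flag_conflict(notify_on_complete, watch_patterns, background):
    """Return (notify_on_complete, watch_patterns) with the conflict resolved."""
    if background and notify_on_complete and watch_patterns:
        logger.warning(
            "watch_patterns dropped: notify_on_complete is set on a background process"
        )
        return True, None
    return notify_on_complete, watch_patterns
```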

Also updates the tool schema + docstring to document the new behavior.

Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
mutual-exclusion resolver x4, global breaker trip/cooldown/release x2).
All 60 tests across test_watch_patterns.py, test_notify_on_complete.py,
test_terminal_tool.py pass.

Real-world trigger: self-inflicted in session 20260425_051924 — three
concurrent hermes-sweeper review subprocesses each set watch_patterns=
['failed validation', 'errored'] AND notify_on_complete=True, then iterated
over multiple items, producing enough matches per process to defeat the
per-session cap while staying under the global cap that didn't yet exist.

* fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion

Per Teknium's direction, the watch_patterns rate limit is now much more
aggressive and self-healing.

## New rule — per session

- HARD cap: 1 watch-match notification per 15 seconds per process.
- Any match arriving inside the cooldown window is dropped and counts as
  ONE strike for that window (many drops in the same window still = 1 strike).
- After 3 consecutive strike windows, watch_patterns is permanently disabled
  for the session and the session is auto-promoted to notify_on_complete
  semantics — exactly one notification when the process actually exits.
- A cooldown window that expires with zero drops resets the consecutive
  strike counter — healthy cadence is forgiven.
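
One way to read the rule above as code, under the assumption that a strike is recorded on the first dropped match of a window. The class and attribute names are illustrative and only loosely mirror the fields mentioned in this commit:

```python
class WatchRateLimiter:
    COOLDOWN = 15.0   # seconds between delivered notifications
    MAX_STRIKES = 3   # consecutive strike windows before disabling

    def __init__(self):
        self.cooldown_until = None     # end of the current cooldown window
        self.strike_candidate = False  # a match was already dropped in this window
        self.consecutive_strikes = 0
        self.disabled = False          # True once promoted to notify-on-exit only

    def on_match(self, now):
        """Return True if a match at time `now` should be delivered."""
        if self.disabled:
            return False
        if self.cooldown_until is not None and now < self.cooldown_until:
            if not self.strike_candidate:
                # many drops in one window still count as a single strike
                self.strike_candidate = True
                self.consecutive_strikes += 1
                if self.consecutive_strikes >= self.MAX_STRIKES:
                    self.disabled = True
            return False
        # window expired: a clean window forgives earlier strikes
        if self.cooldown_until is not None and not self.strike_candidate:
            self.consecutive_strikes = 0
        self.strike_candidate = False
        self.cooldown_until = now + self.COOLDOWN
        return True
```

Once `disabled` is set, the caller would fall back to a single notification at process exit.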

## Schema + docstring rewritten

The tool schema description now gives the model explicit guidance:
- notify_on_complete is 'the right choice for almost every long-running task'
- watch_patterns is for RARE one-shot signals on LONG-LIVED processes
- Do NOT use watch_patterns with loops/batch jobs — error patterns fire every
  iteration and will hit the strike limit fast
- Mutual exclusion is stated on both parameter descriptions
- 1/15s cooldown and 3-strike promotion are stated in the watch_patterns
  description so the model sees the contract every turn

## Removed

- WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the
  new 1/15s limit subsumes both; keeping them would double-count.
- _watch_window_hits / _watch_window_start / _watch_overload_since fields
  on ProcessSession. Replaced by _watch_last_emit_at / _watch_cooldown_until
  / _watch_strike_candidate / _watch_consecutive_strikes.

## Kept

- Global circuit breaker across all sessions (15/10s → 30s cooldown) as a
  secondary safety net for concurrent siblings. Still valuable when 20
  short-lived processes each fire once — none individually violates the
  per-session limit.
- Suppress-after-exit guard.
- Mutual exclusion resolver at the tool entry point.

## Tests

- 6 new tests in TestPerSessionRateLimit covering: first match delivers,
  second in cooldown suppressed, multi-drop = single strike, 3 strikes
  disables + promotes, clean window resets counter, suppressed count
  carried to next emit.
- Global circuit breaker tests rewritten to use fresh sessions instead of
  hacking removed per-window fields.
- 50/50 watch_patterns + notify_on_complete tests pass.
- 60/60 including test_terminal_tool.py pass.

* feat(dashboard): page-scoped plugin slots for built-in pages

Dashboard plugins can now inject components into specific built-in
pages (Sessions, Analytics, Logs, Cron, Skills, Config, Env, Docs,
Chat) without overriding the whole route.

Previously, plugins could only:
  1. Add new tabs (tab.path)
  2. Replace whole built-in pages (tab.override)
  3. Inject into global shell slots (header-*, footer-*, pre-main, ...)

None of those let a plugin add a banner, card, or widget to an
existing page. The new <page>:top / <page>:bottom slots close that
gap, reusing the existing registerSlot() API.

Changes
- web/src/plugins/slots.ts: 18 new KNOWN_SLOT_NAMES entries
  (sessions:top, sessions:bottom, analytics:top, ..., chat:bottom),
  grouped under "Shell-wide" vs "Page-scoped" in the docblock
- web/src/pages/*: each built-in page now renders
    <PluginSlot name="<page>:top" />
  as the first child of its outer wrapper and
    <PluginSlot name="<page>:bottom" />
  as the last child -- zero visual cost when no plugin registers
- plugins/example-dashboard: registers a demo banner into
  sessions:top via registerSlot(), with matching slots entry in
  the manifest -- so freshly-setup users can see what page-scoped
  slots look like without writing any plugin code
- website/docs: new "Page-scoped slots" table in the plugin
  authoring guide, with a worked example
- tests/hermes_cli/test_web_server.py: round-trip test for
  colon-bearing slot names (sessions:top, analytics:bottom, ...)

Validation
- npm run build: clean (tsc -b + vite build, 2761 modules)
- scripts/run_tests.sh tests/hermes_cli/test_web_server.py::TestDashboardPluginManifestExtensions: 5/5 pass
2026-04-25 06:55:35 -07:00
Teknium 97d54f0e4d fix(terminal): three-layer defense against watch_patterns notification spam (#15642)
* fix(terminal): three-layer defense against watch_patterns notification spam

Background processes that stack notify_on_complete=True with watch_patterns
can flood the user with duplicate, delayed notifications — matches deliver
asynchronously via the completion queue and continue arriving minutes after
the process has exited. The docstring warning against this (PR #12113) has
proven insufficient; agents still misuse the combination.

Three layered defenses, each sufficient on its own:

1. Mutual exclusion (terminal_tool.py): When both flags are set on a
   background process, drop watch_patterns with a warning. notify_on_complete
   wins because 'let me know when it's done' is the more useful signal and
   fires exactly once. Extracted as _resolve_notification_flag_conflict() so
   the rule is testable in isolation.

2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now
   bails the moment session.exited is True. Post-exit chunks (buffered reads
   draining after the process is gone) no longer produce notifications. This
   is the fix flagged as future work in session 20260418_020302_79881c.

3. Global circuit breaker (process_registry.py): Per-session rate limits don't
   catch the sibling-flood case — N concurrent processes can each stay under
   8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap
   trips a 30-second cooldown across ALL sessions, emits a single
   watch_overflow_tripped event, silently counts dropped events, and emits a
   watch_overflow_released summary when the cooldown ends.

Also updates the tool schema + docstring to document the new behavior.

Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
mutual-exclusion resolver x4, global breaker trip/cooldown/release x2).
All 60 tests across test_watch_patterns.py, test_notify_on_complete.py,
test_terminal_tool.py pass.

Real-world trigger: self-inflicted in session 20260425_051924 — three
concurrent hermes-sweeper review subprocesses each set watch_patterns=
['failed validation', 'errored'] AND notify_on_complete=True, then iterated
over multiple items, producing enough matches per process to defeat the
per-session cap while staying under the global cap that didn't yet exist.

* fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion

Per Teknium's direction, the watch_patterns rate limit is now much more
aggressive and self-healing.

## New rule — per session

- HARD cap: 1 watch-match notification per 15 seconds per process.
- Any match arriving inside the cooldown window is dropped and counts as
  ONE strike for that window (many drops in the same window still = 1 strike).
- After 3 consecutive strike windows, watch_patterns is permanently disabled
  for the session and the session is auto-promoted to notify_on_complete
  semantics — exactly one notification when the process actually exits.
- A cooldown window that expires with zero drops resets the consecutive
  strike counter — healthy cadence is forgiven.
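
The per-session rule above can be sketched as a small state machine. This is a minimal illustration, not the code in process_registry.py: `WatchState`, `on_match`, and the constant names are hypothetical, the strike is counted at the first drop in a window rather than when the window closes, and the suppressed-count carry-over is omitted.

```python
from dataclasses import dataclass

COOLDOWN_SECONDS = 15.0  # value from the commit message; the name is assumed
MAX_STRIKES = 3          # value from the commit message; the name is assumed


@dataclass
class WatchState:
    """Hypothetical stand-in for the watch fields on ProcessSession."""
    cooldown_until: float = 0.0
    strike_candidate: bool = False  # True once the current window has dropped a match
    consecutive_strikes: int = 0
    disabled: bool = False          # True after promotion to notify_on_complete


def on_match(state: WatchState, now: float) -> bool:
    """Return True if this match should be delivered as a notification."""
    if state.disabled:
        return False
    if now < state.cooldown_until:
        # Inside the cooldown: drop the match. Many drops in one window
        # still count as a single strike.
        if not state.strike_candidate:
            state.strike_candidate = True
            state.consecutive_strikes += 1
            if state.consecutive_strikes >= MAX_STRIKES:
                state.disabled = True
        return False
    # The previous window has expired. If it saw no drops, forgive old strikes.
    if not state.strike_candidate:
        state.consecutive_strikes = 0
    state.strike_candidate = False
    state.cooldown_until = now + COOLDOWN_SECONDS
    return True
```

Passing the clock value in as `now` keeps the sketch deterministic, which is also how a test can drive several windows without sleeping.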

## Schema + docstring rewritten

The tool schema description now gives the model explicit guidance:
- notify_on_complete is 'the right choice for almost every long-running task'
- watch_patterns is for RARE one-shot signals on LONG-LIVED processes
- Do NOT use watch_patterns with loops/batch jobs — error patterns fire every
  iteration and will hit the strike limit fast
- Mutual exclusion is stated on both parameter descriptions
- 1/15s cooldown and 3-strike promotion are stated in the watch_patterns
  description so the model sees the contract every turn

## Removed

- WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the
  new 1/15s limit subsumes both; keeping them would double-count.
- _watch_window_hits / _watch_window_start / _watch_overload_since fields
  on ProcessSession. Replaced by _watch_last_emit_at / _watch_cooldown_until
  / _watch_strike_candidate / _watch_consecutive_strikes.

## Kept

- Global circuit breaker across all sessions (15/10s → 30s cooldown) as a
  secondary safety net for concurrent siblings. Still valuable when 20
  short-lived processes each fire once — none individually violates the
  per-session limit.
- Suppress-after-exit guard.
- Mutual exclusion resolver at the tool entry point.

## Tests

- 6 new tests in TestPerSessionRateLimit covering: first match delivers,
  second in cooldown suppressed, multi-drop = single strike, 3 strikes
  disables + promotes, clean window resets counter, suppressed count
  carried to next emit.
- Global circuit breaker tests rewritten to use fresh sessions instead of
  hacking removed per-window fields.
- 50/50 watch_patterns + notify_on_complete tests pass.
- 60/60 including test_terminal_tool.py pass.
2026-04-25 06:41:58 -07:00
Teknium 6e561ffa6d fix(update): poll is-active instead of one-shot sleep(3) after gateway restart (#15639)
The auto-restart path in `hermes update` verifies systemd unit health with
`time.sleep(3)` + a single `systemctl is-active` call.  The unit's
Stopped -> Started transition after a graceful SIGUSR1 exit (or a hard
restart) is not always complete inside that 3s window, so the verify
races and reports 'drained but didn't relaunch' even though systemd is
about to bring the unit back up a fraction of a second later.  Users
then see a spurious warning, a redundant fallback `systemctl restart`
fires, and adapters (Discord, WhatsApp) get restarted twice.

Replace the three sleep+oneshot sites with a small `_wait_for_service_active()`
closure that polls `is-active` every 0.5s for up to 10s.  Behaviour
is unchanged when the unit is healthy or truly dead — only the race
window around a clean restart is now handled correctly.
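
A minimal sketch of the polling idea, assuming a 0.5s interval and a 10s budget as described. The names `wait_for_active` and `systemd_is_active` are hypothetical; the real `_wait_for_service_active()` is a closure inside the update command and may differ in detail.

```python
import subprocess
import time
from typing import Callable


def systemd_is_active(unit: str) -> bool:
    """Probe a unit once; `systemctl is-active --quiet` exits 0 when active."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit], check=False
    )
    return result.returncode == 0


def wait_for_active(
    is_active: Callable[[], bool],
    timeout: float = 10.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `is_active` until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if is_active():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would pass something like `lambda: systemd_is_active("my-gateway.service")` (the unit name here is a placeholder). Injecting `sleep` and `clock` is a design choice for this sketch so the loop can be tested without real delays.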

Tests: tests/hermes_cli/test_update_gateway_restart.py (41/41).
2026-04-25 06:11:22 -07:00
Teknium ac05daa189 fix(tools): dedupe bundled plugin toolsets with built-in entries (#15634)
`hermes tools` → "reconfigure existing" listed Spotify twice because
the Apr 24 refactor that moved Spotify into plugins/spotify/ (PR #15174)
left the entry in CONFIGURABLE_TOOLSETS. _get_effective_configurable_toolsets()
unconditionally appended get_plugin_toolsets() on top, so the same
'spotify' key showed up from both sources.

Dedupe by key — built-in CONFIGURABLE_TOOLSETS entry wins (it has the
nicer label and description). Also guards against future bundled plugins
that share a toolset key with a built-in.
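
As a rough sketch of the dedupe rule, assuming each toolset entry is a `(key, label, description)` tuple (the real shape of CONFIGURABLE_TOOLSETS entries is not shown here, and `merge_toolsets` is a hypothetical name):

```python
from typing import Iterable, List, Tuple

Toolset = Tuple[str, str, str]  # (key, label, description): assumed shape


def merge_toolsets(
    builtin: Iterable[Toolset], plugin: Iterable[Toolset]
) -> List[Toolset]:
    """Concatenate built-in and plugin toolsets, keeping the first entry per key.

    Built-ins are iterated first, so a built-in entry wins over a plugin
    entry that reuses the same key.
    """
    seen = set()
    merged: List[Toolset] = []
    for entry in [*builtin, *plugin]:
        key = entry[0]
        if key in seen:
            continue
        seen.add(key)
        merged.append(entry)
    return merged
```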
2026-04-25 05:53:08 -07:00
Teknium 3c1c65e754 fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry (#15633)
Generalize the temperature-specific 400 retry that shipped in PR #15621 so
the same reactive strategy covers any provider that rejects an arbitrary
request parameter, not just temperature.

- agent/auxiliary_client.py:
  * New _is_unsupported_parameter_error(exc, param): matches the same six
    phrasings the old temperature detector did plus 'unrecognized parameter'
    and 'invalid parameter', against any named param.
  * _is_unsupported_temperature_error is now a thin back-compat wrapper so
    existing imports and tests keep working.
  * The max_tokens → max_completion_tokens retry branch in call_llm and
    async_call_llm now (a) gates on 'max_tokens is not None' so we do not
    pop a key that was never set and silently substitute a None value on
    the retry, and (b) also matches the generic helper in addition to the
    legacy 'max_tokens' / 'unsupported_parameter' substring checks — picking
    up phrasings like 'Unknown parameter: max_tokens' that previously slipped
    through.

- tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering
  the generic detector across params, the back-compat wrapper, and the two
  hardenings to the max_tokens retry branch (None gate + generic phrasing).

Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR
also proposed the reactive temperature retry which landed independently via
PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages
the remaining hardening ideas onto current main.
2026-04-25 05:50:34 -07:00
Teknium f92006ce1c fix(compression): reserve system+tools headroom when aux binds threshold (#15631)
When the auxiliary compression model's context is smaller than the main
model's compression threshold, _check_compression_model_feasibility
auto-lowers the session threshold. Previously it set:

    new_threshold = aux_context

This let the raw message list grow to exactly aux_context tokens. But
compression and flush_memories actually send system_prompt + tool_schemas
+ messages to the aux model. With 50+ tools that overhead is 25-30K
tokens, so the full request overflowed aux with HTTP 400.

Subtract a headroom estimate from aux_context before setting the new
threshold: the actual tool-schema token count (from
estimate_request_tokens_rough) plus a 12K allowance for the system
prompt (not yet built at __init__ time) and flush-instruction overhead.
Clamp to MINIMUM_CONTEXT_LENGTH so the session still starts even with
an unusually heavy tool schema.
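
The arithmetic can be sketched as follows. The 12K allowance comes from the description above; the value used for `MINIMUM_CONTEXT_LENGTH` is a placeholder (the real constant is not shown here), and `lowered_threshold` is a hypothetical name.

```python
MINIMUM_CONTEXT_LENGTH = 16_000   # placeholder value, assumed for illustration
SYSTEM_PROMPT_ALLOWANCE = 12_000  # 12K static allowance from the commit message


def lowered_threshold(
    aux_context: int,
    tool_schema_tokens: int,
    floor: int = MINIMUM_CONTEXT_LENGTH,
) -> int:
    """Threshold for the message list that leaves room for request overhead."""
    headroom = tool_schema_tokens + SYSTEM_PROMPT_ALLOWANCE
    return max(floor, aux_context - headroom)
```

With a 128K aux context and 28K tokens of tool schemas this gives 128,000 - 28,000 - 12,000 = 88,000, whereas the old rule would have let messages alone grow to 128,000.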

This fixes the 'flush_memories overflow on busy toolsets' path that
Teknium flagged — where main and aux can be nominally the same model
but still 400 because the threshold left no room for the request
overhead. Same fix also protects the normal compression summarisation
request on the same binding aux.

Tests: two new regression tests cover the headroom reservation and the
MINIMUM_CONTEXT_LENGTH floor. Two existing tests updated for the new
(lower) threshold values now that empty-tools still produces a 12K
static headroom deduction.
2026-04-25 05:41:56 -07:00
Teknium b35d692f45 chore(release): map ash@users.noreply.github.com to ash 2026-04-25 05:27:17 -07:00
Ash Rowan Vale 🌿 facea84559 fix(auxiliary): retry without temperature when any provider rejects it
Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature'
across all providers/models — not just Codex Responses.

The same backend can accept temperature for some models and reject it for
others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI
endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and
Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does
not scale.

call_llm / async_call_llm now detect the concrete 'unsupported parameter:
temperature' 400 and transparently retry once without temperature. Kimi's
server-managed omission and Opus 4.7+'s proactive strip stay in place —
this is the safety net for everything else.

Changes:
- agent/auxiliary_client.py: add _is_unsupported_temperature_error helper;
  wire into both sync and async call_llm paths before the existing
  max_tokens/payment/auth retry ladder
- tests/agent/test_unsupported_temperature_retry.py: 19 tests covering
  detector phrasings, sync + async retry, no-retry-without-temperature,
  and non-temperature 400s not triggering the retry

Builds on PR #15620 (codex_responses fallback) which stripped temperature
up front for that one api_mode. This PR closes the gap for every other
provider/model combo via reactive retry.

Credit: retry approach and detector originate from @BlueBirdBack's PR #15578.

Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>
2026-04-25 05:27:17 -07:00
Teknium f67a61dc93 fix(flush_memories): strip temperature from codex_responses fallback (#15620)
The memory-flush fallback for api_mode='codex_responses' was unconditionally
adding `temperature` to codex_kwargs before calling _run_codex_stream. The
Responses API does not accept temperature on any supported backend:

- chatgpt.com/backend-api/codex rejects it outright
- api.openai.com + gpt-5/o-series reasoning models reject it
- Copilot Responses rejects it on reasoning models

The CodexAuxiliaryClient adapter and the codex_responses transport both
correctly omit temperature — the flush fallback was the only path putting
it back. On errors from the primary aux path (e.g. expired OAuth token),
users saw `⚠ Auxiliary memory flush failed: HTTP 400: Unsupported parameter:
temperature`.

Reported by Garik [NOUS] on GPT-5.5 via Codex OAuth Pro.
2026-04-25 05:01:25 -07:00
Teknium 6ed37e0f42 feat(tools): make discord/discord_admin opt-in, Discord-only
Both discord (read/participate) and discord_admin (server admin) are now
configurable via `hermes tools` with default-OFF. Previously the core
discord tool (fetch_messages, search_members, create_thread) auto-loaded
on every Discord install with DISCORD_BOT_TOKEN set — 19 tools the user
never opted into.

Adds a platform-scoping mechanism (_TOOLSET_PLATFORM_RESTRICTIONS) so
the discord toolsets only show up in the Discord platform's checklist,
not on CLI/Telegram/Slack/etc. Applied at four gates:
  - _prompt_toolset_checklist: checklist filter
  - _get_platform_tools: resolution filter (both branches)
  - _save_platform_tools: save-time filter (covers 'Configure all
    platforms' and hand-edited config.yaml)
  - tools_disable_enable_command: rejects `hermes tools enable discord`
    on non-Discord platforms with a clear error
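
One way to picture the platform-scoping check shared by the four gates. This is a sketch under assumptions: the mapping shape (toolset name to allowed platforms) and the helper names are guesses, not the actual contents of `_TOOLSET_PLATFORM_RESTRICTIONS`.

```python
from typing import Dict, FrozenSet, Iterable, List

# Assumed shape: toolset name -> platforms on which it may be offered.
TOOLSET_PLATFORM_RESTRICTIONS: Dict[str, FrozenSet[str]] = {
    "discord": frozenset({"discord"}),
    "discord_admin": frozenset({"discord"}),
}


def toolset_allowed(toolset: str, platform: str) -> bool:
    """Unrestricted toolsets are allowed everywhere; restricted ones only on listed platforms."""
    allowed = TOOLSET_PLATFORM_RESTRICTIONS.get(toolset)
    return allowed is None or platform in allowed


def filter_toolsets(toolsets: Iterable[str], platform: str) -> List[str]:
    """Drop toolsets that are not valid for `platform`, preserving order."""
    return [name for name in toolsets if toolset_allowed(name, platform)]
```

Routing the checklist, resolution, save, and enable paths through a single predicate like this keeps the gates from drifting apart.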

build_session_context_prompt now injects the Discord IDs block only
when both conditions hold: the discord/discord_admin toolset is
enabled AND DISCORD_BOT_TOKEN is set. Toolset alone isn't enough —
the tool's check_fn gates on the token at registry time, so opting
in without a token yields no tools and the IDs block would lie.
Otherwise keep the stale-API disclaimer.
2026-04-25 04:51:11 -07:00
alt-glitch 591deeb928 feat(session): inject Discord IDs block when discord tool is loaded
When DISCORD_BOT_TOKEN is set — meaning the discord tool actually
loads — emit a dedicated IDs block in the session context prompt so
the agent can call ``fetch_messages``, ``pin_message``, etc. with
real identifiers instead of probing.

Currently only ``thread_id`` was exposed as a raw ID (via the
``description`` string).  The agent in a Discord thread had to guess
that the thread ID doubles as a channel ID for the REST API (it
does), and it had no way to reference the parent channel, the guild,
or the triggering message at all.

The block adapts to context:

  - Thread:     guild / parent channel / thread / message
  - Channel:    guild / channel / message
  - (DM has no guild/channel IDs worth listing; only message)

Discord isn't in _PII_SAFE_PLATFORMS, so IDs ship unredacted.
2026-04-25 04:51:11 -07:00
alt-glitch 5ae07e7b5c fix(session): gate stale "no Discord APIs" note on DISCORD_BOT_TOKEN
The Discord platform note in the session context prompt claimed the
agent has no server-management APIs — pre-dating the discord tool.
With a bot token configured the agent actually has fetch_messages,
search_members, create_thread, and optionally the discord_admin tool;
telling the model otherwise causes it to refuse or apologise for
calls it is fully able to make.

Gate the disclaimer on DISCORD_BOT_TOKEN being unset, matching the
tool's own ``check_fn``.  Without a token the note still appears and
remains accurate; with a token the model is no longer gaslit into
refusing valid tool calls.
2026-04-25 04:51:11 -07:00
alt-glitch 47b02e961c feat(discord): populate guild_id, parent_chat_id, message_id on SessionSource
Discord knows all four identifiers for every inbound message — guild,
channel (or thread), parent channel when in a thread, and the
triggering message.  Pass them into ``SessionSource`` via the new
``build_source()`` kwargs so downstream code (context-prompt builder,
delivery, logging) can use them without re-resolving from discord.py
objects.

For auto-threaded messages, remember the original channel as the
parent before swapping ``chat_id`` to the freshly created thread.

Behavioural: still a no-op — nothing consumes these fields yet.
2026-04-25 04:51:11 -07:00
alt-glitch 0702231dd8 feat(session): add guild_id/parent_chat_id/message_id to SessionSource
Groundwork for injecting raw platform identifiers into the agent's
system prompt.  Currently only `thread_id` is exposed as a raw ID —
callers in a Discord thread had to guess `channel_id == thread_id`
(which happens to work because threads are channels in Discord's REST
API) and had no way to reference the parent channel, guild, or the
triggering message.

Adds three optional fields:

- `guild_id` — Discord guild / Slack workspace / Matrix server scope
- `parent_chat_id` — parent channel when chat_id refers to a thread
- `message_id` — ID of the triggering message (pin/reply/react)

Extends `BasePlatformAdapter.build_source()` to accept + forward them
and teaches `to_dict`/`from_dict` to serialize them.  Behaviourally a
no-op: nothing reads the fields yet and they default to None.
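
A minimal sketch of how the three optional fields might round-trip, assuming a dataclass-style source object. `SessionSourceSketch` is a hypothetical stand-in; the real `SessionSource` has more fields and its own serialization logic.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class SessionSourceSketch:
    """Hypothetical, trimmed-down stand-in for SessionSource."""
    platform: str
    chat_id: str
    thread_id: Optional[str] = None
    guild_id: Optional[str] = None        # guild / workspace / server scope
    parent_chat_id: Optional[str] = None  # parent channel when chat_id is a thread
    message_id: Optional[str] = None      # triggering message

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SessionSourceSketch":
        # .get() keeps payloads written before the new fields existed loadable.
        return cls(
            platform=data["platform"],
            chat_id=data["chat_id"],
            thread_id=data.get("thread_id"),
            guild_id=data.get("guild_id"),
            parent_chat_id=data.get("parent_chat_id"),
            message_id=data.get("message_id"),
        )
```

Defaulting the new fields to None is what makes the change a behavioural no-op for existing callers and stored sessions.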
2026-04-25 04:51:11 -07:00
alt-glitch db09477b77 feat(feishu): wire feishu doc/drive tools into hermes-feishu composite
The feishu_doc and feishu_drive tools were registered in the tool
registry but never added to the hermes-feishu composite toolset.
The pipeline fix from the prior commit now recovers them automatically
once they are in the composite.
2026-04-25 04:50:14 -07:00
alt-glitch 81987f0350 feat(discord): split discord_server into discord + discord_admin tools
Split the monolithic discord_server tool (14 actions) into two:

- discord: core actions (fetch_messages, search_members, create_thread)
  that are useful for the agent's normal operation. Auto-enabled on
  the discord platform via the pipeline fix.

- discord_admin: server management actions (list channels/roles, pins,
  role assignment) that require explicit opt-in via hermes tools.
  Added to CONFIGURABLE_TOOLSETS and _DEFAULT_OFF_TOOLSETS.
2026-04-25 04:50:14 -07:00
alt-glitch 9830905dab fix(tools): recover non-configurable toolsets from composite resolution
The reverse-mapping loop in _get_platform_tools only checked
CONFIGURABLE_TOOLSETS, silently dropping platform-specific toolsets
like discord and feishu_doc whose tools were in the composite but
had no configurable key. Add a second pass over TOOLSETS that picks
up unclaimed toolsets whose tools are present in the resolved
composite.
2026-04-25 04:50:14 -07:00
Teknium 0d548d1db9 fix(cron): wire context_from through the update action
The tool schema promised 'On update, pass an empty array to clear' but the
update branch ignored the context_from kwarg entirely — users could set
the field at create time and never modify or clear it afterward.

- tools/cronjob_tools.py: handle context_from in the update branch the
  same way script/enabled_toolsets/workdir are handled: normalize str/list
  to refs, validate each referenced job exists (same check the create
  branch does), store as list-or-None to match create_job()'s shape.
  Empty string or empty list clears the field.
- tests/cron/test_cron_context_from.py: 6 new tests covering add/change/
  clear (both shapes)/bad-ref/preserve-across-unrelated-update.
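
The normalization described for the update branch might look roughly like this. It is a sketch only: `normalize_context_from`, the `known_job_ids` argument, and the use of `ValueError` are assumptions, and the caller is assumed to invoke it only when `context_from` was actually passed, so that unrelated updates preserve the stored value.

```python
from typing import Iterable, List, Optional, Set, Union


def normalize_context_from(
    value: Union[str, Iterable[str]],
    known_job_ids: Set[str],
) -> Optional[List[str]]:
    """Normalize a str-or-list value to a list of job refs, or None to clear."""
    refs = [value] if isinstance(value, str) else list(value)
    refs = [ref.strip() for ref in refs if ref and ref.strip()]
    if not refs:
        # Empty string or empty list clears the field.
        return None
    missing = [ref for ref in refs if ref not in known_job_ids]
    if missing:
        raise ValueError(f"context_from references unknown job(s): {missing}")
    return refs
```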
2026-04-25 04:49:28 -07:00
MorAlekss eb92222811 fix(cron): silent skip when context_from job has no output yet 2026-04-25 04:49:28 -07:00
MorAlekss e4a91ccb76 test(cron): add PermissionError coverage for context_from 2026-04-25 04:49:28 -07:00
MorAlekss 5ac5365923 feat(cron): add context_from field for cron job output chaining 2026-04-25 04:49:28 -07:00
Teknium f433197f23 feat(installer): FHS layout for root installs on Linux (#15608)
Root installs on Linux now put the code at /usr/local/lib/hermes-agent and
the hermes command at /usr/local/bin/hermes.  HERMES_HOME (~/.hermes) stays
state-only.  Matches Claude Code / Codex CLI / OpenClaw, keeps Docker
bind-mounted /root/ volumes lean, and puts the command on every shell's
default PATH without touching shell RC files.

- Non-root users and macOS root: unchanged
- Existing root installs at $HERMES_HOME/hermes-agent: preserved in-place
  (detected via .git dir) — no auto-migration, no breakage
- Explicit --dir / $HERMES_INSTALL_DIR: always wins, never overridden
- Termux: unchanged (package manager manages /data/data/...)

Requested by @souly9999 (Discord). Our own Dockerfile already uses this
split (code at /opt/hermes, data at /opt/data volume); the user-install
path now matches.
2026-04-25 04:49:16 -07:00
Teknium df485628ce chore(release): map Readon's git email to GitHub login 2026-04-25 04:49:07 -07:00
Yindong 9fde22d233 fix the reset of model change by /model. 2026-04-25 04:49:07 -07:00
alt-glitch 9d7b64b5dd fix(tools): normalize numeric entries and clear stale no_mcp in _save_platform_tools
YAML parses bare numeric toolset names (e.g. 12306:) as int, causing
TypeError in sorted() since the read path normalizes to str but the
save path did not.

The no_mcp sentinel was preserved in existing entries even when the
user re-enabled MCP servers, causing MCP to stay silently disabled.
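
A small sketch of both fixes, assuming the saved entries arrive as a mixed list and that the caller knows whether MCP servers are enabled. The function name and the `mcp_enabled` flag are hypothetical; only the `no_mcp` sentinel name comes from the description above.

```python
from typing import Iterable, List, Union


def clean_saved_entries(
    entries: Iterable[Union[str, int]], mcp_enabled: bool
) -> List[str]:
    """Coerce every entry to str before sorting and drop a stale no_mcp sentinel."""
    # YAML turns a bare key such as 12306 into an int, and Python 3 refuses
    # to order int against str, so normalize before calling sorted().
    names = {str(entry) for entry in entries}
    if mcp_enabled:
        names.discard("no_mcp")
    return sorted(names)
```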
2026-04-25 04:49:02 -07:00
helix4u 0738b80833 fix(tui): rebuild when ink bundle is missing 2026-04-24 15:51:38 -06:00
374 changed files with 40921 additions and 4169 deletions
+13 -4
@@ -390,7 +390,16 @@ def build_anthropic_client(api_key: str, base_url: str = None, timeout: float =
         "timeout": Timeout(timeout=float(_read_timeout), connect=10.0),
     }
     if normalized_base_url:
-        kwargs["base_url"] = normalized_base_url
+        # Azure Anthropic endpoints require an ``api-version`` query parameter.
+        # Pass it via default_query so the SDK appends it to every request URL
+        # without corrupting the base_url (appending it directly produces
+        # malformed paths like /anthropic?api-version=.../v1/messages).
+        _is_azure_endpoint = "azure.com" in normalized_base_url.lower()
+        if _is_azure_endpoint and "api-version" not in normalized_base_url:
+            kwargs["base_url"] = normalized_base_url.rstrip("/")
+            kwargs["default_query"] = {"api-version": "2025-04-15"}
+        else:
+            kwargs["base_url"] = normalized_base_url
     common_betas = _common_betas_for_base_url(normalized_base_url)
     if _is_kimi_coding_endpoint(base_url):
@@ -1680,9 +1689,9 @@ def build_anthropic_kwargs(
     # ── Strip sampling params on 4.7+ ─────────────────────────────────
     # Opus 4.7 rejects any non-default temperature/top_p/top_k with a 400.
-    # Callers (auxiliary_client, flush_memories, etc.) may set these for
-    # older models; drop them here as a safety net so upstream 4.6 → 4.7
-    # migrations don't require coordinated edits everywhere.
+    # Callers (auxiliary_client, etc.) may set these for older models;
+    # drop them here as a safety net so upstream 4.6 → 4.7 migrations
+    # don't require coordinated edits everywhere.
     if _forbids_sampling_params(model):
         for _sampling_key in ("temperature", "top_p", "top_k"):
             kwargs.pop(_sampling_key, None)
+134 -13
@@ -42,6 +42,7 @@ import time
 from pathlib import Path  # noqa: F401 — used by test mocks
 from types import SimpleNamespace
 from typing import Any, Dict, List, Optional, Tuple
+from urllib.parse import urlparse, parse_qs, urlunparse
 from openai import OpenAI
@@ -52,6 +53,17 @@ from utils import base_url_host_matches, base_url_hostname, normalize_proxy_env_
 logger = logging.getLogger(__name__)
+def _extract_url_query_params(url: str):
+    """Extract query params from URL, return (clean_url, default_query dict or None)."""
+    parsed = urlparse(url)
+    if parsed.query:
+        clean = urlunparse(parsed._replace(query=""))
+        params = {k: v[0] for k, v in parse_qs(parsed.query).items()}
+        return clean, params
+    return url, None
 # Module-level flag: only warn once per process about stale OPENAI_BASE_URL.
 _stale_base_url_warned = False
@@ -390,7 +402,7 @@ class _CodexCompletionsAdapter:
         # Note: the Codex endpoint (chatgpt.com/backend-api/codex) does NOT
         # support max_output_tokens or temperature — omit to avoid 400 errors.
-        # Tools support for flush_memories and similar callers
+        # Tools support for auxiliary callers (e.g. skills_hub) that pass function schemas
         tools = kwargs.get("tools")
         if tools:
             converted = []
@@ -1157,8 +1169,10 @@ def _try_custom_endpoint() -> Tuple[Optional[Any], Optional[str]]:
         return None, None
     model = _read_main_model() or "gpt-4o-mini"
     logger.debug("Auxiliary client: custom endpoint (%s, api_mode=%s)", model, custom_mode or "chat_completions")
+    _clean_base, _dq = _extract_url_query_params(custom_base)
+    _extra = {"default_query": _dq} if _dq else {}
     if custom_mode == "codex_responses":
-        real_client = OpenAI(api_key=custom_key, base_url=custom_base)
+        real_client = OpenAI(api_key=custom_key, base_url=_clean_base, **_extra)
         return CodexAuxiliaryClient(real_client, model), model
     if custom_mode == "anthropic_messages":
         # Third-party Anthropic-compatible gateway (MiniMax, Zhipu GLM,
@@ -1172,12 +1186,12 @@ def _try_custom_endpoint() -> Tuple[Optional[Any], Optional[str]]:
                 "Custom endpoint declares api_mode=anthropic_messages but the "
                 "anthropic SDK is not installed — falling back to OpenAI-wire."
             )
-            return OpenAI(api_key=custom_key, base_url=custom_base), model
+            return OpenAI(api_key=custom_key, base_url=_clean_base, **_extra), model
         return (
             AnthropicAuxiliaryClient(real_client, model, custom_key, custom_base, is_oauth=False),
             model,
         )
-    return OpenAI(api_key=custom_key, base_url=custom_base), model
+    return OpenAI(api_key=custom_key, base_url=_clean_base, **_extra), model
 def _try_codex() -> Tuple[Optional[Any], Optional[str]]:
@@ -1349,6 +1363,49 @@ def _is_auth_error(exc: Exception) -> bool:
     return "error code: 401" in err_lower or "authenticationerror" in type(exc).__name__.lower()
+def _is_unsupported_parameter_error(exc: Exception, param: str) -> bool:
+    """Detect provider 400s for an unsupported request parameter.
+
+    Different OpenAI-compatible endpoints phrase the same class of error a few
+    ways: ``Unsupported parameter: X``, ``unsupported_parameter`` with a
+    ``param`` field, ``X is not supported``, ``unknown parameter: X``,
+    ``unrecognized request argument: X``. We match on both the parameter
+    name and a generic "unsupported/unknown/unrecognized parameter" marker so
+    call sites can reactively retry without the offending key instead of
+    surfacing a noisy auxiliary failure.
+
+    Generalizes the temperature-specific detector that originally shipped
+    with PR #15621 so the same retry strategy can cover ``max_tokens``,
+    ``seed``, ``top_p``, and any future quirk. Credit @nicholasrae (PR #15416)
+    for the generalization pattern.
+    """
+    param_lower = (param or "").lower()
+    if not param_lower:
+        return False
+    err_lower = str(exc).lower()
+    if param_lower not in err_lower:
+        return False
+    return any(marker in err_lower for marker in (
+        "unsupported parameter",
+        "unsupported_parameter",
+        "not supported",
+        "does not support",
+        "unknown parameter",
+        "unrecognized request argument",
+        "unrecognized parameter",
+        "invalid parameter",
+    ))
+
+
+def _is_unsupported_temperature_error(exc: Exception) -> bool:
+    """Back-compat wrapper: detect API errors where the model rejects ``temperature``.
+
+    Delegates to :func:`_is_unsupported_parameter_error`; kept as a separate
+    public symbol because existing tests and call sites import it by name.
+    """
+    return _is_unsupported_parameter_error(exc, "temperature")
+
+
 def _evict_cached_clients(provider: str) -> None:
     """Drop cached auxiliary clients for a provider so fresh creds are used."""
     normalized = _normalize_aux_provider(provider)
@@ -1782,12 +1839,15 @@ def resolve_provider_client(
                 provider,
             )
         extra = {}
+        _clean_base, _dq = _extract_url_query_params(custom_base)
+        if _dq:
+            extra["default_query"] = _dq
         if base_url_host_matches(custom_base, "api.kimi.com"):
             extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
         elif base_url_host_matches(custom_base, "api.githubcopilot.com"):
             from hermes_cli.models import copilot_default_headers
             extra["default_headers"] = copilot_default_headers()
-        client = OpenAI(api_key=custom_key, base_url=custom_base, **extra)
+        client = OpenAI(api_key=custom_key, base_url=_clean_base, **extra)
         client = _wrap_if_needed(client, final_model, custom_base)
         return (_to_async_client(client, final_model) if async_mode
                 else (client, final_model))
@@ -1824,6 +1884,8 @@ def resolve_provider_client(
                 model or custom_entry.get("model") or _read_main_model() or "gpt-4o-mini",
                 provider,
             )
+            _clean_base2, _dq2 = _extract_url_query_params(custom_base)
+            _extra2 = {"default_query": _dq2} if _dq2 else {}
             logger.debug(
                 "resolve_provider_client: named custom provider %r (%s, api_mode=%s)",
                 provider, final_model, entry_api_mode or "chat_completions")
@@ -1841,7 +1903,7 @@ def resolve_provider_client(
                         "installed — falling back to OpenAI-wire.",
                         provider,
                     )
-                    client = OpenAI(api_key=custom_key, base_url=custom_base)
+                    client = OpenAI(api_key=custom_key, base_url=_clean_base2, **_extra2)
                     return (_to_async_client(client, final_model) if async_mode
                             else (client, final_model))
                 sync_anthropic = AnthropicAuxiliaryClient(
@@ -1850,7 +1912,7 @@ def resolve_provider_client(
                 if async_mode:
                     return AsyncAnthropicAuxiliaryClient(sync_anthropic), final_model
                 return sync_anthropic, final_model
-            client = OpenAI(api_key=custom_key, base_url=custom_base)
+            client = OpenAI(api_key=custom_key, base_url=_clean_base2, **_extra2)
             # codex_responses or inherited auto-detect (via _wrap_if_needed).
             # _wrap_if_needed reads the closed-over `api_mode` (the task-level
             # override). Named-provider entry api_mode=codex_responses also
@@ -2760,8 +2822,8 @@ def _build_call_kwargs(
         temperature = fixed_temperature
     # Opus 4.7+ rejects any non-default temperature/top_p/top_k — silently
-    # drop here so auxiliary callers that hardcode temperature (e.g. 0.3 on
-    # flush_memories, 0 on structured-JSON extraction) don't 400 the moment
+    # drop here so auxiliary callers that hardcode temperature (e.g. 0 on
+    # structured-JSON extraction) don't 400 the moment
     # the aux model is flipped to 4.7.
     if temperature is not None:
         from agent.anthropic_adapter import _forbids_sampling_params
@@ -2849,7 +2911,7 @@ def call_llm(
     Args:
         task: Auxiliary task name ("compression", "vision", "web_extract",
-            "session_search", "skills_hub", "mcp", "flush_memories").
+            "session_search", "skills_hub", "mcp", "title_generation").
             Reads provider:model from config/env. Ignored if provider is set.
         provider: Explicit provider override.
         model: Explicit model override.
@@ -2952,13 +3014,45 @@ def call_llm(
     if _is_anthropic_compat_endpoint(resolved_provider, _client_base):
         kwargs["messages"] = _convert_openai_images_to_anthropic(kwargs["messages"])
-    # Handle max_tokens vs max_completion_tokens retry, then payment fallback.
+    # Handle unsupported temperature, max_tokens vs max_completion_tokens retry,
+    # then payment fallback.
     try:
         return _validate_llm_response(
             client.chat.completions.create(**kwargs), task)
     except Exception as first_err:
+        if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
+            retry_kwargs = dict(kwargs)
+            retry_kwargs.pop("temperature", None)
+            logger.info(
+                "Auxiliary %s: provider rejected temperature; retrying once without it",
+                task or "call",
+            )
+            try:
+                return _validate_llm_response(
+                    client.chat.completions.create(**retry_kwargs), task)
+            except Exception as retry_err:
+                retry_err_str = str(retry_err)
+                # If retry still fails, fall through to the max_tokens /
+                # payment / auth chains below using the temperature-stripped
+                # kwargs. Re-raise only if the retry hit something those
+                # chains won't handle.
+                if not (
+                    _is_payment_error(retry_err)
+                    or _is_connection_error(retry_err)
+                    or _is_auth_error(retry_err)
+                    or "max_tokens" in retry_err_str
+                    or "unsupported_parameter" in retry_err_str
+                ):
+                    raise
+                first_err = retry_err
+                kwargs = retry_kwargs
         err_str = str(first_err)
-        if "max_tokens" in err_str or "unsupported_parameter" in err_str:
+        if max_tokens is not None and (
+            "max_tokens" in err_str
+            or "unsupported_parameter" in err_str
+            or _is_unsupported_parameter_error(first_err, "max_tokens")
+        ):
             kwargs.pop("max_tokens", None)
             kwargs["max_completion_tokens"] = max_tokens
             try:
@@ -3221,8 +3315,35 @@ async def async_call_llm(
return _validate_llm_response(
await client.chat.completions.create(**kwargs), task)
except Exception as first_err:
if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
retry_kwargs = dict(kwargs)
retry_kwargs.pop("temperature", None)
logger.info(
"Auxiliary %s (async): provider rejected temperature; retrying once without it",
task or "call",
)
try:
return _validate_llm_response(
await client.chat.completions.create(**retry_kwargs), task)
except Exception as retry_err:
retry_err_str = str(retry_err)
if not (
_is_payment_error(retry_err)
or _is_connection_error(retry_err)
or _is_auth_error(retry_err)
or "max_tokens" in retry_err_str
or "unsupported_parameter" in retry_err_str
):
raise
first_err = retry_err
kwargs = retry_kwargs
err_str = str(first_err)
if "max_tokens" in err_str or "unsupported_parameter" in err_str:
if max_tokens is not None and (
"max_tokens" in err_str
or "unsupported_parameter" in err_str
or _is_unsupported_parameter_error(first_err, "max_tokens")
):
kwargs.pop("max_tokens", None)
kwargs["max_completion_tokens"] = max_tokens
try:
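The temperature fallback in the two hunks above can be illustrated in isolation. This is a minimal sketch, not the project's implementation: `UnsupportedTemperature`, `create_with_temperature_retry` and `fake_create` are hypothetical names standing in for the provider error, the retry wrapper and the client call.

```python
from typing import Any, Callable, Dict


class UnsupportedTemperature(Exception):
    """Hypothetical stand-in for a provider's 'temperature not supported' error."""


def create_with_temperature_retry(create: Callable[..., Any], kwargs: Dict[str, Any]) -> Any:
    """Call ``create(**kwargs)``; on a temperature rejection, retry once without it."""
    try:
        return create(**kwargs)
    except UnsupportedTemperature:
        if "temperature" not in kwargs:
            raise
        retry_kwargs = dict(kwargs)  # copy so the caller's dict is left untouched
        retry_kwargs.pop("temperature", None)
        return create(**retry_kwargs)


def fake_create(**kwargs: Any) -> str:
    # Hypothetical provider that rejects any request carrying ``temperature``.
    if "temperature" in kwargs:
        raise UnsupportedTemperature("temperature is not supported by this model")
    return "ok:" + ",".join(sorted(kwargs))


request = {"model": "m", "temperature": 0.2}
result = create_with_temperature_retry(fake_create, request)
```

The sketch retries exactly once and never mutates the original request, which mirrors the `retry_kwargs = dict(kwargs)` copy in the diff. The real code additionally hands a failed retry to the max_tokens / payment / auth chains, which this sketch omits.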
+147 -11
@@ -44,22 +44,31 @@ _TOOL_CALL_LEAK_PATTERN = re.compile(
# Multimodal content helpers
# ---------------------------------------------------------------------------
def _chat_content_to_responses_parts(content: Any) -> List[Dict[str, Any]]:
def _chat_content_to_responses_parts(content: Any, *, role: str = "user") -> List[Dict[str, Any]]:
"""Convert chat-style multimodal content to Responses API input parts.
Input: ``[{"type":"text"|"image_url", ...}]`` (native OpenAI Chat format)
Output: ``[{"type":"input_text"|"input_image", ...}]`` (Responses format)
Output: ``[{"type":"input_text"|"output_text"|"input_image", ...}]`` (Responses format)
The ``role`` parameter controls the text content type:
- ``"user"`` (default) → ``"input_text"``
- ``"assistant"`` → ``"output_text"``
The Responses API rejects ``input_text`` inside assistant messages and
``output_text`` inside user messages, so callers MUST pass the correct
role for the message being converted.
Returns an empty list when ``content`` is not a list or contains no
recognized parts — callers fall back to the string path.
"""
text_type = "output_text" if role == "assistant" else "input_text"
if not isinstance(content, list):
return []
converted: List[Dict[str, Any]] = []
for part in content:
if isinstance(part, str):
if part:
converted.append({"type": "input_text", "text": part})
converted.append({"type": text_type, "text": part})
continue
if not isinstance(part, dict):
continue
@@ -67,7 +76,7 @@ def _chat_content_to_responses_parts(content: Any) -> List[Dict[str, Any]]:
if ptype in {"text", "input_text", "output_text"}:
text = part.get("text")
if isinstance(text, str) and text:
converted.append({"type": "input_text", "text": text})
converted.append({"type": text_type, "text": text})
continue
if ptype in {"image_url", "input_image"}:
image_ref = part.get("image_url")
@@ -218,6 +227,23 @@ def _responses_tools(tools: Optional[List[Dict[str, Any]]] = None) -> Optional[L
# Message format conversion
# ---------------------------------------------------------------------------
_RESPONSE_MESSAGE_STATUSES = {"completed", "incomplete", "in_progress"}
def _normalize_responses_message_status(value: Any, *, default: str = "completed") -> str:
"""Normalize a Responses assistant message status for replay.
The API accepts completed/incomplete/in_progress on replayed assistant
output messages. Preserve those exactly (modulo case/hyphen spelling) so
incomplete Codex continuation turns don't get falsely marked completed.
"""
if isinstance(value, str):
status = value.strip().lower().replace("-", "_").replace(" ", "_")
if status in _RESPONSE_MESSAGE_STATUSES:
return status
return default
def _chat_messages_to_responses_input(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Convert internal chat-style messages to Responses input items."""
items: List[Dict[str, Any]] = []
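A short usage sketch of the status normalization added in this hunk. The function body is copied from the diff under a shortened, hypothetical name (`normalize_status`); the sample inputs are made up for illustration.

```python
from typing import Any

RESPONSE_MESSAGE_STATUSES = {"completed", "incomplete", "in_progress"}


def normalize_status(value: Any, *, default: str = "completed") -> str:
    """Lowercase, unify '-' and ' ' to '_', then accept only known statuses."""
    if isinstance(value, str):
        status = value.strip().lower().replace("-", "_").replace(" ", "_")
        if status in RESPONSE_MESSAGE_STATUSES:
            return status
    return default


# Hypothetical inputs: spelling variants, an unknown status, and a non-string.
samples = ["In-Progress", " incomplete ", "in progress", "queued", None]
normalized = [normalize_status(s) for s in samples]
```

Unknown or non-string values fall back to the default, so a replayed item always carries one of the three accepted statuses.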
@@ -233,9 +259,10 @@ def _chat_messages_to_responses_input(messages: List[Dict[str, Any]]) -> List[Di
if role in {"user", "assistant"}:
content = msg.get("content", "")
if isinstance(content, list):
content_parts = _chat_content_to_responses_parts(content)
content_parts = _chat_content_to_responses_parts(content, role=role)
text_type = "output_text" if role == "assistant" else "input_text"
content_text = "".join(
p.get("text", "") for p in content_parts if p.get("type") == "input_text"
p.get("text", "") for p in content_parts if p.get("type") == text_type
)
else:
content_parts = []
@@ -262,7 +289,57 @@ def _chat_messages_to_responses_input(messages: List[Dict[str, Any]]) -> List[Di
seen_item_ids.add(item_id)
has_codex_reasoning = True
if content_parts:
# Replay exact assistant message items (with id/phase) from
# previous turns so the API can maintain prefix-cache hits.
# OpenAI docs: "preserve and resend phase on all assistant
# messages — dropping it can degrade performance."
codex_message_items = msg.get("codex_message_items")
replayed_message_items = 0
if isinstance(codex_message_items, list):
for raw_item in codex_message_items:
if not isinstance(raw_item, dict):
continue
if raw_item.get("type") != "message" or raw_item.get("role") != "assistant":
continue
raw_content_parts = raw_item.get("content")
if not isinstance(raw_content_parts, list):
continue
normalized_content_parts = []
for part in raw_content_parts:
if not isinstance(part, dict):
continue
part_type = str(part.get("type") or "").strip()
if part_type not in {"output_text", "text"}:
continue
text = part.get("text", "")
if text is None:
text = ""
if not isinstance(text, str):
text = str(text)
normalized_content_parts.append({"type": "output_text", "text": text})
if not normalized_content_parts:
continue
replay_item = {
"type": "message",
"role": "assistant",
"status": _normalize_responses_message_status(raw_item.get("status")),
"content": normalized_content_parts,
}
item_id = raw_item.get("id")
if isinstance(item_id, str) and item_id.strip():
replay_item["id"] = item_id.strip()
phase = raw_item.get("phase")
if isinstance(phase, str) and phase.strip():
replay_item["phase"] = phase.strip()
items.append(replay_item)
replayed_message_items += 1
if replayed_message_items > 0:
pass
elif content_parts:
items.append({"role": "assistant", "content": content_parts})
elif content_text.strip():
items.append({"role": "assistant", "content": content_text})
@@ -422,6 +499,47 @@ def _preflight_codex_input_items(raw_items: Any) -> List[Dict[str, Any]]:
normalized.append(reasoning_item)
continue
if item_type == "message":
role = item.get("role")
if role != "assistant":
raise ValueError(f"Codex Responses input[{idx}] message items must have role='assistant'.")
content = item.get("content")
if not isinstance(content, list):
raise ValueError(f"Codex Responses input[{idx}] message item must have content list.")
normalized_content = []
for part_idx, part in enumerate(content):
if not isinstance(part, dict):
raise ValueError(
f"Codex Responses input[{idx}] message content[{part_idx}] must be an object."
)
part_type = part.get("type")
if part_type not in {"output_text", "text"}:
raise ValueError(
f"Codex Responses input[{idx}] message content[{part_idx}] has unsupported type {part_type!r}."
)
text = part.get("text", "")
if text is None:
text = ""
if not isinstance(text, str):
text = str(text)
normalized_content.append({"type": "output_text", "text": text})
if not normalized_content:
raise ValueError(f"Codex Responses input[{idx}] message item must contain at least one text part.")
normalized_item: Dict[str, Any] = {
"type": "message",
"role": "assistant",
"status": _normalize_responses_message_status(item.get("status")),
"content": normalized_content,
}
item_id = item.get("id")
if isinstance(item_id, str) and item_id.strip():
normalized_item["id"] = item_id.strip()
phase = item.get("phase")
if isinstance(phase, str) and phase.strip():
normalized_item["phase"] = phase.strip()
normalized.append(normalized_item)
continue
role = item.get("role")
if role in {"user", "assistant"}:
content = item.get("content", "")
@@ -429,13 +547,16 @@ def _preflight_codex_input_items(raw_items: Any) -> List[Dict[str, Any]]:
content = ""
if isinstance(content, list):
# Multimodal content from ``_chat_messages_to_responses_input``
# is already in Responses format (``input_text`` / ``input_image``).
# Validate each part and pass through.
# is already in Responses format (``input_text`` / ``output_text``
# / ``input_image``). Validate each part and pass through.
# Use the correct text type for the role — ``output_text`` for
# assistant messages, ``input_text`` for user messages.
text_type = "output_text" if role == "assistant" else "input_text"
validated: List[Dict[str, Any]] = []
for part_idx, part in enumerate(content):
if isinstance(part, str):
if part:
validated.append({"type": "input_text", "text": part})
validated.append({"type": text_type, "text": part})
continue
if not isinstance(part, dict):
raise ValueError(
@@ -446,7 +567,7 @@ def _preflight_codex_input_items(raw_items: Any) -> List[Dict[str, Any]]:
text = part.get("text", "")
if not isinstance(text, str):
text = str(text or "")
validated.append({"type": "input_text", "text": text})
validated.append({"type": text_type, "text": text})
elif ptype in {"input_image", "image_url"}:
image_ref = part.get("image_url", "")
detail = part.get("detail")
@@ -703,6 +824,7 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
content_parts: List[str] = []
reasoning_parts: List[str] = []
reasoning_items_raw: List[Dict[str, Any]] = []
message_items_raw: List[Dict[str, Any]] = []
tool_calls: List[Any] = []
has_incomplete_items = response_status in {"queued", "in_progress", "incomplete"}
saw_commentary_phase = False
@@ -721,6 +843,7 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
if item_type == "message":
item_phase = getattr(item, "phase", None)
normalized_phase = None
if isinstance(item_phase, str):
normalized_phase = item_phase.strip().lower()
if normalized_phase in {"commentary", "analysis"}:
@@ -730,6 +853,18 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
message_text = _extract_responses_message_text(item)
if message_text:
content_parts.append(message_text)
raw_message_item: Dict[str, Any] = {
"type": "message",
"role": "assistant",
"status": _normalize_responses_message_status(item_status),
"content": [{"type": "output_text", "text": message_text}],
}
item_id = getattr(item, "id", None)
if isinstance(item_id, str) and item_id:
raw_message_item["id"] = item_id
if normalized_phase:
raw_message_item["phase"] = normalized_phase
message_items_raw.append(raw_message_item)
elif item_type == "reasoning":
reasoning_text = _extract_responses_reasoning_text(item)
if reasoning_text:
@@ -842,6 +977,7 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
reasoning_content=None,
reasoning_details=None,
codex_reasoning_items=reasoning_items_raw or None,
codex_message_items=message_items_raw or None,
)
if tool_calls:
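The role-aware text typing introduced earlier in this file's diff can be sketched on its own. This is a reduced illustration that handles text parts only (no images); `text_parts_for_role` is a hypothetical name, not the function in the diff.

```python
from typing import Any, Dict, List


def text_parts_for_role(content: List[Any], role: str = "user") -> List[Dict[str, str]]:
    """Convert chat-style text parts to Responses parts, typed by message role."""
    text_type = "output_text" if role == "assistant" else "input_text"
    converted: List[Dict[str, str]] = []
    for part in content:
        if isinstance(part, str):
            if part:
                converted.append({"type": text_type, "text": part})
            continue
        if not isinstance(part, dict):
            continue
        if part.get("type") in {"text", "input_text", "output_text"}:
            text = part.get("text")
            if isinstance(text, str) and text:  # empty strings are dropped
                converted.append({"type": text_type, "text": text})
    return converted


chat_content = [{"type": "text", "text": "hello"}, "world", {"type": "text", "text": ""}]
user_parts = text_parts_for_role(chat_content, role="user")
assistant_parts = text_parts_for_role(chat_content, role="assistant")
```

The same input yields `input_text` parts for a user message and `output_text` parts for an assistant message, which is the distinction the docstring in the diff says the Responses API enforces.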
+6 -3
@@ -14,6 +14,7 @@ from datetime import datetime
from typing import Any, Dict, List, Optional, Set, Tuple
from hermes_constants import OPENROUTER_BASE_URL
from hermes_cli.config import get_env_value
import hermes_cli.auth as auth_mod
from hermes_cli.auth import (
CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
@@ -1273,7 +1274,8 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
def _is_source_suppressed(_p, _s): # type: ignore[misc]
return False
if provider == "openrouter":
token = os.getenv("OPENROUTER_API_KEY", "").strip()
# Check both os.environ and ~/.hermes/.env file
token = (get_env_value("OPENROUTER_API_KEY") or "").strip()
if token:
source = "env:OPENROUTER_API_KEY"
if _is_source_suppressed(provider, source):
@@ -1299,7 +1301,7 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
env_url = ""
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip().rstrip("/")
env_url = (get_env_value(pconfig.base_url_env_var) or "").strip().rstrip("/")
env_vars = list(pconfig.api_key_env_vars)
if provider == "anthropic":
@@ -1310,7 +1312,8 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
]
for env_var in env_vars:
token = os.getenv(env_var, "").strip()
# Check both os.environ and ~/.hermes/.env file
token = (get_env_value(env_var) or "").strip()
if not token:
continue
source = f"env:{env_var}"
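The hunks above swap `os.getenv` for `get_env_value` so that keys stored in `~/.hermes/.env` are also found. The sketch below shows one way such a lookup could work. It is an assumption, not the actual `hermes_cli.config.get_env_value`: the helper name `lookup_env_value`, the precedence (process environment first) and the dotenv parsing rules are all hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def lookup_env_value(name: str, env_file: Path) -> Optional[str]:
    """Hypothetical helper: process environment first, then a dotenv-style file."""
    value = os.environ.get(name)
    if value:
        return value
    try:
        lines = env_file.read_text(encoding="utf-8").splitlines()
    except OSError:
        return None  # a missing or unreadable file simply means "not set"
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        if key.strip() == name:
            return raw.strip().strip('"').strip("'")
    return None


with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("# demo\nDEMO_ONLY_API_KEY='sk-from-file'\n", encoding="utf-8")
    os.environ.pop("DEMO_ONLY_API_KEY", None)
    os.environ.pop("DEMO_ONLY_MISSING", None)
    from_file = (lookup_env_value("DEMO_ONLY_API_KEY", env_path) or "").strip()
    missing = (lookup_env_value("DEMO_ONLY_MISSING", env_path) or "").strip()
```

The `(value or "").strip()` idiom at the call sites matches the diff: the lookup may return `None`, whereas `os.getenv(name, "")` never did.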
+39 -8
@@ -106,9 +106,11 @@ _endpoint_model_metadata_cache_time: Dict[str, float] = {}
_ENDPOINT_MODEL_CACHE_TTL = 300
# Descending tiers for context length probing when the model is unknown.
# We start at 128K (a safe default for most modern models) and step down
# on context-length errors until one works.
# We start at 256K (covers GPT-5.x, many current large-context models) and
# step down on context-length errors until one works. Tier[0] is also the
# default fallback when no detection method succeeds.
CONTEXT_PROBE_TIERS = [
256_000,
128_000,
64_000,
32_000,
@@ -143,10 +145,11 @@ DEFAULT_CONTEXT_LENGTHS = {
"claude": 200000,
# OpenAI — GPT-5 family (most have 400k; specific overrides first)
# Source: https://developers.openai.com/api/docs/models
# GPT-5.5 (launched Apr 23 2026). 400k is the fallback for providers we
# can't probe live. ChatGPT Codex OAuth actually caps lower (272k as of
# Apr 2026) and is resolved via _resolve_codex_oauth_context_length().
"gpt-5.5": 400000,
# GPT-5.5 (launched Apr 23 2026) is 1.05M on the direct OpenAI API and
# ChatGPT Codex OAuth caps it at 272K; both paths resolve via their own
# provider-aware branches (_resolve_codex_oauth_context_length + models.dev).
# This hardcoded value is only reached when every probe misses.
"gpt-5.5": 1050000,
"gpt-5.4-nano": 400000, # 400k (not 1.05M like full 5.4)
"gpt-5.4-mini": 400000, # 400k (not 1.05M like full 5.4)
"gpt-5.4": 1050000, # GPT-5.4, GPT-5.4 Pro (1.05M context)
@@ -162,7 +165,17 @@ DEFAULT_CONTEXT_LENGTHS = {
"gemma-4-31b": 256000,
"gemma-3": 131072,
"gemma": 8192, # fallback for older gemma models
# DeepSeek
# DeepSeek — V4 family ships with a 1M context window. The legacy
# aliases ``deepseek-chat`` / ``deepseek-reasoner`` are server-side
# mapped to the non-thinking / thinking modes of ``deepseek-v4-flash``
# and inherit the same 1M window. The ``deepseek`` substring entry
# below remains as a 128K fallback for older / unknown DeepSeek model
# ids (e.g. via custom endpoints).
# https://api-docs.deepseek.com/zh-cn/quick_start/pricing
"deepseek-v4-pro": 1_000_000,
"deepseek-v4-flash": 1_000_000,
"deepseek-chat": 1_000_000,
"deepseek-reasoner": 1_000_000,
"deepseek": 128000,
# Meta
"llama": 131072,
@@ -1193,6 +1206,7 @@ def get_model_context_length(
api_key: str = "",
config_context_length: int | None = None,
provider: str = "",
custom_providers: list | None = None,
) -> int:
"""Get the context length for a model.
@@ -1213,6 +1227,23 @@ def get_model_context_length(
if config_context_length is not None and isinstance(config_context_length, int) and config_context_length > 0:
return config_context_length
# 0b. custom_providers per-model override — check before any probe.
# This closes the gap where /model switch and display paths used to fall
# back to 128K despite the user having a per-model context_length set.
# See #15779.
if custom_providers and base_url and model:
try:
from hermes_cli.config import get_custom_provider_context_length
cp_ctx = get_custom_provider_context_length(
model=model,
base_url=base_url,
custom_providers=custom_providers,
)
if cp_ctx:
return cp_ctx
except Exception:
pass # fall through to probing
# Normalise provider-prefixed model names (e.g. "local:model-name" →
# "model-name") so cache lookups and server queries use the bare ID that
# local servers actually know about. Ollama "model:tag" colons are preserved.
@@ -1352,7 +1383,7 @@ def get_model_context_length(
# 6. OpenRouter live API metadata (provider-unaware fallback)
metadata = fetch_model_metadata()
if model in metadata:
return metadata[model].get("context_length", 128000)
return metadata[model].get("context_length", DEFAULT_FALLBACK_CONTEXT)
# 8. Hardcoded defaults (fuzzy match — longest key first for specificity)
# Only check `default_model in model` (is the key a substring of the input).
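The "longest key first" substring match described for the hardcoded defaults can be sketched as follows. The table is a subset of the values shown in the diff; `default_context_length`, the lowercasing step, and the 256K fallback constant (taken here to be the first probe tier) are assumptions made for illustration.

```python
from typing import Dict

# Subset of the defaults shown in the diff above.
DEFAULTS: Dict[str, int] = {
    "gpt-5.4-mini": 400000,
    "gpt-5.4": 1050000,
    "deepseek-v4-flash": 1_000_000,
    "deepseek": 128000,
}
FALLBACK_CONTEXT = 256_000  # assumption: equal to the first probe tier


def default_context_length(model: str) -> int:
    """Return the window of the longest default key contained in ``model``."""
    name = model.lower()  # assumption: matching is case-insensitive
    for key in sorted(DEFAULTS, key=len, reverse=True):
        if key in name:
            return DEFAULTS[key]
    return FALLBACK_CONTEXT


lengths = {
    m: default_context_length(m)
    for m in ("gpt-5.4-mini", "openai/gpt-5.4-pro", "deepseek-v4-flash", "deepseek-coder", "mystery-model")
}
```

Sorting by key length is what keeps `gpt-5.4-mini` from being swallowed by the shorter `gpt-5.4` entry, and `deepseek-v4-flash` from falling through to the generic `deepseek` fallback.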
+142
@@ -180,3 +180,145 @@ def format_remaining(seconds: float) -> str:
h, remainder = divmod(s, 3600)
m = remainder // 60
return f"{h}h {m}m" if m else f"{h}h"
# Buckets with reset windows shorter than this are treated as transient
# (upstream jitter, secondary throttling) rather than a genuine quota
# exhaustion worth a cross-session breaker trip.
_MIN_RESET_FOR_BREAKER_SECONDS = 60.0
def is_genuine_nous_rate_limit(
*,
headers: Optional[Mapping[str, str]] = None,
last_known_state: Optional[Any] = None,
) -> bool:
"""Decide whether a 429 from Nous Portal is a real account rate limit.
Nous Portal multiplexes multiple upstream providers (DeepSeek, Kimi,
MiMo, Hermes, ...) behind one endpoint. A 429 can mean either:
(a) The caller's own RPM / RPH / TPM / TPH bucket on Nous is
exhausted — a genuine rate limit that will last until the
bucket resets.
(b) The upstream provider is out of capacity for a specific model
— transient, clears in seconds, and has nothing to do with
the caller's quota on Nous.
Tripping the cross-session breaker on (b) blocks ALL Nous requests
(and all models, since Nous is one provider key) for minutes even
though the caller's account is healthy and a different model would
have worked. That's the bug users hit when DeepSeek V4 Pro 429s
trigger a breaker that then blocks Kimi 2.6 and MiMo V2.5 Pro.
We tell the two apart by looking at:
1. The 429 response's own ``x-ratelimit-*`` headers. Nous emits
the full suite on every response including 429s. An exhausted
bucket (``remaining == 0`` with a reset window >= 60s) is
proof of (a).
2. The last-known-good rate-limit state captured by
``_capture_rate_limits()`` on the previous successful
response. If any bucket there was already near-exhausted with
a substantial reset window, the current 429 is almost
certainly (a) continuing from that condition.
If neither signal fires, we treat the 429 as (b): fail the single
request, let the retry loop or model-switch proceed, and do NOT
write the cross-session breaker file.
Returns True when the evidence points at (a).
"""
# Signal 1: current 429 response headers.
state = _parse_buckets_from_headers(headers)
if _has_exhausted_bucket(state):
return True
# Signal 2: last-known-good state from a recent successful response.
# Accepts either a RateLimitState (dataclass from rate_limit_tracker)
# or a dict of bucket snapshots.
if last_known_state is not None and _has_exhausted_bucket_in_object(last_known_state):
return True
return False
def _parse_buckets_from_headers(
headers: Optional[Mapping[str, str]],
) -> dict[str, tuple[Optional[int], Optional[float]]]:
"""Extract (remaining, reset_seconds) per bucket from x-ratelimit-* headers.
Returns empty dict when no rate-limit headers are present.
"""
if not headers:
return {}
lowered = {k.lower(): v for k, v in headers.items()}
if not any(k.startswith("x-ratelimit-") for k in lowered):
return {}
def _maybe_int(raw: Optional[str]) -> Optional[int]:
if raw is None:
return None
try:
return int(float(raw))
except (TypeError, ValueError):
return None
def _maybe_float(raw: Optional[str]) -> Optional[float]:
if raw is None:
return None
try:
return float(raw)
except (TypeError, ValueError):
return None
result: dict[str, tuple[Optional[int], Optional[float]]] = {}
for tag in ("requests", "requests-1h", "tokens", "tokens-1h"):
remaining = _maybe_int(lowered.get(f"x-ratelimit-remaining-{tag}"))
reset = _maybe_float(lowered.get(f"x-ratelimit-reset-{tag}"))
if remaining is not None or reset is not None:
result[tag] = (remaining, reset)
return result
def _has_exhausted_bucket(
buckets: Mapping[str, tuple[Optional[int], Optional[float]]],
) -> bool:
"""Return True when any bucket has remaining == 0 AND a meaningful reset window."""
for remaining, reset in buckets.values():
if remaining is None or remaining > 0:
continue
if reset is None:
continue
if reset >= _MIN_RESET_FOR_BREAKER_SECONDS:
return True
return False
def _has_exhausted_bucket_in_object(state: Any) -> bool:
"""Check a RateLimitState-like object for an exhausted bucket.
Accepts the dataclass from ``agent.rate_limit_tracker`` (buckets
exposed as attributes ``requests_min``, ``requests_hour``,
``tokens_min``, ``tokens_hour``) and falls back gracefully for any
object missing those attributes.
"""
for attr in ("requests_min", "requests_hour", "tokens_min", "tokens_hour"):
bucket = getattr(state, attr, None)
if bucket is None:
continue
limit = getattr(bucket, "limit", 0) or 0
remaining = getattr(bucket, "remaining", 0) or 0
# Prefer the adjusted "remaining_seconds_now" property when present;
# fall back to raw reset_seconds.
reset = getattr(bucket, "remaining_seconds_now", None)
if reset is None:
reset = getattr(bucket, "reset_seconds", 0.0) or 0.0
if limit <= 0:
continue
if remaining > 0:
continue
if reset >= _MIN_RESET_FOR_BREAKER_SECONDS:
return True
return False
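A condensed sketch of the header-based signal (signal 1) described in the docstring above. It folds the parse and check steps into one hypothetical function, `has_exhausted_bucket_headers`, and the header values in the examples are invented for illustration rather than captured from Nous Portal.

```python
from typing import Mapping, Optional

MIN_RESET_FOR_BREAKER_SECONDS = 60.0
BUCKET_TAGS = ("requests", "requests-1h", "tokens", "tokens-1h")


def _maybe_float(raw: Optional[str]) -> Optional[float]:
    if raw is None:
        return None
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def _maybe_int(raw: Optional[str]) -> Optional[int]:
    try:
        return None if raw is None else int(float(raw))
    except (TypeError, ValueError, OverflowError):
        return None


def has_exhausted_bucket_headers(headers: Mapping[str, str]) -> bool:
    """True when any bucket reports remaining == 0 with a reset window >= 60s."""
    lowered = {k.lower(): v for k, v in headers.items()}
    for tag in BUCKET_TAGS:
        remaining = _maybe_int(lowered.get(f"x-ratelimit-remaining-{tag}"))
        reset = _maybe_float(lowered.get(f"x-ratelimit-reset-{tag}"))
        if remaining is None or reset is None:
            continue
        if remaining <= 0 and reset >= MIN_RESET_FOR_BREAKER_SECONDS:
            return True
    return False


# Hypothetical 429 responses.
account_limit = {"X-RateLimit-Remaining-Requests-1h": "0", "X-RateLimit-Reset-Requests-1h": "1800"}
upstream_blip = {"x-ratelimit-remaining-requests": "0", "x-ratelimit-reset-requests": "2.5"}
healthy = {"x-ratelimit-remaining-tokens": "9000", "x-ratelimit-reset-tokens": "45"}
verdicts = [has_exhausted_bucket_headers(h) for h in (account_limit, upstream_blip, healthy, {})]
```

Only the first case would justify tripping the cross-session breaker. A zero-remaining bucket that resets in a few seconds is treated as transient, and a 429 with no rate-limit headers at all gives no evidence either way.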
+191
@@ -0,0 +1,191 @@
"""
Contextual first-touch onboarding hints.
Instead of blocking first-run questionnaires, show a one-time hint the *first*
time a user hits a behavior fork — message-while-running, first long-running
tool, etc. Each hint is shown once per install (tracked in ``config.yaml`` under
``onboarding.seen.<flag>``) and then never again.
Keep this module tiny and dependency-free so both the CLI and gateway can import
it without pulling in heavy modules.
"""
from __future__ import annotations
import logging
from pathlib import Path
from typing import Any, Mapping, Optional
logger = logging.getLogger(__name__)
# -------------------------------------------------------------------------
# Flag names (stable — used as config.yaml keys under onboarding.seen)
# -------------------------------------------------------------------------
BUSY_INPUT_FLAG = "busy_input_prompt"
TOOL_PROGRESS_FLAG = "tool_progress_prompt"
OPENCLAW_RESIDUE_FLAG = "openclaw_residue_cleanup"
# -------------------------------------------------------------------------
# Hint content
# -------------------------------------------------------------------------
def busy_input_hint_gateway(mode: str) -> str:
"""Hint shown the first time a user messages while the agent is busy.
``mode`` is the effective busy_input_mode that was just applied, so the
message matches reality ("I just interrupted…" vs "I just queued…").
"""
if mode == "queue":
return (
"💡 First-time tip — I queued your message instead of interrupting. "
"Send `/busy interrupt` to make new messages stop the current task "
"immediately, or `/busy status` to check. This notice won't appear again."
)
if mode == "steer":
return (
"💡 First-time tip — I steered your message into the current run; "
"it will arrive after the next tool call instead of interrupting. "
"Send `/busy interrupt` or `/busy queue` to change this, or "
"`/busy status` to check. This notice won't appear again."
)
return (
"💡 First-time tip — I just interrupted my current task to answer you. "
"Send `/busy queue` to queue follow-ups for after the current task instead, "
"`/busy steer` to inject them mid-run without interrupting, or "
"`/busy status` to check. This notice won't appear again."
)
def busy_input_hint_cli(mode: str) -> str:
"""CLI version of the busy-input hint (plain text, no markdown)."""
if mode == "queue":
return (
"(tip) Your message was queued for the next turn. "
"Use /busy interrupt to make Enter stop the current run instead, "
"or /busy steer to inject mid-run. This tip only shows once."
)
if mode == "steer":
return (
"(tip) Your message was steered into the current run; it arrives "
"after the next tool call. Use /busy interrupt or /busy queue to "
"change this. This tip only shows once."
)
return (
"(tip) Your message interrupted the current run. "
"Use /busy queue to queue messages for the next turn instead, "
"or /busy steer to inject mid-run. This tip only shows once."
)
def tool_progress_hint_gateway() -> str:
return (
"💡 First-time tip — that tool took a while and I'm streaming every step. "
"If the progress messages feel noisy, send `/verbose` to cycle modes "
"(all → new → off). This notice won't appear again."
)
def tool_progress_hint_cli() -> str:
return (
"(tip) That tool ran for a while. Use /verbose to cycle tool-progress "
"display modes (all -> new -> off -> verbose). This tip only shows once."
)
def openclaw_residue_hint_cli() -> str:
"""Banner shown the first time Hermes starts and finds ``~/.openclaw/``.
OpenClaw-era config, memory, and skill paths in ``~/.openclaw/`` will
otherwise attract the agent (memory entries like ``~/.openclaw/config.yaml``
get carried forward and the agent dutifully reads them). ``hermes claw
cleanup`` renames the directory so the agent stops finding it.
"""
return (
"Heads up — an OpenClaw workspace was detected at ~/.openclaw/.\n"
"After migrating, the agent can still get confused and read that "
"directory's config/memory instead of Hermes's.\n"
"Run `hermes claw cleanup` to archive it (rename → .openclaw.pre-migration). "
"This tip only shows once; you can run `hermes claw cleanup` at any time."
)
def detect_openclaw_residue(home: Optional[Path] = None) -> bool:
"""Return True if an OpenClaw workspace directory is present in ``$HOME``.
Pure filesystem check — no side effects. ``home`` override exists for tests.
"""
base = home or Path.home()
try:
return (base / ".openclaw").is_dir()
except OSError:
return False
# -------------------------------------------------------------------------
# State read / write
# -------------------------------------------------------------------------
def _get_seen_dict(config: Mapping[str, Any]) -> Mapping[str, Any]:
onboarding = config.get("onboarding") if isinstance(config, Mapping) else None
if not isinstance(onboarding, Mapping):
return {}
seen = onboarding.get("seen")
return seen if isinstance(seen, Mapping) else {}
def is_seen(config: Mapping[str, Any], flag: str) -> bool:
"""Return True if the user has already been shown this first-touch hint."""
return bool(_get_seen_dict(config).get(flag))
def mark_seen(config_path: Path, flag: str) -> bool:
"""Persist ``onboarding.seen.<flag> = True`` to ``config_path``.
Uses the atomic YAML writer so a concurrent process can't observe a
partially-written file. Returns True on success, False on any error.
A missing config file is treated as an empty config (onboarding is
best-effort).
"""
try:
import yaml
from utils import atomic_yaml_write
except Exception as e: # pragma: no cover — dependency issue
logger.debug("onboarding: failed to import yaml/utils: %s", e)
return False
try:
cfg: dict = {}
if config_path.exists():
with open(config_path, encoding="utf-8") as f:
cfg = yaml.safe_load(f) or {}
if not isinstance(cfg.get("onboarding"), dict):
cfg["onboarding"] = {}
seen = cfg["onboarding"].get("seen")
if not isinstance(seen, dict):
seen = {}
cfg["onboarding"]["seen"] = seen
if seen.get(flag) is True:
return True # already marked — nothing to do
seen[flag] = True
atomic_yaml_write(config_path, cfg)
return True
except Exception as e:
logger.debug("onboarding: failed to mark flag %s: %s", flag, e)
return False
__all__ = [
"BUSY_INPUT_FLAG",
"TOOL_PROGRESS_FLAG",
"OPENCLAW_RESIDUE_FLAG",
"busy_input_hint_gateway",
"busy_input_hint_cli",
"tool_progress_hint_gateway",
"tool_progress_hint_cli",
"openclaw_residue_hint_cli",
"detect_openclaw_residue",
"is_seen",
"mark_seen",
]
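The show-once flow that callers of this module would follow can be sketched without touching disk. `is_seen` mirrors the logic in the diff; `mark_seen_in_memory` and `maybe_hint` are hypothetical helpers that replace the YAML persistence with a plain dict so the example stays self-contained.

```python
from typing import Any, Dict, Mapping

BUSY_INPUT_FLAG = "busy_input_prompt"


def is_seen(config: Mapping[str, Any], flag: str) -> bool:
    """True if ``onboarding.seen.<flag>`` is set; tolerant of malformed config."""
    onboarding = config.get("onboarding") if isinstance(config, Mapping) else None
    if not isinstance(onboarding, Mapping):
        return False
    seen = onboarding.get("seen")
    return bool(seen.get(flag)) if isinstance(seen, Mapping) else False


def mark_seen_in_memory(config: Dict[str, Any], flag: str) -> None:
    """Hypothetical in-memory variant; the module above persists to config.yaml."""
    onboarding = config.get("onboarding")
    if not isinstance(onboarding, dict):
        onboarding = config["onboarding"] = {}
    seen = onboarding.get("seen")
    if not isinstance(seen, dict):
        seen = onboarding["seen"] = {}
    seen[flag] = True


def maybe_hint(config: Dict[str, Any], flag: str, hint: str) -> str:
    """Return the hint the first time only, and record that it was shown."""
    if is_seen(config, flag):
        return ""
    mark_seen_in_memory(config, flag)
    return hint


config: Dict[str, Any] = {"onboarding": "not-a-dict"}  # deliberately malformed
first = maybe_hint(config, BUSY_INPUT_FLAG, "(tip) shown once")
second = maybe_hint(config, BUSY_INPUT_FLAG, "(tip) shown once")
```

Starting from a malformed `onboarding` value shows why both the reader and the writer check types before descending: the hint still appears once, and the config ends up in the expected nested shape.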
+28
@@ -422,6 +422,29 @@ PLATFORM_HINTS = {
"your response. Images are sent as native photos, and other files arrive as downloadable "
"documents."
),
"yuanbao": (
"You are on Yuanbao (腾讯元宝), a Chinese AI assistant platform. "
"Markdown formatting is supported (code blocks, tables, bold/italic). "
"You CAN send media files natively — to deliver a file to the user, include "
"MEDIA:/absolute/path/to/file in your response. The file will be sent as a native "
"Yuanbao attachment: images (.jpg, .png, .webp, .gif) are sent as photos, "
"and other files (.pdf, .docx, .txt, .zip, etc.) arrive as downloadable documents "
"(max 50 MB). You can also include image URLs in markdown format ![alt](url) and "
"they will be downloaded and sent as native photos. "
"Do NOT tell the user you lack file-sending capability — use MEDIA: syntax "
"whenever a file delivery is appropriate.\n\n"
"Stickers (贴纸 / 表情包 / TIM face): Yuanbao has a built-in sticker catalogue. "
"When the user sends a sticker (you see '[emoji: 名称]' in their message) or asks "
"you to send/reply-with a 贴纸/表情/表情包, you MUST use the sticker tools:\n"
" 1. Call yb_search_sticker with a Chinese keyword (e.g. '666', '比心', '吃瓜', "
" '捂脸', '合十') to discover matching sticker_ids.\n"
" 2. Call yb_send_sticker with the chosen sticker_id or name — this sends a real "
" TIMFaceElem that renders as a native sticker in the chat.\n"
"DO NOT draw sticker-like PNGs with execute_code/Pillow/matplotlib and then send "
"them via MEDIA: or send_image_file. That produces a fake low-quality 'sticker' "
"image and is the WRONG path. Bare Unicode emoji in text is also not a substitute "
"— when a sticker is the right response, use yb_send_sticker."
),
}
# ---------------------------------------------------------------------------
@@ -825,6 +848,11 @@ def build_skills_system_prompt(
"Skills also encode the user's preferred approach, conventions, and quality standards "
"for tasks like code review, planning, and testing — load them even for tasks you "
"already know how to do, because the skill defines how it should be done here.\n"
"Whenever the user asks you to configure, set up, install, enable, disable, modify, "
"or troubleshoot Hermes Agent itself — its CLI, config, models, providers, tools, "
"skills, voice, gateway, plugins, or any feature — load the `hermes-agent` skill "
"first. It has the actual commands (e.g. `hermes config set …`, `hermes tools`, "
"`hermes setup`) so you don't have to guess or invent workarounds.\n"
"If a skill has issues, fix it with skill_manage(action='patch').\n"
"After difficult/iterative tasks, offer to save as a skill. "
"If a skill you loaded was missing steps, had wrong commands, or needed "
+5 -1
@@ -754,7 +754,11 @@ def _resolve_effective_accept(
if env in ("1", "true", "yes", "on"):
return True
cfg_val = cfg.get("hooks_auto_accept", False)
return bool(cfg_val)
if isinstance(cfg_val, bool):
return cfg_val
if isinstance(cfg_val, str):
return cfg_val.strip().lower() in ("1", "true", "yes", "on")
return False
# ---------------------------------------------------------------------------
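The reason for replacing `bool(cfg_val)` is easiest to see with a string value: `bool("false")` is `True` in Python. A minimal sketch of the stricter coercion, using the hypothetical name `coerce_auto_accept` and made-up inputs:

```python
from typing import Any

TRUTHY = ("1", "true", "yes", "on")


def coerce_auto_accept(cfg_val: Any) -> bool:
    """Accept real booleans and truthy strings; everything else is False."""
    if isinstance(cfg_val, bool):
        return cfg_val
    if isinstance(cfg_val, str):
        return cfg_val.strip().lower() in TRUTHY
    return False


inputs = (True, "false", " Yes ", "0", 1, None)
strict = [coerce_auto_accept(v) for v in inputs]
naive = [bool(v) for v in inputs]  # the old behaviour, for comparison
```

With the naive cast, a config value of `"false"` or `"0"` would have enabled auto-accept. Note that the stricter version also rejects the integer `1`, since only booleans and strings are recognised.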
+2 -2
@@ -329,7 +329,7 @@ def build_skill_invocation_message(
loaded_skill, skill_dir, skill_name = loaded
activation_note = (
f'[SYSTEM: The user has invoked the "{skill_name}" skill, indicating they want '
f'[IMPORTANT: The user has invoked the "{skill_name}" skill, indicating they want '
"you to follow its instructions. The full skill content is loaded below.]"
)
return _build_skill_message(
@@ -368,7 +368,7 @@ def build_preloaded_skills_prompt(
loaded_skill, skill_dir, skill_name = loaded
activation_note = (
f'[SYSTEM: The user launched this CLI session with the "{skill_name}" skill '
f'[IMPORTANT: The user launched this CLI session with the "{skill_name}" skill '
"preloaded. Treat its instructions as active guidance for the duration of this "
"session unless the user overrides them.]"
)
+7 -2
@@ -23,9 +23,14 @@ def get_transport(api_mode: str):
This allows gradual migration — call sites can check for None
and fall back to the legacy code path.
"""
if not _REGISTRY:
_discover_transports()
cls = _REGISTRY.get(api_mode)
if cls is None:
# The registry can be partially populated when a specific transport
# module was imported directly (for example chat_completions before
# codex). Discover on misses, not only when the registry is empty, so
# test/order-dependent imports do not make valid api_modes unavailable.
_discover_transports()
cls = _REGISTRY.get(api_mode)
if cls is None:
return None
return cls()
+5 -4
@@ -31,15 +31,15 @@ class ChatCompletionsTransport(ProviderTransport):
def convert_messages(self, messages: List[Dict[str, Any]], **kwargs) -> List[Dict[str, Any]]:
"""Messages are already in OpenAI format — sanitize Codex leaks only.
Strips Codex Responses API fields (``codex_reasoning_items`` on the
message, ``call_id``/``response_item_id`` on tool_calls) that strict
chat-completions providers reject with 400/422.
Strips Codex Responses API fields (``codex_reasoning_items`` /
``codex_message_items`` on the message, ``call_id``/``response_item_id``
on tool_calls) that strict chat-completions providers reject with 400/422.
"""
needs_sanitize = False
for msg in messages:
if not isinstance(msg, dict):
continue
if "codex_reasoning_items" in msg:
if "codex_reasoning_items" in msg or "codex_message_items" in msg:
needs_sanitize = True
break
tool_calls = msg.get("tool_calls")
@@ -59,6 +59,7 @@ class ChatCompletionsTransport(ProviderTransport):
if not isinstance(msg, dict):
continue
msg.pop("codex_reasoning_items", None)
msg.pop("codex_message_items", None)
tool_calls = msg.get("tool_calls")
if isinstance(tool_calls, list):
for tc in tool_calls:
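As a rough illustration of the sanitizing step in this file, the sketch below strips the Codex-only fields named in the docstring. It assumes a standalone function (`strip_codex_fields` is a hypothetical name) and copies the input first; whether the real transport copies or mutates in place is not visible in the hunk.

```python
import copy

CODEX_MESSAGE_FIELDS = ("codex_reasoning_items", "codex_message_items")
CODEX_TOOL_CALL_FIELDS = ("call_id", "response_item_id")


def strip_codex_fields(messages):
    """Return a copy of messages without Codex Responses API fields."""
    cleaned = copy.deepcopy(messages)
    for msg in cleaned:
        if not isinstance(msg, dict):
            continue
        for field in CODEX_MESSAGE_FIELDS:
            msg.pop(field, None)
        tool_calls = msg.get("tool_calls")
        if isinstance(tool_calls, list):
            for tc in tool_calls:
                if isinstance(tc, dict):
                    for field in CODEX_TOOL_CALL_FIELDS:
                        tc.pop(field, None)
    return cleaned


history = [
    {
        "role": "assistant",
        "content": "ok",
        "codex_message_items": [{"id": "m1"}],
        "tool_calls": [{"id": "t1", "call_id": "c1", "type": "function"}],
    }
]
sanitized = strip_codex_fields(history)
```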
+20
@@ -120,6 +120,24 @@ class ResponsesApiTransport(ProviderTransport):
if request_overrides:
kwargs.update(request_overrides)
if is_codex_backend:
prompt_cache_key = kwargs.get("prompt_cache_key")
cache_scope_id = str(prompt_cache_key or session_id or "").strip()
if cache_scope_id:
existing_extra_headers = kwargs.get("extra_headers")
merged_extra_headers: Dict[str, str] = {}
if isinstance(existing_extra_headers, dict):
merged_extra_headers.update(
{
str(key): str(value)
for key, value in existing_extra_headers.items()
if key and value is not None
}
)
merged_extra_headers["session_id"] = cache_scope_id
merged_extra_headers["x-client-request-id"] = cache_scope_id
kwargs["extra_headers"] = merged_extra_headers
max_tokens = params.get("max_tokens")
if max_tokens is not None and not is_codex_backend:
kwargs["max_output_tokens"] = max_tokens
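The header merge above can be read as a small pure function. The following is a sketch under that assumption; `merge_cache_headers` is a hypothetical name, and only the two header keys and the precedence of `prompt_cache_key` over `session_id` are taken from the hunk.

```python
def merge_cache_headers(extra_headers, prompt_cache_key=None, session_id=None):
    """Add cache-scoping headers while keeping caller-supplied ones."""
    scope = str(prompt_cache_key or session_id or "").strip()
    if not scope:
        # Nothing to scope by: leave the caller's headers untouched.
        return extra_headers
    merged = {}
    if isinstance(extra_headers, dict):
        merged.update(
            {
                str(key): str(value)
                for key, value in extra_headers.items()
                if key and value is not None
            }
        )
    merged["session_id"] = scope
    merged["x-client-request-id"] = scope
    return merged


headers = merge_cache_headers({"x-trace": 7, "x-empty": None}, session_id=" s-1 ")
```

Existing headers are stringified and entries with a `None` value are dropped, so the result is always a `Dict[str, str]` when a scope id is present.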
@@ -160,6 +178,8 @@ class ResponsesApiTransport(ProviderTransport):
provider_data = {}
if msg and hasattr(msg, "codex_reasoning_items") and msg.codex_reasoning_items:
provider_data["codex_reasoning_items"] = msg.codex_reasoning_items
if msg and hasattr(msg, "codex_message_items") and msg.codex_message_items:
provider_data["codex_message_items"] = msg.codex_message_items
if msg and hasattr(msg, "reasoning_details") and msg.reasoning_details:
provider_data["reasoning_details"] = msg.reasoning_details
+6 -1
@@ -97,7 +97,7 @@ class NormalizedResponse:
Response-level ``provider_data`` examples:
* Anthropic: ``{"reasoning_details": [...]}``
* Codex: ``{"codex_reasoning_items": [...]}``
* Codex: ``{"codex_reasoning_items": [...], "codex_message_items": [...]}``
* Others: ``None``
"""
@@ -126,6 +126,11 @@ class NormalizedResponse:
pd = self.provider_data or {}
return pd.get("codex_reasoning_items")
@property
def codex_message_items(self):
pd = self.provider_data or {}
return pd.get("codex_message_items")
# ---------------------------------------------------------------------------
# Factory helpers
+28 -8
@@ -606,6 +606,7 @@ platform_toolsets:
signal: [hermes-signal]
homeassistant: [hermes-homeassistant]
qqbot: [hermes-qqbot]
yuanbao: [hermes-yuanbao]
# =============================================================================
# Gateway Platform Settings
@@ -824,7 +825,9 @@ delegation:
# Display
# =============================================================================
display:
# Use compact banner mode
# Use compact banner mode (hides the ASCII-art banner, shows a single line).
# true: Compact single-line banner
# false: Full ASCII banner with tool/skill summary (default)
compact: false
# Tool progress display level (CLI and gateway)
@@ -838,12 +841,19 @@ display:
# Gateway-only natural mid-turn assistant updates.
# When true, completed assistant status messages are sent as separate chat
# messages. This is independent of tool_progress and gateway streaming.
# true: Send mid-turn assistant updates as separate messages (default)
# false: Only send the final response
interim_assistant_messages: true
# What Enter does when Hermes is already busy in the CLI.
# What Enter does when Hermes is already busy (CLI and gateway platforms).
# interrupt: Interrupt the current run and redirect Hermes (default)
# queue: Queue your message for the next turn
# Ctrl+C always interrupts regardless of this setting.
# steer: Inject your message mid-run via /steer, arriving at the agent
# after the next tool call — no interrupt, no role violation.
# Falls back to 'queue' if the agent isn't running yet or if
# images are attached (steer only carries text).
# Ctrl+C (or /stop in gateway) always interrupts regardless of this setting.
# Toggle at runtime with /busy <interrupt|queue|steer>.
busy_input_mode: interrupt
# Background process notifications (gateway/messaging only).
@@ -859,17 +869,22 @@ display:
# Play terminal bell when agent finishes a response.
# Useful for long-running tasks — your terminal will ding when the agent is done.
# Works over SSH. Most terminals can be configured to flash the taskbar or play a sound.
# true: Ring the terminal bell on each response
# false: Silent (default)
bell_on_complete: false
# Show model reasoning/thinking before each response.
# When enabled, a dim box shows the model's thought process above the response.
# Toggle at runtime with /reasoning show or /reasoning hide.
# true: Show the reasoning box
# false: Hide reasoning (default)
show_reasoning: false
# Stream tokens to the terminal as they arrive instead of waiting for the
# full response. The response box opens on first token and text appears
# line-by-line. Tool calls are still captured silently.
# Stream tokens to the terminal in real-time. Disable to wait for full responses.
# true: Stream tokens as they arrive (default)
# false: Wait for the full response before rendering
streaming: true
# ───────────────────────────────────────────────────────────────────────────
@@ -879,10 +894,15 @@ display:
# response box label, and branding text. Change at runtime with /skin <name>.
#
# Built-in skins:
# default — Classic Hermes gold/kawaii
# ares — Crimson/bronze war-god theme with spinner wings
# mono — Clean grayscale monochrome
# slate — Cool blue developer-focused
# default — Classic Hermes gold/kawaii
# ares — Crimson/bronze war-god theme with spinner wings
# mono — Clean grayscale monochrome
# slate — Cool blue developer-focused
# daylight — Bright light-mode theme
# warm-lightmode — Warm paper-tone light-mode theme
# poseidon — Sea-green/teal Olympian theme
# sisyphus — Earthy stone-and-moss theme
# charizard — Fiery orange dragon theme
#
# Custom skins: drop a YAML file in ~/.hermes/skins/<name>.yaml
# Schema (all fields optional, missing values inherit from default):
+254 -185
@@ -22,6 +22,7 @@ import re
import concurrent.futures
import base64
import atexit
import errno
import tempfile
import time
import uuid
@@ -416,6 +417,11 @@ def load_cli_config() -> Dict[str, Any]:
"base_url": "", # Direct OpenAI-compatible endpoint for subagents
"api_key": "", # API key for delegation.base_url (falls back to OPENAI_API_KEY)
},
"onboarding": {
# First-touch hint flags (see agent/onboarding.py). Each hint is
# shown once per install then latched here.
"seen": {},
},
}
# Track whether the config file explicitly set terminal config.
@@ -968,6 +974,7 @@ def _run_state_db_auto_maintenance(session_db) -> None:
return
try:
from hermes_cli.config import load_config as _load_full_config
from hermes_constants import get_hermes_home as _get_hermes_home
cfg = (_load_full_config().get("sessions") or {})
if not cfg.get("auto_prune", False):
return
@@ -975,11 +982,35 @@ def _run_state_db_auto_maintenance(session_db) -> None:
retention_days=int(cfg.get("retention_days", 90)),
min_interval_hours=int(cfg.get("min_interval_hours", 24)),
vacuum=bool(cfg.get("vacuum_after_prune", True)),
sessions_dir=_get_hermes_home() / "sessions",
)
except Exception as exc:
logger.debug("state.db auto-maintenance skipped: %s", exc)
def _run_checkpoint_auto_maintenance() -> None:
"""Call ``checkpoint_manager.maybe_auto_prune_checkpoints`` using current config.
Reads the ``checkpoints:`` section from config.yaml via
:func:`hermes_cli.config.load_config`. Honours ``auto_prune`` /
``retention_days`` / ``delete_orphans`` / ``min_interval_hours``.
Never raises; maintenance must never block interactive startup.
"""
try:
from hermes_cli.config import load_config as _load_full_config
cfg = (_load_full_config().get("checkpoints") or {})
if not cfg.get("auto_prune", False):
return
from tools.checkpoint_manager import maybe_auto_prune_checkpoints
maybe_auto_prune_checkpoints(
retention_days=int(cfg.get("retention_days", 7)),
min_interval_hours=int(cfg.get("min_interval_hours", 24)),
delete_orphans=bool(cfg.get("delete_orphans", True)),
)
except Exception as exc:
logger.debug("checkpoint auto-maintenance skipped: %s", exc)
def _prune_stale_worktrees(repo_root: str, max_age_hours: int = 24) -> None:
"""Remove stale worktrees and orphaned branches on startup.
@@ -1372,7 +1403,7 @@ def _resolve_attachment_path(raw_path: str) -> Path | None:
def _format_process_notification(evt: dict) -> "str | None":
"""Format a process notification event into a [SYSTEM: ...] message.
"""Format a process notification event into an [IMPORTANT: ...] message.
Handles both completion events (notify_on_complete) and watch pattern
match events from the unified completion_queue.
@@ -1382,14 +1413,14 @@ def _format_process_notification(evt: dict) -> "str | None":
_cmd = evt.get("command", "unknown")
if evt_type == "watch_disabled":
return f"[SYSTEM: {evt.get('message', '')}]"
return f"[IMPORTANT: {evt.get('message', '')}]"
if evt_type == "watch_match":
_pat = evt.get("pattern", "?")
_out = evt.get("output", "")
_sup = evt.get("suppressed", 0)
text = (
f"[SYSTEM: Background process {_sid} matched "
f"[IMPORTANT: Background process {_sid} matched "
f"watch pattern \"{_pat}\".\n"
f"Command: {_cmd}\n"
f"Matched output:\n{_out}"
@@ -1403,7 +1434,7 @@ def _format_process_notification(evt: dict) -> "str | None":
_exit = evt.get("exit_code", "?")
_out = evt.get("output", "")
return (
f"[SYSTEM: Background process {_sid} completed "
f"[IMPORTANT: Background process {_sid} completed "
f"(exit code {_exit}).\n"
f"Command: {_cmd}\n"
f"Output:\n{_out}]"
@@ -1842,9 +1873,16 @@ class HermesCLI:
self.bell_on_complete = CLI_CONFIG["display"].get("bell_on_complete", False)
# show_reasoning: display model thinking/reasoning before the response
self.show_reasoning = CLI_CONFIG["display"].get("show_reasoning", False)
# busy_input_mode: "interrupt" (Enter interrupts current run) or "queue" (Enter queues for next turn)
_bim = CLI_CONFIG["display"].get("busy_input_mode", "interrupt")
self.busy_input_mode = "queue" if str(_bim).strip().lower() == "queue" else "interrupt"
# busy_input_mode: "interrupt" (Enter interrupts current run),
# "queue" (Enter queues for next turn), or "steer" (Enter injects
# mid-run via /steer, arriving after the next tool call).
_bim = str(CLI_CONFIG["display"].get("busy_input_mode", "interrupt")).strip().lower()
if _bim == "queue":
self.busy_input_mode = "queue"
elif _bim == "steer":
self.busy_input_mode = "steer"
else:
self.busy_input_mode = "interrupt"
self.verbose = verbose if verbose is not None else (self.tool_progress_mode == "verbose")
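The three-way branch above is equivalent to an allow-list lookup with a default. A compact sketch, assuming a free function (`normalize_busy_input_mode` is a hypothetical name, not part of the CLI):

```python
VALID_BUSY_MODES = ("interrupt", "queue", "steer")


def normalize_busy_input_mode(raw):
    """Map a raw config value to a supported mode, defaulting to interrupt."""
    mode = str(raw).strip().lower()
    return mode if mode in VALID_BUSY_MODES else "interrupt"
```

Unknown or malformed values (including `None`) fall back to `"interrupt"`, matching the `else` branch in the diff.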
@@ -2039,6 +2077,11 @@ class HermesCLI:
# Never blocks startup on failure.
_run_state_db_auto_maintenance(self._session_db)
# Opportunistic shadow-repo cleanup — deletes orphan/stale
# checkpoint repos under ~/.hermes/checkpoints/. Opt-in via
# checkpoints.auto_prune, idempotent via .last_prune marker.
_run_checkpoint_auto_maintenance()
# Deferred title: stored in memory until the session is created in the DB
self._pending_title: Optional[str] = None
@@ -3176,7 +3219,14 @@ class HermesCLI:
# the configured model (e.g. "qwen3.6-plus"), causing 400 errors.
runtime_model = runtime.get("model")
if runtime_model and isinstance(runtime_model, str):
self.model = runtime_model
# Only use runtime model if: model is unset, or model equals provider name
should_use_runtime_model = (
not self.model or # No model configured yet
self.model == self.provider or # Model is the provider slug
self.model == runtime.get("name") # Model matches provider display name
)
if should_use_runtime_model:
self.model = runtime_model
# If model is still empty (e.g. user ran `hermes auth add openai-codex`
# without `hermes model`), fall back to the provider's first catalog
@@ -4311,7 +4361,7 @@ class HermesCLI:
_cprint(f"\n {_DIM}Tip: Just type your message to chat with Hermes!{_RST}")
_cprint(f" {_DIM}Multi-line: Alt+Enter for a new line{_RST}")
_cprint(f" {_DIM}Draft editor: Ctrl+G{_RST}")
_cprint(f" {_DIM}Draft editor: Ctrl+G (Alt+G in VSCode/Cursor){_RST}")
if _is_termux_environment():
_cprint(f" {_DIM}Attach image: /image {_termux_example_image_path()} or start your prompt with a local image path{_RST}\n")
else:
@@ -4661,10 +4711,6 @@ class HermesCLI:
def new_session(self, silent=False):
"""Start a fresh session with a new session ID and cleared agent state."""
if self.agent and self.conversation_history:
try:
self.agent.flush_memories(self.conversation_history)
except (Exception, KeyboardInterrupt):
pass
# Trigger memory extraction on the old session before session_id rotates.
self.agent.commit_memory_session(self.conversation_history)
self._notify_session_boundary("on_session_finalize")
@@ -4906,6 +4952,12 @@ class HermesCLI:
if self.agent:
self.agent.session_id = new_session_id
self.agent.session_start = now
# Redirect the JSON session log to the new branch session file so
# messages written after branching land in the correct file.
if hasattr(self.agent, "session_log_file") and hasattr(self.agent, "logs_dir"):
self.agent.session_log_file = (
self.agent.logs_dir / f"session_{new_session_id}.json"
)
self.agent.reset_session_state()
if hasattr(self.agent, "_last_flushed_db_idx"):
self.agent._last_flushed_db_idx = len(self.conversation_history)
@@ -4927,22 +4979,37 @@ class HermesCLI:
_cprint(f" Branch session: {new_session_id}")
def save_conversation(self):
"""Save the current conversation to a file."""
"""Save the current conversation to a JSON snapshot under ~/.hermes/sessions/saved/.
The snapshot is a convenience export for sharing or off-line inspection;
every message is already persisted incrementally to the SQLite session
DB, so the live session remains resumable via ``hermes --resume <id>``
regardless of whether the user ever runs ``/save``.
"""
if not self.conversation_history:
print("(;_;) No conversation to save.")
return
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"hermes_conversation_{timestamp}.json"
saved_dir = get_hermes_home() / "sessions" / "saved"
try:
with open(filename, "w", encoding="utf-8") as f:
saved_dir.mkdir(parents=True, exist_ok=True)
except Exception as e:
print(f"(x_x) Failed to create save directory {saved_dir}: {e}")
return
path = saved_dir / f"hermes_conversation_{timestamp}.json"
try:
with open(path, "w", encoding="utf-8") as f:
json.dump({
"model": self.model,
"session_id": self.session_id,
"session_start": self.session_start.isoformat(),
"messages": self.conversation_history,
}, f, indent=2, ensure_ascii=False)
print(f"(^_^)v Conversation saved to: {filename}")
print(f"(^_^)v Conversation snapshot saved to: {path}")
if self.session_id:
print(f" Resume the live session with: hermes --resume {self.session_id}")
except Exception as e:
print(f"(x_x) Failed to save: {e}")
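The reworked `/save` path boils down to: create the target directory, then write a timestamped JSON file into it. A minimal sketch of that flow, assuming a temporary directory in place of the Hermes home and a hypothetical `save_snapshot` helper:

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def save_snapshot(base_dir, session_id, messages):
    """Write a JSON snapshot under <base_dir>/sessions/saved/ and return its path."""
    saved_dir = Path(base_dir) / "sessions" / "saved"
    saved_dir.mkdir(parents=True, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = saved_dir / f"hermes_conversation_{timestamp}.json"
    with open(path, "w", encoding="utf-8") as f:
        json.dump(
            {"session_id": session_id, "messages": messages},
            f,
            indent=2,
            ensure_ascii=False,
        )
    return path


with tempfile.TemporaryDirectory() as tmp:
    out = save_snapshot(tmp, "demo-session", [{"role": "user", "content": "hi"}])
    loaded = json.loads(out.read_text(encoding="utf-8"))
```

Writing to a fixed directory instead of the current working directory means snapshots no longer land wherever the CLI happened to be launched.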
@@ -5149,27 +5216,29 @@ class HermesCLI:
_cprint(f" ✓ Model switched: {result.new_model}")
_cprint(f" Provider: {provider_label}")
# Context: always resolve via the provider-aware chain so Codex OAuth,
# Copilot, and Nous-enforced caps win over the raw models.dev entry
# (e.g. gpt-5.5 is 1.05M on openai but 272K on Codex OAuth).
mi = result.model_info
try:
from hermes_cli.model_switch import resolve_display_context_length
ctx = resolve_display_context_length(
result.new_model,
result.target_provider,
base_url=result.base_url or self.base_url or "",
api_key=result.api_key or self.api_key or "",
model_info=mi,
)
if ctx:
_cprint(f" Context: {ctx:,} tokens")
except Exception:
pass
if mi:
if mi.context_window:
_cprint(f" Context: {mi.context_window:,} tokens")
if mi.max_output:
_cprint(f" Max output: {mi.max_output:,} tokens")
if mi.has_cost_data():
_cprint(f" Cost: {mi.format_cost()}")
_cprint(f" Capabilities: {mi.format_capabilities()}")
else:
try:
from agent.model_metadata import get_model_context_length
ctx = get_model_context_length(
result.new_model,
base_url=result.base_url or self.base_url,
api_key=result.api_key or self.api_key,
provider=result.target_provider,
)
_cprint(f" Context: {ctx:,} tokens")
except Exception:
pass
cache_enabled = (
(base_url_host_matches(result.base_url or "", "openrouter.ai") and "claude" in result.new_model.lower())
@@ -5270,24 +5339,22 @@ class HermesCLI:
# Parse --provider and --global flags
model_input, explicit_provider, persist_global = parse_model_flags(raw_args)
# Load providers for switch_model (picker path needs them below)
user_provs = None
custom_provs = None
try:
from hermes_cli.config import get_compatible_custom_providers, load_config
cfg = load_config()
user_provs = cfg.get("providers")
custom_provs = get_compatible_custom_providers(cfg)
except Exception:
pass
# No args at all: open prompt_toolkit-native picker modal
if not model_input and not explicit_provider:
model_display = self.model or "unknown"
provider_display = get_label(self.provider) if self.provider else "unknown"
user_provs = None
custom_provs = None
try:
from hermes_cli.config import get_compatible_custom_providers, load_config
cfg = load_config()
user_provs = cfg.get("providers")
custom_provs = get_compatible_custom_providers(cfg)
except Exception:
pass
try:
providers = list_authenticated_providers(
current_provider=self.provider or "",
@@ -6120,8 +6187,6 @@ class HermesCLI:
self._handle_agents_command()
elif canonical == "background":
self._handle_background_command(cmd_original)
elif canonical == "btw":
self._handle_btw_command(cmd_original)
elif canonical == "queue":
# Extract prompt after "/queue " or "/q "
parts = cmd_original.split(None, 1)
@@ -6300,6 +6365,12 @@ class HermesCLI:
turn_route = self._resolve_turn_agent_config(prompt)
def run_background():
set_sudo_password_callback(self._sudo_password_callback)
set_approval_callback(self._approval_callback)
try:
set_secret_capture_callback(self._secret_capture_callback)
except Exception:
pass
try:
bg_agent = AIAgent(
model=turn_route["model"],
@@ -6397,6 +6468,12 @@ class HermesCLI:
print()
_cprint(f" ❌ Background task #{task_num} failed: {e}")
finally:
try:
set_sudo_password_callback(None)
set_approval_callback(None)
set_secret_capture_callback(None)
except Exception:
pass
self._background_tasks.pop(task_id, None)
# Clear spinner only if no foreground agent owns it
if not self._agent_running:
@@ -6408,122 +6485,6 @@ class HermesCLI:
self._background_tasks[task_id] = thread
thread.start()
def _handle_btw_command(self, cmd: str):
"""Handle /btw <question> — ephemeral side question using session context.
Snapshots the current conversation history, spawns a no-tools agent in
a background thread, and prints the answer without persisting anything
to the main session.
"""
parts = cmd.strip().split(maxsplit=1)
if len(parts) < 2 or not parts[1].strip():
_cprint(" Usage: /btw <question>")
_cprint(" Example: /btw what module owns session title sanitization?")
_cprint(" Answers using session context. No tools, not persisted.")
return
question = parts[1].strip()
task_id = f"btw_{datetime.now().strftime('%H%M%S')}_{uuid.uuid4().hex[:6]}"
if not self._ensure_runtime_credentials():
_cprint(" (>_<) Cannot start /btw: no valid credentials.")
return
turn_route = self._resolve_turn_agent_config(question)
history_snapshot = list(self.conversation_history)
preview = question[:60] + ("..." if len(question) > 60 else "")
_cprint(f' 💬 /btw: "{preview}"')
def run_btw():
try:
btw_agent = AIAgent(
model=turn_route["model"],
api_key=turn_route["runtime"].get("api_key"),
base_url=turn_route["runtime"].get("base_url"),
provider=turn_route["runtime"].get("provider"),
api_mode=turn_route["runtime"].get("api_mode"),
acp_command=turn_route["runtime"].get("command"),
acp_args=turn_route["runtime"].get("args"),
max_iterations=8,
enabled_toolsets=[],
quiet_mode=True,
verbose_logging=False,
session_id=task_id,
platform="cli",
reasoning_config=self.reasoning_config,
service_tier=self.service_tier,
request_overrides=turn_route.get("request_overrides"),
providers_allowed=self._providers_only,
providers_ignored=self._providers_ignore,
providers_order=self._providers_order,
provider_sort=self._provider_sort,
provider_require_parameters=self._provider_require_params,
provider_data_collection=self._provider_data_collection,
fallback_model=self._fallback_model,
session_db=None,
skip_memory=True,
skip_context_files=True,
persist_session=False,
)
btw_prompt = (
"[Ephemeral /btw side question. Answer using the conversation "
"context. No tools available. Be direct and concise.]\n\n"
+ question
)
result = btw_agent.run_conversation(
user_message=btw_prompt,
conversation_history=history_snapshot,
task_id=task_id,
)
response = (result.get("final_response") or "") if result else ""
if not response and result and result.get("error"):
response = f"Error: {result['error']}"
# TUI refresh before printing
if self._app:
self._app.invalidate()
time.sleep(0.05)
print()
if response:
try:
from hermes_cli.skin_engine import get_active_skin
_skin = get_active_skin()
_resp_color = _skin.get_color("response_border", "#4F6D4A")
except Exception:
_resp_color = "#4F6D4A"
ChatConsole().print(Panel(
_render_final_assistant_content(response, mode=self.final_response_markdown),
title=f"[{_resp_color} bold]⚕ /btw[/]",
title_align="left",
border_style=_resp_color,
box=rich_box.HORIZONTALS,
padding=(1, 4),
))
else:
_cprint(" 💬 /btw: (no response)")
if self.bell_on_complete:
sys.stdout.write("\a")
sys.stdout.flush()
except Exception as e:
if self._app:
self._app.invalidate()
time.sleep(0.05)
print()
_cprint(f" ❌ /btw failed: {e}")
finally:
if self._app:
self._invalidate(min_interval=0)
thread = threading.Thread(target=run_btw, daemon=True, name=f"btw-{task_id}")
thread.start()
@staticmethod
def _try_launch_chrome_debug(port: int, system: str) -> bool:
"""Try to launch Chrome/Chromium with remote debugging enabled.
@@ -6907,24 +6868,36 @@ class HermesCLI:
/busy Show current busy input mode
/busy status Show current busy input mode
/busy queue Queue input for the next turn instead of interrupting
/busy steer Inject Enter mid-run via /steer (after next tool call)
/busy interrupt Interrupt the current run on Enter (default)
"""
parts = cmd.strip().split(maxsplit=1)
if len(parts) < 2 or parts[1].strip().lower() == "status":
_cprint(f" {_ACCENT}Busy input mode: {self.busy_input_mode}{_RST}")
_cprint(f" {_DIM}Enter while busy: {'queues for next turn' if self.busy_input_mode == 'queue' else 'interrupts current run'}{_RST}")
_cprint(f" {_DIM}Usage: /busy [queue|interrupt|status]{_RST}")
if self.busy_input_mode == "queue":
_behavior = "queues for next turn"
elif self.busy_input_mode == "steer":
_behavior = "steers into current run (after next tool call)"
else:
_behavior = "interrupts current run"
_cprint(f" {_DIM}Enter while busy: {_behavior}{_RST}")
_cprint(f" {_DIM}Usage: /busy [queue|steer|interrupt|status]{_RST}")
return
arg = parts[1].strip().lower()
if arg not in {"queue", "interrupt"}:
if arg not in {"queue", "interrupt", "steer"}:
_cprint(f" {_DIM}(._.) Unknown argument: {arg}{_RST}")
_cprint(f" {_DIM}Usage: /busy [queue|interrupt|status]{_RST}")
_cprint(f" {_DIM}Usage: /busy [queue|steer|interrupt|status]{_RST}")
return
self.busy_input_mode = arg
if save_config_value("display.busy_input_mode", arg):
behavior = "Enter will queue follow-up input while Hermes is busy." if arg == "queue" else "Enter will interrupt the current run while Hermes is busy."
if arg == "queue":
behavior = "Enter will queue follow-up input while Hermes is busy."
elif arg == "steer":
behavior = "Enter will steer your message into the current run (after the next tool call)."
else:
behavior = "Enter will interrupt the current run while Hermes is busy."
_cprint(f" {_ACCENT}✓ Busy input mode set to '{arg}' (saved to config){_RST}")
_cprint(f" {_DIM}{behavior}{_RST}")
else:
@@ -7326,7 +7299,7 @@ class HermesCLI:
change_detail = ". ".join(change_parts) + ". " if change_parts else ""
self.conversation_history.append({
"role": "user",
"content": f"[SYSTEM: MCP servers have been reloaded. {change_detail}{tool_summary}. The tool list for this conversation has been updated accordingly.]",
"content": f"[IMPORTANT: MCP servers have been reloaded. {change_detail}{tool_summary}. The tool list for this conversation has been updated accordingly.]",
})
# Persist session immediately so the session log reflects the
@@ -7408,6 +7381,31 @@ class HermesCLI:
_cprint(f" {line}")
except Exception:
pass
# First-touch onboarding: on the first tool in this process
# that takes longer than the threshold while we're in the
# noisiest progress mode, print a one-time hint about
# /verbose. Latched on self so it fires at most once per
# process; persisted to config.yaml so it never fires again
# across processes either.
try:
if (
not getattr(self, "_long_tool_hint_fired", False)
and self.tool_progress_mode == "all"
and duration >= 30.0
):
from agent.onboarding import (
TOOL_PROGRESS_FLAG,
is_seen,
mark_seen,
tool_progress_hint_cli,
)
if not is_seen(CLI_CONFIG, TOOL_PROGRESS_FLAG):
self._long_tool_hint_fired = True
_cprint(f" {_DIM}{tool_progress_hint_cli()}{_RST}")
mark_seen(_hermes_home / "config.yaml", TOOL_PROGRESS_FLAG)
CLI_CONFIG.setdefault("onboarding", {}).setdefault("seen", {})[TOOL_PROGRESS_FLAG] = True
except Exception:
pass
self._invalidate()
return
if event_type != "tool.started":
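The one-time hint follows a check-then-latch pattern. The sketch below shows the in-memory half only, under stated assumptions: the helper names are hypothetical and do not reproduce `agent.onboarding`, and persisting the flag to `config.yaml` is left out.

```python
def hint_already_seen(config, flag):
    """True if the flag is latched under config['onboarding']['seen']."""
    seen = (config.get("onboarding") or {}).get("seen") or {}
    return bool(seen.get(flag))


def latch_hint(config, flag):
    """Record the flag in memory so the hint is not shown again."""
    config.setdefault("onboarding", {}).setdefault("seen", {})[flag] = True


def maybe_show_hint(config, flag, duration, threshold=30.0):
    """Return True exactly once, for the first tool call at or over the threshold."""
    if duration < threshold or hint_already_seen(config, flag):
        return False
    latch_hint(config, flag)
    return True


cfg = {"onboarding": {"seen": {}}}
results = [maybe_show_hint(cfg, "tool_progress", d) for d in (5.0, 42.0, 60.0)]
```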
@@ -9075,6 +9073,30 @@ class HermesCLI:
_welcome_text = "Welcome to Hermes Agent! Type your message or /help for commands."
_welcome_color = "#FFF8DC"
self._console_print(f"[{_welcome_color}]{_welcome_text}[/]")
# First-time OpenClaw-residue banner — fires once if ~/.openclaw/ exists
# after an OpenClaw→Hermes migration (especially migrations done by
# OpenClaw's own tool, which doesn't archive the source directory).
try:
from agent.onboarding import (
OPENCLAW_RESIDUE_FLAG,
detect_openclaw_residue,
is_seen,
mark_seen,
openclaw_residue_hint_cli,
)
if not is_seen(self.config, OPENCLAW_RESIDUE_FLAG) and detect_openclaw_residue():
try:
_resid_color = _welcome_skin.get_color("banner_dim", "#B8860B")
except Exception:
_resid_color = "#B8860B"
self._console_print(f"[{_resid_color}]{openclaw_residue_hint_cli()}[/]")
try:
from hermes_cli.config import get_config_path as _get_cfg_path_resid
mark_seen(_get_cfg_path_resid(), OPENCLAW_RESIDUE_FLAG)
except Exception:
pass # best-effort — banner will fire again next session
except Exception:
pass # banner is non-critical — never break startup
# Show a random tip to help users discover features
try:
from hermes_cli.tips import get_random_tip
@@ -9276,12 +9298,34 @@ class HermesCLI:
# Bundle text + images as a tuple when images are present
payload = (text, images) if images else text
if self._agent_running and not (text and _looks_like_slash_command(text)):
if self.busy_input_mode == "queue":
_effective_mode = self.busy_input_mode
if _effective_mode == "steer":
# Route Enter through /steer — inject mid-run after the
# next tool call. Images can't ride along (steer only
# appends text), so fall back to queue when images are
# attached. If the agent lacks steer() or rejects the
# payload, also fall back to queue so nothing is lost.
if images or not text:
_effective_mode = "queue"
else:
accepted = False
try:
if self.agent is not None and hasattr(self.agent, "steer"):
accepted = bool(self.agent.steer(text))
except Exception as exc:
_cprint(f" {_DIM}Steer failed ({exc}) — queued for next turn.{_RST}")
accepted = False
if accepted:
preview = text[:80] + ("..." if len(text) > 80 else "")
_cprint(f" {_ACCENT}⏩ Steered: '{preview}'{_RST}")
else:
_effective_mode = "queue"
if _effective_mode == "queue":
# Queue for the next turn instead of interrupting
self._pending_input.put(payload)
preview = text if text else f"[{len(images)} image{'s' if len(images) != 1 else ''} attached]"
_cprint(f" Queued for the next turn: {preview[:80]}{'...' if len(preview) > 80 else ''}")
else:
elif _effective_mode == "interrupt":
self._interrupt_queue.put(payload)
# Debug: log to file when message enters interrupt queue
try:
@@ -9291,6 +9335,24 @@ class HermesCLI:
f"agent_running={self._agent_running}\n")
except Exception:
pass
# First-touch onboarding: on the very first busy-while-running
# event for this install, print a one-line tip explaining the
# /busy knob. Flag persists to config.yaml and never fires
# again. Guarded for exceptions so onboarding can't break
# the input loop.
try:
from agent.onboarding import (
BUSY_INPUT_FLAG,
busy_input_hint_cli,
is_seen,
mark_seen,
)
if not is_seen(CLI_CONFIG, BUSY_INPUT_FLAG):
_cprint(f" {_DIM}{busy_input_hint_cli(self.busy_input_mode)}{_RST}")
mark_seen(_hermes_home / "config.yaml", BUSY_INPUT_FLAG)
CLI_CONFIG.setdefault("onboarding", {}).setdefault("seen", {})[BUSY_INPUT_FLAG] = True
except Exception:
pass
else:
self._pending_input.put(payload)
event.app.current_buffer.reset(append_to_history=True)
@@ -9305,14 +9367,18 @@ class HermesCLI:
"""Ctrl+Enter (c-j) inserts a newline. Most terminals send c-j for Ctrl+Enter."""
event.current_buffer.insert_text('\n')
@kb.add(
'c-g',
filter=Condition(
lambda: not self._clarify_state and not self._approval_state and not self._sudo_state and not self._secret_state
),
# VSCode/Cursor bind Ctrl+G to "Find Next" at the editor level, so
# the keystroke never reaches the embedded terminal. Alt+G is unbound
# in those IDEs and arrives here as ('escape', 'g') — register it as
# a fallback so the editor handoff works inside Cursor/VSCode too.
_editor_filter = Condition(
lambda: not self._clarify_state and not self._approval_state and not self._sudo_state and not self._secret_state
)
@kb.add('c-g', filter=_editor_filter)
@kb.add('escape', 'g', filter=_editor_filter)
def handle_open_in_editor(event):
"""Ctrl+G opens the current draft in an external editor."""
"""Ctrl+G (or Alt+G in VSCode/Cursor) opens the current draft in an external editor."""
cli_ref._open_external_editor(event.current_buffer)
@kb.add('tab', eager=True)
@@ -9776,6 +9842,11 @@ class HermesCLI:
completer=_completer,
),
)
# Keep prompt_toolkit on its simple tempfile path. Setting
# buffer.tempfile = "prompt.md" triggers its complex-tempfile branch,
# which tries to mkdir() the mkdtemp() directory again and raises
# EEXIST. The suffix keeps markdown highlighting without that bug.
input_area.buffer.tempfile_suffix = '.md'
# Dynamic height: accounts for both explicit newlines AND visual
# wrapping of long lines so the input area always fits its content.
@@ -9898,7 +9969,7 @@ class HermesCLI:
status = cli_ref._command_status or "Processing command..."
return f"{frame} {status}"
if cli_ref._agent_running:
return "type a message + Enter to interrupt, Ctrl+C to cancel"
return "msg=interrupt · /queue · /bg · /steer · Ctrl+C cancel"
if cli_ref._voice_mode:
return "type or Ctrl+B to record"
return ""
@@ -10728,6 +10799,8 @@ class HermesCLI:
return # silently suppress
if isinstance(exc, KeyError) and "is not registered" in str(exc):
return # suppress selector registration failures (#6393)
if isinstance(exc, OSError) and getattr(exc, "errno", None) == errno.EIO:
return # suppress I/O errors from broken stdout on interrupt (#13710)
# Fall back to default handler for everything else
loop.default_exception_handler(context)
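The suppression rules in this handler can be read as a small predicate. The following is a minimal sketch, not the project's code: `should_suppress` is a hypothetical name, and it covers only the two cases visible in this hunk (selector registration failures and `EIO` from a broken stdout).

```python
import errno


def should_suppress(exc: BaseException) -> bool:
    """Return True for exceptions that are treated as harmless noise."""
    # Selector registration failure caused by a broken stdin (#6393).
    if isinstance(exc, KeyError) and "is not registered" in str(exc):
        return True
    # EIO raised when stdout has gone away during an interrupt (#13710).
    if isinstance(exc, OSError) and getattr(exc, "errno", None) == errno.EIO:
        return True
    # Everything else should reach the default handler.
    return False
```

Checking `errno` rather than the message text keeps the match independent of the platform's error string.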
@@ -10760,9 +10833,11 @@ class HermesCLI:
except (EOFError, KeyboardInterrupt, BrokenPipeError):
pass
except (KeyError, OSError) as _stdin_err:
# Catch selector registration failures from broken stdin (#6393).
# This is the fallback for cases that slip past the fstat() guard.
if "is not registered" in str(_stdin_err) or "Bad file descriptor" in str(_stdin_err):
# Catch selector registration failures from broken stdin (#6393)
# and I/O errors from broken stdout during interrupt (#13710).
if isinstance(_stdin_err, OSError) and getattr(_stdin_err, "errno", None) == errno.EIO:
pass # suppress broken-stdout I/O errors on interrupt (#13710)
elif "is not registered" in str(_stdin_err) or "Bad file descriptor" in str(_stdin_err):
print(
f"\nError: stdin is not usable ({_stdin_err}).\n"
"This can happen with certain Python installations (e.g. uv-managed cPython on macOS).\n"
@@ -10781,12 +10856,6 @@ class HermesCLI:
self.agent.interrupt()
except Exception:
pass
# Flush memories before exit (only for substantial conversations)
if self.agent and self.conversation_history:
try:
self.agent.flush_memories(self.conversation_history)
except (Exception, KeyboardInterrupt):
pass
# Shut down voice recorder (release persistent audio stream)
if hasattr(self, '_voice_recorder') and self._voice_recorder:
try:
+14 -1
@@ -16,7 +16,7 @@ import uuid
from datetime import datetime, timedelta
from pathlib import Path
from hermes_constants import get_hermes_home
from typing import Optional, Dict, List, Any
from typing import Optional, Dict, List, Any, Union
logger = logging.getLogger(__name__)
@@ -417,6 +417,7 @@ def create_job(
provider: Optional[str] = None,
base_url: Optional[str] = None,
script: Optional[str] = None,
context_from: Optional[Union[str, List[str]]] = None,
enabled_toolsets: Optional[List[str]] = None,
workdir: Optional[str] = None,
) -> Dict[str, Any]:
@@ -438,6 +439,9 @@ def create_job(
script: Optional path to a Python script whose stdout is injected into the
prompt each run. The script runs before the agent turn, and its output
is prepended as context. Useful for data collection / change detection.
context_from: Optional job ID (or list of job IDs) whose most recent output
is injected into the prompt as context before each run.
Useful for chaining cron jobs: job A finds data, job B processes it.
enabled_toolsets: Optional list of toolset names to restrict the agent to.
When set, only tools from these toolsets are loaded, reducing
token overhead. When omitted, all default tools are loaded.
@@ -481,6 +485,14 @@ def create_job(
normalized_toolsets = normalized_toolsets or None
normalized_workdir = _normalize_workdir(workdir)
# Normalize context_from: accept str or list of str, store as list or None
if isinstance(context_from, str):
context_from = [context_from.strip()] if context_from.strip() else None
elif isinstance(context_from, list):
context_from = [str(j).strip() for j in context_from if str(j).strip()] or None
else:
context_from = None
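As a standalone illustration of the normalization above, the same rules can be written as a pure function. This is a sketch under the assumption that the behavior is exactly what the hunk shows; `normalize_context_from` is a hypothetical helper name, not one from the repository.

```python
from typing import List, Optional, Union


def normalize_context_from(
    value: Optional[Union[str, List[str]]],
) -> Optional[List[str]]:
    """Accept a job ID or a list of job IDs; return a clean list or None."""
    if isinstance(value, str):
        stripped = value.strip()
        return [stripped] if stripped else None
    if isinstance(value, list):
        # Drop blank entries; an all-blank list collapses to None.
        cleaned = [str(j).strip() for j in value if str(j).strip()]
        return cleaned or None
    # Any other type (including None) means "no context sources".
    return None
```

Storing either a non-empty list or `None` means downstream code only has to handle two shapes.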
label_source = (prompt or (normalized_skills[0] if normalized_skills else None)) or "cron job"
job = {
"id": job_id,
@@ -492,6 +504,7 @@ def create_job(
"provider": normalized_provider,
"base_url": normalized_base_url,
"script": normalized_script,
"context_from": context_from,
"schedule": parsed_schedule,
"schedule_display": parsed_schedule.get("display", schedule),
"repeat": {
+57 -4
@@ -77,7 +77,7 @@ _KNOWN_DELIVERY_PLATFORMS = frozenset({
"telegram", "discord", "slack", "whatsapp", "signal",
"matrix", "mattermost", "homeassistant", "dingtalk", "feishu",
"wecom", "wecom_callback", "weixin", "sms", "email", "webhook", "bluebubbles",
"qqbot",
"qqbot", "yuanbao",
})
# Platforms that support a configured cron/notification home target, mapped to
@@ -337,6 +337,7 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
"sms": Platform.SMS,
"bluebubbles": Platform.BLUEBUBBLES,
"qqbot": Platform.QQBOT,
"yuanbao": Platform.YUANBAO,
}
# Optionally wrap the content with a header/footer so the user knows this
@@ -671,10 +672,51 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
f"{prompt}"
)
# Inject output from referenced cron jobs as context.
context_from = job.get("context_from")
if context_from:
from cron.jobs import OUTPUT_DIR
if isinstance(context_from, str):
context_from = [context_from]
for source_job_id in context_from:
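# Guard against path traversal: valid job IDs are 12-char hex strings, so
# any ID containing a non-hex character (such as "/" or ".") is skipped.
# Only the character set is checked here, not the length.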
if not source_job_id or not all(c in "0123456789abcdef" for c in source_job_id):
logger.warning("context_from: skipping invalid job_id %r", source_job_id)
continue
try:
job_output_dir = OUTPUT_DIR / source_job_id
if not job_output_dir.exists():
continue # silent skip — no output yet
output_files = sorted(
job_output_dir.glob("*.md"),
key=lambda f: f.stat().st_mtime,
reverse=True,
)
if not output_files:
continue # silent skip — no output yet
latest_output = output_files[0].read_text(encoding="utf-8").strip()
# Truncate to 8K characters to avoid prompt bloat
_MAX_CONTEXT_CHARS = 8000
if len(latest_output) > _MAX_CONTEXT_CHARS:
latest_output = latest_output[:_MAX_CONTEXT_CHARS] + "\n\n[... output truncated ...]"
if latest_output:
prompt = (
f"## Output from job '{source_job_id}'\n"
"The following is the most recent output from a preceding "
"cron job. Use it as context for your analysis.\n\n"
f"```\n{latest_output}\n```\n\n"
f"{prompt}"
)
else:
continue # silent skip — empty output
except OSError as e:  # PermissionError is a subclass of OSError
logger.warning("context_from: failed to read output for job %r: %s", source_job_id, e)
# silent skip — do not pollute the prompt with error messages
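The two safety measures in the block above, the job ID character check and the 8K truncation, can be isolated as small helpers. This is a minimal sketch; `is_safe_job_id` and `truncate_context` are hypothetical names, and the limit mirrors the `_MAX_CONTEXT_CHARS` value shown in the diff.

```python
_HEX_CHARS = set("0123456789abcdef")
MAX_CONTEXT_CHARS = 8000  # same limit as _MAX_CONTEXT_CHARS in the diff


def is_safe_job_id(job_id: str) -> bool:
    """True when job_id is non-empty and made only of lowercase hex digits."""
    return bool(job_id) and all(c in _HEX_CHARS for c in job_id)


def truncate_context(text: str, limit: int = MAX_CONTEXT_CHARS) -> str:
    """Cap text at `limit` characters and mark the cut explicitly."""
    if len(text) > limit:
        return text[:limit] + "\n\n[... output truncated ...]"
    return text
```

Because path separators and dots are not hex digits, the character check alone is enough to keep `OUTPUT_DIR / source_job_id` inside the output directory.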
# Always prepend cron execution guidance so the agent knows how
# delivery works and can suppress delivery when appropriate.
cron_hint = (
"[SYSTEM: You are running as a scheduled cron job. "
"[IMPORTANT: You are running as a scheduled cron job. "
"DELIVERY: Your final response will be automatically delivered "
"to the user — do NOT use send_message or try to deliver "
"the output yourself. Just produce your report/output as your "
@@ -710,7 +752,7 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
parts.append("")
parts.extend(
[
f'[SYSTEM: The user has invoked the "{skill_name}" skill, indicating they want you to follow its instructions. The full skill content is loaded below.]',
f'[IMPORTANT: The user has invoked the "{skill_name}" skill, indicating they want you to follow its instructions. The full skill content is loaded below.]',
"",
content,
]
@@ -718,7 +760,7 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
if skipped:
notice = (
f"[SYSTEM: The following skill(s) were listed for this job but could not be found "
f"[IMPORTANT: The following skill(s) were listed for this job but could not be found "
f"and were skipped: {', '.join(skipped)}. "
f"Start your response with a brief notice so the user is aware, e.g.: "
f"'⚠️ Skill(s) not found and skipped: {', '.join(skipped)}']"
@@ -1267,6 +1309,17 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int:
_futures.append(_tick_pool.submit(_ctx.run, _process_job, job))
_results.extend(f.result() for f in _futures)
# Best-effort sweep of MCP stdio subprocesses that survived their
# session teardown during this tick. Runs AFTER every job has
# finished so active sessions (including live user chats) are
# never touched — only PIDs explicitly detected as orphans in
# tools.mcp_tool._run_stdio's finally block are reaped.
try:
from tools.mcp_tool import _kill_orphaned_mcp_children
_kill_orphaned_mcp_children()
except Exception as _e:
logger.debug("Post-tick MCP orphan cleanup failed: %s", _e)
return sum(_results)
finally:
if fcntl:
+9 -7
@@ -41,6 +41,15 @@ if [ "$(id -u)" = "0" ]; then
echo "Warning: chown failed (rootless container?) — continuing anyway"
fi
# Ensure config.yaml is readable by the hermes runtime user even if it was
# edited on the host after initial ownership setup. Must run here (as root)
# rather than after the gosu drop, otherwise a non-root caller like
# `docker run -u $(id -u):$(id -g)` hits "Operation not permitted" (#15865).
if [ -f "$HERMES_HOME/config.yaml" ]; then
chown hermes:hermes "$HERMES_HOME/config.yaml" 2>/dev/null || true
chmod 640 "$HERMES_HOME/config.yaml" 2>/dev/null || true
fi
echo "Dropping root privileges"
exec gosu hermes "$0" "$@"
fi
@@ -67,13 +76,6 @@ if [ ! -f "$HERMES_HOME/config.yaml" ]; then
cp "$INSTALL_DIR/cli-config.yaml.example" "$HERMES_HOME/config.yaml"
fi
# Ensure the main config file remains accessible to the hermes runtime user
# even if it was edited on the host after initial ownership setup.
if [ -f "$HERMES_HOME/config.yaml" ]; then
chown hermes:hermes "$HERMES_HOME/config.yaml"
chmod 640 "$HERMES_HOME/config.yaml"
fi
# SOUL.md
if [ ! -f "$HERMES_HOME/SOUL.md" ]; then
cp "$INSTALL_DIR/docker/SOUL.md" "$HERMES_HOME/SOUL.md"
+67 -14
@@ -57,7 +57,7 @@ def _session_entry_name(origin: Dict[str, Any]) -> str:
# Build / refresh
# ---------------------------------------------------------------------------
def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
async def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
"""
Build a channel directory from connected platform adapters and session data.
@@ -72,7 +72,7 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
if platform == Platform.DISCORD:
platforms["discord"] = _build_discord(adapter)
elif platform == Platform.SLACK:
platforms["slack"] = _build_slack(adapter)
platforms["slack"] = await _build_slack(adapter)
except Exception as e:
logger.warning("Channel directory: failed to build %s: %s", platform.value, e)
@@ -136,21 +136,66 @@ def _build_discord(adapter) -> List[Dict[str, str]]:
return channels
def _build_slack(adapter) -> List[Dict[str, str]]:
"""List Slack channels the bot has joined."""
# Slack adapter may expose a web client
client = getattr(adapter, "_app", None) or getattr(adapter, "_client", None)
if not client:
async def _build_slack(adapter) -> List[Dict[str, Any]]:
"""List Slack channels the bot has joined across all workspaces.
Uses ``users.conversations`` against each workspace's web client. Pulls
public + private channels the bot is a member of, then merges in DMs
discovered from session history (IMs aren't useful to enumerate
proactively).
"""
team_clients = getattr(adapter, "_team_clients", None) or {}
if not team_clients:
return _build_from_sessions("slack")
try:
from tools.send_message_tool import _send_slack # noqa: F401
# Use the Slack Web API directly if available
except Exception:
pass
channels: List[Dict[str, Any]] = []
seen_ids: set = set()
# Fallback to session data
return _build_from_sessions("slack")
for team_id, client in team_clients.items():
try:
cursor: Optional[str] = None
for _page in range(20): # safety cap on pagination
response = await client.users_conversations(
types="public_channel,private_channel",
exclude_archived=True,
limit=200,
cursor=cursor,
)
if not response.get("ok"):
logger.warning(
"Channel directory: users.conversations not ok for team %s: %s",
team_id,
response.get("error", "unknown"),
)
break
for ch in response.get("channels", []):
cid = ch.get("id")
name = ch.get("name")
if not cid or not name or cid in seen_ids:
continue
seen_ids.add(cid)
channels.append({
"id": cid,
"name": name,
"type": "private" if ch.get("is_private") else "channel",
})
cursor = (response.get("response_metadata") or {}).get("next_cursor")
if not cursor:
break
except Exception as e:
logger.warning(
"Channel directory: failed to list Slack channels for team %s: %s",
team_id, e,
)
continue
# Merge in DM/group entries discovered from session history.
for entry in _build_from_sessions("slack"):
if entry.get("id") not in seen_ids:
channels.append(entry)
seen_ids.add(entry.get("id"))
return channels
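The cursor loop in `_build_slack` can be exercised without a Slack workspace. The sketch below assumes a client whose `users_conversations` coroutine returns dict-like responses, as the diff implies; `FakeSlackClient` and `list_channels` are hypothetical stand-ins for illustration only.

```python
import asyncio
from typing import Any, Dict, List, Optional


class FakeSlackClient:
    """Hypothetical stand-in that serves two pages of channels."""

    def __init__(self) -> None:
        self._pages: Dict[Optional[str], Dict[str, Any]] = {
            None: {
                "ok": True,
                "channels": [{"id": "C1", "name": "general"}],
                "response_metadata": {"next_cursor": "page2"},
            },
            "page2": {
                "ok": True,
                "channels": [{"id": "C2", "name": "ops", "is_private": True}],
                "response_metadata": {"next_cursor": ""},
            },
        }

    async def users_conversations(
        self, *, cursor: Optional[str] = None, **_kwargs: Any
    ) -> Dict[str, Any]:
        return self._pages[cursor]


async def list_channels(client: Any, max_pages: int = 20) -> List[Dict[str, Any]]:
    channels: List[Dict[str, Any]] = []
    seen_ids: set = set()
    cursor: Optional[str] = None
    for _page in range(max_pages):  # safety cap on pagination
        response = await client.users_conversations(
            types="public_channel,private_channel",
            exclude_archived=True,
            limit=200,
            cursor=cursor,
        )
        if not response.get("ok"):
            break
        for ch in response.get("channels", []):
            cid, name = ch.get("id"), ch.get("name")
            if not cid or not name or cid in seen_ids:
                continue
            seen_ids.add(cid)
            channels.append({
                "id": cid,
                "name": name,
                "type": "private" if ch.get("is_private") else "channel",
            })
        # An empty or missing next_cursor marks the last page.
        cursor = (response.get("response_metadata") or {}).get("next_cursor")
        if not cursor:
            break
    return channels
```

Running `asyncio.run(list_channels(FakeSlackClient()))` walks both pages. The page cap bounds the loop even if a server keeps returning a cursor.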
def _build_from_sessions(platform_name: str) -> List[Dict[str, str]]:
@@ -223,6 +268,14 @@ def resolve_channel_name(platform_name: str, name: str) -> Optional[str]:
if not channels:
return None
# 0. Exact ID match — case-sensitive, no normalization. Lets callers pass
# raw platform IDs (e.g. Slack "C0B0QV5434G") even when the format guard
# in _parse_target_ref hasn't recognized them as explicit.
raw = name.strip()
for ch in channels:
if ch.get("id") == raw:
return ch["id"]
query = _normalize_channel_query(name)
# 1. Exact name match, including the display labels shown by send_message(action="list")
+68 -2
@@ -67,6 +67,7 @@ class Platform(Enum):
WEIXIN = "weixin"
BLUEBUBBLES = "bluebubbles"
QQBOT = "qqbot"
YUANBAO = "yuanbao"
@dataclass
@@ -195,6 +196,14 @@ class StreamingConfig:
edit_interval: float = 1.0 # Seconds between message edits (Telegram rate-limits at ~1/s)
buffer_threshold: int = 40 # Chars before forcing an edit
cursor: str = "" # Cursor shown during streaming
# Ported from openclaw/openclaw#72038. When >0, the final edit for
# a long-running streamed response is delivered as a fresh message
# if the original preview has been visible for at least this many
# seconds, so the platform's visible timestamp reflects completion
# time instead of the preview creation time. Currently applied to
# Telegram only (other platforms ignore the setting). Default 60s
# matches the OpenClaw rollout. Set to 0 to disable.
fresh_final_after_seconds: float = 60.0
def to_dict(self) -> Dict[str, Any]:
return {
@@ -203,6 +212,7 @@ class StreamingConfig:
"edit_interval": self.edit_interval,
"buffer_threshold": self.buffer_threshold,
"cursor": self.cursor,
"fresh_final_after_seconds": self.fresh_final_after_seconds,
}
@classmethod
@@ -215,6 +225,9 @@ class StreamingConfig:
edit_interval=float(data.get("edit_interval", 1.0)),
buffer_threshold=int(data.get("buffer_threshold", 40)),
cursor=data.get("cursor", ""),
fresh_final_after_seconds=float(
data.get("fresh_final_after_seconds", 60.0)
),
)
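To make the `fresh_final_after_seconds` setting concrete, here is a minimal sketch of the round trip plus the decision the comment describes. The consumer-side logic is not part of this diff, so `should_send_fresh_final` and `StreamingSketch` are hypothetical and only mirror the documented rule (threshold greater than 0, preview visible at least that long).

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class StreamingSketch:
    """Reduced stand-in for StreamingConfig with only the new field."""

    fresh_final_after_seconds: float = 60.0

    def to_dict(self) -> Dict[str, Any]:
        return {"fresh_final_after_seconds": self.fresh_final_after_seconds}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "StreamingSketch":
        return cls(
            fresh_final_after_seconds=float(
                data.get("fresh_final_after_seconds", 60.0)
            )
        )


def should_send_fresh_final(cfg: StreamingSketch, preview_age_seconds: float) -> bool:
    """Hypothetical gate: 0 disables the feature, otherwise compare ages."""
    threshold = cfg.fresh_final_after_seconds
    return threshold > 0 and preview_age_seconds >= threshold
```

Coercing through `float()` in `from_dict` lets YAML values such as `"0"` or `30` load consistently.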
@@ -314,6 +327,9 @@ class GatewayConfig:
# QQBot uses extra dict for app credentials
elif platform == Platform.QQBOT and config.extra.get("app_id") and config.extra.get("client_secret"):
connected.append(platform)
# Yuanbao uses extra dict for app credentials
elif platform == Platform.YUANBAO and config.extra.get("app_id") and config.extra.get("app_secret"):
connected.append(platform)
# DingTalk uses client_id/client_secret from config.extra or env vars
elif platform == Platform.DINGTALK and (
config.extra.get("client_id") or os.getenv("DINGTALK_CLIENT_ID")
@@ -570,6 +586,8 @@ def load_gateway_config() -> GatewayConfig:
)
if "reply_prefix" in platform_cfg:
bridged["reply_prefix"] = platform_cfg["reply_prefix"]
if "reply_in_thread" in platform_cfg:
bridged["reply_in_thread"] = platform_cfg["reply_in_thread"]
if "require_mention" in platform_cfg:
bridged["require_mention"] = platform_cfg["require_mention"]
if "free_response_channels" in platform_cfg:
@@ -584,7 +602,7 @@ def load_gateway_config() -> GatewayConfig:
bridged["group_policy"] = platform_cfg["group_policy"]
if "group_allow_from" in platform_cfg:
bridged["group_allow_from"] = platform_cfg["group_allow_from"]
if plat == Platform.DISCORD and "channel_skill_bindings" in platform_cfg:
if plat in (Platform.DISCORD, Platform.SLACK) and "channel_skill_bindings" in platform_cfg:
bridged["channel_skill_bindings"] = platform_cfg["channel_skill_bindings"]
if "channel_prompts" in platform_cfg:
channel_prompts = platform_cfg["channel_prompts"]
@@ -609,6 +627,8 @@ def load_gateway_config() -> GatewayConfig:
if isinstance(slack_cfg, dict):
if "require_mention" in slack_cfg and not os.getenv("SLACK_REQUIRE_MENTION"):
os.environ["SLACK_REQUIRE_MENTION"] = str(slack_cfg["require_mention"]).lower()
if "strict_mention" in slack_cfg and not os.getenv("SLACK_STRICT_MENTION"):
os.environ["SLACK_STRICT_MENTION"] = str(slack_cfg["strict_mention"]).lower()
if "allow_bots" in slack_cfg and not os.getenv("SLACK_ALLOW_BOTS"):
os.environ["SLACK_ALLOW_BOTS"] = str(slack_cfg["allow_bots"]).lower()
frc = slack_cfg.get("free_response_channels")
@@ -918,8 +938,12 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
slack_token = os.getenv("SLACK_BOT_TOKEN")
if slack_token:
if Platform.SLACK not in config.platforms:
# No yaml config for Slack — env-only setup, enable it
config.platforms[Platform.SLACK] = PlatformConfig()
config.platforms[Platform.SLACK].enabled = True
config.platforms[Platform.SLACK].enabled = True
# If yaml config exists, respect its enabled flag (don't override
# explicit enabled: false). Token is still stored so skills that
# send Slack messages can use it without activating the gateway adapter.
config.platforms[Platform.SLACK].token = slack_token
slack_home = os.getenv("SLACK_HOME_CHANNEL")
if slack_home and Platform.SLACK in config.platforms:
@@ -1276,6 +1300,48 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
name=os.getenv("QQBOT_HOME_CHANNEL_NAME") or os.getenv(qq_home_name_env, "Home"),
)
# Yuanbao: YUANBAO_APP_ID is preferred; YUANBAO_APP_KEY is accepted as a fallback
yuanbao_app_id = os.getenv("YUANBAO_APP_ID") or os.getenv("YUANBAO_APP_KEY")
yuanbao_app_secret = os.getenv("YUANBAO_APP_SECRET")
if yuanbao_app_id and yuanbao_app_secret:
if Platform.YUANBAO not in config.platforms:
config.platforms[Platform.YUANBAO] = PlatformConfig()
config.platforms[Platform.YUANBAO].enabled = True
extra = config.platforms[Platform.YUANBAO].extra
extra["app_id"] = yuanbao_app_id
extra["app_secret"] = yuanbao_app_secret
yuanbao_bot_id = os.getenv("YUANBAO_BOT_ID")
if yuanbao_bot_id:
extra["bot_id"] = yuanbao_bot_id
yuanbao_ws_url = os.getenv("YUANBAO_WS_URL")
if yuanbao_ws_url:
extra["ws_url"] = yuanbao_ws_url
yuanbao_api_domain = os.getenv("YUANBAO_API_DOMAIN")
if yuanbao_api_domain:
extra["api_domain"] = yuanbao_api_domain
yuanbao_route_env = os.getenv("YUANBAO_ROUTE_ENV")
if yuanbao_route_env:
extra["route_env"] = yuanbao_route_env
yuanbao_home = os.getenv("YUANBAO_HOME_CHANNEL")
if yuanbao_home:
config.platforms[Platform.YUANBAO].home_channel = HomeChannel(
platform=Platform.YUANBAO,
chat_id=yuanbao_home,
name=os.getenv("YUANBAO_HOME_CHANNEL_NAME", "Home"),
)
yuanbao_dm_policy = os.getenv("YUANBAO_DM_POLICY")
if yuanbao_dm_policy:
extra["dm_policy"] = yuanbao_dm_policy.strip().lower()
yuanbao_dm_allow_from = os.getenv("YUANBAO_DM_ALLOW_FROM")
if yuanbao_dm_allow_from:
extra["dm_allow_from"] = yuanbao_dm_allow_from
yuanbao_group_policy = os.getenv("YUANBAO_GROUP_POLICY")
if yuanbao_group_policy:
extra["group_policy"] = yuanbao_group_policy.strip().lower()
yuanbao_group_allow_from = os.getenv("YUANBAO_GROUP_ALLOW_FROM")
if yuanbao_group_allow_from:
extra["group_allow_from"] = yuanbao_group_allow_from
# Session settings
idle_minutes = os.getenv("SESSION_IDLE_MINUTES")
if idle_minutes:
+3 -1
@@ -79,7 +79,9 @@ _PLATFORM_DEFAULTS: dict[str, dict[str, Any]] = {
"discord": _TIER_HIGH,
# Tier 2 — edit support, often customer/workspace channels
"slack": _TIER_MEDIUM,
# Slack: tool_progress is off by default. Bolt posts cannot be edited in
# place the way CLI output can, so the "new"/"all" modes leave permanent
# progress lines in channels (hermes-agent#14663).
"slack": {**_TIER_MEDIUM, "tool_progress": "off"},
"mattermost": _TIER_MEDIUM,
"matrix": _TIER_MEDIUM,
"feishu": _TIER_MEDIUM,
+57 -11
@@ -28,6 +28,7 @@ def mirror_to_session(
message_text: str,
source_label: str = "cli",
thread_id: Optional[str] = None,
user_id: Optional[str] = None,
) -> bool:
"""
Append a delivery-mirror message to the target session's transcript.
@@ -39,9 +40,20 @@ def mirror_to_session(
All errors are caught -- this is never fatal.
"""
try:
session_id = _find_session_id(platform, str(chat_id), thread_id=thread_id)
session_id = _find_session_id(
platform,
str(chat_id),
thread_id=thread_id,
user_id=user_id,
)
if not session_id:
logger.debug("Mirror: no session found for %s:%s:%s", platform, chat_id, thread_id)
logger.debug(
"Mirror: no session found for %s:%s:%s:%s",
platform,
chat_id,
thread_id,
user_id,
)
return False
mirror_msg = {
@@ -59,17 +71,33 @@ def mirror_to_session(
return True
except Exception as e:
logger.debug("Mirror failed for %s:%s:%s: %s", platform, chat_id, thread_id, e)
logger.debug(
"Mirror failed for %s:%s:%s:%s: %s",
platform,
chat_id,
thread_id,
user_id,
e,
)
return False
def _find_session_id(platform: str, chat_id: str, thread_id: Optional[str] = None) -> Optional[str]:
def _find_session_id(
platform: str,
chat_id: str,
thread_id: Optional[str] = None,
user_id: Optional[str] = None,
) -> Optional[str]:
"""
Find the active session_id for a platform + chat_id pair.
Scans sessions.json entries and matches where origin.chat_id == chat_id
on the right platform. DM session keys don't embed the chat_id
(e.g. "agent:main:telegram:dm"), so we check the origin dict.
When *user_id* is provided, prefer exact sender matches. If multiple
same-chat candidates exist and none matches the user, return None instead
of guessing and contaminating another participant's session.
"""
if not _SESSIONS_INDEX.exists():
return None
@@ -81,8 +109,7 @@ def _find_session_id(platform: str, chat_id: str, thread_id: Optional[str] = Non
return None
platform_lower = platform.lower()
best_match = None
best_updated = ""
candidates = []
for _key, entry in data.items():
origin = entry.get("origin") or {}
@@ -96,12 +123,31 @@ def _find_session_id(platform: str, chat_id: str, thread_id: Optional[str] = Non
origin_thread_id = origin.get("thread_id")
if thread_id is not None and str(origin_thread_id or "") != str(thread_id):
continue
updated = entry.get("updated_at", "")
if updated > best_updated:
best_updated = updated
best_match = entry.get("session_id")
candidates.append(entry)
return best_match
if not candidates:
return None
if user_id:
exact_user_matches = [
entry for entry in candidates
if str((entry.get("origin") or {}).get("user_id") or "") == str(user_id)
]
if exact_user_matches:
candidates = exact_user_matches
elif len(candidates) > 1:
return None
elif len(candidates) > 1:
distinct_user_ids = {
str((entry.get("origin") or {}).get("user_id") or "").strip()
for entry in candidates
if str((entry.get("origin") or {}).get("user_id") or "").strip()
}
if len(distinct_user_ids) > 1:
return None
best_entry = max(candidates, key=lambda entry: entry.get("updated_at", ""))
return best_entry.get("session_id")
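The candidate-selection rules at the end of `_find_session_id` can be checked in isolation. The following is a sketch that assumes the entries have the `origin`, `updated_at`, and `session_id` keys used in the diff; `pick_session` is a hypothetical name.

```python
from typing import Any, Dict, List, Optional


def pick_session(
    candidates: List[Dict[str, Any]],
    user_id: Optional[str] = None,
) -> Optional[str]:
    """Choose a session_id, refusing to guess between different users."""
    if not candidates:
        return None

    def origin_user(entry: Dict[str, Any]) -> str:
        return str((entry.get("origin") or {}).get("user_id") or "")

    if user_id:
        exact = [e for e in candidates if origin_user(e) == str(user_id)]
        if exact:
            candidates = exact
        elif len(candidates) > 1:
            return None  # several sessions, none belongs to this user
    elif len(candidates) > 1:
        distinct = {origin_user(e).strip() for e in candidates if origin_user(e).strip()}
        if len(distinct) > 1:
            return None  # ambiguous: multiple participants share the chat

    # Among the remaining entries, the most recently updated one wins.
    best = max(candidates, key=lambda e: e.get("updated_at", ""))
    return best.get("session_id")
```

Returning `None` in the ambiguous cases trades a missed mirror for never writing into another participant's transcript.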
def _append_to_jsonl(session_id: str, message: dict) -> None:
+2
@@ -10,10 +10,12 @@ Each adapter handles:
from .base import BasePlatformAdapter, MessageEvent, SendResult
from .qqbot import QQAdapter
from .yuanbao import YuanbaoAdapter
__all__ = [
"BasePlatformAdapter",
"MessageEvent",
"SendResult",
"QQAdapter",
"YuanbaoAdapter",
]
+49
@@ -9,6 +9,7 @@ Exposes an HTTP server with endpoints:
- GET /v1/models lists hermes-agent as an available model
- POST /v1/runs start a run, returns run_id immediately (202)
- GET /v1/runs/{run_id}/events SSE stream of structured lifecycle events
- POST /v1/runs/{run_id}/stop interrupt a running agent
- GET /health health check
- GET /health/detailed rich status for cross-container dashboard probing
@@ -586,6 +587,9 @@ class APIServerAdapter(BasePlatformAdapter):
self._run_streams: Dict[str, "asyncio.Queue[Optional[Dict]]"] = {}
# Creation timestamps for orphaned-run TTL sweep
self._run_streams_created: Dict[str, float] = {}
# Active run agent/task references for stop support
self._active_run_agents: Dict[str, Any] = {}
self._active_run_tasks: Dict[str, "asyncio.Task"] = {}
self._session_db: Optional[Any] = None # Lazy-init SessionDB for session continuity
@staticmethod
@@ -2441,6 +2445,7 @@ class APIServerAdapter(BasePlatformAdapter):
stream_delta_callback=_text_cb,
tool_progress_callback=event_cb,
)
self._active_run_agents[run_id] = agent
def _run_sync():
r = agent.run_conversation(
user_message=user_message,
@@ -2480,8 +2485,11 @@ class APIServerAdapter(BasePlatformAdapter):
q.put_nowait(None)
except Exception:
pass
self._active_run_agents.pop(run_id, None)
self._active_run_tasks.pop(run_id, None)
task = asyncio.create_task(_run_and_close())
self._active_run_tasks[run_id] = task
try:
self._background_tasks.add(task)
except TypeError:
@@ -2540,6 +2548,44 @@ class APIServerAdapter(BasePlatformAdapter):
return response
async def _handle_stop_run(self, request: "web.Request") -> "web.Response":
"""POST /v1/runs/{run_id}/stop — interrupt a running agent."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
run_id = request.match_info["run_id"]
agent = self._active_run_agents.get(run_id)
task = self._active_run_tasks.get(run_id)
if agent is None and task is None:
return web.json_response(_openai_error(f"Run not found: {run_id}", code="run_not_found"), status=404)
if agent is not None:
try:
agent.interrupt("Stop requested via API")
except Exception:
pass
if task is not None and not task.done():
task.cancel()
# Bounded wait: run_conversation() executes in the default
# executor thread which task.cancel() cannot preempt — we rely on
# agent.interrupt() above to break the loop. Cap the wait so a
# slow/unresponsive interrupt can't hang this handler.
try:
await asyncio.wait_for(asyncio.shield(task), timeout=5.0)
except asyncio.TimeoutError:
logger.warning(
"[api_server] stop for run %s timed out after 5s; "
"agent may still be finishing the current step",
run_id,
)
except (asyncio.CancelledError, Exception):
pass
return web.json_response({"run_id": run_id, "status": "stopping"})
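The comment in the handler notes that work running in an executor thread cannot be preempted by task cancellation and has to be interrupted cooperatively. A simplified sketch of that interrupt-then-bounded-wait pattern follows. It deliberately omits the `task.cancel()` call, and `CooperativeWorker` and `stop_with_bounded_wait` are hypothetical names, not part of the API server.

```python
import asyncio
import threading
import time


class CooperativeWorker:
    """Hypothetical blocking worker that polls a stop flag between steps."""

    def __init__(self) -> None:
        self._stop = threading.Event()

    def interrupt(self) -> None:
        self._stop.set()

    def run(self) -> str:
        while not self._stop.is_set():
            time.sleep(0.01)  # stands in for one unit of blocking work
        return "interrupted"


async def stop_with_bounded_wait(worker, task, timeout: float = 5.0) -> bool:
    """Ask the worker to stop, then wait at most `timeout` seconds."""
    worker.interrupt()
    try:
        # shield() keeps the wait_for timeout from cancelling the task itself.
        await asyncio.wait_for(asyncio.shield(task), timeout=timeout)
    except asyncio.TimeoutError:
        return False
    return True


async def demo():
    worker = CooperativeWorker()
    loop = asyncio.get_running_loop()
    task = loop.run_in_executor(None, worker.run)
    await asyncio.sleep(0.05)  # let the worker start
    stopped = await stop_with_bounded_wait(worker, task, timeout=2.0)
    return stopped, task.result()
```

The timeout protects the caller from a worker that never checks its flag; in that case the function returns `False` and the thread keeps running.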
async def _sweep_orphaned_runs(self) -> None:
"""Periodically clean up run streams that were never consumed."""
while True:
@@ -2554,6 +2600,8 @@ class APIServerAdapter(BasePlatformAdapter):
logger.debug("[api_server] sweeping orphaned run %s", run_id)
self._run_streams.pop(run_id, None)
self._run_streams_created.pop(run_id, None)
self._active_run_agents.pop(run_id, None)
self._active_run_tasks.pop(run_id, None)
# ------------------------------------------------------------------
# BasePlatformAdapter interface
@@ -2589,6 +2637,7 @@ class APIServerAdapter(BasePlatformAdapter):
# Structured event streaming
self._app.router.add_post("/v1/runs", self._handle_runs)
self._app.router.add_get("/v1/runs/{run_id}/events", self._handle_run_events)
self._app.router.add_post("/v1/runs/{run_id}/stop", self._handle_stop_run)
# Start background sweep to clean up orphaned (unconsumed) run streams
sweep_task = asyncio.create_task(self._sweep_orphaned_runs())
try:
+158 -5
@@ -336,6 +336,39 @@ def proxy_kwargs_for_aiohttp(proxy_url: str | None) -> tuple[dict, dict]:
return {}, {"proxy": proxy_url}
def is_host_excluded_by_no_proxy(hostname: str, no_proxy_value: str | None = None) -> bool:
"""Return True when ``hostname`` matches a ``NO_PROXY`` entry.
Supports comma- or whitespace-separated entries with optional leading dots
and ``*.`` wildcards, which match both the apex domain and subdomains.
"""
raw = no_proxy_value
if raw is None:
raw = os.environ.get("NO_PROXY") or os.environ.get("no_proxy") or ""
raw = raw.strip()
if not raw:
return False
lower_hostname = hostname.lower()
for entry in re.split(r"[\s,]+", raw):
normalized = entry.strip().lower()
if not normalized:
continue
if normalized == "*":
return True
if normalized.startswith("*."):
normalized = normalized[2:]
elif normalized.startswith("."):
normalized = normalized[1:]
if lower_hostname == normalized or lower_hostname.endswith(f".{normalized}"):
return True
return False
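A few concrete inputs make the matching rules easier to see. The block below restates the function from this hunk with its indentation restored so that it runs on its own; the hostnames in the usage note are made-up examples.

```python
import os
import re
from typing import Optional


def is_host_excluded_by_no_proxy(
    hostname: str, no_proxy_value: Optional[str] = None
) -> bool:
    raw = no_proxy_value
    if raw is None:
        raw = os.environ.get("NO_PROXY") or os.environ.get("no_proxy") or ""
    raw = raw.strip()
    if not raw:
        return False
    lower_hostname = hostname.lower()
    for entry in re.split(r"[\s,]+", raw):
        normalized = entry.strip().lower()
        if not normalized:
            continue
        if normalized == "*":
            return True  # wildcard: bypass the proxy for every host
        if normalized.startswith("*."):
            normalized = normalized[2:]
        elif normalized.startswith("."):
            normalized = normalized[1:]
        # Match the apex domain itself or any subdomain of it.
        if lower_hostname == normalized or lower_hostname.endswith(f".{normalized}"):
            return True
    return False
```

With `no_proxy_value="localhost, .example.com"`, both `example.com` and `api.example.com` are excluded, while `badexample.com` is not, because the suffix check requires a leading dot.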
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
@@ -693,7 +726,15 @@ SUPPORTED_DOCUMENT_TYPES = {
".pdf": "application/pdf",
".md": "text/markdown",
".txt": "text/plain",
".csv": "text/csv",
".log": "text/plain",
".json": "application/json",
".xml": "application/xml",
".yaml": "application/yaml",
".yml": "application/yaml",
".toml": "application/toml",
".ini": "text/plain",
".cfg": "text/plain",
".zip": "application/zip",
".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
@@ -982,6 +1023,61 @@ def resolve_channel_prompt(
return None
def resolve_channel_skills(
config_extra: dict,
channel_id: str,
parent_id: str | None = None,
) -> list[str] | None:
"""Resolve auto-loaded skill(s) for a channel/thread from platform config.
Looks up ``channel_skill_bindings`` in the adapter's ``config.extra`` dict.
Config format::
channel_skill_bindings:
- id: "C0123" # Slack channel ID or Discord channel/forum ID
skills: ["skill-a", "skill-b"]
- id: "D0ABCDE"
skill: "solo-skill" # single string also accepted
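Matches each binding's ``id`` against *channel_id* and *parent_id*
(the latter lets forum threads / Slack threads inherit the parent
channel's binding). The first matching entry in list order wins, so a
parent binding listed before a channel binding takes precedence.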
Returns a deduplicated list of skill names (order preserved), or None if
no match is found.
"""
bindings = config_extra.get("channel_skill_bindings") or []
if not isinstance(bindings, list) or not bindings:
return None
ids_to_check: set[str] = set()
if channel_id:
ids_to_check.add(str(channel_id))
if parent_id:
ids_to_check.add(str(parent_id))
if not ids_to_check:
return None
for entry in bindings:
if not isinstance(entry, dict):
continue
entry_id = str(entry.get("id", ""))
if entry_id in ids_to_check:
skills = entry.get("skills") or entry.get("skill")
if isinstance(skills, str):
s = skills.strip()
return [s] if s else None
if isinstance(skills, list) and skills:
seen: list[str] = []
for name in skills:
if not isinstance(name, str):
continue
nm = name.strip()
if nm and nm not in seen:
seen.append(nm)
return seen or None
return None
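The loop above returns the first binding whose `id` equals either `channel_id` or `parent_id`, so list order decides the winner when both are bound. If strict channel-before-parent priority is wanted, one possible variant is sketched below. It is a hypothetical alternative rather than the code in this diff, and `resolve_with_priority` and `_skills_from_entry` are made-up names.

```python
from typing import Any, Dict, List, Optional


def _skills_from_entry(entry: Dict[str, Any]) -> Optional[List[str]]:
    """Normalize `skills` (list) or `skill` (string) into a deduplicated list."""
    skills = entry.get("skills") or entry.get("skill")
    if isinstance(skills, str):
        s = skills.strip()
        return [s] if s else None
    if isinstance(skills, list):
        seen: List[str] = []
        for name in skills:
            if isinstance(name, str):
                nm = name.strip()
                if nm and nm not in seen:
                    seen.append(nm)
        return seen or None
    return None


def resolve_with_priority(
    config_extra: Dict[str, Any],
    channel_id: str,
    parent_id: Optional[str] = None,
) -> Optional[List[str]]:
    bindings = config_extra.get("channel_skill_bindings") or []
    if not isinstance(bindings, list):
        return None
    # Scan once per ID so the channel's own binding is always tried first.
    for wanted in (channel_id, parent_id):
        if not wanted:
            continue
        for entry in bindings:
            if isinstance(entry, dict) and str(entry.get("id", "")) == str(wanted):
                skills = _skills_from_entry(entry)
                if skills:
                    return skills
    return None
```

The cost is a second pass over the bindings list, which is negligible for the handful of entries a config file typically holds.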
class BasePlatformAdapter(ABC):
"""
Base class for platform adapters.
@@ -1025,7 +1121,20 @@ class BasePlatformAdapter(ABC):
self._post_delivery_callbacks: Dict[str, Any] = {}
self._expected_cancelled_tasks: set[asyncio.Task] = set()
self._busy_session_handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]] = None
# Chats where auto-TTS on voice input is disabled (set by /voice off)
# Auto-TTS on voice input: ``_auto_tts_default`` is the global default
# (``voice.auto_tts`` in config.yaml, pushed by GatewayRunner on connect).
# Per-chat overrides live in two sets populated from ``_voice_mode``:
# - ``_auto_tts_enabled_chats``: chat explicitly opted in via ``/voice on``
# or ``/voice tts`` (mode is ``voice_only`` or ``all``). Fires even when
# the global default is False.
# - ``_auto_tts_disabled_chats``: chat explicitly opted out via
# ``/voice off`` (mode is ``off``). Suppresses auto-TTS even when the
# global default is True.
# The gate in _process_message() is:
# fire if chat in _auto_tts_enabled_chats
# OR (_auto_tts_default and chat not in _auto_tts_disabled_chats)
self._auto_tts_default: bool = False
self._auto_tts_enabled_chats: set = set()
self._auto_tts_disabled_chats: set = set()
# Chats where typing indicator is paused (e.g. during approval waits).
# _keep_typing skips send_typing when the chat_id is in this set.
@@ -1047,6 +1156,21 @@ class BasePlatformAdapter(ABC):
def fatal_error_retryable(self) -> bool:
return self._fatal_error_retryable
def _should_auto_tts_for_chat(self, chat_id: str) -> bool:
"""Whether auto-TTS on voice input should fire for ``chat_id``.
Decision layers (Issue #16007):
1. An explicit ``/voice on`` or ``/voice tts`` always fires (even if
``voice.auto_tts`` is False).
2. An explicit ``/voice off`` never fires.
3. Fall back to the global ``voice.auto_tts`` config default.
"""
if chat_id in self._auto_tts_enabled_chats:
return True
if chat_id in self._auto_tts_disabled_chats:
return False
return bool(self._auto_tts_default)
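The three decision layers can be modeled outside the adapter. This is a minimal sketch: `AutoTtsGate` and its `set_voice_mode` method are hypothetical, and the mode names (`voice_only`, `all`, `off`) are taken from the comments in this diff.

```python
class AutoTtsGate:
    """Hypothetical model of the per-chat auto-TTS decision."""

    def __init__(self, default: bool = False) -> None:
        self.default = default  # mirrors voice.auto_tts
        self.enabled_chats: set = set()
        self.disabled_chats: set = set()

    def set_voice_mode(self, chat_id: str, mode: str) -> None:
        # A chat is in at most one override set at a time.
        self.enabled_chats.discard(chat_id)
        self.disabled_chats.discard(chat_id)
        if mode in ("voice_only", "all"):
            self.enabled_chats.add(chat_id)
        elif mode == "off":
            self.disabled_chats.add(chat_id)

    def should_fire(self, chat_id: str) -> bool:
        if chat_id in self.enabled_chats:
            return True  # explicit opt-in beats the global default
        if chat_id in self.disabled_chats:
            return False  # explicit opt-out beats the global default
        return bool(self.default)
```

Keeping two override sets instead of one lets a chat with no explicit setting follow later changes to the global default.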
def set_fatal_error_handler(self, handler: Callable[["BasePlatformAdapter"], Awaitable[None] | None]) -> None:
self._fatal_error_handler = handler
@@ -1230,6 +1354,27 @@ class BasePlatformAdapter(ABC):
"""
return SendResult(success=False, error="Not supported")
async def delete_message(
self,
chat_id: str,
message_id: str,
) -> bool:
"""
Delete a previously sent message. Optional: platforms that don't
support deletion return ``False``, and callers fall back to leaving
the message in place.
Used by the stream consumer's fresh-final cleanup path (see
openclaw/openclaw#72038) to remove long-lived preview messages
after sending the completed reply as a fresh message so the
platform's visible timestamp reflects completion time.
Returns ``True`` on successful deletion, ``False`` otherwise.
Subclasses should override for platforms with a deletion API
(e.g. Telegram ``deleteMessage``).
"""
return False
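How a caller might use the boolean result can be sketched as follows. This assumes a hypothetical `FakeAdapter` and a hypothetical `send_fresh_final` helper; the real stream consumer is not shown in this diff, so the names and flow here are illustrative only.

```python
import asyncio


class FakeAdapter:
    """Hypothetical adapter; `can_delete` mimics whether the platform has a deletion API."""

    def __init__(self, can_delete: bool) -> None:
        self.can_delete = can_delete
        self.messages = {}
        self._next_id = 0

    async def send(self, chat_id: str, text: str) -> str:
        self._next_id += 1
        message_id = str(self._next_id)
        self.messages[message_id] = text
        return message_id

    async def delete_message(self, chat_id: str, message_id: str) -> bool:
        if not self.can_delete:
            return False  # base-class behaviour: deletion unsupported
        return self.messages.pop(message_id, None) is not None


async def send_fresh_final(adapter, chat_id: str, preview_id: str, final_text: str) -> bool:
    """Send the completed reply as a new message, then try to remove the preview."""
    await adapter.send(chat_id, final_text)
    # A False result is non-fatal: the preview simply stays in place.
    return await adapter.delete_message(chat_id, preview_id)


async def demo(can_delete: bool) -> list:
    adapter = FakeAdapter(can_delete)
    preview_id = await adapter.send("chat", "partial reply ...")
    await send_fresh_final(adapter, "chat", preview_id, "complete reply")
    return list(adapter.messages.values())


with_delete = asyncio.run(demo(True))
without_delete = asyncio.run(demo(False))
```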
async def send_typing(self, chat_id: str, metadata=None) -> None:
"""
Send a typing indicator.
@@ -2214,12 +2359,14 @@ class BasePlatformAdapter(ABC):
logger.info("[%s] extract_local_files found %d file(s) in response", self.name, len(local_files))
# Auto-TTS: if voice message, generate audio FIRST (before sending text)
# Skipped when the chat has voice mode disabled (/voice off)
# Gated via ``_should_auto_tts_for_chat``: fires when the chat has
# an explicit ``/voice on|tts`` opt-in OR when ``voice.auto_tts`` is
# True globally and no ``/voice off`` has been issued.
_tts_path = None
if (event.message_type == MessageType.VOICE
if (self._should_auto_tts_for_chat(event.source.chat_id)
and event.message_type == MessageType.VOICE
and text_content
and not media_files
and event.source.chat_id not in self._auto_tts_disabled_chats):
and not media_files):
try:
from tools.tts_tool import text_to_speech_tool, check_tts_requirements
if check_tts_requirements():
@@ -2543,6 +2690,9 @@ class BasePlatformAdapter(ABC):
user_id_alt: Optional[str] = None,
chat_id_alt: Optional[str] = None,
is_bot: bool = False,
guild_id: Optional[str] = None,
parent_chat_id: Optional[str] = None,
message_id: Optional[str] = None,
) -> SessionSource:
"""Helper to build a SessionSource for this platform."""
# Normalize empty topic to None
@@ -2560,6 +2710,9 @@ class BasePlatformAdapter(ABC):
user_id_alt=user_id_alt,
chat_id_alt=chat_id_alt,
is_bot=is_bot,
guild_id=str(guild_id) if guild_id else None,
parent_chat_id=str(parent_chat_id) if parent_chat_id else None,
message_id=str(message_id) if message_id else None,
)
@abstractmethod
+6 -20
@@ -2315,11 +2315,6 @@ class DiscordAdapter(BasePlatformAdapter):
async def slash_background(interaction: discord.Interaction, prompt: str):
await self._run_simple_slash(interaction, f"/background {prompt}", "Background task started~")
@tree.command(name="btw", description="Ephemeral side question using session context")
@discord.app_commands.describe(question="Your side question (no tools, not persisted)")
async def slash_btw(interaction: discord.Interaction, question: str):
await self._run_simple_slash(interaction, f"/btw {question}")
# ── Auto-register any gateway-available commands not yet on the tree ──
# This ensures new commands added to COMMAND_REGISTRY in
# hermes_cli/commands.py automatically appear as Discord slash
@@ -2684,21 +2679,8 @@ class DiscordAdapter(BasePlatformAdapter):
skills: ["skill-a", "skill-b"]
Also checks parent_id so forum threads inherit the forum's bindings.
"""
bindings = self.config.extra.get("channel_skill_bindings", [])
if not bindings:
return None
ids_to_check = {channel_id}
if parent_id:
ids_to_check.add(parent_id)
for entry in bindings:
entry_id = str(entry.get("id", ""))
if entry_id in ids_to_check:
skills = entry.get("skills") or entry.get("skill")
if isinstance(skills, str):
return [skills]
if isinstance(skills, list) and skills:
return list(dict.fromkeys(skills)) # dedup, preserve order
return None
from gateway.platforms.base import resolve_channel_skills
return resolve_channel_skills(self.config.extra, channel_id, parent_id)
def _resolve_channel_prompt(self, channel_id: str, parent_id: str | None = None) -> str | None:
"""Resolve a Discord per-channel prompt, preferring the exact channel over its parent."""
@@ -3261,6 +3243,7 @@ class DiscordAdapter(BasePlatformAdapter):
if auto_thread and not skip_thread and not is_voice_linked_channel and not is_reply_message:
thread = await self._auto_create_thread(message)
if thread:
parent_channel_id = str(message.channel.id)
is_thread = True
thread_id = str(thread.id)
auto_threaded_channel = thread
@@ -3320,6 +3303,9 @@ class DiscordAdapter(BasePlatformAdapter):
thread_id=thread_id,
chat_topic=chat_topic,
is_bot=getattr(message.author, "bot", False),
guild_id=str(message.guild.id) if message.guild else None,
parent_chat_id=parent_channel_id,
message_id=str(message.id),
)
# Build media URLs -- download image attachments to local cache so the
+9
@@ -57,6 +57,15 @@ class MessageDeduplicator:
if len(self._seen) > self._max_size:
cutoff = now - self._ttl
self._seen = {k: v for k, v in self._seen.items() if v > cutoff}
if len(self._seen) > self._max_size:
# TTL pruning alone does not cap the cache when every entry is
# still fresh. Keep the newest entries so the helper's
# max_size bound is enforced under sustained traffic.
newest = sorted(
self._seen.items(),
key=lambda item: item[1],
)[-self._max_size:]
self._seen = dict(newest)
return False
def clear(self):
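The two-stage pruning in this hunk (TTL first, then a hard cap on the newest entries) can be sketched in a self-contained form. The class name `BoundedSeenCache`, the `is_duplicate` method, and its lookup logic are assumptions for illustration; only the pruning block mirrors the diff.

```python
import time


class BoundedSeenCache:
    """Hypothetical stand-in for the dedup cache: TTL pruning plus a hard size cap."""

    def __init__(self, max_size: int, ttl: float) -> None:
        self._max_size = max_size
        self._ttl = ttl
        self._seen = {}

    def is_duplicate(self, msg_id: str, now=None) -> bool:
        now = time.time() if now is None else now
        seen_at = self._seen.get(msg_id)
        if seen_at is not None and now - seen_at < self._ttl:
            return True
        self._seen[msg_id] = now
        if len(self._seen) > self._max_size:
            # Stage 1: drop entries older than the TTL.
            cutoff = now - self._ttl
            self._seen = {k: v for k, v in self._seen.items() if v > cutoff}
            if len(self._seen) > self._max_size:
                # Stage 2: everything is still fresh, so keep only the newest entries.
                newest = sorted(self._seen.items(), key=lambda item: item[1])[-self._max_size:]
                self._seen = dict(newest)
        return False


cache = BoundedSeenCache(max_size=3, ttl=300.0)
for i in range(5):
    cache.is_duplicate(f"m{i}", now=float(i))

remaining = sorted(cache._seen)
repeat = cache.is_duplicate("m4", now=5.0)
```

Without stage 2, five fresh entries would all survive the TTL filter and the cache would exceed `max_size`.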
File diff suppressed because it is too large
+25
@@ -1209,6 +1209,31 @@ class TelegramAdapter(BasePlatformAdapter):
)
return SendResult(success=False, error=str(e))
async def delete_message(self, chat_id: str, message_id: str) -> bool:
"""Delete a previously sent Telegram message.
Used by the stream consumer's fresh-final cleanup path (ported
from openclaw/openclaw#72038) to remove long-lived preview
messages after sending the completed reply as a fresh message.
Telegram's Bot API ``deleteMessage`` works for bot-posted
messages in the last 48 hours. Failures are non-fatal: the
caller leaves the preview in place and logs at debug level.
"""
if not self._bot:
return False
try:
await self._bot.delete_message(
chat_id=int(chat_id),
message_id=int(message_id),
)
return True
except Exception as e:
logger.debug(
"[%s] Failed to delete Telegram message %s: %s",
self.name, message_id, e,
)
return False
async def send_update_prompt(
self, chat_id: str, prompt: str, default: str = "",
session_key: str = "",
File diff suppressed because it is too large
+647
@@ -0,0 +1,647 @@
"""
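yuanbao_media.py: media handling module for the Yuanbao platform.
Provides COS upload, file download, and TIM media message construction.
Ported from the TypeScript media.ts (yuanbao-openclaw-plugin).
Uses httpx instead of cos-nodejs-sdk-v5 to avoid pulling in an extra SDK dependency.
COS upload flow:
1. Call genUploadInfo to obtain temporary credentials (tmpSecretId/tmpSecretKey/sessionToken).
2. Use the temporary credentials to build the Authorization header with an HMAC-SHA1 signature.
3. Upload to COS with HTTP PUT.
TIM message body construction:
- buildImageMsgBody() -> TIMImageElem
- buildFileMsgBody() -> TIMFileElem
"""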
from __future__ import annotations
import hashlib
import hmac
import logging
import os
import re
import secrets
import struct
import time
import urllib.parse
from datetime import datetime, timezone, timedelta
from typing import Optional, Any
import httpx
logger = logging.getLogger(__name__)
# ============ Constants ============
UPLOAD_INFO_PATH = "/api/resource/genUploadInfo"
DEFAULT_API_DOMAIN = "yuanbao.tencent.com"
DEFAULT_MAX_SIZE_MB = 50
# COS accelerate domain suffix (prefer global acceleration)
COS_USE_ACCELERATE = True
# ============ Type mappings ============
# MIME → image_format number (TIM protocol field)
_MIME_TO_IMAGE_FORMAT: dict[str, int] = {
"image/jpeg": 1,
"image/jpg": 1,
"image/gif": 2,
"image/png": 3,
"image/bmp": 4,
"image/webp": 255,
"image/heic": 255,
"image/tiff": 255,
}
# File extension → MIME
_EXT_TO_MIME: dict[str, str] = {
".jpg": "image/jpeg",
".jpeg": "image/jpeg",
".png": "image/png",
".gif": "image/gif",
".webp": "image/webp",
".bmp": "image/bmp",
".heic": "image/heic",
".tiff": "image/tiff",
".ico": "image/x-icon",
".pdf": "application/pdf",
".doc": "application/msword",
".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
".xls": "application/vnd.ms-excel",
".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
".ppt": "application/vnd.ms-powerpoint",
".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
".txt": "text/plain",
".zip": "application/zip",
".tar": "application/x-tar",
".gz": "application/gzip",
".mp3": "audio/mpeg",
".mp4": "video/mp4",
".wav": "audio/wav",
".ogg": "audio/ogg",
".webm": "video/webm",
}
# ============ Utility functions ============
def guess_mime_type(filename: str) -> str:
"""Guess the MIME type from the file extension."""
ext = os.path.splitext(filename)[-1].lower()
return _EXT_TO_MIME.get(ext, "application/octet-stream")
def is_image(filename: str, mime_type: str = "") -> bool:
"""Return whether the file is an image type."""
if mime_type.startswith("image/"):
return True
ext = os.path.splitext(filename)[-1].lower()
return ext in {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".heic", ".tiff", ".ico"}
def get_image_format(mime_type: str) -> int:
"""Return the TIM image format number."""
return _MIME_TO_IMAGE_FORMAT.get(mime_type.lower(), 255)
def md5_hex(data: bytes) -> str:
"""Compute the MD5 hex digest."""
return hashlib.md5(data).hexdigest()
def generate_file_id() -> str:
"""Generate a random file ID (32 hex characters)."""
return secrets.token_hex(16)
# ============ Image size parsing (pure Python, no Pillow required) ============
def parse_image_size(data: bytes) -> Optional[dict[str, int]]:
"""
Parse image width and height. Supports JPEG/PNG/GIF/WebP with no third-party dependencies.
Returns {"width": w, "height": h}, or None if the format cannot be recognized.
"""
return (
_parse_png_size(data)
or _parse_jpeg_size(data)
or _parse_gif_size(data)
or _parse_webp_size(data)
)
def _parse_png_size(buf: bytes) -> Optional[dict[str, int]]:
if len(buf) < 24:
return None
if buf[:4] != b"\x89PNG":
return None
w = struct.unpack(">I", buf[16:20])[0]
h = struct.unpack(">I", buf[20:24])[0]
return {"width": w, "height": h}
def _parse_jpeg_size(buf: bytes) -> Optional[dict[str, int]]:
if len(buf) < 4 or buf[0] != 0xFF or buf[1] != 0xD8:
return None
i = 2
while i < len(buf) - 9:
if buf[i] != 0xFF:
i += 1
continue
marker = buf[i + 1]
if marker in (0xC0, 0xC2):
h = struct.unpack(">H", buf[i + 5: i + 7])[0]
w = struct.unpack(">H", buf[i + 7: i + 9])[0]
return {"width": w, "height": h}
if i + 3 < len(buf):
i += 2 + struct.unpack(">H", buf[i + 2: i + 4])[0]
else:
break
return None
def _parse_gif_size(buf: bytes) -> Optional[dict[str, int]]:
if len(buf) < 10:
return None
sig = buf[:6].decode("ascii", errors="replace")
if sig not in ("GIF87a", "GIF89a"):
return None
w = struct.unpack("<H", buf[6:8])[0]
h = struct.unpack("<H", buf[8:10])[0]
return {"width": w, "height": h}
def _parse_webp_size(buf: bytes) -> Optional[dict[str, int]]:
if len(buf) < 16:
return None
if buf[:4] != b"RIFF" or buf[8:12] != b"WEBP":
return None
chunk = buf[12:16].decode("ascii", errors="replace")
if chunk == "VP8 ":
if len(buf) >= 30 and buf[23] == 0x9D and buf[24] == 0x01 and buf[25] == 0x2A:
w = struct.unpack("<H", buf[26:28])[0] & 0x3FFF
h = struct.unpack("<H", buf[28:30])[0] & 0x3FFF
return {"width": w, "height": h}
elif chunk == "VP8L":
if len(buf) >= 25 and buf[20] == 0x2F:
bits = struct.unpack("<I", buf[21:25])[0]
w = (bits & 0x3FFF) + 1
h = ((bits >> 14) & 0x3FFF) + 1
return {"width": w, "height": h}
elif chunk == "VP8X":
if len(buf) >= 30:
w = (buf[24] | (buf[25] << 8) | (buf[26] << 16)) + 1
h = (buf[27] | (buf[28] << 8) | (buf[29] << 16)) + 1
return {"width": w, "height": h}
return None
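The header-offset approach used by the parsers above can be checked against hand-built headers. This is a minimal sketch with hypothetical helper names (`png_size`, `gif_size`) that re-implements only the PNG and GIF cases; the byte strings are synthetic headers, not complete image files.

```python
import struct


def png_size(buf: bytes):
    """Read width/height from the IHDR chunk, which starts at byte 16 of a PNG."""
    if len(buf) < 24 or buf[:4] != b"\x89PNG":
        return None
    width, height = struct.unpack(">II", buf[16:24])  # big-endian 32-bit values
    return {"width": width, "height": height}


def gif_size(buf: bytes):
    """Read the logical screen size that follows the 6-byte GIF signature."""
    if len(buf) < 10 or buf[:6] not in (b"GIF87a", b"GIF89a"):
        return None
    width, height = struct.unpack("<HH", buf[6:10])  # little-endian 16-bit values
    return {"width": width, "height": height}


# 8-byte signature + 4-byte chunk length + b"IHDR" + width + height = 24 bytes.
png_header = (
    b"\x89PNG\r\n\x1a\n"
    + struct.pack(">I", 13)
    + b"IHDR"
    + struct.pack(">II", 640, 480)
)
gif_header = b"GIF89a" + struct.pack("<HH", 320, 200)

png_result = png_size(png_header)
gif_result = gif_size(gif_header)
unknown_result = png_size(b"not an image") or gif_size(b"not an image")
```

Note the differing byte order: PNG stores dimensions big-endian, while GIF stores them little-endian.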
# ============ URL download ============
async def download_url(
url: str,
max_size_mb: int = DEFAULT_MAX_SIZE_MB,
) -> tuple[bytes, str]:
"""
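Download the content at a URL and return (bytes, content_type).
Args:
url: HTTP(S) URL
max_size_mb: Maximum allowed size in MB; an exception is raised if it is exceeded
Returns:
(data_bytes, content_type_string)
Raises:
ValueError: The content exceeds the size limit
httpx.HTTPError: Network/HTTP error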
"""
max_bytes = max_size_mb * 1024 * 1024
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
# Issue a HEAD request first to check the size
try:
head = await client.head(url)
content_length = int(head.headers.get("content-length", 0) or 0)
if content_length > 0 and content_length > max_bytes:
raise ValueError(
f"文件过大: {content_length / 1024 / 1024:.1f} MB > {max_size_mb} MB"
)
except httpx.HTTPError:
pass # Some servers do not support HEAD; ignore and fall through to GET
# GET download (streamed, so the size limit is enforced while reading)
async with client.stream("GET", url) as resp:
resp.raise_for_status()
content_type = resp.headers.get("content-type", "").split(";")[0].strip()
chunks: list[bytes] = []
downloaded = 0
async for chunk in resp.aiter_bytes(65536):
downloaded += len(chunk)
if downloaded > max_bytes:
raise ValueError(
f"文件过大: 已超过 {max_size_mb} MB 限制"
)
chunks.append(chunk)
data = b"".join(chunks)
return data, content_type
# ============ COS authentication (HMAC-SHA1) ============
def _cos_sign(
method: str,
path: str,
params: dict[str, str],
headers: dict[str, str],
secret_id: str,
secret_key: str,
start_time: Optional[int] = None,
expire_seconds: int = 3600,
) -> str:
"""
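Build the COS request signature (q-sign-algorithm=sha1 scheme).
Reference: https://cloud.tencent.com/document/product/436/7778
Args:
method: HTTP method in lowercase, e.g. "put"
path: URL path (URL-encoded, lowercase)
params: dict of URL query parameters (used for signing)
headers: dict of request headers included in the signature; keys must be lowercase
secret_id: Temporary SecretId (tmpSecretId)
secret_key: Temporary SecretKey (tmpSecretKey)
start_time: Signature start Unix timestamp (defaults to now)
expire_seconds: Signature validity in seconds (defaults to 3600)
Returns:
The complete Authorization header string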
"""
now = int(time.time())
q_sign_time = f"{start_time or now};{(start_time or now) + expire_seconds}"
# Step 1: SignKey = HMAC-SHA1(SecretKey, q-sign-time)
sign_key = hmac.new(
secret_key.encode("utf-8"),
q_sign_time.encode("utf-8"),
hashlib.sha1,
).hexdigest()
# Step 2: HttpString
# Parameters and headers must be sorted lexicographically, with lowercase keys
sorted_params = sorted((k.lower(), urllib.parse.quote(str(v), safe="") ) for k, v in params.items())
sorted_headers = sorted((k.lower(), urllib.parse.quote(str(v), safe="") ) for k, v in headers.items())
url_param_list = ";".join(k for k, _ in sorted_params)
url_params = "&".join(f"{k}={v}" for k, v in sorted_params)
header_list = ";".join(k for k, _ in sorted_headers)
header_str = "&".join(f"{k}={v}" for k, v in sorted_headers)
http_string = "\n".join([
method.lower(),
path,
url_params,
header_str,
"",
])
# Step 3: StringToSign = sha1 hash of HttpString
sha1_of_http = hashlib.sha1(http_string.encode("utf-8")).hexdigest()
string_to_sign = "\n".join([
"sha1",
q_sign_time,
sha1_of_http,
"",
])
# Step 4: Signature = HMAC-SHA1(SignKey, StringToSign)
signature = hmac.new(
sign_key.encode("utf-8"),
string_to_sign.encode("utf-8"),
hashlib.sha1,
).hexdigest()
return (
f"q-sign-algorithm=sha1"
f"&q-ak={secret_id}"
f"&q-sign-time={q_sign_time}"
f"&q-key-time={q_sign_time}"
f"&q-header-list={header_list}"
f"&q-url-param-list={url_param_list}"
f"&q-signature={signature}"
)
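The four signing steps can be condensed into a standalone sketch. It assumes a hypothetical `cos_authorization` helper that signs headers only (no query parameters) and uses made-up credentials; it mirrors the structure of `_cos_sign` but is not a drop-in replacement.

```python
import hashlib
import hmac
import urllib.parse


def cos_authorization(method, path, headers, secret_id, secret_key, start, expire_seconds=3600):
    """Hypothetical condensed version of the q-sign-algorithm=sha1 flow."""
    key_time = f"{start};{start + expire_seconds}"
    # Step 1: SignKey = HMAC-SHA1(SecretKey, KeyTime), hex-encoded.
    sign_key = hmac.new(secret_key.encode(), key_time.encode(), hashlib.sha1).hexdigest()
    # Step 2: HttpString from sorted, lowercased, URL-encoded headers.
    pairs = sorted((k.lower(), urllib.parse.quote(str(v), safe="")) for k, v in headers.items())
    header_list = ";".join(k for k, _ in pairs)
    header_str = "&".join(f"{k}={v}" for k, v in pairs)
    http_string = "\n".join([method.lower(), path, "", header_str, ""])
    # Step 3: StringToSign embeds the SHA-1 of HttpString.
    digest = hashlib.sha1(http_string.encode()).hexdigest()
    string_to_sign = "\n".join(["sha1", key_time, digest, ""])
    # Step 4: Signature = HMAC-SHA1(SignKey, StringToSign).
    signature = hmac.new(sign_key.encode(), string_to_sign.encode(), hashlib.sha1).hexdigest()
    return (
        f"q-sign-algorithm=sha1&q-ak={secret_id}"
        f"&q-sign-time={key_time}&q-key-time={key_time}"
        f"&q-header-list={header_list}&q-url-param-list="
        f"&q-signature={signature}"
    )


auth = cos_authorization(
    "put",
    "/uploads/a.png",
    {"Host": "bucket.cos.accelerate.myqcloud.com", "Content-Type": "image/png"},
    secret_id="AKIDEXAMPLE",      # made-up credential for illustration
    secret_key="example-secret",  # made-up credential for illustration
    start=1700000000,
)
fields = dict(part.split("=", 1) for part in auth.split("&"))
```

Passing an explicit `start` keeps the output deterministic, which makes the signature easy to compare against a reference implementation.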
# ============ Main public API ============
async def get_cos_credentials(
app_key: str,
api_domain: str,
token: str,
filename: str = "file",
file_id: Optional[str] = None,
bot_id: str = "",
route_env: str = "",
) -> dict:
"""
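Call the genUploadInfo endpoint to obtain temporary COS credentials and upload configuration.
Args:
app_key: Application key (used for X-ID when bot_id is empty)
api_domain: API domain, e.g. https://bot.yuanbao.tencent.com
token: Currently valid signed ticket token (X-Token)
filename: Name of the file to upload, including extension
file_id: Client-generated unique file ID (auto-generated if omitted)
bot_id: Bot account ID (used for X-ID)
route_env: Optional value sent as the X-Route-Env header
Returns:
COS upload configuration dict with the following fields:
bucketName (str): COS bucket name
region (str): COS region
location (str): Upload key (object path)
encryptTmpSecretId (str): Temporary SecretId
encryptTmpSecretKey (str): Temporary SecretKey
encryptToken (str): SessionToken
startTime (int): Credential start timestamp (Unix)
expiredTime (int): Credential expiry timestamp (Unix)
resourceUrl (str): Public URL after upload
resourceID (str): Resource ID (optional)
Raises:
RuntimeError: The endpoint returned a non-zero code or required fields are missing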
"""
if file_id is None:
file_id = generate_file_id()
upload_url = f"{api_domain.rstrip('/')}{UPLOAD_INFO_PATH}"
headers = {
"Content-Type": "application/json",
"X-Token": token,
"X-ID": bot_id or app_key,
"X-Source": "web",
}
if route_env:
headers["X-Route-Env"] = route_env
body = {
"fileName": filename,
"fileId": file_id,
"docFrom": "localDoc",
"docOpenId": "",
}
async with httpx.AsyncClient(timeout=15.0) as client:
resp = await client.post(upload_url, json=body, headers=headers)
resp.raise_for_status()
result: dict[str, Any] = resp.json()
code = result.get("code")
if code != 0 and code is not None:
raise RuntimeError(
f"genUploadInfo 失败: code={code}, msg={result.get('msg', '')}"
)
data = result.get("data") or result
required_fields = ["bucketName", "location"]
missing = [f for f in required_fields if not data.get(f)]
if missing:
raise RuntimeError(
f"genUploadInfo 返回字段不完整: 缺少字段 {missing}"
)
return data
async def upload_to_cos(
file_bytes: bytes,
filename: str,
content_type: str,
credentials: dict,
bucket: str,
region: str,
) -> dict:
"""
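Upload a file to COS with an httpx PUT request.
Uses the temporary credentials (tmpSecretId/tmpSecretKey/sessionToken) to build the HMAC-SHA1 signature.
Args:
file_bytes: Binary file content
filename: File name (used to help determine the MIME type/UUID)
content_type: MIME type, e.g. "image/jpeg"
credentials: dict returned by get_cos_credentials(), containing:
encryptTmpSecretId: tmpSecretId
encryptTmpSecretKey: tmpSecretKey
encryptToken: sessionToken
location: COS key (object path)
resourceUrl: Public URL after upload
startTime: Credential start time (Unix)
expiredTime: Credential expiry time (Unix)
bucket: COS bucket name, e.g. chatbot-1234567890
region: COS region, e.g. ap-guangzhou
Returns:
Upload result dict containing:
url (str): Public COS URL
uuid (str): MD5 of the file content
size (int): File size in bytes
width (int, optional): Image width (images only)
height (int, optional): Image height (images only)
Raises:
httpx.HTTPStatusError: COS returned a non-2xx status
RuntimeError: Required credentials fields are missing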
"""
secret_id: str = credentials.get("encryptTmpSecretId", "")
secret_key: str = credentials.get("encryptTmpSecretKey", "")
session_token: str = credentials.get("encryptToken", "")
cos_key: str = credentials.get("location", "")
resource_url: str = credentials.get("resourceUrl", "")
start_time: Optional[int] = credentials.get("startTime")
expired_time: Optional[int] = credentials.get("expiredTime")
if not secret_id or not secret_key or not cos_key:
raise RuntimeError(
f"COS credentials 不完整: secretId={bool(secret_id)}, "
f"secretKey={bool(secret_key)}, location={bool(cos_key)}"
)
# Build the COS upload URL (prefer the global acceleration domain)
if COS_USE_ACCELERATE:
cos_host = f"{bucket}.cos.accelerate.myqcloud.com"
else:
cos_host = f"{bucket}.cos.{region}.myqcloud.com"
# URL-encode cos_key (keep /)
encoded_key = urllib.parse.quote(cos_key, safe="/")
cos_url = f"https://{cos_host}/{encoded_key.lstrip('/')}"
# Determine the Content-Type
if not content_type or content_type == "application/octet-stream":
if is_image(filename):
content_type = guess_mime_type(filename)
else:
content_type = "application/octet-stream"
# Compute the file MD5 and size
file_uuid = md5_hex(file_bytes)
file_size = len(file_bytes)
# Request headers included in the signature
sign_headers = {
"host": cos_host,
"content-type": content_type,
"x-cos-security-token": session_token,
}
# Compute the signature validity window
now = int(time.time())
sign_start = start_time if start_time else now
sign_expire = (expired_time - now) if expired_time and expired_time > now else 3600
authorization = _cos_sign(
method="put",
path=f"/{encoded_key.lstrip('/')}",
params={},
headers=sign_headers,
secret_id=secret_id,
secret_key=secret_key,
start_time=sign_start,
expire_seconds=sign_expire,
)
put_headers = {
"Authorization": authorization,
"Content-Type": content_type,
"x-cos-security-token": session_token,
}
logger.info(
"COS PUT: bucket=%s region=%s key=%s size=%d mime=%s",
bucket, region, cos_key, file_size, content_type,
)
async with httpx.AsyncClient(timeout=120.0) as client:
resp = await client.put(
cos_url,
content=file_bytes,
headers=put_headers,
)
resp.raise_for_status()
# Parse image dimensions (image types only)
result: dict[str, Any] = {
"url": resource_url or cos_url,
"uuid": file_uuid,
"size": file_size,
}
if content_type.startswith("image/"):
size_info = parse_image_size(file_bytes)
if size_info:
result["width"] = size_info["width"]
result["height"] = size_info["height"]
logger.info(
"COS 上传成功: url=%s size=%d",
result["url"], file_size,
)
return result
# ============ TIM media message construction ============
def build_image_msg_body(
url: str,
uuid: Optional[str] = None,
filename: Optional[str] = None,
size: int = 0,
width: int = 0,
height: int = 0,
mime_type: str = "",
) -> list[dict]:
"""
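Build a Tencent IM TIMImageElem message body.
Reference: https://cloud.tencent.com/document/product/269/2720
Args:
url: Public image URL (COS resourceUrl)
uuid: File UUID (MD5 or another unique identifier)
filename: File name (fallback when uuid is empty)
size: File size in bytes
width: Image width in pixels
height: Image height in pixels
mime_type: MIME type (used to determine image_format)
Returns:
List containing the TIMImageElem message body, suitable for direct use as msg_body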
"""
_uuid = uuid or filename or _basename_from_url(url) or "image"
image_format = get_image_format(mime_type) if mime_type else 255
return [
{
"msg_type": "TIMImageElem",
"msg_content": {
"uuid": _uuid,
"image_format": image_format,
"image_info_array": [
{
"type": 1, # 1 = original image
"size": size,
"width": width,
"height": height,
"url": url,
}
],
},
}
]
def build_file_msg_body(
url: str,
filename: str,
uuid: Optional[str] = None,
size: int = 0,
) -> list[dict]:
"""
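Build a Tencent IM TIMFileElem message body.
Reference: https://cloud.tencent.com/document/product/269/2720
Args:
url: Public file URL (COS resourceUrl)
filename: File name, including extension
uuid: File UUID (MD5 or another unique identifier); falls back to filename if omitted
size: File size in bytes
Returns:
List containing the TIMFileElem message body, suitable for direct use as msg_body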
"""
_uuid = uuid or filename
return [
{
"msg_type": "TIMFileElem",
"msg_content": {
"uuid": _uuid,
"file_name": filename,
"file_size": size,
"url": url,
},
}
]
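The two builders produce differently shaped payloads, and a caller has to pick one based on the MIME type. The following sketch assumes a hypothetical `build_media_msg_body` dispatcher and a trimmed-down format table; the URL and UUID are placeholder values, and the real adapter's dispatch logic is not shown in this diff.

```python
# Hypothetical subset of the MIME -> TIM image_format table.
IMAGE_FORMATS = {"image/jpeg": 1, "image/gif": 2, "image/png": 3, "image/bmp": 4}


def build_media_msg_body(upload: dict, filename: str, mime_type: str) -> list:
    """Hypothetical dispatcher: images become TIMImageElem, everything else TIMFileElem."""
    if mime_type.startswith("image/"):
        content = {
            "uuid": upload["uuid"],
            # Unknown image types fall back to 255, as in get_image_format().
            "image_format": IMAGE_FORMATS.get(mime_type.lower(), 255),
            "image_info_array": [{
                "type": 1,  # 1 = original image
                "size": upload["size"],
                "width": upload.get("width", 0),
                "height": upload.get("height", 0),
                "url": upload["url"],
            }],
        }
        return [{"msg_type": "TIMImageElem", "msg_content": content}]
    content = {
        "uuid": upload["uuid"],
        "file_name": filename,
        "file_size": upload["size"],
        "url": upload["url"],
    }
    return [{"msg_type": "TIMFileElem", "msg_content": content}]


# Placeholder upload result in the shape returned by upload_to_cos().
upload = {"url": "https://example.invalid/a.png", "uuid": "d41d8cd9",
          "size": 1024, "width": 640, "height": 480}
image_body = build_media_msg_body(upload, "a.png", "image/png")
file_body = build_media_msg_body(upload, "a.pdf", "application/pdf")
```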
# ============ Internal helpers ============
def _basename_from_url(url: str) -> str:
"""Extract the file name from a URL."""
try:
parsed = urllib.parse.urlparse(url)
return os.path.basename(parsed.path)
except Exception:
return ""
File diff suppressed because it is too large
+558
@@ -0,0 +1,558 @@
"""
Yuanbao sticker (TIMFaceElem) support.
Ported from yuanbao-openclaw-plugin/src/sticker/.
TIMFaceElem wire format:
{
"msg_type": "TIMFaceElem",
"msg_content": {
"index": 0, # always 0 per Yuanbao convention
"data": "<json>", # serialised sticker metadata
}
}
The `data` field carries a JSON string with the sticker's metadata so the
receiver can look up the correct asset in the emoji pack.
"""
from __future__ import annotations
import json
import random
import re
import unicodedata
from typing import Optional
# ---------------------------------------------------------------------------
# Sticker catalogue ported from builtin-stickers.json
# Key : canonical name (Chinese)
# Value : {sticker_id, package_id, name, description, width, height, formats}
# ---------------------------------------------------------------------------
STICKER_MAP: dict[str, dict] = {
"六六六": {
"sticker_id": "278", "package_id": "1003", "name": "六六六",
"description": "666 厉害 牛 棒 绝了 好强 awesome",
"width": 128, "height": 128, "formats": "png",
},
"我想开了": {
"sticker_id": "262", "package_id": "1003", "name": "我想开了",
"description": "想开 佛系 释怀 顿悟 看淡了 无所谓",
"width": 128, "height": 128, "formats": "png",
},
"害羞": {
"sticker_id": "130", "package_id": "1003", "name": "害羞",
"description": "腼腆 不好意思 脸红 娇羞 羞涩 捂脸",
"width": 128, "height": 128, "formats": "png",
},
"比心": {
"sticker_id": "252", "package_id": "1003", "name": "比心",
"description": "笔芯 爱你 爱心手势 love heart 喜欢你",
"width": 128, "height": 128, "formats": "png",
},
"委屈": {
"sticker_id": "125", "package_id": "1003", "name": "委屈",
"description": "难过 想哭 可怜巴巴 瘪嘴 受伤 被欺负",
"width": 128, "height": 128, "formats": "png",
},
"亲亲": {
"sticker_id": "146", "package_id": "1003", "name": "亲亲",
"description": "么么 mua 亲一下 kiss 飞吻 啵",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "131", "package_id": "1003", "name": "",
"description": "帅 墨镜 cool 高冷 有型 swagger",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "145", "package_id": "1003", "name": "",
"description": "睡觉 困 zzZ 打盹 躺平 休眠 sleepy",
"width": 128, "height": 128, "formats": "png",
},
"发呆": {
"sticker_id": "152", "package_id": "1003", "name": "发呆",
"description": "懵 愣住 放空 呆滞 出神 脑子空白",
"width": 128, "height": 128, "formats": "png",
},
"可怜": {
"sticker_id": "157", "package_id": "1003", "name": "可怜",
"description": "卖萌 求饶 委屈巴巴 弱小 拜托 眼巴巴",
"width": 128, "height": 128, "formats": "png",
},
"摊手": {
"sticker_id": "200", "package_id": "1003", "name": "摊手",
"description": "无奈 没办法 耸肩 随便 那咋整 whatever",
"width": 128, "height": 128, "formats": "png",
},
"头大": {
"sticker_id": "213", "package_id": "1003", "name": "头大",
"description": "头疼 烦恼 郁闷 难搞 崩溃 一团乱",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "256", "package_id": "1003", "name": "",
"description": "害怕 惊恐 震惊 吓一跳 恐怖 怂",
"width": 128, "height": 128, "formats": "png",
},
"吐血": {
"sticker_id": "203", "package_id": "1003", "name": "吐血",
"description": "无语 崩溃 被雷 内伤 一口老血 屮",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "185", "package_id": "1003", "name": "",
"description": "傲娇 生气 不满 撇嘴 不理 赌气",
"width": 128, "height": 128, "formats": "png",
},
"嘿嘿": {
"sticker_id": "220", "package_id": "1003", "name": "嘿嘿",
"description": "坏笑 猥琐笑 偷笑 憨笑 得意 你懂的",
"width": 128, "height": 128, "formats": "png",
},
"头秃": {
"sticker_id": "218", "package_id": "1003", "name": "头秃",
"description": "程序员 加班 焦虑 没头发 秃了 肝爆",
"width": 128, "height": 128, "formats": "png",
},
"暗中观察": {
"sticker_id": "221", "package_id": "1003", "name": "暗中观察",
"description": "窥屏 潜水 偷偷看 角落 围观 屏住呼吸",
"width": 128, "height": 128, "formats": "png",
},
"我酸了": {
"sticker_id": "224", "package_id": "1003", "name": "我酸了",
"description": "嫉妒 柠檬精 羡慕 吃柠檬 眼红 恰柠檬",
"width": 128, "height": 128, "formats": "png",
},
"打call": {
"sticker_id": "246", "package_id": "1003", "name": "打call",
"description": "应援 加油 支持 喝彩 助威 call",
"width": 128, "height": 128, "formats": "png",
},
"庆祝": {
"sticker_id": "251", "package_id": "1003", "name": "庆祝",
"description": "祝贺 开心 耶 party 胜利 干杯",
"width": 128, "height": 128, "formats": "png",
},
"奋斗": {
"sticker_id": "151", "package_id": "1003", "name": "奋斗",
"description": "努力 加油 拼搏 冲 干劲 卷起来",
"width": 128, "height": 128, "formats": "png",
},
"惊讶": {
"sticker_id": "143", "package_id": "1003", "name": "惊讶",
"description": "震惊 哇 不敢相信 OMG 居然 这么离谱",
"width": 128, "height": 128, "formats": "png",
},
"疑问": {
"sticker_id": "144", "package_id": "1003", "name": "疑问",
"description": "问号 不懂 啥 为什么 啥情况 懵逼问",
"width": 128, "height": 128, "formats": "png",
},
"仔细分析": {
"sticker_id": "248", "package_id": "1003", "name": "仔细分析",
"description": "思考 推敲 认真 研究 琢磨 让我想想",
"width": 128, "height": 128, "formats": "png",
},
"撅嘴": {
"sticker_id": "184", "package_id": "1003", "name": "撅嘴",
"description": "嘟嘴 卖萌 不高兴 撒娇 嘴翘",
"width": 128, "height": 128, "formats": "png",
},
"泪奔": {
"sticker_id": "199", "package_id": "1003", "name": "泪奔",
"description": "大哭 伤心 破防 感动哭 泪流满面 呜呜",
"width": 128, "height": 128, "formats": "png",
},
"尊嘟假嘟": {
"sticker_id": "276", "package_id": "1003", "name": "尊嘟假嘟",
"description": "真的假的 真假 可爱问 你骗我 是不是",
"width": 128, "height": 128, "formats": "png",
},
"略略略": {
"sticker_id": "113", "package_id": "1003", "name": "略略略",
"description": "调皮 吐舌 不服 略 气死你 鬼脸",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "180", "package_id": "1003", "name": "",
"description": "想睡 倦 打哈欠 睁不开眼 好困啊 sleepy",
"width": 128, "height": 128, "formats": "png",
},
"折磨": {
"sticker_id": "181", "package_id": "1003", "name": "折磨",
"description": "难受 痛苦 煎熬 蚌埠住了 受不了 要命",
"width": 128, "height": 128, "formats": "png",
},
"抠鼻": {
"sticker_id": "182", "package_id": "1003", "name": "抠鼻",
"description": "不屑 无聊 淡定 无所谓 鄙视 挖鼻",
"width": 128, "height": 128, "formats": "png",
},
"鼓掌": {
"sticker_id": "183", "package_id": "1003", "name": "鼓掌",
"description": "拍手 叫好 赞同 666 喝彩 掌声",
"width": 128, "height": 128, "formats": "png",
},
"斜眼笑": {
"sticker_id": "204", "package_id": "1003", "name": "斜眼笑",
"description": "滑稽 坏笑 doge 意味深长 阴阳怪气 嘿嘿嘿",
"width": 128, "height": 128, "formats": "png",
},
"辣眼睛": {
"sticker_id": "216", "package_id": "1003", "name": "辣眼睛",
"description": "看不下去 cringe 毁三观 太丑了 瞎了",
"width": 128, "height": 128, "formats": "png",
},
"哦哟": {
"sticker_id": "217", "package_id": "1003", "name": "哦哟",
"description": "惊讶 起哄 哇哦 有戏 不简单 哟",
"width": 128, "height": 128, "formats": "png",
},
"吃瓜": {
"sticker_id": "222", "package_id": "1003", "name": "吃瓜",
"description": "围观 看戏 八卦 路人 看热闹 板凳",
"width": 128, "height": 128, "formats": "png",
},
"狗头": {
"sticker_id": "225", "package_id": "1003", "name": "狗头",
"description": "doge 保命 开玩笑 滑稽 反讽 懂的都懂",
"width": 128, "height": 128, "formats": "png",
},
"敬礼": {
"sticker_id": "227", "package_id": "1003", "name": "敬礼",
"description": "salute 尊重 收到 遵命 致敬 报告",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "231", "package_id": "1003", "name": "",
"description": "知道了 明白 敷衍 嗯 这样啊 收到",
"width": 128, "height": 128, "formats": "png",
},
"拿到红包": {
"sticker_id": "236", "package_id": "1003", "name": "拿到红包",
"description": "红包 谢谢老板 发财 开心 抢到了 欧气",
"width": 128, "height": 128, "formats": "png",
},
"牛吖": {
"sticker_id": "239", "package_id": "1003", "name": "牛吖",
"description": "牛 厉害 强 666 佩服 大佬",
"width": 128, "height": 128, "formats": "png",
},
"贴贴": {
"sticker_id": "272", "package_id": "1003", "name": "贴贴",
"description": "抱抱 亲昵 蹭蹭 亲密 靠靠 撒娇贴",
"width": 128, "height": 128, "formats": "png",
},
"爱心": {
"sticker_id": "138", "package_id": "1003", "name": "爱心",
"description": "心 love 喜欢你 红心 示爱 么么哒",
"width": 128, "height": 128, "formats": "png",
},
"晚安": {
"sticker_id": "170", "package_id": "1003", "name": "晚安",
"description": "好梦 睡了 night 早点休息 安啦 moon",
"width": 128, "height": 128, "formats": "png",
},
"太阳": {
"sticker_id": "176", "package_id": "1003", "name": "太阳",
"description": "晴天 早上好 阳光 morning 好天气 日",
"width": 128, "height": 128, "formats": "png",
},
"柠檬": {
"sticker_id": "266", "package_id": "1003", "name": "柠檬",
"description": "酸 嫉妒 柠檬精 羡慕 我酸 恰柠檬",
"width": 128, "height": 128, "formats": "png",
},
"大冤种": {
"sticker_id": "267", "package_id": "1003", "name": "大冤种",
"description": "倒霉 吃亏 自嘲 好心没好报 背锅 工具人",
"width": 128, "height": 128, "formats": "png",
},
"吐了": {
"sticker_id": "132", "package_id": "1003", "name": "吐了",
"description": "恶心 yue 受不了 嫌弃 想吐 生理不适",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "134", "package_id": "1003", "name": "",
"description": "生气 愤怒 火大 暴躁 气炸 怼",
"width": 128, "height": 128, "formats": "png",
},
"玫瑰": {
"sticker_id": "165", "package_id": "1003", "name": "玫瑰",
"description": "花 示爱 表白 浪漫 送你花 情人节",
"width": 128, "height": 128, "formats": "png",
},
"凋谢": {
"sticker_id": "119", "package_id": "1003", "name": "凋谢",
"description": "花谢 失恋 难过 枯萎 心碎 凉了",
"width": 128, "height": 128, "formats": "png",
},
"点赞": {
"sticker_id": "159", "package_id": "1003", "name": "点赞",
"description": "赞 认同 好棒 good like 大拇指 顶",
"width": 128, "height": 128, "formats": "png",
},
"握手": {
"sticker_id": "164", "package_id": "1003", "name": "握手",
"description": "合作 你好 商务 hello deal 成交 友好",
"width": 128, "height": 128, "formats": "png",
},
"抱拳": {
"sticker_id": "163", "package_id": "1003", "name": "抱拳",
"description": "谢谢 失敬 江湖 承让 拜托 有礼",
"width": 128, "height": 128, "formats": "png",
},
"ok": {
"sticker_id": "169", "package_id": "1003", "name": "ok",
"description": "好的 收到 没问题 okay 行 可以 懂了",
"width": 128, "height": 128, "formats": "png",
},
"拳头": {
"sticker_id": "174", "package_id": "1003", "name": "拳头",
"description": "加油 干 冲 fight 力量 击拳 硬气",
"width": 128, "height": 128, "formats": "png",
},
"鞭炮": {
"sticker_id": "191", "package_id": "1003", "name": "鞭炮",
"description": "过年 喜庆 爆竹 春节 噼里啪啦 红",
"width": 128, "height": 128, "formats": "png",
},
"烟花": {
"sticker_id": "258", "package_id": "1003", "name": "烟花",
"description": "庆典 漂亮 新年 嘭 绽放 节日快乐",
"width": 128, "height": 128, "formats": "png",
},
}
def get_sticker_by_name(name: str) -> Optional[dict]:
"""
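Look up a sticker by name, with fuzzy matching.
Match priority:
1. Exact match on name
2. name contains the query, or the query contains name (prefix/substring)
3. description contains the query (synonym search)
4. General fuzzy scoring (same algorithm as sticker-search); the highest-scoring hit is returned
Returns the sticker dict, or None if nothing is found.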
"""
if not name:
return None
query = name.strip()
if query in STICKER_MAP:
return STICKER_MAP[query]
for key, sticker in STICKER_MAP.items():
if query in key or key in query:
return sticker
for sticker in STICKER_MAP.values():
desc = sticker.get("description", "")
if query in desc:
return sticker
matches = search_stickers(query, limit=1)
return matches[0] if matches else None
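def get_random_sticker(category: Optional[str] = None) -> dict:
"""
Return a random sticker.
If category is given, pick randomly among stickers whose description or name contains that keyword.
When category is None, or no sticker matches it, pick from the whole table.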
"""
if category:
candidates = [
s for s in STICKER_MAP.values()
if category in s.get("description", "") or category in s.get("name", "")
]
if candidates:
return random.choice(candidates)
return random.choice(list(STICKER_MAP.values()))
def get_sticker_by_id(sticker_id: str) -> Optional[dict]:
"""Look up a sticker by exact sticker_id."""
if not sticker_id:
return None
sid = str(sticker_id).strip()
for sticker in STICKER_MAP.values():
if sticker.get("sticker_id") == sid:
return sticker
return None
# ---------------------------------------------------------------------------
# Fuzzy search (aligned with chatbot-web yuanbao-openclaw-plugin/sticker-cache.ts searchStickers)
# ---------------------------------------------------------------------------
_PUNCT_RE = re.compile(r"[\s\u3000\-_·.,,。!?\"“”'‘’、/\\]+")
def _normalize_text(raw: str) -> str:
    return unicodedata.normalize("NFKC", str(raw or "")).strip().lower()


def _compact_text(raw: str) -> str:
    return _PUNCT_RE.sub("", _normalize_text(raw))


def _multiset_char_hit_ratio(needle: str, haystack: str) -> float:
    # Fraction of needle characters that can be drawn from haystack,
    # consuming each haystack character at most once.
    if not needle:
        return 0.0
    bag: dict[str, int] = {}
    for ch in haystack:
        bag[ch] = bag.get(ch, 0) + 1
    hits = 0
    for ch in needle:
        n = bag.get(ch, 0)
        if n > 0:
            hits += 1
            bag[ch] = n - 1
    return hits / len(needle)


def _bigram_jaccard(a: str, b: str) -> float:
    # Jaccard similarity over the sets of adjacent character pairs.
    if len(a) < 2 or len(b) < 2:
        return 0.0
    A = {a[i:i + 2] for i in range(len(a) - 1)}
    B = {b[i:i + 2] for i in range(len(b) - 1)}
    inter = len(A & B)
    union = len(A) + len(B) - inter
    return inter / union if union else 0.0


def _longest_subsequence_ratio(needle: str, haystack: str) -> float:
    # Greedy in-order scan: fraction of needle (from its start) that
    # appears as a subsequence of haystack.
    if not needle:
        return 0.0
    j = 0
    for ch in haystack:
        if j >= len(needle):
            break
        if ch == needle[j]:
            j += 1
    return j / len(needle)


def _score_field(haystack: str, query: str) -> float:
    hay = _normalize_text(haystack)
    q = _normalize_text(query)
    if not hay or not q:
        return 0.0
    hay_c = _compact_text(haystack)
    q_c = _compact_text(query)
    best = 0.0
    if hay == q:
        best = max(best, 100.0)
    if q in hay:
        best = max(best, 92 + min(6, len(q)))
    if len(q) >= 2 and hay.startswith(q):
        best = max(best, 88.0)
    if q_c and q_c in hay_c:
        best = max(best, 86.0)
    best = max(best, _multiset_char_hit_ratio(q_c, hay_c) * 62)
    best = max(best, _bigram_jaccard(q_c, hay_c) * 58)
    best = max(best, _longest_subsequence_ratio(q_c, hay_c) * 52)
    if len(q) == 1 and q in hay:
        best = max(best, 68.0)
    return best
def search_stickers(query: str, limit: int = 10) -> list[dict]:
    """
    Fuzzy-rank the built-in sticker table and return the top N results.

    The score combines substring matching, character-multiset coverage,
    bigram Jaccard, and subsequence ratio over the name and description
    fields. Name is weighted slightly above description (x0.88). With an
    empty query, the first N entries are returned in dict order.
    """
    safe_limit = max(1, min(500, int(limit) if limit else 10))
    if not query or not _normalize_text(query):
        return list(STICKER_MAP.values())[:safe_limit]
    q_norm = _normalize_text(query)
    scored: list[tuple[float, dict]] = []
    for sticker in STICKER_MAP.values():
        name_s = _score_field(sticker.get("name", ""), query)
        desc_s = _score_field(sticker.get("description", ""), query) * 0.88
        sid = str(sticker.get("sticker_id", "")).strip()
        id_s = 0.0
        if sid and q_norm:
            sid_norm = _normalize_text(sid)
            if sid_norm == q_norm:
                id_s = 100.0
            elif q_norm in sid_norm:
                id_s = 84.0
        scored.append((max(name_s, desc_s, id_s), sticker))
    scored.sort(key=lambda x: x[0], reverse=True)
    top = scored[0][0] if scored else 0
    if top <= 0:
        return [s for _, s in scored[:safe_limit]]
    # Adaptive cut-off: the weaker the best hit, the lower the floor.
    if top >= 22:
        floor = 18.0
    elif top >= 12:
        floor = max(10.0, top * 0.5)
    else:
        floor = max(6.0, top * 0.35)
    filtered = [pair for pair in scored if pair[0] >= floor]
    out = filtered if filtered else scored
    return [s for _, s in out[:safe_limit]]
def build_face_msg_body(
    face_index: int,
    face_type: int = 1,
    data: Optional[str] = None,
) -> list:
    """
    Build a TIMFaceElem message body.

    Yuanbao conventions:
    - index is always 0; the server identifies the specific sticker from
      the data field.
    - data is a JSON string containing sticker_id, package_id, and related
      fields.

    Args:
        face_index: Reserved; does not currently affect the wire format
            (Yuanbao always uses index=0). When face_index > 0 it is
            treated as a legacy QQ emoji ID and placed directly into index.
        face_type: Reserved for compatibility with the old interface;
            currently unused.
        data: Pre-serialized JSON string; when None, only index is sent.

    Returns:
        A msg_body list conforming to the Yuanbao TIM protocol::

            [{"msg_type": "TIMFaceElem", "msg_content": {"index": 0, "data": "..."}}]
    """
    msg_content: dict = {"index": face_index}
    if data is not None:
        msg_content["data"] = data
    return [{"msg_type": "TIMFaceElem", "msg_content": msg_content}]
def build_sticker_msg_body(sticker: dict) -> list:
    """
    Build a TIMFaceElem message body directly from a sticker dict in
    STICKER_MAP.

    Internal helper for send_sticker(); keeps the data field consistent
    with the original JS plugin.
    """
    data_payload = json.dumps(
        {
            "sticker_id": sticker["sticker_id"],
            "package_id": sticker["package_id"],
            "width": sticker.get("width", 128),
            "height": sticker.get("height", 128),
            "formats": sticker.get("formats", "png"),
            "name": sticker["name"],
        },
        ensure_ascii=False,
        separators=(",", ":"),
    )
    return build_face_msg_body(face_index=0, data=data_payload)
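The fuzzy-ranking helpers in the sticker module above combine several string-similarity measures and then apply an adaptive score floor. As a minimal standalone sketch, the block below re-implements only the bigram Jaccard measure and the floor rule from `search_stickers`. The names `bigram_jaccard` and `score_floor` are illustrative and are not part of the module.

```python
def bigram_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the sets of adjacent character pairs."""
    if len(a) < 2 or len(b) < 2:
        return 0.0
    set_a = {a[i:i + 2] for i in range(len(a) - 1)}
    set_b = {b[i:i + 2] for i in range(len(b) - 1)}
    inter = len(set_a & set_b)
    union = len(set_a) + len(set_b) - inter
    return inter / union if union else 0.0


def score_floor(top: float) -> float:
    """Minimum score a result must reach, given the best score in the list."""
    if top >= 22:
        return 18.0
    if top >= 12:
        return max(10.0, top * 0.5)
    return max(6.0, top * 0.35)


if __name__ == "__main__":
    # "night" and "nacht" share exactly one bigram ("ht") out of seven.
    print(bigram_jaccard("night", "nacht"))
    print(score_floor(93.0), score_floor(16.0), score_floor(8.0))
```

A strong best hit keeps the floor fixed at 18, while weak best hits lower the floor proportionally, so a vague query can still return its closest candidates.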
+698 -444
File diff suppressed because it is too large
+85 -18
@@ -87,6 +87,9 @@ class SessionSource:
user_id_alt: Optional[str] = None # Platform-specific stable alt ID (Signal UUID, Feishu union_id)
chat_id_alt: Optional[str] = None # Signal group internal ID
is_bot: bool = False # True when the message author is a bot/webhook (Discord)
guild_id: Optional[str] = None # Discord guild / Slack workspace / Matrix server scope
parent_chat_id: Optional[str] = None # Parent channel when chat_id refers to a thread
message_id: Optional[str] = None # ID of the triggering message (for pin/reply/react)
@property
def description(self) -> str:
@@ -124,8 +127,14 @@ class SessionSource:
d["user_id_alt"] = self.user_id_alt
if self.chat_id_alt:
d["chat_id_alt"] = self.chat_id_alt
if self.guild_id:
d["guild_id"] = self.guild_id
if self.parent_chat_id:
d["parent_chat_id"] = self.parent_chat_id
if self.message_id:
d["message_id"] = self.message_id
return d
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "SessionSource":
return cls(
@@ -139,6 +148,9 @@ class SessionSource:
chat_topic=data.get("chat_topic"),
user_id_alt=data.get("user_id_alt"),
chat_id_alt=data.get("chat_id_alt"),
guild_id=data.get("guild_id"),
parent_chat_id=data.get("parent_chat_id"),
message_id=data.get("message_id"),
)
@@ -190,6 +202,31 @@ that requires raw IDs). Discord is excluded because mentions use ``<@user_id>``
and the LLM needs the real ID to tag users."""
def _discord_tools_loaded() -> bool:
"""True iff the agent will actually have Discord tools this session.
Two conditions must hold:
1. The `discord` or `discord_admin` toolset is enabled for the
Discord platform via `hermes tools` (opt-in, default OFF).
2. `DISCORD_BOT_TOKEN` is set. The tool's `check_fn` gates on it
at registry time, so the toolset being enabled in config is not
enough if the token isn't configured.
Returns False (the safe default, which keeps the stale-API disclaimer) on
any error so a bad config can't silently promise tools the agent lacks.
"""
if not (os.environ.get("DISCORD_BOT_TOKEN") or "").strip():
return False
try:
from hermes_cli.config import load_config
from hermes_cli.tools_config import _get_platform_tools
cfg = load_config()
enabled = _get_platform_tools(cfg, "discord", include_default_mcp_servers=False)
return "discord" in enabled or "discord_admin" in enabled
except Exception:
return False
def build_session_context_prompt(
context: SessionContext,
*,
@@ -273,18 +310,38 @@ def build_session_context_prompt(
"**Platform notes:** You are running inside Slack. "
"You do NOT have access to Slack-specific APIs — you cannot search "
"channel history, pin/unpin messages, manage channels, or list users. "
"Do not promise to perform these actions. If the user asks, explain "
"that you can only read messages sent directly to you and respond."
"Do not promise to perform these actions. The gateway may inline the "
"current message's Slack block/attachment payload when available, but "
"you still cannot call Slack APIs yourself."
)
elif context.source.platform == Platform.DISCORD:
lines.append("")
lines.append(
"**Platform notes:** You are running inside Discord. "
"You do NOT have access to Discord-specific APIs — you cannot search "
"channel history, pin messages, manage roles, or list server members. "
"Do not promise to perform these actions. If the user asks, explain "
"that you can only read messages sent directly to you and respond."
)
# Inject the Discord IDs block only when the agent actually has
# Discord tools loaded this session — i.e. the user opted into
# `discord` / `discord_admin` via `hermes tools` AND the bot
# token is configured. Otherwise keep the stale-API disclaimer
# honest so we never promise tools the agent lacks.
if _discord_tools_loaded():
src = context.source
id_lines = ["", "**Discord IDs (for the `discord` / `discord_admin` tools):**"]
if src.guild_id:
id_lines.append(f" - Guild: `{src.guild_id}`")
if src.thread_id and src.parent_chat_id:
id_lines.append(f" - Parent channel: `{src.parent_chat_id}`")
id_lines.append(f" - Thread: `{src.thread_id}` (use as `channel_id` for fetch_messages etc.)")
else:
id_lines.append(f" - Channel: `{src.chat_id}`")
if src.message_id:
id_lines.append(f" - Triggering message: `{src.message_id}`")
lines.extend(id_lines)
else:
lines.append("")
lines.append(
"**Platform notes:** You are running inside Discord. "
"You do NOT have access to Discord-specific APIs — you cannot search "
"channel history, pin messages, manage roles, or list server members. "
"Do not promise to perform these actions. If the user asks, explain "
"that you can only read messages sent directly to you and respond."
)
elif context.source.platform == Platform.BLUEBUBBLES:
lines.append("")
lines.append(
@@ -297,6 +354,14 @@ def build_session_context_prompt(
"If the user needs a detailed answer, give the short version first "
"and offer to elaborate."
)
elif context.source.platform == Platform.YUANBAO:
lines.append("")
lines.append(
"**Platform notes:** You are running inside Yuanbao. "
"You CAN send private (DM) messages via the send_message tool. "
"Use target='yuanbao:direct:<account_id>' for DM "
"and target='yuanbao:group:<group_code>' for group chat."
)
# Connected platforms
platforms_list = ["local (files on this machine)"]
@@ -383,11 +448,11 @@ class SessionEntry:
auto_reset_reason: Optional[str] = None # "idle" or "daily"
reset_had_activity: bool = False # whether the expired session had any messages
# Set by the background expiry watcher after it successfully flushes
# memories for this session. Persisted to sessions.json so the flag
# survives gateway restarts (the old in-memory _pre_flushed_sessions
# set was lost on restart, causing redundant re-flushes).
memory_flushed: bool = False
# Set by the background expiry watcher after it finalizes an expired
# session (invoking on_session_finalize hooks and evicting the cached
# agent). Persisted to sessions.json so the flag survives gateway
# restarts — prevents redundant finalization runs.
expiry_finalized: bool = False
# When True the next call to get_or_create_session() will auto-reset
# this session (create a new session_id) so the user starts fresh.
@@ -423,7 +488,7 @@ class SessionEntry:
"last_prompt_tokens": self.last_prompt_tokens,
"estimated_cost_usd": self.estimated_cost_usd,
"cost_status": self.cost_status,
"memory_flushed": self.memory_flushed,
"expiry_finalized": self.expiry_finalized,
"suspended": self.suspended,
"resume_pending": self.resume_pending,
"resume_reason": self.resume_reason,
@@ -475,7 +540,7 @@ class SessionEntry:
last_prompt_tokens=data.get("last_prompt_tokens", 0),
estimated_cost_usd=data.get("estimated_cost_usd", 0.0),
cost_status=data.get("cost_status", "unknown"),
memory_flushed=data.get("memory_flushed", False),
expiry_finalized=data.get("expiry_finalized", data.get("memory_flushed", False)),
suspended=data.get("suspended", False),
resume_pending=data.get("resume_pending", False),
resume_reason=data.get("resume_reason"),
@@ -1176,6 +1241,7 @@ class SessionStore:
reasoning_content=message.get("reasoning_content") if message.get("role") == "assistant" else None,
reasoning_details=message.get("reasoning_details") if message.get("role") == "assistant" else None,
codex_reasoning_items=message.get("codex_reasoning_items") if message.get("role") == "assistant" else None,
codex_message_items=message.get("codex_message_items") if message.get("role") == "assistant" else None,
)
except Exception as e:
logger.debug("Session DB operation failed: %s", e)
@@ -1208,6 +1274,7 @@ class SessionStore:
reasoning_content=msg.get("reasoning_content") if role == "assistant" else None,
reasoning_details=msg.get("reasoning_details") if role == "assistant" else None,
codex_reasoning_items=msg.get("codex_reasoning_items") if role == "assistant" else None,
codex_message_items=msg.get("codex_message_items") if role == "assistant" else None,
)
except Exception as e:
logger.debug("Failed to rewrite transcript in DB: %s", e)
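The session diff above renames `memory_flushed` to `expiry_finalized` while still reading the old key from files written by earlier versions. A minimal sketch of that pattern follows, assuming a hypothetical trimmed-down `Entry` dataclass in place of the real `SessionEntry`; only the fallback expression is taken from the diff.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Entry:
    """Hypothetical, trimmed-down stand-in for SessionEntry."""
    session_id: str
    expiry_finalized: bool = False

    def to_dict(self) -> Dict[str, Any]:
        # Only the new key is written on save.
        return {
            "session_id": self.session_id,
            "expiry_finalized": self.expiry_finalized,
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Entry":
        # Prefer the new key; fall back to the legacy key for old files.
        finalized = data.get("expiry_finalized", data.get("memory_flushed", False))
        return cls(session_id=data["session_id"], expiry_finalized=finalized)


if __name__ == "__main__":
    legacy = {"session_id": "s1", "memory_flushed": True}
    entry = Entry.from_dict(legacy)
    print(entry.expiry_finalized)
    print(entry.to_dict())
```

Because the new key is looked up first, a file that contains both keys resolves to the new value, and the legacy key disappears the next time the entry is saved.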
+110
@@ -44,6 +44,14 @@ class StreamConsumerConfig:
buffer_threshold: int = 40
cursor: str = ""
buffer_only: bool = False
# When >0, the final edit for a streamed response is delivered as a
# fresh message if the original preview has been visible for at least
# this many seconds. This makes the platform's visible timestamp
# reflect completion time instead of first-token time for long-running
# responses (e.g. reasoning models that stream slowly). Ported from
# openclaw/openclaw#72038. Default 0 = always edit in place (legacy
# behavior). The gateway enables this selectively per-platform.
fresh_final_after_seconds: float = 0.0
class GatewayStreamConsumer:
@@ -91,6 +99,12 @@ class GatewayStreamConsumer:
self._queue: queue.Queue = queue.Queue()
self._accumulated = ""
self._message_id: Optional[str] = None
# Monotonic timestamp (``time.monotonic()``) when ``_message_id`` was
# first assigned from a successful first-send. Used by the
# fresh-final logic to detect long-lived previews whose edit
# timestamps would be stale by completion time. Ported from
# openclaw/openclaw#72038.
self._message_created_ts: Optional[float] = None
self._already_sent = False
self._edit_supported = True # Disabled when progressive edits are no longer usable
self._last_edit_time = 0.0
@@ -136,6 +150,7 @@ class GatewayStreamConsumer:
if preserve_no_edit and self._message_id == "__no_edit__":
return
self._message_id = None
self._message_created_ts = None
self._accumulated = ""
self._last_sent_text = ""
self._fallback_final_send = False
@@ -734,6 +749,81 @@ class GatewayStreamConsumer:
logger.error("Commentary send error: %s", e)
return False
def _should_send_fresh_final(self) -> bool:
"""Return True when a long-lived preview should be replaced with a
fresh final message instead of an edit.
Conditions:
- Fresh-final is enabled (``fresh_final_after_seconds > 0``).
- We have a real preview message id (not the ``__no_edit__`` sentinel
and not ``None``).
- The preview has been visible for at least the configured threshold.
Ported from openclaw/openclaw#72038.
"""
threshold = getattr(self.cfg, "fresh_final_after_seconds", 0.0) or 0.0
if threshold <= 0:
return False
if not self._message_id or self._message_id == "__no_edit__":
return False
if self._message_created_ts is None:
return False
age = time.monotonic() - self._message_created_ts
return age >= threshold
async def _try_fresh_final(self, text: str) -> bool:
"""Send ``text`` as a brand-new message (best-effort delete the old
preview) so the platform's visible timestamp reflects completion
time. Returns True on successful delivery, False on any failure so
the caller falls back to the normal edit path.
Ported from openclaw/openclaw#72038.
"""
old_message_id = self._message_id
try:
result = await self.adapter.send(
chat_id=self.chat_id,
content=text,
metadata=self.metadata,
)
except Exception as e:
logger.debug("Fresh-final send failed, falling back to edit: %s", e)
return False
if not getattr(result, "success", False):
return False
# Successful fresh send — try to delete the stale preview so the
# user doesn't see the old edit-stuck message underneath. Cleanup
# is best-effort; platforms that don't implement ``delete_message``
# just leave the preview behind (still an acceptable outcome —
# the visible final timestamp is the important part).
if old_message_id and old_message_id != "__no_edit__":
delete_fn = getattr(self.adapter, "delete_message", None)
if delete_fn is not None:
try:
await delete_fn(self.chat_id, old_message_id)
except Exception as e:
logger.debug(
"Fresh-final preview cleanup failed (%s): %s",
old_message_id, e,
)
# Adopt the new message id as the current message so subsequent
# callers (e.g. overflow split loops, finalize retries) see a
# consistent state.
new_message_id = getattr(result, "message_id", None)
if new_message_id:
self._message_id = new_message_id
self._message_created_ts = time.monotonic()
else:
# Send succeeded but platform didn't return an id — treat the
# delivery as final-only and fall back to "__no_edit__" so we
# don't try to edit something we can't address.
self._message_id = "__no_edit__"
self._message_created_ts = None
self._already_sent = True
self._last_sent_text = text
self._final_response_sent = True
return True
async def _send_or_edit(self, text: str, *, finalize: bool = False) -> bool:
"""Send or edit the streaming message.
@@ -786,6 +876,22 @@ class GatewayStreamConsumer:
finalize and self._adapter_requires_finalize
):
return True
# Fresh-final for long-lived previews: when finalizing
# the last edit in a streaming sequence, if the
# original preview has been visible for at least
# ``fresh_final_after_seconds``, send the completed
# reply as a fresh message so the platform's visible
# timestamp reflects completion time instead of the
# preview creation time. Best-effort cleanup of the
# old preview follows. Ported from
# openclaw/openclaw#72038. Gated by config so the
# legacy edit-in-place path stays the default.
if (
finalize
and self._should_send_fresh_final()
and await self._try_fresh_final(text)
):
return True
# Edit existing message
result = await self.adapter.edit_message(
chat_id=self.chat_id,
@@ -852,6 +958,10 @@ class GatewayStreamConsumer:
if result.success:
if result.message_id:
self._message_id = result.message_id
# Track when the preview first became visible to
# the user so fresh-final logic can detect stale
# preview timestamps on long-running responses.
self._message_created_ts = time.monotonic()
else:
self._edit_supported = False
self._already_sent = True
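The fresh-final decision in the stream consumer diff above depends on three conditions: the feature is enabled, a real preview message exists, and the preview is older than the threshold. The sketch below isolates that check. `PreviewState`, `should_send_fresh_final`, and the injectable `now` clock are assumptions made for illustration and testing; the real method reads these values from the consumer instance.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

NO_EDIT = "__no_edit__"  # sentinel used in the diff for non-editable messages


@dataclass
class PreviewState:
    """Hypothetical subset of the consumer's state, for illustration only."""
    message_id: Optional[str] = None
    created_ts: Optional[float] = None  # time.monotonic() at first send


def should_send_fresh_final(
    state: PreviewState,
    threshold: float,
    now: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True when the preview is old enough to be replaced."""
    if threshold <= 0:
        return False  # feature disabled: always edit in place
    if not state.message_id or state.message_id == NO_EDIT:
        return False  # nothing addressable to replace
    if state.created_ts is None:
        return False
    return now() - state.created_ts >= threshold


if __name__ == "__main__":
    state = PreviewState(message_id="m1", created_ts=time.monotonic() - 90)
    print(should_send_fresh_final(state, threshold=60.0))
```

Using a monotonic clock rather than wall-clock time keeps the age calculation stable if the system clock is adjusted while a response is streaming.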
+21 -1
@@ -31,8 +31,17 @@ Hermes' own session keys.
from __future__ import annotations
import json
import logging
import re
from typing import Set
logger = logging.getLogger(__name__)
# WhatsApp JIDs are numeric (or plus-prefixed numeric) with optional
# ``@``, ``.`` and ``:`` separators. ``\w`` is pinned to ASCII so
# full-width digits / Unicode word chars can't sneak through.
_SAFE_IDENTIFIER_RE = re.compile(r"^[A-Za-z0-9@.+\-]+$")
from hermes_constants import get_hermes_home
@@ -81,6 +90,16 @@ def expand_whatsapp_aliases(identifier: str) -> Set[str]:
current = queue.pop(0)
if not current or current in resolved:
continue
# Defense-in-depth: reject identifiers that could sneak path
# separators / traversal segments into the ``lid-mapping-{current}``
# filename below. The hardcoded ``lid-mapping-`` prefix already
# prevents escape via pathlib's component split (an attacker can't
# create ``lid-mapping-..`` as a real directory in session_dir), but
# this keeps the identifier space to the characters WhatsApp JIDs
# actually use and avoids depending on that filesystem-layout
# invariant.
if not _SAFE_IDENTIFIER_RE.match(current):
continue
resolved.add(current)
for suffix in ("", "_reverse"):
@@ -91,7 +110,8 @@ def expand_whatsapp_aliases(identifier: str) -> Set[str]:
mapped = normalize_whatsapp_identifier(
json.loads(mapping_path.read_text(encoding="utf-8"))
)
except Exception:
except (OSError, json.JSONDecodeError) as exc:
logger.debug("whatsapp_identity: failed to read %s: %s", mapping_path, exc)
continue
if mapped and mapped not in resolved:
queue.append(mapped)
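The identifier check added in the WhatsApp diff above can be tried in isolation. The sketch below reuses the same pattern; the `is_safe_identifier` wrapper is hypothetical, and it uses `fullmatch()` instead of `match()`, which is a deliberate deviation from the diff so that a trailing newline is rejected too.

```python
import re

# Same pattern as in the diff above.
_SAFE_IDENTIFIER_RE = re.compile(r"^[A-Za-z0-9@.+\-]+$")


def is_safe_identifier(identifier: str) -> bool:
    """Hypothetical helper: accept only the restricted ASCII character set.

    fullmatch() is used here instead of match() because ``$`` also matches
    just before a trailing newline, so match() would accept "123\\n".
    """
    return _SAFE_IDENTIFIER_RE.fullmatch(identifier) is not None


if __name__ == "__main__":
    for candidate in ("15551234567@s.whatsapp.net", "../../etc/passwd", "１２３"):
        print(repr(candidate), is_safe_identifier(candidate))
```

Note that dots are allowed, so a bare `..` still passes this check; as the comment in the diff says, protection against that case relies on the fixed `lid-mapping-` filename prefix rather than on the regex.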
+27 -3
@@ -356,6 +356,14 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
api_key_env_vars=(),
base_url_env_var="BEDROCK_BASE_URL",
),
"azure-foundry": ProviderConfig(
id="azure-foundry",
name="Azure Foundry",
auth_type="api_key",
inference_base_url="", # User-provided endpoint
api_key_env_vars=("AZURE_FOUNDRY_API_KEY",),
base_url_env_var="AZURE_FOUNDRY_BASE_URL",
),
}
@@ -459,11 +467,27 @@ def _resolve_api_key_provider_secret(
pass
return "", ""
from hermes_cli.config import get_env_value
for env_var in pconfig.api_key_env_vars:
val = os.getenv(env_var, "").strip()
# Check both os.environ and ~/.hermes/.env file
val = (get_env_value(env_var) or "").strip()
if has_usable_secret(val):
return val, env_var
# Fallback: try credential pool (e.g. zai key stored via auth.json)
try:
from agent.credential_pool import load_pool
pool = load_pool(provider_id)
if pool and pool.has_credentials():
entry = pool.peek()
if entry:
key = getattr(entry, "access_token", "") or getattr(entry, "runtime_api_key", "")
key = str(key).strip()
if has_usable_secret(key):
return key, f"credential_pool:{provider_id}"
except Exception:
pass
return "", ""
@@ -4236,10 +4260,10 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
)
from hermes_cli.models import (
_PROVIDER_MODELS, get_pricing_for_provider,
get_curated_nous_model_ids, get_pricing_for_provider,
check_nous_free_tier, partition_nous_models_by_tier,
)
model_ids = _PROVIDER_MODELS.get("nous", [])
model_ids = get_curated_nous_model_ids()
print()
unavailable_models: list = []
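The auth diff above changes API-key resolution to try the configured environment variables first and then fall back to the credential pool. A rough sketch of that ordering follows. The function `resolve_api_key`, its parameters, and the plain non-empty check are assumptions for illustration; the real code goes through `get_env_value`, `has_usable_secret`, and `load_pool`.

```python
from typing import Callable, Mapping, Optional, Sequence, Tuple


def resolve_api_key(
    env_vars: Sequence[str],
    env: Mapping[str, str],
    pool_lookup: Callable[[], Optional[str]],
    provider_id: str,
) -> Tuple[str, str]:
    """Return (secret, source); both are empty when nothing usable is found."""
    # 1. Environment variables, in declared order.
    for name in env_vars:
        value = (env.get(name) or "").strip()
        if value:
            return value, name
    # 2. Credential pool fallback; any failure is treated as "no key".
    try:
        pooled = (pool_lookup() or "").strip()
    except Exception:
        pooled = ""
    if pooled:
        return pooled, f"credential_pool:{provider_id}"
    return "", ""


if __name__ == "__main__":
    print(resolve_api_key(("A_KEY", "B_KEY"), {"B_KEY": " k "}, lambda: "p", "demo"))
    print(resolve_api_key(("A_KEY",), {}, lambda: "p", "demo"))
```

Returning the source alongside the secret lets callers report where a key came from without logging the key itself.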
+300
@@ -0,0 +1,300 @@
"""Azure Foundry endpoint auto-detection.
Inspect an Azure AI Foundry / Azure OpenAI endpoint to determine:
- API transport (OpenAI-style ``chat_completions`` vs
Anthropic-style ``anthropic_messages``)
- Available models (best effort: Azure does not expose a deployment
listing via the inference API key, but Azure OpenAI v1 endpoints
return the resource's model catalog via ``GET /models``)
- Context length for each discovered/entered model, via the existing
:func:`agent.model_metadata.get_model_context_length` resolver.
Rationale:
Azure has no pure-API-key deployment-listing endpoint; per Microsoft,
deployment enumeration requires ARM management-plane auth. Azure
OpenAI v1 endpoints ``{resource}.openai.azure.com/openai/v1`` do return
a ``/models`` list, but it reflects the resource's *available* models
rather than the user's *deployed* deployment names. In practice it is
still a useful hint: the user picks a familiar model name and we look
up its context length from the catalog.
The detector never crashes on errors (every HTTP call is wrapped in a
broad try/except). Callers get a :class:`DetectionResult` with whatever
information could be gathered, and fall back to manual entry for the
rest.
"""
from __future__ import annotations
import json
import logging
import re
from dataclasses import dataclass, field
from typing import Optional
from urllib import request as urllib_request
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse, urlunparse
logger = logging.getLogger(__name__)
# Default Azure OpenAI ``api-version`` to probe with. The v1 GA endpoint
# accepts requests without ``api-version`` entirely, so this is only used
# as a fallback for pre-v1 resources that still require it.
_AZURE_OPENAI_PROBE_API_VERSIONS = (
"2025-04-01-preview",
"2024-10-21", # oldest GA that supports /models
)
# Default Azure Anthropic ``api-version``. Matches the value used by
# ``agent/anthropic_adapter.py`` when building the Anthropic client.
_AZURE_ANTHROPIC_API_VERSION = "2025-04-15"
@dataclass
class DetectionResult:
"""Everything auto-detection could gather from a base URL + API key."""
#: Detected API transport: ``"chat_completions"``,
#: ``"anthropic_messages"``, or ``None`` when detection failed.
api_mode: Optional[str] = None
#: Deployment / model IDs returned by ``/models`` (best effort).
#: Empty when the endpoint doesn't expose the list with an API key.
models: list[str] = field(default_factory=list)
#: Lowercased host from the base URL (used for display messages).
hostname: str = ""
#: Human-readable reason the detector chose ``api_mode``. Useful
#: for explaining auto-detection to the user in the wizard.
reason: str = ""
#: ``True`` when ``/models`` returned a valid OpenAI-shaped payload.
models_probe_ok: bool = False
#: ``True`` when the URL was determined to be an Anthropic-style
#: endpoint (from path suffix or live probe).
is_anthropic: bool = False
def _http_get_json(url: str, api_key: str, timeout: float = 6.0) -> tuple[int, Optional[dict]]:
"""GET a URL with ``api-key`` + ``Authorization`` headers. Return
``(status_code, parsed_json_or_None)``. Never raises."""
req = urllib_request.Request(url, method="GET")
# Azure OpenAI uses ``api-key``. Some Azure deployments (and
# Anthropic-style routes) use ``Authorization: Bearer``. Send both
# so we probe once per URL rather than twice.
req.add_header("api-key", api_key)
req.add_header("Authorization", f"Bearer {api_key}")
req.add_header("User-Agent", "hermes-agent/azure-detect")
try:
with urllib_request.urlopen(req, timeout=timeout) as resp:
body = resp.read()
try:
return resp.status, json.loads(body.decode("utf-8", errors="replace"))
except Exception:
return resp.status, None
except HTTPError as exc:
return exc.code, None
except (URLError, TimeoutError, OSError) as exc:
logger.debug("azure_detect: GET %s failed: %s", url, exc)
return 0, None
except Exception as exc: # pragma: no cover — defensive
logger.debug("azure_detect: GET %s unexpected error: %s", url, exc)
return 0, None
def _strip_trailing_v1(url: str) -> str:
"""Strip trailing ``/v1`` or ``/v1/`` so we can construct sub-paths."""
return re.sub(r"/v1/?$", "", url.rstrip("/"))
def _looks_like_anthropic_path(url: str) -> bool:
"""Return True when the URL's path ends in ``/anthropic`` or
contains a ``/anthropic/`` segment. Used by Azure Foundry
resources that route Claude traffic through a dedicated path."""
try:
parsed = urlparse(url)
path = (parsed.path or "").lower().rstrip("/")
return path.endswith("/anthropic") or "/anthropic/" in path + "/"
except Exception:
return False
def _extract_model_ids(payload: dict) -> list[str]:
"""Extract a list of model IDs from an OpenAI-shaped ``/models``
response. Returns ``[]`` on any shape mismatch."""
data = payload.get("data") if isinstance(payload, dict) else None
if not isinstance(data, list):
return []
ids: list[str] = []
for item in data:
if not isinstance(item, dict):
continue
# OpenAI shape: {"id": "gpt-5.4", "object": "model", ...}
mid = item.get("id") or item.get("model") or item.get("name")
if isinstance(mid, str) and mid:
ids.append(mid)
return ids
def _probe_openai_models(base_url: str, api_key: str) -> tuple[bool, list[str]]:
"""Probe ``<base>/models`` for an OpenAI-shaped response.
Returns ``(ok, models)``. ``ok`` is True iff the endpoint accepted
us as an OpenAI-style caller (200 OK + OpenAI-shaped JSON body).
"""
base_url = base_url.rstrip("/")
# Azure OpenAI v1: {resource}.openai.azure.com/openai/v1 — no
# api-version required for GA paths, so probe without first.
candidates = [f"{base_url}/models"]
# Fallback: explicit api-version for pre-v1 resources
for v in _AZURE_OPENAI_PROBE_API_VERSIONS:
candidates.append(f"{base_url}/models?api-version={v}")
for url in candidates:
status, body = _http_get_json(url, api_key)
if status == 200 and body is not None:
ids = _extract_model_ids(body)
if ids:
logger.info(
"azure_detect: /models probe OK at %s (%d models)",
url, len(ids),
)
return True, ids
# 200 + empty list still counts as "OpenAI shape, no models
# listed" — let the user proceed with manual entry.
if isinstance(body, dict) and "data" in body:
return True, []
return False, []
def _probe_anthropic_messages(base_url: str, api_key: str) -> bool:
"""Send a zero-token request to ``<base>/v1/messages`` and check
whether the endpoint at least *recognises* the Anthropic Messages
shape (any 4xx that mentions ``messages`` or ``model``, or a 400
``invalid_request`` with an Anthropic error shape). Never completes
a real chat.
"""
base = _strip_trailing_v1(base_url)
url = f"{base}/v1/messages?api-version={_AZURE_ANTHROPIC_API_VERSION}"
payload = json.dumps({
"model": "probe",
"max_tokens": 1,
"messages": [{"role": "user", "content": "ping"}],
}).encode("utf-8")
req = urllib_request.Request(url, method="POST", data=payload)
req.add_header("api-key", api_key)
req.add_header("Authorization", f"Bearer {api_key}")
req.add_header("anthropic-version", "2023-06-01")
req.add_header("content-type", "application/json")
req.add_header("User-Agent", "hermes-agent/azure-detect")
try:
with urllib_request.urlopen(req, timeout=6.0) as resp:
# Should never 200 — "probe" isn't a real deployment. But
# if it does, the endpoint definitely speaks Anthropic.
return resp.status < 500
except HTTPError as exc:
# 4xx with an Anthropic-shaped error body = Anthropic endpoint.
try:
body = exc.read().decode("utf-8", errors="replace")
lowered = body.lower()
if "anthropic" in lowered or ('"type"' in lowered and '"error"' in lowered):
return True
# Pre-Azure-v1 Azure Foundry returns a plain 404 for
# Anthropic-style calls on non-Anthropic deployments. A
# 400 "model not found" IS Anthropic though.
if exc.code == 400 and ("messages" in lowered or "model" in lowered):
return True
return False
except Exception:
return False
except (URLError, TimeoutError, OSError):
return False
except Exception: # pragma: no cover
return False
def detect(base_url: str, api_key: str) -> DetectionResult:
"""Inspect an Azure endpoint and describe its transport + models.
Call this from the wizard before asking the user to pick an API
mode manually. The caller should treat the returned
:class:`DetectionResult` as *advisory*: if ``api_mode`` is None,
fall back to asking the user.
"""
result = DetectionResult()
try:
parsed = urlparse(base_url)
result.hostname = (parsed.hostname or "").lower()
except Exception:
result.hostname = ""
# 1. Path sniff. Azure Foundry exposes Anthropic-style deployments
# under a dedicated ``/anthropic`` path.
if _looks_like_anthropic_path(base_url):
result.is_anthropic = True
result.api_mode = "anthropic_messages"
result.reason = "URL path ends in /anthropic → Anthropic Messages API"
return result
# 2. Try the OpenAI-style /models probe. If this works, the
# endpoint definitely speaks OpenAI wire.
ok, models = _probe_openai_models(base_url, api_key)
if ok:
result.models_probe_ok = True
result.models = models
result.api_mode = "chat_completions"
result.reason = (
f"GET /models returned {len(models)} model(s) — OpenAI-style endpoint"
if models
else "GET /models returned an OpenAI-shaped empty list — OpenAI-style endpoint"
)
return result
# 3. Fallback: probe the Anthropic Messages shape. Slower and more
# intrusive than /models, so only run it when the OpenAI probe
# failed.
if _probe_anthropic_messages(base_url, api_key):
result.is_anthropic = True
result.api_mode = "anthropic_messages"
result.reason = "Endpoint accepts Anthropic Messages shape"
return result
# Nothing matched. Caller falls back to manual selection.
result.reason = (
"Could not probe endpoint (private network, missing model list, or "
"non-standard path) — falling back to manual API-mode selection"
)
return result
def lookup_context_length(model: str, base_url: str, api_key: str) -> Optional[int]:
"""Thin wrapper around :func:`agent.model_metadata.get_model_context_length`
that returns ``None`` when only the fallback default (128k) would
fire, so the wizard can distinguish "we actually know this" from
"we guessed".
"""
try:
from agent.model_metadata import (
DEFAULT_FALLBACK_CONTEXT,
get_model_context_length,
)
except Exception:
return None
try:
n = get_model_context_length(model, base_url=base_url, api_key=api_key)
except Exception as exc:
logger.debug("azure_detect: context length lookup failed: %s", exc)
return None
if isinstance(n, int) and n > 0 and n != DEFAULT_FALLBACK_CONTEXT:
return n
return None
__all__ = ["DetectionResult", "detect", "lookup_context_length"]
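The pure string helpers in the Azure detection module above (path sniffing, `/v1` stripping, and model-ID extraction) need no network access, so they are easy to try on their own. The block below is a standalone sketch with public names chosen for illustration; the hostnames are placeholders, not real endpoints.

```python
import re
from urllib.parse import urlparse


def strip_trailing_v1(url: str) -> str:
    """Drop a trailing /v1 or /v1/ so sub-paths can be appended."""
    return re.sub(r"/v1/?$", "", url.rstrip("/"))


def looks_like_anthropic_path(url: str) -> bool:
    """True when the path has an ``anthropic`` segment (at the end or inside)."""
    path = (urlparse(url).path or "").lower().rstrip("/")
    # Appending "/" lets one substring test cover both cases.
    return "/anthropic/" in path + "/"


def extract_model_ids(payload: object) -> list[str]:
    """Collect IDs from an OpenAI-shaped {"data": [...]} payload."""
    data = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(data, list):
        return []
    ids: list[str] = []
    for item in data:
        if not isinstance(item, dict):
            continue
        mid = item.get("id") or item.get("model") or item.get("name")
        if isinstance(mid, str) and mid:
            ids.append(mid)
    return ids


if __name__ == "__main__":
    print(strip_trailing_v1("https://host.example/openai/v1/"))
    print(looks_like_anthropic_path("https://host.example/anthropic/v1"))
    print(extract_model_ids({"data": [{"id": "m1"}, {"name": "m2"}, "junk"]}))
```

Matching on a whole path segment means a path such as `/anthropics` is not mistaken for an Anthropic route.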
+113 -6
@@ -84,9 +84,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("deny", "Deny a pending dangerous command", "Session",
gateway_only=True),
CommandDef("background", "Run a prompt in the background", "Session",
aliases=("bg",), args_hint="<prompt>"),
CommandDef("btw", "Ephemeral side question using session context (no tools, not persisted)", "Session",
args_hint="<question>"),
aliases=("bg", "btw"), args_hint="<prompt>"),
CommandDef("agents", "Show active agents and running tasks", "Session",
aliases=("tasks",)),
CommandDef("queue", "Queue a prompt for the next turn (doesn't interrupt)", "Session",
@@ -103,7 +101,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
# Configuration
CommandDef("config", "Show current configuration", "Configuration",
cli_only=True),
-CommandDef("model", "Switch model for this session", "Configuration", args_hint="[model] [--provider name] [--global]"),
+CommandDef("model", "Switch model for this session", "Configuration",
+aliases=("provider",), args_hint="[model] [--provider name] [--global]"),
CommandDef("gquota", "Show Google Gemini Code Assist quota usage", "Info",
cli_only=True),
@@ -127,8 +126,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("voice", "Toggle voice mode", "Configuration",
args_hint="[on|off|tts|status]", subcommands=("on", "off", "tts", "status")),
CommandDef("busy", "Control what Enter does while Hermes is working", "Configuration",
-cli_only=True, args_hint="[queue|interrupt|status]",
-subcommands=("queue", "interrupt", "status")),
+cli_only=True, args_hint="[queue|steer|interrupt|status]",
+subcommands=("queue", "steer", "interrupt", "status")),
# Tools & Skills
CommandDef("tools", "Manage tools: /tools [list|disable|enable] [name...]", "Tools & Skills",
@@ -807,6 +806,114 @@ def discord_skill_commands_by_category(
return trimmed_categories, uncategorized, hidden
# ---------------------------------------------------------------------------
# Slack native slash commands
# ---------------------------------------------------------------------------
# Slack slash command name constraints: lowercase a-z, 0-9, hyphens,
# underscores. Max 32 chars. Slack app manifest accepts up to 50 slash
# commands per app.
_SLACK_MAX_SLASH_COMMANDS = 50
_SLACK_NAME_LIMIT = 32
_SLACK_INVALID_CHARS = re.compile(r"[^a-z0-9_\-]")
def _sanitize_slack_name(raw: str) -> str:
"""Convert a command name to a valid Slack slash command name.
Slack allows lowercase a-z, digits, hyphens, and underscores. Max 32
chars. Uppercase is lowercased; invalid chars are stripped.
"""
name = raw.lower()
name = _SLACK_INVALID_CHARS.sub("", name)
name = name.strip("-_")
return name[:_SLACK_NAME_LIMIT]
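To make the sanitization rules concrete, the sketch below repeats the same three steps (lowercase, strip invalid characters, trim edge separators) followed by the length clamp, and runs them over a few inputs. The input strings are made up for illustration; the function is renamed to mark it as a standalone copy, not the module's own helper.

```python
import re

_SLACK_NAME_LIMIT = 32
_SLACK_INVALID_CHARS = re.compile(r"[^a-z0-9_\-]")


def sanitize_slack_name(raw: str) -> str:
    """Lowercase, drop invalid characters, trim edge separators, clamp length."""
    name = raw.lower()
    name = _SLACK_INVALID_CHARS.sub("", name)
    name = name.strip("-_")
    return name[:_SLACK_NAME_LIMIT]


# Hypothetical inputs covering each rule.
examples = ["Model", "skill:deploy", "_private_", "x" * 40]
results = [sanitize_slack_name(e) for e in examples]
```

Note that invalid characters are removed, not replaced, so `skill:deploy` collapses to `skilldeploy`; two distinct raw names can therefore map to the same Slack name, which is why the caller deduplicates.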
def slack_native_slashes() -> list[tuple[str, str, str]]:
"""Return (slash_name, description, usage_hint) triples for Slack.
Every gateway-available command in ``COMMAND_REGISTRY`` is surfaced as
a standalone Slack slash command (e.g. ``/btw``, ``/stop``, ``/model``),
matching Discord's and Telegram's model where every command is a
first-class slash and not a ``/hermes <verb>`` subcommand.
Both canonical names and aliases are included so users can type any
documented form (e.g. ``/background``, ``/bg``, and ``/btw`` all work).
Plugin-registered slash commands are included too.
Results are clamped to Slack's 50-command limit with duplicate-name
avoidance. ``/hermes`` is always reserved as the first entry so the
legacy ``/hermes <subcommand>`` form keeps working for anything that
gets dropped by the clamp or for free-form questions.
"""
overrides = _resolve_config_gates()
entries: list[tuple[str, str, str]] = []
seen: set[str] = set()
# Reserve /hermes as the catch-all top-level command.
entries.append(("hermes", "Talk to Hermes or run a subcommand", "[subcommand] [args]"))
seen.add("hermes")
def _add(name: str, desc: str, hint: str) -> None:
slack_name = _sanitize_slack_name(name)
if not slack_name or slack_name in seen:
return
if len(entries) >= _SLACK_MAX_SLASH_COMMANDS:
return
# Slack description cap is 2000 chars; keep it short.
entries.append((slack_name, desc[:140], hint[:100]))
seen.add(slack_name)
# First pass: canonical names (so they win slots if we hit the cap).
for cmd in COMMAND_REGISTRY:
if not _is_gateway_available(cmd, overrides):
continue
_add(cmd.name, cmd.description, cmd.args_hint or "")
# Second pass: aliases.
for cmd in COMMAND_REGISTRY:
if not _is_gateway_available(cmd, overrides):
continue
for alias in cmd.aliases:
# Skip aliases that only differ from canonical by case/punctuation
# normalization (already covered by _add dedup).
_add(alias, f"Alias for /{cmd.name}: {cmd.description}", cmd.args_hint or "")
# Third pass: plugin commands.
for name, description, args_hint in _iter_plugin_command_entries():
_add(name, description, args_hint or "")
return entries
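The ordering of the passes matters once the 50-command cap is reached: canonical names are added first so that aliases only fill whatever slots remain. A minimal sketch of that clamp, assuming a simplified `Cmd` record in place of `CommandDef` and a deliberately small limit so the effect is visible:

```python
from typing import NamedTuple


class Cmd(NamedTuple):
    """Hypothetical stand-in for CommandDef with only the fields used here."""
    name: str
    aliases: tuple[str, ...] = ()


def clamp_names(commands: list[Cmd], limit: int) -> list[str]:
    """Collect unique names up to ``limit``, canonical names before aliases."""
    names: list[str] = ["hermes"]  # reserved catch-all entry
    seen = {"hermes"}

    def add(name: str) -> None:
        if name in seen or len(names) >= limit:
            return
        names.append(name)
        seen.add(name)

    for cmd in commands:  # first pass: canonical names win slots
        add(cmd.name)
    for cmd in commands:  # second pass: aliases fill what is left
        for alias in cmd.aliases:
            add(alias)
    return names


registry = [Cmd("background", ("bg", "btw")), Cmd("model", ("provider",)), Cmd("stop")]
clamped = clamp_names(registry, limit=5)
```

With a limit of five, all three canonical names survive and only the first alias fits; the dropped aliases would still be reachable through the `/hermes <subcommand>` form.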
def slack_app_manifest(request_url: str = "https://hermes-agent.local/slack/commands") -> dict[str, Any]:
"""Generate a Slack app manifest with all gateway commands as slashes.
``request_url`` is required by Slack's manifest schema for every slash
command, but in Socket Mode (which we use) Slack ignores it and routes
the command event through the WebSocket. A placeholder URL is fine.
The returned dict is the ``features.slash_commands`` portion only;
callers compose it into a full manifest (or merge into an existing
one). Keeping it narrow avoids coupling us to the rest of the manifest
schema (display_information, oauth_config, settings, etc.) which users
set up once in the Slack UI and rarely change.
"""
slashes = []
for name, desc, usage in slack_native_slashes():
entry = {
"command": f"/{name}",
"description": desc or f"Run /{name}",
"should_escape": False,
"url": request_url,
}
if usage:
entry["usage_hint"] = usage
slashes.append(entry)
return {"features": {"slash_commands": slashes}}
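Because only the ``features.slash_commands`` fragment is returned, a caller has to merge it into whatever manifest already exists. One possible way to do that is sketched below; `merge_slash_commands` and the `existing` manifest are hypothetical and not part of the source, and the fragment literal simply mirrors the shape built above.

```python
import copy
from typing import Any


def merge_slash_commands(manifest: dict[str, Any], fragment: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of ``manifest`` whose slash commands come from ``fragment``."""
    merged = copy.deepcopy(manifest)
    features = merged.setdefault("features", {})
    features["slash_commands"] = copy.deepcopy(fragment["features"]["slash_commands"])
    return merged


# Hypothetical existing manifest; only the keys shown matter for the sketch.
existing = {
    "display_information": {"name": "Hermes"},
    "features": {"bot_user": {"display_name": "hermes"}},
}

# Same shape as the dict returned by slack_app_manifest().
fragment = {
    "features": {
        "slash_commands": [
            {
                "command": "/hermes",
                "description": "Talk to Hermes or run a subcommand",
                "should_escape": False,
                "url": "https://hermes-agent.local/slack/commands",
                "usage_hint": "[subcommand] [args]",
            }
        ]
    }
}

full = merge_slash_commands(existing, fragment)
```

The merge replaces the slash command list wholesale and leaves every other manifest key untouched, which matches the stated goal of not coupling to the rest of the schema.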
def slack_subcommand_map() -> dict[str, str]:
"""Return subcommand -> /command mapping for Slack /hermes handler.
+162 -10
@@ -465,6 +465,7 @@ DEFAULT_CONFIG = {
"command_timeout": 30, # Timeout for browser commands in seconds (screenshot, navigate, etc.)
"record_sessions": False, # Auto-record browser sessions as WebM videos
"allow_private_urls": False, # Allow navigating to private/internal IPs (localhost, 192.168.x.x, etc.)
"auto_local_for_private_urls": True, # When a cloud provider is set, auto-spawn local Chromium for LAN/localhost URLs instead of sending them to the cloud
"cdp_url": "", # Optional persistent CDP endpoint for attaching to an existing Chromium/Chrome
# CDP supervisor — dialog + frame detection via a persistent WebSocket.
# Active only when a CDP-capable backend is attached (Browserbase or
@@ -486,6 +487,19 @@ DEFAULT_CONFIG = {
"checkpoints": {
"enabled": True,
"max_snapshots": 50, # Max checkpoints to keep per directory
# Auto-maintenance: shadow repos accumulate forever under
# ~/.hermes/checkpoints/ (one per cd'd working directory). Field
# reports put the typical offender at 1000+ repos / ~12 GB. When
# auto_prune is on, hermes sweeps at startup (at most once per
# min_interval_hours) and deletes:
# * orphan repos: HERMES_WORKDIR no longer exists on disk
# * stale repos: newest mtime older than retention_days
# Opt-in so users who rely on /rollback against long-ago sessions
# never lose data silently.
"auto_prune": False,
"retention_days": 7,
"delete_orphans": True,
"min_interval_hours": 24,
},
# Maximum characters returned by a single read_file call. Reads that
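The two deletion rules described for ``auto_prune`` (orphan and stale) reduce to a small decision function. The sketch below is an assumption about how such a check could look, not the actual sweep: `ShadowRepo` and `prune_reason` are hypothetical names, and the real on-disk layout is not shown in this diff.

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ShadowRepo:
    """Hypothetical record for one checkpoint repo."""
    workdir: Path        # directory the checkpoints were taken for
    newest_mtime: float  # most recent modification time inside the repo


def prune_reason(
    repo: ShadowRepo,
    *,
    retention_days: int,
    delete_orphans: bool,
    now: Optional[float] = None,
) -> Optional[str]:
    """Return "orphan", "stale", or None when the repo should be kept."""
    now = time.time() if now is None else now
    if delete_orphans and not repo.workdir.exists():
        return "orphan"
    if now - repo.newest_mtime > retention_days * 86400:
        return "stale"
    return None


NOW = 1_000_000.0
missing = ShadowRepo(Path("/nonexistent-hermes-sketch-dir"), NOW)
fresh = ShadowRepo(Path("."), NOW - 86400)
old = ShadowRepo(Path("."), NOW - 8 * 86400)
```

Passing `now` explicitly keeps the function deterministic for testing; the defaults in the config block above would correspond to `retention_days=7` and `delete_orphans=True`.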
@@ -612,14 +626,6 @@ DEFAULT_CONFIG = {
"timeout": 30,
"extra_body": {},
},
-"flush_memories": {
-"provider": "auto",
-"model": "",
-"base_url": "",
-"api_key": "",
-"timeout": 30,
-"extra_body": {},
-},
"title_generation": {
"provider": "auto",
"model": "",
@@ -634,7 +640,7 @@ DEFAULT_CONFIG = {
"compact": False,
"personality": "kawaii",
"resume_display": "full",
-"busy_input_mode": "interrupt",
+"busy_input_mode": "interrupt", # interrupt | queue | steer
"bell_on_complete": False,
"show_reasoning": False,
"streaming": False,
@@ -848,7 +854,7 @@ DEFAULT_CONFIG = {
"auto_thread": True, # Auto-create threads on @mention in channels (like Slack)
"reactions": True, # Add 👀/✅/❌ reactions to messages during processing
"channel_prompts": {}, # Per-channel ephemeral system prompts (forum parents apply to child threads)
-# discord_server tool: restrict which actions the agent may call.
+# discord / discord_admin tools: restrict which actions the agent may call.
# Default (empty) = all actions allowed (subject to bot privileged intents).
# Accepts comma-separated string ("list_guilds,list_channels,fetch_messages")
# or YAML list. Unknown names are dropped with a warning at load time.
@@ -967,6 +973,27 @@ DEFAULT_CONFIG = {
"backup_count": 3, # Number of rotated backup files to keep
},
# Remotely-hosted model catalog manifest. When enabled, the CLI fetches
# curated model lists for OpenRouter and Nous Portal from this URL,
# falling back to the in-repo snapshot on network failure. Lets us
# update model picker lists without shipping a hermes-agent release.
# The default URL is served by the docs site GitHub Pages deploy.
"model_catalog": {
"enabled": True,
"url": "https://hermes-agent.nousresearch.com/docs/api/model-catalog.json",
# Disk cache TTL in hours. Beyond this, the CLI refetches on the
# next /model or `hermes model` invocation; network failures
# silently fall back to the stale cache.
"ttl_hours": 24,
# Optional per-provider override URLs for third parties that want
# to self-host their own curation list using the same schema.
# Example:
# providers:
# openrouter:
# url: https://example.com/my-curation.json
"providers": {},
},
# Network settings — workarounds for connectivity issues.
"network": {
# Force IPv4 connections. On servers with broken or unreachable IPv6,
@@ -1003,6 +1030,13 @@ DEFAULT_CONFIG = {
"min_interval_hours": 24,
},
# Contextual first-touch onboarding hints (see agent/onboarding.py).
# Each hint is shown once per install and then latched here so it
# never fires again. Users can wipe the section to re-see all hints.
"onboarding": {
"seen": {},
},
# Config schema version - bump this when adding new required fields
"_config_version": 22,
}
@@ -1379,6 +1413,21 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"AZURE_FOUNDRY_API_KEY": {
"description": "Azure Foundry API key for custom Azure endpoints",
"prompt": "Azure Foundry API Key",
"url": "https://ai.azure.com/",
"password": True,
"category": "provider",
},
"AZURE_FOUNDRY_BASE_URL": {
"description": "Azure Foundry base URL (set via 'hermes model' for endpoint-specific config)",
"prompt": "Azure Foundry base URL",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
# ── Tool API keys ──
"EXA_API_KEY": {
@@ -1546,6 +1595,44 @@ OPTIONAL_ENV_VARS = {
"category": "tool",
},
# ── Bundled skills (opt-in: only needed if the user uses that skill) ──
# These use category="skill" (distinct from "tool") so the sandbox
# env blocklist in tools/environments/local.py does NOT rewrite them —
# skills legitimately need these passed through to curl via
# tools/env_passthrough.py when the user's skill calls out.
"NOTION_API_KEY": {
"description": "Notion integration token (used by the `notion` skill)",
"prompt": "Notion API key",
"url": "https://www.notion.so/my-integrations",
"password": True,
"category": "skill",
"advanced": True,
},
"LINEAR_API_KEY": {
"description": "Linear personal API key (used by the `linear` skill)",
"prompt": "Linear API key",
"url": "https://linear.app/settings/api",
"password": True,
"category": "skill",
"advanced": True,
},
"AIRTABLE_API_KEY": {
"description": "Airtable personal access token (used by the `airtable` skill)",
"prompt": "Airtable API key",
"url": "https://airtable.com/create/tokens",
"password": True,
"category": "skill",
"advanced": True,
},
"TENOR_API_KEY": {
"description": "Tenor API key for GIF search (used by the `gif-search` skill)",
"prompt": "Tenor API key",
"url": "https://developers.google.com/tenor/guides/quickstart",
"password": True,
"category": "skill",
"advanced": True,
},
# ── Honcho ──
"HONCHO_API_KEY": {
"description": "Honcho API key for AI-native persistent memory",
@@ -2214,6 +2301,71 @@ def get_compatible_custom_providers(
return compatible
def get_custom_provider_context_length(
model: str,
base_url: str,
custom_providers: Optional[List[Dict[str, Any]]] = None,
config: Optional[Dict[str, Any]] = None,
) -> Optional[int]:
"""Look up a per-model ``context_length`` override from ``custom_providers``.
Matches any entry whose ``base_url`` equals ``base_url`` (trailing-slash
insensitive) and returns ``custom_providers[i].models.<model>.context_length``
if present and valid. Returns ``None`` when no override applies.
This is the single source of truth for custom-provider context overrides,
used by:
* ``AIAgent.__init__`` (startup resolution)
* ``AIAgent.switch_model`` (mid-session ``/model`` switch)
* ``hermes_cli.model_switch.resolve_display_context_length`` (``/model`` confirmation display)
* ``gateway.run._format_session_info`` (``/info`` display)
* ``agent.model_metadata.get_model_context_length`` (when custom_providers is threaded through)
Before this helper existed, the lookup was duplicated in ``run_agent.py``'s
startup path only; every other path (notably ``/model`` switch) fell back
to the 128K default. See #15779.
"""
if not model or not base_url:
return None
if custom_providers is None:
try:
custom_providers = get_compatible_custom_providers(config)
except Exception:
if config is None:
return None
raw = config.get("custom_providers")
custom_providers = raw if isinstance(raw, list) else []
if not isinstance(custom_providers, list):
return None
target_url = (base_url or "").rstrip("/")
if not target_url:
return None
for entry in custom_providers:
if not isinstance(entry, dict):
continue
entry_url = (entry.get("base_url") or "").rstrip("/")
if not entry_url or entry_url != target_url:
continue
models = entry.get("models")
if not isinstance(models, dict):
continue
model_cfg = models.get(model)
if not isinstance(model_cfg, dict):
continue
raw_ctx = model_cfg.get("context_length")
if raw_ctx is None:
continue
try:
ctx = int(raw_ctx)
except (TypeError, ValueError):
continue
if ctx > 0:
return ctx
return None
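A short usage sketch may help show what the override lookup accepts. The function below is a condensed standalone copy of the matching logic (URL match ignoring a trailing slash, then a per-model ``context_length``); the provider entry, URL, and model name are invented for the example.

```python
from typing import Any, Dict, List, Optional


def context_override(
    model: str, base_url: str, custom_providers: List[Dict[str, Any]]
) -> Optional[int]:
    """Return the per-model context_length override, or None if none applies."""
    target = base_url.rstrip("/")
    if not model or not target:
        return None
    for entry in custom_providers:
        if (entry.get("base_url") or "").rstrip("/") != target:
            continue
        models = entry.get("models")
        if not isinstance(models, dict):
            continue
        model_cfg = models.get(model)
        if not isinstance(model_cfg, dict):
            continue
        try:
            ctx = int(model_cfg.get("context_length"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric value: keep searching
        if ctx > 0:
            return ctx
    return None


# Hypothetical custom provider entry; note the trailing slash and string value.
providers = [
    {
        "name": "lab",
        "base_url": "http://10.0.0.5:8000/v1/",
        "models": {"my-model": {"context_length": "65536"}},
    }
]
```

Both the trailing slash on the configured URL and the string-typed value are tolerated, so hand-edited YAML does not need to be exact.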
def check_config_version() -> Tuple[int, int]:
"""
Check config version.
+5 -1
@@ -320,7 +320,11 @@ def run_doctor(args):
known_providers.add("custom:" + name.lower().replace(" ", "-"))
canonical_provider = provider
-if provider and _resolve_provider_full is not None and provider != "auto":
+if (
+provider
+and _resolve_provider_full is not None
+and provider not in ("auto", "custom")
+):
provider_def = _resolve_provider_full(provider, user_providers, custom_providers)
canonical_provider = provider_def.id if provider_def is not None else None
+361
@@ -0,0 +1,361 @@
"""
hermes fallback: manage the fallback provider chain.
Fallback providers are tried in order when the primary model fails with
rate-limit, overload, or connection errors. See:
https://hermes-agent.nousresearch.com/docs/user-guide/features/fallback-providers
Subcommands:
hermes fallback [list] Show the current fallback chain (default when no subcommand)
hermes fallback add Pick provider + model via the same picker as `hermes model`,
then append the selection to the chain
hermes fallback remove Pick an entry to delete from the chain
hermes fallback clear Remove all fallback entries
Storage: ``fallback_providers`` in ``~/.hermes/config.yaml`` (top-level, list of
``{provider, model, base_url?, api_mode?}`` dicts). The legacy single-dict
``fallback_model`` format is migrated to the new list format on first add.
"""
from __future__ import annotations
import copy
from typing import Any, Dict, List, Optional
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _read_chain(config: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Return the normalized fallback chain as a list of dicts.
Accepts both the new list format (``fallback_providers``) and the legacy
single-dict format (``fallback_model``). The returned list is always a
fresh copy; callers can mutate it without touching the config dict.
"""
chain = config.get("fallback_providers") or []
if isinstance(chain, list):
result = [dict(e) for e in chain if isinstance(e, dict) and e.get("provider") and e.get("model")]
if result:
return result
legacy = config.get("fallback_model")
if isinstance(legacy, dict) and legacy.get("provider") and legacy.get("model"):
return [dict(legacy)]
if isinstance(legacy, list):
return [dict(e) for e in legacy if isinstance(e, dict) and e.get("provider") and e.get("model")]
return []
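The normalization performed by `_read_chain` is easiest to see with both storage formats side by side. The sketch below is a standalone copy of the same logic under a different name; the provider names come from this diff, while the model ids are placeholders.

```python
from typing import Any, Dict, List


def _valid(entry: Any) -> bool:
    """An entry counts only if it is a dict with non-empty provider and model."""
    return isinstance(entry, dict) and bool(entry.get("provider")) and bool(entry.get("model"))


def read_chain(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Normalize new-format and legacy fallback config into a fresh list."""
    chain = config.get("fallback_providers") or []
    if isinstance(chain, list):
        result = [dict(e) for e in chain if _valid(e)]
        if result:
            return result
    legacy = config.get("fallback_model")
    if _valid(legacy):
        return [dict(legacy)]
    if isinstance(legacy, list):
        return [dict(e) for e in legacy if _valid(e)]
    return []


# New list format, with one malformed entry that gets filtered out.
cfg_new = {
    "fallback_providers": [
        {"provider": "nous", "model": "placeholder-model-a"},
        {"provider": "", "model": "ignored"},
    ]
}
# Legacy single-dict format.
cfg_legacy = {"fallback_model": {"provider": "openrouter", "model": "placeholder-model-b"}}
```

Because every entry is copied with `dict(e)`, appending to or editing the returned list leaves the original config untouched until the chain is written back explicitly.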
def _write_chain(config: Dict[str, Any], chain: List[Dict[str, Any]]) -> None:
"""Persist the chain to ``fallback_providers`` and clear legacy key."""
config["fallback_providers"] = chain
# Drop the legacy single-dict key on write so there's only one source of truth.
if "fallback_model" in config:
config.pop("fallback_model", None)
def _format_entry(entry: Dict[str, Any]) -> str:
"""One-line human-readable rendering of a fallback entry."""
provider = entry.get("provider", "?")
model = entry.get("model", "?")
base = entry.get("base_url")
suffix = f" [{base}]" if base else ""
return f"{model} (via {provider}){suffix}"
def _extract_fallback_from_model_cfg(model_cfg: Any) -> Optional[Dict[str, Any]]:
"""Pull the ``{provider, model, base_url?, api_mode?}`` dict from a ``config["model"]`` snapshot."""
if not isinstance(model_cfg, dict):
return None
provider = (model_cfg.get("provider") or "").strip()
# The picker writes the selected model to ``model.default``.
model = (model_cfg.get("default") or model_cfg.get("model") or "").strip()
if not provider or not model:
return None
entry: Dict[str, Any] = {"provider": provider, "model": model}
base_url = (model_cfg.get("base_url") or "").strip()
if base_url:
entry["base_url"] = base_url
api_mode = (model_cfg.get("api_mode") or "").strip()
if api_mode:
entry["api_mode"] = api_mode
return entry
def _snapshot_auth_active_provider() -> Any:
"""Return the current ``active_provider`` in auth.json, or ``None`` if unavailable."""
try:
from hermes_cli.auth import _load_auth_store
store = _load_auth_store()
return store.get("active_provider")
except Exception:
return None
def _restore_auth_active_provider(value: Any) -> None:
"""Write back a previously snapshotted ``active_provider`` value."""
try:
from hermes_cli.auth import _auth_store_lock, _load_auth_store, _save_auth_store
with _auth_store_lock():
store = _load_auth_store()
store["active_provider"] = value
_save_auth_store(store)
except Exception:
# Best-effort — if auth.json can't be restored, the user's primary
# provider may have been deactivated by the picker. They can re-run
# `hermes model` to fix it. Don't fail the fallback add.
pass
# ---------------------------------------------------------------------------
# Subcommand handlers
# ---------------------------------------------------------------------------
def cmd_fallback_list(args) -> None: # noqa: ARG001
"""Print the current fallback chain."""
from hermes_cli.config import load_config
config = load_config()
chain = _read_chain(config)
print()
if not chain:
print(" No fallback providers configured.")
print()
print(" Add one with: hermes fallback add")
print()
return
primary = _describe_primary(config)
if primary:
print(f" Primary: {primary}")
print()
print(f" Fallback chain ({len(chain)} {'entry' if len(chain) == 1 else 'entries'}):")
for i, entry in enumerate(chain, 1):
print(f" {i}. {_format_entry(entry)}")
print()
print(" Tried in order when the primary fails (rate-limit, 5xx, connection errors).")
print(" Docs: https://hermes-agent.nousresearch.com/docs/user-guide/features/fallback-providers")
print()
def _describe_primary(config: Dict[str, Any]) -> Optional[str]:
"""One-line description of the primary model for display purposes."""
model_cfg = config.get("model")
if isinstance(model_cfg, dict):
provider = (model_cfg.get("provider") or "?").strip() or "?"
model = (model_cfg.get("default") or model_cfg.get("model") or "?").strip() or "?"
return f"{model} (via {provider})"
if isinstance(model_cfg, str) and model_cfg.strip():
return model_cfg.strip()
return None
def cmd_fallback_add(args) -> None:
"""Launch the same picker as `hermes model`, then append the selection to the chain."""
from hermes_cli.main import _require_tty, select_provider_and_model
from hermes_cli.config import load_config, save_config
_require_tty("fallback add")
# Snapshot BEFORE the picker runs so we can distinguish "user actually
# picked something" from "user cancelled" by comparing before/after.
before_cfg = load_config()
model_before = copy.deepcopy(before_cfg.get("model"))
active_provider_before = _snapshot_auth_active_provider()
print()
print(" Adding a fallback provider. The picker below is the same one used by")
print(" `hermes model` — select the provider + model you want as a fallback.")
print()
try:
select_provider_and_model(args=args)
except SystemExit:
# Some provider flows exit on auth failure — restore state and re-raise.
_restore_model_cfg(model_before)
_restore_auth_active_provider(active_provider_before)
raise
# Read the post-picker state to see what the user selected.
after_cfg = load_config()
model_after = after_cfg.get("model")
new_entry = _extract_fallback_from_model_cfg(model_after)
if not new_entry:
# Picker didn't complete (user cancelled or flow bailed). Nothing to do.
_restore_model_cfg(model_before)
_restore_auth_active_provider(active_provider_before)
print()
print(" No fallback added.")
return
# Picker picked the same thing that's already the primary → nothing changed,
# and there's nothing useful to add as a fallback to itself.
primary_entry = _extract_fallback_from_model_cfg(model_before)
if primary_entry and primary_entry["provider"] == new_entry["provider"] \
and primary_entry["model"] == new_entry["model"]:
_restore_model_cfg(model_before)
_restore_auth_active_provider(active_provider_before)
print()
print(f" Selected model matches the current primary ({_format_entry(new_entry)}).")
print(" A provider cannot be a fallback for itself — no change.")
return
# Reload the config with the primary restored, then append the new entry
# to ``fallback_providers``. We deliberately re-load (rather than mutating
# ``after_cfg``) because the picker may have touched other top-level keys
# (custom_providers, providers credentials) that we want to keep.
_restore_model_cfg(model_before)
_restore_auth_active_provider(active_provider_before)
final_cfg = load_config()
chain = _read_chain(final_cfg)
# Reject exact-duplicate fallback entries.
for existing in chain:
if existing.get("provider") == new_entry["provider"] \
and existing.get("model") == new_entry["model"]:
print()
print(f" {_format_entry(new_entry)} is already in the fallback chain — skipped.")
return
chain.append(new_entry)
_write_chain(final_cfg, chain)
save_config(final_cfg)
print()
print(f" Added fallback: {_format_entry(new_entry)}")
print(f" Chain is now {len(chain)} {'entry' if len(chain) == 1 else 'entries'} long.")
print()
print(" Run `hermes fallback list` to view, or `hermes fallback remove` to delete.")
def _restore_model_cfg(model_before: Any) -> None:
"""Restore ``config["model"]`` to a previously-captured snapshot."""
from hermes_cli.config import load_config, save_config
cfg = load_config()
if model_before is None:
cfg.pop("model", None)
else:
cfg["model"] = copy.deepcopy(model_before)
save_config(cfg)
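The snapshot, pick, restore, append sequence in `cmd_fallback_add` can be modeled without any I/O. The sketch below works on an in-memory dict and takes the picker as a callable; `add_fallback_via_picker` and `_key` are hypothetical helpers, and the sketch ignores `base_url`, `api_mode`, and the auth store for brevity.

```python
import copy
from typing import Any, Callable, Dict, Optional, Tuple


def _key(model_cfg: Any) -> Optional[Tuple[str, str]]:
    """Return (provider, model) from a config["model"] snapshot, or None."""
    if not isinstance(model_cfg, dict):
        return None
    provider, model = model_cfg.get("provider"), model_cfg.get("default")
    return (provider, model) if provider and model else None


def add_fallback_via_picker(
    config: Dict[str, Any], picker: Callable[[Dict[str, Any]], None]
) -> bool:
    """Run a picker that overwrites config["model"], then restore the primary."""
    model_before = copy.deepcopy(config.get("model"))
    picker(config)
    new_key = _key(config.get("model"))

    # Always restore the primary, whatever the picker did.
    if model_before is None:
        config.pop("model", None)
    else:
        config["model"] = model_before

    if new_key is None or new_key == _key(model_before):
        return False  # cancelled, or same as the primary
    chain = config.setdefault("fallback_providers", [])
    if any((e.get("provider"), e.get("model")) == new_key for e in chain):
        return False  # already in the chain
    chain.append({"provider": new_key[0], "model": new_key[1]})
    return True


def pick_nous(cfg: Dict[str, Any]) -> None:
    """Hypothetical picker that selects a placeholder model on another provider."""
    cfg["model"] = {"provider": "nous", "default": "placeholder-model"}


config = {"model": {"provider": "openrouter", "default": "vendor/primary"}}
first = add_fallback_via_picker(config, pick_nous)
second = add_fallback_via_picker(config, pick_nous)
```

The design reuses an existing picker that has side effects on the primary model, so correctness depends on restoring the snapshot on every exit path, including cancellation.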
def cmd_fallback_remove(args) -> None: # noqa: ARG001
"""Pick an entry from the chain and remove it."""
from hermes_cli.config import load_config, save_config
config = load_config()
chain = _read_chain(config)
if not chain:
print()
print(" No fallback providers configured — nothing to remove.")
print()
return
choices = [_format_entry(e) for e in chain]
choices.append("Cancel")
try:
from hermes_cli.setup import _curses_prompt_choice
idx = _curses_prompt_choice("Select a fallback to remove:", choices, 0)
except Exception:
idx = _numbered_pick("Select a fallback to remove:", choices)
if idx is None or idx < 0 or idx >= len(chain):
print()
print(" Cancelled — no change.")
return
removed = chain.pop(idx)
_write_chain(config, chain)
save_config(config)
print()
print(f" Removed fallback: {_format_entry(removed)}")
if chain:
print(f" Chain is now {len(chain)} {'entry' if len(chain) == 1 else 'entries'} long.")
else:
print(" Fallback chain is now empty.")
print()
def cmd_fallback_clear(args) -> None: # noqa: ARG001
"""Remove all fallback entries (with confirmation)."""
from hermes_cli.config import load_config, save_config
config = load_config()
chain = _read_chain(config)
if not chain:
print()
print(" No fallback providers configured — nothing to clear.")
print()
return
print()
print(f" Current fallback chain ({len(chain)} {'entry' if len(chain) == 1 else 'entries'}):")
for i, entry in enumerate(chain, 1):
print(f" {i}. {_format_entry(entry)}")
print()
try:
resp = input(" Clear all entries? [y/N]: ").strip().lower()
except (KeyboardInterrupt, EOFError):
print()
print(" Cancelled.")
return
if resp not in ("y", "yes"):
print(" Cancelled — no change.")
return
_write_chain(config, [])
save_config(config)
print()
print(" Fallback chain cleared.")
print()
def _numbered_pick(question: str, choices: List[str]) -> Optional[int]:
"""Fallback numbered-list picker when curses is unavailable."""
print(question)
for i, c in enumerate(choices, 1):
print(f" {i}. {c}")
print()
while True:
try:
val = input(f"Choice [1-{len(choices)}]: ").strip()
if not val:
return None
idx = int(val) - 1
if 0 <= idx < len(choices):
return idx
print(f"Please enter 1-{len(choices)}")
except ValueError:
print("Please enter a number")
except (KeyboardInterrupt, EOFError):
print()
return None
# ---------------------------------------------------------------------------
# Dispatch
# ---------------------------------------------------------------------------
def cmd_fallback(args) -> None:
"""Top-level dispatcher for ``hermes fallback [subcommand]``."""
sub = getattr(args, "fallback_command", None)
if sub in (None, "", "list", "ls"):
cmd_fallback_list(args)
elif sub == "add":
cmd_fallback_add(args)
elif sub in ("remove", "rm"):
cmd_fallback_remove(args)
elif sub == "clear":
cmd_fallback_clear(args)
else:
print(f"Unknown fallback subcommand: {sub}")
print("Use one of: list, add, remove, clear")
raise SystemExit(2)
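The dispatcher reads ``args.fallback_command``, which implies a nested subparser whose ``dest`` has that name. The actual parser wiring is not part of this hunk, so the following is only an assumed sketch of how it could be set up with the standard library.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical `hermes fallback ...` parser tree."""
    parser = argparse.ArgumentParser(prog="hermes")
    top = parser.add_subparsers(dest="command")
    fallback = top.add_parser("fallback", help="Manage the fallback provider chain")
    sub = fallback.add_subparsers(dest="fallback_command")
    sub.add_parser("list", aliases=["ls"])
    sub.add_parser("add")
    sub.add_parser("remove", aliases=["rm"])
    sub.add_parser("clear")
    return parser


# No subcommand: dest stays None, which the dispatcher treats as "list".
args_default = build_parser().parse_args(["fallback"])
# With an alias, argparse stores the name as typed.
args_alias = build_parser().parse_args(["fallback", "rm"])
```

Since argparse stores the alias as typed, a dispatcher wired this way has to accept both spellings, which is consistent with the `("remove", "rm")` and `("list", "ls")` checks above.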
+24
@@ -2724,6 +2724,24 @@ _PLATFORMS = [
"help": "OpenID to deliver cron results and notifications to."},
],
},
{
"key": "yuanbao",
"label": "Yuanbao",
"emoji": "💎",
"token_var": "YUANBAO_APP_ID",
"setup_instructions": [
"1. Download the Yuanbao app from https://yuanbao.tencent.com/",
"2. In the app, go to PAI → My Bot and create a new bot",
"3. After the bot is created, copy the App ID and App Secret",
"4. Enter them below and Hermes will connect automatically over WebSocket",
],
"vars": [
{"name": "YUANBAO_APP_ID", "prompt": "App ID", "password": False,
"help": "The App ID from your Yuanbao IM Bot credentials."},
{"name": "YUANBAO_APP_SECRET", "prompt": "App Secret", "password": True,
"help": "The App Secret (used for HMAC signing) from your Yuanbao IM Bot."},
],
},
]
@@ -3108,6 +3126,12 @@ def _setup_wecom():
print_success("💬 WeCom configured!")
def _setup_yuanbao():
"""Configure Yuanbao via the standard platform setup."""
yuanbao_platform = next(p for p in _PLATFORMS if p["key"] == "yuanbao")
_setup_standard_platform(yuanbao_platform)
def _is_service_installed() -> bool:
"""Check if the gateway is installed as a system service."""
if supports_systemd_services():
+1
@@ -125,6 +125,7 @@ _DEFAULT_PAYLOADS = {
"task_id": "test-task",
"tool_call_id": "test-call",
"result": '{"output": "hello"}',
"duration_ms": 42,
},
"pre_llm_call": {
"session_id": "test-session",
+843 -45
File diff suppressed because it is too large
+329
@@ -0,0 +1,329 @@
"""Remote model catalog fetcher.
The Hermes docs site hosts a JSON manifest of curated models for providers
we want to update without shipping a release (currently OpenRouter and
Nous Portal). This module fetches, validates, and caches that manifest,
falling back to the in-repo hardcoded lists when the network is unavailable.
Pipeline
--------
1. ``get_catalog()`` returns a parsed manifest dict.
- Checks in-process cache (invalidated by TTL).
- Reads disk cache at ``~/.hermes/cache/model_catalog.json``.
- Fetches the master URL if disk cache is stale or missing.
- On any fetch failure, keeps using the stale cache (or empty dict).
2. ``get_curated_openrouter_models()`` / ``get_curated_nous_models()``:
thin accessors returning the shapes existing callers expect. Each
falls back to the in-repo hardcoded list on any lookup failure.
Schema (version 1)
------------------
::
{
"version": 1,
"updated_at": "2026-04-25T22:00:00Z",
"metadata": {...}, # free-form
"providers": {
"openrouter": {
"metadata": {...}, # free-form
"models": [
{"id": "vendor/model", "description": "recommended",
"metadata": {...}} # free-form, model-level
]
},
"nous": {...}
}
}
Unknown fields are ignored; extra metadata can be added at either level
without bumping ``version``. ``version`` bumps are reserved for
breaking changes (renaming ``providers``, changing ``models`` shape).
"""
from __future__ import annotations
import json
import logging
import os
import time
import urllib.error
import urllib.request
from pathlib import Path
from typing import Any
from hermes_cli import __version__ as _HERMES_VERSION
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
DEFAULT_CATALOG_URL = (
"https://hermes-agent.nousresearch.com/docs/api/model-catalog.json"
)
DEFAULT_TTL_HOURS = 24
DEFAULT_FETCH_TIMEOUT = 8.0
SUPPORTED_SCHEMA_VERSION = 1
_HERMES_USER_AGENT = f"hermes-cli/{_HERMES_VERSION}"
# In-process cache to avoid repeated disk + parse work across multiple
# calls within the same session. Invalidated by TTL against the disk file's
# mtime, so calling code never has to think about this.
_catalog_cache: dict[str, Any] | None = None
_catalog_cache_source_mtime: float = 0.0
# ---------------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------------
def _load_catalog_config() -> dict[str, Any]:
"""Load the ``model_catalog`` config block with defaults filled in."""
try:
from hermes_cli.config import load_config
cfg = load_config() or {}
except Exception:
cfg = {}
raw = cfg.get("model_catalog")
if not isinstance(raw, dict):
raw = {}
return {
"enabled": bool(raw.get("enabled", True)),
"url": str(raw.get("url") or DEFAULT_CATALOG_URL),
"ttl_hours": float(raw.get("ttl_hours") or DEFAULT_TTL_HOURS),
"providers": raw.get("providers") if isinstance(raw.get("providers"), dict) else {},
}
def _cache_path() -> Path:
"""Return the disk cache path. Import lazily so tests can monkeypatch home."""
from hermes_constants import get_hermes_home
return get_hermes_home() / "cache" / "model_catalog.json"
# ---------------------------------------------------------------------------
# Fetch + validate + cache
# ---------------------------------------------------------------------------
def _fetch_manifest(url: str, timeout: float) -> dict[str, Any] | None:
"""HTTP GET the manifest URL and return a parsed dict, or None on failure."""
try:
req = urllib.request.Request(
url,
headers={
"Accept": "application/json",
"User-Agent": _HERMES_USER_AGENT,
},
)
with urllib.request.urlopen(req, timeout=timeout) as resp:
data = json.loads(resp.read().decode())
except (urllib.error.URLError, TimeoutError, json.JSONDecodeError, OSError) as exc:
logger.info("model catalog fetch failed (%s): %s", url, exc)
return None
except Exception as exc: # pragma: no cover — defensive
logger.info("model catalog fetch errored (%s): %s", url, exc)
return None
if not _validate_manifest(data):
logger.info("model catalog at %s failed schema validation", url)
return None
return data
def _validate_manifest(data: Any) -> bool:
"""Return True when ``data`` matches the minimum manifest shape."""
if not isinstance(data, dict):
return False
version = data.get("version")
if not isinstance(version, int) or not (1 <= version <= SUPPORTED_SCHEMA_VERSION):
# Future schema version we don't understand — refuse rather than
# guess. Older schemas (version < 1) aren't supported either.
return False
providers = data.get("providers")
if not isinstance(providers, dict):
return False
for pname, pblock in providers.items():
if not isinstance(pname, str) or not isinstance(pblock, dict):
return False
models = pblock.get("models")
if not isinstance(models, list):
return False
for m in models:
if not isinstance(m, dict):
return False
if not isinstance(m.get("id"), str) or not m["id"].strip():
return False
return True
def _read_disk_cache() -> tuple[dict[str, Any] | None, float]:
"""Return ``(data_or_none, mtime)``. mtime is 0 if file is missing."""
path = _cache_path()
try:
mtime = path.stat().st_mtime
except (OSError, FileNotFoundError):
return (None, 0.0)
try:
with open(path) as fh:
data = json.load(fh)
except (OSError, json.JSONDecodeError):
return (None, 0.0)
if not _validate_manifest(data):
return (None, 0.0)
return (data, mtime)
def _write_disk_cache(data: dict[str, Any]) -> None:
path = _cache_path()
try:
path.parent.mkdir(parents=True, exist_ok=True)
tmp = path.with_suffix(path.suffix + ".tmp")
with open(tmp, "w") as fh:
json.dump(data, fh, indent=2)
fh.write("\n")
os.replace(tmp, path)
except OSError as exc:
logger.info("model catalog cache write failed: %s", exc)
# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------
def get_catalog(*, force_refresh: bool = False) -> dict[str, Any]:
"""Return the parsed model catalog manifest, or an empty dict on failure.
Callers should treat a missing provider/model as "use the in-repo fallback".
This function should never raise, so the CLI keeps working offline.
"""
global _catalog_cache, _catalog_cache_source_mtime
cfg = _load_catalog_config()
if not cfg["enabled"]:
return {}
ttl_seconds = max(0.0, cfg["ttl_hours"] * 3600.0)
disk_data, disk_mtime = _read_disk_cache()
now = time.time()
disk_fresh = disk_data is not None and (now - disk_mtime) < ttl_seconds
# In-process cache hit: disk hasn't changed since we loaded it and still fresh.
if (
not force_refresh
and _catalog_cache is not None
and disk_data is not None
and disk_mtime == _catalog_cache_source_mtime
and disk_fresh
):
return _catalog_cache
# Disk is fresh enough — use it without a network hit.
if not force_refresh and disk_fresh and disk_data is not None:
_catalog_cache = disk_data
_catalog_cache_source_mtime = disk_mtime
return disk_data
# Need to (re)fetch. If it fails, fall back to any stale disk copy.
fetched = _fetch_manifest(cfg["url"], DEFAULT_FETCH_TIMEOUT)
if fetched is not None:
_write_disk_cache(fetched)
new_disk_data, new_mtime = _read_disk_cache()
if new_disk_data is not None:
_catalog_cache = new_disk_data
_catalog_cache_source_mtime = new_mtime
return new_disk_data
_catalog_cache = fetched
_catalog_cache_source_mtime = now
return fetched
if disk_data is not None:
_catalog_cache = disk_data
_catalog_cache_source_mtime = disk_mtime
return disk_data
return {}
def _fetch_provider_override(provider: str) -> dict[str, Any] | None:
"""If ``model_catalog.providers.<name>.url`` is set, fetch that instead."""
cfg = _load_catalog_config()
if not cfg["enabled"]:
return None
provider_cfg = cfg["providers"].get(provider)
if not isinstance(provider_cfg, dict):
return None
override_url = provider_cfg.get("url")
if not isinstance(override_url, str) or not override_url.strip():
return None
# Override fetches skip the disk cache because they're usually
# third-party self-hosted. Re-request on every call but with a short
# timeout so they don't block the picker.
return _fetch_manifest(override_url.strip(), DEFAULT_FETCH_TIMEOUT)
def _get_provider_block(provider: str) -> dict[str, Any] | None:
"""Return the provider's manifest block, respecting per-provider overrides."""
override = _fetch_provider_override(provider)
if override is not None:
block = override.get("providers", {}).get(provider)
if isinstance(block, dict):
return block
catalog = get_catalog()
if not catalog:
return None
block = catalog.get("providers", {}).get(provider)
return block if isinstance(block, dict) else None
def get_curated_openrouter_models() -> list[tuple[str, str]] | None:
"""Return OpenRouter's curated ``[(id, description), ...]`` from the manifest.
Returns ``None`` when the manifest is unavailable, so callers can fall
back to their hardcoded list.
"""
block = _get_provider_block("openrouter")
if not block:
return None
out: list[tuple[str, str]] = []
for m in block.get("models", []):
mid = str(m.get("id") or "").strip()
if not mid:
continue
desc = str(m.get("description") or "")
out.append((mid, desc))
return out or None
def get_curated_nous_models() -> list[str] | None:
"""Return Nous Portal's curated list of model ids from the manifest.
Returns ``None`` when the manifest is unavailable.
"""
block = _get_provider_block("nous")
if not block:
return None
out: list[str] = []
for m in block.get("models", []):
mid = str(m.get("id") or "").strip()
if mid:
out.append(mid)
return out or None
def reset_cache() -> None:
"""Clear the in-process cache. Used by tests and ``hermes model --refresh``."""
global _catalog_cache, _catalog_cache_source_mtime
_catalog_cache = None
_catalog_cache_source_mtime = 0.0
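The disk-cache behaviour above (atomic write via a temporary file, then a freshness check of the file's mtime against the TTL) can be sketched in isolation. This is a minimal illustration and not the module itself: `write_cache`, `read_cache`, and `is_fresh` are hypothetical names, and schema validation is deliberately left out.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Any, Optional, Tuple


def write_cache(path: Path, data: dict) -> None:
    """Write JSON to a sibling .tmp file, then atomically swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(path.suffix + ".tmp")
    with open(tmp, "w") as fh:
        json.dump(data, fh, indent=2)
        fh.write("\n")
    # os.replace is atomic on the same filesystem, so readers never see a
    # half-written cache file.
    os.replace(tmp, path)


def read_cache(path: Path) -> Tuple[Optional[Any], float]:
    """Return (data, mtime), or (None, 0.0) when missing or unparsable."""
    try:
        mtime = path.stat().st_mtime
        with open(path) as fh:
            return json.load(fh), mtime
    except (OSError, json.JSONDecodeError):
        return None, 0.0


def is_fresh(mtime: float, ttl_hours: float, now: Optional[float] = None) -> bool:
    """True while the cache file is younger than the TTL."""
    now = time.time() if now is None else now
    ttl_seconds = max(0.0, ttl_hours * 3600.0)
    return (now - mtime) < ttl_seconds
```

With a TTL of zero, `is_fresh` is always false, so every call would fall through to a network fetch and only use the disk copy as a stale fallback.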
+39 -12
@@ -533,6 +533,7 @@ def resolve_display_context_length(
base_url: str = "",
api_key: str = "",
model_info: Optional[ModelInfo] = None,
custom_providers: list | None = None,
) -> Optional[int]:
"""Resolve the context length to show in /model output.
@@ -543,6 +544,11 @@ def resolve_display_context_length(
about Codex OAuth, Copilot, Nous, and falls back to models.dev for the
rest.
When ``custom_providers`` is provided, per-model ``context_length``
overrides from ``custom_providers[].models.<id>.context_length`` are
honored; this closes #15779, where ``/model`` switch ignored user-set
overrides.
Prefer the provider-aware value; fall back to ``model_info.context_window``
only if the resolver returns nothing.
"""
@@ -553,6 +559,7 @@ def resolve_display_context_length(
base_url=base_url or "",
api_key=api_key or "",
provider=provider or None,
custom_providers=custom_providers,
)
if ctx:
return int(ctx)
@@ -831,9 +838,14 @@ def switch_model(
requested=current_provider,
target_model=new_model,
)
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
api_mode = runtime.get("api_mode", "")
# If resolution fell through to "custom" (e.g. named custom provider like
# "ollama-launch" that resolve_runtime_provider doesn't know), keep existing
# credentials. Otherwise use the resolved values (picks up credential rotation,
# base_url adjustments for OpenCode, etc.).
if runtime.get("provider") != "custom":
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
api_mode = runtime.get("api_mode", "")
except Exception:
pass
@@ -867,16 +879,31 @@ def switch_model(
"message": f"Could not validate `{new_model}`: {e}",
}
# Override rejection if model is in the user's saved provider config.
# API /v1/models may not list cloud/aliased models even though the server supports them.
if not validation.get("accepted"):
msg = validation.get("message", "Invalid model")
return ModelSwitchResult(
success=False,
new_model=new_model,
target_provider=target_provider,
provider_label=provider_label,
is_global=is_global,
error_message=msg,
)
override = False
if user_providers:
for up in user_providers:
if isinstance(up, dict) and up.get("provider") == target_provider:
cfg_models = up.get("models", [])
if new_model in cfg_models or any(
m.get("name") == new_model for m in cfg_models if isinstance(m, dict)
):
override = True
break
if override:
validation = {"accepted": True, "persist": True, "recognized": False, "message": validation.get("message", "")}
else:
msg = validation.get("message", "Invalid model")
return ModelSwitchResult(
success=False,
new_model=new_model,
target_provider=target_provider,
provider_label=provider_label,
is_global=is_global,
error_message=msg,
)
# Apply auto-correction if validation found a closer match
if validation.get("corrected_model"):
+140 -62
@@ -33,8 +33,6 @@ COPILOT_REASONING_EFFORTS_O_SERIES = ["low", "medium", "high"]
# (model_id, display description shown in menus)
OPENROUTER_MODELS: list[tuple[str, str]] = [
("moonshotai/kimi-k2.6", "recommended"),
("deepseek/deepseek-v4-pro", ""),
("deepseek/deepseek-v4-flash", ""),
("anthropic/claude-opus-4.7", ""),
("anthropic/claude-opus-4.6", ""),
("anthropic/claude-sonnet-4.6", ""),
@@ -111,8 +109,6 @@ def _codex_curated_models() -> list[str]:
_PROVIDER_MODELS: dict[str, list[str]] = {
"nous": [
"moonshotai/kimi-k2.6",
"deepseek/deepseek-v4-pro",
"deepseek/deepseek-v4-flash",
"xiaomi/mimo-v2.5-pro",
"xiaomi/mimo-v2.5",
"anthropic/claude-opus-4.7",
@@ -383,6 +379,9 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"us.meta.llama4-maverick-17b-instruct-v1:0",
"us.meta.llama4-scout-17b-instruct-v1:0",
],
# Azure Foundry: user-provided endpoint and model.
# Empty list because models depend on the endpoint configuration.
"azure-foundry": [],
}
# Vercel AI Gateway: derive the bare-model-id catalog from the curated
@@ -740,6 +739,7 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("opencode-zen", "OpenCode Zen", "OpenCode Zen (35+ curated models, pay-as-you-go)"),
ProviderEntry("opencode-go", "OpenCode Go", "OpenCode Go (open models, $10/month subscription)"),
ProviderEntry("bedrock", "AWS Bedrock", "AWS Bedrock (Claude, Nova, Llama, DeepSeek — IAM or API key)"),
ProviderEntry("azure-foundry", "Azure Foundry", "Azure Foundry (OpenAI-style or Anthropic-style endpoint — your Azure AI deployment)"),
]
# Derived dicts — used throughout the codebase
@@ -872,7 +872,16 @@ def fetch_openrouter_models(
if _openrouter_catalog_cache is not None and not force_refresh:
return list(_openrouter_catalog_cache)
fallback = list(OPENROUTER_MODELS)
# Prefer the remotely-hosted catalog manifest; fall back to the in-repo
# snapshot when the manifest is unreachable. Both are curated lists that
# drive the picker; the OpenRouter live /v1/models filter (tool support,
# free pricing) is applied on top either way.
try:
from hermes_cli.model_catalog import get_curated_openrouter_models
remote = get_curated_openrouter_models()
except Exception:
remote = None
fallback = list(remote) if remote else list(OPENROUTER_MODELS)
preferred_ids = [mid for mid, _ in fallback]
try:
@@ -925,6 +934,24 @@ def model_ids(*, force_refresh: bool = False) -> list[str]:
return [mid for mid, _ in fetch_openrouter_models(force_refresh=force_refresh)]
def get_curated_nous_model_ids() -> list[str]:
"""Return the curated Nous Portal model-id list.
Prefers the remotely-hosted catalog manifest (published under
``website/static/api/model-catalog.json``); falls back to the in-repo
snapshot in ``_PROVIDER_MODELS["nous"]`` when the manifest is
unreachable. Always returns a list (never None).
"""
try:
from hermes_cli.model_catalog import get_curated_nous_models
remote = get_curated_nous_models()
except Exception:
remote = None
if remote:
return list(remote)
return list(_PROVIDER_MODELS.get("nous", []))
def _ai_gateway_model_is_free(pricing: Any) -> bool:
"""Return True if an AI Gateway model has $0 input AND output pricing."""
if not isinstance(pricing, dict):
@@ -1379,27 +1406,93 @@ def curated_models_for_provider(
return [(m, "") for m in models]
def detect_provider_for_model(
def _provider_keys(provider: str) -> set[str]:
key = (provider or "").strip().lower()
normalized = normalize_provider(provider)
return {k for k in (key, normalized) if k}
def _model_in_provider_catalog(name_lower: str, providers: set[str]) -> bool:
return any(
name_lower == model.lower()
for provider in providers
for model in _PROVIDER_MODELS.get(provider, [])
)
_AGGREGATOR_PROVIDERS = frozenset(
{"nous", "openrouter", "ai-gateway", "copilot", "kilocode"}
)
def _resolve_static_model_alias(
name_lower: str,
current_keys: set[str],
) -> Optional[tuple[str, str]]:
"""Resolve short aliases (e.g. sonnet/opus) using static catalogs only."""
try:
from hermes_cli.model_switch import MODEL_ALIASES
except Exception:
return None
identity = MODEL_ALIASES.get(name_lower)
if identity is None:
return None
vendor = identity.vendor
family = identity.family
def _match(provider: str) -> Optional[str]:
models = _PROVIDER_MODELS.get(provider, [])
if not models:
return None
prefix = (
f"{vendor}/{family}"
if provider in _AGGREGATOR_PROVIDERS
else family
).lower()
for model in models:
if model.lower().startswith(prefix):
return model
return None
for provider in current_keys:
if matched := _match(provider):
return provider, matched
for provider in _PROVIDER_MODELS:
if provider in current_keys or provider in _AGGREGATOR_PROVIDERS:
continue
if matched := _match(provider):
return provider, matched
for provider in _AGGREGATOR_PROVIDERS:
if provider in current_keys and (matched := _match(provider)):
return provider, matched
return None
def detect_static_provider_for_model(
model_name: str,
current_provider: str,
) -> Optional[tuple[str, str]]:
"""Auto-detect the best provider for a model name.
"""Auto-detect a provider from static catalogs only.
Returns ``(provider_id, model_name)``; the model name may be remapped
(e.g. bare ``deepseek-chat`` → ``deepseek/deepseek-chat`` for OpenRouter).
Returns ``(provider_id, model_name)``. The model name may be remapped
when a static alias or bare provider name resolves to a catalog default.
Returns ``None`` when no confident match is found.
Priority:
0. Bare provider name → switch to that provider's default model
1. Direct provider with credentials (highest)
2. Direct provider without credentials → remap to OpenRouter slug
3. OpenRouter catalog match
"""
name = (model_name or "").strip()
if not name:
return None
name_lower = name.lower()
current_keys = _provider_keys(current_provider)
alias_match = _resolve_static_model_alias(name_lower, current_keys)
if alias_match:
return alias_match
# --- Step 0: bare provider name typed as model ---
# If someone types `/model nous` or `/model anthropic`, treat it as a
@@ -1412,64 +1505,49 @@ def detect_provider_for_model(
if (
resolved_provider in _PROVIDER_LABELS
and default_models
and resolved_provider != normalize_provider(current_provider)
and resolved_provider not in current_keys
):
return (resolved_provider, default_models[0])
# Aggregators list other providers' models — never auto-switch TO them
_AGGREGATORS = {"nous", "openrouter", "ai-gateway", "copilot", "kilocode"}
# If the model belongs to the current provider's catalog, don't suggest switching
current_models = _PROVIDER_MODELS.get(current_provider, [])
if any(name_lower == m.lower() for m in current_models):
if _model_in_provider_catalog(name_lower, current_keys):
return None
# --- Step 1: check static provider catalogs for a direct match ---
direct_match: Optional[str] = None
for pid, models in _PROVIDER_MODELS.items():
if pid == current_provider or pid in _AGGREGATORS:
if pid in current_keys or pid in _AGGREGATOR_PROVIDERS:
continue
if any(name_lower == m.lower() for m in models):
direct_match = pid
break
return (pid, name)
if direct_match:
# Check if we have credentials for this provider — env vars,
# credential pool, or auth store entries.
has_creds = False
try:
from hermes_cli.auth import PROVIDER_REGISTRY
pconfig = PROVIDER_REGISTRY.get(direct_match)
if pconfig:
for env_var in pconfig.api_key_env_vars:
if os.getenv(env_var, "").strip():
has_creds = True
break
except Exception:
pass
# Also check credential pool and auth store — covers OAuth,
# Claude Code tokens, and other non-env-var credentials (#10300).
if not has_creds:
try:
from agent.credential_pool import load_pool
pool = load_pool(direct_match)
if pool.has_credentials():
has_creds = True
except Exception:
pass
if not has_creds:
try:
from hermes_cli.auth import _load_auth_store
store = _load_auth_store()
if direct_match in store.get("providers", {}) or direct_match in store.get("credential_pool", {}):
has_creds = True
except Exception:
pass
return None
# Always return the direct provider match. If credentials are
# missing, the client init will give a clear error rather than
# silently routing through the wrong provider (#10300).
return (direct_match, name)
def detect_provider_for_model(
model_name: str,
current_provider: str,
) -> Optional[tuple[str, str]]:
"""Auto-detect the best provider for a model name.
Returns ``(provider_id, model_name)``; the model name may be remapped
(e.g. bare ``deepseek-chat`` → ``deepseek/deepseek-chat`` for OpenRouter).
Returns ``None`` when no confident match is found.
Priority:
0. Bare provider name → switch to that provider's default model
1. Direct provider static catalog match
2. OpenRouter catalog match
"""
name = (model_name or "").strip()
if not name:
return None
static_match = detect_static_provider_for_model(name, current_provider)
if static_match:
return static_match
if _model_in_provider_catalog(name.lower(), _provider_keys(current_provider)):
return None
# --- Step 2: check OpenRouter catalog ---
# First try exact match (handles provider/model format)
@@ -2571,8 +2649,8 @@ def validate_requested_model(
)
return {
"accepted": False,
"persist": False,
"accepted": True,
"persist": True,
"recognized": False,
"message": message,
}
+16 -8
@@ -9,6 +9,7 @@ from typing import Dict, Iterable, Optional, Set
from hermes_cli.auth import get_nous_auth_status
from hermes_cli.config import get_env_value, load_config
from tools.managed_tool_gateway import is_managed_tool_gateway_ready
from utils import is_truthy_value
from tools.tool_backend_helpers import (
fal_key_is_configured,
has_direct_modal_credentials,
@@ -25,6 +26,13 @@ _DEFAULT_PLATFORM_TOOLSETS = {
}
def _uses_gateway(section: object) -> bool:
"""Return True when a config section explicitly opts into the gateway."""
if not isinstance(section, dict):
return False
return is_truthy_value(section.get("use_gateway"), default=False)
@dataclass(frozen=True)
class NousFeatureState:
key: str
@@ -262,11 +270,11 @@ def get_nous_subscription_features(
# use_gateway flags — when True, the user explicitly opted into the
# Tool Gateway via `hermes model`, so direct credentials should NOT
# prevent gateway routing.
web_use_gateway = bool(web_cfg.get("use_gateway"))
tts_use_gateway = bool(tts_cfg.get("use_gateway"))
browser_use_gateway = bool(browser_cfg.get("use_gateway"))
web_use_gateway = _uses_gateway(web_cfg)
tts_use_gateway = _uses_gateway(tts_cfg)
browser_use_gateway = _uses_gateway(browser_cfg)
image_gen_cfg = config.get("image_gen") if isinstance(config.get("image_gen"), dict) else {}
image_use_gateway = bool(image_gen_cfg.get("use_gateway"))
image_use_gateway = _uses_gateway(image_gen_cfg)
direct_exa = bool(get_env_value("EXA_API_KEY"))
direct_firecrawl = bool(get_env_value("FIRECRAWL_API_KEY") or get_env_value("FIRECRAWL_API_URL"))
@@ -601,10 +609,10 @@ def get_gateway_eligible_tools(
# no direct keys exist — we only skip the prompt for tools where
# use_gateway was explicitly set.
opted_in = {
"web": bool((config.get("web") if isinstance(config.get("web"), dict) else {}).get("use_gateway")),
"image_gen": bool((config.get("image_gen") if isinstance(config.get("image_gen"), dict) else {}).get("use_gateway")),
"tts": bool((config.get("tts") if isinstance(config.get("tts"), dict) else {}).get("use_gateway")),
"browser": bool((config.get("browser") if isinstance(config.get("browser"), dict) else {}).get("use_gateway")),
"web": _uses_gateway(config.get("web")),
"image_gen": _uses_gateway(config.get("image_gen")),
"tts": _uses_gateway(config.get("tts")),
"browser": _uses_gateway(config.get("browser")),
}
unconfigured: list[str] = []
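The switch from `bool(cfg.get("use_gateway"))` to `_uses_gateway(cfg)` is presumably motivated by string-valued settings: `bool("false")` is `True` in Python. The sketch below illustrates the difference. The real `utils.is_truthy_value` is not shown in this diff, so the version here is a hypothetical stand-in whose accepted strings are an assumption.

```python
def is_truthy_value(value, default=False):
    """Hypothetical stand-in for utils.is_truthy_value (assumed behaviour)."""
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        # Assumed set of truthy spellings; the real helper may differ.
        return value.strip().lower() in {"1", "true", "yes", "on"}
    return bool(value)


def uses_gateway(section):
    """Mirror of _uses_gateway: only an explicit opt-in counts."""
    if not isinstance(section, dict):
        return False
    return is_truthy_value(section.get("use_gateway"), default=False)


# A string "false" from a hand-edited config is truthy under plain bool().
legacy = bool({"use_gateway": "false"}.get("use_gateway"))
fixed = uses_gateway({"use_gateway": "false"})
```

Passing the raw section (which may be `None` or a non-dict) also removes the repeated `isinstance(..., dict)` guards at each call site.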
+202
@@ -0,0 +1,202 @@
"""Oneshot (-z) mode: send a prompt, get the final content block, exit.
Bypasses cli.py entirely. No banner, no spinner, no session_id line,
no stderr chatter. Just the agent's final text to stdout.
Toolsets = whatever the user has configured for "cli" in `hermes tools`.
Rules / memory / AGENTS.md / preloaded skills = same as a normal chat turn.
Approvals = auto-bypassed (HERMES_YOLO_MODE=1 is set for the call).
Working directory = the user's CWD (AGENTS.md etc. resolve from there as usual).
Model / provider selection mirrors `hermes chat`:
- Both optional. If omitted, use the user's configured default.
- If both given, pair them exactly as given.
- If only --model given, auto-detect the provider that serves it.
- If only --provider given, error out (ambiguous; caller must pick a model).
Env var fallbacks (used when the corresponding arg is not passed):
- HERMES_INFERENCE_MODEL
- HERMES_INFERENCE_PROVIDER (already read by resolve_runtime_provider)
"""
from __future__ import annotations
import logging
import os
import sys
from contextlib import redirect_stderr, redirect_stdout
from typing import Optional
def run_oneshot(
prompt: str,
model: Optional[str] = None,
provider: Optional[str] = None,
) -> int:
"""Execute a single prompt and print only the final content block.
Args:
prompt: The user message to send.
model: Optional model override. Falls back to HERMES_INFERENCE_MODEL
env var, then config.yaml's model.default / model.model.
provider: Optional provider override. Falls back to
HERMES_INFERENCE_PROVIDER env var, then config.yaml's model.provider,
then "auto".
Returns the exit code. Caller should sys.exit() with the return.
"""
# Silence every stdlib logger for the duration. AIAgent, tools, and
# provider adapters all log to stderr through the root logger; file
# handlers added by setup_logging() keep working (they're attached to
# the root logger's handler list, not affected by level), but no
# bytes reach the terminal.
logging.disable(logging.CRITICAL)
# --provider without --model is ambiguous: carrying the user's configured
# model across to a different provider is usually wrong (that provider may
# not host it), and silently picking the provider's catalog default hides
# the mismatch. Require the caller to be explicit. Validate BEFORE the
# stderr redirect so the message actually reaches the terminal.
env_model_early = os.getenv("HERMES_INFERENCE_MODEL", "").strip()
if provider and not ((model or "").strip() or env_model_early):
sys.stderr.write(
"hermes -z: --provider requires --model (or HERMES_INFERENCE_MODEL). "
"Pass both explicitly, or neither to use your configured defaults.\n"
)
return 2
# Auto-approve any shell / tool approvals. Non-interactive by
# definition — a prompt would hang forever.
os.environ["HERMES_YOLO_MODE"] = "1"
os.environ["HERMES_ACCEPT_HOOKS"] = "1"
# Redirect stderr AND stdout to devnull for the entire call tree.
# We'll print the final response to the real stdout at the end.
real_stdout = sys.stdout
devnull = open(os.devnull, "w")
try:
with redirect_stdout(devnull), redirect_stderr(devnull):
response = _run_agent(prompt, model=model, provider=provider)
finally:
try:
devnull.close()
except Exception:
pass
if response:
real_stdout.write(response)
if not response.endswith("\n"):
real_stdout.write("\n")
real_stdout.flush()
return 0
def _run_agent(
prompt: str,
model: Optional[str] = None,
provider: Optional[str] = None,
) -> str:
"""Build an AIAgent exactly like a normal CLI chat turn would, then
run a single conversation. Returns the final response string."""
# Imports are local so they don't run when hermes is invoked for
# other commands (keeps top-level CLI startup cheap).
from hermes_cli.config import load_config
from hermes_cli.models import detect_provider_for_model
from hermes_cli.runtime_provider import resolve_runtime_provider
from hermes_cli.tools_config import _get_platform_tools
from run_agent import AIAgent
cfg = load_config()
# Resolve effective model: explicit arg → env var → config.
model_cfg = cfg.get("model") or {}
if isinstance(model_cfg, str):
cfg_model = model_cfg
else:
cfg_model = model_cfg.get("default") or model_cfg.get("model") or ""
env_model = os.getenv("HERMES_INFERENCE_MODEL", "").strip()
effective_model = (model or "").strip() or env_model or cfg_model
# Resolve effective provider: explicit arg → (auto-detect from model if
# model was explicit) → env / config (handled inside resolve_runtime_provider).
#
# When --model is given without --provider, auto-detect the provider that
# serves that model — same semantic as `/model <name>` in an interactive
# session. Without this, resolve_runtime_provider() would fall back to
# the user's configured default provider, which may not host the model
# the caller just asked for.
effective_provider = (provider or "").strip() or None
if effective_provider is None and (model or env_model):
# Only auto-detect when the model was explicitly requested via arg or
# env var (not when it came from config — that's the "use my defaults"
# path and the configured provider is already correct).
explicit_model = (model or "").strip() or env_model
if explicit_model:
cfg_provider = ""
if isinstance(model_cfg, dict):
cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
current_provider = (
cfg_provider
or os.getenv("HERMES_INFERENCE_PROVIDER", "").strip().lower()
or "auto"
)
detected = detect_provider_for_model(explicit_model, current_provider)
if detected:
effective_provider, effective_model = detected
runtime = resolve_runtime_provider(
requested=effective_provider,
target_model=effective_model or None,
)
# Pull in whatever toolsets the user has enabled for "cli".
# sorted() gives stable ordering; set→list for AIAgent's signature.
toolsets_list = sorted(_get_platform_tools(cfg, "cli"))
agent = AIAgent(
api_key=runtime.get("api_key"),
base_url=runtime.get("base_url"),
provider=runtime.get("provider"),
api_mode=runtime.get("api_mode"),
model=effective_model,
enabled_toolsets=toolsets_list,
quiet_mode=True,
platform="cli",
credential_pool=runtime.get("credential_pool"),
# Interactive callbacks are intentionally NOT wired beyond this
# one. In oneshot mode there's no user sitting at a terminal:
# - clarify → returns a synthetic "pick a default" instruction
# so the agent continues instead of stalling on
# the tool's built-in "not available" error
# - sudo password prompt → terminal_tool gates on
# HERMES_INTERACTIVE which we never set
# - shell-hook approval → auto-approved via HERMES_ACCEPT_HOOKS=1
# (set above); also falls back to deny on non-tty
# - dangerous-command approval → bypassed via HERMES_YOLO_MODE=1
# - skill secret capture → returns gracefully when no callback set
clarify_callback=_oneshot_clarify_callback,
)
# Belt-and-braces: make sure AIAgent doesn't invoke any streaming
# display callbacks that would bypass our stdout capture.
agent.suppress_status_output = True
agent.stream_delta_callback = None
agent.tool_gen_callback = None
return agent.chat(prompt) or ""
def _oneshot_clarify_callback(question: str, choices=None) -> str:
"""Clarify is disabled in oneshot mode — tell the agent to pick a
default and proceed instead of stalling or erroring."""
if choices:
return (
f"[oneshot mode: no user available. Pick the best option from "
f"{choices} using your own judgment and continue.]"
)
return (
"[oneshot mode: no user available. Make the most reasonable "
"assumption you can and continue.]"
)
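The output-capture approach used by `run_oneshot` (silence everything inside the call tree, then print only the final text to the real stdout) can be shown on its own. This is a minimal sketch under assumptions: `run_quietly` and its `out` parameter are hypothetical names introduced for illustration, and the agent is replaced by an arbitrary callable.

```python
import io
import os
import sys
from contextlib import redirect_stderr, redirect_stdout


def run_quietly(fn, out=None):
    """Call fn() with stdout/stderr discarded, then emit only its result."""
    # Capture the real stream BEFORE redirecting, as run_oneshot does.
    real_stdout = out if out is not None else sys.stdout
    with open(os.devnull, "w") as devnull:
        with redirect_stdout(devnull), redirect_stderr(devnull):
            result = fn()
    if result:
        real_stdout.write(result)
        if not result.endswith("\n"):
            real_stdout.write("\n")
        real_stdout.flush()
    return result


def noisy():
    print("progress chatter")             # swallowed
    print("warning", file=sys.stderr)     # swallowed
    return "final answer"


buf = io.StringIO()
answer = run_quietly(noisy, out=buf)
```

Note that `redirect_stdout` only rebinds `sys.stdout` at the Python level; output written directly to file descriptor 1 by a subprocess or C extension would not be captured by this technique.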
+1
@@ -36,6 +36,7 @@ PLATFORMS: OrderedDict[str, PlatformInfo] = OrderedDict([
("wecom_callback", PlatformInfo(label="💬 WeCom Callback", default_toolset="hermes-wecom-callback")),
("weixin", PlatformInfo(label="💬 Weixin", default_toolset="hermes-weixin")),
("qqbot", PlatformInfo(label="💬 QQBot", default_toolset="hermes-qqbot")),
("yuanbao", PlatformInfo(label="🤖 Yuanbao", default_toolset="hermes-yuanbao")),
("webhook", PlatformInfo(label="🔗 Webhook", default_toolset="hermes-webhook")),
("api_server", PlatformInfo(label="🌐 API Server", default_toolset="hermes-api-server")),
("cron", PlatformInfo(label="⏰ Cron", default_toolset="hermes-cron")),
+6
@@ -167,6 +167,12 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
transport="openai_chat",
base_url_env_var="OLLAMA_BASE_URL",
),
# Azure Foundry: supports both OpenAI-style and Anthropic-style endpoints.
# The transport is determined at runtime from config.yaml model.api_mode.
"azure-foundry": HermesOverlay(
transport="openai_chat", # default; overridden by api_mode in config
base_url_env_var="AZURE_FOUNDRY_BASE_URL",
),
}
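For Azure Foundry, the runtime resolver in this change set strips a trailing `/v1` from the base URL when the endpoint is Anthropic-style, because the Anthropic SDK appends `/v1/messages` itself. A small sketch of that normalization follows; `normalize_base_url` is a hypothetical helper name and the URLs are placeholders, while the regular expression is the one used in the diff.

```python
import re


def normalize_base_url(base_url: str, api_mode: str) -> str:
    """Trim whitespace and trailing slashes; drop a trailing /v1 for Anthropic mode."""
    base_url = base_url.strip().rstrip("/")
    if api_mode == "anthropic_messages":
        # Avoid ".../v1/v1/messages" once the SDK adds its own path.
        base_url = re.sub(r"/v1/?$", "", base_url)
    return base_url


anthropic_url = normalize_base_url(
    "https://example.invalid/anthropic/v1/", "anthropic_messages"
)
openai_url = normalize_base_url(
    "https://example.invalid/openai/v1/", "chat_completions"
)
```

The pattern is anchored at the end of the string, so a path segment such as `/v1beta` is left untouched.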
+148 -7
@@ -221,6 +221,19 @@ def _resolve_runtime_from_pool_entry(
elif provider == "copilot":
api_mode = _copilot_runtime_api_mode(model_cfg, getattr(entry, "runtime_api_key", ""))
base_url = base_url or PROVIDER_REGISTRY["copilot"].inference_base_url
elif provider == "azure-foundry":
# Azure Foundry: read api_mode and base_url from config
cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
if cfg_provider == "azure-foundry":
cfg_base_url = str(model_cfg.get("base_url") or "").strip().rstrip("/")
if cfg_base_url:
base_url = cfg_base_url
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
if configured_mode:
api_mode = configured_mode
# For Anthropic-style endpoints, strip /v1 suffix
if api_mode == "anthropic_messages":
base_url = re.sub(r"/v1/?$", "", base_url)
else:
configured_provider = str(model_cfg.get("provider") or "").strip().lower()
# Honour model.base_url from config.yaml when the configured provider
@@ -589,6 +602,71 @@ def _resolve_openrouter_runtime(
}
def _resolve_azure_foundry_runtime(
*,
requested_provider: str,
model_cfg: Dict[str, Any],
explicit_api_key: Optional[str] = None,
explicit_base_url: Optional[str] = None,
) -> Dict[str, Any]:
"""Resolve an Azure Foundry runtime entry.
Reads ``model.base_url`` + ``model.api_mode`` from config.yaml (or
explicit overrides), pulls the API key from ``.env`` / env var, and
strips a trailing ``/v1`` for Anthropic-style endpoints because the
Anthropic SDK appends ``/v1/messages`` internally.
Raises :class:`AuthError` when required values are missing.
"""
explicit_api_key = str(explicit_api_key or "").strip()
explicit_base_url_clean = str(explicit_base_url or "").strip().rstrip("/")
cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
cfg_base_url = ""
cfg_api_mode = "chat_completions"
if cfg_provider == "azure-foundry":
cfg_base_url = str(model_cfg.get("base_url") or "").strip().rstrip("/")
cfg_api_mode = _parse_api_mode(model_cfg.get("api_mode")) or "chat_completions"
env_base_url = os.getenv("AZURE_FOUNDRY_BASE_URL", "").strip().rstrip("/")
base_url = explicit_base_url_clean or cfg_base_url or env_base_url
if not base_url:
raise AuthError(
"Azure Foundry requires a base URL. Set it via 'hermes model' or "
"the AZURE_FOUNDRY_BASE_URL environment variable."
)
api_key = explicit_api_key
if not api_key:
try:
from hermes_cli.config import get_env_value
api_key = get_env_value("AZURE_FOUNDRY_API_KEY") or ""
except Exception:
api_key = ""
if not api_key:
api_key = os.getenv("AZURE_FOUNDRY_API_KEY", "").strip()
if not api_key:
raise AuthError(
"Azure Foundry requires an API key. Set AZURE_FOUNDRY_API_KEY in "
"~/.hermes/.env or run 'hermes model' to configure."
)
# Anthropic SDK appends /v1/messages itself, so strip any trailing /v1
# we inherited from the configured base_url to avoid double-/v1 paths.
if cfg_api_mode == "anthropic_messages":
base_url = re.sub(r"/v1/?$", "", base_url)
source = "explicit" if (explicit_api_key or explicit_base_url) else "config"
return {
"provider": "azure-foundry",
"api_mode": cfg_api_mode,
"base_url": base_url,
"api_key": api_key,
"source": source,
"requested_provider": requested_provider,
}
def _resolve_explicit_runtime(
*,
provider: str,
@@ -678,6 +756,15 @@ def _resolve_explicit_runtime(
"requested_provider": requested_provider,
}
# Azure Foundry: user-configured endpoint with selectable API mode
if provider == "azure-foundry":
return _resolve_azure_foundry_runtime(
requested_provider=requested_provider,
model_cfg=model_cfg,
explicit_api_key=explicit_api_key,
explicit_base_url=explicit_base_url,
)
pconfig = PROVIDER_REGISTRY.get(provider)
if pconfig and pconfig.auth_type == "api_key":
env_url = ""
@@ -746,6 +833,40 @@ def resolve_runtime_provider(
"""
requested_provider = resolve_requested_provider(requested)
# Azure Anthropic short-circuit: when explicitly targeting an Azure endpoint
# with provider="anthropic", bypass _resolve_named_custom_runtime (which would
# return provider="custom" with chat_completions api_mode and no valid key).
# Instead, use the Azure key directly with anthropic_messages api_mode.
_eff_base = (explicit_base_url or "").strip()
if requested_provider == "anthropic" and "azure.com" in _eff_base:
_azure_key = (
(explicit_api_key or "").strip()
or os.getenv("AZURE_ANTHROPIC_KEY", "").strip()
or os.getenv("ANTHROPIC_API_KEY", "").strip()
)
return {
"provider": "anthropic",
"api_mode": "anthropic_messages",
"base_url": _eff_base.rstrip("/"),
"api_key": _azure_key,
"source": "azure-explicit",
"requested_provider": requested_provider,
}
# Azure Foundry: user-configured endpoint with selectable API mode
# (OpenAI-style chat_completions or Anthropic-style anthropic_messages).
# Resolve before the custom-runtime / pool / generic paths so Azure
# config is always picked up from model.base_url + model.api_mode,
# regardless of whether the caller passed explicit_* args.
if requested_provider == "azure-foundry":
azure_runtime = _resolve_azure_foundry_runtime(
requested_provider=requested_provider,
model_cfg=_get_model_config(),
explicit_api_key=explicit_api_key,
explicit_base_url=explicit_base_url,
)
return azure_runtime
custom_runtime = _resolve_named_custom_runtime(
requested_provider=requested_provider,
explicit_api_key=explicit_api_key,
@@ -924,13 +1045,6 @@ def resolve_runtime_provider(
# Anthropic (native Messages API)
if provider == "anthropic":
from agent.anthropic_adapter import resolve_anthropic_token
token = resolve_anthropic_token()
if not token:
raise AuthError(
"No Anthropic credentials found. Set ANTHROPIC_TOKEN or ANTHROPIC_API_KEY, "
"run 'claude setup-token', or authenticate with 'claude /login'."
)
# Allow base URL override from config.yaml model.base_url, but only
# when the configured provider is anthropic — otherwise a non-Anthropic
# base_url (e.g. Codex endpoint) would leak into Anthropic requests.
@@ -939,6 +1053,33 @@ def resolve_runtime_provider(
if cfg_provider == "anthropic":
cfg_base_url = (model_cfg.get("base_url") or "").strip().rstrip("/")
base_url = cfg_base_url or "https://api.anthropic.com"
# For Azure AI Foundry endpoints, use ANTHROPIC_API_KEY directly —
# Claude Code OAuth tokens (sk-ant-oat01) are not accepted by Azure.
# Azure keys don't start with "sk-ant-" so resolve_anthropic_token()
# would find the Claude Code OAuth token first (priority 3) and return
# that instead, causing 401s. Detect Azure endpoints and use the env
# key directly to bypass the OAuth priority chain.
_is_azure_endpoint = "azure.com" in base_url.lower() or (
cfg_base_url and "azure.com" in cfg_base_url.lower()
)
if _is_azure_endpoint:
token = (
os.getenv("AZURE_ANTHROPIC_KEY", "").strip()
or os.getenv("ANTHROPIC_API_KEY", "").strip()
)
if not token:
raise AuthError(
"No Azure Anthropic API key found. Set AZURE_ANTHROPIC_KEY or ANTHROPIC_API_KEY."
)
else:
from agent.anthropic_adapter import resolve_anthropic_token
token = resolve_anthropic_token()
if not token:
raise AuthError(
"No Anthropic credentials found. Set ANTHROPIC_TOKEN or ANTHROPIC_API_KEY, "
"run 'claude setup-token', or authenticate with 'claude /login'."
)
return {
"provider": "anthropic",
"api_mode": "anthropic_messages",
+96 -63
@@ -1856,27 +1856,32 @@ def _setup_slack():
if existing:
print_info("Slack: already configured")
if not prompt_yes_no("Reconfigure Slack?", False):
# Even without reconfiguring, offer to refresh the manifest so
# new commands (e.g. /btw, /stop, ...) get registered in Slack.
if prompt_yes_no(
"Regenerate the Slack app manifest with the latest command "
"list? (recommended after `hermes update`)",
True,
):
_write_slack_manifest_and_instruct()
return
print_info("Steps to create a Slack app:")
print_info(" 1. Go to https://api.slack.com/apps → Create New App (from scratch)")
print_info(" 1. Go to https://api.slack.com/apps → Create New App")
print_info(" Pick 'From an app manifest' — we'll generate one for you below.")
print_info(" 2. Enable Socket Mode: Settings → Socket Mode → Enable")
print_info(" • Create an App-Level Token with 'connections:write' scope")
print_info(" 3. Add Bot Token Scopes: Features → OAuth & Permissions")
print_info(" Required scopes: chat:write, app_mentions:read,")
print_info(" channels:history, channels:read, im:history,")
print_info(" im:read, im:write, users:read, files:read, files:write")
print_info(" Optional for private channels: groups:history")
print_info(" 4. Subscribe to Events: Features → Event Subscriptions → Enable")
print_info(" Required events: message.im, message.channels, app_mention")
print_info(" Optional for private channels: message.groups")
print_warning(" ⚠ Without message.channels the bot will ONLY work in DMs,")
print_warning(" not public channels.")
print_info(" 5. Install to Workspace: Settings → Install App")
print_info(" 6. Reinstall the app after any scope or event changes")
print_info(" 7. After installing, invite the bot to channels: /invite @YourBot")
print_info(" 3. Install to Workspace: Settings → Install App")
print_info(" 4. After installing, invite the bot to channels: /invite @YourBot")
print()
print_info(" Full guide: https://hermes-agent.nousresearch.com/docs/user-guide/messaging/slack/")
print()
# Generate and write manifest up-front so the user can paste it into
# the "Create from manifest" flow instead of clicking through scopes /
# events / slash commands one at a time.
_write_slack_manifest_and_instruct()
print()
bot_token = prompt("Slack Bot Token (xoxb-...)", password=True)
if not bot_token:
@@ -1902,6 +1907,49 @@ def _setup_slack():
print_info(" Set SLACK_ALLOW_ALL_USERS=true or GATEWAY_ALLOW_ALL_USERS=true only if you intentionally want open workspace access.")
def _write_slack_manifest_and_instruct():
"""Generate the Slack manifest, write it under HERMES_HOME, and print
paste-into-Slack instructions.
Exposed as its own helper so both the initial setup flow and the
"reconfigure? → no" branch can refresh the manifest without the user
re-entering tokens. Failures are non-fatal: if the manifest write
fails for any reason, we print a warning and skip rather than abort
the whole Slack setup.
"""
try:
from hermes_cli.slack_cli import _build_full_manifest
from hermes_constants import get_hermes_home
manifest = _build_full_manifest(
bot_name="Hermes",
bot_description="Your Hermes agent on Slack",
)
target = Path(get_hermes_home()) / "slack-manifest.json"
target.parent.mkdir(parents=True, exist_ok=True)
import json as _json
target.write_text(
_json.dumps(manifest, indent=2, ensure_ascii=False) + "\n",
encoding="utf-8",
)
print_success(f"Slack app manifest written to: {target}")
print_info(
" Paste it into https://api.slack.com/apps → your app → Features "
"→ App Manifest → Edit, then Save. Slack will prompt to "
"reinstall if scopes or slash commands changed."
)
print_info(
" Re-run `hermes slack manifest --write` anytime to refresh after "
"Hermes adds new commands."
)
except Exception as exc: # pragma: no cover - best-effort UX helper
print_warning(f"Couldn't write Slack manifest: {exc}")
print_info(
" You can generate it manually later with: "
"hermes slack manifest --write"
)
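The best-effort write in `_write_slack_manifest_and_instruct` can be sketched in isolation as follows. This is an illustration under assumptions: `write_manifest_best_effort` and its `home` parameter are hypothetical stand-ins, and plain `print` replaces the `print_warning` helper.

```python
import json
from pathlib import Path
from typing import Optional


def write_manifest_best_effort(manifest: dict, home: Path) -> Optional[Path]:
    """Write the manifest as pretty-printed UTF-8 JSON under ``home``.

    Returns the written path, or None when anything goes wrong, so the
    caller can carry on with the rest of its setup flow.
    """
    try:
        target = home / "slack-manifest.json"
        # Create the directory if it is missing; an existing one is fine.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(
            json.dumps(manifest, indent=2, ensure_ascii=False) + "\n",
            encoding="utf-8",
        )
        return target
    except Exception as exc:  # deliberately broad: this is a UX helper
        print(f"Couldn't write Slack manifest: {exc}")
        return None
```

Returning None instead of raising keeps the failure local: the caller decides whether a missing manifest matters.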
def _setup_matrix():
"""Configure Matrix credentials."""
print_header("Matrix")
@@ -2085,6 +2133,12 @@ def _setup_feishu():
_gateway_setup_feishu()
def _setup_yuanbao():
"""Configure Yuanbao via gateway setup."""
from hermes_cli.gateway import _setup_yuanbao as _gateway_setup_yuanbao
_gateway_setup_yuanbao()
def _setup_wecom():
"""Configure WeCom (Enterprise WeChat) via gateway setup."""
from hermes_cli.gateway import _setup_wecom as _gateway_setup_wecom
@@ -2229,6 +2283,7 @@ _GATEWAY_PLATFORMS = [
("WhatsApp", "WHATSAPP_ENABLED", _setup_whatsapp),
("DingTalk", "DINGTALK_CLIENT_ID", _setup_dingtalk),
("Feishu / Lark", "FEISHU_APP_ID", _setup_feishu),
("Yuanbao", "YUANBAO_APP_ID", _setup_yuanbao),
("WeCom (Enterprise WeChat)", "WECOM_BOT_ID", _setup_wecom),
("WeCom Callback (Self-Built App)", "WECOM_CALLBACK_CORP_ID", _setup_wecom_callback),
("Weixin (WeChat)", "WEIXIN_ACCOUNT_ID", _setup_weixin),
@@ -2863,17 +2918,6 @@ SETUP_SECTIONS = [
("agent", "Agent Settings", setup_agent_settings),
]
# The returning-user menu intentionally omits standalone TTS because model setup
# already includes TTS selection and tools setup covers the rest of the provider
# configuration. Keep this list in the same order as the visible menu entries.
RETURNING_USER_MENU_SECTION_KEYS = [
"model",
"terminal",
"gateway",
"tools",
"agent",
]
def run_setup_wizard(args):
"""Run the interactive setup wizard.
@@ -2898,6 +2942,9 @@ def run_setup_wizard(args):
save_config(copy.deepcopy(DEFAULT_CONFIG))
print_success("Configuration reset to defaults.")
reconfigure_requested = bool(getattr(args, "reconfigure", False))
quick_requested = bool(getattr(args, "quick", False))
config = load_config()
hermes_home = get_hermes_home()
@@ -2989,50 +3036,36 @@ def run_setup_wizard(args):
migration_ran = False
if is_existing:
# ── Returning User Menu ──
print()
print_header("Welcome Back!")
print_success("You already have Hermes configured.")
print()
menu_choices = [
"Quick Setup - configure missing items only",
"Full Setup - reconfigure everything",
"Model & Provider",
"Terminal Backend",
"Messaging Platforms (Gateway)",
"Tools",
"Agent Settings",
"Exit",
]
choice = prompt_choice("What would you like to do?", menu_choices, 0)
if choice == 0:
# Quick setup
# Existing install — default is the full-wizard reconfigure flow.
# Every prompt shows the current value as its default, so pressing
# Enter keeps it. Opt into `--quick` for the narrow "just fill in
# missing items" flow (useful after a partial OpenClaw migration
# or when a required API key got cleared).
if quick_requested:
_run_quick_setup(config, hermes_home)
return
elif choice == 1:
# Full setup — fall through to run all sections
pass
elif choice == 7:
print_info("Exiting. Run 'hermes setup' again when ready.")
return
elif 2 <= choice <= 6:
# Individual section — map by key, not by position.
# SETUP_SECTIONS includes TTS but the returning-user menu skips it,
# so positional indexing (choice - 2) would dispatch the wrong section.
section_key = RETURNING_USER_MENU_SECTION_KEYS[choice - 2]
section = next((s for s in SETUP_SECTIONS if s[0] == section_key), None)
if section:
_, label, func = section
func(config)
save_config(config)
_print_setup_summary(config, hermes_home)
return
print()
print_header("Reconfigure")
print_success("You already have Hermes configured.")
print_info("Running the full wizard — each prompt shows your current value.")
print_info("Press Enter to keep it, or type a new value to change it.")
print_info("")
print_info("Tip: jump straight to a section with 'hermes setup model|terminal|")
print_info(" gateway|tools|agent', or fill only missing items with --quick.")
# Fall through to the "Full Setup — run all sections" block below.
# --reconfigure is now the default on existing installs; the flag
# is preserved for backwards compatibility but is a no-op here.
else:
# ── First-Time Setup ──
print()
# --reconfigure / --quick on a fresh install are meaningless — fall
# through to the normal first-time flow.
if reconfigure_requested or quick_requested:
print_info("No existing configuration found — running first-time setup.")
print()
# Offer OpenClaw migration before configuration begins
migration_ran = _offer_openclaw_migration(hermes_home)
if migration_ran:
+230 -20
@@ -11,9 +11,10 @@ handler are thin wrappers that parse args and delegate.
"""
import json
import re
import shutil
from pathlib import Path
from typing import Any, Dict, Optional
from typing import Any, Dict, List, Optional
from rich.console import Console
from rich.panel import Panel
@@ -141,6 +142,103 @@ def _derive_category_from_install_path(install_path: str) -> str:
return "" if parent == "." else parent
# ---------------------------------------------------------------------------
# Interactive name/category resolution for URL-installed skills
# ---------------------------------------------------------------------------
_VALID_NAME_RE = re.compile(r"^[a-z][a-z0-9_-]*$")
_VALID_CATEGORY_RE = re.compile(r"^[a-z][a-z0-9_/-]*$")
def _is_valid_installed_skill_name(name: str) -> bool:
"""Accept identifier-shaped names, reject empty / sentinel-y values."""
if not isinstance(name, str):
return False
candidate = name.strip().lower()
if not candidate or candidate in {"skill", "readme", "index", "unnamed-skill"}:
return False
return bool(_VALID_NAME_RE.match(candidate))
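To make the accepted and rejected shapes concrete, here is a standalone copy of the check above with a few sample inputs. The function and constant names are shortened for the sketch; the regular expression and the sentinel set are the ones shown in the diff.

```python
import re

_NAME_RE = re.compile(r"^[a-z][a-z0-9_-]*$")
_SENTINELS = {"skill", "readme", "index", "unnamed-skill"}


def is_valid_skill_name(name) -> bool:
    """Identifier-shaped names pass; empty and placeholder names do not."""
    if not isinstance(name, str):
        return False
    # Validation runs on the stripped, lowercased form of the input.
    candidate = name.strip().lower()
    if not candidate or candidate in _SENTINELS:
        return False
    return bool(_NAME_RE.match(candidate))


print(is_valid_skill_name("pdf-tools"))   # True
print(is_valid_skill_name("README"))      # False (placeholder name)
print(is_valid_skill_name("9lives"))      # False (must start with a letter)
print(is_valid_skill_name("My_Skill"))    # True (checked after lowercasing)
```

Because the check lowercases before matching, a mixed-case name passes validation; whether the stored name should also be lowercased is left to the caller.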
def _existing_categories() -> List[str]:
"""Return sorted subdirectory names under ``~/.hermes/skills/`` that look
like category buckets (contain at least one ``SKILL.md`` somewhere below).
Used to suggest reusable categories when interactively installing from a
URL. Hidden dirs (``.hub``, ``.trash``) are skipped.
"""
from tools.skills_hub import SKILLS_DIR
out: List[str] = []
try:
for entry in SKILLS_DIR.iterdir():
if not entry.is_dir() or entry.name.startswith("."):
continue
# Only count as a category if it contains skills, not if it IS a skill.
# Heuristic: if ``<entry>/SKILL.md`` exists, it's a skill at the
# top level (no category); otherwise treat as a category bucket.
if (entry / "SKILL.md").exists():
continue
# Has at least one nested SKILL.md?
try:
if any(entry.rglob("SKILL.md")):
out.append(entry.name)
except OSError:
continue
except (FileNotFoundError, OSError):
return []
return sorted(set(out))
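The category heuristic above can be exercised against a throwaway directory tree. The sketch below takes the skills directory as a parameter instead of importing `SKILLS_DIR`; that parameter and the directory names in the demo are assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import List


def existing_categories(skills_dir: Path) -> List[str]:
    """Top-level directories that hold skills but are not skills themselves."""
    out: List[str] = []
    try:
        for entry in skills_dir.iterdir():
            if not entry.is_dir() or entry.name.startswith("."):
                continue  # plain files and hidden dirs are never categories
            if (entry / "SKILL.md").exists():
                continue  # a skill installed flat at the top level
            if any(entry.rglob("SKILL.md")):
                out.append(entry.name)  # at least one nested skill
    except OSError:
        return []
    return sorted(set(out))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("devops/k8s", "flat-skill", ".hub/cached"):
        (root / rel).mkdir(parents=True)
        (root / rel / "SKILL.md").write_text("---\n---\n", encoding="utf-8")
    (root / "empty").mkdir()
    print(existing_categories(root))  # ['devops']
```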
def _prompt_for_skill_name(c: Console, url: str, default: str = "") -> Optional[str]:
"""Prompt interactively for a skill name. Returns None on cancel/EOF."""
c.print()
c.print(
f"[yellow]The SKILL.md at {url} doesn't declare a `name:` in its "
f"frontmatter,[/]\n[yellow]and the URL path doesn't produce a valid "
f"identifier either.[/]"
)
default_hint = f" [{default}]" if default else ""
c.print(
f"[bold]Enter a skill name{default_hint}:[/] "
f"[dim](lowercase letters, digits, hyphens, underscores; starts with a letter)[/]"
)
try:
answer = input("Name: ").strip()
except (EOFError, KeyboardInterrupt):
return None
if not answer and default:
answer = default
if not _is_valid_installed_skill_name(answer):
c.print(f"[bold red]Invalid name:[/] {answer!r}. Aborting install.\n")
return None
return answer
def _prompt_for_category(c: Console, existing: List[str]) -> str:
"""Prompt interactively for a category. Empty/None input means flat install."""
c.print()
if existing:
c.print(
"[bold]Pick a category[/] "
"[dim](reuse an existing bucket, type a new one, or press Enter to install flat)[/]"
)
c.print(f"[dim]Existing: {', '.join(existing)}[/]")
else:
c.print(
"[bold]Category[/] [dim](optional — press Enter to install flat at ~/.hermes/skills/<name>/)[/]"
)
try:
answer = input("Category: ").strip()
except (EOFError, KeyboardInterrupt):
return ""
if not answer:
return ""
if not _VALID_CATEGORY_RE.match(answer):
c.print(f"[dim]Invalid category {answer!r} — installing flat.[/]")
return ""
return answer
def do_search(query: str, source: str = "all", limit: int = 10,
console: Optional[Console] = None) -> None:
"""Search registries and display results as a Rich table."""
@@ -309,8 +407,17 @@ def do_browse(page: int = 1, page_size: int = 20, source: str = "all",
def do_install(identifier: str, category: str = "", force: bool = False,
console: Optional[Console] = None, skip_confirm: bool = False,
invalidate_cache: bool = True) -> None:
"""Fetch, quarantine, scan, confirm, and install a skill."""
invalidate_cache: bool = True,
name_override: str = "") -> None:
"""Fetch, quarantine, scan, confirm, and install a skill.
``name_override`` lets non-interactive callers (slash commands, gateway,
scripts) supply a skill name when the upstream SKILL.md lacks a valid
``name:`` frontmatter field. On interactive TTY surfaces, a missing name
triggers a prompt instead; ``skip_confirm=True`` means "non-interactive"
(so pair it with ``name_override`` when installing from a URL that has
no frontmatter).
"""
from tools.skills_hub import (
GitHubAuth, create_source_router, ensure_hub_dirs,
quarantine_bundle, install_from_quarantine, HubLockFile,
@@ -354,6 +461,58 @@ def do_install(identifier: str, category: str = "", force: bool = False,
c.print()
return
# URL-sourced skills may arrive with an empty name when SKILL.md has no
# ``name:`` in frontmatter AND the URL path doesn't yield a valid
# identifier. Resolve by (1) --name override, (2) interactive prompt on
# a TTY, (3) refuse with an actionable error on non-interactive surfaces.
bundle_meta = getattr(bundle, "metadata", {}) or {}
if bundle.source == "url" and (not bundle.name or bundle_meta.get("awaiting_name")):
if name_override and _is_valid_installed_skill_name(name_override):
bundle.name = name_override.strip()
bundle_meta["awaiting_name"] = False
elif name_override:
c.print(
f"[bold red]Invalid --name:[/] {name_override!r}. "
"Must be a lowercase identifier (letters, digits, hyphens, "
"underscores; starts with a letter).\n"
)
return
elif skip_confirm:
# Non-interactive surface (slash command / TUI / gateway). Can't
# prompt — emit an actionable error.
url = bundle_meta.get("url") or identifier
c.print(
f"[bold red]Cannot install from URL:[/] {url}\n"
"[yellow]The SKILL.md has no `name:` in its frontmatter, "
"and the URL path doesn't produce a valid identifier.[/]\n\n"
"Retry with an explicit name:\n"
f" [bold]/skills install {url} --name <your-name>[/]\n"
f" [bold]hermes skills install {url} --name <your-name>[/]\n\n"
"[dim]Or ask the SKILL.md's author to add a `name:` field to "
"its YAML frontmatter.[/]\n"
)
return
else:
# Interactive TTY — prompt.
url = bundle_meta.get("url") or identifier
chosen = _prompt_for_skill_name(c, url)
if not chosen:
c.print("[dim]Installation cancelled.[/]\n")
return
bundle.name = chosen
bundle_meta["awaiting_name"] = False
# Keep SkillMeta in sync so downstream "already installed" checks,
# audit logs, and display all see the final name.
if meta is not None:
meta.name = bundle.name
meta.path = bundle.name
# URL-sourced skills: offer to pick a category interactively when the
# caller didn't specify one (TTY only — non-interactive installs fall
# through to flat install, matching all other sources).
if bundle.source == "url" and not category and not skip_confirm:
category = _prompt_for_category(c, _existing_categories())
# Auto-detect category for official skills (e.g. "official/autonomous-ai-agents/blackbox")
if bundle.source == "official" and not category:
id_parts = bundle.identifier.split("/") # ["official", "category", "skill"]
@@ -599,11 +758,24 @@ def inspect_skill(identifier: str) -> Optional[dict]:
return out
def do_list(source_filter: str = "all", console: Optional[Console] = None) -> None:
"""List installed skills, distinguishing hub, builtin, and local skills."""
def do_list(source_filter: str = "all",
enabled_only: bool = False,
console: Optional[Console] = None) -> None:
"""List installed skills, distinguishing hub, builtin, and local skills.
Args:
source_filter: ``all`` | ``hub`` | ``builtin`` | ``local``.
enabled_only: If True, hide disabled skills from the output.
Enabled/disabled state is resolved against the currently active profile's
config: ``hermes -p <profile> skills list`` reads that profile's
``skills.disabled`` list because ``-p`` swaps ``HERMES_HOME`` at process
start. No explicit profile flag needed here.
"""
from tools.skills_hub import HubLockFile, ensure_hub_dirs
from tools.skills_sync import _read_manifest
from tools.skills_tool import _find_all_skills
from agent.skill_utils import get_disabled_skill_names
c = console or _console
ensure_hub_dirs()
@@ -611,17 +783,26 @@ def do_list(source_filter: str = "all", console: Optional[Console] = None) -> No
hub_installed = {e["name"]: e for e in lock.list_installed()}
builtin_names = set(_read_manifest())
all_skills = _find_all_skills()
# Pull ALL skills (including disabled ones) so we can annotate status.
all_skills = _find_all_skills(skip_disabled=True)
disabled_names = get_disabled_skill_names()
table = Table(title="Installed Skills")
title = "Installed Skills"
if enabled_only:
title += " (enabled only)"
table = Table(title=title)
table.add_column("Name", style="bold cyan")
table.add_column("Category", style="dim")
table.add_column("Source", style="dim")
table.add_column("Trust", style="dim")
table.add_column("Status", style="dim")
hub_count = 0
builtin_count = 0
local_count = 0
enabled_count = 0
disabled_count = 0
for skill in sorted(all_skills, key=lambda s: (s.get("category") or "", s["name"])):
name = skill["name"]
@@ -632,29 +813,48 @@ def do_list(source_filter: str = "all", console: Optional[Console] = None) -> No
source_type = "hub"
source_display = hub_entry.get("source", "hub")
trust = hub_entry.get("trust_level", "community")
hub_count += 1
elif name in builtin_names:
source_type = "builtin"
source_display = "builtin"
trust = "builtin"
builtin_count += 1
else:
source_type = "local"
source_display = "local"
trust = "local"
local_count += 1
if source_filter != "all" and source_filter != source_type:
continue
is_enabled = name not in disabled_names
if enabled_only and not is_enabled:
continue
if source_type == "hub":
hub_count += 1
elif source_type == "builtin":
builtin_count += 1
else:
local_count += 1
if is_enabled:
enabled_count += 1
status_cell = "[bold green]enabled[/]"
else:
disabled_count += 1
status_cell = "[dim red]disabled[/]"
trust_style = {"builtin": "bright_cyan", "trusted": "green", "community": "yellow", "local": "dim"}.get(trust, "dim")
trust_label = "official" if source_display == "official" else trust
table.add_row(name, category, source_display, f"[{trust_style}]{trust_label}[/]")
table.add_row(name, category, source_display, f"[{trust_style}]{trust_label}[/]", status_cell)
c.print(table)
c.print(
f"[dim]{hub_count} hub-installed, {builtin_count} builtin, {local_count} local[/]\n"
)
summary = f"[dim]{hub_count} hub-installed, {builtin_count} builtin, {local_count} local"
if enabled_only:
summary += f", {enabled_count} enabled shown"
else:
summary += f", {enabled_count} enabled, {disabled_count} disabled"
summary += "[/]\n"
c.print(summary)
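The reordered counting in `do_list` (filter first, count afterwards) can be sketched without Rich or the skills registry. Everything here is a simplified stand-in: `summarize_skills`, the `source` key on each record, and the comma separator before the enabled counts are assumptions of the sketch, not the project's code.

```python
from typing import Dict, Iterable, Set


def summarize_skills(
    skills: Iterable[Dict[str, str]],
    disabled: Set[str],
    source_filter: str = "all",
    enabled_only: bool = False,
) -> str:
    """Count only rows that survive both filters, then build the summary."""
    by_source = {"hub": 0, "builtin": 0, "local": 0}
    enabled = 0
    disabled_count = 0
    for skill in skills:
        if source_filter != "all" and skill["source"] != source_filter:
            continue
        is_enabled = skill["name"] not in disabled
        if enabled_only and not is_enabled:
            continue
        # Counters are bumped after filtering so they match the visible rows.
        by_source[skill["source"]] += 1
        if is_enabled:
            enabled += 1
        else:
            disabled_count += 1
    summary = (
        f"{by_source['hub']} hub-installed, "
        f"{by_source['builtin']} builtin, {by_source['local']} local"
    )
    if enabled_only:
        summary += f", {enabled} enabled shown"
    else:
        summary += f", {enabled} enabled, {disabled_count} disabled"
    return summary
```

Counting after the filters means the totals always describe the table that was printed, not the full inventory.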
def do_check(name: Optional[str] = None, console: Optional[Console] = None) -> None:
@@ -1123,11 +1323,15 @@ def skills_command(args) -> None:
do_search(args.query, source=args.source, limit=args.limit)
elif action == "install":
do_install(args.identifier, category=args.category, force=args.force,
skip_confirm=getattr(args, "yes", False))
skip_confirm=getattr(args, "yes", False),
name_override=getattr(args, "name", "") or "")
elif action == "inspect":
do_inspect(args.identifier)
elif action == "list":
do_list(source_filter=args.source)
do_list(
source_filter=args.source,
enabled_only=getattr(args, "enabled_only", False),
)
elif action == "check":
do_check(name=getattr(args, "name", None))
elif action == "update":
@@ -1177,6 +1381,7 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
/skills search kubernetes
/skills install openai/skills/skill-creator
/skills install openai/skills/skill-creator --force
/skills install https://example.com/path/SKILL.md
/skills inspect openai/skills/skill-creator
/skills list
/skills list --source hub
@@ -1253,10 +1458,11 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
elif action == "install":
if not args:
c.print("[bold red]Usage:[/] /skills install <identifier> [--category <cat>] [--force] [--now]\n")
c.print("[bold red]Usage:[/] /skills install <identifier-or-url> [--name <name>] [--category <cat>] [--force] [--now]\n")
return
identifier = args[0]
category = ""
name_override = ""
# Slash commands run inside prompt_toolkit where input() hangs.
# Always skip confirmation — the user typing the command is implicit consent.
skip_confirm = True
@@ -1267,9 +1473,11 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
for i, a in enumerate(args):
if a == "--category" and i + 1 < len(args):
category = args[i + 1]
elif a == "--name" and i + 1 < len(args):
name_override = args[i + 1]
do_install(identifier, category=category, force=force,
skip_confirm=skip_confirm, invalidate_cache=invalidate_cache,
console=c)
name_override=name_override, console=c)
elif action == "inspect":
if not args:
@@ -1279,11 +1487,12 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
elif action == "list":
source_filter = "all"
enabled_only = "--enabled-only" in args or "--enabled" in args
if "--source" in args:
idx = args.index("--source")
if idx + 1 < len(args):
source_filter = args[idx + 1]
do_list(source_filter=source_filter, console=c)
do_list(source_filter=source_filter, enabled_only=enabled_only, console=c)
elif action == "check":
name = args[0] if args else None
@@ -1371,7 +1580,8 @@ def _print_skills_help(console: Console) -> None:
" [cyan]search[/] <query> Search registries for skills\n"
" [cyan]install[/] <identifier> Install a skill (with security scan)\n"
" [cyan]inspect[/] <identifier> Preview a skill without installing\n"
" [cyan]list[/] [--source hub|builtin|local] List installed skills\n"
" [cyan]list[/] [--source hub|builtin|local] [--enabled-only]\n"
" List installed skills; --enabled-only filters to the active profile's live set\n"
" [cyan]check[/] [name] Check hub skills for upstream updates\n"
" [cyan]update[/] [name] Update hub skills with upstream changes\n"
" [cyan]audit[/] [name] Re-scan hub skills for security\n"
+152
@@ -0,0 +1,152 @@
"""``hermes slack ...`` CLI subcommands.
Today only ``hermes slack manifest`` is implemented; it generates the
Slack app manifest JSON for registering every gateway command as a native
Slack slash (``/btw``, ``/stop``, ``/model``, ...) so users get the same
first-class slash UX Discord and Telegram already have.
Typical workflow::
$ hermes slack manifest > slack-manifest.json
# or:
$ hermes slack manifest --write
Then paste the printed JSON into the Slack app config (Features → App
Manifest → Edit) and click Save. Slack diffs the manifest and prompts
for reinstall when scopes/commands change.
"""
from __future__ import annotations
import json
import sys
from pathlib import Path
def _build_full_manifest(bot_name: str, bot_description: str) -> dict:
"""Build a full Slack manifest merging display info + our slash list.
The slash-command list is always generated from ``COMMAND_REGISTRY`` so
it stays in sync with the rest of Hermes. Other manifest sections
(display info, OAuth scopes, socket mode) are set to sensible defaults
for a Hermes deployment; users can tweak them in the Slack UI after
pasting.
"""
from hermes_cli.commands import slack_app_manifest
partial = slack_app_manifest()
slashes = partial["features"]["slash_commands"]
return {
"_metadata": {
"major_version": 1,
"minor_version": 1,
},
"display_information": {
"name": bot_name[:35],
"description": (bot_description or "Your Hermes agent on Slack")[:140],
"background_color": "#1a1a2e",
},
"features": {
"bot_user": {
"display_name": bot_name[:80],
"always_online": True,
},
"slash_commands": slashes,
"assistant_view": {
"assistant_description": "Chat with Hermes in threads and DMs.",
},
},
"oauth_config": {
"scopes": {
"bot": [
"app_mentions:read",
"assistant:write",
"channels:history",
"channels:read",
"chat:write",
"commands",
"files:read",
"files:write",
"groups:history",
"im:history",
"im:read",
"im:write",
"users:read",
],
},
},
"settings": {
"event_subscriptions": {
"bot_events": [
"app_mention",
"assistant_thread_context_changed",
"assistant_thread_started",
"message.channels",
"message.groups",
"message.im",
],
},
"interactivity": {
"is_enabled": True,
},
"org_deploy_enabled": False,
"socket_mode_enabled": True,
"token_rotation_enabled": False,
},
}
def slack_manifest_command(args) -> int:
"""Print or write a Slack app manifest JSON.
Flags (all parsed in ``hermes_cli/main.py``):
--write [PATH] Write to file instead of stdout (default path:
``$HERMES_HOME/slack-manifest.json``)
--name NAME Override the bot display name (default: "Hermes")
--description DESC Override the bot description
--slashes-only Emit only the ``features.slash_commands`` array (for
merging into an existing manifest manually)
"""
name = getattr(args, "name", None) or "Hermes"
description = getattr(args, "description", None) or "Your Hermes agent on Slack"
if getattr(args, "slashes_only", False):
from hermes_cli.commands import slack_app_manifest
manifest = slack_app_manifest()["features"]["slash_commands"]
else:
manifest = _build_full_manifest(name, description)
payload = json.dumps(manifest, indent=2, ensure_ascii=False) + "\n"
write_target = getattr(args, "write", None)
if write_target is not None:
if isinstance(write_target, bool) and write_target:
# --write with no value → default location
try:
from hermes_constants import get_hermes_home
target = Path(get_hermes_home()) / "slack-manifest.json"
except Exception:
target = Path.home() / ".hermes" / "slack-manifest.json"
else:
target = Path(write_target).expanduser()
target.parent.mkdir(parents=True, exist_ok=True)
target.write_text(payload, encoding="utf-8")
print(f"Slack manifest written to: {target}", file=sys.stderr)
print(
"\nNext steps:\n"
" 1. Open https://api.slack.com/apps and pick your Hermes app\n"
" (or create a new one: Create New App → From an app manifest).\n"
f" 2. Features → App Manifest → paste the contents of\n"
f" {target}\n"
" 3. Save; Slack will prompt to reinstall the app if scopes or\n"
" slash commands changed.\n"
" 4. Make sure Socket Mode is enabled and you have a bot token\n"
" (xoxb-...) and app token (xapp-...) configured via\n"
" `hermes setup`.\n",
file=sys.stderr,
)
else:
sys.stdout.write(payload)
return 0
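The `isinstance(write_target, bool)` branch above implies that `--write` can arrive as None, as True, or as a path string. The real parser lives in ``hermes_cli/main.py`` and is not part of this diff, so the following is only one plausible way to produce those three states with argparse; `build_manifest_parser` is a hypothetical name.

```python
import argparse


def build_manifest_parser() -> argparse.ArgumentParser:
    """Hypothetical parser yielding the three ``--write`` states."""
    parser = argparse.ArgumentParser(prog="hermes slack manifest")
    # nargs="?" makes the value optional; const is used when the flag is
    # given bare, default when the flag is absent.
    parser.add_argument("--write", nargs="?", const=True, default=None, metavar="PATH")
    parser.add_argument("--name", default=None)
    parser.add_argument("--description", default=None)
    parser.add_argument("--slashes-only", action="store_true")
    return parser


parser = build_manifest_parser()
print(parser.parse_args([]).write)                       # None
print(parser.parse_args(["--write"]).write)              # True
print(parser.parse_args(["--write", "out.json"]).write)  # out.json
```

With this shape, absent means "print to stdout", bare means "write to the default location", and a value means "write to that path".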
+2 -1
@@ -326,7 +326,8 @@ def show_status(args):
"WeCom Callback": ("WECOM_CALLBACK_CORP_ID", None),
"Weixin": ("WEIXIN_ACCOUNT_ID", "WEIXIN_HOME_CHANNEL"),
"BlueBubbles": ("BLUEBUBBLES_SERVER_URL", "BLUEBUBBLES_HOME_CHANNEL"),
"QQBot": ("QQ_APP_ID", "QQBOT_HOME_CHANNEL"),
"QQBot": ("QQ_APP_ID", "QQ_HOME_CHANNEL"),
"Yuanbao": ("YUANBAO_APP_ID", "YUANBAO_HOME_CHANNEL"),
}
for name, (token_var, home_var) in platforms.items():
+4 -4
@@ -20,10 +20,10 @@ def get_provider_request_timeout(
try:
from hermes_cli.config import load_config
except ImportError:
config = load_config()
except Exception:
return None
config = load_config()
providers = config.get("providers", {}) if isinstance(config, dict) else {}
provider_config = (
providers.get(provider_id, {}) if isinstance(providers, dict) else {}
@@ -49,10 +49,10 @@ def get_provider_stale_timeout(
try:
from hermes_cli.config import load_config
except ImportError:
config = load_config()
except Exception:
return None
config = load_config()
providers = config.get("providers", {}) if isinstance(config, dict) else {}
provider_config = (
providers.get(provider_id, {}) if isinstance(providers, dict) else {}
+2 -3
@@ -10,8 +10,7 @@ import random
TIPS = [
# --- Slash Commands ---
"/btw <question> asks a quick side question without tools or history — great for clarifications.",
"/background <prompt> runs a task in a separate session while your current one stays free.",
"/background <prompt> (alias /bg or /btw) runs a task in a separate session while your current one stays free.",
"/branch forks the current session so you can explore a different direction without losing progress.",
"/compress manually compresses conversation context when things get long.",
"/rollback lists filesystem checkpoints — restore files the agent modified to any prior state.",
@@ -107,7 +106,7 @@ TIPS = [
"Set display.streaming: true to see tokens appear in real time as the model generates.",
"Set display.show_reasoning: true to watch the model's chain-of-thought reasoning.",
"Set display.compact: true to reduce whitespace in output for denser information.",
"Set display.busy_input_mode: queue to queue messages instead of interrupting the agent.",
"Set display.busy_input_mode: queue to queue messages instead of interrupting the agent, or steer to inject them mid-run via /steer.",
"Set display.resume_display: minimal to skip the full conversation recap on session resume.",
"Set compression.threshold: 0.50 to control when auto-compression fires (default: 50% of context).",
"Set agent.max_turns: 200 to let the agent take more tool-calling steps per turn.",
+137 -15
@@ -11,6 +11,7 @@ the `platform_toolsets` key.
import json as _json
import logging
import os
import sys
from pathlib import Path
from typing import Dict, List, Optional, Set
@@ -25,7 +26,7 @@ from hermes_cli.nous_subscription import (
get_nous_subscription_features,
)
from tools.tool_backend_helpers import fal_key_is_configured, managed_nous_tools_enabled
from utils import base_url_hostname
from utils import base_url_hostname, is_truthy_value
logger = logging.getLogger(__name__)
@@ -68,25 +69,59 @@ CONFIGURABLE_TOOLSETS = [
("rl", "🧪 RL Training", "Tinker-Atropos training tools"),
("homeassistant", "🏠 Home Assistant", "smart home device control"),
("spotify", "🎵 Spotify", "playback, search, playlists, library"),
("discord", "💬 Discord (read/participate)", "fetch messages, search members, create thread"),
("discord_admin", "🛡️ Discord Server Admin", "list channels/roles, pin, assign roles"),
("yuanbao", "🤖 Yuanbao", "group info, member queries, DM"),
]
# Toolsets that are OFF by default for new installs.
# They're still in _HERMES_CORE_TOOLS (available at runtime if enabled),
# but the setup checklist won't pre-select them for first-time users.
_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl", "spotify"}
_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl", "spotify", "discord", "discord_admin"}
# Platform-scoped toolsets: only appear in the `hermes tools` checklist for
# these platforms, and only resolve/save for these platforms. A toolset
# absent from this map is available on every platform (current behaviour).
#
# Use this for tools whose APIs only make sense on one platform (Discord
# server admin, Slack workspace admin, etc.). Keeps every other platform's
# checklist from filling up with irrelevant toggles.
_TOOLSET_PLATFORM_RESTRICTIONS: Dict[str, Set[str]] = {
"discord": {"discord"},
"discord_admin": {"discord"},
}
def _toolset_allowed_for_platform(ts_key: str, platform: str) -> bool:
"""Return True if ``ts_key`` is configurable on ``platform``.
Toolsets without a restriction entry are allowed everywhere (the default).
"""
allowed = _TOOLSET_PLATFORM_RESTRICTIONS.get(ts_key)
return allowed is None or platform in allowed
def _get_effective_configurable_toolsets():
"""Return CONFIGURABLE_TOOLSETS + any plugin-provided toolsets.
Plugin toolsets are appended at the end so they appear after the
built-in toolsets in the TUI checklist.
built-in toolsets in the TUI checklist. A plugin whose toolset key
already appears in ``CONFIGURABLE_TOOLSETS`` is skipped: bundled
plugins (e.g. ``plugins/spotify``) share their toolset key with the
built-in entry, and we want the built-in label/description to win.
Without the dedupe, ``hermes tools`` "reconfigure existing" would
list the same toolset twice.
"""
result = list(CONFIGURABLE_TOOLSETS)
seen = {ts_key for ts_key, _, _ in result}
try:
from hermes_cli.plugins import discover_plugins, get_plugin_toolsets
discover_plugins() # idempotent — ensures plugins are loaded
result.extend(get_plugin_toolsets())
for entry in get_plugin_toolsets():
if entry[0] in seen:
continue
seen.add(entry[0])
result.append(entry)
except Exception:
pass
return result
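The dedupe described in the docstring above can be sketched in isolation. This is a minimal illustration, not the project's code: `merge_toolsets` and the `weather` plugin entry are hypothetical names, and entries are assumed to be `(key, label, description)` tuples as in `CONFIGURABLE_TOOLSETS`.

```python
from typing import Iterable, List, Tuple

ToolsetEntry = Tuple[str, str, str]  # (key, label, description)


def merge_toolsets(
    builtin: Iterable[ToolsetEntry],
    plugin: Iterable[ToolsetEntry],
) -> List[ToolsetEntry]:
    """Append plugin entries after built-ins, skipping keys already seen."""
    result = list(builtin)
    seen = {key for key, _, _ in result}
    for entry in plugin:
        if entry[0] in seen:
            continue  # the built-in label/description wins
        seen.add(entry[0])
        result.append(entry)
    return result


builtin = [("spotify", "Spotify", "playback, search, playlists, library")]
plugin = [
    ("spotify", "Spotify (plugin)", "same key as the built-in entry"),
    ("weather", "Weather", "hypothetical plugin-only toolset"),
]
merged = merge_toolsets(builtin, plugin)
```

Tracking `seen` as a set keeps the merge linear and also drops duplicates between two plugins that happen to share a key.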
@@ -591,7 +626,7 @@ def _get_platform_tools(
include_default_mcp_servers: bool = True,
) -> Set[str]:
"""Resolve which individual toolset names are enabled for a platform."""
from toolsets import resolve_toolset
from toolsets import resolve_toolset, TOOLSETS
platform_toolsets = config.get("platform_toolsets") or {}
toolset_names = platform_toolsets.get(platform)
@@ -605,6 +640,8 @@ def _get_platform_tools(
toolset_names = [str(ts) for ts in toolset_names]
configurable_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
plugin_ts_keys = _get_plugin_toolset_keys()
platform_default_keys = {p["default_toolset"] for p in PLATFORMS.values()}
# If the saved list contains any configurable keys directly, the user
# has explicitly configured this platform — use direct membership.
@@ -614,7 +651,10 @@ def _get_platform_tools(
has_explicit_config = any(ts in configurable_keys for ts in toolset_names)
if has_explicit_config:
enabled_toolsets = {ts for ts in toolset_names if ts in configurable_keys}
enabled_toolsets = {
ts for ts in toolset_names
if ts in configurable_keys and _toolset_allowed_for_platform(ts, platform)
}
else:
# No explicit config — fall back to resolving composite toolset names
# (e.g. "hermes-cli") to individual tool names and reverse-mapping.
@@ -624,14 +664,61 @@ def _get_platform_tools(
enabled_toolsets = set()
for ts_key, _, _ in CONFIGURABLE_TOOLSETS:
if not _toolset_allowed_for_platform(ts_key, platform):
continue
ts_tools = set(resolve_toolset(ts_key))
if ts_tools and ts_tools.issubset(all_tool_names):
enabled_toolsets.add(ts_key)
default_off = set(_DEFAULT_OFF_TOOLSETS)
if platform in default_off:
# Legacy safety: if the platform's own name matches a default-off
# toolset (e.g. `homeassistant` platform + `homeassistant` toolset),
# keep that toolset enabled on first install. Skip this dodge for
# platform-restricted toolsets — those are always opt-in even on
# their own platform (e.g. `discord` + `discord` should stay OFF).
if platform in default_off and platform not in _TOOLSET_PLATFORM_RESTRICTIONS:
default_off.remove(platform)
# Home Assistant is already runtime-gated by its check_fn (requires
# HASS_TOKEN to register any tools). When a user has configured
# HASS_TOKEN, they've explicitly opted in — don't also strip it via
# _DEFAULT_OFF_TOOLSETS, which would silently drop HA from platforms
# (e.g. cron) that run through _get_platform_tools without an
# explicit saved toolset list. Without this, Norbert's HA cron jobs
# regressed after #14798 made cron honor per-platform tool config.
if "homeassistant" in default_off and os.getenv("HASS_TOKEN"):
default_off.remove("homeassistant")
enabled_toolsets -= default_off
# Recover non-configurable platform toolsets (e.g. discord, feishu_doc,
# feishu_drive). These are part of the platform's default composite but
# absent from CONFIGURABLE_TOOLSETS, so they can't appear in the TUI
# checklist or in a user-saved config. Must run in BOTH branches —
# otherwise saving via `hermes tools` (which flips has_explicit_config
# to True) silently drops them.
platform_tool_universe = set(resolve_toolset(PLATFORMS[platform]["default_toolset"]))
configurable_tool_universe = set()
for ck in configurable_keys:
configurable_tool_universe.update(resolve_toolset(ck))
claimed = set()
for ts_key in enabled_toolsets:
claimed.update(resolve_toolset(ts_key))
skip = configurable_keys | plugin_ts_keys | platform_default_keys
skip |= {k for k in TOOLSETS if k.startswith("hermes-")}
skip |= set(_DEFAULT_OFF_TOOLSETS) - {platform}
for ts_key, ts_def in TOOLSETS.items():
if ts_key in skip:
continue
if ts_def.get("includes"):
continue
ts_tools = set(resolve_toolset(ts_key))
if not ts_tools or not ts_tools.issubset(platform_tool_universe):
continue
if ts_tools.issubset(configurable_tool_universe):
continue
if not ts_tools.issubset(claimed):
enabled_toolsets.add(ts_key)
claimed.update(ts_tools)
# Plugin toolsets: enabled by default unless explicitly disabled, or
# unless the toolset is in _DEFAULT_OFF_TOOLSETS (e.g. spotify —
# shipped as a bundled plugin but user must opt in via `hermes tools`
@@ -639,7 +726,6 @@ def _get_platform_tools(
# A plugin toolset is "known" for a platform once `hermes tools`
# has been saved for that platform (tracked via known_plugin_toolsets).
# Unknown plugins default to enabled; known-but-absent = disabled.
plugin_ts_keys = _get_plugin_toolset_keys()
if plugin_ts_keys:
known_map = config.get("known_plugin_toolsets", {})
known_for_platform = set(known_map.get(platform, []))
@@ -657,7 +743,6 @@ def _get_platform_tools(
# Preserve any explicit non-configurable toolset entries (for example,
# custom toolsets or MCP server names saved in platform_toolsets).
platform_default_keys = {p["default_toolset"] for p in PLATFORMS.values()}
explicit_passthrough = {
ts
for ts in toolset_names
@@ -703,6 +788,14 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
"""
config.setdefault("platform_toolsets", {})
# Drop platform-scoped toolsets that don't apply here. Prevents the
# "Configure all platforms" checklist (or a hand-edited config.yaml)
# from turning on, say, the `discord` toolset for Telegram.
enabled_toolset_keys = {
ts for ts in enabled_toolset_keys
if _toolset_allowed_for_platform(ts, platform)
}
# Get the set of all configurable toolset keys (built-in + plugin)
configurable_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
plugin_keys = _get_plugin_toolset_keys()
@@ -717,6 +810,7 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
existing_toolsets = config.get("platform_toolsets", {}).get(platform, [])
if not isinstance(existing_toolsets, list):
existing_toolsets = []
existing_toolsets = [str(ts) for ts in existing_toolsets]
# Preserve any entries that are NOT configurable toolsets and NOT platform
# defaults (i.e. only MCP server names should be preserved)
@@ -724,6 +818,11 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
entry for entry in existing_toolsets
if entry not in configurable_keys and entry not in platform_default_keys
}
# Opening `hermes tools` is the user's opt-in to reconfigure tools, so treat
# saving from the picker as consent to clear the "no_mcp" sentinel. The
# picker has no checkbox for no_mcp, so without this users who once set it
# by hand could never re-enable MCP servers through the UI.
preserved_entries.discard("no_mcp")
# Merge preserved entries with new enabled toolsets
config["platform_toolsets"][platform] = sorted(enabled_toolset_keys | preserved_entries)
@@ -831,7 +930,7 @@ def _estimate_tool_tokens() -> Dict[str, int]:
return _tool_token_cache
def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str]:
def _prompt_toolset_checklist(platform_label: str, enabled: Set[str], platform: str = "cli") -> Set[str]:
"""Multi-select checklist of toolsets. Returns set of selected toolset keys."""
from hermes_cli.curses_ui import curses_checklist
from toolsets import resolve_toolset
@@ -839,7 +938,12 @@ def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str
# Pre-compute per-tool token counts (cached after first call).
tool_tokens = _estimate_tool_tokens()
effective = _get_effective_configurable_toolsets()
effective_all = _get_effective_configurable_toolsets()
# Drop platform-scoped toolsets that don't apply to this platform.
effective = [
(k, l, d) for (k, l, d) in effective_all
if _toolset_allowed_for_platform(k, platform)
]
labels = []
for ts_key, ts_label, ts_desc in effective:
@@ -1084,7 +1188,7 @@ def _is_provider_active(provider: dict, config: dict) -> bool:
configured_provider = image_cfg.get("provider")
if configured_provider not in (None, "", "fal"):
return False
if image_cfg.get("use_gateway") is False:
if image_cfg.get("use_gateway") is not None and not is_truthy_value(image_cfg.get("use_gateway"), default=False):
return False
return feature.managed_by_nous
if provider.get("tts_provider"):
@@ -1116,7 +1220,7 @@ def _is_provider_active(provider: dict, config: dict) -> bool:
return (
provider["imagegen_backend"] == "fal"
and configured_provider in (None, "", "fal")
and not image_cfg.get("use_gateway")
and not is_truthy_value(image_cfg.get("use_gateway"), default=False)
)
return False
@@ -1753,7 +1857,7 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
checklist_preselected = current_enabled - _DEFAULT_OFF_TOOLSETS
# Show checklist
new_enabled = _prompt_toolset_checklist(pinfo["label"], checklist_preselected)
new_enabled = _prompt_toolset_checklist(pinfo["label"], checklist_preselected, pkey)
added = new_enabled - current_enabled
removed = current_enabled - new_enabled
@@ -2109,7 +2213,11 @@ def _apply_mcp_change(config: dict, targets: List[str], action: str) -> Set[str]
def _print_tools_list(enabled_toolsets: set, mcp_servers: dict, platform: str = "cli"):
"""Print a summary of enabled/disabled toolsets and MCP tool filters."""
effective = _get_effective_configurable_toolsets()
effective_all = _get_effective_configurable_toolsets()
effective = [
(k, l, d) for (k, l, d) in effective_all
if _toolset_allowed_for_platform(k, platform)
]
builtin_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
print(f"Built-in toolsets ({platform}):")
@@ -2175,6 +2283,20 @@ def tools_disable_enable_command(args):
_print_error(f"Unknown toolset '{name}'")
toolset_targets = [t for t in toolset_targets if t in valid_toolsets]
# Reject platform-scoped toolsets on platforms that don't allow them.
restricted_targets = [
t for t in toolset_targets
if not _toolset_allowed_for_platform(t, platform)
]
if restricted_targets:
for name in restricted_targets:
allowed = sorted(_TOOLSET_PLATFORM_RESTRICTIONS.get(name) or set())
_print_error(
f"Toolset '{name}' is not available on platform '{platform}' "
f"(only: {', '.join(allowed)})"
)
toolset_targets = [t for t in toolset_targets if t not in restricted_targets]
if toolset_targets:
_apply_toolset_change(config, platform, toolset_targets, action)
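The platform-restriction rule used throughout this file (resolve, save, checklist, enable/disable) reduces to one lookup. The sketch below restates it standalone; `filter_for_platform` and the `web` toolset key are assumptions made for illustration, while the restriction map mirrors `_TOOLSET_PLATFORM_RESTRICTIONS` from the diff.

```python
from typing import Dict, Set

# Mirrors _TOOLSET_PLATFORM_RESTRICTIONS: a key absent from the map is
# allowed on every platform.
RESTRICTIONS: Dict[str, Set[str]] = {
    "discord": {"discord"},
    "discord_admin": {"discord"},
}


def toolset_allowed(ts_key: str, platform: str) -> bool:
    allowed = RESTRICTIONS.get(ts_key)
    return allowed is None or platform in allowed


def filter_for_platform(keys: Set[str], platform: str) -> Set[str]:
    """Hypothetical helper: drop toolsets that do not apply to `platform`."""
    return {k for k in keys if toolset_allowed(k, platform)}


requested = {"web", "discord", "discord_admin"}  # "web" is an assumed key
telegram_tools = filter_for_platform(requested, "telegram")
discord_tools = filter_for_platform(requested, "discord")
```

Because unrestricted keys return `None` from the lookup, adding a new platform-scoped toolset only requires one new map entry; no call site changes.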
+7 -9
@@ -287,7 +287,7 @@ _SCHEMA_OVERRIDES: Dict[str, Dict[str, Any]] = {
"display.busy_input_mode": {
"type": "select",
"description": "Input behavior while agent is running",
"options": ["interrupt", "queue"],
"options": ["interrupt", "queue", "steer"],
},
"memory.provider": {
"type": "select",
@@ -2327,16 +2327,14 @@ def _resolve_chat_argv(
from hermes_cli.main import PROJECT_ROOT, _make_tui_argv
argv, cwd = _make_tui_argv(PROJECT_ROOT / "ui-tui", tui_dev=False)
env: Optional[dict] = None
env = os.environ.copy()
env.setdefault("NODE_ENV", "production")
if resume or sidecar_url:
env = os.environ.copy()
if resume:
env["HERMES_TUI_RESUME"] = resume
if resume:
env["HERMES_TUI_RESUME"] = resume
if sidecar_url:
env["HERMES_TUI_SIDECAR_URL"] = sidecar_url
if sidecar_url:
env["HERMES_TUI_SIDECAR_URL"] = sidecar_url
return list(argv), str(cwd) if cwd else None, env
+3 -4
@@ -195,10 +195,6 @@ def setup_logging(
The ``logs/`` directory where files are written.
"""
global _logging_initialized
if _logging_initialized and not force:
home = hermes_home or get_hermes_home()
return home / "logs"
home = hermes_home or get_hermes_home()
log_dir = home / "logs"
log_dir.mkdir(parents=True, exist_ok=True)
@@ -248,6 +244,9 @@ def setup_logging(
log_filter=_ComponentFilter(COMPONENT_PREFIXES["gateway"]),
)
if _logging_initialized and not force:
return log_dir
# Ensure root logger level is low enough for the handlers to fire.
if root.level == logging.NOTSET or root.level > level:
root.setLevel(level)
+158 -19
@@ -31,7 +31,7 @@ T = TypeVar("T")
DEFAULT_DB_PATH = get_hermes_home() / "state.db"
SCHEMA_VERSION = 8
SCHEMA_VERSION = 9
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS schema_version (
@@ -83,7 +83,8 @@ CREATE TABLE IF NOT EXISTS messages (
reasoning TEXT,
reasoning_content TEXT,
reasoning_details TEXT,
codex_reasoning_items TEXT
codex_reasoning_items TEXT,
codex_message_items TEXT
);
CREATE TABLE IF NOT EXISTS state_meta (
@@ -356,6 +357,15 @@ class SessionDB:
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 8")
if current_version < 9:
# v9: preserve replayable Codex assistant message ids/phases so
# follow-up turns can rebuild Responses API message items instead
# of flattening everything to plain assistant text.
try:
cursor.execute('ALTER TABLE messages ADD COLUMN "codex_message_items" TEXT')
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 9")
# Unique title index — always ensure it exists (safe to run after migrations
# since the title column is guaranteed to exist at this point)
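The v9 step follows the same "add the column, tolerate it already existing" pattern as earlier migrations. A minimal sketch against an in-memory SQLite database, assuming a simplified `messages` table; `add_column_if_missing` is a hypothetical helper, not part of `SessionDB`.

```python
import sqlite3


def add_column_if_missing(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Try to add a TEXT column; return False if SQLite rejects the ALTER.

    SQLite raises OperationalError ("duplicate column name") when the
    column already exists, which is what makes re-running the step safe.
    Note that other ALTER failures raise the same exception type.
    """
    try:
        conn.execute(f'ALTER TABLE {table} ADD COLUMN "{column}" TEXT')
        return True
    except sqlite3.OperationalError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, content TEXT)")
first = add_column_if_missing(conn, "messages", "codex_message_items")
second = add_column_if_missing(conn, "messages", "codex_message_items")
columns = [row[1] for row in conn.execute("PRAGMA table_info(messages)")]
```

The second call is a no-op, so a database that already has the column (for example, one created from the updated `SCHEMA_SQL`) passes through the migration unchanged.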
@@ -822,7 +832,18 @@ class SessionDB:
params = []
if not include_children:
where_clauses.append("s.parent_session_id IS NULL")
# Show root sessions and branch sessions (whose parent ended with
# end_reason='branched' before the child was created), while still
# hiding sub-agent runs and compression continuations (which also
# carry a parent_session_id but were spawned while the parent was
# still live — i.e., started_at < parent.ended_at).
where_clauses.append(
"(s.parent_session_id IS NULL"
" OR EXISTS (SELECT 1 FROM sessions p"
" WHERE p.id = s.parent_session_id"
" AND p.end_reason = 'branched'"
" AND s.started_at >= p.ended_at))"
)
if source:
where_clauses.append("s.source = ?")
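The new `WHERE` clause can be exercised on its own. The sketch below assumes a reduced `sessions` table with only the columns the clause touches, and the row ids are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (id TEXT PRIMARY KEY, parent_session_id TEXT,"
    " started_at REAL, ended_at REAL, end_reason TEXT)"
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?, ?, ?)",
    [
        ("root", None, 100.0, 200.0, "branched"),
        ("branch", "root", 200.0, None, None),         # started after parent ended
        ("early_child", "root", 150.0, 160.0, None),   # spawned while parent was live
        ("live_parent", None, 300.0, None, None),
        ("subagent", "live_parent", 350.0, 360.0, None),
    ],
)

visible = [
    row[0]
    for row in conn.execute(
        "SELECT s.id FROM sessions s"
        " WHERE (s.parent_session_id IS NULL"
        "  OR EXISTS (SELECT 1 FROM sessions p"
        "   WHERE p.id = s.parent_session_id"
        "   AND p.end_reason = 'branched'"
        "   AND s.started_at >= p.ended_at))"
        " ORDER BY s.started_at"
    )
]
```

A child of a still-running parent is hidden even without the `end_reason` check, because comparing against a `NULL` `ended_at` never evaluates to true in SQL.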
@@ -956,6 +977,7 @@ class SessionDB:
reasoning_content: str = None,
reasoning_details: Any = None,
codex_reasoning_items: Any = None,
codex_message_items: Any = None,
) -> int:
"""
Append a message to a session. Returns the message row ID.
@@ -972,6 +994,10 @@ class SessionDB:
json.dumps(codex_reasoning_items)
if codex_reasoning_items else None
)
codex_message_items_json = (
json.dumps(codex_message_items)
if codex_message_items else None
)
tool_calls_json = json.dumps(tool_calls) if tool_calls else None
# Pre-compute tool call count
@@ -983,8 +1009,9 @@ class SessionDB:
cursor = conn.execute(
"""INSERT INTO messages (session_id, role, content, tool_call_id,
tool_calls, tool_name, timestamp, token_count, finish_reason,
reasoning, reasoning_content, reasoning_details, codex_reasoning_items)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
reasoning, reasoning_content, reasoning_details, codex_reasoning_items,
codex_message_items)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
session_id,
role,
@@ -999,6 +1026,7 @@ class SessionDB:
reasoning_content,
reasoning_details_json,
codex_items_json,
codex_message_items_json,
),
)
msg_id = cursor.lastrowid
@@ -1104,19 +1132,27 @@ class SessionDB:
current = child_id
return session_id
def get_messages_as_conversation(self, session_id: str) -> List[Dict[str, Any]]:
def get_messages_as_conversation(
self, session_id: str, include_ancestors: bool = False
) -> List[Dict[str, Any]]:
"""
Load messages in the OpenAI conversation format (role + content dicts).
Used by the gateway to restore conversation history.
"""
session_ids = [session_id]
if include_ancestors:
session_ids = self._session_lineage_root_to_tip(session_id)
with self._lock:
cursor = self._conn.execute(
placeholders = ",".join("?" for _ in session_ids)
rows = self._conn.execute(
"SELECT role, content, tool_call_id, tool_calls, tool_name, "
"reasoning, reasoning_content, reasoning_details, codex_reasoning_items "
"FROM messages WHERE session_id = ? ORDER BY timestamp, id",
(session_id,),
)
rows = cursor.fetchall()
"reasoning, reasoning_content, reasoning_details, codex_reasoning_items, "
"codex_message_items "
f"FROM messages WHERE session_id IN ({placeholders}) ORDER BY timestamp, id",
tuple(session_ids),
).fetchall()
messages = []
for row in rows:
msg = {"role": row["role"], "content": row["content"]}
@@ -1150,9 +1186,53 @@ class SessionDB:
except (json.JSONDecodeError, TypeError):
logger.warning("Failed to deserialize codex_reasoning_items, falling back to None")
msg["codex_reasoning_items"] = None
if row["codex_message_items"]:
try:
msg["codex_message_items"] = json.loads(row["codex_message_items"])
except (json.JSONDecodeError, TypeError):
logger.warning("Failed to deserialize codex_message_items, falling back to None")
msg["codex_message_items"] = None
if include_ancestors and self._is_duplicate_replayed_user_message(messages, msg):
continue
messages.append(msg)
return messages
def _session_lineage_root_to_tip(self, session_id: str) -> List[str]:
if not session_id:
return [session_id]
chain = []
current = session_id
seen = set()
with self._lock:
for _ in range(100):
if not current or current in seen:
break
seen.add(current)
chain.append(current)
row = self._conn.execute(
"SELECT parent_session_id FROM sessions WHERE id = ?",
(current,),
).fetchone()
if row is None:
break
current = row["parent_session_id"] if hasattr(row, "keys") else row[0]
return list(reversed(chain)) or [session_id]
@staticmethod
def _is_duplicate_replayed_user_message(messages: List[Dict[str, Any]], msg: Dict[str, Any]) -> bool:
if msg.get("role") != "user":
return False
content = msg.get("content")
if not isinstance(content, str) or not content:
return False
for prev in reversed(messages):
if prev.get("role") == "user" and prev.get("content") == content:
return True
if prev.get("role") == "assistant" and (prev.get("content") or prev.get("tool_calls")):
return False
return False
# =========================================================================
# Search
# =========================================================================
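The replay dedupe used with `include_ancestors=True` can be read as: skip a user message if the same text already appears earlier with no substantive assistant turn in between. A standalone restatement of that check, with sample histories invented for illustration:

```python
from typing import Any, Dict, List


def is_duplicate_replayed_user_message(
    messages: List[Dict[str, Any]], msg: Dict[str, Any]
) -> bool:
    if msg.get("role") != "user":
        return False
    content = msg.get("content")
    if not isinstance(content, str) or not content:
        return False
    # Walk backwards until an assistant turn with content or tool calls.
    for prev in reversed(messages):
        if prev.get("role") == "user" and prev.get("content") == content:
            return True
        if prev.get("role") == "assistant" and (
            prev.get("content") or prev.get("tool_calls")
        ):
            return False
    return False


user_hi = {"role": "user", "content": "hi"}
replayed = is_duplicate_replayed_user_message([user_hi], dict(user_hi))
after_reply = is_duplicate_replayed_user_message(
    [user_hi, {"role": "assistant", "content": "hello"}], dict(user_hi)
)
across_tool = is_duplicate_replayed_user_message(
    [user_hi, {"role": "tool", "content": "output"}], dict(user_hi)
)
```

Only assistant turns act as a boundary; tool messages are skipped over, so a repeated prompt separated from its first copy by tool output alone still counts as a duplicate.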
@@ -1477,12 +1557,45 @@ class SessionDB:
)
self._execute_write(_do)
def delete_session(self, session_id: str) -> bool:
@staticmethod
def _remove_session_files(sessions_dir: Optional[Path], session_id: str) -> None:
"""Remove on-disk transcript files for a session.
Cleans up ``{session_id}.json``, ``{session_id}.jsonl``, and any
``request_dump_{session_id}_*.json`` files left by the gateway.
Silently skips files that don't exist and swallows OSError so a
filesystem hiccup never blocks a DB operation.
"""
if sessions_dir is None:
return
for suffix in (".json", ".jsonl"):
p = sessions_dir / f"{session_id}{suffix}"
try:
p.unlink(missing_ok=True)
except OSError:
pass
# request_dump files use session_id as a prefix component
try:
for p in sessions_dir.glob(f"request_dump_{session_id}_*.json"):
try:
p.unlink(missing_ok=True)
except OSError:
pass
except OSError:
pass
def delete_session(
self,
session_id: str,
sessions_dir: Optional[Path] = None,
) -> bool:
"""Delete a session and all its messages.
Child sessions are orphaned (parent_session_id set to NULL) rather
than cascade-deleted, so they remain accessible independently.
Returns True if the session was found and deleted.
When *sessions_dir* is provided, also removes on-disk transcript
files (``.json`` / ``.jsonl`` / ``request_dump_*``) for the deleted
session. Returns True if the session was found and deleted.
"""
def _do(conn):
cursor = conn.execute(
@@ -1499,16 +1612,29 @@ class SessionDB:
conn.execute("DELETE FROM messages WHERE session_id = ?", (session_id,))
conn.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
return True
return self._execute_write(_do)
def prune_sessions(self, older_than_days: int = 90, source: str = None) -> int:
deleted = self._execute_write(_do)
if deleted:
self._remove_session_files(sessions_dir, session_id)
return deleted
def prune_sessions(
self,
older_than_days: int = 90,
source: str = None,
sessions_dir: Optional[Path] = None,
) -> int:
"""Delete sessions older than N days. Returns count of deleted sessions.
Only prunes ended sessions (not active ones). Child sessions outside
the prune window are orphaned (parent_session_id set to NULL) rather
than cascade-deleted.
than cascade-deleted. When *sessions_dir* is provided, also removes
on-disk transcript files (``.json`` / ``.jsonl`` /
``request_dump_*``) for every pruned session, outside the DB
transaction.
"""
cutoff = time.time() - (older_than_days * 86400)
removed_ids: list[str] = []
def _do(conn):
if source:
@@ -1538,9 +1664,14 @@ class SessionDB:
for sid in session_ids:
conn.execute("DELETE FROM messages WHERE session_id = ?", (sid,))
conn.execute("DELETE FROM sessions WHERE id = ?", (sid,))
removed_ids.append(sid)
return len(session_ids)
return self._execute_write(_do)
count = self._execute_write(_do)
# Clean up on-disk files outside the DB transaction
for sid in removed_ids:
self._remove_session_files(sessions_dir, sid)
return count
# ── Meta key/value (for scheduler bookkeeping) ──
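The on-disk cleanup is best-effort and runs after the DB write has committed. A minimal sketch of the same file-matching rules against a temporary directory; the function name and the sample file names are assumptions, and `Path.unlink(missing_ok=True)` requires Python 3.8 or later.

```python
import tempfile
from pathlib import Path
from typing import Optional


def remove_session_files(sessions_dir: Optional[Path], session_id: str) -> None:
    """Best-effort removal of transcript files; never raises OSError."""
    if sessions_dir is None:
        return
    candidates = [sessions_dir / f"{session_id}{s}" for s in (".json", ".jsonl")]
    try:
        candidates.extend(sessions_dir.glob(f"request_dump_{session_id}_*.json"))
    except OSError:
        pass
    for path in candidates:
        try:
            path.unlink(missing_ok=True)
        except OSError:
            pass  # a filesystem hiccup should not block the caller


with tempfile.TemporaryDirectory() as tmp:
    sessions = Path(tmp)
    for name in ("abc.json", "abc.jsonl", "request_dump_abc_1.json", "other.json"):
        (sessions / name).write_text("{}")
    remove_session_files(sessions, "abc")
    remaining = sorted(p.name for p in sessions.iterdir())
```

Keeping the unlink calls outside the write callback means a slow or failing filesystem cannot hold the database lock or roll back a successful delete.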
@@ -1594,6 +1725,7 @@ class SessionDB:
retention_days: int = 90,
min_interval_hours: int = 24,
vacuum: bool = True,
sessions_dir: Optional[Path] = None,
) -> Dict[str, Any]:
"""Idempotent auto-maintenance: prune old sessions + optional VACUUM.
@@ -1601,6 +1733,10 @@ class SessionDB:
within ``min_interval_hours`` no-op. Designed to be called once at
startup from long-lived entrypoints (CLI, gateway, cron scheduler).
When *sessions_dir* is provided, on-disk transcript files
(``.json`` / ``.jsonl`` / ``request_dump_*``) for pruned sessions
are removed as part of the same sweep (issue #3015).
Never raises. On any failure, logs a warning and returns a dict
with ``"error"`` set.
@@ -1624,7 +1760,10 @@ class SessionDB:
except (TypeError, ValueError):
pass # corrupt meta; treat as no prior run
pruned = self.prune_sessions(older_than_days=retention_days)
pruned = self.prune_sessions(
older_than_days=retention_days,
sessions_dir=sessions_dir,
)
result["pruned"] = pruned
# Only VACUUM if we actually freed rows — VACUUM on a tight DB
+39 -23
@@ -24,6 +24,7 @@ import json
import asyncio
import logging
import threading
import time
from typing import Dict, Any, List, Optional, Tuple
from tools.registry import discover_builtin_tools, registry
@@ -288,30 +289,34 @@ def get_tool_definitions(
filtered_tools[i] = {"type": "function", "function": dynamic_schema}
break
# Rebuild discord_server schema based on the bot's privileged intents
# (detected from GET /applications/@me) and the user's action allowlist
# in config. Hides actions the bot's intents don't support so the
# model never attempts them, and annotates fetch_messages when the
# Rebuild discord / discord_admin schemas based on the bot's privileged
# intents (detected from GET /applications/@me) and the user's action
# allowlist in config. Hides actions the bot's intents don't support so
# the model never attempts them, and annotates fetch_messages when the
# MESSAGE_CONTENT intent is missing.
if "discord_server" in available_tool_names:
try:
from tools.discord_tool import get_dynamic_schema
dynamic = get_dynamic_schema()
except Exception: # pragma: no cover — defensive, fall back to static
dynamic = None
if dynamic is None:
# Tool filtered out entirely (empty allowlist or detection disabled
# the only remaining actions). Drop it from the schema list.
filtered_tools = [
t for t in filtered_tools
if t.get("function", {}).get("name") != "discord_server"
]
available_tool_names.discard("discord_server")
else:
for i, td in enumerate(filtered_tools):
if td.get("function", {}).get("name") == "discord_server":
filtered_tools[i] = {"type": "function", "function": dynamic}
break
_discord_schema_fns = {
"discord": "get_dynamic_schema_core",
"discord_admin": "get_dynamic_schema_admin",
}
for discord_tool_name in _discord_schema_fns:
if discord_tool_name in available_tool_names:
try:
from tools import discord_tool as _dt
schema_fn = getattr(_dt, _discord_schema_fns[discord_tool_name])
dynamic = schema_fn()
except Exception:
dynamic = None
if dynamic is None:
filtered_tools = [
t for t in filtered_tools
if t.get("function", {}).get("name") != discord_tool_name
]
available_tool_names.discard(discord_tool_name)
else:
for i, td in enumerate(filtered_tools):
if td.get("function", {}).get("name") == discord_tool_name:
filtered_tools[i] = {"type": "function", "function": dynamic}
break
# Strip web tool cross-references from browser_navigate description when
# web_search / web_extract are not available. The static schema says
@@ -563,6 +568,14 @@ def handle_function_call(
except Exception:
pass # file_tools may not be loaded yet
# Measure tool dispatch latency so post_tool_call and
# transform_tool_result hooks can observe per-tool duration.
# Inspired by Claude Code 2.1.119, which added ``duration_ms`` to
# PostToolUse hook inputs so plugin authors can build latency
# dashboards, budget alerts, and regression canaries without having
# to wrap every tool manually. We use monotonic() so the value is
# unaffected by wall-clock adjustments during the call.
_dispatch_start = time.monotonic()
if function_name == "execute_code":
# Prefer the caller-provided list so subagents can't overwrite
# the parent's tool set via the process-global.
@@ -578,6 +591,7 @@ def handle_function_call(
task_id=task_id,
user_task=user_task,
)
duration_ms = int((time.monotonic() - _dispatch_start) * 1000)
try:
from hermes_cli.plugins import invoke_hook
@@ -589,6 +603,7 @@ def handle_function_call(
task_id=task_id or "",
session_id=session_id or "",
tool_call_id=tool_call_id or "",
duration_ms=duration_ms,
)
except Exception:
pass
@@ -609,6 +624,7 @@ def handle_function_call(
task_id=task_id or "",
session_id=session_id or "",
tool_call_id=tool_call_id or "",
duration_ms=duration_ms,
)
for hook_result in hook_results:
if isinstance(hook_result, str):
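The latency measurement added around tool dispatch follows the usual monotonic-clock pattern. A small sketch, where `timed_call` is a hypothetical wrapper rather than anything in `handle_function_call`:

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, int]:
    """Run fn and return (result, elapsed milliseconds as an int)."""
    start = time.monotonic()  # unaffected by wall-clock adjustments
    result = fn(*args, **kwargs)
    duration_ms = int((time.monotonic() - start) * 1000)
    return result, duration_ms


result, duration_ms = timed_call(sum, range(1000))
```

`int(...)` truncates toward zero, so very fast calls report `0`; a hook that needs sub-millisecond resolution would have to receive the float instead.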
+1 -1
@@ -4,7 +4,7 @@ let
src = ../ui-tui;
npmDeps = pkgs.fetchNpmDeps {
inherit src;
hash = "sha256-RU4qSHgJPMyfRSEJDzkG4+MReDZDc6QbTD2wisa5QE0=";
hash = "sha256-Chz+NW9NXqboXHOa6PKwf5bhAkkcFtKNhvKWwg2XSPc=";
};
npm = hermesNpmLib.mkNpmPassthru { folder = "ui-tui"; attr = "tui"; pname = "hermes-tui"; };
@@ -380,6 +380,10 @@ def backup_existing(path: Path, backup_root: Path) -> Optional[Path]:
# Replace OpenClaw brand names with Hermes in migrated text so that
# memory entries, user profiles, SOUL.md, and workspace instructions
# read as self-referential to the new agent identity.
#
# Case-preserving: ``OpenClaw`` → ``Hermes`` (prose), but lowercase matches
# like ``openclaw`` → ``hermes`` (so filesystem paths like ``~/.openclaw``
# become ``~/.hermes`` — the real Hermes home — not the broken ``~/.Hermes``).
_REBRAND_PATTERNS: List[Tuple[re.Pattern, str]] = [
(re.compile(r'\bOpen[\s-]?Claw\b', re.IGNORECASE), 'Hermes'),
(re.compile(r'\bClawdBot\b', re.IGNORECASE), 'Hermes'),
@@ -387,10 +391,31 @@ _REBRAND_PATTERNS: List[Tuple[re.Pattern, str]] = [
]
def _case_preserving_replacement(replacement: str):
"""Return a re.sub replacement fn that lowercases the result when the
matched text was all-lowercase.
Keeps ``OpenClaw`` → ``Hermes`` but maps ``openclaw`` → ``hermes`` so a
filesystem path like ``~/.openclaw/config.yaml`` rewrites to
``~/.hermes/config.yaml`` (the real Hermes home) instead of the broken
``~/.Hermes/config.yaml``.
"""
def _sub(match: "re.Match[str]") -> str:
matched = match.group(0)
if matched and matched.islower():
return replacement.lower()
return replacement
return _sub
def rebrand_text(text: str) -> str:
"""Replace OpenClaw / ClawdBot / MoltBot brand names with Hermes."""
"""Replace OpenClaw / ClawdBot / MoltBot brand names with Hermes.
Preserves case so filesystem-path matches (lowercase) don't become
capitalized directory names that don't exist.
"""
for pattern, replacement in _REBRAND_PATTERNS:
text = pattern.sub(replacement, text)
text = pattern.sub(_case_preserving_replacement(replacement), text)
return text
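The effect of the case-preserving callback is easiest to see on two inputs. The sketch below reuses only the first pattern from `_REBRAND_PATTERNS`; the function names and sample strings are illustrative.

```python
import re

PATTERN = re.compile(r"\bOpen[\s-]?Claw\b", re.IGNORECASE)


def case_preserving(replacement: str):
    """Return a re.sub callback that lowercases for all-lowercase matches."""
    def _sub(match: "re.Match[str]") -> str:
        return replacement.lower() if match.group(0).islower() else replacement
    return _sub


def rebrand(text: str) -> str:
    return PATTERN.sub(case_preserving("Hermes"), text)


prose = rebrand("OpenClaw remembers things.")
path = rebrand("Edit ~/.openclaw/config.yaml")
shouted = rebrand("OPENCLAW")
```

Only all-lowercase matches are lowered; mixed-case and all-caps matches fall through to the capitalized brand name, so prose keeps its proper noun while paths resolve to the real directory.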
+25
@@ -91,4 +91,29 @@
// Register this plugin — the dashboard picks it up automatically.
window.__HERMES_PLUGINS__.register("example", ExamplePage);
// ─────────────────────────────────────────────────────────────────────
// Page-scoped slot demo: inject a small banner at the top of /sessions.
//
// Built-in pages expose named slots (<page>:top, <page>:bottom) that
// plugins can populate without overriding the whole route. The
// manifest lists the slots we use in its `slots` array so the shell
// knows to render <PluginSlot name="sessions:top" /> there.
// ─────────────────────────────────────────────────────────────────────
function SessionsTopBanner() {
return React.createElement(Card, {
className: "border-dashed",
},
React.createElement(CardContent, { className: "flex items-center gap-3 py-2" },
React.createElement(Badge, { variant: "outline" }, "Example"),
React.createElement("span", {
className: "text-xs text-muted-foreground",
}, "This banner was injected into the Sessions page by the example plugin via the ",
React.createElement("code", { className: "font-courier" }, "sessions:top"),
" slot."),
),
);
}
window.__HERMES_PLUGINS__.registerSlot("example", "sessions:top", SessionsTopBanner);
})();
@@ -8,6 +8,7 @@
"path": "/example",
"position": "after:skills"
},
"slots": ["sessions:top"],
"entry": "dist/index.js",
"api": "plugin_api.py"
}
+124 -29
@@ -3,7 +3,9 @@
Long-term memory with knowledge graph, entity resolution, and multi-strategy
retrieval. Supports cloud (API key) and local modes.
Configurable timeout via HINDSIGHT_TIMEOUT env var or config.json.
Configurable request timeout via HINDSIGHT_TIMEOUT env var or config.json.
Configurable embedded daemon idle timeout via HINDSIGHT_IDLE_TIMEOUT env var
or config.json idle_timeout.
Original PR #1811 by benfrank241, adapted to MemoryProvider ABC.
@@ -14,6 +16,7 @@ Config via environment variables:
HINDSIGHT_API_URL API endpoint
HINDSIGHT_MODE cloud or local (default: cloud)
HINDSIGHT_TIMEOUT API request timeout in seconds (default: 120)
HINDSIGHT_IDLE_TIMEOUT embedded daemon idle timeout in seconds; 0 disables shutdown (default: 300)
HINDSIGHT_RETAIN_TAGS comma-separated tags attached to retained memories
HINDSIGHT_RETAIN_SOURCE metadata source value attached to retained memories
HINDSIGHT_RETAIN_USER_PREFIX label used before user turns in retained transcripts
@@ -45,6 +48,7 @@ _DEFAULT_API_URL = "https://api.hindsight.vectorize.io"
_DEFAULT_LOCAL_URL = "http://localhost:8888"
_MIN_CLIENT_VERSION = "0.4.22"
_DEFAULT_TIMEOUT = 120 # seconds — cloud API can take 30-40s per request
_DEFAULT_IDLE_TIMEOUT = 300 # seconds — Hindsight embedded daemon default
_VALID_BUDGETS = {"low", "mid", "high"}
_PROVIDER_DEFAULT_MODELS = {
"openai": "gpt-4o-mini",
@@ -59,6 +63,17 @@ _PROVIDER_DEFAULT_MODELS = {
}
def _parse_int_setting(value: Any, default: int) -> int:
"""Parse an integer config/env value, falling back on invalid input."""
if value is None or value == "":
return default
try:
return int(value)
except (TypeError, ValueError):
logger.warning("Invalid integer Hindsight setting %r; using default %s", value, default)
return default
def _check_local_runtime() -> tuple[bool, str | None]:
"""Return whether local embedded Hindsight imports cleanly.
@@ -203,6 +218,8 @@ def _load_config() -> dict:
return {
"mode": os.environ.get("HINDSIGHT_MODE", "cloud"),
"apiKey": os.environ.get("HINDSIGHT_API_KEY", ""),
"timeout": _parse_int_setting(os.environ.get("HINDSIGHT_TIMEOUT"), _DEFAULT_TIMEOUT),
"idle_timeout": _parse_int_setting(os.environ.get("HINDSIGHT_IDLE_TIMEOUT"), _DEFAULT_IDLE_TIMEOUT),
"retain_tags": os.environ.get("HINDSIGHT_RETAIN_TAGS", ""),
"retain_source": os.environ.get("HINDSIGHT_RETAIN_SOURCE", ""),
"retain_user_prefix": os.environ.get("HINDSIGHT_RETAIN_USER_PREFIX", "User"),
@@ -304,6 +321,16 @@ def _build_embedded_profile_env(config: dict[str, Any], *, llm_api_key: str | No
}
if current_base_url:
env_values["HINDSIGHT_API_LLM_BASE_URL"] = str(current_base_url)
idle_timeout = (
config.get("idle_timeout")
if config.get("idle_timeout") is not None
else os.environ.get("HINDSIGHT_IDLE_TIMEOUT")
)
if idle_timeout is not None and idle_timeout != "":
env_values["HINDSIGHT_EMBED_DAEMON_IDLE_TIMEOUT"] = str(
_parse_int_setting(idle_timeout, _DEFAULT_IDLE_TIMEOUT)
)
return env_values
@@ -412,6 +439,7 @@ class HindsightMemoryProvider(MemoryProvider):
self._turn_index = 0
self._client = None
self._timeout = _DEFAULT_TIMEOUT
self._idle_timeout = _DEFAULT_IDLE_TIMEOUT
self._prefetch_result = ""
self._prefetch_lock = threading.Lock()
self._prefetch_thread = None
@@ -592,10 +620,17 @@ class HindsightMemoryProvider(MemoryProvider):
sys.stdout.write(" LLM API key: ")
sys.stdout.flush()
llm_key = getpass.getpass(prompt="") if sys.stdin.isatty() else sys.stdin.readline().strip()
# Always write explicitly (including empty) so the provider sees ""
# rather than a missing variable. The daemon reads from .env at
# startup and fails when HINDSIGHT_LLM_API_KEY is unset.
env_writes["HINDSIGHT_LLM_API_KEY"] = llm_key
if llm_key:
env_writes["HINDSIGHT_LLM_API_KEY"] = llm_key
else:
env_path = Path(hermes_home) / ".env"
existing_llm_key = ""
if env_path.exists():
for line in env_path.read_text().splitlines():
if line.startswith("HINDSIGHT_LLM_API_KEY="):
existing_llm_key = line.split("=", 1)[1]
break
env_writes["HINDSIGHT_LLM_API_KEY"] = existing_llm_key
# Step 4: Save everything
provider_config["bank_id"] = "hermes"
@@ -605,6 +640,11 @@ class HindsightMemoryProvider(MemoryProvider):
timeout_val = existing_timeout if existing_timeout else _DEFAULT_TIMEOUT
provider_config["timeout"] = timeout_val
env_writes["HINDSIGHT_TIMEOUT"] = str(timeout_val)
if mode == "local_embedded":
existing_idle_timeout = self._config.get("idle_timeout") if self._config else None
idle_timeout_val = existing_idle_timeout if existing_idle_timeout is not None else _DEFAULT_IDLE_TIMEOUT
provider_config["idle_timeout"] = idle_timeout_val
env_writes["HINDSIGHT_IDLE_TIMEOUT"] = str(idle_timeout_val)
config["memory"]["provider"] = "hindsight"
save_config(config)
@@ -693,6 +733,7 @@ class HindsightMemoryProvider(MemoryProvider):
{"key": "recall_max_input_chars", "description": "Maximum input query length for auto-recall", "default": 800},
{"key": "recall_prompt_preamble", "description": "Custom preamble for recalled memories in context"},
{"key": "timeout", "description": "API request timeout in seconds", "default": _DEFAULT_TIMEOUT},
{"key": "idle_timeout", "description": "Embedded daemon idle timeout in seconds (0 disables auto-shutdown)", "default": _DEFAULT_IDLE_TIMEOUT, "when": {"mode": "local_embedded"}},
]
def _get_client(self):
@@ -720,6 +761,14 @@ class HindsightMemoryProvider(MemoryProvider):
)
if self._llm_base_url:
kwargs["llm_base_url"] = self._llm_base_url
idle_timeout = _parse_int_setting(
self._config.get("idle_timeout")
if self._config.get("idle_timeout") is not None
else os.environ.get("HINDSIGHT_IDLE_TIMEOUT", self._idle_timeout),
_DEFAULT_IDLE_TIMEOUT,
)
self._idle_timeout = idle_timeout
kwargs["idle_timeout"] = idle_timeout
self._client = HindsightEmbedded(**kwargs)
else:
from hindsight_client import Hindsight
@@ -736,6 +785,38 @@ class HindsightMemoryProvider(MemoryProvider):
"""Schedule *coro* on the shared loop using the configured timeout."""
return _run_sync(coro, timeout=self._timeout)
def _is_retriable_embedded_connection_error(self, exc: Exception) -> bool:
"""Return True for stale embedded-daemon connection failures."""
if self._mode != "local_embedded":
return False
text = f"{type(exc).__name__}: {exc}".lower()
return any(
marker in text
for marker in (
"cannot connect to host",
"connection refused",
"connect call failed",
"clientconnectorerror",
)
)
def _run_hindsight_operation(self, operation):
"""Run an async Hindsight client operation, retrying once after idle shutdown."""
client = self._get_client()
try:
return self._run_sync(operation(client))
except Exception as exc:
if not self._is_retriable_embedded_connection_error(exc):
raise
logger.info(
"Hindsight embedded daemon appears unreachable; recreating client and retrying once: %s",
exc,
)
self._client = None
client = self._get_client()
self._client = client
return self._run_sync(operation(client))
def initialize(self, session_id: str, **kwargs) -> None:
self._session_id = str(session_id or "").strip()
self._parent_session_id = str(kwargs.get("parent_session_id", "") or "").strip()
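The retry-once behaviour of `_run_hindsight_operation` can be sketched without the Hindsight client. This is a simplified, synchronous illustration under stated assumptions: `run_with_one_retry`, `is_retriable`, and `make_client` are hypothetical names, and the real method runs coroutines through `_run_sync` and only retries in `local_embedded` mode.

```python
from typing import Any, Callable

# Same substrings the diff checks for when deciding whether a failure is retriable.
_RETRIABLE_MARKERS = (
    "cannot connect to host",
    "connection refused",
    "connect call failed",
    "clientconnectorerror",
)


def is_retriable(exc: Exception) -> bool:
    """Match on the exception type name and message, case-insensitively."""
    text = f"{type(exc).__name__}: {exc}".lower()
    return any(marker in text for marker in _RETRIABLE_MARKERS)


def run_with_one_retry(
    make_client: Callable[[], Any],
    operation: Callable[[Any], Any],
) -> Any:
    """Run operation(client); on a stale-connection error, rebuild the client and retry once."""
    client = make_client()
    try:
        return operation(client)
    except Exception as exc:
        if not is_retriable(exc):
            raise
        # The daemon may have shut down after its idle timeout; a fresh client restarts it.
        client = make_client()
        return operation(client)
```

Taking the operation as a callable that receives the client, rather than an already-built coroutine, is what makes the retry possible: the second attempt binds to the recreated client instead of the stale one.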
@@ -790,7 +871,14 @@ class HindsightMemoryProvider(MemoryProvider):
self._session_turns = []
self._mode = self._config.get("mode", "cloud")
# Read timeout from config or env var, fall back to default
self._timeout = self._config.get("timeout") or int(os.environ.get("HINDSIGHT_TIMEOUT", str(_DEFAULT_TIMEOUT)))
self._timeout = _parse_int_setting(
self._config.get("timeout") if self._config.get("timeout") is not None else os.environ.get("HINDSIGHT_TIMEOUT"),
_DEFAULT_TIMEOUT,
)
self._idle_timeout = _parse_int_setting(
self._config.get("idle_timeout") if self._config.get("idle_timeout") is not None else os.environ.get("HINDSIGHT_IDLE_TIMEOUT"),
_DEFAULT_IDLE_TIMEOUT,
)
# "local" is a legacy alias for "local_embedded"
if self._mode == "local":
self._mode = "local_embedded"
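The precedence used above (config value, then environment variable, then default) can be sketched as a small helper. `resolve_int_setting` and its `environ` parameter are hypothetical and exist only for illustration; `parse_int_setting` mirrors the `_parse_int_setting` function added in this diff.

```python
import logging
import os
from typing import Any, Mapping, Optional

logger = logging.getLogger(__name__)


def parse_int_setting(value: Any, default: int) -> int:
    """Mirror of _parse_int_setting from the diff: empty or invalid input yields the default."""
    if value is None or value == "":
        return default
    try:
        return int(value)
    except (TypeError, ValueError):
        logger.warning("Invalid integer setting %r; using default %s", value, default)
        return default


def resolve_int_setting(
    config: Mapping[str, Any],
    key: str,
    env_name: str,
    default: int,
    environ: Optional[Mapping[str, str]] = None,
) -> int:
    """Hypothetical helper: the config value wins unless it is None, then env, then default."""
    env = os.environ if environ is None else environ
    raw = config.get(key) if config.get(key) is not None else env.get(env_name)
    return parse_int_setting(raw, default)
```

The explicit `is not None` check matters for `idle_timeout`: an `or`-based fallback, as in the removed `timeout` line, would treat a configured `0` as missing, while here `0` survives and can disable auto-shutdown.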
@@ -981,10 +1069,9 @@ class HindsightMemoryProvider(MemoryProvider):
def _run():
try:
client = self._get_client()
if self._prefetch_method == "reflect":
logger.debug("Prefetch: calling reflect (bank=%s, query_len=%d)", self._bank_id, len(query))
resp = self._run_sync(client.areflect(bank_id=self._bank_id, query=query, budget=self._budget))
resp = self._run_hindsight_operation(lambda client: client.areflect(bank_id=self._bank_id, query=query, budget=self._budget))
text = resp.text or ""
else:
recall_kwargs: dict = {
@@ -998,7 +1085,7 @@ class HindsightMemoryProvider(MemoryProvider):
recall_kwargs["types"] = self._recall_types
logger.debug("Prefetch: calling recall (bank=%s, query_len=%d, budget=%s)",
self._bank_id, len(query), self._budget)
resp = self._run_sync(client.arecall(**recall_kwargs))
resp = self._run_hindsight_operation(lambda client: client.arecall(**recall_kwargs))
num_results = len(resp.results) if resp.results else 0
logger.debug("Prefetch: recall returned %d results", num_results)
text = "\n".join(f"- {r.text}" for r in resp.results if r.text) if resp.results else ""
@@ -1131,12 +1218,14 @@ class HindsightMemoryProvider(MemoryProvider):
item.pop("retain_async", None)
logger.debug("Hindsight retain: bank=%s, doc=%s, async=%s, content_len=%d, num_turns=%d",
self._bank_id, self._document_id, self._retain_async, len(content), len(self._session_turns))
self._run_sync(client.aretain_batch(
bank_id=self._bank_id,
items=[item],
document_id=self._document_id,
retain_async=self._retain_async,
))
self._run_hindsight_operation(
lambda client: client.aretain_batch(
bank_id=self._bank_id,
items=[item],
document_id=self._document_id,
retain_async=self._retain_async,
)
)
logger.debug("Hindsight retain succeeded")
except Exception as e:
logger.warning("Hindsight sync failed: %s", e, exc_info=True)
@@ -1152,12 +1241,6 @@ class HindsightMemoryProvider(MemoryProvider):
return [RETAIN_SCHEMA, RECALL_SCHEMA, REFLECT_SCHEMA]
def handle_tool_call(self, tool_name: str, args: dict, **kwargs) -> str:
try:
client = self._get_client()
except Exception as e:
logger.warning("Hindsight client init failed: %s", e)
return tool_error(f"Hindsight client unavailable: {e}")
if tool_name == "hindsight_retain":
content = args.get("content", "")
if not content:
@@ -1171,7 +1254,7 @@ class HindsightMemoryProvider(MemoryProvider):
)
logger.debug("Tool hindsight_retain: bank=%s, content_len=%d, context=%s",
self._bank_id, len(content), context)
self._run_sync(client.aretain(**retain_kwargs))
self._run_hindsight_operation(lambda client: client.aretain(**retain_kwargs))
logger.debug("Tool hindsight_retain: success")
return json.dumps({"result": "Memory stored successfully."})
except Exception as e:
@@ -1194,7 +1277,7 @@ class HindsightMemoryProvider(MemoryProvider):
recall_kwargs["types"] = self._recall_types
logger.debug("Tool hindsight_recall: bank=%s, query_len=%d, budget=%s",
self._bank_id, len(query), self._budget)
resp = self._run_sync(client.arecall(**recall_kwargs))
resp = self._run_hindsight_operation(lambda client: client.arecall(**recall_kwargs))
num_results = len(resp.results) if resp.results else 0
logger.debug("Tool hindsight_recall: %d results", num_results)
if not resp.results:
@@ -1212,9 +1295,11 @@ class HindsightMemoryProvider(MemoryProvider):
try:
logger.debug("Tool hindsight_reflect: bank=%s, query_len=%d, budget=%s",
self._bank_id, len(query), self._budget)
resp = self._run_sync(client.areflect(
bank_id=self._bank_id, query=query, budget=self._budget
))
resp = self._run_hindsight_operation(
lambda client: client.areflect(
bank_id=self._bank_id, query=query, budget=self._budget
)
)
logger.debug("Tool hindsight_reflect: response_len=%d", len(resp.text or ""))
return json.dumps({"result": resp.text or "No relevant memories found."})
except Exception as e:
@@ -1231,9 +1316,19 @@ class HindsightMemoryProvider(MemoryProvider):
if self._client is not None:
try:
if self._mode == "local_embedded":
# Use the public close() API. The RuntimeError from
# aiohttp's "attached to a different loop" is expected
# and harmless — the daemon keeps running independently.
# HindsightEmbedded.close() delegates to its sync client.close().
# When Hermes created/used that client on the shared async loop,
# closing it from this thread can raise "attached to a different
# loop" before aiohttp releases the session. Close the embedded
# inner async client on the shared loop first, then let the
# wrapper clean up daemon/UI bookkeeping.
inner_client = getattr(self._client, "_client", None)
if inner_client is not None and hasattr(inner_client, "aclose"):
_run_sync(inner_client.aclose())
try:
self._client._client = None
except Exception:
pass
try:
self._client.close()
except RuntimeError:
+1 -1
@@ -43,7 +43,7 @@ _TIMEOUT = 30.0
# ---------------------------------------------------------------------------
# Process-level atexit safety net — ensures pending sessions are committed
# even if shutdown_memory_provider is never called (e.g. gateway crash,
# SIGKILL, or exception in _async_flush_memories preventing shutdown).
# SIGKILL, or exception in the session expiry watcher preventing shutdown).
# ---------------------------------------------------------------------------
_last_active_provider: Optional["OpenVikingMemoryProvider"] = None
+314 -331
@@ -40,6 +40,7 @@ from types import SimpleNamespace
import urllib.request
import uuid
from typing import List, Dict, Any, Optional
from urllib.parse import urlparse, parse_qs, urlunparse
from openai import OpenAI
import fire
from datetime import datetime
@@ -891,7 +892,6 @@ class AIAgent:
checkpoints_enabled: bool = False,
checkpoint_max_snapshots: int = 50,
pass_session_id: bool = False,
persist_session: bool = True,
):
"""
Initialize the AI Agent.
@@ -963,7 +963,6 @@ class AIAgent:
self.background_review_callback = None # Optional sync callback for gateway delivery
self.skip_context_files = skip_context_files
self.pass_session_id = pass_session_id
self.persist_session = persist_session
self._credential_pool = credential_pool
self.log_prefix_chars = log_prefix_chars
self.log_prefix = f"{log_prefix} " if log_prefix else ""
@@ -1033,12 +1032,16 @@ class AIAgent:
# surface.
# When api_mode was explicitly provided, respect it — the user
# knows what their endpoint supports (#10473).
# Exception: Azure OpenAI serves gpt-5.x on /chat/completions and
# does NOT support the Responses API — skip the upgrade for Azure
# (openai.azure.com), even though it looks OpenAI-compatible.
if (
api_mode is None
and self.api_mode == "chat_completions"
and self.provider != "copilot-acp"
and not str(self.base_url or "").lower().startswith("acp://copilot")
and not str(self.base_url or "").lower().startswith("acp+tcp://")
and not self._is_azure_openai_url()
and (
self._is_direct_openai_url()
or self._provider_model_requires_responses_api(
@@ -1314,7 +1317,22 @@ class AIAgent:
if api_key and base_url:
# Explicit credentials from CLI/gateway — construct directly.
# The runtime provider resolver already handled auth for us.
client_kwargs = {"api_key": api_key, "base_url": base_url}
# Extract query params (e.g. Azure api-version) from base_url
# and pass via default_query to prevent loss during SDK URL
# joining (httpx drops query string when joining paths).
_parsed_url = urlparse(base_url)
if _parsed_url.query:
_clean_url = urlunparse(_parsed_url._replace(query=""))
_query_params = {
k: v[0] for k, v in parse_qs(_parsed_url.query).items()
}
client_kwargs = {
"api_key": api_key,
"base_url": _clean_url,
"default_query": _query_params,
}
else:
client_kwargs = {"api_key": api_key, "base_url": base_url}
if _provider_timeout is not None:
client_kwargs["timeout"] = _provider_timeout
if self.provider == "copilot-acp":
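The query-parameter extraction above can be sketched as a standalone function. `split_base_url` is a hypothetical name, and the resource name and `api-version` value in the usage note are made-up examples; the diff itself performs the same steps inline and passes the result to the client as `default_query`.

```python
from urllib.parse import parse_qs, urlparse, urlunparse


def split_base_url(base_url: str) -> tuple[str, dict[str, str]]:
    """Return (URL without its query string, first value of each query parameter)."""
    parsed = urlparse(base_url)
    if not parsed.query:
        # Nothing to extract; leave the URL exactly as given.
        return base_url, {}
    clean_url = urlunparse(parsed._replace(query=""))
    # parse_qs maps each key to a list of values; keep only the first, as the diff does.
    query_params = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    return clean_url, query_params
```

For an input such as `https://example.openai.azure.com/openai/v1?api-version=2024-10-21`, this yields the bare URL plus `{"api-version": "2024-10-21"}`. Note that `parse_qs` drops parameters with blank values by default, so a bare `?flag=` would not be forwarded.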
@@ -1578,7 +1596,6 @@ class AIAgent:
self._memory_enabled = False
self._user_profile_enabled = False
self._memory_nudge_interval = 10
self._memory_flush_min_turns = 6
self._turns_since_memory = 0
self._iters_since_skill = 0
if not skip_memory:
@@ -1587,7 +1604,6 @@ class AIAgent:
self._memory_enabled = mem_config.get("memory_enabled", False)
self._user_profile_enabled = mem_config.get("user_profile_enabled", False)
self._memory_nudge_interval = int(mem_config.get("nudge_interval", 10))
self._memory_flush_min_turns = int(mem_config.get("flush_min_turns", 6))
if self._memory_enabled or self._user_profile_enabled:
from tools.memory_tool import MemoryStore
self._memory_store = MemoryStore(
@@ -1767,43 +1783,64 @@ class AIAgent:
# Store for reuse in switch_model (so config override persists across model switches)
self._config_context_length = _config_context_length
# Resolve custom_providers list once for reuse below (startup
# context-length override and plugin context-engine init).
try:
from hermes_cli.config import get_compatible_custom_providers
_custom_providers = get_compatible_custom_providers(_agent_cfg)
except Exception:
_custom_providers = _agent_cfg.get("custom_providers")
if not isinstance(_custom_providers, list):
_custom_providers = []
# Check custom_providers per-model context_length
if _config_context_length is None:
if _config_context_length is None and _custom_providers:
try:
from hermes_cli.config import get_compatible_custom_providers
_custom_providers = get_compatible_custom_providers(_agent_cfg)
from hermes_cli.config import get_custom_provider_context_length
_cp_ctx_resolved = get_custom_provider_context_length(
model=self.model,
base_url=self.base_url,
custom_providers=_custom_providers,
)
if _cp_ctx_resolved:
_config_context_length = int(_cp_ctx_resolved)
except Exception:
_custom_providers = _agent_cfg.get("custom_providers")
if not isinstance(_custom_providers, list):
_custom_providers = []
for _cp_entry in _custom_providers:
if not isinstance(_cp_entry, dict):
continue
_cp_url = (_cp_entry.get("base_url") or "").rstrip("/")
if _cp_url and _cp_url == self.base_url.rstrip("/"):
_cp_models = _cp_entry.get("models", {})
if isinstance(_cp_models, dict):
_cp_model_cfg = _cp_models.get(self.model, {})
if isinstance(_cp_model_cfg, dict):
_cp_ctx = _cp_model_cfg.get("context_length")
if _cp_ctx is not None:
try:
_config_context_length = int(_cp_ctx)
except (TypeError, ValueError):
logger.warning(
"Invalid context_length for model %r in "
"custom_providers: %r — must be a plain "
"integer (e.g. 256000, not '256K'). "
"Falling back to auto-detection.",
self.model, _cp_ctx,
)
print(
f"\n⚠ Invalid context_length for model {self.model!r} in custom_providers: {_cp_ctx!r}\n"
f" Must be a plain integer (e.g. 256000, not '256K').\n"
f" Falling back to auto-detected context window.\n",
file=sys.stderr,
)
break
_cp_ctx_resolved = None
# Surface a clear warning if the user set a context_length but it
# wasn't a valid positive int — the helper silently skips those.
if _config_context_length is None:
_target = self.base_url.rstrip("/") if self.base_url else ""
for _cp_entry in _custom_providers:
if not isinstance(_cp_entry, dict):
continue
_cp_url = (_cp_entry.get("base_url") or "").rstrip("/")
if _target and _cp_url == _target:
_cp_models = _cp_entry.get("models", {})
if isinstance(_cp_models, dict):
_cp_model_cfg = _cp_models.get(self.model, {})
if isinstance(_cp_model_cfg, dict):
_cp_ctx = _cp_model_cfg.get("context_length")
if _cp_ctx is not None:
try:
_parsed = int(_cp_ctx)
if _parsed <= 0:
raise ValueError
except (TypeError, ValueError):
logger.warning(
"Invalid context_length for model %r in "
"custom_providers: %r — must be a positive "
"integer (e.g. 256000, not '256K'). "
"Falling back to auto-detection.",
self.model, _cp_ctx,
)
print(
f"\n⚠ Invalid context_length for model {self.model!r} in custom_providers: {_cp_ctx!r}\n"
f" Must be a positive integer (e.g. 256000, not '256K').\n"
f" Falling back to auto-detected context window.\n",
file=sys.stderr,
)
break
# Select context engine: config-driven (like memory providers).
# 1. Check config.yaml context.engine setting
@@ -1853,6 +1890,7 @@ class AIAgent:
api_key=getattr(self, "api_key", ""),
config_context_length=_config_context_length,
provider=self.provider,
custom_providers=_custom_providers,
)
self.context_compressor.update_model(
model=self.model,
@@ -2143,12 +2181,23 @@ class AIAgent:
# ── Update context compressor ──
if hasattr(self, "context_compressor") and self.context_compressor:
from agent.model_metadata import get_model_context_length
# Re-read custom_providers from live config so per-model
# context_length overrides are honored when switching to a
# custom provider mid-session (closes #15779).
_sm_custom_providers = None
try:
from hermes_cli.config import load_config, get_compatible_custom_providers
_sm_cfg = load_config()
_sm_custom_providers = get_compatible_custom_providers(_sm_cfg)
except Exception:
_sm_custom_providers = None
new_context_length = get_model_context_length(
self.model,
base_url=self.base_url,
api_key=self.api_key,
provider=self.provider,
config_context_length=getattr(self, "_config_context_length", None),
custom_providers=_sm_custom_providers,
)
self.context_compressor.update_model(
model=self.model,
@@ -2399,6 +2448,7 @@ class AIAgent:
base_url=aux_base_url,
api_key=aux_api_key,
config_context_length=getattr(self, "_aux_compression_context_length_config", None),
provider=getattr(self, "provider", ""),
)
# Hard floor: the auxiliary compression model must have at least
@@ -2425,6 +2475,11 @@ class AIAgent:
# compression actually works this session. The hard floor
# above guarantees aux_context >= MINIMUM_CONTEXT_LENGTH,
# so the new threshold is always >= 64K.
#
# The compression summariser sends a single user-role
# prompt (no system prompt, no tools) to the aux model, so
# new_threshold == aux_context is safe: the request is
# the raw messages plus a small summarisation instruction.
old_threshold = threshold
new_threshold = aux_context
self.context_compressor.threshold_tokens = new_threshold
@@ -2500,6 +2555,22 @@ class AIAgent:
)
return hostname == "api.openai.com"
def _is_azure_openai_url(self, base_url: str = None) -> bool:
"""Return True when a base URL targets Azure OpenAI.
Azure OpenAI exposes an OpenAI-compatible endpoint at
``{resource}.openai.azure.com/openai/v1`` that accepts the
standard ``openai`` Python client. Unlike api.openai.com it
does NOT support the Responses API; gpt-5.x models are served
on the regular ``/chat/completions`` path, so routing decisions
must treat Azure separately from direct OpenAI.
"""
if base_url is not None:
url = str(base_url).lower()
else:
url = getattr(self, "_base_url_lower", "") or ""
return "openai.azure.com" in url
def _resolved_api_call_timeout(self) -> float:
"""Resolve the effective per-call request timeout in seconds.
@@ -2671,12 +2742,14 @@ class AIAgent:
def _max_tokens_param(self, value: int) -> dict:
"""Return the correct max tokens kwarg for the current provider.
OpenAI's newer models (gpt-4o, o-series, gpt-5+) require
'max_completion_tokens'. OpenRouter, local models, and older
'max_completion_tokens'. Azure OpenAI also requires
'max_completion_tokens' for gpt-5.x models served via the
OpenAI-compatible endpoint. OpenRouter, local models, and older
OpenAI models use 'max_tokens'.
"""
if self._is_direct_openai_url():
if self._is_direct_openai_url() or self._is_azure_openai_url():
return {"max_completion_tokens": value}
return {"max_tokens": value}
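The provider-dependent kwarg selection can be sketched as free functions. These are illustrative stand-ins, not the agent's methods: the names are hypothetical, the Azure check copies the substring test shown in the diff, and the direct-OpenAI check is reduced to the hostname comparison visible in its last line.

```python
from typing import Optional
from urllib.parse import urlparse


def is_azure_openai_url(base_url: Optional[str]) -> bool:
    """Substring check, as in the diff: any URL containing openai.azure.com counts as Azure."""
    return "openai.azure.com" in str(base_url or "").lower()


def is_direct_openai_url(base_url: Optional[str]) -> bool:
    """Simplified assumption: direct OpenAI means the hostname is exactly api.openai.com."""
    return urlparse(str(base_url or "")).hostname == "api.openai.com"


def max_tokens_param(base_url: Optional[str], value: int) -> dict:
    """Pick the max-tokens kwarg name based on the endpoint."""
    if is_direct_openai_url(base_url) or is_azure_openai_url(base_url):
        return {"max_completion_tokens": value}
    return {"max_tokens": value}
```

Returning a one-key dict lets the caller merge the result straight into the request kwargs without branching on the provider again.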
@@ -3034,13 +3107,28 @@ class AIAgent:
)
_SKILL_REVIEW_PROMPT = (
"Review the conversation above and consider saving or updating a skill if appropriate.\n\n"
"Focus on: was a non-trivial approach used to complete a task that required trial "
"and error, or changing course due to experiential findings along the way, or did "
"the user expect or desire a different method or outcome?\n\n"
"If a relevant skill already exists, update it with what you learned. "
"Otherwise, create a new skill if the approach is reusable.\n"
"If nothing is worth saving, just say 'Nothing to save.' and stop."
"Review the conversation above and consider whether a skill should be saved or updated.\n\n"
"Work in this order — do not skip steps:\n\n"
"1. SURVEY the existing skill landscape first. Call skills_list to see what you "
"have. If anything looks potentially relevant, skill_view it before deciding. "
"You are looking for the CLASS of task that just happened, not the exact task. "
"Example: a successful Tauri build is in the class \"desktop app build "
"troubleshooting\", not \"fix my specific Tauri error today\".\n\n"
"2. THINK CLASS-FIRST. What general pattern of task did the user just complete? "
"What conditions will trigger this pattern again? Describe the class in one "
"sentence before looking at what to save.\n\n"
"3. PREFER GENERALIZING AN EXISTING SKILL over creating a new one. If a skill "
"already covers the class — even partially — update it (skill_manage patch) "
"with the new insight. Broaden its \"when to use\" trigger if needed.\n\n"
"4. ONLY CREATE A NEW SKILL when no existing skill reasonably covers the class. "
"When you create one, name and scope it at the class level "
"(\"react-i18n-setup\", not \"add-i18n-to-my-dashboard-app\"). The trigger "
"section must describe the class of situations, not this one session.\n\n"
"5. If you notice two existing skills that overlap, note it in your response "
"so a future review can consolidate them. Do not consolidate now unless the "
"overlap is obvious and low-risk.\n\n"
"Only act when something is genuinely worth saving. "
"If nothing stands out, just say 'Nothing to save.' and stop."
)
_COMBINED_REVIEW_PROMPT = (
@@ -3050,9 +3138,16 @@ class AIAgent:
"about how you should behave, their work style, or ways they want you to operate? "
"If so, save using the memory tool.\n\n"
"**Skills**: Was a non-trivial approach used to complete a task that required trial "
"and error, or changing course due to experiential findings along the way, or did "
"the user expect or desire a different method or outcome? If a relevant skill "
"already exists, update it. Otherwise, create a new one if the approach is reusable.\n\n"
"and error, changing course due to experiential findings, or a different method "
"or outcome than the user expected? If so, work in this order:\n"
" a. SURVEY existing skills first (skills_list, then skill_view on candidates).\n"
" b. Identify the CLASS of task, not the specific task "
"(\"desktop app build troubleshooting\", not \"fix my Tauri error\").\n"
" c. PREFER UPDATING/GENERALIZING an existing skill that covers the class.\n"
" d. ONLY CREATE A NEW SKILL if no existing one covers the class. Scope at "
"the class level, not this one session.\n"
" e. If you notice overlapping skills during the survey, note it so a future "
"review can consolidate them.\n\n"
"Only act if there's something genuinely worth saving. "
"If nothing stands out, just say 'Nothing to save.' and stop."
)
@@ -3150,12 +3245,25 @@ class AIAgent:
with open(os.devnull, "w") as _devnull, \
contextlib.redirect_stdout(_devnull), \
contextlib.redirect_stderr(_devnull):
# Inherit the parent agent's live runtime (provider, model,
# base_url, api_key, api_mode) so the fork uses the exact
# same credentials the main turn is using. Without this,
# AIAgent.__init__ re-runs auto-resolution from env vars,
# which fails for OAuth-only providers, session-scoped
# creds, or credential-pool setups where the resolver can't
# reconstruct auth from scratch -- producing the spurious
# "No LLM provider configured" warning at end of turn.
_parent_runtime = self._current_main_runtime()
review_agent = AIAgent(
model=self.model,
max_iterations=8,
quiet_mode=True,
platform=self.platform,
provider=self.provider,
api_mode=_parent_runtime.get("api_mode") or None,
base_url=_parent_runtime.get("base_url") or None,
api_key=_parent_runtime.get("api_key") or None,
credential_pool=getattr(self, "_credential_pool", None),
parent_session_id=self.session_id,
)
review_agent._memory_write_origin = "background_review"
@@ -3196,10 +3304,19 @@ class AIAgent:
logger.warning("Background memory/skill review failed: %s", e)
self._emit_auxiliary_failure("background review", e)
finally:
# Close all resources (httpx client, subprocesses, etc.) so
# GC doesn't try to clean them up on a dead asyncio event
# loop (which produces "Event loop is closed" errors).
# Background review agents can initialize memory providers
# (for example Hindsight) that own their own network clients.
# Explicitly stop those providers before closing the agent so
# their aiohttp sessions do not leak until GC/process exit.
# Then close all remaining resources (httpx client,
# subprocesses, etc.) so GC doesn't try to clean them up on a
# dead asyncio event loop (which produces "Event loop is
# closed" errors).
if review_agent is not None:
try:
review_agent.shutdown_memory_provider()
except Exception:
pass
try:
review_agent.close()
except Exception:
@@ -3256,10 +3373,7 @@ class AIAgent:
"""Save session state to both JSON log and SQLite on any exit path.
Ensures conversations are never lost, even on errors or early returns.
Skipped when ``persist_session=False`` (ephemeral helper flows).
"""
if not self.persist_session:
return
self._apply_persist_user_message_override(messages)
self._session_messages = messages
self._save_session_log(messages)
@@ -3309,6 +3423,7 @@ class AIAgent:
reasoning_content=msg.get("reasoning_content") if role == "assistant" else None,
reasoning_details=msg.get("reasoning_details") if role == "assistant" else None,
codex_reasoning_items=msg.get("codex_reasoning_items") if role == "assistant" else None,
codex_message_items=msg.get("codex_message_items") if role == "assistant" else None,
)
self._last_flushed_db_idx = len(messages)
except Exception as e:
@@ -5137,6 +5252,8 @@ class AIAgent:
# response.incomplete instead of response.completed).
self._codex_streamed_text_parts: list = []
for attempt in range(max_stream_retries + 1):
if self._interrupt_requested:
raise InterruptedError("Agent interrupted before Codex stream retry")
collected_output_items: list = []
try:
with active_client.responses.stream(**api_kwargs) as stream:
@@ -5431,6 +5548,11 @@ class AIAgent:
# Other anthropic_messages providers (MiniMax, Alibaba, etc.) use their own keys.
if self.provider != "anthropic":
return False
# Azure endpoints use static API keys — OAuth token rotation doesn't apply.
# Refreshing would pick up ~/.claude/.credentials.json OAuth token and break auth.
_base = getattr(self, "_anthropic_base_url", "") or ""
if "azure.com" in _base:
return False
try:
from agent.anthropic_adapter import resolve_anthropic_token, build_anthropic_client
@@ -6306,6 +6428,14 @@ class AIAgent:
try:
for _stream_attempt in range(_max_stream_retries + 1):
# Check for interrupt before each retry attempt. Without
# this, /stop closes the HTTP connection (outer poll loop),
# but the retry loop opens a FRESH connection — negating the
# interrupt entirely. On slow providers (ollama-cloud) each
# retry can block for the full stream-read timeout (120s+),
# causing multi-minute delays between /stop and response.
if self._interrupt_requested:
raise InterruptedError("Agent interrupted before stream retry")
try:
if self.api_mode == "anthropic_messages":
self._try_refresh_anthropic_client_credentials()
@@ -6779,10 +6909,15 @@ class AIAgent:
# Determine api_mode from provider / base URL / model
fb_api_mode = "chat_completions"
fb_base_url = str(fb_client.base_url)
_fb_is_azure = self._is_azure_openai_url(fb_base_url)
if fb_provider == "openai-codex":
fb_api_mode = "codex_responses"
elif fb_provider == "anthropic" or fb_base_url.rstrip("/").lower().endswith("/anthropic"):
fb_api_mode = "anthropic_messages"
elif _fb_is_azure:
# Azure OpenAI serves gpt-5.x on /chat/completions — does NOT
# support the Responses API. Stay on chat_completions.
fb_api_mode = "chat_completions"
elif self._is_direct_openai_url(fb_base_url):
fb_api_mode = "codex_responses"
elif self._provider_model_requires_responses_api(
@@ -7655,6 +7790,13 @@ class AIAgent:
if codex_items:
msg["codex_reasoning_items"] = codex_items
# Codex Responses API: preserve exact assistant message items (with
# id/phase) so follow-up turns can replay structured items instead of
# flattening to plain text. This is required for prefix cache hits.
codex_message_items = getattr(assistant_message, "codex_message_items", None)
if codex_message_items:
msg["codex_message_items"] = codex_message_items
if assistant_message.tool_calls:
tool_calls = []
for tool_call in assistant_message.tool_calls:
@@ -7740,25 +7882,53 @@ class AIAgent:
if source_msg.get("role") != "assistant":
return
explicit_reasoning = source_msg.get("reasoning_content")
if isinstance(explicit_reasoning, str):
api_msg["reasoning_content"] = explicit_reasoning
# 1. Explicit reasoning_content already set — preserve it verbatim
# (includes DeepSeek/Kimi's own empty-string placeholder written at
# creation time, and any valid reasoning content from the same provider).
existing = source_msg.get("reasoning_content")
if isinstance(existing, str):
api_msg["reasoning_content"] = existing
return
# 2. Healthy session: promote 'reasoning' field to 'reasoning_content'
# for providers that use the internal 'reasoning' key.
# This must happen BEFORE the DeepSeek/Kimi tool-call check so that
# genuine reasoning content is not overwritten by the empty-string
# fallback (#15812 regression in PR #15478).
normalized_reasoning = source_msg.get("reasoning")
if isinstance(normalized_reasoning, str) and normalized_reasoning:
api_msg["reasoning_content"] = normalized_reasoning
return
# Providers that require an echoed reasoning_content on every
# assistant tool-call turn. Detection logic lives in the per-provider
# helpers so both the creation path (_build_assistant_message) and
# this replay path stay in sync.
if source_msg.get("tool_calls") and (
# 3. DeepSeek / Kimi thinking mode: tool-call turns that lack
# reasoning_content are "poisoned history" — a prior provider (MiniMax,
# etc.) left them empty. DeepSeek returns HTTP 400 if reasoning_content
# is absent on replay; inject "" to satisfy the provider's requirement
# without forwarding any cross-provider reasoning content.
needs_empty_reasoning = (
source_msg.get("tool_calls")
and (
self._needs_kimi_tool_reasoning()
or self._needs_deepseek_tool_reasoning()
)
)
if needs_empty_reasoning:
api_msg["reasoning_content"] = ""
return
# 4. DeepSeek / Kimi thinking mode: all assistant messages need
# reasoning_content. Inject "" to satisfy the provider's requirement
# when no explicit reasoning content is present.
if (
self._needs_kimi_tool_reasoning()
or self._needs_deepseek_tool_reasoning()
):
api_msg["reasoning_content"] = ""
return
# 5. reasoning_content was present but not a string (e.g. None after
# context compaction). Don't pass null to the API.
api_msg.pop("reasoning_content", None)
@staticmethod
def _sanitize_tool_calls_for_strict_api(api_msg: dict) -> dict:
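The numbered precedence rules in the new `reasoning_content` logic can be restated as a standalone function. This is a sketch under assumptions: `copy_reasoning_content` is a hypothetical free function, and the single `provider_requires_echo` flag stands in for the `_needs_kimi_tool_reasoning()` / `_needs_deepseek_tool_reasoning()` checks. Steps 3 and 4 of the hunk both inject an empty string, so the sketch folds them into one branch.

```python
def copy_reasoning_content(source_msg: dict, api_msg: dict, provider_requires_echo: bool) -> None:
    """Apply the precedence rules to api_msg in place."""
    if source_msg.get("role") != "assistant":
        return
    # 1. An explicit string (including "") is preserved verbatim.
    existing = source_msg.get("reasoning_content")
    if isinstance(existing, str):
        api_msg["reasoning_content"] = existing
        return
    # 2. A non-empty internal 'reasoning' field is promoted before any fallback.
    promoted = source_msg.get("reasoning")
    if isinstance(promoted, str) and promoted:
        api_msg["reasoning_content"] = promoted
        return
    # 3/4. Providers that require the field get an empty placeholder.
    if provider_requires_echo:
        api_msg["reasoning_content"] = ""
        return
    # 5. Never forward a non-string value such as None.
    api_msg.pop("reasoning_content", None)
```

Putting the promotion step ahead of the empty-string fallback is the point of the fix: genuine reasoning text wins over the placeholder.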
@@ -7910,251 +8080,6 @@ class AIAgent:
"""
return self.api_mode != "codex_responses"
def flush_memories(self, messages: list = None, min_turns: int = None):
"""Give the model one turn to persist memories before context is lost.
Called before compression, session reset, or CLI exit. Injects a flush
message, makes one API call, executes any memory tool calls, then
strips all flush artifacts from the message list.
Args:
messages: The current conversation messages. If None, uses
self._session_messages (last run_conversation state).
min_turns: Minimum user turns required to trigger the flush.
None = use config value (flush_min_turns).
0 = always flush (used for compression).
"""
if self._memory_flush_min_turns == 0 and min_turns is None:
return
if "memory" not in self.valid_tool_names or not self._memory_store:
return
effective_min = min_turns if min_turns is not None else self._memory_flush_min_turns
if self._user_turn_count < effective_min:
return
if messages is None:
messages = getattr(self, '_session_messages', None)
if not messages or len(messages) < 3:
return
flush_content = (
"[System: The session is being compressed. "
"Save anything worth remembering — prioritize user preferences, "
"corrections, and recurring patterns over task-specific details.]"
)
_sentinel = f"__flush_{id(self)}_{time.monotonic()}"
flush_msg = {"role": "user", "content": flush_content, "_flush_sentinel": _sentinel}
messages.append(flush_msg)
try:
# Build API messages for the flush call
_needs_sanitize = self._should_sanitize_tool_calls()
api_messages = []
for msg in messages:
api_msg = msg.copy()
self._copy_reasoning_content_for_api(msg, api_msg)
api_msg.pop("reasoning", None)
api_msg.pop("finish_reason", None)
api_msg.pop("_flush_sentinel", None)
api_msg.pop("_thinking_prefill", None)
if _needs_sanitize:
self._sanitize_tool_calls_for_strict_api(api_msg)
api_messages.append(api_msg)
if self._cached_system_prompt:
api_messages = [{"role": "system", "content": self._cached_system_prompt}] + api_messages
# Make one API call with only the memory tool available
memory_tool_def = None
for t in (self.tools or []):
if t.get("function", {}).get("name") == "memory":
memory_tool_def = t
break
if not memory_tool_def:
messages.pop() # remove flush msg
return
# Use auxiliary client for the flush call when available --
# it's cheaper and avoids Codex Responses API incompatibility.
from agent.auxiliary_client import (
call_llm as _call_llm,
_fixed_temperature_for_model,
OMIT_TEMPERATURE,
)
_aux_available = True
# Kimi models manage temperature server-side — omit it entirely.
# Other models with a fixed contract get that value; everyone else
# gets the historical 0.3 default.
_fixed_temp = _fixed_temperature_for_model(self.model, self.base_url)
_omit_temperature = _fixed_temp is OMIT_TEMPERATURE
if _omit_temperature:
_flush_temperature = None
elif _fixed_temp is not None:
_flush_temperature = _fixed_temp
else:
_flush_temperature = 0.3
aux_error = None
try:
response = _call_llm(
task="flush_memories",
messages=api_messages,
tools=[memory_tool_def],
temperature=_flush_temperature,
max_tokens=5120,
# timeout resolved from auxiliary.flush_memories.timeout config
)
except Exception as e:
aux_error = e
_aux_available = False
response = None
if not _aux_available and self.api_mode == "codex_responses":
# No auxiliary client -- use the Codex Responses path directly
codex_kwargs = self._build_api_kwargs(api_messages)
_ct_flush = self._get_transport()
if _ct_flush is not None:
codex_kwargs["tools"] = _ct_flush.convert_tools([memory_tool_def])
elif not codex_kwargs.get("tools"):
codex_kwargs["tools"] = [memory_tool_def]
if _flush_temperature is not None:
codex_kwargs["temperature"] = _flush_temperature
else:
codex_kwargs.pop("temperature", None)
if "max_output_tokens" in codex_kwargs:
codex_kwargs["max_output_tokens"] = 5120
response = self._run_codex_stream(codex_kwargs)
elif not _aux_available and self.api_mode == "anthropic_messages":
# Native Anthropic — use the transport for kwargs
_tflush = self._get_transport()
ant_kwargs = _tflush.build_kwargs(
model=self.model, messages=api_messages,
tools=[memory_tool_def], max_tokens=5120,
reasoning_config=None,
preserve_dots=self._anthropic_preserve_dots(),
)
response = self._anthropic_messages_create(ant_kwargs)
elif not _aux_available:
api_kwargs = {
"model": self.model,
"messages": api_messages,
"tools": [memory_tool_def],
**self._max_tokens_param(5120),
}
if _flush_temperature is not None:
api_kwargs["temperature"] = _flush_temperature
from agent.auxiliary_client import _get_task_timeout
response = self._ensure_primary_openai_client(reason="flush_memories").chat.completions.create(
**api_kwargs, timeout=_get_task_timeout("flush_memories")
)
if aux_error is not None:
logger.warning("Auxiliary memory flush failed; used fallback path: %s", aux_error)
self._emit_auxiliary_failure("memory flush", aux_error)
def _openai_tool_calls(resp):
if resp is not None and hasattr(resp, "choices") and resp.choices:
msg = getattr(resp.choices[0], "message", None)
calls = getattr(msg, "tool_calls", None)
if calls:
return calls
return []
def _codex_output_tool_calls(resp):
calls = []
for item in getattr(resp, "output", []) or []:
if getattr(item, "type", None) == "function_call":
calls.append(SimpleNamespace(
id=getattr(item, "call_id", None),
type="function",
function=SimpleNamespace(
name=getattr(item, "name", ""),
arguments=getattr(item, "arguments", "{}"),
),
))
return calls
# Extract tool calls from the response, handling all API formats
tool_calls = []
if self.api_mode == "codex_responses" and not _aux_available:
_ct_flush = self._get_transport()
_cnr_flush = _ct_flush.normalize_response(response) if _ct_flush is not None else None
if _cnr_flush and _cnr_flush.tool_calls:
tool_calls = [
SimpleNamespace(
id=tc.id, type="function",
function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
) for tc in _cnr_flush.tool_calls
]
else:
tool_calls = _codex_output_tool_calls(response)
elif self.api_mode == "anthropic_messages" and not _aux_available:
_tfn = self._get_transport()
_flush_result = _tfn.normalize_response(response, strip_tool_prefix=self._is_anthropic_oauth)
if _flush_result and _flush_result.tool_calls:
tool_calls = [
SimpleNamespace(
id=tc.id, type="function",
function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
) for tc in _flush_result.tool_calls
]
elif self.api_mode in ("chat_completions", "bedrock_converse"):
# chat_completions / bedrock — normalize through transport
_tfn = self._get_transport()
_flush_result = _tfn.normalize_response(response) if _tfn is not None else None
if _flush_result and _flush_result.tool_calls:
tool_calls = _flush_result.tool_calls
else:
tool_calls = _openai_tool_calls(response)
elif _aux_available and hasattr(response, "choices") and response.choices:
# Auxiliary client returned OpenAI-shaped response while main
# api_mode is codex/anthropic — extract tool_calls from .choices
tool_calls = _openai_tool_calls(response)
for tc in tool_calls:
if tc.function.name == "memory":
try:
args = json.loads(tc.function.arguments)
flush_target = args.get("target", "memory")
from tools.memory_tool import memory_tool as _memory_tool
_memory_tool(
action=args.get("action"),
target=flush_target,
content=args.get("content"),
old_text=args.get("old_text"),
store=self._memory_store,
)
if self._memory_manager and args.get("action") in ("add", "replace"):
try:
self._memory_manager.on_memory_write(
args.get("action", ""),
flush_target,
args.get("content", ""),
metadata=self._build_memory_write_metadata(
write_origin="memory_flush",
execution_context="flush_memories",
),
)
except Exception:
pass
if not self.quiet_mode:
print(f" 🧠 Memory flush: saved to {args.get('target', 'memory')}")
except Exception as e:
logger.warning("Memory flush tool call failed: %s", e)
self._emit_auxiliary_failure("memory flush tool", e)
except Exception as e:
logger.warning("Memory flush API call failed: %s", e)
self._emit_auxiliary_failure("memory flush", e)
finally:
# Strip flush artifacts: remove everything from the flush message onward.
# Use sentinel marker instead of identity check for robustness.
while messages and messages[-1].get("_flush_sentinel") != _sentinel:
messages.pop()
if not messages:
break
if messages and messages[-1].get("_flush_sentinel") == _sentinel:
messages.pop()
def _compress_context(self, messages: list, system_message: str, *, approx_tokens: int = None, task_id: str = "default", focus_topic: str = None) -> tuple:
"""Compress conversation context and split the session in SQLite.
@@ -8173,8 +8098,6 @@ class AIAgent:
f"{approx_tokens:,}" if approx_tokens else "unknown", self.model,
focus_topic,
)
# Pre-compression memory flush: let the model save memories before they're lost
self.flush_memories(messages, min_turns=0)
# Notify external memory provider before compression discards context
if self._memory_manager:
@@ -8237,6 +8160,22 @@ class AIAgent:
except Exception as e:
logger.warning("Session DB compression split failed — new session will NOT be indexed: %s", e)
# Notify the context engine that the session_id rotated because of
# compression (not a fresh /new). Plugin engines (e.g. hermes-lcm) use
# boundary_reason="compression" to preserve DAG lineage across the
# rollover instead of re-initializing fresh per-session state.
# See hermes-lcm#68. Built-in ContextCompressor ignores kwargs.
try:
_old_sid = locals().get("old_session_id")
if _old_sid and hasattr(self.context_compressor, "on_session_start"):
self.context_compressor.on_session_start(
self.session_id or "",
boundary_reason="compression",
old_session_id=_old_sid,
)
except Exception as _ce_err:
logger.debug("context engine on_session_start (compression): %s", _ce_err)
# Warn on repeated compressions (quality degrades with each pass)
_cc = self.context_compressor.compression_count
if _cc >= 2:
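The compression-rollover notification added above follows a common pattern for optional plugin hooks: call the hook only if it exists, pass the extra context as keyword arguments, and never let a plugin error break the caller. A minimal sketch, assuming a hypothetical `RecordingEngine` plugin and a hypothetical helper name `notify_compression_rollover`:

```python
import logging

logger = logging.getLogger("context_engine_sketch")


class RecordingEngine:
    """Hypothetical plugin engine that just records the hook calls it receives."""

    def __init__(self):
        self.calls = []

    def on_session_start(self, session_id, **kwargs):
        self.calls.append((session_id, kwargs))


def notify_compression_rollover(engine, new_session_id, old_session_id) -> bool:
    """Tell the engine the session id rotated because of compression.

    Returns True only when the hook was called and did not raise.
    """
    if not old_session_id or not hasattr(engine, "on_session_start"):
        return False
    try:
        engine.on_session_start(
            new_session_id or "",
            boundary_reason="compression",
            old_session_id=old_session_id,
        )
    except Exception as exc:  # a plugin failure must not abort compression
        logger.debug("on_session_start (compression) failed: %s", exc)
        return False
    return True
```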
@@ -11126,36 +11065,69 @@ class AIAgent:
continue
# ── Nous Portal: record rate limit & skip retries ─────
# When Nous returns a 429, record the reset time to a
# shared file so ALL sessions (cron, gateway, auxiliary)
# know not to pile on. Then skip further retries —
# each one burns another RPH request and deepens the
# rate limit hole. The retry loop's top-of-iteration
# guard will catch this on the next pass and try
# fallback or bail with a clear message.
# When Nous returns a 429 that is a genuine account-
# level rate limit, record the reset time to a shared
# file so ALL sessions (cron, gateway, auxiliary) know
# not to pile on, then skip further retries -- each
# one burns another RPH request and deepens the hole.
# The retry loop's top-of-iteration guard will catch
# this on the next pass and try fallback or bail.
#
# IMPORTANT: Nous Portal multiplexes multiple upstream
# providers (DeepSeek, Kimi, MiMo, Hermes). A 429 can
# also mean an UPSTREAM provider is out of capacity
# for one specific model -- transient, clears in
# seconds, nothing to do with the caller's quota.
# Tripping the cross-session breaker on that would
# block every Nous model for minutes. We use
# ``is_genuine_nous_rate_limit`` to tell the two
# apart via the 429's own x-ratelimit-* headers and
# the last-known-good state captured on the previous
# successful response.
if (
is_rate_limited
and self.provider == "nous"
and classified.reason == FailoverReason.rate_limit
and not recovered_with_pool
):
_genuine_nous_rate_limit = False
try:
from agent.nous_rate_guard import record_nous_rate_limit
from agent.nous_rate_guard import (
is_genuine_nous_rate_limit,
record_nous_rate_limit,
)
_err_resp = getattr(api_error, "response", None)
_err_hdrs = (
getattr(_err_resp, "headers", None)
if _err_resp else None
)
record_nous_rate_limit(
_genuine_nous_rate_limit = is_genuine_nous_rate_limit(
headers=_err_hdrs,
error_context=error_context,
last_known_state=self._rate_limit_state,
)
if _genuine_nous_rate_limit:
record_nous_rate_limit(
headers=_err_hdrs,
error_context=error_context,
)
else:
logging.info(
"Nous 429 looks like upstream capacity "
"(no exhausted bucket in headers or "
"last-known state) -- not tripping "
"cross-session breaker."
)
except Exception:
pass
# Skip straight to max_retries — the top-of-loop
# guard will handle fallback or bail cleanly.
retry_count = max_retries
continue
if _genuine_nous_rate_limit:
# Skip straight to max_retries -- the
# top-of-loop guard will handle fallback or
# bail cleanly.
retry_count = max_retries
continue
# Upstream capacity 429: fall through to normal
# retry logic. A different model (or the same
# model a moment later) will typically succeed.
is_payload_too_large = (
classified.reason == FailoverReason.payload_too_large
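The diff does not show how `is_genuine_nous_rate_limit` decides between an account-level limit and upstream capacity; the comment only says it uses the 429's `x-ratelimit-*` headers and the last-known-good state. The following is a purely illustrative heuristic in that spirit. The function name, the `x-ratelimit-remaining*` header prefix, and the single `last_known_remaining` number are all assumptions, not the project's API.

```python
def looks_like_account_rate_limit(headers, last_known_remaining=None) -> bool:
    """Guess whether a 429 reflects an exhausted caller quota.

    Hypothetical rule: any 'x-ratelimit-remaining*' header at or below zero,
    or a last-known remaining count at or below zero, counts as genuine.
    Anything else is treated as transient upstream capacity.
    """
    for name, value in (headers or {}).items():
        if not name.lower().startswith("x-ratelimit-remaining"):
            continue
        try:
            if float(value) <= 0:
                return True
        except (TypeError, ValueError):
            # Unparseable header values carry no evidence either way.
            continue
    return last_known_remaining is not None and last_known_remaining <= 0
```

Only a `True` result would justify tripping a cross-session breaker; a `False` result leaves the request on the normal retry path, as the hunk describes.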
@@ -11757,16 +11729,26 @@ class AIAgent:
interim_has_content = bool((interim_msg.get("content") or "").strip())
interim_has_reasoning = bool(interim_msg.get("reasoning", "").strip()) if isinstance(interim_msg.get("reasoning"), str) else False
interim_has_codex_reasoning = bool(interim_msg.get("codex_reasoning_items"))
interim_has_codex_message_items = bool(interim_msg.get("codex_message_items"))
if interim_has_content or interim_has_reasoning or interim_has_codex_reasoning:
if (
interim_has_content
or interim_has_reasoning
or interim_has_codex_reasoning
or interim_has_codex_message_items
):
last_msg = messages[-1] if messages else None
# Duplicate detection: two consecutive incomplete assistant
# messages with identical content AND reasoning are collapsed.
# For reasoning-only messages (codex_reasoning_items differ but
# visible content/reasoning are both empty), we also compare
# the encrypted items to avoid silently dropping new state.
# For provider-state-only changes (encrypted reasoning
# items or replayable message ids/phases/statuses differ
# while visible content/reasoning are unchanged), compare
# those opaque payloads too so we don't silently drop the
# newer continuation state.
last_codex_items = last_msg.get("codex_reasoning_items") if isinstance(last_msg, dict) else None
interim_codex_items = interim_msg.get("codex_reasoning_items")
last_codex_message_items = last_msg.get("codex_message_items") if isinstance(last_msg, dict) else None
interim_codex_message_items = interim_msg.get("codex_message_items")
duplicate_interim = (
isinstance(last_msg, dict)
and last_msg.get("role") == "assistant"
@@ -11774,6 +11756,7 @@ class AIAgent:
and (last_msg.get("content") or "") == (interim_msg.get("content") or "")
and (last_msg.get("reasoning") or "") == (interim_msg.get("reasoning") or "")
and last_codex_items == interim_codex_items
and last_codex_message_items == interim_codex_message_items
)
if not duplicate_interim:
messages.append(interim_msg)
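The extended duplicate check can be sketched as a pure function. This is an approximation: the diff elides one condition between the two hunks, so the sketch covers only the comparisons that are visible, and `is_duplicate_interim` is a hypothetical name.

```python
def is_duplicate_interim(last_msg, interim_msg: dict) -> bool:
    """True when interim_msg adds nothing over the previous assistant message.

    Visible text and the opaque provider payloads must all match; a change
    in either codex_* field means new continuation state that must be kept.
    """
    if not isinstance(last_msg, dict) or last_msg.get("role") != "assistant":
        return False
    return (
        (last_msg.get("content") or "") == (interim_msg.get("content") or "")
        and (last_msg.get("reasoning") or "") == (interim_msg.get("reasoning") or "")
        and last_msg.get("codex_reasoning_items") == interim_msg.get("codex_reasoning_items")
        and last_msg.get("codex_message_items") == interim_msg.get("codex_message_items")
    )
```

Two messages with identical text but different `codex_message_items` are therefore not duplicates, which is the case the added comparison protects.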
@@ -0,0 +1,95 @@
#!/usr/bin/env python3
"""Build the Hermes Model Catalog — a centralized JSON manifest of curated models.
This script reads the in-repo hardcoded curated lists (``OPENROUTER_MODELS``,
``_PROVIDER_MODELS["nous"]``) and writes them to a JSON manifest that the
Hermes CLI fetches at runtime. Publishing the catalog through the docs site
lets maintainers update model lists without shipping a Hermes release.
The runtime fetcher falls back to the same in-repo hardcoded lists if the
manifest is unreachable, so this script is a convenience for keeping the
manifest in sync, not a source of truth.
Usage::
python scripts/build_model_catalog.py
Output: ``website/static/api/model-catalog.json``
Live URL (after ``deploy-site.yml`` runs on merge to main):
``https://hermes-agent.nousresearch.com/docs/api/model-catalog.json``
"""
from __future__ import annotations
import json
import os
import sys
from datetime import datetime, timezone
REPO_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, REPO_ROOT)
# Ensure HERMES_HOME is set for imports that touch it at module level.
os.environ.setdefault("HERMES_HOME", os.path.join(os.path.expanduser("~"), ".hermes"))
from hermes_cli.models import OPENROUTER_MODELS, _PROVIDER_MODELS # noqa: E402
OUTPUT_PATH = os.path.join(REPO_ROOT, "website", "static", "api", "model-catalog.json")
CATALOG_VERSION = 1
def build_catalog() -> dict:
return {
"version": CATALOG_VERSION,
"updated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
"metadata": {
"source": "hermes-agent repo",
"docs": "https://hermes-agent.nousresearch.com/docs/reference/model-catalog",
},
"providers": {
"openrouter": {
"metadata": {
"display_name": "OpenRouter",
"note": (
"Descriptions drive picker badges. Live /api/v1/models "
"filters curated ids by tool-calling support and free pricing."
),
},
"models": [
{"id": mid, "description": desc}
for mid, desc in OPENROUTER_MODELS
],
},
"nous": {
"metadata": {
"display_name": "Nous Portal",
"note": (
"Free-tier gating is determined live via Portal pricing "
"(partition_nous_models_by_tier), not this manifest."
),
},
"models": [
{"id": mid}
for mid in _PROVIDER_MODELS.get("nous", [])
],
},
},
}
def main() -> int:
catalog = build_catalog()
os.makedirs(os.path.dirname(OUTPUT_PATH), exist_ok=True)
with open(OUTPUT_PATH, "w") as fh:
json.dump(catalog, fh, indent=2)
fh.write("\n")
print(f"Wrote {OUTPUT_PATH}")
for provider, block in catalog["providers"].items():
print(f" {provider}: {len(block['models'])} models")
return 0
if __name__ == "__main__":
sys.exit(main())
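The docstring says the runtime fetcher falls back to the in-repo lists when the manifest is unreachable. The fetcher itself is not part of this diff, so the following is only a sketch of how a consumer might read the manifest shape produced by `build_catalog()`; `models_from_manifest` is a hypothetical helper, and it takes the manifest text as an argument rather than doing any network I/O.

```python
import json


def models_from_manifest(manifest_text, provider: str, fallback: list) -> list:
    """Return model ids for provider, or a copy of fallback if the manifest is unusable."""
    try:
        catalog = json.loads(manifest_text)
        entries = catalog["providers"][provider]["models"]
        ids = [entry["id"] for entry in entries]
    except (TypeError, ValueError, KeyError):
        # Missing text, invalid JSON, or an unexpected shape: use the in-repo list.
        return list(fallback)
    return ids or list(fallback)
```

Treating an empty model list the same as a broken manifest is a design choice of this sketch; it keeps the picker from ever showing zero models.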
@@ -29,10 +29,25 @@ BOLD='\033[1m'
REPO_URL_SSH="git@github.com:NousResearch/hermes-agent.git"
REPO_URL_HTTPS="https://github.com/NousResearch/hermes-agent.git"
HERMES_HOME="${HERMES_HOME:-$HOME/.hermes}"
INSTALL_DIR="${HERMES_INSTALL_DIR:-$HERMES_HOME/hermes-agent}"
# INSTALL_DIR is resolved AFTER arg parsing and OS detection so we can pick an
# FHS-style layout for root installs. Track whether the user gave us an
# explicit directory — if so we never override it.
if [ -n "${HERMES_INSTALL_DIR:-}" ]; then
INSTALL_DIR="$HERMES_INSTALL_DIR"
INSTALL_DIR_EXPLICIT=true
else
INSTALL_DIR=""
INSTALL_DIR_EXPLICIT=false
fi
PYTHON_VERSION="3.11"
NODE_VERSION="22"
# FHS-style root install layout (set by resolve_install_layout when applicable):
# code at /usr/local/lib/hermes-agent, command at /usr/local/bin/hermes,
# data still at /root/.hermes (HERMES_HOME). Matches Claude Code / Codex CLI
# and keeps Docker bind-mounted /root/ volumes lean.
ROOT_FHS_LAYOUT=false
# Options
USE_VENV=true
RUN_SETUP=true
@@ -64,6 +79,7 @@ while [[ $# -gt 0 ]]; do
;;
--dir)
INSTALL_DIR="$2"
INSTALL_DIR_EXPLICIT=true
shift 2
;;
--hermes-home)
@@ -79,9 +95,20 @@ while [[ $# -gt 0 ]]; do
echo " --no-venv Don't create virtual environment"
echo " --skip-setup Skip interactive setup wizard"
echo " --branch NAME Git branch to install (default: main)"
echo " --dir PATH Installation directory (default: ~/.hermes/hermes-agent)"
echo " --dir PATH Installation directory"
echo " default (non-root): ~/.hermes/hermes-agent"
echo " default (root, Linux): /usr/local/lib/hermes-agent"
echo " --hermes-home PATH Data directory (default: ~/.hermes, or \$HERMES_HOME)"
echo " -h, --help Show this help"
echo ""
echo "Notes:"
echo " When running as root on Linux, Hermes installs the code under"
echo " /usr/local/lib/hermes-agent and links the command into"
echo " /usr/local/bin/hermes (FHS layout — matches Claude Code / Codex CLI)."
echo " Data, config, sessions, and logs still live in \$HERMES_HOME"
echo " (default /root/.hermes). This keeps Docker bind-mounted volumes"
echo " small and ensures the command is on PATH for all shells."
echo " Existing installs at \$HERMES_HOME/hermes-agent are preserved in-place."
exit 0
;;
*)
@@ -163,9 +190,60 @@ is_termux() {
[ -n "${TERMUX_VERSION:-}" ] || [[ "${PREFIX:-}" == *"com.termux/files/usr"* ]]
}
# Decide where the repo checkout + venv live, and where the `hermes` command
# symlink goes. Called after detect_os so $OS/$DISTRO are known.
#
# Defaults:
# - Non-root, any OS: INSTALL_DIR = $HERMES_HOME/hermes-agent
# command link in $HOME/.local/bin
# - Termux (any uid): INSTALL_DIR = $HERMES_HOME/hermes-agent
# command link in $PREFIX/bin (already on PATH)
# - Root on Linux (new): INSTALL_DIR = /usr/local/lib/hermes-agent
# command link in /usr/local/bin
# (unless a legacy install already exists at
# $HERMES_HOME/hermes-agent — then preserve it)
#
# Always no-op when the user set --dir or $HERMES_INSTALL_DIR.
resolve_install_layout() {
if [ "$INSTALL_DIR_EXPLICIT" = true ]; then
log_info "Install directory: $INSTALL_DIR (explicit)"
return 0
fi
# Termux: package manager manages /data/data/..., keep code in HERMES_HOME.
if is_termux; then
INSTALL_DIR="$HERMES_HOME/hermes-agent"
return 0
fi
# Root on Linux: prefer FHS layout unless a legacy install already exists.
# macOS root installs keep the legacy layout because /usr/local/ on macOS
# is Homebrew territory and we don't want to fight that.
if [ "$OS" = "linux" ] && [ "$(id -u)" -eq 0 ]; then
if [ -d "$HERMES_HOME/hermes-agent/.git" ]; then
INSTALL_DIR="$HERMES_HOME/hermes-agent"
log_info "Existing install detected at $INSTALL_DIR — keeping legacy layout"
log_info " (new root installs use /usr/local/lib/hermes-agent)"
return 0
fi
INSTALL_DIR="/usr/local/lib/hermes-agent"
ROOT_FHS_LAYOUT=true
log_info "Root install on Linux — using FHS layout"
log_info " Code: $INSTALL_DIR"
log_info " Command: /usr/local/bin/hermes"
log_info " Data: $HERMES_HOME (unchanged)"
return 0
fi
# Default: non-root, non-Termux → legacy user-scoped layout.
INSTALL_DIR="$HERMES_HOME/hermes-agent"
}
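For readers who find the decision table easier to follow outside shell, here is the same precedence restated in Python. It is an illustration only, not part of the installer: the `Layout` tuple and the keyword parameters are hypothetical, with `legacy_checkout_exists` standing in for the `[ -d "$HERMES_HOME/hermes-agent/.git" ]` test.

```python
from typing import NamedTuple, Optional


class Layout(NamedTuple):
    install_dir: str
    root_fhs_layout: bool


def resolve_install_layout(
    hermes_home: str,
    *,
    explicit_dir: Optional[str] = None,
    termux: bool = False,
    os_name: str = "linux",
    uid: int = 1000,
    legacy_checkout_exists: bool = False,
) -> Layout:
    """Apply the rules in order: explicit dir, Termux, root on Linux, default."""
    legacy = f"{hermes_home}/hermes-agent"
    if explicit_dir:
        return Layout(explicit_dir, False)  # --dir / $HERMES_INSTALL_DIR always wins
    if termux:
        return Layout(legacy, False)
    if os_name == "linux" and uid == 0 and not legacy_checkout_exists:
        return Layout("/usr/local/lib/hermes-agent", True)
    return Layout(legacy, False)
```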
get_command_link_dir() {
if is_termux && [ -n "${PREFIX:-}" ]; then
echo "$PREFIX/bin"
elif [ "$ROOT_FHS_LAYOUT" = true ]; then
echo "/usr/local/bin"
else
echo "$HOME/.local/bin"
fi
@@ -174,6 +252,8 @@ get_command_link_dir() {
get_command_link_display_dir() {
if is_termux && [ -n "${PREFIX:-}" ]; then
echo '$PREFIX/bin'
elif [ "$ROOT_FHS_LAYOUT" = true ]; then
echo '/usr/local/bin'
else
echo '~/.local/bin'
fi
@@ -975,6 +1055,41 @@ setup_path() {
return 0
fi
# FHS layout: /usr/local/bin is normally on PATH for login shells (via
# /etc/profile pathmunge), but on RHEL/CentOS/Rocky/Alma 8+ non-login
# interactive root shells (su, sudo -s, tmux panes, some web terminals)
# only source /etc/bashrc, which does NOT add /usr/local/bin — and
# /root/.bash_profile doesn't either. So verify with `command -v` and
# fall back to writing a PATH guard into /root/.bashrc when needed.
if [ "$ROOT_FHS_LAYOUT" = true ]; then
export PATH="$command_link_dir:$PATH"
# Probe a fresh non-login interactive bash the way the user will use it.
# `bash -i -c` sources ~/.bashrc but NOT ~/.bash_profile or /etc/profile,
# which is the exact scenario where RHEL root loses /usr/local/bin.
if env -i HOME="$HOME" TERM="${TERM:-dumb}" bash -i -c 'command -v hermes' \
>/dev/null 2>&1; then
log_info "/usr/local/bin is already on PATH for all shells"
log_success "hermes command ready"
return 0
fi
log_info "hermes not on PATH in non-login shells (common on RHEL-family)"
PATH_LINE='export PATH="/usr/local/bin:$PATH"'
PATH_COMMENT='# Hermes Agent — ensure /usr/local/bin is on PATH (RHEL non-login shells)'
for SHELL_CONFIG in "$HOME/.bashrc" "$HOME/.bash_profile"; do
[ -f "$SHELL_CONFIG" ] || continue
if ! grep -v '^[[:space:]]*#' "$SHELL_CONFIG" 2>/dev/null \
| grep -qE 'PATH=.*(/usr/local/bin|\$command_link_dir)'; then
echo "" >> "$SHELL_CONFIG"
echo "$PATH_COMMENT" >> "$SHELL_CONFIG"
echo "$PATH_LINE" >> "$SHELL_CONFIG"
log_success "Added /usr/local/bin to PATH in $SHELL_CONFIG"
fi
done
log_success "hermes command ready"
return 0
fi
# Check if ~/.local/bin is on PATH; if not, add it to shell config.
# Detect the user's actual login shell (not the shell running this script,
# which is always bash when piped from curl).
@@ -1339,12 +1454,12 @@ print_success() {
echo ""
# Show file locations
echo -e "${CYAN}${BOLD}📁 Your files (all in ~/.hermes/):${NC}"
echo -e "${CYAN}${BOLD}📁 Your files:${NC}"
echo ""
echo -e " ${YELLOW}Config:${NC} ~/.hermes/config.yaml"
echo -e " ${YELLOW}API Keys:${NC} ~/.hermes/.env"
echo -e " ${YELLOW}Data:${NC} ~/.hermes/cron/, sessions/, logs/"
echo -e " ${YELLOW}Code:${NC} ~/.hermes/hermes-agent/"
echo -e " ${YELLOW}Config:${NC} $HERMES_HOME/config.yaml"
echo -e " ${YELLOW}API Keys:${NC} $HERMES_HOME/.env"
echo -e " ${YELLOW}Data:${NC} $HERMES_HOME/cron/, sessions/, logs/"
echo -e " ${YELLOW}Code:${NC} $INSTALL_DIR"
echo ""
echo -e "${CYAN}─────────────────────────────────────────────────────────${NC}"
@@ -1364,6 +1479,9 @@ print_success() {
if [ "$DISTRO" = "termux" ]; then
echo -e "${YELLOW}⚡ 'hermes' was linked into $(get_command_link_display_dir), which is already on PATH in Termux.${NC}"
echo ""
elif [ "$ROOT_FHS_LAYOUT" = true ]; then
echo -e "${YELLOW}⚡ 'hermes' was linked into /usr/local/bin and is ready to use — no shell reload needed.${NC}"
echo ""
else
echo -e "${YELLOW}⚡ Reload your shell to use 'hermes' command:${NC}"
echo ""
@@ -1415,6 +1533,7 @@ main() {
print_banner
detect_os
resolve_install_layout
install_uv
check_python
check_git
@@ -0,0 +1,614 @@
#!/usr/bin/env python3
"""Drive the Hermes TUI under HERMES_DEV_PERF and summarize the pipeline.
Usage:
scripts/profile-tui.py [--session SID] [--hold KEY] [--seconds N] [--rate HZ]
Defaults: picks the session with the most messages, holds PageUp for 8s at
~30 Hz (matching xterm key-repeat), summarizes ~/.hermes/perf.log on exit.
The --tui build must exist (run `npm run build` in ui-tui first). This script
launches `node dist/entry.js` directly with HERMES_TUI_RESUME set so it
bypasses the hermes_cli wrapper: we want repeatable timing, not the CLI's
session-picker flow.
Environment overrides:
HERMES_PERF_LOG (default ~/.hermes/perf.log)
HERMES_PERF_NODE (default node from $PATH)
HERMES_TUI_DIR (default /home/bb/hermes-agent/ui-tui)
Exit code is 0 if the harness ran and parsed results, 2 if the TUI crashed
or produced no perf data (suggests HERMES_DEV_PERF wiring is broken).
"""
from __future__ import annotations
import argparse
import json
import os
import pty
import select
import signal
import sqlite3
import sys
import time
from pathlib import Path
from typing import Any
DEFAULT_TUI_DIR = Path(os.environ.get("HERMES_TUI_DIR", "/home/bb/hermes-agent/ui-tui"))
DEFAULT_LOG = Path(os.environ.get("HERMES_PERF_LOG", str(Path.home() / ".hermes" / "perf.log")))
DEFAULT_STATE_DB = Path.home() / ".hermes" / "state.db"
# Keystroke escape sequences. Matches what xterm/VT220 send when the
# terminal has bracketed-paste disabled and the key-repeat handler fires.
KEYS = {
"page_up": b"\x1b[5~",
"page_down": b"\x1b[6~",
"wheel_up": b"\x1b[M`!!", # mouse wheel up (SGR-less) — best-effort
"shift_up": b"\x1b[1;2A",
"shift_down": b"\x1b[1;2B",
}
def pick_longest_session(db: Path) -> str:
conn = sqlite3.connect(db)
row = conn.execute(
"SELECT id FROM sessions s ORDER BY "
"(SELECT COUNT(*) FROM messages m WHERE m.session_id = s.id) DESC LIMIT 1"
).fetchone()
if not row:
sys.exit(f"no sessions in {db}")
return row[0]
def drain(fd: int, timeout: float) -> bytes:
"""Read whatever's available from fd within `timeout`, then return."""
chunks = []
end = time.monotonic() + timeout
while time.monotonic() < end:
r, _, _ = select.select([fd], [], [], max(0.0, end - time.monotonic()))
if not r:
break
try:
data = os.read(fd, 4096)
except OSError:
break
if not data:
break
chunks.append(data)
return b"".join(chunks)
def hold_key(fd: int, seq: bytes, seconds: float, rate_hz: int) -> int:
"""Write `seq` to fd at ~rate_hz for `seconds`. Returns keystrokes sent."""
interval = 1.0 / max(1, rate_hz)
end = time.monotonic() + seconds
sent = 0
while time.monotonic() < end:
try:
os.write(fd, seq)
sent += 1
except OSError:
break
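# Drain stdout to keep the PTY buffer flowing; ignore content.
# A zero timeout would make drain() return before its first select(),
# so pass a small positive budget instead.
drain(fd, 0.001)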
time.sleep(interval)
return sent
def summarize(log: Path, since_ts_ms: int) -> dict[str, Any]:
"""Parse perf.log, keep only events newer than since_ts_ms, return stats."""
react_events: list[dict[str, Any]] = []
frame_events: list[dict[str, Any]] = []
if not log.exists():
return {"error": f"no log at {log}", "react": [], "frame": []}
for line in log.read_text().splitlines():
line = line.strip()
if not line:
continue
try:
row = json.loads(line)
except json.JSONDecodeError:
continue
if int(row.get("ts", 0)) < since_ts_ms:
continue
src = row.get("src")
if src == "react":
react_events.append(row)
elif src == "frame":
frame_events.append(row)
return {
"react": react_events,
"frame": frame_events,
}
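The filtering that `summarize` performs on the perf log can be tried on in-memory lines. This sketch assumes the same JSON-lines fields the script reads (`ts`, `src`); `bucket_events` is a hypothetical helper that takes an iterable of lines instead of a file path, and it additionally skips JSON values that are not objects.

```python
import json


def bucket_events(lines, since_ts_ms: int) -> dict:
    """Group JSON-lines perf events by source, dropping old or malformed rows."""
    buckets = {"react": [], "frame": []}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate partially written lines
        if not isinstance(row, dict):
            continue
        if int(row.get("ts", 0)) < since_ts_ms:
            continue  # event predates this run
        src = row.get("src")
        if src in buckets:
            buckets[src].append(row)
    return buckets
```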
def pct(values: list[float], p: float) -> float:
if not values:
return 0.0
s = sorted(values)
idx = min(len(s) - 1, int(len(s) * p))
return s[idx]
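It helps to know how this index-based percentile behaves on small samples before reading the report. The block below repeats `pct` so that it runs on its own and adds a hypothetical `describe` helper for the usage example.

```python
def pct(values: list, p: float) -> float:
    """Percentile by direct indexing into the sorted values (no interpolation)."""
    if not values:
        return 0.0
    s = sorted(values)
    idx = min(len(s) - 1, int(len(s) * p))
    return s[idx]


def describe(values: list) -> dict:
    """Hypothetical helper: the summary columns used by the report tables."""
    return {"p25": pct(values, 0.25), "p50": pct(values, 0.50), "max": pct(values, 1.0)}
```

With an even number of samples the p50 is the upper of the two middle values, and any `p` that maps to or past the last index returns the maximum, so p99 equals max for small runs.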
def format_report(data: dict[str, Any]) -> str:
    react = data.get("react") or []
    frames = data.get("frame") or []
    out = []
    out.append("═══ React Profiler ═══")
    if not react:
        out.append(" (no react events — HERMES_DEV_PERF wired? threshold too high?)")
    else:
        by_id: dict[str, list[float]] = {}
        for r in react:
            by_id.setdefault(r["id"], []).append(r["actualMs"])
        out.append(f" {'pane':<14} {'count':>6} {'p50':>8} {'p95':>8} {'p99':>8} {'max':>8}")
        for pid, ms in sorted(by_id.items(), key=lambda kv: -pct(kv[1], 0.99)):
            out.append(
                f" {pid:<14} {len(ms):>6} {pct(ms,0.50):>8.2f} {pct(ms,0.95):>8.2f} "
                f"{pct(ms,0.99):>8.2f} {max(ms):>8.2f}"
            )
    out.append("")
    out.append("═══ Ink pipeline ═══")
    if not frames:
        out.append(" (no frame events — onFrame wiring broken?)")
    else:
        dur = [f["durationMs"] for f in frames]
        phases_present = any(f.get("phases") for f in frames)
        out.append(f" frames captured: {len(frames)}")
        out.append(
            f" durationMs p50={pct(dur,0.50):.2f} p95={pct(dur,0.95):.2f} "
            f"p99={pct(dur,0.99):.2f} max={max(dur):.2f}"
        )
        # Effective FPS during the run: frames / elapsed seconds.
        ts = sorted(f["ts"] for f in frames)
        if len(ts) >= 2:
            elapsed_s = (ts[-1] - ts[0]) / 1000.0
            fps = len(frames) / elapsed_s if elapsed_s > 0 else float("inf")
            out.append(f" throughput: {len(frames)} frames / {elapsed_s:.2f}s = {fps:.1f} fps")
        if phases_present:
            fields = ["yoga", "renderer", "diff", "optimize", "write", "commit"]
            out.append("")
            out.append(f" {'phase':<10} {'p50':>8} {'p95':>8} {'p99':>8} {'max':>8} (ms)")
            for field in fields:
                vals = [f["phases"][field] for f in frames if f.get("phases")]
                if vals:
                    out.append(
                        f" {field:<10} {pct(vals,0.50):>8.2f} {pct(vals,0.95):>8.2f} "
                        f"{pct(vals,0.99):>8.2f} {max(vals):>8.2f}"
                    )
            # Derived: sum of phases vs durationMs (reveals hidden time).
            sum_ps = [
                sum(f["phases"][k] for k in fields)
                for f in frames if f.get("phases")
            ]
            if sum_ps:
                dur_match = [f["durationMs"] for f in frames if f.get("phases")]
                deltas = [d - s for d, s in zip(dur_match, sum_ps)]
                out.append(
                    f" {'dur-Σphases':<10} {pct(deltas,0.50):>8.2f} {pct(deltas,0.95):>8.2f} "
                    f"{pct(deltas,0.99):>8.2f} {max(deltas):>8.2f} (unaccounted-for time)"
                )
            # Yoga counters
            visited = [f["phases"]["yogaVisited"] for f in frames if f.get("phases")]
            measured = [f["phases"]["yogaMeasured"] for f in frames if f.get("phases")]
            cache_hits = [f["phases"]["yogaCacheHits"] for f in frames if f.get("phases")]
            live = [f["phases"]["yogaLive"] for f in frames if f.get("phases")]
            out.append("")
            out.append(" Yoga counters (per frame):")
            for name, vals in (
                ("visited", visited),
                ("measured", measured),
                ("cacheHits", cache_hits),
                ("live", live),
            ):
                if vals:
                    out.append(f" {name:<11} p50={pct(vals,0.5):.0f} p99={pct(vals,0.99):.0f} max={max(vals)}")
            # Patch counts: proxy for "how much changed each frame"
            patches = [f["phases"]["patches"] for f in frames if f.get("phases")]
            if patches:
                out.append(
                    f" patches p50={pct(patches,0.5):.0f} p99={pct(patches,0.99):.0f} "
                    f"max={max(patches)} total={sum(patches)}"
                )
            optimized = [
                f["phases"].get("optimizedPatches", 0)
                for f in frames if f.get("phases")
            ]
            if any(optimized):
                out.append(
                    f" optimized p50={pct(optimized,0.5):.0f} p99={pct(optimized,0.99):.0f} "
                    f"max={max(optimized)} total={sum(optimized)}"
                    f" (ratio: {sum(optimized)/max(1,sum(patches)):.2f})"
                )
            # Write bytes + drain telemetry: the outer-terminal bottleneck gauge.
            bytes_written = [
                f["phases"].get("writeBytes", 0)
                for f in frames if f.get("phases")
            ]
            if any(bytes_written):
                total_b = sum(bytes_written)
                kb = total_b / 1024
                out.append(
                    f" writeBytes p50={pct(bytes_written,0.5):.0f}B p99={pct(bytes_written,0.99):.0f}B "
                    f"max={max(bytes_written)}B total={kb:.1f}KB"
                )
            drains = [
                f["phases"].get("prevFrameDrainMs", 0)
                for f in frames if f.get("phases")
            ]
            if any(d > 0 for d in drains):
                nonzero = [d for d in drains if d > 0]
                out.append(
                    f" drainMs p50={pct(nonzero,0.5):.2f} p95={pct(nonzero,0.95):.2f} "
                    f"p99={pct(nonzero,0.99):.2f} max={max(nonzero):.2f} (terminal flush latency)"
                )
            backpressure = sum(1 for f in frames if f.get("phases", {}).get("backpressure"))
            if backpressure:
                out.append(
                    f" backpressure: {backpressure}/{len(frames)} frames "
                    f"({100*backpressure/len(frames):.0f}%) (Node stdout buffer full — terminal slow)"
                )
        # Flickers
        flicker_frames = [f for f in frames if f.get("flickers")]
        if flicker_frames:
            out.append("")
            out.append(f" ⚠ flickers detected in {len(flicker_frames)} frames")
            reasons: dict[str, int] = {}
            for f in flicker_frames:
                for fl in f["flickers"]:
                    reasons[fl["reason"]] = reasons.get(fl["reason"], 0) + 1
            for reason, n in sorted(reasons.items(), key=lambda kv: -kv[1]):
                out.append(f" {reason}: {n}")
    return "\n".join(out)

def key_metrics(data: dict[str, Any]) -> dict[str, float]:
    """Flatten the report into a dict of scalar metrics for A/B diffing."""
    metrics: dict[str, float] = {}
    frames = data.get("frame") or []
    react = data.get("react") or []
    if frames:
        durs = [f["durationMs"] for f in frames]
        metrics["frames"] = len(frames)
        metrics["dur_p50"] = pct(durs, 0.50)
        metrics["dur_p95"] = pct(durs, 0.95)
        metrics["dur_p99"] = pct(durs, 0.99)
        metrics["dur_max"] = max(durs)
        ts = sorted(f["ts"] for f in frames)
        if len(ts) >= 2:
            elapsed = (ts[-1] - ts[0]) / 1000.0
            metrics["fps_throughput"] = len(frames) / elapsed if elapsed > 0 else 0.0
            # Interframe gap distribution, a complementary view to throughput:
            gaps = [ts[i] - ts[i - 1] for i in range(1, len(ts))]
            if gaps:
                metrics["gap_p50_ms"] = pct(gaps, 0.50)
                metrics["gap_p99_ms"] = pct(gaps, 0.99)
                metrics["gaps_under_16ms"] = sum(1 for g in gaps if g < 16)
                metrics["gaps_over_200ms"] = sum(1 for g in gaps if g >= 200)
        for phase in ("renderer", "yoga", "diff", "write"):
            vals = [f["phases"][phase] for f in frames if f.get("phases")]
            if vals:
                metrics[f"{phase}_p99"] = pct(vals, 0.99)
                metrics[f"{phase}_max"] = max(vals)
        patches = [f["phases"]["patches"] for f in frames if f.get("phases")]
        if patches:
            metrics["patches_total"] = sum(patches)
            metrics["patches_p99"] = pct(patches, 0.99)
        optimized = [
            f["phases"].get("optimizedPatches", 0) for f in frames if f.get("phases")
        ]
        if any(optimized):
            metrics["optimized_total"] = sum(optimized)
        bytes_list = [
            f["phases"].get("writeBytes", 0) for f in frames if f.get("phases")
        ]
        if any(bytes_list):
            metrics["writeBytes_total"] = sum(bytes_list)
        drains = [
            f["phases"].get("prevFrameDrainMs", 0)
            for f in frames if f.get("phases")
        ]
        drain_nonzero = [d for d in drains if d > 0]
        if drain_nonzero:
            metrics["drain_p99"] = pct(drain_nonzero, 0.99)
            metrics["drain_max"] = max(drain_nonzero)
        bp = sum(1 for f in frames if f.get("phases", {}).get("backpressure"))
        metrics["backpressure_frames"] = bp
    if react:
        for pid in set(e["id"] for e in react):
            ms = [e["actualMs"] for e in react if e["id"] == pid]
            metrics[f"react_{pid}_p99"] = pct(ms, 0.99)
            metrics[f"react_{pid}_max"] = max(ms)
    return metrics

def format_diff(before: dict[str, float], after: dict[str, float]) -> str:
    """Render a side-by-side A/B comparison table."""
    keys = sorted(set(before) | set(after))
    lines = [f"{'metric':<28} {'before':>12} {'after':>12} {'delta':>12} {'%':>6}"]
    lines.append("─" * 76)
    for k in keys:
        b = before.get(k, 0.0)
        a = after.get(k, 0.0)
        d = a - b
        pct_change = ((a / b) - 1) * 100 if b not in (0, 0.0) else float("inf") if a else 0
        # Flag improvements vs regressions. For _p99 / _max / _total / gaps_over /
        # patches / writeBytes / backpressure, LOWER is better. For fps / gaps_under,
        # HIGHER is better.
        lower_is_better = any(
            token in k
            for token in (
                "p50",
                "p95",
                "p99",
                "_max",
                "_total",
                "gaps_over",
                "backpressure",
                "drain",
            )
        )
        higher_is_better = "fps_" in k or "gaps_under" in k
        mark = ""
        if d and not (lower_is_better or higher_is_better):
            mark = ""
        elif d < 0 and lower_is_better:
            mark = ""
        elif d > 0 and higher_is_better:
            mark = ""
        elif d > 0 and lower_is_better:
            mark = ""  # regression
        elif d < 0 and higher_is_better:
            mark = ""  # regression
        pct_str = "" if pct_change == float("inf") else f"{pct_change:+6.1f}%"
        lines.append(
            f"{k:<28} {b:>12.2f} {a:>12.2f} {d:>+12.2f} {pct_str} {mark}"
        )
    return "\n".join(lines)

def run_once(args: argparse.Namespace) -> dict[str, Any]:
    tui_dir = Path(args.tui_dir).resolve()
    entry = tui_dir / "dist" / "entry.js"
    if not entry.exists():
        sys.exit(f"{entry} missing — run `npm run build` in {tui_dir} first")
    sid = args.session or pick_longest_session(DEFAULT_STATE_DB)
    print(f"• session: {sid}")
    print(f"• hold: {args.hold} x {args.rate}Hz for {args.seconds}s after {args.warmup}s warmup")
    print(f"• terminal: {args.cols}x{args.rows}")
    log = Path(args.log)
    if not args.keep_log and log.exists():
        log.unlink()
    since_ms = int(time.time() * 1000)
    env = os.environ.copy()
    env["HERMES_DEV_PERF"] = "1"
    env["HERMES_DEV_PERF_MS"] = str(args.threshold_ms)
    env["HERMES_DEV_PERF_LOG"] = str(log)
    env["HERMES_TUI_RESUME"] = sid
    env["COLUMNS"] = str(args.cols)
    env["LINES"] = str(args.rows)
    env["TERM"] = env.get("TERM", "xterm-256color")
    # Pass through extra flags the TUI wrapper recognizes (e.g. --no-fullscreen).
    # Stored on args as `extra_flags` list.
    node = os.environ.get("HERMES_PERF_NODE", "node")
    node_args = [node, str(entry), *getattr(args, "extra_flags", [])]
    pid, fd = pty.fork()
    if pid == 0:
        # Child process: replace it with the TUI under the pseudo-terminal.
        os.execvpe(node, node_args, env)
    try:
        import fcntl, struct, termios
        winsize = struct.pack("HHHH", args.rows, args.cols, 0, 0)
        fcntl.ioctl(fd, termios.TIOCSWINSZ, winsize)
        print(f"• pid: {pid} fd: {fd}")
        print(f"• warmup {args.warmup}s (drain startup output)…")
        drain(fd, args.warmup)
        print(f"• holding {args.hold}")
        sent = hold_key(fd, KEYS[args.hold], args.seconds, args.rate)
        print(f" sent {sent} keystrokes")
        drain(fd, 0.5)
    finally:
        try:
            os.kill(pid, signal.SIGTERM)
            for _ in range(10):
                pid_done, _ = os.waitpid(pid, os.WNOHANG)
                if pid_done == pid:
                    break
                time.sleep(0.1)
            else:
                # Child ignored SIGTERM for ~1s: force-kill and reap it.
                os.kill(pid, signal.SIGKILL)
                os.waitpid(pid, 0)
        except (ProcessLookupError, ChildProcessError):
            pass
        try:
            os.close(fd)
        except OSError:
            pass
    time.sleep(0.2)
    return summarize(log, since_ms)

def main() -> int:
    p = argparse.ArgumentParser()
    p.add_argument("--session", help="session id to resume (default: longest in db)")
    p.add_argument("--hold", default="page_up", choices=sorted(KEYS.keys()), help="key to hold")
    p.add_argument("--seconds", type=float, default=8.0, help="how long to hold the key")
    p.add_argument("--rate", type=int, default=30, help="keystrokes per second")
    p.add_argument("--warmup", type=float, default=3.0, help="seconds to wait after launch before input")
    p.add_argument("--threshold-ms", type=float, default=0.0, help="HERMES_DEV_PERF_MS (0 = capture all)")
    p.add_argument("--cols", type=int, default=120)
    p.add_argument("--rows", type=int, default=40)
    p.add_argument("--keep-log", action="store_true", help="don't wipe perf.log before run")
    p.add_argument("--tui-dir", default=str(DEFAULT_TUI_DIR))
    p.add_argument("--log", default=str(DEFAULT_LOG))
    p.add_argument("--save", metavar="LABEL",
                   help="save the final metrics as /tmp/perf-<LABEL>.json for later --compare")
    p.add_argument("--compare", metavar="LABEL",
                   help="diff against /tmp/perf-<LABEL>.json after running")
    p.add_argument("--loop", action="store_true",
                   help="watch for source changes, rebuild, rerun, and diff vs previous run")
    p.add_argument("--extra-flag", dest="extra_flags", action="append", default=[],
                   help="pass through to node dist/entry.js (repeatable)")
    args = p.parse_args()
    if args.loop:
        return loop_mode(args)
    # Single-shot path.
    data = run_once(args)
    print()
    print(format_report(data))
    metrics = key_metrics(data)
    if args.save:
        path = Path(f"/tmp/perf-{args.save}.json")
        path.write_text(json.dumps(metrics, indent=2))
        print(f"\n• saved: {path}")
    if args.compare:
        path = Path(f"/tmp/perf-{args.compare}.json")
        if not path.exists():
            print(f"\n⚠ no baseline at {path} — run with --save {args.compare} first")
        else:
            before = json.loads(path.read_text())
            print(f"\n═══ A/B diff vs /tmp/perf-{args.compare}.json ═══")
            print(format_diff(before, metrics))
    if not data["react"] and not data["frame"]:
        return 2
    return 0

def loop_mode(args: argparse.Namespace) -> int:
    """Watch source files, rebuild, rerun, print A/B diff against previous run.

    Keeps a rolling 'previous run' baseline in memory so each iteration
    reports the delta vs the last one, giving visibility into whether the
    last edit moved the needle. Press Ctrl+C to stop.
    """
    import subprocess
    tui_dir = Path(args.tui_dir).resolve()
    src_root = tui_dir / "src"
    pkg_root = tui_dir / "packages" / "hermes-ink" / "src"

    def collect_mtimes() -> dict[str, float]:
        mtimes: dict[str, float] = {}
        for root in (src_root, pkg_root):
            if not root.exists():
                continue
            for path in root.rglob("*"):
                if path.suffix in {".ts", ".tsx"} and "__tests__" not in str(path):
                    try:
                        mtimes[str(path)] = path.stat().st_mtime
                    except OSError:
                        pass
        return mtimes

    previous_metrics: dict[str, float] | None = None
    previous_mtimes = collect_mtimes()
    iteration = 0
    print(f"• loop mode — watching {src_root} + {pkg_root} for *.ts(x) changes")
    print("• edit any TS file, the harness rebuilds + reruns automatically")
    print("• Ctrl+C to stop\n")
    try:
        while True:
            iteration += 1
            print(f"\n{'─' * 76}")
            print(f"Iteration {iteration} @ {time.strftime('%H:%M:%S')}")
            print("─" * 76)
            if iteration > 1:
                print("• rebuilding…")
                result = subprocess.run(
                    ["npm", "run", "build"],
                    cwd=tui_dir,
                    capture_output=True,
                    text=True,
                )
                if result.returncode != 0:
                    print("✗ build failed:")
                    print(result.stdout[-2000:])
                    print(result.stderr[-2000:])
                    print("\n• waiting for source changes to retry…")
                    previous_mtimes = wait_for_change(previous_mtimes, collect_mtimes)
                    continue
                print("✓ build ok")
            data = run_once(args)
            metrics = key_metrics(data)
            print()
            print(format_report(data))
            if previous_metrics is not None:
                print(f"\n═══ A/B diff vs iteration {iteration - 1} ═══")
                print(format_diff(previous_metrics, metrics))
            previous_metrics = metrics
            print("\n• waiting for source changes…")
            previous_mtimes = wait_for_change(previous_mtimes, collect_mtimes)
    except KeyboardInterrupt:
        print("\n• loop stopped")
    return 0

def wait_for_change(prev: dict[str, float], collect) -> dict[str, float]:
    """Poll every 1s until a watched file's mtime changes. Debounced 500ms."""
    while True:
        time.sleep(1)
        current = collect()
        changed = [
            path for path, mtime in current.items() if prev.get(path) != mtime
        ]
        if changed:
            print(f"{len(changed)} file(s) changed:")
            for path in changed[:5]:
                print(f" {path}")
            # Debounce: editor save bursts can take ~500ms to settle
            time.sleep(0.5)
            return collect()


if __name__ == "__main__":
    sys.exit(main())
+46
@@ -43,14 +43,22 @@ AUTHOR_MAP = {
"teknium1@gmail.com": "teknium1",
"teknium@nousresearch.com": "teknium1",
"127238744+teknium1@users.noreply.github.com": "teknium1",
"johnnncenaaa77@gmail.com": "johnncenae",
"focusflow.app.help@gmail.com": "yes999zc",
"343873859@qq.com": "DrStrangerUJN",
"uzmpsk.dilekakbas@gmail.com": "dlkakbs",
"jefferson@heimdallstrategy.com": "Mind-Dragon",
"130918800+devorun@users.noreply.github.com": "devorun",
"sonoyuncudmr@gmail.com": "Sonoyunchu",
"maks.mir@yahoo.com": "say8hi",
"web3blind@users.noreply.github.com": "web3blind",
"julia@alexland.us": "alexg0bot",
"1060770+benjaminsehl@users.noreply.github.com": "benjaminsehl",
"nerijusn76@gmail.com": "Nerijusas",
"itonov@proton.me": "Ito-69",
"glesstech@gmail.com": "georgeglessner",
"maxim.smetanin@gmail.com": "maxims-oss",
"yoimexex@gmail.com": "Yoimex",
# contributors (from noreply pattern)
"david.vv@icloud.com": "davidvv",
"wangqiang@wangqiangdeMac-mini.local": "xiaoqiang243",
@@ -65,9 +73,14 @@ AUTHOR_MAP = {
"thomasgeorgevii09@gmail.com": "tochukwuada",
"harryykyle1@gmail.com": "hharry11",
"kshitijk4poor@gmail.com": "kshitijk4poor",
"1294707+Tosko4@users.noreply.github.com": "Tosko4",
"keira.voss94@gmail.com": "keiravoss94",
"16443023+stablegenius49@users.noreply.github.com": "stablegenius49",
"fqsy1416@gmail.com": "EKKOLearnAI",
"octo-patch@github.com": "octo-patch",
"math0r-be@github.com": "math0r-be",
"simbamax99@gmail.com": "simbam99",
"iris@growthpillars.co": "irispillars",
"185121704+stablegenius49@users.noreply.github.com": "stablegenius49",
"101283333+batuhankocyigit@users.noreply.github.com": "batuhankocyigit",
"255305877+ismell0992-afk@users.noreply.github.com": "ismell0992-afk",
@@ -92,6 +105,8 @@ AUTHOR_MAP = {
"104278804+Sertug17@users.noreply.github.com": "Sertug17",
"112503481+caentzminger@users.noreply.github.com": "caentzminger",
"258577966+voidborne-d@users.noreply.github.com": "voidborne-d",
"liusway405@gmail.com": "voidborne-d",
"xydarcher@uestc.edu.cn": "Readon",
"sir_even@icloud.com": "sirEven",
"36056348+sirEven@users.noreply.github.com": "sirEven",
"70424851+insecurejezza@users.noreply.github.com": "insecurejezza",
@@ -110,9 +125,22 @@ AUTHOR_MAP = {
"Mibayy@users.noreply.github.com": "Mibayy",
"mibayy@users.noreply.github.com": "Mibayy",
"135070653+sgaofen@users.noreply.github.com": "sgaofen",
"lzy.dev@gmail.com": "zhiyanliu",
"me@janstepanovsky.cz": "hhhonzik",
"139848623+hhuang91@users.noreply.github.com": "hhuang91",
"s.ozaki@ebinou.net": "Satoshi-agi",
"10774721+kunlabs@users.noreply.github.com": "kunlabs",
"110560187+Wang-tianhao@users.noreply.github.com": "Wang-tianhao",
"170458616+ghostmfr@users.noreply.github.com": "ghostmfr",
"1848670+mewwts@users.noreply.github.com": "mewwts",
"1930707+haru398801@users.noreply.github.com": "haru398801",
"rapabelias@gmail.com": "badgerbees",
"xnb888@proton.me": "xnbi",
"xiahu889889@proton.me": "xiahu88988",
"nocoo@users.noreply.github.com": "nocoo",
"30841158+n-WN@users.noreply.github.com": "n-WN",
"tsuijinglei@gmail.com": "hiddenpuppy",
"buraysandro9@gmail.com": "ygd58",
"jerome@clawwork.ai": "HiddenPuppy",
"jerome.benoit@sap.com": "jerome-benoit",
"wysie@users.noreply.github.com": "Wysie",
@@ -175,12 +203,17 @@ AUTHOR_MAP = {
"jaisehgal11299@gmail.com": "jaisup",
"percydikec@gmail.com": "PercyDikec",
"noonou7@gmail.com": "HenkDz",
# Azure Foundry salvage (PRs #9029, #4599, #10086, #8766)
"tech@smartlogics.net": "TechPrototyper",
"637186+HangGlidersRule@users.noreply.github.com": "HangGlidersRule",
"pein892@gmail.com": "pein892",
"dean.kerr@gmail.com": "deankerr",
"socrates1024@gmail.com": "socrates1024",
"seanalt555@gmail.com": "Salt-555",
"satelerd@gmail.com": "satelerd",
"dan@danlynn.com": "danklynn",
"mattmaximo@hotmail.com": "MattMaximo",
"MatthewRHardwick@gmail.com": "mrhwick",
"149063006+j3ffffff@users.noreply.github.com": "j3ffffff",
"A-FdL-Prog@users.noreply.github.com": "A-FdL-Prog",
"l0hde@users.noreply.github.com": "l0hde",
@@ -367,6 +400,17 @@ AUTHOR_MAP = {
"zzn+pa@zzn.im": "xinbenlv",
"zaynjarvis@gmail.com": "ZaynJarvis",
"zhiheng.liu@bytedance.com": "ZaynJarvis",
"izhaolongfei@gmail.com": "loongfay",
"296659110@qq.com": "lrt4836",
"fe.daniel91@gmail.com": "beforeload",
"libo1106@foxmail.com": "libo1106",
"295367131@qq.com": "295367131",
"295367132@qq.com": "IxAres",
"danieldliu@tencent.com": "danieldliu",
"loongzhao@tencent.com": "loongzhao",
"Bartok9@users.noreply.github.com": "Bartok9",
"LeonSGP43@users.noreply.github.com": "LeonSGP43",
"kshitijk4poor@users.noreply.github.com": "kshitijk4poor",
"mbelleau@Michels-MacBook-Pro.local": "malaiwah",
"michel.belleau@malaiwah.com": "malaiwah",
"gnanasekaran.sekareee@gmail.com": "gnanam1990",
@@ -409,6 +453,7 @@ AUTHOR_MAP = {
"105142614+VTRiot@users.noreply.github.com": "VTRiot",
"vivien000812@gmail.com": "iamagenius00",
"89228157+Feranmi10@users.noreply.github.com": "Feranmi10",
"oluwadareferanmi11@gmail.com": "Feranmi10",
"simon@gtcl.us": "simon-gtcl",
"suzukaze.haduki@gmail.com": "houko",
"cliff@cigii.com": "cgarwood82",
@@ -504,6 +549,7 @@ AUTHOR_MAP = {
"screenmachine@gmail.com": "teknium1",
"chenzeshi@live.com": "chen1749144759",
"mor.aleksandr@yahoo.com": "MorAlekss",
"ash@users.noreply.github.com": "ash",
}
@@ -281,7 +281,6 @@ Type these during an interactive chat session.
### Utility
```
/branch (/fork) Branch the current session
/btw Ephemeral side question (doesn't interrupt main task)
/fast Toggle priority/fast processing
/browser Open CDP browser connection
/history Show conversation history (CLI)
@@ -403,6 +402,63 @@ Tool changes take effect on `/reset` (new session). They do NOT apply mid-conver
---
## Security & Privacy Toggles
Common "why is Hermes doing X to my output / tool calls / commands?" toggles — and the exact commands to change them. Most of these need a fresh session (`/reset` in chat, or start a new `hermes` invocation) because they're read once at startup.
### Secret redaction in tool output
Hermes auto-redacts strings that look like API keys, tokens, and secrets in all tool output (terminal stdout, `read_file`, web content, subagent summaries, etc.) so the model never sees raw credentials. If the user is intentionally working with mock tokens, share-management tokens, or their own secrets and the redaction is getting in the way:
```bash
hermes config set security.redact_secrets false # disable globally
```
**Restart required.** `security.redact_secrets` is snapshotted at import time — setting it mid-session (e.g. via `export HERMES_REDACT_SECRETS=false` from a tool call) will NOT take effect for the running process. Tell the user to run `hermes config set security.redact_secrets false` in a terminal, then start a new session. This is deliberate — it prevents an LLM from turning off redaction on itself mid-task.
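Why a mid-session change has no effect can be illustrated with a small sketch. The names here (`DEMO_REDACT_SECRETS`, `_REDACT_SNAPSHOT`, `redact`) are hypothetical stand-ins and not Hermes's actual implementation; the point is only that a value captured once at import time ignores later environment changes.

```python
import os

# Hypothetical stand-in for a setting that a module reads once at import time.
os.environ["DEMO_REDACT_SECRETS"] = "true"
_REDACT_SNAPSHOT = os.environ.get("DEMO_REDACT_SECRETS", "true").lower() != "false"


def redact(text: str) -> str:
    """Mask the text when the import-time snapshot says redaction is on."""
    return "[REDACTED]" if _REDACT_SNAPSHOT else text


# A later change (for example from a tool call) does not touch the snapshot.
os.environ["DEMO_REDACT_SECRETS"] = "false"
print(redact("sk-demo-123"))
```

Only a new process re-reads the setting, which is why the config change has to be followed by a fresh session.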
Re-enable with:
```bash
hermes config set security.redact_secrets true
```
### PII redaction in gateway messages
Separate from secret redaction. When enabled, the gateway hashes user IDs and strips phone numbers from the session context before it reaches the model:
```bash
hermes config set privacy.redact_pii true # enable
hermes config set privacy.redact_pii false # disable (default)
```
### Command approval prompts
By default (`approvals.mode: manual`), Hermes prompts the user before running shell commands flagged as destructive (`rm -rf`, `git reset --hard`, etc.). The modes are:
- `manual` — always prompt (default)
- `smart` — use an auxiliary LLM to auto-approve low-risk commands, prompt on high-risk
- `off` — skip all approval prompts (equivalent to `--yolo`)
```bash
hermes config set approvals.mode smart # recommended middle ground
hermes config set approvals.mode off # bypass everything (not recommended)
```
Per-invocation bypass without changing config:
- `hermes --yolo …`
- `export HERMES_YOLO_MODE=1`
Note: YOLO / `approvals.mode: off` does NOT turn off secret redaction. They are independent.
### Shell hooks allowlist
Some shell-hook integrations require explicit allowlisting before they fire. Managed via `~/.hermes/shell-hooks-allowlist.json` — prompted interactively the first time a hook wants to run.
### Disabling the web/browser/image-gen tools
To keep the model away from network or media tools entirely, open `hermes tools` and toggle per-platform. Takes effect on next session (`/reset`). See the Tools & Skills section above.
---
## Voice & Transcription
### STT (Voice → Text)
-3
@@ -1,3 +0,0 @@
---
description: Skills for monitoring, aggregating, and processing RSS feeds, blogs, and web content sources.
---
@@ -17,6 +17,13 @@ Remove refusal behaviors (guardrails) from open-weight LLMs without retraining o
**License warning:** OBLITERATUS is AGPL-3.0. NEVER import it as a Python library. Always invoke via CLI (`obliteratus` command) or subprocess. This keeps Hermes Agent's MIT license clean.
## Video Guide
Walkthrough of OBLITERATUS used by a Hermes agent to abliterate Gemma:
https://www.youtube.com/watch?v=8fG9BrNTeHs ("OBLITERATUS: An AI Agent Removed Gemma 4's Safety Guardrails")
Useful when the user wants a visual overview of the end-to-end workflow before running it themselves.
## When to Use This Skill
Trigger when the user:
+228
@@ -0,0 +1,228 @@
---
name: airtable
description: Airtable REST API via curl. Records CRUD, filters, upserts.
version: 1.1.0
author: community
license: MIT
prerequisites:
env_vars: [AIRTABLE_API_KEY]
commands: [curl]
metadata:
hermes:
tags: [Airtable, Productivity, Database, API]
homepage: https://airtable.com/developers/web/api/introduction
---
# Airtable — Bases, Tables & Records
Work with Airtable's REST API directly via `curl` using the `terminal` tool. No MCP server, no OAuth flow, no Python SDK — just `curl` and a personal access token.
## Prerequisites
1. Create a **Personal Access Token (PAT)** at https://airtable.com/create/tokens (tokens start with `pat...`).
2. Grant these scopes (minimum):
- `data.records:read` — read rows
- `data.records:write` — create / update / delete rows
- `schema.bases:read` — list bases and tables
3. **Important:** in the same token UI, add each base you want to access to the token's **Access** list. PATs are scoped per-base — a valid token on the wrong base returns `403`.
4. Store the token in `~/.hermes/.env` (or via `hermes setup`):
```
AIRTABLE_API_KEY=pat_your_token_here
```
> Note: legacy `key...` API keys were deprecated Feb 2024. Only PATs and OAuth tokens work now.
## API Basics
- **Endpoint:** `https://api.airtable.com/v0`
- **Auth header:** `Authorization: Bearer $AIRTABLE_API_KEY`
- **All requests** use JSON (`Content-Type: application/json` for any POST/PATCH/PUT body).
- **Object IDs:** bases `app...`, tables `tbl...`, records `rec...`, fields `fld...`. IDs never change; names can. Prefer IDs in automations.
- **Rate limit:** 5 requests/sec/base. `429` → back off. Burst on a single base will be throttled.
Base curl pattern:
```bash
curl -s "https://api.airtable.com/v0/$BASE_ID/$TABLE?maxRecords=5" \
-H "Authorization: Bearer $AIRTABLE_API_KEY" | python3 -m json.tool
```
`-s` suppresses curl's progress bar — keep it set for every call so the tool output stays clean for Hermes. Pipe through `python3 -m json.tool` (always present) or `jq` (if installed) for readable JSON.
## Field Types (request body shapes)
| Field type | Write shape |
|---|---|
| Single line text | `"Name": "hello"` |
| Long text | `"Notes": "multi\nline"` |
| Number | `"Score": 42` |
| Checkbox | `"Done": true` |
| Single select | `"Status": "Todo"` (name must already exist unless `typecast: true`) |
| Multi-select | `"Tags": ["urgent", "bug"]` |
| Date | `"Due": "2026-04-01"` |
| DateTime (UTC) | `"At": "2026-04-01T14:30:00.000Z"` |
| URL / Email / Phone | `"Link": "https://…"` |
| Attachment | `"Files": [{"url": "https://…"}]` (Airtable fetches + rehosts) |
| Linked record | `"Owner": ["recXXXXXXXXXXXXXX"]` (array of record IDs) |
| User | `"AssignedTo": {"id": "usrXXXXXXXXXXXXXX"}` |
Pass `"typecast": true` at the top level of a create/update body to let Airtable auto-coerce values (e.g. create a new select option on the fly, or convert `"42"` to `42`).
## Common Queries
### List bases the token can see
```bash
curl -s "https://api.airtable.com/v0/meta/bases" \
-H "Authorization: Bearer $AIRTABLE_API_KEY" | python3 -m json.tool
```
### List tables + schema for a base
```bash
curl -s "https://api.airtable.com/v0/meta/bases/$BASE_ID/tables" \
-H "Authorization: Bearer $AIRTABLE_API_KEY" | python3 -m json.tool
```
Use this BEFORE mutating — confirms exact field names and IDs, surfaces `options.choices` for select fields, and shows primary-field names.
### List records (first 10)
```bash
curl -s "https://api.airtable.com/v0/$BASE_ID/$TABLE?maxRecords=10" \
-H "Authorization: Bearer $AIRTABLE_API_KEY" | python3 -m json.tool
```
### Get a single record
```bash
curl -s "https://api.airtable.com/v0/$BASE_ID/$TABLE/$RECORD_ID" \
-H "Authorization: Bearer $AIRTABLE_API_KEY" | python3 -m json.tool
```
### Filter records (filterByFormula)
Airtable formulas must be URL-encoded. Let Python stdlib do it — never hand-encode:
```bash
FORMULA="{Status}='Todo'"
ENC=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$FORMULA")
curl -s "https://api.airtable.com/v0/$BASE_ID/$TABLE?filterByFormula=$ENC&maxRecords=20" \
-H "Authorization: Bearer $AIRTABLE_API_KEY" | python3 -m json.tool
```
Useful formula patterns:
- Exact match: `{Email}='user@example.com'`
- Contains: `FIND('bug', LOWER({Title}))`
- Multiple conditions: `AND({Status}='Todo', {Priority}='High')`
- Or: `OR({Owner}='alice', {Owner}='bob')`
- Not empty: `NOT({Assignee}='')`
- Date comparison: `IS_AFTER({Due}, TODAY())`
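The encoding step can also be sketched as a small Python helper that builds the whole list URL. This is a minimal sketch using only `urllib.parse.quote`; the helper name `filter_url` and the base ID `appEXAMPLE` are illustrative assumptions, not part of the skill.

```python
from urllib.parse import quote

API_ROOT = "https://api.airtable.com/v0"


def filter_url(base_id: str, table: str, formula: str, max_records: int = 20) -> str:
    """Build a list-records URL with the formula fully percent-encoded."""
    # safe="" makes quote() encode every reserved character, including "/".
    encoded_formula = quote(formula, safe="")
    encoded_table = quote(table, safe="")
    return (
        f"{API_ROOT}/{base_id}/{encoded_table}"
        f"?filterByFormula={encoded_formula}&maxRecords={max_records}"
    )


# Hypothetical base ID and table name, for illustration only.
url = filter_url("appEXAMPLE", "Tasks", "AND({Status}='Todo', {Priority}='High')")
print(url)
```

The resulting string can be passed to `curl -s` unchanged, since braces, quotes, commas, and spaces are all encoded.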
### Sort + select specific fields
```bash
curl -s "https://api.airtable.com/v0/$BASE_ID/$TABLE?sort%5B0%5D%5Bfield%5D=Priority&sort%5B0%5D%5Bdirection%5D=asc&fields%5B%5D=Name&fields%5B%5D=Status" \
-H "Authorization: Bearer $AIRTABLE_API_KEY" | python3 -m json.tool
```
Square brackets in query params MUST be URL-encoded (`%5B` / `%5D`).
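Rather than typing `%5B` and `%5D` by hand, the bracketed parameters can be generated. The sketch below uses `urllib.parse.urlencode` with `quote_via=quote` so spaces become `%20`; the helper name `list_query` is an assumption made for illustration.

```python
from urllib.parse import quote, urlencode


def list_query(fields: list[str], sort_field: str, direction: str = "asc") -> str:
    """Return an encoded query string with one sort key and a field projection."""
    params = [
        ("sort[0][field]", sort_field),
        ("sort[0][direction]", direction),
    ]
    # Repeating the "fields[]" key once per column selects those columns.
    params += [("fields[]", name) for name in fields]
    return urlencode(params, quote_via=quote)


print(list_query(["Name", "Status"], "Priority"))
```

For the inputs shown, the result is the same query string as in the `curl` example above.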
### Use a named view
```bash
curl -s "https://api.airtable.com/v0/$BASE_ID/$TABLE?view=Grid%20view&maxRecords=50" \
-H "Authorization: Bearer $AIRTABLE_API_KEY" | python3 -m json.tool
```
Views apply their saved filter + sort server-side.
## Common Mutations
### Create a record
```bash
curl -s -X POST "https://api.airtable.com/v0/$BASE_ID/$TABLE" \
-H "Authorization: Bearer $AIRTABLE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"fields":{"Name":"New task","Status":"Todo","Priority":"High"}}' | python3 -m json.tool
```
### Create up to 10 records in one call
```bash
curl -s -X POST "https://api.airtable.com/v0/$BASE_ID/$TABLE" \
-H "Authorization: Bearer $AIRTABLE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"typecast": true,
"records": [
{"fields": {"Name": "Task A", "Status": "Todo"}},
{"fields": {"Name": "Task B", "Status": "In progress"}}
]
}' | python3 -m json.tool
```
Batch endpoints are capped at **10 records per request**. For larger inserts, loop in batches of 10 with a short sleep to respect 5 req/sec/base.
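Splitting a larger insert into request bodies of at most 10 records might look like the following. It is a sketch only: `batch_payloads` is a hypothetical helper, and it produces JSON bodies without sending anything.

```python
import json
from typing import Any, Iterator

BATCH_LIMIT = 10  # per-request record cap stated above


def batch_payloads(rows: list[dict[str, Any]], typecast: bool = False) -> Iterator[str]:
    """Yield JSON request bodies, each holding at most BATCH_LIMIT records."""
    for start in range(0, len(rows), BATCH_LIMIT):
        chunk = rows[start:start + BATCH_LIMIT]
        body: dict[str, Any] = {"records": [{"fields": row} for row in chunk]}
        if typecast:
            body["typecast"] = True
        yield json.dumps(body)


rows = [{"Name": f"Task {i}"} for i in range(23)]
sizes = [len(json.loads(payload)["records"]) for payload in batch_payloads(rows)]
print(sizes)  # [10, 10, 3]
```

Each yielded body can be written to a file and sent with `curl -d @file`; pausing roughly a quarter of a second between posts (an assumed value) keeps a single base under the 5 req/sec limit.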
### Update a record (PATCH — merges, preserves unchanged fields)
```bash
curl -s -X PATCH "https://api.airtable.com/v0/$BASE_ID/$TABLE/$RECORD_ID" \
-H "Authorization: Bearer $AIRTABLE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"fields":{"Status":"Done"}}' | python3 -m json.tool
```
### Upsert by a merge field (no ID needed)
```bash
curl -s -X PATCH "https://api.airtable.com/v0/$BASE_ID/$TABLE" \
-H "Authorization: Bearer $AIRTABLE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"performUpsert": {"fieldsToMergeOn": ["Email"]},
"records": [
{"fields": {"Email": "user@example.com", "Status": "Active"}}
]
}' | python3 -m json.tool
```
`performUpsert` creates records whose merge-field values are new, patches records whose merge-field values already exist. Great for idempotent syncs.
### Delete a record
```bash
curl -s -X DELETE "https://api.airtable.com/v0/$BASE_ID/$TABLE/$RECORD_ID" \
-H "Authorization: Bearer $AIRTABLE_API_KEY" | python3 -m json.tool
```
### Delete up to 10 records in one call
```bash
curl -s -X DELETE "https://api.airtable.com/v0/$BASE_ID/$TABLE?records%5B%5D=rec1&records%5B%5D=rec2" \
-H "Authorization: Bearer $AIRTABLE_API_KEY" | python3 -m json.tool
```
## Pagination
List endpoints return at most **100 records per page**. If the response includes `"offset": "..."`, pass it back on the next call. Loop until the field is absent:
```bash
OFFSET=""
while :; do
URL="https://api.airtable.com/v0/$BASE_ID/$TABLE?pageSize=100"
[ -n "$OFFSET" ] && URL="$URL&offset=$OFFSET"
RESP=$(curl -s "$URL" -H "Authorization: Bearer $AIRTABLE_API_KEY")
echo "$RESP" | python3 -c 'import json,sys; d=json.load(sys.stdin); [print(r["id"], r["fields"].get("Name","")) for r in d["records"]]'
OFFSET=$(echo "$RESP" | python3 -c 'import json,sys; d=json.load(sys.stdin); print(d.get("offset",""))')
[ -z "$OFFSET" ] && break
done
```
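The same offset loop, written as a Python generator, is sketched below. The HTTP call is injected as `fetch_page` (a hypothetical callable that takes the current offset and returns one parsed page), so the control flow can be checked without network access.

```python
from typing import Any, Callable, Iterator, Optional

Page = dict[str, Any]


def iter_records(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict[str, Any]]:
    """Yield every record, following "offset" until the response omits it."""
    offset: Optional[str] = None
    while True:
        page = fetch_page(offset)
        yield from page.get("records", [])
        offset = page.get("offset")
        if not offset:
            break


# Stand-in pages keyed by the offset that requests them (illustrative data).
FAKE_PAGES: dict[Optional[str], Page] = {
    None: {"records": [{"id": "rec1"}, {"id": "rec2"}], "offset": "itr/2"},
    "itr/2": {"records": [{"id": "rec3"}]},
}
ids = [record["id"] for record in iter_records(lambda off: FAKE_PAGES[off])]
print(ids)  # ['rec1', 'rec2', 'rec3']
```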
## Typical Hermes Workflow
1. **Confirm auth.** `curl -s -o /dev/null -w "%{http_code}\n" https://api.airtable.com/v0/meta/bases -H "Authorization: Bearer $AIRTABLE_API_KEY"` — expect `200`.
2. **Find the base.** List bases (see "List bases the token can see" above) OR ask the user for the `app...` ID directly if the token lacks `schema.bases:read`.
3. **Inspect the schema.** `GET /v0/meta/bases/$BASE_ID/tables` — cache the exact field names and primary-field name locally in the session before mutating anything.
4. **Read before you write.** For "update X where Y", `filterByFormula` first to resolve the `rec...` ID, then `PATCH /v0/$BASE_ID/$TABLE/$RECORD_ID`. Never guess record IDs.
5. **Batch writes.** Combine related creates into one 10-record POST to stay under the 5 req/sec budget.
6. **Destructive ops.** Deletions can't be undone via API. If the user says "delete all Xs", echo back the filter + record count and confirm before firing.
## Pitfalls
- **`filterByFormula` MUST be URL-encoded.** Field names with spaces or non-ASCII also need encoding (`{My Field}` becomes `%7BMy%20Field%7D`). Use Python stdlib (pattern above); never hand-escape.
- **Empty fields are omitted from responses.** A missing `"Assignee"` key doesn't mean the field doesn't exist — it means this record's value is empty. Check the schema (step 3) before concluding a field is missing.
- **PATCH vs PUT.** `PATCH` merges supplied fields into the record. `PUT` replaces the record entirely and clears any field you didn't include. Default to `PATCH`.
- **Single-select options must exist.** Writing `"Status": "Shipping"` when `Shipping` isn't in the field's option list errors with `INVALID_MULTIPLE_CHOICE_OPTIONS` unless you pass `"typecast": true` (which auto-creates the option).
- **Per-base token scoping.** A `403` on one base while another works means the token's Access list doesn't include that base — not a scope or auth issue. Send the user to https://airtable.com/create/tokens to grant it.
- **Rate limits are per base, not per token.** 5 req/sec on `baseA` and 5 req/sec on `baseB` is fine; 6 req/sec on `baseA` alone will throttle. Monitor the `Retry-After` header on `429`.
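The URL-encoding pitfall is easiest to avoid by building the whole URL in one place. The sketch below assumes the standard list-records endpoint shape; `build_filter_url` is a hypothetical helper, and the base ID and table name are placeholders.

```python
from urllib.parse import quote

API_ROOT = "https://api.airtable.com/v0"


def build_filter_url(base_id: str, table: str, formula: str, page_size: int = 100) -> str:
    """Percent-encode every dynamic component before splicing it into the URL."""
    encoded_table = quote(table, safe="")      # table names may contain spaces
    encoded_formula = quote(formula, safe="")  # braces, quotes and '=' all get escaped
    return (
        f"{API_ROOT}/{base_id}/{encoded_table}"
        f"?filterByFormula={encoded_formula}&pageSize={page_size}"
    )


# Placeholder base ID and table name, for illustration only.
url = build_filter_url("appXXXXXXXXXXXXXX", "My Table", "{My Field}='Done'")
print(url)
```

Passing `safe=""` matters: the default `safe="/"` would leave slashes unescaped, which is wrong for a query-string value.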
## Important Notes for Hermes
- **Always use the `terminal` tool with `curl`.** Do NOT use `web_extract` (it can't send auth headers) or `browser_navigate` (needs UI auth and is slow).
- **`AIRTABLE_API_KEY` flows from `~/.hermes/.env` into the subprocess automatically** when this skill is loaded — no need to re-export it before each `curl` call.
- **Escape curly braces in formulas carefully.** In a heredoc body, `{Status}` is literal. In a shell argument, `{Status}` is safe outside `{...}` brace-expansion context — but pass dynamic strings through `python3 urllib.parse.quote` before splicing into a URL.
- **Pretty-print with `python3 -m json.tool`** (always present) rather than `jq` (optional). Only reach for `jq` when you need filtering/projection.
- **Pagination is per-page, not global.** Airtable's 100-record cap is a hard limit; there is no way to bump it. Loop with `offset` until the field is absent.
- **Read the `errors` array** on non-2xx responses — Airtable returns structured error codes like `AUTHENTICATION_REQUIRED`, `INVALID_PERMISSIONS`, `MODEL_ID_NOT_FOUND`, `INVALID_MULTIPLE_CHOICE_OPTIONS` that tell you exactly what's wrong.
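The "loop with `offset` until the field is absent" rule can be expressed as a small generator. This is a sketch only: `fetch_page` is a hypothetical callable standing in for the real `curl` call, and the fake pages below exist purely to make the example runnable.

```python
from typing import Callable, Iterator, Optional


def iter_records(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every record, following `offset` until a response omits it."""
    offset: Optional[str] = None
    while True:
        page = fetch_page(offset)
        yield from page.get("records", [])
        offset = page.get("offset")
        if not offset:  # no offset means this was the last page
            break


# Fake two-page response set, keyed by the offset that requests each page.
_FAKE_PAGES = {
    None: {"records": [{"id": "rec1"}, {"id": "rec2"}], "offset": "page2"},
    "page2": {"records": [{"id": "rec3"}]},
}

ids = [record["id"] for record in iter_records(lambda offset: _FAKE_PAGES[offset])]
print(ids)  # ['rec1', 'rec2', 'rec3']
```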
@@ -289,6 +289,7 @@ def exchange_auth_code(code: str):
sys.exit(1)
pending_auth = _load_pending_auth()
raw_callback = code
code, returned_state = _extract_code_and_state(code)
if returned_state and returned_state != pending_auth["state"]:
print("ERROR: OAuth state mismatch. Run --auth-url again to start a fresh session.")
@@ -298,19 +299,13 @@ def exchange_auth_code(code: str):
from google_auth_oauthlib.flow import Flow
from urllib.parse import parse_qs, urlparse
# Extract granted scopes from the callback URL if present
if returned_state and "scope" in parse_qs(urlparse(code).query if isinstance(code, str) and code.startswith("http") else {}):
granted_scopes = parse_qs(urlparse(code).query)["scope"][0].split()
else:
# Try to extract from code_or_url parameter
if isinstance(code, str) and code.startswith("http"):
params = parse_qs(urlparse(code).query)
if "scope" in params:
granted_scopes = params["scope"][0].split()
else:
granted_scopes = SCOPES
else:
granted_scopes = SCOPES
# Extract granted scopes from the callback URL if the user pasted the full redirect URL.
granted_scopes = list(SCOPES)
if isinstance(raw_callback, str) and raw_callback.startswith("http"):
params = parse_qs(urlparse(raw_callback).query)
scope_val = (params.get("scope") or [""])[0].strip()
if scope_val:
granted_scopes = scope_val.split()
flow = Flow.from_client_secrets_file(
str(CLIENT_SECRET_PATH),
@@ -926,13 +926,18 @@ def cmd_timezone(args):
os_ = offset_info.get("seconds", 0)
sign = "+" if oh >= 0 else "-"
utc_offset = f"{sign}{abs(oh):02d}:{om:02d}"
if os_:
utc_offset = f"{utc_offset}:{os_:02d}"
elif tz_data.get("standardUtcOffset"):
offset_info2 = tz_data["standardUtcOffset"]
if isinstance(offset_info2, dict):
oh = offset_info2.get("hours", 0)
om = abs(offset_info2.get("minutes", 0))
os_ = offset_info2.get("seconds", 0)
sign = "+" if oh >= 0 else "-"
utc_offset = f"{sign}{abs(oh):02d}:{om:02d}"
if os_:
utc_offset = f"{utc_offset}:{os_:02d}"
timezone_src = "timeapi.io"
except (RuntimeError, KeyError, TypeError):
pass # API may be down; continue to fallback
@@ -0,0 +1,151 @@
---
name: debugging-hermes-tui-commands
description: Use when debugging or adding Hermes TUI slash commands across the Python backend (hermes_cli/commands.py), the tui_gateway bridge, and the TypeScript/Ink frontend. Covers autocomplete gaps, gateway dispatch issues, and live UI-state wiring.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
hermes:
tags: [debugging, hermes-agent, tui, slash-commands, typescript, python]
related_skills: [python-debugpy, node-inspect-debugger, systematic-debugging]
---
# Debugging Hermes TUI Slash Commands
## Overview
Hermes slash commands span three layers — Python command registry, tui_gateway JSON-RPC bridge, and the Ink/TypeScript frontend. When a command misbehaves (missing from autocomplete, works in CLI but not TUI, config persists but UI doesn't update), the bug is almost always one layer being out of sync with another.
Use this skill when you encounter issues with slash commands in the Hermes TUI, particularly when commands aren't showing in autocomplete, aren't working properly in the TUI, or need to be added/updated.
## When to Use
- A slash command exists in one part of the codebase but doesn't work fully
- A command needs to be added to both backend and frontend
- Command autocomplete isn't working for specific commands
- Command behavior is inconsistent between CLI and TUI
- A command persists config but doesn't apply live in the TUI
## Architecture Overview
```
Python backend (hermes_cli/commands.py) <- canonical COMMAND_REGISTRY
TUI gateway (tui_gateway/server.py) <- slash.exec / command.dispatch
TUI frontend (ui-tui/src/app/slash/) <- local handlers + fallthrough
```
Command definitions must be registered consistently across Python and TypeScript to work properly. The Python `COMMAND_REGISTRY` is the source of truth for: CLI dispatch, gateway help, Telegram BotCommand menu, Slack subcommand map, and autocomplete data shipped to Ink.
## Investigation Steps
1. **Check if the command exists in the TUI frontend:**
```bash
search_files --pattern "/commandname" --file_glob "*.ts" --path ui-tui/
search_files --pattern "/commandname" --file_glob "*.tsx" --path ui-tui/
```
2. **Examine the TUI command definition:**
```bash
read_file ui-tui/src/app/slash/commands/core.ts
# If not there:
search_files --pattern "commandname" --path ui-tui/src/app/slash/commands --target files
```
3. **Check if the command exists in the Python backend:**
```bash
search_files --pattern "CommandDef" --file_glob "*.py" --path hermes_cli/
search_files --pattern "commandname" --path hermes_cli/commands.py --context 3
```
4. **Examine the gateway implementation:**
```bash
search_files --pattern "complete.slash|slash.exec" --path tui_gateway/
```
## Fix: Missing Command Autocomplete
If a command exists in the TUI but doesn't show in autocomplete:
1. Add a `CommandDef` entry to `COMMAND_REGISTRY` in `hermes_cli/commands.py`:
```python
CommandDef("commandname", "Description of the command", "Session",
cli_only=True, aliases=("alias",),
args_hint="[arg1|arg2|arg3]",
subcommands=("arg1", "arg2", "arg3")),
```
2. Pick `cli_only` vs gateway availability carefully:
- `cli_only=True` — only in the interactive CLI/TUI
- `gateway_only=True` — only in messaging platforms
- neither — available everywhere
- `gateway_config_gate="display.foo"` — config-gated availability in the gateway
3. Ensure `subcommands` matches the expected tab-completion options shown by the TUI.
4. If the command runs server-side, add a handler in `HermesCLI.process_command()` in `cli.py`:
```python
elif canonical == "commandname":
self._handle_commandname(cmd_original)
```
5. For gateway-available commands, add a handler in `gateway/run.py`:
```python
if canonical == "commandname":
return await self._handle_commandname(event)
```
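To make the registry's role concrete, here is a minimal sketch of how a name or alias might resolve to one entry. The `CommandDef` dataclass below is a hypothetical stand-in that mirrors only the fields shown above; the real class in `hermes_cli/commands.py` may differ, and `resolve_command` is an illustrative helper rather than the project's actual lookup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommandDef:
    """Hypothetical stand-in for the real class in hermes_cli/commands.py."""
    name: str
    description: str
    category: str
    cli_only: bool = False
    aliases: tuple[str, ...] = ()
    args_hint: str = ""
    subcommands: tuple[str, ...] = ()


COMMAND_REGISTRY = [
    CommandDef("commandname", "Description of the command", "Session",
               cli_only=True, aliases=("alias",),
               args_hint="[arg1|arg2|arg3]",
               subcommands=("arg1", "arg2", "arg3")),
]


def resolve_command(typed: str) -> Optional[CommandDef]:
    """Map '/name ...' or '/alias ...' to its registry entry, or None if unknown."""
    words = typed.strip().lstrip("/").split()
    if not words:
        return None
    token = words[0].lower()
    for cmd in COMMAND_REGISTRY:
        if token == cmd.name or token in cmd.aliases:
            return cmd
    return None


print(resolve_command("/alias arg1").name)  # commandname
```

Because aliases resolve to the same entry, downstream consumers only ever need the canonical name.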
## Common Issues
1. **Command shows in TUI but not in autocomplete.** The command is defined in the TUI codebase but missing from `COMMAND_REGISTRY` in `hermes_cli/commands.py`. Autocomplete data ships from Python.
2. **Command shows in autocomplete but doesn't work.** Check the command handler in `tui_gateway/server.py` and the frontend handler in `ui-tui/src/app/createSlashHandler.ts`. If the command is local-only in Ink, it must be handled in the built-in branch of `app.tsx`; otherwise it falls through to `slash.exec` and must have a Python handler.
3. **Command behavior differs between CLI and TUI.** The command might have different implementations. Check both `cli.py::process_command` and the TUI's local handler. Local TUI handlers take precedence over gateway dispatch.
4. **Command persists config but doesn't apply live.** For TUI-local commands, updating `config.set` is not enough. Also patch the relevant nanostore state immediately (usually `patchUiState(...)`) and pass any new state through rendering components. Example: `/details collapsed` must update live detail visibility, not just save `details_mode`; in-session global `/details <mode>` may need a separate command-override flag so live commands can override built-in section defaults while startup/config sync preserves default-expanded thinking/tools behavior.
5. **Gateway dispatch silently ignores the command.** The gateway only dispatches commands it knows about. Check `GATEWAY_KNOWN_COMMANDS` (derived from `COMMAND_REGISTRY` automatically) includes the canonical name. If the command is `cli_only` with a `gateway_config_gate`, verify the gated config value is truthy.
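Issue 5 is easier to reason about with a model of the gating rule. The sketch below is an assumption-based illustration of how a known-commands set could be derived: `CommandDef` is a reduced hypothetical stand-in, and `config_lookup` and `gateway_known_commands` are invented names, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class CommandDef:
    """Reduced, hypothetical stand-in carrying only the availability flags."""
    name: str
    cli_only: bool = False
    gateway_config_gate: Optional[str] = None


def config_lookup(config: dict, dotted_key: str) -> Any:
    """Resolve a dotted key such as 'display.foo'; return None if any part is missing."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def gateway_known_commands(registry: list[CommandDef], config: dict) -> set[str]:
    """Commands the gateway would dispatch under the assumed gating rule."""
    known = set()
    for cmd in registry:
        if not cmd.cli_only:
            known.add(cmd.name)
        elif cmd.gateway_config_gate and config_lookup(config, cmd.gateway_config_gate):
            known.add(cmd.name)  # cli_only, but its config gate is truthy
    return known


registry = [
    CommandDef("everywhere"),
    CommandDef("localonly", cli_only=True),
    CommandDef("gated", cli_only=True, gateway_config_gate="display.foo"),
]
print(sorted(gateway_known_commands(registry, {"display": {"foo": True}})))
```

With `display.foo` unset or falsy, `gated` drops out of the set, which is the "silently ignored" symptom described above.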
## Debugging Tactics
When surface-level inspection doesn't reveal the bug:
- **Python side hangs or misbehaves:** use the `python-debugpy` skill to break inside `_SlashWorker.exec` or the command handler. `remote-pdb` set at the handler entry is the fastest path.
- **Ink side not reacting:** use the `node-inspect-debugger` skill to break in `app.tsx`'s slash dispatch or the local command branch. `sb('dist/app.js', <line>)` after `npm run build`.
- **Registry mismatch / unclear which side is wrong:** compare the canonical `COMMAND_REGISTRY` entry against the TUI's local command list side-by-side.
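The side-by-side comparison can be reduced to a set difference once both name lists are collected by hand. A minimal sketch, with `registry_drift` as a hypothetical helper and placeholder command names:

```python
def registry_drift(python_names: set[str], tui_names: set[str]) -> dict[str, list[str]]:
    """Report names that exist on only one side of the Python/TUI boundary."""
    return {
        # Handled locally in Ink but absent from COMMAND_REGISTRY: no autocomplete.
        "tui_only": sorted(tui_names - python_names),
        # Registered in Python with no local handler: falls through to slash.exec.
        "python_only": sorted(python_names - tui_names),
    }


# Placeholder names for illustration only.
drift = registry_drift(
    python_names={"shared", "commandname"},
    tui_names={"shared", "localonly"},
)
print(drift)
```

A non-empty `tui_only` list points at the autocomplete gap; `python_only` entries are not necessarily bugs, since they may be served through the gateway fallthrough.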
## Pitfalls
- Don't forget to set the appropriate category for the command in `CommandDef` (e.g., "Session", "Configuration", "Tools & Skills", "Info", "Exit")
- Make sure any aliases are properly registered in the `aliases` tuple — no other file changes are needed, everything downstream (Telegram menu, Slack mapping, autocomplete, help) derives from it
- For commands with subcommands, ensure the `subcommands` tuple in `CommandDef` matches what's in the TUI code
- `cli_only=True` commands won't work in gateway/messaging platforms — unless you add a `gateway_config_gate` and the gate is truthy
- After adding live UI state, search every consumer of the old prop/helper and thread the new state through all render paths, not just the active streaming path. TUI detail rendering has at least two important paths: live `StreamingAssistant`/`ToolTrail` and transcript/pending `MessageLine` rows. A `/clean` pass should explicitly check both.
- Rebuild the TUI (`npm --prefix ui-tui run build`) before testing — tsx watch mode may lag on first launch
## Verification
After fixing:
1. Rebuild the TUI:
```bash
cd /home/bb/hermes-agent && npm --prefix ui-tui run build
```
2. Run the TUI and test the command:
```bash
hermes --tui
```
3. Type `/` and verify the command appears in autocomplete suggestions with the expected description and args hint.
4. Execute the command and confirm:
- Expected behavior fires
- Any persisted config updates correctly (`read_file ~/.hermes/config.yaml`)
- Live UI state reflects the change immediately (not just after restart)
5. If the command is also gateway-available, test it from at least one messaging platform (or run the gateway tests: `scripts/run_tests.sh tests/gateway/`).
@@ -0,0 +1,164 @@
---
name: hermes-agent-skill-authoring
description: Use when authoring or updating a SKILL.md inside the hermes-agent repo itself (skills/ tree, committed to a branch). Covers required frontmatter, validator limits, peer-matching structure, and the write_file-vs-skill_manage distinction for in-repo skills.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
hermes:
tags: [skills, authoring, hermes-agent, conventions, skill-md]
related_skills: [writing-plans, requesting-code-review]
---
# Authoring Hermes-Agent Skills (in-repo)
## Overview
There are two places a SKILL.md can live:
1. **User-local:** `~/.hermes/skills/<maybe-category>/<name>/SKILL.md` — personal, not shared. Created via `skill_manage(action='create')`.
2. **In-repo (this skill is about this case):** `/home/bb/hermes-agent/skills/<category>/<name>/SKILL.md` — committed, shipped with the package. Use `write_file` + `git add`. `skill_manage(action='create')` does NOT target this tree.
## When to Use
- User asks you to add a skill "in this branch / repo / commit"
- You're committing a reusable workflow that should ship with hermes-agent
- You're editing an existing skill under `/home/bb/hermes-agent/skills/` (use `patch` for small edits, `write_file` for rewrites; `skill_manage` still works for patch on in-repo skills, but not for `create`)
## Required Frontmatter
Source of truth: `tools/skill_manager_tool.py::_validate_frontmatter`. Hard requirements:
- Starts with `---` as the first bytes (no leading blank line).
- Closes with `\n---\n` before the body.
- Parses as a YAML mapping.
- `name` field present.
- `description` field present, ≤ **1024 chars** (`MAX_DESCRIPTION_LENGTH`).
- Non-empty body after the closing `---`.
Peer-matched shape used by every skill under `skills/software-development/`:
```yaml
---
name: my-skill-name # lowercase, hyphens, ≤64 chars (MAX_NAME_LENGTH)
description: Use when <trigger>. <one-line behavior>.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
hermes:
tags: [short, descriptive, tags]
related_skills: [other-skill, another-skill]
---
```
`version` / `author` / `license` / `metadata` are NOT enforced by the validator, but every peer has them — omit and your skill sticks out.
## Size Limits
- Description: ≤ 1024 chars (enforced).
- Full SKILL.md: ≤ 100,000 chars (enforced as `MAX_SKILL_CONTENT_CHARS`, ~36k tokens).
- Peer skills in `software-development/` sit at **8-14k chars**. Aim for that range. If you're pushing past 20k, split into `references/*.md` and reference them from SKILL.md.
## Peer-Matched Structure
Every in-repo skill follows roughly:
```
# <Title>
## Overview
One or two paragraphs: what and why.
## When to Use
- Bulleted triggers
- "Don't use for:" counter-triggers
## <Topic sections specific to the skill>
- Quick-reference tables are common
- Code blocks with exact commands
- Hermes-specific recipes (tests via scripts/run_tests.sh, ui-tui paths, etc.)
## Common Pitfalls
Numbered list of mistakes and their fixes.
## Verification Checklist
- [ ] Checkbox list of post-action verifications
## One-Shot Recipes (optional)
Named scenarios → concrete command sequences.
```
Not every section is mandatory, but `Overview` + `When to Use` + actionable body + pitfalls are the minimum for the skill to feel like a peer.
## Directory Placement
```
skills/<category>/<skill-name>/SKILL.md
```
Categories currently in repo (confirm with `ls skills/`): `autonomous-ai-agents`, `creative`, `data-science`, `devops`, `dogfood`, `email`, `gaming`, `github`, `leisure`, `mcp`, `media`, `mlops/*`, `note-taking`, `productivity`, `red-teaming`, `research`, `smart-home`, `social-media`, `software-development`.
Pick the closest existing category. Don't invent new top-level categories casually.
## Workflow
1. **Survey peers** in the target category:
```
ls skills/<category>/
```
Read 2-3 peer SKILL.md files to match tone and structure.
2. **Check validator constraints** in `tools/skill_manager_tool.py` if unsure.
3. **Draft** with `write_file` to `skills/<category>/<name>/SKILL.md`.
4. **Validate locally**:
```python
import yaml, re, pathlib
content = pathlib.Path("skills/<category>/<name>/SKILL.md").read_text()
assert content.startswith("---")
m = re.search(r'\n---\s*\n', content[3:])
assert m, "frontmatter is never closed by a line containing only ---"
fm = yaml.safe_load(content[3:m.start()+3])
assert "name" in fm and "description" in fm
assert len(fm["description"]) <= 1024
assert len(content) <= 100_000
```
5. **Git add + commit** on the active branch.
6. **Note:** the CURRENT session's skill loader is cached — `skill_view` / `skills_list` will not see the new skill until a new session. This is expected, not a bug.
## Cross-Referencing Other Skills
`metadata.hermes.related_skills` unions both trees (`skills/` in-repo and `~/.hermes/skills/`) at load time. You CAN reference a user-local skill from an in-repo skill, but it won't resolve for other users who clone the repo fresh. Prefer referencing only in-repo skills from in-repo skills. If a frequently-referenced skill lives only in `~/.hermes/skills/`, consider promoting it to the repo.
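A quick way to spot references that will not resolve on a fresh clone is to compare `related_skills` against the skill directories present in the repo. The sketch below assumes each skill's directory name equals its `name` field; `in_repo_skill_names` and `unresolved_related` are hypothetical helpers, and the temporary tree is a made-up layout for demonstration.

```python
import pathlib
import tempfile


def in_repo_skill_names(skills_root: pathlib.Path) -> set[str]:
    """Collect skill names, assuming directory name == frontmatter `name`."""
    return {path.parent.name for path in skills_root.rglob("SKILL.md")}


def unresolved_related(skills_root: pathlib.Path, related: list[str]) -> list[str]:
    """Return the related_skills entries with no matching in-repo skill directory."""
    known = in_repo_skill_names(skills_root)
    return [name for name in related if name not in known]


# Build a throwaway skills/ tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp) / "skills"
    for rel in ("software-development/python-debugpy",
                "software-development/systematic-debugging"):
        (root / rel).mkdir(parents=True)
        (root / rel / "SKILL.md").write_text("---\nname: placeholder\n---\nbody\n")
    missing = unresolved_related(root, ["python-debugpy", "some-user-local-skill"])

print(missing)  # ['some-user-local-skill']
```

`rglob` is used rather than a fixed `*/*/SKILL.md` pattern so that nested categories such as `mlops/*` are also covered.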
## Editing Existing In-Repo Skills
- **Small fix (typo, added pitfall, tightened trigger):** `skill_manage(action='patch', name=..., old_string=..., new_string=...)` works fine on in-repo skills.
- **Major rewrite:** `write_file` the whole SKILL.md. `skill_manage(action='edit')` also works but requires supplying the full new content.
- **Adding supporting files:** `write_file` to `skills/<category>/<name>/references/<file>.md`, `templates/<file>`, or `scripts/<file>`. `skill_manage(action='write_file')` also works and enforces the references/templates/scripts/assets subdir allowlist.
- **Always commit** the edit — in-repo skills are source, not runtime state.
## Common Pitfalls
1. **Using `skill_manage(action='create')` for an in-repo skill.** It writes to `~/.hermes/skills/`, not the repo tree. Use `write_file` for in-repo creation.
2. **Leading whitespace before `---`.** The validator checks `content.startswith("---")`; any leading blank line or BOM fails validation.
3. **Description too generic.** Peer descriptions start with "Use when ..." and describe the *trigger class*, not the one task. "Use when debugging X" > "Debug X".
4. **Forgetting the author/license/metadata block.** Not validator-enforced, but every peer has it; omitting makes the skill look half-finished.
5. **Writing a skill that duplicates a peer.** Before creating, `ls skills/<category>/` and open 2-3 peers. Prefer extending an existing skill to creating a narrow sibling.
6. **Expecting the current session to see the new skill.** It won't. The skill loader is initialized at session start. Verify in a fresh session or via `skill_view` using the exact path.
7. **Linking to skills that don't exist in-repo.** `related_skills: [some-user-local-skill]` works for you but breaks for other clones. Prefer only in-repo links.
## Verification Checklist
- [ ] File is at `skills/<category>/<name>/SKILL.md` (not in `~/.hermes/skills/`)
- [ ] Frontmatter starts at byte 0 with `---`, closes with `\n---\n`
- [ ] `name`, `description`, `version`, `author`, `license`, `metadata.hermes.{tags, related_skills}` all present
- [ ] Name ≤ 64 chars, lowercase + hyphens
- [ ] Description ≤ 1024 chars and starts with "Use when ..."
- [ ] Total file ≤ 100,000 chars (aim for 8-15k)
- [ ] Structure: `# Title` → `## Overview` → `## When to Use` → body → `## Common Pitfalls` → `## Verification Checklist`
- [ ] `related_skills` references resolve in-repo (or are explicitly OK to be user-local)
- [ ] `git add skills/<category>/<name>/ && git commit` completed on the intended branch
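Several checklist items can be verified mechanically once the frontmatter has been parsed into a dict. This is a sketch under stated assumptions: the name pattern (lowercase letters, digits, single hyphens) is a guess at "lowercase + hyphens", and `checklist_failures` is a hypothetical helper, not the real validator.

```python
import re

MAX_NAME_LENGTH = 64
MAX_DESCRIPTION_LENGTH = 1024
# Assumption: digits are allowed alongside lowercase letters and hyphens.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def checklist_failures(fm: dict) -> list[str]:
    """Return the frontmatter fields that fail the peer-matching checklist."""
    failures = []
    name = str(fm.get("name", ""))
    if not NAME_PATTERN.match(name) or len(name) > MAX_NAME_LENGTH:
        failures.append("name")
    description = str(fm.get("description", ""))
    if not description.startswith("Use when") or len(description) > MAX_DESCRIPTION_LENGTH:
        failures.append("description")
    for key in ("version", "author", "license"):
        if key not in fm:
            failures.append(key)
    hermes = (fm.get("metadata") or {}).get("hermes") or {}
    for key in ("tags", "related_skills"):
        if key not in hermes:
            failures.append(f"metadata.hermes.{key}")
    return failures


good = {
    "name": "my-skill-name",
    "description": "Use when debugging X. Covers Y.",
    "version": "1.0.0",
    "author": "Hermes Agent",
    "license": "MIT",
    "metadata": {"hermes": {"tags": ["debugging"], "related_skills": []}},
}
print(checklist_failures(good))  # []
```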
@@ -0,0 +1,318 @@
---
name: node-inspect-debugger
description: Use when debugging Node.js code (ui-tui, tui_gateway child processes, any Node script/test) with real breakpoints, stepping, scope inspection, and expression evaluation. Drives `node --inspect` via the Chrome DevTools Protocol from the terminal — no browser required.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
hermes:
tags: [debugging, nodejs, node-inspect, cdp, breakpoints, ui-tui]
related_skills: [systematic-debugging, python-debugpy, debugging-hermes-tui-commands]
---
# Node.js Inspect Debugger
## Overview
When `console.log` isn't enough, drive Node's built-in V8 inspector programmatically from the terminal. You get real breakpoints, step in/over/out, call-stack walking, local/closure scope dumps, and arbitrary expression evaluation in the paused frame.
Two tools, pick one:
- **`node inspect`** — built-in, zero install, CLI REPL. Best for quick poking.
- **`ndb` / CDP via `chrome-remote-interface`** — scriptable from Node/Python; best when you want to automate many breakpoints, collect state across runs, or debug non-interactively from an agent loop.
**Prefer `node inspect` first.** It's always available and the REPL is fast.
## When to Use
- A Node test fails and you need to see intermediate state
- ui-tui crashes or behaves wrong and you want to inspect React/Ink state pre-render
- tui_gateway child processes (`_SlashWorker`, PTY bridge workers) misbehave
- You need to inspect a value in a closure that `console.log` can't reach without patching
- Perf: attach to a running process to capture a CPU profile or heap snapshot
**Don't use for:** things `console.log` solves in under a minute. Breakpoint-driven debugging is heavier; use it when the payoff is real.
## Quick Reference: `node inspect` REPL
Launch paused on first line:
```bash
node inspect path/to/script.js
# or with tsx
node --inspect-brk $(which tsx) path/to/script.ts
```
The `debug>` prompt accepts:
| Command | Action |
|---|---|
| `c` or `cont` | continue |
| `n` or `next` | step over |
| `s` or `step` | step into |
| `o` or `out` | step out |
| `pause` | pause running code |
| `sb('file.js', 42)` | set breakpoint at file.js line 42 |
| `sb(42)` | set breakpoint at line 42 of current file |
| `sb('functionName')` | break when function is called |
| `cb('file.js', 42)` | clear breakpoint |
| `breakpoints` | list all breakpoints |
| `bt` | backtrace (call stack) |
| `list(5)` | show 5 lines of source around current position |
| `watch('expr')` | evaluate expr on every pause |
| `watchers` | show watched expressions |
| `repl` | drop into REPL in current scope (Ctrl+C to exit REPL) |
| `exec expr` | evaluate expression once |
| `restart` | restart script |
| `kill` | kill the script |
| `.exit` | quit debugger |
**In the `repl` sub-mode:** type any JS expression, including access to locals/closure variables. `Ctrl+C` exits back to `debug>`.
## Attaching to a Running Process
When the process is already running (e.g. a long-lived dev server or the TUI gateway):
```bash
# 1. Send SIGUSR1 to enable the inspector on an existing process
kill -SIGUSR1 <pid>
# Node prints: Debugger listening on ws://127.0.0.1:9229/<uuid>
# 2. Attach the debugger CLI
node inspect -p <pid>
# or by URL
node inspect ws://127.0.0.1:9229/<uuid>
```
To start a process with the inspector from the beginning:
```bash
node --inspect script.js # listen on 127.0.0.1:9229, keep running
node --inspect-brk script.js # listen AND pause on first line
node --inspect=0.0.0.0:9230 script.js # custom host:port
```
For TypeScript via tsx:
```bash
node --inspect-brk --import tsx script.ts
# or older tsx
node --inspect-brk -r tsx/cjs script.ts
```
## Programmatic CDP (scripting from terminal)
When you want to automate — set many breakpoints, capture scope state, script a repro — use `chrome-remote-interface`:
```bash
npm i -g chrome-remote-interface # or project-local
# Start your target:
node --inspect-brk=9229 target.js &
```
Driver script (save as `/tmp/cdp-debug.js`):
```javascript
const CDP = require('chrome-remote-interface');
(async () => {
const client = await CDP({ port: 9229 });
const { Debugger, Runtime } = client;
Debugger.paused(async ({ callFrames, reason }) => {
const top = callFrames[0];
console.log(`PAUSED: ${reason} @ ${top.url}:${top.location.lineNumber + 1}`);
// Walk scopes for locals
for (const scope of top.scopeChain) {
if (scope.type === 'local' || scope.type === 'closure') {
const { result } = await Runtime.getProperties({
objectId: scope.object.objectId,
ownProperties: true,
});
for (const p of result) {
console.log(` ${scope.type}.${p.name} =`, p.value?.value ?? p.value?.description);
}
}
}
// Evaluate an expression in the paused frame
const { result } = await Debugger.evaluateOnCallFrame({
callFrameId: top.callFrameId,
expression: 'typeof state !== "undefined" ? JSON.stringify(state) : "n/a"',
});
console.log('state =', result.value ?? result.description);
await Debugger.resume();
});
await Runtime.enable();
await Debugger.enable();
// Set a breakpoint by URL regex + line
await Debugger.setBreakpointByUrl({
urlRegex: '.*app\\.tsx$',
lineNumber: 119, // 0-indexed
columnNumber: 0,
});
await Runtime.runIfWaitingForDebugger();
})();
```
Run it:
```bash
node /tmp/cdp-debug.js
```
Hermes-specific note: `chrome-remote-interface` is NOT in `ui-tui/package.json`. Install it to a throwaway location if you don't want to dirty the project:
```bash
mkdir -p /tmp/cdp-tools && cd /tmp/cdp-tools && npm i chrome-remote-interface
NODE_PATH=/tmp/cdp-tools/node_modules node /tmp/cdp-debug.js
```
## Debugging Hermes ui-tui
The TUI is built with Ink + tsx. Two common scenarios:
### Debugging a single Ink component under dev
`ui-tui/package.json` has `npm run dev` (tsx --watch). To get `--inspect-brk`, build once and run the built entry point under Node directly:
```bash
cd /home/bb/hermes-agent/ui-tui
npm run build # produce dist/ once so transpile isn't needed on first load
node --inspect-brk dist/entry.js
# In another terminal:
node inspect -p <node pid>
```
Then inside `debug>`:
```
sb('dist/app.js', 220) # or wherever the suspect render is
cont
```
When it pauses, `repl` → inspect `props`, state refs, `useInput` handler values, etc.
### Debugging a running `hermes --tui`
The TUI spawns Node from the Python CLI. Easiest path:
```bash
# 1. Launch TUI
hermes --tui &
TUI_PID=$(pgrep -f 'ui-tui/dist/entry' | head -1)
# 2. Enable inspector on that Node PID
kill -SIGUSR1 "$TUI_PID"
# 3. Find the WS URL
curl -s http://127.0.0.1:9229/json/list | jq -r '.[0].webSocketDebuggerUrl'
# 4. Attach
node inspect ws://127.0.0.1:9229/<uuid>
```
Interacting with the TUI (typing in its window) continues to advance execution; your debugger can pause it on a breakpoint at any `sb(...)`.
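Step 3 can also be done without `jq`. The sketch below reads the inspector's `/json/list` endpoint with the Python standard library and picks the WebSocket URL of the matching target; `list_targets` and `pick_debugger_url` are hypothetical helper names, and the sample payload uses made-up values in the shape Node returns.

```python
import json
from typing import Optional
from urllib.request import urlopen


def list_targets(host: str = "127.0.0.1", port: int = 9229) -> list[dict]:
    """Fetch the inspectable targets served by one inspector port."""
    with urlopen(f"http://{host}:{port}/json/list", timeout=2) as resp:
        return json.load(resp)


def pick_debugger_url(targets: list[dict], needle: str = "") -> Optional[str]:
    """Return the WebSocket URL of the first target whose url or title contains `needle`."""
    for target in targets:
        if needle in target.get("url", "") or needle in target.get("title", ""):
            return target.get("webSocketDebuggerUrl")
    return None


# Illustrative payload; real ids and paths will differ.
sample = [{
    "title": "ui-tui/dist/entry.js",
    "url": "file:///path/to/ui-tui/dist/entry.js",
    "webSocketDebuggerUrl": "ws://127.0.0.1:9229/0f2c936f-b1cd-4ac9-aab3-f63b0f33d55e",
}]
print(pick_debugger_url(sample, "dist/entry"))
```

Against a live process, `pick_debugger_url(list_targets(), "dist/entry")` would yield the URL to hand to `node inspect`.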
### Debugging `_SlashWorker` / PTY child processes
Those are Python, not Node — use the `python-debugpy` skill for them. Only Node portions (Ink UI, tui_gateway client, tsx-run tests under `ui-tui/`) use this skill.
## Running Vitest Tests Under the Debugger
```bash
cd /home/bb/hermes-agent/ui-tui
# Run a single test file paused on entry
node --inspect-brk ./node_modules/vitest/vitest.mjs run --no-file-parallelism src/app/foo.test.tsx
```
In another terminal: `node inspect -p <pid>`, then `sb('src/app/foo.tsx', 42)`, `cont`.
Use `--no-file-parallelism` (vitest) or `--runInBand` (jest) so only one worker exists — debugging a pool is painful.
## Heap Snapshots & CPU Profiles (Non-interactive)
From the CDP driver above, swap Debugger for `HeapProfiler` / `Profiler`:
```javascript
// CPU profile for 5 seconds
await client.Profiler.enable();
await client.Profiler.start();
await new Promise(r => setTimeout(r, 5000));
const { profile } = await client.Profiler.stop();
require('fs').writeFileSync('/tmp/cpu.cpuprofile', JSON.stringify(profile));
// Open /tmp/cpu.cpuprofile in Chrome DevTools → Performance tab
```
```javascript
// Heap snapshot
await client.HeapProfiler.enable();
const chunks = [];
client.HeapProfiler.addHeapSnapshotChunk(({ chunk }) => chunks.push(chunk));
await client.HeapProfiler.takeHeapSnapshot({ reportProgress: false });
require('fs').writeFileSync('/tmp/heap.heapsnapshot', chunks.join(''));
```
## Common Pitfalls
1. **Wrong line numbers in TS source.** Breakpoints hit the emitted JS, not the `.ts`. Either (a) break in the built `dist/*.js`, or (b) enable sourcemaps (`node --enable-source-maps`) and use `sb('src/app.tsx', N)` — but only with CDP clients that follow sourcemaps. `node inspect` CLI does not.
2. **`--inspect` vs `--inspect-brk`.** `--inspect` starts the inspector but doesn't pause; your script races past your first breakpoint if you attach too late. Use `--inspect-brk` when you need to set breakpoints before any code runs.
3. **Port collisions.** Default is `9229`. If multiple Node processes are inspecting, pass `--inspect=0` (random port), read the chosen port from the `Debugger listening on ws://...` line Node prints, and query that port's `/json/list`:
```bash
curl -s http://127.0.0.1:9229/json/list # lists targets on this port only; substitute the port Node printed
```
4. **Child processes.** `--inspect` on a parent does NOT inspect its children. Use `NODE_OPTIONS='--inspect-brk' node parent.js` to propagate to every child; be aware they all need unique ports (Node auto-increments when `NODE_OPTIONS='--inspect'` is inherited).
5. **Background kills.** If you `Ctrl+C` out of `node inspect` while the target is paused, the target stays paused. Either `cont` first, or `kill` the target explicitly.
6. **Running `node inspect` through an agent terminal.** It's a PTY-friendly REPL. In Hermes, launch it with `terminal(pty=true)` or `background=true` + `process(action='submit', data='...')`. Non-PTY foreground mode will work for one-shot commands but not for interactive stepping.
7. **Security.** `--inspect=0.0.0.0:9229` exposes arbitrary code execution. Always bind to `127.0.0.1` (the default) unless you have an isolated network.
## Verification Checklist
After setting up a debug session, verify:
- [ ] `curl -s http://127.0.0.1:9229/json/list` returns exactly the target you expect
- [ ] First breakpoint actually hits (if it doesn't, you likely missed `--inspect-brk` or attached after execution completed)
- [ ] Source listing at pause shows the right file (mismatch = sourcemap issue, see pitfall 1)
- [ ] `exec process.pid` at the `debug>` prompt (or plain `process.pid` inside `repl`) returns the PID you meant to attach to
## One-Shot Recipes
**"Why is this variable undefined at line X?"**
```bash
node --inspect-brk script.js &
node inspect -p $!
# debug>
sb('script.js', X)
cont
# paused. Now:
repl
> myVariable
> Object.keys(this)
```
**"What's the call path into this function?"**
```
debug> sb('suspectFn')
debug> cont
# paused on entry
debug> bt
```
**"This async chain hangs — where?"**
```
# Start with --inspect (no -brk), let it run to the hang, then:
debug> pause
debug> bt
# Now you see the stuck frame
```
@@ -0,0 +1,374 @@
---
name: python-debugpy
description: Use when debugging Python code (run_agent.py, cli.py, tui_gateway, tests, scripts) with real breakpoints, stepping, scope inspection, and post-mortem analysis. Covers `pdb` for interactive REPL debugging and `debugpy` for remote/headless DAP-driven sessions.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
hermes:
tags: [debugging, python, pdb, debugpy, breakpoints, dap, post-mortem]
related_skills: [systematic-debugging, node-inspect-debugger, debugging-hermes-tui-commands]
---
# Python Debugger (pdb + debugpy)
## Overview
Three tools, picked by situation:
| Tool | When |
|---|---|
| **`breakpoint()` + pdb** | Local, interactive, simplest. Add `breakpoint()` in the source, run normally, get a REPL at that line. |
| **`python -m pdb`** | Launch an existing script under pdb with no source edits. Useful for quick poking. |
| **`debugpy`** | Remote / headless / "attach to already-running process." Talks DAP, scriptable from terminal, works for long-lived processes (gateway, daemon, PTY children). |
**Start with `breakpoint()`.** It's the cheapest thing that works.
## When to Use
- A test fails and the traceback doesn't reveal why a value is wrong
- You need to step through a function and watch a collection mutate
- A long-running process (hermes gateway, tui_gateway) misbehaves and you can't restart it
- Post-mortem: an exception fired in prod-ish code and you want to inspect locals at the crash site
- A subprocess / child (Python `_SlashWorker`, PTY bridge worker) is the actual bug site
**Don't use for:** things `print()` / `logging.debug` solve in under a minute, or things `pytest -vv --tb=long --showlocals` already reveals.
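For the post-mortem case, the crash-site locals are reachable from the exception's traceback even without an interactive prompt. The sketch below walks to the innermost frame and copies its locals; `crash_site_locals` and `divide` are hypothetical names used only for this illustration.

```python
def crash_site_locals(exc: BaseException) -> dict:
    """Return function name, line number, and locals of the frame that raised."""
    tb = exc.__traceback__
    if tb is None:
        return {}
    while tb.tb_next is not None:  # walk down to the innermost frame
        tb = tb.tb_next
    frame = tb.tb_frame
    return {
        "function": frame.f_code.co_name,
        "line": tb.tb_lineno,
        "locals": dict(frame.f_locals),
    }


def divide(total, count):
    ratio = total / count  # raises before `ratio` is ever bound
    return ratio


try:
    divide(10, 0)
except ZeroDivisionError as exc:
    info = crash_site_locals(exc)

print(info["function"], info["locals"])  # divide {'total': 10, 'count': 0}
```

The interactive equivalent is `pdb.post_mortem(exc.__traceback__)`, which opens a `(Pdb)` prompt positioned at that same innermost frame.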
## pdb Quick Reference
Inside any pdb prompt (`(Pdb)`):
| Command | Action |
|---|---|
| `h` / `h cmd` | help |
| `n` | next line (step over) |
| `s` | step into |
| `r` | return from current function |
| `c` | continue |
| `unt N` | continue until line N |
| `j N` | jump to line N (same function only) |
| `l` / `ll` | list source around current line / full function |
| `w` | where (stack trace) |
| `u` / `d` | move up / down in the stack |
| `a` | print args of the current function |
| `p expr` / `pp expr` | print / pretty-print expression |
| `display expr` | auto-print expr on every stop |
| `b file:line` | set breakpoint |
| `b func` | break on function entry |
| `b file:line, cond` | conditional breakpoint |
| `cl N` | clear breakpoint N |
| `tbreak file:line` | one-shot breakpoint |
| `!stmt` | execute arbitrary Python (assignments included) |
| `interact` | drop into full Python REPL in current scope (Ctrl+D to exit) |
| `q` | quit |
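The `interact` command is the most powerful: you can import anything, inspect complex objects, even call methods that mutate state. It runs in its own namespace, so rebinding a local inside `interact` does not change the debugged frame (mutations to existing objects are still visible). To rebind a local, use `!x = 42` from the `(Pdb)` prompt.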
## Recipe 1: Local breakpoint
Easiest. Edit the file:
```python
def compute(x, y):
    result = some_helper(x)
    breakpoint()  # <-- drops into pdb here
    return result + y
```
Run the code normally. You land at the `breakpoint()` line with full access to locals.
**Don't forget to remove `breakpoint()` before committing.** Use `git diff` or a pre-commit grep:
```bash
rg -n 'breakpoint\(\)' --type py
```
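The grep above also matches `breakpoint()` inside strings and comments. A minimal sketch of a stricter check using the standard `ast` module follows; the helper name `find_breakpoint_calls` is hypothetical and not part of Hermes.

```python
import ast


def find_breakpoint_calls(source: str) -> list[int]:
    """Return the line numbers of bare `breakpoint()` calls in `source`."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        # Match only direct calls to the builtin name, e.g. `breakpoint()`.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "breakpoint"
        ):
            lines.append(node.lineno)
    return sorted(lines)
```

Feeding it each staged file's text from a pre-commit hook would flag real calls only; mentions in docstrings or comments are ignored because they never become `ast.Call` nodes.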
## Recipe 2: Launch a script under pdb (no source edits)
```bash
python -m pdb path/to/script.py arg1 arg2
# Lands at first line of script
(Pdb) b path/to/script.py:42
(Pdb) c
```
## Recipe 3: Debug a pytest test
The hermes test runner and pytest both support this:
```bash
# Drop to pdb on failure (or on any raised exception):
scripts/run_tests.sh tests/path/to/test_file.py::test_name --pdb
# Drop to pdb at the START of the test:
scripts/run_tests.sh tests/path/to/test_file.py::test_name --trace
# Show locals in tracebacks without pdb:
scripts/run_tests.sh tests/path/to/test_file.py --showlocals --tb=long
```
Note: `scripts/run_tests.sh` uses xdist (`-n 4`) by default, and pdb does NOT work under xdist. Add `-p no:xdist` or run a single test with `-n 0`:
```bash
scripts/run_tests.sh tests/foo_test.py::test_bar --pdb -p no:xdist
# or
source .venv/bin/activate
python -m pytest tests/foo_test.py::test_bar --pdb
```
This bypasses the hermetic-env guarantees — fine for debugging, but re-run under the wrapper to confirm before pushing.
## Recipe 4: Post-mortem on any exception
```python
import pdb, sys
try:
    run_the_thing()
except Exception:
    pdb.post_mortem(sys.exc_info()[2])
```
Or wrap a whole script:
```bash
python -m pdb -c continue script.py
# When it crashes, pdb catches it and you're in the frame of the exception
```
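Or set a global hook in a plain Python REPL (IPython and Jupyter handle exceptions themselves and do not call `sys.excepthook`; use `%pdb on` or `%debug` there):
```python
import sys
def excepthook(etype, value, tb):
    import pdb; pdb.post_mortem(tb)
sys.excepthook = excepthook
```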
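When only one call is suspect, the `try`/`except` pattern can be wrapped in a small helper. This is a sketch under assumptions: `call_with_post_mortem` is a hypothetical name, and the `debugger` parameter exists only so the behavior can be exercised without an interactive prompt.

```python
import pdb
import sys


def call_with_post_mortem(fn, *args, debugger=pdb.post_mortem, **kwargs):
    """Call fn; on an exception, hand the traceback to `debugger`, then re-raise."""
    try:
        return fn(*args, **kwargs)
    except Exception:
        # sys.exc_info()[2] is the traceback of the exception being handled.
        debugger(sys.exc_info()[2])
        raise
```

Re-raising after the debugger returns keeps the caller's error handling unchanged, so the helper can be removed later without altering control flow.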
## Recipe 5: Remote debug with debugpy (attach to running process)
For long-lived processes: Hermes gateway, tui_gateway, a daemon, a process that's already misbehaving and can't be restarted clean.
### Setup
```bash
source /home/bb/hermes-agent/.venv/bin/activate
pip install debugpy
```
### Pattern A: Source-edit — process waits for debugger at launch
Add near the top of the entry point (or inside the function you want to debug):
```python
import debugpy
debugpy.listen(("127.0.0.1", 5678))
print("debugpy listening on 5678, waiting for client...", flush=True)
debugpy.wait_for_client()
debugpy.breakpoint() # optional: pause immediately once attached
```
Start the process; it blocks on `wait_for_client()`.
### Pattern B: No source edit — launch with `-m debugpy`
```bash
python -m debugpy --listen 127.0.0.1:5678 --wait-for-client your_script.py arg1
```
Equivalent for module entry:
```bash
python -m debugpy --listen 127.0.0.1:5678 --wait-for-client -m your.module
```
### Pattern C: Attach to an already-running process
Needs the PID and debugpy preinstalled in the target's environment:
```bash
python -m debugpy --listen 127.0.0.1:5678 --pid <pid>
# debugpy injects itself into the process. Then attach a client as below.
```
Some kernels/security configs block the ptrace-based injection (`/proc/sys/kernel/yama/ptrace_scope`). Fix with:
```bash
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
```
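Before lowering the setting, it can help to read what it currently is. A minimal sketch, assuming a Linux host with the Yama module; the helper name `describe_ptrace_scope` is hypothetical, and the value meanings follow the kernel's Yama documentation.

```python
from pathlib import Path

# Meanings as documented for the Linux Yama security module.
_SCOPE_MEANINGS = {
    0: "classic: any same-user process may attach",
    1: "restricted: only a parent (or a declared ptracer) may attach",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "disabled: no attaching at all",
}


def describe_ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Return (value, meaning), or (None, reason) when the file is unreadable."""
    try:
        value = int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None, "Yama ptrace_scope not available on this system"
    return value, _SCOPE_MEANINGS.get(value, "unknown value")
```

A result of `0` means PID attach should not be blocked by Yama; any other value suggests launching under `debugpy` from the start instead.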
### Connecting a client from the terminal
The easiest terminal-side DAP client is VS Code CLI or a small script. From inside Hermes you have three practical options:
**Option 1: a tiny hand-rolled DAP client script.** `debugpy` ships no terminal REPL of its own, so you speak DAP over the socket yourself:
```python
# /tmp/dap_client.py
# Usage: python /tmp/dap_client.py <source-path> <line>
import socket, json, itertools, sys

HOST, PORT = "127.0.0.1", 5678
s = socket.create_connection((HOST, PORT))
seq = itertools.count(1)

def send(msg):
    # DAP framing: Content-Length header, blank line, JSON body.
    msg["seq"] = next(seq)
    body = json.dumps(msg).encode()
    s.sendall(f"Content-Length: {len(body)}\r\n\r\n".encode() + body)

def recv():
    # Read the next message; it may be an event rather than a response.
    header = b""
    while b"\r\n\r\n" not in header:
        chunk = s.recv(1)
        if not chunk:
            raise ConnectionError("debug adapter closed the connection")
        header += chunk
    length = int(header.decode().split("Content-Length:")[1].split("\r\n")[0].strip())
    body = b""
    while len(body) < length:
        body += s.recv(length - len(body))
    return json.loads(body)

send({"type": "request", "command": "initialize", "arguments": {"adapterID": "python"}})
print(recv())
send({"type": "request", "command": "attach", "arguments": {}})
print(recv())
send({"type": "request", "command": "setBreakpoints",
      "arguments": {"source": {"path": sys.argv[1]},
                    "breakpoints": [{"line": int(sys.argv[2])}]}})
print(recv())
send({"type": "request", "command": "configurationDone"})
# ... loop reading events and sending continue/stepIn/etc.
```
This is fine for one-off automation but painful as an interactive UX.
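The wire framing that any such client relies on can be isolated and tested without a socket. The sketch below shows only the `Content-Length` framing from the DAP base protocol; `encode_dap` and `decode_dap` are hypothetical helper names, not debugpy APIs.

```python
import json


def encode_dap(message: dict) -> bytes:
    """Frame one DAP message: Content-Length header, blank line, JSON body."""
    body = json.dumps(message).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body


def decode_dap(buffer: bytes):
    """Split the first complete message off `buffer`.

    Returns (message, rest), or (None, buffer) if more bytes are needed.
    """
    header, sep, rest = buffer.partition(b"\r\n\r\n")
    if not sep:
        return None, buffer  # header not complete yet
    length = None
    for line in header.decode("ascii").split("\r\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    if len(rest) < length:
        return None, buffer  # body not complete yet
    return json.loads(rest[:length]), rest[length:]
```

Working on a byte buffer rather than calling `recv(1)` in a loop lets the caller read large chunks and decode as many messages as have arrived.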
**Option 2: Attach from VS Code / Cursor / Zed** — if the user has one open, they can add a `launch.json`:
```json
{
"name": "Attach to Hermes",
"type": "debugpy",
"request": "attach",
"connect": { "host": "127.0.0.1", "port": 5678 },
"justMyCode": false,
"pathMappings": [
{ "localRoot": "${workspaceFolder}", "remoteRoot": "/home/bb/hermes-agent" }
]
}
```
**Option 3: Ditch DAP, use `remote-pdb`** — usually what you actually want from a terminal agent:
```bash
pip install remote-pdb
```
In your code:
```python
from remote_pdb import set_trace
set_trace(host="127.0.0.1", port=4444) # blocks until connection
```
Then from the terminal:
```bash
nc 127.0.0.1 4444
# You get a (Pdb) prompt exactly as if debugging locally.
```
`remote-pdb` is the cleanest agent-friendly choice when `debugpy`'s DAP protocol is overkill. Use `debugpy` only when you actually need IDE integration.
## Debugging Hermes-specific Processes
### Tests
See Recipe 3. Always add `-p no:xdist` or run single tests without xdist.
### `run_agent.py` / CLI — one-shot
Easiest: add `breakpoint()` near the suspect line, then run `hermes` normally. Control returns to your terminal at the pause point.
### `tui_gateway` subprocess (spawned by `hermes --tui`)
The gateway runs as a child of the Node TUI. Options:
**A. Source-edit the gateway:**
```python
# tui_gateway/server.py near the top of serve()
import debugpy
debugpy.listen(("127.0.0.1", 5678))
debugpy.wait_for_client()
```
Start `hermes --tui`. The TUI will appear frozen (its backend is waiting). Attach a client; execution resumes when you `continue`.
**B. Use `remote-pdb` at a specific handler:**
```python
from remote_pdb import set_trace
set_trace(host="127.0.0.1", port=4444) # in the RPC handler you want to trap
```
Trigger the matching slash command from the TUI, then `nc 127.0.0.1 4444` in another terminal.
### `_SlashWorker` subprocess
Same pattern — `remote-pdb` with `set_trace()` inside the worker's `exec` path. The worker is persistent across slash commands, so the first trigger blocks until you connect; subsequent slash commands pass through normally unless you re-arm.
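If you want the "first trigger blocks, later ones pass through" behavior to be explicit rather than incidental, a small guard can wrap the trap. This is a sketch with hypothetical names (`OneShotTrap`, `hit`, `rearm`); the `trap` callable stands in for whatever calls `set_trace(...)`.

```python
import threading


class OneShotTrap:
    """Call `trap` on the first hit only; later hits pass through until re-armed."""

    def __init__(self, trap):
        self._trap = trap
        self._armed = True
        self._lock = threading.Lock()

    def hit(self):
        # Atomically read and clear the armed flag so only one caller fires.
        with self._lock:
            fire, self._armed = self._armed, False
        if fire:
            self._trap()  # e.g. a function that calls remote_pdb.set_trace(...)
        return fire

    def rearm(self):
        with self._lock:
            self._armed = True
```

Placing `trap.hit()` in the worker's hot path means an unattended session blocks at most once, which matters for a worker that persists across slash commands.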
### Gateway (`gateway/run.py`)
Long-lived. Use `remote-pdb` at a handler, or `debugpy` with `--wait-for-client` if you're restarting the gateway anyway.
## Common Pitfalls
1. **pdb under pytest-xdist silently does nothing.** You won't see the prompt, the test just hangs. Always use `-p no:xdist` or `-n 0`.
2. **`breakpoint()` in CI / non-TTY contexts hangs the process.** Safe locally; never commit it. Add a pre-commit grep as a safety net.
3. **`PYTHONBREAKPOINT=0`** disables all `breakpoint()` calls. Check the env if your breakpoint isn't hitting:
```bash
echo $PYTHONBREAKPOINT
```
4. **`debugpy.listen` blocks only if you also call `wait_for_client()`.** Without it, execution continues and your first breakpoint may fire before the client is attached.
5. **Attach to PID fails on hardened kernels.** `ptrace_scope=1` (Ubuntu default) allows only same-user ptrace of child processes. Workaround: `echo 0 > /proc/sys/kernel/yama/ptrace_scope` (needs root) or launch under `debugpy` from the start.
6. **Threads.** `pdb` only debugs the current thread. For multithreaded code, use `debugpy` (thread-aware DAP), or call `threading.settrace()` before the threads start so each new thread gets a trace function.
7. **asyncio.** `pdb` works inside coroutines, but a plain `(Pdb)` prompt cannot `await`. Awaiting at the prompt needs `pdb.set_trace_async()`, added in Python 3.14. On older versions, schedule the coroutine with `!asyncio.ensure_future(...)` (it runs once you `continue`), or use `asyncio.run_coroutine_threadsafe` from another thread.
8. **`scripts/run_tests.sh` strips credentials and sets `HOME=<tmpdir>`.** If your bug depends on user config or real API keys, it won't reproduce under the wrapper. Debug with raw `pytest` first to repro, then re-confirm under the wrapper.
9. **Forking / multiprocessing.** pdb does not follow forks. Each child needs its own `breakpoint()` or `set_trace()`. For Hermes subagents, debug one process at a time.
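Pitfall 3 can be checked programmatically. A minimal sketch, assuming the default `sys.breakpointhook` is still installed (PEP 553 semantics); `describe_breakpoint_env` is a hypothetical helper name.

```python
import os


def describe_breakpoint_env(environ=None):
    """Explain what `breakpoint()` will do given PYTHONBREAKPOINT (PEP 553)."""
    environ = os.environ if environ is None else environ
    value = environ.get("PYTHONBREAKPOINT")
    if value is None or value == "":
        return "default: breakpoint() calls pdb.set_trace()"
    if value == "0":
        return "disabled: breakpoint() returns immediately"
    # Any other value is treated as a dotted path to a callable.
    return f"custom hook: breakpoint() calls {value}"
```

If some library has replaced `sys.breakpointhook`, the environment variable is no longer consulted, so treat the result as a hint rather than proof.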
## Verification Checklist
- [ ] After `pip install debugpy`, confirm: `python -c "import debugpy; print(debugpy.__version__)"`
- [ ] For remote debug, confirm the port is actually listening: `ss -tlnp | grep 5678`
- [ ] First breakpoint actually hits (if it doesn't, you likely have `PYTHONBREAKPOINT=0`, you're under xdist, or execution finished before attach)
- [ ] `where` / `w` shows the expected call stack
- [ ] Post-debug cleanup: no stray `breakpoint()` / `set_trace()` in committed code
```bash
rg -n 'breakpoint\(\)|set_trace\(|debugpy\.listen' --type py
```
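Where `ss` is unavailable, the port check from the checklist can be approximated from Python. This is a sketch only: `port_is_listening` is a hypothetical helper, and it works by opening a real TCP connection.

```python
import socket


def port_is_listening(host="127.0.0.1", port=5678, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Because this connects and immediately disconnects, a waiting `remote-pdb` session will likely treat it as a client and resume the process. Prefer `ss -tlnp` whenever the listener is a live debug session.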
## One-Shot Recipes
**"Why is this dict missing a key?"**
```python
# add above the KeyError site
breakpoint()
# then in pdb:
(Pdb) pp d
(Pdb) pp list(d.keys())
(Pdb) w # how did we get here
```
**"This test passes in isolation but fails in the suite."**
```bash
scripts/run_tests.sh tests/the_test.py --pdb -p no:xdist
# But if it only fails WITH other tests:
source .venv/bin/activate
python -m pytest tests/ -x --pdb -p no:xdist
# Now it pdb-traps at the exact failing test after state accumulated.
```
**"My async handler deadlocks."**
```python
# Add at handler entry
import remote_pdb; remote_pdb.set_trace(host="127.0.0.1", port=4444)
```
Trigger the handler. `nc 127.0.0.1 4444`, then `w` to see the suspended frame, `!import asyncio; asyncio.all_tasks()` to see what else is pending.
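A raw `asyncio.all_tasks()` dump is hard to read at the prompt. The sketch below condenses it to one location per task; `pending_task_locations` is a hypothetical helper, and it must be called from the thread whose event loop is running.

```python
import asyncio


def pending_task_locations():
    """Map task name -> 'file:line' of each unfinished task's coroutine frame."""
    locations = {}
    for task in asyncio.all_tasks():
        # get_stack(limit=1) returns at most one frame for the task's coroutine.
        frames = task.get_stack(limit=1)
        if frames:
            frame = frames[-1]
            locations[task.get_name()] = f"{frame.f_code.co_filename}:{frame.f_lineno}"
        else:
            locations[task.get_name()] = "<no frame>"
    return locations
```

Two tasks parked on each other's locks or queues usually show up as a pair of entries whose line numbers never change between stops.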
**"Post-mortem on a crash in an Ink child process / subprocess."**
```bash
PYTHONFAULTHANDLER=1 python -m pdb -c continue path/to/entrypoint.py
# On crash, pdb lands at the frame of the exception with full locals
```
+107
@@ -0,0 +1,107 @@
---
name: yuanbao
description: Yuanbao (元宝) group interaction — @mention users, query group info and members
version: 1.0.0
metadata:
  hermes:
    tags: [yuanbao, mention, at, group, members, 元宝, 派, 艾特]
    related_skills: []
---
# Yuanbao Group Interaction
## CRITICAL: How Messaging Works
**Your text reply IS the message sent to the group/user.** The gateway automatically delivers your response text to the chat. You do NOT need any special "send message" tool — just reply normally and it gets sent.
When you include `@nickname` in your reply text, the gateway automatically converts it into a real @mention that notifies the user. This is built-in — you have full @mention capability.
**NEVER say you cannot send messages or @mention users. NEVER suggest the user do it manually. NEVER add disclaimers about permissions. Just reply with the text you want sent.**
## Available Tools
| Tool | When to use |
|------|------------|
| `yb_query_group_info` | Query group name, owner, member count |
| `yb_query_group_members` | Find a user, list bots, list all members, or get nickname for @mention |
| `yb_send_dm` | Send a private/direct message (DM / 私信) to a user, with optional media files |
## @Mention Workflow
When you need to @mention / 艾特 someone:
1. Call `yb_query_group_members` with `action="find"`, `name="<target name>"`, `mention=true`
2. Get the exact nickname from the response
3. Include `@nickname` in your reply text — the gateway handles the rest
Example: user says "帮我艾特元宝" ("@ Yuanbao for me")
Step 1 — tool call:
```json
{ "group_code": "328306697", "action": "find", "name": "元宝", "mention": true }
```
Step 2 — your reply (this gets sent to the group with a working @mention):
```
@元宝 你好,有人找你!
```
**That's it.** No extra explanation needed. Keep it short and natural.
**Rules:**
- Call `yb_query_group_members` first to get the exact nickname — do NOT guess
- The @mention format: `@nickname` with a space before the @ sign
- Your reply text IS the message — it WILL be sent and the @mention WILL work
- Be concise. Do NOT explain how @mention works to the user.
## Send DM (Private Message) Workflow
When someone asks to send a private message / 私信 / DM to a user:
1. Call `yb_send_dm` with `group_code`, `name` (target user's name), and `message`
2. The tool automatically finds the user and sends the DM
3. Report the result to the user
Example: user says "给 @用户aea3 私信发一个 hello" ("DM @用户aea3 a hello")
```json
yb_send_dm({ "group_code": "535168412", "name": "用户aea3", "message": "hello" })
```
Example with media: user says "给 @用户aea3 私信发一张图片" ("DM @用户aea3 a picture")
```json
yb_send_dm({
"group_code": "535168412",
"name": "用户aea3",
"message": "Here is the image",
"media_files": [{"path": "/tmp/photo.jpg"}]
})
```
**Rules:**
- Extract `group_code` from the current chat_id (e.g. `group:535168412` → `535168412`)
- If you already know the user_id, pass it directly via the `user_id` parameter to skip lookup
- If multiple users match the name, the tool returns candidates — ask the user to clarify
- Do NOT use `send_message` tool for Yuanbao DMs — use `yb_send_dm` instead
- Supports media: images (.jpg/.png/.gif/.webp/.bmp) sent as image messages, other files as documents
## Query Group Info
```json
yb_query_group_info({ "group_code": "328306697" })
```
## Query Members
| Action | Description |
|--------|-------------|
| `find` | Search by name (partial match, case-insensitive) |
| `list_bots` | List bots and Yuanbao AI assistants |
| `list_all` | List all members |
## Notes
- `group_code` comes from chat_id: `group:328306697` → `328306697`
- Groups are called "派 (Pai)" in the Yuanbao app
- Member roles: `user`, `yuanbao_ai`, `bot`
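The `group_code` rule in the notes amounts to stripping a fixed prefix. A minimal sketch, assuming every group chat_id has the form `group:<digits>` as in the examples; `group_code_from_chat_id` is a hypothetical helper, not one of the Yuanbao tools.

```python
def group_code_from_chat_id(chat_id: str) -> str:
    """Strip the 'group:' prefix from a chat_id such as 'group:328306697'."""
    prefix = "group:"
    if not chat_id.startswith(prefix):
        raise ValueError(f"not a group chat_id: {chat_id!r}")
    return chat_id[len(prefix):]
```

Raising on anything without the prefix avoids passing a DM chat_id to a group-only tool by mistake.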
@@ -386,7 +386,7 @@ class TestProvidersDictApiModeAnthropicMessages:
                 },
             },
             "auxiliary": {
-                "flush_memories": {
+                "compression": {
                     "provider": "myrelay",
                     "model": "claude-sonnet-4.6",
                 },
@@ -399,11 +399,11 @@ class TestProvidersDictApiModeAnthropicMessages:
             AnthropicAuxiliaryClient,
             AsyncAnthropicAuxiliaryClient,
         )

-        async_client, async_model = get_async_text_auxiliary_client("flush_memories")
+        async_client, async_model = get_async_text_auxiliary_client("compression")
         assert isinstance(async_client, AsyncAnthropicAuxiliaryClient)
         assert async_model == "claude-sonnet-4.6"

-        sync_client, sync_model = get_text_auxiliary_client("flush_memories")
+        sync_client, sync_model = get_text_auxiliary_client("compression")
         assert isinstance(sync_client, AnthropicAuxiliaryClient)
         assert sync_model == "claude-sonnet-4.6"
+51 -8
@@ -192,6 +192,43 @@ class TestDefaultContextLengths:
f"{model_id}: expected {expected_ctx}, got {actual}"
)
+    def test_deepseek_v4_models_1m_context(self):
+        from agent.model_metadata import get_model_context_length
+        from unittest.mock import patch as mock_patch
+
+        expected_keys = {
+            "deepseek-v4-pro": 1_000_000,
+            "deepseek-v4-flash": 1_000_000,
+            "deepseek-chat": 1_000_000,
+            "deepseek-reasoner": 1_000_000,
+        }
+        for key, value in expected_keys.items():
+            assert key in DEFAULT_CONTEXT_LENGTHS, f"{key} missing"
+            assert DEFAULT_CONTEXT_LENGTHS[key] == value, (
+                f"{key} should be {value}, got {DEFAULT_CONTEXT_LENGTHS[key]}"
+            )
+
+        # Longest-first substring matching must resolve both the bare V4
+        # ids (native DeepSeek) and the vendor-prefixed forms (OpenRouter
+        # / Nous Portal) to 1M without probing down to the legacy 128K
+        # ``deepseek`` substring fallback.
+        with mock_patch("agent.model_metadata.fetch_model_metadata", return_value={}), \
+             mock_patch("agent.model_metadata.fetch_endpoint_model_metadata", return_value={}), \
+             mock_patch("agent.model_metadata.get_cached_context_length", return_value=None):
+            cases = [
+                ("deepseek-v4-pro", 1_000_000),
+                ("deepseek-v4-flash", 1_000_000),
+                ("deepseek/deepseek-v4-pro", 1_000_000),
+                ("deepseek/deepseek-v4-flash", 1_000_000),
+                ("deepseek-chat", 1_000_000),
+                ("deepseek-reasoner", 1_000_000),
+            ]
+            for model_id, expected_ctx in cases:
+                actual = get_model_context_length(model_id)
+                assert actual == expected_ctx, (
+                    f"{model_id}: expected {expected_ctx}, got {actual}"
+                )
+
     def test_all_values_positive(self):
         for key, value in DEFAULT_CONTEXT_LENGTHS.items():
             assert value > 0, f"{key} has non-positive context length"
@@ -303,7 +340,9 @@ class TestCodexOAuthContextLength:
         from agent.model_metadata import get_model_context_length

         # OpenRouter — should hit its own catalog path first; when mocked
-        # empty, falls through to hardcoded DEFAULT_CONTEXT_LENGTHS (400k).
+        # empty, falls through to hardcoded DEFAULT_CONTEXT_LENGTHS (1.05M,
+        # matching the real direct-API value — Codex OAuth's 272k cap is
+        # provider-specific and must not leak here).
         with patch("agent.model_metadata.fetch_model_metadata", return_value={}), \
              patch("agent.model_metadata.fetch_endpoint_model_metadata", return_value={}), \
              patch("agent.model_metadata.get_cached_context_length", return_value=None), \
@@ -314,7 +353,7 @@ class TestCodexOAuthContextLength:
                 api_key="",
                 provider="openrouter",
             )
-        assert ctx == 400_000, (
+        assert ctx == 1_050_000, (
             f"Non-Codex gpt-5.5 resolved to {ctx}; Codex 272k override "
             "leaked outside openai-codex provider"
         )
@@ -459,9 +498,10 @@ class TestGetModelContextLength:
     @patch("agent.model_metadata.fetch_model_metadata")
     def test_api_missing_context_length_key(self, mock_fetch):
-        """Model in API but without context_length → defaults to 128000."""
+        """Model in API but without context_length → defaults to the top
+        probe tier (currently 256K)."""
         mock_fetch.return_value = {"test/model": {"name": "Test"}}
-        assert get_model_context_length("test/model") == 128000
+        assert get_model_context_length("test/model") == CONTEXT_PROBE_TIERS[0]

     @patch("agent.model_metadata.fetch_model_metadata")
     def test_cache_takes_priority_over_api(self, mock_fetch, tmp_path):
@@ -814,14 +854,17 @@ class TestContextProbeTiers:
         for i in range(len(CONTEXT_PROBE_TIERS) - 1):
             assert CONTEXT_PROBE_TIERS[i] > CONTEXT_PROBE_TIERS[i + 1]

-    def test_first_tier_is_128k(self):
-        assert CONTEXT_PROBE_TIERS[0] == 128_000
+    def test_first_tier_is_256k(self):
+        assert CONTEXT_PROBE_TIERS[0] == 256_000

     def test_last_tier_is_8k(self):
         assert CONTEXT_PROBE_TIERS[-1] == 8_000


 class TestGetNextProbeTier:
+    def test_from_256k(self):
+        assert get_next_probe_tier(256_000) == 128_000
+
     def test_from_128k(self):
         assert get_next_probe_tier(128_000) == 64_000
@@ -841,8 +884,8 @@ class TestGetNextProbeTier:
         assert get_next_probe_tier(100_000) == 64_000

     def test_above_max_tier(self):
-        """Value above 128K should return 128K."""
-        assert get_next_probe_tier(500_000) == 128_000
+        """Value above 256K should return 256K."""
+        assert get_next_probe_tier(500_000) == 256_000

     def test_zero_returns_none(self):
         assert get_next_probe_tier(0) is None
+138
@@ -251,3 +251,141 @@ class TestAuxiliaryClientIntegration:
         monkeypatch.setattr(aux, "_read_nous_auth", lambda: None)
         result = aux._try_nous()
         assert result == (None, None)
+
+
+class TestIsGenuineNousRateLimit:
+    """Tell a real account-level 429 apart from an upstream-capacity 429.
+
+    Nous Portal multiplexes upstreams (DeepSeek, Kimi, MiMo, Hermes).
+    A 429 from an upstream out of capacity should NOT trip the
+    cross-session breaker; a real user-quota 429 should.
+    """
+
+    def test_exhausted_hourly_bucket_in_429_headers_is_genuine(self):
+        from agent.nous_rate_guard import is_genuine_nous_rate_limit
+
+        headers = {
+            "x-ratelimit-limit-requests-1h": "800",
+            "x-ratelimit-remaining-requests-1h": "0",
+            "x-ratelimit-reset-requests-1h": "3100",
+            "x-ratelimit-limit-requests": "200",
+            "x-ratelimit-remaining-requests": "198",
+            "x-ratelimit-reset-requests": "40",
+        }
+        assert is_genuine_nous_rate_limit(headers=headers) is True
+
+    def test_exhausted_tokens_bucket_is_genuine(self):
+        from agent.nous_rate_guard import is_genuine_nous_rate_limit
+
+        headers = {
+            "x-ratelimit-limit-tokens": "800000",
+            "x-ratelimit-remaining-tokens": "0",
+            "x-ratelimit-reset-tokens": "45",  # < 60s threshold -> not genuine
+            "x-ratelimit-limit-tokens-1h": "8000000",
+            "x-ratelimit-remaining-tokens-1h": "0",
+            "x-ratelimit-reset-tokens-1h": "1800",  # >= 60s threshold -> genuine
+        }
+        assert is_genuine_nous_rate_limit(headers=headers) is True
+
+    def test_healthy_headers_on_429_are_upstream_capacity(self):
+        # Classic upstream-capacity symptom: Nous edge reports plenty of
+        # headroom on every bucket, but returns 429 anyway because
+        # upstream (DeepSeek / Kimi / ...) is out of capacity.
+        from agent.nous_rate_guard import is_genuine_nous_rate_limit
+
+        headers = {
+            "x-ratelimit-limit-requests": "200",
+            "x-ratelimit-remaining-requests": "198",
+            "x-ratelimit-reset-requests": "40",
+            "x-ratelimit-limit-requests-1h": "800",
+            "x-ratelimit-remaining-requests-1h": "750",
+            "x-ratelimit-reset-requests-1h": "3100",
+            "x-ratelimit-limit-tokens": "800000",
+            "x-ratelimit-remaining-tokens": "790000",
+            "x-ratelimit-reset-tokens": "40",
+            "x-ratelimit-limit-tokens-1h": "8000000",
+            "x-ratelimit-remaining-tokens-1h": "7800000",
+            "x-ratelimit-reset-tokens-1h": "3100",
+        }
+        assert is_genuine_nous_rate_limit(headers=headers) is False
+
+    def test_bare_429_with_no_headers_is_upstream(self):
+        from agent.nous_rate_guard import is_genuine_nous_rate_limit
+
+        assert is_genuine_nous_rate_limit(headers=None) is False
+        assert is_genuine_nous_rate_limit(headers={}) is False
+        assert is_genuine_nous_rate_limit(
+            headers={"content-type": "application/json"}
+        ) is False
+
+    def test_exhausted_bucket_with_short_reset_is_not_genuine(self):
+        # remaining == 0 but reset in < 60s: almost certainly a
+        # secondary per-minute throttle that will clear immediately --
+        # not worth tripping the cross-session breaker.
+        from agent.nous_rate_guard import is_genuine_nous_rate_limit
+
+        headers = {
+            "x-ratelimit-limit-requests": "200",
+            "x-ratelimit-remaining-requests": "0",
+            "x-ratelimit-reset-requests": "30",
+        }
+        assert is_genuine_nous_rate_limit(headers=headers) is False
+
+    def test_last_known_state_with_exhausted_bucket_triggers_genuine(self):
+        # Headers on the 429 lack rate-limit info, but the previous
+        # successful response already showed the hourly bucket
+        # exhausted -- the 429 is almost certainly that limit
+        # continuing.
+        from agent.nous_rate_guard import is_genuine_nous_rate_limit
+        from agent.rate_limit_tracker import parse_rate_limit_headers
+
+        prior_headers = {
+            "x-ratelimit-limit-requests-1h": "800",
+            "x-ratelimit-remaining-requests-1h": "0",
+            "x-ratelimit-reset-requests-1h": "2000",
+            "x-ratelimit-limit-requests": "200",
+            "x-ratelimit-remaining-requests": "100",
+            "x-ratelimit-reset-requests": "30",
+            "x-ratelimit-limit-tokens": "800000",
+            "x-ratelimit-remaining-tokens": "700000",
+            "x-ratelimit-reset-tokens": "30",
+            "x-ratelimit-limit-tokens-1h": "8000000",
+            "x-ratelimit-remaining-tokens-1h": "7000000",
+            "x-ratelimit-reset-tokens-1h": "2000",
+        }
+        last_state = parse_rate_limit_headers(prior_headers, provider="nous")
+        assert is_genuine_nous_rate_limit(
+            headers=None, last_known_state=last_state
+        ) is True
+
+    def test_last_known_state_all_healthy_stays_upstream(self):
+        # Prior state was healthy; bare 429 arrives; should be treated
+        # as upstream capacity.
+        from agent.nous_rate_guard import is_genuine_nous_rate_limit
+        from agent.rate_limit_tracker import parse_rate_limit_headers
+
+        prior_headers = {
+            "x-ratelimit-limit-requests-1h": "800",
+            "x-ratelimit-remaining-requests-1h": "750",
+            "x-ratelimit-reset-requests-1h": "2000",
+            "x-ratelimit-limit-requests": "200",
+            "x-ratelimit-remaining-requests": "180",
+            "x-ratelimit-reset-requests": "30",
+            "x-ratelimit-limit-tokens": "800000",
+            "x-ratelimit-remaining-tokens": "790000",
+            "x-ratelimit-reset-tokens": "30",
+            "x-ratelimit-limit-tokens-1h": "8000000",
+            "x-ratelimit-remaining-tokens-1h": "7900000",
+            "x-ratelimit-reset-tokens-1h": "2000",
+        }
+        last_state = parse_rate_limit_headers(prior_headers, provider="nous")
+        assert is_genuine_nous_rate_limit(
+            headers=None, last_known_state=last_state
+        ) is False
+
+    def test_none_last_state_and_no_headers_is_upstream(self):
+        from agent.nous_rate_guard import is_genuine_nous_rate_limit
+
+        assert is_genuine_nous_rate_limit(
+            headers=None, last_known_state=None
+        ) is False
+228
@@ -0,0 +1,228 @@
"""Tests for agent/onboarding.py — contextual first-touch hint helpers."""
from __future__ import annotations

import yaml
import pytest

from agent.onboarding import (
    BUSY_INPUT_FLAG,
    OPENCLAW_RESIDUE_FLAG,
    TOOL_PROGRESS_FLAG,
    busy_input_hint_cli,
    busy_input_hint_gateway,
    detect_openclaw_residue,
    is_seen,
    mark_seen,
    openclaw_residue_hint_cli,
    tool_progress_hint_cli,
    tool_progress_hint_gateway,
)


class TestIsSeen:
    def test_empty_config_unseen(self):
        assert is_seen({}, BUSY_INPUT_FLAG) is False

    def test_missing_onboarding_unseen(self):
        assert is_seen({"display": {}}, BUSY_INPUT_FLAG) is False

    def test_onboarding_not_dict_unseen(self):
        assert is_seen({"onboarding": "nope"}, BUSY_INPUT_FLAG) is False

    def test_seen_dict_missing_flag(self):
        assert is_seen({"onboarding": {"seen": {}}}, BUSY_INPUT_FLAG) is False

    def test_seen_flag_true(self):
        cfg = {"onboarding": {"seen": {BUSY_INPUT_FLAG: True}}}
        assert is_seen(cfg, BUSY_INPUT_FLAG) is True

    def test_seen_flag_falsy(self):
        cfg = {"onboarding": {"seen": {BUSY_INPUT_FLAG: False}}}
        assert is_seen(cfg, BUSY_INPUT_FLAG) is False

    def test_other_flags_isolated(self):
        cfg = {"onboarding": {"seen": {BUSY_INPUT_FLAG: True}}}
        assert is_seen(cfg, TOOL_PROGRESS_FLAG) is False


class TestMarkSeen:
    def test_creates_missing_file_and_sets_flag(self, tmp_path):
        cfg_path = tmp_path / "config.yaml"
        assert mark_seen(cfg_path, BUSY_INPUT_FLAG) is True
        loaded = yaml.safe_load(cfg_path.read_text())
        assert loaded["onboarding"]["seen"][BUSY_INPUT_FLAG] is True

    def test_preserves_other_config(self, tmp_path):
        cfg_path = tmp_path / "config.yaml"
        cfg_path.write_text(yaml.safe_dump({
            "model": {"default": "claude-sonnet-4.6"},
            "display": {"skin": "default"},
        }))
        assert mark_seen(cfg_path, BUSY_INPUT_FLAG) is True
        loaded = yaml.safe_load(cfg_path.read_text())
        assert loaded["model"]["default"] == "claude-sonnet-4.6"
        assert loaded["display"]["skin"] == "default"
        assert loaded["onboarding"]["seen"][BUSY_INPUT_FLAG] is True

    def test_preserves_other_seen_flags(self, tmp_path):
        cfg_path = tmp_path / "config.yaml"
        cfg_path.write_text(yaml.safe_dump({
            "onboarding": {"seen": {TOOL_PROGRESS_FLAG: True}},
        }))
        assert mark_seen(cfg_path, BUSY_INPUT_FLAG) is True
        loaded = yaml.safe_load(cfg_path.read_text())
        assert loaded["onboarding"]["seen"][TOOL_PROGRESS_FLAG] is True
        assert loaded["onboarding"]["seen"][BUSY_INPUT_FLAG] is True

    def test_idempotent(self, tmp_path):
        cfg_path = tmp_path / "config.yaml"
        mark_seen(cfg_path, BUSY_INPUT_FLAG)
        first = cfg_path.read_text()
        # Second call must be a no-op on-disk content (file may be touched,
        # but the YAML contents should be identical).
        mark_seen(cfg_path, BUSY_INPUT_FLAG)
        second = cfg_path.read_text()
        assert yaml.safe_load(first) == yaml.safe_load(second)

    def test_handles_non_dict_onboarding(self, tmp_path):
        cfg_path = tmp_path / "config.yaml"
        cfg_path.write_text(yaml.safe_dump({"onboarding": "corrupted"}))
        assert mark_seen(cfg_path, BUSY_INPUT_FLAG) is True
        loaded = yaml.safe_load(cfg_path.read_text())
        assert loaded["onboarding"]["seen"][BUSY_INPUT_FLAG] is True

    def test_handles_non_dict_seen(self, tmp_path):
        cfg_path = tmp_path / "config.yaml"
        cfg_path.write_text(yaml.safe_dump({"onboarding": {"seen": "corrupted"}}))
        assert mark_seen(cfg_path, BUSY_INPUT_FLAG) is True
        loaded = yaml.safe_load(cfg_path.read_text())
        assert loaded["onboarding"]["seen"][BUSY_INPUT_FLAG] is True


class TestHintMessages:
    def test_busy_input_hint_gateway_interrupt(self):
        msg = busy_input_hint_gateway("interrupt")
        assert "/busy queue" in msg
        assert "interrupted" in msg.lower()

    def test_busy_input_hint_gateway_queue(self):
        msg = busy_input_hint_gateway("queue")
        assert "/busy interrupt" in msg
        assert "queued" in msg.lower()

    def test_busy_input_hint_gateway_steer(self):
        msg = busy_input_hint_gateway("steer")
        assert "/busy interrupt" in msg
        assert "/busy queue" in msg
        assert "steer" in msg.lower()

    def test_busy_input_hint_cli_interrupt(self):
        msg = busy_input_hint_cli("interrupt")
        assert "/busy queue" in msg

    def test_busy_input_hint_cli_queue(self):
        msg = busy_input_hint_cli("queue")
        assert "/busy interrupt" in msg

    def test_busy_input_hint_cli_steer(self):
        msg = busy_input_hint_cli("steer")
        assert "/busy interrupt" in msg
        assert "/busy queue" in msg
        assert "steer" in msg.lower()

    def test_tool_progress_hints_mention_verbose(self):
        assert "/verbose" in tool_progress_hint_gateway()
        assert "/verbose" in tool_progress_hint_cli()

    def test_hints_are_not_empty(self):
        for hint in (
            busy_input_hint_gateway("queue"),
            busy_input_hint_gateway("interrupt"),
            busy_input_hint_gateway("steer"),
            busy_input_hint_cli("queue"),
            busy_input_hint_cli("interrupt"),
            busy_input_hint_cli("steer"),
            tool_progress_hint_gateway(),
            tool_progress_hint_cli(),
        ):
            assert hint.strip()


class TestRoundTrip:
    """After mark_seen, is_seen on the re-loaded config must return True."""

    def test_mark_then_is_seen(self, tmp_path):
        cfg_path = tmp_path / "config.yaml"
        assert mark_seen(cfg_path, BUSY_INPUT_FLAG) is True
        loaded = yaml.safe_load(cfg_path.read_text())
        assert is_seen(loaded, BUSY_INPUT_FLAG) is True
        assert is_seen(loaded, TOOL_PROGRESS_FLAG) is False

    def test_mark_both_flags_independently(self, tmp_path):
        cfg_path = tmp_path / "config.yaml"
        mark_seen(cfg_path, BUSY_INPUT_FLAG)
        mark_seen(cfg_path, TOOL_PROGRESS_FLAG)
        loaded = yaml.safe_load(cfg_path.read_text())
        assert is_seen(loaded, BUSY_INPUT_FLAG) is True
        assert is_seen(loaded, TOOL_PROGRESS_FLAG) is True


# ---------------------------------------------------------------------------
# OpenClaw residue banner
# ---------------------------------------------------------------------------


class TestDetectOpenclawResidue:
    def test_returns_true_when_openclaw_dir_present(self, tmp_path):
        (tmp_path / ".openclaw").mkdir()
        assert detect_openclaw_residue(home=tmp_path) is True

    def test_returns_false_when_absent(self, tmp_path):
        assert detect_openclaw_residue(home=tmp_path) is False

    def test_returns_false_when_path_is_a_file(self, tmp_path):
        # A stray file named ``.openclaw`` is NOT a workspace — skip the banner.
        (tmp_path / ".openclaw").write_text("oops")
        assert detect_openclaw_residue(home=tmp_path) is False

    def test_default_home_does_not_crash(self):
        # Smoke: real $HOME lookup must not raise regardless of state.
        assert isinstance(detect_openclaw_residue(), bool)


class TestOpenclawResidueHint:
    def test_hint_mentions_cleanup_command(self):
        msg = openclaw_residue_hint_cli()
        assert "hermes claw cleanup" in msg
        assert "~/.openclaw" in msg

    def test_hint_not_empty(self):
        assert openclaw_residue_hint_cli().strip()


class TestOpenclawResidueSeenFlag:
    def test_flag_independent_of_other_flags(self, tmp_path):
        cfg_path = tmp_path / "config.yaml"
        mark_seen(cfg_path, BUSY_INPUT_FLAG)
        loaded = yaml.safe_load(cfg_path.read_text())
        assert is_seen(loaded, OPENCLAW_RESIDUE_FLAG) is False

    def test_flag_round_trips(self, tmp_path):
        cfg_path = tmp_path / "config.yaml"
        assert mark_seen(cfg_path, OPENCLAW_RESIDUE_FLAG) is True
        loaded = yaml.safe_load(cfg_path.read_text())
        assert is_seen(loaded, OPENCLAW_RESIDUE_FLAG) is True
+71
@@ -240,3 +240,74 @@ class TestAllowlistOps:
and e.get("command") == str(script)
]
assert len(matching) == 1
# ── hooks_auto_accept config parsing ──────────────────────────────────────


class TestHooksAutoAcceptParsing:
    """Regression guard: YAML-string values must not silently auto-accept.

    ``bool("false")`` is ``True`` in Python, so the old ``return bool(cfg_val)``
    path treated ``hooks_auto_accept: "false"`` (quoted YAML string) as a
    truthy opt-in, silently bypassing user consent for every shell hook.
    """

    def test_bool_true_accepts(self):
        assert shell_hooks._resolve_effective_accept(
            {"hooks_auto_accept": True}, accept_hooks_arg=False,
        ) is True

    def test_bool_false_rejects(self):
        assert shell_hooks._resolve_effective_accept(
            {"hooks_auto_accept": False}, accept_hooks_arg=False,
        ) is False

    def test_string_false_rejects(self):
        # The bug: bool("false") is True. Must be parsed, not coerced.
        assert shell_hooks._resolve_effective_accept(
            {"hooks_auto_accept": "false"}, accept_hooks_arg=False,
        ) is False

    def test_string_no_rejects(self):
        assert shell_hooks._resolve_effective_accept(
            {"hooks_auto_accept": "no"}, accept_hooks_arg=False,
        ) is False

    def test_string_true_accepts(self):
        assert shell_hooks._resolve_effective_accept(
            {"hooks_auto_accept": "true"}, accept_hooks_arg=False,
        ) is True

    def test_string_true_case_insensitive(self):
        assert shell_hooks._resolve_effective_accept(
            {"hooks_auto_accept": "  TRUE  "}, accept_hooks_arg=False,
        ) is True

    def test_string_yes_on_one_accept(self):
        for val in ("yes", "on", "1"):
            assert shell_hooks._resolve_effective_accept(
                {"hooks_auto_accept": val}, accept_hooks_arg=False,
            ) is True, val

    def test_missing_key_rejects(self):
        assert shell_hooks._resolve_effective_accept(
            {}, accept_hooks_arg=False,
        ) is False

    def test_none_rejects(self):
        assert shell_hooks._resolve_effective_accept(
            {"hooks_auto_accept": None}, accept_hooks_arg=False,
        ) is False

    def test_integer_ignored(self):
        # Only bool and str are honored; anything else (including 1) is False.
        assert shell_hooks._resolve_effective_accept(
            {"hooks_auto_accept": 1}, accept_hooks_arg=False,
        ) is False

    def test_cli_arg_overrides_config(self):
        assert shell_hooks._resolve_effective_accept(
            {"hooks_auto_accept": "false"}, accept_hooks_arg=True,
        ) is True
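The parsing rules these tests lock in (booleans honored as-is, strings parsed rather than coerced, everything else rejected, CLI flag wins) might look like the sketch below. `resolve_effective_accept` and `_TRUTHY_STRINGS` are hypothetical stand-ins written for illustration; the real `shell_hooks._resolve_effective_accept` is not shown in this diff and may differ.

```python
from typing import Any, Mapping

# Assumption: the accepted truthy spellings are exactly those the tests exercise.
_TRUTHY_STRINGS = frozenset({"true", "yes", "on", "1"})


def resolve_effective_accept(cfg: Mapping[str, Any], accept_hooks_arg: bool) -> bool:
    """Hypothetical stand-in for ``shell_hooks._resolve_effective_accept``."""
    if accept_hooks_arg:
        return True  # an explicit CLI opt-in overrides the config value
    value = cfg.get("hooks_auto_accept")
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        # Parse the text: bool("false") would be True.
        return value.strip().lower() in _TRUTHY_STRINGS
    return False  # None, ints, and any other type never opt in
```

Testing `isinstance(value, bool)` before anything else matters because `bool` is a subclass of `int`; the integer `1` falls through to the final `return False`.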
@@ -0,0 +1,201 @@
"""Regression tests for the generic unsupported-parameter detector in
``agent.auxiliary_client``.

The original temperature-specific detector (PR #15621) was generalized so the
same reactive-retry strategy covers any provider that rejects an arbitrary
request parameter (``max_tokens``, ``seed``, ``top_p``, future quirks), not
just ``temperature``. Credit @nicholasrae (PR #15416) for the generalization
pattern.

These tests lock in:

  * ``_is_unsupported_parameter_error(exc, param)`` across common phrasings
  * the back-compat wrapper ``_is_unsupported_temperature_error`` still works
  * the max_tokens retry branch no longer pops a key that was never set
    (``max_tokens is None`` gate)
  * the max_tokens retry branch matches via the generic helper on top of the
    legacy ``"max_tokens"`` / ``"unsupported_parameter"`` substring checks
"""

from unittest.mock import patch, MagicMock, AsyncMock

import pytest

from agent.auxiliary_client import (
    call_llm,
    async_call_llm,
    _is_unsupported_parameter_error,
    _is_unsupported_temperature_error,
)


class TestIsUnsupportedParameterError:
    """The generic detector must match real provider phrasings for any param."""

    @pytest.mark.parametrize("param,message", [
        # temperature phrasings (regression coverage via the generic API)
        ("temperature", "HTTP 400: Unsupported parameter: temperature"),
        ("temperature", "Error code: 400 - {'error': {'code': 'unsupported_parameter', 'param': 'temperature'}}"),
        ("temperature", "this model does not support temperature"),
        # max_tokens phrasings
        ("max_tokens", "HTTP 400: Unsupported parameter: max_tokens"),
        ("max_tokens", "Unknown parameter: max_tokens — use max_completion_tokens"),
        ("max_tokens", "Invalid parameter: max_tokens is not supported"),
        # arbitrary future params
        ("seed", "HTTP 400: unrecognized parameter: seed"),
        ("top_p", "Error: top_p is not supported for this model"),
    ])
    def test_matches_real_provider_messages(self, param, message):
        assert _is_unsupported_parameter_error(RuntimeError(message), param) is True

    @pytest.mark.parametrize("param,message", [
        # Param not mentioned at all
        ("temperature", "HTTP 400: max_tokens is too large"),
        # Param mentioned but not flagged as unsupported
        ("temperature", "temperature must be between 0 and 2"),
        # Totally unrelated 400
        ("max_tokens", "Rate limit exceeded"),
        # Connection-level errors
        ("temperature", "Connection reset by peer"),
    ])
    def test_does_not_match_unrelated_errors(self, param, message):
        assert _is_unsupported_parameter_error(RuntimeError(message), param) is False

    def test_empty_param_returns_false(self):
        assert _is_unsupported_parameter_error(
            RuntimeError("HTTP 400: Unsupported parameter: temperature"), ""
        ) is False

    def test_temperature_wrapper_delegates_to_generic(self):
        """Back-compat: ``_is_unsupported_temperature_error`` still routes through."""
        msg = "HTTP 400: Unsupported parameter: temperature"
        assert _is_unsupported_temperature_error(RuntimeError(msg)) is True
        # And the unrelated-case still holds
        assert _is_unsupported_temperature_error(
            RuntimeError("max_tokens is too large")) is False


def _dummy_response():
    """Sentinel: real code calls ``_validate_llm_response`` which we patch out."""
    return {"ok": True}


class TestMaxTokensRetryHardening:
    """The max_tokens retry branch now (a) gates on ``max_tokens is not None``
    and (b) also matches the generic phrasings via the helper.
    """

    def test_sync_max_tokens_retry_skipped_when_max_tokens_is_none(self):
        """No max_tokens kwarg → must not pop/retry even if the error mentions it.

        Before the hardening, ``kwargs.pop("max_tokens", None)`` was safe but
        ``kwargs["max_completion_tokens"] = max_tokens`` would set a None
        value and hit the provider again. The gate skips the whole branch.
        """
        client = MagicMock()
        client.base_url = "https://api.openai.com/v1"
        err = RuntimeError("HTTP 400: Unsupported parameter: max_tokens")
        client.chat.completions.create.side_effect = err

        with (
            patch("agent.auxiliary_client._resolve_task_provider_model",
                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
            patch("agent.auxiliary_client._get_cached_client",
                  return_value=(client, "gpt-5.5")),
            patch("agent.auxiliary_client._validate_llm_response",
                  side_effect=lambda resp, _task: resp),
        ):
            with pytest.raises(RuntimeError):
                call_llm(
                    task="session_search",
                    messages=[{"role": "user", "content": "hi"}],
                    temperature=0.3,
                    # max_tokens omitted on purpose
                )

        # Only the initial attempt; no retry because the gate blocked it
        assert client.chat.completions.create.call_count == 1

    def test_sync_max_tokens_retry_matches_generic_phrasing(self):
        """A 400 saying "Unknown parameter: max_tokens" (not the legacy
        substring ``"max_tokens"`` bare + no ``unsupported_parameter`` token)
        now triggers the retry via the generic helper.
        """
        client = MagicMock()
        client.base_url = "https://api.openai.com/v1"
        err = RuntimeError("Unknown parameter: max_tokens")
        response = _dummy_response()
        client.chat.completions.create.side_effect = [err, response]

        with (
            patch("agent.auxiliary_client._resolve_task_provider_model",
                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
            patch("agent.auxiliary_client._get_cached_client",
                  return_value=(client, "gpt-5.5")),
            patch("agent.auxiliary_client._validate_llm_response",
                  side_effect=lambda resp, _task: resp),
        ):
            result = call_llm(
                task="session_search",
                messages=[{"role": "user", "content": "hi"}],
                temperature=0.3,
                max_tokens=512,
            )

        assert result is response
        assert client.chat.completions.create.call_count == 2
        second_call = client.chat.completions.create.call_args_list[1]
        assert "max_tokens" not in second_call.kwargs
        assert second_call.kwargs["max_completion_tokens"] == 512

    @pytest.mark.asyncio
    async def test_async_max_tokens_retry_skipped_when_max_tokens_is_none(self):
        client = MagicMock()
        client.base_url = "https://api.openai.com/v1"
        err = RuntimeError("HTTP 400: Unsupported parameter: max_tokens")
        client.chat.completions.create = AsyncMock(side_effect=err)

        with (
            patch("agent.auxiliary_client._resolve_task_provider_model",
                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
            patch("agent.auxiliary_client._get_cached_client",
                  return_value=(client, "gpt-5.5")),
            patch("agent.auxiliary_client._validate_llm_response",
                  side_effect=lambda resp, _task: resp),
        ):
            with pytest.raises(RuntimeError):
                await async_call_llm(
                    task="session_search",
                    messages=[{"role": "user", "content": "hi"}],
                    temperature=0.3,
                )

        assert client.chat.completions.create.call_count == 1

    @pytest.mark.asyncio
    async def test_async_max_tokens_retry_matches_generic_phrasing(self):
        client = MagicMock()
        client.base_url = "https://api.openai.com/v1"
        err = RuntimeError("Unknown parameter: max_tokens")
        response = _dummy_response()
        client.chat.completions.create = AsyncMock(side_effect=[err, response])

        with (
            patch("agent.auxiliary_client._resolve_task_provider_model",
                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
            patch("agent.auxiliary_client._get_cached_client",
                  return_value=(client, "gpt-5.5")),
            patch("agent.auxiliary_client._validate_llm_response",
                  side_effect=lambda resp, _task: resp),
        ):
            result = await async_call_llm(
                task="session_search",
                messages=[{"role": "user", "content": "hi"}],
                temperature=0.3,
                max_tokens=512,
            )

        assert result is response
        assert client.chat.completions.create.await_count == 2
        second_call = client.chat.completions.create.call_args_list[1]
        assert "max_tokens" not in second_call.kwargs
        assert second_call.kwargs["max_completion_tokens"] == 512
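A detector and retry gate consistent with the cases above could be sketched as below. Everything here is an assumption made for illustration: the marker list, `is_unsupported_parameter_error`, and `create_with_max_tokens_fallback` are hypothetical names, and the real helpers in `agent.auxiliary_client` are not part of this diff. The `create` callable stands in for `client.chat.completions.create`.

```python
from typing import Any, Callable

# Assumption: substring markers chosen to cover the phrasings used in the tests.
_UNSUPPORTED_MARKERS = (
    "unsupported parameter",
    "unsupported_parameter",
    "unknown parameter",
    "unrecognized parameter",
    "unrecognized request argument",
    "invalid parameter",
    "not supported",
    "does not support",
)


def is_unsupported_parameter_error(exc: Exception, param: str) -> bool:
    """True when the error text names ``param`` and flags it as unsupported."""
    if not param:
        return False  # "" is a substring of everything, so reject it explicitly
    text = str(exc).lower()
    return param.lower() in text and any(m in text for m in _UNSUPPORTED_MARKERS)


def create_with_max_tokens_fallback(create: Callable[..., Any], **kwargs: Any) -> Any:
    """Retry once with ``max_completion_tokens`` when ``max_tokens`` is rejected."""
    max_tokens = kwargs.get("max_tokens")
    try:
        return create(**kwargs)
    except Exception as exc:
        # Gate: never retry when the caller did not send max_tokens at all.
        if max_tokens is None or not is_unsupported_parameter_error(exc, "max_tokens"):
            raise
    retry_kwargs = dict(kwargs)
    retry_kwargs.pop("max_tokens")
    retry_kwargs["max_completion_tokens"] = max_tokens
    return create(**retry_kwargs)
```

Plain substring matching is deliberately loose; a message that mentions the parameter next to an unrelated "not supported" clause would also match, which is one reason the retry is capped at a single attempt.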
@@ -0,0 +1,237 @@
"""Regression tests for the universal "unsupported temperature" retry in
``agent.auxiliary_client``.

Auxiliary callers (context compression, session search,
web extract summarisation, etc.) hardcode ``temperature=0.3`` for historical
reasons. Several provider/model combinations reject ``temperature`` with a
400:

  * OpenAI Responses (gpt-5/o-series reasoning models)
  * Copilot Responses (reasoning models)
  * OpenRouter reasoning models (gpt-5.5, some anthropic via OAI-compat)
  * Anthropic Opus 4.7+ via OpenAI-compat endpoints
  * Kimi/Moonshot (server-managed)

``_fixed_temperature_for_model`` catches Kimi up front, and
``build_chat_completion_kwargs`` drops temperature for Anthropic Opus 4.7+,
but the same backend can accept ``temperature`` for some models and reject
it for others (for example gpt-5.4 accepts but gpt-5.5 rejects on the same
endpoint). An allow/deny-list is not maintainable across providers.

The universal fix is reactive: when a call returns an
``Unsupported parameter: temperature`` 400, retry once without temperature.
These tests lock in that behaviour for both sync and async paths.
"""

from unittest.mock import patch, MagicMock, AsyncMock

import pytest

from agent.auxiliary_client import (
    call_llm,
    async_call_llm,
    _is_unsupported_temperature_error,
)


class TestIsUnsupportedTemperatureError:
    """The detector must match the phrasings providers actually return."""

    @pytest.mark.parametrize("message", [
        # OpenAI / Codex Responses
        "HTTP 400: Unsupported parameter: temperature",
        "Error code: 400 - {'error': {'message': \"Unsupported parameter: 'temperature'\"}}",
        # Copilot / OpenAI error-code form
        "Error code: 400 - {'error': {'code': 'unsupported_parameter', 'param': 'temperature'}}",
        # OpenRouter-style
        "Provider returned error: temperature is not supported for this model",
        "this model does not support temperature",
        # Anthropic-style via OAI-compat
        "temperature: unknown parameter",
        # Some gateways
        "unrecognized request argument supplied: temperature",
    ])
    def test_matches_real_provider_messages(self, message):
        assert _is_unsupported_temperature_error(RuntimeError(message)) is True

    @pytest.mark.parametrize("message", [
        # Unrelated 400s must NOT trigger a silent-retry
        "HTTP 400: Invalid value: 'tool'. Supported values are: 'assistant'...",
        "max_tokens is too large for this model",
        "Rate limit exceeded",
        "Connection reset by peer",
        # Temperature value error is a different class of problem
        "temperature must be between 0 and 2",
    ])
    def test_does_not_match_unrelated_errors(self, message):
        assert _is_unsupported_temperature_error(RuntimeError(message)) is False


def _dummy_response():
    # The real code calls _validate_llm_response which inspects
    # response.choices[0].message. The tests here patch that out, so
    # any sentinel object is fine.
    return {"ok": True}


class TestCallLlmUnsupportedTemperatureRetry:
    """``call_llm`` retries once without temperature and returns on success."""

    def _setup(self, first_exc):
        client = MagicMock()
        client.base_url = "https://api.openai.com/v1"
        client.chat.completions.create.side_effect = [first_exc, _dummy_response()]
        return client

    @pytest.mark.parametrize("error_message", [
        "HTTP 400: Unsupported parameter: temperature",
        "Error code: 400 - {'error': {'code': 'unsupported_parameter', 'param': 'temperature'}}",
        "Provider error: this model does not support temperature",
    ])
    def test_retries_once_without_temperature(self, error_message):
        client = self._setup(RuntimeError(error_message))

        with (
            patch("agent.auxiliary_client._resolve_task_provider_model",
                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
            patch("agent.auxiliary_client._get_cached_client",
                  return_value=(client, "gpt-5.5")),
            patch("agent.auxiliary_client._validate_llm_response",
                  side_effect=lambda resp, _task: resp),
        ):
            result = call_llm(
                task="compression",
                messages=[{"role": "user", "content": "remember this"}],
                temperature=0.3,
                max_tokens=500,
            )

        assert result == {"ok": True}
        assert client.chat.completions.create.call_count == 2
        first_kwargs = client.chat.completions.create.call_args_list[0].kwargs
        retry_kwargs = client.chat.completions.create.call_args_list[1].kwargs
        assert first_kwargs["temperature"] == 0.3
        assert "temperature" not in retry_kwargs
        # other kwargs preserved
        assert retry_kwargs["max_tokens"] == 500

    def test_non_temperature_400_does_not_retry_as_temperature(self):
        """Unrelated 400s (e.g. bad tool role) must not silently drop temp."""
        client = MagicMock()
        client.base_url = "https://api.openai.com/v1"
        non_temp_err = RuntimeError(
            "HTTP 400: Invalid value: 'tool'. Supported values are: 'assistant'..."
        )
        client.chat.completions.create.side_effect = non_temp_err

        with (
            patch("agent.auxiliary_client._resolve_task_provider_model",
                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
            patch("agent.auxiliary_client._get_cached_client",
                  return_value=(client, "gpt-5.5")),
            patch("agent.auxiliary_client._validate_llm_response",
                  side_effect=lambda resp, _task: resp),
            patch("agent.auxiliary_client._try_payment_fallback",
                  return_value=None),
        ):
            with pytest.raises(RuntimeError, match="Invalid value"):
                call_llm(
                    task="compression",
                    messages=[{"role": "user", "content": "x"}],
                    temperature=0.3,
                    max_tokens=500,
                )

        # Should NOT have retried (non-temperature 400 doesn't match)
        assert client.chat.completions.create.call_count == 1

    def test_no_retry_when_temperature_not_in_kwargs(self):
        """If caller didn't send temperature, don't invent a temperature-retry."""
        client = MagicMock()
        client.base_url = "https://api.openai.com/v1"
        # Provider complains about temperature even though we didn't send it.
        # (Pathological but possible with misleading error text.) The guard
        # ``"temperature" in kwargs`` must prevent an unnecessary retry.
        err = RuntimeError("HTTP 400: Unsupported parameter: temperature")
        client.chat.completions.create.side_effect = err

        with (
            patch("agent.auxiliary_client._resolve_task_provider_model",
                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
            patch("agent.auxiliary_client._get_cached_client",
                  return_value=(client, "gpt-5.5")),
            patch("agent.auxiliary_client._validate_llm_response",
                  side_effect=lambda resp, _task: resp),
            patch("agent.auxiliary_client._try_payment_fallback",
                  return_value=None),
        ):
            with pytest.raises(RuntimeError):
                call_llm(
                    task="compression",
                    messages=[{"role": "user", "content": "x"}],
                    temperature=None,  # explicit: no temperature sent
                    max_tokens=500,
                )

        assert client.chat.completions.create.call_count == 1


class TestAsyncCallLlmUnsupportedTemperatureRetry:
    """``async_call_llm`` mirror of the sync retry semantics."""

    @pytest.mark.asyncio
    async def test_async_retries_once_without_temperature(self):
        client = MagicMock()
        client.base_url = "https://api.openai.com/v1"
        client.chat.completions.create = AsyncMock(side_effect=[
            RuntimeError("HTTP 400: Unsupported parameter: temperature"),
            _dummy_response(),
        ])

        with (
            patch("agent.auxiliary_client._resolve_task_provider_model",
                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
            patch("agent.auxiliary_client._get_cached_client",
                  return_value=(client, "gpt-5.5")),
            patch("agent.auxiliary_client._validate_llm_response",
                  side_effect=lambda resp, _task: resp),
        ):
            result = await async_call_llm(
                task="session_search",
                messages=[{"role": "user", "content": "query"}],
                temperature=0.3,
                max_tokens=500,
            )

        assert result == {"ok": True}
        assert client.chat.completions.create.await_count == 2
        first_kwargs = client.chat.completions.create.call_args_list[0].kwargs
        retry_kwargs = client.chat.completions.create.call_args_list[1].kwargs
        assert first_kwargs["temperature"] == 0.3
        assert "temperature" not in retry_kwargs
        assert retry_kwargs["max_tokens"] == 500

    @pytest.mark.asyncio
    async def test_async_non_temperature_400_does_not_retry(self):
        client = MagicMock()
        client.base_url = "https://api.openai.com/v1"
        client.chat.completions.create = AsyncMock(
            side_effect=RuntimeError("HTTP 400: Invalid value: 'tool'"),
        )

        with (
            patch("agent.auxiliary_client._resolve_task_provider_model",
                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
            patch("agent.auxiliary_client._get_cached_client",
                  return_value=(client, "gpt-5.5")),
            patch("agent.auxiliary_client._validate_llm_response",
                  side_effect=lambda resp, _task: resp),
            patch("agent.auxiliary_client._try_payment_fallback",
                  return_value=None),
        ):
            with pytest.raises(RuntimeError, match="Invalid value"):
                await async_call_llm(
                    task="session_search",
                    messages=[{"role": "user", "content": "x"}],
                    temperature=0.3,
                    max_tokens=500,
                )

        assert client.chat.completions.create.await_count == 1
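The reactive retry described in the module docstring (try the call, and on an "unsupported temperature" error retry once without the parameter) can be sketched in isolation. The helper names and the marker list below are hypothetical and chosen for illustration; `create` stands in for `client.chat.completions.create`, and the sketch assumes the caller simply omits `temperature` from `kwargs` when it has none to send.

```python
from typing import Any, Awaitable, Callable

# Assumption: markers covering the phrasings listed in the tests above.
_MARKERS = (
    "unsupported parameter",
    "unsupported_parameter",
    "unknown parameter",
    "unrecognized request argument",
    "not supported",
    "does not support",
)


def looks_like_unsupported_temperature(exc: Exception) -> bool:
    text = str(exc).lower()
    return "temperature" in text and any(m in text for m in _MARKERS)


def _should_retry(exc: Exception, kwargs: dict) -> bool:
    # Guard: only retry when temperature was actually sent.
    return "temperature" in kwargs and looks_like_unsupported_temperature(exc)


def call_with_temperature_retry(create: Callable[..., Any], **kwargs: Any) -> Any:
    """Sync path: one retry without ``temperature``, all other kwargs kept."""
    try:
        return create(**kwargs)
    except Exception as exc:
        if not _should_retry(exc, kwargs):
            raise
    return create(**{k: v for k, v in kwargs.items() if k != "temperature"})


async def async_call_with_temperature_retry(
    create: Callable[..., Awaitable[Any]], **kwargs: Any
) -> Any:
    """Async mirror of the sync retry."""
    try:
        return await create(**kwargs)
    except Exception as exc:
        if not _should_retry(exc, kwargs):
            raise
    return await create(**{k: v for k, v in kwargs.items() if k != "temperature"})
```

Because the second attempt sits outside the `try`, a failure on the retry propagates unchanged and the call is never attempted a third time.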
@@ -33,15 +33,18 @@ class TestChatCompletionsBasic:
    def test_convert_messages_strips_codex_fields(self, transport):
        msgs = [
            {"role": "assistant", "content": "ok", "codex_reasoning_items": [{"id": "rs_1"}],
             "codex_message_items": [{"id": "msg_1", "type": "message"}],
             "tool_calls": [{"id": "call_1", "call_id": "call_1", "response_item_id": "fc_1",
                             "type": "function", "function": {"name": "t", "arguments": "{}"}}]},
        ]
        result = transport.convert_messages(msgs)
        assert "codex_reasoning_items" not in result[0]
        assert "codex_message_items" not in result[0]
        assert "call_id" not in result[0]["tool_calls"][0]
        assert "response_item_id" not in result[0]["tool_calls"][0]
        # Original list untouched (deepcopy-on-demand)
        assert "codex_reasoning_items" in msgs[0]
        assert "codex_message_items" in msgs[0]


class TestChatCompletionsBuildKwargs:
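The "deepcopy-on-demand" behaviour asserted in `test_convert_messages_strips_codex_fields` (Codex-only fields removed from the result, the caller's list left intact) might be implemented along these lines. `strip_codex_fields` and the two key tuples are hypothetical; the transport's real `convert_messages` is not shown in this diff.

```python
import copy
from typing import Any, Dict, List

# Assumption: these are the only Codex-specific keys, taken from the test above.
_MESSAGE_ONLY_KEYS = ("codex_reasoning_items", "codex_message_items")
_TOOL_CALL_ONLY_KEYS = ("call_id", "response_item_id")


def strip_codex_fields(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return messages without Codex-only fields, copying only what changes."""
    cleaned = messages  # share the input until a change is actually needed
    for i, msg in enumerate(messages):
        tool_calls = msg.get("tool_calls") or []
        dirty = any(k in msg for k in _MESSAGE_ONLY_KEYS) or any(
            k in tc for tc in tool_calls for k in _TOOL_CALL_ONLY_KEYS
        )
        if not dirty:
            continue
        if cleaned is messages:
            cleaned = list(messages)  # shallow-copy the outer list once
        new_msg = copy.deepcopy(msg)  # deep-copy only the message being edited
        for key in _MESSAGE_ONLY_KEYS:
            new_msg.pop(key, None)
        for tc in new_msg.get("tool_calls") or []:
            for key in _TOOL_CALL_ONLY_KEYS:
                tc.pop(key, None)
        cleaned[i] = new_msg
    return cleaned
```

When no message carries Codex fields, the function returns the original list object, so the common case costs no copying at all.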
@@ -194,6 +194,36 @@ class TestCodexNormalizeResponse:
        assert nr.content == "Hello world"
        assert nr.finish_reason == "stop"

    def test_message_items_preserved_in_provider_data(self, transport):
        """Codex assistant message item ids/phases must survive transport normalization."""
        r = SimpleNamespace(
            output=[
                SimpleNamespace(
                    type="message",
                    role="assistant",
                    id="msg_abc",
                    phase="final_answer",
                    content=[SimpleNamespace(type="output_text", text="Hello world")],
                    status="completed",
                ),
            ],
            status="completed",
            incomplete_details=None,
            usage=SimpleNamespace(input_tokens=10, output_tokens=5,
                                  input_tokens_details=None, output_tokens_details=None),
        )
        nr = transport.normalize_response(r)
        assert nr.codex_message_items == [
            {
                "type": "message",
                "role": "assistant",
                "status": "completed",
                "content": [{"type": "output_text", "text": "Hello world"}],
                "id": "msg_abc",
                "phase": "final_answer",
            }
        ]

    def test_tool_call_response(self, transport):
        """Normalize a Codex response with tool calls."""
        r = SimpleNamespace(
@@ -60,6 +60,13 @@ class TestTransportRegistry:
        assert t is not None
        assert t.api_mode == "anthropic_messages"

    def test_discovers_missing_transport_when_registry_partially_populated(self):
        """Importing one transport directly must not hide other valid api_modes."""
        import agent.transports.chat_completions  # noqa: F401

        t = get_transport("codex_responses")
        assert t is not None
        assert t.api_mode == "codex_responses"

    def test_register_and_get(self):
        class DummyTransport(ProviderTransport):
            @property
@@ -270,3 +270,15 @@ class TestNormalizedResponseBackwardCompat:
    def test_codex_reasoning_items_none_when_absent(self):
        nr = NormalizedResponse(content="hi", tool_calls=None, finish_reason="stop")
        assert nr.codex_reasoning_items is None

    def test_codex_message_items_from_provider_data(self):
        items = [{"id": "msg_1", "type": "message"}]
        nr = NormalizedResponse(
            content="hi", tool_calls=None, finish_reason="stop",
            provider_data={"codex_message_items": items},
        )
        assert nr.codex_message_items == items

    def test_codex_message_items_none_when_absent(self):
        nr = NormalizedResponse(content="hi", tool_calls=None, finish_reason="stop")
        assert nr.codex_message_items is None
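The two `codex_message_items` tests suggest a read-only accessor backed by `provider_data`. A minimal sketch of that pattern follows; `NormalizedResponseSketch` is a hypothetical stand-in, and the field list is assumed from the constructor calls in the tests rather than taken from the real `NormalizedResponse` class.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class NormalizedResponseSketch:
    """Hypothetical stand-in for the project's ``NormalizedResponse``."""

    content: Optional[str]
    tool_calls: Optional[List[Any]]
    finish_reason: str
    provider_data: Optional[Dict[str, Any]] = None

    @property
    def codex_message_items(self) -> Optional[List[Dict[str, Any]]]:
        # Provider-specific extras live in provider_data; absent means None.
        return (self.provider_data or {}).get("codex_message_items")
```

Keeping provider-specific fields behind a property means callers written before the field existed keep working, since `provider_data` defaults to `None`.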
@@ -160,6 +160,30 @@ class TestBranchCommandCLI:
        assert agent.reset_session_state.called
        assert agent._last_flushed_db_idx == 4  # len(conversation_history)

    def test_branch_updates_agent_session_log_file(self, cli_instance, session_db, tmp_path):
        """Branching must redirect the agent's session_log_file to the new session's path."""
        from cli import HermesCLI
        from pathlib import Path

        logs_dir = tmp_path / "sessions"
        logs_dir.mkdir()
        agent = MagicMock()
        agent._last_flushed_db_idx = 0
        agent.logs_dir = logs_dir
        agent.session_log_file = logs_dir / f"session_{cli_instance.session_id}.json"
        cli_instance.agent = agent
        old_log_file = agent.session_log_file

        HermesCLI._handle_branch_command(cli_instance, "/branch")

        new_session_id = cli_instance.session_id
        expected_log = logs_dir / f"session_{new_session_id}.json"
        assert agent.session_log_file == expected_log, (
            "session_log_file must point to the branch session, not the original"
        )
        assert agent.session_log_file != old_log_file

    def test_branch_sets_resumed_flag(self, cli_instance, session_db):
        """Branch should set _resumed=True to prevent auto-title generation."""
        from cli import HermesCLI

Some files were not shown because too many files have changed in this diff.