Compare commits

..

325 Commits

Author SHA1 Message Date
teknium1 9845650fd6 feat(cli): add /exit --delete flag to remove session on quit
Port from google-gemini/gemini-cli#19332.

Users can now exit with '/exit --delete' (or '/quit --delete', '/exit -d')
to permanently remove the current session's SQLite history plus on-disk
transcripts (*.json / *.jsonl / request_dump_*) in one shot. Useful for
privacy-sensitive workflows and one-off interactions where leaving a
session recording behind is undesirable.

Implementation:
- New HermesCLI._delete_session_on_exit one-shot flag (defaults False).
- process_command() parses --delete / -d after /exit or /quit and arms
  the flag. Unknown args print a hint and keep the CLI running (prevents
  typos like '/exit -delete' from accidentally exiting).
- Shutdown path calls SessionDB.delete_session(session_id, sessions_dir=...)
  right after end_session() when the flag is set. That API already
  existed for 'hermes sessions delete' and handles both SQLite removal
  (orphaning child sessions so FK constraints hold) and on-disk file
  cleanup.
- /quit CommandDef now advertises '[--delete]' in args_hint so /help
  and CLI autocomplete surface it.

Tests: tests/cli/test_exit_delete_session.py (12 cases covering both
aliases, case insensitivity, whitespace, short form, unknown-arg
rejection, and registry metadata).

E2E-verified with isolated HERMES_HOME: session row deleted, all three
transcript/request-dump files removed, second delete_session call
correctly returns False.
2026-04-29 17:05:23 -07:00
brooklyn! 98f5be13fa fix(tui): word-wrap composer input (#17651)
* fix(tui): word-wrap composer input

Wrap composer input at word boundaries and anchor the good-vibes heart to the full composer row.

* test(tui): cover composer word wrap edge

Add regression coverage for moving the next word instead of splitting it at the composer edge.
2026-04-29 16:55:49 -07:00
brooklyn! 5e6e8b6af3 fix(tui): honor launch toolsets (#17623)
* fix(tui): honor launch toolsets

Carry chat --toolsets through the TUI launcher so TUI sessions use the same per-session tool scope as the classic CLI.

* fix(tui): parse top-level toolsets flag

Allow top-level hermes --tui --toolsets to reach the implicit chat session, matching chat subcommand behavior.

* fix(tui): validate launch toolsets

Filter invalid HERMES_TUI_TOOLSETS entries and fall back to configured CLI toolsets when the override contains no valid toolsets.

* fix(tui): avoid config load for builtin toolsets

Honor built-in HERMES_TUI_TOOLSETS values before loading config and treat all/* as the all-toolsets sentinel.

* fix(cli): honor toolsets in oneshot mode

Forward top-level --toolsets into oneshot agent construction so the flag is not silently ignored outside the TUI path.

* fix(cli): validate oneshot toolsets

Reject invalid-only oneshot toolset overrides before output redirection and clarify TUI fallback warnings.

* fix(cli): preserve all-toolsets sentinel

Map explicit all/* oneshot toolset overrides to the all-toolsets sentinel and replace locals() checks in TUI toolset loading.

* fix(cli): warn on extra all-toolset entries

Warn when all/* toolset overrides include additional ignored entries so typos are still visible.

* fix(tui): honor plugin toolset overrides

Discover plugin toolsets before rejecting unresolved explicit toolset overrides and read raw config for MCP name validation.

* fix(tui): reuse toolset argument normalizer

Share top-level TUI toolset argument parsing with the oneshot path to avoid duplicate normalization logic.

* fix(cli): reject disabled mcp toolsets

Validate explicit toolset overrides against enabled MCP servers only and clarify top-level toolset flag help.

* fix(cli): distinguish disabled mcp from unknown toolsets

Report disabled MCP servers separately from unknown toolset entries and stub plugin discovery in invalid-name tests for determinism.
2026-04-29 16:55:27 -07:00
brooklyn! d9bf093728 Merge pull request #17638 from NousResearch/bb/tui-details-persist
fix(tui): persist global details mode sections
2026-04-29 15:15:37 -07:00
Brooklyn Nicholson faa467ccaf fix(tui): share detail section constants
Reuse one gateway detail-section list for global and per-section detail mode config handling.
2026-04-29 17:05:51 -05:00
brooklyn! f45434d3c6 Merge pull request #17626 from NousResearch/bb/tui-prompt-gap
fix(tui): render explicit prompt gap
2026-04-29 14:58:17 -07:00
brooklyn! 2a9a5fffa5 Merge pull request #17625 from NousResearch/bb/tui-reasoning-hide
fix(tui): hide reasoning panels immediately
2026-04-29 14:49:20 -07:00
Brooklyn Nicholson c2cb6d1071 fix(tui): persist global details mode sections
Pin all detail sections when /details sets a global mode so config sync does not restore built-in section defaults.
2026-04-29 16:46:42 -05:00
teknium1 b52b63396c chore: map hejuntt1014 in AUTHOR_MAP 2026-04-29 14:21:35 -07:00
hejuntt1014 528e7dc176 fix(cli): exclude profiles/ from profile create --clone-all
shutil.copytree from default ~/.hermes duplicated ~/.hermes/profiles into
the new profile, causing nested profiles/.../profiles/... and huge disk use.
Match export behavior (_DEFAULT_EXPORT_EXCLUDE_ROOT) by ignoring the sibling
profiles tree at the source root.

Made-with: Cursor
2026-04-29 14:21:35 -07:00
Teknium 4899bd99c0 feat(skills): move comfyui from optional to built-in (#17631)
Intended placement per PR #17610 discussion — comfyui belongs in
skills/creative/ alongside other creative built-ins (touchdesigner-mcp,
pretext, sketch), not in optional-skills/.

Pure directory rename, no content changes. History preserved via git mv.
2026-04-29 14:09:17 -07:00
Brooklyn Nicholson 8652d47eaa fix(tui): remove unused prompt import
Drop the stale stringWidth import after centralizing composer prompt width metrics.
2026-04-29 16:04:22 -05:00
Brooklyn Nicholson 7d96a5ab6e fix(tui): refine reasoning visibility updates
Save reasoning display changes atomically and keep trail segments visible when Activity can render them.
2026-04-29 16:03:45 -05:00
Brooklyn Nicholson d3ab2b2e13 fix(tui): share composer prompt gap metric
Use one exported prompt gap constant for both composer width math and prompt prefix rendering.
2026-04-29 15:50:54 -05:00
Brooklyn Nicholson f7abcb4f01 fix(tui): ignore hidden reasoning stream segments
Only keep the live progress area mounted for stream segments that can render under the current detail section visibility.
2026-04-29 15:50:02 -05:00
Brooklyn Nicholson 10fcd620d2 fix(tui): render explicit prompt gap
Reserve the composer prompt gap as layout instead of relying on terminal handling of trailing spaces.
2026-04-29 15:25:06 -05:00
Brooklyn Nicholson d8afafd22b fix(tui): hide reasoning panels immediately
Make /reasoning hide update the thinking section visibility so existing and live reasoning blocks disappear without waiting for config sync.
2026-04-29 15:23:14 -05:00
brooklyn! 456955c2e4 Merge pull request #17259 from NousResearch/bb/pretext-skill
skills: add pretext (creative demos with @chenglou/pretext)
2026-04-29 12:57:25 -07:00
Teknium 9be3ab1a5b fix(plugins): stop firing pre_tool_call hook twice per tool execution (#17611)
The skip_pre_tool_call_hook flag was added to prevent double-firing of
pre_tool_call when run_agent._invoke_tool pre-checks for a block
directive and then dispatches via handle_function_call. But the
implementation added an else: branch that fired invoke_hook again for
'observers', without noticing that get_pre_tool_call_block_message() in
hermes_cli.plugins already fires invoke_hook('pre_tool_call', ...) as
part of its block-directive poll.

Result: every tool call ran through the run_agent loop fired the hook
twice — reported by community users whose observer / audit plugins
logged each tool invocation twice with identical timestamps.

Fix: delete the else: branch. The single-fire contract is now:
  - skip=False (direct handle_function_call): hook fires once inside
    get_pre_tool_call_block_message().
  - skip=True (run_agent._invoke_tool path): caller fires the hook
    once via get_pre_tool_call_block_message(); handle_function_call
    must not fire it again.

Tightened the existing skip-flag test (renamed to
test_skip_flag_prevents_double_fire) to assert pre_tool_call fires
zero times when skip=True, and added
test_run_agent_pattern_fires_pre_tool_call_exactly_once to lock in
end-to-end that the full block-check + dispatch sequence fires the
hook exactly once.
2026-04-29 12:43:39 -07:00
Teknium ffe1d660a0 docs(comfyui): ask local vs cloud FIRST before hardware check (#17612)
Adds Step 0 'Ask Local vs Cloud' as the very first onboarding step, with a
scripted question that spells out the hardware requirements for local
(6 GB VRAM NVIDIA, ROCm AMD on Linux, or M1+ Mac with 16 GB unified)
and routes Cloud users straight to Path A without a hardware check.
Hardware check becomes Step 1, run only when the user picked local.
2026-04-29 12:40:56 -07:00
teknium1 9d7ece362d feat(comfyui): add hardware check + auto-gate local install on verdict
Layers a programmatic hardware-feasibility check on top of the v4 skill
so the agent doesn't silently push users toward a local install they
can't actually run. The official comfy-cli supports --nvidia / --amd /
--m-series / --cpu, but has no guard against "4 GB laptop GPU on SDXL"
or "Intel Mac falling back to CPU" — both route to comfy-cli paths in
the original table and then fail on first workflow.

- scripts/hardware_check.py: detect OS/arch/GPU (NVIDIA nvidia-smi,
  AMD rocm-smi, Apple M1+ via arm64+sysctl, Intel Arc via clinfo),
  VRAM, system/unified RAM. Emits JSON
  {verdict: ok|marginal|cloud, recommended_install_path, comfy_cli_flag}
  with practical thresholds: discrete GPU >=6 GB VRAM minimum,
  Apple Silicon >=16 GB unified memory minimum, Intel Mac -> cloud,
  no accelerator -> cloud. comfy_cli_flag maps directly to
  `comfy install` so the agent can stitch the whole flow together.

- scripts/comfyui_setup.sh: runs hardware_check.py first when no
  explicit flag is passed. If verdict=cloud, refuses to install
  locally, prints Comfy Cloud URL + an override command, exits 2.
  Otherwise auto-selects the right --nvidia/--amd/--m-series flag
  for `comfy install`. Surfaces marginal-verdict notes to the user.

- SKILL.md Setup & Onboarding: adds mandatory Step 0 "Check If This
  Machine Can Run ComfyUI Locally" ahead of the Path A-E selection.
  Documents the verdict thresholds inline, ties verdict + comfy_cli_flag
  to the install paths, and updates the path-choice table so
  "verdict: cloud" is the first row. Quick-Start "Detect Environment"
  block extended to include the hardware check. Verification
  checklist gains a hardware-check gate.

- Frontmatter setup.help rewritten to point at hardware_check.py
  first. Version bumped 4.0.0 -> 4.1.0.
2026-04-29 12:38:59 -07:00
Siddharth Balyan 528a13b37a Potential fix for pull request finding 'CodeQL / Incomplete URL substring sanitization'
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2026-04-29 12:38:59 -07:00
Siddharth Balyan 9835f57e9c Potential fix for pull request finding 'CodeQL / Incomplete URL substring sanitization'
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2026-04-29 12:38:59 -07:00
alt-glitch d7d1503595 docs(comfyui): add comprehensive onboarding — all install paths, doc links, cloud setup
Adds structured onboarding flow to SKILL.md:
- Decision table: which install path for which situation
- Path A: Comfy Cloud (zero setup, API key, pricing)
- Path B: Desktop app (Windows/macOS, one-click)
- Path C: Portable build (Windows, extract-and-run)
- Path D: comfy-cli (recommended for agents, all platforms)
- Path E: Manual install (advanced, all hardware types)
- Post-install: model downloads, custom nodes, verification

All paths link to official docs:
- https://docs.comfy.org/installation
- https://docs.comfy.org/comfy-cli/getting-started
- https://docs.comfy.org/get_started/cloud
- https://docs.comfy.org/installation/desktop
- https://docs.comfy.org/installation/comfyui_portable_windows
- https://docs.comfy.org/installation/manual_install
2026-04-29 12:38:59 -07:00
alt-glitch b81638d749 feat(comfyui): rewrite skill — official CLI + REST API, no third-party dependency
Complete rewrite of the ComfyUI skill to use:
- comfy-cli (official, Comfy-Org/comfy-cli) for lifecycle management:
  install, launch, stop, node management, model downloads
- Direct REST API + helper scripts for workflow execution:
  parameter injection, submission, monitoring, output download
- No dependency on comfyui-skill-cli or any unofficial tool

New files:
- SKILL.md: full rewrite with two-layer architecture, decision tree, pitfalls
- references/official-cli.md: complete comfy-cli command reference
- references/rest-api.md: all REST endpoints (local + cloud)
- references/workflow-format.md: API format spec, common nodes, param mapping
- scripts/extract_schema.py: analyze workflow → extract controllable params
- scripts/run_workflow.py: inject args, submit, poll, download outputs
- scripts/check_deps.py: check missing nodes/models against running server
- scripts/comfyui_setup.sh: full setup automation with official CLI

Removed:
- references/cli-reference.md (was for unofficial comfyui-skill-cli)
- references/api-notes.md (replaced by rest-api.md)

Addresses feedback from PR #17316 comment:
- Correct author attribution
- Remove references to unofficial OpenClaw project
- License field reflects hermes-agent repo (MIT)
2026-04-29 12:38:59 -07:00
Brooklyn Nicholson 165d766891 skills: refine pretext creative demo guidance
Capture the reusable layout and animation lessons from the advanced Pretext demo so the skill teaches measured obstacle fields, morphing geometry, and polished browser examples.
2026-04-29 14:24:15 -05:00
Teknium 258449c468 chore(release): add Nanako0129 to AUTHOR_MAP 2026-04-29 12:10:40 -07:00
Nanako0129 2e991770fc fix(gemini): pass base_url into chat transport 2026-04-29 12:10:40 -07:00
Nanako0129 c5a5e586d7 fix(gemini): nest OpenAI-compat thinking config under google 2026-04-29 12:10:40 -07:00
github-actions[bot] 5a61c116e1 fix(nix): auto-refresh npm lockfile hashes
Source: 430302c197

Run: https://github.com/NousResearch/hermes-agent/actions/runs/25123381903
2026-04-29 18:07:17 +00:00
teknium1 69d4800db7 chore: add txbxxx to AUTHOR_MAP 2026-04-29 10:35:28 -07:00
txbxxx 9ee540a5e2 fix(install): promote croniter to a core dependency
Cron is a built-in Hermes feature (CLI `hermes cron`, `cronjob` agent
tool, gateway ticker, scheduler in cron/scheduler.py) but croniter has
been gated behind the [cron] optional extra. Users who do a plain
`pip install hermes-agent` can create jobs via /cron but any recurring
cron schedule silently returns next_run_at=None (HAS_CRONITER=False),
which then gets wrapped into a 'state=error' message only after a tick.

Move croniter into core dependencies so scheduled jobs work out of the
box on any install path. The [cron] extra is kept as an empty
passthrough so existing `pip install hermes-agent[cron]` installs and
the [all]/[termux] extras continue to resolve.

Also update the now-stale user-facing error message in
`compute_next_run()` that still tells users to install `hermes-agent[cron]`.

Salvaged from #17234 (authored by @txbxxx) with a corrected premise:
the original PR claimed [cron] wasn't in [all], but it is (pyproject.toml
line 112). The real UX problem is the plain no-extras install path,
which this fix addresses.
2026-04-29 10:35:28 -07:00
Teknium 0e577fb1be docs(curator): document that pinning also blocks skill_manage writes (#17578)
Add a dedicated 'Pinning a skill' section that covers both gating
layers — curator auto-transitions AND the agent's skill_manage tool
— so users know what the flag actually protects against after
PR #17562. Updates the one-line claim in 'How it runs' to cross-link
the new section instead of only mentioning auto-transitions.
2026-04-29 10:35:16 -07:00
Teknium c61b2e0af7 feat(skills): refuse skill_manage writes on pinned skills (#17562)
Extend curator's pin flag from 'skip auto-transitions' to 'no agent
edits at all'. All five skill_manage mutation actions (edit, patch,
delete, write_file, remove_file) now refuse pinned skills with a
message pointing the user at `hermes curator unpin <name>`.

Motivation: pin used to only stop the curator's own maintenance pass
from touching a skill. Nothing prevented the main agent from editing
or deleting a pinned skill via skill_manage in-session. This gives
users a hard fence against unwanted agent edits — same semantics as
curator pinning, extended to the write tool.

Create is unaffected (you can't pin a name that doesn't exist yet,
and name collisions already error out). Broken sidecars fail open
rather than lock the agent out.

The schema description advertises the new refusal so models know
not to route around it with rename/recreate tricks.
2026-04-29 10:28:25 -07:00
Teknium b01656d116 docs: exclude per-skill pages from search, add curator feature page (#17563)
Skill catalog pages (bundled/optional) were drowning out real user-guide
and reference docs in search results. There are ~3100 of them and they
match on almost every generic term.

- Add `ignoreFiles` regexes to docusaurus-search-local for
  `user-guide/skills/bundled/` and `user-guide/skills/optional/`.
  The two human-written catalog indexes (`reference/skills-catalog`,
  `reference/optional-skills-catalog`) remain indexed.
- Add a new feature page `user-guide/features/curator.md` covering the
  curator subsystem merged in #16049 and refined in #17307 (per-run
  reports): how it runs, config, CLI (`hermes curator status/run/pin/
  restore/...`), `.usage.json` telemetry, archival semantics, and
  recovery. Slotted into the Core features sidebar next to Skills.

Search index size dropped from 5822 docs to 2704 in the main section;
`user-guide/features/curator` is indexed.
2026-04-29 10:28:15 -07:00
Austin Pickett 430302c197 Merge pull request #17175 from NousResearch/fix/markdown
feat(latex): latex in tui
2026-04-29 10:18:17 -07:00
teknium1 40a98fb0fa feat(minimax-oauth): full integration with peer OAuth providers
Close integration gaps discovered by auditing qwen-oauth's file coverage.
These are surfaces the original salvage missed — they all existed on
main and were added in the 747 commits since PR #15203 was opened.

Coverage added:
- agent/credential_pool.py: seed pool from auth.json providers.minimax-oauth
  so `hermes auth list` reflects logged-in state and
  `hermes auth remove minimax-oauth <N>` works through the standard flow.
- agent/credential_sources.py: register RemovalStep for minimax-oauth
  with suppression-aware `_clear_auth_store_provider`.
- agent/models_dev.py: PROVIDER_TO_MODELS_DEV mapping (-> 'minimax' family).
- hermes_cli/providers.py: HermesOverlay entry (anthropic_messages transport,
  oauth_external auth_type, api.minimax.io/anthropic base).
- hermes_cli/model_normalize.py: add to _MATCHING_PREFIX_STRIP_PROVIDERS so
  `minimax-oauth/MiniMax-M2.7` in config.yaml gets correctly repaired.
- hermes_cli/status.py: render MiniMax OAuth block in `hermes doctor`
  (logged-in / region / expires_at / error).
- hermes_cli/web_server.py: register in OAUTH_PROVIDER_REGISTRY + dispatch
  branch in _resolve_provider_status so the dashboard auth page shows it.
- website/docs/integrations/providers.md: full 'MiniMax (OAuth)' section.
- website/docs/reference/cli-commands.md: --provider enum.
- website/docs/user-guide/features/fallback-providers.md: fallback table row.
- scripts/release.py AUTHOR_MAP: amanning3390 mapping (CI gate).
2026-04-29 09:53:42 -07:00
Adam Manning eafa637287 docs: document MiniMax OAuth login flow
Add comprehensive documentation for the minimax-oauth provider.

New file: website/docs/guides/minimax-oauth.md
  - Overview table (provider ID, auth type, models, endpoints)
  - Quick start via 'hermes model'
  - Manual login via 'hermes auth add minimax-oauth'
  - --region global|cn flag reference
  - The PKCE OAuth flow explained step-by-step
  - hermes doctor output example
  - Configuration reference (config.yaml shape, region table, aliases)
  - Environment variables note: MINIMAX_API_KEY is NOT used by
    minimax-oauth (OAuth path uses browser login)
  - Models table with context length note
  - Troubleshooting section: expired token, timeout, state mismatch,
    headless/remote sessions, not logged in
  - Logout command

Updated: website/docs/getting-started/quickstart.md
  - Add MiniMax (OAuth) to provider picker table as the recommended
    path for users who want MiniMax models without an API key

Updated: website/docs/user-guide/configuration.md
  - Add 'minimax-oauth' to the auxiliary providers list
  - Add MiniMax OAuth tip callout in the providers section
  - Add minimax-oauth row to the provider table (auxiliary tasks)
  - Add MiniMax OAuth config.yaml example in Common Setups

Updated: website/docs/reference/environment-variables.md
  - Annotate MINIMAX_API_KEY, MINIMAX_BASE_URL, MINIMAX_CN_API_KEY,
    MINIMAX_CN_BASE_URL as NOT used by minimax-oauth
  - Add minimax-oauth to HERMES_INFERENCE_PROVIDER allowed values
2026-04-29 09:53:42 -07:00
Adam Manning f3aa989b1b test(cli): cover minimax-oauth resolution, refresh, menu wiring
Add and extend tests for the minimax-oauth provider across three test
modules.

New file: tests/test_minimax_oauth.py (15 tests)
  - test_pkce_pair_produces_valid_s256: verifies PKCE verifier/challenge
    pair produces a valid S256 hash and correct lengths
  - test_request_user_code_happy_path: mocks httpx, verifies correct
    POST parameters and response parsing
  - test_request_user_code_state_mismatch_raises: verifies CSRF guard
  - test_request_user_code_non_200_raises: verifies HTTP error handling
  - test_poll_token_pending_then_success: verifies polling loop retries
    on 'pending' and returns on 'success'
  - test_poll_token_error_raises: verifies 'error' status raises AuthError
  - test_poll_token_timeout_raises: verifies deadline expiry raises
  - test_refresh_skip_when_not_expired: verifies no HTTP call when token
    is fresh
  - test_refresh_updates_access_token: verifies new access/refresh tokens
    stored on successful refresh
  - test_refresh_reuse_triggers_relogin_required: verifies
    relogin_required=True on invalid_grant/refresh_token_reused
  - test_resolve_credentials_requires_login: verifies AuthError when no
    stored state
  - test_provider_registry_contains_minimax_oauth: PROVIDER_REGISTRY key
  - test_minimax_oauth_alias_resolves: portal/global/underscore aliases
  - test_get_minimax_oauth_auth_status_not_logged_in
  - test_get_minimax_oauth_auth_status_logged_in

Extended: tests/hermes_cli/test_runtime_provider_resolution.py
  - test_minimax_oauth_runtime_returns_anthropic_messages_mode
  - test_minimax_oauth_runtime_uses_inference_base_url

Extended: tests/hermes_cli/test_api_key_providers.py
  - TestMinimaxOAuthProvider class (8 tests) covering registry keys,
    auth_type, endpoints, client_id, aliases, CANONICAL_PROVIDERS
    listing, _PROVIDER_MODELS entries, and aux model
2026-04-29 09:53:42 -07:00
Adam Manning 0b2f1bb27b feat(agent): wire MiniMax-M2.7 for minimax-oauth provider
Wire MiniMax-M2.7 and MiniMax-M2.7-highspeed into the model catalog,
CLI model picker, and agent auxiliary/metadata subsystems.

Changes:
- hermes_cli/models.py:
  - Add 'minimax-oauth' to _PROVIDER_MODELS with MiniMax-M2.7 and
    MiniMax-M2.7-highspeed
  - Add ProviderEntry('minimax-oauth', 'MiniMax (OAuth)', ...) to
    CANONICAL_PROVIDERS near existing minimax entries
  - Add aliases: minimax-portal, minimax-global, minimax_oauth in
    _PROVIDER_ALIASES
- hermes_cli/main.py:
  - Add 'minimax-oauth' to provider_labels dict
  - Insert 'minimax-oauth' into providers list in
    select_provider_and_model() near the other minimax entries
  - Add 'minimax-oauth' to --provider argparse choices
  - Add _model_flow_minimax_oauth() function: ensures login via
    _login_minimax_oauth(), resolves runtime credentials, prompts for
    model selection, saves model choice and config
  - Add dispatch elif branch for selected_provider == 'minimax-oauth'
- agent/auxiliary_client.py:
  - Add 'minimax-oauth': 'MiniMax-M2.7-highspeed' to
    _API_KEY_PROVIDER_AUX_MODELS
  - Add 'minimax-oauth' to _ANTHROPIC_COMPAT_PROVIDERS set
- agent/model_metadata.py:
  - Add 'minimax-oauth' to _PROVIDER_PREFIXES frozenset
  - MiniMax-M2.7 context length (200_000) already covered by the
    existing 'minimax' substring match in DEFAULT_CONTEXT_LENGTHS
2026-04-29 09:53:42 -07:00
Adam Manning 9eb16025bd feat(cli): add minimax-oauth provider with PKCE browser flow
Add MiniMax OAuth (minimax-oauth) as a first-class provider using a
PKCE device-code flow ported from openclaw/extensions/minimax/oauth.ts.

Changes:
- hermes_cli/auth.py:
  - Add 8 MINIMAX_OAUTH_* constants (client ID, scope, grant type,
    global/CN base URLs, inference URLs, refresh skew)
  - Add 'minimax-oauth' ProviderConfig to PROVIDER_REGISTRY (auth_type
    oauth_minimax) with global portal + inference base URLs and CN
    extras in the extra dict
  - Add provider aliases: minimax-portal, minimax-global, minimax_oauth
  - Implement _minimax_pkce_pair(), _minimax_request_user_code(),
    _minimax_poll_token(), _minimax_save_auth_state(),
    _minimax_oauth_login(), _refresh_minimax_oauth_state(),
    resolve_minimax_oauth_runtime_credentials(),
    get_minimax_oauth_auth_status(), _login_minimax_oauth()
  - Token refresh uses standard OAuth2 refresh_token grant; triggers
    relogin_required on invalid_grant / refresh_token_reused
- hermes_cli/runtime_provider.py:
  - Add minimax-oauth branch (after qwen-oauth) that calls
    resolve_minimax_oauth_runtime_credentials() and returns
    api_mode='anthropic_messages' with the OAuth Bearer token
- hermes_cli/auth_commands.py:
  - Add 'minimax-oauth' to _OAUTH_CAPABLE_PROVIDERS
  - Add auth_type auto-detection for oauth_minimax
  - Add provider == 'minimax-oauth' branch in auth_add_command
- hermes_cli/doctor.py:
  - Import get_minimax_oauth_auth_status
  - Add MiniMax OAuth status check in the Auth Providers section
2026-04-29 09:53:42 -07:00
teknium1 b2820cd207 chore: add beenherebefore to AUTHOR_MAP 2026-04-29 08:24:48 -07:00
beenherebefore e0c0167428 fix(cron): use last_run_at as croniter base for cron jobs
compute_next_run() ignored the last_run_at parameter for cron-type
schedules, always computing from _hermes_now() instead. This was
inconsistent with interval jobs which DO use last_run_at as the anchor.

After a crash or restart, cron jobs would compute next_run_at from
the arbitrary restart time rather than the actual last execution time.
While the stale detection in get_due_jobs() catches most cases, using
last_run_at as the croniter base eliminates edge cases and makes the
behavior consistent across schedule types.

Salvaged from #9014 (authored by @beenherebefore) onto current main.
The original PR branch was 2+ weeks stale and would have reverted
substantial unrelated work (jobs_file_lock, workdir/context_from/
enabled_toolsets, issue #16265 state=error recovery). Kept just the
7-line substantive fix and the regression test.
2026-04-29 08:24:48 -07:00
teknium1 6d8423761b chore: add yeyitech to AUTHOR_MAP 2026-04-29 08:21:04 -07:00
yeyitech ec27f0a3fa fix(cron): fall back gracefully when HERMES_CRON_TIMEOUT is invalid
Bare `float(os.getenv("HERMES_CRON_TIMEOUT", 600))` in `run_job()` raises
a `ValueError` when the env var is set to a non-numeric string (e.g. "abc").
Replace it with the same defensive try/except pattern already used by
`_get_script_timeout()` for `HERMES_CRON_SCRIPT_TIMEOUT`: log a warning
and fall back to the 600 s default instead of crashing.

Also update the existing env-var tests to exercise the new code path and
add two new tests — one for an invalid value, one for an empty string.

Fixes #11319

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:21:04 -07:00
Teknium 8c8fc6c1ec fix(skills): let skill_manage patch/edit/delete skills in external_dirs in place (#17512)
Closes #4759, closes #4381.

Mutating actions (patch, edit, write_file, remove_file, delete) used to
refuse skills that lived under `skills.external_dirs` with 'Skill X is in
an external directory and cannot be modified. Copy it to your local skills
directory first.'  Faced with that error, the agent would fall back to
action='create', which always writes under ~/.hermes/skills/ — producing
a silent duplicate of the external skill in the local store.

Fix: drop the read-only gate.  `skills.external_dirs` is configured by the
user; if they pointed it at a directory, they already said 'these are my
skills, treat them the same.'  Filesystem permissions handle the genuine
read-only case (write fails, agent sees the error).

- New _containing_skills_root() resolves whichever dir actually contains
  the skill; _delete_skill uses it to bound empty-category cleanup so an
  external root is never rmdir'd.
- _create_skill behavior is unchanged: new skills still land in local
  SKILLS_DIR only.  Fewer moving parts.
- Seven new TestExternalSkillMutations tests covering patch/edit/write_file/
  remove_file/delete/create against a mocked two-root layout + a category
  rmdir-safety check.
2026-04-29 08:16:52 -07:00
Teknium e120cd5941 fix(model_switch): dedup /model picker rows when custom provider endpoint matches a built-in (#16970) (#17511)
When a user authenticates a built-in provider via env var (e.g. DASHSCOPE_API_KEY
triggers the built-in 'alibaba' row) AND defines a custom_providers entry
pointing at the same endpoint, the picker previously emitted two rows for one
endpoint. The built-in row already carries the canonical slug, curated model
list, and correct auth wiring, so the shadow custom entry is redundant.

Adds a _builtin_endpoints set populated as sections 1/2/2b emit rows. Each
entry is the provider's effective base URL (env override via base_url_env_var
wins over the static inference_base_url, so DASHSCOPE_BASE_URL-overridden
endpoints dedup correctly). Section 4 skips any grouped custom entry whose
base_url matches.

Intentionally does NOT repurpose model_catalog.enabled as a 'hide built-ins'
flag. That config controls the remote curated-manifest fetch (documented on
the model-catalog reference page) and overloading it would silently change
behavior for users who disable it for network/privacy reasons.

Three new tests:
- shadow dedup fires when endpoint matches static inference_base_url
- dedup does NOT hide custom entries on genuinely distinct endpoints
- dedup honors the base_url_env_var override path
2026-04-29 08:11:05 -07:00
teknium1 fa3338c171 test(anthropic): regression guard for DeepSeek /anthropic thinking replay
Covers the #16748 fix:
- unsigned thinking blocks synthesised from reasoning_content survive replay
- non-latest assistant turns keep their thinking (DeepSeek validates every turn)
- signed Anthropic blocks are stripped (DeepSeek can't validate them)
- cache_control is stripped from thinking blocks
- OpenAI-compat base (api.deepseek.com without /anthropic) is NOT matched
- non-DeepSeek third parties (minimax) keep the generic strip-all behaviour
2026-04-29 08:10:29 -07:00
vominh1919 fd5479a4fc fix: preserve DeepSeek thinking blocks on Anthropic replay (#16748)
DeepSeek's /anthropic endpoint requires thinking blocks to be replayed
in multi-turn conversations for reasoning continuity. The existing code
classified api.deepseek.com as a generic third-party endpoint and stripped
ALL thinking blocks, causing HTTP 400 from DeepSeek.

Fix: add _is_deepseek_anthropic_endpoint() detector (following the Kimi
precedent) and a dedicated branch that strips only signed Anthropic blocks
while preserving unsigned ones synthesised from reasoning_content.

This follows the exact same pattern as the Kimi exemption (issue #13848)
and does not change behavior for any other third-party endpoint (Azure,
Bedrock, MiniMax, etc.).

Fixes NousResearch/hermes-agent#16748
2026-04-29 08:10:29 -07:00
teknium1 fd7188a7c6 chore(release): map liuhao03@bilibili.com to @liuhao1024 2026-04-29 08:10:25 -07:00
刘昊 60c6b07128 fix(cron): keep SOUL.md identity when workdir is unset 2026-04-29 08:10:25 -07:00
teknium1 0a5ee01e48 fix(hindsight): route flush-on-switch through writer queue, not raw thread
Follow-up to the cherry-picked PR #17447. The original flush spawned a
bare threading.Thread for the buffer-flush path, overwriting
self._sync_thread — which is aliased to the long-lived writer thread.
Two consequences:

1. No serialization with the writer queue. If old-session retains were
   still queued in _retain_queue, the flush ran concurrently with the
   writer and both threads could call aretain_batch against the same
   document_id.
2. The pre-spawn 'self._sync_thread.join(timeout=5.0)' tried to join the
   long-lived writer, which never exits, so the join was a no-op that
   just timed out — never actually serialized anything.

Fix: enqueue the flush closure on _retain_queue via _ensure_writer +
put(). Natural FIFO ordering behind any pending retains, no new thread,
no broken join. Shutdown-aware so it doesn't enqueue after teardown.

Tests updated to drain via _retain_queue.join() instead of the stale
_sync_thread.join(). Added regression guard
test_flush_serializes_behind_pending_retains_via_writer_queue that
blocks the writer mid-retain to prove the flush waits in FIFO behind
the old retain.

Also seeds _retain_queue / _shutting_down / stubbed _ensure_writer on
the bare-object test helper in test_memory_session_switch.py so that
path doesn't blow up under the new queue-enqueue.

tests/plugins/memory/test_hindsight_provider.py + tests/agent/test_memory_session_switch.py: 103/103 passing.
2026-04-29 08:09:03 -07:00
Nicolò Boschi c38dac742b fix(hindsight): flush buffered turns and drop stale prefetch on session switch
Two data-loss / leak gaps in HindsightMemoryProvider.on_session_switch
introduced by #17409.

1. Buffered turns silently lost when retain_every_n_turns > 1.
   on_session_switch unconditionally cleared _session_turns without
   flushing. Users who batched every N>1 turns and switched mid-batch
   (/reset, /new, /resume, /branch, or context compression) had those
   buffered turns disappear. Same data-loss class as the shutdown race,
   different lifecycle event.

   Note commit_memory_session() -> on_session_end() runs *before*
   on_session_switch on /reset, but Hindsight doesn't implement
   on_session_end so the buffer survives that step and dies at clear
   time. /resume, /branch, and compression skip commit_memory_session
   entirely so an on_session_end impl wouldn't help them anyway.

   Fix: snapshot the old _session_id, _document_id, _parent_session_id,
   _turn_index, and _session_turns; spawn one final retain that lands
   under the OLD document_id; then rotate state. Metadata is built
   synchronously against the old self._* so session_id / lineage tags
   on the flushed item all reference the prior session consistently.

2. Stale _prefetch_result leaks across switch.
   If queue_prefetch ran in the old session and the result hadn't been
   consumed by prefetch() yet, on_session_switch left the cached recall
   text in place. The next session's first prefetch() call would return
   text mined from the prior session's bank/query.

   Fix: join any in-flight _prefetch_thread (3s bounded — matches
   shutdown()), then clear _prefetch_result under _prefetch_lock before
   rotating session_id.

Tests
-----
- tests/plugins/memory/test_hindsight_provider.py (TestSessionSwitchBufferFlush):
    - buffered turns flushed under OLD document_id with OLD lineage tags
    - empty buffer => no spurious retain
    - _prefetch_result cleared on switch
    - in-flight prefetch thread is awaited before clear (no race)
- tests/agent/test_memory_session_switch.py: factory extended to seed the
  attrs the new flush path reads (_retain_source, _platform, _bank_id,
  prefetch state, etc.) and stub _run_hindsight_operation so existing
  switch-state assertions keep passing without network setup.
2026-04-29 08:09:03 -07:00
Teknium 1bedc836b5 docs(onboarding): lead OpenClaw residue banner with migrate, warn that cleanup breaks OpenClaw (#17507)
The ~/.openclaw/ detection banner (#16327) had two problems flagged in #16629:

1. It only pitched 'hermes claw cleanup' (destructive archive) and never
   mentioned 'hermes claw migrate' — the actual non-destructive path that
   ports config/memory/skills into Hermes.
2. The copy anthropomorphized the bug ('the agent can still get confused',
   'dutifully reads') and framed OpenClaw as a competitor to eliminate
   ('instead of Hermes's').

Rewrite so migrate leads, cleanup is a clearly-labelled follow-up with a
warning that archiving breaks OpenClaw for users still running it.

Closes #16629
2026-04-29 08:08:36 -07:00
briandevans e0a03f3f40 fix(api-server): collapse tool start/lifecycle into a single SSE event
Address Copilot review on PR #16666:

1. **Duplicate event on every tool start** — both ``tool_progress_callback``
   and ``tool_start_callback`` fire side-by-side in ``run_agent.py``, so
   wiring both into chat completions emitted *two* ``hermes.tool.progress``
   events per real tool call. Drop the legacy ``_on_tool_progress`` emit
   entirely; ``_on_tool_start`` now produces a single unified event that
   carries the legacy ``tool``/``emoji``/``label`` fields plus the new
   ``toolCallId``/``status`` correlation fields. Label is computed inline
   via ``build_tool_preview`` so callers do not need to pre-format it.

2. **Weak per-event correlation in the regression test** — the previous
   assertion checked that a ``toolCallId`` appeared *somewhere* in the
   aggregate, which would have passed even if ``running`` lacked the id.
   Collect ``(status, toolCallId)`` per event and assert each event
   carries the correct pair, plus exactly two events on the wire (no
   silent duplication regression).

The two existing chat-completions tool-progress tests are updated to fire
``tool_start_callback`` instead of ``tool_progress_callback``, matching
production reality where ``run_agent`` always pairs them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:08:16 -07:00
kshitijk4poor 13c238327e fix: address self-review findings for Vercel Sandbox salvage
- Add vercel_sandbox to hardline blocklist container bypass test
- Add vercel_sandbox to skills_tool remote backend parametrize test
- Deduplicate runtime set: doctor.py and setup.py now import
  _SUPPORTED_VERCEL_RUNTIMES from terminal_tool.py
- Add docstring to _run_bash explaining timeout/stdin_data discards
- Always stop sandbox during cleanup (unconditional, matching Modal/Daytona)
- Update security.md: container bypass text, production tip, comparison table
- Update environment-variables.md: TERMINAL_ENV list, Vercel auth vars,
  TERMINAL_VERCEL_RUNTIME
- Update inline comments in cli.py and config.py to include vercel_sandbox
2026-04-29 07:22:33 -07:00
Scott Trinh 5a1d4f6804 feat: add Vercel Sandbox backend
Adds Vercel Sandbox as a supported Hermes terminal backend alongside
existing providers (Local, Docker, Modal, SSH, Daytona, Singularity).

Uses the Vercel Python SDK to create/manage cloud microVMs, supports
snapshot-based filesystem persistence keyed by task_id, and integrates
with the existing BaseEnvironment shell contract and FileSyncManager
for credential/skill syncing.

Based on #17127 by @scotttrinh, cherry-picked onto current main.
2026-04-29 07:22:33 -07:00
Magaav 810d98e892 feat(api_server): expose run status for external UIs (#17085)
Adds two API server endpoints for external UIs and orchestrators:

- GET /v1/capabilities — machine-readable feature discovery so clients
  can detect which Runs API / SSE / auth features this Hermes version
  supports before depending on them.
- GET /v1/runs/{run_id} — pollable run status so dashboards can check
  queued/running/completed/failed/cancelled/stopping state without
  holding an SSE connection open.

Also moves request validation ahead of run allocation so invalid
payloads no longer leave orphaned entries in _run_streams waiting for
the TTL sweep.

task_id is intentionally kept as "default" for the Runs API to
preserve the shared-sandbox model used by CLI, gateway, and the
existing _run_agent_with_callbacks path. session_id is surfaced in
run status for external-UI correlation only.

Salvage of PR #17085 by @Magaav.
2026-04-29 06:38:10 -07:00
Teknium 83c288da01 fix(anthropic): broaden Kimi thinking-suppression to custom endpoints (#17455)
The guard that drops Anthropic's `thinking` kwarg for Kimi endpoints was
matched on `https://api.kimi.com/coding` only.  Users configuring a
custom Kimi-compatible gateway (or an official Moonshot host) with
`api_mode: anthropic_messages` fall through to the generic third-party
path, which strips thinking blocks AND still sends
`thinking={enabled,...}` → upstream rejects with HTTP 400
"reasoning_content is missing in assistant tool call message at index N"
on the next request after a tool call.

Replace `_is_kimi_coding_endpoint` callers (history replay + thinking
kwarg gate) with `_is_kimi_family_endpoint(base_url, model)` that also
matches the `api.kimi.com` / `moonshot.ai` / `moonshot.cn` hosts and
Kimi/Moonshot family model names (`kimi-`, `moonshot-`, `k1.`, `k2.`,
…) for custom / proxied endpoints.  Keeps the UA-header check in
`build_anthropic_client` URL-only — the `claude-code/0.1.0` header is
an official-Kimi contract.

Plumbs optional `model` through `convert_messages_to_anthropic` so
the unsigned reasoning_content→thinking block synthesised for Kimi's
history validation survives the third-party signature-stripping pass
on custom hosts too.

Closes #17057.
2026-04-29 06:35:42 -07:00
Teknium 398945e7b1 fix(cron): accept list-form deliver values so deliver=['telegram'] works (#17456)
The cron schema contracts deliver as a string ("local", "origin",
"telegram", "telegram:chat_id[:thread_id]", or comma-separated combos),
but MCP clients and scripts sometimes pass an array like ['telegram'].

Before this change, the list was written to jobs.json verbatim, and
the scheduler's str(deliver).split(',') then tried to resolve the
literal string "['telegram']" as a platform — returning None and
logging 'no delivery target resolved for deliver=[\'telegram\']'.

Fix on both ends:
- tools/cronjob_tools.py: normalize deliver at the API boundary on
  create and update, so storage is always a string.
- cron/scheduler.py: normalize deliver in _resolve_delivery_targets,
  so existing jobs.json entries with list-form deliver are handled
  gracefully without requiring users to edit the file.

Closes #17139
2026-04-29 06:35:34 -07:00
vominh1919 7141cda967 fix: narrow Anthropic adapter dot-mangling to Claude models only
The normalize_model_name() function unconditionally converted dots to
hyphens in all model names. This caused non-Anthropic models (e.g.
gpt-5.4) to be mangled to gpt-5-4 when routed through the Anthropic
adapter path, resulting in HTTP 404 from the backend.

Now only applies dot-to-hyphen conversion for models starting with
"claude-" or "anthropic/", which are the actual Anthropic model IDs.

Fixes NousResearch/hermes-agent#17171
Related: #7421, #13061, #16417
2026-04-29 06:34:57 -07:00
Nicolò Boschi 0565497dcc fix(hindsight): drain retain queue cleanly on shutdown
The plugin used to spawn one daemon thread per sync_turn() to do the
aretain_batch network write. On CLI exit, that pattern raced interpreter
shutdown — the last retain could reach aiohttp after asyncio's
"cannot schedule new futures" guard had fired, producing noisy logs and
silently losing the final unsaved turn:

    WARNING ... Hindsight sync failed: cannot schedule new futures after
            interpreter shutdown
    ERROR asyncio: Unclosed client session
            client_session: <aiohttp.client.ClientSession object at 0x...>

Switch to a single-writer model: each provider owns one long-lived
writer thread plus a queue. sync_turn() snapshots state and enqueues a
job; the writer drains sequentially. Once shutdown() is called:

  - new sync_turn() / queue_prefetch() calls are dropped, not enqueued
  - a sentinel wakes the writer so it finishes in-flight work
  - shutdown joins the writer (10s) before nulling the client

Also register an idempotent atexit hook from the first sync_turn(), so
exit paths that don't go through MemoryManager.shutdown_all() (Ctrl-C,
abrupt exit) still get a chance to drain.

Tests: keep _sync_thread as a legacy alias to the writer, swap join()
calls to _retain_queue.join() (canonical wait-for-drain), add a new
TestShutdownRace suite covering single-writer reuse, post-shutdown drop,
queue draining, and shutdown idempotency.
2026-04-29 06:34:24 -07:00
teknium1 5662ac2afc chore(release): map Kailigithub email to GitHub login 2026-04-29 06:34:13 -07:00
Kailigithub cf83982da0 fix(gateway): handle wmic encoding errors on Windows non-English locales
Pass encoding='utf-8', errors='ignore' and guard against result.stdout
being None so _scan_gateway_pids() no longer crashes with
UnicodeDecodeError + AttributeError on Windows systems whose default
code page is not UTF-8 (e.g. cp936 on zh-CN). The parser only matches
the ASCII prefixes CommandLine= and ProcessId=, so dropping undecodable
bytes is safe.

Closes #17049.
2026-04-29 06:34:13 -07:00
briandevans 835f9adec0 fix(update,test): clarify wmic comment; switch tests to monkeypatch sys.platform
Two fix-ups for #17123:

1. Reword the inline comment in `_warn_stale_dashboard_processes` to
   accurately describe the failure mode (locale-dependent decoder, not a
   "default UTF-8 decoder") and identify `errors="ignore"` as the
   load-bearing protection. Per Copilot's review.

2. Switch `TestWindowsWmicEncoding` from `patch("hermes_cli.main.sys")`
   to `monkeypatch.setattr(sys, "platform", "win32")` — the codebase's
   canonical pattern (e.g. `tests/hermes_cli/test_auth_ssl_macos.py`).
   The MagicMock-replacement approach passed locally on Python 3.12 but
   the platform-equality check failed under CI's xdist+Python 3.11,
   leaving both new tests red despite the fix being present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:34:13 -07:00
briandevans b85fff9495 fix(update): protect dashboard wmic scan against UnicodeDecodeError on Windows non-UTF-8 locales (#17049)
`hermes update` calls `_warn_stale_dashboard_processes()` to warn about
dashboard processes still running the pre-update Python backend. On
Windows, that scan shells out to `wmic process get ProcessId,CommandLine
/FORMAT:LIST` with `text=True` and no explicit encoding.

`wmic` emits text in the system code page (e.g. cp936 on zh-CN locales),
not UTF-8. Without an explicit `encoding=`, Python's default UTF-8
decoder crashes the subprocess reader thread with
`UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 ...`. In
Python 3.11 that crash is silently absorbed: `subprocess.run()` returns
a `CompletedProcess` with `result.stdout = None`, the next line calls
`result.stdout.split("\n")`, and `hermes update` aborts with the
exact `AttributeError: 'NoneType' object has no attribute 'split'`
trace reported in #17049.

Fix: pass `encoding="utf-8", errors="ignore"` so undecodable bytes
cannot take down the reader thread (the parsing only matches the ASCII
prefixes `CommandLine=` and `ProcessId=`, so dropping non-UTF-8 bytes
is safe), and short-circuit when `result.stdout is None` as a defensive
guard for environments where the reader thread still fails for other
reasons.

This is the same root cause as #17074 (which patches
`hermes_cli/gateway._scan_gateway_pids` for the `hermes setup` path).
That PR does not touch `_warn_stale_dashboard_processes`, so
`hermes update` remains broken on the same locales until this lands.

Regression test in `tests/hermes_cli/test_update_stale_dashboard.py`:
- `test_wmic_invoked_with_utf8_ignore_errors` asserts the explicit
  encoding/errors kwargs reach `subprocess.run`.
- `test_wmic_returns_none_stdout_does_not_crash` simulates the
  reader-thread-crashed `result.stdout=None` aftermath and asserts the
  function returns silently instead of raising AttributeError.

Both new tests fail against clean origin/main (7d4648461) reproducing
the original AttributeError; both pass with this patch. The remaining
3 failures in `tests/hermes_cli/test_cmd_update.py` and
`test_update_autostash.py` are pre-existing baselines on origin/main —
they reproduce identically without this change and are unrelated to
the wmic scan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:34:13 -07:00
Teknium f317325279 docs(weixin): clarify iLink bot identity limits and warn on group policy (#17433)
QR-login connects an iLink bot identity (...@im.bot), not a scriptable
personal WeChat account. iLink typically does not deliver ordinary WeChat
group events to these bots, so WEIXIN_GROUP_POLICY / WEIXIN_GROUP_ALLOWED_USERS
often have no effect regardless of value.

- Setup wizard: print iLink-bot caveat before the group-policy prompt; relabel
  the allowlist input as 'group chat IDs (not member user IDs)'; note that
  'open' / 'allowlist' only take effect if iLink delivers group events.
- Adapter: log a WARNING at connect() when WEIXIN_GROUP_POLICY is non-disabled
  so the limitation is surfaced in gateway logs, not just docs.
- Docs: add a top-of-page warning callout to weixin.md explaining the iLink
  bot identity, narrow the 'DM and group messaging' feature line to DM-only
  with a group caveat, tighten the Group Policy section and troubleshooting
  row, and clarify WEIXIN_GROUP_ALLOWED_USERS as group IDs (not user IDs)
  in weixin.md and environment-variables.md.

Closes #17094
2026-04-29 06:26:10 -07:00
teknium1 9e63062b6c fix(stt): resolve API keys from ~/.hermes/.env via get_env_value (#17140)
Widen #17163 to the sibling file tools/transcription_tools.py, which had
the same class of bug. STT provider call sites and the _get_provider
selection gate called os.getenv(...) directly and missed keys that only
lived in ~/.hermes/.env.

Same pattern as tts_tool.py: one guarded top-level import of
get_env_value (falls back to os.getenv on ImportError), then every
API-key and paired-base-URL lookup swapped over.

Call sites migrated:
- _transcribe_groq    — GROQ_API_KEY
- _transcribe_mistral — MISTRAL_API_KEY
- _transcribe_xai     — XAI_API_KEY, XAI_STT_BASE_URL
- _get_provider       — GROQ/MISTRAL/XAI_API_KEY in explicit + auto branches

Module-level defaults (DEFAULT_STT_MODEL, GROQ_BASE_URL, etc.) stay on
os.getenv — they're import-time constants, not runtime config, and the
dotenv fallback would add no value there.

New regression tests in tests/tools/test_transcription_dotenv_fallback.py
(8 cases) mirror briandevans' TTS tests: per-provider dotenv-key
forwarding, selection-gate dotenv visibility, and an end-to-end probe
that patches hermes_cli.config.load_env to simulate ~/.hermes/.env
carrying the key while os.environ does not.
2026-04-29 06:25:20 -07:00
briandevans 33967b4e52 fix(tts): tolerate missing hermes_cli.config in tts_tool import
Wrap the new top-level `from hermes_cli.config import get_env_value`
in try/except ImportError and fall back to a thin os.getenv shim, so
importing tools.tts_tool keeps working in environments where
hermes_cli.config is unavailable. This matches the existing tolerance
in `_load_tts_config()` (tools/tts_tool.py) and the same
import-fallback pattern in tools/tool_backend_helpers.py::fal_key_is_configured.

Also update the TestDotenvFallbackPerProvider docstring to accurately
describe the mocking strategy: per-provider tests patch
`tools.tts_tool.get_env_value` directly, while the regression-guard
tests cover the lower-level `hermes_cli.config.load_env` integration.

Addresses Copilot review on #17163.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:25:20 -07:00
briandevans 40d25e125b fix(tts): resolve API keys from ~/.hermes/.env via get_env_value (#17140)
TTS provider tools (elevenlabs, xai, minimax, mistral, gemini) called
os.getenv("X_API_KEY") directly, which bypassed Hermes's dotenv bridge in
hermes_cli.config. Users who keep their TTS keys only in ~/.hermes/.env saw
"X_API_KEY not set" errors even though the rest of the stack
(agent/credential_pool, hermes_cli/auth) already resolves keys through
get_env_value() — same class of bug as #15914 fixed for those modules.

Switch every TTS env-var lookup (API keys, base URLs, and
check_tts_requirements gates) to get_env_value, which checks os.environ
first and then ~/.hermes/.env. Behaviour for users with keys exported in
the shell is unchanged; users with dotenv-only keys now succeed. The two
diagnostics prints in __main__ are migrated for consistency.

Regression test (tests/tools/test_tts_dotenv_fallback.py):
  - per-provider: each backend reads the dotenv key when only
    ~/.hermes/.env carries it (5 providers).
  - end-to-end: with hermes_cli.config.load_env returning the key and
    os.environ empty, _generate_minimax_tts and check_tts_requirements
    both succeed; reverting tools/tts_tool.py back to os.getenv makes all
    7 tests fail with "MINIMAX_API_KEY not set" / similar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:25:20 -07:00
Teknium ff687c019e fix(aux): skip kimi-coding in vision auto-detect (closes #17076) (#17451)
* docs(anthropic): correct OAuth scope to Max plan + extra usage credits only

The previous docs pass (#17399) overstated what Anthropic OAuth works
with. In practice Hermes can only route against a Claude Max plan that
has purchased extra usage credits — the base Max allowance is not
consumed, and Claude Pro is not supported at all. Without Max + extra
credits, users must fall back to an ANTHROPIC_API_KEY (pay-per-token).

Updates the four pages touched in #17399:
- integrations/providers.md
- user-guide/features/credential-pools.md
- reference/environment-variables.md
- getting-started/quickstart.md

* fix(aux): skip kimi-coding in vision auto-detect (closes #17076)

Kimi Coding Plan's /coding endpoint (Anthropic Messages wire) has no
image_in capability — Kimi's own docs confirm and suggest switching to
a vision-capable model. Vision lives on the separate Kimi Platform
(api.moonshot.ai, OpenAI-wire, pay-as-you-go). When the user has
kimi-coding as main provider and auxiliary.vision.provider=auto,
resolve_vision_provider_client was handing back an AnthropicAuxiliaryClient
wrapped around /coding which 404'd on every vision request.

Add a _PROVIDERS_WITHOUT_VISION frozenset ({kimi-coding, kimi-coding-cn})
and gate the main-provider vision branch on membership. On a skip the
auto-detect falls through to OpenRouter → Nous like any other
main-provider-unavailable case.

Explicit per-task overrides (auxiliary.vision.provider=kimi-coding) are
unaffected — the skip only applies when the caller is in auto mode.

Tests: 4 new targeted tests in TestVisionAutoSkipsKimiCoding covering
the skip path, CN variant, explicit-override passthrough, and a guard
against accidental skip-list widening.
2026-04-29 06:10:23 -07:00
Teknium aea72c0936 skills: adapt spike/sketch + 2 references from gsd-build/get-shit-done (MIT) (#17421)
* skills: port spike, sketch, and gates/context-budget references from GSD

Adds two new lightweight standalone skills and two reference docs adapted
from gsd-build/get-shit-done (MIT © 2025 Lex Christopherson). All ports
coexist cleanly with a full `npx get-shit-done-cc --hermes --global`
install — GSD lives under `skills/gsd-*/`, these ports live at their
natural Hermes category paths, zero name collisions.

New skills:
- skills/software-development/spike/ — Lightweight "spike an idea with
  throwaway experiments" workflow: decompose into Given/When/Then
  questions, research per-spike, build comparable variants, close with
  VALIDATED/PARTIAL/INVALIDATED verdict. Standalone alternative to the
  full `gsd-spike` (which requires `.planning/spikes/` state machinery
  and the rest of GSD).
- skills/creative/sketch/ — Lightweight "sketch 2-3 HTML design
  variants" workflow: intake (feel, references, core action), produce
  differentiated variants along a design axis, head-to-head comparison.
  Standalone alternative to the full `gsd-sketch`.

New references under subagent-driven-development/:
- references/context-budget-discipline.md — Four-tier context
  degradation model (PEAK/GOOD/DEGRADING/POOR at 0-30%/30-50%/50-70%/70%+)
  with read-depth rules that scale with context window size, plus early
  warning signs of silent degradation (silent partial completion,
  increasing vagueness, skipped protocol steps).
- references/gates-taxonomy.md — Four canonical gate types for
  validation checkpoints: Pre-flight (precondition block), Revision
  (bounded retry loop with stall detection), Escalation (pause for
  human decision), Abort (terminate to prevent damage). Each ships
  with behavior, recovery, and examples.

Collision guard: each port has explicit "If the user has the full GSD
system installed" guidance directing the agent to prefer `gsd-spike` /
`gsd-sketch` when the full workflow is available. Verified end-to-end
with 86 GSD skills + these 2 Hermes ports installed in the same
HERMES_HOME — 90 total skills, zero duplicate names, both
counterparts appear in the system prompt with distinct descriptions.

Attribution preserved in each SKILL.md footer per MIT notice
requirement. Full GSD system now installable via
`npx get-shit-done-cc --hermes --global` (gsd-build/get-shit-done#2845).

* skills/gsd-port: tighten descriptions, surface Hermes-native tools

Review feedback adjustments to the spike/sketch ports from the previous
commit on this branch:

- description lengths trimmed to <=60 chars with trigger-first phrasing
  (spike: 55 chars 'Throwaway experiments to validate an idea before build.';
   sketch: 55 chars 'Throwaway HTML mockups: 2-3 design variants to compare.')
- author field credits gsd-build/get-shit-done explicitly
- stale duplicate top-level `tags:` removed from sketch frontmatter
  (Hermes reads only metadata.hermes.tags — the top-level field was
  dead weight)
- spike research step now shows concrete Hermes tool calls
  (web_search, web_extract with real URLs, terminal for venv inspection)
  instead of just naming the tool names
- spike build step adds a worked tool-sequence example
  (terminal + write_file + terminal to run) and a delegate_task fan-out
  pattern for parallel comparison spikes (002a / 002b)
- sketch build step adds browser_navigate + browser_vision verification
  step — visual spot-check that catches layout bugs pure source
  inspection misses
- sketch Output section adds a worked tool-sequence example mirroring
  the spike pattern

Descriptions now lead with 'Throwaway' (the pattern-match word that
signals 'disposable / not production code') — gives the agent a clean
activation signal in the system-prompt skill index.
2026-04-29 06:10:05 -07:00
vominh1919 fe6c86623f fix: close file descriptor in LocalEnvironment._update_cwd
_update_cwd() uses a bare open(self._cwd_file).read() that never
closes the file descriptor. This method runs on every terminal
command execution, so the fd leaks accumulate in long sessions.

Use a with statement so the fd is released promptly.

Fixes #15552 (standalone resubmission)
2026-04-29 05:46:52 -07:00
teknium1 258755a24f test(weixin): cover _is_stale_session_ret helper (#17228)
Regression test for the ret=-2 / errmsg='unknown error' disambiguation:
- ret=-2 or errcode=-2 with 'unknown error' → stale session (True)
- ret=-2 with 'freq limit' or other errmsg → rate limit (False)
- ret=-14 → not matched here (handled by SESSION_EXPIRED_ERRCODE path)
- Success codes and missing errmsg → False
2026-04-29 05:44:44 -07:00
vominh1919 e9b96fd050 fix: recognize ret=-2 as stale-session signal in Weixin adapter
The Weixin adapter only recognized errcode=-14 as a session-expired
signal. However, iLink also returns ret=-2 with errmsg="unknown error"
for the same underlying condition (stale session). The adapter treated
ret=-2 as a rate-limit, exhausting retries with the same stale
context_token instead of refreshing the session.

Added _is_stale_session_ret() helper that distinguishes ret=-2 with
"unknown error" from genuine rate limits. Updated both the poll loop
and _send_text_chunk to use the helper.

Fixes NousResearch/hermes-agent#17228
2026-04-29 05:44:44 -07:00
teknium1 b0435cc164 fix(model_tools): cancel coroutine on timeout so worker thread exits + log full traceback
_run_async() bridges sync tool handlers to async code. When the handler
is invoked from inside a running event loop (gateway / nested async),
it spawns a worker thread and blocks on future.result(timeout=300).

Before this change, a coroutine that ran past 300s leaked its worker
thread:

  - future.cancel() is a no-op on a running ThreadPoolExecutor future
    (cancel only works on not-yet-started work).
  - pool.shutdown(wait=False, cancel_futures=True) let the caller
    proceed but the worker kept running the coroutine until it
    returned on its own.

Every tool timeout leaked one thread. In long-lived gateway / RL
sessions this is cumulative.

The fix replaces bare asyncio.run() with a worker wrapper that
creates its own event loop. On timeout, _run_async schedules
task.cancel() on that loop via call_soon_threadsafe, then shuts the
pool down with wait=False so the caller returns immediately. The
coroutine observes CancelledError at its next await and the worker
thread exits cleanly.

Also switches logger.error() to logger.exception() in the top-level
handle_function_call() except block so tool failures produce full
stack traces in errors.log instead of just the message.

Related: #17420 (contributor flagged the leak; the original fix used
pool.shutdown(wait=True) which would have converted the leak into a
hang — caller blocks forever on the same stuck coroutine). Credit
for identifying the leak goes to the contributor.

Co-authored-by: 0z! <162235745+0z1-ghb@users.noreply.github.com>
2026-04-29 05:00:40 -07:00
teknium1 46437966cc chore(release): map tmimmanuel email to GitHub login 2026-04-29 05:00:37 -07:00
tmimmanuel 3606414ec7 fix(gateway): isolate platform connect failures with per-platform timeout
Wrap each adapter.connect() in asyncio.wait_for() so one platform hanging
during startup or reconnect cannot block the others. Telegram's 8-retry
connect loop (~140s worst case) previously prevented Feishu from ever
starting when Telegram was network-restricted — common for users in
regions where Telegram is blocked.

Default timeout is 30s; override via HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT
(0 disables). Applied to both startup and the reconnect watcher so a
platform that hangs mid-retry also does not stall retries for others.

Fixes #17242
2026-04-29 05:00:37 -07:00
Teknium 20b759cd02 fix(process): reconcile session.exited against real child exit in poll/wait (#17430)
When a background terminal process spawns a descendant daemon that
inherits the stdout pipe (e.g. 'hermes update' triggering a gateway
systemctl restart), the reader thread's stdout.read() never returns EOF
and its finally: block never runs. session.exited stays False forever,
so process(action='poll') returns 'running' indefinitely even though
the direct child exited long ago.

Issue #17327: Feishu user polled 74 times over 7 minutes before killing
the gateway manually.

Fix: add _reconcile_local_exit() that checks the direct Popen.poll()
before trusting session.exited. If the direct child has exited, drain
any immediately-readable bytes non-blocking and flip session.exited.
Called from poll() and wait(). The stuck reader thread remains blocked
but is a daemon thread and gets reaped with the process.

Safe no-op for env/PTY sessions, already-exited sessions, and live
children (returns None from Popen.poll()).
2026-04-29 04:59:21 -07:00
Teknium 13683c0842 feat(memory): notify providers on mid-process session_id rotation (#17409)
Fixes #6672

Memory providers now receive on_session_switch() whenever AIAgent.session_id
rotates mid-process — /resume, /branch, /reset, /new, and context
compression. Before this, providers that cached per-session state in
initialize() (Hindsight's _session_id, _document_id, accumulated
_session_turns, _turn_counter) kept writing into the old session's
record after the agent had moved on.

MemoryProvider ABC
------------------
- New optional hook on_session_switch(new_session_id, *,
  parent_session_id='', reset=False, **kwargs) with no-op default for
  backward compat. reset=True signals /reset or /new — providers should
  flush accumulated per-session buffers. reset=False for /resume,
  /branch, compression where the logical conversation continues.

MemoryManager
-------------
- on_session_switch() fans the hook out to every registered provider.
  Isolated try/except per provider — one bad provider can't block others.
- Empty/None new_session_id is a no-op to avoid corrupting provider state
  during shutdown paths.

run_agent.py
------------
- _sync_external_memory_for_turn now passes session_id=self.session_id
  into sync_all() and queue_prefetch_all(). Providers with defensive
  session_id updates in sync_turn (Hindsight already had this at
  plugins/memory/hindsight/__init__.py:1199) now actually receive the
  current id.
- Compression block at ~L8884 already notified the context engine of
  the rollover; now also calls
  _memory_manager.on_session_switch(reason='compression').

cli.py
------
- new_session() fires reset=True, reason='new_session' so providers
  flush buffers.
- _handle_resume_command fires reset=False, reason='resume' with the
  previous session as parent_session_id.
- _handle_branch_command fires reset=False, reason='branch' with the
  parent session_id already captured for the DB parent link.

gateway/run.py
--------------
- _handle_resume_command now evicts the cached AIAgent, mirroring
  /branch and /reset. The next message rebuilds a fresh agent whose
  memory provider initialize() runs with the correct session_id —
  matches the pattern the gateway already uses for provider state
  cross-session transitions.

Hindsight reference implementation
----------------------------------
- plugins/memory/hindsight/__init__.py adds on_session_switch that:
  updates _session_id, mints a fresh _document_id (prevents
  vectorize-io/hindsight#1303 overwrite), and clears _session_turns /
  _turn_counter / _turn_index so in-flight batches don't flush under
  the new document id. parent_session_id only overwritten when provided
  (avoids clobbering on a bare switch).

Tests
-----
- tests/agent/test_memory_session_switch.py: new dedicated file. ABC
  default no-op, manager fan-out, failure isolation, empty-id no-op,
  session_id propagation through sync_all/queue_prefetch_all, Hindsight
  state transitions for every reset/non-reset case, parent preservation.
- tests/cli/test_branch_command.py: new test verifying /branch fires
  the hook with correct parent_session_id + reset=False + reason.
- tests/gateway/test_resume_command.py: new test verifying /resume
  evicts the cached agent.
- tests/run_agent/test_memory_sync_interrupted.py: updated existing
  assertions to account for the session_id kwarg on sync_all and
  queue_prefetch_all.

E2E verified (real imports, tmp HERMES_HOME):
- /resume: session_id updates, doc_id fresh, buffers cleared, parent set
- /branch: session_id forks, parent links to original
- /new: reset=True clears accumulated state
- compression: reason='compression' propagated, lineage preserved
- Empty id: no-op, state preserved
- Legacy provider without on_session_switch: no crash

Reported by @nicoloboschi (Hindsight maintainer); related scope-widening
comment by @kidonng extending coverage to compression.
2026-04-29 04:57:22 -07:00
teknium1 d244596dba chore: add rylena to AUTHOR_MAP for PR #17363 2026-04-29 04:57:01 -07:00
Rylen Anil 37d107e03d [verified] fix(gateway): accept user systemd private socket during preflight 2026-04-29 04:57:01 -07:00
Teknium df0e97a168 fix(minimax): enable Anthropic prompt caching for MiniMax's own models (#17425)
MiniMax's /anthropic endpoint documents cache_control support (0.1x read
pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR #12846 gated
third-party Anthropic-wire caching on 'claude' in model name, which left
MiniMax's own model family re-paying full input tokens every turn.

Opt in explicitly via provider id (minimax / minimax-cn) or host match
(api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the
existing Qwen/Alibaba branch below; leaves room for a capability-based
surface (ProviderConfig.supports_anthropic_cache) if a third provider
needs it.

Closes #17332
2026-04-29 04:56:55 -07:00
Oluwadare Feranmi 860ff445f6 fix(usage_pricing): add MiniMax-M2.7 pricing for minimax and minimax-cn providers
Fixes #16825. Sessions using MiniMax-M2.7 via minimax-cn showed
estimated_cost_usd=0.0 and cost_status='unknown' because neither
provider had a billing route or pricing entry. Adds official_docs_snapshot
entries ($0.30/M input, $1.20/M output) for both minimax and minimax-cn,
and adds explicit routing in resolve_billing_route so both resolve to
billing_mode='official_docs_snapshot' instead of falling through to 'unknown'.
2026-04-29 04:56:50 -07:00
loongzhao ecaf8008bb feat(yuanbao): wire native text + media delivery into send_message
_send_yuanbao() already supported media_files= and the user-facing
error strings already advertised yuanbao support, but there was no
dispatch branch in _send_to_platform() actually routing to it. Target
yuanbao in send_message previously fell through to
"Direct sending not yet implemented".

- Add yuanbao media-chunk branch (mirrors Signal/Matrix: media on
  final chunk only).
- Add yuanbao elif in the non-media loop.

Salvage of #17411; SKILL.md description change and redundant
sidebars.ts entry dropped, indentation/trailing-whitespace cleaned up.
2026-04-29 04:56:18 -07:00
teknium1 4a62ba9ccd fix(signal): correct SPOILER docstring + AUTHOR_MAP for exiao
- _markdown_to_signal docstring claimed SPOILER support but the regex list
  never handled ``||...||``. Correct the docstring to match the four
  actually-supported styles (BOLD / ITALIC / STRIKETHROUGH / MONOSPACE).
  Signal's SPOILER bodyRange would need dedicated ``||spoiler||`` parsing
  and is left for a follow-up.

- scripts/release.py: add exiao's noreply email to AUTHOR_MAP so the
  contributor-attribution gate accepts their cherry-picked commit.
2026-04-29 04:38:17 -07:00
exiao 23f5fc6765 feat(gateway/signal): native formatting, reply quotes, and reactions
Three Signal adapter improvements that depend on the no-edit-mode
plumbing from the previous commit.

1. Native formatting (markdown -> Signal bodyRanges)
   Signal renders markdown as literal characters (**bold**, `code`, #
   heading), which looks broken. Added _markdown_to_signal(text) that
   strips markdown syntax and emits Signal-native bodyRanges as
   start:length:STYLE entries. Offsets are computed in UTF-16 code
   units so non-BMP emoji stay aligned. Supports BOLD, ITALIC, STRIKE,
   MONO, and headings mapped to BOLD. Fenced code and inline code are
   handled; link syntax is unwrapped to visible text + URL.

   Includes edge-case fixes reported previously:
   - Bullet lists ("* item") no longer misidentified as italics
   - URLs containing underscores no longer italicized around the dot

2. Reply-quote context
   Parses dataMessage.quote on inbound messages and populates
   MessageEvent.raw_message with sender + timestamp_ms. This lets the
   gateway's existing [Replying to: "..."] injector (gateway/run.py)
   work on Signal, matching Telegram/Matrix behavior.

3. Processing reactions
   Overrides on_processing_start -> hourglass and on_processing_complete
   -> checkmark via the sendReaction JSON-RPC using targetAuthor and
   targetTimestamp pulled from raw_message. Uses the ProcessingOutcome
   enum introduced in the previous commit.

Also sets SUPPORTS_MESSAGE_EDITING = False on SignalAdapter so the
no-edit streaming path activates.

Tests: 40+ new tests in tests/gateway/test_signal_format.py covering
markdown conversion, UTF-16 offset correctness with non-BMP emoji,
bullet-list and URL false-positive regressions, reply-quote extraction,
and reaction payload shape. Regression extensions to test_signal.py.
2026-04-29 04:38:17 -07:00
Teknium ed170f4333 docs(anthropic): correct OAuth scope to Max plan + extra usage credits only (#17404)
The previous docs pass (#17399) overstated what Anthropic OAuth works
with. In practice Hermes can only route against a Claude Max plan that
has purchased extra usage credits — the base Max allowance is not
consumed, and Claude Pro is not supported at all. Without Max + extra
credits, users must fall back to an ANTHROPIC_API_KEY (pay-per-token).

Updates the four pages touched in #17399:
- integrations/providers.md
- user-guide/features/credential-pools.md
- reference/environment-variables.md
- getting-started/quickstart.md
2026-04-29 04:11:14 -07:00
Teknium be57af7188 docs(anthropic): clarify OAuth uses Claude Pro/Max subscription usage (#17399)
Users have been asking what they're billed for when they authenticate
Anthropic via OAuth in Hermes. Clarify in the provider docs that OAuth
routes through Anthropic's Claude Code subscription path — consuming
the extra Claude Code usage included with their Pro or Max plan — and
that an ANTHROPIC_API_KEY is pay-per-token against that key's org
instead.

Touches:
- integrations/providers.md: new info admonition in Anthropic (Native)
  section, plus provider-table row.
- user-guide/features/credential-pools.md: OAuth comment line.
- reference/environment-variables.md: Provider Auth (OAuth) intro.
- getting-started/quickstart.md: provider-picker table row.
2026-04-29 04:05:43 -07:00
Teknium 059980727a refactor(config): migrate remaining 33 cfg_get call sites (#17311)
Completes the cfg_get migration started in PR #17304. Covers the
remaining hermes_cli/ and plugins/ config-access sites that the first
PR intentionally left opportunistic.

Migrated (33 sites across 14 files):

  hermes_cli/setup.py            13 sites  (terminal.*, agent.*, display.*, compression.*, tts.*)
  hermes_cli/tools_config.py      7 sites  (tts.*, browser.*, web.*, platform_toolsets.*)
  hermes_cli/plugins_cmd.py       3 sites  (plugins.*, memory.*, context.*)
  plugins/memory/honcho/cli.py    3 sites  (hosts.*)
  hermes_cli/web_server.py        1 site   (dashboard.*)
  hermes_cli/skills_config.py     1 site   (platform_disabled)
  hermes_cli/plugins.py           1 site   (plugins.disabled)
  hermes_cli/status.py            1 site   (terminal.backend)
  hermes_cli/mcp_config.py        1 site   (mcp_servers.*)
  hermes_cli/webhook.py           1 site   (platforms.webhook)
  plugins/memory/__init__.py      1 site   (memory.provider)
  plugins/memory/hindsight/       1 site   (banks.hermes)
  plugins/memory/holographic/     1 site   (plugins.hermes-memory-store)
  run_agent.py                    1 site   (auxiliary.compression)

The helper supports non-literal keys too, so e.g.
  cfg.get('hosts', {}).get(HOST, {})
becomes
  cfg_get(cfg, 'hosts', HOST, default={})

Migration bugs caught and fixed during this PR:

1. An AST-based batch rewrite naïvely captured the first word token in
   a chain, which corrupted 'self._config.get(...).get(...)' into
   'self.cfg_get(_config, ...)' (dropping 'self.', creating a broken
   method call). Plugins/memory/hindsight caught it via its test suite.
   Fixed manually to 'cfg_get(self._config, ...)'.

2. Import-extension heuristic rewrote multi-line parenthesized imports
   ('from X import (\n  A,\n  B,\n)') as
   'from X import cfg_get, (' — syntactically broken. Fixed by inserting
   cfg_get as the first name inside the parentheses.

Combined with PR #17304, the cfg_get migration now covers:

  PR #17304 (first batch): 20 sites in tools/ + gateway/
  PR #17317 (this one):    33 sites in hermes_cli/ + plugins/ + run_agent.py

Total: 53 sites migrated. Remaining ~8 sites are either:
  - Function-call chains (e.g. '_load_stt_config().get(...).get(...)')
    that would need double-evaluation or a local binding to migrate
    cleanly — intentionally deferred.
  - JSON response-navigation (e.g. 'response_data.get('data',{}).get('web'))
    which is unrelated to config access and shouldn't use cfg_get.

Verified:
- 412/412 tests/plugins/ pass (including the hindsight test that caught
  the self.X regex bug before commit)
- 3181/3189 tests/hermes_cli/ pass (8 pre-existing failures on main,
  verified by git-stash comparison)
- Live 'hermes status' and 'hermes config' render correctly (exercise
  the migrated terminal.backend, tts.provider, browser.cloud_provider,
  compression.threshold, display.tool_progress sites)
- Live 'hermes chat': 1 turn + /quit, zero errors in 11-line log window

No semantic changes — cfg_get was already proven to be a 1:1 match for
the original .get("X",{}).get("Y",default) pattern in PR #17304.
2026-04-29 04:03:03 -07:00
Teknium 21676e80cc Revert "fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path (#16957)" (#17397)
This reverts commit 023f5c74b1.
2026-04-29 03:55:03 -07:00
Ben Barclay 58a6171bfb Merge pull request #17305 from NousResearch/feat/docker-run-as-host-user
feat(docker): run container as host user to avoid root-owned bind mounts
2026-04-29 16:41:55 +10:00
Teknium bc0d8a941e feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:

- `run.json` — machine-readable full record (before/after snapshot,
  state transitions, all tool calls, model/provider, timing, full LLM
  final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
  auto-transition counts, LLM consolidation stats, archived-this-run
  list, new-skills-this-run list, state transitions, the full LLM
  final summary, and a recovery footer pointing at the archive + the
  `hermes curator restore` command

Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.

Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
  provider, tool_calls, error) instead of a bare truncated string so
  the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
  never breaks the curator itself. Same-second rerun gets a numeric
  suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
  can immediately open the latest run

Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
  location (logs not skills), both files written, run.json shape and
  diff accuracy, markdown structure, error path still writes, state
  transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
  stub the new dict-returning _run_llm_review signature

Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
Teknium 2d137074a3 refactor(config): add cfg_get() helper; migrate 20 nested-get call sites (#17304)
The "cfg.get('X', {}).get('Y', default)" pattern appears 50+ times
across tools/, gateway/, and plugins/. Each call site manually handles
the same three gotchas:

  1. Missing intermediate key → empty dict → chain works
  2. Non-dict value at intermediate position → AttributeError
     (uncaught in most sites, so a misconfigured YAML crashes the tool)
  3. cfg is None → AttributeError

Introduces cfg_get(cfg, *keys, default=None) in hermes_cli/config.py
as the canonical helper. Handles all three uniformly, returns default
only when the final key is *absent* (matches dict.get semantics —
explicit None values are preserved, falsy values like 0 / False / ''
are preserved).

Named cfg_get rather than cfg_path to avoid shadowing the existing
'cfg_path = _hermes_home / "config.yaml"' local variable that appears
in gateway/run.py, cron/scheduler.py, hermes_cli/main.py, etc.

Migrated 20 call sites as the first-batch proof-of-value:

  gateway/run.py            10 sites (agent/display subtrees)
  tools/browser_tool.py      3 sites
  tools/vision_tools.py      2 sites
  tools/browser_camofox.py   1 site
  tools/approval.py          1 site
  tools/skills_tool.py       1 site
  tools/skill_manager_tool.py 1 site
  tools/credential_files.py  1 site
  tools/env_passthrough.py   1 site

The remaining ~30 sites across plugins/ and smaller tool files can be
migrated opportunistically — the helper is now available and the
pattern is established.

Fixed a latent bug along the way: tools/vision_tools.py had its
cfg_get usage at line 560 inside a function that locally re-imports
'from hermes_cli.config import load_config', but the AST-based
migration script wrote the top-level cfg_get import to a different
function scope, leaving line 560's cfg_get as a NameError silently
swallowed by the surrounding try/except. Test
test_vision_uses_configured_temperature_and_timeout caught it. Fixed
by including cfg_get in the function-local import.

Verified:
- 7880/7893 tests/tools/ + tests/gateway/ + tests/hermes_cli/test_config
  tests pass; all 13 failures pre-existing on main (MCP, delegate,
  session_split_brain — verified earlier in the sweep).
- All 20 migrated sites AST-verified to have cfg_get in scope (either
  module-level or function-local).
- Live 'hermes chat' smoke: 2 turns + /model switch + tool calls +
  /quit, zero errors. Agent correctly counted 20 cfg_get hits across
  8 tool files — matching the migration.

Semantic parity verified against the original pattern across 8 edge
cases (missing keys, None values, falsy values, empty strings, string
instead of dict, None cfg, nested levels).
2026-04-28 23:17:39 -07:00
Ben 5531c0df82 feat(docker): run container as host user to avoid root-owned bind mounts
Add opt-in terminal.docker_run_as_host_user config flag that passes
--user $(id -u):$(id -g) to the Docker backend so files written into
bind-mounted directories (/workspace, /root, docker_volumes entries) are
owned by the host user instead of root.

When enabled on POSIX platforms, also drops SETUID/SETGID caps since the
container no longer needs gosu/su to switch users.  Falls back cleanly on
platforms without os.getuid (e.g. native Windows Docker) with a warning.

Wired through all three config.yaml -> TERMINAL_* env-var bridges:
  - cli.py env_mappings        (CLI + TUI startup)
  - gateway/run.py _terminal_env_map (gateway / messaging platforms)
  - hermes_cli/config.py _config_to_env_sync (`hermes config set`)

Also fixes docker_mount_cwd_to_workspace silently failing in gateway
mode -- it was missing from gateway/run.py's _terminal_env_map.

Adds tests/tools/test_terminal_config_env_sync.py to guard against
future drift between the three bridges (same bug class shipped twice
in one month).

Bundled Hermes image won't work with this flag since its entrypoint
expects to start as root for the usermod/gosu hermes flow; works with
the default nikolaik/python-nodejs image and plain Debian/Ubuntu.
2026-04-29 16:16:43 +10:00
brooklyn! 5e68503d2f Merge pull request #17190 from NousResearch/bb/tui-cold-start-profiling
perf(tui): cut visible cold start ~57% with lazy agent init
2026-04-28 22:45:14 -07:00
brooklyn! 22cc7492ff Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-04-28 22:44:58 -07:00
Brooklyn Nicholson c2fd0fa684 fix(tui): preserve memory monitor in-flight guard
Copilot caught that clearing inFlight on a transient normal-memory tick could
allow a second dump/eviction to start before the first async tick completed.
Only clear dumped on normal; let the in-flight tick's finally remove its own
level.

Tests:
- cd ui-tui && npm run type-check && npm run build
2026-04-29 00:44:04 -05:00
teknium1 fa9383d27b feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations
Based on three live test runs against 346 agent-created skills on the
author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator
prompt needed three sharpenings before it consistently produced real
umbrella consolidation instead of passive audit output:

**Umbrella-first framing.** The original 'decide keep/patch/archive/
consolidate' framing lets opus default to 'keep' whenever two skills
aren't byte-identical. The new prompt explicitly tells the reviewer
that pairwise distinctness is the wrong bar — the right question is
'would a human maintainer write this as N separate skills, or one
skill with N labeled subsections?' Expect 10-25 prefix clusters; merge
each into an umbrella via one of three methods.

**Three concrete consolidation methods.** (a) Merge into an existing
umbrella (patch the broadest skill, archive siblings); (b) Create a
new umbrella SKILL.md (skill_manage action=create); (c) Demote
session-specific detail into references/, templates/, or scripts/
under the umbrella via skill_manage action=write_file, then archive
the narrow sibling. This matches the support-file vocabulary the
review-prompt side already uses (PR #17213).

**Two observed bailouts pre-empted:** 'usage counters are zero so I
can't judge' (rule 4: judge on content, not use_count) and 'each has
a distinct trigger' (rule 5: pairwise distinctness is the wrong bar).

**Config-aware parent inheritance.** _run_llm_review() was building
AIAgent() without explicit provider/model, hitting an auto-resolve
path that returned empty credentials → HTTP 400 'No models provided'
against OpenRouter. Fork now inherits the user's main provider and
model (via load_config + resolve_runtime_provider) before spawning —
runs on whatever the user is currently on, OAuth-backed or
pool-backed included.

**Unbounded iteration ceiling.** max_iterations=8 was way too low for
an umbrella-build pass over hundreds of skills. A live pass takes
50-100 API calls (scanning, clustering, skill_view'ing candidates,
patching umbrellas, mv'ing siblings). Raised to 9999 — the natural
stopping criterion is 'no more clusters worth processing', not an
arbitrary tool-call budget.

**Tests updated:** test_curator_review_prompt_has_invariants accepts
DO NOT / MUST NOT and drops 'keep' from the required-verb set (the
umbrella-first prompt correctly deemphasizes 'keep' as a first-class
decision label since passive keep-everything is the failure mode
being prevented). Added test_curator_review_prompt_is_umbrella_first
asserting the umbrella framing, class-level thinking, references/
+ templates/ + scripts/ support-file mentions, and the 'use_count
is not evidence of value' pre-emption. Added
test_curator_review_prompt_offers_support_file_actions asserting
skill_manage action=create and action=write_file are both named.

**Live validation on author's setup:**
- Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome
- Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella
- Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.
2026-04-28 22:33:33 -07:00
Teknium 019d4c1c3f feat(curator): hook into the gateway's cron-ticker thread
Long-running gateways need the curator to fire on cadence without
restarts. Piggy-back on the existing cron ticker thread (which already
runs image/document cache cleanup every hour on the same pattern)
instead of spawning a dedicated timer thread.

- New CURATOR_EVERY = 60 ticks (poll hourly at default 60s interval).
  The inner config.interval_hours gate controls the real cadence, so
  60 of these 60 hourly pokes are cheap no-ops and one runs the review.
- Removed the boot-time call added in the prior commit — the ticker
  covers boot + every hour thereafter. Avoids double-running.

Handles the weekly-default-on-24/7-gateway gap flagged in review.
2026-04-28 22:33:33 -07:00
Teknium a12f7aa8bb fix(curator): default cycle is every 7 days, not 24 hours
Weekly is closer to how skill churn actually works — most agent-created
skills don't change multiple times per day, so a daily review is pure
cost without benefit. Bumping the default to 7 days reduces aux-model
spend while still catching drift and staleness on the timescales that
matter (30d stale, 90d archive).

Changes:
- DEFAULT_INTERVAL_HOURS: 24 -> 168 (7 days)
- config.yaml default: interval_hours: 24 -> 24 * 7
- CLI status line renders as '7d' when interval is a whole-day multiple
- Test `test_old_run_eligible` decoupled from the exact default: it now
  uses 2 * get_interval_hours() so future tweaks don't break it
2026-04-28 22:33:33 -07:00
Teknium 0d31864e3b fix(curator): defense-in-depth gates against bundled/hub skills
Previous invariants only gated the primary entry points
(apply_automatic_transitions, archive_skill, CLI pin). Several paths
were unprotected:

  - bump_view / bump_use / bump_patch / set_state / set_pinned wrote
    usage records unconditionally, which is confusing noise in
    .usage.json even though the review list filtered them out
  - restore_skill did not check whether a bundled skill now shadows
    the archived name
  - CLI unpin was asymmetric with CLI pin — it had no gate

Fixes:
  - _mutate() (the shared counter / state writer) now drops silently
    when the skill is not agent-created. .usage.json never gains a
    record for a bundled or hub-installed skill.
  - restore_skill() refuses to restore under a name that is now
    bundled or hub-installed (would shadow upstream).
  - CLI unpin gate matches CLI pin.

New tests:
  - 5 provenance-guard tests on skill_usage (one per mutator)
  - 1 end-to-end test that hammers every mutator at a bundled skill
    and a hub skill, asserts both are untouched on disk, and asserts
    the sidecar stays clean
  - 2 CLI tests proving pin/unpin refuse bundled skills symmetrically

64/64 tests passing (29 skill_usage + 27 curator + 8 new guards).
2026-04-28 22:33:33 -07:00
Teknium c8b7e7268a refactor(curator): point review prompt at existing tools
The LLM review prompt mentioned bespoke `archive_skill` and `pin_skill`
tools that are not registered as model tools. Swap the prompt to rely
on the real surface:

  - skill_manage action=patch  — for patching and consolidation
  - terminal                   — to `mv` skill dirs into .archive/

Also drop `pin` from the model's decision list — pinning is a user
opt-out for `hermes curator pin <skill>`, not something the model
should do autonomously.

Decision list is now: keep / patch / consolidate / archive.

Tests updated: prompt-invariant test now asserts the existing tools
are referenced and that bespoke tool names do NOT appear. New test
prevents `pin` from being re-added as a model decision.
2026-04-28 22:33:33 -07:00
Teknium bc79e227e6 feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.

Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).

Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
  .hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
  via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache

New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
  provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
  transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
  pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests

Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
  patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
  register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)

Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end

Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.

Refs #7816.
2026-04-28 22:33:33 -07:00
Mil Wang (from Dev Box) 88602376d4 fix: resolve external_dirs relative to HERMES_HOME instead of cwd (#9949)
Relative entries in skills.external_dirs were resolved against the
process cwd via Path.resolve(), making them silently fail when Hermes
was launched from a different directory.

Resolve relative paths against get_hermes_home() for consistent
behavior across CLI, gateway, and cron contexts. Absolute paths
and env-var/tilde expansion are unchanged.
2026-04-28 22:29:09 -07:00
teknium1 ded12f0968 chore(release): map LyleLengyel@gmail.com -> mcndjxlefnd 2026-04-28 22:26:09 -07:00
Lyle Lengyel 80e474f11f fix(gateway,terminal): expand shell tilde in terminal.cwd before subprocess
Commit 3c42064e made config.yaml the single source of truth for
TERMINAL_CWD, but the config bridge passes cwd values verbatim to
os.environ. When a user sets terminal.cwd: ~/ in config.yaml, the
literal string '~/'' reaches subprocess.Popen, which the kernel
rejects because it does not expand shell tilde syntax.

This patch adds three defensive layers:

1. gateway/run.py — expanduser at config bridge time so TERMINAL_CWD
   is always an absolute path.

2. tools/terminal_tool.py — expanduser when reading TERMINAL_CWD in
   _get_env_config(), guarding against stale or manually-set env vars.

3. tools/environments/local.py — expanduser in LocalEnvironment before
   passing cwd to subprocess.Popen, the final safety net.

Includes regression tests in test_config_cwd_bridge.py for nested
terminal.cwd, top-level cwd alias, and precedence ordering.

Refs: 3c42064e
2026-04-28 22:26:09 -07:00
Brooklyn Nicholson d341af22c0 fix(tui): preserve busy and init error signaling
Finish the Copilot review cleanup for lazy prompt submission:

- prompt.submit now claims session.running before returning success, preserving
  the existing RPC-level session busy error so the frontend can queue.
- agent-init timeout/failure now emits a normal error event instead of writing a
  second JSON-RPC response for an already-settled request id.

Tests:
- python -m py_compile tui_gateway/server.py tui_gateway/entry.py
- cd ui-tui && npm run type-check && npm run build
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
2026-04-29 00:25:09 -05:00
JackJin 88e07c42b4 fix(cli): prevent .env sanitizer from splitting GLM_API_KEY by LM_API_KEY suffix
The known-key splitter in `_sanitize_env_lines` used substring matching
to find concatenated KEY=VALUE pairs. When a registered key was a suffix
of another (LM_API_KEY is a suffix of GLM_API_KEY), the shorter key's
needle would match inside the longer one, causing the sanitizer to
rewrite `GLM_API_KEY=...` as `G\nLM_API_KEY=...` and silently break
Z.AI/GLM auth (and similarly `GLM_BASE_URL` -> `G\nLM_BASE_URL`).

Drop matches whose needle range is fully contained within a longer
overlapping match. Two regression tests cover the suffix-collision case
and confirm a real concatenation that happens to start with the longer
key still splits where it should.

Fixes #17138
2026-04-28 22:22:45 -07:00
Brooklyn Nicholson cc5efb6fc1 fix(tui): keep non-agent session RPCs lazy
Respond to Copilot's lazy-start review: session metadata/history/usage do not
need a constructed AIAgent, so keep them on the no-wait session path. This
preserves the deferred startup model and avoids blocking simple session RPCs on
agent initialization.

Tests:
- python -m py_compile tui_gateway/server.py tui_gateway/entry.py
- cd ui-tui && npm run type-check && npm run build
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
2026-04-29 00:22:38 -05:00
Brooklyn Nicholson 97a2474b39 review(copilot): point reload.env docstring at hermes_cli.config.reload_env 2026-04-28 22:22:30 -07:00
Brooklyn Nicholson 6b4ef00a2c review(copilot): keep /reload cli_only since gateway has no handler 2026-04-28 22:22:30 -07:00
Brooklyn Nicholson 4858e26eaa feat(tui): port classic CLI /reload (.env hot-reload) to TUI
Classic CLI exposes ``/reload`` (re-reads ~/.hermes/.env into
``os.environ`` via ``hermes_cli.config.reload_env``) so newly added API
keys take effect without restarting the session.  The TUI was missing
the parity command, so users had to Ctrl+C out and ``hermes --tui``
again whenever they added or rotated a credential.

Three small wires:

* New ``reload.env`` JSON-RPC method in ``tui_gateway/server.py`` that
  delegates to ``hermes_cli.config.reload_env`` and returns the count
  of vars updated.
* New ``/reload`` slash command in ``ui-tui/src/app/slash/commands/ops.ts``
  matching the existing ``/reload-mcp`` pattern (native RPC, no slash
  worker).
* Drop ``cli_only=True`` from the ``reload`` ``CommandDef`` in
  ``hermes_cli/commands.py`` so help/menus surface it in the TUI too.
  ``reload_env`` itself is environment-agnostic.

Same caveat as classic CLI: the *currently constructed* agent's
credential pool / provider routing does not auto-rebuild.  Users who
want a brand-new credential resolution should follow with ``/new``.

Tests:
* New ``test_reload_env_rpc_calls_hermes_cli_reload_env`` confirms
  RPC delegates and reports the count.
* New ``test_reload_env_rpc_surfaces_errors`` confirms exceptions are
  rendered as JSON-RPC errors.
* ``createSlashHandler.test.ts`` slash-parity matrix extended with
  ``['/reload', 'reload.env', {}]`` so we can't regress the routing.

Validation:
  scripts/run_tests.sh tests/test_tui_gateway_server.py — 92/92.
  scripts/run_tests.sh tests/hermes_cli/test_commands.py — 128/128.
  cd ui-tui && npm run type-check — clean; npm test --run — 390/390.
2026-04-28 22:22:30 -07:00
Teknium dcd7b717f8 fix(gateway): linearize tool-progress bubbles with content messages (#17280)
After PR #7885 (97b0cd51e) added content-side segment breaks for
natural mid-turn assistant messages, the tool-progress task in
gateway/run.py was not updated to match. progress_msg_id and
progress_lines persisted for the whole run, so after a tool batch
produced bubble B1 followed by content bubble C1, the next tool.started
kept editing the OLD bubble B1 above C1 — making the chat appear out
of order on Telegram, Discord, and Slack.

Add on_new_message callback to GatewayStreamConsumer, fired at the
four sites where a fresh content bubble lands on the platform:
  - _send_or_edit first-send branch (NOT edits)
  - _send_commentary
  - _send_new_chunk (overflow split)
  - each successful chunk of _send_fallback_final

Gateway supplies a lambda that enqueues ('__reset__',) into the
progress_queue. send_progress_messages() handles the marker in both
the main loop and the CancelledError drain path, clearing
progress_msg_id, progress_lines, and the dedup state so the next
tool.started opens a fresh bubble below the new content.

Result: each tool batch appears in chronological order below the
preceding content. When no content appears between tool batches,
tools still group in one bubble (CLI-style compactness).

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 22:17:33 -07:00
Tranquil-Flow ac855bba0e fix(cli): respect terminal.cwd config in local terminal backend
init_session() runs a login shell bootstrap that sources profile scripts
(.bashrc, .bash_profile, etc.) before capturing pwd. If any profile
script changes the working directory, the captured cwd overwrites the
configured terminal.cwd value — so terminal commands run in the wrong
directory despite the TUI banner showing the configured path.

Add an explicit 'builtin cd' to the configured cwd in the bootstrap
script, after profile sourcing but before pwd capture, ensuring the
configured terminal.cwd is always what gets recorded.

Fixes #14044
2026-04-28 22:16:08 -07:00
Brooklyn Nicholson f95c34f415 fix(browser): address Copilot round-4 on /browser connect
* Reject unsupported schemes (anything outside http/https/ws/wss) in
  cli.py /browser connect before probing or persisting, matching the
  gateway's existing 4015 path.
* Defend gateway browser.manage against `{"url": null}` and
  non-string urls: empty/null falls back to DEFAULT_BROWSER_CDP_URL,
  non-string returns a 4015 instead of slipping into the generic
  5031 catch via TypeError on `"://" in url`.
* Add regression tests for both null-url fallback and non-string
  rejection.
2026-04-28 22:11:10 -07:00
Brooklyn Nicholson 679a27498d fix(browser): address Copilot round-3 on /browser connect
* Gate `browser.progress` emit on truthy `session_id`. The TUI
  prints `messages` from the response when there's no session, so
  emitting events too would double-render. Now: with a session →
  events stream live; without one → bundled messages only.
* Resolve `system = platform.system()` once in `_browser_connect`
  and thread it through `try_launch_chrome_debug` and
  `_failure_messages` → `manual_chrome_debug_command`, so the
  generated hint is consistent (and tests are deterministic) on
  any host.
* Add `test_browser_manage_connect_no_session_skips_progress_events`
  to lock in the gating behavior.
2026-04-28 22:11:10 -07:00
Brooklyn Nicholson d1ee4915f3 fix(browser): address Copilot review on /browser connect
Fixes from Copilot's two passes on PR #17238:

* Validate parsed URL once: reject missing host, invalid port, and
  unsupported scheme up front so malformed inputs (e.g. http://:9222
  or http://localhost:abc) don't fall through to a generic 5031.
* Tighten _is_default_local_cdp to require a discovery-style path so
  ws://127.0.0.1:9222/devtools/browser/<id> is not collapsed to bare
  http://127.0.0.1:9222 (which would lose the path and break the
  connect).
* Move browser.manage into _LONG_HANDLERS so the up-to-10s
  launch-and-retry loop runs on the RPC pool instead of blocking the
  main dispatcher.
* try_launch_chrome_debug uses Windows-appropriate detach kwargs
  (creationflags=DETACHED_PROCESS|CREATE_NEW_PROCESS_GROUP) instead
  of POSIX-only start_new_session=True.
* manual_chrome_debug_command uses subprocess.list2cmdline on
  Windows so the printed instruction is cmd.exe-compatible.
* Mirror host/port validation in cli.py /browser connect so the
  classic CLI never persists an invalid BROWSER_CDP_URL.
2026-04-28 22:11:10 -07:00
Brooklyn Nicholson 26816d1f77 refactor(tui): tighten /browser connect plumbing
Split browser.manage into a small dispatcher with named connect/disconnect
helpers, fold _http_ok / _probe_urls / _normalize_cdp_url out of the nested
probe loop, collapse the failure-message scaffolding, and DRY the chrome
candidate path tables. Behaviour and event shape unchanged.
2026-04-28 22:11:10 -07:00
Brooklyn Nicholson e750829015 fix(tui): stream /browser connect progress as gateway events
Emit browser.progress JSON-RPC notifications during the connect work and render them in the TUI as system transcript lines, so users see the same step-by-step status the base CLI prints instead of nothing for ~1m followed by a final result.
2026-04-28 22:11:10 -07:00
Brooklyn Nicholson 7d39a45749 fix(tui): show /browser connect progress like CLI
Return CLI-style browser connect status messages from the gateway and render them in the TUI so local Chrome launch attempts are visible instead of ending in a silent delayed failure.
2026-04-28 22:11:10 -07:00
Brooklyn Nicholson 69ff114ee2 fix(browser): avoid bogus Chrome launch fallback
Detect an actual Chrome/Chromium executable before printing a manual CDP launch command, including common WSL-mounted Windows browser paths, so /browser connect does not suggest google-chrome when it is unavailable.
2026-04-28 22:11:10 -07:00
Brooklyn Nicholson f10a3df632 fix(tui): align /browser connect local CDP handling
Share Chrome CDP launch helpers between the classic CLI and TUI so default /browser connect uses loopback consistently, retries local Chrome launch, and reports a copyable manual-start command instead of claiming a dead connection.
2026-04-28 22:11:10 -07:00
Brooklyn Nicholson 88a9efdb1a fix(tui): tighten cold-start edge cases after review
Clean up the remaining review nits:

- let the deferred @hermes/ink import retry after a transient failure instead
  of memoizing a rejected promise forever
- keep memory-monitor in-flight state inside a finally so future exceptions
  cannot suppress that memory level indefinitely
- use read_raw_config for the TUI MCP cold-start probe instead of full
  load_config()
- keep input.detect_drop for explicit relative path prefixes (./ and ../)
  while preserving the no-RPC fast path for ordinary plain prompts

Tests:
- python -m py_compile tui_gateway/server.py tui_gateway/entry.py
- cd ui-tui && npm run type-check && npm run build
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
2026-04-29 00:08:34 -05:00
Brooklyn Nicholson 72a3af63d4 fix(tui): keep prompt submit off the RPC pool
A cleanup review found that adding prompt.submit to _LONG_HANDLERS made the RPC
pool own the full first-turn wait even though the handler itself already spawns
a turn thread. Keep prompt.submit inline and make it return immediately:

- look up the session without waiting
- kick the lazy agent build
- spawn a short waiter thread that blocks on agent_ready, then starts the
  existing turn dispatcher

This keeps stdin dispatch responsive, avoids occupying a bounded pool worker for
a normal chat turn, and preserves the lazy-start hydration behavior.

Tests:
- python -m py_compile tui_gateway/server.py
- cd ui-tui && npm run type-check && npm run build
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
2026-04-29 00:04:12 -05:00
Brooklyn Nicholson a2819e1820 fix(tui): address lazy startup review races
Copilot correctly flagged two concurrency windows:

- memoryMonitor could re-enter while awaiting the lazy @hermes/ink import or
  heap dump, producing duplicate imports/dumps under sustained pressure.
- _start_agent_build used a check-then-set guard without synchronization, so
  concurrent agent-backed RPCs could start duplicate agent builders.

Fix both with single-flight guards: cache the dynamic import promise and track
per-level dump in-flight state in memoryMonitor, and protect the TUI agent build
flag with a per-session lock.

Tests:
- python -m py_compile tui_gateway/server.py
- cd ui-tui && npm run type-check && npm run build
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
2026-04-28 23:54:33 -05:00
Brooklyn Nicholson 0a6ecea676 fix(tui): hydrate lazy startup panel and use animated loaders
The lazy startup panel could remain stuck on the placeholder when no first
prompt was submitted because agent construction only started from _sess(). Keep
session.create cheap, but schedule _start_agent_build shortly after returning
the placeholder so tools/skills hydrate automatically.

Also replace the ugly placeholder bar rows with compact unicode-animations
braille loaders for the tools and skills sections.

Tests:
- python -m py_compile tui_gateway/server.py
- cd ui-tui && npm run type-check && npm run build
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
2026-04-28 23:48:07 -05:00
Brooklyn Nicholson b66cbb7b4c perf(tui): defer agent construction until first prompt
Match classic CLI perceived startup behavior: show the TUI shell and composer
before constructing the full AIAgent. session.create now returns a lightweight
placeholder session with lazy=true and no longer starts _make_agent eagerly.
The first method that needs the agent triggers _start_agent_build() via _sess();
prompt.submit is routed through the RPC worker pool so that the initial wait for
agent construction does not block the stdio dispatcher.

The intro panel renders skeleton rows for tools/skills while the real
session.info payload is absent, then hydrates to the real tools/skills panel once
AIAgent initialization completes. Also skip the startup /voice status probe and
avoid the input.detect_drop RPC for ordinary plain-text prompts to keep early
startup/first-submit paths cheap.

Measurements on macOS Terminal.app:
- Previous full ready p50 after earlier PR commits: ~1537ms
- Lazy skeleton panel p50: ~794ms
- Original baseline full ready p50: ~1843ms

So the visible startup surface is now ~743ms faster than the prior PR state and
~1.05s faster than the original baseline. First prompt still pays the same agent
construction cost if it races the background/skeleton state, matching classic
CLI's deferred behavior.

Tests:
- python -m py_compile tui_gateway/server.py
- cd ui-tui && npm run type-check && npm run build
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
2026-04-28 23:32:02 -05:00
Teknium 1d4218be56 feat(review): active-update bias, loaded-skill-first, support-file variants (#17213)
The background skill-review prompts (_SKILL_REVIEW_PROMPT and the **Skills**
half of _COMBINED_REVIEW_PROMPT) steered the reviewer toward passive
behavior — most passes concluded 'Nothing to save.' even when the session
produced real lessons. User-preference corrections (style, format,
legibility, verbosity) were especially lost: they were read as memory
signals only, so skills never carried the fix.

This rewrite changes the stance:

- **Active-update bias.** The reviewer now treats inaction as a missed
  learning opportunity. 'Nothing to save.' remains an explicit escape
  but is no longer framed as the most-common outcome.

- **User-preference corrections are first-class skill signals.** Style,
  tone, format, legibility, verbosity complaints — and the actual
  phrasings users use ('stop doing X', 'this is too verbose', 'I hate
  when you Y', 'remember this') — now warrant patching the skill that
  governs the task, not just writing to memory.

- **Loaded-skill-first preference order.** When a skill was loaded via
  /skill-name or skill_view during the session, the reviewer patches
  THAT one first. It was in play; it's the right place.

- **Four-step ladder: patch-loaded → patch-umbrella → support-file →
  create.** Support files are explicitly enumerated as three kinds:
    * references/<topic>.md — session-specific detail OR condensed
      knowledge banks (quoted research, API docs excerpts, domain notes)
    * templates/<name>.<ext> — starter files to copy and modify
    * scripts/<name>.<ext>  — statically re-runnable actions

- **Name-veto for CREATE.** New skill names MUST be class-level — no PR
  numbers, error strings, codenames, library-alone names, or session
  artifacts ('fix-X / debug-Y / audit-Z-today'). If the proposed name
  only fits today's task, fall back to one of the patch/support-file
  options.

- **Memory scope clarified.** 'who the user is and what the current
  situation and state of your operations are' — MEMORY.md is
  situational/state, USER.md is identity/preferences.

- **Curator handoff.** Reviewer flags overlap; the background curator
  handles consolidation at scale. Single-session reviewer doesn't
  attempt umbrella-rebalancing.

Tests: tests/run_agent/test_review_prompt_class_first.py upgraded to
assert the new behavioral contracts (active bias, user-correction
signals, loaded-skill-first, support-file kinds, name-veto, memory
framing, curator handoff). 17 tests, all pass.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 21:11:48 -07:00
Brooklyn Nicholson c4db1ce08c skills: add pretext creative-demos skill
Adds a 'pretext' skill under skills/creative/ for building cool browser
demos with @chenglou/pretext — the 15KB DOM-free text-layout library by
Cheng Lou.

The skill documents pretext as a creative primitive (not plumbing): text
flowing around obstacles, text-as-geometry games, proportional ASCII
surfaces, shatter/particle typography, editorial multi-column, kinetic
type, and multiline shrink-wrap. Each pattern pairs with copy-pasteable
snippets in references/patterns.md.

Two single-file HTML templates, both verified in a browser:

  templates/hello-orb-flow.html
    Minimal starter: long paragraph flows around a mouse-tracked orb
    using layoutNextLineRange + a per-row corridor-width function.

  templates/donut-orbit.html
    Full 3D Sloane torus with orbit controls (drag to rotate, scroll to
    zoom, idle auto-rotate). Each 'luminance pixel' is a real grapheme
    sampled in reading order from a prose corpus via pretext's
    prepareWithSegments + layoutWithLines + Intl.Segmenter. Amber-on-
    black CRT aesthetic, z-buffer keyed by screen cell, 60fps.

Related skills: p5js, claude-design, excalidraw, architecture-diagram.
2026-04-28 23:09:52 -05:00
Teknium 8c892c1453 refactor(redact): canonical mask_secret helper; fix status.py DIM drift (#17207)
Three modules independently implemented the same "preserve head+tail of
a secret, mask the middle" logic with slightly different behaviors that
had started to drift:

  hermes_cli/config.py redact_key  — 12-char floor, 4+4, DIM '(not set)'
  hermes_cli/status.py redact_key  — 12-char floor, 4+4, plain '(not set)'  ← drift
  hermes_cli/dump.py _redact       — 12-char floor, 4+4, empty string

The visible bug: 'hermes status' displayed the '(not set)' placeholder
in plain text while 'hermes config' showed it in dim text. Same concept,
inconsistent UI.

Introduces mask_secret() in agent/redact.py as the canonical helper,
with head/tail/floor/placeholder/empty kwargs. The three call sites
become one-line wrappers that differ only in the 'empty' handling:

  config.redact_key  → mask_secret(k, empty=color('(not set)', Colors.DIM))
  status.redact_key  → mask_secret(k, empty=color('(not set)', Colors.DIM))
  dump._redact       → mask_secret(v)  # empty → ''

agent.redact._mask_token (log redactor, different policy: 18-char floor,
6+4 visible, '***' on empty) also ports to mask_secret but retains its
own empty-case handling to preserve the historical '***' return.

Net: the three display-time redactors now agree on formatting, the
canonical helper lives in one place, and future tweaks (e.g. adding
bullet-point masking, changing the head/tail widths) happen once.

Verified:
- 3/3 tests/hermes_cli/test_web_server.py::TestRedactKey pass
- 89/89 agent/tests/test_redact.py + tests/tools/test_browser_secret_exfil.py
  + tests/hermes_cli/test_redact_config_bridge.py pass
- Live 'hermes status', 'hermes config', 'hermes dump' all render the
  same way they did before (verified against actual env with real
  keys: OpenRouter, Firecrawl, Browserbase, FAL, Tinker all show
  'prefix...suffix'; Kimi shows '***' at <12 chars; unset shows
  '(not set)' uniformly).

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 21:04:35 -07:00
Brooklyn Nicholson 9e398e1809 perf(tui): avoid importing classic CLI during tool discovery
TUI session readiness was still laggy after the gateway-ready fixes. Profiling
session.create -> session.info showed the slow phase is background AIAgent
construction (~1.1s). A cProfile run of tui_gateway.server::_make_agent showed
model_tools/tool discovery importing tools.code_execution_tool, whose
module-level EXECUTE_CODE_SCHEMA calls _get_execution_mode(), which imported
cli.CLI_CONFIG.

That pulled the classic interactive CLI stack (prompt_toolkit/Rich and REPL
setup) into every agent startup path, including hermes --tui where it is not
used. Replace that with hermes_cli.config.read_raw_config(), which is cached and
reads only the raw code_execution section. Existing defaults still apply when
the key is absent.

Measurements on macOS Terminal.app:
- import run_agent: ~466ms -> ~347ms
- model_tools import: ~418ms -> ~272ms
- _make_agent: ~1452ms -> ~1239ms
- session.create -> session.info: ~1167ms -> ~999ms
- full hermes --tui ready p50: ~1655ms -> ~1537ms

Tests:
- scripts/run_tests.sh tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
2026-04-28 22:42:17 -05:00
brooklyn! 6e9691ff12 Merge pull request #17237 from NousResearch/bb/tui-paste-watchdog
fix(tui): stabilize sticky prompts and paste recovery
2026-04-28 20:22:44 -07:00
Brooklyn Nicholson 10ad7006b6 fix(tui): use paste timeout when rearming paste watchdog
Match the buffered-stdin rearm cadence to IN_PASTE state so large pastes do not spin the normal escape timeout while waiting for readable data to drain.
2026-04-28 22:21:44 -05:00
Brooklyn Nicholson f542d17b00 style(tui): apply npm run fix
Run the TUI lint autofix and formatter on the PR branch after the sticky prompt and paste recovery changes.
2026-04-28 22:18:26 -05:00
Brooklyn Nicholson d7ae8dfd0a style(tui): remove steer queued emoji
Keep the /steer acknowledgement plain text so it reads like the rest of the TUI status copy.
2026-04-28 22:15:57 -05:00
Brooklyn Nicholson ce2cc7302e fix(tui): stabilize sticky prompt tracking
Keep the latest prompt sticky while the viewport is in live assistant output beyond history, and clear stale sticky state at the real bottom using fresh scroll height.
2026-04-28 22:10:40 -05:00
Brooklyn Nicholson afb20a1d67 fix(tui): recover from stuck paste mode
Prevent unterminated bracketed paste input from swallowing future keystrokes, and avoid rendering an empty Thinking panel before reasoning arrives.
2026-04-28 22:06:27 -05:00
Austin Pickett e4120d1e6d Merge remote-tracking branch 'origin/main' into fix/markdown
Made-with: Cursor

# Conflicts:
#	ui-tui/src/components/markdown.tsx
2026-04-28 22:01:02 -04:00
Teknium cd7150a195 perf(approval): precompile DANGEROUS_PATTERNS and HARDLINE_PATTERNS (#17206)
detect_dangerous_command() and detect_hardline_command() were calling
re.search(pattern, text, re.IGNORECASE | re.DOTALL) inline — Python's
re._cache (512 patterns) amortizes compile cost on the warm path, but:

  1. The first terminal() call per process pays the full compile fan-out
     for all 59 patterns (12 HARDLINE + 47 DANGEROUS). Measured at
     ~2.6 ms per detect_dangerous_command() call after re.purge().
  2. The re._cache is LRU — unrelated regex work elsewhere in the agent
     (response parsing, text normalization, etc.) can evict our patterns
     and silently re-compile them on the next terminal() call.

Precompiling at module load eliminates both costs:

  detect_dangerous_command:
    cold  2.613 ms  →  0.298 ms   (-88%)
    warm  0.042 ms  →  0.004 ms   (-90%)
  detect_hardline_command:
    cold  ~0.6 ms   →  0.006 ms
    warm  0.011 ms  →  0.002 ms

Savings are per terminal() call. Agents with heavy terminal use see
compound savings; the bigger value is the stability guarantee (no
re._cache eviction can silently re-introduce the 2.6 ms cold cost
mid-session).

Implementation:
- HARDLINE_PATTERNS_COMPILED and DANGEROUS_PATTERNS_COMPILED built at
  module load from the existing (pattern, description) tuples, using
  shared _RE_FLAGS = re.IGNORECASE | re.DOTALL.
- detect_* functions now iterate the compiled list and call pattern_re.search(text).
- Original HARDLINE_PATTERNS and DANGEROUS_PATTERNS lists kept as-is
  (other code in the file uses them for key derivation /
  _PATTERN_KEY_ALIASES).

Verified:
- 160/161 tests/tools/test_approval*.py pass (1 pre-existing heartbeat
  test flake on main).
- 349/349 tests/tools/ 'approval or terminal or dangerous' pass.
- Live hermes chat smoke: 3 benign terminal commands + 1 rm -rf /tmp/
  (clarify prompt fired — approval path still works) + 1 sudo (sudo
  password prompt fired — DANGEROUS pattern match still works). 23
  log lines in the smoke window, zero errors.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 18:44:14 -07:00
Austin Pickett 3379f88ea4 docs: clarify wrapForFrac and streaming math-fence rationale
Address two Copilot review comments on PR #17175.

- `wrapForFrac` doc said "additive operators or whitespace" but the
  implementation also matches `*` and `/`. The wider behaviour is the
  one we want (nested products and fractions need parens to disambiguate
  inline `/`), so the doc is updated to match instead of tightening the
  regex.

- `fenceOpenAt` was flagged as "overly conservative" vs. `markdown.tsx`,
  which falls back to paragraph rendering for unclosed `$$` openers.
  Mirroring that fallback in the streaming chunker would prematurely
  commit a paragraph rendering of the unclosed opener to the monotonic
  stable prefix, where it would be frozen and become wrong the moment
  the closer streams in. The asymmetry is deliberate; document why so
  it isn't "fixed" again later.

Made-with: Cursor
2026-04-28 21:43:32 -04:00
Teknium adef1f33ab chore(release): map scott@scotttrinh.com -> scotttrinh (#17203)
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 18:28:49 -07:00
Teknium fe295f9836 docs(hooks): tutorial — build a BOOT.md startup checklist (#17202)
Replace the removed built-in boot-md hook (#17093) with a how-to that
shows users how to wire up the same behavior themselves via the hooks
system. Uses _resolve_gateway_model() + _resolve_runtime_agent_kwargs()
so the example works against custom endpoints and OAuth providers,
not just the aggregator defaults that the old built-in silently assumed.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 18:27:48 -07:00
Scott Trinh fd943461ca fix(doctor): accept catalog provider aliases
Validate configured providers against both Hermes runtime provider ids and
catalog-normalized provider ids. This keeps providers like ai-gateway from
being rejected after catalog resolution maps them to models.dev ids.

Keep credential checks and vendor-slug warnings anchored to the runtime id
so doctor reports actionable provider names in follow-up diagnostics.
2026-04-28 18:27:42 -07:00
Austin Pickett cb039ac000 fix: account for latex 2026-04-28 21:20:43 -04:00
Teknium 9f004b6d94 perf(tools): memoize get_tool_definitions + TTL-cache check_fn results (#17098)
Two amplifying optimizations to per-turn overhead in the gateway:

1. get_tool_definitions() memoization (model_tools.py)
   Keyed on (frozenset(enabled), frozenset(disabled),
   registry._generation, config.yaml mtime+size). Only active when
   quiet_mode=True (which is every hot-path caller — gateway,
   AIAgent.__init__); quiet_mode=False keeps the existing print side
   effects. Cached path returns a shallow-copy list sharing read-only
   schema dicts.

   Measured: 7.5 ms → 0.01 ms per call (~750× speedup). Gateway
   constructs fresh AIAgent per message, so this saves ~7 ms/turn before
   any LLM work.

2. check_fn() TTL cache (tools/registry.py)
   check_fn callables like check_terminal_requirements probe external
   state (Docker daemon, Modal SDK, playwright binary). For a long-lived
   process, hitting them on every get_definitions() pass was pure waste
   — external state changes on human timescales. 30 s TTL so env-var
   flips (hermes tools enable X) propagate within a turn or two without
   explicit invalidation.

   Measured: first call 7.5ms → 1.6ms (check_fn probes now dominate);
   subsequent calls ~0.01ms via the upstream memoization.

Invalidation surface:
- registry._generation bumps on register/deregister/register_toolset_alias,
  invalidating the memoized definitions automatically.
- config.yaml mtime in the cache key captures user-visible config edits
  affecting dynamic schemas (execute_code mode, discord allowlist).
- invalidate_check_fn_cache() exposed for explicit flushes (e.g. after
  hermes tools enable/disable).
- tests/conftest.py autouse fixture clears both caches before every test
  so env-var monkeypatches don't see stale results.

Also fixes a regression from PR #17046 that I missed:
- tools/web_tools.py — Firecrawl was removed from module scope by the
  lazy import, breaking 8 tests that patch 'tools.web_tools.Firecrawl'.
  Applied the same _FirecrawlProxy pattern used in auxiliary_client/
  run_agent for OpenAI (module-level proxy that looks like the class
  but imports the SDK on first call/isinstance; patch() replaces the
  attribute as usual).

Verified:
- 49/49 tests/tools/test_web_tools_config.py pass (was 8 failing on main)
- 68/68 tests/tools/test_homeassistant_tool.py pass (was 1 failing in
  the full suite due to check_fn TTL cross-test pollution; fixed by
  the autouse fixture)
- 3887/3895 tests/tools/ (8 pre-existing fails: 2 delegate, 1 mcp
  dynamic discovery, 5 mcp structured content — all confirmed on main)
- 2973/2976 tests/agent/ + tests/run_agent/ (3 pre-existing fails)
- 868/868 tests/run_agent/ (excluding test_run_agent.py which has
  pre-existing suite-level issues)
- Live smoke: 2 turns + /model switch + tool calls, zero errors in
  agent.log session window.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 18:20:17 -07:00
Brooklyn Nicholson 0399d4b976 perf(tui): shave ~190ms off hermes --tui cold start
Two targeted fixes on the critical path from `hermes --tui` launch to
`gateway.ready`:

1. **Defer `@hermes/ink` import in memoryMonitor.ts.** The static top-level
   import dragged the full ~414KB Ink bundle (React + renderer + all
   components/hooks) onto the critical path *before* `gw.start()` could
   spawn the Python gateway — serialising ~155ms of Node work in front of
   it on every launch. `evictInkCaches` only runs inside the 10-second
   tick under heap pressure, so it moves to a lazy dynamic import. First
   tick hits the ESM cache because the app entry has long since imported
   `@hermes/ink`.

2. **Gate `tools.mcp_tool` import on config in tui_gateway/entry.py.**
   Importing the module transitively pulls the MCP SDK + pydantic + httpx
   + jsonschema + starlette formparsers (~200ms). The overwhelming
   majority of users have no `mcp_servers` configured, so this runs for
   nothing. A cheap `load_config()` check (~25ms) skips the 200ms import
   when no servers are declared, with a conservative fallback to the old
   behaviour if the config probe itself fails.

## Measurements (macOS Terminal.app, Apple Silicon, n=12)

| Metric                     | Before (p50) | After (p50) | Δ        |
|----------------------------|--------------|-------------|----------|
| Python gateway boot alone  | 252–365ms    | 105–151ms   | −180ms   |
| `hermes --tui` banner paint | 686ms        | 665ms       | −21ms    |
| `hermes --tui` → ready      | **1843ms**   | **1655ms**  | **−188ms (−10.2%)** |
| `hermes --tui` → ready p90  | 1932ms       | 1778ms      | −154ms   |
| stdev (ready)              | 126ms        | 83ms        | also more consistent |

## Tests

- `scripts/run_tests.sh tests/tui_gateway/ tests/tools/test_mcp_tool.py`:
  195 passed.  (The one pre-existing failure in
  `test_session_resume_returns_hydrated_messages` reproduces on main —
  unrelated, it's a mock-DB kwarg mismatch.)
- `ui-tui` vitest: 430 tests, all pass.
- `npm run type-check` in ui-tui: clean.

## Notes

- Node-side first paint ("banner") didn't move meaningfully because that
  latency is dominated by Ink's render pipeline + React mount, not by
  which imports load first.
- The win shows up entirely in the time from banner to `gateway.ready`
  — exactly where we expected it, since both fixes shorten the Python
  gateway's boot path or let it overlap more with Node startup.
- No user-visible behaviour change. Memory monitoring still fires every
  10s; MCP still works when `mcp_servers` is configured.
2026-04-28 19:42:31 -05:00
brooklyn! 188eaa57c4 fix(tui): honor documented mouse_tracking config key (#17188)
* fix(tui): honor documented mouse_tracking config key

The TUI runtime was reading display.tui_mouse while docs and user-facing
examples pointed users at display.mouse_tracking. That made persistent
mouse-disable config look like a no-op for users trying to restore native
terminal selection/copy behavior on Linux/SSH/tmux terminals.

Use display.mouse_tracking as the canonical key, keep display.tui_mouse as
a legacy fallback, and have /mouse write the documented key. Both gateway
config.get and client-side config sync now share the same precedence: the
canonical key wins, then the legacy key, then default on.

* review(copilot): align mouse tracking config coercion

- Load gateway config once before deriving display.mouse_tracking state.
- Use key-presence precedence on the TUI client too, so canonical
  mouse_tracking wins over legacy tui_mouse even when the value is null.
- Treat numeric 0 as disabled on both gateway and client, matching the
  existing string "0" handling.
- Widen ConfigDisplayConfig mouse fields because config.get full returns raw
  YAML, not normalized booleans.
2026-04-28 17:39:07 -07:00
brooklyn! 6b09df39be fix(tui): restore macOS copy behavior and theme polish (#17131)
This PR groups the TUI fixes that restore macOS Terminal usability and clean up the theme/composer regressions:

- copy transcript selections on macOS drag-release so Terminal.app users can copy while mouse tracking is enabled
- copy composer selections on macOS drag-release; composer selection is internal to TextInput and does not use the global Ink selection bus
- keep IDE Cmd+C forwarding setup macOS-only, and make keybinding conflict checks respect simple when-clause overlap/negation
- force truecolor before chalk initializes (unless NO_COLOR / FORCE_COLOR / HERMES_TUI_TRUECOLOR opt-outs apply) so the default banner keeps its gold/amber/bronze gradient in Terminal.app
- move TUI surfaces onto semantic theme tokens and preserve skin prompt symbols as bare tokens with renderer-owned spacing
- render focused placeholders as dim hint text in TTY mode instead of inverse/selected-looking synthetic cursor text
2026-04-28 18:47:14 -05:00
brooklyn! a9efa46b69 Merge pull request #17174 from NousResearch/bb/nix-web-hash-refresh
fix(nix): refresh web/ npm-deps hash to unblock main builds
2026-04-28 16:45:57 -07:00
Brooklyn Nicholson b2f936fd37 fix(nix): treat transient magic-cache throttling as skip in fix-lockfiles
Round 1 of #17174 hit `nix-lockfile-check` failure.  Root cause was
NOT a stale hash — the primary `nix (ubuntu-latest)` and
`nix (macos-latest)` builds passed.  GitHub's Magic Nix Cache returned
HTTP 418 (rate-limited / throttled) mid-run, so the rebuild bailed
with `some outputs of '/nix/store/...-npm-deps.drv' are not valid,
so checking is not possible` — no `got:` line for the script to
extract.

The script then incorrectly treated this as 'build failed with no
hash mismatch' and exited 1, breaking the lint on every PR whenever
the cache is throttled.

Now we recognize the throttling/cache-disabled signature and skip
that entry with a warning.  A real stale hash still surfaces in the
primary `.#$ATTR` build (separate CI job), so we don't lose
coverage.
2026-04-28 18:39:35 -05:00
Brooklyn Nicholson ec11aa64ee fix(nix): refresh web/ npm-deps hash to unblock main builds
`web/package-lock.json` was updated by the design-system refactor
(merged via #17007 + follow-ups: spinner / select / badges / buttons)
without bumping `nix/web.nix::npmDeps.hash`, breaking nix builds on
every PR + main since 2026-04-28T18:46.

Hash sourced from the actual `Check flake` failure output:
  specified: sha256-AahWmJ9gDQ9pMPa1FYwUjYdO2mOi6JM9Mst27E0vp68=
  got:       sha256-+B2+Fe4djPzHHcUXRx+m0cuyaopAhW0PcHsMgYfV5VE=

Standalone single-file fix so it can land fast and clear nix on
every other open PR.
2026-04-28 18:21:09 -05:00
brooklyn! 7d81d76366 feat(tui): pluggable busy-indicator styles (#13610) (#17150)
* feat(tui): pluggable busy-indicator styles (kaomoji/emoji/unicode/ascii)

The status-bar `FaceTicker` rotated through wide-and-variable kaomoji
glyphs (`(。•́︿•̀。)`, `( ͡° ͜ʖ ͡°)`, …) every 2.5s.  Real display widths range
from ~5 to ~16 columns, so the rest of the bar (cwd, ctx %, voice,
bg counter) shifted on every cycle.  Padding the verb alone (#17116)
helped but didn't address the dominant jitter source — the glyph
itself.

Add four indicator styles, configurable + hot-swappable:

* `kaomoji` (default — preserves the existing vibe; verb is now
  pad-stable so the only width churn left is the kaomoji itself).
* `emoji`  — single 2-col emoji frame (`⚕ 🌀 🤔  🍵 🔮`).
* `unicode` — `unicode-animations` braille spinner (1-col, smooth).
* `ascii`  — `| / - \` (1-col, max compat).

Wires:

* `display.tui_status_indicator` in `DEFAULT_CONFIG` (default
  `kaomoji`).
* New JSON-RPC `config.set/get indicator` keys, narrow allow-list.
* `applyDisplay` reads the field and patches `UiState.indicatorStyle`,
  so the existing `mtime` poll picks up `~/.hermes/config.yaml` edits
  within ~5s without a TUI restart.
* `/indicator [style]` slash command (alias `/indicator-style`,
  subcommand completion `kaomoji|emoji|unicode|ascii`).  Bare form
  shows the current style; setter fires `config.set` and
  optimistically `patchUiState({ indicatorStyle })` so the live TUI
  swaps immediately, matching the `/skin` UX.
* `CommandDef("indicator", ..., subcommands=...)` so classic CLI
  autocomplete + TUI `complete.slash` both surface it.
* `FaceTicker` decouples spinner cadence from verb cadence — the
  glyph runs at the spinner's authored interval (or `FACE_TICK_MS`
  for kaomoji), the verb stays on the original 2.5s cycle, and both
  re-arm cleanly when style changes.

Tests:

* `normalizeIndicatorStyle` rejects unknown / non-string input.
* `applyDisplay → tui_status_indicator` covers fan-out + fallback.
* `/indicator <style>` hot-swaps `UiState.indicatorStyle` after a
  successful `config.set`.
* `/indicator sparkle` rejects with the usage hint and never hits
  the gateway.
* Slash-parity matrix gets `'/indicator'` → `config.get`.

Validation:
  cd ui-tui && npm run type-check — clean; npm test --run — 398/398.
  scripts/run_tests.sh tests/test_tui_gateway_server.py
  tests/hermes_cli/test_commands.py — 220/220.

* chore(tui): drop /indicator-style alias to declutter autocomplete

* fix(tui): drop verb-width pad — /indicator handles glyph jitter directly

* fix(tui): unicode indicator style hides the verb (cleanest option)

* refactor(tui): single source of truth for INDICATOR_STYLES; cleaner error format

Round 1 Copilot review on PR #17150:

- Exported `INDICATOR_STYLES` const tuple from `interfaces.ts`;
  `IndicatorStyle` union type is derived from it. `useConfigSync`
  builds its validation Set from the tuple, and `session.ts` uses it
  for both the usage hint and the runtime allow-list — adding/removing
  a style now touches one line.
- Backend `config.set indicator` error message: switched
  `sorted(allowed)` list repr to `pick one of ascii|emoji|kaomoji|unicode`
  (matches the TUI usage hint), and reports the normalized `raw`
  instead of the original `value`. Backend allowed tuple now has a
  comment pointing back at `INDICATOR_STYLES` so the two stay aligned.

Note: kept the verb portion unpadded per design intent — fixed-width
padding was the exact UX the `/indicator` command was added to remove.
Stable width comes from the glyph; verbs cycling is part of the kawaii
aesthetic. Reply on the verb thread will explain.

* fix(tui): drop type collapse + gate verb timer + DEFAULT_INDICATOR_STYLE

Round 2 Copilot review on PR #17150:

- `tui_status_indicator?: 'ascii' | ... | string` collapses to `string`
  in TS — consumers got no narrowing. Documented as plain `string` with
  a comment about runtime validation via `normalizeIndicatorStyle`.
- `FaceTicker` always started a 2.5s verb interval, even for the
  `unicode` style which hides the verb entirely. Now gated on
  `showVerb` from `renderIndicator` — `unicode` stays calm.

Pre-emptive self-review (avoid round 3):
- Three call sites duplicated the literal `'kaomoji'` default
  (uiStore, normalizeIndicatorStyle, slash command). Added
  `DEFAULT_INDICATOR_STYLE` to interfaces.ts and threaded it through
  so changing the default touches one line.

* fix(tui-gateway): normalize config.get indicator output to match TUI render

Round 4 Copilot review on PR #17150: `config.get` for `indicator`
returned the raw `display.tui_status_indicator` value without
validation, so a hand-edited config.yaml with stray casing or an
unknown style would leave `/indicator` printing one thing while
the TUI rendered the kaomoji default (frontend's
`normalizeIndicatorStyle` does this normalization on receive).

Lifted the allow-list to module scope as `_INDICATOR_STYLES` /
`_INDICATOR_DEFAULT`, reused by both `config.set` and `config.get`.
Comment notes the alignment with `INDICATOR_STYLES` /
`DEFAULT_INDICATOR_STYLE` in interfaces.ts so adding/removing a
style is a one-line change on each end.

Tests cover: known value verbatim, casing/whitespace normalize,
unknown→default, unset→default.

* fix(tui-gateway): preserve falsy-input diagnostics in config.set indicator error

Round 5 Copilot review on PR #17150: `raw = str(value or "").strip().lower()`
collapsed any falsy non-string (`0`, `False`, `[]`) to empty string,
so the error message read `unknown indicator: ` with nothing after —
losing the original input.

Switched to `("" if value is None else str(value)).strip().lower()`
so only `None` (the genuine 'no value' case) becomes blank.  Used
`{raw!r}` in the error so the diagnostic is unambiguous (`'0'` vs `0`).

Tests:
- known-value happy path (`'EMOJI'` → `'emoji'`)
- falsy non-string inputs (`0` / `False` / `[]`) surface meaningfully
- `None` keeps the blank-repr error
2026-04-28 18:19:16 -05:00
Austin Pickett c3d39feb3a feat(latex): latex in tui 2026-04-28 19:08:11 -04:00
brooklyn! 258efb2575 feat(tui): expand light-terminal auto-detection (HERMES_TUI_THEME, background hex) (#17113)
* feat(tui): expand light-terminal auto-detection (HERMES_TUI_THEME, BG hex)

Modern terminals (Ghostty, Warp, iTerm2) don't set COLORFGBG, so the
auto-light path was effectively COLORFGBG-only and silently broken for
many users.  Two pragmatic additions, both opt-in, plus a clearer
priority chain:

1. **`HERMES_TUI_THEME=light|dark`** as a symmetric explicit override.
   The existing `HERMES_TUI_LIGHT` is fine but reads as boolean noise;
   a named theme env var matches `display.skin` muscle memory.

2. **`HERMES_TUI_BACKGROUND` hex/rgb hint.**  Lets advanced users
   (or a future OSC11 query helper that caches the answer) state a
   ground-truth background colour.  Decoded to Rec. 709 luma; ≥ 0.6
   counts as light.

Priority order is now fully ordered and explainable:
  1. `HERMES_TUI_LIGHT` (1/0/true/false/on/off).
  2. `HERMES_TUI_THEME=light|dark`.
  3. `HERMES_TUI_BACKGROUND` luminance.
  4. `COLORFGBG` last field — light slots 7/15 → light, 0–15 → dark
     (authoritative when set, so the new TERM_PROGRAM path can never
     stomp on a terminal that already volunteered a dark answer).
  5. `TERM_PROGRAM` allow-list — empty by default.  The slot is left
     in place because folks asked for it but populating it risks
     wrongly flipping users on Apple_Terminal / iTerm2 dark profiles
     to light.  Easy to add per terminal once we have signal.

Tests: 5 new cases in `theme.test.ts` covering theme env, background
hex (3- and 6-char), invalid hex falling through, and COLORFGBG taking
precedence over the future allow-list.

Validation: `npm run type-check` clean, `npm test --run` 392/392.

* review(copilot): tighten theme detection comments + drop unnecessary cast

* review(copilot): strict hex regex so partial garbage doesn't slip into luminance

* test(tui): make TERM_PROGRAM allow-list injectable so precedence is provable

Copilot review on PR #17113: `LIGHT_DEFAULT_TERM_PROGRAMS` is empty
in production, so the prior assertion would have passed even if
`detectLightMode` ignored `COLORFGBG` entirely.  That defeats the
test's purpose.

`detectLightMode` now takes the allow-list as an optional second
argument (defaults to the production set).  The test injects a set
containing `Apple_Terminal`, asserts the allow-list alone WOULD
return light, then asserts `COLORFGBG: '15;0'` overrides it — the
precedence rule is now exercised, not assumed.

* fix(tui): COLORFGBG empty-trailing-field falls through; isolate DEFAULT_THEME tests

Round 2 Copilot review on PR #17113:

1. `Number(colorfgbg.split(';').at(-1))` returns 0 for an empty trailing
   field (e.g. `COLORFGBG='15;'` → bg===0), which would have looked
   like an authoritative dark slot and incorrectly blocked the
   TERM_PROGRAM allow-list.  Added a `/^\d+$/` guard before coercion;
   non-numeric trailing fields now fall through.

2. Fixed the misleading '0–6 / 8–15 ranges are dark' comment — the
   block returns true for bg===15, so the range is actually 0–6 / 8–14.

3. `DEFAULT_THEME` is computed from `process.env` at module-load.
   A developer shell with `HERMES_TUI_THEME=light` (or a bright
   `HERMES_TUI_BACKGROUND`) would flip it and break local tests.
   The DEFAULT_THEME describe blocks now sterilize the relevant env
   vars + dynamically import theme.ts (vi.resetModules pattern from
   platform.test.ts).  fromSkin tests compare against DARK_THEME
   directly to decouple them from ambient env.

* test(tui): isolate ALL env-coupled theme symbols, not just DEFAULT_THEME

Round 3 Copilot review on PR #17113: the static top-level imports of
`fromSkin`, `DARK_THEME`, `LIGHT_THEME` evaluated theme.ts before
`importThemeWithCleanEnv` had a chance to clean the env. Because
`fromSkin` closes over `DEFAULT_THEME`, an ambient `HERMES_TUI_THEME=light`
or bright `HERMES_TUI_BACKGROUND` would still flip the base palette
and cause local-only failures.

Removed the static import entirely.  Every test now obtains its theme
symbols via `importThemeWithCleanEnv`, including `detectLightMode`
(for consistency, even though it takes env as a parameter).
`fromSkin` tests assert against the cleaned `DEFAULT_THEME` from the
same dynamic import — preserves the actual contract (skins extend the
ambient base palette) without coupling the test to dev-shell state.

Verified by running with HERMES_TUI_THEME=light + HERMES_TUI_BACKGROUND=#ffffff:
all 20 theme tests still pass.

Self-review (avoid round 4):
- Audited other test files importing DEFAULT_THEME (syntax.test.ts,
  streamingMarkdown.test.ts, constants.test.ts) — all just pass it as
  a parameter or assert palette property existence (works on both
  light + dark), so no env coupling there.
2026-04-28 18:02:06 -05:00
brooklyn! 1e326c686d fix(tui-gateway): harden stdio transport against half-closed pipes + SIGTERM races (#17118)
* fix(tui-gateway): harden stdio transport against half-closed pipes + SIGTERM races

`tui_gateway` reports `tui_gateway_crash.log` traces where the main
thread sits in `sys.stdin` while a worker holds `_stdout_lock` mid-
flush, and SIGTERM then calls `sys.exit(0)` while the lock is still
held — the interpreter shutdown stalls behind the wedged write.

Two narrowly scoped hardenings:

**`tui_gateway/transport.py`**

* Move JSON serialisation outside the lock — long messages no longer
  block sibling writers while we serialise.
* Treat `BrokenPipeError`, `ValueError` ("I/O on closed file") and
  generic `OSError` from both `write` and `flush` as "peer is gone":
  return `False` instead of bubbling, matching what `write_json`'s
  callers in `entry.py` already expect.
* Split `flush` into its own try block so a stuck flush never strands
  a partial write or holds the lock indefinitely on its way out.
* Optional `HERMES_TUI_GATEWAY_NO_FLUSH=1` env knob to skip explicit
  `flush()` entirely on environments where a half-closed read pipe
  produces an indefinite kernel-level block.  Default unchanged.

**`tui_gateway/entry.py`**

* `_log_signal` now spawns a 1-second daemon timer that calls
  `os._exit(0)` if the orderly `sys.exit(0)` path is itself stuck
  behind a wedged worker.  Atexit handlers run inside the grace
  window when they can; the timer is the safety net so a deadlocked
  flush no longer strands the gateway process.

Tests:

* `test_write_json_closed_stream_returns_false` — ValueError path.
* `test_write_json_oserror_on_flush_returns_false` — OSError on flush
  must not strand the lock; the write portion still landed before the
  flush failure.
* `test_write_json_no_flush_env_skips_flush` — env knob bypass.

Validation: `scripts/run_tests.sh tests/tui_gateway/test_protocol.py`
(42/42 pass; one pre-existing failure on
`test_session_resume_returns_hydrated_messages` is unrelated to this
change — same `include_ancestors` mock kwarg issue tracked elsewhere).
`scripts/run_tests.sh tests/test_tui_gateway_server.py` 90/90 pass.

* review(copilot): tighten transport hardening comments + test cleanup

* review(copilot): narrow exception capture, configurable grace, simpler no-flush test

* fix(tui-gateway): narrow ValueError to closed-stream; surface UnicodeEncodeError

Copilot review on PR #17118: `UnicodeEncodeError` is a ValueError
subclass, so a non-UTF-8 stdout (mismatched PYTHONIOENCODING / locale)
would have been silently swallowed as 'peer gone' under
`except ValueError`.  That hides a real environment bug.

Now:
- UnicodeEncodeError → log with exc_info (warning) and drop the frame
- ValueError where str(e) contains 'closed file' → peer gone, return False
- Any other ValueError → log loudly, drop frame (defensive, but visible)

Same shape applied to flush.  Adds two regression tests.

* fix(tui-gateway): reserve write() False for peer-gone; re-raise programming errors

Round 2 Copilot review on PR #17118: `Transport.write()` returning
`False` is documented as 'peer is gone', and `entry.py` reacts by
calling `sys.exit(0)`.  But the implementation also returned False
for non-IO conditions (non-JSON-safe payloads, UnicodeEncodeError,
unrelated ValueErrors), so a programming error or local env bug would
present as a clean disconnect — exactly the diagnosis pain we wanted
to eliminate.

Now:
- `json.dumps` failure → re-raises (TypeError/ValueError surfaces in crash log)
- `BrokenPipeError` → False (peer gone)
- `ValueError('...closed file...')` → False (peer gone)
- `UnicodeEncodeError` and any other ValueError → re-raise
- `OSError` → False (existing IO-failure semantics, debug-logged)

Tests updated to assert the re-raise behaviour and added a
non-serializable-payload regression test.

* fix(tui-gateway): narrow OSError to peer-gone errnos; honest test naming

Round 3 Copilot review on PR #17118:

- Docstring claimed False = peer gone, but generic OSError on write/flush
  also returned False — meaning ENOSPC/EACCES/EIO would silently exit.
  Added `_PEER_GONE_ERRNOS = {EPIPE, ECONNRESET, EBADF, ESHUTDOWN, +WSA}`
  and narrowed the OSError handlers; non-peer-gone errnos re-raise.
  Docstring now lists OSError as peer-gone branch with the errno set.
- The `_DISABLE_FLUSH` test was named after the env var but actually
  patched the module constant. Renamed it to reflect the contract being
  tested (skips flush when constant is true) AND added a real
  end-to-end test that sets the env var, reloads transport.py, and
  asserts the constant flips. Cleanup reload restores defaults so
  parallel tests stay isolated.

Self-review (avoid round 4):
- Verified TeeTransport's secondary-swallow stays intentional.
- _log_signal grace path already covered by separate tests.
2026-04-28 17:54:06 -05:00
brooklyn! af6b1a3343 fix(tui): honor display.busy_input_mode in TUI v2 (#17110)
* fix(tui): honor display.busy_input_mode in TUI v2

The TUI v2 frontend hard-coded `composerActions.enqueue(full)` whenever
`ui.busy` was true. The classic CLI and gateway adapters honor the
`display.busy_input_mode` config key (`interrupt` | `queue` | `steer`),
but Ink ignored it — sending a message during a long-running turn always
landed in the queue regardless of config. The config default is already
`interrupt` (hermes_cli/config.py), so users who explicitly opted into
that experience were silently stuck on the legacy queue path.

This wires the value through the existing config-sync surface:

* `applyDisplay` now reads `display.busy_input_mode`, defaults to
  `interrupt` (matching `_load_busy_input_mode` in tui_gateway), and
  drops it into a new `UiState.busyInputMode` field.
* `dispatchSubmission` and the queue-edit fall-through call a shared
  `handleBusyInput` helper that branches on the mode:
    * `queue`     — legacy behavior, append to the queue.
    * `steer`     — call `session.steer`; on rejection, fall back to
                    queue with a sys note.
    * `interrupt` — `turnController.interruptTurn(...)` then `send()`,
                    so the new prompt actually moves.
* Mtime polling in `useConfigSync` already re-applies `config.full`, so
  flipping `display.busy_input_mode` in `~/.hermes/config.yaml` takes
  effect on the next 5s tick without restarting the TUI.

Tests:
* `applyDisplay → busy_input_mode` covers normalization + UiState fan-out.
* `normalizeBusyInputMode` mirrors the Python side's allow-list.

Validation:
* `npm run type-check` (in `ui-tui/`) — clean.
* `npm test --run` (in `ui-tui/`) — 394/394.

* review(copilot): narrow busy_input_mode type, preserve queue order on steer fallback

* review(copilot): clarify handleBusyInput comment (option, not return value)

* fix(tui): default busy_input_mode to queue in TUI (CLI keeps interrupt)

In a full-screen TUI users typically author the next prompt while the
agent is still streaming, so an unintended interrupt loses in-flight
typing.  TUI fallback now defaults to `queue`; CLI / messaging
adapters keep `interrupt` as the framework default.

Override per-config via `display.busy_input_mode: interrupt` (or
`steer`) — the normalize/wire path is unchanged, only the missing-
value branch differs from the Python default.

uiStore initial value also flipped to `queue` so first-frame render
before `config.full` lands matches the eventual normalized value.
2026-04-28 17:52:13 -05:00
brooklyn! 8d591fe3c7 fix(tui): prefer raw text over Rich-rendered ANSI in TUI message display (#17111)
`turnController.recordMessageComplete` and `recordMessageDelta` both
prioritised `payload.rendered` over `payload.text`.  `payload.rendered`
is the Rich-Console output `tui_gateway` builds for terminals that
can't render markdown themselves; the TUI already renders markdown via
`<Md>`.  Two real bugs follow:

1. **Final answer garbled when `display.final_response_markdown: render`
   is set** (#16391).  Raw ANSI escape sequences pass through into the
   React tree and the user sees overlapping coloured text instead of
   their answer.

2. **Streaming silently drops content.**  Per-delta `rendered` is an
   *incremental* Rich fragment.  The previous code did
   `this.bufRef = rendered ?? this.bufRef + text`, which on every tick
   replaced the whole accumulated buffer with the latest mid-sequence
   ANSI fragment.  Long replies arrived truncated and looked
   half-painted — easy to miss as "model is being terse" instead of a
   client bug.

Fix:

* `recordMessageComplete` now prefers `payload.text`, falling back to
  `payload.rendered` only when the gateway elected not to send any.
* `recordMessageDelta` always accumulates `text`; `rendered` is ignored
  on the streaming path entirely (Ink does its own markdown render via
  `<Md>` / `streamingMarkdown.tsx`).

Tests:

* `prefers raw text over Rich-rendered ANSI on message.complete` —
  the assistant message reflects raw markdown, not ANSI.
* `falls back to payload.rendered when text is missing` — preserves
  the legacy "no `text`, only ANSI" path used by some adapters.
* `always accumulates raw text in message.delta and ignores rendered` —
  pre-fix code would have made this assertion fail because each delta
  overwrote the buffer.

Validation: `npm run type-check` clean, `npm test --run` 392/392 pass.
2026-04-28 17:47:50 -05:00
brooklyn! 15ef11a8b8 fix(tui): make /browser connect actually take effect on the live agent (#17120)
* fix(tui): make /browser connect actually take effect on the live agent

Reports were that `/browser connect <url>` (and "changes to CDP url
don't get picked up") didn't propagate to the live agent in `--tui`,
forcing users to fall back to setting `browser.cdp_url` in
`config.yaml` and restarting.  Tracing the path on current main shows
the protocol wiring is already correct — `/browser` is registered in
`ui-tui/src/app/slash/commands/ops.ts` and dispatches `browser.manage`
through the gateway RPC, NOT the slash worker (covered by the
`browser.manage` row in `slashParity.test.ts`).  But three real gaps
left the experience flaky:

1. `cleanup_all_browsers()` ran AFTER `os.environ["BROWSER_CDP_URL"]`
   was rewritten.  `_ensure_cdp_supervisor(...)` reads the env to
   resolve its target URL, so a tool call landing in that brief window
   could re-attach the supervisor to the OLD CDP endpoint just before
   we reaped sessions, leaving the agent talking to a dead URL.
   Reorder to clean first, swap env, clean again so the supervisor
   for the default task is definitively closed.
2. `browser.manage status` reported only the env var, ignoring
   `browser.cdp_url` from config.yaml.  `_get_cdp_override()` (the
   resolver the agent itself uses) consults both — match it so
   `/browser status` answers the same question the next
   `browser_navigate` will see.  Closes a stealth bug where users
   saw "browser not connected" while their CDP URL was perfectly
   set in config.yaml.
3. `/browser disconnect` only cleared `BROWSER_CDP_URL` and reaped
   once, leaving the same swap window as connect.  Symmetrical
   double-cleanup here too.

Frontend (`ops.ts`):
* Echo "next browser tool call will use this CDP endpoint" on success
  so users see immediate confirmation that the gateway accepted the
  swap, even before any tool runs.
* Mention `browser.cdp_url` in `config.yaml` in the usage hint and
  the not-connected status line.  Persistent config is the correct
  fix for some terminal-multiplexer / sub-agent flows where env
  inheritance is unreliable; surfacing it makes that workaround
  discoverable.

Tests (4 new, all hermetic):
* `status` returns the resolved URL when only `browser.cdp_url` is
  set in config.yaml.
* `connect` writes env AND cleans before/after, in that order.
* `connect` against an unreachable endpoint does NOT mutate env or
  reap.
* `disconnect` removes env and cleans twice.

Validation:
  scripts/run_tests.sh tests/test_tui_gateway_server.py — 94/94 pass.
  cd ui-tui && npm run type-check — clean; npm test --run — 389/389.

* review(copilot): always defer to _get_cdp_override; normalize bare host:port

* review(copilot): collapse discovery-style CDP paths so /json/version isn't duplicated

* fix(tui): /browser status must not perform CDP discovery I/O

Copilot review on PR #17120: previous version routed through
`tools.browser_tool._get_cdp_override`, which calls
`_resolve_cdp_override` and performs an HTTP probe to /json/version
with a multi-second timeout for discovery-style URLs.  That blocks
the TUI on `/browser status` whenever the configured host is slow
or unreachable.

Status now reads env-then-config directly with no network I/O.  The
WS normalization still happens in `browser_navigate` for actual
tool calls, so behaviour-on-call is unchanged.

* fix(tui): skip /json/version probe for concrete ws://devtools/browser endpoints

Round 2 Copilot review on PR #17120: hosted CDP providers (Browserbase,
browserless, etc.) return concrete `ws[s]://.../devtools/browser/<id>`
URLs which are already directly connectable but don't serve the HTTP
discovery path.  The previous `/json/version` probe rejected these
valid endpoints with 'could not reach browser CDP'.

For `ws[s]://...` URLs whose path starts with `/devtools/browser/` we
now do a TCP-level reachability check (`socket.create_connection`)
instead of the HTTP probe.  The actual CDP handshake happens on the
next `browser_navigate` call, so we still surface unreachable hosts
as 5031 errors — just without the false negatives.

Discovery-style URLs (`http://host:port[/json[/version]]`) keep the
HTTP probe path unchanged.  Updated existing test + added two new
ones (TCP-only success, TCP unreachable → 5031).
2026-04-28 17:46:57 -05:00
brooklyn! 87d3fa6f1c feat(tui): opt-in auto-resume of the most recent session (#17130)
* feat(tui): opt-in auto-resume of the most recent session

`hermes --tui` always forges a fresh session at startup unless the user
sets `HERMES_TUI_RESUME=<id>`.  Disconnects, terminal-window crashes,
and accidental Ctrl+D therefore lose every piece of in-flight context
even though `state.db` still has the full history a `/resume` away.

Add an opt-in path that mirrors classic CLI's `hermes -c` muscle
memory: when `display.tui_auto_resume_recent: true` is set in
`~/.hermes/config.yaml`, the TUI looks up the most recent human-facing
session and resumes it instead of starting fresh.  Default off so
existing users aren't surprised; explicit `HERMES_TUI_RESUME` always
wins.

Wires:

* New `session.most_recent` JSON-RPC in `tui_gateway/server.py` that
  returns the first non-`tool` row from `list_sessions_rich`, or
  `{"session_id": null}` when none.  Uses the same deny-list as
  `session.list` so sub-agent rows can't sneak in.
* `createGatewayEventHandler.handleReady` re-ordered: explicit
  `STARTUP_RESUME_ID` first (unchanged), then conditional auto-resume
  via `config.get full → display.tui_auto_resume_recent`, then the
  legacy `newSession()` fallback.  Failures of either RPC fall back
  to `newSession()` so the path is always finite.
* Default `display.tui_auto_resume_recent: False` added to
  `DEFAULT_CONFIG` in `hermes_cli/config.py` (no `_config_version`
  bump per AGENTS.md — deep-merge handles the additive key).

Tests:

* 4 new vitest cases in `createGatewayEventHandler.test.ts` cover
  every gate-and-fallback combination (env wins, config off, config
  on with hit, config on with miss).
* 3 new pytest cases for `session.most_recent` (denied row skip,
  tool-only → null, db-unavailable → null).

Validation:
  scripts/run_tests.sh tests/test_tui_gateway_server.py — 93/93.
  cd ui-tui && npm run type-check — clean; npm test --run — 393/393.

* review(copilot): fold session.most_recent errors into null + extend ConfigDisplayConfig

* review(copilot): cover RPC-rejection fallbacks in auto-resume tests
2026-04-28 16:53:38 -05:00
brooklyn! 75d9811393 Merge pull request #17114 from NousResearch/bb/tui-table-separator
fix(tui): visually distinguish markdown table rows from prose (#15534)
2026-04-28 14:52:53 -07:00
brooklyn! e42065b1f7 fix(tui): drop stale stream events after ctrl-c interrupt (#16706)
* fix(tui): drop stale stream events after ctrl-c interrupt

Once interruptTurn() flips this.interrupted, only recordMessageDelta
short-circuited.  recordReasoningDelta/Available, recordToolStart/
Progress/Complete, and recordInlineDiffToolComplete kept populating
turnState until the python loop reached its next _interrupt_requested
check (~1s on busy turns), making it look like ctrl-c was ignored
while late "thinking" + tool calls kept landing in the UI.

Add the same interrupted guard to every stream-side recorder, and
clear the flag at startMessage() so the next turn isn't suppressed
if the previous turn never delivered message.complete.

* fix(tui): guard recordTodos against post-interrupt mutation; fake-timers in test

Copilot review on PR #16706:

1. `recordToolStart` is interruption-guarded, but `tool.start`
   handler also calls `recordTodos(payload.todos)` first — so a
   late tool.start carrying todos could still mutate `turnState.todos`
   after Ctrl-C, leaving ghost rows in the panel.  Adds the same
   `if (this.interrupted) return` early-exit to `recordTodos` so
   *all* tool.start side-effects are dropped post-interrupt.

2. The interrupt test was leaking a real `setTimeout` (interrupt
   cooldown) across test files, which could fire later and mutate
   uiStore from the wrong test context.  Wraps the test in
   `vi.useFakeTimers()` + `vi.runAllTimers()` and restores real
   timers in finally.

3. Extends the same test with a todos payload on the post-interrupt
   tool.start so we have explicit regression coverage for #1.

* fix(tui): guard pushTrail post-interrupt; harden interrupt-test cleanup

Round 2 Copilot review on PR #16706:

1. `tool.generating` events route through `pushTrail`, which was not
   interruption-guarded — late events could still write 'drafting …'
   into `turnTrail` after Ctrl-C, leaving a stale shimmer in the UI.
   Adds the same `if (this.interrupted) return` early-exit.

2. Test cleanup moved `vi.runAllTimers()` into `finally` (before
   `vi.useRealTimers()`) so a mid-test assertion failure can't leak
   the interrupt-cooldown setTimeout across other test files.

3. Replaced the misleading 'pre-interrupt todos … expected to be
   cleared by the interrupt cycle' comment with an accurate one
   reflecting current behaviour (interrupt does NOT clear todos).

4. Added an explicit assertion that a post-interrupt `tool.generating`
   event does not extend `turnTrail` — regression coverage for #1.
2026-04-28 16:51:07 -05:00
brooklyn! a830f25f71 fix(tui): surface gateway stderr tail in start_timeout activity (#17112)
* fix(tui): append gateway stderr tail to start_timeout activity

`gateway.start_timeout` previously published only `cwd` + `python`,
which made TUI startup failures hard to disambiguate.  The user saw
`gateway startup timed out · /path/to/python /repo · /logs to inspect`
with no signal whether the actual cause was a wrong python interpreter,
a missing dependency, or a config parse failure.

Plumb a 20-line stderr tail through the event so the most useful lines
land directly in the TUI activity feed, capped to the last 8 non-empty
lines for readability:

* `gatewayClient.ts` — collect `getLogTail(20)` when the readyTimer
  fires and attach it as `payload.stderr_tail`.
* `gatewayTypes.ts`  — extend the `gateway.start_timeout` event union
  with the new optional field.
* `createGatewayEventHandler.ts` — emit the trimmed lines after the
  existing `gateway startup timed out` activity entry, classified
  `error`.

Tests: regression test in `createGatewayEventHandler.test.ts` checks
that `ModuleNotFoundError` / `FileNotFoundError` lines from the tail
land in `getTurnState().activity` so they show up in the UI immediately.

Validation: `npm run type-check` clean, `npm test --run` 390/390.

* review(copilot): filter blanks before slice and cap stderr tail at 120 chars
2026-04-28 15:56:02 -05:00
Brooklyn Nicholson 50edbe6f46 review(copilot): say solid rule, not dashed 2026-04-28 15:49:35 -05:00
Brooklyn Nicholson 4689ace7cb review(copilot): clarify table-rule rationale (UTF-16 code units, not graphemes) 2026-04-28 15:49:15 -05:00
Brooklyn Nicholson 9eabc24e24 fix(tui): visually distinguish markdown table rows from prose (#15534)
Tables rendered through `<Md>` had no separator and no header weight,
so they read as a paragraph with extra whitespace.  This adds two tiny,
border-free changes that survive Ink's grapheme-approximate column
widths better than a full outline:

* Bold the header row, keeping the existing amber colour.
* Insert a dim `─`-dashed rule between the header and body rows.

We deliberately stay away from a full outline — column widths are
measured via `stripInlineMarkup(...).length`, which is grapheme-aware
but still off by a cell on East Asian wide characters and emoji-mid-
cell strings.  A header rule plus the existing 2-space column gap
gives the visual hierarchy the issue asks for without amplifying that
inaccuracy into a misaligned border.

Validation: `npm run type-check` clean, `npm test --run` 389/389.
2026-04-28 15:49:15 -05:00
Gille 0d957a8d48 fix(tui): surface mouse slash command (#17126) 2026-04-28 13:27:43 -07:00
brooklyn! 5f215b13ce fix(docker): materialize bundled TUI Ink package (#16690)
* fix(docker): materialize bundled TUI Ink package

* fix(docker): keep nested deps out of build context

* fix(docker): make TUI Ink smoke check deterministic

* test(docker): skip dockerignore assertion in partial checkouts

* fix(docker): use lockfile install for vendored Ink deps

* test(cli): expect deterministic npm ci in /update flow

* fix(docker): fall back to npm install for vendored Ink deps

* fix(docker): keep bundled Ink source for TUI runtime builds

* fix(docker): dedupe React in vendored Ink package
2026-04-28 15:11:47 -05:00
Gille 124da27767 fix(tui): handle empty bracketed paste fallback (#15594) 2026-04-28 14:30:08 -05:00
kshitijk4poor 5d2f9b5d7d fix: follow-up for salvaged PR #17061
- Remove dead _lmstudio_loaded_context attribute from run_agent.py (set
  but never read — the loaded context is pushed to context_compressor.update_model
  which is the actual consumer)
- Cache empty reasoning options with 60s TTL to avoid per-turn HTTP probe
  for non-reasoning LM Studio models. Non-empty results cached permanently.
- Extract _lmstudio_server_root(), _lmstudio_request_headers(), and
  _lmstudio_fetch_raw_models() shared helpers in models.py — eliminates
  URL-strip + auth-header + HTTP-call duplication across probe_lmstudio_models,
  ensure_lmstudio_model_loaded, and lmstudio_model_reasoning_options
- Revert runtime_provider.py base_url precedence change: preserve the
  established contract (saved config.base_url > env var > default) for all
  api_key providers
- Remove unnecessary config version bump 22→23
- Fix TUI test: relax target_model assertion to avoid module-cache flake
- AUTHOR_MAP: added rugved@lmstudio.ai → rugvedS07
2026-04-28 12:27:36 -07:00
Rugved Somwanshi 433d38da09 chore(docs): update provider docs 2026-04-28 12:27:36 -07:00
Rugved Somwanshi a0105a7f81 chore(agent): drop drift from rebasing 2026-04-28 12:27:36 -07:00
Rugved Somwanshi 01ad0aacaf fix(tui): show correct context length 2026-04-28 12:27:36 -07:00
Rugved Somwanshi fa2bee1215 fix(tui): update test for target model 2026-04-28 12:27:36 -07:00
Rugved Somwanshi 214ca943ac feat(agent): add lmstudio integration 2026-04-28 12:27:36 -07:00
Austin Pickett 7d4648461a Merge pull request #17007 from NousResearch/austin/fix/more-design-system
fix: replace all buttons for design system buttons
2026-04-28 11:46:47 -07:00
kshitijk4poor faa15772b7 chore: add contributor emails to AUTHOR_MAP
Add ningfangbin and Joseph19820124 for salvage PR attribution.
2026-04-28 11:33:07 -07:00
nfb0408 74c209534c fix(copilot-acp): disable streaming path for CopilotACPClient
CopilotACPClient communicates via subprocess stdio and returns a plain
SimpleNamespace from _create_chat_completion(). The streaming path tries
to iterate this as a stream, crashing with:
  TypeError: 'types.SimpleNamespace' object is not iterable

Mirror the existing ACP exclusion pattern (used for Responses API upgrade)
to disable streaming when provider is copilot-acp or base_url starts with
acp:// or acp+tcp://.

Based on PR #9428 by @ningfangbin and issue #16271 by @Joseph19820124.

Fixes #16271
2026-04-28 11:33:07 -07:00
Siddharth Balyan 18f585f091 ci(nix): auto-fix stale npm hashes on push to main (#16285)
* ci(nix): auto-fix stale npm hashes on push to main

When a PR merges to main with updated package-lock.json or package.json
in ui-tui/ or web/, the new auto-fix-main job detects stale npmDepsHash
values and pushes a fix commit directly to main.

This eliminates the recurring manual hash-bump PRs (#15420, #15314,
#15272, #15244) by reusing the existing fix-lockfiles --apply pipeline.

The fix commit only touches nix/*.nix files, which are outside the push
path filter (package-lock.json / package.json), so it cannot re-trigger
itself.

Closes #15314

* fix(ci): use GitHub App token for auto-fix-main push

GITHUB_TOKEN commits are invisible to workflow triggers (GitHub's
infinite-loop prevention). The auto-fix-main job pushes directly to
main, so the fix commit never triggered downstream nix.yml verification.

Mint a short-lived token via the repo's GitHub App (daimon-nous, APP_ID
+ APP_PRIVATE_KEY secrets) so the push is treated as a real event and
nix.yml fires to verify the corrected hashes.

Tested via workflow_dispatch dry-run: app token minted successfully,
checkout with app token succeeded, fix job correctly gated.

Resolves review feedback from Bugbot (r3144569551).

* ci(nix): rename lockfile check job for required status check

Rename 'check' → 'nix-lockfile-check' so the status check name is
unambiguous when added as a required check on main.

* fix(ci): harden auto-fix-main against races, loops, and silent failures

Address adversarial review findings:

1. Race condition (#1): Job-level concurrency with cancel-in-progress
   collapses back-to-back pushes; ref: main checkout always gets latest
   branch state; explicit push target (origin HEAD:main).

2. Loop prevention (#2): File-whitelist check before commit aborts if
   any file outside nix/{tui,web}.nix was modified, preventing
   accidental self-triggering.

3. Silent infra failures (#8): nix-lockfile-check now fails explicitly
   when fix-lockfiles exits without reporting stale status (catches nix
   setup failures, network errors, script bugs that bypass continue-on-error).

4. Commit traceability (#11): Auto-fix commits include source SHA and
   workflow run URL in the commit body.

5. Explicit push target (#12): git push origin HEAD:main instead of
   bare git push.

---------

Co-authored-by: alt-glitch <alt-glitch@users.noreply.github.com>
2026-04-29 00:01:58 +05:30
Siddharth Balyan 4bf0e75ae9 fix(nix): make extraPackages actually work via per-user profile (#17047)
* fix(nix): make extraPackages actually work — wire into per-user profile

#17030 deprecated extraPackages because it only set the systemd service
PATH, which the terminal backend's login-shell snapshot discards.

Instead of deprecating, fix it: set users.users.${cfg.user}.packages
so NixOS builds a per-user profile at /etc/profiles/per-user/hermes/bin.
This path is included in PATH by /etc/set-environment, which the login
shell sources, so the terminal backend's snapshot picks it up.

One line of actual logic:
  users.users.${cfg.user}.packages = cfg.extraPackages;

Verified in a NixOS VM test: su - hermes -c 'which hello' resolves
to /etc/profiles/per-user/hermes/bin/hello.

Reverts the deprecation warning and docs changes from #17030, restores
extraPackages as the recommended way to give the agent extra tools.

Container mode is unaffected — extraPackages was always native-only
(the systemd path line is inside !cfg.container.enable).

* nix: clarify additive merge semantics for extraPackages user profile

---------

Co-authored-by: Siddharth Balyan <daimon@noreply.github.com>
2026-04-28 23:50:32 +05:30
helix4u a3c27b5cd1 docs: clarify quick commands config shape 2026-04-28 11:07:07 -07:00
Austin Pickett 47d4b6e31a feat: add spinner, lowercase version 2026-04-28 13:59:33 -04:00
Gille a1921c43cc fix(tui): prefer exact slash command matches (#15813) 2026-04-28 12:22:26 -05:00
Austin Pickett 912590a143 fix: button sizes 2026-04-28 13:11:47 -04:00
Austin Pickett 1285172aca fix(components): refactor to use design system 2026-04-28 13:03:05 -04:00
Teknium b53a091b97 remove: BOOT.md built-in hook (#17093)
BOOT.md was merged in PR #3733 before the feature was ready — the
built-in hook spawned a bare AIAgent() with no model/runtime kwargs,
which immediately 401s on any provider with a custom endpoint. Three
separate community PRs (#5240, #12514, #14992) tried to paper over it.

Remove the BOOT.md hook entirely and its user-facing docs/tips. Keep
the gateway/builtin_hooks/ package and the HookRegistry._register_builtin_hooks()
hook-point intact as the extension surface for future always-on
gateway hooks.

Closes #5239.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 09:50:27 -07:00
Teknium b5128a751b perf(startup): lazy-import OpenAI, Anthropic, Firecrawl, account_usage (#17046)
* perf(startup): lazy-import OpenAI, Anthropic, Firecrawl, account_usage

Four heavy SDK/module imports are now deferred off the hot startup path.
Net savings on cold module imports:

  cli                       1200 → 958 ms  (-242)
  run_agent                 1220 → 901 ms  (-319)
  tools.web_tools            711 → 423 ms  (-288)
  agent.anthropic_adapter    230 →  15 ms  (-215)
  agent.auxiliary_client     253 →  68 ms  (-185)

Four independent changes in one PR since they all use the same pattern
and share the same risk profile (heavy SDK import → lazy proxy or
function-local import):

1. tools/web_tools.py:
   'from firecrawl import Firecrawl' moved into _get_firecrawl_client(),
   which is only called when backend='firecrawl'. Users on Exa/Tavily/
   Parallel pay zero firecrawl cost.

2. cli.py + gateway/run.py:
   'from agent.account_usage import ...' moved into the /limits handlers.
   account_usage transitively pulls the OpenAI SDK chain; only needed
   when the user runs /limits.

3. agent/anthropic_adapter.py:
   'try: import anthropic as _anthropic_sdk' replaced with a cached
   '_get_anthropic_sdk()' accessor. The three usage sites
   (build_anthropic_client, build_anthropic_bedrock_client,
   read_claude_code_credentials_from_keychain) now resolve via the
   accessor. All pre-existing test patches of
   'agent.anthropic_adapter._anthropic_sdk' keep working because the
   accessor respects any value already in module globals.

4. agent/auxiliary_client.py AND run_agent.py:
   'from openai import OpenAI' replaced with an '_OpenAIProxy()' module-
   level object that looks like the OpenAI class but imports the SDK on
   first call/isinstance check. This preserves:
     - 15+ in-module OpenAI(...) construction sites in auxiliary_client
       and the single site in run_agent's _create_openai_client (Python's
       function-scope name lookup finds the proxy, forwards the call);
     - 'patch("agent.auxiliary_client.OpenAI", ...)' and
       'patch("run_agent.OpenAI", ...)' test patterns used by 28+ test
       files (patch replaces the module attribute as usual).
   Tried two alternatives first:
     - 'from openai._client import OpenAI' — doesn't skip openai/__init__.py
       (the audit's hypothesis here was wrong).
     - Module-level __getattr__ — works for external access but Python
       function-scope name resolution skips __getattr__, so in-module
       OpenAI(...) calls NameError.

Note: 'openai' still loads on 'import cli' because
cli.py -> neuter_async_httpx_del() -> openai._base_client, and
run_agent.py -> code_execution_tool.py (module-level
build_execute_code_schema) -> _load_config() -> 'from cli import
CLI_CONFIG'. Deferring those is a separate, larger change — out of scope
for this PR. The savings above all come from avoiding the openai/*,
anthropic/*, and firecrawl/* top-level type-tree imports on paths that
don't need them.

Verified:
- 302/302 tests in tests/agent/{test_anthropic_adapter,
  test_bedrock_1m_context, test_minimax_provider, test_anthropic_keychain}
  pass. Two pre-existing failures on main unchanged.
- 106/106 tests/agent/test_auxiliary_client.py pass (1 pre-existing fail).
- 97/97 tests/run_agent/test_create_openai_client_kwargs_isolation.py,
  test_plugin_context_engine_init.py, test_invalid_context_length_warning.py,
  test_api_max_retries_config.py,
  tests/hermes_cli/test_gemini_provider.py, test_ollama_cloud_provider.py
  pass (1 pre-existing fail).
- Live hermes chat smoke: 2 turns + /model switch + tool calls, zero
  errors in the 57-line agent.log window.
- Module-level import of run_agent + auxiliary_client + anthropic_adapter
  no longer pulls 'anthropic' or 'firecrawl' at all.

* fix(gateway): restore top-level account_usage import for test-patch surface

CI caught two failures in tests/gateway/test_usage_command.py that I
missed locally:

    AttributeError: 'module' object at gateway.run has no attribute 'fetch_account_usage'

The test uses monkeypatch.setattr('gateway.run.fetch_account_usage', ...)
to inject a fake account-fetch call. Moving the import inside the
handler deleted that module-level attribute, breaking the patch surface.

Restoring the top-level import in gateway/run.py gives up the ~230 ms
gateway-boot savings from that one lazy, but:

  1. the gateway is a long-running daemon — boot cost is paid once per
     install, not per turn;
  2. the other four lazy-imports (firecrawl, openai, anthropic, cli's
     account_usage) remain in place and still account for the bulk of
     the savings reported in the PR body;
  3. preserving the patch surface keeps the established
     'gateway.run.fetch_account_usage' monkeypatch pattern working
     without touching tests.

Verified: tests/gateway/test_usage_command.py — 8 passed, 0 failed.
Full targeted sweep (2336 tests across agent/gateway/hermes_cli/run_agent):
2332 passed, 4 failed — all 4 pre-existing on main.

---------

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 09:38:42 -07:00
Austin Pickett 663602f6b0 Merge branch 'austin/fix/more-design-system' of github.com:NousResearch/hermes-agent into austin/fix/more-design-system 2026-04-28 12:28:32 -04:00
Austin Pickett e1027134cd chore: remove comments 2026-04-28 12:28:08 -04:00
github-actions[bot] f62272b203 fix(nix): refresh npm lockfile hashes 2026-04-28 16:20:05 +00:00
Austin Pickett 0348a69c51 fix: migrate select to design system 2026-04-28 12:02:34 -04:00
Austin Pickett 753a071491 fix: badges 2026-04-28 11:24:08 -04:00
Austin Pickett e5601d1e85 fix: update design language 2026-04-28 10:57:30 -04:00
Teknium df51ad7973 perf(config): mtime-cache load_config() and read_raw_config() (#17041)
load_config() and read_raw_config() now cache their result keyed on
the config file's (mtime_ns, size). On cache hit they return a deepcopy
of the cached value, skipping yaml.safe_load + deep-merge + normalize +
env-var expansion entirely. save_config() + migrate_config() write via
atomic_yaml_write which produces a fresh inode, so stat() sees a new
mtime_ns and the next load repopulates automatically — no explicit
invalidation hook needed.

Measured per-call cost:
  load_config() cold:   13.3 ms
  load_config() cached:  0.23 ms  (57x faster)
  read_raw_config() cached: 0.13 ms

A single gateway turn hits the config 5-15 times (session context,
auxiliary client resolution, memory config, plugin hooks, approval
lookups, per-tool settings). That's 65-200 ms/turn of pure YAML
re-parsing on main. After this change: 1-3 ms/turn.

Also migrates gateway/run.py's 6 direct yaml.safe_load(config.yaml)
call sites through _load_gateway_config, which now shares the
read_raw_config cache when _hermes_home agrees with the canonical
config path. The direct-read fallback is retained for tests that
monkeypatch gateway_run._hermes_home without touching HERMES_HOME.

Safety:
- load_config() returns a deepcopy on every call; the 67+ call sites
  that mutate the result (cfg["model"]["default"] = ..., etc.) can't
  corrupt the cache.
- save_config() / atomic_yaml_write bump mtime, naturally invalidating
  the cache for the next reader.
- Cache is keyed on str(config_path), so HERMES_HOME profile switches
  don't collide.

Verified:
- 112 config tests pass (test_config, test_config_env_expansion,
  test_config_env_refs, test_config_drift, test_config_validation,
  test_aux_config).
- 87 gateway tests pass (test_verbose_command, test_session_info,
  test_compress_focus, test_runtime_footer, test_resume_command,
  test_reasoning_command, test_approve_deny_commands,
  test_run_progress_interrupt).
- Live hermes chat smoke — 2 turns + /model switch + tool calls,
  zero errors in agent.log.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 07:06:35 -07:00
Teknium 42be5e49b0 fix(browser): detect missing Chromium and fail fast with actionable error (#17039)
Previously, check_browser_requirements() only checked for the agent-browser
CLI, not the Chromium binary it drives. When the CLI was present but
Chromium wasn't (common in Docker images predating the playwright install
step), the browser tool was advertised to the agent, every call hung for
the full command timeout (~30s each, ~220s for a chained navigate), and
the agent eventually gave up with no useful error — users saw 'browser
not working' with empty errors.log.

Changes:
- tools/browser_tool.py: add _chromium_installed() checking
  PLAYWRIGHT_BROWSERS_PATH + default Playwright cache paths for
  chromium-* / chromium_headless_shell-* dirs; wire into
  check_browser_requirements() for local mode (cloud providers
  unaffected). _run_browser_command fails fast with an actionable
  Docker vs. host message instead of hanging. _running_in_docker()
  checks /.dockerenv and /proc/1/cgroup.
- hermes_cli/tools_config.py: post_setup for 'Local Browser' now runs
  'agent-browser install --with-deps' after npm install to actually
  download Chromium. In Docker, points user at the updated image pull
  instead of trying to install into a read-only layer. Cloud-provider
  post_setup (browserbase) skips Chromium install entirely.
- tests/tools/test_browser_chromium_check.py: new tests covering
  search roots, install detection, requirements branches (local/cloud/
  camofox), and the fast-fail guard in docker/non-docker contexts.
- tests/tools/test_browser_homebrew_paths.py: 5 existing subprocess-path
  tests now mock _chromium_installed=True since they exercise the
  post-guard subprocess path.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 07:03:44 -07:00
Teknium e0f5d39837 fix(discord): widen slash-sync timeout to 600s under rate-limit pressure (#16713) (#17029)
Discord's per-app command-management bucket is ~5 writes / 20 s. A
mass-prune-plus-upsert reconcile (77 orphans + 30 desired = 107 writes
in the reported case) can't finish under the old flat 30 s budget, and
the subsequent reconnect retries inside the rate-limit cooldown also
time out — leaving slash commands broken for ~60 min until the bucket
fully recovers.

Bump the timeout to 600 s so realistic bursts drain, update the warning
message to point at the saturated bucket instead of a hardcoded 30 s.
The 600 s cap still guards against a true hang.

Credit to @Tranquil-Flow for PR #16739 and @davidbordenwi for reporting
#16713 with the bucket-math diagnosis.

Closes #16713.

Co-authored-by: Teknium <teknium@nousresearch.com>
2026-04-28 07:02:43 -07:00
Teknium 5ed1eb0d0f docs(config): surface telegram.reactions in DEFAULT_CONFIG (#17028)
The telegram.reactions key was already wired up (gateway/config.py bridges
it to TELEGRAM_REACTIONS at startup) but was undocumented and missing from
DEFAULT_CONFIG, so users had no way to discover it. Add it with the
existing off-by-default behavior preserved.

No behavior change — runtime default stays False.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 07:02:30 -07:00
Siddharth Balyan be41ccd0af fix(nix): deprecate extraPackages — does not reach terminal/skills (#17030)
extraPackages adds packages to the systemd service PATH, but the
terminal backend's login-shell snapshot rebuilds PATH from NixOS system
profiles, so tools added via extraPackages are invisible to terminal
commands, skills, and cron jobs — the entire use case.

Changes:
- Mark the option description as deprecated with explanation
- Emit a NixOS warning when extraPackages is non-empty, including a
  ready-to-paste environment.systemPackages replacement
- Update docs: quick-reference table, plugin example, and options
  reference all point to environment.systemPackages

The option still functions (non-breaking) so existing configs keep
working while users migrate.
2026-04-28 19:28:11 +05:30
konsisumer e4b69bf149 fix(gateway): guard against None request_overrides in _build_api_kwargs 2026-04-28 06:57:23 -07:00
Teknium 1d8b9e6458 fix(auxiliary): auto-detect Anthropic Messages transport for all aux clients (#17027)
Auxiliary tasks (title_generation, vision, compression, web_extract,
session_search) now pick the correct wire protocol based on the
endpoint, not just on which resolve_provider_client branch built the
client.  Fixes 404s on Kimi Coding Plan and any other named provider
whose endpoint speaks Anthropic Messages.

Root cause: the 'api_key' branch of resolve_provider_client (and the
Step 2 fallback chain inside _resolve_auto) always built a plain
OpenAI client regardless of what the endpoint actually spoke.  For
provider=kimi-coding + model=kimi-for-coding, that meant:

    POST https://api.kimi.com/coding/v1/chat/completions
    { "model": "kimi-for-coding", ... }
    → 404 resource_not_found_error

The /coding route only accepts the Anthropic Messages shape (the main
agent already uses api_mode=anthropic_messages for it).  Earlier fixes
(#16819, #22ddac4b1) patched the anonymous-custom, named-custom, and
external-process branches — but the named api_key branch (kimi-coding,
minimax, zai, future /anthropic providers) was the fourth sibling and
never got the same treatment.

Fix: one module-level helper _maybe_wrap_anthropic() that rewraps a
plain OpenAI client in AnthropicAuxiliaryClient when:

  - api_mode is explicitly 'anthropic_messages', OR
  - the URL ends in '/anthropic', OR
  - the host is api.kimi.com + path contains '/coding', OR
  - the host is api.anthropic.com.

Wired into _wrap_if_needed (covers all resolve_provider_client
branches that already go through it) and into the Step 2 api_key
fallback chain inside _resolve_auto.  Explicit api_mode still wins:
passing api_mode='chat_completions' forces OpenAI wire, and already-
wrapped specialized adapters (Codex, Gemini native, CopilotACP) pass
through unchanged.

E2E verified:
- resolve_provider_client('kimi-coding', 'kimi-for-coding')
  → AnthropicAuxiliaryClient (was plain OpenAI, which 404'd)
- _resolve_auto Step 1 for kimi-coding runtime → AnthropicAuxiliaryClient
- resolve_provider_client('openrouter', ...) → plain OpenAI (no regression)
- api_mode='chat_completions' override → plain OpenAI (explicit wins)

Tests:
- tests/agent/test_auxiliary_transport_autodetect.py (new): 21 tests
  covering URL detection, wrap decisions, and integration.
- 204/205 existing auxiliary tests pass (1 pre-existing failure on
  main, unrelated to this change).

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 06:50:14 -07:00
Teknium e123f4ecf0 feat(gateway): opt-in runtime-metadata footer on final replies (#17026)
Append a compact 'model · 68% · ~/projects/hermes' footer to the FINAL
message of each turn, disabled by default (display.runtime_footer.enabled).
Answers the Telegram-side parity ask: runtime context that the CLI status
bar already shows is now available in messaging replies when enabled.

Wiring:
- gateway/runtime_footer.py: resolve_footer_config + format_runtime_footer +
  build_footer_line. Pure-function renderer; per-platform overrides under
  display.platforms.<platform>.runtime_footer.
- gateway/run.py: appends footer to response right after reasoning prepend
  so it lands only on the final message (never tool progress or streaming
  chunks). When streaming already delivered the body (already_sent), the
  footer is sent as a small trailing message instead.
- agent_result now exposes context_length alongside last_prompt_tokens so
  the footer can compute the pct; both gateway return paths updated.
- /footer [on|off|status] slash command, wired in CLI (cli.py) and gateway
  (gateway/run.py both running-agent bypass and main dispatch). Global
  toggle only; per-platform overrides via config.yaml.

Graceful degradation:
- Missing context_length (unknown model) → pct field silently dropped
  (no '?%' artifact).
- Empty final_response → no footer appended.
- Unknown field names in config → silently ignored.

Tests: 25-case unit suite (tests/gateway/test_runtime_footer.py) plus E2E
harness covering streaming vs non-streaming branches, per-platform override,
and the exact argument contract gateway/run.py uses.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 06:50:04 -07:00
Teknium 6085d7a93e chore: remove unused imports and dead locals (ruff F401, F841) (#17010)
Mechanical cleanup across 43 files — removes 46 unused imports
(F401) and 14 unused local variables (F841) detected by
`ruff check --select F401,F841`. Net: -49 lines.

Also fixes a latent NameError in rl_cli.py where `get_hermes_home()`
was called at module line 32 before its import at line 65 — the
module never imported successfully on main. The ruff audit surfaced
this because it correctly saw the symbol as imported-but-unused
(the call happened before the import ran); the fix moves the import
to the top of the file alongside other stdlib imports.

One `# noqa: F401` kept in hermes_cli/status.py for `subprocess`:
tests monkeypatch `hermes_cli.status.subprocess` as a regression
guard that systemctl isn't called on Termux, so the name must
exist at module scope even though the module body doesn't reference
it. Docstring explains the reason.

Also fixes an invalid `# noqa:` directive in
gateway/platforms/discord.py:308 that lacked a rule code.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 06:46:45 -07:00
teknium1 3d8be2c617 fix(install): widen /dev/tty open-probe to sibling gates (#16746)
The contributor's PR (#16750) scoped the fix to run_setup_wizard() and
explicitly punted the two sibling sites. Both have the identical
[ -e /dev/tty ] pattern followed by a < /dev/tty redirect and crash in
Docker the same way:

- scripts/install.sh:732 install_system_packages() -- apt sudo prompt
  fallback. sudo ... < /dev/tty dies with the same ENXIO.
- scripts/install.sh:1395 maybe_start_gateway() -- gateway-install gate,
  same function path as the wizard reproducer.

Fix both with the same (: </dev/tty) 2>/dev/null probe, and parametrize
the regression test over all three gated functions so any future
regression is caught regardless of which site breaks.
2026-04-28 06:45:55 -07:00
briandevans 89e8c87354 test(install): regex-based gate assertions per copilot review on #16750
Address the three Copilot inline findings on the regression test:

- Switch _extract_run_setup_wizard() from str.index() with hard-coded
  markers (which raises ValueError if `maybe_start_gateway()` is renamed
  or the marker leaks into a comment) to an anchored regex on the
  function-definition + closing-brace boundaries.
- Match `[ -e /dev/tty ]` with surrounding whitespace, optional quoting,
  and the `test -e /dev/tty` form so the regression guard catches every
  spelling of the existence-only check, not just the exact substring.
- Replace the literal `(: </dev/tty)` substring assertion with a
  higher-level invariant — the gate must be an `if`/`if !` whose test
  redirects stdin from /dev/tty — so equivalent open-based probes
  (`exec 3</dev/tty` + close, brace-grouped variants, etc.) keep the
  test green while the bare existence check stays caught.

Verified guard: both tests still pass on the fix and both fail on
`origin/main` with the documented messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:45:55 -07:00
briandevans 20c9340c34 fix(install): probe /dev/tty by opening it, not bare existence (#16746)
In Docker builds the `/dev/tty` device node is present in the mount
namespace, so `[ -e /dev/tty ]` returns true — but opening it fails
with `ENXIO: No such device or address`. Under the old gate the
"no terminal available" skip never triggered, the setup wizard ran,
and the build aborted a few lines later when bash tried `< /dev/tty`:

    /tmp/install.sh: line 1347: /dev/tty: No such device or address

Replace the existence check with `(: </dev/tty) 2>/dev/null`, which
actually attempts to open /dev/tty in a subshell. The probe succeeds
when piped from `curl | bash` on a real terminal (the wizard's intended
use case) and fails cleanly in Docker build / CI contexts so the skip
kicks in before the redirect can crash.

Add a regression test that statically asserts run_setup_wizard does not
gate on the bare existence check and that the open-based probe is in
place.

Fixes #16746.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:45:55 -07:00
teknium1 b2339c87e4 chore(release): map dejie.guo@gmail.com -> JayGwod 2026-04-28 06:45:35 -07:00
Dejie Guo 8cced33784 fix(model): prefer live models for user providers 2026-04-28 06:45:35 -07:00
Teknium 69b8fa65d4 docs(delegate_task): clarify that it is synchronous and not durable (#17022)
delegate_task runs inside the parent turn and is cancelled when the parent is interrupted (new user message, /stop, /new). The child status payload (status=interrupted, exit_reason=interrupted) is already honest, but the tool schema and user-facing docs did not set the expectation, so users reasonably assumed delegated subagents would keep running in the background after interrupting the parent.

Updates:

- tools/delegate_tool.py DELEGATE_TASK_SCHEMA description adds a WHEN NOT TO USE bullet pointing at cronjob / terminal(background=True, notify_on_complete=True) for durable long-running work.

- website/docs/user-guide/features/delegation.md gains a Lifetime and Durability callout above Key Properties.

- website/docs/guides/delegation-patterns.md expands the Use something else list and the Constraints section with the same guidance.

Reported by LizLiz (@lizliz404) via Teknium.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 06:45:15 -07:00
Teknium 5f84eac451 feat(gateway): bust cached agent on compression/context_length config edits (#17008)
The gateway caches one AIAgent per session to preserve prompt-cache hits,
keyed by _agent_config_signature().  The signature previously only
fingerprinted model/credentials/toolsets/ephemeral-prompt — NOT the
compression or context_length config.  As a result, users who edited
model.context_length or compression.threshold in config.yaml on a
long-lived gateway saw no effect until they triggered an unrelated
cache eviction (/model switch, /reset, gateway restart).

Add a new cache_keys parameter to _agent_config_signature and a
_CACHE_BUSTING_CONFIG_KEYS registry listing config values the agent
bakes in at construction time.  Call sites read the current config and
pass it through — next gateway message with an edited config
rebuilds the agent.

Keys registered:
- model.context_length
- compression.enabled
- compression.threshold
- compression.target_ratio
- compression.protect_last_n

Reported by @OP (Apr 26 feedback bundle).

## Changes
- gateway/run.py: new _CACHE_BUSTING_CONFIG_KEYS tuple,
  _extract_cache_busting_config classmethod, cache_keys kwarg on
  _agent_config_signature, call site passes the extracted dict
- tests/gateway/test_agent_cache.py: 11 new tests
  (5 on _agent_config_signature behavior, 6 on _extract_cache_busting_config)

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 06:37:42 -07:00
kshitijk4poor b5905f0d4a chore: add Mirac1eSky to AUTHOR_MAP 2026-04-28 06:37:22 -07:00
Siwen Wang d6137453ac fix(gateway): drain stale httpx polling connections on Telegram reconnect
Network errors through proxies (e.g. sing-box) can leave httpx
connections in a half-closed state occupying pool slots.  After enough
reconnect cycles the 256-connection default fills up entirely, causing
Pool timeout: All connections in the connection pool are occupied.

Fix: cycle only the getUpdates request object (_request[0]) via
shut-down + re-initialize before restarting polling.  This drains stale
connections without touching the general request (_request[1]) that
concurrent send_message / edit_message calls rely on.

The drain is applied to both _handle_polling_network_error and
_handle_polling_conflict reconnect paths via a shared
_drain_polling_connections() helper.  Failures in the drain are
swallowed so reconnect always proceeds.

Based on #16466 by @Mirac1eSky.
2026-04-28 06:37:22 -07:00
Austin Pickett a9369fc193 chore: more components 2026-04-28 09:18:40 -04:00
Austin Pickett e116957a63 fix: replace all buttons for design system buttons 2026-04-28 08:57:33 -04:00
Teknium 391f1ca1f4 feat(aux): translate extra_body.reasoning into Codex Responses API (#17004)
Auxiliary callers that configure reasoning via
auxiliary.<task>.extra_body.reasoning were having that config silently
dropped by the Codex Responses adapter — it only forwarded
messages/model/tools through to responses.stream(), never translating
chat.completions-shaped reasoning hints into the Responses API's
top-level reasoning + include fields.

Mirror the main-agent translation from agent/transports/codex.py:
- extra_body.reasoning.effort → resp_kwargs.reasoning.{effort, summary:"auto"}
- 'minimal' → 'low' clamp (Codex backend rejects 'minimal')
- Always include ['reasoning.encrypted_content'] when reasoning is enabled
- {'enabled': False} → omit reasoning and include entirely
- Non-dict reasoning values are ignored defensively

Reported by @OP (Apr 26 feedback bundle).

## Changes
- agent/auxiliary_client.py: _CodexCompletionsAdapter.create() now reads
  and translates extra_body.reasoning before calling responses.stream()
- tests/agent/test_auxiliary_client.py: 9 new tests covering all effort
  levels, the minimal→low clamp, the disabled path, the no-op paths,
  and defensive handling of wrong-shape inputs

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 05:47:42 -07:00
Teknium 72dea9f4f7 feat(gateway): make hygiene hard message limit configurable (#17000)
The gateway session-hygiene pre-compression safety valve had a hardcoded
400-message threshold. On long-lived sessions with short turns this was
either too high (users with aggressive compression preferences) or too
low (users with very large context models who want to keep more history
in-flight).

Add compression.hygiene_hard_message_limit (default 400) so it can be
tuned without forking the gateway.

Reported by @OP (Apr 26 feedback bundle).

## Changes
- hermes_cli/config.py: new DEFAULT_CONFIG key with 400 default
- gateway/run.py: read compression.hygiene_hard_message_limit at
  hygiene-time, fall back to 400 if missing/invalid
- tests/gateway/test_session_hygiene.py: two tests — override fires at
  the configured limit, default does not fire below 400

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 05:43:12 -07:00
Teknium 06164a7b28 fix(codex): resync pool entry from auth.json after reauth (#17001)
When openai-codex tokens expire or the ChatGPT account hits a 429
window, the pool entry gets marked STATUS_EXHAUSTED with
last_error_reset_at many hours in the future. If the user then runs
`hermes model` / `hermes auth openai-codex` to reauth, fresh tokens
land in ~/.hermes/auth.json but the pool entry stayed frozen behind
its reset_at — every request kept failing with 'credential pool: no
available entries (all exhausted or empty)' until the original window
elapsed.

_available_entries() already had auth.json/credentials-file resync
branches for anthropic/claude_code and nous/device_code; openai-codex
was missing. Added _sync_codex_entry_from_auth_store() mirroring the
nous version (reads state["tokens"][{access,refresh}_token] +
state["last_refresh"]) and wired it into the exhausted-entry resync
loop.

Also softens the 'codex CLI not found' doctor warning — native
device-code OAuth does not require the Codex binary, only
importing existing Codex CLI tokens does. Downgraded to an info line.

Reported on Discord by p1aceho1der: Codex stalled indefinitely after
a rate-limit reset, reauth didn't help, and doctor falsely warned
that the codex CLI was required.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 05:43:09 -07:00
teknium1 529eb29b6a fix(gemini): clamp Flash thinkingLevel to documented low/medium/high set
Gemini 3 Flash documents low/medium/high as the accepted thinkingLevel
values. The salvaged bridge was forwarding Hermes' "minimal" effort to
Flash verbatim, which is not a documented Gemini level and risks a 400
from the native adapter.

Clamp minimal->low on Flash (matching how Pro already clamps minimal+low
down), and funnel anything outside {low, medium, high} into medium to
keep the request valid by construction. No behaviour change for the
documented effort levels.
2026-04-28 05:38:23 -07:00
Nanako0129 dbbe2d1973 fix(gemini): bridge reasoning_config into thinking_config for chat-completions routes 2026-04-28 05:38:23 -07:00
teknium1 315a11a76f chore(prompt): tell telegram models to prefer bullets over tables
Telegram has no native table syntax. The gateway auto-rewrites pipe
tables into row-group bullets (see previous commit), but letting models
know up front means they emit the clean form directly instead of
relying on post-processing to synthesize headings.

Also helps users whose MEMORY.md formatting policies were being
overridden — the platform hint now carries the guidance.
2026-04-28 05:37:50 -07:00
LeonSGP43 a3b9343f08 feat(telegram): render markdown tables as row groups 2026-04-28 05:37:50 -07:00
helix4u d8c5573ffe fix(profiles): migrate Honcho host on rename 2026-04-28 05:37:09 -07:00
teknium1 c69310c625 fix(weixin): raise descriptive error when rate-limit retries exhaust
The rate-limit branch added by the original PR did sleep+continue with
no attempt to record the last error, so persistent iLink -2 responses
exhausted the retry loop and hit 'assert last_error is not None',
raising AssertionError instead of a descriptive RuntimeError.

Record last_error = RuntimeError(...) before continuing, and break out
of the loop on the final attempt instead of sleeping uselessly.
2026-04-28 05:21:58 -07:00
teknium1 d3a9c69e9b chore(release): map leihaibo1992 author for #16757 salvage 2026-04-28 05:21:58 -07:00
Leihb a54106bbc8 fix(weixin): split long messages (>2000 chars) into chunks to prevent truncation
- Change MAX_MESSAGE_LENGTH from 4000 to 2000 to match Weixin iLink API limit
- Add RATE_LIMIT_ERRCODE = -2 handling with 3x backoff retry
- Increase default send_chunk_delay_seconds from 0.35 to 1.5 to avoid rate limits
- Increase default send_chunk_retries from 2 to 4 for better reliability
- Use _split_text() in send() to chunk long messages before delivery

Fixes #16411
2026-04-28 05:21:58 -07:00
teknium1 1a4289b6b7 chore(release): map revar@users.noreply.github.com -> revaraver 2026-04-28 05:21:49 -07:00
revar 052b3449e5 test(cli): regression test for manual /compress system_message
Add tests/test_cli_manual_compress.py verifying _manual_compress passes
None (not the cached system prompt) to _compress_context, forwards the
/compress <topic> focus string, rotates CLI session_id to the new child
session, and clears the pending title.

Co-authored-by: revar <revar@users.noreply.github.com>
2026-04-28 05:21:49 -07:00
ygd58 fb112d6a73 fix(cli): pass None as system_message in manual compress to prevent duplication
_manual_compress() passed self.agent._cached_system_prompt to
_compress_context() as the system_message argument. _compress_context
calls _build_system_prompt(system_message), which appends system_message
to prompt_parts that already contain the agent identity block — causing
the identity to appear twice in the new session's system prompt
(20,957 -> 42,303 chars, +102% as reported in issue #15281).

Fix: pass None instead of _cached_system_prompt. _build_system_prompt(None)
rebuilds the system prompt correctly from scratch without appending a
pre-built prompt on top of the identity layers.

Fixes #15281
2026-04-28 05:21:49 -07:00
teknium1 7444e49d4e fix(gateway): use transcript timestamp for auto-continue freshness
Follow-up to PR #16802 (BeliefanX). The original fix read
`agent_history[-1].get("timestamp")` for the tool-tail freshness gate,
but `gateway/run.py` strips the `timestamp` field off all tool/tool_call
rows when building `agent_history` from the raw transcript (see
`clean_msg = {k: v for k, v in msg.items() if k != "timestamp"}`).  At
runtime the tool-tail branch always saw `None` and silently took the
legacy-fresh path — the stale-guard never fired for the tool-tail case
it was supposed to cover.

Changes:
- Read the freshness signal from the RAW `history` list (via new
  `_last_transcript_timestamp()` helper) BEFORE the strip.  Both the
  resume_pending branch and the tool-tail branch use this single signal,
  replacing the two divergent ones.
- Default window bumped 15 min → 1 hour via new
  `_AUTO_CONTINUE_FRESHNESS_SECS_DEFAULT`.  The 15-minute default was
  shorter than the default `gateway_timeout` of 30 min, so a legitimate
  long-running turn interrupted near its timeout boundary and resumed
  shortly after would have been misclassified as stale.
- Configurable via `config.yaml` `agent.gateway_auto_continue_freshness`
  (bridged to `HERMES_AUTO_CONTINUE_FRESHNESS` at gateway startup — same
  pattern as `gateway_timeout`).  Set to 0 to disable the gate.
- `_coerce_gateway_timestamp` now explicitly rejects bool (which is a
  subclass of int and would otherwise coerce to 0.0/1.0).
- Tests rewritten to exercise the real production data shape: raw
  `history` → `_build_agent_history` strip → freshness decision.  A
  regression guard (`test_stale_tool_tail_with_production_data_shape`)
  asserts `agent_history` tool rows carry NO timestamp, protecting
  against someone "fixing" the original bug by re-adding the stripped
  field (which would break the OpenAI tool-result message contract).

Add BeliefanX to scripts/release.py AUTHOR_MAP.

E2E verified: config.yaml → env var bridge → helper returns configured
value; default 1h window; malformed/empty env var falls back to default;
ISO-Z timestamps parse; ms-epoch coerced; bool rejected.
2026-04-28 05:20:35 -07:00
beliefanx 93feffbcfa fix(gateway): avoid stale interrupted turn auto-continue 2026-04-28 05:20:35 -07:00
Teknium b61d9b297a refactor: consolidate symlink-safe atomic replace into shared helper
Extract the islink/realpath guard from the 16743 fix into a single
atomic_replace() helper in utils.py, then migrate every os.replace()
call site in the codebase to use it.

The original PR #16777 correctly identified and fixed the bug, but
only patched 9 of ~24 call sites. The same bug class (managed
deployments that symlink state files silently losing the link on
every write) still existed at auth.json, sessions file, gateway
config, env_loader, webhook subscriptions, debug store, model
catalog, pairing, google OAuth, nous rate guard, and more.

Rather than add another 10+ copies of the same three-line guard,
consolidate into atomic_replace(tmp, target) which:
- resolves symlinks via os.path.realpath before os.replace
- returns the resolved real path so callers can re-apply permissions
- is a drop-in replacement for os.replace at the use sites

Changes:
- utils.py: new atomic_replace() helper + atomic_json_write /
  atomic_yaml_write now call it instead of inlining the guard
- 16 files: all os.replace() call sites migrated to atomic_replace()
  - agent/{google_oauth, nous_rate_guard, shell_hooks}.py
  - cron/jobs.py
  - gateway/{pairing, session, platforms/telegram}.py
  - hermes_cli/{auth, config, debug, env_loader, model_catalog, webhook}.py
  - tools/{memory_tool, skill_manager_tool, skills_sync}.py

Tests: tests/test_atomic_replace_symlinks.py pins the invariant for
atomic_replace + atomic_json_write + atomic_yaml_write, covers plain
files, first-time creates, broken symlinks, and permission preservation.

Refs #16743
Builds on #16777 by @vominh1919.
2026-04-28 04:58:22 -07:00
vominh1919 3ab97a32d1 fix: preserve symlinks during atomic file writes (#16743)
os.replace(tmp, path) replaces the symlink itself with a regular file,
breaking users who symlink config.yaml, SOUL.md, or .env from ~/.hermes/
to a dotfiles repo or managed profile package.

Fix: resolve symlinks via os.path.realpath() before os.replace(), so the
real file is overwritten in-place while the symlink survives.

Fixed in 7 files covering all os.replace call sites:
- utils.py (atomic_json_write, atomic_yaml_write — fixes save_config)
- hermes_cli/config.py (env sanitizer, save_env_value, remove_env_value)
- tools/skill_manager_tool.py (_atomic_write_text — SOUL.md writes)
- tools/memory_tool.py (memory file writes)
- tools/skills_sync.py (manifest writes)
- cron/jobs.py (job state + output file writes)
- agent/shell_hooks.py (hook file writes)

Fixes NousResearch/hermes-agent#16743
2026-04-28 04:58:22 -07:00
teknium1 1369dae226 test(openclaw-migration): cover alias reverse-lookup for real OpenClaw schema
Real OpenClaw configs key agents.defaults.models by full provider/model
API ID with an 'alias' field on the value (e.g.
{'anthropic/claude-opus-4-6': {'alias': 'Claude Opus 4.6'}}).  Add
regression tests for issue #16745 covering:

- reverse-lookup of alias against real schema (keyed by API ID)
- alias resolution when model is a bare string vs {'primary': ...}
- passthrough when the value is already a provider/model API ID
- passthrough when the alias has no catalog match
- string-valued catalog entries (belt-and-suspenders)
- no catalog at all
2026-04-28 04:58:13 -07:00
vominh1919 7996c14795 fix: resolve model aliases during claw migrate (#16745)
`hermes claw migrate` copied OpenClaw's model setting verbatim, which
could be a display alias (e.g. "Claude Opus 4.6") instead of the actual
API ID (e.g. "claude-opus-4-6"). Hermes then sent the alias to the API,
causing HTTP 404 model not found.

Fix: look up the model string in agents.defaults.models (plural) alias
catalog. If found, use the resolved "id" field, prepending the provider
prefix if needed. If not found (already an API ID), pass through unchanged.

Fixes NousResearch/hermes-agent#16745
2026-04-28 04:58:13 -07:00
阿泥豆 4aa0a7c195 fix(error-classifier): add insufficient balance to billing patterns
DeepSeek API returns HTTP 400 with 'Insufficient Balance' message when
account funds are depleted. This pattern was not in _BILLING_PATTERNS,
causing the error to be misclassified instead of triggering billing
exhaustion handling (e.g., fallback to alternate provider).

Suggested by teknium1 in PR review of #15586.
2026-04-28 04:58:09 -07:00
Teknium 7428abd54e chore(release): map mtf201013@gmail.com -> ma-pony 2026-04-28 04:58:03 -07:00
Teknium 0f473d643d refactor(schema): consolidate nullable-union stripping in schema_sanitizer
Adds tools.schema_sanitizer.strip_nullable_unions as the single
implementation for collapsing anyOf/oneOf nullable unions.  Both the
MCP input-schema normalizer and the Anthropic tool-schema guard now
delegate to it instead of re-implementing the same walk three times.

The global sanitizer also gains a final pass so any tool that slips
past the two earlier hooks (plugin tools, non-MCP custom tools with
Pydantic-shaped schemas) still gets safe input_schemas on Anthropic.

- tools/schema_sanitizer.py:
    * New public strip_nullable_unions(schema, keep_nullable_hint=True).
    * _sanitize_single_tool() calls it as a final pass (hint preserved
      so coerce_tool_args can still map string "null" to None).
- tools/mcp_tool.py: _normalize_mcp_input_schema delegates.
- agent/anthropic_adapter.py: _normalize_tool_input_schema delegates
  with keep_nullable_hint=False (Anthropic does not recognize nullable).

No behavioral change for the fix itself; tests (73/73 targeted +
E2E across MCP→sanitizer→Anthropic paths) pass.
2026-04-28 04:58:03 -07:00
Pony.Ma aa94883288 fix(mcp): preserve nullable schema coercion 2026-04-28 04:58:03 -07:00
Pony.Ma 1350d12b0b fix: keep mcp dynamic refresh tasks tracked 2026-04-28 04:58:03 -07:00
Pony.Ma 02ae152222 fix(mcp): normalize nullable tool schemas 2026-04-28 04:58:03 -07:00
teknium1 9cd02b1698 chore(release): map r.filgueiras@apheris.com -> rfilgueiras 2026-04-28 03:53:11 -07:00
Ruda Porto Filgueiras 37551ee53e test(bedrock): add model picker and region routing tests
25 new tests (all Bedrock API calls mocked, no real AWS creds needed):

tests/hermes_cli/test_bedrock_model_picker.py (20 tests):
  - provider_model_ids("bedrock") uses live discovery, returns regional
    model IDs, falls back gracefully on empty/exception, resolves all
    bedrock aliases (aws, aws-bedrock, amazon-bedrock) to live discovery
  - list_authenticated_providers() section 2: bedrock appears with AWS
    creds, model list from discover_bedrock_models(), total_models
    matches, is_current flag works, absent creds hides bedrock, discovery
    failure does not crash, no duplicate entries
  - Region routing: botocore profile eu-central-1 yields eu.* model IDs
    end-to-end; env var takes priority over botocore profile
  - providers.py overlay: exists with correct transport/auth_type, label
    is non-empty, all aliases normalize to bedrock

tests/agent/test_bedrock_adapter.py (5 tests):
  - resolve_bedrock_region() botocore profile fallback, botocore failure
    fallback, us-east-1 hard fallback (with botocore mocked)
2026-04-28 03:53:11 -07:00
Ruda Porto Filgueiras a23f18cc3e fix(bedrock): add live model discovery and region resolution for non-US regions
provider_model_ids("bedrock") fell through to a static _PROVIDER_MODELS
table containing only hardcoded us.* model IDs.  Users configured for
non-US AWS regions (eu-central-1, ap-northeast-1, etc.) saw wrong or no
models in /model and autocomplete.

Root causes fixed:

1. models.py: provider_model_ids() now calls discover_bedrock_models()
   keyed by the resolved region before falling back to the static table.
   A new bedrock_model_ids_or_none() helper in bedrock_adapter.py
   consolidates the discover -> extract IDs -> fallback pattern used by
   all three call sites.

2. providers.py: registers bedrock in HERMES_OVERLAYS with
   transport=bedrock_converse and auth_type=aws_sdk so
   get_provider("bedrock") and resolve_provider_full("bedrock") work.

3. model_switch.py: list_authenticated_providers() sections 2 and 3
   detect AWS credentials via has_aws_credentials() for aws_sdk
   overlays and use live discovery for the model list.

4. bedrock_adapter.py: resolve_bedrock_region() reads the configured
   region from botocore.session before falling back to us-east-1,
   covering users who set their region in ~/.aws/config via a named
   profile rather than env vars.

5. tui_gateway/server.py: passes provider= to get_model_context_length()
   so context window lookups work correctly for the Bedrock provider.
2026-04-28 03:53:11 -07:00
Teknium 023f5c74b1 fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path (#16957)
* fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path

OAuth requests now identify as Hermes on the wire. Removed:

  - "You are Claude Code, Anthropic's official CLI for Claude." system
    prompt prepend
  - Hermes Agent → Claude Code / Nous Research → Anthropic
    system-prompt substitutions
  - mcp_ tool-name prefix on outgoing tool schemas + message history
  - Matching mcp_ strip on inbound tool_use blocks (strip_tool_prefix path
    removed from AnthropicTransport.normalize_response, + all 5 call
    sites in run_agent.py and auxiliary_client.py)
  - user-agent: claude-cli/<v> (external, cli) and x-app: cli headers on
    the Messages API client

Added:

  - OAuth path strips context-1m-2025-08-07 — Anthropic rejects OAuth
    requests carrying it with HTTP 400 'This authentication style is
    incompatible with the long context beta header.'

Kept (auth plumbing, not identity spoofing):

  - _is_oauth_token classifier and is_oauth flag threading
  - Bearer vs x-api-key auth routing
  - _OAUTH_ONLY_BETAS (claude-code-20250219, oauth-2025-04-20) — backend
    requires these on the OAuth-gated Messages endpoint
  - _OAUTH_CLIENT_ID (Claude Code's) — Anthropic doesn't issue OAuth
    creds to third parties; this is the only way the login flow works
  - claude-cli/<v> User-Agent on the OAuth token exchange + refresh
    endpoints at platform.claude.com/v1/oauth/token — bare requests get
    Cloudflare 1010 blocked

Verified live against api.anthropic.com with a fresh sk-ant-oat01-*
token:

  - claude-haiku-4-5 simple message: HTTP 200, 'OK' response
  - claude-haiku-4-5 tool call: HTTP 200, stop_reason=tool_use, tool
    named 'terminal' (no mcp_ prefix) round-tripped correctly
  - Outgoing wire: no user-agent, no x-app, real Hermes identity in
    system prompt, real tool name in schema

Closes/supersedes #16820 (mcp_ PascalCase normalization patch — no longer
needed since the mcp_ round-trip is gone).

* fix(anthropic): resolve_anthropic_token() reads credential pool first

Close the gap where ~/.hermes/auth.json → credential_pool.anthropic
(where hermes login + dashboard PKCE flow write OAuth tokens) was not
in resolve_anthropic_token()'s source list.

Before: users who authed via hermes login got the token written into
the pool, but legacy fallback code paths (auxiliary_client, models
catalog fetch, explicit-runtime path) that call resolve_anthropic_token()
saw None and raised 'No Anthropic credentials found' — even though the
token was sitting in auth.json.

New priority 1: pool.select() with env-sourced entries skipped. Skipping
env:* entries preserves the existing env-var priority logic further
down the chain (static env OAuth → refreshable Claude Code upgrade via
_prefer_refreshable_claude_code_token).

Surfaced while writing the hermes-agent-dev skill playbook for
'finding a live OAuth token for an E2E test'.

---------

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 03:51:17 -07:00
Teknium 2b728e1274 fix(agent): drop thinking-only assistant turns before provider call (#16959)
Adds a pre-call sanitizer that detects assistant messages containing only
reasoning (reasoning / reasoning_content, no visible content, no
tool_calls) and drops them from the API copy. Adjacent user messages
left behind are merged so role alternation is preserved for the
provider.

Mirrors Claude Code's approach in src/utils/messages.ts
(filterOrphanedThinkingOnlyMessages + mergeAdjacentUserMessages). We
drop the whole turn rather than fabricate stub text (the '.' /
'(continued)' pattern from contributor PRs #11098, #13010, #16842 that
were rejected because they put words in the model's mouth).

The stored conversation history (self.messages) is never mutated — only
the per-call api_messages copy. Users still see the reasoning block in
the CLI/gateway transcript; only the wire copy is cleaned. Session
persistence keeps the full trace.

Two call sites covered:
- Main agent loop, after _sanitize_api_messages (catches every turn).
- Iteration-limit-summary fallback path.

Tests: tests/run_agent/test_thinking_only_sanitizer.py — 25 cases
covering detection (string/list content, whitespace-only, tool_calls,
reasoning_details list form), drop behavior, adjacent-user merge
(string+string, list+list, mixed), non-mutation of input dicts, and
system-message handling.

E2E live-tested against 5 providers with a poisoned history (empty
assistant message + reasoning_content): OpenRouter→Anthropic/OpenAI/
DeepSeek-R1/Qwen, native Gemini. All 5 accepted the cleaned request.
Happy-path regression (5/5) confirms the sanitizer is a noop when no
thinking-only turn exists.

Related: #16823 (wontfix — stub-text approach rejected).

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 03:50:51 -07:00
teknium1 5316ce95de chore(release): map simonweng@tencent.com -> Contentment003111
AUTHOR_MAP entry for the tencent-tokenhub provider PR #16860 contributor.
2026-04-28 03:45:52 -07:00
simonweng a6a6cf047d feat(providers): add tencent-tokenhub provider support
Registers tencent-tokenhub (https://tokenhub.tencentmaas.com/v1) as a
new API-key provider with model tencent/hy3-preview (256K context).

- PROVIDER_REGISTRY entry + TOKENHUB_API_KEY / TOKENHUB_BASE_URL env vars
- Aliases: tencent, tokenhub, tencent-cloud, tencentmaas
- openai_chat transport with is_tokenhub branch for top-level
  reasoning_effort (Hy3 is a reasoning model)
- tencent/hy3-preview:free added to OpenRouter curated list
- 60+ tests (provider registry, aliases, runtime resolution,
  credentials, model catalog, URL mapping, context length)
- Docs: integrations/providers.md, environment-variables.md,
  model-catalog.json

Author: simonweng <simonweng@tencent.com>
Salvaged from PR #16860 onto current main (resolved conflicts with
#16935 Azure Anthropic env-var hint tests and the --provider choices=
list removal in chat_parser).
2026-04-28 03:45:52 -07:00
Teknium bd10acd747 fix(providers): honor key_env/api_key_env on Azure Anthropic + accept alias in normalizer (#16935)
Three related fixes around custom env-var-name hints for provider entries.

1. Azure Anthropic path: previously hardcoded to look up AZURE_ANTHROPIC_KEY
   then ANTHROPIC_API_KEY with no way to override.  If a user wrote
     model:
       provider: anthropic
       base_url: https://my-resource.services.ai.azure.com/anthropic
       key_env: MY_CUSTOM_KEY
   the key_env hint was silently ignored and the resolver raised
   'No Azure Anthropic API key found' even when MY_CUSTOM_KEY was set
   in the environment.  The runtime now checks, in order:
     (1) os.getenv(model_cfg.key_env)
     (2) os.getenv(model_cfg.api_key_env)    # docs alias
     (3) model_cfg.api_key                     # inline value
     (4) AZURE_ANTHROPIC_KEY                   # historical default
     (5) ANTHROPIC_API_KEY                     # historical default
   Error message updated to mention key_env as an option.

2. Provider entry normalizer (_normalize_custom_provider_entry): accept
   'api_key_env' as a snake_case alias for 'key_env', and 'apiKeyEnv' as a
   camelCase alias.  Adds both to the _KNOWN_KEYS set so the 'unknown
   config keys ignored' warning doesn't fire on valid configs.

3. _VALID_CUSTOM_PROVIDER_FIELDS: add 'key_env'.  That set documents
   supported custom_providers entry fields; it was drifting from reality
   since key_env has been read at runtime in auxiliary_client.py,
   runtime_provider.py, and main.py for a while.

Docs: website/docs/guides/azure-foundry.md now uses the canonical key_env
field and notes that api_key_env / keyEnv / apiKeyEnv are accepted as
aliases.

Validation: 12 new tests in test_runtime_provider_resolution.py covering
all 5 Azure Anthropic resolution paths + 4 normalizer-alias tests.  Pass
rate across related suites (165 + 46 tests): 100%.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 02:12:08 -07:00
teknium1 4148e85b3a docs(web): document web_search limit parameter and query operators 2026-04-28 02:09:30 -07:00
墨綠BG 4462b349b2 feat(web): expose search result limit 2026-04-28 02:09:30 -07:00
Teknium 4e5ebf07ea fix(matrix): stop tagging the user on every reply (#16932)
The mention_user_id injection from #38a6bada9 unconditionally attached an
@user:server mention pill + MSC3952 m.mentions.user_ids payload to every
outbound reply and every tool-progress status update. The stated intent
was push notifications in muted rooms, but shipped as always-on in every
room, DM or group, muted or not — so every reply pinged the user.

- gateway/platforms/base.py: stop injecting mention_user_id into send
  metadata on every reply; restore the original _thread_metadata passthrough.
- gateway/run.py: drop mention_user_id from status-thread metadata.
- gateway/platforms/matrix.py: drop the mention-pill append block in
  _send_text that consumed the metadata. Keep the reaction-based exec
  approval half of #38a6bada9 and the inbound/outbound m.mentions
  handling (unrelated to the per-reply ping).

Reported by Elkim [NOUS] on Discord.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 02:00:37 -07:00
Teknium 447d800b81 docs: add observability/langfuse to built-in-plugins + env-vars reference (#16929)
Documents the langfuse plugin shipped in #16917:
- website/docs/user-guide/features/built-in-plugins.md: new
  observability/langfuse section (setup wizard vs manual, hook-by-hook
  behaviour, verify / optional tuning / disable)
- website/docs/reference/environment-variables.md: Langfuse Observability
  subsection under Tool APIs listing the 3 required + 5 optional env vars,
  with a back-link to the built-in-plugins page

Validated: ascii-guard clean, npm run build succeeds, #observabilitylangfuse
anchor resolves.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 01:57:52 -07:00
Teknium e63364b8df revert: computer-use cua-driver (PR #16919) (#16927)
Reverts PR #16919 (commits dad10a78d, 413ee1a28, b4a8031b2, afb958829)
which was merged prematurely. Restoring the pre-merge state so #14817
and #15328 can be revisited as standing PRs.

Reverted commits:
- afb958829 fix(computer-use): harden image-rejection fallback + AUTHOR_MAP
- b4a8031b2 fix(computer-use): unwrap _multimodal tool results
- 413ee1a28 feat(computer-use): background focus-safe backend
- dad10a78d feat(computer-use): cua-driver backend, universal any-model schema

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 01:57:21 -07:00
Teknium cf0852f92e feat(claw-migrate): harden OpenClaw import with plan-first apply, redaction, and pre-migration backup (#16911)
* feat(claw-migrate): harden OpenClaw import with plan-first apply, redaction, and pre-migration backup

Adopts four design patterns from OpenClaw's reciprocal migrate-hermes
importer so both migration paths have the same safety posture.

- **Refuse-on-conflict apply.** 'hermes claw migrate' now refuses to
  execute when the plan has any conflict items, unless --overwrite is
  set. Previously the user could say 'yes, proceed' and end up with a
  silent partial migration that skipped every conflicting item.
- **Engine-level secret redaction.** The report.json and summary.md
  written to disk (and --json stdout) run through a redactor that
  matches OpenClaw's key-name markers and value-shape patterns
  (sk-*, ghp_*, xox*-, AIza*, Bearer *). Prevents accidental API key
  leakage in bug reports and support channels.
- **Pre-migration tarball snapshot.** Apply creates one timestamped
  restore-point archive of ~/.hermes/ at ~/.hermes/migration/pre-migration-backups/
  before any mutation, excluding regenerable directories
  (sessions, logs, cache). Opt out with --no-backup.
- **Blocked-by-earlier-conflict sequencing.** If a config.yaml write
  hits conflict/error mid-apply, subsequent config-mutating options
  are marked skipped with reason 'blocked by earlier apply conflict'
  rather than attempting partial writes.
- **Structured warnings[] and next_steps[] on the report** — actionable
  guidance surfaces in both JSON output and summary.md.
- **--json output mode** — emits the redacted report on stdout for CI.

Also flips --preset full to NOT auto-enable --migrate-secrets. Users
now have to opt in to secret import explicitly, mirroring OpenClaw's
two-phase posture.

Status/kind/action constants are defined (STATUS_MIGRATED etc) with
values that match the existing strings the script emits, so the
report schema is backward-compatible. ItemResult gains a 'sensitive'
bool field that redaction and consumers can key off.

Validation: 26 new unit tests + 1 updated test in tests/skills/
test_openclaw_migration_hardening.py and test_claw.py cover redaction
(key markers, value patterns, recursion, on-disk), warnings/next_steps,
blocked-by-earlier sequencing, --json mode, and the preset-flip.
Manual E2E against a fake $HERMES_HOME with real-shaped secrets
confirmed: (1) secrets never appear in stdout or on disk,
(2) _cmd_migrate refuses apply when plan has conflicts,
(3) --overwrite proceeds past the guard and the backup tarball is
created, (4) --no-backup skips the archive.

Related docs: website/docs/guides/migrate-from-openclaw.md and
website/docs/reference/cli-commands.md updated to reflect the
preset-flip and new --no-backup flag.

* refactor(claw-migrate): reuse hermes backup system for pre-migration snapshot

Drops the inline tarball in hermes_cli/claw.py in favor of
hermes_cli.backup.create_pre_migration_backup(), which shares an
implementation with create_pre_update_backup via a new
_write_full_zip_backup helper.  Benefits:

- Consistent exclusion rules with hermes backup (_EXCLUDED_DIRS,
  _EXCLUDED_SUFFIXES, _EXCLUDED_NAMES — single source of truth).
- SQLite safe-copy via _safe_copy_db (state.db restores cleanly).
- Zip format restorable with 'hermes import <archive>'.
- Lives under ~/.hermes/backups/pre-migration-*.zip alongside
  pre-update-*.zip — one place for all snapshot archives.
- Auto-prune rotation with separate keep counters (pre-migration
  keeps 5, pre-update keeps 5, they don't touch each other's files).

7 new tests in tests/hermes_cli/test_backup.py lock the contract:
directory location, shared exclusion rules, _validate_backup_zip
acceptance (i.e. restorable with 'hermes import'), non-recursive
into prior backups, rotation, missing-home handling, and the
invariant that pre-migration rotation never touches pre-update
backups.

Help text and docs updated — the restore hint now says
'hermes import <name>' instead of 'tar -xzf <archive> -C ~/'.

* chore(claw-migrate): use backup._format_size and drop duplicate output line

Minor polish using another existing primitive from hermes_cli.backup:

- Show backup archive size with _format_size (e.g. '(245 B)' or '(2.4 MB)')
  matching the format hermes backup already uses.
- Drop the duplicate 'Pre-migration backup saved' line after Migration
  Results — the earlier 'Pre-migration backup: <path> (<size>)' line
  already surfaces the path before apply runs.

---------

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 01:50:23 -07:00
Teknium a83f669bcf fix(models): auto-derive xAI model list from models.dev cache (#16699)
Follow-up to the static list refresh: replace the hardcoded xAI entries
with _xai_curated_models(), mirroring the _codex_curated_models()
pattern from PR #7844. The helper reads $HERMES_HOME/models_dev_cache.json
at import time (no network call) and falls back to a small static list
when the cache is missing or malformed.

Why: _PROVIDER_MODELS["xai"] has drifted once already (issue #16699) and
will drift again next time xAI renames a model. Hermes already maintains
the models.dev cache and uses it for context-length lookups; pointing
_PROVIDER_MODELS at the same source means the /model picker self-heals on
the next cache refresh instead of requiring a PR.

Behavior:
- With cache populated (normal user): shows every current xAI model ID,
  picks up renames automatically on next refresh.
- Without cache (fresh install, offline): falls back to a static snapshot
  of the 9 current flagship IDs.
- Malformed cache / unexpected shape: same static fallback, no crash.

Import time verified <20ms — disk read only, no HTTP.

Addresses the structural piece of #16699 ("consider a single
_provider_models(provider) resolver") for xAI. Other per-provider lists
can adopt the same pattern as drift is observed.
2026-04-28 01:49:50 -07:00
vominh1919 6c78305294 fix(models): update stale xAI model list (#16699)
_PROVIDER_MODELS["xai"] was pointing at model IDs the xAI direct API
no longer accepts:
- grok-4.20-reasoning
- grok-4-1-fast-reasoning

Replaced with the actual current xAI catalog IDs from models.dev
($HERMES_HOME/models_dev_cache.json, mirror of https://models.dev/api.json):
  grok-4.20-0309-reasoning
  grok-4.20-0309-non-reasoning
  grok-4.20-multi-agent-0309
  grok-4-1-fast
  grok-4-1-fast-non-reasoning
  grok-4-fast
  grok-4-fast-non-reasoning
  grok-4
  grok-code-fast-1

The xAI-direct API (https://api.x.ai/v1) serves the dated IDs shown
above; the bare aliases (grok-4.20, grok-4.1-fast, etc.) are
OpenRouter/Vercel-gateway normalizations and are not accepted on
xAI-direct. Those gateways remain unaffected.

Fixes #16699
2026-04-28 01:49:50 -07:00
teknium1 1b9b5d2957 chore(release): map ThomassJonax author email 2026-04-28 01:49:46 -07:00
ThomassJonax 2f9243c333 fix(session): make SQLite transcript rewrites transactional 2026-04-28 01:49:46 -07:00
teknium1 22ddac4b14 fix(auxiliary): widen URL rewrite + main_runtime to sibling custom branches
Follow-up to PR #16819 applying the same treatment to the two sibling
fallback sites in resolve_provider_client() that carry the identical bug
class as the anonymous-custom branch:

- Named custom provider (providers: / custom_providers: config entries):
  apply _to_openai_base_url() on the OpenAI-wire path (chat_completions /
  codex_responses), leave custom_base untouched on the anthropic_messages
  path where the /anthropic surface is intentional.  Prefer
  main_runtime.get('model') over _read_main_model() so the entry model
  still wins first.  The ImportError fallback for anthropic_messages now
  redoes query-param extraction against the rewritten URL so the final
  OpenAI client hits /v1.

- external_process branch (copilot-acp): same main_runtime.get('model')
  fallback before _read_main_model() so auxiliary tasks on this provider
  track live /model switches instead of stale config.yaml.

Keeps the fix consistent across all three custom-endpoint fallback sites
in resolve_provider_client().
2026-04-28 01:47:25 -07:00
crayfish-ai f3371c39a4 fix(auxiliary): custom provider URL rewrite + main_runtime model for title gen
- auxiliary_client: apply _to_openai_base_url() to custom base_url
  (fixes /anthropic → /v1 rewrite missing for provider="custom")
- auxiliary_client: use main_runtime.get("model") instead of _read_main_model()
  so auxiliary tasks follow system default model changes
- title_generator: thread main_runtime through generate_title → auto_title_session → maybe_auto_title
- cli.py / gateway/run.py: pass main_runtime to maybe_auto_title
- tests: update mock assertions for new main_runtime parameter
2026-04-28 01:47:25 -07:00
teknium1 20b49b71cd chore(release): map steve.westerhouse@origami-analytics.com to westers 2026-04-28 01:47:20 -07:00
westers 1791324604 test(cli): regression coverage for user-provider routing fix (#16767) 2026-04-28 01:47:20 -07:00
westers 632ddf2a0a fix(cli): honor user-defined providers via chat --provider and -m <alias>
Three related issues prevented user-defined providers in `providers:` and
`model_aliases:` from being reachable through standard CLI flags. Requests
silently routed to the configured `model.base_url` instead of the user-
intended endpoint.

* hermes_cli/model_switch.py — root cause of the silent misrouting:
  `_ensure_direct_aliases()` rebound `DIRECT_ALIASES` to a freshly-loaded
  dict, leaving every `from hermes_cli.model_switch import DIRECT_ALIASES`
  caller stuck on the stale empty original. Switched to `.update()` so
  module attribute references stay valid.

* hermes_cli/main.py — chat subcommand `--provider` had `choices=[...]`
  hardcoded to built-in providers, rejecting valid keys from user
  `providers:` config. Dropped the choices list; runtime resolution
  validates correctly downstream.

* hermes_cli/oneshot.py — `-m <alias>` only resolved the model name; the
  alias's base_url was never propagated. Now consults `DIRECT_ALIASES`
  before falling through to `detect_provider_for_model`, and threads the
  alias's base_url to `resolve_runtime_provider(explicit_base_url=...)`.

* hermes_cli/runtime_provider.py — `_resolve_named_custom_runtime` now
  honors `(provider="custom", explicit_base_url=...)` so a base_url
  propagated from a direct-alias resolution actually builds a runtime
  instead of falling through to provider-registry handlers that don't
  know about ad-hoc local endpoints.

Verified: `hermes chat --provider <user-key> -m <model> -q "..."` and
`hermes -m <user-alias> -z "..."` both route to the user-intended
endpoint, observable via the target server's request log.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 01:47:20 -07:00
Teknium afb9588298 fix(computer-use): harden image-rejection fallback + AUTHOR_MAP
Follow-up to #15328's vision-unsupported retry branch in run_agent.py.

_strip_images_from_messages() previously deleted any message whose content
was entirely images. That's fine for synthetic user messages injected for
attachment delivery, but it breaks providers for tool-role messages — the
paired tool_call_id on the preceding assistant message ends up unmatched,
which OpenAI-compatible APIs reject with HTTP 400.

Fix: tool-role messages whose content becomes empty are replaced with a
plaintext placeholder that preserves the tool_call_id linkage. Only
non-tool messages are dropped. Added 10 tests covering the role-alternation
invariants + image-type coverage.

Image-rejection detector: expanded phrase list (image content not
supported / multimodal input / vision input / model does not support
image) and gated on 4xx status so transient 5xx errors never get
misinterpreted as 'server said no to images'. Detection is documented as
best-effort English phrase matching.

AUTHOR_MAP: mapped 3820588+ddupont808@users.noreply.github.com to
ddupont808 so release notes attribute the salvage correctly.
2026-04-28 01:46:36 -07:00
ddupont b4a8031b2e fix(computer-use): unwrap _multimodal tool results to content list for non-Anthropic providers
Tool handlers (e.g. computer_use capture) return a _multimodal envelope
dict when a screenshot is attached. The tool-message builder was passing
this raw dict as the `content` field of role:tool messages, which is an
illegal format — OpenAI-compatible APIs expect a string or a content-parts
list, not a plain Python dict, and would reject it with a 400/422 error.

Fix: unwrap _multimodal results to their `content` list
([{type:text,...},{type:image_url,...}]) in both the parallel and
sequential tool-call paths. The Anthropic adapter already handles content
lists natively; vision-capable OpenAI-compatible servers (mlx-vlm,
GPT-4o, etc.) accept image_url parts in tool messages directly.

Also add a _vision_supported adaptive fallback: on first image-rejection
error ("Only 'text' content type is supported." etc.) the agent strips all
image parts from the message history and retries with text only, so
text-only endpoints degrade gracefully without crashing the session.
2026-04-28 01:46:36 -07:00
ddupont 413ee1a286 feat(computer-use): background focus-safe backend — set_value, structured windows, MIME detection
Extends the cua-driver computer-use backend to drive backgrounded macOS
windows without stealing keyboard or mouse focus from the foreground app.
All changes target the cua-driver MCP backend and the shared dispatcher.

## cua_backend.py

**Window-aware capture**: capture() now calls list_windows + get_window_state
instead of the removed capture tool. Prefers structuredContent.windows
(MCP 2024-11-05+ cua-driver) for zero-parse window enumeration; falls back
to regex-parsed text for older builds. Stores the selected (pid, window_id)
as sticky context so subsequent action calls do not need a redundant round-trip.

**Action routing**: click/scroll/type_text/key all carry the sticky pid
(and window_id for element-indexed clicks). type_text routes through
type_text_chars (individual key events) rather than AX attribute write --
WebKit AXTextFields reject attribute writes from backgrounded processes.

**Key parsing**: _parse_key_combo splits cmd+s-style strings into
(key, [modifiers]) and routes to hotkey (modifier present) or
press_key (bare key) -- cua-driver actual tool names.

**set_value method**: new set_value(value, element) calls the cua-driver
set_value MCP tool. For AXPopUpButton / HTML select in a backgrounded Safari,
AXPress opens the native macOS popup which closes immediately when the app is
non-frontmost; set_value AX-presses the matching child option directly
(no menu required, no focus steal).

**focus_app**: reimplemented as a pure window-selector (enumerates
list_windows, sets sticky pid/window_id) without ever raising the window
or stealing focus.

**list_apps**: fixed tool name from listApps to list_apps; handles plain-text
response via regex when structured data is absent.

**Structured-content extraction**: _extract_tool_result now surfaces
structuredContent from MCP results, enabling the list_windows window array
without text parsing.

**Helpers**: _parse_windows_from_text, _parse_elements_from_tree,
_split_tree_text, _parse_key_combo extracted as module-level functions.

## schema.py

Added set_value to the action enum with a description explaining when to
prefer it over click (select/popup elements, sliders, no focus steal).
Added value field for set_value payloads.

## tool.py

Routed set_value action through _dispatch to backend.set_value.
Added set_value to _DESTRUCTIVE_ACTIONS (approval-gated).
Fixed MIME-type detection in _capture_response: cua-driver may return
JPEG; detect from base64 magic bytes (/9j/ -> image/jpeg, else image/png)
rather than hardcoding image/png.

## agent/display.py + run_agent.py

Guard _detect_tool_failure and result-preview logic against non-string
function_result values: multimodal tool results (dicts with _multimodal=True)
are not string-sliceable; treat them as successes and fall back to str()
for length/preview.
2026-04-28 01:46:36 -07:00
Teknium dad10a78d0 feat(computer-use): cua-driver backend, universal any-model schema
Background macOS desktop control via cua-driver MCP — does NOT steal the
user's cursor or keyboard focus, works with any tool-capable model.

Replaces the Anthropic-native `computer_20251124` approach from the
abandoned #4562 with a generic OpenAI function-calling schema plus SOM
(set-of-mark) captures so Claude, GPT, Gemini, and open models can all
drive the desktop via numbered element indices.

- `tools/computer_use/` package — swappable ComputerUseBackend ABC +
  CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary).
- Universal `computer_use` tool with one schema for all providers.
  Actions: capture (som/vision/ax), click, double_click, right_click,
  middle_click, drag, scroll, type, key, wait, list_apps, focus_app.
- Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style
  `content: [text, image_url]` parts) that flows through
  handle_function_call into the tool message. Anthropic adapter converts
  into native `tool_result` image blocks; OpenAI-compatible providers
  get the parts list directly.
- Image eviction in convert_messages_to_anthropic: only the 3 most
  recent screenshots carry real image data; older ones become text
  placeholders to cap per-turn token cost.
- Context compressor image pruning: old multimodal tool results have
  their image parts stripped instead of being skipped.
- Image-aware token estimation: each image counts as a flat 1500 tokens
  instead of its base64 char length (~1MB would have registered as
  ~250K tokens before).
- COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset
  is active.
- Session DB persistence strips base64 from multimodal tool messages.
- Trajectory saver normalises multimodal messages to text-only.
- `hermes tools` post-setup installs cua-driver via the upstream script
  and prints permission-grant instructions.
- CLI approval callback wired so destructive computer_use actions go
  through the same prompt_toolkit approval dialog as terminal commands.
- Hard safety guards at the tool level: blocked type patterns
  (curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash,
  force delete, lock screen, log out).
- Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic)
  workflow guide.
- Docs: `user-guide/features/computer-use.md` plus reference catalog
  entries.

44 new tests in tests/tools/test_computer_use.py covering schema
shape (universal, not Anthropic-native), dispatch routing, safety
guards, multimodal envelope, Anthropic adapter conversion, screenshot
eviction, context compressor pruning, image-aware token estimation,
run_agent helpers, and universality guarantees.

469/469 pass across tests/tools/test_computer_use.py + the affected
agent/ test suites.

- `model_tools.py` provider-gating: the tool is available to every
  provider. Providers without multi-part tool message support will see
  text-only tool results (graceful degradation via `text_summary`).
- Anthropic server-side `clear_tool_uses_20250919` — deferred;
  client-side eviction + compressor pruning cover the same cost ceiling
  without a beta header.

- macOS only. cua-driver uses private SkyLight SPIs
  (SLEventPostToPid, SLPSPostEventRecordTo,
  _AXObserverAddNotificationAndCheckRemote) that can break on any macOS
  update. Pin with HERMES_CUA_DRIVER_VERSION.
- Requires Accessibility + Screen Recording permissions — the post-setup
  prints the Settings path.

Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-
native schema). Credit @0xbyt4 for the original #3816 groundwork whose
context/eviction/token design is preserved here in generic form.
2026-04-28 01:46:36 -07:00
kshitijk4poor 42cc905c13 feat(plugins): add bundled observability/langfuse plugin
Opt-in Langfuse tracing for Hermes conversations — LLM calls, tool
usage, usage/cost breakdown per span. Hooks into pre/post_api_request,
pre/post_llm_call, pre/post_tool_call. SDK is optional; missing SDK or
credentials renders the plugin inert.

Salvaged from PR #16845 by @kshitijk4poor, who wrote the plugin
(~875 LOC, 6 hooks, Langfuse usage-details/cost-details normalization,
read_file payload summarization).

Salvage scope (why this isn't PR #16845 as-authored):
- Lives at plugins/observability/langfuse/ (standalone kind, opt-in via
  plugins.enabled) instead of a new parallel optional-plugins/
  directory. Standalone bundled plugins are already opt-in — only their
  plugin.yaml is scanned at startup; the Python module is not imported
  unless the user enables it. The premise of optional-plugins/ (avoid
  import cost for users who don't want it) is already solved by the
  existing plugin system.
- Dropped the triple activation gate (plugins.enabled +
  plugins.langfuse.enabled + HERMES_LANGFUSE_ENABLED). The Hermes plugin
  system's own enable/disable is authoritative; runtime credentials
  gate whether the hook actually traces.
- Rewrote _is_enabled() → cached _get_langfuse() with an _INIT_FAILED
  sentinel. The original called hermes_cli.config.load_config() from
  every hook invocation (full yaml parse + deep merge + env expansion
  on every pre/post_tool_call, potentially 100+ times per turn). The
  cached version reads env once and returns the cached client or None
  on every subsequent call with zero further work.
- hermes tools → Langfuse Observability post-setup adds
  observability/langfuse to plugins.enabled directly (via
  _save_enabled_set) instead of going through an install-copy flow.

Enable:
  hermes tools                                        # interactive
  hermes plugins enable observability/langfuse        # manual

Required env (set by `hermes tools` or in ~/.hermes/.env):
  HERMES_LANGFUSE_PUBLIC_KEY
  HERMES_LANGFUSE_SECRET_KEY
  HERMES_LANGFUSE_BASE_URL                            # optional

Co-authored-by: kshitijk4poor <kshitijk4poor@gmail.com>
2026-04-28 01:40:59 -07:00
Surat Srichan 4d3e3ff8a2 fix(gateway): coerce plaintext "restart gateway" DMs to /restart
Narrow plaintext shortcut that rewrites a tiny set of admin phrases
("restart gateway", "restart the gateway", "restart hermes") into the
/restart slash command, but only in DMs. Scope is intentionally tight:

- DM text messages only — group chats keep natural-language semantics
- Exact restart-style phrases only
- Skips anything already starting with "/"

Without this, the LLM can receive "restart gateway" as a user turn and
try to satisfy it via the terminal tool (systemctl restart ...). That
kills the gateway while the originating agent is still running, which
leaves systemd in "draining" state waiting on a process it's about to
kill. Routing the phrase to the slash-command dispatcher bypasses the
agent loop and uses the existing restart machinery (request_restart).

Called once, at the adapter level in BasePlatformAdapter.handle_message,
so every platform gets it for free and pending-message reinjection is
covered by the same call site.

Adds 2 Telegram-parametrized e2e tests: DM routes to request_restart,
group chats fall through to the normal agent path.
2026-04-28 01:40:28 -07:00
Teknium c9d8b916d1 chore(release): map @beesrsj2500 contributor emails to GitHub login 2026-04-28 01:40:25 -07:00
Surat Srichan a8f9c56cb4 fix(config): accept fallback_model list (chain) in validator + save
Runtime already supports list-form fallback_model (run_agent.py:1459
iterates fallback_chain; fallback_cmd.py migrates legacy single-dict
configs to list format). The config validator and save_config comment
gate still assumed single-dict form and flagged list-form configs as
errors. Fix both:

- validate_config_structure: when fallback_model is a list, validate
  each entry has provider+model; keep the existing single-dict path.
- save_config: suppress the "add fallback_model" comment when any list
  entry is well-formed.

Adds 4 list-form validator tests.
2026-04-28 01:40:25 -07:00
Teknium 0edcc57d9a fix(acp): wire HERMES_SESSION_KEY per session so sudo cache scope activates
PR #16858's session-scoped interactive sudo password cache falls back to
a thread-identity scope when no HERMES_SESSION_KEY is bound. ACP never
set that contextvar, so two ACP sessions landing on the same reused
ThreadPoolExecutor thread still shared the cache — the exact scenario
the PR headlined.

acp_adapter/server.py now:
- binds HERMES_SESSION_KEY=<session_id> via gateway.session_context
  inside _run_agent() (and clears on exit)
- wraps the loop.run_in_executor(_executor, _run_agent) call in a fresh
  contextvars.copy_context() so concurrent ACP sessions don't stomp on
  each other's ContextVar writes (executor pool threads would otherwise
  share a context).

Adds tests/acp/test_approval_isolation.py::
  test_sudo_password_cache_isolated_across_acp_sessions_on_same_pool_thread
which drives two back-to-back sessions through a 1-worker ThreadPoolExecutor
and asserts B does not observe A's cached password.
2026-04-28 01:34:16 -07:00
hharry11 de03a332f7 fix(security): isolate interactive sudo password cache per session 2026-04-28 01:34:16 -07:00
Teknium efb7d27609 chore(release): map yes999zc@163.com to yes999zc 2026-04-28 01:33:00 -07:00
Teknium 8d76d69d48 fix(state): repair FTS5 delete trigger and add v11 migration for tool-call indexing
Follow-up on top of the cherry-picked contributor commit for #16751:

1. Delete triggers: the original PR switched FTS5 from external to inline
   content mode and concatenated content || tool_name || tool_calls in
   the insert/update triggers, but left the delete triggers passing
   old.content to the FTS5 delete-command. FTS5 inline delete requires
   the content to match what was stored, so every DELETE on messages
   raised 'SQL logic error'. Replaced with plain DELETE FROM ... WHERE
   rowid = old.id on all four delete paths (normal + trigram, delete +
   update-delete).

2. v11 migration: existing DBs have the old external-content FTS tables
   and triggers. Because CREATE VIRTUAL TABLE IF NOT EXISTS / CREATE
   TRIGGER IF NOT EXISTS skip when the objects already exist, upgraders
   would have kept the broken behavior forever. Bumped SCHEMA_VERSION
   to 11 and added a migration that drops both FTS tables + all 6 old
   triggers, recreates them via FTS_SQL / FTS_TRIGRAM_SQL, and backfills
   from messages using the same concatenation expression.

3. Regression tests: 6 new tests cover INSERT / UPDATE / DELETE paths
   for tool_name + tool_calls indexing plus the full v10 -> v11 upgrade
   path on a hand-built legacy DB.
2026-04-28 01:33:00 -07:00
Bakey Dev. cfcad80ee1 fix(state): index tool_calls and tool_name in FTS5 for session_search
The FTS5 virtual tables (messages_fts, messages_fts_trigram) previously
only indexed the content column via external content mode. Tool calls
and tool names stored in the tool_calls (JSON) and tool_name columns
were invisible to FTS5 search.

Root cause: FTS5 triggers only INSERTed new.content into the index.

Changes:
- Switch FTS5 tables from external content (content=messages) to inline
  mode so that trigger-inserted content is both indexed and stored
- Update all 6 FTS5 triggers to concatenate content, tool_name, and
  tool_calls when indexing new messages
- Extend the short-CJK LIKE fallback to also search tool_name and
  tool_calls columns

Closes: #16751
2026-04-28 01:33:00 -07:00
Teknium 7d884f81c4 chore(release): add crayfish-ai to AUTHOR_MAP 2026-04-28 01:31:40 -07:00
crayfish-ai abefd89059 fix(search): quote underscored terms in FTS5 query sanitization
FTS5 default tokenizer splits 'sp_new1' into tokens 'sp' and 'new1'.
Without quoting, a search for 'sp_new' becomes an AND query
('sp AND new') that fails to match rows indexed as 'sp_new1'.

Fix: add underscore to the character class in Step 5 regex
([.-] -> [._-]) so underscored terms are wrapped in double quotes.

Also adds test_sanitize_fts5_quotes_underscored_terms.
2026-04-28 01:31:40 -07:00
vominh1919 0169c51820 fix(config): add request_timeout_seconds and stale_timeout_seconds to provider _KNOWN_KEYS
Both keys are documented in cli-config.yaml.example and read at runtime by
hermes_cli/timeouts.py (get_provider_request_timeout and get_provider_stale_timeout),
but the provider-entry validator in config.py flagged them as unknown, producing
noisy warnings on every CLI invocation for users who followed the documented config.

Fixes #16779
2026-04-28 01:28:25 -07:00
Teknium db305bba8b chore(dashboard): address copilot review nits on #16861
- App.tsx doc comment: replace stale ChatPageHost reference with
  'persistent chat host block rendered inline near the bottom of this
  file' so readers can find the actual code.
- App.tsx persistent host: show a small spinner on /chat while plugin
  manifests are loading instead of a blank content area.  Direct
  /chat deep-links used to paint empty for up to ~2s in the worst
  case (plugin-registration safety timeout) because both the route
  sink (null) and the persistent host (!pluginsLoading gate) render
  nothing during that window.  Non-chat routes stay empty as before.
- ChatPage.tsx: rename setter to match the 'raw' state — useState
  now destructures as [mobilePanelOpenRaw, setMobilePanelOpenRaw],
  and all four call sites (closeMobilePanel, matchMedia listener,
  open-button onClick, plus destructure) updated accordingly.  No
  behavior change; matches the 'raw vs derived' convention the
  original comment set up.
2026-04-28 01:25:41 -07:00
emozilla d293e0051e fix(dashboard): persist chat tab state across tab switches
The dashboard's Chat tab (hermes dashboard --tui) lost its session
whenever the user navigated to another tab and came back.  React Router
unmounted ChatPage on path change, which ran the cleanup function,
closed the PTY WebSocket, and terminated the underlying TUI child -
so the next mount generated a fresh channel id, spawned a new PTY, and
started a brand-new conversation.

Rather than rebuild the destroyed state (session id capture + resume
via HERMES_TUI_RESUME would reload history from disk but drop in-flight
tool state, scrollback, and picker position), keep the component tree
alive.

* Pull ChatPage out of Routes into a sibling always-mounted host that
  toggles visibility via display:none keyed off the current route.  A
  tiny ChatRouteSink still claims /chat so the catch-all redirect
  does not fire.
* xterm instance, WebSocket, PTY child, and TUI/agent state all
  survive; returning to /chat shows the exact conversation the user
  left.
* Respect plugin `/chat` overrides: if a plugin manifest declares
  `tab.override: "/chat"`, the Routes tree already swaps the element
  for <PluginPage /> — we additionally suppress the persistent host
  so the two don't paint on top of each other.  Preserves the
  pre-persistence contract that a plugin owning /chat replaces the
  built-in chat UI entirely.
* Wait for usePlugins() to finish loading before mounting the
  persistent host.  Manifests arrive asynchronously from
  /api/dashboard/plugins, so without the `!pluginsLoading` gate the
  host would mount with manifests=[], spawn a PTY, and then unmount
  mid-session when the manifest list resolves and reveals a /chat
  override.  Typical delay is <50ms; worst case is the 2s plugin-
  registration safety timeout.  Cheaper than killing someone's
  conversation underneath them.
* Gate page-header slot (`setEnd`), the mobile sheet's portalled
  render, and body-scroll lock on a new `isActive` prop so the hidden
  ChatPage doesn't fight the active page for shared state.  The
  scroll-lock effect keys on the *derived* `mobilePanelOpen` (which is
  `isActive && mobilePanelOpenRaw`) rather than the raw state — that
  way tab-switch flips the dep false, fires the cleanup, and releases
  `document.body.style.overflow`.  Keying on the raw state would leave
  body.overflow="hidden" stuck on /sessions and every other tab until
  the user navigated back to /chat and explicitly closed the sheet.
* When isActive flips false to true, force a double-rAF fit:
  display:none collapses the host box and ResizeObserver does not fire
  on display changes, so xterm would otherwise stay at a stale or 1x1
  grid.  Also early-return from syncTerminalMetrics when the host has
  zero area, since fit() on a zero-sized element produces a 1x1
  terminal.
* Focus handling on tab return: only steal focus into the terminal if
  focus wasn't already parked somewhere inside ChatPage (e.g. the
  sidebar model picker, a tool-call entry).  Yanking focus away from
  whatever the user last clicked is surprising and a screen-reader
  foot-gun; the typical "first activation" case still focuses the
  terminal because document.activeElement is <body> at that point.

Trade-off worth flagging, deliberately not mitigated in this change:
while hidden, ChatPage still holds a PTY child + WebSocket + xterm
instance for the dashboard's full lifetime.  The WS keeps delivering
bytes and xterm keeps parsing them into a display:none host (cheap —
no paint work, but not free).  Reasonable costs to pay for the session
preservation; if they become a problem we can pause `term.write` when
!isActive or idle-disconnect after N minutes hidden.

Lint clean on touched files.  tsc -b && vite build pass.
2026-04-28 01:25:41 -07:00
Teknium 185ecc71f1 docs: document agent.disabled_toolsets config + AUTHOR_MAP
Follow-up to the salvaged PR #16867 that added the read path for
agent.disabled_toolsets in _get_platform_tools():

- Document the new config key under a "Global Toolset Disable" section
  in website/docs/user-guide/configuration.md, including the precedence
  note (global disable overrides per-platform platform_toolsets).
- Map nazirulhafiy@gmail.com -> nazirulhafiy in scripts/release.py
  AUTHOR_MAP so release-notes CI attributes the cherry-picked commit.
2026-04-28 01:23:16 -07:00
Hafiy Zakaria 40bd6d4709 fix: honor agent.disabled_toolsets in gateway sessions
Previously, agent.disabled_toolsets in config.yaml only worked for CLI
mode (run_agent.py --disabled_toolsets). The gateway always passed
enabled_toolsets to AIAgent, and get_tool_definitions() ignored
disabled_toolsets when enabled_toolsets was set.

Fix: _get_platform_tools() now reads agent.disabled_toolsets from config
and excludes those toolsets from the returned set. This runs last so it
overrides everything above.

Added 3 tests covering cross-platform suppression, explicit platform
config override, and empty/missing config no-op behavior.
2026-04-28 01:23:16 -07:00
Teknium d63abbc329 fix(agent): persist streamed reasoning_content on assistant turns (#16844) (#16892)
Streaming-only providers (glm, MiniMax, gpt-5.x via aigw, Anthropic via
openai-compat shims) emit reasoning through delta.reasoning_content
chunks that get accumulated into the local reasoning_text string — but
never land on the assistant message object as a top-level attribute. The
prior guard at _build_assistant_message only wrote reasoning_content
when the SDK exposed hasattr(msg, 'reasoning_content'), so these
providers persisted the chain-of-thought under the internal 'reasoning'
key and omitted the protocol-standard field.

The poison was silent until the user later switched to a DeepSeek-v4 or
Kimi thinking model, at which point replay failed with HTTP 400:
'The reasoning_content in the thinking mode must be passed back to the
API.' One reported session store accumulated 4,031 poisoned messages
across 1,101 files (#16844).

Fix: add an additive fallback that promotes the already-sanitized
reasoning_text to reasoning_content when no earlier branch wrote it AND
reasoning text was actually captured. Layered on top of the existing
SDK-attr branch and DeepSeek ''-pad (#15250) rather than replacing them,
so every existing behavior is preserved:

- SDK-exposed reasoning_content (OpenAI/Moonshot/DeepSeek SDK) still
  wins.
- DeepSeek tool-call ''-pad still fires when the SDK exposes the attr
  but the value is None.
- Non-thinking turns with no reasoning leave the field absent, so
  _copy_reasoning_content_for_api's cross-provider leak guard (#15748),
  promote-from-'reasoning' tier, and thinking-pad tier remain live at
  replay time.
- No empty '' gets eagerly written on every assistant turn (which would
  have bypassed the read-side ladder and triggered empty thinking-block
  insertion in the Anthropic adapter).

Tests: three new TestBuildAssistantMessage cases covering the streaming
promotion path, SDK precedence, and field-absent-when-no-reasoning
invariant.

Credit @Sanjays2402 for the original diagnosis and patch in #16884;
this is a scoped rework that preserves the existing read-side
compensation code as defense in depth.

Refs #16844, #16884, #15250, #15353, #15748.
2026-04-28 01:19:18 -07:00
briandevans 66a05e44d6 fix(copilot): require successful exchange when walking credential_pool catalog tokens
Address Copilot review on #16868:

1. Tighten pool iteration. ``validate_copilot_token`` only rejects empty
   strings and classic PATs (``ghp_*``); a malformed/unsupported ``gho_*``
   token at ``credential_pool.copilot[0]`` would pass the gate and short-
   circuit the loop, hiding a later valid entry. Switch to calling
   ``exchange_copilot_token`` directly: only entries that actually exchange
   into a live Copilot API token are returned. Bad/expired entries fall
   through to the next, and an exhausted pool returns ``""`` so the picker
   falls back to the curated list (existing behaviour).

2. Reword the docstring + test module docstring to describe the pool seed
   path accurately — ``hermes auth add copilot`` adds an api-key-typed
   credential whose ``access_token`` field stores the pasted token, and
   ``_seed_from_env`` mirrors ``COPILOT_GITHUB_TOKEN`` from
   ``~/.hermes/.env`` into the pool. The previous wording implied
   ``auth add copilot`` itself ran the device-code flow, which it does
   not (the device-code flow lives in ``hermes model``).

Two new tests cover the iteration change:
  - ``test_skips_pool_entry_that_fails_to_exchange`` — pool[0] raises,
    pool[1] succeeds, picker uses pool[1].
  - ``test_all_pool_entries_fail_exchange_returns_empty`` — every entry
    raises, return ``""``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 01:18:09 -07:00
briandevans fdfe40a48b fix(copilot): fall back to credential_pool OAuth access_token for /model picker (#16708)
Users whose only Copilot credential is the OAuth `access_token` saved by
`hermes auth add copilot` (device-code flow) saw the `/model` picker drop
back to a stale hardcoded list. Reason: `_resolve_copilot_catalog_api_key`
only consulted env vars (`COPILOT_GITHUB_TOKEN` / `GH_TOKEN` /
`GITHUB_TOKEN`) and the `gh auth token` CLI fallback, never the credential
pool that Hermes's own login flow writes into `auth.json`. With no token,
the live catalog fetch silently 401s and the picker hides current models
(claude-opus-4.7, claude-sonnet-4.6, gpt-5.5, grok-code-fast-1) — even
though `/model <id>` works fine because runtime inference reads the pool
through a different code path.

Mirror the Codex catalog resolver pattern: env-var first (unchanged), then
walk `read_credential_pool("copilot")` for the first entry with a
supported `access_token` (`gho_*` / `github_pat_*` / `ghu_*`). Run it
through `get_copilot_api_token()` so the catalog request uses the same
exchanged token the runtime path uses. Classic PATs (`ghp_*`) are still
rejected up-front via `validate_copilot_token` since the Copilot API
doesn't accept them.

Strictly additive: env still wins, and a missing/locked auth.json (or any
exception during pool read) still returns "" so the caller falls through
to the curated catalog.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 01:18:09 -07:00
Teknium dd789a4fdf fix(mcp): move discovery out of model_tools import side effect (#16856) (#16899)
model_tools.py ran discover_mcp_tools() as a module-level side effect.
discover_mcp_tools() uses a blocking 120s wait internally (via
_run_on_mcp_loop -> future.result(timeout=120)).

The gateway lazy-imports run_agent -> model_tools on the first user
message, which happens inside the asyncio event loop thread.  A slow or
unreachable MCP server therefore froze Discord shard heartbeats and
Telegram polling for up to 120s on the first message after gateway
start.

Fix: remove the module-level call.  Every entry point now runs
discovery explicitly at its own startup, using the context-appropriate
blocking/non-blocking pattern:

- gateway/run.py:       loop.run_in_executor(None, discover_mcp_tools)
                        before platforms start accepting traffic
- hermes_cli/main.py:   inline (no event loop at CLI startup)
- tui_gateway/entry.py: inline (sync stdin loop, no event loop)
- acp_adapter/entry.py: inline before asyncio.run()

Closes #16856.
2026-04-28 01:17:58 -07:00
Teknium c8ef786926 chore(release): AUTHOR_MAP entry for @ztexydt-cqh 2026-04-28 01:17:17 -07:00
ztexydt-cqh 1d5e25f353 fix(gateway): persist /sethome home channel to .env across all platforms
_handle_set_home_command wrote FEISHU_HOME_CHANNEL / DISCORD_HOME_CHANNEL /
etc. as top-level keys into config.yaml, but load_gateway_config() only
reads home channels from env vars. After every gateway restart the home
channel was lost — on every platform, not just Feishu.

Fix: switch /sethome to save_env_value(), which atomically writes to
~/.hermes/.env and updates the current process env in one shot. The
handler builds the env key from platform_name.upper(), so one line
change repairs /sethome for every platform that has a HOME_CHANNEL
env var.

Also widen _EXTRA_ENV_KEYS in hermes_cli/config.py so HOME_CHANNEL and
HOME_CHANNEL_NAME for every platform are treated as managed env vars:
SIGNAL, SLACK, SMS, DINGTALK, BLUEBUBBLES, FEISHU, WECOM, YUANBAO, plus
the missing *_NAME variants for DISCORD/TELEGRAM/MATTERMOST.

Closes #16806

Co-authored-by: teknium1 <screenmachine@gmail.com>
2026-04-28 01:17:17 -07:00
Teknium 9e4d79b17f fix(tui): /model writes HERMES_TUI_PROVIDER unconditionally (#16857) (#16897)
`/new` after `/model <custom-provider>:<model>` silently reverted to a
native provider whose static catalog happened to contain the same model
name (e.g. `deepseek-v4-pro` → native `deepseek` → 401).

Root cause at the `/model` writeback site: `HERMES_INFERENCE_PROVIDER`
was set unconditionally but `HERMES_TUI_PROVIDER` was only mirrored when
it was already set. On sessions launched without `--provider`,
`HERMES_TUI_PROVIDER` stayed unset, so `_resolve_startup_runtime()` on
`/new` skipped the explicit-provider early return and fell through to
`detect_static_provider_for_model()`.

Fix: set `HERMES_TUI_PROVIDER` unconditionally alongside
`HERMES_INFERENCE_PROVIDER` when `/model` lands. Keeps #15755's
invariant intact — `HERMES_TUI_PROVIDER` remains the canonical
"explicit this process" carrier, `HERMES_INFERENCE_PROVIDER` remains
ambient and does not short-circuit startup resolution.

Bug report and diagnosis: @Bartok9 in #16857 / #16873.

Fixes #16857
2026-04-28 01:17:04 -07:00
Teknium 9048fd020f fix(cli): tighten stale-dashboard match to explicit patterns
Replace the Linux/macOS pgrep regex ("hermes.*dashboard") with a ps
scan + the same explicit patterns list already used on the Windows
branch and in hermes_cli.gateway._scan_gateway_pids:

    hermes dashboard
    hermes_cli.main dashboard
    hermes_cli/main.py dashboard

The old greedy regex would match any cmdline containing both words —
e.g. a chat session whose argv mentions "dashboard" or an unrelated
grafana/dashboard-server process. Added regression tests for both.

Follow-up tightening on #16881.
2026-04-28 01:14:44 -07:00
Societus 66b1142384 fix(cli): warn about stale dashboard processes after hermes update
The dashboard is a long-lived server process users start and forget.
When hermes update replaces files on disk, the running process holds
the old Python backend in memory while the JS bundle gets updated,
producing a silent frontend/backend mismatch (e.g. v0.11.0 changed
the session token header -- old backends reject every API call).

Scan for running dashboard processes after a successful update (both
git and ZIP paths) and print a warning with their PIDs and restart
instructions. Mirrors the existing pattern for gateway processes.

Fixes #16872
2026-04-28 01:14:44 -07:00
ygd58 6b6fc28e85 fix(delegate): clear acp_command when override_provider is set
When delegation.provider is configured (e.g. minimax-cn), subagents
inherited the parent's acp_command unconditionally. This caused
run_agent.py to initialize CopilotACPClient, which bypassed the
override credentials entirely and used its own default model
(provider=copilot-acp model=qwen3.5-397b-a17b) instead of the
configured delegation.provider and delegation.model.

Fix: when override_provider is set but override_acp_command is not,
clear effective_acp_command and effective_acp_args so the child agent
uses direct API calls with the configured provider credentials.

The existing override_acp_command path is unchanged — explicit ACP
transport overrides still force provider=copilot-acp as before.

Fixes #16816
2026-04-28 01:14:38 -07:00
Teknium 54e24f7758 test(runtime_provider): lock in model-derivation precedence over stale api_mode
PR #16888 swaps the opencode-zen/go resolver so that api_mode is always
re-derived from the effective model before the persisted api_mode is
consulted. That's the point of the fix — a stale anthropic_messages
from a previous minimax default must not survive a /model switch to a
chat_completions target (or vice versa) and strip /v1 from base_url.

The prior test asserted the opposite precedence — that a persisted
api_mode won over model-derived mode — and was added in #4508 to lock
in escape-hatch behavior. Under the new precedence that escape hatch
no longer exists for opencode (only for providers that genuinely
support both modes at a single endpoint — and for opencode the model
name is the unambiguous signal). Rename + invert the assertion to
document the intentional behavior change.

Refs #16878.
2026-04-28 01:14:35 -07:00
Sanjay b52ceccfa8 fix(opencode): re-derive api_mode per target model on /model switch
opencode-zen and opencode-go each serve both anthropic_messages
(e.g. minimax-m2.7) and chat_completions (e.g. deepseek-v4-flash)
models behind a single base_url. The api_mode resolver in
hermes_cli/runtime_provider.py honoured the persisted
model_cfg.api_mode (set by the previous default model) before checking
the opencode model registry, so /model deepseek-v4-flash from a session
whose default was minimax-m2.7 inherited 'anthropic_messages', stripped
'/v1' from base_url (the Anthropic SDK adds its own /v1/messages), and
404'd.

Promote the opencode detection branch above the configured_mode check
in both api_mode resolution paths:

- _resolve_runtime_from_pool_entry (pool-backed providers)
- _resolve_api_key_runtime          (api-key providers, fallback path)

Both branches now call opencode_model_api_mode(provider, effective_model)
unconditionally for opencode-zen/go before considering any persisted
api_mode, so the mode always reflects the model the user just switched
to.

Existing tests pass (12/12 in tests/hermes_cli/test_model_switch_opencode_anthropic.py).

Fixes #16878
2026-04-28 01:14:35 -07:00
Teknium 755f050c67 chore(release): map qiyin-code email to GitHub login 2026-04-28 01:14:31 -07:00
左奇银 07a818804e feat(alibaba): add qwen3.6-plus to supported models
- Add qwen3.6-plus to the Alibaba DashScope curated model list
- Enables model switching via /model qwen3.6-plus without auto-correction warning
2026-04-28 01:14:31 -07:00
loongzhao 474c725b49 fix(yuanbao) messaging platform entrance 2026-04-28 01:11:37 -07:00
Teknium 8269f9056c feat(fast): broaden /fast whitelist to all OpenAI + Anthropic models (#16883)
Switch _PRIORITY_PROCESSING_MODELS and _ANTHROPIC_FAST_MODE_MODELS from
hardcoded frozensets to prefix-based matching. Any gpt-*, o1*, o3*, o4*
(OpenAI) and any claude-* (Anthropic) now exposes /fast.

Fixes the case where gpt-5.5 and other post-catalog models silently
skipped Priority Processing because they weren't in the frozenset.
Future OpenAI/Anthropic releases will work without a catalog bump.

Safety:
- Codex-series (*codex*) still excluded — they route through the Codex
  Responses API which doesn't take service_tier.
- Anthropic adapter already gates speed=fast on native endpoints only
  (_is_third_party_anthropic_endpoint), so claude-sonnet-4.6 on
  OpenRouter/Bedrock/opencode-zen won't leak the unknown beta.
- service_tier=priority is silently dropped by non-OpenAI proxies, so
  false positives are harmless.
2026-04-28 00:44:43 -07:00
helix4u 6ce796b495 fix(cron): preserve Telegram topic targets 2026-04-28 00:44:12 -07:00
Teknium cff29fa7fd chore(migration): reuse existing load_openclaw_config() helper
Drop the duplicate _load_openclaw_config_early() added in the salvaged
commit — load_openclaw_config() (line 979) has the identical body and
is a plain instance method that only needs self.source_root, which is
already set before __init__ needs it.
2026-04-28 00:39:58 -07:00
in-liberty420 2dfd73a497 fix(migration): resolve workspace files from agents.defaults.workspace
OpenClaw users who started before the rebrand (when the project was
clawd/clawdbot) often have a custom workspace directory configured via
agents.defaults.workspace in openclaw.json (e.g. ~/clawd/ instead of
~/.openclaw/workspace/).

The migration tool only checked hardcoded relative paths (workspace/,
workspace-main/, workspace-assistant/) inside the source root, so files
like MEMORY.md, skills, and daily memory in custom workspaces were
silently skipped.

This change:
- Reads agents.defaults.workspace from openclaw.json at init time
- Uses it as a final fallback in source_candidate() when files aren't
  found in the standard locations
- Standard workspace paths are still preferred (custom is fallback only)
- Custom workspace is only used when it's outside the source_root tree
  (avoids double-matching when workspace/ is the default)

Adds two tests:
- Custom workspace files are discovered and migrated
- Standard workspace location is preferred over custom
2026-04-28 00:39:58 -07:00
Teknium 8081425a1c feat(security): make secret redaction off by default (#16794)
Flips security.redact_secrets from true to false in DEFAULT_CONFIG, and
the HERMES_REDACT_SECRETS env-var fallback in agent/redact.py now
requires explicit opt-in ("1"/"true"/"yes"/"on") to enable.

New installs and users without a security.redact_secrets key get pass-
through tool output. Existing users whose config.yaml explicitly sets
redact_secrets: true keep redaction on — the config-yaml -> env-var
bridges in hermes_cli/main.py and gateway/run.py still honor their
setting.

Also updates the inline config comments, website docs, and the
hermes-agent skill so /hermes config set security.redact_secrets true
is now the documented way to turn it on.
2026-04-27 21:24:08 -07:00
Teknium ec8243fe2a chore(release): map matrix-parity-batch contributor emails to GitHub logins 2026-04-27 21:22:44 -07:00
Teknium 3d67364b8f test(matrix): set user_id in approval-reaction test to bypass defensive self-drop
MatrixAdapter._is_self_sender returns True defensively when _user_id is empty
(whoami not yet resolved) to prevent echo loops — see #15763. The reaction
approval test must therefore initialize a user_id so _on_reaction does not
drop the inbound test event before reaching the approval handler.
2026-04-27 21:22:44 -07:00
nbot 38a6bada92 feat(matrix): reaction-based exec approval + mention_user_id
Add Matrix reaction-based exec approval (/) and mention_user_id
support for push notifications in muted rooms.

- matrix.py: _MatrixApprovalPrompt, send_exec_approval, reaction
  approval handling, bot seed reaction redaction, mention pill in send
- base.py: inject mention_user_id into send metadata
- run.py: inject mention_user_id into status thread metadata
- tests for approval prompt registration and reaction resolution
2026-04-27 21:22:44 -07:00
Andrew Miller 6c70ac8eef matrix: e2e test for cross-signing auto-bootstrap
Self-contained docker-compose harness that exercises the new bootstrap
branch against a real Continuwuity homeserver. Three tests:

  1. fresh bot → bootstrap fires, /keys/query returns master + ssk
     with UNPADDED base64 keyids, current device is signed by the
     new SSK
  2. second startup with same crypto store → bootstrap is skipped
  3. MATRIX_RECOVERY_KEY set → existing verify_with_recovery_key path
     takes precedence, no new bootstrap

Run via:

    docker compose -f tests/e2e/matrix_xsign_bootstrap/docker-compose.yml up -d
    python tests/e2e/matrix_xsign_bootstrap/test_bootstrap.py
    docker compose -f tests/e2e/matrix_xsign_bootstrap/docker-compose.yml down -v

The test mirrors the bootstrap snippet from matrix.py inline so it can
run without importing the full hermes gateway and its deps. Skipped
automatically when mautrix isn't installed or the homeserver is
unreachable.

All three pass against ghcr.io/continuwuity/continuwuity:latest
(Continuwuity 0.5.7). The unpadded-keyid assertion is the load-bearing
one — it's exactly the property the PR's bootstrap path provides that
the hand-rolled `base64.b64encode().decode()` scripts get wrong.
2026-04-27 21:22:44 -07:00
Andrew Miller d497387cec matrix: auto-bootstrap cross-signing on first startup
Without this, every Matrix bot started under hermes-agent shows the
"Encrypted by a device not verified by its owner" badge in Element
indefinitely, because the cross-signing chain (master → SSK → device)
was never published. Operators currently have to write their own
bootstrap script and remember to run it once per bot — and it's easy
to get wrong (the obvious base64.b64encode().decode() produces padded
keyids that matrix-rust-sdk silently rejects in /keys/query, so even
correctly-signed keys fail to load identity in Element).

mautrix already has the right primitive: generate_recovery_key() does
the full flow — generate seeds, upload privates to SSSS, publish
publics to the homeserver, sign the current device with the new SSK,
and return the human-readable recovery key. We invoke it once on
startup if the bot has no existing cross-signing identity, and log
the recovery key with a clear instruction to save it for future
restarts via MATRIX_RECOVERY_KEY (which the existing recovery-key
path already consumes).

Skipped when MATRIX_RECOVERY_KEY is set (existing path takes over)
or when the bot already has cross-signing keys on the homeserver
(get_own_cross_signing_public_keys returns non-None).

Bootstrap failure is non-fatal — logged with hint about UIA; the bot
continues without cross-signing and Element will show the warning
that prompted this PR. That matches the existing soft-fail pattern
for verify_with_recovery_key.

Tested against Continuwuity 0.5.7 (no UIA required). Synapse with
UIA enabled will need a follow-up PR to thread MATRIX_PASSWORD
through to /keys/device_signing/upload.
2026-04-27 21:22:44 -07:00
konsisumer 32d4048c6b fix: MatrixAdapter respects proxy configuration 2026-04-27 21:22:44 -07:00
Adam Rummer 1eab5960f0 feat(matrix): add dm_auto_thread config for DM auto-threading
Adds MATRIX_DM_AUTO_THREAD env var (default: false) to control
auto-threading in DM rooms independently from channel auto-threading.

Closes #15398
2026-04-27 21:22:44 -07:00
LeonSGP43 74a4832b74 fix(matrix): normalize image-only filenames 2026-04-27 21:22:44 -07:00
Alexazhu fbbcfa24c5 fix(matrix): preserve exception tracebacks on E2EE and auth failures
Five ``except Exception as exc:`` blocks in the Matrix adapter logged
only ``str(exc)`` without ``exc_info=True``:

- _reverify_keys_after_upload → post-upload key verification failure
- _upload_keys_if_needed      → initial device-key query failure
- _upload_keys_if_needed      → re-upload device keys failure
- _upload_keys_if_needed      → initial device key upload failure
- connect → whoami / access-token validation failure

The E2EE key paths here are security-critical: a silent traceback-
less failure during device-key verification or upload makes it
hard for operators to tell whether their Matrix bot is failing
because of a stale token, a federation timeout, or an olm state
mismatch — all three fail with different tracebacks, which
``str(exc)`` alone flattens.

The contributing guide asks for ``exc_info=True`` on error logs.
Append it to each of the five call sites. Pure logging enrichment.
2026-04-27 21:22:44 -07:00
Heathley f223346eb7 fix(matrix): add sync timeout, callback diagnostics, and mention-drop logging
- Wrap _sync_loop sync() call with asyncio.wait_for(timeout=45s) to guard
  against TCP-level hangs that the Matrix long-poll timeout cannot catch
- Add logger.debug at the top of _on_room_message so LOG_LEVEL=DEBUG
  confirms whether callbacks fire at all (diagnoses #5819, #7914, #12614)
- Add logger.debug when MATRIX_REQUIRE_MENTION silently drops a message,
  pointing users to the env var to disable the filter

Adapted for current mautrix-python adapter (PR was written against the
legacy matrix-nio adapter).

Closes #5819
2026-04-27 21:22:44 -07:00
Charles Brooks 57f8cf00e9 fix(matrix): reconcile pending invites from sync state 2026-04-27 21:22:44 -07:00
Teknium 6649e7e746 test(matrix): adapt outbound-mention notice test to current _send_simple_message API 2026-04-27 21:22:44 -07:00
Angel Claw 32b78578e0 fix(matrix): strip only explicit @mentions in _strip_mention 2026-04-27 21:22:44 -07:00
Sami Rusani 6769a0aece fix(matrix): add outbound mention payloads 2026-04-27 21:22:44 -07:00
Teknium d7528d43ac fix(web): scope dashboard config Reset button to the current tab (#16813)
* Port from Kilo-Org/kilocode#9448: roll up subagent costs into parent session total

Child subagents built by delegate_task() each track their own
session_estimated_cost_usd, but the parent agent's total never folded
those numbers in.  On runs where the parent mostly delegates and the
children do the expensive work, the footer/UI was reporting a fraction
of the actual spend — sometimes $0.00 when the parent itself made no
billed calls.

Fix:
- Capture each child's session_estimated_cost_usd into _child_cost_usd
  on the result entry (before child.close() drops the counter).
- After the existing subagent_stop hook loop, sum the children's costs
  and add the total to parent.session_estimated_cost_usd.
- Promote session_cost_source from 'none' -> 'subagent' when the parent
  had no direct spend but children did, so the UI doesn't label the
  total as having unknown provenance.  Real sources (openrouter,
  anthropic, etc.) are preserved.

Nested orchestrator -> worker trees roll up naturally: each layer's own
delegate_task() folds its direct children in, and when the orchestrator
itself returns, its parent folds the orchestrator's now-inflated total
on top.

Internal fields (_child_cost_usd, _child_role) are stripped from the
results dict before it's serialised back to the model — same contract
as _child_role already followed.

Tests: TestSubagentCostRollup (5 cases) covers single-child, batch,
zero-cost-children, preserved-source, and legacy-fixture paths.

Source: https://github.com/Kilo-Org/kilocode/pull/9448

* fix(web): scope dashboard config Reset button to the current tab

Reported by @ykmfb001 via X: clicking 'Restore Defaults' (恢复默认值) on
the Auxiliary page wiped the entire config.yaml to defaults, not just
the auxiliary section. The button sits next to the category tabs and
users reasonably assumed 'reset this tab', not 'reset everything'.

Changes:
- handleReset now scopes to the fields in the current view:
  active category's fields (form mode) or search-matched fields
  (search mode). Only those keys are copied from defaults; the rest
  of the config is left alone.
- Added a window.confirm() with the scope name before applying.
- Button is hidden in YAML mode (scoping doesn't apply there).
- Tooltip/aria-label now name the scope, e.g. 'Reset Auxiliary to
  defaults'.
- i18n: new resetScopeTooltip / confirmResetScope / resetScopeToast
  strings in en + zh; resetDefaults key preserved for compat.
2026-04-27 21:09:14 -07:00
Teknium a7cdd4133c fix(bedrock): send context-1m-2025-08-07 beta so Opus 4.6/4.7 get 1M context (#16793)
On AWS Bedrock (and Azure AI Foundry), Claude Opus 4.6/4.7 and Sonnet 4.6
are capped at 200K context unless the request carries the
`context-1m-2025-08-07` beta header. On native Anthropic (api.anthropic.com)
1M went GA so the header is a harmless no-op, but Bedrock/Azure still gate
it as beta as of 2026-04.

Hermes was advertising 1M in model_metadata.py (`claude-opus-4-7: 1000000`)
while silently sending a request without the beta — so Bedrock users saw
a 200K ceiling with no error message, and no config knob unblocked it.
Claude Code sends this header by default, which is why the same Bedrock
credentials worked there.

- Add `context-1m-2025-08-07` to `_COMMON_BETAS` (alongside interleaved
  thinking and fine-grained tool streaming).
- Strip it in `_common_betas_for_base_url` for MiniMax bearer-auth
  endpoints — they host their own models, not Claude, so Anthropic beta
  headers are irrelevant and could risk rejection.
- Attach `_COMMON_BETAS` as `default_headers` on the AnthropicBedrock
  client. Previously that constructor passed no betas at all, so native
  Anthropic had the 1M unlock via default_headers but Bedrock didn't.
- Fast-mode per-request `extra_headers` already rebuilds from
  `_common_betas_for_base_url`, so it picks up the 1M beta automatically.

Reported by user 'Rodmar' on Discord: Bedrock Opus 4.7 stuck at 200K while
same credentials worked in Claude Code.
2026-04-27 20:41:36 -07:00
kshitijk4poor 461ef88705 fix(state): declarative column reconciliation for stuck-at-old-v7 DBs
Anyone who ran hermes between Apr 15 (42aeb4ec) and Apr 22 (a7d78d3b)
has schema_version=7 from the pre-renumber api_call_count migration.
When a7d78d3b inserted reasoning_content as the new v7 and pushed
api_call_count to v8, the 'if current_version < 7' gate was already
false for those users, so reasoning_content was never created —
sqlite3.OperationalError: no such column: reasoning_content on any
/continue or /resume touching assistant replays.

Replaces the version-gated ADD COLUMN chain with _reconcile_columns():
on every startup, parse SCHEMA_SQL via an in-memory SQLite and diff
against PRAGMA table_info; ALTER TABLE ADD COLUMN for anything missing.
Follows the Beets / sqlite-utils pattern — SCHEMA_SQL becomes the single
source of truth for declared columns. Self-healing and idempotent.

v10 trigram FTS backfill is retained in a version-gated block — that
migration isn't a column add, it inserts existing message rows into
the new FTS virtual table, so reconciliation can't express it.
schema_version is also kept for future row-data migrations.

Salvaged from #14097 (@kshitijk4poor) onto current main; v10 trigram
preservation and the v9 codex_message_items column (stale-missed by
the original branch) are covered automatically by reconciliation.

Tests:
- Regression: DB at old v7 with api_call_count but no reasoning_content
  gets the column on open
- Idempotency: reopening the same DB is a no-op
- Structural invariant: every SCHEMA_SQL column is in the live DB
- Existing v2 migration test still passes
- E2E verified against fresh / v1 / old-v7 / v9 DBs, plus v10 trigram
  backfill preserved
2026-04-27 20:29:32 -07:00
Teknium 12d745bd7e feat(skills): port humanizer — strip AI-isms from text (#16787)
Port https://github.com/blader/humanizer (MIT, v2.5.1, 16k stars) into
the built-in skills under skills/creative/humanizer/. Based on Wikipedia's
'Signs of AI writing' guide (WikiProject AI Cleanup) — detects 29 AI-writing
patterns and rewrites them to sound human.

Hermes-native adaptations:
- Description (<60 chars) explains what it's for: 'Humanize text: strip
  AI-isms and add real voice.'
- 'When to use this skill' section — trigger phrases (humanize, de-AI,
  de-slop, un-ChatGPT, rewrite to not sound like an LLM) plus guidance to
  apply it to the agent's own output (release notes, PR descriptions, docs).
- 'How to use it in Hermes' — maps the three real input paths (inline,
  file via read_file/patch/write_file, voice-calibration sample) onto the
  tools the agent actually has. Drops Claude Code's allowed-tools block.
- Converted frontmatter to Hermes format (metadata.hermes.tags, category,
  homepage, related_skills).

Attribution preserved:
- Original author Siqi Chen (@blader) credited in frontmatter and body.
- Full MIT LICENSE copied verbatim alongside SKILL.md.
- Wikipedia / WikiProject AI Cleanup credited.
- 29 patterns, personality/soul section, and full worked example kept
  verbatim from the source (29,914 chars).

Validated end-to-end against a clean HERMES_HOME:
- sync_skills() copies skills/creative/humanizer/ including LICENSE.
- skills_list(category='creative') returns the 48-char description.
- skill_view(name='humanizer') returns the full body with all 29 patterns,
  personality/soul, attribution, and Hermes tool refs (read_file, patch,
  write_file) intact.
2026-04-27 20:25:20 -07:00
Teknium 30307a9802 feat(plugins): add pre_approval_request / post_approval_response hooks (#16776)
Plugins can now observe dangerous-command approval events in real time,
on both the CLI-interactive path and the async gateway path. This is the
missing hook surface external tools need to build approval notifiers
(macOS menu-bar allow/deny, Slack alerts, audit logs, etc.) without
forking Hermes or running a parallel gateway adapter.

Changes:
- hermes_cli/plugins.py: add two entries to VALID_HOOKS
- tools/approval.py: fire both hooks from check_all_command_guards --
  around prompt_dangerous_approval (CLI surface) and around the
  notify_cb + blocking event.wait loop (gateway surface)
- website/docs/user-guide/features/hooks.md: document both hooks with
  a macOS-notification example
- tests/tools/test_approval_plugin_hooks.py: 5 tests covering CLI once,
  CLI deny, plugin-crash resilience, gateway approve, gateway timeout

Hooks are observer-only: return values are ignored, so plugins cannot
veto or pre-answer an approval (use pre_tool_call for that). A crashing
plugin cannot break the approval flow -- invoke_hook swallows per-
callback errors, and the wrapper logs and swallows dispatch-layer
errors too.

Surface kwarg distinguishes "cli" from "gateway"; post hook reports
choice as one of once/session/always/deny/timeout.
2026-04-27 20:08:33 -07:00
Teknium 6ea5699e3f fix(compression): notify users when configured aux model fails even if main-model fallback recovers (#16775)
A misconfigured auxiliary.compression.model is a user-fixable problem that silent recovery would hide. The previous retry-on-main logic transparently swallowed aux-model failures whenever the fallback succeeded, leaving the user's broken config in place and racking up future failures.

Track the aux-model failure on the compressor alongside the existing fallback-placeholder fields:
- _last_aux_model_failure_model: str | None
- _last_aux_model_failure_error: str | None

Both are set at the moment the aux model errors (captured before summary_model is cleared for retry), regardless of whether the retry succeeds. Cleared at compress() start and on on_session_reset() so a clean run doesn't leak stale warnings.

Surface at three places:
- gateway hygiene auto-compress: ℹ note to the platform adapter (thread_id preserved)
- gateway /compress command: ℹ line appended to the reply
- CLI via _emit_warning: deduped on (model, error) so repeat compactions don't spam

Distinct from the existing ⚠️ dropped-turns warning — different severity, different emoji, explicit 'context is intact' reassurance.
2026-04-27 20:08:23 -07:00
SHL0MS c3e3a9c184 feat(skills): add Tier A references — external-data, panel-ui, replicator, dat-scripting, 3d-scene
Five additional reference docs covering common TD use cases that were not yet
documented in any reference (operators.md lists the ops, but no usage patterns).

- external-data.md: webDAT, webclientDAT, webserverDAT, websocketDAT,
  mqttClientDAT, serialDAT, tcpipDAT — auth, polling, push, JSON parsing
- panel-ui.md: custom parameter pages, button/slider/field/list COMPs,
  containerCOMP layouts, panelExecuteDAT callbacks
- replicator.md: replicatorCOMP for data-driven cloning, per-row overrides,
  recreatemissing pattern, replicator vs Python loop
- dat-scripting.md: full Execute DAT family — chopExecuteDAT, datExecuteDAT,
  parameterExecuteDAT, panelExecuteDAT, opExecuteDAT, executeDAT lifecycle
- 3d-scene.md: light types, three-point rigs, shadows, IBL/cubemaps,
  PBR materials with idiom table, multi-camera, DOF

Same conventions as existing refs: code-first, verify param names with
td_get_par_info, no token-budget impact (load on demand).
2026-04-27 19:35:18 -07:00
SHL0MS 02df438316 feat(skills): expand touchdesigner-mcp with animation, MIDI/OSC, particles, projection refs
Adds four new reference docs covering common TD use cases not previously
documented in the skill:

- animation.md: LFOs, timers, keyframes, easing, time references
- midi-osc.md: MIDI controllers, OSC routing, TouchOSC, multi-machine sync
- particles.md: POPs and particleSOP — emission, forces, collisions, render
- projection-mapping.md: windowCOMP, corner pin, mesh warp, edge blending

Also clarifies the SKILL.md tool quick reference: adds td_screen_point_to_global
and notes that 4 admin/dev-mode tools (td_project_quit, td_test_session,
td_dev_log, td_clear_dev_log) live only in mcp-tools.md to keep the main
reference focused on creative workflows.

No SKILL.md workflow or critical-rules changes. References load on demand
so no token-budget impact at session start.
2026-04-27 19:35:18 -07:00
Teknium 94b26f3ec9 fix(compression): retry summary on main model for unknown errors before giving up (#16774)
The existing retry-on-main path in _generate_summary only fires for errors that match the _is_model_not_found heuristic (404/503, 'model_not_found', 'does not exist', 'no available channel'). Other misconfiguration errors — 400s from aggregators, provider-specific 'no route' strings, opaque rejections — fall straight through to the transient-cooldown branch, which drops N turns of context and inserts a static placeholder.

Losing context is almost always worse than one extra summary attempt. Add a best-effort retry-on-main for the unknown-error branch, guarded by the same invariants as the existing fast-path retry: only when summary_model differs from main, and only once per compressor (_summary_model_fallen_back).

Tests cover: 404 fast-path fallback still works, unknown 400 now falls back, same-model aux skips retry (no infinite loop), and a double-failure (aux + main) stops at 2 calls.
2026-04-27 19:25:57 -07:00
462 changed files with 45624 additions and 3937 deletions
+2
View File
@@ -5,7 +5,9 @@
# Dependencies
node_modules
**/node_modules
.venv
**/.venv
# CI/CD
.github
+7 -1
View File
@@ -13,7 +13,7 @@ concurrency:
cancel-in-progress: true
jobs:
check:
nix-lockfile-check:
runs-on: ubuntu-latest
timeout-minutes: 20
steps:
@@ -36,6 +36,12 @@ jobs:
LINK_SHA: ${{ steps.sha.outputs.full }}
run: nix run .#fix-lockfiles -- --check
- name: Fail if check crashed without reporting
if: steps.check.outputs.stale != 'true' && steps.check.outputs.stale != 'false'
run: |
echo "::error::fix-lockfiles exited without reporting stale status — likely an infrastructure or script failure"
exit 1
- name: Post sticky PR comment (stale)
if: steps.check.outputs.stale == 'true' && github.event_name == 'pull_request'
uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728 # v2.9.1
+103 -2
View File
@@ -1,6 +1,13 @@
name: Nix Lockfile Fix
on:
push:
branches: [main]
paths:
- 'ui-tui/package-lock.json'
- 'ui-tui/package.json'
- 'web/package-lock.json'
- 'web/package.json'
workflow_dispatch:
inputs:
pr_number:
@@ -19,9 +26,103 @@ concurrency:
cancel-in-progress: false
jobs:
# ── Auto-fix on main ───────────────────────────────────────────────
# Fires when a push to main touches package.json or package-lock.json
# in ui-tui/ or web/. Runs fix-lockfiles --apply and pushes the hash
# update commit directly to main so Nix builds never stay broken.
#
# Safety invariants:
# 1. The fix commit only touches nix/*.nix files, which are NOT in
# the paths filter above, so this cannot re-trigger itself.
# 2. An explicit file-whitelist check before commit aborts if
# fix-lockfiles ever modifies unexpected files.
# 3. Job-level concurrency with cancel-in-progress: true ensures
# back-to-back pushes collapse to the newest; ref: main checkout
# always operates on the latest branch state.
# 4. Uses a GitHub App token (not GITHUB_TOKEN) so the fix commit
# triggers downstream nix.yml verification.
auto-fix-main:
if: github.event_name == 'push'
runs-on: ubuntu-latest
timeout-minutes: 25
concurrency:
group: auto-fix-main
cancel-in-progress: true
steps:
- name: Generate GitHub App token
id: app-token
uses: actions/create-github-app-token@7bfa3a4717ef143a604ee0a99d859b8886a96d00 # v1.9.3
with:
app-id: ${{ secrets.APP_ID }}
private-key: ${{ secrets.APP_PRIVATE_KEY }}
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
ref: main
token: ${{ steps.app-token.outputs.token }}
- uses: ./.github/actions/nix-setup
- name: Apply lockfile hashes
id: apply
run: nix run .#fix-lockfiles -- --apply
- name: Commit & push
if: steps.apply.outputs.changed == 'true'
shell: bash
run: |
set -euo pipefail
# Ensure only nix files were modified — prevents accidental
# self-triggering if fix-lockfiles ever touches package files.
unexpected="$(git diff --name-only | grep -Ev '^nix/(tui|web)\.nix$' || true)"
if [ -n "$unexpected" ]; then
echo "::error::Unexpected modified files: $unexpected"
exit 1
fi
# Record the base SHA before committing — used to detect package
# file changes if we need to rebase after a non-fast-forward push.
BASE_SHA="$(git rev-parse HEAD)"
git config user.name 'github-actions[bot]'
git config user.email '41898282+github-actions[bot]@users.noreply.github.com'
git add nix/tui.nix nix/web.nix
git commit -m "fix(nix): auto-refresh npm lockfile hashes" \
-m "Source: $GITHUB_SHA" \
-m "Run: $GITHUB_SERVER_URL/$GITHUB_REPOSITORY/actions/runs/$GITHUB_RUN_ID"
# Retry push with rebase in case main advanced with an unrelated
# commit during the nix build. Without this, a non-fast-forward
# rejection silently loses the fix. If package files changed during
# the rebase, abort — a fresh auto-fix run will handle the new state.
for attempt in 1 2 3; do
if git push origin HEAD:main; then
exit 0
fi
echo "::warning::Push attempt $attempt failed (non-fast-forward?), rebasing…"
git fetch origin main
# If package files changed between our base and the new main,
# our computed hashes are stale. Abort and let the next triggered
# run recompute from the correct package-lock state.
pkg_changed="$(git diff --name-only "$BASE_SHA"..origin/main -- \
'ui-tui/package-lock.json' 'ui-tui/package.json' \
'web/package-lock.json' 'web/package.json' || true)"
if [ -n "$pkg_changed" ]; then
echo "::warning::Package files changed since hash computation — aborting; a fresh run will recompute"
exit 0
fi
git rebase origin/main
done
echo "::error::Failed to push after 3 rebase attempts"
exit 1
# ── PR fix (manual / checkbox) ─────────────────────────────────────
# Existing behavior: run on manual dispatch OR when a task-list
# checkbox in the sticky lockfile-check comment flips from [ ] to [x].
fix:
# Run on manual dispatch OR when a task-list checkbox in the sticky
# lockfile-check comment flips from `[ ]` to `[x]`.
if: |
github.event_name == 'workflow_dispatch' ||
(github.event_name == 'issue_comment'
+1 -1
View File
@@ -38,7 +38,7 @@ hermes-agent/
│ │ # homeassistant, signal, matrix, mattermost, email, sms,
│ │ # dingtalk, wecom, weixin, feishu, qqbot, bluebubbles,
│ │ # webhook, api_server, ...). See ADDING_A_PLATFORM.md.
│ └── builtin_hooks/ # Always-registered gateway hooks (boot-md, ...)
│ └── builtin_hooks/ # Extension point for always-registered gateway hooks (none shipped)
├── plugins/ # Plugin system (see "Plugins" section below)
│ ├── memory/ # Memory-provider plugins (honcho, mem0, supermemory, ...)
│ ├── context_engine/ # Context-engine plugins
+1 -1
View File
@@ -494,7 +494,7 @@ branding:
agent_name: "My Agent"
welcome: "Welcome message"
response_label: " ⚔ Agent "
prompt_symbol: "⚔ "
prompt_symbol: "⚔"
tool_prefix: "╎" # Tool output line prefix
```
+8 -2
View File
@@ -14,7 +14,7 @@ ENV PLAYWRIGHT_BROWSERS_PATH=/opt/hermes/.playwright
# that would otherwise accumulate when hermes runs as PID 1. See #15012.
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli tini && \
build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli tini && \
rm -rf /var/lib/apt/lists/*
# Non-root user for runtime; UID can be overridden via HERMES_UID at runtime
@@ -45,7 +45,13 @@ COPY --chown=hermes:hermes . .
# Build browser dashboard and terminal UI assets.
RUN cd web && npm run build && \
cd ../ui-tui && npm run build
cd ../ui-tui && npm run build && \
rm -rf node_modules/@hermes/ink && \
rm -rf packages/hermes-ink/node_modules && \
cp -R packages/hermes-ink node_modules/@hermes/ink && \
npm install --omit=dev --prefer-offline --no-audit --prefix node_modules/@hermes/ink && \
rm -rf node_modules/@hermes/ink/node_modules/react && \
node --input-type=module -e "await import('@hermes/ink')"
# ---------- Permissions ----------
# Make install dir world-readable so any HERMES_UID can read it at runtime.
+11
View File
@@ -112,6 +112,17 @@ def main() -> None:
import acp
from .server import HermesACPAgent
# MCP tool discovery from config.yaml — run before asyncio.run() so
# it's safe to use blocking waits. (ACP also registers per-session
# MCP servers dynamically via asyncio.to_thread inside the event
# loop; that path is unaffected.) Moved from model_tools.py module
# scope to avoid freezing the gateway's loop on lazy import (#16856).
try:
from tools.mcp_tool import discover_mcp_tools
discover_mcp_tools()
except Exception:
logger.debug("MCP tool discovery failed at ACP startup", exc_info=True)
agent = HermesACPAgent()
try:
asyncio.run(acp.run_agent(agent, use_unstable_protocol=True))
+28 -1
View File
@@ -3,6 +3,7 @@
from __future__ import annotations
import asyncio
import contextvars
import logging
import os
from collections import defaultdict, deque
@@ -574,6 +575,22 @@ class HermesACPAgent(acp.Agent):
def _run_agent() -> dict:
nonlocal previous_approval_cb, previous_interactive
# Bind HERMES_SESSION_KEY for this session so per-session caches
# (e.g. the interactive sudo password cache in tools.terminal_tool)
# scope to the ACP session rather than leaking across sessions
# that land on the same reused executor thread. This call runs
# inside a contextvars.copy_context() below, so the ContextVar
# write is isolated from other concurrent ACP sessions.
try:
from gateway.session_context import (
clear_session_vars,
set_session_vars,
)
session_tokens = set_session_vars(session_key=session_id)
except Exception:
session_tokens = None
clear_session_vars = None # type: ignore[assignment]
logger.debug("Could not set ACP session context", exc_info=True)
if approval_cb:
try:
from tools import terminal_tool as _terminal_tool
@@ -607,9 +624,19 @@ class HermesACPAgent(acp.Agent):
_terminal_tool.set_approval_callback(previous_approval_cb)
except Exception:
logger.debug("Could not restore approval callback", exc_info=True)
if session_tokens is not None and clear_session_vars is not None:
try:
clear_session_vars(session_tokens)
except Exception:
logger.debug("Could not clear ACP session context", exc_info=True)
try:
result = await loop.run_in_executor(_executor, _run_agent)
# Wrap the executor call in a fresh copy of the current context so
# concurrent ACP sessions on the shared ThreadPoolExecutor don't
# stomp on each other's ContextVar writes (HERMES_SESSION_KEY in
# particular — used by the interactive sudo password cache scope).
ctx = contextvars.copy_context()
result = await loop.run_in_executor(_executor, ctx.run, _run_agent)
except Exception:
logger.exception("Executor error for session %s", session_id)
return PromptResponse(stop_reason="end_turn")
+199 -26
View File
@@ -20,12 +20,27 @@ from pathlib import Path
from hermes_constants import get_hermes_home
from typing import Any, Dict, List, Optional, Tuple
from utils import normalize_proxy_env_vars
from utils import base_url_host_matches, normalize_proxy_env_vars
try:
import anthropic as _anthropic_sdk
except ImportError:
_anthropic_sdk = None # type: ignore[assignment]
# NOTE: `import anthropic` is deliberately NOT at module top — the SDK pulls
# ~220 ms of imports (anthropic.types, anthropic.lib.tools._beta_runner, etc.)
# and the 3 usage sites (build_anthropic_client, build_anthropic_bedrock_client,
# read_claude_code_credentials_from_keychain) are all on cold user-triggered
# paths. Access via the `_get_anthropic_sdk()` accessor below, which caches
# the module after the first call and returns None on ImportError.
_anthropic_sdk: Any = ... # sentinel — None means "tried and missing"
def _get_anthropic_sdk():
"""Return the ``anthropic`` SDK module, importing lazily. None if not installed."""
global _anthropic_sdk
if _anthropic_sdk is ...:
try:
import anthropic as _sdk
_anthropic_sdk = _sdk
except ImportError:
_anthropic_sdk = None
return _anthropic_sdk
logger = logging.getLogger(__name__)
@@ -202,19 +217,33 @@ def _forbids_sampling_params(model: str) -> bool:
# Beta headers for enhanced features (sent with ALL auth types).
# As of Opus 4.7 (2026-04-16), both of these are GA on Claude 4.6+ — the
# As of Opus 4.7 (2026-04-16), the first two are GA on Claude 4.6+ — the
# beta headers are still accepted (harmless no-op) but not required. Kept
# here so older Claude (4.5, 4.1) + third-party Anthropic-compat endpoints
# that still gate on the headers continue to get the enhanced features.
# Migration guide: remove these if you no longer support ≤4.5 models.
#
# ``context-1m-2025-08-07`` unlocks the 1M context window on Claude Opus 4.6/4.7
# and Sonnet 4.6 when served via AWS Bedrock or Azure AI Foundry. 1M is GA on
# native Anthropic (api.anthropic.com) for Opus 4.6+, but Bedrock/Azure still
# gate it behind this beta header as of 2026-04 — without it Bedrock caps Opus
# at 200K even though model_metadata.py advertises 1M. The header is a harmless
# no-op on endpoints where 1M is GA.
#
# Migration guide: remove these if you no longer support ≤4.5 models or once
# Bedrock/Azure promote 1M to GA.
_COMMON_BETAS = [
"interleaved-thinking-2025-05-14",
"fine-grained-tool-streaming-2025-05-14",
"context-1m-2025-08-07",
]
# MiniMax's Anthropic-compatible endpoints fail tool-use requests when
# the fine-grained tool streaming beta is present. Omit it so tool calls
# fall back to the provider's default response path.
_TOOL_STREAMING_BETA = "fine-grained-tool-streaming-2025-05-14"
# 1M context beta — see comment on _COMMON_BETAS above. Stripped for
# Bearer-auth (MiniMax) endpoints since they host their own models and
# unknown Anthropic beta headers risk request rejection.
_CONTEXT_1M_BETA = "context-1m-2025-08-07"
# Fast mode beta — enables the ``speed: "fast"`` request parameter for
# significantly higher output token throughput on Opus 4.6 (~2.5x).
@@ -336,6 +365,88 @@ def _is_kimi_coding_endpoint(base_url: str | None) -> bool:
return normalized.rstrip("/").lower().startswith("https://api.kimi.com/coding")
# Model-name prefixes that identify the Kimi / Moonshot family. Covers
# - official slugs: ``kimi-k2.5``, ``kimi_thinking``, ``moonshot-v1-8k``
# - common release lines: ``k1.5-...``, ``k2-thinking``, ``k25-...``, ``k2.5-...``
# Matched case-insensitively against the post-``normalize_model_name`` form,
# so a caller's ``provider/vendor/model`` slug is handled the same as a
# bare name.
_KIMI_FAMILY_MODEL_PREFIXES = (
"kimi-", "kimi_",
"moonshot-", "moonshot_",
"k1.", "k1-",
"k2.", "k2-",
"k25", "k2.5",
)
def _model_name_is_kimi_family(model: str | None) -> bool:
if not isinstance(model, str):
return False
m = model.strip().lower()
if not m:
return False
# Strip vendor prefix (e.g. ``moonshotai/kimi-k2.5`` → ``kimi-k2.5``)
if "/" in m:
m = m.rsplit("/", 1)[-1]
return m.startswith(_KIMI_FAMILY_MODEL_PREFIXES)
def _is_kimi_family_endpoint(base_url: str | None, model: str | None = None) -> bool:
"""Return True for any Kimi / Moonshot Anthropic-Messages-speaking endpoint.
Broader than ``_is_kimi_coding_endpoint`` — matches:
- Kimi's official ``/coding`` URL (legacy check, preserved)
- Any ``api.kimi.com`` / ``moonshot.ai`` / ``moonshot.cn`` host
- Custom or proxied endpoints whose *model* name is in the Kimi / Moonshot
family (``kimi-*``, ``moonshot-*``, ``k1.*``, ``k2.*``, …). Users with
``api_mode: anthropic_messages`` on a private gateway fronting Kimi
fall into this branch — the upstream still enforces Kimi's thinking
semantics (reasoning_content required on every replayed tool-call
message) regardless of the gateway's hostname.
Used to decide whether to drop Anthropic's ``thinking`` kwarg and to
preserve unsigned reasoning_content-derived thinking blocks on replay.
See hermes-agent#13848, #17057.
"""
if _is_kimi_coding_endpoint(base_url):
return True
for _domain in ("api.kimi.com", "moonshot.ai", "moonshot.cn"):
if base_url_host_matches(base_url or "", _domain):
return True
if _model_name_is_kimi_family(model):
return True
return False
def _is_deepseek_anthropic_endpoint(base_url: str | None) -> bool:
"""Return True for DeepSeek's Anthropic-compatible endpoint.
DeepSeek's ``/anthropic`` route speaks the Anthropic Messages protocol
but, when thinking mode is enabled, requires the ``thinking`` blocks
from prior assistant turns to round-trip on subsequent requests — the
generic third-party path strips them and triggers HTTP 400::
The content[].thinking in the thinking mode must be passed back
to the API.
Per DeepSeek's published compatibility matrix the blocks are unsigned
(no Anthropic-proprietary signature, no ``redacted_thinking`` support),
so this endpoint is handled with the same strip-signed / keep-unsigned
policy used for Kimi's ``/coding`` endpoint. The match is pinned to
the ``/anthropic`` path so the OpenAI-compatible ``api.deepseek.com``
base URL (which never reaches this adapter) is not misclassified.
See hermes-agent#16748.
"""
if not base_url_host_matches(base_url or "", "api.deepseek.com"):
return False
normalized = _normalize_base_url_text(base_url)
if not normalized:
return False
return "/anthropic" in normalized.rstrip("/").lower()
def _requires_bearer_auth(base_url: str | None) -> bool:
"""Return True for Anthropic-compatible providers that require Bearer auth.
@@ -357,9 +468,14 @@ def _common_betas_for_base_url(base_url: str | None) -> list[str]:
that include Anthropic's ``fine-grained-tool-streaming`` beta — every
tool-use message triggers a connection error. Strip that beta for
Bearer-auth endpoints while keeping all other betas intact.
The ``context-1m-2025-08-07`` beta is also stripped for Bearer-auth
endpoints — MiniMax hosts its own models, not Claude, so the header is
irrelevant at best and risks request rejection at worst.
"""
if _requires_bearer_auth(base_url):
return [b for b in _COMMON_BETAS if b != _TOOL_STREAMING_BETA]
_stripped = {_TOOL_STREAMING_BETA, _CONTEXT_1M_BETA}
return [b for b in _COMMON_BETAS if b not in _stripped]
return _COMMON_BETAS
@@ -374,6 +490,7 @@ def build_anthropic_client(api_key: str, base_url: str = None, timeout: float =
Returns an anthropic.Anthropic instance.
"""
_anthropic_sdk = _get_anthropic_sdk()
if _anthropic_sdk is None:
raise ImportError(
"The 'anthropic' package is required for the Anthropic provider. "
@@ -456,8 +573,16 @@ def build_anthropic_bedrock_client(region: str):
Claude feature parity: prompt caching, thinking budgets, adaptive
thinking, fast mode — features not available via the Converse API.
Attaches the common Anthropic beta headers as client-level defaults so
that Bedrock-hosted Claude models get the same enhanced features as
native Anthropic. The ``context-1m-2025-08-07`` beta in particular
unlocks the 1M context window for Opus 4.6/4.7 on Bedrock — without
it, Bedrock caps these models at 200K even though the Anthropic API
serves them with 1M natively.
Auth uses the boto3 default credential chain (IAM roles, SSO, env vars).
"""
_anthropic_sdk = _get_anthropic_sdk()
if _anthropic_sdk is None:
raise ImportError(
"The 'anthropic' package is required for the Bedrock provider. "
@@ -473,6 +598,7 @@ def build_anthropic_bedrock_client(region: str):
return _anthropic_sdk.AnthropicBedrock(
aws_region=region,
timeout=Timeout(timeout=900.0, connect=10.0),
default_headers={"anthropic-beta": ",".join(_COMMON_BETAS)},
)
@@ -488,9 +614,6 @@ def _read_claude_code_credentials_from_keychain() -> Optional[Dict[str, Any]]:
Returns dict with {accessToken, refreshToken?, expiresAt?} or None.
"""
import platform
import subprocess
if platform.system() != "Darwin":
return None
@@ -1035,9 +1158,12 @@ def normalize_model_name(model: str, preserve_dots: bool = False) -> str:
# These must not be converted to hyphens. See issue #12295.
if _is_bedrock_model_id(model):
return model
# OpenRouter uses dots for version separators (claude-opus-4.6),
# Anthropic uses hyphens (claude-opus-4-6). Convert dots to hyphens.
model = model.replace(".", "-")
# Only convert dots to hyphens for Anthropic/Claude models.
# Non-Anthropic models (gpt-5.4, gemini-2.5, etc.) use dots
# as part of their canonical names. See issue #17171.
_lower = model.lower()
if _lower.startswith("claude-") or _lower.startswith("anthropic/"):
model = model.replace(".", "-")
return model
@@ -1054,6 +1180,33 @@ def _sanitize_tool_id(tool_id: str) -> str:
return sanitized or "tool_0"
def _normalize_tool_input_schema(schema: Any) -> Dict[str, Any]:
"""Normalize tool schemas before sending them to Anthropic.
Anthropic's tool schema validator rejects nullable unions such as
``anyOf: [{"type": "string"}, {"type": "null"}]`` that Pydantic/MCP
commonly emits for optional fields. Tool optionality is represented by
the parent ``required`` array, so we delegate to the shared
``strip_nullable_unions`` helper to collapse nullable unions to the
non-null branch while preserving metadata like description/default.
``keep_nullable_hint=False`` because the Anthropic validator does not
recognize the OpenAPI-style ``nullable: true`` extension and strict
schema-to-grammar converters may reject unknown keywords.
"""
if not schema:
return {"type": "object", "properties": {}}
from tools.schema_sanitizer import strip_nullable_unions
normalized = strip_nullable_unions(schema, keep_nullable_hint=False)
if not isinstance(normalized, dict):
return {"type": "object", "properties": {}}
if normalized.get("type") == "object" and not isinstance(normalized.get("properties"), dict):
normalized = {**normalized, "properties": {}}
return normalized
def convert_tools_to_anthropic(tools: List[Dict]) -> List[Dict]:
"""Convert OpenAI tool definitions to Anthropic format."""
if not tools:
@@ -1064,7 +1217,9 @@ def convert_tools_to_anthropic(tools: List[Dict]) -> List[Dict]:
result.append({
"name": fn.get("name", ""),
"description": fn.get("description", ""),
"input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
"input_schema": _normalize_tool_input_schema(
fn.get("parameters", {"type": "object", "properties": {}})
),
})
return result
@@ -1195,6 +1350,7 @@ def _convert_content_to_anthropic(content: Any) -> Any:
def convert_messages_to_anthropic(
messages: List[Dict],
base_url: str | None = None,
model: str | None = None,
) -> Tuple[Optional[Any], List[Dict]]:
"""Convert OpenAI-format messages to Anthropic format.
@@ -1206,6 +1362,12 @@ def convert_messages_to_anthropic(
endpoint, all thinking block signatures are stripped. Signatures are
Anthropic-proprietary — third-party endpoints cannot validate them and will
reject them with HTTP 400 "Invalid signature in thinking block".
When *model* is provided and matches the Kimi / Moonshot family (or
*base_url* is a Kimi / Moonshot host), unsigned thinking blocks
synthesised from ``reasoning_content`` are preserved on replayed
assistant tool-call messages — Kimi requires the field to exist, even
if empty.
"""
system = None
result = []
@@ -1434,7 +1596,16 @@ def convert_messages_to_anthropic(
# cache markers can interfere with signature validation.
_THINKING_TYPES = frozenset(("thinking", "redacted_thinking"))
_is_third_party = _is_third_party_anthropic_endpoint(base_url)
_is_kimi = _is_kimi_coding_endpoint(base_url)
# Kimi /coding and DeepSeek /anthropic share a contract: both speak the
# Anthropic Messages protocol upstream but require that thinking blocks
# synthesised from reasoning_content round-trip on subsequent turns when
# thinking is enabled. Signed Anthropic blocks still have to be stripped
# (neither endpoint can validate Anthropic's signatures); unsigned blocks
# are preserved. See hermes-agent#13848 (Kimi) and #16748 (DeepSeek).
_preserve_unsigned_thinking = (
_is_kimi_family_endpoint(base_url, model)
or _is_deepseek_anthropic_endpoint(base_url)
)
last_assistant_idx = None
for i in range(len(result) - 1, -1, -1):
@@ -1446,22 +1617,22 @@ def convert_messages_to_anthropic(
if m.get("role") != "assistant" or not isinstance(m.get("content"), list):
continue
if _is_kimi:
# Kimi's /coding endpoint enables thinking server-side and
# requires unsigned thinking blocks on replayed assistant
# tool-call messages. Strip signed Anthropic blocks (Kimi
# can't validate signatures) but preserve the unsigned ones
# we synthesised from reasoning_content above.
if _preserve_unsigned_thinking:
# Kimi's /coding and DeepSeek's /anthropic endpoints both enable
# thinking server-side and require unsigned thinking blocks on
# replayed assistant tool-call messages. Strip signed Anthropic
# blocks (neither upstream can validate Anthropic signatures) but
# preserve the unsigned ones we synthesised from reasoning_content.
new_content = []
for b in m["content"]:
if not isinstance(b, dict) or b.get("type") not in _THINKING_TYPES:
new_content.append(b)
continue
if b.get("signature") or b.get("data"):
# Anthropic-signed block — Kimi can't validate, strip
# Anthropic-signed block — upstream can't validate, strip
continue
# Unsigned thinking (synthesised from reasoning_content) —
# keep it: Kimi needs it for message-history validation.
# keep it: the upstream needs it for message-history validation.
new_content.append(b)
m["content"] = new_content or [{"type": "text", "text": "(empty)"}]
elif _is_third_party or idx != last_assistant_idx:
@@ -1557,7 +1728,9 @@ def build_anthropic_kwargs(
Currently only supported on native Anthropic endpoints (not third-party
compatible ones).
"""
system, anthropic_messages = convert_messages_to_anthropic(messages, base_url=base_url)
system, anthropic_messages = convert_messages_to_anthropic(
messages, base_url=base_url, model=model
)
anthropic_tools = convert_tools_to_anthropic(tools) if tools else []
model = normalize_model_name(model, preserve_dots=preserve_dots)
@@ -1663,7 +1836,7 @@ def build_anthropic_kwargs(
# silently hides reasoning text that Hermes surfaces in its CLI. We
# request "summarized" so the reasoning blocks stay populated — matching
# 4.6 behavior and preserving the activity-feed UX during long tool runs.
_is_kimi_coding = _is_kimi_coding_endpoint(base_url)
_is_kimi_coding = _is_kimi_family_endpoint(base_url, model)
if reasoning_config and isinstance(reasoning_config, dict) and not _is_kimi_coding:
if reasoning_config.get("enabled") is not False and "haiku" not in model.lower():
effort = str(reasoning_config.get("effort", "medium")).lower()
+298 -20
View File
@@ -41,10 +41,57 @@ import threading
import time
from pathlib import Path # noqa: F401 — used by test mocks
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Tuple
from typing import Any, Dict, List, Optional, Tuple, TYPE_CHECKING
from urllib.parse import urlparse, parse_qs, urlunparse
from openai import OpenAI
# NOTE: `from openai import OpenAI` is deliberately NOT at module top — the
# openai SDK pulls a large type tree (~240 ms cold, including responses/*,
# graders/*). We expose `OpenAI` here as a thin proxy that imports the SDK on
# first call and forwards, so:
# (a) the 15+ in-module `OpenAI(...)` construction sites work unchanged
# (Python's function-scope name lookup resolves `OpenAI` to the proxy
# object bound in module globals here, without triggering any import);
# (b) external code can still do `auxiliary_client.OpenAI` or
# `patch("agent.auxiliary_client.OpenAI", ...)` — tests see the proxy,
# and patch replaces the module attribute as usual;
# (c) `OpenAI` as a type annotation resolves at runtime to the proxy class
# (which is harmless — annotations aren't type-checked at runtime).
# See tests/agent/test_auxiliary_client.py for patch patterns this supports.
if TYPE_CHECKING:
from openai import OpenAI # noqa: F401 — type hints only
_OPENAI_CLS_CACHE: Optional[type] = None
def _load_openai_cls() -> type:
"""Import and cache ``openai.OpenAI``."""
global _OPENAI_CLS_CACHE
if _OPENAI_CLS_CACHE is None:
from openai import OpenAI as _cls
_OPENAI_CLS_CACHE = _cls
return _OPENAI_CLS_CACHE
class _OpenAIProxy:
"""Module-level proxy that looks like the ``openai.OpenAI`` class.
Forwards ``OpenAI(...)`` calls and ``isinstance(x, OpenAI)`` checks to the
real SDK class, importing the SDK lazily on first use.
"""
__slots__ = ()
def __call__(self, *args, **kwargs):
return _load_openai_cls()(*args, **kwargs)
def __instancecheck__(self, obj):
return isinstance(obj, _load_openai_cls())
def __repr__(self):
return "<lazy openai.OpenAI proxy>"
OpenAI = _OpenAIProxy() # module-level name, resolves lazily on call/isinstance
from agent.credential_pool import load_pool
from hermes_cli.config import get_hermes_home
@@ -54,6 +101,14 @@ from utils import base_url_host_matches, base_url_hostname, normalize_proxy_env_
logger = logging.getLogger(__name__)
def _safe_isinstance(obj: Any, maybe_type: Any) -> bool:
"""Return False instead of raising when a patched symbol is not a type."""
try:
return isinstance(obj, maybe_type)
except TypeError:
return False
def _extract_url_query_params(url: str):
"""Extract query params from URL, return (clean_url, default_query dict or None)."""
parsed = urlparse(url)
@@ -94,6 +149,10 @@ _PROVIDER_ALIASES = {
"github-models": "copilot",
"github-copilot-acp": "copilot-acp",
"copilot-acp-agent": "copilot-acp",
"tencent": "tencent-tokenhub",
"tokenhub": "tencent-tokenhub",
"tencent-cloud": "tencent-tokenhub",
"tencentmaas": "tencent-tokenhub",
}
@@ -159,6 +218,7 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
"kimi-coding-cn": "kimi-k2-turbo-preview",
"gmi": "google/gemini-3.1-flash-lite-preview",
"minimax": "MiniMax-M2.7",
"minimax-oauth": "MiniMax-M2.7-highspeed",
"minimax-cn": "MiniMax-M2.7",
"anthropic": "claude-haiku-4-5-20251001",
"ai-gateway": "google/gemini-3-flash",
@@ -166,6 +226,7 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
"opencode-go": "glm-5",
"kilocode": "google/gemini-3-flash-preview",
"ollama-cloud": "nemotron-3-nano:30b",
"tencent-tokenhub": "hy3-preview",
}
# Vision-specific model overrides for direct providers.
@@ -177,6 +238,21 @@ _PROVIDER_VISION_MODELS: Dict[str, str] = {
"zai": "glm-5v-turbo",
}
# Providers whose endpoint does not accept image input, even though the
# provider's broader ecosystem has vision models available elsewhere. When
# `auxiliary.vision.provider: auto` sees one of these as the main provider,
# it must skip straight to the aggregator chain instead of returning a client
# that will 404 on every vision request.
#
# kimi-coding / kimi-coding-cn: the Kimi Coding Plan routes through
# api.kimi.com/coding (Anthropic Messages wire) which Kimi's own docs
# describe as having no image_in capability. Vision lives on the separate
# Kimi Platform (api.moonshot.ai, OpenAI-wire, pay-as-you-go). See #17076.
_PROVIDERS_WITHOUT_VISION: frozenset = frozenset({
"kimi-coding",
"kimi-coding-cn",
})
# OpenRouter app attribution headers
_OR_HEADERS = {
"HTTP-Referer": "https://hermes-agent.nousresearch.com",
@@ -405,6 +481,33 @@ class _CodexCompletionsAdapter:
# Note: the Codex endpoint (chatgpt.com/backend-api/codex) does NOT
# support max_output_tokens or temperature — omit to avoid 400 errors.
# Translate extra_body.reasoning (chat.completions shape) into the
# Responses API's top-level reasoning + include fields. Mirrors
# agent/transports/codex.py::build_kwargs() so auxiliary callers
# that configure reasoning via auxiliary.<task>.extra_body get the
# same behavior as the main agent's Codex transport.
extra_body = kwargs.get("extra_body") or {}
if isinstance(extra_body, dict):
reasoning_cfg = extra_body.get("reasoning")
if isinstance(reasoning_cfg, dict):
if reasoning_cfg.get("enabled") is False:
# Reasoning explicitly disabled — do not set reasoning
# or include. The Codex backend still thinks by
# default, but we honor the caller's intent where the
# API allows it.
pass
else:
effort = reasoning_cfg.get("effort", "medium")
# Codex backend rejects "minimal"; clamp to "low" to
# match the main-agent Codex transport behavior.
if effort == "minimal":
effort = "low"
resp_kwargs["reasoning"] = {
"effort": effort,
"summary": "auto",
}
resp_kwargs["include"] = ["reasoning.encrypted_content"]
# Tools support for auxiliary callers (e.g. skills_hub) that pass function schemas
tools = kwargs.get("tools")
if tools:
@@ -714,6 +817,116 @@ class AsyncAnthropicAuxiliaryClient:
self.base_url = sync_wrapper.base_url
def _endpoint_speaks_anthropic_messages(base_url: str) -> bool:
"""True if the endpoint at ``base_url`` speaks the Anthropic Messages
protocol instead of OpenAI chat.completions.
Mirrors ``hermes_cli.runtime_provider._detect_api_mode_for_url`` so the
auxiliary client and the main agent stay in sync on transport selection.
Covers:
- Any URL ending in ``/anthropic`` (MiniMax, Zhipu GLM, LiteLLM proxies,
Anthropic-compatible gateways).
- ``api.kimi.com/coding`` (Kimi Coding Plan — the /coding route only
speaks Claude-Code's native Anthropic shape; ``chat.completions``
returns 404 on Anthropic-only model aliases like ``kimi-for-coding``).
- ``api.anthropic.com`` (native Anthropic).
"""
normalized = (base_url or "").strip().lower().rstrip("/")
if not normalized:
return False
if normalized.endswith("/anthropic"):
return True
hostname = base_url_hostname(normalized)
if hostname == "api.anthropic.com":
return True
if hostname == "api.kimi.com" and "/coding" in normalized:
return True
return False
def _maybe_wrap_anthropic(
client_obj: Any,
model: str,
api_key: str,
base_url: str,
api_mode: Optional[str] = None,
) -> Any:
"""Rewrap a plain OpenAI client in ``AnthropicAuxiliaryClient`` when
the endpoint actually speaks Anthropic Messages.
This is the single chokepoint for aux-client transport correction.
Runs at the end of every ``resolve_provider_client`` branch so that
api_key providers (Kimi Coding Plan), the ``custom`` endpoint, and
future /anthropic gateways all land on the right wire format
regardless of which branch built the client.
Returns ``client_obj`` unchanged when:
- It's already an Anthropic/Codex/Gemini/CopilotACP wrapper.
- The endpoint is an OpenAI-wire endpoint.
- ``api_mode`` is explicitly set to a non-Anthropic transport.
- The ``anthropic`` SDK is not installed (falls back to OpenAI wire).
"""
# Already wrapped — don't double-wrap.
if _safe_isinstance(client_obj, AnthropicAuxiliaryClient):
return client_obj
# Other specialized adapters we should never re-dispatch.
if _safe_isinstance(client_obj, CodexAuxiliaryClient):
return client_obj
try:
from agent.gemini_native_adapter import GeminiNativeClient
if _safe_isinstance(client_obj, GeminiNativeClient):
return client_obj
except ImportError:
pass
try:
from agent.copilot_acp_client import CopilotACPClient
if _safe_isinstance(client_obj, CopilotACPClient):
return client_obj
except ImportError:
pass
# Explicit non-anthropic api_mode wins over URL heuristics.
if api_mode and api_mode != "anthropic_messages":
return client_obj
should_wrap = (
api_mode == "anthropic_messages"
or _endpoint_speaks_anthropic_messages(base_url)
)
if not should_wrap:
return client_obj
try:
from agent.anthropic_adapter import build_anthropic_client
except ImportError:
logger.warning(
"Endpoint %s speaks Anthropic Messages but the anthropic SDK is "
"not installed — falling back to OpenAI-wire (will likely 404).",
base_url,
)
return client_obj
try:
real_client = build_anthropic_client(api_key, base_url)
except Exception as exc:
logger.warning(
"Failed to build Anthropic client for %s (%s) — falling back to "
"OpenAI-wire client.", base_url, exc,
)
return client_obj
logger.debug(
"Auxiliary transport: wrapping client in AnthropicAuxiliaryClient "
"(model=%s, base_url=%s, api_mode=%s)",
model, base_url[:60] if base_url else "", api_mode or "auto-detected",
)
return AnthropicAuxiliaryClient(
real_client, model, api_key, base_url, is_oauth=False,
)
def _read_nous_auth() -> Optional[dict]:
"""Read and validate ~/.hermes/auth.json for an active Nous provider.
@@ -884,7 +1097,9 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
from hermes_cli.models import copilot_default_headers
extra["default_headers"] = copilot_default_headers()
return OpenAI(api_key=api_key, base_url=base_url, **extra), model
_client = OpenAI(api_key=api_key, base_url=base_url, **extra)
_client = _maybe_wrap_anthropic(_client, model, api_key, base_url)
return _client, model
creds = resolve_api_key_provider_credentials(provider_id)
api_key = str(creds.get("api_key", "")).strip()
@@ -910,7 +1125,9 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
from hermes_cli.models import copilot_default_headers
extra["default_headers"] = copilot_default_headers()
return OpenAI(api_key=api_key, base_url=base_url, **extra), model
_client = OpenAI(api_key=api_key, base_url=base_url, **extra)
_client = _maybe_wrap_anthropic(_client, model, api_key, base_url)
return _client, model
return None, None
@@ -1194,7 +1411,13 @@ def _try_custom_endpoint() -> Tuple[Optional[Any], Optional[str]]:
AnthropicAuxiliaryClient(real_client, model, custom_key, custom_base, is_oauth=False),
model,
)
return OpenAI(api_key=custom_key, base_url=_clean_base, **_extra), model
# URL-based anthropic detection for custom endpoints that didn't set
# api_mode explicitly (e.g. kimi.com/coding reached via custom config).
_fallback_client = OpenAI(api_key=custom_key, base_url=_clean_base, **_extra)
_fallback_client = _maybe_wrap_anthropic(
_fallback_client, model, custom_key, custom_base, custom_mode,
)
return _fallback_client, model
def _try_codex() -> Tuple[Optional[Any], Optional[str]]:
@@ -1745,8 +1968,20 @@ def resolve_provider_client(
return True
return False
def _wrap_if_needed(client_obj, final_model_str: str, base_url_str: str = ""):
"""Wrap a plain OpenAI client in CodexAuxiliaryClient if Responses API is needed."""
def _wrap_if_needed(client_obj, final_model_str: str, base_url_str: str = "",
api_key_str: str = ""):
"""Wrap a plain OpenAI client in the correct transport adapter.
Handles two cases:
- ``CodexAuxiliaryClient`` when the endpoint needs the Responses API
(explicit ``api_mode=codex_responses`` or api.openai.com + codex
model name).
- ``AnthropicAuxiliaryClient`` when the endpoint speaks Anthropic
Messages (explicit ``api_mode=anthropic_messages``, any ``/anthropic``
suffix, ``api.kimi.com/coding``, or ``api.anthropic.com``).
Clients that are already specialized wrappers pass through unchanged.
"""
if _needs_codex_wrap(client_obj, base_url_str, final_model_str):
logger.debug(
"resolve_provider_client: wrapping client in CodexAuxiliaryClient "
@@ -1754,7 +1989,11 @@ def resolve_provider_client(
api_mode or "auto-detected", final_model_str,
base_url_str[:60] if base_url_str else "")
return CodexAuxiliaryClient(client_obj, final_model_str)
return client_obj
# Anthropic-wire endpoints: rewrap plain OpenAI clients so
# chat.completions.create() is translated to /v1/messages.
return _maybe_wrap_anthropic(
client_obj, final_model_str, api_key_str, base_url_str, api_mode,
)
# ── Auto: try all providers in priority order ────────────────────
if provider == "auto":
@@ -1834,7 +2073,7 @@ def resolve_provider_client(
# ── Custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY) ───────────
if provider == "custom":
if explicit_base_url:
custom_base = explicit_base_url.strip()
custom_base = _to_openai_base_url(explicit_base_url).strip()
custom_key = (
(explicit_api_key or "").strip()
or os.getenv("OPENAI_API_KEY", "").strip()
@@ -1847,7 +2086,7 @@ def resolve_provider_client(
)
return None, None
final_model = _normalize_resolved_model(
model or _read_main_model() or "gpt-4o-mini",
model or (main_runtime.get("model") if main_runtime else None) or "gpt-4o-mini",
provider,
)
extra = {}
@@ -1862,7 +2101,7 @@ def resolve_provider_client(
is_agent_turn=True, is_vision=is_vision
)
client = OpenAI(api_key=custom_key, base_url=_clean_base, **extra)
client = _wrap_if_needed(client, final_model, custom_base)
client = _wrap_if_needed(client, final_model, custom_base, custom_key)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
# Try custom first, then codex, then API-key providers
@@ -1872,7 +2111,8 @@ def resolve_provider_client(
if client is not None:
final_model = _normalize_resolved_model(model or default, provider)
_cbase = str(getattr(client, "base_url", "") or "")
client = _wrap_if_needed(client, final_model, _cbase)
_ckey = str(getattr(client, "api_key", "") or "")
client = _wrap_if_needed(client, final_model, _cbase, _ckey)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
logger.warning("resolve_provider_client: custom/main requested "
@@ -1895,10 +2135,22 @@ def resolve_provider_client(
entry_api_mode = (api_mode or custom_entry.get("api_mode") or "").strip()
if custom_base:
final_model = _normalize_resolved_model(
model or custom_entry.get("model") or _read_main_model() or "gpt-4o-mini",
model
or custom_entry.get("model")
or (main_runtime.get("model") if main_runtime else None)
or _read_main_model()
or "gpt-4o-mini",
provider,
)
_clean_base2, _dq2 = _extract_url_query_params(custom_base)
# anthropic_messages talks to the /anthropic surface directly;
# OpenAI-wire paths (chat_completions / codex_responses) need the
# /v1 equivalent. Rewrite only on the OpenAI-wire path so the
# Anthropic fallback SDK still sees the original URL.
if entry_api_mode == "anthropic_messages":
openai_base = custom_base
else:
openai_base = _to_openai_base_url(custom_base)
_clean_base2, _dq2 = _extract_url_query_params(openai_base)
_extra2 = {"default_query": _dq2} if _dq2 else {}
logger.debug(
"resolve_provider_client: named custom provider %r (%s, api_mode=%s)",
@@ -1917,7 +2169,12 @@ def resolve_provider_client(
"installed — falling back to OpenAI-wire.",
provider,
)
client = OpenAI(api_key=custom_key, base_url=_clean_base2, **_extra2)
# Fallback went OpenAI-wire after all — redo the query
# extraction against the rewritten /v1 URL.
_fallback_base = _to_openai_base_url(custom_base)
_fb_clean, _fb_dq = _extract_url_query_params(_fallback_base)
_fb_extra = {"default_query": _fb_dq} if _fb_dq else {}
client = OpenAI(api_key=custom_key, base_url=_fb_clean, **_fb_extra)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
sync_anthropic = AnthropicAuxiliaryClient(
@@ -1936,7 +2193,7 @@ def resolve_provider_client(
):
client = CodexAuxiliaryClient(client, final_model)
else:
client = _wrap_if_needed(client, final_model, custom_base)
client = _wrap_if_needed(client, final_model, openai_base, custom_key)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
logger.warning(
@@ -2029,8 +2286,11 @@ def resolve_provider_client(
# Honor api_mode for any API-key provider (e.g. direct OpenAI with
# codex-family models). The copilot-specific wrapping above handles
# copilot; this covers the general case (#6800).
client = _wrap_if_needed(client, final_model, base_url)
# copilot; this covers the general case (#6800). Also rewraps
# Anthropic-wire endpoints (Kimi Coding Plan api.kimi.com/coding,
# /anthropic-suffixed gateways) so named providers like kimi-coding
# land on the right transport without needing per-provider branches.
client = _wrap_if_needed(client, final_model, base_url, api_key)
logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
@@ -2038,7 +2298,12 @@ def resolve_provider_client(
if pconfig.auth_type == "external_process":
creds = resolve_external_process_provider_credentials(provider)
final_model = _normalize_resolved_model(model or _read_main_model(), provider)
final_model = _normalize_resolved_model(
model
or (main_runtime.get("model") if main_runtime else None)
or _read_main_model(),
provider,
)
if provider == "copilot-acp":
api_key = str(creds.get("api_key", "")).strip()
base_url = str(creds.get("base_url", "")).strip()
@@ -2293,6 +2558,19 @@ def resolve_vision_provider_client(
main_provider, default_model or resolved_model or main_model,
)
return _finalize(main_provider, sync_client, default_model)
elif main_provider in _PROVIDERS_WITHOUT_VISION:
# Kimi Coding Plan's /coding endpoint (Anthropic Messages wire)
# does not accept image input — Kimi's own docs say "Current
# model does not support image input, switch to a model with
# image_in capability" and vision lives on the separate Kimi
# Platform (api.moonshot.ai). Skip the main provider and fall
# through to the aggregator chain instead of returning a
# client that will 404 on every vision request (#17076).
logger.debug(
"Vision auto-detect: skipping main provider %s (no "
"vision support) — falling through to aggregator chain",
main_provider,
)
else:
rpc_client, rpc_model = resolve_provider_client(
main_provider, vision_model,
@@ -2774,7 +3052,7 @@ def _get_task_extra_body(task: str) -> Dict[str, Any]:
# Providers that use Anthropic-compatible endpoints (via OpenAI SDK wrapper).
# Their image content blocks must use Anthropic format, not OpenAI format.
_ANTHROPIC_COMPAT_PROVIDERS = frozenset({"minimax", "minimax-cn"})
_ANTHROPIC_COMPAT_PROVIDERS = frozenset({"minimax", "minimax-oauth", "minimax-cn"})
def _is_anthropic_compat_endpoint(provider: str, base_url: str) -> bool:
+41 -3
View File
@@ -291,14 +291,52 @@ def has_aws_credentials(env: Optional[Dict[str, str]] = None) -> bool:
def resolve_bedrock_region(env: Optional[Dict[str, str]] = None) -> str:
"""Resolve the AWS region for Bedrock API calls.
Priority: AWS_REGION → AWS_DEFAULT_REGION → us-east-1 (fallback).
Priority:
1. AWS_REGION env var
2. AWS_DEFAULT_REGION env var
3. boto3/botocore configured region (from ~/.aws/config or SSO profile)
4. us-east-1 (hard fallback)
The boto3 fallback is critical for EU/AP users who configure their region
in ~/.aws/config via a named profile rather than env vars — without it,
live model discovery would always return us.* profile IDs regardless of
the user's actual region.
"""
env = env if env is not None else os.environ
return (
explicit = (
env.get("AWS_REGION", "").strip()
or env.get("AWS_DEFAULT_REGION", "").strip()
or "us-east-1"
)
if explicit:
return explicit
try:
import botocore.session
region = botocore.session.get_session().get_config_variable("region")
if region:
return region
except Exception:
pass
return "us-east-1"
def bedrock_model_ids_or_none() -> Optional[List[str]]:
"""Live-discover Bedrock model IDs for the active region.
Returns a list of model ID strings if discovery succeeds and yields
at least one model, or ``None`` on failure / empty result. Callers
should fall back to the static curated list when ``None`` is returned.
This helper consolidates the discover → extract-ids → fallback
pattern that was previously duplicated across ``provider_model_ids``,
``list_authenticated_providers`` section 2, and section 3.
"""
try:
discovered = discover_bedrock_models(resolve_bedrock_region())
if discovered:
return [m["id"] for m in discovered]
except Exception:
pass
return None
# ---------------------------------------------------------------------------
+50
View File
@@ -340,6 +340,8 @@ class ContextCompressor(ContextEngine):
self._last_summary_error = None
self._last_summary_dropped_count = 0
self._last_summary_fallback_used = False
self._last_aux_model_failure_error = None
self._last_aux_model_failure_model = None
self._last_compression_savings_pct = 100.0
self._ineffective_compression_count = 0
@@ -448,6 +450,12 @@ class ContextCompressor(ContextEngine):
# (gateway hygiene, /compress) can surface a visible warning.
self._last_summary_dropped_count: int = 0
self._last_summary_fallback_used: bool = False
# When a user-configured summary model fails and we recover by
# retrying on the main model, record the failure so gateway /
# CLI callers can still warn the user even though compression
# succeeded. Silent recovery would hide the broken config.
self._last_aux_model_failure_error: Optional[str] = None
self._last_aux_model_failure_model: Optional[str] = None
def update_from_response(self, usage: Dict[str, Any]):
"""Update tracked token usage from API response."""
@@ -907,10 +915,50 @@ The user has requested that this compaction PRIORITISE preserving all informatio
"Falling back to main model '%s' for compression.",
self.summary_model, e, self.model,
)
# Record the aux-model failure so callers can warn the user
# even if the retry-on-main succeeds — a misconfigured aux
# model is something the user needs to fix.
_err_text = str(e).strip() or e.__class__.__name__
if len(_err_text) > 220:
_err_text = _err_text[:217].rstrip() + "..."
self._last_aux_model_failure_error = _err_text
self._last_aux_model_failure_model = self.summary_model
self.summary_model = "" # empty = use main model
self._summary_failure_cooldown_until = 0.0 # no cooldown
return self._generate_summary(turns_to_summarize, focus_topic=focus_topic) # retry immediately
# Unknown-error best-effort retry on main model. Losing N turns of
# context is almost always worse than one extra summary attempt, so
# if we haven't already fallen back and the summary model differs
# from the main model, try once more on main before entering
# cooldown. Errors that DID match _is_model_not_found above are
# already handled by the fast-path retry; this branch catches
# everything else (400s, provider-specific "no route" strings,
# aggregator rejections, etc.) where auto-retry is still safer
# than dropping the turns.
if (
self.summary_model
and self.summary_model != self.model
and not getattr(self, "_summary_model_fallen_back", False)
):
self._summary_model_fallen_back = True
logging.warning(
"Summary model '%s' failed (%s). "
"Retrying on main model '%s' before giving up.",
self.summary_model, e, self.model,
)
# Record the aux-model failure (see 404 branch above) — user
# should know their configured model is broken even if main
# recovers the call.
_err_text = str(e).strip() or e.__class__.__name__
if len(_err_text) > 220:
_err_text = _err_text[:217].rstrip() + "..."
self._last_aux_model_failure_error = _err_text
self._last_aux_model_failure_model = self.summary_model
self.summary_model = "" # empty = use main model
self._summary_failure_cooldown_until = 0.0
return self._generate_summary(turns_to_summarize, focus_topic=focus_topic)
# Transient errors (timeout, rate limit, network) — shorter cooldown
_transient_cooldown = 60
self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown
@@ -1208,6 +1256,8 @@ The user has requested that this compaction PRIORITISE preserving all informatio
self._last_summary_dropped_count = 0
self._last_summary_fallback_used = False
self._last_summary_error = None
self._last_aux_model_failure_error = None
self._last_aux_model_failure_model = None
n_messages = len(messages)
# Only need head + 3 tail messages minimum (token budget decides the real tail size)
_min_for_compress = self.protect_first_n + 3 + 1
+118 -1
View File
@@ -7,7 +7,6 @@ import random
import threading
import time
import uuid
import os
import re
from dataclasses import dataclass, fields, replace
from datetime import datetime
@@ -456,6 +455,70 @@ class CredentialPool:
logger.debug("Failed to sync from credentials file: %s", exc)
return entry
def _sync_codex_entry_from_auth_store(self, entry: PooledCredential) -> PooledCredential:
"""Sync a Codex device_code pool entry from auth.json if tokens differ.
When a Codex OAuth access token expires (or the ChatGPT account hits
its 5h/weekly quota), the pool entry gets marked ``STATUS_EXHAUSTED``
with a ``last_error_reset_at`` that can be many hours in the future.
Meanwhile the user may run ``hermes model`` / ``hermes auth`` which
performs a fresh device-code login and writes new tokens to
``auth.json`` under ``_auth_store_lock``. Without this sync the pool
entry stays frozen until ``last_error_reset_at`` elapses — even
though fresh credentials are sitting on disk — and every request
fails with "no available entries (all exhausted or empty)".
Mirrors the Nous/Anthropic resync paths above. Only applies to
device_code-sourced entries; env/API-key-sourced entries have no
auth.json shadow to sync from.
"""
if self.provider != "openai-codex" or entry.source != "device_code":
return entry
try:
with _auth_store_lock():
auth_store = _load_auth_store()
state = _load_provider_state(auth_store, "openai-codex")
if not isinstance(state, dict):
return entry
tokens = state.get("tokens")
if not isinstance(tokens, dict):
return entry
store_access = tokens.get("access_token", "")
store_refresh = tokens.get("refresh_token", "")
# Adopt auth.json tokens when either side differs. Codex refresh
# tokens are single-use too, so a fresh refresh_token from
# another process means our entry's pair is consumed/stale.
entry_access = entry.access_token or ""
entry_refresh = entry.refresh_token or ""
if store_access and (
store_access != entry_access
or (store_refresh and store_refresh != entry_refresh)
):
logger.debug(
"Pool entry %s: syncing Codex tokens from auth.json "
"(refreshed by another process)",
entry.id,
)
field_updates: Dict[str, Any] = {
"access_token": store_access,
"refresh_token": store_refresh or entry.refresh_token,
"last_status": None,
"last_status_at": None,
"last_error_code": None,
"last_error_reason": None,
"last_error_message": None,
"last_error_reset_at": None,
}
if state.get("last_refresh"):
field_updates["last_refresh"] = state["last_refresh"]
updated = replace(entry, **field_updates)
self._replace_entry(entry, updated)
self._persist()
return updated
except Exception as exc:
logger.debug("Failed to sync Codex entry from auth.json: %s", exc)
return entry
def _sync_nous_entry_from_auth_store(self, entry: PooledCredential) -> PooledCredential:
"""Sync a Nous pool entry from auth.json if tokens differ.
@@ -788,6 +851,18 @@ class CredentialPool:
if synced is not entry:
entry = synced
cleared_any = True
# For openai-codex entries, same pattern: the user may have
# re-authed via `hermes model` / `hermes auth` after a 429/401,
# leaving fresh tokens on disk while the pool entry is still
# frozen behind last_error_reset_at (can be hours in the
# future for ChatGPT weekly windows).
if (self.provider == "openai-codex"
and entry.source == "device_code"
and entry.last_status == STATUS_EXHAUSTED):
synced = self._sync_codex_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
cleared_any = True
if entry.last_status == STATUS_EXHAUSTED:
exhausted_until = _exhausted_until(entry)
if exhausted_until is not None and now < exhausted_until:
@@ -1224,6 +1299,48 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
except Exception as exc:
logger.debug("Qwen OAuth token seed failed: %s", exc)
elif provider == "minimax-oauth":
# MiniMax OAuth tokens live in ~/.hermes/auth.json providers.minimax-oauth.
# Seed the pool so `/auth list` reflects the logged-in state and the
# standard `hermes auth remove minimax-oauth <N>` flow works.
# Use refresh_if_expiring=False equivalent: resolve_minimax_oauth_runtime_credentials
# always refreshes on expiry, so instead read raw state here to avoid
# surprise network calls during provider discovery.
try:
from hermes_cli.auth import get_provider_auth_state
state = get_provider_auth_state("minimax-oauth")
if state and state.get("access_token"):
source_name = "oauth"
if not _is_suppressed(provider, source_name):
active_sources.add(source_name)
expires_at_ms = None
try:
from datetime import datetime as _dt
raw = state.get("expires_at", "")
if raw:
expires_at_ms = int(_dt.fromisoformat(raw).timestamp() * 1000)
except Exception:
expires_at_ms = None
base_url = str(state.get("inference_base_url", "") or "").rstrip("/")
changed |= _upsert_entry(
entries,
provider,
source_name,
{
"source": source_name,
"auth_type": AUTH_TYPE_OAUTH,
"access_token": state["access_token"],
"refresh_token": state.get("refresh_token"),
"expires_at_ms": expires_at_ms,
"base_url": base_url,
"label": state.get("label", "") or label_from_token(
state.get("access_token", ""), source_name
),
},
)
except Exception as exc:
logger.debug("MiniMax OAuth token seed failed: %s", exc)
elif provider == "openai-codex":
# Respect user suppression — `hermes auth remove openai-codex` marks
# the device_code source as suppressed so it won't be re-seeded from
+18 -1
View File
@@ -47,7 +47,6 @@ from __future__ import annotations
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, List, Optional
@@ -253,6 +252,19 @@ def _remove_nous_device_code(provider: str, removed) -> RemovalResult:
return result
def _remove_minimax_oauth(provider: str, removed) -> RemovalResult:
"""MiniMax OAuth lives in auth.json providers.minimax-oauth — clear it.
Same pattern as Nous: single-source OAuth state with refresh tokens.
Suppression of the `oauth` source ensures the pool reseed path
(_seed_from_singletons) doesn't instantly undo the removal.
"""
result = RemovalResult()
if _clear_auth_store_provider(provider):
result.cleaned.append(f"Cleared {provider} OAuth tokens from auth store")
return result
def _remove_codex_device_code(provider: str, removed) -> RemovalResult:
"""Codex tokens live in TWO places: our auth store AND ~/.codex/auth.json.
@@ -390,6 +402,11 @@ def _register_all_sources() -> None:
remove_fn=_remove_qwen_cli,
description="~/.qwen/oauth_creds.json",
))
register(RemovalStep(
provider="minimax-oauth", source_id="oauth",
remove_fn=_remove_minimax_oauth,
description="auth.json providers.minimax-oauth",
))
register(RemovalStep(
provider="*", source_id="config:",
match_fn=lambda src: src.startswith("config:") or src == "model_config",
+869
View File
@@ -0,0 +1,869 @@
"""Curator — background skill maintenance orchestrator.
The curator is an auxiliary-model task that periodically reviews agent-created
skills and maintains the collection. It runs inactivity-triggered (no cron
daemon): when the agent is idle and the last curator run was longer than
``interval_hours`` ago, ``maybe_run_curator()`` spawns a forked AIAgent to do
the review.
Responsibilities:
- Auto-transition lifecycle states based on last_used_at timestamps
- Spawn a background review agent that can pin / archive / consolidate /
patch agent-created skills via skill_manage
- Persist curator state (last_run_at, paused, etc.) in .curator_state
Strict invariants:
- Only touches agent-created skills (see tools/skill_usage.is_agent_created)
- Never auto-deletes — only archives. Archive is recoverable.
- Pinned skills bypass all auto-transitions
- Uses the auxiliary client; never touches the main session's prompt cache
"""
from __future__ import annotations
import json
import logging
import os
import tempfile
import threading
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional, Set
from hermes_constants import get_hermes_home
from tools import skill_usage
logger = logging.getLogger(__name__)
DEFAULT_INTERVAL_HOURS = 24 * 7 # 7 days
DEFAULT_MIN_IDLE_HOURS = 2
DEFAULT_STALE_AFTER_DAYS = 30
DEFAULT_ARCHIVE_AFTER_DAYS = 90
# ---------------------------------------------------------------------------
# .curator_state — persistent scheduler + status
# ---------------------------------------------------------------------------
def _state_file() -> Path:
return get_hermes_home() / "skills" / ".curator_state"
def _default_state() -> Dict[str, Any]:
return {
"last_run_at": None,
"last_run_duration_seconds": None,
"last_run_summary": None,
"paused": False,
"run_count": 0,
}
def load_state() -> Dict[str, Any]:
path = _state_file()
if not path.exists():
return _default_state()
try:
data = json.loads(path.read_text(encoding="utf-8"))
if isinstance(data, dict):
base = _default_state()
base.update({k: v for k, v in data.items() if k in base or k.startswith("_")})
return base
except (OSError, json.JSONDecodeError) as e:
logger.debug("Failed to read curator state: %s", e)
return _default_state()
def save_state(data: Dict[str, Any]) -> None:
path = _state_file()
try:
path.parent.mkdir(parents=True, exist_ok=True)
fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=".curator_state_", suffix=".tmp")
try:
with os.fdopen(fd, "w", encoding="utf-8") as f:
json.dump(data, f, indent=2, sort_keys=True, ensure_ascii=False)
f.flush()
os.fsync(f.fileno())
os.replace(tmp, path)
except BaseException:
try:
os.unlink(tmp)
except OSError:
pass
raise
except Exception as e:
logger.debug("Failed to save curator state: %s", e, exc_info=True)
def set_paused(paused: bool) -> None:
state = load_state()
state["paused"] = bool(paused)
save_state(state)
def is_paused() -> bool:
return bool(load_state().get("paused"))
# ---------------------------------------------------------------------------
# Config access
# ---------------------------------------------------------------------------
def _load_config() -> Dict[str, Any]:
"""Read curator.* config from ~/.hermes/config.yaml. Tolerates missing file."""
try:
from hermes_cli.config import load_config
cfg = load_config()
except Exception as e:
logger.debug("Failed to load config for curator: %s", e)
return {}
if not isinstance(cfg, dict):
return {}
cur = cfg.get("curator") or {}
if not isinstance(cur, dict):
return {}
return cur
def is_enabled() -> bool:
"""Default ON when no config says otherwise."""
cfg = _load_config()
return bool(cfg.get("enabled", True))
def get_interval_hours() -> int:
cfg = _load_config()
try:
return int(cfg.get("interval_hours", DEFAULT_INTERVAL_HOURS))
except (TypeError, ValueError):
return DEFAULT_INTERVAL_HOURS
def get_min_idle_hours() -> float:
cfg = _load_config()
try:
return float(cfg.get("min_idle_hours", DEFAULT_MIN_IDLE_HOURS))
except (TypeError, ValueError):
return DEFAULT_MIN_IDLE_HOURS
def get_stale_after_days() -> int:
cfg = _load_config()
try:
return int(cfg.get("stale_after_days", DEFAULT_STALE_AFTER_DAYS))
except (TypeError, ValueError):
return DEFAULT_STALE_AFTER_DAYS
def get_archive_after_days() -> int:
cfg = _load_config()
try:
return int(cfg.get("archive_after_days", DEFAULT_ARCHIVE_AFTER_DAYS))
except (TypeError, ValueError):
return DEFAULT_ARCHIVE_AFTER_DAYS
# ---------------------------------------------------------------------------
# Idle / interval check
# ---------------------------------------------------------------------------
def _parse_iso(ts: Optional[str]) -> Optional[datetime]:
if not ts:
return None
try:
return datetime.fromisoformat(ts)
except (TypeError, ValueError):
return None
def should_run_now(now: Optional[datetime] = None) -> bool:
"""Return True if the curator should run immediately.
Gates:
- curator.enabled == True
- not paused
- last_run_at missing, OR older than interval_hours
The idle check (min_idle_hours) is applied at the call site where we know
whether an agent is actively running — here we only enforce the static
gates.
"""
if not is_enabled():
return False
if is_paused():
return False
state = load_state()
last = _parse_iso(state.get("last_run_at"))
if last is None:
return True
if now is None:
now = datetime.now(timezone.utc)
if last.tzinfo is None:
last = last.replace(tzinfo=timezone.utc)
interval = timedelta(hours=get_interval_hours())
return (now - last) >= interval
# ---------------------------------------------------------------------------
# Automatic state transitions (pure function, no LLM)
# ---------------------------------------------------------------------------
def apply_automatic_transitions(now: Optional[datetime] = None) -> Dict[str, int]:
"""Walk every agent-created skill and move active/stale/archived based on
last_used_at. Pinned skills are never touched. Returns a counter dict
describing what changed."""
from tools import skill_usage as _u
if now is None:
now = datetime.now(timezone.utc)
stale_cutoff = now - timedelta(days=get_stale_after_days())
archive_cutoff = now - timedelta(days=get_archive_after_days())
counts = {"marked_stale": 0, "archived": 0, "reactivated": 0, "checked": 0}
for row in _u.agent_created_report():
counts["checked"] += 1
name = row["name"]
if row.get("pinned"):
continue
last_used = _parse_iso(row.get("last_used_at"))
# If never used, treat as using created_at as the anchor so new skills
# don't immediately archive themselves.
anchor = last_used or _parse_iso(row.get("created_at")) or now
if anchor.tzinfo is None:
anchor = anchor.replace(tzinfo=timezone.utc)
current = row.get("state", _u.STATE_ACTIVE)
if anchor <= archive_cutoff and current != _u.STATE_ARCHIVED:
ok, _msg = _u.archive_skill(name)
if ok:
counts["archived"] += 1
elif anchor <= stale_cutoff and current == _u.STATE_ACTIVE:
_u.set_state(name, _u.STATE_STALE)
counts["marked_stale"] += 1
elif anchor > stale_cutoff and current == _u.STATE_STALE:
# Skill got used again after being marked stale — reactivate.
_u.set_state(name, _u.STATE_ACTIVE)
counts["reactivated"] += 1
return counts
# ---------------------------------------------------------------------------
# Review prompt for the forked agent
# ---------------------------------------------------------------------------
CURATOR_REVIEW_PROMPT = (
"You are running as Hermes' background skill CURATOR. This is an "
"UMBRELLA-BUILDING consolidation pass, not a passive audit and not a "
"duplicate-finder.\n\n"
"The goal of the skill collection is a LIBRARY OF CLASS-LEVEL "
"INSTRUCTIONS AND EXPERIENTIAL KNOWLEDGE. A collection of hundreds of "
"narrow skills where each one captures one session's specific bug is "
"a FAILURE of the library — not a feature. An agent searching skills "
"matches on descriptions, not on exact names; one broad umbrella "
"skill with labeled subsections beats five narrow siblings for "
"discoverability, not the other way around.\n\n"
"The right target shape is CLASS-LEVEL skills with rich SKILL.md "
"bodies + `references/`, `templates/`, and `scripts/` subfiles for "
"session-specific detail — not one-session-one-skill micro-entries.\n\n"
"Hard rules — do not violate:\n"
"1. DO NOT touch bundled or hub-installed skills. The candidate list "
"below is already filtered to agent-created skills only.\n"
"2. DO NOT delete any skill. Archiving (moving the skill's directory "
"into ~/.hermes/skills/.archive/) is the maximum destructive action. "
"Archives are recoverable; deletion is not.\n"
"3. DO NOT touch skills shown as pinned=yes. Skip them entirely.\n"
"4. DO NOT use usage counters as a reason to skip consolidation. The "
"counters are new and often mostly zero. Judge overlap on CONTENT, "
"not on use_count. 'use=0' is not evidence a skill is valuable; it's "
"absence of evidence either way.\n"
"5. DO NOT reject consolidation on the grounds that 'each skill has "
"a distinct trigger'. Pairwise distinctness is the wrong bar. The "
"right bar is: 'would a human maintainer write this as N separate "
"skills, or as one skill with N labeled subsections?' When the "
"answer is the latter, merge.\n\n"
"How to work — not optional:\n"
"1. Scan the full candidate list. Identify PREFIX CLUSTERS (skills "
"sharing a first word or domain keyword). Examples you are likely "
"to find: hermes-config-*, hermes-dashboard-*, gateway-*, codex-*, "
"ollama-*, anthropic-*, gemini-*, mcp-*, salvage-*, pr-*, "
"competitor-*, python-*, security-*, etc. Expect 10-25 clusters.\n"
"2. For each cluster with 2+ members, do NOT ask 'are these pairs "
"overlapping?' — ask 'what is the UMBRELLA CLASS these skills all "
"serve? Would a maintainer name that class and write one skill for "
"it?' If yes, pick (or create) the umbrella and absorb the siblings "
"into it.\n"
"3. Three ways to consolidate — use the right one per cluster:\n"
" a. MERGE INTO EXISTING UMBRELLA — one skill in the cluster is "
"already broad enough to be the umbrella (example: `pr-triage-"
"salvage` for the PR review cluster). Patch it to add a labeled "
"section for each sibling's unique insight, then archive the "
"siblings.\n"
" b. CREATE A NEW UMBRELLA SKILL.md — no existing member is broad "
"enough. Use skill_manage action=create to write a new class-level "
"skill whose SKILL.md covers the shared workflow and has short "
"labeled subsections. Archive the now-absorbed narrow siblings.\n"
" c. DEMOTE TO REFERENCES/TEMPLATES/SCRIPTS — a sibling has "
"narrow-but-valuable session-specific content. Move it into the "
"umbrella's appropriate support directory:\n"
" • `references/<topic>.md` for session-specific detail OR "
"condensed knowledge banks (quoted research, API docs excerpts, "
"domain notes, provider quirks, reproduction recipes)\n"
" • `templates/<name>.<ext>` for starter files meant to be "
"copied and modified\n"
" • `scripts/<name>.<ext>` for statically re-runnable actions "
"(verification scripts, fixture generators, probes)\n"
" Then archive the old sibling. Use `terminal` with `mkdir -p "
"~/.hermes/skills/<umbrella>/references/ && mv ... <umbrella>/"
"references/<topic>.md` (or templates/ / scripts/).\n"
"4. Also flag skills whose NAME is too narrow (contains a PR number, "
"a feature codename, a specific error string, an 'audit' / "
"'diagnosis' / 'salvage' session artifact). These almost always "
"belong as a subsection or support file under a class-level umbrella.\n"
"5. Iterate. After one consolidation round, scan the remaining set "
"and look for the NEXT umbrella opportunity. Don't stop after 3 "
"merges.\n\n"
"Your toolset:\n"
" - skills_list, skill_view — read the current landscape\n"
" - skill_manage action=patch — add sections to the umbrella\n"
" - skill_manage action=create — create a new umbrella SKILL.md\n"
" - skill_manage action=write_file — add a references/, templates/, "
"or scripts/ file under an existing skill (the skill must already "
"exist)\n"
" - terminal — mv a sibling into the archive "
"OR move its content into a support subfile\n\n"
"'keep' is a legitimate decision ONLY when the skill is already a "
"class-level umbrella and none of the proposed merges would improve "
"discoverability. 'This is narrow but distinct from its siblings' "
"is NOT a reason to keep — it's a reason to move it under an "
"umbrella as a subsection or support file.\n\n"
"Expected output: real umbrella-ification. Process every obvious "
"cluster. If you end the pass with fewer than 10 archives, you "
"stopped too early — go back and look at the clusters you left "
"alone.\n\n"
"When done, write a summary with: clusters processed, skills "
"patched/absorbed, skills demoted to references/templates/scripts, "
"skills archived, new umbrellas created, and clusters you "
"deliberately left alone with one line each."
)
# ---------------------------------------------------------------------------
# Per-run reports — {YYYYMMDD-HHMMSS}/run.json + REPORT.md under logs/curator/
# ---------------------------------------------------------------------------
def _reports_root() -> Path:
"""Directory where curator run reports are written.
Lives under the profile-aware logs dir (``~/.hermes/logs/curator/``)
alongside ``agent.log`` and ``gateway.log`` so it's found by anyone
looking for operational telemetry, not mixed in with the user's
authored skill data in ``~/.hermes/skills/``.
"""
return get_hermes_home() / "logs" / "curator"
def _write_run_report(
*,
started_at: datetime,
elapsed_seconds: float,
auto_counts: Dict[str, int],
auto_summary: str,
before_report: List[Dict[str, Any]],
before_names: Set[str],
after_report: List[Dict[str, Any]],
llm_meta: Dict[str, Any],
) -> Optional[Path]:
"""Write run.json + REPORT.md under logs/curator/{YYYYMMDD-HHMMSS}/.
Returns the report directory path on success, None if the write
couldn't happen (caller logs and continues — reporting is best-effort).
"""
root = _reports_root()
try:
root.mkdir(parents=True, exist_ok=True)
except Exception as e:
logger.debug("Curator report dir create failed: %s", e)
return None
stamp = started_at.strftime("%Y%m%d-%H%M%S")
run_dir = root / stamp
# If we crash-reran within the same second, append a disambiguator
suffix = 1
while run_dir.exists():
suffix += 1
run_dir = root / f"{stamp}-{suffix}"
try:
run_dir.mkdir(parents=True, exist_ok=False)
except Exception as e:
logger.debug("Curator run dir create failed: %s", e)
return None
# Diff before/after
after_by_name = {r.get("name"): r for r in after_report if isinstance(r, dict)}
after_names = set(after_by_name.keys())
removed = sorted(before_names - after_names) # archived during this run
added = sorted(after_names - before_names) # new skills this run
before_by_name = {r.get("name"): r for r in before_report if isinstance(r, dict)}
# State transitions between the two snapshots (e.g. active -> stale)
transitions: List[Dict[str, str]] = []
for name in sorted(after_names & before_names):
s_before = (before_by_name.get(name) or {}).get("state")
s_after = (after_by_name.get(name) or {}).get("state")
if s_before and s_after and s_before != s_after:
transitions.append({"name": name, "from": s_before, "to": s_after})
# Classify LLM tool calls
tc_counts: Dict[str, int] = {}
for tc in llm_meta.get("tool_calls", []) or []:
name = tc.get("name", "unknown")
tc_counts[name] = tc_counts.get(name, 0) + 1
payload = {
"started_at": started_at.isoformat(),
"duration_seconds": round(elapsed_seconds, 2),
"model": llm_meta.get("model", ""),
"provider": llm_meta.get("provider", ""),
"auto_transitions": auto_counts,
"counts": {
"before": len(before_names),
"after": len(after_names),
"delta": len(after_names) - len(before_names),
"archived_this_run": len(removed),
"added_this_run": len(added),
"state_transitions": len(transitions),
"tool_calls_total": sum(tc_counts.values()),
},
"tool_call_counts": tc_counts,
"archived": removed,
"added": added,
"state_transitions": transitions,
"llm_final": llm_meta.get("final", ""),
"llm_summary": llm_meta.get("summary", ""),
"llm_error": llm_meta.get("error"),
"tool_calls": llm_meta.get("tool_calls", []),
}
# run.json — machine-readable, full fidelity
try:
(run_dir / "run.json").write_text(
json.dumps(payload, indent=2, ensure_ascii=False) + "\n",
encoding="utf-8",
)
except Exception as e:
logger.debug("Curator run.json write failed: %s", e)
# REPORT.md — human-readable
try:
md = _render_report_markdown(payload)
(run_dir / "REPORT.md").write_text(md, encoding="utf-8")
except Exception as e:
logger.debug("Curator REPORT.md write failed: %s", e)
return run_dir
def _render_report_markdown(p: Dict[str, Any]) -> str:
"""Render the human-readable report."""
lines: List[str] = []
started = p.get("started_at", "")
duration = p.get("duration_seconds", 0) or 0
mins, secs = divmod(int(duration), 60)
dur_label = f"{mins}m {secs}s" if mins else f"{secs}s"
lines.append(f"# Curator run — {started}\n")
model = p.get("model") or "(not resolved)"
prov = p.get("provider") or "(not resolved)"
counts = p.get("counts") or {}
lines.append(
f"Model: `{model}` via `{prov}` · Duration: {dur_label} · "
f"Agent-created skills: {counts.get('before', 0)}{counts.get('after', 0)} "
f"({counts.get('delta', 0):+d})\n"
)
error = p.get("llm_error")
if error:
lines.append(f"> ⚠ LLM pass error: `{error}`\n")
# Auto-transitions (pure, no LLM)
auto = p.get("auto_transitions") or {}
lines.append("## Auto-transitions (pure, no LLM)\n")
lines.append(f"- checked: {auto.get('checked', 0)}")
lines.append(f"- marked stale: {auto.get('marked_stale', 0)}")
lines.append(f"- archived: {auto.get('archived', 0)}")
lines.append(f"- reactivated: {auto.get('reactivated', 0)}")
lines.append("")
# LLM pass numbers
tc_counts = p.get("tool_call_counts") or {}
lines.append("## LLM consolidation pass\n")
lines.append(f"- tool calls: **{counts.get('tool_calls_total', 0)}** "
f"(by name: {', '.join(f'{k}={v}' for k, v in sorted(tc_counts.items())) or 'none'})")
lines.append(f"- archived this run: **{counts.get('archived_this_run', 0)}**")
lines.append(f"- new skills this run: **{counts.get('added_this_run', 0)}**")
lines.append(f"- state transitions (active ↔ stale ↔ archived): "
f"**{counts.get('state_transitions', 0)}**")
lines.append("")
# Archived list
archived = p.get("archived") or []
if archived:
lines.append(f"### Skills archived ({len(archived)})\n")
lines.append("_Archived skills are at `~/.hermes/skills/.archive/`. "
"Restore any via `hermes curator restore <name>`._\n")
# Show first 50 inline, note truncation after that
SHOW = 50
for n in archived[:SHOW]:
lines.append(f"- `{n}`")
if len(archived) > SHOW:
lines.append(f"- … and {len(archived) - SHOW} more (see `run.json` for the full list)")
lines.append("")
# Added list
added = p.get("added") or []
if added:
lines.append(f"### New skills this run ({len(added)})\n")
lines.append("_Usually these are new class-level umbrellas created via `skill_manage action=create`._\n")
for n in added:
lines.append(f"- `{n}`")
lines.append("")
# State transitions
trans = p.get("state_transitions") or []
if trans:
lines.append(f"### State transitions ({len(trans)})\n")
for t in trans:
lines.append(f"- `{t.get('name')}`: {t.get('from')}{t.get('to')}")
lines.append("")
# Full LLM final response
final = (p.get("llm_final") or "").strip()
if final:
lines.append("## LLM final summary\n")
lines.append(final)
lines.append("")
elif not error:
llm_sum = p.get("llm_summary") or ""
if llm_sum:
lines.append("## LLM summary\n")
lines.append(llm_sum)
lines.append("")
# Recovery footer
lines.append("## Recovery\n")
lines.append("- Restore an archived skill: `hermes curator restore <name>`")
lines.append("- All archives live under `~/.hermes/skills/.archive/` and are recoverable by `mv`")
lines.append("- See `run.json` in this directory for the full machine-readable record.")
lines.append("")
return "\n".join(lines)
# ---------------------------------------------------------------------------
# Orchestrator — spawn a forked AIAgent for the LLM review pass
# ---------------------------------------------------------------------------
def _render_candidate_list() -> str:
"""Human/agent-readable list of agent-created skills with usage stats."""
rows = skill_usage.agent_created_report()
if not rows:
return "No agent-created skills to review."
lines = [f"Agent-created skills ({len(rows)}):\n"]
for r in rows:
lines.append(
f"- {r['name']} "
f"state={r['state']} "
f"pinned={'yes' if r.get('pinned') else 'no'} "
f"use={r.get('use_count', 0)} "
f"view={r.get('view_count', 0)} "
f"patches={r.get('patch_count', 0)} "
f"last_used={r.get('last_used_at') or 'never'}"
)
return "\n".join(lines)
def run_curator_review(
on_summary: Optional[Callable[[str], None]] = None,
synchronous: bool = False,
) -> Dict[str, Any]:
"""Execute a single curator review pass.
Steps:
1. Apply automatic state transitions (pure, no LLM).
2. If there are agent-created skills, spawn a forked AIAgent that runs
the LLM review prompt against the current candidate list.
3. Update .curator_state with last_run_at and a one-line summary.
4. Invoke *on_summary* with a user-visible description.
If *synchronous* is True, the LLM review runs in the calling thread; the
default is to spawn a daemon thread so the caller returns immediately.
"""
start = datetime.now(timezone.utc)
counts = apply_automatic_transitions(now=start)
auto_summary_parts = []
if counts["marked_stale"]:
auto_summary_parts.append(f"{counts['marked_stale']} marked stale")
if counts["archived"]:
auto_summary_parts.append(f"{counts['archived']} archived")
if counts["reactivated"]:
auto_summary_parts.append(f"{counts['reactivated']} reactivated")
auto_summary = ", ".join(auto_summary_parts) if auto_summary_parts else "no changes"
# Persist state before the LLM pass so a crash mid-review still records
# the run and doesn't immediately re-trigger.
state = load_state()
state["last_run_at"] = start.isoformat()
state["run_count"] = int(state.get("run_count", 0)) + 1
state["last_run_summary"] = f"auto: {auto_summary}"
save_state(state)
def _llm_pass():
nonlocal auto_summary
# Snapshot skill state BEFORE the LLM pass so the report can diff.
try:
before_report = skill_usage.agent_created_report()
except Exception:
before_report = []
before_names = {r.get("name") for r in before_report if isinstance(r, dict)}
llm_meta: Dict[str, Any] = {}
try:
candidate_list = _render_candidate_list()
if "No agent-created skills" in candidate_list:
final_summary = f"auto: {auto_summary}; llm: skipped (no candidates)"
llm_meta = {
"final": "",
"summary": "skipped (no candidates)",
"model": "",
"provider": "",
"tool_calls": [],
"error": None,
}
else:
prompt = f"{CURATOR_REVIEW_PROMPT}\n\n{candidate_list}"
llm_meta = _run_llm_review(prompt)
final_summary = (
f"auto: {auto_summary}; llm: {llm_meta.get('summary', 'no change')}"
)
except Exception as e:
logger.debug("Curator LLM pass failed: %s", e, exc_info=True)
final_summary = f"auto: {auto_summary}; llm: error ({e})"
llm_meta = {
"final": "",
"summary": f"error ({e})",
"model": "",
"provider": "",
"tool_calls": [],
"error": str(e),
}
elapsed = (datetime.now(timezone.utc) - start).total_seconds()
state2 = load_state()
state2["last_run_duration_seconds"] = elapsed
state2["last_run_summary"] = final_summary
# Write the per-run report. Runs in a best-effort try so a
# reporting bug never breaks the curator itself. Report path is
# recorded in state so `hermes curator status` can point at it.
try:
after_report = skill_usage.agent_created_report()
except Exception:
after_report = []
try:
report_path = _write_run_report(
started_at=start,
elapsed_seconds=elapsed,
auto_counts=counts,
auto_summary=auto_summary,
before_report=before_report,
before_names=before_names,
after_report=after_report,
llm_meta=llm_meta,
)
if report_path is not None:
state2["last_report_path"] = str(report_path)
except Exception as e:
logger.debug("Curator report write failed: %s", e, exc_info=True)
save_state(state2)
if on_summary:
try:
on_summary(f"curator: {final_summary}")
except Exception:
pass
if synchronous:
_llm_pass()
else:
t = threading.Thread(target=_llm_pass, daemon=True, name="curator-review")
t.start()
return {
"started_at": start.isoformat(),
"auto_transitions": counts,
"summary_so_far": auto_summary,
}
def _run_llm_review(prompt: str) -> Dict[str, Any]:
"""Spawn an AIAgent fork to run the curator review prompt.
Returns a dict with:
- final: full (untruncated) final response from the reviewer
- summary: short summary suitable for state file (240-char cap)
- model, provider: what the fork actually ran on
- tool_calls: list of {name, arguments} for every tool call made during
the pass (arguments may be truncated for readability)
- error: set if the pass failed mid-run; final/summary may still be empty
Never raises; callers get a structured failure instead.
"""
import contextlib
result_meta: Dict[str, Any] = {
"final": "",
"summary": "",
"model": "",
"provider": "",
"tool_calls": [],
"error": None,
}
try:
from run_agent import AIAgent
except Exception as e:
result_meta["error"] = f"AIAgent import failed: {e}"
result_meta["summary"] = result_meta["error"]
return result_meta
# Resolve provider + model the same way the CLI does, so the curator
# fork inherits the user's active main config rather than falling
# through to an empty provider/model pair (which sends HTTP 400
# "No models provided"). AIAgent() without explicit provider/model
# arguments hits an auto-resolution path that fails for OAuth-only
# providers and for pool-backed credentials.
_api_key = None
_base_url = None
_api_mode = None
_resolved_provider = None
_model_name = ""
try:
from hermes_cli.config import load_config
from hermes_cli.runtime_provider import resolve_runtime_provider
_cfg = load_config()
_m = _cfg.get("model", {}) if isinstance(_cfg.get("model"), dict) else {}
_provider = _m.get("provider") or "auto"
_model_name = _m.get("default") or _m.get("model") or ""
_rp = resolve_runtime_provider(
requested=_provider, target_model=_model_name
)
_api_key = _rp.get("api_key")
_base_url = _rp.get("base_url")
_api_mode = _rp.get("api_mode")
_resolved_provider = _rp.get("provider") or _provider
except Exception as e:
logger.debug("Curator provider resolution failed: %s", e, exc_info=True)
result_meta["model"] = _model_name
result_meta["provider"] = _resolved_provider or ""
review_agent = None
try:
review_agent = AIAgent(
model=_model_name,
provider=_resolved_provider,
api_key=_api_key,
base_url=_base_url,
api_mode=_api_mode,
# Umbrella-building over a large skill collection is worth a
# high iteration ceiling — the pass typically takes 50-100
# API calls against hundreds of candidate skills. The
# single-session review path caps itself at a much smaller
# number because it's not doing a curation sweep.
max_iterations=9999,
quiet_mode=True,
platform="curator",
skip_context_files=True,
skip_memory=True,
)
# Disable recursive nudges — the curator must never spawn its own review.
review_agent._memory_nudge_interval = 0
review_agent._skill_nudge_interval = 0
# Redirect the forked agent's stdout/stderr to /dev/null while it
# runs so its tool-call chatter doesn't pollute the foreground
# terminal. The background-thread runner also hides it; this
# belt-and-suspenders path matters when a caller invokes
# run_curator_review(synchronous=True) from the CLI.
with open(os.devnull, "w") as _devnull, \
contextlib.redirect_stdout(_devnull), \
contextlib.redirect_stderr(_devnull):
conv_result = review_agent.run_conversation(user_message=prompt)
final = ""
if isinstance(conv_result, dict):
final = str(conv_result.get("final_response") or "").strip()
result_meta["final"] = final
result_meta["summary"] = (final[:240] + "") if len(final) > 240 else (final or "no change")
# Collect tool calls for the report. Walk the forked agent's
# session messages and extract every tool_call made during the
# pass. Truncate argument payloads so a giant skill_manage create
# doesn't blow up the report.
_calls: List[Dict[str, Any]] = []
for msg in getattr(review_agent, "_session_messages", []) or []:
if not isinstance(msg, dict):
continue
tcs = msg.get("tool_calls") or []
for tc in tcs:
if not isinstance(tc, dict):
continue
fn = tc.get("function") or {}
name = fn.get("name") or ""
args_raw = fn.get("arguments") or ""
if isinstance(args_raw, str) and len(args_raw) > 400:
args_raw = args_raw[:400] + ""
_calls.append({"name": name, "arguments": args_raw})
result_meta["tool_calls"] = _calls
except Exception as e:
result_meta["error"] = f"error: {e}"
result_meta["summary"] = result_meta["error"]
finally:
if review_agent is not None:
try:
review_agent.close()
except Exception:
pass
return result_meta
# ---------------------------------------------------------------------------
# Public entrypoint for the session-start hook
# ---------------------------------------------------------------------------
def maybe_run_curator(
*,
idle_for_seconds: Optional[float] = None,
on_summary: Optional[Callable[[str], None]] = None,
) -> Optional[Dict[str, Any]]:
"""Best-effort: run a curator pass if all gates pass. Returns the result
dict if a pass was started, else None. Never raises."""
try:
if not should_run_now():
return None
# Idle gating: only enforce when the caller provided a measurement.
if idle_for_seconds is not None:
min_idle_s = get_min_idle_hours() * 3600.0
if idle_for_seconds < min_idle_s:
return None
return run_curator_review(on_summary=on_summary)
except Exception as e:
logger.debug("maybe_run_curator failed: %s", e, exc_info=True)
return None
+1
View File
@@ -91,6 +91,7 @@ class ClassifiedError:
_BILLING_PATTERNS = [
"insufficient credits",
"insufficient_quota",
"insufficient balance",
"credit balance",
"credits have been exhausted",
"top up your credits",
-2
View File
@@ -30,7 +30,6 @@ from __future__ import annotations
import json
import logging
import os
import time
import uuid
from types import SimpleNamespace
@@ -42,7 +41,6 @@ from agent import google_oauth
from agent.gemini_schema import sanitize_gemini_tool_parameters
from agent.google_code_assist import (
CODE_ASSIST_ENDPOINT,
FREE_TIER_ID,
CodeAssistError,
ProjectContext,
resolve_project_context,
+1 -1
View File
@@ -2,7 +2,7 @@
from __future__ import annotations
from typing import Any, Dict, List
from typing import Any, Dict
# Gemini's ``FunctionDeclaration.parameters`` field accepts the ``Schema``
# object, which is only a subset of OpenAPI 3.0 / JSON Schema. Strip fields
-1
View File
@@ -29,7 +29,6 @@ from __future__ import annotations
import json
import logging
import os
import time
import urllib.error
import urllib.parse
+3 -3
View File
@@ -49,14 +49,13 @@ import json
import logging
import os
import secrets
import socket
import stat
import threading
import time
import urllib.error
import urllib.parse
import urllib.request
from dataclasses import dataclass, field
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Optional, Tuple
@@ -98,6 +97,7 @@ _DEFAULT_CLIENT_SECRET = f"GOCSPX-{_PUBLIC_CLIENT_SECRET_SUFFIX}"
# Regex patterns for fallback scraping from an installed gemini-cli.
import re as _re
from utils import atomic_replace
_CLIENT_ID_PATTERN = _re.compile(
r"OAUTH_CLIENT_ID\s*=\s*['\"]([0-9]+-[a-z0-9]+\.apps\.googleusercontent\.com)['\"]"
)
@@ -499,7 +499,7 @@ def save_credentials(creds: GoogleCredentials) -> Path:
fh.flush()
os.fsync(fh.fileno())
os.chmod(tmp_path, stat.S_IRUSR | stat.S_IWUSR)
os.replace(tmp_path, path)
atomic_replace(tmp_path, path)
finally:
try:
if tmp_path.exists():
+48
View File
@@ -0,0 +1,48 @@
"""LM Studio reasoning-effort resolution shared by the chat-completions
transport and run_agent's iteration-limit summary path.
LM Studio publishes per-model ``capabilities.reasoning.allowed_options`` (e.g.
``["off","on"]`` for toggle-style models, ``["off","minimal","low"]`` for
graduated models). We map the user's ``reasoning_config`` onto LM Studio's
OpenAI-compatible vocabulary, then clamp against the model's allowed set so
the server doesn't 400 on an unsupported effort.
"""
from __future__ import annotations
from typing import List, Optional
# LM Studio accepts these top-level reasoning_effort values via its
# OpenAI-compatible chat.completions endpoint.
_LM_VALID_EFFORTS = {"none", "minimal", "low", "medium", "high", "xhigh"}
# Toggle-style models publish allowed_options as ["off","on"] in /api/v1/models.
# Map them onto the OpenAI-compatible request vocabulary.
_LM_EFFORT_ALIASES = {"off": "none", "on": "medium"}
def resolve_lmstudio_effort(
reasoning_config: Optional[dict],
allowed_options: Optional[List[str]],
) -> Optional[str]:
"""Return the ``reasoning_effort`` string to send to LM Studio, or ``None``.
``None`` means "omit the field": the user picked a level the model can't
honor, so let LM Studio fall back to the model's declared default rather
than silently substituting a different effort. When ``allowed_options`` is
falsy (probe failed), skip clamping and send the resolved effort anyway.
"""
effort = "medium"
if reasoning_config and isinstance(reasoning_config, dict):
if reasoning_config.get("enabled") is False:
effort = "none"
else:
raw = (reasoning_config.get("effort") or "").strip().lower()
raw = _LM_EFFORT_ALIASES.get(raw, raw)
if raw in _LM_VALID_EFFORTS:
effort = raw
if allowed_options:
allowed = {_LM_EFFORT_ALIASES.get(opt, opt) for opt in allowed_options}
if effort not in allowed:
return None
return effort
+35 -1
View File
@@ -28,7 +28,6 @@ Usage in run_agent.py:
from __future__ import annotations
import json
import logging
import re
import inspect
@@ -403,6 +402,41 @@ class MemoryManager:
provider.name, e,
)
def on_session_switch(
self,
new_session_id: str,
*,
parent_session_id: str = "",
reset: bool = False,
**kwargs,
) -> None:
"""Notify all providers that the agent's session_id has rotated.
Fires on ``/resume``, ``/branch``, ``/reset``, ``/new``, and
context compression any path that reassigns
``AIAgent.session_id`` without tearing the provider down.
Providers keep running; they only need to refresh cached
per-session state so subsequent writes land in the correct
session's record. See ``MemoryProvider.on_session_switch`` for
the full contract.
"""
if not new_session_id:
return
for provider in self._providers:
try:
provider.on_session_switch(
new_session_id,
parent_session_id=parent_session_id,
reset=reset,
**kwargs,
)
except Exception as e:
logger.debug(
"Memory provider '%s' on_session_switch failed: %s",
provider.name, e,
)
def on_pre_compress(self, messages: List[Dict[str, Any]]) -> str:
"""Notify all providers before context compression.
+40
View File
@@ -25,6 +25,7 @@ Lifecycle (called by MemoryManager, wired in run_agent.py):
Optional hooks (override to opt in):
on_turn_start(turn, message, **kwargs) per-turn tick with runtime context
on_session_end(messages) end-of-session extraction
on_session_switch(new_session_id, **kwargs) mid-process session_id rotation
on_pre_compress(messages) -> str extract before context compression
on_memory_write(action, target, content, metadata=None) mirror built-in memory writes
on_delegation(task, result, **kwargs) parent-side observation of subagent work
@@ -160,6 +161,45 @@ class MemoryProvider(ABC):
(CLI exit, /reset, gateway session expiry).
"""
def on_session_switch(
self,
new_session_id: str,
*,
parent_session_id: str = "",
reset: bool = False,
**kwargs,
) -> None:
"""Called when the agent switches session_id mid-process.
Fires on ``/resume``, ``/branch``, ``/reset``, ``/new`` (CLI), the
gateway equivalents, and context compression any path that
reassigns ``AIAgent.session_id`` without tearing the provider down.
Providers that cache per-session state in ``initialize()``
(``_session_id``, ``_document_id``, accumulated turn buffers,
counters) should update or reset that state here so subsequent
writes land in the correct session's record.
Parameters
----------
new_session_id:
The session_id the agent just switched to.
parent_session_id:
The previous session_id, if meaningful set for ``/branch``
(fork lineage), context compression (continuation lineage),
and ``/resume`` (the session we're leaving). Empty string
when no lineage applies.
reset:
``True`` when this is a genuinely new conversation, not a
resumption of an existing one. Fired by ``/reset`` / ``/new``.
Providers should flush accumulated per-session buffers
(``_session_turns``, ``_turn_counter``, etc.) when this is
set. ``False`` for ``/resume`` / ``/branch`` / compression
where the logical conversation continues under the new id.
Default is no-op for backward compatibility.
"""
def on_pre_compress(self, messages: List[Dict[str, Any]]) -> str:
"""Called before context compression discards old messages.
+15 -10
View File
@@ -46,12 +46,13 @@ def _resolve_requests_verify() -> bool | str:
# are preserved so the full model name reaches cache lookups and server queries.
_PROVIDER_PREFIXES: frozenset[str] = frozenset({
"openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
"gemini", "ollama-cloud", "zai", "kimi-coding", "kimi-coding-cn", "stepfun", "minimax", "minimax-cn", "anthropic", "deepseek",
"gemini", "ollama-cloud", "zai", "kimi-coding", "kimi-coding-cn", "stepfun", "minimax", "minimax-oauth", "minimax-cn", "anthropic", "deepseek",
"opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba",
"qwen-oauth",
"xiaomi",
"arcee",
"gmi",
"tencent-tokenhub",
"custom", "local",
# Common aliases
"google", "google-gemini", "google-ai-studio",
@@ -60,6 +61,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
"ollama",
"stepfun", "opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
"mimo", "xiaomi-mimo",
"tencent", "tokenhub", "tencent-cloud", "tencentmaas",
"arcee-ai", "arceeai",
"gmi-cloud", "gmicloud",
"xai", "x-ai", "x.ai", "grok",
@@ -208,6 +210,8 @@ DEFAULT_CONTEXT_LENGTHS = {
"grok": 131072, # catch-all (grok-beta, unknown grok-*)
# Kimi
"kimi": 262144,
# Tencent — Hy3 Preview (Hunyuan) with 256K context window
"hy3-preview": 256000,
# Nemotron — NVIDIA's open-weights series (128K context across all sizes)
"nemotron": 131072,
# Arcee
@@ -310,6 +314,7 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"api.xiaomimimo.com": "xiaomi",
"xiaomimimo.com": "xiaomi",
"api.gmi-serving.com": "gmi",
"tokenhub.tencentmaas.com": "tencent-tokenhub",
"ollama.com": "ollama-cloud",
}
@@ -620,8 +625,6 @@ def fetch_endpoint_model_metadata(
if isinstance(ctx, int) and ctx > 0:
context_length = ctx
break
if context_length is None:
context_length = _extract_context_length(model)
if context_length is not None:
entry["context_length"] = context_length
@@ -1011,10 +1014,7 @@ def _query_local_context_length(model: str, base_url: str, api_key: str = "") ->
ctx = cfg.get("context_length")
if ctx and isinstance(ctx, (int, float)):
return int(ctx)
# Fall back to max_context_length (theoretical model max)
ctx = m.get("max_context_length") or m.get("context_length")
if ctx and isinstance(ctx, (int, float)):
return int(ctx)
break
# LM Studio / vLLM / llama.cpp: try /v1/models/{model}
resp = client.get(f"{server_url}/v1/models/{model}")
@@ -1276,7 +1276,10 @@ def get_model_context_length(
model = _strip_provider_prefix(model)
# 1. Check persistent cache (model+provider)
if base_url:
# LM Studio is excluded — its loaded context length is transient (the
# user can reload the model with a different context_length at any time
# via /api/v1/models/load), so a stale cached value would mask reloads.
if base_url and provider != "lmstudio":
cached = get_cached_context_length(model, base_url)
if cached is not None:
# Invalidate stale Codex OAuth cache entries: pre-PR #14935 builds
@@ -1329,7 +1332,8 @@ def get_model_context_length(
if is_local_endpoint(base_url):
local_ctx = _query_local_context_length(model, base_url, api_key=api_key)
if local_ctx and local_ctx > 0:
save_context_length(model, base_url, local_ctx)
if provider != "lmstudio":
save_context_length(model, base_url, local_ctx)
return local_ctx
logger.info(
"Could not detect context length for model %r at %s"
@@ -1419,7 +1423,8 @@ def get_model_context_length(
if base_url and is_local_endpoint(base_url):
local_ctx = _query_local_context_length(model, base_url, api_key=api_key)
if local_ctx and local_ctx > 0:
save_context_length(model, base_url, local_ctx)
if provider != "lmstudio":
save_context_length(model, base_url, local_ctx)
return local_ctx
# 10. Default fallback — 128K
+1
View File
@@ -149,6 +149,7 @@ PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
"stepfun": "stepfun",
"kimi-coding-cn": "kimi-for-coding",
"minimax": "minimax",
"minimax-oauth": "minimax",
"minimax-cn": "minimax-cn",
"deepseek": "deepseek",
"alibaba": "alibaba",
+2 -1
View File
@@ -18,6 +18,7 @@ import os
import tempfile
import time
from typing import Any, Mapping, Optional
from utils import atomic_replace
logger = logging.getLogger(__name__)
@@ -118,7 +119,7 @@ def record_nous_rate_limit(
try:
with os.fdopen(fd, "w") as f:
json.dump(state, f)
os.replace(tmp_path, path)
atomic_replace(tmp_path, path)
except Exception:
# Clean up temp file on failure
try:
+11 -9
View File
@@ -98,17 +98,19 @@ def tool_progress_hint_cli() -> str:
def openclaw_residue_hint_cli() -> str:
"""Banner shown the first time Hermes starts and finds ``~/.openclaw/``.
OpenClaw-era config, memory, and skill paths in ``~/.openclaw/`` will
otherwise attract the agent (memory entries like ``~/.openclaw/config.yaml``
get carried forward and the agent dutifully reads them). ``hermes claw
cleanup`` renames the directory so the agent stops finding it.
Points users at ``hermes claw migrate`` (non-destructive port of config,
memory, and skills) first. ``hermes claw cleanup`` is mentioned as the
follow-up step for users who have already migrated and want to archive
the old directory with a warning that archiving breaks OpenClaw.
"""
return (
"Heads up — an OpenClaw workspace was detected at ~/.openclaw/.\n"
"After migrating, the agent can still get confused and read that "
"directory's config/memory instead of Hermes's.\n"
"Run `hermes claw cleanup` to archive it (rename → .openclaw.pre-migration). "
"This tip only shows once; rerun it any time with `hermes claw cleanup`."
"A legacy OpenClaw directory was detected at ~/.openclaw/.\n"
"To port your config, memory, and skills over to Hermes, run "
"`hermes claw migrate`.\n"
"If you've already migrated and want to archive the old directory, "
"run `hermes claw cleanup` (renames it to ~/.openclaw.pre-migration — "
"OpenClaw will stop working after this).\n"
"This tip only shows once."
)
+4
View File
@@ -310,6 +310,10 @@ PLATFORM_HINTS = {
"Standard markdown is automatically converted to Telegram format. "
"Supported: **bold**, *italic*, ~~strikethrough~~, ||spoiler||, "
"`inline code`, ```code blocks```, [links](url), and ## headers. "
"Telegram has NO table syntax — prefer bullet lists or labeled "
"key: value pairs over pipe tables (any tables you do emit are "
"auto-rewritten into row-group bullets, which you can produce "
"directly for cleaner output). "
"You can send media files natively: to deliver a file to the user, "
"include MEDIA:/absolute/path/to/file in your response. Images "
"(.png, .jpg, .webp) appear as photos, audio (.ogg) sends as voice "
+58 -6
View File
@@ -56,8 +56,12 @@ _SENSITIVE_BODY_KEYS = frozenset({
})
# Snapshot at import time so runtime env mutations (e.g. LLM-generated
# `export HERMES_REDACT_SECRETS=false`) cannot disable redaction mid-session.
_REDACT_ENABLED = os.getenv("HERMES_REDACT_SECRETS", "").lower() not in ("0", "false", "no", "off")
# `export HERMES_REDACT_SECRETS=true`) cannot enable/disable redaction
# mid-session. OFF by default — user must opt in via
# `security.redact_secrets: true` in config.yaml (bridged to this env var
# in hermes_cli/main.py and gateway/run.py) or `HERMES_REDACT_SECRETS=true`
# in ~/.hermes/.env.
_REDACT_ENABLED = os.getenv("HERMES_REDACT_SECRETS", "").lower() in ("1", "true", "yes", "on")
# Known API key prefixes -- match the prefix + contiguous token chars
_PREFIX_PATTERNS = [
@@ -180,11 +184,59 @@ _PREFIX_RE = re.compile(
)
def mask_secret(
value: str,
*,
head: int = 4,
tail: int = 4,
floor: int = 12,
placeholder: str = "***",
empty: str = "",
) -> str:
"""Mask a secret for display, preserving ``head`` and ``tail`` characters.
Canonical helper for display-time redaction across Hermes used by
``hermes config``, ``hermes status``, ``hermes dump``, and anywhere
a secret needs to be shown truncated for debuggability while still
keeping the bulk hidden.
Args:
value: The secret to mask. ``None``/empty returns ``empty``.
head: Leading characters to preserve. Default 4.
tail: Trailing characters to preserve. Default 4.
floor: Values shorter than ``head + tail + floor_margin`` are
fully masked (returns ``placeholder``). Default 12
matches the existing config/status/dump convention.
placeholder: Value returned for too-short inputs. Default ``"***"``.
empty: Value returned when ``value`` is falsy (None, ""). The
caller can override this to e.g. ``color("(not set)",
Colors.DIM)`` for user-facing display.
Examples:
>>> mask_secret("sk-proj-abcdef1234567890")
'sk-p...7890'
>>> mask_secret("short") # fully masked
'***'
>>> mask_secret("") # empty default
''
>>> mask_secret("", empty="(not set)") # empty override
'(not set)'
>>> mask_secret("long-token", head=6, tail=4, floor=18)
'***'
"""
if not value:
return empty
if len(value) < floor:
return placeholder
return f"{value[:head]}...{value[-tail:]}"
def _mask_token(token: str) -> str:
"""Mask a token, preserving prefix for long tokens."""
if len(token) < 18:
"""Mask a log token — conservative 18-char floor, preserves 6 prefix / 4 suffix."""
# Empty input: historically this returned "***" rather than "". Preserve.
if not token:
return "***"
return f"{token[:6]}...{token[-4:]}"
return mask_secret(token, head=6, tail=4, floor=18)
def _redact_query_string(query: str) -> str:
@@ -257,7 +309,7 @@ def redact_sensitive_text(text: str) -> str:
"""Apply all redaction patterns to a block of text.
Safe to call on any string -- non-matching text passes through unchanged.
Disabled when security.redact_secrets is false in config.yaml.
Disabled by default enable via security.redact_secrets: true in config.yaml.
"""
if text is None:
return None
+2 -1
View File
@@ -76,6 +76,7 @@ except ImportError: # pragma: no cover
fcntl = None # type: ignore[assignment]
from hermes_constants import get_hermes_home
from utils import atomic_replace
logger = logging.getLogger(__name__)
@@ -568,7 +569,7 @@ def save_allowlist(data: Dict[str, Any]) -> None:
try:
with os.fdopen(fd, "w") as fh:
fh.write(json.dumps(data, indent=2, sort_keys=True))
os.replace(tmp_path, p)
atomic_replace(tmp_path, p)
except Exception:
try:
os.unlink(tmp_path)
+9 -1
View File
@@ -200,6 +200,9 @@ def get_external_skills_dirs() -> List[Path]:
if not isinstance(raw_dirs, list):
return []
from hermes_constants import get_hermes_home
hermes_home = get_hermes_home()
local_skills = get_skills_dir().resolve()
seen: Set[Path] = set()
result: List[Path] = []
@@ -210,7 +213,12 @@ def get_external_skills_dirs() -> List[Path]:
continue
# Expand ~ and environment variables
expanded = os.path.expanduser(os.path.expandvars(entry))
p = Path(expanded).resolve()
p = Path(expanded)
# Resolve relative paths against HERMES_HOME, not cwd
if not p.is_absolute():
p = (hermes_home / p).resolve()
else:
p = p.resolve()
if p == local_skills:
continue
if p in seen:
+8 -3
View File
@@ -30,10 +30,12 @@ def generate_title(
assistant_response: str,
timeout: float = 30.0,
failure_callback: Optional[FailureCallback] = None,
main_runtime: dict = None,
) -> Optional[str]:
"""Generate a session title from the first exchange.
Uses the auxiliary LLM client (cheapest/fastest available model).
Uses the main runtime's model when available, falling back to the
auxiliary LLM client (cheapest/fastest available model).
Returns the title string or None on failure.
``failure_callback`` is invoked with ``(task, exception)`` when the
@@ -57,6 +59,7 @@ def generate_title(
max_tokens=500,
temperature=0.3,
timeout=timeout,
main_runtime=main_runtime,
)
title = (response.choices[0].message.content or "").strip()
# Clean up: remove quotes, trailing punctuation, prefixes like "Title: "
@@ -86,6 +89,7 @@ def auto_title_session(
user_message: str,
assistant_response: str,
failure_callback: Optional[FailureCallback] = None,
main_runtime: dict = None,
) -> None:
"""Generate and set a session title if one doesn't already exist.
@@ -107,7 +111,7 @@ def auto_title_session(
return
title = generate_title(
user_message, assistant_response, failure_callback=failure_callback
user_message, assistant_response, failure_callback=failure_callback, main_runtime=main_runtime
)
if not title:
return
@@ -126,6 +130,7 @@ def maybe_auto_title(
assistant_response: str,
conversation_history: list,
failure_callback: Optional[FailureCallback] = None,
main_runtime: dict = None,
) -> None:
"""Fire-and-forget title generation after the first exchange.
@@ -147,7 +152,7 @@ def maybe_auto_title(
thread = threading.Thread(
target=auto_title_session,
args=(session_db, session_id, user_message, assistant_response),
kwargs={"failure_callback": failure_callback},
kwargs={"failure_callback": failure_callback, "main_runtime": main_runtime},
daemon=True,
name="auto-title",
)
+124 -2
View File
@@ -12,12 +12,84 @@ reasoning configuration, temperature handling, and extra_body assembly.
import copy
from typing import Any, Dict, List, Optional
from agent.lmstudio_reasoning import resolve_lmstudio_effort
from agent.moonshot_schema import is_moonshot_model, sanitize_moonshot_tools
from agent.prompt_builder import DEVELOPER_ROLE_MODELS
from agent.transports.base import ProviderTransport
from agent.transports.types import NormalizedResponse, ToolCall, Usage
def _build_gemini_thinking_config(model: str, reasoning_config: dict | None) -> dict | None:
"""Translate Hermes/OpenRouter-style reasoning config to Gemini thinkingConfig."""
if reasoning_config is None or not isinstance(reasoning_config, dict):
return None
if reasoning_config.get("enabled") is False:
# Gemini can hide thought parts even when internal thinking still
# happens; omit thinkingLevel to avoid model-specific validation quirks.
return {"includeThoughts": False}
effort = str(reasoning_config.get("effort", "medium") or "medium").strip().lower()
if effort == "none":
return {"includeThoughts": False}
thinking_config: Dict[str, Any] = {"includeThoughts": True}
normalized_model = (model or "").strip().lower()
if normalized_model.startswith("google/"):
normalized_model = normalized_model.split("/", 1)[1]
# Gemini 2.5 accepts thinkingBudget; don't guess a budget from Hermes'
# coarse effort levels. ``includeThoughts`` alone is enough to surface
# thought parts without risking request validation errors.
if normalized_model.startswith("gemini-2.5-"):
return thinking_config
if effort not in {"minimal", "low", "medium", "high", "xhigh"}:
effort = "medium"
# Gemini 3 Flash documents low/medium/high thinking levels; Gemini 3 Pro
# is stricter (low/high). Clamp Hermes' wider effort set to what each
# family accepts so we never forward an undocumented level verbatim.
if normalized_model.startswith(("gemini-3", "gemini-3.1")):
if "flash" in normalized_model:
if effort in {"minimal", "low"}:
thinking_config["thinkingLevel"] = "low"
elif effort in {"high", "xhigh"}:
thinking_config["thinkingLevel"] = "high"
else:
thinking_config["thinkingLevel"] = "medium"
elif "pro" in normalized_model:
thinking_config["thinkingLevel"] = (
"high" if effort in {"high", "xhigh"} else "low"
)
return thinking_config
def _snake_case_gemini_thinking_config(config: dict | None) -> dict | None:
"""Convert Gemini thinking config keys to the OpenAI-compat field names."""
if not isinstance(config, dict) or not config:
return None
translated: Dict[str, Any] = {}
if isinstance(config.get("includeThoughts"), bool):
translated["include_thoughts"] = config["includeThoughts"]
if isinstance(config.get("thinkingLevel"), str) and config["thinkingLevel"].strip():
translated["thinking_level"] = config["thinkingLevel"].strip().lower()
if isinstance(config.get("thinkingBudget"), (int, float)):
translated["thinking_budget"] = int(config["thinkingBudget"])
return translated or None
def _is_gemini_openai_compat_base_url(base_url: Any) -> bool:
normalized = str(base_url or "").strip().rstrip("/").lower()
if not normalized:
return False
if "generativelanguage.googleapis.com" not in normalized:
return False
return normalized.endswith("/openai")
class ChatCompletionsTransport(ProviderTransport):
"""Transport for api_mode='chat_completions'.
@@ -101,6 +173,7 @@ class ChatCompletionsTransport(ProviderTransport):
is_github_models: bool
is_nvidia_nim: bool
is_kimi: bool
is_lmstudio: bool
is_custom_provider: bool
ollama_num_ctx: int | None
# Provider routing
@@ -114,6 +187,7 @@ class ChatCompletionsTransport(ProviderTransport):
# Reasoning
supports_reasoning: bool
github_reasoning_extra: dict | None
lmstudio_reasoning_options: list[str] | None # raw allowed_options from /api/v1/models
# Claude on OpenRouter/Nous max output
anthropic_max_output: int | None
# Extra
@@ -188,6 +262,7 @@ class ChatCompletionsTransport(ProviderTransport):
anthropic_max_out = params.get("anthropic_max_output")
is_nvidia_nim = params.get("is_nvidia_nim", False)
is_kimi = params.get("is_kimi", False)
is_tokenhub = params.get("is_tokenhub", False)
reasoning_config = params.get("reasoning_config")
if ephemeral is not None and max_tokens_fn:
@@ -219,12 +294,41 @@ class ChatCompletionsTransport(ProviderTransport):
_kimi_effort = _e
api_kwargs["reasoning_effort"] = _kimi_effort
# Tencent TokenHub: top-level reasoning_effort (unless thinking disabled)
if is_tokenhub:
_tokenhub_thinking_off = bool(
reasoning_config
and isinstance(reasoning_config, dict)
and reasoning_config.get("enabled") is False
)
if not _tokenhub_thinking_off:
_tokenhub_effort = "high"
if reasoning_config and isinstance(reasoning_config, dict):
_e = (reasoning_config.get("effort") or "").strip().lower()
if _e in ("low", "medium", "high"):
_tokenhub_effort = _e
api_kwargs["reasoning_effort"] = _tokenhub_effort
# LM Studio: top-level reasoning_effort. Only emit when the model
# declares reasoning support via /api/v1/models capabilities (gated
# upstream by params["supports_reasoning"]). resolve_lmstudio_effort
# is shared with run_agent's summary path so both stay in sync.
if params.get("is_lmstudio", False) and params.get("supports_reasoning", False):
_lm_effort = resolve_lmstudio_effort(
reasoning_config,
params.get("lmstudio_reasoning_options"),
)
if _lm_effort is not None:
api_kwargs["reasoning_effort"] = _lm_effort
# extra_body assembly
extra_body: Dict[str, Any] = {}
is_openrouter = params.get("is_openrouter", False)
is_nous = params.get("is_nous", False)
is_github_models = params.get("is_github_models", False)
provider_name = str(params.get("provider_name") or "").strip().lower()
base_url = params.get("base_url")
provider_prefs = params.get("provider_preferences")
if provider_prefs and is_openrouter:
@@ -240,8 +344,9 @@ class ChatCompletionsTransport(ProviderTransport):
"type": "enabled" if _kimi_thinking_enabled else "disabled",
}
# Reasoning
if params.get("supports_reasoning", False):
# Reasoning. LM Studio is handled above via top-level reasoning_effort,
# so skip emitting extra_body.reasoning for it.
if params.get("supports_reasoning", False) and not params.get("is_lmstudio", False):
if is_github_models:
gh_reasoning = params.get("github_reasoning_extra")
if gh_reasoning is not None:
@@ -277,6 +382,23 @@ class ChatCompletionsTransport(ProviderTransport):
if is_qwen:
extra_body["vl_high_resolution_images"] = True
if provider_name == "gemini":
raw_thinking_config = _build_gemini_thinking_config(model, reasoning_config)
if _is_gemini_openai_compat_base_url(base_url):
thinking_config = _snake_case_gemini_thinking_config(raw_thinking_config)
if thinking_config:
openai_compat_extra = extra_body.get("extra_body", {})
google_extra = openai_compat_extra.get("google", {})
google_extra["thinking_config"] = thinking_config
openai_compat_extra["google"] = google_extra
extra_body["extra_body"] = openai_compat_extra
elif raw_thinking_config:
extra_body["thinking_config"] = raw_thinking_config
elif provider_name == "google-gemini-cli":
thinking_config = _build_gemini_thinking_config(model, reasoning_config)
if thinking_config:
extra_body["thinking_config"] = thinking_config
# Merge any pre-built extra_body additions
additions = params.get("extra_body_additions")
if additions:
+1 -3
View File
@@ -8,7 +8,7 @@ streaming, or the _run_codex_stream() call path.
from typing import Any, Dict, List, Optional
from agent.transports.base import ProviderTransport
from agent.transports.types import NormalizedResponse, ToolCall, Usage
from agent.transports.types import NormalizedResponse, ToolCall
class ResponsesApiTransport(ProviderTransport):
@@ -151,8 +151,6 @@ class ResponsesApiTransport(ProviderTransport):
"""Normalize Codex Responses API response to NormalizedResponse."""
from agent.codex_responses_adapter import (
_normalize_codex_response,
_extract_responses_message_text,
_extract_responses_reasoning_text,
)
# _normalize_codex_response returns (SimpleNamespace, finish_reason_str)
+21
View File
@@ -359,6 +359,25 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
# MiniMax
(
"minimax",
"minimax-m2.7",
): PricingEntry(
input_cost_per_million=Decimal("0.30"),
output_cost_per_million=Decimal("1.20"),
source="official_docs_snapshot",
pricing_version="minimax-pricing-2026-04",
),
(
"minimax-cn",
"minimax-m2.7",
): PricingEntry(
input_cost_per_million=Decimal("0.30"),
output_cost_per_million=Decimal("1.20"),
source="official_docs_snapshot",
pricing_version="minimax-pricing-2026-04",
),
}
@@ -400,6 +419,8 @@ def resolve_billing_route(
return BillingRoute(provider="anthropic", model=model.split("/")[-1], base_url=base_url or "", billing_mode="official_docs_snapshot")
if provider_name == "openai":
return BillingRoute(provider="openai", model=model.split("/")[-1], base_url=base_url or "", billing_mode="official_docs_snapshot")
if provider_name in {"minimax", "minimax-cn"}:
return BillingRoute(provider=provider_name, model=model.split("/")[-1], base_url=base_url or "", billing_mode="official_docs_snapshot")
if provider_name in {"custom", "local"} or (base and "localhost" in base):
return BillingRoute(provider=provider_name or "custom", model=model, base_url=base_url or "", billing_mode="unknown")
return BillingRoute(provider=provider_name or "unknown", model=model.split("/")[-1] if model else "", base_url=base_url or "", billing_mode="unknown")
+11 -7
View File
@@ -30,14 +30,13 @@ model:
# "ollama-cloud" - Ollama Cloud (requires: OLLAMA_API_KEY — https://ollama.com/settings)
# "kilocode" - KiloCode gateway (requires: KILOCODE_API_KEY)
# "ai-gateway" - Vercel AI Gateway (requires: AI_GATEWAY_API_KEY)
# "lmstudio" - LM Studio local server (optional: LM_API_KEY, defaults to http://127.0.0.1:1234/v1)
#
# Local servers (LM Studio, Ollama, vLLM, llama.cpp):
# "custom" - Any OpenAI-compatible endpoint. Set base_url below.
# Aliases: "lmstudio", "ollama", "vllm", "llamacpp" all map to "custom".
# Example for LM Studio:
# provider: "lmstudio"
# base_url: "http://localhost:1234/v1"
# No API key needed — local servers typically ignore auth.
# "custom" - Any other OpenAI-compatible endpoint. Set base_url below.
# Aliases: "ollama", "vllm", "llamacpp" all map to "custom".
# LM Studio is first-class and uses provider: "lmstudio".
# It works with both no-auth and auth-enabled server modes.
#
# Can also be overridden with --provider flag or HERMES_INFERENCE_PROVIDER env var.
provider: "auto"
@@ -181,6 +180,11 @@ terminal:
# lifetime_seconds: 300
# docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
# docker_mount_cwd_to_workspace: true # Explicit opt-in: mount your launch cwd into /workspace
# # Optional: run the container as your host user's uid:gid so files written
# # into bind-mounted dirs are owned by you, not root. Drops SETUID/SETGID
# # caps too since no gosu privilege drop is needed. Leave off if your
# # chosen docker_image expects to start as root.
# docker_run_as_host_user: true
# # Optional: explicitly forward selected env vars into Docker.
# # These values come from your current shell first, then ~/.hermes/.env.
# # Warning: anything forwarded here is visible to commands run in the container.
@@ -928,7 +932,7 @@ display:
# agent_name: "My Agent" # Banner title and branding
# welcome: "Welcome message" # Shown at CLI startup
# response_label: " ⚔ Agent " # Response box header label
# prompt_symbol: "⚔ " # Prompt symbol
# prompt_symbol: "⚔" # Prompt symbol (bare token; renderers add trailing space)
# tool_prefix: "╎" # Tool output line prefix (default: ┊)
#
skin: default
+246 -123
View File
@@ -69,7 +69,9 @@ from agent.usage_pricing import (
format_duration_compact,
format_token_count_compact,
)
from agent.account_usage import fetch_account_usage, render_account_usage_lines
# NOTE: `from agent.account_usage import ...` is deliberately NOT at module
# top — it transitively pulls the OpenAI SDK chain (~230 ms cold) and is only
# needed when the user runs `/limits`. Lazy-imported inside the handler below.
from hermes_cli.banner import _format_context_length, format_banner_version_label
_COMMAND_SPINNER_FRAMES = ("", "", "", "", "", "", "", "", "", "")
@@ -78,6 +80,11 @@ _COMMAND_SPINNER_FRAMES = ("⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧
# Load .env from ~/.hermes/.env first, then project root as dev fallback.
# User-managed env files should override stale shell exports on restart.
from hermes_constants import get_hermes_home, display_hermes_home
from hermes_cli.browser_connect import (
DEFAULT_BROWSER_CDP_URL,
manual_chrome_debug_command,
try_launch_chrome_debug,
)
from hermes_cli.env_loader import load_hermes_dotenv
from utils import base_url_host_matches
@@ -238,65 +245,6 @@ def _parse_service_tier_config(raw: str) -> str | None:
logger.warning("Unknown service_tier '%s', ignoring", raw)
return None
def _get_chrome_debug_candidates(system: str) -> list[str]:
"""Return likely browser executables for local CDP auto-launch."""
candidates: list[str] = []
seen: set[str] = set()
def _add_candidate(path: str | None) -> None:
if not path:
return
normalized = os.path.normcase(os.path.normpath(path))
if normalized in seen:
return
if os.path.isfile(path):
candidates.append(path)
seen.add(normalized)
def _add_from_path(*names: str) -> None:
for name in names:
_add_candidate(shutil.which(name))
if system == "Darwin":
for app in (
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
"/Applications/Chromium.app/Contents/MacOS/Chromium",
"/Applications/Brave Browser.app/Contents/MacOS/Brave Browser",
"/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge",
):
_add_candidate(app)
elif system == "Windows":
_add_from_path(
"chrome.exe", "msedge.exe", "brave.exe", "chromium.exe",
"chrome", "msedge", "brave", "chromium",
)
for base in (
os.environ.get("ProgramFiles"),
os.environ.get("ProgramFiles(x86)"),
os.environ.get("LOCALAPPDATA"),
):
if not base:
continue
for parts in (
("Google", "Chrome", "Application", "chrome.exe"),
("Chromium", "Application", "chrome.exe"),
("Chromium", "Application", "chromium.exe"),
("BraveSoftware", "Brave-Browser", "Application", "brave.exe"),
("Microsoft", "Edge", "Application", "msedge.exe"),
):
_add_candidate(os.path.join(base, *parts))
else:
_add_from_path(
"google-chrome", "google-chrome-stable", "chromium-browser",
"chromium", "brave-browser", "microsoft-edge",
)
return candidates
def load_cli_config() -> Dict[str, Any]:
"""
Load CLI configuration from config files.
@@ -549,18 +497,20 @@ def load_cli_config() -> Dict[str, Any]:
"singularity_image": "TERMINAL_SINGULARITY_IMAGE",
"modal_image": "TERMINAL_MODAL_IMAGE",
"daytona_image": "TERMINAL_DAYTONA_IMAGE",
"vercel_runtime": "TERMINAL_VERCEL_RUNTIME",
# SSH config
"ssh_host": "TERMINAL_SSH_HOST",
"ssh_user": "TERMINAL_SSH_USER",
"ssh_port": "TERMINAL_SSH_PORT",
"ssh_key": "TERMINAL_SSH_KEY",
# Container resource config (docker, singularity, modal, daytona -- ignored for local/ssh)
# Container resource config (docker, singularity, modal, daytona, vercel_sandbox -- ignored for local/ssh)
"container_cpu": "TERMINAL_CONTAINER_CPU",
"container_memory": "TERMINAL_CONTAINER_MEMORY",
"container_disk": "TERMINAL_CONTAINER_DISK",
"container_persistent": "TERMINAL_CONTAINER_PERSISTENT",
"docker_volumes": "TERMINAL_DOCKER_VOLUMES",
"docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
"docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
"sandbox_dir": "TERMINAL_SANDBOX_DIR",
# Persistent shell (non-local backends)
"persistent_shell": "TERMINAL_PERSISTENT_SHELL",
@@ -2169,6 +2119,11 @@ class HermesCLI:
self._pending_input = queue.Queue()
self._interrupt_queue = queue.Queue()
self._should_exit = False
# /exit --delete: when True, the current session's SQLite history and
# on-disk transcripts are deleted during shutdown. Set by
# process_command() when the user runs /exit --delete or /quit --delete.
# Ported from google-gemini/gemini-cli#19332.
self._delete_session_on_exit = False
self._last_ctrl_c_time = 0
self._clarify_state = None
self._clarify_freetext = False
@@ -4860,6 +4815,22 @@ class HermesCLI:
)
except Exception:
pass
# Notify memory providers that session_id rotated to a fresh
# conversation. reset=True signals providers to flush accumulated
# per-session state (_session_turns, _turn_counter, _document_id).
# Fires BEFORE the plugin on_session_reset hook (shell hooks only
# see the new id; Python providers see the transition). See #6672.
try:
_mm = getattr(self.agent, "_memory_manager", None)
if _mm is not None:
_mm.on_session_switch(
self.session_id,
parent_session_id=old_session_id or "",
reset=True,
reason="new_session",
)
except Exception:
pass
self._notify_session_boundary("on_session_reset")
if not silent:
@@ -4912,6 +4883,7 @@ class HermesCLI:
_cprint(" Already on that session.")
return
old_session_id = self.session_id
# End current session
try:
self._session_db.end_session(self.session_id, "resumed_other")
@@ -4949,6 +4921,22 @@ class HermesCLI:
if hasattr(self.agent, "_invalidate_system_prompt"):
self.agent._invalidate_system_prompt()
# Notify memory providers that session_id rotated to a resumed
# session. reset=False — the provider's accumulated state is
# still valid; it just needs to target the new session_id for
# subsequent writes. See #6672.
try:
_mm = getattr(self.agent, "_memory_manager", None)
if _mm is not None:
_mm.on_session_switch(
target_id,
parent_session_id=old_session_id or "",
reset=False,
reason="resume",
)
except Exception:
pass
title_part = f" \"{session_meta['title']}\"" if session_meta.get("title") else ""
msg_count = len([m for m in self.conversation_history if m.get("role") == "user"])
if self.conversation_history:
@@ -5069,6 +5057,22 @@ class HermesCLI:
if hasattr(self.agent, "_invalidate_system_prompt"):
self.agent._invalidate_system_prompt()
# Notify memory providers that session_id forked to a new branch.
# reset=False — the branched session carries the transcript
# forward, so provider state tracks the lineage. parent_session_id
# links the branch back to the original. See #6672.
try:
_mm = getattr(self.agent, "_memory_manager", None)
if _mm is not None:
_mm.on_session_switch(
new_session_id,
parent_session_id=parent_session_id or "",
reset=False,
reason="branch",
)
except Exception:
pass
msg_count = len([m for m in self.conversation_history if m.get("role") == "user"])
_cprint(
f" ⑂ Branched session \"{branch_title}\""
@@ -5457,6 +5461,8 @@ class HermesCLI:
try:
providers = list_authenticated_providers(
current_provider=self.provider or "",
current_base_url=self.base_url or "",
current_model=self.model or "",
user_providers=user_provs,
custom_providers=custom_provs,
max_models=50,
@@ -5975,7 +5981,29 @@ class HermesCLI:
print(f"(._.) Unknown cron command: {subcommand}")
print(" Available: list, add, edit, pause, resume, run, remove")
def _handle_curator_command(self, cmd: str):
"""Handle /curator slash command.
Delegates to hermes_cli.curator so the CLI and the `hermes curator`
subcommand share the same handler set.
"""
import shlex
tokens = shlex.split(cmd)[1:] if cmd else []
if not tokens:
tokens = ["status"]
try:
from hermes_cli.curator import cli_main
cli_main(tokens)
except SystemExit:
# argparse calls sys.exit() on --help or errors; swallow so we
# don't kill the interactive session.
pass
except Exception as exc:
print(f"(._.) curator: {exc}")
def _handle_skills_command(self, cmd: str):
"""Handle /skills slash command — delegates to hermes_cli.skills_hub."""
from hermes_cli.skills_hub import handle_skills_slash
@@ -6060,6 +6088,16 @@ class HermesCLI:
canonical = _cmd_def.name if _cmd_def else _base_word
if canonical in ("quit", "exit", "q"):
# Parse --delete flag: /exit --delete also removes the current
# session's transcripts + SQLite history. Ported from
# google-gemini/gemini-cli#19332.
_rest = cmd_original.split(None, 1)
_args = (_rest[1] if len(_rest) > 1 else "").strip().lower()
if _args in ("--delete", "-d"):
self._delete_session_on_exit = True
elif _args:
_cprint(f" {_DIM}✗ Unknown argument: {_escape(_args)}. Use /exit --delete to also remove session history.{_RST}")
return True
return False
elif canonical == "help":
self.show_help()
@@ -6219,6 +6257,8 @@ class HermesCLI:
self.save_conversation()
elif canonical == "cron":
self._handle_cron_command(cmd_original)
elif canonical == "curator":
self._handle_curator_command(cmd_original)
elif canonical == "skills":
with self._busy_command(self._slow_command_status(cmd_original)):
self._handle_skills_command(cmd_original)
@@ -6232,6 +6272,8 @@ class HermesCLI:
self._console_print(f" Status bar {state}")
elif canonical == "verbose":
self._toggle_verbose()
elif canonical == "footer":
self._handle_footer_command(cmd_original)
elif canonical == "yolo":
self._toggle_yolo()
elif canonical == "reasoning":
@@ -6600,34 +6642,7 @@ class HermesCLI:
Returns True if a launch command was executed (doesn't guarantee success).
"""
import subprocess as _sp
candidates = _get_chrome_debug_candidates(system)
if not candidates:
return False
# Dedicated profile dir so debug Chrome won't collide with normal Chrome
data_dir = str(_hermes_home / "chrome-debug")
os.makedirs(data_dir, exist_ok=True)
chrome = candidates[0]
try:
_sp.Popen(
[
chrome,
f"--remote-debugging-port={port}",
f"--user-data-dir={data_dir}",
"--no-first-run",
"--no-default-browser-check",
],
stdout=_sp.DEVNULL,
stderr=_sp.DEVNULL,
start_new_session=True, # detach from terminal
)
return True
except Exception:
return False
return try_launch_chrome_debug(port, system)
def _handle_browser_command(self, cmd: str):
"""Handle /browser connect|disconnect|status — manage live Chrome CDP connection."""
@@ -6636,13 +6651,44 @@ class HermesCLI:
parts = cmd.strip().split(None, 1)
sub = parts[1].lower().strip() if len(parts) > 1 else "status"
_DEFAULT_CDP = "http://127.0.0.1:9222"
_DEFAULT_CDP = DEFAULT_BROWSER_CDP_URL
current = os.environ.get("BROWSER_CDP_URL", "").strip()
if sub.startswith("connect"):
# Optionally accept a custom CDP URL: /browser connect ws://host:port
connect_parts = cmd.strip().split(None, 2) # ["/browser", "connect", "ws://..."]
cdp_url = connect_parts[2].strip() if len(connect_parts) > 2 else _DEFAULT_CDP
parsed_cdp = urlparse(cdp_url if "://" in cdp_url else f"http://{cdp_url}")
if parsed_cdp.scheme not in {"http", "https", "ws", "wss"}:
print()
print(
f" ⚠ Unsupported browser url scheme: {parsed_cdp.scheme or '(missing)'} "
"(expected one of: http, https, ws, wss)"
)
print()
return
try:
_port = parsed_cdp.port or (443 if parsed_cdp.scheme in {"https", "wss"} else 80)
except ValueError:
print()
print(f" ⚠ Invalid port in browser url: {cdp_url}")
print()
return
if not parsed_cdp.hostname:
print()
print(f" ⚠ Missing host in browser url: {cdp_url}")
print()
return
_host = parsed_cdp.hostname
if parsed_cdp.path.startswith("/devtools/browser/"):
cdp_url = parsed_cdp.geturl()
else:
cdp_url = parsed_cdp._replace(
path="",
params="",
query="",
fragment="",
).geturl()
# Clear any existing browser sessions so the next tool call uses the new backend
try:
@@ -6653,20 +6699,13 @@ class HermesCLI:
print()
# Extract port for connectivity checks
_port = 9222
try:
_port = int(cdp_url.rsplit(":", 1)[-1].split("/")[0])
except (ValueError, IndexError):
pass
# Check if Chrome is already listening on the debug port
import socket
_already_open = False
try:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(1)
s.connect(("127.0.0.1", _port))
s.connect((_host, _port))
s.close()
_already_open = True
except (OSError, socket.timeout):
@@ -6684,7 +6723,7 @@ class HermesCLI:
try:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(1)
s.connect(("127.0.0.1", _port))
s.connect((_host, _port))
s.close()
_already_open = True
break
@@ -6697,33 +6736,22 @@ class HermesCLI:
print(" Try again in a few seconds — the debug instance may still be starting")
else:
print(" ⚠ Could not auto-launch Chrome")
# Show manual instructions as fallback
_data_dir = str(_hermes_home / "chrome-debug")
sys_name = _plat.system()
if sys_name == "Darwin":
chrome_cmd = (
'open -a "Google Chrome" --args'
f" --remote-debugging-port=9222"
f' --user-data-dir="{_data_dir}"'
" --no-first-run --no-default-browser-check"
)
elif sys_name == "Windows":
chrome_cmd = (
f'chrome.exe --remote-debugging-port=9222'
f' --user-data-dir="{_data_dir}"'
f" --no-first-run --no-default-browser-check"
)
chrome_cmd = manual_chrome_debug_command(_port, sys_name)
if chrome_cmd:
print(f" Launch Chrome manually:")
print(f" {chrome_cmd}")
else:
chrome_cmd = (
f"google-chrome --remote-debugging-port=9222"
f' --user-data-dir="{_data_dir}"'
f" --no-first-run --no-default-browser-check"
)
print(f" Launch Chrome manually:")
print(f" {chrome_cmd}")
print(" No Chrome/Chromium executable found in this environment")
else:
print(f" ⚠ Port {_port} is not reachable at {cdp_url}")
if not _already_open:
print()
print("Browser not connected — start Chrome with remote debugging and retry /browser connect")
print()
return
os.environ["BROWSER_CDP_URL"] = cdp_url
# Eagerly start the CDP supervisor so pending_dialogs + frame_tree
# show up in the next browser_snapshot. No-op if already started.
@@ -6859,6 +6887,58 @@ class HermesCLI:
if self._apply_tui_skin_style():
print(" Prompt + TUI colors updated.")
def _handle_footer_command(self, cmd_original: str) -> None:
"""Toggle or inspect ``display.runtime_footer.enabled`` from the CLI.
Usage:
/footer toggle
/footer on|off explicit
/footer status show current state
"""
from hermes_cli.config import load_config
from hermes_cli.colors import Colors as _Colors
# Parse arg
arg = ""
try:
parts = (cmd_original or "").strip().split(None, 1)
if len(parts) > 1:
arg = parts[1].strip().lower()
except Exception:
arg = ""
cfg = load_config() or {}
footer_cfg = ((cfg.get("display") or {}).get("runtime_footer") or {})
current = bool(footer_cfg.get("enabled", False))
fields = footer_cfg.get("fields") or ["model", "context_pct", "cwd"]
if arg in ("status", "?"):
state = "ON" if current else "OFF"
_cprint(
f" {_Colors.BOLD}Runtime footer:{_Colors.RESET} {state}\n"
f" Fields: {', '.join(fields)}"
)
return
if arg in ("on", "enable", "true", "1"):
new_state = True
elif arg in ("off", "disable", "false", "0"):
new_state = False
elif arg == "":
new_state = not current
else:
_cprint(" Usage: /footer [on|off|status]")
return
if save_config_value("display.runtime_footer.enabled", new_state):
state = (
f"{_Colors.GREEN}ON{_Colors.RESET}" if new_state
else f"{_Colors.DIM}OFF{_Colors.RESET}"
)
_cprint(f" Runtime footer: {state}")
else:
_cprint(" Failed to save runtime_footer setting to config.yaml")
def _toggle_verbose(self):
"""Cycle tool progress mode: off → new → all → verbose → off."""
cycle = ["off", "new", "all", "verbose"]
@@ -7099,9 +7179,15 @@ class HermesCLI:
else:
print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens)...")
# Pass None as system_message so _compress_context rebuilds
# the system prompt from scratch via _build_system_prompt(None).
# Passing _cached_system_prompt caused duplication because
# _build_system_prompt appends system_message to prompt_parts
# which already contain the agent identity — resulting in the
# identity block appearing twice (issue #15281).
compressed, _ = self.agent._compress_context(
original_history,
self.agent._cached_system_prompt or "",
None,
approx_tokens=approx_tokens,
focus_topic=focus_topic or None,
)
@@ -7225,6 +7311,8 @@ class HermesCLI:
provider = getattr(agent, "provider", None) or getattr(self, "provider", None)
base_url = getattr(agent, "base_url", None) or getattr(self, "base_url", None)
api_key = getattr(agent, "api_key", None) or getattr(self, "api_key", None)
# Lazy import — pulls the OpenAI SDK chain, only needed here.
from agent.account_usage import fetch_account_usage, render_account_usage_lines
account_snapshot = None
if provider:
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as _pool:
@@ -8814,6 +8902,13 @@ class HermesCLI:
response,
self.conversation_history,
failure_callback=_title_failure_cb,
main_runtime={
"model": self.model,
"provider": self.provider,
"base_url": self.base_url,
"api_key": self.api_key,
"api_mode": self.api_mode,
},
)
except Exception:
pass
@@ -9271,6 +9366,21 @@ class HermesCLI:
self._console_print(f"[dim {_tip_color}]✦ Tip: {_tip}[/]")
except Exception:
pass # Tips are non-critical — never break startup
# Curator — kick off a background skill-maintenance pass on startup
# if the schedule says we're due. Runs in a daemon thread so it
# never blocks the interactive loop. Best-effort; any failure is
# swallowed to avoid breaking session startup.
try:
from agent.curator import maybe_run_curator
maybe_run_curator(
idle_for_seconds=float("inf"), # CLI startup = fully idle
on_summary=lambda msg: self._console_print(
f"[dim #6b7684]💾 {msg}[/]"
),
)
except Exception:
pass
if self.preloaded_skills and not self._startup_skills_line_shown:
skills_label = ", ".join(self.preloaded_skills)
self._console_print(
@@ -11079,6 +11189,19 @@ class HermesCLI:
self._session_db.end_session(self.agent.session_id, "cli_close")
except (Exception, KeyboardInterrupt) as e:
logger.debug("Could not close session in DB: %s", e)
# /exit --delete: also remove the current session's transcripts
# and SQLite history. Ported from google-gemini/gemini-cli#19332.
if getattr(self, '_delete_session_on_exit', False):
try:
from hermes_constants import get_hermes_home as _ghh
_sessions_dir = _ghh() / "sessions"
_sid = self.agent.session_id
if self._session_db.delete_session(_sid, sessions_dir=_sessions_dir):
_cprint(f" {_DIM}✓ Session {_escape(_sid)} deleted{_RST}")
else:
_cprint(f" {_DIM}✗ Session {_escape(_sid)} not found for deletion{_RST}")
except (Exception, KeyboardInterrupt) as e:
logger.debug("Could not delete session on exit: %s", e)
# Plugin hook: on_session_end — safety net for interrupted exits.
# run_conversation() already fires this per-turn on normal completion,
# so only fire here if the agent was mid-turn (_agent_running) when
+15 -6
View File
@@ -21,6 +21,7 @@ from typing import Optional, Dict, List, Any, Union
logger = logging.getLogger(__name__)
from hermes_time import now as _hermes_now
from utils import atomic_replace
try:
from croniter import croniter
@@ -312,13 +313,21 @@ def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None
elif schedule["kind"] == "cron":
if not HAS_CRONITER:
logger.warning(
"Cannot compute next run for cron schedule %r: 'croniter' "
"is not installed. Install the 'cron' extra (pip install "
"'hermes-agent[cron]') to re-enable recurring cron jobs.",
"Cannot compute next run for cron schedule %r: 'croniter' is "
"not installed. croniter is a core dependency as of v0.9.x; "
"reinstall hermes-agent or run 'pip install croniter' in your "
"runtime env.",
schedule.get("expr"),
)
return None
cron = croniter(schedule["expr"], now)
# Use last_run_at as the croniter base when available, consistent
# with interval jobs. This ensures that after a crash/restart,
# the next run is anchored to the actual last execution time
# rather than to an arbitrary restart time.
base_time = now
if last_run_at:
base_time = _ensure_aware(datetime.fromisoformat(last_run_at))
cron = croniter(schedule["expr"], base_time)
next_run = cron.get_next(datetime)
return next_run.isoformat()
@@ -367,7 +376,7 @@ def save_jobs(jobs: List[Dict[str, Any]]):
json.dump({"jobs": jobs, "updated_at": _hermes_now().isoformat()}, f, indent=2)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, JOBS_FILE)
atomic_replace(tmp_path, JOBS_FILE)
_secure_file(JOBS_FILE)
except BaseException:
try:
@@ -863,7 +872,7 @@ def save_job_output(job_id: str, output: str):
f.write(output)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, output_file)
atomic_replace(tmp_path, output_file)
_secure_file(output_file)
except BaseException:
try:
+42 -7
View File
@@ -198,7 +198,9 @@ def _resolve_single_delivery_target(job: dict, deliver_value: str) -> Optional[d
if resolved:
parsed_chat_id, parsed_thread_id, resolved_is_explicit = _parse_target_ref(platform_key, resolved)
if resolved_is_explicit:
chat_id, thread_id = parsed_chat_id, parsed_thread_id
chat_id = parsed_chat_id
if parsed_thread_id is not None:
thread_id = parsed_thread_id
else:
chat_id = resolved
except Exception:
@@ -231,12 +233,32 @@ def _resolve_single_delivery_target(job: dict, deliver_value: str) -> Optional[d
}
def _normalize_deliver_value(deliver) -> str:
"""Normalize a stored/submitted ``deliver`` value to its canonical string form.
The contract is that ``deliver`` is a string (``"local"``, ``"origin"``,
``"telegram"``, ``"telegram:-1001:17"``, or comma-separated combinations).
Historically some callers MCP clients passing an array, direct edits of
``jobs.json``, or stale code paths have stored a list/tuple like
``["telegram"]``. ``str(["telegram"])`` would serialize to the literal
string ``"['telegram']"``, which is not a known platform and fails
resolution silently. Flatten lists/tuples into a comma-separated string
so both forms work. Returns ``"local"`` for anything falsy.
"""
if deliver is None or deliver == "":
return "local"
if isinstance(deliver, (list, tuple)):
parts = [str(p).strip() for p in deliver if str(p).strip()]
return ",".join(parts) if parts else "local"
return str(deliver)
def _resolve_delivery_targets(job: dict) -> List[dict]:
"""Resolve all concrete auto-delivery targets for a cron job (supports comma-separated deliver)."""
deliver = job.get("deliver", "local")
deliver = _normalize_deliver_value(job.get("deliver", "local"))
if deliver == "local":
return []
parts = [p.strip() for p in str(deliver).split(",") if p.strip()]
parts = [p.strip() for p in deliver.split(",") if p.strip()]
seen = set()
targets = []
for part in parts:
@@ -1011,10 +1033,12 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
enabled_toolsets=_resolve_cron_enabled_toolsets(job, _cfg),
disabled_toolsets=["cronjob", "messaging", "clarify"],
quiet_mode=True,
# When a workdir is configured, inject AGENTS.md / CLAUDE.md /
# .cursorrules from that directory; otherwise preserve the old
# behaviour (don't inject SOUL.md/AGENTS.md from the scheduler cwd).
# Cron jobs should always inherit the user's SOUL.md identity from
# HERMES_HOME. When a workdir is configured, also inject project
# context files (AGENTS.md / CLAUDE.md / .cursorrules) from there.
# Without a workdir, keep cwd context discovery disabled.
skip_context_files=not bool(_job_workdir),
load_soul_identity=True,
skip_memory=True, # Cron system prompts would corrupt user representations
platform="cron",
session_id=_cron_session_id,
@@ -1029,7 +1053,18 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
#
# Uses the agent's built-in activity tracker (updated by
# _touch_activity() on every tool call, API call, and stream delta).
_cron_timeout = float(os.getenv("HERMES_CRON_TIMEOUT", 600))
_raw_cron_timeout = os.getenv("HERMES_CRON_TIMEOUT", "").strip()
if _raw_cron_timeout:
try:
_cron_timeout = float(_raw_cron_timeout)
except (ValueError, TypeError):
logger.warning(
"Invalid HERMES_CRON_TIMEOUT=%r; using default 600s",
_raw_cron_timeout,
)
_cron_timeout = 600.0
else:
_cron_timeout = 600.0
_cron_inactivity_limit = _cron_timeout if _cron_timeout > 0 else None
_POLL_INTERVAL = 5.0
_cron_pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
-85
View File
@@ -1,85 +0,0 @@
"""Built-in boot-md hook — run ~/.hermes/BOOT.md on gateway startup.
This hook is always registered. It silently skips if no BOOT.md exists.
To activate, create ``~/.hermes/BOOT.md`` with instructions for the
agent to execute on every gateway restart.
Example BOOT.md::
# Startup Checklist
1. Check if any cron jobs failed overnight
2. Send a status update to Discord #general
3. If there are errors in /opt/app/deploy.log, summarize them
The agent runs in a background thread so it doesn't block gateway
startup. If nothing needs attention, it replies with [SILENT] to
suppress delivery.
"""
import logging
import threading
logger = logging.getLogger("hooks.boot-md")
from hermes_constants import get_hermes_home
HERMES_HOME = get_hermes_home()
BOOT_FILE = HERMES_HOME / "BOOT.md"
def _build_boot_prompt(content: str) -> str:
"""Wrap BOOT.md content in a system-level instruction."""
return (
"You are running a startup boot checklist. Follow the BOOT.md "
"instructions below exactly.\n\n"
"---\n"
f"{content}\n"
"---\n\n"
"Execute each instruction. If you need to send a message to a "
"platform, use the send_message tool.\n"
"If nothing needs attention and there is nothing to report, "
"reply with ONLY: [SILENT]"
)
def _run_boot_agent(content: str) -> None:
"""Spawn a one-shot agent session to execute the boot instructions."""
try:
from run_agent import AIAgent
prompt = _build_boot_prompt(content)
agent = AIAgent(
quiet_mode=True,
skip_context_files=True,
skip_memory=True,
max_iterations=20,
)
result = agent.run_conversation(prompt)
response = result.get("final_response", "")
if response and "[SILENT]" not in response:
logger.info("boot-md completed: %s", response[:200])
else:
logger.info("boot-md completed (nothing to report)")
except Exception as e:
logger.error("boot-md agent failed: %s", e)
async def handle(event_type: str, context: dict) -> None:
"""Gateway startup handler — run BOOT.md if it exists."""
if not BOOT_FILE.exists():
return
content = BOOT_FILE.read_text(encoding="utf-8").strip()
if not content:
return
logger.info("Running BOOT.md (%d chars)", len(content))
# Run in a background thread so we don't block gateway startup.
thread = threading.Thread(
target=_run_boot_agent,
args=(content,),
name="boot-md",
daemon=True,
)
thread.start()
+6 -12
View File
@@ -52,19 +52,13 @@ class HookRegistry:
return list(self._loaded_hooks)
def _register_builtin_hooks(self) -> None:
"""Register built-in hooks that are always active."""
try:
from gateway.builtin_hooks.boot_md import handle as boot_md_handle
"""Register built-in hooks that are always active.
self._handlers.setdefault("gateway:startup", []).append(boot_md_handle)
self._loaded_hooks.append({
"name": "boot-md",
"description": "Run ~/.hermes/BOOT.md on gateway startup",
"events": ["gateway:startup"],
"path": "(builtin)",
})
except Exception as e:
print(f"[hooks] Could not load built-in boot-md hook: {e}", flush=True)
Currently empty no shipped built-in hooks. Kept as the extension
point for future always-on gateway hooks so they drop in without
re-plumbing discover_and_load().
"""
return
def discover_and_load(self) -> None:
"""
+2 -1
View File
@@ -28,6 +28,7 @@ from pathlib import Path
from typing import Optional
from hermes_constants import get_hermes_dir
from utils import atomic_replace
# Unambiguous alphabet -- excludes 0/O, 1/I to prevent confusion
@@ -59,7 +60,7 @@ def _secure_write(path: Path, data: str) -> None:
f.write(data)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, str(path))
atomic_replace(tmp_path, path)
try:
os.chmod(path, 0o600)
except OSError:
+207 -47
View File
@@ -7,7 +7,9 @@ Exposes an HTTP server with endpoints:
- GET /v1/responses/{response_id} Retrieve a stored response
- DELETE /v1/responses/{response_id} Delete a stored response
- GET /v1/models lists hermes-agent as an available model
- GET /v1/capabilities machine-readable API capabilities for external UIs
- POST /v1/runs start a run, returns run_id immediately (202)
- GET /v1/runs/{run_id} retrieve current run status
- GET /v1/runs/{run_id}/events SSE stream of structured lifecycle events
- POST /v1/runs/{run_id}/stop interrupt a running agent
- GET /health health check
@@ -590,6 +592,8 @@ class APIServerAdapter(BasePlatformAdapter):
# Active run agent/task references for stop support
self._active_run_agents: Dict[str, Any] = {}
self._active_run_tasks: Dict[str, "asyncio.Task"] = {}
# Pollable run status for dashboards and external control-plane UIs.
self._run_statuses: Dict[str, Dict[str, Any]] = {}
self._session_db: Optional[Any] = None # Lazy-init SessionDB for session continuity
@staticmethod
@@ -808,6 +812,51 @@ class APIServerAdapter(BasePlatformAdapter):
],
})
async def _handle_capabilities(self, request: "web.Request") -> "web.Response":
"""GET /v1/capabilities — advertise the stable API surface.
External UIs and orchestrators use this endpoint to discover the API
server's plugin-safe contract without scraping docs or assuming that
every Hermes version exposes the same endpoints.
"""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
return web.json_response({
"object": "hermes.api_server.capabilities",
"platform": "hermes-agent",
"model": self._model_name,
"auth": {
"type": "bearer",
"required": bool(self._api_key),
},
"features": {
"chat_completions": True,
"chat_completions_streaming": True,
"responses_api": True,
"responses_streaming": True,
"run_submission": True,
"run_status": True,
"run_events_sse": True,
"run_stop": True,
"tool_progress_events": True,
"session_continuity_header": "X-Hermes-Session-Id",
"cors": bool(self._cors_origins),
},
"endpoints": {
"health": {"method": "GET", "path": "/health"},
"health_detailed": {"method": "GET", "path": "/health/detailed"},
"models": {"method": "GET", "path": "/v1/models"},
"chat_completions": {"method": "POST", "path": "/v1/chat/completions"},
"responses": {"method": "POST", "path": "/v1/responses"},
"runs": {"method": "POST", "path": "/v1/runs"},
"run_status": {"method": "GET", "path": "/v1/runs/{run_id}"},
"run_events": {"method": "GET", "path": "/v1/runs/{run_id}/events"},
"run_stop": {"method": "POST", "path": "/v1/runs/{run_id}/stop"},
},
})
async def _handle_chat_completions(self, request: "web.Request") -> "web.Response":
"""POST /v1/chat/completions — OpenAI Chat Completions format."""
auth_err = self._check_auth(request)
@@ -932,39 +981,62 @@ class APIServerAdapter(BasePlatformAdapter):
if delta is not None:
_stream_q.put(delta)
def _on_tool_progress(event_type, name, preview, args, **kwargs):
"""Send tool progress as a separate SSE event.
# Track which tool_call_ids we've emitted a "running" lifecycle
# event for, so a "completed" event without a matching "running"
# (e.g. internal/filtered tools) is silently dropped instead of
# producing an orphaned event clients can't correlate.
_started_tool_call_ids: set[str] = set()
Previously, progress markers like `` list`` were injected
directly into ``delta.content``. OpenAI-compatible frontends
(Open WebUI, LobeChat, ) store ``delta.content`` verbatim as
the assistant message and send it back on subsequent requests.
After enough turns the model learns to *emit* the markers as
plain text instead of issuing real tool calls silently
hallucinating tool results. See #6972.
def _on_tool_start(tool_call_id, function_name, function_args):
"""Emit ``hermes.tool.progress`` with ``status: running``.
The fix: push a tagged tuple ``("__tool_progress__", payload)``
onto the stream queue. The SSE writer emits it as a custom
``event: hermes.tool.progress`` line that compliant frontends
can render for UX but will *not* persist into conversation
history. Clients that don't understand the custom event type
silently ignore it per the SSE specification.
Replaces the old ``tool_progress_callback("tool.started",
...)`` emit so SSE consumers receive a single event per
tool start, carrying both the legacy ``tool``/``emoji``/
``label`` payload (for #6972 frontends) and the new
``toolCallId``/``status`` correlation fields (#16588).
Skips tools whose names start with ``_`` so internal
events (``_thinking``, ) stay off the wire matching
the prior ``_on_tool_progress`` filter exactly.
"""
if event_type != "tool.started":
if not tool_call_id or function_name.startswith("_"):
return
if name.startswith("_"):
return
from agent.display import get_tool_emoji
emoji = get_tool_emoji(name)
label = preview or name
_started_tool_call_ids.add(tool_call_id)
from agent.display import build_tool_preview, get_tool_emoji
label = build_tool_preview(function_name, function_args) or function_name
_stream_q.put(("__tool_progress__", {
"tool": name,
"emoji": emoji,
"tool": function_name,
"emoji": get_tool_emoji(function_name),
"label": label,
"toolCallId": tool_call_id,
"status": "running",
}))
def _on_tool_complete(tool_call_id, function_name, function_args, function_result):
"""Emit the matching ``status: completed`` event.
Dropped if the start was filtered (internal tool, missing
id, or never seen) so clients never get an orphaned
``completed`` they can't correlate to a prior ``running``.
"""
if not tool_call_id or tool_call_id not in _started_tool_call_ids:
return
_started_tool_call_ids.discard(tool_call_id)
_stream_q.put(("__tool_progress__", {
"tool": function_name,
"toolCallId": tool_call_id,
"status": "completed",
}))
# Start agent in background. agent_ref is a mutable container
# so the SSE writer can interrupt the agent on client disconnect.
#
# ``tool_progress_callback`` is intentionally not wired here:
# it would duplicate every emit because ``run_agent`` fires it
# side-by-side with ``tool_start_callback``/``tool_complete_callback``.
# The structured callbacks are strictly richer (they carry the
# tool_call id), so they own the chat-completions SSE channel.
agent_ref = [None]
agent_task = asyncio.ensure_future(self._run_agent(
user_message=user_message,
@@ -972,7 +1044,8 @@ class APIServerAdapter(BasePlatformAdapter):
ephemeral_system_prompt=system_prompt,
session_id=session_id,
stream_delta_callback=_on_delta,
tool_progress_callback=_on_tool_progress,
tool_start_callback=_on_tool_start,
tool_complete_callback=_on_tool_complete,
agent_ref=agent_ref,
))
@@ -1087,7 +1160,8 @@ class APIServerAdapter(BasePlatformAdapter):
Tagged tuples ``("__tool_progress__", payload)`` are sent
as a custom ``event: hermes.tool.progress`` SSE event so
frontends can display them without storing the markers in
conversation history. See #6972.
conversation history. See #6972 for the original event,
#16588 for the ``toolCallId``/``status`` lifecycle fields.
"""
if isinstance(item, tuple) and len(item) == 2 and item[0] == "__tool_progress__":
event_data = json.dumps(item[1])
@@ -2297,10 +2371,31 @@ class APIServerAdapter(BasePlatformAdapter):
_MAX_CONCURRENT_RUNS = 10 # Prevent unbounded resource allocation
_RUN_STREAM_TTL = 300 # seconds before orphaned runs are swept
_RUN_STATUS_TTL = 3600 # seconds to retain terminal run status for polling
def _set_run_status(self, run_id: str, status: str, **fields: Any) -> Dict[str, Any]:
"""Update pollable run status without exposing private agent objects."""
now = time.time()
current = self._run_statuses.get(run_id, {})
current.update({
"object": "hermes.run",
"run_id": run_id,
"status": status,
"updated_at": now,
})
current.setdefault("created_at", fields.pop("created_at", now))
current.update(fields)
self._run_statuses[run_id] = current
return current
def _make_run_event_callback(self, run_id: str, loop: "asyncio.AbstractEventLoop"):
"""Return a tool_progress_callback that pushes structured events to the run's SSE queue."""
def _push(event: Dict[str, Any]) -> None:
self._set_run_status(
run_id,
self._run_statuses.get(run_id, {}).get("status", "running"),
last_event=event.get("event"),
)
q = self._run_streams.get(run_id)
if q is None:
return
@@ -2365,28 +2460,6 @@ class APIServerAdapter(BasePlatformAdapter):
if not user_message:
return web.json_response(_openai_error("No user message found in input"), status=400)
run_id = f"run_{uuid.uuid4().hex}"
loop = asyncio.get_running_loop()
q: "asyncio.Queue[Optional[Dict]]" = asyncio.Queue()
self._run_streams[run_id] = q
self._run_streams_created[run_id] = time.time()
event_cb = self._make_run_event_callback(run_id, loop)
# Also wire stream_delta_callback so message.delta events flow through
def _text_cb(delta: Optional[str]) -> None:
if delta is None:
return
try:
loop.call_soon_threadsafe(q.put_nowait, {
"event": "message.delta",
"run_id": run_id,
"timestamp": time.time(),
"delta": delta,
})
except Exception:
pass
instructions = body.get("instructions")
previous_response_id = body.get("previous_response_id")
@@ -2434,11 +2507,42 @@ class APIServerAdapter(BasePlatformAdapter):
)
conversation_history.append({"role": msg["role"], "content": str(content)})
run_id = f"run_{uuid.uuid4().hex}"
session_id = body.get("session_id") or stored_session_id or run_id
ephemeral_system_prompt = instructions
loop = asyncio.get_running_loop()
q: "asyncio.Queue[Optional[Dict]]" = asyncio.Queue()
created_at = time.time()
self._run_streams[run_id] = q
self._run_streams_created[run_id] = created_at
event_cb = self._make_run_event_callback(run_id, loop)
# Also wire stream_delta_callback so message.delta events flow through.
def _text_cb(delta: Optional[str]) -> None:
if delta is None:
return
try:
loop.call_soon_threadsafe(q.put_nowait, {
"event": "message.delta",
"run_id": run_id,
"timestamp": time.time(),
"delta": delta,
})
except Exception:
pass
self._set_run_status(
run_id,
"queued",
created_at=created_at,
session_id=session_id,
model=body.get("model", self._model_name),
)
async def _run_and_close():
try:
self._set_run_status(run_id, "running")
agent = self._create_agent(
ephemeral_system_prompt=ephemeral_system_prompt,
session_id=session_id,
@@ -2468,8 +2572,36 @@ class APIServerAdapter(BasePlatformAdapter):
"output": final_response,
"usage": usage,
})
self._set_run_status(
run_id,
"completed",
output=final_response,
usage=usage,
last_event="run.completed",
)
except asyncio.CancelledError:
self._set_run_status(
run_id,
"cancelled",
last_event="run.cancelled",
)
try:
q.put_nowait({
"event": "run.cancelled",
"run_id": run_id,
"timestamp": time.time(),
})
except Exception:
pass
raise
except Exception as exc:
logger.exception("[api_server] run %s failed", run_id)
self._set_run_status(
run_id,
"failed",
error=str(exc),
last_event="run.failed",
)
try:
q.put_nowait({
"event": "run.failed",
@@ -2499,6 +2631,21 @@ class APIServerAdapter(BasePlatformAdapter):
return web.json_response({"run_id": run_id, "status": "started"}, status=202)
async def _handle_get_run(self, request: "web.Request") -> "web.Response":
"""GET /v1/runs/{run_id} — return pollable run status for external UIs."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
run_id = request.match_info["run_id"]
status = self._run_statuses.get(run_id)
if status is None:
return web.json_response(
_openai_error(f"Run not found: {run_id}", code="run_not_found"),
status=404,
)
return web.json_response(status)
async def _handle_run_events(self, request: "web.Request") -> "web.StreamResponse":
"""GET /v1/runs/{run_id}/events — SSE stream of structured agent lifecycle events."""
auth_err = self._check_auth(request)
@@ -2561,6 +2708,8 @@ class APIServerAdapter(BasePlatformAdapter):
if agent is None and task is None:
return web.json_response(_openai_error(f"Run not found: {run_id}", code="run_not_found"), status=404)
self._set_run_status(run_id, "stopping", last_event="run.stopping")
if agent is not None:
try:
agent.interrupt("Stop requested via API")
@@ -2603,6 +2752,15 @@ class APIServerAdapter(BasePlatformAdapter):
self._active_run_agents.pop(run_id, None)
self._active_run_tasks.pop(run_id, None)
stale_statuses = [
run_id
for run_id, status in list(self._run_statuses.items())
if status.get("status") in {"completed", "failed", "cancelled"}
and now - float(status.get("updated_at", 0) or 0) > self._RUN_STATUS_TTL
]
for run_id in stale_statuses:
self._run_statuses.pop(run_id, None)
# ------------------------------------------------------------------
# BasePlatformAdapter interface
# ------------------------------------------------------------------
@@ -2621,6 +2779,7 @@ class APIServerAdapter(BasePlatformAdapter):
self._app.router.add_get("/health/detailed", self._handle_health_detailed)
self._app.router.add_get("/v1/health", self._handle_health)
self._app.router.add_get("/v1/models", self._handle_models)
self._app.router.add_get("/v1/capabilities", self._handle_capabilities)
self._app.router.add_post("/v1/chat/completions", self._handle_chat_completions)
self._app.router.add_post("/v1/responses", self._handle_responses)
self._app.router.add_get("/v1/responses/{response_id}", self._handle_get_response)
@@ -2636,6 +2795,7 @@ class APIServerAdapter(BasePlatformAdapter):
self._app.router.add_post("/api/jobs/{job_id}/run", self._handle_run_job)
# Structured event streaming
self._app.router.add_post("/v1/runs", self._handle_runs)
self._app.router.add_get("/v1/runs/{run_id}", self._handle_get_run)
self._app.router.add_get("/v1/runs/{run_id}/events", self._handle_run_events)
self._app.router.add_post("/v1/runs/{run_id}/stop", self._handle_stop_run)
# Start background sweep to clean up orphaned (unconsumed) run streams
+52 -10
View File
@@ -307,9 +307,14 @@ def proxy_kwargs_for_aiohttp(proxy_url: str | None) -> tuple[dict, dict]:
"""Build kwargs for standalone ``aiohttp.ClientSession`` with proxy.
Returns ``(session_kwargs, request_kwargs)`` where:
- SOCKS ``({"connector": ProxyConnector(...)}, {})``
- HTTP ``({}, {"proxy": url})``
- None ``({}, {})``
- With aiohttp-socks ``({"connector": ProxyConnector(...)}, {})``
for *all* proxy schemes (SOCKS **and** HTTP/HTTPS).
- HTTP without aiohttp-socks ``({}, {"proxy": url})``.
- None ``({}, {})``.
Prefer the connector path: it works transparently with libraries
(like mautrix) that call ``session.request()`` without forwarding
per-request ``proxy=`` kwargs.
Usage::
@@ -320,20 +325,20 @@ def proxy_kwargs_for_aiohttp(proxy_url: str | None) -> tuple[dict, dict]:
"""
if not proxy_url:
return {}, {}
if proxy_url.lower().startswith("socks"):
try:
from aiohttp_socks import ProxyConnector
try:
from aiohttp_socks import ProxyConnector
connector = ProxyConnector.from_url(proxy_url, rdns=True)
return {"connector": connector}, {}
except ImportError:
connector = ProxyConnector.from_url(proxy_url, rdns=True)
return {"connector": connector}, {}
except ImportError:
if proxy_url.lower().startswith("socks"):
logger.warning(
"aiohttp_socks not installed — SOCKS proxy %s ignored. "
"Run: pip install aiohttp-socks",
proxy_url,
)
return {}, {}
return {}, {"proxy": proxy_url}
return {}, {"proxy": proxy_url}
def is_host_excluded_by_no_proxy(hostname: str, no_proxy_value: str | None = None) -> bool:
@@ -902,6 +907,41 @@ class MessageEvent:
return args
_PLAINTEXT_GATEWAY_RESTART_PATTERNS: tuple[re.Pattern[str], ...] = (
re.compile(r"^(?:please\s+)?restart\s+(?:the\s+)?gateway[.!?\s]*$", re.IGNORECASE),
re.compile(r"^(?:please\s+)?restart\s+(?:the\s+)?hermes\s+gateway[.!?\s]*$", re.IGNORECASE),
re.compile(r"^(?:please\s+)?restart\s+hermes[.!?\s]*$", re.IGNORECASE),
)
def coerce_plaintext_gateway_command(event: "MessageEvent") -> None:
"""Rewrite a tiny set of DM plaintext admin phrases into slash commands.
This keeps high-impact operational phrases like ``restart gateway`` out of
the LLM/tool path, where they can trigger a self-restart from inside the
currently running agent and leave the gateway stuck in ``draining`` while it
waits for that same agent to finish.
Scope is intentionally narrow: DM text messages only, exact restart-style
phrases only. Group chats keep natural-language semantics.
"""
try:
if event is None or event.message_type != MessageType.TEXT:
return
text = (event.text or "").strip()
if not text or text.startswith("/"):
return
source = getattr(event, "source", None)
if getattr(source, "chat_type", None) != "dm":
return
for pattern in _PLAINTEXT_GATEWAY_RESTART_PATTERNS:
if pattern.match(text):
event.text = "/restart"
return
except Exception:
return
@dataclass
class SendResult:
"""Result of sending a message."""
@@ -2188,6 +2228,8 @@ class BasePlatformAdapter(ABC):
"""
if not self._message_handler:
return
coerce_plaintext_gateway_command(event)
session_key = build_session_key(
event.source,
+14 -3
View File
@@ -305,7 +305,7 @@ class VoiceReceiver:
encrypted = bytes(payload_with_nonce[:-4])
try:
import nacl.secret # noqa: delayed import only in voice path
import nacl.secret # noqa: E402 — delayed import, only in voice path
box = nacl.secret.Aead(self._secret_key)
decrypted = box.decrypt(encrypted, header, bytes(nonce))
except Exception as e:
@@ -813,7 +813,14 @@ class DiscordAdapter(BasePlatformAdapter):
logger.info("[%s] Synced %d slash command(s) via bulk tree sync", self.name, len(synced))
return
summary = await asyncio.wait_for(self._safe_sync_slash_commands(), timeout=30)
# Discord's per-app command-management bucket is ~5 writes / 20 s,
# so a mass-prune-plus-upsert reconcile (e.g. 77 orphans + 30
# desired = 107 writes) takes several minutes of forced waits.
# A flat 30 s budget blew up reliably under bucket pressure and
# left slash commands broken for ~60 min until the bucket fully
# recovered. Use a wide ceiling; the cap still guards against a
# true hang. (#16713)
summary = await asyncio.wait_for(self._safe_sync_slash_commands(), timeout=600)
logger.info(
"[%s] Safely reconciled %d slash command(s): unchanged=%d updated=%d recreated=%d created=%d deleted=%d",
self.name,
@@ -825,7 +832,11 @@ class DiscordAdapter(BasePlatformAdapter):
summary["deleted"],
)
except asyncio.TimeoutError:
logger.warning("[%s] Slash command sync timed out after 30s", self.name)
logger.warning(
"[%s] Slash command sync timed out — Discord rate-limit bucket "
"may be saturated; will retry on next reconnect",
self.name,
)
except asyncio.CancelledError:
raise
except Exception as e: # pragma: no cover - defensive logging
-1
View File
@@ -974,7 +974,6 @@ def build_whole_comment_prompt(
def _resolve_model_and_runtime() -> Tuple[str, dict]:
"""Resolve model and provider credentials, same as gateway message handling."""
import os
from gateway.run import _load_gateway_config, _resolve_gateway_model
user_config = _load_gateway_config()
+2 -2
View File
@@ -11,10 +11,10 @@ import logging
import re
import time
from pathlib import Path
from typing import TYPE_CHECKING, Dict, Optional
from typing import TYPE_CHECKING, Dict
if TYPE_CHECKING:
from gateway.platforms.base import BasePlatformAdapter, MessageEvent
from gateway.platforms.base import MessageEvent
logger = logging.getLogger(__name__)
+420 -44
View File
@@ -11,6 +11,7 @@ Environment variables:
MATRIX_PASSWORD Password (alternative to access token)
MATRIX_ENCRYPTION Set "true" to enable E2EE
MATRIX_DEVICE_ID Stable device ID for E2EE persistence across restarts
MATRIX_PROXY HTTP(S) or SOCKS proxy URL for Matrix traffic
MATRIX_ALLOWED_USERS Comma-separated Matrix user IDs (@user:server)
MATRIX_HOME_ROOM Room ID for cron/notification delivery
MATRIX_REACTIONS Set "false" to disable processing lifecycle reactions
@@ -18,6 +19,7 @@ Environment variables:
MATRIX_REQUIRE_MENTION Require @mention in rooms (default: true)
MATRIX_FREE_RESPONSE_ROOMS Comma-separated room IDs exempt from mention requirement
MATRIX_AUTO_THREAD Auto-create threads for room messages (default: true)
MATRIX_DM_AUTO_THREAD Auto-create threads for DM messages (default: false)
MATRIX_RECOVERY_KEY Recovery key for cross-signing verification after device key rotation
MATRIX_DM_MENTION_THREADS Create a thread when bot is @mentioned in a DM (default: false)
"""
@@ -30,6 +32,8 @@ import mimetypes
import os
import re
import time
from dataclasses import dataclass
from html import escape as _html_escape
from pathlib import Path
from typing import Any, Dict, Optional, Set
@@ -95,11 +99,25 @@ from gateway.platforms.base import (
MessageType,
ProcessingOutcome,
SendResult,
resolve_proxy_url,
proxy_kwargs_for_aiohttp,
)
from gateway.platforms.helpers import ThreadParticipationTracker
logger = logging.getLogger(__name__)
@dataclass
class _MatrixApprovalPrompt:
"""Tracks a pending Matrix reaction-based exec approval prompt."""
def __init__(self, session_key: str, chat_id: str, message_id: str, resolved: bool = False):
self.session_key = session_key
self.chat_id = chat_id
self.message_id = message_id
self.resolved = resolved
self.bot_reaction_events: dict[str, str] = {} # emoji -> event_id
# Matrix message size limit (4000 chars practical, spec has no hard limit
# but clients render poorly above this).
MAX_MESSAGE_LENGTH = 4000
@@ -114,11 +132,85 @@ _CRYPTO_DB_PATH = _STORE_DIR / "crypto.db"
# Grace period: ignore messages older than this many seconds before startup.
_STARTUP_GRACE_SECONDS = 5
_OUTBOUND_MENTION_RE = re.compile(
r"(?<![\w/])(@[0-9A-Za-z._=/-]+:[0-9A-Za-z.-]+(?::\d+)?)"
)
_E2EE_INSTALL_HINT = (
"Install with: pip install 'mautrix[encryption]' (requires libolm C library)"
)
_MATRIX_IMAGE_FILENAME_EXTS = frozenset({
".jpg",
".jpeg",
".png",
".gif",
".webp",
".bmp",
".svg",
".heic",
".heif",
".avif",
})
def _looks_like_matrix_image_filename(text: str) -> bool:
"""Return True when Matrix image body text is probably just a transport filename.
Matrix ``m.image`` events commonly populate ``content.body`` with the uploaded
filename when the user did not add a caption. Treating that raw filename as
user-authored text confuses downstream vision enrichment.
"""
candidate = str(text or "").strip()
if not candidate or "\n" in candidate or candidate.endswith("/"):
return False
name = Path(candidate).name
if not name or name != candidate:
return False
suffix = Path(name).suffix.lower()
if not suffix:
return False
guessed_type, _ = mimetypes.guess_type(name)
if guessed_type and guessed_type.startswith("image/"):
return True
return suffix in _MATRIX_IMAGE_FILENAME_EXTS
def _create_matrix_session(proxy_url: str | None):
"""Create an ``aiohttp.ClientSession`` whose proxy applies to *all* requests.
mautrix's ``HTTPAPI._send()`` calls ``session.request()`` without forwarding
per-request ``proxy=`` kwargs. For HTTP(S) proxies we use aiohttp's native
``proxy=`` session parameter which sets a default for every request. For SOCKS
we use ``aiohttp_socks.ProxyConnector`` (connector-level).
When no proxy is configured we enable ``trust_env`` so standard env vars
(``HTTP_PROXY`` / ``HTTPS_PROXY``) are honoured automatically.
"""
import aiohttp
if not proxy_url:
return aiohttp.ClientSession(trust_env=True)
if proxy_url.split("://")[0].lower().startswith("socks"):
try:
from aiohttp_socks import ProxyConnector
return aiohttp.ClientSession(
connector=ProxyConnector.from_url(proxy_url, rdns=True),
)
except ImportError:
logger.warning(
"aiohttp_socks not installed — SOCKS proxy %s ignored. "
"Run: pip install aiohttp-socks",
proxy_url,
)
return aiohttp.ClientSession(trust_env=True)
return aiohttp.ClientSession(proxy=proxy_url)
def _check_e2ee_deps() -> bool:
"""Return True if mautrix E2EE dependencies (python-olm) are available."""
@@ -260,6 +352,9 @@ class MatrixAdapter(BasePlatformAdapter):
"1",
"yes",
)
self._dm_auto_thread: bool = os.getenv(
"MATRIX_DM_AUTO_THREAD", "false"
).lower() in ("true", "1", "yes")
self._dm_mention_threads: bool = os.getenv(
"MATRIX_DM_MENTION_THREADS", "false"
).lower() in ("true", "1", "yes")
@@ -270,6 +365,11 @@ class MatrixAdapter(BasePlatformAdapter):
).lower() not in ("false", "0", "no")
self._pending_reactions: dict[tuple[str, str], str] = {}
# Proxy support — resolve once at init, reuse for all HTTP traffic.
self._proxy_url: str | None = resolve_proxy_url(platform_env_var="MATRIX_PROXY")
if self._proxy_url:
logger.info("Matrix: proxy configured — %s", self._proxy_url)
# Text batching: merge rapid successive messages (Telegram-style).
# Matrix clients split long messages around 4000 chars.
self._text_batch_delay_seconds = float(
@@ -281,6 +381,18 @@ class MatrixAdapter(BasePlatformAdapter):
self._pending_text_batches: Dict[str, MessageEvent] = {}
self._pending_text_batch_tasks: Dict[str, asyncio.Task] = {}
# Matrix reaction-based dangerous command approvals.
self._approval_reaction_map = {
"": "once",
"": "deny",
}
self._approval_prompts_by_event: Dict[str, _MatrixApprovalPrompt] = {}
self._approval_prompt_by_session: Dict[str, str] = {}
allowed_users_raw = os.getenv("MATRIX_ALLOWED_USERS", "")
self._allowed_user_ids: Set[str] = {
u.strip() for u in allowed_users_raw.split(",") if u.strip()
}
def _is_duplicate_event(self, event_id) -> bool:
"""Return True if this event was already processed. Tracks the ID otherwise."""
if not event_id:
@@ -326,7 +438,7 @@ class MatrixAdapter(BasePlatformAdapter):
)
return False
except Exception as exc:
logger.error("Matrix: post-upload key verification failed: %s", exc)
logger.error("Matrix: post-upload key verification failed: %s", exc, exc_info=True)
return False
return True
@@ -342,6 +454,7 @@ class MatrixAdapter(BasePlatformAdapter):
logger.error(
"Matrix: cannot verify device keys on server: %s — refusing E2EE",
exc,
exc_info=True,
)
return False
@@ -356,7 +469,7 @@ class MatrixAdapter(BasePlatformAdapter):
try:
await olm.share_keys()
except Exception as exc:
logger.error("Matrix: failed to re-upload device keys: %s", exc)
logger.error("Matrix: failed to re-upload device keys: %s", exc, exc_info=True)
return False
return await self._reverify_keys_after_upload(client, local_ed25519)
@@ -396,6 +509,7 @@ class MatrixAdapter(BasePlatformAdapter):
"Try generating a new access token to get a fresh device.",
client.device_id,
exc,
exc_info=True,
)
return False
return await self._reverify_keys_after_upload(client, local_ed25519)
@@ -420,9 +534,11 @@ class MatrixAdapter(BasePlatformAdapter):
_STORE_DIR.mkdir(parents=True, exist_ok=True)
# Create the HTTP API layer.
client_session = _create_matrix_session(self._proxy_url)
api = HTTPAPI(
base_url=self._homeserver,
token=self._access_token or "",
client_session=client_session,
)
# Create the client.
@@ -465,6 +581,7 @@ class MatrixAdapter(BasePlatformAdapter):
logger.error(
"Matrix: whoami failed — check MATRIX_ACCESS_TOKEN and MATRIX_HOMESERVER: %s",
exc,
exc_info=True,
)
await api.session.close()
return False
@@ -607,6 +724,44 @@ class MatrixAdapter(BasePlatformAdapter):
logger.warning(
"Matrix: recovery key verification failed: %s", exc
)
else:
# No recovery key — bootstrap cross-signing if the bot
# has none yet. Without this, Element shows "Encrypted
# by a device not verified by its owner" on every
# message from this bot, indefinitely. mautrix's
# generate_recovery_key does the full flow: generates
# MSK/SSK/USK, uploads private keys to SSSS, publishes
# public keys to the homeserver, and signs the current
# device with the new SSK. Some homeservers require UIA
# for /keys/device_signing/upload — those will need an
# alternate path; Continuwuity and Synapse-with-shared-
# secret accept the unauthenticated upload.
try:
own_xsign = await olm.get_own_cross_signing_public_keys()
except Exception as exc:
own_xsign = None
logger.warning(
"Matrix: cross-signing key lookup failed: %s", exc
)
if own_xsign is None:
try:
new_recovery_key = await olm.generate_recovery_key()
logger.warning(
"Matrix: bootstrapped cross-signing for %s. "
"SAVE THIS RECOVERY KEY — set "
"MATRIX_RECOVERY_KEY for future restarts so "
"the bot can re-sign its device after key "
"rotation: %s",
client.mxid,
new_recovery_key,
)
except Exception as exc:
logger.warning(
"Matrix: cross-signing bootstrap failed "
"(non-fatal — Element will show 'not "
"verified by its owner'): %s",
exc,
)
client.crypto = olm
logger.info(
@@ -664,6 +819,7 @@ class MatrixAdapter(BasePlatformAdapter):
await asyncio.gather(*tasks)
except Exception as exc:
logger.warning("Matrix: initial sync event dispatch error: %s", exc)
await self._join_pending_invites(sync_data)
else:
logger.warning(
"Matrix: initial sync returned unexpected type %s",
@@ -727,17 +883,8 @@ class MatrixAdapter(BasePlatformAdapter):
chunks = self.truncate_message(formatted, MAX_MESSAGE_LENGTH)
last_event_id = None
for chunk in chunks:
msg_content: Dict[str, Any] = {
"msgtype": "m.text",
"body": chunk,
}
# Convert markdown to HTML for rich rendering.
html = self._markdown_to_html(chunk)
if html and html != chunk:
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = html
for i, chunk in enumerate(chunks):
msg_content = self._build_text_message_content(chunk)
# Reply-to support.
if reply_to:
@@ -844,25 +991,21 @@ class MatrixAdapter(BasePlatformAdapter):
"""Edit an existing message (via m.replace)."""
formatted = self.format_message(content)
new_content = self._build_text_message_content(formatted)
msg_content: Dict[str, Any] = {
"msgtype": "m.text",
"body": f"* {formatted}",
"m.new_content": {
"msgtype": "m.text",
"body": formatted,
},
"m.relates_to": {
"rel_type": "m.replace",
"event_id": message_id,
},
"m.new_content": new_content,
}
html = self._markdown_to_html(formatted)
if html and html != formatted:
msg_content["m.new_content"]["format"] = "org.matrix.custom.html"
msg_content["m.new_content"]["formatted_body"] = html
if "m.mentions" in new_content:
msg_content["m.mentions"] = new_content["m.mentions"]
if "formatted_body" in new_content:
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = f"* {html}"
msg_content["formatted_body"] = f'* {new_content["formatted_body"]}'
msg_content["m.relates_to"] = {
"rel_type": "m.replace",
"event_id": message_id,
}
try:
event_id = await self._client.send_message_event(
@@ -895,10 +1038,12 @@ class MatrixAdapter(BasePlatformAdapter):
# Try aiohttp first (always available), fall back to httpx
try:
import aiohttp as _aiohttp
async with _aiohttp.ClientSession(trust_env=True) as http:
_sess_kw, _req_kw = proxy_kwargs_for_aiohttp(self._proxy_url)
async with _aiohttp.ClientSession(**_sess_kw) as http:
async with http.get(
image_url, timeout=_aiohttp.ClientTimeout(total=30)
image_url,
timeout=_aiohttp.ClientTimeout(total=30),
**_req_kw,
) as resp:
resp.raise_for_status()
data = await resp.read()
@@ -908,8 +1053,10 @@ class MatrixAdapter(BasePlatformAdapter):
)
except ImportError:
import httpx
async with httpx.AsyncClient() as http:
_httpx_kw: dict = {}
if self._proxy_url:
_httpx_kw["proxy"] = self._proxy_url
async with httpx.AsyncClient(**_httpx_kw) as http:
resp = await http.get(image_url, follow_redirects=True, timeout=30)
resp.raise_for_status()
data = resp.content
@@ -984,6 +1131,56 @@ class MatrixAdapter(BasePlatformAdapter):
chat_id, video_path, "m.video", caption, reply_to, metadata=metadata
)
async def send_exec_approval(
self,
chat_id: str,
command: str,
session_key: str,
description: str = "dangerous command",
metadata: Optional[dict] = None,
) -> SendResult:
"""Send a reaction-based exec approval prompt for Matrix."""
if not self._client:
return SendResult(success=False, error="Not connected")
cmd_preview = command[:2000] + "..." if len(command) > 2000 else command
text = (
"⚠️ **Dangerous command requires approval**\n"
f"```\n{cmd_preview}\n```\n"
f"Reason: {description}\n\n"
"Reply `/approve` to execute, `/approve session` to approve this pattern for the session, "
"`/approve always` to approve permanently, or `/deny` to cancel.\n\n"
"You can also click the reaction to approve:\n"
"✅ = /approve\n"
"❎ = /deny"
)
result = await self.send(chat_id, text, metadata=metadata)
if not result.success or not result.message_id:
return result
prompt = _MatrixApprovalPrompt(
session_key=session_key,
chat_id=chat_id,
message_id=result.message_id,
)
old_event = self._approval_prompt_by_session.get(session_key)
if old_event:
self._approval_prompts_by_event.pop(old_event, None)
self._approval_prompts_by_event[result.message_id] = prompt
self._approval_prompt_by_session[session_key] = result.message_id
for emoji in ("", ""):
try:
reaction_result = await self._send_reaction(chat_id, result.message_id, emoji)
# Save the bot's reaction event_id for later cleanup
if reaction_result:
prompt.bot_reaction_events[emoji] = str(reaction_result)
except Exception as exc:
logger.debug("Matrix: failed to add approval reaction %s: %s", emoji, exc)
return result
def format_message(self, content: str) -> str:
"""Pass-through — Matrix supports standard Markdown natively."""
# Strip image markdown; media is uploaded separately.
@@ -1115,9 +1312,15 @@ class MatrixAdapter(BasePlatformAdapter):
next_batch = await client.sync_store.get_next_batch()
while not self._closing:
try:
sync_data = await client.sync(
since=next_batch,
timeout=30000,
# Wrap in asyncio.wait_for to guard against TCP-level hangs
# that the Matrix long-poll timeout cannot catch. Long-poll
# is 30s, so 45s gives 15s slack for network drain.
sync_data = await asyncio.wait_for(
client.sync(
since=next_batch,
timeout=30000,
),
timeout=45.0,
)
# nio returns SyncError objects (not exceptions) for auth
@@ -1153,6 +1356,7 @@ class MatrixAdapter(BasePlatformAdapter):
await asyncio.gather(*tasks)
except Exception as exc:
logger.warning("Matrix: sync event dispatch error: %s", exc)
await self._join_pending_invites(sync_data)
except asyncio.CancelledError:
return
@@ -1239,6 +1443,15 @@ class MatrixAdapter(BasePlatformAdapter):
room_id = str(getattr(event, "room_id", ""))
sender = str(getattr(event, "sender", ""))
# Diagnostic: confirm the callback is firing at all when DEBUG is on.
# Helps users troubleshoot silent inbound issues like #5819, #7914, #12614.
logger.debug(
"Matrix: callback fired — event %s from %s in %s",
getattr(event, "event_id", "?"),
sender,
room_id,
)
# Ignore own messages (case-insensitive; also drops when our own
# user_id hasn't been resolved yet — see _is_self_sender docstring
# and issue #15763).
@@ -1350,6 +1563,12 @@ class MatrixAdapter(BasePlatformAdapter):
in_bot_thread = bool(thread_id and thread_id in self._threads)
if self._require_mention and not is_free_room and not in_bot_thread:
if not is_mentioned:
logger.debug(
"Matrix: ignoring message %s in %s — no @mention "
"(set MATRIX_REQUIRE_MENTION=false to disable)",
event_id,
room_id,
)
return None
# DM mention-thread.
@@ -1362,7 +1581,7 @@ class MatrixAdapter(BasePlatformAdapter):
body = self._strip_mention(body)
# Auto-thread.
if not is_dm and not thread_id and self._auto_thread:
if not thread_id and ((not is_dm and self._auto_thread) or (is_dm and self._dm_auto_thread)):
thread_id = event_id
self._threads.mark(thread_id)
@@ -1604,6 +1823,9 @@ class MatrixAdapter(BasePlatformAdapter):
return
body, is_dm, chat_type, thread_id, display_name, source = ctx
if msgtype == "m.image" and _looks_like_matrix_image_filename(body):
body = ""
allow_http_fallback = bool(http_url) and not is_encrypted_media
media_urls = (
[cached_path]
@@ -1633,13 +1855,35 @@ class MatrixAdapter(BasePlatformAdapter):
"Matrix: invited to %s — joining",
room_id,
)
await self._join_room_by_id(room_id)
async def _join_room_by_id(self, room_id: str) -> bool:
"""Join a room by ID and refresh local caches on success."""
if not room_id:
return False
if room_id in self._joined_rooms:
return True
try:
await self._client.join_room(RoomID(room_id))
self._joined_rooms.add(room_id)
logger.info("Matrix: joined %s", room_id)
await self._refresh_dm_cache()
return True
except Exception as exc:
logger.warning("Matrix: error joining %s: %s", room_id, exc)
return False
async def _join_pending_invites(self, sync_data: Dict[str, Any]) -> None:
"""Join rooms still present in rooms.invite after sync processing."""
rooms = sync_data.get("rooms", {}) if isinstance(sync_data, dict) else {}
invites = rooms.get("invite", {})
if not isinstance(invites, dict):
return
for room_id in invites:
if room_id in self._joined_rooms:
continue
logger.info("Matrix: reconciling pending invite for %s", room_id)
await self._join_room_by_id(str(room_id))
# ------------------------------------------------------------------
# Reactions (send, receive, processing lifecycle)
@@ -1754,6 +1998,51 @@ class MatrixAdapter(BasePlatformAdapter):
room_id,
)
# Check if this reaction resolves a pending approval prompt.
prompt = self._approval_prompts_by_event.get(reacts_to)
if prompt and not prompt.resolved:
if room_id != prompt.chat_id:
return
if self._allowed_user_ids and sender not in self._allowed_user_ids:
logger.info(
"Matrix: ignoring approval reaction from unauthorized user %s on %s",
sender, reacts_to,
)
return
choice = self._approval_reaction_map.get(key)
if not choice:
return
try:
from tools.approval import resolve_gateway_approval
count = resolve_gateway_approval(prompt.session_key, choice)
if count:
prompt.resolved = True
self._approval_prompts_by_event.pop(reacts_to, None)
self._approval_prompt_by_session.pop(prompt.session_key, None)
logger.info(
"Matrix reaction resolved %d approval(s) for session %s "
"(choice=%s, user=%s)",
count, prompt.session_key, choice, sender,
)
# Redact bot's seed reactions, leaving only the user's
await self._redact_bot_approval_reactions(room_id, prompt)
except Exception as exc:
logger.error("Failed to resolve gateway approval from Matrix reaction: %s", exc)
async def _redact_bot_approval_reactions(
self,
room_id: str,
prompt: "_MatrixApprovalPrompt",
) -> None:
"""Redact the bot's seed ✅/❎ reactions, leaving only the user's reaction."""
for emoji, evt_id in prompt.bot_reaction_events.items():
try:
await self.redact_message(room_id, evt_id, "approval resolved")
logger.debug("Matrix: redacted bot reaction %s (%s)", emoji, evt_id)
except Exception as exc:
logger.debug("Matrix: failed to redact bot reaction %s: %s", emoji, exc)
# ------------------------------------------------------------------
# Text message aggregation (handles Matrix client-side splits)
# ------------------------------------------------------------------
@@ -1979,11 +2268,7 @@ class MatrixAdapter(BasePlatformAdapter):
if not self._client or not text:
return SendResult(success=False, error="No client or empty text")
msg_content: Dict[str, Any] = {"msgtype": msgtype, "body": text}
html = self._markdown_to_html(text)
if html and html != text:
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = html
msg_content = self._build_text_message_content(text, msgtype=msgtype)
try:
event_id = await self._client.send_message_event(
@@ -2046,6 +2331,77 @@ class MatrixAdapter(BasePlatformAdapter):
# Mention detection helpers
# ------------------------------------------------------------------
def _build_text_message_content(self, text: str, msgtype: str = "m.text") -> Dict[str, Any]:
"""Build Matrix text content with HTML and outbound mention metadata."""
msg_content: Dict[str, Any] = {"msgtype": msgtype, "body": text}
mention_user_ids = self._extract_outbound_mentions(text)
if mention_user_ids:
msg_content["m.mentions"] = {"user_ids": mention_user_ids}
html_source = self._inject_outbound_mention_links(text)
html = self._markdown_to_html(html_source)
if html and html != text:
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = html
return msg_content
def _extract_outbound_mentions(self, text: str) -> list[str]:
"""Return unique Matrix user IDs mentioned in outbound text."""
protected, _ = self._protect_outbound_mention_regions(text)
seen: Set[str] = set()
mentions: list[str] = []
for match in _OUTBOUND_MENTION_RE.finditer(protected):
user_id = match.group(1)
if user_id not in seen:
seen.add(user_id)
mentions.append(user_id)
return mentions
def _inject_outbound_mention_links(self, text: str) -> str:
"""Wrap outbound Matrix mentions in markdown links outside code spans."""
if not text:
return text
protected, placeholders = self._protect_outbound_mention_regions(text)
linked = _OUTBOUND_MENTION_RE.sub(
lambda match: f"[{match.group(1)}](https://matrix.to/#/{match.group(1)})",
protected,
)
for idx, original in enumerate(placeholders):
linked = linked.replace(f"\x00MENTION_PROTECTED{idx}\x00", original)
return linked
def _protect_outbound_mention_regions(self, text: str) -> tuple[str, list[str]]:
"""Protect markdown regions where outbound mentions should stay literal."""
placeholders: list[str] = []
def _protect(fragment: str) -> str:
idx = len(placeholders)
placeholders.append(fragment)
return f"\x00MENTION_PROTECTED{idx}\x00"
protected = re.sub(
r"```[\s\S]*?```",
lambda match: _protect(match.group(0)),
text or "",
)
protected = re.sub(
r"`[^`\n]+`",
lambda match: _protect(match.group(0)),
protected,
)
protected = re.sub(
r"\[[^\]]+\]\([^)]+\)",
lambda match: _protect(match.group(0)),
protected,
)
return protected, placeholders
def _is_bot_mentioned(
self,
body: str,
@@ -2080,13 +2436,33 @@ class MatrixAdapter(BasePlatformAdapter):
return False
def _strip_mention(self, body: str) -> str:
"""Strip the bot's full MXID (``@user:server``) from *body*.
"""Remove explicit bot mentions from message body.
The bare localpart is intentionally *not* stripped it would
mangle file paths like ``/home/hermes/media/file.png``.
Important: only strip explicit mention tokens (``@user:server`` or
``@localpart``). Do NOT strip bare words matching the bot localpart,
otherwise normal phrases like "Hermes Agent" become "Agent".
"""
if not body:
return ""
# Strip explicit full MXID mentions.
if self._user_id:
body = body.replace(self._user_id, "")
# Strip explicit @localpart mentions only (not bare localpart words).
if self._user_id and ":" in self._user_id:
localpart = self._user_id.split(":")[0].lstrip("@")
if localpart:
body = re.sub(
r'(?<![\w])@' + re.escape(localpart) + r'\b',
'',
body,
flags=re.IGNORECASE,
)
# Normalize spacing after mention removal.
body = re.sub(r'[ \t]{2,}', ' ', body)
body = re.sub(r'\s+([,.;:!?])', r'\1', body)
return body.strip()
async def _get_display_name(self, room_id: str, user_id: str) -> str:
-1
View File
@@ -412,7 +412,6 @@ class MattermostAdapter(BasePlatformAdapter):
import aiohttp
last_exc = None
file_data = None
ct = "application/octet-stream"
fname = url.rsplit("/", 1)[-1].split("?")[0] or f"{kind}.png"
+2 -7
View File
@@ -1957,7 +1957,7 @@ class QQAdapter(BasePlatformAdapter):
self, openid: str, content: str, reply_to: Optional[str] = None
) -> SendResult:
"""Send text to a C2C user via REST API."""
msg_seq = self._next_msg_seq(reply_to or openid)
self._next_msg_seq(reply_to or openid)
body = self._build_text_body(content, reply_to)
if reply_to:
body["msg_id"] = reply_to
@@ -1970,7 +1970,7 @@ class QQAdapter(BasePlatformAdapter):
self, group_openid: str, content: str, reply_to: Optional[str] = None
) -> SendResult:
"""Send text to a group via REST API."""
msg_seq = self._next_msg_seq(reply_to or group_openid)
self._next_msg_seq(reply_to or group_openid)
body = self._build_text_body(content, reply_to)
if reply_to:
body["msg_id"] = reply_to
@@ -2135,11 +2135,6 @@ class QQAdapter(BasePlatformAdapter):
# Route
chat_type = self._guess_chat_type(chat_id)
target_path = (
f"/v2/users/{chat_id}/files"
if chat_type == "c2c"
else f"/v2/groups/{chat_id}/files"
)
if chat_type == "guild":
# Guild channels don't support native media upload in the same way
+287 -8
View File
@@ -31,6 +31,7 @@ from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
MessageType,
ProcessingOutcome,
SendResult,
cache_image_from_bytes,
cache_audio_from_bytes,
@@ -162,6 +163,10 @@ class SignalAdapter(BasePlatformAdapter):
"""Signal messenger adapter using signal-cli HTTP daemon."""
platform = Platform.SIGNAL
# Signal has no real edit API for already-sent messages. Mark it explicitly
# so streaming suppresses the visible cursor instead of leaving a stale tofu
# square behind in chat clients when edit attempts fail.
SUPPORTS_MESSAGE_EDITING = False
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.SIGNAL)
@@ -488,6 +493,11 @@ class SignalAdapter(BasePlatformAdapter):
if text and mentions:
text = _render_mentions(text, mentions)
# Extract quote (reply-to) context from Signal dataMessage
quote_data = data_message.get("quote") or {}
reply_to_id = str(quote_data.get("id")) if quote_data.get("id") else None
reply_to_text = quote_data.get("text")
# Process attachments
attachments_data = data_message.get("attachments", [])
media_urls = []
@@ -541,7 +551,9 @@ class SignalAdapter(BasePlatformAdapter):
else:
timestamp = datetime.now(tz=timezone.utc)
# Build and dispatch event
# Build and dispatch event.
# Store raw envelope data in raw_message so on_processing_start/complete
# can extract targetAuthor + targetTimestamp for sendReaction.
event = MessageEvent(
source=source,
text=text or "",
@@ -549,6 +561,9 @@ class SignalAdapter(BasePlatformAdapter):
media_urls=media_urls,
media_types=media_types,
timestamp=timestamp,
raw_message={"sender": sender, "timestamp_ms": ts_ms},
reply_to_message_id=reply_to_id,
reply_to_text=reply_to_text,
)
logger.debug("Signal: message from %s in %s: %s",
@@ -707,6 +722,159 @@ class SignalAdapter(BasePlatformAdapter):
logger.debug("Signal RPC %s failed: %s", method, e)
return None
# ------------------------------------------------------------------
# Formatting — markdown → Signal body ranges
# ------------------------------------------------------------------
@staticmethod
def _markdown_to_signal(text: str) -> tuple:
"""Convert markdown to plain text + Signal textStyles list.
Signal doesn't render markdown. Instead it uses ``bodyRanges``
(exposed by signal-cli as ``textStyle`` / ``textStyles`` params)
with the format ``start:length:STYLE``.
Positions are measured in **UTF-16 code units** (not Python code
points) because that's what the Signal protocol uses.
Supported styles: BOLD, ITALIC, STRIKETHROUGH, MONOSPACE.
(Signal's SPOILER style is not currently mapped — no standard
markdown syntax for it; would need ``||spoiler||`` parsing.)
Returns ``(plain_text, styles_list)`` where *styles_list* may be
empty if there's nothing to format.
"""
import re
def _utf16_len(s: str) -> int:
"""Length of *s* in UTF-16 code units."""
return len(s.encode("utf-16-le")) // 2
# Pre-process: normalize whitespace before any position tracking
# so later operations don't invalidate recorded offsets.
text = re.sub(r"\n{3,}", "\n\n", text)
text = text.strip()
styles: list = []
# --- Phase 1: fenced code blocks ```...``` → MONOSPACE ---
_CB = re.compile(r"```[a-zA-Z0-9_+-]*\n?(.*?)```", re.DOTALL)
while m := _CB.search(text):
inner = m.group(1).rstrip("\n")
start = m.start()
text = text[: m.start()] + inner + text[m.end() :]
styles.append((start, len(inner), "MONOSPACE"))
# --- Phase 2: heading markers # Foo → Foo (BOLD) ---
_HEADING = re.compile(r"^#{1,6}\s+", re.MULTILINE)
new_text = ""
last_end = 0
for m in _HEADING.finditer(text):
new_text += text[last_end : m.start()]
last_end = m.end()
eol = text.find("\n", m.end())
if eol == -1:
eol = len(text)
heading_text = text[m.end() : eol]
start = len(new_text)
new_text += heading_text
styles.append((start, len(heading_text), "BOLD"))
last_end = eol
new_text += text[last_end:]
text = new_text
# --- Phase 3: inline patterns (single-pass to avoid offset drift) ---
# The old code processed each pattern sequentially, stripping markers
# and recording positions per-pass. Later passes shifted text without
# adjusting earlier positions → bold/italic landed mid-word.
#
# Fix: collect ALL non-overlapping matches first, then strip every
# marker in one pass so positions are computed against the final text.
_PATTERNS = [
(re.compile(r"\*\*(.+?)\*\*", re.DOTALL), "BOLD"),
(re.compile(r"__(.+?)__", re.DOTALL), "BOLD"),
(re.compile(r"~~(.+?)~~", re.DOTALL), "STRIKETHROUGH"),
(re.compile(r"`(.+?)`"), "MONOSPACE"),
(re.compile(r"(?<!\*)\*(?!\*| )(.+?)(?<!\*)\*(?!\*)"), "ITALIC"),
(re.compile(r"(?<!\w)_(?!_)(.+?)(?<!_)_(?!\w)"), "ITALIC"),
]
# Collect all non-overlapping matches (earlier patterns win ties).
all_matches: list = [] # (start, end, g1_start, g1_end, style)
occupied: list = [] # (start, end) intervals already claimed
for pat, style in _PATTERNS:
for m in pat.finditer(text):
ms, me = m.start(), m.end()
if not any(ms < oe and me > os for os, oe in occupied):
all_matches.append((ms, me, m.start(1), m.end(1), style))
occupied.append((ms, me))
all_matches.sort()
# Build removal list so we can adjust Phase 1/2 styles.
# Each match removes its prefix markers (start..g1_start) and
# suffix markers (g1_end..end).
removals: list = [] # (position, length) sorted
for ms, me, g1s, g1e, _ in all_matches:
if g1s > ms:
removals.append((ms, g1s - ms))
if me > g1e:
removals.append((g1e, me - g1e))
removals.sort()
# Adjust Phase 1/2 styles for characters about to be removed.
def _adj(pos: int) -> int:
shift = 0
for rp, rl in removals:
if rp < pos:
shift += min(rl, pos - rp)
else:
break
return pos - shift
adjusted_prior: list = []
for s, l, st in styles:
ns = _adj(s)
ne = _adj(s + l)
if ne > ns:
adjusted_prior.append((ns, ne - ns, st))
# Strip all inline markers in one pass → positions are correct.
result = ""
last_end = 0
inline_styles: list = []
for ms, me, g1s, g1e, sty in all_matches:
result += text[last_end:ms]
pos = len(result)
inner = text[g1s:g1e]
result += inner
inline_styles.append((pos, len(inner), sty))
last_end = me
result += text[last_end:]
text = result
styles = adjusted_prior + inline_styles
# Convert code-point offsets → UTF-16 code-unit offsets
style_strings = []
for cp_start, cp_len, stype in sorted(styles):
# Safety: skip any out-of-bounds styles
if cp_start < 0 or cp_start + cp_len > len(text):
continue
u16_start = _utf16_len(text[:cp_start])
u16_len = _utf16_len(text[cp_start : cp_start + cp_len])
style_strings.append(f"{u16_start}:{u16_len}:{stype}")
return text, style_strings
def format_message(self, content: str) -> str:
"""Strip markdown for plain-text fallback (used by base class).
The actual rich formatting happens in send() via _markdown_to_signal().
"""
# This is only called if someone uses the base-class send path.
# Our send() override bypasses this entirely.
return content
# ------------------------------------------------------------------
# Sending
# ------------------------------------------------------------------
@@ -718,14 +886,22 @@ class SignalAdapter(BasePlatformAdapter):
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send a text message."""
"""Send a text message with native Signal formatting."""
await self._stop_typing_indicator(chat_id)
plain_text, text_styles = self._markdown_to_signal(content)
params: Dict[str, Any] = {
"account": self.account,
"message": content,
"message": plain_text,
}
if text_styles:
if len(text_styles) == 1:
params["textStyle"] = text_styles[0]
else:
params["textStyles"] = text_styles
if chat_id.startswith("group:"):
params["groupId"] = chat_id[6:]
else:
@@ -735,11 +911,10 @@ class SignalAdapter(BasePlatformAdapter):
if result is not None:
self._track_sent_timestamp(result)
# Use the timestamp from the RPC result as a pseudo message_id.
# Signal doesn't have real message IDs, but the stream consumer
# needs a truthy value to follow its edit→fallback path correctly.
_msg_id = str(result.get("timestamp", "")) if isinstance(result, dict) else None
return SendResult(success=True, message_id=_msg_id or None)
# Signal has no editable message identifier. Returning None keeps the
# stream consumer on the non-edit fallback path instead of pretending
# future edits can remove an in-progress cursor from the chat thread.
return SendResult(success=True, message_id=None)
return SendResult(success=False, error="RPC send failed")
def _track_sent_timestamp(self, rpc_result) -> None:
@@ -963,6 +1138,110 @@ class SignalAdapter(BasePlatformAdapter):
_keep_typing finally block to clean up platform-level typing tasks."""
await self._stop_typing_indicator(chat_id)
# ------------------------------------------------------------------
# Reactions
# ------------------------------------------------------------------
async def send_reaction(
self,
chat_id: str,
emoji: str,
target_author: str,
target_timestamp: int,
) -> bool:
"""Send a reaction emoji to a specific message via signal-cli RPC.
Args:
chat_id: The chat (phone number or "group:<id>")
emoji: Reaction emoji string (e.g. "👀", "")
target_author: Phone number / UUID of the message author
target_timestamp: Signal timestamp (ms) of the message to react to
"""
params: Dict[str, Any] = {
"account": self.account,
"emoji": emoji,
"targetAuthor": target_author,
"targetTimestamp": target_timestamp,
}
if chat_id.startswith("group:"):
params["groupId"] = chat_id[6:]
else:
params["recipient"] = [chat_id]
result = await self._rpc("sendReaction", params)
if result is not None:
return True
logger.debug("Signal: sendReaction failed (chat=%s, emoji=%s)", chat_id[:20], emoji)
return False
async def remove_reaction(
self,
chat_id: str,
target_author: str,
target_timestamp: int,
) -> bool:
"""Remove a reaction by sending an empty-string emoji."""
params: Dict[str, Any] = {
"account": self.account,
"emoji": "",
"targetAuthor": target_author,
"targetTimestamp": target_timestamp,
"remove": True,
}
if chat_id.startswith("group:"):
params["groupId"] = chat_id[6:]
else:
params["recipient"] = [chat_id]
result = await self._rpc("sendReaction", params)
return result is not None
# ------------------------------------------------------------------
# Processing Lifecycle Hooks (reactions as progress indicators)
# ------------------------------------------------------------------
def _extract_reaction_target(self, event: MessageEvent) -> Optional[tuple]:
"""Extract (target_author, target_timestamp) from a MessageEvent.
Returns None if the event doesn't carry the raw Signal envelope data
needed for sendReaction.
"""
raw = event.raw_message
if not isinstance(raw, dict):
return None
author = raw.get("sender")
ts = raw.get("timestamp_ms")
if not author or not ts:
return None
return (author, ts)
async def on_processing_start(self, event: MessageEvent) -> None:
"""React with 👀 when processing begins."""
target = self._extract_reaction_target(event)
if target:
await self.send_reaction(event.source.chat_id, "👀", *target)
async def on_processing_complete(self, event: MessageEvent, outcome: "ProcessingOutcome") -> None:
"""Swap the 👀 reaction for ✅ (success) or ❌ (failure).
On CANCELLED we leave the 👀 in place no terminal outcome means
the reaction should keep reflecting "in progress" (matches Telegram).
"""
if outcome == ProcessingOutcome.CANCELLED:
return
target = self._extract_reaction_target(event)
if not target:
return
chat_id = event.source.chat_id
# Remove the in-progress reaction, then add the final one
await self.remove_reaction(chat_id, *target)
if outcome == ProcessingOutcome.SUCCESS:
await self.send_reaction(chat_id, "", *target)
elif outcome == ProcessingOutcome.FAILURE:
await self.send_reaction(chat_id, "", *target)
# ------------------------------------------------------------------
# Chat Info
# ------------------------------------------------------------------
+93 -14
View File
@@ -84,6 +84,7 @@ from gateway.platforms.telegram_network import (
discover_fallback_ips,
parse_fallback_ip_env,
)
from utils import atomic_replace
def check_telegram_requirements() -> bool:
@@ -122,12 +123,12 @@ def _strip_mdv2(text: str) -> str:
# ---------------------------------------------------------------------------
# Markdown table → code block conversion
# Markdown table → Telegram-friendly row groups
# ---------------------------------------------------------------------------
# Telegram's MarkdownV2 has no table syntax — '|' is just an escaped literal,
# so pipe tables render as noisy backslash-pipe text with no alignment.
# Wrapping the table in a fenced code block makes Telegram render it as
# monospace preformatted text with columns intact.
# Reformating each row into a bold heading plus bullet list keeps the content
# readable on mobile clients while preserving the source data.
# Matches a GFM table delimiter row: optional outer pipes, cells containing
# only dashes (with optional leading/trailing colons for alignment) separated
@@ -144,13 +145,49 @@ def _is_table_row(line: str) -> bool:
return bool(stripped) and '|' in stripped
def _split_markdown_table_row(line: str) -> list[str]:
"""Split a simple GFM table row into stripped cell values."""
stripped = line.strip()
if stripped.startswith("|"):
stripped = stripped[1:]
if stripped.endswith("|"):
stripped = stripped[:-1]
return [cell.strip() for cell in stripped.split("|")]
def _render_table_block_for_telegram(table_block: list[str]) -> str:
"""Render a detected GFM table as Telegram-friendly row groups."""
if len(table_block) < 3:
return "\n".join(table_block)
headers = _split_markdown_table_row(table_block[0])
if len(headers) < 2:
return "\n".join(table_block)
rendered_rows: list[str] = []
for index, row in enumerate(table_block[2:], start=1):
cells = _split_markdown_table_row(row)
if len(cells) < len(headers):
cells.extend([""] * (len(headers) - len(cells)))
elif len(cells) > len(headers):
cells = cells[: len(headers)]
heading = next((cell for cell in cells if cell), f"Row {index}")
rendered_rows.append(f"**{heading}**")
rendered_rows.extend(
f"{header}: {value}" for header, value in zip(headers, cells)
)
return "\n\n".join(rendered_rows)
def _wrap_markdown_tables(text: str) -> str:
"""Wrap GFM-style pipe tables in ``` fences so Telegram renders them.
"""Rewrite GFM-style pipe tables into Telegram-friendly bullet groups.
Detected by a row containing '|' immediately followed by a delimiter
row matching :data:`_TABLE_SEPARATOR_RE`. Subsequent pipe-containing
non-blank lines are consumed as the table body and included in the
wrapped block. Tables inside existing fenced code blocks are left
non-blank lines are consumed as the table body and rewritten as
per-row bullet groups. Tables inside existing fenced code blocks are left
alone.
"""
if '|' not in text or '-' not in text:
@@ -187,9 +224,7 @@ def _wrap_markdown_tables(text: str) -> str:
while j < len(lines) and _is_table_row(lines[j]):
table_block.append(lines[j])
j += 1
out.append('```')
out.extend(table_block)
out.append('```')
out.append(_render_table_block_for_telegram(table_block))
i = j
continue
@@ -334,6 +369,49 @@ class TelegramAdapter(BasePlatformAdapter):
return {"link_preview_options": LinkPreviewOptions(is_disabled=True)}
return {"disable_web_page_preview": True}
async def _drain_polling_connections(self) -> None:
"""Reset the httpx connection pool used for getUpdates polling.
Network errors (especially through proxies like sing-box) can leave
httpx connections in a half-closed state that still occupy pool slots.
After enough reconnect cycles the pool fills up entirely, causing
``Pool timeout: All connections in the connection pool are occupied.``
We reset ONLY ``_request[0]`` (the getUpdates request) the general
request (``_request[1]``) is left untouched so concurrent
``send_message`` / ``edit_message`` calls are never interrupted.
Implementation note: accesses ``Bot._request[0]`` which is the
get-updates ``BaseRequest`` in the PTB 22.x internal tuple
``(get_updates_request, general_request)``. There is no public
accessor for the polling request; review if upgrading to PTB 23+.
"""
if not (self._app and self._app.bot):
return
try:
# PTB 22.x: _request is a (get_updates, general) tuple;
# no public accessor exists for the polling request.
polling_req = self._app.bot._request[0] # noqa: SLF001
except Exception:
return
try:
await polling_req.shutdown()
except Exception:
logger.debug(
"[%s] Polling request shutdown failed (non-fatal)",
self.name, exc_info=True,
)
try:
await polling_req.initialize()
logger.debug(
"[%s] Polling request pool drained before reconnect", self.name
)
except Exception:
logger.debug(
"[%s] Polling request re-initialize failed (non-fatal)",
self.name, exc_info=True,
)
async def _handle_polling_network_error(self, error: Exception) -> None:
"""Reconnect polling after a transient network interruption.
@@ -379,6 +457,8 @@ class TelegramAdapter(BasePlatformAdapter):
except Exception:
pass
await self._drain_polling_connections()
try:
await self._app.updater.start_polling(
allowed_updates=Update.ALL_TYPES,
@@ -426,6 +506,7 @@ class TelegramAdapter(BasePlatformAdapter):
except Exception:
pass
await asyncio.sleep(RETRY_DELAY)
await self._drain_polling_connections()
try:
await self._app.updater.start_polling(
allowed_updates=Update.ALL_TYPES,
@@ -554,7 +635,7 @@ class TelegramAdapter(BasePlatformAdapter):
_yaml.dump(config, f, default_flow_style=False, sort_keys=False)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, config_path)
atomic_replace(tmp_path, config_path)
except BaseException:
try:
os.unlink(tmp_path)
@@ -2080,10 +2161,8 @@ class TelegramAdapter(BasePlatformAdapter):
text = content
# 0) Pre-wrap GFM-style pipe tables in ``` fences. Telegram can't
# render tables natively, but fenced code blocks render as
# monospace preformatted text with columns intact. The wrapped
# tables then flow through step (1) below as protected regions.
# 0) Rewrite GFM-style pipe tables into Telegram-friendly row groups
# before the normal MarkdownV2 conversions run.
text = _wrap_markdown_tables(text)
# 1) Protect fenced code blocks (``` ... ```)
+52 -4
View File
@@ -89,8 +89,21 @@ MAX_CONSECUTIVE_FAILURES = 3
RETRY_DELAY_SECONDS = 2
BACKOFF_DELAY_SECONDS = 30
SESSION_EXPIRED_ERRCODE = -14
RATE_LIMIT_ERRCODE = -2 # iLink frequency limit — backoff and retry
MESSAGE_DEDUP_TTL_SECONDS = 300
def _is_stale_session_ret(
ret: "Optional[int]", errcode: "Optional[int]", errmsg: "Optional[str]",
) -> bool:
"""True when iLink returns ret=-2 / errcode=-2 with 'unknown error',
which is a stale-session signal (same as errcode=-14) rather than
a genuine rate limit."""
if ret != RATE_LIMIT_ERRCODE and errcode != RATE_LIMIT_ERRCODE:
return False
return (errmsg or "").lower() == "unknown error"
MEDIA_IMAGE = 1
MEDIA_VIDEO = 2
MEDIA_FILE = 3
@@ -1113,7 +1126,7 @@ async def qr_login(
class WeixinAdapter(BasePlatformAdapter):
"""Native Hermes adapter for Weixin personal accounts."""
MAX_MESSAGE_LENGTH = 4000
MAX_MESSAGE_LENGTH = 2000
# WeChat does not support editing sent messages — streaming must use the
# fallback "send-final-only" path so the cursor (▉) is never left visible.
@@ -1138,10 +1151,10 @@ class WeixinAdapter(BasePlatformAdapter):
extra.get("cdn_base_url") or os.getenv("WEIXIN_CDN_BASE_URL", WEIXIN_CDN_BASE_URL)
).strip().rstrip("/")
self._send_chunk_delay_seconds = float(
extra.get("send_chunk_delay_seconds") or os.getenv("WEIXIN_SEND_CHUNK_DELAY_SECONDS", "0.35")
extra.get("send_chunk_delay_seconds") or os.getenv("WEIXIN_SEND_CHUNK_DELAY_SECONDS", "1.5")
)
self._send_chunk_retries = int(
extra.get("send_chunk_retries") or os.getenv("WEIXIN_SEND_CHUNK_RETRIES", "2")
extra.get("send_chunk_retries") or os.getenv("WEIXIN_SEND_CHUNK_RETRIES", "4")
)
self._send_chunk_retry_delay_seconds = float(
extra.get("send_chunk_retry_delay_seconds")
@@ -1209,6 +1222,17 @@ class WeixinAdapter(BasePlatformAdapter):
self._mark_connected()
_LIVE_ADAPTERS[self._token] = self
logger.info("[%s] Connected account=%s base=%s", self.name, _safe_id(self._account_id), self._base_url)
if self._group_policy != "disabled":
logger.warning(
"[%s] WEIXIN_GROUP_POLICY=%s is set, but QR-login connects an iLink bot "
"identity (e.g. ...@im.bot) which typically cannot be invited into ordinary "
"WeChat groups. iLink usually does not deliver ordinary-group events for "
"these accounts, so group messages may never reach Hermes regardless of this "
"policy. If group delivery doesn't work, the limitation is on the iLink side, "
"not in Hermes.",
self.name,
self._group_policy,
)
return True
async def disconnect(self) -> None:
@@ -1253,7 +1277,8 @@ class WeixinAdapter(BasePlatformAdapter):
ret = response.get("ret", 0)
errcode = response.get("errcode", 0)
if ret not in (0, None) or errcode not in (0, None):
if ret == SESSION_EXPIRED_ERRCODE or errcode == SESSION_EXPIRED_ERRCODE:
if (ret == SESSION_EXPIRED_ERRCODE or errcode == SESSION_EXPIRED_ERRCODE
or _is_stale_session_ret(ret, errcode, response.get("errmsg"))):
logger.error("[%s] Session expired; pausing for 10 minutes", self.name)
await asyncio.sleep(600)
consecutive_failures = 0
@@ -1518,6 +1543,7 @@ class WeixinAdapter(BasePlatformAdapter):
is_session_expired = (
ret == SESSION_EXPIRED_ERRCODE
or errcode == SESSION_EXPIRED_ERRCODE
or _is_stale_session_ret(ret, errcode, resp.get("errmsg"))
)
# Session expired — strip token and retry once
if is_session_expired and not retried_without_token and context_token:
@@ -1531,6 +1557,28 @@ class WeixinAdapter(BasePlatformAdapter):
self.name, _safe_id(chat_id),
)
continue
# Rate limit (-2) — backoff and retry
is_rate_limited = (
ret == RATE_LIMIT_ERRCODE
or errcode == RATE_LIMIT_ERRCODE
)
if is_rate_limited:
errmsg = resp.get("errmsg") or resp.get("msg") or "rate limited"
# Record the error so we raise a descriptive
# RuntimeError (instead of AssertionError) if the
# loop exhausts with the server still rate-limiting.
last_error = RuntimeError(
f"iLink sendmessage rate limited: ret={ret} errcode={errcode} errmsg={errmsg}"
)
if attempt >= self._send_chunk_retries:
break
wait = self._send_chunk_retry_delay_seconds * 3 # 3x backoff for rate limit
logger.warning(
"[%s] rate limited for %s; backing off %.1fs before retry",
self.name, _safe_id(chat_id), wait,
)
await asyncio.sleep(wait)
continue
errmsg = resp.get("errmsg") or resp.get("msg") or "unknown error"
raise RuntimeError(
f"iLink sendmessage error: ret={ret} errcode={errcode} errmsg={errmsg}"
+2 -2
View File
@@ -90,7 +90,7 @@ from gateway.platforms.yuanbao_proto import (
encode_get_group_member_list,
next_seq_no,
)
from gateway.session import SessionSource, build_session_key
from gateway.session import build_session_key
logger = logging.getLogger(__name__)
@@ -1897,7 +1897,7 @@ class OwnerCommandMiddleware(InboundMiddleware):
return None, None, False
# Sender identity check: bot owner <-> push.from_account == push.bot_owner_id
owner_id = (push or {}).get("bot_owner_id") or ""
# owner_id = (push or {}).get("bot_owner_id") or ""
# is_owner = bool(owner_id) and owner_id == from_account
is_owner = True
return cmd, cmd_line, is_owner
-2
View File
@@ -21,12 +21,10 @@ import hashlib
import hmac
import logging
import os
import re
import secrets
import struct
import time
import urllib.parse
from datetime import datetime, timezone, timedelta
from typing import Optional, Any
import httpx
+1 -2
View File
@@ -19,9 +19,8 @@ yuanbao_proto.py - Yuanbao WebSocket 协议编解码(纯 Python 实现)
from __future__ import annotations
import logging
import struct
import threading
from typing import Optional, Union
from typing import Optional
logger = logging.getLogger(__name__)
+553 -62
View File
File diff suppressed because it is too large Load Diff
+150
View File
@@ -0,0 +1,150 @@
"""Gateway runtime-metadata footer.
Renders a compact footer showing runtime state (model, context %, cwd) and
appends it to the FINAL message of an agent turn when enabled. Off by default
to keep replies minimal.
Config (``~/.hermes/config.yaml``)::
display:
runtime_footer:
enabled: true # off by default
fields: [model, context_pct, cwd] # order shown; drop any to hide
Per-platform overrides live under ``display.platforms.<platform>.runtime_footer``.
Users can toggle the global setting with ``/footer on|off`` from both the CLI
and any gateway platform.
The footer is appended to the final response text in ``gateway/run.py`` right
before returning the response to the adapter send path so it only lands on
the final message a user sees, not on tool-progress updates or streaming
partials. When streaming is on and the final text has already been delivered
piecemeal, the footer is sent as a separate trailing message via
``send_trailing_footer()``.
"""
from __future__ import annotations
import os
from pathlib import Path
from typing import Any, Iterable, Optional
_DEFAULT_FIELDS: tuple[str, ...] = ("model", "context_pct", "cwd")
_SEP = " · "
def _home_relative_cwd(cwd: str) -> str:
"""Return *cwd* with ``$HOME`` collapsed to ``~``. Empty string if unset."""
if not cwd:
return ""
try:
home = os.path.expanduser("~")
p = os.path.abspath(cwd)
if home and (p == home or p.startswith(home + os.sep)):
return "~" + p[len(home):]
return p
except Exception:
return cwd
def _model_short(model: Optional[str]) -> str:
"""Drop ``vendor/`` prefix for readability (``openai/gpt-5.4`` → ``gpt-5.4``)."""
if not model:
return ""
return model.rsplit("/", 1)[-1]
def resolve_footer_config(
user_config: dict[str, Any] | None,
platform_key: str | None = None,
) -> dict[str, Any]:
"""Resolve effective runtime-footer config for *platform_key*.
Merge order (later wins):
1. Built-in defaults (enabled=False)
2. ``display.runtime_footer``
3. ``display.platforms.<platform_key>.runtime_footer``
"""
resolved = {"enabled": False, "fields": list(_DEFAULT_FIELDS)}
cfg = (user_config or {}).get("display") or {}
global_cfg = cfg.get("runtime_footer")
if isinstance(global_cfg, dict):
if "enabled" in global_cfg:
resolved["enabled"] = bool(global_cfg.get("enabled"))
if isinstance(global_cfg.get("fields"), list) and global_cfg["fields"]:
resolved["fields"] = [str(f) for f in global_cfg["fields"]]
if platform_key:
platforms = cfg.get("platforms") or {}
plat_cfg = platforms.get(platform_key)
if isinstance(plat_cfg, dict):
plat_footer = plat_cfg.get("runtime_footer")
if isinstance(plat_footer, dict):
if "enabled" in plat_footer:
resolved["enabled"] = bool(plat_footer.get("enabled"))
if isinstance(plat_footer.get("fields"), list) and plat_footer["fields"]:
resolved["fields"] = [str(f) for f in plat_footer["fields"]]
return resolved
def format_runtime_footer(
*,
model: Optional[str],
context_tokens: int,
context_length: Optional[int],
cwd: Optional[str] = None,
fields: Iterable[str] = _DEFAULT_FIELDS,
) -> str:
"""Render the footer line, or return "" if no fields have data.
Fields are skipped silently when their underlying data is missing a
partially-populated footer is better than a line with ``?%`` or empty slots.
"""
parts: list[str] = []
for field in fields:
if field == "model":
m = _model_short(model)
if m:
parts.append(m)
elif field == "context_pct":
if context_length and context_length > 0 and context_tokens >= 0:
pct = max(0, min(100, round((context_tokens / context_length) * 100)))
parts.append(f"{pct}%")
elif field == "cwd":
rel = _home_relative_cwd(cwd or os.environ.get("TERMINAL_CWD", ""))
if rel:
parts.append(rel)
# Unknown field names are silently ignored.
if not parts:
return ""
return _SEP.join(parts)
def build_footer_line(
*,
user_config: dict[str, Any] | None,
platform_key: str | None,
model: Optional[str],
context_tokens: int,
context_length: Optional[int],
cwd: Optional[str] = None,
) -> str:
"""Top-level entry point used by gateway/run.py.
Returns the footer text (empty string when disabled or no data). Callers
append this to the final response themselves, preserving a single blank
line of separation.
"""
cfg = resolve_footer_config(user_config, platform_key)
if not cfg.get("enabled"):
return ""
return format_runtime_footer(
model=model,
context_tokens=context_tokens,
context_length=context_length,
cwd=cwd,
fields=cfg.get("fields") or _DEFAULT_FIELDS,
)
+5 -19
View File
@@ -62,8 +62,8 @@ from .config import (
)
from .whatsapp_identity import (
canonical_whatsapp_identifier,
normalize_whatsapp_identifier,
)
from utils import atomic_replace
@dataclass
@@ -705,7 +705,7 @@ class SessionStore:
json.dump(data, f, indent=2)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, sessions_file)
atomic_replace(tmp_path, sessions_file)
except BaseException:
try:
os.unlink(tmp_path)
@@ -1257,25 +1257,11 @@ class SessionStore:
Used by /retry, /undo, and /compress to persist modified conversation history.
Rewrites both SQLite and legacy JSONL storage.
"""
# SQLite: clear old messages and re-insert
# SQLite: replace atomically so a mid-rewrite failure doesn't leave
# the session half-empty in the DB while JSONL still has history.
if self._db:
try:
self._db.clear_messages(session_id)
for msg in messages:
role = msg.get("role", "unknown")
self._db.append_message(
session_id=session_id,
role=role,
content=msg.get("content"),
tool_name=msg.get("tool_name"),
tool_calls=msg.get("tool_calls"),
tool_call_id=msg.get("tool_call_id"),
reasoning=msg.get("reasoning") if role == "assistant" else None,
reasoning_content=msg.get("reasoning_content") if role == "assistant" else None,
reasoning_details=msg.get("reasoning_details") if role == "assistant" else None,
codex_reasoning_items=msg.get("codex_reasoning_items") if role == "assistant" else None,
codex_message_items=msg.get("codex_message_items") if role == "assistant" else None,
)
self._db.replace_messages(session_id, messages)
except Exception as e:
logger.debug("Failed to rewrite transcript in DB: %s", e)
+35
View File
@@ -91,11 +91,20 @@ class GatewayStreamConsumer:
chat_id: str,
config: Optional[StreamConsumerConfig] = None,
metadata: Optional[dict] = None,
on_new_message: Optional[callable] = None,
):
self.adapter = adapter
self.chat_id = chat_id
self.cfg = config or StreamConsumerConfig()
self.metadata = metadata
# Fired whenever a fresh content bubble is created on the platform
# (first-send of a new message, commentary, overflow chunk, or
# fallback continuation). The gateway uses this to linearize the
# tool-progress bubble: when content resumes after a tool batch,
# the next tool.started should open a NEW progress bubble below
# the content, not edit the old bubble above it.
# Called with no arguments. Exceptions are swallowed.
self._on_new_message = on_new_message
self._queue: queue.Queue = queue.Queue()
self._accumulated = ""
self._message_id: Optional[str] = None
@@ -146,6 +155,16 @@ class GatewayStreamConsumer:
if text:
self._queue.put((_COMMENTARY, text))
def _notify_new_message(self) -> None:
"""Fire the on_new_message callback, swallowing any errors."""
cb = self._on_new_message
if cb is None:
return
try:
cb()
except Exception:
logger.debug("on_new_message callback error", exc_info=True)
def _reset_segment_state(self, *, preserve_no_edit: bool = False) -> None:
if preserve_no_edit and self._message_id == "__no_edit__":
return
@@ -529,6 +548,9 @@ class GatewayStreamConsumer:
self._message_id = str(result.message_id)
self._already_sent = True
self._last_sent_text = text
# Fresh content bubble — close off any stale tool bubble
# above so the next tool starts a new bubble below.
self._notify_new_message()
return str(result.message_id)
else:
self._edit_supported = False
@@ -661,6 +683,9 @@ class GatewayStreamConsumer:
sent_any_chunk = True
last_successful_chunk = chunk
last_message_id = result.message_id or last_message_id
# Each fallback chunk is a fresh platform message — notify
# so any stale tool-progress bubble gets closed off.
self._notify_new_message()
self._message_id = last_message_id
self._already_sent = True
@@ -744,6 +769,11 @@ class GatewayStreamConsumer:
# tool..."), not the final response. Setting already_sent would cause
# the final response to be incorrectly suppressed when there are
# multiple tool calls. See: https://github.com/NousResearch/hermes-agent/issues/10454
if result.success:
# Commentary counts as fresh content — close off any
# stale tool bubble above it so the next tool starts a
# new bubble below.
self._notify_new_message()
return result.success
except Exception as e:
logger.error("Commentary send error: %s", e)
@@ -973,6 +1003,11 @@ class GatewayStreamConsumer:
# every delta/tool boundary when platforms accept a
# message but do not return an editable message id.
self._message_id = "__no_edit__"
# Notify the gateway that a fresh content bubble was
# created so any accumulated tool-progress bubble above
# gets closed off — the next tool fires into a new
# bubble below, preserving chronological order.
self._notify_new_message()
return True
else:
# Initial send failed — disable streaming for this session
+380 -5
View File
@@ -43,6 +43,7 @@ import yaml
from hermes_cli.config import get_hermes_home, get_config_path, read_raw_config
from hermes_constants import OPENROUTER_BASE_URL
from utils import atomic_replace
logger = logging.getLogger(__name__)
@@ -71,6 +72,14 @@ DEFAULT_AGENT_KEY_MIN_TTL_SECONDS = 30 * 60 # 30 minutes
ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120 # refresh 2 min before expiry
DEVICE_AUTH_POLL_INTERVAL_CAP_SECONDS = 1 # poll at most every 1s
DEFAULT_CODEX_BASE_URL = "https://chatgpt.com/backend-api/codex"
MINIMAX_OAUTH_CLIENT_ID = "78257093-7e40-4613-99e0-527b14b39113"
MINIMAX_OAUTH_SCOPE = "group_id profile model.completion"
MINIMAX_OAUTH_GRANT_TYPE = "urn:ietf:params:oauth:grant-type:user_code"
MINIMAX_OAUTH_GLOBAL_BASE = "https://api.minimax.io"
MINIMAX_OAUTH_CN_BASE = "https://api.minimaxi.com"
MINIMAX_OAUTH_GLOBAL_INFERENCE = "https://api.minimax.io/anthropic"
MINIMAX_OAUTH_CN_INFERENCE = "https://api.minimaxi.com/anthropic"
MINIMAX_OAUTH_REFRESH_SKEW_SECONDS = 60
DEFAULT_QWEN_BASE_URL = "https://portal.qwen.ai/v1"
DEFAULT_GITHUB_MODELS_BASE_URL = "https://api.githubcopilot.com"
DEFAULT_COPILOT_ACP_BASE_URL = "acp://copilot"
@@ -109,6 +118,12 @@ SERVICE_PROVIDER_NAMES: Dict[str, str] = {
DEFAULT_GEMINI_CLOUDCODE_BASE_URL = "cloudcode-pa://google"
GEMINI_OAUTH_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 60 # refresh 60s before expiry
# LM Studio's default no-auth mode still requires *some* non-empty bearer for
# the API-key code paths (auxiliary_client, runtime resolver) to treat the
# provider as configured. This sentinel is sent only to LM Studio, never to
# any remote service.
LMSTUDIO_NOAUTH_PLACEHOLDER = "dummy-lm-api-key"
# =============================================================================
# Provider Registry
@@ -119,7 +134,7 @@ class ProviderConfig:
"""Describes a known inference provider."""
id: str
name: str
auth_type: str # "oauth_device_code", "oauth_external", or "api_key"
auth_type: str # "oauth_device_code", "oauth_external", "oauth_minimax", or "api_key"
portal_base_url: str = ""
inference_base_url: str = ""
client_id: str = ""
@@ -159,6 +174,14 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
auth_type="oauth_external",
inference_base_url=DEFAULT_GEMINI_CLOUDCODE_BASE_URL,
),
"lmstudio": ProviderConfig(
id="lmstudio",
name="LM Studio",
auth_type="api_key",
inference_base_url="http://127.0.0.1:1234/v1",
api_key_env_vars=("LM_API_KEY",),
base_url_env_var="LM_BASE_URL",
),
"copilot": ProviderConfig(
id="copilot",
name="GitHub Copilot",
@@ -240,6 +263,17 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
api_key_env_vars=("MINIMAX_API_KEY",),
base_url_env_var="MINIMAX_BASE_URL",
),
"minimax-oauth": ProviderConfig(
id="minimax-oauth",
name="MiniMax (OAuth \u00b7 minimax.io)",
auth_type="oauth_minimax",
portal_base_url=MINIMAX_OAUTH_GLOBAL_BASE,
inference_base_url=MINIMAX_OAUTH_GLOBAL_INFERENCE,
client_id=MINIMAX_OAUTH_CLIENT_ID,
scope=MINIMAX_OAUTH_SCOPE,
extra={"region": "global", "cn_portal_base_url": MINIMAX_OAUTH_CN_BASE,
"cn_inference_base_url": MINIMAX_OAUTH_CN_INFERENCE},
),
"anthropic": ProviderConfig(
id="anthropic",
name="Anthropic",
@@ -348,6 +382,14 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
api_key_env_vars=("XIAOMI_API_KEY",),
base_url_env_var="XIAOMI_BASE_URL",
),
"tencent-tokenhub": ProviderConfig(
id="tencent-tokenhub",
name="Tencent TokenHub",
auth_type="api_key",
inference_base_url="https://tokenhub.tencentmaas.com/v1",
api_key_env_vars=("TOKENHUB_API_KEY",),
base_url_env_var="TOKENHUB_BASE_URL",
),
"ollama-cloud": ProviderConfig(
id="ollama-cloud",
name="Ollama Cloud",
@@ -820,7 +862,7 @@ def _save_auth_store(auth_store: Dict[str, Any]) -> Path:
handle.write(payload)
handle.flush()
os.fsync(handle.fileno())
os.replace(tmp_path, auth_file)
atomic_replace(tmp_path, auth_file)
try:
dir_fd = os.open(str(auth_file.parent), os.O_RDONLY)
except OSError:
@@ -1130,6 +1172,7 @@ def resolve_provider(
"arcee-ai": "arcee", "arceeai": "arcee",
"gmi-cloud": "gmi", "gmicloud": "gmi",
"minimax-china": "minimax-cn", "minimax_cn": "minimax-cn",
"minimax-portal": "minimax-oauth", "minimax-global": "minimax-oauth", "minimax_oauth": "minimax-oauth",
"alibaba_coding": "alibaba-coding-plan", "alibaba-coding": "alibaba-coding-plan",
"alibaba_coding_plan": "alibaba-coding-plan",
"claude": "anthropic", "claude-code": "anthropic",
@@ -1141,11 +1184,13 @@ def resolve_provider(
"qwen-portal": "qwen-oauth", "qwen-cli": "qwen-oauth", "qwen-oauth": "qwen-oauth", "google-gemini-cli": "google-gemini-cli", "gemini-cli": "google-gemini-cli", "gemini-oauth": "google-gemini-cli",
"hf": "huggingface", "hugging-face": "huggingface", "huggingface-hub": "huggingface",
"mimo": "xiaomi", "xiaomi-mimo": "xiaomi",
"tencent": "tencent-tokenhub", "tokenhub": "tencent-tokenhub",
"tencent-cloud": "tencent-tokenhub", "tencentmaas": "tencent-tokenhub",
"aws": "bedrock", "aws-bedrock": "bedrock", "amazon-bedrock": "bedrock", "amazon": "bedrock",
"go": "opencode-go", "opencode-go-sub": "opencode-go",
"kilo": "kilocode", "kilo-code": "kilocode", "kilo-gateway": "kilocode",
"lmstudio": "lmstudio", "lm-studio": "lmstudio", "lm_studio": "lmstudio",
# Local server aliases — route through the generic custom provider
"lmstudio": "custom", "lm-studio": "custom", "lm_studio": "custom",
"ollama": "custom", "ollama_cloud": "ollama-cloud",
"vllm": "custom", "llamacpp": "custom",
"llama.cpp": "custom", "llama-cpp": "custom",
@@ -1192,8 +1237,11 @@ def resolve_provider(
continue
# GitHub tokens are commonly present for repo/tool access but should not
# hijack inference auto-selection unless the user explicitly chooses
# Copilot/GitHub Models as the provider.
if pid == "copilot":
# Copilot/GitHub Models as the provider. LM Studio is a local server
# whose availability isn't implied by LM_API_KEY presence (it may be
# offline, and the no-auth setup uses a placeholder value), so it
# also requires explicit selection.
if pid in ("copilot", "lmstudio"):
continue
for env_var in pconfig.api_key_env_vars:
if has_usable_secret(os.getenv(env_var, "")):
@@ -3471,6 +3519,13 @@ def resolve_api_key_provider_credentials(provider_id: str) -> Dict[str, Any]:
key_source = ""
api_key, key_source = _resolve_api_key_provider_secret(provider_id, pconfig)
# No-auth LM Studio: substitute a placeholder so runtime / auxiliary_client
# see the local server as configured. doctor still reports unconfigured
# because get_api_key_provider_status uses the raw secret resolver.
if not api_key and provider_id == "lmstudio":
api_key = LMSTUDIO_NOAUTH_PLACEHOLDER
key_source = key_source or "default"
env_url = ""
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
@@ -4081,6 +4136,326 @@ def _codex_device_code_login() -> Dict[str, Any]:
}
# ==================== MiniMax Portal OAuth ====================
def _minimax_pkce_pair() -> tuple:
"""Generate (code_verifier, code_challenge_S256, state) for MiniMax OAuth."""
import secrets
verifier = secrets.token_urlsafe(64)[:96]
challenge = base64.urlsafe_b64encode(
hashlib.sha256(verifier.encode()).digest()
).decode().rstrip("=")
state = secrets.token_urlsafe(16)
return verifier, challenge, state
def _minimax_request_user_code(
client: httpx.Client, *, portal_base_url: str, client_id: str,
code_challenge: str, state: str,
) -> Dict[str, Any]:
response = client.post(
f"{portal_base_url}/oauth/code",
data={
"response_type": "code",
"client_id": client_id,
"scope": MINIMAX_OAUTH_SCOPE,
"code_challenge": code_challenge,
"code_challenge_method": "S256",
"state": state,
},
headers={
"Content-Type": "application/x-www-form-urlencoded",
"Accept": "application/json",
"x-request-id": str(uuid.uuid4()),
},
)
if response.status_code != 200:
raise AuthError(
f"MiniMax OAuth authorization failed: {response.text or response.reason_phrase}",
provider="minimax-oauth", code="authorization_failed",
)
payload = response.json()
for field in ("user_code", "verification_uri", "expired_in"):
if field not in payload:
raise AuthError(
f"MiniMax OAuth response missing field: {field}",
provider="minimax-oauth", code="authorization_incomplete",
)
if payload.get("state") != state:
raise AuthError(
"MiniMax OAuth state mismatch (possible CSRF).",
provider="minimax-oauth", code="state_mismatch",
)
return payload
def _minimax_poll_token(
client: httpx.Client, *, portal_base_url: str, client_id: str,
user_code: str, code_verifier: str, expired_in: int, interval_ms: Optional[int],
) -> Dict[str, Any]:
# OpenClaw treats expired_in as a unix-ms timestamp (Date.now() < expireTimeMs).
# Defensive parsing: if it's small enough to be a duration, treat as seconds.
import time as _time
now_ms = int(_time.time() * 1000)
if expired_in > now_ms // 2:
# Looks like a unix-ms timestamp.
deadline = expired_in / 1000.0
else:
# Treat as duration in seconds from now.
deadline = _time.time() + max(1, expired_in)
interval = max(2.0, (interval_ms or 2000) / 1000.0)
while _time.time() < deadline:
response = client.post(
f"{portal_base_url}/oauth/token",
data={
"grant_type": MINIMAX_OAUTH_GRANT_TYPE,
"client_id": client_id,
"user_code": user_code,
"code_verifier": code_verifier,
},
headers={
"Content-Type": "application/x-www-form-urlencoded",
"Accept": "application/json",
},
)
try:
payload = response.json() if response.text else {}
except Exception:
payload = {}
if response.status_code != 200:
msg = (payload.get("base_resp", {}) or {}).get("status_msg") or response.text
raise AuthError(
f"MiniMax OAuth error: {msg or 'unknown'}",
provider="minimax-oauth", code="token_exchange_failed",
)
status = payload.get("status")
if status == "error":
raise AuthError(
"MiniMax OAuth reported an error. Please try again later.",
provider="minimax-oauth", code="authorization_denied",
)
if status == "success":
if not all(payload.get(k) for k in ("access_token", "refresh_token", "expired_in")):
raise AuthError(
"MiniMax OAuth success payload missing required token fields.",
provider="minimax-oauth", code="token_incomplete",
)
return payload
# "pending" or any other status -> keep polling
_time.sleep(interval)
raise AuthError(
"MiniMax OAuth timed out before authorization completed.",
provider="minimax-oauth", code="timeout",
)
def _minimax_save_auth_state(auth_state: Dict[str, Any]) -> None:
"""Persist MiniMax OAuth state to Hermes auth store (~/.hermes/auth.json)."""
with _auth_store_lock():
auth_store = _load_auth_store()
_save_provider_state(auth_store, "minimax-oauth", auth_state)
_save_auth_store(auth_store)
def _minimax_oauth_login(
*, region: str = "global", open_browser: bool = True,
timeout_seconds: float = 15.0,
) -> Dict[str, Any]:
"""Run MiniMax OAuth flow, persist tokens, return auth state dict."""
pconfig = PROVIDER_REGISTRY["minimax-oauth"]
if region == "cn":
portal_base_url = pconfig.extra["cn_portal_base_url"]
inference_base_url = pconfig.extra["cn_inference_base_url"]
else:
portal_base_url = pconfig.portal_base_url
inference_base_url = pconfig.inference_base_url
verifier, challenge, state = _minimax_pkce_pair()
if _is_remote_session():
open_browser = False
print(f"Starting Hermes login via MiniMax ({region}) OAuth...")
print(f"Portal: {portal_base_url}")
with httpx.Client(timeout=httpx.Timeout(timeout_seconds),
headers={"Accept": "application/json"}) as client:
code_data = _minimax_request_user_code(
client, portal_base_url=portal_base_url,
client_id=pconfig.client_id,
code_challenge=challenge, state=state,
)
verification_url = str(code_data["verification_uri"])
user_code = str(code_data["user_code"])
print()
print("To continue:")
print(f" 1. Open: {verification_url}")
print(f" 2. If prompted, enter code: {user_code}")
if open_browser:
if webbrowser.open(verification_url):
print(" (Opened browser for verification)")
else:
print(" Could not open browser automatically -- use the URL above.")
interval_raw = code_data.get("interval")
interval_ms = int(interval_raw) if interval_raw is not None else None
print("Waiting for approval...")
token_data = _minimax_poll_token(
client, portal_base_url=portal_base_url,
client_id=pconfig.client_id,
user_code=user_code, code_verifier=verifier,
expired_in=int(code_data["expired_in"]),
interval_ms=interval_ms,
)
now = datetime.now(timezone.utc)
expires_in_s = int(token_data["expired_in"])
expires_at = now.timestamp() + expires_in_s
auth_state = {
"provider": "minimax-oauth",
"region": region,
"portal_base_url": portal_base_url,
"inference_base_url": inference_base_url,
"client_id": pconfig.client_id,
"scope": MINIMAX_OAUTH_SCOPE,
"token_type": token_data.get("token_type", "Bearer"),
"access_token": token_data["access_token"],
"refresh_token": token_data["refresh_token"],
"resource_url": token_data.get("resource_url"),
"obtained_at": now.isoformat(),
"expires_at": datetime.fromtimestamp(expires_at, tz=timezone.utc).isoformat(),
"expires_in": expires_in_s,
}
_minimax_save_auth_state(auth_state)
print("\u2713 MiniMax OAuth login successful.")
if msg := token_data.get("notification_message"):
print(f"Note from MiniMax: {msg}")
return auth_state
def _refresh_minimax_oauth_state(
state: Dict[str, Any], *, timeout_seconds: float = 15.0,
force: bool = False,
) -> Dict[str, Any]:
"""Refresh MiniMax OAuth access token if close to expiry (or forced)."""
if not state.get("refresh_token"):
raise AuthError(
"MiniMax OAuth state has no refresh_token; please re-login.",
provider="minimax-oauth", code="no_refresh_token", relogin_required=True,
)
try:
expires_at = datetime.fromisoformat(state.get("expires_at", "")).timestamp()
except Exception:
expires_at = 0.0
now = time.time()
if not force and (expires_at - now) > MINIMAX_OAUTH_REFRESH_SKEW_SECONDS:
return state
portal_base_url = state["portal_base_url"]
with httpx.Client(timeout=httpx.Timeout(timeout_seconds)) as client:
response = client.post(
f"{portal_base_url}/oauth/token",
data={
"grant_type": "refresh_token",
"client_id": state["client_id"],
"refresh_token": state["refresh_token"],
},
headers={
"Content-Type": "application/x-www-form-urlencoded",
"Accept": "application/json",
},
)
if response.status_code != 200:
body = response.text.lower()
relogin = any(m in body for m in
("invalid_grant", "refresh_token_reused", "invalid_refresh_token"))
raise AuthError(
f"MiniMax OAuth refresh failed: {response.text or response.reason_phrase}",
provider="minimax-oauth", code="refresh_failed",
relogin_required=relogin,
)
payload = response.json()
if payload.get("status") != "success":
raise AuthError(
"MiniMax OAuth refresh did not return success.",
provider="minimax-oauth", code="refresh_failed",
relogin_required=True,
)
now_dt = datetime.now(timezone.utc)
expires_in_s = int(payload["expired_in"])
new_state = dict(state)
new_state.update({
"access_token": payload["access_token"],
"refresh_token": payload.get("refresh_token", state["refresh_token"]),
"obtained_at": now_dt.isoformat(),
"expires_at": datetime.fromtimestamp(now_dt.timestamp() + expires_in_s,
tz=timezone.utc).isoformat(),
"expires_in": expires_in_s,
})
_minimax_save_auth_state(new_state)
return new_state
def resolve_minimax_oauth_runtime_credentials(
*, min_token_ttl_seconds: int = MINIMAX_OAUTH_REFRESH_SKEW_SECONDS,
) -> Dict[str, Any]:
"""Return {provider, api_key, base_url, source} for minimax-oauth."""
state = get_provider_auth_state("minimax-oauth")
if not state or not state.get("access_token"):
raise AuthError(
"Not logged into MiniMax OAuth. Run `hermes model` and select "
"MiniMax (OAuth).",
provider="minimax-oauth", code="not_logged_in", relogin_required=True,
)
state = _refresh_minimax_oauth_state(state)
return {
"provider": "minimax-oauth",
"api_key": state["access_token"],
"base_url": state["inference_base_url"].rstrip("/"),
"source": "oauth",
}
def get_minimax_oauth_auth_status() -> Dict[str, Any]:
"""Return auth status dict for MiniMax OAuth provider."""
state = get_provider_auth_state("minimax-oauth")
if not state or not state.get("access_token"):
return {"logged_in": False, "provider": "minimax-oauth"}
try:
expires_at = datetime.fromisoformat(state.get("expires_at", "")).timestamp()
token_valid = (expires_at - time.time()) > 0
except Exception:
token_valid = bool(state.get("access_token"))
return {
"logged_in": token_valid,
"provider": "minimax-oauth",
"region": state.get("region", "global"),
"expires_at": state.get("expires_at"),
}
def _login_minimax_oauth(args, pconfig: ProviderConfig) -> None:
"""CLI entry for MiniMax OAuth login."""
region = getattr(args, "region", None) or "global"
open_browser = not getattr(args, "no_browser", False)
timeout = getattr(args, "timeout", None) or 15.0
try:
_minimax_oauth_login(
region=region, open_browser=open_browser, timeout_seconds=timeout,
)
except AuthError as exc:
print(format_auth_error(exc))
raise SystemExit(1)
def _nous_device_code_login(
*,
portal_base_url: Optional[str] = None,
+23 -2
View File
@@ -33,7 +33,7 @@ from hermes_constants import OPENROUTER_BASE_URL
# Providers that support OAuth login in addition to API keys.
_OAUTH_CAPABLE_PROVIDERS = {"anthropic", "nous", "openai-codex", "qwen-oauth", "google-gemini-cli"}
_OAUTH_CAPABLE_PROVIDERS = {"anthropic", "nous", "openai-codex", "qwen-oauth", "google-gemini-cli", "minimax-oauth"}
def _get_custom_provider_names() -> list:
@@ -170,7 +170,7 @@ def auth_add_command(args) -> None:
if provider.startswith(CUSTOM_POOL_PREFIX):
requested_type = AUTH_TYPE_API_KEY
else:
requested_type = AUTH_TYPE_OAUTH if provider in {"anthropic", "nous", "openai-codex", "qwen-oauth", "google-gemini-cli"} else AUTH_TYPE_API_KEY
requested_type = AUTH_TYPE_OAUTH if provider in {"anthropic", "nous", "openai-codex", "qwen-oauth", "google-gemini-cli", "minimax-oauth"} else AUTH_TYPE_API_KEY
pool = load_pool(provider)
@@ -333,6 +333,27 @@ def auth_add_command(args) -> None:
print(f'Added {provider} OAuth credential #{len(pool.entries())}: "{entry.label}"')
return
if provider == "minimax-oauth":
from hermes_cli.auth import resolve_minimax_oauth_runtime_credentials
creds = resolve_minimax_oauth_runtime_credentials()
label = (getattr(args, "label", None) or "").strip() or label_from_token(
creds["api_key"],
_oauth_default_label(provider, len(pool.entries()) + 1),
)
entry = PooledCredential(
provider=provider,
id=uuid.uuid4().hex[:6],
label=label,
auth_type=AUTH_TYPE_OAUTH,
priority=0,
source=f"{SOURCE_MANUAL}:minimax_oauth",
access_token=creds["api_key"],
base_url=creds.get("base_url"),
)
pool.add_entry(entry)
print(f'Added {provider} OAuth credential #{len(pool.entries())}: "{entry.label}"')
return
raise SystemExit(f"`hermes auth add {provider}` is not implemented for auth type {requested_type} yet.")
+1 -1
View File
@@ -34,7 +34,7 @@ from dataclasses import dataclass, field
from typing import Optional
from urllib import request as urllib_request
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse, urlunparse
from urllib.parse import urlparse
logger = logging.getLogger(__name__)
+152 -57
View File
@@ -696,6 +696,78 @@ def run_quick_backup(args) -> None:
print("No state files found to snapshot.")
# ---------------------------------------------------------------------------
# Shared full-zip backup helper
# ---------------------------------------------------------------------------
def _write_full_zip_backup(out_path: Path, hermes_root: Path) -> Optional[Path]:
"""Write a full zip snapshot of ``hermes_root`` to ``out_path``.
Uses the same exclusion rules and SQLite safe-copy as :func:`run_backup`.
Returns the output path on success, None on failure (nothing to back up,
or write error caller should surface the outcome but not raise).
"""
files_to_add: list[tuple[Path, Path]] = []
try:
for dirpath, dirnames, filenames in os.walk(hermes_root, followlinks=False):
dp = Path(dirpath)
# Prune excluded directories in-place so os.walk doesn't descend
dirnames[:] = [d for d in dirnames if d not in _EXCLUDED_DIRS]
for fname in filenames:
fpath = dp / fname
try:
rel = fpath.relative_to(hermes_root)
except ValueError:
continue
if _should_exclude(rel):
continue
# Skip the output zip itself if it already exists inside root.
try:
if fpath.resolve() == out_path.resolve():
continue
except (OSError, ValueError):
pass
files_to_add.append((fpath, rel))
except OSError as exc:
logger.warning("Full-zip backup: walk failed: %s", exc)
return None
if not files_to_add:
return None
try:
with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED, compresslevel=6) as zf:
for abs_path, rel_path in files_to_add:
try:
if abs_path.suffix == ".db":
with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp:
tmp_db = Path(tmp.name)
try:
if _safe_copy_db(abs_path, tmp_db):
zf.write(tmp_db, arcname=str(rel_path))
finally:
tmp_db.unlink(missing_ok=True)
else:
zf.write(abs_path, arcname=str(rel_path))
except (PermissionError, OSError, ValueError) as exc:
logger.debug("Skipping %s in zip backup: %s", rel_path, exc)
continue
except OSError as exc:
logger.warning("Full-zip backup: zip write failed: %s", exc)
# Best-effort cleanup of partial file
try:
out_path.unlink(missing_ok=True)
except OSError:
pass
return None
return out_path
# ---------------------------------------------------------------------------
# Pre-update auto-backup
# ---------------------------------------------------------------------------
@@ -768,64 +840,87 @@ def create_pre_update_backup(
stamp = datetime.now().strftime("%Y-%m-%d-%H%M%S")
out_path = backup_dir / f"{_PRE_UPDATE_PREFIX}{stamp}.zip"
# Collect files (same logic as run_backup, minus the chatty progress prints)
files_to_add: list[tuple[Path, Path]] = []
try:
for dirpath, dirnames, filenames in os.walk(hermes_root, followlinks=False):
dp = Path(dirpath)
# Prune excluded directories in-place so os.walk doesn't descend
dirnames[:] = [d for d in dirnames if d not in _EXCLUDED_DIRS]
for fname in filenames:
fpath = dp / fname
try:
rel = fpath.relative_to(hermes_root)
except ValueError:
continue
if _should_exclude(rel):
continue
# Skip the output zip itself if it already exists
try:
if fpath.resolve() == out_path.resolve():
continue
except (OSError, ValueError):
pass
files_to_add.append((fpath, rel))
except OSError as exc:
logger.warning("Pre-update backup: walk failed: %s", exc)
return None
if not files_to_add:
return None
try:
with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED, compresslevel=6) as zf:
for abs_path, rel_path in files_to_add:
try:
if abs_path.suffix == ".db":
with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp:
tmp_db = Path(tmp.name)
try:
if _safe_copy_db(abs_path, tmp_db):
zf.write(tmp_db, arcname=str(rel_path))
finally:
tmp_db.unlink(missing_ok=True)
else:
zf.write(abs_path, arcname=str(rel_path))
except (PermissionError, OSError, ValueError) as exc:
logger.debug("Skipping %s in pre-update backup: %s", rel_path, exc)
continue
except OSError as exc:
logger.warning("Pre-update backup: zip write failed: %s", exc)
# Best-effort cleanup of partial file
try:
out_path.unlink(missing_ok=True)
except OSError:
pass
result = _write_full_zip_backup(out_path, hermes_root)
if result is None:
return None
_prune_pre_update_backups(backup_dir, keep=keep)
return out_path
# ---------------------------------------------------------------------------
# Pre-migration auto-backup (used by `hermes claw migrate`)
# ---------------------------------------------------------------------------
_PRE_MIGRATION_PREFIX = "pre-migration-"
_PRE_MIGRATION_DEFAULT_KEEP = 5
def _prune_pre_migration_backups(backup_dir: Path, keep: int) -> int:
"""Remove oldest pre-migration backups beyond the keep limit.
Only touches files matching ``pre-migration-*.zip`` so other backups in
the same directory are never touched.
"""
if keep < 0:
keep = 0
if not backup_dir.exists():
return 0
backups = sorted(
(p for p in backup_dir.iterdir()
if p.is_file() and p.name.startswith(_PRE_MIGRATION_PREFIX) and p.suffix.lower() == ".zip"),
key=lambda p: p.name,
reverse=True,
)
deleted = 0
for p in backups[keep:]:
try:
p.unlink()
deleted += 1
except OSError as exc:
logger.warning("Failed to prune pre-migration backup %s: %s", p.name, exc)
return deleted
def create_pre_migration_backup(
hermes_home: Optional[Path] = None,
keep: int = _PRE_MIGRATION_DEFAULT_KEEP,
) -> Optional[Path]:
"""Create a full zip backup of HERMES_HOME under ``backups/`` before a
``hermes claw migrate`` apply.
Shares implementation with :func:`create_pre_update_backup` via
``_write_full_zip_backup`` same exclusions, same SQLite safe-copy,
restorable with ``hermes import <archive>``. Writes to
``<HERMES_HOME>/backups/pre-migration-<timestamp>.zip`` and auto-prunes
old pre-migration backups.
Returns the path to the created zip, or ``None`` if nothing was found
to back up (fresh install) or the write failed. Never raises the
caller decides whether to abort or proceed.
"""
hermes_root = hermes_home or get_default_hermes_root()
if not hermes_root.is_dir():
return None
# Reuses the shared backups/ directory so `hermes import` and the
# update-backup listing pick up pre-migration archives too.
backup_dir = _pre_update_backup_dir(hermes_root)
try:
backup_dir.mkdir(parents=True, exist_ok=True)
except OSError as exc:
logger.warning("Could not create pre-migration backup dir %s: %s", backup_dir, exc)
return None
stamp = datetime.now().strftime("%Y-%m-%d-%H%M%S")
out_path = backup_dir / f"{_PRE_MIGRATION_PREFIX}{stamp}.zip"
result = _write_full_zip_backup(out_path, hermes_root)
if result is None:
return None
_prune_pre_migration_backups(backup_dir, keep=keep)
return out_path
-1
View File
@@ -562,7 +562,6 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
right_content = "\n".join(right_lines)
layout_table.add_row(left_content, right_content)
agent_name = _skin_branding("agent_name", "Hermes Agent")
title_color = _skin_color("banner_title", "#FFD700")
border_color = _skin_color("banner_border", "#CD7F32")
version_label = format_banner_version_label()
+138
View File
@@ -0,0 +1,138 @@
"""Shared helpers for attaching Hermes to a local Chrome CDP port."""
from __future__ import annotations
import os
import platform
import shlex
import shutil
import subprocess
from hermes_constants import get_hermes_home
DEFAULT_BROWSER_CDP_PORT = 9222
DEFAULT_BROWSER_CDP_URL = f"http://127.0.0.1:{DEFAULT_BROWSER_CDP_PORT}"
_DARWIN_APPS = (
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
"/Applications/Chromium.app/Contents/MacOS/Chromium",
"/Applications/Brave Browser.app/Contents/MacOS/Brave Browser",
"/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge",
)
_WINDOWS_INSTALL_PARTS = (
("Google", "Chrome", "Application", "chrome.exe"),
("Chromium", "Application", "chrome.exe"),
("Chromium", "Application", "chromium.exe"),
("BraveSoftware", "Brave-Browser", "Application", "brave.exe"),
("Microsoft", "Edge", "Application", "msedge.exe"),
)
_LINUX_BIN_NAMES = (
"google-chrome", "google-chrome-stable", "chromium-browser",
"chromium", "brave-browser", "microsoft-edge",
)
_WINDOWS_BIN_NAMES = (
"chrome.exe", "msedge.exe", "brave.exe", "chromium.exe",
"chrome", "msedge", "brave", "chromium",
)
def get_chrome_debug_candidates(system: str) -> list[str]:
candidates: list[str] = []
seen: set[str] = set()
def add(path: str | None) -> None:
if not path:
return
normalized = os.path.normcase(os.path.normpath(path))
if normalized in seen or not os.path.isfile(path):
return
candidates.append(path)
seen.add(normalized)
def add_install_paths(bases: tuple[str | None, ...]) -> None:
for base in filter(None, bases):
for parts in _WINDOWS_INSTALL_PARTS:
add(os.path.join(base, *parts))
if system == "Darwin":
for app in _DARWIN_APPS:
add(app)
return candidates
if system == "Windows":
for name in _WINDOWS_BIN_NAMES:
add(shutil.which(name))
add_install_paths((
os.environ.get("ProgramFiles"),
os.environ.get("ProgramFiles(x86)"),
os.environ.get("LOCALAPPDATA"),
))
return candidates
for name in _LINUX_BIN_NAMES:
add(shutil.which(name))
add_install_paths(("/mnt/c/Program Files", "/mnt/c/Program Files (x86)"))
return candidates
def chrome_debug_data_dir() -> str:
return str(get_hermes_home() / "chrome-debug")
def _chrome_debug_args(port: int) -> list[str]:
return [
f"--remote-debugging-port={port}",
f"--user-data-dir={chrome_debug_data_dir()}",
"--no-first-run",
"--no-default-browser-check",
]
def manual_chrome_debug_command(port: int = DEFAULT_BROWSER_CDP_PORT, system: str | None = None) -> str | None:
system = system or platform.system()
candidates = get_chrome_debug_candidates(system)
if candidates:
argv = [candidates[0], *_chrome_debug_args(port)]
return subprocess.list2cmdline(argv) if system == "Windows" else shlex.join(argv)
if system == "Darwin":
data_dir = chrome_debug_data_dir()
return (
f'open -a "Google Chrome" --args --remote-debugging-port={port} '
f'--user-data-dir="{data_dir}" --no-first-run --no-default-browser-check'
)
return None
def _detach_kwargs(system: str) -> dict:
if system != "Windows":
return {"start_new_session": True}
flags = getattr(subprocess, "DETACHED_PROCESS", 0) | getattr(
subprocess, "CREATE_NEW_PROCESS_GROUP", 0
)
return {"creationflags": flags} if flags else {}
def try_launch_chrome_debug(port: int = DEFAULT_BROWSER_CDP_PORT, system: str | None = None) -> bool:
system = system or platform.system()
candidates = get_chrome_debug_candidates(system)
if not candidates:
return False
os.makedirs(chrome_debug_data_dir(), exist_ok=True)
try:
subprocess.Popen(
[candidates[0], *_chrome_debug_args(port)],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
**_detach_kwargs(system),
)
return True
except Exception:
return False
+67 -6
View File
@@ -4,7 +4,8 @@ Usage:
hermes claw migrate # Preview then migrate (always shows preview first)
hermes claw migrate --dry-run # Preview only, no changes
hermes claw migrate --yes # Skip confirmation prompt
hermes claw migrate --preset full --overwrite # Full migration, overwrite conflicts
hermes claw migrate --preset full --overwrite --migrate-secrets # Full run w/ secrets
hermes claw migrate --no-backup # Skip pre-migration snapshot
hermes claw cleanup # Archive leftover OpenClaw directories
hermes claw cleanup --dry-run # Preview what would be archived
"""
@@ -15,6 +16,7 @@ import subprocess
import sys
from datetime import datetime
from pathlib import Path
from typing import Optional
from hermes_cli.config import get_hermes_home, get_config_path, load_config, save_config
from hermes_constants import get_optional_skills_dir
@@ -321,10 +323,13 @@ def _cmd_migrate(args):
migrate_secrets = getattr(args, "migrate_secrets", False)
workspace_target = getattr(args, "workspace_target", None)
skill_conflict = getattr(args, "skill_conflict", "skip")
no_backup = getattr(args, "no_backup", False)
# If using the "full" preset, secrets are included by default
if preset == "full":
migrate_secrets = True
# Secrets are never included implicitly — they must be explicitly requested
# via --migrate-secrets, even under --preset full. This mirrors OpenClaw's
# migrate-hermes posture (two-phase: run once without secrets, rerun with
# --include-secrets) and prevents a --preset full invocation from silently
# importing API keys that the user may not have intended to copy.
print()
print(
@@ -431,15 +436,24 @@ def _cmd_migrate(args):
preview_summary = preview_report.get("summary", {})
preview_count = preview_summary.get("migrated", 0)
preview_conflicts = preview_summary.get("conflict", 0)
if preview_count == 0:
# "Nothing to migrate" means nothing migrated AND nothing blocked by
# conflicts. If there are conflicts, we still want to show the plan and
# surface the refusal/--overwrite guidance instead of silently bailing.
if preview_count == 0 and preview_conflicts == 0:
print()
print_info("Nothing to migrate from OpenClaw.")
_print_migration_report(preview_report, dry_run=True)
return
print()
print_header(f"Migration Preview — {preview_count} item(s) would be imported")
if preview_count > 0:
print_header(f"Migration Preview — {preview_count} item(s) would be imported")
else:
print_header(
f"Migration Preview — {preview_conflicts} conflict(s), nothing would be imported"
)
print_info("No changes have been made yet. Review the list below:")
_print_migration_report(preview_report, dry_run=True)
@@ -447,6 +461,24 @@ def _cmd_migrate(args):
if dry_run:
return
# ── Phase 1b: Refuse if the plan has conflicts and --overwrite is not set ─
# Modelled on OpenClaw's assertConflictFreePlan() — apply is a safe no-op
# on conflicts unless the user explicitly opts in to overwriting. Without
# this guard, the user would answer "yes, proceed" and silently end up
# with a migration that skipped every conflicting item.
if preview_conflicts > 0 and not overwrite:
print()
print_error(
f"Plan has {preview_conflicts} conflict(s). Refusing to apply."
)
print_info(
"Each conflict is an item whose target already exists in ~/.hermes/. "
"Re-run with --overwrite to replace conflicting targets (item-level "
"backups are written to the migration report directory)."
)
print_info("Or re-run with --dry-run to review the full plan.")
return
# ── Phase 2: Confirm and execute ───────────────────────────
print()
if not auto_yes:
@@ -458,6 +490,32 @@ def _cmd_migrate(args):
print_info("Migration cancelled.")
return
# ── Phase 2b: Pre-apply backup of the Hermes home ─────────
# Delegates to hermes_cli.backup.create_pre_migration_backup(), which
# shares implementation with the pre-update backup (same exclusion
# rules, same SQLite safe-copy, zip format) so the archive is
# restorable with `hermes import`. Mirrors OpenClaw's
# createPreMigrationBackup posture — one atomic restore point before
# any mutation, auto-pruned to the last 5 pre-migration zips.
backup_archive: Optional[Path] = None
if not no_backup:
try:
from hermes_cli.backup import create_pre_migration_backup, _format_size
backup_archive = create_pre_migration_backup(hermes_home=hermes_home)
if backup_archive:
size_str = _format_size(backup_archive.stat().st_size)
print()
print_success(f"Pre-migration backup: {backup_archive} ({size_str})")
print_info(f"Restore with: hermes import {backup_archive.name}")
except Exception as e:
print()
print_error(f"Could not create pre-migration backup: {e}")
print_info(
"Re-run with --no-backup to skip, or free up disk space under the Hermes home."
)
logger.debug("Pre-migration backup error", exc_info=True)
return
try:
migrator = mod.Migrator(
source_root=source_dir.resolve(),
@@ -476,6 +534,9 @@ def _cmd_migrate(args):
print()
print_error(f"Migration failed: {e}")
logger.debug("OpenClaw migration error", exc_info=True)
if backup_archive:
print_info(f"A pre-migration backup is available at: {backup_archive}")
print_info(f"Restore with: hermes import {backup_archive.name}")
return
# Print results
+60 -2
View File
@@ -115,6 +115,9 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("verbose", "Cycle tool progress display: off -> new -> all -> verbose",
"Configuration", cli_only=True,
gateway_config_gate="display.tool_progress_command"),
CommandDef("footer", "Toggle gateway runtime-metadata footer on final replies",
"Configuration", args_hint="[on|off|status]",
subcommands=("on", "off", "status")),
CommandDef("yolo", "Toggle YOLO mode (skip all dangerous command approvals)",
"Configuration"),
CommandDef("reasoning", "Manage reasoning effort and display", "Configuration",
@@ -125,6 +128,9 @@ COMMAND_REGISTRY: list[CommandDef] = [
subcommands=("normal", "fast", "status", "on", "off")),
CommandDef("skin", "Show or change the display skin/theme", "Configuration",
cli_only=True, args_hint="[name]"),
CommandDef("indicator", "Pick the TUI busy-indicator style", "Configuration",
cli_only=True, args_hint="[kaomoji|emoji|unicode|ascii]",
subcommands=("kaomoji", "emoji", "unicode", "ascii")),
CommandDef("voice", "Toggle voice mode", "Configuration",
args_hint="[on|off|tts|status]", subcommands=("on", "off", "tts", "status")),
CommandDef("busy", "Control what Enter does while Hermes is working", "Configuration",
@@ -142,6 +148,9 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("cron", "Manage scheduled tasks", "Tools & Skills",
cli_only=True, args_hint="[subcommand]",
subcommands=("list", "add", "create", "edit", "pause", "resume", "run", "remove")),
CommandDef("curator", "Background skill maintenance (status, run, pin, archive)",
"Tools & Skills", args_hint="[subcommand]",
subcommands=("status", "run", "pause", "resume", "pin", "unpin", "restore")),
CommandDef("reload", "Reload .env variables into the running session", "Tools & Skills",
cli_only=True),
CommandDef("reload-mcp", "Reload MCP servers from config", "Tools & Skills",
@@ -174,8 +183,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("debug", "Upload debug report (system info + logs) and get shareable links", "Info"),
# Exit
CommandDef("quit", "Exit the CLI", "Exit",
cli_only=True, aliases=("exit",)),
CommandDef("quit", "Exit the CLI (use --delete to also remove session history)", "Exit",
cli_only=True, aliases=("exit",), args_hint="[--delete]"),
]
@@ -943,6 +952,42 @@ def slack_subcommand_map() -> dict[str, str]:
# Autocomplete
# ---------------------------------------------------------------------------
# Per-process cache for /model<space> LM Studio autocomplete. Probing on
# every keystroke would block the UI; a short TTL keeps it live without
# hammering the server.
_LMSTUDIO_COMPLETION_CACHE: tuple[float, list[str]] | None = None
def _lmstudio_completion_models() -> list[str]:
"""Locally-loaded LM Studio models for /model autocomplete (cached, gated)."""
global _LMSTUDIO_COMPLETION_CACHE
# Gate: don't probe 127.0.0.1 on every keystroke for users who don't use LM Studio.
if not (os.environ.get("LM_API_KEY") or os.environ.get("LM_BASE_URL")):
try:
from hermes_cli.auth import _load_auth_store
store = _load_auth_store() or {}
if "lmstudio" not in (store.get("providers") or {}) \
and "lmstudio" not in (store.get("credential_pool") or {}):
return []
except Exception:
return []
now = time.time()
if _LMSTUDIO_COMPLETION_CACHE and (now - _LMSTUDIO_COMPLETION_CACHE[0]) < 30.0:
return _LMSTUDIO_COMPLETION_CACHE[1]
try:
from hermes_cli.models import fetch_lmstudio_models
models = fetch_lmstudio_models(
api_key=os.environ.get("LM_API_KEY", ""),
base_url=os.environ.get("LM_BASE_URL") or "http://127.0.0.1:1234/v1",
timeout=0.8,
)
except Exception:
models = []
_LMSTUDIO_COMPLETION_CACHE = (now, models)
return models
class SlashCommandCompleter(Completer):
"""Autocomplete for built-in slash commands, subcommands, and skill commands."""
@@ -1366,6 +1411,19 @@ class SlashCommandCompleter(Completer):
)
except Exception:
pass
# LM Studio: surface locally-loaded models. Gated on the user actually
# having LM Studio configured (env var or auth-store entry) so we
# don't probe 127.0.0.1 on every keystroke for users who don't use it.
for name in _lmstudio_completion_models():
if name in seen:
continue
if name.startswith(sub_lower) and name != sub_lower:
yield Completion(
name,
start_position=-len(sub_text),
display=name,
display_meta="LM Studio",
)
def get_completions(self, document, complete_event):
text = document.text_before_cursor
+337 -43
View File
@@ -30,34 +30,67 @@ logger = logging.getLogger(__name__)
_IS_WINDOWS = platform.system() == "Windows"
_ENV_VAR_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_LAST_EXPANDED_CONFIG_BY_PATH: Dict[str, Any] = {}
# (path, mtime_ns, size) -> cached expanded config dict.
# load_config() returns a deepcopy of the cached value when the file
# hasn't changed since the last load, skipping yaml.safe_load +
# _deep_merge + _normalize_* + _expand_env_vars (~13 ms/call).
# save_config() + migrate_config() write via atomic_yaml_write which
# produces a fresh inode, so stat() sees a new mtime_ns and the next
# load repopulates automatically — no explicit invalidation hook.
_LOAD_CONFIG_CACHE: Dict[str, Tuple[int, int, Dict[str, Any]]] = {}
# (path, mtime_ns, size) -> cached raw yaml dict. Same pattern as
# _LOAD_CONFIG_CACHE but for read_raw_config() — used when callers want
# the user's on-disk values without defaults merged in.
_RAW_CONFIG_CACHE: Dict[str, Tuple[int, int, Dict[str, Any]]] = {}
# Env var names written to .env that aren't in OPTIONAL_ENV_VARS
# (managed by setup/provider flows directly).
_EXTRA_ENV_KEYS = frozenset({
"OPENAI_API_KEY", "OPENAI_BASE_URL",
"ANTHROPIC_API_KEY", "ANTHROPIC_TOKEN",
"DISCORD_HOME_CHANNEL", "TELEGRAM_HOME_CHANNEL",
"DISCORD_HOME_CHANNEL", "DISCORD_HOME_CHANNEL_NAME",
"TELEGRAM_HOME_CHANNEL", "TELEGRAM_HOME_CHANNEL_NAME",
"SLACK_HOME_CHANNEL", "SLACK_HOME_CHANNEL_NAME",
"SIGNAL_ACCOUNT", "SIGNAL_HTTP_URL",
"SIGNAL_ALLOWED_USERS", "SIGNAL_GROUP_ALLOWED_USERS",
"SIGNAL_HOME_CHANNEL", "SIGNAL_HOME_CHANNEL_NAME",
"SMS_HOME_CHANNEL", "SMS_HOME_CHANNEL_NAME",
"DINGTALK_CLIENT_ID", "DINGTALK_CLIENT_SECRET",
"DINGTALK_HOME_CHANNEL", "DINGTALK_HOME_CHANNEL_NAME",
"FEISHU_APP_ID", "FEISHU_APP_SECRET", "FEISHU_ENCRYPT_KEY", "FEISHU_VERIFICATION_TOKEN",
"FEISHU_HOME_CHANNEL", "FEISHU_HOME_CHANNEL_NAME",
"YUANBAO_HOME_CHANNEL", "YUANBAO_HOME_CHANNEL_NAME",
"WECOM_BOT_ID", "WECOM_SECRET",
"WECOM_CALLBACK_CORP_ID", "WECOM_CALLBACK_CORP_SECRET", "WECOM_CALLBACK_AGENT_ID",
"WECOM_CALLBACK_TOKEN", "WECOM_CALLBACK_ENCODING_AES_KEY",
"WECOM_CALLBACK_HOST", "WECOM_CALLBACK_PORT",
"WECOM_HOME_CHANNEL", "WECOM_HOME_CHANNEL_NAME",
"WEIXIN_ACCOUNT_ID", "WEIXIN_TOKEN", "WEIXIN_BASE_URL", "WEIXIN_CDN_BASE_URL",
"WEIXIN_HOME_CHANNEL", "WEIXIN_HOME_CHANNEL_NAME", "WEIXIN_DM_POLICY", "WEIXIN_GROUP_POLICY",
"WEIXIN_ALLOWED_USERS", "WEIXIN_GROUP_ALLOWED_USERS", "WEIXIN_ALLOW_ALL_USERS",
"BLUEBUBBLES_SERVER_URL", "BLUEBUBBLES_PASSWORD",
"BLUEBUBBLES_HOME_CHANNEL", "BLUEBUBBLES_HOME_CHANNEL_NAME",
"QQ_APP_ID", "QQ_CLIENT_SECRET", "QQBOT_HOME_CHANNEL", "QQBOT_HOME_CHANNEL_NAME",
"QQ_HOME_CHANNEL", "QQ_HOME_CHANNEL_NAME", # legacy aliases (pre-rename, still read for back-compat)
"QQ_ALLOWED_USERS", "QQ_GROUP_ALLOWED_USERS", "QQ_ALLOW_ALL_USERS", "QQ_MARKDOWN_SUPPORT",
"QQ_STT_API_KEY", "QQ_STT_BASE_URL", "QQ_STT_MODEL",
"TERMINAL_ENV", "TERMINAL_SSH_KEY", "TERMINAL_SSH_PORT",
"WHATSAPP_MODE", "WHATSAPP_ENABLED",
"MATTERMOST_HOME_CHANNEL", "MATTERMOST_REPLY_MODE",
"MATTERMOST_HOME_CHANNEL", "MATTERMOST_HOME_CHANNEL_NAME", "MATTERMOST_REPLY_MODE",
"MATRIX_PASSWORD", "MATRIX_ENCRYPTION", "MATRIX_DEVICE_ID", "MATRIX_HOME_ROOM",
"MATRIX_REQUIRE_MENTION", "MATRIX_FREE_RESPONSE_ROOMS", "MATRIX_AUTO_THREAD",
"MATRIX_REQUIRE_MENTION", "MATRIX_FREE_RESPONSE_ROOMS", "MATRIX_AUTO_THREAD", "MATRIX_DM_AUTO_THREAD",
"MATRIX_RECOVERY_KEY",
# Langfuse observability plugin — optional tuning keys + standard SDK vars.
# Activation is via plugins.enabled (opt-in through `hermes plugins enable
# observability/langfuse` or `hermes tools → Langfuse`); credentials gate
# the plugin at runtime.
"HERMES_LANGFUSE_ENV",
"HERMES_LANGFUSE_RELEASE",
"HERMES_LANGFUSE_SAMPLE_RATE",
"HERMES_LANGFUSE_MAX_CHARS",
"HERMES_LANGFUSE_DEBUG",
"LANGFUSE_PUBLIC_KEY",
"LANGFUSE_SECRET_KEY",
"LANGFUSE_BASE_URL",
})
import yaml
@@ -206,6 +239,7 @@ def get_container_exec_info() -> Optional[dict]:
# Re-export from hermes_constants — canonical definition lives there.
from hermes_constants import get_hermes_home # noqa: F811,E402
from utils import atomic_replace
def get_config_path() -> Path:
"""Get the main config file path."""
@@ -389,6 +423,20 @@ DEFAULT_CONFIG = {
# (60+ tool iterations with tiny output) before users assume the
# bot is dead and /restart.
"gateway_notify_interval": 180,
# Freshness window for the gateway auto-continue note (seconds).
# After a gateway crash/restart/SIGTERM mid-run, the next user
# message gets a "[System note: your previous turn was
# interrupted — process the unfinished tool result(s) first]"
# prepended so the model picks up where it left off. That's the
# right behaviour while the interruption is fresh, but stale
# markers (transcript last touched hours or days ago) can revive
# an unrelated old task when the user's next message starts new
# work. This window is the max age of the last persisted
# transcript row for which we still inject the continue note.
# Default 3600s comfortably covers a long turn (gateway_timeout
# default is 1800s) plus runtime slack. Set to 0 to disable the
# gate and restore pre-fix behaviour (always inject).
"gateway_auto_continue_freshness": 3600,
# How user-attached images are presented to the main model on each turn.
# "auto" — attach natively when the active model reports
# supports_vision=True AND the user hasn't explicitly
@@ -451,7 +499,8 @@ DEFAULT_CONFIG = {
"singularity_image": "docker://nikolaik/python-nodejs:python3.11-nodejs20",
"modal_image": "nikolaik/python-nodejs:python3.11-nodejs20",
"daytona_image": "nikolaik/python-nodejs:python3.11-nodejs20",
# Container resource limits (docker, singularity, modal, daytona — ignored for local/ssh)
"vercel_runtime": "node24",
# Container resource limits (docker, singularity, modal, daytona, vercel_sandbox — ignored for local/ssh)
"container_cpu": 1,
"container_memory": 5120, # MB (default 5GB)
"container_disk": 51200, # MB (default 50GB)
@@ -467,6 +516,16 @@ DEFAULT_CONFIG = {
# Explicit opt-in: mount the host cwd into /workspace for Docker sessions.
# Default off because passing host directories into a sandbox weakens isolation.
"docker_mount_cwd_to_workspace": False,
# Explicit opt-in: run the Docker container as the host user's uid:gid
# (via `--user`). When enabled, files written into bind-mounted dirs
# (docker_volumes, the persistent workspace, or the auto-mounted cwd)
# are owned by your host user instead of root, which avoids needing
# `sudo chown` after container runs. Default off to preserve behavior
# for images whose entrypoints expect to start as root (e.g. the
# bundled Hermes image, which drops to the `hermes` user via gosu).
# When on, SETUID/SETGID caps are omitted from the container since
# no privilege drop is needed.
"docker_run_as_host_user": False,
# Persistent shell — keep a long-lived bash shell across execute() calls
# so cwd/env vars/shell variables survive between commands.
# Enabled by default for non-local backends (SSH); local is always opt-in
@@ -546,7 +605,7 @@ DEFAULT_CONFIG = {
"threshold": 0.50, # compress when context usage exceeds this ratio
"target_ratio": 0.20, # fraction of threshold to preserve as recent tail
"protect_last_n": 20, # minimum recent messages to keep uncompressed
"hygiene_hard_message_limit": 400, # gateway session-hygiene force-compress threshold by message count
},
# Anthropic prompt caching (Claude via OpenRouter or native Anthropic API).
@@ -655,6 +714,11 @@ DEFAULT_CONFIG = {
"personality": "kawaii",
"resume_display": "full",
"busy_input_mode": "interrupt", # interrupt | queue | steer
# When true, `hermes --tui` auto-resumes the most recent human-
# facing session on launch instead of forging a fresh one.
# Mirrors `hermes -c` muscle memory. Default off so existing
# users aren't surprised. HERMES_TUI_RESUME=<id> always wins.
"tui_auto_resume_recent": False,
"bell_on_complete": False,
"show_reasoning": False,
"streaming": False,
@@ -662,6 +726,9 @@ DEFAULT_CONFIG = {
"inline_diffs": True, # Show inline diff previews for write actions (write_file, patch, skill_manage)
"show_cost": False, # Show $ cost in the status bar (off by default)
"skin": "default",
# TUI busy indicator style: kaomoji (default), emoji, unicode (braille
# spinner), or ascii. Live-swappable via `/indicator <style>`.
"tui_status_indicator": "kaomoji",
"user_message_preview": { # CLI: how many submitted user-message lines to echo back in scrollback
"first_lines": 2,
"last_lines": 2,
@@ -671,6 +738,14 @@ DEFAULT_CONFIG = {
"tool_progress_overrides": {}, # DEPRECATED — use display.platforms instead
"tool_preview_length": 0, # Max chars for tool call previews (0 = no limit, show full paths/commands)
"platforms": {}, # Per-platform display overrides: {"telegram": {"tool_progress": "all"}, "slack": {"tool_progress": "off"}}
# Gateway runtime-metadata footer appended to the FINAL message of a turn
# (disabled by default to keep replies minimal). When enabled, renders
# e.g. `model · 68% · ~/projects/hermes`. Per-platform overrides go under
# display.platforms.<platform>.runtime_footer.
"runtime_footer": {
"enabled": False,
"fields": ["model", "context_pct", "cwd"], # Order shown; drop any to hide
},
},
# Web dashboard settings
@@ -851,6 +926,35 @@ DEFAULT_CONFIG = {
"guard_agent_created": False,
},
# Curator — background skill maintenance.
#
# Periodically reviews AGENT-CREATED skills (never bundled or
# hub-installed) and keeps the collection tidy: marks long-unused skills
# as stale, archives genuinely obsolete ones (archive only, never
# deletes), and spawns a forked aux-model agent to consolidate overlaps
# and patch drift. Runs inactivity-triggered from session start — no
# cron daemon.
#
# See `hermes curator status` for the last run summary.
"curator": {
"enabled": True,
# How long to wait between curator runs (hours). Default: 7 days.
"interval_hours": 24 * 7,
# Only run when the agent has been idle at least this long (hours).
"min_idle_hours": 2,
# Mark a skill as "stale" after this many days without use.
"stale_after_days": 30,
# Archive a skill (move to skills/.archive/) after this many days
# without use. Archived skills are recoverable — no auto-deletion.
"archive_after_days": 90,
# Optional per-task override for the curator's aux model. Leave null
# to use Hermes' main auxiliary client resolution.
"auxiliary": {
"provider": None,
"model": None,
},
},
# Honcho AI-native memory -- reads ~/.honcho/config.json as single source of truth.
# This section is only needed for hermes-specific overrides; everything else
# (apiKey, workspace, peerName, sessions, enabled) comes from the global config.
@@ -888,6 +992,7 @@ DEFAULT_CONFIG = {
# Telegram platform settings (gateway mode)
"telegram": {
"reactions": False, # Add 👀/✅/❌ reactions to messages during processing
"channel_prompts": {}, # Per-chat/topic ephemeral system prompts (topics inherit from parent group)
},
@@ -942,7 +1047,7 @@ DEFAULT_CONFIG = {
# Pre-exec security scanning via tirith
"security": {
"allow_private_urls": False, # Allow requests to private/internal IPs (for OpenWrt, proxies, VPNs)
"redact_secrets": True,
"redact_secrets": False,
"tirith_enabled": True,
"tirith_path": "tirith",
"tirith_timeout": 5,
@@ -1166,6 +1271,22 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"LM_API_KEY": {
"description": "LM Studio bearer token for auth-enabled local servers",
"prompt": "LM Studio API key / bearer token",
"url": None,
"password": True,
"category": "provider",
"advanced": True,
},
"LM_BASE_URL": {
"description": "LM Studio base URL override",
"prompt": "LM Studio base URL (leave empty for default)",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
"GLM_API_KEY": {
"description": "Z.AI / GLM API key (also recognized as ZAI_API_KEY / Z_AI_API_KEY)",
"prompt": "Z.AI / GLM API key",
@@ -1692,6 +1813,30 @@ OPTIONAL_ENV_VARS = {
"category": "tool",
},
# ── Langfuse observability ──
"HERMES_LANGFUSE_PUBLIC_KEY": {
"description": "Langfuse project public key (pk-lf-...)",
"prompt": "Langfuse public key",
"url": "https://cloud.langfuse.com",
"password": False,
"category": "tool",
},
"HERMES_LANGFUSE_SECRET_KEY": {
"description": "Langfuse project secret key (sk-lf-...)",
"prompt": "Langfuse secret key",
"url": "https://cloud.langfuse.com",
"password": True,
"category": "tool",
},
"HERMES_LANGFUSE_BASE_URL": {
"description": "Langfuse server URL (default: https://cloud.langfuse.com)",
"prompt": "Langfuse server URL (leave empty for cloud.langfuse.com)",
"url": None,
"password": False,
"category": "tool",
"advanced": True,
},
# ── Messaging platforms ──
"TELEGRAM_BOT_TOKEN": {
"description": "Telegram bot token from @BotFather",
@@ -1839,6 +1984,14 @@ OPTIONAL_ENV_VARS = {
"category": "messaging",
"advanced": True,
},
"MATRIX_DM_AUTO_THREAD": {
"description": "Auto-create threads for DM messages in Matrix (default: false)",
"prompt": "Auto-create threads in DMs (true/false)",
"url": None,
"password": False,
"category": "messaging",
"advanced": True,
},
"MATRIX_DEVICE_ID": {
"description": "Stable Matrix device ID for E2EE persistence across restarts (e.g. HERMES_BOT)",
"prompt": "Matrix device ID (stable across restarts)",
@@ -2180,14 +2333,21 @@ def _normalize_custom_provider_entry(
"baseUrl": "base_url",
"apiMode": "api_mode",
"keyEnv": "key_env",
"apiKeyEnv": "key_env", # alias — OpenClaw-compatible + docs variant
"defaultModel": "default_model",
"contextLength": "context_length",
"rateLimitDelay": "rate_limit_delay",
}
# api_key_env is a documented snake_case alias for key_env (see
# website/docs/guides/azure-foundry.md). Normalize it up front so the
# rest of the normalizer treats it as the canonical field.
if "api_key_env" in entry and "key_env" not in entry:
entry["key_env"] = entry["api_key_env"]
_KNOWN_KEYS = {
"name", "api", "url", "base_url", "api_key", "key_env",
"name", "api", "url", "base_url", "api_key", "key_env", "api_key_env",
"api_mode", "transport", "model", "default_model", "models",
"context_length", "rate_limit_delay",
"request_timeout_seconds", "stale_timeout_seconds",
}
for camel, snake in _CAMEL_ALIASES.items():
if camel in entry and snake not in entry:
@@ -2439,6 +2599,9 @@ _KNOWN_ROOT_KEYS = {
_VALID_CUSTOM_PROVIDER_FIELDS = {
"name", "base_url", "api_key", "api_mode", "model", "models",
"context_length", "rate_limit_delay",
# key_env is read at runtime by runtime_provider.py and auxiliary_client.py
# — include it here so the set accurately describes the supported schema.
"key_env",
}
# Fields that look like they should be inside custom_providers, not at root
@@ -2515,10 +2678,32 @@ def validate_config_structure(config: Optional[Dict[str, Any]] = None) -> List["
"Add the API endpoint URL, e.g.: base_url: https://api.example.com/v1",
))
# ── fallback_model must be a top-level dict with provider + model ────
# ── fallback_model: single dict OR list of dicts (chain) ─────────────
fb = config.get("fallback_model")
if fb is not None:
if not isinstance(fb, dict):
if isinstance(fb, list):
# Chain fallback — validate each entry
for i, entry in enumerate(fb):
if not isinstance(entry, dict):
issues.append(ConfigIssue(
"error",
f"fallback_model[{i}] should be a dict, got {type(entry).__name__}",
"Each entry needs provider + model",
))
else:
if not entry.get("provider"):
issues.append(ConfigIssue(
"warning",
f"fallback_model[{i}] is missing 'provider' field",
"Add: provider: openrouter (or another provider)",
))
if not entry.get("model"):
issues.append(ConfigIssue(
"warning",
f"fallback_model[{i}] is missing 'model' field",
"Add: model: <model-name>",
))
elif not isinstance(fb, dict):
issues.append(ConfigIssue(
"error",
f"fallback_model should be a dict with 'provider' and 'model', got {type(fb).__name__}",
@@ -3303,6 +3488,52 @@ def _normalize_max_turns_config(config: Dict[str, Any]) -> Dict[str, Any]:
return config
def cfg_get(cfg: Optional[Dict[str, Any]], *keys: str, default: Any = None) -> Any:
"""Traverse nested dict keys safely, returning ``default`` on any miss.
Canonical helper for the ``cfg.get("X", {}).get("Y", default)`` pattern
that appears 50+ times across the codebase. Handles three common gotchas
in one place:
1. Missing intermediate keys (returns ``default``, no KeyError).
2. An intermediate value that's not a dict (e.g. a user wrote a string
where a section was expected). Returns ``default`` instead of
AttributeError on ``.get()``.
3. ``cfg is None`` (callers sometimes pass ``load_config() or None``).
Named ``cfg_get`` rather than ``cfg_path`` to avoid shadowing the
ubiquitous ``cfg_path = _hermes_home / "config.yaml"`` local variable
that appears in gateway/run.py, cron/scheduler.py, main.py, etc.
Explicit ``None`` values are returned as-is (matches ``dict.get(key,
default)`` semantics ``default`` is only returned when the key is
*absent*, not when it's present but set to ``None``).
Examples:
>>> cfg_get({"agent": {"reasoning_effort": "high"}}, "agent", "reasoning_effort")
'high'
>>> cfg_get({}, "agent", "reasoning_effort", default="medium")
'medium'
>>> cfg_get({"agent": "oops_a_string"}, "agent", "reasoning_effort", default="low")
'low'
>>> cfg_get(None, "anything", default=42)
42
>>> cfg_get({"a": {"b": None}}, "a", "b", default="def") # explicit None preserved
>>> cfg_get({"a": {"b": False}}, "a", "b", default=True) # falsy values preserved
False
"""
if not isinstance(cfg, dict):
return default
node: Any = cfg
for key in keys:
if not isinstance(node, dict):
return default
if key not in node:
return default
node = node[key]
return node
def read_raw_config() -> Dict[str, Any]:
"""Read ~/.hermes/config.yaml as-is, without merging defaults or migrating.
@@ -3311,25 +3542,62 @@ def read_raw_config() -> Dict[str, Any]:
be parsed. Use this for lightweight config reads where you just need a
single value and don't want the overhead of ``load_config()``'s deep-merge
+ migration pipeline.
Cached on the config file's (mtime_ns, size) — same strategy as
``load_config()``. Returns a deepcopy on every call since some callers
mutate the result before passing to ``save_config()``.
"""
try:
config_path = get_config_path()
if config_path.exists():
with open(config_path, encoding="utf-8") as f:
return yaml.safe_load(f) or {}
st = config_path.stat()
cache_key = (st.st_mtime_ns, st.st_size)
except (FileNotFoundError, OSError):
return {}
path_key = str(config_path)
cached = _RAW_CONFIG_CACHE.get(path_key)
if cached is not None and cached[:2] == cache_key:
return copy.deepcopy(cached[2])
try:
with open(config_path, encoding="utf-8") as f:
data = yaml.safe_load(f) or {}
except Exception:
pass
return {}
return {}
if not isinstance(data, dict):
data = {}
_RAW_CONFIG_CACHE[path_key] = (cache_key[0], cache_key[1], copy.deepcopy(data))
return data
def load_config() -> Dict[str, Any]:
"""Load configuration from ~/.hermes/config.yaml."""
"""Load configuration from ~/.hermes/config.yaml.
Cached on the config file's (mtime_ns, size). Returns a deepcopy of
the cached value when unchanged, since most call sites mutate the
result (e.g. ``cfg["model"]["default"] = ...`` before ``save_config``).
The cache is keyed on ``str(config_path)`` so profile switches
(which change ``HERMES_HOME`` and therefore ``get_config_path()``)
don't collide.
"""
ensure_hermes_home()
config_path = get_config_path()
path_key = str(config_path)
try:
st = config_path.stat()
cache_key: Optional[Tuple[int, int]] = (st.st_mtime_ns, st.st_size)
except FileNotFoundError:
cache_key = None
cached = _LOAD_CONFIG_CACHE.get(path_key)
if cached is not None and cache_key is not None and cached[:2] == cache_key:
return copy.deepcopy(cached[2])
config = copy.deepcopy(DEFAULT_CONFIG)
if config_path.exists():
if cache_key is not None:
try:
with open(config_path, encoding="utf-8") as f:
user_config = yaml.safe_load(f) or {}
@@ -3347,20 +3615,26 @@ def load_config() -> Dict[str, Any]:
normalized = _normalize_root_model_keys(_normalize_max_turns_config(config))
expanded = _expand_env_vars(normalized)
_LAST_EXPANDED_CONFIG_BY_PATH[str(config_path)] = copy.deepcopy(expanded)
_LAST_EXPANDED_CONFIG_BY_PATH[path_key] = copy.deepcopy(expanded)
if cache_key is not None:
_LOAD_CONFIG_CACHE[path_key] = (cache_key[0], cache_key[1], copy.deepcopy(expanded))
else:
_LOAD_CONFIG_CACHE.pop(path_key, None)
return expanded
_SECURITY_COMMENT = """
# ── Security ──────────────────────────────────────────────────────────
# API keys, tokens, and passwords are redacted from tool output by default.
# Set to false to see full values (useful for debugging auth issues).
# Secret redaction is OFF by default — tool output (terminal stdout,
# read_file results, web content) passes through unmodified. Set
# redact_secrets to true to mask strings that look like API keys, tokens,
# and passwords before they enter the model context and logs.
# tirith pre-exec scanning is enabled by default when the tirith binary
# is available. Configure via security.tirith_* keys or env vars
# (TIRITH_ENABLED, TIRITH_BIN, TIRITH_TIMEOUT, TIRITH_FAIL_OPEN).
#
# security:
# redact_secrets: false
# redact_secrets: true
# tirith_enabled: true
# tirith_path: "tirith"
# tirith_timeout: 5
@@ -3393,11 +3667,11 @@ _FALLBACK_COMMENT = """
_COMMENTED_SECTIONS = """
# ── Security ──────────────────────────────────────────────────────────
# API keys, tokens, and passwords are redacted from tool output by default.
# Set to false to see full values (useful for debugging auth issues).
# Secret redaction is OFF by default. Set to true to mask strings that
# look like API keys, tokens, and passwords in tool output and logs.
#
# security:
# redact_secrets: false
# redact_secrets: true
# ── Fallback Model ────────────────────────────────────────────────────
# Automatic provider failover when primary is unavailable.
@@ -3448,7 +3722,12 @@ def save_config(config: Dict[str, Any]):
if not sec or sec.get("redact_secrets") is None:
parts.append(_SECURITY_COMMENT)
fb = normalized.get("fallback_model", {})
if not fb or not isinstance(fb, dict) or not (fb.get("provider") and fb.get("model")):
fb_is_valid = False
if isinstance(fb, list):
fb_is_valid = any(isinstance(e, dict) and e.get("provider") and e.get("model") for e in fb)
elif isinstance(fb, dict):
fb_is_valid = bool(fb.get("provider") and fb.get("model"))
if not fb_is_valid:
parts.append(_FALLBACK_COMMENT)
atomic_yaml_write(
@@ -3517,18 +3796,27 @@ def _sanitize_env_lines(lines: list) -> list:
# Detect concatenated KEY=VALUE pairs on one line.
# Search for known KEY= patterns at any position in the line.
split_positions = []
# We collect full needle ranges so we can drop matches that are
# fully contained within a longer overlapping needle. Without this,
# suffix collisions corrupt the file: e.g. LM_API_KEY= inside
# GLM_API_KEY= would otherwise split the line into "G\nLM_API_KEY=...".
match_ranges: list[tuple[int, int]] = []
for key_name in known_keys:
needle = key_name + "="
idx = stripped.find(needle)
while idx >= 0:
split_positions.append(idx)
match_ranges.append((idx, idx + len(needle)))
idx = stripped.find(needle, idx + len(needle))
split_positions = sorted({
s for s, e in match_ranges
if not any(
s2 <= s and e2 >= e and (s2, e2) != (s, e)
for s2, e2 in match_ranges
)
})
if len(split_positions) > 1:
split_positions.sort()
# Deduplicate (shouldn't happen, but be safe)
split_positions = sorted(set(split_positions))
for i, pos in enumerate(split_positions):
end = split_positions[i + 1] if i + 1 < len(split_positions) else len(stripped)
part = stripped[pos:end].strip()
@@ -3574,7 +3862,7 @@ def sanitize_env_file() -> int:
f.writelines(sanitized)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, env_path)
atomic_replace(tmp_path, env_path)
except BaseException:
try:
os.unlink(tmp_path)
@@ -3637,7 +3925,7 @@ def save_env_value(key: str, value: str):
value = _check_non_ascii_credential(key, value)
ensure_hermes_home()
env_path = get_env_path()
# On Windows, open() defaults to the system locale (cp1252) which can
# cause OSError errno 22 on UTF-8 .env files.
read_kw = {"encoding": "utf-8", "errors": "replace"} if _IS_WINDOWS else {}
@@ -3649,7 +3937,7 @@ def save_env_value(key: str, value: str):
lines = f.readlines()
# Sanitize on every read: split concatenated keys, drop stale placeholders
lines = _sanitize_env_lines(lines)
# Find and update or append
found = False
for i, line in enumerate(lines):
@@ -3657,7 +3945,7 @@ def save_env_value(key: str, value: str):
lines[i] = f"{key}={value}\n"
found = True
break
if not found:
# Ensure there's a newline at the end of the file before appending
if lines and not lines[-1].endswith("\n"):
@@ -3677,7 +3965,7 @@ def save_env_value(key: str, value: str):
f.writelines(lines)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, env_path)
atomic_replace(tmp_path, env_path)
# Restore original permissions before _secure_file may tighten them.
if original_mode is not None:
try:
@@ -3733,7 +4021,7 @@ def remove_env_value(key: str) -> bool:
f.writelines(new_lines)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, env_path)
atomic_replace(tmp_path, env_path)
if original_mode is not None:
try:
os.chmod(env_path, original_mode)
@@ -3820,12 +4108,13 @@ def get_env_value(key: str) -> Optional[str]:
# =============================================================================
def redact_key(key: str) -> str:
"""Redact an API key for display."""
if not key:
return color("(not set)", Colors.DIM)
if len(key) < 12:
return "***"
return key[:4] + "..." + key[-4:]
"""Redact an API key for display.
Thin wrapper over :func:`agent.redact.mask_secret` preserves the
"(not set)" placeholder in dim color for the empty case.
"""
from agent.redact import mask_secret
return mask_secret(key, empty=color("(not set)", Colors.DIM))
def show_config():
@@ -3905,6 +4194,9 @@ def show_config():
print(f" Daytona image: {terminal.get('daytona_image', 'nikolaik/python-nodejs:python3.11-nodejs20')}")
daytona_key = get_env_value('DAYTONA_API_KEY')
print(f" API key: {'configured' if daytona_key else '(not set)'}")
elif terminal.get('backend') == 'vercel_sandbox':
print(f" Vercel runtime: {terminal.get('vercel_runtime', 'node24')}")
print(f" Vercel auth: {'configured' if get_env_value('VERCEL_OIDC_TOKEN') or (get_env_value('VERCEL_TOKEN') and get_env_value('VERCEL_PROJECT_ID') and get_env_value('VERCEL_TEAM_ID')) else '(not set)'}")
elif terminal.get('backend') == 'ssh':
ssh_host = get_env_value('TERMINAL_SSH_HOST')
ssh_user = get_env_value('TERMINAL_SSH_USER')
@@ -4097,7 +4389,9 @@ def set_config_value(key: str, value: str):
"terminal.singularity_image": "TERMINAL_SINGULARITY_IMAGE",
"terminal.modal_image": "TERMINAL_MODAL_IMAGE",
"terminal.daytona_image": "TERMINAL_DAYTONA_IMAGE",
"terminal.vercel_runtime": "TERMINAL_VERCEL_RUNTIME",
"terminal.docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
"terminal.docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
"terminal.cwd": "TERMINAL_CWD",
"terminal.timeout": "TERMINAL_TIMEOUT",
"terminal.sandbox_dir": "TERMINAL_SANDBOX_DIR",
+235
View File
@@ -0,0 +1,235 @@
"""CLI subcommand: `hermes curator <subcommand>`.
Thin shell around agent/curator.py and tools/skill_usage.py. Renders a status
table, triggers a run, pauses/resumes, and pins/unpins skills.
This module intentionally has no side effects at import time main.py wires
the argparse subparsers on demand.
"""
from __future__ import annotations
import argparse
import sys
from datetime import datetime, timezone
from typing import Optional
def _fmt_ts(ts: Optional[str]) -> str:
if not ts:
return "never"
try:
dt = datetime.fromisoformat(ts)
except (TypeError, ValueError):
return str(ts)
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
delta = datetime.now(timezone.utc) - dt
secs = int(delta.total_seconds())
if secs < 60:
return f"{secs}s ago"
if secs < 3600:
return f"{secs // 60}m ago"
if secs < 86400:
return f"{secs // 3600}h ago"
return f"{secs // 86400}d ago"
def _cmd_status(args) -> int:
from agent import curator
from tools import skill_usage
state = curator.load_state()
enabled = curator.is_enabled()
paused = state.get("paused", False)
last_run = state.get("last_run_at")
summary = state.get("last_run_summary") or "(none)"
runs = state.get("run_count", 0)
status_line = (
"ENABLED" if enabled and not paused else
"PAUSED" if paused else
"DISABLED"
)
print(f"curator: {status_line}")
print(f" runs: {runs}")
print(f" last run: {_fmt_ts(last_run)}")
print(f" last summary: {summary}")
_report = state.get("last_report_path")
if _report:
print(f" last report: {_report}")
_ih = curator.get_interval_hours()
_interval_label = (
f"{_ih // 24}d" if _ih % 24 == 0 and _ih >= 24
else f"{_ih}h"
)
print(f" interval: every {_interval_label}")
print(f" stale after: {curator.get_stale_after_days()}d unused")
print(f" archive after: {curator.get_archive_after_days()}d unused")
rows = skill_usage.agent_created_report()
if not rows:
print("\nno agent-created skills")
return 0
by_state = {"active": [], "stale": [], "archived": []}
pinned = []
for r in rows:
state_name = r.get("state", "active")
by_state.setdefault(state_name, []).append(r)
if r.get("pinned"):
pinned.append(r["name"])
print(f"\nagent-created skills: {len(rows)} total")
for state_name in ("active", "stale", "archived"):
bucket = by_state.get(state_name, [])
print(f" {state_name:10s} {len(bucket)}")
if pinned:
print(f"\npinned ({len(pinned)}): {', '.join(pinned)}")
# Show top 5 least-recently-used active skills
active = sorted(
by_state.get("active", []),
key=lambda r: r.get("last_used_at") or r.get("created_at") or "",
)[:5]
if active:
print("\nleast recently used (top 5):")
for r in active:
last = _fmt_ts(r.get("last_used_at"))
print(f" {r['name']:40s} use={r.get('use_count', 0):3d} last_used={last}")
return 0
def _cmd_run(args) -> int:
from agent import curator
if not curator.is_enabled():
print("curator: disabled via config; enable with `curator.enabled: true`")
return 1
print("curator: running review pass...")
def _on_summary(msg: str) -> None:
print(msg)
result = curator.run_curator_review(
on_summary=_on_summary,
synchronous=bool(args.synchronous),
)
auto = result.get("auto_transitions", {})
if auto:
print(
f"auto: checked={auto.get('checked', 0)} "
f"stale={auto.get('marked_stale', 0)} "
f"archived={auto.get('archived', 0)} "
f"reactivated={auto.get('reactivated', 0)}"
)
if not args.synchronous:
print("llm pass running in background — check `hermes curator status` later")
return 0
def _cmd_pause(args) -> int:
from agent import curator
curator.set_paused(True)
print("curator: paused")
return 0
def _cmd_resume(args) -> int:
from agent import curator
curator.set_paused(False)
print("curator: resumed")
return 0
def _cmd_pin(args) -> int:
from tools import skill_usage
if not skill_usage.is_agent_created(args.skill):
print(
f"curator: '{args.skill}' is bundled or hub-installed — cannot pin "
"(only agent-created skills participate in curation)"
)
return 1
skill_usage.set_pinned(args.skill, True)
print(f"curator: pinned '{args.skill}' (will bypass auto-transitions)")
return 0
def _cmd_unpin(args) -> int:
from tools import skill_usage
if not skill_usage.is_agent_created(args.skill):
print(
f"curator: '{args.skill}' is bundled or hub-installed — "
"there's nothing to unpin (curator only tracks agent-created skills)"
)
return 1
skill_usage.set_pinned(args.skill, False)
print(f"curator: unpinned '{args.skill}'")
return 0
def _cmd_restore(args) -> int:
from tools import skill_usage
ok, msg = skill_usage.restore_skill(args.skill)
print(f"curator: {msg}")
return 0 if ok else 1
# ---------------------------------------------------------------------------
# argparse wiring (called from hermes_cli.main)
# ---------------------------------------------------------------------------
def register_cli(parent: argparse.ArgumentParser) -> None:
"""Attach `curator` subcommands to *parent*.
main.py calls this with the ArgumentParser returned by
``subparsers.add_parser("curator", ...)``.
"""
parent.set_defaults(func=lambda a: (parent.print_help(), 0)[1])
subs = parent.add_subparsers(dest="curator_command")
p_status = subs.add_parser("status", help="Show curator status and skill stats")
p_status.set_defaults(func=_cmd_status)
p_run = subs.add_parser("run", help="Trigger a curator review now")
p_run.add_argument(
"--sync", "--synchronous", dest="synchronous", action="store_true",
help="Wait for the LLM review pass to finish (default: background thread)",
)
p_run.set_defaults(func=_cmd_run)
p_pause = subs.add_parser("pause", help="Pause the curator until resumed")
p_pause.set_defaults(func=_cmd_pause)
p_resume = subs.add_parser("resume", help="Resume a paused curator")
p_resume.set_defaults(func=_cmd_resume)
p_pin = subs.add_parser("pin", help="Pin a skill so the curator never auto-transitions it")
p_pin.add_argument("skill", help="Skill name")
p_pin.set_defaults(func=_cmd_pin)
p_unpin = subs.add_parser("unpin", help="Unpin a skill")
p_unpin.add_argument("skill", help="Skill name")
p_unpin.set_defaults(func=_cmd_unpin)
p_restore = subs.add_parser("restore", help="Restore an archived skill")
p_restore.add_argument("skill", help="Skill name")
p_restore.set_defaults(func=_cmd_restore)
def cli_main(argv=None) -> int:
"""Standalone entry (also usable by hermes_cli.main fallthrough)."""
parser = argparse.ArgumentParser(prog="hermes curator")
register_cli(parser)
args = parser.parse_args(argv)
fn = getattr(args, "func", None)
if fn is None:
parser.print_help()
return 0
return int(fn(args) or 0)
if __name__ == "__main__": # pragma: no cover
sys.exit(cli_main())
+2 -2
View File
@@ -7,7 +7,6 @@ Currently supports:
import io
import json
import os
import sys
import time
import urllib.error
@@ -18,6 +17,7 @@ from pathlib import Path
from typing import Optional
from hermes_constants import get_hermes_home
from utils import atomic_replace
# ---------------------------------------------------------------------------
@@ -79,7 +79,7 @@ def _save_pending(entries: list[dict]) -> None:
path.parent.mkdir(parents=True, exist_ok=True)
tmp = path.with_suffix(".json.tmp")
tmp.write_text(json.dumps(entries, indent=2), encoding="utf-8")
os.replace(tmp, path)
atomic_replace(tmp, path)
except OSError:
# Non-fatal — worst case the user has to run ``hermes debug delete``
# manually.
-1
View File
@@ -13,7 +13,6 @@ automatically.
from __future__ import annotations
import io
import os
import sys
import time
+130 -13
View File
@@ -8,6 +8,7 @@ import os
import sys
import subprocess
import shutil
import importlib.util
from pathlib import Path
from hermes_cli.config import get_project_root, get_hermes_home, get_env_path
@@ -30,6 +31,7 @@ load_dotenv(PROJECT_ROOT / ".env", override=False, encoding="utf-8")
from hermes_cli.colors import Colors, color
from hermes_cli.models import _HERMES_USER_AGENT
from hermes_cli.vercel_auth import describe_vercel_auth
from hermes_constants import OPENROUTER_MODELS_URL
from utils import base_url_host_matches
@@ -57,6 +59,7 @@ _PROVIDER_ENV_HINTS = (
"OPENCODE_ZEN_API_KEY",
"OPENCODE_GO_API_KEY",
"XIAOMI_API_KEY",
"TOKENHUB_API_KEY",
)
@@ -292,15 +295,23 @@ def run_doctor(args):
known_providers: set = set()
try:
from hermes_cli.auth import PROVIDER_REGISTRY
from hermes_cli.auth import (
PROVIDER_REGISTRY,
resolve_provider as _resolve_auth_provider,
)
known_providers = set(PROVIDER_REGISTRY.keys()) | {"openrouter", "custom", "auto"}
except Exception:
_resolve_auth_provider = None
pass
try:
from hermes_cli.config import get_compatible_custom_providers as _compatible_custom_providers
from hermes_cli.providers import resolve_provider_full as _resolve_provider_full
from hermes_cli.providers import (
normalize_provider as _normalize_catalog_provider,
resolve_provider_full as _resolve_provider_full,
)
except Exception:
_compatible_custom_providers = None
_normalize_catalog_provider = None
_resolve_provider_full = None
custom_providers = []
@@ -320,17 +331,43 @@ def run_doctor(args):
if name:
known_providers.add("custom:" + name.lower().replace(" ", "-"))
canonical_provider = provider
valid_provider_ids = set(known_providers)
provider_ids_to_accept = {provider} if provider else set()
if _normalize_catalog_provider is not None:
for known_provider in known_providers:
try:
valid_provider_ids.add(_normalize_catalog_provider(known_provider))
except Exception:
continue
runtime_provider = provider
if (
provider
and _resolve_auth_provider is not None
and provider not in ("auto", "custom")
):
try:
runtime_provider = _resolve_auth_provider(provider)
provider_ids_to_accept.add(runtime_provider)
except Exception:
runtime_provider = provider
catalog_provider = provider
if (
provider
and _resolve_provider_full is not None
and provider not in ("auto", "custom")
):
provider_def = _resolve_provider_full(provider, user_providers, custom_providers)
canonical_provider = provider_def.id if provider_def is not None else None
catalog_provider = provider_def.id if provider_def is not None else None
if catalog_provider is not None:
provider_ids_to_accept.add(catalog_provider)
if provider and provider != "auto":
if canonical_provider is None or (known_providers and canonical_provider not in known_providers):
if catalog_provider is None or (
known_providers
and not (provider_ids_to_accept & valid_provider_ids)
):
known_list = ", ".join(sorted(known_providers)) if known_providers else "(unavailable)"
check_fail(
f"model.provider '{provider_raw}' is not a recognised provider",
@@ -343,7 +380,24 @@ def run_doctor(args):
)
# Warn if model is set to a provider-prefixed name on a provider that doesn't use them
if default_model and "/" in default_model and canonical_provider and canonical_provider not in ("openrouter", "custom", "auto", "ai-gateway", "kilocode", "opencode-zen", "huggingface", "nous"):
provider_for_policy = runtime_provider or catalog_provider
providers_accepting_vendor_slugs = {
"openrouter",
"custom",
"auto",
"ai-gateway",
"kilocode",
"opencode-zen",
"huggingface",
"lmstudio",
"nous",
}
if (
default_model
and "/" in default_model
and provider_for_policy
and provider_for_policy not in providers_accepting_vendor_slugs
):
check_warn(
f"model.default '{default_model}' uses a vendor/model slug but provider is '{provider_raw}'",
"(vendor-prefixed slugs belong to aggregators like openrouter)",
@@ -359,20 +413,24 @@ def run_doctor(args):
# own env-var checks elsewhere in doctor, and get_auth_status()
# returns a bare {logged_in: False} for anything it doesn't
# explicitly dispatch, which would produce false positives.
if canonical_provider and canonical_provider not in ("auto", "custom", "openrouter"):
if runtime_provider and runtime_provider not in ("auto", "custom", "openrouter"):
try:
from hermes_cli.auth import PROVIDER_REGISTRY, get_auth_status
pconfig = PROVIDER_REGISTRY.get(canonical_provider)
pconfig = PROVIDER_REGISTRY.get(runtime_provider)
if pconfig and getattr(pconfig, "auth_type", "") == "api_key":
status = get_auth_status(canonical_provider) or {}
configured = bool(status.get("configured") or status.get("logged_in") or status.get("api_key"))
status = get_auth_status(runtime_provider) or {}
configured = bool(
status.get("configured")
or status.get("logged_in")
or status.get("api_key")
)
if not configured:
check_fail(
f"model.provider '{canonical_provider}' is set but no API key is configured",
f"model.provider '{runtime_provider}' is set but no API key is configured",
"(check ~/.hermes/.env or run 'hermes setup')",
)
issues.append(
f"No credentials found for provider '{canonical_provider}'. "
f"No credentials found for provider '{runtime_provider}'. "
f"Run 'hermes setup' or set the provider's API key in {_DHH}/.env, "
f"or switch providers with 'hermes config set model.provider <name>'"
)
@@ -481,6 +539,7 @@ def run_doctor(args):
get_nous_auth_status,
get_codex_auth_status,
get_gemini_oauth_auth_status,
get_minimax_oauth_auth_status,
)
nous_status = get_nous_auth_status()
@@ -510,13 +569,27 @@ def run_doctor(args):
check_ok("Google Gemini OAuth", f"(logged in{suffix})")
else:
check_warn("Google Gemini OAuth", "(not logged in)")
minimax_status = get_minimax_oauth_auth_status()
if minimax_status.get("logged_in"):
region = minimax_status.get("region", "global")
check_ok("MiniMax OAuth", f"(logged in, region={region})")
else:
check_warn("MiniMax OAuth", "(not logged in)")
except Exception as e:
check_warn("Auth provider status", f"(could not check: {e})")
if shutil.which("codex"):
check_ok("codex CLI")
else:
check_warn("codex CLI not found", "(required for openai-codex login)")
# Native OAuth uses Hermes' own device-code flow — the Codex CLI is
# only needed if you want to import existing tokens from
# ~/.codex/auth.json. Downgrade to info so users running
# `hermes auth openai-codex` aren't told they're missing something.
check_info(
"codex CLI not installed "
"(optional — only required to import tokens from an existing Codex CLI login)"
)
# =========================================================================
# Check: Directory structure
@@ -800,6 +873,50 @@ def run_doctor(args):
check_fail("daytona SDK not installed", "(pip install daytona)")
issues.append("Install daytona SDK: pip install daytona")
# Vercel Sandbox (if using vercel_sandbox backend)
if terminal_env == "vercel_sandbox":
runtime = os.getenv("TERMINAL_VERCEL_RUNTIME", "node24").strip() or "node24"
from tools.terminal_tool import _SUPPORTED_VERCEL_RUNTIMES
if runtime in _SUPPORTED_VERCEL_RUNTIMES:
check_ok("Vercel runtime", f"({runtime})")
else:
supported = ", ".join(_SUPPORTED_VERCEL_RUNTIMES)
check_fail("Vercel runtime unsupported", f"({runtime}; use {supported})")
issues.append(f"Set TERMINAL_VERCEL_RUNTIME to one of: {supported}")
disk = os.getenv("TERMINAL_CONTAINER_DISK", "51200").strip()
if disk in ("", "0", "51200"):
check_ok("Vercel disk setting", "(uses platform default)")
else:
check_fail("Vercel custom disk unsupported", "(reset terminal.container_disk to 51200)")
issues.append("Vercel Sandbox does not support custom container_disk; use the shared default 51200")
if importlib.util.find_spec("vercel") is not None:
check_ok("vercel SDK", "(installed)")
else:
check_fail("vercel SDK not installed", "(pip install 'hermes-agent[vercel]')")
issues.append("Install the Vercel optional dependency: pip install 'hermes-agent[vercel]'")
auth_status = describe_vercel_auth()
if auth_status.ok:
check_ok("Vercel auth", f"({auth_status.label})")
elif auth_status.label.startswith("partial"):
check_fail("Vercel auth incomplete", f"({auth_status.label})")
issues.append("Set VERCEL_TOKEN, VERCEL_PROJECT_ID, and VERCEL_TEAM_ID together")
else:
check_fail("Vercel auth not configured", f"({auth_status.label})")
issues.append(
"Configure Vercel Sandbox auth with VERCEL_TOKEN, VERCEL_PROJECT_ID, and VERCEL_TEAM_ID"
)
for line in auth_status.detail_lines:
check_info(f"Vercel auth {line}")
persistent = os.getenv("TERMINAL_CONTAINER_PERSISTENT", "true").lower() in ("1", "true", "yes", "on")
if persistent:
check_info("Vercel persistence: snapshot filesystem only; live processes do not survive sandbox recreation")
else:
check_info("Vercel persistence: ephemeral filesystem")
# Node.js + agent-browser (for browser automation tools)
if shutil.which("node"):
check_ok("Node.js")
+8 -6
View File
@@ -33,12 +33,14 @@ def _get_git_commit(project_root: Path) -> str:
def _redact(value: str) -> str:
"""Redact all but first 4 and last 4 chars."""
if not value:
return ""
if len(value) < 12:
return "***"
return value[:4] + "..." + value[-4:]
"""Redact all but first 4 and last 4 chars.
Thin wrapper over :func:`agent.redact.mask_secret`. Returns ``""`` for
an empty value (matches the historical behavior of this helper
``hermes dump`` formats empty values as blank, not as ``"(not set)"``).
"""
from agent.redact import mask_secret
return mask_secret(value)
def _gateway_status() -> str:
+2 -1
View File
@@ -7,6 +7,7 @@ import sys
from pathlib import Path
from dotenv import load_dotenv
from utils import atomic_replace
# Env var name suffixes that indicate credential values. These are the
@@ -127,7 +128,7 @@ def _sanitize_env_file_if_needed(path: Path) -> None:
f.writelines(sanitized)
f.flush()
os.fsync(f.fileno())
os.replace(tmp, path)
atomic_replace(tmp, path)
except BaseException:
try:
os.unlink(tmp)
+41 -20
View File
@@ -279,9 +279,11 @@ def _scan_gateway_pids(exclude_pids: set[int], all_profiles: bool = False) -> li
["wmic", "process", "get", "ProcessId,CommandLine", "/FORMAT:LIST"],
capture_output=True,
text=True,
encoding="utf-8",
errors="ignore",
timeout=10,
)
if result.returncode != 0:
if result.returncode != 0 or result.stdout is None:
return []
current_cmd = ""
for line in result.stdout.split("\n"):
@@ -830,6 +832,22 @@ def _user_dbus_socket_path() -> Path:
return Path(xdg) / "bus"
def _user_systemd_private_socket_path() -> Path:
"""Return the per-user systemd private socket path (regardless of existence)."""
xdg = os.environ.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}"
return Path(xdg) / "systemd" / "private"
def _user_systemd_socket_ready() -> bool:
"""Return True when user-scope systemd has a reachable control socket.
Some distros expose only the per-user systemd private socket even when the
D-Bus session bus socket is absent. ``systemctl --user`` can still work in
that configuration, so preflight checks must treat either socket as valid.
"""
return _user_dbus_socket_path().exists() or _user_systemd_private_socket_path().exists()
def _ensure_user_systemd_env() -> None:
"""Ensure DBUS_SESSION_BUS_ADDRESS and XDG_RUNTIME_DIR are set for systemctl --user.
@@ -853,28 +871,29 @@ def _ensure_user_systemd_env() -> None:
def _wait_for_user_dbus_socket(timeout: float = 3.0) -> bool:
"""Poll for the user D-Bus socket to appear, up to ``timeout`` seconds.
"""Poll for the user systemd runtime socket(s), up to ``timeout`` seconds.
Linger-enabled user@.service can take a second or two to spawn the socket
after ``loginctl enable-linger`` runs. Returns True once the socket exists.
Linger-enabled user@.service can take a second or two to spawn its control
socket(s) after ``loginctl enable-linger`` runs. Returns True once either
the user D-Bus socket or the per-user systemd private socket exists.
"""
import time
deadline = time.monotonic() + timeout
while time.monotonic() < deadline:
if _user_dbus_socket_path().exists():
if _user_systemd_socket_ready():
_ensure_user_systemd_env()
return True
time.sleep(0.2)
return _user_dbus_socket_path().exists()
return _user_systemd_socket_ready()
def _preflight_user_systemd(*, auto_enable_linger: bool = True) -> None:
"""Ensure ``systemctl --user`` will reach the user D-Bus session bus.
"""Ensure ``systemctl --user`` will reach the user-scope systemd instance.
No-op when the bus socket is already there (the common case on desktops
and linger-enabled servers). On fresh SSH sessions where the socket is
missing:
No-op when the user D-Bus socket or per-user systemd private socket is
already there (the common case on desktops and linger-enabled servers). On
fresh SSH sessions where both are missing:
* If linger is already enabled, wait briefly for user@.service to spawn
the socket.
@@ -888,8 +907,7 @@ def _preflight_user_systemd(*, auto_enable_linger: bool = True) -> None:
systemd operations and surface the message to the user.
"""
_ensure_user_systemd_env()
bus_path = _user_dbus_socket_path()
if bus_path.exists():
if _user_systemd_socket_ready():
return
import getpass
@@ -903,7 +921,7 @@ def _preflight_user_systemd(*, auto_enable_linger: bool = True) -> None:
# Linger is on but socket still missing — unusual; fall through to error.
_raise_user_systemd_unavailable(
username,
reason="User D-Bus socket is missing even though linger is enabled.",
reason="User systemd control sockets are missing even though linger is enabled.",
fix_hint=(
f" systemctl start user@{os.getuid()}.service\n"
" (may require sudo; try again after the command succeeds)"
@@ -2953,7 +2971,7 @@ def _setup_sms():
def _setup_dingtalk():
"""Configure DingTalk — QR scan (recommended) or manual credential entry."""
from hermes_cli.setup import (
prompt_choice, prompt_yes_no, print_info, print_success, print_warning,
prompt_choice, prompt_yes_no, print_success, print_warning,
)
dingtalk_platform = next(p for p in _PLATFORMS if p["key"] == "dingtalk")
@@ -3277,6 +3295,12 @@ def _setup_weixin():
print_warning(" Direct messages disabled.")
print()
print_info(" Note: QR login connects an iLink bot identity (e.g. ...@im.bot), not a")
print_info(" scriptable personal WeChat account. Ordinary WeChat groups typically cannot")
print_info(" invite an @im.bot identity, and iLink does not deliver ordinary-group events")
print_info(" to most bot accounts. The settings below only apply when iLink actually")
print_info(" delivers group events for your account type — otherwise DM remains the only")
print_info(" working channel regardless of this choice.")
group_choices = [
"Disable group chats (recommended)",
"Allow all group chats",
@@ -3290,12 +3314,12 @@ def _setup_weixin():
elif group_idx == 1:
save_env_value("WEIXIN_GROUP_POLICY", "open")
save_env_value("WEIXIN_GROUP_ALLOWED_USERS", "")
print_warning(" All group chats enabled.")
print_warning(" All group chats enabled (only takes effect if iLink delivers group events).")
else:
allow_groups = prompt(" Allowed group chat IDs (comma-separated)", "", password=False).replace(" ", "")
allow_groups = prompt(" Allowed group chat IDs (comma-separated, not member user IDs)", "", password=False).replace(" ", "")
save_env_value("WEIXIN_GROUP_POLICY", "allowlist")
save_env_value("WEIXIN_GROUP_ALLOWED_USERS", allow_groups)
print_success(" Group allowlist saved.")
print_success(" Group allowlist saved (only takes effect if iLink delivers group events).")
if user_id:
print()
@@ -3504,7 +3528,6 @@ def _setup_qqbot():
method_idx = prompt_choice(" How would you like to set up QQ Bot?", method_choices, 0)
credentials = None
used_qr = False
if method_idx == 0:
# ── QR scan-to-configure ──
@@ -3515,8 +3538,6 @@ def _setup_qqbot():
print()
print_warning(" QQ Bot setup cancelled.")
return
if credentials:
used_qr = True
if not credentials:
print_info(" QR setup did not complete. Continuing with manual input.")
+1 -2
View File
@@ -19,9 +19,8 @@ format) lives there.
from __future__ import annotations
import json
import os
from pathlib import Path
from typing import Any, Dict, List, Optional
from typing import Any, Dict, List
def hooks_command(args) -> None:
+280 -37
View File
@@ -1094,11 +1094,36 @@ def _make_tui_argv(tui_dir: Path, tui_dev: bool) -> tuple[list[str], Path]:
return [node, str(root / "dist" / "entry.js")], root
def _normalize_tui_toolsets(toolsets: object) -> list[str]:
"""Normalize argparse/Fire-style toolset input for the TUI subprocess."""
try:
from hermes_cli.oneshot import _normalize_toolsets
return _normalize_toolsets(toolsets) or []
except (AttributeError, ImportError):
if not toolsets:
return []
raw_items = [toolsets] if isinstance(toolsets, str) else toolsets
if not isinstance(raw_items, (list, tuple)):
raw_items = [raw_items]
normalized: list[str] = []
for item in raw_items:
if isinstance(item, str):
normalized.extend(part.strip() for part in item.split(","))
else:
normalized.append(str(item).strip())
return [item for item in normalized if item]
def _launch_tui(
resume_session_id: Optional[str] = None,
tui_dev: bool = False,
model: Optional[str] = None,
provider: Optional[str] = None,
toolsets: object = None,
):
"""Replace current process with the TUI."""
tui_dir = PROJECT_ROOT / "ui-tui"
@@ -1123,6 +1148,9 @@ def _launch_tui(
if provider:
env["HERMES_TUI_PROVIDER"] = provider
env["HERMES_INFERENCE_PROVIDER"] = provider
tui_toolsets = _normalize_tui_toolsets(toolsets)
if tui_toolsets:
env["HERMES_TUI_TOOLSETS"] = ",".join(tui_toolsets)
# Guarantee an 8GB V8 heap + exposed GC for the TUI. Default node cap is
# ~1.54GB depending on version and can fatal-OOM on long sessions with
# large transcripts / reasoning blobs. Token-level merge: respect any
@@ -1270,6 +1298,7 @@ def cmd_chat(args):
tui_dev=getattr(args, "tui_dev", False),
model=getattr(args, "model", None),
provider=getattr(args, "provider", None),
toolsets=getattr(args, "toolsets", None),
)
# Import and run the CLI
@@ -1770,6 +1799,8 @@ def select_provider_and_model(args=None):
_model_flow_openai_codex(config, current_model)
elif selected_provider == "qwen-oauth":
_model_flow_qwen_oauth(config, current_model)
elif selected_provider == "minimax-oauth":
_model_flow_minimax_oauth(config, current_model, args=args)
elif selected_provider == "google-gemini-cli":
_model_flow_google_gemini_cli(config, current_model)
elif selected_provider == "copilot-acp":
@@ -1820,6 +1851,8 @@ def select_provider_and_model(args=None):
"gmi",
"nvidia",
"ollama-cloud",
"tencent-tokenhub",
"lmstudio",
):
_model_flow_api_key_provider(config, selected_provider, current_model)
@@ -2046,7 +2079,11 @@ def _aux_select_for_task(task: str) -> None:
# Gather authenticated providers (has credentials + curated model list)
try:
providers = list_authenticated_providers(current_provider=current_provider)
providers = list_authenticated_providers(
current_provider=current_provider,
current_model=current_model,
current_base_url=current_base_url,
)
except Exception as exc:
print(f"Could not detect authenticated providers: {exc}")
providers = []
@@ -2652,6 +2689,53 @@ def _model_flow_qwen_oauth(_config, current_model=""):
print("No change.")
def _model_flow_minimax_oauth(config, current_model="", args=None):
"""MiniMax OAuth provider: ensure logged in, then pick model."""
from hermes_cli.auth import (
get_provider_auth_state,
_prompt_model_selection,
_save_model_choice,
_update_config_for_provider,
resolve_minimax_oauth_runtime_credentials,
AuthError,
format_auth_error,
_login_minimax_oauth,
PROVIDER_REGISTRY,
)
state = get_provider_auth_state("minimax-oauth")
if not state or not state.get("access_token"):
print("Not logged into MiniMax. Starting OAuth login...")
print()
try:
mock_args = argparse.Namespace(
region=getattr(args, "region", None) or "global",
no_browser=bool(getattr(args, "no_browser", False)),
timeout=getattr(args, "timeout", None) or 15.0,
)
_login_minimax_oauth(mock_args, PROVIDER_REGISTRY["minimax-oauth"])
except SystemExit:
print("Login cancelled or failed.")
return
except Exception as exc:
print(f"Login failed: {exc}")
return
try:
creds = resolve_minimax_oauth_runtime_credentials()
except AuthError as exc:
print(format_auth_error(exc))
return
from hermes_cli.models import _PROVIDER_MODELS
model_ids = _PROVIDER_MODELS.get("minimax-oauth", [])
selected = _prompt_model_selection(model_ids, current_model)
if not selected:
return
_save_model_choice(selected)
_update_config_for_provider("minimax-oauth", creds["base_url"])
print(f"\u2713 Using MiniMax model: {selected}")
def _model_flow_google_gemini_cli(_config, current_model=""):
"""Google Gemini OAuth (PKCE) via Cloud Code Assist — supports free AND paid tiers.
@@ -4376,6 +4460,7 @@ def _model_flow_bedrock(config, current_model=""):
def _model_flow_api_key_provider(config, provider_id, current_model=""):
"""Generic flow for API-key providers (z.ai, MiniMax, OpenCode, etc.)."""
from hermes_cli.auth import (
LMSTUDIO_NOAUTH_PLACEHOLDER,
PROVIDER_REGISTRY,
_prompt_model_selection,
_save_model_choice,
@@ -4410,13 +4495,20 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
try:
import getpass
new_key = getpass.getpass(f"{key_env} (or Enter to cancel): ").strip()
if provider_id == "lmstudio":
prompt = f"{key_env} (Enter for no-auth default {LMSTUDIO_NOAUTH_PLACEHOLDER!r}): "
else:
prompt = f"{key_env} (or Enter to cancel): "
new_key = getpass.getpass(prompt).strip()
except (KeyboardInterrupt, EOFError):
print()
return
if not new_key:
print("Cancelled.")
return
if provider_id == "lmstudio":
new_key = LMSTUDIO_NOAUTH_PLACEHOLDER
else:
print("Cancelled.")
return
save_env_value(key_env, new_key)
existing_key = new_key
print("API key saved.")
@@ -4483,10 +4575,21 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
print(" Tier check: could not verify (proceeding anyway).")
print()
# Optional base URL override
# Optional base URL override.
# Precedence: env var → config.yaml model.base_url → registry default.
# Reading config.yaml prevents silently overwriting a saved remote URL
# (e.g. a remote LM Studio endpoint) with localhost when the user just
# presses Enter at the prompt below.
current_base = ""
if base_url_env:
current_base = get_env_value(base_url_env) or os.getenv(base_url_env, "")
if not current_base:
try:
_m = load_config().get("model") or {}
if str(_m.get("provider") or "").strip().lower() == provider_id:
current_base = str(_m.get("base_url") or "").strip()
except Exception:
pass
effective_base = current_base or pconfig.inference_base_url
try:
@@ -4508,8 +4611,22 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
# 2. Curated static fallback list (offline insurance)
# 3. Live /models endpoint probe (small providers without models.dev data)
#
# Ollama Cloud: dedicated merged discovery (live API + models.dev + disk cache)
if provider_id == "ollama-cloud":
# LM Studio: live /api/v1/models probe (no models.dev catalog).
# Ollama Cloud: merged discovery (live API + models.dev + disk cache).
if provider_id == "lmstudio":
from hermes_cli.auth import AuthError
from hermes_cli.models import fetch_lmstudio_models
api_key_for_probe = existing_key or (get_env_value(key_env) if key_env else "")
try:
model_list = fetch_lmstudio_models(api_key=api_key_for_probe, base_url=effective_base)
except AuthError as exc:
print(f" LM Studio rejected the request: {exc}")
print(" Set LM_API_KEY (or update it) to match the server's bearer token.")
model_list = []
if model_list:
print(f" Found {len(model_list)} model(s) from LM Studio")
elif provider_id == "ollama-cloud":
from hermes_cli.models import fetch_ollama_cloud_models
api_key_for_probe = existing_key or (get_env_value(key_env) if key_env else "")
@@ -4731,7 +4848,6 @@ def _model_flow_anthropic(config, current_model=""):
read_claude_code_credentials,
is_claude_code_token_valid,
_is_oauth_token,
_resolve_claude_code_token_from_credentials,
)
cc_creds = read_claude_code_credentials()
@@ -5213,6 +5329,101 @@ def _build_web_ui(web_dir: Path, *, fatal: bool = False) -> bool:
return True
def _warn_stale_dashboard_processes() -> None:
"""Warn about running dashboard processes that still hold pre-update code.
``hermes dashboard`` is a long-lived server process commonly started and
forgotten. When ``hermes update`` replaces files on disk, the running
process keeps the old Python backend in memory while the JS bundle on
disk is updated, causing a silent frontend/backend mismatch (e.g. new
auth headers the old backend doesn't recognise → every API call 401s).
Unlike the gateway, the dashboard has no service manager (systemd /
launchd), so we can only warn we don't auto-kill user-managed
background processes.
"""
patterns = [
"hermes dashboard",
"hermes_cli.main dashboard",
"hermes_cli/main.py dashboard",
]
self_pid = os.getpid()
dashboard_pids: list[int] = []
try:
if sys.platform == "win32":
# wmic may emit text in the system code page (for example cp936
# on zh-CN systems), not UTF-8. In text mode, subprocess output
# decoding depends on Python's configuration (locale-dependent
# by default, or UTF-8 in UTF-8 mode). The important protection
# here is errors="ignore": it prevents a reader-thread
# UnicodeDecodeError from leaving result.stdout=None and turning
# the later .split() into an AttributeError (#17049).
result = subprocess.run(
["wmic", "process", "get", "ProcessId,CommandLine",
"/FORMAT:LIST"],
capture_output=True, text=True, timeout=10,
encoding="utf-8", errors="ignore",
)
if result.returncode != 0 or result.stdout is None:
return
current_cmd = ""
for line in result.stdout.split("\n"):
line = line.strip()
if line.startswith("CommandLine="):
current_cmd = line[len("CommandLine="):]
elif line.startswith("ProcessId="):
pid_str = line[len("ProcessId="):]
if (any(p in current_cmd for p in patterns)
and int(pid_str) != self_pid):
try:
dashboard_pids.append(int(pid_str))
except ValueError:
pass
else:
# Linux / macOS: scan the process table via ps and match against
# the same explicit patterns list used on Windows. Using ps
# (rather than `pgrep -f "hermes.*dashboard"`) keeps us consistent
# with `hermes_cli.gateway._scan_gateway_pids` and avoids the
# greedy regex matching unrelated cmdlines that merely contain
# both words (e.g. a chat session discussing "dashboard").
result = subprocess.run(
["ps", "-A", "-o", "pid=,command="],
capture_output=True, text=True, timeout=10,
)
if result.returncode == 0:
for line in result.stdout.split("\n"):
stripped = line.strip()
if not stripped or "grep" in stripped:
continue
parts = stripped.split(None, 1)
if len(parts) != 2:
continue
try:
pid = int(parts[0])
except ValueError:
continue
command = parts[1]
if (any(p in command for p in patterns)
and pid != self_pid):
dashboard_pids.append(pid)
except (FileNotFoundError, subprocess.TimeoutExpired, OSError):
return
if not dashboard_pids:
return
print()
print(f"{len(dashboard_pids)} dashboard process(es) still running "
f"with the previous version:")
for pid in dashboard_pids:
print(f" PID {pid}")
print(" The running backend may not match the updated frontend,")
print(" causing silent auth failures or empty data.")
print(" Restart them to pick up the changes:")
print(" kill <pid> && hermes dashboard --port <port> ...")
def _update_via_zip(args):
"""Update Hermes Agent by downloading a ZIP archive.
@@ -5347,6 +5558,7 @@ def _update_via_zip(args):
print()
print("✓ Update complete!")
_warn_stale_dashboard_processes()
def _stash_local_changes_if_needed(git_cmd: list[str], cwd: Path) -> Optional[str]:
@@ -7048,7 +7260,7 @@ def _cmd_update_impl(args, gateway_mode: bool):
print(
f"{svc_name} died after restart, retrying..."
)
retry = subprocess.run(
subprocess.run(
scope_cmd + ["restart", svc_name],
capture_output=True,
text=True,
@@ -7163,6 +7375,10 @@ def _cmd_update_impl(args, gateway_mode: bool):
except Exception as e:
logger.debug("Legacy unit check during update failed: %s", e)
# Warn about stale dashboard processes — the dashboard has no
# service manager, so we can only tell the user to restart them.
_warn_stale_dashboard_processes()
print()
print("Tip: You can now select a provider and model:")
print(" hermes model # Select provider and model")
@@ -7700,6 +7916,12 @@ For more help on a command:
"Applies to -z/--oneshot and --tui. Also settable via HERMES_INFERENCE_PROVIDER env var."
),
)
parser.add_argument(
"-t",
"--toolsets",
default=None,
help="Comma-separated toolsets to enable for this invocation. Applies to -z/--oneshot and --tui.",
)
parser.add_argument(
"--resume",
"-r",
@@ -7811,32 +8033,12 @@ For more help on a command:
)
chat_parser.add_argument(
"--provider",
choices=[
"auto",
"openrouter",
"nous",
"openai-codex",
"copilot-acp",
"copilot",
"anthropic",
"gemini",
"xai",
"ollama-cloud",
"huggingface",
"zai",
"kimi-coding",
"kimi-coding-cn",
"stepfun",
"minimax",
"minimax-cn",
"kilocode",
"xiaomi",
"arcee",
"gmi",
"nvidia",
],
# No `choices=` here: user-defined providers from config.yaml `providers:`
# are also valid values, and runtime resolution (resolve_runtime_provider)
# handles validation/error reporting consistently with the top-level
# `--provider` flag.
default=None,
help="Inference provider (default: auto)",
help="Inference provider (default: auto). Built-in or a user-defined name from `providers:` in config.yaml.",
)
chat_parser.add_argument(
"-v", "--verbose", action="store_true", help="Verbose output"
@@ -9120,6 +9322,26 @@ Examples:
except Exception as _exc:
logging.getLogger(__name__).debug("Plugin CLI discovery failed: %s", _exc)
# =========================================================================
# curator command — background skill maintenance
# =========================================================================
curator_parser = subparsers.add_parser(
"curator",
help="Background skill maintenance (curator) — status, run, pause, pin",
description=(
"The curator is an auxiliary-model background task that "
"periodically reviews agent-created skills, prunes stale ones, "
"consolidates overlaps, and archives obsolete skills. "
"Bundled and hub-installed skills are never touched. "
"Archives are recoverable; auto-deletion never happens."
),
)
try:
from hermes_cli.curator import register_cli as _register_curator_cli
_register_curator_cli(curator_parser)
except Exception as _exc:
logging.getLogger(__name__).debug("curator CLI wiring failed: %s", _exc)
# =========================================================================
# memory command
# =========================================================================
@@ -9676,17 +9898,26 @@ Examples:
"--preset",
choices=["user-data", "full"],
default="full",
help="Migration preset (default: full). 'user-data' excludes secrets",
help="Migration preset (default: full). Neither preset imports secrets"
"pass --migrate-secrets to include API keys.",
)
claw_migrate.add_argument(
"--overwrite",
action="store_true",
help="Overwrite existing files (default: skip conflicts)",
help="Overwrite existing files (default: refuse to apply when the plan has conflicts)",
)
claw_migrate.add_argument(
"--migrate-secrets",
action="store_true",
help="Include allowlisted secrets (TELEGRAM_BOT_TOKEN, API keys, etc.)",
help="Include allowlisted secrets (TELEGRAM_BOT_TOKEN, API keys, etc.). "
"Required even under --preset full.",
)
claw_migrate.add_argument(
"--no-backup",
action="store_true",
help="Skip the pre-migration zip snapshot of ~/.hermes/ (by default a "
"single restore-point archive is written to ~/.hermes/backups/ "
"before apply; restorable with 'hermes import').",
)
claw_migrate.add_argument(
"--workspace-target", help="Absolute path to copy workspace instructions into"
@@ -10101,6 +10332,17 @@ Examples:
logger.debug(
"plugin discovery failed at CLI startup", exc_info=True,
)
try:
# MCP tool discovery — no event loop running in CLI/TUI startup,
# so inline is safe. Moved here from model_tools.py module scope
# to avoid freezing the gateway's event loop on its first message
# via the same lazy import path (#16856).
from tools.mcp_tool import discover_mcp_tools
discover_mcp_tools()
except Exception:
logger.debug(
"MCP tool discovery failed at CLI startup", exc_info=True,
)
try:
from hermes_cli.config import load_config
from agent.shell_hooks import register_from_config
@@ -10120,6 +10362,7 @@ Examples:
args.oneshot,
model=getattr(args, "model", None),
provider=getattr(args, "provider", None),
toolsets=getattr(args, "toolsets", None),
))
# Handle top-level --resume / --continue as shortcut to chat
+2 -1
View File
@@ -16,6 +16,7 @@ import time
from typing import Any, Dict, List, Optional, Tuple
from hermes_cli.config import (
cfg_get,
load_config,
save_config,
get_env_value,
@@ -716,7 +717,7 @@ def cmd_mcp_configure(args):
# Update config
config = load_config()
server_entry = config.get("mcp_servers", {}).get(name, {})
server_entry = cfg_get(config, "mcp_servers", name, default={})
if len(chosen) == total:
# All selected → remove include/exclude (register all)
+2 -2
View File
@@ -46,7 +46,6 @@ from __future__ import annotations
import json
import logging
import os
import time
import urllib.error
import urllib.request
@@ -54,6 +53,7 @@ from pathlib import Path
from typing import Any
from hermes_cli import __version__ as _HERMES_VERSION
from utils import atomic_replace
logger = logging.getLogger(__name__)
@@ -190,7 +190,7 @@ def _write_disk_cache(data: dict[str, Any]) -> None:
with open(tmp, "w") as fh:
json.dump(data, fh, indent=2)
fh.write("\n")
os.replace(tmp, path)
atomic_replace(tmp, path)
except OSError as exc:
logger.info("model catalog cache write failed: %s", exc)
+1
View File
@@ -96,6 +96,7 @@ _MATCHING_PREFIX_STRIP_PROVIDERS: frozenset[str] = frozenset({
"kimi-coding",
"kimi-coding-cn",
"minimax",
"minimax-oauth",
"minimax-cn",
"alibaba",
"qwen-oauth",
+127 -6
View File
@@ -213,10 +213,15 @@ def _load_direct_aliases() -> dict[str, DirectAlias]:
def _ensure_direct_aliases() -> None:
"""Lazy-load direct aliases on first use."""
global DIRECT_ALIASES
"""Lazy-load direct aliases on first use.
Mutates the existing DIRECT_ALIASES dict in place rather than rebinding
the module attribute. This keeps `from hermes_cli.model_switch import
DIRECT_ALIASES` references valid in callers rebinding would leave them
pointing at a stale empty dict.
"""
if not DIRECT_ALIASES:
DIRECT_ALIASES = _load_direct_aliases()
DIRECT_ALIASES.update(_load_direct_aliases())
# ---------------------------------------------------------------------------
@@ -979,6 +984,7 @@ def list_authenticated_providers(
user_providers: dict = None,
custom_providers: list | None = None,
max_models: int = 8,
current_model: str = "",
) -> List[dict]:
"""Detect which providers have credentials and list their curated models.
@@ -1012,6 +1018,37 @@ def list_authenticated_providers(
results: List[dict] = []
seen_slugs: set = set() # lowercase-normalized to catch case variants (#9545)
seen_mdev_ids: set = set() # prevent duplicate entries for aliases (e.g. kimi-coding + kimi-coding-cn)
# Effective base URLs of every built-in row we emit (normalized lower+rstrip).
# Section 4 uses this to hide ``custom_providers`` entries that point at the
# same endpoint as a built-in (e.g. a user-defined "my-dashscope" on
# https://coding-intl.dashscope.aliyuncs.com/v1 collides with the built-in
# alibaba-coding-plan row when DASHSCOPE_API_KEY is present). Fixes #16970.
_builtin_endpoints: set = set()
def _norm_url(url: str) -> str:
return str(url or "").strip().rstrip("/").lower()
def _record_builtin_endpoint(slug: str) -> None:
"""Record the effective base URL for a built-in provider row.
Prefers the live env-override (e.g. DASHSCOPE_BASE_URL) over the
static inference_base_url so the dedup matches what a user typing
that URL into custom_providers would actually hit."""
try:
from hermes_cli.auth import PROVIDER_REGISTRY as _reg
except Exception:
return
pcfg = _reg.get(slug)
if not pcfg:
return
url = ""
if getattr(pcfg, "base_url_env_var", ""):
url = os.environ.get(pcfg.base_url_env_var, "") or ""
if not url:
url = getattr(pcfg, "inference_base_url", "") or ""
normed = _norm_url(url)
if normed:
_builtin_endpoints.add(normed)
data = fetch_models_dev()
@@ -1025,6 +1062,34 @@ def list_authenticated_providers(
if "ollama-cloud" not in curated:
from hermes_cli.models import fetch_ollama_cloud_models
curated["ollama-cloud"] = fetch_ollama_cloud_models()
# LM Studio has no static catalog — probe its native /api/v1/models
# endpoint live so the picker reflects whatever the user has loaded.
# Base URL precedence: LM_BASE_URL env var > active config's base_url
# (when current provider is lmstudio) > 127.0.0.1 default.
# On auth rejection or unreachable server, fall back to the caller-supplied
# current model so the picker still shows something when offline / mis-keyed.
if "lmstudio" not in curated and (
os.environ.get("LM_API_KEY") or os.environ.get("LM_BASE_URL") or current_provider.strip().lower() == "lmstudio"
):
from hermes_cli.models import fetch_lmstudio_models
from hermes_cli.auth import AuthError
is_current_lmstudio = current_provider.strip().lower() == "lmstudio"
lm_base = (
os.environ.get("LM_BASE_URL")
or (current_base_url if is_current_lmstudio and current_base_url else None)
or "http://127.0.0.1:1234/v1"
)
try:
live = fetch_lmstudio_models(
api_key=os.environ.get("LM_API_KEY", ""),
base_url=lm_base,
timeout=1.5, # Smaller timeout for picker
)
except AuthError:
live = []
if not live and is_current_lmstudio and current_model:
live = [current_model]
curated["lmstudio"] = live
# --- 1. Check Hermes-mapped providers ---
for hermes_id, mdev_id in PROVIDER_TO_MODELS_DEV.items():
@@ -1090,6 +1155,7 @@ def list_authenticated_providers(
})
seen_slugs.add(slug.lower())
seen_mdev_ids.add(mdev_id)
_record_builtin_endpoint(slug)
# --- 2. Check Hermes-only providers (nous, openai-codex, copilot, opencode-go) ---
from hermes_cli.providers import HERMES_OVERLAYS
@@ -1175,6 +1241,15 @@ def list_authenticated_providers(
if hermes_slug in {"copilot", "copilot-acp"}:
model_ids = provider_model_ids(hermes_slug)
# For aws_sdk providers (bedrock), use live discovery so the list
# reflects the active region (eu.*, ap.*) not the static us.* list.
elif overlay.auth_type == "aws_sdk":
try:
from agent.bedrock_adapter import bedrock_model_ids_or_none
_ids = bedrock_model_ids_or_none()
model_ids = _ids if _ids is not None else (curated.get(hermes_slug, []) or curated.get(pid, []))
except Exception:
model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
else:
# Use curated list — look up by Hermes slug, fall back to overlay key
model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
@@ -1195,6 +1270,7 @@ def list_authenticated_providers(
})
seen_slugs.add(pid.lower())
seen_slugs.add(hermes_slug.lower())
_record_builtin_endpoint(hermes_slug)
# --- 2b. Cross-check canonical provider list ---
# Catches providers that are in CANONICAL_PROVIDERS but weren't found
@@ -1237,10 +1313,30 @@ def list_authenticated_providers(
except Exception:
pass
# Special case: aws_sdk auth (bedrock) — no API key env vars,
# credentials come from the boto3 credential chain (env vars,
# ~/.aws/credentials, instance roles, etc.)
if not _cp_has_creds and _cp_config and getattr(_cp_config, "auth_type", "") == "aws_sdk":
try:
from agent.bedrock_adapter import has_aws_credentials
_cp_has_creds = has_aws_credentials()
except Exception:
pass
if not _cp_has_creds:
continue
_cp_model_ids = curated.get(_cp.slug, [])
# For bedrock, use live discovery so the list reflects the active
# region (eu.*, us.*, ap.*) instead of the hardcoded us.* static list.
if _cp_config and getattr(_cp_config, "auth_type", "") == "aws_sdk":
try:
from agent.bedrock_adapter import bedrock_model_ids_or_none
_ids = bedrock_model_ids_or_none()
_cp_model_ids = _ids if _ids is not None else curated.get(_cp.slug, [])
except Exception:
_cp_model_ids = curated.get(_cp.slug, [])
else:
_cp_model_ids = curated.get(_cp.slug, [])
_cp_total = len(_cp_model_ids)
_cp_top = _cp_model_ids[:max_models]
@@ -1254,6 +1350,7 @@ def list_authenticated_providers(
"source": "canonical",
})
seen_slugs.add(_cp.slug.lower())
_record_builtin_endpoint(_cp.slug)
# --- 3. User-defined endpoints from config ---
# Track (name, base_url) of what section 3 emits so section 4 can skip
@@ -1312,8 +1409,23 @@ def list_authenticated_providers(
if fb:
models_list = list(fb)
# Try to probe /v1/models if URL is set (but don't block on it)
# For now just show what we know from config
# Prefer the endpoint's live /models list when credentials are
# available. This keeps OpenAI-compatible relays (for example CRS)
# in sync when the server catalog changes without requiring the
# user to mirror every model into config.yaml.
api_key = str(ep_cfg.get("api_key", "") or "").strip()
if not api_key:
key_env = str(ep_cfg.get("key_env", "") or "").strip()
api_key = os.environ.get(key_env, "").strip() if key_env else ""
if api_url and api_key:
try:
from hermes_cli.models import fetch_api_models
live_models = fetch_api_models(api_key, api_url)
if live_models:
models_list = live_models
except Exception:
pass
results.append({
"slug": ep_name,
"name": display_name,
@@ -1448,6 +1560,15 @@ def list_authenticated_providers(
)
if _pair_key[0] and _pair_key[1] and _pair_key in _section3_emitted_pairs:
continue
# Skip if a built-in row (sections 1/2/2b) already represents this
# endpoint. Fixes #16970: a user-defined "my-dashscope" pointing at
# https://coding-intl.dashscope.aliyuncs.com/v1 duplicates the
# built-in alibaba-coding-plan row whenever DASHSCOPE_API_KEY is
# set. The built-in row carries the curated model list, correct
# auth wiring, and canonical slug — keep it and hide the shadow.
_grp_url_norm = _pair_key[1]
if _grp_url_norm and _grp_url_norm in _builtin_endpoints:
continue
results.append({
"slug": slug,
"name": grp["name"],
+421 -37
View File
@@ -44,6 +44,7 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
("openai/gpt-5.4-mini", ""),
("xiaomi/mimo-v2.5-pro", ""),
("xiaomi/mimo-v2.5", ""),
("tencent/hy3-preview:free", "free"),
("openai/gpt-5.3-codex", ""),
("google/gemini-3-pro-image-preview", ""),
("google/gemini-3-flash-preview", ""),
@@ -106,11 +107,57 @@ def _codex_curated_models() -> list[str]:
return _add_forward_compat_models(list(DEFAULT_CODEX_MODELS))
# Static fallback for xAI when the models.dev disk cache is empty (fresh
# install, offline first run, etc.). Mirrors the xAI-direct model IDs from
# $HERMES_HOME/models_dev_cache.json as of 2026-04-28. Whenever xAI renames
# or retires a model, the disk cache picks it up on the next refresh and the
# fallback here only matters until that refresh lands.
_XAI_STATIC_FALLBACK: list[str] = [
"grok-4.20-0309-reasoning",
"grok-4.20-0309-non-reasoning",
"grok-4.20-multi-agent-0309",
"grok-4-1-fast",
"grok-4-1-fast-non-reasoning",
"grok-4-fast",
"grok-4-fast-non-reasoning",
"grok-4",
"grok-code-fast-1",
]
def _xai_curated_models() -> list[str]:
"""Derive the xAI-direct curated list from models.dev disk cache.
Reads $HERMES_HOME/models_dev_cache.json directly (no network) so this
runs at import time without blocking. Falls back to ``_XAI_STATIC_FALLBACK``
when the cache is empty or unreadable. Hermes refreshes the cache from
https://models.dev/api.json on normal use, so this list self-heals as
xAI renames models.
Mirrors ``_codex_curated_models()``'s role for openai-codex.
"""
try:
from agent.models_dev import _load_disk_cache
data = _load_disk_cache()
xai = data.get("xai") if isinstance(data, dict) else None
models = xai.get("models") if isinstance(xai, dict) else None
if isinstance(models, dict) and models:
ids = [mid for mid in models.keys() if isinstance(mid, str)]
if ids:
return sorted(ids)
except Exception:
# Any failure (missing file, malformed JSON, import error)
# falls through to the static list.
pass
return list(_XAI_STATIC_FALLBACK)
_PROVIDER_MODELS: dict[str, list[str]] = {
"nous": [
"moonshotai/kimi-k2.6",
"xiaomi/mimo-v2.5-pro",
"xiaomi/mimo-v2.5",
"tencent/hy3-preview",
"anthropic/claude-opus-4.7",
"anthropic/claude-opus-4.6",
"anthropic/claude-sonnet-4.6",
@@ -193,10 +240,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"glm-4.5",
"glm-4.5-flash",
],
"xai": [
"grok-4.20-reasoning",
"grok-4-1-fast-reasoning",
],
"xai": _xai_curated_models(),
"nvidia": [
# NVIDIA flagship reasoning models
"nvidia/nemotron-3-super-120b-a12b",
@@ -244,6 +288,10 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"MiniMax-M2.1",
"MiniMax-M2",
],
"minimax-oauth": [
"MiniMax-M2.7",
"MiniMax-M2.7-highspeed",
],
"minimax-cn": [
"MiniMax-M2.7",
"MiniMax-M2.5",
@@ -273,6 +321,9 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"mimo-v2-omni",
"mimo-v2-flash",
],
"tencent-tokenhub": [
"hy3-preview",
],
"arcee": [
"trinity-large-thinking",
"trinity-large-preview",
@@ -350,6 +401,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
# to https://dashscope-intl.aliyuncs.com/compatible-mode/v1 (OpenAI-compat)
# or https://dashscope-intl.aliyuncs.com/apps/anthropic (Anthropic-compat).
"alibaba": [
"qwen3.6-plus",
"kimi-k2.5",
"qwen3.5-plus",
"qwen3-coder-plus",
@@ -720,10 +772,12 @@ class ProviderEntry(NamedTuple):
CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("nous", "Nous Portal", "Nous Portal (Nous Research subscription)"),
ProviderEntry("openrouter", "OpenRouter", "OpenRouter (100+ models, pay-per-use)"),
ProviderEntry("lmstudio", "LM Studio", "LM Studio (local desktop app with built-in model server)"),
ProviderEntry("ai-gateway", "Vercel AI Gateway", "Vercel AI Gateway (200+ models, $5 free credit, no markup)"),
ProviderEntry("anthropic", "Anthropic", "Anthropic (Claude models — API key or Claude Code)"),
ProviderEntry("openai-codex", "OpenAI Codex", "OpenAI Codex"),
ProviderEntry("xiaomi", "Xiaomi MiMo", "Xiaomi MiMo (MiMo-V2.5 and V2 models — pro, omni, flash)"),
ProviderEntry("tencent-tokenhub", "Tencent TokenHub", "Tencent TokenHub (Hy3 Preview — direct API via tokenhub.tencentmaas.com)"),
ProviderEntry("nvidia", "NVIDIA NIM", "NVIDIA NIM (Nemotron models — build.nvidia.com or local NIM)"),
ProviderEntry("qwen-oauth", "Qwen OAuth (Portal)", "Qwen OAuth (reuses local Qwen CLI login)"),
ProviderEntry("copilot", "GitHub Copilot", "GitHub Copilot (uses GITHUB_TOKEN or gh auth token)"),
@@ -738,6 +792,7 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("kimi-coding-cn", "Kimi / Moonshot (China)", "Kimi / Moonshot China (Moonshot CN direct API)"),
ProviderEntry("stepfun", "StepFun Step Plan", "StepFun Step Plan (agent/coding models via Step Plan API)"),
ProviderEntry("minimax", "MiniMax", "MiniMax (global direct API)"),
ProviderEntry("minimax-oauth", "MiniMax (OAuth)", "MiniMax via OAuth browser login (Coding Plan, minimax.io)"),
ProviderEntry("minimax-cn", "MiniMax (China)", "MiniMax China (domestic direct API)"),
ProviderEntry("alibaba", "Alibaba Cloud (DashScope)","Alibaba Cloud / DashScope Coding (Qwen + multi-provider)"),
ProviderEntry("ollama-cloud", "Ollama Cloud", "Ollama Cloud (cloud-hosted open models — ollama.com)"),
@@ -781,6 +836,9 @@ _PROVIDER_ALIASES = {
"gmicloud": "gmi",
"minimax-china": "minimax-cn",
"minimax_cn": "minimax-cn",
"minimax-portal": "minimax-oauth",
"minimax-global": "minimax-oauth",
"minimax_oauth": "minimax-oauth",
"claude": "anthropic",
"claude-code": "anthropic",
"deep-seek": "deepseek",
@@ -806,6 +864,10 @@ _PROVIDER_ALIASES = {
"huggingface-hub": "huggingface",
"mimo": "xiaomi",
"xiaomi-mimo": "xiaomi",
"tencent": "tencent-tokenhub",
"tokenhub": "tencent-tokenhub",
"tencent-cloud": "tencent-tokenhub",
"tencentmaas": "tencent-tokenhub",
"aws": "bedrock",
"aws-bedrock": "bedrock",
"amazon-bedrock": "bedrock",
@@ -817,6 +879,9 @@ _PROVIDER_ALIASES = {
"nvidia-nim": "nvidia",
"build-nvidia": "nvidia",
"nemotron": "nvidia",
"lmstudio": "lmstudio",
"lm-studio": "lmstudio",
"lm_studio": "lmstudio",
"ollama": "custom", # bare "ollama" = local; use "ollama-cloud" for cloud
"ollama_cloud": "ollama-cloud",
}
@@ -1623,31 +1688,41 @@ def provider_label(provider: Optional[str]) -> str:
# Models that support OpenAI Priority Processing (service_tier="priority").
# See https://openai.com/api-priority-processing/ for the canonical list.
# Only the bare model slug is stored (no vendor prefix).
_PRIORITY_PROCESSING_MODELS: frozenset[str] = frozenset({
"gpt-5.4",
"gpt-5.4-mini",
"gpt-5.2",
"gpt-5.1",
"gpt-5",
"gpt-5-mini",
"gpt-4.1",
"gpt-4.1-mini",
"gpt-4.1-nano",
"gpt-4o",
"gpt-4o-mini",
#
# Pattern-based matching — any OpenAI flagship model (gpt-*, o1*, o3*, o4*)
# is assumed to support Priority Processing. service_tier=priority is silently
# ignored by non-OpenAI endpoints (OpenRouter/Copilot/opencode-zen proxies
# strip the field), so false positives are harmless. Codex-series models
# (gpt-5-codex, gpt-5.3-codex, etc.) are excluded — they don't expose the
# service_tier parameter through the Codex Responses API.
_OPENAI_FAST_MODE_PREFIXES: tuple[str, ...] = (
"gpt-",
"o1",
"o3",
"o4-mini",
})
"o4",
)
def _is_openai_fast_model(model_id: Optional[str]) -> bool:
"""Return True if the model is an OpenAI flagship eligible for Priority Processing."""
raw = _strip_vendor_prefix(str(model_id or ""))
base = raw.split(":")[0]
if not base:
return False
# Exclude Codex-series — they route through the Codex Responses API
# which doesn't accept service_tier.
if "codex" in base:
return False
return any(base.startswith(prefix) for prefix in _OPENAI_FAST_MODE_PREFIXES)
# Models that support Anthropic Fast Mode (speed="fast").
# See https://platform.claude.com/docs/en/build-with-claude/fast-mode
# Currently only Claude Opus 4.6. Both hyphen and dot variants are stored
# to handle native Anthropic (claude-opus-4-6) and OpenRouter (claude-opus-4.6).
_ANTHROPIC_FAST_MODE_MODELS: frozenset[str] = frozenset({
"claude-opus-4-6",
"claude-opus-4.6",
})
#
# Pattern-based matching — any claude-* model is eligible. The anthropic
# adapter gates speed=fast on native Anthropic endpoints only (see
# _is_third_party_anthropic_endpoint in agent/anthropic_adapter.py), so
# third-party proxies that would reject the beta header are protected.
def _strip_vendor_prefix(model_id: str) -> str:
@@ -1660,20 +1735,14 @@ def _strip_vendor_prefix(model_id: str) -> str:
def model_supports_fast_mode(model_id: Optional[str]) -> bool:
"""Return whether Hermes should expose the /fast toggle for this model."""
raw = _strip_vendor_prefix(str(model_id or ""))
if raw in _PRIORITY_PROCESSING_MODELS:
return True
# Anthropic fast mode — strip date suffixes (e.g. claude-opus-4-6-20260401)
# and OpenRouter variant tags (:fast, :beta) for matching.
base = raw.split(":")[0]
return base in _ANTHROPIC_FAST_MODE_MODELS
return _is_anthropic_fast_model(model_id) or _is_openai_fast_model(model_id)
def _is_anthropic_fast_model(model_id: Optional[str]) -> bool:
"""Return True if the model supports Anthropic's fast mode (speed='fast')."""
"""Return True if the model is a Claude model eligible for Anthropic Fast Mode."""
raw = _strip_vendor_prefix(str(model_id or ""))
base = raw.split(":")[0]
return base in _ANTHROPIC_FAST_MODE_MODELS
return base.startswith("claude-")
def resolve_fast_mode_overrides(model_id: Optional[str]) -> dict[str, Any] | None:
@@ -1695,14 +1764,61 @@ def resolve_fast_mode_overrides(model_id: Optional[str]) -> dict[str, Any] | Non
def _resolve_copilot_catalog_api_key() -> str:
"""Best-effort GitHub token for fetching the Copilot model catalog."""
"""Best-effort GitHub token for fetching the Copilot model catalog.
Resolution order:
1. ``resolve_api_key_provider_credentials("copilot")`` env vars
(``COPILOT_GITHUB_TOKEN`` / ``GH_TOKEN`` / ``GITHUB_TOKEN``) plus
the ``gh auth token`` CLI fallback.
2. ``read_credential_pool("copilot")`` a token (typically a
``gho_*`` from device-code login, or a fine-grained PAT) stored in
``auth.json`` under ``credential_pool.copilot[]``. The pool is
populated by ``hermes auth add copilot`` and by ``_seed_from_env``
when the env var is set in ``~/.hermes/.env``.
Without (2), users whose only Copilot credential is in the pool see
the ``/model`` picker fall back to a stale hardcoded list because the
live catalog fetch silently 401s. To avoid wedging on a malformed pool
entry, each candidate is exchanged via ``exchange_copilot_token``
only entries that actually exchange successfully are returned, so a
later valid entry is reachable when an earlier one is unsupported.
"""
try:
from hermes_cli.auth import resolve_api_key_provider_credentials
creds = resolve_api_key_provider_credentials("copilot")
return str(creds.get("api_key") or "").strip()
api_key = str(creds.get("api_key") or "").strip()
if api_key:
return api_key
except Exception:
return ""
pass
try:
from hermes_cli.auth import read_credential_pool
from hermes_cli.copilot_auth import (
exchange_copilot_token,
validate_copilot_token,
)
for entry in read_credential_pool("copilot"):
if not isinstance(entry, dict):
continue
raw = str(entry.get("access_token") or "").strip()
if not raw:
continue
valid, _ = validate_copilot_token(raw)
if not valid:
continue
try:
api_token, _expires_at = exchange_copilot_token(raw)
except Exception:
continue
if api_token:
return api_token
except Exception:
pass
return ""
# Providers where models.dev is treated as authoritative: curated static
@@ -1884,6 +2000,18 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
live = fetch_api_models(api_key, base_url)
if live:
return live
# Bedrock uses live discovery keyed by the resolved AWS region so that
# EU/AP users see eu.*/ap.* model IDs instead of the static us.* list.
# Note: early return intentionally skips _MODELS_DEV_PREFERRED merge
# below — bedrock is not expected to appear in that table.
if normalized == "bedrock":
try:
from agent.bedrock_adapter import bedrock_model_ids_or_none
ids = bedrock_model_ids_or_none()
if ids is not None:
return ids
except Exception:
pass
curated_static = list(_PROVIDER_MODELS.get(normalized, []))
if normalized in _MODELS_DEV_PREFERRED:
return _merge_with_models_dev(normalized, curated_static)
@@ -2079,6 +2207,228 @@ def _is_github_models_base_url(base_url: Optional[str]) -> bool:
)
def _lmstudio_server_root(base_url: Optional[str]) -> Optional[str]:
"""Strip ``/v1`` suffix from an LM Studio base URL to get the native API root.
Returns ``None`` when the base URL is empty/invalid.
"""
root = (base_url or "").strip().rstrip("/")
if root.endswith("/v1"):
root = root[:-3].rstrip("/")
return root or None
def _lmstudio_request_headers(api_key: Optional[str] = None) -> dict:
"""Build HTTP headers for LM Studio native API requests."""
headers = {"User-Agent": _HERMES_USER_AGENT}
token = str(api_key or "").strip()
if token:
headers["Authorization"] = f"Bearer {token}"
return headers
def _lmstudio_fetch_raw_models(
api_key: Optional[str] = None,
base_url: Optional[str] = None,
timeout: float = 5.0,
) -> Optional[list[dict]]:
"""Fetch the raw model list from LM Studio's ``/api/v1/models``.
Returns the ``models`` list of dicts on success, ``None`` on network
errors or malformed responses. Raises ``AuthError`` on HTTP 401/403.
"""
server_root = _lmstudio_server_root(base_url)
if not server_root:
return None
headers = _lmstudio_request_headers(api_key)
request = urllib.request.Request(server_root + "/api/v1/models", headers=headers)
try:
with urllib.request.urlopen(request, timeout=timeout) as resp:
payload = json.loads(resp.read().decode())
except urllib.error.HTTPError as exc:
if exc.code in (401, 403):
from hermes_cli.auth import AuthError
raise AuthError(
f"LM Studio rejected the request with HTTP {exc.code}.",
provider="lmstudio",
code="auth_rejected",
) from exc
import logging
logging.getLogger(__name__).debug(
"LM Studio probe at %s failed with HTTP %s", server_root, exc.code,
)
return None
except Exception as exc:
import logging
logging.getLogger(__name__).debug(
"LM Studio probe at %s failed: %s", server_root, exc,
)
return None
raw_models = payload.get("models") if isinstance(payload, dict) else None
if not isinstance(raw_models, list):
import logging
logging.getLogger(__name__).debug(
"LM Studio probe at %s returned malformed payload (no `models` list)",
server_root,
)
return None
return raw_models
def probe_lmstudio_models(
api_key: Optional[str] = None,
base_url: Optional[str] = None,
timeout: float = 5.0,
) -> Optional[list[str]]:
"""Probe LM Studio's model listing.
Returns chat-capable model keys on success, including the valid empty-list
case when the server is reachable but has no non-embedding models.
Returns ``None`` on network errors, malformed responses, or empty/invalid
base URLs.
Raises ``AuthError`` on HTTP 401/403 so callers can surface token issues
separately from reachability problems.
"""
raw_models = _lmstudio_fetch_raw_models(api_key=api_key, base_url=base_url, timeout=timeout)
if raw_models is None:
return None
keys: list[str] = []
for raw in raw_models:
if not isinstance(raw, dict):
continue
if str(raw.get("type") or "").strip().lower() == "embedding":
continue
key = str(raw.get("key") or raw.get("id") or "").strip()
if key and key not in keys:
keys.append(key)
return keys
def fetch_lmstudio_models(
api_key: Optional[str] = None,
base_url: Optional[str] = None,
timeout: float = 5.0,
) -> list[str]:
"""Fetch LM Studio chat-capable model keys from native ``/api/v1/models``.
Returns a list of model keys (e.g. ``publisher/model-name``) with embedding
models filtered out. Returns an empty list on network errors, malformed
responses, or empty/invalid base URLs.
Raises ``AuthError`` on HTTP 401/403 so callers can distinguish a missing
or wrong ``LM_API_KEY`` from an unreachable server the most common
LM Studio support case once auth-enabled mode is turned on.
"""
models = probe_lmstudio_models(api_key=api_key, base_url=base_url, timeout=timeout)
return models or []
def ensure_lmstudio_model_loaded(
model: str,
base_url: Optional[str],
api_key: Optional[str],
target_context_length: int,
timeout: float = 120.0,
) -> Optional[int]:
"""Ensure LM Studio has ``model`` loaded with at least ``target_context_length``.
No-op when an instance is already loaded with sufficient context. Otherwise
POSTs ``/api/v1/models/load`` to (re)load with the target context, capped
at the model's ``max_context_length``. Returns the resolved loaded context
length, or ``None`` when the probe / load failed.
"""
server_root = _lmstudio_server_root(base_url)
if not server_root:
return None
headers = _lmstudio_request_headers(api_key)
try:
raw_models = _lmstudio_fetch_raw_models(api_key=api_key, base_url=base_url, timeout=10)
except Exception:
raw_models = None
if raw_models is None:
return None
target_entry = None
for raw in raw_models:
if not isinstance(raw, dict):
continue
if raw.get("key") == model or raw.get("id") == model:
target_entry = raw
break
if target_entry is None:
return None
max_ctx = target_entry.get("max_context_length")
if isinstance(max_ctx, int) and max_ctx > 0:
target_context_length = min(target_context_length, max_ctx)
for inst in target_entry.get("loaded_instances") or []:
cfg = inst.get("config") if isinstance(inst, dict) else None
loaded_ctx = cfg.get("context_length") if isinstance(cfg, dict) else None
if isinstance(loaded_ctx, int) and loaded_ctx >= target_context_length:
return loaded_ctx
body = json.dumps({
"model": model,
"context_length": target_context_length,
}).encode()
load_headers = dict(headers)
load_headers["Content-Type"] = "application/json"
try:
with urllib.request.urlopen(
urllib.request.Request(
server_root + "/api/v1/models/load",
data=body,
headers=load_headers,
method="POST",
),
timeout=timeout,
) as resp:
resp.read()
except Exception:
return None
return target_context_length
def lmstudio_model_reasoning_options(
model: str,
base_url: Optional[str],
api_key: Optional[str] = None,
timeout: float = 5.0,
) -> list[str]:
"""Return the reasoning ``allowed_options`` LM Studio publishes for ``model``.
Pulls ``capabilities.reasoning.allowed_options`` from ``/api/v1/models``.
Returns ``[]`` when the model is unknown, the endpoint is unreachable,
or the model does not declare a reasoning capability.
"""
try:
raw_models = _lmstudio_fetch_raw_models(api_key=api_key, base_url=base_url, timeout=timeout)
except Exception:
raw_models = None
if not raw_models:
return []
for raw in raw_models:
if not isinstance(raw, dict):
continue
if raw.get("key") != model and raw.get("id") != model:
continue
caps = raw.get("capabilities")
reasoning = caps.get("reasoning") if isinstance(caps, dict) else None
opts = reasoning.get("allowed_options") if isinstance(reasoning, dict) else None
if isinstance(opts, list):
return [str(o).strip().lower() for o in opts if isinstance(o, str)]
return []
return []
def _fetch_github_models(api_key: Optional[str] = None, timeout: float = 5.0) -> Optional[list[str]]:
catalog = fetch_github_model_catalog(api_key=api_key, timeout=timeout)
if not catalog:
@@ -2674,6 +3024,40 @@ def validate_requested_model(
"message": "Model names cannot contain spaces.",
}
if normalized == "lmstudio":
from hermes_cli.auth import AuthError
# Use probe_lmstudio_models so we can distinguish None (unreachable
# / malformed response) from [] (reachable, but no chat-capable models
# are loaded). fetch_lmstudio_models collapses both to [].
try:
models = probe_lmstudio_models(api_key=api_key, base_url=base_url)
except AuthError as exc:
return {
"accepted": False, "persist": False, "recognized": False,
"message": (
f"{exc} Set `LM_API_KEY` (or update it) to match the server's bearer token."
),
}
if models is None:
return {
"accepted": False, "persist": False, "recognized": False,
"message": f"Could not reach LM Studio's `/api/v1/models` to validate `{requested}`.",
}
if not models:
return {
"accepted": False, "persist": False, "recognized": False,
"message": (
f"LM Studio is reachable but no chat-capable models are loaded. "
f"Load `{requested}` in LM Studio (Developer tab → Load Model) and try again."
),
}
if requested_for_lookup in set(models):
return {"accepted": True, "persist": True, "recognized": True, "message": None}
return {
"accepted": False, "persist": False, "recognized": False,
"message": f"Model `{requested}` was not found in LM Studio's model listing.",
}
if normalized == "custom":
# Try probing with correct auth for the api_mode.
if api_mode == "anthropic_messages":
+145 -16
View File
@@ -3,7 +3,8 @@
Bypasses cli.py entirely. No banner, no spinner, no session_id line,
no stderr chatter. Just the agent's final text to stdout.
Toolsets = whatever the user has configured for "cli" in `hermes tools`.
Toolsets = explicit --toolsets when provided, otherwise whatever the user has
configured for "cli" in `hermes tools`.
Rules / memory / AGENTS.md / preloaded skills = same as a normal chat turn.
Approvals = auto-bypassed (HERMES_YOLO_MODE=1 is set for the call).
Working directory = the user's CWD (AGENTS.md etc. resolve from there as usual).
@@ -28,10 +29,103 @@ from contextlib import redirect_stderr, redirect_stdout
from typing import Optional
def _normalize_toolsets(toolsets: object = None) -> list[str] | None:
if not toolsets:
return None
raw_items = [toolsets] if isinstance(toolsets, str) else toolsets
if not isinstance(raw_items, (list, tuple)):
raw_items = [raw_items]
normalized: list[str] = []
for item in raw_items:
if isinstance(item, str):
normalized.extend(part.strip() for part in item.split(","))
else:
normalized.append(str(item).strip())
return [item for item in normalized if item] or None
def _validate_explicit_toolsets(toolsets: object = None) -> tuple[list[str] | None, str | None]:
normalized = _normalize_toolsets(toolsets)
if normalized is None:
return None, None
try:
from toolsets import validate_toolset
except Exception as exc:
return None, f"hermes -z: failed to validate --toolsets: {exc}\n"
built_in = [name for name in normalized if validate_toolset(name)]
unresolved = [name for name in normalized if name not in built_in]
if unresolved:
try:
from hermes_cli.plugins import discover_plugins
discover_plugins()
plugin_valid = [name for name in unresolved if validate_toolset(name)]
except Exception:
plugin_valid = []
if plugin_valid:
built_in.extend(plugin_valid)
unresolved = [name for name in unresolved if name not in plugin_valid]
if any(name in {"all", "*"} for name in built_in):
ignored = [name for name in normalized if name not in {"all", "*"}]
if ignored:
sys.stderr.write(
"hermes -z: --toolsets all enables every toolset; "
f"ignoring additional entries: {', '.join(ignored)}\n"
)
return None, None
mcp_names: set[str] = set()
mcp_disabled: set[str] = set()
if unresolved:
try:
from hermes_cli.config import read_raw_config
from hermes_cli.tools_config import _parse_enabled_flag
cfg = read_raw_config()
mcp_servers = cfg.get("mcp_servers") if isinstance(cfg.get("mcp_servers"), dict) else {}
for name, server_cfg in mcp_servers.items():
if not isinstance(server_cfg, dict):
continue
if _parse_enabled_flag(server_cfg.get("enabled", True), default=True):
mcp_names.add(str(name))
else:
mcp_disabled.add(str(name))
except Exception:
mcp_names = set()
mcp_disabled = set()
mcp_valid = [name for name in unresolved if name in mcp_names]
disabled = [name for name in unresolved if name in mcp_disabled]
unknown = [name for name in unresolved if name not in mcp_names and name not in mcp_disabled]
valid = built_in + mcp_valid
if unknown:
sys.stderr.write(f"hermes -z: ignoring unknown --toolsets entries: {', '.join(unknown)}\n")
if disabled:
sys.stderr.write(
"hermes -z: ignoring disabled MCP servers (set enabled: true in config.yaml to use): "
f"{', '.join(disabled)}\n"
)
if not valid:
return None, "hermes -z: --toolsets did not contain any valid toolsets.\n"
return valid, None
def run_oneshot(
prompt: str,
model: Optional[str] = None,
provider: Optional[str] = None,
toolsets: object = None,
) -> int:
"""Execute a single prompt and print only the final content block.
@@ -42,6 +136,7 @@ def run_oneshot(
provider: Optional provider override. Falls back to
HERMES_INFERENCE_PROVIDER env var, then config.yaml's model.provider,
then "auto".
toolsets: Optional comma-separated string or iterable of toolsets.
Returns the exit code. Caller should sys.exit() with the return.
"""
@@ -65,6 +160,12 @@ def run_oneshot(
)
return 2
explicit_toolsets, toolsets_error = _validate_explicit_toolsets(toolsets)
if toolsets_error:
sys.stderr.write(toolsets_error)
return 2
use_config_toolsets = _normalize_toolsets(toolsets) is None
# Auto-approve any shell / tool approvals. Non-interactive by
# definition — a prompt would hang forever.
os.environ["HERMES_YOLO_MODE"] = "1"
@@ -77,7 +178,13 @@ def run_oneshot(
try:
with redirect_stdout(devnull), redirect_stderr(devnull):
response = _run_agent(prompt, model=model, provider=provider)
response = _run_agent(
prompt,
model=model,
provider=provider,
toolsets=explicit_toolsets,
use_config_toolsets=use_config_toolsets,
)
finally:
try:
devnull.close()
@@ -96,6 +203,8 @@ def _run_agent(
prompt: str,
model: Optional[str] = None,
provider: Optional[str] = None,
toolsets: object = None,
use_config_toolsets: bool = True,
) -> str:
"""Build an AIAgent exactly like a normal CLI chat turn would, then
run a single conversation. Returns the final response string."""
@@ -128,32 +237,52 @@ def _run_agent(
# the user's configured default provider, which may not host the model
# the caller just asked for.
effective_provider = (provider or "").strip() or None
explicit_base_url_from_alias: Optional[str] = None
if effective_provider is None and (model or env_model):
# Only auto-detect when the model was explicitly requested via arg or
# env var (not when it came from config — that's the "use my defaults"
# path and the configured provider is already correct).
explicit_model = (model or "").strip() or env_model
if explicit_model:
cfg_provider = ""
if isinstance(model_cfg, dict):
cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
current_provider = (
cfg_provider
or os.getenv("HERMES_INFERENCE_PROVIDER", "").strip().lower()
or "auto"
)
detected = detect_provider_for_model(explicit_model, current_provider)
if detected:
effective_provider, effective_model = detected
# First check DIRECT_ALIASES populated from config.yaml `model_aliases:`.
# These map a user-defined alias to (model, provider, base_url) for
# endpoints not in any catalog (local servers, custom proxies, etc.).
try:
from hermes_cli import model_switch as _ms
_ms._ensure_direct_aliases()
direct = _ms.DIRECT_ALIASES.get(explicit_model.strip().lower())
except Exception:
direct = None
if direct is not None:
effective_model = direct.model
effective_provider = direct.provider
if direct.base_url:
explicit_base_url_from_alias = direct.base_url.rstrip("/")
else:
cfg_provider = ""
if isinstance(model_cfg, dict):
cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
current_provider = (
cfg_provider
or os.getenv("HERMES_INFERENCE_PROVIDER", "").strip().lower()
or "auto"
)
detected = detect_provider_for_model(explicit_model, current_provider)
if detected:
effective_provider, effective_model = detected
runtime = resolve_runtime_provider(
requested=effective_provider,
target_model=effective_model or None,
explicit_base_url=explicit_base_url_from_alias,
)
# Pull in whatever toolsets the user has enabled for "cli".
# sorted() gives stable ordering; set→list for AIAgent's signature.
toolsets_list = sorted(_get_platform_tools(cfg, "cli"))
# Pull in explicit toolsets when provided; otherwise use whatever the user
# has enabled for "cli". sorted() gives stable ordering for config-derived
# sets; explicit values preserve user order.
toolsets_list = _normalize_toolsets(toolsets)
if toolsets_list is None and use_config_toolsets:
toolsets_list = sorted(_get_platform_tools(cfg, "cli"))
agent = AIAgent(
api_key=runtime.get("api_key"),
+16 -1
View File
@@ -45,6 +45,7 @@ from typing import Any, Callable, Dict, List, Optional, Set, Union
from hermes_constants import get_hermes_home
from utils import env_var_enabled
from hermes_cli.config import cfg_get
try:
import yaml
@@ -79,6 +80,20 @@ VALID_HOOKS: Set[str] = {
# {"action": "allow"} / None -> normal dispatch
# Kwargs: event: MessageEvent, gateway: GatewayRunner, session_store.
"pre_gateway_dispatch",
# Approval lifecycle hooks. Fired by tools/approval.py when a dangerous
# command needs user approval -- fires BOTH for CLI-interactive prompts
# and for gateway/ACP approvals (Telegram, Discord, Slack, TUI, etc.).
# Observers only: return values are ignored. Plugins cannot veto or
# pre-answer an approval from these hooks (use pre_tool_call to block
# a tool before it reaches approval).
#
# Kwargs for pre_approval_request:
# command: str, description: str, pattern_key: str, pattern_keys: list[str],
# session_key: str, surface: "cli" | "gateway"
# Kwargs for post_approval_response: same as above plus
# choice: "once" | "session" | "always" | "deny" | "timeout"
"pre_approval_request",
"post_approval_response",
}
ENTRY_POINTS_GROUP = "hermes_agent.plugins"
@@ -101,7 +116,7 @@ def _get_disabled_plugins() -> set:
try:
from hermes_cli.config import load_config
config = load_config()
disabled = config.get("plugins", {}).get("disabled", [])
disabled = cfg_get(config, "plugins", "disabled", default=[])
return set(disabled) if isinstance(disabled, list) else set()
except Exception:
return set()
+4 -4
View File
@@ -18,6 +18,7 @@ from pathlib import Path
from typing import Optional
from hermes_constants import get_hermes_home
from hermes_cli.config import cfg_get
logger = logging.getLogger(__name__)
@@ -519,7 +520,7 @@ def _get_disabled_set() -> set:
try:
from hermes_cli.config import load_config
config = load_config()
disabled = config.get("plugins", {}).get("disabled", [])
disabled = cfg_get(config, "plugins", "disabled", default=[])
return set(disabled) if isinstance(disabled, list) else set()
except Exception:
return set()
@@ -763,7 +764,7 @@ def _get_current_memory_provider() -> str:
try:
from hermes_cli.config import load_config
config = load_config()
return config.get("memory", {}).get("provider", "") or ""
return cfg_get(config, "memory", "provider", default="") or ""
except Exception:
return ""
@@ -773,7 +774,7 @@ def _get_current_context_engine() -> str:
try:
from hermes_cli.config import load_config
config = load_config()
return config.get("context", {}).get("engine", "compressor") or "compressor"
return cfg_get(config, "context", "engine", default="compressor") or "compressor"
except Exception:
return "compressor"
@@ -999,7 +1000,6 @@ def _run_composite_ui(curses, plugin_names, plugin_labels, plugin_selected,
# We need to map logical cursor positions to screen rows
# accounting for non-navigable separator/headers
draw_row = 0 # tracks navigable item index
# --- General Plugins section ---
if n_plugins > 0:
+87 -4
View File
@@ -71,6 +71,29 @@ _CLONE_ALL_STRIP = [
"processes.json",
]
def _clone_all_copytree_ignore(source_dir: Path):
"""Ignore ``profiles/`` at the root of *source_dir* only.
``~/.hermes`` contains ``profiles/<name>/`` for sibling named profiles.
``shutil.copytree`` would otherwise duplicate that entire tree inside the
new profile (recursive ``.../profiles/.../profiles/...``). Export already
excludes ``profiles`` via ``_DEFAULT_EXPORT_EXCLUDE_ROOT`` match that
behavior for ``--clone-all``.
"""
source_resolved = source_dir.resolve()
def _ignore(directory: str, names: List[str]) -> List[str]:
try:
if Path(directory).resolve() == source_resolved:
return [n for n in names if n == "profiles"]
except (OSError, ValueError):
pass
return []
return _ignore
# Directories/files to exclude when exporting the default (~/.hermes) profile.
# The default profile contains infrastructure (repo checkout, worktrees, DBs,
# caches, binaries) that named profiles don't have. We exclude those so the
@@ -424,8 +447,12 @@ def create_profile(
)
if clone_all and source_dir:
# Full copy of source profile
shutil.copytree(source_dir, profile_dir)
# Full copy of source profile (exclude sibling ~/.hermes/profiles/)
shutil.copytree(
source_dir,
profile_dir,
ignore=_clone_all_copytree_ignore(source_dir),
)
# Strip runtime files
for stale in _CLONE_ALL_STRIP:
(profile_dir / stale).unlink(missing_ok=True)
@@ -954,6 +981,59 @@ def import_profile(archive_path: str, name: Optional[str] = None) -> Path:
# Rename
# ---------------------------------------------------------------------------
def _migrate_honcho_profile_host(old_name: str, new_name: str, new_dir: Path) -> None:
"""Rename Honcho host blocks for a renamed profile without changing peers."""
old_host = f"hermes.{old_name}"
new_host = f"hermes.{new_name}"
candidates = [
new_dir / "honcho.json",
_get_default_hermes_home() / "honcho.json",
Path.home() / ".honcho" / "config.json",
]
seen: set[Path] = set()
for path in candidates:
try:
resolved = path.resolve()
except OSError:
resolved = path
if resolved in seen or not path.is_file():
continue
seen.add(resolved)
try:
raw = json.loads(path.read_text(encoding="utf-8"))
except (OSError, json.JSONDecodeError):
continue
hosts = raw.get("hosts")
if not isinstance(hosts, dict) or old_host not in hosts:
continue
if new_host in hosts:
print(f"⚠ Honcho host block not migrated: {new_host} already exists in {path}")
continue
block = hosts[old_host]
if isinstance(block, dict) and "aiPeer" not in block:
bare = old_host.split(".", 1)[1] if "." in old_host else old_host
block["aiPeer"] = bare
hosts[new_host] = hosts.pop(old_host)
tmp = path.with_suffix(path.suffix + ".tmp")
try:
tmp.write_text(json.dumps(raw, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
tmp.replace(path)
except OSError:
try:
tmp.unlink(missing_ok=True)
except OSError:
pass
continue
print(f"✓ Honcho host updated: {old_host}{new_host}")
def rename_profile(old_name: str, new_name: str) -> Path:
"""Rename a profile: directory, wrapper script, service, active_profile.
@@ -984,7 +1064,10 @@ def rename_profile(old_name: str, new_name: str) -> Path:
old_dir.rename(new_dir)
print(f"✓ Renamed {old_dir.name}{new_dir.name}")
# 3. Update wrapper script
# 3. Update profile-scoped Honcho host blocks, preserving aiPeer identity
_migrate_honcho_profile_host(old_name, new_name, new_dir)
# 4. Update wrapper script
remove_wrapper_script(old_name)
collision = check_alias_collision(new_name)
if not collision:
@@ -993,7 +1076,7 @@ def rename_profile(old_name: str, new_name: str) -> Path:
else:
print(f"⚠ Cannot create alias '{new_name}'{collision}")
# 4. Update active_profile if it pointed to old name
# 5. Update active_profile if it pointed to old name
try:
if get_active_profile() == old_name:
set_active_profile(new_name)
+28
View File
@@ -71,6 +71,13 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
auth_type="oauth_external",
base_url_override="cloudcode-pa://google",
),
"lmstudio": HermesOverlay(
transport="openai_chat",
auth_type="api_key",
extra_env_vars=("LM_API_KEY",),
base_url_override="http://127.0.0.1:1234/v1",
base_url_env_var="LM_BASE_URL",
),
"copilot-acp": HermesOverlay(
transport="codex_responses",
auth_type="external_process",
@@ -104,6 +111,11 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
transport="anthropic_messages",
base_url_env_var="MINIMAX_BASE_URL",
),
"minimax-oauth": HermesOverlay(
transport="anthropic_messages",
auth_type="oauth_external",
base_url_override="https://api.minimax.io/anthropic",
),
"minimax-cn": HermesOverlay(
transport="anthropic_messages",
base_url_env_var="MINIMAX_CN_BASE_URL",
@@ -158,6 +170,10 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
transport="openai_chat",
base_url_env_var="XIAOMI_BASE_URL",
),
"tencent-tokenhub": HermesOverlay(
transport="openai_chat",
base_url_env_var="TOKENHUB_BASE_URL",
),
"arcee": HermesOverlay(
transport="openai_chat",
base_url_override="https://api.arcee.ai/api/v1",
@@ -179,6 +195,10 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
transport="openai_chat", # default; overridden by api_mode in config
base_url_env_var="AZURE_FOUNDRY_BASE_URL",
),
"bedrock": HermesOverlay(
transport="bedrock_converse",
auth_type="aws_sdk",
),
}
@@ -293,6 +313,12 @@ ALIASES: Dict[str, str] = {
"mimo": "xiaomi",
"xiaomi-mimo": "xiaomi",
# tencent
"tencent": "tencent-tokenhub",
"tokenhub": "tencent-tokenhub",
"tencent-cloud": "tencent-tokenhub",
"tencentmaas": "tencent-tokenhub",
# bedrock
"aws": "bedrock",
"aws-bedrock": "bedrock",
@@ -330,6 +356,8 @@ _LABEL_OVERRIDES: Dict[str, str] = {
"stepfun": "StepFun Step Plan",
"xiaomi": "Xiaomi MiMo",
"gmi": "GMI Cloud",
"tencent-tokenhub": "Tencent TokenHub",
"lmstudio": "LM Studio",
"local": "Local endpoint",
"bedrock": "AWS Bedrock",
"ollama-cloud": "Ollama Cloud",
+83 -14
View File
@@ -260,11 +260,16 @@ def _resolve_runtime_from_pool_entry(
if cfg_base_url:
base_url = cfg_base_url
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
if configured_mode and _provider_supports_explicit_api_mode(provider, configured_provider):
api_mode = configured_mode
elif provider in ("opencode-zen", "opencode-go"):
if provider in ("opencode-zen", "opencode-go"):
# Re-derive api_mode from the effective model rather than the
# persisted api_mode: the opencode providers serve both
# anthropic_messages and chat_completions models, so the previous
# session's mode must not leak across /model switches.
# Refs #16878.
from hermes_cli.models import opencode_model_api_mode
api_mode = opencode_model_api_mode(provider, effective_model)
elif configured_mode and _provider_supports_explicit_api_mode(provider, configured_provider):
api_mode = configured_mode
else:
# Auto-detect Anthropic-compatible endpoints (/anthropic suffix,
# Kimi /coding, api.openai.com → codex_responses, api.x.ai →
@@ -464,6 +469,30 @@ def _resolve_named_custom_runtime(
explicit_api_key: Optional[str] = None,
explicit_base_url: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
# Bare `provider="custom"` with an explicit base_url (e.g. propagated
# from a `model_aliases:` direct-alias resolution) — build a runtime
# directly so the alias's base_url actually takes effect.
requested_norm = (requested_provider or "").strip().lower()
if requested_norm == "custom" and explicit_base_url:
base_url = explicit_base_url.strip().rstrip("/")
api_key_candidates = [
(explicit_api_key or "").strip(),
os.getenv("OPENAI_API_KEY", "").strip(),
os.getenv("OPENROUTER_API_KEY", "").strip(),
]
api_key = next(
(c for c in api_key_candidates if has_usable_secret(c)),
"",
) or "no-key-required"
return {
"provider": "custom",
"api_mode": _detect_api_mode_for_url(base_url) or "chat_completions",
"base_url": base_url,
"api_key": api_key,
"source": "direct-alias",
"requested_provider": requested_provider,
}
custom_provider = _get_named_custom_provider(requested_provider)
if not custom_provider:
return None
@@ -1041,6 +1070,20 @@ def resolve_runtime_provider(
logger.info("Qwen OAuth credentials failed; "
"falling through to next provider.")
if provider == "minimax-oauth":
pconfig = PROVIDER_REGISTRY.get(provider)
if pconfig and pconfig.auth_type == "oauth_minimax":
from hermes_cli.auth import resolve_minimax_oauth_runtime_credentials
creds = resolve_minimax_oauth_runtime_credentials()
return {
"provider": provider,
"api_mode": "anthropic_messages",
"base_url": creds["base_url"],
"api_key": creds["api_key"],
"source": creds.get("source", "oauth"),
"requested_provider": requested_provider,
}
if provider == "google-gemini-cli":
try:
creds = resolve_gemini_oauth_runtime_credentials()
@@ -1095,13 +1138,34 @@ def resolve_runtime_provider(
cfg_base_url and "azure.com" in cfg_base_url.lower()
)
if _is_azure_endpoint:
token = (
os.getenv("AZURE_ANTHROPIC_KEY", "").strip()
or os.getenv("ANTHROPIC_API_KEY", "").strip()
)
# Honor user-specified env var hints on the model config before
# falling back to the built-in AZURE_ANTHROPIC_KEY / ANTHROPIC_API_KEY
# chain. Accept both `key_env` (Hermes canonical — matches the
# custom_providers field name) and `api_key_env` (documented in the
# Azure Foundry guide and read by most Hermes-compatible importers).
# Matches the config.yaml examples in website/docs/guides/azure-foundry.md.
token = ""
for hint_key in ("key_env", "api_key_env"):
env_var = str(model_cfg.get(hint_key) or "").strip()
if env_var:
token = os.getenv(env_var, "").strip()
if token:
break
# Next: an inline api_key on the model config (useful in multi-profile
# setups that want to avoid env-var juggling).
if not token:
token = str(model_cfg.get("api_key") or "").strip()
# Finally fall back to the historical fixed names.
if not token:
token = (
os.getenv("AZURE_ANTHROPIC_KEY", "").strip()
or os.getenv("ANTHROPIC_API_KEY", "").strip()
)
if not token:
raise AuthError(
"No Azure Anthropic API key found. Set AZURE_ANTHROPIC_KEY or ANTHROPIC_API_KEY."
"No Azure Anthropic API key found. Set AZURE_ANTHROPIC_KEY or "
"ANTHROPIC_API_KEY, or point key_env/api_key_env in your "
"config.yaml model section at a custom env var."
)
else:
from agent.anthropic_adapter import resolve_anthropic_token
@@ -1212,15 +1276,20 @@ def resolve_runtime_provider(
configured_provider = str(model_cfg.get("provider") or "").strip().lower()
# Only honor persisted api_mode when it belongs to the same provider family.
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
if configured_mode and _provider_supports_explicit_api_mode(provider, configured_provider):
api_mode = configured_mode
elif provider in ("opencode-zen", "opencode-go"):
if provider in ("opencode-zen", "opencode-go"):
# opencode-zen/go must always re-derive api_mode from the
# target model (not the stale persisted api_mode), because
# the same provider serves both anthropic_messages
# (e.g. minimax-m2.7) and chat_completions (e.g.
# deepseek-v4-flash) and switching models via /model would
# otherwise carry the previous mode forward, stripping /v1
# from base_url for chat_completions models and 404'ing.
# Refs #16878.
from hermes_cli.models import opencode_model_api_mode
# Prefer the target_model from the caller (explicit mid-session
# switch) over the stale model.default; see _resolve_runtime_from_pool_entry
# for the same rationale.
_effective = target_model or model_cfg.get("default", "")
api_mode = opencode_model_api_mode(provider, _effective)
elif configured_mode and _provider_supports_explicit_api_mode(provider, configured_provider):
api_mode = configured_mode
else:
# Auto-detect Anthropic-compatible endpoints by URL convention
# (e.g. https://api.minimax.io/anthropic, https://dashscope.../anthropic)
+151 -24
View File
@@ -12,6 +12,7 @@ Config files are stored in ~/.hermes/ for easy access.
"""
import importlib.util
import json
import logging
import os
import shutil
@@ -131,6 +132,7 @@ def _set_reasoning_effort(config: Dict[str, Any], effort: str) -> None:
# Import config helpers
from hermes_cli.config import (
cfg_get,
DEFAULT_CONFIG,
get_hermes_home,
get_config_path,
@@ -138,6 +140,7 @@ from hermes_cli.config import (
load_config,
save_config,
save_env_value,
remove_env_value,
get_env_value,
ensure_hermes_home,
)
@@ -441,7 +444,7 @@ def _print_setup_summary(config: dict, hermes_home):
tool_status.append(("Image Generation", False, "FAL_KEY or OPENAI_API_KEY"))
# TTS — show configured provider
tts_provider = config.get("tts", {}).get("provider", "edge")
tts_provider = cfg_get(config, "tts", "provider", default="edge")
if subscription_features.tts.managed_by_nous:
tool_status.append(("Text-to-Speech (OpenAI via Nous subscription)", True, None))
elif tts_provider == "elevenlabs" and get_env_value("ELEVENLABS_API_KEY"):
@@ -480,7 +483,7 @@ def _print_setup_summary(config: dict, hermes_home):
if subscription_features.modal.managed_by_nous:
tool_status.append(("Modal Execution (Nous subscription)", True, None))
elif config.get("terminal", {}).get("backend") == "modal":
elif cfg_get(config, "terminal", "backend") == "modal":
if subscription_features.modal.direct_override:
tool_status.append(("Modal Execution (direct Modal)", True, None))
else:
@@ -654,6 +657,102 @@ def _prompt_container_resources(config: dict):
pass
def _prompt_vercel_sandbox_settings(config: dict):
"""Prompt for Vercel Sandbox settings without exposing unsupported disk sizing."""
terminal = config.setdefault("terminal", {})
print()
print_info("Vercel Sandbox settings:")
print_info(" Filesystem persistence uses Vercel snapshots.")
print_info(" Snapshots restore files only; live processes do not continue after sandbox recreation.")
from tools.terminal_tool import _SUPPORTED_VERCEL_RUNTIMES
current_runtime = terminal.get("vercel_runtime") or "node24"
supported_label = ", ".join(_SUPPORTED_VERCEL_RUNTIMES)
runtime = prompt(f" Runtime ({supported_label})", current_runtime).strip() or current_runtime
if runtime not in _SUPPORTED_VERCEL_RUNTIMES:
print_warning(f"Unsupported Vercel runtime '{runtime}', keeping {current_runtime}.")
runtime = current_runtime if current_runtime in _SUPPORTED_VERCEL_RUNTIMES else "node24"
terminal["vercel_runtime"] = runtime
save_env_value("TERMINAL_VERCEL_RUNTIME", runtime)
current_persist = terminal.get("container_persistent", True)
persist_label = "yes" if current_persist else "no"
terminal["container_persistent"] = prompt(
" Persist filesystem with snapshots? (yes/no)", persist_label
).lower() in ("yes", "true", "y", "1")
current_cpu = terminal.get("container_cpu", 1)
cpu_str = prompt(" CPU cores", str(current_cpu))
try:
terminal["container_cpu"] = float(cpu_str)
except ValueError:
pass
current_mem = terminal.get("container_memory", 5120)
mem_str = prompt(" Memory in MB (5120 = 5GB)", str(current_mem))
try:
terminal["container_memory"] = int(mem_str)
except ValueError:
pass
if terminal.get("container_disk", 51200) not in (0, 51200):
print_warning("Vercel Sandbox does not support custom disk sizing; resetting container_disk to 51200.")
terminal["container_disk"] = 51200
print()
print_info("Vercel authentication:")
print_info(" Use a long-lived Vercel access token plus project/team IDs.")
linked_project = _read_nearest_vercel_project()
if linked_project:
print_info(" Found defaults in nearest .vercel/project.json.")
remove_env_value("VERCEL_OIDC_TOKEN")
token = prompt(" Vercel access token", get_env_value("VERCEL_TOKEN") or "", password=True)
project = prompt(
" Vercel project ID",
get_env_value("VERCEL_PROJECT_ID") or linked_project.get("projectId", ""),
)
team = prompt(
" Vercel team ID",
get_env_value("VERCEL_TEAM_ID") or linked_project.get("orgId", ""),
)
if token:
save_env_value("VERCEL_TOKEN", token)
if project:
save_env_value("VERCEL_PROJECT_ID", project)
if team:
save_env_value("VERCEL_TEAM_ID", team)
def _read_nearest_vercel_project(start: Path | None = None) -> dict[str, str]:
"""Read project/team defaults from the nearest Vercel link file."""
current = (start or Path.cwd()).resolve()
if current.is_file():
current = current.parent
for directory in (current, *current.parents):
project_file = directory / ".vercel" / "project.json"
if not project_file.exists():
continue
try:
data = json.loads(project_file.read_text(encoding="utf-8"))
except (OSError, json.JSONDecodeError):
return {}
if not isinstance(data, dict):
return {}
return {
key: value
for key, value in {
"projectId": data.get("projectId"),
"orgId": data.get("orgId"),
}.items()
if isinstance(value, str) and value.strip()
}
return {}
# Tool categories and provider config are now in tools_config.py (shared
# between `hermes tools` and `hermes setup tools`).
@@ -712,8 +811,6 @@ def setup_model_provider(config: dict, *, quick: bool = False):
if isinstance(_m, dict):
selected_provider = _m.get("provider")
nous_subscription_selected = selected_provider == "nous"
# ── Same-provider fallback & rotation setup (full setup only) ──
if not quick and _supports_same_provider_pool_setup(selected_provider):
try:
@@ -1181,7 +1278,7 @@ def setup_terminal_backend(config: dict):
print_info(f" Guide: {_DOCS_BASE}/developer-guide/environments")
print()
current_backend = config.get("terminal", {}).get("backend", "local")
current_backend = cfg_get(config, "terminal", "backend", default="local")
is_linux = _platform.system() == "Linux"
# Build backend choices with descriptions
@@ -1191,11 +1288,12 @@ def setup_terminal_backend(config: dict):
"Modal - serverless cloud sandbox",
"SSH - run on a remote machine",
"Daytona - persistent cloud development environment",
"Vercel Sandbox - cloud microVM with snapshot filesystem persistence",
]
idx_to_backend = {0: "local", 1: "docker", 2: "modal", 3: "ssh", 4: "daytona"}
backend_to_idx = {"local": 0, "docker": 1, "modal": 2, "ssh": 3, "daytona": 4}
idx_to_backend = {0: "local", 1: "docker", 2: "modal", 3: "ssh", 4: "daytona", 5: "vercel_sandbox"}
backend_to_idx = {"local": 0, "docker": 1, "modal": 2, "ssh": 3, "daytona": 4, "vercel_sandbox": 5}
next_idx = 5
next_idx = 6
if is_linux:
terminal_choices.append("Singularity/Apptainer - HPC-friendly container")
idx_to_backend[next_idx] = "singularity"
@@ -1230,7 +1328,7 @@ def setup_terminal_backend(config: dict):
print_info(
" the agent starts. CLI mode always starts in the current directory."
)
current_cwd = config.get("terminal", {}).get("cwd", "")
current_cwd = cfg_get(config, "terminal", "cwd", default="")
cwd = prompt(" Messaging working directory", current_cwd or str(Path.home()))
if cwd:
config["terminal"]["cwd"] = cwd
@@ -1261,9 +1359,7 @@ def setup_terminal_backend(config: dict):
print_info(f"Docker found: {docker_bin}")
# Docker image
current_image = config.get("terminal", {}).get(
"docker_image", "nikolaik/python-nodejs:python3.11-nodejs20"
)
current_image = cfg_get(config, "terminal", "docker_image", default="nikolaik/python-nodejs:python3.11-nodejs20")
image = prompt(" Docker image", current_image)
config["terminal"]["docker_image"] = image
save_env_value("TERMINAL_DOCKER_IMAGE", image)
@@ -1283,9 +1379,7 @@ def setup_terminal_backend(config: dict):
else:
print_info(f"Found: {sing_bin}")
current_image = config.get("terminal", {}).get(
"singularity_image", "docker://nikolaik/python-nodejs:python3.11-nodejs20"
)
current_image = cfg_get(config, "terminal", "singularity_image", default="docker://nikolaik/python-nodejs:python3.11-nodejs20")
image = prompt(" Container image", current_image)
config["terminal"]["singularity_image"] = image
save_env_value("TERMINAL_SINGULARITY_IMAGE", image)
@@ -1304,7 +1398,7 @@ def setup_terminal_backend(config: dict):
get_nous_subscription_features(config).nous_auth_present
and is_managed_tool_gateway_ready("modal")
)
modal_mode = normalize_modal_mode(config.get("terminal", {}).get("modal_mode"))
modal_mode = normalize_modal_mode(cfg_get(config, "terminal", "modal_mode"))
use_managed_modal = False
if managed_modal_available:
modal_choices = [
@@ -1441,15 +1535,46 @@ def setup_terminal_backend(config: dict):
print_success(" Configured")
# Daytona image
current_image = config.get("terminal", {}).get(
"daytona_image", "nikolaik/python-nodejs:python3.11-nodejs20"
)
current_image = cfg_get(config, "terminal", "daytona_image", default="nikolaik/python-nodejs:python3.11-nodejs20")
image = prompt(" Sandbox image", current_image)
config["terminal"]["daytona_image"] = image
save_env_value("TERMINAL_DAYTONA_IMAGE", image)
_prompt_container_resources(config)
elif selected_backend == "vercel_sandbox":
print_success("Terminal backend: Vercel Sandbox")
print_info("Cloud microVM sandboxes with snapshot-backed filesystem persistence.")
print_info("Requires the optional SDK: pip install 'hermes-agent[vercel]'")
try:
__import__("vercel")
except ImportError:
print_info("Installing vercel SDK...")
import subprocess
uv_bin = shutil.which("uv")
if uv_bin:
result = subprocess.run(
[uv_bin, "pip", "install", "--python", sys.executable, "vercel"],
capture_output=True,
text=True,
)
else:
result = subprocess.run(
[sys.executable, "-m", "pip", "install", "vercel"],
capture_output=True,
text=True,
)
if result.returncode == 0:
print_success("vercel SDK installed")
else:
print_warning("Install failed — run manually: pip install 'hermes-agent[vercel]'")
if result.stderr:
print_info(f" Error: {result.stderr.strip().splitlines()[-1]}")
_prompt_vercel_sandbox_settings(config)
elif selected_backend == "ssh":
print_success("Terminal backend: SSH")
print_info("Run commands on a remote machine via SSH.")
@@ -1503,6 +1628,8 @@ def setup_terminal_backend(config: dict):
save_env_value("TERMINAL_ENV", selected_backend)
if selected_backend == "modal":
save_env_value("TERMINAL_MODAL_MODE", config["terminal"].get("modal_mode", "auto"))
if selected_backend == "vercel_sandbox":
save_env_value("TERMINAL_VERCEL_RUNTIME", config["terminal"].get("vercel_runtime", "node24"))
save_config(config)
print()
print_success(f"Terminal backend set to: {selected_backend}")
@@ -1547,7 +1674,7 @@ def setup_agent_settings(config: dict):
# ── Max Iterations ──
current_max = get_env_value("HERMES_MAX_ITERATIONS") or str(
config.get("agent", {}).get("max_turns", 90)
cfg_get(config, "agent", "max_turns", default=90)
)
print_info("Maximum tool-calling iterations per conversation.")
print_info("Higher = more complex tasks, but costs more tokens.")
@@ -1575,7 +1702,7 @@ def setup_agent_settings(config: dict):
print_info(" all — Show every tool call with a short preview")
print_info(" verbose — Full args, results, and debug logs")
current_mode = config.get("display", {}).get("tool_progress", "all")
current_mode = cfg_get(config, "display", "tool_progress", default="all")
mode = prompt("Tool progress mode", current_mode)
if mode.lower() in ("off", "new", "all", "verbose"):
if "display" not in config:
@@ -1595,7 +1722,7 @@ def setup_agent_settings(config: dict):
config.setdefault("compression", {})["enabled"] = True
current_threshold = config.get("compression", {}).get("threshold", 0.50)
current_threshold = cfg_get(config, "compression", "threshold", default=0.50)
threshold_str = prompt("Compression threshold (0.5-0.95)", str(current_threshold))
try:
threshold = float(threshold_str)
@@ -2603,11 +2730,11 @@ def _get_section_config_summary(config: dict, section_key: str) -> Optional[str]
return "configured"
elif section_key == "terminal":
backend = config.get("terminal", {}).get("backend", "local")
backend = cfg_get(config, "terminal", "backend", default="local")
return f"backend: {backend}"
elif section_key == "agent":
max_turns = config.get("agent", {}).get("max_turns", 90)
max_turns = cfg_get(config, "agent", "max_turns", default=90)
return f"max turns: {max_turns}"
elif section_key == "gateway":
+2 -2
View File
@@ -13,7 +13,7 @@ Config stored in ~/.hermes/config.yaml under:
"""
from typing import List, Optional, Set
from hermes_cli.config import load_config, save_config
from hermes_cli.config import cfg_get, load_config, save_config
from hermes_cli.colors import Colors, color
from hermes_cli.platforms import PLATFORMS as _PLATFORMS
@@ -30,7 +30,7 @@ def get_disabled_skills(config: dict, platform: Optional[str] = None) -> Set[str
global_disabled = set(skills_cfg.get("disabled", []))
if platform is None:
return global_disabled
platform_disabled = skills_cfg.get("platform_disabled", {}).get(platform)
platform_disabled = cfg_get(skills_cfg, "platform_disabled", platform)
if platform_disabled is None:
return global_disabled
return set(platform_disabled)
+23 -14
View File
@@ -68,7 +68,7 @@ All fields are optional. Missing values inherit from the ``default`` skin.
welcome: "Welcome message" # Shown at CLI startup
goodbye: "Goodbye! ⚕" # Shown on exit
response_label: " ⚕ Hermes " # Response box header label
prompt_symbol: " " # Input prompt symbol
prompt_symbol: "" # Input prompt symbol (bare token; renderers add trailing space)
help_header: "(^_^)? Commands" # /help header text
# Tool prefix: character for tool output lines (default: ┊)
@@ -190,7 +190,7 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
"welcome": "Welcome to Hermes Agent! Type your message or /help for commands.",
"goodbye": "Goodbye! ⚕",
"response_label": " ⚕ Hermes ",
"prompt_symbol": " ",
"prompt_symbol": "",
"help_header": "(^_^)? Available Commands",
},
"tool_prefix": "",
@@ -242,7 +242,7 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
"welcome": "Welcome to Ares Agent! Type your message or /help for commands.",
"goodbye": "Farewell, warrior! ⚔",
"response_label": " ⚔ Ares ",
"prompt_symbol": " ",
"prompt_symbol": "",
"help_header": "(⚔) Available Commands",
},
"tool_prefix": "",
@@ -301,7 +301,7 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
"welcome": "Welcome to Hermes Agent! Type your message or /help for commands.",
"goodbye": "Goodbye! ⚕",
"response_label": " ⚕ Hermes ",
"prompt_symbol": " ",
"prompt_symbol": "",
"help_header": "[?] Available Commands",
},
"tool_prefix": "",
@@ -340,7 +340,7 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
"welcome": "Welcome to Hermes Agent! Type your message or /help for commands.",
"goodbye": "Goodbye! ⚕",
"response_label": " ⚕ Hermes ",
"prompt_symbol": " ",
"prompt_symbol": "",
"help_header": "(^_^)? Available Commands",
},
"tool_prefix": "",
@@ -377,7 +377,7 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
"welcome": "Welcome to Hermes Agent! Type your message or /help for commands.",
"goodbye": "Goodbye! ⚕",
"response_label": " ⚕ Hermes ",
"prompt_symbol": " ",
"prompt_symbol": "",
"help_header": "[?] Available Commands",
},
"tool_prefix": "",
@@ -414,7 +414,7 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
"welcome": "Welcome to Hermes Agent! Type your message or /help for commands.",
"goodbye": "Goodbye! \u2695",
"response_label": " \u2695 Hermes ",
"prompt_symbol": "\u276f ",
"prompt_symbol": "\u276f",
"help_header": "(^_^)? Available Commands",
},
"tool_prefix": "\u250a",
@@ -467,7 +467,7 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
"welcome": "Welcome to Poseidon Agent! Type your message or /help for commands.",
"goodbye": "Fair winds! Ψ",
"response_label": " Ψ Poseidon ",
"prompt_symbol": "Ψ ",
"prompt_symbol": "Ψ",
"help_header": "(Ψ) Available Commands",
},
"tool_prefix": "",
@@ -539,7 +539,7 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
"welcome": "Welcome to Sisyphus Agent! Type your message or /help for commands.",
"goodbye": "The boulder waits. ◉",
"response_label": " ◉ Sisyphus ",
"prompt_symbol": " ",
"prompt_symbol": "",
"help_header": "(◉) Available Commands",
},
"tool_prefix": "",
@@ -612,7 +612,7 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
"welcome": "Welcome to Charizard Agent! Type your message or /help for commands.",
"goodbye": "Flame out! ✦",
"response_label": " ✦ Charizard ",
"prompt_symbol": " ",
"prompt_symbol": "",
"help_header": "(✦) Available Commands",
},
"tool_prefix": "",
@@ -780,12 +780,21 @@ def init_skin_from_config(config: dict) -> None:
# =============================================================================
def get_active_prompt_symbol(fallback: str = " ") -> str:
"""Get the interactive prompt symbol from the active skin."""
def get_active_prompt_symbol(fallback: str = "") -> str:
"""Return the interactive prompt symbol with a single trailing space.
Skins store ``prompt_symbol`` as a bare token (no spaces). The trailing
space is appended here so callers can drop it straight into a rendered
prompt without hand-rolling whitespace.
"""
try:
return get_active_skin().get_branding("prompt_symbol", fallback)
raw = get_active_skin().get_branding("prompt_symbol", fallback)
except Exception:
return fallback
raw = fallback
cleaned = (raw or fallback).strip()
return f"{cleaned or fallback.strip()} "
+70 -15
View File
@@ -6,7 +6,8 @@ Shows the status of all Hermes Agent components.
import os
import sys
import subprocess
import subprocess # noqa: F401 — re-exported for tests that monkeypatch status.subprocess to guard against regressions
import importlib.util
from pathlib import Path
PROJECT_ROOT = Path(__file__).parent.parent.resolve()
@@ -17,6 +18,7 @@ from hermes_cli.config import get_env_path, get_env_value, get_hermes_home, load
from hermes_cli.models import provider_label
from hermes_cli.nous_subscription import get_nous_subscription_features
from hermes_cli.runtime_provider import resolve_requested_provider
from hermes_cli.vercel_auth import describe_vercel_auth
from hermes_constants import OPENROUTER_MODELS_URL
from tools.tool_backend_helpers import managed_nous_tools_enabled
@@ -26,12 +28,15 @@ def check_mark(ok: bool) -> str:
return color("", Colors.RED)
def redact_key(key: str) -> str:
"""Redact an API key for display."""
if not key:
return "(not set)"
if len(key) < 12:
return "***"
return key[:4] + "..." + key[-4:]
"""Redact an API key for display.
Thin wrapper over :func:`agent.redact.mask_secret`. Preserves the
"(not set)" placeholder in dim color to match ``hermes config``'s
output (previously this variant was missing the DIM color
consolidated via PR that also introduced ``mask_secret``).
"""
from agent.redact import mask_secret
return mask_secret(key, empty=color("(not set)", Colors.DIM))
def _format_iso_timestamp(value) -> str:
@@ -154,14 +159,21 @@ def show_status(args):
print(color("◆ Auth Providers", Colors.CYAN, Colors.BOLD))
try:
from hermes_cli.auth import get_nous_auth_status, get_codex_auth_status, get_qwen_auth_status
from hermes_cli.auth import (
get_nous_auth_status,
get_codex_auth_status,
get_qwen_auth_status,
get_minimax_oauth_auth_status,
)
nous_status = get_nous_auth_status()
codex_status = get_codex_auth_status()
qwen_status = get_qwen_auth_status()
minimax_status = get_minimax_oauth_auth_status()
except Exception:
nous_status = {}
codex_status = {}
qwen_status = {}
minimax_status = {}
nous_logged_in = bool(nous_status.get("logged_in"))
nous_error = nous_status.get("error")
@@ -214,6 +226,20 @@ def show_status(args):
if qwen_status.get("error") and not qwen_logged_in:
print(f" Error: {qwen_status.get('error')}")
minimax_logged_in = bool(minimax_status.get("logged_in"))
print(
f" {'MiniMax OAuth':<12} {check_mark(minimax_logged_in)} "
f"{'logged in' if minimax_logged_in else 'not logged in (run: hermes auth add minimax-oauth)'}"
)
minimax_region = minimax_status.get("region")
if minimax_logged_in and minimax_region:
print(f" Region: {minimax_region}")
minimax_exp = minimax_status.get("expires_at")
if minimax_exp:
print(f" Access exp: {minimax_exp}")
if minimax_status.get("error") and not minimax_logged_in:
print(f" Error: {minimax_status.get('error')}")
# =========================================================================
# Nous Subscription Features
# =========================================================================
@@ -274,21 +300,33 @@ def show_status(args):
label = "configured" if configured else "not configured (run: hermes model)"
print(f" {pname:<16} {check_mark(configured)} {label}")
# LM Studio reachability — only probe when it's the active provider so
# users with foreign configs don't see noise. Auth rejection vs. silent
# empty list is the most common LM Studio support case.
if _effective_provider_label() == "LM Studio":
from hermes_cli.models import probe_lmstudio_models
model_cfg = config.get("model")
base = (model_cfg.get("base_url") if isinstance(model_cfg, dict) else None) or get_env_value("LM_BASE_URL") or "http://127.0.0.1:1234/v1"
try:
models = probe_lmstudio_models(api_key=get_env_value("LM_API_KEY") or "", base_url=base, timeout=1.5)
if models is None:
ok, msg = False, f"unreachable at {base}"
else:
ok, msg = True, f"reachable ({len(models)} model(s)) at {base}"
except AuthError:
ok, msg = False, "auth rejected — set LM_API_KEY"
print(f" {'LM Studio':<16} {check_mark(ok)} {msg}")
# =========================================================================
# Terminal Configuration
# =========================================================================
print()
print(color("◆ Terminal Backend", Colors.CYAN, Colors.BOLD))
terminal_cfg = config.get("terminal", {}) if isinstance(config.get("terminal"), dict) else {}
terminal_env = os.getenv("TERMINAL_ENV", "")
if not terminal_env:
# Fall back to config file value when env var isn't set
# (hermes status doesn't go through cli.py's config loading)
try:
_cfg = load_config()
terminal_env = _cfg.get("terminal", {}).get("backend", "local")
except Exception:
terminal_env = "local"
terminal_env = terminal_cfg.get("backend", "local")
print(f" Backend: {terminal_env}")
if terminal_env == "ssh":
@@ -302,6 +340,23 @@ def show_status(args):
elif terminal_env == "daytona":
daytona_image = os.getenv("TERMINAL_DAYTONA_IMAGE", "nikolaik/python-nodejs:python3.11-nodejs20")
print(f" Daytona Image: {daytona_image}")
elif terminal_env == "vercel_sandbox":
runtime = os.getenv("TERMINAL_VERCEL_RUNTIME") or terminal_cfg.get("vercel_runtime") or "node24"
persist = os.getenv("TERMINAL_CONTAINER_PERSISTENT")
if persist is None:
persist_enabled = bool(terminal_cfg.get("container_persistent", True))
else:
persist_enabled = persist.lower() in ("1", "true", "yes", "on")
auth_status = describe_vercel_auth()
sdk_ok = importlib.util.find_spec("vercel") is not None
sdk_label = "installed" if sdk_ok else "missing (install: pip install 'hermes-agent[vercel]')"
print(f" Runtime: {runtime}")
print(f" SDK: {check_mark(sdk_ok)} {sdk_label}")
print(f" Auth: {check_mark(auth_status.ok)} {auth_status.label}")
for line in auth_status.detail_lines:
print(f" Auth detail: {line}")
print(f" Persistence: {'snapshot filesystem' if persist_enabled else 'ephemeral filesystem'}")
print(" Processes: live processes do not survive cleanup, snapshots, or sandbox recreation")
sudo_password = os.getenv("SUDO_PASSWORD", "")
print(f" Sudo: {check_mark(bool(sudo_password))} {'enabled' if sudo_password else 'disabled'}")
-1
View File
@@ -263,7 +263,6 @@ TIPS = [
"hermes status --deep runs deeper diagnostic checks across all components.",
# --- Hidden Gems & Power-User Tricks ---
"BOOT.md at ~/.hermes/BOOT.md runs automatically on every gateway start — use it for startup checks.",
"Cron jobs can attach a Python script (--script) whose stdout is injected into the prompt as context.",
"Cron scripts live in ~/.hermes/scripts/ and run before the agent — perfect for data collection pipelines.",
"prefill_messages_file in config.yaml injects few-shot examples into every API call, never saved to history.",
+167 -8
View File
@@ -18,6 +18,7 @@ from typing import Dict, List, Optional, Set
from hermes_cli.config import (
cfg_get,
load_config, save_config, get_env_value, save_env_value,
)
from hermes_cli.colors import Colors, color
@@ -425,6 +426,31 @@ TOOL_CATEGORIES = {
},
],
},
"langfuse": {
"name": "Langfuse Observability",
"icon": "📊",
"providers": [
{
"name": "Langfuse Cloud",
"tag": "Hosted Langfuse (cloud.langfuse.com)",
"env_vars": [
{"key": "HERMES_LANGFUSE_PUBLIC_KEY", "prompt": "Langfuse public key (pk-lf-...)", "url": "https://cloud.langfuse.com"},
{"key": "HERMES_LANGFUSE_SECRET_KEY", "prompt": "Langfuse secret key (sk-lf-...)", "url": "https://cloud.langfuse.com"},
],
"post_setup": "langfuse",
},
{
"name": "Langfuse Self-Hosted",
"tag": "Self-hosted Langfuse instance",
"env_vars": [
{"key": "HERMES_LANGFUSE_PUBLIC_KEY", "prompt": "Langfuse public key (pk-lf-...)"},
{"key": "HERMES_LANGFUSE_SECRET_KEY", "prompt": "Langfuse secret key (sk-lf-...)"},
{"key": "HERMES_LANGFUSE_BASE_URL", "prompt": "Langfuse server URL (e.g. http://localhost:3000)", "default": "http://localhost:3000"},
],
"post_setup": "langfuse",
},
],
},
}
# Simple env-var requirements for toolsets NOT in TOOL_CATEGORIES.
@@ -442,7 +468,10 @@ def _run_post_setup(post_setup_key: str):
import shutil
if post_setup_key in ("agent_browser", "browserbase"):
node_modules = PROJECT_ROOT / "node_modules" / "agent-browser"
if not node_modules.exists() and shutil.which("npm"):
npm_bin = shutil.which("npm")
npx_bin = shutil.which("npx")
# Step 1: install the agent-browser npm package into node_modules/
if not node_modules.exists() and npm_bin:
_print_info(" Installing Node.js dependencies for browser tools...")
import subprocess
result = subprocess.run(
@@ -454,8 +483,94 @@ def _run_post_setup(post_setup_key: str):
else:
from hermes_constants import display_hermes_home
_print_warning(f" npm install failed - run manually: cd {display_hermes_home()}/hermes-agent && npm install")
if result.stderr:
_print_info(f" {result.stderr.strip()[:200]}")
elif not node_modules.exists():
_print_warning(" Node.js not found - browser tools require: npm install (in hermes-agent directory)")
return
# Step 2: only the local browser provider actually needs Chromium on
# disk. Cloud providers (Browserbase, Browser Use, Firecrawl) host
# their own Chromium and don't need the local install.
if post_setup_key != "agent_browser":
return
# Step 3: ensure the Chromium / headless-shell build agent-browser
# drives is actually installed. Without it the CLI hangs on first
# use until the command timeout fires. Skip inside Docker — the
# image bakes Chromium in at build time, and runtime users usually
# can't write to PLAYWRIGHT_BROWSERS_PATH anyway.
try:
# Import lazily so the tools_config UI doesn't pull in the full
# browser_tool module at import time.
from tools.browser_tool import (
_chromium_installed,
_running_in_docker,
)
except Exception as exc: # pragma: no cover — defensive
_print_warning(f" Could not check Chromium status: {exc}")
return
if _chromium_installed():
_print_success(" Chromium browser already installed")
return
if _running_in_docker():
_print_warning(
" Chromium is missing but you're running in Docker."
)
_print_info(
" Pull the latest image to get the bundled Chromium:"
)
_print_info(
" docker pull ghcr.io/nousresearch/hermes-agent:latest"
)
return
if not npx_bin:
_print_warning(
" npx not found - install Chromium manually: npx agent-browser install --with-deps"
)
return
_print_info(" Installing Chromium (~170MB one-time download)...")
import subprocess
# Prefer the bundled agent-browser install subcommand so the
# version of Chromium matches the CLI. Fall back to npx shim on
# setups where the local bin stub isn't present.
local_ab = PROJECT_ROOT / "node_modules" / ".bin" / "agent-browser"
if sys.platform == "win32":
local_ab_win = local_ab.with_suffix(".cmd")
if local_ab_win.exists():
local_ab = local_ab_win
install_cmd = (
[str(local_ab), "install", "--with-deps"]
if local_ab.exists()
else [npx_bin, "-y", "agent-browser", "install", "--with-deps"]
)
try:
result = subprocess.run(
install_cmd,
capture_output=True, text=True, cwd=str(PROJECT_ROOT), timeout=600,
)
if result.returncode == 0:
_print_success(" Chromium installed")
# Invalidate the cached "missing" result so subsequent
# check_browser_requirements() calls see the new install.
import tools.browser_tool as _bt
_bt._cached_chromium_installed = None
else:
_print_warning(" Chromium install failed:")
tail = (result.stderr or result.stdout or "").strip().splitlines()[-3:]
for line in tail:
_print_info(f" {line[:200]}")
_print_info(" Run manually: npx agent-browser install --with-deps")
except subprocess.TimeoutExpired:
_print_warning(" Chromium install timed out (>10min)")
_print_info(" Run manually: npx agent-browser install --with-deps")
except Exception as exc:
_print_warning(f" Chromium install failed: {exc}")
_print_info(" Run manually: npx agent-browser install --with-deps")
elif post_setup_key == "camofox":
camofox_dir = PROJECT_ROOT / "node_modules" / "@askjo" / "camofox-browser"
@@ -567,6 +682,40 @@ def _run_post_setup(post_setup_key: str):
_print_info(" git submodule update --init --recursive")
_print_info(' uv pip install -e "./tinker-atropos"')
elif post_setup_key == "langfuse":
# Install the langfuse SDK.
try:
__import__("langfuse")
_print_success(" langfuse SDK already installed")
except ImportError:
import subprocess
_print_info(" Installing langfuse SDK...")
result = subprocess.run(
[sys.executable, "-m", "pip", "install", "langfuse", "--quiet"],
capture_output=True, text=True, timeout=120,
)
if result.returncode == 0:
_print_success(" langfuse SDK installed")
else:
_print_warning(" langfuse SDK install failed — run manually: pip install langfuse")
# Opt the bundled observability/langfuse plugin into plugins.enabled.
# The plugin ships in the repo but doesn't load until the user enables
# it (standalone plugins are opt-in).
try:
from hermes_cli.plugins_cmd import _get_enabled_set, _save_enabled_set
enabled = _get_enabled_set()
if "observability/langfuse" in enabled or "langfuse" in enabled:
_print_success(" Plugin observability/langfuse already enabled")
else:
enabled.add("observability/langfuse")
_save_enabled_set(enabled)
_print_success(" Plugin observability/langfuse enabled")
except Exception as exc:
_print_warning(f" Could not enable plugin automatically: {exc}")
_print_info(" Run manually: hermes plugins enable observability/langfuse")
_print_info(" Restart Hermes for tracing to take effect.")
_print_info(" Verify: hermes plugins list")
# ─── Platform / Toolset Helpers ───────────────────────────────────────────────
@@ -777,6 +926,16 @@ def _get_platform_tools(
else:
enabled_toolsets.update(explicit_mcp_servers)
# Honor agent.disabled_toolsets from config.yaml — allows users to
# globally suppress specific toolsets (e.g. "memory") across all
# platforms without per-platform toolset configuration. This runs
# last so it overrides everything above.
agent_cfg = config.get("agent") or {}
disabled_toolsets = agent_cfg.get("disabled_toolsets") or []
if disabled_toolsets:
disabled_set = {str(ts) for ts in disabled_toolsets}
enabled_toolsets -= disabled_set
return enabled_toolsets
@@ -807,7 +966,7 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
platform_default_keys = {p["default_toolset"] for p in PLATFORMS.values()}
# Get existing toolsets for this platform
existing_toolsets = config.get("platform_toolsets", {}).get(platform, [])
existing_toolsets = cfg_get(config, "platform_toolsets", platform, default=[])
if not isinstance(existing_toolsets, list):
existing_toolsets = []
existing_toolsets = [str(ts) for ts in existing_toolsets]
@@ -1194,23 +1353,23 @@ def _is_provider_active(provider: dict, config: dict) -> bool:
if provider.get("tts_provider"):
return (
feature.managed_by_nous
and config.get("tts", {}).get("provider") == provider["tts_provider"]
and cfg_get(config, "tts", "provider") == provider["tts_provider"]
)
if "browser_provider" in provider:
current = config.get("browser", {}).get("cloud_provider")
current = cfg_get(config, "browser", "cloud_provider")
return feature.managed_by_nous and provider["browser_provider"] == current
if provider.get("web_backend"):
current = config.get("web", {}).get("backend")
current = cfg_get(config, "web", "backend")
return feature.managed_by_nous and current == provider["web_backend"]
return feature.managed_by_nous
if provider.get("tts_provider"):
return config.get("tts", {}).get("provider") == provider["tts_provider"]
return cfg_get(config, "tts", "provider") == provider["tts_provider"]
if "browser_provider" in provider:
current = config.get("browser", {}).get("cloud_provider")
current = cfg_get(config, "browser", "cloud_provider")
return provider["browser_provider"] == current
if provider.get("web_backend"):
current = config.get("web", {}).get("backend")
current = cfg_get(config, "web", "backend")
return current == provider["web_backend"]
if provider.get("imagegen_backend"):
image_cfg = config.get("image_gen", {})
+70
View File
@@ -0,0 +1,70 @@
"""Helpers for reporting Vercel Sandbox authentication state."""
from __future__ import annotations
import os
from dataclasses import dataclass
_TOKEN_TUPLE_VARS = ("VERCEL_TOKEN", "VERCEL_PROJECT_ID", "VERCEL_TEAM_ID")
@dataclass(frozen=True)
class VercelAuthStatus:
ok: bool
label: str
detail_lines: tuple[str, ...]
def _present(name: str) -> bool:
return bool(os.getenv(name))
def describe_vercel_auth() -> VercelAuthStatus:
"""Return Vercel auth status without exposing secret values."""
has_oidc = _present("VERCEL_OIDC_TOKEN")
token_states = {name: _present(name) for name in _TOKEN_TUPLE_VARS}
present_token_vars = tuple(name for name, present in token_states.items() if present)
missing_token_vars = tuple(name for name, present in token_states.items() if not present)
if has_oidc:
details = [
"mode: OIDC",
"active env: VERCEL_OIDC_TOKEN",
"note: OIDC tokens are development-only; use access-token auth for deployments and long-running processes",
]
if present_token_vars:
details.append(f"also present: {', '.join(present_token_vars)}")
return VercelAuthStatus(True, "OIDC token via VERCEL_OIDC_TOKEN", tuple(details))
if not missing_token_vars:
return VercelAuthStatus(
True,
"access token + project/team via VERCEL_TOKEN, VERCEL_PROJECT_ID, VERCEL_TEAM_ID",
(
"mode: access token",
"active env: VERCEL_TOKEN, VERCEL_PROJECT_ID, VERCEL_TEAM_ID",
),
)
if present_token_vars:
return VercelAuthStatus(
False,
f"partial access-token auth (missing {', '.join(missing_token_vars)})",
(
"mode: incomplete access token",
f"present env: {', '.join(present_token_vars)}",
f"missing env: {', '.join(missing_token_vars)}",
"recommended: set VERCEL_TOKEN, VERCEL_PROJECT_ID, and VERCEL_TEAM_ID together",
),
)
return VercelAuthStatus(
False,
"not configured",
(
"recommended: set VERCEL_TOKEN, VERCEL_PROJECT_ID, and VERCEL_TEAM_ID",
"development-only alternative: set VERCEL_OIDC_TOKEN",
),
)
+31 -7
View File
@@ -33,6 +33,7 @@ if str(PROJECT_ROOT) not in sys.path:
from hermes_cli import __version__, __release_date__
from hermes_cli.config import (
cfg_get,
DEFAULT_CONFIG,
OPTIONAL_ENV_VARS,
get_config_path,
@@ -252,7 +253,12 @@ _SCHEMA_OVERRIDES: Dict[str, Dict[str, Any]] = {
"terminal.backend": {
"type": "select",
"description": "Terminal execution backend",
"options": ["local", "docker", "ssh", "modal", "daytona", "singularity"],
"options": ["local", "docker", "ssh", "modal", "daytona", "vercel_sandbox", "singularity"],
},
"terminal.vercel_runtime": {
"type": "select",
"description": "Vercel Sandbox runtime",
"options": ["node24", "node22", "python3.13"], # sync with _SUPPORTED_VERCEL_RUNTIMES in terminal_tool.py
},
"terminal.modal_mode": {
"type": "select",
@@ -338,6 +344,7 @@ _CATEGORY_MERGE: Dict[str, str] = {
"human_delay": "display",
"dashboard": "display",
"code_execution": "agent",
"prompt_caching": "agent",
}
# Display order for tabs — unlisted categories sort alphabetically after these.
@@ -736,7 +743,7 @@ async def get_sessions(limit: int = 20, offset: int = 0):
return {"sessions": sessions, "total": total, "limit": limit, "offset": offset}
finally:
db.close()
except Exception as e:
except Exception:
_log.exception("GET /api/sessions failed")
raise HTTPException(status_code=500, detail="Internal server error")
@@ -968,7 +975,7 @@ async def update_config(body: ConfigUpdate):
try:
save_config(_denormalize_config_from_web(body.config))
return {"ok": True}
except Exception as e:
except Exception:
_log.exception("PUT /api/config failed")
raise HTTPException(status_code=500, detail="Internal server error")
@@ -997,7 +1004,7 @@ async def set_env_var(body: EnvVarUpdate):
try:
save_env_value(body.key, body.value)
return {"ok": True, "key": body.key}
except Exception as e:
except Exception:
_log.exception("PUT /api/env failed")
raise HTTPException(status_code=500, detail="Internal server error")
@@ -1011,7 +1018,7 @@ async def remove_env_var(body: EnvVarDelete):
return {"ok": True, "key": body.key}
except HTTPException:
raise
except Exception as e:
except Exception:
_log.exception("DELETE /api/env failed")
raise HTTPException(status_code=500, detail="Internal server error")
@@ -1214,6 +1221,14 @@ _OAUTH_PROVIDER_CATALOG: tuple[Dict[str, Any], ...] = (
"docs_url": "https://github.com/QwenLM/qwen-code",
"status_fn": None, # dispatched via auth.get_qwen_auth_status
},
{
"id": "minimax-oauth",
"name": "MiniMax (OAuth)",
"flow": "pkce",
"cli_command": "hermes auth add minimax-oauth",
"docs_url": "https://www.minimax.io",
"status_fn": None, # dispatched via auth.get_minimax_oauth_auth_status
},
)
@@ -1257,6 +1272,16 @@ def _resolve_provider_status(provider_id: str, status_fn) -> Dict[str, Any]:
"expires_at": raw.get("expires_at"),
"has_refresh_token": bool(raw.get("has_refresh_token")),
}
if provider_id == "minimax-oauth":
raw = hauth.get_minimax_oauth_auth_status()
return {
"logged_in": bool(raw.get("logged_in")),
"source": "minimax_oauth",
"source_label": f"MiniMax ({raw.get('region', 'global')})",
"token_preview": None,
"expires_at": raw.get("expires_at"),
"has_refresh_token": True,
}
except Exception as e:
return {"logged_in": False, "error": str(e)}
return {"logged_in": False}
@@ -1568,7 +1593,6 @@ async def _start_device_code_flow(provider_id: str) -> Dict[str, Any]:
then spawns a background poller. Returns the user-facing display fields
so the UI can render the verification page link + user code.
"""
from hermes_cli import auth as hauth
if provider_id == "nous":
from hermes_cli.auth import _request_device_code, PROVIDER_REGISTRY
import httpx
@@ -2903,7 +2927,7 @@ async def get_dashboard_themes():
them without a stub.
"""
config = load_config()
active = config.get("dashboard", {}).get("theme", "default")
active = cfg_get(config, "dashboard", "theme", default="default")
user_themes = _discover_user_themes()
seen = set()
themes = []
+4 -3
View File
@@ -11,7 +11,6 @@ hot-reloaded by the webhook adapter without a gateway restart.
"""
import json
import os
import re
import secrets
import time
@@ -19,6 +18,8 @@ from pathlib import Path
from typing import Dict
from hermes_constants import display_hermes_home
from utils import atomic_replace
from hermes_cli.config import cfg_get
_SUBSCRIPTIONS_FILENAME = "webhook_subscriptions.json"
@@ -52,7 +53,7 @@ def _save_subscriptions(subs: Dict[str, dict]) -> None:
json.dumps(subs, indent=2, ensure_ascii=False),
encoding="utf-8",
)
os.replace(str(tmp_path), str(path))
atomic_replace(tmp_path, path)
def _get_webhook_config() -> dict:
@@ -60,7 +61,7 @@ def _get_webhook_config() -> dict:
try:
from hermes_cli.config import load_config
cfg = load_config()
return cfg.get("platforms", {}).get("webhook", {})
return cfg_get(cfg, "platforms", "webhook", default={})
except Exception:
return {}
+280 -122
View File
@@ -33,7 +33,7 @@ T = TypeVar("T")
DEFAULT_DB_PATH = get_hermes_home() / "state.db"
SCHEMA_VERSION = 10
SCHEMA_VERSION = 11
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS schema_version (
@@ -102,22 +102,26 @@ CREATE INDEX IF NOT EXISTS idx_messages_session ON messages(session_id, timestam
FTS_SQL = """
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(
content,
content=messages,
content_rowid=id
content
);
CREATE TRIGGER IF NOT EXISTS messages_fts_insert AFTER INSERT ON messages BEGIN
INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
INSERT INTO messages_fts(rowid, content) VALUES (
new.id,
COALESCE(new.content, '') || ' ' || COALESCE(new.tool_name, '') || ' ' || COALESCE(new.tool_calls, '')
);
END;
CREATE TRIGGER IF NOT EXISTS messages_fts_delete AFTER DELETE ON messages BEGIN
INSERT INTO messages_fts(messages_fts, rowid, content) VALUES('delete', old.id, old.content);
DELETE FROM messages_fts WHERE rowid = old.id;
END;
CREATE TRIGGER IF NOT EXISTS messages_fts_update AFTER UPDATE ON messages BEGIN
INSERT INTO messages_fts(messages_fts, rowid, content) VALUES('delete', old.id, old.content);
INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
DELETE FROM messages_fts WHERE rowid = old.id;
INSERT INTO messages_fts(rowid, content) VALUES (
new.id,
COALESCE(new.content, '') || ' ' || COALESCE(new.tool_name, '') || ' ' || COALESCE(new.tool_calls, '')
);
END;
"""
@@ -128,22 +132,26 @@ END;
FTS_TRIGRAM_SQL = """
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts_trigram USING fts5(
content,
content=messages,
content_rowid=id,
tokenize='trigram'
);
CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_insert AFTER INSERT ON messages BEGIN
INSERT INTO messages_fts_trigram(rowid, content) VALUES (new.id, new.content);
INSERT INTO messages_fts_trigram(rowid, content) VALUES (
new.id,
COALESCE(new.content, '') || ' ' || COALESCE(new.tool_name, '') || ' ' || COALESCE(new.tool_calls, '')
);
END;
CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_delete AFTER DELETE ON messages BEGIN
INSERT INTO messages_fts_trigram(messages_fts_trigram, rowid, content) VALUES('delete', old.id, old.content);
DELETE FROM messages_fts_trigram WHERE rowid = old.id;
END;
CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_update AFTER UPDATE ON messages BEGIN
INSERT INTO messages_fts_trigram(messages_fts_trigram, rowid, content) VALUES('delete', old.id, old.content);
INSERT INTO messages_fts_trigram(rowid, content) VALUES (new.id, new.content);
DELETE FROM messages_fts_trigram WHERE rowid = old.id;
INSERT INTO messages_fts_trigram(rowid, content) VALUES (
new.id,
COALESCE(new.content, '') || ' ' || COALESCE(new.tool_name, '') || ' ' || COALESCE(new.tool_calls, '')
);
END;
"""
@@ -285,130 +293,201 @@ class SessionDB:
self._conn.close()
self._conn = None
@staticmethod
def _parse_schema_columns(schema_sql: str) -> Dict[str, Dict[str, str]]:
"""Extract expected columns per table from SCHEMA_SQL.
Uses an in-memory SQLite database to parse the SQL SQLite itself
handles all syntax (DEFAULT expressions with commas, inline
REFERENCES, CHECK constraints, etc.) so there are zero regex
edge cases. The in-memory DB is opened, the schema DDL is
executed, and PRAGMA table_info extracts the column metadata.
Adding a column to SCHEMA_SQL is all that's needed; the
reconciliation loop picks it up automatically.
"""
ref = sqlite3.connect(":memory:")
try:
ref.executescript(schema_sql)
table_columns: Dict[str, Dict[str, str]] = {}
for (tbl,) in ref.execute(
"SELECT name FROM sqlite_master "
"WHERE type='table' AND name NOT LIKE 'sqlite_%'"
).fetchall():
cols: Dict[str, str] = {}
for row in ref.execute(
f'PRAGMA table_info("{tbl}")'
).fetchall():
# row: (cid, name, type, notnull, dflt_value, pk)
col_name = row[1]
col_type = row[2] or ""
notnull = row[3]
default = row[4]
pk = row[5]
# Reconstruct the type expression for ALTER TABLE ADD COLUMN
parts = [col_type] if col_type else []
if notnull and not pk:
parts.append("NOT NULL")
if default is not None:
parts.append(f"DEFAULT {default}")
cols[col_name] = " ".join(parts)
table_columns[tbl] = cols
return table_columns
finally:
ref.close()
def _reconcile_columns(self, cursor: sqlite3.Cursor) -> None:
"""Ensure live tables have every column declared in SCHEMA_SQL.
Follows the Beets/sqlite-utils pattern: the CREATE TABLE definition
in SCHEMA_SQL is the single source of truth for the desired schema.
On every startup this method diffs the live columns (via PRAGMA
table_info) against the declared columns, and ADDs any that are
missing.
This makes column additions a declarative operation just add
the column to SCHEMA_SQL and it appears on the next startup.
Version-gated migration blocks are no longer needed for ADD COLUMN.
"""
expected = self._parse_schema_columns(SCHEMA_SQL)
for table_name, declared_cols in expected.items():
# Get current columns from the live table
try:
rows = cursor.execute(
f'PRAGMA table_info("{table_name}")'
).fetchall()
except sqlite3.OperationalError:
continue # Table doesn't exist yet (shouldn't happen after executescript)
live_cols = set()
for row in rows:
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk)
name = row[1] if isinstance(row, (tuple, list)) else row["name"]
live_cols.add(name)
for col_name, col_type in declared_cols.items():
if col_name not in live_cols:
safe_name = col_name.replace('"', '""')
try:
cursor.execute(
f'ALTER TABLE "{table_name}" ADD COLUMN "{safe_name}" {col_type}'
)
except sqlite3.OperationalError as exc:
# Expected: "duplicate column name" from a race or
# re-run. Unexpected: "Cannot add a NOT NULL column
# with default value NULL" from a schema mistake.
# Log at DEBUG so it's visible in agent.log.
logger.debug(
"reconcile %s.%s: %s", table_name, col_name, exc,
)
def _init_schema(self):
"""Create tables and FTS if they don't exist, run migrations."""
"""Create tables and FTS if they don't exist, reconcile columns.
Schema management follows the declarative reconciliation pattern
(Beets, sqlite-utils): SCHEMA_SQL is the single source of truth.
On existing databases, _reconcile_columns() diffs live columns
against SCHEMA_SQL and ADDs any missing ones. This eliminates
the version-gated migration chain for column additions, making
it impossible for reordered or inserted migrations to skip columns.
The schema_version table is retained for future data migrations
(transforming existing rows) which cannot be handled declaratively.
"""
cursor = self._conn.cursor()
cursor.executescript(SCHEMA_SQL)
# Check schema version and run migrations
# ── Declarative column reconciliation ──────────────────────────
# Diff live tables against SCHEMA_SQL and ADD any missing columns.
# This is idempotent and self-healing: even if a version-gated
# migration was skipped (e.g. due to version renumbering), the
# column gets created here.
self._reconcile_columns(cursor)
# ── Schema version bookkeeping ─────────────────────────────────
# Bump to current so future data migrations (if any) can gate on
# version. No version-gated column additions remain.
cursor.execute("SELECT version FROM schema_version LIMIT 1")
row = cursor.fetchone()
if row is None:
cursor.execute("INSERT INTO schema_version (version) VALUES (?)", (SCHEMA_VERSION,))
cursor.execute(
"INSERT INTO schema_version (version) VALUES (?)",
(SCHEMA_VERSION,),
)
else:
current_version = row["version"] if isinstance(row, sqlite3.Row) else row[0]
if current_version < 2:
# v2: add finish_reason column to messages
try:
cursor.execute("ALTER TABLE messages ADD COLUMN finish_reason TEXT")
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 2")
if current_version < 3:
# v3: add title column to sessions
try:
cursor.execute("ALTER TABLE sessions ADD COLUMN title TEXT")
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 3")
if current_version < 4:
# v4: add unique index on title (NULLs allowed, only non-NULL must be unique)
try:
cursor.execute(
"CREATE UNIQUE INDEX IF NOT EXISTS idx_sessions_title_unique "
"ON sessions(title) WHERE title IS NOT NULL"
)
except sqlite3.OperationalError:
pass # Index already exists
cursor.execute("UPDATE schema_version SET version = 4")
if current_version < 5:
new_columns = [
("cache_read_tokens", "INTEGER DEFAULT 0"),
("cache_write_tokens", "INTEGER DEFAULT 0"),
("reasoning_tokens", "INTEGER DEFAULT 0"),
("billing_provider", "TEXT"),
("billing_base_url", "TEXT"),
("billing_mode", "TEXT"),
("estimated_cost_usd", "REAL"),
("actual_cost_usd", "REAL"),
("cost_status", "TEXT"),
("cost_source", "TEXT"),
("pricing_version", "TEXT"),
]
for name, column_type in new_columns:
try:
# name and column_type come from the hardcoded tuple above,
# not user input. Double-quote identifier escaping is applied
# as defense-in-depth; SQLite DDL cannot be parameterized.
safe_name = name.replace('"', '""')
cursor.execute(f'ALTER TABLE sessions ADD COLUMN "{safe_name}" {column_type}')
except sqlite3.OperationalError:
pass
cursor.execute("UPDATE schema_version SET version = 5")
if current_version < 6:
# v6: add reasoning columns to messages table — preserves assistant
# reasoning text and structured reasoning_details across gateway
# session turns. Without these, reasoning chains are lost on
# session reload, breaking multi-turn reasoning continuity for
# providers that replay reasoning (OpenRouter, OpenAI, Nous).
for col_name, col_type in [
("reasoning", "TEXT"),
("reasoning_details", "TEXT"),
("codex_reasoning_items", "TEXT"),
]:
try:
safe = col_name.replace('"', '""')
cursor.execute(
f'ALTER TABLE messages ADD COLUMN "{safe}" {col_type}'
)
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 6")
if current_version < 7:
# v7: preserve provider-native reasoning_content separately from
# normalized reasoning text. Kimi/Moonshot replay can require
# this field on assistant tool-call messages when thinking is on.
try:
cursor.execute('ALTER TABLE messages ADD COLUMN "reasoning_content" TEXT')
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 7")
if current_version < 8:
# v8: add api_call_count column to sessions — tracks the number
# of individual LLM API calls made within a session (as opposed
# to the session count itself).
try:
cursor.execute(
'ALTER TABLE sessions ADD COLUMN "api_call_count" INTEGER DEFAULT 0'
)
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 8")
if current_version < 9:
# v9: preserve replayable Codex assistant message ids/phases so
# follow-up turns can rebuild Responses API message items instead
# of flattening everything to plain assistant text.
try:
cursor.execute('ALTER TABLE messages ADD COLUMN "codex_message_items" TEXT')
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 9")
# Data migrations that can't be expressed declaratively (row
# backfills, index changes tied to a specific version step) stay
# in a version-gated chain. Column additions are handled by
# _reconcile_columns() above and no longer need entries here.
if current_version < 10:
# v10: trigram FTS5 table for CJK/substring search.
# Created via FTS_TRIGRAM_SQL below; backfill existing messages.
# v10: trigram FTS5 table for CJK/substring search. The
# virtual table + triggers are created unconditionally via
# FTS_TRIGRAM_SQL below, but existing rows need a one-time
# backfill into the FTS index.
try:
cursor.execute("SELECT * FROM messages_fts_trigram LIMIT 0")
_fts_trigram_exists = True
except sqlite3.OperationalError:
_fts_trigram_exists = False
if not _fts_trigram_exists:
cursor.executescript(FTS_TRIGRAM_SQL)
cursor.execute(
"INSERT INTO messages_fts_trigram(rowid, content) "
"SELECT id, content FROM messages WHERE content IS NOT NULL"
)
cursor.execute("UPDATE schema_version SET version = 10")
if current_version < 11:
# v11: re-index FTS5 tables to cover tool_name + tool_calls and
# switch from external-content to inline mode. Existing DBs have
# old-schema FTS tables and triggers that IF NOT EXISTS won't
# overwrite, so we drop them explicitly and let the post-migration
# existence checks (below) recreate them from FTS_SQL /
# FTS_TRIGRAM_SQL, then backfill every message row. Fixes #16751.
for _trig in (
"messages_fts_insert",
"messages_fts_delete",
"messages_fts_update",
"messages_fts_trigram_insert",
"messages_fts_trigram_delete",
"messages_fts_trigram_update",
):
try:
cursor.execute(f"DROP TRIGGER IF EXISTS {_trig}")
except sqlite3.OperationalError:
pass
for _tbl in ("messages_fts", "messages_fts_trigram"):
try:
cursor.execute(f"DROP TABLE IF EXISTS {_tbl}")
except sqlite3.OperationalError:
pass
# Recreate virtual tables + triggers with the new inline-mode
# schema that indexes content || tool_name || tool_calls.
cursor.executescript(FTS_SQL)
cursor.executescript(FTS_TRIGRAM_SQL)
# Backfill both indexes from every existing messages row.
cursor.execute(
"INSERT INTO messages_fts(rowid, content) "
"SELECT id, "
"COALESCE(content, '') || ' ' || "
"COALESCE(tool_name, '') || ' ' || "
"COALESCE(tool_calls, '') "
"FROM messages"
)
cursor.execute(
"INSERT INTO messages_fts_trigram(rowid, content) "
"SELECT id, "
"COALESCE(content, '') || ' ' || "
"COALESCE(tool_name, '') || ' ' || "
"COALESCE(tool_calls, '') "
"FROM messages"
)
if current_version < SCHEMA_VERSION:
cursor.execute(
"UPDATE schema_version SET version = ?",
(SCHEMA_VERSION,),
)
# Unique title index — always ensure it exists (safe to run after migrations
# since the title column is guaranteed to exist at this point)
# Unique title index — always ensure it exists
try:
cursor.execute(
"CREATE UNIQUE INDEX IF NOT EXISTS idx_sessions_title_unique "
@@ -1093,6 +1172,85 @@ class SessionDB:
return self._execute_write(_do)
def replace_messages(self, session_id: str, messages: List[Dict[str, Any]]) -> None:
"""Atomically replace every message for a session.
Used by transcript-rewrite flows such as /retry, /undo, and /compress.
The delete + reinsert sequence must commit as one transaction so a
mid-rewrite failure does not leave SQLite with a partial transcript.
"""
def _do(conn):
conn.execute(
"DELETE FROM messages WHERE session_id = ?", (session_id,)
)
conn.execute(
"UPDATE sessions SET message_count = 0, tool_call_count = 0 WHERE id = ?",
(session_id,),
)
now_ts = time.time()
total_messages = 0
total_tool_calls = 0
for msg in messages:
role = msg.get("role", "unknown")
tool_calls = msg.get("tool_calls")
reasoning_details = msg.get("reasoning_details") if role == "assistant" else None
codex_reasoning_items = (
msg.get("codex_reasoning_items") if role == "assistant" else None
)
codex_message_items = (
msg.get("codex_message_items") if role == "assistant" else None
)
reasoning_details_json = (
json.dumps(reasoning_details) if reasoning_details else None
)
codex_items_json = (
json.dumps(codex_reasoning_items) if codex_reasoning_items else None
)
codex_message_items_json = (
json.dumps(codex_message_items) if codex_message_items else None
)
tool_calls_json = json.dumps(tool_calls) if tool_calls else None
conn.execute(
"""INSERT INTO messages (session_id, role, content, tool_call_id,
tool_calls, tool_name, timestamp, token_count, finish_reason,
reasoning, reasoning_content, reasoning_details, codex_reasoning_items,
codex_message_items)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
session_id,
role,
msg.get("content"),
msg.get("tool_call_id"),
tool_calls_json,
msg.get("tool_name"),
now_ts,
msg.get("token_count"),
msg.get("finish_reason"),
msg.get("reasoning") if role == "assistant" else None,
msg.get("reasoning_content") if role == "assistant" else None,
reasoning_details_json,
codex_items_json,
codex_message_items_json,
),
)
total_messages += 1
if tool_calls is not None:
total_tool_calls += (
len(tool_calls) if isinstance(tool_calls, list) else 1
)
now_ts += 1e-6
conn.execute(
"UPDATE sessions SET message_count = ?, tool_call_count = ? WHERE id = ?",
(total_messages, total_tool_calls, session_id),
)
self._execute_write(_do)
def get_messages(self, session_id: str) -> List[Dict[str, Any]]:
"""Load all messages for a session, ordered by timestamp."""
with self._lock:
@@ -1329,9 +1487,9 @@ class SessionDB:
# quotes. FTS5's tokenizer splits on dots and hyphens, turning
# ``chat-send`` into ``chat AND send`` and ``P2.2`` into ``p2 AND 2``.
# Quoting preserves phrase semantics. A single pass avoids the
# double-quoting bug that would occur if dotted and hyphenated
# double-quoting bug that would occur if dotted, hyphenated and underscored
# patterns were applied sequentially (e.g. ``my-app.config``).
sanitized = re.sub(r"\b(\w+(?:[.-]\w+)+)\b", r'"\1"', sanitized)
sanitized = re.sub(r"\b(\w+(?:[._-]\w+)+)\b", r'"\1"', sanitized)
# Step 6: Restore preserved quoted phrases
for i, quoted in enumerate(_quoted_parts):
@@ -1508,8 +1666,8 @@ class SessionDB:
# Short CJK query (1-2 chars) — trigram needs ≥3 CJK chars.
# Fall back to LIKE substring search.
escaped = raw_query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
like_where = ["m.content LIKE ? ESCAPE '\\'"]
like_params: list = [f"%{escaped}%"]
like_where = ["(m.content LIKE ? ESCAPE '\\' OR m.tool_name LIKE ? ESCAPE '\\' OR m.tool_calls LIKE ? ESCAPE '\\')"]
like_params: list = [f"%{escaped}%", f"%{escaped}%", f"%{escaped}%"]
if source_filter is not None:
like_where.append(f"s.source IN ({','.join('?' for _ in source_filter)})")
like_params.extend(source_filter)
+163 -30
View File
@@ -107,17 +107,58 @@ def _run_async(coro):
loop = None
if loop and loop.is_running():
# Inside an async context (gateway, RL env) — run in a fresh thread.
# Inside an async context (gateway, RL env) — run in a fresh thread
# with its own event loop we own a reference to, so on timeout we
# can cancel the task inside that loop (ThreadPoolExecutor.cancel()
# only works on not-yet-started futures — it's a no-op on a running
# worker, which previously leaked the thread on every 300 s timeout).
import concurrent.futures
worker_loop: Optional[asyncio.AbstractEventLoop] = None
loop_ready = threading.Event()
def _run_in_worker():
nonlocal worker_loop
worker_loop = asyncio.new_event_loop()
loop_ready.set()
try:
asyncio.set_event_loop(worker_loop)
return worker_loop.run_until_complete(coro)
finally:
try:
# Cancel anything still pending (e.g. task cancelled
# externally via call_soon_threadsafe on timeout).
pending = asyncio.all_tasks(worker_loop)
for t in pending:
t.cancel()
if pending:
worker_loop.run_until_complete(
asyncio.gather(*pending, return_exceptions=True)
)
except Exception:
pass
worker_loop.close()
pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
future = pool.submit(asyncio.run, coro)
future = pool.submit(_run_in_worker)
try:
return future.result(timeout=300)
except concurrent.futures.TimeoutError:
future.cancel()
# Cancel the coroutine inside its own loop so the worker thread
# can wind down instead of running forever.
if loop_ready.wait(timeout=1.0) and worker_loop is not None:
try:
for t in asyncio.all_tasks(worker_loop):
worker_loop.call_soon_threadsafe(t.cancel)
except RuntimeError:
# Loop already closed — nothing to cancel.
pass
raise
finally:
pool.shutdown(wait=False, cancel_futures=True)
# wait=False: don't block the caller on a stuck coroutine. We've
# already requested cancellation above; the worker will exit
# once the coroutine observes it (usually at the next await).
pool.shutdown(wait=False)
# If we're on a worker thread (e.g., parallel tool execution in
# delegate_task), use a per-thread persistent loop. This avoids
@@ -138,12 +179,18 @@ def _run_async(coro):
discover_builtin_tools()
# MCP tool discovery (external MCP servers from config)
try:
from tools.mcp_tool import discover_mcp_tools
discover_mcp_tools()
except Exception as e:
logger.debug("MCP tool discovery failed: %s", e)
# MCP tool discovery (external MCP servers from config) used to run here as
# a module-level side effect. It was removed because discover_mcp_tools()
# internally uses a blocking future.result(timeout=120) wait, and the
# gateway lazy-imports this module from inside the asyncio event loop on
# the first user message — freezing Discord/Telegram heartbeats for up to
# 120s whenever any configured MCP server was slow or unreachable (#16856).
#
# Each entry point now runs discovery explicitly at its own startup:
# - gateway/run.py -> start_gateway() uses run_in_executor
# - cli.py, hermes_cli/* -> inline on startup (no event loop)
# - tui_gateway/server.py -> inline on startup (no event loop)
# - acp_adapter/server.py -> asyncio.to_thread on session init
# Plugin tool discovery (user/project/pip plugins)
try:
@@ -200,6 +247,27 @@ _LEGACY_TOOLSET_MAP = {
# get_tool_definitions (the main schema provider)
# =============================================================================
# Module-level memoization for get_tool_definitions(). Keyed on
# (frozenset(enabled_toolsets), frozenset(disabled_toolsets), registry._generation).
# Hot callers (gateway runner, AIAgent.__init__) invoke this on every turn
# with quiet_mode=True; caching avoids ~7 ms of registry walking + schema
# filtering + check_fn probing per call. Only active when quiet_mode=True
# because quiet_mode=False has stdout side effects (tool-selection prints).
#
# Invalidation happens transparently via the registry's _generation counter,
# which bumps on register() / deregister() / register_toolset_alias(). The
# inner check_fn TTL cache in registry.py handles environment drift (Docker
# daemon start/stop, env var changes, etc.) on a 30 s horizon.
_tool_defs_cache: Dict[tuple, List[Dict[str, Any]]] = {}
def _clear_tool_defs_cache() -> None:
"""Drop memoized get_tool_definitions() results. Called when dynamic
schema dependencies change (e.g. discord capability cache reset,
execute_code sandbox reconfigured)."""
_tool_defs_cache.clear()
def get_tool_definitions(
enabled_toolsets: List[str] = None,
disabled_toolsets: List[str] = None,
@@ -218,6 +286,50 @@ def get_tool_definitions(
Returns:
Filtered list of OpenAI-format tool definitions.
"""
# Fast path: memoized result when the caller doesn't need stdout prints.
# The cache key captures every argument-level input; the registry
# generation captures registry mutations (MCP refresh, plugin load).
# check_fn results are TTL-cached one level down, inside
# registry.get_definitions. The config-mtime fingerprint below captures
# user-visible config edits that affect dynamic schemas (execute_code
# mode, discord action allowlist, etc.) without needing an explicit
# invalidate hook on every config-writer.
if quiet_mode:
try:
from hermes_cli.config import get_config_path
cfg_path = get_config_path()
cfg_stat = cfg_path.stat()
cfg_fp = (cfg_stat.st_mtime_ns, cfg_stat.st_size)
except (FileNotFoundError, OSError, ImportError):
cfg_fp = None
cache_key = (
frozenset(enabled_toolsets) if enabled_toolsets is not None else None,
frozenset(disabled_toolsets) if disabled_toolsets else None,
registry._generation,
cfg_fp,
)
cached = _tool_defs_cache.get(cache_key)
if cached is not None:
# Update _last_resolved_tool_names so downstream callers see
# consistent state even on a cache hit.
global _last_resolved_tool_names
_last_resolved_tool_names = [t["function"]["name"] for t in cached]
# Return a shallow copy of the list but share the dict references —
# schemas are treated as read-only by all known callers.
return list(cached)
result = _compute_tool_definitions(enabled_toolsets, disabled_toolsets, quiet_mode)
if quiet_mode:
_tool_defs_cache[cache_key] = result
return result
def _compute_tool_definitions(
enabled_toolsets: List[str] = None,
disabled_toolsets: List[str] = None,
quiet_mode: bool = False,
) -> List[Dict[str, Any]]:
"""Uncached implementation of :func:`get_tool_definitions`."""
# Determine which tool names the caller wants
tools_to_include: set = set()
@@ -409,24 +521,27 @@ def coerce_tool_args(tool_name: str, args: Dict[str, Any]) -> Dict[str, Any]:
if not prop_schema:
continue
expected = prop_schema.get("type")
if not expected:
if not expected and not _schema_allows_null(prop_schema):
continue
coerced = _coerce_value(value, expected)
coerced = _coerce_value(value, expected, schema=prop_schema)
if coerced is not value:
args[key] = coerced
return args
def _coerce_value(value: str, expected_type):
def _coerce_value(value: str, expected_type, schema: dict | None = None):
"""Attempt to coerce a string *value* to *expected_type*.
Returns the original string when coercion is not applicable or fails.
"""
if _schema_allows_null(schema) and value.strip().lower() == "null":
return None
if isinstance(expected_type, list):
# Union type — try each in order, return first successful coercion
for t in expected_type:
result = _coerce_value(value, t)
result = _coerce_value(value, t, schema=schema)
if result is not value:
return result
return value
@@ -439,9 +554,35 @@ def _coerce_value(value: str, expected_type):
return _coerce_json(value, list)
if expected_type == "object":
return _coerce_json(value, dict)
if expected_type == "null" and value.strip().lower() == "null":
return None
return value
def _schema_allows_null(schema: dict | None) -> bool:
"""Return True when a JSON Schema fragment explicitly permits null."""
if not isinstance(schema, dict):
return False
schema_type = schema.get("type")
if schema_type == "null":
return True
if isinstance(schema_type, list) and "null" in schema_type:
return True
if schema.get("nullable") is True:
return True
for union_key in ("anyOf", "oneOf"):
variants = schema.get(union_key)
if not isinstance(variants, list):
continue
for variant in variants:
if isinstance(variant, dict) and variant.get("type") == "null":
return True
return False
def _coerce_json(value: str, expected_python_type: type):
"""Parse *value* as JSON when the schema expects an array or object.
@@ -527,6 +668,13 @@ def handle_function_call(
# Check plugin hooks for a block directive (unless caller already
# checked — e.g. run_agent._invoke_tool passes skip=True to
# avoid double-firing the hook).
#
# Single-fire contract: pre_tool_call fires exactly once per tool
# execution. get_pre_tool_call_block_message() internally calls
# invoke_hook("pre_tool_call", ...) and returns the first block
# directive (if any), so observer plugins see the hook on that same
# pass. When skip=True, the caller already fired it — do nothing
# here.
if not skip_pre_tool_call_hook:
block_message: Optional[str] = None
try:
@@ -543,21 +691,6 @@ def handle_function_call(
if block_message is not None:
return json.dumps({"error": block_message}, ensure_ascii=False)
else:
# Still fire the hook for observers — just don't check for blocking
# (the caller already did that).
try:
from hermes_cli.plugins import invoke_hook
invoke_hook(
"pre_tool_call",
tool_name=function_name,
args=function_args,
task_id=task_id or "",
session_id=session_id or "",
tool_call_id=tool_call_id or "",
)
except Exception:
pass
# Notify the read-loop tracker when a non-read/search tool runs,
# so the *consecutive* counter resets (reads after other work are fine).
@@ -637,7 +770,7 @@ def handle_function_call(
except Exception as e:
error_msg = f"Error executing {function_name}: {str(e)}"
logger.error(error_msg)
logger.exception(error_msg)
return json.dumps({"error": error_msg}, ensure_ascii=False)

Some files were not shown because too many files have changed in this diff Show More