8fdaf4d3d6a877d362b8dd8deec00a9d2caaba17
679 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
850fac14e3 | chore: address copilot comments | ||
|
|
5500b51800 | chore: fix lint | ||
|
|
10deb1b87d |
fix(gateway): canonicalize WhatsApp identity in session keys
Hermes' WhatsApp bridge routinely surfaces the same person under either a phone-format JID (60123456789@s.whatsapp.net) or a LID (…@lid), and may flip between the two for a single human within the same conversation. Before this change, build_session_key used the raw identifier verbatim, so the bridge reshuffling an alias form produced two distinct session keys for the same person — in two places: 1. DM chat_id — a user's DM sessions split in half, transcripts and per-sender state diverge. 2. Group participant_id (with group_sessions_per_user enabled) — a member's per-user session inside a group splits in half for the same reason. Add a canonicalizer that walks the bridge's lid-mapping-*.json files and picks the shortest/numeric-preferred alias as the stable identity. build_session_key now routes both the DM chat_id and the group participant_id through this helper when the platform is WhatsApp. All other platforms and chat types are untouched. Expose canonical_whatsapp_identifier and normalize_whatsapp_identifier as public helpers. Plugins that need per-sender behaviour (role-based routing, per-contact authorization, policy gating) need the same identity resolution Hermes uses internally; without a public helper, each plugin would have to re-implement the walker against the bridge's internal on-disk format. Keeping this alongside build_session_key makes it authoritative and one refactor away if the bridge ever changes shape. _expand_whatsapp_aliases stays private — it's an implementation detail of how the mapping files are walked, not a contract callers should depend on. |
||
|
|
f49afd3122 |
feat(web): add /api/pty WebSocket bridge to embed TUI in dashboard
Exposes hermes --tui over a PTY-backed WebSocket so the dashboard can
embed the real TUI rather than reimplement its surface. The browser
attaches xterm.js to the socket; keystrokes flow in, PTY output bytes
flow out.
Architecture:
browser <Terminal> (xterm.js)
│ onData ───► ws.send(keystrokes)
│ onResize ► ws.send('\x1b[RESIZE:cols;rows]')
│ write ◄── ws.onmessage (PTY bytes)
▼
FastAPI /api/pty (token-gated, loopback-only)
▼
PtyBridge (ptyprocess) ── spawns node ui-tui/dist/entry.js ──► tui_gateway + AIAgent
Components
----------
hermes_cli/pty_bridge.py
Thin wrapper around ptyprocess.PtyProcess: byte-safe read/write on the
master fd via os.read/os.write (not PtyProcessUnicode — ANSI is
inherently byte-oriented and UTF-8 boundaries may land mid-read),
non-blocking select-based reads, TIOCSWINSZ resize, idempotent
SIGHUP→SIGTERM→SIGKILL teardown, platform guard (POSIX-only; Windows
is WSL-supported only).
hermes_cli/web_server.py
@app.websocket("/api/pty") endpoint gated by the existing
_SESSION_TOKEN (via ?token= query param since browsers can't set
Authorization on WS upgrades). Loopback-only enforcement. Reader task
uses run_in_executor to pump PTY bytes without blocking the event
loop. Writer loop intercepts a custom \x1b[RESIZE:cols;rows] escape
before forwarding to the PTY. The endpoint resolves the TUI argv
through a _resolve_chat_argv hook so tests can inject fake commands
without building the real TUI.
Tests
-----
tests/hermes_cli/test_pty_bridge.py — 12 unit tests: spawn, stdout,
stdin round-trip, EOF, resize (via TIOCSWINSZ + tput readback), close
idempotency, cwd, env forwarding, unavailable-platform error.
tests/hermes_cli/test_web_server.py — TestPtyWebSocket adds 7 tests:
missing/bad token rejection (close code 4401), stdout streaming,
stdin round-trip, resize escape forwarding, unavailable-platform ANSI
error frame + 1011 close, resume parameter forwarding to argv.
96 tests pass under scripts/run_tests.sh.
(cherry picked from commit
|
||
|
|
1840c6a57d |
feat(spotify): wire setup wizard into 'hermes tools' + document cron usage (#15180)
A — 'hermes tools' activation now runs the full Spotify wizard. Previously a user had to (1) toggle the Spotify toolset on in 'hermes tools' AND (2) separately run 'hermes auth spotify' to actually use it. The second step was a discovery gap — the docs mentioned it but nothing in the TUI pointed users there. Now toggling Spotify on calls login_spotify_command as a post_setup hook. If the user has no client_id yet, the interactive wizard walks them through Spotify app creation; if they do, it skips straight to PKCE. Either way, one 'hermes tools' pass leaves Spotify toggled on AND authenticated. SystemExit from the wizard (user abort) leaves the toolset enabled and prints a 'run: hermes auth spotify' hint — it does NOT fail the toolset toggle. Dropped the TOOL_CATEGORIES env_vars list for Spotify. The wizard handles HERMES_SPOTIFY_CLIENT_ID persistence itself, and asking users to type env var names before the wizard fires was UX-backwards — the point of the wizard is that they don't HAVE a client_id yet. B — Docs page now covers cron + Spotify. New 'Scheduling: Spotify + cron' section with two working examples (morning playlist, wind-down pause) using the real 'hermes cron add' CLI surface (verified via 'cron add --help'). Covers the active-device gotcha, Premium gating, memory isolation, and links to the cron docs. Also fixed a stale '9 Spotify tools' reference in the setup copy — we consolidated to 7 tools in #15154. Validation: - scripts/run_tests.sh tests/hermes_cli/test_tools_config.py tests/hermes_cli/test_spotify_auth.py tests/tools/test_spotify_client.py → 54 passed - website: node scripts/prebuild.mjs && npx docusaurus build → SUCCESS, no new warnings |
||
|
|
e5d41f05d4 |
feat(spotify): consolidate tools (9→7), add spotify skill, surface in hermes setup (#15154)
Three quality improvements on top of #15121 / #15130 / #15135: 1. Tool consolidation (9 → 7) - spotify_saved_tracks + spotify_saved_albums → spotify_library with kind='tracks'|'albums'. Handler code was ~90 percent identical across the two old tools; the merge is a behavioral no-op. - spotify_activity dropped. Its 'now_playing' action was a duplicate of spotify_playback.get_currently_playing (both return identical 204/empty payloads). Its 'recently_played' action moves onto spotify_playback as a new action — history belongs adjacent to live state. - Net: each API call ships 2 fewer tool schemas when the Spotify toolset is enabled, and the action surface is more discoverable (everything playback-related is on one tool). 2. Spotify skill (skills/media/spotify/SKILL.md) Teaches the agent canonical usage patterns so common requests don't balloon into 4+ tool calls: - 'play X' = one search, then play by URI (not search + scan + describe + play) - 'what's playing' = single get_currently_playing (no preflight get_state chain) - Don't retry on '403 Premium required' or '403 No active device' — both require user action - URI/URL/bare-ID format normalization - Full failure-mode reference for 204/401/403/429 3. Surfaced in 'hermes setup' tool status Adds 'Spotify (PKCE OAuth)' to the tool status list when auth.json has a Spotify access/refresh token. Matches the homeassistant pattern but reads from auth.json (OAuth-based) rather than env vars. Docs updated to reflect the new 7-tool surface, and mention the companion skill in the 'Using it' section. Tests: 54 passing (client 22, auth 15, tools_config 35 — 18 = 54 after renaming/replacing the spotify_activity tests with library + recently_played coverage). Docusaurus build clean. |
||
|
|
9be17bb84f |
docs(spotify): expand feature page with tool reference, Free/Premium matrix, troubleshooting (#15135)
The initial Spotify docs page shipped in #15130 was a setup guide. This expands it into a full feature reference: - Per-tool parameter table for all 9 tools, extracted from the real schemas in tools/spotify_tool.py (actions, required/optional args, premium gating). - Free vs Premium feature matrix — which actions work on which tier, so Free users don't assume Spotify tools are useless to them. - Active-device prerequisite called out at the top; this is the #1 cause of '403 no active device' reports for every Spotify integration. - SSH / headless section explaining that browser auto-open is skipped when SSH_CLIENT/SSH_TTY is set, and how to tunnel the callback port. - Token lifecycle: refresh on 401, persistence across restarts, how to revoke server-side via spotify.com/account/apps. - Example prompt list so users know what to ask the agent. - Troubleshooting expanded: no-active-device, Premium-required, 204 now_playing, INVALID_CLIENT, 429, 401 refresh-revoked, wizard not opening browser. - 'Where things live' table mapping auth.json / .env / Spotify app. Verified with 'node scripts/prebuild.mjs && npx docusaurus build' — page compiles, no new warnings. |
||
|
|
05394f2f28 |
feat(spotify): interactive setup wizard + docs page (#15130)
Previously 'hermes auth spotify' crashed with 'HERMES_SPOTIFY_CLIENT_ID is required' if the user hadn't manually created a Spotify developer app and set env vars. Now the command detects a missing client_id and walks the user through the one-time app registration inline: - Opens https://developer.spotify.com/dashboard in the browser - Tells the user exactly what to paste into the Spotify form (including the correct default redirect URI, 127.0.0.1:43827) - Prompts for the Client ID - Persists HERMES_SPOTIFY_CLIENT_ID to ~/.hermes/.env so subsequent runs skip the wizard - Continues straight into the PKCE OAuth flow Also prints the docs URL at both the start of the wizard and the end of a successful login so users can find the full guide. Adds website/docs/user-guide/features/spotify.md with the complete setup walkthrough, tool reference, and troubleshooting, and wires it into the sidebar under User Guide > Features > Advanced. Fixes a stale redirect URI default in the hermes_cli/tools_config.py TOOL_CATEGORIES entry (was 8888/callback from the PR description instead of the actual DEFAULT_SPOTIFY_REDIRECT_URI value 43827/spotify/callback defined in auth.py). |
||
|
|
2cab8129d1 |
feat(copilot): add 401 auth recovery with automatic token refresh and client rebuild
When using GitHub Copilot as provider, HTTP 401 errors could cause Hermes to silently fall back to the next model in the chain instead of recovering. This adds a one-shot retry mechanism that: 1. Re-resolves the Copilot token via the standard priority chain (COPILOT_GITHUB_TOKEN -> GH_TOKEN -> GITHUB_TOKEN -> gh auth token) 2. Rebuilds the OpenAI client with fresh credentials and Copilot headers 3. Retries the failed request before falling back The fix handles the common case where the gho_* OAuth token remains valid but the httpx client state becomes stale (e.g. after startup race conditions or long-lived sessions). Key design decisions: - Always rebuild client even if token string unchanged (recovers stale state) - Uses _apply_client_headers_for_base_url() for canonical header management - One-shot flag guard prevents infinite 401 loops (matches existing pattern used by Codex/Nous/Anthropic providers) - No token exchange via /copilot_internal/v2/token (returns 404 for some account types; direct gho_* auth works reliably) Tests: 3 new test cases covering end-to-end 401->refresh->retry, client rebuild verification, and same-token rebuild scenarios. Docs: Updated providers.md with Copilot auth behavior section. |
||
|
|
852c7f3be3 |
feat(cron): per-job workdir for project-aware cron runs (#15110)
Cron jobs can now specify a per-job working directory. When set, the job runs as if launched from that directory: AGENTS.md / CLAUDE.md / .cursorrules from that dir are injected into the system prompt, and the terminal / file / code-exec tools use it as their cwd (via TERMINAL_CWD). When unset, old behaviour is preserved (no project context files, tools use the scheduler's cwd). Requested by @bluthcy. ## Mechanism - cron/jobs.py: create_job / update_job accept 'workdir'; validated to be an absolute existing directory at create/update time. - cron/scheduler.py run_job: if job.workdir is set, point TERMINAL_CWD at it and flip skip_context_files to False before building the agent. Restored in finally on every exit path. - cron/scheduler.py tick: workdir jobs run sequentially (outside the thread pool) because TERMINAL_CWD is process-global. Workdir-less jobs still run in the parallel pool unchanged. - tools/cronjob_tools.py + hermes_cli/cron.py + hermes_cli/main.py: expose 'workdir' via the cronjob tool and 'hermes cron create/edit --workdir ...'. Empty string on edit clears the field. ## Validation - tests/cron/test_cron_workdir.py (21 tests): normalize, create, update, JSON round-trip via cronjob tool, tick partition (workdir jobs run on the main thread, not the pool), run_job env toggle + restore in finally. - Full targeted suite (tests/cron/, test_cronjob_tools.py, test_cron.py, test_config_cwd_bridge.py, test_worktree.py): 314/314 passed. - Live smoke: hermes cron create --workdir $(pwd) works; relative path rejected; list shows 'Workdir:'; edit --workdir '' clears. |
||
|
|
7626f3702e |
feat: read prompt caching cache_ttl from config
- Load prompt_caching.cache_ttl in AIAgent (5m default, 1h opt-in) - Document DEFAULT_CONFIG and developer guide example - Add unit tests for default, 1h, and invalid TTL fallback Made-with: Cursor |
||
|
|
b2e124d082 |
refactor(commands): drop /provider, /plan handler, and clean up slash registry (#15047)
* refactor(commands): drop /provider and clean up slash registry * refactor(commands): drop /plan special handler — use plain skill dispatch |
||
|
|
2ba9b29f37 |
docs(plugins): correct pre_gateway_dispatch doc text and add hooks.md section
Follow-up to aeff6dfe:
- Fix semantic error in VALID_HOOKS inline comment ("after core auth" ->
"before auth"). Hook intentionally runs BEFORE auth so plugins can
handle unauthorized senders without triggering the pairing flow.
- Fix wrong class name in the same comment (HermesGateway ->
GatewayRunner, matching gateway/run.py).
- Add a full ### pre_gateway_dispatch section in
website/docs/user-guide/features/hooks.md (matches the pattern of
every other plugin hook: signature, params table, fires-where,
return-value table, use cases, two worked examples) plus a row in
the quick-reference table.
- Add the anchor link on the plugins.md table row so it matches the
other hook entries.
No code behavior change.
|
||
|
|
1ef1e4c669 |
feat(plugins): add pre_gateway_dispatch hook
Introduces a new plugin hook `pre_gateway_dispatch` fired once per
incoming MessageEvent in `_handle_message`, after the internal-event
guard but before the auth / pairing chain. Plugins may return a dict
to influence flow:
{"action": "skip", "reason": "..."} -> drop (no reply)
{"action": "rewrite", "text": "..."} -> replace event.text
{"action": "allow"} / None -> normal dispatch
Motivation: gateway-level message-flow patterns that don't fit cleanly
into any single adapter — e.g. listen-only group-chat windows (buffer
ambient messages, collapse on @mention), or human-handover silent
ingest (record messages while an owner handles the chat manually).
Today these require forking core; with this hook they can live in a
single profile-agnostic plugin.
Hook runs BEFORE auth so plugins can handle unauthorized senders
(e.g. customer-service handover ingest) without triggering the
pairing-code flow. Exceptions in plugin callbacks are caught and
logged; the first non-None action dict wins, remaining results are
ignored.
Includes:
- `VALID_HOOKS` entry + inline doc in `hermes_cli/plugins.py`
- Invocation block in `gateway/run.py::_handle_message`
- 5 new tests in `tests/gateway/test_pre_gateway_dispatch.py`
(skip, rewrite, allow, exception safety, internal-event bypass)
- 2 additional tests in `tests/hermes_cli/test_plugins.py`
- Table entry in `website/docs/user-guide/features/plugins.md`
Made-with: Cursor
|
||
|
|
67bfd4b828 |
feat(tui): stream thinking + tools expanded by default
Extends SECTION_DEFAULTS so the out-of-the-box TUI shows the turn as a live transcript (reasoning + tool calls streaming inline) instead of a wall of `▸` chevrons the user has to click every turn. Final default matrix: - thinking: expanded - tools: expanded - activity: hidden (unchanged from the previous commit) - subagents: falls through to details_mode (collapsed by default) Everything explicit in `display.sections` still wins, so anyone who already pinned an override keeps their layout. One-line revert is `display.sections.<name>: collapsed`. |
||
|
|
728767e910 |
feat(tui): hide the activity panel by default
The activity panel (gateway hints, terminal-parity nudges, background notifications) is noise for the typical day-to-day user, who only cares about thinking + tools + streamed content. Make `hidden` the built-in default for that section so users land on the quiet mode out of the box. Tool failures still render inline on the failing tool row, so this default suppresses the noise feed without losing the signal. Opt back in with `display.sections.activity: collapsed` (chevron) or `expanded` (always open) in `~/.hermes/config.yaml`, or live with `/details activity collapsed`. Implementation: SECTION_DEFAULTS in domain/details.ts, applied as the fallback in `sectionMode()` between the explicit override and the global details_mode. Existing `display.sections.activity` overrides take precedence — no migration needed for users who already set it. |
||
|
|
78481ac124 |
feat(tui): per-section visibility for the details accordion
Adds optional per-section overrides on top of the existing global
details_mode (hidden | collapsed | expanded). Lets users keep the
accordion collapsed by default while auto-expanding tools, or hide the
activity panel entirely without touching thinking/tools/subagents.
Config (~/.hermes/config.yaml):
display:
details_mode: collapsed
sections:
thinking: expanded
tools: expanded
activity: hidden
Slash command:
/details show current global + overrides
/details [hidden|collapsed|expanded] set global mode (existing)
/details <section> <mode|reset> per-section override (new)
/details <section> reset clear override
Sections: thinking, tools, subagents, activity.
Implementation:
- ui-tui/src/types.ts SectionName + SectionVisibility
- ui-tui/src/domain/details.ts parseSectionMode / resolveSections /
sectionMode + SECTION_NAMES
- ui-tui/src/app/uiStore.ts +
app/interfaces.ts +
app/useConfigSync.ts sections threaded into UiState
- ui-tui/src/components/
thinking.tsx ToolTrail consults per-section mode for
hidden/expanded behaviour; expandAll
skips hidden sections; floating-alert
fallback respects activity:hidden
- ui-tui/src/components/
messageLine.tsx + appLayout.tsx pass sections through render tree
- ui-tui/src/app/slash/
commands/core.ts /details <section> <mode|reset> syntax
- tui_gateway/server.py config.set details_mode.<section>
writes to display.sections.<section>
(empty value clears the override)
- website/docs/user-guide/tui.md documented
Tests: 14 new (4 domain, 4 useConfigSync, 3 slash, 3 gateway).
Total: 269/269 vitest, all gateway tests pass.
|
||
|
|
5a1c599412 |
feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)
* docs: browser CDP supervisor design (for upcoming PR) Design doc ahead of implementation — dialog + iframe detection/interaction via a persistent CDP supervisor. Covers backend capability matrix (verified live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split, non-goals, and test plan. Supersedes #12550. No code changes in this commit. * feat(browser): add persistent CDP supervisor for dialog + frame detection Single persistent CDP WebSocket per Hermes task_id that subscribes to Page/Runtime/Target events and maintains thread-safe state for pending dialogs, frame tree, and console errors. Supervisor lives in its own daemon thread running an asyncio loop; external callers use sync API (snapshot(), respond_to_dialog()) that bridges onto the loop. Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true} and enables Page+Runtime on each so iframe-origin dialogs surface through the same supervisor. Dialog policies: must_respond (default, 300s safety timeout), auto_dismiss, auto_accept. Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot payloads bounded on ad-heavy pages. E2E verified against real Chrome via smoke test — detects + responds to main-frame alerts, iframe-contentWindow alerts, preserves frame tree, graceful no-dialog error path, clean shutdown. No agent-facing tool wiring in this commit (comes next). * feat(browser): add browser_dialog tool wired to CDP supervisor Agent-facing response-only tool. Schema: action: 'accept' | 'dismiss' (required) prompt_text: response for prompt() dialogs (optional) dialog_id: disambiguate when multiple dialogs queued (optional) Handler: SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...) check_fn shares _browser_cdp_check with browser_cdp so both surface and hide together. When no supervisor is attached (Camofox, default Playwright, or no browser session started yet), tool is hidden; if somehow invoked it returns a clear error pointing the agent to browser_navigate / /browser connect. Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp / hermes-api-server toolsets alongside browser_cdp. * feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot Supervisor lifecycle: * _get_session_info lazy-starts the supervisor after a session row is materialized — covers every backend code path (Browserbase, cdp_url override, /browser connect, future providers) with one hook. * cleanup_browser(task_id) stops the supervisor for that task first (before the backend tears down CDP). * cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all(). * /browser connect eagerly starts the supervisor for task 'default' so the first snapshot already shows pending_dialogs. * /browser disconnect stops the supervisor. CDP URL resolution for the supervisor: 1. BROWSER_CDP_URL / browser.cdp_url override. 2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase). browser_snapshot merges supervisor state (pending_dialogs + frame_tree) into its JSON output when a supervisor is active — the agent reads pending_dialogs from the snapshot it already requests, then calls browser_dialog to respond. No extra tool surface. Config defaults: * browser.dialog_policy: 'must_respond' (new) * browser.dialog_timeout_s: 300 (new) No version bump — new keys deep-merge into existing browser section. Deadlock fix in supervisor event dispatch: * _on_dialog_opening and _on_target_attached used to await CDP calls while the reader was still processing an event — but only the reader can set the response Future, so the call timed out. * Both now fire asyncio.create_task(...) so the reader stays pumping. * auto_dismiss/auto_accept now actually close the dialog immediately. Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome): * supervisor start/snapshot * main-frame alert detection + dismiss * iframe.contentWindow alert * prompt() with prompt_text reply * respond with no pending dialog -> clean error * auto_dismiss clears on event * registry idempotency * registry stop -> snapshot reports inactive * browser_dialog tool no-supervisor error * browser_dialog invalid action * browser_dialog end-to-end via tool handler xdist-safe: chrome_cdp fixture uses a per-worker port. Skipped when google-chrome/chromium isn't installed. * docs(browser): document browser_dialog tool + CDP supervisor - user-guide/features/browser.md: new browser_dialog section with workflow, availability gate, and dialog_policy table - reference/tools-reference.md: row for browser_dialog, tool count bumped 53 -> 54, browser tools count 11 -> 12 - reference/toolsets-reference.md: browser_dialog added to browser toolset row with note on pending_dialogs / frame_tree snapshot fields Full design doc lives at developer-guide/browser-supervisor.md (committed earlier). * fix(browser): reconnect loop + recent_dialogs for Browserbase visibility Found via Browserbase E2E test that revealed two production-critical issues: 1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's CDP proxy tears down our long-lived WebSocket whenever a short-lived client (e.g. agent-browser CLI's per-command CDP connection) disconnects. Fixed with a reconnecting _run loop that re-attaches with exponential backoff on drops. _page_session_id and _child_sessions are reset on each reconnect; pending_dialogs and frames are preserved across reconnects. 2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their Playwright-based CDP proxy dismisses alert/confirm/prompt before our Page.handleJavaScriptDialog call can respond. So pending_dialogs is empty by the time the agent reads a snapshot on Browserbase. Added a recent_dialogs ring buffer (capacity 20) that retains a DialogRecord for every dialog that opened, with a closed_by tag: * 'agent' — agent called browser_dialog * 'auto_policy' — local auto_dismiss/auto_accept fired * 'watchdog' — must_respond timeout auto-dismissed (300s default) * 'remote' — browser/backend closed it on us (Browserbase) Agents on Browserbase now see the dialog history with closed_by='remote' so they at least know a dialog fired, even though they couldn't respond. 3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a 'message' field (CDP spec has only 'result' and 'userInput') but our _on_dialog_closed was matching on message. Fixed to match by session_id + oldest-first, with a safety assumption that only one dialog is in flight per session (the JS thread is blocked while a dialog is up). Docs + tests updated: * browser.md: new availability matrix showing the three backends and which mode (pending / recent / response) each supports * developer-guide/browser-supervisor.md: three-field snapshot schema with closed_by semantics * test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12 passing against real Chrome) E2E verified both backends: * Local Chrome via /browser connect: detect + respond full workflow (smoke_supervisor.py all 7 scenarios pass) * Browserbase: detect via recent_dialogs with closed_by='remote' (smoke_supervisor_browserbase_v2.py passes) Camofox remains out of scope (REST-only, no CDP) — tracked for upstream PR 3. * feat(browser): XHR bridge for dialog response on Browserbase (FIXED) Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so Page.handleJavaScriptDialog calls lose the race. Solution: bypass native dialogs entirely. The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a JavaScript override for window.alert/confirm/prompt. Those overrides perform a synchronous XMLHttpRequest to a magic host ('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable with a requestStage=Request pattern. Flow when a page calls alert('hi'): 1. window.alert override intercepts, builds XHR GET to http://hermes-dialog-bridge.invalid/?kind=alert&message=hi 2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics) 3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces it as a pending dialog with bridge_request_id set 4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog 5. Supervisor calls Fetch.fulfillRequest with JSON body: {accept: true|false, prompt_text: '...', dialog_id: 'd-N'} 6. The injected script parses the body, returns the appropriate value from the override (undefined for alert, bool for confirm, string|null for prompt) This works identically on Browserbase AND local Chrome — no native dialog ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog policies (must_respond / auto_dismiss / auto_accept) all still work. Bridge is installed on every attached session (main page + OOPIF child sessions) so iframe dialogs are captured too. Native-dialog path kept as a fallback for backends that don't auto-dismiss (so a page that somehow bypasses our override — e.g. iframes that load after Fetch.enable but before the init-script runs — still gets observed via Page.javascriptDialogOpening). E2E VERIFIED: * Local Chrome: 13/13 pytest tests green (12 original + new test_bridge_captures_prompt_and_returns_reply_text that asserts window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds) * Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS: - alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓ - prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY' → page.prompt_ret === 'AGENT-REPLY' ✓ - confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓ - confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓ Docs updated in browser.md and developer-guide/browser-supervisor.md — availability matrix now shows Browserbase at full parity with local Chrome for both detection and response. * feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...) Adds iframe interaction to the CDP supervisor PR (was queued as PR 2). Design: browser_cdp gets an optional frame_id parameter. When set, the tool looks up the frame in the supervisor's frame_tree, grabs its child cdp_session_id (OOPIF session), and dispatches the CDP call through the supervisor's already-connected WebSocket via run_coroutine_threadsafe. Why not stateless: on Browserbase, each fresh browser_cdp WebSocket must re-negotiate against a signed connectUrl. The session info carries a specific URL that can expire while the supervisor's long-lived connection stays valid. Routing via the supervisor sidesteps this. Agent workflow: 1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true 2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>, params={'expression': 'document.title', 'returnByValue': True}) 3. Supervisor dispatches the call on the OOPIF's child session Supervisor state fixes needed along the way: * _on_frame_detached now skips reason='swap' (frame migrating processes) * _on_frame_detached also skips when the frame is an OOPIF with a live child session — Browserbase fires spurious remove events when a same-origin iframe gets promoted to OOPIF * _on_target_detached clears cdp_session_id but KEEPS the frame record so the agent still sees the OOPIF in frame_tree during transient session flaps E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py): browser_cdp(method='Runtime.evaluate', params={'expression': 'document.title', 'returnByValue': True}, frame_id=<OOPIF>) → {'success': True, 'result': {'value': 'Example Domain'}} The iframe is <iframe src='https://example.com/'> inside a top-level data: URL page on a real Browserbase session. The agent Runtime.evaluates INSIDE the cross-origin iframe and gets example.com's title back. Tests (tests/tools/test_browser_supervisor.py — 16 pass total): * test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF, verifies routing via supervisor, Runtime.evaluate returns 1+1=2 * test_browser_cdp_frame_id_missing_supervisor — clean error when no supervisor attached * test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad frame_id Docs (browser.md and developer-guide/browser-supervisor.md) updated with the iframe workflow, availability matrix now shows OOPIF eval as shipped for local Chrome + Browserbase. * test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process When asked 'did you test the iframe stuff' I had only done a mocked pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/ smoke_local_oopif.py: * 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906) * Chrome with --site-per-process so the cross-origin iframe becomes a real OOPIF in its own process * Navigate, find OOPIF in supervisor.frame_tree, call browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes through the supervisor's child session * Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the inner page, retrieved via OOPIF eval) PASSED on 2026-04-23. Tried to embed this as a pytest but hit an asyncio version quirk between venv (3.11) and the system python (3.13) — Page.navigate hangs in the pytest harness but works in standalone. Left a self-documenting skip test that points to the smoke script + describes the verification. chrome_cdp fixture now passes --site-per-process so future iframe tests can rely on OOPIF behavior. Result: 16 pass + 1 documented-skip = 17 tests in tests/tools/test_browser_supervisor.py. * docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count Pre-merge docs audit revealed two gaps: 1. user-guide/configuration.md browser config example was missing the two new dialog_* knobs. Added with a short table explaining must_respond / auto_dismiss / auto_accept semantics and a link to the feature page for the full workflow. 2. reference/tools-reference.md header said '54 built-in tools' — real count on main is 54, this branch adds browser_dialog so it's 55. Fixed the header. (browser count was already correctly bumped 11 -> 12 in the earlier docs commit.) No code changes. |
||
|
|
0f6eabb890 |
docs(website): dedicated page per bundled + optional skill (#14929)
Generates a full dedicated Docusaurus page for every one of the 132 skills
(73 bundled + 59 optional) under website/docs/user-guide/skills/{bundled,optional}/<category>/.
Each page carries the skill's description, metadata (version, author, license,
dependencies, platform gating, tags, related skills cross-linked to their own
pages), and the complete SKILL.md body that Hermes loads at runtime.
Previously the two catalog pages just listed skills with a one-line blurb and
no way to see what the skill actually did — users had to go read the source
repo. Now every skill has a browsable, searchable, cross-linked reference in
the docs.
- website/scripts/generate-skill-docs.py — generator that reads skills/ and
optional-skills/, writes per-skill pages, regenerates both catalog indexes,
and rewrites the Skills section of sidebars.ts. Handles MDX escaping
(outside fenced code blocks: curly braces, unsafe HTML-ish tags) and
rewrites relative references/*.md links to point at the GitHub source.
- website/docs/reference/skills-catalog.md — regenerated; each row links to
the new dedicated page.
- website/docs/reference/optional-skills-catalog.md — same.
- website/sidebars.ts — Skills section now has Bundled / Optional subtrees
with one nested category per skill folder.
- .github/workflows/{docs-site-checks,deploy-site}.yml — run the generator
before docusaurus build so CI stays in sync with the source SKILL.md files.
Build verified locally with `npx docusaurus build`. Only remaining warnings
are pre-existing broken link/anchor issues in unrelated pages.
|
||
|
|
983bbe2d40 |
feat(skills): add design-md skill for Google's DESIGN.md spec (#14876)
* feat(config): make tool output truncation limits configurable Port from anomalyco/opencode#23770: expose a new `tool_output` config section so users can tune the hardcoded truncation caps that apply to terminal output and read_file pagination. Three knobs under `tool_output`: - max_bytes (default 50_000) — terminal stdout/stderr cap - max_lines (default 2000) — read_file pagination cap - max_line_length (default 2000) — per-line cap in line-numbered view All three keep their existing hardcoded values as defaults, so behaviour is unchanged when the section is absent. Power users on big-context models can raise them; small-context local models can lower them. Implementation: - New `tools/tool_output_limits.py` reads the section with defensive fallback (missing/invalid values → defaults, never raises). - `tools/terminal_tool.py` MAX_OUTPUT_CHARS now comes from get_max_bytes(). - `tools/file_operations.py` normalize_read_pagination() and _add_line_numbers() now pull the limits at call time. - `hermes_cli/config.py` DEFAULT_CONFIG gains the `tool_output` section so `hermes setup` writes defaults into fresh configs. - Docs page `user-guide/configuration.md` gains a "Tool Output Truncation Limits" section with large-context and small-context example configs. Tests (18 new in tests/tools/test_tool_output_limits.py): - Default resolution with missing / malformed / non-dict config. - Full and partial user overrides. - Coercion of bad values (None, negative, wrong type, str int). - Shortcut accessors delegate correctly. - DEFAULT_CONFIG exposes the section with the right defaults. - Integration: normalize_read_pagination clamps to the configured max_lines. * feat(skills): add design-md skill for Google's DESIGN.md spec Built-in skill under skills/creative/ that teaches the agent to author, lint, diff, and export DESIGN.md files — Google's open-source (Apache-2.0) format for describing a visual identity to coding agents. Covers: - YAML front matter + markdown body anatomy - Full token schema (colors, typography, rounded, spacing, components) - Canonical section order + duplicate-heading rejection - Component property whitelist + variants-as-siblings pattern - CLI workflow via 'npx @google/design.md' (lint/diff/export/spec) - Lint rule reference including WCAG contrast checks - Common YAML pitfalls (quoted hex, negative dimensions, dotted refs) - Starter template at templates/starter.md Package verified live on npm (@google/design.md@0.1.1). |
||
|
|
f593c367be |
feat(dashboard): reskin extension points for themes and plugins (#14776)
Themes and plugins can now pull off arbitrary dashboard reskins (cockpit
HUD, retro terminal, etc.) without touching core code.
Themes gain four new fields:
- layoutVariant: standard | cockpit | tiled — shell layout selector
- assets: {bg, hero, logo, crest, sidebar, header, custom: {...}} —
artwork URLs exposed as --theme-asset-* CSS vars
- customCSS: raw CSS injected as a scoped <style> tag on theme apply
(32 KiB cap, cleaned up on theme switch)
- componentStyles: per-component CSS-var overrides (clipPath,
borderImage, background, boxShadow, ...) for card/header/sidebar/
backdrop/tab/progress/badge/footer/page
Plugin manifests gain three new fields:
- tab.override: replaces a built-in route instead of adding a tab
- tab.hidden: register component + slots without adding a nav entry
- slots: declares shell slots the plugin populates
10 named shell slots: backdrop, header-left/right/banner, sidebar,
pre-main, post-main, footer-left/right, overlay. Plugins register via
window.__HERMES_PLUGINS__.registerSlot(name, slot, Component). A
<PluginSlot> React helper is exported on the plugin SDK.
Ships a full demo at plugins/strike-freedom-cockpit/ — theme YAML +
slot-only plugin that reproduces a Gundam cockpit dashboard: MS-STATUS
sidebar with live telemetry, COMPASS crest in header, notched card
corners via componentStyles, scanline overlay via customCSS, gold/cyan
palette, Orbitron typography.
Validation:
- 15 new tests in test_web_server.py covering every extended field
- tests/hermes_cli/: 2615 passed (3 pre-existing unrelated failures)
- tsc -b --noEmit: clean
- vite build: 418 kB bundle, ~2 kB delta for slots/theme extensions
Co-authored-by: Teknium <p@nousresearch.com>
|
||
|
|
a1ff6b45ea |
fix(gateway/discord): add safe startup slash sync policy
Replaces blind tree.sync() on every Discord reconnect with a diff-based reconcile. In safe mode (default), fetch existing global commands, compare desired vs existing payloads, skip unchanged, PATCH changed, recreate when non-patchable metadata differs, POST missing, and delete stale commands one-by-one. Keeps 'bulk' for legacy behavior and 'off' to skip startup sync entirely. Fixes restart-heavy workflows that burn Discord's command write budget and can surface 429s when iterating on native slash commands. Env var: DISCORD_COMMAND_SYNC_POLICY (safe|bulk|off), default 'safe'. Co-authored-by: Codex <codex@openai.invalid> |
||
|
|
255ba5bf26 |
feat(dashboard): expand themes to fonts, layout, density (#14725)
Dashboard themes now control typography and layout, not just colors. Each built-in theme picks its own fonts, base size, radius, and density so switching produces visible changes beyond hue. Schema additions (per theme): - typography — fontSans, fontMono, fontDisplay, fontUrl, baseSize, lineHeight, letterSpacing. fontUrl is injected as <link> on switch so Google/Bunny/self-hosted stylesheets all work. - layout — radius (any CSS length) and density (compact | comfortable | spacious, multiplies Tailwind spacing). - colorOverrides (optional) — pin individual shadcn tokens that would otherwise derive from the palette. Built-in themes are now distinct beyond palette: - default — system stack, 15px, 0.5rem radius, comfortable - midnight — Inter + JetBrains Mono, 14px, 0.75rem, comfortable - ember — Spectral (serif) + IBM Plex Mono, 15px, 0.25rem - mono — IBM Plex Sans + Mono, 13px, 0 radius, compact - cyberpunk— Share Tech Mono everywhere, 14px, 0 radius, compact - rose — Fraunces (serif) + DM Mono, 16px, 1rem, spacious Also fixes two bugs: 1. Custom user themes silently fell back to default. ThemeProvider only applied BUILTIN_THEMES[name], so YAML files in ~/.hermes/dashboard-themes/ showed in the picker but did nothing. Server now ships the full normalised definition; client applies it. 2. Docs documented a 21-token flat colors schema that never matched the code (applyPalette reads a 3-layer palette). Rewrote the Themes section against the actual shape. Implementation: - web/src/themes/types.ts: extend DashboardTheme with typography, layout, colorOverrides; ThemeListEntry carries optional definition. - web/src/themes/presets.ts: 6 built-ins with distinct typography+layout. - web/src/themes/context.tsx: applyTheme() writes palette+typography+ layout+overrides as CSS vars, injects fontUrl stylesheet, fixes the fallback-to-default bug via resolveTheme(name). - web/src/index.css: html/body/code read the new theme-font vars; --radius-sm/md/lg/xl derive from --theme-radius; --spacing scales with --theme-spacing-mul so Tailwind utilities shift with density. - hermes_cli/web_server.py: _normalise_theme_definition() parses loose YAML (bare hex strings, partial blocks) into the canonical wire shape; /api/dashboard/themes ships full definitions for user themes. - tests/hermes_cli/test_web_server.py: 16 new tests covering the normaliser and discovery (rejection cases, clamping, defaults). - website/docs/user-guide/features/web-dashboard.md: rewrite Themes section with real schema, per-model tables, full YAML example. |
||
|
|
7ca2f70055 |
fix(docs): Add links to Atropos and wandb in user guide
fix #7724 The user guide has mention of atropos and wandb but no links. This PR adds links so that users dont have to search for them. |
||
|
|
f77da7de42 | Rename _api_call_with_interrupt to _interruptible_api_call | ||
|
|
36adcebe6c | Rename API call function to _interruptible_api_call | ||
|
|
48dc8ef1d1 |
docs(cron): clarify default model/provider setup for scheduled jobs
Added a note about configuring default model and provider before creating cron jobs. |
||
|
|
156b358320 |
docs(cron): explain runtime resolution for null model/provider
Clarify job storage behavior regarding model and provider fields. |
||
|
|
92e4bbc201 |
Update Docker guide with terminal command
Add alternative instructions for opening an interactive Hermes cli chat session in a running Docker container. |
||
|
|
9357db2844 |
docs: fix fallback behavior description — it is per-turn, not per-session
The documentation claimed fallback activates 'at most once per session', but the actual implementation restores the primary model at the start of every run_conversation() call via _restore_primary_runtime(). Relevant source: run_agent.py lines 1666-1694 (snapshot), 6454-6517 (restore), 8681-8684 (called each turn). Updated the One-Shot info box and the summary table to accurately describe the per-turn restoration behavior. |
||
|
|
73533fc728 | docs: add GMI Cloud to compatible providers list | ||
|
|
dcb8c5c67a | docs(contributing): align Node requirement in repo + docs site | ||
|
|
d6d9f10629 | Update Git requirement to include git-lfs extension | ||
|
|
e038677ef6 |
docs: add Exa web search backend setup guide and details
Adds an Exa-specific setup note next to the Parallel search-modes line documenting EXA_API_KEY, category filtering (company, research paper, news, people, personal site, pdf), and domain/date filters. Reapplied onto current main from @10ishq's PR #6697 — the original branch was too far behind main to cherry-pick directly (touched 1,456 unrelated files from deleted/renamed paths). Co-authored-by: 10ishq <tanishq@exa.ai> |
||
|
|
a2a8092e90 |
feat(cli): add --ignore-user-config and --ignore-rules flags
Port from openai/codex#18646. Adds two flags to 'hermes chat' that fully isolate a run from user-level configuration and rules: * --ignore-user-config: skip ~/.hermes/config.yaml and fall back to built-in defaults. Credentials in .env are still loaded so the agent can actually call a provider. * --ignore-rules: skip auto-injection of AGENTS.md, SOUL.md, .cursorrules, and persistent memory (maps to AIAgent(skip_context_files=True, skip_memory=True)). Primary use cases: - Reproducible CI runs that should not pick up developer-local config - Third-party integrations (e.g. Chronicle in Codex) that bring their own config and don't want user preferences leaking in - Bug-report reproduction without the reporter's personal overrides - Debugging: bisect 'was it my config?' vs 'real bug' in one command Both flags are registered on the parent parser AND the 'chat' subparser (with argparse.SUPPRESS on the subparser to avoid overwriting the parent value when the flag is placed before the subcommand, matching the existing --yolo/--worktree/--pass-session-id pattern). Env vars HERMES_IGNORE_USER_CONFIG=1 and HERMES_IGNORE_RULES=1 are set by cmd_chat BEFORE 'from cli import main' runs, which is critical because cli.py evaluates CLI_CONFIG = load_cli_config() at module import time. The cli.py / hermes_cli.config.load_cli_config() function checks the env var and skips ~/.hermes/config.yaml when set. Tests: 11 new tests in tests/hermes_cli/test_ignore_user_config_flags.py covering the env gate, constructor wiring, cmd_chat simulation, and argparse flag registration. All pass; existing hermes_cli + cli suites unaffected (3005 pass, 2 pre-existing unrelated failures). |
||
|
|
b66644f0ec |
feat(hindsight): richer session-scoped retain metadata
- Add configurable retain_tags / retain_source / retain_user_prefix / retain_assistant_prefix knobs for native Hindsight. - Thread gateway session identity (user_name, chat_id, chat_name, chat_type, thread_id) through AIAgent and MemoryManager into MemoryProvider.initialize kwargs so providers can scope and tag retained memories. - Hindsight attaches the new identity fields as retain metadata, merges per-call tool tags with configured default tags, and uses the configurable transcript labels for auto-retained turns. Co-authored-by: Abner <abner.the.foreman@agentmail.to> |
||
|
|
b8663813b6 |
feat(state): auto-prune old sessions + VACUUM state.db at startup (#13861)
* feat(state): auto-prune old sessions + VACUUM state.db at startup state.db accumulates every session, message, and FTS5 index entry forever. A heavy user (gateway + cron) reported 384MB with 982 sessions / 68K messages causing slowdown; manual 'hermes sessions prune --older-than 7' + VACUUM brought it to 43MB. The prune command and VACUUM are not wired to run automatically anywhere — sessions grew unbounded until users noticed. Changes: - hermes_state.py: new state_meta key/value table, vacuum() method, and maybe_auto_prune_and_vacuum() — idempotent via last-run timestamp in state_meta so it only actually executes once per min_interval_hours across all Hermes processes for a given HERMES_HOME. Never raises. - hermes_cli/config.py: new 'sessions:' block in DEFAULT_CONFIG (auto_prune=True, retention_days=90, vacuum_after_prune=True, min_interval_hours=24). Added to _KNOWN_ROOT_KEYS. - cli.py: call maintenance once at HermesCLI init (shared helper _run_state_db_auto_maintenance reads config and delegates to DB). - gateway/run.py: call maintenance once at GatewayRunner init. - Docs: user-guide/sessions.md rewrites 'Automatic Cleanup' section. Why VACUUM matters: SQLite does NOT shrink the file on DELETE — freed pages get reused on next INSERT. Without VACUUM, a delete-heavy DB stays bloated forever. VACUUM only runs when the prune actually removed rows, so tight DBs don't pay the I/O cost. Tests: 10 new tests in tests/test_hermes_state.py covering state_meta, vacuum, idempotency, interval skipping, VACUUM-only-when-needed, corrupt-marker recovery. All 246 existing state/config/gateway tests still pass. Verified E2E with real imports + isolated HERMES_HOME: DEFAULT_CONFIG exposes the new block, load_config() returns it for fresh installs, first call prunes+vacuums, second call within min_interval_hours skips, and the state_meta marker persists across connection close/reopen. * sessions.auto_prune defaults to false (opt-in) Session history powers session_search recall across past conversations, so silently pruning on startup could surprise users. Ship the machinery disabled and let users opt in when they notice state.db is hurting performance. - DEFAULT_CONFIG.sessions.auto_prune: True → False - Call-site fallbacks in cli.py and gateway/run.py match the new default (so unmigrated configs still see off) - Docs: flip 'Enable in config.yaml' framing + tip explains the tradeoff |
||
|
|
3f60a907e1 | docs(wecom): document QR scan-to-create setup flow | ||
|
|
bf73ced4f5 |
docs: document delegation width + depth knobs (#13745)
Fills the three gaps left by the orchestrator/width-depth salvage: - configuration.md §Delegation: max_concurrent_children, max_spawn_depth, orchestrator_enabled are now in the canonical config.yaml reference with a paragraph covering defaults, clamping, role-degradation, and the 3x3x3=27-leaf cost scaling. - environment-variables.md: adds DELEGATION_MAX_CONCURRENT_CHILDREN to the Agent Behavior table. - features/delegation.md: corrects stale 'default 5, cap 8' wording (that was from the original PR; the salvage landed on default 3 with no ceiling and a tool error on excess instead of truncation). |
||
|
|
3e1a3372ab |
docs(delegate): clarify that the parent agent, not the user, populates goal/context (#13698)
The 'subagents know nothing' warning and the 'no conversation history' constraint both said the user provides the goal/context fields. In practice the LLM parent agent calls delegate_task; the user configures the feature but doesn't write delegation calls. Rewording to point at the parent agent matches how the tool actually works. |
||
|
|
48ecb98f8a |
feat(delegate): orchestrator role and configurable spawn depth (default flat)
Adds role='leaf'|'orchestrator' to delegate_task. With max_spawn_depth>=2, an orchestrator child retains the 'delegation' toolset and can spawn its own workers; leaf children cannot delegate further (identical to today). Default posture is flat — max_spawn_depth=1 means a depth-0 parent's children land at the depth-1 floor and orchestrator role silently degrades to leaf. Users opt into nested delegation by raising max_spawn_depth to 2 or 3 in config.yaml. Also threads acp_command/acp_args through the main agent loop's delegate dispatch (previously silently dropped in the schema) via a new _dispatch_delegate_task helper, and adds a DelegateEvent enum with legacy-string back-compat for gateway/ACP/CLI progress consumers. Config (hermes_cli/config.py defaults): delegation.max_concurrent_children: 3 # floor-only, no upper cap delegation.max_spawn_depth: 1 # 1=flat (default), 2-3 unlock nested delegation.orchestrator_enabled: true # global kill switch Salvaged from @pefontana's PR #11215. Overrides vs. the original PR: concurrency stays at 3 (PR bumped to 5 + cap 8 — we keep the floor only, no hard ceiling); max_spawn_depth defaults to 1 (PR defaulted to 2 which silently enabled one level of orchestration for every user). Co-authored-by: pefontana <fontana.pedro93@gmail.com> |
||
|
|
baaf49e9fd |
docs(delegate): remove default_toolsets from example config and docs
Matches the default-config removal in the preceding commit. default_toolsets was documented for users to set but was never actually read at runtime, so showing it in the example config and the delegation user guide was misleading. No deprecation note is added: the key was always a no-op, so users who copied it from the example continue to see no behavior change. Their config.yaml still parses; the key is just silently unused, same as before. Part of Initiative 2 / M0.5. |
||
|
|
5ffae9228b |
feat(image-gen): add GPT Image 2 to FAL catalog (#13677)
Adds OpenAI's new GPT Image 2 model via FAL.ai, selectable through `hermes tools` → Image Generation. SOTA text rendering (including CJK) and world-aware photorealism. - FAL_MODELS entry with image_size_preset style - 4:3 presets on all aspect ratios — 16:9 (1024x576) falls below GPT-Image-2's 655,360 min-pixel floor and would be rejected - quality pinned to medium (same rule as gpt-image-1.5) for predictable Nous Portal billing - BYOK (openai_api_key) deliberately omitted from supports so all users stay on shared FAL billing - 6 new tests covering preset mapping, quality pinning, and supports-whitelist integrity - Docs table + aspect-ratio map updated Live-tested end-to-end: 39.9s cold request, clean 1024x768 PNG |
||
|
|
9556fef5a1 |
fix(tui): improve macOS paste and shortcut parity
- support Cmd-as-super and readline-style fallback shortcuts on macOS - add layered clipboard/OSC52 paste handling and immediate image-path attach - add IDE terminal setup helpers, terminal parity hints, and aligned docs |
||
|
|
e0dc0a88d3 |
chore: attribution + catalog rows for adversarial-ux-test
- AUTHOR_MAP: omni@comelse.com -> omnissiah-comelse - skills-catalog.md: add adversarial-ux-test row under dogfood - optional-skills-catalog.md: add new Dogfood section |
||
|
|
2d7ff9c5bd |
feat(tts): complete KittenTTS integration (tools/setup/docs/tests)
Builds on @AxDSan's PR #2109 to finish the KittenTTS wiring so the provider behaves like every other TTS backend end to end. - tools/tts_tool.py: `_check_kittentts_available()` helper and wire into `check_tts_requirements()`; extend Opus-conversion list to include kittentts (WAV → Opus for Telegram voice bubbles); point the missing-package error at `hermes setup tts`. - hermes_cli/tools_config.py: add KittenTTS entry to the "Text-to-Speech" toolset picker, with a `kittentts` post_setup hook that auto-installs the wheel + soundfile via pip. - hermes_cli/setup.py: `_install_kittentts_deps()`, new choice + install flow in `_setup_tts_provider()`, provider_labels entry, and status row in the `hermes setup` summary. - website/docs/user-guide/features/tts.md: add KittenTTS to the provider table, config example, ffmpeg note, and the zero-config voice-bubble tip. - tests/tools/test_tts_kittentts.py: 10 unit tests covering generation, model caching, config passthrough, ffmpeg conversion, availability detection, and the missing-package dispatcher branch. E2E verified against the real `kittentts` wheel: - WAV direct output (pcm_s16le, 24kHz mono) - MP3 conversion via ffmpeg (from WAV) - Telegram flow (provider in Opus-conversion list) produces `codec_name=opus`, 48kHz mono, `voice_compatible=True`, and the `[[audio_as_voice]]` marker - check_tts_requirements() returns True when kittentts is installed |
||
|
|
328223576b |
feat(skills+terminal): make bundled skill scripts runnable out of the box (#13384)
* feat(skills): inject absolute skill dir and expand ${HERMES_SKILL_DIR} templates
When a skill loads, the activation message now exposes the absolute
skill directory and substitutes ${HERMES_SKILL_DIR} /
${HERMES_SESSION_ID} tokens in the SKILL.md body, so skills with
bundled scripts can instruct the agent to run them by absolute path
without an extra skill_view round-trip.
Also adds opt-in inline-shell expansion: !`cmd` snippets in SKILL.md
are pre-executed (with the skill directory as CWD) and their stdout is
inlined into the message before the agent reads it. Off by default —
enable via skills.inline_shell in config.yaml — because any snippet
runs on the host without approval.
Changes:
- agent/skill_commands.py: template substitution, inline-shell
expansion, absolute skill-dir header, supporting-files list now
shows both relative and absolute forms.
- hermes_cli/config.py: new skills.template_vars,
skills.inline_shell, skills.inline_shell_timeout knobs.
- tests/agent/test_skill_commands.py: coverage for header, both
template tokens (present and missing session id), template_vars
disable, inline-shell default-off, enabled, CWD, and timeout.
- website/docs/developer-guide/creating-skills.md: documents the
template tokens, the absolute-path header, and the opt-in inline
shell with its security caveat.
Validation: tests/agent/ 1591 passed (includes 9 new tests).
E2E: loaded a real skill in an isolated HERMES_HOME; confirmed
${HERMES_SKILL_DIR} resolves to the absolute path, ${HERMES_SESSION_ID}
resolves to the passed task_id, !`date` runs when opt-in is set, and
stays literal when it isn't.
* feat(terminal): source ~/.bashrc (and user-listed init files) into session snapshot
bash login shells don't source ~/.bashrc, so tools that install themselves
there — nvm, asdf, pyenv, cargo, custom PATH exports — stay invisible to
the environment snapshot Hermes builds once per session. Under systemd
or any context with a minimal parent env, that surfaces as
'node: command not found' in the terminal tool even though the binary
is reachable from every interactive shell on the machine.
Changes:
- tools/environments/local.py: before the login-shell snapshot bootstrap
runs, prepend guarded 'source <file>' lines for each resolved init
file. Missing files are skipped, each source is wrapped with a
'[ -r ... ] && . ... || true' guard so a broken rc can't abort the
bootstrap.
- hermes_cli/config.py: new terminal.shell_init_files (explicit list,
supports ~ and ${VAR}) and terminal.auto_source_bashrc (default on)
knobs. When shell_init_files is set it takes precedence; when it's
empty and auto_source_bashrc is on, ~/.bashrc gets auto-sourced.
- tests/tools/test_local_shell_init.py: 10 tests covering the resolver
(auto-bashrc, missing file, explicit override, ~/${VAR} expansion,
opt-out) and the prelude builder (quoting, guarded sourcing), plus
a real-LocalEnvironment snapshot test that confirms exports in the
init file land in subsequent commands' environment.
- website/docs/reference/faq.md: documents the fix in Troubleshooting,
including the zsh-user pattern of sourcing ~/.zshrc or nvm.sh
directly via shell_init_files.
Validation: 10/10 new tests pass; tests/tools/test_local_*.py 40/40
pass; tests/agent/ 1591/1591 pass; tests/hermes_cli/test_config.py
50/50 pass. E2E in an isolated HERMES_HOME: confirmed that a fake
~/.bashrc setting a marker var and PATH addition shows up in a real
LocalEnvironment().execute() call, that auto_source_bashrc=false
suppresses it, that an explicit shell_init_files entry wins over the
auto default, and that a missing bashrc is silently skipped.
|
||
|
|
b48ea41d27 | feat(voice): add cli beep toggle | ||
|
|
3988c3c245 |
feat: shell hooks — wire shell scripts as Hermes hook callbacks
Users can declare shell scripts in config.yaml under a hooks: block that fire on plugin-hook events (pre_tool_call, post_tool_call, pre_llm_call, subagent_stop, etc). Scripts receive JSON on stdin, can return JSON on stdout to block tool calls or inject context pre-LLM. Key design: - Registers closures on existing PluginManager._hooks dict — zero changes to invoke_hook() call sites - subprocess.run(shell=False) via shlex.split — no shell injection - First-use consent per (event, command) pair, persisted to allowlist JSON - Bypass via --accept-hooks, HERMES_ACCEPT_HOOKS=1, or hooks_auto_accept - hermes hooks list/test/revoke/doctor CLI subcommands - Adds subagent_stop hook event fired after delegate_task children exit - Claude Code compatible response shapes accepted Cherry-picked from PR #13143 by @pefontana. |
||
|
|
b65f6ca7fe |
fix(telegram): actionable error for DM topics when Topics mode not enabled (#13162)
When createForumTopic fails with 'not a forum' in a private chat, the error now tells the user exactly what to do: enable Topics in the DM chat settings from the Telegram app. Also adds a Prerequisites callout to the docs explaining this client-side requirement before the config section. |