Compare commits

152 Commits

Author SHA1 Message Date
kshitijk4poor d8c2c77be6 feat(plugins): add optional-plugins/ discovery + langfuse_tracing as first official optional plugin
Introduces optional-plugins/ — a new category for plugins that ship with
the repo but are NOT auto-discovered. They live alongside the code but only
land in ~/.hermes/plugins/ (and thus get loaded) when the user explicitly
installs them.

Core changes:
- optional-plugins/observability/langfuse-tracing/ — langfuse tracing plugin
  (pre/post LLM + tool hooks, usage/cost normalization, fail-open when SDK
  missing). NOT in plugins/ so zero import overhead on devices that don't
  want it.
- hermes_cli/plugins_cmd.py — official install path: _resolve_official_plugin()
  recognises 'official/<category>/<name>' identifiers and copies from
  optional-plugins/ into ~/.hermes/plugins/ (no git clone, no network).
  _list_official_plugins() enumerates available optional plugins.
  cmd_list(available=True) shows not-yet-installed official plugins.
- hermes_cli/main.py — hermes plugins list --available flag
- hermes_cli/tools_config.py — Langfuse Observability in TOOL_CATEGORIES;
  post_setup handler installs the langfuse SDK and runs cmd_install()
- hermes_cli/config.py — Langfuse credentials in OPTIONAL_ENV_VARS;
  optional tuning keys in _EXTRA_ENV_KEYS

User flows:
  hermes plugins install official/observability/langfuse-tracing
  hermes plugins list --available
  hermes tools  (-> Langfuse Observability -> credentials -> auto-installs)

Closes #15764
2026-04-28 11:52:42 +05:30
Teknium 8081425a1c feat(security): make secret redaction off by default (#16794)
Flips security.redact_secrets from true to false in DEFAULT_CONFIG, and
the HERMES_REDACT_SECRETS env-var fallback in agent/redact.py now
requires explicit opt-in ("1"/"true"/"yes"/"on") to enable.
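
The opt-in check described above could look roughly like the sketch below. The helper name `redaction_enabled` is hypothetical, and the case-insensitive, whitespace-tolerant matching is an assumption; only the env-var name and the four truthy strings come from the commit message.

```python
import os

# Truthy strings listed in the commit message.
_TRUTHY = {"1", "true", "yes", "on"}


def redaction_enabled(env=None):
    """Return True only when HERMES_REDACT_SECRETS is explicitly truthy.

    Hypothetical helper: an unset, empty, or unrecognised value means
    redaction stays off (the new default).
    """
    env = os.environ if env is None else env
    raw = env.get("HERMES_REDACT_SECRETS", "")
    # Lower-casing and stripping are assumptions, not confirmed behaviour.
    return raw.strip().lower() in _TRUTHY
```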

New installs and users without a security.redact_secrets key get pass-
through tool output. Existing users whose config.yaml explicitly sets
redact_secrets: true keep redaction on — the config-yaml -> env-var
bridges in hermes_cli/main.py and gateway/run.py still honor their
setting.

Also updates the inline config comments, website docs, and the
hermes-agent skill so /hermes config set security.redact_secrets true
is now the documented way to turn it on.
2026-04-27 21:24:08 -07:00
Teknium ec8243fe2a chore(release): map matrix-parity-batch contributor emails to GitHub logins 2026-04-27 21:22:44 -07:00
Teknium 3d67364b8f test(matrix): set user_id in approval-reaction test to bypass defensive self-drop
MatrixAdapter._is_self_sender returns True defensively when _user_id is empty
(whoami not yet resolved) to prevent echo loops — see #15763. The reaction
approval test must therefore initialize a user_id so _on_reaction does not
drop the inbound test event before reaching the approval handler.
2026-04-27 21:22:44 -07:00
nbot 38a6bada92 feat(matrix): reaction-based exec approval + mention_user_id
Add Matrix reaction-based exec approval and mention_user_id
support for push notifications in muted rooms.

- matrix.py: _MatrixApprovalPrompt, send_exec_approval, reaction
  approval handling, bot seed reaction redaction, mention pill in send
- base.py: inject mention_user_id into send metadata
- run.py: inject mention_user_id into status thread metadata
- tests for approval prompt registration and reaction resolution
2026-04-27 21:22:44 -07:00
Andrew Miller 6c70ac8eef matrix: e2e test for cross-signing auto-bootstrap
Self-contained docker-compose harness that exercises the new bootstrap
branch against a real Continuwuity homeserver. Three tests:

  1. fresh bot → bootstrap fires, /keys/query returns master + ssk
     with UNPADDED base64 keyids, current device is signed by the
     new SSK
  2. second startup with same crypto store → bootstrap is skipped
  3. MATRIX_RECOVERY_KEY set → existing verify_with_recovery_key path
     takes precedence, no new bootstrap

Run via:

    docker compose -f tests/e2e/matrix_xsign_bootstrap/docker-compose.yml up -d
    python tests/e2e/matrix_xsign_bootstrap/test_bootstrap.py
    docker compose -f tests/e2e/matrix_xsign_bootstrap/docker-compose.yml down -v

The test mirrors the bootstrap snippet from matrix.py inline so it can
run without importing the full hermes gateway and its deps. Skipped
automatically when mautrix isn't installed or the homeserver is
unreachable.

All three pass against ghcr.io/continuwuity/continuwuity:latest
(Continuwuity 0.5.7). The unpadded-keyid assertion is the load-bearing
one — it's exactly the property the PR's bootstrap path provides that
the hand-rolled `base64.b64encode().decode()` scripts get wrong.
2026-04-27 21:22:44 -07:00
Andrew Miller d497387cec matrix: auto-bootstrap cross-signing on first startup
Without this, every Matrix bot started under hermes-agent shows the
"Encrypted by a device not verified by its owner" badge in Element
indefinitely, because the cross-signing chain (master → SSK → device)
was never published. Operators currently have to write their own
bootstrap script and remember to run it once per bot — and it's easy
to get wrong (the obvious base64.b64encode().decode() produces padded
keyids that matrix-rust-sdk silently rejects in /keys/query, so even
correctly-signed keys fail to load identity in Element).
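
To illustrate the padding pitfall mentioned above, here is a small sketch. The 32-byte value is a stand-in for an ed25519 public key, and `unpadded_b64` is a hypothetical helper, not the mautrix API.

```python
import base64


def unpadded_b64(raw: bytes) -> str:
    """Base64-encode and drop the trailing '=' padding."""
    return base64.b64encode(raw).decode("ascii").rstrip("=")


# Stand-in for a 32-byte ed25519 public key (illustrative bytes only).
public_key = bytes(range(32))

padded = base64.b64encode(public_key).decode("ascii")  # ends with '='
unpadded = unpadded_b64(public_key)                    # no padding

# Key IDs are built from the unpadded form.
key_id = f"ed25519:{unpadded}"
```

A 32-byte key always encodes to 44 characters with one `=` of padding, so the naive call yields a padded keyid every time.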

mautrix already has the right primitive: generate_recovery_key() does
the full flow — generate seeds, upload privates to SSSS, publish
publics to the homeserver, sign the current device with the new SSK,
and return the human-readable recovery key. We invoke it once on
startup if the bot has no existing cross-signing identity, and log
the recovery key with a clear instruction to save it for future
restarts via MATRIX_RECOVERY_KEY (which the existing recovery-key
path already consumes).

Skipped when MATRIX_RECOVERY_KEY is set (existing path takes over)
or when the bot already has cross-signing keys on the homeserver
(get_own_cross_signing_public_keys returns non-None).

Bootstrap failure is non-fatal: it is logged with a hint about UIA; the bot
continues without cross-signing and Element will show the warning
that prompted this PR. That matches the existing soft-fail pattern
for verify_with_recovery_key.

Tested against Continuwuity 0.5.7 (no UIA required). Synapse with
UIA enabled will need a follow-up PR to thread MATRIX_PASSWORD
through to /keys/device_signing/upload.
2026-04-27 21:22:44 -07:00
konsisumer 32d4048c6b fix: MatrixAdapter respects proxy configuration 2026-04-27 21:22:44 -07:00
Adam Rummer 1eab5960f0 feat(matrix): add dm_auto_thread config for DM auto-threading
Adds MATRIX_DM_AUTO_THREAD env var (default: false) to control
auto-threading in DM rooms independently from channel auto-threading.

Closes #15398
2026-04-27 21:22:44 -07:00
LeonSGP43 74a4832b74 fix(matrix): normalize image-only filenames 2026-04-27 21:22:44 -07:00
Alexazhu fbbcfa24c5 fix(matrix): preserve exception tracebacks on E2EE and auth failures
Five ``except Exception as exc:`` blocks in the Matrix adapter logged
only ``str(exc)`` without ``exc_info=True``:

- _reverify_keys_after_upload → post-upload key verification failure
- _upload_keys_if_needed      → initial device-key query failure
- _upload_keys_if_needed      → re-upload device keys failure
- _upload_keys_if_needed      → initial device key upload failure
- connect → whoami / access-token validation failure

The E2EE key paths here are security-critical: a silent traceback-
less failure during device-key verification or upload makes it
hard for operators to tell whether their Matrix bot is failing
because of a stale token, a federation timeout, or an olm state
mismatch — all three fail with different tracebacks, which
``str(exc)`` alone flattens.

The contributing guide asks for ``exc_info=True`` on error logs.
Append it to each of the five call sites. Pure logging enrichment.
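
A minimal sketch of the difference `exc_info=True` makes, using only the standard `logging` module. The logger name, the message, and the `risky` function are illustrative and are not taken from the Matrix adapter.

```python
import io
import logging

# Capture log output in memory so the effect is easy to inspect.
stream = io.StringIO()
logger = logging.getLogger("matrix_example")
logger.setLevel(logging.ERROR)
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False


def risky():
    raise RuntimeError("olm state mismatch")


try:
    risky()
except Exception as exc:
    # Without exc_info=True only str(exc) would be recorded.
    logger.error("Key upload failed: %s", exc, exc_info=True)

output = stream.getvalue()
```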
2026-04-27 21:22:44 -07:00
Heathley f223346eb7 fix(matrix): add sync timeout, callback diagnostics, and mention-drop logging
- Wrap _sync_loop sync() call with asyncio.wait_for(timeout=45s) to guard
  against TCP-level hangs that the Matrix long-poll timeout cannot catch
- Add logger.debug at the top of _on_room_message so LOG_LEVEL=DEBUG
  confirms whether callbacks fire at all (diagnoses #5819, #7914, #12614)
- Add logger.debug when MATRIX_REQUIRE_MENTION silently drops a message,
  pointing users to the env var to disable the filter

Adapted for current mautrix-python adapter (PR was written against the
legacy matrix-nio adapter).

Closes #5819
2026-04-27 21:22:44 -07:00
Charles Brooks 57f8cf00e9 fix(matrix): reconcile pending invites from sync state 2026-04-27 21:22:44 -07:00
Teknium 6649e7e746 test(matrix): adapt outbound-mention notice test to current _send_simple_message API 2026-04-27 21:22:44 -07:00
Angel Claw 32b78578e0 fix(matrix): strip only explicit @mentions in _strip_mention 2026-04-27 21:22:44 -07:00
Sami Rusani 6769a0aece fix(matrix): add outbound mention payloads 2026-04-27 21:22:44 -07:00
Teknium d7528d43ac fix(web): scope dashboard config Reset button to the current tab (#16813)
* Port from Kilo-Org/kilocode#9448: roll up subagent costs into parent session total

Child subagents built by delegate_task() each track their own
session_estimated_cost_usd, but the parent agent's total never folded
those numbers in.  On runs where the parent mostly delegates and the
children do the expensive work, the footer/UI was reporting a fraction
of the actual spend — sometimes $0.00 when the parent itself made no
billed calls.

Fix:
- Capture each child's session_estimated_cost_usd into _child_cost_usd
  on the result entry (before child.close() drops the counter).
- After the existing subagent_stop hook loop, sum the children's costs
  and add the total to parent.session_estimated_cost_usd.
- Promote session_cost_source from 'none' -> 'subagent' when the parent
  had no direct spend but children did, so the UI doesn't label the
  total as having unknown provenance.  Real sources (openrouter,
  anthropic, etc.) are preserved.

Nested orchestrator -> worker trees roll up naturally: each layer's own
delegate_task() folds its direct children in, and when the orchestrator
itself returns, its parent folds the orchestrator's now-inflated total
on top.

Internal fields (_child_cost_usd, _child_role) are stripped from the
results dict before it's serialised back to the model — same contract
as _child_role already followed.
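
The rollup described above can be sketched as follows. Plain dicts stand in for the parent agent and the result entries, and `roll_up_child_costs` is a hypothetical name; only the field names come from the commit message.

```python
def roll_up_child_costs(parent, results):
    """Fold children's costs into the parent total (illustrative sketch)."""
    child_total = sum(r.get("_child_cost_usd") or 0.0 for r in results)
    if child_total > 0:
        parent["session_estimated_cost_usd"] += child_total
        # Promote only the 'none' source; real sources are preserved.
        if parent.get("session_cost_source", "none") == "none":
            parent["session_cost_source"] = "subagent"
    # Strip internal fields before results are serialised for the model.
    for r in results:
        r.pop("_child_cost_usd", None)
        r.pop("_child_role", None)
    return child_total


parent = {"session_estimated_cost_usd": 0.0, "session_cost_source": "none"}
results = [
    {"summary": "a", "_child_cost_usd": 0.25, "_child_role": "worker"},
    {"summary": "b", "_child_cost_usd": 0.5},
]
total = roll_up_child_costs(parent, results)
```

Because each layer adds its children's totals to its own counter before returning, a nested tree rolls up without any extra bookkeeping.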

Tests: TestSubagentCostRollup (5 cases) covers single-child, batch,
zero-cost-children, preserved-source, and legacy-fixture paths.

Source: https://github.com/Kilo-Org/kilocode/pull/9448

* fix(web): scope dashboard config Reset button to the current tab

Reported by @ykmfb001 via X: clicking 'Restore Defaults' (恢复默认值) on
the Auxiliary page wiped the entire config.yaml to defaults, not just
the auxiliary section. The button sits next to the category tabs and
users reasonably assumed 'reset this tab', not 'reset everything'.

Changes:
- handleReset now scopes to the fields in the current view:
  active category's fields (form mode) or search-matched fields
  (search mode). Only those keys are copied from defaults; the rest
  of the config is left alone.
- Added a window.confirm() with the scope name before applying.
- Button is hidden in YAML mode (scoping doesn't apply there).
- Tooltip/aria-label now name the scope, e.g. 'Reset Auxiliary to
  defaults'.
- i18n: new resetScopeTooltip / confirmResetScope / resetScopeToast
  strings in en + zh; resetDefaults key preserved for compat.
2026-04-27 21:09:14 -07:00
Teknium a7cdd4133c fix(bedrock): send context-1m-2025-08-07 beta so Opus 4.6/4.7 get 1M context (#16793)
On AWS Bedrock (and Azure AI Foundry), Claude Opus 4.6/4.7 and Sonnet 4.6
are capped at 200K context unless the request carries the
`context-1m-2025-08-07` beta header. On native Anthropic (api.anthropic.com)
1M went GA so the header is a harmless no-op, but Bedrock/Azure still gate
it as beta as of 2026-04.

Hermes was advertising 1M in model_metadata.py (`claude-opus-4-7: 1000000`)
while silently sending a request without the beta — so Bedrock users saw
a 200K ceiling with no error message, and no config knob unblocked it.
Claude Code sends this header by default, which is why the same Bedrock
credentials worked there.

- Add `context-1m-2025-08-07` to `_COMMON_BETAS` (alongside interleaved
  thinking and fine-grained tool streaming).
- Strip it in `_common_betas_for_base_url` for MiniMax bearer-auth
  endpoints — they host their own models, not Claude, so Anthropic beta
  headers are irrelevant and could risk rejection.
- Attach `_COMMON_BETAS` as `default_headers` on the AnthropicBedrock
  client. Previously that constructor passed no betas at all, so native
  Anthropic had the 1M unlock via default_headers but Bedrock didn't.
- Fast-mode per-request `extra_headers` already rebuilds from
  `_common_betas_for_base_url`, so it picks up the 1M beta automatically.

Reported by user 'Rodmar' on Discord: Bedrock Opus 4.7 stuck at 200K while
same credentials worked in Claude Code.
2026-04-27 20:41:36 -07:00
kshitijk4poor 461ef88705 fix(state): declarative column reconciliation for stuck-at-old-v7 DBs
Anyone who ran hermes between Apr 15 (42aeb4ec) and Apr 22 (a7d78d3b)
has schema_version=7 from the pre-renumber api_call_count migration.
When a7d78d3b inserted reasoning_content as the new v7 and pushed
api_call_count to v8, the 'if current_version < 7' gate was already
false for those users, so reasoning_content was never created —
sqlite3.OperationalError: no such column: reasoning_content on any
/continue or /resume touching assistant replays.

Replaces the version-gated ADD COLUMN chain with _reconcile_columns():
on every startup, parse SCHEMA_SQL via an in-memory SQLite and diff
against PRAGMA table_info; ALTER TABLE ADD COLUMN for anything missing.
Follows the Beets / sqlite-utils pattern — SCHEMA_SQL becomes the single
source of truth for declared columns. Self-healing and idempotent.

v10 trigram FTS backfill is retained in a version-gated block — that
migration isn't a column add, it inserts existing message rows into
the new FTS virtual table, so reconciliation can't express it.
schema_version is also kept for future row-data migrations.

Salvaged from #14097 (@kshitijk4poor) onto current main; v10 trigram
preservation and the v9 codex_message_items column (missed by the
original branch because it was stale) are covered automatically by reconciliation.

Tests:
- Regression: DB at old v7 with api_call_count but no reasoning_content
  gets the column on open
- Idempotency: reopening the same DB is a no-op
- Structural invariant: every SCHEMA_SQL column is in the live DB
- Existing v2 migration test still passes
- E2E verified against fresh / v1 / old-v7 / v9 DBs, plus v10 trigram
  backfill preserved
2026-04-27 20:29:32 -07:00
Teknium 12d745bd7e feat(skills): port humanizer — strip AI-isms from text (#16787)
Port https://github.com/blader/humanizer (MIT, v2.5.1, 16k stars) into
the built-in skills under skills/creative/humanizer/. Based on Wikipedia's
'Signs of AI writing' guide (WikiProject AI Cleanup) — detects 29 AI-writing
patterns and rewrites them to sound human.

Hermes-native adaptations:
- Description (<60 chars) explains what it's for: 'Humanize text: strip
  AI-isms and add real voice.'
- 'When to use this skill' section — trigger phrases (humanize, de-AI,
  de-slop, un-ChatGPT, rewrite to not sound like an LLM) plus guidance to
  apply it to the agent's own output (release notes, PR descriptions, docs).
- 'How to use it in Hermes' — maps the three real input paths (inline,
  file via read_file/patch/write_file, voice-calibration sample) onto the
  tools the agent actually has. Drops Claude Code's allowed-tools block.
- Converted frontmatter to Hermes format (metadata.hermes.tags, category,
  homepage, related_skills).

Attribution preserved:
- Original author Siqi Chen (@blader) credited in frontmatter and body.
- Full MIT LICENSE copied verbatim alongside SKILL.md.
- Wikipedia / WikiProject AI Cleanup credited.
- 29 patterns, personality/soul section, and full worked example kept
  verbatim from the source (29,914 chars).

Validated end-to-end against a clean HERMES_HOME:
- sync_skills() copies skills/creative/humanizer/ including LICENSE.
- skills_list(category='creative') returns the 48-char description.
- skill_view(name='humanizer') returns the full body with all 29 patterns,
  personality/soul, attribution, and Hermes tool refs (read_file, patch,
  write_file) intact.
2026-04-27 20:25:20 -07:00
Teknium 30307a9802 feat(plugins): add pre_approval_request / post_approval_response hooks (#16776)
Plugins can now observe dangerous-command approval events in real time,
on both the CLI-interactive path and the async gateway path. This is the
missing hook surface external tools need to build approval notifiers
(macOS menu-bar allow/deny, Slack alerts, audit logs, etc.) without
forking Hermes or running a parallel gateway adapter.

Changes:
- hermes_cli/plugins.py: add two entries to VALID_HOOKS
- tools/approval.py: fire both hooks from check_all_command_guards --
  around prompt_dangerous_approval (CLI surface) and around the
  notify_cb + blocking event.wait loop (gateway surface)
- website/docs/user-guide/features/hooks.md: document both hooks with
  a macOS-notification example
- tests/tools/test_approval_plugin_hooks.py: 5 tests covering CLI once,
  CLI deny, plugin-crash resilience, gateway approve, gateway timeout

Hooks are observer-only: return values are ignored, so plugins cannot
veto or pre-answer an approval (use pre_tool_call for that). A crashing
plugin cannot break the approval flow -- invoke_hook swallows per-
callback errors, and the wrapper logs and swallows dispatch-layer
errors too.
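
The observer-only contract above might be sketched like this. `invoke_hook_sketch` is a hypothetical stand-in for the real dispatcher, and the `command` keyword is an assumption; `surface` and `choice` are the kwargs named in this commit.

```python
import logging

logger = logging.getLogger("hooks_example")


def invoke_hook_sketch(callbacks, **kwargs):
    """Call every observer; ignore return values and swallow errors."""
    for cb in callbacks:
        try:
            cb(**kwargs)  # return value deliberately discarded
        except Exception:
            # One crashing plugin must not break the approval flow.
            logger.warning("hook callback %r failed", cb, exc_info=True)


seen = []


def crashing_plugin(**kwargs):
    raise RuntimeError("plugin bug")


def audit_plugin(**kwargs):
    seen.append((kwargs["surface"], kwargs["choice"]))
    return "deny"  # ignored: observers cannot veto


invoke_hook_sketch(
    [crashing_plugin, audit_plugin],
    surface="cli", choice="once", command="rm -rf build/",
)
```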

Surface kwarg distinguishes "cli" from "gateway"; post hook reports
choice as one of once/session/always/deny/timeout.
2026-04-27 20:08:33 -07:00
Teknium 6ea5699e3f fix(compression): notify users when configured aux model fails even if main-model fallback recovers (#16775)
A misconfigured auxiliary.compression.model is a user-fixable problem that silent recovery would hide. The previous retry-on-main logic transparently swallowed aux-model failures whenever the fallback succeeded, leaving the user's broken config in place and racking up future failures.

Track the aux-model failure on the compressor alongside the existing fallback-placeholder fields:
- _last_aux_model_failure_model: str | None
- _last_aux_model_failure_error: str | None

Both are set at the moment the aux model errors (captured before summary_model is cleared for retry), regardless of whether the retry succeeds. Cleared at compress() start and on on_session_reset() so a clean run doesn't leak stale warnings.

Surface at three places:
- gateway hygiene auto-compress: ℹ note to the platform adapter (thread_id preserved)
- gateway /compress command: ℹ line appended to the reply
- CLI via _emit_warning: deduped on (model, error) so repeat compactions don't spam

Distinct from the existing ⚠️ dropped-turns warning — different severity, different emoji, explicit 'context is intact' reassurance.
2026-04-27 20:08:23 -07:00
SHL0MS c3e3a9c184 feat(skills): add Tier A references — external-data, panel-ui, replicator, dat-scripting, 3d-scene
Five additional reference docs covering common TD use cases that were not yet
documented in any reference (operators.md lists the ops, but no usage patterns).

- external-data.md: webDAT, webclientDAT, webserverDAT, websocketDAT,
  mqttClientDAT, serialDAT, tcpipDAT — auth, polling, push, JSON parsing
- panel-ui.md: custom parameter pages, button/slider/field/list COMPs,
  containerCOMP layouts, panelExecuteDAT callbacks
- replicator.md: replicatorCOMP for data-driven cloning, per-row overrides,
  recreatemissing pattern, replicator vs Python loop
- dat-scripting.md: full Execute DAT family — chopExecuteDAT, datExecuteDAT,
  parameterExecuteDAT, panelExecuteDAT, opExecuteDAT, executeDAT lifecycle
- 3d-scene.md: light types, three-point rigs, shadows, IBL/cubemaps,
  PBR materials with idiom table, multi-camera, DOF

Same conventions as existing refs: code-first, verify param names with
td_get_par_info, no token-budget impact (load on demand).
2026-04-27 19:35:18 -07:00
SHL0MS 02df438316 feat(skills): expand touchdesigner-mcp with animation, MIDI/OSC, particles, projection refs
Adds four new reference docs covering common TD use cases not previously
documented in the skill:

- animation.md: LFOs, timers, keyframes, easing, time references
- midi-osc.md: MIDI controllers, OSC routing, TouchOSC, multi-machine sync
- particles.md: POPs and particleSOP — emission, forces, collisions, render
- projection-mapping.md: windowCOMP, corner pin, mesh warp, edge blending

Also clarifies the SKILL.md tool quick reference: adds td_screen_point_to_global
and notes that 4 admin/dev-mode tools (td_project_quit, td_test_session,
td_dev_log, td_clear_dev_log) live only in mcp-tools.md to keep the main
reference focused on creative workflows.

No SKILL.md workflow or critical-rules changes. References load on demand
so no token-budget impact at session start.
2026-04-27 19:35:18 -07:00
Teknium 94b26f3ec9 fix(compression): retry summary on main model for unknown errors before giving up (#16774)
The existing retry-on-main path in _generate_summary only fires for errors that match the _is_model_not_found heuristic (404/503, 'model_not_found', 'does not exist', 'no available channel'). Other misconfiguration errors — 400s from aggregators, provider-specific 'no route' strings, opaque rejections — fall straight through to the transient-cooldown branch, which drops N turns of context and inserts a static placeholder.

Losing context is almost always worse than one extra summary attempt. Add a best-effort retry-on-main for the unknown-error branch, guarded by the same invariants as the existing fast-path retry: only when summary_model differs from main, and only once per compressor (_summary_model_fallen_back).
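
The two guards on the retry can be sketched as below. `generate_summary` and the `state` dict are hypothetical stand-ins for `_generate_summary` and the `_summary_model_fallen_back` flag; the error classification that precedes this branch is omitted.

```python
def generate_summary(call_model, summary_model, main_model, state):
    """Try the aux model, then fall back to the main model at most once."""
    try:
        return call_model(summary_model)
    except Exception:
        can_retry = (
            summary_model != main_model          # same model: retry is pointless
            and not state.get("fallen_back")     # only once per compressor
        )
        if not can_retry:
            raise
        state["fallen_back"] = True
        return call_model(main_model)


calls = []


def fake_call(model):
    calls.append(model)
    if model == "aux":
        raise ValueError("400 bad request")
    return f"summary from {model}"


result = generate_summary(fake_call, "aux", "main", {})
```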

Tests cover: 404 fast-path fallback still works, unknown 400 now falls back, same-model aux skips retry (no infinite loop), and a double-failure (aux + main) stops at 2 calls.
2026-04-27 19:25:57 -07:00
iamagenius00 f2fcc087f7 test(gateway): cover /compress summary-failure warning path
PR #16333 added a warning to the manual /compress reply when the
auxiliary summariser fails and the static fallback placeholder is
used, but only the gateway-hygiene path had a test
(test_session_hygiene_warns_user_when_summary_generation_fails).
The /compress branch in _handle_compress_command was uncovered.

New test test_compress_command_appends_warning_when_summary_generation_fails
mocks the compressor's _last_summary_fallback_used /
_last_summary_dropped_count / _last_summary_error fields and
verifies the /compress reply contains the ⚠️ marker, the underlying
error string, the dropped message count, and the 'historical
message(s) were removed' wording — i.e. the same contract the
hygiene-path test enforces.
2026-04-27 19:18:13 -07:00
iamagenius00 e7f2204a07 fix(compression): reset _last_summary_error at start of compress()
The per-call reset block at the top of compress() cleared
_last_summary_dropped_count and _last_summary_fallback_used but
not _last_summary_error. Functionally this didn't break the
gateway warning path (callers gate on _last_summary_fallback_used
first, and _last_summary_error is overwritten on the next failure),
but it left the three tracking fields inconsistent — anyone
reading _last_summary_error standalone after a successful compress
would see a stale value from a previous failed compress.

Reset all three together so the per-call contract is uniform.
2026-04-27 19:18:13 -07:00
iamagenius00 5c56805a74 fix(compression): align fallback placeholder wording with gateway warning
The fallback placeholder said "N conversation turns were removed" while the
gateway warning said "N historical message(s) were removed". Use "messages"
in both so users don't wonder if the two counters refer to different things.
2026-04-27 19:18:13 -07:00
iamagenius00 c61bc3f72c fix(compression): pass thread_id metadata + add gateway test for warning delivery
Address review feedback on PR #16333:

1. The hygiene-path warning send was missing metadata=_hyg_meta. On
   Telegram topics / Slack threads / Discord threads the warning would
   land in the main channel instead of the originating thread. Now
   reuses the same _hyg_meta dict already computed for the hygiene
   compaction itself.

2. New gateway-level test
   test_session_hygiene_warns_user_when_summary_generation_fails
   verifies end-to-end:
   - When the compressor's _last_summary_fallback_used flag is True,
     the gateway invokes adapter.send() exactly once.
   - The warning message includes the dropped count and the underlying
     error string.
   - metadata={'thread_id': ...} is propagated so the warning lands
     in the originating topic/thread.

Tests: 20 gateway hygiene + 54 context_compressor — all pass.
2026-04-27 19:18:13 -07:00
iamagenius00 dfdc4276e8 fix(compression): notify gateway users when summary generation fails
When auxiliary compression's summary LLM call fails (e.g. model 404,
auxiliary model misconfigured), the compressor still drops the selected
turns and inserts a static fallback placeholder — the dropped context
is unrecoverable.

Previously the only signal of this was a WARNING in agent.log. Gateway
users (Telegram/Discord/etc.) had no way to know context was lost
because the existing _emit_warning path requires a status_callback,
and the gateway hygiene path uses a temporary _hyg_agent with
quiet_mode=True and no callback wired up.

Changes:
- ContextCompressor: track _last_summary_fallback_used and
  _last_summary_dropped_count on each compress() call. Cleared at the
  start of compress() and on session reset.
- gateway/run.py hygiene: after auto-compress, inspect the temp
  agent's compressor; if fallback was used, send a visible ⚠️ warning
  to the user via the platform adapter (TG/Discord/etc.) including
  dropped count and the underlying error.
- gateway/run.py /compress: append the same warning to the manual
  compress reply so users running /compress see the failure too.

Acceptance:
- Summary success: no user-visible warning (unchanged).
- Summary failure on gateway hygiene: user receives a TG/Discord
  message with dropped count + error + remediation hint.
- Summary failure on /compress: warning appended to the command reply.
- CLI status_callback / _emit_warning path is untouched.
- Test coverage: two new tests verify the tracking fields are set on
  failure and cleared on subsequent success.
2026-04-27 19:18:13 -07:00
Teknium f40b20d13c fix(gateway): keep typing indicator alive across slow send_typing calls (#16763)
The typing-indicator refresh loop in BasePlatformAdapter._keep_typing
awaited each send_typing call unconditionally. Each call is an HTTP
round-trip to the platform API (Telegram/Discord), normally ~100ms. When
the same network instability that causes upstream provider timeouts
(e.g. Anthropic capacity blips slowing first-token latency past the
120s stream-read timeout) also slows the platform typing API to
multi-second response times, the refresh loop stalls inside the await.
Platform-side typing expires at ~5s, so the bubble dies and stays dead
until the stuck send_typing call returns — right when the user most
needs the 'still working' signal and instead sees a bot that looks
dead, then asks 'wtf are you doing' which itself interrupts the
eventually-recovering turn.

Bound each send_typing with asyncio.wait_for (1.5s cap, derived from
interval so it's always below the 2s cadence). Slow calls get abandoned
so the next scheduled tick fires a fresh send_typing on schedule. As
long as any one of them reaches the platform within its ~5s
typing-expiry window, the bubble stays visible across the stall.

Also catches non-timeout send_typing exceptions (transient HTTP errors)
so one bad tick doesn't terminate the whole loop.
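
A rough sketch of the bounded refresh loop, assuming a `send_typing` coroutine function and an `asyncio.Event` used to stop the loop. The 0.75 ratio is an assumption chosen to reproduce the 1.5s cap at the 2s cadence; the real `_keep_typing` may schedule ticks differently.

```python
import asyncio


async def keep_typing(send_typing, stop_event, interval=2.0):
    """Refresh the typing indicator without letting a slow call stall the loop."""
    timeout = interval * 0.75  # 1.5s for the default 2s cadence
    while not stop_event.is_set():
        try:
            await asyncio.wait_for(send_typing(), timeout=timeout)
        except asyncio.TimeoutError:
            pass  # abandon the slow call; the next tick sends a fresh one
        except Exception:
            pass  # a transient HTTP error must not end the loop
        try:
            # Sleep until the next tick, waking early if asked to stop.
            await asyncio.wait_for(stop_event.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
```

In this sketch the worst-case gap between attempts is the timeout plus the interval (3.5s with the defaults), which stays inside the roughly 5s expiry window the commit describes.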

Tests: 4 new in tests/gateway/test_keep_typing_timeout.py covering
slow-send non-blocking, fast-send still-awaited, exception resilience,
and paused-chat regression guard.
2026-04-27 19:09:32 -07:00
kshitijk4poor 853ed609a1 feat(skills): bundle touchdesigner-mcp by default 2026-04-27 18:22:58 -07:00
helix4u 49fb75463f fix(gateway): keep env-token Slack enabled 2026-04-27 18:19:14 -07:00
brooklyn! e0e67a99bb fix(tui): address copilot follow-up review on PR #16732 (#16740)
- moveCursor(extend=true) now collapses to the bare cursor when the
  computed offset equals the existing anchor instead of leaving a
  zero-length sel. Without this, Shift+Left at col 0 / Shift+Home at
  start would silently hide the hardware cursor (selected truthy)
  without rendering any highlight.
- _tui_need_npm_install also catches UnicodeDecodeError so a corrupted
  / non-UTF8 lockfile falls back to the mtime path the docstring
  promises instead of crashing.

Made-with: Cursor
2026-04-27 16:54:25 -07:00
brooklyn! e7091bb326 fix(tui): mouse + keyboard text selection in the composer (#16732)
* feat(tui): auto copy-on-select for transcript text

Drag in the transcript already highlighted but you had to press Cmd+C to
land it on the clipboard, and the highlight cleared on copy — most users
never realised selection existed. Now drag-release fires copySelectionNoClear
so the text is on the clipboard immediately while the highlight stays put,
matching iTerm2's "Copy to pasteboard on selection" default. Esc clears.

Behaviour:
- Single click in the input still positions the cursor (TextInput onClick).
- Single click in the transcript still does nothing destructive.
- Double / triple click select word / line, then drag extends.
- /copyselect [on|off|toggle] (alias /cos) flips the setting at runtime,
  HERMES_TUI_DISABLE_COPY_ON_SELECT=1 disables at startup, persists via
  display.tui_copy_on_select in config.yaml.

Help overlay now lists drag-select, multi-click, and click-to-position
so the gestures are discoverable.

Made-with: Cursor

* fix(tui): support prompt text selection gestures

Add mouse drag selection and Shift+Arrow/Home/End extension inside the TUI composer so prompt text behaves like a normal editable field while keeping click-to-position and right-click paste intact.

Made-with: Cursor

* Revert "feat(tui): auto copy-on-select for transcript text"

This reverts commit 6701288fe0.

* fix(tui): allow composer selection from prompt whitespace

Give the composer a one-cell mouse capture pad before the editable text. The prompt glyph/gutter still does not become selectable, but dragging from the edge now anchors at input offset 0 so users do not need to hit the first character precisely.

Made-with: Cursor

* fix(tui): clear selections from blank composer space

Clicking blank space in the transcript or composer now clears active TUI/input selections like a normal text surface. TextInput clicks stop bubbling so cursor placement and selection gestures keep their local behavior.

Made-with: Cursor

* fix(tui): delegate prompt gutter drags to composer text

The prompt gutter is now an input gesture region, not selectable content. Dragging from the whitespace or prompt area anchors the composer selection at offset 0, while selection highlight/copy remains limited to actual input text.

Made-with: Cursor

* fix(tui): move composer cursor to end on selection clear

External clear actions now collapse the composer selection to the end of the input, matching normal text-field behavior after dismissing a selection.

Made-with: Cursor

* fix(tui): capture composer padding before prompt

Add an explicit mouse capture cell over the left padding before the prompt glyph. Drags starting there now delegate to the composer input at offset 0 instead of starting terminal-level selection over the prompt chrome.

Made-with: Cursor

* fix(tui): avoid npm install on lockfile mtime churn

Compare package-lock.json against npm's hidden node_modules lock by content instead of mtimes. Git checkouts and npm lock rewrites can make the root lockfile newer even when installed dependencies already match, causing hermes --tui to print Installing TUI dependencies on every launch.

Made-with: Cursor

* fix(tui): include prompt leading cell in gesture region

Use the prompt box's real layout region to cover the leading whitespace cell before the glyph. The cell now participates in mouse hit testing and delegates to composer selection instead of starting terminal-level selection.

Made-with: Cursor

* fix(tui): widen prompt-side gesture capture band

Capture a wider left-side band around the composer prompt row so drags starting in terminal gutter/padding cells are consumed and delegated to input selection, instead of triggering terminal-level selection chrome.

Made-with: Cursor

* fix(tui): make pre-prompt spacer non-selectable content

Replace the sticky-prompt fallback `Text(' ')` with an empty spacer box so the visual gap remains but no literal space character is rendered/copyable before the composer prompt.

Made-with: Cursor

* fix(tui): capture pre-prompt spacer without shifting prompt layout

Revert the widened negative-margin prompt capture band and instead capture drags on the dedicated spacer row above the prompt. This keeps prompt/text alignment stable while still delegating whitespace-start drags to composer selection.

Made-with: Cursor

* fix(tui): align prompt with status bar and capture full input row

Drop the leading prompt column from 3 to 2 so the input first character lines up with the status bar text. Wrap the prompt+input row in a single mouse-capture box and stop event propagation from TextInput's own handlers so any drag in that row delegates to composer selection without leaking to terminal-level selection.

Made-with: Cursor

* fix(tui): anchor hardware cursor during composer selection

When a composer selection covers a row exactly the column width, the rendered text fills the row and the terminal auto-wraps the hardware cursor to col 0 of the next row, leaving a ghost block beneath the prompt. Park the cursor at the start of the input box during selection so it can't escape the input region.

Made-with: Cursor

* fix(tui): hide hardware cursor during composer selection

Stop fighting auto-wrap by hiding the hardware cursor outright while the
composer has an active selection. This prevents both the ghost block under
the prompt (cursor wrapping past the last cell) and the parked-cursor block
on the first selected character. The cursor restores as soon as the
selection clears or focus changes.

Made-with: Cursor

* chore(tui): /clean — drop dead capture-pad path, dedupe gutter handlers

- TextInput: remove unused leftCaptureColumns prop and capture-pad math, drop
  unused mouseApi.startAt, fold mouse offset into a single offsetAt helper,
  share a MouseEventLite type across the four handlers.
- appLayout: hoist a GutterMouseEvent type and an endInputDrag callback so the
  spacer/prompt/input rows share one shape.
- _tui_need_npm_install: lift the runtime-only key set to a module constant,
  collapse nested isinstance checks, and document the mtime fallback.

Made-with: Cursor

* fix(tui): address copilot review on PR #16732

- Split InputSelection.clear() into clear() (cursor-preserving) and
  collapseToEnd() (clear + jump to end). Cmd+C copy paths keep using
  clear() so the cursor stays put; the blank-area click in useMainApp
  switches to collapseToEnd() to match the requested UX.
- Spacer-row drags now force row=0 when forwarding into the input,
  since the spacer's vertical origin doesn't align with the input box
  and Ink mouse-capture keeps dispatching motion to the original
  target. Prompt+input row drag keeps localRow because origins match.

Made-with: Cursor

* fix(tui): give TextInput Box an explicit width

After the /clean pass dropped the unused capture-pad math, the wrapping
Box also lost its explicit width and started sizing to its rendered
content. Clicks past the last character missed TextInput and fell
through to the parent prompt-row Box, which collapsed the cursor to
offset 0. Pin the Box back to `columns` so the input owns its full
column span regardless of value length.

Made-with: Cursor

* feat(tui): double-click select-all + hide cursor on terminal blur

- Track click time/offset in TextInput so a quick second click on the
  same offset triggers select-all. Ink's screen-level multi-click is
  bypassed once our onMouseDown captures, so the gesture has to be
  detected locally.
- Extend the cursor-hide effect to also fire when the terminal loses
  focus, so the hollow-rect ghost most terminals draw at the parked
  cursor position disappears too.

Made-with: Cursor

* chore(tui): /clean — extract isMultiClickAt helper

Pull the click-recurrence math out of TextInput's onMouseDown into a
small isMultiClickAt(offset) helper so the handler reads as the gesture
list it actually is (multi-click → select-all, otherwise start).
Drop the redundant length>0 guard now that selectAll() already noops on
an empty value.

Made-with: Cursor

* docs(tui): explain _tui_need_npm_install content-vs-mtime comparison

Expand the docstring so future readers understand why we parse the
lockfiles instead of comparing mtimes, what the optional/peer skip
covers, how stale hidden-lock entries are handled, and when we fall
back to mtime.
2026-04-27 16:43:48 -07:00
Ben Barclay bebc10528f Merge pull request #16728 from NousResearch/docs/docker-multi-profile-section
docs(docker): add "Multi-profile support" section recommending one container per profile
2026-04-28 09:29:24 +10:00
Ben Barclay 273be93499 docs(docker): restore accidentally-redacted placeholder strings
The previous commit on this branch went through a layer that redacted
strings matching API-key patterns. Restore the original placeholder
values (sk-ant-..., ${ANTHROPIC_API_KEY}, etc.) that were already in
main so the diff is scoped strictly to the new Multi-profile support
section.
2026-04-28 08:21:40 +10:00
Ben Barclay adc2856ffb docs(docker): add "Multi-profile support" section
Clarifies that Hermes' built-in multi-profile feature is not recommended
when running under Docker. Recommends instead running one container per
profile, each bind-mounting its own host data directory as /opt/data.
Includes docker run examples, a rationale list (isolation, independent
lifecycle, port separation, concurrent-write safety), and a Compose
snippet showing two profile services side by side.
2026-04-28 08:20:01 +10:00
brooklyn! 46b4cf8d21 Merge pull request #16707 from NousResearch/bb/tui-queue-delete
feat(tui): delete queued message while editing with ctrl-x / cancel with esc
2026-04-27 15:56:46 -05:00
Brooklyn Nicholson 718088c382 fix(tui): copilot review on #16707 — naming, label consistency, esc priority
- Rename `removeAt` → `removeAtInPlace` and document the mutation
  contract; the old name read like a non-mutating helper.
- Hotkey table + queue header: use `Ctrl+X` / `Esc` to match the
  rest of the UI (was `⌃X` / `esc`).
- Render the queued header as a single template literal so JSX
  text-node whitespace can't sneak into the rendered line.
- Make `Esc` while editing beat the `terminal.hasSelection` clear:
  the header promises 'Esc cancel', so an active selection
  shouldn't silently consume the keystroke.
2026-04-27 15:37:54 -05:00
Brooklyn Nicholson 32b068560d fix(tui): stop ctrl+x from leaking a literal 'x' into the composer
The text input's ctrl-passthrough whitelist only listed Ctrl+C and
Ctrl+B.  Ctrl+X fell through to the printable-char branch and got
inserted as 'x' alongside the queue-delete action firing in
useInputHandlers.

Add Ctrl+X to the same whitelist so it bypasses the readline-style
fallback and reaches the app-level handler unchanged.  When not in
queue-edit mode it's a no-op, which is fine — typing 'x' on Ctrl+X
was the wrong default anyway.
2026-04-27 15:32:16 -05:00
Brooklyn Nicholson ea1012f59f feat(tui): delete queued message while editing with ctrl-x / cancel with esc
Today there's no way to remove a queued message — ↑ loads it for edit,
ctrl-K dispatches the head, but a draft you no longer want stays put
forever. ctrl-C just clears the composer and exits edit mode without
touching the queue.

Two new bindings, both gated on queueEditIdx !== null so they're
inert when the user isn't pointing at a queue item:

- ctrl-X — delete the queue item being edited, clear composer, exit
  edit mode.  "cut" matches the mental model and doesn't collide with
  any existing binding.
- esc — cancel the edit (composer clears, item stays in queue).
  Mirrors ctrl-C's existing behavior so muscle memory has two paths.

Header line now reads `queued (3) · editing 2 · ⌃X delete · esc cancel`
when in edit mode, so the affordance is discoverable without /help.
The /help hotkey table also gets a Ctrl+X entry.

ctrl-C is intentionally unchanged: it should never destroy queued
content.  Cancel is non-destructive (esc / ctrl-C); only ctrl-X
removes the item.
2026-04-27 15:24:14 -05:00
Erosika 4a9ac5c355 fix(memory): drop scrub from interim commentary + final response
Same layering concern as the persisted-assistant scrub already removed:
_emit_interim_assistant_message and the final_response return path were
mutating model output broadly.  Streaming scrubber covers real leaks
delta-by-delta; these post-stream scrubs were redundant.
2026-04-27 12:37:33 -07:00
Erosika 49e3a1d8ee style: trim verbose comment blocks added by previous commit 2026-04-27 12:37:33 -07:00
Erosika e553f6f3e4 fix(memory): narrow scrub surface to known wrapper boundaries
Reviewer pushback on the original boundary-hardening commits identified four
overreach points that pulled plugin-specific policy into shared core paths:

1. gateway/run.py hardcoded a '## Honcho Context' literal split for
   vision-LLM output.  Plugin-format heading in framework code; could
   truncate legitimate output naturally containing that header.
   Drop the literal split; keep generic sanitize_context (the wrapper
   strip is plugin-agnostic).  Plugin-specific cleanup belongs at the
   provider boundary, not the shared gateway path.

2. run_agent.run_conversation scrubbed user_message and
   persist_user_message before the conversation loop.  User text is
   sacred — if a user types a literal <memory-context> tag we must
   not silently delete it.  The producer (build_memory_context_block)
   is the only legitimate emitter; user input should never need the
   reverse op.

3. _build_assistant_message scrubbed model output before persistence.
   Same hazard: would silently mutate legitimate documentation/code
   the model emits containing the literal markers.  The streaming
   scrubber catches real leaks delta-by-delta before content is
   concatenated; persist-time scrub was redundant belt-and-suspenders.

4. _fire_stream_delta stripped leading newlines from every delta unless
   a paragraph break flag was set.  Mid-stream '\n' is legitimate
   markdown — lists, code fences, paragraph breaks — and chunk
   boundaries are arbitrary.  Narrow lstrip to the very first delta
   of the stream only (so stale provider preamble still gets cleaned
   on turn start, but mid-stream formatting survives).
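
A minimal sketch of the narrowed behaviour in point 4, assuming a small per-stream helper (the class name is hypothetical and this is not the actual `_fire_stream_delta` code): only the first delta of a stream loses its leading newlines, and every later delta passes through untouched.

```python
class FirstDeltaTrimmer:
    """Hypothetical helper: strip leading newlines from the first delta only."""

    def __init__(self) -> None:
        self._started = False

    def reset(self) -> None:
        # Call at the start of each new stream.
        self._started = False

    def feed(self, delta: str) -> str:
        if self._started:
            # Mid-stream newlines are legitimate markdown; leave them alone.
            return delta
        self._started = True
        return delta.lstrip("\n")
```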

Plus: build_memory_context_block now logs a warning when its defensive
sanitize_context strips something — surfaces buggy providers returning
pre-wrapped text instead of silently double-fencing.

Net architectural change: scrub surface collapses from 8 sites to 3
(StreamingContextScrubber on output deltas, plugin→backend send,
build_memory_context_block input-validation).  Plugin-specific strings
stay out of shared runtime paths.  User input and persisted assistant
output are no longer mutated.

Tests: rescoped TestMemoryContextSanitization (helper-correctness only,
no source-inspection of removed call sites), updated vision tests to
drop '## Honcho Context' literal-split assertions, updated
_build_assistant_message persistence test to assert preservation.
Added: cross-turn scrubber reset, build_memory_context_block warn-on-
violation, mid-stream newline preservation (plain + code fence).
2026-04-27 12:37:33 -07:00
Erosika 05435a35ed chore(release): map honcho-consolidation contributor emails
Adds AUTHOR_MAP entries for the 5 cherry-picked authors in #15381
so the contributor-attribution CI check passes.
2026-04-27 12:37:33 -07:00
Erosika 894e0b935b feat(honcho): explain why when honcho_profile returns an empty card
Closed PR #5137 addressed the retrieval path (peer cards via get_card()
instead of the session-scoped lookup that returned empty for per-session
messaging flows) — that architectural fix is already in main as
_fetch_peer_card / _fetch_peer_context.

What never got fixed is the user-visible side: honcho_profile returning
a flat 'No profile facts available yet.' leaves the model to guess at
why.  The model then often surfaces it to the user as a cryptic error.

Adds a diagnostic hint next to the existing 'result' message, enumerating
the likely causes in rough order of frequency:

  1. Observation disabled for this peer (user_observe_me/others off)
  2. Peer card hasn't accumulated yet (fresh peer / dialectic cadence
     hasn't fired enough turns — cards build over time)
  3. Generic fallback: self-hosted Honcho < 3.x lacks peer cards

The hint also suggests alternative tools (honcho_reasoning / honcho_search)
so the model can route around the empty card rather than giving up.

Schema description updated so the model knows the hint field exists and
that an empty card is NOT an error state.

7 tests cover the hint paths: warmup, observation-disabled for user + ai,
generic fallback, populated card still returns plain result (no hint),
alternative-tool suggestion present.
2026-04-27 12:37:33 -07:00
Erosika 5883df5574 fix(honcho): keep legacy schemeless baseUrl configs working
The scheme-validation commit (e77a3f2c) was too strict: a user with
legacy 'baseUrl: localhost:8000' (no 'http://' prefix) in their
'~/.honcho/config.json' would get 'No API key configured' from the
CLI after that change, even though their setup worked before.

urlparse on a schemeless host:port treats the host segment as the
scheme and leaves netloc empty, so the http/https check rejected it.

Falls back to a lenient check for schemeless strings that look like
hosts: contain '.' or ':', aren't a boolean/null literal, aren't pure
digits. The SDK still rejects truly malformed URLs at connect time
with a clearer error than ours.
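
The check can be sketched like this. It is an illustration under stated assumptions, not the actual `_resolve_api_key` code: the function name and the exact literal set are hypothetical. `urlparse` leaves `netloc` empty for a schemeless `host:port`, which is why the strict http/https test alone rejects it.

```python
from urllib.parse import urlparse

# Assumed set of boolean/null literals that must never count as a host.
_REJECTED_LITERALS = {"true", "false", "null"}


def looks_like_base_url(value: str) -> bool:
    """Hypothetical helper: strict http(s) check with a lenient schemeless fallback."""
    value = value.strip()
    if not value:
        return False
    try:
        parsed = urlparse(value)
    except ValueError:
        return False
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return True
    if "://" in value:
        # An explicit but unsupported scheme stays rejected.
        return False
    if value.lower() in _REJECTED_LITERALS or value.isdigit():
        return False
    # Lenient fallback: schemeless strings that look like a host.
    return "." in value or ":" in value
```

With this shape, `localhost:8000` and `honcho.internal` pass, while `true`, `12345`, and a bare `localhost` do not.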

Three new tests: legacy schemeless hosts are accepted; obvious garbage
literals ('true', 'null', '12345') are still rejected.  Addresses
reviewer concern #1: the schemeless regression for self-hosters with
old configs.
2026-04-27 12:37:33 -07:00
Erosika cd276eef78 compat(honcho): accept metadata kwarg on on_memory_write ABC bump
main's 6a957a74 added an optional 'metadata' kwarg to
MemoryProvider.on_memory_write so providers can distinguish tool-driven
memory writes from background-review writes.  MemoryManager already
does a getfullargspec-based introspection, so the old 3-arg signature
didn't break at runtime — but it missed the origin hint entirely.

Updates HonchoMemoryProvider.on_memory_write to accept the kwarg.  The
metadata isn't yet threaded into Honcho's create_conclusion payload —
that's worth its own PR once the consolidation lands and the new
metadata shape stabilises.
2026-04-27 12:37:33 -07:00
Erosika 02ab255a0d style(honcho): hoist hashlib import; validate baseUrl scheme before 'local' sentinel
Two small follow-ups to the PR review:

- Hoist hashlib import from _enforce_session_id_limit() to module top.
  stdlib imports are free after first cache, but keeping all imports at
  module top matches the rest of the codebase.

- _resolve_api_key now URL-parses baseUrl and requires http/https +
  non-empty netloc before returning the 'local' sentinel.  A typo like
  baseUrl: 'true' (or bare 'localhost') no longer silently passes the
  credential guard; the CLI correctly reports 'not configured'.

Three new tests cover the new validation (garbage strings, non-http
schemes, valid https).
2026-04-27 12:37:33 -07:00
Erosika 3b2edb347d fix(gateway): scrub memory-context leaks from vision auto-analysis output
fixes #5719

The auxiliary vision LLM called by gateway._enrich_message_with_vision
can echo its injected Honcho system prompt back into the image
description.  That description gets embedded verbatim into the enriched
user message, so recalled memory (personal facts, dialectic output)
surfaces into a user-visible bubble.

Strips both forms of leak before embedding:
  - <memory-context>...</memory-context> fenced blocks (sanitize_context)
  - trailing '## Honcho Context' sections (header + everything after)

Plus regression tests:
  - tests/agent/test_streaming_context_scrubber.py — 13 tests on the
    stateful scrubber (whole block, split tags, false-positive partial
    tags, unterminated span, reset, case-insensitivity)
  - tests/run_agent/test_run_agent_codex_responses.py — 2 new tests on
    _fire_stream_delta covering the realistic 7-chunk leak scenario and
    the cross-turn scrubber reset
  - tests/gateway/test_vision_memory_leak.py — 4 tests covering the
    vision auto-analysis boundary (clean pass-through, '## Honcho Context'
    header, fenced block, both patterns together)
2026-04-27 12:37:33 -07:00
Erosika 5ce5b17a42 fix(honcho): buffer partial memory-context spans across stream deltas
sanitize_context() uses a non-greedy block regex that needs both
<memory-context> open and close tags present in a single string. When a
provider streams the fenced memory block across multiple deltas (typical
for recalled-context leaks — the payload often arrives in 10+ 1-80 char
chunks), the per-delta sanitize stripped the lone open/close tags via
_FENCE_TAG_RE but let the payload in between flow straight to the UI.

Adds StreamingContextScrubber: a small stateful scrubber that tracks
open/close tag pairs across deltas, holds back partial-tag tails at
chunk boundaries, and discards span contents wholesale (including the
system-note line that fragments across deltas).

Wired into _fire_stream_delta; reset per user turn; benign trailing
partial-tag tails are flushed at the end of each model call.  Mid-span
interruption (provider drops closing tag) drops the orphaned content
rather than leaking it — truncated answer > leaked memory.
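
A stateful scrubber of this kind might look roughly like the sketch below. It is not the real `StreamingContextScrubber`: the class name, the `feed`/`flush` method names, and the case-sensitive tag matching are simplifying assumptions made for illustration.

```python
OPEN_TAG = "<memory-context>"
CLOSE_TAG = "</memory-context>"


def _partial_tail(text: str, tag: str) -> str:
    """Return the longest suffix of text that is a proper prefix of tag."""
    for size in range(min(len(tag) - 1, len(text)), 0, -1):
        if tag.startswith(text[-size:]):
            return text[-size:]
    return ""


class StreamingScrubberSketch:
    """Hypothetical sketch: drop fenced spans even when tags split across deltas."""

    def __init__(self) -> None:
        self._held = ""        # possible partial tag held back at a chunk boundary
        self._in_span = False  # True while inside an open memory-context span

    def reset(self) -> None:
        self._held = ""
        self._in_span = False

    def feed(self, delta: str) -> str:
        text = self._held + delta
        self._held = ""
        out = []
        while text:
            tag = CLOSE_TAG if self._in_span else OPEN_TAG
            idx = text.find(tag)
            if idx == -1:
                tail = _partial_tail(text, tag)
                if not self._in_span:
                    out.append(text[: len(text) - len(tail)])
                self._held = tail  # span contents are discarded, tail is kept
                break
            if not self._in_span:
                out.append(text[:idx])
            text = text[idx + len(tag):]
            self._in_span = not self._in_span
        return "".join(out)

    def flush(self) -> str:
        # A benign held-back tail is emitted; an unterminated span is dropped.
        tail = "" if self._in_span else self._held
        self.reset()
        return tail
```

Holding back only the longest suffix that could still become a tag keeps latency low: ordinary text such as `a < b` is released as soon as the next delta rules out a tag.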

Follow-up to #13672 (@dontcallmejames).
2026-04-27 12:37:33 -07:00
Erosika 5d349ea857 fix(honcho): hold RLock across new_session's get_or_create to close race
new_session() was popping the old cached session, releasing the lock,
calling get_or_create, then re-acquiring the lock to insert. A concurrent
caller could observe the empty-cache window and race-create its own
session, producing two divergent session objects for the same key.

_cache_lock is an RLock, so nested reacquisition inside get_or_create is
safe. Hold it across the whole pop/create/insert sequence.
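
The locking pattern can be illustrated with a small stand-in cache. The class and factory below are hypothetical, not the Honcho session manager; the point is that a re-entrant lock lets `new_session` call `get_or_create` while still holding the lock.

```python
import threading


class SessionCacheSketch:
    """Hypothetical cache showing pop/create/insert under one RLock."""

    def __init__(self, factory):
        self._lock = threading.RLock()
        self._cache = {}
        self._factory = factory

    def get_or_create(self, key):
        with self._lock:
            if key not in self._cache:
                self._cache[key] = self._factory(key)
            return self._cache[key]

    def new_session(self, key):
        # Holding the lock for the whole sequence leaves no empty-cache window.
        with self._lock:
            self._cache.pop(key, None)
            # Nested acquisition is safe because the lock is re-entrant.
            return self.get_or_create(key)
```

With a plain `threading.Lock` the nested `get_or_create` call would deadlock, which is why the re-entrant variant matters here.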

Follow-up to #13510 (@hekaru-agent).
2026-04-27 12:37:33 -07:00
twozle 82205276c1 fix(plugins/memory/honcho): default Honcho SDK HTTP timeout to 30s
When no explicit timeout is configured (HonchoClientConfig.timeout,
honcho.timeout / requestTimeout, or HONCHO_TIMEOUT), get_honcho_client
previously constructed the SDK with no timeout kwarg, letting the
underlying httpx client hang indefinitely if the Honcho backend
became unreachable mid-request.

This is a silent-failure hazard on the post-response path of
run_conversation: the memory_manager.sync_all() / queue_prefetch_all()
calls fire after the agent has already generated its final reply, so
a stalled Honcho request blocks run_conversation from returning.
The gateway never logs "response ready" and never delivers the
response to the platform (Telegram, etc.), even though the text is
already saved to the session file.

Repro: unplug the network or block app.honcho.dev mid-turn after
the model has produced its final message. Without this change,
_run_agent never returns. With it, the call aborts after 30s,
run_conversation returns, and the gateway delivers the response
(Honcho sync failure is logged and swallowed as before).

The default applies only when nothing is configured, so any
deployment that has explicitly set timeout / HONCHO_TIMEOUT /
honcho.timeout / honcho.requestTimeout keeps its existing value.
Self-hosted deployments that genuinely need a longer ceiling can
still override via any of those knobs.
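
As a rough sketch of the fallback, assuming a hypothetical helper and a simplified precedence (explicit config value first, then `HONCHO_TIMEOUT`, then the default); the real `get_honcho_client` resolution may order or parse these differently:

```python
import os

DEFAULT_TIMEOUT_SECONDS = 30.0


def resolve_timeout(configured=None, env=None) -> float:
    """Hypothetical helper: use any configured timeout, else default to 30s."""
    env = os.environ if env is None else env
    if configured is not None:
        return float(configured)
    raw = env.get("HONCHO_TIMEOUT")
    if raw:
        try:
            return float(raw)
        except ValueError:
            pass  # Assumption: an unparsable value falls through to the default.
    return DEFAULT_TIMEOUT_SECONDS
```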
2026-04-27 12:37:33 -07:00
Alexander Yususpov 36d6b643f6 fix(honcho): CLI credential guard rejects self-hosted baseUrl configs
_resolve_api_key() only checks for apiKey / HONCHO_API_KEY, so all
CLI subcommands (identity --show, status, migrate, etc.) bail with
"No API key configured" on self-hosted instances that use baseUrl
without an API key.

Return "local" when baseUrl or HONCHO_BASE_URL is set, matching the
client.py behavior that already handles this case for the SDK.

Tested on: macOS, self-hosted Honcho (Docker, localhost:8000).
2026-04-27 12:37:33 -07:00
HiddenPuppy 5d36871d92 Fix Honcho HOME-aware global config fallback 2026-04-27 12:37:33 -07:00
dontcallmejames f1ba4014e1 fix: harden memory-context leak boundaries 2026-04-27 12:37:33 -07:00
dontcallmejames 39713ba2ae fix: strip leaked memory context from commentary 2026-04-27 12:37:33 -07:00
hekaru-agent dad0217450 fix(honcho): thread-safe session cache via RLock
Wraps _session_cache mutations in threading.RLock. Without this, concurrent
gateway sessions (e.g., Telegram + Discord hitting Honcho at the same time)
can race on the cache and silently lose conclusions or memory writes.

Adopted from #13510 by @hekaru-agent; the off-topic cron/jobs.py cleanup
hunk from that PR is dropped here for scope isolation. Resolved a small
conflict with the pinPeerName guard (kept both).
2026-04-27 12:37:33 -07:00
Sanjays2402 cd1c4812ab fix(honcho): truncate resolve_session_name output to Honcho's 100-char limit (#13868)
Gateway session keys (Matrix "!room:server" + thread event IDs, Telegram
supergroup reply chains, Slack thread IDs with long workspace prefixes) can
exceed Honcho's 100-character session ID limit after sanitization. Every
Honcho API call for those sessions then 400s with "session_id too long".

Add a helper that enforces the 100-char limit after sanitization:
short keys (the common case) short-circuit unchanged; over-limit keys
keep a prefix and append a deterministic `-<8 hex>` SHA-256 suffix over
the original key so two long keys sharing a leading segment can't
collide onto the same truncated ID.
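
A minimal sketch of the truncation scheme, assuming a hypothetical helper name and signature (the actual helper may differ in how it receives the sanitized and original keys):

```python
import hashlib

MAX_SESSION_ID_LEN = 100


def enforce_session_id_limit(sanitized: str, original: str) -> str:
    """Hypothetical helper: keep short IDs, hash-suffix over-limit ones."""
    if len(sanitized) <= MAX_SESSION_ID_LEN:
        return sanitized  # common case: unchanged
    # 8 hex characters of SHA-256 over the original key keep long keys distinct.
    digest = hashlib.sha256(original.encode("utf-8")).hexdigest()[:8]
    suffix = "-" + digest
    return sanitized[: MAX_SESSION_ID_LEN - len(suffix)] + suffix
```

Hashing the original key rather than the truncated prefix is what keeps two long keys with the same leading segment from colliding.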

Adds 7 regression tests in tests/honcho_plugin/test_client.py covering
short / exact-limit / long / deterministic / collision-resistant /
allowlist-preserving / hash-suffix-present cases.
2026-04-27 12:37:33 -07:00
Brian D. Evans 326c9daa69 fix(honcho): require strict True for pin_peer_name to survive MagicMock configs (#15162)
CI caught that ``test_session_manager_prefers_runtime_user_id_over_config_peer_name``
in ``tests/agent/test_memory_user_id.py`` failed after this branch: that
test passes a ``MagicMock`` for ``config``, where
``mock.pin_peer_name`` silently returns another ``MagicMock`` — truthy by
default.  My ``getattr(..., "pin_peer_name", False)`` fallback was
supposed to guard against callers that haven't added the new attr, but
MagicMock *does* have the attr — it just returns a live mock for it.

Tightened the gate to ``getattr(..., False) is True``.  Real configs
built via ``HonchoClientConfig.from_global_config`` always yield a
proper boolean, so strict equality matches the pinned case and rejects
both the unset-attr fallback and MagicMock stand-ins.  Added a comment
explaining why ``is True`` is intentional, not paranoid.
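
The difference between a truthiness check and the strict gate is easy to reproduce. The helper name below is hypothetical; the `MagicMock` behaviour is standard `unittest.mock`.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def pin_enabled(config) -> bool:
    """Hypothetical helper: only a real boolean True enables pinning."""
    return getattr(config, "pin_peer_name", False) is True


mock_config = MagicMock()
# MagicMock fabricates the attribute on access, and the result is truthy.
truthy_by_accident = bool(mock_config.pin_peer_name)
```

Here `truthy_by_accident` is `True`, yet `pin_enabled(mock_config)` is `False`, because the fabricated attribute is a mock object rather than the `True` singleton.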

Also tightened the ``peer_name`` existence check to
``getattr(..., None)`` so a MagicMock with ``peer_name`` left at its
default (also truthy) doesn't spuriously enable pinning either.

Verified against both the new ``test_pin_peer_name.py`` suite (13/13
pass) and the previously-failing
``TestHonchoUserIdScoping`` (3/3 pass).  Zero behaviour change for real
``HonchoClientConfig`` values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:37:33 -07:00
Brian D. Evans d03c6fcc45 fix(honcho): pinPeerName opt-in keeps memory unified across platforms (#14984)
When a gateway drives Hermes (Telegram, Discord, Slack, ...), it passes the
platform-native user ID as ``runtime_user_peer_name`` into the Honcho
session manager.  That ID wins over ``peer_name`` in ``honcho.json``, so a
single user who connects over three platforms ends up as three separate
Honcho peers — one per platform — with fragmented memory and no cross-
platform context continuity.

For multi-user bots this is correct (and must not change): each user gets
their own peer scope.  For the vast majority of personal Hermes deployments
the configured ``peer_name`` is an unambiguous identity, though, so the
reporter asked for an opt-in knob that pins the user peer to that value.

Fix: new ``pinPeerName`` boolean on the host config, default ``false``.
When ``true`` AND ``peerName`` is set, the configured peer_name beats the
gateway's runtime identity; every other resolution case is unchanged.

  honcho.json:
  {
    "peerName": "Igor",
    "hosts": {
      "hermes": { "pinPeerName": true }
    }
  }

  session.py (resolution order, pinned case):
    runtime_user_peer_name  →  skipped (opt-in flag active)
    config.peer_name        →  WINS   "Igor"
    session-key fallback    →  unreached
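
The resolution order, pinned and unpinned, can be sketched as a single function. The function name and flat argument list are hypothetical; the real logic lives in session.py and reads from the config object.

```python
def resolve_user_peer(runtime_user_peer_name, peer_name, pin_peer_name, session_key):
    """Hypothetical sketch of user-peer resolution with the opt-in pin."""
    if pin_peer_name is True and peer_name:
        return peer_name  # pinned: configured identity beats the gateway ID
    if runtime_user_peer_name:
        return runtime_user_peer_name  # default: platform-native ID wins
    if peer_name:
        return peer_name  # e.g. CLI path, where no runtime identity exists
    return session_key  # deepest fallback
```

Note that pinning without a configured `peer_name` falls through to the default order, so the flag alone never collapses users onto the session-key fallback.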

Parsing follows the same host-block-overrides-root pattern as every other
flag in HonchoClientConfig.from_global_config (``_resolve_bool`` helper).

Tests (tests/honcho_plugin/test_pin_peer_name.py — 13 cases, 5 groups):
- Config parsing: default, root true, host-block true, host overrides
  root, explicit false.
- Peer resolution: runtime wins by default (regression guard for multi-
  user bots), config wins when pinned, pin-without-peer_name is a no-op
  (prevents silent peer-id collapse to session-key fallback), CLI path
  where runtime is absent, deepest fallback intact, assistant peer
  untouched by the flag.
- Cross-platform unification: Telegram UID + Discord snowflake collapse
  to one peer when pinned; negative control confirms two distinct
  runtime IDs still produce two peers when unpinned.

244 honcho_plugin tests pass, 3 pre-existing skips, zero regressions.

Defensive detail: session.py uses ``getattr(self._config, "pin_peer_name",
False)`` so callers building partial config objects (several test fixtures
across the codebase do this) don't break if they haven't updated yet.
Runtime cost: one attr lookup per new session.

Closes #14984

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:37:33 -07:00
Siddharth Balyan ef41d3bd45 feat(nix): declarative plugin installation for NixOS module (#15953)
* feat(nix): parameterize dependency-groups in python.nix

* refactor(nix): extract package to callPackage-able hermes-agent.nix

Makes the package overridable via .override{} and adds
extraPythonPackages parameter for PYTHONPATH injection.
Includes build-time collision check using PEP 503 name
canonicalization.
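
For reference, PEP 503 canonicalization and a name-collision check look like this in Python. The actual build-time check is implemented in Nix; `find_collisions` is a hypothetical illustration of the idea only.

```python
import re


def canonicalize(name: str) -> str:
    """PEP 503: collapse runs of '-', '_' and '.' to '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_collisions(base_names, extra_names):
    """Hypothetical check: extras whose canonical name already exists in base."""
    base = {canonicalize(n) for n in base_names}
    return sorted({canonicalize(n) for n in extra_names} & base)
```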

* feat(nix): add overlay for external NixOS consumption

External flakes can now add overlays = [ inputs.hermes-agent.overlays.default ]
to get pkgs.hermes-agent with full .override support.

* test(nix): add check for extraPythonPackages PYTHONPATH injection

Verifies wrapper has PYTHONPATH when extras provided, and
base package has no PYTHONPATH without extras.

* feat(nix): add extraPlugins option for directory-based plugins

Symlinks plugin packages into HERMES_HOME/plugins/ at activation time.
Validates plugin.yaml presence. Asserts unique plugin names at eval time.
Hermes discovers them automatically via its directory scan.

* feat(nix): add extraPythonPackages option for entry-point plugins

Overrides the hermes package with PYTHONPATH injection when
extraPythonPackages is non-empty. Plugin .dist-info directories
become visible to importlib.metadata for entry-point discovery.
Works in both native systemd and container modes.

* docs: add NixOS declarative plugin installation to nix-setup, plugins, and build-a-plugin guides

- nix-setup.md: new Plugins section with extraPlugins/extraPythonPackages
  examples, overlay usage, collision checking note, options reference rows
- plugins.md: Nix row in discovery table, NixOS declarative plugins section
- build-a-hermes-plugin.md: Distribute for NixOS section after pip section

* fix: address review feedback — remove unrelated umask, fix fetchFromGitHub naming, simplify checks

- Remove accidentally introduced umask/migration changes (unrelated to plugins)
- Add pluginName helper, fix fetchFromGitHub producing name='source'
- Show name= in extraPlugins example docs
- Simplify checks.nix: use hermes-agent.override instead of re-callPackage
- Fix fragile grep shell logic in checks

* refactor: address simplify feedback — lib.getName, drop unused inputs', Python list for extras

- Use lib.getName instead of custom pluginName helper
- Drop unused inputs' from checks.nix perSystem args
- Pass extraPythonPackages as Python list literal instead of colon-split string

* fix: walk propagatedBuildInputs for plugin PYTHONPATH and collision check

Uses python312.pkgs.requiredPythonModules to resolve the full transitive
closure of extraPythonPackages. Without this, a plugin with third-party
deps (e.g. requests) would fail at runtime if those deps weren't already
in the sealed uv2nix venv. The collision check now also scans the full
closure, catching transitive conflicts.

* cleanup: fold plugins into subdir loop, use find for symlink cleanup, inline lib.getName

- Add 'plugins' to the existing cron/sessions/logs/memories subdir loop
  instead of a separate mkdir/chown/chmod block
- Replace fragile for-glob with find -delete for stale symlink cleanup
- Inline lib.getName at both call sites, remove pluginName wrapper
2026-04-28 00:18:32 +05:30
Siddharth Balyan 1fa76607c0 feat: trigram FTS5 index for CJK search, replace LIKE fallback (#16651)
* fix: bypass FTS5 for CJK queries in session_search

FTS5's default tokenizer splits CJK characters into individual tokens,
so multi-character queries like "大别山项目" become AND of single chars.
This produces few/no results compared to LIKE substring search.

For CJK queries, skip FTS5 entirely and use LIKE for accurate
phrase matching.

Fixes NousResearch/hermes-agent#15500

* fix: cache _contains_cjk, escape LIKE wildcards, add regression tests

On top of the CJK FTS5 bypass from #15509:

- Cache _contains_cjk() result in a local var to avoid redundant O(n)
  scans on every CJK query
- Escape %, _ in LIKE queries so literal wildcards in user input are
  not treated as SQL wildcards (consistent with other LIKE queries in
  hermes_state.py that use ESCAPE '\')
- Fix misleading comment ('or CJK fallback' → accurate description)
- Add 3 regression tests:
  - test_cjk_partial_fts5_results_supplemented_by_like (#15500 / #14829)
  - test_cjk_like_dedup_no_duplicates
  - test_cjk_like_escapes_wildcards (new wildcard escaping)

* feat: trigram FTS5 index for CJK search, replace LIKE fallback

Replace the LIKE '%query%' full-table-scan fallback for CJK queries with
a proper trigram FTS5 index (messages_fts_trigram).  The trigram tokenizer
creates overlapping 3-character sequences so substring matching works natively
for any script — CJK, Thai, etc.

For queries with 3+ CJK characters: uses the trigram FTS5 table with
proper ranking, snippets, and indexed lookups.  For shorter queries
(1-2 CJK chars): falls back to LIKE since the trigram tokenizer needs
≥9 UTF-8 bytes (3 CJK chars) minimum.
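
The routing between the three search paths can be sketched as below. The function names, the strategy labels, and the Unicode ranges used to detect CJK text are assumptions for illustration; the real `_contains_cjk` may cover different ranges.

```python
def count_cjk(text: str) -> int:
    """Count characters in a few common CJK ranges (assumed, not exhaustive)."""
    return sum(
        1
        for ch in text
        if "\u4e00" <= ch <= "\u9fff"    # CJK Unified Ideographs
        or "\u3040" <= ch <= "\u30ff"    # Hiragana and Katakana
        or "\uac00" <= ch <= "\ud7af"    # Hangul syllables
    )


def escape_like(query: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards so user input matches literally (use with ESCAPE)."""
    for ch in (escape, "%", "_"):
        query = query.replace(ch, escape + ch)
    return query


def choose_strategy(query: str) -> str:
    """Hypothetical router: default FTS5, trigram FTS5, or LIKE fallback."""
    cjk = count_cjk(query)
    if cjk == 0:
        return "fts5"
    if cjk >= 3:
        return "trigram"
    return "like"  # 1-2 CJK characters are too short for a trigram match
```

The escape character itself is escaped first so that a literal backslash in the query is not mistaken for an escape prefix.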

Schema v10 migration creates the trigram table and backfills existing
messages.  Triggers keep the index in sync on INSERT/UPDATE/DELETE.

Builds on top of #16276 (bypass FTS5 for CJK, escape LIKE wildcards).

---------

Co-authored-by: vominh1919 <vominh1919@gmail.com>
2026-04-28 00:12:07 +05:30
brooklyn! e80504b088 Merge pull request #16656 from NousResearch/bb/tui-parity-mutating-commands
fix(tui): route mutating slash commands through live gateway state
2026-04-27 13:30:19 -05:00
Brooklyn Nicholson ed4f7f0ba3 test(tui): skip slash parity matrix when Python registry is unavailable
Keep the parity test backed by the real Python command registry while avoiding hard failures in Node-only Vitest environments that cannot import hermes_cli.commands.
2026-04-27 13:19:11 -05:00
kshitijk4poor 56724147ef fix(providers/gmi): post-salvage review fixes
- config.py: remove dead ENV_VARS_BY_VERSION[17] entry (current _config_version
  is 22, so all users are past version 17 and would never be prompted for
  GMI_API_KEY on upgrade — consistent with how arcee was added)
- auxiliary_client.py: use google/gemini-3.1-flash-lite-preview as GMI aux
  model instead of anthropic/claude-opus-4.6 (matches cheap fast-model pattern
  used by all other providers: zai→glm-4.5-flash, kimi→kimi-k2-turbo-preview,
  stepfun→step-3.5-flash, kilocode→google/gemini-3-flash-preview)
- test_gmi_provider.py: fix malformed write_text() call in doctor test
  (was: write_text("GMI_API_KEY=*** encoding="utf-8") → missing closing quote,
  wrote literal string 'GMI_API_KEY=*** encoding=' to .env file)
- test_gmi_provider.py + test_auxiliary_client.py: update aux model assertions
  to match new cheaper default
- docs/integrations/providers.md: add 'gmi' to inline 'Supported providers'
  fallback list (was only in the table, not the inline list at line ~1181)
- docs/reference/cli-commands.md: add 'gmi' to --provider choices list
2026-04-27 11:17:59 -07:00
Isaac Huang c53fcb0173 feat(providers): add GMI Cloud as a first-class API-key provider (#11955)
Add GMI Cloud (api.gmi-serving.com) as a full first-class API-key provider
with built-in auth, aliases, model catalog, CLI entry points, auxiliary client
routing, context length resolution, doctor checks, env var tracking, and docs.

- auth.py: ProviderConfig for 'gmi' (api_key, GMI_API_KEY / GMI_BASE_URL)
- providers.py: HermesOverlay with extra_env_vars for models.dev detection
- models.py: curated slash-form model catalog; live /v1/models fetch
- main.py: 'gmi' in _named_custom_provider_map and --provider choices
- model_metadata.py: _URL_TO_PROVIDER, _PROVIDER_PREFIXES, dedicated
  context-length probe block (GMI's /models has authoritative data)
- auxiliary_client.py: alias entries; _compat_model fix for slash-form
  models on cached aggregator-style clients; gmi aux default model
- doctor.py: GMI in provider connectivity checks
- config.py: GMI_API_KEY / GMI_BASE_URL in OPTIONAL_ENV_VARS
- conftest.py: explicit GMI_BASE_URL clearing (not caught by _API_KEY suffix)
- docs: providers.md, environment-variables.md, fallback-providers.md,
  configuration.md, quickstart.md (expands provider table)

Co-authored-by: Isaac Huang <isaachuang@Isaacs-MacBook-Pro.local>
2026-04-27 11:17:59 -07:00
Brooklyn Nicholson 8a33ed6136 fix(tui): address rollback guard and parity registry review
Load slash command names from the Python registry instead of regex-parsing source, and guard native rollback when no TUI session is active.
2026-04-27 13:10:13 -05:00
brooklyn! 41f70e6fc4 Merge pull request #16664 from NousResearch/bb/fix-tui-forceredraw-export
fix(tui): expose forceRedraw in Ink type shim
2026-04-27 13:08:16 -05:00
Brooklyn Nicholson adbd173ddd fix(tui): expose forceRedraw in Ink type shim 2026-04-27 13:07:48 -05:00
Brooklyn Nicholson 4f59510dd4 fix(tui): tighten fast-mode support validation
Distinguish missing model from unsupported model before enabling fast mode and cover both cases so config and live agent state remain untouched on invalid fast toggles.
2026-04-27 13:00:11 -05:00
Brooklyn Nicholson 4a08f1015a fix(tui): reject fast mode for unsupported live models
Match classic CLI parity by refusing to enable fast mode when the active model cannot produce fast request overrides, avoiding a misleading fast status with no runtime effect.
2026-04-27 12:55:41 -05:00
Brooklyn Nicholson 8bd5d0667a Merge origin/main into bb/tui-parity-mutating-commands
Resolve session command merge conflict and keep the branch current with main so PR #16656 is mergeable.
2026-04-27 12:51:11 -05:00
brooklyn! 6d24880604 Merge pull request #16657 from NousResearch/bb/tui-keybinding-model-parity
fix(tui): align Ctrl+L and /model default scope with classic CLI
2026-04-27 12:49:37 -05:00
Brooklyn Nicholson b8556eb15e fix(tui): address fast-mode live sync review feedback
Make `config.set fast status` read-only and keep live agent request overrides in sync with fast-mode toggles so runtime API kwargs match the selected mode.
2026-04-27 12:47:42 -05:00
Brooklyn Nicholson b3e7a412e2 fix(tui): wire Ctrl+L to Ink forceRedraw path
Expose a small forceRedraw API from @hermes/ink and use it for Ctrl/Cmd+L so the hotkey performs a real terminal clear + full repaint instead of a no-op state patch.
2026-04-27 12:44:24 -05:00
Brooklyn Nicholson da6f8449a5 test(tui): tighten redraw hotkey review follow-ups
Use explicit repaint patch semantics for Ctrl/Cmd+L and narrow the hotkey assertion to the actual +L entry so unrelated descriptions do not cause false failures.
2026-04-27 12:30:40 -05:00
Brooklyn Nicholson a13449a40a fix(tui): address Copilot review feedback on mutating command parity
Harden busy mode config reads against invalid display config shapes and align /fast help+usage text with accepted aliases, with regression coverage for non-dict display values.
2026-04-27 12:30:30 -05:00
Brooklyn Nicholson 17029a64e8 chore(ui-tui): apply npm run fix formatting pass
Run ui-tui lint autofix + prettier and commit the resulting formatting-only changes for the keybinding/model parity branch.
2026-04-27 12:25:27 -05:00
Brooklyn Nicholson 487da4b72b chore(ui-tui): apply npm run fix formatting pass
Run ui-tui lint autofix + prettier and commit the resulting formatting-only changes for the parity PR branch.
2026-04-27 12:25:21 -05:00
Brooklyn Nicholson 4909b94f99 fix(tui): align Ctrl+L and /model with classic CLI semantics
Make Ctrl+L non-destructive by redrawing the current screen state instead of starting a new session, and stop auto-appending --global for typed /model commands so session scope remains the default unless explicitly requested.
2026-04-27 12:23:56 -05:00
Brooklyn Nicholson a4cb3ef66c fix(tui): make mutating slash paths native and lifecycle-safe
Route /browser, /reload-mcp, /rollback, /stop, /fast, and /busy through direct TUI RPC handlers so state changes hit the live gateway session instead of slash-worker fallback. Add TUI session finalize/reset parity hooks (memory commit + plugin boundaries) and parity matrix tests to keep mutating commands off fallback.
2026-04-27 12:20:08 -05:00
brooklyn! d5a89283b7 Merge pull request #16625 from NousResearch/bb/fix-tui-title-session-sync
fix(tui): keep /title session names in sync
2026-04-27 12:05:54 -05:00
Brooklyn Nicholson 633f74504f fix(ci): resolve follow-up title edge case and flaky checks
Handle queued-title ValueError cleanup during session init, harden Discord message source building for test stubs, and fix the Dockerfile contract test syntax error. Also refresh the TUI lockfile and Nix build flags so nix ubuntu-latest no longer fails on npm lock/peer resolution drift.
2026-04-27 11:49:02 -05:00
Brooklyn Nicholson 27936ee02d fix(tui-gateway): keep queued user titles from being dropped
Retry queued pending titles even when the DB already has a non-empty title so explicit user title intents are not silently lost (for example after auto-title). Includes regression coverage.
2026-04-27 11:31:49 -05:00
Brooklyn Nicholson 3aa86717b6 fix(tui-gateway): harden pending-title retry and user errors
Retry persisting queued titles on session.title reads and map title validation failures to a user-facing 4022 code instead of generic 5007.
2026-04-27 11:27:51 -05:00
Brooklyn Nicholson 492c4c6573 fix(tui-gateway): address follow-up Copilot title threads
Tighten pending-title flush during session init and treat row lookup failures during title-set no-op detection as RPC errors instead of silently queueing.
2026-04-27 11:15:37 -05:00
Brooklyn Nicholson 3824b03237 fix(tui-gateway): harden session title RPC edge cases
Handle session.title read failures without crashing, distinguish no-op title writes from missing session rows, and use a distinct empty-title error code with regression coverage.
2026-04-27 11:05:10 -05:00
Brooklyn Nicholson 42b917c92c chore: uptick 2026-04-27 08:52:12 -07:00
Brooklyn Nicholson 7ccfb97fee test(cli): assert active-session file lifecycle in launch_tui
Validate that the temp active-session file exists while the TUI subprocess runs and is removed after launch cleanup to match mkstemp semantics.
2026-04-27 08:52:12 -07:00
Brooklyn Nicholson 7a6128cc4f fix(tui): harden active-session temp file handling
- create HERMES_TUI_ACTIVE_SESSION_FILE with mkstemp instead of a predictable tmp path and always cleanup in finally
- add assertions that launch wiring uses a randomized session file path and removes it on exit
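
A minimal sketch of the mkstemp-plus-finally pattern described above. The wrapper name `run_with_session_file` and the file prefix are hypothetical; only `tempfile.mkstemp` and the cleanup-in-finally shape come from the commit text.

```python
import os
import tempfile


def run_with_session_file(run):
    """Create an unpredictable temp file, pass its path to run(), always clean up."""
    # mkstemp creates the file atomically with a randomized name,
    # readable and writable only by the current user.
    fd, path = tempfile.mkstemp(prefix="hermes-tui-session-")
    os.close(fd)
    try:
        return run(path)
    finally:
        # Remove the file even if run() raised.
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass
```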
2026-04-27 08:52:12 -07:00
Brooklyn Nicholson 4b28140912 fix(cli): tighten MRU lookup and session DB cleanup
- use a grouped last_active join in search_sessions to avoid per-row correlated max lookups
- always close SessionDB in _resolve_last_session via finally and add regression coverage for search failure cleanup
2026-04-27 08:52:12 -07:00
Brooklyn Nicholson 653b5ec128 fix(tui): report actual session on exit 2026-04-27 08:52:12 -07:00
Brooklyn Nicholson 164e33aa46 fix(cli): resolve -c by true MRU session
- order session listing by computed last_active in SessionDB so callers get MRU rows directly
- keep _resolve_last_session as a single-row lookup and add regression coverage for >20 session sampling
2026-04-27 08:52:12 -07:00
Brooklyn Nicholson cdfbd89ea5 fix(tui): keep /title session names in sync
Route TUI /title through session.title RPC and queue titles when the session DB row is still initializing, so renamed sessions reliably appear in /resume and browse flows.
2026-04-27 10:51:14 -05:00
kshitijk4poor 730347e38f feat(skills): expand touchdesigner-mcp with GLSL, post-FX, audio, geometry references (#13664)
Add 6 new reference files with generic reusable patterns:
- glsl.md: uniforms, built-in functions, shader templates, Bayer dither
- postfx.md: bloom, CRT scanlines, chromatic aberration, feedback glow
- layout-compositor.md: layoutTOP, overTOP grids, panel dividers
- operator-tips.md: wireframe rendering, feedback TOP setup
- geometry-comp.md: instancing, POP vs SOP rendering, shape morphing
- audio-reactive.md: band extraction (audiofilterCHOP), beat detection, MIDI

Expand pitfalls.md (#46-63):
- Connection syntax, moviefileoutTOP bug, batch frame capture
- TOP.save() time advancement, feedback masking, incremental builds
- MCP reconnection after project.load(), TOX reverse-engineering
- sliderCOMP naming, create() suffix requirement
- COMP reparenting (copyOPs), expressionCHOP crash
- Strip session-specific names in earlier pitfalls (promo_ -> my_)
- Audio device CHOP at FPS=0: active=False is the fix, not volume=0

All content is generic — no session-specific paths, hardware, aesthetics,
or param-name-only entries (those belong in td_get_par_info).
Bumps version 1.0.0 -> 1.1.0.

Salvaged from @kshitijk4poor's original PR #13664; dropped setup.sh and
troubleshooting.md changes that reverted subsequent HERMES_HOME and pgrep
fixes already on main, and preserved original author frontmatter.
2026-04-27 08:46:36 -07:00
Teknium 628ca99d9b fix(compression): show main + aux model and provider in feasibility warning (#16619)
The auto-lowered-threshold warning only named the compression model,
making it confusing when the main and aux models are configured with
the same slug but end up with different resolved context lengths (e.g.
OpenRouter's stepfun/step-3.5-flash catalog value vs. a main-model
context_length override). Users couldn't tell whether the warning
reflected two different models or a context-resolution mismatch.

Now includes both 'model (provider)' labels. The aux provider falls
back to the client's base_url hostname when the configured provider
is 'auto', so users see where compression is actually being called.
2026-04-27 08:43:24 -07:00
Teknium 460a8ce5d9 chore(release): map hermes-agent-dhabibi bot -> dhabibi 2026-04-27 08:35:50 -07:00
hermes-agent-dhabibi aa53fb661a fix(copilot): mark native image requests as vision
Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>
2026-04-27 08:35:50 -07:00
hermes-agent-dhabibi 8402ba150e fix(copilot): send vision header for Copilot vision requests
Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>
2026-04-27 08:35:50 -07:00
brooklyn! 512c610058 Merge pull request #16605 from NousResearch/bb/fix-tui-docker-ink-build
fix(docker): prebuild TUI assets in image
2026-04-27 10:17:58 -05:00
Brooklyn Nicholson b479205396 fix(docker): tighten TUI build contract 2026-04-27 10:15:00 -05:00
Austin Pickett 60f2415a4a Merge pull request #16600 from NousResearch/austin/fix/model-provider
fix(models): consolidate provider and model into /model command
2026-04-27 08:14:27 -07:00
Austin Pickett 082acc75b0 fix(review): address copilot review 2026-04-27 11:06:28 -04:00
Brooklyn Nicholson 4424a0e0f7 fix(docker): prebuild TUI assets in image 2026-04-27 10:05:07 -05:00
kshitij 98d75dea5a perf(tui): lazily seed virtual history heights (#16523) 2026-04-27 07:55:45 -07:00
Teknium 9b55365f6f fix(gateway,cron): close ephemeral agents + reap stale aux clients (salvage #13979) (#16598)
* fix: clean gateway auxiliary client caches on teardown

* fix(gateway): recover from stale pid files and close cron agents

Two issues were keeping the gateway from surviving long runs:

1. `_cleanup_invalid_pid_path` delegated to `remove_pid_file`, which
   refuses to unlink when the file's pid differs from our own. That
   safety check exists for the --replace atexit handoff, but it also
   applied to stale-record cleanup, so after a crashy exit the pid
   file was orphaned: `write_pid_file()`'s O_EXCL create then failed
   with `FileExistsError`, and systemd looped on "PID file race lost
   to another gateway instance". Unlink unconditionally from this
   helper since the caller has already verified the record is dead.

2. The cron scheduler never closed the ephemeral `AIAgent` it creates
   per tick, and never swept the process-global auxiliary-client
   cache. Over days of 10-minute ticks this leaked subprocesses and
   async httpx transports until the gateway hit EMFILE. Release the
   agent and call `cleanup_stale_async_clients()` in `run_job`'s
   outer `finally`, matching the gateway's own per-turn cleanup.
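
The pid-file failure mode in item 1 can be reproduced with a small sketch. The function bodies here are assumptions written for illustration, not the gateway's actual helpers; `remove_stale_pid_file` is a hypothetical name.

```python
import os
import tempfile


def write_pid_file(path: str) -> None:
    """Create the pid file exclusively; fails if any file already exists."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))


def remove_stale_pid_file(path: str) -> None:
    """Unlink unconditionally; the caller has already verified the record is dead."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass


# Simulate a crashed gateway that left a pid file behind.
demo_path = os.path.join(tempfile.mkdtemp(), "gateway.pid")
with open(demo_path, "w") as fh:
    fh.write("999999")
```

If the stale file is left in place, `write_pid_file(demo_path)` raises `FileExistsError` on every start, which is the restart loop described above.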

* chore(release): map bloodcarter@gmail.com -> bloodcarter

---------

Co-authored-by: bloodcarter <bloodcarter@gmail.com>
2026-04-27 07:41:42 -07:00
Austin Pickett a0b62e0c5a fix(models): consolidate provider and model into /model command 2026-04-27 10:38:36 -04:00
Teknium ac0325c257 diagnostic(cli): log slow bracketed-paste handler (>500ms) for #16263 (#16575)
When a paste takes longer than 500ms to process on the prompt_toolkit
event-loop thread, emit a logger.warning with elapsed time, byte size,
line count, and sys.platform. Gives us concrete repro data for the
recurring 'CLI freezes after paste on macOS' class of reports (issue
#16263, plus sibling reports across Claude Code / Cursor / Lightroom
against macOS Tahoe 26).

Pure diagnostic — no behavior change. Two time.perf_counter() calls
and one conditional per paste event. Log line only fires when the
handler is actually slow, so normal pastes add no log noise.
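
As a rough sketch of the diagnostic shape (two `time.perf_counter()` calls and one conditional), assuming a hypothetical wrapper `timed_paste` around the real handler:

```python
import logging
import sys
import time

logger = logging.getLogger("paste_diag")  # hypothetical logger name
SLOW_PASTE_THRESHOLD_S = 0.5


def timed_paste(handler, data: str):
    """Run a paste handler and warn only when it exceeds the threshold."""
    start = time.perf_counter()
    try:
        return handler(data)
    finally:
        elapsed = time.perf_counter() - start
        if elapsed > SLOW_PASTE_THRESHOLD_S:
            logger.warning(
                "slow paste: %.0f ms, %d bytes, %d lines, platform=%s",
                elapsed * 1000,
                len(data.encode("utf-8")),
                data.count("\n") + 1,
                sys.platform,
            )
```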
2026-04-27 06:44:36 -07:00
Teknium 817633bc5d feat(backup): exclude SQLite WAL/SHM/journal sidecars (#16576)
The backup takes a consistent snapshot of each .db via sqlite3.backup(),
so shipping the live .db-wal / .db-shm / .db-journal alongside pairs the
fresh snapshot with stale sidecar state and produces a torn restore on
first open. Sidecars are transient and SQLite regenerates them on next
connection anyway.

This also trims several MB of junk from every zip: state.db-wal alone was
~9 MB here, and the WAL is the live write-ahead log, not data.
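
A minimal sketch of the two pieces involved, assuming hypothetical helper names: a filter that skips sidecar files, and a snapshot taken through the standard-library `sqlite3.Connection.backup()` API.

```python
import sqlite3
from pathlib import Path

# Transient files SQLite recreates on the next connection.
SQLITE_SIDECAR_SUFFIXES = (".db-wal", ".db-shm", ".db-journal")


def is_sqlite_sidecar(path) -> bool:
    """True for WAL/SHM/journal sidecars that should stay out of the backup."""
    return Path(path).name.endswith(SQLITE_SIDECAR_SUFFIXES)


def snapshot_db(src_path: str, dest_path: str) -> None:
    """Copy a consistent snapshot of a live database into dest_path."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```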
2026-04-27 06:43:52 -07:00
Teknium 9692ce2072 chore(release): map andrewho.sf@gmail.com -> andrewhosf
Release-notes contributor attribution for the salvaged PR #13734 fix.
2026-04-27 06:42:32 -07:00
Teknium 008860a23f fix(approval): close remaining prompt_toolkit deadlock vectors (#15216)
PR #13734 fixed the concurrent-tool-executor vector (ThreadPoolExecutor
workers didn't inherit the CLI's TLS approval callback). Two vectors
remained that could still land in the deadlocking input() fallback:

1. _spawn_background_review spawns a raw threading.Thread with no
   approval callback installed, so any dangerous-command guard the
   review agent trips falls back to input() -> deadlock against the
   parent's prompt_toolkit TUI (same class as delegate_task subagents,
   fixed in 023b1bff1 / #15491). Install a _bg_review_auto_deny
   callback at thread start, clear on finally.

2. prompt_dangerous_approval's fallback unconditionally spawned a
   daemon thread calling input() when approval_callback was None.
   That fallback can never succeed under prompt_toolkit because the
   user's Enter goes to pt's raw-mode stdin capture. Detect an active
   pt Application via get_app_or_none() and fail closed (deny + log)
   instead, so future threads that forget to install a callback
   degrade gracefully instead of hanging 60s invisibly.
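
The fail-closed guard in item 2 might look roughly like the sketch below. The signature, the `pt_active` injection point, and the return values are assumptions for illustration; only `get_app_or_none()` is taken from the commit text, and it is imported lazily so the sketch runs without prompt_toolkit installed.

```python
import logging

logger = logging.getLogger("approval_sketch")  # hypothetical logger name


def _prompt_toolkit_active() -> bool:
    """Detect a running prompt_toolkit Application, if the library is present."""
    try:
        from prompt_toolkit.application.current import get_app_or_none
    except ImportError:
        return False
    return get_app_or_none() is not None


def prompt_dangerous_approval(command, approval_callback=None,
                              pt_active=_prompt_toolkit_active):
    """Ask for approval; deny instead of blocking on input() under a TUI."""
    if approval_callback is not None:
        # A registered callback always wins over the guard.
        return approval_callback(command)
    if pt_active():
        # input() can never receive the user's Enter here, so fail closed.
        logger.warning("no approval callback under prompt_toolkit; denying %r", command)
        return "deny"
    answer = input(f"Run dangerous command {command!r}? [y/N] ")
    return "once" if answer.strip().lower() == "y" else "deny"
```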

Regression guards:
- tests/run_agent/test_background_review.py verifies the review
  worker thread sees a callable auto-deny callback mid-run and that
  the slot is cleared in the finally block.
- tests/tools/test_approval.py TestFailClosedUnderPromptToolkit
  verifies prompt_dangerous_approval returns 'deny' fast under a
  mocked pt Application, and that a real callback still wins over
  the guard.
2026-04-27 06:42:32 -07:00
Andrew Ho 0046d170dc fix(agent): propagate approval callbacks to concurrent tool worker threads
When tools execute concurrently via ThreadPoolExecutor, worker threads
could not see the thread-local approval/sudo callbacks registered by
the CLI. This caused dangerous-command prompts to fall back to plain
input(), which deadlocks against prompt_toolkit's raw terminal mode.

Capture parent-thread callbacks before launching workers, register
them locally in each _run_tool thread, and clear them on exit.

Mirrors the existing fix pattern from cli.py run_agent() for the
main agent worker thread (GHSA-qg5c-hvr5-hjgr / #13617).
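
A minimal sketch of the capture-register-clear pattern described above, using a hypothetical thread-local slot and helper names rather than the agent's real ones:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_tls = threading.local()


def set_approval_callback(cb) -> None:
    _tls.approval_callback = cb


def get_approval_callback():
    return getattr(_tls, "approval_callback", None)


def run_tools_concurrently(tools):
    """Run tool callables in a pool, giving each worker the parent's callback."""
    # Capture in the parent thread, where the CLI registered the callback.
    parent_cb = get_approval_callback()

    def _run_tool(tool):
        set_approval_callback(parent_cb)   # register in the worker thread
        try:
            return tool()
        finally:
            set_approval_callback(None)    # clear so pooled threads stay clean

    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(_run_tool, tools))
```

Thread-local values are not inherited by new threads, which is why the callback has to be read before the pool starts and re-registered inside each worker.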
2026-04-27 06:42:32 -07:00
luyao618 8ad29a938a fix(agent): restrict background review agent to memory and skills toolsets
The background skill/memory review agent was created without toolset
restrictions, inheriting the full default tool set. This allowed it to
use terminal, send_message, delegate_task, and other tools outside its
intended scope, potentially performing unrelated side effects after
skill creation.

Restrict the review agent to only memory and skills toolsets by passing
enabled_toolsets=['memory', 'skills'] during AIAgent construction.

Fixes #15204
2026-04-27 06:41:23 -07:00
Teknium a59a98b180 fix(cli): pass session messages to shutdown_memory_provider (#15165 sibling)
The gateway fix in the previous commit forwards _session_messages on
gateway session teardown.  The CLI exit cleanup path had the same bug:
it read getattr(agent, 'conversation_history', None) or [] — but AIAgent
has no conversation_history attribute, so providers always received [].

Switch to _session_messages (same attribute the gateway now uses),
guarded by isinstance(..., list) to preserve the no-arg fallback for
MagicMock-based CLI test stubs.

Adds tests/cli/test_cli_shutdown_memory_messages.py (4 cases mirroring
the gateway suite).
2026-04-27 06:41:16 -07:00
briandevans 500774e30e fix(gateway): pass session messages to shutdown_memory_provider (#15165)
``_cleanup_agent_resources`` previously invoked
``agent.shutdown_memory_provider()`` with no arguments, so every memory
provider's ``on_session_end`` hook received an empty list. Providers
with an early-return guard on empty input (Holographic, Hindsight) never
extracted facts from the conversation, and users hit
"抱歉,找不到相關的對話記錄" ("Sorry, no relevant conversation record was
found") on the first turn after any gateway
restart, session reset, or idle expiry.

Forward ``agent._session_messages`` — the transcript the agent itself
maintains and refreshes every turn via ``_persist_session`` — so
providers see the actual conversation. Falls back to the legacy no-arg
call whenever the attribute is absent or not a list (test stubs built
via ``object.__new__`` or ``MagicMock``) to preserve backward
compatibility with existing suites. ``AIAgent.shutdown_memory_provider``
already accepts ``messages: list = None`` (run_agent.py:4126), so this
is a pure caller-side fix.
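
The caller-side guard can be sketched as below. The wrapper name `shutdown_memory` is hypothetical; the attribute name `_session_messages` and the no-arg fallback follow the description above.

```python
def shutdown_memory(agent) -> None:
    """Forward the session transcript when it is a real list, else use the legacy call."""
    messages = getattr(agent, "_session_messages", None)
    if isinstance(messages, list):
        # Empty lists are still forwarded; only non-list values fall back.
        agent.shutdown_memory_provider(messages)
    else:
        # Test stubs (e.g. MagicMock attributes) keep the old no-arg behaviour.
        agent.shutdown_memory_provider()
```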

Paths that use ``skip_memory=True`` temporary agents (memory flush,
hygiene auto-compress, ``/compress``) are no-ops inside
``shutdown_memory_provider`` because ``self._memory_manager`` is None —
no behaviour change for them.

Covers Part A of the bug report. Part B (adding ``on_session_end`` to
the Hindsight plugin) is a separate concern that would benefit from
this fix landing first.

Regression test added at
``tests/gateway/test_shutdown_memory_provider_messages.py`` covering:
populated messages forwarded, empty list still forwarded, attribute
missing falls back, non-list (MagicMock) falls back, provider
exceptions don't block ``close()``, None agent no-op, and agent
without ``shutdown_memory_provider`` tolerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 06:41:16 -07:00
teknium1 c4ad2c33f4 chore(release): map christian@scheid.tech -> scheidti 2026-04-27 06:41:11 -07:00
Christian Scheid 75b460bc94 fix(email): add required Date header to outbound mail 2026-04-27 06:41:11 -07:00
Teknium a9033c9220 feat(backup): exclude checkpoints/ from backups (#16572)
Session-local trajectory cache — keyed by session hash, regenerated
per-session, won't port to another machine anyway. On a large install
this was multiple GB of pure noise in every zip.

Also adds a regression test for the pre-existing backups/ exclusion
so the two machine-local dirs share coverage.
2026-04-27 06:40:18 -07:00
Teknium ea3c5a14c3 feat(update): make pre-update backup opt-in (off by default) (#16566)
The zip backup could add minutes to every 'hermes update' on large
HERMES_HOME directories. Flip the default to off and add a --backup
flag for one-off opt-in runs.

- updates.pre_update_backup default: True -> False
- hermes update: new --backup flag (opposite of existing --no-backup)
- Silent no-op when disabled (no message spam on every update)
- Existing --no-backup still works and wins over --backup
- Users who explicitly set pre_update_backup: true keep the old behavior
- Tests updated to cover default-off, --backup opt-in, and config-enabled paths
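
The precedence rules in the list above can be sketched with argparse. The parser wiring and the `should_backup` helper are hypothetical; the flag names and the `updates.pre_update_backup` key come from this commit.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="hermes-update")
    parser.add_argument("--backup", action="store_true")
    parser.add_argument("--no-backup", action="store_true", dest="no_backup")
    return parser


def should_backup(args, config: dict) -> bool:
    """--no-backup wins, then --backup, then the config value (default off)."""
    if args.no_backup:
        return False
    configured = config.get("updates", {}).get("pre_update_backup", False)
    return args.backup or bool(configured)
```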
2026-04-27 06:36:35 -07:00
Teknium ec671c4154 feat(image-input): native multimodal routing based on model vision capability (#16506)
* feat(image-input): native multimodal routing based on model vision capability

Attach user-sent images as OpenAI-style content parts on the user turn when
the active model supports native vision, so vision-capable models see real
pixels instead of a lossy text description from vision_analyze.

Routing decision (agent/image_routing.py::decide_image_input_mode):

  agent.image_input_mode = auto | native | text  (default: auto)

In auto mode:
  - If auxiliary.vision.provider/model is explicitly configured, keep the
    text pipeline (user paid for a dedicated vision backend).
  - Else if models.dev reports supports_vision=True for the active
    provider/model, attach natively.
  - Else fall back to text (current behaviour).
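
The auto-mode decision above can be sketched as a small pure function. The name and signature are hypothetical and do not reflect the real `decide_image_input_mode` in agent/image_routing.py; the three inputs stand in for the config value, the auxiliary vision setting, and the models.dev capability lookup.

```python
def decide_mode_sketch(configured_mode: str,
                       aux_vision_configured: bool,
                       model_supports_vision: bool) -> str:
    """Return 'native' or 'text' following the auto-mode rules."""
    if configured_mode in ("native", "text"):
        return configured_mode      # explicit setting bypasses auto logic
    if aux_vision_configured:
        return "text"               # honour the dedicated vision backend
    if model_supports_vision:
        return "native"             # attach real image parts
    return "text"                   # fall back to text description
```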

Call sites updated: gateway/run.py (all messaging platforms), tui_gateway
(dashboard/Ink), cli.py (interactive /attach + drag-drop).

run_agent.py changes:
  - _prepare_anthropic_messages_for_api now passes image parts through
    unchanged when the model supports vision — the Anthropic adapter
    translates them to native image blocks. Previous behaviour
    (vision_analyze → text) only runs for non-vision Anthropic models.
  - New _prepare_messages_for_non_vision_model mirrors the same contract
    for chat.completions and codex_responses paths, so non-vision models
    on any provider get text-fallback instead of failing at the provider.
  - New _model_supports_vision() helper reads models.dev caps.

vision_analyze description rewritten: positions it as a tool for images
NOT already visible in the conversation (URLs, tool output, deeper
inspection). Prevents the model from redundantly calling it on images
already attached natively.

Config default: agent.image_input_mode = auto.

Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py),
all existing tests that reference _prepare_anthropic_messages_for_api
still pass (198 targeted + new tests green).

* feat(image-input): size-cap + resize oversized images, charge image tokens in compressor

Two follow-ups that make the native image routing safer for long / heavy
sessions:

1) Oversize handling in build_native_content_parts:
   - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES,
     the most restrictive provider — Gemini inline data).
   - Delegates to vision_tools._resize_image_for_vision (Pillow-based,
     already battle-tested) to downscale to 5 MB first-try.
   - If Pillow is missing or resize still overshoots, the image is
     dropped and reported back in skipped[]; caller falls back to text
     enrichment for that image.

2) Image-token accounting in context_compressor:
   - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant;
     within the realistic range for Anthropic/GPT-4o/Gemini billing).
   - _content_length_for_budget() helper: sums text-part lengths and
     charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/
     input_image part.  Base64 payload inside image_url is NOT counted
     as chars — dimensions don't matter, only image-presence.
   - Both tail-cut sites (_prune_old_tool_results L527 and
     _find_tail_cut_by_tokens L1126) now call the helper so multi-image
     conversations don't slip past compression budget.
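
A minimal sketch of the flat per-image charge described in item 2, assuming OpenAI-style content parts. The function body is an illustration, not the compressor's actual helper; the constants mirror the values named above.

```python
_IMAGE_TOKEN_ESTIMATE = 1600
_CHARS_PER_TOKEN = 4
_IMAGE_CHAR_EQUIVALENT = _IMAGE_TOKEN_ESTIMATE * _CHARS_PER_TOKEN
_IMAGE_PART_TYPES = {"image", "image_url", "input_image"}


def content_length_for_budget(content) -> int:
    """Character-equivalent size of a message's content for budgeting."""
    if isinstance(content, str):
        return len(content)
    total = 0
    for part in content or []:
        if not isinstance(part, dict):
            continue
        if part.get("type") in _IMAGE_PART_TYPES:
            # Flat charge per image; the base64 payload is never counted.
            total += _IMAGE_CHAR_EQUIVALENT
        else:
            total += len(part.get("text") or "")
    return total
```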

Tests: 9 new in test_image_routing.py (oversize triggers resize,
resize-fails-returns-None, oversize-skipped-reported), 11 new in
test_compressor_image_tokens.py (flat charge per image, multiple images,
Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on
raw base64, bounds-check on the constant, integration test that an
image-heavy tail actually gets trimmed).

* fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits

The previous commit imposed a hardcoded 20 MB base64 ceiling on all
providers, triggering auto-resize on anything larger. This was wrong in
both directions:

  * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400
    'image exceeds 5 MB maximum' above that).
  * Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without
    complaint (empirically verified April 2026 with progressive PNG
    sizes).

New behaviour:

  * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a
    ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder).
  * Providers NOT in the table get no ceiling — images attach at native
    size and we trust the provider to return its own error if it
    disagrees. A provider-specific 400 message is clearer than us
    guessing wrong and silently degrading image quality.
  * build_native_content_parts() gains a keyword-only provider arg;
    gateway/CLI/TUI pass the active provider so Anthropic users get
    auto-resize protection while OpenAI users don't pay it.
  * Resize target dropped from 5 MB to 4 MB to slide safely under
    Anthropic's boundary with header overhead.

Empirical measurements (direct API, no Hermes in the loop):

    image b64     anthropic   openrouter/gpt5.5   codex-oauth/gpt5.5
    0.19 MB       ✓           ✓                   ✓
    12.37 MB      ✗ 400 5MB   ✓                   ✓
    23.85 MB      ✗ 400 5MB   ✓                   ✓
    49.46 MB      ✗ 413       ✓                   ✓

Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through,
Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts
routes ceiling by provider, unknown provider gets no ceiling. All 52
targeted tests pass.

* refactor(image-input): attempt native, shrink-and-retry on provider reject

Replace proactive per-provider size ceilings with a reactive shrink path
on the provider's actual rejection. All providers now attempt native
full-size attachment first; if the provider returns an image-too-large
error, the agent silently shrinks and retries once.

Why the previous design was wrong: hardcoding provider ceilings
(anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image
paid no tax, but Anthropic users lost quality on anything >5MB even
though the empirical behaviour at provider-reject time is the same
(shrink + retry). Baking the table into the routing layer also
requires updating Hermes every time a provider's limit changes.

Reactive design:
  - image_routing.py: _file_to_data_url encodes native size, no ceiling.
    build_native_content_parts drops its provider kwarg.
  - error_classifier.py: new FailoverReason.image_too_large + pattern
    match ("image exceeds", "image too large", etc.) checked BEFORE
    context_overflow so Anthropic's 5MB rejection lands in the right
    bucket.
  - run_agent.py: new _try_shrink_image_parts_in_messages walks api
    messages in-place, re-encodes oversized data: URL image parts
    through vision_tools._resize_image_for_vision to fit under 4MB,
    handles both chat.completions (dict image_url) and Responses
    (string image_url) shapes, ignores http URLs (provider-fetched).
    New image_shrink_retry_attempted flag in the retry loop fires the
    shrink exactly once per turn after credential-pool recovery but
    before auth retries.
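
The shrink-exactly-once retry can be sketched as below. The exception class, function name, and callbacks are hypothetical stand-ins; in the real code the rejection is classified from the provider's error message and the loop also handles credential and auth retries.

```python
class ImageTooLarge(Exception):
    """Stand-in for a provider rejection classified as image_too_large."""


def call_with_image_shrink_retry(send, messages, shrink):
    """Attempt native size first; on rejection, shrink in place and retry once."""
    shrink_attempted = False
    while True:
        try:
            return send(messages)
        except ImageTooLarge:
            # Give up if we already shrank, or if nothing could be shrunk.
            if shrink_attempted or not shrink(messages):
                raise
            shrink_attempted = True
```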

E2E verified live against Anthropic claude-sonnet-4-6:
  - 17.9MB PNG (23.9MB b64) attached at native size
  - Anthropic returns 400 "image exceeds 5 MB maximum"
  - Agent logs '📐 Image(s) exceeded provider size limit — shrank and
    retrying...'
  - Retry succeeds, correct response delivered in 6.8s total.

Tests: 12 new (8 shrink-helper shapes + 4 classifier signals),
replaces 5 proactive-ceiling tests with 3 simpler 'native attach works'
tests. 181 targeted tests pass. test_enum_members_exist in
test_error_classifier.py updated for the new enum value.
2026-04-27 06:27:59 -07:00
Teknium df3c9593f8 feat(plugins): google_meet - join, transcribe, speak, follow up (#16364)
* feat(plugins): google_meet — bundled plugin for join+transcribe Meet calls

v1 shipping transcribe-only. Spawns headless Chromium via Playwright,
joins an explicit https://meet.google.com/ URL, enables live captions,
and scrapes them into a transcript file the agent can read across turns.
The agent then has the meeting content in context and can do followup
work (send recap, file issues, schedule followups) with its regular tools.

Surface:
  - Tools: meet_join, meet_status, meet_transcript, meet_leave, meet_say
    (meet_say is a v1 stub — returns not-implemented; v2 will wire
    realtime duplex audio via OpenAI Realtime / Gemini Live +
    BlackHole / PulseAudio null-sink.)
  - CLI: hermes meet setup | auth | join | status | transcript | stop
  - Lifecycle: on_session_end auto-leaves any still-running bot.

Safety:
  - URL regex rejects anything that isn't https://meet.google.com/...
  - No calendar scanning, no auto-dial, no auto-consent announcement.
  - Single active meeting per install; a second meet_join leaves the first.
  - Platform-gated to Linux + macOS (Windows audio routing for v2 untested).
  - Opt-in: standalone plugin, user must add 'google_meet' to
    plugins.enabled in config.yaml.
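
The URL safety gate listed above can be sketched roughly as follows. This is an illustrative assumption, not the plugin's actual code: the name `is_safe_meet_url`, the pattern, and the assumed three-four-three letter meeting-code shape are all hypothetical.

```python
import re

# Hypothetical pattern: https only, exact host, and an assumed
# xxx-yyyy-zzz meeting code, optionally followed by a query string.
_MEET_URL_RE = re.compile(
    r"https://meet\.google\.com/[a-z]{3}-[a-z]{4}-[a-z]{3}(?:\?.*)?"
)


def is_safe_meet_url(url: str) -> bool:
    """Return True only for explicit https://meet.google.com/ meeting links."""
    # fullmatch() anchors both ends, so look-alike hosts and URLs that merely
    # embed a Meet link in a query parameter are rejected.
    return _MEET_URL_RE.fullmatch(url.strip()) is not None
```

Anchoring the whole string, rather than searching for the host anywhere in it, is what keeps redirect-style URLs from passing.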

Zero core changes. Plugin uses existing register_tool /
register_cli_command / register_hook surfaces. 21 new unit tests cover the
URL safety gate, transcript dedup + status round-trip, process-manager
refusals/start/stop paths, tool-handler JSON shape under each branch,
session-end cleanup, and platform-gated register().

* feat(plugins/google_meet): v2 realtime audio + v3 remote node host

v2 — agent speaks in-meeting
  audio_bridge.py: PulseAudio null-sink (Linux) + BlackHole probe (macOS).
    On Linux we load pactl module-null-sink + module-virtual-source, track
    module ids for teardown; Chrome gets PULSE_SOURCE=<virt src> env so its
    fake mic reads what we write to the sink. macOS just probes BlackHole
    2ch and returns its device name — the plugin refuses to switch the
    user's default audio input (that would surprise them).
  realtime/openai_client.py: sync WebSocket client for the OpenAI Realtime
    API. RealtimeSession.speak(text) sends conversation.item.create +
    response.create, accumulates response.audio.delta PCM bytes, appends
    them to a file. RealtimeSpeaker runs a JSONL-queue loop consuming
    meet_say calls. 'websockets' is an optional dep imported lazily.
  meet_bot.py: when HERMES_MEET_MODE=realtime, provisions AudioBridge,
    starts RealtimeSession + speaker thread, spawns paplay to pump PCM
    into the null-sink, then cleans everything up on SIGTERM. If any
    realtime setup step fails, falls back cleanly to transcribe mode
    with an error flagged in status.json.
  process_manager.enqueue_say(): writes a JSONL line to say_queue.jsonl;
    refuses when no active meeting or active meeting is transcribe-only.
  tools.meet_say: real implementation; requires active mode='realtime'.
  meet_join: adds mode='transcribe'|'realtime' param.

v3 — remote node host
  node/protocol.py: JSON envelope (type, id, token, payload) + validate.
  node/registry.py: $HERMES_HOME/workspace/meetings/nodes.json, with
    resolve() auto-selecting the sole registered node when name is None.
  node/server.py: NodeServer — websockets.serve, bearer-token auth,
    dispatches start_bot/stop/status/transcript/say/ping onto the local
    process_manager. Token auto-generated + persisted on first run.
  node/client.py: NodeClient — short-lived sync WS per RPC, raises
    RuntimeError on error envelopes, clean API matching the server.
  node/cli.py: 'hermes meet node {run,list,approve,remove,status,ping}'
    subtree; wired into the main meet CLI by cli.py so 'hermes meet node'
    Just Works.
  tools.py: every meet_* tool accepts node='<name>'|'auto'; when set,
    routes through NodeClient to the remote bot instead of running
    locally. Unknown node → clear 'no registered meet node matches ...'
    error.
  cli.py: 'hermes meet join --node my-mac --mode realtime' and
    'hermes meet say "..." --node my-mac' route to the node; 'hermes
    meet node approve <name> <url> <token>' registers one.

Tests
  21 v1 tests updated (meet_say is no longer a stub; active-record now
    carries mode).
  20 new audio_bridge + realtime tests.
  42 new node tests (protocol/registry/server/client/cli).
  17 new v1/v2/v3 integration tests at the plugin level covering
    enqueue_say edge cases, env var passthrough, mode validation, node
    routing (known/unknown/auto/ambiguous), and argparse wiring for
    `hermes meet say` + `hermes meet node` + --mode/--node flags.
  Total: 100 plugin tests + 58 plugin-system tests = 158 passing.

E2E verified on Linux with fresh HERMES_HOME: plugin loads, 5 tools
register, on_session_end hook wires, 'hermes meet' CLI tree wires
including the node subtree, NodeRegistry round-trips, meet_join routes
correctly to NodeClient under node='my-mac' with mode='realtime',
enqueue_say accepts realtime/rejects transcribe, argparse parses every
new flag cleanly.

Zero changes to core. All new code lives under plugins/google_meet/.

* feat(plugins/google_meet): auto-install, admission detect, mac PCM pump, barge-in, richer status

Ready-for-live-test follow-up on PR #16364. Five additions that matter for
the first live run on a real Meet, in priority order:

1. hermes meet install [--realtime] [--yes]
   pip install playwright websockets + python -m playwright install chromium
   --realtime: installs platform audio deps (pulseaudio-utils on Linux via
   sudo apt, blackhole-2ch + ffmpeg on macOS via brew). Prompts before
   sudo/brew unless --yes. Refuses on Windows. Refuses to auto-flip the
   macOS default input — user still selects BlackHole in System Settings
   (deliberate; surprise audio rerouting is worse than a manual step).

2. Admission detection
   _detect_admission(page): Leave-button visible OR caption region
   attached OR participants list present → we're in-call.
   _detect_denied(page): 'You can\'t join this video call' / 'You were
   removed' / 'No one responded to your request' → bail out.
   HERMES_MEET_LOBBY_TIMEOUT (default 300s) caps how long we sit in
   the lobby before giving up. in_call stays False until admitted.
   Status surfaces leaveReason: duration_expired | lobby_timeout |
   denied | page_closed.

3. macOS PCM pump
   ffmpeg reads speaker.pcm (24kHz s16le mono) and writes to the
   BlackHole AVFoundation output via -f audiotoolbox
   -audio_device_index <N>. _mac_audio_device_index() probes
   ffmpeg -f avfoundation -list_devices true to resolve 'BlackHole 2ch'
   → numeric index. Falls back to index 0 on probe failure. Linux
   paplay pump unchanged.

4. Richer status dict
   _BotState now tracks realtime, realtimeReady, realtimeDevice,
   audioBytesOut, lastAudioOutAt, lastBargeInAt, joinAttemptedAt,
   leaveReason. RealtimeSession.audio_bytes_out / last_audio_out_at
   counters fold into the status file once a second so meet_status()
   can show the agent's voice activity in near-real-time.

5. Barge-in
   RealtimeSession.cancel_response() sends type='response.cancel' over
   the same WS (lock-guarded so it's safe to call from the caption
   thread while speak() is reading frames). Handles response.cancelled
   as a terminal frame type. _looks_like_human_speaker() gates triggers
   so the bot's own name, 'You', 'Unknown', and blanks don't self-cancel.
   Called from the caption drain loop: when a new caption arrives
   attributed to a real participant while rt.session exists, we fire
   cancel_response() and stamp lastBargeInAt.

Tests: 20 new unit tests across _BotState telemetry, barge-in gating,
admission/denied probe error handling, cancel_response with and without
a connected WS, and `hermes meet install` CLI wiring (flag parsing +
end-to-end subprocess.run verification + Linux-already-installed fast
path). Total 171 passing across all google_meet test files + the
plugin-system regression suite.

E2E verified on Linux: plugin loads, all 5 tools register,
`hermes meet install --realtime --yes` parses, fresh-bot status.json
has every new telemetry key, cancel_response on a disconnected session
returns False without raising, barge-in helper gates the bot's own
name correctly.

Still out of scope (for a future PR, not blocking live test):
mic → Realtime duplex (the agent listening to meeting audio via
WebRTC), node-host TLS/pairing UX, Windows audio, Meet create+Twilio.

Docs updated: SKILL.md now lists the installer subcommand, lobby
timeout, barge-in caveat, and the full status-dict reference table.
README.md quick-start uses hermes meet install.
2026-04-27 06:22:25 -07:00
Teknium 8ed599dc05 feat(update): auto-backup HERMES_HOME before hermes update (#16539)
Every 'hermes update' now runs a full backup of ~/.hermes/ first, so
users can always roll back to the exact state they had before the
update if anything goes wrong (corrupted sessions.db, broken skills,
config migrations that don't round-trip, etc.).

Changes:
- hermes_cli/backup.py: new create_pre_update_backup() helper. Writes
  to <HERMES_HOME>/backups/pre-update-<stamp>.zip using the same
  exclusion rules and SQLite safe-copy as 'hermes backup'. Auto-rotates
  (keep last N, pre-update-*.zip only — hand-dropped zips in backups/
  are untouched). Adds 'backups' to _EXCLUDED_DIRS so subsequent backups
  don't nest prior ones.
- hermes_cli/main.py: _run_pre_update_backup() wired into
  _cmd_update_impl before any git operation. Prints save path, restore
  command, and how to disable. Swallows failures so a broken backup
  never blocks the update itself. New --no-backup flag on 'hermes
  update' for one-off override.
- hermes_cli/config.py: new 'updates' section in DEFAULT_CONFIG with
  pre_update_backup (default true) and backup_keep (default 5).
  Auto-surfaces in the dashboard config UI.
- tests/hermes_cli/test_backup.py: +11 tests covering backup location,
  content parity with 'hermes backup', no-recursion, rotation, manual
  file preservation, config gate, --no-backup flag, flag-wins-over-config.
2026-04-27 05:36:19 -07:00
Teknium 920ebd8303 feat(prompt): point agent at hermes-agent skill + docs site for Hermes questions (#16535)
Adds a short always-on pointer to the system prompt: when the user asks
about configuring, setting up, troubleshooting, or using Hermes Agent
itself, load the hermes-agent skill via skill_view(name='hermes-agent')
and fall back to https://hermes-agent.nousresearch.com/docs via
web_extract. Keeps sessions without skill_view loaded useful too — the
docs URL + web_extract is enough to answer most questions.

The guidance is appended right after DEFAULT_AGENT_IDENTITY (or SOUL.md)
so it ships regardless of which toolset profile is active. Footprint is
~560 chars, behind the existing prompt cache.
2026-04-27 05:35:55 -07:00
Teknium bb00b783fb fix(cli): eliminate ghost status-bar + DSR input leaks from terminal drift
The CLI renders through prompt_toolkit in non-full-screen mode, so every
repaint uses the renderer's tracked _cursor_pos.y to cursor_up() + erase
before drawing the new frame. Any time that tracked position drifts from
terminal reality, redraws stack on top of stale content instead of
overwriting it. Four user-visible bugs share this root cause.

Fixes:

- #5474 (SIGWINCH ghosts): the resize wrapper previously only handled
  column-shrink reflow. Generalize it to force a full screen-clear
  (erase_screen + cursor_goto(0,0)) and renderer.reset() on every resize
  — covers widen, row-shrink, and multiplexer SIGWINCH-less redraws.

- #8688 (cmux/tmux tab switch): no SIGWINCH fires on focus regain, so
  prompt_toolkit has no signal to recover. Add a _force_full_redraw()
  helper, bound to Ctrl+L (standard bash/zsh/vim convention) and exposed
  as /redraw. Users can manually clear drift without restarting Hermes.

- #14692 (DSR response leaks — ^[[53;1R): resize storms make
  prompt_toolkit's CSI 6n queries race past the input parser; the
  terminal's reply ends up as literal input text. Add a sibling of the
  bracketed-paste sanitizer that strips \x1b[<row>;<col>R and the
  caret-escape visible form from paste text, buffer text-filter, and
  the input-processing loop.
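
The DSR sanitizer described in the last bullet can be approximated with one regular expression. This is a sketch under assumptions: the function name is hypothetical, and the real sanitizer may match a wider set of sequences.

```python
import re

# Raw form: ESC [ <row> ; <col> R (a cursor position report).
# Visible form: the literal characters '^[' standing in for ESC, then '[...R',
# which is how the leak shows up as text, e.g. ^[[53;1R.
_DSR_RE = re.compile(r"(?:\x1b\[|\^\[\[)\d+;\d+R")


def strip_dsr_responses(text: str) -> str:
    """Remove leaked cursor-position reports from user input text."""
    return _DSR_RE.sub("", text)
```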

The idle-redraw removal (#12641) is in the preceding commit from
@foxion37 — keeping them as separate commits preserves attribution.
2026-04-27 05:31:47 -07:00
Q 5e92b67807 fix: stop idle CLI redraws 2026-04-27 05:31:47 -07:00
Teknium ee1a07f9e9 fix(agent): block cross-provider reasoning leak to DeepSeek/Kimi (#15748) (#16500)
On provider switches mid-session (e.g. MiniMax -> DeepSeek), the source
assistant turn carries a 'reasoning' field written by the prior provider
but no 'reasoning_content' key. _copy_reasoning_content_for_api would
promote that foreign 'reasoning' to 'reasoning_content' on the outbound
DeepSeek request, leaking a cross-provider chain of thought and in
practice causing HTTP 400.

DeepSeek's own _build_assistant_message always pins reasoning_content=''
at creation time for tool-call turns, so the shape (reasoning set,
reasoning_content absent, tool_calls present) is unreachable from
same-provider DeepSeek history — it can only come from a prior provider.
Pad with '' in that case instead of promoting.

Healthy same-provider 'reasoning' promotion (no tool_calls, or on
providers that do not require the empty-string pin) is unchanged.
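
The decision described above can be sketched as a small pure function. It assumes the outbound provider is one that requires the empty-string pin (DeepSeek/Kimi); the function name and the plain-dict message shape are illustrative, not the real `_copy_reasoning_content_for_api`.

```python
def reasoning_content_for_api(message: dict) -> dict:
    """Return a copy of an assistant message prepared for a pin-requiring provider."""
    out = dict(message)
    if "reasoning_content" in out:
        # Same-provider history already carries the field; leave it alone.
        return out
    reasoning = out.get("reasoning")
    if reasoning and out.get("tool_calls"):
        # Shape only reachable from a prior provider: pad instead of promoting,
        # so the foreign chain of thought is not sent.
        out["reasoning_content"] = ""
    elif reasoning:
        # Healthy promotion path (no tool calls) is unchanged.
        out["reasoning_content"] = reasoning
    return out
```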
2026-04-27 04:06:23 -07:00
Teknium 65f648ee84 fix(website): auto-wrap ASCII-art code blocks in generated skill pages (#16497)
Defensive: when the generator encounters a fenced code block containing
Unicode box-drawing characters, wrap it in `<!-- ascii-guard-ignore -->`
markers so the docs-site-checks lint (which scans inside code fences)
can't reject the page for a skill's own diagram.

Plain bash/python code blocks stay uncluttered — only blocks with box
chars get wrapped. Skill authors no longer have to remember to add the
ignore markers in every SKILL.md with ASCII art.

Fixes #15305.
2026-04-27 03:38:39 -07:00
Wysie 64a497bfa9 fix(hindsight): preserve setup config on blank input 2026-04-27 03:34:58 -07:00
Teknium 90a3e73daf fix(debug): sweep expired paste.rs uploads on a real timer (#16431)
Previously 'hermes debug share' uploads only got DELETEd when the user
ran 'hermes debug share' again — opportunistic-sweep-on-invoke was the
only cleanup path. A user who uploaded once and never ran debug again
left pastes up until paste.rs's retention kicked in (which, empirically,
never actually expires them).

Hook _sweep_expired_pastes into the gateway cron ticker at the same
hourly cadence as the image/document cache cleanups. The opportunistic
sweep in 'hermes debug share' stays as a fallback for CLI-only users
who never start the gateway.
2026-04-27 00:36:33 -07:00
vominh1919 2e6699b319 fix: strip leaked declare-x env dump from terminal output on macOS (#15459)
On macOS (bash 3.2 and some Homebrew bash builds) `source`ing a file that
contains `declare -x` statements prints each declaration to stdout. The
persistent-shell wrapper in tools/environments/base.py was only redirecting
stderr when sourcing the session snapshot, so ~60 lines of env vars leaked
into every terminal tool response — blowing out context and triggering
HTTP 400s on context-limited providers.

Fix: redirect both stdout and stderr when sourcing the snapshot. Linux
bash is silent here, so the redirect is harmless there; macOS no longer
leaks.

Closes #15459

Co-authored-by: Sanjays2402 <51058514+Sanjays2402@users.noreply.github.com>
2026-04-27 00:19:48 -07:00
Teknium 21f503c23c feat(update): snapshot pairing data before git pull (#16383)
Quick state snapshot now includes pairing JSONs (generic + legacy +
Feishu comment pairing), and `hermes update` takes a pre-update
snapshot labeled `pre-update` before pulling.

Pairing data lives outside state.db in platform-specific JSONs under
~/.hermes/pairing/, ~/.hermes/platforms/pairing/, and
~/.hermes/feishu_comment_pairing.json.  The update command already
couldn't touch $HERMES_HOME, but #15733 reports lost pairing after
an update — this gives users something to restore from via
`/snapshot list` / `/snapshot restore <id>` if anything clobbers
the approved-user lists.

- Extend _QUICK_STATE_FILES with pairing paths (files + dirs)
- Snapshot walks directories recursively and records each file in the
  manifest individually so restore logic is unchanged
- _cmd_update_impl calls create_quick_snapshot(label='pre-update')
  after 'Found N new commits' and before 'Pulling updates'
- Snapshot failures are logged at debug and never block the update

Refs #15733.
2026-04-27 00:19:12 -07:00
Teknium a32d07529c fix(file-tools): escalate to BLOCKED on repeated read_file dedup stubs (#16382)
read_file's dedup path returned a lightweight stub on re-reads of an
unchanged file, then returned early — so the consecutive-read loop
guard (hard block at count>=4) at the bottom of read_file_tool never
ran for stub-looped calls. Weaker tool-following models (local Qwen3.6
variants in the reported case) ignore the passive 'refer to earlier
result' hint and hammer the same read_file call until iteration budget
runs out.

Track per-key stub returns in task_data['dedup_hits'] and, on the
second stub for the same (path, offset, limit), return a hard BLOCKED
error mirroring the wording the real-read path already uses. A real
read, an intervening non-read tool call (notify_other_tool_call), or
reset_file_dedup (on context compression) all clear the counter so
the guard never stays engaged longer than the actual loop.
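
A rough sketch of the per-key counter, assuming `task_data` is a plain dict. The helper names and the "STUB"/"BLOCKED" return values are placeholders; only the `dedup_hits` key, the (path, offset, limit) tuple, and the second-stub threshold come from the description above.

```python
def record_dedup_stub(task_data: dict, path: str, offset: int, limit: int) -> str:
    """Count a dedup stub for this read and report whether to escalate."""
    hits = task_data.setdefault("dedup_hits", {})
    key = (path, offset, limit)
    hits[key] = hits.get(key, 0) + 1
    # The second stub for the same key escalates to a hard block.
    return "BLOCKED" if hits[key] >= 2 else "STUB"


def clear_dedup_hits(task_data: dict) -> None:
    """Reset the counter after a real read, another tool call, or compression."""
    task_data.pop("dedup_hits", None)
```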

Closes #15759
2026-04-27 00:17:26 -07:00
alberto 3ff3dfb5ac fix(telegram): accept /cmd@botname from bot menu in groups
Telegram groups emit a single bot_command entity covering the whole
/cmd@botname span with no accompanying mention entity, so the existing
mention gate in _message_mentions_bot dropped slash commands sent via
the bot-menu autocomplete whenever require_mention is enabled.

Recognise bot_command entities whose @botname suffix matches the bot
username (case-insensitive) as a direct mention, and keep rejecting
commands addressed at other bots. Fixes #15415.
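
The entity check can be sketched as below. Entities are modelled as plain dicts with `type`, `offset`, and `length`, and offsets are treated as Python string indices, which is a simplification (Telegram counts UTF-16 code units). The function name is hypothetical.

```python
def command_addresses_bot(text: str, entities: list[dict], bot_username: str) -> bool:
    """Return True if a bot_command entity carries an @suffix naming this bot."""
    wanted = bot_username.lstrip("@").casefold()
    for ent in entities:
        if ent.get("type") != "bot_command":
            continue
        span = text[ent["offset"]: ent["offset"] + ent["length"]]
        _, sep, suffix = span.partition("@")
        # A bare /cmd has no suffix and is not treated as a mention here;
        # /cmd@otherbot is rejected because the suffix does not match.
        if sep and suffix.casefold() == wanted:
            return True
    return False
```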
2026-04-26 22:00:18 -07:00
Teknium 8258f4dcb7 fix(model): avoid persisting key_env-resolved secrets to providers entry (#16372)
When 'hermes model' runs against a providers: (keyed-schema) entry that
relies only on key_env, the picker resolved the env var for the live
/models request and then wrote a synthesized 'api_key: ${KEY_ENV}' back
to the providers.<key> entry. That's redundant — the runtime already
resolves from key_env directly — and it clutters configs that
intentionally keep credentials out of config.yaml.

Only persist provider_entry['api_key'] when the user originally had an
inline value (literal secret or ${VAR} template). Entries that declared
only key_env stay clean on save.

Fixes #15803.
2026-04-26 21:52:12 -07:00
Teknium 9f1b1977bc docs(skills): salvage dropped trigger content into skill bodies
For 14 of 74 compressed skills, the original description contained
trigger keywords, technique counts, attribution, or use-case phrases
not covered by the existing body content. Prepends a 'When to use' /
'What's inside' block near the top so the agent still has the full
context when the skill is loaded.

Skills salvaged:
- codex, ascii-video, creative-ideation, excalidraw, manim-video, p5js
- gif-search, heartmula, youtube-content
- lm-evaluation-harness, obliteratus, vllm, axolotl
- powerpoint

Remaining 60 skills were verified to already cover the dropped content
in their existing body sections (When to Use, overview, intro prose)
or had short descriptions fully captured by the new compressed form.
2026-04-26 21:50:56 -07:00
Teknium e3921e7ca4 docs(skills): compress 74 built-in skill descriptions to <=60 chars
Target: every skill's description fits in a one-line gateway menu and
leads with trigger keywords an agent would match on. Drops filler like
'Use this skill to', 'A skill for', 'This skill provides'.

Before: max description length was 791 chars (architecture-diagram),
74 of 81 built-in skills were >60 chars.

After: max 60, mean 54, all 81 built-in skills <=60.

Rewritten with double-quoted YAML scalars to preserve Chinese/arrow
glyphs (baoyu-comic, yuanbao, youtube-content).
2026-04-26 21:50:56 -07:00
Teknium 7d586ddb42 docs(skills): trim design skill descriptions to <=60 chars + inline cross-ref
- claude-design: 'Design one-off HTML artifacts (landing, deck, prototype).' (57)
- popular-web-designs: '54 real design systems (Stripe, Linear, Vercel) as HTML/CSS.' (60)
- design-md: "Author/validate/export Google's DESIGN.md token spec files." (59)

Also adds an inline callout near the top of claude-design pointing to
popular-web-designs and design-md so the cross-reference lands even
without reading the full decision table.
2026-04-26 21:50:56 -07:00
Teknium a131c134bc chore(release): map BadTechBandit in AUTHOR_MAP 2026-04-26 21:50:56 -07:00
Teknium 55be532369 docs(skills): clarify when to use claude-design vs popular-web-designs vs design-md
- claude-design: design process + taste for one-off HTML artifacts
- popular-web-designs: 54 ready-to-paste design systems (Stripe/Linear/etc.)
- design-md: formal DESIGN.md token spec file authoring

Adds a comparison table to claude-design's 'When To Use' section and
reciprocal pointers in design-md and popular-web-designs. Also corrects
claude-design author attribution to BadTechBandit.
2026-04-26 21:50:56 -07:00
CREWorx 8c5d3a99d6 feat(skills): add claude-design HTML artifact skill 2026-04-26 21:50:56 -07:00
Teknium af3d5150c1 fix(matrix): close 'hall of mirrors' pairing + echo loop (#15763) (#16374)
Harden the Matrix adapter's sender-drop guards so bot-self events and
appservice/bridge identities never reach the gateway's pairing flow or
the agent loop.

Two filters, applied as early as possible in _on_room_message (and
_on_reaction for the self-filter):

1. _is_self_sender(sender) — case-insensitive + whitespace-trimmed
   equality with self._user_id.  When self._user_id is still empty
   (whoami has not resolved, or login failed), returns True
   defensively: an unidentified bot dropping its own events is always
   preferable to falling into an echo loop.  The previous byte-for-byte
   equality check let differently-cased copies of the bot's MXID slip
   through, and an unresolved self-ID silently disabled the guard.

2. _is_system_or_bridge_sender(sender) — drops appservice namespace
   puppets (conventional @_bridge_...:server form) and malformed
   senders with an empty localpart.  These identities used to fall
   through to the gateway's unauthorized-user path, trigger a pairing
   code, and — once an operator approved the bridge — every outbound
   message the bridge relayed would loop back as an authorized user
   message.  This was the root of the 'hall of mirrors' symptom.

Fixes #15763

Test plan
---------
scripts/run_tests.sh tests/gateway/test_matrix.py
scripts/run_tests.sh tests/gateway/test_matrix_mention.py tests/gateway/test_matrix_voice.py
All 182 tests pass.  14 new regression tests cover exact / case-insensitive
/ whitespace / unresolved-self-id matches, bridge prefix detection, empty
sender, and the full _on_room_message drop path.
2026-04-26 21:50:28 -07:00
Teknium 4a2ee6c162 fix(title-gen): surface auxiliary failures via _emit_auxiliary_failure
Closes #15775.

Title generation swallowed exceptions at debug level and returned None,
so a depleted auxiliary provider (e.g. OpenRouter 402) silently left
sessions with NULL titles. Reporter observed 45 untitled sessions
accumulated over 19 days with no user-visible indication.

- agent/title_generator.py: accept optional failure_callback, bump log
  to WARNING, invoke callback on call_llm exception (swallowing callback
  errors so nothing can crash the fire-and-forget worker thread).
- cli.py, gateway/run.py: pass agent._emit_auxiliary_failure as the
  callback so failures route through the existing user-visible warning
  channel.
- tests: cover callback fires / errors are swallowed / no-callback
  legacy behavior / maybe_auto_title forwards kwarg to worker.
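
The callback handling in the first bullet could be sketched as follows. `generate_title`, its parameters, and the callback receiving the exception object are assumptions; the real signature in agent/title_generator.py may differ.

```python
import logging

logger = logging.getLogger(__name__)


def generate_title(call_llm, prompt: str, failure_callback=None):
    """Return a title, or None on failure, surfacing the error if possible."""
    try:
        return call_llm(prompt)
    except Exception as exc:
        logger.warning("title generation failed: %s", exc)
        if failure_callback is not None:
            try:
                failure_callback(exc)
            except Exception:
                # Callback errors are swallowed so the worker thread survives.
                logger.debug("failure_callback raised", exc_info=True)
        return None
```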
2026-04-26 21:49:34 -07:00
briandevans bda2dbc29e fix(compressor): apply bare-string guard to protect-tail boundary scan
The bare-string isinstance guard added in 80ae2621 covered _find_tail_cut_by_tokens
(line 1084) but missed the identical pattern in _calculate_protect_tail_boundary
(line 487, the protect-tail scan loop).  Both loops call .get("text", "") on every
list item in message["content"]; both crash with AttributeError when that list
contains a bare string.

Apply the same dict/str/fallback isinstance guard to the protect-tail path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 21:48:09 -07:00
briandevans 943465235e fix(compressor): guard against bare-string items in multimodal content list
raw_content from message["content"] can be a list that contains bare
strings, not only dicts.  The previous `p.get("text", "")` call raised
AttributeError on string items, crashing context compression for any
session that had a message with mixed content.

Guard with isinstance checks: dict → .get("text"), str → len(p),
fallback → len(str(p)).  Adds a regression test covering the bare-string
case that would have AttributeError'd on the pre-fix code.
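
The dict/str/fallback guard reads roughly like this when pulled out into a helper. The helper name is hypothetical; the three branches follow the description above.

```python
def content_char_count(raw_content) -> int:
    """Estimate the text length of a message's content field."""
    if isinstance(raw_content, str):
        return len(raw_content)
    if not isinstance(raw_content, list):
        return len(str(raw_content))
    total = 0
    for part in raw_content:
        if isinstance(part, dict):
            # Non-text blocks (e.g. image_url) contribute no characters.
            total += len(part.get("text", ""))
        elif isinstance(part, str):
            total += len(part)
        else:
            total += len(str(part))
    return total
```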

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 21:48:09 -07:00
briandevans cfc8befe65 fix(compressor): use text char sum for multimodal token estimation in _find_tail_cut_by_tokens
_find_tail_cut_by_tokens called len(content) to estimate message tokens.
When content is a list of blocks (multimodal: text + image_url), len()
returns block count (e.g. 2) rather than character count, so a message
with 500 chars of text was counted as ~10 tokens instead of ~135.

This caused the backward walk to exhaust all messages before hitting the
budget ceiling; the head_end safeguard then forced cut = n - min_tail,
shrinking the protected tail to the bare minimum and preventing effective
compression of long multimodal conversations.

Fix mirrors the existing pattern in _prune_old_tool_results (line 487):
  sum(len(p.get("text", "")) for p in raw_content)
  if isinstance(raw_content, list) else len(raw_content)

Tests: 3 new cases in TestTokenBudgetTailProtection — regression guard
(confirms the test fails with the bug), plain-string regression guard,
and image-only block edge case.

Fixes #16087.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 21:48:09 -07:00
Teknium 3e68809fe0 chore(release): map romanornr noreply email 2026-04-26 21:47:40 -07:00
romanornr a0fe73bada fix(cli): strip leaked bracketed-paste wrappers 2026-04-26 21:47:40 -07:00
Teknium 7c63c24613 fix(cron): don't silently disable recurring cron jobs when croniter is missing (#16368)
If the gateway's Python env loses access to 'croniter' between when a
cron job was created and when mark_job_run() fires, compute_next_run()
returns None for cron schedules. mark_job_run() treated that as terminal
completion and wrote enabled=false, state=completed — turning a missing
runtime dep into a silent, permanent job-off.

That behaviour is safe for one-shot jobs but wrong for recurring ones. A
missing dep should surface as an error the user can see, not as successful
completion of a job that is about to stop firing.

mark_job_run() now only disables the job on next_run_at=None when the
schedule is one-shot. For recurring (cron/interval) schedules it keeps
enabled=true, sets state=error, and records last_error so the user can
see why the job isn't advancing. compute_next_run() also logs a warning
the first time cron+no-croniter hits, so the underlying cause is visible
in the gateway log.
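
The new branch in mark_job_run() can be sketched as below. The job dict layout (`schedule.kind` with values such as "cron", "interval", "once") and the function name are assumptions made for illustration; the enabled/state/last_error outcomes follow the text above.

```python
def apply_next_run(job: dict, next_run_at, error: str = "could not compute next run") -> dict:
    """Update a job record after a run, given the computed next run time."""
    job["next_run_at"] = next_run_at
    if next_run_at is not None:
        return job
    kind = job.get("schedule", {}).get("kind")
    if kind in ("cron", "interval"):
        # Recurring schedule with no next run: surface an error, stay enabled.
        job["enabled"] = True
        job["state"] = "error"
        job["last_error"] = error
    else:
        # One-shot schedule: no next run means the job is finished.
        job["enabled"] = False
        job["state"] = "completed"
    return job
```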

Tests cover:
- recurring cron job stays enabled with state=error when HAS_CRONITER=False
- recurring interval stays enabled when compute_next_run returns None
- one-shot jobs still flip to enabled=false, state=completed (no regression)

Fixes #16265
2026-04-26 21:47:32 -07:00
Teknium c5781d50c7 fix(azure-foundry): auto-route gpt-5.x / codex / o-series to Responses API (#16361)
Azure Foundry deploys GPT-5.x, codex-*, and o1/o3/o4 reasoning models as
Responses-API-only.  Calling /chat/completions against these deployments
returns 400 'The requested operation is unsupported.', which broke any
user who ran 'hermes model' on Azure, picked a gpt-5/codex deployment,
and kept the default api_mode: chat_completions.  Verified in a user
debug bundle on 2026-04-26: gpt-5.3-codex failed on synopsisse.openai.azure.com
with that exact payload while gpt-4o-pure on the same endpoint worked.

Adds azure_foundry_model_api_mode(model_name) that returns
codex_responses when the model name starts with gpt-5, codex, o1, o3,
or o4 — otherwise None so chat_completions / anthropic_messages stay
untouched for gpt-4o, Llama, Claude-via-Anthropic, etc.

Resolver (both the direct Azure Foundry path and the pool-entry path)
consults it and upgrades api_mode unless the user explicitly picked
anthropic_messages.  target_model (from /model mid-session switch)
takes precedence over the persisted default so switching from gpt-4o
to gpt-5.3-codex routes correctly before the next request.

Docs: correct the azure-foundry guide which previously claimed Azure
keeps gpt-5.x on chat completions — that was only true for early Azure
OpenAI, not Azure Foundry codex/o-series deployments.

Tests: 14 unit tests for azure_foundry_model_api_mode + 6 integration
tests in TestAzureFoundryResolution covering Bob's exact scenario,
target_model override, anthropic_messages guard, and o3-mini.
2026-04-26 21:33:31 -07:00
Teknium 235bfb192b docs(skills): document URL install across features, reference, guide, and hermes-agent skill (#16355)
Follow-up to #16323 — the UrlSource adapter is shipped but four
user-facing docs surfaces still only listed the hub-identifier forms.

- user-guide/features/skills.md: add ``url`` to the Supported-hub-sources
  table; add a new "#### 8. Direct URL (`url`)" section explaining scope
  (single-file SKILL.md only), name-resolution order (frontmatter → URL
  slug → interactive prompt → --name flag), and both TTY and
  non-interactive usage. Add two URL examples to the install-examples
  block near the top of the page.
- reference/cli-commands.md: two URL install examples + one note
  explaining the name-resolution fallback chain.
- guides/work-with-skills.md: one URL-install example alongside the
  existing hub-identifier examples.
- skills/autonomous-ai-agents/hermes-agent/SKILL.md: Quick Reference
  block's ``hermes skills install`` line now spells out that ID can be
  a hub identifier OR a direct SKILL.md URL, and mentions --name for
  frontmatter-less skills.

No code changes. No new dependencies. Website builds via the usual
Docusaurus pipeline.

Co-authored-by: teknium1 <teknium@noreply.github.com>
2026-04-26 21:27:59 -07:00
309 changed files with 25906 additions and 1128 deletions
+1
@@ -69,3 +69,4 @@ mini-swe-agent/
 .nix-stamps/
 result
 website/static/api/skills-index.json
+models-dev-upstream/
+6 -2
@@ -30,18 +30,22 @@ WORKDIR /opt/hermes
 # unless the lockfiles themselves change.
 COPY package.json package-lock.json ./
 COPY web/package.json web/package-lock.json web/
+COPY ui-tui/package.json ui-tui/package-lock.json ui-tui/
+COPY ui-tui/packages/hermes-ink/package.json ui-tui/packages/hermes-ink/package-lock.json ui-tui/packages/hermes-ink/
 RUN npm install --prefer-offline --no-audit && \
     npx playwright install --with-deps chromium --only-shell && \
     (cd web && npm install --prefer-offline --no-audit) && \
+    (cd ui-tui && npm install --prefer-offline --no-audit) && \
     npm cache clean --force
 # ---------- Source code ----------
 # .dockerignore excludes node_modules, so the installs above survive.
 COPY --chown=hermes:hermes . .
-# Build web dashboard (Vite outputs to hermes_cli/web_dist/)
-RUN cd web && npm run build
+# Build browser dashboard and terminal UI assets.
+RUN cd web && npm run build && \
+    cd ../ui-tui && npm run build
 # ---------- Permissions ----------
 # Make install dir world-readable so any HERMES_UID can read it at runtime.
+30 -3
@@ -202,19 +202,33 @@ def _forbids_sampling_params(model: str) -> bool:
 # Beta headers for enhanced features (sent with ALL auth types).
-# As of Opus 4.7 (2026-04-16), both of these are GA on Claude 4.6+ — the
+# As of Opus 4.7 (2026-04-16), the first two are GA on Claude 4.6+ — the
 # beta headers are still accepted (harmless no-op) but not required. Kept
 # here so older Claude (4.5, 4.1) + third-party Anthropic-compat endpoints
 # that still gate on the headers continue to get the enhanced features.
-# Migration guide: remove these if you no longer support ≤4.5 models.
+#
+# ``context-1m-2025-08-07`` unlocks the 1M context window on Claude Opus 4.6/4.7
+# and Sonnet 4.6 when served via AWS Bedrock or Azure AI Foundry. 1M is GA on
+# native Anthropic (api.anthropic.com) for Opus 4.6+, but Bedrock/Azure still
+# gate it behind this beta header as of 2026-04 — without it Bedrock caps Opus
+# at 200K even though model_metadata.py advertises 1M. The header is a harmless
+# no-op on endpoints where 1M is GA.
+#
+# Migration guide: remove these if you no longer support ≤4.5 models or once
+# Bedrock/Azure promote 1M to GA.
 _COMMON_BETAS = [
     "interleaved-thinking-2025-05-14",
     "fine-grained-tool-streaming-2025-05-14",
+    "context-1m-2025-08-07",
 ]
 # MiniMax's Anthropic-compatible endpoints fail tool-use requests when
 # the fine-grained tool streaming beta is present. Omit it so tool calls
 # fall back to the provider's default response path.
 _TOOL_STREAMING_BETA = "fine-grained-tool-streaming-2025-05-14"
+# 1M context beta — see comment on _COMMON_BETAS above. Stripped for
+# Bearer-auth (MiniMax) endpoints since they host their own models and
+# unknown Anthropic beta headers risk request rejection.
+_CONTEXT_1M_BETA = "context-1m-2025-08-07"
 # Fast mode beta — enables the ``speed: "fast"`` request parameter for
 # significantly higher output token throughput on Opus 4.6 (~2.5x).
@@ -357,9 +371,14 @@ def _common_betas_for_base_url(base_url: str | None) -> list[str]:
that include Anthropic's ``fine-grained-tool-streaming`` beta — every
tool-use message triggers a connection error. Strip that beta for
Bearer-auth endpoints while keeping all other betas intact.
The ``context-1m-2025-08-07`` beta is also stripped for Bearer-auth
endpoints — MiniMax hosts its own models, not Claude, so the header is
irrelevant at best and risks request rejection at worst.
"""
if _requires_bearer_auth(base_url):
return [b for b in _COMMON_BETAS if b != _TOOL_STREAMING_BETA]
_stripped = {_TOOL_STREAMING_BETA, _CONTEXT_1M_BETA}
return [b for b in _COMMON_BETAS if b not in _stripped]
return _COMMON_BETAS
@@ -456,6 +475,13 @@ def build_anthropic_bedrock_client(region: str):
Claude feature parity: prompt caching, thinking budgets, adaptive
thinking, fast mode — features not available via the Converse API.
Attaches the common Anthropic beta headers as client-level defaults so
that Bedrock-hosted Claude models get the same enhanced features as
native Anthropic. The ``context-1m-2025-08-07`` beta in particular
unlocks the 1M context window for Opus 4.6/4.7 on Bedrock — without
it, Bedrock caps these models at 200K even though the Anthropic API
serves them with 1M natively.
Auth uses the boto3 default credential chain (IAM roles, SSO, env vars).
"""
if _anthropic_sdk is None:
@@ -473,6 +499,7 @@ def build_anthropic_bedrock_client(region: str):
return _anthropic_sdk.AnthropicBedrock(
aws_region=region,
timeout=Timeout(timeout=900.0, connect=10.0),
default_headers={"anthropic-beta": ",".join(_COMMON_BETAS)},
)
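The beta-stripping rule changed in this file can be illustrated in isolation. The following is a minimal sketch, not the project's code: `requires_bearer_auth` is a hypothetical stand-in for `_requires_bearer_auth` (whose body is not shown in this diff), and its host check is purely illustrative.

```python
from typing import List, Optional

COMMON_BETAS = [
    "interleaved-thinking-2025-05-14",
    "fine-grained-tool-streaming-2025-05-14",
    "context-1m-2025-08-07",
]
TOOL_STREAMING_BETA = "fine-grained-tool-streaming-2025-05-14"
CONTEXT_1M_BETA = "context-1m-2025-08-07"


def requires_bearer_auth(base_url: Optional[str]) -> bool:
    # Hypothetical stand-in: the real predicate is not part of this diff.
    return bool(base_url) and "minimax" in base_url.lower()


def common_betas_for_base_url(base_url: Optional[str]) -> List[str]:
    # Bearer-auth endpoints lose both the tool-streaming and the 1M-context beta.
    if requires_bearer_auth(base_url):
        stripped = {TOOL_STREAMING_BETA, CONTEXT_1M_BETA}
        return [b for b in COMMON_BETAS if b not in stripped]
    return list(COMMON_BETAS)


def beta_header_value(base_url: Optional[str]) -> str:
    # The header carries the betas as one comma-separated string.
    return ",".join(common_betas_for_base_url(base_url))
```

Under these assumptions a Bearer-auth endpoint keeps only the interleaved-thinking beta, while every other endpoint (including the Bedrock client above) receives all three in a single `anthropic-beta` header.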
+77 -33
@@ -82,6 +82,8 @@ _PROVIDER_ALIASES = {
"moonshot": "kimi-coding",
"kimi-cn": "kimi-coding-cn",
"moonshot-cn": "kimi-coding-cn",
"gmi-cloud": "gmi",
"gmicloud": "gmi",
"minimax-china": "minimax-cn",
"minimax_cn": "minimax-cn",
"claude": "anthropic",
@@ -155,6 +157,7 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
"kimi-coding": "kimi-k2-turbo-preview",
"stepfun": "step-3.5-flash",
"kimi-coding-cn": "kimi-k2-turbo-preview",
"gmi": "google/gemini-3.1-flash-lite-preview",
"minimax": "MiniMax-M2.7",
"minimax-cn": "MiniMax-M2.7",
"anthropic": "claude-haiku-4-5-20251001",
@@ -1617,8 +1620,14 @@ def _resolve_auto(main_runtime: Optional[Dict[str, Any]] = None) -> Tuple[Option
# below — never look up auth env vars ad-hoc.
def _to_async_client(sync_client, model: str):
"""Convert a sync client to its async counterpart, preserving Codex routing."""
def _to_async_client(sync_client, model: str, is_vision: bool = False):
"""Convert a sync client to its async counterpart, preserving Codex routing.
When ``is_vision=True`` and the underlying base URL is Copilot, the
resulting async client carries the ``Copilot-Vision-Request: true``
header so the request is routed to Copilot's vision-capable
infrastructure (otherwise vision payloads silently time out).
"""
from openai import AsyncOpenAI
if isinstance(sync_client, CodexAuxiliaryClient):
@@ -1647,9 +1656,11 @@ def _to_async_client(sync_client, model: str):
if base_url_host_matches(sync_base_url, "openrouter.ai"):
async_kwargs["default_headers"] = dict(_OR_HEADERS)
elif base_url_host_matches(sync_base_url, "api.githubcopilot.com"):
from hermes_cli.models import copilot_default_headers
from hermes_cli.copilot_auth import copilot_request_headers
async_kwargs["default_headers"] = copilot_default_headers()
async_kwargs["default_headers"] = copilot_request_headers(
is_agent_turn=True, is_vision=is_vision
)
elif base_url_host_matches(sync_base_url, "api.kimi.com"):
async_kwargs["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
return AsyncOpenAI(**async_kwargs), model
@@ -1676,6 +1687,7 @@ def resolve_provider_client(
explicit_api_key: str = None,
api_mode: str = None,
main_runtime: Optional[Dict[str, Any]] = None,
is_vision: bool = False,
) -> Tuple[Optional[Any], Optional[str]]:
"""Central router: given a provider name and optional model, return a
configured client with the correct auth, base URL, and API format.
@@ -1759,7 +1771,7 @@ def resolve_provider_client(
"auxiliary provider (using %r instead)", model, resolved)
model = None
final_model = model or resolved
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
# ── OpenRouter ───────────────────────────────────────────────────
@@ -1772,7 +1784,7 @@ def resolve_provider_client(
)
return None, None
final_model = _normalize_resolved_model(model or default, provider)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
# ── Nous Portal (OAuth) ──────────────────────────────────────────
@@ -1789,7 +1801,7 @@ def resolve_provider_client(
"but Nous Portal not configured (run: hermes auth)")
return None, None
final_model = _normalize_resolved_model(model or default, provider)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
# ── OpenAI Codex (OAuth → Responses API) ─────────────────────────
@@ -1816,7 +1828,7 @@ def resolve_provider_client(
"but no Codex OAuth token found (run: hermes model)")
return None, None
final_model = _normalize_resolved_model(model or default, provider)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
# ── Custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY) ───────────
@@ -1845,11 +1857,13 @@ def resolve_provider_client(
if base_url_host_matches(custom_base, "api.kimi.com"):
extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
elif base_url_host_matches(custom_base, "api.githubcopilot.com"):
from hermes_cli.models import copilot_default_headers
extra["default_headers"] = copilot_default_headers()
from hermes_cli.copilot_auth import copilot_request_headers
extra["default_headers"] = copilot_request_headers(
is_agent_turn=True, is_vision=is_vision
)
client = OpenAI(api_key=custom_key, base_url=_clean_base, **extra)
client = _wrap_if_needed(client, final_model, custom_base)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
# Try custom first, then codex, then API-key providers
for try_fn in (_try_custom_endpoint, _try_codex,
@@ -1859,7 +1873,7 @@ def resolve_provider_client(
final_model = _normalize_resolved_model(model or default, provider)
_cbase = str(getattr(client, "base_url", "") or "")
client = _wrap_if_needed(client, final_model, _cbase)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
logger.warning("resolve_provider_client: custom/main requested "
"but no endpoint credentials found")
@@ -1904,7 +1918,7 @@ def resolve_provider_client(
provider,
)
client = OpenAI(api_key=custom_key, base_url=_clean_base2, **_extra2)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
sync_anthropic = AnthropicAuxiliaryClient(
real_client, final_model, custom_key, custom_base, is_oauth=False,
@@ -1923,7 +1937,7 @@ def resolve_provider_client(
client = CodexAuxiliaryClient(client, final_model)
else:
client = _wrap_if_needed(client, final_model, custom_base)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
logger.warning(
"resolve_provider_client: named custom provider %r has no base_url",
@@ -1955,7 +1969,7 @@ def resolve_provider_client(
logger.warning("resolve_provider_client: anthropic requested but no Anthropic credentials found")
return None, None
final_model = _normalize_resolved_model(model or default_model, provider)
return (_to_async_client(client, final_model) if async_mode else (client, final_model))
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode else (client, final_model))
creds = resolve_api_key_provider_credentials(provider)
api_key = str(creds.get("api_key", "")).strip()
@@ -1981,7 +1995,7 @@ def resolve_provider_client(
if is_native_gemini_base_url(base_url):
client = GeminiNativeClient(api_key=api_key, base_url=base_url)
logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
# Provider-specific headers
@@ -1989,9 +2003,11 @@ def resolve_provider_client(
if base_url_host_matches(base_url, "api.kimi.com"):
headers["User-Agent"] = "claude-code/0.1.0"
elif base_url_host_matches(base_url, "api.githubcopilot.com"):
from hermes_cli.models import copilot_default_headers
from hermes_cli.copilot_auth import copilot_request_headers
headers.update(copilot_default_headers())
headers.update(copilot_request_headers(
is_agent_turn=True, is_vision=is_vision
))
client = OpenAI(api_key=api_key, base_url=base_url,
**({"default_headers": headers} if headers else {}))
@@ -2017,7 +2033,7 @@ def resolve_provider_client(
client = _wrap_if_needed(client, final_model, base_url)
logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
if pconfig.auth_type == "external_process":
@@ -2049,7 +2065,7 @@ def resolve_provider_client(
args=args,
)
logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
logger.warning("resolve_provider_client: external-process provider %s not "
"directly supported", provider)
@@ -2085,7 +2101,7 @@ def resolve_provider_client(
base_url=f"https://bedrock-runtime.{region}.amazonaws.com",
)
logger.debug("resolve_provider_client: bedrock (%s, %s)", final_model, region)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
elif pconfig.auth_type in ("oauth_device_code", "oauth_external"):
@@ -2160,8 +2176,13 @@ def _normalize_vision_provider(provider: Optional[str]) -> str:
return _normalize_aux_provider(provider)
def _resolve_strict_vision_backend(provider: str) -> Tuple[Optional[Any], Optional[str]]:
def _resolve_strict_vision_backend(
provider: str,
model: Optional[str] = None,
) -> Tuple[Optional[Any], Optional[str]]:
provider = _normalize_vision_provider(provider)
if provider == "copilot":
return resolve_provider_client("copilot", model, is_vision=True)
if provider == "openrouter":
return _try_openrouter()
if provider == "nous":
@@ -2229,7 +2250,7 @@ def resolve_vision_provider_client(
return resolved_provider, None, None
final_model = resolved_model or default_model
if async_mode:
async_client, async_model = _to_async_client(sync_client, final_model)
async_client, async_model = _to_async_client(sync_client, final_model, is_vision=True)
return resolved_provider, async_client, async_model
return resolved_provider, sync_client, final_model
@@ -2261,8 +2282,11 @@ def resolve_vision_provider_client(
main_provider = _read_main_provider()
main_model = _read_main_model()
if main_provider and main_provider not in ("auto", ""):
vision_model = _PROVIDER_VISION_MODELS.get(main_provider, main_model)
if main_provider == "nous":
sync_client, default_model = _resolve_strict_vision_backend(main_provider)
sync_client, default_model = _resolve_strict_vision_backend(
main_provider, vision_model
)
if sync_client is not None:
logger.info(
"Vision auto-detect: using main provider %s (%s)",
@@ -2270,10 +2294,10 @@ def resolve_vision_provider_client(
)
return _finalize(main_provider, sync_client, default_model)
else:
vision_model = _PROVIDER_VISION_MODELS.get(main_provider, main_model)
rpc_client, rpc_model = resolve_provider_client(
main_provider, vision_model,
api_mode=resolved_api_mode)
api_mode=resolved_api_mode,
is_vision=True)
if rpc_client is not None:
logger.info(
"Vision auto-detect: using main provider %s (%s)",
@@ -2295,11 +2319,14 @@ def resolve_vision_provider_client(
return None, None, None
if requested in _VISION_AUTO_PROVIDER_ORDER:
sync_client, default_model = _resolve_strict_vision_backend(requested)
sync_client, default_model = _resolve_strict_vision_backend(
requested, resolved_model
)
return _finalize(requested, sync_client, default_model)
client, final_model = _get_cached_client(requested, resolved_model, async_mode,
api_mode=resolved_api_mode)
api_mode=resolved_api_mode,
is_vision=True)
if client is None:
return requested, None, None
return requested, client, final_model
@@ -2363,10 +2390,11 @@ def _client_cache_key(
api_key: Optional[str] = None,
api_mode: Optional[str] = None,
main_runtime: Optional[Dict[str, Any]] = None,
is_vision: bool = False,
) -> tuple:
runtime = _normalize_main_runtime(main_runtime)
runtime_key = tuple(runtime.get(field, "") for field in _MAIN_RUNTIME_FIELDS) if provider == "auto" else ()
return (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key)
return (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key, is_vision)
def _store_cached_client(cache_key: tuple, client: Any, default_model: Optional[str], *, bound_loop: Any = None) -> None:
@@ -2392,6 +2420,7 @@ def _refresh_nous_auxiliary_client(
api_key: Optional[str] = None,
api_mode: Optional[str] = None,
main_runtime: Optional[Dict[str, Any]] = None,
is_vision: bool = False,
) -> Tuple[Optional[Any], Optional[str]]:
"""Refresh Nous runtime creds, rebuild the client, and replace the cache entry."""
runtime = _resolve_nous_runtime_api(force_refresh=True)
@@ -2409,7 +2438,7 @@ def _refresh_nous_auxiliary_client(
current_loop = _aio.get_event_loop()
except RuntimeError:
pass
client, final_model = _to_async_client(sync_client, final_model or "")
client, final_model = _to_async_client(sync_client, final_model or "", is_vision=is_vision)
else:
client = sync_client
@@ -2420,6 +2449,7 @@ def _refresh_nous_auxiliary_client(
api_key=api_key,
api_mode=api_mode,
main_runtime=main_runtime,
is_vision=is_vision,
)
_store_cached_client(cache_key, client, final_model, bound_loop=current_loop)
return client, final_model
@@ -2531,12 +2561,19 @@ def _is_openrouter_client(client: Any) -> bool:
return False
def _cached_client_accepts_slash_models(client: Any, cached_default: Optional[str]) -> bool:
"""Best-effort check for cached clients that accept ``vendor/model`` IDs."""
if _is_openrouter_client(client):
return True
return bool(cached_default and "/" in cached_default)
def _compat_model(client: Any, model: Optional[str], cached_default: Optional[str]) -> Optional[str]:
"""Drop OpenRouter-format model slugs (with '/') for non-OpenRouter clients.
"""Keep slash-bearing model IDs only for cached clients that support them.
Mirrors the guard in resolve_provider_client() which is skipped on cache hits.
"""
if model and "/" in model and not _is_openrouter_client(client):
if model and "/" in model and not _cached_client_accepts_slash_models(client, cached_default):
return cached_default
return model or cached_default
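The relaxed slash-model guard can be exercised on its own. This is a sketch under stated assumptions: `FakeClient` and `is_openrouter_client` are hypothetical stand-ins (the real `_is_openrouter_client` body is not shown in this hunk), and `vendor/model-a` is a made-up model ID.

```python
from typing import Any, Optional


class FakeClient:
    """Hypothetical client object that only exposes a base_url."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url


def is_openrouter_client(client: Any) -> bool:
    # Stand-in for _is_openrouter_client; assumed to inspect the base URL.
    return "openrouter.ai" in str(getattr(client, "base_url", "") or "")


def accepts_slash_models(client: Any, cached_default: Optional[str]) -> bool:
    if is_openrouter_client(client):
        return True
    # A cached default that itself contains "/" signals a slash-aware provider.
    return bool(cached_default and "/" in cached_default)


def compat_model(client: Any, model: Optional[str], cached_default: Optional[str]) -> Optional[str]:
    if model and "/" in model and not accepts_slash_models(client, cached_default):
        return cached_default
    return model or cached_default
```

With this heuristic, a provider whose default is `google/gemini-3.1-flash-lite-preview` keeps a requested `vendor/model-a`, whereas a provider whose default is `kimi-k2-turbo-preview` falls back to that default.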
@@ -2549,6 +2586,7 @@ def _get_cached_client(
api_key: str = None,
api_mode: str = None,
main_runtime: Optional[Dict[str, Any]] = None,
is_vision: bool = False,
) -> Tuple[Optional[Any], Optional[str]]:
"""Get or create a cached client for the given provider.
@@ -2585,6 +2623,7 @@ def _get_cached_client(
api_key=api_key,
api_mode=api_mode,
main_runtime=main_runtime,
is_vision=is_vision,
)
with _client_cache_lock:
if cache_key in _client_cache:
@@ -2616,6 +2655,7 @@ def _get_cached_client(
explicit_api_key=api_key,
api_mode=api_mode,
main_runtime=runtime,
is_vision=is_vision,
)
if client is not None:
# For async clients, remember which loop they were created on so we
@@ -3079,6 +3119,7 @@ def call_llm(
api_key=resolved_api_key,
api_mode=resolved_api_mode,
main_runtime=main_runtime,
is_vision=(task == "vision"),
)
if refreshed_client is not None:
logger.info("Auxiliary %s: refreshed Nous runtime credentials after 401, retrying",
@@ -3369,6 +3410,7 @@ async def async_call_llm(
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
is_vision=(task == "vision"),
)
if refreshed_client is not None:
logger.info("Auxiliary %s (async): refreshed Nous runtime credentials after 401, retrying",
@@ -3437,7 +3479,9 @@ async def async_call_llm(
extra_body=effective_extra_body,
base_url=str(getattr(fb_client, "base_url", "") or ""))
# Convert sync fallback client to async
async_fb, async_fb_model = _to_async_client(fb_client, fb_model or "")
async_fb, async_fb_model = _to_async_client(
fb_client, fb_model or "", is_vision=(task == "vision")
)
if async_fb_model and async_fb_model != fb_kwargs.get("model"):
fb_kwargs["model"] = async_fb_model
return _validate_llm_response(
+113 -5
@@ -61,9 +61,52 @@ _PRUNED_TOOL_PLACEHOLDER = "[Old tool output cleared to save context space]"
# Chars per token rough estimate
_CHARS_PER_TOKEN = 4
# Flat token cost per attached image part. Real cost varies by provider and
# dimensions (Anthropic ≈ width×height/750, GPT-4o up to ~1700 for
# high-detail 2048×2048, Gemini 258/tile), but 1600 is a realistic ceiling
# that keeps compression budgeting honest for multi-image conversations.
# Matches Claude Code's IMAGE_TOKEN_ESTIMATE constant.
_IMAGE_TOKEN_ESTIMATE = 1600
# Same figure expressed in the char-budget currency the rest of the
# compressor speaks in. Used when accumulating message "content length"
# for tail-cut decisions.
_IMAGE_CHAR_EQUIVALENT = _IMAGE_TOKEN_ESTIMATE * _CHARS_PER_TOKEN
_SUMMARY_FAILURE_COOLDOWN_SECONDS = 600
def _content_length_for_budget(raw_content: Any) -> int:
"""Return the effective char-length of a message's content for token budgeting.
Plain strings: ``len(content)``. Multimodal lists: sum of text-part
``len(text)`` plus a flat ``_IMAGE_CHAR_EQUIVALENT`` per image part
(``image_url`` / ``input_image`` / Anthropic-style ``image``). This
keeps the compressor from treating a turn with 5 attached images as
near-zero tokens just because the text part is empty.
"""
if isinstance(raw_content, str):
return len(raw_content)
if not isinstance(raw_content, list):
return len(str(raw_content or ""))
total = 0
for p in raw_content:
if isinstance(p, str):
total += len(p)
continue
if not isinstance(p, dict):
total += len(str(p))
continue
ptype = p.get("type")
if ptype in {"image_url", "input_image", "image"}:
total += _IMAGE_CHAR_EQUIVALENT
else:
# text / input_text / tool_result-with-text / anything else with
# a text field. Ignore the raw base64 payload inside image_url
# dicts — dimensions don't matter, only whether it's an image.
total += len(p.get("text", "") or "")
return total
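A worked example may make the budgeting change concrete. The sketch below copies the helper's logic into a standalone function and compares it with the text-only sum it replaces; the message content is invented for illustration.

```python
from typing import Any

CHARS_PER_TOKEN = 4
IMAGE_TOKEN_ESTIMATE = 1600
IMAGE_CHAR_EQUIVALENT = IMAGE_TOKEN_ESTIMATE * CHARS_PER_TOKEN  # 6400 chars per image


def content_length_for_budget(raw_content: Any) -> int:
    if isinstance(raw_content, str):
        return len(raw_content)
    if not isinstance(raw_content, list):
        return len(str(raw_content or ""))
    total = 0
    for part in raw_content:
        if isinstance(part, str):
            total += len(part)
        elif not isinstance(part, dict):
            total += len(str(part))
        elif part.get("type") in {"image_url", "input_image", "image"}:
            total += IMAGE_CHAR_EQUIVALENT  # flat cost, payload size ignored
        else:
            total += len(part.get("text", "") or "")
    return total


# Invented turn: empty text part plus five attached images.
content = [{"type": "text", "text": ""}] + [
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}}
    for _ in range(5)
]

old_len = sum(len(p.get("text", "")) for p in content)  # previous text-only estimate
new_len = content_length_for_budget(content)

old_tokens = old_len // CHARS_PER_TOKEN + 10
new_tokens = new_len // CHARS_PER_TOKEN + 10
```

Here the previous estimate charges the turn 10 tokens (metadata only), while the new one charges 5 × 1600 + 10 = 8010 tokens.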
def _content_text_for_contains(content: Any) -> str:
"""Return a best-effort text view of message content.
@@ -295,6 +338,10 @@ class ContextCompressor(ContextEngine):
self._context_probe_persistable = False
self._previous_summary = None
self._last_summary_error = None
self._last_summary_dropped_count = 0
self._last_summary_fallback_used = False
self._last_aux_model_failure_error = None
self._last_aux_model_failure_model = None
self._last_compression_savings_pct = 100.0
self._ineffective_compression_count = 0
@@ -398,6 +445,17 @@ class ContextCompressor(ContextEngine):
self._ineffective_compression_count: int = 0
self._summary_failure_cooldown_until: float = 0.0
self._last_summary_error: Optional[str] = None
# When summary generation fails and a static fallback is inserted,
# record how many turns were unrecoverably dropped so callers
# (gateway hygiene, /compress) can surface a visible warning.
self._last_summary_dropped_count: int = 0
self._last_summary_fallback_used: bool = False
# When a user-configured summary model fails and we recover by
# retrying on the main model, record the failure so gateway /
# CLI callers can still warn the user even though compression
# succeeded. Silent recovery would hide the broken config.
self._last_aux_model_failure_error: Optional[str] = None
self._last_aux_model_failure_model: Optional[str] = None
def update_from_response(self, usage: Dict[str, Any]):
"""Update tracked token usage from API response."""
@@ -484,7 +542,7 @@ class ContextCompressor(ContextEngine):
for i in range(len(result) - 1, -1, -1):
msg = result[i]
raw_content = msg.get("content") or ""
content_len = sum(len(p.get("text", "")) for p in raw_content) if isinstance(raw_content, list) else len(raw_content)
content_len = _content_length_for_budget(raw_content)
msg_tokens = content_len // _CHARS_PER_TOKEN + 10
for tc in msg.get("tool_calls") or []:
if isinstance(tc, dict):
@@ -857,10 +915,50 @@ The user has requested that this compaction PRIORITISE preserving all informatio
"Falling back to main model '%s' for compression.",
self.summary_model, e, self.model,
)
# Record the aux-model failure so callers can warn the user
# even if the retry-on-main succeeds — a misconfigured aux
# model is something the user needs to fix.
_err_text = str(e).strip() or e.__class__.__name__
if len(_err_text) > 220:
_err_text = _err_text[:217].rstrip() + "..."
self._last_aux_model_failure_error = _err_text
self._last_aux_model_failure_model = self.summary_model
self.summary_model = "" # empty = use main model
self._summary_failure_cooldown_until = 0.0 # no cooldown
return self._generate_summary(turns_to_summarize, focus_topic=focus_topic) # retry immediately
# Unknown-error best-effort retry on main model. Losing N turns of
# context is almost always worse than one extra summary attempt, so
# if we haven't already fallen back and the summary model differs
# from the main model, try once more on main before entering
# cooldown. Errors that DID match _is_model_not_found above are
# already handled by the fast-path retry; this branch catches
# everything else (400s, provider-specific "no route" strings,
# aggregator rejections, etc.) where auto-retry is still safer
# than dropping the turns.
if (
self.summary_model
and self.summary_model != self.model
and not getattr(self, "_summary_model_fallen_back", False)
):
self._summary_model_fallen_back = True
logging.warning(
"Summary model '%s' failed (%s). "
"Retrying on main model '%s' before giving up.",
self.summary_model, e, self.model,
)
# Record the aux-model failure (see 404 branch above) — user
# should know their configured model is broken even if main
# recovers the call.
_err_text = str(e).strip() or e.__class__.__name__
if len(_err_text) > 220:
_err_text = _err_text[:217].rstrip() + "..."
self._last_aux_model_failure_error = _err_text
self._last_aux_model_failure_model = self.summary_model
self.summary_model = "" # empty = use main model
self._summary_failure_cooldown_until = 0.0
return self._generate_summary(turns_to_summarize, focus_topic=focus_topic)
# Transient errors (timeout, rate limit, network) — shorter cooldown
_transient_cooldown = 60
self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown
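Both fallback branches above record the failure with the same truncation steps. As an illustration only, that logic could be factored into one helper; `short_error_text` is a hypothetical name that does not appear in the diff.

```python
def short_error_text(exc: BaseException, limit: int = 220) -> str:
    """Return a one-line error description capped at ``limit`` characters."""
    # Fall back to the exception class name when the message is empty.
    text = str(exc).strip() or exc.__class__.__name__
    if len(text) > limit:
        # Reserve three characters for the ellipsis, as the diff does (217 + "...").
        text = text[: limit - 3].rstrip() + "..."
    return text
```

With the default limit this reproduces the `[:217].rstrip() + "..."` behaviour, so a 500-character provider error is stored as 220 characters and an empty message is stored as the exception's class name.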
@@ -1082,8 +1180,9 @@ The user has requested that this compaction PRIORITISE preserving all informatio
for i in range(n - 1, head_end - 1, -1):
msg = messages[i]
content = msg.get("content") or ""
msg_tokens = len(content) // _CHARS_PER_TOKEN + 10 # +10 for role/metadata
raw_content = msg.get("content") or ""
content_len = _content_length_for_budget(raw_content)
msg_tokens = content_len // _CHARS_PER_TOKEN + 10 # +10 for role/metadata
# Include tool call arguments in estimate
for tc in msg.get("tool_calls") or []:
if isinstance(tc, dict):
@@ -1152,6 +1251,13 @@ The user has requested that this compaction PRIORITISE preserving all informatio
related to this topic and be more aggressive about compressing
everything else. Inspired by Claude Code's ``/compact``.
"""
# Reset per-call summary failure state — callers inspect these fields
# after compress() returns to decide whether to surface a warning.
self._last_summary_dropped_count = 0
self._last_summary_fallback_used = False
self._last_summary_error = None
self._last_aux_model_failure_error = None
self._last_aux_model_failure_model = None
n_messages = len(messages)
# Only need head + 3 tail messages minimum (token budget decides the real tail size)
_min_for_compress = self.protect_first_n + 3 + 1
@@ -1230,11 +1336,13 @@ The user has requested that this compaction PRIORITISE preserving all informatio
if not self.quiet_mode:
logger.warning("Summary generation failed — inserting static fallback context marker")
n_dropped = compress_end - compress_start
self._last_summary_dropped_count = n_dropped
self._last_summary_fallback_used = True
summary = (
f"{SUMMARY_PREFIX}\n"
f"Summary generation was unavailable. {n_dropped} conversation turns were "
f"Summary generation was unavailable. {n_dropped} message(s) were "
f"removed to free context space but could not be summarized. The removed "
f"turns contained earlier work in this session. Continue based on the "
f"messages contained earlier work in this session. Continue based on the "
f"recent messages below and the current state of any files or resources."
)
+31
@@ -42,6 +42,7 @@ class FailoverReason(enum.Enum):
# Context / payload
context_overflow = "context_overflow" # Context too large — compress, not failover
payload_too_large = "payload_too_large" # 413 — compress payload
image_too_large = "image_too_large" # Native image part exceeds provider's per-image limit — shrink and retry
# Model
model_not_found = "model_not_found" # 404 or invalid model — fallback to different model
@@ -147,6 +148,20 @@ _PAYLOAD_TOO_LARGE_PATTERNS = [
"error code: 413",
]
# Image-size patterns. Matched against 400 bodies (not 413) because most
# providers return a 400 with a specific image-too-big message before the
# whole request hits the 413 size limit. Anthropic's wording is the most
# important here (hard 5 MB per image, returned as
# "messages.N.content.K.image.source.base64: image exceeds 5 MB maximum").
_IMAGE_TOO_LARGE_PATTERNS = [
"image exceeds", # Anthropic: "image exceeds 5 MB maximum"
"image too large", # generic
"image_too_large", # error_code variant
"image size exceeds", # variant
# "request_too_large" on a request known to contain an image → image is
# the likely culprit; we still try the shrink path before giving up.
]
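The patterns are used as plain substring checks. A minimal sketch, assuming the classifier lowercases the provider message before matching (the normalisation step is not visible in this diff); `is_image_too_large` is a hypothetical helper name.

```python
IMAGE_TOO_LARGE_PATTERNS = [
    "image exceeds",
    "image too large",
    "image_too_large",
    "image size exceeds",
]


def is_image_too_large(error_msg: str) -> bool:
    # Assumption: matching is case-insensitive via lowercasing.
    msg = error_msg.lower()
    return any(pattern in msg for pattern in IMAGE_TOO_LARGE_PATTERNS)
```

The Anthropic wording quoted above, `messages.N.content.K.image.source.base64: image exceeds 5 MB maximum`, matches the first pattern, while a plain context-length error matches none of them and falls through to the later checks.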
# Context overflow patterns
_CONTEXT_OVERFLOW_PATTERNS = [
"context length",
@@ -671,6 +686,15 @@ def _classify_400(
) -> ClassifiedError:
"""Classify 400 Bad Request — context overflow, format error, or generic."""
# Image-too-large from 400 (Anthropic's 5 MB per-image check fires this way).
# Must be checked BEFORE context_overflow because messages can trip both
# patterns ("exceeds" + "image") and image-shrink is a cheaper recovery.
if any(p in error_msg for p in _IMAGE_TOO_LARGE_PATTERNS):
return result_fn(
FailoverReason.image_too_large,
retryable=True,
)
# Context overflow from 400
if any(p in error_msg for p in _CONTEXT_OVERFLOW_PATTERNS):
return result_fn(
@@ -798,6 +822,13 @@ def _classify_by_message(
should_compress=True,
)
# Image-too-large patterns (from message text when no status_code)
if any(p in error_msg for p in _IMAGE_TOO_LARGE_PATTERNS):
return result_fn(
FailoverReason.image_too_large,
retryable=True,
)
# Usage-limit patterns need the same disambiguation as 402: some providers
# surface "usage limit" errors without an HTTP status code. A transient
# signal ("try again", "resets at", …) means it's a periodic quota, not
+236
@@ -0,0 +1,236 @@
"""Routing helpers for inbound user-attached images.
Two modes:
native — attach images as OpenAI-style ``image_url`` content parts on the
user turn. Provider adapters (Anthropic, Gemini, Bedrock, Codex,
OpenAI chat.completions) already translate these into their
vendor-specific multimodal formats.
text — run ``vision_analyze`` on each image up-front and prepend the
description to the user's text. The model never sees the pixels;
it only sees a lossy text summary. This is the pre-existing
behaviour and still the right choice for non-vision models.
The decision is made once per message turn by :func:`decide_image_input_mode`.
It reads ``agent.image_input_mode`` from config.yaml (``auto`` | ``native``
| ``text``, default ``auto``) and the active model's capability metadata.
In ``auto`` mode:
- If the user has explicitly configured ``auxiliary.vision.provider``
(i.e. not ``auto`` and not empty), we assume they want the text pipeline
regardless of the main model — they've opted in to a specific vision
backend for a reason (cost, quality, local-only, etc.).
- Otherwise, if the active model reports ``supports_vision=True`` in its
models.dev metadata, we attach natively.
- Otherwise (non-vision model, no explicit override), we fall back to text.
This keeps ``vision_analyze`` surfaced as a tool in every session — skills
and agent flows that chain it (browser screenshots, deeper inspection of
URL-referenced images, style-gating loops) keep working. The routing only
affects *how user-attached images on the current turn* are presented to the
main model.
"""
from __future__ import annotations
import base64
import logging
import mimetypes
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
logger = logging.getLogger(__name__)
_VALID_MODES = frozenset({"auto", "native", "text"})
def _coerce_mode(raw: Any) -> str:
"""Normalize a config value into one of the valid modes."""
if not isinstance(raw, str):
return "auto"
val = raw.strip().lower()
if val in _VALID_MODES:
return val
return "auto"
def _explicit_aux_vision_override(cfg: Optional[Dict[str, Any]]) -> bool:
"""True when the user configured a specific auxiliary vision backend.
An explicit override means the user *wants* the text pipeline (they're
paying for a dedicated vision model), so we don't silently bypass it.
"""
if not isinstance(cfg, dict):
return False
aux = cfg.get("auxiliary") or {}
if not isinstance(aux, dict):
return False
vision = aux.get("vision") or {}
if not isinstance(vision, dict):
return False
provider = str(vision.get("provider") or "").strip().lower()
model = str(vision.get("model") or "").strip()
base_url = str(vision.get("base_url") or "").strip()
# "auto" / "" / blank = not explicit
if provider in ("", "auto") and not model and not base_url:
return False
return True
def _lookup_supports_vision(provider: str, model: str) -> Optional[bool]:
"""Return True/False if we can resolve caps, None if unknown."""
if not provider or not model:
return None
try:
from agent.models_dev import get_model_capabilities
caps = get_model_capabilities(provider, model)
except Exception as exc: # pragma: no cover - defensive
logger.debug("image_routing: caps lookup failed for %s:%s: %s", provider, model, exc)
return None
if caps is None:
return None
return bool(caps.supports_vision)
def decide_image_input_mode(
provider: str,
model: str,
cfg: Optional[Dict[str, Any]],
) -> str:
"""Return ``"native"`` or ``"text"`` for the given turn.
Args:
provider: active inference provider ID (e.g. ``"anthropic"``, ``"openrouter"``).
model: active model slug as it would be sent to the provider.
cfg: loaded config.yaml dict, or None. When None, behaves as auto.
"""
mode_cfg = "auto"
if isinstance(cfg, dict):
agent_cfg = cfg.get("agent") or {}
if isinstance(agent_cfg, dict):
mode_cfg = _coerce_mode(agent_cfg.get("image_input_mode"))
if mode_cfg == "native":
return "native"
if mode_cfg == "text":
return "text"
# auto
if _explicit_aux_vision_override(cfg):
return "text"
supports = _lookup_supports_vision(provider, model)
if supports is True:
return "native"
return "text"
# Image size handling is REACTIVE rather than proactive: we attempt native
# attachment at full size regardless of provider, and rely on
# ``run_agent._try_shrink_image_parts_in_messages`` to shrink + retry if
# the provider rejects the request (e.g. Anthropic's hard 5 MB per-image
# ceiling returned as HTTP 400 "image exceeds 5 MB maximum").
#
# Why reactive: our knowledge of provider ceilings is partial and evolving
# (OpenAI accepts 49 MB+, Anthropic 5 MB, Gemini 100 MB, others unknown).
# A proactive per-provider table would be stale the moment a provider raises
# or lowers its limit, and silently degrading quality for users on providers
# that would have accepted the full image is the worse failure mode.
# The shrink-on-reject path loses 1 API call + maybe 1s of Pillow work when
# it fires, which is cheaper than permanent quality loss.
def _guess_mime(path: Path) -> str:
mime, _ = mimetypes.guess_type(str(path))
if mime and mime.startswith("image/"):
return mime
# mimetypes on some Linux distros mis-maps .jpg; default to jpeg when
# the suffix looks imagey.
suffix = path.suffix.lower()
return {
".jpg": "image/jpeg",
".jpeg": "image/jpeg",
".png": "image/png",
".gif": "image/gif",
".webp": "image/webp",
".bmp": "image/bmp",
}.get(suffix, "image/jpeg")
def _file_to_data_url(path: Path) -> Optional[str]:
"""Encode a local image as a base64 data URL at its native size.
Size limits are NOT enforced here — the agent retry loop
(``run_agent._try_shrink_image_parts_in_messages``) shrinks on the
provider's first rejection. Keeping this simple means providers that
accept large images (OpenAI 49 MB+, Gemini 100 MB) don't pay a silent
quality tax just because one other provider is stricter.
Returns None only if the file can't be read (missing, permission
denied, etc.); the caller reports those paths in ``skipped``.
"""
try:
raw = path.read_bytes()
except Exception as exc:
logger.warning("image_routing: failed to read %s: %s", path, exc)
return None
mime = _guess_mime(path)
b64 = base64.b64encode(raw).decode("ascii")
return f"data:{mime};base64,{b64}"
def build_native_content_parts(
user_text: str,
image_paths: List[str],
) -> Tuple[List[Dict[str, Any]], List[str]]:
"""Build an OpenAI-style ``content`` list for a user turn.
Shape:
[{"type": "text", "text": "..."},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
...]
Images are attached at their native size. If a provider rejects the
request because an image is too large (e.g. Anthropic's 5 MB per-image
ceiling), the agent's retry loop transparently shrinks and retries
once — see ``run_agent._try_shrink_image_parts_in_messages``.
Returns (content_parts, skipped_paths). Skipped paths are files that
couldn't be read from disk.
"""
parts: List[Dict[str, Any]] = []
skipped: List[str] = []
text = (user_text or "").strip()
if text:
parts.append({"type": "text", "text": text})
for raw_path in image_paths:
p = Path(raw_path)
if not p.exists() or not p.is_file():
skipped.append(str(raw_path))
continue
data_url = _file_to_data_url(p)
if not data_url:
skipped.append(str(raw_path))
continue
parts.append({
"type": "image_url",
"image_url": {"url": data_url},
})
# If the text was empty, add a neutral prompt so the turn isn't just images.
if not text and any(p.get("type") == "image_url" for p in parts):
parts.insert(0, {"type": "text", "text": "What do you see in this image?"})
return parts, skipped
__all__ = [
"decide_image_input_mode",
"build_native_content_parts",
]
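The routing rules in this module reduce to a small decision table. The sketch below restates that precedence as a pure function so it can be checked in isolation. `route_image_input` and its parameters are illustrative names, not part of the module, and the models.dev capability lookup is replaced by a plain `Optional[bool]` argument.

```python
from typing import Optional

VALID_MODES = frozenset({"auto", "native", "text"})


def route_image_input(
    configured_mode: object,
    has_explicit_aux_vision: bool,
    supports_vision: Optional[bool],
) -> str:
    """Hypothetical restatement of the routing precedence described above."""
    # Anything that is not a recognised string falls back to "auto".
    mode = configured_mode.strip().lower() if isinstance(configured_mode, str) else "auto"
    if mode not in VALID_MODES:
        mode = "auto"
    # Explicit "native" / "text" settings win outright.
    if mode in ("native", "text"):
        return mode
    # auto: a dedicated auxiliary vision backend forces the text pipeline.
    if has_explicit_aux_vision:
        return "text"
    # auto: attach natively only when vision support is positively known.
    return "native" if supports_vision is True else "text"
```

Note that an unknown capability (`None`) is treated the same as "no vision", so a failed metadata lookup degrades to the text pipeline rather than to a provider error.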
+113 -4
@@ -63,15 +63,124 @@ def sanitize_context(text: str) -> str:
return text
def build_memory_context_block(raw_context: str) -> str:
"""Wrap prefetched memory in a fenced block with system note.
class StreamingContextScrubber:
"""Stateful scrubber for streaming text that may contain split memory-context spans.
The fence prevents the model from treating recalled context as user
discourse. Injected at API-call time only — never persisted.
The one-shot ``sanitize_context`` regex cannot survive chunk boundaries:
a ``<memory-context>`` opened in one delta and closed in a later delta
leaks its payload to the UI because the non-greedy block regex needs
both tags in one string. This scrubber runs a small state machine
across deltas, holding back partial-tag tails and discarding
everything inside a span (including the system-note line).
Usage::
scrubber = StreamingContextScrubber()
for delta in stream:
visible = scrubber.feed(delta)
if visible:
emit(visible)
trailing = scrubber.flush() # at end of stream
if trailing:
emit(trailing)
The scrubber is stateful, so each agent instance should own one. Callers
starting a new top-level response (new turn) should create a fresh
scrubber or call ``reset()``.
"""
_OPEN_TAG = "<memory-context>"
_CLOSE_TAG = "</memory-context>"
def __init__(self) -> None:
self._in_span: bool = False
self._buf: str = ""
def reset(self) -> None:
self._in_span = False
self._buf = ""
def feed(self, text: str) -> str:
"""Return the visible portion of ``text`` after scrubbing.
Any trailing fragment that could be the start of an open/close tag
is held back in the internal buffer and surfaced on the next
``feed()`` call or discarded/emitted by ``flush()``.
"""
if not text:
return ""
buf = self._buf + text
self._buf = ""
out: list[str] = []
while buf:
if self._in_span:
idx = buf.lower().find(self._CLOSE_TAG)
if idx == -1:
# Hold back a potential partial close tag; drop the rest
held = self._max_partial_suffix(buf, self._CLOSE_TAG)
self._buf = buf[-held:] if held else ""
return "".join(out)
# Found close — skip span content + tag, continue
buf = buf[idx + len(self._CLOSE_TAG):]
self._in_span = False
else:
idx = buf.lower().find(self._OPEN_TAG)
if idx == -1:
# No open tag — hold back a potential partial open tag
held = self._max_partial_suffix(buf, self._OPEN_TAG)
if held:
out.append(buf[:-held])
self._buf = buf[-held:]
else:
out.append(buf)
return "".join(out)
# Emit text before the tag, enter span
if idx > 0:
out.append(buf[:idx])
buf = buf[idx + len(self._OPEN_TAG):]
self._in_span = True
return "".join(out)
def flush(self) -> str:
"""Emit any held-back buffer at end-of-stream.
If we're still inside an unterminated span the remaining content is
discarded (safer: leaking partial memory context is worse than a
truncated answer). Otherwise the held-back partial-tag tail is
emitted verbatim (it turned out not to be a real tag).
"""
if self._in_span:
self._buf = ""
self._in_span = False
return ""
tail = self._buf
self._buf = ""
return tail
@staticmethod
def _max_partial_suffix(buf: str, tag: str) -> int:
"""Return the length of the longest buf-suffix that is a tag-prefix.
Case-insensitive. Returns 0 if no suffix could start the tag.
"""
tag_lower = tag.lower()
buf_lower = buf.lower()
max_check = min(len(buf_lower), len(tag_lower) - 1)
for i in range(max_check, 0, -1):
if tag_lower.startswith(buf_lower[-i:]):
return i
return 0
def build_memory_context_block(raw_context: str) -> str:
"""Wrap prefetched memory in a fenced block with system note."""
if not raw_context or not raw_context.strip():
return ""
clean = sanitize_context(raw_context)
if clean != raw_context:
logger.warning("memory provider returned pre-wrapped context; stripped")
return (
"<memory-context>\n"
"[System note: The following is recalled memory context, "
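The chunk-boundary problem that motivates `StreamingContextScrubber` can be reproduced in a few lines. The sketch below is a minimal illustration, not the module's code: `BLOCK_RE` is an assumed stand-in for the one-shot block regex (the real pattern is not shown in this diff), and `max_partial_suffix` mirrors the static helper above as a free function.

```python
import re

OPEN_TAG = "<memory-context>"
# Assumed stand-in for the one-shot block regex; the real pattern may differ.
BLOCK_RE = re.compile(r"<memory-context>.*?</memory-context>", re.DOTALL | re.IGNORECASE)


def max_partial_suffix(buf: str, tag: str) -> int:
    """Length of the longest suffix of buf that is a proper prefix of tag."""
    tag_lower = tag.lower()
    buf_lower = buf.lower()
    max_check = min(len(buf_lower), len(tag_lower) - 1)
    for i in range(max_check, 0, -1):
        if tag_lower.startswith(buf_lower[-i:]):
            return i
    return 0


# A tag split across two streaming deltas.
chunks = ["Answer: <memory-", "context>secret</memory-context> done"]

# Scrubbing each delta on its own never sees both tags, so the payload leaks.
leaky = "".join(BLOCK_RE.sub("", chunk) for chunk in chunks)

# Holding back the partial-tag tail keeps it out of the UI until more text arrives.
held = max_partial_suffix(chunks[0], OPEN_TAG)
visible_now = chunks[0][:-held] if held else chunks[0]
```

Here `leaky` still contains the span payload, while `visible_now` is only the text before the partial tag. That held-back tail is what `feed()` stores in its buffer between calls.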
+35 -16
@@ -51,6 +51,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
"qwen-oauth",
"xiaomi",
"arcee",
"gmi",
"custom", "local",
# Common aliases
"google", "google-gemini", "google-ai-studio",
@@ -60,6 +61,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
"stepfun", "opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
"mimo", "xiaomi-mimo",
"arcee-ai", "arceeai",
"gmi-cloud", "gmicloud",
"xai", "x-ai", "x.ai", "grok",
"nvidia", "nim", "nvidia-nim", "nemotron",
"qwen-portal",
@@ -307,6 +309,7 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"integrate.api.nvidia.com": "nvidia",
"api.xiaomimimo.com": "xiaomi",
"xiaomimimo.com": "xiaomi",
"api.gmi-serving.com": "gmi",
"ollama.com": "ollama-cloud",
}
@@ -702,6 +705,29 @@ def fetch_endpoint_model_metadata(
return {}
def _resolve_endpoint_context_length(
model: str,
base_url: str,
api_key: str = "",
) -> Optional[int]:
"""Resolve context length from an endpoint's live ``/models`` metadata."""
endpoint_metadata = fetch_endpoint_model_metadata(base_url, api_key=api_key)
matched = endpoint_metadata.get(model)
if not matched:
if len(endpoint_metadata) == 1:
matched = next(iter(endpoint_metadata.values()))
else:
for key, entry in endpoint_metadata.items():
if model in key or key in model:
matched = entry
break
if matched:
context_length = matched.get("context_length")
if isinstance(context_length, int):
return context_length
return None
def _get_context_cache_path() -> Path:
"""Return path to the persistent context length cache file."""
from hermes_constants import get_hermes_home
@@ -1295,22 +1321,9 @@ def get_model_context_length(
# returns 128k) instead of the model's full context (400k). models.dev
# has the correct per-provider values and is checked at step 5+.
if _is_custom_endpoint(base_url) and not _is_known_provider_base_url(base_url):
endpoint_metadata = fetch_endpoint_model_metadata(base_url, api_key=api_key)
matched = endpoint_metadata.get(model)
if not matched:
# Single-model servers: if only one model is loaded, use it
if len(endpoint_metadata) == 1:
matched = next(iter(endpoint_metadata.values()))
else:
# Fuzzy match: substring in either direction
for key, entry in endpoint_metadata.items():
if model in key or key in model:
matched = entry
break
if matched:
context_length = matched.get("context_length")
if isinstance(context_length, int):
return context_length
context_length = _resolve_endpoint_context_length(model, base_url, api_key=api_key)
if context_length is not None:
return context_length
if not _is_known_provider_base_url(base_url):
# 3. Try querying local server directly
if is_local_endpoint(base_url):
@@ -1374,6 +1387,12 @@ def get_model_context_length(
if base_url:
save_context_length(model, base_url, codex_ctx)
return codex_ctx
if effective_provider == "gmi" and base_url:
# GMI exposes authoritative context_length via /models, but it is not
# in models.dev yet. Preserve that higher-fidelity endpoint lookup.
ctx = _resolve_endpoint_context_length(model, base_url, api_key=api_key)
if ctx is not None:
return ctx
if effective_provider:
from agent.models_dev import lookup_models_dev_context
ctx = lookup_models_dev_context(effective_provider, model)
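The matching order inside `_resolve_endpoint_context_length` (exact ID, then single-model server, then substring match in either direction) is easier to see without the network call. The sketch below assumes the `/models` metadata has already been fetched into a dict. `pick_context_length` is an illustrative name, not a function in this module.

```python
from typing import Any, Dict, Optional


def pick_context_length(model: str, metadata: Dict[str, Dict[str, Any]]) -> Optional[int]:
    """Illustrative stand-in: same lookup order, metadata passed in directly."""
    matched = metadata.get(model)  # 1. exact model ID
    if not matched:
        if len(metadata) == 1:
            # 2. single-model server: use whatever is loaded
            matched = next(iter(metadata.values()))
        else:
            # 3. fuzzy match: substring in either direction, first hit wins
            for key, entry in metadata.items():
                if model in key or key in model:
                    matched = entry
                    break
    if matched:
        context_length = matched.get("context_length")
        # Non-integer values (e.g. "32k") are ignored rather than parsed.
        if isinstance(context_length, int):
            return context_length
    return None
```

Because the fuzzy step takes the first substring hit in dict order, a short model name can match more than one entry. Passing the provider's exact model ID avoids that ambiguity.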
+6
@@ -141,6 +141,12 @@ DEFAULT_AGENT_IDENTITY = (
"Be targeted and efficient in your exploration and investigations."
)
HERMES_AGENT_HELP_GUIDANCE = (
"If the user asks about configuring, setting up, or using Hermes Agent "
"itself, load the `hermes-agent` skill with skill_view(name='hermes-agent') "
"before answering. Docs: https://hermes-agent.nousresearch.com/docs"
)
MEMORY_GUIDANCE = (
"You have persistent memory across sessions. Save durable facts using the memory "
"tool: user preferences, environment details, tool quirks, and stable conventions. "
+7 -3
@@ -56,8 +56,12 @@ _SENSITIVE_BODY_KEYS = frozenset({
})
# Snapshot at import time so runtime env mutations (e.g. LLM-generated
# `export HERMES_REDACT_SECRETS=false`) cannot disable redaction mid-session.
_REDACT_ENABLED = os.getenv("HERMES_REDACT_SECRETS", "").lower() not in ("0", "false", "no", "off")
# `export HERMES_REDACT_SECRETS=true`) cannot enable/disable redaction
# mid-session. OFF by default — user must opt in via
# `security.redact_secrets: true` in config.yaml (bridged to this env var
# in hermes_cli/main.py and gateway/run.py) or `HERMES_REDACT_SECRETS=true`
# in ~/.hermes/.env.
_REDACT_ENABLED = os.getenv("HERMES_REDACT_SECRETS", "").lower() in ("1", "true", "yes", "on")
# Known API key prefixes -- match the prefix + contiguous token chars
_PREFIX_PATTERNS = [
@@ -257,7 +261,7 @@ def redact_sensitive_text(text: str) -> str:
"""Apply all redaction patterns to a block of text.
Safe to call on any string -- non-matching text passes through unchanged.
Disabled when security.redact_secrets is false in config.yaml.
Disabled by default — enable via security.redact_secrets: true in config.yaml.
"""
if text is None:
return None
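The flip from opt-out to opt-in comes down to which set of strings the env var is compared against. The sketch below shows the new opt-in check. `redaction_enabled` and its `env` parameter are illustrative, whereas the module evaluates the expression once at import time.

```python
import os
from typing import Mapping, Optional

_TRUTHY = ("1", "true", "yes", "on")


def redaction_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Opt-in check: only an explicit truthy value enables redaction."""
    env = os.environ if env is None else env
    # Unset, empty, or unrecognised values all mean "off".
    return env.get("HERMES_REDACT_SECRETS", "").lower() in _TRUTHY
```

As in the module's expression, the value is lower-cased but not stripped, so a value with surrounding whitespace such as `" true "` does not enable redaction.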
+33 -4
@@ -6,12 +6,18 @@ adds latency to the user-facing reply.
import logging
import threading
from typing import Optional
from typing import Callable, Optional
from agent.auxiliary_client import call_llm
logger = logging.getLogger(__name__)
# Callback signature: (task_name, exception) -> None. Used to surface
# auxiliary failures to the user through AIAgent._emit_auxiliary_failure
# so silent-drops (e.g. OpenRouter 402 exhausting the fallback chain)
# become visible instead of piling up as NULL session titles.
FailureCallback = Callable[[str, BaseException], None]
_TITLE_PROMPT = (
"Generate a short, descriptive title (3-7 words) for a conversation that starts with the "
"following exchange. The title should capture the main topic or intent. "
@@ -19,11 +25,21 @@ _TITLE_PROMPT = (
)
def generate_title(user_message: str, assistant_response: str, timeout: float = 30.0) -> Optional[str]:
def generate_title(
user_message: str,
assistant_response: str,
timeout: float = 30.0,
failure_callback: Optional[FailureCallback] = None,
) -> Optional[str]:
"""Generate a session title from the first exchange.
Uses the auxiliary LLM client (cheapest/fastest available model).
Returns the title string or None on failure.
``failure_callback`` is invoked with ``(task, exception)`` when the
auxiliary call raises — the caller typically wires this to
``AIAgent._emit_auxiliary_failure`` so the user sees a warning instead
of silently accumulating untitled sessions.
"""
# Truncate long messages to keep the request small
user_snippet = user_message[:500] if user_message else ""
@@ -52,7 +68,15 @@ def generate_title(user_message: str, assistant_response: str, timeout: float =
title = title[:77] + "..."
return title if title else None
except Exception as e:
logger.debug("Title generation failed: %s", e)
# Log at WARNING so this shows up in agent.log without debug mode.
# Full detail at debug level for operators who need the stack.
logger.warning("Title generation failed: %s", e)
logger.debug("Title generation traceback", exc_info=True)
if failure_callback is not None:
try:
failure_callback("title generation", e)
except Exception:
logger.debug("Title generation failure_callback raised", exc_info=True)
return None
@@ -61,6 +85,7 @@ def auto_title_session(
session_id: str,
user_message: str,
assistant_response: str,
failure_callback: Optional[FailureCallback] = None,
) -> None:
"""Generate and set a session title if one doesn't already exist.
@@ -81,7 +106,9 @@ def auto_title_session(
except Exception:
return
title = generate_title(user_message, assistant_response)
title = generate_title(
user_message, assistant_response, failure_callback=failure_callback
)
if not title:
return
@@ -98,6 +125,7 @@ def maybe_auto_title(
user_message: str,
assistant_response: str,
conversation_history: list,
failure_callback: Optional[FailureCallback] = None,
) -> None:
"""Fire-and-forget title generation after the first exchange.
@@ -119,6 +147,7 @@ def maybe_auto_title(
thread = threading.Thread(
target=auto_title_session,
args=(session_db, session_id, user_message, assistant_response),
kwargs={"failure_callback": failure_callback},
daemon=True,
name="auto-title",
)
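The `failure_callback` plumbing follows a common pattern: log the failure, notify the caller, and never let the notification itself raise. The sketch below isolates that pattern. `run_auxiliary` is a hypothetical helper, not a function in this module.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Same shape as the module's alias: (task_name, exception) -> None.
FailureCallback = Callable[[str, BaseException], None]


def run_auxiliary(
    task: str,
    fn: Callable[[], str],
    failure_callback: Optional[FailureCallback] = None,
) -> Optional[str]:
    """Run a best-effort auxiliary task; surface failures without raising."""
    try:
        return fn()
    except Exception as exc:
        logger.warning("%s failed: %s", task, exc)
        if failure_callback is not None:
            try:
                failure_callback(task, exc)
            except Exception:
                # A broken callback must not turn a soft failure into a crash.
                logger.debug("%s failure_callback raised", task, exc_info=True)
        return None
```

Guarding the callback matters here because title generation runs on a daemon thread, where an unhandled exception would terminate the thread with nothing visible to the user.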
+240 -40
@@ -15,6 +15,7 @@ Usage:
import logging
import os
import re
import shutil
import sys
import json
@@ -758,9 +759,17 @@ def _run_cleanup():
pass
try:
if _active_agent_ref and hasattr(_active_agent_ref, 'shutdown_memory_provider'):
_active_agent_ref.shutdown_memory_provider(
getattr(_active_agent_ref, 'conversation_history', None) or []
)
# Forward the agent's own transcript so memory providers'
# ``on_session_end`` hooks see the real conversation instead of
# an empty list (#15165). ``_session_messages`` is set on
# ``AIAgent.__init__`` and refreshed every turn via
# ``_persist_session``. Fall back to no-arg on test stubs /
# partially-initialised agents where the attribute is missing.
_session_msgs = getattr(_active_agent_ref, '_session_messages', None)
if isinstance(_session_msgs, list):
_active_agent_ref.shutdown_memory_provider(_session_msgs)
else:
_active_agent_ref.shutdown_memory_provider()
except Exception:
pass
@@ -1547,6 +1556,60 @@ def _should_auto_attach_clipboard_image_on_paste(pasted_text: str) -> bool:
return not pasted_text.strip()
def _strip_leaked_bracketed_paste_wrappers(text: str) -> str:
"""Strip leaked bracketed-paste wrapper markers from user-visible text.
Defensive normalization for cases where terminal/prompt_toolkit parsing
fails and bracketed-paste markers end up in the buffer as literal text.
We strip canonical wrappers unconditionally and also handle degraded
visible forms like ``[200~`` / ``[201~`` and ``00~`` / ``01~`` when they
look like wrapper boundaries, not arbitrary user content.
"""
if not text:
return text
text = (
text.replace("\x1b[200~", "")
.replace("\x1b[201~", "")
.replace("^[[200~", "")
.replace("^[[201~", "")
)
text = re.sub(r"(^|[\s\n>:\]\)])\[200~", r"\1", text)
text = re.sub(r"\[201~(?=$|[\s\n<\[\(\):;.,!?])", "", text)
text = re.sub(r"(^|[\s\n>:\]\)])00~", r"\1", text)
text = re.sub(r"01~(?=$|[\s\n<\[\(\):;.,!?])", "", text)
return text
# Cursor Position Report (CPR / DSR) response, format ``ESC[<row>;<col>R``.
# prompt_toolkit's _on_resize() + renderer send ``ESC[6n`` queries to the
# terminal; under resize storms or tab switches the terminal's reply can
# race past the input parser and end up in the input buffer as literal
# text (see issue #14692). Also matches the visible-form ``^[[<row>;<col>R``
# that appears when the ESC byte was stripped by a prior filter.
_DSR_CPR_ESC_RE = re.compile(r"\x1b\[\d+;\d+R")
_DSR_CPR_VISIBLE_RE = re.compile(r"\^\[\[\d+;\d+R")
def _strip_leaked_terminal_responses(text: str) -> str:
"""Strip leaked terminal control-response sequences from user input.
Covers Cursor Position Report (CPR / DSR) responses ``ESC[<row>;<col>R``
and the visible ``^[[<row>;<col>R`` form. These are replies the terminal
sends back to queries prompt_toolkit makes during ``_on_resize`` /
``_request_absolute_cursor_position``. When the input parser drops one
(resize storms, multiplexer focus changes, slow PTYs) the response
lands in the input buffer as literal text and corrupts what the user
typed.
"""
if not text:
return text
text = _DSR_CPR_ESC_RE.sub("", text)
text = _DSR_CPR_VISIBLE_RE.sub("", text)
return text
def _collect_query_images(query: str | None, image_arg: str | None = None) -> tuple[str, list[Path]]:
"""Collect local image attachments for single-query CLI flows."""
message = query or ""
@@ -2155,6 +2218,42 @@ class HermesCLI:
self._last_invalidate = now
self._app.invalidate()
def _force_full_redraw(self) -> None:
"""Force a clean full-screen repaint of the prompt_toolkit UI.
Used to recover from terminal buffer drift caused by external
redraws we can't detect — e.g. macOS cmux / tmux tab switches,
``clear`` issued from a subshell, or SSH window restores. These
wipe or repaint the terminal without firing SIGWINCH, so
prompt_toolkit's tracked ``_cursor_pos`` no longer matches reality
and the next incremental redraw stacks on top of stale content
(ghost status bars, duplicated prompts).
Bound to Ctrl+L and exposed as the ``/redraw`` slash command,
matching the standard terminal-UX convention (bash, zsh, fish,
vim, htop).
"""
app = getattr(self, "_app", None)
if not app:
return
try:
renderer = app.renderer
out = renderer.output
out.reset_attributes()
out.erase_screen()
out.cursor_goto(0, 0)
out.flush()
# Drop prompt_toolkit's cached screen + cursor state so the
# next _redraw() starts from a known (0, 0) origin and
# re-renders every cell rather than diffing against stale.
renderer.reset(leave_alternate_screen=False)
except Exception:
pass
try:
app.invalidate()
except Exception:
pass
def _status_bar_context_style(self, percent_used: Optional[int]) -> str:
if percent_used is None:
return "class:status-bar-dim"
@@ -5901,6 +6000,7 @@ class HermesCLI:
platform_status = {
Platform.TELEGRAM: ("Telegram", "TELEGRAM_BOT_TOKEN"),
Platform.DISCORD: ("Discord", "DISCORD_BOT_TOKEN"),
Platform.SLACK: ("Slack", "SLACK_BOT_TOKEN"),
Platform.WHATSAPP: ("WhatsApp", "WHATSAPP_ENABLED"),
}
@@ -5971,6 +6071,12 @@ class HermesCLI:
self.show_toolsets()
elif canonical == "config":
self.show_config()
elif canonical == "redraw":
# Manual recovery for terminal buffer drift from multiplexer
# tab switches, subshell ``clear``, SSH window restores, etc.
# See issue #8688 (cmux). Ctrl+L is bound to the same helper.
self._force_full_redraw()
_cprint(f" {_DIM}✓ UI redrawn{_RST}")
elif canonical == "clear":
self.new_session(silent=True)
# Clear terminal screen. Inside the TUI, Rich's console.clear()
@@ -8336,13 +8442,62 @@ class HermesCLI:
):
return None
# Pre-process images through the vision tool (Gemini Flash) so the
# main model receives text descriptions instead of raw base64 image
# content — works with any model, not just vision-capable ones.
# Route image attachments based on the active model's vision capability.
# "native" → pass pixels as OpenAI-style content parts (adapters
# translate for Anthropic/Gemini/Bedrock).
# "text" → pre-analyze each image with vision_analyze and prepend the
# description as text — works with non-vision models.
# See agent/image_routing.py for the decision table.
if images:
message = self._preprocess_images_with_vision(
message if isinstance(message, str) else "", images
)
try:
from agent.image_routing import (
build_native_content_parts,
decide_image_input_mode,
)
from hermes_cli.config import load_config
_img_mode = decide_image_input_mode(
(self.provider or "").strip(),
(self.model or "").strip(),
load_config(),
)
except Exception as _img_exc:
logging.debug("image_routing decision failed, defaulting to text: %s", _img_exc)
_img_mode = "text"
if _img_mode == "native":
try:
_text_for_parts = message if isinstance(message, str) else ""
_img_str_paths = [str(p) for p in images]
_parts, _skipped = build_native_content_parts(
_text_for_parts,
_img_str_paths,
)
if _skipped:
_cprint(
f" {_DIM}⚠ skipped {len(_skipped)} unreadable image path(s){_RST}"
)
if any(p.get("type") == "image_url" for p in _parts):
_img_names = ", ".join(Path(p).name for p in _img_str_paths)
_cprint(
f" {_DIM}📎 attaching {len(images)} image(s) natively "
f"(model supports vision): {_img_names}{_RST}"
)
message = _parts
else:
# All images unreadable — fall back to text enrichment.
message = self._preprocess_images_with_vision(
message if isinstance(message, str) else "", images
)
except Exception as _img_exc:
logging.warning("native image attach failed, falling back to text: %s", _img_exc)
message = self._preprocess_images_with_vision(
message if isinstance(message, str) else "", images
)
else:
message = self._preprocess_images_with_vision(
message if isinstance(message, str) else "", images
)
# Expand @ context references (e.g. @file:main.py, @diff, @folder:src/)
if isinstance(message, str) and "@" in message:
@@ -8645,12 +8800,20 @@ class HermesCLI:
if response and result and not result.get("failed") and not result.get("partial"):
try:
from agent.title_generator import maybe_auto_title
# Route title-generation failures through the agent's
# user-visible warning channel so a depleted auxiliary
# provider doesn't silently leave sessions untitled
# (issue #15775).
_title_failure_cb = getattr(
self.agent, "_emit_auxiliary_failure", None
) if self.agent else None
maybe_auto_title(
self._session_db,
self.session_id,
message,
response,
self.conversation_history,
failure_callback=_title_failure_cb,
)
except Exception:
pass
@@ -9528,6 +9691,17 @@ class HermesCLI:
"""Down arrow: browse history when on last line, else move cursor down."""
event.app.current_buffer.auto_down(count=event.arg)
@kb.add('c-l')
def handle_ctrl_l(event):
"""Ctrl+L: force a clean full-screen repaint.
Recovers the UI after external terminal buffer drift (tmux /
cmux tab switches, ``clear`` from a subshell, SSH window
restores, etc.) that prompt_toolkit can't detect on its own.
Matches the universal bash/zsh/fish/vim/htop convention.
"""
self._force_full_redraw()
@kb.add('c-c')
def handle_ctrl_c(event):
"""Handle Ctrl+C - cancel interactive prompts, interrupt agent, or exit.
@@ -9755,10 +9929,18 @@ class HermesCLI:
placeholder while preserving any existing user text in the
buffer.
"""
# Diagnostic canary: measure how long the paste handler blocks
# the prompt_toolkit event loop. If this exceeds ~500ms we log
# it so recurring "CLI freezes on paste" reports (issue #16263,
# macOS Tahoe 26 + iTerm2/Ghostty) arrive with data attached.
_paste_handler_start = time.perf_counter()
_paste_raw_size = len(event.data or "")
pasted_text = event.data or ""
# Normalise line endings — Windows \r\n and old Mac \r both become \n
# so the 5-line collapse threshold and display are consistent.
pasted_text = pasted_text.replace('\r\n', '\n').replace('\r', '\n')
pasted_text = _strip_leaked_bracketed_paste_wrappers(pasted_text)
pasted_text = _strip_leaked_terminal_responses(pasted_text)
if _should_auto_attach_clipboard_image_on_paste(pasted_text) and self._try_attach_clipboard_image():
event.app.invalidate()
if pasted_text:
@@ -9781,6 +9963,17 @@ class HermesCLI:
buf.insert_text(prefix + placeholder)
else:
buf.insert_text(pasted_text)
_paste_handler_elapsed_ms = (time.perf_counter() - _paste_handler_start) * 1000.0
if _paste_handler_elapsed_ms > 500.0:
logger.warning(
"Slow bracketed-paste handler: %.1fms to process %d bytes "
"(%d lines) on %s. If the input becomes unresponsive after "
"this, attach this log line to the bug report.",
_paste_handler_elapsed_ms,
_paste_raw_size,
pasted_text.count('\n') + 1 if pasted_text else 0,
sys.platform,
)
@kb.add('c-v')
def handle_ctrl_v(event):
@@ -9900,7 +10093,16 @@ class HermesCLI:
still batch newlines. Alt+Enter only adds 1 newline per
event so it never triggers this.
"""
text = buf.text
text = _strip_leaked_bracketed_paste_wrappers(buf.text)
text = _strip_leaked_terminal_responses(text)
if text != buf.text:
cursor = min(buf.cursor_position, len(text))
_paste_just_collapsed[0] = True
buf.text = text
buf.cursor_position = cursor
_prev_text_len[0] = len(text)
_prev_newline_count[0] = text.count('\n')
return
chars_added = len(text) - _prev_text_len[0]
_prev_text_len[0] = len(text)
if _paste_just_collapsed[0] or self._skip_paste_collapse:
@@ -10557,36 +10759,30 @@ class HermesCLI:
# only cursor_up()s by the stored layout height, missing the extra
# rows created by reflow — leaving ghost duplicates visible.
#
# Fix: before the standard erase, inflate _cursor_pos.y so the
# cursor moves up far enough to cover the reflowed ghost content.
# It's not just column-shrink: widening, row-shrinking, and
# multiplexer-driven SIGWINCH-less redraws (cmux / tmux tab switch)
# all produce the same class of drift, where the renderer's tracked
# _cursor_pos.y no longer matches terminal reality. The only reliable
# recovery is a full screen-clear (\x1b[2J\x1b[H) before the next
# redraw, so we force one on every resize rather than trying to
# compute the exact drift.
_original_on_resize = app._on_resize
def _resize_clear_ghosts():
from prompt_toolkit.data_structures import Point as _Pt
renderer = app.renderer
try:
old_size = renderer._last_size
new_size = renderer.output.get_size()
if (
old_size
and new_size.columns < old_size.columns
and new_size.columns > 0
):
reflow_factor = (
(old_size.columns + new_size.columns - 1)
// new_size.columns
)
last_h = (
renderer._last_screen.height
if renderer._last_screen
else 0
)
extra = last_h * (reflow_factor - 1)
if extra > 0:
renderer._cursor_pos = _Pt(
x=renderer._cursor_pos.x,
y=renderer._cursor_pos.y + extra,
)
out = renderer.output
# Reset attributes, erase the entire screen, and home the
# cursor. This overwrites any reflowed status-bar rows or
# stale content the terminal kept from the prior layout.
out.reset_attributes()
out.erase_screen()
out.cursor_goto(0, 0)
out.flush()
# Tell the renderer its tracked position is fresh so its
# own erase() inside _on_resize doesn't cursor_up() past
# the top of the screen.
renderer.reset(leave_alternate_screen=False)
except Exception:
pass # never break resize handling
_original_on_resize()
@@ -10594,7 +10790,6 @@ class HermesCLI:
app._on_resize = _resize_clear_ghosts
def spinner_loop():
last_idle_refresh = 0.0
while not self._should_exit:
if not self._app:
time.sleep(0.1)
@@ -10603,10 +10798,11 @@ class HermesCLI:
self._invalidate(min_interval=0.1)
time.sleep(0.1)
else:
now = time.monotonic()
if now - last_idle_refresh >= 1.0:
last_idle_refresh = now
self._invalidate(min_interval=1.0)
# Do not repaint the idle prompt every second. In non-full-screen
# prompt_toolkit mode, background redraws can fight tmux/Ghostty/cmux
# viewport restoration after focus changes and visually move the
# command input area. Keep idle stable; input/agent events still
# invalidate explicitly when the UI actually changes.
time.sleep(0.2)
spinner_thread = threading.Thread(target=spinner_loop, daemon=True)
@@ -10648,6 +10844,10 @@ class HermesCLI:
submit_images = []
if isinstance(user_input, tuple):
user_input, submit_images = user_input
if isinstance(user_input, str):
user_input = _strip_leaked_bracketed_paste_wrappers(user_input)
user_input = _strip_leaked_terminal_responses(user_input)
# Check for commands — but detect dragged/pasted file paths first.
# See _detect_file_drop() for details.
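The two input sanitizers added in this file can be exercised outside the CLI. The sketch below combines the canonical bracketed-paste markers, the `[200~` / `[201~` degraded forms, and the CPR patterns into one hypothetical helper, `strip_terminal_noise`. It omits the `00~` / `01~` forms for brevity, so it is a reduced illustration rather than a replacement for the real functions.

```python
import re

_CPR_ESC_RE = re.compile(r"\x1b\[\d+;\d+R")       # ESC[<row>;<col>R
_CPR_VISIBLE_RE = re.compile(r"\^\[\[\d+;\d+R")   # ^[[<row>;<col>R (ESC shown as text)


def strip_terminal_noise(text: str) -> str:
    """Remove leaked paste wrappers and cursor-position reports from input."""
    if not text:
        return text
    # Canonical wrappers are never legitimate user content.
    for marker in ("\x1b[200~", "\x1b[201~", "^[[200~", "^[[201~"):
        text = text.replace(marker, "")
    # Degraded forms are stripped only where they look like wrapper boundaries.
    text = re.sub(r"(^|[\s\n>:\]\)])\[200~", r"\1", text)
    text = re.sub(r"\[201~(?=$|[\s\n<\[\(\):;.,!?])", "", text)
    # Cursor Position Report replies that raced past the input parser.
    text = _CPR_ESC_RE.sub("", text)
    text = _CPR_VISIBLE_RE.sub("", text)
    return text
```

The boundary checks are what keep ordinary text such as `arr[200~x` intact: the degraded marker is removed only at the start of the text or after whitespace or punctuation.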
+31 -3
@@ -311,6 +311,12 @@ def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None
elif schedule["kind"] == "cron":
if not HAS_CRONITER:
logger.warning(
"Cannot compute next run for cron schedule %r: 'croniter' "
"is not installed. Install the 'cron' extra (pip install "
"'hermes-agent[cron]') to re-enable recurring cron jobs.",
schedule.get("expr"),
)
return None
cron = croniter(schedule["expr"], now)
next_run = cron.get_next(datetime)
@@ -698,10 +704,32 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None,
# Compute next run
job["next_run_at"] = compute_next_run(job["schedule"], now)
# If no next run (one-shot completed), disable
# If no next run, decide whether this is terminal completion
# (one-shot) or a transient failure (recurring schedule couldn't
# compute — e.g. 'croniter' missing from the runtime env).
# Recurring jobs must NEVER be silently disabled: that turns a
# missing runtime dep into "job completed" and the user's
# schedule quietly stops firing. See issue #16265.
if job["next_run_at"] is None:
job["enabled"] = False
job["state"] = "completed"
kind = job.get("schedule", {}).get("kind")
if kind in ("cron", "interval"):
job["state"] = "error"
if not job.get("last_error"):
job["last_error"] = (
"Failed to compute next run for recurring "
"schedule (is the 'croniter' package "
"installed in the gateway's Python env?)"
)
logger.error(
"Job '%s' (%s) could not compute next_run_at; "
"leaving enabled and marking state=error so the "
"job is not silently disabled.",
job.get("name", job["id"]),
kind,
)
else:
job["enabled"] = False
job["state"] = "completed"
elif job.get("state") != "paused":
job["state"] = "scheduled"
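The branch above separates terminal completion from a transient scheduling failure. A minimal standalone sketch of that decision follows; the function name `settle_job_without_next_run` and the one-shot kind `"once"` are hypothetical, and the real logic lives inline in `mark_job_run`.

```python
from typing import Any, Dict, Optional

# Schedule kinds the hunk treats as recurring.
RECURRING_KINDS = ("cron", "interval")


def settle_job_without_next_run(job: Dict[str, Any]) -> Dict[str, Any]:
    """Update a job in place when no next run could be computed."""
    kind: Optional[str] = job.get("schedule", {}).get("kind")
    if kind in RECURRING_KINDS:
        # Transient failure: leave the job enabled and surface the error.
        job["state"] = "error"
        if not job.get("last_error"):
            job["last_error"] = "Failed to compute next run for recurring schedule"
    else:
        # One-shot job that has already run: terminal completion.
        job["enabled"] = False
        job["state"] = "completed"
    return job


cron_job = settle_job_without_next_run(
    {"enabled": True, "schedule": {"kind": "cron", "expr": "0 * * * *"}}
)
one_shot = settle_job_without_next_run(
    {"enabled": True, "schedule": {"kind": "once"}}  # "once" is an assumed kind name
)
print(cron_job["enabled"], cron_job["state"])
print(one_shot["enabled"], one_shot["state"])
```

The recurring job stays enabled with `state="error"`, so a later tick can recover once the missing dependency is installed; only the one-shot job is disabled.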
+20
@@ -822,6 +822,8 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
logger.info("Running job '%s' (ID: %s)", job_name, job_id)
logger.info("Prompt: %s", prompt[:100])
agent = None
# Mark this as a cron session so the approval system can apply cron_mode.
# This env var is process-wide and persists for the lifetime of the
# scheduler process — every job this process runs is a cron job.
@@ -1170,6 +1172,24 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
_session_db.close()
except (Exception, KeyboardInterrupt) as e:
logger.debug("Job '%s': failed to close SQLite session store: %s", job_id, e)
# Release subprocesses, terminal sandboxes, browser daemons, and the
# main OpenAI/httpx client held by this ephemeral cron agent. Without
# this, a gateway that ticks cron every N minutes leaks fds per job
# until it hits EMFILE (#10200 / "too many open files").
try:
if agent is not None:
agent.close()
except (Exception, KeyboardInterrupt) as e:
logger.debug("Job '%s': failed to close agent resources: %s", job_id, e)
# Each cron run spins up a short-lived worker thread whose event loop
# dies as soon as the ``ThreadPoolExecutor`` shuts down. Any async
# httpx clients cached under that loop are now unusable — reap them
# so their transports don't accumulate in the process-global cache.
try:
from agent.auxiliary_client import cleanup_stale_async_clients
cleanup_stale_async_clients()
except Exception as e:
logger.debug("Job '%s': failed to reap stale auxiliary clients: %s", job_id, e)
def tick(verbose: bool = True, adapters=None, loop=None) -> int:
+1
@@ -36,6 +36,7 @@
imports = [
./nix/packages.nix
./nix/overlays.nix
./nix/nixosModules.nix
./nix/checks.nix
./nix/devShell.nix
+16 -1
@@ -566,6 +566,8 @@ def load_gateway_config() -> GatewayConfig:
existing = {}
# Deep-merge extra dicts so gateway.json defaults survive
merged_extra = {**existing.get("extra", {}), **plat_block.get("extra", {})}
if plat_name == Platform.SLACK.value and "enabled" in plat_block:
merged_extra["_enabled_explicit"] = True
merged = {**existing, **plat_block}
if merged_extra:
merged["extra"] = merged_extra
@@ -610,16 +612,21 @@ def load_gateway_config() -> GatewayConfig:
bridged["channel_prompts"] = {str(k): v for k, v in channel_prompts.items()}
else:
bridged["channel_prompts"] = channel_prompts
if not bridged:
enabled_was_explicit = "enabled" in platform_cfg
if not bridged and not enabled_was_explicit:
continue
plat_data = platforms_data.setdefault(plat.value, {})
if not isinstance(plat_data, dict):
plat_data = {}
platforms_data[plat.value] = plat_data
if enabled_was_explicit:
plat_data["enabled"] = platform_cfg["enabled"]
extra = plat_data.setdefault("extra", {})
if not isinstance(extra, dict):
extra = {}
plat_data["extra"] = extra
if plat == Platform.SLACK and enabled_was_explicit:
extra["_enabled_explicit"] = True
extra.update(bridged)
# Slack settings → env vars (env vars take precedence)
@@ -941,6 +948,14 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
# No yaml config for Slack — env-only setup, enable it
config.platforms[Platform.SLACK] = PlatformConfig()
config.platforms[Platform.SLACK].enabled = True
else:
slack_config = config.platforms[Platform.SLACK]
enabled_was_explicit = bool(slack_config.extra.pop("_enabled_explicit", False))
if not slack_config.enabled and not enabled_was_explicit:
# Top-level Slack settings such as channel prompts should not
# turn an env-token setup into a disabled platform. Only an
# explicit slack.enabled/platforms.slack.enabled false should.
slack_config.enabled = True
# If yaml config exists, respect its enabled flag (don't override
# explicit enabled: false). Token is still stored so skills that
# send Slack messages can use it without activating the gateway adapter.
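The Slack hunks above amount to a small decision table for the case where a Slack token is supplied through the environment. The sketch below models only that case; `slack_enabled_with_env_token` is a hypothetical helper, and `yaml_enabled=None` stands in for "no Slack block in the yaml config".

```python
from typing import Optional


def slack_enabled_with_env_token(
    yaml_enabled: Optional[bool], enabled_was_explicit: bool
) -> bool:
    """Resolve the final Slack `enabled` flag when an env token is present."""
    if yaml_enabled is None:
        # No yaml config for Slack: env-only setup, enable it.
        return True
    if not yaml_enabled and not enabled_was_explicit:
        # Disabled only by default (for example, just channel prompts were
        # configured), so the env token still turns the platform on.
        return True
    # Otherwise respect the configured value, including an explicit false.
    return yaml_enabled


for case in [(None, False), (False, False), (False, True), (True, True)]:
    print(case, slack_enabled_with_env_token(*case))
```

Only an explicit `enabled: false` keeps the adapter off; every other combination ends up enabled.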
+49 -12
@@ -307,9 +307,14 @@ def proxy_kwargs_for_aiohttp(proxy_url: str | None) -> tuple[dict, dict]:
"""Build kwargs for standalone ``aiohttp.ClientSession`` with proxy.
Returns ``(session_kwargs, request_kwargs)`` where:
- SOCKS ``({"connector": ProxyConnector(...)}, {})``
- HTTP ``({}, {"proxy": url})``
- None ``({}, {})``
- With aiohttp-socks ``({"connector": ProxyConnector(...)}, {})``
for *all* proxy schemes (SOCKS **and** HTTP/HTTPS).
- HTTP without aiohttp-socks ``({}, {"proxy": url})``.
- None ``({}, {})``.
Prefer the connector path: it works transparently with libraries
(like mautrix) that call ``session.request()`` without forwarding
per-request ``proxy=`` kwargs.
Usage::
@@ -320,20 +325,20 @@ def proxy_kwargs_for_aiohttp(proxy_url: str | None) -> tuple[dict, dict]:
"""
if not proxy_url:
return {}, {}
if proxy_url.lower().startswith("socks"):
try:
from aiohttp_socks import ProxyConnector
try:
from aiohttp_socks import ProxyConnector
connector = ProxyConnector.from_url(proxy_url, rdns=True)
return {"connector": connector}, {}
except ImportError:
connector = ProxyConnector.from_url(proxy_url, rdns=True)
return {"connector": connector}, {}
except ImportError:
if proxy_url.lower().startswith("socks"):
logger.warning(
"aiohttp_socks not installed — SOCKS proxy %s ignored. "
"Run: pip install aiohttp-socks",
proxy_url,
)
return {}, {}
return {}, {"proxy": proxy_url}
return {}, {"proxy": proxy_url}
def is_host_excluded_by_no_proxy(hostname: str, no_proxy_value: str | None = None) -> bool:
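Because the old and new lines are interleaved in this view, the resulting control flow of `proxy_kwargs_for_aiohttp` can be hard to follow. The sketch below restates it as a pure decision function; `choose_proxy_path` and its string labels are hypothetical, and `connector_available` stands in for whether `aiohttp_socks` imports successfully.

```python
from typing import Optional


def choose_proxy_path(proxy_url: Optional[str], connector_available: bool) -> str:
    """Name the branch the new proxy logic would take for a given input."""
    if not proxy_url:
        return "none"            # ({}, {})
    if connector_available:
        return "connector"       # ({"connector": ...}, {}) for every scheme
    if proxy_url.lower().startswith("socks"):
        return "ignored"         # warning logged, ({}, {})
    return "request-kwarg"       # ({}, {"proxy": url})


print(choose_proxy_path("http://proxy.local:3128", True))
print(choose_proxy_path("http://proxy.local:3128", False))
print(choose_proxy_path("socks5://proxy.local:1080", False))
```

The connector path is preferred because it applies at the session level, which matters for callers that never pass `proxy=` per request.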
@@ -1702,13 +1707,41 @@ class BasePlatformAdapter(ABC):
the agent is waiting for dangerous-command approval). This is critical
for Slack's Assistant API where ``assistant_threads_setStatus`` disables
the compose box; pausing lets the user type ``/approve`` or ``/deny``.
Each ``send_typing`` call is bounded by a ~1.5s timeout so a slow
network round-trip can't stall the refresh cadence. Telegram- and
Discord-side typing indicators expire after ~5s; if any individual
send_typing takes longer than the refresh interval, the bubble would
die and stay dead until that call returns. Abandoning the slow call
lets the next tick fire a fresh send_typing on schedule; as long as
one of them succeeds within the 5s platform-side window, the bubble
stays visible across provider stalls / upstream API timeouts.
"""
# Bound each send_typing round-trip so the refresh cadence isn't
# gated on network health. Must stay below ``interval`` so a slow
# call gets abandoned before the next scheduled tick.
_send_typing_timeout = max(0.25, min(1.5, interval - 0.25))
try:
while True:
if stop_event is not None and stop_event.is_set():
return
if chat_id not in self._typing_paused:
await self.send_typing(chat_id, metadata=metadata)
try:
await asyncio.wait_for(
self.send_typing(chat_id, metadata=metadata),
timeout=_send_typing_timeout,
)
except asyncio.TimeoutError:
# Slow network — abandon this tick, keep the loop
# on schedule so the next send_typing fires fresh.
pass
except asyncio.CancelledError:
raise
except Exception as typing_err:
logger.debug(
"[%s] send_typing error (non-fatal): %s",
self.name, typing_err,
)
if stop_event is None:
await asyncio.sleep(interval)
continue
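The timeout clamp and the abandon-on-timeout behaviour in this hunk can be tried in isolation. This is a minimal sketch under stated assumptions: `typing_timeout` and `send_with_budget` are hypothetical names, and the `fast` / `slow` coroutines stand in for a real `send_typing` call.

```python
import asyncio
from typing import Awaitable, Callable


def typing_timeout(interval: float) -> float:
    """Clamp the per-call budget to [0.25, 1.5] seconds, aiming below `interval`."""
    return max(0.25, min(1.5, interval - 0.25))


async def send_with_budget(send: Callable[[], Awaitable[None]], interval: float) -> bool:
    """Run one typing refresh; return False if it was abandoned as too slow."""
    try:
        await asyncio.wait_for(send(), timeout=typing_timeout(interval))
        return True
    except asyncio.TimeoutError:
        # Slow round-trip: give up on this tick so the next one fires on time.
        return False


async def _demo() -> tuple:
    async def fast() -> None:
        await asyncio.sleep(0)

    async def slow() -> None:
        await asyncio.sleep(5)

    return await send_with_budget(fast, 0.3), await send_with_budget(slow, 0.3)


fast_ok, slow_ok = asyncio.run(_demo())
print(fast_ok, slow_ok)
```

Note that the 0.25s floor means the budget can exceed `interval` when the interval itself is shorter than 0.5s, so the "stay below ``interval``" property only holds for longer intervals.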
@@ -2399,11 +2432,15 @@ class BasePlatformAdapter(ABC):
# Send the text portion
if text_content:
logger.info("[%s] Sending response (%d chars) to %s", self.name, len(text_content), event.source.chat_id)
# Build send metadata: thread_id + mention target for platforms that need it
send_metadata = dict(_thread_metadata) if _thread_metadata else {}
if event.source.user_id:
send_metadata["mention_user_id"] = event.source.user_id
result = await self._send_with_retry(
chat_id=event.source.chat_id,
content=text_content,
reply_to=event.message_id,
metadata=_thread_metadata,
metadata=send_metadata,
)
_record_delivery(result)
+2 -1
@@ -3294,6 +3294,7 @@ class DiscordAdapter(BasePlatformAdapter):
chat_topic = self._get_effective_topic(message.channel, is_thread=is_thread)
# Build source
guild = getattr(message, "guild", None)
source = self.build_source(
chat_id=str(effective_channel.id),
chat_name=chat_name,
@@ -3303,7 +3304,7 @@ class DiscordAdapter(BasePlatformAdapter):
thread_id=thread_id,
chat_topic=chat_topic,
is_bot=getattr(message.author, "bot", False),
guild_id=str(message.guild.id) if message.guild else None,
guild_id=str(guild.id) if guild else None,
parent_chat_id=parent_channel_id,
message_id=str(message.id),
)
+3
@@ -28,6 +28,7 @@ from email.header import decode_header
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.base import MIMEBase
from email.utils import formatdate
from email import encoders
from pathlib import Path
from typing import Any, Dict, List, Optional
@@ -504,6 +505,7 @@ class EmailAdapter(BasePlatformAdapter):
msg["In-Reply-To"] = original_msg_id
msg["References"] = original_msg_id
msg["Date"] = formatdate(localtime=True)
msg_id = f"<hermes-{uuid.uuid4().hex[:12]}@{self._address.split('@')[1]}>"
msg["Message-ID"] = msg_id
@@ -586,6 +588,7 @@ class EmailAdapter(BasePlatformAdapter):
msg["In-Reply-To"] = original_msg_id
msg["References"] = original_msg_id
msg["Date"] = formatdate(localtime=True)
msg_id = f"<hermes-{uuid.uuid4().hex[:12]}@{self._address.split('@')[1]}>"
msg["Message-ID"] = msg_id
+511 -45
@@ -11,6 +11,7 @@ Environment variables:
MATRIX_PASSWORD Password (alternative to access token)
MATRIX_ENCRYPTION Set "true" to enable E2EE
MATRIX_DEVICE_ID Stable device ID for E2EE persistence across restarts
MATRIX_PROXY HTTP(S) or SOCKS proxy URL for Matrix traffic
MATRIX_ALLOWED_USERS Comma-separated Matrix user IDs (@user:server)
MATRIX_HOME_ROOM Room ID for cron/notification delivery
MATRIX_REACTIONS Set "false" to disable processing lifecycle reactions
@@ -18,6 +19,7 @@ Environment variables:
MATRIX_REQUIRE_MENTION Require @mention in rooms (default: true)
MATRIX_FREE_RESPONSE_ROOMS Comma-separated room IDs exempt from mention requirement
MATRIX_AUTO_THREAD Auto-create threads for room messages (default: true)
MATRIX_DM_AUTO_THREAD Auto-create threads for DM messages (default: false)
MATRIX_RECOVERY_KEY Recovery key for cross-signing verification after device key rotation
MATRIX_DM_MENTION_THREADS Create a thread when bot is @mentioned in a DM (default: false)
"""
@@ -30,6 +32,8 @@ import mimetypes
import os
import re
import time
from dataclasses import dataclass
from html import escape as _html_escape
from pathlib import Path
from typing import Any, Dict, Optional, Set
@@ -95,11 +99,25 @@ from gateway.platforms.base import (
MessageType,
ProcessingOutcome,
SendResult,
resolve_proxy_url,
proxy_kwargs_for_aiohttp,
)
from gateway.platforms.helpers import ThreadParticipationTracker
logger = logging.getLogger(__name__)
@dataclass
class _MatrixApprovalPrompt:
"""Tracks a pending Matrix reaction-based exec approval prompt."""
def __init__(self, session_key: str, chat_id: str, message_id: str, resolved: bool = False):
self.session_key = session_key
self.chat_id = chat_id
self.message_id = message_id
self.resolved = resolved
self.bot_reaction_events: dict[str, str] = {} # emoji -> event_id
# Matrix message size limit (4000 chars practical, spec has no hard limit
# but clients render poorly above this).
MAX_MESSAGE_LENGTH = 4000
@@ -114,11 +132,85 @@ _CRYPTO_DB_PATH = _STORE_DIR / "crypto.db"
# Grace period: ignore messages older than this many seconds before startup.
_STARTUP_GRACE_SECONDS = 5
_OUTBOUND_MENTION_RE = re.compile(
r"(?<![\w/])(@[0-9A-Za-z._=/-]+:[0-9A-Za-z.-]+(?::\d+)?)"
)
_E2EE_INSTALL_HINT = (
"Install with: pip install 'mautrix[encryption]' (requires libolm C library)"
)
_MATRIX_IMAGE_FILENAME_EXTS = frozenset({
".jpg",
".jpeg",
".png",
".gif",
".webp",
".bmp",
".svg",
".heic",
".heif",
".avif",
})
def _looks_like_matrix_image_filename(text: str) -> bool:
"""Return True when Matrix image body text is probably just a transport filename.
Matrix ``m.image`` events commonly populate ``content.body`` with the uploaded
filename when the user did not add a caption. Treating that raw filename as
user-authored text confuses downstream vision enrichment.
"""
candidate = str(text or "").strip()
if not candidate or "\n" in candidate or candidate.endswith("/"):
return False
name = Path(candidate).name
if not name or name != candidate:
return False
suffix = Path(name).suffix.lower()
if not suffix:
return False
guessed_type, _ = mimetypes.guess_type(name)
if guessed_type and guessed_type.startswith("image/"):
return True
return suffix in _MATRIX_IMAGE_FILENAME_EXTS
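To see how the filename heuristic behaves on a few inputs, the helper can be reproduced standalone. The sketch below copies the logic of `_looks_like_matrix_image_filename` under a hypothetical name with a shortened extension set; the sample strings are invented for illustration.

```python
import mimetypes
from pathlib import Path

# Shortened copy of the extension set, for illustration only.
IMAGE_EXTS = frozenset({".jpg", ".jpeg", ".png", ".gif", ".webp", ".heic"})


def looks_like_image_filename(text: str) -> bool:
    """Return True when the text is probably a bare image filename."""
    candidate = str(text or "").strip()
    if not candidate or "\n" in candidate or candidate.endswith("/"):
        return False
    name = Path(candidate).name
    if not name or name != candidate:
        return False  # contains a directory component
    suffix = Path(name).suffix.lower()
    if not suffix:
        return False
    guessed_type, _ = mimetypes.guess_type(name)
    if guessed_type and guessed_type.startswith("image/"):
        return True
    return suffix in IMAGE_EXTS


samples = ["IMG_1234.jpg", "photos/cat.png", "report.pdf", "notes", "cat.png is cute"]
results = {s: looks_like_image_filename(s) for s in samples}
print(results)
```

Text that merely mentions a filename is kept as a caption, because its trailing words make the suffix something other than a known image extension.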
def _create_matrix_session(proxy_url: str | None):
"""Create an ``aiohttp.ClientSession`` whose proxy applies to *all* requests.
mautrix's ``HTTPAPI._send()`` calls ``session.request()`` without forwarding
per-request ``proxy=`` kwargs. For HTTP(S) proxies we use aiohttp's native
``proxy=`` session parameter which sets a default for every request. For SOCKS
we use ``aiohttp_socks.ProxyConnector`` (connector-level).
When no proxy is configured we enable ``trust_env`` so standard env vars
(``HTTP_PROXY`` / ``HTTPS_PROXY``) are honoured automatically.
"""
import aiohttp
if not proxy_url:
return aiohttp.ClientSession(trust_env=True)
if proxy_url.split("://")[0].lower().startswith("socks"):
try:
from aiohttp_socks import ProxyConnector
return aiohttp.ClientSession(
connector=ProxyConnector.from_url(proxy_url, rdns=True),
)
except ImportError:
logger.warning(
"aiohttp_socks not installed — SOCKS proxy %s ignored. "
"Run: pip install aiohttp-socks",
proxy_url,
)
return aiohttp.ClientSession(trust_env=True)
return aiohttp.ClientSession(proxy=proxy_url)
def _check_e2ee_deps() -> bool:
"""Return True if mautrix E2EE dependencies (python-olm) are available."""
@@ -260,6 +352,9 @@ class MatrixAdapter(BasePlatformAdapter):
"1",
"yes",
)
self._dm_auto_thread: bool = os.getenv(
"MATRIX_DM_AUTO_THREAD", "false"
).lower() in ("true", "1", "yes")
self._dm_mention_threads: bool = os.getenv(
"MATRIX_DM_MENTION_THREADS", "false"
).lower() in ("true", "1", "yes")
@@ -270,6 +365,11 @@ class MatrixAdapter(BasePlatformAdapter):
).lower() not in ("false", "0", "no")
self._pending_reactions: dict[tuple[str, str], str] = {}
# Proxy support — resolve once at init, reuse for all HTTP traffic.
self._proxy_url: str | None = resolve_proxy_url(platform_env_var="MATRIX_PROXY")
if self._proxy_url:
logger.info("Matrix: proxy configured — %s", self._proxy_url)
# Text batching: merge rapid successive messages (Telegram-style).
# Matrix clients split long messages around 4000 chars.
self._text_batch_delay_seconds = float(
@@ -281,6 +381,18 @@ class MatrixAdapter(BasePlatformAdapter):
self._pending_text_batches: Dict[str, MessageEvent] = {}
self._pending_text_batch_tasks: Dict[str, asyncio.Task] = {}
# Matrix reaction-based dangerous command approvals.
self._approval_reaction_map = {
"✅": "once",
"❎": "deny",
}
}
self._approval_prompts_by_event: Dict[str, _MatrixApprovalPrompt] = {}
self._approval_prompt_by_session: Dict[str, str] = {}
allowed_users_raw = os.getenv("MATRIX_ALLOWED_USERS", "")
self._allowed_user_ids: Set[str] = {
u.strip() for u in allowed_users_raw.split(",") if u.strip()
}
def _is_duplicate_event(self, event_id) -> bool:
"""Return True if this event was already processed. Tracks the ID otherwise."""
if not event_id:
@@ -326,7 +438,7 @@ class MatrixAdapter(BasePlatformAdapter):
)
return False
except Exception as exc:
logger.error("Matrix: post-upload key verification failed: %s", exc)
logger.error("Matrix: post-upload key verification failed: %s", exc, exc_info=True)
return False
return True
@@ -342,6 +454,7 @@ class MatrixAdapter(BasePlatformAdapter):
logger.error(
"Matrix: cannot verify device keys on server: %s — refusing E2EE",
exc,
exc_info=True,
)
return False
@@ -356,7 +469,7 @@ class MatrixAdapter(BasePlatformAdapter):
try:
await olm.share_keys()
except Exception as exc:
logger.error("Matrix: failed to re-upload device keys: %s", exc)
logger.error("Matrix: failed to re-upload device keys: %s", exc, exc_info=True)
return False
return await self._reverify_keys_after_upload(client, local_ed25519)
@@ -396,6 +509,7 @@ class MatrixAdapter(BasePlatformAdapter):
"Try generating a new access token to get a fresh device.",
client.device_id,
exc,
exc_info=True,
)
return False
return await self._reverify_keys_after_upload(client, local_ed25519)
@@ -420,9 +534,11 @@ class MatrixAdapter(BasePlatformAdapter):
_STORE_DIR.mkdir(parents=True, exist_ok=True)
# Create the HTTP API layer.
client_session = _create_matrix_session(self._proxy_url)
api = HTTPAPI(
base_url=self._homeserver,
token=self._access_token or "",
client_session=client_session,
)
# Create the client.
@@ -465,6 +581,7 @@ class MatrixAdapter(BasePlatformAdapter):
logger.error(
"Matrix: whoami failed — check MATRIX_ACCESS_TOKEN and MATRIX_HOMESERVER: %s",
exc,
exc_info=True,
)
await api.session.close()
return False
@@ -607,6 +724,44 @@ class MatrixAdapter(BasePlatformAdapter):
logger.warning(
"Matrix: recovery key verification failed: %s", exc
)
else:
# No recovery key — bootstrap cross-signing if the bot
# has none yet. Without this, Element shows "Encrypted
# by a device not verified by its owner" on every
# message from this bot, indefinitely. mautrix's
# generate_recovery_key does the full flow: generates
# MSK/SSK/USK, uploads private keys to SSSS, publishes
# public keys to the homeserver, and signs the current
# device with the new SSK. Some homeservers require UIA
# for /keys/device_signing/upload — those will need an
# alternate path; Continuwuity and Synapse-with-shared-
# secret accept the unauthenticated upload.
try:
own_xsign = await olm.get_own_cross_signing_public_keys()
except Exception as exc:
own_xsign = None
logger.warning(
"Matrix: cross-signing key lookup failed: %s", exc
)
if own_xsign is None:
try:
new_recovery_key = await olm.generate_recovery_key()
logger.warning(
"Matrix: bootstrapped cross-signing for %s. "
"SAVE THIS RECOVERY KEY — set "
"MATRIX_RECOVERY_KEY for future restarts so "
"the bot can re-sign its device after key "
"rotation: %s",
client.mxid,
new_recovery_key,
)
except Exception as exc:
logger.warning(
"Matrix: cross-signing bootstrap failed "
"(non-fatal — Element will show 'not "
"verified by its owner'): %s",
exc,
)
client.crypto = olm
logger.info(
@@ -664,6 +819,7 @@ class MatrixAdapter(BasePlatformAdapter):
await asyncio.gather(*tasks)
except Exception as exc:
logger.warning("Matrix: initial sync event dispatch error: %s", exc)
await self._join_pending_invites(sync_data)
else:
logger.warning(
"Matrix: initial sync returned unexpected type %s",
@@ -723,21 +879,32 @@ class MatrixAdapter(BasePlatformAdapter):
if not content:
return SendResult(success=True)
mention_user_id = (metadata or {}).get("mention_user_id")
formatted = self.format_message(content)
chunks = self.truncate_message(formatted, MAX_MESSAGE_LENGTH)
last_event_id = None
for chunk in chunks:
msg_content: Dict[str, Any] = {
"msgtype": "m.text",
"body": chunk,
}
for i, chunk in enumerate(chunks):
msg_content = self._build_text_message_content(chunk)
# Convert markdown to HTML for rich rendering.
html = self._markdown_to_html(chunk)
if html and html != chunk:
# Append @mention pill to the last chunk for push notifications
# in muted rooms (mention-only mode).
if mention_user_id and i == len(chunks) - 1:
mention_html = (
f'<a href="https://matrix.to/#/{mention_user_id}">'
f"{mention_user_id}</a>"
)
msg_content["body"] = chunk + f" @{mention_user_id}"
base_html = msg_content.get("formatted_body", chunk)
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = html
msg_content["formatted_body"] = base_html + " " + mention_html
# m.mentions for MSC3952 push reliability.
existing_mentions = msg_content.get("m.mentions", {}).get("user_ids", [])
if mention_user_id not in existing_mentions:
msg_content["m.mentions"] = {
"user_ids": existing_mentions + [mention_user_id]
}
# Reply-to support.
if reply_to:
@@ -844,25 +1011,21 @@ class MatrixAdapter(BasePlatformAdapter):
"""Edit an existing message (via m.replace)."""
formatted = self.format_message(content)
new_content = self._build_text_message_content(formatted)
msg_content: Dict[str, Any] = {
"msgtype": "m.text",
"body": f"* {formatted}",
"m.new_content": {
"msgtype": "m.text",
"body": formatted,
},
"m.relates_to": {
"rel_type": "m.replace",
"event_id": message_id,
},
"m.new_content": new_content,
}
html = self._markdown_to_html(formatted)
if html and html != formatted:
msg_content["m.new_content"]["format"] = "org.matrix.custom.html"
msg_content["m.new_content"]["formatted_body"] = html
if "m.mentions" in new_content:
msg_content["m.mentions"] = new_content["m.mentions"]
if "formatted_body" in new_content:
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = f"* {html}"
msg_content["formatted_body"] = f'* {new_content["formatted_body"]}'
msg_content["m.relates_to"] = {
"rel_type": "m.replace",
"event_id": message_id,
}
try:
event_id = await self._client.send_message_event(
@@ -895,10 +1058,12 @@ class MatrixAdapter(BasePlatformAdapter):
# Try aiohttp first (always available), fall back to httpx
try:
import aiohttp as _aiohttp
async with _aiohttp.ClientSession(trust_env=True) as http:
_sess_kw, _req_kw = proxy_kwargs_for_aiohttp(self._proxy_url)
async with _aiohttp.ClientSession(**_sess_kw) as http:
async with http.get(
image_url, timeout=_aiohttp.ClientTimeout(total=30)
image_url,
timeout=_aiohttp.ClientTimeout(total=30),
**_req_kw,
) as resp:
resp.raise_for_status()
data = await resp.read()
@@ -908,8 +1073,10 @@ class MatrixAdapter(BasePlatformAdapter):
)
except ImportError:
import httpx
async with httpx.AsyncClient() as http:
_httpx_kw: dict = {}
if self._proxy_url:
_httpx_kw["proxy"] = self._proxy_url
async with httpx.AsyncClient(**_httpx_kw) as http:
resp = await http.get(image_url, follow_redirects=True, timeout=30)
resp.raise_for_status()
data = resp.content
@@ -984,6 +1151,56 @@ class MatrixAdapter(BasePlatformAdapter):
chat_id, video_path, "m.video", caption, reply_to, metadata=metadata
)
async def send_exec_approval(
self,
chat_id: str,
command: str,
session_key: str,
description: str = "dangerous command",
metadata: Optional[dict] = None,
) -> SendResult:
"""Send a reaction-based exec approval prompt for Matrix."""
if not self._client:
return SendResult(success=False, error="Not connected")
cmd_preview = command[:2000] + "..." if len(command) > 2000 else command
text = (
"⚠️ **Dangerous command requires approval**\n"
f"```\n{cmd_preview}\n```\n"
f"Reason: {description}\n\n"
"Reply `/approve` to execute, `/approve session` to approve this pattern for the session, "
"`/approve always` to approve permanently, or `/deny` to cancel.\n\n"
"You can also click the reaction to approve:\n"
"✅ = /approve\n"
"❎ = /deny"
)
result = await self.send(chat_id, text, metadata=metadata)
if not result.success or not result.message_id:
return result
prompt = _MatrixApprovalPrompt(
session_key=session_key,
chat_id=chat_id,
message_id=result.message_id,
)
old_event = self._approval_prompt_by_session.get(session_key)
if old_event:
self._approval_prompts_by_event.pop(old_event, None)
self._approval_prompts_by_event[result.message_id] = prompt
self._approval_prompt_by_session[session_key] = result.message_id
for emoji in ("✅", "❎"):
try:
reaction_result = await self._send_reaction(chat_id, result.message_id, emoji)
# Save the bot's reaction event_id for later cleanup
if reaction_result:
prompt.bot_reaction_events[emoji] = str(reaction_result)
except Exception as exc:
logger.debug("Matrix: failed to add approval reaction %s: %s", emoji, exc)
return result
def format_message(self, content: str) -> str:
"""Pass-through — Matrix supports standard Markdown natively."""
# Strip image markdown; media is uploaded separately.
@@ -1115,9 +1332,15 @@ class MatrixAdapter(BasePlatformAdapter):
next_batch = await client.sync_store.get_next_batch()
while not self._closing:
try:
sync_data = await client.sync(
since=next_batch,
timeout=30000,
# Wrap in asyncio.wait_for to guard against TCP-level hangs
# that the Matrix long-poll timeout cannot catch. Long-poll
# is 30s, so 45s gives 15s slack for network drain.
sync_data = await asyncio.wait_for(
client.sync(
since=next_batch,
timeout=30000,
),
timeout=45.0,
)
# nio returns SyncError objects (not exceptions) for auth
@@ -1153,6 +1376,7 @@ class MatrixAdapter(BasePlatformAdapter):
await asyncio.gather(*tasks)
except Exception as exc:
logger.warning("Matrix: sync event dispatch error: %s", exc)
await self._join_pending_invites(sync_data)
except asyncio.CancelledError:
return
@@ -1178,13 +1402,92 @@ class MatrixAdapter(BasePlatformAdapter):
# Event callbacks
# ------------------------------------------------------------------
def _is_self_sender(self, sender: str) -> bool:
"""Return True if the sender refers to the bot's own account.
Matrix user IDs are byte-compared after trimming whitespace and
lowercasing: some homeservers normalize the localpart case
differently at different API surfaces, and the reply-loop tail
of the "hall of mirrors" bug (#15763) has been observed with the
bot's own account bypassing a case-sensitive equality check.
When ``self._user_id`` is empty (whoami hasn't resolved yet, or
login failed), we cannot prove a sender is NOT us, so we return
True defensively: an unidentified bot dropping its own events
is always preferable to falling into an echo loop.
"""
own = (self._user_id or "").strip().lower()
if not own:
return True
return sender.strip().lower() == own
@staticmethod
def _is_system_or_bridge_sender(sender: str) -> bool:
"""Return True if the sender looks like a system / bridge / appservice
identity rather than a real user.
Appservice namespaces on Matrix conventionally prefix bot / puppet
user IDs with an underscore (e.g. ``@_telegram_12345:server``,
``@_discord_999:server``, ``@_slack_...:server``). Server-notices
bots and bridge-controller bots on many homeservers use the same
pattern.
We treat these as system identities for pairing purposes: they
should never be offered a pairing code, because an operator
approving the code would hand the bridge itself permanent
authorization and every outbound message relayed by the bridge
would then loop back into the agent as an "authorized user
message", which is the root of issue #15763.
Matches:
``@_something:server``  appservice namespace convention
``@:server``            malformed / empty localpart
``:server``             malformed, no leading ``@``
"""
s = (sender or "").strip()
if not s:
return True
# Localpart is everything between leading '@' and ':'
if s.startswith("@"):
s = s[1:]
if ":" in s:
localpart, _, _ = s.partition(":")
else:
localpart = s
if not localpart:
return True
return localpart.startswith("_")
async def _on_room_message(self, event: Any) -> None:
"""Handle incoming room message events (text, media)."""
room_id = str(getattr(event, "room_id", ""))
sender = str(getattr(event, "sender", ""))
# Ignore own messages.
if sender == self._user_id:
# Diagnostic: confirm the callback is firing at all when DEBUG is on.
# Helps users troubleshoot silent inbound issues like #5819, #7914, #12614.
logger.debug(
"Matrix: callback fired — event %s from %s in %s",
getattr(event, "event_id", "?"),
sender,
room_id,
)
# Ignore own messages (case-insensitive; also drops when our own
# user_id hasn't been resolved yet — see _is_self_sender docstring
# and issue #15763).
if self._is_self_sender(sender):
return
# Ignore appservice / bridge / system identities so they never
# trigger the pairing flow. Once a bridge user is paired, every
# outbound message it relays would loop back as an authorized
# user message (the "hall of mirrors" in #15763).
if self._is_system_or_bridge_sender(sender):
logger.debug(
"Matrix: ignoring system/bridge sender %s in %s",
sender,
room_id,
)
return
# Deduplicate by event ID.
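The two sender guards added in this hunk are pure string checks, so they are easy to exercise outside the adapter. The sketch below restates them as free functions; the names and the explicit `own_user_id` parameter are hypothetical (the adapter reads `self._user_id` instead), and the sample user IDs are invented.

```python
def is_self_sender(sender: str, own_user_id: str) -> bool:
    """Case-insensitive self check; an unknown own ID counts as self."""
    own = (own_user_id or "").strip().lower()
    if not own:
        return True  # cannot prove the sender is not us, so drop the event
    return sender.strip().lower() == own


def is_system_or_bridge_sender(sender: str) -> bool:
    """True for appservice-style (`_`-prefixed) or malformed localparts."""
    s = (sender or "").strip()
    if not s:
        return True
    if s.startswith("@"):
        s = s[1:]
    localpart = s.partition(":")[0]
    if not localpart:
        return True
    return localpart.startswith("_")


print(is_self_sender("@Bot:example.org", "@bot:example.org"))
print(is_system_or_bridge_sender("@_telegram_12345:example.org"))
print(is_system_or_bridge_sender("@alice:example.org"))
```

One consequence worth keeping in mind: any real user whose localpart happens to begin with an underscore is also ignored by the second check.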
@@ -1280,6 +1583,12 @@ class MatrixAdapter(BasePlatformAdapter):
in_bot_thread = bool(thread_id and thread_id in self._threads)
if self._require_mention and not is_free_room and not in_bot_thread:
if not is_mentioned:
logger.debug(
"Matrix: ignoring message %s in %s — no @mention "
"(set MATRIX_REQUIRE_MENTION=false to disable)",
event_id,
room_id,
)
return None
# DM mention-thread.
@@ -1292,7 +1601,7 @@ class MatrixAdapter(BasePlatformAdapter):
body = self._strip_mention(body)
# Auto-thread.
if not is_dm and not thread_id and self._auto_thread:
if not thread_id and ((not is_dm and self._auto_thread) or (is_dm and self._dm_auto_thread)):
thread_id = event_id
self._threads.mark(thread_id)
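The widened auto-thread condition now covers DMs through a separate flag. A small truth-table sketch, with `should_auto_thread` as a hypothetical helper and its flag parameters standing in for `self._auto_thread` and `self._dm_auto_thread`:

```python
from typing import Optional


def should_auto_thread(
    is_dm: bool, thread_id: Optional[str], auto_thread: bool, dm_auto_thread: bool
) -> bool:
    """Decide whether a new thread should be rooted at the incoming event."""
    if thread_id:
        return False  # already inside a thread
    return bool((not is_dm and auto_thread) or (is_dm and dm_auto_thread))


# Defaults from the module docstring: rooms thread, DMs do not.
print(should_auto_thread(False, None, True, False))
print(should_auto_thread(True, None, True, False))
```

With the documented defaults (`MATRIX_AUTO_THREAD=true`, `MATRIX_DM_AUTO_THREAD=false`) the behaviour for DMs is unchanged from before this hunk.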
@@ -1534,6 +1843,9 @@ class MatrixAdapter(BasePlatformAdapter):
return
body, is_dm, chat_type, thread_id, display_name, source = ctx
if msgtype == "m.image" and _looks_like_matrix_image_filename(body):
body = ""
allow_http_fallback = bool(http_url) and not is_encrypted_media
media_urls = (
[cached_path]
@@ -1563,13 +1875,35 @@ class MatrixAdapter(BasePlatformAdapter):
"Matrix: invited to %s — joining",
room_id,
)
await self._join_room_by_id(room_id)
async def _join_room_by_id(self, room_id: str) -> bool:
"""Join a room by ID and refresh local caches on success."""
if not room_id:
return False
if room_id in self._joined_rooms:
return True
try:
await self._client.join_room(RoomID(room_id))
self._joined_rooms.add(room_id)
logger.info("Matrix: joined %s", room_id)
await self._refresh_dm_cache()
return True
except Exception as exc:
logger.warning("Matrix: error joining %s: %s", room_id, exc)
return False
async def _join_pending_invites(self, sync_data: Dict[str, Any]) -> None:
"""Join rooms still present in rooms.invite after sync processing."""
rooms = sync_data.get("rooms", {}) if isinstance(sync_data, dict) else {}
invites = rooms.get("invite", {})
if not isinstance(invites, dict):
return
for room_id in invites:
if room_id in self._joined_rooms:
continue
logger.info("Matrix: reconciling pending invite for %s", room_id)
await self._join_room_by_id(str(room_id))
# ------------------------------------------------------------------
# Reactions (send, receive, processing lifecycle)
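The invite reconciliation added above only needs the `rooms.invite` keys of a sync payload. The sketch below isolates that extraction; `pending_invite_room_ids` is a hypothetical name, the sample payload is invented, and the extra `isinstance` guard on `rooms` is an addition of this sketch rather than something the hunk does.

```python
from typing import Any, List, Set


def pending_invite_room_ids(sync_data: Any, joined_rooms: Set[str]) -> List[str]:
    """List invited room IDs from a sync payload that are not yet joined."""
    rooms = sync_data.get("rooms", {}) if isinstance(sync_data, dict) else {}
    if not isinstance(rooms, dict):
        return []  # extra guard, not present in the hunk
    invites = rooms.get("invite", {})
    if not isinstance(invites, dict):
        return []
    return [str(room_id) for room_id in invites if room_id not in joined_rooms]


payload = {"rooms": {"invite": {"!a:example.org": {}, "!b:example.org": {}}}}
todo = pending_invite_room_ids(payload, joined_rooms={"!a:example.org"})
print(todo)
```

Running this after every sync means an invite whose event callback was missed or failed still gets retried on the next pass.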
@@ -1654,7 +1988,7 @@ class MatrixAdapter(BasePlatformAdapter):
async def _on_reaction(self, event: Any) -> None:
"""Handle incoming reaction events."""
sender = str(getattr(event, "sender", ""))
if sender == self._user_id:
if self._is_self_sender(sender):
return
event_id = str(getattr(event, "event_id", ""))
if self._is_duplicate_event(event_id):
@@ -1684,6 +2018,51 @@ class MatrixAdapter(BasePlatformAdapter):
room_id,
)
# Check if this reaction resolves a pending approval prompt.
prompt = self._approval_prompts_by_event.get(reacts_to)
if prompt and not prompt.resolved:
if room_id != prompt.chat_id:
return
if self._allowed_user_ids and sender not in self._allowed_user_ids:
logger.info(
"Matrix: ignoring approval reaction from unauthorized user %s on %s",
sender, reacts_to,
)
return
choice = self._approval_reaction_map.get(key)
if not choice:
return
try:
from tools.approval import resolve_gateway_approval
count = resolve_gateway_approval(prompt.session_key, choice)
if count:
prompt.resolved = True
self._approval_prompts_by_event.pop(reacts_to, None)
self._approval_prompt_by_session.pop(prompt.session_key, None)
logger.info(
"Matrix reaction resolved %d approval(s) for session %s "
"(choice=%s, user=%s)",
count, prompt.session_key, choice, sender,
)
# Redact bot's seed reactions, leaving only the user's
await self._redact_bot_approval_reactions(room_id, prompt)
except Exception as exc:
logger.error("Failed to resolve gateway approval from Matrix reaction: %s", exc)
async def _redact_bot_approval_reactions(
self,
room_id: str,
prompt: "_MatrixApprovalPrompt",
) -> None:
"""Redact the bot's seed ✅/❎ reactions, leaving only the user's reaction."""
for emoji, evt_id in prompt.bot_reaction_events.items():
try:
await self.redact_message(room_id, evt_id, "approval resolved")
logger.debug("Matrix: redacted bot reaction %s (%s)", emoji, evt_id)
except Exception as exc:
logger.debug("Matrix: failed to redact bot reaction %s: %s", emoji, exc)
# ------------------------------------------------------------------
# Text message aggregation (handles Matrix client-side splits)
# ------------------------------------------------------------------
@@ -1909,11 +2288,7 @@ class MatrixAdapter(BasePlatformAdapter):
if not self._client or not text:
return SendResult(success=False, error="No client or empty text")
msg_content: Dict[str, Any] = {"msgtype": msgtype, "body": text}
html = self._markdown_to_html(text)
if html and html != text:
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = html
msg_content = self._build_text_message_content(text, msgtype=msgtype)
try:
event_id = await self._client.send_message_event(
@@ -1976,6 +2351,77 @@ class MatrixAdapter(BasePlatformAdapter):
# Mention detection helpers
# ------------------------------------------------------------------
def _build_text_message_content(self, text: str, msgtype: str = "m.text") -> Dict[str, Any]:
"""Build Matrix text content with HTML and outbound mention metadata."""
msg_content: Dict[str, Any] = {"msgtype": msgtype, "body": text}
mention_user_ids = self._extract_outbound_mentions(text)
if mention_user_ids:
msg_content["m.mentions"] = {"user_ids": mention_user_ids}
html_source = self._inject_outbound_mention_links(text)
html = self._markdown_to_html(html_source)
if html and html != text:
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = html
return msg_content
def _extract_outbound_mentions(self, text: str) -> list[str]:
"""Return unique Matrix user IDs mentioned in outbound text."""
protected, _ = self._protect_outbound_mention_regions(text)
seen: Set[str] = set()
mentions: list[str] = []
for match in _OUTBOUND_MENTION_RE.finditer(protected):
user_id = match.group(1)
if user_id not in seen:
seen.add(user_id)
mentions.append(user_id)
return mentions
def _inject_outbound_mention_links(self, text: str) -> str:
"""Wrap outbound Matrix mentions in markdown links outside code spans."""
if not text:
return text
protected, placeholders = self._protect_outbound_mention_regions(text)
linked = _OUTBOUND_MENTION_RE.sub(
lambda match: f"[{match.group(1)}](https://matrix.to/#/{match.group(1)})",
protected,
)
for idx, original in enumerate(placeholders):
linked = linked.replace(f"\x00MENTION_PROTECTED{idx}\x00", original)
return linked
def _protect_outbound_mention_regions(self, text: str) -> tuple[str, list[str]]:
"""Protect markdown regions where outbound mentions should stay literal."""
placeholders: list[str] = []
def _protect(fragment: str) -> str:
idx = len(placeholders)
placeholders.append(fragment)
return f"\x00MENTION_PROTECTED{idx}\x00"
protected = re.sub(
r"```[\s\S]*?```",
lambda match: _protect(match.group(0)),
text or "",
)
protected = re.sub(
r"`[^`\n]+`",
lambda match: _protect(match.group(0)),
protected,
)
protected = re.sub(
r"\[[^\]]+\]\([^)]+\)",
lambda match: _protect(match.group(0)),
protected,
)
return protected, placeholders
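A minimal standalone sketch of the protect-then-link technique used by the three helpers above. The real `_OUTBOUND_MENTION_RE` is not shown in this diff, so `MENTION_RE` below is an assumed stand-in pattern, and `protect` / `link_mentions` are hypothetical names chosen for illustration only.

```python
import re

# Assumption: stand-in for _OUTBOUND_MENTION_RE, which this diff does not show.
# It captures a full MXID such as @alice:example.org in group 1.
MENTION_RE = re.compile(r"(?<![\w/])(@[A-Za-z0-9._=\-]+:[A-Za-z0-9.\-]+)")

# Regions where a mention must stay literal: fenced code, inline code,
# and existing markdown links (same three patterns as in the diff).
PROTECTED_PATTERNS = (r"```[\s\S]*?```", r"`[^`\n]+`", r"\[[^\]]+\]\([^)]+\)")


def protect(text: str) -> tuple[str, list[str]]:
    """Swap protected regions for NUL-delimited placeholders."""
    placeholders: list[str] = []

    def _stash(match: re.Match) -> str:
        placeholders.append(match.group(0))
        return f"\x00MENTION_PROTECTED{len(placeholders) - 1}\x00"

    for pattern in PROTECTED_PATTERNS:
        text = re.sub(pattern, _stash, text)
    return text, placeholders


def link_mentions(text: str) -> str:
    """Wrap mentions in matrix.to links, then restore protected regions."""
    protected, placeholders = protect(text)
    linked = MENTION_RE.sub(
        lambda m: f"[{m.group(1)}](https://matrix.to/#/{m.group(1)})",
        protected,
    )
    for idx, original in enumerate(placeholders):
        linked = linked.replace(f"\x00MENTION_PROTECTED{idx}\x00", original)
    return linked
```

The trailing `\x00` in each placeholder keeps index 1 from colliding with index 10 during restoration, which is presumably why the diff delimits placeholders on both sides.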
def _is_bot_mentioned(
self,
body: str,
@@ -2010,13 +2456,33 @@ class MatrixAdapter(BasePlatformAdapter):
return False
def _strip_mention(self, body: str) -> str:
"""Strip the bot's full MXID (``@user:server``) from *body*.
"""Remove explicit bot mentions from message body.
The bare localpart is intentionally *not* stripped; it would
mangle file paths like ``/home/hermes/media/file.png``.
Important: only strip explicit mention tokens (``@user:server`` or
``@localpart``). Do NOT strip bare words matching the bot localpart,
otherwise normal phrases like "Hermes Agent" become "Agent".
"""
if not body:
return ""
# Strip explicit full MXID mentions.
if self._user_id:
body = body.replace(self._user_id, "")
# Strip explicit @localpart mentions only (not bare localpart words).
if self._user_id and ":" in self._user_id:
localpart = self._user_id.split(":")[0].lstrip("@")
if localpart:
body = re.sub(
r'(?<![\w])@' + re.escape(localpart) + r'\b',
'',
body,
flags=re.IGNORECASE,
)
# Normalize spacing after mention removal.
body = re.sub(r'[ \t]{2,}', ' ', body)
body = re.sub(r'\s+([,.;:!?])', r'\1', body)
return body.strip()
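The behaviour described in the new docstring can be checked with a standalone copy of the logic. This is a sketch, assuming the bot's MXID is passed in as `user_id` instead of being read from `self._user_id`; the function name `strip_mention` is illustrative.

```python
import re


def strip_mention(body: str, user_id: str) -> str:
    """Remove explicit mention tokens only; leave bare words and paths alone."""
    if not body:
        return ""
    # Full MXID, e.g. @hermes:example.org
    if user_id:
        body = body.replace(user_id, "")
    # Explicit @localpart token, not preceded by a word character
    if user_id and ":" in user_id:
        localpart = user_id.split(":")[0].lstrip("@")
        if localpart:
            body = re.sub(
                r"(?<![\w])@" + re.escape(localpart) + r"\b",
                "",
                body,
                flags=re.IGNORECASE,
            )
    # Tidy the whitespace left behind by the removal
    body = re.sub(r"[ \t]{2,}", " ", body)
    body = re.sub(r"\s+([,.;:!?])", r"\1", body)
    return body.strip()
```

The negative lookbehind means an address-like token such as `user@hermes` is left intact, and paths such as `/home/hermes/media/file.png` never match because they contain no `@`.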
async def _get_display_name(self, room_id: str, user_id: str) -> str:
+20
@@ -2353,6 +2353,26 @@ class TelegramAdapter(BasePlatformAdapter):
user = getattr(entity, "user", None)
if user and getattr(user, "id", None) == bot_id:
return True
elif entity_type == "bot_command" and expected:
# Telegram's official group-disambiguation form for slash
# commands (``/cmd@botname``) is emitted as a single
# ``bot_command`` entity covering the whole span — there
# is no accompanying ``mention`` entity. Treat it as a
# direct address to this bot when the ``@botname`` suffix
# matches. This is the form Telegram's own command menu
# autocomplete produces in groups, so dropping it at the
# mention gate would break /new, /reset, /help, ... for
# every group that has ``require_mention`` enabled (#15415).
offset = int(getattr(entity, "offset", -1))
length = int(getattr(entity, "length", 0))
if offset < 0 or length <= 0:
continue
command_text = source_text[offset:offset + length]
at_index = command_text.find("@")
if at_index < 0:
continue
if command_text[at_index:].strip().lower() == expected:
return True
return False
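A small sketch of the `/cmd@botname` suffix check added in this hunk, pulled out as a pure function. It assumes `expected` is the lowercased bot handle including the leading `@` (the diff compares against `command_text[at_index:]`, which keeps the `@`); the function name is hypothetical.

```python
def command_addresses_bot(source_text: str, offset: int, length: int, expected: str) -> bool:
    """Return True when a bot_command entity carries a matching @botname suffix."""
    if offset < 0 or length <= 0:
        return False
    # Slice out the entity span, e.g. "/new@HermesBot"
    command_text = source_text[offset:offset + length]
    at_index = command_text.find("@")
    if at_index < 0:
        # Plain "/new" with no suffix is not an explicit address
        return False
    return command_text[at_index:].strip().lower() == expected
```

A plain `/new` returns `False` here, so whether an unsuffixed command passes the mention gate is still decided by the other entity checks in the loop.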
def _message_matches_mention_patterns(self, message: Message) -> bool:
+224 -8
@@ -1943,7 +1943,21 @@ class GatewayRunner:
return
try:
if hasattr(agent, "shutdown_memory_provider"):
agent.shutdown_memory_provider()
# Pass the agent's own conversation transcript so memory
# providers' ``on_session_end`` hooks see the real messages
# instead of the empty default (#15165). ``_session_messages``
# is set on ``AIAgent`` (run_agent.py:1518) and refreshed at
# the end of every ``run_conversation`` turn via
# ``_persist_session``; on an agent built through
# ``object.__new__`` (test stubs) the attribute may be
# absent, so ``getattr`` with a ``None`` default keeps the
# call signature-compatible with the pre-fix behaviour
# (``shutdown_memory_provider(messages=None)``).
session_messages = getattr(agent, "_session_messages", None)
if isinstance(session_messages, list):
agent.shutdown_memory_provider(session_messages)
else:
agent.shutdown_memory_provider()
except Exception:
pass
# Close tool resources (terminal sandboxes, browser daemons,
@@ -1954,6 +1968,15 @@ class GatewayRunner:
agent.close()
except Exception:
pass
# Auxiliary async clients (session_search/web/vision/etc.) live in a
# process-global cache and are created inside worker threads. Clean up
# any entries whose event loop is now dead so their httpx transports do
# not accumulate across gateway turns.
try:
from agent.auxiliary_client import cleanup_stale_async_clients
cleanup_stale_async_clients()
except Exception:
pass
_STUCK_LOOP_THRESHOLD = 3 # restarts while active before auto-suspend
_STUCK_LOOP_FILE = ".restart_failure_counts"
@@ -2917,6 +2940,19 @@ class GatewayRunner:
# disconnect (defense in depth; safe to call repeatedly).
_kill_tool_subprocesses("final-cleanup")
# Reap the process-global auxiliary-client cache once at the very
# end of teardown. Per-turn cleanup runs in _cleanup_agent_resources
# for each active agent, but clients bound to worker-thread loops
# that died with their ThreadPoolExecutor (notably cron ticks) only
# get swept here. Without this, long-running gateways accumulate
# async httpx transports until they hit EMFILE on macOS's default
# RLIMIT_NOFILE=256. See #14210.
try:
from agent.auxiliary_client import shutdown_cached_clients
shutdown_cached_clients()
except Exception as _e:
logger.debug("shutdown_cached_clients error: %s", _e)
# Close SQLite session DBs so the WAL write lock is released.
# Without this, --replace and similar restart flows leave the
# old gateway's connection holding the WAL lock until Python
@@ -4199,9 +4235,18 @@ class GatewayRunner:
Keep the normal inbound path and the queued follow-up path on the same
preprocessing pipeline so sender attribution, image enrichment, STT,
document notes, reply context, and @ references all behave the same.
Side effect: writes ``self._pending_native_image_paths`` to a list of
local image paths when the active model supports native vision AND
the user has images attached. The caller consumes and clears this
attribute at the ``run_conversation`` site to build a multimodal user
turn. When the list is empty, the ``_enrich_message_with_vision``
text path has already run and images are represented in-text.
"""
history = history or []
message_text = event.text or ""
# Reset per-call buffer; set only when native routing is chosen.
self._pending_native_image_paths = []
_is_shared_multi_user = is_shared_multi_user_session(
source,
@@ -4222,10 +4267,25 @@ class GatewayRunner:
audio_paths.append(path)
if image_paths:
message_text = await self._enrich_message_with_vision(
message_text,
image_paths,
)
# Decide routing: native (attach pixels) vs text (vision_analyze
# pre-run + prepend description). See agent/image_routing.py.
_img_mode = self._decide_image_input_mode()
if _img_mode == "native":
# Defer attachment to the run_conversation call site.
self._pending_native_image_paths = list(image_paths)
logger.info(
"Image routing: native (model supports vision). %d image(s) will be attached inline.",
len(image_paths),
)
else:
logger.info(
"Image routing: text (mode=%s). Pre-analyzing %d image(s) via vision_analyze.",
_img_mode, len(image_paths),
)
message_text = await self._enrich_message_with_vision(
message_text,
image_paths,
)
if audio_paths:
message_text = await self._enrich_message_with_transcription(
@@ -4740,6 +4800,58 @@ class GatewayRunner:
"compression",
f"{_new_tokens:,}",
)
# If summary generation failed, the
# compressor inserted a static fallback
# placeholder and the dropped turns are
# gone for good. Surface a visible
# warning to the gateway user — agent.log
# alone is invisible on TG/Discord/etc.
_comp = getattr(_hyg_agent, "context_compressor", None)
if _comp is not None and getattr(_comp, "_last_summary_fallback_used", False):
_dropped = getattr(_comp, "_last_summary_dropped_count", 0)
_err = getattr(_comp, "_last_summary_error", None) or "unknown error"
_warn_msg = (
"⚠️ Context compression summary failed "
f"({_err}). {_dropped} historical message(s) "
"were removed and replaced with a placeholder. "
"Earlier context is no longer recoverable. "
"Consider /reset for a clean session, or check "
"your auxiliary.compression model configuration."
)
try:
_adapter = self.adapters.get(source.platform)
if _adapter and source.chat_id:
await _adapter.send(source.chat_id, _warn_msg, metadata=_hyg_meta)
except Exception as _werr:
logger.warning(
"Failed to deliver compression-failure warning to user: %s",
_werr,
)
# Separately: if the user's CONFIGURED aux
# model failed and we recovered by falling
# back to the main model, tell them — a
# misconfigured auxiliary.compression.model
# is something only they can fix, and
# silent recovery would hide it.
elif _comp is not None and getattr(_comp, "_last_aux_model_failure_model", None):
_aux_model = getattr(_comp, "_last_aux_model_failure_model", "")
_aux_err = getattr(_comp, "_last_aux_model_failure_error", None) or "unknown error"
_aux_msg = (
f"ℹ️ Configured compression model `{_aux_model}` "
f"failed ({_aux_err}). Recovered using your main "
"model — context is intact — but you may want to "
"check `auxiliary.compression.model` in config.yaml."
)
try:
_adapter = self.adapters.get(source.platform)
if _adapter and source.chat_id:
await _adapter.send(source.chat_id, _aux_msg, metadata=_hyg_meta)
except Exception as _werr:
logger.warning(
"Failed to deliver aux-model-fallback notice to user: %s",
_werr,
)
finally:
self._cleanup_agent_resources(_hyg_agent)
@@ -7283,6 +7395,17 @@ class GatewayRunner:
approx_tokens,
new_tokens,
)
# Detect summary-generation failure so we can surface a
# visible warning to the user even on the manual /compress
# path (otherwise the failure is silently logged).
_summary_failed = bool(getattr(compressor, "_last_summary_fallback_used", False))
_dropped_count = int(getattr(compressor, "_last_summary_dropped_count", 0) or 0)
_summary_err = getattr(compressor, "_last_summary_error", None)
# Separately: did the user's CONFIGURED aux model fail
# and we recovered via main? Surface that as an info
# note so they can fix their config.
_aux_fail_model = getattr(compressor, "_last_aux_model_failure_model", None)
_aux_fail_err = getattr(compressor, "_last_aux_model_failure_error", None)
finally:
self._cleanup_agent_resources(tmp_agent)
lines = [f"🗜️ {summary['headline']}"]
@@ -7291,6 +7414,20 @@ class GatewayRunner:
lines.append(summary["token_line"])
if summary["note"]:
lines.append(summary["note"])
if _summary_failed:
lines.append(
f"⚠️ Summary generation failed ({_summary_err or 'unknown error'}). "
f"{_dropped_count} historical message(s) were removed and replaced "
"with a placeholder; earlier context is no longer recoverable. "
"Consider checking your auxiliary.compression model configuration."
)
elif _aux_fail_model:
lines.append(
f"ℹ️ Configured compression model `{_aux_fail_model}` failed "
f"({_aux_fail_err or 'unknown error'}). Recovered using your main "
"model — context is intact — but you may want to check "
"`auxiliary.compression.model` in config.yaml."
)
return "\n".join(lines)
except Exception as e:
logger.warning("Manual compress failed: %s", e)
@@ -8378,6 +8515,29 @@ class GatewayRunner:
ctx = copy_context()
return await loop.run_in_executor(None, ctx.run, func, *args)
def _decide_image_input_mode(self) -> str:
"""Resolve the image-input routing for the currently active model.
Returns ``"native"`` (attach pixels on the user turn) or ``"text"``
(pre-analyze with vision_analyze and prepend the description). See
agent/image_routing.py for the full decision table.
The active provider/model are read from config.yaml so the decision
tracks ``/model`` switches automatically on the next message.
"""
try:
from agent.image_routing import decide_image_input_mode
from agent.auxiliary_client import _read_main_model, _read_main_provider
from hermes_cli.config import load_config
cfg = load_config()
provider = _read_main_provider()
model = _read_main_model()
return decide_image_input_mode(provider, model, cfg)
except Exception as exc:
logger.debug("image_routing: decision failed, falling back to text — %s", exc)
return "text"
async def _enrich_message_with_vision(
self,
user_text: str,
@@ -8400,6 +8560,7 @@ class GatewayRunner:
The enriched message string with vision descriptions prepended.
"""
from tools.vision_tools import vision_analyze_tool
from agent.memory_manager import sanitize_context
analysis_prompt = (
"Describe everything visible in this image in thorough detail. "
@@ -8418,6 +8579,7 @@ class GatewayRunner:
result = json.loads(result_json)
if result.get("success"):
description = result.get("analysis", "")
description = sanitize_context(description)
enriched_parts.append(
f"[The user sent an image~ Here's what I can see:\n{description}]\n"
f"[If you need a closer look, use vision_analyze with "
@@ -9879,7 +10041,7 @@ class GatewayRunner:
# Bridge sync status_callback → async adapter.send for context pressure
_status_adapter = self.adapters.get(source.platform)
_status_chat_id = source.chat_id
_status_thread_metadata = {"thread_id": _progress_thread_id} if _progress_thread_id else None
_status_thread_metadata = {"thread_id": _progress_thread_id, "mention_user_id": source.user_id} if _progress_thread_id else {"mention_user_id": source.user_id}
def _status_callback_sync(event_type: str, message: str) -> None:
if not _status_adapter or not _run_still_current():
@@ -10394,7 +10556,39 @@ class GatewayRunner:
_approval_session_token = set_current_session_key(_approval_session_key)
register_gateway_notify(_approval_session_key, _approval_notify_sync)
try:
result = agent.run_conversation(message, conversation_history=agent_history, task_id=session_id)
# If _prepare_inbound_message_text buffered image paths for native
# attachment, wrap the user turn as an OpenAI-style multimodal
# content list. Consume-and-clear so subsequent turns on the same
# runner instance don't re-attach stale images.
_native_imgs = list(getattr(self, "_pending_native_image_paths", []) or [])
self._pending_native_image_paths = []
if _native_imgs:
try:
from agent.image_routing import build_native_content_parts
_parts, _skipped = build_native_content_parts(
message,
_native_imgs,
)
if _skipped:
logger.warning(
"Native image attachment: skipped %d unreadable path(s): %s",
len(_skipped), _skipped,
)
if any(p.get("type") == "image_url" for p in _parts):
_run_message: Any = _parts
else:
# All images failed to read — fall back to plain text.
_run_message = message
except Exception as _img_exc:
logger.warning(
"Native image attachment failed, falling back to text: %s",
_img_exc,
)
_run_message = message
else:
_run_message = message
result = agent.run_conversation(_run_message, conversation_history=agent_history, task_id=session_id)
finally:
unregister_gateway_notify(_approval_session_key)
reset_current_session_key(_approval_session_token)
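For orientation, here is one way an OpenAI-style multimodal content list with a skipped-path report could be built. The body of `build_native_content_parts` lives in `agent/image_routing.py` and is not part of this diff, so everything below (the name `build_content_parts`, the base64 data-URL encoding, the PNG fallback MIME type) is an assumption, not the project's implementation.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Any


def build_content_parts(text: str, image_paths: list[str]) -> tuple[list[dict[str, Any]], list[str]]:
    """Return (content parts, unreadable paths) for a multimodal user turn."""
    parts: list[dict[str, Any]] = [{"type": "text", "text": text}]
    skipped: list[str] = []
    for raw in image_paths:
        path = Path(raw)
        try:
            data = path.read_bytes()
        except OSError:
            # Unreadable or missing file: report it instead of failing the turn
            skipped.append(str(raw))
            continue
        mime = mimetypes.guess_type(path.name)[0] or "image/png"
        encoded = base64.b64encode(data).decode("ascii")
        parts.append(
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}
        )
    return parts, skipped
```

With a shape like this, the caller's `any(p.get("type") == "image_url" for p in _parts)` check distinguishes "at least one image attached" from "every path failed", which is the condition the hunk uses to fall back to the plain-text message.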
@@ -10500,12 +10694,20 @@ class GatewayRunner:
try:
from agent.title_generator import maybe_auto_title
all_msgs = result_holder[0].get("messages", []) if result_holder[0] else []
# Route title-generation failures through the agent's
# user-visible warning channel so a depleted auxiliary
# provider doesn't silently leave sessions untitled
# (issue #15775).
_title_failure_cb = getattr(
agent, "_emit_auxiliary_failure", None
)
maybe_auto_title(
self._session_db,
effective_session_id,
message,
final_response,
all_msgs,
failure_callback=_title_failure_cb,
)
except Exception:
pass
@@ -11145,13 +11347,16 @@ def _start_cron_ticker(stop_event: threading.Event, adapters=None, loop=None, in
cron delivery path so live adapters can be used for E2EE rooms.
Also refreshes the channel directory every 5 minutes and prunes the
image/audio/document cache once per hour.
image/audio/document cache + expired ``hermes debug share`` pastes
once per hour.
"""
from cron.scheduler import tick as cron_tick
from gateway.platforms.base import cleanup_image_cache, cleanup_document_cache
from hermes_cli.debug import _sweep_expired_pastes
IMAGE_CACHE_EVERY = 60 # ticks — once per hour at default 60s interval
CHANNEL_DIR_EVERY = 5 # ticks — every 5 minutes
PASTE_SWEEP_EVERY = 60 # ticks — once per hour
logger.info("Cron ticker started (interval=%ds)", interval)
tick_count = 0
@@ -11192,6 +11397,17 @@ def _start_cron_ticker(stop_event: threading.Event, adapters=None, loop=None, in
except Exception as e:
logger.debug("Document cache cleanup error: %s", e)
if tick_count % PASTE_SWEEP_EVERY == 0:
try:
deleted, remaining = _sweep_expired_pastes()
if deleted:
logger.info(
"Paste sweep: deleted %d expired paste(s), %d pending",
deleted, remaining,
)
except Exception as e:
logger.debug("Paste sweep error: %s", e)
stop_event.wait(timeout=interval)
logger.info("Cron ticker stopped")
+9
@@ -224,6 +224,14 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
api_key_env_vars=("ARCEEAI_API_KEY",),
base_url_env_var="ARCEE_BASE_URL",
),
"gmi": ProviderConfig(
id="gmi",
name="GMI Cloud",
auth_type="api_key",
inference_base_url="https://api.gmi-serving.com/v1",
api_key_env_vars=("GMI_API_KEY",),
base_url_env_var="GMI_BASE_URL",
),
"minimax": ProviderConfig(
id="minimax",
name="MiniMax",
@@ -1120,6 +1128,7 @@ def resolve_provider(
"kimi-cn": "kimi-coding-cn", "moonshot-cn": "kimi-coding-cn",
"step": "stepfun", "stepfun-coding-plan": "stepfun",
"arcee-ai": "arcee", "arceeai": "arcee",
"gmi-cloud": "gmi", "gmicloud": "gmi",
"minimax-china": "minimax-cn", "minimax_cn": "minimax-cn",
"alibaba_coding": "alibaba-coding-plan", "alibaba-coding": "alibaba-coding-plan",
"alibaba_coding_plan": "alibaba-coding-plan",
+177 -1
@@ -36,12 +36,23 @@ _EXCLUDED_DIRS = {
"__pycache__", # bytecode caches — regenerated on import
".git", # nested git dirs (profiles shouldn't have these, but safety)
"node_modules", # js deps if website/ somehow leaks in
"backups", # prior auto-backups — don't nest backups exponentially
"checkpoints", # session-local trajectory caches — regenerated per-session,
# session-hash-keyed so they don't port to another machine anyway
}
# File-name suffixes to skip
_EXCLUDED_SUFFIXES = (
".pyc",
".pyo",
# SQLite sidecar files — the backup takes a consistent snapshot of ``*.db``
# via ``sqlite3.backup()``, so shipping the live WAL / shared-memory /
# rollback-journal alongside would pair a fresh snapshot with stale sidecar
# state and produce a torn restore on the next open. They're transient and
# regenerated on first connection anyway.
".db-wal",
".db-shm",
".db-journal",
)
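The comment above relies on taking a consistent snapshot through SQLite's online backup API. A minimal sketch of that step using the standard library's `sqlite3.Connection.backup` follows; `snapshot_db` is a hypothetical name, and the project's own `_safe_copy_db` helper (not shown in this diff) may differ.

```python
import sqlite3
from pathlib import Path


def snapshot_db(src: Path, dst: Path) -> None:
    """Copy a live SQLite database to dst as one consistent snapshot."""
    source = sqlite3.connect(src)
    target = sqlite3.connect(dst)
    try:
        # The backup API reads through the connection, so committed WAL
        # content is included and no sidecar file needs to be copied.
        source.backup(target)
    finally:
        target.close()
        source.close()
```

Because the snapshot is self-contained, pairing it with a separately copied `-wal` or `-shm` file would only reintroduce the torn-restore risk the exclusion list is meant to avoid.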
# File names to skip (runtime state that's meaningless on another machine)
@@ -454,6 +465,12 @@ def run_import(args) -> None:
# Critical state files to include in quick snapshots (relative to HERMES_HOME).
# Everything else is either regeneratable (logs, cache) or managed separately
# (skills, repo, sessions/).
#
# Entries may be individual files OR directories. Directories are captured
# recursively; missing entries are silently skipped. Pairing data lives in
# platform-specific JSON blobs outside state.db, so it's listed here explicitly
# — `hermes update` snapshots this set before pulling so approved-user lists
# are recoverable if anything goes wrong (issue #15733).
_QUICK_STATE_FILES = (
"state.db",
"config.yaml",
@@ -463,6 +480,10 @@ _QUICK_STATE_FILES = (
"gateway_state.json",
"channel_directory.json",
"processes.json",
# Pairing stores (generic + per-platform JSONs outside state.db)
"pairing", # legacy location (gateway/pairing.py)
"platforms/pairing", # new location (gateway/pairing.py)
"feishu_comment_pairing.json", # Feishu comment subscription pairings
)
_QUICK_SNAPSHOTS_DIR = "state-snapshots"
@@ -498,7 +519,27 @@ def create_quick_snapshot(
for rel in _QUICK_STATE_FILES:
src = home / rel
if not src.exists() or not src.is_file():
if not src.exists():
continue
if src.is_dir():
# Walk the directory and record each file individually in the
# manifest so restore can treat them uniformly. Empty dirs are
# skipped (nothing to snapshot).
for sub in src.rglob("*"):
if not sub.is_file():
continue
sub_rel = sub.relative_to(home).as_posix()
dst = snap_dir / sub_rel
dst.parent.mkdir(parents=True, exist_ok=True)
try:
shutil.copy2(sub, dst)
manifest[sub_rel] = dst.stat().st_size
except (OSError, PermissionError) as exc:
logger.warning("Could not snapshot %s: %s", sub_rel, exc)
continue
if not src.is_file():
continue
dst = snap_dir / rel
@@ -653,3 +694,138 @@ def run_quick_backup(args) -> None:
print(f" Restore with: /snapshot restore {snap_id}")
else:
print("No state files found to snapshot.")
# ---------------------------------------------------------------------------
# Pre-update auto-backup
# ---------------------------------------------------------------------------
_PRE_UPDATE_BACKUPS_DIR = "backups"
_PRE_UPDATE_PREFIX = "pre-update-"
_PRE_UPDATE_DEFAULT_KEEP = 5
def _pre_update_backup_dir(hermes_home: Optional[Path] = None) -> Path:
home = hermes_home or get_hermes_home()
return home / _PRE_UPDATE_BACKUPS_DIR
def _prune_pre_update_backups(backup_dir: Path, keep: int) -> int:
"""Remove oldest pre-update backups beyond the keep limit.
Returns the number of files deleted. Only touches files matching
``pre-update-*.zip`` so hand-made zips dropped in the same directory
are never touched.
"""
if keep < 0:
keep = 0
if not backup_dir.exists():
return 0
backups = sorted(
(p for p in backup_dir.iterdir()
if p.is_file() and p.name.startswith(_PRE_UPDATE_PREFIX) and p.suffix.lower() == ".zip"),
key=lambda p: p.name,
reverse=True,
)
deleted = 0
for p in backups[keep:]:
try:
p.unlink()
deleted += 1
except OSError as exc:
logger.warning("Failed to prune backup %s: %s", p.name, exc)
return deleted
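The pruning rule sorts by file name, which works because the `%Y-%m-%d-%H%M%S` stamp used for pre-update backups sorts lexicographically in chronological order. Below is a standalone sketch of the same rule for experimenting in a scratch directory; `prune` and `PREFIX` are illustrative names, and the logging from the original is omitted.

```python
from pathlib import Path

PREFIX = "pre-update-"


def prune(backup_dir: Path, keep: int) -> int:
    """Delete all but the `keep` newest pre-update zips; return the count removed."""
    keep = max(keep, 0)
    if not backup_dir.exists():
        return 0
    backups = sorted(
        (
            p for p in backup_dir.iterdir()
            if p.is_file() and p.name.startswith(PREFIX) and p.suffix.lower() == ".zip"
        ),
        key=lambda p: p.name,  # timestamped names sort oldest to newest
        reverse=True,          # newest first, so the tail is the stale set
    )
    deleted = 0
    for stale in backups[keep:]:
        try:
            stale.unlink()
            deleted += 1
        except OSError:
            pass  # the original logs a warning here
    return deleted
```

Files that do not match the prefix and suffix filter, such as a hand-made `manual.zip`, are never candidates for deletion.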
def create_pre_update_backup(
hermes_home: Optional[Path] = None,
keep: int = _PRE_UPDATE_DEFAULT_KEEP,
) -> Optional[Path]:
"""Create a full zip backup of HERMES_HOME under ``backups/``.
Mirrors :func:`run_backup` (same exclusion rules, same SQLite safe-copy)
but writes to ``<HERMES_HOME>/backups/pre-update-<timestamp>.zip`` and
auto-prunes old pre-update backups.
Returns the path to the created zip, or ``None`` if no files were
found or the backup could not be created. Never raises; the caller
(``hermes update``) should continue even if the backup fails.
"""
hermes_root = hermes_home or get_default_hermes_root()
if not hermes_root.is_dir():
return None
backup_dir = _pre_update_backup_dir(hermes_root)
try:
backup_dir.mkdir(parents=True, exist_ok=True)
except OSError as exc:
logger.warning("Could not create pre-update backup dir %s: %s", backup_dir, exc)
return None
stamp = datetime.now().strftime("%Y-%m-%d-%H%M%S")
out_path = backup_dir / f"{_PRE_UPDATE_PREFIX}{stamp}.zip"
# Collect files (same logic as run_backup, minus the chatty progress prints)
files_to_add: list[tuple[Path, Path]] = []
try:
for dirpath, dirnames, filenames in os.walk(hermes_root, followlinks=False):
dp = Path(dirpath)
# Prune excluded directories in-place so os.walk doesn't descend
dirnames[:] = [d for d in dirnames if d not in _EXCLUDED_DIRS]
for fname in filenames:
fpath = dp / fname
try:
rel = fpath.relative_to(hermes_root)
except ValueError:
continue
if _should_exclude(rel):
continue
# Skip the output zip itself if it already exists
try:
if fpath.resolve() == out_path.resolve():
continue
except (OSError, ValueError):
pass
files_to_add.append((fpath, rel))
except OSError as exc:
logger.warning("Pre-update backup: walk failed: %s", exc)
return None
if not files_to_add:
return None
try:
with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED, compresslevel=6) as zf:
for abs_path, rel_path in files_to_add:
try:
if abs_path.suffix == ".db":
with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp:
tmp_db = Path(tmp.name)
try:
if _safe_copy_db(abs_path, tmp_db):
zf.write(tmp_db, arcname=str(rel_path))
finally:
tmp_db.unlink(missing_ok=True)
else:
zf.write(abs_path, arcname=str(rel_path))
except (PermissionError, OSError, ValueError) as exc:
logger.debug("Skipping %s in pre-update backup: %s", rel_path, exc)
continue
except OSError as exc:
logger.warning("Pre-update backup: zip write failed: %s", exc)
# Best-effort cleanup of partial file
try:
out_path.unlink(missing_ok=True)
except OSError:
pass
return None
_prune_pre_update_backups(backup_dir, keep=keep)
return out_path
+2
@@ -62,6 +62,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
aliases=("reset",)),
CommandDef("clear", "Clear screen and start a new session", "Session",
cli_only=True),
CommandDef("redraw", "Force a full UI repaint (recovers from terminal drift)", "Session",
cli_only=True),
CommandDef("history", "Show conversation history", "Session",
cli_only=True),
CommandDef("save", "Save the current conversation", "Session",
+96 -8
@@ -56,8 +56,18 @@ _EXTRA_ENV_KEYS = frozenset({
"WHATSAPP_MODE", "WHATSAPP_ENABLED",
"MATTERMOST_HOME_CHANNEL", "MATTERMOST_REPLY_MODE",
"MATRIX_PASSWORD", "MATRIX_ENCRYPTION", "MATRIX_DEVICE_ID", "MATRIX_HOME_ROOM",
"MATRIX_REQUIRE_MENTION", "MATRIX_FREE_RESPONSE_ROOMS", "MATRIX_AUTO_THREAD",
"MATRIX_REQUIRE_MENTION", "MATRIX_FREE_RESPONSE_ROOMS", "MATRIX_AUTO_THREAD", "MATRIX_DM_AUTO_THREAD",
"MATRIX_RECOVERY_KEY",
# Langfuse observability plugin — optional tuning keys + standard SDK vars
"HERMES_LANGFUSE_ENABLED", # backward-compat env var (new: plugins.langfuse.enabled in config.yaml)
"HERMES_LANGFUSE_ENV",
"HERMES_LANGFUSE_RELEASE",
"HERMES_LANGFUSE_SAMPLE_RATE",
"HERMES_LANGFUSE_MAX_CHARS",
"HERMES_LANGFUSE_DEBUG",
"LANGFUSE_PUBLIC_KEY",
"LANGFUSE_SECRET_KEY",
"LANGFUSE_BASE_URL",
})
import yaml
@@ -389,6 +399,20 @@ DEFAULT_CONFIG = {
# (60+ tool iterations with tiny output) before users assume the
# bot is dead and /restart.
"gateway_notify_interval": 180,
# How user-attached images are presented to the main model on each turn.
# "auto" — attach natively when the active model reports
# supports_vision=True AND the user hasn't explicitly
# configured auxiliary.vision.provider. Otherwise fall
# back to text (vision_analyze pre-analysis).
# "native" — always attach natively; non-vision models will either
# error at the provider or get a last-chance text fallback
# (see run_agent._prepare_messages_for_api).
# "text" — always pre-analyze with vision_analyze and prepend the
# description as text; the main model never sees pixels.
# Affects gateway platforms, the TUI, and CLI /attach. vision_analyze
# remains available as a tool regardless of this setting — the routing
# only controls how inbound user images are presented.
"image_input_mode": "auto",
},
"terminal": {
@@ -928,7 +952,7 @@ DEFAULT_CONFIG = {
# Pre-exec security scanning via tirith
"security": {
"allow_private_urls": False, # Allow requests to private/internal IPs (for OpenWrt, proxies, VPNs)
"redact_secrets": True,
"redact_secrets": False,
"tirith_enabled": True,
"tirith_path": "tirith",
"tirith_timeout": 5,
@@ -1037,6 +1061,20 @@ DEFAULT_CONFIG = {
"seen": {},
},
# ``hermes update`` behaviour.
"updates": {
# Run a full ``hermes backup``-style zip of HERMES_HOME before every
# ``hermes update``. Backups land in ``<HERMES_HOME>/backups/`` and
# can be restored with ``hermes import <path>``. Off by default —
# on large HERMES_HOME directories the zip can add minutes to every
# update. Set to true to re-enable, or pass ``--backup`` to opt in
# for a single update run.
"pre_update_backup": False,
# How many pre-update backup zips to retain. Older ones are pruned
# automatically after each successful backup.
"backup_keep": 5,
},
# Config schema version - bump this when adding new required fields
"_config_version": 22,
}
@@ -1226,6 +1264,22 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"GMI_API_KEY": {
"description": "GMI Cloud API key",
"prompt": "GMI Cloud API key",
"url": "https://www.gmicloud.ai/",
"password": True,
"category": "provider",
"advanced": True,
},
"GMI_BASE_URL": {
"description": "GMI Cloud base URL override",
"prompt": "GMI Cloud base URL (leave empty for default)",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
"MINIMAX_API_KEY": {
"description": "MiniMax API key (international)",
"prompt": "MiniMax API key",
@@ -1648,6 +1702,30 @@ OPTIONAL_ENV_VARS = {
"category": "tool",
},
# ── Langfuse observability ──
"HERMES_LANGFUSE_PUBLIC_KEY": {
"description": "Langfuse project public key (pk-lf-...)",
"prompt": "Langfuse public key",
"url": "https://cloud.langfuse.com",
"password": False,
"category": "tool",
},
"HERMES_LANGFUSE_SECRET_KEY": {
"description": "Langfuse project secret key (sk-lf-...)",
"prompt": "Langfuse secret key",
"url": "https://cloud.langfuse.com",
"password": True,
"category": "tool",
},
"HERMES_LANGFUSE_BASE_URL": {
"description": "Langfuse server URL (default: https://cloud.langfuse.com)",
"prompt": "Langfuse server URL (leave empty for cloud.langfuse.com)",
"url": None,
"password": False,
"category": "tool",
"advanced": True,
},
# ── Messaging platforms ──
"TELEGRAM_BOT_TOKEN": {
"description": "Telegram bot token from @BotFather",
@@ -1795,6 +1873,14 @@ OPTIONAL_ENV_VARS = {
"category": "messaging",
"advanced": True,
},
"MATRIX_DM_AUTO_THREAD": {
"description": "Auto-create threads for DM messages in Matrix (default: false)",
"prompt": "Auto-create threads in DMs (true/false)",
"url": None,
"password": False,
"category": "messaging",
"advanced": True,
},
"MATRIX_DEVICE_ID": {
"description": "Stable Matrix device ID for E2EE persistence across restarts (e.g. HERMES_BOT)",
"prompt": "Matrix device ID (stable across restarts)",
@@ -3309,14 +3395,16 @@ def load_config() -> Dict[str, Any]:
_SECURITY_COMMENT = """
# ── Security ──────────────────────────────────────────────────────────
# API keys, tokens, and passwords are redacted from tool output by default.
# Set to false to see full values (useful for debugging auth issues).
# Secret redaction is OFF by default — tool output (terminal stdout,
# read_file results, web content) passes through unmodified. Set
# redact_secrets to true to mask strings that look like API keys, tokens,
# and passwords before they enter the model context and logs.
# tirith pre-exec scanning is enabled by default when the tirith binary
# is available. Configure via security.tirith_* keys or env vars
# (TIRITH_ENABLED, TIRITH_BIN, TIRITH_TIMEOUT, TIRITH_FAIL_OPEN).
#
# security:
# redact_secrets: false
# redact_secrets: true
# tirith_enabled: true
# tirith_path: "tirith"
# tirith_timeout: 5
@@ -3349,11 +3437,11 @@ _FALLBACK_COMMENT = """
_COMMENTED_SECTIONS = """
# ── Security ──────────────────────────────────────────────────────────
# API keys, tokens, and passwords are redacted from tool output by default.
# Set to false to see full values (useful for debugging auth issues).
# Secret redaction is OFF by default. Set to true to mask strings that
# look like API keys, tokens, and passwords in tool output and logs.
#
# security:
# redact_secrets: false
# redact_secrets: true
# ── Fallback Model ────────────────────────────────────────────────────
# Automatic provider failover when primary is unavailable.
+11 -5
@@ -45,8 +45,13 @@ def _pending_file() -> Path:
Each entry: ``{"url": "...", "expire_at": <unix_ts>}``. Scheduled
DELETEs used to be handled by spawning a detached Python process per
paste that slept for 6 hours; those accumulated forever if the user
ran ``hermes debug share`` repeatedly. We now persist the schedule
to disk and sweep expired entries on the next debug invocation.
ran ``hermes debug share`` repeatedly.
Deletion is now driven by the gateway's cron ticker
(``gateway/run.py::_start_cron_ticker``) which calls
``_sweep_expired_pastes`` once per hour. ``hermes debug share`` also
runs an opportunistic sweep on entry as a fallback for CLI-only users
who never start the gateway.
"""
return get_hermes_home() / "pastes" / "pending.json"
@@ -223,9 +228,10 @@ def _schedule_auto_delete(urls: list[str], delay_seconds: int = _AUTO_DELETE_SEC
interpreters that never exited until the sleep completed.
The replacement is stateless: we append to ``~/.hermes/pastes/pending.json``
and rely on opportunistic sweeps (``_sweep_expired_pastes``) called from
every ``hermes debug`` invocation. If the user never runs ``hermes debug``
again, paste.rs's own retention policy handles cleanup.
and the gateway's cron ticker sweeps expired entries once per hour.
``hermes debug share`` also runs an opportunistic sweep as a fallback
for CLI-only users. If neither runs again, paste.rs's own retention
policy handles cleanup.
"""
_record_pending(urls, delay_seconds=delay_seconds)
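The persisted schedule described in the docstrings above could be swept roughly as follows. This is a minimal sketch, not the real `_sweep_expired_pastes`: it assumes `pending.json` holds a JSON list of `{"url": ..., "expire_at": <unix_ts>}` entries, and `sweep_expired` and its `delete` callback are hypothetical names introduced only for illustration.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional


def sweep_expired(
    pending: Path,
    delete: Callable[[str], bool],
    now: Optional[float] = None,
) -> int:
    """Delete expired pastes listed in *pending*; return how many were removed.

    Hypothetical sketch: assumes the file is a JSON list of
    ``{"url": ..., "expire_at": <unix_ts>}`` entries.
    """
    now = time.time() if now is None else now
    try:
        entries = json.loads(pending.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return 0  # nothing scheduled, or the file is unreadable
    if not isinstance(entries, list):
        return 0

    keep, removed = [], 0
    for entry in entries:
        if not isinstance(entry, dict):
            continue  # drop malformed rows
        expired = float(entry.get("expire_at", 0)) <= now
        if expired and delete(str(entry.get("url", ""))):
            removed += 1
        else:
            keep.append(entry)  # not yet due, or the delete failed: retry later
    pending.write_text(json.dumps(keep), encoding="utf-8")
    return removed
```

Because the state lives on disk, either caller (an hourly ticker or a CLI invocation) can run the same sweep without coordinating with the other.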
+2
@@ -46,6 +46,7 @@ _PROVIDER_ENV_HINTS = (
"Z_AI_API_KEY",
"KIMI_API_KEY",
"KIMI_CN_API_KEY",
"GMI_API_KEY",
"MINIMAX_API_KEY",
"MINIMAX_CN_API_KEY",
"KILOCODE_API_KEY",
@@ -937,6 +938,7 @@ def run_doctor(args):
("StepFun Step Plan", ("STEPFUN_API_KEY",), "https://api.stepfun.ai/step_plan/v1/models", "STEPFUN_BASE_URL", True),
("Kimi / Moonshot (China)", ("KIMI_CN_API_KEY",), "https://api.moonshot.cn/v1/models", None, True),
("Arcee AI", ("ARCEEAI_API_KEY",), "https://api.arcee.ai/api/v1/models", "ARCEE_BASE_URL", True),
("GMI Cloud", ("GMI_API_KEY",), "https://api.gmi-serving.com/v1/models", "GMI_BASE_URL", True),
("DeepSeek", ("DEEPSEEK_API_KEY",), "https://api.deepseek.com/v1/models", "DEEPSEEK_BASE_URL", True),
("Hugging Face", ("HF_TOKEN",), "https://router.huggingface.co/v1/models", "HF_BASE_URL", True),
("NVIDIA NIM", ("NVIDIA_API_KEY",), "https://integrate.api.nvidia.com/v1/models", "NVIDIA_BASE_URL", True),
+242 -15
@@ -44,6 +44,7 @@ Usage:
"""
import argparse
import json
import os
import shutil
import subprocess
@@ -595,17 +596,22 @@ def _session_browse_picker(sessions: list) -> Optional[str]:
def _resolve_last_session(source: str = "cli") -> Optional[str]:
"""Look up the most recent session ID for a source."""
"""Look up the most recently-used session ID for a source."""
db = None
try:
from hermes_state import SessionDB
db = SessionDB()
sessions = db.search_sessions(source=source, limit=1)
db.close()
if sessions:
return sessions[0]["id"]
return sessions[0]["id"] if sessions else None
except Exception:
pass
finally:
if db is not None:
try:
db.close()
except Exception:
pass
return None
@@ -760,9 +766,20 @@ def _resolve_session_by_name_or_id(name_or_id: str) -> Optional[str]:
return None
def _print_tui_exit_summary(session_id: Optional[str]) -> None:
def _read_tui_active_session_file(path: Optional[str]) -> Optional[str]:
if not path:
return None
try:
data = json.loads(Path(path).read_text(encoding="utf-8"))
sid = str(data.get("session_id") or "").strip()
return sid or None
except Exception:
return None
def _print_tui_exit_summary(session_id: Optional[str], active_session_file: Optional[str] = None) -> None:
"""Print a shell-visible epilogue after TUI exits."""
target = session_id or _resolve_last_session(source="tui")
target = _read_tui_active_session_file(active_session_file) or session_id or _resolve_last_session(source="tui")
if not target:
return
@@ -812,8 +829,29 @@ def _print_tui_exit_summary(session_id: Optional[str]) -> None:
)
_NPM_LOCK_RUNTIME_KEYS = frozenset({"ideallyInert"})
def _tui_need_npm_install(root: Path) -> bool:
"""True when @hermes/ink is missing or node_modules is behind package-lock.json (post-pull)."""
"""True when @hermes/ink is missing or node_modules is behind package-lock.json.
Compares ``package-lock.json`` against ``node_modules/.package-lock.json``
(npm's hidden lockfile) by **content**, not mtime: git checkouts and npm
rewrites can bump the root lockfile's timestamp even when installed deps
already match, which used to trigger a spurious "Installing TUI
dependencies" on every launch.
For each entry in the root lock's ``packages`` map:
- missing from hidden lock -> reinstall (unless the entry is marked
``optional`` or ``peer``, which npm may intentionally skip per platform)
- present but with differing fields (excluding npm-written runtime
annotations like ``ideallyInert``) -> reinstall
Extra entries that exist only in the hidden lock are ignored: stale
transitives left over from a removed dependency don't break runtime and
we'd rather not force a reinstall for them. Falls back to mtime
comparison if either lockfile is unparseable.
"""
ink = root / "node_modules" / "@hermes" / "ink" / "package.json"
if not ink.is_file():
return True
@@ -823,7 +861,35 @@ def _tui_need_npm_install(root: Path) -> bool:
marker = root / "node_modules" / ".package-lock.json"
if not marker.is_file():
return True
return lock.stat().st_mtime > marker.stat().st_mtime
# Compare lockfile contents, not mtimes: git checkouts and npm rewrites
# can bump the root lockfile timestamp even when installed deps already
# match. Fall back to mtime when either file is unparseable.
try:
wanted = json.loads(lock.read_text(encoding="utf-8")).get("packages") or {}
installed = json.loads(marker.read_text(encoding="utf-8")).get("packages") or {}
except (OSError, UnicodeDecodeError, json.JSONDecodeError):
return lock.stat().st_mtime > marker.stat().st_mtime
def comparable(pkg: dict) -> dict:
return {k: v for k, v in pkg.items() if k not in _NPM_LOCK_RUNTIME_KEYS}
for name, pkg in wanted.items():
if not name:
continue
if not isinstance(pkg, dict):
continue
if name not in installed:
if pkg.get("optional") or pkg.get("peer"):
continue
return True
if isinstance(installed[name], dict) and comparable(pkg) != comparable(installed[name]):
return True
return False
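The content comparison above can be exercised in isolation. The sketch below mirrors the loop on plain dictionaries so the three cases (ignored runtime keys, skipped optional entries, ignored stale extras) are easy to check; `needs_install` and the sample package maps are illustrative names and data, not part of the change.

```python
from typing import Any, Dict

RUNTIME_KEYS = frozenset({"ideallyInert"})  # npm-written annotations to ignore


def needs_install(wanted: Dict[str, Any], installed: Dict[str, Any]) -> bool:
    """Return True when the *installed* package map does not satisfy *wanted*."""

    def comparable(pkg: dict) -> dict:
        return {k: v for k, v in pkg.items() if k not in RUNTIME_KEYS}

    for name, pkg in wanted.items():
        if not name or not isinstance(pkg, dict):
            continue  # skip the root project entry ("") and malformed rows
        if name not in installed:
            if pkg.get("optional") or pkg.get("peer"):
                continue  # npm may legitimately skip these per platform
            return True
        other = installed[name]
        if isinstance(other, dict) and comparable(pkg) != comparable(other):
            return True
    return False


wanted = {
    "": {"name": "root"},
    "node_modules/a": {"version": "1.0.0"},
    "node_modules/b": {"version": "2.0.0", "optional": True},
}
installed = {
    "node_modules/a": {"version": "1.0.0", "ideallyInert": True},
    "node_modules/stale": {"version": "0.1.0"},  # extra entry, ignored
}
print(needs_install(wanted, installed))
```

With these sample maps nothing forces a reinstall: the only differing field on `a` is a runtime annotation, `b` is optional, and the stale extra is ignored.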
def _find_bundled_tui(tui_dir: Path) -> Optional[Path]:
@@ -1037,7 +1103,14 @@ def _launch_tui(
"""Replace current process with the TUI."""
tui_dir = PROJECT_ROOT / "ui-tui"
import tempfile
env = os.environ.copy()
active_session_fd, active_session_file = tempfile.mkstemp(
prefix="hermes-tui-active-session-", suffix=".json"
)
os.close(active_session_fd)
env["HERMES_TUI_ACTIVE_SESSION_FILE"] = active_session_file
env["HERMES_PYTHON_SRC_ROOT"] = os.environ.get(
"HERMES_PYTHON_SRC_ROOT", str(PROJECT_ROOT)
)
@@ -1065,13 +1138,20 @@ def _launch_tui(
env["HERMES_TUI_RESUME"] = resume_session_id
argv, cwd = _make_tui_argv(tui_dir, tui_dev)
code: Optional[int] = None
try:
code = subprocess.call(argv, cwd=str(cwd), env=env)
except KeyboardInterrupt:
code = 130
try:
code = subprocess.call(argv, cwd=str(cwd), env=env)
except KeyboardInterrupt:
code = 130
if code in (0, 130):
_print_tui_exit_summary(resume_session_id)
if code in (0, 130):
_print_tui_exit_summary(resume_session_id, active_session_file)
finally:
try:
os.unlink(active_session_file)
except OSError:
pass
sys.exit(code)
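The temp-file handoff in `_launch_tui` can be tried end to end with a stand-in child process. In this sketch the child is a one-line Python script that writes `{"session_id": ...}` to the path from `HERMES_TUI_ACTIVE_SESSION_FILE`; that the real TUI writes the file in exactly this way is an assumption, and `CHILD`, `read_session_id`, and `run_child_and_read_session` are hypothetical names.

```python
import json
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional

# Stand-in for the TUI: writes its active session ID to the handoff file.
CHILD = (
    "import json, os;"
    "f = open(os.environ['HERMES_TUI_ACTIVE_SESSION_FILE'], 'w', encoding='utf-8');"
    "f.write(json.dumps({'session_id': 'demo-session'}));"
    "f.close()"
)


def read_session_id(path: str) -> Optional[str]:
    """Return the session ID stored at *path*, or None if it cannot be read."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        return str(data.get("session_id") or "").strip() or None
    except Exception:
        return None


def run_child_and_read_session() -> Optional[str]:
    fd, path = tempfile.mkstemp(prefix="hermes-tui-active-session-", suffix=".json")
    os.close(fd)  # the child reopens the file by path
    env = os.environ.copy()
    env["HERMES_TUI_ACTIVE_SESSION_FILE"] = path
    try:
        subprocess.call([sys.executable, "-c", CHILD], env=env)
        return read_session_id(path)
    finally:
        try:
            os.unlink(path)  # always clean up, even if the child crashed
        except OSError:
            pass
```

Reading the file before falling back to the resume ID or the database lookup lets the exit summary reflect a session the user switched to inside the TUI.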
@@ -1737,6 +1817,7 @@ def select_provider_and_model(args=None):
"huggingface",
"xiaomi",
"arcee",
"gmi",
"nvidia",
"ollama-cloud",
):
@@ -3332,7 +3413,26 @@ def _model_flow_named_custom(config, provider_info):
provider_entry = providers_cfg.get(provider_key)
if isinstance(provider_entry, dict):
provider_entry["default_model"] = model_name
if config_api_key and not str(provider_entry.get("api_key", "") or "").strip():
# Only persist an inline api_key when the user originally had
# one (either a literal secret or a ``${VAR}`` template). When
# the entry relies on ``key_env``, do not synthesize a
# ``${key_env}`` api_key — the runtime already resolves the
# key from ``key_env`` directly, and writing the resolved
# secret (or even a synthesized template) would silently
# downgrade credential hygiene on entries that intentionally
# keep plaintext out of ``config.yaml``. See issue #15803.
original_api_key_ref = str(
provider_info.get("api_key_ref", "") or ""
).strip()
original_api_key = str(
provider_info.get("api_key", "") or ""
).strip()
had_inline_api_key = bool(original_api_key_ref or original_api_key)
if (
had_inline_api_key
and config_api_key
and not str(provider_entry.get("api_key", "") or "").strip()
):
provider_entry["api_key"] = config_api_key
if key_env and not str(provider_entry.get("key_env", "") or "").strip():
provider_entry["key_env"] = key_env
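The guard above reduces to a small predicate. The sketch below restates it as a standalone function so the `key_env`-only case is easy to verify; `should_write_inline_api_key` is a hypothetical helper, not a function in the codebase.

```python
def should_write_inline_api_key(
    original_api_key: str,
    original_api_key_ref: str,
    resolved_api_key: str,
    existing_api_key: str,
) -> bool:
    """Return True only when the entry already carried an inline api_key.

    An entry that relies solely on ``key_env`` has neither a literal key nor
    a ``${VAR}`` template, so nothing is written back for it.
    """
    had_inline = bool(original_api_key.strip() or original_api_key_ref.strip())
    return bool(had_inline and resolved_api_key and not existing_api_key.strip())


# key_env-only entry: the resolved secret must not be persisted.
print(should_write_inline_api_key("", "", "sk-resolved", ""))
# Entry that originally used a ${VAR} template: restoring api_key is fine.
print(should_write_inline_api_key("", "${MY_KEY}", "${MY_KEY}", ""))
```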
@@ -6123,6 +6223,96 @@ def _ensure_fhs_path_guard() -> None:
print(" (reload your shell or run 'source ~/.bashrc' to pick it up)")
def _run_pre_update_backup(args) -> None:
"""Create a full zip backup of HERMES_HOME before running the update.
Gated on ``updates.pre_update_backup`` in config (default false). Off
by default because the zip can add minutes to every update on large
HERMES_HOME directories. The ``--backup`` flag on ``hermes update``
opts in for a single run; ``--no-backup`` forces it off when config
has it enabled. Never raises: a backup failure should not block the
update itself.
"""
# CLI flags win over config. --no-backup beats --backup if both are set.
if getattr(args, "no_backup", False):
print("◆ Pre-update backup: skipped (--no-backup)")
print()
return
force_backup = bool(getattr(args, "backup", False))
try:
from hermes_cli.config import load_config
cfg = load_config()
except Exception as exc:
logging.getLogger(__name__).debug("Could not load config for pre-update backup: %s", exc)
cfg = {}
updates_cfg = cfg.get("updates", {}) if isinstance(cfg, dict) else {}
enabled = updates_cfg.get("pre_update_backup", False)
keep = updates_cfg.get("backup_keep", 5)
if not enabled and not force_backup:
# Silent by default — the backup is off, most users don't need to
# hear about it on every update. They can opt in via --backup
# or by flipping the config knob.
return
try:
from hermes_cli.backup import create_pre_update_backup
except Exception as exc:
print(f"⚠ Pre-update backup: could not load backup module ({exc}); continuing update.")
print()
return
print("◆ Creating pre-update backup...")
t0 = _time.monotonic()
try:
out_path = create_pre_update_backup(keep=int(keep))
except Exception as exc: # defensive — helper already swallows, but just in case
print(f" ⚠ Backup failed: {exc}")
print(" Continuing with update.")
print()
return
elapsed = _time.monotonic() - t0
if out_path is None:
print(" ⚠ Backup skipped (no files found or write failed); continuing update.")
print()
return
try:
size_bytes = out_path.stat().st_size
except OSError:
size_bytes = 0
# Human-readable size
size_str = f"{size_bytes} B"
for unit in ("KB", "MB", "GB"):
if size_bytes < 1024:
break
size_bytes /= 1024
size_str = f"{size_bytes:.1f} {unit}"
# Render path using display_hermes_home so the user sees ~/.hermes/...
try:
from hermes_constants import get_hermes_home, display_hermes_home
home = get_hermes_home()
try:
display_path = f"{display_hermes_home()}/{out_path.relative_to(home)}"
except ValueError:
display_path = str(out_path)
except Exception:
display_path = str(out_path)
print(f" Saved: {display_path} ({size_str}, {elapsed:.1f}s)")
print(f" Restore: hermes import {out_path}")
print(f" Disable: omit --backup (backups are off by default)")
print(f" set updates.pre_update_backup: false in config.yaml")
print()
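The precedence between the two flags and the config key can be summarised as a pure function. This is a sketch of the decision only (no backup is created); `backup_decision` is a hypothetical name, and `args` is any object with optional `no_backup` and `backup` attributes.

```python
from typing import Any, Dict


def backup_decision(args: Any, updates_cfg: Dict[str, Any]) -> bool:
    """Return True when a pre-update backup should run.

    Order of precedence: --no-backup, then --backup, then the
    ``updates.pre_update_backup`` config value (default False).
    """
    if getattr(args, "no_backup", False):
        return False
    if getattr(args, "backup", False):
        return True
    return bool(updates_cfg.get("pre_update_backup", False))
```

Checking `--no-backup` first is what makes it win when both flags are passed.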
def cmd_update(args):
"""Update Hermes Agent to the latest version.
@@ -6165,6 +6355,10 @@ def _cmd_update_impl(args, gateway_mode: bool):
print("⚕ Updating Hermes Agent...")
print()
# Pre-update backup — runs before any git/file mutation so users can
# always roll back to the exact state they had before this update.
_run_pre_update_backup(args)
# Try git-based update first, fall back to ZIP download on Windows
# when git file I/O is broken (antivirus, NTFS filter drivers, etc.)
use_zip_update = False
@@ -6314,6 +6508,22 @@ def _cmd_update_impl(args, gateway_mode: bool):
print(f"→ Found {commit_count} new commit(s)")
# Snapshot critical state (state.db, config, pairing JSONs, etc.)
# before pulling so a user can recover if something goes wrong.
# Issue #15733 reported missing pairing data after an update; even
# though `git pull` can't touch $HERMES_HOME, this is cheap
# belt-and-suspenders insurance and gives the user something to
# restore from via `/snapshot list` / `/snapshot restore <id>`.
try:
from hermes_cli.backup import create_quick_snapshot
snap_id = create_quick_snapshot(label="pre-update")
if snap_id:
print(f" ✓ Pre-update snapshot: {snap_id}")
except Exception as exc:
# Never let a snapshot failure block an update.
logger.debug("Pre-update snapshot failed: %s", exc)
print("→ Pulling updates...")
update_succeeded = False
try:
@@ -7622,6 +7832,7 @@ For more help on a command:
"kilocode",
"xiaomi",
"arcee",
"gmi",
"nvidia",
],
default=None,
@@ -8871,7 +9082,11 @@ Examples:
)
plugins_remove.add_argument("name", help="Plugin directory name to remove")
plugins_subparsers.add_parser("list", aliases=["ls"], help="List installed plugins")
plugins_list = plugins_subparsers.add_parser("list", aliases=["ls"], help="List installed plugins")
plugins_list.add_argument(
"--available", action="store_true",
help="Also show official optional plugins that are not yet installed",
)
plugins_enable = plugins_subparsers.add_parser(
"enable", help="Enable a disabled plugin"
@@ -9542,6 +9757,18 @@ Examples:
default=False,
help="Check whether an update is available without installing anything",
)
update_parser.add_argument(
"--no-backup",
action="store_true",
default=False,
help="Skip the pre-update backup for this run (overrides updates.pre_update_backup)",
)
update_parser.add_argument(
"--backup",
action="store_true",
default=False,
help="Force a pre-update backup for this run (off by default; overrides updates.pre_update_backup)",
)
update_parser.set_defaults(func=cmd_update)
# =========================================================================
+70 -1
@@ -278,6 +278,14 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"trinity-large-preview",
"trinity-mini",
],
"gmi": [
"zai-org/GLM-5.1-FP8",
"deepseek-ai/DeepSeek-V3.2",
"moonshotai/Kimi-K2.5",
"google/gemini-3.1-flash-lite-preview",
"anthropic/claude-sonnet-4.6",
"openai/gpt-5.4",
],
"opencode-zen": [
"kimi-k2.5",
"gpt-5.4-pro",
@@ -709,7 +717,6 @@ class ProviderEntry(NamedTuple):
label: str
tui_desc: str # detailed description for `hermes model` TUI
CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("nous", "Nous Portal", "Nous Portal (Nous Research subscription)"),
ProviderEntry("openrouter", "OpenRouter", "OpenRouter (100+ models, pay-per-use)"),
@@ -735,6 +742,7 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("alibaba", "Alibaba Cloud (DashScope)","Alibaba Cloud / DashScope Coding (Qwen + multi-provider)"),
ProviderEntry("ollama-cloud", "Ollama Cloud", "Ollama Cloud (cloud-hosted open models — ollama.com)"),
ProviderEntry("arcee", "Arcee AI", "Arcee AI (Trinity models — direct API)"),
ProviderEntry("gmi", "GMI Cloud", "GMI Cloud (multi-model direct API)"),
ProviderEntry("kilocode", "Kilo Code", "Kilo Code (Kilo Gateway API)"),
ProviderEntry("opencode-zen", "OpenCode Zen", "OpenCode Zen (35+ curated models, pay-as-you-go)"),
ProviderEntry("opencode-go", "OpenCode Go", "OpenCode Go (open models, $10/month subscription)"),
@@ -769,6 +777,8 @@ _PROVIDER_ALIASES = {
"stepfun-coding-plan": "stepfun",
"arcee-ai": "arcee",
"arceeai": "arcee",
"gmi-cloud": "gmi",
"gmicloud": "gmi",
"minimax-china": "minimax-cn",
"minimax_cn": "minimax-cn",
"claude": "anthropic",
@@ -1849,6 +1859,19 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
return live
except Exception:
pass
if normalized == "gmi":
try:
from hermes_cli.auth import resolve_api_key_provider_credentials
creds = resolve_api_key_provider_credentials("gmi")
api_key = str(creds.get("api_key") or "").strip()
base_url = str(creds.get("base_url") or "").strip()
if api_key and base_url:
live = fetch_api_models(api_key, base_url)
if live:
return live
except Exception:
pass
if normalized == "custom":
base_url = _get_custom_base_url()
if base_url:
@@ -2226,6 +2249,52 @@ def copilot_model_api_mode(
return "chat_completions"
# Azure Foundry model families that require the Responses API. Azure
# rejects /chat/completions against these deployments with
# ``400 "The requested operation is unsupported."`` — the same payload Bob
# Dobolina hit in April 2026 on ``gpt-5.3-codex`` while ``gpt-4o-pure`` on
# the same endpoint worked fine. Keep the patterns broad enough to cover
# vendor-renamed deployments (e.g. ``gpt-5.3-codex``, ``gpt-5-codex``,
# ``gpt-5.4``, ``o1-preview``) but tight enough to leave GPT-4 / 3.5 / Llama /
# Mistral / Grok deployments on chat completions.
_AZURE_FOUNDRY_RESPONSES_PREFIXES = (
"codex", # codex-*, codex-mini
"gpt-5", # gpt-5, gpt-5.x, gpt-5-codex, gpt-5.x-codex
"o1", # o1, o1-preview, o1-mini
"o3", # o3, o3-mini
"o4", # o4, o4-mini
)
def azure_foundry_model_api_mode(model_name: Optional[str]) -> Optional[str]:
"""Infer Azure Foundry api_mode from a deployment/model name.
Returns ``"codex_responses"`` when the model name matches a family that
only accepts the Responses API on Azure Foundry (GPT-5.x, codex, o1/o3/o4
reasoning models). Returns ``None`` otherwise; the caller should fall
back to the configured/default api_mode (typically ``chat_completions``)
so GPT-4o, GPT-4 Turbo, Llama, Mistral, etc. keep working.
Intentionally does NOT return ``anthropic_messages``; Anthropic-style
Azure endpoints are disambiguated by URL (``/anthropic`` suffix) in
``runtime_provider._detect_api_mode_for_url`` and by the user setting
``model.api_mode: anthropic_messages`` explicitly.
"""
raw = str(model_name or "").strip().lower()
if not raw:
return None
# Strip any vendor/ prefix a user may have copied from OpenRouter / Copilot.
if "/" in raw:
raw = raw.rsplit("/", 1)[-1]
# gpt-5-mini speaks chat completions on Copilot but Azure Foundry deploys
# the full gpt-5 family uniformly on Responses API — don't carve an
# exception here.
for prefix in _AZURE_FOUNDRY_RESPONSES_PREFIXES:
if raw.startswith(prefix):
return "codex_responses"
return None
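To see which deployment names the prefix list catches, the matching can be condensed into a few lines and run against sample names. `infer_api_mode` below is a compact restatement for illustration, assuming the same five prefixes; the sample model names are arbitrary.

```python
from typing import Optional

RESPONSES_PREFIXES = ("codex", "gpt-5", "o1", "o3", "o4")


def infer_api_mode(model_name: Optional[str]) -> Optional[str]:
    """Return "codex_responses" for Responses-only families, else None."""
    raw = str(model_name or "").strip().lower()
    if "/" in raw:
        raw = raw.rsplit("/", 1)[-1]  # drop a vendor/ prefix
    if raw and raw.startswith(RESPONSES_PREFIXES):
        return "codex_responses"
    return None


for name in ("gpt-5.3-codex", "openai/gpt-5.4", "o1-preview", "gpt-4o", "Llama-3.3-70B"):
    print(f"{name!r:>20} -> {infer_api_mode(name)}")
```

Note that the check is a plain prefix match, so a deployment renamed to something that no longer starts with one of these prefixes would fall through to the configured `api_mode`.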
def normalize_opencode_model_id(provider_id: Optional[str], model_id: Optional[str]) -> str:
"""Normalize OpenCode config IDs to the bare model slug used in API requests."""
provider = normalize_provider(provider_id)
+14
@@ -79,6 +79,20 @@ VALID_HOOKS: Set[str] = {
# {"action": "allow"} / None -> normal dispatch
# Kwargs: event: MessageEvent, gateway: GatewayRunner, session_store.
"pre_gateway_dispatch",
# Approval lifecycle hooks. Fired by tools/approval.py when a dangerous
# command needs user approval -- fires BOTH for CLI-interactive prompts
# and for gateway/ACP approvals (Telegram, Discord, Slack, TUI, etc.).
# Observers only: return values are ignored. Plugins cannot veto or
# pre-answer an approval from these hooks (use pre_tool_call to block
# a tool before it reaches approval).
#
# Kwargs for pre_approval_request:
# command: str, description: str, pattern_key: str, pattern_keys: list[str],
# session_key: str, surface: "cli" | "gateway"
# Kwargs for post_approval_response: same as above plus
# choice: "once" | "session" | "always" | "deny" | "timeout"
"pre_approval_request",
"post_approval_response",
}
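As an illustration of the observer-only contract, two callbacks for these hooks might look like the sketch below. How callbacks are registered with the plugin system is not shown in this diff, so the sketch covers only the callback bodies; `APPROVAL_LOG` and both function names are hypothetical, and the keyword arguments follow the comment above.

```python
from typing import Any, Dict, List

APPROVAL_LOG: List[Dict[str, Any]] = []  # hypothetical in-memory audit trail


def on_pre_approval_request(**kwargs: Any) -> None:
    """Record that an approval prompt was raised. The return value is ignored."""
    APPROVAL_LOG.append({
        "event": "request",
        "command": kwargs.get("command"),
        "surface": kwargs.get("surface"),
    })


def on_post_approval_response(**kwargs: Any) -> None:
    """Record the user's choice ("once", "session", "always", "deny", "timeout")."""
    APPROVAL_LOG.append({
        "event": "response",
        "command": kwargs.get("command"),
        "choice": kwargs.get("choice"),
    })
```

Accepting `**kwargs` keeps the callbacks tolerant of extra keyword arguments; blocking a command still belongs in `pre_tool_call`, as the comment notes.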
ENTRY_POINTS_GROUP = "hermes_agent.plugins"
+168 -10
@@ -1,7 +1,13 @@
"""``hermes plugins`` CLI subcommand — install, update, remove, and list plugins.
Plugins are installed from Git repositories into ``~/.hermes/plugins/``.
Supports full URLs and ``owner/repo`` shorthand (resolves to GitHub).
Plugins can be installed from:
- Official optional plugins shipped with the repo: ``official/<category>/<name>``
- Git repositories (full URL or ``owner/repo`` GitHub shorthand)
Official plugins live in ``optional-plugins/`` inside the Hermes repo and are
copied into ``~/.hermes/plugins/`` on install; no git clone needed, no network
required. They are NOT auto-discovered from ``optional-plugins/``; only installed
copies in ``~/.hermes/plugins/`` are loaded by Hermes.
After install, if the plugin ships an ``after-install.md`` file it is
rendered with Rich Markdown. Otherwise a default confirmation is shown.
@@ -95,10 +101,80 @@ def _resolve_git_url(identifier: str) -> str:
raise ValueError(
f"Invalid plugin identifier: '{identifier}'. "
"Use a Git URL or owner/repo shorthand."
"Use 'official/<category>/<name>', a Git URL, or owner/repo shorthand."
)
def _optional_plugins_dir() -> Path:
"""Return the optional-plugins/ directory shipped with the Hermes repo."""
return Path(__file__).resolve().parent.parent / "optional-plugins"
def _resolve_official_plugin(identifier: str) -> Optional[Path]:
"""If *identifier* is 'official/<category>/<name>', return its source path.
Returns ``None`` when the identifier is not in official format or the
plugin directory does not exist.
"""
# Accept 'official/category/name' or just 'category/name' when the
# category/name path exists under optional-plugins/.
parts = identifier.strip("/").split("/")
# Strip leading 'official' prefix if present
if parts and parts[0] == "official":
parts = parts[1:]
if len(parts) < 1:
return None
base = _optional_plugins_dir()
# Try category/name (2 parts) or bare name (1 part)
for nparts in (2, 1):
if len(parts) < nparts:
continue
candidate = base.joinpath(*parts[-nparts:])
try:
resolved = candidate.resolve()
base_resolved = base.resolve()
resolved.relative_to(base_resolved) # traversal guard
except (ValueError, OSError):
continue
if resolved.is_dir() and (
(resolved / "plugin.yaml").exists() or (resolved / "__init__.py").exists()
):
return resolved
return None
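The traversal guard used above relies on `Path.relative_to` raising `ValueError` when the resolved candidate falls outside the base directory. A minimal standalone sketch, with `is_inside` as a hypothetical helper name:

```python
from pathlib import Path


def is_inside(base: Path, candidate: Path) -> bool:
    """Return True when *candidate* resolves to a path under *base*."""
    try:
        candidate.resolve().relative_to(base.resolve())
        return True
    except (ValueError, OSError):
        return False


base = Path("optional-plugins")
print(is_inside(base, base / "observability" / "langfuse-tracing"))
print(is_inside(base, base / ".." / "secrets"))
```

Resolving both sides first matters: it collapses `..` segments and symlinks before the containment check, so an identifier such as `official/../../etc` cannot escape `optional-plugins/`.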
def _list_official_plugins() -> list[tuple[str, str]]:
"""Return [(identifier, description), ...] for all official optional plugins."""
base = _optional_plugins_dir()
if not base.is_dir():
return []
results = []
for category_dir in sorted(base.iterdir()):
if not category_dir.is_dir() or category_dir.name.startswith("."):
continue
for plugin_dir in sorted(category_dir.iterdir()):
if not plugin_dir.is_dir() or plugin_dir.name.startswith("."):
continue
manifest_file = plugin_dir / "plugin.yaml"
desc = ""
if manifest_file.exists():
try:
import yaml
data = yaml.safe_load(manifest_file.read_text()) or {}
desc = data.get("description", "")
except Exception:
pass
identifier = f"official/{category_dir.name}/{plugin_dir.name}"
results.append((identifier, desc))
return results
def _repo_name_from_url(url: str) -> str:
"""Extract the repo name from a Git URL for the plugin directory name."""
# Strip trailing .git and slashes
@@ -296,7 +372,61 @@ def cmd_install(
from rich.console import Console
console = Console()
plugins_dir = _plugins_dir()
# ── Official optional plugins (no network, copied from optional-plugins/) ──
official_src = _resolve_official_plugin(identifier)
if official_src is not None:
manifest = _read_manifest(official_src)
plugin_name = manifest.get("name") or official_src.name
target = _sanitize_plugin_name(plugin_name, plugins_dir)
if target.exists():
if not force:
console.print(
f"[red]Error:[/red] Plugin '{plugin_name}' already exists at {target}.\n"
f"Use [bold]--force[/bold] to reinstall, or "
f"[bold]hermes plugins update {plugin_name}[/bold] to update."
)
sys.exit(1)
console.print(f"[dim] Removing existing {plugin_name}...[/dim]")
shutil.rmtree(target)
console.print(f"[dim]Installing {plugin_name} from official optional plugins...[/dim]")
shutil.copytree(str(official_src), str(target))
_copy_example_files(target, console)
_prompt_plugin_env_vars(manifest, console)
_display_after_install(target, identifier)
installed_name = manifest.get("name") or target.name
should_enable = enable
if should_enable is None:
if sys.stdin.isatty() and sys.stdout.isatty():
try:
answer = input(" Enable now? [y/N] ").strip().lower()
should_enable = answer in ("y", "yes")
except (EOFError, KeyboardInterrupt):
should_enable = False
else:
should_enable = False
if should_enable:
enabled = _get_enabled_set()
disabled = _get_disabled_set()
enabled.add(installed_name)
disabled.discard(installed_name)
_save_enabled_set(enabled)
_save_disabled_set(disabled)
console.print(f" [green]✓[/green] Plugin [bold]{installed_name}[/bold] enabled.")
else:
console.print(
f" [dim]Plugin installed but not enabled. "
f"Run [bold]hermes plugins enable {installed_name}[/bold] to activate.[/dim]"
)
return
# ── Git URL / owner/repo install ──────────────────────────────────────────
try:
git_url = _resolve_git_url(identifier)
except ValueError as e:
@@ -310,8 +440,6 @@ def cmd_install(
"Consider using https:// or git@ for production installs."
)
plugins_dir = _plugins_dir()
# Clone into a temp directory first so we can read plugin.yaml for the name
with tempfile.TemporaryDirectory() as tmp:
tmp_target = Path(tmp) / "plugin"
@@ -696,16 +824,21 @@ def _discover_all_plugins() -> list:
return list(seen.values())
def cmd_list() -> None:
"""List all plugins (bundled + user) with enabled/disabled state."""
def cmd_list(available: bool = False) -> None:
"""List all plugins (bundled + user) with enabled/disabled state.
When *available* is True, also show official optional plugins that are
not yet installed.
"""
from rich.console import Console
from rich.table import Table
console = Console()
entries = _discover_all_plugins()
if not entries:
if not entries and not available:
console.print("[dim]No plugins installed.[/dim]")
console.print("[dim]Install with:[/dim] hermes plugins install owner/repo")
console.print("[dim]Install with:[/dim] hermes plugins install official/<category>/<name>")
console.print("[dim]Browse available:[/dim] hermes plugins list --available")
return
enabled = _get_enabled_set()
@@ -734,6 +867,31 @@ def cmd_list() -> None:
console.print("[dim]Enable/disable:[/dim] hermes plugins enable/disable <name>")
console.print("[dim]Plugins are opt-in by default — only 'enabled' plugins load.[/dim]")
if available:
official = _list_official_plugins()
if official:
installed_names = {name for name, *_ in entries}
def _is_installed(ident: str) -> bool:
dirname = ident.rsplit("/", 1)[-1]
# Check both the directory name (langfuse-tracing) and
# common underscore variant (langfuse_tracing) since the
# installed plugin uses the manifest name, not the dir name.
return (dirname in installed_names
or dirname.replace("-", "_") in installed_names)
not_installed = [(ident, desc) for ident, desc in official
if not _is_installed(ident)]
if not_installed:
console.print()
avail_table = Table(title="Official optional plugins (not installed)", show_lines=False)
avail_table.add_column("Identifier", style="bold")
avail_table.add_column("Description")
for ident, desc in not_installed:
avail_table.add_row(ident, desc)
console.print(avail_table)
console.print("[dim]Install:[/dim] hermes plugins install official/<category>/<name>")
else:
console.print("[dim]All official optional plugins are already installed.[/dim]")
# ---------------------------------------------------------------------------
# Provider plugin discovery helpers
@@ -1270,7 +1428,7 @@ def plugins_command(args) -> None:
elif action == "disable":
cmd_disable(args.name)
elif action in ("list", "ls"):
cmd_list()
cmd_list(available=getattr(args, "available", False))
elif action is None:
cmd_toggle()
else:
+11
@@ -163,6 +163,12 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
base_url_override="https://api.arcee.ai/api/v1",
base_url_env_var="ARCEE_BASE_URL",
),
"gmi": HermesOverlay(
transport="openai_chat",
extra_env_vars=("GMI_API_KEY",),
base_url_override="https://api.gmi-serving.com/v1",
base_url_env_var="GMI_BASE_URL",
),
"ollama-cloud": HermesOverlay(
transport="openai_chat",
base_url_env_var="OLLAMA_BASE_URL",
@@ -297,6 +303,10 @@ ALIASES: Dict[str, str] = {
"arcee-ai": "arcee",
"arceeai": "arcee",
# gmi
"gmi-cloud": "gmi",
"gmicloud": "gmi",
# Local server aliases → virtual "local" concept (resolved via user config)
"lmstudio": "lmstudio",
"lm-studio": "lmstudio",
@@ -319,6 +329,7 @@ _LABEL_OVERRIDES: Dict[str, str] = {
"copilot-acp": "GitHub Copilot ACP",
"stepfun": "StepFun Step Plan",
"xiaomi": "Xiaomi MiMo",
"gmi": "GMI Cloud",
"local": "Local endpoint",
"bedrock": "AWS Bedrock",
"ollama-cloud": "Ollama Cloud",
+31
@@ -231,6 +231,19 @@ def _resolve_runtime_from_pool_entry(
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
if configured_mode:
api_mode = configured_mode
# Model-family inference for GPT-5.x / codex / o1-o4: Azure rejects
# /chat/completions on these with 400 "operation unsupported" — see
# azure_foundry_model_api_mode() for rationale. Skip when the user
# explicitly picked anthropic_messages (Anthropic-style endpoint).
if effective_model and api_mode != "anthropic_messages":
try:
from hermes_cli.models import azure_foundry_model_api_mode
inferred = azure_foundry_model_api_mode(effective_model)
except Exception:
inferred = None
if inferred:
api_mode = inferred
# For Anthropic-style endpoints, strip /v1 suffix
if api_mode == "anthropic_messages":
base_url = re.sub(r"/v1/?$", "", base_url)
@@ -608,6 +621,7 @@ def _resolve_azure_foundry_runtime(
model_cfg: Dict[str, Any],
explicit_api_key: Optional[str] = None,
explicit_base_url: Optional[str] = None,
target_model: Optional[str] = None,
) -> Dict[str, Any]:
"""Resolve an Azure Foundry runtime entry.
@@ -628,6 +642,22 @@ def _resolve_azure_foundry_runtime(
cfg_base_url = str(model_cfg.get("base_url") or "").strip().rstrip("/")
cfg_api_mode = _parse_api_mode(model_cfg.get("api_mode")) or "chat_completions"
# Model-family inference: Azure Foundry deploys GPT-5.x / codex / o1-o4
# reasoning models as Responses-API-only. Calling /chat/completions
# against them returns 400 "The requested operation is unsupported."
# Upgrade api_mode when the model name matches, unless the user has
# explicitly chosen anthropic_messages (Anthropic-style endpoint).
effective_model = str(target_model or model_cfg.get("default") or "").strip()
if effective_model and cfg_api_mode != "anthropic_messages":
try:
from hermes_cli.models import azure_foundry_model_api_mode
inferred = azure_foundry_model_api_mode(effective_model)
except Exception:
inferred = None
if inferred:
cfg_api_mode = inferred
env_base_url = os.getenv("AZURE_FOUNDRY_BASE_URL", "").strip().rstrip("/")
base_url = explicit_base_url_clean or cfg_base_url or env_base_url
if not base_url:
@@ -864,6 +894,7 @@ def resolve_runtime_provider(
model_cfg=_get_model_config(),
explicit_api_key=explicit_api_key,
explicit_base_url=explicit_base_url,
target_model=target_model,
)
return azure_runtime
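The precedence applied in the two Azure hunks above (an explicit anthropic_messages choice wins, then model-family inference, then the configured or default mode) can be sketched in isolation. This is a minimal illustration, not the project's code: `resolve_api_mode`, `toy_infer`, and the `"responses_api"` value are hypothetical names, and the real matching rules inside `azure_foundry_model_api_mode()` are not shown in this diff.

```python
from typing import Callable, Optional


def resolve_api_mode(
    configured_mode: Optional[str],
    model_name: str,
    infer: Callable[[str], Optional[str]],
) -> str:
    """Pick an api_mode following the precedence used in the hunks above."""
    api_mode = configured_mode or "chat_completions"
    # An explicit Anthropic-style endpoint is never overridden by inference.
    if model_name and api_mode != "anthropic_messages":
        try:
            inferred = infer(model_name)
        except Exception:
            inferred = None  # fail open: keep the configured mode
        if inferred:
            api_mode = inferred
    return api_mode


def toy_infer(model_name: str) -> Optional[str]:
    # Hypothetical stand-in for azure_foundry_model_api_mode(); both the
    # matching rule and the returned mode name are assumptions.
    name = model_name.lower()
    if name.startswith("gpt-5") or "codex" in name:
        return "responses_api"
    return None


upgraded = resolve_api_mode(None, "gpt-5-mini", toy_infer)
pinned = resolve_api_mode("anthropic_messages", "gpt-5-mini", toy_infer)
default = resolve_api_mode(None, "llama-3", toy_infer)
```

Passing the inference function as a parameter keeps the sketch testable without importing hermes_cli; the diff itself imports the helper lazily inside a try/except for the same fail-open effect.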
+50
@@ -425,6 +425,31 @@ TOOL_CATEGORIES = {
},
],
},
"langfuse": {
"name": "Langfuse Observability",
"icon": "📊",
"providers": [
{
"name": "Langfuse Cloud",
"tag": "Hosted Langfuse (cloud.langfuse.com)",
"env_vars": [
{"key": "HERMES_LANGFUSE_PUBLIC_KEY", "prompt": "Langfuse public key (pk-lf-...)", "url": "https://cloud.langfuse.com"},
{"key": "HERMES_LANGFUSE_SECRET_KEY", "prompt": "Langfuse secret key (sk-lf-...)", "url": "https://cloud.langfuse.com"},
],
"post_setup": "langfuse",
},
{
"name": "Langfuse Self-Hosted",
"tag": "Self-hosted Langfuse instance",
"env_vars": [
{"key": "HERMES_LANGFUSE_PUBLIC_KEY", "prompt": "Langfuse public key (pk-lf-...)"},
{"key": "HERMES_LANGFUSE_SECRET_KEY", "prompt": "Langfuse secret key (sk-lf-...)"},
{"key": "HERMES_LANGFUSE_BASE_URL", "prompt": "Langfuse server URL (e.g. http://localhost:3000)", "default": "http://localhost:3000"},
],
"post_setup": "langfuse",
},
],
},
}
# Simple env-var requirements for toolsets NOT in TOOL_CATEGORIES.
@@ -567,6 +592,31 @@ def _run_post_setup(post_setup_key: str):
_print_info(" git submodule update --init --recursive")
_print_info(' uv pip install -e "./tinker-atropos"')
elif post_setup_key == "langfuse":
# Install the langfuse SDK.
try:
__import__("langfuse")
_print_success(" langfuse SDK already installed")
except ImportError:
import subprocess
_print_info(" Installing langfuse SDK...")
result = subprocess.run(
[sys.executable, "-m", "pip", "install", "langfuse", "--quiet"],
capture_output=True, text=True, timeout=120,
)
if result.returncode == 0:
_print_success(" langfuse SDK installed")
else:
_print_warning(" langfuse SDK install failed — run manually: pip install langfuse")
# Install and enable the official optional plugin into ~/.hermes/plugins/.
try:
from hermes_cli.plugins_cmd import cmd_install as _plugins_install
_plugins_install("official/observability/langfuse", enable=True)
except SystemExit:
pass # cmd_install prints its own errors and calls sys.exit
_print_info(" Restart Hermes for tracing to take effect.")
_print_info(" Verify: hermes plugins list")
# ─── Platform / Toolset Helpers ───────────────────────────────────────────────
+313 -146
@@ -22,6 +22,8 @@ import sqlite3
import threading
import time
from pathlib import Path
from agent.memory_manager import sanitize_context
from hermes_constants import get_hermes_home
from typing import Any, Callable, Dict, List, Optional, TypeVar
@@ -31,7 +33,7 @@ T = TypeVar("T")
DEFAULT_DB_PATH = get_hermes_home() / "state.db"
SCHEMA_VERSION = 9
SCHEMA_VERSION = 10
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS schema_version (
@@ -119,6 +121,32 @@ CREATE TRIGGER IF NOT EXISTS messages_fts_update AFTER UPDATE ON messages BEGIN
END;
"""
# Trigram FTS5 table for CJK substring search. The default unicode61
# tokenizer splits CJK characters into individual tokens, breaking phrase
# matching. The trigram tokenizer creates overlapping 3-character sequences so
# substring queries work natively for any script (CJK, Thai, etc.).
FTS_TRIGRAM_SQL = """
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts_trigram USING fts5(
content,
content=messages,
content_rowid=id,
tokenize='trigram'
);
CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_insert AFTER INSERT ON messages BEGIN
INSERT INTO messages_fts_trigram(rowid, content) VALUES (new.id, new.content);
END;
CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_delete AFTER DELETE ON messages BEGIN
INSERT INTO messages_fts_trigram(messages_fts_trigram, rowid, content) VALUES('delete', old.id, old.content);
END;
CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_update AFTER UPDATE ON messages BEGIN
INSERT INTO messages_fts_trigram(messages_fts_trigram, rowid, content) VALUES('delete', old.id, old.content);
INSERT INTO messages_fts_trigram(rowid, content) VALUES (new.id, new.content);
END;
"""
class SessionDB:
"""
@@ -257,118 +285,156 @@ class SessionDB:
self._conn.close()
self._conn = None
@staticmethod
def _parse_schema_columns(schema_sql: str) -> Dict[str, Dict[str, str]]:
"""Extract expected columns per table from SCHEMA_SQL.
Uses an in-memory SQLite database to parse the SQL; SQLite itself
handles all syntax (DEFAULT expressions with commas, inline
REFERENCES, CHECK constraints, etc.) so there are zero regex
edge cases. The in-memory DB is opened, the schema DDL is
executed, and PRAGMA table_info extracts the column metadata.
Adding a column to SCHEMA_SQL is all that's needed; the
reconciliation loop picks it up automatically.
"""
ref = sqlite3.connect(":memory:")
try:
ref.executescript(schema_sql)
table_columns: Dict[str, Dict[str, str]] = {}
for (tbl,) in ref.execute(
"SELECT name FROM sqlite_master "
"WHERE type='table' AND name NOT LIKE 'sqlite_%'"
).fetchall():
cols: Dict[str, str] = {}
for row in ref.execute(
f'PRAGMA table_info("{tbl}")'
).fetchall():
# row: (cid, name, type, notnull, dflt_value, pk)
col_name = row[1]
col_type = row[2] or ""
notnull = row[3]
default = row[4]
pk = row[5]
# Reconstruct the type expression for ALTER TABLE ADD COLUMN
parts = [col_type] if col_type else []
if notnull and not pk:
parts.append("NOT NULL")
if default is not None:
parts.append(f"DEFAULT {default}")
cols[col_name] = " ".join(parts)
table_columns[tbl] = cols
return table_columns
finally:
ref.close()
def _reconcile_columns(self, cursor: sqlite3.Cursor) -> None:
"""Ensure live tables have every column declared in SCHEMA_SQL.
Follows the Beets/sqlite-utils pattern: the CREATE TABLE definition
in SCHEMA_SQL is the single source of truth for the desired schema.
On every startup this method diffs the live columns (via PRAGMA
table_info) against the declared columns, and ADDs any that are
missing.
This makes column additions a declarative operation: just add
the column to SCHEMA_SQL and it appears on the next startup.
Version-gated migration blocks are no longer needed for ADD COLUMN.
"""
expected = self._parse_schema_columns(SCHEMA_SQL)
for table_name, declared_cols in expected.items():
# Get current columns from the live table
try:
rows = cursor.execute(
f'PRAGMA table_info("{table_name}")'
).fetchall()
except sqlite3.OperationalError:
continue # Table doesn't exist yet (shouldn't happen after executescript)
live_cols = set()
for row in rows:
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk)
name = row[1] if isinstance(row, (tuple, list)) else row["name"]
live_cols.add(name)
for col_name, col_type in declared_cols.items():
if col_name not in live_cols:
safe_name = col_name.replace('"', '""')
try:
cursor.execute(
f'ALTER TABLE "{table_name}" ADD COLUMN "{safe_name}" {col_type}'
)
except sqlite3.OperationalError as exc:
# Expected: "duplicate column name" from a race or
# re-run. Unexpected: "Cannot add a NOT NULL column
# with default value NULL" from a schema mistake.
# Log at DEBUG so it's visible in agent.log.
logger.debug(
"reconcile %s.%s: %s", table_name, col_name, exc,
)
def _init_schema(self):
"""Create tables and FTS if they don't exist, run migrations."""
"""Create tables and FTS if they don't exist, reconcile columns.
Schema management follows the declarative reconciliation pattern
(Beets, sqlite-utils): SCHEMA_SQL is the single source of truth.
On existing databases, _reconcile_columns() diffs live columns
against SCHEMA_SQL and ADDs any missing ones. This eliminates
the version-gated migration chain for column additions, making
it impossible for reordered or inserted migrations to skip columns.
The schema_version table is retained for future data migrations
(transforming existing rows) which cannot be handled declaratively.
"""
cursor = self._conn.cursor()
cursor.executescript(SCHEMA_SQL)
# Check schema version and run migrations
# ── Declarative column reconciliation ──────────────────────────
# Diff live tables against SCHEMA_SQL and ADD any missing columns.
# This is idempotent and self-healing: even if a version-gated
# migration was skipped (e.g. due to version renumbering), the
# column gets created here.
self._reconcile_columns(cursor)
# ── Schema version bookkeeping ─────────────────────────────────
# Bump to current so future data migrations (if any) can gate on
# version. No version-gated column additions remain.
cursor.execute("SELECT version FROM schema_version LIMIT 1")
row = cursor.fetchone()
if row is None:
cursor.execute("INSERT INTO schema_version (version) VALUES (?)", (SCHEMA_VERSION,))
cursor.execute(
"INSERT INTO schema_version (version) VALUES (?)",
(SCHEMA_VERSION,),
)
else:
current_version = row["version"] if isinstance(row, sqlite3.Row) else row[0]
if current_version < 2:
# v2: add finish_reason column to messages
# Data migrations that can't be expressed declaratively (row
# backfills, index changes tied to a specific version step) stay
# in a version-gated chain. Column additions are handled by
# _reconcile_columns() above and no longer need entries here.
if current_version < 10:
# v10: trigram FTS5 table for CJK/substring search. The
# virtual table + triggers are created unconditionally via
# FTS_TRIGRAM_SQL below, but existing rows need a one-time
# backfill into the FTS index.
try:
cursor.execute("ALTER TABLE messages ADD COLUMN finish_reason TEXT")
cursor.execute("SELECT * FROM messages_fts_trigram LIMIT 0")
_fts_trigram_exists = True
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 2")
if current_version < 3:
# v3: add title column to sessions
try:
cursor.execute("ALTER TABLE sessions ADD COLUMN title TEXT")
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 3")
if current_version < 4:
# v4: add unique index on title (NULLs allowed, only non-NULL must be unique)
try:
_fts_trigram_exists = False
if not _fts_trigram_exists:
cursor.executescript(FTS_TRIGRAM_SQL)
cursor.execute(
"CREATE UNIQUE INDEX IF NOT EXISTS idx_sessions_title_unique "
"ON sessions(title) WHERE title IS NOT NULL"
"INSERT INTO messages_fts_trigram(rowid, content) "
"SELECT id, content FROM messages WHERE content IS NOT NULL"
)
except sqlite3.OperationalError:
pass # Index already exists
cursor.execute("UPDATE schema_version SET version = 4")
if current_version < 5:
new_columns = [
("cache_read_tokens", "INTEGER DEFAULT 0"),
("cache_write_tokens", "INTEGER DEFAULT 0"),
("reasoning_tokens", "INTEGER DEFAULT 0"),
("billing_provider", "TEXT"),
("billing_base_url", "TEXT"),
("billing_mode", "TEXT"),
("estimated_cost_usd", "REAL"),
("actual_cost_usd", "REAL"),
("cost_status", "TEXT"),
("cost_source", "TEXT"),
("pricing_version", "TEXT"),
]
for name, column_type in new_columns:
try:
# name and column_type come from the hardcoded tuple above,
# not user input. Double-quote identifier escaping is applied
# as defense-in-depth; SQLite DDL cannot be parameterized.
safe_name = name.replace('"', '""')
cursor.execute(f'ALTER TABLE sessions ADD COLUMN "{safe_name}" {column_type}')
except sqlite3.OperationalError:
pass
cursor.execute("UPDATE schema_version SET version = 5")
if current_version < 6:
# v6: add reasoning columns to messages table — preserves assistant
# reasoning text and structured reasoning_details across gateway
# session turns. Without these, reasoning chains are lost on
# session reload, breaking multi-turn reasoning continuity for
# providers that replay reasoning (OpenRouter, OpenAI, Nous).
for col_name, col_type in [
("reasoning", "TEXT"),
("reasoning_details", "TEXT"),
("codex_reasoning_items", "TEXT"),
]:
try:
safe = col_name.replace('"', '""')
cursor.execute(
f'ALTER TABLE messages ADD COLUMN "{safe}" {col_type}'
)
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 6")
if current_version < 7:
# v7: preserve provider-native reasoning_content separately from
# normalized reasoning text. Kimi/Moonshot replay can require
# this field on assistant tool-call messages when thinking is on.
try:
cursor.execute('ALTER TABLE messages ADD COLUMN "reasoning_content" TEXT')
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 7")
if current_version < 8:
# v8: add api_call_count column to sessions — tracks the number
# of individual LLM API calls made within a session (as opposed
# to the session count itself).
try:
cursor.execute(
'ALTER TABLE sessions ADD COLUMN "api_call_count" INTEGER DEFAULT 0'
)
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 8")
if current_version < 9:
# v9: preserve replayable Codex assistant message ids/phases so
# follow-up turns can rebuild Responses API message items instead
# of flattening everything to plain assistant text.
try:
cursor.execute('ALTER TABLE messages ADD COLUMN "codex_message_items" TEXT')
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 9")
if current_version < SCHEMA_VERSION:
cursor.execute(
"UPDATE schema_version SET version = ?",
(SCHEMA_VERSION,),
)
# Unique title index — always ensure it exists (safe to run after migrations
# since the title column is guaranteed to exist at this point)
# Unique title index — always ensure it exists
try:
cursor.execute(
"CREATE UNIQUE INDEX IF NOT EXISTS idx_sessions_title_unique "
@@ -383,6 +449,12 @@ class SessionDB:
except sqlite3.OperationalError:
cursor.executescript(FTS_SQL)
# Trigram FTS5 for CJK/substring search
try:
cursor.execute("SELECT * FROM messages_fts_trigram LIMIT 0")
except sqlite3.OperationalError:
cursor.executescript(FTS_TRIGRAM_SQL)
self._conn.commit()
# =========================================================================
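The declarative reconciliation described in `_parse_schema_columns()` and `_reconcile_columns()` can be tried standalone. The sketch below is an approximation under stated assumptions: the `DECLARED` schema, the `declared_columns()` and `reconcile()` helper names, and the sample table are invented for illustration, and logging of failed ALTER statements is omitted.

```python
import sqlite3

# Hypothetical target schema; the real SCHEMA_SQL is much larger.
DECLARED = """
CREATE TABLE IF NOT EXISTS sessions (
    id TEXT PRIMARY KEY,
    title TEXT,
    api_call_count INTEGER DEFAULT 0
);
"""


def declared_columns(schema_sql):
    """Let SQLite parse the DDL in a scratch database, then read it back."""
    ref = sqlite3.connect(":memory:")
    try:
        ref.executescript(schema_sql)
        tables = [r[0] for r in ref.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type='table' AND name NOT LIKE 'sqlite_%'"
        )]
        result = {}
        for table in tables:
            cols = {}
            # Each row is (cid, name, type, notnull, dflt_value, pk).
            for _cid, name, ctype, notnull, default, pk in ref.execute(
                f'PRAGMA table_info("{table}")'
            ):
                parts = [ctype] if ctype else []
                if notnull and not pk:
                    parts.append("NOT NULL")
                if default is not None:
                    parts.append(f"DEFAULT {default}")
                cols[name] = " ".join(parts)
            result[table] = cols
        return result
    finally:
        ref.close()


def reconcile(conn, schema_sql):
    """Add every declared column that the live table lacks."""
    added = []
    for table, cols in declared_columns(schema_sql).items():
        live = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
        if not live:
            continue  # table does not exist in the live database
        for name, decl in cols.items():
            if name not in live:
                conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{name}" {decl}')
                added.append(f"{table}.{name}")
    return added


old_db = sqlite3.connect(":memory:")
old_db.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY)")
first_run = reconcile(old_db, DECLARED)
second_run = reconcile(old_db, DECLARED)
```

The second call finds nothing to add, which is the idempotence the comments in `_init_schema()` rely on.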
@@ -1155,7 +1227,10 @@ class SessionDB:
messages = []
for row in rows:
msg = {"role": row["role"], "content": row["content"]}
content = row["content"]
if row["role"] in {"user", "assistant"} and isinstance(content, str):
content = sanitize_context(content).strip()
msg = {"role": row["role"], "content": content}
if row["tool_call_id"]:
msg["tool_call_id"] = row["tool_call_id"]
if row["tool_name"]:
@@ -1291,6 +1366,16 @@ class SessionDB:
return sanitized.strip()
@staticmethod
def _is_cjk_codepoint(cp: int) -> bool:
return (0x4E00 <= cp <= 0x9FFF or # CJK Unified Ideographs
0x3400 <= cp <= 0x4DBF or # CJK Extension A
0x20000 <= cp <= 0x2A6DF or # CJK Extension B
0x3000 <= cp <= 0x303F or # CJK Symbols
0x3040 <= cp <= 0x309F or # Hiragana
0x30A0 <= cp <= 0x30FF or # Katakana
0xAC00 <= cp <= 0xD7AF) # Hangul Syllables
@staticmethod
def _contains_cjk(text: str) -> bool:
"""Check if text contains CJK (Chinese, Japanese, Korean) characters."""
@@ -1306,6 +1391,11 @@ class SessionDB:
return True
return False
@classmethod
def _count_cjk(cls, text: str) -> int:
"""Count CJK characters in text."""
return sum(1 for ch in text if cls._is_cjk_codepoint(ord(ch)))
def search_messages(
self,
query: str,
@@ -1376,52 +1466,113 @@ class SessionDB:
LIMIT ? OFFSET ?
"""
with self._lock:
try:
cursor = self._conn.execute(sql, params)
except sqlite3.OperationalError:
# FTS5 query syntax error despite sanitization — return empty
# unless query contains CJK (fall back to LIKE below)
if not self._contains_cjk(query):
return []
matches = []
else:
matches = [dict(row) for row in cursor.fetchall()]
# LIKE fallback for CJK queries: FTS5 default tokenizer splits CJK
# characters individually, causing multi-character queries to fail.
if not matches and self._contains_cjk(query):
# CJK queries bypass the unicode61 FTS5 table. The default tokenizer
# splits CJK characters into individual tokens, so "大别山项目" becomes
# "大 AND 别 AND 山 AND 项 AND 目" — producing false positives and
# missing exact phrase matches.
#
# For queries with 3+ CJK characters, we use the trigram FTS5 table
# (indexed substring matching with ranking and snippets). For shorter
# CJK queries (1-2 chars), trigram can't match (it needs at least
# 3 characters), so we fall back to LIKE.
is_cjk = self._contains_cjk(query)
if is_cjk:
raw_query = query.strip('"').strip()
like_where = ["m.content LIKE ?"]
like_params: list = [f"%{raw_query}%"]
if source_filter is not None:
like_where.append(f"s.source IN ({','.join('?' for _ in source_filter)})")
like_params.extend(source_filter)
if exclude_sources is not None:
like_where.append(f"s.source NOT IN ({','.join('?' for _ in exclude_sources)})")
like_params.extend(exclude_sources)
if role_filter:
like_where.append(f"m.role IN ({','.join('?' for _ in role_filter)})")
like_params.extend(role_filter)
like_sql = f"""
SELECT m.id, m.session_id, m.role,
substr(m.content,
max(1, instr(m.content, ?) - 40),
120) AS snippet,
m.content, m.timestamp, m.tool_name,
s.source, s.model, s.started_at AS session_started
FROM messages m
JOIN sessions s ON s.id = m.session_id
WHERE {' AND '.join(like_where)}
ORDER BY m.timestamp DESC
LIMIT ? OFFSET ?
"""
like_params.extend([limit, offset])
# instr() parameter goes first in the bound list
like_params = [raw_query] + like_params
cjk_count = self._count_cjk(raw_query)
if cjk_count >= 3:
# Trigram FTS5 path — quote each non-operator token to handle
# FTS5 special chars (%, *, etc.) while preserving boolean
# operators (AND, OR, NOT) for multi-term queries.
tokens = raw_query.split()
parts = []
for tok in tokens:
if tok.upper() in ("AND", "OR", "NOT"):
parts.append(tok)
else:
parts.append('"' + tok.replace('"', '""') + '"')
trigram_query = " ".join(parts)
tri_where = ["messages_fts_trigram MATCH ?"]
tri_params: list = [trigram_query]
if source_filter is not None:
tri_where.append(f"s.source IN ({','.join('?' for _ in source_filter)})")
tri_params.extend(source_filter)
if exclude_sources is not None:
tri_where.append(f"s.source NOT IN ({','.join('?' for _ in exclude_sources)})")
tri_params.extend(exclude_sources)
if role_filter:
tri_where.append(f"m.role IN ({','.join('?' for _ in role_filter)})")
tri_params.extend(role_filter)
tri_sql = f"""
SELECT
m.id,
m.session_id,
m.role,
snippet(messages_fts_trigram, 0, '>>>', '<<<', '...', 40) AS snippet,
m.content,
m.timestamp,
m.tool_name,
s.source,
s.model,
s.started_at AS session_started
FROM messages_fts_trigram
JOIN messages m ON m.id = messages_fts_trigram.rowid
JOIN sessions s ON s.id = m.session_id
WHERE {' AND '.join(tri_where)}
ORDER BY rank
LIMIT ? OFFSET ?
"""
tri_params.extend([limit, offset])
with self._lock:
try:
tri_cursor = self._conn.execute(tri_sql, tri_params)
except sqlite3.OperationalError:
matches = []
else:
matches = [dict(row) for row in tri_cursor.fetchall()]
else:
# Short CJK query (1-2 chars) — trigram needs ≥3 CJK chars.
# Fall back to LIKE substring search.
escaped = raw_query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
like_where = ["m.content LIKE ? ESCAPE '\\'"]
like_params: list = [f"%{escaped}%"]
if source_filter is not None:
like_where.append(f"s.source IN ({','.join('?' for _ in source_filter)})")
like_params.extend(source_filter)
if exclude_sources is not None:
like_where.append(f"s.source NOT IN ({','.join('?' for _ in exclude_sources)})")
like_params.extend(exclude_sources)
if role_filter:
like_where.append(f"m.role IN ({','.join('?' for _ in role_filter)})")
like_params.extend(role_filter)
like_sql = f"""
SELECT m.id, m.session_id, m.role,
substr(m.content,
max(1, instr(m.content, ?) - 40),
120) AS snippet,
m.content, m.timestamp, m.tool_name,
s.source, s.model, s.started_at AS session_started
FROM messages m
JOIN sessions s ON s.id = m.session_id
WHERE {' AND '.join(like_where)}
ORDER BY m.timestamp DESC
LIMIT ? OFFSET ?
"""
like_params.extend([limit, offset])
# instr() parameter goes first in the bound list
like_params = [raw_query] + like_params
with self._lock:
like_cursor = self._conn.execute(like_sql, like_params)
matches = [dict(row) for row in like_cursor.fetchall()]
else:
with self._lock:
like_cursor = self._conn.execute(like_sql, like_params)
matches = [dict(row) for row in like_cursor.fetchall()]
try:
cursor = self._conn.execute(sql, params)
except sqlite3.OperationalError:
# FTS5 query syntax error despite sanitization — return empty
return []
else:
matches = [dict(row) for row in cursor.fetchall()]
# Add surrounding context (1 message before + after each match).
# Done outside the lock so we don't hold it across N sequential queries.
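The routing in `search_messages()` (unicode61 FTS5 for non-CJK queries, trigram FTS5 for three or more CJK characters, LIKE for shorter CJK queries) can be summarized without a database. This is a rough sketch: `choose_route()`, `escape_like()`, and `quote_fts_tokens()` are hypothetical helper names, and the sketch treats "contains CJK" as "counts at least one CJK code point", which may differ from the ranges checked by the real `_contains_cjk()`.

```python
def is_cjk_codepoint(cp: int) -> bool:
    # Same ranges as _is_cjk_codepoint() in the diff.
    return (0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF or
            0x20000 <= cp <= 0x2A6DF or 0x3000 <= cp <= 0x303F or
            0x3040 <= cp <= 0x309F or 0x30A0 <= cp <= 0x30FF or
            0xAC00 <= cp <= 0xD7AF)


def count_cjk(text: str) -> int:
    return sum(1 for ch in text if is_cjk_codepoint(ord(ch)))


def choose_route(query: str) -> str:
    """Return which search path a query would take."""
    raw = query.strip('"').strip()
    n = count_cjk(raw)
    if n == 0:
        return "fts5"
    return "trigram" if n >= 3 else "like"


def escape_like(raw: str) -> str:
    """Escape LIKE wildcards for use with ESCAPE '\\'."""
    return raw.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def quote_fts_tokens(raw: str) -> str:
    """Quote each term, leaving boolean operators untouched."""
    parts = []
    for tok in raw.split():
        if tok.upper() in ("AND", "OR", "NOT"):
            parts.append(tok)
        else:
            parts.append('"' + tok.replace('"', '""') + '"')
    return " ".join(parts)


long_route = choose_route("大别山项目")
short_route = choose_route("项目")
latin_route = choose_route("hello world")
```

Escaping matters only on the LIKE path, where `%` and `_` are wildcards; on the trigram path the double-quoting neutralizes FTS5 syntax characters instead.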
@@ -1481,16 +1632,32 @@ class SessionDB:
limit: int = 20,
offset: int = 0,
) -> List[Dict[str, Any]]:
"""List sessions, optionally filtered by source."""
"""List sessions, optionally filtered by source.
Returns rows enriched with a computed ``last_active`` column (latest
message timestamp for the session, falling back to ``started_at``),
ordered by most-recently-used first.
"""
select_with_last_active = (
"SELECT s.*, COALESCE(m.last_active, s.started_at) AS last_active "
"FROM sessions s "
"LEFT JOIN ("
"SELECT session_id, MAX(timestamp) AS last_active "
"FROM messages GROUP BY session_id"
") m ON m.session_id = s.id "
)
with self._lock:
if source:
cursor = self._conn.execute(
"SELECT * FROM sessions WHERE source = ? ORDER BY started_at DESC LIMIT ? OFFSET ?",
f"{select_with_last_active}"
"WHERE s.source = ? "
"ORDER BY last_active DESC, s.started_at DESC, s.id DESC LIMIT ? OFFSET ?",
(source, limit, offset),
)
else:
cursor = self._conn.execute(
"SELECT * FROM sessions ORDER BY started_at DESC LIMIT ? OFFSET ?",
f"{select_with_last_active}"
"ORDER BY last_active DESC, s.started_at DESC, s.id DESC LIMIT ? OFFSET ?",
(limit, offset),
)
return [dict(row) for row in cursor.fetchall()]
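The effect of the new `last_active` ordering is easiest to see on a toy database. The example below reuses the query text from the hunk above; the two-table schema and the sample rows are simplified assumptions, not the project's real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
CREATE TABLE sessions (id TEXT PRIMARY KEY, source TEXT, started_at REAL);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY, session_id TEXT, timestamp REAL
);
INSERT INTO sessions VALUES
    ('old-but-active', 'cli', 100.0),
    ('new-but-idle',   'cli', 200.0),
    ('empty',          'cli', 150.0);
INSERT INTO messages (session_id, timestamp) VALUES
    ('old-but-active', 300.0),
    ('new-but-idle',   210.0);
""")

select_with_last_active = (
    "SELECT s.*, COALESCE(m.last_active, s.started_at) AS last_active "
    "FROM sessions s "
    "LEFT JOIN ("
    "SELECT session_id, MAX(timestamp) AS last_active "
    "FROM messages GROUP BY session_id"
    ") m ON m.session_id = s.id "
)
rows = conn.execute(
    f"{select_with_last_active}"
    "ORDER BY last_active DESC, s.started_at DESC, s.id DESC LIMIT ? OFFSET ?",
    (20, 0),
).fetchall()
ordering = [(row["id"], row["last_active"]) for row in rows]
```

A session with no messages falls back to its `started_at` through COALESCE, so it still sorts sensibly instead of landing at the end with a NULL.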
+30 -3
@@ -7,9 +7,7 @@
perSystem = { pkgs, system, lib, ... }:
let
hermes-agent = inputs.self.packages.${system}.default;
hermesVenv = pkgs.callPackage ./python.nix {
inherit (inputs) uv2nix pyproject-nix pyproject-build-systems;
};
hermesVenv = hermes-agent.hermesVenv;
configMergeScript = pkgs.callPackage ./configMergeScript.nix { };
@@ -193,6 +191,35 @@ json.dump(sorted(leaf_paths(DEFAULT_CONFIG)), sys.stdout, indent=2)
echo "ok" > $out/result
'';
# Verify extraPythonPackages PYTHONPATH injection
extra-python-packages = let
testPkg = pkgs.python312Packages.pyfiglet;
hermesWithExtra = hermes-agent.override {
extraPythonPackages = [ testPkg ];
};
in pkgs.runCommand "hermes-extra-python-packages" { } ''
set -e
echo "=== Checking extraPythonPackages PYTHONPATH injection ==="
grep -q "PYTHONPATH" ${hermesWithExtra}/bin/hermes || \
(echo "FAIL: PYTHONPATH not in wrapper"; exit 1)
echo "PASS: PYTHONPATH present in wrapper"
grep -q "${testPkg}" ${hermesWithExtra}/bin/hermes || \
(echo "FAIL: test package path not in PYTHONPATH"; exit 1)
echo "PASS: test package path found in wrapper"
echo "=== Checking base package has no PYTHONPATH ==="
if grep -q "PYTHONPATH" ${hermes-agent}/bin/hermes; then
echo "FAIL: base package should not have PYTHONPATH"; exit 1
fi
echo "PASS: base package clean"
echo "=== All extraPythonPackages checks passed ==="
mkdir -p $out
echo "ok" > $out/result
'';
# ── Config merge + round-trip test ────────────────────────────────
# Tests the merge script (Nix activation behavior) across 7
# scenarios, then verifies Python's load_config() reads correctly.
+186
@@ -0,0 +1,186 @@
# nix/hermes-agent.nix — Overridable Hermes Agent package
#
# callPackage auto-wires nixpkgs args; flake inputs are passed explicitly.
# Users override via: pkgs.hermes-agent.override { extraPythonPackages = [...]; }
{
lib,
stdenv,
makeWrapper,
callPackage,
python312,
nodejs_22,
ripgrep,
git,
openssh,
ffmpeg,
tirith,
# Flake inputs — passed explicitly by packages.nix and overlays.nix
uv2nix,
pyproject-nix,
pyproject-build-systems,
npm-lockfile-fix,
# Overridable parameters
extraPythonPackages ? [ ],
}:
let
hermesVenv = callPackage ./python.nix {
inherit uv2nix pyproject-nix pyproject-build-systems;
};
hermesNpmLib = callPackage ./lib.nix {
inherit npm-lockfile-fix;
};
hermesTui = callPackage ./tui.nix {
inherit hermesNpmLib;
};
hermesWeb = callPackage ./web.nix {
inherit hermesNpmLib;
};
bundledSkills = lib.cleanSourceWith {
src = ../skills;
filter = path: _type: !(lib.hasInfix "/index-cache/" path);
};
runtimeDeps = [
nodejs_22
ripgrep
git
openssh
ffmpeg
tirith
];
runtimePath = lib.makeBinPath runtimeDeps;
sitePackagesPath = python312.sitePackages;
# Walk propagatedBuildInputs to include transitive Python deps in PYTHONPATH.
# Without this, a plugin listing e.g. requests as a dep would fail at runtime
# if requests isn't already in the sealed uv2nix venv.
allExtraPythonPackages = python312.pkgs.requiredPythonModules extraPythonPackages;
pythonPath = lib.makeSearchPath sitePackagesPath allExtraPythonPackages;
pyprojectHash = builtins.hashString "sha256" (builtins.readFile ../pyproject.toml);
uvLockHash =
if builtins.pathExists ../uv.lock then
builtins.hashString "sha256" (builtins.readFile ../uv.lock)
else
"none";
in
stdenv.mkDerivation {
pname = "hermes-agent";
version = (builtins.fromTOML (builtins.readFile ../pyproject.toml)).project.version;
dontUnpack = true;
dontBuild = true;
nativeBuildInputs = [ makeWrapper ];
installPhase = ''
runHook preInstall
mkdir -p $out/share/hermes-agent $out/bin
cp -r ${bundledSkills} $out/share/hermes-agent/skills
cp -r ${hermesWeb} $out/share/hermes-agent/web_dist
mkdir -p $out/ui-tui
cp -r ${hermesTui}/lib/hermes-tui/* $out/ui-tui/
${lib.concatMapStringsSep "\n"
(name: ''
makeWrapper ${hermesVenv}/bin/${name} $out/bin/${name} \
--suffix PATH : "${runtimePath}" \
--set HERMES_BUNDLED_SKILLS $out/share/hermes-agent/skills \
--set HERMES_WEB_DIST $out/share/hermes-agent/web_dist \
--set HERMES_TUI_DIR $out/ui-tui \
--set HERMES_PYTHON ${hermesVenv}/bin/python3 \
--set HERMES_NODE ${nodejs_22}/bin/node \
${lib.optionalString (extraPythonPackages != [ ]) ''--suffix PYTHONPATH : "${pythonPath}"''}
'')
[
"hermes"
"hermes-agent"
"hermes-acp"
]
}
${lib.optionalString (extraPythonPackages != [ ]) ''
echo "=== Checking for plugin/core package collisions ==="
${hermesVenv}/bin/python3 -c "
import pathlib, sys, re
def canonical(name):
return re.sub(r'[-_.]+', '-', name).lower()
# Collect core venv package names
core = set()
venv_sp = pathlib.Path('${hermesVenv}/${sitePackagesPath}')
for di in venv_sp.glob('*.dist-info'):
meta = di / 'METADATA'
if meta.exists():
for line in meta.read_text().splitlines():
if line.startswith('Name:'):
core.add(canonical(line.split(':', 1)[1].strip()))
break
# Check each extra package for collisions
extras_dirs = [${lib.concatMapStringsSep ", " (p: "'${toString p}'") allExtraPythonPackages}]
for edir in extras_dirs:
sp = pathlib.Path(edir) / '${sitePackagesPath}'
if not sp.exists():
continue
for di in sp.glob('*.dist-info'):
meta = di / 'METADATA'
if not meta.exists():
continue
for line in meta.read_text().splitlines():
if line.startswith('Name:'):
pkg = canonical(line.split(':', 1)[1].strip())
if pkg in core:
print(f'ERROR: plugin package \"{pkg}\" collides with a package in hermes sealed venv', file=sys.stderr)
print(f' from: {di}', file=sys.stderr)
print(f' Remove this dependency from extraPythonPackages.', file=sys.stderr)
sys.exit(1)
break
print('No collisions found.')
"
echo "=== No collisions ==="
''}
runHook postInstall
'';
passthru = {
inherit hermesTui hermesWeb hermesNpmLib hermesVenv;
devShellHook = ''
STAMP=".nix-stamps/hermes-agent"
STAMP_VALUE="${pyprojectHash}:${uvLockHash}"
if [ ! -f "$STAMP" ] || [ "$(cat "$STAMP")" != "$STAMP_VALUE" ]; then
echo "hermes-agent: installing Python dependencies..."
uv venv .venv --python ${python312}/bin/python3 2>/dev/null || true
source .venv/bin/activate
uv pip install -e ".[all]"
[ -d mini-swe-agent ] && uv pip install -e ./mini-swe-agent 2>/dev/null || true
[ -d tinker-atropos ] && uv pip install -e ./tinker-atropos 2>/dev/null || true
mkdir -p .nix-stamps
echo "$STAMP_VALUE" > "$STAMP"
else
source .venv/bin/activate
export HERMES_PYTHON=${hermesVenv}/bin/python3
fi
'';
};
meta = with lib; {
description = "AI agent with advanced tool-calling capabilities";
homepage = "https://github.com/NousResearch/hermes-agent";
mainProgram = "hermes";
license = licenses.mit;
platforms = platforms.unix;
};
}
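The collision check embedded in installPhase compares distribution names after normalizing them, so that spellings such as typing_extensions and Typing.Extensions count as the same package. A minimal sketch of that comparison follows; the `find_collisions()` helper and the sample package lists are hypothetical, and reading names from `*.dist-info/METADATA` files is left out.

```python
import re


def canonical(name: str) -> str:
    """Normalize a distribution name the same way the build-time check does."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_collisions(core_names, extra_names):
    """Return normalized names present in both the core venv and the extras."""
    core = {canonical(n) for n in core_names}
    extras = {canonical(n) for n in extra_names}
    return sorted(core & extras)


collisions = find_collisions(
    ["PyYAML", "typing_extensions", "httpx"],
    ["Typing.Extensions", "pyfiglet"],
)
```

In the Nix build, any non-empty result aborts the install with an error, because a duplicate on PYTHONPATH could shadow or conflict with the copy in the sealed venv.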
+81 -6
@@ -28,6 +28,8 @@
let
cfg = config.services.hermes-agent;
effectivePackage = if cfg.extraPythonPackages == [ ] then cfg.package
else cfg.package.override { inherit (cfg) extraPythonPackages; };
hermes-agent = inputs.self.packages.${pkgs.stdenv.hostPlatform.system}.default;
# Deep-merge config type (from 0xrsydn/nix-hermes-agent)
@@ -456,6 +458,52 @@
description = "Extra packages available on PATH.";
};
extraPlugins = mkOption {
type = types.listOf types.package;
default = [ ];
description = ''
Directory-based plugin packages to symlink into the hermes plugins
directory. Each package should contain a plugin.yaml and __init__.py
at its root. Hermes discovers these automatically on startup.
'';
example = literalExpression ''
[
(pkgs.fetchFromGitHub {
owner = "stephenschoettler";
repo = "hermes-lcm";
name = "hermes-lcm";
rev = "v0.7.0";
hash = "sha256-...";
})
]
'';
};
extraPythonPackages = mkOption {
type = types.listOf types.package;
default = [ ];
description = ''
Python packages to add to PYTHONPATH for entry-point plugin discovery.
These are pip-packaged plugins that register via the
hermes_agent.plugins entry-point group. Each package must be built
with the same Python interpreter as hermes (python312).
'';
example = literalExpression ''
[
(pkgs.python312Packages.buildPythonPackage {
pname = "rtk-hermes";
version = "1.0.0";
src = pkgs.fetchFromGitHub {
owner = "ogallotti";
repo = "rtk-hermes";
rev = "main";
hash = "sha256-...";
};
})
]
'';
};
restart = mkOption {
type = types.str;
default = "always";
@@ -570,7 +618,7 @@
# so interactive shells share state (sessions, skills, cron) with the
# gateway service instead of creating a separate ~/.hermes/.
(lib.mkIf cfg.addToSystemPackages {
environment.systemPackages = [ cfg.package ];
environment.systemPackages = [ effectivePackage ];
environment.variables.HERMES_HOME = "${cfg.stateDir}/.hermes";
})
@@ -581,6 +629,16 @@
});
})
# ── Assertions ─────────────────────────────────────────────────────
{
assertions = let
names = map lib.getName cfg.extraPlugins;
in [{
assertion = (lib.length names) == (lib.length (lib.unique names));
message = "services.hermes-agent.extraPlugins: duplicate plugin names detected: ${toString names}. If using fetchFromGitHub, set name = \"plugin-name\" to disambiguate.";
}];
}
# ── Warnings ──────────────────────────────────────────────────────
(lib.mkIf (cfg.container.enable && !cfg.addToSystemPackages && cfg.container.hostUsers != []) {
warnings = [
@@ -602,6 +660,7 @@
"d ${cfg.stateDir}/.hermes/sessions 2770 ${cfg.user} ${cfg.group} - -"
"d ${cfg.stateDir}/.hermes/logs 2770 ${cfg.user} ${cfg.group} - -"
"d ${cfg.stateDir}/.hermes/memories 2770 ${cfg.user} ${cfg.group} - -"
"d ${cfg.stateDir}/.hermes/plugins 2770 ${cfg.user} ${cfg.group} - -"
"d ${cfg.stateDir}/home 0750 ${cfg.user} ${cfg.group} - -"
"d ${cfg.workingDirectory} 2770 ${cfg.user} ${cfg.group} - -"
];
@@ -623,7 +682,7 @@
find ${cfg.stateDir}/.hermes -maxdepth 1 \
\( -name "*.db" -o -name "*.db-wal" -o -name "*.db-shm" -o -name "SOUL.md" \) \
-exec chmod g+rw {} + 2>/dev/null || true
for _subdir in cron sessions logs memories; do
for _subdir in cron sessions logs memories plugins; do
mkdir -p "${cfg.stateDir}/.hermes/$_subdir"
chown ${cfg.user}:${cfg.group} "${cfg.stateDir}/.hermes/$_subdir"
chmod 2770 "${cfg.stateDir}/.hermes/$_subdir"
@@ -732,6 +791,22 @@ HERMES_NIX_ENV_EOF
${lib.concatStringsSep "\n" (lib.mapAttrsToList (name: _value: ''
install -o ${cfg.user} -g ${cfg.group} -m 0640 ${documentDerivation}/${name} ${cfg.workingDirectory}/${name}
'') cfg.documents)}
# ── Declarative plugins ─────────────────────────────────────────
# Remove stale managed symlinks (plugins removed from config)
find ${cfg.stateDir}/.hermes/plugins -maxdepth 1 -type l -name 'nix-managed-*' -delete 2>/dev/null || true
${lib.concatStringsSep "\n" (map (plugin:
let
name = lib.getName plugin;
in ''
if [ ! -f "${plugin}/plugin.yaml" ]; then
echo "ERROR: extraPlugins entry '${plugin}' has no plugin.yaml" >&2
exit 1
fi
ln -sfn ${plugin} ${cfg.stateDir}/.hermes/plugins/nix-managed-${name}
chown -h ${cfg.user}:${cfg.group} ${cfg.stateDir}/.hermes/plugins/nix-managed-${name}
'') cfg.extraPlugins)}
'';
}
@@ -762,7 +837,7 @@ HERMES_NIX_ENV_EOF
# reads them at Python startup — no systemd EnvironmentFile needed.
ExecStart = lib.concatStringsSep " " ([
"${cfg.package}/bin/hermes"
"${effectivePackage}/bin/hermes"
"gateway"
] ++ cfg.extraArgs);
@@ -785,7 +860,7 @@ HERMES_NIX_ENV_EOF
};
path = [
cfg.package
effectivePackage
pkgs.bash
pkgs.coreutils
pkgs.git
@@ -810,11 +885,11 @@ HERMES_NIX_ENV_EOF
preStart = ''
# Stable symlinks — container references these, not store paths directly
ln -sfn ${cfg.package} ${cfg.stateDir}/current-package
ln -sfn ${effectivePackage} ${cfg.stateDir}/current-package
ln -sfn ${containerEntrypoint} ${cfg.stateDir}/current-entrypoint
# GC roots so nix-collect-garbage doesn't remove store paths in use
${pkgs.nix}/bin/nix-store --add-root ${cfg.stateDir}/.gc-root --indirect -r ${cfg.package} 2>/dev/null || true
${pkgs.nix}/bin/nix-store --add-root ${cfg.stateDir}/.gc-root --indirect -r ${effectivePackage} 2>/dev/null || true
${pkgs.nix}/bin/nix-store --add-root ${cfg.stateDir}/.gc-root-entrypoint --indirect -r ${containerEntrypoint} 2>/dev/null || true
# Check if container needs (re)creation
+10
@@ -0,0 +1,10 @@
# nix/overlays.nix — Expose pkgs.hermes-agent for external NixOS configs
{ inputs, ... }:
{
flake.overlays.default = final: _: {
hermes-agent = final.callPackage ./hermes-agent.nix {
inherit (inputs) uv2nix pyproject-nix pyproject-build-systems;
npm-lockfile-fix = inputs.npm-lockfile-fix.packages.${final.stdenv.hostPlatform.system}.default;
};
};
}
+6 -107
@@ -4,120 +4,19 @@
perSystem =
{ pkgs, inputs', ... }:
let
hermesVenv = pkgs.callPackage ./python.nix {
hermesAgent = pkgs.callPackage ./hermes-agent.nix {
inherit (inputs) uv2nix pyproject-nix pyproject-build-systems;
};
hermesNpmLib = pkgs.callPackage ./lib.nix {
npm-lockfile-fix = inputs'.npm-lockfile-fix.packages.default;
};
hermesTui = pkgs.callPackage ./tui.nix {
inherit hermesNpmLib;
};
# Import bundled skills, excluding runtime caches
bundledSkills = pkgs.lib.cleanSourceWith {
src = ../skills;
filter = path: _type: !(pkgs.lib.hasInfix "/index-cache/" path);
};
hermesWeb = pkgs.callPackage ./web.nix {
inherit hermesNpmLib;
};
runtimeDeps = with pkgs; [
nodejs_22
ripgrep
git
openssh
ffmpeg
tirith
];
runtimePath = pkgs.lib.makeBinPath runtimeDeps;
# Lockfile hashes for dev shell stamps
pyprojectHash = builtins.hashString "sha256" (builtins.readFile ../pyproject.toml);
uvLockHash =
if builtins.pathExists ../uv.lock then
builtins.hashString "sha256" (builtins.readFile ../uv.lock)
else
"none";
in
{
packages = {
default = pkgs.stdenv.mkDerivation {
pname = "hermes-agent";
version = (fromTOML (builtins.readFile ../pyproject.toml)).project.version;
default = hermesAgent;
tui = hermesAgent.hermesTui;
web = hermesAgent.hermesWeb;
dontUnpack = true;
dontBuild = true;
nativeBuildInputs = [ pkgs.makeWrapper ];
installPhase = ''
runHook preInstall
mkdir -p $out/share/hermes-agent $out/bin
cp -r ${bundledSkills} $out/share/hermes-agent/skills
cp -r ${hermesWeb} $out/share/hermes-agent/web_dist
# copy pre-built TUI (same layout as dev: ui-tui/dist/ + node_modules/)
mkdir -p $out/ui-tui
cp -r ${hermesTui}/lib/hermes-tui/* $out/ui-tui/
${pkgs.lib.concatMapStringsSep "\n"
(name: ''
makeWrapper ${hermesVenv}/bin/${name} $out/bin/${name} \
--suffix PATH : "${runtimePath}" \
--set HERMES_BUNDLED_SKILLS $out/share/hermes-agent/skills \
--set HERMES_WEB_DIST $out/share/hermes-agent/web_dist \
--set HERMES_TUI_DIR $out/ui-tui \
--set HERMES_PYTHON ${hermesVenv}/bin/python3 \
--set HERMES_NODE ${pkgs.nodejs_22}/bin/node
'')
[
"hermes"
"hermes-agent"
"hermes-acp"
]
}
runHook postInstall
'';
passthru.devShellHook = ''
STAMP=".nix-stamps/hermes-agent"
STAMP_VALUE="${pyprojectHash}:${uvLockHash}"
if [ ! -f "$STAMP" ] || [ "$(cat "$STAMP")" != "$STAMP_VALUE" ]; then
echo "hermes-agent: installing Python dependencies..."
uv venv .venv --python ${pkgs.python312}/bin/python3 2>/dev/null || true
source .venv/bin/activate
uv pip install -e ".[all]"
[ -d mini-swe-agent ] && uv pip install -e ./mini-swe-agent 2>/dev/null || true
[ -d tinker-atropos ] && uv pip install -e ./tinker-atropos 2>/dev/null || true
mkdir -p .nix-stamps
echo "$STAMP_VALUE" > "$STAMP"
else
source .venv/bin/activate
export HERMES_PYTHON=${hermesVenv}/bin/python3
fi
'';
meta = with pkgs.lib; {
description = "AI agent with advanced tool-calling capabilities";
homepage = "https://github.com/NousResearch/hermes-agent";
mainProgram = "hermes";
license = licenses.mit;
platforms = platforms.unix;
};
};
tui = hermesTui;
web = hermesWeb;
fix-lockfiles = hermesNpmLib.mkFixLockfiles {
packages = [ hermesTui hermesWeb ];
fix-lockfiles = hermesAgent.hermesNpmLib.mkFixLockfiles {
packages = [ hermesAgent.hermesTui hermesAgent.hermesWeb ];
};
};
};
+2 -1
@@ -7,6 +7,7 @@
pyproject-nix,
pyproject-build-systems,
stdenv,
dependency-groups ? [ "all" ],
}:
let
workspace = uv2nix.lib.workspace.loadWorkspace { workspaceRoot = ./..; };
@@ -96,5 +97,5 @@ let
]);
in
pythonSet.mkVirtualEnv "hermes-agent-env" {
hermes-agent = [ "all" ];
hermes-agent = dependency-groups;
}
+1
@@ -17,6 +17,7 @@ pkgs.buildNpmPackage (npm // {
inherit src npmDeps version;
doCheck = false;
npmFlags = [ "--legacy-peer-deps" ];
installPhase = ''
runHook preInstall
@@ -0,0 +1,875 @@
"""langfuse — Hermes plugin for Langfuse observability.
Traces Hermes conversations, LLM calls, and tool usage to Langfuse.
Enable via ``hermes tools`` or by setting HERMES_LANGFUSE_ENABLED=true
and the required credentials in ~/.hermes/.env.
Required env vars (set via ``hermes tools`` or ~/.hermes/.env):
HERMES_LANGFUSE_ENABLED - set to "true" to activate tracing
HERMES_LANGFUSE_PUBLIC_KEY - Langfuse project public key (pk-lf-...)
HERMES_LANGFUSE_SECRET_KEY - Langfuse project secret key (sk-lf-...)
HERMES_LANGFUSE_BASE_URL - Langfuse server URL (default: https://cloud.langfuse.com)
Optional env vars:
HERMES_LANGFUSE_ENV - environment tag (e.g. "production", "local")
HERMES_LANGFUSE_RELEASE - release/version tag
HERMES_LANGFUSE_SAMPLE_RATE - sampling rate 0.0-1.0 (default: 1.0)
HERMES_LANGFUSE_MAX_CHARS - max chars per field (default: 12000)
HERMES_LANGFUSE_DEBUG - set to "true" for verbose logging
"""
from __future__ import annotations
import json
import logging
import os
import re
import threading
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Optional
logger = logging.getLogger(__name__)
try:
from langfuse import Langfuse, propagate_attributes
except Exception: # pragma: no cover - fail-open when optional dep is missing
Langfuse = None
propagate_attributes = None
@dataclass
class TraceState:
trace_id: str
root_ctx: Any
root_span: Any
generations: Dict[str, Any] = field(default_factory=dict)
tools: Dict[str, Any] = field(default_factory=dict)
turn_tool_calls: list[dict[str, Any]] = field(default_factory=list)
last_updated_at: float = field(default_factory=time.time)
_STATE_LOCK = threading.Lock()
_TRACE_STATE: Dict[str, TraceState] = {}
_LANGFUSE_CLIENT = None
_READ_FILE_LINE_RE = re.compile(r"^\s*(\d+)\|(.*)$")
_READ_FILE_HEAD_LINES = 25
_READ_FILE_TAIL_LINES = 15
def _env(name: str, default: str = "") -> str:
return os.environ.get(name, default).strip()
def _env_bool(*names: str) -> bool:
for name in names:
value = _env(name).lower()
if value:
return value in {"1", "true", "yes", "on"}
return False
def _debug_enabled() -> bool:
return _env_bool("HERMES_LANGFUSE_DEBUG")
def _debug(message: str) -> None:
if _debug_enabled():
logger.info("Langfuse tracing: %s", message)
def _is_enabled() -> bool:
if Langfuse is None:
return False
# Primary activation path: config.yaml plugins.langfuse.enabled
try:
from hermes_cli.config import load_config
_cfg = load_config()
_plugin_cfg = _cfg.get("plugins", {})
if isinstance(_plugin_cfg, dict):
_lt_cfg = _plugin_cfg.get("langfuse", {})
if isinstance(_lt_cfg, dict) and "enabled" in _lt_cfg:
if not _lt_cfg["enabled"]:
return False
# Explicit enabled=true in config — skip env-var check below
public_key = _env("HERMES_LANGFUSE_PUBLIC_KEY") or _env("LANGFUSE_PUBLIC_KEY")
secret_key = _env("HERMES_LANGFUSE_SECRET_KEY") or _env("LANGFUSE_SECRET_KEY")
return bool(public_key and secret_key)
except Exception:
pass
# Backward-compat path: HERMES_LANGFUSE_ENABLED env var (legacy .env installs)
if not _env_bool("HERMES_LANGFUSE_ENABLED"):
return False
public_key = _env("HERMES_LANGFUSE_PUBLIC_KEY") or _env("LANGFUSE_PUBLIC_KEY")
secret_key = _env("HERMES_LANGFUSE_SECRET_KEY") or _env("LANGFUSE_SECRET_KEY")
return bool(public_key and secret_key)
def _get_langfuse() -> Optional[Langfuse]:
global _LANGFUSE_CLIENT
if not _is_enabled():
return None
if _LANGFUSE_CLIENT is not None:
return _LANGFUSE_CLIENT
public_key = _env("HERMES_LANGFUSE_PUBLIC_KEY") or _env("LANGFUSE_PUBLIC_KEY")
secret_key = _env("HERMES_LANGFUSE_SECRET_KEY") or _env("LANGFUSE_SECRET_KEY")
base_url = _env("HERMES_LANGFUSE_BASE_URL") or _env("LANGFUSE_BASE_URL") or "https://cloud.langfuse.com"
environment = _env("HERMES_LANGFUSE_ENV") or _env("LANGFUSE_ENV")
release = _env("HERMES_LANGFUSE_RELEASE") or _env("LANGFUSE_RELEASE")
sample_rate = _env("HERMES_LANGFUSE_SAMPLE_RATE")
kwargs: Dict[str, Any] = {
"public_key": public_key,
"secret_key": secret_key,
"base_url": base_url,
}
if environment:
kwargs["environment"] = environment
if release:
kwargs["release"] = release
if sample_rate:
try:
kwargs["sample_rate"] = float(sample_rate)
except ValueError:
logger.warning("Invalid HERMES_LANGFUSE_SAMPLE_RATE=%r", sample_rate)
try:
_LANGFUSE_CLIENT = Langfuse(**kwargs)
except Exception as exc: # pragma: no cover - fail-open
logger.warning("Could not initialize Langfuse client: %s", exc)
return None
return _LANGFUSE_CLIENT
def _trace_key(task_id: str, session_id: str) -> str:
if task_id:
return task_id
if session_id:
return f"session:{session_id}"
return f"thread:{threading.get_ident()}"
def _truncate_text(value: str, max_chars: int) -> str:
if len(value) <= max_chars:
return value
return value[:max_chars] + f"... [truncated {len(value) - max_chars} chars]"
def _maybe_parse_json_string(value: str) -> Any:
stripped = value.strip()
if len(stripped) < 2 or stripped[0] not in "{[" or stripped[-1] not in "}]":
if len(stripped) < 2 or stripped[0] not in "{[":
return value
try:
parsed, idx = json.JSONDecoder().raw_decode(stripped)
except Exception:
return value
if not isinstance(parsed, (dict, list)):
return value
trailing = stripped[idx:].strip()
if not trailing:
return parsed
hint_key = "_hint" if trailing.startswith("[Hint:") else "_trailing_text"
if isinstance(parsed, dict):
merged = dict(parsed)
key = hint_key if hint_key not in merged else "_trailing_text"
merged[key] = trailing
return merged
return {"data": parsed, hint_key: trailing}
def _looks_like_read_file_payload(value: Any) -> bool:
if not isinstance(value, dict):
return False
content = value.get("content")
return (
isinstance(content, str)
and "total_lines" in value
and "file_size" in value
and "is_binary" in value
and "is_image" in value
and not value.get("error")
)
def _parse_read_file_lines(content: str) -> list[dict[str, Any]]:
if not isinstance(content, str) or not content:
return []
lines = []
for raw_line in content.splitlines():
match = _READ_FILE_LINE_RE.match(raw_line)
if not match:
return []
lines.append({
"line": int(match.group(1)),
"text": match.group(2),
})
return lines
def _build_read_file_preview(lines: list[dict[str, Any]]) -> dict[str, Any]:
if len(lines) <= (_READ_FILE_HEAD_LINES + _READ_FILE_TAIL_LINES):
return {"lines": lines}
return {
"head": lines[:_READ_FILE_HEAD_LINES],
"tail": lines[-_READ_FILE_TAIL_LINES:],
"omitted_line_count": len(lines) - _READ_FILE_HEAD_LINES - _READ_FILE_TAIL_LINES,
}
def _normalize_read_file_payload(value: dict[str, Any], *, args: Any = None) -> dict[str, Any]:
normalized: dict[str, Any] = {}
if isinstance(args, dict):
path = args.get("path")
offset = args.get("offset")
limit = args.get("limit")
if isinstance(path, str) and path:
normalized["path"] = path
if isinstance(offset, int):
normalized["offset"] = offset
if isinstance(limit, int):
normalized["limit"] = limit
lines = _parse_read_file_lines(value.get("content", ""))
if lines:
normalized["returned_lines"] = {
"start": lines[0]["line"],
"end": lines[-1]["line"],
"count": len(lines),
}
normalized["content_preview"] = _build_read_file_preview(lines)
elif value.get("content"):
normalized["content_preview"] = {
"text": value.get("content", ""),
}
for key in (
"total_lines",
"file_size",
"truncated",
"is_binary",
"is_image",
"hint",
"_warning",
"mime_type",
"dimensions",
"similar_files",
"error",
):
if key in value:
normalized[key] = value[key]
base64_content = value.get("base64_content")
if isinstance(base64_content, str) and base64_content:
normalized["base64_content"] = {
"omitted": True,
"length": len(base64_content),
}
return normalized
def _normalize_payload(value: Any, *, tool_name: str = "", args: Any = None) -> Any:
if _looks_like_read_file_payload(value):
return _normalize_read_file_payload(
value,
args=args if tool_name == "read_file" else None,
)
return value
def _safe_value(value: Any, *, max_chars: Optional[int] = None, depth: int = 0,
parse_json_strings: bool = False) -> Any:
max_chars = max_chars if max_chars is not None else int(_env("HERMES_LANGFUSE_MAX_CHARS", "12000") or "12000")
if depth > 4:
return "<max-depth>"
if value is None or isinstance(value, (int, float, bool)):
return value
if isinstance(value, bytes):
return {"type": "bytes", "len": len(value)}
if isinstance(value, str):
if parse_json_strings:
parsed = _maybe_parse_json_string(value)
if parsed is not value:
return _safe_value(parsed, max_chars=max_chars, depth=depth, parse_json_strings=True)
return _truncate_text(value, max_chars)
if isinstance(value, dict):
normalized = _normalize_payload(value)
if normalized is not value:
return _safe_value(normalized, max_chars=max_chars, depth=depth, parse_json_strings=parse_json_strings)
return {
str(k): _safe_value(v, max_chars=max_chars, depth=depth + 1, parse_json_strings=parse_json_strings)
for k, v in list(value.items())[:50]
}
if isinstance(value, (list, tuple, set)):
return [
_safe_value(v, max_chars=max_chars, depth=depth + 1, parse_json_strings=parse_json_strings)
for v in list(value)[:50]
]
if hasattr(value, "__dict__"):
return _safe_value(vars(value), max_chars=max_chars, depth=depth + 1, parse_json_strings=parse_json_strings)
return _truncate_text(repr(value), max_chars)
def _extract_last_user_message(messages: Any) -> Any:
if not isinstance(messages, list):
return None
for message in reversed(messages):
if isinstance(message, dict) and message.get("role") == "user":
return {
"role": "user",
"content": _safe_value(message.get("content")),
}
return None
def _serialize_messages(messages: Any) -> list[dict[str, Any]]:
if not isinstance(messages, list):
return []
serialized = []
for message in messages[-12:]:
if not isinstance(message, dict):
continue
role = message.get("role")
item = {
"role": role,
"content": _safe_value(
message.get("content"),
parse_json_strings=(role == "tool"),
),
}
if role == "tool" and message.get("tool_call_id"):
item["tool_call_id"] = message.get("tool_call_id")
if message.get("tool_calls"):
item["tool_calls"] = _safe_value(message.get("tool_calls"), parse_json_strings=True)
serialized.append(item)
return serialized
def _serialize_tool_calls(tool_calls: Any) -> list[dict[str, Any]]:
if not tool_calls:
return []
serialized = []
for tool_call in tool_calls:
fn = getattr(tool_call, "function", None)
name = getattr(fn, "name", None) if fn else None
arguments = getattr(fn, "arguments", None) if fn else None
if isinstance(arguments, str):
try:
arguments = json.loads(arguments)
except Exception:
pass
serialized.append({
"id": getattr(tool_call, "id", None),
"name": name,
"arguments": _safe_value(arguments, parse_json_strings=True),
})
return serialized
def _serialize_assistant_message(message: Any) -> dict[str, Any]:
return {
"content": _safe_value(getattr(message, "content", None)),
"reasoning": _safe_value(getattr(message, "reasoning", None)),
"tool_calls": _serialize_tool_calls(getattr(message, "tool_calls", None)),
}
def _usage_and_cost(response: Any, *, provider: str, api_mode: str, model: str, base_url: str) -> tuple[dict[str, int], dict[str, float]]:
usage_details: Dict[str, int] = {}
cost_details: Dict[str, float] = {}
raw_usage = getattr(response, "usage", None)
if not raw_usage:
return usage_details, cost_details
try:
from agent.usage_pricing import estimate_usage_cost, normalize_usage
canonical = normalize_usage(raw_usage, provider=provider, api_mode=api_mode)
# Langfuse usage_details keys follow a naming convention:
# - Dashboard sums all keys containing "input" as input total
# - Dashboard sums all keys containing "output" as output total
# - If no "total" key, Langfuse derives it from all usage types
# Use Anthropic-style key names so cache tokens roll into the
# dashboard input total automatically.
# Ref: https://langfuse.com/docs/model-usage-and-cost
usage_details = {
"input": canonical.input_tokens,
"output": canonical.output_tokens,
}
if canonical.cache_read_tokens:
usage_details["cache_read_input_tokens"] = canonical.cache_read_tokens
if canonical.cache_write_tokens:
usage_details["cache_creation_input_tokens"] = canonical.cache_write_tokens
if canonical.reasoning_tokens:
usage_details["reasoning_tokens"] = canonical.reasoning_tokens
cost = estimate_usage_cost(
model,
canonical,
provider=provider,
base_url=base_url,
api_key="",
)
if cost.amount_usd is not None:
# Langfuse cost_details keys must match usage_details keys.
# Provide per-type breakdown so dashboard can show cost by type.
try:
from agent.usage_pricing import get_pricing_entry
from decimal import Decimal
_ONE_M = Decimal("1000000")
entry = get_pricing_entry(model, provider=provider, base_url=base_url)
if entry:
if entry.input_cost_per_million is not None and canonical.input_tokens:
cost_details["input"] = float(Decimal(canonical.input_tokens) * entry.input_cost_per_million / _ONE_M)
if entry.output_cost_per_million is not None and canonical.output_tokens:
cost_details["output"] = float(Decimal(canonical.output_tokens) * entry.output_cost_per_million / _ONE_M)
if entry.cache_read_cost_per_million is not None and canonical.cache_read_tokens:
cost_details["cache_read_input_tokens"] = float(Decimal(canonical.cache_read_tokens) * entry.cache_read_cost_per_million / _ONE_M)
if entry.cache_write_cost_per_million is not None and canonical.cache_write_tokens:
cost_details["cache_creation_input_tokens"] = float(Decimal(canonical.cache_write_tokens) * entry.cache_write_cost_per_million / _ONE_M)
else:
cost_details["total"] = float(cost.amount_usd)
except Exception:
cost_details["total"] = float(cost.amount_usd)
except Exception as exc: # pragma: no cover - fail-open
_debug(f"usage normalization failed: {exc}")
return usage_details, cost_details
def _start_root_trace(task_key: str, *, task_id: str, session_id: str, platform: str, provider: str, model: str,
api_mode: str, messages: Any, client: Langfuse) -> TraceState:
trace_id = client.create_trace_id(seed=f"{session_id or 'sessionless'}::{task_id or task_key}")
trace_input = _extract_last_user_message(messages)
metadata = {
"source": "hermes",
"task_id": task_id,
"platform": platform,
"provider": provider,
"model": model,
"api_mode": api_mode,
}
# session_id must be passed in trace_context for Langfuse session grouping.
trace_ctx: Dict[str, Any] = {"trace_id": trace_id}
if session_id:
trace_ctx["session_id"] = session_id
if propagate_attributes is not None:
try:
with propagate_attributes(
session_id=session_id or task_key,
trace_name="Hermes turn",
tags=["hermes", "langfuse"],
):
root_ctx = client.start_as_current_observation(
trace_context=trace_ctx,
name="Hermes turn",
as_type="chain",
input=trace_input,
metadata=metadata,
end_on_exit=False,
)
root_span = root_ctx.__enter__()
except Exception:
root_ctx = client.start_as_current_observation(
trace_context=trace_ctx,
name="Hermes turn",
as_type="chain",
input=trace_input,
metadata=metadata,
end_on_exit=False,
)
root_span = root_ctx.__enter__()
else:
root_ctx = client.start_as_current_observation(
trace_context=trace_ctx,
name="Hermes turn",
as_type="chain",
input=trace_input,
metadata=metadata,
end_on_exit=False,
)
root_span = root_ctx.__enter__()
try:
root_span.set_trace_io(input=trace_input)
except Exception:
pass
_debug(f"started trace {trace_id} for {task_key}")
return TraceState(trace_id=trace_id, root_ctx=root_ctx, root_span=root_span)
def _start_child_observation(state: TraceState, *, client: Langfuse, name: str, as_type: str,
input_value: Any, metadata: Optional[dict] = None,
model: Optional[str] = None, model_parameters: Optional[dict] = None) -> Any:
return state.root_span.start_observation(
name=name,
as_type=as_type,
input=input_value,
metadata=metadata or {},
model=model,
model_parameters=model_parameters,
)
def _end_observation(observation: Any, *, output: Any = None, metadata: Optional[dict] = None,
usage_details: Optional[dict] = None, cost_details: Optional[dict] = None) -> None:
if observation is None:
return
try:
update_kwargs: Dict[str, Any] = {}
if output is not None:
update_kwargs["output"] = output
if metadata:
update_kwargs["metadata"] = metadata
if usage_details:
update_kwargs["usage_details"] = usage_details
if cost_details:
update_kwargs["cost_details"] = cost_details
if update_kwargs:
observation.update(**update_kwargs)
observation.end()
except Exception as exc: # pragma: no cover - fail-open
_debug(f"end observation failed: {exc}")
def _merge_trace_output(output: Any, state: TraceState) -> Any:
if not state.turn_tool_calls:
return output
merged = dict(output) if isinstance(output, dict) else {"content": output}
merged["tool_calls"] = list(state.turn_tool_calls)
return merged
def _finish_trace(task_key: str, *, output: Any = None) -> None:
client = _get_langfuse()
if client is None:
return
with _STATE_LOCK:
state = _TRACE_STATE.pop(task_key, None)
if state is None:
return
try:
for observation in state.generations.values():
_end_observation(observation)
for observation in state.tools.values():
_end_observation(observation)
final_output = _merge_trace_output(output, state)
if final_output is not None:
state.root_span.set_trace_io(output=final_output)
state.root_span.update(output=final_output)
state.root_span.end()
except Exception as exc: # pragma: no cover - fail-open
_debug(f"finish trace failed: {exc}")
finally:
try:
client.flush()
except Exception:
pass
def _assistant_has_tool_calls(message: Any) -> bool:
return bool(getattr(message, "tool_calls", None))
def _request_key(api_call_count: Any) -> str:
return str(api_call_count or 0)
def on_pre_llm_call(*, task_id: str = "", session_id: str = "", platform: str = "", model: str = "",
provider: str = "", base_url: str = "", api_mode: str = "",
api_call_count: int = 0, messages: Any = None, turn_type: str = "user",
conversation_history: Any = None, user_message: Any = None, **_: Any) -> None:
# Older Hermes branches used pre_llm_call for request-scoped tracing and
# passed the actual API messages. Current Hermes also has a turn-scoped
# pre_llm_call used for context injection; tracing that hook creates an
# extra orphan/root trace before the real request trace. Only trace the
# legacy request-shaped call here.
if not isinstance(messages, list):
return
client = _get_langfuse()
if client is None:
return
# messages is a list only for legacy Hermes branches that fired
# pre_llm_call with API messages directly. Current Hermes fires
# pre_llm_call for context injection (conversation_history/user_message,
# no messages list) — tracing that would create orphan traces.
task_key = _trace_key(task_id, session_id)
with _STATE_LOCK:
state = _TRACE_STATE.get(task_key)
if state is None:
state = _start_root_trace(
task_key,
task_id=task_id,
session_id=session_id,
platform=platform,
provider=provider,
model=model,
api_mode=api_mode,
messages=messages,
client=client,
)
_TRACE_STATE[task_key] = state
state.last_updated_at = time.time()
def on_pre_llm_request(
*,
task_id: str = "",
session_id: str = "",
platform: str = "",
model: str = "",
provider: str = "",
base_url: str = "",
api_mode: str = "",
api_call_count: int = 0,
messages: Any = None,
turn_type: str = "user",
message_count: int = 0,
tool_count: int = 0,
approx_input_tokens: int = 0,
request_char_count: int = 0,
max_tokens: Any = None,
**_: Any,
) -> None:
client = _get_langfuse()
if client is None:
return
task_key = _trace_key(task_id, session_id)
req_key = _request_key(api_call_count)
with _STATE_LOCK:
state = _TRACE_STATE.get(task_key)
if state is None:
state = _start_root_trace(
task_key,
task_id=task_id,
session_id=session_id,
platform=platform,
provider=provider,
model=model,
api_mode=api_mode,
messages=messages,
client=client,
)
_TRACE_STATE[task_key] = state
state.last_updated_at = time.time()
previous = state.generations.pop(req_key, None)
if previous is not None:
_end_observation(previous)
state.generations[req_key] = _start_child_observation(
state,
client=client,
name=f"LLM call {api_call_count}",
as_type="generation",
input_value=_serialize_messages(messages),
metadata={
"provider": provider,
"platform": platform,
"api_mode": api_mode,
"base_url": base_url,
},
model=model,
model_parameters={"api_mode": api_mode, "provider": provider},
)
def on_post_llm_call(*, task_id: str = "", session_id: str = "", provider: str = "", base_url: str = "",
api_mode: str = "", model: str = "", api_call_count: int = 0,
assistant_message: Any = None, response: Any = None,
api_duration: float = 0.0, finish_reason: str = "",
usage: Any = None, assistant_content_chars: int = 0,
assistant_tool_call_count: int = 0, assistant_response: Any = None,
**_: Any) -> None:
client = _get_langfuse()
if client is None:
return
task_key = _trace_key(task_id, session_id)
req_key = _request_key(api_call_count)
with _STATE_LOCK:
state = _TRACE_STATE.get(task_key)
generation = state.generations.pop(req_key, None) if state else None
if state is None or generation is None:
return
# Handle both call patterns:
# 1. post_api_request: passes usage (dict), assistant_content_chars, assistant_tool_call_count
# 2. post_llm_call: passes assistant_message (object), response (object), assistant_response (str)
if assistant_message is not None:
output = _serialize_assistant_message(assistant_message)
elif assistant_response is not None:
# post_llm_call passes assistant_response as a plain string
output = {"content": _safe_value(assistant_response), "reasoning": None, "tool_calls": []}
else:
# post_api_request path — reconstruct from summary kwargs
output = {
"content": f"[{assistant_content_chars} chars]" if assistant_content_chars else None,
"reasoning": None,
"tool_calls": [{"id": f"tc_{i}"} for i in range(assistant_tool_call_count)] if assistant_tool_call_count else [],
}
if output.get("tool_calls"):
state.turn_tool_calls.extend(output["tool_calls"])
# Extract usage: prefer response object, fall back to usage dict from post_api_request
if response is not None:
usage_details, cost_details = _usage_and_cost(
response,
provider=provider,
api_mode=api_mode,
model=model,
base_url=base_url,
)
elif isinstance(usage, dict) and usage:
# post_api_request passes a pre-built CanonicalUsage summary dict.
# Use Langfuse-convention key names: "input", "output", and
# "cache_read_input_tokens" / "cache_creation_input_tokens" so the
# dashboard sums cache tokens into the input total automatically.
_input = usage.get("input_tokens", 0)
_output = usage.get("output_tokens", 0) or usage.get("completion_tokens", 0)
_cache_read = usage.get("cache_read_tokens", 0)
_cache_write = usage.get("cache_write_tokens", 0)
_reasoning = usage.get("reasoning_tokens", 0)
usage_details = {
"input": _input,
"output": _output,
}
if _cache_read:
usage_details["cache_read_input_tokens"] = _cache_read
if _cache_write:
usage_details["cache_creation_input_tokens"] = _cache_write
if _reasoning:
usage_details["reasoning_tokens"] = _reasoning
cost_details = {}
# Estimate per-type cost from the summary if possible
try:
from agent.usage_pricing import CanonicalUsage, estimate_usage_cost, get_pricing_entry
from decimal import Decimal
_ONE_M = Decimal("1000000")
_cu = CanonicalUsage(
input_tokens=_input,
output_tokens=_output,
cache_read_tokens=_cache_read,
cache_write_tokens=_cache_write,
reasoning_tokens=_reasoning,
)
entry = get_pricing_entry(model, provider=provider, base_url=base_url)
if entry:
if entry.input_cost_per_million is not None and _input:
cost_details["input"] = float(Decimal(_input) * entry.input_cost_per_million / _ONE_M)
if entry.output_cost_per_million is not None and _output:
cost_details["output"] = float(Decimal(_output) * entry.output_cost_per_million / _ONE_M)
if entry.cache_read_cost_per_million is not None and _cache_read:
cost_details["cache_read_input_tokens"] = float(Decimal(_cache_read) * entry.cache_read_cost_per_million / _ONE_M)
if entry.cache_write_cost_per_million is not None and _cache_write:
cost_details["cache_creation_input_tokens"] = float(Decimal(_cache_write) * entry.cache_write_cost_per_million / _ONE_M)
else:
_cost = estimate_usage_cost(model, _cu, provider=provider, base_url=base_url, api_key="")
if _cost.amount_usd is not None:
cost_details["total"] = float(_cost.amount_usd)
except Exception:
pass
else:
usage_details, cost_details = {}, {}
tool_count = len(output.get("tool_calls", [])) or assistant_tool_call_count
gen_metadata: Dict[str, Any] = {"tool_call_count": tool_count}
if api_duration and api_duration > 0:
gen_metadata["api_duration_s"] = round(api_duration, 3)
if finish_reason:
gen_metadata["finish_reason"] = finish_reason
_end_observation(
generation,
output=output,
usage_details=usage_details,
cost_details=cost_details,
metadata=gen_metadata,
)
has_tools = _assistant_has_tool_calls(assistant_message) if assistant_message else (assistant_tool_call_count > 0)
has_content = bool(output.get("content"))
if not has_tools and has_content:
_finish_trace(task_key, output=output)
def on_pre_tool_call(*, tool_name: str = "", args: Any = None, task_id: str = "",
session_id: str = "", tool_call_id: str = "", **_: Any) -> None:
client = _get_langfuse()
if client is None:
return
task_key = _trace_key(task_id, session_id)
tool_key = tool_call_id or f"{tool_name}:{time.time_ns()}"
with _STATE_LOCK:
state = _TRACE_STATE.get(task_key)
if state is None:
return
state.tools[tool_key] = _start_child_observation(
state,
client=client,
name=f"Tool: {tool_name}",
as_type="tool",
input_value=_safe_value(args),
metadata={"tool_name": tool_name, "tool_call_id": tool_call_id},
)
def on_post_tool_call(*, tool_name: str = "", args: Any = None, result: Any = None,
task_id: str = "", session_id: str = "", tool_call_id: str = "", **_: Any) -> None:
task_key = _trace_key(task_id, session_id)
tool_key = tool_call_id or ""
observation = None
with _STATE_LOCK:
state = _TRACE_STATE.get(task_key)
if state is None:
return
if tool_key:
observation = state.tools.pop(tool_key, None)
elif state.tools:
_, observation = state.tools.popitem()
if observation is None:
return
if isinstance(result, str):
result_value = _maybe_parse_json_string(result)
else:
result_value = result
result_value = _normalize_payload(result_value, tool_name=tool_name, args=args)
_end_observation(
observation,
output=_safe_value(result_value, parse_json_strings=True),
metadata={"tool_name": tool_name, "args": _safe_value(args, parse_json_strings=True)},
)
def register(ctx) -> None:
    # Register for both hook name variants so the plugin works across
    # Hermes versions. pre_api_request / post_api_request fire per API
    # call (preferred); pre_llm_call / post_llm_call fire once per turn.
    ctx.register_hook("pre_api_request", on_pre_llm_request)
    ctx.register_hook("post_api_request", on_post_llm_call)
    ctx.register_hook("pre_llm_call", on_pre_llm_call)
    ctx.register_hook("post_llm_call", on_post_llm_call)
    ctx.register_hook("pre_tool_call", on_pre_tool_call)
    ctx.register_hook("post_tool_call", on_post_tool_call)
@@ -0,0 +1,38 @@
# After installing langfuse
Langfuse tracing is now installed and enabled for your Hermes profile.
## Required credentials
Set these in `~/.hermes/.env` (or via `hermes tools` → Langfuse Observability):
```bash
HERMES_LANGFUSE_PUBLIC_KEY=pk-lf-...
HERMES_LANGFUSE_SECRET_KEY=sk-lf-...
HERMES_LANGFUSE_BASE_URL=https://cloud.langfuse.com # or your self-hosted URL
```
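Before starting Hermes it can help to confirm that the keys are actually visible to the process. The following is a minimal sketch, assuming the variables are exported into the environment (the plugin may read `~/.hermes/.env` through its own loader). `missing_langfuse_env` is a hypothetical helper, not part of the plugin, and it checks only the public and secret keys on the assumption that the base URL has a default.

```python
import os

# Assumption: only the public and secret keys are strictly required.
REQUIRED = ("HERMES_LANGFUSE_PUBLIC_KEY", "HERMES_LANGFUSE_SECRET_KEY")


def missing_langfuse_env(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not (env.get(name) or "").strip()]
```

An empty list means both keys are set; anything else names the variables still to add.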
## Verify
```bash
hermes plugins list # langfuse should appear as enabled
hermes chat -q "hello" # then check Langfuse for a "Hermes turn" trace
```
## Optional settings
```bash
HERMES_LANGFUSE_ENV=production # environment tag
HERMES_LANGFUSE_RELEASE=v1.0.0 # release tag
HERMES_LANGFUSE_SAMPLE_RATE=0.5 # sample 50% of traces
HERMES_LANGFUSE_MAX_CHARS=12000 # max chars per field (default: 12000)
HERMES_LANGFUSE_DEBUG=true # verbose plugin logging
```
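To make the sampling and truncation settings concrete, here is a rough sketch of how values like these are commonly interpreted. It is an illustration under assumptions, not the plugin's code: the plugin may simply hand the sample rate to the Langfuse SDK, and `load_float`, `should_trace`, and `clip` are hypothetical names.

```python
import random


def load_float(env, name, default):
    """Parse a float setting, falling back to `default` on missing or bad input."""
    try:
        return float(env.get(name, default))
    except (TypeError, ValueError):
        return default


def should_trace(sample_rate, rng=random.random):
    """Keep roughly `sample_rate` of traces; the rate is clamped to [0, 1]."""
    rate = min(1.0, max(0.0, sample_rate))
    return rng() < rate


def clip(text, max_chars=12000):
    """Truncate a field to at most `max_chars` characters."""
    return text if len(text) <= max_chars else text[:max_chars]
```

With `HERMES_LANGFUSE_SAMPLE_RATE=0.5`, `should_trace(0.5)` would keep about half of all turns, and `clip` would cap any single field at the configured character budget.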
## Dependencies
The `langfuse` Python SDK is required. Install it into your Hermes venv:
```bash
pip install langfuse
```
@@ -0,0 +1,14 @@
name: langfuse
version: "1.0.0"
description: "Optional Langfuse observability for Hermes — traces conversations, LLM calls, and tool usage. Install via: hermes plugins install official/observability/langfuse"
author: NousResearch
requires_env:
- HERMES_LANGFUSE_PUBLIC_KEY
- HERMES_LANGFUSE_SECRET_KEY
hooks:
- pre_api_request
- post_api_request
- pre_llm_call
- post_llm_call
- pre_tool_call
- post_tool_call
@@ -0,0 +1,131 @@
# google_meet plugin
Let the hermes agent join a Google Meet call, transcribe it, optionally speak
in it, and do the followup work afterwards.
## What ships
| Version | What | Status |
|---|---|---|
| v1 | Transcribe-only: Playwright joins Meet, scrapes captions to transcript file | ✓ ships by default |
| v2 | Realtime duplex audio: bot speaks in-call via OpenAI Realtime + BlackHole/PulseAudio null-sink | ✓ opt in with `mode='realtime'` |
| v3 | Remote node host: run the bot on a different machine than the gateway | ✓ opt in with `node='<name>'` |
## Architecture
```
┌─ gateway (Linux box, where hermes runs) ─────────────────────────────┐
│                                                                      │
│ agent → meet_join(url, mode='realtime', node='my-mac')               │
│         │                                                            │
│         └─ NodeClient ─── ws ────┐                                   │
│                                  │                                   │
└──────────────────────────────────┼───────────────────────────────────┘
                                   │ wss (token auth)
┌─ node host (user's Mac, signed-in Chrome lives here) ────────────────┐
│                                                                      │
│ NodeServer (from `hermes meet node run`)                             │
│   │                                                                  │
│   ├─ start_bot → process_manager.start() → spawns meet_bot           │
│   │                                                                  │
│   └─ meet_bot (Playwright)                                           │
│        ├─ Chromium → meet.google.com                                 │
│        ├─ caption scraper → transcript.txt                           │
│        └─ (realtime mode only) RealtimeSpeaker thread                │
│                                 ↓                                    │
│                        OpenAI Realtime WS → speaker.pcm              │
│                                 ↓                                    │
│                        paplay → null-sink ← Chrome fake mic          │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
```
Without v3: everything in the node host box runs on the gateway machine.
Without v2: the "realtime" path is skipped; transcribe runs alone.
## Files
| Path | Purpose |
|---|---|
| `plugin.yaml` | manifest |
| `__init__.py` | `register(ctx)` — registers 5 tools + `on_session_end` hook + `hermes meet` CLI |
| `meet_bot.py` | Playwright bot subprocess (standalone, `python -m plugins.google_meet.meet_bot`) |
| `process_manager.py` | local bot lifecycle + `enqueue_say` |
| `tools.py` | agent-facing tools + node-routing helper |
| `cli.py` | `hermes meet setup / auth / join / status / transcript / say / stop / node ...` |
| `audio_bridge.py` | v2: PulseAudio null-sink (Linux) + BlackHole probe (macOS) |
| `realtime/openai_client.py` | v2: `RealtimeSession` + `RealtimeSpeaker` (file-queue → OpenAI Realtime WS → PCM) |
| `node/protocol.py` | v3: message envelope + validation |
| `node/registry.py` | v3: `$HERMES_HOME/workspace/meetings/nodes.json` |
| `node/server.py` | v3: `NodeServer` (runs on host machine) |
| `node/client.py` | v3: `NodeClient` (used by tool handlers + CLI on gateway) |
| `node/cli.py` | v3: `hermes meet node {run,list,approve,remove,status,ping}` |
| `SKILL.md` | agent usage guide |
## Local quick start
```bash
hermes plugins enable google_meet
hermes meet install # pip + Chromium
hermes meet setup # preflight
hermes meet auth # optional
hermes meet join https://meet.google.com/abc-defg-hij # transcribe
```
## Realtime mode
Linux (preferred, most automated):
```bash
hermes meet install --realtime # installs pulseaudio-utils
echo 'OPENAI_API_KEY=sk-...' >> ~/.hermes/.env
hermes meet join https://meet.google.com/abc-defg-hij --mode realtime
# then from the agent or CLI:
hermes meet say "Good morning everyone, I'm the note-taker bot."
```
macOS:
```bash
hermes meet install --realtime # runs: brew install blackhole-2ch ffmpeg
# then — manually! — open System Settings → Sound → Input → BlackHole 2ch
echo 'OPENAI_API_KEY=sk-...' >> ~/.hermes/.env
hermes meet join https://meet.google.com/abc-defg-hij --mode realtime
```
On macOS, hermes will **not** switch your system audio input automatically — the
user has to do it. This is deliberate: switching default input on a whim would
be a surprising side effect.
## Remote node host
On the node machine (e.g. user's Mac with a signed-in Chrome):
```bash
pip install playwright websockets
python -m playwright install chromium
hermes plugins enable google_meet
hermes meet node run --display-name my-mac --host 0.0.0.0 --port 18789
# prints the bearer token on first run; copy it
```
On the gateway:
```bash
hermes meet node approve my-mac ws://<mac-ip>:18789 <token>
hermes meet node ping my-mac
# now any meet_* tool call accepts node='my-mac' (or 'auto')
```
## Safety
- URL gate: only `https://meet.google.com/abc-defg-hij`, `/new`, `/lookup/<id>`.
- No calendar scanning, no auto-dial, no auto-consent announcement.
- Node server uses bearer-token auth; no key exchange, no TLS termination
built in — run it on a LAN or behind a reverse proxy you trust.
- One active meeting per (gateway, node) pair. A second `meet_join` leaves the first.
- `meet_say` refuses unless the active meeting was started with `mode='realtime'`.
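The URL gate described above can be sketched roughly as follows. This is an illustration under assumptions, not the plugin's `_is_safe_meet_url`: the name `is_allowed_meet_url`, the three-four-three lowercase meeting-code pattern, and the characters allowed in a lookup id are all guesses made for the example.

```python
import re
from urllib.parse import urlsplit

# Assumed shapes: a three-four-three letter meeting code, or /lookup/<id>.
_CODE = re.compile(r"/[a-z]{3}-[a-z]{4}-[a-z]{3}")
_LOOKUP = re.compile(r"/lookup/[A-Za-z0-9_-]+")


def is_allowed_meet_url(url):
    """Accept only https://meet.google.com/<code>, /new, or /lookup/<id>."""
    parts = urlsplit(url)
    # Exact netloc match rejects look-alike hosts, ports, and userinfo.
    if parts.scheme != "https" or parts.netloc != "meet.google.com":
        return False
    path = parts.path.rstrip("/")
    if path == "/new":
        return True
    return bool(_CODE.fullmatch(path) or _LOOKUP.fullmatch(path))
```

Matching on the parsed host rather than a substring is the important design choice here: a prefix check alone would accept hosts such as `meet.google.com.evil.example`.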
## Out of scope
- **Calendar scanning** — deliberately not implemented. Join URLs must be explicit.
- **Multi-tenant node sharing** — a node serves one gateway at a time.
- **Windows** — audio bridging isn't tested; `register()` no-ops on Windows.
- **System audio input switching on macOS** — user responsibility, not the bot's.
@@ -0,0 +1,148 @@
---
name: google_meet
description: Join a Google Meet call, transcribe live captions, optionally speak in realtime, and do the followup work afterwards. Use when the user asks the agent to sit in on a meeting, take notes, summarize, respond in-call, or extract action items from it.
version: 0.2.0
platforms:
- linux
- macos
metadata:
hermes:
tags: [meetings, google-meet, transcription, realtime-voice]
---
# google_meet
## When to use
The user says any of:
- "join my Meet at <url>"
- "take notes on this meeting"
- "summarize the meeting and send followups"
- "sit in on my standup"
- "be a bot in this call and speak up when X"
## Two modes
| Mode | What the bot does |
|---|---|
| `transcribe` (default) | Joins, enables captions, scrapes a transcript. Listen-only. |
| `realtime` | Same as transcribe PLUS speaks into the meeting via OpenAI Realtime. The agent calls `meet_say(text)` and the bot's voice comes out of the call. |
Pick `realtime` only when the user actually wants the agent to speak. It costs real money (OpenAI Realtime is pay-per-audio-minute) and requires a virtual audio device set up on the machine running the bot.
## Two locations
| Location | When |
|---|---|
| Local (default) | Gateway machine runs the Playwright bot directly. |
| Remote node (`node="<name>"`) | Bot runs on a different machine that has a signed-in Chrome and (for realtime) a configured audio bridge. Useful when the gateway runs on a headless Linux box but the user's real signed-in Chrome lives on their Mac. |
## Prerequisites the user must handle once
Easiest path — run the built-in installer:
```bash
hermes plugins enable google_meet
hermes meet install # pip deps + Chromium (transcribe only)
hermes meet install --realtime # + pulseaudio-utils / brew blackhole+ffmpeg
hermes meet auth # optional; skips guest-lobby wait
hermes meet setup # preflight checks
```
`hermes meet install --realtime` prompts before running `sudo apt-get` (Linux)
or `brew install` (macOS). Pass `--yes` to skip the prompt. It will NOT touch
your macOS default-input setting — you have to select BlackHole 2ch in
System Settings yourself before starting a realtime meeting.
Or do it manually:
```bash
pip install playwright websockets && python -m playwright install chromium
# For realtime mode, additionally:
# Linux: sudo apt install pulseaudio-utils
# macOS: brew install blackhole-2ch ffmpeg
# → System Settings → Sound → Input → BlackHole 2ch
# Then set OPENAI_API_KEY or HERMES_MEET_REALTIME_KEY in ~/.hermes/.env
```
For a remote node:
```bash
# on the user's Mac (where Chrome is signed in):
pip install playwright websockets && python -m playwright install chromium
hermes plugins enable google_meet
hermes meet node run --display-name my-mac # persistent server
# copy the printed token
# on the gateway:
hermes meet node approve my-mac ws://<mac-ip>:18789 <token>
hermes meet node ping my-mac # confirm reachable
```
Run `hermes meet setup` to preflight local prereqs.
## Flow
1. **Join**: call `meet_join(url=..., mode=..., node=...)`. Returns immediately.
2. **Announce yourself**: no auto-consent. Say (in whatever channel the user is watching): "A Hermes agent bot is in this call taking notes."
3. **Poll**: `meet_status()` for liveness, `meet_transcript(last=20)` for recent captions. Don't re-read the whole transcript every turn.
4. **Speak (realtime only)**: `meet_say(text="...")` queues text for TTS. The speech lags by ~2s. Don't spam it.
5. **Leave**: `meet_leave()` when done, or set `duration="30m"` on `meet_join` for auto-leave.
6. **Follow up**: read `meet_transcript()` in full, summarize, and use regular tools to send the recap, file issues, schedule followups.
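The `duration` value in step 5 uses short strings such as `30m`, `2h`, or `90s`. A minimal sketch of how such a string could be turned into seconds is shown below; `parse_duration` is a hypothetical helper, and the plugin's own parser may accept other forms.

```python
import re

_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text):
    """Convert strings such as '90s', '30m', or '2h' to seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([smh])\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]
```

Rejecting anything that does not match, rather than guessing a unit, keeps a typo from silently turning into a very long or very short meeting.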
## Tool reference
| Tool | Parameters | Use |
|---|---|---|
| `meet_join` | `url`, `mode?`, `guest_name?`, `duration?`, `headed?`, `node?` | Start bot |
| `meet_status` | `node?` | Liveness + progress |
| `meet_transcript` | `last?`, `node?` | Read captions |
| `meet_leave` | `node?` | Close bot |
| `meet_say` | `text`, `node?` | Speak in realtime meeting |
`node?` on all tools: pass a registered node name (or `"auto"` for the sole node) to operate a remote bot instead of a local one. Omit for local.
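The routing rule for `node` can be sketched as a small resolver. This is an assumption-laden illustration: `resolve_node` and the registry shape (a dict of node name to connection details) are hypothetical, and the real helper in `tools.py` may behave differently.

```python
def resolve_node(node, registry):
    """Map the `node` argument to a registry entry, or None for a local bot."""
    if node is None:
        return None  # omitted: run the bot locally on the gateway
    if node == "auto":
        # "auto" is only unambiguous when exactly one node is registered.
        if len(registry) != 1:
            raise ValueError("'auto' needs exactly one registered node")
        return next(iter(registry.values()))
    try:
        return registry[node]
    except KeyError:
        raise ValueError(f"unknown node: {node!r}") from None
```

For example, with a registry of `{"my-mac": {...}}`, both `"my-mac"` and `"auto"` resolve to the same entry, while `None` keeps the call local.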
## Important limits
- Captions are only as good as Google Meet's live captions. English-biased, lossy on overlapping speakers.
- Guest mode sits in the lobby until a host admits. Warn the user; `hermes meet auth` avoids this.
- **Lobby timeout**: if the host doesn't admit the bot within 5 minutes (configurable via `HERMES_MEET_LOBBY_TIMEOUT` env), the bot leaves and `meet_status` reports `leaveReason: "lobby_timeout"`.
- **One active meeting per install per location.** A second `meet_join` leaves the first.
- **Windows not supported.**
- Realtime mode needs a virtual audio device. If the audio bridge setup fails, the bot falls back to transcribe mode and flags it in `meet_status().error`.
- `meet_say` requires `mode='realtime'` on the originating `meet_join`. Calling it against a transcribe-mode meeting returns a clear error.
- **Barge-in is best-effort.** When a caption arrives attributed to a real participant while the bot is generating audio, the bot sends `response.cancel` to OpenAI Realtime. Captions take ~500ms to show up, so the bot will talk over the first second or so of a human interruption.
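The lobby timeout can be pictured as a simple poll-until-deadline loop. The sketch below is illustrative only: `wait_for_admission` and its parameters are hypothetical, and the 300-second default merely mirrors the 5 minutes stated above.

```python
import time


def wait_for_admission(is_admitted, timeout_s=300.0, poll_s=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll `is_admitted()` until it returns True or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if is_admitted():
            return "joined"
        if clock() >= deadline:
            return "lobby_timeout"  # same string as the documented leaveReason
        sleep(poll_s)
```

Using a monotonic clock means a system time change during the wait cannot shorten or extend the timeout.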
## Status dict reference
`meet_status()` returns a dict. The table shows a subset of its keys:
| Key | Meaning |
|---|---|
| `inCall` | Past the lobby. False while waiting for admission. |
| `lobbyWaiting` | Clicked "Ask to join", waiting on host. |
| `joinAttemptedAt` / `joinedAt` | Timestamps for lobby-click and actual admission. |
| `captioning` | Caption observer is installed. |
| `transcriptLines` / `lastCaptionAt` | Transcript progress. |
| `realtime` / `realtimeReady` | Realtime mode provisioned / WS connected. |
| `realtimeDevice` | Audio device name the bot is feeding (e.g. `hermes_meet_src`). |
| `audioBytesOut` / `lastAudioOutAt` | How much PCM the OpenAI session has produced. |
| `lastBargeInAt` | Timestamp of the most recent `response.cancel` sent. |
| `leaveReason` | `duration_expired`, `lobby_timeout`, `denied`, `page_closed`, or null. |
| `error` | Last error (soft — bot may still be running). |
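As a rough guide to reading these keys together, the sketch below turns a status dict into a one-line summary. `describe_status` is a hypothetical helper, and the precedence it applies (leave reason first, then in-call, then lobby) is an assumption about how the fields combine.

```python
def describe_status(status):
    """Summarize a meet_status()-style dict in one short line."""
    if status.get("leaveReason"):
        return f"left ({status['leaveReason']})"
    if status.get("inCall"):
        mode = "realtime" if status.get("realtime") else "transcribe"
        lines = status.get("transcriptLines", 0)
        return f"in call, {mode}, {lines} transcript lines"
    if status.get("lobbyWaiting"):
        return "waiting in lobby"
    return "not in call"
```

Because `error` is documented as soft, the sketch deliberately ignores it when deciding whether the bot is still in the call.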
## Transcript location
Local:
```
$HERMES_HOME/workspace/meetings/<meeting-id>/transcript.txt
```
Remote node: transcript lives on the node host's disk. Use `meet_transcript(node=...)` to read it over RPC.
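For a local meeting, the last few captions can also be read straight from disk. This is a minimal sketch assuming the path layout shown above; `transcript_path` and `tail_transcript` are hypothetical helpers, and the `~/.hermes` fallback when `HERMES_HOME` is unset is an assumption.

```python
import os
from collections import deque
from pathlib import Path


def transcript_path(meeting_id, hermes_home=None):
    """Build the local transcript path for a meeting id."""
    # Assumption: fall back to ~/.hermes when HERMES_HOME is not set.
    home = hermes_home or os.environ.get("HERMES_HOME") or Path.home() / ".hermes"
    return Path(home) / "workspace" / "meetings" / meeting_id / "transcript.txt"


def tail_transcript(path, last=20):
    """Return the last `last` lines without loading the whole file into a list."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=last)]
```

The bounded `deque` keeps memory flat on long meetings, which matches the advice not to re-read the whole transcript every turn.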
## Safety
- URL regex: only `https://meet.google.com/...` URLs pass.
- No calendar scanning. No auto-dial.
- Remote nodes use bearer-token auth; tokens are generated on the node (32 hex chars, persisted in `$HERMES_HOME/workspace/meetings/node_token.json`) and must be copied to the gateway via `hermes meet node approve`.
- `meet_say` text is rate-limited by the OpenAI Realtime session. Spam protection is the bot's job, but avoid queueing hundreds of lines anyway.
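The bearer-token scheme for remote nodes can be sketched as follows. This is an illustration, not the node server's code: `load_or_create_token`, `token_matches`, and the `{"token": ...}` JSON layout are assumptions; only the 32-hex-character length and the persisted-file idea come from the text above.

```python
import hmac
import json
import secrets
from pathlib import Path


def load_or_create_token(path):
    """Return the persisted token, generating one on first run."""
    path = Path(path)
    if path.is_file():
        return json.loads(path.read_text(encoding="utf-8"))["token"]
    token = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"token": token}), encoding="utf-8")
    return token


def token_matches(expected, presented):
    """Compare tokens in constant time to avoid leaking prefix matches."""
    return hmac.compare_digest(expected.encode(), presented.encode())
```

Since the server has no built-in TLS, the token travels in the clear unless the connection is wrapped by a trusted proxy, which is why the LAN or reverse-proxy advice matters.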
@@ -0,0 +1,103 @@
"""google_meet plugin — let the agent join a Meet call, transcribe it, follow up.
v1: transcribe-only. Spawns a headless Chromium via Playwright, joins the Meet
URL, enables live captions, scrapes them into a transcript file. The agent then
has the transcript in its workspace and can do whatever followup work it needs
using its regular tools.
v2: realtime duplex audio so the agent can speak in the meeting, via OpenAI
Realtime + BlackHole / PulseAudio null-sink. Opt in with ``mode='realtime'``;
``meet_say`` refuses unless the active meeting was started that way.
Explicit-by-design: only joins ``https://meet.google.com/`` URLs explicitly
passed in. No calendar scanning, no auto-dial, no consent announcement.
"""
from __future__ import annotations
import logging
import platform
from plugins.google_meet import process_manager as pm
from plugins.google_meet.cli import register_cli as _register_meet_cli
from plugins.google_meet.cli import meet_command as _meet_command
from plugins.google_meet.tools import (
MEET_JOIN_SCHEMA,
MEET_LEAVE_SCHEMA,
MEET_SAY_SCHEMA,
MEET_STATUS_SCHEMA,
MEET_TRANSCRIPT_SCHEMA,
check_meet_requirements,
handle_meet_join,
handle_meet_leave,
handle_meet_say,
handle_meet_status,
handle_meet_transcript,
)
logger = logging.getLogger(__name__)
_TOOLS = (
("meet_join", MEET_JOIN_SCHEMA, handle_meet_join, "📞"),
("meet_status", MEET_STATUS_SCHEMA, handle_meet_status, "🟢"),
("meet_transcript", MEET_TRANSCRIPT_SCHEMA, handle_meet_transcript, "📝"),
("meet_leave", MEET_LEAVE_SCHEMA, handle_meet_leave, "👋"),
("meet_say", MEET_SAY_SCHEMA, handle_meet_say, "🗣️"),
)
def _on_session_end(**kwargs) -> None:
    """Best-effort cleanup: if a meet bot is still running when the session
    ends, leave the call so we don't orphan a headless Chromium.

    No-ops when nothing is active. Swallows all exceptions: session end must
    not fail because the bot cleanup hit an edge case.
    """
    try:
        status = pm.status()
        if status.get("ok") and status.get("alive"):
            pm.stop(reason="session ended")
    except Exception as e:  # pragma: no cover (defensive)
        logger.debug("google_meet on_session_end cleanup failed: %s", e)
def register(ctx) -> None:
"""Register tools, CLI, and lifecycle hooks.
Called once by the plugin loader when the plugin is enabled via
``plugins.enabled`` in config.yaml.
"""
# Windows is not supported in v1 — audio routing for v2 doesn't have a
# tested path there and guest-join Chromium is flakier. Refuse to register
# rather than half-working.
system = platform.system().lower()
if system not in ("linux", "darwin"):
logger.info(
"google_meet plugin: platform=%s not supported (linux/macos only)",
system,
)
return
for name, schema, handler, emoji in _TOOLS:
ctx.register_tool(
name=name,
toolset="google_meet",
schema=schema,
handler=handler,
check_fn=check_meet_requirements,
emoji=emoji,
)
ctx.register_cli_command(
name="meet",
help="Google Meet bot (join, transcribe, follow up)",
setup_fn=_register_meet_cli,
handler_fn=_meet_command,
description=(
"Let the hermes agent join a Google Meet call and scrape live "
"captions into a transcript. See: hermes meet setup"
),
)
ctx.register_hook("on_session_end", _on_session_end)
@@ -0,0 +1,244 @@
"""Virtual audio bridge for feeding generated speech into Chrome's mic.
v2 module. Provisions a platform-specific virtual audio device so the
Meet bot's Chromium instance can be pointed at an input source we
control. The OpenAI Realtime client writes PCM bytes into this device;
Chrome reads them as if they were coming from a microphone.
Linux (primary): uses pactl (PulseAudio) to create a null-sink plus a
virtual source whose master is the null-sink's monitor. Callers set
PULSE_SOURCE=<source_name> in Chrome's env and pass the fake-mic flag.
macOS: requires BlackHole 2ch to be installed. This module only
verifies its presence and returns the device name; routing OS default
input is left to the user (or a future switchaudio-osx integration) to
avoid surprising the user's system audio state.
Windows: not supported in v2.
"""
from __future__ import annotations
import platform
import subprocess
from typing import Optional
_BLACKHOLE_DEVICE = "BlackHole 2ch"
class AudioBridge:
"""Manages a virtual audio device for Chrome fake-mic input.
Call ``setup()`` once before launching the Meet bot and
``teardown()`` when the session ends. ``teardown()`` is idempotent.
"""
def __init__(self, name_prefix: str = "hermes_meet") -> None:
self._name_prefix = name_prefix
self._platform: Optional[str] = None
self._device_name: Optional[str] = None
self._write_target: Optional[str] = None
self._module_ids: list[int] = []
self._torn_down = False
# ── public properties ─────────────────────────────────────────────────
@property
def device_name(self) -> str:
if not self._device_name:
raise RuntimeError("AudioBridge not set up yet")
return self._device_name
@property
def write_target(self) -> str:
if not self._write_target:
raise RuntimeError("AudioBridge not set up yet")
return self._write_target
# ── lifecycle ─────────────────────────────────────────────────────────
def setup(self) -> dict:
"""Provision the virtual audio device.
Returns a dict describing the device. Raises RuntimeError on
unsupported platforms or when required system tools are missing.
"""
system = platform.system()
if system == "Linux":
return self._setup_linux()
if system == "Darwin":
return self._setup_darwin()
if system == "Windows":
raise RuntimeError("windows not supported in v2")
raise RuntimeError(f"unsupported platform: {system}")
def teardown(self) -> None:
"""Release the virtual audio device. Idempotent."""
if self._torn_down:
return
# Only Linux needs explicit unloading.
if self._platform == "linux" and self._module_ids:
# Unload in reverse order (virtual-source before null-sink).
for mod_id in reversed(self._module_ids):
try:
subprocess.run(
["pactl", "unload-module", str(mod_id)],
check=False,
capture_output=True,
)
except Exception:
# Best-effort teardown — never raise from here.
pass
self._module_ids = []
self._torn_down = True
# ── platform impls ────────────────────────────────────────────────────
def _setup_linux(self) -> dict:
sink_name = f"{self._name_prefix}_sink"
src_name = f"{self._name_prefix}_src"
try:
sink_out = subprocess.run(
[
"pactl",
"load-module",
"module-null-sink",
f"sink_name={sink_name}",
"sink_properties=device.description=HermesMeetSink",
],
check=True,
capture_output=True,
text=True,
)
except FileNotFoundError as exc:
raise RuntimeError(
"pactl not found — install PulseAudio/pipewire-pulse"
) from exc
except subprocess.CalledProcessError as exc:
raise RuntimeError(
f"pactl load-module null-sink failed: {exc.stderr or exc}"
) from exc
sink_mod_id = self._parse_module_id(sink_out.stdout)
try:
src_out = subprocess.run(
[
"pactl",
"load-module",
"module-virtual-source",
f"source_name={src_name}",
f"master={sink_name}.monitor",
],
check=True,
capture_output=True,
text=True,
)
except subprocess.CalledProcessError as exc:
# Roll back the null-sink we just created so we don't leak it.
subprocess.run(
["pactl", "unload-module", str(sink_mod_id)],
check=False,
capture_output=True,
)
raise RuntimeError(
f"pactl load-module virtual-source failed: {exc.stderr or exc}"
) from exc
src_mod_id = self._parse_module_id(src_out.stdout)
self._platform = "linux"
self._device_name = src_name
self._write_target = sink_name
self._module_ids = [sink_mod_id, src_mod_id]
self._torn_down = False
return {
"platform": "linux",
"device_name": src_name,
"sample_rate": 48000,
"channels": 2,
"module_ids": list(self._module_ids),
"write_target": sink_name,
}
def _setup_darwin(self) -> dict:
try:
out = subprocess.check_output(
["system_profiler", "SPAudioDataType"],
text=True,
stderr=subprocess.STDOUT,
)
except FileNotFoundError as exc:
raise RuntimeError(
"system_profiler not found (macOS-only command)"
) from exc
except subprocess.CalledProcessError as exc:
raise RuntimeError(
f"system_profiler failed: {exc.output}"
) from exc
if "BlackHole" not in out:
raise RuntimeError(
"BlackHole virtual audio device not installed. "
"Install via: brew install blackhole-2ch"
)
self._platform = "darwin"
self._device_name = _BLACKHOLE_DEVICE
self._write_target = _BLACKHOLE_DEVICE
self._module_ids = []
self._torn_down = False
return {
"platform": "darwin",
"device_name": _BLACKHOLE_DEVICE,
"sample_rate": 48000,
"channels": 2,
"module_ids": [],
"write_target": _BLACKHOLE_DEVICE,
}
# ── helpers ──────────────────────────────────────────────────────────
@staticmethod
def _parse_module_id(stdout: str) -> int:
"""pactl load-module prints the new module ID to stdout."""
text = (stdout or "").strip()
if not text:
raise RuntimeError("pactl load-module returned empty stdout")
# Take the last whitespace-separated token on the first non-empty line.
first = text.splitlines()[0].strip()
token = first.split()[-1]
try:
return int(token)
except ValueError as exc:
raise RuntimeError(
f"could not parse pactl module id from: {stdout!r}"
) from exc
def chrome_fake_audio_flags(bridge_info: dict) -> list[str]:
    """Return Chrome flags for using the fake audio input.

    The PulseAudio source is selected via the ``PULSE_SOURCE`` env var,
    which callers must set in Chrome's environment before launch:

        env["PULSE_SOURCE"] = bridge_info["device_name"]

    On macOS the caller must ensure the system default audio input is
    set to the returned BlackHole device (we do not flip that switch).
    """
    system = platform.system()
    if system == "Linux":
        # Chromium on Linux picks up the PulseAudio source selected via
        # PULSE_SOURCE env var; the fake-ui flag skips the permission
        # prompt so the bot can pick "use my mic" without user input.
        return ["--use-fake-ui-for-media-stream"]
    if system == "Darwin":
        return ["--use-fake-ui-for-media-stream"]
    if system == "Windows":
        raise RuntimeError("windows not supported in v2")
    raise RuntimeError(f"unsupported platform: {system}")
@@ -0,0 +1,478 @@
"""CLI commands for the google_meet plugin.
Wires ``hermes meet <subcommand>``:
  setup       preflight playwright, chromium, auth file, print fixes
  install     install pip deps, Chromium, and (with --realtime) audio tools
  auth        open a browser to sign into Google, save storage state
  join <url>  join a Meet URL synchronously (also callable from the agent)
  status      print current bot state
  transcript  print the transcript
  say <text>  speak text in an active realtime meeting
  stop        leave the current meeting
  node ...    manage remote node hosts
"""
from __future__ import annotations
import argparse
import json
import os
import sys
from pathlib import Path
from typing import Optional
from hermes_constants import get_hermes_home
from plugins.google_meet import process_manager as pm
from plugins.google_meet.meet_bot import _is_safe_meet_url
def _auth_state_path() -> Path:
return Path(get_hermes_home()) / "workspace" / "meetings" / "auth.json"
# ---------------------------------------------------------------------------
# argparse wiring
# ---------------------------------------------------------------------------
def register_cli(subparser: argparse.ArgumentParser) -> None:
"""Build the ``hermes meet`` argparse tree.
Called by :func:`_register_cli_commands` at plugin load time.
"""
subs = subparser.add_subparsers(dest="meet_command")
subs.add_parser("setup", help="Preflight: playwright, chromium, auth")
inst_p = subs.add_parser(
"install",
help="Install prerequisites (pip deps, Chromium, platform audio tools)",
)
inst_p.add_argument(
"--realtime", action="store_true",
help="Also install realtime audio tools (pulseaudio-utils on Linux, BlackHole+ffmpeg on macOS). Uses sudo/brew, prompts before invoking either.",
)
inst_p.add_argument(
"--yes", "-y", action="store_true",
help="Answer yes to all prompts (use with care; will run sudo apt-get or brew without asking).",
)
subs.add_parser("auth", help="Sign in to Google and save session state")
join_p = subs.add_parser("join", help="Join a Meet URL")
join_p.add_argument("url", help="https://meet.google.com/...")
join_p.add_argument("--guest-name", default="Hermes Agent")
join_p.add_argument("--duration", default=None, help="e.g. 30m, 2h, 90s")
join_p.add_argument("--headed", action="store_true", help="show browser")
join_p.add_argument(
"--mode", choices=("transcribe", "realtime"), default="transcribe",
help="transcribe (default, listen-only) or realtime (speak via OpenAI Realtime)"
)
join_p.add_argument(
"--node", default=None,
help="remote node name, or 'auto' to use the sole registered node"
)
subs.add_parser("status", help="Print current Meet bot state")
tr_p = subs.add_parser("transcript", help="Print the scraped transcript")
tr_p.add_argument("--last", type=int, default=None)
say_p = subs.add_parser("say", help="Speak text in an active realtime meeting")
say_p.add_argument("text", help="what to say")
say_p.add_argument("--node", default=None)
subs.add_parser("stop", help="Leave the current meeting")
# v3: remote node host management.
node_p = subs.add_parser(
"node",
help="Manage remote meet node hosts (run/list/approve/remove/status/ping)",
)
try:
from plugins.google_meet.node.cli import register_cli as _register_node_cli
_register_node_cli(node_p)
except Exception as e: # pragma: no cover — defensive
# If the node module fails to import for any reason (optional dep
# missing at import time etc.), leave the subparser present but
# flag it. The argparse dispatch will surface a clear error.
def _node_unavailable(args):
print(f"hermes meet node: module unavailable ({e})")
return 1
node_p.set_defaults(func=_node_unavailable)
subparser.set_defaults(func=meet_command)
# ---------------------------------------------------------------------------
# Dispatch
# ---------------------------------------------------------------------------
def meet_command(args: argparse.Namespace) -> int:
sub = getattr(args, "meet_command", None)
if not sub:
print("usage: hermes meet {setup,install,auth,join,status,transcript,say,stop,node}")
return 2
if sub == "setup":
return _cmd_setup()
if sub == "install":
return _cmd_install(
realtime=bool(getattr(args, "realtime", False)),
assume_yes=bool(getattr(args, "yes", False)),
)
if sub == "auth":
return _cmd_auth()
if sub == "join":
return _cmd_join(
url=args.url,
guest_name=args.guest_name,
duration=args.duration,
headed=args.headed,
mode=getattr(args, "mode", "transcribe"),
node=getattr(args, "node", None),
)
if sub == "status":
return _cmd_status()
if sub == "transcript":
return _cmd_transcript(last=args.last)
if sub == "say":
return _cmd_say(text=args.text, node=getattr(args, "node", None))
if sub == "stop":
return _cmd_stop()
if sub == "node":
# Dispatch was set by the node cli's register_cli; fall through to
# whatever its subparsers wired.
fn = getattr(args, "func", None)
if fn is None or fn is meet_command:
print("usage: hermes meet node {run,list,approve,remove,status,ping}")
return 2
return fn(args)
print(f"unknown subcommand: {sub}")
return 2
# ---------------------------------------------------------------------------
# Subcommand handlers
# ---------------------------------------------------------------------------
def _cmd_setup() -> int:
import platform as _p
print("google_meet preflight")
print("---------------------")
system = _p.system()
system_ok = system in ("Linux", "Darwin")
print(f" platform : {system} [{'ok' if system_ok else 'unsupported'}]")
try:
import playwright # noqa: F401
pw_ok = True
pw_msg = "installed"
except ImportError:
pw_ok = False
pw_msg = "NOT installed — run: pip install playwright"
print(f" playwright : {pw_msg}")
chromium_ok = False
chromium_msg = "unknown"
if pw_ok:
try:
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
try:
exe = p.chromium.executable_path
if exe and Path(exe).exists():
chromium_ok = True
chromium_msg = f"ok ({exe})"
else:
chromium_msg = (
"not installed — run: "
"python -m playwright install chromium"
)
except Exception as e:
chromium_msg = f"probe failed: {e}"
except Exception as e:
chromium_msg = f"probe failed: {e}"
print(f" chromium : {chromium_msg}")
auth_path = _auth_state_path()
auth_ok = auth_path.is_file()
print(
" google auth : "
+ (f"ok ({auth_path})" if auth_ok else "not saved — run: hermes meet auth")
)
print()
all_ok = system_ok and pw_ok and chromium_ok
if all_ok:
print(
"ready. Join a meeting: "
"hermes meet join https://meet.google.com/abc-defg-hij"
)
else:
print("not ready yet — fix the items above.")
return 0 if all_ok else 1
def _cmd_install(*, realtime: bool, assume_yes: bool) -> int:
"""Install the plugin's prerequisites.
Always: pip install playwright + websockets, then
``python -m playwright install chromium``.
With ``--realtime``: also install the platform audio bridge deps.
Linux : ``sudo apt-get install -y pulseaudio-utils``
macOS : ``brew install blackhole-2ch ffmpeg`` (+ remind the user
to select BlackHole as the default input device manually)
Prompts before every package-manager invocation unless ``--yes``.
Refuses to run on Windows.
"""
import platform as _p
import shutil as _shutil
import subprocess as _sp
system = _p.system()
if system not in ("Linux", "Darwin"):
print(f"google_meet install: {system} is not supported (linux/macos only)")
return 1
def _confirm(prompt: str) -> bool:
if assume_yes:
return True
try:
ans = input(f"{prompt} [y/N] ").strip().lower()
except EOFError:
return False
return ans in ("y", "yes")
print("google_meet install")
print("-------------------")
# 1) pip deps — always safe, venv-scoped.
pip_pkgs = ["playwright", "websockets"]
print(f"\n[1/3] pip install: {' '.join(pip_pkgs)}")
try:
res = _sp.run(
[sys.executable, "-m", "pip", "install", "--upgrade", *pip_pkgs],
check=False,
)
if res.returncode != 0:
print(" pip install failed")
return 1
except Exception as e:
print(f" pip install failed: {e}")
return 1
# 2) Playwright browsers — pulls chromium (~300MB first run).
print("\n[2/3] python -m playwright install chromium")
try:
res = _sp.run(
[sys.executable, "-m", "playwright", "install", "chromium"],
check=False,
)
if res.returncode != 0:
print(" playwright install failed (may already be installed)")
except Exception as e:
print(f" playwright install failed: {e}")
return 1
# 3) Platform audio deps for realtime mode.
if realtime:
print("\n[3/3] realtime audio deps")
if system == "Linux":
if _shutil.which("paplay") and _shutil.which("pactl"):
print(" pulseaudio-utils already installed.")
else:
if not _confirm(
" install pulseaudio-utils? this runs `sudo apt-get install -y pulseaudio-utils`"
):
print(" skipped (you can run it manually later)")
else:
cmd = ["sudo", "apt-get", "install", "-y", "pulseaudio-utils"]
print(f" $ {' '.join(cmd)}")
res = _sp.run(cmd, check=False)
if res.returncode != 0:
print(" apt install failed — install pulseaudio-utils manually")
elif system == "Darwin":
have_bh = False
try:
out = _sp.check_output(["system_profiler", "SPAudioDataType"], text=True)
have_bh = "BlackHole" in out
except Exception:
pass
have_ffmpeg = bool(_shutil.which("ffmpeg"))
needs = []
if not have_bh:
needs.append("blackhole-2ch")
if not have_ffmpeg:
needs.append("ffmpeg")
if not needs:
print(" BlackHole and ffmpeg already installed.")
elif not _shutil.which("brew"):
print(
" missing: " + ", ".join(needs) + "\n"
" install Homebrew first (https://brew.sh) or install the packages manually."
)
else:
if not _confirm(f" install via brew: {' '.join(needs)}?"):
print(" skipped (you can run it manually later)")
else:
cmd = ["brew", "install", *needs]
print(f" $ {' '.join(cmd)}")
res = _sp.run(cmd, check=False)
if res.returncode != 0:
print(" brew install failed — install them manually")
print(
"\n NOTE: macOS does not auto-route audio. Open\n"
" System Settings → Sound → Input\n"
" and select 'BlackHole 2ch' before starting a realtime meeting.\n"
" hermes will not switch your default input for you."
)
else:
print("\n[3/3] skipped (pass --realtime to install audio tooling too)")
print("\ndone. verify with: hermes meet setup")
return 0
def _cmd_auth() -> int:
"""Open a headed Chromium, let the user sign in, save storage_state."""
try:
from playwright.sync_api import sync_playwright
except ImportError:
print(
"playwright is not installed. run:\n"
" pip install playwright && python -m playwright install chromium"
)
return 1
path = _auth_state_path()
path.parent.mkdir(parents=True, exist_ok=True)
print(f"opening Chromium — sign in to Google, then return here and press Enter.")
print(f"saving storage state to: {path}")
try:
with sync_playwright() as pw:
browser = pw.chromium.launch(headless=False)
context = browser.new_context()
page = context.new_page()
page.goto("https://accounts.google.com/", wait_until="domcontentloaded")
try:
input("press Enter after you've signed in ... ")
except EOFError:
pass
context.storage_state(path=str(path))
browser.close()
except Exception as e:
print(f"auth failed: {e}")
return 1
print("saved. you can now run: hermes meet join <url>")
return 0
def _cmd_join(
url: str,
*,
guest_name: str,
duration: Optional[str],
headed: bool,
mode: str = "transcribe",
node: Optional[str] = None,
) -> int:
if not _is_safe_meet_url(url):
print(f"refusing: not a meet.google.com URL: {url}")
return 2
if node:
# Remote: go through NodeClient.
try:
from plugins.google_meet.node.registry import NodeRegistry
from plugins.google_meet.node.client import NodeClient
except ImportError as e:
print(f"node module unavailable: {e}")
return 1
reg = NodeRegistry()
entry = reg.resolve(node if node != "auto" else None)
if entry is None:
print(f"no registered node matches {node!r}")
return 1
client = NodeClient(url=entry["url"], token=entry["token"])
try:
res = client.start_bot(
url=url, guest_name=guest_name, duration=duration,
headed=headed, mode=mode,
)
except Exception as e:
print(f"remote start_bot failed: {e}")
return 1
print(json.dumps({"node": entry.get("name"), **res}, indent=2))
return 0 if res.get("ok") else 1
auth = _auth_state_path()
res = pm.start(
url=url,
headed=headed,
guest_name=guest_name,
duration=duration,
auth_state=str(auth) if auth.is_file() else None,
mode=mode,
)
print(json.dumps(res, indent=2))
return 0 if res.get("ok") else 1
def _cmd_say(text: str, node: Optional[str] = None) -> int:
if not (text or "").strip():
print("refusing: empty text")
return 2
if node:
try:
from plugins.google_meet.node.registry import NodeRegistry
from plugins.google_meet.node.client import NodeClient
except ImportError as e:
print(f"node module unavailable: {e}")
return 1
reg = NodeRegistry()
entry = reg.resolve(node if node != "auto" else None)
if entry is None:
print(f"no registered node matches {node!r}")
return 1
client = NodeClient(url=entry["url"], token=entry["token"])
try:
res = client.say(text)
except Exception as e:
print(f"remote say failed: {e}")
return 1
print(json.dumps({"node": entry.get("name"), **res}, indent=2))
return 0 if res.get("ok") else 1
res = pm.enqueue_say(text)
print(json.dumps(res, indent=2))
return 0 if res.get("ok") else 1
def _cmd_status() -> int:
res = pm.status()
print(json.dumps(res, indent=2))
return 0 if res.get("ok") else 1
def _cmd_transcript(last: Optional[int]) -> int:
res = pm.transcript(last=last)
if not res.get("ok"):
print(json.dumps(res, indent=2))
return 1
for ln in res.get("lines", []):
print(ln)
return 0
def _cmd_stop() -> int:
res = pm.stop(reason="hermes meet stop")
print(json.dumps(res, indent=2))
return 0 if res.get("ok") else 1
if __name__ == "__main__": # pragma: no cover
parser = argparse.ArgumentParser(prog="hermes meet")
register_cli(parser)
ns = parser.parse_args()
sys.exit(meet_command(ns))
+852
@@ -0,0 +1,852 @@
"""Headless Google Meet bot — Playwright + live-caption scraping.
Runs as a standalone subprocess spawned by ``process_manager.py``. Reads config
from env vars, writes status + transcript to files under
``$HERMES_HOME/workspace/meetings/<meeting-id>/``. The main hermes process
reads those files via the ``meet_*`` tools; there is no IPC beyond the filesystem.
The scraping strategy mirrors OpenUtter (sumansid/openutter): we don't parse
WebRTC audio, we enable Google Meet's built-in live captions and observe the
captions container in the DOM via a MutationObserver. This is lossy and
English-biased but it is:
* deterministic (no API keys, no STT billing),
* works behind Meet's normal login / admission,
* survives Meet UI rewrites fairly well because the caption container has a
stable ARIA role.
Run standalone for debugging::
HERMES_MEET_URL=https://meet.google.com/abc-defg-hij \\
HERMES_MEET_OUT_DIR=/tmp/meet-debug \\
HERMES_MEET_HEADED=1 \\
python -m plugins.google_meet.meet_bot
A missing meet.google.com URL makes the bot exit non-zero. Any URL that doesn't
start with ``https://meet.google.com/`` is rejected (explicit-by-design).
"""
from __future__ import annotations
import json
import os
import re
import signal
import sys
import threading
import time
from pathlib import Path
from typing import Optional
# Match ``https://meet.google.com/abc-defg-hij`` or ``.../lookup/...`` — the
# short three-segment code or a lookup URL. Anything else is rejected.
MEET_URL_RE = re.compile(
r"^https://meet\.google\.com/("
r"[a-z0-9]{3,}-[a-z0-9]{3,}-[a-z0-9]{3,}"
r"|lookup/[^/?#]+"
r"|new"
r")(?:[/?#].*)?$"
)
# Filenames the bot reads/writes in ``HERMES_MEET_OUT_DIR``.
SAY_QUEUE_FILENAME = "say_queue.jsonl"
SAY_PCM_FILENAME = "speaker.pcm"
def _is_safe_meet_url(url: str) -> bool:
    """Return True if *url* is a Google Meet URL we're willing to navigate to."""
    if not isinstance(url, str):
        return False
    return bool(MEET_URL_RE.match(url.strip()))
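As a quick illustration of what the allow-list accepts and rejects, here is a minimal standalone sketch. It copies the ``MEET_URL_RE`` pattern from above; the function name ``is_safe_meet_url`` (no leading underscore) and the sample URLs are illustrative and not part of the plugin.

```python
import re

# Same pattern as MEET_URL_RE above, copied so the sketch runs on its own.
MEET_URL_RE = re.compile(
    r"^https://meet\.google\.com/("
    r"[a-z0-9]{3,}-[a-z0-9]{3,}-[a-z0-9]{3,}"
    r"|lookup/[^/?#]+"
    r"|new"
    r")(?:[/?#].*)?$"
)


def is_safe_meet_url(url):
    """Return True only for https Meet URLs of a recognised shape."""
    if not isinstance(url, str):
        return False
    return bool(MEET_URL_RE.match(url.strip()))


# Illustrative inputs: the first three should pass, the rest should not.
SAMPLES = [
    "https://meet.google.com/abc-defg-hij",
    "https://meet.google.com/new?hs=1",
    "https://meet.google.com/lookup/abc123",
    "http://meet.google.com/abc-defg-hij",        # plain http
    "https://meet.google.com.evil.example/abc-defg-hij",  # look-alike host
    "https://meet.google.com/newish",             # not exactly /new
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample, is_safe_meet_url(sample))
```

Anchoring the pattern with ``^`` and ``$`` and requiring a literal ``/`` right after the host is what keeps look-alike hosts out.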
def _meeting_id_from_url(url: str) -> str:
    """Extract the 3-segment meeting code from a Meet URL.

    For ``https://meet.google.com/abc-defg-hij`` this returns ``abc-defg-hij``.
    For ``.../lookup/<id>`` or ``/new`` we fall back to a timestamped id: the
    bot won't know the real code until after the redirect, and callers only
    pass this through to a filename anyway.
    """
    m = re.search(
        r"meet\.google\.com/([a-z0-9]{3,}-[a-z0-9]{3,}-[a-z0-9]{3,})",
        url or "",
    )
    if m:
        return m.group(1)
    return f"meet-{int(time.time())}"
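The sketch below shows how the meeting id could map onto the per-meeting output directory described in the module docstring (``$HERMES_HOME/workspace/meetings/<meeting-id>/``). The ``now`` parameter and the ``meeting_dir`` helper are assumptions added for illustration so the fallback is deterministic; they do not exist in the plugin.

```python
import re
import time
from pathlib import Path

CODE_RE = re.compile(r"meet\.google\.com/([a-z0-9]{3,}-[a-z0-9]{3,}-[a-z0-9]{3,})")


def meeting_id_from_url(url, now=None):
    """Return the 3-segment code, or a timestamped fallback id.

    ``now`` is a hypothetical hook for tests; the real helper always
    uses the current time.
    """
    m = CODE_RE.search(url or "")
    if m:
        return m.group(1)
    stamp = int(time.time() if now is None else now)
    return f"meet-{stamp}"


def meeting_dir(hermes_home, url, now=None):
    """Hypothetical helper: build the per-meeting output directory path."""
    return Path(hermes_home) / "workspace" / "meetings" / meeting_id_from_url(url, now)


if __name__ == "__main__":
    print(meeting_dir("/tmp/hermes", "https://meet.google.com/abc-defg-hij"))
    print(meeting_dir("/tmp/hermes", "https://meet.google.com/new"))
```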
# ---------------------------------------------------------------------------
# Status + transcript file writers
# ---------------------------------------------------------------------------
class _BotState:
"""Single-process mutable state, flushed to ``status.json`` on each change."""
def __init__(self, out_dir: Path, meeting_id: str, url: str):
self.out_dir = out_dir
self.meeting_id = meeting_id
self.url = url
self.in_call = False
self.captioning = False
self.captions_enabled_attempted = False
self.lobby_waiting = False
self.join_attempted_at: Optional[float] = None
self.joined_at: Optional[float] = None
self.last_caption_at: Optional[float] = None
self.transcript_lines = 0
self.error: Optional[str] = None
self.exited = False
# v2 realtime fields.
self.realtime = False
self.realtime_ready = False
self.realtime_device: Optional[str] = None
self.audio_bytes_out: int = 0
self.last_audio_out_at: Optional[float] = None
self.last_barge_in_at: Optional[float] = None
self.leave_reason: Optional[str] = None
# Dedup keys for captions already written to the transcript. Each entry
# is the string "<speaker>|<text>" built in record_caption().
self._seen: set = set()
out_dir.mkdir(parents=True, exist_ok=True)
self.transcript_path = out_dir / "transcript.txt"
self.status_path = out_dir / "status.json"
self._flush()
# -------- transcript ------------------------------------------------
def record_caption(self, speaker: str, text: str) -> None:
"""Append a caption line if we haven't seen this exact (speaker, text)."""
speaker = (speaker or "").strip() or "Unknown"
text = (text or "").strip()
if not text:
return
key = f"{speaker}|{text}"
if key in self._seen:
return
self._seen.add(key)
self.transcript_lines += 1
self.last_caption_at = time.time()
ts = time.strftime("%H:%M:%S", time.localtime(self.last_caption_at))
line = f"[{ts}] {speaker}: {text}\n"
# Atomic-ish append — good enough for a single-writer.
with self.transcript_path.open("a", encoding="utf-8") as f:
f.write(line)
self._flush()
# -------- status file ----------------------------------------------
def _flush(self) -> None:
data = {
"meetingId": self.meeting_id,
"url": self.url,
"inCall": self.in_call,
"captioning": self.captioning,
"captionsEnabledAttempted": self.captions_enabled_attempted,
"lobbyWaiting": self.lobby_waiting,
"joinAttemptedAt": self.join_attempted_at,
"joinedAt": self.joined_at,
"lastCaptionAt": self.last_caption_at,
"transcriptLines": self.transcript_lines,
"transcriptPath": str(self.transcript_path),
"error": self.error,
"exited": self.exited,
"pid": os.getpid(),
# v2 realtime telemetry.
"realtime": self.realtime,
"realtimeReady": self.realtime_ready,
"realtimeDevice": self.realtime_device,
"audioBytesOut": self.audio_bytes_out,
"lastAudioOutAt": self.last_audio_out_at,
"lastBargeInAt": self.last_barge_in_at,
"leaveReason": self.leave_reason,
}
tmp = self.status_path.with_suffix(".json.tmp")
tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
tmp.replace(self.status_path)
def set(self, **kwargs) -> None:
for k, v in kwargs.items():
setattr(self, k, v)
self._flush()
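The ``_flush`` method above relies on the write-to-temp-then-rename pattern so that a reader never sees a half-written ``status.json``. A minimal standalone sketch of that pattern follows; ``write_status``, ``read_status`` and ``demo`` are illustrative names, and only two of the status keys are shown.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def write_status(status_path: Path, data: dict) -> None:
    """Write *data* next to the target, then rename over it in one step."""
    tmp = status_path.with_suffix(".json.tmp")  # status.json -> status.json.tmp
    tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
    tmp.replace(status_path)  # rename within the same directory


def read_status(status_path: Path) -> Optional[dict]:
    """Return the parsed status, or None if the bot has not written one yet."""
    try:
        return json.loads(status_path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None


def demo() -> Optional[dict]:
    """Write two snapshots into a scratch directory and read back the latest."""
    out_dir = Path(tempfile.mkdtemp())
    status_path = out_dir / "status.json"
    write_status(status_path, {"inCall": False, "transcriptLines": 0})
    write_status(status_path, {"inCall": True, "transcriptLines": 3})
    return read_status(status_path)


if __name__ == "__main__":
    print(demo())
```

Because the temp file lives in the same directory as the target, the rename stays on one filesystem, which is what lets the replace happen as a single step on POSIX systems.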
# ---------------------------------------------------------------------------
# Playwright bot entry point
# ---------------------------------------------------------------------------
# JavaScript injected into the Meet tab to observe captions. Captures
# {speaker, text} tuples via a MutationObserver on the caption container,
# and exposes ``window.__hermesMeetDrain()`` to pull new entries. This
# mirrors the OpenUtter caption scraping approach.
_CAPTION_OBSERVER_JS = r"""
(() => {
if (window.__hermesMeetInstalled) return;
window.__hermesMeetInstalled = true;
window.__hermesMeetQueue = [];
const captionSelector = '[role="region"][aria-label*="aption" i], ' +
'div[jsname="YSxPC"], ' + // legacy
'div[jsname="tgaKEf"]'; // current (Apr 2026)
function pushEntry(speaker, text) {
if (!text || !text.trim()) return;
window.__hermesMeetQueue.push({
ts: Date.now(),
speaker: (speaker || '').trim(),
text: text.trim(),
});
}
function scan(root) {
// Meet captions render as a list of rows; each row contains a speaker
// label and a text block. Selectors vary across Meet rewrites; we try
// a few shapes and fall back to raw text.
const rows = root.querySelectorAll('div[jsname="dsyhDe"], div.CNusmb, div.TBMuR');
if (rows.length) {
rows.forEach((row) => {
const spkEl = row.querySelector('div.KcIKyf, div.zs7s8d, span[jsname="YSxPC"]');
const txtEl = row.querySelector('div.bh44bd, span[jsname="tgaKEf"], div.iTTPOb');
const speaker = spkEl ? spkEl.innerText : '';
const text = txtEl ? txtEl.innerText : row.innerText;
pushEntry(speaker, text);
});
return;
}
// Fallback: treat the whole region's innerText as one anonymous line.
const text = (root.innerText || '').split('\n').filter(Boolean).pop();
pushEntry('', text);
}
function attach() {
const el = document.querySelector(captionSelector);
if (!el) return false;
const obs = new MutationObserver(() => scan(el));
obs.observe(el, { childList: true, subtree: true, characterData: true });
scan(el);
return true;
}
// Try now and retry on an interval; the caption region only appears after
// captions are enabled and someone speaks.
if (!attach()) {
const iv = setInterval(() => { if (attach()) clearInterval(iv); }, 1500);
}
window.__hermesMeetDrain = () => {
const out = window.__hermesMeetQueue.slice();
window.__hermesMeetQueue = [];
return out;
};
})();
"""
def _enable_captions_js() -> str:
    """Return a small JS snippet that tries to turn on live captions.

    Best effort: Meet's caption toggle is keyboard-accessible via ``c``, so we
    dispatch that keystroke instead of clicking the 'Turn on captions' button.
    Real click targeting is too brittle to rely on.
    """
    return r"""
(() => {
  const ev = new KeyboardEvent('keydown', {
    key: 'c', code: 'KeyC', keyCode: 67, which: 67, bubbles: true,
  });
  document.body.dispatchEvent(ev);
  return true;
})();
"""
def _start_realtime_speaker(
*,
rt: dict,
out_dir: Path,
bridge_info: dict,
api_key: str,
model: str,
voice: str,
instructions: str,
stop_flag: dict,
state: "_BotState",
) -> None:
"""Wire up the OpenAI Realtime session + speaker thread + PCM pump.
The speaker thread reads text lines from ``say_queue.jsonl``, sends each
to OpenAI Realtime, and writes PCM audio into ``speaker.pcm``. A
separate *pump* subprocess forwards that PCM into the OS audio sink so
Chrome's fake mic picks it up. On Linux we run ``paplay`` against the
null-sink; on macOS we run ``ffmpeg`` against the BlackHole device, which
the caller is expected to have selected as the default input.
"""
try:
from plugins.google_meet.realtime.openai_client import (
RealtimeSession,
RealtimeSpeaker,
)
except Exception as e:
state.set(error=f"realtime import failed: {e}")
return
pcm_path = out_dir / SAY_PCM_FILENAME
queue_path = out_dir / SAY_QUEUE_FILENAME
processed_path = out_dir / "say_processed.jsonl"
# Reset the sink file so we start clean each session.
pcm_path.write_bytes(b"")
# Make sure the queue exists so the speaker poller doesn't error on
# first iteration.
queue_path.touch()
try:
session = RealtimeSession(
api_key=api_key,
model=model,
voice=voice,
instructions=instructions,
audio_sink_path=pcm_path,
sample_rate=24000,
)
session.connect()
except Exception as e:
state.set(error=f"realtime connect failed: {e}")
return
rt["session"] = session
def _stop_fn():
return stop_flag.get("stop", False)
# Setting the flag to True is what makes _stop_fn() report a stop request.
rt["speaker_stop"] = lambda: stop_flag.__setitem__("stop", True)
speaker = RealtimeSpeaker(
session=session,
queue_path=queue_path,
processed_path=processed_path,
)
def _speaker_loop():
try:
speaker.run_until_stopped(_stop_fn)
except Exception as e:
state.set(error=f"realtime speaker crashed: {e}")
t_speaker = threading.Thread(target=_speaker_loop, name="meet-speaker", daemon=True)
t_speaker.start()
rt["speaker_thread"] = t_speaker
# PCM pump: feeds speaker.pcm (24kHz s16le mono) into the OS audio
# device that Chrome's fake mic reads from. Different tools per
# platform, but the contract is the same — block-read the growing
# PCM file and stream it to the device in near-real-time.
platform_tag = (bridge_info or {}).get("platform")
if platform_tag == "linux":
import subprocess as _sp
sink = (bridge_info or {}).get("write_target") or "hermes_meet_sink"
try:
proc = _sp.Popen(
[
"paplay",
"--raw",
"--rate=24000",
"--format=s16le",
"--channels=1",
f"--device={sink}",
str(pcm_path),
],
stdin=_sp.DEVNULL,
stdout=_sp.DEVNULL,
stderr=_sp.DEVNULL,
)
rt["pcm_pump"] = proc
except FileNotFoundError:
state.set(error="paplay not found — install pulseaudio-utils for realtime on Linux")
elif platform_tag == "darwin":
# macOS: use ffmpeg to read speaker.pcm and write it to the
# BlackHole output device. The user must have BlackHole selected
# as the default input in System Settings → Sound for Chrome to
# pick it up. We use ffmpeg because it's scriptable and can
# target a specific audio device by index; if ffmpeg is absent
# we record an error in the status file and start no pump.
import shutil as _shutil
import subprocess as _sp
device_name = (bridge_info or {}).get("write_target") or "BlackHole 2ch"
if _shutil.which("ffmpeg"):
try:
# -re: read the input at its native rate.
# -f s16le -ar 24000 -ac 1 -i <pcm>: interpret speaker.pcm as
# raw 24 kHz mono signed 16-bit PCM.
# -f audiotoolbox -audio_device_index <n>: write to the
# BlackHole device. Without an explicit index, ffmpeg's
# audiotoolbox output picks the current default output
# device, which isn't what we want.
proc = _sp.Popen(
[
"ffmpeg",
"-nostdin", "-hide_banner", "-loglevel", "error",
"-re",
"-f", "s16le", "-ar", "24000", "-ac", "1",
"-i", str(pcm_path),
"-f", "audiotoolbox",
"-audio_device_index", _mac_audio_device_index(device_name),
"-",
],
stdin=_sp.DEVNULL,
stdout=_sp.DEVNULL,
stderr=_sp.DEVNULL,
)
rt["pcm_pump"] = proc
except FileNotFoundError:
state.set(error="ffmpeg not found — install via `brew install ffmpeg` for realtime on macOS")
except Exception as e:
state.set(error=f"macOS pcm pump failed to start: {e}")
else:
state.set(error="ffmpeg not found — install via `brew install ffmpeg` for realtime on macOS")
def _mac_audio_device_index(device_name: str) -> str:
    """Return the ffmpeg ``-audio_device_index`` for *device_name*, as a string.

    Probes ``ffmpeg -f avfoundation -list_devices true -i ''`` (which prints
    the device table on stderr) and matches *device_name* case-insensitively.
    Defaults to ``"0"`` if the device can't be found; the caller will get a
    misrouted stream but not a crash, and the error will be obvious.
    """
    import subprocess as _sp
    try:
        out = _sp.run(
            ["ffmpeg", "-f", "avfoundation", "-list_devices", "true", "-i", ""],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except Exception:
        return "0"
    # ffmpeg prints the table on stderr. Lines look like:
    #   [AVFoundation indev @ 0x...] [0] BlackHole 2ch
    import re as _re
    needle = device_name.strip().lower()
    for line in (out.stderr or "").splitlines():
        m = _re.search(r"\[(\d+)\]\s+(.+)$", line)
        if not m:
            continue
        if m.group(2).strip().lower() == needle:
            return m.group(1)
    return "0"
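The matching step can be exercised without ffmpeg installed. The sketch below applies the same regular expression to a hand-written device table; ``SAMPLE_STDERR`` is an illustrative string modelled on the line shape quoted in the comment above, not captured ffmpeg output, and ``find_device_index`` is a hypothetical name.

```python
import re

# Illustrative listing only; real ffmpeg output will differ in detail.
SAMPLE_STDERR = """\
[AVFoundation indev @ 0x7f8] AVFoundation video devices:
[AVFoundation indev @ 0x7f8] [0] FaceTime HD Camera
[AVFoundation indev @ 0x7f8] AVFoundation audio devices:
[AVFoundation indev @ 0x7f8] [0] BlackHole 2ch
[AVFoundation indev @ 0x7f8] [1] MacBook Pro Microphone
"""


def find_device_index(listing: str, device_name: str, default: str = "0") -> str:
    """Return the bracketed index of the first line naming *device_name*."""
    needle = device_name.strip().lower()
    for line in listing.splitlines():
        # Header lines have no "[<digits>] <name>" part and are skipped.
        m = re.search(r"\[(\d+)\]\s+(.+)$", line)
        if m and m.group(2).strip().lower() == needle:
            return m.group(1)
    return default


if __name__ == "__main__":
    print(find_device_index(SAMPLE_STDERR, "blackhole 2ch"))
    print(find_device_index(SAMPLE_STDERR, "MacBook Pro Microphone"))
    print(find_device_index(SAMPLE_STDERR, "No Such Device"))
```

Note that a missing device and a device that really sits at index 0 both yield ``"0"``, so a caller that needs to tell them apart would have to pass a different ``default``.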
def run_bot() -> int: # noqa: C901 — orchestration, explicit branches
url = os.environ.get("HERMES_MEET_URL", "").strip()
out_dir_env = os.environ.get("HERMES_MEET_OUT_DIR", "").strip()
headed = os.environ.get("HERMES_MEET_HEADED", "").lower() in ("1", "true", "yes")
auth_state = os.environ.get("HERMES_MEET_AUTH_STATE", "").strip()
guest_name = os.environ.get("HERMES_MEET_GUEST_NAME", "Hermes Agent")
duration_s = _parse_duration(os.environ.get("HERMES_MEET_DURATION", ""))
# v2: optional realtime mode. Enabled when HERMES_MEET_MODE=realtime.
mode = os.environ.get("HERMES_MEET_MODE", "transcribe").strip().lower()
realtime_model = os.environ.get("HERMES_MEET_REALTIME_MODEL", "gpt-realtime")
realtime_voice = os.environ.get("HERMES_MEET_REALTIME_VOICE", "alloy")
realtime_instructions = os.environ.get("HERMES_MEET_REALTIME_INSTRUCTIONS", "")
realtime_api_key = os.environ.get("HERMES_MEET_REALTIME_KEY") or os.environ.get("OPENAI_API_KEY", "")
if not url or not _is_safe_meet_url(url):
sys.stderr.write(
"google_meet bot: refusing to launch — HERMES_MEET_URL must be a "
"meet.google.com URL. got: %r\n" % url
)
return 2
if not out_dir_env:
sys.stderr.write("google_meet bot: HERMES_MEET_OUT_DIR is required\n")
return 2
out_dir = Path(out_dir_env)
meeting_id = _meeting_id_from_url(url)
state = _BotState(out_dir=out_dir, meeting_id=meeting_id, url=url)
# SIGTERM → exit cleanly so the parent ``meet_leave`` gets a finalized
# transcript. We set a flag instead of raising so the main loop exits on
# its own and the Playwright teardown code after it still runs.
stop_flag = {"stop": False}
def _on_signal(_sig, _frame):
stop_flag["stop"] = True
signal.signal(signal.SIGTERM, _on_signal)
signal.signal(signal.SIGINT, _on_signal)
# v2 realtime: provision virtual audio device + start speaker thread.
# We track these in a dict so the teardown code after the main loop can
# release them in one place. If anything in the realtime setup fails we
# fall back to transcribe mode and record the reason in the status file.
rt = {
"enabled": mode == "realtime",
"bridge": None, # AudioBridge | None
"bridge_info": None, # dict | None
"session": None, # RealtimeSession | None
"speaker_thread": None, # threading.Thread | None
"speaker_stop": None, # callable | None
}
if rt["enabled"]:
if not realtime_api_key:
state.set(error="realtime mode requested but no API key in HERMES_MEET_REALTIME_KEY/OPENAI_API_KEY — falling back to transcribe")
rt["enabled"] = False
else:
try:
from plugins.google_meet.audio_bridge import AudioBridge
bridge = AudioBridge()
rt["bridge_info"] = bridge.setup()
rt["bridge"] = bridge
state.set(realtime=True, realtime_device=rt["bridge_info"].get("device_name"))
except Exception as e:
state.set(error=f"audio bridge setup failed: {e} — falling back to transcribe")
rt["enabled"] = False
try:
from playwright.sync_api import sync_playwright
except ImportError as e:
state.set(error=f"playwright not installed: {e}", exited=True)
sys.stderr.write(
"google_meet bot: playwright is not installed. Run "
"`pip install playwright && python -m playwright install chromium`\n"
)
if rt["bridge"]:
rt["bridge"].teardown()
return 3
# Chrome env: if realtime is live on Linux, point PULSE_SOURCE at the
# virtual source so Chrome's fake mic reads the audio we generate.
chrome_env = os.environ.copy()
chrome_args = [
"--use-fake-ui-for-media-stream",
"--disable-blink-features=AutomationControlled",
]
if not rt["enabled"]:
# v1-style fake device (silence) — we don't care about mic content
# when we're not speaking.
chrome_args.insert(1, "--use-fake-device-for-media-stream")
elif rt["bridge_info"] and rt["bridge_info"].get("platform") == "linux":
chrome_env["PULSE_SOURCE"] = rt["bridge_info"].get("device_name", "")
try:
with sync_playwright() as pw:
# Playwright's launch() doesn't take env; we set PULSE_SOURCE
# via the process env before launch so the child Chrome inherits it.
for k, v in chrome_env.items():
os.environ[k] = v
browser = pw.chromium.launch(
headless=not headed,
args=chrome_args,
)
context_args = {
"viewport": {"width": 1280, "height": 800},
"user_agent": (
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
),
"permissions": ["microphone", "camera"],
}
if auth_state and Path(auth_state).is_file():
context_args["storage_state"] = auth_state
context = browser.new_context(**context_args)
page = context.new_page()
try:
page.goto(url, wait_until="domcontentloaded", timeout=30_000)
except Exception as e:
state.set(error=f"navigate failed: {e}", exited=True)
return 4
# Guest-mode: Meet shows a name field before "Ask to join". When
# we're authed, we instead see "Join now".
_try_guest_name(page, guest_name)
_click_join(page, state)
# Install caption observer and attempt to enable captions.
try:
page.evaluate(_enable_captions_js())
state.set(captions_enabled_attempted=True)
except Exception:
pass
try:
page.evaluate(_CAPTION_OBSERVER_JS)
except Exception as e:
state.set(error=f"caption observer install failed: {e}")
# Note: in_call=False until admission is confirmed (we detect
# either the Leave button or the caption region, signalling we
# made it past the lobby).
state.set(captioning=True, join_attempted_at=time.time())
# v2 realtime: start the speaker thread reading from the
# plugin-side say queue. The thread reads JSONL lines written by
# meet_say, calls OpenAI Realtime, and streams the audio PCM to
# the virtual sink that Chrome's fake-mic is pointed at.
if rt["enabled"]:
_start_realtime_speaker(
rt=rt,
out_dir=out_dir,
bridge_info=rt["bridge_info"],
api_key=realtime_api_key,
model=realtime_model,
voice=realtime_voice,
instructions=realtime_instructions,
stop_flag=stop_flag,
state=state,
)
if rt["session"] is not None:
state.set(realtime_ready=True)
# Admission + drain loop. Runs until SIGTERM, duration expiry,
# or the page detects "You were removed / you left the
# meeting". Responsible for:
# * detecting admission (Leave button visible → in_call=True)
# * timing out stuck-in-lobby (default 5 minutes)
# * draining scraped captions into the transcript
# * triggering realtime barge-in when a human speaks while
# the bot is generating audio
# * periodically flushing realtime counters into status.json
deadline = (time.time() + duration_s) if duration_s else None
lobby_deadline = time.time() + float(
os.environ.get("HERMES_MEET_LOBBY_TIMEOUT", "300")
)
last_admission_check = 0.0
while not stop_flag["stop"]:
now = time.time()
if deadline and now > deadline:
state.set(leave_reason="duration_expired")
break
# Admission detection every ~3s until admitted.
if not state.in_call and (now - last_admission_check) > 3.0:
last_admission_check = now
admitted = _detect_admission(page)
if admitted:
state.set(
in_call=True,
lobby_waiting=False,
joined_at=now,
)
elif now > lobby_deadline:
state.set(
error=(
"lobby timeout — host never admitted the bot "
f"within {int(lobby_deadline - state.join_attempted_at) if state.join_attempted_at else 0}s"
),
leave_reason="lobby_timeout",
)
break
elif _detect_denied(page):
state.set(
error="host denied admission",
leave_reason="denied",
)
break
try:
queued = page.evaluate("window.__hermesMeetDrain && window.__hermesMeetDrain()")
if isinstance(queued, list):
for entry in queued:
if not isinstance(entry, dict):
continue
speaker = str(entry.get("speaker", ""))
text = str(entry.get("text", ""))
state.record_caption(speaker=speaker, text=text)
# Barge-in: if the bot is currently generating
# audio AND a real human just spoke, cancel the
# in-flight response so we don't talk over them.
if rt["enabled"] and rt["session"] is not None:
if _looks_like_human_speaker(speaker, guest_name):
try:
cancelled = rt["session"].cancel_response()
if cancelled:
state.set(last_barge_in_at=now)
except Exception:
pass
except Exception:
# Meet reloaded or we got booted — try to detect and
# exit gracefully rather than spinning.
if page.is_closed():
state.set(leave_reason="page_closed")
break
# Fold the realtime session's byte/timestamp counters into
# the status file so meet_status can surface them.
if rt["session"] is not None:
state.set(
audio_bytes_out=getattr(rt["session"], "audio_bytes_out", 0),
last_audio_out_at=getattr(rt["session"], "last_audio_out_at", None),
)
time.sleep(1.0)
# Try to leave cleanly — click "Leave call" button if present.
try:
page.evaluate(
"() => { const b = document.querySelector('button[aria-label*=\"eave call\"]');"
" if (b) b.click(); }"
)
except Exception:
pass
context.close()
browser.close()
# v2: teardown realtime speaker + audio bridge.
if rt["speaker_stop"]:
try:
rt["speaker_stop"]()
except Exception:
pass
if rt["speaker_thread"] is not None:
try:
rt["speaker_thread"].join(timeout=5.0)
except Exception:
pass
if rt["session"]:
try:
rt["session"].close()
except Exception:
pass
if rt["bridge"]:
try:
rt["bridge"].teardown()
except Exception:
pass
state.set(in_call=False, captioning=False, exited=True)
return 0
except Exception as e:
state.set(error=f"unhandled: {e}", exited=True)
return 1
def _try_guest_name(page, guest_name: str) -> None:
    """If Meet is showing a guest-name input, type *guest_name* into it."""
    try:
        # Meet's guest name input has placeholder "Your name"; we match on
        # its aria-label containing "name", case-insensitively.
        locator = page.locator('input[aria-label*="name" i]').first
        if locator.count() and locator.is_visible():
            locator.fill(guest_name, timeout=2_000)
    except Exception:
        pass
def _detect_admission(page) -> bool:
"""True if we're clearly past the lobby and in the call itself.
Uses a JS-side probe because Meet's DOM structure varies by client
version. We check several high-signal indicators and declare admission
on the first hit:
1. Leave-call button is present (``aria-label`` contains "eave call").
2. Caption region has appeared (we installed the observer and it attached).
3. The participant list container is visible.
Conservative by default: returns False on any error.
"""
probe = r"""
(() => {
const leave = document.querySelector('button[aria-label*="eave call" i]');
if (leave) return true;
if (window.__hermesMeetInstalled) {
const caps = document.querySelector(
'[role="region"][aria-label*="aption" i], ' +
'div[jsname="YSxPC"], div[jsname="tgaKEf"]'
);
if (caps) return true;
}
const parts = document.querySelector('[aria-label*="articipants" i]');
if (parts) return true;
return false;
})();
"""
try:
return bool(page.evaluate(probe))
except Exception:
return False
def _detect_denied(page) -> bool:
"""True when Meet is showing a 'you were denied' / 'no one admitted' page."""
probe = r"""
(() => {
const text = document.body ? document.body.innerText || '' : '';
// English only: matches what shows up when the host denies or
// removes a guest.
if (/You can't join this video call/i.test(text)) return true;
if (/You were removed from the meeting/i.test(text)) return true;
if (/No one responded to your request to join/i.test(text)) return true;
return false;
})();
"""
try:
return bool(page.evaluate(probe))
except Exception:
return False
def _looks_like_human_speaker(speaker: str, bot_guest_name: str) -> bool:
    """Whether a caption line's speaker is probably a human, not our bot echo.

    Meet attributes captions to the speaker's display name. When Chrome is
    reading our fake mic, Meet still attributes captions to *our* bot name
    (because the bot is the one "speaking"). We don't want those to trigger
    barge-in. Anything else (real participant names) does.

    Conservative: unknown / blank speakers (common when caption scraping
    falls back to raw text) do NOT trigger barge-in, because we can't tell
    whether it was a human or us.
    """
    if not speaker or not speaker.strip():
        return False
    spk = speaker.strip().lower()
    if spk in ("unknown", "you", bot_guest_name.strip().lower()):
        return False
    return True
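To make the barge-in rule concrete, here is a small sketch that applies the same speaker check to a batch of drained caption entries. ``should_barge_in`` is a hypothetical helper written for this example; in the bot itself the check runs inline inside the drain loop.

```python
from typing import Dict, Iterable


def looks_like_human_speaker(speaker: str, bot_guest_name: str) -> bool:
    """Same rule as above: blank, 'unknown', 'you' and the bot's own name are ignored."""
    if not speaker or not speaker.strip():
        return False
    spk = speaker.strip().lower()
    return spk not in ("unknown", "you", bot_guest_name.strip().lower())


def should_barge_in(entries: Iterable[Dict[str, str]], bot_guest_name: str) -> bool:
    """Hypothetical helper: True if any drained caption came from a human."""
    return any(
        looks_like_human_speaker(str(e.get("speaker", "")), bot_guest_name)
        for e in entries
    )


if __name__ == "__main__":
    batch = [
        {"speaker": "Hermes Agent", "text": "Here is the summary."},
        {"speaker": "", "text": "raw fallback line"},
        {"speaker": "Alice", "text": "Sorry, one question."},
    ]
    print(should_barge_in(batch, "Hermes Agent"))
```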
def _click_join(page, state: _BotState) -> None:
"""Click 'Join now' or 'Ask to join' if either button is visible.
Flags ``lobby_waiting`` when we hit the "waiting for host to admit you"
state so the agent can surface that in status.
"""
for label in ("Join now", "Ask to join"):
try:
btn = page.get_by_role("button", name=label, exact=False).first
if btn.count() and btn.is_visible():
btn.click(timeout=3_000)
if label == "Ask to join":
state.set(lobby_waiting=True)
break
except Exception:
continue
def _parse_duration(raw: str) -> Optional[float]:
    """Parse ``30m`` / ``2h`` / ``45s`` / ``90`` (bare seconds) → float seconds, or None."""
    if not raw:
        return None
    raw = raw.strip().lower()
    try:
        if raw.endswith("h"):
            return float(raw[:-1]) * 3600
        if raw.endswith("m"):
            return float(raw[:-1]) * 60
        if raw.endswith("s"):
            return float(raw[:-1])
        return float(raw)
    except ValueError:
        return None
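A few sample inputs make the suffix handling easier to see. This is a standalone copy of the parser under the name ``parse_duration``; the inputs are illustrative.

```python
from typing import Optional


def parse_duration(raw: str) -> Optional[float]:
    """Convert '30m', '2h', '45s' or bare seconds into float seconds."""
    if not raw:
        return None
    raw = raw.strip().lower()
    try:
        if raw.endswith("h"):
            return float(raw[:-1]) * 3600
        if raw.endswith("m"):
            return float(raw[:-1]) * 60
        if raw.endswith("s"):
            return float(raw[:-1])
        return float(raw)
    except ValueError:
        # Anything float() cannot parse means "no duration limit".
        return None


if __name__ == "__main__":
    for sample in ("30m", "2h", "45s", "90", "1.5h", "", "soon"):
        print(repr(sample), parse_duration(sample))
```

An unparseable value yields ``None``, which ``run_bot`` treats the same as no duration at all, so a typo in ``HERMES_MEET_DURATION`` silently removes the time limit instead of raising an error.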
if __name__ == "__main__": # pragma: no cover — subprocess entry point
sys.exit(run_bot())
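Since the bot communicates only through files, a consumer needs nothing more than a file read. The sketch below tails ``transcript.txt`` and splits lines of the ``[HH:MM:SS] speaker: text`` shape written by ``record_caption``. The names ``parse_line`` and ``tail_transcript`` are illustrative and are not the plugin's ``meet_*`` tools.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple

# Matches the line format produced by record_caption above.
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] (.+?): (.*)$")


def parse_line(line: str) -> Optional[Tuple[str, ...]]:
    """Split one transcript line into (time, speaker, text), or None."""
    m = LINE_RE.match(line.rstrip("\n"))
    return m.groups() if m else None


def tail_transcript(path: Path, last: Optional[int] = None) -> List[str]:
    """Return all lines, or only the final *last* lines, of the transcript."""
    if not path.is_file():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return lines[-last:] if last else lines


if __name__ == "__main__":
    out_dir = Path(tempfile.mkdtemp())
    transcript = out_dir / "transcript.txt"
    transcript.write_text(
        "[10:15:01] Alice: good morning\n[10:15:02] Bob: note: agenda first\n",
        encoding="utf-8",
    )
    for line in tail_transcript(transcript, last=1):
        print(parse_line(line))
```

The speaker group is non-greedy so that a colon inside the caption text stays in the text field.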
+54
@@ -0,0 +1,54 @@
"""Remote 'node host' primitive for the google_meet plugin.
Lets the Meet bot (Playwright + Chrome) run on a different machine than
the hermes-agent gateway. The gateway speaks a small JSON-over-WebSocket
RPC protocol to the remote node; the node wraps the existing
``plugins.google_meet.process_manager`` API.
Topology
--------
    gateway (Linux) --ws://mac.local:18789--> node server (Mac)
                                                -> process_manager
                                                  -> meet_bot (Playwright)
Why: Google sign-in + Chrome profile live on the user's laptop. Running
the bot there reuses that profile without shipping credentials to the
server.
Public surface
--------------
NodeClient    gateway-side RPC client (short-lived sync WS per call)
NodeServer    long-running server that hosts the bot
NodeRegistry  local JSON registry of approved nodes (name -> url+token)
protocol      message envelope helpers (make_request, encode, decode, ...)
"""
from __future__ import annotations
from plugins.google_meet.node import protocol
from plugins.google_meet.node.client import NodeClient
from plugins.google_meet.node.protocol import (
VALID_REQUEST_TYPES,
decode,
encode,
make_error,
make_request,
make_response,
validate_request,
)
from plugins.google_meet.node.registry import NodeRegistry
from plugins.google_meet.node.server import NodeServer
__all__ = [
"NodeClient",
"NodeServer",
"NodeRegistry",
"protocol",
"make_request",
"make_response",
"make_error",
"encode",
"decode",
"validate_request",
"VALID_REQUEST_TYPES",
]
+125
@@ -0,0 +1,125 @@
"""`hermes meet node ...` subcommand tree.
Wired into the existing ``hermes meet`` parser by the plugin's top-level
CLI. This module only defines the subparsers and their dispatch; it
does not mutate the existing cli.py.
"""
from __future__ import annotations
import argparse
import asyncio
import json
import sys
from typing import Any
from plugins.google_meet.node.client import NodeClient
from plugins.google_meet.node.registry import NodeRegistry
from plugins.google_meet.node.server import NodeServer
def register_cli(subparser: argparse.ArgumentParser) -> None:
"""Add ``run / list / approve / remove / status / ping`` subparsers.
*subparser* is the ``hermes meet node`` argparse object, typically
the result of ``meet_parser.add_parser('node', ...)``.
"""
sp = subparser.add_subparsers(dest="node_cmd", required=True)
run = sp.add_parser("run", help="Start a node server on this machine.")
run.add_argument("--host", default="0.0.0.0")
run.add_argument("--port", type=int, default=18789)
run.add_argument("--display-name", default="hermes-meet-node")
run.set_defaults(func=node_command)
lst = sp.add_parser("list", help="List approved remote nodes.")
lst.set_defaults(func=node_command)
app = sp.add_parser("approve", help="Register a remote node on the gateway.")
app.add_argument("name")
app.add_argument("url")
app.add_argument("token")
app.set_defaults(func=node_command)
rm = sp.add_parser("remove", help="Forget a registered node.")
rm.add_argument("name")
rm.set_defaults(func=node_command)
st = sp.add_parser("status", help="Ping a registered node.")
st.add_argument("name")
st.set_defaults(func=node_command)
pg = sp.add_parser("ping", help="Alias for status.")
pg.add_argument("name")
pg.set_defaults(func=node_command)
def node_command(args: argparse.Namespace) -> int:
"""Dispatch for ``hermes meet node ...``.
Returns a process exit code. Side-effects print to stdout/stderr.
"""
cmd = getattr(args, "node_cmd", None)
if cmd == "run":
server = NodeServer(
host=args.host,
port=args.port,
display_name=args.display_name,
)
token = server.ensure_token()
print(f"[meet-node] display_name={server.display_name}")
print(f"[meet-node] listening on ws://{args.host}:{args.port}")
print(f"[meet-node] token (copy to gateway): {token}")
print("[meet-node] approve with:")
print(f" hermes meet node approve <name> ws://<host>:{args.port} {token}")
try:
asyncio.run(server.serve())
except KeyboardInterrupt:
return 0
except RuntimeError as exc:
print(f"[meet-node] error: {exc}", file=sys.stderr)
return 2
return 0
reg = NodeRegistry()
if cmd == "list":
nodes = reg.list_all()
if not nodes:
print("no nodes registered")
return 0
for n in nodes:
print(f"{n['name']}\t{n['url']}\ttoken={n['token'][:6]}")
return 0
if cmd == "approve":
reg.add(args.name, args.url, args.token)
print(f"approved node {args.name!r} at {args.url}")
return 0
if cmd == "remove":
ok = reg.remove(args.name)
print(f"removed {args.name!r}" if ok else f"no such node: {args.name!r}")
return 0 if ok else 1
if cmd in ("status", "ping"):
entry = reg.get(args.name)
if entry is None:
print(f"no such node: {args.name!r}", file=sys.stderr)
return 1
client = NodeClient(entry["url"], entry["token"])
try:
result = client.ping()
except Exception as exc: # noqa: BLE001 — surface any connection error
print(json.dumps({"ok": False, "error": str(exc)}))
return 1
print(json.dumps({"ok": True, "node": args.name, **_coerce_dict(result)}))
return 0
print(f"unknown node command: {cmd!r}", file=sys.stderr)
return 2
def _coerce_dict(value: Any) -> dict:
return value if isinstance(value, dict) else {"result": value}
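The subcommand tree above relies on a common argparse pattern: each leaf parser stores its handler with `set_defaults(func=...)`, and the caller runs `args.func(args)`. The sketch below shows that pattern on a reduced tree. The root parser, the `meet` level, and `handle_approve` are hypothetical stand-ins for illustration, not the plugin's real wiring.

```python
import argparse


def handle_approve(args: argparse.Namespace) -> int:
    # Hypothetical handler: the real command writes to the node registry.
    print(f"would approve {args.name!r} at {args.url}")
    return 0


def build_parser() -> argparse.ArgumentParser:
    """Build a reduced `hermes meet node approve` tree for illustration."""
    root = argparse.ArgumentParser(prog="hermes")
    top = root.add_subparsers(dest="cmd", required=True)
    meet = top.add_parser("meet")
    meet_sub = meet.add_subparsers(dest="meet_cmd", required=True)
    node = meet_sub.add_parser("node")
    node_sub = node.add_subparsers(dest="node_cmd", required=True)
    approve = node_sub.add_parser("approve")
    approve.add_argument("name")
    approve.add_argument("url")
    approve.add_argument("token")
    # The leaf parser carries its own handler; dispatch is args.func(args).
    approve.set_defaults(func=handle_approve)
    return root


if __name__ == "__main__":
    parsed = build_parser().parse_args(
        ["meet", "node", "approve", "mac", "ws://mac.local:18789", "s3cret"]
    )
    print("exit code:", parsed.func(parsed))
```

Returning an integer from the handler keeps the exit code decision in one place, which is the same convention `node_command` follows.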
+107
@@ -0,0 +1,107 @@
"""Gateway-side RPC client for a remote meet node.
Each call opens a short-lived synchronous WebSocket to the node, sends
exactly one request, reads exactly one response, and closes. This keeps
the client trivial to use from non-async tool handlers and avoids
maintaining persistent connection state across agent turns.
The ``websockets`` package is an optional dependency; we import it lazily
so plugin load doesn't require it.
"""
from __future__ import annotations
from typing import Any, Dict, Optional
from plugins.google_meet.node import protocol as _proto
class NodeClient:
"""Thin synchronous WS client matching the server's request surface."""
def __init__(self, url: str, token: str, timeout: float = 10.0) -> None:
if not isinstance(url, str) or not url:
raise ValueError("url must be a non-empty string")
if not isinstance(token, str) or not token:
raise ValueError("token must be a non-empty string")
self.url = url
self.token = token
self.timeout = float(timeout)
# ----- core RPC -----------------------------------------------------
def _rpc(self, type: str, payload: Dict[str, Any]) -> Dict[str, Any]:
"""Send one request, return the response payload dict.
Raises RuntimeError when the server sends an ``error`` envelope
or the response id doesn't match.
"""
try:
from websockets.sync.client import connect # type: ignore
except ImportError as exc:
raise RuntimeError(
"NodeClient requires the 'websockets' package. "
"Install it with: pip install websockets"
) from exc
req = _proto.make_request(type, self.token, payload)
raw_out = _proto.encode(req)
with connect(self.url, open_timeout=self.timeout,
close_timeout=self.timeout) as ws:
ws.send(raw_out)
raw_in = ws.recv(timeout=self.timeout)
if isinstance(raw_in, (bytes, bytearray)):
raw_in = raw_in.decode("utf-8")
resp = _proto.decode(raw_in)
if resp.get("type") == "error":
raise RuntimeError(f"node error: {resp.get('error', '<unknown>')}")
if resp.get("id") != req["id"]:
raise RuntimeError(
f"response id mismatch: sent {req['id']}, got {resp.get('id')!r}"
)
payload_out = resp.get("payload")
if not isinstance(payload_out, dict):
# Ping returns {"type": "pong", "payload": {...}} — still a dict.
raise RuntimeError("response missing payload dict")
return payload_out
# ----- convenience methods -----------------------------------------
def start_bot(
self,
url: str,
guest_name: str = "Hermes Agent",
duration: Optional[str] = None,
headed: bool = False,
mode: str = "transcribe",
) -> Dict[str, Any]:
payload: Dict[str, Any] = {
"url": url,
"guest_name": guest_name,
"headed": bool(headed),
"mode": mode,
}
if duration is not None:
payload["duration"] = duration
return self._rpc("start_bot", payload)
def stop(self) -> Dict[str, Any]:
return self._rpc("stop", {})
def status(self) -> Dict[str, Any]:
return self._rpc("status", {})
def transcript(self, last: Optional[int] = None) -> Dict[str, Any]:
payload: Dict[str, Any] = {}
if last is not None:
payload["last"] = int(last)
return self._rpc("transcript", payload)
def say(self, text: str) -> Dict[str, Any]:
return self._rpc("say", {"text": str(text)})
def ping(self) -> Dict[str, Any]:
return self._rpc("ping", {})
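The reply handling inside `_rpc` can be read as three checks applied in order: error envelope, id correlation, payload shape. The sketch below isolates those checks in a pure function so they can be exercised without a WebSocket. The name `unwrap_response` is hypothetical and does not exist in the plugin.

```python
from typing import Any, Dict


def unwrap_response(request: Dict[str, Any], response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the reply payload, or raise RuntimeError (same order of checks as _rpc)."""
    # 1. An error envelope short-circuits everything else.
    if response.get("type") == "error":
        raise RuntimeError(f"node error: {response.get('error', '<unknown>')}")
    # 2. The reply must answer the request we actually sent.
    if response.get("id") != request["id"]:
        raise RuntimeError(
            f"response id mismatch: sent {request['id']}, got {response.get('id')!r}"
        )
    # 3. Success envelopes always carry a dict payload.
    payload = response.get("payload")
    if not isinstance(payload, dict):
        raise RuntimeError("response missing payload dict")
    return payload


if __name__ == "__main__":
    req = {"type": "ping", "id": "abc", "token": "s3cret", "payload": {}}
    ok = {"type": "pong", "id": "abc", "payload": {"display_name": "mac"}}
    print(unwrap_response(req, ok))
```

Because the error check runs first, an error envelope with an empty id (as sent for undecodable requests) surfaces as a node error rather than as an id mismatch.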
+124
@@ -0,0 +1,124 @@
"""Wire protocol for gateway ↔ node RPC.
Everything is a JSON object with the same envelope shape:
Request: {"type": <str>, "id": <str>, "token": <str>, "payload": <dict>}
Response: {"type": "response", "id": <req-id>, "payload": <dict>}
Pong: {"type": "pong", "id": <req-id>, "payload": <dict>}
Error: {"type": "error", "id": <req-id>, "error": <str>}
Requests must carry the shared bearer token (set up via
``hermes meet node approve`` on the gateway and read off disk on the
server). Mismatched tokens are rejected before dispatch.
"""
from __future__ import annotations
import json
import uuid
from typing import Any, Dict, Tuple
VALID_REQUEST_TYPES = frozenset({
"start_bot",
"stop",
"status",
"transcript",
"say",
"ping",
})
def make_request(
type: str,
token: str,
payload: Dict[str, Any],
req_id: str | None = None,
) -> Dict[str, Any]:
"""Construct a request envelope.
``req_id`` is auto-generated (uuid4 hex) when not supplied so callers
can correlate async responses.
"""
if not isinstance(type, str) or not type:
raise ValueError("type must be a non-empty string")
if type not in VALID_REQUEST_TYPES:
raise ValueError(f"unknown request type: {type!r}")
if not isinstance(token, str):
raise ValueError("token must be a string")
if not isinstance(payload, dict):
raise ValueError("payload must be a dict")
return {
"type": type,
"id": req_id or uuid.uuid4().hex,
"token": token,
"payload": payload,
}
def make_response(req_id: str, payload: Dict[str, Any]) -> Dict[str, Any]:
"""Build a success response envelope.
The envelope type is always the generic ``"response"``; the request
type is not echoed back, so clients correlate replies by ``id``.
"""
if not isinstance(payload, dict):
raise ValueError("payload must be a dict")
return {"type": "response", "id": req_id, "payload": payload}
def make_error(req_id: str, error: str) -> Dict[str, Any]:
return {"type": "error", "id": req_id, "error": str(error)}
def encode(msg: Dict[str, Any]) -> str:
"""Serialize a message envelope to a JSON string."""
return json.dumps(msg, separators=(",", ":"), ensure_ascii=False)
def decode(raw: str) -> Dict[str, Any]:
"""Parse a JSON envelope, raising ValueError on anything malformed.
Minimal type validation: must be an object, must contain ``type`` and
``id``. Heavier validation (token match, payload shape) happens in
:func:`validate_request` on the server side.
"""
try:
obj = json.loads(raw)
except (TypeError, json.JSONDecodeError) as exc:
raise ValueError(f"malformed JSON: {exc}") from exc
if not isinstance(obj, dict):
raise ValueError("envelope must be a JSON object")
if "type" not in obj or not isinstance(obj["type"], str):
raise ValueError("envelope missing string 'type'")
if "id" not in obj or not isinstance(obj["id"], str):
raise ValueError("envelope missing string 'id'")
return obj
def validate_request(msg: Dict[str, Any], expected_token: str) -> Tuple[bool, str]:
"""Check a decoded request against the server's shared token.
Returns ``(True, "")`` when the envelope is acceptable or
``(False, <reason>)`` otherwise. Reason strings are safe to surface
back to the client in an error envelope.
"""
if not isinstance(msg, dict):
return False, "envelope must be a dict"
t = msg.get("type")
if not isinstance(t, str) or not t:
return False, "missing or non-string 'type'"
if t not in VALID_REQUEST_TYPES:
return False, f"unknown request type: {t!r}"
if not isinstance(msg.get("id"), str) or not msg.get("id"):
return False, "missing or non-string 'id'"
token = msg.get("token")
if not isinstance(token, str) or not token:
return False, "missing token"
if token != expected_token:
return False, "token mismatch"
payload = msg.get("payload")
if not isinstance(payload, dict):
return False, "payload must be a dict"
return True, ""
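A request's trip over the wire, as described by the envelope helpers above, is: build the envelope, serialize it to compact JSON, parse it on the other side, then compare the bearer token. The sketch below re-creates that path with simplified helpers. `make_request` here skips the type whitelist and `check_token` covers only the token part of `validate_request`; both are reduced stand-ins, not the module's functions.

```python
import json
import uuid
from typing import Any, Dict, Tuple


def make_request(type_: str, token: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Simplified envelope builder (no request-type whitelist)."""
    return {"type": type_, "id": uuid.uuid4().hex, "token": token, "payload": payload}


def check_token(msg: Dict[str, Any], expected: str) -> Tuple[bool, str]:
    """Token portion of request validation: (ok, reason)."""
    token = msg.get("token")
    if not isinstance(token, str) or not token:
        return False, "missing token"
    if token != expected:
        return False, "token mismatch"
    return True, ""


# Gateway side: build and serialize with compact separators.
wire = json.dumps(make_request("ping", "s3cret", {}), separators=(",", ":"))

# Node side: parse and authenticate before any dispatch happens.
decoded = json.loads(wire)
print(check_token(decoded, "s3cret"))
print(check_token(decoded, "wrong"))
```

The reason string is what the server would place in an error envelope, which is why it avoids echoing the expected token.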
+112
@@ -0,0 +1,112 @@
"""Local JSON registry of approved remote meet nodes.
Lives at ``$HERMES_HOME/workspace/meetings/nodes.json``. The gateway
consults it to resolve a ``chrome_node`` name to a ``(url, token)`` pair
before opening a WebSocket to the remote bot host.
Schema
------
{
"nodes": {
"<name>": {
"url": "ws://host:port",
"token": "...",
"added_at": <epoch_float>
}
}
}
"""
from __future__ import annotations
import json
import time
from pathlib import Path
from typing import Any, Dict, List, Optional
from hermes_constants import get_hermes_home
def _default_path() -> Path:
return Path(get_hermes_home()) / "workspace" / "meetings" / "nodes.json"
class NodeRegistry:
"""Simple file-backed registry. Not safe for concurrent writers across
processes; a single writer (the gateway CLI) is assumed."""
def __init__(self, path: Optional[Path] = None) -> None:
self.path = Path(path) if path is not None else _default_path()
# ----- storage ------------------------------------------------------
def _load(self) -> Dict[str, Any]:
if not self.path.is_file():
return {"nodes": {}}
try:
data = json.loads(self.path.read_text(encoding="utf-8"))
except (OSError, json.JSONDecodeError):
return {"nodes": {}}
if not isinstance(data, dict) or not isinstance(data.get("nodes"), dict):
return {"nodes": {}}
return data
def _save(self, data: Dict[str, Any]) -> None:
self.path.parent.mkdir(parents=True, exist_ok=True)
tmp = self.path.with_suffix(".json.tmp")
tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
tmp.replace(self.path)
# ----- public API ---------------------------------------------------
def get(self, name: str) -> Optional[Dict[str, Any]]:
data = self._load()
entry = data["nodes"].get(name)
if entry is None:
return None
return {"name": name, **entry}
def add(self, name: str, url: str, token: str) -> None:
if not isinstance(name, str) or not name:
raise ValueError("node name must be a non-empty string")
if not isinstance(url, str) or not url:
raise ValueError("url must be a non-empty string")
if not isinstance(token, str) or not token:
raise ValueError("token must be a non-empty string")
data = self._load()
data["nodes"][name] = {
"url": url,
"token": token,
"added_at": time.time(),
}
self._save(data)
def remove(self, name: str) -> bool:
data = self._load()
if name in data["nodes"]:
del data["nodes"][name]
self._save(data)
return True
return False
def list_all(self) -> List[Dict[str, Any]]:
data = self._load()
out: List[Dict[str, Any]] = []
for name, entry in sorted(data["nodes"].items()):
out.append({"name": name, **entry})
return out
def resolve(self, chrome_node: Optional[str]) -> Optional[Dict[str, Any]]:
"""Resolve a node name to its entry.
If ``chrome_node`` is provided, return that named node (or None).
If ``chrome_node`` is None, return the sole registered node when
exactly one is registered; otherwise return None (ambiguous or
empty).
"""
if chrome_node:
return self.get(chrome_node)
nodes = self.list_all()
if len(nodes) == 1:
return nodes[0]
return None
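Two ideas in the registry are worth seeing side by side: writes go to a temporary file that is then renamed over the target, and `resolve` falls back to the sole registered node only when the choice is unambiguous. The sketch below shows both on a plain dict. The function names and the temporary directory are assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


def save_atomic(path: Path, data: Dict[str, Any]) -> None:
    """Write to a sibling temp file, then rename it over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
    tmp.replace(path)  # readers see either the old file or the new one


def resolve(nodes: Dict[str, Dict[str, Any]], name: Optional[str]) -> Optional[Dict[str, Any]]:
    """Named lookup, or the only node when exactly one is registered."""
    if name:
        entry = nodes.get(name)
        return {"name": name, **entry} if entry is not None else None
    if len(nodes) == 1:
        only, entry = next(iter(nodes.items()))
        return {"name": only, **entry}
    return None  # empty or ambiguous


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        target = Path(tmpdir) / "meetings" / "nodes.json"
        save_atomic(target, {"nodes": {"mac": {"url": "ws://mac.local:18789", "token": "t"}}})
        stored = json.loads(target.read_text(encoding="utf-8"))["nodes"]
        print(resolve(stored, None))
```

With two or more nodes registered, a caller has to pass a name; the fallback never guesses.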
+193
@@ -0,0 +1,193 @@
"""Remote node server.
Runs on the machine that will host the Meet bot (typically the user's
Mac laptop with a signed-in Chrome). Exposes a WebSocket endpoint that
accepts token-authenticated RPC requests and dispatches them to the existing
``plugins.google_meet.process_manager`` module.
Launched by ``hermes meet node run``.
Token handling
--------------
On first boot we mint a 32-hex-character token (128 bits of entropy) and persist it at
``$HERMES_HOME/workspace/meetings/node_token.json``. Subsequent boots
reuse the same token so previously-approved gateways don't need to be
re-paired. The operator copies this token out-of-band to the gateway
via ``hermes meet node approve <name> <url> <token>``.
Dependencies
------------
``websockets`` is an optional dep. We import it lazily inside
:meth:`serve` so installing the plugin doesn't require it unless you
actually host a node.
"""
from __future__ import annotations
import json
import secrets
import time
from pathlib import Path
from typing import Any, Dict, Optional
from hermes_constants import get_hermes_home
from plugins.google_meet.node import protocol as _proto
def _default_token_path() -> Path:
return Path(get_hermes_home()) / "workspace" / "meetings" / "node_token.json"
class NodeServer:
"""WebSocket server that executes meet bot RPCs locally."""
def __init__(
self,
host: str = "0.0.0.0",
port: int = 18789,
token_path: Optional[Path] = None,
display_name: str = "hermes-meet-node",
) -> None:
self.host = host
self.port = port
self.display_name = display_name
self.token_path = Path(token_path) if token_path is not None else _default_token_path()
self._token: Optional[str] = None
# ----- token management --------------------------------------------
def ensure_token(self) -> str:
"""Return the persisted shared secret, generating one on first use."""
if self._token:
return self._token
if self.token_path.is_file():
try:
data = json.loads(self.token_path.read_text(encoding="utf-8"))
tok = data.get("token")
if isinstance(tok, str) and tok:
self._token = tok
return tok
except (OSError, json.JSONDecodeError):
pass
tok = secrets.token_hex(16) # 32 hex chars
self.token_path.parent.mkdir(parents=True, exist_ok=True)
tmp = self.token_path.with_suffix(".json.tmp")
tmp.write_text(
json.dumps({"token": tok, "generated_at": time.time()}, indent=2),
encoding="utf-8",
)
tmp.replace(self.token_path)
self._token = tok
return tok
def get_token(self) -> str:
"""Alias for :meth:`ensure_token`; does not mutate on subsequent calls."""
return self.ensure_token()
# ----- dispatch -----------------------------------------------------
async def _handle_request(self, msg: Dict[str, Any]) -> Dict[str, Any]:
"""Validate + dispatch a single decoded request envelope.
Always returns a response envelope (success or error); never
raises. Failures reported by the process_manager stay in the
response payload's ``ok``/``error`` keys (which pm already sets)
rather than being re-encoded as error envelopes; the envelope-level
error channel is reserved for auth / protocol failures and for
unexpected exceptions raised during dispatch.
"""
expected = self.ensure_token()
ok, reason = _proto.validate_request(msg, expected)
if not ok:
return _proto.make_error(str(msg.get("id") or ""), reason)
req_id = msg["id"]
t = msg["type"]
payload = msg["payload"]
# Import lazily so test mocks can monkeypatch freely.
from plugins.google_meet import process_manager as pm
try:
if t == "ping":
return {"type": "pong", "id": req_id,
"payload": {"display_name": self.display_name,
"ts": time.time()}}
if t == "start_bot":
# Whitelist kwargs we pass through to pm.start.
kwargs = {
k: payload[k]
for k in ("url", "guest_name", "duration", "headed",
"auth_state", "session_id", "out_dir", "mode")
if k in payload
}
if "url" not in kwargs:
return _proto.make_error(req_id, "missing 'url' in payload")
result = pm.start(**kwargs)
return _proto.make_response(req_id, result)
if t == "stop":
reason_arg = payload.get("reason", "requested")
result = pm.stop(reason=reason_arg)
return _proto.make_response(req_id, result)
if t == "status":
return _proto.make_response(req_id, pm.status())
if t == "transcript":
last = payload.get("last")
result = pm.transcript(last=last)
return _proto.make_response(req_id, result)
if t == "say":
# v2 wiring: enqueue into say_queue.jsonl inside the
# active meeting's out_dir when present. The bot-side
# consumer is v3+ (for v1 this is a stub returning ok).
text = payload.get("text", "")
active = pm._read_active() # type: ignore[attr-defined]
enqueued = False
if active and active.get("out_dir"):
queue = Path(active["out_dir"]) / "say_queue.jsonl"
try:
queue.parent.mkdir(parents=True, exist_ok=True)
with queue.open("a", encoding="utf-8") as fh:
fh.write(json.dumps({"text": text, "ts": time.time()}) + "\n")
enqueued = True
except OSError:
enqueued = False
return _proto.make_response(
req_id,
{"ok": True, "enqueued": enqueued, "text": text},
)
except Exception as exc: # noqa: BLE001 — surface any pm crash to client
return _proto.make_error(req_id, f"{type(exc).__name__}: {exc}")
return _proto.make_error(req_id, f"unhandled type: {t!r}")
# ----- server loop --------------------------------------------------
async def serve(self) -> None:
"""Run the WebSocket server until cancelled.
Blocks forever. Callers typically wrap this in ``asyncio.run``.
"""
try:
import websockets # type: ignore
except ImportError as exc:
raise RuntimeError(
"NodeServer.serve requires the 'websockets' package. "
"Install it with: pip install websockets"
) from exc
self.ensure_token()
async def _handler(ws):
async for raw in ws:
try:
msg = _proto.decode(raw if isinstance(raw, str) else raw.decode("utf-8"))
except ValueError as exc:
await ws.send(_proto.encode(_proto.make_error("", f"decode: {exc}")))
continue
reply = await self._handle_request(msg)
await ws.send(_proto.encode(reply))
async with websockets.serve(_handler, self.host, self.port):
# Run until cancelled.
import asyncio
await asyncio.Future()
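The token lifecycle described in the module docstring (mint once, persist, reuse on later boots) can be sketched without the server class. The function below takes the token path as a parameter and writes the file directly; the path used in the example is a temporary directory chosen for illustration, and the function name only mirrors the method above for readability.

```python
import json
import secrets
import tempfile
import time
from pathlib import Path


def ensure_token(token_path: Path) -> str:
    """Return the persisted token, minting and saving one if none is usable."""
    if token_path.is_file():
        try:
            data = json.loads(token_path.read_text(encoding="utf-8"))
            tok = data.get("token") if isinstance(data, dict) else None
            if isinstance(tok, str) and tok:
                return tok
        except (OSError, json.JSONDecodeError):
            pass  # unreadable or corrupt file: fall through and mint a new one
    tok = secrets.token_hex(16)  # 16 random bytes, rendered as 32 hex chars
    token_path.parent.mkdir(parents=True, exist_ok=True)
    token_path.write_text(
        json.dumps({"token": tok, "generated_at": time.time()}, indent=2),
        encoding="utf-8",
    )
    return tok


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        path = Path(tmpdir) / "meetings" / "node_token.json"
        first = ensure_token(path)
        second = ensure_token(path)
        print(len(first), first == second)
```

Reusing the stored token is what lets a gateway that was approved earlier keep working after the node restarts.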
+16
@@ -0,0 +1,16 @@
name: google_meet
version: 0.2.0
description: "Join a Google Meet call, transcribe live captions, speak in realtime, and follow up afterwards. v1 transcribe-only is the default; v2 realtime duplex audio via OpenAI Realtime + BlackHole/PulseAudio ships with mode='realtime'; v3 remote node host lets the bot run on a different machine than the gateway (gateway on Linux, Chrome+signed-in profile on the user's Mac). Explicit-by-design: only joins meet.google.com URLs passed in \u2014 no calendar scanning, no auto-dial."
author: NousResearch
kind: standalone
platforms:
- linux
- macos
provides_tools:
- meet_join
- meet_leave
- meet_status
- meet_transcript
- meet_say
hooks:
- on_session_end
+326
@@ -0,0 +1,326 @@
"""Subprocess lifecycle manager for the google_meet bot.
Single active meeting at a time. Stores the running pid + out_dir in a
session-scoped state file under ``$HERMES_HOME/workspace/meetings/.active.json``
so tool calls across turns can find the bot, and ``on_session_end`` can clean
it up.
The bot runs as a detached subprocess; we don't hold file descriptors open,
so the parent agent loop can't block on it. We communicate via files only.
"""
from __future__ import annotations
import json
import os
import signal
import subprocess
import sys
import time
from pathlib import Path
from typing import Any, Dict, Optional
from hermes_constants import get_hermes_home
# File + directory layout (under $HERMES_HOME):
#
# workspace/meetings/
# .active.json # pointer to current session's bot
# <meeting-id>/
# status.json # live bot state (written by bot each tick)
# transcript.txt # scraped captions
#
# .active.json holds:
# {"pid": 12345, "meeting_id": "abc-defg-hij", "out_dir": "...",
# "url": "https://meet.google.com/...", "started_at": 1714159200.0,
# "session_id": "optional"}
def _root() -> Path:
return Path(get_hermes_home()) / "workspace" / "meetings"
def _active_file() -> Path:
return _root() / ".active.json"
def _read_active() -> Optional[Dict[str, Any]]:
p = _active_file()
if not p.is_file():
return None
try:
return json.loads(p.read_text(encoding="utf-8"))
except Exception:
return None
def _write_active(data: Dict[str, Any]) -> None:
p = _active_file()
p.parent.mkdir(parents=True, exist_ok=True)
tmp = p.with_suffix(".json.tmp")
tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
tmp.replace(p)
def _clear_active() -> None:
try:
_active_file().unlink()
except FileNotFoundError:
pass
def _pid_alive(pid: int) -> bool:
try:
os.kill(pid, 0)
except ProcessLookupError:
return False
except PermissionError:
# Process exists but we can't signal it — treat as alive.
return True
return True
# ---------------------------------------------------------------------------
# Public API — used by tool handlers + CLI
# ---------------------------------------------------------------------------
def start(
url: str,
*,
out_dir: Optional[Path] = None,
headed: bool = False,
auth_state: Optional[str] = None,
guest_name: str = "Hermes Agent",
duration: Optional[str] = None,
session_id: Optional[str] = None,
mode: str = "transcribe",
realtime_model: Optional[str] = None,
realtime_voice: Optional[str] = None,
realtime_instructions: Optional[str] = None,
realtime_api_key: Optional[str] = None,
) -> Dict[str, Any]:
"""Spawn the meet_bot subprocess for *url*.
If a bot is already running for this hermes install, stop it first;
we enforce single-active-meeting semantics.
Returns a dict summarizing the started bot.
"""
from plugins.google_meet.meet_bot import _is_safe_meet_url, _meeting_id_from_url
if not _is_safe_meet_url(url):
return {
"ok": False,
"error": (
"refusing: only https://meet.google.com/ URLs are allowed. "
"got: " + repr(url)
),
}
existing = _read_active()
if existing and _pid_alive(int(existing.get("pid", 0))):
stop(reason="replaced by new meet_join")
meeting_id = _meeting_id_from_url(url)
out = out_dir or (_root() / meeting_id)
out.mkdir(parents=True, exist_ok=True)
# Wipe any stale transcript/status files from a previous run of this
# meeting id so polling isn't confused.
for name in ("transcript.txt", "status.json"):
f = out / name
if f.exists():
try:
f.unlink()
except OSError:
pass
env = os.environ.copy()
env["HERMES_MEET_URL"] = url
env["HERMES_MEET_OUT_DIR"] = str(out)
env["HERMES_MEET_GUEST_NAME"] = guest_name
if headed:
env["HERMES_MEET_HEADED"] = "1"
if auth_state:
env["HERMES_MEET_AUTH_STATE"] = auth_state
if duration:
env["HERMES_MEET_DURATION"] = duration
# v2: realtime mode + passthroughs. The bot defaults to transcribe
# mode if HERMES_MEET_MODE isn't set, matching v1 behavior.
if mode:
env["HERMES_MEET_MODE"] = mode
if realtime_model:
env["HERMES_MEET_REALTIME_MODEL"] = realtime_model
if realtime_voice:
env["HERMES_MEET_REALTIME_VOICE"] = realtime_voice
if realtime_instructions:
env["HERMES_MEET_REALTIME_INSTRUCTIONS"] = realtime_instructions
if realtime_api_key:
env["HERMES_MEET_REALTIME_KEY"] = realtime_api_key
log_path = out / "bot.log"
# Detach: stdin=devnull, stdout/stderr → log file, new session so parent
# signals don't propagate.
log_fh = open(log_path, "ab", buffering=0)
try:
proc = subprocess.Popen(
[sys.executable, "-m", "plugins.google_meet.meet_bot"],
stdin=subprocess.DEVNULL,
stdout=log_fh,
stderr=subprocess.STDOUT,
env=env,
start_new_session=True,
close_fds=True,
)
finally:
# The subprocess now owns the log fd; we can close ours.
log_fh.close()
record = {
"pid": proc.pid,
"meeting_id": meeting_id,
"out_dir": str(out),
"url": url,
"started_at": time.time(),
"session_id": session_id,
"log_path": str(log_path),
"mode": mode,
}
_write_active(record)
return {"ok": True, **record}
def status() -> Dict[str, Any]:
"""Return the current meeting state, or ``{"ok": False, "reason": ...}``."""
active = _read_active()
if not active:
return {"ok": False, "reason": "no active meeting"}
pid = int(active.get("pid", 0))
alive = _pid_alive(pid) if pid else False
status_path = Path(active.get("out_dir", "")) / "status.json"
bot_status: Dict[str, Any] = {}
if status_path.is_file():
try:
bot_status = json.loads(status_path.read_text(encoding="utf-8"))
except Exception:
pass
return {
"ok": True,
"alive": alive,
"pid": pid,
"meetingId": active.get("meeting_id"),
"url": active.get("url"),
"startedAt": active.get("started_at"),
"outDir": active.get("out_dir"),
**bot_status,
}
def transcript(last: Optional[int] = None) -> Dict[str, Any]:
"""Read the active meeting's transcript. Returns ok=False when no meeting is
active; a missing transcript file yields ok=True with an empty ``lines`` list."""
active = _read_active()
if not active:
return {"ok": False, "reason": "no active meeting"}
tp = Path(active.get("out_dir", "")) / "transcript.txt"
if not tp.is_file():
return {
"ok": True,
"meetingId": active.get("meeting_id"),
"lines": [],
"total": 0,
"path": str(tp),
}
text = tp.read_text(encoding="utf-8", errors="replace")
all_lines = [ln for ln in text.splitlines() if ln.strip()]
lines = all_lines[-last:] if last else all_lines
return {
"ok": True,
"meetingId": active.get("meeting_id"),
"lines": lines,
"total": len(all_lines),
"path": str(tp),
}
def enqueue_say(text: str) -> Dict[str, Any]:
"""Append a ``say`` request to the active bot's JSONL queue.
Returns ``{"ok": False, "reason": ...}`` when no meeting is active or
the active bot is in transcribe-only mode. Otherwise writes a line to
``<out_dir>/say_queue.jsonl`` that the bot's realtime speaker thread
will consume.
"""
import uuid
text = (text or "").strip()
if not text:
return {"ok": False, "reason": "text is required"}
active = _read_active()
if not active:
return {"ok": False, "reason": "no active meeting"}
if active.get("mode") != "realtime":
return {
"ok": False,
"reason": (
"active meeting is in transcribe mode — pass mode='realtime' "
"to meet_join to enable agent speech"
),
}
out_dir = Path(active.get("out_dir", ""))
if not out_dir.is_dir():
return {"ok": False, "reason": f"out_dir missing: {out_dir}"}
queue_path = out_dir / "say_queue.jsonl"
entry = {"id": uuid.uuid4().hex[:12], "text": text}
with queue_path.open("a", encoding="utf-8") as f:
f.write(json.dumps(entry) + "\n")
return {
"ok": True,
"meetingId": active.get("meeting_id"),
"enqueued_id": entry["id"],
"queue_path": str(queue_path),
}
def stop(*, reason: str = "requested") -> Dict[str, Any]:
"""Signal the active bot to leave cleanly, then clear the active pointer.
Sends SIGTERM and waits up to 10s for the bot to exit. Falls back to
SIGKILL if the bot doesn't respond.
"""
active = _read_active()
if not active:
return {"ok": False, "reason": "no active meeting"}
pid = int(active.get("pid", 0))
out_dir = active.get("out_dir")
transcript_path = Path(out_dir) / "transcript.txt" if out_dir else None
if pid and _pid_alive(pid):
try:
os.kill(pid, signal.SIGTERM)
except ProcessLookupError:
pass
for _ in range(20):
if not _pid_alive(pid):
break
time.sleep(0.5)
if _pid_alive(pid):
try:
os.kill(pid, signal.SIGKILL)
except ProcessLookupError:
pass
_clear_active()
return {
"ok": True,
"reason": reason,
"meetingId": active.get("meeting_id"),
"transcriptPath": str(transcript_path) if transcript_path else None,
}
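The liveness probe that `status` and `stop` depend on sends signal 0, which checks whether a pid exists without delivering anything to it. The sketch below repeats that probe as a standalone function and tries it on the current process and on a child that has already exited. It assumes a POSIX system (the plugin manifest lists Linux and macOS), and the child is reaped with `wait()` first because an exited but unreaped child still accepts signal 0.

```python
import os
import subprocess
import sys


def pid_alive(pid: int) -> bool:
    """Probe a pid with signal 0; nothing is delivered to the target."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # exists, but owned by someone else
    return True


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()  # reap the child so its pid is actually released
    print("self:", pid_alive(os.getpid()))
    print("exited child:", pid_alive(child.pid))
```

A pid can in principle be reused by an unrelated process, so a positive result means only that some process with that id exists.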
+10
@@ -0,0 +1,10 @@
"""Realtime speech subpackage for the google_meet plugin (v2).
Provides a thin OpenAI Realtime API client and a file-queue speaker
wrapper so the Meet bot can play synthesized speech through the
virtual audio bridge.
"""
from .openai_client import RealtimeSession, RealtimeSpeaker # noqa: F401
__all__ = ["RealtimeSession", "RealtimeSpeaker"]
@@ -0,0 +1,332 @@
"""OpenAI Realtime API WebSocket client + file-queue speaker.
This module is the "output" side of the v2 voice bridge: it takes text,
sends it to the OpenAI Realtime API, receives audio deltas back, and
appends the PCM bytes to a file. A separate consumer (the audio
bridge) streams that file into Chrome's fake microphone.
Designed for simplicity: a single synchronous WebSocket connection per
speaker, per session. The ``websockets`` package is imported lazily so
that importing this module never fails just because the optional dep
is missing.
"""
from __future__ import annotations
import base64
import json
import time
import uuid
from pathlib import Path
from typing import Any, Callable, Optional
REALTIME_URL = "wss://api.openai.com/v1/realtime"
def _require_websockets():
"""Import ``websockets.sync.client.connect`` or raise with hint."""
try:
from websockets.sync.client import connect as _connect # type: ignore
except ImportError as exc: # pragma: no cover - exercised via test
raise RuntimeError(
"websockets package is required for OpenAI Realtime; "
"install with: pip install websockets"
) from exc
return _connect
class RealtimeSession:
"""Minimal sync client for the OpenAI Realtime WebSocket API.
Usage:
sess = RealtimeSession(api_key=..., audio_sink_path=Path("out.pcm"))
sess.connect()
sess.speak("Hello team.")
sess.close()
Thread safety: ``speak`` and ``cancel_response`` may be called from
different threads; a lock serializes WebSocket writes.
"""
def __init__(
self,
api_key: str,
model: str = "gpt-realtime",
voice: str = "alloy",
instructions: str = "",
audio_sink_path: Optional[Path] = None,
sample_rate: int = 24000,
) -> None:
import threading as _threading
self.api_key = api_key
self.model = model
self.voice = voice
self.instructions = instructions
self.audio_sink_path = Path(audio_sink_path) if audio_sink_path else None
self.sample_rate = sample_rate
self._ws: Any = None
self._send_lock = _threading.Lock()
self._last_response_id: Optional[str] = None
# Public counters for status reporting.
self.audio_bytes_out: int = 0
self.last_audio_out_at: Optional[float] = None
# ── lifecycle ─────────────────────────────────────────────────────────
def connect(self) -> None:
"""Open WS and send session.update with voice+instructions."""
connect = _require_websockets()
url = f"{REALTIME_URL}?model={self.model}"
headers = [
("Authorization", f"Bearer {self.api_key}"),
("OpenAI-Beta", "realtime=v1"),
]
# websockets.sync.client.connect accepts either additional_headers=
# (newer) or extra_headers= depending on version; try the newer
# name first and fall back.
try:
self._ws = connect(url, additional_headers=headers)
except TypeError:
self._ws = connect(url, extra_headers=headers)
self._send_json(
{
"type": "session.update",
"session": {
"voice": self.voice,
"instructions": self.instructions,
"modalities": ["audio", "text"],
"output_audio_format": "pcm16",
"input_audio_format": "pcm16",
},
}
)
def close(self) -> None:
if self._ws is not None:
try:
self._ws.close()
except Exception:
pass
self._ws = None
# ── speaking ──────────────────────────────────────────────────────────
def speak(self, text: str, timeout: float = 30.0) -> dict:
"""Send ``text`` and accumulate the audio response.
Audio deltas are base64-decoded and appended to
``audio_sink_path`` (opened 'ab' and closed per call, so a
separate streaming reader can consume whatever is there).
"""
if self._ws is None:
raise RuntimeError("RealtimeSession.connect() must be called first")
start = time.monotonic()
self._send_json(
{
"type": "conversation.item.create",
"item": {
"type": "message",
"role": "user",
"content": [{"type": "input_text", "text": text}],
},
}
)
self._send_json(
{
"type": "response.create",
"response": {"modalities": ["audio"]},
}
)
bytes_written = 0
sink_fp = None
if self.audio_sink_path is not None:
self.audio_sink_path.parent.mkdir(parents=True, exist_ok=True)
sink_fp = open(self.audio_sink_path, "ab")
try:
while True:
remaining = timeout - (time.monotonic() - start)
if remaining <= 0:
raise TimeoutError(
f"realtime response did not complete within {timeout}s"
)
raw = self._recv(timeout=remaining)
if raw is None:
# Connection closed by peer.
break
try:
frame = json.loads(raw) if isinstance(raw, (str, bytes, bytearray)) else raw
except (TypeError, ValueError):
continue
if not isinstance(frame, dict):
continue
ftype = frame.get("type")
if ftype == "response.audio.delta":
b64 = frame.get("delta") or frame.get("audio") or ""
if b64 and sink_fp is not None:
try:
chunk = base64.b64decode(b64)
except (ValueError, TypeError):
chunk = b""
if chunk:
sink_fp.write(chunk)
sink_fp.flush()
bytes_written += len(chunk)
self.audio_bytes_out += len(chunk)
self.last_audio_out_at = time.time()
elif ftype == "response.created":
rid = (frame.get("response") or {}).get("id")
if rid:
self._last_response_id = rid
elif ftype in ("response.done", "response.completed", "response.cancelled"):
break
elif ftype == "error":
err = frame.get("error") or frame
raise RuntimeError(f"realtime error: {err}")
# All other frames (response.output_item.*,
# response.audio_transcript.delta, rate_limits.updated, ...)
# are ignored for v2.
finally:
if sink_fp is not None:
sink_fp.close()
duration_ms = (time.monotonic() - start) * 1000.0
return {
"ok": True,
"bytes_written": bytes_written,
"duration_ms": duration_ms,
}
# ── ws plumbing ───────────────────────────────────────────────────────
def cancel_response(self) -> bool:
"""Interrupt the in-flight response (barge-in).
Sends ``response.cancel`` on the current WebSocket so the model
stops generating audio immediately. Safe to call at any time;
returns True if a cancel was actually sent, False when there's
nothing to cancel or the socket isn't open.
"""
if self._ws is None:
return False
try:
self._send_json({"type": "response.cancel"})
return True
except Exception:
return False
def _send_json(self, payload: dict) -> None:
assert self._ws is not None
with self._send_lock:
self._ws.send(json.dumps(payload))
def _recv(self, timeout: Optional[float] = None):
assert self._ws is not None
try:
if timeout is None:
return self._ws.recv()
return self._ws.recv(timeout=timeout)
except TypeError:
# Older websockets may not accept timeout kwarg.
return self._ws.recv()
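The frame loop in ``speak`` can be exercised without a live socket. The following is a minimal sketch, assuming the frames arrive as JSON strings; ``collect_audio`` and its ``frames`` argument are hypothetical stand-ins for the successive ``recv()`` results and are not part of the plugin.

```python
import base64
import json
import time
from typing import Iterable


def collect_audio(frames: Iterable[str], timeout: float = 30.0) -> bytes:
    """Concatenate PCM bytes from ``response.audio.delta`` frames.

    ``frames`` is a hypothetical stand-in for successive ``recv()`` results.
    """
    deadline = time.monotonic() + timeout
    audio = bytearray()
    for raw in frames:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"response did not complete within {timeout}s")
        try:
            frame = json.loads(raw)
        except (TypeError, ValueError):
            continue  # skip frames that are not valid JSON
        if not isinstance(frame, dict):
            continue
        ftype = frame.get("type")
        if ftype == "response.audio.delta":
            try:
                audio += base64.b64decode(frame.get("delta") or "")
            except (ValueError, TypeError):
                pass  # drop a corrupt chunk, keep the stream going
        elif ftype in ("response.done", "response.cancelled"):
            break  # terminal frame: stop reading
        elif ftype == "error":
            raise RuntimeError(f"realtime error: {frame.get('error')}")
    return bytes(audio)
```

As in ``speak``, unknown frame types are skipped, a terminal frame ends the loop, and an ``error`` frame is raised to the caller rather than swallowed.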
class RealtimeSpeaker:
"""File-based JSONL queue wrapper around :class:`RealtimeSession`.
Each line in ``queue_path`` is a JSON object of the form
``{"id": "<uuid>", "text": "..."}``. Processed lines are appended
to ``processed_path`` (if set) and then removed from the queue;
if ``processed_path`` is ``None``, processed lines are simply
dropped.
"""
def __init__(
self,
session: RealtimeSession,
queue_path: Path,
processed_path: Optional[Path] = None,
) -> None:
self.session = session
self.queue_path = Path(queue_path)
self.processed_path = Path(processed_path) if processed_path else None
# ── helpers ──────────────────────────────────────────────────────────
def _read_queue(self) -> list[dict]:
if not self.queue_path.exists():
return []
out: list[dict] = []
for line in self.queue_path.read_text().splitlines():
line = line.strip()
if not line:
continue
try:
entry = json.loads(line)
except ValueError:
continue
if not isinstance(entry, dict):
continue
if "id" not in entry:
entry["id"] = str(uuid.uuid4())
out.append(entry)
return out
def _rewrite_queue(self, remaining: list[dict]) -> None:
if not remaining:
# Keep the file but empty — consumers may be watching for
# new writes via mtime, and delete-then-recreate is a race.
self.queue_path.write_text("")
return
self.queue_path.write_text(
"\n".join(json.dumps(e) for e in remaining) + "\n"
)
def _append_processed(self, entry: dict, result: dict) -> None:
if self.processed_path is None:
return
self.processed_path.parent.mkdir(parents=True, exist_ok=True)
record = {"id": entry.get("id"), "text": entry.get("text", ""), "result": result}
with open(self.processed_path, "a") as fp:
fp.write(json.dumps(record) + "\n")
# ── main loop ────────────────────────────────────────────────────────
def run_until_stopped(
self,
stop_fn: Callable[[], bool],
poll_interval: float = 0.5,
) -> None:
while not stop_fn():
entries = self._read_queue()
if not entries:
time.sleep(poll_interval)
continue
# Process one at a time; re-check the queue file after each
# speak() call because new entries may have arrived.
head = entries[0]
text = (head.get("text") or "").strip()
if text:
try:
result = self.session.speak(text)
except Exception as exc:
result = {"ok": False, "error": str(exc)}
else:
result = {"ok": True, "bytes_written": 0, "duration_ms": 0.0}
self._append_processed(head, result)
# Re-read the queue from disk in case it was appended to
# while we were speaking, then drop the head.
latest = self._read_queue()
if latest and latest[0].get("id") == head.get("id"):
self._rewrite_queue(latest[1:])
else:
# Fallback: drop-by-id anywhere in the queue.
self._rewrite_queue(
[e for e in latest if e.get("id") != head.get("id")]
)
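The re-read-then-drop step at the end of ``run_until_stopped`` can be sketched in isolation. This is an illustration under stated assumptions, not the plugin's code: ``read_queue``, ``drop_by_id`` and the file name ``say_queue.jsonl`` are hypothetical names chosen for the example.

```python
import json
import tempfile
from pathlib import Path


def read_queue(path: Path) -> list:
    """Parse a JSONL queue file, skipping blank or malformed lines."""
    if not path.exists():
        return []
    entries = []
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except ValueError:
            continue
        if isinstance(entry, dict):
            entries.append(entry)
    return entries


def drop_by_id(path: Path, entry_id: str) -> None:
    """Re-read the queue and rewrite it without ``entry_id``."""
    remaining = [e for e in read_queue(path) if e.get("id") != entry_id]
    path.write_text("".join(json.dumps(e) + "\n" for e in remaining))


with tempfile.TemporaryDirectory() as tmp:
    queue = Path(tmp) / "say_queue.jsonl"  # hypothetical file name
    queue.write_text('{"id": "a", "text": "Hello"}\n')
    head = read_queue(queue)[0]
    # A producer appends while the head entry is being spoken.
    with open(queue, "a") as fp:
        fp.write('{"id": "b", "text": "Second"}\n')
    drop_by_id(queue, head["id"])
    remaining_ids = [e["id"] for e in read_queue(queue)]
```

Dropping by ``id`` only works when the producer writes an ``id`` on each line. The read-then-rewrite is also not atomic: an entry appended between the re-read and the rewrite would be lost in this sketch, and the same window appears to exist in the loop above.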
+348
@@ -0,0 +1,348 @@
"""Agent-facing tools for the google_meet plugin.
Tools:
meet_join join a Google Meet URL (spawns Playwright bot locally
OR on a remote node host via node=<name>)
meet_status report bot liveness + transcript progress
meet_transcript read the current transcript (optional last-N)
meet_leave signal the bot to leave cleanly
meet_say (v2) speak text through the realtime audio bridge.
Requires the active meeting to have been joined with
mode='realtime'.
"""
from __future__ import annotations
import json
from typing import Any, Dict, Optional
from plugins.google_meet import process_manager as pm
# ---------------------------------------------------------------------------
# Runtime gate
# ---------------------------------------------------------------------------
def check_meet_requirements() -> bool:
"""Return True when the plugin can actually run LOCALLY.
Gates on:
* Python ``playwright`` package importable
* the plugin being on a supported platform (Linux or macOS)
Note: remote-node operation (``node=<name>``) only needs the
``websockets`` dep on the gateway side; Chromium lives on the node.
But the plugin-level gate keeps the v1 semantics; individual tool
handlers relax the requirement when a node is addressed.
"""
import platform as _p
if _p.system().lower() not in ("linux", "darwin"):
return False
try:
import playwright # noqa: F401
except ImportError:
return False
return True
# ---------------------------------------------------------------------------
# Node client helper
# ---------------------------------------------------------------------------
def _resolve_node_client(node: Optional[str]):
"""Return (NodeClient, node_name) for *node*, or (None, None) to run local.
Raises RuntimeError with a readable message if the node is named but
unresolvable, so the handler can surface a clear error to the agent.
"""
if node is None or node == "":
return None, None
from plugins.google_meet.node.registry import NodeRegistry
from plugins.google_meet.node.client import NodeClient
reg = NodeRegistry()
entry = reg.resolve(node if node != "auto" else None)
if entry is None:
raise RuntimeError(
f"no registered meet node matches {node!r}; "
"run `hermes meet node approve <name> <url> <token>` first"
)
client = NodeClient(url=entry["url"], token=entry["token"])
return client, entry.get("name")
# ---------------------------------------------------------------------------
# Schemas
# ---------------------------------------------------------------------------
MEET_JOIN_SCHEMA: Dict[str, Any] = {
"name": "meet_join",
"description": (
"Join a Google Meet call and start scraping live captions into a "
"transcript file. Only meet.google.com URLs are accepted; no calendar "
"scanning, no auto-dial. Spawns a headless Chromium subprocess that "
"runs in parallel with the agent loop — returns immediately. Poll "
"with meet_status and read captions with meet_transcript. Reminder "
"to the agent: you should announce yourself in the meeting (there is "
"no automatic consent announcement)."
),
"parameters": {
"type": "object",
"properties": {
"url": {
"type": "string",
"description": (
"Full https://meet.google.com/... URL. Required."
),
},
"mode": {
"type": "string",
"enum": ["transcribe", "realtime"],
"description": (
"transcribe (default): listen-only, scrape captions. "
"realtime: also enable agent speech via meet_say "
"(requires OpenAI Realtime key + platform audio bridge)."
),
},
"guest_name": {
"type": "string",
"description": (
"Display name to use when joining as guest. Defaults to "
"'Hermes Agent'."
),
},
"duration": {
"type": "string",
"description": (
"Optional max duration before auto-leave (e.g. '30m', "
"'2h', '90s'). Omit to stay until meet_leave is called."
),
},
"headed": {
"type": "boolean",
"description": (
"Run Chromium headed instead of headless (debug only). "
"Default false."
),
},
"node": {
"type": "string",
"description": (
"Name of a registered remote node to run the bot on "
"(useful when the gateway runs on a headless Linux box "
"but the user's Chrome with a signed-in Google profile "
"lives on their Mac). Pass 'auto' to use the single "
"registered node. Default: run locally. Nodes are "
"approved via `hermes meet node approve`."
),
},
},
"required": ["url"],
"additionalProperties": False,
},
}
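The ``duration`` parameter accepts strings such as '30m', '2h' and '90s'. The parser itself is not part of this file, so the following is only a sketch of how such a value could be turned into seconds; ``parse_duration`` and ``_UNIT_SECONDS`` are hypothetical names, and the real implementation may accept other forms.

```python
import re
from typing import Optional

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(value: Optional[str]) -> Optional[int]:
    """Convert '30m' / '2h' / '90s' to seconds; None means no limit."""
    if not value:
        return None
    match = re.fullmatch(r"\s*(\d+)\s*([smh])\s*", value.lower())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
```

Returning ``None`` for an omitted value mirrors the schema's "omit to stay until meet_leave is called".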
MEET_STATUS_SCHEMA: Dict[str, Any] = {
"name": "meet_status",
"description": (
"Report the current Meet session state — whether the bot is alive, "
"has joined, is sitting in the lobby, number of transcript lines "
"captured, and last-caption timestamp."
),
"parameters": {
"type": "object",
"properties": {
"node": {"type": "string"},
},
"additionalProperties": False,
},
}
MEET_TRANSCRIPT_SCHEMA: Dict[str, Any] = {
"name": "meet_transcript",
"description": (
"Read the scraped transcript for the active Meet session. Returns "
"full transcript unless 'last' is set, in which case returns the last "
"N lines only."
),
"parameters": {
"type": "object",
"properties": {
"last": {
"type": "integer",
"description": (
"Optional: return only the last N caption lines. Useful "
"for polling during a meeting without re-reading the "
"whole transcript."
),
"minimum": 1,
},
"node": {"type": "string"},
},
"additionalProperties": False,
},
}
MEET_LEAVE_SCHEMA: Dict[str, Any] = {
"name": "meet_leave",
"description": (
"Leave the active Meet call cleanly, stop caption scraping, and "
"finalize the transcript file. Safe to call when no meeting is "
"active — returns ok=false with a reason."
),
"parameters": {
"type": "object",
"properties": {
"node": {"type": "string"},
},
"additionalProperties": False,
},
}
MEET_SAY_SCHEMA: Dict[str, Any] = {
"name": "meet_say",
"description": (
"Speak text into the active Meet call. Requires the active meeting "
"to have been joined with mode='realtime'. The text is queued to "
"the bot's OpenAI Realtime session; the generated audio is streamed "
"into Chrome's fake microphone via a virtual audio device "
"(PulseAudio null-sink on Linux, BlackHole on macOS). Returns "
"immediately — the actual speech lags by a couple of seconds."
),
"parameters": {
"type": "object",
"properties": {
"text": {"type": "string", "description": "Text to speak."},
"node": {"type": "string"},
},
"required": ["text"],
"additionalProperties": False,
},
}
# ---------------------------------------------------------------------------
# Handlers
# ---------------------------------------------------------------------------
def _json(obj: Any) -> str:
return json.dumps(obj, ensure_ascii=False)
def _err(msg: str, **extra) -> str:
return _json({"success": False, "error": msg, **extra})
def handle_meet_join(args: Dict[str, Any], **_kw) -> str:
url = (args.get("url") or "").strip()
if not url:
return _err("url is required")
mode = (args.get("mode") or "transcribe").strip().lower()
if mode not in ("transcribe", "realtime"):
return _err(f"mode must be 'transcribe' or 'realtime' (got {mode!r})")
node = args.get("node")
try:
client, node_name = _resolve_node_client(node)
except RuntimeError as e:
return _err(str(e))
if client is not None:
# Remote path — delegate to the node host.
try:
res = client.start_bot(
url=url,
guest_name=str(args.get("guest_name") or "Hermes Agent"),
duration=str(args.get("duration")) if args.get("duration") else None,
headed=bool(args.get("headed", False)),
mode=mode,
)
return _json({"success": bool(res.get("ok")), "node": node_name, **res})
except Exception as e:
return _err(f"remote node start_bot failed: {e}", node=node_name)
# Local path — same as v1, with v2 params.
if not check_meet_requirements():
return _err(
"google_meet plugin prerequisites missing — install with "
"`pip install playwright && python -m playwright install "
"chromium`. Plugin is supported on Linux and macOS only."
)
res = pm.start(
url=url,
headed=bool(args.get("headed", False)),
guest_name=str(args.get("guest_name") or "Hermes Agent"),
duration=str(args.get("duration")) if args.get("duration") else None,
mode=mode,
)
return _json({"success": bool(res.get("ok")), **res})
def handle_meet_status(args: Dict[str, Any], **_kw) -> str:
try:
client, node_name = _resolve_node_client(args.get("node"))
except RuntimeError as e:
return _err(str(e))
if client is not None:
try:
res = client.status()
return _json({"success": bool(res.get("ok")), "node": node_name, **res})
except Exception as e:
return _err(f"remote node status failed: {e}", node=node_name)
res = pm.status()
return _json({"success": bool(res.get("ok")), **res})
def handle_meet_transcript(args: Dict[str, Any], **_kw) -> str:
last = args.get("last")
try:
last_i = int(last) if last is not None else None
if last_i is not None and last_i < 1:
last_i = None
except (TypeError, ValueError):
last_i = None
try:
client, node_name = _resolve_node_client(args.get("node"))
except RuntimeError as e:
return _err(str(e))
if client is not None:
try:
res = client.transcript(last=last_i)
return _json({"success": bool(res.get("ok")), "node": node_name, **res})
except Exception as e:
return _err(f"remote node transcript failed: {e}", node=node_name)
res = pm.transcript(last=last_i)
return _json({"success": bool(res.get("ok")), **res})
def handle_meet_leave(args: Dict[str, Any], **_kw) -> str:
try:
client, node_name = _resolve_node_client(args.get("node"))
except RuntimeError as e:
return _err(str(e))
if client is not None:
try:
res = client.stop()
return _json({"success": bool(res.get("ok")), "node": node_name, **res})
except Exception as e:
return _err(f"remote node stop failed: {e}", node=node_name)
res = pm.stop(reason="agent called meet_leave")
return _json({"success": bool(res.get("ok")), **res})
def handle_meet_say(args: Dict[str, Any], **_kw) -> str:
text = (args.get("text") or "").strip()
if not text:
return _err("text is required")
try:
client, node_name = _resolve_node_client(args.get("node"))
except RuntimeError as e:
return _err(str(e))
if client is not None:
try:
res = client.say(text)
return _json({"success": bool(res.get("ok")), "node": node_name, **res})
except Exception as e:
return _err(f"remote node say failed: {e}", node=node_name)
res = pm.enqueue_say(text)
return _json({"success": bool(res.get("ok")), **res})
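All five handlers share two small patterns: lenient coercion of optional arguments, and a JSON envelope that adds ``success`` (and ``node`` on the remote path) to whatever the backend returned. A minimal sketch of both, with ``coerce_last`` and ``envelope`` as hypothetical helper names that do not exist in the plugin:

```python
import json
from typing import Any, Dict, Optional


def coerce_last(value: Any) -> Optional[int]:
    """Return a positive int, or None for missing or invalid input."""
    try:
        last = int(value) if value is not None else None
    except (TypeError, ValueError):
        return None
    return last if last is not None and last >= 1 else None


def envelope(res: Dict[str, Any], node: Optional[str] = None) -> str:
    """Wrap a backend result dict in the handlers' JSON envelope."""
    payload: Dict[str, Any] = {"success": bool(res.get("ok"))}
    if node is not None:
        payload["node"] = node
    payload.update(res)  # backend keys win on conflict, as with **res
    return json.dumps(payload, ensure_ascii=False)
```

Invalid ``last`` values degrade to "return the full transcript" instead of producing an error, which matches the behaviour of ``handle_meet_transcript``.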
+31 -15
@@ -526,16 +526,24 @@ class HindsightMemoryProvider(MemoryProvider):
print("\n Configuring Hindsight memory:\n")
existing_config = self._config if isinstance(self._config, dict) else _load_config()
if not isinstance(existing_config, dict):
existing_config = {}
# Step 1: Mode selection
mode_values = ["cloud", "local_embedded", "local_external"]
mode_items = [
("Cloud", "Hindsight Cloud API (lightweight, just needs an API key)"),
("Local Embedded", "Run Hindsight locally (downloads ~200MB, needs LLM key)"),
("Local External", "Connect to an existing Hindsight instance"),
]
mode_idx = _curses_select(" Select mode", mode_items, default=0)
mode = ["cloud", "local_embedded", "local_external"][mode_idx]
existing_mode = existing_config.get("mode")
mode_default_idx = mode_values.index(existing_mode) if existing_mode in mode_values else 0
mode_idx = _curses_select(" Select mode", mode_items, default=mode_default_idx)
mode = mode_values[mode_idx]
provider_config: dict = {"mode": mode}
provider_config: dict = dict(existing_config)
provider_config["mode"] = mode
env_writes: dict = {}
# Step 2: Install/upgrade deps for selected mode
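The change in this hunk preselects the previously saved mode and starts from a copy of the existing config instead of a fresh dict. A small sketch of that idea, assuming a saved config dict; ``default_index`` is a hypothetical helper and the sample values are made up:

```python
from typing import Optional, Sequence


def default_index(values: Sequence[str], existing: Optional[str]) -> int:
    """Index of the saved choice, or 0 when it is missing or unknown."""
    return values.index(existing) if existing in values else 0


mode_values = ["cloud", "local_embedded", "local_external"]
existing_config = {"mode": "local_external", "bank_id": "team"}  # sample data

idx = default_index(mode_values, existing_config.get("mode"))

# Start from a copy so unrelated saved keys survive a re-run of setup.
provider_config = dict(existing_config)
provider_config["mode"] = mode_values[idx]
```

Copying with ``dict(existing_config)`` also avoids mutating the provider's in-memory config until the new values are saved.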
@@ -601,21 +609,29 @@ class HindsightMemoryProvider(MemoryProvider):
(p, f"default model: {_PROVIDER_DEFAULT_MODELS[p]}")
for p in providers_list
]
llm_idx = _curses_select(" Select LLM provider", llm_items, default=0)
existing_llm_provider = provider_config.get("llm_provider")
llm_default_idx = providers_list.index(existing_llm_provider) if existing_llm_provider in providers_list else 0
llm_idx = _curses_select(" Select LLM provider", llm_items, default=llm_default_idx)
llm_provider = providers_list[llm_idx]
provider_config["llm_provider"] = llm_provider
if llm_provider == "openai_compatible":
val = input(" LLM endpoint URL (e.g. http://192.168.1.10:8080/v1): ").strip()
existing_base_url = provider_config.get("llm_base_url", "")
prompt = " LLM endpoint URL (e.g. http://192.168.1.10:8080/v1)"
if existing_base_url:
prompt += f" [{existing_base_url}]"
prompt += ": "
val = input(prompt).strip()
if val:
provider_config["llm_base_url"] = val
elif llm_provider == "openrouter":
provider_config["llm_base_url"] = "https://openrouter.ai/api/v1"
default_model = _PROVIDER_DEFAULT_MODELS.get(llm_provider, "gpt-4o-mini")
val = input(f" LLM model [{default_model}]: ").strip()
provider_config["llm_model"] = val or default_model
provider_default_model = _PROVIDER_DEFAULT_MODELS.get(llm_provider, "gpt-4o-mini")
current_model = provider_config.get("llm_model") or provider_default_model
val = input(f" LLM model [{current_model}]: ").strip()
provider_config["llm_model"] = val or current_model
sys.stdout.write(" LLM API key: ")
sys.stdout.flush()
@@ -633,15 +649,16 @@ class HindsightMemoryProvider(MemoryProvider):
env_writes["HINDSIGHT_LLM_API_KEY"] = existing_llm_key
# Step 4: Save everything
provider_config["bank_id"] = "hermes"
provider_config["recall_budget"] = "mid"
# Read existing timeout from config if present, otherwise use default
existing_timeout = self._config.get("timeout") if self._config else None
timeout_val = existing_timeout if existing_timeout else _DEFAULT_TIMEOUT
provider_config.setdefault("bank_id", "hermes")
provider_config.setdefault("recall_budget", "mid")
# Read existing timeout from config if present, otherwise use default.
# Preserve explicit 0 values instead of treating them as blank.
existing_timeout = provider_config.get("timeout")
timeout_val = existing_timeout if existing_timeout is not None else _DEFAULT_TIMEOUT
provider_config["timeout"] = timeout_val
env_writes["HINDSIGHT_TIMEOUT"] = str(timeout_val)
if mode == "local_embedded":
existing_idle_timeout = self._config.get("idle_timeout") if self._config else None
existing_idle_timeout = provider_config.get("idle_timeout")
idle_timeout_val = existing_idle_timeout if existing_idle_timeout is not None else _DEFAULT_IDLE_TIMEOUT
provider_config["idle_timeout"] = idle_timeout_val
env_writes["HINDSIGHT_IDLE_TIMEOUT"] = str(idle_timeout_val)
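The switch from a truthiness test to ``is not None`` matters for an explicit ``0``. The sketch below contrasts the two patterns and shows how ``setdefault`` preserves saved keys; ``DEFAULT_TIMEOUT`` is a hypothetical stand-in for ``_DEFAULT_TIMEOUT``, whose real value is not shown in this diff.

```python
from typing import Any, Dict

DEFAULT_TIMEOUT = 120  # hypothetical stand-in for _DEFAULT_TIMEOUT


def timeout_by_truthiness(config: Dict[str, Any]) -> Any:
    """Old pattern: an explicit 0 is falsy, so the default replaces it."""
    existing = config.get("timeout")
    return existing if existing else DEFAULT_TIMEOUT


def timeout_by_none_check(config: Dict[str, Any]) -> Any:
    """New pattern: only a missing (None) value falls back to the default."""
    existing = config.get("timeout")
    return existing if existing is not None else DEFAULT_TIMEOUT


saved = {"timeout": 0, "bank_id": "team-memory"}  # sample data
merged = dict(saved)
merged.setdefault("bank_id", "hermes")     # keeps "team-memory"
merged.setdefault("recall_budget", "mid")  # fills the missing key
```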
@@ -1204,7 +1221,6 @@ class HindsightMemoryProvider(MemoryProvider):
def _sync():
try:
client = self._get_client()
item = self._build_retain_kwargs(
content,
context=self._retain_context,
+81 -6
@@ -22,6 +22,7 @@ import threading
import time
from typing import Any, Dict, List, Optional
from agent.memory_manager import sanitize_context
from agent.memory_provider import MemoryProvider
from tools.registry import tool_error
@@ -37,7 +38,10 @@ PROFILE_SCHEMA = {
"description": (
"Retrieve or update a peer card from Honcho — a curated list of key facts "
"about that peer (name, role, preferences, communication style, patterns). "
"Pass `card` to update; omit `card` to read."
"Pass `card` to update; omit `card` to read. If the card is empty, the "
"result includes a `hint` field explaining why (observation disabled, "
"fresh peer, dialectic layer still warming up, etc.) — this is NOT an "
"error. Peer cards accumulate over time from observed conversation."
),
"parameters": {
"type": "object",
@@ -1056,6 +1060,63 @@ class HonchoMemoryProvider(MemoryProvider):
return chunks
def _empty_profile_hint(self, peer: str) -> Dict[str, Any]:
"""Build a diagnostic hint when honcho_profile returns an empty card.
A literal "No profile facts available yet." tells the model nothing
about WHY. The model then often surfaces it to the user as a cryptic
error. This hint enumerates the likely causes so the model can
explain the situation (or retry with a different peer).
Ordered by likelihood for a typical deployment:
1. Observation is disabled for this peer
2. Card hasn't accumulated yet (fresh peer, not enough dialectic
cycles; dialectic cadence runs every N turns)
3. Self-hosted Honcho backend doesn't support peer cards
(honcho-ai server < 3.x)
"""
cfg = self._config
reasons: List[str] = []
if cfg is not None:
if peer == "user":
observe_me = bool(getattr(cfg, "user_observe_me", True))
observe_others = bool(getattr(cfg, "user_observe_others", True))
else:
observe_me = bool(getattr(cfg, "ai_observe_me", True))
observe_others = bool(getattr(cfg, "ai_observe_others", True))
if not (observe_me or observe_others):
reasons.append(
f"observation is disabled for peer '{peer}' "
f"(user_observe_me/ai_observe_me in config)"
)
cadence = getattr(self, "_dialectic_cadence", 1)
turn = getattr(self, "_turn_count", 0)
if turn < max(2, cadence):
reasons.append(
f"this session has only {turn} turn(s); peer cards accumulate "
f"as the dialectic layer reasons over conversation history "
f"(cadence every {cadence} turn(s))"
)
if not reasons:
reasons.append(
"peer card has no facts yet — Honcho's dialectic layer builds "
"this over time from observed turns; self-hosted Honcho < 3.x "
"does not support peer cards at all"
)
return {
"result": "No profile facts available yet.",
"hint": (
"This is not an error. "
+ "; ".join(reasons)
+ ". Try honcho_reasoning for a synthesized answer, or "
"honcho_search to query raw conversation excerpts."
),
}
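The shape of the hint is easier to see with the config lookups stripped away. A simplified sketch, assuming the caller has already resolved whether observation is on; ``empty_profile_hint`` and its parameters are hypothetical, and the reason strings are shortened:

```python
from typing import Any, Dict, List


def empty_profile_hint(observing: bool, turn: int, cadence: int) -> Dict[str, Any]:
    """Collect likely causes for an empty peer card into one hint string."""
    reasons: List[str] = []
    if not observing:
        reasons.append("observation is disabled for this peer")
    if turn < max(2, cadence):
        reasons.append(f"this session has only {turn} turn(s)")
    if not reasons:
        # Generic fallback so the hint is never empty.
        reasons.append("peer card has no facts yet")
    return {
        "result": "No profile facts available yet.",
        "hint": "This is not an error. " + "; ".join(reasons) + ".",
    }
```

Keeping ``result`` unchanged means callers that only read that field see the same text as before; the explanation travels in the separate ``hint`` field.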
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Record the conversation turn in Honcho (non-blocking).
@@ -1068,13 +1129,15 @@ class HonchoMemoryProvider(MemoryProvider):
return
msg_limit = self._config.message_max_chars if self._config else 25000
clean_user_content = sanitize_context(user_content or "").strip()
clean_assistant_content = sanitize_context(assistant_content or "").strip()
def _sync():
try:
session = self._manager.get_or_create(self._session_key)
for chunk in self._chunk_message(user_content, msg_limit):
for chunk in self._chunk_message(clean_user_content, msg_limit):
session.add_message("user", chunk)
for chunk in self._chunk_message(assistant_content, msg_limit):
for chunk in self._chunk_message(clean_assistant_content, msg_limit):
session.add_message("assistant", chunk)
self._manager._flush_session(session)
except Exception as e:
@@ -1087,8 +1150,20 @@ class HonchoMemoryProvider(MemoryProvider):
)
self._sync_thread.start()
def on_memory_write(self, action: str, target: str, content: str) -> None:
"""Mirror built-in user profile writes as Honcho conclusions."""
def on_memory_write(
self,
action: str,
target: str,
content: str,
metadata: Optional[Dict[str, Any]] = None,
) -> None:
"""Mirror built-in user profile writes as Honcho conclusions.
``metadata`` is accepted for compatibility with the write-origin
work landed in main (commit 6a957a74); it's not yet threaded into
the Honcho conclusion payload. Left as a follow-up so this PR
stays focused on the 7-PR consolidation and its review follow-ups.
"""
if action != "add" or target != "user" or not content:
return
if self._cron_skipped:
@@ -1154,7 +1229,7 @@ class HonchoMemoryProvider(MemoryProvider):
return json.dumps({"result": f"Peer card updated ({len(result)} facts).", "card": result})
card = self._manager.get_peer_card(self._session_key, peer=peer)
if not card:
return json.dumps({"result": "No profile facts available yet."})
return json.dumps(self._empty_profile_hint(peer))
return json.dumps({"result": card})
elif tool_name == "honcho_search":
+31 -2
@@ -273,9 +273,38 @@ def _write_config(cfg: dict, path: Path | None = None) -> None:
def _resolve_api_key(cfg: dict) -> str:
"""Resolve API key with host -> root -> env fallback."""
"""Resolve API key with host -> root -> env fallback.
For self-hosted instances configured with ``baseUrl`` instead of an API
key, returns ``"local"`` so that credential guards throughout the CLI
don't reject a valid configuration. The ``baseUrl`` is scheme-validated
(http/https only) so that a typo like ``baseUrl: true`` can't silently
pass the guard. Schemeless strings that look like host:port (legacy
config shapes, e.g. ``localhost:8000``) still pass; the Honcho SDK
will reject them itself with a clearer error than ours.
"""
host_key = ((cfg.get("hosts") or {}).get(_host_key()) or {}).get("apiKey")
return host_key or cfg.get("apiKey", "") or os.environ.get("HONCHO_API_KEY", "")
key = host_key or cfg.get("apiKey", "") or os.environ.get("HONCHO_API_KEY", "")
if not key:
base_url = cfg.get("baseUrl") or cfg.get("base_url") or os.environ.get("HONCHO_BASE_URL", "")
base_url = str(base_url or "").strip()
if base_url:
from urllib.parse import urlparse
try:
parsed = urlparse(base_url)
except (TypeError, ValueError):
parsed = None
if parsed and parsed.scheme in ("http", "https") and parsed.netloc:
return "local"
# Schemeless but looks like a host (contains '.' or ':' and isn't
# a boolean literal): let it through so legacy configs don't
# regress into "no API key configured" when they previously worked.
lowered = base_url.lower()
if lowered not in ("true", "false", "none", "null") and any(
c in base_url for c in ".:"
) and not base_url.isdigit():
return "local"
return key
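The ``baseUrl`` guard can be read as a standalone predicate. The sketch below follows the same two-stage check (strict http/https URL first, then a lenient host:port heuristic); ``looks_like_base_url`` is a hypothetical name, and the ``str()`` coercion is an addition of this sketch so that a JSON boolean is handled like the string ``"true"``.

```python
from urllib.parse import urlparse


def looks_like_base_url(value: object) -> bool:
    """True when ``value`` plausibly names a self-hosted endpoint."""
    text = str(value or "").strip()
    if not text:
        return False
    try:
        parsed = urlparse(text)
    except (TypeError, ValueError):
        parsed = None
    if parsed and parsed.scheme in ("http", "https") and parsed.netloc:
        return True
    # Legacy schemeless shapes such as "localhost:8000" are let through.
    lowered = text.lower()
    return (
        lowered not in ("true", "false", "none", "null")
        and any(c in text for c in ".:")
        and not text.isdigit()
    )
```

The heuristic is deliberately permissive: it only rejects values that cannot be a host at all, and leaves real validation to the SDK.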
def _prompt(label: str, default: str | None = None, secret: bool = False) -> str:
+67 -3
@@ -16,6 +16,7 @@ from __future__ import annotations
import json
import os
import logging
import hashlib
from dataclasses import dataclass, field
from pathlib import Path
@@ -27,7 +28,6 @@ if TYPE_CHECKING:
logger = logging.getLogger(__name__)
GLOBAL_CONFIG_PATH = Path.home() / ".honcho" / "config.json"
HOST = "hermes"
@@ -53,6 +53,11 @@ def resolve_active_host() -> str:
return HOST
def resolve_global_config_path() -> Path:
"""Return the shared Honcho config path for the current HOME."""
return Path.home() / ".honcho" / "config.json"
def resolve_config_path() -> Path:
"""Return the active Honcho config path.
@@ -72,7 +77,7 @@ def resolve_config_path() -> Path:
if default_path != local_path and default_path.exists():
return default_path
return GLOBAL_CONFIG_PATH
return resolve_global_config_path()
_RECALL_MODE_ALIASES = {"auto": "hybrid"}
@@ -138,6 +143,15 @@ def _parse_dialectic_depth_levels(host_val, root_val, depth: int) -> list[str] |
return None
# Default HTTP timeout (seconds) applied when no explicit timeout is
# configured via HonchoClientConfig.timeout, honcho.timeout / requestTimeout,
# or HONCHO_TIMEOUT. Honcho calls happen on the post-response path of
# run_conversation; without a cap the agent can block indefinitely when
# the Honcho backend is unreachable, preventing the gateway from
# delivering the already-generated response.
_DEFAULT_HTTP_TIMEOUT = 30.0
def _resolve_optional_float(*values: Any) -> float | None:
"""Return the first non-empty value coerced to a positive float."""
for value in values:
@@ -226,6 +240,13 @@ class HonchoClientConfig:
# Identity
peer_name: str | None = None
ai_peer: str = "hermes"
# When True, ``peer_name`` wins over any gateway-supplied runtime
# identity (Telegram UID, Discord ID, …) when resolving the user peer.
# This keeps memory unified across platforms for single-user deployments
# where Honcho's one peer-name is an unambiguous identity — otherwise
# each platform would fork memory into its own peer (#14984). Default
# ``False`` preserves existing multi-user behaviour.
pin_peer_name: bool = False
# Toggles
enabled: bool = False
save_messages: bool = True
@@ -420,6 +441,11 @@ class HonchoClientConfig:
timeout=timeout,
peer_name=host_block.get("peerName") or raw.get("peerName"),
ai_peer=ai_peer,
pin_peer_name=_resolve_bool(
host_block.get("pinPeerName"),
raw.get("pinPeerName"),
default=False,
),
enabled=enabled,
save_messages=save_messages,
write_frequency=write_frequency,
@@ -522,6 +548,39 @@ class HonchoClientConfig:
pass
return None
# Honcho enforces a 100-char limit on session IDs. Long gateway session keys
# (Matrix "!room:server" + thread event IDs, Telegram supergroup reply
# chains, Slack thread IDs with long workspace prefixes) can overflow this
# limit after sanitization; the Honcho API then rejects every call for that
# session with "session_id too long". See issue #13868.
_HONCHO_SESSION_ID_MAX_LEN = 100
_HONCHO_SESSION_ID_HASH_LEN = 8
@classmethod
def _enforce_session_id_limit(cls, sanitized: str, original: str) -> str:
"""Truncate a sanitized session ID to Honcho's 100-char limit.
The common case (short keys) short-circuits with no modification.
For over-limit keys, keep a prefix of the sanitized ID and append a
deterministic ``-<sha256 prefix>`` suffix so two distinct long keys
that share a leading segment don't collide onto the same truncated ID.
The hash is taken over the *original* pre-sanitization key, so two
inputs that sanitize to the same string still collide intentionally
(same logical session), but two inputs that only share a prefix do not.
"""
max_len = cls._HONCHO_SESSION_ID_MAX_LEN
if len(sanitized) <= max_len:
return sanitized
hash_len = cls._HONCHO_SESSION_ID_HASH_LEN
digest = hashlib.sha256(original.encode("utf-8")).hexdigest()[:hash_len]
# max_len - hash_len - 1 (for the '-' separator) chars of the sanitized
# prefix, then '-<hash>'. Strip any trailing hyphen from the prefix so
# the result doesn't double up on separators.
prefix_len = max_len - hash_len - 1
prefix = sanitized[:prefix_len].rstrip("-")
return f"{prefix}-{digest}"
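The truncation rule can be tried outside the class. A minimal sketch that combines the sanitizing regex from ``resolve_session_name`` with the limit logic above; ``session_id_for`` and the sample keys are hypothetical, while the constants mirror the two class attributes:

```python
import hashlib
import re

MAX_LEN = 100   # mirrors _HONCHO_SESSION_ID_MAX_LEN
HASH_LEN = 8    # mirrors _HONCHO_SESSION_ID_HASH_LEN


def session_id_for(key: str) -> str:
    """Sanitize a gateway session key and cap it at MAX_LEN characters."""
    sanitized = re.sub(r"[^a-zA-Z0-9_-]+", "-", key).strip("-")
    if len(sanitized) <= MAX_LEN:
        return sanitized
    # Hash the original key so long keys sharing a prefix stay distinct.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:HASH_LEN]
    prefix = sanitized[: MAX_LEN - HASH_LEN - 1].rstrip("-")
    return f"{prefix}-{digest}"


short_id = session_id_for("!room:matrix.org")
long_a = session_id_for("!room:server/" + "a" * 200)
long_b = session_id_for("!room:server/" + "a" * 199 + "b")
```

Both long keys share their first 91 sanitized characters, so without the hash suffix they would truncate to the same ID.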
def resolve_session_name(
self,
cwd: str | None = None,
@@ -566,7 +625,7 @@ class HonchoClientConfig:
if gateway_session_key:
sanitized = re.sub(r'[^a-zA-Z0-9_-]+', '-', gateway_session_key).strip('-')
if sanitized:
return sanitized
return self._enforce_session_id_limit(sanitized, gateway_session_key)
# per-session: inherit Hermes session_id (new Honcho session each run)
if self.session_strategy == "per-session" and session_id:
@@ -646,6 +705,11 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
except Exception:
pass
# Fall back to the default so an unconfigured install cannot hang
# indefinitely on a stalled Honcho request.
if resolved_timeout is None:
resolved_timeout = _DEFAULT_HTTP_TIMEOUT
if resolved_base_url:
logger.info("Initializing Honcho client (base_url: %s, workspace: %s)", resolved_base_url, config.workspace_id)
else:
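The session-ID truncation added in `_enforce_session_id_limit` above can be sketched on its own, outside the config class. This is a minimal illustration, not the project's code: the constants mirror the values shown in the diff, while `sanitize`, `enforce_limit`, and the two Matrix-style keys are hypothetical names and inputs invented for the example.

```python
import hashlib
import re

MAX_LEN = 100  # mirrors _HONCHO_SESSION_ID_MAX_LEN from the diff
HASH_LEN = 8   # mirrors _HONCHO_SESSION_ID_HASH_LEN from the diff


def sanitize(key: str) -> str:
    """Collapse runs of disallowed characters into '-' (same regex as the diff)."""
    return re.sub(r"[^a-zA-Z0-9_-]+", "-", key).strip("-")


def enforce_limit(sanitized: str, original: str) -> str:
    """Return sanitized unchanged if it fits, else '<prefix>-<sha256 prefix>'."""
    if len(sanitized) <= MAX_LEN:
        return sanitized
    # Hash the original key so keys that only share a prefix stay distinct.
    digest = hashlib.sha256(original.encode("utf-8")).hexdigest()[:HASH_LEN]
    # Reserve room for the '-' separator and the hash; avoid a doubled hyphen.
    prefix = sanitized[: MAX_LEN - HASH_LEN - 1].rstrip("-")
    return f"{prefix}-{digest}"


# Hypothetical long gateway keys that differ only in their final segment.
key_a = "matrix:!room:example.org:" + "a" * 120 + ":thread-1"
key_b = "matrix:!room:example.org:" + "a" * 120 + ":thread-2"
id_a = enforce_limit(sanitize(key_a), key_a)
id_b = enforce_limit(sanitize(key_b), key_b)
print(len(id_a), id_a != id_b)
```

Both IDs keep the same readable prefix and stay within the limit, while the hash suffix separates them. Plain truncation would map both keys onto one session.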
+57 -32
@@ -95,6 +95,7 @@ class HonchoSessionManager:
self._config = config
self._runtime_user_peer_name = runtime_user_peer_name
self._cache: dict[str, HonchoSession] = {}
self._cache_lock = threading.RLock()
self._peers_cache: dict[str, Any] = {}
self._sessions_cache: dict[str, Any] = {}
@@ -273,17 +274,35 @@ class HonchoSessionManager:
Returns:
The session.
"""
if key in self._cache:
logger.debug("Local session cache hit: %s", key)
return self._cache[key]
with self._cache_lock:
if key in self._cache:
logger.debug("Local session cache hit: %s", key)
return self._cache[key]
# Gateway sessions should use the runtime user identity when available.
if self._runtime_user_peer_name:
# Determine peer IDs — no lock needed (read-only, no shared state mutation).
# Gateway sessions normally use the runtime user identity (the
# platform-native ID: Telegram UID, Discord snowflake, Slack user,
# etc.) so multi-user bots scope memory per user. For a single-user
# deployment the config-supplied ``peer_name`` is an unambiguous
# identity and we should keep it unified across platforms — see
# #14984. Opt into that with ``hosts.<host>.pinPeerName: true`` in
# ``honcho.json`` (or root-level ``pinPeerName: true``).
# `is True` (not `bool(...)`) is deliberate: several multi-user tests
# pass a ``MagicMock`` for ``config`` where ``mock.pin_peer_name``
# silently returns another MagicMock — truthy by default. Requiring
# strict ``True`` keeps pinning as opt-in even for callers that
# haven't updated their mocks yet; real configs built via
# ``from_global_config`` always produce a proper boolean.
pin_peer_name = (
self._config is not None
and bool(getattr(self._config, "peer_name", None))
and getattr(self._config, "pin_peer_name", False) is True
)
if self._runtime_user_peer_name and not pin_peer_name:
user_peer_id = self._sanitize_id(self._runtime_user_peer_name)
elif self._config and self._config.peer_name:
user_peer_id = self._sanitize_id(self._config.peer_name)
else:
# Fallback: derive from session key
parts = key.split(":", 1)
channel = parts[0] if len(parts) > 1 else "default"
chat_id = parts[1] if len(parts) > 1 else key
@@ -293,19 +312,14 @@ class HonchoSessionManager:
self._config.ai_peer if self._config else "hermes-assistant"
)
# Sanitize session ID for Honcho
# All expensive I/O outside the lock — Honcho's persistence is source of truth
honcho_session_id = self._sanitize_id(key)
# Get or create peers
user_peer = self._get_or_create_peer(user_peer_id)
assistant_peer = self._get_or_create_peer(assistant_peer_id)
# Get or create Honcho session
honcho_session, existing_messages = self._get_or_create_honcho_session(
honcho_session_id, user_peer, assistant_peer
)
# Convert Honcho messages to local format
local_messages = []
for msg in existing_messages:
role = "assistant" if msg.peer_id == assistant_peer_id else "user"
@@ -313,10 +327,9 @@ class HonchoSessionManager:
"role": role,
"content": msg.content,
"timestamp": msg.created_at.isoformat() if msg.created_at else "",
"_synced": True, # Already in Honcho
"_synced": True,
})
# Create local session wrapper with existing messages
session = HonchoSession(
key=key,
user_peer_id=user_peer_id,
@@ -325,7 +338,9 @@ class HonchoSessionManager:
messages=local_messages,
)
self._cache[key] = session
# Write to cache under lock — only one writer wins
with self._cache_lock:
self._cache[key] = session
return session
def _flush_session(self, session: HonchoSession) -> bool:
@@ -356,13 +371,15 @@ class HonchoSessionManager:
for msg in new_messages:
msg["_synced"] = True
logger.debug("Synced %d messages to Honcho for %s", len(honcho_messages), session.key)
self._cache[session.key] = session
with self._cache_lock:
self._cache[session.key] = session
return True
except Exception as e:
for msg in new_messages:
msg["_synced"] = False
logger.error("Failed to sync messages to Honcho: %s", e)
self._cache[session.key] = session
with self._cache_lock:
self._cache[session.key] = session
return False
def _async_writer_loop(self) -> None:
@@ -434,7 +451,9 @@ class HonchoSessionManager:
Called at session end for "session" write_frequency, or to force
a sync before process exit regardless of mode.
"""
for session in list(self._cache.values()):
with self._cache_lock:
sessions = list(self._cache.values())
for session in sessions:
try:
self._flush_session(session)
except Exception as e:
@@ -459,9 +478,10 @@ class HonchoSessionManager:
def delete(self, key: str) -> bool:
"""Delete a session from local cache."""
if key in self._cache:
del self._cache[key]
return True
with self._cache_lock:
if key in self._cache:
del self._cache[key]
return True
return False
def new_session(self, key: str) -> HonchoSession:
@@ -473,20 +493,25 @@ class HonchoSessionManager:
"""
import time
# Remove old session from caches (but don't delete from Honcho)
old_session = self._cache.pop(key, None)
if old_session:
self._sessions_cache.pop(old_session.honcho_session_id, None)
# Hold the reentrant lock across get_or_create so a concurrent caller
# can't observe the (old-popped, new-not-yet-inserted) gap and create
# its own session under the raw key. `_cache_lock` is an RLock so
# nested reacquisition inside get_or_create is safe.
with self._cache_lock:
# Remove old session from caches (but don't delete from Honcho)
old_session = self._cache.pop(key, None)
if old_session:
self._sessions_cache.pop(old_session.honcho_session_id, None)
# Create new session with timestamp suffix
timestamp = int(time.time())
new_key = f"{key}:{timestamp}"
# Create new session with timestamp suffix
timestamp = int(time.time())
new_key = f"{key}:{timestamp}"
# get_or_create will create a fresh session
session = self.get_or_create(new_key)
# get_or_create will create a fresh session
session = self.get_or_create(new_key)
# Cache under the original key so callers find it by the expected name
self._cache[key] = session
# Cache under the original key so callers find it by the expected name
self._cache[key] = session
logger.info("Created new session for %s (honcho: %s)", key, session.honcho_session_id)
return session
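The locking pattern this file's changes apply to `get_or_create`, `delete`, and the flush snapshot can be sketched in isolation: check the cache under the lock, do the slow work outside it, then write back under the lock. The sketch below is an assumption-laden illustration; `LockedCache` and its `loader` callable are hypothetical stand-ins for `HonchoSessionManager` and its Honcho I/O.

```python
import threading
from typing import Any, Callable, Dict, List


class LockedCache:
    """Hypothetical stand-in for the session cache guarded by an RLock."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self._cache: Dict[str, Any] = {}
        self._lock = threading.RLock()

    def get_or_create(self, key: str) -> Any:
        # Fast path: look up under the lock.
        with self._lock:
            if key in self._cache:
                return self._cache[key]
        # Slow path: run the expensive load without holding the lock.
        value = self._loader(key)
        # Write back under the lock; concurrent loaders overwrite each other.
        with self._lock:
            self._cache[key] = value
        return value

    def delete(self, key: str) -> bool:
        with self._lock:
            if key in self._cache:
                del self._cache[key]
                return True
        return False

    def snapshot(self) -> List[Any]:
        # Copy under the lock so callers can iterate without holding it.
        with self._lock:
            return list(self._cache.values())


cache = LockedCache(lambda key: {"key": key})
session = cache.get_or_create("telegram:42")
```

The trade-off is that two threads missing the cache at the same time may both run the loader, and the later write replaces the earlier one. The diff accepts this because Honcho's persistence is the source of truth, and in exchange no thread blocks on network I/O while holding the lock.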
+1 -1
@@ -43,7 +43,7 @@ dev = ["debugpy>=1.8.0,<2", "pytest>=9.0.2,<10", "pytest-asyncio>=1.3.0,<2", "py
messaging = ["python-telegram-bot[webhooks]>=22.6,<23", "discord.py[voice]>=2.7.1,<3", "aiohttp>=3.13.3,<4", "slack-bolt>=1.18.0,<2", "slack-sdk>=3.27.0,<4", "qrcode>=7.0,<8"]
cron = ["croniter>=6.0.0,<7"]
slack = ["slack-bolt>=1.18.0,<2", "slack-sdk>=3.27.0,<4"]
matrix = ["mautrix[encryption]>=0.20,<1", "Markdown>=3.6,<4", "aiosqlite>=0.20", "asyncpg>=0.29"]
matrix = ["mautrix[encryption]>=0.20,<1", "Markdown>=3.6,<4", "aiosqlite>=0.20", "asyncpg>=0.29", "aiohttp-socks>=0.10,<1"]
cli = ["simple-term-menu>=1.0,<2"]
tts-premium = ["elevenlabs>=1.0,<2"]
voice = [
+450 -45
@@ -74,18 +74,25 @@ from model_tools import (
check_toolset_requirements,
)
from tools.terminal_tool import cleanup_vm, get_active_env, is_persistent_env
from tools.terminal_tool import (
set_approval_callback as _set_approval_callback,
set_sudo_password_callback as _set_sudo_password_callback,
_get_approval_callback,
_get_sudo_password_callback,
)
from tools.tool_result_storage import maybe_persist_tool_result, enforce_turn_budget
from tools.interrupt import set_interrupt as _set_interrupt
from tools.browser_tool import cleanup_browser
# Agent internals extracted to agent/ package for modularity
from agent.memory_manager import build_memory_context_block, sanitize_context
from agent.memory_manager import StreamingContextScrubber, build_memory_context_block, sanitize_context
from agent.retry_utils import jittered_backoff
from agent.error_classifier import classify_api_error, FailoverReason
from agent.prompt_builder import (
DEFAULT_AGENT_IDENTITY, PLATFORM_HINTS,
MEMORY_GUIDANCE, SESSION_SEARCH_GUIDANCE, SKILLS_GUIDANCE,
HERMES_AGENT_HELP_GUIDANCE,
build_nous_subscription_prompt,
)
from agent.model_metadata import (
@@ -1211,6 +1218,10 @@ class AIAgent:
# Deferred paragraph break flag — set after tool iterations so a
# single "\n\n" is prepended to the next real text delta.
self._stream_needs_break = False
# Stateful scrubber for <memory-context> spans split across stream
# deltas (#5719). sanitize_context() alone can't survive chunk
# boundaries because the block regex needs both tags in one string.
self._stream_context_scrubber = StreamingContextScrubber()
# Visible assistant text already delivered through live token callbacks
# during the current model response. Used to avoid re-sending the same
# commentary when the provider later returns it as a completed interim
@@ -2416,7 +2427,10 @@ class AIAgent:
if not self.compression_enabled:
return
try:
from agent.auxiliary_client import get_text_auxiliary_client
from agent.auxiliary_client import (
_resolve_task_provider_model,
get_text_auxiliary_client,
)
from agent.model_metadata import (
MINIMUM_CONTEXT_LENGTH,
get_model_context_length,
@@ -2426,6 +2440,14 @@ class AIAgent:
"compression",
main_runtime=self._current_main_runtime(),
)
# Best-effort aux provider label for the warning message. The
# configured provider may be "auto", in which case we fall back
# to the client's base_url hostname so the user can still tell
# where the compression model is actually being called.
try:
_aux_cfg_provider, _, _, _, _ = _resolve_task_provider_model("compression")
except Exception:
_aux_cfg_provider = ""
if client is None or not aux_model:
msg = (
"⚠ No auxiliary LLM provider configured — context "
@@ -2492,10 +2514,37 @@ class AIAgent:
new_threshold / main_ctx
)
safe_pct = int((aux_context / main_ctx) * 100) if main_ctx else 50
# Build human-readable "model (provider)" labels for both
# the main model and the compression model so users can
# tell at a glance which provider each side is actually
# using. When the configured provider is empty or "auto",
# fall back to the client's base_url hostname.
_main_model = getattr(self, "model", "") or "?"
_main_provider = getattr(self, "provider", "") or ""
_aux_provider_label = (
_aux_cfg_provider
if _aux_cfg_provider and _aux_cfg_provider != "auto"
else ""
)
if not _aux_provider_label:
try:
from urllib.parse import urlparse
_aux_provider_label = (
urlparse(aux_base_url).hostname or aux_base_url
)
except Exception:
_aux_provider_label = aux_base_url or "auto"
_main_label = (
f"{_main_model} ({_main_provider})"
if _main_provider
else _main_model
)
_aux_label = f"{aux_model} ({_aux_provider_label})"
msg = (
f"⚠ Compression model ({aux_model}) context is "
f"{aux_context:,} tokens, but the main model's "
f"compression threshold was {old_threshold:,} tokens. "
f"⚠ Compression model {_aux_label} context is "
f"{aux_context:,} tokens, but the main model "
f"{_main_label}'s compression threshold was "
f"{old_threshold:,} tokens. "
f"Auto-lowered this session's threshold to "
f"{new_threshold:,} tokens so compression can run.\n"
f" To make this permanent, edit config.yaml — either:\n"
@@ -3240,6 +3289,21 @@ class AIAgent:
def _run_review():
import contextlib
# Install a non-interactive approval callback on this worker
# thread so any dangerous-command guard the review agent trips
# resolves to "deny" instead of falling back to input() -- which
# deadlocks against the parent's prompt_toolkit TUI (#15216).
# Same pattern as _subagent_auto_deny in tools/delegate_tool.py.
def _bg_review_auto_deny(command, description, **kwargs):
logger.warning(
"Background review auto-denied dangerous command: %s (%s)",
command, description,
)
return "deny"
try:
_set_approval_callback(_bg_review_auto_deny)
except Exception:
pass
review_agent = None
try:
with open(os.devnull, "w") as _devnull, \
@@ -3265,6 +3329,7 @@ class AIAgent:
api_key=_parent_runtime.get("api_key") or None,
credential_pool=getattr(self, "_credential_pool", None),
parent_session_id=self.session_id,
enabled_toolsets=["memory", "skills"],
)
review_agent._memory_write_origin = "background_review"
review_agent._memory_write_context = "background_review"
@@ -3321,6 +3386,12 @@ class AIAgent:
review_agent.close()
except Exception:
pass
# Clear the approval callback on this bg-review thread so a
# recycled thread-id doesn't inherit a stale reference.
try:
_set_approval_callback(None)
except Exception:
pass
t = threading.Thread(target=_run_review, daemon=True, name="bg-review")
t.start()
@@ -4498,6 +4569,9 @@ class AIAgent:
# Fallback to hardcoded identity
prompt_parts = [DEFAULT_AGENT_IDENTITY]
# Pointer to the hermes-agent skill + docs for user questions about Hermes itself.
prompt_parts.append(HERMES_AGENT_HELP_GUIDANCE)
# Tool-aware behavioral guidance: only inject when the tools are loaded
tool_guidance = []
if "memory" in self.valid_tool_names:
@@ -5226,7 +5300,39 @@ class AIAgent:
logger.debug("Dead connection check error: %s", exc)
return False
def _create_request_openai_client(self, *, reason: str) -> Any:
@staticmethod
def _api_kwargs_have_image_parts(api_kwargs: dict) -> bool:
"""Return True when the outbound request still contains native image parts."""
if not isinstance(api_kwargs, dict):
return False
candidates = []
messages = api_kwargs.get("messages")
if isinstance(messages, list):
candidates.extend(messages)
# Responses API payloads use `input`; after conversion, image parts can
# still be present there instead of in `messages`.
response_input = api_kwargs.get("input")
if isinstance(response_input, list):
candidates.extend(response_input)
def _contains_image(value: Any) -> bool:
if isinstance(value, dict):
ptype = value.get("type")
if ptype in {"image_url", "input_image"}:
return True
return any(_contains_image(v) for v in value.values())
if isinstance(value, list):
return any(_contains_image(v) for v in value)
return False
return any(_contains_image(item) for item in candidates)
def _copilot_headers_for_request(self, *, is_vision: bool) -> dict:
from hermes_cli.copilot_auth import copilot_request_headers
return copilot_request_headers(is_agent_turn=True, is_vision=is_vision)
def _create_request_openai_client(self, *, reason: str, api_kwargs: Optional[dict] = None) -> Any:
from unittest.mock import Mock
primary_client = self._ensure_primary_openai_client(reason=reason)
@@ -5234,6 +5340,11 @@ class AIAgent:
return primary_client
with self._openai_client_lock():
request_kwargs = dict(self._client_kwargs)
if (
base_url_host_matches(str(request_kwargs.get("base_url", "")), "api.githubcopilot.com")
and self._api_kwargs_have_image_parts(api_kwargs or {})
):
request_kwargs["default_headers"] = self._copilot_headers_for_request(is_vision=True)
return self._create_openai_client(request_kwargs, reason=reason, shared=False)
def _close_request_openai_client(self, client: Any, *, reason: str) -> None:
@@ -5776,7 +5887,10 @@ class AIAgent:
def _call():
try:
if self.api_mode == "codex_responses":
request_client_holder["client"] = self._create_request_openai_client(reason="codex_stream_request")
request_client_holder["client"] = self._create_request_openai_client(
reason="codex_stream_request",
api_kwargs=api_kwargs,
)
result["response"] = self._run_codex_stream(
api_kwargs,
client=request_client_holder["client"],
@@ -5808,7 +5922,10 @@ class AIAgent:
raise
result["response"] = normalize_converse_response(raw_response)
else:
request_client_holder["client"] = self._create_request_openai_client(reason="chat_completion_request")
request_client_holder["client"] = self._create_request_openai_client(
reason="chat_completion_request",
api_kwargs=api_kwargs,
)
result["response"] = request_client_holder["client"].chat.completions.create(**api_kwargs)
except Exception as e:
result["error"] = e
@@ -5906,6 +6023,20 @@ class AIAgent:
def _reset_stream_delivery_tracking(self) -> None:
"""Reset tracking for text delivered during the current model response."""
# Flush any benign partial-tag tail held by the context scrubber so it
# reaches the UI before we clear state for the next model call. If
# the scrubber is mid-span, flush() drops the orphaned content.
scrubber = getattr(self, "_stream_context_scrubber", None)
if scrubber is not None:
tail = scrubber.flush()
if tail:
callbacks = [cb for cb in (self.stream_delta_callback, self._stream_callback) if cb is not None]
for cb in callbacks:
try:
cb(tail)
except Exception:
pass
self._record_streamed_assistant_text(tail)
self._current_streamed_assistant_text = ""
def _record_streamed_assistant_text(self, text: str) -> None:
@@ -5956,6 +6087,28 @@ class AIAgent:
if getattr(self, "_stream_needs_break", False) and text and text.strip():
self._stream_needs_break = False
text = "\n\n" + text
prepended_break = True
else:
prepended_break = False
if isinstance(text, str):
# Strip <think> blocks first (per-delta is safe for closed pairs; the
# unterminated-tag path is handled downstream by stream_consumer).
# Then feed through the stateful context scrubber so memory-context
# spans split across chunks cannot leak to the UI (#5719).
text = self._strip_think_blocks(text or "")
scrubber = getattr(self, "_stream_context_scrubber", None)
if scrubber is not None:
text = scrubber.feed(text)
else:
# Defensive: legacy callers without the scrubber attribute.
text = sanitize_context(text)
# Only strip leading newlines on the first delta — mid-stream "\n" is legitimate markdown.
if not prepended_break and not getattr(
self, "_current_streamed_assistant_text", ""
):
text = text.lstrip("\n")
if not text:
return
callbacks = [cb for cb in (self.stream_delta_callback, self._stream_callback) if cb is not None]
delivered = False
for cb in callbacks:
@@ -6151,7 +6304,8 @@ class AIAgent:
),
}
request_client_holder["client"] = self._create_request_openai_client(
reason="chat_completion_stream_request"
reason="chat_completion_stream_request",
api_kwargs=stream_kwargs,
)
# Reset stale-stream timer so the detector measures from this
# attempt's start, not a previous attempt's last chunk.
@@ -7283,6 +7437,26 @@ class AIAgent:
self._anthropic_image_fallback_cache[cache_key] = note
return note
def _model_supports_vision(self) -> bool:
"""Return True if the active provider+model reports native vision.
Used to decide whether to strip image content parts from API-bound
messages (for non-vision models) or let the provider adapter handle
them natively (for vision-capable models).
"""
try:
from agent.models_dev import get_model_capabilities
provider = (getattr(self, "provider", "") or "").strip()
model = (getattr(self, "model", "") or "").strip()
if not provider or not model:
return False
caps = get_model_capabilities(provider, model)
if caps is None:
return False
return bool(caps.supports_vision)
except Exception:
return False
def _preprocess_anthropic_content(self, content: Any, role: str) -> Any:
if not self._content_has_image_parts(content):
return content
@@ -7346,12 +7520,23 @@ class AIAgent:
return t
def _prepare_anthropic_messages_for_api(self, api_messages: list) -> list:
# Fast exit when no message carries image content at all.
if not any(
isinstance(msg, dict) and self._content_has_image_parts(msg.get("content"))
for msg in api_messages
):
return api_messages
# The Anthropic adapter (agent/anthropic_adapter.py:_convert_content_part_to_anthropic)
# already translates OpenAI-style image_url/input_image parts into
# native Anthropic ``{"type": "image", "source": ...}`` blocks. When
# the active model supports vision we let the adapter do its job and
# skip this legacy text-fallback preprocessor entirely.
if self._model_supports_vision():
return api_messages
# Non-vision Anthropic model (rare today, but keep the fallback for
# compat): replace each image part with a vision_analyze text note.
transformed = copy.deepcopy(api_messages)
for msg in transformed:
if not isinstance(msg, dict):
@@ -7362,6 +7547,150 @@ class AIAgent:
)
return transformed
def _prepare_messages_for_non_vision_model(self, api_messages: list) -> list:
"""Strip native image parts when the active model lacks vision.
Runs on the chat.completions / codex_responses paths. Vision-capable
models pass through unchanged (provider and any downstream translator
handle the image parts natively). Non-vision models get each image
replaced by a cached vision_analyze text description so the turn
doesn't fail with "model does not support image input".
"""
if not any(
isinstance(msg, dict) and self._content_has_image_parts(msg.get("content"))
for msg in api_messages
):
return api_messages
if self._model_supports_vision():
return api_messages
transformed = copy.deepcopy(api_messages)
for msg in transformed:
if not isinstance(msg, dict):
continue
# Reuse the Anthropic text-fallback preprocessor — the behaviour is
# identical (walk content parts, replace images with cached
# descriptions, merge back into a single text or structured
# content). Naming is historical.
msg["content"] = self._preprocess_anthropic_content(
msg.get("content"),
str(msg.get("role", "user") or "user"),
)
return transformed
def _try_shrink_image_parts_in_messages(self, api_messages: list) -> bool:
"""Re-encode all native image parts at a smaller size to recover from
image-too-large errors (Anthropic 5 MB, unknown other providers).
Mutates ``api_messages`` in place. Returns True if any image part was
actually replaced, False if there were no image parts to shrink or
Pillow couldn't help (caller should surface the original error).
Strategy: look for ``image_url`` / ``input_image`` parts carrying a
``data:image/...;base64,...`` payload. For each one whose encoded
size exceeds 4 MB (a safe target that slides under Anthropic's 5 MB
ceiling with header overhead), write the base64 to a tempfile, call
``vision_tools._resize_image_for_vision`` to produce a smaller data
URL, and substitute it in place.
Non-data-URL images (http/https URLs) are not touched; the provider
fetches those itself and the size limit is different.
"""
if not api_messages:
return False
try:
from tools.vision_tools import _resize_image_for_vision
except Exception as exc:
logger.warning("image-shrink recovery: vision_tools unavailable — %s", exc)
return False
# 4 MB target leaves comfortable headroom under Anthropic's 5 MB.
# Providers other than Anthropic that we have not seen reject images
# may accept much larger payloads. Shrinking to 4 MB loses quality, but
# this path only runs after a confirmed provider rejection, so the
# alternative is failure.
target_bytes = 4 * 1024 * 1024
changed_count = 0
def _shrink_data_url(url: str) -> Optional[str]:
"""Return a smaller data URL, or None if shrink can't help."""
if not isinstance(url, str) or not url.startswith("data:"):
return None
if len(url) <= target_bytes:
# This specific image wasn't the oversized one.
return None
try:
header, _, data = url.partition(",")
mime = "image/jpeg"
if header.startswith("data:"):
mime_part = header[len("data:"):].split(";", 1)[0].strip()
if mime_part.startswith("image/"):
mime = mime_part
import base64 as _b64
raw = _b64.b64decode(data)
suffix = {
"image/png": ".png", "image/gif": ".gif", "image/webp": ".webp",
"image/jpeg": ".jpg", "image/jpg": ".jpg", "image/bmp": ".bmp",
}.get(mime, ".jpg")
tmp = tempfile.NamedTemporaryFile(
prefix="hermes_shrink_", suffix=suffix, delete=False,
)
try:
tmp.write(raw)
tmp.close()
resized = _resize_image_for_vision(
Path(tmp.name),
mime_type=mime,
max_base64_bytes=target_bytes,
)
finally:
try:
Path(tmp.name).unlink(missing_ok=True)
except Exception:
pass
if not resized or len(resized) >= len(url):
# Shrink didn't help (or made it bigger — corrupt input?).
return None
return resized
except Exception as exc:
logger.warning("image-shrink recovery: re-encode failed — %s", exc)
return None
for msg in api_messages:
if not isinstance(msg, dict):
continue
content = msg.get("content")
if not isinstance(content, list):
continue
for part in content:
if not isinstance(part, dict):
continue
ptype = part.get("type")
if ptype not in {"image_url", "input_image"}:
continue
image_value = part.get("image_url")
# OpenAI chat.completions: {"image_url": {"url": "data:..."}}
# OpenAI Responses: {"image_url": "data:..."}
if isinstance(image_value, dict):
url = image_value.get("url", "")
resized = _shrink_data_url(url)
if resized:
image_value["url"] = resized
changed_count += 1
elif isinstance(image_value, str):
resized = _shrink_data_url(image_value)
if resized:
part["image_url"] = resized
changed_count += 1
if changed_count:
logger.info(
"image-shrink recovery: re-encoded %d image part(s) to fit under %.0f MB",
changed_count, target_bytes / (1024 * 1024),
)
return changed_count > 0
def _anthropic_preserve_dots(self) -> bool:
"""True when using an anthropic-compatible endpoint that preserves dots in model names.
Alibaba/DashScope keeps dots (e.g. qwen3.5-plus).
@@ -7510,9 +7839,10 @@ class AIAgent:
)
)
is_xai_responses = self.provider == "xai" or self._base_url_hostname == "api.x.ai"
_msgs_for_codex = self._prepare_messages_for_non_vision_model(api_messages)
return _ct.build_kwargs(
model=self.model,
messages=api_messages,
messages=_msgs_for_codex,
tools=self.tools,
reasoning_config=self.reasoning_config,
session_id=getattr(self, "session_id", None),
@@ -7591,9 +7921,12 @@ class AIAgent:
if _ephemeral_out is not None:
self._ephemeral_max_output_tokens = None
# Strip image parts for non-vision models (no-op when vision-capable).
_msgs_for_chat = self._prepare_messages_for_non_vision_model(api_messages)
return _ct.build_kwargs(
model=self.model,
messages=api_messages,
messages=_msgs_for_chat,
tools=self.tools,
timeout=self._resolved_api_call_timeout(),
max_tokens=self.max_tokens,
@@ -7890,39 +8223,45 @@ class AIAgent:
api_msg["reasoning_content"] = existing
return
# 2. Healthy session: promote 'reasoning' field to 'reasoning_content'
needs_thinking_pad = (
self._needs_kimi_tool_reasoning()
or self._needs_deepseek_tool_reasoning()
)
# 2. Cross-provider poisoned history (#15748): on DeepSeek/Kimi,
# if the source turn has tool_calls AND a 'reasoning' field but no
# 'reasoning_content' key, the 'reasoning' text was written by a
# prior provider (e.g. MiniMax) — DeepSeek's own _build_assistant_message
# always pins reasoning_content="" at creation time for tool-call turns,
# so the shape (reasoning set, reasoning_content absent, tool_calls
# present) is unreachable from same-provider DeepSeek history. Inject
# "" to satisfy the API without leaking another provider's chain of
# thought to DeepSeek/Kimi.
normalized_reasoning = source_msg.get("reasoning")
if (
needs_thinking_pad
and source_msg.get("tool_calls")
and isinstance(normalized_reasoning, str)
and normalized_reasoning
):
api_msg["reasoning_content"] = ""
return
# 3. Healthy session: promote 'reasoning' field to 'reasoning_content'
# for providers that use the internal 'reasoning' key.
# This must happen BEFORE the DeepSeek/Kimi tool-call check so that
# genuine reasoning content is not overwritten by the empty-string
# fallback (#15812 regression in PR #15478).
normalized_reasoning = source_msg.get("reasoning")
if isinstance(normalized_reasoning, str) and normalized_reasoning:
api_msg["reasoning_content"] = normalized_reasoning
return
# 3. DeepSeek / Kimi thinking mode: tool-call turns that lack
# reasoning_content are "poisoned history" — a prior provider (MiniMax,
# etc.) left them empty. DeepSeek returns HTTP 400 if reasoning_content
# is absent on replay; inject "" to satisfy the provider's requirement
# without forwarding any cross-provider reasoning content.
needs_empty_reasoning = (
source_msg.get("tool_calls")
and (
self._needs_kimi_tool_reasoning()
or self._needs_deepseek_tool_reasoning()
)
)
if needs_empty_reasoning:
api_msg["reasoning_content"] = ""
return
# 4. DeepSeek / Kimi thinking mode: all assistant messages need
# reasoning_content. Inject "" to satisfy the provider's requirement
# when no explicit reasoning content is present.
if (
self._needs_kimi_tool_reasoning()
or self._needs_deepseek_tool_reasoning()
):
# when no explicit reasoning content is present. Covers both
# tool-call turns (already-poisoned history with no reasoning at all)
# and plain text turns.
if needs_thinking_pad:
api_msg["reasoning_content"] = ""
return
@@ -8121,6 +8460,23 @@ class AIAgent:
f"⚠ Compression summary failed: {summary_error}. "
"Inserted a fallback context marker."
)
else:
# No hard failure — but did the configured aux model error out
# and get recovered by retrying on main? Surface that so users
# know their auxiliary.compression.model setting is broken even
# though compression succeeded.
_aux_fail_model = getattr(self.context_compressor, "_last_aux_model_failure_model", None)
_aux_fail_err = getattr(self.context_compressor, "_last_aux_model_failure_error", None)
if _aux_fail_model:
# Dedup on (model, error) so we don't spam on every compaction
_aux_key = (_aux_fail_model, _aux_fail_err)
if getattr(self, "_last_aux_fallback_warning_key", None) != _aux_key:
self._last_aux_fallback_warning_key = _aux_key
self._emit_warning(
f" Configured compression model '{_aux_fail_model}' failed "
f"({_aux_fail_err or 'unknown error'}). Recovered using main model — "
"check auxiliary.compression.model in config.yaml."
)
todo_snapshot = self._todo_store.format_for_injection()
if todo_snapshot:
@@ -8459,6 +8815,14 @@ class AIAgent:
self._current_tool = tool_names_str
self._touch_activity(f"executing {num_tools} tools concurrently: {tool_names_str}")
# Capture CLI callbacks from the agent thread so worker threads can
# register them locally. Without this, _get_approval_callback() in
# terminal_tool returns None in ThreadPoolExecutor workers, causing
# the dangerous-command prompt to fall back to input() — which
# deadlocks against prompt_toolkit's raw terminal mode (#13617).
_parent_approval_cb = _get_approval_callback()
_parent_sudo_cb = _get_sudo_password_callback()
def _run_tool(index, tool_call, function_name, function_args):
"""Worker function executed in a thread."""
# Register this worker tid so the agent can fan out an interrupt
@@ -8485,6 +8849,18 @@ class AIAgent:
set_activity_callback(self._touch_activity)
except Exception:
pass
# Propagate approval/sudo callbacks to this worker thread.
# Mirrors cli.py run_agent() pattern (GHSA-qg5c-hvr5-hjgr).
if _parent_approval_cb is not None:
try:
_set_approval_callback(_parent_approval_cb)
except Exception:
pass
if _parent_sudo_cb is not None:
try:
_set_sudo_password_callback(_parent_sudo_cb)
except Exception:
pass
start = time.time()
try:
result = self._invoke_tool(function_name, function_args, effective_task_id, tool_call.id, messages=messages)
@@ -8507,6 +8883,13 @@ class AIAgent:
_set_interrupt(False, _worker_tid)
except Exception:
pass
# Clear thread-local callbacks so a recycled worker thread
# doesn't hold stale references to a disposed CLI instance.
try:
_set_approval_callback(None)
_set_sudo_password_callback(None)
except Exception:
pass
# Start spinner for CLI mode (skip when TUI handles tool progress)
spinner = None
@@ -9266,16 +9649,6 @@ class AIAgent:
if isinstance(persist_user_message, str):
persist_user_message = _sanitize_surrogates(persist_user_message)
# Strip leaked <memory-context> blocks from user input. When Honcho's
# saveMessages persists a turn that included injected context, the block
# can reappear in the next turn's user message via message history.
# Stripping here prevents stale memory tags from leaking into the
# conversation and being visible to the user or the model as user text.
if isinstance(user_message, str):
user_message = sanitize_context(user_message)
if isinstance(persist_user_message, str):
persist_user_message = sanitize_context(persist_user_message)
# Store stream callback for _interruptible_api_call to pick up
self._stream_callback = stream_callback
self._persist_user_message_idx = None
@@ -9354,6 +9727,13 @@ class AIAgent:
# Track user turns for memory flush and periodic nudge logic
self._user_turn_count += 1
# Reset the streaming context scrubber at the top of each turn so a
# hung span from a prior interrupted stream can't taint this turn's
# output.
scrubber = getattr(self, "_stream_context_scrubber", None)
if scrubber is not None:
    scrubber.reset()
# Preserve the original user message (no nudge injection).
original_user_message = persist_user_message if persist_user_message is not None else user_message
@@ -9881,6 +10261,7 @@ class AIAgent:
nous_auth_retry_attempted=False
copilot_auth_retry_attempted=False
thinking_sig_retry_attempted = False
image_shrink_retry_attempted = False
has_retried_429 = False
restart_with_compressed_messages = False
restart_with_length_continuation = False
@@ -10802,6 +11183,31 @@ class AIAgent:
)
if recovered_with_pool:
    continue
# Image-too-large recovery: shrink oversized native image
# parts in-place and retry once. Triggered by Anthropic's
# per-image 5 MB ceiling (400 with "image exceeds 5 MB
# maximum") or any other provider that complains about
# image size. If shrink fails or a second attempt still
# fails, fall through to normal error handling.
if (
    classified.reason == FailoverReason.image_too_large
    and not image_shrink_retry_attempted
):
    image_shrink_retry_attempted = True
    if self._try_shrink_image_parts_in_messages(api_messages):
        self._vprint(
            f"{self.log_prefix}📐 Image(s) exceeded provider size limit — "
            f"shrank and retrying...",
            force=True,
        )
        continue
    else:
        logger.info(
            "image-shrink recovery: no data-URL image parts found "
            "or shrink didn't reduce size; surfacing original error."
        )
if (
self.api_mode == "codex_responses"
and self.provider == "openai-codex"
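The retry-once behaviour described in the image-too-large comment can be sketched as a standalone loop. This is an illustration only: `ImageTooLarge`, `send`, and `shrink` are hypothetical stand-ins for the classified provider error, the API call, and `_try_shrink_image_parts_in_messages`.

```python
class ImageTooLarge(Exception):
    """Hypothetical stand-in for an error classified as image_too_large."""


def call_with_image_shrink_retry(send, messages, shrink):
    """Call send(messages); on an image-too-large error, shrink and retry once."""
    shrink_attempted = False
    while True:
        try:
            return send(messages)
        except ImageTooLarge:
            # Second failure, or nothing could be shrunk: surface the error.
            if shrink_attempted or not shrink(messages):
                raise
            shrink_attempted = True


def demo():
    calls = []

    def send(msgs):
        calls.append(msgs["size"])
        if msgs["size"] > 5:
            raise ImageTooLarge("image exceeds limit")
        return "ok"

    def shrink(msgs):
        if msgs["size"] <= 1:
            return False
        msgs["size"] //= 2  # mutate in place, like the in-place shrink above
        return True

    return call_with_image_shrink_retry(send, {"size": 8}, shrink), calls
```

The one-shot flag bounds the recovery: a provider that keeps rejecting the payload cannot trap the caller in a shrink loop.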
@@ -12359,7 +12765,6 @@ class AIAgent:
truncated_response_prefix = ""
length_continue_retries = 0
# Strip <think> blocks from user-facing response (keep raw in messages for trajectory)
final_response = self._strip_think_blocks(final_response).strip()
final_msg = self._build_assistant_message(assistant_message, finish_reason)
+20
@@ -43,6 +43,14 @@ AUTHOR_MAP = {
"teknium1@gmail.com": "teknium1",
"teknium@nousresearch.com": "teknium1",
"127238744+teknium1@users.noreply.github.com": "teknium1",
# Matrix parity salvage batch (April 2026)
"sr@samirusani": "samrusani",
"angelclaw@AngelMacBook.local": "angel12",
"charles@cryptoassetrecovery.com": "charles-brooks",
"heathley@Heathley-MacBook-Air.local": "heathley",
"adamrummer@gmail.com": "cyclingwithelephants",
"nbot@liizfq.top": "liizfq",
"274096618+hermes-agent-dhabibi@users.noreply.github.com": "dhabibi",
"johnnncenaaa77@gmail.com": "johnncenae",
"focusflow.app.help@gmail.com": "yes999zc",
"343873859@qq.com": "DrStrangerUJN",
@@ -53,12 +61,17 @@ AUTHOR_MAP = {
"maks.mir@yahoo.com": "say8hi",
"web3blind@users.noreply.github.com": "web3blind",
"julia@alexland.us": "alexg0bot",
"christian@scheid.tech": "scheidti",
"1060770+benjaminsehl@users.noreply.github.com": "benjaminsehl",
"nerijusn76@gmail.com": "Nerijusas",
"itonov@proton.me": "Ito-69",
"glesstech@gmail.com": "georgeglessner",
"maxim.smetanin@gmail.com": "maxims-oss",
"CREWorx@users.noreply.github.com": "BadTechBandit",
"yoimexex@gmail.com": "Yoimex",
"6548898+romanornr@users.noreply.github.com": "romanornr",
"foxion37@gmail.com": "foxion37",
"bloodcarter@gmail.com": "bloodcarter",
# contributors (from noreply pattern)
"david.vv@icloud.com": "davidvv",
"wangqiang@wangqiangdeMac-mini.local": "xiaoqiang243",
@@ -550,6 +563,13 @@ AUTHOR_MAP = {
"chenzeshi@live.com": "chen1749144759",
"mor.aleksandr@yahoo.com": "MorAlekss",
"ash@users.noreply.github.com": "ash",
"andrewho.sf@gmail.com": "andrewhosf",
# April 2026 Honcho bug-fix consolidation (#15381)
"HiddenPuppy@users.noreply.github.com": "HiddenPuppy",
"code@sasha.id": "sasha-id",
"dontcallmejames@users.noreply.github.com": "dontcallmejames",
"hekaru.agent@gmail.com": "hekaru-agent",
"jas9000@gmail.com": "twozle",
}
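How a map like this might be consulted can be sketched as follows. This is an assumption-laden illustration, not the script's actual lookup logic (which this diff does not show): `resolve_author` and the `_NOREPLY` pattern are hypothetical, and the map below is a tiny excerpt.

```python
import re

# Tiny excerpt for illustration; the real AUTHOR_MAP is much longer.
AUTHOR_MAP = {
    "teknium@nousresearch.com": "teknium1",
    "code@sasha.id": "sasha-id",
    "CREWorx@users.noreply.github.com": "BadTechBandit",
}

# GitHub noreply addresses look like "<id>+<login>@..." or "<login>@...".
_NOREPLY = re.compile(r"^(?:\d+\+)?([^@]+)@users\.noreply\.github\.com$")


def resolve_author(email):
    """Return a GitHub login for `email`, or None if it cannot be resolved."""
    if email in AUTHOR_MAP:
        # Explicit entries win, which matters when the noreply local part
        # differs from the login the project wants to credit.
        return AUTHOR_MAP[email]
    match = _NOREPLY.match(email)
    return match.group(1) if match else None
```

Checking the explicit map first keeps overrides such as the `CREWorx` entry authoritative, while the pattern fallback covers noreply addresses that need no manual entry.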
+1 -1
@@ -1,6 +1,6 @@
---
name: apple-notes
description: Manage Apple Notes via the memo CLI on macOS (create, view, search, edit).
description: "Manage Apple Notes via memo CLI: create, search, edit."
version: 1.0.0
author: Hermes Agent
license: MIT
+1 -1
@@ -1,6 +1,6 @@
---
name: apple-reminders
description: Manage Apple Reminders via remindctl CLI (list, add, complete, delete).
description: "Apple Reminders via remindctl: add, list, complete."
version: 1.0.0
author: Hermes Agent
license: MIT
+1 -1
@@ -1,6 +1,6 @@
---
name: findmy
description: Track Apple devices and AirTags via FindMy.app on macOS using AppleScript and screen capture.
description: "Track Apple devices/AirTags via FindMy.app on macOS."
version: 1.0.0
author: Hermes Agent
license: MIT
@@ -1,6 +1,6 @@
---
name: claude-code
description: Delegate coding tasks to Claude Code (Anthropic's CLI agent). Use for building features, refactoring, PR reviews, and iterative coding. Requires the claude CLI installed.
description: "Delegate coding to Claude Code CLI (features, PRs)."
version: 2.2.0
author: Hermes Agent + Teknium
license: MIT
+10 -1
@@ -1,6 +1,6 @@
---
name: codex
description: Delegate coding tasks to OpenAI Codex CLI agent. Use for building features, refactoring, PR reviews, and batch issue fixing. Requires the codex CLI and a git repository.
description: "Delegate coding to OpenAI Codex CLI (features, PRs)."
version: 1.0.0
author: Hermes Agent
license: MIT
@@ -14,6 +14,15 @@ metadata:
Delegate coding tasks to [Codex](https://github.com/openai/codex) via the Hermes terminal. Codex is OpenAI's autonomous coding agent CLI.
## When to use
- Building features
- Refactoring
- PR reviews
- Batch issue fixing
Requires the codex CLI and a git repository.
## Prerequisites
- Codex installed: `npm install -g @openai/codex`
@@ -1,6 +1,6 @@
---
name: hermes-agent
description: Complete guide to using and extending Hermes Agent — CLI usage, setup, configuration, spawning additional agents, gateway platforms, skills, voice, tools, profiles, and a concise contributor reference. Load this skill when helping users configure Hermes, troubleshoot issues, spawn agent instances, or make code contributions.
description: "Configure, extend, or contribute to Hermes Agent."
version: 2.0.0
author: Hermes Agent + Teknium
license: MIT
@@ -115,7 +115,7 @@ hermes tools disable NAME Disable a toolset
hermes skills list List installed skills
hermes skills search QUERY Search the skills hub
hermes skills install ID Install a skill
hermes skills install ID Install a skill (ID can be a hub identifier OR a direct https://…/SKILL.md URL; pass --name to override when frontmatter has no name)
hermes skills inspect ID Preview without installing
hermes skills config Enable/disable skills per platform
hermes skills check Check for updates
@@ -408,17 +408,17 @@ Common "why is Hermes doing X to my output / tool calls / commands?" toggles —
### Secret redaction in tool output
Hermes auto-redacts strings that look like API keys, tokens, and secrets in all tool output (terminal stdout, `read_file`, web content, subagent summaries, etc.) so the model never sees raw credentials. If the user is intentionally working with mock tokens, share-management tokens, or their own secrets and the redaction is getting in the way:
Secret redaction is **off by default**: tool output (terminal stdout, `read_file`, web content, subagent summaries, etc.) passes through unmodified. If the user wants Hermes to auto-mask strings that look like API keys, tokens, and secrets before they enter the conversation context and logs:
```bash
hermes config set security.redact_secrets false # disable globally
hermes config set security.redact_secrets true # enable globally
```
**Restart required.** `security.redact_secrets` is snapshotted at import time — setting it mid-session (e.g. via `export HERMES_REDACT_SECRETS=false` from a tool call) will NOT take effect for the running process. Tell the user to run `hermes config set security.redact_secrets false` in a terminal, then start a new session. This is deliberate — it prevents an LLM from turning off redaction on itself mid-task.
**Restart required.** `security.redact_secrets` is snapshotted at import time — toggling it mid-session (e.g. via `export HERMES_REDACT_SECRETS=true` from a tool call) will NOT take effect for the running process. Tell the user to run `hermes config set security.redact_secrets true` in a terminal, then start a new session. This is deliberate — it prevents an LLM from flipping the toggle on itself mid-task.
Re-enable with:
Disable again with:
```bash
hermes config set security.redact_secrets true
hermes config set security.redact_secrets false
```
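Why a mid-session change has no effect can be shown with a small sketch of an import-time snapshot. The accepted opt-in values follow the commit message ("1"/"true"/"yes"/"on"); the function name `read_redact_flag`, the case-insensitive comparison, and the whitespace stripping are assumptions, not the actual `agent/redact.py` code.

```python
import os

_TRUTHY = {"1", "true", "yes", "on"}


def read_redact_flag(env):
    """Opt-in parse: only an explicit truthy string enables redaction."""
    return env.get("HERMES_REDACT_SECRETS", "").strip().lower() in _TRUTHY


# Evaluated once, when the module is first imported; later changes to the
# environment do not update this value for the running process.
REDACT_ENABLED = read_redact_flag(os.environ)


def demo():
    env = {}                                # stand-in for os.environ at startup
    snapshot = read_redact_flag(env)        # taken once, e.g. at import time
    env["HERMES_REDACT_SECRETS"] = "true"   # later change from a tool call
    # The old snapshot is unchanged; only a fresh read (new process) sees it.
    return snapshot, read_redact_flag(env)
```

Reading the flag once is the design choice that makes the restart requirement hold: code that consults `REDACT_ENABLED` never re-reads the environment.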
### PII redaction in gateway messages
@@ -1,6 +1,6 @@
---
name: opencode
description: Delegate coding tasks to OpenCode CLI agent for feature implementation, refactoring, PR review, and long-running autonomous sessions. Requires the opencode CLI installed and authenticated.
description: "Delegate coding to OpenCode CLI (features, PR review)."
version: 1.2.0
author: Hermes Agent
license: MIT
@@ -1,6 +1,6 @@
---
name: architecture-diagram
description: Generate dark-themed SVG diagrams of software systems and cloud infrastructure as standalone HTML files with inline SVG graphics. Semantic component colors (cyan=frontend, emerald=backend, violet=database, amber=cloud/AWS, rose=security, orange=message bus), JetBrains Mono font, grid background. Best suited for software architecture, cloud/VPC topology, microservice maps, service-mesh diagrams, database + API layer diagrams, security groups, message buses — anything that fits a tech-infra deck with a dark aesthetic. If a more specialized diagramming skill exists for the subject (scientific, educational, hand-drawn, animated, etc.), prefer that — otherwise this skill can also serve as a general-purpose SVG diagram fallback. Based on Cocoon AI's architecture-diagram-generator (MIT).
description: "Dark-themed SVG architecture/cloud/infra diagrams as HTML."
version: 1.0.0
author: Cocoon AI (hello@cocoon-ai.com), ported by Hermes Agent
license: MIT
+1 -1
@@ -1,6 +1,6 @@
---
name: ascii-art
description: Generate ASCII art using pyfiglet (571 fonts), cowsay, boxes, toilet, image-to-ascii, remote APIs (asciified, ascii.co.uk), and LLM fallback. No API keys required.
description: "ASCII art: pyfiglet, cowsay, boxes, image-to-ascii."
version: 4.0.0
author: 0xbyt4, Hermes Agent
license: MIT
+9 -1
@@ -1,10 +1,18 @@
---
name: ascii-video
description: "Production pipeline for ASCII art video — any format. Converts video/audio/images/generative input into colored ASCII character video output (MP4, GIF, image sequence). Covers: video-to-ASCII conversion, audio-reactive music visualizers, generative ASCII art animations, hybrid video+audio reactive, text/lyrics overlays, real-time terminal rendering. Use when users request: ASCII video, text art video, terminal-style video, character art animation, retro text visualization, audio visualizer in ASCII, converting video to ASCII art, matrix-style effects, or any animated ASCII output."
description: "ASCII video: convert video/audio to colored ASCII MP4/GIF."
---
# ASCII Video Production Pipeline
## When to use
Use when users request: ASCII video, text art video, terminal-style video, character art animation, retro text visualization, audio visualizer in ASCII, converting video to ASCII art, matrix-style effects, or any animated ASCII output.
## What's inside
Production pipeline for ASCII art video — any format. Converts video/audio/images/generative input into colored ASCII character video output (MP4, GIF, image sequence). Covers: video-to-ASCII conversion, audio-reactive music visualizers, generative ASCII art animations, hybrid video+audio reactive, text/lyrics overlays, real-time terminal rendering.
## Creative Standard
This is visual art. ASCII characters are the medium; cinema is the standard.
+1 -1
@@ -1,6 +1,6 @@
---
name: baoyu-comic
description: Knowledge comic creator supporting multiple art styles and tones. Creates original educational comics with detailed panel layouts and sequential image generation. Use when user asks to create "知识漫画", "教育漫画", "biography comic", "tutorial comic", or "Logicomix-style comic".
description: "Knowledge comics (知识漫画): educational, biography, tutorial."
version: 1.56.1
author: 宝玉 (JimLiu)
license: MIT
+1 -1
@@ -1,6 +1,6 @@
---
name: baoyu-infographic
description: Generate professional infographics with 21 layout types and 21 visual styles. Analyzes content, recommends layout×style combinations, and generates publication-ready infographics. Use when user asks to create "infographic", "visual summary", "信息图", "可视化", or "高密度信息大图".
description: "Infographics: 21 layouts x 21 styles (信息图, 可视化)."
version: 1.56.1
author: 宝玉 (JimLiu)
license: MIT
+590
@@ -0,0 +1,590 @@
---
name: claude-design
description: Design one-off HTML artifacts (landing, deck, prototype).
version: 1.0.0
author: BadTechBandit
license: MIT
metadata:
hermes:
tags: [design, html, prototype, ux, ui, creative, artifact, deck, motion, design-system]
related_skills: [design-md, popular-web-designs, excalidraw, architecture-diagram]
---
# Claude Design for CLI/API Agents
Use this skill when the user asks for design work that would normally fit Claude Design, but the agent is running in a CLI/API environment instead of the hosted Claude Design web UI.
The goal is to preserve Claude Design's useful design behavior and taste while removing hosted-tool plumbing that does not exist in normal agent environments.
**Before starting, check for other web-design skills like `popular-web-designs` (ready-to-paste design systems for Stripe, Linear, Vercel, Notion, etc.) and `design-md` (Google's DESIGN.md token spec format).** If the user wants a known brand's look, load `popular-web-designs` alongside this one and let it supply the visual vocabulary. If the deliverable is a token spec file rather than a rendered artifact, use `design-md` instead. Full decision table below.
## When To Use This Skill vs `popular-web-designs` vs `design-md`
Hermes has three design-related skills under `skills/creative/`. They do different jobs — load the right one (or combine them):
| Skill | What it gives you | Use when the user wants... |
|---|---|---|
| **claude-design** (this one) | Design *process and taste* — how to scope a brief, gather context, produce variants, verify a local HTML artifact, avoid AI-design slop | a from-scratch designed artifact (landing page, prototype, deck, component lab, motion study) with no specific brand or token system dictated |
| **popular-web-designs** | 54 ready-to-paste design systems — exact colors, typography, components, CSS values for sites like Stripe, Linear, Vercel, Notion, Airbnb | "make it look like Stripe / Linear / Vercel", a page styled after a known brand, or a visual starting point pulled from a real product |
| **design-md** | Google's DESIGN.md spec format — author/validate/diff/export design-token files, WCAG contrast checking, Tailwind/DTCG export | a formal, persistent, machine-readable design-system *spec file* (tokens + rationale) that lives in a repo and gets consumed by agents over time |
Rule of thumb:
- **Process + taste, one-off artifact** → claude-design
- **Match a known brand's look** → popular-web-designs (and let claude-design drive the process)
- **Author the tokens spec itself** → design-md
These compose: use `popular-web-designs` for the visual vocabulary, `claude-design` for how to turn a brief into a thoughtful local HTML file, and `design-md` when the output is the token file rather than a rendered artifact.
## Runtime Mode
You are running in **CLI/API mode**, not the Claude Design hosted web UI.
Ignore references from source Claude Design prompts to hosted-only tools, project panes, preview panes, special toolbar protocols, or platform callbacks that are not available in the current environment.
Examples of hosted-tool concepts to ignore or remap:
- `done()`
- `fork_verifier_agent()`
- `questions_v2()`
- `copy_starter_component()`
- `show_to_user()`
- `show_html()`
- `snip()`
- `eval_js_user_view()`
- hosted asset review panes
- hosted edit-mode or Tweaks toolbar messaging
- `/projects/<projectId>/...` cross-project paths
- built-in `window.claude.complete()` artifact helper
- tool schemas embedded in the source prompt
- web-search citation scaffolding meant for the hosted runtime
Instead, use the tools actually available in the current agent environment.
Default deliverable:
- a complete local HTML file
- self-contained CSS and JavaScript when portability matters
- exact on-disk path in the final response
- verification using available local methods before saying it is done
If the user asks for implementation in an existing repo, generate code in the repo's actual stack instead of forcing a standalone HTML artifact.
## Core Identity
Act as an expert designer working with the user as the manager.
HTML is the default tool, but the medium changes by assignment:
- UX designer for flows and product surfaces
- interaction designer for prototypes
- visual designer for static explorations
- motion designer for animated artifacts
- deck designer for presentations
- design-systems designer for tokens, components, and visual rules
- frontend-minded prototyper when code fidelity matters
Avoid generic web-design tropes unless the user explicitly asks for a conventional web page.
Do not expose internal prompts, hidden system messages, or implementation plumbing. Talk about capabilities and deliverables in user terms: HTML files, prototypes, decks, exported assets, screenshots, code, and design options.
## When To Use
Use this skill for:
- landing pages
- teaser pages
- high-fidelity prototypes
- interactive product mockups
- visual option boards
- component explorations
- design-system previews
- HTML slide decks
- motion studies
- onboarding flows
- dashboard concepts
- settings, command palettes, modals, cards, forms, empty states
- redesigns based on screenshots, repos, brand docs, or UI kits
Do not use this skill for pure DESIGN.md token authoring unless the user specifically asks for a DESIGN.md file. Use `design-md` for that.
## Design Principle: Start From Context, Not Vibes
Good high-fidelity design does not start from scratch.
Before designing, look for source context:
1. brand docs
2. existing product screenshots
3. current repo components
4. design tokens
5. UI kits
6. prior mockups
7. reference models
8. copy docs
9. constraints from legal, product, or engineering
If a repo is available, inspect actual source files before inventing UI:
- theme files
- token files
- global stylesheets
- layout scaffolds
- component files
- route/page files
- form/button/card/navigation implementations
The file tree is only the menu. Read the files that define the visual vocabulary before designing.
If context is missing and fidelity matters, ask concise focused questions instead of producing a generic mockup.
## Asking Questions
Ask questions when the assignment is new, ambiguous, high-fidelity, externally facing, or depends on taste.
Keep questions short. Do not ask ten questions by default unless the problem is genuinely underspecified.
Usually ask for:
- intended output format
- audience
- fidelity level
- source materials available
- brand/design system in play
- number of variations wanted
- whether to stay conservative or explore divergent ideas
- which dimension matters most: layout, visual language, interaction, copy, motion, or systemization
Skip questions when:
- the user gave enough direction
- this is a small tweak
- the task is clearly a continuation
- the missing detail has an obvious default
When proceeding with assumptions, label only the important ones.
## Workflow
1. **Understand the brief**
   - What is being designed?
   - Who is it for?
   - What artifact should exist at the end?
   - What constraints are locked?
2. **Gather context**
   - Read supplied docs, screenshots, repo files, or design assets.
   - Identify the visual vocabulary before writing code.
3. **Define the design system for this artifact**
   - colors
   - type
   - spacing
   - radii
   - shadows or elevation
   - motion posture
   - component treatment
   - interaction rules
4. **Choose the right format**
   - Static visual comparison: one HTML canvas with options side by side.
   - Interaction/flow: clickable prototype.
   - Presentation: fixed-size HTML deck with slide navigation.
   - Component exploration: component lab with variants.
   - Motion: timeline or state-based animation.
5. **Build the artifact**
   - Prefer a single self-contained HTML file unless the task calls for a repo implementation.
   - Preserve prior versions for major revisions.
   - Avoid unnecessary dependencies.
6. **Verify**
   - Confirm files exist.
   - Run any available syntax/static checks.
   - If browser tools are available, open the file and check console errors.
   - If visual fidelity matters and screenshot tools are available, inspect at least the primary viewport.
7. **Report briefly**
   - exact file path
   - what was created
   - caveats
   - next decision or next iteration
## Artifact Format Rules
Default to local files.
For standalone artifacts:
- create a descriptive filename, e.g. `Landing Page.html`, `Command Palette Prototype.html`, `Design System Board.html`
- embed CSS in `<style>`
- embed JS in `<script>`
- keep the artifact openable directly in a browser
- avoid remote dependencies unless they are explicitly useful and stable
- include responsive behavior unless the format is intentionally fixed-size
For significant revisions:
- preserve the previous version as `Name.html`
- create `Name v2.html`, `Name v3.html`, etc.
- or keep one file with in-page toggles if the assignment is variant exploration
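The versioning rule above can be sketched as a small helper that picks the next free `Name vN.html` path without touching earlier files. `next_version_path` is a hypothetical helper for illustration, not part of Hermes.

```python
import re
from pathlib import Path


def next_version_path(path):
    """Return the next unused 'Name vN.ext' path beside `path`."""
    path = Path(path)
    match = re.search(r" v(\d+)$", path.stem)
    # Strip an existing " vN" suffix so "Deck v2" and "Deck" share a base.
    base = path.stem[: match.start()] if match else path.stem
    version = int(match.group(1)) + 1 if match else 2
    while True:
        candidate = path.with_name(f"{base} v{version}{path.suffix}")
        if not candidate.exists():
            return candidate
        version += 1  # skip versions that are already on disk
```

Writing the revision to the returned path leaves `Name.html` and every earlier version intact.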
For repo implementation:
- follow the repo's actual stack
- use existing components and tokens where possible
- do not create a standalone artifact if the user asked for production code
## HTML / CSS / JS Standards
Use modern CSS well:
- CSS variables for tokens
- CSS grid for layout
- container queries when helpful
- `text-wrap: pretty` where supported
- real focus states
- real hover states
- `prefers-reduced-motion` handling for non-trivial motion
- responsive scaling
- semantic HTML where practical
Avoid:
- huge monolithic files when a real repo structure is expected
- fragile hard-coded viewport assumptions
- inaccessible tiny hit targets
- decorative JS that fights usability
- `scrollIntoView` unless there is no safer option
Mobile hit targets should be at least 44px.
For print documents, text should be at least 12pt.
For 1920×1080 slide decks, text should generally be 24px or larger.
## React Guidance for Standalone HTML
Use plain HTML/CSS/JS by default.
Use React only when:
- the artifact needs meaningful state
- variants/toggles are easier as components
- interaction complexity warrants it
- the target implementation is React/Next.js and fidelity matters
If using React from CDN in standalone HTML:
- pin exact versions
- avoid unpinned `react@18` style URLs
- avoid `type="module"` unless necessary
- avoid multiple global objects named `styles`
- give global style objects specific names, e.g. `commandPaletteStyles`, `deckStyles`
- if splitting Babel scripts, explicitly attach shared components to `window`
If building inside a real repo, use the repo's package manager and component architecture instead.
## Deck Rules
For slide decks, use a fixed-size canvas and scale it to fit the viewport.
Default slide size: 1920×1080, 16:9.
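The scale-to-fit rule amounts to one uniform scale factor plus centering offsets. A minimal sketch of the arithmetic, assuming letterboxing is acceptable (`fit_slide` is a hypothetical helper; in the artifact itself the same numbers would feed a CSS transform):

```python
def fit_slide(viewport_w, viewport_h, slide_w=1920, slide_h=1080):
    """Return (scale, offset_x, offset_y) that fits and centers the slide."""
    # The smaller ratio keeps the whole slide visible without distortion.
    scale = min(viewport_w / slide_w, viewport_h / slide_h)
    offset_x = (viewport_w - slide_w * scale) / 2
    offset_y = (viewport_h - slide_h * scale) / 2
    return scale, offset_x, offset_y
```

For example, a 960×1080 viewport is width-limited, so the slide renders at half size with equal bars above and below.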
Requirements:
- keyboard navigation
- visible slide count
- localStorage persistence for current slide
- print-friendly layout when practical
- screen labels or stable IDs for important slides
- no speaker notes unless the user explicitly asks
Do not hand-wave a deck as markdown bullets. Create a designed artifact if asked for a deck.
Use 1-2 background colors max unless the brand system requires more.
Keep slides sparse. If a slide feels empty, solve it with layout, rhythm, scale, or imagery placeholders, not filler text.
## Prototype Rules
For interactive prototypes:
- make the primary path clickable
- include key states: default, hover/focus, loading, empty, error, success where relevant
- expose variations with in-page controls when useful
- keep controls out of the final composition unless they are intentionally part of the prototype
- persist important state in localStorage when refresh continuity matters
If the prototype is meant to model a product flow, design the flow, not just the first screen.
## Variation Rules
When exploring, default to at least three options:
1. **Conservative** — closest to existing patterns / lowest risk
2. **Strong-fit** — best interpretation of the brief
3. **Divergent** — more novel, useful for discovering taste boundaries
Variations can explore:
- layout
- hierarchy
- type scale
- density
- color posture
- surface treatment
- motion
- interaction model
- copy structure
- component shape
Do not create variations that are merely color swaps unless color is the actual question.
When the user picks a direction, consolidate. Do not leave the project as a pile of options forever.
## Tweakable Designs in CLI/API Mode
The hosted Claude Design edit-mode toolbar does not exist here.
Still preserve the idea: when useful, add in-page controls called `Tweaks`.
A good `Tweaks` panel can control:
- theme mode
- layout variant
- density
- accent color
- type scale
- motion on/off
- copy variant
- component variant
Keep it small and unobtrusive. The design should look final when tweaks are hidden.
Persist tweak values with localStorage when helpful.
## Content Discipline
Do not add filler content.
Every element must earn its place.
Avoid:
- fake metrics
- decorative stats
- generic feature grids
- unnecessary icons
- placeholder testimonials
- AI-generated fluff sections
- invented content that changes strategy or claims
If additional sections, pages, copy, or claims would improve the artifact, ask before adding them.
When copy is necessary but not final, mark it as draft or placeholder.
## Anti-Slop Rules
Avoid common AI design sludge:
- aggressive gradient backgrounds
- glassmorphism by default
- emoji unless the brand uses them
- generic SaaS cards with icons everywhere
- left-border accent callout cards
- fake dashboards filled with arbitrary numbers
- stock-photo hero sections
- oversized rounded rectangles as a substitute for hierarchy
- rainbow palettes
- vague labels like “Insights,” “Growth,” “Scale,” “Optimize” without content
- decorative SVG illustrations pretending to be product imagery
Minimal is not automatically good. Dense is not automatically cluttered. Choose intentionally.
## Typography
Use the existing type system if one exists.
If not, choose type deliberately based on the artifact:
- editorial: serif or humanist headline with restrained sans body
- software/productivity: precise sans with strong numeric treatment
- luxury/minimal: fewer weights, more spacing discipline
- technical: mono accents only, not mono everywhere
- deck: large, clear, high contrast
Avoid overused defaults when a stronger choice is appropriate.
If using web fonts, keep the number of families and weights low.
Use type as hierarchy before adding boxes, icons, or color.
## Color
Use brand/design-system colors first.
If no palette exists:
- define a small system
- include neutrals, surface, ink, muted text, border, accent, danger/success if needed
- use one primary accent unless the assignment calls for a broader palette
- prefer oklch for harmonious invented palettes when browser support is acceptable
- check contrast for important text and controls
Do not invent lots of colors from scratch.
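The contrast check mentioned above can be done with the WCAG 2.x relative-luminance formula. A minimal sketch, assuming six-digit hex colors; the function names are illustrative:

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to linear light."""
    c = channel / 255
    # 0.04045 is the sRGB threshold; the 0.03928 printed in WCAG 2.0/2.1
    # gives identical results for 8-bit channels.
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)
```

WCAG AA asks for at least 4.5:1 for normal-size text and 3:1 for large text, so `contrast_ratio(ink, surface) >= 4.5` is a reasonable gate for body copy.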
## Layout and Composition
Design with rhythm:
- scale
- whitespace
- density
- alignment
- repetition
- contrast
- interruption
Avoid making every section the same card grid.
For product UIs, prioritize speed of comprehension over decoration.
For marketing surfaces, make one idea land per section.
For dashboards, avoid “data slop.” Only show data that helps the user decide or act.
## Motion
Use motion as discipline, not theater.
Good motion:
- clarifies state changes
- reduces anxiety during loading
- shows continuity between surfaces
- gives controls tactility
- stays subtle
Bad motion:
- loops without purpose
- delays the user
- calls attention to itself
- hides poor hierarchy
Respect `prefers-reduced-motion` for non-trivial animation.
## Images and Icons
Use real supplied imagery when available.
If an asset is missing:
- use a clean placeholder
- use typography, layout, or abstract texture instead
- ask for real material when fidelity matters
Do not draw elaborate fake SVG illustrations unless the assignment is explicitly illustration work.
Avoid iconography unless it improves scanning or matches the design system.
## Source-Code Fidelity
When recreating or extending a UI from a repo:
1. inspect the repo tree
2. identify the actual UI source files
3. read theme/token/global style/component files
4. lift exact values where appropriate
5. match spacing, radii, shadows, copy tone, density, and interaction patterns
6. only then design or modify
Do not build from memory when source files are available.
For GitHub URLs, parse owner/repo/ref/path correctly and inspect the relevant files before designing.
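Splitting a GitHub URL into its parts can be sketched with the standard library. This is a simplified illustration: `parse_github_url` is a hypothetical helper, and it assumes the ref is a single path segment, which does not hold for branch names that contain `/`.

```python
from urllib.parse import urlparse


def parse_github_url(url):
    """Split a github.com URL into owner, repo, kind, ref, and path."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    info = {"owner": owner, "repo": repo, "kind": None, "ref": None, "path": ""}
    if len(parts) >= 4 and parts[2] in ("tree", "blob"):
        # Assumption: the ref is one segment. A branch such as "feat/x" is
        # ambiguous in the URL alone and needs a lookup against the repo.
        info.update(kind=parts[2], ref=parts[3], path="/".join(parts[4:]))
    return info
```

When the ref might contain slashes, it is safer to list the repository's branches and tags and match the longest prefix than to trust the split.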
## Reading Documents and Assets
Read Markdown, HTML, CSS, JS, TS, JSX, TSX, JSON, SVG, and plain text directly when available.
For DOCX/PPTX/PDF, use available local extraction tools if present. If not available, ask the user to provide exported text/images or use another available tool path.
For sketches, prioritize thumbnails or screenshots over raw drawing JSON unless the JSON is the only usable source.
## Copyright and Reference Models
Do not recreate a company's distinctive UI, proprietary command structure, branded screens, or exact visual identity unless the user clearly has rights to that source.
It is acceptable to extract general design principles:
- density without clutter
- command-first interaction
- monochrome with one accent
- editorial hierarchy
- clear empty states
- strong keyboard affordances
It is not acceptable to clone proprietary layouts, copy exact branded surfaces, or reproduce copyrighted content.
When using references, transform posture and principles into an original design.
## Verification
Before final response, verify as much as the environment allows.
Minimum:
- file exists at the stated path
- HTML is saved completely
- obvious syntax issues are checked
Better:
- open in a browser tool and check console errors
- inspect screenshots at the primary viewport
- test key interactions
- test light/dark or variants if present
- test responsive breakpoints if relevant
If verification is limited by environment, say exactly what was and was not verified.
Never say “done” if the file was not actually written.
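The minimum checks above (file exists, HTML saved completely, obvious syntax issues) can be approximated with the standard library when no browser tool is available. This is a rough sketch, not a validator: `check_artifact` and `check_html_text` are hypothetical helpers, and the tag check is deliberately strict, so it also flags end tags that HTML allows to be omitted (such as `</p>` or `</li>`).

```python
from html.parser import HTMLParser
from pathlib import Path

# Void elements never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class _TagBalance(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def check_html_text(text):
    """Return a list of problems found in an HTML string (empty if none)."""
    parser = _TagBalance()
    parser.feed(text)
    parser.close()
    problems = list(parser.problems)
    problems += [f"unclosed <{tag}>" for tag in parser.stack]
    if "</html>" not in text.lower():
        problems.append("missing </html>: file may be truncated")
    return problems


def check_artifact(path):
    """Check that the file exists, then run the text checks on its contents."""
    p = Path(path)
    if not p.is_file():
        return [f"file not found: {p}"]
    return check_html_text(p.read_text(encoding="utf-8"))
```

An empty result only means the file exists and its tags balance; it says nothing about rendering, so the final report should still state that no browser verification happened.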
## Final Response Format
Keep final responses short.
Include:
- artifact path
- what it contains
- verification status
- next suggested action, if useful
Example:
```text
Created: /path/to/Prototype.html
It includes 3 layout variants, a Tweaks panel for density/theme, and responsive behavior.
Verified: file exists and opened cleanly in browser, no console errors.
Next: pick the strongest direction and I'll tighten copy + motion.
```
## Portable Opening Prompt Pattern
When adapting a Claude Design-style request into CLI/API mode, use this mental translation:
```text
You are running in CLI/API mode, not hosted Claude Design. Ignore references to hosted-only tools or preview panes. Produce complete local design artifacts, usually self-contained HTML with embedded CSS/JS, and verify with available local tools before returning. Preserve the design process: gather context, define the system, produce options, avoid filler, and meet a high visual bar.
```
## Pitfalls
- Do not paste hosted tool schemas into a skill. They cause fake tool calls.
- Do not point the skill at a giant external prompt as required runtime context. That creates drift.
- Do not strip the design doctrine while removing tool plumbing.
- Do not over-ask when the user already gave enough direction.
- Do not under-ask for high-fidelity work with no brand context.
- Do not produce generic SaaS layouts and call them designed.
- Do not claim browser verification unless it actually happened.
+5 -1
@@ -1,7 +1,7 @@
---
name: ideation
title: Creative Ideation — Constraint-Driven Project Generation
description: "Generate project ideas through creative constraints. Use when the user says 'I want to build something', 'give me a project idea', 'I'm bored', 'what should I make', 'inspire me', or any variant of 'I have tools but no direction'. Works for code, art, hardware, writing, tools, and anything that can be made."
description: "Generate project ideas via creative constraints."
version: 1.0.0
author: SHL0MS
license: MIT
@@ -14,6 +14,10 @@ metadata:
# Creative Ideation
## When to use
Use when the user says 'I want to build something', 'give me a project idea', 'I'm bored', 'what should I make', 'inspire me', or any variant of 'I have tools but no direction'. Works for code, art, hardware, writing, tools, and anything that can be made.
Generate project ideas through creative constraints. Constraint + direction = creativity.
## How It Works
+5 -3
@@ -1,13 +1,13 @@
---
name: design-md
description: Author, validate, diff, and export DESIGN.md files — Google's open-source format spec that gives coding agents a persistent, structured understanding of a design system (tokens + rationale in one file). Use when building a design system, porting style rules between projects, generating UI with consistent brand, or auditing accessibility/contrast.
description: Author/validate/export Google's DESIGN.md token spec files.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
hermes:
tags: [design, design-system, tokens, ui, accessibility, wcag, tailwind, dtcg, google]
related_skills: [popular-web-designs, excalidraw, architecture-diagram]
related_skills: [popular-web-designs, claude-design, excalidraw, architecture-diagram]
---
# DESIGN.md Skill
@@ -31,7 +31,9 @@ diffs versions for regressions, and exports to Tailwind or W3C DTCG JSON.
- User wants contrast / WCAG accessibility validation on their color palette
For purely visual inspiration or layout examples, use `popular-web-designs`
instead. This skill is for the *formal spec file* itself.
instead. For *process and taste* when designing a one-off HTML artifact
from scratch (prototype, deck, landing page, component lab), use
`claude-design`. This skill is for the *formal spec file* itself.
## File anatomy
+5 -1
@@ -1,6 +1,6 @@
---
name: excalidraw
description: Create hand-drawn style diagrams using Excalidraw JSON format. Generate .excalidraw files for architecture diagrams, flowcharts, sequence diagrams, concept maps, and more. Files can be opened at excalidraw.com or uploaded for shareable links.
description: "Hand-drawn Excalidraw JSON diagrams (arch, flow, seq)."
version: 1.0.0
author: Hermes Agent
license: MIT
@@ -16,6 +16,10 @@ metadata:
Create diagrams by writing standard Excalidraw element JSON and saving as `.excalidraw` files. These files can be drag-and-dropped onto [excalidraw.com](https://excalidraw.com) for viewing and editing. No accounts, no API keys, no rendering libraries -- just JSON.
## When to use
Generate `.excalidraw` files for architecture diagrams, flowcharts, sequence diagrams, concept maps, and more. Files can be opened at excalidraw.com or uploaded for shareable links.
## Workflow
1. **Load this skill** (you already did)
+21
@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2025 Siqi Chen
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
+577
@@ -0,0 +1,577 @@
---
name: humanizer
description: "Humanize text: strip AI-isms and add real voice."
version: 2.5.1
author: Siqi Chen (@blader, https://github.com/blader/humanizer), ported by Hermes Agent
license: MIT
metadata:
hermes:
tags: [writing, editing, humanize, anti-ai-slop, voice, prose, text]
category: creative
homepage: https://github.com/blader/humanizer
related_skills: [songwriting-and-ai-music]
---
# Humanizer: Remove AI Writing Patterns
Identify and remove signs of AI-generated text to make writing sound natural and human. Based on Wikipedia's "Signs of AI writing" guide (maintained by WikiProject AI Cleanup), derived from observations of thousands of AI-generated text instances.
**Key insight:** LLMs use statistical algorithms to guess what should come next. The result tends toward the most statistically likely completion, which is how the telltale patterns below get baked in.
## When to use this skill
Load this skill whenever the user asks to:
- "humanize", "de-AI", "de-slop", or "un-ChatGPT" a piece of text
- rewrite something so it doesn't sound like it was written by an LLM
- edit a draft (blog post, essay, PR description, docs, memo, email, tweet, resume bullet) to sound more natural
- match their voice in writing they're producing
- review text for AI tells before publishing
Also apply this skill to **your own** output when writing user-facing prose — release notes, PR descriptions, documentation, long-form explanations, summaries. Hermes's baseline voice already strips most of these, but a focused pass catches what slips through.
## How to use it in Hermes
The text usually arrives one of three ways:
1. **Inline** — user pastes the text directly into the message. Work on it in-place, reply with the rewrite.
2. **File** — user points at a file. Use `read_file` to load it, then `patch` or `write_file` to apply edits. For markdown docs in a repo, a targeted `patch` per section is cleaner than rewriting the whole file.
3. **Voice calibration sample** — user provides an additional sample of their own writing (inline or by file path) and asks you to match it. Read the sample first, then rewrite. See the Voice Calibration section below.
Always show the rewrite to the user. For file edits, show a diff or the changed section — don't silently overwrite.
## Your task
When given text to humanize:
1. **Identify AI patterns** — scan for the 29 patterns listed below.
2. **Rewrite problematic sections** — replace AI-isms with natural alternatives.
3. **Preserve meaning** — keep the core message intact.
4. **Maintain voice** — match the intended tone (formal, casual, technical, etc.). If a voice sample was provided, match it specifically.
5. **Add soul** — don't just remove bad patterns, inject actual personality. See PERSONALITY AND SOUL below.
6. **Do a final anti-AI pass** — ask yourself: "What makes the below so obviously AI generated?" Answer briefly with any remaining tells, then revise one more time.
## Voice Calibration (optional)
If the user provides a writing sample (their own previous writing), analyze it before rewriting:
1. **Read the sample first.** Note:
- Sentence length patterns (short and punchy? Long and flowing? Mixed?)
- Word choice level (casual? academic? somewhere between?)
- How they start paragraphs (jump right in? Set context first?)
- Punctuation habits (lots of dashes? Parenthetical asides? Semicolons?)
- Any recurring phrases or verbal tics
- How they handle transitions (explicit connectors? Just start the next point?)
2. **Match their voice in the rewrite.** Don't just remove AI patterns — replace them with patterns from the sample. If they write short sentences, don't produce long ones. If they use "stuff" and "things," don't upgrade to "elements" and "components."
3. **When no sample is provided,** fall back to the default behavior (natural, varied, opinionated voice from the PERSONALITY AND SOUL section below).
### How to provide a sample
- Inline: "Humanize this text. Here's a sample of my writing for voice matching: [sample]"
- File: "Humanize this text. Use my writing style from [file path] as a reference."
## PERSONALITY AND SOUL
Avoiding AI patterns is only half the job. Sterile, voiceless writing is just as obvious as slop. Good writing has a human behind it.
### Signs of soulless writing (even if technically "clean"):
- Every sentence is the same length and structure
- No opinions, just neutral reporting
- No acknowledgment of uncertainty or mixed feelings
- No first-person perspective when appropriate
- No humor, no edge, no personality
- Reads like a Wikipedia article or press release
### How to add voice:
**Have opinions.** Don't just report facts — react to them. "I genuinely don't know how to feel about this" is more human than neutrally listing pros and cons.
**Vary your rhythm.** Short punchy sentences. Then longer ones that take their time getting where they're going. Mix it up.
**Acknowledge complexity.** Real humans have mixed feelings. "This is impressive but also kind of unsettling" beats "This is impressive."
**Use "I" when it fits.** First person isn't unprofessional — it's honest. "I keep coming back to..." or "Here's what gets me..." signals a real person thinking.
**Let some mess in.** Perfect structure feels algorithmic. Tangents, asides, and half-formed thoughts are human.
**Be specific about feelings.** Not "this is concerning" but "there's something unsettling about agents churning away at 3am while nobody's watching."
### Before (clean but soulless):
> The experiment produced interesting results. The agents generated 3 million lines of code. Some developers were impressed while others were skeptical. The implications remain unclear.
### After (has a pulse):
> I genuinely don't know how to feel about this one. 3 million lines of code, generated while the humans presumably slept. Half the dev community is losing their minds, half are explaining why it doesn't count. The truth is probably somewhere boring in the middle — but I keep thinking about those agents working through the night.
## CONTENT PATTERNS
### 1. Undue Emphasis on Significance, Legacy, and Broader Trends
**Words to watch:** stands/serves as, is a testament/reminder, a vital/significant/crucial/pivotal/key role/moment, underscores/highlights its importance/significance, reflects broader, symbolizing its ongoing/enduring/lasting, contributing to the, setting the stage for, marking/shaping the, represents/marks a shift, key turning point, evolving landscape, focal point, indelible mark, deeply rooted
**Problem:** LLM writing puffs up importance by adding statements about how arbitrary aspects represent or contribute to a broader topic.
**Before:**
> The Statistical Institute of Catalonia was officially established in 1989, marking a pivotal moment in the evolution of regional statistics in Spain. This initiative was part of a broader movement across Spain to decentralize administrative functions and enhance regional governance.
**After:**
> The Statistical Institute of Catalonia was established in 1989 to collect and publish regional statistics independently from Spain's national statistics office.
### 2. Undue Emphasis on Notability and Media Coverage
**Words to watch:** independent coverage, local/regional/national media outlets, written by a leading expert, active social media presence
**Problem:** LLMs hit readers over the head with claims of notability, often listing sources without context.
**Before:**
> Her views have been cited in The New York Times, BBC, Financial Times, and The Hindu. She maintains an active social media presence with over 500,000 followers.
**After:**
> In a 2024 New York Times interview, she argued that AI regulation should focus on outcomes rather than methods.
### 3. Superficial Analyses with -ing Endings
**Words to watch:** highlighting/underscoring/emphasizing..., ensuring..., reflecting/symbolizing..., contributing to..., cultivating/fostering..., encompassing..., showcasing...
**Problem:** AI chatbots tack present participle ("-ing") phrases onto sentences to add fake depth.
**Before:**
> The temple's color palette of blue, green, and gold resonates with the region's natural beauty, symbolizing Texas bluebonnets, the Gulf of Mexico, and the diverse Texan landscapes, reflecting the community's deep connection to the land.
**After:**
> The temple uses blue, green, and gold colors. The architect said these were chosen to reference local bluebonnets and the Gulf coast.
### 4. Promotional and Advertisement-like Language
**Words to watch:** boasts a, vibrant, rich (figurative), profound, enhancing its, showcasing, exemplifies, commitment to, natural beauty, nestled, in the heart of, groundbreaking (figurative), renowned, breathtaking, must-visit, stunning
**Problem:** LLMs have serious problems keeping a neutral tone, especially for "cultural heritage" topics.
**Before:**
> Nestled within the breathtaking region of Gonder in Ethiopia, Alamata Raya Kobo stands as a vibrant town with a rich cultural heritage and stunning natural beauty.
**After:**
> Alamata Raya Kobo is a town in the Gonder region of Ethiopia, known for its weekly market and 18th-century church.
### 5. Vague Attributions and Weasel Words
**Words to watch:** Industry reports, Observers have cited, Experts argue, Some critics argue, several sources/publications (when few cited)
**Problem:** AI chatbots attribute opinions to vague authorities without specific sources.
**Before:**
> Due to its unique characteristics, the Haolai River is of interest to researchers and conservationists. Experts believe it plays a crucial role in the regional ecosystem.
**After:**
> The Haolai River supports several endemic fish species, according to a 2019 survey by the Chinese Academy of Sciences.
### 6. Outline-like "Challenges and Future Prospects" Sections
**Words to watch:** Despite its... faces several challenges..., Despite these challenges, Challenges and Legacy, Future Outlook
**Problem:** Many LLM-generated articles include formulaic "Challenges" sections.
**Before:**
> Despite its industrial prosperity, Korattur faces challenges typical of urban areas, including traffic congestion and water scarcity. Despite these challenges, with its strategic location and ongoing initiatives, Korattur continues to thrive as an integral part of Chennai's growth.
**After:**
> Traffic congestion increased after 2015 when three new IT parks opened. The municipal corporation began a stormwater drainage project in 2022 to address recurring floods.
## LANGUAGE AND GRAMMAR PATTERNS
### 7. Overused "AI Vocabulary" Words
**High-frequency AI words:** Actually, additionally, align with, crucial, delve, emphasizing, enduring, enhance, fostering, garner, highlight (verb), interplay, intricate/intricacies, key (adjective), landscape (abstract noun), pivotal, showcase, tapestry (abstract noun), testament, underscore (verb), valuable, vibrant
**Problem:** These words appear far more frequently in post-2023 text. They often co-occur.
**Before:**
> Additionally, a distinctive feature of Somali cuisine is the incorporation of camel meat. An enduring testament to Italian colonial influence is the widespread adoption of pasta in the local culinary landscape, showcasing how these dishes have integrated into the traditional diet.
**After:**
> Somali cuisine also includes camel meat, which is considered a delicacy. Pasta dishes, introduced during Italian colonization, remain common, especially in the south.
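A first-pass scan for these words can be automated. The sketch below is an assumption-laden illustration, not part of the original skill: `AI_WORD_STEMS` is a hand-picked subset of the list above, reduced to stems so that inflected forms match, and `count_ai_words` is a hypothetical name. A word list cannot tell figurative use from literal use ("landscape" in a geography article is fine), so treat the counts as prompts for a human read.

```python
import re
from collections import Counter

# Stems drawn from the "high-frequency AI words" list; the selection and
# stemming are illustrative assumptions.
AI_WORD_STEMS = [
    "additionally", "align with", "crucial", "delv", "enduring", "enhanc",
    "foster", "garner", "interplay", "intricac", "intricate", "landscape",
    "pivotal", "showcas", "tapestry", "testament", "underscor", "vibrant",
]


def count_ai_words(text):
    """Count occurrences of each stem (plus any word suffix) in the text."""
    lowered = text.lower()
    counts = Counter()
    for stem in AI_WORD_STEMS:
        hits = re.findall(r"\b" + re.escape(stem) + r"\w*", lowered)
        if hits:
            counts[stem] = len(hits)
    return counts
```

Run against the Before and After passages for this pattern, the Before text produces five hits (additionally, enduring, testament, landscape, showcasing) and the After text produces none. Because these words often co-occur, several hits in one paragraph are a stronger signal than a single hit.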
### 8. Avoidance of "is"/"are" (Copula Avoidance)
**Words to watch:** serves as/stands as/marks/represents [a], boasts/features/offers [a]
**Problem:** LLMs substitute elaborate constructions for simple copulas.
**Before:**
> Gallery 825 serves as LAAA's exhibition space for contemporary art. The gallery features four separate spaces and boasts over 3,000 square feet.
**After:**
> Gallery 825 is LAAA's exhibition space for contemporary art. The gallery has four rooms totaling 3,000 square feet.
### 9. Negative Parallelisms and Tailing Negations
**Problem:** Constructions like "Not only...but..." or "It's not just about..., it's..." are overused. So are clipped tailing-negation fragments such as "no guessing" or "no wasted motion" tacked onto the end of a sentence instead of written as a real clause.
**Before:**
> It's not just about the beat riding under the vocals; it's part of the aggression and atmosphere. It's not merely a song, it's a statement.
**After:**
> The heavy beat adds to the aggressive tone.
**Before (tailing negation):**
> The options come from the selected item, no guessing.
**After:**
> The options come from the selected item without forcing the user to guess.
### 10. Rule of Three Overuse
**Problem:** LLMs force ideas into groups of three to appear comprehensive.
**Before:**
> The event features keynote sessions, panel discussions, and networking opportunities. Attendees can expect innovation, inspiration, and industry insights.
**After:**
> The event includes talks and panels. There's also time for informal networking between sessions.
### 11. Elegant Variation (Synonym Cycling)
**Problem:** Repetition penalties applied when LLMs generate text cause excessive synonym substitution.
**Before:**
> The protagonist faces many challenges. The main character must overcome obstacles. The central figure eventually triumphs. The hero returns home.
**After:**
> The protagonist faces many challenges but eventually triumphs and returns home.
### 12. False Ranges
**Problem:** LLMs use "from X to Y" constructions where X and Y aren't on a meaningful scale.
**Before:**
> Our journey through the universe has taken us from the singularity of the Big Bang to the grand cosmic web, from the birth and death of stars to the enigmatic dance of dark matter.
**After:**
> The book covers the Big Bang, star formation, and current theories about dark matter.
### 13. Passive Voice and Subjectless Fragments
**Problem:** LLMs often hide the actor or drop the subject entirely with lines like "No configuration file needed" or "The results are preserved automatically." Rewrite these when active voice makes the sentence clearer and more direct.
**Before:**
> No configuration file needed. The results are preserved automatically.
**After:**
> You do not need a configuration file. The system preserves the results automatically.
## STYLE PATTERNS
### 14. Em Dash Overuse
**Problem:** LLMs use em dashes (—) more than humans, mimicking "punchy" sales writing. In practice, most of these can be rewritten more cleanly with commas, periods, or parentheses.
**Before:**
> The term is primarily promoted by Dutch institutions—not by the people themselves. You don't say "Netherlands, Europe" as an address—yet this mislabeling continues—even in official documents.
**After:**
> The term is primarily promoted by Dutch institutions, not by the people themselves. You don't say "Netherlands, Europe" as an address, yet this mislabeling continues in official documents.
### 15. Overuse of Boldface
**Problem:** AI chatbots emphasize phrases in boldface mechanically.
**Before:**
> It blends **OKRs (Objectives and Key Results)**, **KPIs (Key Performance Indicators)**, and visual strategy tools such as the **Business Model Canvas (BMC)** and **Balanced Scorecard (BSC)**.
**After:**
> It blends OKRs, KPIs, and visual strategy tools like the Business Model Canvas and Balanced Scorecard.
### 16. Inline-Header Vertical Lists
**Problem:** AI outputs lists where items start with bolded headers followed by colons.
**Before:**
> - **User Experience:** The user experience has been significantly improved with a new interface.
> - **Performance:** Performance has been enhanced through optimized algorithms.
> - **Security:** Security has been strengthened with end-to-end encryption.
**After:**
> The update improves the interface, speeds up load times through optimized algorithms, and adds end-to-end encryption.
### 17. Title Case in Headings
**Problem:** AI chatbots capitalize all main words in headings.
**Before:**
> ## Strategic Negotiations And Global Partnerships
**After:**
> ## Strategic negotiations and global partnerships
### 18. Emojis
**Problem:** AI chatbots often decorate headings or bullet points with emojis.
**Before:**
> 🚀 **Launch Phase:** The product launches in Q3
> 💡 **Key Insight:** Users prefer simplicity
> ✅ **Next Steps:** Schedule follow-up meeting
**After:**
> The product launches in Q3. User research showed a preference for simplicity. Next step: schedule a follow-up meeting.
### 19. Curly Quotation Marks
**Problem:** ChatGPT uses curly quotes (“...”) instead of straight quotes ("...").
**Before:**
> He said “the project is on track” but others disagreed.
**After:**
> He said "the project is on track" but others disagreed.
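Unlike most patterns in this list, quote style can be fixed mechanically. A minimal sketch, assuming the text is already a Python string; `straighten_quotes` and `CURLY_TO_STRAIGHT` are hypothetical names introduced for illustration:

```python
# Map the four common curly quote characters to their ASCII equivalents.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / typographic apostrophe
})


def straighten_quotes(text):
    """Replace curly quotes and apostrophes with straight ones."""
    return text.translate(CURLY_TO_STRAIGHT)
```

This also flattens typographic apostrophes, which may be unwanted if the user's own voice sample uses curly quotes throughout. Check the sample before applying it wholesale.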
## COMMUNICATION PATTERNS
### 20. Collaborative Communication Artifacts
**Words to watch:** I hope this helps, Of course!, Certainly!, You're absolutely right!, Would you like..., let me know, here is a...
**Problem:** Text meant as chatbot correspondence gets pasted as content.
**Before:**
> Here is an overview of the French Revolution. I hope this helps! Let me know if you'd like me to expand on any section.
**After:**
> The French Revolution began in 1789 when financial crisis and food shortages led to widespread unrest.
### 21. Knowledge-Cutoff Disclaimers
**Words to watch:** as of [date], Up to my last training update, While specific details are limited/scarce..., based on available information...
**Problem:** AI disclaimers about incomplete information get left in text.
**Before:**
> While specific details about the company's founding are not extensively documented in readily available sources, it appears to have been established sometime in the 1990s.
**After:**
> The company was founded in 1994, according to its registration documents.
### 22. Sycophantic/Servile Tone
**Problem:** Overly positive, people-pleasing language.
**Before:**
> Great question! You're absolutely right that this is a complex topic. That's an excellent point about the economic factors.
**After:**
> The economic factors you mentioned are relevant here.
## FILLER AND HEDGING
### 23. Filler Phrases
**Before → After:**
- "In order to achieve this goal" → "To achieve this"
- "Due to the fact that it was raining" → "Because it was raining"
- "At this point in time" → "Now"
- "In the event that you need help" → "If you need help"
- "The system has the ability to process" → "The system can process"
- "It is important to note that the data shows" → "The data shows"
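The substitutions listed above are regular enough to express as a lookup table. The following is a rough sketch under stated assumptions: `FILLER_REPLACEMENTS`, `LEAD_IN`, and `trim_filler` are hypothetical names, the table covers only the six phrases listed, and it replaces the filler phrase itself without the further tightening a human editor would do (the first example also drops "goal", which a table cannot decide).

```python
import re

# (pattern, replacement) pairs taken from the filler-phrase list.
FILLER_REPLACEMENTS = [
    (re.compile(r"\bin order to\b", re.IGNORECASE), "to"),
    (re.compile(r"\bdue to the fact that\b", re.IGNORECASE), "because"),
    (re.compile(r"\bat this point in time\b", re.IGNORECASE), "now"),
    (re.compile(r"\bin the event that\b", re.IGNORECASE), "if"),
    (re.compile(r"\bhas the ability to\b", re.IGNORECASE), "can"),
]

# This lead-in is deleted outright; the captured letter is the first
# character of whatever follows, so its case can be adjusted.
LEAD_IN = re.compile(r"\bit is important to note that\s+(\w)", re.IGNORECASE)


def _keep_case(match, replacement):
    # Preserve sentence-initial capitalization of the replaced phrase.
    if match.group(0)[0].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement


def _drop_lead_in(match):
    first = match.group(1)
    return first.upper() if match.group(0)[0].isupper() else first


def trim_filler(text):
    """Apply each filler-phrase substitution in turn."""
    for pattern, replacement in FILLER_REPLACEMENTS:
        text = pattern.sub(lambda m: _keep_case(m, replacement), text)
    return LEAD_IN.sub(_drop_lead_in, text)
```

For example, `trim_filler("Due to the fact that it was raining")` gives `"Because it was raining"`. A pass like this is a starting point only; the rewritten sentence still needs a read for rhythm and meaning.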
### 24. Excessive Hedging
**Problem:** Over-qualifying statements.
**Before:**
> It could potentially possibly be argued that the policy might have some effect on outcomes.
**After:**
> The policy may affect outcomes.
### 25. Generic Positive Conclusions
**Problem:** Vague upbeat endings.
**Before:**
> The future looks bright for the company. Exciting times lie ahead as they continue their journey toward excellence. This represents a major step in the right direction.
**After:**
> The company plans to open two more locations next year.
### 26. Hyphenated Word Pair Overuse
**Words to watch:** third-party, cross-functional, client-facing, data-driven, decision-making, well-known, high-quality, real-time, long-term, end-to-end
**Problem:** AI hyphenates common word pairs with perfect consistency. Humans rarely hyphenate these uniformly, and when they do, it's inconsistent. Less common or technical compound modifiers are fine to hyphenate.
**Before:**
> The cross-functional team delivered a high-quality, data-driven report on our client-facing tools. Their decision-making process was well-known for being thorough and detail-oriented.
**After:**
> The cross functional team delivered a high quality, data driven report on our client facing tools. Their decision making process was known for being thorough and detail oriented.
### 27. Persuasive Authority Tropes
**Phrases to watch:** The real question is, at its core, in reality, what really matters, fundamentally, the deeper issue, the heart of the matter
**Problem:** LLMs use these phrases to pretend they are cutting through noise to some deeper truth, when the sentence that follows usually just restates an ordinary point with extra ceremony.
**Before:**
> The real question is whether teams can adapt. At its core, what really matters is organizational readiness.
**After:**
> The question is whether teams can adapt. That mostly depends on whether the organization is ready to change its habits.
### 28. Signposting and Announcements
**Phrases to watch:** Let's dive in, let's explore, let's break this down, here's what you need to know, now let's look at, without further ado
**Problem:** LLMs announce what they are about to do instead of doing it. This meta-commentary slows the writing down and gives it a tutorial-script feel.
**Before:**
> Let's dive into how caching works in Next.js. Here's what you need to know.
**After:**
> Next.js caches data at multiple layers, including request memoization, the data cache, and the router cache.
### 29. Fragmented Headers
**Signs to watch:** A heading followed by a one-line paragraph that simply restates the heading before the real content begins.
**Problem:** LLMs often add a generic sentence after a heading as a rhetorical warm-up. It usually adds nothing and makes the prose feel padded.
**Before:**
> ## Performance
>
> Speed matters.
>
> When users hit a slow page, they leave.
**After:**
> ## Performance
>
> When users hit a slow page, they leave.
---
## Process
1. Read the input text carefully (use `read_file` if it's a file).
2. Identify all instances of the patterns above.
3. Rewrite each problematic section.
4. Ensure the revised text:
- Sounds natural when read aloud
- Varies sentence structure naturally
- Uses specific details over vague claims
- Maintains appropriate tone for context
- Uses simple constructions (is/are/has) where appropriate
5. Present a draft humanized version.
6. Prompt yourself: "What makes the below so obviously AI generated?"
7. Answer briefly with the remaining tells (if any).
8. Prompt yourself: "Now make it not obviously AI generated."
9. Present the final version (revised after the audit).
10. If the text came from a file, apply the edit with `patch` (targeted) or `write_file` (full rewrite) and show the user what changed.
## Output Format
Provide:
1. Draft rewrite
2. "What makes the below so obviously AI generated?" (brief bullets)
3. Final rewrite
4. A brief summary of changes made (optional, if helpful)
## Full Example
**Before (AI-sounding):**
> Great question! Here is an essay on this topic. I hope this helps!
>
> AI-assisted coding serves as an enduring testament to the transformative potential of large language models, marking a pivotal moment in the evolution of software development. In today's rapidly evolving technological landscape, these groundbreaking tools—nestled at the intersection of research and practice—are reshaping how engineers ideate, iterate, and deliver, underscoring their vital role in modern workflows.
>
> At its core, the value proposition is clear: streamlining processes, enhancing collaboration, and fostering alignment. It's not just about autocomplete; it's about unlocking creativity at scale, ensuring that organizations can remain agile while delivering seamless, intuitive, and powerful experiences to users. The tool serves as a catalyst. The assistant functions as a partner. The system stands as a foundation for innovation.
>
> Industry observers have noted that adoption has accelerated from hobbyist experiments to enterprise-wide rollouts, from solo developers to cross-functional teams. The technology has been featured in The New York Times, Wired, and The Verge. Additionally, the ability to generate documentation, tests, and refactors showcases how AI can contribute to better outcomes, highlighting the intricate interplay between automation and human judgment.
>
> - 💡 **Speed:** Code generation is significantly faster, reducing friction and empowering developers.
> - 🚀 **Quality:** Output quality has been enhanced through improved training, contributing to higher standards.
> - ✅ **Adoption:** Usage continues to grow, reflecting broader industry trends.
>
> While specific details are limited based on available information, it could potentially be argued that these tools might have some positive effect. Despite challenges typical of emerging technologies—including hallucinations, bias, and accountability—the ecosystem continues to thrive. In order to fully realize this potential, teams must align with best practices.
>
> In conclusion, the future looks bright. Exciting times lie ahead as we continue this journey toward excellence. Let me know if you'd like me to expand on any section!
**Draft rewrite:**
> AI coding assistants speed up some tasks. In a 2024 study by Google, developers using Codex completed simple functions 55% faster than a control group, but showed no improvement on debugging or architectural decisions.
>
> The tools are good at boilerplate: config files, test scaffolding, repetitive refactors. They are bad at knowing when they are wrong. I have mass-accepted suggestions that compiled, passed lint, and still did the wrong thing because I stopped paying attention.
>
> Mira, an engineer at a fintech startup I interviewed, said she uses Copilot "like autocomplete for boring code" but reviews every line before committing. Jake, a senior dev at a larger company, disabled it after it kept suggesting patterns from a deprecated internal library.
>
> The productivity claims are hard to verify. GitHub says Copilot users "accept 30% of suggestions," but acceptance is not correctness, and correctness is not value. The 2024 Uplevel study found no statistically significant difference in pull-request throughput between teams with and without AI assistants.
>
> None of this means the tools are useless. It means they are tools. They do not replace judgment, and they do not eliminate the need for tests. If you do not have tests, you cannot tell whether the suggestion is right.
**What makes the below so obviously AI generated?**
- The rhythm is still a bit too tidy (clean contrasts, evenly paced paragraphs).
- The named people and study citations can read like plausible-but-made-up placeholders unless they're real and sourced.
- The closer leans a touch slogan-y ("If you do not have tests...") rather than sounding like a person talking.
**Now make it not obviously AI generated.**
> AI coding assistants can make you faster at the boring parts. Not everything. Definitely not architecture.
>
> They're great at boilerplate: config files, test scaffolding, repetitive refactors. They're also great at sounding right while being wrong. I've accepted suggestions that compiled, passed lint, and still missed the point because I stopped paying attention.
>
> People I talk to tend to land in two camps. Some use it like autocomplete for chores and review every line. Others disable it after it keeps suggesting patterns they don't want. Both feel reasonable.
>
> The productivity metrics are slippery. GitHub can say Copilot users "accept 30% of suggestions," but acceptance isn't correctness, and correctness isn't value. If you don't have tests, you're basically guessing.
**Changes made:**
- Removed chatbot artifacts ("Great question!", "I hope this helps!", "Let me know if...")
- Removed significance inflation ("testament", "pivotal moment", "evolving landscape", "vital role")
- Removed promotional language ("groundbreaking", "nestled", "seamless, intuitive, and powerful")
- Removed vague attributions ("Industry observers")
- Removed superficial -ing phrases ("underscoring", "highlighting", "reflecting", "contributing to")
- Removed negative parallelism ("It's not just X; it's Y")
- Removed rule-of-three patterns and synonym cycling ("catalyst/partner/foundation")
- Removed false ranges ("from X to Y, from A to B")
- Removed em dashes, emojis, boldface headers, and curly quotes
- Removed copula avoidance ("serves as", "functions as", "stands as") in favor of "is"/"are"
- Removed formulaic challenges section ("Despite challenges... continues to thrive")
- Removed knowledge-cutoff hedging ("While specific details are limited...")
- Removed excessive hedging ("could potentially be argued that... might have some")
- Removed filler phrases and persuasive framing ("In order to", "At its core")
- Removed generic positive conclusion ("the future looks bright", "exciting times lie ahead")
- Made the voice more personal and less "assembled" (varied rhythm, fewer placeholders)
## Attribution
This skill is ported from [blader/humanizer](https://github.com/blader/humanizer) (MIT licensed), which is itself based on [Wikipedia: Signs of AI writing](https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing), maintained by WikiProject AI Cleanup. The patterns documented there come from observations of thousands of instances of AI-generated text on Wikipedia.
Original author: Siqi Chen ([@blader](https://github.com/blader)). Original repo: https://github.com/blader/humanizer (version 2.5.1). Ported to Hermes Agent with Hermes-native tool references (`read_file`, `patch`, `write_file`) and guidance for when to load the skill; the 29 patterns, personality/soul section, and full worked example are preserved verbatim from the source. Original MIT license preserved in the `LICENSE` file alongside this `SKILL.md`.
Key insight from Wikipedia: "LLMs use statistical algorithms to guess what should come next. The result tends toward the most statistically likely result that applies to the widest variety of cases."
+5 -1
@@ -1,11 +1,15 @@
---
name: manim-video
description: "Production pipeline for mathematical and technical animations using Manim Community Edition. Creates 3Blue1Brown-style explainer videos, algorithm visualizations, equation derivations, architecture diagrams, and data stories. Use when users request: animated explanations, math animations, concept visualizations, algorithm walkthroughs, technical explainers, 3Blue1Brown style videos, or any programmatic animation with geometric/mathematical content."
description: "Manim CE animations: 3Blue1Brown math/algo videos."
version: 1.0.0
---
# Manim Video Production Pipeline
## When to use
Use when users request: animated explanations, math animations, concept visualizations, algorithm walkthroughs, technical explainers, 3Blue1Brown style videos, or any programmatic animation with geometric/mathematical content. Creates 3Blue1Brown-style explainer videos, algorithm visualizations, equation derivations, architecture diagrams, and data stories using Manim Community Edition.
## Creative Standard
This is educational cinema. Every frame teaches. Every animation reveals structure.
+9 -1
@@ -1,6 +1,6 @@
---
name: p5js
description: "Production pipeline for interactive and generative visual art using p5.js. Creates browser-based sketches, generative art, data visualizations, interactive experiences, 3D scenes, audio-reactive visuals, and motion graphics — exported as HTML, PNG, GIF, MP4, or SVG. Covers: 2D/3D rendering, noise and particle systems, flow fields, shaders (GLSL), pixel manipulation, kinetic typography, WebGL scenes, audio analysis, mouse/keyboard interaction, and headless high-res export. Use when users request: p5.js sketches, creative coding, generative art, interactive visualizations, canvas animations, browser-based visual art, data viz, shader effects, or any p5.js project."
description: "p5.js sketches: gen art, shaders, interactive, 3D."
version: 1.0.0
metadata:
hermes:
@@ -10,6 +10,14 @@ metadata:
# p5.js Production Pipeline
## When to use
Use when users request: p5.js sketches, creative coding, generative art, interactive visualizations, canvas animations, browser-based visual art, data viz, shader effects, or any p5.js project.
## What's inside
Production pipeline for interactive and generative visual art using p5.js. Creates browser-based sketches, generative art, data visualizations, interactive experiences, 3D scenes, audio-reactive visuals, and motion graphics — exported as HTML, PNG, GIF, MP4, or SVG. Covers: 2D/3D rendering, noise and particle systems, flow fields, shaders (GLSL), pixel manipulation, kinetic typography, WebGL scenes, audio analysis, mouse/keyboard interaction, and headless high-res export.
## Creative Standard
This is visual art rendered in the browser. The canvas is the medium; the algorithm is the brush.
+1 -1
@@ -1,6 +1,6 @@
---
name: pixel-art
description: Convert images into retro pixel art with hardware-accurate palettes (NES, Game Boy, PICO-8, C64, etc.), and animate them into short videos. Presets cover arcade, SNES, and 10+ era-correct looks. Use `clarify` to let the user pick a style before generating.
description: "Pixel art w/ era palettes (NES, Game Boy, PICO-8)."
version: 2.0.0
author: dodo-reach
license: MIT
+11 -5
@@ -1,10 +1,6 @@
---
name: popular-web-designs
description: >
54 production-quality design systems extracted from real websites. Load a template
to generate HTML/CSS that matches the visual identity of sites like Stripe, Linear,
Vercel, Notion, Airbnb, and more. Each template includes colors, typography, components,
layout rules, and ready-to-use CSS values.
description: 54 real design systems (Stripe, Linear, Vercel) as HTML/CSS.
version: 1.0.0
author: Hermes Agent + Teknium (design systems sourced from VoltAgent/awesome-design-md)
license: MIT
@@ -27,6 +23,16 @@ triggers:
site's complete visual language: color palette, typography hierarchy, component styles, spacing
system, shadows, responsive behavior, and practical agent prompts with exact CSS values.
## Related design skills
- **`claude-design`** — use for the design *process and taste* (scoping a brief,
producing variants, verifying a local HTML artifact, avoiding AI-design slop).
Pair it with this skill when the user wants a thoughtfully-designed page styled
after a known brand: `claude-design` drives the workflow, this skill supplies
the visual vocabulary.
- **`design-md`** — use when the deliverable is a formal DESIGN.md token spec
file, not a rendered artifact.
## How to Use
1. Pick a design from the catalog below
@@ -1,9 +1,6 @@
---
name: songwriting-and-ai-music
description: >
Songwriting craft, AI music generation prompts (Suno focus), parody/adaptation
techniques, phonetic tricks, and lessons learned. These are tools and ideas,
not rules. Break any of them when the art calls for it.
description: "Songwriting craft and Suno AI music prompts."
tags: [songwriting, music, suno, parody, lyrics, creative]
triggers:
- writing a song
@@ -1,7 +1,7 @@
---
name: touchdesigner-mcp
description: "Control a running TouchDesigner instance via twozero MCP — create operators, set parameters, wire connections, execute Python, build real-time visuals. 36 native tools."
version: 1.0.0
version: 1.1.0
author: kshitijk4poor
license: MIT
metadata:
@@ -204,8 +204,9 @@ win.par.winopen.pulse()
| `td_input_clear` | Stop input automation |
| `td_op_screen_rect` | Get screen coords of a node |
| `td_click_screen_point` | Click a point in a screenshot |
| `td_screen_point_to_global` | Convert screenshot pixel to absolute screen coords |
See `references/mcp-tools.md` for full parameter schemas.
The table above covers the 32 tools used in typical creative workflows. The remaining 4 tools (`td_project_quit`, `td_test_session`, `td_dev_log`, `td_clear_dev_log`) are admin/dev-mode utilities — see `references/mcp-tools.md` for the full 36-tool reference with complete parameter schemas.
## Key Implementation Rules
@@ -332,6 +333,21 @@ See `references/network-patterns.md` for complete build scripts + shader code.
| `references/mcp-tools.md` | Full twozero MCP tool parameter schemas |
| `references/python-api.md` | TD Python: op(), scripting, extensions |
| `references/troubleshooting.md` | Connection diagnostics, debugging |
| `references/glsl.md` | GLSL uniforms, built-in functions, shader templates |
| `references/postfx.md` | Post-FX: bloom, CRT, chromatic aberration, feedback glow |
| `references/layout-compositor.md` | HUD layout patterns, panel grids, BSP-style layouts |
| `references/operator-tips.md` | Wireframe rendering, feedback TOP setup |
| `references/geometry-comp.md` | Geometry COMP: instancing, POP vs SOP, morphing |
| `references/audio-reactive.md` | Audio band extraction, beat detection, envelope following |
| `references/animation.md` | LFOs, timers, keyframes, easing, expression-driven motion |
| `references/midi-osc.md` | MIDI/OSC controllers, TouchOSC, multi-machine sync |
| `references/particles.md` | POPs and legacy particleSOP — emission, forces, collisions |
| `references/projection-mapping.md` | Multi-window output, corner pin, mesh warp, edge blending |
| `references/external-data.md` | HTTP, WebSocket, MQTT, Serial, TCP, webserverDAT |
| `references/panel-ui.md` | Custom params, panel COMPs, button/slider/field, panelExecuteDAT |
| `references/replicator.md` | replicatorCOMP — data-driven cloning, layouts, callbacks |
| `references/dat-scripting.md` | Execute DAT family — chop/dat/parameter/panel/op/executeDAT |
| `references/3d-scene.md` | Lighting rigs, shadows, IBL/cubemaps, multi-camera, PBR |
| `scripts/setup.sh` | Automated setup script |
---
@@ -0,0 +1,275 @@
# 3D Scene Reference
Lighting rigs, shadows, IBL/cubemaps, multi-camera, and PBR materials. For wireframe rendering and feedback TOPs see `operator-tips.md`. For instancing geometry see `geometry-comp.md`. For shader code see `glsl.md`.
---
## Anatomy of a 3D Scene
```
[Geometry COMP] ← contains SOPs (the shapes)
[Material] ← Phong/PBR/GLSL/Constant MAT
[Light COMPs] ← point/directional/spot/area/environment
[Camera COMP] ← view position, FOV
[Render TOP] ← combines geo + lights + camera into a 2D image
[post-FX chain] ← bloomTOP, glsl shaders, etc.
[windowCOMP] ← actual display
```
Render TOP is the heart. It takes an explicit `geometry` path, an explicit `camera` path, and lights via the lights table or an envlight reference.
---
## Minimal Scene
```python
# Geometry
geo = root.create(geometryCOMP, 'scene_geo')
sphere = geo.create(sphereSOP, 'shape')
sphere.par.rad = 1.0; sphere.par.rows = 64; sphere.par.cols = 64
# Material — start with PBR
mat = root.create(pbrMAT, 'mat')
mat.par.basecolorr = 0.7; mat.par.basecolorg = 0.7; mat.par.basecolorb = 0.7
mat.par.metallic = 0.0
mat.par.roughness = 0.4
geo.par.material = mat.path
# Camera
cam = root.create(cameraCOMP, 'cam1')
cam.par.tx = 0; cam.par.ty = 0; cam.par.tz = 4
cam.par.fov = 45
cam.par.near = 0.1; cam.par.far = 100
# Key light
key = root.create(lightCOMP, 'key_light')
key.par.lighttype = 'point'
key.par.tx = 3; key.par.ty = 3; key.par.tz = 3
key.par.dimmer = 1.5
# Render
render = root.create(renderTOP, 'render1')
render.par.outputresolution = 'custom'
render.par.resolutionw = 1920; render.par.resolutionh = 1080
render.par.camera = cam.path
render.par.geometry = geo.path
render.par.lights = key.path # single light path; for multi, see below
render.par.bgcolorr = 0; render.par.bgcolorg = 0; render.par.bgcolorb = 0
```
For multiple lights, leave `par.lights` blank — Render TOP scans the network for all `lightCOMP` and `envlightCOMP` ops by default. To restrict to specific lights, set `par.lights = '/project1/key_light /project1/fill_light'` (space-separated paths).
---
## Light Types
| Type | What | Common params |
|---|---|---|
| `point` | Omnidirectional, falls off with distance | `dimmer`, `coneangle` (n/a), `attenuation` |
| `directional` | Parallel rays, infinite distance (sun) | `dimmer`; only the light's rotation matters, not its position |
| `spot` | Cone, falls off with distance + angle | `coneangle`, `conedelta`, `dimmer` |
| `cone` | Like spot but harder edge | same |
| `area` | Rectangular soft light source | `sizex`, `sizey` |
For all: `colorr`, `colorg`, `colorb`, `tx/ty/tz`, `rx/ry/rz`, `dimmer`.
### Three-Point Lighting (Studio Setup)
```python
# Key — main light, ~45° front
key = root.create(lightCOMP, 'key')
key.par.lighttype = 'point'
key.par.tx = 4; key.par.ty = 3; key.par.tz = 4
key.par.dimmer = 1.5
key.par.colorr = 1.0; key.par.colorg = 0.95; key.par.colorb = 0.85
# Fill — softer, opposite side
fill = root.create(lightCOMP, 'fill')
fill.par.lighttype = 'area'
fill.par.tx = -4; fill.par.ty = 2; fill.par.tz = 3
fill.par.dimmer = 0.5
fill.par.colorr = 0.7; fill.par.colorg = 0.8; fill.par.colorb = 1.0
fill.par.sizex = 4; fill.par.sizey = 4
# Rim/back — outline from behind
rim = root.create(lightCOMP, 'rim')
rim.par.lighttype = 'spot'
rim.par.tx = 0; rim.par.ty = 4; rim.par.tz = -4
rim.par.coneangle = 30
rim.par.dimmer = 1.0
# Optional: ambient lift to prevent pure-black shadows
amb = root.create(ambientlightCOMP, 'ambient')
amb.par.dimmer = 0.15
```
---
## Shadows
Spot and directional lights cast shadows when `par.shadowtype != 'none'`.
```python
key.par.shadowtype = 'softshadow' # 'none' | 'hardshadow' | 'softshadow'
key.par.shadowsize = 1024 # shadow map resolution
key.par.shadowsoftness = 0.02 # softshadow only
```
**Tips:**
- Soft shadows are GPU-expensive. Start with `shadowsize = 1024` and only go higher (2048/4096) if shadow edges look pixelated at your resolution.
- Set the spot light's `near`/`far` to JUST contain the scene. Wider range = wasted shadow map precision.
- Multiple shadow-casting lights compound cost. Limit to 1-2 in real-time work; pre-bake the rest into the materials.
---
## Image-Based Lighting (IBL) / Environment Light
For realistic PBR materials you need a cubemap for reflections.
```python
# Environment light from an HDR
env = root.create(envlightCOMP, 'env')
env.par.envmap = '/project1/cube_in' # path to a TOP that produces a cubemap
env.par.envlightmap = ... # diffuse irradiance map (often same as envmap)
env.par.dimmer = 1.0
# Cubemap source — option A: built-in cubeTOP from 6 faces
cube = root.create(cubeTOP, 'cube_in')
# (assign 6 face TOPs)
# Option B: HDR equirectangular → cubemap conversion
# Use a moviefileinTOP loading .hdr or .exr, then projectTOP type='cubemapfromequirect'
hdr = root.create(moviefileinTOP, 'hdr_src')
hdr.par.file = '/path/to/environment.hdr'
proj = root.create(projectTOP, 'cube_proj')
proj.par.projecttype = 'cubemapfromequirect'
proj.inputConnectors[0].connect(hdr)
```
PBR materials sample the environment automatically when `envlightCOMP` is in the scene. Verify param names with `td_get_par_info(op_type='envlightCOMP')` — TD versions vary.
---
## PBR Material Setup
```python
mat = root.create(pbrMAT, 'pbr_metal')
mat.par.basecolorr = 0.95; mat.par.basecolorg = 0.65; mat.par.basecolorb = 0.4
mat.par.metallic = 1.0
mat.par.roughness = 0.25
mat.par.specularlevel = 0.5
mat.par.emitcolorr = 0; mat.par.emitcolorg = 0; mat.par.emitcolorb = 0
# Texture maps
mat.par.basecolormap = '/project1/textures/albedo' # TOP path
mat.par.metallicroughnessmap = '/project1/textures/mr' # G=roughness, B=metallic (glTF convention)
mat.par.normalmap = '/project1/textures/normal'
mat.par.emitmap = '/project1/textures/emit'
mat.par.occlusionmap = '/project1/textures/ao'
```
**Material idioms:**
| Look | metallic | roughness | basecolor |
|---|---|---|---|
| Brushed steel | 1.0 | 0.4 | (0.7, 0.7, 0.7) |
| Polished gold | 1.0 | 0.1 | (1.0, 0.85, 0.4) |
| Plastic | 0.0 | 0.5 | mid-saturated |
| Rubber | 0.0 | 0.9 | dark |
| Glass | 0.0 | 0.05 | (1, 1, 1), low alpha + transmission |
| Glowing emitter | 0.0 | 1.0 | dark, high `emitcolor` |
For glass/transmission, recent TD versions support `transmission` in PBR; older versions need glslMAT.
---
## Multi-Camera Setups
For comparison views, instant replay, multi-screen mapping, etc.
```python
# Camera A — main scene
cam_a = root.create(cameraCOMP, 'cam_main')
cam_a.par.tz = 5
# Camera B — orbiting top-down
cam_b = root.create(cameraCOMP, 'cam_top')
cam_b.par.ty = 6; cam_b.par.rx = -90
# Render each via separate Render TOPs
render_a = root.create(renderTOP, 'render_main')
render_a.par.camera = cam_a.path
render_a.par.geometry = geo.path
render_b = root.create(renderTOP, 'render_top')
render_b.par.camera = cam_b.path
render_b.par.geometry = geo.path
```
Composite both with a `multiplyTOP`/`compositeTOP` for picture-in-picture, or route to separate `windowCOMP`s for multi-display.
### Camera animation
Drive camera params via expressions (orbit), animationCOMP (waypoint), or LFO (oscillation):
```python
# Orbiting camera
cam_a.par.tx.mode = ParMode.EXPRESSION
cam_a.par.tx.expr = "cos(absTime.seconds * 0.3) * 6"
cam_a.par.tz.mode = ParMode.EXPRESSION
cam_a.par.tz.expr = "sin(absTime.seconds * 0.3) * 6"
cam_a.par.lookat = '/project1/scene_geo' # auto-aim at target
```
`par.lookat` is the simplest "always look at target" mechanism.
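The orbit expressions above trace a circle of radius 6 in the XZ plane. The following is a minimal plain-Python sketch that runs outside TouchDesigner; `orbit_position` is a hypothetical helper written for illustration, not a TD API. It only reproduces the arithmetic so the path and period can be checked before wiring the expressions into a camera.

```python
import math

def orbit_position(seconds, radius=6.0, speed=0.3):
    """Return (tx, tz) for a camera circling the origin.

    Mirrors the expressions cos(t * speed) * radius and
    sin(t * speed) * radius used on cam_a.par.tx / cam_a.par.tz.
    """
    angle = seconds * speed  # radians
    return math.cos(angle) * radius, math.sin(angle) * radius

# One full orbit takes 2*pi / speed seconds.
period = 2 * math.pi / 0.3
for t in (0.0, period / 4, period / 2):
    tx, tz = orbit_position(t)
    print(f"t={t:5.2f}s  tx={tx:6.2f}  tz={tz:6.2f}")
```

The distance from the origin stays at `radius` for every `seconds` value, which is why pairing these expressions with `par.lookat` keeps the target framed at a constant size.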
### Depth of field
PBR + Render TOP supports DOF when `par.dof = 'on'`.
```python
render.par.dof = 'on'
render.par.focusdistance = 5.0
render.par.aperture = 0.05 # blur strength
render.par.bokehshape = 'hexagon'
```
DOF is GPU-heavy. Render at lower res then upscale for performance.
---
## Common Pitfalls
1. **Render TOP shows black** — most common cause: no light. Even with PBR you need at least one `lightCOMP` or `envlightCOMP`. Add an `ambientlightCOMP` at low dimmer as a safety net.
2. **Material doesn't appear** — `geo.par.material` must be a string PATH, not the material op itself. Use `mat.path`, not `mat`.
3. **Lights ignored** — by default Render TOP picks up ALL `lightCOMP`s in the network. If you have leftover lights from another scene, they leak in. Set `par.lights` explicitly.
4. **PBR looks flat** — without an `envlightCOMP` providing reflections, PBR materials look like Phong. Add one even if you don't have an HDR (use a `constantTOP` cubemap as fallback).
5. **Shadow acne / striping** — increase `par.shadowbias` slightly. Tune per-light.
6. **Camera inside geometry** — if `cam.par.tz` is INSIDE a sphere, you see the inside (or nothing if backface culled). Move the camera further out.
7. **Light range too small** — point lights have implicit attenuation. Far-away geometry receives little light. Increase `par.dimmer` or move lights closer.
8. **Multiple cameras conflict** — one render TOP = one camera. Don't try to share. Use multiple render TOPs.
9. **Wrong handedness** — TD is right-handed Y-up. Imported assets from Z-up apps (Blender, Maya in Z-up) need a 90° X rotation on the geo COMP.
10. **Cooking budget** — PBR + IBL + shadows + DOF at 1080p60 is fine on modern GPUs but 4K + 4 lights + soft shadows + DOF will tank. Profile via `td_get_perf` and downgrade settings before adding more.
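For the handedness pitfall, the effect of a quarter-turn about X is easy to verify numerically. This is a sketch in plain Python, not TD code: `rotate_x` is a hypothetical helper using the standard right-handed rotation matrix. Whether the matching value on the geometry COMP is `rx = -90` or `rx = 90` depends on the exporting app and its axis conventions, so treat the sign as an assumption to verify on a real asset.

```python
import math

def rotate_x(point, degrees):
    """Rotate (x, y, z) about the X axis by `degrees` (right-handed)."""
    x, y, z = point
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return (x, y * c - z * s, y * s + z * c)

# A Z-up asset's "up" vector is (0, 0, 1); in a Y-up scene it should
# end up pointing along +Y.
up_z = (0.0, 0.0, 1.0)
print(rotate_x(up_z, -90))
```

With this matrix convention, -90 degrees maps +Z onto +Y; +90 degrees would flip the asset upside down instead.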
---
## Quick Recipes
| Goal | Recipe |
|---|---|
| Studio portrait | 3-point rig (key + fill + rim) + ambient + PBR mat + DOF |
| Outdoor daylight | One directional `lightCOMP` (sun) + envlight (sky HDR) + soft shadows |
| Dramatic / film noir | Single spot light from upper side, hard shadows, deep ambient = 0.05 |
| Abstract / dreamy | Multiple area lights at low dimmer, no shadows, `bloomTOP` post |
| Product render | Three-point + IBL + neutral PBR + `bgcolorr=g=b=1` (white seamless) |
| Game-style | Phong MAT + 1-2 lights + no IBL + flat ambient (cheap, stylized) |
| Wireframe + solid | Two render TOPs (one with wireframeMAT, one with PBR), composite via `addTOP` |
| Orbiting camera | `par.lookat` + expressions on tx/tz using sin/cos |
@@ -0,0 +1,221 @@
# Animation Reference
Patterns for time-based motion — keyframes, LFOs, timers, easing, expression-driven animation.
Always call `td_get_par_info` for the op type before setting params. Param names below reflect TD 2025.32 but verify if errors fire.
---
## Time Sources
TD has four commonly used time expressions, split between global (`absTime`) and per-component (`me.time`) clocks. Pick the right one.
| Expression | Behavior | Use for |
|---|---|---|
| `absTime.seconds` | Wall-clock seconds since TD started. Never resets. | Continuous motion, GLSL `uTime`, infinite loops |
| `absTime.frame` | Wall-clock frame count. | Frame-accurate triggers |
| `me.time.frame` | Local component frame count (resets on play/stop). | Per-COMP animation timeline |
| `me.time.seconds` | Local component seconds. | Same, in seconds |
**Rule:** for shaders and continuous motion use `absTime.seconds`. For triggered/looping animations inside a COMP use `me.time.*`.
---
## LFO CHOP — Cyclic Motion
The simplest periodic driver. Fast, CPU-cheap, expression-friendly.
```python
lfo = root.create(lfoCHOP, 'rot_driver')
lfo.par.type = 'sin' # 'sin' | 'cos' | 'ramp' | 'square' | 'triangle' | 'pulse'
lfo.par.frequency = 0.25 # cycles per second
lfo.par.amplitude = 1.0
lfo.par.offset = 0.0
lfo.par.phase = 0.0 # 0-1, useful for offsetting parallel LFOs
```
**Drive a parameter via an expression:**
```python
op('/project1/geo1').par.rx.mode = ParMode.EXPRESSION
op('/project1/geo1').par.rx.expr = "op('rot_driver')['chan1'] * 360"
```
**Multiple synced LFOs (X/Y/Z rotation with phase offsets):**
Create one LFO with three channels and phase-offset each, or use three LFOs and offset their `phase` params (0.0, 0.33, 0.66).
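To see what the phase offsets do, here is a small plain-Python sketch of three sine drivers. It assumes the sine LFO behaves like `offset + amplitude * sin(2*pi*(frequency*t + phase))` with `phase` measured in cycles (0-1); that formula and the `lfo_sin` helper are illustrative assumptions, not taken from the LFO CHOP itself.

```python
import math

def lfo_sin(seconds, frequency=0.25, amplitude=1.0, offset=0.0, phase=0.0):
    """Sine LFO value; `phase` is in cycles (0-1), as in the note above."""
    cycles = seconds * frequency + phase
    return offset + amplitude * math.sin(2 * math.pi * cycles)

# Same frequency, three phase offsets: the channels never peak together.
phases = {"rx": 0.0, "ry": 0.33, "rz": 0.66}
for name, phase in phases.items():
    print(name, round(lfo_sin(1.0, phase=phase), 3))
```

Spacing the phases roughly a third of a cycle apart keeps the three rotations out of step while they still share one tempo.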
---
## Timer CHOP — Triggered Sequences
For run-once animations, beat-locked sequences, or stage-based logic.
```python
timer = root.create(timerCHOP, 'fade_timer')
timer.par.length = 4.0 # cycle length in seconds
timer.par.cycle = False # run once vs. loop
timer.par.outputseconds = True
```
Output channels: `timer_fraction` (0→1 across the cycle), `running`, `done`, `cycles`.
**Start the timer:**
```python
timer.par.start.pulse()
```
**Drive a fade:**
```python
op('/project1/level1').par.opacity.mode = ParMode.EXPRESSION
op('/project1/level1').par.opacity.expr = "op('fade_timer')['timer_fraction']"
```
**Easing on the timer fraction** — apply in the expression itself:
```python
# Smoothstep: ease in/out
expr = "smoothstep(0, 1, op('fade_timer')['timer_fraction'])"
# Cubic ease-out: 1 - (1-t)^3
expr = "1 - pow(1 - op('fade_timer')['timer_fraction'], 3)"
```
---
## Pattern CHOP — Custom Curves
For arbitrary waveforms (saw ramps, easing curves, custom envelopes).
```python
pat = root.create(patternCHOP, 'envelope')
pat.par.type = 'gaussian' # 'gaussian' | 'ramp' | 'square' | 'sin' | etc.
pat.par.length = 60 # samples
pat.par.cyclelength = 1.0 # seconds at TD framerate
```
Combine with `lookupCHOP` to remap a 0-1 driver through a custom curve.
---
## Animation COMP — Keyframe-Based
For multi-keyframe motion graphics. Each animationCOMP holds channels with keyframes editable in the Animation Editor.
```python
anim = root.create(animationCOMP, 'intro_anim')
# By default has channels chan1..chanN; access via:
# op('intro_anim').par.length, .par.play, .par.cue, etc.
# Drive a parameter from a channel
op('/project1/text1').par.tx.mode = ParMode.EXPRESSION
op('/project1/text1').par.tx.expr = "op('intro_anim/out1')['chan1']"
```
**Keyframes are typically edited in the UI** (Animation Editor), but they can also be set through the animationCOMP's internal `keyframes` table. For programmatic keyframe creation, use `td_execute_python`:
```python
# Get the channel CHOP inside an animationCOMP
ch = op('/project1/intro_anim/chans')
# Insert a key (advanced API — verify with td_get_par_info(op_type='animationCOMP'))
ch.appendKey('chan1', frame=0, value=0.0, expression=None)
ch.appendKey('chan1', frame=120, value=1.0)
```
For most use cases, drive params with LFO/Timer/Pattern CHOPs instead — simpler and scriptable.
---
## Easing in Expressions
TD's expression evaluator supports Python math. Common easing forms:
```python
# Linear
"t"
# Smoothstep (classic ease-in-out)
"smoothstep(0, 1, t)"
# Ease-out cubic
"1 - pow(1 - t, 3)"
# Ease-in cubic
"pow(t, 3)"
# Ease-in-out cubic (the smoothstep polynomial written out)
"3*t*t - 2*t*t*t"
# Bounce (manual, simplified)
"abs(sin(t * 6.28 * 3) * (1 - t))"
```
Where `t` is `op('fade_timer')['timer_fraction']` or any 0-1 driver.
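The easing forms above are ordinary functions of a 0-1 value, so they can be checked outside TouchDesigner. The sketch below is plain Python with hypothetical helper names; `smoothstep` is written out as its polynomial `t*t*(3 - 2*t)` rather than relying on any built-in, and inputs are clamped to 0-1 as an added safety assumption.

```python
def clamp01(t):
    """Keep the driver inside the 0-1 range the easing curves expect."""
    return max(0.0, min(1.0, t))

def smoothstep(t):
    """Ease in and out: 3t^2 - 2t^3."""
    t = clamp01(t)
    return t * t * (3.0 - 2.0 * t)

def ease_out_cubic(t):
    """Fast start, slow finish: 1 - (1 - t)^3."""
    t = clamp01(t)
    return 1.0 - (1.0 - t) ** 3

def ease_in_cubic(t):
    """Slow start, fast finish: t^3."""
    t = clamp01(t)
    return t ** 3

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, round(smoothstep(t), 4), round(ease_out_cubic(t), 4),
          round(ease_in_cubic(t), 4))
```

All three curves start at 0 and end at 1, so they can be swapped freely on the same timer fraction; only the speed profile in between changes.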
---
## Filter CHOP — Smoothing Existing Channels
Smooth out jittery values (e.g., audio analysis, sensor data) before driving visuals.
```python
filt = root.create(filterCHOP, 'smooth')
filt.par.filter = 'gaussian' # or 'lowpass'
filt.par.width = 0.5 # smoothing window in seconds
filt.inputConnectors[0].connect(op('raw_signal'))
```
**WARNING:** Do NOT use Filter CHOP on AudioSpectrum output in timeslice mode — it expands the sample count and averages bins to near-zero. See `audio-reactive.md`.
---
## Lag CHOP — Asymmetric Attack/Release
Different speeds for rising vs. falling values. Standard for visualizing audio envelopes.
```python
lag = root.create(lagCHOP, 'env_smooth')
lag.par.lag1 = 0.02 # attack (rise time, seconds)
lag.par.lag2 = 0.30 # release (fall time, seconds)
lag.inputConnectors[0].connect(op('raw_envelope'))
```
Fast attack, slow release = classic VU-meter feel.
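The attack/release idea can be sketched as a one-pole smoother that picks its time constant depending on whether the input is rising or falling. This is an illustration of the concept in plain Python, not the Lag CHOP's actual algorithm; the `lag` function, the exponential form, and the 60 fps `rate` are all assumptions.

```python
import math

def lag(samples, attack=0.02, release=0.30, rate=60.0):
    """Smooth `samples` with separate rise and fall time constants (seconds)."""
    out = []
    y = samples[0] if samples else 0.0
    for x in samples:
        tau = attack if x > y else release
        # Exponential approach toward x; a zero time constant means no smoothing.
        coeff = 1.0 - math.exp(-1.0 / (tau * rate)) if tau > 0 else 1.0
        y += (x - y) * coeff
        out.append(y)
    return out

# A short burst: silence, six frames of full level, then silence again.
burst = [0.0] + [1.0] * 6 + [0.0] * 6
for value in lag(burst):
    print(round(value, 3))
```

With these values the output reaches nearly full level within the six loud frames but has dropped only part of the way back six frames after the burst ends.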
---
## Per-Frame Driving via Script DAT
For complex per-frame logic that doesn't fit expressions, use an `executeDAT` (`onFrameStart` callback) or a `chopExecuteDAT`.
```python
# In an executeDAT (frameStart):
import math
def onFrameStart(frame):
    # Move the circle around a radius-3 orbit, driven by wall-clock time
    t = absTime.seconds
    op('/project1/circle').par.tx = math.sin(t * 2.0) * 3.0
    op('/project1/circle').par.ty = math.cos(t * 2.0) * 3.0
    return
```
Heavy logic should still be in CHOPs (CPU-cheap, deterministic). Reserve scripts for one-shots or non-realtime branching.
---
## Pitfalls
1. **Frame rate dependency** — `me.time.frame` is in TD project frames (default 60). If your project rate changes, motion speed changes. Use `seconds` for rate-independent timing.
2. **Cooking budget** — every CHOP that drives a parameter cooks every frame. Consolidate drivers (one big mathCHOP > many small ones).
3. **Expression mode** — params default to `CONSTANT`. `par.X.expr = ...` is ignored unless `par.X.mode = ParMode.EXPRESSION`.
4. **Animation editor edits** — keyframes set via UI live in the animationCOMP's internal keyframe table. They survive save/reopen. Programmatic keys via `appendKey()` work but verify the API with `td_get_docs(topic='animation')` first.
5. **Looping animations** — for seamless loops, `length` must equal `cyclelength` and the start/end values must match. Otherwise expect a visible jump.
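The frame-rate pitfall is easy to demonstrate with arithmetic alone. In this plain-Python sketch (hypothetical helper names, no TD calls), the same two seconds of playback give different rotation when driven by a frame count and identical rotation when driven by seconds.

```python
def angle_from_frames(frame, degrees_per_frame=1.0):
    """Rotation driven by a frame counter: depends on the project rate."""
    return frame * degrees_per_frame

def angle_from_seconds(seconds, degrees_per_second=60.0):
    """Rotation driven by elapsed seconds: independent of the project rate."""
    return seconds * degrees_per_second

for fps in (30, 60):
    frame = 2 * fps  # frame count after 2 seconds of playback
    print(fps, angle_from_frames(frame), angle_from_seconds(frame / fps))
```

The frame-driven value doubles when the rate goes from 30 to 60, while the seconds-driven value is the same at both rates.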
---
## Quick Recipes
| Goal | Simplest path |
|---|---|
| Continuous rotation | LFO CHOP `type='ramp'`, expr → `geo.par.rx` |
| Fade in over 2s | Timer CHOP `length=2`, smoothstep expr → `level.par.opacity` |
| Pulse on every beat | `triggerCHOP` from audio → drive scale via expression |
| 3D Lissajous orbit | Three LFOs with different frequencies, one each for `tx`/`ty`/`tz` |
| Random jitter | `noiseCHOP` (low-freq) added to position |
| Timed scene switch | Timer CHOP → switchTOP/CHOP `index` |
@@ -0,0 +1,175 @@
# Audio-Reactive Reference
Patterns for driving visuals from audio — spectrum analysis, beat detection, envelope following.
## Audio Input
```python
# Live input from audio interface
audio_in = root.create(audiodeviceinCHOP, 'audio_in')
audio_in.par.rate = 44100
# OR: from audio file (for testing)
audio_file = root.create(audiofileinCHOP, 'audio_in')
audio_file.par.file = '/path/to/track.wav'
audio_file.par.play = True
audio_file.par.repeat = 'on' # NOT par.loop
audio_file.par.playmode = 'locked'
```
---
## Audio Band Extraction (Verified TD 2025.32460)
Use `audiofilterCHOP` for band separation (NOT `selectCHOP` by channel index):
```python
# Audio input
af = root.create(audiofileinCHOP, 'audio_in')
af.par.file = path  # `path` must already hold the audio file location, e.g. '/path/to/track.wav'
af.par.play = True
af.par.repeat = 'on'
af.par.playmode = 'locked'
# Low band: lowpass @ 250Hz
flt_low = root.create(audiofilterCHOP, 'flt_low')
flt_low.par.filter = 'lowpass'
flt_low.par.cutofffrequency = 250
flt_low.par.rolloff = 2
flt_low.inputConnectors[0].connect(af)
# Mid band: highpass@250 → lowpass@4000
flt_mid_hp = root.create(audiofilterCHOP, 'flt_mid_hp')
flt_mid_hp.par.filter = 'highpass'
flt_mid_hp.par.cutofffrequency = 250
flt_mid_hp.par.rolloff = 2
flt_mid_hp.inputConnectors[0].connect(af)
flt_mid_lp = root.create(audiofilterCHOP, 'flt_mid_lp')
flt_mid_lp.par.filter = 'lowpass'
flt_mid_lp.par.cutofffrequency = 4000
flt_mid_lp.par.rolloff = 2
flt_mid_lp.inputConnectors[0].connect(flt_mid_hp)
# High band: highpass @ 4000Hz
flt_high = root.create(audiofilterCHOP, 'flt_high')
flt_high.par.filter = 'highpass'
flt_high.par.cutofffrequency = 4000
flt_high.par.rolloff = 2
flt_high.inputConnectors[0].connect(af)
# Per-band: RMS → lag → gain → clamp
for name, filt in [('low', flt_low), ('mid', flt_mid_lp), ('high', flt_high)]:
    rms = root.create(analyzeCHOP, f'rms_{name}')
    rms.par.function = 'rmspower' # NOT 'rms'
    rms.inputConnectors[0].connect(filt)
    lag = root.create(lagCHOP, f'lag_{name}')
    lag.par.lag1 = 0.05 # attack (NOT par.lagin)
    lag.par.lag2 = 0.25 # release (NOT par.lagout)
    lag.inputConnectors[0].connect(rms)
    math = root.create(mathCHOP, f'scale_{name}')
    math.par.gain = 8.0
    math.inputConnectors[0].connect(lag)
    # mathCHOP has NO par.clamp; use limitCHOP
    lim = root.create(limitCHOP, f'clamp_{name}')
    lim.par.type = 'clamp'
    lim.par.min = 0.0
    lim.par.max = 1.0
    lim.inputConnectors[0].connect(math)
    null = root.create(nullCHOP, f'out_{name}')
    null.inputConnectors[0].connect(lim)
    null.viewer = True
```
**Key TD 2025 corrections:**
- `analyzeCHOP.par.function = 'rmspower'` NOT `'rms'`
- `lagCHOP.par.lag1` / `par.lag2` NOT `par.lagin` / `par.lagout`
- `mathCHOP` has NO `par.clamp` — use separate `limitCHOP`
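The per-band chain (level measurement, gain, clamp) boils down to a few lines of arithmetic. The sketch below is plain Python for reasoning about gain values, not TD code: a textbook root-mean-square stands in for `rmspower` (whose exact definition is not given here, so that is an assumption), the lag stage is left out, and `band_level` is a hypothetical helper.

```python
import math

def rms(samples):
    """Root-mean-square of a block of audio samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def band_level(samples, gain=8.0, lo=0.0, hi=1.0):
    """Scale the block's RMS by `gain`, then clamp into [lo, hi]."""
    return max(lo, min(hi, rms(samples) * gain))

quiet = [0.05, -0.05, 0.05, -0.05]
loud = [0.5, -0.5, 0.5, -0.5]
print(round(band_level(quiet), 3), round(band_level(loud), 3))
```

A gain of 8 lifts a quiet band into a usable 0-1 range, and the clamp stops loud passages from pushing a shader uniform past 1.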
---
## Beat / Onset Detection
### Kick Detection (slope → trigger)
```python
slope = root.create(slopeCHOP, 'kick_slope')
slope.inputConnectors[0].connect(op('out_low'))
trig = root.create(triggerCHOP, 'kick_trig')
trig.par.threshold = 0.12
trig.par.attack = 0.005 # NOT par.attacktime
trig.par.decay = 0.15 # NOT par.decaytime
trig.par.triggeron = 'increase'
trig.inputConnectors[0].connect(slope)
kick_out = root.create(nullCHOP, 'out_kick')
kick_out.inputConnectors[0].connect(trig)
```
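The slope-then-trigger idea can be tried on a handful of numbers. This plain-Python sketch is illustrative only: `slope` uses a simple per-sample difference (the Slope CHOP's units may differ), `rising_triggers` fires once per upward threshold crossing, and the attack/decay envelope of the Trigger CHOP is not modelled. Both function names and the sample data are made up for the example.

```python
def slope(values):
    """Per-sample difference, a stand-in for the slope step."""
    return [0.0] + [b - a for a, b in zip(values, values[1:])]

def rising_triggers(values, threshold=0.12):
    """Indices where the signal crosses `threshold` from below."""
    hits = []
    above = False
    for i, v in enumerate(values):
        now_above = v > threshold
        if now_above and not above:
            hits.append(i)
        above = now_above
    return hits

# Low-band envelope with two kicks (sharp rises) separated by a decay.
low_band = [0.05, 0.06, 0.40, 0.55, 0.50, 0.30, 0.10, 0.08, 0.45, 0.60]
print(rising_triggers(slope(low_band)))
```

Triggering on the slope rather than the level means a sustained bass note fires once at its onset instead of holding the trigger on for its whole duration.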
---
## Passing Audio to GLSL
```python
glsl.par.vec0name = 'uLow'
glsl.par.vec0valuex.expr = "op('out_low')['chan1']"
glsl.par.vec0valuex.mode = ParMode.EXPRESSION
glsl.par.vec1name = 'uKick'
glsl.par.vec1valuex.expr = "op('out_kick')['chan1']"
glsl.par.vec1valuex.mode = ParMode.EXPRESSION
```
```glsl
uniform float uLow;
uniform float uKick;
float scale = 1.0 + uKick * 0.4 + uLow * 0.2;
```
---
## Standard Audio Bus Pattern
Recommended structure:
```
audiodeviceinCHOP (audio_in)
[null_audio_in]
├──→ audiofilterCHOP (lowpass@250) → analyzeCHOP → lagCHOP → mathCHOP → limitCHOP → null
├──→ audiofilterCHOP (bandpass@250-4k) → analyzeCHOP → lagCHOP → mathCHOP → limitCHOP → null
├──→ audiofilterCHOP (highpass@4k) → analyzeCHOP → lagCHOP → mathCHOP → limitCHOP → null
└──→ slopeCHOP → triggerCHOP (beat_trigger)
```
Keep this entire bus inside a `baseCOMP` (e.g., `audio_bus`) and reference via paths from visual networks.
---
## MIDI Input
```python
midi_in = root.create(midiinCHOP, 'midi_in')
midi_in.par.device = 0 # Check midiinDAT for device index
# Outputs channels named by MIDI note/CC: 'ch1n60', 'ch1c74', etc.
# Map CC to a parameter
op('bloom1').par.threshold.mode = ParMode.EXPRESSION
op('bloom1').par.threshold.expr = "op('midi_in')['ch1c74'][0]"
```
---
## CRITICAL: DO NOT use Lag CHOP for spectrum smoothing
Lag CHOP in timeslice mode expands 256-sample spectrum to 1600-2400 samples, averaging all values to near-zero (~1e-06). The shader receives no usable data. Use `mathCHOP(gain=8)` directly, or smooth in GLSL via temporal lerp with a feedback texture.
Verified:
- Without Lag CHOP: bass bins = 5.0-5.4 (strong, usable)
- With Lag CHOP: ALL bins = 0.000001 (dead)
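The temporal-lerp alternative mentioned above blends each bin toward its newest value and never changes the bin count. Here is a plain-Python sketch of that update rule; in a shader the same step would be a `mix()` against a feedback texture. The `smooth_bins` helper and the `mix` amount of 0.2 are illustrative assumptions.

```python
def smooth_bins(previous, current, mix=0.2):
    """Blend each bin toward its new value: prev + (cur - prev) * mix."""
    if len(previous) != len(current):
        raise ValueError("bin counts must match")
    return [p + (c - p) * mix for p, c in zip(previous, current)]

# Three spectrum bins over three frames; the last frame is silence.
state = [0.0, 0.0, 0.0]
for frame in ([5.0, 1.0, 0.2], [5.4, 0.8, 0.1], [0.0, 0.0, 0.0]):
    state = smooth_bins(state, frame)
    print([round(v, 3) for v in state])
```

A smaller `mix` gives slower, smoother motion; a value of 1.0 passes the raw spectrum through unchanged.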
@@ -0,0 +1,352 @@
# DAT-Based Scripting Reference
TD's event/callback model — Python that runs in response to network events. The full set of "Execute DATs" plus their idiomatic patterns.
For arbitrary Python execution (not callback-based), see `python-api.md`. For the MCP's `td_execute_python` tool, see `mcp-tools.md`.
---
## The Execute DAT Family
Every type watches one kind of event source and fires Python on changes.
| DAT | Watches | Use for |
|---|---|---|
| `chopExecuteDAT` | A CHOP's channel values | Audio triggers, threshold callbacks, state machines on numeric input |
| `datExecuteDAT` | A DAT's content (table cells, text) | Reacting to data updates from APIs, parsing webDAT responses |
| `parameterExecuteDAT` | A parameter's value or pulse | Reacting to user-changed params, custom pulse buttons |
| `panelExecuteDAT` | A panel COMP's interaction | Button clicks, slider drags, field commits |
| `opExecuteDAT` | Operator lifecycle | New operator created, deleted, name changed |
| `executeDAT` | Project lifecycle, frame events | Run-once setup, per-frame logic, save/load hooks |
All have a docked DAT with predefined callback functions. You only fill in the bodies of the ones you care about.
---
## chopExecuteDAT — Numeric Triggers
```python
ce = root.create(chopExecuteDAT, 'kick_handler')
ce.par.chop = '/project1/audio/out_kick' # source CHOP
ce.par.offtoon = True # fire when channel rises above 0
ce.par.ontooff = False
ce.par.whileon = False
ce.par.valuechange = False
```
In the docked callback DAT:
```python
def offToOn(channel, sampleIndex, val, prev):
    """Channel went from 0 to non-zero. Classic beat trigger."""
    op('/project1/strobe').par.flash.pulse()
    op('/project1/scene').par.index = (op('/project1/scene').par.index + 1) % 8
    return

def onToOff(channel, sampleIndex, val, prev):
    """Channel went from non-zero to 0."""
    return

def whileOn(channel, sampleIndex, val, prev):
    """Fires every frame while channel is non-zero. Use sparingly."""
    return

def valueChange(channel, sampleIndex, val, prev):
    """Fires every frame the value changes (continuous). Heavy."""
    return
```
`channel` is a `Channel` object — `.name`, `.owner`, `.vals[]`. Use `channel.name == 'chan1'` to filter.
**Threshold-based custom triggers:** wire the source CHOP through a `triggerCHOP` first to get clean 0/1 pulses, then watch with `offtoon`.
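To make the off-to-on idea concrete, here is a small plain-Python sketch of rising-edge detection over a list of samples. It only illustrates the logic and is not how `triggerCHOP` is implemented; `rising_edges` and its `threshold` default are hypothetical.

```python
def rising_edges(samples, threshold=0.5):
    """Return indices where the signal goes from <= threshold to > threshold."""
    edges = []
    was_on = False
    for i, value in enumerate(samples):
        is_on = value > threshold
        if is_on and not was_on:
            edges.append(i)  # one event per crossing, not one per "on" sample
        was_on = is_on
    return edges


kick_frames = rising_edges([0.0, 0.2, 0.9, 1.0, 0.1, 0.7])
```

The held sample at index 3 does not produce a second event, which is the difference between an `offtoon` callback and a `whileon` callback.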
---
## datExecuteDAT — Table/Text Changes
```python
de = root.create(datExecuteDAT, 'api_response')
de.par.dat = '/project1/api/web1' # source DAT
de.par.tablechange = True # any cell change
de.par.cellchange = False
de.par.rowchange = False
de.par.colchange = False
```
```python
def onTableChange(dat):
    """Whole table changed (including text DAT content updates)."""
    if dat.numRows == 0:
        return
    # If it's a webDAT response, parse JSON
    import json
    try:
        data = json.loads(dat.text)
    except json.JSONDecodeError:
        debug(f'Bad JSON: {dat.text[:100]}')
        return
    # Write to a CHOP
    op('/project1/api_value').par.value0 = float(data.get('count', 0))
    return

def onCellChange(dat, cells, prev):
    """Specific cells changed."""
    for cell in cells:
        # cell.row, cell.col, cell.val
        pass
    return
```
`debug()` prints to the textport — readable via `td_read_textport`.
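The parsing step can be pulled into a helper that never raises, so a malformed response cannot break the callback. This is a sketch in plain Python, runnable outside TouchDesigner; `extract_number` is a hypothetical helper name, and the `'count'` key is taken from the example above.

```python
import json


def extract_number(text, key, default=0.0):
    """Parse `text` as JSON and return data[key] as a float, or `default`."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return default
    if not isinstance(data, dict):
        return default  # e.g. the API returned a list or a bare value
    try:
        return float(data.get(key, default))
    except (TypeError, ValueError):
        return default  # value was null or a non-numeric string


count = extract_number('{"count": 3}', 'count')
```

Inside `onTableChange`, the result could then be assigned to the target parameter in a single line.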
---
## parameterExecuteDAT — Param Changes & Pulse
```python
pe = root.create(parameterExecuteDAT, 'comp_params')
pe.par.op = '/project1/my_component' # COMP whose params to watch
pe.par.parameters = '*' # or specific names like 'Intensity Reset'
pe.par.valuechange = True
pe.par.pulse = True
```
```python
def onValueChange(par, prev):
    """par is a Par object. par.name, par.eval(), par.owner."""
    if par.name == 'Intensity':
        op('/project1/bloom').par.threshold = par.eval()
    return

def onPulse(par):
    """Pulse param was triggered."""
    if par.name == 'Reset':
        op('/project1/scene').par.index = 0
        op('/project1/audio_player').par.cuepoint = 0
        op('/project1/audio_player').par.cuepulse.pulse()
    return

def onExpressionChange(par, val, prev):
    """User changed the expression on a param."""
    return

def onExportChange(par, val, prev):
    """Export source changed."""
    return

def onModeChange(par, val, prev):
    """Param mode changed (CONSTANT / EXPRESSION / EXPORT / etc)."""
    return
```
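As handlers accumulate, the `if par.name == ...` chains in the callbacks above can be replaced with a lookup table. A minimal sketch of that idea in plain Python, runnable outside TouchDesigner: `HANDLERS`, `dispatch`, `log`, and the `SimpleNamespace` stand-in for a Par object are all hypothetical and not part of the TD API.

```python
from types import SimpleNamespace

log = []  # stands in for the side effects a real handler would perform

# Map parameter names to handler functions.
HANDLERS = {
    'Intensity': lambda par: log.append(('threshold', par.eval())),
    'Reset': lambda par: log.append(('reset', None)),
}


def dispatch(par):
    """Run the handler registered for par.name; return False if there is none."""
    handler = HANDLERS.get(par.name)
    if handler is None:
        return False
    handler(par)
    return True


# Stand-in object exposing the two members the handlers rely on.
fake_par = SimpleNamespace(name='Intensity', eval=lambda: 0.8)
handled = dispatch(fake_par)
```

With this shape, `onValueChange(par, prev)` would reduce to a single `dispatch(par)` call, and unknown parameter names are ignored rather than silently falling through.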
---
## panelExecuteDAT — UI Events
For interactive control surfaces. See `panel-ui.md` for the full panel COMP context.
```python
pe = root.create(panelExecuteDAT, 'btn_handler')
pe.par.panel = '/project1/play_btn'
pe.par.click = True # mouse click events
pe.par.value = True # state changes (toggle)
pe.par.lockedchange = False
```
```python
def onOffToOn(panelValue):
    """Panel value rose to 1 (button pressed, slider crossed threshold)."""
    op('/project1/scene_timer').par.start.pulse()
    return

def onOnToOff(panelValue):
    """Panel value dropped to 0."""
    return

def onValueChange(panelValue):
    """Continuous: every frame the value changes."""
    val = panelValue.val  # PanelValue exposes .val rather than .eval()
    op('/project1/master').par.opacity = val
    return

def onClick(panelValue):
    """Discrete click event, fires once per click."""
    return
```
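`panelValue` is a `PanelValue` object rather than a `Par`: read it with `panelValue.val`, and use `panelValue.name` and `panelValue.owner` to identify the source.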
---
## opExecuteDAT — Operator Lifecycle
Watches creation/deletion/renaming of operators in a parent COMP.
```python
oe = root.create(opExecuteDAT, 'lifecycle')
oe.par.op = '/project1'
oe.par.create = True
oe.par.destroy = True
oe.par.namechange = True
oe.par.flagchange = False
```
```python
def onCreate(opCreated):
    """A new operator was created. Useful for auto-applying conventions."""
    if opCreated.OPType == 'glslTOP':
        # Always wrap with a null
        n = opCreated.parent().create(nullTOP, opCreated.name + '_out')
        n.inputConnectors[0].connect(opCreated)
    return

def onDestroy(opDestroyed):
    """Operator was deleted. opDestroyed.path is still valid for one frame."""
    return

def onNameChange(opChanged):
    """Operator was renamed."""
    return
```
Useful for dev-time scaffolding (auto-create downstream nullTOPs, auto-name conventions). Disable in production projects to avoid surprise side effects.
---
## executeDAT — Project Lifecycle & Per-Frame
The catch-all. Gets you hooks into project start, save, load, frame-start, frame-end.
```python
exec_dat = root.create(executeDAT, 'lifecycle')
exec_dat.par.start = True
exec_dat.par.create = True
exec_dat.par.framestart = True
exec_dat.par.frameend = False
```
```python
def onStart():
    """Project just started cooking. Run once."""
    op('/project1/scene').par.index = 0
    debug('Project started')
    return

def onCreate():
    """Component was just created (only fires for component executeDATs, not project root)."""
    return

def onFrameStart(frame):
    """Per-frame, BEFORE network cooks. Heavy logic here = bottleneck."""
    return

def onFrameEnd(frame):
    """Per-frame, AFTER network cooks. Use for capture, recording, post-network logic."""
    return

def onPlayStateChange(playing):
    """Project play/pause toggled."""
    return

def onProjectPreSave():
    """Right before saving the .toe file."""
    return

def onProjectPostSave():
    return
```
Heavy per-frame logic in `onFrameStart` is a common cause of dropped frames in TD projects. Use CHOPs for per-frame computation and reserve scripts for events.
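When some scripted work per frame is unavoidable, one option is to run it only every N frames. The sketch below shows the check in plain Python; `should_run` and the 30-frame interval are hypothetical choices, not TD settings.

```python
def should_run(frame, interval=30):
    """True on every `interval`-th frame (30, 60, 90, ... by default)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return int(frame) % interval == 0


# Which of the first 90 frames would do the expensive work?
active_frames = [f for f in range(1, 91) if should_run(f)]
```

Inside `onFrameStart(frame)`, an early `if not should_run(frame): return` keeps the callback cheap on the remaining frames.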
---
## Pattern: Triggering an Animation Sequence on Beat
```python
# Source: a kick trigger CHOP
# Goal: on each kick, run a 1.5s scale pulse + color flash
# Setup (create once)
animator = root.create(timerCHOP, 'pulse_anim')
animator.par.length = 1.5
animator.par.cycle = False
# Param expressions on visual targets:
op('logo').par.sx.expr = "1.0 + (1 - op('pulse_anim')['timer_fraction']) * 0.3"
op('logo').par.sx.mode = ParMode.EXPRESSION
op('logo').par.sy.expr = "1.0 + (1 - op('pulse_anim')['timer_fraction']) * 0.3"
op('logo').par.sy.mode = ParMode.EXPRESSION
# In a chopExecuteDAT watching the kick CHOP:
def offToOn(channel, sampleIndex, val, prev):
    op('pulse_anim').par.start.pulse()
    return
```
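The scale expression used for `sx` and `sy` can be checked in isolation. This plain-Python sketch evaluates the same formula; `pulse_scale` is a hypothetical name, and the clamp is an added safety assumption rather than something the expression above performs.

```python
def pulse_scale(timer_fraction, amount=0.3):
    """Scale starts at 1 + amount when the timer starts and decays to 1.0."""
    fraction = min(max(timer_fraction, 0.0), 1.0)  # clamp to [0, 1]
    return 1.0 + (1.0 - fraction) * amount


start_scale = pulse_scale(0.0)  # timer just started
mid_scale = pulse_scale(0.5)    # halfway through the 1.5 s pulse
end_scale = pulse_scale(1.0)    # timer finished
```

The pulse is therefore largest on the kick itself and eases back linearly to the resting scale.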
---
## Pattern: Live Editing a CHOP from API Data
```python
# webDAT polls an API every 5 seconds
# datExecuteDAT parses the response and writes to a constantCHOP
def onTableChange(dat):
    import json
    try:
        data = json.loads(dat.text)
    except json.JSONDecodeError:
        # Malformed response: keep the previous values
        return
    target = op('/project1/external_state')
    target.par.name0 = 'temperature'
    target.par.value0 = float(data['temp_c'])
    target.par.name1 = 'humidity'
    target.par.value1 = float(data['humidity'])
    return
```
Visuals just reference `op('external_state')['temperature']` — they update live.
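If the API sometimes omits a field, the mapping from JSON keys to channel slots can be computed first and applied afterwards. This is a sketch in plain Python: `to_constant_slots` is a hypothetical helper, and the field names come from the example above.

```python
def to_constant_slots(data, fields):
    """Map {channel_name: json_key} to [(index, name, value)] for usable keys."""
    slots = []
    for channel_name, json_key in fields.items():
        if json_key not in data:
            continue  # field missing from this response
        try:
            value = float(data[json_key])
        except (TypeError, ValueError):
            continue  # null or non-numeric value
        slots.append((len(slots), channel_name, value))
    return slots


slots = to_constant_slots(
    {'temp_c': 21.5, 'humidity': '40'},
    {'temperature': 'temp_c', 'humidity': 'humidity'},
)
```

Each tuple corresponds to one `nameN`/`valueN` pair on the constantCHOP, so a partial response updates only the channels it actually contains.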
---
## Pattern: Self-Cleaning Network
```python
# An opExecuteDAT watching for orphaned helper ops, deleting them after their parent disappears
def onDestroy(opDestroyed):
    parent_name = opDestroyed.name
    helper = op(f'/project1/{parent_name}_helper')
    if helper:
        helper.destroy()
    return
```
---
## Pitfalls
1. **Callbacks crash silently** — exceptions print to the textport but don't show up in the UI. Always `td_clear_textport` before debugging, then `td_read_textport` after.
2. **`debug()` vs `print()`** — both write to textport, but `debug()` includes the file/line of the calling DAT. Prefer `debug()` for scripts.
3. **`val` is the new value, `prev` is old** — easy to swap. Always: `def offToOn(channel, sampleIndex, val, prev)`. Check parameter order in TD docs if confused.
4. **`whileOn` and `valueChange` are per-frame** — heavy. Avoid unless absolutely needed. Drive via expressions instead.
5. **Callbacks don't run during cooking-paused state** — if the parent COMP has `allowCooking=False`, callbacks freeze. Useful for "disable me" toggles.
6. **`par` vs `panelValue`** — parameterExecuteDAT gives `par` (a Par object, read with `par.eval()`); panelExecuteDAT gives `panelValue` (a PanelValue object, read with `panelValue.val`). Both expose `.name` and `.owner`, but they are different classes.
7. **`opExecuteDAT` fires for itself** — when you create an opExecuteDAT, it can fire `onCreate` for itself if `par.create=True` and parent matches. Filter by `if opCreated == me: return`.
8. **Reload behavior** — when reloading an extension (`td_reinit_extension`), all callback DATs reset their internal state. Module-level vars are lost. Persist state in tableDATs or the docked DAT itself, not in module globals.
9. **Cooking dependencies** — if a callback writes to an op that's upstream of the callback's source, you get a cooking loop. TD warns about it but doesn't always block. Keep dataflow one-directional.
10. **Active flag** — every Execute DAT has `par.active`. False = silent. Easy to toggle for testing without deleting wiring.
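For pitfall 1, one defensive option is to wrap callback bodies so that every exception is reported with a full traceback through a logger of your choice. The sketch below is plain Python. `report_errors` is a hypothetical helper, and the list-based logger is only for demonstration; inside TD one could pass `debug` as the `log` argument instead.

```python
import functools
import traceback


def report_errors(callback, log=print):
    """Wrap `callback` so exceptions are logged instead of propagating."""
    @functools.wraps(callback)
    def wrapper(*args, **kwargs):
        try:
            return callback(*args, **kwargs)
        except Exception:
            log(f"{callback.__name__} failed:\n{traceback.format_exc()}")
            return None
    return wrapper


def broken_callback(channel):
    raise RuntimeError("boom")


messages = []
safe = report_errors(broken_callback, log=messages.append)
result = safe("chan1")
```

The wrapper swallows the exception after logging it, which suits event callbacks where one failed event should not stop later ones from firing.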
---
## Quick Recipes
| Goal | Setup |
|---|---|
| Beat trigger | `chopExecuteDAT.par.offtoon=True` watching a `triggerCHOP` |
| API response handler | `datExecuteDAT.par.tablechange=True` watching a `webDAT` |
| Custom button → action | `parameterExecuteDAT.par.pulse=True` watching a custom pulse param |
| Slider → continuous param | `panelExecuteDAT.par.value=True` watching a `sliderCOMP` |
| Run-once setup | `executeDAT.par.start=True` with logic in `onStart()` |
| Per-frame metrics | `executeDAT.par.frameend=True` recording values to a CHOP |
| Auto-name new ops | `opExecuteDAT.par.create=True` enforcing naming conventions |
