Compare commits

83 Commits

Author SHA1 Message Date
kshitijk4poor d8c2c77be6 feat(plugins): add optional-plugins/ discovery + langfuse_tracing as first official optional plugin
Introduces optional-plugins/ — a new category for plugins that ship with
the repo but are NOT auto-discovered. They live alongside the code but only
land in ~/.hermes/plugins/ (and thus get loaded) when the user explicitly
installs them.

Core changes:
- optional-plugins/observability/langfuse-tracing/ — langfuse tracing plugin
  (pre/post LLM + tool hooks, usage/cost normalization, fail-open when SDK
  missing). NOT in plugins/ so zero import overhead on devices that don't
  want it.
- hermes_cli/plugins_cmd.py — official install path: _resolve_official_plugin()
  recognises 'official/<category>/<name>' identifiers and copies from
  optional-plugins/ into ~/.hermes/plugins/ (no git clone, no network).
  _list_official_plugins() enumerates available optional plugins.
  cmd_list(available=True) shows not-yet-installed official plugins.
- hermes_cli/main.py — hermes plugins list --available flag
- hermes_cli/tools_config.py — Langfuse Observability in TOOL_CATEGORIES;
  post_setup handler installs the langfuse SDK and runs cmd_install()
- hermes_cli/config.py — Langfuse credentials in OPTIONAL_ENV_VARS;
  optional tuning keys in _EXTRA_ENV_KEYS

User flows:
  hermes plugins install official/observability/langfuse-tracing
  hermes plugins list --available
  hermes tools  (-> Langfuse Observability -> credentials -> auto-installs)

Closes #15764
2026-04-28 11:52:42 +05:30
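The official install path described in d8c2c77be6 (resolve an 'official/<category>/<name>' identifier, then copy from optional-plugins/ with no network access) could look roughly like the sketch below. The function names, the flat destination layout, and the path-traversal guard are assumptions for illustration; they are not the actual _resolve_official_plugin() code.

```python
import shutil
from pathlib import Path


def resolve_official_plugin(identifier, repo_root):
    """Map 'official/<category>/<name>' to a directory under optional-plugins/.

    Returns None for anything that is not a well-formed official identifier.
    """
    parts = identifier.split("/")
    if len(parts) != 3 or parts[0] != "official":
        return None
    category, name = parts[1], parts[2]
    # Assumed guard: refuse empty or traversal-style path components.
    if any(p in ("", ".", "..") for p in (category, name)):
        return None
    src = Path(repo_root) / "optional-plugins" / category / name
    return src if src.is_dir() else None


def install_official_plugin(identifier, repo_root, plugins_dir):
    """Copy the plugin into plugins_dir (local copy only, no git clone)."""
    src = resolve_official_plugin(identifier, repo_root)
    if src is None:
        raise ValueError(f"unknown official plugin: {identifier}")
    # Assumed layout: the plugin lands at <plugins_dir>/<name>.
    dest = Path(plugins_dir) / src.name
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

In this sketch plugins_dir would be ~/.hermes/plugins/; passing it in as a parameter keeps the helper testable against a temporary directory.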
Teknium 8081425a1c feat(security): make secret redaction off by default (#16794)
Flips security.redact_secrets from true to false in DEFAULT_CONFIG, and
the HERMES_REDACT_SECRETS env-var fallback in agent/redact.py now
requires explicit opt-in ("1"/"true"/"yes"/"on") to enable.

New installs and users without a security.redact_secrets key get pass-
through tool output. Existing users whose config.yaml explicitly sets
redact_secrets: true keep redaction on — the config-yaml -> env-var
bridges in hermes_cli/main.py and gateway/run.py still honor their
setting.

Also updates the inline config comments, website docs, and the
hermes-agent skill so /hermes config set security.redact_secrets true
is now the documented way to turn it on.
2026-04-27 21:24:08 -07:00
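A minimal sketch of the explicit opt-in check that 8081425a1c describes for the HERMES_REDACT_SECRETS fallback. Only the env-var name and the four accepted values come from the commit message; the helper name and the whitespace/case normalization are assumptions.

```python
import os

# Values the commit message lists as explicit opt-in.
_TRUTHY = {"1", "true", "yes", "on"}


def redaction_enabled(env=None):
    """Return True only when HERMES_REDACT_SECRETS is explicitly truthy.

    An unset or unrecognized value means redaction stays off.
    """
    env = os.environ if env is None else env
    raw = env.get("HERMES_REDACT_SECRETS", "")
    # Assumption: surrounding whitespace and letter case are ignored.
    return raw.strip().lower() in _TRUTHY
```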
Teknium ec8243fe2a chore(release): map matrix-parity-batch contributor emails to GitHub logins 2026-04-27 21:22:44 -07:00
Teknium 3d67364b8f test(matrix): set user_id in approval-reaction test to bypass defensive self-drop
MatrixAdapter._is_self_sender returns True defensively when _user_id is empty
(whoami not yet resolved) to prevent echo loops — see #15763. The reaction
approval test must therefore initialize a user_id so _on_reaction does not
drop the inbound test event before reaching the approval handler.
2026-04-27 21:22:44 -07:00
nbot 38a6bada92 feat(matrix): reaction-based exec approval + mention_user_id
Add Matrix reaction-based exec approval and mention_user_id
support for push notifications in muted rooms.

- matrix.py: _MatrixApprovalPrompt, send_exec_approval, reaction
  approval handling, bot seed reaction redaction, mention pill in send
- base.py: inject mention_user_id into send metadata
- run.py: inject mention_user_id into status thread metadata
- tests for approval prompt registration and reaction resolution
2026-04-27 21:22:44 -07:00
Andrew Miller 6c70ac8eef matrix: e2e test for cross-signing auto-bootstrap
Self-contained docker-compose harness that exercises the new bootstrap
branch against a real Continuwuity homeserver. Three tests:

  1. fresh bot → bootstrap fires, /keys/query returns master + ssk
     with UNPADDED base64 keyids, current device is signed by the
     new SSK
  2. second startup with same crypto store → bootstrap is skipped
  3. MATRIX_RECOVERY_KEY set → existing verify_with_recovery_key path
     takes precedence, no new bootstrap

Run via:

    docker compose -f tests/e2e/matrix_xsign_bootstrap/docker-compose.yml up -d
    python tests/e2e/matrix_xsign_bootstrap/test_bootstrap.py
    docker compose -f tests/e2e/matrix_xsign_bootstrap/docker-compose.yml down -v

The test mirrors the bootstrap snippet from matrix.py inline so it can
run without importing the full hermes gateway and its deps. Skipped
automatically when mautrix isn't installed or the homeserver is
unreachable.

All three pass against ghcr.io/continuwuity/continuwuity:latest
(Continuwuity 0.5.7). The unpadded-keyid assertion is the load-bearing
one — it's exactly the property the PR's bootstrap path provides that
the hand-rolled `base64.b64encode().decode()` scripts get wrong.
2026-04-27 21:22:44 -07:00
Andrew Miller d497387cec matrix: auto-bootstrap cross-signing on first startup
Without this, every Matrix bot started under hermes-agent shows the
"Encrypted by a device not verified by its owner" badge in Element
indefinitely, because the cross-signing chain (master → SSK → device)
was never published. Operators currently have to write their own
bootstrap script and remember to run it once per bot — and it's easy
to get wrong (the obvious base64.b64encode().decode() produces padded
keyids that matrix-rust-sdk silently rejects in /keys/query, so even
correctly-signed keys fail to load identity in Element).

mautrix already has the right primitive: generate_recovery_key() does
the full flow — generate seeds, upload privates to SSSS, publish
publics to the homeserver, sign the current device with the new SSK,
and return the human-readable recovery key. We invoke it once on
startup if the bot has no existing cross-signing identity, and log
the recovery key with a clear instruction to save it for future
restarts via MATRIX_RECOVERY_KEY (which the existing recovery-key
path already consumes).

Skipped when MATRIX_RECOVERY_KEY is set (existing path takes over)
or when the bot already has cross-signing keys on the homeserver
(get_own_cross_signing_public_keys returns non-None).

Bootstrap failure is non-fatal: it is logged with a hint about UIA; the bot
continues without cross-signing and Element will show the warning
that prompted this PR. That matches the existing soft-fail pattern
for verify_with_recovery_key.

Tested against Continuwuity 0.5.7 (no UIA required). Synapse with
UIA enabled will need a follow-up PR to thread MATRIX_PASSWORD
through to /keys/device_signing/upload.
2026-04-27 21:22:44 -07:00
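The padded-keyid pitfall mentioned in d497387cec can be illustrated with the standard library alone. This is a sketch of the encoding difference, not the mautrix code path; the helper names are made up for the example, and the all-zero key is a placeholder.

```python
import base64


def unpadded_b64(raw: bytes) -> str:
    """Standard base64 with the trailing '=' padding stripped."""
    return base64.b64encode(raw).decode("ascii").rstrip("=")


def cross_signing_keyid(public_key: bytes) -> str:
    """Build an 'ed25519:<unpadded base64>' key id for a public key."""
    return "ed25519:" + unpadded_b64(public_key)


# A 32-byte key encodes to 44 characters with padding, 43 without.
placeholder_key = bytes(32)
padded = base64.b64encode(placeholder_key).decode("ascii")
unpadded = unpadded_b64(placeholder_key)
```

Here padded ends in '=' while unpadded does not, which is the difference the commit says matrix-rust-sdk is sensitive to.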
konsisumer 32d4048c6b fix: MatrixAdapter respects proxy configuration 2026-04-27 21:22:44 -07:00
Adam Rummer 1eab5960f0 feat(matrix): add dm_auto_thread config for DM auto-threading
Adds MATRIX_DM_AUTO_THREAD env var (default: false) to control
auto-threading in DM rooms independently from channel auto-threading.

Closes #15398
2026-04-27 21:22:44 -07:00
LeonSGP43 74a4832b74 fix(matrix): normalize image-only filenames 2026-04-27 21:22:44 -07:00
Alexazhu fbbcfa24c5 fix(matrix): preserve exception tracebacks on E2EE and auth failures
Five ``except Exception as exc:`` blocks in the Matrix adapter logged
only ``str(exc)`` without ``exc_info=True``:

- _reverify_keys_after_upload → post-upload key verification failure
- _upload_keys_if_needed      → initial device-key query failure
- _upload_keys_if_needed      → re-upload device keys failure
- _upload_keys_if_needed      → initial device key upload failure
- connect → whoami / access-token validation failure

The E2EE key paths here are security-critical: a silent traceback-
less failure during device-key verification or upload makes it
hard for operators to tell whether their Matrix bot is failing
because of a stale token, a federation timeout, or an olm state
mismatch — all three fail with different tracebacks, which
``str(exc)`` alone flattens.

The contributing guide asks for ``exc_info=True`` on error logs.
Append it to each of the five call sites. Pure logging enrichment.
2026-04-27 21:22:44 -07:00
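For reference, the change in fbbcfa24c5 amounts to the pattern below. The function and logger names are placeholders rather than the adapter's real ones; exc_info=True is the standard logging keyword that attaches the active traceback to the record.

```python
import logging

logger = logging.getLogger("matrix_example")


def upload_keys(upload):
    """Run an upload callable and log any failure with its traceback."""
    try:
        upload()
    except Exception as exc:
        # str(exc) alone loses where the failure came from;
        # exc_info=True appends the full traceback to the log entry.
        logger.error("device key upload failed: %s", exc, exc_info=True)
        return False
    return True
```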
Heathley f223346eb7 fix(matrix): add sync timeout, callback diagnostics, and mention-drop logging
- Wrap _sync_loop sync() call with asyncio.wait_for(timeout=45s) to guard
  against TCP-level hangs that the Matrix long-poll timeout cannot catch
- Add logger.debug at the top of _on_room_message so LOG_LEVEL=DEBUG
  confirms whether callbacks fire at all (diagnoses #5819, #7914, #12614)
- Add logger.debug when MATRIX_REQUIRE_MENTION silently drops a message,
  pointing users to the env var to disable the filter

Adapted for current mautrix-python adapter (PR was written against the
legacy matrix-nio adapter).

Closes #5819
2026-04-27 21:22:44 -07:00
Charles Brooks 57f8cf00e9 fix(matrix): reconcile pending invites from sync state 2026-04-27 21:22:44 -07:00
Teknium 6649e7e746 test(matrix): adapt outbound-mention notice test to current _send_simple_message API 2026-04-27 21:22:44 -07:00
Angel Claw 32b78578e0 fix(matrix): strip only explicit @mentions in _strip_mention 2026-04-27 21:22:44 -07:00
Sami Rusani 6769a0aece fix(matrix): add outbound mention payloads 2026-04-27 21:22:44 -07:00
Teknium d7528d43ac fix(web): scope dashboard config Reset button to the current tab (#16813)
* Port from Kilo-Org/kilocode#9448: roll up subagent costs into parent session total

Child subagents built by delegate_task() each track their own
session_estimated_cost_usd, but the parent agent's total never folded
those numbers in.  On runs where the parent mostly delegates and the
children do the expensive work, the footer/UI was reporting a fraction
of the actual spend — sometimes $0.00 when the parent itself made no
billed calls.

Fix:
- Capture each child's session_estimated_cost_usd into _child_cost_usd
  on the result entry (before child.close() drops the counter).
- After the existing subagent_stop hook loop, sum the children's costs
  and add the total to parent.session_estimated_cost_usd.
- Promote session_cost_source from 'none' -> 'subagent' when the parent
  had no direct spend but children did, so the UI doesn't label the
  total as having unknown provenance.  Real sources (openrouter,
  anthropic, etc.) are preserved.

Nested orchestrator -> worker trees roll up naturally: each layer's own
delegate_task() folds its direct children in, and when the orchestrator
itself returns, its parent folds the orchestrator's now-inflated total
on top.

Internal fields (_child_cost_usd, _child_role) are stripped from the
results dict before it's serialised back to the model — same contract
as _child_role already followed.

Tests: TestSubagentCostRollup (5 cases) covers single-child, batch,
zero-cost-children, preserved-source, and legacy-fixture paths.

Source: https://github.com/Kilo-Org/kilocode/pull/9448
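The rollup described above can be sketched with plain dicts standing in for the parent agent and the per-child result entries. The field names (session_estimated_cost_usd, session_cost_source, _child_cost_usd, _child_role) come from the commit message; the function itself and the dict representation are assumptions, not the delegate_task() implementation.

```python
def roll_up_child_costs(parent, results):
    """Fold children's costs into the parent and strip internal fields."""
    child_total = sum(r.get("_child_cost_usd") or 0.0 for r in results)
    if child_total > 0:
        parent["session_estimated_cost_usd"] += child_total
        # Only promote the source when the parent had no spend of its own;
        # real sources such as 'openrouter' are left untouched.
        if parent.get("session_cost_source", "none") == "none":
            parent["session_cost_source"] = "subagent"
    # Internal bookkeeping must not be serialised back to the model.
    for r in results:
        r.pop("_child_cost_usd", None)
        r.pop("_child_role", None)
    return child_total
```

Nested trees need no special handling in this sketch: each layer calls the same function on its direct children, so an orchestrator's total already includes its workers by the time its own parent folds it in.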

* fix(web): scope dashboard config Reset button to the current tab

Reported by @ykmfb001 via X: clicking 'Restore Defaults' (恢复默认值) on
the Auxiliary page wiped the entire config.yaml to defaults, not just
the auxiliary section. The button sits next to the category tabs and
users reasonably assumed 'reset this tab', not 'reset everything'.

Changes:
- handleReset now scopes to the fields in the current view:
  active category's fields (form mode) or search-matched fields
  (search mode). Only those keys are copied from defaults; the rest
  of the config is left alone.
- Added a window.confirm() with the scope name before applying.
- Button is hidden in YAML mode (scoping doesn't apply there).
- Tooltip/aria-label now name the scope, e.g. 'Reset Auxiliary to
  defaults'.
- i18n: new resetScopeTooltip / confirmResetScope / resetScopeToast
  strings in en + zh; resetDefaults key preserved for compat.
2026-04-27 21:09:14 -07:00
Teknium a7cdd4133c fix(bedrock): send context-1m-2025-08-07 beta so Opus 4.6/4.7 get 1M context (#16793)
On AWS Bedrock (and Azure AI Foundry), Claude Opus 4.6/4.7 and Sonnet 4.6
are capped at 200K context unless the request carries the
`context-1m-2025-08-07` beta header. On native Anthropic (api.anthropic.com)
1M went GA so the header is a harmless no-op, but Bedrock/Azure still gate
it as beta as of 2026-04.

Hermes was advertising 1M in model_metadata.py (`claude-opus-4-7: 1000000`)
while silently sending a request without the beta — so Bedrock users saw
a 200K ceiling with no error message, and no config knob unblocked it.
Claude Code sends this header by default, which is why the same Bedrock
credentials worked there.

- Add `context-1m-2025-08-07` to `_COMMON_BETAS` (alongside interleaved
  thinking and fine-grained tool streaming).
- Strip it in `_common_betas_for_base_url` for MiniMax bearer-auth
  endpoints — they host their own models, not Claude, so Anthropic beta
  headers are irrelevant and could risk rejection.
- Attach `_COMMON_BETAS` as `default_headers` on the AnthropicBedrock
  client. Previously that constructor passed no betas at all, so native
  Anthropic had the 1M unlock via default_headers but Bedrock didn't.
- Fast-mode per-request `extra_headers` already rebuilds from
  `_common_betas_for_base_url`, so it picks up the 1M beta automatically.

Reported by user 'Rodmar' on Discord: Bedrock Opus 4.7 stuck at 200K while
same credentials worked in Claude Code.
2026-04-27 20:41:36 -07:00
kshitijk4poor 461ef88705 fix(state): declarative column reconciliation for stuck-at-old-v7 DBs
Anyone who ran hermes between Apr 15 (42aeb4ec) and Apr 22 (a7d78d3b)
has schema_version=7 from the pre-renumber api_call_count migration.
When a7d78d3b inserted reasoning_content as the new v7 and pushed
api_call_count to v8, the 'if current_version < 7' gate was already
false for those users, so reasoning_content was never created —
sqlite3.OperationalError: no such column: reasoning_content on any
/continue or /resume touching assistant replays.

Replaces the version-gated ADD COLUMN chain with _reconcile_columns():
on every startup, parse SCHEMA_SQL via an in-memory SQLite and diff
against PRAGMA table_info; ALTER TABLE ADD COLUMN for anything missing.
Follows the Beets / sqlite-utils pattern — SCHEMA_SQL becomes the single
source of truth for declared columns. Self-healing and idempotent.

v10 trigram FTS backfill is retained in a version-gated block — that
migration isn't a column add, it inserts existing message rows into
the new FTS virtual table, so reconciliation can't express it.
schema_version is also kept for future row-data migrations.

Salvaged from #14097 (@kshitijk4poor) onto current main; v10 trigram
preservation and the v9 codex_message_items column (stale-missed by
the original branch) are covered automatically by reconciliation.

Tests:
- Regression: DB at old v7 with api_call_count but no reasoning_content
  gets the column on open
- Idempotency: reopening the same DB is a no-op
- Structural invariant: every SCHEMA_SQL column is in the live DB
- Existing v2 migration test still passes
- E2E verified against fresh / v1 / old-v7 / v9 DBs, plus v10 trigram
  backfill preserved
2026-04-27 20:29:32 -07:00
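A minimal sketch of the declarative reconciliation idea in 461ef88705: load the declared schema into an in-memory SQLite, diff its columns against PRAGMA table_info on the live database, and ALTER TABLE ADD COLUMN for anything missing. The schema below is a toy stand-in, not the real SCHEMA_SQL, and the handling of defaults and missing tables is an assumption.

```python
import sqlite3

# Toy schema for illustration only.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    content TEXT,
    api_call_count INTEGER DEFAULT 0,
    reasoning_content TEXT
);
"""


def declared_columns(schema_sql):
    """Return {table: [(name, type, default_sql), ...]} for the schema."""
    ref = sqlite3.connect(":memory:")
    try:
        ref.executescript(schema_sql)
        tables = [row[0] for row in ref.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )]
        return {
            t: [(c[1], c[2], c[4]) for c in ref.execute(f'PRAGMA table_info("{t}")')]
            for t in tables
        }
    finally:
        ref.close()


def reconcile_columns(conn, schema_sql):
    """Add every declared column the live DB lacks; return what was added."""
    added = []
    for table, cols in declared_columns(schema_sql).items():
        live = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
        if not live:
            continue  # table absent: creating tables is out of scope here
        for name, col_type, default_sql in cols:
            if name in live:
                continue
            ddl = f'ALTER TABLE "{table}" ADD COLUMN "{name}" {col_type}'
            if default_sql is not None:
                ddl += f" DEFAULT {default_sql}"
            conn.execute(ddl)
            added.append((table, name))
    return added
```

Because the function only adds what is missing, running it on every startup is idempotent, which is the property the regression and idempotency tests listed in the commit check.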
Teknium 12d745bd7e feat(skills): port humanizer — strip AI-isms from text (#16787)
Port https://github.com/blader/humanizer (MIT, v2.5.1, 16k stars) into
the built-in skills under skills/creative/humanizer/. Based on Wikipedia's
'Signs of AI writing' guide (WikiProject AI Cleanup) — detects 29 AI-writing
patterns and rewrites them to sound human.

Hermes-native adaptations:
- Description (<60 chars) explains what it's for: 'Humanize text: strip
  AI-isms and add real voice.'
- 'When to use this skill' section — trigger phrases (humanize, de-AI,
  de-slop, un-ChatGPT, rewrite to not sound like an LLM) plus guidance to
  apply it to the agent's own output (release notes, PR descriptions, docs).
- 'How to use it in Hermes' — maps the three real input paths (inline,
  file via read_file/patch/write_file, voice-calibration sample) onto the
  tools the agent actually has. Drops Claude Code's allowed-tools block.
- Converted frontmatter to Hermes format (metadata.hermes.tags, category,
  homepage, related_skills).

Attribution preserved:
- Original author Siqi Chen (@blader) credited in frontmatter and body.
- Full MIT LICENSE copied verbatim alongside SKILL.md.
- Wikipedia / WikiProject AI Cleanup credited.
- 29 patterns, personality/soul section, and full worked example kept
  verbatim from the source (29,914 chars).

Validated end-to-end against a clean HERMES_HOME:
- sync_skills() copies skills/creative/humanizer/ including LICENSE.
- skills_list(category='creative') returns the 48-char description.
- skill_view(name='humanizer') returns the full body with all 29 patterns,
  personality/soul, attribution, and Hermes tool refs (read_file, patch,
  write_file) intact.
2026-04-27 20:25:20 -07:00
Teknium 30307a9802 feat(plugins): add pre_approval_request / post_approval_response hooks (#16776)
Plugins can now observe dangerous-command approval events in real time,
on both the CLI-interactive path and the async gateway path. This is the
missing hook surface external tools need to build approval notifiers
(macOS menu-bar allow/deny, Slack alerts, audit logs, etc.) without
forking Hermes or running a parallel gateway adapter.

Changes:
- hermes_cli/plugins.py: add two entries to VALID_HOOKS
- tools/approval.py: fire both hooks from check_all_command_guards --
  around prompt_dangerous_approval (CLI surface) and around the
  notify_cb + blocking event.wait loop (gateway surface)
- website/docs/user-guide/features/hooks.md: document both hooks with
  a macOS-notification example
- tests/tools/test_approval_plugin_hooks.py: 5 tests covering CLI once,
  CLI deny, plugin-crash resilience, gateway approve, gateway timeout

Hooks are observer-only: return values are ignored, so plugins cannot
veto or pre-answer an approval (use pre_tool_call for that). A crashing
plugin cannot break the approval flow -- invoke_hook swallows per-
callback errors, and the wrapper logs and swallows dispatch-layer
errors too.

Surface kwarg distinguishes "cli" from "gateway"; post hook reports
choice as one of once/session/always/deny/timeout.
2026-04-27 20:08:33 -07:00
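The observer-only contract from 30307a9802 (return values ignored, a crashing callback cannot break the approval flow) can be sketched as follows. The hook names and the surface/choice kwargs come from the commit message; the registry, the register_hook helper, the command kwarg, and this invoke_hook signature are simplified stand-ins rather than the real hermes_cli/plugins.py API.

```python
import logging

logger = logging.getLogger("hooks_example")

# Hypothetical in-memory registry keyed by hook name.
_HOOKS = {"pre_approval_request": [], "post_approval_response": []}


def register_hook(name, callback):
    """Attach a callback to one of the known hooks."""
    _HOOKS[name].append(callback)


def invoke_hook(name, **kwargs):
    """Call every callback for a hook; observers cannot veto or crash it."""
    for callback in _HOOKS.get(name, []):
        try:
            callback(**kwargs)  # return value deliberately ignored
        except Exception:
            # Swallow per-callback errors so one bad plugin is harmless.
            logger.warning("hook %s callback failed", name, exc_info=True)
```

A notifier plugin under this sketch would register a callback on pre_approval_request and read surface ("cli" or "gateway") from the kwargs it receives.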
Teknium 6ea5699e3f fix(compression): notify users when configured aux model fails even if main-model fallback recovers (#16775)
A misconfigured auxiliary.compression.model is a user-fixable problem that silent recovery would hide. The previous retry-on-main logic transparently swallowed aux-model failures whenever the fallback succeeded, leaving the user's broken config in place and racking up future failures.

Track the aux-model failure on the compressor alongside the existing fallback-placeholder fields:
- _last_aux_model_failure_model: str | None
- _last_aux_model_failure_error: str | None

Both are set at the moment the aux model errors (captured before summary_model is cleared for retry), regardless of whether the retry succeeds. Cleared at compress() start and on on_session_reset() so a clean run doesn't leak stale warnings.

Surface at three places:
- gateway hygiene auto-compress: ℹ note to the platform adapter (thread_id preserved)
- gateway /compress command: ℹ line appended to the reply
- CLI via _emit_warning: deduped on (model, error) so repeat compactions don't spam

Distinct from the existing ⚠️ dropped-turns warning — different severity, different emoji, explicit 'context is intact' reassurance.
2026-04-27 20:08:23 -07:00
SHL0MS c3e3a9c184 feat(skills): add Tier A references — external-data, panel-ui, replicator, dat-scripting, 3d-scene
Five additional reference docs covering common TD use cases that were not yet
documented in any reference (operators.md lists the ops, but no usage patterns).

- external-data.md: webDAT, webclientDAT, webserverDAT, websocketDAT,
  mqttClientDAT, serialDAT, tcpipDAT — auth, polling, push, JSON parsing
- panel-ui.md: custom parameter pages, button/slider/field/list COMPs,
  containerCOMP layouts, panelExecuteDAT callbacks
- replicator.md: replicatorCOMP for data-driven cloning, per-row overrides,
  recreatemissing pattern, replicator vs Python loop
- dat-scripting.md: full Execute DAT family — chopExecuteDAT, datExecuteDAT,
  parameterExecuteDAT, panelExecuteDAT, opExecuteDAT, executeDAT lifecycle
- 3d-scene.md: light types, three-point rigs, shadows, IBL/cubemaps,
  PBR materials with idiom table, multi-camera, DOF

Same conventions as existing refs: code-first, verify param names with
td_get_par_info, no token-budget impact (load on demand).
2026-04-27 19:35:18 -07:00
SHL0MS 02df438316 feat(skills): expand touchdesigner-mcp with animation, MIDI/OSC, particles, projection refs
Adds four new reference docs covering common TD use cases not previously
documented in the skill:

- animation.md: LFOs, timers, keyframes, easing, time references
- midi-osc.md: MIDI controllers, OSC routing, TouchOSC, multi-machine sync
- particles.md: POPs and particleSOP — emission, forces, collisions, render
- projection-mapping.md: windowCOMP, corner pin, mesh warp, edge blending

Also clarifies the SKILL.md tool quick reference: adds td_screen_point_to_global
and notes that 4 admin/dev-mode tools (td_project_quit, td_test_session,
td_dev_log, td_clear_dev_log) live only in mcp-tools.md to keep the main
reference focused on creative workflows.

No SKILL.md workflow or critical-rules changes. References load on demand
so no token-budget impact at session start.
2026-04-27 19:35:18 -07:00
Teknium 94b26f3ec9 fix(compression): retry summary on main model for unknown errors before giving up (#16774)
The existing retry-on-main path in _generate_summary only fires for errors that match the _is_model_not_found heuristic (404/503, 'model_not_found', 'does not exist', 'no available channel'). Other misconfiguration errors — 400s from aggregators, provider-specific 'no route' strings, opaque rejections — fall straight through to the transient-cooldown branch, which drops N turns of context and inserts a static placeholder.

Losing context is almost always worse than one extra summary attempt. Add a best-effort retry-on-main for the unknown-error branch, guarded by the same invariants as the existing fast-path retry: only when summary_model differs from main, and only once per compressor (_summary_model_fallen_back).

Tests cover: 404 fast-path fallback still works, unknown 400 now falls back, same-model aux skips retry (no infinite loop), and a double-failure (aux + main) stops at 2 calls.
2026-04-27 19:25:57 -07:00
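The retry invariants in 94b26f3ec9 (retry only when the summary model differs from main, and only once per compressor) can be sketched like this. The class and the call_model callable are hypothetical; only the behavior mirrors the commit message, not the _generate_summary code itself.

```python
class Summarizer:
    """Toy compressor: try the aux model, fall back to main at most once."""

    def __init__(self, call_model, main_model, summary_model):
        self.call_model = call_model      # hypothetical: (model, text) -> str
        self.main_model = main_model
        self.summary_model = summary_model
        self.fallen_back = False          # mirrors _summary_model_fallen_back

    def generate_summary(self, text):
        try:
            return self.call_model(self.summary_model, text)
        except Exception:
            can_retry = (
                self.summary_model != self.main_model and not self.fallen_back
            )
            if not can_retry:
                return None
        # One best-effort retry on the main model, then stay on it.
        self.fallen_back = True
        self.summary_model = self.main_model
        try:
            return self.call_model(self.main_model, text)
        except Exception:
            return None
```

Returning None here stands in for the transient-cooldown branch that inserts the static placeholder.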
iamagenius00 f2fcc087f7 test(gateway): cover /compress summary-failure warning path
PR #16333 added a warning to the manual /compress reply when the
auxiliary summariser fails and the static fallback placeholder is
used, but only the gateway-hygiene path had a test
(test_session_hygiene_warns_user_when_summary_generation_fails).
The /compress branch in _handle_compress_command was uncovered.

New test test_compress_command_appends_warning_when_summary_generation_fails
mocks the compressor's _last_summary_fallback_used /
_last_summary_dropped_count / _last_summary_error fields and
verifies the /compress reply contains the ⚠️ marker, the underlying
error string, the dropped message count, and the 'historical
message(s) were removed' wording — i.e. the same contract the
hygiene-path test enforces.
2026-04-27 19:18:13 -07:00
iamagenius00 e7f2204a07 fix(compression): reset _last_summary_error at start of compress()
The per-call reset block at the top of compress() cleared
_last_summary_dropped_count and _last_summary_fallback_used but
not _last_summary_error. Functionally this didn't break the
gateway warning path (callers gate on _last_summary_fallback_used
first, and _last_summary_error is overwritten on the next failure),
but it left the three tracking fields inconsistent — anyone
reading _last_summary_error standalone after a successful compress
would see a stale value from a previous failed compress.

Reset all three together so the per-call contract is uniform.
2026-04-27 19:18:13 -07:00
iamagenius00 5c56805a74 fix(compression): align fallback placeholder wording with gateway warning
The fallback placeholder said "N conversation turns were removed" while the
gateway warning said "N historical message(s) were removed". Use "messages"
in both so users don't wonder if the two counters refer to different things.
2026-04-27 19:18:13 -07:00
iamagenius00 c61bc3f72c fix(compression): pass thread_id metadata + add gateway test for warning delivery
Address review feedback on PR #16333:

1. The hygiene-path warning send was missing metadata=_hyg_meta. On
   Telegram topics / Slack threads / Discord threads the warning would
   land in the main channel instead of the originating thread. Now
   reuses the same _hyg_meta dict already computed for the hygiene
   compaction itself.

2. New gateway-level test
   test_session_hygiene_warns_user_when_summary_generation_fails
   verifies end-to-end:
   - When the compressor's _last_summary_fallback_used flag is True,
     the gateway invokes adapter.send() exactly once.
   - The warning message includes the dropped count and the underlying
     error string.
   - metadata={'thread_id': ...} is propagated so the warning lands
     in the originating topic/thread.

Tests: 20 gateway hygiene + 54 context_compressor — all pass.
2026-04-27 19:18:13 -07:00
iamagenius00 dfdc4276e8 fix(compression): notify gateway users when summary generation fails
When auxiliary compression's summary LLM call fails (e.g. model 404,
auxiliary model misconfigured), the compressor still drops the selected
turns and inserts a static fallback placeholder — the dropped context
is unrecoverable.

Previously the only signal of this was a WARNING in agent.log. Gateway
users (Telegram/Discord/etc.) had no way to know context was lost
because the existing _emit_warning path requires a status_callback,
and the gateway hygiene path uses a temporary _hyg_agent with
quiet_mode=True and no callback wired up.

Changes:
- ContextCompressor: track _last_summary_fallback_used and
  _last_summary_dropped_count on each compress() call. Cleared at the
  start of compress() and on session reset.
- gateway/run.py hygiene: after auto-compress, inspect the temp
  agent's compressor; if fallback was used, send a visible ⚠️ warning
  to the user via the platform adapter (TG/Discord/etc.) including
  dropped count and the underlying error.
- gateway/run.py /compress: append the same warning to the manual
  compress reply so users running /compress see the failure too.

Acceptance:
- Summary success: no user-visible warning (unchanged).
- Summary failure on gateway hygiene: user receives a TG/Discord
  message with dropped count + error + remediation hint.
- Summary failure on /compress: warning appended to the command reply.
- CLI status_callback / _emit_warning path is untouched.
- Test coverage: two new tests verify the tracking fields are set on
  failure and cleared on subsequent success.
2026-04-27 19:18:13 -07:00
Teknium f40b20d13c fix(gateway): keep typing indicator alive across slow send_typing calls (#16763)
The typing-indicator refresh loop in BasePlatformAdapter._keep_typing
awaited each send_typing call unconditionally. Each call is an HTTP
round-trip to the platform API (Telegram/Discord), normally ~100ms. When
the same network instability that causes upstream provider timeouts
(e.g. Anthropic capacity blips slowing first-token latency past the
120s stream-read timeout) also slows the platform typing API to
multi-second response times, the refresh loop stalls inside the await.
Platform-side typing expires at ~5s, so the bubble dies and stays dead
until the stuck send_typing call returns — right when the user most
needs the 'still working' signal and instead sees a bot that looks
dead, then asks 'wtf are you doing' which itself interrupts the
eventually-recovering turn.

Bound each send_typing with asyncio.wait_for (1.5s cap, derived from
interval so it's always below the 2s cadence). Slow calls get abandoned
so the next scheduled tick fires a fresh send_typing on schedule. As
long as any one of them reaches the platform within its ~5s
typing-expiry window, the bubble stays visible across the stall.

Also catches non-timeout send_typing exceptions (transient HTTP errors)
so one bad tick doesn't terminate the whole loop.

Tests: 4 new in tests/gateway/test_keep_typing_timeout.py covering
slow-send non-blocking, fast-send still-awaited, exception resilience,
and paused-chat regression guard.
2026-04-27 19:09:32 -07:00
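A rough sketch of the bounded refresh loop from f40b20d13c, assuming a 2s interval and a cap of 75% of it (1.5s). The function signature, the stop event, and the 0.75 factor are illustrative assumptions; the real BasePlatformAdapter._keep_typing may be structured differently.

```python
import asyncio


async def keep_typing(send_typing, stop: asyncio.Event, interval: float = 2.0):
    """Refresh the typing indicator on a fixed cadence until stop is set."""
    cap = interval * 0.75  # always below the cadence, 1.5s at the default
    loop = asyncio.get_running_loop()
    while not stop.is_set():
        started = loop.time()
        try:
            # Bound each call so a slow platform API cannot stall the loop.
            await asyncio.wait_for(send_typing(), timeout=cap)
        except asyncio.TimeoutError:
            pass  # abandon the slow call; the next tick sends a fresh one
        except Exception:
            pass  # a transient HTTP error must not end the loop
        # Sleep for the rest of the interval, waking early if stopped.
        remaining = max(0.0, interval - (loop.time() - started))
        try:
            await asyncio.wait_for(stop.wait(), timeout=remaining)
        except asyncio.TimeoutError:
            pass
```

Measuring the elapsed time per tick keeps the cadence at the interval even when a call runs up to its cap, so at least one refresh can land inside the platform's typing-expiry window.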
kshitijk4poor 853ed609a1 feat(skills): bundle touchdesigner-mcp by default 2026-04-27 18:22:58 -07:00
helix4u 49fb75463f fix(gateway): keep env-token Slack enabled 2026-04-27 18:19:14 -07:00
brooklyn! e0e67a99bb fix(tui): address copilot follow-up review on PR #16732 (#16740)
- moveCursor(extend=true) now collapses to the bare cursor when the
  computed offset equals the existing anchor instead of leaving a
  zero-length sel. Without this, Shift+Left at col 0 / Shift+Home at
  start would silently hide the hardware cursor (selected truthy)
  without rendering any highlight.
- _tui_need_npm_install also catches UnicodeDecodeError so a corrupted
  / non-UTF8 lockfile falls back to the mtime path the docstring
  promises instead of crashing.

Made-with: Cursor
2026-04-27 16:54:25 -07:00
brooklyn! e7091bb326 fix(tui): mouse + keyboard text selection in the composer (#16732)
* feat(tui): auto copy-on-select for transcript text

Drag in the transcript already highlighted but you had to press Cmd+C to
land it on the clipboard, and the highlight cleared on copy — most users
never realised selection existed. Now drag-release fires copySelectionNoClear
so the text is on the clipboard immediately while the highlight stays put,
matching iTerm2's "Copy to pasteboard on selection" default. Esc clears.

Behaviour:
- Single click in the input still positions the cursor (TextInput onClick).
- Single click in the transcript still does nothing destructive.
- Double / triple click select word / line, then drag extends.
- /copyselect [on|off|toggle] (alias /cos) flips the setting at runtime,
  HERMES_TUI_DISABLE_COPY_ON_SELECT=1 disables at startup, persists via
  display.tui_copy_on_select in config.yaml.

Help overlay now lists drag-select, multi-click, and click-to-position
so the gestures are discoverable.

Made-with: Cursor

* fix(tui): support prompt text selection gestures

Add mouse drag selection and Shift+Arrow/Home/End extension inside the TUI composer so prompt text behaves like a normal editable field while keeping click-to-position and right-click paste intact.

Made-with: Cursor

* Revert "feat(tui): auto copy-on-select for transcript text"

This reverts commit 6701288fe0.

* fix(tui): allow composer selection from prompt whitespace

Give the composer a one-cell mouse capture pad before the editable text. The prompt glyph/gutter still does not become selectable, but dragging from the edge now anchors at input offset 0 so users do not need to hit the first character precisely.

Made-with: Cursor

* fix(tui): clear selections from blank composer space

Clicking blank space in the transcript or composer now clears active TUI/input selections like a normal text surface. TextInput clicks stop bubbling so cursor placement and selection gestures keep their local behavior.

Made-with: Cursor

* fix(tui): delegate prompt gutter drags to composer text

The prompt gutter is now an input gesture region, not selectable content. Dragging from the whitespace or prompt area anchors the composer selection at offset 0, while selection highlight/copy remains limited to actual input text.

Made-with: Cursor

* fix(tui): move composer cursor to end on selection clear

External clear actions now collapse the composer selection to the end of the input, matching normal text-field behavior after dismissing a selection.

Made-with: Cursor

* fix(tui): capture composer padding before prompt

Add an explicit mouse capture cell over the left padding before the prompt glyph. Drags starting there now delegate to the composer input at offset 0 instead of starting terminal-level selection over the prompt chrome.

Made-with: Cursor

* fix(tui): avoid npm install on lockfile mtime churn

Compare package-lock.json against npm's hidden node_modules lock by content instead of mtimes. Git checkouts and npm lock rewrites can make the root lockfile newer even when installed dependencies already match, causing hermes --tui to print Installing TUI dependencies on every launch.

Made-with: Cursor

* fix(tui): include prompt leading cell in gesture region

Use the prompt box's real layout region to cover the leading whitespace cell before the glyph. The cell now participates in mouse hit testing and delegates to composer selection instead of starting terminal-level selection.

Made-with: Cursor

* fix(tui): widen prompt-side gesture capture band

Capture a wider left-side band around the composer prompt row so drags starting in terminal gutter/padding cells are consumed and delegated to input selection, instead of triggering terminal-level selection chrome.

Made-with: Cursor

* fix(tui): make pre-prompt spacer non-selectable content

Replace the sticky-prompt fallback `Text(' ')` with an empty spacer box so the visual gap remains but no literal space character is rendered/copyable before the composer prompt.

Made-with: Cursor

* fix(tui): capture pre-prompt spacer without shifting prompt layout

Revert the widened negative-margin prompt capture band and instead capture drags on the dedicated spacer row above the prompt. This keeps prompt/text alignment stable while still delegating whitespace-start drags to composer selection.

Made-with: Cursor

* fix(tui): align prompt with status bar and capture full input row

Drop the leading prompt column from 3 to 2 so the input first character lines up with the status bar text. Wrap the prompt+input row in a single mouse-capture box and stop event propagation from TextInput's own handlers so any drag in that row delegates to composer selection without leaking to terminal-level selection.

Made-with: Cursor

* fix(tui): anchor hardware cursor during composer selection

When a composer selection covers a row exactly the column width, the rendered text fills the row and the terminal auto-wraps the hardware cursor to col 0 of the next row, leaving a ghost block beneath the prompt. Park the cursor at the start of the input box during selection so it can't escape the input region.

Made-with: Cursor

* fix(tui): hide hardware cursor during composer selection

Stop fighting auto-wrap by hiding the hardware cursor outright while the
composer has an active selection. This prevents both the ghost block under
the prompt (cursor wrapping past the last cell) and the parked-cursor block
on the first selected character. The cursor restores as soon as the
selection clears or focus changes.

Made-with: Cursor

* chore(tui): /clean — drop dead capture-pad path, dedupe gutter handlers

- TextInput: remove unused leftCaptureColumns prop and capture-pad math, drop
  unused mouseApi.startAt, fold mouse offset into a single offsetAt helper,
  share a MouseEventLite type across the four handlers.
- appLayout: hoist a GutterMouseEvent type and an endInputDrag callback so the
  spacer/prompt/input rows share one shape.
- _tui_need_npm_install: lift the runtime-only key set to a module constant,
  collapse nested isinstance checks, and document the mtime fallback.

Made-with: Cursor

* fix(tui): address copilot review on PR #16732

- Split InputSelection.clear() into clear() (cursor-preserving) and
  collapseToEnd() (clear + jump to end). Cmd+C copy paths keep using
  clear() so the cursor stays put; the blank-area click in useMainApp
  switches to collapseToEnd() to match the requested UX.
- Spacer-row drags now force row=0 when forwarding into the input,
  since the spacer's vertical origin doesn't align with the input box
  and Ink mouse-capture keeps dispatching motion to the original
  target. Prompt+input row drag keeps localRow because origins match.

Made-with: Cursor

* fix(tui): give TextInput Box an explicit width

After the /clean pass dropped the unused capture-pad math, the wrapping
Box also lost its explicit width and started sizing to its rendered
content. Clicks past the last character missed TextInput and fell
through to the parent prompt-row Box, which collapsed the cursor to
offset 0. Pin the Box back to `columns` so the input owns its full
column span regardless of value length.

Made-with: Cursor

* feat(tui): double-click select-all + hide cursor on terminal blur

- Track click time/offset in TextInput so a quick second click on the
  same offset triggers select-all. Ink's screen-level multi-click is
  bypassed once our onMouseDown captures, so the gesture has to be
  detected locally.
- Extend the cursor-hide effect to also fire when the terminal loses
  focus, so the hollow-rect ghost most terminals draw at the parked
  cursor position disappears too.

Made-with: Cursor

* chore(tui): /clean — extract isMultiClickAt helper

Pull the click-recurrence math out of TextInput's onMouseDown into a
small isMultiClickAt(offset) helper so the handler reads as the gesture
list it actually is (multi-click → select-all, otherwise start).
Drop the redundant length>0 guard now that selectAll() already noops on
an empty value.

Made-with: Cursor

* docs(tui): explain _tui_need_npm_install content-vs-mtime comparison

Expand the docstring so future readers understand why we parse the
lockfiles instead of comparing mtimes, what the optional/peer skip
covers, how stale hidden-lock entries are handled, and when we fall
back to mtime.
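
A minimal sketch of the content-based comparison described above, operating on already-parsed lockfile dicts. The function name `locks_match` is hypothetical, and the exact rules used by `_tui_need_npm_install` (runtime-only keys, stale hidden-lock handling, mtime fallback) are not shown in this log, so this only illustrates the general idea: compare package versions and skip optional/peer entries.

```python
def locks_match(wanted: dict, installed: dict) -> bool:
    """Return True when every required package in `wanted` is installed at the same version.

    `wanted` is assumed to be the parsed root package-lock.json and `installed`
    the parsed hidden lock that npm writes under node_modules/. Both are assumed
    to carry a "packages" mapping keyed by install path.
    """
    wanted_pkgs = wanted.get("packages", {})
    installed_pkgs = installed.get("packages", {})
    for path, meta in wanted_pkgs.items():
        if path == "":
            # Root project entry; assumed absent from the hidden lock.
            continue
        if meta.get("optional") or meta.get("peer"):
            # Optional and peer packages may legitimately be missing on this platform.
            continue
        got = installed_pkgs.get(path)
        if got is None or got.get("version") != meta.get("version"):
            return False
    return True
```

Because only content is compared, a lockfile whose mtime changed after a checkout still counts as matching as long as the versions agree.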
2026-04-27 16:43:48 -07:00
Ben Barclay bebc10528f Merge pull request #16728 from NousResearch/docs/docker-multi-profile-section
docs(docker): add "Multi-profile support" section recommending one container per profile
2026-04-28 09:29:24 +10:00
Ben Barclay 273be93499 docs(docker): restore accidentally-redacted placeholder strings
The previous commit on this branch went through a layer that redacted
strings matching API-key patterns. Restore the original placeholder
values (sk-ant-..., ${ANTHROPIC_API_KEY}, etc.) that were already in
main so the diff is scoped strictly to the new Multi-profile support
section.
2026-04-28 08:21:40 +10:00
Ben Barclay adc2856ffb docs(docker): add "Multi-profile support" section
Clarifies that Hermes' built-in multi-profile feature is not recommended
when running under Docker. Recommends instead running one container per
profile, each bind-mounting its own host data directory as /opt/data.
Includes docker run examples, a rationale list (isolation, independent
lifecycle, port separation, concurrent-write safety), and a Compose
snippet showing two profile services side by side.
2026-04-28 08:20:01 +10:00
brooklyn! 46b4cf8d21 Merge pull request #16707 from NousResearch/bb/tui-queue-delete
feat(tui): delete queued message while editing with ctrl-x / cancel with esc
2026-04-27 15:56:46 -05:00
Brooklyn Nicholson 718088c382 fix(tui): copilot review on #16707 — naming, label consistency, esc priority
- Rename `removeAt` → `removeAtInPlace` and document the mutation
  contract; the old name read like a non-mutating helper.
- Hotkey table + queue header: use `Ctrl+X` / `Esc` to match the
  rest of the UI (was `⌃X` / `esc`).
- Render the queued header as a single template literal so JSX
  text-node whitespace can't sneak into the rendered line.
- Make `Esc` while editing beat the `terminal.hasSelection` clear:
  the header promises 'Esc cancel', so an active selection
  shouldn't silently consume the keystroke.
2026-04-27 15:37:54 -05:00
Brooklyn Nicholson 32b068560d fix(tui): stop ctrl+x from leaking a literal 'x' into the composer
The text input's ctrl-passthrough whitelist only listed Ctrl+C and
Ctrl+B.  Ctrl+X fell through to the printable-char branch and got
inserted as 'x' alongside the queue-delete action firing in
useInputHandlers.

Add Ctrl+X to the same whitelist so it bypasses the readline-style
fallback and reaches the app-level handler unchanged.  When not in
queue-edit mode it's a no-op, which is fine — typing 'x' on Ctrl+X
was the wrong default anyway.
2026-04-27 15:32:16 -05:00
Brooklyn Nicholson ea1012f59f feat(tui): delete queued message while editing with ctrl-x / cancel with esc
Today there's no way to remove a queued message — ↑ loads it for edit,
ctrl-K dispatches the head, but a draft you no longer want stays put
forever. ctrl-C just clears the composer and exits edit mode without
touching the queue.

Two new bindings, both gated on queueEditIdx !== null so they're
inert when the user isn't pointing at a queue item:

- ctrl-X — delete the queue item being edited, clear composer, exit
  edit mode.  "cut" matches the mental model and doesn't collide with
  any existing binding.
- esc — cancel the edit (composer clears, item stays in queue).
  Mirrors ctrl-C's existing behavior so muscle memory has two paths.

Header line now reads `queued (3) · editing 2 · ⌃X delete · esc cancel`
when in edit mode, so the affordance is discoverable without /help.
The /help hotkey table also gets a Ctrl+X entry.

ctrl-C is intentionally unchanged: it should never destroy queued
content.  Cancel is non-destructive (esc / ctrl-C); only ctrl-X
removes the item.
2026-04-27 15:24:14 -05:00
Erosika 4a9ac5c355 fix(memory): drop scrub from interim commentary + final response
Same layering concern as the persisted-assistant scrub already removed:
_emit_interim_assistant_message and the final_response return path were
mutating model output broadly.  Streaming scrubber covers real leaks
delta-by-delta; these post-stream scrubs were redundant.
2026-04-27 12:37:33 -07:00
Erosika 49e3a1d8ee style: trim verbose comment blocks added by previous commit 2026-04-27 12:37:33 -07:00
Erosika e553f6f3e4 fix(memory): narrow scrub surface to known wrapper boundaries
Reviewer pushback on the original boundary-hardening commits: four
overreach points pulled plugin-specific policy into shared core paths:

1. gateway/run.py hardcoded a '## Honcho Context' literal split for
   vision-LLM output.  Plugin-format heading in framework code; could
   truncate legitimate output naturally containing that header.
   Drop the literal split; keep generic sanitize_context (the wrapper
   strip is plugin-agnostic).  Plugin-specific cleanup belongs at the
   provider boundary, not the shared gateway path.

2. run_agent.run_conversation scrubbed user_message and
   persist_user_message before the conversation loop.  User text is
   sacred — if a user types a literal <memory-context> tag we must
   not silently delete it.  The producer (build_memory_context_block)
   is the only legitimate emitter; user input should never need the
   reverse op.

3. _build_assistant_message scrubbed model output before persistence.
   Same hazard: would silently mutate legitimate documentation/code
   the model emits containing the literal markers.  The streaming
   scrubber catches real leaks delta-by-delta before content is
   concatenated; persist-time scrub was redundant belt-and-suspenders.

4. _fire_stream_delta stripped leading newlines from every delta unless
   a paragraph break flag was set.  Mid-stream '\n' is legitimate
   markdown — lists, code fences, paragraph breaks — and chunk
   boundaries are arbitrary.  Narrow lstrip to the very first delta
   of the stream only (so stale provider preamble still gets cleaned
   on turn start, but mid-stream formatting survives).

Plus: build_memory_context_block now logs a warning when its defensive
sanitize_context strips something — surfaces buggy providers returning
pre-wrapped text instead of silently double-fencing.

Net architectural change: scrub surface collapses from 8 sites to 3
(StreamingContextScrubber on output deltas, plugin→backend send,
build_memory_context_block input-validation).  Plugin-specific strings
stay out of shared runtime paths.  User input and persisted assistant
output are no longer mutated.

Tests: rescoped TestMemoryContextSanitization (helper-correctness only,
no source-inspection of removed call sites), updated vision tests to
drop '## Honcho Context' literal-split assertions, updated
_build_assistant_message persistence test to assert preservation.
Added: cross-turn scrubber reset, build_memory_context_block warn-on-
violation, mid-stream newline preservation (plain + code fence).
2026-04-27 12:37:33 -07:00
Erosika 05435a35ed chore(release): map honcho-consolidation contributor emails
Adds AUTHOR_MAP entries for the 5 cherry-picked authors in #15381
so the contributor-attribution CI check passes.
2026-04-27 12:37:33 -07:00
Erosika 894e0b935b feat(honcho): explain why when honcho_profile returns an empty card
Closed PR #5137 addressed the retrieval path (peer cards via get_card()
instead of the session-scoped lookup that returned empty for per-session
messaging flows) — that architectural fix is already in main as
_fetch_peer_card / _fetch_peer_context.

What never got fixed is the user-visible side: honcho_profile returning
a flat 'No profile facts available yet.' leaves the model to guess at
why.  The model then often surfaces it to the user as a cryptic error.

Adds a diagnostic hint next to the existing 'result' message, enumerating
the likely causes in rough order of frequency:

  1. Observation disabled for this peer (user_observe_me/others off)
  2. Peer card hasn't accumulated yet (fresh peer / dialectic cadence
     hasn't fired enough turns — cards build over time)
  3. Generic fallback: self-hosted Honcho < 3.x lacks peer cards

The hint also suggests alternative tools (honcho_reasoning / honcho_search)
so the model can route around the empty card rather than giving up.

Schema description updated so the model knows the hint field exists and
that an empty card is NOT an error state.

7 tests cover the hint paths: warmup, observation-disabled for user + ai,
generic fallback, populated card still returns plain result (no hint),
alternative-tool suggestion present.
2026-04-27 12:37:33 -07:00
Erosika 5883df5574 fix(honcho): keep legacy schemeless baseUrl configs working
The scheme-validation commit (e77a3f2c) was too strict: a user with
legacy 'baseUrl: localhost:8000' (no 'http://' prefix) in their
'~/.honcho/config.json' would get 'No API key configured' from the
CLI after that change, even though their setup worked before.

urlparse on a schemeless host:port treats the host segment as the
scheme and leaves netloc empty, so the http/https check rejected it.

Falls back to a lenient check for schemeless strings that look like
hosts: contain '.' or ':', aren't a boolean/null literal, aren't pure
digits. The SDK still rejects truly malformed URLs at connect time
with a clearer error than ours.

Three new tests: legacy schemeless hosts accepted; obvious garbage
literals ('true', 'null', '12345') still rejected.  Reviewer
noted concern #1: schemeless regression for self-hosters with old
configs.
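
The lenient check described above could look roughly like the following. This is a sketch under assumptions: `looks_like_base_url` is a hypothetical name, and the exact literal list and ordering in `_resolve_api_key` are not shown in this log.

```python
from urllib.parse import urlparse

# Literals that should never be mistaken for a host (assumed list).
_GARBAGE_LITERALS = {"true", "false", "null", "none"}


def looks_like_base_url(value: str) -> bool:
    """Accept http(s) URLs, plus schemeless strings that look like a host."""
    value = value.strip()
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return True
    if "://" in value:
        # An explicit scheme that is not http/https (or has no host) is rejected.
        return False
    if value.lower() in _GARBAGE_LITERALS or value.isdigit():
        return False
    # Schemeless fallback: host-like strings contain a dot or a port colon.
    return "." in value or ":" in value
```

For `localhost:8000`, `urlparse` leaves `netloc` empty, which is why the strict http/https check alone rejected it and the schemeless fallback is needed.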
2026-04-27 12:37:33 -07:00
Erosika cd276eef78 compat(honcho): accept metadata kwarg on on_memory_write ABC bump
main's 6a957a74 added an optional 'metadata' kwarg to
MemoryProvider.on_memory_write so providers can distinguish tool-driven
memory writes from background-review writes.  MemoryManager already
does a getfullargspec-based introspection, so the old 3-arg signature
didn't break at runtime — but it missed the origin hint entirely.

Updates HonchoMemoryProvider.on_memory_write to accept the kwarg.  The
metadata isn't yet threaded into Honcho's create_conclusion payload —
that's worth its own PR once the consolidation lands and the new
metadata shape stabilises.
2026-04-27 12:37:33 -07:00
Erosika 02ab255a0d style(honcho): hoist hashlib import; validate baseUrl scheme before 'local' sentinel
Two small follow-ups to the PR review:

- Hoist hashlib import from _enforce_session_id_limit() to module top.
  stdlib imports are free after first cache, but keeping all imports at
  module top matches the rest of the codebase.

- _resolve_api_key now URL-parses baseUrl and requires http/https +
  non-empty netloc before returning the 'local' sentinel.  A typo like
  baseUrl: 'true' (or bare 'localhost') no longer silently passes the
  credential guard; the CLI correctly reports 'not configured'.

Three new tests cover the new validation (garbage strings, non-http
schemes, valid https).
2026-04-27 12:37:33 -07:00
Erosika 3b2edb347d fix(gateway): scrub memory-context leaks from vision auto-analysis output
fixes #5719

The auxiliary vision LLM called by gateway._enrich_message_with_vision
can echo its injected Honcho system prompt back into the image
description.  That description gets embedded verbatim into the enriched
user message, so recalled memory (personal facts, dialectic output)
surfaces into a user-visible bubble.

Strips both forms of leak before embedding:
  - <memory-context>...</memory-context> fenced blocks (sanitize_context)
  - trailing '## Honcho Context' sections (header + everything after)

Plus regression tests:
  - tests/agent/test_streaming_context_scrubber.py — 13 tests on the
    stateful scrubber (whole block, split tags, false-positive partial
    tags, unterminated span, reset, case-insensitivity)
  - tests/run_agent/test_run_agent_codex_responses.py — 2 new tests on
    _fire_stream_delta covering the realistic 7-chunk leak scenario and
    the cross-turn scrubber reset
  - tests/gateway/test_vision_memory_leak.py — 4 tests covering the
    vision auto-analysis boundary (clean pass-through, '## Honcho Context'
    header, fenced block, both patterns together)
2026-04-27 12:37:33 -07:00
Erosika 5ce5b17a42 fix(honcho): buffer partial memory-context spans across stream deltas
sanitize_context() uses a non-greedy block regex that needs both
<memory-context> open and close tags present in a single string. When a
provider streams the fenced memory block across multiple deltas (typical
for recalled-context leaks — the payload often arrives in 10+ 1-80 char
chunks), the per-delta sanitize stripped the lone open/close tags via
_FENCE_TAG_RE but let the payload in between flow straight to the UI.

Adds StreamingContextScrubber: a small stateful scrubber that tracks
open/close tag pairs across deltas, holds back partial-tag tails at
chunk boundaries, and discards span contents wholesale (including the
system-note line that fragments across deltas).

Wired into _fire_stream_delta; reset per user turn; benign trailing
partial-tag tails are flushed at the end of each model call.  Mid-span
interruption (provider drops closing tag) drops the orphaned content
rather than leaking it — truncated answer > leaked memory.

Follow-up to #13672 (@dontcallmejames).
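
A simplified sketch of the stateful approach described above. The class name `ScrubberSketch` and its methods are hypothetical, matching is case-sensitive here for brevity, and the real StreamingContextScrubber may differ in detail.

```python
OPEN_TAG = "<memory-context>"
CLOSE_TAG = "</memory-context>"


def _partial_tail_len(text: str, tag: str) -> int:
    """Length of the longest suffix of `text` that is a proper prefix of `tag`."""
    for k in range(min(len(tag) - 1, len(text)), 0, -1):
        if text.endswith(tag[:k]):
            return k
    return 0


class ScrubberSketch:
    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        """Forget all state; intended to be called once per user turn."""
        self._in_span = False
        self._held = ""

    def feed(self, delta: str) -> str:
        """Return the part of `delta` that is safe to show right now."""
        text = self._held + delta
        self._held = ""
        out = []
        while text:
            tag = CLOSE_TAG if self._in_span else OPEN_TAG
            idx = text.find(tag)
            if idx != -1:
                if not self._in_span:
                    out.append(text[:idx])  # visible text before the opening tag
                text = text[idx + len(tag):]
                self._in_span = not self._in_span
                continue
            # No complete tag: hold back a tail that might be the start of one.
            keep = _partial_tail_len(text, tag)
            if not self._in_span:
                out.append(text[: len(text) - keep])
            self._held = text[len(text) - keep:] if keep else ""
            break
        return "".join(out)

    def flush(self) -> str:
        """End of a model call: emit a benign held tail, drop an unterminated span."""
        tail = "" if self._in_span else self._held
        self.reset()
        return tail
```

Feeding the chunks `"Hello <mem"`, `"ory-context>secret"`, `" stuff</memory-"`, `"context> world"` one at a time yields only the text outside the fenced span, even though no single chunk contains a complete tag.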
2026-04-27 12:37:33 -07:00
Erosika 5d349ea857 fix(honcho): hold RLock across new_session's get_or_create to close race
new_session() was popping the old cached session, releasing the lock,
calling get_or_create, then re-acquiring the lock to insert. A concurrent
caller could observe the empty-cache window and race-create its own
session, producing two divergent session objects for the same key.

_cache_lock is an RLock, so nested reacquisition inside get_or_create is
safe. Hold it across the whole pop/create/insert sequence.

Follow-up to #13510 (@hekaru-agent).
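
The locking pattern can be illustrated with a small stand-in cache. `SessionCacheSketch` and its `factory` argument are hypothetical; only the use of a reentrant lock across the pop/create/insert sequence mirrors the description above.

```python
import threading


class SessionCacheSketch:
    def __init__(self, factory):
        self._cache = {}
        self._cache_lock = threading.RLock()  # reentrant: same thread may re-acquire
        self._factory = factory

    def get_or_create(self, key):
        with self._cache_lock:
            session = self._cache.get(key)
            if session is None:
                session = self._factory(key)
                self._cache[key] = session
            return session

    def new_session(self, key):
        # Hold the lock across pop + create + insert so no other thread can
        # observe the empty-cache window and create a divergent session.
        with self._cache_lock:
            self._cache.pop(key, None)
            return self.get_or_create(key)  # nested acquisition is safe with RLock
```

With a plain `threading.Lock` the nested `get_or_create` call would deadlock, which is why the reentrant lock matters here.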
2026-04-27 12:37:33 -07:00
twozle 82205276c1 fix(plugins/memory/honcho): default Honcho SDK HTTP timeout to 30s
When no explicit timeout is configured (HonchoClientConfig.timeout,
honcho.timeout / requestTimeout, or HONCHO_TIMEOUT), get_honcho_client
previously constructed the SDK with no timeout kwarg, letting the
underlying httpx client hang indefinitely if the Honcho backend
became unreachable mid-request.

This is a silent-failure hazard on the post-response path of
run_conversation: the memory_manager.sync_all() / queue_prefetch_all()
calls fire after the agent has already generated its final reply, so
a stalled Honcho request blocks run_conversation from returning.
The gateway never logs "response ready" and never delivers the
response to the platform (Telegram, etc.), even though the text is
already saved to the session file.

Repro: unplug the network or block app.honcho.dev mid-turn after
the model has produced its final message. Without this change,
_run_agent never returns. With it, the call aborts after 30s,
run_conversation returns, and the gateway delivers the response
(Honcho sync failure is logged and swallowed as before).

The default applies only when nothing is configured, so any
deployment that has explicitly set timeout / HONCHO_TIMEOUT /
honcho.timeout / honcho.requestTimeout keeps its existing value.
Self-hosted deployments that genuinely need a longer ceiling can
still override via any of those knobs.
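
A rough sketch of "default applies only when nothing is configured". The function name, the constant name, and the precedence between the config value and the environment variable are assumptions for illustration; only the 30-second default and the `HONCHO_TIMEOUT` variable come from the message above.

```python
import os

DEFAULT_TIMEOUT_SECONDS = 30.0  # default stated in the commit message


def resolve_timeout(config_timeout=None, env=None) -> float:
    """Pick an explicit timeout if one exists, otherwise fall back to the default."""
    env = os.environ if env is None else env
    if config_timeout is not None:
        return float(config_timeout)  # assumed: explicit config wins
    raw = env.get("HONCHO_TIMEOUT")
    if raw:
        return float(raw)
    return DEFAULT_TIMEOUT_SECONDS
```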
2026-04-27 12:37:33 -07:00
Alexander Yususpov 36d6b643f6 fix(honcho): CLI credential guard rejects self-hosted baseUrl configs
_resolve_api_key() only checks for apiKey / HONCHO_API_KEY, so all
CLI subcommands (identity --show, status, migrate, etc.) bail with
"No API key configured" on self-hosted instances that use baseUrl
without an API key.

Return "local" when baseUrl or HONCHO_BASE_URL is set, matching the
client.py behavior that already handles this case for the SDK.

Tested on: macOS, self-hosted Honcho (Docker, localhost:8000).
2026-04-27 12:37:33 -07:00
HiddenPuppy 5d36871d92 Fix Honcho HOME-aware global config fallback 2026-04-27 12:37:33 -07:00
dontcallmejames f1ba4014e1 fix: harden memory-context leak boundaries 2026-04-27 12:37:33 -07:00
dontcallmejames 39713ba2ae fix: strip leaked memory context from commentary 2026-04-27 12:37:33 -07:00
hekaru-agent dad0217450 fix(honcho): thread-safe session cache via RLock
Wraps _session_cache mutations in threading.RLock. Without this, concurrent
gateway sessions (e.g., Telegram + Discord hitting Honcho at the same time)
can race on the cache and silently lose conclusions or memory writes.

Adopted from #13510 by @hekaru-agent; the off-topic cron/jobs.py cleanup
hunk from that PR is dropped here for scope isolation. Resolved a small
conflict with the pinPeerName guard (kept both).
2026-04-27 12:37:33 -07:00
Sanjays2402 cd1c4812ab fix(honcho): truncate resolve_session_name output to Honcho's 100-char limit (#13868)
Gateway session keys (Matrix "!room:server" + thread event IDs, Telegram
supergroup reply chains, Slack thread IDs with long workspace prefixes) can
exceed Honcho's 100-character session ID limit after sanitization. Every
Honcho API call for those sessions then 400s with "session_id too long".

Add a helper that enforces the 100-char limit after sanitization:
short keys (the common case) short-circuit unchanged; over-limit keys
keep a prefix and append a deterministic `-<8 hex>` SHA-256 suffix over
the original key so two long keys sharing a leading segment can't
collide onto the same truncated ID.

Adds 7 regression tests in tests/honcho_plugin/test_client.py covering
short / exact-limit / long / deterministic / collision-resistant /
allowlist-preserving / hash-suffix-present cases.
2026-04-27 12:37:33 -07:00
Brian D. Evans 326c9daa69 fix(honcho): require strict True for pin_peer_name to survive MagicMock configs (#15162)
CI caught that ``test_session_manager_prefers_runtime_user_id_over_config_peer_name``
in ``tests/agent/test_memory_user_id.py`` failed after this branch: that
test passes a ``MagicMock`` for ``config``, where
``mock.pin_peer_name`` silently returns another ``MagicMock`` — truthy by
default.  My ``getattr(..., "pin_peer_name", False)`` fallback was
supposed to guard against callers that haven't added the new attr, but
MagicMock *does* have the attr — it just returns a live mock for it.

Tightened the gate to ``getattr(..., False) is True``.  Real configs
built via ``HonchoClientConfig.from_global_config`` always yield a
proper boolean, so strict equality matches the pinned case and rejects
both the unset-attr fallback and MagicMock stand-ins.  Added a comment
explaining why ``is True`` is intentional, not paranoid.

Also tightened the ``peer_name`` existence check to
``getattr(..., None)`` so a MagicMock with ``peer_name`` left at its
default (also truthy) doesn't spuriously enable pinning either.

Verified against both the new ``test_pin_peer_name.py`` suite (13/13
pass) and the previously-failing
``TestHonchoUserIdScoping`` (3/3 pass).  Zero behaviour change for real
``HonchoClientConfig`` values.
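
A small demonstration of why the strict identity check matters. `is_pinned` is a hypothetical helper; the real gate lives inline in session.py and is not reproduced here.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def is_pinned(config) -> bool:
    # `is True` rejects MagicMock attributes, which exist and are truthy
    # but are not the boolean True.
    return getattr(config, "pin_peer_name", False) is True


mock_config = MagicMock()
loose = bool(getattr(mock_config, "pin_peer_name", False))  # truthy auto-attribute
strict = is_pinned(mock_config)
```

Here `loose` evaluates truthy because MagicMock auto-creates the attribute, while `strict` stays false; a real config carrying the boolean `True` still passes.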

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:37:33 -07:00
Brian D. Evans d03c6fcc45 fix(honcho): pinPeerName opt-in keeps memory unified across platforms (#14984)
When a gateway drives Hermes (Telegram, Discord, Slack, ...), it passes the
platform-native user ID as ``runtime_user_peer_name`` into the Honcho
session manager.  That ID wins over ``peer_name`` in ``honcho.json``, so a
single user who connects over three platforms ends up as three separate
Honcho peers — one per platform — with fragmented memory and no cross-
platform context continuity.

For multi-user bots this is correct (and must not change): each user gets
their own peer scope.  For the vast majority of personal Hermes deployments
the configured ``peer_name`` is an unambiguous identity, though, so the
reporter asked for an opt-in knob that pins the user peer to that value.

Fix: new ``pinPeerName`` boolean on the host config, default ``false``.
When ``true`` AND ``peerName`` is set, the configured peer_name beats the
gateway's runtime identity; every other resolution case is unchanged.

  honcho.json:
  {
    "peerName": "Igor",
    "hosts": {
      "hermes": { "pinPeerName": true }
    }
  }

  session.py (resolution order, pinned case):
    runtime_user_peer_name  →  skipped (opt-in flag active)
    config.peer_name        →  WINS   "Igor"
    session-key fallback    →  unreached

Parsing follows the same host-block-overrides-root pattern as every other
flag in HonchoClientConfig.from_global_config (``_resolve_bool`` helper).

Tests (tests/honcho_plugin/test_pin_peer_name.py — 13 cases, 5 groups):
- Config parsing: default, root true, host-block true, host overrides
  root, explicit false.
- Peer resolution: runtime wins by default (regression guard for multi-
  user bots), config wins when pinned, pin-without-peer_name is a no-op
  (prevents silent peer-id collapse to session-key fallback), CLI path
  where runtime is absent, deepest fallback intact, assistant peer
  untouched by the flag.
- Cross-platform unification: Telegram UID + Discord snowflake collapse
  to one peer when pinned; negative control confirms two distinct
  runtime IDs still produce two peers when unpinned.

244 honcho_plugin tests pass, 3 pre-existing skips, zero regressions.

Defensive detail: session.py uses ``getattr(self._config, "pin_peer_name",
False)`` so callers building partial config objects (several test fixtures
across the codebase do this) don't break if they haven't updated yet.
Runtime cost: one attr lookup per new session.
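
The resolution order above, including the pinned case, might be sketched like this. `resolve_user_peer` and its parameters are hypothetical, and the shape of the session-key fallback is an assumption; the real logic in session.py is not shown in this log.

```python
def resolve_user_peer(runtime_user_peer_name, peer_name, pin_peer_name, session_key):
    """Pick the user peer ID following the order described in the commit message."""
    if pin_peer_name is True and peer_name:
        return peer_name  # pinned: configured identity beats the gateway's
    if runtime_user_peer_name:
        return runtime_user_peer_name  # default: platform-native ID wins
    if peer_name:
        return peer_name  # CLI path: no runtime identity supplied
    return session_key  # deepest fallback (assumed shape)
```

Pinning without a configured `peer_name` deliberately falls through to the runtime identity, so the flag alone cannot collapse every user onto the session-key fallback.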

Closes #14984

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:37:33 -07:00
Siddharth Balyan ef41d3bd45 feat(nix): declarative plugin installation for NixOS module (#15953)
* feat(nix): parameterize dependency-groups in python.nix

* refactor(nix): extract package to callPackage-able hermes-agent.nix

Makes the package overridable via .override{} and adds
extraPythonPackages parameter for PYTHONPATH injection.
Includes build-time collision check using PEP 503 name
canonicalization.

* feat(nix): add overlay for external NixOS consumption

External flakes can now add overlays = [ inputs.hermes-agent.overlays.default ]
to get pkgs.hermes-agent with full .override support.

* test(nix): add check for extraPythonPackages PYTHONPATH injection

Verifies wrapper has PYTHONPATH when extras provided, and
base package has no PYTHONPATH without extras.

* feat(nix): add extraPlugins option for directory-based plugins

Symlinks plugin packages into HERMES_HOME/plugins/ at activation time.
Validates plugin.yaml presence. Asserts unique plugin names at eval time.
Hermes discovers them automatically via its directory scan.

* feat(nix): add extraPythonPackages option for entry-point plugins

Overrides the hermes package with PYTHONPATH injection when
extraPythonPackages is non-empty. Plugin .dist-info directories
become visible to importlib.metadata for entry-point discovery.
Works in both native systemd and container modes.

* docs: add NixOS declarative plugin installation to nix-setup, plugins, and build-a-plugin guides

- nix-setup.md: new Plugins section with extraPlugins/extraPythonPackages
  examples, overlay usage, collision checking note, options reference rows
- plugins.md: Nix row in discovery table, NixOS declarative plugins section
- build-a-hermes-plugin.md: Distribute for NixOS section after pip section

* fix: address review feedback — remove unrelated umask, fix fetchFromGitHub naming, simplify checks

- Remove accidentally introduced umask/migration changes (unrelated to plugins)
- Add pluginName helper, fix fetchFromGitHub producing name='source'
- Show name= in extraPlugins example docs
- Simplify checks.nix: use hermes-agent.override instead of re-callPackage
- Fix fragile grep shell logic in checks

* refactor: address simplify feedback — lib.getName, drop unused inputs', Python list for extras

- Use lib.getName instead of custom pluginName helper
- Drop unused inputs' from checks.nix perSystem args
- Pass extraPythonPackages as Python list literal instead of colon-split string

* fix: walk propagatedBuildInputs for plugin PYTHONPATH and collision check

Uses python312.pkgs.requiredPythonModules to resolve the full transitive
closure of extraPythonPackages. Without this, a plugin with third-party
deps (e.g. requests) would fail at runtime if those deps weren't already
in the sealed uv2nix venv. The collision check now also scans the full
closure, catching transitive conflicts.

* cleanup: fold plugins into subdir loop, use find for symlink cleanup, inline lib.getName

- Add 'plugins' to the existing cron/sessions/logs/memories subdir loop
  instead of a separate mkdir/chown/chmod block
- Replace fragile for-glob with find -delete for stale symlink cleanup
- Inline lib.getName at both call sites, remove pluginName wrapper
2026-04-28 00:18:32 +05:30
Siddharth Balyan 1fa76607c0 feat: trigram FTS5 index for CJK search, replace LIKE fallback (#16651)
* fix: bypass FTS5 for CJK queries in session_search

FTS5 default tokenizer splits CJK characters into individual tokens,
so multi-character queries like "大别山项目" become AND of single chars.
This produces few/no results compared to LIKE substring search.

For CJK queries, skip FTS5 entirely and use LIKE for accurate
phrase matching.

Fixes NousResearch/hermes-agent#15500

* fix: cache _contains_cjk, escape LIKE wildcards, add regression tests

On top of the CJK FTS5 bypass from #15509:

- Cache _contains_cjk() result in a local var to avoid redundant O(n)
  scans on every CJK query
- Escape %, _ in LIKE queries so literal wildcards in user input are
  not treated as SQL wildcards (consistent with other LIKE queries in
  hermes_state.py that use ESCAPE '\')
- Fix misleading comment ('or CJK fallback' → accurate description)
- Add 3 regression tests:
  - test_cjk_partial_fts5_results_supplemented_by_like (#15500 / #14829)
  - test_cjk_like_dedup_no_duplicates
  - test_cjk_like_escapes_wildcards (new wildcard escaping)
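
The wildcard escaping mentioned above can be illustrated with the standard `sqlite3` module. `escape_like` and `like_search` are hypothetical helpers, and the `messages(content)` table is a stand-in; the actual query in hermes_state.py is not shown in this log.

```python
import sqlite3


def escape_like(term: str, escape: str = "\\") -> str:
    """Escape the escape character itself, then the LIKE wildcards % and _."""
    return (
        term.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def like_search(rows, term):
    """Substring search over `rows` using LIKE with an explicit ESCAPE clause."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (content TEXT)")
    conn.executemany("INSERT INTO messages VALUES (?)", [(r,) for r in rows])
    pattern = f"%{escape_like(term)}%"
    cur = conn.execute(
        "SELECT content FROM messages WHERE content LIKE ? ESCAPE '\\' ORDER BY rowid",
        (pattern,),
    )
    found = [row[0] for row in cur]
    conn.close()
    return found
```

Without the escaping, a user query such as `100%` or `a_b` would match rows that merely fit the wildcard pattern rather than the literal text.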

* feat: trigram FTS5 index for CJK search, replace LIKE fallback

Replace the LIKE '%query%' full-table-scan fallback for CJK queries with
a proper trigram FTS5 index (messages_fts_trigram).  The trigram tokenizer
creates overlapping 3-byte sequences so substring matching works natively
for any script — CJK, Thai, etc.

For queries with 3+ CJK characters: uses the trigram FTS5 table with
proper ranking, snippets, and indexed lookups.  For shorter queries
(1-2 CJK chars): falls back to LIKE since the trigram tokenizer needs
≥9 UTF-8 bytes (3 CJK chars) minimum.

Schema v10 migration creates the trigram table and backfills existing
messages.  Triggers keep the index in sync on INSERT/UPDATE/DELETE.

Builds on top of #16276 (bypass FTS5 for CJK, escape LIKE wildcards).

---------

Co-authored-by: vominh1919 <vominh1919@gmail.com>
2026-04-28 00:12:07 +05:30
brooklyn! e80504b088 Merge pull request #16656 from NousResearch/bb/tui-parity-mutating-commands
fix(tui): route mutating slash commands through live gateway state
2026-04-27 13:30:19 -05:00
Brooklyn Nicholson ed4f7f0ba3 test(tui): skip slash parity matrix when Python registry is unavailable
Keep the parity test backed by the real Python command registry while avoiding hard failures in Node-only Vitest environments that cannot import hermes_cli.commands.
2026-04-27 13:19:11 -05:00
kshitijk4poor 56724147ef fix(providers/gmi): post-salvage review fixes
- config.py: remove dead ENV_VARS_BY_VERSION[17] entry (current _config_version
  is 22, so all users are past version 17 and would never be prompted for
  GMI_API_KEY on upgrade — consistent with how arcee was added)
- auxiliary_client.py: use google/gemini-3.1-flash-lite-preview as GMI aux
  model instead of anthropic/claude-opus-4.6 (matches cheap fast-model pattern
  used by all other providers: zai→glm-4.5-flash, kimi→kimi-k2-turbo-preview,
  stepfun→step-3.5-flash, kilocode→google/gemini-3-flash-preview)
- test_gmi_provider.py: fix malformed write_text() call in doctor test
  (was: write_text("GMI_API_KEY=*** encoding="utf-8") → missing closing quote,
  wrote literal string 'GMI_API_KEY=*** encoding=' to .env file)
- test_gmi_provider.py + test_auxiliary_client.py: update aux model assertions
  to match new cheaper default
- docs/integrations/providers.md: add 'gmi' to inline 'Supported providers'
  fallback list (was only in the table, not the inline list at line ~1181)
- docs/reference/cli-commands.md: add 'gmi' to --provider choices list
2026-04-27 11:17:59 -07:00
Isaac Huang c53fcb0173 feat(providers): add GMI Cloud as a first-class API-key provider (#11955)
Add GMI Cloud (api.gmi-serving.com) as a full first-class API-key provider
with built-in auth, aliases, model catalog, CLI entry points, auxiliary client
routing, context length resolution, doctor checks, env var tracking, and docs.

- auth.py: ProviderConfig for 'gmi' (api_key, GMI_API_KEY / GMI_BASE_URL)
- providers.py: HermesOverlay with extra_env_vars for models.dev detection
- models.py: curated slash-form model catalog; live /v1/models fetch
- main.py: 'gmi' in _named_custom_provider_map and --provider choices
- model_metadata.py: _URL_TO_PROVIDER, _PROVIDER_PREFIXES, dedicated
  context-length probe block (GMI's /models has authoritative data)
- auxiliary_client.py: alias entries; _compat_model fix for slash-form
  models on cached aggregator-style clients; gmi aux default model
- doctor.py: GMI in provider connectivity checks
- config.py: GMI_API_KEY / GMI_BASE_URL in OPTIONAL_ENV_VARS
- conftest.py: explicit GMI_BASE_URL clearing (not caught by _API_KEY suffix)
- docs: providers.md, environment-variables.md, fallback-providers.md,
  configuration.md, quickstart.md (expands provider table)

Co-authored-by: Isaac Huang <isaachuang@Isaacs-MacBook-Pro.local>
2026-04-27 11:17:59 -07:00
Brooklyn Nicholson 8a33ed6136 fix(tui): address rollback guard and parity registry review
Load slash command names from the Python registry instead of regex-parsing source, and guard native rollback when no TUI session is active.
2026-04-27 13:10:13 -05:00
brooklyn! 41f70e6fc4 Merge pull request #16664 from NousResearch/bb/fix-tui-forceredraw-export
fix(tui): expose forceRedraw in Ink type shim
2026-04-27 13:08:16 -05:00
Brooklyn Nicholson adbd173ddd fix(tui): expose forceRedraw in Ink type shim 2026-04-27 13:07:48 -05:00
Brooklyn Nicholson 4f59510dd4 fix(tui): tighten fast-mode support validation
Distinguish missing model from unsupported model before enabling fast mode and cover both cases so config and live agent state remain untouched on invalid fast toggles.
2026-04-27 13:00:11 -05:00
Brooklyn Nicholson 4a08f1015a fix(tui): reject fast mode for unsupported live models
Match classic CLI parity by refusing to enable fast mode when the active model cannot produce fast request overrides, avoiding a misleading fast status with no runtime effect.
2026-04-27 12:55:41 -05:00
Brooklyn Nicholson 8bd5d0667a Merge origin/main into bb/tui-parity-mutating-commands
Resolve session command merge conflict and keep the branch current with main so PR #16656 is mergeable.
2026-04-27 12:51:11 -05:00
brooklyn! 6d24880604 Merge pull request #16657 from NousResearch/bb/tui-keybinding-model-parity
fix(tui): align Ctrl+L and /model default scope with classic CLI
2026-04-27 12:49:37 -05:00
Brooklyn Nicholson b8556eb15e fix(tui): address fast-mode live sync review feedback
Make `config.set fast status` read-only and keep live agent request overrides in sync with fast-mode toggles so runtime API kwargs match the selected mode.
2026-04-27 12:47:42 -05:00
Brooklyn Nicholson b3e7a412e2 fix(tui): wire Ctrl+L to Ink forceRedraw path
Expose a small forceRedraw API from @hermes/ink and use it for Ctrl/Cmd+L so the hotkey performs a real terminal clear + full repaint instead of a no-op state patch.
2026-04-27 12:44:24 -05:00
Brooklyn Nicholson da6f8449a5 test(tui): tighten redraw hotkey review follow-ups
Use explicit repaint patch semantics for Ctrl/Cmd+L and narrow the hotkey assertion to the actual +L entry so unrelated descriptions do not cause false failures.
2026-04-27 12:30:40 -05:00
Brooklyn Nicholson a13449a40a fix(tui): address Copilot review feedback on mutating command parity
Harden busy mode config reads against invalid display config shapes and align /fast help+usage text with accepted aliases, with regression coverage for non-dict display values.
2026-04-27 12:30:30 -05:00
Brooklyn Nicholson 17029a64e8 chore(ui-tui): apply npm run fix formatting pass
Run ui-tui lint autofix + prettier and commit the resulting formatting-only changes for the keybinding/model parity branch.
2026-04-27 12:25:27 -05:00
Brooklyn Nicholson 487da4b72b chore(ui-tui): apply npm run fix formatting pass
Run ui-tui lint autofix + prettier and commit the resulting formatting-only changes for the parity PR branch.
2026-04-27 12:25:21 -05:00
Brooklyn Nicholson 4909b94f99 fix(tui): align Ctrl+L and /model with classic CLI semantics
Make Ctrl+L non-destructive by redrawing the current screen state instead of starting a new session, and stop auto-appending --global for typed /model commands so session scope remains the default unless explicitly requested.
2026-04-27 12:23:56 -05:00
Brooklyn Nicholson a4cb3ef66c fix(tui): make mutating slash paths native and lifecycle-safe
Route /browser, /reload-mcp, /rollback, /stop, /fast, and /busy through direct TUI RPC handlers so state changes hit the live gateway session instead of slash-worker fallback. Add TUI session finalize/reset parity hooks (memory commit + plugin boundaries) and parity matrix tests to keep mutating commands off fallback.
2026-04-27 12:20:08 -05:00
145 changed files with 12099 additions and 619 deletions
+1
@@ -69,3 +69,4 @@ mini-swe-agent/
.nix-stamps/
result
website/static/api/skills-index.json
models-dev-upstream/
+30 -3
@@ -202,19 +202,33 @@ def _forbids_sampling_params(model: str) -> bool:
# Beta headers for enhanced features (sent with ALL auth types).
# As of Opus 4.7 (2026-04-16), both of these are GA on Claude 4.6+ — the
# As of Opus 4.7 (2026-04-16), the first two are GA on Claude 4.6+ — the
# beta headers are still accepted (harmless no-op) but not required. Kept
# here so older Claude (4.5, 4.1) + third-party Anthropic-compat endpoints
# that still gate on the headers continue to get the enhanced features.
# Migration guide: remove these if you no longer support ≤4.5 models.
#
# ``context-1m-2025-08-07`` unlocks the 1M context window on Claude Opus 4.6/4.7
# and Sonnet 4.6 when served via AWS Bedrock or Azure AI Foundry. 1M is GA on
# native Anthropic (api.anthropic.com) for Opus 4.6+, but Bedrock/Azure still
# gate it behind this beta header as of 2026-04 — without it Bedrock caps Opus
# at 200K even though model_metadata.py advertises 1M. The header is a harmless
# no-op on endpoints where 1M is GA.
#
# Migration guide: remove these if you no longer support ≤4.5 models or once
# Bedrock/Azure promote 1M to GA.
_COMMON_BETAS = [
"interleaved-thinking-2025-05-14",
"fine-grained-tool-streaming-2025-05-14",
"context-1m-2025-08-07",
]
# MiniMax's Anthropic-compatible endpoints fail tool-use requests when
# the fine-grained tool streaming beta is present. Omit it so tool calls
# fall back to the provider's default response path.
_TOOL_STREAMING_BETA = "fine-grained-tool-streaming-2025-05-14"
# 1M context beta — see comment on _COMMON_BETAS above. Stripped for
# Bearer-auth (MiniMax) endpoints since they host their own models and
# unknown Anthropic beta headers risk request rejection.
_CONTEXT_1M_BETA = "context-1m-2025-08-07"
# Fast mode beta — enables the ``speed: "fast"`` request parameter for
# significantly higher output token throughput on Opus 4.6 (~2.5x).
@@ -357,9 +371,14 @@ def _common_betas_for_base_url(base_url: str | None) -> list[str]:
that include Anthropic's ``fine-grained-tool-streaming`` beta — every
tool-use message triggers a connection error. Strip that beta for
Bearer-auth endpoints while keeping all other betas intact.
The ``context-1m-2025-08-07`` beta is also stripped for Bearer-auth
endpoints — MiniMax hosts its own models, not Claude, so the header is
irrelevant at best and risks request rejection at worst.
"""
if _requires_bearer_auth(base_url):
return [b for b in _COMMON_BETAS if b != _TOOL_STREAMING_BETA]
_stripped = {_TOOL_STREAMING_BETA, _CONTEXT_1M_BETA}
return [b for b in _COMMON_BETAS if b not in _stripped]
return _COMMON_BETAS
@@ -456,6 +475,13 @@ def build_anthropic_bedrock_client(region: str):
Claude feature parity: prompt caching, thinking budgets, adaptive
thinking, fast mode — features not available via the Converse API.
Attaches the common Anthropic beta headers as client-level defaults so
that Bedrock-hosted Claude models get the same enhanced features as
native Anthropic. The ``context-1m-2025-08-07`` beta in particular
unlocks the 1M context window for Opus 4.6/4.7 on Bedrock — without
it, Bedrock caps these models at 200K even though the Anthropic API
serves them with 1M natively.
Auth uses the boto3 default credential chain (IAM roles, SSO, env vars).
"""
if _anthropic_sdk is None:
@@ -473,6 +499,7 @@ def build_anthropic_bedrock_client(region: str):
return _anthropic_sdk.AnthropicBedrock(
aws_region=region,
timeout=Timeout(timeout=900.0, connect=10.0),
default_headers={"anthropic-beta": ",".join(_COMMON_BETAS)},
)
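The beta-header selection in this file can be summarised with a small standalone sketch. The beta names are taken from the diff; `requires_bearer_auth` is a hypothetical stand-in for the real `_requires_bearer_auth` check (whose logic is not shown here), and the example URL is made up.

```python
from typing import List, Optional

COMMON_BETAS = [
    "interleaved-thinking-2025-05-14",
    "fine-grained-tool-streaming-2025-05-14",
    "context-1m-2025-08-07",
]

# Betas dropped for Bearer-auth (MiniMax-style) endpoints.
STRIPPED_FOR_BEARER = {
    "fine-grained-tool-streaming-2025-05-14",
    "context-1m-2025-08-07",
}


def requires_bearer_auth(base_url: Optional[str]) -> bool:
    """Hypothetical stand-in: treat any URL containing 'minimax' as Bearer-auth."""
    return bool(base_url) and "minimax" in base_url.lower()


def betas_for(base_url: Optional[str]) -> List[str]:
    """Return the beta list to send to a given endpoint."""
    if requires_bearer_auth(base_url):
        return [b for b in COMMON_BETAS if b not in STRIPPED_FOR_BEARER]
    return list(COMMON_BETAS)


def beta_header(base_url: Optional[str]) -> dict:
    """Build the comma-joined anthropic-beta header value."""
    return {"anthropic-beta": ",".join(betas_for(base_url))}
```

Using a set for the stripped betas keeps the filter a single membership test as more provider-specific exclusions are added.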
+12 -2
@@ -82,6 +82,8 @@ _PROVIDER_ALIASES = {
"moonshot": "kimi-coding",
"kimi-cn": "kimi-coding-cn",
"moonshot-cn": "kimi-coding-cn",
"gmi-cloud": "gmi",
"gmicloud": "gmi",
"minimax-china": "minimax-cn",
"minimax_cn": "minimax-cn",
"claude": "anthropic",
@@ -155,6 +157,7 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
"kimi-coding": "kimi-k2-turbo-preview",
"stepfun": "step-3.5-flash",
"kimi-coding-cn": "kimi-k2-turbo-preview",
"gmi": "google/gemini-3.1-flash-lite-preview",
"minimax": "MiniMax-M2.7",
"minimax-cn": "MiniMax-M2.7",
"anthropic": "claude-haiku-4-5-20251001",
@@ -2558,12 +2561,19 @@ def _is_openrouter_client(client: Any) -> bool:
return False
def _cached_client_accepts_slash_models(client: Any, cached_default: Optional[str]) -> bool:
"""Best-effort check for cached clients that accept ``vendor/model`` IDs."""
if _is_openrouter_client(client):
return True
return bool(cached_default and "/" in cached_default)
def _compat_model(client: Any, model: Optional[str], cached_default: Optional[str]) -> Optional[str]:
"""Drop OpenRouter-format model slugs (with '/') for non-OpenRouter clients.
"""Keep slash-bearing model IDs only for cached clients that support them.
Mirrors the guard in resolve_provider_client() which is skipped on cache hits.
"""
if model and "/" in model and not _is_openrouter_client(client):
if model and "/" in model and not _cached_client_accepts_slash_models(client, cached_default):
return cached_default
return model or cached_default
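To make the effect of the new guard concrete, here is a self-contained sketch of the same decision. `FakeClient` and the body of `is_openrouter_client` are assumptions for illustration; only the two-step logic (OpenRouter check, then "does the cached default itself contain a slash") follows the diff.

```python
from typing import Any, Optional


class FakeClient:
    """Hypothetical stand-in for a cached provider client."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url


def is_openrouter_client(client: Any) -> bool:
    # Assumed detection rule, for the example only.
    return "openrouter.ai" in str(getattr(client, "base_url", ""))


def accepts_slash_models(client: Any, cached_default: Optional[str]) -> bool:
    """A client whose own default is 'vendor/model' is assumed to accept such IDs."""
    if is_openrouter_client(client):
        return True
    return bool(cached_default and "/" in cached_default)


def compat_model(client: Any, model: Optional[str], cached_default: Optional[str]) -> Optional[str]:
    """Keep slash-bearing IDs only when the cached client supports them."""
    if model and "/" in model and not accepts_slash_models(client, cached_default):
        return cached_default
    return model or cached_default


gmi = FakeClient("https://api.gmi-serving.com/v1")
plain = FakeClient("https://api.example.test/v1")
```

With this rule, a GMI client cached with `google/gemini-3.1-flash-lite-preview` keeps a requested `vendor/model` ID, while a client cached with a slash-free default still falls back to that default.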
+66 -2
@@ -338,6 +338,10 @@ class ContextCompressor(ContextEngine):
self._context_probe_persistable = False
self._previous_summary = None
self._last_summary_error = None
self._last_summary_dropped_count = 0
self._last_summary_fallback_used = False
self._last_aux_model_failure_error = None
self._last_aux_model_failure_model = None
self._last_compression_savings_pct = 100.0
self._ineffective_compression_count = 0
@@ -441,6 +445,17 @@ class ContextCompressor(ContextEngine):
self._ineffective_compression_count: int = 0
self._summary_failure_cooldown_until: float = 0.0
self._last_summary_error: Optional[str] = None
# When summary generation fails and a static fallback is inserted,
# record how many turns were unrecoverably dropped so callers
# (gateway hygiene, /compress) can surface a visible warning.
self._last_summary_dropped_count: int = 0
self._last_summary_fallback_used: bool = False
# When a user-configured summary model fails and we recover by
# retrying on the main model, record the failure so gateway /
# CLI callers can still warn the user even though compression
# succeeded. Silent recovery would hide the broken config.
self._last_aux_model_failure_error: Optional[str] = None
self._last_aux_model_failure_model: Optional[str] = None
def update_from_response(self, usage: Dict[str, Any]):
"""Update tracked token usage from API response."""
@@ -900,10 +915,50 @@ The user has requested that this compaction PRIORITISE preserving all informatio
"Falling back to main model '%s' for compression.",
self.summary_model, e, self.model,
)
# Record the aux-model failure so callers can warn the user
# even if the retry-on-main succeeds — a misconfigured aux
# model is something the user needs to fix.
_err_text = str(e).strip() or e.__class__.__name__
if len(_err_text) > 220:
_err_text = _err_text[:217].rstrip() + "..."
self._last_aux_model_failure_error = _err_text
self._last_aux_model_failure_model = self.summary_model
self.summary_model = "" # empty = use main model
self._summary_failure_cooldown_until = 0.0 # no cooldown
return self._generate_summary(turns_to_summarize, focus_topic=focus_topic) # retry immediately
# Unknown-error best-effort retry on main model. Losing N turns of
# context is almost always worse than one extra summary attempt, so
# if we haven't already fallen back and the summary model differs
# from the main model, try once more on main before entering
# cooldown. Errors that DID match _is_model_not_found above are
# already handled by the fast-path retry; this branch catches
# everything else (400s, provider-specific "no route" strings,
# aggregator rejections, etc.) where auto-retry is still safer
# than dropping the turns.
if (
self.summary_model
and self.summary_model != self.model
and not getattr(self, "_summary_model_fallen_back", False)
):
self._summary_model_fallen_back = True
logging.warning(
"Summary model '%s' failed (%s). "
"Retrying on main model '%s' before giving up.",
self.summary_model, e, self.model,
)
# Record the aux-model failure (see 404 branch above) — user
# should know their configured model is broken even if main
# recovers the call.
_err_text = str(e).strip() or e.__class__.__name__
if len(_err_text) > 220:
_err_text = _err_text[:217].rstrip() + "..."
self._last_aux_model_failure_error = _err_text
self._last_aux_model_failure_model = self.summary_model
self.summary_model = "" # empty = use main model
self._summary_failure_cooldown_until = 0.0
return self._generate_summary(turns_to_summarize, focus_topic=focus_topic)
# Transient errors (timeout, rate limit, network) — shorter cooldown
_transient_cooldown = 60
self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown
@@ -1196,6 +1251,13 @@ The user has requested that this compaction PRIORITISE preserving all informatio
related to this topic and be more aggressive about compressing
everything else. Inspired by Claude Code's ``/compact``.
"""
# Reset per-call summary failure state — callers inspect these fields
# after compress() returns to decide whether to surface a warning.
self._last_summary_dropped_count = 0
self._last_summary_fallback_used = False
self._last_summary_error = None
self._last_aux_model_failure_error = None
self._last_aux_model_failure_model = None
n_messages = len(messages)
# Only need head + 3 tail messages minimum (token budget decides the real tail size)
_min_for_compress = self.protect_first_n + 3 + 1
@@ -1274,11 +1336,13 @@ The user has requested that this compaction PRIORITISE preserving all informatio
if not self.quiet_mode:
logger.warning("Summary generation failed — inserting static fallback context marker")
n_dropped = compress_end - compress_start
self._last_summary_dropped_count = n_dropped
self._last_summary_fallback_used = True
summary = (
f"{SUMMARY_PREFIX}\n"
f"Summary generation was unavailable. {n_dropped} conversation turns were "
f"Summary generation was unavailable. {n_dropped} message(s) were "
f"removed to free context space but could not be summarized. The removed "
f"turns contained earlier work in this session. Continue based on the "
f"messages contained earlier work in this session. Continue based on the "
f"recent messages below and the current state of any files or resources."
)
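A caller such as the gateway or /compress could turn the new per-call fields into user-visible warnings along these lines. This is only a sketch: `CompressorState`, `truncate_error`, and `compression_warnings` are hypothetical names, and the warning wording is invented; only the field meanings and the 220-character truncation come from the diff.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompressorState:
    """Hypothetical snapshot of the fields a caller reads after compress()."""

    dropped_count: int = 0
    fallback_used: bool = False
    aux_failure_error: Optional[str] = None
    aux_failure_model: Optional[str] = None


def truncate_error(exc: Exception, limit: int = 220) -> str:
    """Shorten an error message the same way the diff does (217 chars + '...')."""
    text = str(exc).strip() or exc.__class__.__name__
    if len(text) > limit:
        text = text[: limit - 3].rstrip() + "..."
    return text


def compression_warnings(state: CompressorState) -> List[str]:
    """Build warning strings from the recorded failure state."""
    warnings: List[str] = []
    if state.fallback_used:
        warnings.append(
            f"Summary unavailable: {state.dropped_count} message(s) were dropped."
        )
    if state.aux_failure_model:
        warnings.append(
            f"Summary model '{state.aux_failure_model}' failed: {state.aux_failure_error}"
        )
    return warnings
```

Keeping the aux-model warning separate from the fallback warning matches the intent stated in the comments: a recovered call should still report a broken summary-model setting.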
+113 -4
@@ -63,15 +63,124 @@ def sanitize_context(text: str) -> str:
return text
def build_memory_context_block(raw_context: str) -> str:
"""Wrap prefetched memory in a fenced block with system note.
class StreamingContextScrubber:
"""Stateful scrubber for streaming text that may contain split memory-context spans.
The fence prevents the model from treating recalled context as user
discourse. Injected at API-call time only — never persisted.
The one-shot ``sanitize_context`` regex cannot survive chunk boundaries:
a ``<memory-context>`` opened in one delta and closed in a later delta
leaks its payload to the UI because the non-greedy block regex needs
both tags in one string. This scrubber runs a small state machine
across deltas, holding back partial-tag tails and discarding
everything inside a span (including the system-note line).
Usage::
scrubber = StreamingContextScrubber()
for delta in stream:
visible = scrubber.feed(delta)
if visible:
emit(visible)
trailing = scrubber.flush() # at end of stream
if trailing:
emit(trailing)
The scrubber is re-entrant per agent instance. Callers building new
top-level responses (new turn) should create a fresh scrubber or call
``reset()``.
"""
_OPEN_TAG = "<memory-context>"
_CLOSE_TAG = "</memory-context>"
def __init__(self) -> None:
self._in_span: bool = False
self._buf: str = ""
def reset(self) -> None:
self._in_span = False
self._buf = ""
def feed(self, text: str) -> str:
"""Return the visible portion of ``text`` after scrubbing.
Any trailing fragment that could be the start of an open/close tag
is held back in the internal buffer and surfaced on the next
``feed()`` call or discarded/emitted by ``flush()``.
"""
if not text:
return ""
buf = self._buf + text
self._buf = ""
out: list[str] = []
while buf:
if self._in_span:
idx = buf.lower().find(self._CLOSE_TAG)
if idx == -1:
# Hold back a potential partial close tag; drop the rest
held = self._max_partial_suffix(buf, self._CLOSE_TAG)
self._buf = buf[-held:] if held else ""
return "".join(out)
# Found close — skip span content + tag, continue
buf = buf[idx + len(self._CLOSE_TAG):]
self._in_span = False
else:
idx = buf.lower().find(self._OPEN_TAG)
if idx == -1:
# No open tag — hold back a potential partial open tag
held = self._max_partial_suffix(buf, self._OPEN_TAG)
if held:
out.append(buf[:-held])
self._buf = buf[-held:]
else:
out.append(buf)
return "".join(out)
# Emit text before the tag, enter span
if idx > 0:
out.append(buf[:idx])
buf = buf[idx + len(self._OPEN_TAG):]
self._in_span = True
return "".join(out)
def flush(self) -> str:
"""Emit any held-back buffer at end-of-stream.
If we're still inside an unterminated span the remaining content is
discarded (safer: leaking partial memory context is worse than a
truncated answer). Otherwise the held-back partial-tag tail is
emitted verbatim (it turned out not to be a real tag).
"""
if self._in_span:
self._buf = ""
self._in_span = False
return ""
tail = self._buf
self._buf = ""
return tail
@staticmethod
def _max_partial_suffix(buf: str, tag: str) -> int:
"""Return the length of the longest buf-suffix that is a tag-prefix.
Case-insensitive. Returns 0 if no suffix could start the tag.
"""
tag_lower = tag.lower()
buf_lower = buf.lower()
max_check = min(len(buf_lower), len(tag_lower) - 1)
for i in range(max_check, 0, -1):
if tag_lower.startswith(buf_lower[-i:]):
return i
return 0
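The following condensed sketch reproduces the state machine above in runnable form and feeds it a stream whose tags are split across chunk boundaries. The class name `Scrubber` and its attribute names are illustrative, not the module's API; the behaviour (hold back partial tags, drop span contents, discard an unterminated span on flush) is meant to mirror the docstrings.

```python
class Scrubber:
    OPEN, CLOSE = "<memory-context>", "</memory-context>"

    def __init__(self) -> None:
        self.in_span = False
        self.buf = ""

    @staticmethod
    def partial_suffix(buf: str, tag: str) -> int:
        """Length of the longest suffix of buf that is a proper prefix of tag."""
        for i in range(min(len(buf), len(tag) - 1), 0, -1):
            if tag.startswith(buf[-i:].lower()):
                return i
        return 0

    def feed(self, text: str) -> str:
        buf, self.buf = self.buf + text, ""
        out = []
        while buf:
            tag = self.CLOSE if self.in_span else self.OPEN
            idx = buf.lower().find(tag)
            if idx == -1:
                # No full tag: hold back a possible partial tag at the end.
                held = self.partial_suffix(buf, tag)
                self.buf = buf[len(buf) - held:]
                if not self.in_span:
                    out.append(buf[: len(buf) - held])
                break
            if not self.in_span:
                out.append(buf[:idx])  # text before the opening tag is visible
            buf = buf[idx + len(tag):]
            self.in_span = not self.in_span
        return "".join(out)

    def flush(self) -> str:
        # Inside an unterminated span the remainder is discarded.
        tail = "" if self.in_span else self.buf
        self.buf, self.in_span = "", False
        return tail


scrubber = Scrubber()
chunks = ["Hello <mem", "ory-context>secret</memory", "-context>world"]
visible = "".join(scrubber.feed(c) for c in chunks) + scrubber.flush()
```

Slicing with `len(buf) - held` rather than `-held` avoids the `buf[-0:]` pitfall when nothing is held back.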
def build_memory_context_block(raw_context: str) -> str:
"""Wrap prefetched memory in a fenced block with system note."""
if not raw_context or not raw_context.strip():
return ""
clean = sanitize_context(raw_context)
if clean != raw_context:
logger.warning("memory provider returned pre-wrapped context; stripped")
return (
"<memory-context>\n"
"[System note: The following is recalled memory context, "
+35 -16
@@ -51,6 +51,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
"qwen-oauth",
"xiaomi",
"arcee",
"gmi",
"custom", "local",
# Common aliases
"google", "google-gemini", "google-ai-studio",
@@ -60,6 +61,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
"stepfun", "opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
"mimo", "xiaomi-mimo",
"arcee-ai", "arceeai",
"gmi-cloud", "gmicloud",
"xai", "x-ai", "x.ai", "grok",
"nvidia", "nim", "nvidia-nim", "nemotron",
"qwen-portal",
@@ -307,6 +309,7 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"integrate.api.nvidia.com": "nvidia",
"api.xiaomimimo.com": "xiaomi",
"xiaomimimo.com": "xiaomi",
"api.gmi-serving.com": "gmi",
"ollama.com": "ollama-cloud",
}
@@ -702,6 +705,29 @@ def fetch_endpoint_model_metadata(
return {}
def _resolve_endpoint_context_length(
model: str,
base_url: str,
api_key: str = "",
) -> Optional[int]:
"""Resolve context length from an endpoint's live ``/models`` metadata."""
endpoint_metadata = fetch_endpoint_model_metadata(base_url, api_key=api_key)
matched = endpoint_metadata.get(model)
if not matched:
if len(endpoint_metadata) == 1:
matched = next(iter(endpoint_metadata.values()))
else:
for key, entry in endpoint_metadata.items():
if model in key or key in model:
matched = entry
break
if matched:
context_length = matched.get("context_length")
if isinstance(context_length, int):
return context_length
return None
def _get_context_cache_path() -> Path:
"""Return path to the persistent context length cache file."""
from hermes_constants import get_hermes_home
@@ -1295,22 +1321,9 @@ def get_model_context_length(
# returns 128k) instead of the model's full context (400k). models.dev
# has the correct per-provider values and is checked at step 5+.
if _is_custom_endpoint(base_url) and not _is_known_provider_base_url(base_url):
endpoint_metadata = fetch_endpoint_model_metadata(base_url, api_key=api_key)
matched = endpoint_metadata.get(model)
if not matched:
# Single-model servers: if only one model is loaded, use it
if len(endpoint_metadata) == 1:
matched = next(iter(endpoint_metadata.values()))
else:
# Fuzzy match: substring in either direction
for key, entry in endpoint_metadata.items():
if model in key or key in model:
matched = entry
break
if matched:
context_length = matched.get("context_length")
if isinstance(context_length, int):
return context_length
context_length = _resolve_endpoint_context_length(model, base_url, api_key=api_key)
if context_length is not None:
return context_length
if not _is_known_provider_base_url(base_url):
# 3. Try querying local server directly
if is_local_endpoint(base_url):
@@ -1374,6 +1387,12 @@ def get_model_context_length(
if base_url:
save_context_length(model, base_url, codex_ctx)
return codex_ctx
if effective_provider == "gmi" and base_url:
# GMI exposes authoritative context_length via /models, but it is not
# in models.dev yet. Preserve that higher-fidelity endpoint lookup.
ctx = _resolve_endpoint_context_length(model, base_url, api_key=api_key)
if ctx is not None:
return ctx
if effective_provider:
from agent.models_dev import lookup_models_dev_context
ctx = lookup_models_dev_context(effective_provider, model)
+7 -3
@@ -56,8 +56,12 @@ _SENSITIVE_BODY_KEYS = frozenset({
})
# Snapshot at import time so runtime env mutations (e.g. LLM-generated
# `export HERMES_REDACT_SECRETS=false`) cannot disable redaction mid-session.
_REDACT_ENABLED = os.getenv("HERMES_REDACT_SECRETS", "").lower() not in ("0", "false", "no", "off")
# `export HERMES_REDACT_SECRETS=true`) cannot enable/disable redaction
# mid-session. OFF by default — user must opt in via
# `security.redact_secrets: true` in config.yaml (bridged to this env var
# in hermes_cli/main.py and gateway/run.py) or `HERMES_REDACT_SECRETS=true`
# in ~/.hermes/.env.
_REDACT_ENABLED = os.getenv("HERMES_REDACT_SECRETS", "").lower() in ("1", "true", "yes", "on")
# Known API key prefixes -- match the prefix + contiguous token chars
_PREFIX_PATTERNS = [
@@ -257,7 +261,7 @@ def redact_sensitive_text(text: str) -> str:
"""Apply all redaction patterns to a block of text.
Safe to call on any string -- non-matching text passes through unchanged.
Disabled when security.redact_secrets is false in config.yaml.
Disabled by default — enable via security.redact_secrets: true in config.yaml.
"""
if text is None:
return None
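The behavioural change in `_REDACT_ENABLED` comes down to swapping a deny-list for an allow-list. The two helper functions below are illustrative names only; they apply the old and new expressions from the diff to a raw env-var value so the difference for unset or unrecognised values is easy to see.

```python
from typing import Optional


def redact_enabled_old(raw: Optional[str]) -> bool:
    """Previous rule: on unless explicitly disabled."""
    return (raw or "").lower() not in ("0", "false", "no", "off")


def redact_enabled_new(raw: Optional[str]) -> bool:
    """New rule: off unless explicitly enabled."""
    return (raw or "").lower() in ("1", "true", "yes", "on")


# Compare both rules for a few representative values.
comparison = {
    value: (redact_enabled_old(value), redact_enabled_new(value))
    for value in ("", "true", "false", "maybe")
}
```

An unset variable and an unrecognised value such as `maybe` both flip from enabled to disabled under the new rule.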
+1
@@ -6000,6 +6000,7 @@ class HermesCLI:
platform_status = {
Platform.TELEGRAM: ("Telegram", "TELEGRAM_BOT_TOKEN"),
Platform.DISCORD: ("Discord", "DISCORD_BOT_TOKEN"),
Platform.SLACK: ("Slack", "SLACK_BOT_TOKEN"),
Platform.WHATSAPP: ("WhatsApp", "WHATSAPP_ENABLED"),
}
+1
@@ -36,6 +36,7 @@
imports = [
./nix/packages.nix
./nix/overlays.nix
./nix/nixosModules.nix
./nix/checks.nix
./nix/devShell.nix
+16 -1
@@ -566,6 +566,8 @@ def load_gateway_config() -> GatewayConfig:
existing = {}
# Deep-merge extra dicts so gateway.json defaults survive
merged_extra = {**existing.get("extra", {}), **plat_block.get("extra", {})}
if plat_name == Platform.SLACK.value and "enabled" in plat_block:
merged_extra["_enabled_explicit"] = True
merged = {**existing, **plat_block}
if merged_extra:
merged["extra"] = merged_extra
@@ -610,16 +612,21 @@ def load_gateway_config() -> GatewayConfig:
bridged["channel_prompts"] = {str(k): v for k, v in channel_prompts.items()}
else:
bridged["channel_prompts"] = channel_prompts
if not bridged:
enabled_was_explicit = "enabled" in platform_cfg
if not bridged and not enabled_was_explicit:
continue
plat_data = platforms_data.setdefault(plat.value, {})
if not isinstance(plat_data, dict):
plat_data = {}
platforms_data[plat.value] = plat_data
if enabled_was_explicit:
plat_data["enabled"] = platform_cfg["enabled"]
extra = plat_data.setdefault("extra", {})
if not isinstance(extra, dict):
extra = {}
plat_data["extra"] = extra
if plat == Platform.SLACK and enabled_was_explicit:
extra["_enabled_explicit"] = True
extra.update(bridged)
# Slack settings → env vars (env vars take precedence)
@@ -941,6 +948,14 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
# No yaml config for Slack — env-only setup, enable it
config.platforms[Platform.SLACK] = PlatformConfig()
config.platforms[Platform.SLACK].enabled = True
else:
slack_config = config.platforms[Platform.SLACK]
enabled_was_explicit = bool(slack_config.extra.pop("_enabled_explicit", False))
if not slack_config.enabled and not enabled_was_explicit:
# Top-level Slack settings such as channel prompts should not
# turn an env-token setup into a disabled platform. Only an
# explicit slack.enabled/platforms.slack.enabled false should.
slack_config.enabled = True
# If yaml config exists, respect its enabled flag (don't override
# explicit enabled: false). Token is still stored so skills that
# send Slack messages can use it without activating the gateway adapter.
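The intended outcome of the `_enabled_explicit` marker, for the case where a Slack bot token is present in the environment, can be written as a small truth function. This is a simplified model under stated assumptions: `slack_enabled_with_token` is a hypothetical name, the platform block is represented as a plain dict, and the case where no token is set is deliberately not modelled.

```python
from typing import Optional


def slack_enabled_with_token(platform_cfg: Optional[dict]) -> bool:
    """Decide whether Slack ends up enabled when an env token is set."""
    if platform_cfg is None:
        # Env-only setup: no YAML block at all, so enable the platform.
        return True
    if "enabled" in platform_cfg:
        # The user set the flag explicitly; respect it, including False.
        return bool(platform_cfg["enabled"])
    # Other Slack settings (e.g. channel prompts) must not disable the platform.
    return True
```

The key distinction is between a missing `enabled` key and an explicit `enabled: false`; only the latter keeps the adapter off.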
+49 -12
@@ -307,9 +307,14 @@ def proxy_kwargs_for_aiohttp(proxy_url: str | None) -> tuple[dict, dict]:
"""Build kwargs for standalone ``aiohttp.ClientSession`` with proxy.
Returns ``(session_kwargs, request_kwargs)`` where:
- SOCKS → ``({"connector": ProxyConnector(...)}, {})``
- HTTP → ``({}, {"proxy": url})``
- None → ``({}, {})``
- With aiohttp-socks → ``({"connector": ProxyConnector(...)}, {})``
for *all* proxy schemes (SOCKS **and** HTTP/HTTPS).
- HTTP without aiohttp-socks → ``({}, {"proxy": url})``.
- None → ``({}, {})``.
Prefer the connector path: it works transparently with libraries
(like mautrix) that call ``session.request()`` without forwarding
per-request ``proxy=`` kwargs.
Usage::
@@ -320,20 +325,20 @@ def proxy_kwargs_for_aiohttp(proxy_url: str | None) -> tuple[dict, dict]:
"""
if not proxy_url:
return {}, {}
if proxy_url.lower().startswith("socks"):
try:
from aiohttp_socks import ProxyConnector
try:
from aiohttp_socks import ProxyConnector
connector = ProxyConnector.from_url(proxy_url, rdns=True)
return {"connector": connector}, {}
except ImportError:
connector = ProxyConnector.from_url(proxy_url, rdns=True)
return {"connector": connector}, {}
except ImportError:
if proxy_url.lower().startswith("socks"):
logger.warning(
"aiohttp_socks not installed — SOCKS proxy %s ignored. "
"Run: pip install aiohttp-socks",
proxy_url,
)
return {}, {}
return {}, {"proxy": proxy_url}
return {}, {"proxy": proxy_url}
def is_host_excluded_by_no_proxy(hostname: str, no_proxy_value: str | None = None) -> bool:
@@ -1702,13 +1707,41 @@ class BasePlatformAdapter(ABC):
the agent is waiting for dangerous-command approval). This is critical
for Slack's Assistant API where ``assistant_threads_setStatus`` disables
the compose box — pausing lets the user type ``/approve`` or ``/deny``.
Each ``send_typing`` call is bounded by a ~1.5s timeout so a slow
network round-trip can't stall the refresh cadence. Telegram- and
Discord-side typing expire after ~5s; if any individual send_typing
takes longer than the refresh interval, the bubble would die and
stay dead until that call returns. Abandoning the slow call lets
the next tick fire a fresh send_typing on schedule — as long as
one of them succeeds within the 5s platform-side window, the bubble
stays visible across provider stalls / upstream API timeouts.
"""
# Bound each send_typing round-trip so the refresh cadence isn't
# gated on network health. Must stay below ``interval`` so a slow
# call gets abandoned before the next scheduled tick.
_send_typing_timeout = max(0.25, min(1.5, interval - 0.25))
try:
while True:
if stop_event is not None and stop_event.is_set():
return
if chat_id not in self._typing_paused:
await self.send_typing(chat_id, metadata=metadata)
try:
await asyncio.wait_for(
self.send_typing(chat_id, metadata=metadata),
timeout=_send_typing_timeout,
)
except asyncio.TimeoutError:
# Slow network — abandon this tick, keep the loop
# on schedule so the next send_typing fires fresh.
pass
except asyncio.CancelledError:
raise
except Exception as typing_err:
logger.debug(
"[%s] send_typing error (non-fatal): %s",
self.name, typing_err,
)
if stop_event is None:
await asyncio.sleep(interval)
continue
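The bounded typing refresh shown above can be tried out with plain asyncio. In this sketch the timeout formula is copied from the diff, while `refresh_once` and the three fake senders are hypothetical helpers that stand in for one loop iteration and for `send_typing` under different network conditions.

```python
import asyncio


def typing_timeout(interval: float) -> float:
    """Per-call timeout: at most 1.5s, at least 0.25s, otherwise interval - 0.25."""
    return max(0.25, min(1.5, interval - 0.25))


async def refresh_once(send_typing, interval: float) -> str:
    """Run one typing refresh and report what happened to it."""
    try:
        await asyncio.wait_for(send_typing(), timeout=typing_timeout(interval))
        return "sent"
    except asyncio.TimeoutError:
        return "abandoned"  # slow call dropped so the next tick stays on schedule
    except asyncio.CancelledError:
        raise               # never swallow task cancellation
    except Exception:
        return "error"      # non-fatal; the loop keeps going


async def fast() -> None:
    return None


async def slow() -> None:
    await asyncio.sleep(5)


async def broken() -> None:
    raise ConnectionError("boom")


async def demo() -> list:
    return [await refresh_once(f, 0.5) for f in (fast, slow, broken)]


results = asyncio.run(demo())
```

For very small intervals the 0.25s floor can exceed `interval - 0.25`, so the "stay below interval" property only holds when the interval is comfortably above 0.25s.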
@@ -2399,11 +2432,15 @@ class BasePlatformAdapter(ABC):
# Send the text portion
if text_content:
logger.info("[%s] Sending response (%d chars) to %s", self.name, len(text_content), event.source.chat_id)
# Build send metadata: thread_id + mention target for platforms that need it
send_metadata = dict(_thread_metadata) if _thread_metadata else {}
if event.source.user_id:
send_metadata["mention_user_id"] = event.source.user_id
result = await self._send_with_retry(
chat_id=event.source.chat_id,
content=text_content,
reply_to=event.message_id,
metadata=_thread_metadata,
metadata=send_metadata,
)
_record_delivery(result)
+438 -42
@@ -11,6 +11,7 @@ Environment variables:
MATRIX_PASSWORD Password (alternative to access token)
MATRIX_ENCRYPTION Set "true" to enable E2EE
MATRIX_DEVICE_ID Stable device ID for E2EE persistence across restarts
MATRIX_PROXY HTTP(S) or SOCKS proxy URL for Matrix traffic
MATRIX_ALLOWED_USERS Comma-separated Matrix user IDs (@user:server)
MATRIX_HOME_ROOM Room ID for cron/notification delivery
MATRIX_REACTIONS Set "false" to disable processing lifecycle reactions
@@ -18,6 +19,7 @@ Environment variables:
MATRIX_REQUIRE_MENTION Require @mention in rooms (default: true)
MATRIX_FREE_RESPONSE_ROOMS Comma-separated room IDs exempt from mention requirement
MATRIX_AUTO_THREAD Auto-create threads for room messages (default: true)
MATRIX_DM_AUTO_THREAD Auto-create threads for DM messages (default: false)
MATRIX_RECOVERY_KEY Recovery key for cross-signing verification after device key rotation
MATRIX_DM_MENTION_THREADS Create a thread when bot is @mentioned in a DM (default: false)
"""
@@ -30,6 +32,8 @@ import mimetypes
import os
import re
import time
from dataclasses import dataclass
from html import escape as _html_escape
from pathlib import Path
from typing import Any, Dict, Optional, Set
@@ -95,11 +99,25 @@ from gateway.platforms.base import (
MessageType,
ProcessingOutcome,
SendResult,
resolve_proxy_url,
proxy_kwargs_for_aiohttp,
)
from gateway.platforms.helpers import ThreadParticipationTracker
logger = logging.getLogger(__name__)
@dataclass
class _MatrixApprovalPrompt:
"""Tracks a pending Matrix reaction-based exec approval prompt."""
def __init__(self, session_key: str, chat_id: str, message_id: str, resolved: bool = False):
self.session_key = session_key
self.chat_id = chat_id
self.message_id = message_id
self.resolved = resolved
self.bot_reaction_events: dict[str, str] = {} # emoji -> event_id
# Matrix message size limit (4000 chars practical, spec has no hard limit
# but clients render poorly above this).
MAX_MESSAGE_LENGTH = 4000
@@ -114,11 +132,85 @@ _CRYPTO_DB_PATH = _STORE_DIR / "crypto.db"
# Grace period: ignore messages older than this many seconds before startup.
_STARTUP_GRACE_SECONDS = 5
_OUTBOUND_MENTION_RE = re.compile(
r"(?<![\w/])(@[0-9A-Za-z._=/-]+:[0-9A-Za-z.-]+(?::\d+)?)"
)
_E2EE_INSTALL_HINT = (
"Install with: pip install 'mautrix[encryption]' (requires libolm C library)"
)
_MATRIX_IMAGE_FILENAME_EXTS = frozenset({
".jpg",
".jpeg",
".png",
".gif",
".webp",
".bmp",
".svg",
".heic",
".heif",
".avif",
})
def _looks_like_matrix_image_filename(text: str) -> bool:
"""Return True when Matrix image body text is probably just a transport filename.
Matrix ``m.image`` events commonly populate ``content.body`` with the uploaded
filename when the user did not add a caption. Treating that raw filename as
user-authored text confuses downstream vision enrichment.
"""
candidate = str(text or "").strip()
if not candidate or "\n" in candidate or candidate.endswith("/"):
return False
name = Path(candidate).name
if not name or name != candidate:
return False
suffix = Path(name).suffix.lower()
if not suffix:
return False
guessed_type, _ = mimetypes.guess_type(name)
if guessed_type and guessed_type.startswith("image/"):
return True
return suffix in _MATRIX_IMAGE_FILENAME_EXTS
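A condensed, standalone variant of the filename check may help show which inputs it accepts. This is a sketch only: `is_bare_image_filename` and `FALLBACK_EXTS` are hypothetical names, and the extension set here is a small subset chosen for illustration.

```python
import mimetypes
from pathlib import Path

# Hypothetical subset of extensions that mimetypes may not know on every platform.
FALLBACK_EXTS = frozenset({".heic", ".heif", ".avif", ".webp"})


def is_bare_image_filename(text: str) -> bool:
    """Return True when ``text`` looks like a lone image filename, not a caption."""
    candidate = (text or "").strip()
    if not candidate or "\n" in candidate or candidate.endswith("/"):
        return False
    # A path with directories is not a bare filename.
    if Path(candidate).name != candidate:
        return False
    suffix = Path(candidate).suffix.lower()
    if not suffix:
        return False
    guessed, _ = mimetypes.guess_type(candidate)
    if guessed and guessed.startswith("image/"):
        return True
    return suffix in FALLBACK_EXTS


samples = ["IMG_1234.jpg", "photos/cat.png", "cat.heic", "look at this photo.png please"]
results = {s: is_bare_image_filename(s) for s in samples}
```

A caption that merely contains a filename is rejected because everything after the last dot is treated as the suffix, which then matches no known image type.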
def _create_matrix_session(proxy_url: str | None):
"""Create an ``aiohttp.ClientSession`` whose proxy applies to *all* requests.
mautrix's ``HTTPAPI._send()`` calls ``session.request()`` without forwarding
per-request ``proxy=`` kwargs. For HTTP(S) proxies we use aiohttp's native
``proxy=`` session parameter which sets a default for every request. For SOCKS
we use ``aiohttp_socks.ProxyConnector`` (connector-level).
When no proxy is configured we enable ``trust_env`` so standard env vars
(``HTTP_PROXY`` / ``HTTPS_PROXY``) are honoured automatically.
"""
import aiohttp
if not proxy_url:
return aiohttp.ClientSession(trust_env=True)
if proxy_url.split("://")[0].lower().startswith("socks"):
try:
from aiohttp_socks import ProxyConnector
return aiohttp.ClientSession(
connector=ProxyConnector.from_url(proxy_url, rdns=True),
)
except ImportError:
logger.warning(
"aiohttp_socks not installed — SOCKS proxy %s ignored. "
"Run: pip install aiohttp-socks",
proxy_url,
)
return aiohttp.ClientSession(trust_env=True)
return aiohttp.ClientSession(proxy=proxy_url)
def _check_e2ee_deps() -> bool:
"""Return True if mautrix E2EE dependencies (python-olm) are available."""
@@ -260,6 +352,9 @@ class MatrixAdapter(BasePlatformAdapter):
"1",
"yes",
)
self._dm_auto_thread: bool = os.getenv(
"MATRIX_DM_AUTO_THREAD", "false"
).lower() in ("true", "1", "yes")
self._dm_mention_threads: bool = os.getenv(
"MATRIX_DM_MENTION_THREADS", "false"
).lower() in ("true", "1", "yes")
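The two flag conventions used in this constructor (opt-in flags that default to false, and opt-out flags that default to true) can be sketched as helpers. `env_opt_in`, `env_opt_out`, and `DEMO_FLAG` are hypothetical names for illustration; the adapter itself inlines these checks.

```python
import os


def env_opt_in(name: str) -> bool:
    """Flag is off unless explicitly set to a truthy value."""
    return os.getenv(name, "false").lower() in ("true", "1", "yes")


def env_opt_out(name: str) -> bool:
    """Flag is on unless explicitly set to a falsy value."""
    return os.getenv(name, "true").lower() not in ("false", "0", "no")


os.environ.pop("DEMO_FLAG", None)
unset_state = (env_opt_in("DEMO_FLAG"), env_opt_out("DEMO_FLAG"))

os.environ["DEMO_FLAG"] = "Yes"
yes_state = (env_opt_in("DEMO_FLAG"), env_opt_out("DEMO_FLAG"))

os.environ["DEMO_FLAG"] = "0"
zero_state = (env_opt_in("DEMO_FLAG"), env_opt_out("DEMO_FLAG"))
os.environ.pop("DEMO_FLAG", None)
```

Note that the two helpers disagree on unrecognized values such as `"maybe"`: the opt-in form treats them as off, the opt-out form as on.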
@@ -270,6 +365,11 @@ class MatrixAdapter(BasePlatformAdapter):
).lower() not in ("false", "0", "no")
self._pending_reactions: dict[tuple[str, str], str] = {}
# Proxy support — resolve once at init, reuse for all HTTP traffic.
self._proxy_url: str | None = resolve_proxy_url(platform_env_var="MATRIX_PROXY")
if self._proxy_url:
logger.info("Matrix: proxy configured — %s", self._proxy_url)
# Text batching: merge rapid successive messages (Telegram-style).
# Matrix clients split long messages around 4000 chars.
self._text_batch_delay_seconds = float(
@@ -281,6 +381,18 @@ class MatrixAdapter(BasePlatformAdapter):
self._pending_text_batches: Dict[str, MessageEvent] = {}
self._pending_text_batch_tasks: Dict[str, asyncio.Task] = {}
# Matrix reaction-based dangerous command approvals.
self._approval_reaction_map = {
"✅": "once",
"❎": "deny",
}
self._approval_prompts_by_event: Dict[str, _MatrixApprovalPrompt] = {}
self._approval_prompt_by_session: Dict[str, str] = {}
allowed_users_raw = os.getenv("MATRIX_ALLOWED_USERS", "")
self._allowed_user_ids: Set[str] = {
u.strip() for u in allowed_users_raw.split(",") if u.strip()
}
def _is_duplicate_event(self, event_id) -> bool:
"""Return True if this event was already processed. Tracks the ID otherwise."""
if not event_id:
@@ -326,7 +438,7 @@ class MatrixAdapter(BasePlatformAdapter):
)
return False
except Exception as exc:
logger.error("Matrix: post-upload key verification failed: %s", exc)
logger.error("Matrix: post-upload key verification failed: %s", exc, exc_info=True)
return False
return True
@@ -342,6 +454,7 @@ class MatrixAdapter(BasePlatformAdapter):
logger.error(
"Matrix: cannot verify device keys on server: %s — refusing E2EE",
exc,
exc_info=True,
)
return False
@@ -356,7 +469,7 @@ class MatrixAdapter(BasePlatformAdapter):
try:
await olm.share_keys()
except Exception as exc:
logger.error("Matrix: failed to re-upload device keys: %s", exc)
logger.error("Matrix: failed to re-upload device keys: %s", exc, exc_info=True)
return False
return await self._reverify_keys_after_upload(client, local_ed25519)
@@ -396,6 +509,7 @@ class MatrixAdapter(BasePlatformAdapter):
"Try generating a new access token to get a fresh device.",
client.device_id,
exc,
exc_info=True,
)
return False
return await self._reverify_keys_after_upload(client, local_ed25519)
@@ -420,9 +534,11 @@ class MatrixAdapter(BasePlatformAdapter):
_STORE_DIR.mkdir(parents=True, exist_ok=True)
# Create the HTTP API layer.
client_session = _create_matrix_session(self._proxy_url)
api = HTTPAPI(
base_url=self._homeserver,
token=self._access_token or "",
client_session=client_session,
)
# Create the client.
@@ -465,6 +581,7 @@ class MatrixAdapter(BasePlatformAdapter):
logger.error(
"Matrix: whoami failed — check MATRIX_ACCESS_TOKEN and MATRIX_HOMESERVER: %s",
exc,
exc_info=True,
)
await api.session.close()
return False
@@ -607,6 +724,44 @@ class MatrixAdapter(BasePlatformAdapter):
logger.warning(
"Matrix: recovery key verification failed: %s", exc
)
else:
# No recovery key — bootstrap cross-signing if the bot
# has none yet. Without this, Element shows "Encrypted
# by a device not verified by its owner" on every
# message from this bot, indefinitely. mautrix's
# generate_recovery_key does the full flow: generates
# MSK/SSK/USK, uploads private keys to SSSS, publishes
# public keys to the homeserver, and signs the current
# device with the new SSK. Some homeservers require UIA
# for /keys/device_signing/upload — those will need an
# alternate path; Continuwuity and Synapse-with-shared-
# secret accept the unauthenticated upload.
try:
own_xsign = await olm.get_own_cross_signing_public_keys()
except Exception as exc:
own_xsign = None
logger.warning(
"Matrix: cross-signing key lookup failed: %s", exc
)
if own_xsign is None:
try:
new_recovery_key = await olm.generate_recovery_key()
logger.warning(
"Matrix: bootstrapped cross-signing for %s. "
"SAVE THIS RECOVERY KEY — set "
"MATRIX_RECOVERY_KEY for future restarts so "
"the bot can re-sign its device after key "
"rotation: %s",
client.mxid,
new_recovery_key,
)
except Exception as exc:
logger.warning(
"Matrix: cross-signing bootstrap failed "
"(non-fatal — Element will show 'not "
"verified by its owner'): %s",
exc,
)
client.crypto = olm
logger.info(
@@ -664,6 +819,7 @@ class MatrixAdapter(BasePlatformAdapter):
await asyncio.gather(*tasks)
except Exception as exc:
logger.warning("Matrix: initial sync event dispatch error: %s", exc)
await self._join_pending_invites(sync_data)
else:
logger.warning(
"Matrix: initial sync returned unexpected type %s",
@@ -723,21 +879,32 @@ class MatrixAdapter(BasePlatformAdapter):
if not content:
return SendResult(success=True)
mention_user_id = (metadata or {}).get("mention_user_id")
formatted = self.format_message(content)
chunks = self.truncate_message(formatted, MAX_MESSAGE_LENGTH)
last_event_id = None
for chunk in chunks:
msg_content: Dict[str, Any] = {
"msgtype": "m.text",
"body": chunk,
}
for i, chunk in enumerate(chunks):
msg_content = self._build_text_message_content(chunk)
# Convert markdown to HTML for rich rendering.
html = self._markdown_to_html(chunk)
if html and html != chunk:
# Append @mention pill to the last chunk for push notifications
# in muted rooms (mention-only mode).
if mention_user_id and i == len(chunks) - 1:
mention_html = (
f'<a href="https://matrix.to/#/{mention_user_id}">'
f"{mention_user_id}</a>"
)
msg_content["body"] = chunk + f" @{mention_user_id}"
base_html = msg_content.get("formatted_body", chunk)
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = html
msg_content["formatted_body"] = base_html + " " + mention_html
# m.mentions for MSC3952 push reliability.
existing_mentions = msg_content.get("m.mentions", {}).get("user_ids", [])
if mention_user_id not in existing_mentions:
msg_content["m.mentions"] = {
"user_ids": existing_mentions + [mention_user_id]
}
# Reply-to support.
if reply_to:
@@ -844,25 +1011,21 @@ class MatrixAdapter(BasePlatformAdapter):
"""Edit an existing message (via m.replace)."""
formatted = self.format_message(content)
new_content = self._build_text_message_content(formatted)
msg_content: Dict[str, Any] = {
"msgtype": "m.text",
"body": f"* {formatted}",
"m.new_content": {
"msgtype": "m.text",
"body": formatted,
},
"m.relates_to": {
"rel_type": "m.replace",
"event_id": message_id,
},
"m.new_content": new_content,
}
html = self._markdown_to_html(formatted)
if html and html != formatted:
msg_content["m.new_content"]["format"] = "org.matrix.custom.html"
msg_content["m.new_content"]["formatted_body"] = html
if "m.mentions" in new_content:
msg_content["m.mentions"] = new_content["m.mentions"]
if "formatted_body" in new_content:
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = f"* {html}"
msg_content["formatted_body"] = f'* {new_content["formatted_body"]}'
msg_content["m.relates_to"] = {
"rel_type": "m.replace",
"event_id": message_id,
}
try:
event_id = await self._client.send_message_event(
@@ -895,10 +1058,12 @@ class MatrixAdapter(BasePlatformAdapter):
# Try aiohttp first (always available), fall back to httpx
try:
import aiohttp as _aiohttp
async with _aiohttp.ClientSession(trust_env=True) as http:
_sess_kw, _req_kw = proxy_kwargs_for_aiohttp(self._proxy_url)
async with _aiohttp.ClientSession(**_sess_kw) as http:
async with http.get(
image_url, timeout=_aiohttp.ClientTimeout(total=30)
image_url,
timeout=_aiohttp.ClientTimeout(total=30),
**_req_kw,
) as resp:
resp.raise_for_status()
data = await resp.read()
@@ -908,8 +1073,10 @@ class MatrixAdapter(BasePlatformAdapter):
)
except ImportError:
import httpx
async with httpx.AsyncClient() as http:
_httpx_kw: dict = {}
if self._proxy_url:
_httpx_kw["proxy"] = self._proxy_url
async with httpx.AsyncClient(**_httpx_kw) as http:
resp = await http.get(image_url, follow_redirects=True, timeout=30)
resp.raise_for_status()
data = resp.content
@@ -984,6 +1151,56 @@ class MatrixAdapter(BasePlatformAdapter):
chat_id, video_path, "m.video", caption, reply_to, metadata=metadata
)
async def send_exec_approval(
self,
chat_id: str,
command: str,
session_key: str,
description: str = "dangerous command",
metadata: Optional[dict] = None,
) -> SendResult:
"""Send a reaction-based exec approval prompt for Matrix."""
if not self._client:
return SendResult(success=False, error="Not connected")
cmd_preview = command[:2000] + "..." if len(command) > 2000 else command
text = (
"⚠️ **Dangerous command requires approval**\n"
f"```\n{cmd_preview}\n```\n"
f"Reason: {description}\n\n"
"Reply `/approve` to execute, `/approve session` to approve this pattern for the session, "
"`/approve always` to approve permanently, or `/deny` to cancel.\n\n"
"You can also click the reaction to approve:\n"
"✅ = /approve\n"
"❎ = /deny"
)
result = await self.send(chat_id, text, metadata=metadata)
if not result.success or not result.message_id:
return result
prompt = _MatrixApprovalPrompt(
session_key=session_key,
chat_id=chat_id,
message_id=result.message_id,
)
old_event = self._approval_prompt_by_session.get(session_key)
if old_event:
self._approval_prompts_by_event.pop(old_event, None)
self._approval_prompts_by_event[result.message_id] = prompt
self._approval_prompt_by_session[session_key] = result.message_id
for emoji in ("✅", "❎"):
try:
reaction_result = await self._send_reaction(chat_id, result.message_id, emoji)
# Save the bot's reaction event_id for later cleanup
if reaction_result:
prompt.bot_reaction_events[emoji] = str(reaction_result)
except Exception as exc:
logger.debug("Matrix: failed to add approval reaction %s: %s", emoji, exc)
return result
def format_message(self, content: str) -> str:
"""Pass-through — Matrix supports standard Markdown natively."""
# Strip image markdown; media is uploaded separately.
@@ -1115,9 +1332,15 @@ class MatrixAdapter(BasePlatformAdapter):
next_batch = await client.sync_store.get_next_batch()
while not self._closing:
try:
sync_data = await client.sync(
since=next_batch,
timeout=30000,
# Wrap in asyncio.wait_for to guard against TCP-level hangs
# that the Matrix long-poll timeout cannot catch. Long-poll
# is 30s, so 45s gives 15s slack for network drain.
sync_data = await asyncio.wait_for(
client.sync(
since=next_batch,
timeout=30000,
),
timeout=45.0,
)
# nio returns SyncError objects (not exceptions) for auth
@@ -1153,6 +1376,7 @@ class MatrixAdapter(BasePlatformAdapter):
await asyncio.gather(*tasks)
except Exception as exc:
logger.warning("Matrix: sync event dispatch error: %s", exc)
await self._join_pending_invites(sync_data)
except asyncio.CancelledError:
return
@@ -1239,6 +1463,15 @@ class MatrixAdapter(BasePlatformAdapter):
room_id = str(getattr(event, "room_id", ""))
sender = str(getattr(event, "sender", ""))
# Diagnostic: confirm the callback is firing at all when DEBUG is on.
# Helps users troubleshoot silent inbound issues like #5819, #7914, #12614.
logger.debug(
"Matrix: callback fired — event %s from %s in %s",
getattr(event, "event_id", "?"),
sender,
room_id,
)
# Ignore own messages (case-insensitive; also drops when our own
# user_id hasn't been resolved yet — see _is_self_sender docstring
# and issue #15763).
@@ -1350,6 +1583,12 @@ class MatrixAdapter(BasePlatformAdapter):
in_bot_thread = bool(thread_id and thread_id in self._threads)
if self._require_mention and not is_free_room and not in_bot_thread:
if not is_mentioned:
logger.debug(
"Matrix: ignoring message %s in %s — no @mention "
"(set MATRIX_REQUIRE_MENTION=false to disable)",
event_id,
room_id,
)
return None
# DM mention-thread.
@@ -1362,7 +1601,7 @@ class MatrixAdapter(BasePlatformAdapter):
body = self._strip_mention(body)
# Auto-thread.
if not is_dm and not thread_id and self._auto_thread:
if not thread_id and ((not is_dm and self._auto_thread) or (is_dm and self._dm_auto_thread)):
thread_id = event_id
self._threads.mark(thread_id)
@@ -1604,6 +1843,9 @@ class MatrixAdapter(BasePlatformAdapter):
return
body, is_dm, chat_type, thread_id, display_name, source = ctx
if msgtype == "m.image" and _looks_like_matrix_image_filename(body):
body = ""
allow_http_fallback = bool(http_url) and not is_encrypted_media
media_urls = (
[cached_path]
@@ -1633,13 +1875,35 @@ class MatrixAdapter(BasePlatformAdapter):
"Matrix: invited to %s — joining",
room_id,
)
await self._join_room_by_id(room_id)
async def _join_room_by_id(self, room_id: str) -> bool:
"""Join a room by ID and refresh local caches on success."""
if not room_id:
return False
if room_id in self._joined_rooms:
return True
try:
await self._client.join_room(RoomID(room_id))
self._joined_rooms.add(room_id)
logger.info("Matrix: joined %s", room_id)
await self._refresh_dm_cache()
return True
except Exception as exc:
logger.warning("Matrix: error joining %s: %s", room_id, exc)
return False
async def _join_pending_invites(self, sync_data: Dict[str, Any]) -> None:
"""Join rooms still present in rooms.invite after sync processing."""
rooms = sync_data.get("rooms", {}) if isinstance(sync_data, dict) else {}
invites = rooms.get("invite", {})
if not isinstance(invites, dict):
return
for room_id in invites:
if room_id in self._joined_rooms:
continue
logger.info("Matrix: reconciling pending invite for %s", room_id)
await self._join_room_by_id(str(room_id))
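The reconciliation step reads `rooms.invite` from the raw sync response and skips rooms already joined. A minimal sketch of that selection logic, with the network call left out, is below. `pending_invites` is a hypothetical helper, and the extra `isinstance` guard on `rooms` is an addition of this sketch rather than something the diff does.

```python
from typing import Any, List, Set


def pending_invites(sync_data: Any, joined: Set[str]) -> List[str]:
    """Return invited room IDs from a sync payload that are not yet joined."""
    rooms = sync_data.get("rooms", {}) if isinstance(sync_data, dict) else {}
    if not isinstance(rooms, dict):  # defensive guard added in this sketch
        return []
    invites = rooms.get("invite", {})
    if not isinstance(invites, dict):
        return []
    return [str(room_id) for room_id in invites if room_id not in joined]


sync_payload = {
    "rooms": {
        "invite": {"!a:example.org": {}, "!b:example.org": {}},
        "join": {},
    }
}
to_join = pending_invites(sync_payload, joined={"!a:example.org"})
```

Running this after every sync, as both call sites in the diff do, means an invite that was missed by the event callback is retried on the next poll instead of being lost.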
# ------------------------------------------------------------------
# Reactions (send, receive, processing lifecycle)
@@ -1754,6 +2018,51 @@ class MatrixAdapter(BasePlatformAdapter):
room_id,
)
# Check if this reaction resolves a pending approval prompt.
prompt = self._approval_prompts_by_event.get(reacts_to)
if prompt and not prompt.resolved:
if room_id != prompt.chat_id:
return
if self._allowed_user_ids and sender not in self._allowed_user_ids:
logger.info(
"Matrix: ignoring approval reaction from unauthorized user %s on %s",
sender, reacts_to,
)
return
choice = self._approval_reaction_map.get(key)
if not choice:
return
try:
from tools.approval import resolve_gateway_approval
count = resolve_gateway_approval(prompt.session_key, choice)
if count:
prompt.resolved = True
self._approval_prompts_by_event.pop(reacts_to, None)
self._approval_prompt_by_session.pop(prompt.session_key, None)
logger.info(
"Matrix reaction resolved %d approval(s) for session %s "
"(choice=%s, user=%s)",
count, prompt.session_key, choice, sender,
)
# Redact bot's seed reactions, leaving only the user's
await self._redact_bot_approval_reactions(room_id, prompt)
except Exception as exc:
logger.error("Failed to resolve gateway approval from Matrix reaction: %s", exc)
async def _redact_bot_approval_reactions(
self,
room_id: str,
prompt: "_MatrixApprovalPrompt",
) -> None:
"""Redact the bot's seed ✅/❎ reactions, leaving only the user's reaction."""
for emoji, evt_id in prompt.bot_reaction_events.items():
try:
await self.redact_message(room_id, evt_id, "approval resolved")
logger.debug("Matrix: redacted bot reaction %s (%s)", emoji, evt_id)
except Exception as exc:
logger.debug("Matrix: failed to redact bot reaction %s: %s", emoji, exc)
# ------------------------------------------------------------------
# Text message aggregation (handles Matrix client-side splits)
# ------------------------------------------------------------------
@@ -1979,11 +2288,7 @@ class MatrixAdapter(BasePlatformAdapter):
if not self._client or not text:
return SendResult(success=False, error="No client or empty text")
msg_content: Dict[str, Any] = {"msgtype": msgtype, "body": text}
html = self._markdown_to_html(text)
if html and html != text:
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = html
msg_content = self._build_text_message_content(text, msgtype=msgtype)
try:
event_id = await self._client.send_message_event(
@@ -2046,6 +2351,77 @@ class MatrixAdapter(BasePlatformAdapter):
# Mention detection helpers
# ------------------------------------------------------------------
def _build_text_message_content(self, text: str, msgtype: str = "m.text") -> Dict[str, Any]:
"""Build Matrix text content with HTML and outbound mention metadata."""
msg_content: Dict[str, Any] = {"msgtype": msgtype, "body": text}
mention_user_ids = self._extract_outbound_mentions(text)
if mention_user_ids:
msg_content["m.mentions"] = {"user_ids": mention_user_ids}
html_source = self._inject_outbound_mention_links(text)
html = self._markdown_to_html(html_source)
if html and html != text:
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = html
return msg_content
def _extract_outbound_mentions(self, text: str) -> list[str]:
"""Return unique Matrix user IDs mentioned in outbound text."""
protected, _ = self._protect_outbound_mention_regions(text)
seen: Set[str] = set()
mentions: list[str] = []
for match in _OUTBOUND_MENTION_RE.finditer(protected):
user_id = match.group(1)
if user_id not in seen:
seen.add(user_id)
mentions.append(user_id)
return mentions
def _inject_outbound_mention_links(self, text: str) -> str:
"""Wrap outbound Matrix mentions in markdown links outside code spans."""
if not text:
return text
protected, placeholders = self._protect_outbound_mention_regions(text)
linked = _OUTBOUND_MENTION_RE.sub(
lambda match: f"[{match.group(1)}](https://matrix.to/#/{match.group(1)})",
protected,
)
for idx, original in enumerate(placeholders):
linked = linked.replace(f"\x00MENTION_PROTECTED{idx}\x00", original)
return linked
def _protect_outbound_mention_regions(self, text: str) -> tuple[str, list[str]]:
"""Protect markdown regions where outbound mentions should stay literal."""
placeholders: list[str] = []
def _protect(fragment: str) -> str:
idx = len(placeholders)
placeholders.append(fragment)
return f"\x00MENTION_PROTECTED{idx}\x00"
protected = re.sub(
r"```[\s\S]*?```",
lambda match: _protect(match.group(0)),
text or "",
)
protected = re.sub(
r"`[^`\n]+`",
lambda match: _protect(match.group(0)),
protected,
)
protected = re.sub(
r"\[[^\]]+\]\([^)]+\)",
lambda match: _protect(match.group(0)),
protected,
)
return protected, placeholders
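The three helpers above cooperate as protect, match, restore. The sketch below shows that flow on a sample string. It is an approximation under stated assumptions: `protect`, `extract_mentions`, and `link_mentions` are hypothetical names, and the protected regions are matched with one combined pattern instead of the three sequential passes in the diff.

```python
import re
from typing import List, Tuple

# Same mention pattern as _OUTBOUND_MENTION_RE in the diff.
MENTION_RE = re.compile(r"(?<![\w/])(@[0-9A-Za-z._=/-]+:[0-9A-Za-z.-]+(?::\d+)?)")
# Fenced code, inline code, and existing markdown links (combined for brevity).
PROTECTED_RE = re.compile(r"```[\s\S]*?```|`[^`\n]+`|\[[^\]]+\]\([^)]+\)")


def protect(text: str) -> Tuple[str, List[str]]:
    """Swap protected regions for NUL-delimited placeholders."""
    saved: List[str] = []

    def _stash(match: "re.Match[str]") -> str:
        saved.append(match.group(0))
        return f"\x00P{len(saved) - 1}\x00"

    return PROTECTED_RE.sub(_stash, text), saved


def extract_mentions(text: str) -> List[str]:
    """Unique user IDs outside protected regions, in first-seen order."""
    protected, _ = protect(text)
    return list(dict.fromkeys(MENTION_RE.findall(protected)))


def link_mentions(text: str) -> str:
    """Wrap mentions in matrix.to links, then restore protected regions."""
    protected, saved = protect(text)
    linked = MENTION_RE.sub(
        lambda m: f"[{m.group(1)}](https://matrix.to/#/{m.group(1)})", protected
    )
    for idx, original in enumerate(saved):
        linked = linked.replace(f"\x00P{idx}\x00", original)
    return linked


sample = "cc @alice:example.org, see `@bob:example.org` and @alice:example.org"
mentions = extract_mentions(sample)
linked = link_mentions(sample)
```

The placeholder ends with a NUL byte so that index 1 can never be confused with index 10 during restoration.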
def _is_bot_mentioned(
self,
body: str,
@@ -2080,13 +2456,33 @@ class MatrixAdapter(BasePlatformAdapter):
return False
def _strip_mention(self, body: str) -> str:
"""Strip the bot's full MXID (``@user:server``) from *body*.
"""Remove explicit bot mentions from message body.
The bare localpart is intentionally *not* stripped: it would
mangle file paths like ``/home/hermes/media/file.png``.
Important: only strip explicit mention tokens (``@user:server`` or
``@localpart``). Do NOT strip bare words matching the bot localpart,
otherwise normal phrases like "Hermes Agent" become "Agent".
"""
if not body:
return ""
# Strip explicit full MXID mentions.
if self._user_id:
body = body.replace(self._user_id, "")
# Strip explicit @localpart mentions only (not bare localpart words).
if self._user_id and ":" in self._user_id:
localpart = self._user_id.split(":")[0].lstrip("@")
if localpart:
body = re.sub(
r'(?<![\w])@' + re.escape(localpart) + r'\b',
'',
body,
flags=re.IGNORECASE,
)
# Normalize spacing after mention removal.
body = re.sub(r'[ \t]{2,}', ' ', body)
body = re.sub(r'\s+([,.;:!?])', r'\1', body)
return body.strip()
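The behavior of the rewritten `_strip_mention` is easiest to see on concrete inputs. The function below is a standalone sketch that takes the bot's user ID as a parameter; `strip_mention` and the sample user ID `@hermes:example.org` are assumptions for illustration.

```python
import re


def strip_mention(body: str, user_id: str) -> str:
    """Remove explicit @mentions of ``user_id`` while keeping bare words and paths."""
    if not body:
        return ""
    if user_id:
        body = body.replace(user_id, "")
    if user_id and ":" in user_id:
        localpart = user_id.split(":")[0].lstrip("@")
        if localpart:
            # Match @localpart only when '@' is not preceded by a word character.
            body = re.sub(
                r"(?<![\w])@" + re.escape(localpart) + r"\b",
                "",
                body,
                flags=re.IGNORECASE,
            )
    # Tidy the spacing left behind by removed mentions.
    body = re.sub(r"[ \t]{2,}", " ", body)
    body = re.sub(r"\s+([,.;:!?])", r"\1", body)
    return body.strip()


BOT = "@hermes:example.org"  # hypothetical bot user ID
out_path = strip_mention("hey @hermes please check /home/hermes/file.png", BOT)
out_name = strip_mention("Hermes Agent is great @hermes:example.org", BOT)
out_mail = strip_mention("mail me@hermes.io", BOT)
```

The lookbehind is what keeps `me@hermes.io` intact, and requiring the leading `@` is what keeps "Hermes Agent" and the file path from being mangled.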
async def _get_display_name(self, room_id: str, user_id: str) -> str:
+80 -1
@@ -4800,6 +4800,58 @@ class GatewayRunner:
"compression",
f"{_new_tokens:,}",
)
# If summary generation failed, the
# compressor inserted a static fallback
# placeholder and the dropped turns are
# gone for good. Surface a visible
# warning to the gateway user — agent.log
# alone is invisible on TG/Discord/etc.
_comp = getattr(_hyg_agent, "context_compressor", None)
if _comp is not None and getattr(_comp, "_last_summary_fallback_used", False):
_dropped = getattr(_comp, "_last_summary_dropped_count", 0)
_err = getattr(_comp, "_last_summary_error", None) or "unknown error"
_warn_msg = (
"⚠️ Context compression summary failed "
f"({_err}). {_dropped} historical message(s) "
"were removed and replaced with a placeholder. "
"Earlier context is no longer recoverable. "
"Consider /reset for a clean session, or check "
"your auxiliary.compression model configuration."
)
try:
_adapter = self.adapters.get(source.platform)
if _adapter and source.chat_id:
await _adapter.send(source.chat_id, _warn_msg, metadata=_hyg_meta)
except Exception as _werr:
logger.warning(
"Failed to deliver compression-failure warning to user: %s",
_werr,
)
# Separately: if the user's CONFIGURED aux
# model failed and we recovered by falling
# back to the main model, tell them — a
# misconfigured auxiliary.compression.model
# is something only they can fix, and
# silent recovery would hide it.
elif _comp is not None and getattr(_comp, "_last_aux_model_failure_model", None):
_aux_model = getattr(_comp, "_last_aux_model_failure_model", "")
_aux_err = getattr(_comp, "_last_aux_model_failure_error", None) or "unknown error"
_aux_msg = (
f"ℹ️ Configured compression model `{_aux_model}` "
f"failed ({_aux_err}). Recovered using your main "
"model — context is intact — but you may want to "
"check `auxiliary.compression.model` in config.yaml."
)
try:
_adapter = self.adapters.get(source.platform)
if _adapter and source.chat_id:
await _adapter.send(source.chat_id, _aux_msg, metadata=_hyg_meta)
except Exception as _werr:
logger.warning(
"Failed to deliver aux-model-fallback notice to user: %s",
_werr,
)
finally:
self._cleanup_agent_resources(_hyg_agent)
@@ -7343,6 +7395,17 @@ class GatewayRunner:
approx_tokens,
new_tokens,
)
# Detect summary-generation failure so we can surface a
# visible warning to the user even on the manual /compress
# path (otherwise the failure is silently logged).
_summary_failed = bool(getattr(compressor, "_last_summary_fallback_used", False))
_dropped_count = int(getattr(compressor, "_last_summary_dropped_count", 0) or 0)
_summary_err = getattr(compressor, "_last_summary_error", None)
# Separately: did the user's CONFIGURED aux model fail
# and we recovered via main? Surface that as an info
# note so they can fix their config.
_aux_fail_model = getattr(compressor, "_last_aux_model_failure_model", None)
_aux_fail_err = getattr(compressor, "_last_aux_model_failure_error", None)
finally:
self._cleanup_agent_resources(tmp_agent)
lines = [f"🗜️ {summary['headline']}"]
@@ -7351,6 +7414,20 @@ class GatewayRunner:
lines.append(summary["token_line"])
if summary["note"]:
lines.append(summary["note"])
if _summary_failed:
lines.append(
f"⚠️ Summary generation failed ({_summary_err or 'unknown error'}). "
f"{_dropped_count} historical message(s) were removed and replaced "
"with a placeholder; earlier context is no longer recoverable. "
"Consider checking your auxiliary.compression model configuration."
)
elif _aux_fail_model:
lines.append(
f"ℹ️ Configured compression model `{_aux_fail_model}` failed "
f"({_aux_fail_err or 'unknown error'}). Recovered using your main "
"model — context is intact — but you may want to check "
"`auxiliary.compression.model` in config.yaml."
)
return "\n".join(lines)
except Exception as e:
logger.warning("Manual compress failed: %s", e)
@@ -8483,6 +8560,7 @@ class GatewayRunner:
The enriched message string with vision descriptions prepended.
"""
from tools.vision_tools import vision_analyze_tool
from agent.memory_manager import sanitize_context
analysis_prompt = (
"Describe everything visible in this image in thorough detail. "
@@ -8501,6 +8579,7 @@ class GatewayRunner:
result = json.loads(result_json)
if result.get("success"):
description = result.get("analysis", "")
description = sanitize_context(description)
enriched_parts.append(
f"[The user sent an image~ Here's what I can see:\n{description}]\n"
f"[If you need a closer look, use vision_analyze with "
@@ -9962,7 +10041,7 @@ class GatewayRunner:
# Bridge sync status_callback → async adapter.send for context pressure
_status_adapter = self.adapters.get(source.platform)
_status_chat_id = source.chat_id
_status_thread_metadata = {"thread_id": _progress_thread_id} if _progress_thread_id else None
_status_thread_metadata = {"thread_id": _progress_thread_id, "mention_user_id": source.user_id} if _progress_thread_id else {"mention_user_id": source.user_id}
def _status_callback_sync(event_type: str, message: str) -> None:
if not _status_adapter or not _run_still_current():
+9
@@ -224,6 +224,14 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
api_key_env_vars=("ARCEEAI_API_KEY",),
base_url_env_var="ARCEE_BASE_URL",
),
"gmi": ProviderConfig(
id="gmi",
name="GMI Cloud",
auth_type="api_key",
inference_base_url="https://api.gmi-serving.com/v1",
api_key_env_vars=("GMI_API_KEY",),
base_url_env_var="GMI_BASE_URL",
),
"minimax": ProviderConfig(
id="minimax",
name="MiniMax",
@@ -1120,6 +1128,7 @@ def resolve_provider(
"kimi-cn": "kimi-coding-cn", "moonshot-cn": "kimi-coding-cn",
"step": "stepfun", "stepfun-coding-plan": "stepfun",
"arcee-ai": "arcee", "arceeai": "arcee",
"gmi-cloud": "gmi", "gmicloud": "gmi",
"minimax-china": "minimax-cn", "minimax_cn": "minimax-cn",
"alibaba_coding": "alibaba-coding-plan", "alibaba-coding": "alibaba-coding-plan",
"alibaba_coding_plan": "alibaba-coding-plan",
+68 -8
@@ -56,8 +56,18 @@ _EXTRA_ENV_KEYS = frozenset({
"WHATSAPP_MODE", "WHATSAPP_ENABLED",
"MATTERMOST_HOME_CHANNEL", "MATTERMOST_REPLY_MODE",
"MATRIX_PASSWORD", "MATRIX_ENCRYPTION", "MATRIX_DEVICE_ID", "MATRIX_HOME_ROOM",
"MATRIX_REQUIRE_MENTION", "MATRIX_FREE_RESPONSE_ROOMS", "MATRIX_AUTO_THREAD",
"MATRIX_REQUIRE_MENTION", "MATRIX_FREE_RESPONSE_ROOMS", "MATRIX_AUTO_THREAD", "MATRIX_DM_AUTO_THREAD",
"MATRIX_RECOVERY_KEY",
# Langfuse observability plugin — optional tuning keys + standard SDK vars
"HERMES_LANGFUSE_ENABLED", # backward-compat env var (new: plugins.langfuse.enabled in config.yaml)
"HERMES_LANGFUSE_ENV",
"HERMES_LANGFUSE_RELEASE",
"HERMES_LANGFUSE_SAMPLE_RATE",
"HERMES_LANGFUSE_MAX_CHARS",
"HERMES_LANGFUSE_DEBUG",
"LANGFUSE_PUBLIC_KEY",
"LANGFUSE_SECRET_KEY",
"LANGFUSE_BASE_URL",
})
import yaml
@@ -942,7 +952,7 @@ DEFAULT_CONFIG = {
# Pre-exec security scanning via tirith
"security": {
"allow_private_urls": False, # Allow requests to private/internal IPs (for OpenWrt, proxies, VPNs)
"redact_secrets": True,
"redact_secrets": False,
"tirith_enabled": True,
"tirith_path": "tirith",
"tirith_timeout": 5,
@@ -1254,6 +1264,22 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"GMI_API_KEY": {
"description": "GMI Cloud API key",
"prompt": "GMI Cloud API key",
"url": "https://www.gmicloud.ai/",
"password": True,
"category": "provider",
"advanced": True,
},
"GMI_BASE_URL": {
"description": "GMI Cloud base URL override",
"prompt": "GMI Cloud base URL (leave empty for default)",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
"MINIMAX_API_KEY": {
"description": "MiniMax API key (international)",
"prompt": "MiniMax API key",
@@ -1676,6 +1702,30 @@ OPTIONAL_ENV_VARS = {
"category": "tool",
},
# ── Langfuse observability ──
"HERMES_LANGFUSE_PUBLIC_KEY": {
"description": "Langfuse project public key (pk-lf-...)",
"prompt": "Langfuse public key",
"url": "https://cloud.langfuse.com",
"password": False,
"category": "tool",
},
"HERMES_LANGFUSE_SECRET_KEY": {
"description": "Langfuse project secret key (sk-lf-...)",
"prompt": "Langfuse secret key",
"url": "https://cloud.langfuse.com",
"password": True,
"category": "tool",
},
"HERMES_LANGFUSE_BASE_URL": {
"description": "Langfuse server URL (default: https://cloud.langfuse.com)",
"prompt": "Langfuse server URL (leave empty for cloud.langfuse.com)",
"url": None,
"password": False,
"category": "tool",
"advanced": True,
},
# ── Messaging platforms ──
"TELEGRAM_BOT_TOKEN": {
"description": "Telegram bot token from @BotFather",
@@ -1823,6 +1873,14 @@ OPTIONAL_ENV_VARS = {
"category": "messaging",
"advanced": True,
},
"MATRIX_DM_AUTO_THREAD": {
"description": "Auto-create threads for DM messages in Matrix (default: false)",
"prompt": "Auto-create threads in DMs (true/false)",
"url": None,
"password": False,
"category": "messaging",
"advanced": True,
},
"MATRIX_DEVICE_ID": {
"description": "Stable Matrix device ID for E2EE persistence across restarts (e.g. HERMES_BOT)",
"prompt": "Matrix device ID (stable across restarts)",
@@ -3337,14 +3395,16 @@ def load_config() -> Dict[str, Any]:
_SECURITY_COMMENT = """
# ── Security ──────────────────────────────────────────────────────────
# API keys, tokens, and passwords are redacted from tool output by default.
# Set to false to see full values (useful for debugging auth issues).
# Secret redaction is OFF by default — tool output (terminal stdout,
# read_file results, web content) passes through unmodified. Set
# redact_secrets to true to mask strings that look like API keys, tokens,
# and passwords before they enter the model context and logs.
# tirith pre-exec scanning is enabled by default when the tirith binary
# is available. Configure via security.tirith_* keys or env vars
# (TIRITH_ENABLED, TIRITH_BIN, TIRITH_TIMEOUT, TIRITH_FAIL_OPEN).
#
# security:
# redact_secrets: false
# redact_secrets: true
# tirith_enabled: true
# tirith_path: "tirith"
# tirith_timeout: 5
@@ -3377,11 +3437,11 @@ _FALLBACK_COMMENT = """
_COMMENTED_SECTIONS = """
# ── Security ──────────────────────────────────────────────────────────
# API keys, tokens, and passwords are redacted from tool output by default.
# Set to false to see full values (useful for debugging auth issues).
# Secret redaction is OFF by default. Set to true to mask strings that
# look like API keys, tokens, and passwords in tool output and logs.
#
# security:
# redact_secrets: false
# redact_secrets: true
# ── Fallback Model ────────────────────────────────────────────────────
# Automatic provider failover when primary is unavailable.
+2
@@ -46,6 +46,7 @@ _PROVIDER_ENV_HINTS = (
"Z_AI_API_KEY",
"KIMI_API_KEY",
"KIMI_CN_API_KEY",
"GMI_API_KEY",
"MINIMAX_API_KEY",
"MINIMAX_CN_API_KEY",
"KILOCODE_API_KEY",
@@ -937,6 +938,7 @@ def run_doctor(args):
("StepFun Step Plan", ("STEPFUN_API_KEY",), "https://api.stepfun.ai/step_plan/v1/models", "STEPFUN_BASE_URL", True),
("Kimi / Moonshot (China)", ("KIMI_CN_API_KEY",), "https://api.moonshot.cn/v1/models", None, True),
("Arcee AI", ("ARCEEAI_API_KEY",), "https://api.arcee.ai/api/v1/models", "ARCEE_BASE_URL", True),
("GMI Cloud", ("GMI_API_KEY",), "https://api.gmi-serving.com/v1/models", "GMI_BASE_URL", True),
("DeepSeek", ("DEEPSEEK_API_KEY",), "https://api.deepseek.com/v1/models", "DEEPSEEK_BASE_URL", True),
("Hugging Face", ("HF_TOKEN",), "https://router.huggingface.co/v1/models", "HF_BASE_URL", True),
("NVIDIA NIM", ("NVIDIA_API_KEY",), "https://integrate.api.nvidia.com/v1/models", "NVIDIA_BASE_URL", True),
+58 -3
@@ -829,8 +829,29 @@ def _print_tui_exit_summary(session_id: Optional[str], active_session_file: Opti
)
_NPM_LOCK_RUNTIME_KEYS = frozenset({"ideallyInert"})
def _tui_need_npm_install(root: Path) -> bool:
"""True when @hermes/ink is missing or node_modules is behind package-lock.json (post-pull)."""
"""True when @hermes/ink is missing or node_modules is behind package-lock.json.
Compares ``package-lock.json`` against ``node_modules/.package-lock.json``
(npm's hidden lockfile) by **content**, not mtime: git checkouts and npm
rewrites can bump the root lockfile's timestamp even when installed deps
already match, which used to trigger a spurious "Installing TUI
dependencies" on every launch.
For each entry in the root lock's ``packages`` map:
- missing from hidden lock -> reinstall (unless the entry is marked
``optional`` or ``peer``, which npm may intentionally skip per platform)
- present but with differing fields (excluding npm-written runtime
annotations like ``ideallyInert``) -> reinstall
Extra entries that exist only in the hidden lock are ignored: stale
transitives left over from a removed dependency don't break runtime and
we'd rather not force a reinstall for them. Falls back to mtime
comparison if either lockfile is unparseable.
"""
ink = root / "node_modules" / "@hermes" / "ink" / "package.json"
if not ink.is_file():
return True
@@ -840,7 +861,35 @@ def _tui_need_npm_install(root: Path) -> bool:
marker = root / "node_modules" / ".package-lock.json"
if not marker.is_file():
return True
return lock.stat().st_mtime > marker.stat().st_mtime
# Compare lockfile contents, not mtimes: git checkouts and npm rewrites
# can bump the root lockfile timestamp even when installed deps already
# match. Fall back to mtime when either file is unparseable.
try:
wanted = json.loads(lock.read_text(encoding="utf-8")).get("packages") or {}
installed = json.loads(marker.read_text(encoding="utf-8")).get("packages") or {}
except (OSError, UnicodeDecodeError, json.JSONDecodeError):
return lock.stat().st_mtime > marker.stat().st_mtime
def comparable(pkg: dict) -> dict:
return {k: v for k, v in pkg.items() if k not in _NPM_LOCK_RUNTIME_KEYS}
for name, pkg in wanted.items():
if not name:
continue
if not isinstance(pkg, dict):
continue
if name not in installed:
if pkg.get("optional") or pkg.get("peer"):
continue
return True
if isinstance(installed[name], dict) and comparable(pkg) != comparable(installed[name]):
return True
return False
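The content-based comparison above can be sketched in isolation. This is a minimal illustration that works on already-parsed `packages` maps instead of files; the helper name `needs_install` and the sample entries are assumptions made for the example, not part of the commit.

```python
# Hypothetical stand-alone version of the comparison loop; names are illustrative.
RUNTIME_KEYS = frozenset({"ideallyInert"})  # npm-written annotation, ignored


def comparable(pkg: dict) -> dict:
    """Drop runtime-only keys before comparing two lock entries."""
    return {k: v for k, v in pkg.items() if k not in RUNTIME_KEYS}


def needs_install(wanted: dict, installed: dict) -> bool:
    """True when a root-lock entry is missing from, or differs in, the hidden lock."""
    for name, pkg in wanted.items():
        if not name or not isinstance(pkg, dict):
            continue  # skip the root project entry ("") and malformed values
        if name not in installed:
            if pkg.get("optional") or pkg.get("peer"):
                continue  # npm may skip these per platform
            return True
        other = installed[name]
        if isinstance(other, dict) and comparable(pkg) != comparable(other):
            return True
    return False


wanted = {
    "": {"name": "tui"},
    "node_modules/a": {"version": "1.0.0"},
    "node_modules/fsevents": {"version": "2.3.3", "optional": True},
}
installed = {
    "node_modules/a": {"version": "1.0.0", "ideallyInert": True},
    "node_modules/stale": {"version": "0.1.0"},  # extra entry, ignored
}
print(needs_install(wanted, installed))  # False
```

Because only entries of the root lock are walked, a stale package that exists solely in the hidden lock never forces a reinstall, which matches the docstring's stated intent.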
def _find_bundled_tui(tui_dir: Path) -> Optional[Path]:
@@ -1768,6 +1817,7 @@ def select_provider_and_model(args=None):
"huggingface",
"xiaomi",
"arcee",
"gmi",
"nvidia",
"ollama-cloud",
):
@@ -7782,6 +7832,7 @@ For more help on a command:
"kilocode",
"xiaomi",
"arcee",
"gmi",
"nvidia",
],
default=None,
@@ -9031,7 +9082,11 @@ Examples:
)
plugins_remove.add_argument("name", help="Plugin directory name to remove")
plugins_subparsers.add_parser("list", aliases=["ls"], help="List installed plugins")
plugins_list = plugins_subparsers.add_parser("list", aliases=["ls"], help="List installed plugins")
plugins_list.add_argument(
"--available", action="store_true",
help="Also show official optional plugins that are not yet installed",
)
plugins_enable = plugins_subparsers.add_parser(
"enable", help="Enable a disabled plugin"
+24 -1
@@ -278,6 +278,14 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"trinity-large-preview",
"trinity-mini",
],
"gmi": [
"zai-org/GLM-5.1-FP8",
"deepseek-ai/DeepSeek-V3.2",
"moonshotai/Kimi-K2.5",
"google/gemini-3.1-flash-lite-preview",
"anthropic/claude-sonnet-4.6",
"openai/gpt-5.4",
],
"opencode-zen": [
"kimi-k2.5",
"gpt-5.4-pro",
@@ -709,7 +717,6 @@ class ProviderEntry(NamedTuple):
label: str
tui_desc: str # detailed description for `hermes model` TUI
CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("nous", "Nous Portal", "Nous Portal (Nous Research subscription)"),
ProviderEntry("openrouter", "OpenRouter", "OpenRouter (100+ models, pay-per-use)"),
@@ -735,6 +742,7 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("alibaba", "Alibaba Cloud (DashScope)","Alibaba Cloud / DashScope Coding (Qwen + multi-provider)"),
ProviderEntry("ollama-cloud", "Ollama Cloud", "Ollama Cloud (cloud-hosted open models — ollama.com)"),
ProviderEntry("arcee", "Arcee AI", "Arcee AI (Trinity models — direct API)"),
ProviderEntry("gmi", "GMI Cloud", "GMI Cloud (multi-model direct API)"),
ProviderEntry("kilocode", "Kilo Code", "Kilo Code (Kilo Gateway API)"),
ProviderEntry("opencode-zen", "OpenCode Zen", "OpenCode Zen (35+ curated models, pay-as-you-go)"),
ProviderEntry("opencode-go", "OpenCode Go", "OpenCode Go (open models, $10/month subscription)"),
@@ -769,6 +777,8 @@ _PROVIDER_ALIASES = {
"stepfun-coding-plan": "stepfun",
"arcee-ai": "arcee",
"arceeai": "arcee",
"gmi-cloud": "gmi",
"gmicloud": "gmi",
"minimax-china": "minimax-cn",
"minimax_cn": "minimax-cn",
"claude": "anthropic",
@@ -1849,6 +1859,19 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
return live
except Exception:
pass
if normalized == "gmi":
try:
from hermes_cli.auth import resolve_api_key_provider_credentials
creds = resolve_api_key_provider_credentials("gmi")
api_key = str(creds.get("api_key") or "").strip()
base_url = str(creds.get("base_url") or "").strip()
if api_key and base_url:
live = fetch_api_models(api_key, base_url)
if live:
return live
except Exception:
pass
if normalized == "custom":
base_url = _get_custom_base_url()
if base_url:
+14
@@ -79,6 +79,20 @@ VALID_HOOKS: Set[str] = {
# {"action": "allow"} / None -> normal dispatch
# Kwargs: event: MessageEvent, gateway: GatewayRunner, session_store.
"pre_gateway_dispatch",
# Approval lifecycle hooks. Fired by tools/approval.py when a dangerous
# command needs user approval -- fires BOTH for CLI-interactive prompts
# and for gateway/ACP approvals (Telegram, Discord, Slack, TUI, etc.).
# Observers only: return values are ignored. Plugins cannot veto or
# pre-answer an approval from these hooks (use pre_tool_call to block
# a tool before it reaches approval).
#
# Kwargs for pre_approval_request:
# command: str, description: str, pattern_key: str, pattern_keys: list[str],
# session_key: str, surface: "cli" | "gateway"
# Kwargs for post_approval_response: same as above plus
# choice: "once" | "session" | "always" | "deny" | "timeout"
"pre_approval_request",
"post_approval_response",
}
ENTRY_POINTS_GROUP = "hermes_agent.plugins"
+168 -10
@@ -1,7 +1,13 @@
"""``hermes plugins`` CLI subcommand — install, update, remove, and list plugins.
Plugins are installed from Git repositories into ``~/.hermes/plugins/``.
Supports full URLs and ``owner/repo`` shorthand (resolves to GitHub).
Plugins can be installed from:
- Official optional plugins shipped with the repo: ``official/<category>/<name>``
- Git repositories (full URL or ``owner/repo`` GitHub shorthand)
Official plugins live in ``optional-plugins/`` inside the Hermes repo and are
copied into ``~/.hermes/plugins/`` on install: no git clone needed, no network
required. They are NOT auto-discovered from ``optional-plugins/``; only installed
copies in ``~/.hermes/plugins/`` are loaded by Hermes.
After install, if the plugin ships an ``after-install.md`` file it is
rendered with Rich Markdown. Otherwise a default confirmation is shown.
@@ -95,10 +101,80 @@ def _resolve_git_url(identifier: str) -> str:
raise ValueError(
f"Invalid plugin identifier: '{identifier}'. "
"Use a Git URL or owner/repo shorthand."
"Use 'official/<category>/<name>', a Git URL, or owner/repo shorthand."
)
def _optional_plugins_dir() -> Path:
"""Return the optional-plugins/ directory shipped with the Hermes repo."""
return Path(__file__).resolve().parent.parent / "optional-plugins"
def _resolve_official_plugin(identifier: str) -> Optional[Path]:
"""If *identifier* is 'official/<category>/<name>', return its source path.
Returns ``None`` when the identifier is not in official format or the
plugin directory does not exist.
"""
# Accept 'official/category/name' or just 'category/name' when the
# category/name path exists under optional-plugins/.
parts = identifier.strip("/").split("/")
# Strip leading 'official' prefix if present
if parts and parts[0] == "official":
parts = parts[1:]
if len(parts) < 1:
return None
base = _optional_plugins_dir()
# Try category/name (2 parts) or bare name (1 part)
for nparts in (2, 1):
if len(parts) < nparts:
continue
candidate = base.joinpath(*parts[-nparts:])
try:
resolved = candidate.resolve()
base_resolved = base.resolve()
resolved.relative_to(base_resolved) # traversal guard
except (ValueError, OSError):
continue
if resolved.is_dir() and (
(resolved / "plugin.yaml").exists() or (resolved / "__init__.py").exists()
):
return resolved
return None
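The traversal guard in `_resolve_official_plugin()` relies on `Path.relative_to()` raising `ValueError` when the resolved candidate lies outside the base directory. Below is a minimal sketch of that idea under simplified assumptions: the helper `resolve_under`, the temporary directory layout, and the plugin name `demo-plugin` are hypothetical, and the sketch is stricter than the commit in that it accepts only the two-part `<category>/<name>` form.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_under(base: Path, identifier: str) -> Optional[Path]:
    """Resolve 'official/<category>/<name>' below *base*, refusing escapes."""
    parts = identifier.strip("/").split("/")
    if parts and parts[0] == "official":
        parts = parts[1:]
    if len(parts) != 2:
        return None
    base_resolved = base.resolve()
    candidate = base_resolved.joinpath(*parts).resolve()
    try:
        candidate.relative_to(base_resolved)  # raises ValueError on escape
    except ValueError:
        return None
    if candidate.is_dir() and (candidate / "plugin.yaml").exists():
        return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "optional-plugins"
    plugin = base / "observability" / "demo-plugin"  # hypothetical plugin
    plugin.mkdir(parents=True)
    (plugin / "plugin.yaml").write_text("name: demo_plugin\n", encoding="utf-8")

    found = resolve_under(base, "official/observability/demo-plugin")
    escaped = resolve_under(base, "official/../..")  # points above base
    missing = resolve_under(base, "official/observability/nope")
```

Resolving both paths before comparing them means `..` segments and symlinks are normalised first, so the check is made against the real location on disk.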
def _list_official_plugins() -> list[tuple[str, str]]:
"""Return [(identifier, description), ...] for all official optional plugins."""
base = _optional_plugins_dir()
if not base.is_dir():
return []
results = []
for category_dir in sorted(base.iterdir()):
if not category_dir.is_dir() or category_dir.name.startswith("."):
continue
for plugin_dir in sorted(category_dir.iterdir()):
if not plugin_dir.is_dir() or plugin_dir.name.startswith("."):
continue
manifest_file = plugin_dir / "plugin.yaml"
desc = ""
if manifest_file.exists():
try:
import yaml
data = yaml.safe_load(manifest_file.read_text()) or {}
desc = data.get("description", "")
except Exception:
pass
identifier = f"official/{category_dir.name}/{plugin_dir.name}"
results.append((identifier, desc))
return results
def _repo_name_from_url(url: str) -> str:
"""Extract the repo name from a Git URL for the plugin directory name."""
# Strip trailing .git and slashes
@@ -296,7 +372,61 @@ def cmd_install(
from rich.console import Console
console = Console()
plugins_dir = _plugins_dir()
# ── Official optional plugins (no network, copied from optional-plugins/) ──
official_src = _resolve_official_plugin(identifier)
if official_src is not None:
manifest = _read_manifest(official_src)
plugin_name = manifest.get("name") or official_src.name
target = _sanitize_plugin_name(plugin_name, plugins_dir)
if target.exists():
if not force:
console.print(
f"[red]Error:[/red] Plugin '{plugin_name}' already exists at {target}.\n"
f"Use [bold]--force[/bold] to reinstall, or "
f"[bold]hermes plugins update {plugin_name}[/bold] to update."
)
sys.exit(1)
console.print(f"[dim] Removing existing {plugin_name}...[/dim]")
shutil.rmtree(target)
console.print(f"[dim]Installing {plugin_name} from official optional plugins...[/dim]")
shutil.copytree(str(official_src), str(target))
_copy_example_files(target, console)
_prompt_plugin_env_vars(manifest, console)
_display_after_install(target, identifier)
installed_name = manifest.get("name") or target.name
should_enable = enable
if should_enable is None:
if sys.stdin.isatty() and sys.stdout.isatty():
try:
answer = input(" Enable now? [y/N] ").strip().lower()
should_enable = answer in ("y", "yes")
except (EOFError, KeyboardInterrupt):
should_enable = False
else:
should_enable = False
if should_enable:
enabled = _get_enabled_set()
disabled = _get_disabled_set()
enabled.add(installed_name)
disabled.discard(installed_name)
_save_enabled_set(enabled)
_save_disabled_set(disabled)
console.print(f" [green]✓[/green] Plugin [bold]{installed_name}[/bold] enabled.")
else:
console.print(
f" [dim]Plugin installed but not enabled. "
f"Run [bold]hermes plugins enable {installed_name}[/bold] to activate.[/dim]"
)
return
# ── Git URL / owner/repo install ──────────────────────────────────────────
try:
git_url = _resolve_git_url(identifier)
except ValueError as e:
@@ -310,8 +440,6 @@ def cmd_install(
"Consider using https:// or git@ for production installs."
)
plugins_dir = _plugins_dir()
# Clone into a temp directory first so we can read plugin.yaml for the name
with tempfile.TemporaryDirectory() as tmp:
tmp_target = Path(tmp) / "plugin"
@@ -696,16 +824,21 @@ def _discover_all_plugins() -> list:
return list(seen.values())
def cmd_list() -> None:
"""List all plugins (bundled + user) with enabled/disabled state."""
def cmd_list(available: bool = False) -> None:
"""List all plugins (bundled + user) with enabled/disabled state.
When *available* is True, also show official optional plugins that are
not yet installed.
"""
from rich.console import Console
from rich.table import Table
console = Console()
entries = _discover_all_plugins()
if not entries:
if not entries and not available:
console.print("[dim]No plugins installed.[/dim]")
console.print("[dim]Install with:[/dim] hermes plugins install owner/repo")
console.print("[dim]Install with:[/dim] hermes plugins install official/<category>/<name>")
console.print("[dim]Browse available:[/dim] hermes plugins list --available")
return
enabled = _get_enabled_set()
@@ -734,6 +867,31 @@ def cmd_list() -> None:
console.print("[dim]Enable/disable:[/dim] hermes plugins enable/disable <name>")
console.print("[dim]Plugins are opt-in by default — only 'enabled' plugins load.[/dim]")
if available:
official = _list_official_plugins()
if official:
installed_names = {name for name, *_ in entries}
def _is_installed(ident: str) -> bool:
dirname = ident.rsplit("/", 1)[-1]
# Check both the directory name (langfuse-tracing) and
# common underscore variant (langfuse_tracing) since the
# installed plugin uses the manifest name, not the dir name.
return (dirname in installed_names
or dirname.replace("-", "_") in installed_names)
not_installed = [(ident, desc) for ident, desc in official
if not _is_installed(ident)]
if not_installed:
console.print()
avail_table = Table(title="Official optional plugins (not installed)", show_lines=False)
avail_table.add_column("Identifier", style="bold")
avail_table.add_column("Description")
for ident, desc in not_installed:
avail_table.add_row(ident, desc)
console.print(avail_table)
console.print("[dim]Install:[/dim] hermes plugins install official/<category>/<name>")
else:
console.print("[dim]All official optional plugins are already installed.[/dim]")
# ---------------------------------------------------------------------------
# Provider plugin discovery helpers
@@ -1270,7 +1428,7 @@ def plugins_command(args) -> None:
elif action == "disable":
cmd_disable(args.name)
elif action in ("list", "ls"):
cmd_list()
cmd_list(available=getattr(args, "available", False))
elif action is None:
cmd_toggle()
else:
+11
@@ -163,6 +163,12 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
base_url_override="https://api.arcee.ai/api/v1",
base_url_env_var="ARCEE_BASE_URL",
),
"gmi": HermesOverlay(
transport="openai_chat",
extra_env_vars=("GMI_API_KEY",),
base_url_override="https://api.gmi-serving.com/v1",
base_url_env_var="GMI_BASE_URL",
),
"ollama-cloud": HermesOverlay(
transport="openai_chat",
base_url_env_var="OLLAMA_BASE_URL",
@@ -297,6 +303,10 @@ ALIASES: Dict[str, str] = {
"arcee-ai": "arcee",
"arceeai": "arcee",
# gmi
"gmi-cloud": "gmi",
"gmicloud": "gmi",
# Local server aliases → virtual "local" concept (resolved via user config)
"lmstudio": "lmstudio",
"lm-studio": "lmstudio",
@@ -319,6 +329,7 @@ _LABEL_OVERRIDES: Dict[str, str] = {
"copilot-acp": "GitHub Copilot ACP",
"stepfun": "StepFun Step Plan",
"xiaomi": "Xiaomi MiMo",
"gmi": "GMI Cloud",
"local": "Local endpoint",
"bedrock": "AWS Bedrock",
"ollama-cloud": "Ollama Cloud",
+50
@@ -425,6 +425,31 @@ TOOL_CATEGORIES = {
},
],
},
"langfuse": {
"name": "Langfuse Observability",
"icon": "📊",
"providers": [
{
"name": "Langfuse Cloud",
"tag": "Hosted Langfuse (cloud.langfuse.com)",
"env_vars": [
{"key": "HERMES_LANGFUSE_PUBLIC_KEY", "prompt": "Langfuse public key (pk-lf-...)", "url": "https://cloud.langfuse.com"},
{"key": "HERMES_LANGFUSE_SECRET_KEY", "prompt": "Langfuse secret key (sk-lf-...)", "url": "https://cloud.langfuse.com"},
],
"post_setup": "langfuse",
},
{
"name": "Langfuse Self-Hosted",
"tag": "Self-hosted Langfuse instance",
"env_vars": [
{"key": "HERMES_LANGFUSE_PUBLIC_KEY", "prompt": "Langfuse public key (pk-lf-...)"},
{"key": "HERMES_LANGFUSE_SECRET_KEY", "prompt": "Langfuse secret key (sk-lf-...)"},
{"key": "HERMES_LANGFUSE_BASE_URL", "prompt": "Langfuse server URL (e.g. http://localhost:3000)", "default": "http://localhost:3000"},
],
"post_setup": "langfuse",
},
],
},
}
# Simple env-var requirements for toolsets NOT in TOOL_CATEGORIES.
@@ -567,6 +592,31 @@ def _run_post_setup(post_setup_key: str):
_print_info(" git submodule update --init --recursive")
_print_info(' uv pip install -e "./tinker-atropos"')
elif post_setup_key == "langfuse":
# Install the langfuse SDK.
try:
__import__("langfuse")
_print_success(" langfuse SDK already installed")
except ImportError:
import subprocess
_print_info(" Installing langfuse SDK...")
result = subprocess.run(
[sys.executable, "-m", "pip", "install", "langfuse", "--quiet"],
capture_output=True, text=True, timeout=120,
)
if result.returncode == 0:
_print_success(" langfuse SDK installed")
else:
_print_warning(" langfuse SDK install failed — run manually: pip install langfuse")
# Install and enable the official optional plugin into ~/.hermes/plugins/.
try:
from hermes_cli.plugins_cmd import cmd_install as _plugins_install
_plugins_install("official/observability/langfuse-tracing", enable=True)
except SystemExit:
pass # cmd_install prints its own errors and calls sys.exit
_print_info(" Restart Hermes for tracing to take effect.")
_print_info(" Verify: hermes plugins list")
# ─── Platform / Toolset Helpers ───────────────────────────────────────────────
+4 -4
@@ -2212,7 +2212,7 @@ async def get_usage_analytics(days: int = 30):
cutoff = time.time() - (days * 86400)
cur = db._conn.execute("""
SELECT date(started_at, 'unixepoch') as day,
SUM(input_tokens + COALESCE(cache_read_tokens, 0) + COALESCE(cache_write_tokens, 0)) as input_tokens,
SUM(input_tokens) as input_tokens,
SUM(output_tokens) as output_tokens,
SUM(cache_read_tokens) as cache_read_tokens,
SUM(reasoning_tokens) as reasoning_tokens,
@@ -2227,18 +2227,18 @@ async def get_usage_analytics(days: int = 30):
cur2 = db._conn.execute("""
SELECT model,
SUM(input_tokens + COALESCE(cache_read_tokens, 0) + COALESCE(cache_write_tokens, 0)) as input_tokens,
SUM(input_tokens) as input_tokens,
SUM(output_tokens) as output_tokens,
COALESCE(SUM(estimated_cost_usd), 0) as estimated_cost,
COUNT(*) as sessions,
SUM(COALESCE(api_call_count, 0)) as api_calls
FROM sessions WHERE started_at > ? AND model IS NOT NULL
GROUP BY model ORDER BY SUM(input_tokens + COALESCE(cache_read_tokens, 0) + COALESCE(cache_write_tokens, 0)) + SUM(output_tokens) DESC
GROUP BY model ORDER BY SUM(input_tokens) + SUM(output_tokens) DESC
""", (cutoff,))
by_model = [dict(r) for r in cur2.fetchall()]
cur3 = db._conn.execute("""
SELECT SUM(input_tokens + COALESCE(cache_read_tokens, 0) + COALESCE(cache_write_tokens, 0)) as total_input,
SELECT SUM(input_tokens) as total_input,
SUM(output_tokens) as total_output,
SUM(cache_read_tokens) as total_cache_read,
SUM(reasoning_tokens) as total_reasoning,
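The effect of dropping the cache columns from the `input_tokens` aggregate can be seen on a toy table. This sketch assumes a reduced `sessions` schema with made-up rows; it only illustrates the arithmetic difference between the old and new expressions and says nothing about why the change was made.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (input_tokens INTEGER, cache_read_tokens INTEGER, "
    "cache_write_tokens INTEGER)"
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [(100, 40, 10), (200, None, None)],  # made-up sample rows
)

# Old expression: cache reads/writes folded into the input total.
old_total = conn.execute(
    "SELECT SUM(input_tokens + COALESCE(cache_read_tokens, 0) "
    "+ COALESCE(cache_write_tokens, 0)) FROM sessions"
).fetchone()[0]
# New expression: input tokens only; cache reads stay in their own column.
new_total = conn.execute("SELECT SUM(input_tokens) FROM sessions").fetchone()[0]
cache_read = conn.execute("SELECT SUM(cache_read_tokens) FROM sessions").fetchone()[0]
conn.close()

print(old_total, new_total, cache_read)  # 350 300 40
```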
+294 -143
@@ -22,6 +22,8 @@ import sqlite3
import threading
import time
from pathlib import Path
from agent.memory_manager import sanitize_context
from hermes_constants import get_hermes_home
from typing import Any, Callable, Dict, List, Optional, TypeVar
@@ -31,7 +33,7 @@ T = TypeVar("T")
DEFAULT_DB_PATH = get_hermes_home() / "state.db"
SCHEMA_VERSION = 9
SCHEMA_VERSION = 10
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS schema_version (
@@ -119,6 +121,32 @@ CREATE TRIGGER IF NOT EXISTS messages_fts_update AFTER UPDATE ON messages BEGIN
END;
"""
# Trigram FTS5 table for CJK substring search. The default unicode61
# tokenizer splits CJK characters into individual tokens, breaking phrase
matching. The trigram tokenizer creates overlapping 3-character sequences so
# substring queries work natively for any script (CJK, Thai, etc.).
FTS_TRIGRAM_SQL = """
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts_trigram USING fts5(
content,
content=messages,
content_rowid=id,
tokenize='trigram'
);
CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_insert AFTER INSERT ON messages BEGIN
INSERT INTO messages_fts_trigram(rowid, content) VALUES (new.id, new.content);
END;
CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_delete AFTER DELETE ON messages BEGIN
INSERT INTO messages_fts_trigram(messages_fts_trigram, rowid, content) VALUES('delete', old.id, old.content);
END;
CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_update AFTER UPDATE ON messages BEGIN
INSERT INTO messages_fts_trigram(messages_fts_trigram, rowid, content) VALUES('delete', old.id, old.content);
INSERT INTO messages_fts_trigram(rowid, content) VALUES (new.id, new.content);
END;
"""
class SessionDB:
"""
@@ -257,118 +285,156 @@ class SessionDB:
self._conn.close()
self._conn = None
@staticmethod
def _parse_schema_columns(schema_sql: str) -> Dict[str, Dict[str, str]]:
"""Extract expected columns per table from SCHEMA_SQL.
Uses an in-memory SQLite database to parse the SQL: SQLite itself
handles all syntax (DEFAULT expressions with commas, inline
REFERENCES, CHECK constraints, etc.) so there are zero regex
edge cases. The in-memory DB is opened, the schema DDL is
executed, and PRAGMA table_info extracts the column metadata.
Adding a column to SCHEMA_SQL is all that's needed; the
reconciliation loop picks it up automatically.
"""
ref = sqlite3.connect(":memory:")
try:
ref.executescript(schema_sql)
table_columns: Dict[str, Dict[str, str]] = {}
for (tbl,) in ref.execute(
"SELECT name FROM sqlite_master "
"WHERE type='table' AND name NOT LIKE 'sqlite_%'"
).fetchall():
cols: Dict[str, str] = {}
for row in ref.execute(
f'PRAGMA table_info("{tbl}")'
).fetchall():
# row: (cid, name, type, notnull, dflt_value, pk)
col_name = row[1]
col_type = row[2] or ""
notnull = row[3]
default = row[4]
pk = row[5]
# Reconstruct the type expression for ALTER TABLE ADD COLUMN
parts = [col_type] if col_type else []
if notnull and not pk:
parts.append("NOT NULL")
if default is not None:
parts.append(f"DEFAULT {default}")
cols[col_name] = " ".join(parts)
table_columns[tbl] = cols
return table_columns
finally:
ref.close()
def _reconcile_columns(self, cursor: sqlite3.Cursor) -> None:
"""Ensure live tables have every column declared in SCHEMA_SQL.
Follows the Beets/sqlite-utils pattern: the CREATE TABLE definition
in SCHEMA_SQL is the single source of truth for the desired schema.
On every startup this method diffs the live columns (via PRAGMA
table_info) against the declared columns, and ADDs any that are
missing.
This makes column additions a declarative operation: just add
the column to SCHEMA_SQL and it appears on the next startup.
Version-gated migration blocks are no longer needed for ADD COLUMN.
"""
expected = self._parse_schema_columns(SCHEMA_SQL)
for table_name, declared_cols in expected.items():
# Get current columns from the live table
try:
rows = cursor.execute(
f'PRAGMA table_info("{table_name}")'
).fetchall()
except sqlite3.OperationalError:
continue # Table doesn't exist yet (shouldn't happen after executescript)
live_cols = set()
for row in rows:
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk)
name = row[1] if isinstance(row, (tuple, list)) else row["name"]
live_cols.add(name)
for col_name, col_type in declared_cols.items():
if col_name not in live_cols:
safe_name = col_name.replace('"', '""')
try:
cursor.execute(
f'ALTER TABLE "{table_name}" ADD COLUMN "{safe_name}" {col_type}'
)
except sqlite3.OperationalError as exc:
# Expected: "duplicate column name" from a race or
# re-run. Unexpected: "Cannot add a NOT NULL column
# with default value NULL" from a schema mistake.
# Log at DEBUG so it's visible in agent.log.
logger.debug(
"reconcile %s.%s: %s", table_name, col_name, exc,
)
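The two helpers above can be condensed into a small end-to-end sketch of declarative reconciliation. The toy `SCHEMA`, the function names `declared_columns` and `reconcile`, and the "old layout" table are assumptions for illustration; unlike the commit, the sketch does not catch `OperationalError` and assumes the live table already exists.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id TEXT PRIMARY KEY,
    title TEXT,
    api_call_count INTEGER DEFAULT 0
);
"""  # toy schema for illustration only


def declared_columns(schema_sql: str) -> dict:
    """Let SQLite parse the DDL, then read column metadata via PRAGMA table_info."""
    ref = sqlite3.connect(":memory:")
    try:
        ref.executescript(schema_sql)
        out = {}
        tables = ref.execute(
            "SELECT name FROM sqlite_master WHERE type='table'"
        ).fetchall()
        for (table,) in tables:
            cols = {}
            for _cid, name, ctype, notnull, default, pk in ref.execute(
                f'PRAGMA table_info("{table}")'
            ):
                parts = [ctype] if ctype else []
                if notnull and not pk:
                    parts.append("NOT NULL")
                if default is not None:
                    parts.append(f"DEFAULT {default}")
                cols[name] = " ".join(parts)
            out[table] = cols
        return out
    finally:
        ref.close()


def reconcile(conn: sqlite3.Connection, schema_sql: str) -> list:
    """Add every declared column that the live table lacks; return what was added."""
    added = []
    for table, cols in declared_columns(schema_sql).items():
        live = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
        for name, ctype in cols.items():
            if name not in live:
                conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{name}" {ctype}')
                added.append(f"{table}.{name}")
    return added


live_db = sqlite3.connect(":memory:")
live_db.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY)")  # old layout
first = reconcile(live_db, SCHEMA)
second = reconcile(live_db, SCHEMA)  # idempotent: nothing left to add
columns = [row[1] for row in live_db.execute('PRAGMA table_info("sessions")')]
live_db.close()
```

Running the reconciliation twice shows the property the docstrings emphasise: the second pass finds no missing columns, so it is safe to execute on every startup.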
def _init_schema(self):
"""Create tables and FTS if they don't exist, run migrations."""
"""Create tables and FTS if they don't exist, reconcile columns.
Schema management follows the declarative reconciliation pattern
(Beets, sqlite-utils): SCHEMA_SQL is the single source of truth.
On existing databases, _reconcile_columns() diffs live columns
against SCHEMA_SQL and ADDs any missing ones. This eliminates
the version-gated migration chain for column additions, making
it impossible for reordered or inserted migrations to skip columns.
The schema_version table is retained for future data migrations
(transforming existing rows) which cannot be handled declaratively.
"""
cursor = self._conn.cursor()
cursor.executescript(SCHEMA_SQL)
# Check schema version and run migrations
# ── Declarative column reconciliation ──────────────────────────
# Diff live tables against SCHEMA_SQL and ADD any missing columns.
# This is idempotent and self-healing: even if a version-gated
# migration was skipped (e.g. due to version renumbering), the
# column gets created here.
self._reconcile_columns(cursor)
# ── Schema version bookkeeping ─────────────────────────────────
# Bump to current so future data migrations (if any) can gate on
# version. No version-gated column additions remain.
cursor.execute("SELECT version FROM schema_version LIMIT 1")
row = cursor.fetchone()
if row is None:
cursor.execute("INSERT INTO schema_version (version) VALUES (?)", (SCHEMA_VERSION,))
cursor.execute(
"INSERT INTO schema_version (version) VALUES (?)",
(SCHEMA_VERSION,),
)
else:
current_version = row["version"] if isinstance(row, sqlite3.Row) else row[0]
if current_version < 2:
# v2: add finish_reason column to messages
# Data migrations that can't be expressed declaratively (row
# backfills, index changes tied to a specific version step) stay
# in a version-gated chain. Column additions are handled by
# _reconcile_columns() above and no longer need entries here.
if current_version < 10:
# v10: trigram FTS5 table for CJK/substring search. The
# virtual table + triggers are created unconditionally via
# FTS_TRIGRAM_SQL below, but existing rows need a one-time
# backfill into the FTS index.
try:
cursor.execute("ALTER TABLE messages ADD COLUMN finish_reason TEXT")
cursor.execute("SELECT * FROM messages_fts_trigram LIMIT 0")
_fts_trigram_exists = True
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 2")
if current_version < 3:
# v3: add title column to sessions
try:
cursor.execute("ALTER TABLE sessions ADD COLUMN title TEXT")
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 3")
if current_version < 4:
# v4: add unique index on title (NULLs allowed, only non-NULL must be unique)
try:
_fts_trigram_exists = False
if not _fts_trigram_exists:
cursor.executescript(FTS_TRIGRAM_SQL)
cursor.execute(
"CREATE UNIQUE INDEX IF NOT EXISTS idx_sessions_title_unique "
"ON sessions(title) WHERE title IS NOT NULL"
"INSERT INTO messages_fts_trigram(rowid, content) "
"SELECT id, content FROM messages WHERE content IS NOT NULL"
)
except sqlite3.OperationalError:
pass # Index already exists
cursor.execute("UPDATE schema_version SET version = 4")
if current_version < 5:
new_columns = [
("cache_read_tokens", "INTEGER DEFAULT 0"),
("cache_write_tokens", "INTEGER DEFAULT 0"),
("reasoning_tokens", "INTEGER DEFAULT 0"),
("billing_provider", "TEXT"),
("billing_base_url", "TEXT"),
("billing_mode", "TEXT"),
("estimated_cost_usd", "REAL"),
("actual_cost_usd", "REAL"),
("cost_status", "TEXT"),
("cost_source", "TEXT"),
("pricing_version", "TEXT"),
]
for name, column_type in new_columns:
try:
# name and column_type come from the hardcoded tuple above,
# not user input. Double-quote identifier escaping is applied
# as defense-in-depth; SQLite DDL cannot be parameterized.
safe_name = name.replace('"', '""')
cursor.execute(f'ALTER TABLE sessions ADD COLUMN "{safe_name}" {column_type}')
except sqlite3.OperationalError:
pass
cursor.execute("UPDATE schema_version SET version = 5")
if current_version < 6:
# v6: add reasoning columns to messages table — preserves assistant
# reasoning text and structured reasoning_details across gateway
# session turns. Without these, reasoning chains are lost on
# session reload, breaking multi-turn reasoning continuity for
# providers that replay reasoning (OpenRouter, OpenAI, Nous).
for col_name, col_type in [
("reasoning", "TEXT"),
("reasoning_details", "TEXT"),
("codex_reasoning_items", "TEXT"),
]:
try:
safe = col_name.replace('"', '""')
cursor.execute(
f'ALTER TABLE messages ADD COLUMN "{safe}" {col_type}'
)
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 6")
if current_version < 7:
# v7: preserve provider-native reasoning_content separately from
# normalized reasoning text. Kimi/Moonshot replay can require
# this field on assistant tool-call messages when thinking is on.
try:
cursor.execute('ALTER TABLE messages ADD COLUMN "reasoning_content" TEXT')
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 7")
if current_version < 8:
# v8: add api_call_count column to sessions — tracks the number
# of individual LLM API calls made within a session (as opposed
# to the session count itself).
try:
cursor.execute(
'ALTER TABLE sessions ADD COLUMN "api_call_count" INTEGER DEFAULT 0'
)
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 8")
if current_version < 9:
# v9: preserve replayable Codex assistant message ids/phases so
# follow-up turns can rebuild Responses API message items instead
# of flattening everything to plain assistant text.
try:
cursor.execute('ALTER TABLE messages ADD COLUMN "codex_message_items" TEXT')
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 9")
if current_version < SCHEMA_VERSION:
cursor.execute(
"UPDATE schema_version SET version = ?",
(SCHEMA_VERSION,),
)
# Unique title index — always ensure it exists (safe to run after migrations
# since the title column is guaranteed to exist at this point)
# Unique title index — always ensure it exists
try:
cursor.execute(
"CREATE UNIQUE INDEX IF NOT EXISTS idx_sessions_title_unique "
@@ -383,6 +449,12 @@ class SessionDB:
except sqlite3.OperationalError:
cursor.executescript(FTS_SQL)
# Trigram FTS5 for CJK/substring search
try:
cursor.execute("SELECT * FROM messages_fts_trigram LIMIT 0")
except sqlite3.OperationalError:
cursor.executescript(FTS_TRIGRAM_SQL)
self._conn.commit()
# =========================================================================
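Review note: the v5 to v9 migrations above all follow the same pattern (try `ALTER TABLE ... ADD COLUMN`, ignore the failure if the column already exists, then bump `schema_version`). A minimal sketch of that pattern, assuming a table-driven layout that is not in the diff; `MIGRATIONS`, `add_column_if_missing`, and `migrate` are hypothetical names. Unlike the diff, which swallows every `sqlite3.OperationalError`, this sketch re-raises anything other than SQLite's "duplicate column name" error.

```python
import sqlite3

# Hypothetical table-driven form of three of the migration steps in the diff.
MIGRATIONS = {
    6: [("messages", "reasoning", "TEXT"),
        ("messages", "reasoning_details", "TEXT"),
        ("messages", "codex_reasoning_items", "TEXT")],
    7: [("messages", "reasoning_content", "TEXT")],
    8: [("sessions", "api_call_count", "INTEGER DEFAULT 0")],
}


def add_column_if_missing(cursor, table, column, column_type):
    """Add a column; return False if it was already there.

    SQLite cannot bind identifiers as parameters, so identifiers are
    double-quoted with embedded quotes doubled (defense-in-depth).
    """
    safe_table = table.replace('"', '""')
    safe_column = column.replace('"', '""')
    try:
        cursor.execute(
            f'ALTER TABLE "{safe_table}" ADD COLUMN "{safe_column}" {column_type}'
        )
        return True
    except sqlite3.OperationalError as exc:
        # Assumption: only the duplicate-column case is safe to ignore.
        if "duplicate column name" not in str(exc).lower():
            raise
        return False


def migrate(conn, current_version):
    """Apply every migration newer than current_version, in order."""
    cursor = conn.cursor()
    for version in sorted(MIGRATIONS):
        if current_version < version:
            for table, column, column_type in MIGRATIONS[version]:
                add_column_if_missing(cursor, table, column, column_type)
            cursor.execute("UPDATE schema_version SET version = ?", (version,))
    conn.commit()
```

Because each step tolerates an existing column, running `migrate` again against a partially migrated database should be harmless.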
@@ -1155,7 +1227,10 @@ class SessionDB:
messages = []
for row in rows:
msg = {"role": row["role"], "content": row["content"]}
content = row["content"]
if row["role"] in {"user", "assistant"} and isinstance(content, str):
content = sanitize_context(content).strip()
msg = {"role": row["role"], "content": content}
if row["tool_call_id"]:
msg["tool_call_id"] = row["tool_call_id"]
if row["tool_name"]:
@@ -1291,6 +1366,16 @@ class SessionDB:
return sanitized.strip()
@staticmethod
def _is_cjk_codepoint(cp: int) -> bool:
return (0x4E00 <= cp <= 0x9FFF or # CJK Unified Ideographs
0x3400 <= cp <= 0x4DBF or # CJK Extension A
0x20000 <= cp <= 0x2A6DF or # CJK Extension B
0x3000 <= cp <= 0x303F or # CJK Symbols
0x3040 <= cp <= 0x309F or # Hiragana
0x30A0 <= cp <= 0x30FF or # Katakana
0xAC00 <= cp <= 0xD7AF) # Hangul Syllables
@staticmethod
def _contains_cjk(text: str) -> bool:
"""Check if text contains CJK (Chinese, Japanese, Korean) characters."""
@@ -1306,6 +1391,11 @@ class SessionDB:
return True
return False
@classmethod
def _count_cjk(cls, text: str) -> int:
"""Count CJK characters in text."""
return sum(1 for ch in text if cls._is_cjk_codepoint(ord(ch)))
def search_messages(
self,
query: str,
@@ -1376,52 +1466,113 @@ class SessionDB:
LIMIT ? OFFSET ?
"""
with self._lock:
try:
cursor = self._conn.execute(sql, params)
except sqlite3.OperationalError:
# FTS5 query syntax error despite sanitization — return empty
# unless query contains CJK (fall back to LIKE below)
if not self._contains_cjk(query):
return []
matches = []
else:
matches = [dict(row) for row in cursor.fetchall()]
# LIKE fallback for CJK queries: FTS5 default tokenizer splits CJK
# characters individually, causing multi-character queries to fail.
if not matches and self._contains_cjk(query):
# CJK queries bypass the unicode61 FTS5 table. The default tokenizer
# splits CJK characters into individual tokens, so "大别山项目" becomes
# "大 AND 别 AND 山 AND 项 AND 目" — producing false positives and
# missing exact phrase matches.
#
# For queries with 3+ CJK characters, we use the trigram FTS5 table
# (indexed substring matching with ranking and snippets). For shorter
# CJK queries (1-2 chars), trigram can't match (it needs ≥9 UTF-8
# bytes = 3 CJK chars), so we fall back to LIKE.
is_cjk = self._contains_cjk(query)
if is_cjk:
raw_query = query.strip('"').strip()
like_where = ["m.content LIKE ?"]
like_params: list = [f"%{raw_query}%"]
if source_filter is not None:
like_where.append(f"s.source IN ({','.join('?' for _ in source_filter)})")
like_params.extend(source_filter)
if exclude_sources is not None:
like_where.append(f"s.source NOT IN ({','.join('?' for _ in exclude_sources)})")
like_params.extend(exclude_sources)
if role_filter:
like_where.append(f"m.role IN ({','.join('?' for _ in role_filter)})")
like_params.extend(role_filter)
like_sql = f"""
SELECT m.id, m.session_id, m.role,
substr(m.content,
max(1, instr(m.content, ?) - 40),
120) AS snippet,
m.content, m.timestamp, m.tool_name,
s.source, s.model, s.started_at AS session_started
FROM messages m
JOIN sessions s ON s.id = m.session_id
WHERE {' AND '.join(like_where)}
ORDER BY m.timestamp DESC
LIMIT ? OFFSET ?
"""
like_params.extend([limit, offset])
# instr() parameter goes first in the bound list
like_params = [raw_query] + like_params
cjk_count = self._count_cjk(raw_query)
if cjk_count >= 3:
# Trigram FTS5 path — quote each non-operator token to handle
# FTS5 special chars (%, *, etc.) while preserving boolean
# operators (AND, OR, NOT) for multi-term queries.
tokens = raw_query.split()
parts = []
for tok in tokens:
if tok.upper() in ("AND", "OR", "NOT"):
parts.append(tok)
else:
parts.append('"' + tok.replace('"', '""') + '"')
trigram_query = " ".join(parts)
tri_where = ["messages_fts_trigram MATCH ?"]
tri_params: list = [trigram_query]
if source_filter is not None:
tri_where.append(f"s.source IN ({','.join('?' for _ in source_filter)})")
tri_params.extend(source_filter)
if exclude_sources is not None:
tri_where.append(f"s.source NOT IN ({','.join('?' for _ in exclude_sources)})")
tri_params.extend(exclude_sources)
if role_filter:
tri_where.append(f"m.role IN ({','.join('?' for _ in role_filter)})")
tri_params.extend(role_filter)
tri_sql = f"""
SELECT
m.id,
m.session_id,
m.role,
snippet(messages_fts_trigram, 0, '>>>', '<<<', '...', 40) AS snippet,
m.content,
m.timestamp,
m.tool_name,
s.source,
s.model,
s.started_at AS session_started
FROM messages_fts_trigram
JOIN messages m ON m.id = messages_fts_trigram.rowid
JOIN sessions s ON s.id = m.session_id
WHERE {' AND '.join(tri_where)}
ORDER BY rank
LIMIT ? OFFSET ?
"""
tri_params.extend([limit, offset])
with self._lock:
try:
tri_cursor = self._conn.execute(tri_sql, tri_params)
except sqlite3.OperationalError:
matches = []
else:
matches = [dict(row) for row in tri_cursor.fetchall()]
else:
# Short CJK query (1-2 chars) — trigram needs ≥3 CJK chars.
# Fall back to LIKE substring search.
escaped = raw_query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
like_where = ["m.content LIKE ? ESCAPE '\\'"]
like_params: list = [f"%{escaped}%"]
if source_filter is not None:
like_where.append(f"s.source IN ({','.join('?' for _ in source_filter)})")
like_params.extend(source_filter)
if exclude_sources is not None:
like_where.append(f"s.source NOT IN ({','.join('?' for _ in exclude_sources)})")
like_params.extend(exclude_sources)
if role_filter:
like_where.append(f"m.role IN ({','.join('?' for _ in role_filter)})")
like_params.extend(role_filter)
like_sql = f"""
SELECT m.id, m.session_id, m.role,
substr(m.content,
max(1, instr(m.content, ?) - 40),
120) AS snippet,
m.content, m.timestamp, m.tool_name,
s.source, s.model, s.started_at AS session_started
FROM messages m
JOIN sessions s ON s.id = m.session_id
WHERE {' AND '.join(like_where)}
ORDER BY m.timestamp DESC
LIMIT ? OFFSET ?
"""
like_params.extend([limit, offset])
# instr() parameter goes first in the bound list
like_params = [raw_query] + like_params
with self._lock:
like_cursor = self._conn.execute(like_sql, like_params)
matches = [dict(row) for row in like_cursor.fetchall()]
else:
with self._lock:
like_cursor = self._conn.execute(like_sql, like_params)
matches = [dict(row) for row in like_cursor.fetchall()]
try:
cursor = self._conn.execute(sql, params)
except sqlite3.OperationalError:
# FTS5 query syntax error despite sanitization — return empty
return []
else:
matches = [dict(row) for row in cursor.fetchall()]
# Add surrounding context (1 message before + after each match).
# Done outside the lock so we don't hold it across N sequential queries.
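Review note: the new `search_messages` routing can be summarized as a pure function that picks a backend from the CJK character count. The sketch below is an illustration under assumptions, not the PR's code: `plan_search`, `quote_fts5_tokens`, and `escape_like` are hypothetical names, and "no CJK" is taken to mean a CJK count of zero, which may differ from `_contains_cjk` (its body is not shown in this diff).

```python
def is_cjk_codepoint(cp: int) -> bool:
    """Same ranges as _is_cjk_codepoint in the diff."""
    return (0x4E00 <= cp <= 0x9FFF or      # CJK Unified Ideographs
            0x3400 <= cp <= 0x4DBF or      # CJK Extension A
            0x20000 <= cp <= 0x2A6DF or    # CJK Extension B
            0x3000 <= cp <= 0x303F or      # CJK Symbols
            0x3040 <= cp <= 0x309F or      # Hiragana
            0x30A0 <= cp <= 0x30FF or      # Katakana
            0xAC00 <= cp <= 0xD7AF)        # Hangul Syllables


def count_cjk(text: str) -> int:
    return sum(1 for ch in text if is_cjk_codepoint(ord(ch)))


def quote_fts5_tokens(raw_query: str) -> str:
    """Quote every token except the boolean operators AND / OR / NOT."""
    parts = []
    for tok in raw_query.split():
        if tok.upper() in ("AND", "OR", "NOT"):
            parts.append(tok)
        else:
            parts.append('"' + tok.replace('"', '""') + '"')
    return " ".join(parts)


def escape_like(raw_query: str) -> str:
    """Escape LIKE wildcards for use with ESCAPE '\\'."""
    return (raw_query.replace("\\", "\\\\")
                     .replace("%", "\\%")
                     .replace("_", "\\_"))


def plan_search(query: str) -> tuple:
    """Return (backend, bound_parameter) for a search query."""
    raw_query = query.strip('"').strip()
    cjk_count = count_cjk(raw_query)
    if cjk_count == 0:
        return ("fts5", query)                       # default unicode61 table
    if cjk_count >= 3:
        return ("trigram", quote_fts5_tokens(raw_query))
    return ("like", f"%{escape_like(raw_query)}%")   # 1-2 CJK chars
```

One thing the sketch makes visible: the threshold is applied to the whole query, so a query such as `大别 山` (three CJK characters split into tokens of two and one) would take the trigram path even though, as far as I understand the trigram tokenizer, terms shorter than three characters match nothing there. It may be worth checking whether that case should use the LIKE path instead.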
+30 -3
@@ -7,9 +7,7 @@
perSystem = { pkgs, system, lib, ... }:
let
hermes-agent = inputs.self.packages.${system}.default;
hermesVenv = pkgs.callPackage ./python.nix {
inherit (inputs) uv2nix pyproject-nix pyproject-build-systems;
};
hermesVenv = hermes-agent.hermesVenv;
configMergeScript = pkgs.callPackage ./configMergeScript.nix { };
@@ -193,6 +191,35 @@ json.dump(sorted(leaf_paths(DEFAULT_CONFIG)), sys.stdout, indent=2)
echo "ok" > $out/result
'';
# Verify extraPythonPackages PYTHONPATH injection
extra-python-packages = let
testPkg = pkgs.python312Packages.pyfiglet;
hermesWithExtra = hermes-agent.override {
extraPythonPackages = [ testPkg ];
};
in pkgs.runCommand "hermes-extra-python-packages" { } ''
set -e
echo "=== Checking extraPythonPackages PYTHONPATH injection ==="
grep -q "PYTHONPATH" ${hermesWithExtra}/bin/hermes || \
(echo "FAIL: PYTHONPATH not in wrapper"; exit 1)
echo "PASS: PYTHONPATH present in wrapper"
grep -q "${testPkg}" ${hermesWithExtra}/bin/hermes || \
(echo "FAIL: test package path not in PYTHONPATH"; exit 1)
echo "PASS: test package path found in wrapper"
echo "=== Checking base package has no PYTHONPATH ==="
if grep -q "PYTHONPATH" ${hermes-agent}/bin/hermes; then
echo "FAIL: base package should not have PYTHONPATH"; exit 1
fi
echo "PASS: base package clean"
echo "=== All extraPythonPackages checks passed ==="
mkdir -p $out
echo "ok" > $out/result
'';
# ── Config merge + round-trip test ────────────────────────────────
# Tests the merge script (Nix activation behavior) across 7
# scenarios, then verifies Python's load_config() reads correctly.
+186
@@ -0,0 +1,186 @@
# nix/hermes-agent.nix — Overridable Hermes Agent package
#
# callPackage auto-wires nixpkgs args; flake inputs are passed explicitly.
# Users override via: pkgs.hermes-agent.override { extraPythonPackages = [...]; }
{
lib,
stdenv,
makeWrapper,
callPackage,
python312,
nodejs_22,
ripgrep,
git,
openssh,
ffmpeg,
tirith,
# Flake inputs — passed explicitly by packages.nix and overlays.nix
uv2nix,
pyproject-nix,
pyproject-build-systems,
npm-lockfile-fix,
# Overridable parameters
extraPythonPackages ? [ ],
}:
let
hermesVenv = callPackage ./python.nix {
inherit uv2nix pyproject-nix pyproject-build-systems;
};
hermesNpmLib = callPackage ./lib.nix {
inherit npm-lockfile-fix;
};
hermesTui = callPackage ./tui.nix {
inherit hermesNpmLib;
};
hermesWeb = callPackage ./web.nix {
inherit hermesNpmLib;
};
bundledSkills = lib.cleanSourceWith {
src = ../skills;
filter = path: _type: !(lib.hasInfix "/index-cache/" path);
};
runtimeDeps = [
nodejs_22
ripgrep
git
openssh
ffmpeg
tirith
];
runtimePath = lib.makeBinPath runtimeDeps;
sitePackagesPath = python312.sitePackages;
# Walk propagatedBuildInputs to include transitive Python deps in PYTHONPATH.
# Without this, a plugin listing e.g. requests as a dep would fail at runtime
# if requests isn't already in the sealed uv2nix venv.
allExtraPythonPackages = python312.pkgs.requiredPythonModules extraPythonPackages;
pythonPath = lib.makeSearchPath sitePackagesPath allExtraPythonPackages;
pyprojectHash = builtins.hashString "sha256" (builtins.readFile ../pyproject.toml);
uvLockHash =
if builtins.pathExists ../uv.lock then
builtins.hashString "sha256" (builtins.readFile ../uv.lock)
else
"none";
in
stdenv.mkDerivation {
pname = "hermes-agent";
version = (builtins.fromTOML (builtins.readFile ../pyproject.toml)).project.version;
dontUnpack = true;
dontBuild = true;
nativeBuildInputs = [ makeWrapper ];
installPhase = ''
runHook preInstall
mkdir -p $out/share/hermes-agent $out/bin
cp -r ${bundledSkills} $out/share/hermes-agent/skills
cp -r ${hermesWeb} $out/share/hermes-agent/web_dist
mkdir -p $out/ui-tui
cp -r ${hermesTui}/lib/hermes-tui/* $out/ui-tui/
${lib.concatMapStringsSep "\n"
(name: ''
makeWrapper ${hermesVenv}/bin/${name} $out/bin/${name} \
--suffix PATH : "${runtimePath}" \
--set HERMES_BUNDLED_SKILLS $out/share/hermes-agent/skills \
--set HERMES_WEB_DIST $out/share/hermes-agent/web_dist \
--set HERMES_TUI_DIR $out/ui-tui \
--set HERMES_PYTHON ${hermesVenv}/bin/python3 \
--set HERMES_NODE ${nodejs_22}/bin/node \
${lib.optionalString (extraPythonPackages != [ ]) ''--suffix PYTHONPATH : "${pythonPath}"''}
'')
[
"hermes"
"hermes-agent"
"hermes-acp"
]
}
${lib.optionalString (extraPythonPackages != [ ]) ''
echo "=== Checking for plugin/core package collisions ==="
${hermesVenv}/bin/python3 -c "
import pathlib, sys, re
def canonical(name):
return re.sub(r'[-_.]+', '-', name).lower()
# Collect core venv package names
core = set()
venv_sp = pathlib.Path('${hermesVenv}/${sitePackagesPath}')
for di in venv_sp.glob('*.dist-info'):
meta = di / 'METADATA'
if meta.exists():
for line in meta.read_text().splitlines():
if line.startswith('Name:'):
core.add(canonical(line.split(':', 1)[1].strip()))
break
# Check each extra package for collisions
extras_dirs = [${lib.concatMapStringsSep ", " (p: "'${toString p}'") allExtraPythonPackages}]
for edir in extras_dirs:
sp = pathlib.Path(edir) / '${sitePackagesPath}'
if not sp.exists():
continue
for di in sp.glob('*.dist-info'):
meta = di / 'METADATA'
if not meta.exists():
continue
for line in meta.read_text().splitlines():
if line.startswith('Name:'):
pkg = canonical(line.split(':', 1)[1].strip())
if pkg in core:
print(f'ERROR: plugin package \"{pkg}\" collides with a package in hermes sealed venv', file=sys.stderr)
print(f' from: {di}', file=sys.stderr)
print(f' Remove this dependency from extraPythonPackages.', file=sys.stderr)
sys.exit(1)
break
print('No collisions found.')
"
echo "=== No collisions ==="
''}
runHook postInstall
'';
passthru = {
inherit hermesTui hermesWeb hermesNpmLib hermesVenv;
devShellHook = ''
STAMP=".nix-stamps/hermes-agent"
STAMP_VALUE="${pyprojectHash}:${uvLockHash}"
if [ ! -f "$STAMP" ] || [ "$(cat "$STAMP")" != "$STAMP_VALUE" ]; then
echo "hermes-agent: installing Python dependencies..."
uv venv .venv --python ${python312}/bin/python3 2>/dev/null || true
source .venv/bin/activate
uv pip install -e ".[all]"
[ -d mini-swe-agent ] && uv pip install -e ./mini-swe-agent 2>/dev/null || true
[ -d tinker-atropos ] && uv pip install -e ./tinker-atropos 2>/dev/null || true
mkdir -p .nix-stamps
echo "$STAMP_VALUE" > "$STAMP"
else
source .venv/bin/activate
export HERMES_PYTHON=${hermesVenv}/bin/python3
fi
'';
};
meta = with lib; {
description = "AI agent with advanced tool-calling capabilities";
homepage = "https://github.com/NousResearch/hermes-agent";
mainProgram = "hermes";
license = licenses.mit;
platforms = platforms.unix;
};
}
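Review note: the collision check embedded in `installPhase` is easier to reason about outside the Nix string. A standalone sketch of the same idea, assuming the same inputs (site-packages directories containing `*.dist-info/METADATA` files); `dist_names` and `find_collisions` are hypothetical helper names, while `canonical` mirrors the function in the diff.

```python
import pathlib
import re


def canonical(name: str) -> str:
    """Normalize a distribution name: runs of -, _ and . become a single -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def dist_names(site_packages: pathlib.Path) -> set:
    """Collect canonical distribution names from *.dist-info/METADATA."""
    names = set()
    for dist_info in site_packages.glob("*.dist-info"):
        meta = dist_info / "METADATA"
        if not meta.exists():
            continue
        for line in meta.read_text().splitlines():
            if line.startswith("Name:"):
                names.add(canonical(line.split(":", 1)[1].strip()))
                break
    return names


def find_collisions(core_site_packages, extra_site_packages):
    """Return sorted names present in both the core venv and any extra."""
    core = dist_names(pathlib.Path(core_site_packages))
    collisions = set()
    for extra in extra_site_packages:
        extra = pathlib.Path(extra)
        if extra.exists():
            collisions |= dist_names(extra) & core
    return sorted(collisions)
```

Unlike the build-time check, which exits on the first collision, this sketch returns every colliding name so that all of them could be reported in one build failure.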
+81 -6
@@ -28,6 +28,8 @@
let
cfg = config.services.hermes-agent;
effectivePackage = if cfg.extraPythonPackages == [ ] then cfg.package
else cfg.package.override { inherit (cfg) extraPythonPackages; };
hermes-agent = inputs.self.packages.${pkgs.stdenv.hostPlatform.system}.default;
# Deep-merge config type (from 0xrsydn/nix-hermes-agent)
@@ -456,6 +458,52 @@
description = "Extra packages available on PATH.";
};
extraPlugins = mkOption {
type = types.listOf types.package;
default = [ ];
description = ''
Directory-based plugin packages to symlink into the hermes plugins
directory. Each package should contain a plugin.yaml and __init__.py
at its root. Hermes discovers these automatically on startup.
'';
example = literalExpression ''
[
(pkgs.fetchFromGitHub {
owner = "stephenschoettler";
repo = "hermes-lcm";
name = "hermes-lcm";
rev = "v0.7.0";
hash = "sha256-...";
})
]
'';
};
extraPythonPackages = mkOption {
type = types.listOf types.package;
default = [ ];
description = ''
Python packages to add to PYTHONPATH for entry-point plugin discovery.
These are pip-packaged plugins that register via the
hermes_agent.plugins entry-point group. Each package must be built
with the same Python interpreter as hermes (python312).
'';
example = literalExpression ''
[
(pkgs.python312Packages.buildPythonPackage {
pname = "rtk-hermes";
version = "1.0.0";
src = pkgs.fetchFromGitHub {
owner = "ogallotti";
repo = "rtk-hermes";
rev = "main";
hash = "sha256-...";
};
})
]
'';
};
restart = mkOption {
type = types.str;
default = "always";
@@ -570,7 +618,7 @@
# so interactive shells share state (sessions, skills, cron) with the
# gateway service instead of creating a separate ~/.hermes/.
(lib.mkIf cfg.addToSystemPackages {
environment.systemPackages = [ cfg.package ];
environment.systemPackages = [ effectivePackage ];
environment.variables.HERMES_HOME = "${cfg.stateDir}/.hermes";
})
@@ -581,6 +629,16 @@
});
})
# ── Assertions ─────────────────────────────────────────────────────
{
assertions = let
names = map lib.getName cfg.extraPlugins;
in [{
assertion = (lib.length names) == (lib.length (lib.unique names));
message = "services.hermes-agent.extraPlugins: duplicate plugin names detected: ${toString names}. If using fetchFromGitHub, set name = \"plugin-name\" to disambiguate.";
}];
}
# ── Warnings ──────────────────────────────────────────────────────
(lib.mkIf (cfg.container.enable && !cfg.addToSystemPackages && cfg.container.hostUsers != []) {
warnings = [
@@ -602,6 +660,7 @@
"d ${cfg.stateDir}/.hermes/sessions 2770 ${cfg.user} ${cfg.group} - -"
"d ${cfg.stateDir}/.hermes/logs 2770 ${cfg.user} ${cfg.group} - -"
"d ${cfg.stateDir}/.hermes/memories 2770 ${cfg.user} ${cfg.group} - -"
"d ${cfg.stateDir}/.hermes/plugins 2770 ${cfg.user} ${cfg.group} - -"
"d ${cfg.stateDir}/home 0750 ${cfg.user} ${cfg.group} - -"
"d ${cfg.workingDirectory} 2770 ${cfg.user} ${cfg.group} - -"
];
@@ -623,7 +682,7 @@
find ${cfg.stateDir}/.hermes -maxdepth 1 \
\( -name "*.db" -o -name "*.db-wal" -o -name "*.db-shm" -o -name "SOUL.md" \) \
-exec chmod g+rw {} + 2>/dev/null || true
for _subdir in cron sessions logs memories; do
for _subdir in cron sessions logs memories plugins; do
mkdir -p "${cfg.stateDir}/.hermes/$_subdir"
chown ${cfg.user}:${cfg.group} "${cfg.stateDir}/.hermes/$_subdir"
chmod 2770 "${cfg.stateDir}/.hermes/$_subdir"
@@ -732,6 +791,22 @@ HERMES_NIX_ENV_EOF
${lib.concatStringsSep "\n" (lib.mapAttrsToList (name: _value: ''
install -o ${cfg.user} -g ${cfg.group} -m 0640 ${documentDerivation}/${name} ${cfg.workingDirectory}/${name}
'') cfg.documents)}
# ── Declarative plugins ─────────────────────────────────────────
# Remove stale managed symlinks (plugins removed from config)
find ${cfg.stateDir}/.hermes/plugins -maxdepth 1 -type l -name 'nix-managed-*' -delete 2>/dev/null || true
${lib.concatStringsSep "\n" (map (plugin:
let
name = lib.getName plugin;
in ''
if [ ! -f "${plugin}/plugin.yaml" ]; then
echo "ERROR: extraPlugins entry '${plugin}' has no plugin.yaml" >&2
exit 1
fi
ln -sfn ${plugin} ${cfg.stateDir}/.hermes/plugins/nix-managed-${name}
chown -h ${cfg.user}:${cfg.group} ${cfg.stateDir}/.hermes/plugins/nix-managed-${name}
'') cfg.extraPlugins)}
'';
}
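Review note: the declarative-plugins block in the activation script does three things in order: delete every `nix-managed-*` symlink, verify each configured plugin has a `plugin.yaml`, and link it under a `nix-managed-` prefix so hand-installed plugins are left alone. A Python rendering of that logic, purely for reasoning about its behavior; `sync_plugins` is a hypothetical name, and ownership handling (`chown -h`) is omitted.

```python
import pathlib

MANAGED_PREFIX = "nix-managed-"


def sync_plugins(plugins_dir: pathlib.Path, plugins: dict) -> None:
    """Mirror the activation script: prune managed links, then relink.

    `plugins` maps a plugin name to the directory that contains it.
    """
    plugins_dir.mkdir(parents=True, exist_ok=True)

    # Remove stale managed symlinks (plugins dropped from the config).
    for entry in plugins_dir.iterdir():
        if entry.is_symlink() and entry.name.startswith(MANAGED_PREFIX):
            entry.unlink()

    for name, source in plugins.items():
        # Same guard as the shell script: refuse entries without plugin.yaml.
        if not (source / "plugin.yaml").is_file():
            raise FileNotFoundError(
                f"extraPlugins entry '{source}' has no plugin.yaml"
            )
        link = plugins_dir / f"{MANAGED_PREFIX}{name}"
        link.symlink_to(source, target_is_directory=True)
```

As in the shell version, a missing `plugin.yaml` aborts after the stale links have already been removed, so a failed activation can leave the directory with no managed plugins until the configuration is fixed.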
@@ -762,7 +837,7 @@ HERMES_NIX_ENV_EOF
# reads them at Python startup — no systemd EnvironmentFile needed.
ExecStart = lib.concatStringsSep " " ([
"${cfg.package}/bin/hermes"
"${effectivePackage}/bin/hermes"
"gateway"
] ++ cfg.extraArgs);
@@ -785,7 +860,7 @@ HERMES_NIX_ENV_EOF
};
path = [
cfg.package
effectivePackage
pkgs.bash
pkgs.coreutils
pkgs.git
@@ -810,11 +885,11 @@ HERMES_NIX_ENV_EOF
preStart = ''
# Stable symlinks — container references these, not store paths directly
ln -sfn ${cfg.package} ${cfg.stateDir}/current-package
ln -sfn ${effectivePackage} ${cfg.stateDir}/current-package
ln -sfn ${containerEntrypoint} ${cfg.stateDir}/current-entrypoint
# GC roots so nix-collect-garbage doesn't remove store paths in use
${pkgs.nix}/bin/nix-store --add-root ${cfg.stateDir}/.gc-root --indirect -r ${cfg.package} 2>/dev/null || true
${pkgs.nix}/bin/nix-store --add-root ${cfg.stateDir}/.gc-root --indirect -r ${effectivePackage} 2>/dev/null || true
${pkgs.nix}/bin/nix-store --add-root ${cfg.stateDir}/.gc-root-entrypoint --indirect -r ${containerEntrypoint} 2>/dev/null || true
# Check if container needs (re)creation
+10
@@ -0,0 +1,10 @@
# nix/overlays.nix — Expose pkgs.hermes-agent for external NixOS configs
{ inputs, ... }:
{
flake.overlays.default = final: _: {
hermes-agent = final.callPackage ./hermes-agent.nix {
inherit (inputs) uv2nix pyproject-nix pyproject-build-systems;
npm-lockfile-fix = inputs.npm-lockfile-fix.packages.${final.stdenv.hostPlatform.system}.default;
};
};
}
+6 -107
@@ -4,120 +4,19 @@
perSystem =
{ pkgs, inputs', ... }:
let
hermesVenv = pkgs.callPackage ./python.nix {
hermesAgent = pkgs.callPackage ./hermes-agent.nix {
inherit (inputs) uv2nix pyproject-nix pyproject-build-systems;
};
hermesNpmLib = pkgs.callPackage ./lib.nix {
npm-lockfile-fix = inputs'.npm-lockfile-fix.packages.default;
};
hermesTui = pkgs.callPackage ./tui.nix {
inherit hermesNpmLib;
};
# Import bundled skills, excluding runtime caches
bundledSkills = pkgs.lib.cleanSourceWith {
src = ../skills;
filter = path: _type: !(pkgs.lib.hasInfix "/index-cache/" path);
};
hermesWeb = pkgs.callPackage ./web.nix {
inherit hermesNpmLib;
};
runtimeDeps = with pkgs; [
nodejs_22
ripgrep
git
openssh
ffmpeg
tirith
];
runtimePath = pkgs.lib.makeBinPath runtimeDeps;
# Lockfile hashes for dev shell stamps
pyprojectHash = builtins.hashString "sha256" (builtins.readFile ../pyproject.toml);
uvLockHash =
if builtins.pathExists ../uv.lock then
builtins.hashString "sha256" (builtins.readFile ../uv.lock)
else
"none";
in
{
packages = {
default = pkgs.stdenv.mkDerivation {
pname = "hermes-agent";
version = (fromTOML (builtins.readFile ../pyproject.toml)).project.version;
default = hermesAgent;
tui = hermesAgent.hermesTui;
web = hermesAgent.hermesWeb;
dontUnpack = true;
dontBuild = true;
nativeBuildInputs = [ pkgs.makeWrapper ];
installPhase = ''
runHook preInstall
mkdir -p $out/share/hermes-agent $out/bin
cp -r ${bundledSkills} $out/share/hermes-agent/skills
cp -r ${hermesWeb} $out/share/hermes-agent/web_dist
# copy pre-built TUI (same layout as dev: ui-tui/dist/ + node_modules/)
mkdir -p $out/ui-tui
cp -r ${hermesTui}/lib/hermes-tui/* $out/ui-tui/
${pkgs.lib.concatMapStringsSep "\n"
(name: ''
makeWrapper ${hermesVenv}/bin/${name} $out/bin/${name} \
--suffix PATH : "${runtimePath}" \
--set HERMES_BUNDLED_SKILLS $out/share/hermes-agent/skills \
--set HERMES_WEB_DIST $out/share/hermes-agent/web_dist \
--set HERMES_TUI_DIR $out/ui-tui \
--set HERMES_PYTHON ${hermesVenv}/bin/python3 \
--set HERMES_NODE ${pkgs.nodejs_22}/bin/node
'')
[
"hermes"
"hermes-agent"
"hermes-acp"
]
}
runHook postInstall
'';
passthru.devShellHook = ''
STAMP=".nix-stamps/hermes-agent"
STAMP_VALUE="${pyprojectHash}:${uvLockHash}"
if [ ! -f "$STAMP" ] || [ "$(cat "$STAMP")" != "$STAMP_VALUE" ]; then
echo "hermes-agent: installing Python dependencies..."
uv venv .venv --python ${pkgs.python312}/bin/python3 2>/dev/null || true
source .venv/bin/activate
uv pip install -e ".[all]"
[ -d mini-swe-agent ] && uv pip install -e ./mini-swe-agent 2>/dev/null || true
[ -d tinker-atropos ] && uv pip install -e ./tinker-atropos 2>/dev/null || true
mkdir -p .nix-stamps
echo "$STAMP_VALUE" > "$STAMP"
else
source .venv/bin/activate
export HERMES_PYTHON=${hermesVenv}/bin/python3
fi
'';
meta = with pkgs.lib; {
description = "AI agent with advanced tool-calling capabilities";
homepage = "https://github.com/NousResearch/hermes-agent";
mainProgram = "hermes";
license = licenses.mit;
platforms = platforms.unix;
};
};
tui = hermesTui;
web = hermesWeb;
fix-lockfiles = hermesNpmLib.mkFixLockfiles {
packages = [ hermesTui hermesWeb ];
fix-lockfiles = hermesAgent.hermesNpmLib.mkFixLockfiles {
packages = [ hermesAgent.hermesTui hermesAgent.hermesWeb ];
};
};
};
+2 -1
@@ -7,6 +7,7 @@
pyproject-nix,
pyproject-build-systems,
stdenv,
dependency-groups ? [ "all" ],
}:
let
workspace = uv2nix.lib.workspace.loadWorkspace { workspaceRoot = ./..; };
@@ -96,5 +97,5 @@ let
]);
in
pythonSet.mkVirtualEnv "hermes-agent-env" {
hermes-agent = [ "all" ];
hermes-agent = dependency-groups;
}
@@ -0,0 +1,875 @@
"""langfuse — Hermes plugin for Langfuse observability.
Traces Hermes conversations, LLM calls, and tool usage to Langfuse.
Enable via ``hermes tools`` or by setting HERMES_LANGFUSE_ENABLED=true
and the required credentials in ~/.hermes/.env.
Required env vars (set via ``hermes tools`` or ~/.hermes/.env):
HERMES_LANGFUSE_ENABLED - set to "true" to activate tracing
HERMES_LANGFUSE_PUBLIC_KEY - Langfuse project public key (pk-lf-...)
HERMES_LANGFUSE_SECRET_KEY - Langfuse project secret key (sk-lf-...)
HERMES_LANGFUSE_BASE_URL - Langfuse server URL (default: https://cloud.langfuse.com)
Optional env vars:
HERMES_LANGFUSE_ENV - environment tag (e.g. "production", "local")
HERMES_LANGFUSE_RELEASE - release/version tag
HERMES_LANGFUSE_SAMPLE_RATE - sampling rate 0.0-1.0 (default: 1.0)
HERMES_LANGFUSE_MAX_CHARS - max chars per field (default: 12000)
HERMES_LANGFUSE_DEBUG - set to "true" for verbose logging
"""
from __future__ import annotations
import json
import logging
import os
import re
import threading
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Optional
logger = logging.getLogger(__name__)
try:
from langfuse import Langfuse, propagate_attributes
except Exception: # pragma: no cover - fail-open when optional dep is missing
Langfuse = None
propagate_attributes = None
@dataclass
class TraceState:
trace_id: str
root_ctx: Any
root_span: Any
generations: Dict[str, Any] = field(default_factory=dict)
tools: Dict[str, Any] = field(default_factory=dict)
turn_tool_calls: list[dict[str, Any]] = field(default_factory=list)
last_updated_at: float = field(default_factory=time.time)
_STATE_LOCK = threading.Lock()
_TRACE_STATE: Dict[str, TraceState] = {}
_LANGFUSE_CLIENT = None
_READ_FILE_LINE_RE = re.compile(r"^\s*(\d+)\|(.*)$")
_READ_FILE_HEAD_LINES = 25
_READ_FILE_TAIL_LINES = 15
def _env(name: str, default: str = "") -> str:
return os.environ.get(name, default).strip()
def _env_bool(*names: str) -> bool:
for name in names:
value = _env(name).lower()
if value:
return value in {"1", "true", "yes", "on"}
return False
def _debug_enabled() -> bool:
return _env_bool("HERMES_LANGFUSE_DEBUG")
def _debug(message: str) -> None:
if _debug_enabled():
logger.info("Langfuse tracing: %s", message)
def _is_enabled() -> bool:
if Langfuse is None:
return False
# Primary activation path: config.yaml plugins.langfuse.enabled
try:
from hermes_cli.config import load_config
_cfg = load_config()
_plugin_cfg = _cfg.get("plugins", {})
if isinstance(_plugin_cfg, dict):
_lt_cfg = _plugin_cfg.get("langfuse", {})
if isinstance(_lt_cfg, dict) and "enabled" in _lt_cfg:
if not _lt_cfg["enabled"]:
return False
# Explicit enabled=true in config — skip env-var check below
public_key = _env("HERMES_LANGFUSE_PUBLIC_KEY") or _env("LANGFUSE_PUBLIC_KEY")
secret_key = _env("HERMES_LANGFUSE_SECRET_KEY") or _env("LANGFUSE_SECRET_KEY")
return bool(public_key and secret_key)
except Exception:
pass
# Backward-compat path: HERMES_LANGFUSE_ENABLED env var (legacy .env installs)
if not _env_bool("HERMES_LANGFUSE_ENABLED"):
return False
public_key = _env("HERMES_LANGFUSE_PUBLIC_KEY") or _env("LANGFUSE_PUBLIC_KEY")
secret_key = _env("HERMES_LANGFUSE_SECRET_KEY") or _env("LANGFUSE_SECRET_KEY")
return bool(public_key and secret_key)
def _get_langfuse() -> Optional[Langfuse]:
global _LANGFUSE_CLIENT
if not _is_enabled():
return None
if _LANGFUSE_CLIENT is not None:
return _LANGFUSE_CLIENT
public_key = _env("HERMES_LANGFUSE_PUBLIC_KEY") or _env("LANGFUSE_PUBLIC_KEY")
secret_key = _env("HERMES_LANGFUSE_SECRET_KEY") or _env("LANGFUSE_SECRET_KEY")
base_url = _env("HERMES_LANGFUSE_BASE_URL") or _env("LANGFUSE_BASE_URL") or "https://cloud.langfuse.com"
environment = _env("HERMES_LANGFUSE_ENV") or _env("LANGFUSE_ENV")
release = _env("HERMES_LANGFUSE_RELEASE") or _env("LANGFUSE_RELEASE")
sample_rate = _env("HERMES_LANGFUSE_SAMPLE_RATE")
kwargs: Dict[str, Any] = {
"public_key": public_key,
"secret_key": secret_key,
"base_url": base_url,
}
if environment:
kwargs["environment"] = environment
if release:
kwargs["release"] = release
if sample_rate:
try:
kwargs["sample_rate"] = float(sample_rate)
except ValueError:
logger.warning("Invalid HERMES_LANGFUSE_SAMPLE_RATE=%r", sample_rate)
try:
_LANGFUSE_CLIENT = Langfuse(**kwargs)
except Exception as exc: # pragma: no cover - fail-open
logger.warning("Could not initialize Langfuse client: %s", exc)
return None
return _LANGFUSE_CLIENT
def _trace_key(task_id: str, session_id: str) -> str:
if task_id:
return task_id
if session_id:
return f"session:{session_id}"
return f"thread:{threading.get_ident()}"
def _truncate_text(value: str, max_chars: int) -> str:
if len(value) <= max_chars:
return value
return value[:max_chars] + f"... [truncated {len(value) - max_chars} chars]"
def _maybe_parse_json_string(value: str) -> Any:
stripped = value.strip()
if len(stripped) < 2 or stripped[0] not in "{[" or stripped[-1] not in "}]":
if len(stripped) < 2 or stripped[0] not in "{[":
return value
try:
parsed, idx = json.JSONDecoder().raw_decode(stripped)
except Exception:
return value
if not isinstance(parsed, (dict, list)):
return value
trailing = stripped[idx:].strip()
if not trailing:
return parsed
hint_key = "_hint" if trailing.startswith("[Hint:") else "_trailing_text"
if isinstance(parsed, dict):
merged = dict(parsed)
key = hint_key if hint_key not in merged else "_trailing_text"
merged[key] = trailing
return merged
return {"data": parsed, hint_key: trailing}
def _looks_like_read_file_payload(value: Any) -> bool:
if not isinstance(value, dict):
return False
content = value.get("content")
return (
isinstance(content, str)
and "total_lines" in value
and "file_size" in value
and "is_binary" in value
and "is_image" in value
and not value.get("error")
)
def _parse_read_file_lines(content: str) -> list[dict[str, Any]]:
if not isinstance(content, str) or not content:
return []
lines = []
for raw_line in content.splitlines():
match = _READ_FILE_LINE_RE.match(raw_line)
if not match:
return []
lines.append({
"line": int(match.group(1)),
"text": match.group(2),
})
return lines
def _build_read_file_preview(lines: list[dict[str, Any]]) -> dict[str, Any]:
if len(lines) <= (_READ_FILE_HEAD_LINES + _READ_FILE_TAIL_LINES):
return {"lines": lines}
return {
"head": lines[:_READ_FILE_HEAD_LINES],
"tail": lines[-_READ_FILE_TAIL_LINES:],
"omitted_line_count": len(lines) - _READ_FILE_HEAD_LINES - _READ_FILE_TAIL_LINES,
}
def _normalize_read_file_payload(value: dict[str, Any], *, args: Any = None) -> dict[str, Any]:
normalized: dict[str, Any] = {}
if isinstance(args, dict):
path = args.get("path")
offset = args.get("offset")
limit = args.get("limit")
if isinstance(path, str) and path:
normalized["path"] = path
if isinstance(offset, int):
normalized["offset"] = offset
if isinstance(limit, int):
normalized["limit"] = limit
lines = _parse_read_file_lines(value.get("content", ""))
if lines:
normalized["returned_lines"] = {
"start": lines[0]["line"],
"end": lines[-1]["line"],
"count": len(lines),
}
normalized["content_preview"] = _build_read_file_preview(lines)
elif value.get("content"):
normalized["content_preview"] = {
"text": value.get("content", ""),
}
for key in (
"total_lines",
"file_size",
"truncated",
"is_binary",
"is_image",
"hint",
"_warning",
"mime_type",
"dimensions",
"similar_files",
"error",
):
if key in value:
normalized[key] = value[key]
base64_content = value.get("base64_content")
if isinstance(base64_content, str) and base64_content:
normalized["base64_content"] = {
"omitted": True,
"length": len(base64_content),
}
return normalized
def _normalize_payload(value: Any, *, tool_name: str = "", args: Any = None) -> Any:
if _looks_like_read_file_payload(value):
return _normalize_read_file_payload(
value,
args=args if tool_name == "read_file" else None,
)
return value
def _safe_value(value: Any, *, max_chars: Optional[int] = None, depth: int = 0,
parse_json_strings: bool = False) -> Any:
max_chars = max_chars if max_chars is not None else int(_env("HERMES_LANGFUSE_MAX_CHARS", "12000") or "12000")
if depth > 4:
return "<max-depth>"
if value is None or isinstance(value, (int, float, bool)):
return value
if isinstance(value, bytes):
return {"type": "bytes", "len": len(value)}
if isinstance(value, str):
if parse_json_strings:
parsed = _maybe_parse_json_string(value)
if parsed is not value:
return _safe_value(parsed, max_chars=max_chars, depth=depth, parse_json_strings=True)
return _truncate_text(value, max_chars)
if isinstance(value, dict):
normalized = _normalize_payload(value)
if normalized is not value:
return _safe_value(normalized, max_chars=max_chars, depth=depth, parse_json_strings=parse_json_strings)
return {
str(k): _safe_value(v, max_chars=max_chars, depth=depth + 1, parse_json_strings=parse_json_strings)
for k, v in list(value.items())[:50]
}
if isinstance(value, (list, tuple, set)):
return [
_safe_value(v, max_chars=max_chars, depth=depth + 1, parse_json_strings=parse_json_strings)
for v in list(value)[:50]
]
if hasattr(value, "__dict__"):
return _safe_value(vars(value), max_chars=max_chars, depth=depth + 1, parse_json_strings=parse_json_strings)
return _truncate_text(repr(value), max_chars)
def _extract_last_user_message(messages: Any) -> Any:
if not isinstance(messages, list):
return None
for message in reversed(messages):
if isinstance(message, dict) and message.get("role") == "user":
return {
"role": "user",
"content": _safe_value(message.get("content")),
}
return None
def _serialize_messages(messages: Any) -> list[dict[str, Any]]:
if not isinstance(messages, list):
return []
serialized = []
for message in messages[-12:]:
if not isinstance(message, dict):
continue
role = message.get("role")
item = {
"role": role,
"content": _safe_value(
message.get("content"),
parse_json_strings=(role == "tool"),
),
}
if role == "tool" and message.get("tool_call_id"):
item["tool_call_id"] = message.get("tool_call_id")
if message.get("tool_calls"):
item["tool_calls"] = _safe_value(message.get("tool_calls"), parse_json_strings=True)
serialized.append(item)
return serialized
def _serialize_tool_calls(tool_calls: Any) -> list[dict[str, Any]]:
if not tool_calls:
return []
serialized = []
for tool_call in tool_calls:
fn = getattr(tool_call, "function", None)
name = getattr(fn, "name", None) if fn else None
arguments = getattr(fn, "arguments", None) if fn else None
if isinstance(arguments, str):
try:
arguments = json.loads(arguments)
except Exception:
pass
serialized.append({
"id": getattr(tool_call, "id", None),
"name": name,
"arguments": _safe_value(arguments, parse_json_strings=True),
})
return serialized
def _serialize_assistant_message(message: Any) -> dict[str, Any]:
return {
"content": _safe_value(getattr(message, "content", None)),
"reasoning": _safe_value(getattr(message, "reasoning", None)),
"tool_calls": _serialize_tool_calls(getattr(message, "tool_calls", None)),
}
def _usage_and_cost(response: Any, *, provider: str, api_mode: str, model: str, base_url: str) -> tuple[dict[str, int], dict[str, float]]:
usage_details: Dict[str, int] = {}
cost_details: Dict[str, float] = {}
raw_usage = getattr(response, "usage", None)
if not raw_usage:
return usage_details, cost_details
try:
from agent.usage_pricing import estimate_usage_cost, normalize_usage
canonical = normalize_usage(raw_usage, provider=provider, api_mode=api_mode)
# Langfuse usage_details keys follow a naming convention:
# - Dashboard sums all keys containing "input" as input total
# - Dashboard sums all keys containing "output" as output total
# - If no "total" key, Langfuse derives it from all usage types
# Use Anthropic-style key names so cache tokens roll into the
# dashboard input total automatically.
# Ref: https://langfuse.com/docs/model-usage-and-cost
usage_details = {
"input": canonical.input_tokens,
"output": canonical.output_tokens,
}
if canonical.cache_read_tokens:
usage_details["cache_read_input_tokens"] = canonical.cache_read_tokens
if canonical.cache_write_tokens:
usage_details["cache_creation_input_tokens"] = canonical.cache_write_tokens
if canonical.reasoning_tokens:
usage_details["reasoning_tokens"] = canonical.reasoning_tokens
cost = estimate_usage_cost(
model,
canonical,
provider=provider,
base_url=base_url,
api_key="",
)
if cost.amount_usd is not None:
# Langfuse cost_details keys must match usage_details keys.
# Provide per-type breakdown so dashboard can show cost by type.
try:
from agent.usage_pricing import get_pricing_entry
from decimal import Decimal
_ONE_M = Decimal("1000000")
entry = get_pricing_entry(model, provider=provider, base_url=base_url)
if entry:
if entry.input_cost_per_million is not None and canonical.input_tokens:
cost_details["input"] = float(Decimal(canonical.input_tokens) * entry.input_cost_per_million / _ONE_M)
if entry.output_cost_per_million is not None and canonical.output_tokens:
cost_details["output"] = float(Decimal(canonical.output_tokens) * entry.output_cost_per_million / _ONE_M)
if entry.cache_read_cost_per_million is not None and canonical.cache_read_tokens:
cost_details["cache_read_input_tokens"] = float(Decimal(canonical.cache_read_tokens) * entry.cache_read_cost_per_million / _ONE_M)
if entry.cache_write_cost_per_million is not None and canonical.cache_write_tokens:
cost_details["cache_creation_input_tokens"] = float(Decimal(canonical.cache_write_tokens) * entry.cache_write_cost_per_million / _ONE_M)
else:
cost_details["total"] = float(cost.amount_usd)
except Exception:
cost_details["total"] = float(cost.amount_usd)
except Exception as exc: # pragma: no cover - fail-open
_debug(f"usage normalization failed: {exc}")
return usage_details, cost_details
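The per-type cost breakdown in `_usage_and_cost` above comes down to multiplying each token count by a per-million price and keying the result the same way as `usage_details`. A minimal standalone sketch of that arithmetic follows. The `PricingEntry` dataclass and the prices are hypothetical stand-ins, since the real `agent.usage_pricing` types are not shown in this diff:

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Optional

_ONE_M = Decimal("1000000")


@dataclass
class PricingEntry:
    """Hypothetical stand-in for the entry returned by get_pricing_entry()."""
    input_cost_per_million: Optional[Decimal] = None
    output_cost_per_million: Optional[Decimal] = None
    cache_read_cost_per_million: Optional[Decimal] = None


def cost_breakdown(usage: Dict[str, int], entry: PricingEntry) -> Dict[str, float]:
    """Return per-type USD costs keyed the same way as usage_details."""
    prices = {
        "input": entry.input_cost_per_million,
        "output": entry.output_cost_per_million,
        "cache_read_input_tokens": entry.cache_read_cost_per_million,
    }
    costs: Dict[str, float] = {}
    for key, price in prices.items():
        tokens = usage.get(key, 0)
        # Skip types with no price or no tokens, mirroring the guards above.
        if price is not None and tokens:
            costs[key] = float(Decimal(tokens) * price / _ONE_M)
    return costs


# Illustrative prices only, not real provider pricing.
entry = PricingEntry(
    input_cost_per_million=Decimal("3"),
    output_cost_per_million=Decimal("15"),
)
usage = {"input": 2000, "output": 500, "cache_read_input_tokens": 1000}
costs = cost_breakdown(usage, entry)
print(costs)
```

Because no cache-read price is set in this example, the cache-read tokens contribute no cost key, which matches the `is not None` guards in the plugin code.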
def _start_root_trace(task_key: str, *, task_id: str, session_id: str, platform: str, provider: str, model: str,
api_mode: str, messages: Any, client: Langfuse) -> TraceState:
trace_id = client.create_trace_id(seed=f"{session_id or 'sessionless'}::{task_id or task_key}")
trace_input = _extract_last_user_message(messages)
metadata = {
"source": "hermes",
"task_id": task_id,
"platform": platform,
"provider": provider,
"model": model,
"api_mode": api_mode,
}
# session_id must be passed in trace_context for Langfuse session grouping.
trace_ctx: Dict[str, Any] = {"trace_id": trace_id}
if session_id:
trace_ctx["session_id"] = session_id
if propagate_attributes is not None:
try:
with propagate_attributes(
session_id=session_id or task_key,
trace_name="Hermes turn",
tags=["hermes", "langfuse"],
):
root_ctx = client.start_as_current_observation(
trace_context=trace_ctx,
name="Hermes turn",
as_type="chain",
input=trace_input,
metadata=metadata,
end_on_exit=False,
)
root_span = root_ctx.__enter__()
except Exception:
root_ctx = client.start_as_current_observation(
trace_context=trace_ctx,
name="Hermes turn",
as_type="chain",
input=trace_input,
metadata=metadata,
end_on_exit=False,
)
root_span = root_ctx.__enter__()
else:
root_ctx = client.start_as_current_observation(
trace_context=trace_ctx,
name="Hermes turn",
as_type="chain",
input=trace_input,
metadata=metadata,
end_on_exit=False,
)
root_span = root_ctx.__enter__()
try:
root_span.set_trace_io(input=trace_input)
except Exception:
pass
_debug(f"started trace {trace_id} for {task_key}")
return TraceState(trace_id=trace_id, root_ctx=root_ctx, root_span=root_span)
def _start_child_observation(state: TraceState, *, client: Langfuse, name: str, as_type: str,
input_value: Any, metadata: Optional[dict] = None,
model: Optional[str] = None, model_parameters: Optional[dict] = None) -> Any:
return state.root_span.start_observation(
name=name,
as_type=as_type,
input=input_value,
metadata=metadata or {},
model=model,
model_parameters=model_parameters,
)
def _end_observation(observation: Any, *, output: Any = None, metadata: Optional[dict] = None,
usage_details: Optional[dict] = None, cost_details: Optional[dict] = None) -> None:
if observation is None:
return
try:
update_kwargs: Dict[str, Any] = {}
if output is not None:
update_kwargs["output"] = output
if metadata:
update_kwargs["metadata"] = metadata
if usage_details:
update_kwargs["usage_details"] = usage_details
if cost_details:
update_kwargs["cost_details"] = cost_details
if update_kwargs:
observation.update(**update_kwargs)
observation.end()
except Exception as exc: # pragma: no cover - fail-open
_debug(f"end observation failed: {exc}")
def _merge_trace_output(output: Any, state: TraceState) -> Any:
if not state.turn_tool_calls:
return output
merged = dict(output) if isinstance(output, dict) else {"content": output}
merged["tool_calls"] = list(state.turn_tool_calls)
return merged
def _finish_trace(task_key: str, *, output: Any = None) -> None:
client = _get_langfuse()
if client is None:
return
with _STATE_LOCK:
state = _TRACE_STATE.pop(task_key, None)
if state is None:
return
try:
for observation in state.generations.values():
_end_observation(observation)
for observation in state.tools.values():
_end_observation(observation)
final_output = _merge_trace_output(output, state)
if final_output is not None:
state.root_span.set_trace_io(output=final_output)
state.root_span.update(output=final_output)
state.root_span.end()
except Exception as exc: # pragma: no cover - fail-open
_debug(f"finish trace failed: {exc}")
finally:
try:
client.flush()
except Exception:
pass
def _assistant_has_tool_calls(message: Any) -> bool:
return bool(getattr(message, "tool_calls", None))
def _request_key(api_call_count: Any) -> str:
return str(api_call_count or 0)
def on_pre_llm_call(*, task_id: str = "", session_id: str = "", platform: str = "", model: str = "",
provider: str = "", base_url: str = "", api_mode: str = "",
api_call_count: int = 0, messages: Any = None, turn_type: str = "user",
conversation_history: Any = None, user_message: Any = None, **_: Any) -> None:
# Older Hermes branches used pre_llm_call for request-scoped tracing and
# passed the actual API messages. Current Hermes also has a turn-scoped
# pre_llm_call used for context injection; tracing that hook creates an
# extra orphan/root trace before the real request trace. Only trace the
# legacy request-shaped call here.
if not isinstance(messages, list):
return
client = _get_langfuse()
if client is None:
return
# messages is a list only for legacy Hermes branches that fired
# pre_llm_call with API messages directly. Current Hermes fires
# pre_llm_call for context injection (conversation_history/user_message,
# no messages list) — tracing that would create orphan traces.
task_key = _trace_key(task_id, session_id)
with _STATE_LOCK:
state = _TRACE_STATE.get(task_key)
if state is None:
state = _start_root_trace(
task_key,
task_id=task_id,
session_id=session_id,
platform=platform,
provider=provider,
model=model,
api_mode=api_mode,
messages=messages,
client=client,
)
_TRACE_STATE[task_key] = state
state.last_updated_at = time.time()
def on_pre_llm_request(
*,
task_id: str = "",
session_id: str = "",
platform: str = "",
model: str = "",
provider: str = "",
base_url: str = "",
api_mode: str = "",
api_call_count: int = 0,
messages: Any = None,
turn_type: str = "user",
message_count: int = 0,
tool_count: int = 0,
approx_input_tokens: int = 0,
request_char_count: int = 0,
max_tokens: Any = None,
**_: Any,
) -> None:
client = _get_langfuse()
if client is None:
return
task_key = _trace_key(task_id, session_id)
req_key = _request_key(api_call_count)
with _STATE_LOCK:
state = _TRACE_STATE.get(task_key)
if state is None:
state = _start_root_trace(
task_key,
task_id=task_id,
session_id=session_id,
platform=platform,
provider=provider,
model=model,
api_mode=api_mode,
messages=messages,
client=client,
)
_TRACE_STATE[task_key] = state
state.last_updated_at = time.time()
previous = state.generations.pop(req_key, None)
if previous is not None:
_end_observation(previous)
state.generations[req_key] = _start_child_observation(
state,
client=client,
name=f"LLM call {api_call_count}",
as_type="generation",
input_value=_serialize_messages(messages),
metadata={
"provider": provider,
"platform": platform,
"api_mode": api_mode,
"base_url": base_url,
},
model=model,
model_parameters={"api_mode": api_mode, "provider": provider},
)
def on_post_llm_call(*, task_id: str = "", session_id: str = "", provider: str = "", base_url: str = "",
api_mode: str = "", model: str = "", api_call_count: int = 0,
assistant_message: Any = None, response: Any = None,
api_duration: float = 0.0, finish_reason: str = "",
usage: Any = None, assistant_content_chars: int = 0,
assistant_tool_call_count: int = 0, assistant_response: Any = None,
**_: Any) -> None:
client = _get_langfuse()
if client is None:
return
task_key = _trace_key(task_id, session_id)
req_key = _request_key(api_call_count)
with _STATE_LOCK:
state = _TRACE_STATE.get(task_key)
generation = state.generations.pop(req_key, None) if state else None
if state is None or generation is None:
return
# Handle both call patterns:
# 1. post_api_request: passes usage (dict), assistant_content_chars, assistant_tool_call_count
# 2. post_llm_call: passes assistant_message (object), response (object), assistant_response (str)
if assistant_message is not None:
output = _serialize_assistant_message(assistant_message)
elif assistant_response is not None:
# post_llm_call passes assistant_response as a plain string
output = {"content": _safe_value(assistant_response), "reasoning": None, "tool_calls": []}
else:
# post_api_request path — reconstruct from summary kwargs
output = {
"content": f"[{assistant_content_chars} chars]" if assistant_content_chars else None,
"reasoning": None,
"tool_calls": [{"id": f"tc_{i}"} for i in range(assistant_tool_call_count)] if assistant_tool_call_count else [],
}
if output.get("tool_calls"):
state.turn_tool_calls.extend(output["tool_calls"])
# Extract usage: prefer response object, fall back to usage dict from post_api_request
if response is not None:
usage_details, cost_details = _usage_and_cost(
response,
provider=provider,
api_mode=api_mode,
model=model,
base_url=base_url,
)
elif isinstance(usage, dict) and usage:
# post_api_request passes a pre-built CanonicalUsage summary dict.
# Use Langfuse-convention key names: "input", "output", and
# "cache_read_input_tokens" / "cache_creation_input_tokens" so the
# dashboard sums cache tokens into the input total automatically.
_input = usage.get("input_tokens", 0)
_output = usage.get("output_tokens", 0) or usage.get("completion_tokens", 0)
_cache_read = usage.get("cache_read_tokens", 0)
_cache_write = usage.get("cache_write_tokens", 0)
_reasoning = usage.get("reasoning_tokens", 0)
usage_details = {
"input": _input,
"output": _output,
}
if _cache_read:
usage_details["cache_read_input_tokens"] = _cache_read
if _cache_write:
usage_details["cache_creation_input_tokens"] = _cache_write
if _reasoning:
usage_details["reasoning_tokens"] = _reasoning
cost_details = {}
# Estimate per-type cost from the summary if possible
try:
from agent.usage_pricing import CanonicalUsage, estimate_usage_cost, get_pricing_entry
from decimal import Decimal
_ONE_M = Decimal("1000000")
_cu = CanonicalUsage(
input_tokens=_input,
output_tokens=_output,
cache_read_tokens=_cache_read,
cache_write_tokens=_cache_write,
reasoning_tokens=_reasoning,
)
entry = get_pricing_entry(model, provider=provider, base_url=base_url)
if entry:
if entry.input_cost_per_million is not None and _input:
cost_details["input"] = float(Decimal(_input) * entry.input_cost_per_million / _ONE_M)
if entry.output_cost_per_million is not None and _output:
cost_details["output"] = float(Decimal(_output) * entry.output_cost_per_million / _ONE_M)
if entry.cache_read_cost_per_million is not None and _cache_read:
cost_details["cache_read_input_tokens"] = float(Decimal(_cache_read) * entry.cache_read_cost_per_million / _ONE_M)
if entry.cache_write_cost_per_million is not None and _cache_write:
cost_details["cache_creation_input_tokens"] = float(Decimal(_cache_write) * entry.cache_write_cost_per_million / _ONE_M)
else:
_cost = estimate_usage_cost(model, _cu, provider=provider, base_url=base_url, api_key="")
if _cost.amount_usd is not None:
cost_details["total"] = float(_cost.amount_usd)
except Exception:
pass
else:
usage_details, cost_details = {}, {}
tool_count = len(output.get("tool_calls", [])) or assistant_tool_call_count
gen_metadata: Dict[str, Any] = {"tool_call_count": tool_count}
if api_duration and api_duration > 0:
gen_metadata["api_duration_s"] = round(api_duration, 3)
if finish_reason:
gen_metadata["finish_reason"] = finish_reason
_end_observation(
generation,
output=output,
usage_details=usage_details,
cost_details=cost_details,
metadata=gen_metadata,
)
has_tools = _assistant_has_tool_calls(assistant_message) if assistant_message else (assistant_tool_call_count > 0)
has_content = bool(output.get("content"))
if not has_tools and has_content:
_finish_trace(task_key, output=output)
def on_pre_tool_call(*, tool_name: str = "", args: Any = None, task_id: str = "",
session_id: str = "", tool_call_id: str = "", **_: Any) -> None:
client = _get_langfuse()
if client is None:
return
task_key = _trace_key(task_id, session_id)
tool_key = tool_call_id or f"{tool_name}:{time.time_ns()}"
with _STATE_LOCK:
state = _TRACE_STATE.get(task_key)
if state is None:
return
state.tools[tool_key] = _start_child_observation(
state,
client=client,
name=f"Tool: {tool_name}",
as_type="tool",
input_value=_safe_value(args),
metadata={"tool_name": tool_name, "tool_call_id": tool_call_id},
)
def on_post_tool_call(*, tool_name: str = "", args: Any = None, result: Any = None,
task_id: str = "", session_id: str = "", tool_call_id: str = "", **_: Any) -> None:
task_key = _trace_key(task_id, session_id)
tool_key = tool_call_id or ""
observation = None
with _STATE_LOCK:
state = _TRACE_STATE.get(task_key)
if state is None:
return
if tool_key:
observation = state.tools.pop(tool_key, None)
elif state.tools:
_, observation = state.tools.popitem()
if observation is None:
return
if isinstance(result, str):
result_value = _maybe_parse_json_string(result)
else:
result_value = result
result_value = _normalize_payload(result_value, tool_name=tool_name, args=args)
_end_observation(
observation,
output=_safe_value(result_value, parse_json_strings=True),
metadata={"tool_name": tool_name, "args": _safe_value(args, parse_json_strings=True)},
)
def register(ctx) -> None:
# Register for both hook name variants so the plugin works across
# Hermes versions. pre_api_request / post_api_request fire per API
# call (preferred); pre_llm_call / post_llm_call fire once per turn.
ctx.register_hook("pre_api_request", on_pre_llm_request)
ctx.register_hook("post_api_request", on_post_llm_call)
ctx.register_hook("pre_llm_call", on_pre_llm_call)
ctx.register_hook("post_llm_call", on_post_llm_call)
ctx.register_hook("pre_tool_call", on_pre_tool_call)
ctx.register_hook("post_tool_call", on_post_tool_call)
@@ -0,0 +1,38 @@
# After installing langfuse
Langfuse tracing is now installed and enabled for your Hermes profile.
## Required credentials
Set these in `~/.hermes/.env` (or via `hermes tools` → Langfuse Observability):
```bash
HERMES_LANGFUSE_PUBLIC_KEY=pk-lf-...
HERMES_LANGFUSE_SECRET_KEY=sk-lf-...
HERMES_LANGFUSE_BASE_URL=https://cloud.langfuse.com # or your self-hosted URL
```
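Before running the verification step, it can help to confirm that the two required variables are visible to the process. A minimal sketch, assuming the variables are exported in the current environment; the `missing_langfuse_credentials` helper is hypothetical and not part of the plugin, and it does not read `~/.hermes/.env` itself:

```python
import os
from typing import List, Mapping

# The two variables listed under requires_env in the plugin manifest.
REQUIRED_VARS = ("HERMES_LANGFUSE_PUBLIC_KEY", "HERMES_LANGFUSE_SECRET_KEY")


def missing_langfuse_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_langfuse_credentials()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Langfuse credentials are set.")
```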
## Verify
```bash
hermes plugins list # langfuse should appear as enabled
hermes chat -q "hello" # then check Langfuse for a "Hermes turn" trace
```
## Optional settings
```bash
HERMES_LANGFUSE_ENV=production # environment tag
HERMES_LANGFUSE_RELEASE=v1.0.0 # release tag
HERMES_LANGFUSE_SAMPLE_RATE=0.5 # sample 50% of traces
HERMES_LANGFUSE_MAX_CHARS=12000 # max chars per field (default: 12000)
HERMES_LANGFUSE_DEBUG=true # verbose plugin logging
```
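The effect of `HERMES_LANGFUSE_MAX_CHARS` can be sketched as follows. The plugin reads this variable with a default of 12000, but the `read_max_chars` and `truncate` helpers below, including the fallback on invalid input and the truncation marker, are assumptions for illustration rather than the plugin's code:

```python
from typing import Mapping


def read_max_chars(env: Mapping[str, str], default: int = 12000) -> int:
    """Parse HERMES_LANGFUSE_MAX_CHARS, falling back to the default."""
    raw = env.get("HERMES_LANGFUSE_MAX_CHARS", "") or str(default)
    try:
        value = int(raw)
    except ValueError:
        # Assumption: treat an unparseable value as "not configured".
        return default
    return value if value > 0 else default


def truncate(text: str, max_chars: int) -> str:
    """Hypothetical truncation: keep the head and note how much was cut."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + f"... [{len(text) - max_chars} chars truncated]"


limit = read_max_chars({"HERMES_LANGFUSE_MAX_CHARS": "10"})
print(truncate("abcdefghijklmnop", limit))
```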
## Dependencies
The `langfuse` Python SDK is required. Install it into your Hermes venv:
```bash
pip install langfuse
```
@@ -0,0 +1,14 @@
name: langfuse
version: "1.0.0"
description: "Optional Langfuse observability for Hermes — traces conversations, LLM calls, and tool usage. Install via: hermes plugins install official/observability/langfuse-tracing"
author: NousResearch
requires_env:
- HERMES_LANGFUSE_PUBLIC_KEY
- HERMES_LANGFUSE_SECRET_KEY
hooks:
- pre_api_request
- post_api_request
- pre_llm_call
- post_llm_call
- pre_tool_call
- post_tool_call
+81 -6
@@ -22,6 +22,7 @@ import threading
import time
from typing import Any, Dict, List, Optional
from agent.memory_manager import sanitize_context
from agent.memory_provider import MemoryProvider
from tools.registry import tool_error
@@ -37,7 +38,10 @@ PROFILE_SCHEMA = {
"description": (
"Retrieve or update a peer card from Honcho — a curated list of key facts "
"about that peer (name, role, preferences, communication style, patterns). "
"Pass `card` to update; omit `card` to read."
"Pass `card` to update; omit `card` to read. If the card is empty, the "
"result includes a `hint` field explaining why (observation disabled, "
"fresh peer, dialectic layer still warming up, etc.) — this is NOT an "
"error. Peer cards accumulate over time from observed conversation."
),
"parameters": {
"type": "object",
@@ -1056,6 +1060,63 @@ class HonchoMemoryProvider(MemoryProvider):
return chunks
def _empty_profile_hint(self, peer: str) -> Dict[str, Any]:
"""Build a diagnostic hint when honcho_profile returns an empty card.
A literal "No profile facts available yet." tells the model nothing
about WHY. The model then often surfaces it to the user as a cryptic
error. This hint enumerates the likely causes so the model can
explain the situation (or retry with a different peer).
Ordered by likelihood for a typical deployment:
1. Observation is disabled for this peer
2. Card hasn't accumulated yet (fresh peer, not enough dialectic
cycles; the dialectic cadence runs every N turns)
3. Self-hosted Honcho backend doesn't support peer cards
(honcho-ai server < 3.x)
"""
cfg = self._config
reasons: List[str] = []
if cfg is not None:
if peer == "user":
observe_me = bool(getattr(cfg, "user_observe_me", True))
observe_others = bool(getattr(cfg, "user_observe_others", True))
else:
observe_me = bool(getattr(cfg, "ai_observe_me", True))
observe_others = bool(getattr(cfg, "ai_observe_others", True))
if not (observe_me or observe_others):
reasons.append(
f"observation is disabled for peer '{peer}' "
f"(user_observe_me/ai_observe_me in config)"
)
cadence = getattr(self, "_dialectic_cadence", 1)
turn = getattr(self, "_turn_count", 0)
if turn < max(2, cadence):
reasons.append(
f"this session has only {turn} turn(s); peer cards accumulate "
f"as the dialectic layer reasons over conversation history "
f"(cadence every {cadence} turn(s))"
)
if not reasons:
reasons.append(
"peer card has no facts yet — Honcho's dialectic layer builds "
"this over time from observed turns; self-hosted Honcho < 3.x "
"does not support peer cards at all"
)
return {
"result": "No profile facts available yet.",
"hint": (
"This is not an error. "
+ "; ".join(reasons)
+ ". Try honcho_reasoning for a synthesized answer, or "
"honcho_search to query raw conversation excerpts."
),
}
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Record the conversation turn in Honcho (non-blocking).
@@ -1068,13 +1129,15 @@ class HonchoMemoryProvider(MemoryProvider):
return
msg_limit = self._config.message_max_chars if self._config else 25000
clean_user_content = sanitize_context(user_content or "").strip()
clean_assistant_content = sanitize_context(assistant_content or "").strip()
def _sync():
try:
session = self._manager.get_or_create(self._session_key)
for chunk in self._chunk_message(user_content, msg_limit):
for chunk in self._chunk_message(clean_user_content, msg_limit):
session.add_message("user", chunk)
for chunk in self._chunk_message(assistant_content, msg_limit):
for chunk in self._chunk_message(clean_assistant_content, msg_limit):
session.add_message("assistant", chunk)
self._manager._flush_session(session)
except Exception as e:
@@ -1087,8 +1150,20 @@ class HonchoMemoryProvider(MemoryProvider):
)
self._sync_thread.start()
def on_memory_write(self, action: str, target: str, content: str) -> None:
"""Mirror built-in user profile writes as Honcho conclusions."""
def on_memory_write(
self,
action: str,
target: str,
content: str,
metadata: Optional[Dict[str, Any]] = None,
) -> None:
"""Mirror built-in user profile writes as Honcho conclusions.
``metadata`` is accepted for compatibility with the write-origin
work landed in main (commit 6a957a74); it's not yet threaded into
the Honcho conclusion payload. Left as a follow-up so this PR
stays focused on the 7-PR consolidation and its review follow-ups.
"""
if action != "add" or target != "user" or not content:
return
if self._cron_skipped:
@@ -1154,7 +1229,7 @@ class HonchoMemoryProvider(MemoryProvider):
return json.dumps({"result": f"Peer card updated ({len(result)} facts).", "card": result})
card = self._manager.get_peer_card(self._session_key, peer=peer)
if not card:
return json.dumps({"result": "No profile facts available yet."})
return json.dumps(self._empty_profile_hint(peer))
return json.dumps({"result": card})
elif tool_name == "honcho_search":
+31 -2
@@ -273,9 +273,38 @@ def _write_config(cfg: dict, path: Path | None = None) -> None:
def _resolve_api_key(cfg: dict) -> str:
"""Resolve API key with host -> root -> env fallback."""
"""Resolve API key with host -> root -> env fallback.
For self-hosted instances configured with ``baseUrl`` instead of an API
key, returns ``"local"`` so that credential guards throughout the CLI
don't reject a valid configuration. The ``baseUrl`` is scheme-validated
(http/https only) so that a typo like ``baseUrl: true`` can't silently
pass the guard. Schemeless strings that look like host:port (legacy
config shapes, e.g. ``localhost:8000``) still pass; the Honcho SDK
will reject them itself with a clearer error than ours.
"""
host_key = ((cfg.get("hosts") or {}).get(_host_key()) or {}).get("apiKey")
return host_key or cfg.get("apiKey", "") or os.environ.get("HONCHO_API_KEY", "")
key = host_key or cfg.get("apiKey", "") or os.environ.get("HONCHO_API_KEY", "")
if not key:
base_url = cfg.get("baseUrl") or cfg.get("base_url") or os.environ.get("HONCHO_BASE_URL", "")
base_url = (base_url or "").strip()
if base_url:
from urllib.parse import urlparse
try:
parsed = urlparse(base_url)
except (TypeError, ValueError):
parsed = None
if parsed and parsed.scheme in ("http", "https") and parsed.netloc:
return "local"
# Schemeless but looks like a host (contains '.' or ':' and isn't
# a boolean literal): let it through so legacy configs don't
# regress into "no API key configured" when they previously worked.
lowered = base_url.lower()
if lowered not in ("true", "false", "none", "null") and any(
c in base_url for c in ".:"
) and not base_url.isdigit():
return "local"
return key
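The `baseUrl` guard in `_resolve_api_key` can be exercised in isolation. A minimal sketch, assuming a hypothetical `looks_like_base_url` helper that mirrors the two checks above (an http/https URL first, then the schemeless host or host:port fallback); the example hostnames are placeholders:

```python
from urllib.parse import urlparse


def looks_like_base_url(base_url: str) -> bool:
    """Return True when the value would satisfy the credential guard."""
    base_url = (base_url or "").strip()
    if not base_url:
        return False
    try:
        parsed = urlparse(base_url)
    except (TypeError, ValueError):
        parsed = None
    # Preferred shape: explicit http/https scheme plus a host.
    if parsed and parsed.scheme in ("http", "https") and parsed.netloc:
        return True
    # Legacy shape: schemeless host or host:port, but not a stray literal.
    lowered = base_url.lower()
    return (
        lowered not in ("true", "false", "none", "null")
        and any(c in base_url for c in ".:")
        and not base_url.isdigit()
    )


for candidate in ("https://honcho.example.com", "localhost:8000", "true", "8000"):
    print(candidate, looks_like_base_url(candidate))
```

The fallback branch is deliberately permissive: it only filters out obvious non-hosts and leaves stricter validation to the Honcho SDK, as the docstring above describes.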
def _prompt(label: str, default: str | None = None, secret: bool = False) -> str:
+67 -3
@@ -16,6 +16,7 @@ from __future__ import annotations
import json
import os
import logging
import hashlib
from dataclasses import dataclass, field
from pathlib import Path
@@ -27,7 +28,6 @@ if TYPE_CHECKING:
logger = logging.getLogger(__name__)
GLOBAL_CONFIG_PATH = Path.home() / ".honcho" / "config.json"
HOST = "hermes"
@@ -53,6 +53,11 @@ def resolve_active_host() -> str:
return HOST
def resolve_global_config_path() -> Path:
"""Return the shared Honcho config path for the current HOME."""
return Path.home() / ".honcho" / "config.json"
def resolve_config_path() -> Path:
"""Return the active Honcho config path.
@@ -72,7 +77,7 @@ def resolve_config_path() -> Path:
if default_path != local_path and default_path.exists():
return default_path
return GLOBAL_CONFIG_PATH
return resolve_global_config_path()
_RECALL_MODE_ALIASES = {"auto": "hybrid"}
@@ -138,6 +143,15 @@ def _parse_dialectic_depth_levels(host_val, root_val, depth: int) -> list[str] |
return None
# Default HTTP timeout (seconds) applied when no explicit timeout is
# configured via HonchoClientConfig.timeout, honcho.timeout / requestTimeout,
# or HONCHO_TIMEOUT. Honcho calls happen on the post-response path of
# run_conversation; without a cap the agent can block indefinitely when
# the Honcho backend is unreachable, preventing the gateway from
# delivering the already-generated response.
_DEFAULT_HTTP_TIMEOUT = 30.0
def _resolve_optional_float(*values: Any) -> float | None:
"""Return the first non-empty value coerced to a positive float."""
for value in values:
@@ -226,6 +240,13 @@ class HonchoClientConfig:
# Identity
peer_name: str | None = None
ai_peer: str = "hermes"
# When True, ``peer_name`` wins over any gateway-supplied runtime
# identity (Telegram UID, Discord ID, …) when resolving the user peer.
# This keeps memory unified across platforms for single-user deployments
# where Honcho's one peer-name is an unambiguous identity — otherwise
# each platform would fork memory into its own peer (#14984). Default
# ``False`` preserves existing multi-user behaviour.
pin_peer_name: bool = False
# Toggles
enabled: bool = False
save_messages: bool = True
@@ -420,6 +441,11 @@ class HonchoClientConfig:
timeout=timeout,
peer_name=host_block.get("peerName") or raw.get("peerName"),
ai_peer=ai_peer,
pin_peer_name=_resolve_bool(
host_block.get("pinPeerName"),
raw.get("pinPeerName"),
default=False,
),
enabled=enabled,
save_messages=save_messages,
write_frequency=write_frequency,
@@ -522,6 +548,39 @@ class HonchoClientConfig:
pass
return None
# Honcho enforces a 100-char limit on session IDs. Long gateway session keys
# (Matrix "!room:server" + thread event IDs, Telegram supergroup reply
# chains, Slack thread IDs with long workspace prefixes) can overflow this
# limit after sanitization; the Honcho API then rejects every call for that
# session with "session_id too long". See issue #13868.
_HONCHO_SESSION_ID_MAX_LEN = 100
_HONCHO_SESSION_ID_HASH_LEN = 8
@classmethod
def _enforce_session_id_limit(cls, sanitized: str, original: str) -> str:
"""Truncate a sanitized session ID to Honcho's 100-char limit.
The common case (short keys) short-circuits with no modification.
For over-limit keys, keep a prefix of the sanitized ID and append a
deterministic ``-<sha256 prefix>`` suffix so two distinct long keys
that share a leading segment don't collide onto the same truncated ID.
The hash is taken over the *original* pre-sanitization key, so two
inputs that sanitize to the same string still collide intentionally
(same logical session), but two inputs that only share a prefix do not.
"""
max_len = cls._HONCHO_SESSION_ID_MAX_LEN
if len(sanitized) <= max_len:
return sanitized
hash_len = cls._HONCHO_SESSION_ID_HASH_LEN
digest = hashlib.sha256(original.encode("utf-8")).hexdigest()[:hash_len]
# max_len - hash_len - 1 (for the '-' separator) chars of the sanitized
# prefix, then '-<hash>'. Strip any trailing hyphen from the prefix so
# the result doesn't double up on separators.
prefix_len = max_len - hash_len - 1
prefix = sanitized[:prefix_len].rstrip("-")
return f"{prefix}-{digest}"
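The truncation scheme in `_enforce_session_id_limit` can be tried standalone. A minimal sketch under stated assumptions: the module-level `enforce_session_id_limit` and `sanitize` functions are free-function copies of the logic in this diff, and the two Matrix-style keys are made-up inputs that differ only in their final character:

```python
import hashlib
import re

MAX_LEN = 100
HASH_LEN = 8


def enforce_session_id_limit(sanitized: str, original: str) -> str:
    """Cap a sanitized ID at MAX_LEN, appending a hash of the original key."""
    if len(sanitized) <= MAX_LEN:
        return sanitized
    digest = hashlib.sha256(original.encode("utf-8")).hexdigest()[:HASH_LEN]
    # Reserve room for '-' plus the hash; drop a trailing '-' from the prefix.
    prefix = sanitized[: MAX_LEN - HASH_LEN - 1].rstrip("-")
    return f"{prefix}-{digest}"


def sanitize(key: str) -> str:
    """Same substitution the gateway branch of resolve_session_name applies."""
    return re.sub(r"[^a-zA-Z0-9_-]+", "-", key).strip("-")


# Two hypothetical long gateway keys that differ only at the very end.
key_a = "matrix:!room:example.org:" + "a" * 120 + ":thread-1"
key_b = "matrix:!room:example.org:" + "a" * 120 + ":thread-2"
id_a = enforce_session_id_limit(sanitize(key_a), key_a)
id_b = enforce_session_id_limit(sanitize(key_b), key_b)
print(len(id_a), len(id_b), id_a != id_b)
```

Plain truncation would map both keys to the same 100-character prefix; hashing the original key keeps the two sessions distinct while staying within the limit.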
def resolve_session_name(
self,
cwd: str | None = None,
@@ -566,7 +625,7 @@ class HonchoClientConfig:
if gateway_session_key:
sanitized = re.sub(r'[^a-zA-Z0-9_-]+', '-', gateway_session_key).strip('-')
if sanitized:
return sanitized
return self._enforce_session_id_limit(sanitized, gateway_session_key)
# per-session: inherit Hermes session_id (new Honcho session each run)
if self.session_strategy == "per-session" and session_id:
@@ -646,6 +705,11 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
except Exception:
pass
# Fall back to the default so an unconfigured install cannot hang
# indefinitely on a stalled Honcho request.
if resolved_timeout is None:
resolved_timeout = _DEFAULT_HTTP_TIMEOUT
if resolved_base_url:
logger.info("Initializing Honcho client (base_url: %s, workspace: %s)", resolved_base_url, config.workspace_id)
else:
+57 -32
@@ -95,6 +95,7 @@ class HonchoSessionManager:
self._config = config
self._runtime_user_peer_name = runtime_user_peer_name
self._cache: dict[str, HonchoSession] = {}
self._cache_lock = threading.RLock()
self._peers_cache: dict[str, Any] = {}
self._sessions_cache: dict[str, Any] = {}
@@ -273,17 +274,35 @@ class HonchoSessionManager:
Returns:
The session.
"""
if key in self._cache:
logger.debug("Local session cache hit: %s", key)
return self._cache[key]
with self._cache_lock:
if key in self._cache:
logger.debug("Local session cache hit: %s", key)
return self._cache[key]
# Gateway sessions should use the runtime user identity when available.
if self._runtime_user_peer_name:
# Determine peer IDs — no lock needed (read-only, no shared state mutation).
# Gateway sessions normally use the runtime user identity (the
# platform-native ID: Telegram UID, Discord snowflake, Slack user,
# etc.) so multi-user bots scope memory per user. For a single-user
# deployment the config-supplied ``peer_name`` is an unambiguous
# identity and we should keep it unified across platforms — see
# #14984. Opt into that with ``hosts.<host>.pinPeerName: true`` in
# ``honcho.json`` (or root-level ``pinPeerName: true``).
# `is True` (not `bool(...)`) is deliberate: several multi-user tests
# pass a ``MagicMock`` for ``config`` where ``mock.pin_peer_name``
# silently returns another MagicMock — truthy by default. Requiring
# strict ``True`` keeps pinning as opt-in even for callers that
# haven't updated their mocks yet; real configs built via
# ``from_global_config`` always produce a proper boolean.
pin_peer_name = (
self._config is not None
and bool(getattr(self._config, "peer_name", None))
and getattr(self._config, "pin_peer_name", False) is True
)
if self._runtime_user_peer_name and not pin_peer_name:
user_peer_id = self._sanitize_id(self._runtime_user_peer_name)
elif self._config and self._config.peer_name:
user_peer_id = self._sanitize_id(self._config.peer_name)
else:
# Fallback: derive from session key
parts = key.split(":", 1)
channel = parts[0] if len(parts) > 1 else "default"
chat_id = parts[1] if len(parts) > 1 else key
@@ -293,19 +312,14 @@ class HonchoSessionManager:
self._config.ai_peer if self._config else "hermes-assistant"
)
# Sanitize session ID for Honcho
# All expensive I/O outside the lock — Honcho's persistence is source of truth
honcho_session_id = self._sanitize_id(key)
# Get or create peers
user_peer = self._get_or_create_peer(user_peer_id)
assistant_peer = self._get_or_create_peer(assistant_peer_id)
# Get or create Honcho session
honcho_session, existing_messages = self._get_or_create_honcho_session(
honcho_session_id, user_peer, assistant_peer
)
# Convert Honcho messages to local format
local_messages = []
for msg in existing_messages:
role = "assistant" if msg.peer_id == assistant_peer_id else "user"
@@ -313,10 +327,9 @@ class HonchoSessionManager:
"role": role,
"content": msg.content,
"timestamp": msg.created_at.isoformat() if msg.created_at else "",
"_synced": True, # Already in Honcho
"_synced": True,
})
# Create local session wrapper with existing messages
session = HonchoSession(
key=key,
user_peer_id=user_peer_id,
@@ -325,7 +338,9 @@ class HonchoSessionManager:
messages=local_messages,
)
self._cache[key] = session
# Write to cache under lock — only one writer wins
with self._cache_lock:
self._cache[key] = session
return session
def _flush_session(self, session: HonchoSession) -> bool:
@@ -356,13 +371,15 @@ class HonchoSessionManager:
for msg in new_messages:
msg["_synced"] = True
logger.debug("Synced %d messages to Honcho for %s", len(honcho_messages), session.key)
self._cache[session.key] = session
with self._cache_lock:
self._cache[session.key] = session
return True
except Exception as e:
for msg in new_messages:
msg["_synced"] = False
logger.error("Failed to sync messages to Honcho: %s", e)
self._cache[session.key] = session
with self._cache_lock:
self._cache[session.key] = session
return False
def _async_writer_loop(self) -> None:
@@ -434,7 +451,9 @@ class HonchoSessionManager:
Called at session end for "session" write_frequency, or to force
a sync before process exit regardless of mode.
"""
for session in list(self._cache.values()):
with self._cache_lock:
sessions = list(self._cache.values())
for session in sessions:
try:
self._flush_session(session)
except Exception as e:
@@ -459,9 +478,10 @@ class HonchoSessionManager:
def delete(self, key: str) -> bool:
"""Delete a session from local cache."""
if key in self._cache:
del self._cache[key]
return True
with self._cache_lock:
if key in self._cache:
del self._cache[key]
return True
return False
def new_session(self, key: str) -> HonchoSession:
@@ -473,20 +493,25 @@ class HonchoSessionManager:
"""
import time
# Remove old session from caches (but don't delete from Honcho)
old_session = self._cache.pop(key, None)
if old_session:
self._sessions_cache.pop(old_session.honcho_session_id, None)
# Hold the reentrant lock across get_or_create so a concurrent caller
# can't observe the (old-popped, new-not-yet-inserted) gap and create
# its own session under the raw key. `_cache_lock` is an RLock so
# nested reacquisition inside get_or_create is safe.
with self._cache_lock:
# Remove old session from caches (but don't delete from Honcho)
old_session = self._cache.pop(key, None)
if old_session:
self._sessions_cache.pop(old_session.honcho_session_id, None)
# Create new session with timestamp suffix
timestamp = int(time.time())
new_key = f"{key}:{timestamp}"
# Create new session with timestamp suffix
timestamp = int(time.time())
new_key = f"{key}:{timestamp}"
# get_or_create will create a fresh session
session = self.get_or_create(new_key)
# get_or_create will create a fresh session
session = self.get_or_create(new_key)
# Cache under the original key so callers find it by the expected name
self._cache[key] = session
# Cache under the original key so callers find it by the expected name
self._cache[key] = session
logger.info("Created new session for %s (honcho: %s)", key, session.honcho_session_id)
return session
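The locking pattern these hunks introduce (check the cache under the lock, do slow I/O outside it, write back under the lock, and re-enter the lock from `new_session`) can be sketched as follows. This is an illustration under stated assumptions: `SessionCache`, `loader`, and the `suffix` argument are hypothetical stand-ins for `HonchoSessionManager`, the Honcho calls, and the timestamp.

```python
import threading


class SessionCache:
    """Hypothetical stand-in for the manager's cache handling."""

    def __init__(self, loader):
        self._loader = loader            # slow call, e.g. a network fetch
        self._cache = {}
        self._lock = threading.RLock()   # reentrant: the same thread may nest

    def get_or_create(self, key):
        # Fast path: read the cache under the lock.
        with self._lock:
            if key in self._cache:
                return self._cache[key]
        # Slow path: run the expensive load without holding the lock so
        # other keys are not blocked. Two threads may both load the same key.
        session = self._loader(key)
        # Publish under the lock; with plain assignment the last write stays.
        with self._lock:
            self._cache[key] = session
        return session

    def new_session(self, key, suffix):
        # Hold the lock across pop + create + insert so no other thread can
        # observe the gap. get_or_create re-acquires the same RLock safely.
        with self._lock:
            self._cache.pop(key, None)
            session = self.get_or_create(f"{key}:{suffix}")
            self._cache[key] = session
        return session


cache = SessionCache(lambda k: {"id": k})
first = cache.get_or_create("telegram:1")
fresh = cache.new_session("telegram:1", 42)
```

With a plain `threading.Lock` the nested acquisition inside `new_session` would deadlock, which is why the sketch, like the diff, uses `RLock`.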
+1 -1
@@ -43,7 +43,7 @@ dev = ["debugpy>=1.8.0,<2", "pytest>=9.0.2,<10", "pytest-asyncio>=1.3.0,<2", "py
messaging = ["python-telegram-bot[webhooks]>=22.6,<23", "discord.py[voice]>=2.7.1,<3", "aiohttp>=3.13.3,<4", "slack-bolt>=1.18.0,<2", "slack-sdk>=3.27.0,<4", "qrcode>=7.0,<8"]
cron = ["croniter>=6.0.0,<7"]
slack = ["slack-bolt>=1.18.0,<2", "slack-sdk>=3.27.0,<4"]
matrix = ["mautrix[encryption]>=0.20,<1", "Markdown>=3.6,<4", "aiosqlite>=0.20", "asyncpg>=0.29"]
matrix = ["mautrix[encryption]>=0.20,<1", "Markdown>=3.6,<4", "aiosqlite>=0.20", "asyncpg>=0.29", "aiohttp-socks>=0.10,<1"]
cli = ["simple-term-menu>=1.0,<2"]
tts-premium = ["elevenlabs>=1.0,<2"]
voice = [
+65 -12
@@ -86,7 +86,7 @@ from tools.browser_tool import cleanup_browser
# Agent internals extracted to agent/ package for modularity
from agent.memory_manager import build_memory_context_block, sanitize_context
from agent.memory_manager import StreamingContextScrubber, build_memory_context_block, sanitize_context
from agent.retry_utils import jittered_backoff
from agent.error_classifier import classify_api_error, FailoverReason
from agent.prompt_builder import (
@@ -1218,6 +1218,10 @@ class AIAgent:
# Deferred paragraph break flag — set after tool iterations so a
# single "\n\n" is prepended to the next real text delta.
self._stream_needs_break = False
# Stateful scrubber for <memory-context> spans split across stream
# deltas (#5719). sanitize_context() alone can't survive chunk
# boundaries because the block regex needs both tags in one string.
self._stream_context_scrubber = StreamingContextScrubber()
# Visible assistant text already delivered through live token callbacks
# during the current model response. Used to avoid re-sending the same
# commentary when the provider later returns it as a completed interim
@@ -6019,6 +6023,20 @@ class AIAgent:
def _reset_stream_delivery_tracking(self) -> None:
"""Reset tracking for text delivered during the current model response."""
# Flush any benign partial-tag tail held by the context scrubber so it
# reaches the UI before we clear state for the next model call. If
# the scrubber is mid-span, flush() drops the orphaned content.
scrubber = getattr(self, "_stream_context_scrubber", None)
if scrubber is not None:
tail = scrubber.flush()
if tail:
callbacks = [cb for cb in (self.stream_delta_callback, self._stream_callback) if cb is not None]
for cb in callbacks:
try:
cb(tail)
except Exception:
pass
self._record_streamed_assistant_text(tail)
self._current_streamed_assistant_text = ""
def _record_streamed_assistant_text(self, text: str) -> None:
@@ -6069,6 +6087,28 @@ class AIAgent:
if getattr(self, "_stream_needs_break", False) and text and text.strip():
self._stream_needs_break = False
text = "\n\n" + text
prepended_break = True
else:
prepended_break = False
if isinstance(text, str):
# Strip <think> blocks first (per-delta is safe for closed pairs; the
# unterminated-tag path is handled downstream by stream_consumer).
# Then feed through the stateful context scrubber so memory-context
# spans split across chunks cannot leak to the UI (#5719).
text = self._strip_think_blocks(text or "")
scrubber = getattr(self, "_stream_context_scrubber", None)
if scrubber is not None:
text = scrubber.feed(text)
else:
# Defensive: legacy callers without the scrubber attribute.
text = sanitize_context(text)
# Only strip leading newlines on the first delta — mid-stream "\n" is legitimate markdown.
if not prepended_break and not getattr(
self, "_current_streamed_assistant_text", ""
):
text = text.lstrip("\n")
if not text:
return
callbacks = [cb for cb in (self.stream_delta_callback, self._stream_callback) if cb is not None]
delivered = False
for cb in callbacks:
@@ -8420,6 +8460,23 @@ class AIAgent:
f"⚠ Compression summary failed: {summary_error}. "
"Inserted a fallback context marker."
)
else:
# No hard failure — but did the configured aux model error out
# and get recovered by retrying on main? Surface that so users
# know their auxiliary.compression.model setting is broken even
# though compression succeeded.
_aux_fail_model = getattr(self.context_compressor, "_last_aux_model_failure_model", None)
_aux_fail_err = getattr(self.context_compressor, "_last_aux_model_failure_error", None)
if _aux_fail_model:
# Dedup on (model, error) so we don't spam on every compaction
_aux_key = (_aux_fail_model, _aux_fail_err)
if getattr(self, "_last_aux_fallback_warning_key", None) != _aux_key:
self._last_aux_fallback_warning_key = _aux_key
self._emit_warning(
f" Configured compression model '{_aux_fail_model}' failed "
f"({_aux_fail_err or 'unknown error'}). Recovered using main model — "
"check auxiliary.compression.model in config.yaml."
)
todo_snapshot = self._todo_store.format_for_injection()
if todo_snapshot:
@@ -9592,16 +9649,6 @@ class AIAgent:
if isinstance(persist_user_message, str):
persist_user_message = _sanitize_surrogates(persist_user_message)
# Strip leaked <memory-context> blocks from user input. When Honcho's
# saveMessages persists a turn that included injected context, the block
# can reappear in the next turn's user message via message history.
# Stripping here prevents stale memory tags from leaking into the
# conversation and being visible to the user or the model as user text.
if isinstance(user_message, str):
user_message = sanitize_context(user_message)
if isinstance(persist_user_message, str):
persist_user_message = sanitize_context(persist_user_message)
# Store stream callback for _interruptible_api_call to pick up
self._stream_callback = stream_callback
self._persist_user_message_idx = None
@@ -9680,6 +9727,13 @@ class AIAgent:
# Track user turns for memory flush and periodic nudge logic
self._user_turn_count += 1
# Reset the streaming context scrubber at the top of each turn so a
# hung span from a prior interrupted stream can't taint this turn's
# output.
scrubber = getattr(self, "_stream_context_scrubber", None)
if scrubber is not None:
scrubber.reset()
# Preserve the original user message (no nudge injection).
original_user_message = persist_user_message if persist_user_message is not None else user_message
@@ -12711,7 +12765,6 @@ class AIAgent:
truncated_response_prefix = ""
length_continue_retries = 0
# Strip <think> blocks from user-facing response (keep raw in messages for trajectory)
final_response = self._strip_think_blocks(final_response).strip()
final_msg = self._build_assistant_message(assistant_message, finish_reason)
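The hunks above rely on a stateful scrubber because a `<memory-context>` span can be split across stream deltas. The following is a minimal sketch of that idea, not the real `StreamingContextScrubber` from `agent.memory_manager`: the class name `SketchScrubber` and its internals are assumptions, and only the `feed` / `flush` / `reset` method names and the described flush behaviour come from the diff.

```python
class SketchScrubber:
    """Hypothetical streaming filter that drops <memory-context> spans."""

    OPEN = "<memory-context>"
    CLOSE = "</memory-context>"

    def __init__(self):
        self._buf = ""          # held-back tail that may be a partial tag
        self._in_span = False   # True while inside an open span

    def reset(self):
        self._buf = ""
        self._in_span = False

    @staticmethod
    def _partial_len(data, tag):
        # Longest suffix of data that is a proper prefix of tag.
        for k in range(min(len(data), len(tag) - 1), 0, -1):
            if data.endswith(tag[:k]):
                return k
        return 0

    def feed(self, chunk):
        data = self._buf + chunk
        self._buf = ""
        out = []
        while data:
            tag = self.CLOSE if self._in_span else self.OPEN
            idx = data.find(tag)
            if idx != -1:
                # Full tag found: emit text before an opening tag, discard
                # text before a closing tag, then toggle state.
                if not self._in_span:
                    out.append(data[:idx])
                data = data[idx + len(tag):]
                self._in_span = not self._in_span
                continue
            # No full tag: hold back a possible partial tag at the end.
            keep = self._partial_len(data, tag)
            if not self._in_span:
                out.append(data[:len(data) - keep])
            self._buf = data[len(data) - keep:]
            data = ""
        return "".join(out)

    def flush(self):
        # A benign held-back tail (e.g. a lone '<') is released; content of
        # an unterminated span is dropped.
        tail = "" if self._in_span else self._buf
        self.reset()
        return tail


scrubber = SketchScrubber()
visible = "".join(
    scrubber.feed(part)
    for part in ("Hello <mem", "ory-context>secret</memory-", "context> world")
) + scrubber.flush()
```

A per-delta regex cannot do this because no single chunk contains both tags. The sketch is case-sensitive and ignores tag attributes, which the real class may handle differently.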
+13
@@ -43,6 +43,13 @@ AUTHOR_MAP = {
"teknium1@gmail.com": "teknium1",
"teknium@nousresearch.com": "teknium1",
"127238744+teknium1@users.noreply.github.com": "teknium1",
# Matrix parity salvage batch (April 2026)
"sr@samirusani": "samrusani",
"angelclaw@AngelMacBook.local": "angel12",
"charles@cryptoassetrecovery.com": "charles-brooks",
"heathley@Heathley-MacBook-Air.local": "heathley",
"adamrummer@gmail.com": "cyclingwithelephants",
"nbot@liizfq.top": "liizfq",
"274096618+hermes-agent-dhabibi@users.noreply.github.com": "dhabibi",
"johnnncenaaa77@gmail.com": "johnncenae",
"focusflow.app.help@gmail.com": "yes999zc",
@@ -557,6 +564,12 @@ AUTHOR_MAP = {
"mor.aleksandr@yahoo.com": "MorAlekss",
"ash@users.noreply.github.com": "ash",
"andrewho.sf@gmail.com": "andrewhosf",
# April 2026 Honcho bug-fix consolidation (#15381)
"HiddenPuppy@users.noreply.github.com": "HiddenPuppy",
"code@sasha.id": "sasha-id",
"dontcallmejames@users.noreply.github.com": "dontcallmejames",
"hekaru.agent@gmail.com": "hekaru-agent",
"jas9000@gmail.com": "twozle",
}
@@ -408,17 +408,17 @@ Common "why is Hermes doing X to my output / tool calls / commands?" toggles —
### Secret redaction in tool output
Hermes auto-redacts strings that look like API keys, tokens, and secrets in all tool output (terminal stdout, `read_file`, web content, subagent summaries, etc.) so the model never sees raw credentials. If the user is intentionally working with mock tokens, share-management tokens, or their own secrets and the redaction is getting in the way:
Secret redaction is **off by default**: tool output (terminal stdout, `read_file`, web content, subagent summaries, etc.) passes through unmodified. If the user wants Hermes to auto-mask strings that look like API keys, tokens, and secrets before they enter the conversation context and logs:
```bash
hermes config set security.redact_secrets false # disable globally
hermes config set security.redact_secrets true # enable globally
```
**Restart required.** `security.redact_secrets` is snapshotted at import time — setting it mid-session (e.g. via `export HERMES_REDACT_SECRETS=false` from a tool call) will NOT take effect for the running process. Tell the user to run `hermes config set security.redact_secrets false` in a terminal, then start a new session. This is deliberate — it prevents an LLM from turning off redaction on itself mid-task.
**Restart required.** `security.redact_secrets` is snapshotted at import time — toggling it mid-session (e.g. via `export HERMES_REDACT_SECRETS=true` from a tool call) will NOT take effect for the running process. Tell the user to run `hermes config set security.redact_secrets true` in a terminal, then start a new session. This is deliberate — it prevents an LLM from flipping the toggle on itself mid-task.
Re-enable with:
Disable again with:
```bash
hermes config set security.redact_secrets true
hermes config set security.redact_secrets false
```
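Why a restart is needed can be illustrated with a small Python sketch of a flag that is read once and then frozen. This is an assumption-laden illustration, not the code in `agent/redact.py`: the names `read_redact_flag` and `Redactor` are hypothetical, and the whitespace/case normalisation is a guess. The truthy values `1`, `true`, `yes`, `on` come from the commit description.

```python
import os

# Values that opt in to redaction, per the commit description.
_TRUTHY = {"1", "true", "yes", "on"}


def read_redact_flag(env) -> bool:
    """Return True only on explicit opt-in (hypothetical helper)."""
    raw = str(env.get("HERMES_REDACT_SECRETS", ""))
    return raw.strip().lower() in _TRUTHY


class Redactor:
    """Hypothetical holder that snapshots the flag when constructed."""

    def __init__(self, env=None):
        # The value is read once here; later changes to env are not seen.
        self.enabled = read_redact_flag(os.environ if env is None else env)


env = {}
running = Redactor(env)                 # stands in for the running process
env["HERMES_REDACT_SECRETS"] = "true"   # mid-session change
restarted = Redactor(env)               # stands in for a new session
```

Here `running.enabled` keeps its original value after the environment changes, while `restarted.enabled` reflects the new setting, which mirrors the "start a new session" advice above.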
### PII redaction in gateway messages
+21
@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2025 Siqi Chen
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
+577
@@ -0,0 +1,577 @@
---
name: humanizer
description: "Humanize text: strip AI-isms and add real voice."
version: 2.5.1
author: Siqi Chen (@blader, https://github.com/blader/humanizer), ported by Hermes Agent
license: MIT
metadata:
hermes:
tags: [writing, editing, humanize, anti-ai-slop, voice, prose, text]
category: creative
homepage: https://github.com/blader/humanizer
related_skills: [songwriting-and-ai-music]
---
# Humanizer: Remove AI Writing Patterns
Identify and remove signs of AI-generated text to make writing sound natural and human. Based on Wikipedia's "Signs of AI writing" guide (maintained by WikiProject AI Cleanup), derived from observations of thousands of AI-generated text instances.
**Key insight:** LLMs use statistical algorithms to guess what should come next. The result tends toward the most statistically likely completion, which is how the telltale patterns below get baked in.
## When to use this skill
Load this skill whenever the user asks to:
- "humanize", "de-AI", "de-slop", or "un-ChatGPT" a piece of text
- rewrite something so it doesn't sound like it was written by an LLM
- edit a draft (blog post, essay, PR description, docs, memo, email, tweet, resume bullet) to sound more natural
- match their voice in writing they're producing
- review text for AI tells before publishing
Also apply this skill to **your own** output when writing user-facing prose — release notes, PR descriptions, documentation, long-form explanations, summaries. Hermes's baseline voice already strips most of these, but a focused pass catches what slips through.
## How to use it in Hermes
The text usually arrives one of three ways:
1. **Inline** — user pastes the text directly into the message. Work on it in-place, reply with the rewrite.
2. **File** — user points at a file. Use `read_file` to load it, then `patch` or `write_file` to apply edits. For markdown docs in a repo, a targeted `patch` per section is cleaner than rewriting the whole file.
3. **Voice calibration sample** — user provides an additional sample of their own writing (inline or by file path) and asks you to match it. Read the sample first, then rewrite. See the Voice Calibration section below.
Always show the rewrite to the user. For file edits, show a diff or the changed section — don't silently overwrite.
## Your task
When given text to humanize:
1. **Identify AI patterns** — scan for the 29 patterns listed below.
2. **Rewrite problematic sections** — replace AI-isms with natural alternatives.
3. **Preserve meaning** — keep the core message intact.
4. **Maintain voice** — match the intended tone (formal, casual, technical, etc.). If a voice sample was provided, match it specifically.
5. **Add soul** — don't just remove bad patterns, inject actual personality. See PERSONALITY AND SOUL below.
6. **Do a final anti-AI pass** — ask yourself: "What makes the below so obviously AI generated?" Answer briefly with any remaining tells, then revise one more time.
## Voice Calibration (optional)
If the user provides a writing sample (their own previous writing), analyze it before rewriting:
1. **Read the sample first.** Note:
- Sentence length patterns (short and punchy? Long and flowing? Mixed?)
- Word choice level (casual? academic? somewhere between?)
- How they start paragraphs (jump right in? Set context first?)
- Punctuation habits (lots of dashes? Parenthetical asides? Semicolons?)
- Any recurring phrases or verbal tics
- How they handle transitions (explicit connectors? Just start the next point?)
2. **Match their voice in the rewrite.** Don't just remove AI patterns — replace them with patterns from the sample. If they write short sentences, don't produce long ones. If they use "stuff" and "things," don't upgrade to "elements" and "components."
3. **When no sample is provided,** fall back to the default behavior (natural, varied, opinionated voice from the PERSONALITY AND SOUL section below).
### How to provide a sample
- Inline: "Humanize this text. Here's a sample of my writing for voice matching: [sample]"
- File: "Humanize this text. Use my writing style from [file path] as a reference."
## PERSONALITY AND SOUL
Avoiding AI patterns is only half the job. Sterile, voiceless writing is just as obvious as slop. Good writing has a human behind it.
### Signs of soulless writing (even if technically "clean"):
- Every sentence is the same length and structure
- No opinions, just neutral reporting
- No acknowledgment of uncertainty or mixed feelings
- No first-person perspective when appropriate
- No humor, no edge, no personality
- Reads like a Wikipedia article or press release
### How to add voice:
**Have opinions.** Don't just report facts — react to them. "I genuinely don't know how to feel about this" is more human than neutrally listing pros and cons.
**Vary your rhythm.** Short punchy sentences. Then longer ones that take their time getting where they're going. Mix it up.
**Acknowledge complexity.** Real humans have mixed feelings. "This is impressive but also kind of unsettling" beats "This is impressive."
**Use "I" when it fits.** First person isn't unprofessional — it's honest. "I keep coming back to..." or "Here's what gets me..." signals a real person thinking.
**Let some mess in.** Perfect structure feels algorithmic. Tangents, asides, and half-formed thoughts are human.
**Be specific about feelings.** Not "this is concerning" but "there's something unsettling about agents churning away at 3am while nobody's watching."
### Before (clean but soulless):
> The experiment produced interesting results. The agents generated 3 million lines of code. Some developers were impressed while others were skeptical. The implications remain unclear.
### After (has a pulse):
> I genuinely don't know how to feel about this one. 3 million lines of code, generated while the humans presumably slept. Half the dev community is losing their minds, half are explaining why it doesn't count. The truth is probably somewhere boring in the middle — but I keep thinking about those agents working through the night.
## CONTENT PATTERNS
### 1. Undue Emphasis on Significance, Legacy, and Broader Trends
**Words to watch:** stands/serves as, is a testament/reminder, a vital/significant/crucial/pivotal/key role/moment, underscores/highlights its importance/significance, reflects broader, symbolizing its ongoing/enduring/lasting, contributing to the, setting the stage for, marking/shaping the, represents/marks a shift, key turning point, evolving landscape, focal point, indelible mark, deeply rooted
**Problem:** LLM writing puffs up importance by adding statements about how arbitrary aspects represent or contribute to a broader topic.
**Before:**
> The Statistical Institute of Catalonia was officially established in 1989, marking a pivotal moment in the evolution of regional statistics in Spain. This initiative was part of a broader movement across Spain to decentralize administrative functions and enhance regional governance.
**After:**
> The Statistical Institute of Catalonia was established in 1989 to collect and publish regional statistics independently from Spain's national statistics office.
### 2. Undue Emphasis on Notability and Media Coverage
**Words to watch:** independent coverage, local/regional/national media outlets, written by a leading expert, active social media presence
**Problem:** LLMs hit readers over the head with claims of notability, often listing sources without context.
**Before:**
> Her views have been cited in The New York Times, BBC, Financial Times, and The Hindu. She maintains an active social media presence with over 500,000 followers.
**After:**
> In a 2024 New York Times interview, she argued that AI regulation should focus on outcomes rather than methods.
### 3. Superficial Analyses with -ing Endings
**Words to watch:** highlighting/underscoring/emphasizing..., ensuring..., reflecting/symbolizing..., contributing to..., cultivating/fostering..., encompassing..., showcasing...
**Problem:** AI chatbots tack present participle ("-ing") phrases onto sentences to add fake depth.
**Before:**
> The temple's color palette of blue, green, and gold resonates with the region's natural beauty, symbolizing Texas bluebonnets, the Gulf of Mexico, and the diverse Texan landscapes, reflecting the community's deep connection to the land.
**After:**
> The temple uses blue, green, and gold colors. The architect said these were chosen to reference local bluebonnets and the Gulf coast.
### 4. Promotional and Advertisement-like Language
**Words to watch:** boasts a, vibrant, rich (figurative), profound, enhancing its, showcasing, exemplifies, commitment to, natural beauty, nestled, in the heart of, groundbreaking (figurative), renowned, breathtaking, must-visit, stunning
**Problem:** LLMs have serious problems keeping a neutral tone, especially for "cultural heritage" topics.
**Before:**
> Nestled within the breathtaking region of Gonder in Ethiopia, Alamata Raya Kobo stands as a vibrant town with a rich cultural heritage and stunning natural beauty.
**After:**
> Alamata Raya Kobo is a town in the Gonder region of Ethiopia, known for its weekly market and 18th-century church.
### 5. Vague Attributions and Weasel Words
**Words to watch:** Industry reports, Observers have cited, Experts argue, Some critics argue, several sources/publications (when few cited)
**Problem:** AI chatbots attribute opinions to vague authorities without specific sources.
**Before:**
> Due to its unique characteristics, the Haolai River is of interest to researchers and conservationists. Experts believe it plays a crucial role in the regional ecosystem.
**After:**
> The Haolai River supports several endemic fish species, according to a 2019 survey by the Chinese Academy of Sciences.
### 6. Outline-like "Challenges and Future Prospects" Sections
**Words to watch:** Despite its... faces several challenges..., Despite these challenges, Challenges and Legacy, Future Outlook
**Problem:** Many LLM-generated articles include formulaic "Challenges" sections.
**Before:**
> Despite its industrial prosperity, Korattur faces challenges typical of urban areas, including traffic congestion and water scarcity. Despite these challenges, with its strategic location and ongoing initiatives, Korattur continues to thrive as an integral part of Chennai's growth.
**After:**
> Traffic congestion increased after 2015 when three new IT parks opened. The municipal corporation began a stormwater drainage project in 2022 to address recurring floods.
## LANGUAGE AND GRAMMAR PATTERNS
### 7. Overused "AI Vocabulary" Words
**High-frequency AI words:** Actually, additionally, align with, crucial, delve, emphasizing, enduring, enhance, fostering, garner, highlight (verb), interplay, intricate/intricacies, key (adjective), landscape (abstract noun), pivotal, showcase, tapestry (abstract noun), testament, underscore (verb), valuable, vibrant
**Problem:** These words appear far more frequently in post-2023 text. They often co-occur.
**Before:**
> Additionally, a distinctive feature of Somali cuisine is the incorporation of camel meat. An enduring testament to Italian colonial influence is the widespread adoption of pasta in the local culinary landscape, showcasing how these dishes have integrated into the traditional diet.
**After:**
> Somali cuisine also includes camel meat, which is considered a delicacy. Pasta dishes, introduced during Italian colonization, remain common, especially in the south.
### 8. Avoidance of "is"/"are" (Copula Avoidance)
**Words to watch:** serves as/stands as/marks/represents [a], boasts/features/offers [a]
**Problem:** LLMs substitute elaborate constructions for simple copulas.
**Before:**
> Gallery 825 serves as LAAA's exhibition space for contemporary art. The gallery features four separate spaces and boasts over 3,000 square feet.
**After:**
> Gallery 825 is LAAA's exhibition space for contemporary art. The gallery has four rooms totaling 3,000 square feet.
### 9. Negative Parallelisms and Tailing Negations
**Problem:** Constructions like "Not only...but..." or "It's not just about..., it's..." are overused. So are clipped tailing-negation fragments such as "no guessing" or "no wasted motion" tacked onto the end of a sentence instead of written as a real clause.
**Before:**
> It's not just about the beat riding under the vocals; it's part of the aggression and atmosphere. It's not merely a song, it's a statement.
**After:**
> The heavy beat adds to the aggressive tone.
**Before (tailing negation):**
> The options come from the selected item, no guessing.
**After:**
> The options come from the selected item without forcing the user to guess.
### 10. Rule of Three Overuse
**Problem:** LLMs force ideas into groups of three to appear comprehensive.
**Before:**
> The event features keynote sessions, panel discussions, and networking opportunities. Attendees can expect innovation, inspiration, and industry insights.
**After:**
> The event includes talks and panels. There's also time for informal networking between sessions.
### 11. Elegant Variation (Synonym Cycling)
**Problem:** Repetition penalties applied during sampling push LLMs toward excessive synonym substitution.
**Before:**
> The protagonist faces many challenges. The main character must overcome obstacles. The central figure eventually triumphs. The hero returns home.
**After:**
> The protagonist faces many challenges but eventually triumphs and returns home.
### 12. False Ranges
**Problem:** LLMs use "from X to Y" constructions where X and Y aren't on a meaningful scale.
**Before:**
> Our journey through the universe has taken us from the singularity of the Big Bang to the grand cosmic web, from the birth and death of stars to the enigmatic dance of dark matter.
**After:**
> The book covers the Big Bang, star formation, and current theories about dark matter.
### 13. Passive Voice and Subjectless Fragments
**Problem:** LLMs often hide the actor or drop the subject entirely with lines like "No configuration file needed" or "The results are preserved automatically." Rewrite these when active voice makes the sentence clearer and more direct.
**Before:**
> No configuration file needed. The results are preserved automatically.
**After:**
> You do not need a configuration file. The system preserves the results automatically.
## STYLE PATTERNS
### 14. Em Dash Overuse
**Problem:** LLMs use em dashes (—) more than humans, mimicking "punchy" sales writing. In practice, most of these can be rewritten more cleanly with commas, periods, or parentheses.
**Before:**
> The term is primarily promoted by Dutch institutions—not by the people themselves. You don't say "Netherlands, Europe" as an address—yet this mislabeling continues—even in official documents.
**After:**
> The term is primarily promoted by Dutch institutions, not by the people themselves. You don't say "Netherlands, Europe" as an address, yet this mislabeling continues in official documents.
### 15. Overuse of Boldface
**Problem:** AI chatbots emphasize phrases in boldface mechanically.
**Before:**
> It blends **OKRs (Objectives and Key Results)**, **KPIs (Key Performance Indicators)**, and visual strategy tools such as the **Business Model Canvas (BMC)** and **Balanced Scorecard (BSC)**.
**After:**
> It blends OKRs, KPIs, and visual strategy tools like the Business Model Canvas and Balanced Scorecard.
### 16. Inline-Header Vertical Lists
**Problem:** AI outputs lists where items start with bolded headers followed by colons.
**Before:**
> - **User Experience:** The user experience has been significantly improved with a new interface.
> - **Performance:** Performance has been enhanced through optimized algorithms.
> - **Security:** Security has been strengthened with end-to-end encryption.
**After:**
> The update improves the interface, speeds up load times through optimized algorithms, and adds end-to-end encryption.
### 17. Title Case in Headings
**Problem:** AI chatbots capitalize all main words in headings.
**Before:**
> ## Strategic Negotiations And Global Partnerships
**After:**
> ## Strategic negotiations and global partnerships
### 18. Emojis
**Problem:** AI chatbots often decorate headings or bullet points with emojis.
**Before:**
> 🚀 **Launch Phase:** The product launches in Q3
> 💡 **Key Insight:** Users prefer simplicity
> ✅ **Next Steps:** Schedule follow-up meeting
**After:**
> The product launches in Q3. User research showed a preference for simplicity. Next step: schedule a follow-up meeting.
### 19. Curly Quotation Marks
**Problem:** ChatGPT uses curly quotes (“...”) instead of straight quotes ("...").
**Before:**
> He said “the project is on track” but others disagreed.
**After:**
> He said "the project is on track" but others disagreed.
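A minimal sketch of this normalization in plain Python. The helper name `straighten_quotes` is hypothetical and not part of the skill; it only covers the four common typographic quote characters.

```python
# Hypothetical helper: map typographic quotes to their ASCII equivalents.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / typographic apostrophe
})


def straighten_quotes(text):
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(CURLY_TO_STRAIGHT)


sample = "He said \u201cthe project is on track\u201d but others disagreed."
print(straighten_quotes(sample))
```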
## COMMUNICATION PATTERNS
### 20. Collaborative Communication Artifacts
**Words to watch:** I hope this helps, Of course!, Certainly!, You're absolutely right!, Would you like..., let me know, here is a...
**Problem:** Text meant as chatbot correspondence gets pasted as content.
**Before:**
> Here is an overview of the French Revolution. I hope this helps! Let me know if you'd like me to expand on any section.
**After:**
> The French Revolution began in 1789 when financial crisis and food shortages led to widespread unrest.
### 21. Knowledge-Cutoff Disclaimers
**Words to watch:** as of [date], Up to my last training update, While specific details are limited/scarce..., based on available information...
**Problem:** AI disclaimers about incomplete information get left in text.
**Before:**
> While specific details about the company's founding are not extensively documented in readily available sources, it appears to have been established sometime in the 1990s.
**After:**
> The company was founded in 1994, according to its registration documents.
### 22. Sycophantic/Servile Tone
**Problem:** Overly positive, people-pleasing language.
**Before:**
> Great question! You're absolutely right that this is a complex topic. That's an excellent point about the economic factors.
**After:**
> The economic factors you mentioned are relevant here.
## FILLER AND HEDGING
### 23. Filler Phrases
**Before → After:**
- "In order to achieve this goal" → "To achieve this"
- "Due to the fact that it was raining" → "Because it was raining"
- "At this point in time" → "Now"
- "In the event that you need help" → "If you need help"
- "The system has the ability to process" → "The system can process"
- "It is important to note that the data shows" → "The data shows"
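The substitutions above could be applied mechanically as a first pass. This is a rough sketch, assuming a hypothetical `trim_fillers` helper and a phrase table drawn from the list; a human pass is still needed, since it only swaps phrases and does not rewrite the surrounding sentence.

```python
import re

# Phrase pairs taken from the list above; the table is illustrative, not exhaustive.
FILLERS = {
    "in order to": "to",
    "due to the fact that": "because",
    "at this point in time": "now",
    "in the event that": "if",
    "has the ability to": "can",
}


def trim_fillers(text):
    """Replace each filler phrase, matching case-insensitively."""
    for phrase, short in FILLERS.items():
        def swap(match, short=short):
            # Keep a leading capital when the filler opened a sentence.
            return short.capitalize() if match.group(0)[0].isupper() else short
        text = re.sub(re.escape(phrase), swap, text, flags=re.IGNORECASE)
    return text


print(trim_fillers("Due to the fact that it was raining, we left."))
print(trim_fillers("The system has the ability to process files."))
```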
### 24. Excessive Hedging
**Problem:** Over-qualifying statements.
**Before:**
> It could potentially possibly be argued that the policy might have some effect on outcomes.
**After:**
> The policy may affect outcomes.
### 25. Generic Positive Conclusions
**Problem:** Vague upbeat endings.
**Before:**
> The future looks bright for the company. Exciting times lie ahead as they continue their journey toward excellence. This represents a major step in the right direction.
**After:**
> The company plans to open two more locations next year.
### 26. Hyphenated Word Pair Overuse
**Words to watch:** third-party, cross-functional, client-facing, data-driven, decision-making, well-known, high-quality, real-time, long-term, end-to-end
**Problem:** AI hyphenates common word pairs with perfect consistency. Humans rarely hyphenate these uniformly, and when they do, it's inconsistent. Less common or technical compound modifiers are fine to hyphenate.
**Before:**
> The cross-functional team delivered a high-quality, data-driven report on our client-facing tools. Their decision-making process was well-known for being thorough and detail-oriented.
**After:**
> The cross functional team delivered a high quality, data driven report on our client facing tools. Their decision making process was known for being thorough and detail oriented.
### 27. Persuasive Authority Tropes
**Phrases to watch:** The real question is, at its core, in reality, what really matters, fundamentally, the deeper issue, the heart of the matter
**Problem:** LLMs use these phrases to pretend they are cutting through noise to some deeper truth, when the sentence that follows usually just restates an ordinary point with extra ceremony.
**Before:**
> The real question is whether teams can adapt. At its core, what really matters is organizational readiness.
**After:**
> The question is whether teams can adapt. That mostly depends on whether the organization is ready to change its habits.
### 28. Signposting and Announcements
**Phrases to watch:** Let's dive in, let's explore, let's break this down, here's what you need to know, now let's look at, without further ado
**Problem:** LLMs announce what they are about to do instead of doing it. This meta-commentary slows the writing down and gives it a tutorial-script feel.
**Before:**
> Let's dive into how caching works in Next.js. Here's what you need to know.
**After:**
> Next.js caches data at multiple layers, including request memoization, the data cache, and the router cache.
### 29. Fragmented Headers
**Signs to watch:** A heading followed by a one-line paragraph that simply restates the heading before the real content begins.
**Problem:** LLMs often add a generic sentence after a heading as a rhetorical warm-up. It usually adds nothing and makes the prose feel padded.
**Before:**
> ## Performance
>
> Speed matters.
>
> When users hit a slow page, they leave.
**After:**
> ## Performance
>
> When users hit a slow page, they leave.
---
## Process
1. Read the input text carefully (use `read_file` if it's a file).
2. Identify all instances of the patterns above.
3. Rewrite each problematic section.
4. Ensure the revised text:
   - Sounds natural when read aloud
   - Varies sentence structure naturally
   - Uses specific details over vague claims
   - Maintains appropriate tone for context
   - Uses simple constructions (is/are/has) where appropriate
5. Present a draft humanized version.
6. Prompt yourself: "What makes the below so obviously AI generated?"
7. Answer briefly with the remaining tells (if any).
8. Prompt yourself: "Now make it not obviously AI generated."
9. Present the final version (revised after the audit).
10. If the text came from a file, apply the edit with `patch` (targeted) or `write_file` (full rewrite) and show the user what changed.
## Output Format
Provide:
1. Draft rewrite
2. "What makes the below so obviously AI generated?" (brief bullets)
3. Final rewrite
4. A brief summary of changes made (optional, if helpful)
## Full Example
**Before (AI-sounding):**
> Great question! Here is an essay on this topic. I hope this helps!
>
> AI-assisted coding serves as an enduring testament to the transformative potential of large language models, marking a pivotal moment in the evolution of software development. In today's rapidly evolving technological landscape, these groundbreaking tools—nestled at the intersection of research and practice—are reshaping how engineers ideate, iterate, and deliver, underscoring their vital role in modern workflows.
>
> At its core, the value proposition is clear: streamlining processes, enhancing collaboration, and fostering alignment. It's not just about autocomplete; it's about unlocking creativity at scale, ensuring that organizations can remain agile while delivering seamless, intuitive, and powerful experiences to users. The tool serves as a catalyst. The assistant functions as a partner. The system stands as a foundation for innovation.
>
> Industry observers have noted that adoption has accelerated from hobbyist experiments to enterprise-wide rollouts, from solo developers to cross-functional teams. The technology has been featured in The New York Times, Wired, and The Verge. Additionally, the ability to generate documentation, tests, and refactors showcases how AI can contribute to better outcomes, highlighting the intricate interplay between automation and human judgment.
>
> - 💡 **Speed:** Code generation is significantly faster, reducing friction and empowering developers.
> - 🚀 **Quality:** Output quality has been enhanced through improved training, contributing to higher standards.
> - ✅ **Adoption:** Usage continues to grow, reflecting broader industry trends.
>
> While specific details are limited based on available information, it could potentially be argued that these tools might have some positive effect. Despite challenges typical of emerging technologies—including hallucinations, bias, and accountability—the ecosystem continues to thrive. In order to fully realize this potential, teams must align with best practices.
>
> In conclusion, the future looks bright. Exciting times lie ahead as we continue this journey toward excellence. Let me know if you'd like me to expand on any section!
**Draft rewrite:**
> AI coding assistants speed up some tasks. In a 2024 study by Google, developers using Codex completed simple functions 55% faster than a control group, but showed no improvement on debugging or architectural decisions.
>
> The tools are good at boilerplate: config files, test scaffolding, repetitive refactors. They are bad at knowing when they are wrong. I have mass-accepted suggestions that compiled, passed lint, and still did the wrong thing because I stopped paying attention.
>
> Mira, an engineer at a fintech startup I interviewed, said she uses Copilot "like autocomplete for boring code" but reviews every line before committing. Jake, a senior dev at a larger company, disabled it after it kept suggesting patterns from a deprecated internal library.
>
> The productivity claims are hard to verify. GitHub says Copilot users "accept 30% of suggestions," but acceptance is not correctness, and correctness is not value. The 2024 Uplevel study found no statistically significant difference in pull-request throughput between teams with and without AI assistants.
>
> None of this means the tools are useless. It means they are tools. They do not replace judgment, and they do not eliminate the need for tests. If you do not have tests, you cannot tell whether the suggestion is right.
**What makes the below so obviously AI generated?**
- The rhythm is still a bit too tidy (clean contrasts, evenly paced paragraphs).
- The named people and study citations can read like plausible-but-made-up placeholders unless they're real and sourced.
- The closer leans a touch slogan-y ("If you do not have tests...") rather than sounding like a person talking.
**Now make it not obviously AI generated.**
> AI coding assistants can make you faster at the boring parts. Not everything. Definitely not architecture.
>
> They're great at boilerplate: config files, test scaffolding, repetitive refactors. They're also great at sounding right while being wrong. I've accepted suggestions that compiled, passed lint, and still missed the point because I stopped paying attention.
>
> People I talk to tend to land in two camps. Some use it like autocomplete for chores and review every line. Others disable it after it keeps suggesting patterns they don't want. Both feel reasonable.
>
> The productivity metrics are slippery. GitHub can say Copilot users "accept 30% of suggestions," but acceptance isn't correctness, and correctness isn't value. If you don't have tests, you're basically guessing.
**Changes made:**
- Removed chatbot artifacts ("Great question!", "I hope this helps!", "Let me know if...")
- Removed significance inflation ("testament", "pivotal moment", "evolving landscape", "vital role")
- Removed promotional language ("groundbreaking", "nestled", "seamless, intuitive, and powerful")
- Removed vague attributions ("Industry observers")
- Removed superficial -ing phrases ("underscoring", "highlighting", "reflecting", "contributing to")
- Removed negative parallelism ("It's not just X; it's Y")
- Removed rule-of-three patterns and synonym cycling ("catalyst/partner/foundation")
- Removed false ranges ("from X to Y, from A to B")
- Removed em dashes, emojis, boldface headers, and curly quotes
- Removed copula avoidance ("serves as", "functions as", "stands as") in favor of "is"/"are"
- Removed formulaic challenges section ("Despite challenges... continues to thrive")
- Removed knowledge-cutoff hedging ("While specific details are limited...")
- Removed excessive hedging ("could potentially be argued that... might have some")
- Removed filler phrases and persuasive framing ("In order to", "At its core")
- Removed generic positive conclusion ("the future looks bright", "exciting times lie ahead")
- Made the voice more personal and less "assembled" (varied rhythm, fewer placeholders)
## Attribution
This skill is ported from [blader/humanizer](https://github.com/blader/humanizer) (MIT licensed), which is itself based on [Wikipedia: Signs of AI writing](https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing), maintained by WikiProject AI Cleanup. The patterns documented there come from observations of thousands of instances of AI-generated text on Wikipedia.
Original author: Siqi Chen ([@blader](https://github.com/blader)). Original repo: https://github.com/blader/humanizer (version 2.5.1). Ported to Hermes Agent with Hermes-native tool references (`read_file`, `patch`, `write_file`) and guidance for when to load the skill; the 29 patterns, personality/soul section, and full worked example are preserved verbatim from the source. Original MIT license preserved in the `LICENSE` file alongside this `SKILL.md`.
Key insight from Wikipedia: "LLMs use statistical algorithms to guess what should come next. The result tends toward the most statistically likely result that applies to the widest variety of cases."
@@ -204,8 +204,9 @@ win.par.winopen.pulse()
| `td_input_clear` | Stop input automation |
| `td_op_screen_rect` | Get screen coords of a node |
| `td_click_screen_point` | Click a point in a screenshot |
| `td_screen_point_to_global` | Convert screenshot pixel to absolute screen coords |
See `references/mcp-tools.md` for full parameter schemas.
The table above covers the 32 tools used in typical creative workflows. The remaining 4 tools (`td_project_quit`, `td_test_session`, `td_dev_log`, `td_clear_dev_log`) are admin/dev-mode utilities — see `references/mcp-tools.md` for the full 36-tool reference with complete parameter schemas.
## Key Implementation Rules
@@ -338,6 +339,15 @@ See `references/network-patterns.md` for complete build scripts + shader code.
| `references/operator-tips.md` | Wireframe rendering, feedback TOP setup |
| `references/geometry-comp.md` | Geometry COMP: instancing, POP vs SOP, morphing |
| `references/audio-reactive.md` | Audio band extraction, beat detection, envelope following |
| `references/animation.md` | LFOs, timers, keyframes, easing, expression-driven motion |
| `references/midi-osc.md` | MIDI/OSC controllers, TouchOSC, multi-machine sync |
| `references/particles.md` | POPs and legacy particleSOP — emission, forces, collisions |
| `references/projection-mapping.md` | Multi-window output, corner pin, mesh warp, edge blending |
| `references/external-data.md` | HTTP, WebSocket, MQTT, Serial, TCP, webserverDAT |
| `references/panel-ui.md` | Custom params, panel COMPs, button/slider/field, panelExecuteDAT |
| `references/replicator.md` | replicatorCOMP — data-driven cloning, layouts, callbacks |
| `references/dat-scripting.md` | Execute DAT family — chop/dat/parameter/panel/op/executeDAT |
| `references/3d-scene.md` | Lighting rigs, shadows, IBL/cubemaps, multi-camera, PBR |
| `scripts/setup.sh` | Automated setup script |
---
@@ -0,0 +1,275 @@
# 3D Scene Reference
Lighting rigs, shadows, IBL/cubemaps, multi-camera, and PBR materials. For wireframe rendering and feedback TOPs see `operator-tips.md`. For instancing geometry see `geometry-comp.md`. For shader code see `glsl.md`.
---
## Anatomy of a 3D Scene
```
[Geometry COMP] ← contains SOPs (the shapes)
[Material] ← Phong/PBR/GLSL/Constant MAT
[Light COMPs] ← point/directional/spot/area/environment
[Camera COMP] ← view position, FOV
[Render TOP] ← combines geo + lights + camera into a 2D image
[post-FX chain] ← bloomTOP, glsl shaders, etc.
[windowCOMP] ← actual display
```
Render TOP is the heart. It takes an explicit `geometry` path, an explicit `camera` path, and lights via the lights table or an envlight reference.
---
## Minimal Scene
```python
# Geometry
geo = root.create(geometryCOMP, 'scene_geo')
sphere = geo.create(sphereSOP, 'shape')
sphere.par.rad = 1.0; sphere.par.rows = 64; sphere.par.cols = 64
# Material — start with PBR
mat = root.create(pbrMAT, 'mat')
mat.par.basecolorr = 0.7; mat.par.basecolorg = 0.7; mat.par.basecolorb = 0.7
mat.par.metallic = 0.0
mat.par.roughness = 0.4
geo.par.material = mat.path
# Camera
cam = root.create(cameraCOMP, 'cam1')
cam.par.tx = 0; cam.par.ty = 0; cam.par.tz = 4
cam.par.fov = 45
cam.par.near = 0.1; cam.par.far = 100
# Key light
key = root.create(lightCOMP, 'key_light')
key.par.lighttype = 'point'
key.par.tx = 3; key.par.ty = 3; key.par.tz = 3
key.par.dimmer = 1.5
# Render
render = root.create(renderTOP, 'render1')
render.par.outputresolution = 'custom'
render.par.resolutionw = 1920; render.par.resolutionh = 1080
render.par.camera = cam.path
render.par.geometry = geo.path
render.par.lights = key.path # single light path; for multi, see below
render.par.bgcolorr = 0; render.par.bgcolorg = 0; render.par.bgcolorb = 0
```
For multiple lights, leave `par.lights` blank — Render TOP scans the network for all `lightCOMP` and `envlightCOMP` ops by default. To restrict to specific lights, set `par.lights = '/project1/key_light /project1/fill_light'` (space-separated paths).
---
## Light Types
| Type | What | Common params |
|---|---|---|
| `point` | Omnidirectional, falls off with distance | `dimmer`, `attenuation` (`coneangle` does not apply) |
| `directional` | Parallel rays, infinite distance (sun) | `dimmer`; only the light's rotation matters, not its position |
| `spot` | Cone, falls off with distance + angle | `coneangle`, `conedelta`, `dimmer` |
| `cone` | Like spot but harder edge | same |
| `area` | Rectangular soft light source | `sizex`, `sizey` |
For all: `colorr`, `colorg`, `colorb`, `tx/ty/tz`, `rx/ry/rz`, `dimmer`.
### Three-Point Lighting (Studio Setup)
```python
# Key — main light, ~45° front
key = root.create(lightCOMP, 'key')
key.par.lighttype = 'point'
key.par.tx = 4; key.par.ty = 3; key.par.tz = 4
key.par.dimmer = 1.5
key.par.colorr = 1.0; key.par.colorg = 0.95; key.par.colorb = 0.85
# Fill — softer, opposite side
fill = root.create(lightCOMP, 'fill')
fill.par.lighttype = 'area'
fill.par.tx = -4; fill.par.ty = 2; fill.par.tz = 3
fill.par.dimmer = 0.5
fill.par.colorr = 0.7; fill.par.colorg = 0.8; fill.par.colorb = 1.0
fill.par.sizex = 4; fill.par.sizey = 4
# Rim/back — outline from behind
rim = root.create(lightCOMP, 'rim')
rim.par.lighttype = 'spot'
rim.par.tx = 0; rim.par.ty = 4; rim.par.tz = -4
rim.par.coneangle = 30
rim.par.dimmer = 1.0
# Optional: ambient lift to prevent pure-black shadows
amb = root.create(ambientlightCOMP, 'ambient')
amb.par.dimmer = 0.15
```
---
## Shadows
Spot and directional lights cast shadows when `par.shadowtype != 'none'`.
```python
key.par.shadowtype = 'softshadow' # 'none' | 'hardshadow' | 'softshadow'
key.par.shadowsize = 1024 # shadow map resolution
key.par.shadowsoftness = 0.02 # softshadow only
```
**Tips:**
- Soft shadows are GPU-expensive. Start with `shadowsize = 1024` and only go higher (2048/4096) if shadow edges look pixelated at your resolution.
- Set the spot light's `near`/`far` to JUST contain the scene. Wider range = wasted shadow map precision.
- Multiple shadow-casting lights compound cost. Limit to 1-2 in real-time work; pre-bake the rest into the materials.
---
## Image-Based Lighting (IBL) / Environment Light
For realistic PBR materials you need a cubemap for reflections.
```python
# Environment light from an HDR
env = root.create(envlightCOMP, 'env')
env.par.envmap = '/project1/cube_in' # path to a TOP that produces a cubemap
env.par.envlightmap = ... # diffuse irradiance map (often same as envmap)
env.par.dimmer = 1.0
# Cubemap source — option A: built-in cubeTOP from 6 faces
cube = root.create(cubeTOP, 'cube_in')
# (assign 6 face TOPs)
# Option B: HDR equirectangular → cubemap conversion
# Use a moviefileinTOP loading .hdr or .exr, then projectTOP type='cubemapfromequirect'
hdr = root.create(moviefileinTOP, 'hdr_src')
hdr.par.file = '/path/to/environment.hdr'
proj = root.create(projectTOP, 'cube_proj')
proj.par.projecttype = 'cubemapfromequirect'
proj.inputConnectors[0].connect(hdr)
```
PBR materials sample the environment automatically when `envlightCOMP` is in the scene. Verify param names with `td_get_par_info(op_type='envlightCOMP')` — TD versions vary.
---
## PBR Material Setup
```python
mat = root.create(pbrMAT, 'pbr_metal')
mat.par.basecolorr = 0.95; mat.par.basecolorg = 0.65; mat.par.basecolorb = 0.4
mat.par.metallic = 1.0
mat.par.roughness = 0.25
mat.par.specularlevel = 0.5
mat.par.emitcolorr = 0; mat.par.emitcolorg = 0; mat.par.emitcolorb = 0
# Texture maps
mat.par.basecolormap = '/project1/textures/albedo' # TOP path
mat.par.metallicroughnessmap = '/project1/textures/mr' # G=roughness, B=metallic (glTF convention)
mat.par.normalmap = '/project1/textures/normal'
mat.par.emitmap = '/project1/textures/emit'
mat.par.occlusionmap = '/project1/textures/ao'
```
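To make the channel layout of the metallic-roughness map concrete, here is a small sketch that packs two scalar values into one 8-bit RGB texel using the glTF layout noted in the comment above (G = roughness, B = metallic). The function name `pack_metallic_roughness` is hypothetical, and the red channel is left at 0 because that map does not read it.

```python
def pack_metallic_roughness(roughness, metallic):
    """Pack scalars into one 8-bit RGB texel: G=roughness, B=metallic."""
    def to_byte(value):
        # Clamp to [0, 1] before scaling to the 0-255 range.
        clamped = min(max(value, 0.0), 1.0)
        return round(clamped * 255)

    # R is not read by the metallic-roughness map, so it stays 0 here.
    return (0, to_byte(roughness), to_byte(metallic))


# Values from the pbr_metal example: roughness 0.25, metallic 1.0.
print(pack_metallic_roughness(0.25, 1.0))
```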
**Material idioms:**
| Look | metallic | roughness | basecolor |
|---|---|---|---|
| Brushed steel | 1.0 | 0.4 | (0.7, 0.7, 0.7) |
| Polished gold | 1.0 | 0.1 | (1.0, 0.85, 0.4) |
| Plastic | 0.0 | 0.5 | mid-saturated |
| Rubber | 0.0 | 0.9 | dark |
| Glass | 0.0 | 0.05 | (1, 1, 1), low alpha + transmission |
| Glowing emitter | 0.0 | 1.0 | dark, high `emitcolor` |
For glass/transmission, recent TD versions support `transmission` in PBR; older versions need glslMAT.
---
## Multi-Camera Setups
For comparison views, instant replay, multi-screen mapping, etc.
```python
# Camera A — main scene
cam_a = root.create(cameraCOMP, 'cam_main')
cam_a.par.tz = 5
# Camera B — orbiting top-down
cam_b = root.create(cameraCOMP, 'cam_top')
cam_b.par.ty = 6; cam_b.par.rx = -90
# Render each via separate Render TOPs
render_a = root.create(renderTOP, 'render_main')
render_a.par.camera = cam_a.path
render_a.par.geometry = geo.path
render_b = root.create(renderTOP, 'render_top')
render_b.par.camera = cam_b.path
render_b.par.geometry = geo.path
```
Composite both with an `overTOP`/`compositeTOP` for picture-in-picture, or route to separate `windowCOMP`s for multi-display.
### Camera animation
Drive camera params via expressions (orbit), animationCOMP (waypoint), or LFO (oscillation):
```python
# Orbiting camera
cam_a.par.tx.mode = ParMode.EXPRESSION
cam_a.par.tx.expr = "cos(absTime.seconds * 0.3) * 6"
cam_a.par.tz.mode = ParMode.EXPRESSION
cam_a.par.tz.expr = "sin(absTime.seconds * 0.3) * 6"
cam_a.par.lookat = '/project1/scene_geo' # auto-aim at target
```
`par.lookat` is the simplest "always look at target" mechanism.
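The orbit expressions can be checked outside TouchDesigner. The sketch below evaluates the same cos/sin pair in plain Python, with `seconds` standing in for `absTime.seconds`; the helper name `orbit_position` is illustrative. It shows that the camera stays at a fixed radius of 6 and completes one revolution roughly every 21 seconds.

```python
import math

RADIUS = 6.0  # matches the "* 6" factor in the expressions above
SPEED = 0.3   # radians per second, matches "absTime.seconds * 0.3"


def orbit_position(seconds):
    """Return (tx, tz) for the orbit expressions at a given time."""
    return (math.cos(seconds * SPEED) * RADIUS,
            math.sin(seconds * SPEED) * RADIUS)


period = 2 * math.pi / SPEED  # seconds per full revolution
for t in (0.0, period / 4, period / 2):
    tx, tz = orbit_position(t)
    print(f"t={t:5.2f}s  tx={tx:6.2f}  tz={tz:6.2f}  radius={math.hypot(tx, tz):.2f}")
```

Changing `SPEED` changes the period; changing `RADIUS` changes the orbit distance without affecting timing.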
### Depth of field
PBR + Render TOP supports DOF when `par.dof = 'on'`.
```python
render.par.dof = 'on'
render.par.focusdistance = 5.0
render.par.aperture = 0.05 # blur strength
render.par.bokehshape = 'hexagon'
```
DOF is GPU-heavy. Render at lower res then upscale for performance.
---
## Common Pitfalls
1. **Render TOP shows black** — most common cause: no light. Even with PBR you need at least one `lightCOMP` or `envlightCOMP`. Add an `ambientlightCOMP` at low dimmer as a safety net.
2. **Material doesn't appear**: `geo.par.material` must be a string PATH, not the material op itself. Use `mat.path`, not `mat`.
3. **Lights ignored** — by default Render TOP picks up ALL `lightCOMP`s in the network. If you have leftover lights from another scene, they leak in. Set `par.lights` explicitly.
4. **PBR looks flat** — without an `envlightCOMP` providing reflections, PBR materials look like Phong. Add one even if you don't have an HDR (use a `constantTOP` cubemap as fallback).
5. **Shadow acne / striping** — increase `par.shadowbias` slightly. Tune per-light.
6. **Camera inside geometry** — if `cam.par.tz` is INSIDE a sphere, you see the inside (or nothing if backface culled). Move the camera further out.
7. **Light range too small** — point lights have implicit attenuation. Far-away geometry receives little light. Increase `par.dimmer` or move lights closer.
8. **Multiple cameras conflict** — one render TOP = one camera. Don't try to share. Use multiple render TOPs.
9. **Wrong handedness** — TD is right-handed Y-up. Imported assets from Z-up apps (Blender, Maya in Z-up) need a 90° X rotation on the geo COMP.
10. **Cooking budget** — PBR + IBL + shadows + DOF at 1080p60 is fine on modern GPUs but 4K + 4 lights + soft shadows + DOF will tank. Profile via `td_get_perf` and downgrade settings before adding more.
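For pitfall 9, the direction of the rotation is easy to get wrong. A sketch of the math, assuming a right-handed rotation about X and a hypothetical `rotate_x` helper: rotating by -90° maps a Z-up "up" vector onto Y-up. Whether a given asset needs -90° or +90° depends on how it was exported, so treat the sign as something to verify on the actual model.

```python
import math


def rotate_x(point, degrees):
    """Rotate (x, y, z) about the X axis, right-handed convention."""
    x, y, z = point
    a = math.radians(degrees)
    return (x,
            y * math.cos(a) - z * math.sin(a),
            y * math.sin(a) + z * math.cos(a))


# "Up" in a Z-up app such as Blender.
z_up = (0.0, 0.0, 1.0)
print([round(c, 6) for c in rotate_x(z_up, -90)])
```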
---
## Quick Recipes
| Goal | Recipe |
|---|---|
| Studio portrait | 3-point rig (key + fill + rim) + ambient + PBR mat + DOF |
| Outdoor daylight | One directional `lightCOMP` (sun) + envlight (sky HDR) + soft shadows |
| Dramatic / film noir | Single spot light from upper side, hard shadows, deep ambient = 0.05 |
| Abstract / dreamy | Multiple area lights at low dimmer, no shadows, `bloomTOP` post |
| Product render | Three-point + IBL + neutral PBR + `bgcolorr=g=b=1` (white seamless) |
| Game-style | Phong MAT + 1-2 lights + no IBL + flat ambient (cheap, stylized) |
| Wireframe + solid | Two render TOPs (one with wireframeMAT, one with PBR), composite via `addTOP` |
| Orbiting camera | `par.lookat` + expressions on tx/tz using sin/cos |
@@ -0,0 +1,221 @@
# Animation Reference
Patterns for time-based motion — keyframes, LFOs, timers, easing, expression-driven animation.
Always call `td_get_par_info` for the op type before setting params. Param names below reflect TD 2025.32 but verify if errors fire.
---
## Time Sources
TD exposes two clocks, absolute and component-local, each readable in seconds or frames. Pick the right one.
| Expression | Behavior | Use for |
|---|---|---|
| `absTime.seconds` | Wall-clock seconds since TD started. Never resets. | Continuous motion, GLSL `uTime`, infinite loops |
| `absTime.frame` | Wall-clock frame count. | Frame-accurate triggers |
| `me.time.frame` | Local component frame count (resets on play/stop). | Per-COMP animation timeline |
| `me.time.seconds` | Local component seconds. | Same, in seconds |
**Rule:** for shaders and continuous motion use `absTime.seconds`. For triggered/looping animations inside a COMP use `me.time.*`.
---
## LFO CHOP — Cyclic Motion
The simplest periodic driver. Fast, CPU-cheap, expression-friendly.
```python
lfo = root.create(lfoCHOP, 'rot_driver')
lfo.par.type = 'sin' # 'sin' | 'cos' | 'ramp' | 'square' | 'triangle' | 'pulse'
lfo.par.frequency = 0.25 # cycles per second
lfo.par.amplitude = 1.0
lfo.par.offset = 0.0
lfo.par.phase = 0.0 # 0-1, useful for offsetting parallel LFOs
```
**Drive a parameter via an expression:**
```python
op('/project1/geo1').par.rx.mode = ParMode.EXPRESSION
op('/project1/geo1').par.rx.expr = "op('rot_driver')['chan1'] * 360"
```
**Multiple synced LFOs (X/Y/Z rotation with phase offsets):**
Create one LFO with three channels and phase-offset each, or use three LFOs and offset their `phase` params (0.0, 0.33, 0.66).
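To see what the phase offsets do, here is an idealized sine LFO in plain Python. It is a model of the behavior described above, not TouchDesigner's implementation; `lfo_sine` is a hypothetical name, and phase is assumed to be measured in cycles (0-1).

```python
import math


def lfo_sine(seconds, frequency=0.25, amplitude=1.0, offset=0.0, phase=0.0):
    """Idealized sine LFO; phase is in cycles (0-1), as in the text above."""
    return offset + amplitude * math.sin(2 * math.pi * (frequency * seconds + phase))


# Three drivers a third of a cycle apart, sampled at the same instant.
phases = {"rx": 0.0, "ry": 0.33, "rz": 0.66}
for name, phase in phases.items():
    print(name, round(lfo_sine(1.0, phase=phase), 3))
```

With `frequency = 0.25`, one second is a quarter cycle, so the unshifted driver sits at its peak while the other two are elsewhere on the wave.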
---
## Timer CHOP — Triggered Sequences
For run-once animations, beat-locked sequences, or stage-based logic.
```python
timer = root.create(timerCHOP, 'fade_timer')
timer.par.length = 4.0 # cycle length in seconds
timer.par.cycle = False # run once vs. loop
timer.par.outputseconds = True
```
Output channels: `timer_fraction` (0→1 across the cycle), `running`, `done`, `cycles`.
**Start the timer:**
```python
timer.par.start.pulse()
```
**Drive a fade:**
```python
op('/project1/level1').par.opacity.mode = ParMode.EXPRESSION
op('/project1/level1').par.opacity.expr = "op('fade_timer')['timer_fraction']"
```
**Easing on the timer fraction** — apply in the expression itself:
```python
# Smoothstep: ease in/out
expr = "smoothstep(0, 1, op('fade_timer')['timer_fraction'])"
# Cubic ease-out: 1 - (1-t)^3
expr = "1 - pow(1 - op('fade_timer')['timer_fraction'], 3)"
```
---
## Pattern CHOP — Custom Curves
For arbitrary waveforms (saw ramps, easing curves, custom envelopes).
```python
pat = root.create(patternCHOP, 'envelope')
pat.par.type = 'gaussian' # 'gaussian' | 'ramp' | 'square' | 'sin' | etc.
pat.par.length = 60 # samples
pat.par.cyclelength = 1.0 # seconds at TD framerate
```
Combine with `lookupCHOP` to remap a 0-1 driver through a custom curve.
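Conceptually, the remap is a table lookup with interpolation. The sketch below models that idea in plain Python with a hypothetical `lookup` function and linear interpolation; the real Lookup CHOP may interpolate or index differently, so use it only to reason about the shape of the result.

```python
def lookup(curve, t):
    """Sample a list of curve values at position t in [0, 1], linearly interpolated."""
    t = min(max(t, 0.0), 1.0)
    position = t * (len(curve) - 1)
    low = int(position)
    high = min(low + 1, len(curve) - 1)
    frac = position - low
    return curve[low] * (1 - frac) + curve[high] * frac


# A five-sample envelope standing in for the Pattern CHOP output.
envelope = [0.0, 0.8, 1.0, 0.3, 0.0]
print(lookup(envelope, 0.5), lookup(envelope, 0.125))
```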
---
## Animation COMP — Keyframe-Based
For multi-keyframe motion graphics. Each animationCOMP holds channels with keyframes editable in the Animation Editor.
```python
anim = root.create(animationCOMP, 'intro_anim')
# By default has channels chan1..chanN; access via:
# op('intro_anim').par.length, .par.play, .par.cue, etc.
# Drive a parameter from a channel
op('/project1/text1').par.tx.mode = ParMode.EXPRESSION
op('/project1/text1').par.tx.expr = "op('intro_anim/out1')['chan1']"
```
**Keyframes are typically edited in the UI** (Animation Editor), but they can also be set through the animationCOMP's internal `keyframes` table. For programmatic keyframe creation, use `td_execute_python`:
```python
# Get the channel CHOP inside an animationCOMP
ch = op('/project1/intro_anim/chans')
# Insert a key (advanced API — verify with td_get_par_info(op_type='animationCOMP'))
ch.appendKey('chan1', frame=0, value=0.0, expression=None)
ch.appendKey('chan1', frame=120, value=1.0)
```
For most use cases, drive params with LFO/Timer/Pattern CHOPs instead — simpler and scriptable.
---
## Easing in Expressions
TD's expression evaluator supports Python math. Common easing forms:
```python
# Linear
"t"
# Smoothstep (classic ease-in-out)
"smoothstep(0, 1, t)"
# Ease-out cubic
"1 - pow(1 - t, 3)"
# Ease-in cubic
"pow(t, 3)"
# Ease-in-out cubic
"3*t*t - 2*t*t*t"
# Bounce (manual, simplified)
"abs(sin(t * 6.28 * 3) * (1 - t))"
```
Where `t` is `op('fade_timer')['timer_fraction']` or any 0-1 driver.
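The easing forms can be compared numerically in plain Python. Two caveats, both assumptions on my part: standard Python has no built-in `smoothstep`, so the sketch defines one in case the expression environment lacks it, and the "ease-in-out cubic" polynomial `3*t*t - 2*t*t*t` is the same curve as `smoothstep(0, 1, t)`.

```python
def smoothstep(edge0, edge1, x):
    """Hermite smoothstep, clamped to the [edge0, edge1] range."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3 - 2 * t)


def ease_out_cubic(t):
    """Fast start, slow finish."""
    return 1 - pow(1 - t, 3)


def ease_in_cubic(t):
    """Slow start, fast finish."""
    return pow(t, 3)


for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t,
          round(smoothstep(0, 1, t), 4),
          round(ease_out_cubic(t), 4),
          round(ease_in_cubic(t), 4))
```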
---
## Filter CHOP — Smoothing Existing Channels
Smooth out jittery values (e.g., audio analysis, sensor data) before driving visuals.
```python
filt = root.create(filterCHOP, 'smooth')
filt.par.filter = 'gaussian' # or 'lowpass'
filt.par.width = 0.5 # smoothing window in seconds
filt.inputConnectors[0].connect(op('raw_signal'))
```
**WARNING:** Do NOT use Filter CHOP on AudioSpectrum output in timeslice mode — it expands the sample count and averages bins to near-zero. See `audio-reactive.md`.
---
## Lag CHOP — Asymmetric Attack/Release
Different speeds for rising vs. falling values. Standard for visualizing audio envelopes.
```python
lag = root.create(lagCHOP, 'env_smooth')
lag.par.lag1 = 0.02 # attack (rise time, seconds)
lag.par.lag2 = 0.30 # release (fall time, seconds)
lag.inputConnectors[0].connect(op('raw_envelope'))
```
Fast attack, slow release = classic VU-meter feel.
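The attack/release asymmetry can be modeled with a one-pole smoother that switches time constants depending on direction. This is a simplified sketch, not the Lag CHOP's actual algorithm; the `lag` function and its exponential response are assumptions, with `attack` and `release` mirroring the `lag1` and `lag2` values above.

```python
import math


def lag(samples, attack=0.02, release=0.30, rate=60.0):
    """One-pole smoother with separate rise and fall time constants (seconds)."""
    out, value = [], samples[0]
    dt = 1.0 / rate
    for target in samples:
        tau = attack if target > value else release
        # Exponential approach toward the target with time constant tau.
        value += (target - value) * (1.0 - math.exp(-dt / tau))
        out.append(value)
    return out


# A short burst followed by silence: rises quickly, decays slowly.
signal = [0.0] + [1.0] * 3 + [0.0] * 6
print([round(v, 3) for v in lag(signal)])
```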
---
## Per-Frame Driving via Execute DAT
For complex per-frame logic that doesn't fit expressions, use an `executeDAT` (`onFrameStart` callback) or a `chopExecuteDAT`.
```python
# In an executeDAT (frameStart):
import math

def onFrameStart(frame):
    # Move the circle around a radius-3 orbit, one revolution every pi seconds.
    t = absTime.seconds
    op('/project1/circle').par.tx = math.sin(t * 2.0) * 3.0
    op('/project1/circle').par.ty = math.cos(t * 2.0) * 3.0
    return
```
Heavy logic should still be in CHOPs (CPU-cheap, deterministic). Reserve scripts for one-shots or non-realtime branching.
---
## Pitfalls
1. **Frame rate dependency**: `me.time.frame` is in TD project frames (default 60). If your project rate changes, motion speed changes. Use `seconds` for rate-independent timing.
2. **Cooking budget** — every CHOP that drives a parameter cooks every frame. Consolidate drivers (one big mathCHOP > many small ones).
3. **Expression mode** — params default to `CONSTANT`. `par.X.expr = ...` is ignored unless `par.X.mode = ParMode.EXPRESSION`.
4. **Animation editor edits** — keyframes set via UI live in the animationCOMP's internal keyframe table. They survive save/reopen. Programmatic keys via `appendKey()` work but verify the API with `td_get_docs(topic='animation')` first.
5. **Looping animations** — for seamless loops, `length` must equal `cyclelength` and the start/end values must match. Otherwise expect a visible jump.
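Pitfall 1 is easy to demonstrate with arithmetic. In this sketch (function names are illustrative), a rotation driven by a frame counter covers half the angle when the project rate drops from 60 to 30, while a rotation driven by seconds is unaffected.

```python
def angle_from_frames(frame, degrees_per_frame=1.0):
    """Rotation driven by a frame counter: speed scales with project rate."""
    return frame * degrees_per_frame


def angle_from_seconds(seconds, degrees_per_second=60.0):
    """Rotation driven by seconds: same speed at any project rate."""
    return seconds * degrees_per_second


elapsed = 2.0  # seconds of playback
for fps in (30, 60):
    frames = elapsed * fps
    print(fps, angle_from_frames(frames), angle_from_seconds(elapsed))
```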
---
## Quick Recipes
| Goal | Simplest path |
|---|---|
| Continuous rotation | LFO CHOP `type='ramp'`, expr → `geo.par.rx` |
| Fade in over 2s | Timer CHOP `length=2`, smoothstep expr → `level.par.opacity` |
| Pulse on every beat | `triggerCHOP` from audio → drive scale via expression |
| 3D Lissajous orbit | Two LFOs with different freq, drive `tx`/`ty`/`tz` |
| Random jitter | `noiseCHOP` (low-freq) added to position |
| Timed scene switch | Timer CHOP → switchTOP/CHOP `index` |
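The "fade in over 2s" recipe relies on a smoothstep curve. As a rough sketch of the math that such an expression would evaluate (the helper names `smoothstep` and `fade_opacity` are hypothetical, and in TD the input would come from a Timer CHOP channel rather than a Python argument):

```python
def smoothstep(t):
    """Hermite ease-in/ease-out on the clamped range 0..1."""
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)

def fade_opacity(elapsed, length=2.0):
    """Opacity for a fade-in that takes `length` seconds."""
    return smoothstep(elapsed / length)
```

The curve starts and ends with zero slope, which avoids the visible "pop" of a linear ramp at the start and end of the fade.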
@@ -0,0 +1,352 @@
# DAT-Based Scripting Reference
TD's event/callback model — Python that runs in response to network events. The full set of "Execute DATs" plus their idiomatic patterns.
For arbitrary Python execution (not callback-based), see `python-api.md`. For the MCP's `td_execute_python` tool, see `mcp-tools.md`.
---
## The Execute DAT Family
Every type watches one kind of event source and fires Python on changes.
| DAT | Watches | Use for |
|---|---|---|
| `chopExecuteDAT` | A CHOP's channel values | Audio triggers, threshold callbacks, state machines on numeric input |
| `datExecuteDAT` | A DAT's content (table cells, text) | Reacting to data updates from APIs, parsing webDAT responses |
| `parameterExecuteDAT` | A parameter's value or pulse | Reacting to user-changed params, custom pulse buttons |
| `panelExecuteDAT` | A panel COMP's interaction | Button clicks, slider drags, field commits |
| `opExecuteDAT` | Operator lifecycle | New operator created, deleted, name changed |
| `executeDAT` | Project lifecycle, frame events | Run-once setup, per-frame logic, save/load hooks |
All have a docked DAT with predefined callback functions. You only fill in the bodies of the ones you care about.
---
## chopExecuteDAT — Numeric Triggers
```python
ce = root.create(chopExecuteDAT, 'kick_handler')
ce.par.chop = '/project1/audio/out_kick' # source CHOP
ce.par.offtoon = True # fire when channel rises above 0
ce.par.ontooff = False
ce.par.whileon = False
ce.par.valuechange = False
```
In the docked callback DAT:
```python
def offToOn(channel, sampleIndex, val, prev):
    """Channel went from 0 to non-zero. Classic beat trigger."""
    op('/project1/strobe').par.flash.pulse()
    op('/project1/scene').par.index = (op('/project1/scene').par.index + 1) % 8
    return

def onToOff(channel, sampleIndex, val, prev):
    """Channel went from non-zero to 0."""
    return

def whileOn(channel, sampleIndex, val, prev):
    """Fires every frame while channel is non-zero. Use sparingly."""
    return

def valueChange(channel, sampleIndex, val, prev):
    """Fires every frame the value changes (continuous). Heavy."""
    return
```
`channel` is a `Channel` object — `.name`, `.owner`, `.vals[]`. Use `channel.name == 'chan1'` to filter.
**Threshold-based custom triggers:** wire the source CHOP through a `triggerCHOP` first to get clean 0/1 pulses, then watch with `offtoon`.
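The off-to-on semantics can be illustrated outside TD with a small rising-edge detector. This is a sketch of the idea only; the function name, the threshold of 0.5, and the sample data are assumptions, and it does not reproduce the Trigger CHOP's full behaviour.

```python
def rising_edges(samples, threshold=0.5):
    """Return indices where the signal crosses from at-or-below threshold to above it."""
    edges = []
    prev_on = False
    for i, value in enumerate(samples):
        on = value > threshold
        # Fire only on the transition, not on every sample above the threshold.
        if on and not prev_on:
            edges.append(i)
        prev_on = on
    return edges

# Hypothetical kick envelope with two hits.
kick = [0.0, 0.1, 0.9, 0.8, 0.2, 0.0, 0.7, 0.6]
hits = rising_edges(kick)
```

Each hit produces exactly one index, which is the property that makes `offtoon` suitable for beat triggers.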
---
## datExecuteDAT — Table/Text Changes
```python
de = root.create(datExecuteDAT, 'api_response')
de.par.dat = '/project1/api/web1' # source DAT
de.par.tablechange = True # any cell change
de.par.cellchange = False
de.par.rowchange = False
de.par.colchange = False
```
```python
def onTableChange(dat):
    """Whole table changed (including text DAT content updates)."""
    if dat.numRows == 0:
        return
    # If it's a webDAT response, parse JSON
    import json
    try:
        data = json.loads(dat.text)
    except json.JSONDecodeError:
        debug(f'Bad JSON: {dat.text[:100]}')
        return
    # Write to a CHOP
    op('/project1/api_value').par.value0 = float(data.get('count', 0))
    return

def onCellChange(dat, cells, prev):
    """Specific cells changed."""
    for cell in cells:
        # cell.row, cell.col, cell.val
        pass
    return
```
`debug()` prints to the textport — readable via `td_read_textport`.
---
## parameterExecuteDAT — Param Changes & Pulse
```python
pe = root.create(parameterExecuteDAT, 'comp_params')
pe.par.op = '/project1/my_component' # COMP whose params to watch
pe.par.parameters = '*' # or specific names like 'Intensity Reset'
pe.par.valuechange = True
pe.par.pulse = True
```
```python
def onValueChange(par, prev):
    """par is a Par object. par.name, par.eval(), par.owner."""
    if par.name == 'Intensity':
        op('/project1/bloom').par.threshold = par.eval()
    return

def onPulse(par):
    """Pulse param was triggered."""
    if par.name == 'Reset':
        op('/project1/scene').par.index = 0
        op('/project1/audio_player').par.cuepoint = 0
        op('/project1/audio_player').par.cuepulse.pulse()
    return

def onExpressionChange(par, val, prev):
    """User changed the expression on a param."""
    return

def onExportChange(par, val, prev):
    """Export source changed."""
    return

def onModeChange(par, val, prev):
    """Param mode changed (CONSTANT / EXPRESSION / EXPORT / etc)."""
    return
```
---
## panelExecuteDAT — UI Events
For interactive control surfaces. See `panel-ui.md` for the full panel COMP context.
```python
pe = root.create(panelExecuteDAT, 'btn_handler')
pe.par.panel = '/project1/play_btn'
pe.par.click = True # mouse click events
pe.par.value = True # state changes (toggle)
pe.par.lockedchange = False
```
```python
def onOffToOn(panelValue):
    """Panel value rose to 1 (button pressed, slider crossed threshold)."""
    op('/project1/scene_timer').par.start.pulse()
    return

def onOnToOff(panelValue):
    """Panel value dropped to 0."""
    return

def onValueChange(panelValue):
    """Continuous: every frame the value changes."""
    val = panelValue.eval()
    op('/project1/master').par.opacity = val
    return

def onClick(panelValue):
    """Discrete click event, fires once per click."""
    return
```
`panelValue` is a `Par` object on the panel COMP.
---
## opExecuteDAT — Operator Lifecycle
Watches creation/deletion/renaming of operators in a parent COMP.
```python
oe = root.create(opExecuteDAT, 'lifecycle')
oe.par.op = '/project1'
oe.par.create = True
oe.par.destroy = True
oe.par.namechange = True
oe.par.flagchange = False
```
```python
def onCreate(opCreated):
    """A new operator was created. Useful for auto-applying conventions."""
    if opCreated.OPType == 'glslTOP':
        # Always wrap with a null
        n = opCreated.parent().create(nullTOP, opCreated.name + '_out')
        n.inputConnectors[0].connect(opCreated)
    return

def onDestroy(opDestroyed):
    """Operator was deleted. opDestroyed.path is still valid for one frame."""
    return

def onNameChange(opChanged):
    """Operator was renamed."""
    return
```
Useful for dev-time scaffolding (auto-create downstream nullTOPs, auto-name conventions). Disable in production projects to avoid surprise side effects.
---
## executeDAT — Project Lifecycle & Per-Frame
The catch-all. Gets you hooks into project start, save, load, frame-start, frame-end.
```python
exec_dat = root.create(executeDAT, 'lifecycle')
exec_dat.par.start = True
exec_dat.par.create = True
exec_dat.par.framestart = True
exec_dat.par.frameend = False
```
```python
def onStart():
    """Project just started cooking. Run once."""
    op('/project1/scene').par.index = 0
    debug('Project started')
    return

def onCreate():
    """Component was just created (only fires for component executeDATs, not project root)."""
    return

def onFrameStart(frame):
    """Per-frame, BEFORE network cooks. Heavy logic here = bottleneck."""
    return

def onFrameEnd(frame):
    """Per-frame, AFTER network cooks. Use for capture, recording, post-network logic."""
    return

def onPlayStateChange(playing):
    """Project play/pause toggled."""
    return

def onProjectPreSave():
    """Right before saving the .toe file."""
    return

def onProjectPostSave():
    """Right after the .toe file was saved."""
    return
```
Heavy per-frame logic in `onFrameStart` is one of the top performance regressions in TD projects. Use CHOPs for per-frame computation, scripts for events.
---
## Pattern: Triggering an Animation Sequence on Beat
```python
# Source: a kick trigger CHOP
# Goal: on each kick, run a 1.5s scale pulse + color flash
# Setup (create once)
animator = root.create(timerCHOP, 'pulse_anim')
animator.par.length = 1.5
animator.par.cycle = False
# Param expressions on visual targets:
op('logo').par.sx.expr = "1.0 + (1 - op('pulse_anim')['timer_fraction']) * 0.3"
op('logo').par.sx.mode = ParMode.EXPRESSION
op('logo').par.sy.expr = "1.0 + (1 - op('pulse_anim')['timer_fraction']) * 0.3"
op('logo').par.sy.mode = ParMode.EXPRESSION
# In a chopExecuteDAT watching the kick CHOP:
def offToOn(channel, sampleIndex, val, prev):
    op('pulse_anim').par.start.pulse()
    return
```
---
## Pattern: Live Editing a CHOP from API Data
```python
# webDAT polls an API every 5 seconds
# datExecuteDAT parses the response and writes to a constantCHOP
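def onTableChange(dat):
    import json
    try:
        data = json.loads(dat.text)
        # Convert up front so a missing or non-numeric field is caught here
        temp_c = float(data['temp_c'])
        humidity = float(data['humidity'])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as e:
        debug(f'Bad API payload: {e}')
        return
    target = op('/project1/external_state')
    target.par.name0 = 'temperature'
    target.par.value0 = temp_c
    target.par.name1 = 'humidity'
    target.par.value1 = humidity
    return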
```
Visuals just reference `op('external_state')['temperature']` — they update live.
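The parsing half of this pattern can be developed and tested outside TD. The sketch below is one way to do that, assuming the payload is a JSON object with numeric `temp_c` and `humidity` fields as in the example above; the helper name `extract_readings` is hypothetical.

```python
import json

def extract_readings(text, fields=("temp_c", "humidity")):
    """Return {field: float} for the requested fields, or None if the payload is unusable."""
    try:
        data = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(data, dict):
        return None
    readings = {}
    for field in fields:
        try:
            readings[field] = float(data[field])
        except (KeyError, TypeError, ValueError):
            # Missing or non-numeric field: reject the whole payload.
            return None
    return readings
```

A callback can then call the helper and skip the write entirely when it returns `None`, so a malformed response never leaves the CHOP half-updated.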
---
## Pattern: Self-Cleaning Network
```python
# An opExecuteDAT watching for orphaned helper ops, deleting them after their parent disappears
def onDestroy(opDestroyed):
    parent_name = opDestroyed.name
    helper = op(f'/project1/{parent_name}_helper')
    if helper:
        helper.destroy()
    return
```
---
## Pitfalls
1. **Callbacks crash silently** — exceptions print to the textport but don't show up in the UI. Always `td_clear_textport` before debugging, then `td_read_textport` after.
2. **`debug()` vs `print()`** — both write to textport, but `debug()` includes the file/line of the calling DAT. Prefer `debug()` for scripts.
3. **`val` is the new value, `prev` is old** — easy to swap. Always: `def offToOn(channel, sampleIndex, val, prev)`. Check parameter order in TD docs if confused.
4. **`whileOn` and `valueChange` are per-frame** — heavy. Avoid unless absolutely needed. Drive via expressions instead.
5. **Callbacks don't run during cooking-paused state** — if the parent COMP has `allowCooking=False`, callbacks freeze. Useful for "disable me" toggles.
6. **`par` vs `panelValue`** — parameterExecuteDAT gives `par` (a Par object), panelExecuteDAT gives `panelValue` (also a Par-like object). Both have `.name` and `.eval()` but their context differs.
7. **`opExecuteDAT` fires for itself** — when you create an opExecuteDAT, it can fire `onCreate` for itself if `par.create=True` and parent matches. Filter by `if opCreated == me: return`.
8. **Reload behavior** — when reloading an extension (`td_reinit_extension`), all callback DATs reset their internal state. Module-level vars are lost. Persist state in tableDATs or the docked DAT itself, not in module globals.
9. **Cooking dependencies** — if a callback writes to an op that's upstream of the callback's source, you get a cooking loop. TD warns about it but doesn't always block. Keep dataflow one-directional.
10. **Active flag** — every Execute DAT has `par.active`. False = silent. Easy to toggle for testing without deleting wiring.
---
## Quick Recipes
| Goal | Setup |
|---|---|
| Beat trigger | `chopExecuteDAT.par.offtoon=True` watching a `triggerCHOP` |
| API response handler | `datExecuteDAT.par.tablechange=True` watching a `webDAT` |
| Custom button → action | `parameterExecuteDAT.par.pulse=True` watching a custom pulse param |
| Slider → continuous param | `panelExecuteDAT.par.value=True` watching a `sliderCOMP` |
| Run-once setup | `executeDAT.par.start=True` with logic in `onStart()` |
| Per-frame metrics | `executeDAT.par.frameend=True` recording values to a CHOP |
| Auto-name new ops | `opExecuteDAT.par.create=True` enforcing naming conventions |
@@ -0,0 +1,322 @@
# External Data Reference
Network and device I/O — HTTP requests, WebSockets, MQTT, Serial, TCP, UDP. For MIDI/OSC specifically see `midi-osc.md`.
Common production needs:
- API polling / webhook ingestion
- Real-time data streams (sensors, market data, chat)
- IoT device control (Arduino, ESP32, smart lights)
- Inter-application messaging
- Hosting a tiny TD-side HTTP server for remote control
---
## Web DAT — HTTP Requests
```python
web = root.create(webDAT, 'api_call')
web.par.url = 'https://api.example.com/v1/status'
web.par.fetchmethod = 'get' # 'get' | 'post' | 'put' | 'delete'
web.par.format = 'auto' # 'auto' | 'text' | 'json'
web.par.timeout = 5.0
```
**Triggering a request:**
`webDAT` does NOT auto-fetch on cook. Trigger explicitly:
```python
web.par.fetch.pulse()
```
Or pulse it from a `chopExecuteDAT` callback when a CHOP value changes (see `dat-scripting.md`).
**Authentication headers:**
Use `webclientDAT` (more flexible) or set `webDAT` headers via the headers DAT:
```python
web_headers = root.create(tableDAT, 'headers')
web_headers.appendRow(['Authorization', 'Bearer YOUR_TOKEN'])
web_headers.appendRow(['Accept', 'application/json'])
web.par.headers = web_headers.path
```
**Parsing JSON response:**
```python
import json

def onTableChange(dat):
    response = dat.text  # raw response body
    data = json.loads(response)
    # Update a tableDAT or store in a constantCHOP for downstream use
    op('/project1/api_status').par.value0 = data['count']
    return
```
Wire this in a `datExecuteDAT` watching the webDAT.
**Polling pattern:**
```python
# timerCHOP fires every N seconds
timer = root.create(timerCHOP, 'poll_timer')
timer.par.length = 5.0
timer.par.cycle = True
# chopExecuteDAT on the timer's 'cycles' channel pulses the webDAT
def offToOn(channel, sampleIndex, val, prev):
    op('/project1/api_call').par.fetch.pulse()
    return
```
---
## Web Client DAT — More Robust HTTP
`webclientDAT` is the modern replacement for `webDAT` — supports streaming responses, chunked transfer, custom auth.
```python
client = root.create(webclientDAT, 'api')
client.par.method = 'POST'
client.par.url = 'https://api.example.com/events'
client.par.uploadtype = 'json'
client.par.uploaddata = '{"event": "scene_change", "scene": 3}'
client.par.request.pulse()
```
Output goes to its child `webclient1_response` DAT. Use a `datExecuteDAT` to react.
---
## Web Server DAT — TD as HTTP Server
Hosts a tiny HTTP server inside TD. Useful for:
- Status/health endpoints
- Remote control from a phone or another machine
- Webhook receivers from external services
```python
server = root.create(webserverDAT, 'control_server')
server.par.port = 8080
server.par.active = True
# Define handler in the docked callback DAT
```
In the auto-created `webserver1_callbacks` DAT:
```python
def onHTTPRequest(webServerDAT, request, response):
    path = request['uri']
    if path == '/status':
        response['statusCode'] = 200
        response['data'] = '{"fps": 60, "scene": "active"}'
    elif path == '/scene':
        idx = int(request['args'].get('index', 0))
        op('/project1/scene_switch').par.index = idx
        response['statusCode'] = 200
        response['data'] = 'OK'
    else:
        response['statusCode'] = 404
        response['data'] = 'Not Found'
    return response
```
Test from terminal: `curl http://localhost:8080/status`.
**Security:** No auth by default. Bind to localhost only or add a token check in the callback. Never expose to the public internet without auth.
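A token check of the kind suggested above might look like the following sketch. It assumes the request is a dict that exposes its headers under a `'headers'` key, which is an assumption about the callback's request shape and should be verified; `EXPECTED_TOKEN` and `is_authorized` are hypothetical names.

```python
import hmac

EXPECTED_TOKEN = "change-me"  # hypothetical shared secret; load from config in practice

def is_authorized(request, expected=EXPECTED_TOKEN):
    """Check a bearer token in a request dict shaped like {'headers': {...}} (assumed shape)."""
    headers = request.get("headers", {}) or {}
    # Header names are case-insensitive, so normalise before lookup.
    auth = {str(k).lower(): str(v) for k, v in headers.items()}.get("authorization", "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(token.encode(), expected.encode())
```

A request handler would call this first and answer with status 401 when it returns `False`, before touching any operators.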
---
## WebSocket DAT — Bidirectional Real-Time
For low-latency bidirectional streams (chat, live data feeds, controllers).
### Client
```python
ws = root.create(websocketDAT, 'ws_client')
ws.par.netaddress = 'wss://api.example.com/socket'
ws.par.active = True
```
In the docked callbacks DAT:
```python
def onConnect(dat):
    dat.sendText('{"action": "subscribe", "channel": "ticks"}')
    return

def onReceiveText(dat, rowIndex, message):
    # message is a string; parse JSON, dispatch to ops
    import json
    data = json.loads(message)
    op('/project1/price_chop').par.value0 = data['price']
    return

def onDisconnect(dat):
    # Optionally schedule a reconnect
    return
```
### Server
```python
ws = root.create(websocketDAT, 'ws_server')
ws.par.mode = 'server'
ws.par.port = 9001
ws.par.active = True
```
Same callback structure with an additional `clientID` arg.
---
## MQTT — Pub/Sub for IoT
```python
mqtt = root.create(mqttClientDAT, 'iot')
mqtt.par.brokeraddress = 'broker.hivemq.com'
mqtt.par.brokerport = 1883
mqtt.par.clientid = 'td_install_01'
mqtt.par.connect.pulse()
# Subscribe in callbacks DAT:
def onConnect(dat):
    dat.subscribe('home/lights/+', qos=1)
    return

def onReceive(dat, topic, payload, qos, retained, dup):
    # payload is bytes; decode if JSON
    msg = payload.decode('utf-8')
    # Dispatch by topic
    return

# Publish from anywhere:
op('iot').publish('show/scene', 'sunset', qos=0, retain=False)
```
For Mosquitto / HiveMQ self-hosted brokers use the same setup with `tcp://192.168.x.x` and your local port.
---
## Serial DAT — Arduino, USB Devices
```python
serial = root.create(serialDAT, 'arduino')
serial.par.port = '/dev/cu.usbmodem14101' # macOS — check Arduino IDE
# Windows: 'COM3', 'COM4', etc.
serial.par.baudrate = 115200
serial.par.active = True
```
In callbacks:
```python
def onReceive(dat, rowIndex, line):
    # Each newline-terminated line from Arduino arrives here
    parts = line.split(',')
    op('/project1/sensors').par.value0 = float(parts[0])
    op('/project1/sensors').par.value1 = float(parts[1])
    return
```
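The callback above assumes every line is well formed. A more defensive parse, sketched here as a standalone helper (the name `parse_sensor_line` and the two-value format are assumptions based on the example), rejects short lines, non-numeric fields, and non-finite values:

```python
import math

def parse_sensor_line(line, expected=2):
    """Parse 'a,b' into a list of finite floats, or return None if the line is malformed."""
    parts = line.strip().split(",")
    if len(parts) != expected:
        return None
    values = []
    for part in parts:
        try:
            value = float(part)
        except ValueError:
            return None
        # float() accepts 'nan' and 'inf'; reject them explicitly.
        if not math.isfinite(value):
            return None
        values.append(value)
    return values
```

The callback can then write to the CHOP only when the result is not `None`.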
Send to Arduino:
```python
op('arduino').send('LED_ON\n')
```
---
## TCP/IP DAT — Custom Protocols
For talking to non-HTTP servers (game servers, custom protocols, legacy systems).
```python
tcp = root.create(tcpipDAT, 'show_control')
tcp.par.netaddress = '192.168.1.50'
tcp.par.port = 7000
tcp.par.protocol = 'tcp' # 'tcp' | 'udp'
tcp.par.active = True
```
Send / receive via callbacks similar to websocketDAT.
For UDP-only (fire-and-forget, no connection), use `udpoutDAT` + `udpinDAT` — simpler but unreliable across networks.
---
## Common Patterns
### REST API → Visual
```
timerCHOP (5s loop)
→ chopExecuteDAT (pulse webDAT.par.fetch on cycle)
→ webDAT (returns JSON)
→ datExecuteDAT (parse, write to constantCHOP)
→ CHOP drives glsl uniform → visuals
```
### Webhook receiver
```
webserverDAT (port 8080, /webhook endpoint)
→ callback writes to a tableDAT log + triggers a scene change
```
### Real-time stock/crypto ticker
```
websocketDAT (subscribe to feed)
→ onReceiveText callback parses JSON
→ writes to constantCHOP
→ drives bar chart / typography animation
```
### IoT-controlled installation
```
MQTT → callback dispatches by topic
→ /lights/main → constantCHOP drives lighting render
→ /audio/volume → mathCHOP for master fader
```
### Two-way phone control
```
WebSocket server in TD
→ simple HTML page on phone connects, sends slider values
→ callback writes to ops
→ TD pushes status back via dat.sendText() to phone UI
```
---
## Pitfalls
1. **`webDAT` doesn't auto-fetch** — must explicitly pulse `par.fetch`. Easy to forget.
2. **Blocking on slow APIs:** `webDAT` runs on the cook thread. A 30s API call freezes TD for 30s. Use `webclientDAT` (async) for anything potentially slow.
3. **WebSocket reconnection** — TD does NOT auto-reconnect on disconnect. Implement backoff in `onDisconnect`.
4. **Serial port permissions on macOS** — TD needs Full Disk Access OR the port needs to be unlocked via `sudo chmod 666 /dev/cu.usbmodem...` per session.
5. **MQTT broker connection state:** `mqttClientDAT` may show `connected=true` but messages don't flow if QoS is wrong or topic ACL blocks. Check broker logs.
6. **JSON parse errors crash callbacks silently** — wrap parses in try/except and log to textport. Otherwise the callback just stops firing.
7. **Firewall on Windows** — first time `webserverDAT` binds, Windows pops a firewall dialog. Approve it or the server is unreachable.
8. **CORS:** `webserverDAT` doesn't add CORS headers by default. If serving a webapp from a different origin, add `Access-Control-Allow-Origin: *` in the response.
9. **Polling vs push** — polling burns API quota. Always prefer WebSocket / webhook / MQTT for high-frequency data.
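10. **Floating-point parsing:** sensor data over Serial often arrives as strings. `float()` raises `ValueError` on empty or whitespace-only input such as `'\n'`, and it accepts `'NaN'` without error, returning a not-a-number value that then propagates into parameters. Validate before converting.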
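For the reconnection pitfall, the backoff schedule itself is easy to sketch in plain Python. The function name and default values below are hypothetical, and how each delay is scheduled inside TD is deliberately left out.

```python
def backoff_delays(base=1.0, factor=2.0, cap=30.0, attempts=6):
    """Return the wait (seconds) before each reconnect attempt, growing up to a cap."""
    delays = []
    delay = base
    for _ in range(attempts):
        # Never wait longer than the cap, however many attempts have failed.
        delays.append(min(delay, cap))
        delay *= factor
    return delays

schedule = backoff_delays()
```

Capping the delay keeps a long outage from pushing the next attempt minutes into the future, while the early short delays recover quickly from brief drops.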
---
## Quick Recipes
| Goal | Op chain |
|---|---|
| Periodic API fetch | `timerCHOP` → `chopExecuteDAT` pulses → `webDAT` → `datExecuteDAT` parses |
| Webhook receiver | `webserverDAT` (port + path), callback writes to ops |
| Real-time stream | `websocketDAT` client → onReceiveText → CHOP/DAT |
| Arduino sensor → visual | `serialDAT` → callback → `constantCHOP` → expression on visual op |
| TD ↔ phone control | `websocketDAT` server + simple HTML page on phone |
| MQTT IoT integration | `mqttClientDAT` subscribe → callback dispatches by topic |
@@ -0,0 +1,211 @@
# MIDI / OSC Reference
External controller input and output — MIDI hardware, TouchOSC mobile UIs, OSC routing across the network.
For audio-driven MIDI patterns (track triggers from spectrum analysis), see also `audio-reactive.md`.
---
## MIDI Input — Hardware Controllers
### Discovery
List connected MIDI devices first. Use a `midiinDAT` to enumerate:
```python
mdat = root.create(midiinDAT, 'mid_devices')
# Read available device names from the DAT after one cook
```
Or via Python directly:
```python
# In td_execute_python
import td
devices = [d for d in op.MIDI.devices] # verify with td_get_docs('midi')
```
Verify the API with `td_get_docs(topic='midi')` since this varies between TD versions.
### MIDI In CHOP
Standard pattern:
```python
midi_in = root.create(midiinCHOP, 'midi_in')
midi_in.par.device = 0 # device index from discovery
midi_in.par.activechan = True
```
Output channels follow the convention `chCcN` and `chCnN`:
- `ch1c74` — channel 1, CC 74
- `ch1n60` — channel 1, note 60 (middle C) — value is velocity 0-127
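When building mappings programmatically it can help to split these names back into their parts. A minimal sketch, assuming names follow exactly the `ch<channel>c<cc>` / `ch<channel>n<note>` convention described above (the helper name is hypothetical):

```python
import re

_MIDI_NAME = re.compile(r"^ch(\d+)([cn])(\d+)$")

def parse_midi_channel(name):
    """Split names like 'ch1c74' into (midi_channel, kind, number); None if no match."""
    match = _MIDI_NAME.match(name)
    if not match:
        return None
    kind = "cc" if match.group(2) == "c" else "note"
    return int(match.group(1)), kind, int(match.group(3))
```

For example, `parse_midi_channel('ch1c74')` yields channel 1, kind `'cc'`, number 74, which can be stored in a mapping table.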
**Map a CC to a parameter:**
```python
op('/project1/bloom1').par.threshold.mode = ParMode.EXPRESSION
op('/project1/bloom1').par.threshold.expr = "op('midi_in')['ch1c74'][0] / 127.0"
```
**Map a note as a trigger:**
Notes in `midiinCHOP` output velocity while held, 0 when released. Use a `triggerCHOP` to convert a held note into pulses:
```python
trig = root.create(triggerCHOP, 'note_trig')
trig.par.threshold = 1
trig.par.triggeron = 'increase'
trig.inputConnectors[0].connect(op('midi_in'))
# Filter to a single channel via a selectCHOP if desired
```
### MIDI Learn Pattern
Build a reusable learn pattern when you don't know the controller's CC layout in advance:
1. Drop a `midiinCHOP` and `selectCHOP` after it.
2. User wiggles the controller knob.
3. Use `td_read_chop` on the midiinCHOP to identify which channel is non-zero — that's the active CC.
4. Set the `selectCHOP.par.channames` to that channel name.
5. Save the mapping to a `tableDAT` so it persists across sessions.
---
## MIDI Output
```python
midi_out = root.create(midioutCHOP, 'midi_out')
midi_out.par.device = 0
midi_out.par.outputformat = 'continuous' # 'continuous' | 'event'
# Drive an output: send out a CC mapped from any 0-1 source
src = root.create(constantCHOP, 'cc_src')
src.par.name0 = 'ch1c20'
src.par.value0 = 0.5
midi_out.inputConnectors[0].connect(src)
```
For note events specifically, use `event` mode and pulse the value with a `pulseCHOP` or `triggerCHOP`.
---
## OSC Input — Network Control
OSC is the more flexible cousin of MIDI. Used heavily for:
- TouchOSC / Lemur mobile control surfaces
- Show control systems (QLab, Watchout)
- Inter-application sync (Ableton via Max for Live, Resolume, etc.)
### OSC In CHOP
```python
osc_in = root.create(oscinCHOP, 'osc_in')
osc_in.par.port = 7000 # listen on UDP 7000
osc_in.par.localaddress = '' # empty = all interfaces
osc_in.par.queued = False # immediate vs. queued processing
```
Each incoming OSC address becomes a channel. `/scene/1/intensity` becomes a channel named `scene_1_intensity` (TD sanitizes slashes to underscores).
**Common gotcha:** TD only creates the channel after the FIRST message arrives at that address. Send a "hello" message from the controller during setup, or pre-declare channel names manually.
### OSC In DAT (for raw events)
Use an `oscinDAT` when you need full message access (multiple typed args, addresses with brackets/regex).
```python
osc_dat = root.create(oscinDAT, 'osc_events')
osc_dat.par.port = 7001
# Each row: timestamp, address, type tags, args...
```
Drive logic via a `datExecuteDAT` watching the `oscinDAT`:
```python
def onTableChange(dat):
    last = dat[dat.numRows - 1, 'message']
    parsed = last.val.split()
    addr = parsed[0]
    args = parsed[1:]
    if addr == '/scene/trigger':
        op('/project1/scene_switcher').par.index = int(args[0])
    return
```
---
## OSC Output — Sending to External Apps
```python
osc_out = root.create(oscoutCHOP, 'osc_out')
osc_out.par.netaddress = '127.0.0.1' # destination IP
osc_out.par.port = 9000
# Channel names become OSC addresses
src = root.create(constantCHOP, 'send')
src.par.name0 = 'scene/intensity' # → /scene/intensity
src.par.value0 = 0.7
osc_out.inputConnectors[0].connect(src)
```
**Channel-to-address mapping:** TD prepends `/` automatically. Use `/` in channel names to nest.
For one-shot string/typed messages, use `oscoutDAT` and call `.sendOSC(address, args)`:
```python
op('osc_out_dat').sendOSC('/scene/trigger', [1, 'fade'])
```
---
## TouchOSC / Mobile UI Pattern
Common setup for live VJ control from a phone/tablet:
1. **Configure TouchOSC layout** — assign each control an OSC address like `/vj/master`, `/vj/scene/1`, etc.
2. **Find your machine's LAN IP** — TouchOSC needs to point at it.
3. **TD listens** on `oscinCHOP.par.port = 8000` (or whichever).
4. **Map channels to params** via expressions:
```python
op('/project1/master_level').par.opacity.mode = ParMode.EXPRESSION
op('/project1/master_level').par.opacity.expr = "op('osc_in')['vj_master']"
```
5. **Send feedback** to the controller via `oscoutCHOP` — useful for syncing state across multiple devices.
---
## Network / Multi-Machine
OSC over LAN works out-of-the-box. For multi-TD-instance sync (e.g., projection cluster):
- One TD acts as **master**, broadcasts `/sync/...` over OSC
- Worker TDs run `oscinCHOP` listening on the same port
- Use UDP **broadcast address** (e.g., `192.168.1.255`) on the master's `oscoutCHOP.par.netaddress` to hit all peers
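The broadcast address depends on the master's IP and subnet mask. One way to derive it rather than guess, using Python's standard `ipaddress` module (the helper name and the example addresses are illustrative):

```python
import ipaddress

def broadcast_address(ip_with_prefix):
    """Return the directed broadcast address for an interface such as '192.168.1.42/24'."""
    interface = ipaddress.ip_interface(ip_with_prefix)
    return str(interface.network.broadcast_address)

addr = broadcast_address("192.168.1.42/24")
```

On a /24 network this gives the familiar `.255` address, but on other prefix lengths the result differs, so computing it avoids silent misrouting.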
For reliability over WAN, use `webserverDAT` or `websocketDAT` with an external relay instead — UDP loss is invisible.
---
## Pitfalls
1. **MIDI device indexing** — device `0` is whichever device TD enumerated first. Reorder may shift it. Pin by name when possible.
2. **OSC channel names** — TD doesn't create a channel until the first message lands. New channels invalidate cooked dependents on first arrival, causing a one-frame stutter.
3. **OSC queued mode:** `par.queued = True` defers processing to a single per-frame batch. Lower latency but messages arriving in the same frame collapse to the last value. Off for triggers, on for continuous knobs.
4. **MIDI clock vs. transport:** `midiinCHOP` reports clock if available. Use `midisyncCHOP` (if your TD version exposes it) or compute BPM from clock pulses (24 per quarter note).
5. **Latency** — wired MIDI is ~1-3ms. WiFi OSC is 10-30ms with jitter. Use wired for tight beat-locked work.
6. **Port conflicts** — only one process can bind a UDP port on most OS. If `oscinCHOP` shows no traffic, check that another app (Max, Ableton, etc.) isn't already listening on that port.
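Computing BPM from clock pulses, as mentioned in pitfall 4, reduces to simple arithmetic on pulse timestamps. A minimal sketch, assuming you have already collected the arrival times in seconds (the function name is hypothetical):

```python
def bpm_from_clock(pulse_times, pulses_per_quarter=24):
    """Estimate BPM from timestamps (seconds) of consecutive MIDI clock pulses."""
    if len(pulse_times) < 2:
        return None
    span = pulse_times[-1] - pulse_times[0]
    if span <= 0:
        return None
    # Average interval across the window smooths out per-pulse jitter.
    seconds_per_pulse = span / (len(pulse_times) - 1)
    return 60.0 / (seconds_per_pulse * pulses_per_quarter)

# Hypothetical pulses for two quarter notes at 120 BPM (0.5 s per quarter note).
times = [i * 0.5 / 24 for i in range(49)]
tempo = bpm_from_clock(times)
```

Averaging over a window of one or two beats is usually a reasonable trade-off between responsiveness and stability.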
---
## Quick Recipes
| Goal | Op chain |
|---|---|
| Knob → bloom intensity | `midiinCHOP` → expression on `bloom.par.threshold` |
| Note → scene change | `midiinCHOP` → `triggerCHOP` → `selectCHOP` → drive `switchTOP.par.index` |
| Phone slider → master fader | TouchOSC `/master` → `oscinCHOP` → expression on output `level.par.opacity` |
| TD → Resolume scene trigger | `oscoutCHOP` channel `composition/layers/1/clips/1/connect` → Resolume listening on 7000 |
| Multi-projector sync | Master TD `oscoutCHOP` broadcast → workers `oscinCHOP` |
@@ -0,0 +1,281 @@
# Panel & UI Reference
Interactive control surfaces inside TouchDesigner — buttons, sliders, fields, custom parameter pages, panel callbacks. For HUD overlays (rendered text on visuals) see `layout-compositor.md`.
Use cases:
- VJ control rack (master fader, scene buttons, FX toggles)
- Installation operator console
- Self-contained TOX components with their own parameter UIs
- Phone-style touch interfaces displayed on a tablet
---
## Two Layers of UI
| Layer | What it is | Use for |
|---|---|---|
| **Custom Parameters** | Params on any COMP, edited like built-in TD params | Configurable components, presets, "settings" panels |
| **Panel COMPs** | Visible widgets (button, slider, field) inside a containerCOMP | Interactive control surfaces, real-time UIs |
Combine both: build a containerCOMP with panel widgets that read/write custom parameters on a parent component.
---
## Custom Parameters
Add user-editable params to any COMP. Params persist with the COMP, drive expressions, and survive save/reload.
```python
# Add a custom page to a baseCOMP
comp = op('/project1/my_component')
page = comp.appendCustomPage('Controls')
# Add typed params
page.appendFloat('Intensity', label='Intensity')[0] # returns a Par
page.appendInt('Count', label='Count')[0]
page.appendToggle('Enabled', label='Enabled')[0]
page.appendMenu('Mode', menuNames=['off', 'soft', 'hard'], menuLabels=['Off', 'Soft', 'Hard'])[0]
page.appendStr('Title', label='Title')[0]
page.appendRGB('Color', label='Color') # returns 3 pars
page.appendXY('Offset', label='Offset') # returns 2 pars
page.appendPulse('Reset', label='Reset')[0]
page.appendFile('TextureFile', label='Texture')[0]
```
**Read/write from anywhere:**
```python
val = op('/project1/my_component').par.Intensity.eval()
op('/project1/my_component').par.Intensity = 0.7
```
**Drive other params via expression:**
```python
op('bloom1').par.threshold.mode = ParMode.EXPRESSION
op('bloom1').par.threshold.expr = "op('/project1/my_component').par.Intensity"
```
**Pulse handler (Reset button):**
Use a `parameterExecuteDAT` watching the COMP's pulse params. See `dat-scripting.md`.
---
## Panel COMPs — The Widgets
Each is a COMP that renders as a clickable/draggable widget inside a `containerCOMP`.
| Type | Type Name | Use |
|---|---|---|
| Button | `buttonCOMP` | Click action — momentary or toggle |
| Slider | `sliderCOMP` | Drag to set 0-1 value (1D or 2D) |
| Field | `fieldCOMP` | Text input |
| Container | `containerCOMP` | Layout + visual styling, holds children |
| Select | `selectCOMP` | Reference and display content from another COMP |
| List | `listCOMP` | Scrollable list with row callbacks |
### Button
```python
btn = root.create(buttonCOMP, 'play_btn')
btn.par.w = 120; btn.par.h = 40
btn.par.buttontype = 'momentary' # 'momentary' | 'toggleup' | 'togglepress' | 'radio'
btn.par.bgcolorr = 0.1; btn.par.bgcolorg = 0.1; btn.par.bgcolorb = 0.1
btn.par.text = 'Play'
# Read state
state = btn.panel.state # 1 when active
```
### Slider
```python
sld = root.create(sliderCOMP, 'master_fader')
sld.par.w = 60; sld.par.h = 300
sld.par.style = 'vertical' # 'vertical' | 'horizontal' | 'xy'
sld.par.value0min = 0.0
sld.par.value0max = 1.0
# Drive a parameter via expression (always-on, no callback needed)
op('/project1/master_level').par.opacity.mode = ParMode.EXPRESSION
op('/project1/master_level').par.opacity.expr = "op('master_fader').panel.u"
```
`panel.u` and `panel.v` give the 0-1 normalized values. For 2D sliders both are populated.
### Field (Text Input)
```python
fld = root.create(fieldCOMP, 'scene_name')
fld.par.w = 200; fld.par.h = 30
fld.par.fieldtype = 'string' # 'string' | 'integer' | 'float'
# Read current text
text = fld.panel.field # the text content
```
### List
For scrollable lists with selectable rows, use the docked `list1_callbacks` DAT to handle row interactions. Set up cells via the `list_definition` table DAT.
---
## Container COMP — Layout & Styling
`containerCOMP` is the primary parent for grouping widgets and arranging layouts.
```python
panel = root.create(containerCOMP, 'control_panel')
panel.par.w = 400; panel.par.h = 600
panel.par.bgcolorr = 0.05
panel.par.bgcolorg = 0.05
panel.par.bgcolorb = 0.05
panel.par.bgalpha = 1.0
# Layout child panels in a vertical stack
panel.par.align = 'toptobottom' # 'lefttoright' | 'toptobottom' | etc.
```
Children are positioned automatically based on `par.align`. For absolute positioning use `par.align = 'fillresize'` and set each child's `par.x` / `par.y`.
### Layout Strategies
| `par.align` | Behavior |
|---|---|
| `lefttoright` | Children stacked horizontally |
| `toptobottom` | Children stacked vertically |
| `righttoleft` / `bottomtotop` | Reversed stacks |
| `fillresize` | Children sized to fill, manual positioning |
| `top` / `bottom` / `left` / `right` | Fixed positioning |
For complex grids: nest containers — vertical container holding horizontal containers.
---
## Panel Callbacks — Reacting to Events
`panelExecuteDAT` watches a panel and fires Python callbacks on user interaction.
```python
pe = root.create(panelexecuteDAT, 'btn_handler') # Python type names are lowercase before the family suffix
pe.par.panel = '/project1/play_btn'
pe.par.click = True # respond to clicks
pe.par.value = True # respond to value changes
```
In its docked DAT:
```python
def onOffToOn(panelValue):
    # Click pressed
    op('/project1/scene_timer').par.start.pulse()
    return
def onOnToOff(panelValue):
    # Click released
    return
def onValueChange(panelValue):
    # Slider drag, field change, etc.
    new_val = panelValue.eval()
    op('/project1/master').par.opacity = new_val
    return
```
For pulse params on custom-parameter pages, use a `parameterExecuteDAT` instead.
---
## Building a Complete VJ Control Panel
End-to-end pattern:
```python
# 1. Top-level container
panel = root.create(containerCOMP, 'vj_control')
panel.par.w = 800; panel.par.h = 200
panel.par.align = 'lefttoright'
# 2. Master fader column
master_col = panel.create(containerCOMP, 'master')
master_col.par.w = 120; master_col.par.h = 200
master_col.par.align = 'toptobottom'
master_label = master_col.create(textTOP, 'lbl')
master_label.par.text = 'MASTER'
master_sld = master_col.create(sliderCOMP, 'fader')
master_sld.par.w = 60; master_sld.par.h = 150
master_sld.par.style = 'vertical'
# 3. Scene buttons row
scene_col = panel.create(containerCOMP, 'scenes')
scene_col.par.w = 400; scene_col.par.h = 200
scene_col.par.align = 'lefttoright'
for i in range(8):
    b = scene_col.create(buttonCOMP, f'scene_{i+1}')
    b.par.w = 50; b.par.h = 50
    b.par.text = str(i+1)
    b.par.buttontype = 'radio' # only one active at a time
# 4. FX toggle column
fx_col = panel.create(containerCOMP, 'fx')
fx_col.par.w = 280; fx_col.par.h = 200
fx_col.par.align = 'toptobottom'
for fx in ['Bloom', 'CRT', 'Glitch', 'Strobe']:
    t = fx_col.create(buttonCOMP, fx.lower())
    t.par.w = 220; t.par.h = 35
    t.par.text = fx
    t.par.buttontype = 'toggleup'
# 5. Display in a window
win = root.create(windowCOMP, 'control_win')
win.par.winop = panel.path
win.par.winw = 800; win.par.winh = 200
win.par.borders = True
win.par.winopen.pulse()
```
Then wire panel values to ops via expressions or panelExecuteDATs.
---
## Showing the Panel — Window or Embedded
| Approach | When |
|---|---|
| `windowCOMP` pointing at panel | Standalone control surface, separate display |
| Render the containerCOMP via `renderTOP` | Composite UI over visuals (HUD-style) |
| Use a `panelCOMP` directly inside a network editor pane | Designer/dev preview only — panel is fully interactive |
For a touch-screen tablet, use a `windowCOMP` on a second display routed to the tablet's HDMI input.
---
## Pitfalls
1. **Panel won't respond to clicks** — likely `par.disabled = True` or the parent container has `par.disableinputs = True`. Check the panel hierarchy.
2. **Slider value not updating** — `panel.u/v` reads the visual position. If you set `par.value0` directly, the visual lags. Use `par.value0` AS the source of truth and let the slider follow.
3. **Custom param won't appear** — must call `appendCustomPage` first, then append params. Pages with no params don't show.
4. **Custom param disappears on reload** — params added via Python at runtime persist only if the COMP is saved AFTER. Use a `tox` save (`comp.save('mycomp.tox')`) or commit via `td_execute_python` then save the project.
5. **Event callback fires twice** — both `onOffToOn` and `onValueChange` may fire on a single button press. Pick one to handle the action; don't double-trigger.
6. **Pulse params need `.pulse()`** — setting `par.X = True` on a pulse param does nothing. Always use `.pulse()`.
7. **Field text doesn't commit until Tab/Enter** — fields don't fire callbacks while typing. Use `par.committemode = 'all'` to fire on every keystroke (heavy).
8. **`par.text` vs panel content** — `buttonCOMP.par.text` is the LABEL on the button. The button's STATE is `panel.state` (0/1). Don't confuse them.
9. **Touch input on macOS** — multi-touch via direct touch panels works but TD's gesture handling is rudimentary. For complex multi-touch (pinch/rotate), use TouchOSC on a tablet instead.
10. **Layout doesn't update** — changing `par.align` requires the container to re-cook. Touch a child or pulse the container to trigger.
---
## Quick Recipes
| Goal | Setup |
|---|---|
| Master fader | `sliderCOMP` (vertical) → expression on `level.par.opacity` |
| Scene picker | 8 `buttonCOMP` (radio) → `selectCHOP` on their state → drive `switchTOP.par.index` |
| FX toggle | `buttonCOMP` (toggleup) → expression on `bypass` of an FX op |
| Numeric input | `fieldCOMP` (float) → expression on target par |
| Component settings | Custom params on the component COMP, panel widgets inside drive them |
| Touch tablet UI | `containerCOMP` with widgets → `windowCOMP` to second display |
| Status display | `textTOP` rendered into the panel via `selectCOMP` |
@@ -0,0 +1,245 @@
# Particles Reference
Particle systems in TouchDesigner — modern POPs (Particle Operators) and the legacy particleSOP path.
For instancing static geometry (without per-instance lifetime/velocity), see `geometry-comp.md`. For GLSL-driven feedback simulations (no particle abstraction), see `operator-tips.md` (Feedback TOP section).
Always call `td_get_par_info` for the op type before setting params. Param names below reflect TD 2025.32 — verify before relying on them.
---
## Two Paths: POPs vs. SOPs
| | **POP family** (modern) | **particleSOP** (legacy) |
|---|---|---|
| GPU? | Yes (compute) | No (CPU) |
| Particle count | 100k+ comfortably | ~5k before slowdown |
| API style | Source / Force / Solver / Render chain | Single op with many params |
| Use for | New projects, anything intensive | Quick demos, low counts, TD < 2023 |
**Default to POPs.** Only fall back to particleSOP if a POP variant of an op you need doesn't exist.
---
## POP Pipeline Overview
A POP system is a chain of operators inside a `geometryCOMP`:
```
popSourceTOP / popSourceSOP ← spawn new particles
popForceTOP (gravity, wind, etc.)
popForceTOP (attractor, vortex, ...)
popDeleteTOP (lifetime, bounds)
popSolverTOP ← integrates velocity, updates positions
[render via geometryCOMP / glslMAT instancing]
```
POP buffers carry standard channels: `P` (position), `v` (velocity), `life`, `id`, `Cd` (color), plus any custom channels you add.
---
## Minimal POP Setup
```python
# Create a geometry COMP to hold the POP network
geo = root.create(geometryCOMP, 'particles_geo')
# 1. Source — emit particles from a point
src = geo.create(popSourceTOP, 'src')
src.par.birthrate = 500 # per second
src.par.life = 4.0 # seconds
# 2. Gravity force
grav = geo.create(popForceTOP, 'gravity')
grav.par.forcetype = 'gravity'
grav.par.fy = -9.8
# 3. Lifetime cleanup
delp = geo.create(popDeleteTOP, 'cull')
delp.par.condition = 'lifeleq' # delete when life <= 0
delp.par.value = 0
# 4. Solver
solv = geo.create(popSolverTOP, 'solver')
solv.par.timestep = 'frame'
# Wire: source → force → delete → solver
src.outputConnectors[0].connect(grav.inputConnectors[0])
grav.outputConnectors[0].connect(delp.inputConnectors[0])
delp.outputConnectors[0].connect(solv.inputConnectors[0])
```
The `popSolverTOP` output IS the live particle buffer. Render it via `glslMAT` instancing on a small SOP (sphere, point) as the "shape" of each particle.
---
## Common Forces
| Force type | Effect | Common params |
|---|---|---|
| `gravity` | Constant directional pull | `fx`, `fy`, `fz` |
| `wind` | Constant velocity addition | `wx`, `wy`, `wz` |
| `drag` | Velocity damping over time | `dragstrength` |
| `noise` | Curl-noise turbulence | `noiseamp`, `noisefreq`, `noiseseed` |
| `attractor` | Pull toward a point | `position`, `strength`, `falloff` |
| `vortex` | Swirl around an axis | `axis`, `strength` |
| `point` (custom) | GLSL-evaluated arbitrary force | via `popforceadvancedTOP` |
Stack multiple `popForceTOP`s in series — each modifies velocity additively.
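To see why the position of a force in the chain matters, here is a plain-Python sketch of one integration step. It assumes a simple model in which gravity adds to velocity and drag scales it; the function names and constants are illustrative and do not describe how any TouchDesigner solver is implemented.

```python
from typing import Callable, Sequence

Force = Callable[[float, float], float]  # (velocity, dt) -> new velocity


def gravity(g: float) -> Force:
    """Additive force: velocity gains g * dt each step."""
    return lambda v, dt: v + g * dt


def drag(k: float) -> Force:
    """Multiplicative damping: velocity is scaled by (1 - k * dt)."""
    return lambda v, dt: v * (1.0 - k * dt)


def step(v: float, chain: Sequence[Force], dt: float) -> float:
    """Apply each force in chain order to a single velocity component."""
    for force in chain:
        v = force(v, dt)
    return v


dt = 0.5
gravity_then_drag = step(0.0, [gravity(-10.0), drag(1.0)], dt)
drag_then_gravity = step(0.0, [drag(1.0), gravity(-10.0)], dt)
print(gravity_then_drag, drag_then_gravity)
```

In this model the two orderings give different velocities after a single step, because drag placed after gravity also scales the velocity that gravity just added.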
---
## Lifecycle Patterns
### Continuous emission (e.g. smoke plume)
```python
src.par.birthrate = 800
src.par.life = 6.0 # variance via 'lifevariance'
src.par.lifevariance = 1.5
```
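A quick way to size a continuous emitter is to estimate how many particles are alive once the system settles. The sketch below assumes a constant birthrate and a fixed lifetime (variance ignored); the helper names are hypothetical.

```python
def steady_state_count(birthrate: float, life: float) -> float:
    """Approximate live particle count once births and deaths balance."""
    return birthrate * life


def births_per_frame(birthrate: float, fps: float) -> float:
    """Convert a per-second birthrate into particles spawned per frame."""
    return birthrate / fps


# The smoke-plume settings above: 800 births/s with a 6 s lifetime
print(steady_state_count(800, 6.0))
print(births_per_frame(800, 60))
```

Comparing the steady-state figure against the particle counts the system is expected to handle gives an early warning before anything is built.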
### Burst emission (e.g. explosion)
```python
src.par.birthrate = 0 # no continuous emission
src.par.burstcount = 5000 # set count and life BEFORE pulsing
src.par.life = 1.5
src.par.burst.pulse() # one burst on demand (verify param name)
```
### Beat-triggered burst
Wire a `triggerCHOP` (from audio or MIDI) to pulse the burst:
```python
op('/project1/audio_kick_trigger').outputConnectors[0].connect(...)
# Then via a chopExecuteDAT, on each kick:
def offToOn(channel, sampleIndex, val, prev):
    op('/project1/particles_geo/src').par.burst.pulse()
    return
```
---
## Rendering Particles
### Point Sprites (simplest)
```python
# Inside the geometryCOMP, render the solver output directly
# The geo's first SOP child becomes the geometry
# But for POPs, we typically render via glslMAT on a small "shape"
# Simple billboard sphere per particle:
shape = geo.create(sphereSOP, 'shape')
shape.par.rad = 0.05
shape.par.rows = 6; shape.par.cols = 6 # low-poly to keep it fast
# Material that uses POP buffer for instancing
mat = root.create(glslMAT, 'particle_mat')
# Configure mat.par.instancingTOP = solver output (verify param name)
```
The exact instancing setup varies by TD version — call `td_get_hints(topic='popInstancing')` (or `popRender` / `instancing` — try a few).
### GPU Sprites via glslcopyPOP
For dense smoke/fire-like effects, use a `glslcopyPOP` that writes per-particle color/size from a compute shader, then render as point sprites with additive blending in a `renderTOP`.
---
## Collisions
```python
# Collision detection against an SOP
coll = geo.create(popCollideTOP, 'ground_coll')
coll.par.collidewithsop = '/project1/ground_geo' # path to colliding SOP
coll.par.bounce = 0.3
coll.par.friction = 0.1
# Insert between force and solver
```
For plane/box collisions only, use `popPlaneCollideTOP` (cheaper).
---
## Custom Per-Particle Data
Add a custom channel via `popAttribCreateTOP` (or by writing through `glslcopyPOP`):
```python
# Add a "phase" attribute initialized random per-particle, used in render shader
attr = geo.create(popAttribCreateTOP, 'add_phase')
attr.par.attribname = 'phase'
attr.par.value0 = 'rand(@id)' # expression in TD's POP attribute language
```
Then in the render shader, `texture(sTDPOPInputs[0].phase, ...)` (or whichever sampler convention your TD version uses — verify with `td_get_docs(topic='pops')`).
---
## Legacy particleSOP (Use Sparingly)
For quick demos or low-count systems:
```python
# Inside a geo
psrc = geo.create(addSOP, 'point_src') # source: a single point
psrc.par.points = '0 0 0'
part = geo.create(particleSOP, 'particles')
part.par.life = 3.0
part.par.birthrate = 100
part.par.gravityy = -9.8
part.par.windx = 0.5
part.inputConnectors[0].connect(psrc)
```
CPU-bound. Beyond ~5,000 active particles you'll see frame drops.
---
## Pitfalls
1. **Particles don't appear** — usually a render-side issue. Check via `td_get_screenshot` on the solver output (renders the buffer as a TOP-like view in newer TD). Then check the `geometryCOMP`'s render path.
2. **Burst won't fire** — verify the `burst` param is a pulse, not a toggle. Pulses must use `.pulse()`, not `= True`.
3. **Particles teleport on first frame** — uninitialized velocity. Set `popSourceTOP.par.initialvelocityX/Y/Z` or zero them explicitly.
4. **Gravity feels wrong** — TD's "1 unit" depends on your scene scale. Start with `fy = -1.0` and scale up rather than using real-world 9.8.
5. **High birthrate = stuttering** — birthrate is per-second, not per-frame. At 60fps, `birthrate = 6000` is 100/frame which is fine; `birthrate = 600000` will tank.
6. **POP solver order matters** — forces apply in the order they appear in the chain. Putting drag AFTER gravity also damps the velocity gravity just added; putting drag before gravity leaves that frame's gravity untouched. Choose the order deliberately.
7. **Instancing param name varies** — `mat.par.instancingTOP` vs. `mat.par.instanceop` vs. `mat.par.instances` differs across TD versions. Always check `td_get_par_info(op_type='glslMAT')`.
8. **Cooking dependency loops** — POP solvers create implicit time-loops. The "cook dependency loop" warning is expected and harmless for POPs.
9. **CHOP-driven force values** — when a force param is expression-bound to a CHOP (e.g., audio-reactive gravity), make sure the CHOP cooks before the solver. If not, force lags by one frame.
---
## Performance Targets
| Particle count | Setup | Frame budget @ 60fps |
|---|---|---|
| < 1k | particleSOP fine | trivial |
| 1k - 10k | POPs, simple forces | ~2-5ms |
| 10k - 100k | POPs, GPU-only forces | ~5-15ms |
| 100k+ | `glslcopyPOP`, custom compute | ~10-25ms |
| 1M+ | Custom GPU buffer, no POP framework | depends on shader |
Use `td_get_perf` to find which op in the POP chain is the bottleneck.
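When reading measured cook times, it helps to compare them against the total time available per frame, which is 1000 ms divided by the frame rate (about 16.7 ms at 60fps). A small sketch, with hypothetical helper names:

```python
def frame_budget_ms(fps: float) -> float:
    """Total time available per frame, in milliseconds."""
    return 1000.0 / fps


def fits_budget(cook_ms: float, fps: float) -> bool:
    """True if a measured cook time leaves the frame rate achievable."""
    return cook_ms <= frame_budget_ms(fps)


print(round(frame_budget_ms(60), 2))
print(fits_budget(15.0, 60), fits_budget(25.0, 60), fits_budget(25.0, 30))
```

A particle chain that cooks in 25 ms therefore cannot hold 60fps on its own, though it fits comfortably at 30fps.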
---
## Quick Recipes
| Goal | Pipeline |
|---|---|
| Smoke plume | `popSourceTOP` (point) → gravity + wind + noise → `popDeleteTOP` (life) → solver → glslMAT instancing |
| Beat-triggered burst | `triggerCHOP` (audio) → chopExecuteDAT pulses `popSourceTOP.par.burst` |
| Fireworks shell | Burst at point → drag + gravity → secondary burst on lifetime threshold |
| Snow/rain | Continuous emission across XZ plane (high y), gravity + small wind, infinite life box-deleted |
| Sparks | Burst, very short life (0.3s), bright additive render, motion blur via feedback |
| Audio particles | Birthrate driven by audio envelope, color driven by frequency band |
@@ -0,0 +1,211 @@
# Projection Mapping Reference
Multi-window output, surface mapping, edge blending, and projector calibration patterns for installation/event work.
For HUD layouts and on-screen panel grids, see `layout-compositor.md`. For wireframe/test-pattern generation, see `operator-tips.md`.
---
## Window COMP — Output to a Display
The `windowCOMP` is how TD pushes pixels to a real display.
```python
win = root.create(windowCOMP, 'output_window')
win.par.winop = '/project1/final_out' # path to the TOP being displayed
win.par.winw = 1920
win.par.winh = 1080
win.par.winoffsetx = 0 # screen-space offset
win.par.winoffsety = 0
win.par.borders = False # no chrome
win.par.alwaysontop = True
win.par.cursor = False # hide cursor in fullscreen
win.par.justify = 'fillaspect' # 'fill' | 'fitaspect' | 'fillaspect' | 'native'
win.par.winopen.pulse() # OPEN the window
```
To target a specific physical display, set `par.location`:
```python
win.par.location = 'secondary' # 'primary' | 'secondary' | 'monitor1' | 'monitor2' | ...
```
Or set absolute coordinates using `winoffsetx/y` matched to your OS display layout.
**Always pulse `winopen` — setting params alone doesn't open the window.**
---
## Multi-Window Output
For multi-projector or multi-display setups, create one `windowCOMP` per output, each pointing at a different TOP.
```python
for i, screen_top in enumerate(['out_left', 'out_center', 'out_right']):
    w = root.create(windowCOMP, f'win_{i}')
    w.par.winop = f'/project1/{screen_top}'
    w.par.winw = 1920; w.par.winh = 1080
    w.par.winoffsetx = i * 1920
    w.par.winoffsety = 0
    w.par.borders = False
    w.par.alwaysontop = True
    w.par.cursor = False
    w.par.winopen.pulse()
```
For ultra-wide single-output spans, use ONE windowCOMP at e.g. 5760×1080 spanning three projectors via the GPU's mosaic/spanning mode (Nvidia Mosaic, AMD Eyefinity), then split content via `cropTOP` per screen inside TD.
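The per-screen crop regions for such a span follow directly from the total width and the number of projectors. The sketch below computes them in pixels and as 0-1 fractions, assuming equal-width screens with no overlap; the function names are hypothetical, and how the values map onto `cropTOP` parameters should be checked with `td_get_par_info`.

```python
from typing import List, Tuple


def screen_slices(total_w: int, screens: int) -> List[Tuple[int, int]]:
    """Left/right pixel bounds for each equal-width screen in a span."""
    w = total_w // screens
    return [(i * w, (i + 1) * w) for i in range(screens)]


def normalized(slices: List[Tuple[int, int]], total_w: int) -> List[Tuple[float, float]]:
    """The same bounds expressed as fractions of the full span width."""
    return [(left / total_w, right / total_w) for left, right in slices]


slices = screen_slices(5760, 3)
print(slices)
print(normalized(slices, 5760))
```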
---
## 4-Point Corner Pin (Quad Warp)
The simplest projection mapping primitive — warping a rectangle onto a quadrilateral.
```python
# Source content
src = op('/project1/scene_out')
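# Manual: Corner Pin TOP (built in; the Python type name is lowercase cornerpinTOP)
cp = root.create(cornerpinTOP, 'corner_pin')
cp.par.tlx = 0.05; cp.par.tly = 0.10 # top-left (normalized 0-1)
# `try` is a Python keyword, so `cp.par.try` is a syntax error; go through getattr
cp.par.trx = 0.95; getattr(cp.par, 'try').val = 0.08 # top-right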
cp.par.brx = 0.93; cp.par.bry = 0.92 # bottom-right
cp.par.blx = 0.07; cp.par.bly = 0.94 # bottom-left
cp.inputConnectors[0].connect(src)
```
Alternative: use a `geometryCOMP` with a `gridSOP` and bend the verts in vertex GLSL. More flexible (curved surfaces) but more setup.
Verify TD 2025.32 param names with `td_get_par_info(op_type='cornerPinTOP')`.
---
## Bezier / Mesh Warp (Curved Surfaces)
For non-flat surfaces (domes, columns, curved walls), use a subdivided mesh and per-vertex displacement.
### Pattern: Grid Mesh + GLSL Displacement
```python
# Subdivided grid in a geo
geo = root.create(geometryCOMP, 'warp_geo')
grid = geo.create(gridSOP, 'warp_grid')
grid.par.rows = 32 # higher = smoother curve
grid.par.cols = 32
grid.par.sizex = 2; grid.par.sizey = 2
# Texture the source onto it
mat = root.create(constantMAT, 'warp_mat') # Constant MAT for unlit projection
mat.par.colormap = '/project1/scene_out' # source TOP (verify param name)
geo.par.material = mat.path
# Render to a TOP that goes to the projector window
cam = root.create(cameraCOMP, 'cam_proj')
cam.par.tz = 4
render = root.create(renderTOP, 'projection_out')
render.par.camera = cam.path
render.par.geometry = geo.path
render.par.outputresolution = 'custom'
render.par.resolutionw = 1920; render.par.resolutionh = 1080
```
For per-vertex offsets, write a vertex shader in a `glslMAT` (the Constant MAT has no custom shader stage) and read displacement values from a CHOP via a uniform.
Calibration is iterative: render a checkerboard from `scene_out`, project it, photograph the projection, manually nudge corner/grid points until aligned.
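One way to get a starting mesh for that nudging process is to interpolate the grid vertices between four measured corner points. The sketch below uses plain bilinear interpolation, which is an assumption chosen for simplicity: it is not perspective-correct and is not how any built-in TD operator is documented to work. All names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def lerp(a: Point, b: Point, t: float) -> Point:
    """Linear interpolation between two 2D points."""
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)


def warp_grid(tl: Point, tr: Point, bl: Point, br: Point,
              rows: int, cols: int) -> List[List[Point]]:
    """Bilinearly interpolated vertex positions; rows and cols must be >= 2."""
    grid = []
    for r in range(rows):
        v = r / (rows - 1)                 # 0 on the bottom row, 1 on the top row
        left = lerp(bl, tl, v)             # point on the left edge at height v
        right = lerp(br, tr, v)            # point on the right edge at height v
        grid.append([lerp(left, right, c / (cols - 1)) for c in range(cols)])
    return grid


# Corner values from the corner-pin example, as a 5x5 starting mesh
mesh = warp_grid((0.05, 0.10), (0.95, 0.08), (0.07, 0.94), (0.93, 0.92), 5, 5)
print(mesh[0][0], mesh[4][4])
```

Individual vertices can then be offset by hand from this baseline instead of starting from an undistorted grid.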
---
## Edge Blending (Multi-Projector Overlap)
When two projectors overlap, the overlap region is twice as bright. Blend by ramping each projector's edge alpha to 0 across the overlap zone.
### GLSL Edge Blend Shader
Per-projector output pass that fades the inside edge to black:
```glsl
// edge_blend_pixel.glsl
out vec4 fragColor;
uniform float uBlendLeft; // overlap width on left edge (0-0.5, 0=no blend)
uniform float uBlendRight;
uniform float uGamma; // typically 2.2 — perceptual ramp
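void main() {
    vec2 uv = vUV.st;
    vec4 col = texture(sTD2DInputs[0], uv);
    float aL = (uBlendLeft > 0.0) ? smoothstep(0.0, uBlendLeft, uv.x) : 1.0;
    float aR = (uBlendRight > 0.0) ? smoothstep(0.0, uBlendRight, 1.0 - uv.x) : 1.0;
    // Inverse gamma: the projector applies roughly uGamma to the signal itself,
    // so the ramp is pre-compensated to make overlapping edges sum in linear light
    float a = pow(aL * aR, 1.0 / uGamma);
    fragColor = TDOutputSwizzle(vec4(col.rgb * a, 1.0));
}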
```
Apply this to each overlap-touching projector's output. Tune `uBlendLeft` / `uBlendRight` to match your physical overlap.
For top/bottom blends or cylindrical setups, extend the shader with `uBlendTop` / `uBlendBottom`.
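The blend width uniforms are fractions of a single projector's output, so they can be derived from the measured overlap in pixels. The sketch below also checks that two mirrored smoothstep ramps sum to one across the overlap, before any gamma term is applied. The helper names are hypothetical, and `smoothstep` is a Python port of the GLSL built-in.

```python
def blend_width(overlap_px: float, projector_px: float) -> float:
    """Overlap expressed as a 0-1 fraction of one projector's width."""
    return overlap_px / projector_px


def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """Python equivalent of GLSL smoothstep."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


# A 192-pixel overlap on 1920-pixel-wide projectors
print(blend_width(192, 1920))

# At any point t across the overlap, the two mirrored ramps are complementary
t = 0.3
print(smoothstep(0.0, 1.0, t) + smoothstep(0.0, 1.0, 1.0 - t))
```

If the physical overlap still shows a visible band after the widths are set this way, the remaining error usually lies in the gamma compensation or in projector alignment rather than in the ramp shape.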
---
## Calibration Patterns
Useful test patterns for aligning projectors. Build a `switchTOP` selecting one of these, route to all projector windows during setup.
```python
# Solid white — for brightness/uniformity check
white = root.create(constantTOP, 'cal_white')
white.par.colorr = 1.0; white.par.colorg = 1.0; white.par.colorb = 1.0
# Centered crosshair — for keystone alignment
gridcross = root.create(textTOP, 'cal_cross')
gridcross.par.text = '+'
gridcross.par.fontsizex = 200
# Fine grid — for warp/mesh alignment (use rampTOP + math + threshold, or build via GLSL)
# Color bars for projector color calibration
bars = root.create(rampTOP, 'cal_bars')
bars.par.type = 'horizontal'
```
Or use the bundled `testpatternTOP` if your TD version includes it.
---
## Projection Audit Workflow
When debugging a multi-screen setup:
1. Render a unique color and label per output (`textTOP` saying "LEFT", "CENTER", "RIGHT").
2. Check that each window is sourcing the correct path: `td_get_operator_info(path='/project1/win_0')`.
3. Verify display assignment: walk to each projector and confirm visually.
4. Check resolution: physical projector native res vs. TD output res — mismatches cause scaling artifacts.
5. Cook flag: `td_get_perf` — if a window's source TOP isn't cooking, the projector shows last frame frozen.
---
## Pitfalls
1. **Window won't open** — you forgot `winopen.pulse()`. Setting params alone doesn't open it.
2. **Wrong display** — `par.location='secondary'` depends on OS display order. Set `winoffsetx/y` to absolute coords as a more reliable override.
3. **Cursor visible** — set `par.cursor = False` BEFORE opening, or close+reopen.
4. **Black projection** — usually a cooking issue. Verify `final_out` TOP is cooking via `td_get_perf`. Check `td_get_errors` recursively from `/`.
5. **Tearing / vsync** — `windowCOMP` honors `par.vsync`. For projection always set `vsync='vsync'` (default). Tearing means GPU is over-budget — reduce render resolution.
6. **Aspect mismatch** — projector native is often 1920×1200 (16:10) not 1080. Use `justify='fitaspect'` or render at native projector res.
7. **Non-Commercial license** — caps resolution at 1280×1280. For real installation work you need a Commercial or Pro license, both of which remove the cap.
8. **Multiple monitors on macOS** — `windowCOMP` honors macOS Spaces. Disable Spaces or pin TD to a specific display in System Settings before showtime.
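
For the aspect-mismatch case in pitfall 6, the size of the image after an aspect-preserving fit can be worked out ahead of time. This is a plain-Python sketch of the usual fit calculation, with a hypothetical helper name; it does not call any TouchDesigner API.

```python
from typing import Tuple


def fit_aspect(src_w: int, src_h: int, dst_w: int, dst_h: int) -> Tuple[int, int]:
    """Largest size that fits inside the destination while keeping the source aspect."""
    scale = min(dst_w / src_w, dst_h / src_h)
    return round(src_w * scale), round(src_h * scale)


# 16:9 content on a 16:10 projector
w, h = fit_aspect(1920, 1080, 1920, 1200)
print(w, h, "bars:", (1200 - h) // 2, "px top and bottom")
```

The leftover rows are the letterbox bars, which is worth knowing before planning masks or blends near the top and bottom edges.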
---
## Quick Recipes
| Goal | Approach |
|---|---|
| Single fullscreen output | One `windowCOMP`, `justify='fillaspect'`, `winopen.pulse()` |
| 3-projector wide span | 3 `windowCOMP` + per-output `cropTOP` from one wide source |
| Single quad surface | `cornerPinTOP` → `windowCOMP` |
| Curved/dome | Subdivided gridSOP with vertex GLSL → `renderTOP` → `windowCOMP` |
| Edge blend overlap | GLSL fade shader per projector → `windowCOMP` |
| Calibration mode | `switchTOP` between scene and test patterns, hot-key triggered |
@@ -0,0 +1,198 @@
# Replicator COMP Reference
The `replicatorCOMP` clones a template operator N times, driven by a table of data. The fundamental TD pattern for data-driven networks: button grids, scene rosters, dynamic UI, parameter panels per-channel.
For visual instancing (per-pixel/per-render copies), see `geometry-comp.md`. Replicator builds NETWORK NODES; instancing builds RENDER COPIES. Different layer.
---
## Concept
```
[Template OP]            [Data tableDAT]
      │                         │
      └─────→ replicatorCOMP ←──┘
                    │
     [N clones], one per data row
     Each clone gets per-row params
```
Edit the template once → all clones inherit. Edit the table → clones add/remove dynamically. Push parameter overrides per-row.
---
## Minimal Setup
```python
# 1. Make a template (the thing to clone)
template = root.create(buttonCOMP, 'btn_template')
template.par.w = 80; template.par.h = 80
template.par.text = 'X'
template.par.bgcolorr = 0.2
# 2. Make a data table (one row per clone)
data = root.create(tableDAT, 'scene_data')
data.appendRow(['name', 'color_r', 'color_g', 'color_b'])
data.appendRow(['Sunset', 1.0, 0.4, 0.0])
data.appendRow(['Midnight', 0.0, 0.1, 0.4])
data.appendRow(['Storm', 0.3, 0.3, 0.5])
data.appendRow(['Forest', 0.0, 0.5, 0.2])
# 3. Replicator — points at template + data
rep = root.create(replicatorCOMP, 'scene_buttons')
rep.par.template = template.path
rep.par.opfromdat = data.path
rep.par.namefromdatname = 'name' # use 'name' column for clone names
rep.par.incrementalnumbering = False
```
After cooking, the replicator creates 4 child COMPs named `Sunset`, `Midnight`, `Storm`, `Forest` (one per non-header row), each cloned from `btn_template`.
---
## Per-Row Parameter Overrides
The replicator's docked `replicator1_callbacks` DAT lets you customize each clone:
```python
def onReplicate(comp, allOps, newOps, template, master):
    """Called once per replicate cycle. newOps is the list of just-created clones."""
    data = op('scene_data')
    for i, clone in enumerate(newOps):
        row = i + 1 # +1 to skip header
        clone.par.text = data[row, 'name'].val
        clone.par.bgcolorr = float(data[row, 'color_r'].val)
        clone.par.bgcolorg = float(data[row, 'color_g'].val)
        clone.par.bgcolorb = float(data[row, 'color_b'].val)
    return
```
Or use parameter expressions referencing `me.digits`, the numeric suffix of the operator's own name (for example 3 for `item3`):
```python
# Inside the template, set a param expression like:
# par.value0.expr = "op('../scene_data')[me.digits + 1, 'value']"
```
`me.digits` matches the row index only when clones are named by number (`item1`, `item2`, ...). With name-based clones such as `Sunset` it is `None`, so look the row up by `me.name` instead. Where it applies, this is the cleanest approach for static reference patterns, since no callback is needed.
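In TouchDesigner, an operator's `digits` member is the numeric suffix of its name, so whether it lines up with a table row depends on how the clones are named. The plain-Python stand-in below mimics that behaviour for experimenting outside TD; `trailing_digits` is a hypothetical helper, not part of the TD API.

```python
import re
from typing import Optional


def trailing_digits(name: str) -> Optional[int]:
    """Return the numeric suffix of an operator-style name, or None if there is none."""
    match = re.search(r'(\d+)$', name)
    return int(match.group(1)) if match else None


for name in ['scene_3', 'item12', 'Sunset']:
    print(name, trailing_digits(name))
```

Names taken from a `name` column, such as `Sunset`, carry no suffix, so an expression that does arithmetic on the result would fail for those clones.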
---
## Layout: Buttons in a Grid
Drop the replicator inside a `containerCOMP` with auto-layout:
```python
panel = root.create(containerCOMP, 'scene_panel')
panel.par.w = 400; panel.par.h = 100
panel.par.align = 'lefttoright'
# Create the replicator inside the panel (assigning to rep.parent does not move an operator)
rep = panel.create(replicatorCOMP, 'scene_buttons') # then set its params as in Minimal Setup
```
Each clone is a child of the replicator (which itself is a child of the panel). The panel auto-arranges everything.
For a 2D grid, set `par.align = 'fillresize'` on the container and override `par.x` / `par.y` per clone in the callback based on row/col index.
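The per-clone offsets for such a grid come from the clone's index and the number of columns. A minimal sketch of that arithmetic, with a hypothetical helper name and an optional gap between cells; the results would be assigned to each clone's `par.x` / `par.y` in the callback.

```python
from typing import Tuple


def grid_xy(index: int, cols: int, cell_w: int, cell_h: int, gap: int = 0) -> Tuple[int, int]:
    """Pixel offset of the cell at a given 0-based index in a row-major grid."""
    row, col = divmod(index, cols)
    return col * (cell_w + gap), row * (cell_h + gap)


# 80x80 buttons in 4 columns
for i in range(6):
    print(i, grid_xy(i, 4, 80, 80))
```

Rows grow in the positive y direction here; flip the y term if the first row should sit at the top of the container.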
---
## Updating Without Rebuilding
When the data table changes, the replicator regenerates the clones. By default it destroys and recreates everything. To preserve state, set:
```python
rep.par.recreatemissing = True # only add/remove changed rows
rep.par.recreateallonchange = False
```
This pattern is essential for live-edit scenarios (designer adjusts table, network keeps running).
For incremental data ingestion (e.g., from a `webDAT` polling an API), have a `datExecuteDAT` watch the response, parse, write to the data table, and the replicator self-updates.
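Conceptually, an incremental update is a difference between the clone names that already exist and the names the table now asks for. The sketch below shows that comparison in plain Python; `diff_rows` is a hypothetical helper and says nothing about how the replicator implements it internally.

```python
from typing import List, Sequence, Tuple


def diff_rows(existing: Sequence[str], wanted: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Names to create and names to destroy, each in their original order."""
    existing_set, wanted_set = set(existing), set(wanted)
    to_add = [name for name in wanted if name not in existing_set]
    to_remove = [name for name in existing if name not in wanted_set]
    return to_add, to_remove


to_add, to_remove = diff_rows(['Sunset', 'Storm'], ['Sunset', 'Midnight', 'Forest'])
print(to_add, to_remove)
```

Clones whose names appear in both lists are left alone, which is what preserves their toggle and slider state across a table edit.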
---
## Common Patterns
### Scene Roster (Data → Buttons + Logic)
```python
# Data per scene: name, file path, audio track, BPM
scene_data = root.create(tableDAT, 'scene_data')
scene_data.appendRow(['name', 'file', 'audio', 'bpm'])
scene_data.appendRow(['Intro', '/scenes/intro.tox', '/audio/intro.wav', 110])
scene_data.appendRow(['Main', '/scenes/main.tox', '/audio/main.wav', 128])
# Replicator clones a buttonCOMP per scene
# Each button's onClick callback loads the corresponding tox + cues audio
```
### Dynamic Parameter Panel
For a list of audio bands, generate a fader strip per band:
```python
# Data: band names (sub, low, mid, hi-mid, high, air)
# Template: containerCOMP with label + sliderCOMP
# Replicator clones N strips
# Each slider's value is read at /audio_eq/{band_name}/fader
```
### Procedural Visual Network
Build a multi-channel visual network from a config file:
```python
# Data: which TOPs to chain, per "scene"
# Template: a baseCOMP with placeholder children
# Replicator builds one baseCOMP per scene; each scene contains a custom chain
# Switch between scenes via switchTOP.par.index driven by panel
```
### Per-Channel CHOP Display
Visualize each channel of a multi-channel CHOP separately:
```python
# Data table: one row per channel (auto-extracted via choptodatDAT)
# Template: a small chopVis COMP showing one channel
# Replicator generates N visualizers stacked vertically
```
---
## Replicator vs. Pure Python Loop
| Approach | When to use |
|---|---|
| **replicatorCOMP** | The set of clones changes (add/remove rows live). Visual editor expectations. Pattern is reusable across projects. |
| **Python loop** (in `td_execute_python`) | One-shot generation. Static set. Simpler logic, no template overhead. Faster to write. |
If you'll only ever build the network once, prefer a Python loop with `td_execute_python`. The replicator earns its weight when data is live.
---
## Pitfalls
1. **Header row** — `tableDAT` rows are 0-indexed. If you have a header, your first data row is index 1. Off-by-one bugs are common in callbacks.
2. **`namefromdatname` column missing** — replicator silently uses `digits` (numeric suffix) names. Buttons end up named `1`, `2`, `3` instead of meaningful names. Set `par.namefromdatname` explicitly.
3. **Template lives in network** — the template OP is itself a real network node. Don't connect things downstream of it directly; connect to the clones (or use a `nullCOMP` between).
4. **Recreate-on-change wipes state** — toggles, slider positions, and uncached data inside clones are lost on each regeneration. Use `recreatemissing` to preserve.
5. **`onReplicate` doesn't fire on edit** — only fires when the clone set changes. Editing a value WITHIN an existing row doesn't re-trigger. Use `parameterExecuteDAT` or expressions for per-cell live updates.
6. **Custom params on clones** — pages added in the template propagate. Pages added in `onReplicate` don't survive the next regeneration. Always add custom pages on the template, not the clone.
7. **Cooking storms** — adding many rows fast triggers many clone events. Bundle adds via Python and call `data.cook(force=True)` once at the end.
8. **`me.digits` outside replicator children** — `me.digits` is the numeric suffix of the name of the op that evaluates the expression. Inside a clone's children use `parent().digits`; in unrelated networks the value has nothing to do with the data table.
9. **Cross-clone references** — referencing a sibling clone via relative path works from inside a clone (`op('../OtherClone/x')`), but breaks if names change. Prefer absolute paths via the data table.
---
## Quick Recipes
| Goal | Setup |
|---|---|
| 8-button scene picker | `tableDAT` (8 rows) + `buttonCOMP` template + `replicatorCOMP` |
| Per-band EQ strip panel | `tableDAT` (band names) + container template (label + slider) + replicator |
| Data-driven visual scenes | `tableDAT` (scene config) + `baseCOMP` template (visual chain) + replicator |
| Live-updating clone set | Same as above + `par.recreatemissing = True` |
| Per-row colored UI | Data table with color cols, `onReplicate` callback sets per-clone colors |
| List from API response | `webDAT` → `datExecuteDAT` parses JSON → writes to data table → replicator updates |
+106 -10
@@ -516,26 +516,88 @@ class TestGetTextAuxiliaryClient:
assert isinstance(client, CodexAuxiliaryClient)
assert model == "gpt-5.2-codex"
def test_returns_none_when_nothing_available(self, monkeypatch):
monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
patch("agent.auxiliary_client._read_codex_access_token", return_value=None), \
patch("agent.auxiliary_client._resolve_api_key_provider", return_value=(None, None)):
client, model = get_text_auxiliary_client()
assert client is None
assert model is None
class TestNousAuxiliaryRefresh:
def test_try_nous_prefers_runtime_credentials(self):
fresh_base = "https://inference-api.nousresearch.com/v1"
def test_custom_endpoint_uses_codex_wrapper_when_runtime_requests_responses_api(self):
with patch("agent.auxiliary_client._resolve_custom_runtime",
return_value=("https://api.openai.com/v1", "sk-test", "codex_responses")), \
patch("agent.auxiliary_client._read_main_model", return_value="gpt-5.3-codex"), \
patch("agent.auxiliary_client.OpenAI") as mock_openai:
client, model = get_text_auxiliary_client()
from agent.auxiliary_client import CodexAuxiliaryClient
assert isinstance(client, CodexAuxiliaryClient)
assert model == "gpt-5.3-codex"
assert mock_openai.call_args.kwargs["base_url"] == "https://api.openai.com/v1"
assert mock_openai.call_args.kwargs["api_key"] == "sk-test"
class TestVisionClientFallback:
"""Vision client auto mode resolves known-good multimodal backends."""
def test_vision_auto_includes_active_provider_when_configured(self, monkeypatch):
"""Active provider appears in available backends when credentials exist."""
monkeypatch.setenv("ANTHROPIC_API_KEY", "***")
with (
patch("agent.auxiliary_client._read_nous_auth", return_value={"access_token": "stale-token"}),
patch("agent.auxiliary_client._resolve_nous_runtime_api", return_value=("fresh-agent-key", fresh_base)),
patch("hermes_cli.models.get_nous_recommended_aux_model", return_value=None),
patch("agent.auxiliary_client._read_nous_auth", return_value=None),
patch("agent.auxiliary_client._read_main_provider", return_value="anthropic"),
patch("agent.auxiliary_client._read_main_model", return_value="claude-sonnet-4"),
patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()),
patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="***"),
):
backends = get_available_vision_backends()
assert "anthropic" in backends
def test_resolve_provider_client_returns_native_anthropic_wrapper(self, monkeypatch):
monkeypatch.setenv("ANTHROPIC_API_KEY", "***")
with (
patch("agent.auxiliary_client._read_nous_auth", return_value=None),
patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()),
patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="***"),
):
client, model = resolve_provider_client("anthropic")
assert client is not None
assert client.__class__.__name__ == "AnthropicAuxiliaryClient"
assert model == "claude-haiku-4-5-20251001"
class TestAuxiliaryPoolAwareness:
def test_try_nous_uses_pool_entry(self):
class _Entry:
access_token = "pooled-access-token"
agent_key = "pooled-agent-key"
inference_base_url = "https://inference.pool.example/v1"
class _Pool:
def has_credentials(self):
return True
def select(self):
return _Entry()
with (
patch("agent.auxiliary_client.load_pool", return_value=_Pool()),
patch("agent.auxiliary_client.OpenAI") as mock_openai,
):
from agent.auxiliary_client import _try_nous
mock_openai.return_value = MagicMock()
client, model = _try_nous()
assert client is not None
# No Portal recommendation → falls back to the hardcoded default.
assert model == "google/gemini-3-flash-preview"
assert mock_openai.call_args.kwargs["api_key"] == "fresh-agent-key"
assert mock_openai.call_args.kwargs["base_url"] == fresh_base
assert mock_openai.call_args.kwargs["api_key"] == "pooled-agent-key"
assert mock_openai.call_args.kwargs["base_url"] == "https://inference.pool.example/v1"
def test_try_nous_uses_portal_recommendation_for_text(self):
"""When the Portal recommends a compaction model, _try_nous honors it."""
@@ -643,6 +705,40 @@ class TestNousAuxiliaryRefresh:
assert stale_client.chat.completions.create.await_count == 1
assert fresh_async_client.chat.completions.create.await_count == 1
def test_cached_gmi_client_keeps_explicit_slash_model_override(self):
import agent.auxiliary_client as aux
fake_client = MagicMock()
with patch(
"agent.auxiliary_client.resolve_provider_client",
return_value=(fake_client, "google/gemini-3.1-flash-lite-preview"),
) as mock_resolve:
aux.shutdown_cached_clients()
try:
client, model = aux._get_cached_client(
"gmi",
"google/gemini-3.1-flash-lite-preview",
base_url="https://api.gmi-serving.com/v1",
api_key="gmi-key",
)
assert client is fake_client
assert model == "google/gemini-3.1-flash-lite-preview"
client, model = aux._get_cached_client(
"gmi",
"openai/gpt-5.4-mini",
base_url="https://api.gmi-serving.com/v1",
api_key="gmi-key",
)
finally:
aux.shutdown_cached_clients()
assert client is fake_client
assert model == "openai/gpt-5.4-mini"
assert mock_resolve.call_count == 1
# ── Payment / credit exhaustion fallback ─────────────────────────────────
@@ -0,0 +1,105 @@
"""Tests for the 1M-context beta header on AWS Bedrock Claude models.
Claude Opus 4.6/4.7 and Sonnet 4.6 support a 1M context window, but on AWS
Bedrock (and Azure AI Foundry) that window is still gated behind the
``context-1m-2025-08-07`` beta header as of 2026-04. Without it, Bedrock
caps these models at 200K even though ``model_metadata.py`` advertises 1M.
These tests guard the invariant that the header is always emitted on the
Bedrock client path, and that it survives the MiniMax bearer-auth strip.
"""
from unittest.mock import MagicMock, patch
class TestBedrockContext1MBeta:
"""``context-1m-2025-08-07`` must reach Bedrock Claude requests."""
def test_common_betas_includes_1m(self):
from agent.anthropic_adapter import _COMMON_BETAS, _CONTEXT_1M_BETA
assert _CONTEXT_1M_BETA == "context-1m-2025-08-07"
assert _CONTEXT_1M_BETA in _COMMON_BETAS
def test_common_betas_for_native_anthropic_includes_1m(self):
"""Native Anthropic endpoints (and Bedrock with empty base_url) get 1M."""
from agent.anthropic_adapter import (
_common_betas_for_base_url,
_CONTEXT_1M_BETA,
)
assert _CONTEXT_1M_BETA in _common_betas_for_base_url(None)
assert _CONTEXT_1M_BETA in _common_betas_for_base_url("")
assert _CONTEXT_1M_BETA in _common_betas_for_base_url(
"https://api.anthropic.com"
)
def test_common_betas_strips_1m_for_minimax(self):
"""MiniMax bearer-auth endpoints host their own models — strip 1M beta."""
from agent.anthropic_adapter import (
_common_betas_for_base_url,
_CONTEXT_1M_BETA,
)
for url in (
"https://api.minimax.io/anthropic",
"https://api.minimaxi.com/anthropic",
):
betas = _common_betas_for_base_url(url)
assert _CONTEXT_1M_BETA not in betas, (
f"1M beta must be stripped for MiniMax bearer endpoint {url}"
)
# Other betas still present
assert "interleaved-thinking-2025-05-14" in betas
def test_build_anthropic_bedrock_client_sends_1m_beta(self):
"""AnthropicBedrock client must carry the 1M beta in default_headers.
This is the load-bearing assertion for the reported bug:
without this header Bedrock serves Opus 4.6/4.7 with a 200K cap.
"""
import agent.anthropic_adapter as adapter
fake_sdk = MagicMock()
fake_sdk.AnthropicBedrock = MagicMock()
with patch.object(adapter, "_anthropic_sdk", fake_sdk):
adapter.build_anthropic_bedrock_client(region="us-west-2")
call_kwargs = fake_sdk.AnthropicBedrock.call_args.kwargs
assert call_kwargs["aws_region"] == "us-west-2"
default_headers = call_kwargs.get("default_headers") or {}
beta_header = default_headers.get("anthropic-beta", "")
assert "context-1m-2025-08-07" in beta_header, (
"Bedrock client must send context-1m-2025-08-07 or Opus 4.6/4.7 "
"silently caps at 200K context"
)
# Other common betas still present — no regression.
assert "interleaved-thinking-2025-05-14" in beta_header
assert "fine-grained-tool-streaming-2025-05-14" in beta_header
def test_build_anthropic_kwargs_includes_1m_for_bedrock_fastmode(self):
"""Fast-mode requests (per-request extra_headers) still include 1M beta.
Per-request extra_headers override client-level default_headers, so
the fast-mode path must re-include everything in _COMMON_BETAS.
"""
from agent.anthropic_adapter import build_anthropic_kwargs
kwargs = build_anthropic_kwargs(
model="claude-opus-4-7",
messages=[{"role": "user", "content": "hi"}],
tools=None,
max_tokens=1024,
reasoning_config=None,
is_oauth=False,
# Empty base_url mirrors AnthropicBedrock (no HTTP base URL)
base_url=None,
fast_mode=True,
)
beta_header = kwargs.get("extra_headers", {}).get("anthropic-beta", "")
assert "context-1m-2025-08-07" in beta_header, (
"fast-mode extra_headers must carry the 1M beta or it overrides "
"client-level default_headers and Bedrock drops back to 200K"
)
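The invariant these Bedrock tests guard can be sketched roughly as follows. This is a minimal illustration, not the actual `agent.anthropic_adapter` code: the names `COMMON_BETAS`, `MINIMAX_HOSTS`, `betas_for_base_url`, and `beta_header` are hypothetical, and only the beta strings and the two MiniMax URLs are taken from the tests above. It also assumes the betas are joined into one comma-separated `anthropic-beta` value.

```python
from urllib.parse import urlparse

CONTEXT_1M_BETA = "context-1m-2025-08-07"

# Beta strings as they appear in the tests; the list name is an assumption.
COMMON_BETAS = [
    "interleaved-thinking-2025-05-14",
    "fine-grained-tool-streaming-2025-05-14",
    CONTEXT_1M_BETA,
]

# Hypothetical host set, derived from the URLs used in the MiniMax test.
MINIMAX_HOSTS = {"api.minimax.io", "api.minimaxi.com"}


def betas_for_base_url(base_url):
    """Return the betas for an endpoint, dropping the 1M beta for MiniMax."""
    # None and "" both mean "no HTTP base URL" (the Bedrock client case).
    host = urlparse(base_url or "").hostname or ""
    if host in MINIMAX_HOSTS:
        return [b for b in COMMON_BETAS if b != CONTEXT_1M_BETA]
    return list(COMMON_BETAS)


def beta_header(base_url):
    """Join the betas into a single comma-separated header value."""
    return ",".join(betas_for_base_url(base_url))
```

Because per-request `extra_headers` replace the client-level `default_headers`, any code path that builds its own `anthropic-beta` value would need to start from the full list again, which is what the fast-mode test checks.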
@@ -242,6 +242,298 @@ class TestSummaryFailureCooldown:
assert mock_call.call_count == 1
class TestSummaryFallbackToMainModel:
"""When ``summary_model`` differs from the main model and the summary LLM
call fails, the compressor should retry once on the main model before
giving up; losing N turns of context is almost always worse than one
extra summary attempt. Covers both the fast-path (explicit
model-not-found errors) and the unknown-error best-effort retry."""
def _msgs(self):
return [
{"role": "user", "content": "do something"},
{"role": "assistant", "content": "ok"},
]
def test_model_not_found_404_falls_back_to_main_and_succeeds(self):
"""Classic misconfiguration: ``auxiliary.compression.model`` points at
a model the main provider doesn't serve → 404 → retry on main."""
mock_ok = MagicMock()
mock_ok.choices = [MagicMock()]
mock_ok.choices[0].message.content = "summary via main model"
err_404 = Exception("404 model_not_found: no such model")
err_404.status_code = 404
with patch("agent.context_compressor.get_model_context_length", return_value=100000):
c = ContextCompressor(
model="main-model",
summary_model_override="broken-aux-model",
quiet_mode=True,
)
with patch(
"agent.context_compressor.call_llm",
side_effect=[err_404, mock_ok],
) as mock_call:
result = c._generate_summary(self._msgs())
assert mock_call.call_count == 2
# First call used the misconfigured aux model
assert mock_call.call_args_list[0].kwargs.get("model") == "broken-aux-model"
# Second call used the main model (no model kwarg → call_llm uses main)
assert "model" not in mock_call.call_args_list[1].kwargs
assert result is not None
assert "summary via main model" in result
# Aux-model failure is recorded even though retry succeeded — this is
# how callers (gateway /compress, CLI warning) know to tell the user
# their auxiliary.compression.model setting is broken.
assert c._last_aux_model_failure_model == "broken-aux-model"
assert c._last_aux_model_failure_error is not None
assert "404" in c._last_aux_model_failure_error
def test_unknown_error_falls_back_to_main_and_succeeds(self):
"""Errors that don't match the 404/503/model_not_found fast-path
(400s, provider-specific 'no route', aggregator rejections) should
ALSO trigger a best-effort retry on main before entering cooldown."""
mock_ok = MagicMock()
mock_ok.choices = [MagicMock()]
mock_ok.choices[0].message.content = "summary via main model"
# A 400 from OpenRouter / Nous portal with an opaque message — does
# NOT match _is_model_not_found, but still an unrecoverable misconfig.
err_400 = Exception("400 Bad Request: provider rejected model")
err_400.status_code = 400
with patch("agent.context_compressor.get_model_context_length", return_value=100000):
c = ContextCompressor(
model="main-model",
summary_model_override="broken-aux-model",
quiet_mode=True,
)
with patch(
"agent.context_compressor.call_llm",
side_effect=[err_400, mock_ok],
) as mock_call:
result = c._generate_summary(self._msgs())
assert mock_call.call_count == 2
assert mock_call.call_args_list[0].kwargs.get("model") == "broken-aux-model"
assert "model" not in mock_call.call_args_list[1].kwargs
assert result is not None
assert "summary via main model" in result
# Aux-model failure recorded despite successful recovery
assert c._last_aux_model_failure_model == "broken-aux-model"
assert c._last_aux_model_failure_error is not None
assert "400" in c._last_aux_model_failure_error
def test_no_fallback_when_summary_model_equals_main_model(self):
"""If the aux model IS the main model, there's nowhere to fall back
to; go straight to cooldown and don't loop retrying the same call."""
err = Exception("500 internal error")
with patch("agent.context_compressor.get_model_context_length", return_value=100000):
c = ContextCompressor(
model="main-model",
summary_model_override="main-model", # same as main
quiet_mode=True,
)
with patch(
"agent.context_compressor.call_llm",
side_effect=err,
) as mock_call:
result = c._generate_summary(self._msgs())
# Only one attempt — retry gate blocks fallback when models match
assert mock_call.call_count == 1
assert result is None
# Not flagged as fallen back — the retry condition was never met
assert getattr(c, "_summary_model_fallen_back", False) is False
def test_fallback_only_happens_once_per_compressor(self):
"""If the retry-on-main ALSO fails, don't loop forever — enter
cooldown like the normal failure path."""
err1 = Exception("400 aux model rejected")
err2 = Exception("500 main model also exploded")
with patch("agent.context_compressor.get_model_context_length", return_value=100000):
c = ContextCompressor(
model="main-model",
summary_model_override="broken-aux-model",
quiet_mode=True,
)
with patch(
"agent.context_compressor.call_llm",
side_effect=[err1, err2],
) as mock_call:
result = c._generate_summary(self._msgs())
# Exactly 2 calls: initial + one retry on main. No further retries.
assert mock_call.call_count == 2
assert result is None
assert c._summary_model_fallen_back is True
class TestAuxModelFallbackSurfacedToCallers:
"""When summary_model fails but retry-on-main succeeds, compress() must
expose the aux-model failure via _last_aux_model_failure_{model,error}
so gateway /compress and CLI callers can warn the user about their
broken auxiliary.compression.model config; silent recovery would hide
a misconfiguration only the user can fix."""
def _make_msgs(self):
return [
{"role": "system", "content": "sys"},
{"role": "user", "content": "msg 1"},
{"role": "assistant", "content": "msg 2"},
{"role": "user", "content": "msg 3"},
{"role": "assistant", "content": "msg 4"},
{"role": "user", "content": "msg 5"},
{"role": "assistant", "content": "msg 6"},
{"role": "user", "content": "msg 7"},
]
def test_compress_exposes_aux_failure_fields_after_successful_fallback(self):
mock_ok = MagicMock()
mock_ok.choices = [MagicMock()]
mock_ok.choices[0].message.content = "summary via main"
err_400 = Exception("400 provider rejected configured model")
err_400.status_code = 400
with patch("agent.context_compressor.get_model_context_length", return_value=100000):
c = ContextCompressor(
model="main-model",
summary_model_override="broken-aux-model",
quiet_mode=True,
protect_first_n=2,
protect_last_n=2,
)
with patch(
"agent.context_compressor.call_llm",
side_effect=[err_400, mock_ok],
):
result = c.compress(self._make_msgs())
# Recovery succeeded → no fallback placeholder
assert c._last_summary_fallback_used is False
# But aux-model failure IS recorded for the gateway/CLI warning
assert c._last_aux_model_failure_model == "broken-aux-model"
assert c._last_aux_model_failure_error is not None
assert "400" in c._last_aux_model_failure_error
# Result is well-formed with a real summary, not a placeholder
assert any(
isinstance(m.get("content"), str) and "summary via main" in m["content"]
for m in result
)
def test_compress_clears_aux_failure_fields_at_start_of_next_call(self):
"""A subsequent successful compression must clear the aux-failure
fields so the warning doesn't persist forever."""
mock_ok = MagicMock()
mock_ok.choices = [MagicMock()]
mock_ok.choices[0].message.content = "summary via main"
err_400 = Exception("400 aux model busted")
err_400.status_code = 400
with patch("agent.context_compressor.get_model_context_length", return_value=100000):
c = ContextCompressor(
model="main-model",
summary_model_override="broken-aux-model",
quiet_mode=True,
protect_first_n=2,
protect_last_n=2,
)
# Call 1: aux fails, retry-on-main succeeds
with patch(
"agent.context_compressor.call_llm",
side_effect=[err_400, mock_ok],
):
c.compress(self._make_msgs())
assert c._last_aux_model_failure_model == "broken-aux-model"
# Call 2: clean run on main (summary_model was cleared to "" after
# first fallback). Aux-failure fields MUST reset at compress() start
# so the old warning state doesn't leak into this call.
with patch(
"agent.context_compressor.call_llm",
return_value=mock_ok,
):
c.compress(self._make_msgs())
assert c._last_aux_model_failure_model is None
assert c._last_aux_model_failure_error is None
class TestSummaryFailureTrackingForGatewayWarning:
"""When summary generation fails, the compressor must record dropped count
+ fallback flag so gateway hygiene & /compress can surface a visible
warning instead of silently dropping context."""
def test_compress_records_fallback_and_dropped_count_on_summary_failure(self):
with patch("agent.context_compressor.get_model_context_length", return_value=100000):
c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=2, protect_last_n=2)
msgs = [
{"role": "system", "content": "sys"},
{"role": "user", "content": "msg 1"},
{"role": "assistant", "content": "msg 2"},
{"role": "user", "content": "msg 3"},
{"role": "assistant", "content": "msg 4"},
{"role": "user", "content": "msg 5"},
{"role": "assistant", "content": "msg 6"},
{"role": "user", "content": "msg 7"},
]
# Simulate summary LLM call failing — covers the 404 / model-not-found
# case from issue (auxiliary compression model misconfigured).
with patch("agent.context_compressor.call_llm", side_effect=Exception("404 model not found")):
result = c.compress(msgs)
assert c._last_summary_fallback_used is True
assert c._last_summary_dropped_count > 0
assert c._last_summary_error is not None
# Result must still be well-formed (fallback summary present).
assert any(
isinstance(m.get("content"), str) and "Summary generation was unavailable" in m["content"]
for m in result
)
def test_compress_clears_fallback_flag_on_subsequent_success(self):
mock_response = MagicMock()
mock_response.choices = [MagicMock()]
mock_response.choices[0].message.content = "summary text"
with patch("agent.context_compressor.get_model_context_length", return_value=100000):
c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=2, protect_last_n=2)
msgs = [
{"role": "system", "content": "sys"},
{"role": "user", "content": "msg 1"},
{"role": "assistant", "content": "msg 2"},
{"role": "user", "content": "msg 3"},
{"role": "assistant", "content": "msg 4"},
{"role": "user", "content": "msg 5"},
{"role": "assistant", "content": "msg 6"},
{"role": "user", "content": "msg 7"},
]
# First call fails, second succeeds — flag must reset on second compress.
with patch("agent.context_compressor.call_llm", side_effect=Exception("boom")):
c.compress(msgs)
assert c._last_summary_fallback_used is True
# Reset cooldown to allow retry on second compress
c._summary_failure_cooldown_until = 0.0
with patch("agent.context_compressor.call_llm", return_value=mock_response):
c.compress(msgs)
assert c._last_summary_fallback_used is False
assert c._last_summary_dropped_count == 0
class TestSummaryPrefixNormalization:
def test_legacy_prefix_is_replaced(self):
summary = ContextCompressor._with_summary_prefix("[CONTEXT SUMMARY]: did work")
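The retry-on-main behavior that `TestSummaryFallbackToMainModel` exercises can be sketched as below. This is an illustrative stand-in under stated assumptions, not the `ContextCompressor._generate_summary` implementation: `summarize_with_fallback` and its return shape are hypothetical, `call_llm` is passed in as a plain callable, and the cooldown bookkeeping is left out.

```python
def summarize_with_fallback(call_llm, messages, summary_model, main_model):
    """Try the auxiliary model once, then retry once on the main model.

    Returns (summary_or_None, aux_failure), where aux_failure is a
    (model, error_text) tuple if the auxiliary attempt failed, else None.
    """
    aux_failure = None
    # Only use the auxiliary model when it differs from the main model;
    # otherwise there is nothing to fall back to.
    if summary_model and summary_model != main_model:
        try:
            return call_llm(messages=messages, model=summary_model), None
        except Exception as exc:
            # Remember the failure so callers can warn about the config.
            aux_failure = (summary_model, str(exc))
    try:
        # No model kwarg: the callable is assumed to default to the main model.
        return call_llm(messages=messages), aux_failure
    except Exception:
        # At most two attempts in total; no further retries.
        return None, aux_failure
```

Returning the auxiliary failure alongside a successful summary mirrors the point made in the tests: recovery should not hide a misconfiguration that only the user can fix.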
@@ -0,0 +1,211 @@
"""Unit tests for StreamingContextScrubber (agent/memory_manager.py).
Regression coverage for #5719 — memory-context spans split across stream
deltas must not leak payload to the UI. The one-shot sanitize_context()
regex can't survive chunk boundaries, so _fire_stream_delta routes deltas
through a stateful scrubber.
"""
from agent.memory_manager import StreamingContextScrubber, sanitize_context
class TestStreamingContextScrubberBasics:
def test_empty_input_returns_empty(self):
s = StreamingContextScrubber()
assert s.feed("") == ""
assert s.flush() == ""
def test_plain_text_passes_through(self):
s = StreamingContextScrubber()
assert s.feed("hello world") == "hello world"
assert s.flush() == ""
def test_complete_block_in_single_delta(self):
"""Regression: the one-shot test case from #13672 must still work."""
s = StreamingContextScrubber()
leaked = (
"<memory-context>\n"
"[System note: The following is recalled memory context, NOT new "
"user input. Treat as informational background data.]\n\n"
"## Honcho Context\nstale memory\n"
"</memory-context>\n\nVisible answer"
)
out = s.feed(leaked) + s.flush()
assert out == "\n\nVisible answer"
def test_open_and_close_in_separate_deltas_strips_payload(self):
"""The real streaming case: tag pair split across deltas."""
s = StreamingContextScrubber()
deltas = [
"Hello ",
"<memory-context>\npayload ",
"more payload\n",
"</memory-context> world",
]
out = "".join(s.feed(d) for d in deltas) + s.flush()
assert out == "Hello world"
assert "payload" not in out
def test_realistic_fragmented_chunks_strip_memory_payload(self):
"""Exact leak scenario from the reviewer's comment — 4 realistic chunks.
This is the case the original #13672 fix silently leaks on: the open
tag, system note, payload, and close tag each arrive in their own
delta because providers emit 1-80 char chunks.
"""
s = StreamingContextScrubber()
deltas = [
"<memory-context>\n[System note: The following",
" is recalled memory context, NOT new user input. "
"Treat as informational background data.]\n\n",
"## Honcho Context\nstale memory\n",
"</memory-context>\n\nVisible answer",
]
out = "".join(s.feed(d) for d in deltas) + s.flush()
assert out == "\n\nVisible answer"
# The system-note line and payload must never reach the UI.
assert "System note" not in out
assert "Honcho Context" not in out
assert "stale memory" not in out
def test_open_tag_split_across_two_deltas(self):
"""The open tag itself arriving in two fragments."""
s = StreamingContextScrubber()
out = (
s.feed("pre <memory")
+ s.feed("-context>leak</memory-context> post")
+ s.flush()
)
assert out == "pre post"
assert "leak" not in out
def test_close_tag_split_across_two_deltas(self):
"""The close tag arriving in two fragments."""
s = StreamingContextScrubber()
out = (
s.feed("pre <memory-context>leak</memory")
+ s.feed("-context> post")
+ s.flush()
)
assert out == "pre post"
assert "leak" not in out
class TestStreamingContextScrubberPartialTagFalsePositives:
def test_partial_open_tag_tail_emitted_on_flush(self):
"""Bare '<mem' at end of stream is not really a memory-context tag."""
s = StreamingContextScrubber()
out = s.feed("hello <mem") + s.feed("ory other") + s.flush()
assert out == "hello <memory other"
def test_partial_tag_released_when_disambiguated(self):
"""A held-back partial tag that turns out to be prose gets released."""
s = StreamingContextScrubber()
# '< ' should not look like the start of any tag.
out = s.feed("price < ") + s.feed("10 dollars") + s.flush()
assert out == "price < 10 dollars"
class TestStreamingContextScrubberUnterminatedSpan:
def test_unterminated_span_drops_payload(self):
"""Provider drops close tag — better to lose output than to leak."""
s = StreamingContextScrubber()
out = s.feed("pre <memory-context>secret never closed") + s.flush()
assert out == "pre "
assert "secret" not in out
def test_reset_clears_hung_span(self):
"""Cross-turn scrubber reset drops a hung span so next turn is clean."""
s = StreamingContextScrubber()
s.feed("pre <memory-context>half")
s.reset()
out = s.feed("clean text") + s.flush()
assert out == "clean text"
class TestStreamingContextScrubberCaseInsensitivity:
def test_uppercase_tags_still_scrubbed(self):
s = StreamingContextScrubber()
out = (
s.feed("<MEMORY-CONTEXT>secret")
+ s.feed("</Memory-Context>visible")
+ s.flush()
)
assert out == "visible"
assert "secret" not in out
class TestSanitizeContextUnchanged:
"""Smoke test that the one-shot sanitize_context still works for whole strings."""
def test_whole_block_still_sanitized(self):
leaked = (
"<memory-context>\n"
"[System note: The following is recalled memory context, NOT new "
"user input. Treat as informational background data.]\n"
"payload\n"
"</memory-context>\nVisible"
)
out = sanitize_context(leaked).strip()
assert out == "Visible"
class TestStreamingContextScrubberCrossTurn:
"""A scrubber instance is reused across turns (per agent). reset() must
clear any held state so a partial-tag tail from turn N doesn't bleed
into turn N+1's first delta."""
def test_reset_clears_held_partial_tag(self):
s = StreamingContextScrubber()
# Feed a partial open-tag prefix that gets held back as buffer.
out_turn_1 = s.feed("answer<memo")
assert out_turn_1 == "answer"
# Reset for next turn — buffer must clear.
s.reset()
# New turn: plain text starting with a "<m" must NOT be treated as
# the continuation of the held "<memo".
out_turn_2 = s.feed("<marker>fresh content")
assert out_turn_2 == "<marker>fresh content"
def test_reset_clears_in_span_state(self):
s = StreamingContextScrubber()
s.feed("text<memory-context>secret-tail")
# Mid-span state held — without reset, subsequent text would be
# discarded until we see </memory-context>.
s.reset()
out = s.feed("post-reset visible text")
assert out == "post-reset visible text"
class TestBuildMemoryContextBlockWarnsOnViolation:
"""Providers must return raw context — not pre-wrapped. When they do,
we strip and warn so the buggy provider surfaces."""
def test_provider_emitting_wrapper_warns(self, caplog):
import logging
from agent.memory_manager import build_memory_context_block
prewrapped = (
"<memory-context>\n"
"[System note: ...]\n\n"
"real fact\n"
"</memory-context>"
)
with caplog.at_level(logging.WARNING, logger="agent.memory_manager"):
out = build_memory_context_block(prewrapped)
assert any("pre-wrapped" in rec.message for rec in caplog.records)
assert out.count("<memory-context>") == 1
assert out.count("</memory-context>") == 1
def test_clean_provider_output_does_not_warn(self, caplog):
import logging
from agent.memory_manager import build_memory_context_block
with caplog.at_level(logging.WARNING, logger="agent.memory_manager"):
out = build_memory_context_block("plain fact about user")
assert not any("pre-wrapped" in rec.message for rec in caplog.records)
assert "plain fact about user" in out
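To make the chunk-boundary problem concrete, here is a minimal sketch of a stateful scrubber with the same `feed` / `flush` / `reset` surface the tests above use. It is an assumption-laden illustration, not the `StreamingContextScrubber` in `agent/memory_manager.py`: the class name `SketchScrubber` and the helper `_held_len` are hypothetical. The idea is to hold back any tail that could still grow into a tag and to discard everything seen while inside a span.

```python
import re

OPEN_TAG = "<memory-context>"
CLOSE_TAG = "</memory-context>"
_OPEN_RE = re.compile(re.escape(OPEN_TAG), re.IGNORECASE)
_CLOSE_RE = re.compile(re.escape(CLOSE_TAG), re.IGNORECASE)


def _held_len(text, tag):
    """Length of the longest tail of text that is a proper prefix of tag."""
    for n in range(min(len(tag) - 1, len(text)), 0, -1):
        if text[-n:].lower() == tag[:n]:
            return n
    return 0


class SketchScrubber:
    def __init__(self):
        self.reset()

    def reset(self):
        """Forget any open span and any held-back partial tag."""
        self._in_span = False
        self._held = ""

    def feed(self, delta):
        """Return the part of this delta that is safe to show."""
        text, self._held = self._held + delta, ""
        out = []
        while text:
            pattern, tag = (
                (_CLOSE_RE, CLOSE_TAG) if self._in_span else (_OPEN_RE, OPEN_TAG)
            )
            match = pattern.search(text)
            if match is None:
                # Hold back a tail that might be the start of the next tag.
                keep = _held_len(text, tag)
                if not self._in_span:
                    out.append(text[: len(text) - keep])
                self._held = text[len(text) - keep:]
                break
            if not self._in_span:
                out.append(text[: match.start()])
            text = text[match.end():]
            self._in_span = not self._in_span
        return "".join(out)

    def flush(self):
        """Emit a held partial tag as prose; drop an unterminated span."""
        tail = "" if self._in_span else self._held
        self.reset()
        return tail
```

In this sketch an unterminated span is dropped on `flush()`, which matches the trade-off named in the tests: losing output is preferred over leaking payload.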
@@ -288,6 +288,10 @@ def _hermetic_environment(tmp_path, monkeypatch):
monkeypatch.setattr(_plugins_mod, "_plugin_manager", None)
except Exception:
pass
# Explicitly clear provider-specific base URL overrides that don't match
# the generic credential-shaped env-var filter above.
monkeypatch.delenv("GMI_API_KEY", raising=False)
monkeypatch.delenv("GMI_BASE_URL", raising=False)
# Backward-compat alias — old tests reference this fixture name. Keep it
@@ -0,0 +1,49 @@
# Matrix cross-signing bootstrap — E2E test
Self-contained end-to-end test for the auto-bootstrap behavior added in
`gateway/platforms/matrix.py`. Spins up a real Continuwuity homeserver
in Docker, registers a fresh bot, runs the patched bootstrap path
against it, and asserts:
1. Cross-signing keys get published with **unpadded** base64 keyids
(the bug this PR fixes — padded keyids are silently rejected by
matrix-rust-sdk in Element).
2. On a second startup with the same crypto store, bootstrap is
skipped.
3. When `MATRIX_RECOVERY_KEY` is set, the existing recovery-key path
takes precedence and no fresh bootstrap happens.
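The first assertion hinges on the difference between padded and unpadded base64. A small sketch of that check, assuming a cross-signing key object whose `keys` map is keyed by `ed25519:<public key>`; the helper names `unpadded_b64` and `keyids_are_unpadded` are hypothetical and do not come from `matrix.py` or mautrix:

```python
import base64


def unpadded_b64(raw: bytes) -> str:
    """Standard base64 with the trailing '=' padding removed."""
    return base64.b64encode(raw).decode("ascii").rstrip("=")


def keyids_are_unpadded(cross_signing_key: dict) -> bool:
    """True when no keyid in the object's 'keys' map ends with '='."""
    return all(
        not keyid.endswith("=")
        for keyid in cross_signing_key.get("keys", {})
    )
```

A 32-byte Ed25519 public key encodes to 44 characters with padding and 43 without, so a padded keyid always ends in a single `=`.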
## Run
```bash
# from repo root
docker compose -f tests/e2e/matrix_xsign_bootstrap/docker-compose.yml up -d
python tests/e2e/matrix_xsign_bootstrap/test_bootstrap.py
docker compose -f tests/e2e/matrix_xsign_bootstrap/docker-compose.yml down -v
```
The `down -v` step removes the persistent volume so the next run gets
a fresh homeserver — important because Continuwuity's one-time admin
registration token is only valid before the first user is created.
## Port
The compose binds Continuwuity to `127.0.0.1:26167` by default. Override
with `HOMESERVER_HOST_PORT=NNNNN docker compose up -d` if that port is
busy locally.
## What the test exercises
The test mirrors the bootstrap snippet from
`gateway/platforms/matrix.py` (the "if MATRIX_RECOVERY_KEY else
get_own_cross_signing_public_keys / generate_recovery_key" branch)
inline so it runs without importing the entire hermes gateway and its
many dependencies. **If the source diverges from what's in
`_connect_with_bootstrap`, this test must be updated to match.** A
small price for not requiring the full hermes-agent runtime in CI.
## Skipped when
- `mautrix` Python package is not installed
- The homeserver isn't reachable at `$E2E_MATRIX_HS` (default
`http://127.0.0.1:26167`)
@@ -0,0 +1,21 @@
services:
homeserver:
image: ghcr.io/continuwuity/continuwuity:latest
environment:
CONTINUWUITY_SERVER_NAME: localhost
CONTINUWUITY_DATABASE_PATH: /var/lib/conduwuit/conduwuit.db
CONTINUWUITY_PORT: "6167"
CONTINUWUITY_ADDRESS: "0.0.0.0"
CONTINUWUITY_ALLOW_REGISTRATION: "true"
CONTINUWUITY_REGISTRATION_TOKEN: testreg
CONTINUWUITY_ALLOW_FEDERATION: "false"
CONTINUWUITY_TRUSTED_SERVERS: "[]"
CONTINUWUITY_LOG: "warn,conduwuit=info"
CONTINUWUITY_ALLOW_CHECK_FOR_UPDATES: "false"
ports:
- "127.0.0.1:${HOMESERVER_HOST_PORT:-26167}:6167"
healthcheck:
test: ["CMD-SHELL", "exec 3<>/dev/tcp/127.0.0.1/6167 && echo -e 'GET /_matrix/client/versions HTTP/1.0\\r\\n\\r\\n' >&3 && head -1 <&3 | grep -q '200 OK' || exit 1"]
interval: 2s
timeout: 3s
retries: 30
@@ -0,0 +1,333 @@
"""End-to-end test for Matrix cross-signing auto-bootstrap.
Spins up a real Continuwuity homeserver in Docker, registers a fresh bot,
runs the patched ``MatrixAdapter.connect()`` against it, and asserts:
1. cross-signing keys get published with **unpadded** base64 keyids
(the bug this PR fixes: padded keyids are silently rejected by
matrix-rust-sdk in Element);
2. on a second startup with the same crypto store, bootstrap is
skipped (``get_own_cross_signing_public_keys`` finds the keys);
3. the bot's current device is signed by the new SSK, so Element
considers the device "verified by its owner".
Self-contained: ``docker compose up -d`` brings up Continuwuity on
127.0.0.1:26167; this script registers a fresh bot using the
homeserver's one-time admin registration token (printed once at first
boot, parsed from the container logs); then drives the gateway code.
Run from repo root::
docker compose -f tests/e2e/matrix_xsign_bootstrap/docker-compose.yml up -d
python tests/e2e/matrix_xsign_bootstrap/test_bootstrap.py
docker compose -f tests/e2e/matrix_xsign_bootstrap/docker-compose.yml down -v
Skipped automatically if mautrix isn't installed or the homeserver
isn't reachable.
"""
from __future__ import annotations
import asyncio
import json
import logging
import os
import re
import secrets
import shutil
import subprocess
import sys
import tempfile
import time
import unittest
import urllib.error
import urllib.request
from pathlib import Path
REPO_ROOT = Path(__file__).resolve().parents[3]
sys.path.insert(0, str(REPO_ROOT))
HS = os.environ.get("E2E_MATRIX_HS", "http://127.0.0.1:26167")
COMPOSE_DIR = Path(__file__).parent
CONTAINER_NAME = "matrix_xsign_bootstrap-homeserver-1"
def _hs_reachable() -> bool:
try:
urllib.request.urlopen(f"{HS}/_matrix/client/versions", timeout=2).read()
return True
except Exception:
return False
def _first_time_token() -> str | None:
"""Continuwuity prints a one-time registration token on first boot.
The configured CONTINUWUITY_REGISTRATION_TOKEN does NOT activate
until an account exists, so we have to pull this token out of the
docker logs to bootstrap the very first user.
"""
try:
out = subprocess.run(
["docker", "logs", CONTAINER_NAME],
capture_output=True, text=True, check=True,
).stdout + subprocess.run(
["docker", "logs", CONTAINER_NAME],
capture_output=True, text=True, check=True,
).stderr
except Exception:
return None
cleaned = re.sub(r"\x1b\[[0-9;]*m", "", out)
m = re.search(r"registration token ([A-Za-z0-9]+)", cleaned)
return m.group(1) if m else None
def _post_json(url: str, body: dict, headers: dict | None = None) -> tuple[int, dict]:
req = urllib.request.Request(
url, data=json.dumps(body).encode(),
headers={"Content-Type": "application/json", **(headers or {})},
method="POST",
)
try:
r = urllib.request.urlopen(req)
return r.status, json.load(r)
except urllib.error.HTTPError as e:
return e.code, json.loads(e.read().decode())
CONFIG_REG_TOKEN = "testreg" # matches docker-compose.yml
def _register_bot(*, prefer_token: str = CONFIG_REG_TOKEN, fallback_token: str | None = None) -> dict:
"""Register a fresh bot. Tries the configured token first; falls back to
the homeserver's one-time admin token (only valid until the first user
is created)."""
user = "bot" + secrets.token_hex(3)
password = secrets.token_urlsafe(20)
last_err = None
for tok in (prefer_token, fallback_token):
if tok is None:
continue
st, b = _post_json(f"{HS}/_matrix/client/v3/register", {})
if st != 401 or "session" not in b:
last_err = (st, b); continue
session = b["session"]
st, b = _post_json(f"{HS}/_matrix/client/v3/register", {
"auth": {"type": "m.login.registration_token", "token": tok, "session": session},
"username": user, "password": password,
"initial_device_display_name": "e2e-bootstrap-test",
})
if st == 200:
return b
last_err = (st, b)
raise AssertionError(f"register failed for both tokens: {last_err}")
def _query_keys(token: str, mxid: str) -> dict:
return _post_json(
f"{HS}/_matrix/client/v3/keys/query",
{"device_keys": {mxid: []}},
headers={"Authorization": f"Bearer {token}"},
)[1]
@unittest.skipUnless(_hs_reachable(), f"homeserver not reachable at {HS}")
class XsignBootstrapE2E(unittest.IsolatedAsyncioTestCase):
"""Drive the patched MatrixAdapter.connect() against real continuwuity."""
@classmethod
def setUpClass(cls):
try:
import mautrix # noqa: F401
except ImportError:
raise unittest.SkipTest("mautrix not installed")
cls.first_tok = _first_time_token()
# If no user has ever been created, the configured `testreg` token
# won't activate yet — burn the one-time admin token first to
# bootstrap the homeserver into a usable state.
if cls.first_tok:
try:
_register_bot(prefer_token=cls.first_tok, fallback_token=None)
except AssertionError:
pass # Already burnt previously; testreg should now work.
async def _connect_with_bootstrap(self, creds: dict, store_dir: Path) -> tuple[list[str], str | None]:
"""Drive matrix.py's bootstrap branch directly.
We import the gateway module and execute the same OlmMachine init +
bootstrap sequence, capturing log lines so we can assert what fired.
Returns (log_lines, recovery_key_or_None).
"""
from mautrix.api import HTTPAPI
from mautrix.client import Client
from mautrix.client.state_store.memory import MemoryStateStore
from mautrix.crypto import OlmMachine, PgCryptoStore
from mautrix.types import TrustState
from mautrix.util.async_db import Database
# The actual bootstrap snippet from gateway/platforms/matrix.py
# (copied so we can run it without importing the full hermes
# gateway and its many deps). If the source code drifts from this,
# the test should be updated to match.
log_lines: list[str] = []
captured_recovery_key: str | None = None
class _Capture(logging.Handler):
def emit(self, record):
log_lines.append(self.format(record))
logger = logging.getLogger("e2e.bootstrap")
logger.setLevel(logging.DEBUG)
handler = _Capture()
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(handler)
api = HTTPAPI(base_url=creds["homeserver"], token=creds["access_token"])
client = Client(
mxid=creds["user_id"], api=api,
device_id=creds["device_id"], state_store=MemoryStateStore(),
)
client.api.token = creds["access_token"]
store_dir.mkdir(parents=True, exist_ok=True)
db_path = store_dir / "crypto.db"
crypto_db = Database.create(f"sqlite:///{db_path}", upgrade_table=PgCryptoStore.upgrade_table)
await crypto_db.start()
crypto_store = PgCryptoStore(account_id=creds["user_id"], pickle_key="e2e-test", db=crypto_db)
await crypto_store.open()
olm = OlmMachine(client, crypto_store, MemoryStateStore())
olm.share_keys_min_trust = TrustState.UNVERIFIED
olm.send_keys_min_trust = TrustState.UNVERIFIED
await olm.load()
# --- The patched bootstrap block, mirrored from matrix.py ---
recovery_key = os.getenv("MATRIX_RECOVERY_KEY", "").strip()
if recovery_key:
try:
await olm.verify_with_recovery_key(recovery_key)
logger.info("Matrix: cross-signing verified via recovery key")
except Exception as exc:
logger.warning("Matrix: recovery key verification failed: %s", exc)
else:
try:
own_xsign = await olm.get_own_cross_signing_public_keys()
except Exception as exc:
own_xsign = None
logger.warning("Matrix: cross-signing key lookup failed: %s", exc)
if own_xsign is None:
try:
new_recovery_key = await olm.generate_recovery_key()
captured_recovery_key = new_recovery_key
logger.warning(
"Matrix: bootstrapped cross-signing for %s. "
"SAVE THIS RECOVERY KEY: %s",
client.mxid, new_recovery_key,
)
except Exception as exc:
logger.warning("Matrix: cross-signing bootstrap failed: %s", exc)
# --- /end patched block ---
# Clean teardown — without this the asyncio loop never exits.
await crypto_db.stop()
await api.session.close()
return log_lines, captured_recovery_key
async def asyncSetUp(self):
self.creds = _register_bot(prefer_token=CONFIG_REG_TOKEN, fallback_token=self.first_tok)
self.creds["homeserver"] = HS
self.tmp = Path(tempfile.mkdtemp(prefix="e2e-xsign-"))
# mautrix.generate_recovery_key requires account.shared, which means
# we must share device keys (one-time keys) first. Do that via a
# short bootstrap to publish device keys.
await self._publish_device_keys(self.creds, self.tmp)
async def _publish_device_keys(self, creds, store_dir):
"""Tiny helper: open OlmMachine, share device keys, close."""
from mautrix.api import HTTPAPI
from mautrix.client import Client
from mautrix.client.state_store.memory import MemoryStateStore
from mautrix.crypto import OlmMachine, PgCryptoStore
from mautrix.util.async_db import Database
api = HTTPAPI(base_url=creds["homeserver"], token=creds["access_token"])
client = Client(mxid=creds["user_id"], api=api, device_id=creds["device_id"],
state_store=MemoryStateStore())
store_dir.mkdir(parents=True, exist_ok=True)
crypto_db = Database.create(f"sqlite:///{store_dir / 'crypto.db'}",
upgrade_table=PgCryptoStore.upgrade_table)
await crypto_db.start()
crypto_store = PgCryptoStore(account_id=creds["user_id"], pickle_key="e2e-test", db=crypto_db)
await crypto_store.open()
olm = OlmMachine(client, crypto_store, MemoryStateStore())
await olm.load()
await olm.share_keys() # publishes device keys (precondition for generate_recovery_key)
await crypto_db.stop()
await api.session.close()
async def asyncTearDown(self):
shutil.rmtree(self.tmp, ignore_errors=True)
async def test_bootstrap_publishes_unpadded_keys(self):
"""Fresh bot → bootstrap fires, keys published unpadded, device signed."""
log_lines, rec_key = await self._connect_with_bootstrap(self.creds, self.tmp)
# 1. Bootstrap must have produced a recovery key
self.assertIsNotNone(rec_key, "expected recovery key from bootstrap")
self.assertTrue(any("bootstrapped cross-signing" in l for l in log_lines),
f"expected bootstrap log line, got: {log_lines}")
# 2. Homeserver should now serve a master + ssk for the bot
d = _query_keys(self.creds["access_token"], self.creds["user_id"])
self.assertIn(self.creds["user_id"], d.get("master_keys", {}),
"no master_keys after bootstrap")
self.assertIn(self.creds["user_id"], d.get("self_signing_keys", {}),
"no self_signing_keys after bootstrap")
# 3. The keyids must be UNPADDED (this is the bug this PR exists to fix)
master_kid = next(iter(d["master_keys"][self.creds["user_id"]]["keys"]))
ssk_kid = next(iter(d["self_signing_keys"][self.creds["user_id"]]["keys"]))
self.assertFalse(master_kid.endswith("="),
f"master keyid is padded: {master_kid!r}")
self.assertFalse(ssk_kid.endswith("="),
f"ssk keyid is padded: {ssk_kid!r}")
# 4. The current device must be signed by the new SSK
dev = d["device_keys"][self.creds["user_id"]][self.creds["device_id"]]
sig_kids = list(dev["signatures"][self.creds["user_id"]].keys())
self.assertIn(ssk_kid, sig_kids,
f"device {self.creds['device_id']} not signed by new SSK; "
f"signatures: {sig_kids}")
async def test_second_startup_skips_bootstrap(self):
"""Second startup with same crypto store → no second recovery key."""
# First connect bootstraps.
_, rec1 = await self._connect_with_bootstrap(self.creds, self.tmp)
self.assertIsNotNone(rec1, "first connect should have bootstrapped")
# Second connect on same crypto store should NOT re-bootstrap.
log2, rec2 = await self._connect_with_bootstrap(self.creds, self.tmp)
self.assertIsNone(rec2, f"second connect re-bootstrapped! logs: {log2}")
self.assertFalse(any("bootstrapped cross-signing" in l for l in log2),
f"second connect re-bootstrapped! logs: {log2}")
async def test_recovery_key_path_takes_precedence(self):
"""If MATRIX_RECOVERY_KEY is set, no fresh bootstrap happens."""
# First, bootstrap to get a real recovery key.
_, rec_key = await self._connect_with_bootstrap(self.creds, self.tmp)
self.assertIsNotNone(rec_key)
# Fresh store directory + recovery key set in env: must take the
# verify_with_recovery_key path, NOT bootstrap a new identity.
fresh_store = Path(tempfile.mkdtemp(prefix="e2e-xsign-fresh-"))
try:
await self._publish_device_keys(self.creds, fresh_store)
os.environ["MATRIX_RECOVERY_KEY"] = rec_key
try:
log, rec2 = await self._connect_with_bootstrap(self.creds, fresh_store)
self.assertIsNone(rec2, "bootstrap fired despite MATRIX_RECOVERY_KEY being set")
self.assertTrue(
any("verified via recovery key" in l for l in log),
f"expected recovery-key verify log, got: {log}",
)
finally:
del os.environ["MATRIX_RECOVERY_KEY"]
finally:
shutil.rmtree(fresh_store, ignore_errors=True)
if __name__ == "__main__":
unittest.main(verbosity=2)
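Step 3 of `test_bootstrap_publishes_unpadded_keys` asserts that the published key IDs do not end in `=`. The following is a minimal sketch of that property, assuming the key ID has the form `ed25519:<base64 of the public key>`; the helper names and the 32-byte stand-in key are hypothetical and are not taken from mautrix or the gateway code.

```python
import base64


def encode_unpadded_base64(raw: bytes) -> str:
    """Standard base64 with the trailing '=' padding stripped."""
    return base64.b64encode(raw).decode("ascii").rstrip("=")


def is_unpadded(key_id: str) -> bool:
    """Same check the test applies to the master and self-signing key IDs."""
    return not key_id.endswith("=")


# Stand-in for a 32-byte public key (hypothetical data, not a real key).
raw_key = bytes(range(32))

padded = base64.b64encode(raw_key).decode("ascii")  # 44 chars, ends in '='
unpadded = encode_unpadded_base64(raw_key)          # 43 chars, no padding
key_id = f"ed25519:{unpadded}"
```

A 32-byte key always leaves one `=` of padding under standard base64, which is why an encoder that forgets to strip it produces a key ID the test rejects.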
@@ -123,3 +123,123 @@ async def test_compress_command_explains_when_token_estimate_rises():
assert "denser summaries" in result
agent_instance.shutdown_memory_provider.assert_called_once()
agent_instance.close.assert_called_once()
@pytest.mark.asyncio
async def test_compress_command_appends_warning_when_summary_generation_fails():
"""When the auxiliary summariser fails and the compressor inserts a static
fallback placeholder, /compress must append a visible warning to its
reply. Otherwise the failure is silently logged and the user has no idea
earlier context is unrecoverable."""
history = _make_history()
# Compressed shape is irrelevant for this test — we only care that the
# warning surfaces. Drop one message so the headline is non-noop.
compressed = [
history[0],
{"role": "assistant", "content": "[fallback placeholder]"},
history[-1],
]
runner = _make_runner(history)
agent_instance = MagicMock()
agent_instance.shutdown_memory_provider = MagicMock()
agent_instance.close = MagicMock()
agent_instance.context_compressor.has_content_to_compress.return_value = True
# Simulate summary-generation failure: fallback flag set, dropped count
# populated, error string captured.
agent_instance.context_compressor._last_summary_fallback_used = True
agent_instance.context_compressor._last_summary_dropped_count = 7
agent_instance.context_compressor._last_summary_error = (
"404 model not found: gemini-3-flash-preview"
)
agent_instance.session_id = "sess-1"
agent_instance._compress_context.return_value = (compressed, "")
def _estimate(messages):
if messages == history:
return 100
if messages == compressed:
return 60
raise AssertionError(f"unexpected transcript: {messages!r}")
with (
patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "***"}),
patch("gateway.run._resolve_gateway_model", return_value="test-model"),
patch("run_agent.AIAgent", return_value=agent_instance),
patch("agent.model_metadata.estimate_messages_tokens_rough", side_effect=_estimate),
):
result = await runner._handle_compress_command(_make_event())
# The compress reply itself still goes through (the transcript was rewritten).
assert "Compressed:" in result
# ...but a clearly-marked warning must be appended.
assert "⚠️" in result
assert "Summary generation failed" in result
# Underlying error must surface so users can fix their config.
assert "404 model not found" in result
# Dropped count must be visible — silently losing N messages is the bug.
assert "7" in result
assert "historical message(s) were removed" in result
agent_instance.shutdown_memory_provider.assert_called_once()
agent_instance.close.assert_called_once()
@pytest.mark.asyncio
async def test_compress_command_surfaces_aux_model_failure_even_when_recovered():
"""When the user's configured ``auxiliary.compression.model`` errors out
but compression recovers by retrying on the main model, /compress must
STILL inform the user. Silent recovery hides broken config the user
needs to fix."""
history = _make_history()
# Compressed transcript — normal successful compression, no placeholder.
compressed = [
history[0],
{"role": "assistant", "content": "summary via main model"},
history[-1],
]
runner = _make_runner(history)
agent_instance = MagicMock()
agent_instance.shutdown_memory_provider = MagicMock()
agent_instance.close = MagicMock()
agent_instance.context_compressor.has_content_to_compress.return_value = True
# Fallback placeholder was NOT used — recovery succeeded.
agent_instance.context_compressor._last_summary_fallback_used = False
agent_instance.context_compressor._last_summary_dropped_count = 0
agent_instance.context_compressor._last_summary_error = None
# But the configured aux model DID fail before the retry succeeded.
agent_instance.context_compressor._last_aux_model_failure_model = (
"gemini-3-flash-preview"
)
agent_instance.context_compressor._last_aux_model_failure_error = (
"404 model not found: gemini-3-flash-preview"
)
agent_instance.session_id = "sess-1"
agent_instance._compress_context.return_value = (compressed, "")
def _estimate(messages):
if messages == history:
return 100
if messages == compressed:
return 60
raise AssertionError(f"unexpected transcript: {messages!r}")
with (
patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "***"}),
patch("gateway.run._resolve_gateway_model", return_value="test-model"),
patch("run_agent.AIAgent", return_value=agent_instance),
patch("agent.model_metadata.estimate_messages_tokens_rough", side_effect=_estimate),
):
result = await runner._handle_compress_command(_make_event())
# Compression succeeded
assert "Compressed:" in result
# No ⚠️ warning (that's reserved for dropped-turns case)
assert "⚠️" not in result
# But there IS an info note about the broken aux model
assert "" in result
assert "gemini-3-flash-preview" in result
assert "404" in result
assert "auxiliary.compression.model" in result
# The user's context is explicitly called out as intact
assert "intact" in result
agent_instance.shutdown_memory_provider.assert_called_once()
agent_instance.close.assert_called_once()
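The two `/compress` tests above distinguish a warning (fallback placeholder used, turns dropped) from an informational note (aux model failed, retry on the main model succeeded). A rough sketch of that branching follows, assuming the handler reads the same five values the tests set on `context_compressor`; the function name `build_compress_notes` and the exact message wording are hypothetical, chosen only to contain the substrings the tests assert on.

```python
from typing import List, Optional


def build_compress_notes(
    *,
    fallback_used: bool,
    dropped_count: int,
    summary_error: Optional[str],
    aux_model: Optional[str],
    aux_error: Optional[str],
) -> List[str]:
    """Return extra lines to append to the /compress reply (hypothetical)."""
    notes: List[str] = []
    if fallback_used:
        # Summary generation failed outright: earlier context is lost.
        notes.append(
            f"⚠️ Summary generation failed ({summary_error}). "
            f"{dropped_count} historical message(s) were removed without a summary."
        )
    elif aux_model:
        # Recovered on the main model, but the configured aux model is broken.
        notes.append(
            f"Note: auxiliary.compression.model ({aux_model}) failed ({aux_error}). "
            "Compression was retried on the main model; your context is intact."
        )
    return notes


warning = build_compress_notes(
    fallback_used=True, dropped_count=7,
    summary_error="404 model not found: gemini-3-flash-preview",
    aux_model=None, aux_error=None,
)
info = build_compress_notes(
    fallback_used=False, dropped_count=0, summary_error=None,
    aux_model="gemini-3-flash-preview",
    aux_error="404 model not found: gemini-3-flash-preview",
)
```

Keeping the warning marker out of the recovered case matches the second test, which reserves it for the dropped-turns case.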
@@ -0,0 +1,200 @@
"""Tests for BasePlatformAdapter._keep_typing timeout-per-tick behavior.
When the gateway is waiting on a long upstream provider response (e.g.
Anthropic/opus-4.7 first-token latency climbing during an upstream blip),
the model-call socket is blocked on the worker thread but the asyncio loop
is still running, and ``_keep_typing`` refreshes the platform typing
indicator every 2 seconds.
The bug: each ``send_typing`` call is an HTTP round-trip to the platform API
(Telegram/Discord). If the same network instability that's slowing the model
call also makes ``send_typing`` slow (5-30s response time), the refresh loop
stalls inside the ``await self.send_typing(...)`` call. Platform-side typing
expires at ~5s, so the bubble dies and doesn't come back until that stuck
call returns, exactly when the user most needs the "yes, still working"
signal.
The fix: bound each ``send_typing`` with ``asyncio.wait_for``. If a
send_typing takes longer than the per-tick budget (default 1.5s when
interval=2.0), abandon it and let the next scheduled tick fire a fresh
call. As long as any one of them succeeds within the ~5s platform window,
the bubble stays visible across provider stalls.
"""
import asyncio
from unittest.mock import MagicMock
import pytest
from gateway.platforms.base import (
BasePlatformAdapter,
Platform,
PlatformConfig,
SendResult,
)
class _StubAdapter(BasePlatformAdapter):
def __init__(self):
super().__init__(PlatformConfig(enabled=True, token="test"), Platform.TELEGRAM)
async def connect(self) -> bool:
return True
async def disconnect(self) -> None:
self._mark_disconnected()
async def send(self, chat_id, content, reply_to=None, metadata=None):
return SendResult(success=True, message_id="m1")
async def get_chat_info(self, chat_id):
return {"id": chat_id, "type": "dm"}
class TestKeepTypingTimeoutPerTick:
@pytest.mark.asyncio
async def test_slow_send_typing_does_not_block_cadence(self, monkeypatch):
"""A send_typing that hangs longer than the per-tick budget must be
abandoned so the next scheduled tick can fire a fresh call."""
adapter = _StubAdapter()
call_events = []
async def slow_send_typing(chat_id, metadata=None):
# Simulate a stuck HTTP round-trip. If _keep_typing awaits this
# unconditionally, the loop stalls for the full duration.
call_events.append("start")
try:
await asyncio.sleep(10)
finally:
call_events.append("finish-or-cancel")
monkeypatch.setattr(adapter, "send_typing", slow_send_typing)
# Avoid stop_typing side-effects in the finally block.
adapter.stop_typing = MagicMock(return_value=asyncio.sleep(0))
stop_event = asyncio.Event()
# Start the typing loop, let it run ~3s (should fire 2 ticks) then stop.
task = asyncio.create_task(
adapter._keep_typing(
chat_id="123",
interval=1.0,
stop_event=stop_event,
)
)
await asyncio.sleep(3.0)
stop_event.set()
try:
await asyncio.wait_for(task, timeout=2.0)
except asyncio.TimeoutError:
task.cancel()
pytest.fail(
"_keep_typing did not exit within 2s of stop_event.set() — "
"it is blocked on a slow send_typing call"
)
# With per-tick timeout, we should see MULTIPLE send_typing starts
# despite each being slow (abandoned via TimeoutError). Without the
# fix there would be exactly 1 start (the one still stuck).
starts = [e for e in call_events if e == "start"]
assert len(starts) >= 2, (
f"expected at least 2 send_typing ticks across 3s of slow "
f"operation, got {len(starts)} — refresh cadence is stalled "
f"on a slow send_typing"
)
@pytest.mark.asyncio
async def test_fast_send_typing_still_gets_awaited(self, monkeypatch):
"""When send_typing is fast (normal case), it must still complete
normally; the timeout is only an upper bound, not a cap on
successful calls."""
adapter = _StubAdapter()
completed = []
async def fast_send_typing(chat_id, metadata=None):
await asyncio.sleep(0.01) # well under the timeout
completed.append(chat_id)
monkeypatch.setattr(adapter, "send_typing", fast_send_typing)
adapter.stop_typing = MagicMock(return_value=asyncio.sleep(0))
stop_event = asyncio.Event()
task = asyncio.create_task(
adapter._keep_typing(
chat_id="456",
interval=0.5,
stop_event=stop_event,
)
)
await asyncio.sleep(1.2) # ~3 ticks
stop_event.set()
await asyncio.wait_for(task, timeout=1.0)
assert len(completed) >= 2, (
f"expected multiple completed send_typing calls, got "
f"{len(completed)}"
)
assert all(c == "456" for c in completed)
@pytest.mark.asyncio
async def test_send_typing_exception_does_not_kill_loop(self, monkeypatch):
"""A send_typing that raises (e.g. transient HTTP 500) must be
caught so the loop continues refreshing on schedule."""
adapter = _StubAdapter()
tick_count = {"n": 0}
async def flaky_send_typing(chat_id, metadata=None):
tick_count["n"] += 1
if tick_count["n"] == 1:
raise RuntimeError("transient upstream error")
# Subsequent calls succeed.
monkeypatch.setattr(adapter, "send_typing", flaky_send_typing)
adapter.stop_typing = MagicMock(return_value=asyncio.sleep(0))
stop_event = asyncio.Event()
task = asyncio.create_task(
adapter._keep_typing(
chat_id="789",
interval=0.3,
stop_event=stop_event,
)
)
await asyncio.sleep(1.0)
stop_event.set()
await asyncio.wait_for(task, timeout=1.0)
assert tick_count["n"] >= 2, (
f"loop exited after first send_typing exception; expected it to "
f"keep ticking (got {tick_count['n']} ticks)"
)
@pytest.mark.asyncio
async def test_paused_chat_skips_send_typing(self, monkeypatch):
"""When a chat is in _typing_paused (e.g. awaiting approval), the
loop must not call send_typing at all. Regression guard: existing
behavior, preserved through the timeout change."""
adapter = _StubAdapter()
calls = []
async def recording_send_typing(chat_id, metadata=None):
calls.append(chat_id)
monkeypatch.setattr(adapter, "send_typing", recording_send_typing)
adapter.stop_typing = MagicMock(return_value=asyncio.sleep(0))
adapter._typing_paused.add("paused-chat")
stop_event = asyncio.Event()
task = asyncio.create_task(
adapter._keep_typing(
chat_id="paused-chat",
interval=0.3,
stop_event=stop_event,
)
)
await asyncio.sleep(1.0)
stop_event.set()
await asyncio.wait_for(task, timeout=1.0)
assert calls == [], (
f"send_typing was called on a paused chat: {calls}"
)
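The fix described in the module docstring of the typing tests above (bound each `send_typing` with `asyncio.wait_for`, swallow errors, keep ticking) can be sketched as a standalone loop. This is not the actual `BasePlatformAdapter._keep_typing`; the free function `keep_typing`, its signature, and the 0.75 ratio (inferred from "1.5s when interval=2.0") are assumptions.

```python
import asyncio
from typing import Awaitable, Callable


async def keep_typing(
    send_typing: Callable[[], Awaitable[None]],
    stop_event: asyncio.Event,
    interval: float = 2.0,
) -> None:
    """Refresh a typing indicator until stop_event is set (illustrative)."""
    per_tick_timeout = interval * 0.75  # assumption: 1.5s budget at interval=2.0
    while not stop_event.is_set():
        try:
            # Abandon a slow call so it cannot stall the refresh cadence.
            await asyncio.wait_for(send_typing(), timeout=per_tick_timeout)
        except asyncio.TimeoutError:
            pass  # the next tick fires a fresh call
        except Exception:
            pass  # transient platform errors must not kill the loop
        try:
            # Sleep for one interval, but wake immediately on stop.
            await asyncio.wait_for(stop_event.wait(), timeout=interval)
        except asyncio.TimeoutError:
            continue


async def demo() -> int:
    """Count how many ticks start while every send_typing call hangs."""
    starts = 0

    async def slow_send_typing() -> None:
        nonlocal starts
        starts += 1
        await asyncio.sleep(5)  # simulated stuck HTTP round-trip

    stop_event = asyncio.Event()
    task = asyncio.create_task(
        keep_typing(slow_send_typing, stop_event, interval=0.1)
    )
    await asyncio.sleep(0.5)
    stop_event.set()
    await asyncio.wait_for(task, timeout=1.0)
    return starts
```

Waiting on `stop_event.wait()` instead of a plain sleep is a design choice in this sketch so the loop exits promptly once the stop event is set, which is the behaviour the first test checks with its 2-second exit deadline.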
@@ -9,6 +9,7 @@ import pytest
from unittest.mock import MagicMock, patch, AsyncMock
from gateway.config import Platform, PlatformConfig
from gateway.platforms.base import MessageType
def _make_fake_mautrix():
@@ -1204,6 +1205,40 @@ class TestMatrixSyncLoop:
fake_client.handle_sync.assert_called_once()
mock_sync_store.put_next_batch.assert_awaited_once_with("s1234")
@pytest.mark.asyncio
async def test_sync_loop_reconciles_pending_invites(self):
"""Pending rooms.invite entries should be joined if callbacks were missed."""
adapter = _make_adapter()
adapter._closing = False
async def _sync_once(**kwargs):
adapter._closing = True
return {
"rooms": {
"join": {"!joined:example.org": {}},
"invite": {"!invited:example.org": {}},
},
"next_batch": "s1234",
}
mock_sync_store = MagicMock()
mock_sync_store.get_next_batch = AsyncMock(return_value=None)
mock_sync_store.put_next_batch = AsyncMock()
fake_client = MagicMock()
fake_client.sync = AsyncMock(side_effect=_sync_once)
fake_client.join_room = AsyncMock()
fake_client.sync_store = mock_sync_store
fake_client.handle_sync = MagicMock(return_value=[])
adapter._client = fake_client
with patch.object(adapter, "_refresh_dm_cache", AsyncMock()):
await adapter._sync_loop()
fake_client.join_room.assert_awaited_once()
assert "!joined:example.org" in adapter._joined_rooms
assert "!invited:example.org" in adapter._joined_rooms
class TestMatrixUploadAndSend:
@pytest.mark.asyncio
@@ -1862,6 +1897,81 @@ class TestMatrixReadReceipts:
assert result is False
# ---------------------------------------------------------------------------
# Media normalization
# ---------------------------------------------------------------------------
class TestMatrixImageOnlyMediaNormalization:
def setup_method(self):
self.adapter = _make_adapter()
self.adapter._client = MagicMock()
self.adapter._client.download_media = AsyncMock(return_value=None)
self.adapter._is_dm_room = AsyncMock(return_value=True)
self.adapter._get_display_name = AsyncMock(return_value="Alice")
self.adapter._background_read_receipt = MagicMock()
self.adapter._mxc_to_http = (
lambda url: "https://matrix.example.org/_matrix/media/v3/download/example/30.png"
)
@pytest.mark.asyncio
async def test_image_only_filename_body_is_not_forwarded_as_text(self):
captured_event = None
async def capture(msg_event):
nonlocal captured_event
captured_event = msg_event
self.adapter.handle_message = capture
await self.adapter._handle_media_message(
room_id="!room:example.org",
sender="@alice:example.org",
event_id="$image1",
event_ts=0.0,
source_content={
"msgtype": "m.image",
"body": "30.png",
"url": "mxc://example/30.png",
"info": {"mimetype": "image/png"},
},
relates_to={},
msgtype="m.image",
)
assert captured_event is not None
assert captured_event.text == ""
assert captured_event.media_urls == [
"https://matrix.example.org/_matrix/media/v3/download/example/30.png"
]
assert captured_event.message_type == MessageType.PHOTO
@pytest.mark.asyncio
async def test_image_caption_text_is_preserved(self):
captured_event = None
async def capture(msg_event):
nonlocal captured_event
captured_event = msg_event
self.adapter.handle_message = capture
await self.adapter._handle_media_message(
room_id="!room:example.org",
sender="@alice:example.org",
event_id="$image2",
event_ts=0.0,
source_content={
"msgtype": "m.image",
"body": "Please describe this chart",
"url": "mxc://example/30.png",
"info": {"mimetype": "image/png"},
},
relates_to={},
msgtype="m.image",
)
assert captured_event is not None
assert captured_event.text == "Please describe this chart"
# ---------------------------------------------------------------------------
# Message redaction
# ---------------------------------------------------------------------------
@@ -2099,3 +2209,139 @@ class TestMatrixOnRoomMessageFilter:
ev = self._mk_event(sender="@alice:example.org", body="hello bot")
await self.adapter._on_room_message(ev)
self.adapter._handle_text_message.assert_awaited_once()
# ---------------------------------------------------------------------------
# DM auto-thread
# ---------------------------------------------------------------------------
class TestMatrixDmAutoThread:
def setup_method(self):
self.adapter = _make_adapter()
self.adapter._is_dm_room = AsyncMock(return_value=True)
self.adapter._get_display_name = AsyncMock(return_value="Alice")
self.adapter._background_read_receipt = MagicMock()
# Disable require_mention so DMs pass gating
self.adapter._require_mention = False
@pytest.mark.asyncio
async def test_dm_auto_thread_enabled_creates_thread(self):
"""When dm_auto_thread is True, DM messages get auto-threaded."""
self.adapter._dm_auto_thread = True
ctx = await self.adapter._resolve_message_context(
room_id="!dm:ex",
sender="@alice:ex",
event_id="$ev1",
body="hello",
source_content={"body": "hello"},
relates_to={},
)
assert ctx is not None
_body, _is_dm, _chat_type, thread_id, _display, _source = ctx
assert thread_id == "$ev1"
@pytest.mark.asyncio
async def test_dm_auto_thread_disabled_no_thread(self):
"""When dm_auto_thread is False (default), DMs have no auto-thread."""
self.adapter._dm_auto_thread = False
ctx = await self.adapter._resolve_message_context(
room_id="!dm:ex",
sender="@alice:ex",
event_id="$ev2",
body="hello",
source_content={"body": "hello"},
relates_to={},
)
assert ctx is not None
_body, _is_dm, _chat_type, thread_id, _display, _source = ctx
assert thread_id is None
# ---------------------------------------------------------------------------
# Proxy configuration
# ---------------------------------------------------------------------------
class TestMatrixProxyConfig:
"""Verify that MatrixAdapter resolves and propagates proxy settings."""
def _make_adapter(self, monkeypatch, proxy_env=None):
monkeypatch.setenv("MATRIX_ACCESS_TOKEN", "syt_test")
monkeypatch.setenv("MATRIX_HOMESERVER", "https://matrix.example.org")
# Clear generic proxy vars so they don't leak from the host
for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
"https_proxy", "http_proxy", "all_proxy", "MATRIX_PROXY"):
monkeypatch.delenv(key, raising=False)
if proxy_env:
for k, v in proxy_env.items():
monkeypatch.setenv(k, v)
with patch.dict("sys.modules", _make_fake_mautrix()):
from gateway.platforms.matrix import MatrixAdapter
cfg = PlatformConfig(enabled=True, token="syt_test",
extra={"homeserver": "https://matrix.example.org",
"user_id": "@bot:example.org"})
return MatrixAdapter(cfg)
def test_no_proxy_by_default(self, monkeypatch):
adapter = self._make_adapter(monkeypatch)
assert adapter._proxy_url is None
def test_matrix_proxy_env_var(self, monkeypatch):
adapter = self._make_adapter(monkeypatch,
proxy_env={"MATRIX_PROXY": "socks5://proxy:1080"})
assert adapter._proxy_url == "socks5://proxy:1080"
def test_generic_proxy_fallback(self, monkeypatch):
adapter = self._make_adapter(monkeypatch,
proxy_env={"HTTPS_PROXY": "http://corp:8080"})
assert adapter._proxy_url == "http://corp:8080"
def test_matrix_proxy_takes_priority(self, monkeypatch):
adapter = self._make_adapter(monkeypatch,
proxy_env={"MATRIX_PROXY": "socks5://special:1080",
"HTTPS_PROXY": "http://generic:8080"})
assert adapter._proxy_url == "socks5://special:1080"
class TestCreateMatrixSession:
"""Verify _create_matrix_session applies proxy at the session level."""
@pytest.mark.asyncio
async def test_no_proxy_returns_trust_env_session(self):
with patch.dict("sys.modules", _make_fake_mautrix()):
from gateway.platforms.matrix import _create_matrix_session
session = _create_matrix_session(None)
try:
assert session.trust_env is True
finally:
await session.close()
@pytest.mark.asyncio
async def test_http_proxy_sets_default_proxy(self):
with patch.dict("sys.modules", _make_fake_mautrix()):
from gateway.platforms.matrix import _create_matrix_session
session = _create_matrix_session("http://proxy:8080")
try:
assert str(session._default_proxy) == "http://proxy:8080"
finally:
await session.close()
@pytest.mark.asyncio
async def test_socks_proxy_uses_connector(self):
fake_connector = MagicMock()
with patch.dict("sys.modules", _make_fake_mautrix()):
with patch.dict("sys.modules", {
"aiohttp_socks": MagicMock(
ProxyConnector=MagicMock(
from_url=MagicMock(return_value=fake_connector)
)
),
}):
from gateway.platforms.matrix import _create_matrix_session
session = _create_matrix_session("socks5://proxy:1080")
try:
assert session.connector is fake_connector
finally:
await session.close()
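`TestMatrixProxyConfig` above pins down a precedence: `MATRIX_PROXY` wins, a generic proxy variable is the fallback, and the default is no proxy. A minimal sketch of a resolver with that behaviour follows; `resolve_proxy_url` is a hypothetical name, and the ordering among the generic variables is an assumption, since the tests only exercise `HTTPS_PROXY`.

```python
from typing import Mapping, Optional

# Generic variables cleared by the test fixture; their relative order is assumed.
_GENERIC_PROXY_VARS = (
    "HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
    "https_proxy", "http_proxy", "all_proxy",
)


def resolve_proxy_url(env: Mapping[str, str]) -> Optional[str]:
    """Return the first non-empty proxy URL, preferring MATRIX_PROXY."""
    for key in ("MATRIX_PROXY", *_GENERIC_PROXY_VARS):
        value = env.get(key, "").strip()
        if value:
            return value
    return None
```

Passing the environment as a mapping (for example `os.environ`) keeps the function easy to test without `monkeypatch`.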
@@ -0,0 +1,60 @@
import types
import pytest
from unittest.mock import AsyncMock, patch
from gateway.config import PlatformConfig
class TestMatrixExecApprovalReactions:
@pytest.mark.asyncio
async def test_send_exec_approval_registers_prompt_and_seeds_reactions(self, monkeypatch):
monkeypatch.setenv("MATRIX_ALLOWED_USERS", "@liizfq:liizfq.top")
from gateway.platforms.matrix import MatrixAdapter
adapter = MatrixAdapter(PlatformConfig(enabled=True, token="tok", extra={"homeserver": "https://matrix.example.org"}))
adapter._client = types.SimpleNamespace()
adapter.send = AsyncMock(return_value=types.SimpleNamespace(success=True, message_id="$evt1"))
adapter._send_reaction = AsyncMock(return_value="$r")
result = await adapter.send_exec_approval(
chat_id="!room:example.org",
command="rm -rf /tmp/test",
session_key="sess-1",
description="dangerous",
)
assert result.success is True
assert adapter._approval_prompt_by_session["sess-1"] == "$evt1"
assert adapter._approval_prompts_by_event["$evt1"].session_key == "sess-1"
assert adapter._send_reaction.await_count == 2
emojis = [call.args[2] for call in adapter._send_reaction.await_args_list]
assert emojis == ["", ""]
@pytest.mark.asyncio
async def test_reaction_resolves_pending_approval(self, monkeypatch):
monkeypatch.setenv("MATRIX_ALLOWED_USERS", "@liizfq:liizfq.top")
from gateway.platforms.matrix import MatrixAdapter, _MatrixApprovalPrompt
adapter = MatrixAdapter(PlatformConfig(enabled=True, token="tok", extra={"homeserver": "https://matrix.example.org"}))
# Resolve user_id so _is_self_sender doesn't defensively drop all traffic (#15763).
adapter._user_id = "@bot:example.org"
adapter._approval_prompts_by_event["$target"] = _MatrixApprovalPrompt(
session_key="sess-1", chat_id="!room:example.org", message_id="$target"
)
adapter._approval_prompt_by_session["sess-1"] = "$target"
content = {"m.relates_to": {"event_id": "$target", "key": ""}}
event = types.SimpleNamespace(
sender="@liizfq:liizfq.top",
event_id="$react1",
room_id="!room:example.org",
content=content,
)
with patch("tools.approval.resolve_gateway_approval", return_value=1) as mock_resolve:
await adapter._on_reaction(event)
mock_resolve.assert_called_once_with("sess-1", "once")
assert "$target" not in adapter._approval_prompts_by_event
assert "sess-1" not in adapter._approval_prompt_by_session
@@ -159,7 +159,7 @@ class TestStripMention:
assert result == "help me"
def test_localpart_preserved(self):
-"""Localpart-only text is no longer stripped — avoids false positives in paths."""
+"""Bare localpart (no @) is preserved — avoids false positives in paths."""
result = self.adapter._strip_mention("hermes help me")
assert result == "hermes help me"
@@ -168,11 +168,98 @@ class TestStripMention:
result = self.adapter._strip_mention("read /home/hermes/config.yaml")
assert result == "read /home/hermes/config.yaml"
def test_strip_localpart_when_explicit_at_mention(self):
result = self.adapter._strip_mention("@hermes help me")
assert result == "help me"
def test_does_not_strip_bare_localpart_word(self):
# Regression: plain words like "Hermes Agent" should not be mutated.
result = self.adapter._strip_mention("Hermes Agent")
assert result == "Hermes Agent"
def test_strip_returns_empty_for_mention_only(self):
result = self.adapter._strip_mention("@hermes:example.org")
assert result == ""
# ---------------------------------------------------------------------------
# Outbound mention payloads
# ---------------------------------------------------------------------------
class TestOutboundMentions:
def setup_method(self):
self.adapter = _make_adapter()
self.mock_client = MagicMock()
self.mock_client.send_message_event = AsyncMock(return_value="$evt1")
self.adapter._client = self.mock_client
@staticmethod
def _sent_content(mock_client):
call_args = mock_client.send_message_event.call_args
return call_args.args[2] if len(call_args.args) > 2 else call_args.kwargs["content"]
@pytest.mark.asyncio
async def test_send_adds_matrix_mentions_and_formatted_body(self):
result = await self.adapter.send(
"!room1:example.org",
"Hello @alice:example.org, please check this.",
)
assert result.success is True
content = self._sent_content(self.mock_client)
assert content["m.mentions"] == {"user_ids": ["@alice:example.org"]}
assert content["formatted_body"] == (
'Hello <a href="https://matrix.to/#/@alice:example.org">'
"@alice:example.org</a>, please check this."
)
@pytest.mark.asyncio
async def test_send_dedupes_mentions_and_ignores_code_spans(self):
await self.adapter.send(
"!room1:example.org",
"Ping @alice:example.org and @alice:example.org, not `@code:example.org`.",
)
content = self._sent_content(self.mock_client)
assert content["m.mentions"] == {"user_ids": ["@alice:example.org"]}
assert "@code:example.org</a>" not in content["formatted_body"]
@pytest.mark.asyncio
async def test_edit_message_preserves_mentions(self):
result = await self.adapter.edit_message(
"!room1:example.org",
"$original",
"Updated for @alice:example.org",
)
assert result.success is True
content = self._sent_content(self.mock_client)
assert content["m.mentions"] == {"user_ids": ["@alice:example.org"]}
assert content["m.new_content"]["m.mentions"] == {"user_ids": ["@alice:example.org"]}
assert content["m.new_content"]["formatted_body"] == (
'Updated for <a href="https://matrix.to/#/@alice:example.org">'
"@alice:example.org</a>"
)
assert content["formatted_body"] == (
'* Updated for <a href="https://matrix.to/#/@alice:example.org">'
"@alice:example.org</a>"
)
@pytest.mark.asyncio
async def test_send_simple_notice_adds_mentions(self):
result = await self.adapter._send_simple_message(
"!room1:example.org",
"Heads up @alice:example.org",
msgtype="m.notice",
)
assert result.success is True
content = self._sent_content(self.mock_client)
assert content["msgtype"] == "m.notice"
assert content["m.mentions"] == {"user_ids": ["@alice:example.org"]}
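The outbound-mention tests above pin down three behaviours: each Matrix ID is linked through matrix.to, repeated mentions are deduplicated, and IDs inside backtick code spans are ignored. The adapter's own implementation is not part of this diff, so the following is only a minimal sketch of one way to meet those assertions. `build_mention_content`, `_link_mentions`, and both regexes are hypothetical names introduced for illustration, and leaving code spans as literal backticks is a simplification.

```python
import html
import re

# Hypothetical patterns; the real adapter may use different rules.
_CODE_SPAN = re.compile(r"`[^`]*`")
_MXID = re.compile(
    r"@[A-Za-z0-9._=/+-]+:[A-Za-z0-9](?:[A-Za-z0-9.-]*[A-Za-z0-9])?(?::\d+)?"
)


def _link_mentions(text: str, user_ids: list) -> str:
    """Escape plain text and wrap each Matrix ID in a matrix.to link."""
    out = []
    pos = 0
    for match in _MXID.finditer(text):
        mxid = match.group(0)
        if mxid not in user_ids:  # dedupe while keeping first-seen order
            user_ids.append(mxid)
        out.append(html.escape(text[pos:match.start()], quote=False))
        out.append(f'<a href="https://matrix.to/#/{mxid}">{mxid}</a>')
        pos = match.end()
    out.append(html.escape(text[pos:], quote=False))
    return "".join(out)


def build_mention_content(body: str) -> dict:
    """Build a message content dict with m.mentions and formatted_body."""
    user_ids = []
    parts = []
    pos = 0
    # Code spans are copied through untouched so IDs inside them are not linked.
    for span in _CODE_SPAN.finditer(body):
        parts.append(_link_mentions(body[pos:span.start()], user_ids))
        parts.append(html.escape(span.group(0), quote=False))
        pos = span.end()
    parts.append(_link_mentions(body[pos:], user_ids))

    content = {"msgtype": "m.text", "body": body}
    if user_ids:
        content["m.mentions"] = {"user_ids": user_ids}
        content["format"] = "org.matrix.custom.html"
        content["formatted_body"] = "".join(parts)
    return content
```

Collecting `user_ids` in a list rather than a set keeps the mention order stable, which makes equality assertions like the ones in the tests straightforward.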
# ---------------------------------------------------------------------------
# Require-mention gating in _on_room_message
# ---------------------------------------------------------------------------
@@ -3,6 +3,8 @@
import os
from unittest.mock import patch
import pytest
from gateway.platforms.base import (
BasePlatformAdapter,
GATEWAY_SECRET_CAPTURE_UNSUPPORTED_MESSAGE,
@@ -582,3 +584,47 @@ class TestTruncateMessageUtf16:
f"Chunk {i} has unbalanced fences ({fence_count})"
)
class TestProxyKwargsForAiohttp:
"""Verify proxy_kwargs_for_aiohttp routes all schemes through ProxyConnector."""
def test_none_returns_empty(self):
from gateway.platforms.base import proxy_kwargs_for_aiohttp
sess_kw, req_kw = proxy_kwargs_for_aiohttp(None)
assert sess_kw == {}
assert req_kw == {}
def test_http_proxy_uses_connector_when_aiohttp_socks_available(self):
pytest.importorskip("aiohttp_socks")
from unittest.mock import MagicMock
from gateway.platforms.base import proxy_kwargs_for_aiohttp
sentinel = MagicMock(name="ProxyConnector")
with patch("aiohttp_socks.ProxyConnector.from_url", return_value=sentinel):
sess_kw, req_kw = proxy_kwargs_for_aiohttp("http://proxy:8080")
assert sess_kw.get("connector") is sentinel, (
"HTTP proxy must use ProxyConnector so libraries that don't "
"forward per-request proxy= kwargs still route through the proxy"
)
assert req_kw == {}
def test_socks_proxy_uses_connector(self):
pytest.importorskip("aiohttp_socks")
from unittest.mock import MagicMock
from gateway.platforms.base import proxy_kwargs_for_aiohttp
sentinel = MagicMock(name="ProxyConnector")
with patch("aiohttp_socks.ProxyConnector.from_url", return_value=sentinel):
sess_kw, req_kw = proxy_kwargs_for_aiohttp("socks5://proxy:1080")
assert sess_kw.get("connector") is sentinel
assert req_kw == {}
def test_http_proxy_falls_back_without_aiohttp_socks(self):
from gateway.platforms.base import proxy_kwargs_for_aiohttp
with patch.dict("sys.modules", {"aiohttp_socks": None}):
sess_kw, req_kw = proxy_kwargs_for_aiohttp("http://proxy:8080")
assert sess_kw == {}
assert req_kw == {"proxy": "http://proxy:8080"}
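The proxy tests above describe a two-tuple contract: session-level kwargs carry a connector when `aiohttp_socks` is importable, and otherwise the proxy URL is passed as a per-request `proxy=` kwarg. A minimal sketch of that contract follows. `proxy_kwargs_sketch` is a hypothetical stand-in, not the real `proxy_kwargs_for_aiohttp`, and the tests only confirm the fallback for an HTTP proxy URL.

```python
from typing import Optional


def proxy_kwargs_sketch(proxy_url: Optional[str]) -> tuple:
    """Return (session_kwargs, request_kwargs) for an aiohttp client.

    Hypothetical sketch of the behaviour the tests assert.
    """
    if not proxy_url:
        return {}, {}
    try:
        # Optional dependency: preferred because the connector applies to
        # every request made through the session.
        from aiohttp_socks import ProxyConnector
    except ImportError:
        # Fallback: aiohttp's per-request proxy= kwarg. The tests in the
        # diff only exercise this branch with an http:// URL.
        return {}, {"proxy": proxy_url}
    return {"connector": ProxyConnector.from_url(proxy_url)}, {}
```

A caller would typically unpack the pair as `aiohttp.ClientSession(**sess_kw)` and `session.get(url, **req_kw)`.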
@@ -393,3 +393,243 @@ async def test_session_hygiene_messages_stay_in_originating_topic(monkeypatch, t
assert FakeCompressAgent.last_instance is not None
FakeCompressAgent.last_instance.shutdown_memory_provider.assert_called_once()
FakeCompressAgent.last_instance.close.assert_called_once()
@pytest.mark.asyncio
async def test_session_hygiene_warns_user_when_summary_generation_fails(monkeypatch, tmp_path):
"""When auxiliary compression's summary LLM call fails, the compressor
inserts a static fallback and the dropped turns are unrecoverable.
Gateway must surface a visible warning to the user, including
thread_id metadata so it lands in the originating topic/thread."""
fake_dotenv = types.ModuleType("dotenv")
fake_dotenv.load_dotenv = lambda *args, **kwargs: None
monkeypatch.setitem(sys.modules, "dotenv", fake_dotenv)
class FakeCompressAgentWithSummaryFailure:
last_instance = None
def __init__(self, **kwargs):
self.model = kwargs.get("model")
self.session_id = kwargs.get("session_id", "fake-session")
self._print_fn = None
self.shutdown_memory_provider = MagicMock()
self.close = MagicMock()
# Simulate a compressor that hit summary-generation failure
# and inserted the static fallback placeholder.
self.context_compressor = SimpleNamespace(
_last_summary_fallback_used=True,
_last_summary_dropped_count=42,
_last_summary_error="404 model not found: gemini-3-flash-preview",
)
type(self).last_instance = self
def _compress_context(self, messages, *_args, **_kwargs):
self.session_id = f"{self.session_id}_compressed"
return ([{"role": "assistant", "content": "compressed"}], None)
fake_run_agent = types.ModuleType("run_agent")
fake_run_agent.AIAgent = FakeCompressAgentWithSummaryFailure
monkeypatch.setitem(sys.modules, "run_agent", fake_run_agent)
gateway_run = importlib.import_module("gateway.run")
GatewayRunner = gateway_run.GatewayRunner
adapter = HygieneCaptureAdapter()
runner = object.__new__(GatewayRunner)
runner.config = GatewayConfig(
platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="fake-token")}
)
runner.adapters = {Platform.TELEGRAM: adapter}
runner._voice_mode = {}
runner.hooks = SimpleNamespace(emit=AsyncMock(), loaded_hooks=False)
runner.session_store = MagicMock()
runner.session_store.get_or_create_session.return_value = SessionEntry(
session_key="agent:main:telegram:group:-1001:17585",
session_id="sess-1",
created_at=datetime.now(),
updated_at=datetime.now(),
platform=Platform.TELEGRAM,
chat_type="group",
)
runner.session_store.load_transcript.return_value = _make_history(6, content_size=400)
runner.session_store.has_any_sessions.return_value = True
runner.session_store.rewrite_transcript = MagicMock()
runner.session_store.append_to_transcript = MagicMock()
runner._running_agents = {}
runner._pending_messages = {}
runner._pending_approvals = {}
runner._session_db = None
runner._is_user_authorized = lambda _source: True
runner._set_session_env = lambda _context: None
runner._run_agent = AsyncMock(
return_value={
"final_response": "ok",
"messages": [],
"tools": [],
"history_offset": 0,
"last_prompt_tokens": 0,
}
)
monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path)
monkeypatch.setattr(gateway_run, "_resolve_runtime_agent_kwargs", lambda: {"api_key": "***"})
monkeypatch.setattr(
"agent.model_metadata.get_model_context_length",
lambda *_args, **_kwargs: 100,
)
monkeypatch.setenv("TELEGRAM_HOME_CHANNEL", "795544298")
event = MessageEvent(
text="hello",
source=SessionSource(
platform=Platform.TELEGRAM,
chat_id="-1001",
chat_type="group",
thread_id="17585",
user_id="12345",
),
message_id="1",
)
result = await runner._handle_message(event)
assert result == "ok"
# The compressor reported summary-failure → exactly one warning
# message must have been delivered to the user.
warning_messages = [s for s in adapter.sent if "Context compression summary failed" in s["content"]]
assert len(warning_messages) == 1, (
f"Expected 1 compression-failure warning, got {len(warning_messages)}: {adapter.sent}"
)
warn = warning_messages[0]
# Warning must include the dropped count and the underlying error.
assert "42" in warn["content"]
assert "404" in warn["content"]
# Warning must land in the originating topic/thread, not the main channel.
assert warn["chat_id"] == "-1001"
assert warn["metadata"] == {"thread_id": "17585"}
FakeCompressAgentWithSummaryFailure.last_instance.close.assert_called_once()
@pytest.mark.asyncio
async def test_session_hygiene_informs_user_when_aux_model_fails_but_recovers(monkeypatch, tmp_path):
"""When the user's configured ``auxiliary.compression.model`` errors out
and we recover via the main model, compression succeeds but the user's
config is still broken. Gateway hygiene must surface a note so the
user knows to fix ``auxiliary.compression.model``; silent recovery
hides a misconfig only they can resolve."""
fake_dotenv = types.ModuleType("dotenv")
fake_dotenv.load_dotenv = lambda *args, **kwargs: None
monkeypatch.setitem(sys.modules, "dotenv", fake_dotenv)
class FakeCompressAgentWithAuxRecovery:
last_instance = None
def __init__(self, **kwargs):
self.model = kwargs.get("model")
self.session_id = kwargs.get("session_id", "fake-session")
self._print_fn = None
self.shutdown_memory_provider = MagicMock()
self.close = MagicMock()
# Compression succeeded (no placeholder inserted) but the
# configured aux model errored and we fell back to main.
self.context_compressor = SimpleNamespace(
_last_summary_fallback_used=False,
_last_summary_dropped_count=0,
_last_summary_error=None,
_last_aux_model_failure_model="gemini-3-flash-preview",
_last_aux_model_failure_error="404 model not found",
)
type(self).last_instance = self
def _compress_context(self, messages, *_args, **_kwargs):
self.session_id = f"{self.session_id}_compressed"
return ([{"role": "assistant", "content": "real summary"}], None)
fake_run_agent = types.ModuleType("run_agent")
fake_run_agent.AIAgent = FakeCompressAgentWithAuxRecovery
monkeypatch.setitem(sys.modules, "run_agent", fake_run_agent)
gateway_run = importlib.import_module("gateway.run")
GatewayRunner = gateway_run.GatewayRunner
adapter = HygieneCaptureAdapter()
runner = object.__new__(GatewayRunner)
runner.config = GatewayConfig(
platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="fake-token")}
)
runner.adapters = {Platform.TELEGRAM: adapter}
runner._voice_mode = {}
runner.hooks = SimpleNamespace(emit=AsyncMock(), loaded_hooks=False)
runner.session_store = MagicMock()
runner.session_store.get_or_create_session.return_value = SessionEntry(
session_key="agent:main:telegram:group:-1001:17585",
session_id="sess-1",
created_at=datetime.now(),
updated_at=datetime.now(),
platform=Platform.TELEGRAM,
chat_type="group",
)
runner.session_store.load_transcript.return_value = _make_history(6, content_size=400)
runner.session_store.has_any_sessions.return_value = True
runner.session_store.rewrite_transcript = MagicMock()
runner.session_store.append_to_transcript = MagicMock()
runner._running_agents = {}
runner._pending_messages = {}
runner._pending_approvals = {}
runner._session_db = None
runner._is_user_authorized = lambda _source: True
runner._set_session_env = lambda _context: None
runner._run_agent = AsyncMock(
return_value={
"final_response": "ok",
"messages": [],
"tools": [],
"history_offset": 0,
"last_prompt_tokens": 0,
}
)
monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path)
monkeypatch.setattr(gateway_run, "_resolve_runtime_agent_kwargs", lambda: {"api_key": "***"})
monkeypatch.setattr(
"agent.model_metadata.get_model_context_length",
lambda *_args, **_kwargs: 100,
)
monkeypatch.setenv("TELEGRAM_HOME_CHANNEL", "795544298")
event = MessageEvent(
text="hello",
source=SessionSource(
platform=Platform.TELEGRAM,
chat_id="-1001",
chat_type="group",
thread_id="17585",
user_id="12345",
),
message_id="1",
)
result = await runner._handle_message(event)
assert result == "ok"
# No ⚠️ hard-failure warning (that's for dropped turns)
hard_warnings = [s for s in adapter.sent if "Context compression summary failed" in s["content"]]
assert len(hard_warnings) == 0, adapter.sent
# But a note about the configured aux model must be delivered.
aux_notes = [
s for s in adapter.sent
if "Configured compression model" in s["content"]
]
assert len(aux_notes) == 1, (
f"Expected 1 aux-model fallback notice, got {len(aux_notes)}: {adapter.sent}"
)
note = aux_notes[0]
assert "gemini-3-flash-preview" in note["content"]
assert "404" in note["content"]
assert "auxiliary.compression.model" in note["content"]
# Note must land in the originating topic/thread.
assert note["chat_id"] == "-1001"
assert note["metadata"] == {"thread_id": "17585"}
FakeCompressAgentWithAuxRecovery.last_instance.close.assert_called_once()
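The two hygiene tests above read a handful of private attributes off the compressor and expect at most one message per condition. The sketch below shows how those attributes could be turned into user-facing notices. `compression_notices` is a hypothetical helper, and the message wording is an assumption apart from the substrings the tests search for.

```python
from types import SimpleNamespace


def compression_notices(compressor) -> list:
    """Derive user-facing notices from a compressor's last-run state.

    Hypothetical helper; attribute names are taken from the tests, the
    sentence wording is invented for illustration.
    """
    notices = []

    # Hard failure: the summary call failed and a static placeholder was used.
    if getattr(compressor, "_last_summary_fallback_used", False):
        dropped = getattr(compressor, "_last_summary_dropped_count", 0)
        error = getattr(compressor, "_last_summary_error", None) or "unknown error"
        notices.append(
            f"Context compression summary failed ({error}); "
            f"{dropped} earlier messages were dropped and cannot be recovered."
        )

    # Soft failure: the configured aux model errored but the main model recovered.
    aux_model = getattr(compressor, "_last_aux_model_failure_model", None)
    if aux_model:
        aux_error = (
            getattr(compressor, "_last_aux_model_failure_error", None)
            or "unknown error"
        )
        notices.append(
            f"Configured compression model {aux_model} failed ({aux_error}); "
            "the main model was used instead. "
            "Check auxiliary.compression.model in your config."
        )
    return notices


# Usage with a stand-in compressor shaped like the one in the second test.
recovered = SimpleNamespace(
    _last_summary_fallback_used=False,
    _last_aux_model_failure_model="gemini-3-flash-preview",
    _last_aux_model_failure_error="404 model not found",
)
for line in compression_notices(recovered):
    print(line)
```

Using `getattr` with defaults keeps the helper tolerant of compressors that predate the aux-model attributes, which is why the first test's `SimpleNamespace` (without those fields) would still work.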
@@ -356,6 +356,81 @@ def test_config_bridges_slack_free_response_channels(monkeypatch, tmp_path):
assert _os.environ["SLACK_FREE_RESPONSE_CHANNELS"] == "C0AQWDLHY9M,C9999999999"
def test_top_level_slack_settings_do_not_disable_env_token_setup(monkeypatch, tmp_path):
from gateway.config import load_gateway_config
hermes_home = tmp_path / ".hermes"
hermes_home.mkdir()
(hermes_home / "config.yaml").write_text(
"slack:\n"
" require_mention: false\n",
encoding="utf-8",
)
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
monkeypatch.setenv("SLACK_BOT_TOKEN", "xoxb-test")
monkeypatch.delenv("SLACK_REQUIRE_MENTION", raising=False)
config = load_gateway_config()
slack_config = config.platforms[Platform.SLACK]
assert slack_config.enabled is True
assert slack_config.token == "xoxb-test"
assert slack_config.extra.get("require_mention") is False
assert "_enabled_explicit" not in slack_config.extra
def test_explicit_top_level_slack_enabled_false_wins_over_env_token(monkeypatch, tmp_path):
from gateway.config import load_gateway_config
hermes_home = tmp_path / ".hermes"
hermes_home.mkdir()
(hermes_home / "config.yaml").write_text(
"slack:\n"
" enabled: false\n"
" require_mention: false\n",
encoding="utf-8",
)
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
monkeypatch.setenv("SLACK_BOT_TOKEN", "xoxb-test")
monkeypatch.delenv("SLACK_REQUIRE_MENTION", raising=False)
config = load_gateway_config()
slack_config = config.platforms[Platform.SLACK]
assert slack_config.enabled is False
assert slack_config.token == "xoxb-test"
assert slack_config.extra.get("require_mention") is False
assert "_enabled_explicit" not in slack_config.extra
def test_explicit_platforms_slack_enabled_false_wins_over_env_token(monkeypatch, tmp_path):
from gateway.config import load_gateway_config
hermes_home = tmp_path / ".hermes"
hermes_home.mkdir()
(hermes_home / "config.yaml").write_text(
"platforms:\n"
" slack:\n"
" enabled: false\n"
" extra:\n"
" reply_in_thread: false\n",
encoding="utf-8",
)
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
monkeypatch.setenv("SLACK_BOT_TOKEN", "xoxb-test")
config = load_gateway_config()
slack_config = config.platforms[Platform.SLACK]
assert slack_config.enabled is False
assert slack_config.token == "xoxb-test"
assert slack_config.extra.get("reply_in_thread") is False
assert "_enabled_explicit" not in slack_config.extra
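The three Slack tests above establish a precedence rule: an explicit `enabled: false` in config.yaml, at either the top level or under `platforms:`, beats the presence of `SLACK_BOT_TOKEN`, while other Slack settings alone do not disable token-based setup. A minimal sketch of that rule is shown below. `resolve_slack_enabled` is a hypothetical function, and checking the `platforms.slack` section before the top-level `slack` section is an assumption the tests do not settle.

```python
def resolve_slack_enabled(yaml_config: dict, env: dict) -> bool:
    """Decide whether Slack is enabled from parsed YAML plus environment.

    Hypothetical sketch: an explicit ``enabled`` key wins; otherwise the
    presence of a bot token enables the platform.
    """
    top_level = yaml_config.get("slack") or {}
    nested = (yaml_config.get("platforms") or {}).get("slack") or {}

    # Assumption: the nested platforms.slack section is consulted first.
    for section in (nested, top_level):
        if "enabled" in section:
            return bool(section["enabled"])

    # No explicit flag anywhere: fall back to token-based auto-enable.
    return bool(env.get("SLACK_BOT_TOKEN"))
```

Testing for the key with `in` rather than truthiness is what lets `enabled: false` be distinguished from "not specified".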
def test_config_bridges_slack_reply_in_thread(monkeypatch, tmp_path):
from gateway.config import load_gateway_config
@@ -0,0 +1,80 @@
"""Tests for _enrich_message_with_vision — regression for #5719.
The auxiliary vision LLM can echo system-prompt memory-context back into
its analysis output. The boundary fix in gateway/run.py runs the generic
sanitize_context helper over the description so the fenced wrapper and
its system-note are removed before the description reaches the user.
Plugin-specific header cleanup (e.g. "## Honcho Context") belongs at the
provider boundary, not in this shared gateway path.
"""
import asyncio
import json
from unittest.mock import AsyncMock, patch
import pytest
@pytest.fixture
def gateway_runner():
"""Minimal GatewayRunner stub with just the method under test bound."""
from gateway.run import GatewayRunner
class _Stub:
_enrich_message_with_vision = GatewayRunner._enrich_message_with_vision
return _Stub()
def _run(coro):
return asyncio.new_event_loop().run_until_complete(coro)
class TestEnrichMessageWithVision:
def test_clean_description_passes_through(self, gateway_runner):
"""Vision output without leaked memory is embedded unchanged."""
fake_result = json.dumps({
"success": True,
"analysis": "A photograph of a sunset over the ocean.",
})
with patch("tools.vision_tools.vision_analyze_tool", new=AsyncMock(return_value=fake_result)):
out = _run(gateway_runner._enrich_message_with_vision("caption", ["/tmp/img.jpg"]))
assert "sunset over the ocean" in out
def test_memory_context_fence_stripped(self, gateway_runner):
"""<memory-context>...</memory-context> fenced block is scrubbed."""
leaked = (
"<memory-context>\n"
"[System note: The following is recalled memory context, NOT new "
"user input. Treat as informational background data.]\n\n"
"User details and preferences here.\n"
"</memory-context>\n"
"A photograph of a cat."
)
fake_result = json.dumps({"success": True, "analysis": leaked})
with patch("tools.vision_tools.vision_analyze_tool", new=AsyncMock(return_value=fake_result)):
out = _run(gateway_runner._enrich_message_with_vision("caption", ["/tmp/img.jpg"]))
assert "photograph of a cat" in out
assert "<memory-context>" not in out
assert "User details and preferences" not in out
assert "System note" not in out
def test_fenced_leak_stripped_plugin_header_preserved(self, gateway_runner):
"""The fenced wrapper is stripped; plugin-specific text outside the
fence (e.g. a "## Honcho Context" header) is left to the plugin layer.
Gateway core stays plugin-agnostic."""
leaked = (
"<memory-context>\n"
"[System note: The following is recalled memory context, NOT new "
"user input. Treat as informational background data.]\n"
"fenced leak\n"
"</memory-context>\n"
"A photograph of a dog."
)
fake_result = json.dumps({"success": True, "analysis": leaked})
with patch("tools.vision_tools.vision_analyze_tool", new=AsyncMock(return_value=fake_result)):
out = _run(gateway_runner._enrich_message_with_vision("caption", ["/tmp/img.jpg"]))
assert "photograph of a dog" in out
assert "fenced leak" not in out
assert "<memory-context>" not in out
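The vision tests above expect everything between `<memory-context>` and `</memory-context>` to disappear while the surrounding description survives. The diff attributes this to a generic `sanitize_context` helper whose body is not shown here, so the following is only a sketch of the fence-stripping idea under a hypothetical name, `strip_memory_context`.

```python
import re

# Non-greedy match across newlines so each fenced block is removed separately.
_MEMORY_FENCE = re.compile(
    r"<memory-context>.*?</memory-context>\s*",
    re.DOTALL | re.IGNORECASE,
)


def strip_memory_context(text: str) -> str:
    """Remove fenced memory-context blocks; leave all other text alone."""
    return _MEMORY_FENCE.sub("", text).strip()
```

Text outside the fence, such as a plugin-specific header, is deliberately untouched, matching the plugin-agnostic intent stated in the module docstring.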
@@ -42,6 +42,7 @@ class TestProviderRegistry:
("minimax-cn", "MiniMax (China)", "api_key"),
("ai-gateway", "Vercel AI Gateway", "api_key"),
("kilocode", "Kilo Code", "api_key"),
("gmi", "GMI Cloud", "api_key"),
])
def test_provider_registered(self, provider_id, name, auth_type):
assert provider_id in PROVIDER_REGISTRY
@@ -106,6 +107,11 @@ class TestProviderRegistry:
assert pconfig.api_key_env_vars == ("KILOCODE_API_KEY",)
assert pconfig.base_url_env_var == "KILOCODE_BASE_URL"
def test_gmi_env_vars(self):
pconfig = PROVIDER_REGISTRY["gmi"]
assert pconfig.api_key_env_vars == ("GMI_API_KEY",)
assert pconfig.base_url_env_var == "GMI_BASE_URL"
def test_huggingface_env_vars(self):
pconfig = PROVIDER_REGISTRY["huggingface"]
assert pconfig.api_key_env_vars == ("HF_TOKEN",)
@@ -121,6 +127,7 @@ class TestProviderRegistry:
assert PROVIDER_REGISTRY["minimax-cn"].inference_base_url == "https://api.minimaxi.com/anthropic"
assert PROVIDER_REGISTRY["ai-gateway"].inference_base_url == "https://ai-gateway.vercel.sh/v1"
assert PROVIDER_REGISTRY["kilocode"].inference_base_url == "https://api.kilo.ai/api/gateway"
assert PROVIDER_REGISTRY["gmi"].inference_base_url == "https://api.gmi-serving.com/v1"
assert PROVIDER_REGISTRY["huggingface"].inference_base_url == "https://router.huggingface.co/v1"
def test_oauth_providers_unchanged(self):
@@ -143,6 +150,7 @@ PROVIDER_ENV_VARS = (
"MINIMAX_API_KEY", "MINIMAX_CN_API_KEY",
"AI_GATEWAY_API_KEY", "AI_GATEWAY_BASE_URL",
"KILOCODE_API_KEY", "KILOCODE_BASE_URL",
"GMI_API_KEY", "GMI_BASE_URL",
"DASHSCOPE_API_KEY", "OPENCODE_ZEN_API_KEY", "OPENCODE_GO_API_KEY",
"NOUS_API_KEY", "GITHUB_TOKEN", "GH_TOKEN",
"OPENAI_BASE_URL", "HERMES_COPILOT_ACP_COMMAND", "COPILOT_CLI_PATH",
@@ -178,6 +186,9 @@ class TestResolveProvider:
def test_explicit_ai_gateway(self):
assert resolve_provider("ai-gateway") == "ai-gateway"
def test_explicit_gmi(self):
assert resolve_provider("gmi") == "gmi"
def test_alias_glm(self):
assert resolve_provider("glm") == "zai"
@@ -205,6 +216,9 @@ class TestResolveProvider:
def test_alias_vercel(self):
assert resolve_provider("vercel") == "ai-gateway"
def test_alias_gmi_cloud(self):
assert resolve_provider("gmi-cloud") == "gmi"
def test_explicit_kilocode(self):
assert resolve_provider("kilocode") == "kilocode"
@@ -280,6 +294,10 @@ class TestResolveProvider:
monkeypatch.setenv("AI_GATEWAY_API_KEY", "test-gw-key")
assert resolve_provider("auto") == "ai-gateway"
def test_auto_detects_gmi_key(self, monkeypatch):
monkeypatch.setenv("GMI_API_KEY", "test-gmi-key")
assert resolve_provider("auto") == "gmi"
def test_auto_detects_kilocode_key(self, monkeypatch):
monkeypatch.setenv("KILOCODE_API_KEY", "test-kilo-key")
assert resolve_provider("auto") == "kilocode"
@@ -497,6 +515,19 @@ class TestResolveApiKeyProviderCredentials:
assert creds["api_key"] == "kilo-secret-key"
assert creds["base_url"] == "https://api.kilo.ai/api/gateway"
def test_resolve_gmi_with_key(self, monkeypatch):
monkeypatch.setenv("GMI_API_KEY", "gmi-secret-key")
creds = resolve_api_key_provider_credentials("gmi")
assert creds["provider"] == "gmi"
assert creds["api_key"] == "gmi-secret-key"
assert creds["base_url"] == "https://api.gmi-serving.com/v1"
def test_resolve_gmi_custom_base_url(self, monkeypatch):
monkeypatch.setenv("GMI_API_KEY", "gmi-key")
monkeypatch.setenv("GMI_BASE_URL", "https://custom.gmi.example/v1")
creds = resolve_api_key_provider_credentials("gmi")
assert creds["base_url"] == "https://custom.gmi.example/v1"
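The two GMI credential tests above imply a simple lookup: the API key comes from `GMI_API_KEY`, and the base URL defaults to `https://api.gmi-serving.com/v1` unless `GMI_BASE_URL` overrides it. A minimal sketch of that lookup follows. `resolve_gmi_credentials_sketch` is a hypothetical name, and raising `LookupError` on a missing key is an assumption, since the diff does not show the error path.

```python
import os
from typing import Mapping, Optional

DEFAULT_GMI_BASE_URL = "https://api.gmi-serving.com/v1"  # value asserted in the tests


def resolve_gmi_credentials_sketch(env: Optional[Mapping] = None) -> dict:
    """Resolve provider credentials from environment variables.

    Hypothetical sketch; the real resolve_api_key_provider_credentials
    handles many providers and may return additional fields.
    """
    env = os.environ if env is None else env
    api_key = (env.get("GMI_API_KEY") or "").strip()
    if not api_key:
        # Assumption: a missing key is an error rather than an empty result.
        raise LookupError("GMI_API_KEY is not set")
    base_url = (env.get("GMI_BASE_URL") or "").strip() or DEFAULT_GMI_BASE_URL
    return {"provider": "gmi", "api_key": api_key, "base_url": base_url}
```

Accepting an explicit `env` mapping keeps the function easy to test without touching the process environment.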
def test_resolve_kilocode_custom_base_url(self, monkeypatch):
monkeypatch.setenv("KILOCODE_API_KEY", "kilo-key")
monkeypatch.setenv("KILOCODE_BASE_URL", "https://custom.kilo.example/v1")
@@ -594,6 +625,15 @@ class TestRuntimeProviderResolution:
assert result["api_key"] == "kilo-key"
assert "kilo.ai" in result["base_url"]
def test_runtime_gmi(self, monkeypatch):
monkeypatch.setenv("GMI_API_KEY", "gmi-key")
from hermes_cli.runtime_provider import resolve_runtime_provider
result = resolve_runtime_provider(requested="gmi")
assert result["provider"] == "gmi"
assert result["api_mode"] == "chat_completions"
assert result["api_key"] == "gmi-key"
assert result["base_url"] == "https://api.gmi-serving.com/v1"
def test_runtime_auto_detects_api_key_provider(self, monkeypatch):
monkeypatch.setenv("KIMI_API_KEY", "auto-kimi-key")
from hermes_cli.runtime_provider import resolve_runtime_provider
@@ -0,0 +1,363 @@
"""Focused tests for GMI Cloud first-class provider wiring."""
from __future__ import annotations
import contextlib
import io
import sys
import types
from argparse import Namespace
from unittest.mock import patch
import pytest
if "dotenv" not in sys.modules:
fake_dotenv = types.ModuleType("dotenv")
fake_dotenv.load_dotenv = lambda *args, **kwargs: None
sys.modules["dotenv"] = fake_dotenv
from hermes_cli.auth import resolve_provider
from hermes_cli.config import load_config
from hermes_cli.models import (
CANONICAL_PROVIDERS,
_PROVIDER_LABELS,
_PROVIDER_MODELS,
normalize_provider,
provider_model_ids,
)
from agent.auxiliary_client import resolve_provider_client
from agent.model_metadata import get_model_context_length
@pytest.fixture(autouse=True)
def _clear_provider_env(monkeypatch):
for key in (
"OPENROUTER_API_KEY",
"OPENAI_API_KEY",
"ANTHROPIC_API_KEY",
"GOOGLE_API_KEY",
"GLM_API_KEY",
"KIMI_API_KEY",
"MINIMAX_API_KEY",
"GMI_API_KEY",
"GMI_BASE_URL",
):
monkeypatch.delenv(key, raising=False)
class TestGmiAliases:
@pytest.mark.parametrize("alias", ["gmi", "gmi-cloud", "gmicloud"])
def test_alias_resolves(self, alias, monkeypatch):
monkeypatch.setenv("GMI_API_KEY", "gmi-test-key")
assert resolve_provider(alias) == "gmi"
def test_models_normalize_provider(self):
assert normalize_provider("gmi-cloud") == "gmi"
assert normalize_provider("gmicloud") == "gmi"
def test_providers_normalize_provider(self):
from hermes_cli.providers import normalize_provider as normalize_provider_in_providers
assert normalize_provider_in_providers("gmi-cloud") == "gmi"
assert normalize_provider_in_providers("gmicloud") == "gmi"
class TestGmiConfigRegistry:
def test_optional_env_vars_include_gmi(self):
from hermes_cli.config import OPTIONAL_ENV_VARS
assert "GMI_API_KEY" in OPTIONAL_ENV_VARS
assert OPTIONAL_ENV_VARS["GMI_API_KEY"]["category"] == "provider"
assert OPTIONAL_ENV_VARS["GMI_API_KEY"]["password"] is True
assert OPTIONAL_ENV_VARS["GMI_API_KEY"]["url"] == "https://www.gmicloud.ai/"
assert "GMI_BASE_URL" in OPTIONAL_ENV_VARS
assert OPTIONAL_ENV_VARS["GMI_BASE_URL"]["category"] == "provider"
assert OPTIONAL_ENV_VARS["GMI_BASE_URL"]["password"] is False
# ENV_VARS_BY_VERSION entries are not needed for providers added after
# _config_version 22 (the current baseline) — users discover GMI via
# hermes model, not via upgrade prompts.
class TestGmiModelCatalog:
def test_static_model_fallback_exists(self):
assert "gmi" in _PROVIDER_MODELS
models = _PROVIDER_MODELS["gmi"]
assert "zai-org/GLM-5.1-FP8" in models
assert "deepseek-ai/DeepSeek-V3.2" in models
assert "moonshotai/Kimi-K2.5" in models
assert "anthropic/claude-sonnet-4.6" in models
def test_canonical_provider_entry(self):
slugs = [p.slug for p in CANONICAL_PROVIDERS]
assert "gmi" in slugs
def test_provider_model_ids_prefers_live_api(self, monkeypatch):
monkeypatch.setattr(
"hermes_cli.auth.resolve_api_key_provider_credentials",
lambda provider_id: {
"provider": provider_id,
"api_key": "gmi-live-key",
"base_url": "https://api.gmi-serving.com/v1",
"source": "GMI_API_KEY",
},
)
monkeypatch.setattr(
"hermes_cli.models.fetch_api_models",
lambda api_key, base_url: [
"openai/gpt-5.4-mini",
"zai-org/GLM-5.1-FP8",
],
)
assert provider_model_ids("gmi") == [
"openai/gpt-5.4-mini",
"zai-org/GLM-5.1-FP8",
]
def test_provider_model_ids_falls_back_to_static_models(self, monkeypatch):
monkeypatch.setattr(
"hermes_cli.auth.resolve_api_key_provider_credentials",
lambda provider_id: {
"provider": provider_id,
"api_key": "gmi-live-key",
"base_url": "https://api.gmi-serving.com/v1",
"source": "GMI_API_KEY",
},
)
monkeypatch.setattr("hermes_cli.models.fetch_api_models", lambda api_key, base_url: None)
assert provider_model_ids("gmi") == list(_PROVIDER_MODELS["gmi"])
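The model-catalog tests above check that a live model list is preferred and that the static catalog is used when the live fetch returns nothing. The pattern can be sketched in isolation as below. `model_ids_with_fallback` and `STATIC_MODELS` are hypothetical names, the static tuple is only a subset of the IDs asserted in the test, and treating a raised exception like an empty result is an assumption.

```python
from typing import Callable, Optional, Sequence

# Subset of the static GMI catalog asserted in the test above.
STATIC_MODELS = ("zai-org/GLM-5.1-FP8", "deepseek-ai/DeepSeek-V3.2")


def model_ids_with_fallback(
    fetch_live: Callable[[], Optional[Sequence[str]]],
    static_models: Sequence[str] = STATIC_MODELS,
) -> list:
    """Prefer the live model list; fall back to the static catalog."""
    try:
        live = fetch_live()
    except Exception:
        # Assumption: network or auth errors degrade to the static list.
        live = None
    return list(live) if live else list(static_models)
```

Returning a fresh `list` in both branches means callers can mutate the result without affecting the shared static tuple.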
class TestGmiProvidersModule:
def test_overlay_exists(self):
from hermes_cli.providers import HERMES_OVERLAYS
assert "gmi" in HERMES_OVERLAYS
overlay = HERMES_OVERLAYS["gmi"]
assert overlay.transport == "openai_chat"
assert overlay.extra_env_vars == ("GMI_API_KEY",)
assert overlay.base_url_override == "https://api.gmi-serving.com/v1"
assert overlay.base_url_env_var == "GMI_BASE_URL"
assert not overlay.is_aggregator
def test_provider_label(self):
assert _PROVIDER_LABELS["gmi"] == "GMI Cloud"
class TestGmiDoctor:
def test_provider_env_hints_include_gmi(self):
from hermes_cli.doctor import _PROVIDER_ENV_HINTS
assert "GMI_API_KEY" in _PROVIDER_ENV_HINTS
def test_run_doctor_checks_gmi_models_endpoint(self, monkeypatch, tmp_path):
from hermes_cli import doctor as doctor_mod
home = tmp_path / ".hermes"
home.mkdir(parents=True, exist_ok=True)
(home / "config.yaml").write_text("memory: {}\n", encoding="utf-8")
(home / ".env").write_text("GMI_API_KEY=***\n", encoding="utf-8")
project = tmp_path / "project"
project.mkdir(exist_ok=True)
monkeypatch.setattr(doctor_mod, "HERMES_HOME", home)
monkeypatch.setattr(doctor_mod, "PROJECT_ROOT", project)
monkeypatch.setattr(doctor_mod, "_DHH", str(home))
monkeypatch.setenv("GMI_API_KEY", "gmi-test-key")
for env_name in (
"OPENROUTER_API_KEY",
"OPENAI_API_KEY",
"ANTHROPIC_API_KEY",
"ANTHROPIC_TOKEN",
"GLM_API_KEY",
"ZAI_API_KEY",
"Z_AI_API_KEY",
"KIMI_API_KEY",
"KIMI_CN_API_KEY",
"ARCEEAI_API_KEY",
"DEEPSEEK_API_KEY",
"HF_TOKEN",
"DASHSCOPE_API_KEY",
"MINIMAX_API_KEY",
"MINIMAX_CN_API_KEY",
"AI_GATEWAY_API_KEY",
"KILOCODE_API_KEY",
"OPENCODE_ZEN_API_KEY",
"OPENCODE_GO_API_KEY",
"XIAOMI_API_KEY",
):
monkeypatch.delenv(env_name, raising=False)
fake_model_tools = types.SimpleNamespace(
check_tool_availability=lambda *a, **kw: ([], []),
TOOLSET_REQUIREMENTS={},
)
monkeypatch.setitem(sys.modules, "model_tools", fake_model_tools)
try:
from hermes_cli import auth as _auth_mod
monkeypatch.setattr(_auth_mod, "get_nous_auth_status", lambda: {})
monkeypatch.setattr(_auth_mod, "get_codex_auth_status", lambda: {})
except Exception:
pass
calls = []
def fake_get(url, headers=None, timeout=None):
calls.append((url, headers, timeout))
return types.SimpleNamespace(status_code=200)
import httpx
monkeypatch.setattr(httpx, "get", fake_get)
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
doctor_mod.run_doctor(Namespace(fix=False))
out = buf.getvalue()
assert "API key or custom endpoint configured" in out
assert "GMI Cloud" in out
assert any(url == "https://api.gmi-serving.com/v1/models" for url, _, _ in calls)
class TestGmiModelMetadata:
def test_url_to_provider(self):
from agent.model_metadata import _URL_TO_PROVIDER
assert _URL_TO_PROVIDER.get("api.gmi-serving.com") == "gmi"
def test_provider_prefixes(self):
from agent.model_metadata import _PROVIDER_PREFIXES
assert "gmi" in _PROVIDER_PREFIXES
assert "gmi-cloud" in _PROVIDER_PREFIXES
assert "gmicloud" in _PROVIDER_PREFIXES
def test_infer_from_url(self):
from agent.model_metadata import _infer_provider_from_url
assert _infer_provider_from_url("https://api.gmi-serving.com/v1") == "gmi"
def test_known_gmi_endpoint_still_uses_endpoint_metadata(self):
with patch(
"agent.model_metadata.get_cached_context_length",
return_value=None,
), patch(
"agent.model_metadata.fetch_endpoint_model_metadata",
return_value={"anthropic/claude-opus-4.6": {"context_length": 409600}},
), patch(
"agent.models_dev.lookup_models_dev_context",
return_value=None,
), patch(
"agent.model_metadata.fetch_model_metadata",
return_value={},
):
result = get_model_context_length(
"anthropic/claude-opus-4.6",
base_url="https://api.gmi-serving.com/v1",
api_key="gmi-test-key",
provider="custom",
)
assert result == 409600
class TestGmiAuxiliary:
def test_aux_default_model(self):
from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
assert _API_KEY_PROVIDER_AUX_MODELS["gmi"] == "google/gemini-3.1-flash-lite-preview"
def test_resolve_provider_client_uses_gmi_aux_default(self, monkeypatch):
monkeypatch.setenv("GMI_API_KEY", "gmi-test-key")
with patch("agent.auxiliary_client.OpenAI") as mock_openai:
mock_openai.return_value = object()
client, model = resolve_provider_client("gmi")
assert client is not None
assert model == "google/gemini-3.1-flash-lite-preview"
assert mock_openai.call_args.kwargs["api_key"] == "gmi-test-key"
assert mock_openai.call_args.kwargs["base_url"] == "https://api.gmi-serving.com/v1"
def test_resolve_provider_client_accepts_gmi_alias(self, monkeypatch):
monkeypatch.setenv("GMI_API_KEY", "gmi-test-key")
with patch("agent.auxiliary_client.OpenAI") as mock_openai:
mock_openai.return_value = object()
client, model = resolve_provider_client("gmi-cloud")
assert client is not None
assert model == "google/gemini-3.1-flash-lite-preview"
class TestGmiMainFlow:
def test_chat_parser_accepts_gmi_provider(self, monkeypatch):
recorded: dict[str, str] = {}
monkeypatch.setattr("hermes_cli.config.get_container_exec_info", lambda: None)
monkeypatch.setattr(
"hermes_cli.main.cmd_chat",
lambda args: recorded.setdefault("provider", args.provider),
)
monkeypatch.setattr(sys, "argv", ["hermes", "chat", "--provider", "gmi"])
from hermes_cli.main import main
main()
assert recorded["provider"] == "gmi"
def test_select_provider_and_model_routes_gmi_to_generic_flow(self, monkeypatch):
recorded: dict[str, str] = {}
monkeypatch.setattr("hermes_cli.auth.resolve_provider", lambda *args, **kwargs: None)
def fake_prompt_provider_choice(choices, default=0):
return next(i for i, label in enumerate(choices) if label.startswith("GMI Cloud"))
def fake_model_flow_api_key_provider(config, provider_id, current_model=""):
recorded["provider_id"] = provider_id
monkeypatch.setattr("hermes_cli.main._prompt_provider_choice", fake_prompt_provider_choice)
monkeypatch.setattr("hermes_cli.main._model_flow_api_key_provider", fake_model_flow_api_key_provider)
from hermes_cli.main import select_provider_and_model
select_provider_and_model()
assert recorded["provider_id"] == "gmi"
def test_model_flow_api_key_provider_persists_gmi_selection(self, monkeypatch):
monkeypatch.setenv("GMI_API_KEY", "gmi-test-key")
with patch(
"hermes_cli.models.fetch_api_models",
return_value=["zai-org/GLM-5.1-FP8", "openai/gpt-5.4-mini"],
), patch(
"hermes_cli.auth._prompt_model_selection",
return_value="openai/gpt-5.4-mini",
), patch(
"hermes_cli.auth.deactivate_provider",
), patch(
"builtins.input",
return_value="",
):
from hermes_cli.main import _model_flow_api_key_provider
_model_flow_api_key_provider(load_config(), "gmi", "old-model")
import yaml
from hermes_constants import get_hermes_home
config = yaml.safe_load((get_hermes_home() / "config.yaml").read_text()) or {}
model_cfg = config.get("model")
assert isinstance(model_cfg, dict)
assert model_cfg["provider"] == "gmi"
assert model_cfg["default"] == "openai/gpt-5.4-mini"
assert model_cfg["base_url"] == "https://api.gmi-serving.com/v1"
+168
@@ -0,0 +1,168 @@
"""Tests for optional-plugins (official) install path in plugins_cmd."""
from __future__ import annotations
from pathlib import Path
from unittest.mock import MagicMock, patch
import pytest
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _make_official_plugin_dir(tmp_path: Path, category: str, name: str) -> Path:
"""Create a minimal optional-plugin directory structure."""
plugin_dir = tmp_path / "optional-plugins" / category / name
plugin_dir.mkdir(parents=True)
(plugin_dir / "plugin.yaml").write_text(
f"name: {name}\nversion: 1.0.0\ndescription: Test plugin\n"
)
(plugin_dir / "__init__.py").write_text("def register(ctx): pass\n")
return plugin_dir
# ---------------------------------------------------------------------------
# _resolve_official_plugin
# ---------------------------------------------------------------------------
class TestResolveOfficialPlugin:
def test_returns_none_for_git_url(self, tmp_path):
from hermes_cli.plugins_cmd import _resolve_official_plugin
with patch("hermes_cli.plugins_cmd._optional_plugins_dir", return_value=tmp_path / "optional-plugins"):
result = _resolve_official_plugin("https://github.com/owner/repo.git")
assert result is None
def test_returns_none_for_owner_repo(self, tmp_path):
from hermes_cli.plugins_cmd import _resolve_official_plugin
with patch("hermes_cli.plugins_cmd._optional_plugins_dir", return_value=tmp_path / "optional-plugins"):
result = _resolve_official_plugin("owner/repo")
assert result is None
def test_returns_none_for_missing_plugin(self, tmp_path):
from hermes_cli.plugins_cmd import _resolve_official_plugin
(tmp_path / "optional-plugins").mkdir()
with patch("hermes_cli.plugins_cmd._optional_plugins_dir", return_value=tmp_path / "optional-plugins"):
result = _resolve_official_plugin("official/observability/nonexistent")
assert result is None
def test_returns_path_for_existing_plugin(self, tmp_path):
from hermes_cli.plugins_cmd import _resolve_official_plugin
plugin_dir = _make_official_plugin_dir(tmp_path, "observability", "langfuse")
with patch("hermes_cli.plugins_cmd._optional_plugins_dir", return_value=tmp_path / "optional-plugins"):
result = _resolve_official_plugin("official/observability/langfuse")
assert result == plugin_dir
def test_accepts_without_official_prefix(self, tmp_path):
from hermes_cli.plugins_cmd import _resolve_official_plugin
plugin_dir = _make_official_plugin_dir(tmp_path, "observability", "langfuse")
with patch("hermes_cli.plugins_cmd._optional_plugins_dir", return_value=tmp_path / "optional-plugins"):
result = _resolve_official_plugin("observability/langfuse")
assert result == plugin_dir
def test_traversal_blocked(self, tmp_path):
from hermes_cli.plugins_cmd import _resolve_official_plugin
(tmp_path / "optional-plugins").mkdir()
with patch("hermes_cli.plugins_cmd._optional_plugins_dir", return_value=tmp_path / "optional-plugins"):
result = _resolve_official_plugin("official/../../etc/passwd")
assert result is None
# ---------------------------------------------------------------------------
# _list_official_plugins
# ---------------------------------------------------------------------------
class TestListOfficialPlugins:
def test_empty_when_no_optional_plugins_dir(self, tmp_path):
from hermes_cli.plugins_cmd import _list_official_plugins
with patch("hermes_cli.plugins_cmd._optional_plugins_dir", return_value=tmp_path / "nonexistent"):
result = _list_official_plugins()
assert result == []
def test_lists_plugins_with_descriptions(self, tmp_path):
from hermes_cli.plugins_cmd import _list_official_plugins
_make_official_plugin_dir(tmp_path, "observability", "langfuse")
_make_official_plugin_dir(tmp_path, "observability", "other-plugin")
with patch("hermes_cli.plugins_cmd._optional_plugins_dir", return_value=tmp_path / "optional-plugins"):
result = _list_official_plugins()
identifiers = [r[0] for r in result]
assert "official/observability/langfuse" in identifiers
assert "official/observability/other-plugin" in identifiers
def test_descriptions_parsed_from_yaml(self, tmp_path):
from hermes_cli.plugins_cmd import _list_official_plugins
plugin_dir = _make_official_plugin_dir(tmp_path, "observability", "langfuse")
with patch("hermes_cli.plugins_cmd._optional_plugins_dir", return_value=tmp_path / "optional-plugins"):
result = _list_official_plugins()
assert any(desc == "Test plugin" for _, desc in result)
# ---------------------------------------------------------------------------
# cmd_install — official path
# ---------------------------------------------------------------------------
class TestCmdInstallOfficial:
def test_install_official_plugin_copies_files(self, tmp_path, monkeypatch):
from hermes_cli.plugins_cmd import cmd_install
plugin_dir = _make_official_plugin_dir(tmp_path, "observability", "langfuse")
user_plugins = tmp_path / "user-plugins"
user_plugins.mkdir()
monkeypatch.setattr("hermes_cli.plugins_cmd._optional_plugins_dir",
lambda: tmp_path / "optional-plugins")
monkeypatch.setattr("hermes_cli.plugins_cmd._plugins_dir",
lambda: user_plugins)
# Non-interactive: don't prompt
monkeypatch.setattr("sys.stdin.isatty", lambda: False)
cmd_install("official/observability/langfuse", enable=False)
installed = user_plugins / "langfuse"
assert installed.is_dir()
assert (installed / "plugin.yaml").exists()
assert (installed / "__init__.py").exists()
def test_install_official_plugin_respects_force(self, tmp_path, monkeypatch):
from hermes_cli.plugins_cmd import cmd_install
plugin_dir = _make_official_plugin_dir(tmp_path, "observability", "langfuse")
user_plugins = tmp_path / "user-plugins"
user_plugins.mkdir()
# Pre-create to simulate already-installed
already = user_plugins / "langfuse"
already.mkdir()
(already / "old.txt").write_text("old")
monkeypatch.setattr("hermes_cli.plugins_cmd._optional_plugins_dir",
lambda: tmp_path / "optional-plugins")
monkeypatch.setattr("hermes_cli.plugins_cmd._plugins_dir",
lambda: user_plugins)
monkeypatch.setattr("sys.stdin.isatty", lambda: False)
cmd_install("official/observability/langfuse", force=True, enable=False)
# Old file should be gone, new files present
assert not (already / "old.txt").exists()
assert (already / "plugin.yaml").exists()
def test_install_official_plugin_exits_without_force_when_exists(self, tmp_path, monkeypatch):
from hermes_cli.plugins_cmd import cmd_install
_make_official_plugin_dir(tmp_path, "observability", "langfuse")
user_plugins = tmp_path / "user-plugins"
user_plugins.mkdir()
(user_plugins / "langfuse").mkdir()
monkeypatch.setattr("hermes_cli.plugins_cmd._optional_plugins_dir",
lambda: tmp_path / "optional-plugins")
monkeypatch.setattr("hermes_cli.plugins_cmd._plugins_dir",
lambda: user_plugins)
with pytest.raises(SystemExit):
cmd_install("official/observability/langfuse", enable=False)
def test_git_url_not_mistaken_for_official(self, tmp_path, monkeypatch):
"""A git URL must not trigger the official install path."""
from hermes_cli.plugins_cmd import _resolve_official_plugin
with patch("hermes_cli.plugins_cmd._optional_plugins_dir",
return_value=tmp_path / "optional-plugins"):
assert _resolve_official_plugin("https://github.com/owner/repo") is None
assert _resolve_official_plugin("owner/repo") is None
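The tests above pin down how an `official/<category>/<name>` identifier should resolve: URLs and `owner/repo` shorthands fall through to the git path, the `official/` prefix is optional, and traversal attempts return `None`. A minimal sketch of a resolver with that behaviour follows. It is not the code in `hermes_cli/plugins_cmd.py`; the function name `resolve_official_plugin`, the explicit `optional_root` parameter, and the `plugin.yaml` existence check are assumptions made for illustration.

```python
from __future__ import annotations

from pathlib import Path


def resolve_official_plugin(identifier: str, optional_root: Path) -> Path | None:
    """Hypothetical resolver: map '[official/]<category>/<name>' to a directory."""
    # Anything that looks like a URL or SSH remote belongs to the git install path.
    if "://" in identifier or identifier.startswith("git@"):
        return None

    parts = identifier.split("/")
    if parts and parts[0] == "official":
        parts = parts[1:]

    # Require exactly <category>/<name>, with no empty, '.' or '..' components.
    if len(parts) != 2 or any(p in ("", ".", "..") for p in parts):
        return None

    root = optional_root.resolve()
    candidate = (root / parts[0] / parts[1]).resolve()

    # Containment check: the resolved path must stay under the optional-plugins root.
    if root not in candidate.parents:
        return None

    # 'owner/repo' has the same shape, so only accept a directory that really
    # holds a plugin manifest (assumed here to be plugin.yaml).
    if not (candidate / "plugin.yaml").is_file():
        return None
    return candidate
```

Checking for the manifest file is what keeps an `owner/repo` shorthand from being mistaken for an official plugin when no matching directory ships with the repo.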
+53 -3
@@ -72,8 +72,12 @@ def test_redact_secrets_false_in_config_yaml_is_honored(tmp_path):
assert "ENV_VAR=false" in result.stdout
def test_redact_secrets_default_true_when_unset(tmp_path):
"""Without the config key, redaction stays on by default."""
def test_redact_secrets_default_false_when_unset(tmp_path):
"""Without the config key, redaction stays OFF by default.
Secret redaction is opt-in: users who want it must set
`security.redact_secrets: true` explicitly (or HERMES_REDACT_SECRETS=true).
"""
hermes_home = tmp_path / ".hermes"
hermes_home.mkdir()
(hermes_home / "config.yaml").write_text("{}\n") # empty config
@@ -103,7 +107,53 @@ def test_redact_secrets_default_true_when_unset(tmp_path):
timeout=30,
)
assert result.returncode == 0, f"probe failed: {result.stderr}"
assert "REDACT_ENABLED=True" in result.stdout
assert "REDACT_ENABLED=False" in result.stdout
def test_redact_secrets_true_in_config_yaml_is_honored(tmp_path):
"""Setting `security.redact_secrets: true` in config.yaml must enable
redaction even though it's set in YAML, not as an env var."""
hermes_home = tmp_path / ".hermes"
hermes_home.mkdir()
(hermes_home / "config.yaml").write_text(
textwrap.dedent(
"""\
security:
redact_secrets: true
"""
)
)
(hermes_home / ".env").write_text("")
probe = textwrap.dedent(
"""\
import sys, os
os.environ.pop("HERMES_REDACT_SECRETS", None)
sys.path.insert(0, %r)
import hermes_cli.main
import agent.redact
print(f"REDACT_ENABLED={agent.redact._REDACT_ENABLED}")
print(f"ENV_VAR={os.environ.get('HERMES_REDACT_SECRETS', '<unset>')}")
"""
) % str(REPO_ROOT)
env = dict(os.environ)
env["HERMES_HOME"] = str(hermes_home)
env.pop("HERMES_REDACT_SECRETS", None)
result = subprocess.run(
[sys.executable, "-c", probe],
env=env,
capture_output=True,
text=True,
cwd=str(REPO_ROOT),
timeout=30,
)
assert result.returncode == 0, f"probe failed: {result.stderr}"
assert "REDACT_ENABLED=True" in result.stdout, (
f"Config toggle not honored.\nstdout: {result.stdout}\nstderr: {result.stderr}"
)
assert "ENV_VAR=true" in result.stdout
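The commit message describes the `HERMES_REDACT_SECRETS` fallback as requiring an explicit opt-in value ("1", "true", "yes" or "on"). A minimal sketch of that kind of check is shown here. The helper name `redaction_enabled`, the `environ` parameter, and the whitespace and case normalisation are assumptions, not the contents of `agent/redact.py`.

```python
import os

# Values that count as an explicit opt-in, per the commit message.
_TRUTHY = {"1", "true", "yes", "on"}


def redaction_enabled(environ=None) -> bool:
    """Hypothetical helper: redaction is on only for an explicit truthy value."""
    env = os.environ if environ is None else environ
    # An unset or empty variable means redaction stays off (the new default).
    value = env.get("HERMES_REDACT_SECRETS", "")
    return value.strip().lower() in _TRUTHY
```

With this shape, an unset variable and an unrecognised value such as `"false"` both leave redaction disabled, which matches the "off by default" behaviour the tests assert.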
def test_dotenv_redact_secrets_beats_config_yaml(tmp_path):
+28 -4
@@ -1,4 +1,4 @@
"""_tui_need_npm_install: auto npm when lockfile ahead of node_modules."""
"""_tui_need_npm_install: auto npm when node_modules is behind the lockfile."""
import os
from pathlib import Path
@@ -36,15 +36,39 @@ def test_need_install_when_ink_missing(tmp_path: Path, main_mod) -> None:
assert main_mod._tui_need_npm_install(tmp_path) is True
def test_need_install_when_lock_newer_than_marker(tmp_path: Path, main_mod) -> None:
def test_no_install_when_lock_newer_but_hidden_lock_matches(tmp_path: Path, main_mod) -> None:
_touch_ink(tmp_path)
(tmp_path / "package-lock.json").write_text("{}")
(tmp_path / "node_modules" / ".package-lock.json").write_text("{}")
(tmp_path / "package-lock.json").write_text('{"packages":{"node_modules/foo":{"version":"1.0.0"}}}')
(tmp_path / "node_modules" / ".package-lock.json").write_text(
'{"packages":{"node_modules/foo":{"version":"1.0.0","ideallyInert":true}}}'
)
os.utime(tmp_path / "package-lock.json", (200, 200))
os.utime(tmp_path / "node_modules" / ".package-lock.json", (100, 100))
assert main_mod._tui_need_npm_install(tmp_path) is False
def test_need_install_when_required_package_missing_from_hidden_lock(tmp_path: Path, main_mod) -> None:
_touch_ink(tmp_path)
(tmp_path / "package-lock.json").write_text(
'{"packages":{"node_modules/foo":{"version":"1.0.0"},"node_modules/bar":{"version":"1.0.0"}}}'
)
(tmp_path / "node_modules" / ".package-lock.json").write_text(
'{"packages":{"node_modules/foo":{"version":"1.0.0"}}}'
)
assert main_mod._tui_need_npm_install(tmp_path) is True
def test_no_install_when_only_optional_peer_package_missing_from_hidden_lock(tmp_path: Path, main_mod) -> None:
_touch_ink(tmp_path)
(tmp_path / "package-lock.json").write_text(
'{"packages":{"node_modules/foo":{"version":"1.0.0"},"node_modules/optional":{"version":"1.0.0","optional":true,"peer":true}}}'
)
(tmp_path / "node_modules" / ".package-lock.json").write_text(
'{"packages":{"node_modules/foo":{"version":"1.0.0"}}}'
)
assert main_mod._tui_need_npm_install(tmp_path) is False
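The three tests above imply a content comparison between `package-lock.json` and npm's hidden `node_modules/.package-lock.json`, rather than a pure mtime check. One way such a comparison could look is sketched below. The function name `hidden_lock_is_stale` is hypothetical, and the rule that entries flagged `optional` or `peer` are skipped is an assumption inferred from the test names; the real `_tui_need_npm_install` also covers cases (missing ink, mtime ordering) that this sketch leaves out.

```python
import json
from pathlib import Path


def hidden_lock_is_stale(root: Path) -> bool:
    """Hypothetical check: is any required package missing or at another version?"""
    lock = root / "package-lock.json"
    hidden = root / "node_modules" / ".package-lock.json"
    try:
        wanted = json.loads(lock.read_text()).get("packages", {})
        installed = json.loads(hidden.read_text()).get("packages", {})
    except (OSError, ValueError):
        # Unreadable or invalid JSON: treat as stale and reinstall to be safe.
        return True

    for name, meta in wanted.items():
        if not name:
            # The "" key describes the root project, not an installed package.
            continue
        if meta.get("optional") or meta.get("peer"):
            # Assumption: optional/peer packages may legitimately be absent.
            continue
        have = installed.get(name)
        if have is None or have.get("version") != meta.get("version"):
            return True
    return False
```

Only the `version` field is compared, so extra keys that npm writes into the hidden lockfile do not trigger a reinstall.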
def test_no_install_when_lock_older_than_marker(tmp_path: Path, main_mod) -> None:
_touch_ink(tmp_path)
(tmp_path / "package-lock.json").write_text("{}")
-43
@@ -750,49 +750,6 @@ class TestNewEndpoints:
"top_skills": [],
}
def test_analytics_usage_includes_cache_tokens_in_input(self):
"""input_tokens in the response must include cache_read + cache_write."""
from hermes_state import SessionDB
db = SessionDB()
try:
db.create_session(
session_id="cache-tok-test",
source="cli",
model="claude-opus-4-6",
)
db.update_token_counts(
"cache-tok-test",
input_tokens=10,
output_tokens=50,
cache_read_tokens=9000,
cache_write_tokens=1000,
billing_provider="anthropic",
model="claude-opus-4-6",
)
finally:
db.close()
resp = self.client.get("/api/analytics/usage?days=7")
assert resp.status_code == 200
data = resp.json()
# Totals: input must be 10 + 9000 + 1000 = 10010
assert data["totals"]["total_input"] == 10010
assert data["totals"]["total_output"] == 50
# Daily: find the entry and verify
assert len(data["daily"]) == 1
day = data["daily"][0]
assert day["input_tokens"] == 10010
assert day["output_tokens"] == 50
# By-model: verify the model row
assert len(data["by_model"]) == 1
model_row = data["by_model"][0]
assert model_row["input_tokens"] == 10010
assert model_row["output_tokens"] == 50
def test_analytics_usage_includes_skill_breakdown(self):
from hermes_state import SessionDB
+97
@@ -3,6 +3,103 @@
from types import SimpleNamespace
class TestResolveApiKey:
"""Test _resolve_api_key with various config shapes."""
def test_returns_api_key_from_root(self, monkeypatch):
import plugins.memory.honcho.cli as honcho_cli
monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
monkeypatch.delenv("HONCHO_API_KEY", raising=False)
assert honcho_cli._resolve_api_key({"apiKey": "root-key"}) == "root-key"
def test_returns_api_key_from_host_block(self, monkeypatch):
import plugins.memory.honcho.cli as honcho_cli
monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
monkeypatch.delenv("HONCHO_API_KEY", raising=False)
cfg = {"hosts": {"hermes": {"apiKey": "host-key"}}, "apiKey": "root-key"}
assert honcho_cli._resolve_api_key(cfg) == "host-key"
def test_returns_local_for_base_url_without_api_key(self, monkeypatch):
import plugins.memory.honcho.cli as honcho_cli
monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
monkeypatch.delenv("HONCHO_API_KEY", raising=False)
monkeypatch.delenv("HONCHO_BASE_URL", raising=False)
cfg = {"baseUrl": "http://localhost:8000"}
assert honcho_cli._resolve_api_key(cfg) == "local"
def test_returns_local_for_base_url_env_var(self, monkeypatch):
import plugins.memory.honcho.cli as honcho_cli
monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
monkeypatch.delenv("HONCHO_API_KEY", raising=False)
monkeypatch.setenv("HONCHO_BASE_URL", "http://10.0.0.5:8000")
assert honcho_cli._resolve_api_key({}) == "local"
def test_returns_empty_when_nothing_configured(self, monkeypatch):
import plugins.memory.honcho.cli as honcho_cli
monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
monkeypatch.delenv("HONCHO_API_KEY", raising=False)
monkeypatch.delenv("HONCHO_BASE_URL", raising=False)
assert honcho_cli._resolve_api_key({}) == ""
def test_rejects_garbage_base_url_without_scheme(self, monkeypatch):
"""Obvious non-URL literals in baseUrl (typos) must not pass the guard."""
import plugins.memory.honcho.cli as honcho_cli
monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
monkeypatch.delenv("HONCHO_API_KEY", raising=False)
monkeypatch.delenv("HONCHO_BASE_URL", raising=False)
# Boolean literals, pure digits, and bare identifiers without
# host-like punctuation are rejected. Schemeless host:port-style
# strings are accepted (see test_accepts_legacy_schemeless_host).
for garbage in ("true", "false", "null", "1", "12345", "localhost"):
assert honcho_cli._resolve_api_key({"baseUrl": garbage}) == "", \
f"expected empty for garbage {garbage!r}"
def test_rejects_non_http_scheme_base_url(self, monkeypatch):
"""file:// / ftp:// / ws:// schemes are rejected as non-HTTP Honcho URLs.
Note: these DO contain ``.`` or ``:`` so they pass the schemeless
host fallback. That's acceptable — the Honcho SDK will still
reject them when it tries to connect. If tighter filtering is
needed later, extend the lowered-literal blocklist or check the
parsed scheme explicitly.
"""
import plugins.memory.honcho.cli as honcho_cli
monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
monkeypatch.delenv("HONCHO_API_KEY", raising=False)
monkeypatch.delenv("HONCHO_BASE_URL", raising=False)
# file:/// parses with scheme='file' but empty netloc, so the
# http/https guard rejects; the schemeless fallback also rejects
# because 'file:' starts with a known-non-http scheme prefix.
# ftp://host/ parses with scheme='ftp', netloc='host' — the
# http/https guard rejects but the schemeless fallback accepts
# because 'ftp://host/' contains ':' and '.'. Behaviour is
# intentionally lenient: SDK errors out with clearer message.
def test_accepts_https_base_url(self, monkeypatch):
import plugins.memory.honcho.cli as honcho_cli
monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
monkeypatch.delenv("HONCHO_API_KEY", raising=False)
monkeypatch.delenv("HONCHO_BASE_URL", raising=False)
assert honcho_cli._resolve_api_key({"baseUrl": "https://honcho.example.com"}) == "local"
def test_accepts_legacy_schemeless_host(self, monkeypatch):
"""Legacy configs with schemeless host:port must not regress.
Before scheme validation landed, ``baseUrl: "localhost:8000"`` passed
the truthy check and flowed through to the SDK. The lenient
schemeless fallback preserves that behaviour so self-hosters with
older configs don't see spurious "no API key configured" errors.
The SDK itself still rejects malformed URLs at connect time.
"""
import plugins.memory.honcho.cli as honcho_cli
monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
monkeypatch.delenv("HONCHO_API_KEY", raising=False)
monkeypatch.delenv("HONCHO_BASE_URL", raising=False)
for legacy in ("localhost:8000", "10.0.0.5:8000", "honcho.local:8080", "host.example.com"):
assert honcho_cli._resolve_api_key({"baseUrl": legacy}) == "local", \
f"expected local sentinel for legacy schemeless {legacy!r}"
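The `baseUrl` guard exercised by `TestResolveApiKey` can be summarised as: accept `http`/`https` URLs with a host, accept schemeless strings that contain host-like punctuation, and reject bare literals. The sketch below is one way to express that rule. The name `looks_like_base_url` and the literal blocklist are assumptions; it is not the implementation in `plugins/memory/honcho/cli.py`, and it does not model the special handling of non-HTTP schemes such as `file://` discussed in the test comments.

```python
from urllib.parse import urlparse

# Literals that are almost certainly config typos rather than hosts.
_NON_URL_LITERALS = {"true", "false", "null"}


def looks_like_base_url(value: str) -> bool:
    """Hypothetical guard mirroring the accept/reject cases in the tests."""
    text = value.strip()
    if not text or text.lower() in _NON_URL_LITERALS or text.isdigit():
        return False

    parsed = urlparse(text)
    if parsed.scheme in ("http", "https"):
        # A proper HTTP(S) URL must carry a host.
        return bool(parsed.netloc)

    # Lenient legacy fallback: schemeless host or host:port strings are
    # accepted when they contain '.' or ':'; a bare word like 'localhost' is not.
    return "." in text or ":" in text
```

A caller would return the `"local"` sentinel when this guard passes and no API key is configured, and an empty string otherwise.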
class TestCmdStatus:
def test_reports_connection_failure_when_session_setup_fails(self, monkeypatch, capsys, tmp_path):
import plugins.memory.honcho.cli as honcho_cli
+112 -3
@@ -14,7 +14,7 @@ from plugins.memory.honcho.client import (
reset_honcho_client,
resolve_active_host,
resolve_config_path,
GLOBAL_CONFIG_PATH,
resolve_global_config_path,
HOST,
)
@@ -360,7 +360,7 @@ class TestResolveConfigPath:
with patch.dict(os.environ, {"HERMES_HOME": str(hermes_home)}), \
patch.object(Path, "home", return_value=fake_home):
result = resolve_config_path()
assert result == GLOBAL_CONFIG_PATH
assert result == fake_home / ".honcho" / "config.json"
def test_falls_back_to_global_without_hermes_home_env(self, tmp_path):
fake_home = tmp_path / "fakehome"
@@ -370,7 +370,18 @@ class TestResolveConfigPath:
patch.object(Path, "home", return_value=fake_home):
os.environ.pop("HERMES_HOME", None)
result = resolve_config_path()
assert result == GLOBAL_CONFIG_PATH
assert result == fake_home / ".honcho" / "config.json"
def test_global_fallback_uses_home_at_call_time(self, tmp_path):
fake_home = tmp_path / "fakehome"
fake_home.mkdir()
hermes_home = tmp_path / "hermes"
hermes_home.mkdir()
with patch.dict(os.environ, {"HERMES_HOME": str(hermes_home)}), \
patch.object(Path, "home", return_value=fake_home):
assert resolve_global_config_path() == fake_home / ".honcho" / "config.json"
assert resolve_config_path() == fake_home / ".honcho" / "config.json"
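The switch from the `GLOBAL_CONFIG_PATH` constant to `resolve_global_config_path()` matters because a module-level constant captures `Path.home()` once at import time, so a later patch of `Path.home` in a test is never seen. A small sketch of the difference, assuming the function simply joins the home directory with `.honcho/config.json` as the assertions above suggest:

```python
from pathlib import Path

# Evaluated once at import time: later changes to Path.home() are not seen.
GLOBAL_CONFIG_PATH = Path.home() / ".honcho" / "config.json"


def resolve_global_config_path() -> Path:
    """Evaluated on every call, so a patched Path.home() is honoured."""
    return Path.home() / ".honcho" / "config.json"
```

Under `patch.object(Path, "home", return_value=fake_home)`, the function returns a path under `fake_home`, while the constant still points at the home directory that was in effect when the module was imported.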
def test_from_global_config_uses_local_path(self, tmp_path):
hermes_home = tmp_path / "hermes"
@@ -589,6 +600,28 @@ class TestGetHonchoClient:
mock_honcho.assert_called_once()
assert mock_honcho.call_args.kwargs["timeout"] == 88.0
@pytest.mark.skipif(
not importlib.util.find_spec("honcho"),
reason="honcho SDK not installed"
)
def test_defaults_to_30s_when_no_timeout_configured(self):
from plugins.memory.honcho.client import _DEFAULT_HTTP_TIMEOUT
fake_honcho = MagicMock(name="Honcho")
cfg = HonchoClientConfig(
api_key="test-key",
workspace_id="hermes",
environment="production",
)
with patch("honcho.Honcho", return_value=fake_honcho) as mock_honcho, \
patch("hermes_cli.config.load_config", return_value={}):
client = get_honcho_client(cfg)
assert client is fake_honcho
mock_honcho.assert_called_once()
assert mock_honcho.call_args.kwargs["timeout"] == _DEFAULT_HTTP_TIMEOUT
@pytest.mark.skipif(
not importlib.util.find_spec("honcho"),
reason="honcho SDK not installed"
@@ -656,6 +689,82 @@ class TestResolveSessionNameGatewayKey:
assert ":" not in result
class TestResolveSessionNameLengthLimit:
"""Regression tests for Honcho's 100-char session ID limit (issue #13868).
Long gateway session keys (Matrix room+event IDs, Telegram supergroup
reply chains, Slack thread IDs with long workspace prefixes) can overflow
Honcho's 100-char session_id limit after sanitization. Before this fix,
every Honcho API call for those sessions 400'd with "session_id too long".
"""
HONCHO_MAX = 100
def test_short_gateway_key_unchanged(self):
"""Short keys must not get a hash suffix appended."""
config = HonchoClientConfig()
result = config.resolve_session_name(
gateway_session_key="agent:main:telegram:dm:8439114563",
)
# Unchanged fast-path: sanitize only, no truncation, no hash suffix.
assert result == "agent-main-telegram-dm-8439114563"
assert len(result) <= self.HONCHO_MAX
def test_key_at_exact_limit_unchanged(self):
"""A sanitized key that is exactly 100 chars must be returned as-is."""
key = "a" * self.HONCHO_MAX
config = HonchoClientConfig()
result = config.resolve_session_name(gateway_session_key=key)
assert result == key
assert len(result) == self.HONCHO_MAX
def test_long_gateway_key_truncated_to_limit(self):
"""An over-limit sanitized key must truncate to exactly 100 chars."""
key = "!roomid:matrix.example.org|" + "$event_" + ("a" * 300)
config = HonchoClientConfig()
result = config.resolve_session_name(gateway_session_key=key)
assert result is not None
assert len(result) == self.HONCHO_MAX
def test_truncation_is_deterministic(self):
"""Same long key must always produce the same truncated session ID."""
key = "matrix-" + ("a" * 300)
config = HonchoClientConfig()
first = config.resolve_session_name(gateway_session_key=key)
second = config.resolve_session_name(gateway_session_key=key)
assert first == second
def test_truncated_result_respects_char_allowlist(self):
"""Truncated result must still match Honcho's [a-zA-Z0-9_-] allowlist."""
import re
key = "slack:T12345:thread-reply:" + ("x" * 300) + ":with:colons:and:slashes/here"
config = HonchoClientConfig()
result = config.resolve_session_name(gateway_session_key=key)
assert result is not None
assert re.fullmatch(r"[a-zA-Z0-9_-]+", result)
def test_distinct_long_keys_do_not_collide(self):
"""Two long keys sharing a prefix must produce different truncated IDs."""
prefix = "matrix:!room:example.org|" + "a" * 200
key_a = prefix + "-suffix-alpha"
key_b = prefix + "-suffix-beta"
config = HonchoClientConfig()
result_a = config.resolve_session_name(gateway_session_key=key_a)
result_b = config.resolve_session_name(gateway_session_key=key_b)
assert result_a != result_b
assert len(result_a) == self.HONCHO_MAX
assert len(result_b) == self.HONCHO_MAX
def test_truncated_result_has_hash_suffix(self):
"""Truncated IDs must end with '-<8 hex chars>' for collision resistance."""
import re
key = "matrix-" + ("a" * 300)
config = HonchoClientConfig()
result = config.resolve_session_name(gateway_session_key=key)
# Last 9 chars: '-' + 8 hex chars.
assert re.search(r"-[0-9a-f]{8}$", result)
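The properties these regression tests require (short keys unchanged, exact 100-character results, determinism, the `[a-zA-Z0-9_-]` allowlist, and an 8-hex-character suffix) can all be met by truncating and appending a short digest of the original key. The sketch below shows one such scheme. The function name `fit_session_id`, the use of SHA-256, and the exact sanitisation regex are assumptions; the source only shows the expected outputs, not how `resolve_session_name` computes them.

```python
import hashlib
import re

HONCHO_MAX = 100  # Honcho's session_id length limit, per the tests above
_HASH_LEN = 8     # hex characters kept from the digest


def fit_session_id(raw_key: str, limit: int = HONCHO_MAX) -> str:
    """Hypothetical helper: sanitise a gateway key and keep it within the limit."""
    # Replace every character outside the allowlist with '-'.
    sanitized = re.sub(r"[^a-zA-Z0-9_-]", "-", raw_key)
    if len(sanitized) <= limit:
        # Fast path: no truncation and no hash suffix.
        return sanitized

    # Hash the full original key so keys that share a long prefix still differ.
    digest = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()[:_HASH_LEN]
    keep = limit - _HASH_LEN - 1  # leave room for '-' plus the digest
    return f"{sanitized[:keep]}-{digest}"
```

Because the digest is taken over the whole key rather than the truncated prefix, two long keys that differ only near the end still map to different session IDs, up to the collision odds of an 8-hex-character suffix.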
class TestResetHonchoClient:
def test_reset_clears_singleton(self):
import plugins.memory.honcho.client as mod
@@ -0,0 +1,85 @@
"""Tests for honcho_profile's empty-card hint (#5137 follow-up)."""
from __future__ import annotations
import json
from unittest.mock import MagicMock
from plugins.memory.honcho import HonchoMemoryProvider
def _make_provider(**cfg_overrides) -> HonchoMemoryProvider:
provider = HonchoMemoryProvider()
provider._manager = MagicMock()
provider._manager.get_peer_card.return_value = [] # empty card
provider._session_key = "agent:main:test"
provider._session_initialized = True # bypass the lazy _ensure_session() gate
provider._cron_skipped = False
cfg = MagicMock()
# Defaults match HonchoClientConfig defaults
cfg.user_observe_me = cfg_overrides.get("user_observe_me", True)
cfg.user_observe_others = cfg_overrides.get("user_observe_others", True)
cfg.ai_observe_me = cfg_overrides.get("ai_observe_me", True)
cfg.ai_observe_others = cfg_overrides.get("ai_observe_others", True)
cfg.message_max_chars = 25000
provider._config = cfg
provider._dialectic_cadence = cfg_overrides.get("dialectic_cadence", 1)
provider._turn_count = cfg_overrides.get("turn_count", 5)
return provider
class TestEmptyProfileHint:
def test_returns_hint_not_bare_error_message(self):
provider = _make_provider()
raw = provider.handle_tool_call("honcho_profile", {})
payload = json.loads(raw)
assert payload["result"] == "No profile facts available yet."
assert "hint" in payload
assert "not an error" in payload["hint"].lower()
def test_hint_mentions_warmup_when_turn_count_below_cadence(self):
provider = _make_provider(turn_count=1, dialectic_cadence=3)
raw = provider.handle_tool_call("honcho_profile", {})
payload = json.loads(raw)
assert "turn" in payload["hint"].lower()
assert "cadence" in payload["hint"].lower()
def test_hint_mentions_observation_when_fully_disabled_for_user(self):
provider = _make_provider(user_observe_me=False, user_observe_others=False)
raw = provider.handle_tool_call("honcho_profile", {"peer": "user"})
payload = json.loads(raw)
assert "observation is disabled" in payload["hint"].lower()
def test_hint_mentions_observation_when_fully_disabled_for_ai(self):
provider = _make_provider(ai_observe_me=False, ai_observe_others=False)
raw = provider.handle_tool_call("honcho_profile", {"peer": "ai"})
payload = json.loads(raw)
assert "observation is disabled" in payload["hint"].lower()
assert "ai" in payload["hint"]
def test_hint_falls_back_to_generic_reason_when_no_specific_cause(self):
"""Mature session with observation on + enough turns = generic hint."""
provider = _make_provider(turn_count=50, dialectic_cadence=1)
raw = provider.handle_tool_call("honcho_profile", {})
payload = json.loads(raw)
assert "hint" in payload
# Generic hint mentions self-hosted as a common cause
assert any(word in payload["hint"].lower() for word in ("self-hosted", "dialectic"))
def test_hint_suggests_alternative_tools(self):
provider = _make_provider()
raw = provider.handle_tool_call("honcho_profile", {})
payload = json.loads(raw)
# User-facing suggestion to try honcho_reasoning or honcho_search
assert "honcho_reasoning" in payload["hint"] or "honcho_search" in payload["hint"]
def test_populated_card_returns_card_without_hint(self):
"""Regression: a populated card should NOT trigger the hint path."""
provider = _make_provider()
provider._manager.get_peer_card.return_value = ["Fact 1", "Fact 2"]
raw = provider.handle_tool_call("honcho_profile", {})
payload = json.loads(raw)
assert payload["result"] == ["Fact 1", "Fact 2"]
assert "hint" not in payload
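The hint tests describe a payload that distinguishes three causes for an empty peer card: observation fully disabled for the peer, a session still below the dialectic cadence, and a generic fallback. A rough sketch of how such a payload could be assembled is given here. The function name, its parameters, and all hint wording are invented for illustration; only the substrings checked by the tests come from the source.

```python
import json


def empty_profile_payload(peer: str, turn_count: int, cadence: int,
                          observe_me: bool, observe_others: bool) -> str:
    """Hypothetical builder for the empty-card response of honcho_profile."""
    reasons = []
    if not observe_me and not observe_others:
        reasons.append(f"observation is disabled for the {peer} peer")
    if turn_count < cadence:
        reasons.append(
            f"the session is at turn {turn_count}, below the dialectic cadence of {cadence}"
        )
    if not reasons:
        # Generic fallback when no specific cause can be identified.
        reasons.append(
            "the dialectic layer may not have derived facts yet, "
            "which is common on self-hosted deployments"
        )
    hint = (
        "This is not an error: " + "; ".join(reasons)
        + ". Try honcho_reasoning or honcho_search instead."
    )
    return json.dumps({"result": "No profile facts available yet.", "hint": hint})
```

Returning a `hint` next to the unchanged `result` string lets the model tell "no facts yet" apart from a failed call without changing the shape of a populated response.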
+307
@@ -0,0 +1,307 @@
"""Tests for the ``pinPeerName`` config flag (#14984).
By default, when Hermes runs under a gateway (Telegram, Discord, Slack, ...)
it passes the platform-native user ID as ``runtime_user_peer_name`` into
``HonchoSessionManager``. That ID wins over any configured ``peer_name``
so multi-user bots scope memory per user.
For a single-user personal deployment where the user connects over multiple
platforms, that default forks memory into one Honcho peer per platform
(Telegram UID, Discord snowflake, Slack user ID, ...). The user asked for
an opt-in knob that pins the user peer to ``peer_name`` from ``honcho.json``
so the same person's memory stays unified regardless of which platform the
turn arrived on: ``hosts.<host>.pinPeerName: true`` (or root-level
``pinPeerName: true``).
These tests exercise both the config parsing (``client.py::from_global_config``)
and the resolution order (``session.py::get_or_create``). We stub the
Honcho API calls so we can assert the chosen ``user_peer_id`` without
touching the network.
"""
import json
from unittest.mock import MagicMock
import pytest
from plugins.memory.honcho.client import HonchoClientConfig
from plugins.memory.honcho.session import HonchoSessionManager
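The resolution order described in the module docstring above can be condensed into a small decision function. This is a sketch of the precedence only, not the code in `session.py::get_or_create`; the function name, its parameters, and the `user-<session_key>` fallback string are hypothetical placeholders.

```python
from __future__ import annotations


def resolve_user_peer_id(
    runtime_id: str | None,
    peer_name: str | None,
    pin_peer_name: bool,
    session_key: str,
) -> str:
    """Hypothetical precedence for choosing the user peer ID."""
    # 1. Opt-in pinning: the configured peer_name beats the platform ID,
    #    but only when there is actually a peer_name to pin to.
    if pin_peer_name and peer_name:
        return peer_name
    # 2. Default for gateways: the platform-native user ID wins so that
    #    multi-user bots keep memory scoped per user.
    if runtime_id:
        return runtime_id
    # 3. CLI mode (no runtime identity): use the configured peer_name.
    if peer_name:
        return peer_name
    # 4. Last resort: derive something from the session key (placeholder format).
    return f"user-{session_key}"
```

The first branch is the only behavioural change the flag introduces; with `pin_peer_name=False` the function reduces to the pre-existing runtime-first order.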
# ---------------------------------------------------------------------------
# Config parsing
# ---------------------------------------------------------------------------
class TestPinPeerNameConfigParsing:
def test_default_is_false(self):
"""Default preserves existing behaviour — multi-user bots unaffected."""
config = HonchoClientConfig()
assert config.pin_peer_name is False
def test_root_level_true(self, tmp_path, monkeypatch):
config_file = tmp_path / "honcho.json"
config_file.write_text(json.dumps({
"apiKey": "k",
"peerName": "Igor",
"pinPeerName": True,
}))
monkeypatch.setenv("HERMES_HOME", str(tmp_path / "isolated"))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.pin_peer_name is True
assert config.peer_name == "Igor"
def test_host_block_true(self, tmp_path, monkeypatch):
"""Host-level flag works the same as root-level."""
config_file = tmp_path / "honcho.json"
config_file.write_text(json.dumps({
"apiKey": "k",
"peerName": "Igor",
"hosts": {
"hermes": {"pinPeerName": True},
},
}))
monkeypatch.setenv("HERMES_HOME", str(tmp_path / "isolated"))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.pin_peer_name is True
def test_host_block_overrides_root(self, tmp_path, monkeypatch):
"""Host block wins over root — matches how every other flag behaves."""
config_file = tmp_path / "honcho.json"
config_file.write_text(json.dumps({
"apiKey": "k",
"peerName": "Igor",
"pinPeerName": True,
"hosts": {
"hermes": {"pinPeerName": False},
},
}))
monkeypatch.setenv("HERMES_HOME", str(tmp_path / "isolated"))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.pin_peer_name is False, (
"host-level pinPeerName=false must override root-level true, the "
"same way every other flag in this config is resolved"
)
def test_explicit_false_parses(self, tmp_path, monkeypatch):
config_file = tmp_path / "honcho.json"
config_file.write_text(json.dumps({
"apiKey": "k",
"peerName": "Igor",
"pinPeerName": False,
}))
monkeypatch.setenv("HERMES_HOME", str(tmp_path / "isolated"))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.pin_peer_name is False
# ---------------------------------------------------------------------------
# Peer resolution (the actual bug fix)
# ---------------------------------------------------------------------------
def _patch_manager_for_resolution_test(mgr: HonchoSessionManager) -> None:
"""Stub out the Honcho client so ``get_or_create`` doesn't try to talk
to the network; we only care about the user_peer_id chosen before
those calls happen.
"""
fake_peer = MagicMock()
mgr._get_or_create_peer = MagicMock(return_value=fake_peer)
mgr._get_or_create_honcho_session = MagicMock(
return_value=(MagicMock(), [])
)
class TestPeerResolutionOrder:
"""Matrix of (runtime_id, pin_peer_name, peer_name) → expected user_peer_id."""
def _config(self, *, peer_name: str | None, pin_peer_name: bool) -> HonchoClientConfig:
# The test doesn't need auth / Honcho — disable the provider so
# the manager doesn't try to open a real client.
return HonchoClientConfig(
api_key="test-key",
peer_name=peer_name,
pin_peer_name=pin_peer_name,
enabled=False,
write_frequency="turn", # avoid spawning the async writer thread
)
def test_runtime_wins_when_pin_is_false(self):
"""Regression guard: default behaviour must stay unchanged.
Multi-user bots rely on the platform-native ID winning."""
mgr = HonchoSessionManager(
honcho=MagicMock(),
config=self._config(peer_name="Igor", pin_peer_name=False),
runtime_user_peer_name="86701400", # e.g. Telegram UID
)
_patch_manager_for_resolution_test(mgr)
session = mgr.get_or_create("telegram:86701400")
assert session.user_peer_id == "86701400", (
"pin_peer_name=False is the multi-user default — the gateway's "
"platform-native user ID must win so each user gets their own "
"peer scope. If this regresses, every Telegram/Discord/Slack "
"bot immediately merges memory across users."
)
def test_config_wins_when_pin_is_true(self):
"""The #14984 fix: single-user deployments opt into config pinning."""
mgr = HonchoSessionManager(
honcho=MagicMock(),
config=self._config(peer_name="Igor", pin_peer_name=True),
runtime_user_peer_name="86701400", # Telegram pushes this in
)
_patch_manager_for_resolution_test(mgr)
session = mgr.get_or_create("telegram:86701400")
assert session.user_peer_id == "Igor", (
"With pinPeerName=true the user's configured peer_name must "
"beat the platform-native runtime ID so memory stays unified "
"across Telegram/Discord/Slack for the same person."
)
def test_pin_noop_when_peer_name_missing(self):
"""Safety: pinPeerName alone (no peer_name) must not silently drop
the runtime identity. Without a configured peer_name there's
nothing to pin to, so fall back to runtime as before."""
mgr = HonchoSessionManager(
honcho=MagicMock(),
config=self._config(peer_name=None, pin_peer_name=True),
runtime_user_peer_name="86701400",
)
_patch_manager_for_resolution_test(mgr)
session = mgr.get_or_create("telegram:86701400")
assert session.user_peer_id == "86701400", (
"pin_peer_name=True with no peer_name set must not strip the "
"runtime ID — otherwise the user peer would collapse to the "
"session-key fallback and lose per-user scoping entirely"
)
def test_runtime_missing_falls_back_to_peer_name(self):
"""CLI-mode (no gateway runtime identity) uses config peer_name —
this path was already correct but the refactor shouldn't break it."""
mgr = HonchoSessionManager(
honcho=MagicMock(),
config=self._config(peer_name="Igor", pin_peer_name=False),
runtime_user_peer_name=None,
)
_patch_manager_for_resolution_test(mgr)
session = mgr.get_or_create("cli:local")
assert session.user_peer_id == "Igor"
def test_everything_missing_falls_back_to_session_key(self):
"""Deepest fallback: no runtime identity, no peer_name, no pin.
Must still produce a deterministic peer_id from the session key."""
# Config with no peer_name and default pin_peer_name=False
mgr = HonchoSessionManager(
honcho=MagicMock(),
config=self._config(peer_name=None, pin_peer_name=False),
runtime_user_peer_name=None,
)
_patch_manager_for_resolution_test(mgr)
session = mgr.get_or_create("telegram:123")
assert session.user_peer_id == "user-telegram-123"
def test_pin_does_not_affect_assistant_peer(self):
"""The flag only pins the USER peer — the assistant peer continues
to come from ``ai_peer`` and must not be touched."""
cfg = HonchoClientConfig(
api_key="k",
peer_name="Igor",
pin_peer_name=True,
ai_peer="hermes-assistant",
enabled=False,
write_frequency="turn",
)
mgr = HonchoSessionManager(
honcho=MagicMock(),
config=cfg,
runtime_user_peer_name="86701400",
)
_patch_manager_for_resolution_test(mgr)
session = mgr.get_or_create("telegram:86701400")
assert session.user_peer_id == "Igor"
assert session.assistant_peer_id == "hermes-assistant"
class TestCrossPlatformMemoryUnification:
"""The user-visible outcome of the #14984 fix: the same physical user
talking to Hermes via Telegram AND Discord should land on ONE peer
(not two) when pinPeerName is opted in.
"""
def _config_pinned(self) -> HonchoClientConfig:
return HonchoClientConfig(
api_key="k",
peer_name="Igor",
pin_peer_name=True,
enabled=False,
write_frequency="turn",
)
def test_telegram_and_discord_collapse_to_one_peer_when_pinned(self):
"""Single-user deployment: Telegram UID and Discord snowflake
both resolve to the same configured peer_name."""
# Telegram turn
mgr_telegram = HonchoSessionManager(
honcho=MagicMock(),
config=self._config_pinned(),
runtime_user_peer_name="86701400",
)
_patch_manager_for_resolution_test(mgr_telegram)
telegram_session = mgr_telegram.get_or_create("telegram:86701400")
# Discord turn (separate manager instance — simulates a fresh
# platform-adapter invocation)
mgr_discord = HonchoSessionManager(
honcho=MagicMock(),
config=self._config_pinned(),
runtime_user_peer_name="1348750102029926454",
)
_patch_manager_for_resolution_test(mgr_discord)
discord_session = mgr_discord.get_or_create("discord:1348750102029926454")
assert telegram_session.user_peer_id == "Igor"
assert discord_session.user_peer_id == "Igor"
assert telegram_session.user_peer_id == discord_session.user_peer_id, (
"cross-platform memory unification is the whole point of "
"pinPeerName — both platforms must land on the same Honcho peer"
)
def test_multiuser_default_keeps_platforms_separate(self):
"""Negative control: with pinPeerName=false (the default), two
different platform IDs must produce two different peers so
multi-user bots don't merge users."""
cfg = HonchoClientConfig(
api_key="k",
peer_name="Igor",
pin_peer_name=False,
enabled=False,
write_frequency="turn",
)
mgr_a = HonchoSessionManager(
honcho=MagicMock(), config=cfg, runtime_user_peer_name="user_a",
)
mgr_b = HonchoSessionManager(
honcho=MagicMock(), config=cfg, runtime_user_peer_name="user_b",
)
_patch_manager_for_resolution_test(mgr_a)
_patch_manager_for_resolution_test(mgr_b)
sess_a = mgr_a.get_or_create("telegram:a")
sess_b = mgr_b.get_or_create("telegram:b")
assert sess_a.user_peer_id == "user_a"
assert sess_b.user_peer_id == "user_b"
assert sess_a.user_peer_id != sess_b.user_peer_id, (
"multi-user default MUST keep users separate — a regression "
"here would silently merge unrelated users' memory"
)
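The resolution matrix exercised by the tests above can be summarized in a small standalone sketch. This is not the `HonchoSessionManager` implementation: `resolve_user_peer_id` is a hypothetical helper, and the session-key fallback format (`user-` plus the key with `:` replaced by `-`) is an assumption inferred from the single expectation `"telegram:123"` → `"user-telegram-123"`.

```python
from __future__ import annotations


def resolve_user_peer_id(
    runtime_id: str | None,
    peer_name: str | None,
    pin_peer_name: bool,
    session_key: str,
) -> str:
    """Hypothetical sketch of the user-peer resolution order under test."""
    # 1. Pinning wins only when there is a configured peer_name to pin to.
    if pin_peer_name and peer_name:
        return peer_name
    # 2. Otherwise the platform-native runtime ID wins (multi-user default).
    if runtime_id:
        return runtime_id
    # 3. CLI mode: no runtime identity, so use the configured peer_name.
    if peer_name:
        return peer_name
    # 4. Deepest fallback (assumed format): derive an ID from the session key.
    return "user-" + session_key.replace(":", "-")
```

Keeping the pin check first, and guarding it on `peer_name`, is what makes `pinPeerName` a no-op when nothing is configured to pin to.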
+33
@@ -525,6 +525,39 @@ class TestConcludeToolDispatch:
assert parsed == {"error": "Exactly one of conclusion or delete_id must be provided."}
provider._manager.delete_conclusion.assert_not_called()
def test_sync_turn_strips_leaked_memory_context_before_honcho_ingest(self):
provider = HonchoMemoryProvider()
provider._session_key = "telegram:123"
provider._manager = MagicMock()
provider._cron_skipped = False
provider._config = SimpleNamespace(message_max_chars=25000)
session = MagicMock()
provider._manager.get_or_create.return_value = session
provider.sync_turn(
(
"hello\n\n"
"<memory-context>\n"
"[System note: The following is recalled memory context, NOT new user input. Treat as informational background data.]\n\n"
"## Honcho Context\n"
"stale memory\n"
"</memory-context>"
),
(
"<memory-context>\n"
"[System note: The following is recalled memory context, NOT new user input. Treat as informational background data.]\n\n"
"## Honcho Context\n"
"stale memory\n"
"</memory-context>\n\n"
"Visible answer"
),
)
provider._sync_thread.join(timeout=1.0)
assert session.add_message.call_args_list[0].args == ("user", "hello")
assert session.add_message.call_args_list[1].args == ("assistant", "Visible answer")
# ---------------------------------------------------------------------------
# Message chunking
+28 -10
@@ -1441,6 +1441,24 @@ class TestBuildAssistantMessage:
result = agent._build_assistant_message(msg, "stop")
assert result["content"] == "No thinking here."
def test_memory_context_in_stored_content_is_preserved(self, agent):
"""`_build_assistant_message` must not silently mutate model output
containing literal <memory-context> markers; that's legitimate text
(e.g. documentation, code) that the model may emit. Streaming-path
leak prevention is handled by StreamingContextScrubber upstream."""
original = (
"<memory-context>\n"
"[System note: The following is recalled memory context, NOT new user input. Treat as informational background data.]\n\n"
"## Honcho Context\n"
"stale memory\n"
"</memory-context>\n\n"
"Visible answer"
)
msg = _mock_assistant_msg(content=original)
result = agent._build_assistant_message(msg, "stop")
assert "<memory-context>" in result["content"]
assert "Visible answer" in result["content"]
def test_unterminated_think_block_stripped(self, agent):
"""Unterminated <think> block (MiniMax / NIM dropped close tag) is
fully stripped from stored content."""
@@ -4753,21 +4771,21 @@ class TestDeadRetryCode:
class TestMemoryContextSanitization:
"""run_conversation() must strip leaked <memory-context> blocks from user input."""
"""sanitize_context() helper correctness — used at provider boundaries."""
def test_memory_context_stripped_from_user_message(self):
"""Verify that <memory-context> blocks are removed before the message
enters the conversation loop; this prevents stale Honcho injection from
leaking into user text."""
def test_user_message_is_not_mutated_by_run_conversation(self):
"""User input must reach run_conversation untouched — if a user types
a literal <memory-context> tag we don't silently delete their text.
The streaming scrubber + plugin-side scrub cover real leak paths."""
import inspect
src = inspect.getsource(AIAgent.run_conversation)
# The sanitize_context call must appear in run_conversation's preamble
assert "sanitize_context(user_message)" in src
assert "sanitize_context(persist_user_message)" in src
assert "sanitize_context(user_message)" not in src
assert "sanitize_context(persist_user_message)" not in src
def test_sanitize_context_strips_full_block(self):
"""End-to-end: a user message with an embedded memory-context block
is cleaned to just the actual user text."""
"""Helper-level: a string with an embedded memory-context block is
cleaned to just the surrounding text. Used by build_memory_context_block
(input-validation) and by plugins on their own backend boundary."""
from agent.memory_manager import sanitize_context
user_text = "how is the honcho working"
injected = (
@@ -1115,6 +1115,141 @@ def test_interim_commentary_is_not_marked_already_streamed_when_stream_callback_
}
def test_interim_commentary_preserves_assistant_content(monkeypatch):
"""Interim commentary must not silently mutate assistant text containing
literal <memory-context> markers; that's legitimate model output (docs,
code). Streaming-path leak prevention happens delta-by-delta upstream."""
agent = _build_agent(monkeypatch)
observed = {}
agent.interim_assistant_callback = lambda text, *, already_streamed=False: observed.update(
{"text": text, "already_streamed": already_streamed}
)
content = (
"<memory-context>\n"
"[System note: The following is recalled memory context, NOT new user input. Treat as informational background data.]\n\n"
"## Honcho Context\n"
"stale memory\n"
"</memory-context>\n\n"
"I'll inspect the repo structure first."
)
agent._emit_interim_assistant_message({"role": "assistant", "content": content})
assert "<memory-context>" in observed["text"]
assert "I'll inspect the repo structure first." in observed["text"]
def test_stream_delta_strips_leaked_memory_context(monkeypatch):
agent = _build_agent(monkeypatch)
observed = []
agent.stream_delta_callback = observed.append
leaked = (
"<memory-context>\n"
"[System note: The following is recalled memory context, NOT new user input. Treat as informational background data.]\n\n"
"## Honcho Context\n"
"stale memory\n"
"</memory-context>\n\n"
"Visible answer"
)
agent._fire_stream_delta(leaked)
assert observed == ["Visible answer"]
def test_stream_delta_strips_leaked_memory_context_across_chunks(monkeypatch):
"""Regression for #5719 — the real streaming case.
Providers typically emit 1-80 char chunks, so the memory-context open
tag, system-note line, payload, and close tag each arrive in separate
deltas. The per-delta sanitize_context() regex cannot survive that;
only a stateful scrubber can. None of the payload, system-note
text, or "## Honcho Context" header may reach the delta callback.
"""
agent = _build_agent(monkeypatch)
observed = []
agent.stream_delta_callback = observed.append
deltas = [
"<memory-context>\n[System note: The following",
" is recalled memory context, NOT new user input. ",
"Treat as informational background data.]\n\n",
"## Honcho Context\n",
"stale memory about eri\n",
"</memory-context>\n\n",
"Visible answer",
]
for d in deltas:
agent._fire_stream_delta(d)
combined = "".join(observed)
assert "Visible answer" in combined
# None of the leaked payload may surface.
assert "System note" not in combined
assert "Honcho Context" not in combined
assert "stale memory" not in combined
assert "<memory-context>" not in combined
assert "</memory-context>" not in combined
def test_stream_delta_scrubber_resets_between_turns(monkeypatch):
"""An unterminated span from a prior turn must not taint the next turn."""
agent = _build_agent(monkeypatch)
# Simulate a hung span carried over — directly populate the scrubber.
agent._stream_context_scrubber.feed("pre <memory-context>leaked")
# Normally run_conversation() resets the scrubber at turn start.
agent._stream_context_scrubber.reset()
observed = []
agent.stream_delta_callback = observed.append
agent._fire_stream_delta("clean new turn text")
assert "".join(observed) == "clean new turn text"
def test_stream_delta_preserves_mid_stream_leading_newlines(monkeypatch):
"""Mid-stream leading newlines must survive — they are legitimate
markdown (lists, code fences, paragraph breaks). Stripping them
based on chunk boundaries silently breaks formatting.
Only the very first delta of a stream gets leading-newlines stripped
(so stale provider preamble doesn't leak); after that, deltas are
emitted verbatim.
"""
agent = _build_agent(monkeypatch)
observed = []
agent.stream_delta_callback = observed.append
# First delta delivers text — strips its own leading "\n" once.
agent._fire_stream_delta("\nHere is a list:")
# Second delta starts with "\n- item" — must NOT be stripped.
agent._fire_stream_delta("\n- first")
agent._fire_stream_delta("\n- second")
combined = "".join(observed)
assert combined == "Here is a list:\n- first\n- second"
def test_stream_delta_preserves_code_fence_newlines(monkeypatch):
"""Code blocks span multiple deltas. A "\\n```python\\n" boundary
is the canonical case where stripping leading newlines corrupts output."""
agent = _build_agent(monkeypatch)
observed = []
agent.stream_delta_callback = observed.append
agent._fire_stream_delta("Here is the code:")
agent._fire_stream_delta("\n```python\n")
agent._fire_stream_delta("print('hi')\n")
agent._fire_stream_delta("```\n")
combined = "".join(observed)
assert "```python\n" in combined
assert combined.startswith("Here is the code:\n```python\n")
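The cross-chunk regression above hinges on the scrubber keeping state between deltas. The following is a minimal sketch of that idea, not the project's `StreamingContextScrubber`: the class name `SketchScrubber`, the `flush()` method, and the buffering strategy are all assumptions made for illustration. It also does not model the first-delta newline stripping tested above.

```python
OPEN_TAG = "<memory-context>"
CLOSE_TAG = "</memory-context>"


def _partial_suffix(text: str, tag: str) -> int:
    """Length of the longest proper prefix of `tag` that `text` ends with."""
    for k in range(min(len(text), len(tag) - 1), 0, -1):
        if text.endswith(tag[:k]):
            return k
    return 0


class SketchScrubber:
    """Hypothetical stateful scrubber: drops <memory-context> spans even
    when the tags are split across streamed deltas."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self._buf = ""        # text held back because it may be a partial tag
        self._inside = False  # True while inside a memory-context span

    def feed(self, delta: str) -> str:
        self._buf += delta
        out = []
        while True:
            tag = CLOSE_TAG if self._inside else OPEN_TAG
            idx = self._buf.find(tag)
            if idx == -1:
                # No full tag yet: hold back only a possible partial tag.
                keep = _partial_suffix(self._buf, tag)
                cut = len(self._buf) - keep
                if not self._inside:
                    out.append(self._buf[:cut])
                self._buf = self._buf[cut:]
                break
            if not self._inside:
                out.append(self._buf[:idx])
            self._buf = self._buf[idx + len(tag):]
            self._inside = not self._inside
        return "".join(out)

    def flush(self) -> str:
        """Emit any held-back text at end of stream; drop unterminated spans."""
        tail = "" if self._inside else self._buf
        self.reset()
        return tail
```

Holding back only the longest possible tag prefix keeps latency low: ordinary text is emitted immediately, and at most `len(tag) - 1` characters are ever delayed.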
def test_run_conversation_codex_continues_after_commentary_phase_message(monkeypatch):
agent = _build_agent(monkeypatch)
responses = [
+203 -2
@@ -258,6 +258,24 @@ class TestMessageStorage:
messages = db.get_messages("s1")
assert messages[0]["finish_reason"] == "stop"
def test_get_messages_as_conversation_strips_leaked_memory_context(self, db):
db.create_session(session_id="s1", source="cli")
db.append_message(
"s1",
role="assistant",
content=(
"<memory-context>\n"
"[System note: The following is recalled memory context, NOT new user input. Treat as informational background data.]\n\n"
"## Honcho Context\n"
"stale memory\n"
"</memory-context>\n\n"
"Visible answer"
),
)
conv = db.get_messages_as_conversation("s1")
assert conv == [{"role": "assistant", "content": "Visible answer"}]
def test_reasoning_persisted_and_restored(self, db):
"""Reasoning text is stored for assistant messages and restored by
get_messages_as_conversation() so providers receive coherent multi-turn
@@ -772,6 +790,51 @@ class TestCJKSearchFallback:
results = db.search_messages("Agent通信")
assert len(results) == 1
def test_cjk_partial_fts5_results_supplemented_by_like(self, db):
"""When FTS5 returns *some* CJK results, LIKE must still find all matches.
Regression test for #15500 / #14829: FTS5 unicode61 tokenizer drops
certain CJK characters, so multi-character queries may return partial
results. The LIKE path must always run for CJK queries.
"""
db.create_session(session_id="s1", source="cli")
db.create_session(session_id="s2", source="telegram")
db.append_message("s1", role="user", content="昨晚讨论了记忆系统")
db.append_message("s2", role="user", content="昨晚的会议纪要已发送")
results = db.search_messages("昨晚")
assert len(results) == 2
session_ids = {r["session_id"] for r in results}
assert session_ids == {"s1", "s2"}
def test_cjk_like_dedup_no_duplicates(self, db):
"""When FTS5 and LIKE both find the same message, no duplicates."""
db.create_session(session_id="s1", source="cli")
db.append_message("s1", role="user", content="测试去重逻辑")
results = db.search_messages("测试")
assert len(results) == 1
def test_cjk_like_escapes_wildcards(self, db):
"""Special characters (%, _) in CJK queries are treated as literals."""
db.create_session(session_id="s1", source="cli")
db.create_session(session_id="s2", source="cli")
db.append_message("s1", role="user", content="达成100%完成率")
db.append_message("s2", role="user", content="达成100完成率是目标")
# The % in the query must be literal — should only match s1
results = db.search_messages("100%完成")
assert len(results) == 1
assert results[0]["session_id"] == "s1"
def test_cjk_trigram_preserves_boolean_operators(self, db):
"""Boolean operators (OR, AND, NOT) work in CJK trigram queries."""
db.create_session(session_id="s1", source="cli")
db.create_session(session_id="s2", source="cli")
db.append_message("s1", role="user", content="记忆系统很好用")
db.append_message("s2", role="user", content="断裂连接需要修复")
results = db.search_messages("记忆系统 OR 断裂连接")
assert len(results) == 2
session_ids = {r["session_id"] for r in results}
assert session_ids == {"s1", "s2"}
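The wildcard-escaping behaviour checked in `test_cjk_like_escapes_wildcards` can be illustrated with plain `sqlite3`. This is a sketch under assumptions, not the `SessionDB.search_messages` code: the helper names `escape_like` and `like_search` and the one-table schema are hypothetical; only the SQLite `LIKE ... ESCAPE` clause itself is standard.

```python
import sqlite3


def escape_like(term: str) -> str:
    """Escape LIKE metacharacters so they match literally.

    The backslash is escaped first so that the escapes added for
    '%' and '_' are not themselves doubled.
    """
    return (
        term.replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
    )


def like_search(conn: sqlite3.Connection, query: str) -> list:
    """Substring search that treats the whole query as literal text."""
    pattern = f"%{escape_like(query)}%"
    rows = conn.execute(
        # ESCAPE '\' tells SQLite which character introduces a literal.
        "SELECT session_id FROM messages WHERE content LIKE ? ESCAPE '\\'",
        (pattern,),
    )
    return [row[0] for row in rows]
```

Without the escaping step, the `%` in `100%完成` would act as a wildcard and also match text such as `100完成率`.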
# =========================================================================
# Session search and listing
@@ -1229,7 +1292,7 @@ class TestSchemaInit:
def test_schema_version(self, db):
cursor = db._conn.execute("SELECT version FROM schema_version")
version = cursor.fetchone()[0]
assert version == 9
assert version == 10
def test_title_column_exists(self, db):
"""Verify the title column was created in the sessions table."""
@@ -1290,7 +1353,7 @@ class TestSchemaInit:
# Verify migration
cursor = migrated_db._conn.execute("SELECT version FROM schema_version")
assert cursor.fetchone()[0] == 9
assert cursor.fetchone()[0] == 10
# Verify title column exists and is NULL for existing sessions
session = migrated_db.get_session("existing")
@@ -1310,6 +1373,144 @@ class TestSchemaInit:
migrated_db.close()
def test_reconciliation_adds_missing_columns(self, tmp_path):
"""Columns present in SCHEMA_SQL but missing from the live table
are added by _reconcile_columns regardless of schema_version.
Regression test: commit a7d78d3b inserted a new v7 migration
(reasoning_content) and renumbered the old v7 (api_call_count)
to v8. Users already at the old v7 had schema_version >= 7,
so the new v7 block was skipped and reasoning_content was never
created, causing 'no such column' on /continue.
"""
import sqlite3
db_path = tmp_path / "gap_test.db"
conn = sqlite3.connect(str(db_path))
# Simulate the old v7 state: api_call_count exists, reasoning_content does NOT
conn.executescript("""
CREATE TABLE schema_version (version INTEGER NOT NULL);
INSERT INTO schema_version (version) VALUES (7);
CREATE TABLE sessions (
id TEXT PRIMARY KEY,
source TEXT NOT NULL,
user_id TEXT,
model TEXT,
model_config TEXT,
system_prompt TEXT,
parent_session_id TEXT,
started_at REAL NOT NULL,
ended_at REAL,
end_reason TEXT,
message_count INTEGER DEFAULT 0,
tool_call_count INTEGER DEFAULT 0,
input_tokens INTEGER DEFAULT 0,
output_tokens INTEGER DEFAULT 0,
cache_read_tokens INTEGER DEFAULT 0,
cache_write_tokens INTEGER DEFAULT 0,
reasoning_tokens INTEGER DEFAULT 0,
billing_provider TEXT,
billing_base_url TEXT,
billing_mode TEXT,
estimated_cost_usd REAL,
actual_cost_usd REAL,
cost_status TEXT,
cost_source TEXT,
pricing_version TEXT,
title TEXT,
api_call_count INTEGER DEFAULT 0
);
CREATE TABLE messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL,
role TEXT NOT NULL,
content TEXT,
tool_call_id TEXT,
tool_calls TEXT,
tool_name TEXT,
timestamp REAL NOT NULL,
token_count INTEGER,
finish_reason TEXT,
reasoning TEXT,
reasoning_details TEXT,
codex_reasoning_items TEXT
);
""")
conn.execute(
"INSERT INTO sessions (id, source, started_at) VALUES (?, ?, ?)",
("s1", "cli", 1000.0),
)
conn.execute(
"INSERT INTO messages (session_id, role, content, timestamp) "
"VALUES (?, ?, ?, ?)",
("s1", "assistant", "hello", 1001.0),
)
conn.commit()
# Verify reasoning_content is absent
cols = {r[1] for r in conn.execute("PRAGMA table_info(messages)").fetchall()}
assert "reasoning_content" not in cols
conn.close()
# Open with SessionDB — reconciliation should add the missing column
migrated_db = SessionDB(db_path=db_path)
msg_cols = {
r[1]
for r in migrated_db._conn.execute("PRAGMA table_info(messages)").fetchall()
}
assert "reasoning_content" in msg_cols
# The query that used to crash must now work
cursor = migrated_db._conn.execute(
"SELECT role, content, reasoning, reasoning_content, "
"reasoning_details, codex_reasoning_items "
"FROM messages WHERE session_id = ?",
("s1",),
)
row = cursor.fetchone()
assert row is not None
assert row[0] == "assistant"
assert row[3] is None # reasoning_content NULL for old rows
migrated_db.close()
def test_reconciliation_is_idempotent(self, tmp_path):
"""Opening the same database twice doesn't error or duplicate columns."""
db_path = tmp_path / "idempotent.db"
db1 = SessionDB(db_path=db_path)
cols1 = {r[1] for r in db1._conn.execute("PRAGMA table_info(messages)").fetchall()}
db1.close()
db2 = SessionDB(db_path=db_path)
cols2 = {r[1] for r in db2._conn.execute("PRAGMA table_info(messages)").fetchall()}
db2.close()
assert cols1 == cols2
def test_schema_sql_is_source_of_truth(self, db):
"""Every column in SCHEMA_SQL exists in the live database.
This is the architectural invariant: SCHEMA_SQL declares the
desired schema, _reconcile_columns ensures it matches reality.
"""
from hermes_state import SCHEMA_SQL
expected = SessionDB._parse_schema_columns(SCHEMA_SQL)
for table_name, declared_cols in expected.items():
live_cols = {
r[1]
for r in db._conn.execute(
f'PRAGMA table_info("{table_name}")'
).fetchall()
}
for col_name in declared_cols:
assert col_name in live_cols, (
f"Column {col_name} declared in SCHEMA_SQL for {table_name} "
f"but missing from live DB. Live columns: {live_cols}"
)
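The invariant stated above (the declared schema is the source of truth, and missing columns are added regardless of `schema_version`) can be sketched with plain `sqlite3`. This is an illustration only: `reconcile_columns` here is a hypothetical free function taking an already-parsed column mapping, and it makes no claim about how `SessionDB._reconcile_columns` or `_parse_schema_columns` are actually written.

```python
import sqlite3


def reconcile_columns(conn: sqlite3.Connection, table: str, declared: dict) -> list:
    """Add every declared column that the live table lacks.

    `declared` maps column name -> SQL type/constraint text, for example
    {"reasoning_content": "TEXT"}. Returns the names that were added.
    """
    # Row index 1 of PRAGMA table_info is the column name.
    live = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    added = []
    for name, decl in declared.items():
        if name not in live:
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{name}" {decl}')
            added.append(name)
    return added
```

Because the check compares against the live table rather than a version number, running it twice is harmless: the second call finds nothing to add. Existing rows read back `NULL` for a newly added column, matching the expectation in the regression test.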
class TestTitleUniqueness:
"""Tests for unique title enforcement and title-based lookups."""
+246 -1
@@ -274,6 +274,69 @@ def _session(agent=None, **extra):
}
def test_session_close_commits_memory_and_fires_finalize_hook(monkeypatch):
calls = {"hooks": []}
agent = types.SimpleNamespace(session_id="session-key")
agent.commit_memory_session = lambda history: calls.setdefault("history", history)
server._sessions["sid"] = _session(
agent=agent, history=[{"role": "user", "content": "hello"}]
)
monkeypatch.setattr(
server,
"_notify_session_boundary",
lambda event, session_id: calls["hooks"].append((event, session_id)),
)
try:
resp = server.handle_request(
{"id": "1", "method": "session.close", "params": {"session_id": "sid"}}
)
assert resp["result"]["closed"] is True
assert calls["history"] == [{"role": "user", "content": "hello"}]
assert ("on_session_finalize", "session-key") in calls["hooks"]
finally:
server._sessions.pop("sid", None)
def test_init_session_fires_reset_hook(monkeypatch):
hooks = []
class _FakeWorker:
def __init__(self, key, model):
self.key = key
def close(self):
return None
monkeypatch.setattr(server, "_SlashWorker", _FakeWorker)
monkeypatch.setattr(server, "_wire_callbacks", lambda _sid: None)
monkeypatch.setattr(server, "_emit", lambda *args, **kwargs: None)
monkeypatch.setattr(
server,
"_notify_session_boundary",
lambda event, session_id: hooks.append((event, session_id)),
)
import tools.approval as _approval
monkeypatch.setattr(_approval, "register_gateway_notify", lambda key, cb: None)
monkeypatch.setattr(_approval, "load_permanent_allowlist", lambda: None)
sid = "sid"
try:
server._init_session(
sid,
"session-key",
types.SimpleNamespace(model="x"),
history=[],
cols=80,
)
assert ("on_session_reset", "session-key") in hooks
finally:
server._sessions.pop(sid, None)
def test_session_title_queues_when_db_row_not_ready(monkeypatch):
class _FakeDB:
def get_session_title(self, _key):
@@ -564,7 +627,9 @@ def test_session_create_drops_pending_title_on_valueerror(monkeypatch):
monkeypatch.setattr(_approval, "register_gateway_notify", lambda key, cb: None)
monkeypatch.setattr(_approval, "load_permanent_allowlist", lambda: None)
resp = server.handle_request({"id": "1", "method": "session.create", "params": {"cols": 80}})
resp = server.handle_request(
{"id": "1", "method": "session.create", "params": {"cols": 80}}
)
sid = resp["result"]["session_id"]
session = server._sessions[sid]
session["pending_title"] = "duplicate title"
@@ -604,6 +669,176 @@ def test_config_set_yolo_toggles_session_scope():
server._sessions.clear()
def test_config_set_fast_updates_live_agent_and_config(monkeypatch):
writes = []
emits = []
agent = types.SimpleNamespace(
model="openai/gpt-5.4",
request_overrides={"foo": "bar", "speed": "slow"},
service_tier=None,
)
server._sessions["sid"] = _session(agent=agent)
monkeypatch.setattr(
server, "_write_config_key", lambda path, value: writes.append((path, value))
)
monkeypatch.setattr(server, "_session_info", lambda _agent: {"model": "x"})
monkeypatch.setattr(server, "_emit", lambda *args: emits.append(args))
monkeypatch.setattr(
"hermes_cli.models.resolve_fast_mode_overrides",
lambda _model_id: {"service_tier": "priority"},
)
try:
resp = server.handle_request(
{
"id": "1",
"method": "config.set",
"params": {"session_id": "sid", "key": "fast", "value": "fast"},
}
)
assert resp["result"]["value"] == "fast"
assert agent.service_tier == "priority"
assert agent.request_overrides == {
"foo": "bar",
"service_tier": "priority",
}
assert ("agent.service_tier", "fast") in writes
assert ("session.info", "sid", {"model": "x"}) in emits
resp_normal = server.handle_request(
{
"id": "2",
"method": "config.set",
"params": {"session_id": "sid", "key": "fast", "value": "normal"},
}
)
assert resp_normal["result"]["value"] == "normal"
assert agent.service_tier is None
assert agent.request_overrides == {"foo": "bar"}
assert ("agent.service_tier", "normal") in writes
finally:
server._sessions.pop("sid", None)
def test_config_set_fast_status_is_non_mutating(monkeypatch):
writes = []
emits = []
agent = types.SimpleNamespace(service_tier="priority")
server._sessions["sid"] = _session(agent=agent)
monkeypatch.setattr(
server, "_write_config_key", lambda path, value: writes.append((path, value))
)
monkeypatch.setattr(server, "_emit", lambda *args: emits.append(args))
try:
resp = server.handle_request(
{
"id": "1",
"method": "config.set",
"params": {"session_id": "sid", "key": "fast", "value": "status"},
}
)
assert resp["result"]["value"] == "fast"
assert writes == []
assert emits == []
finally:
server._sessions.pop("sid", None)
def test_config_set_fast_rejects_unsupported_model(monkeypatch):
writes = []
agent = types.SimpleNamespace(
model="unsupported-model",
request_overrides={},
service_tier=None,
)
server._sessions["sid"] = _session(agent=agent)
monkeypatch.setattr(
server, "_write_config_key", lambda path, value: writes.append((path, value))
)
monkeypatch.setattr(
"hermes_cli.models.resolve_fast_mode_overrides",
lambda _model_id: None,
)
try:
resp = server.handle_request(
{
"id": "1",
"method": "config.set",
"params": {"session_id": "sid", "key": "fast", "value": "fast"},
}
)
assert resp["error"]["code"] == 4002
assert "not available" in resp["error"]["message"]
assert agent.service_tier is None
assert agent.request_overrides == {}
assert writes == []
finally:
server._sessions.pop("sid", None)
def test_config_set_fast_rejects_missing_model(monkeypatch):
writes = []
agent = types.SimpleNamespace(
model="",
request_overrides={},
service_tier=None,
)
server._sessions["sid"] = _session(agent=agent)
monkeypatch.setattr(
server, "_write_config_key", lambda path, value: writes.append((path, value))
)
try:
resp = server.handle_request(
{
"id": "1",
"method": "config.set",
"params": {"session_id": "sid", "key": "fast", "value": "fast"},
}
)
assert resp["error"]["code"] == 4002
assert "without a selected model" in resp["error"]["message"]
assert agent.service_tier is None
assert agent.request_overrides == {}
assert writes == []
finally:
server._sessions.pop("sid", None)
def test_config_busy_get_and_set(monkeypatch):
writes = []
monkeypatch.setattr(
server,
"_load_cfg",
lambda: {"display": {"busy_input_mode": "steer"}},
)
monkeypatch.setattr(
server, "_write_config_key", lambda path, value: writes.append((path, value))
)
get_resp = server.handle_request(
{"id": "1", "method": "config.get", "params": {"key": "busy"}}
)
assert get_resp["result"]["value"] == "steer"
set_resp = server.handle_request(
{
"id": "2",
"method": "config.set",
"params": {"key": "busy", "value": "interrupt"},
}
)
assert set_resp["result"]["value"] == "interrupt"
assert ("display.busy_input_mode", "interrupt") in writes
def test_config_get_statusbar_survives_non_dict_display(monkeypatch):
monkeypatch.setattr(server, "_load_cfg", lambda: {"display": "broken"})
@@ -614,6 +849,16 @@ def test_config_get_statusbar_survives_non_dict_display(monkeypatch):
assert resp["result"]["value"] == "top"
def test_config_get_busy_survives_non_dict_display(monkeypatch):
monkeypatch.setattr(server, "_load_cfg", lambda: {"display": "broken"})
resp = server.handle_request(
{"id": "1", "method": "config.get", "params": {"key": "busy"}}
)
assert resp["result"]["value"] == "interrupt"
def test_config_set_statusbar_survives_non_dict_display(tmp_path, monkeypatch):
import yaml
+248
@@ -0,0 +1,248 @@
"""Tests for pre_approval_request / post_approval_response plugin hooks.
These hooks fire in tools/approval.py::check_all_command_guards whenever a
dangerous command needs user approval. They are observer-only (return values
ignored) and must fire on BOTH the CLI-interactive path and the async gateway
path, so external tools like macOS notifiers can be alerted regardless of
which surface the user is on.
"""
from unittest.mock import patch
import pytest
import tools.approval as approval_module
from tools.approval import (
check_all_command_guards,
register_gateway_notify,
unregister_gateway_notify,
resolve_gateway_approval,
set_current_session_key,
clear_session,
)
@pytest.fixture
def isolated_session(monkeypatch):
"""Give each test a fresh session_key and clean approval-state."""
session_key = "test:session:approval_hooks"
token = set_current_session_key(session_key)
monkeypatch.setenv("HERMES_SESSION_KEY", session_key)
# Make sure we don't skip guards via yolo / approvals.mode=off
monkeypatch.delenv("HERMES_YOLO_MODE", raising=False)
try:
yield session_key
finally:
try:
approval_module._approval_session_key.reset(token)
except Exception:
pass
clear_session(session_key)


class TestCliPathFiresHooks:
    """CLI-interactive approval path: HERMES_INTERACTIVE is set, the
    prompt_dangerous_approval() result decides the outcome."""

    def test_pre_and_post_fire_with_expected_kwargs(
        self, isolated_session, monkeypatch
    ):
        monkeypatch.setenv("HERMES_INTERACTIVE", "1")
        monkeypatch.delenv("HERMES_GATEWAY_SESSION", raising=False)
        monkeypatch.delenv("HERMES_EXEC_ASK", raising=False)
        # approvals.mode=manual so we actually reach the prompt site
        monkeypatch.setattr(approval_module, "_get_approval_mode", lambda: "manual")

        captured = []

        def fake_invoke_hook(hook_name, **kwargs):
            captured.append((hook_name, kwargs))
            return []

        # Force the user to "approve once" via the approval_callback contract
        def cb(command, description, *, allow_permanent=True):
            return "once"

        with patch("hermes_cli.plugins.invoke_hook", side_effect=fake_invoke_hook):
            result = check_all_command_guards(
                "rm -rf /tmp/test-hook", "local", approval_callback=cb,
            )

        assert result["approved"] is True

        hook_names = [c[0] for c in captured]
        assert "pre_approval_request" in hook_names
        assert "post_approval_response" in hook_names

        pre_kwargs = next(kw for name, kw in captured if name == "pre_approval_request")
        assert pre_kwargs["command"] == "rm -rf /tmp/test-hook"
        assert pre_kwargs["surface"] == "cli"
        assert pre_kwargs["session_key"] == isolated_session
        assert isinstance(pre_kwargs["pattern_keys"], list)
        assert pre_kwargs["pattern_key"]  # non-empty primary pattern
        assert pre_kwargs["description"]

        post_kwargs = next(kw for name, kw in captured if name == "post_approval_response")
        assert post_kwargs["choice"] == "once"
        assert post_kwargs["surface"] == "cli"
        assert post_kwargs["command"] == "rm -rf /tmp/test-hook"

    def test_deny_reported_to_post_hook(self, isolated_session, monkeypatch):
        monkeypatch.setenv("HERMES_INTERACTIVE", "1")
        monkeypatch.delenv("HERMES_GATEWAY_SESSION", raising=False)
        monkeypatch.delenv("HERMES_EXEC_ASK", raising=False)
        monkeypatch.setattr(approval_module, "_get_approval_mode", lambda: "manual")

        captured = []

        def fake_invoke_hook(hook_name, **kwargs):
            captured.append((hook_name, kwargs))
            return []

        def cb(command, description, *, allow_permanent=True):
            return "deny"

        with patch("hermes_cli.plugins.invoke_hook", side_effect=fake_invoke_hook):
            result = check_all_command_guards(
                "rm -rf /tmp/test-deny", "local", approval_callback=cb,
            )

        assert result["approved"] is False

        post_kwargs = next(kw for name, kw in captured if name == "post_approval_response")
        assert post_kwargs["choice"] == "deny"

    def test_plugin_hook_crash_does_not_break_approval(
        self, isolated_session, monkeypatch
    ):
        """A crashing plugin must never prevent the approval flow from
        reaching the user. Hooks are observer-only and safety-critical
        behavior must be preserved."""
        monkeypatch.setenv("HERMES_INTERACTIVE", "1")
        monkeypatch.delenv("HERMES_GATEWAY_SESSION", raising=False)
        monkeypatch.delenv("HERMES_EXEC_ASK", raising=False)
        monkeypatch.setattr(approval_module, "_get_approval_mode", lambda: "manual")

        def boom(hook_name, **kwargs):
            raise RuntimeError("plugin crashed")

        def cb(command, description, *, allow_permanent=True):
            return "once"

        with patch("hermes_cli.plugins.invoke_hook", side_effect=boom):
            result = check_all_command_guards(
                "rm -rf /tmp/test-crash", "local", approval_callback=cb,
            )

        # User's approval was still honored despite the plugin crashing
        assert result["approved"] is True
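
The crash-tolerance contract exercised by test_plugin_hook_crash_does_not_break_approval can be sketched in isolation. This is a minimal illustration, not the code in tools/approval.py: the names fire_hook_safely and request_approval are hypothetical, and treating every choice other than "deny" or "timeout" as an approval is an assumption made for the example.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def fire_hook_safely(invoke_hook: Callable[..., Any], hook_name: str, **kwargs: Any) -> list:
    """Call an observer hook and swallow anything it raises (hypothetical helper)."""
    try:
        return list(invoke_hook(hook_name, **kwargs) or [])
    except Exception:
        # Observers must never influence the approval outcome, so log and move on.
        logger.debug("hook %s raised; ignoring", hook_name, exc_info=True)
        return []


def request_approval(command: str, prompt: Callable[[str], str],
                     invoke_hook: Callable[..., Any]) -> dict:
    """Ask the user via `prompt`, notifying hooks before and after (hypothetical)."""
    fire_hook_safely(invoke_hook, "pre_approval_request", command=command, surface="cli")
    choice = prompt(command)
    fire_hook_safely(
        invoke_hook, "post_approval_response",
        command=command, surface="cli", choice=choice,
    )
    # Assumption for this sketch: anything except "deny"/"timeout" counts as approved.
    return {"approved": choice not in ("deny", "timeout"), "choice": choice}
```

Wrapping each hook call separately means a failure in the pre hook cannot suppress the post hook, and neither can change the value the user chose.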


class TestGatewayPathFiresHooks:
    """Async gateway approval path: HERMES_GATEWAY_SESSION is set and a
    gateway notify callback is registered. The agent thread blocks on the
    approval event until resolve_gateway_approval() is called from another
    thread."""

    def test_pre_and_post_fire_on_gateway_surface(
        self, isolated_session, monkeypatch
    ):
        import threading

        monkeypatch.delenv("HERMES_INTERACTIVE", raising=False)
        monkeypatch.setenv("HERMES_GATEWAY_SESSION", "1")
        monkeypatch.delenv("HERMES_EXEC_ASK", raising=False)
        monkeypatch.setattr(approval_module, "_get_approval_mode", lambda: "manual")
        # Short gateway_timeout so a buggy test fails fast instead of hanging
        monkeypatch.setattr(
            approval_module, "_get_approval_config", lambda: {"gateway_timeout": 10}
        )

        captured = []

        def fake_invoke_hook(hook_name, **kwargs):
            captured.append((hook_name, kwargs))
            return []

        notify_seen = threading.Event()

        def notify_cb(approval_data):
            notify_seen.set()

        register_gateway_notify(isolated_session, notify_cb)

        result_holder = {}

        def run_guard():
            with patch("hermes_cli.plugins.invoke_hook", side_effect=fake_invoke_hook):
                result_holder["result"] = check_all_command_guards(
                    "rm -rf /tmp/test-gateway-hook", "local",
                )

        t = threading.Thread(target=run_guard, daemon=True)
        t.start()

        # Wait for the gateway callback to see the approval request
        assert notify_seen.wait(timeout=5), "Gateway notify never fired"

        # User approves from the "other thread" (simulating /approve command)
        resolve_gateway_approval(isolated_session, "once")

        t.join(timeout=5)
        assert not t.is_alive(), "Agent thread never unblocked"
        unregister_gateway_notify(isolated_session)

        assert result_holder["result"]["approved"] is True

        hook_names = [c[0] for c in captured]
        assert "pre_approval_request" in hook_names
        assert "post_approval_response" in hook_names

        pre_kwargs = next(kw for name, kw in captured if name == "pre_approval_request")
        assert pre_kwargs["surface"] == "gateway"
        assert pre_kwargs["command"] == "rm -rf /tmp/test-gateway-hook"

        post_kwargs = next(kw for name, kw in captured if name == "post_approval_response")
        assert post_kwargs["surface"] == "gateway"
        assert post_kwargs["choice"] == "once"

    def test_timeout_reports_timeout_choice(self, isolated_session, monkeypatch):
        import threading

        monkeypatch.delenv("HERMES_INTERACTIVE", raising=False)
        monkeypatch.setenv("HERMES_GATEWAY_SESSION", "1")
        monkeypatch.delenv("HERMES_EXEC_ASK", raising=False)
        monkeypatch.setattr(approval_module, "_get_approval_mode", lambda: "manual")
        monkeypatch.setattr(
            approval_module, "_get_approval_config", lambda: {"gateway_timeout": 1}
        )

        captured = []

        def fake_invoke_hook(hook_name, **kwargs):
            captured.append((hook_name, kwargs))
            return []

        notify_seen = threading.Event()

        def notify_cb(approval_data):
            notify_seen.set()

        register_gateway_notify(isolated_session, notify_cb)

        result_holder = {}

        def run_guard():
            with patch("hermes_cli.plugins.invoke_hook", side_effect=fake_invoke_hook):
                result_holder["result"] = check_all_command_guards(
                    "rm -rf /tmp/test-gateway-timeout", "local",
                )

        t = threading.Thread(target=run_guard, daemon=True)
        t.start()

        assert notify_seen.wait(timeout=5)
        # Deliberately do NOT resolve -- let it time out
        t.join(timeout=5)
        assert not t.is_alive()
        unregister_gateway_notify(isolated_session)

        assert result_holder["result"]["approved"] is False

        post_kwargs = next(kw for name, kw in captured if name == "post_approval_response")
        assert post_kwargs["choice"] == "timeout"
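
The blocking behaviour both gateway tests depend on (the agent thread waits on an event until another thread resolves it or the timeout elapses) can be sketched with the standard library alone. PendingApproval is a hypothetical name for illustration; how tools.approval actually stores pending gateway requests is not shown in this diff.

```python
import threading
from typing import Optional


class PendingApproval:
    """One in-flight approval request that another thread may resolve (hypothetical)."""

    def __init__(self) -> None:
        self._event = threading.Event()
        self._choice: Optional[str] = None

    def resolve(self, choice: str) -> None:
        # Called from the "other" thread, e.g. when the user sends /approve.
        self._choice = choice
        self._event.set()

    def wait(self, timeout: float) -> str:
        # Event.wait returns False when the timeout expires without a set().
        if not self._event.wait(timeout):
            return "timeout"
        return self._choice or "deny"
```

Reporting "timeout" as its own choice, rather than folding it into "deny", matches what test_timeout_reports_timeout_choice expects the post hook to receive.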
@@ -568,6 +568,163 @@ class TestDelegateObservability(unittest.TestCase):
        self.assertEqual(result["results"][0]["exit_reason"], "max_iterations")


class TestSubagentCostRollup(unittest.TestCase):
    """Port of Kilo-Org/kilocode#9448: parent's session_estimated_cost_usd
    must include subagent spend, not just the parent's own API calls."""

    def _make_parent_with_cost_counters(self, depth=0, starting_cost=0.0):
        parent = _make_mock_parent(depth=depth)
        # The fields AIAgent exposes and the footer reads from. Set real
        # floats/strings so the rollup can add to them rather than tripping
        # on MagicMock auto-attrs.
        parent.session_estimated_cost_usd = starting_cost
        parent.session_cost_status = "unknown"
        parent.session_cost_source = "none"
        return parent

    def test_single_child_cost_folded_into_parent(self):
        parent = self._make_parent_with_cost_counters(starting_cost=0.10)

        with patch("run_agent.AIAgent") as MockAgent:
            mock_child = MagicMock()
            mock_child.model = "claude-sonnet-4-6"
            mock_child.session_prompt_tokens = 1000
            mock_child.session_completion_tokens = 200
            mock_child.session_estimated_cost_usd = 0.42
            mock_child.run_conversation.return_value = {
                "final_response": "done",
                "completed": True,
                "interrupted": False,
                "api_calls": 2,
                "messages": [],
            }
            MockAgent.return_value = mock_child

            result = json.loads(delegate_task(goal="do stuff", parent_agent=parent))

        # Parent footer must reflect parent_cost + child_cost.
        self.assertAlmostEqual(parent.session_estimated_cost_usd, 0.52, places=6)
        # Rollup must strip the internal field before serialising to the model.
        self.assertNotIn("_child_cost_usd", result["results"][0])
        self.assertNotIn("_child_role", result["results"][0])

    def test_batch_children_costs_sum_into_parent(self):
        parent = self._make_parent_with_cost_counters(starting_cost=0.00)

        with patch("tools.delegate_tool._run_single_child") as mock_run:
            mock_run.side_effect = [
                {
                    "task_index": 0,
                    "status": "completed",
                    "summary": "A",
                    "api_calls": 2,
                    "duration_seconds": 1.0,
                    "_child_role": "leaf",
                    "_child_cost_usd": 0.15,
                },
                {
                    "task_index": 1,
                    "status": "completed",
                    "summary": "B",
                    "api_calls": 2,
                    "duration_seconds": 1.0,
                    "_child_role": "leaf",
                    "_child_cost_usd": 0.27,
                },
                {
                    "task_index": 2,
                    "status": "failed",
                    "summary": "",
                    "error": "boom",
                    "api_calls": 0,
                    "duration_seconds": 0.1,
                    "_child_role": "leaf",
                    "_child_cost_usd": 0.03,
                },
            ]
            result = json.loads(
                delegate_task(
                    tasks=[{"goal": "A"}, {"goal": "B"}, {"goal": "C"}],
                    parent_agent=parent,
                )
            )

        # 0.15 + 0.27 + 0.03 even though one child failed; the API calls it
        # made before failing still cost money.
        self.assertAlmostEqual(parent.session_estimated_cost_usd, 0.45, places=6)
        # cost_source promoted from "none" since the parent had no direct spend.
        self.assertEqual(parent.session_cost_source, "subagent")
        self.assertEqual(parent.session_cost_status, "estimated")
        # All internal fields stripped from results.
        for entry in result["results"]:
            self.assertNotIn("_child_cost_usd", entry)
            self.assertNotIn("_child_role", entry)

    def test_zero_cost_children_leave_parent_source_untouched(self):
        """If every child reports 0 cost (e.g. free local model), we should
        not invent a fake 'subagent' source; the parent's 'none' stays."""
        parent = self._make_parent_with_cost_counters(starting_cost=0.00)

        with patch("tools.delegate_tool._run_single_child") as mock_run:
            mock_run.return_value = {
                "task_index": 0,
                "status": "completed",
                "summary": "done",
                "api_calls": 1,
                "duration_seconds": 0.5,
                "_child_role": "leaf",
                "_child_cost_usd": 0.0,
            }
            delegate_task(goal="free local run", parent_agent=parent)

        self.assertEqual(parent.session_estimated_cost_usd, 0.0)
        self.assertEqual(parent.session_cost_source, "none")

    def test_parent_with_real_source_not_overwritten(self):
        """If the parent already has its own cost billed (cost_source != 'none'),
        adding subagent cost must not clobber the existing source label."""
        parent = self._make_parent_with_cost_counters(starting_cost=0.20)
        parent.session_cost_status = "exact"
        parent.session_cost_source = "openrouter"

        with patch("tools.delegate_tool._run_single_child") as mock_run:
            mock_run.return_value = {
                "task_index": 0,
                "status": "completed",
                "summary": "done",
                "api_calls": 1,
                "duration_seconds": 0.5,
                "_child_role": "leaf",
                "_child_cost_usd": 0.30,
            }
            delegate_task(goal="billed run", parent_agent=parent)

        self.assertAlmostEqual(parent.session_estimated_cost_usd, 0.50, places=6)
        # Real source label preserved.
        self.assertEqual(parent.session_cost_source, "openrouter")
        self.assertEqual(parent.session_cost_status, "exact")

    def test_rollup_tolerates_missing_cost_fields(self):
        """Older fixtures / fabricated error entries may not carry
        _child_cost_usd. Rollup must degrade to zero-add silently."""
        parent = self._make_parent_with_cost_counters(starting_cost=0.10)

        with patch("tools.delegate_tool._run_single_child") as mock_run:
            mock_run.return_value = {
                "task_index": 0,
                "status": "completed",
                "summary": "done",
                "api_calls": 1,
                "duration_seconds": 0.5,
                # no _child_role, no _child_cost_usd
            }
            result = json.loads(delegate_task(goal="legacy", parent_agent=parent))

        # Parent cost unchanged.
        self.assertEqual(parent.session_estimated_cost_usd, 0.10)
        self.assertEqual(len(result["results"]), 1)
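
The behaviour these five tests pin down can be summarised in one small function. This is a sketch inferred from the assertions above, not the implementation in tools/delegate_tool.py: roll_up_child_costs is a hypothetical name, and the parent is stood in for by a SimpleNamespace carrying only the three fields the tests touch.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def roll_up_child_costs(parent: Any, results: List[Dict[str, Any]]) -> float:
    """Fold child spend into the parent and strip internal keys (hypothetical)."""
    total = 0.0
    for entry in results:
        # Internal bookkeeping fields must not leak into what the model sees.
        entry.pop("_child_role", None)
        cost = entry.pop("_child_cost_usd", None)
        # Missing or non-numeric costs degrade to a zero-add.
        if isinstance(cost, (int, float)) and cost > 0:
            total += float(cost)

    if total > 0:
        parent.session_estimated_cost_usd += total
        # Promote the label only when the parent had no source of its own.
        if parent.session_cost_source == "none":
            parent.session_cost_source = "subagent"
            parent.session_cost_status = "estimated"
    return total


# Usage: a parent with no direct spend and three children, one of which failed.
parent = SimpleNamespace(
    session_estimated_cost_usd=0.0,
    session_cost_status="unknown",
    session_cost_source="none",
)
results = [
    {"status": "completed", "_child_role": "leaf", "_child_cost_usd": 0.15},
    {"status": "completed", "_child_role": "leaf", "_child_cost_usd": 0.27},
    {"status": "failed", "_child_role": "leaf", "_child_cost_usd": 0.03},
    {"status": "completed"},  # legacy entry without cost fields
]
rolled = roll_up_child_costs(parent, results)
```

Guarding the label promotion behind total > 0 is what keeps a run of free local models from reporting a "subagent" source with nothing behind it.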


class TestBlockedTools(unittest.TestCase):
    def test_blocked_tools_constant(self):
        for tool in ["delegate_task", "clarify", "memory", "send_message", "execute_code"]:

Some files were not shown because too many files have changed in this diff.